WEIGHTING IN MULTILEVEL MODELS

By

Bing Tong

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Measurement and Quantitative Methods -- Doctor of Philosophy

2019

ABSTRACT

WEIGHTING IN MULTILEVEL MODELS

By

Bing Tong

Large-scale survey programs usually use complex sampling designs, such as unequal probabilities of selection, stratification, and/or clustering, to collect data while saving time and money. This makes it necessary to incorporate sampling weights into multilevel models in order to obtain accurate estimates and valid inferences. However, weighted multilevel estimators have been developed only recently, and there is minimal guidance on how to use sampling weights in multilevel models and on which estimator is most appropriate. The goal of this study is to examine the performance of multilevel pseudo maximum likelihood (MPML) estimation methods using different scaling techniques under informative and non-informative conditions in the context of a two-stage sampling design with unequal probabilities of selection. Monte Carlo simulation methods are used to evaluate the impacts of three factors: the informativeness of the sampling design, the intraclass correlation coefficient (ICC), and the estimation method. Simulation results indicate that including sampling weights in the model still produces biased estimates of the school-level variance. In general, the weighted methods outperform the unweighted method in estimating the intercept and the student-level variance, while the unweighted method outperforms the weighted methods for school-level variance estimation in the informative condition. Overall, the cluster scaling estimation method is recommended under an informative sampling design. Under the non-informative condition, the unweighted method can be considered a better choice than the weighted methods for all parameter estimates.
In addition, the ICC has clear effects on the school-level variance estimates in the informative condition, while in the non-informative condition it also affects the intercept estimates. An empirical study is included to illustrate the model.

Copyright by
BING TONG
2019

This dissertation is dedicated to my family.

ACKNOWLEDGEMENTS

I have received a great deal of support and assistance throughout the writing of my dissertation. This dissertation could not have been completed without that help. I am especially indebted to my advisor and dissertation chair, Dr. Kimberly S. Kelly. With her encouragement, I chose the MQM program. During my PhD career, she gave me tremendous help in my academic studies, and in my spiritual life as well. Her expertise was invaluable in formulating the research topic. I would like to acknowledge my committee members, Dr. Yuehua Cui, Dr. Richard Houang, and Dr. William Schmidt. I am grateful to them and appreciate their enlightening feedback, enormous support, and patient guidance. My special thanks go to my CSTAT colleagues, including Dr. Frank Lawrence, Dr. Steven Pierce, Dr. Dhruv Sharman, Dr. Wenjuan Ma, and Dr. Sarah Hession. In the last four and a half years, they have become my family, and I love working with them. They have shared tremendous resources and insightful ideas with me. More importantly, they never hesitate to help me whenever I encounter any problems. I will never forget them and will miss every single one of them. Nobody has been more important to me in the pursuit of this dissertation than my family members. I would like to thank my mom and my sisters, who always love and support me unconditionally. Most importantly, I thank my beloved daughter, Shiyuan, who provides unending support. She is always there for me.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS

CHAPTER 1 INTRODUCTION

CHAPTER 2 THEORETICAL BACKGROUND AND LITERATURE REVIEW
2.1 Research Goal
2.2 Multistage Sampling
2.3 Multilevel Model
2.4 Multilevel Pseudo-Maximum Likelihood (MPML) Estimation Methods
2.5 Scaling Sampling Weights for Multilevel Models
2.6 Intraclass Correlation Coefficient (ICC)
2.7 Informativeness of Selection

CHAPTER 3 METHODS
3.1 Empirical Data
3.1.1 Data and Variables
3.1.2 Statistical Models
3.2 Simulations
3.2.1 Simulation Design
3.2.2 Model
3.2.3 Sampling Selection
3.2.4 Mplus and Data Analysis
3.2.5 Evaluation Criteria

CHAPTER 4 RESULTS
4.1 Simulation Results
4.1.1 Research Question One
4.1.1.1 (Absolute) Relative Bias
4.1.1.1.1 Informative Design
4.1.1.1.2 Non-Informative Design
4.1.1.2 RMSE
4.1.1.2.1 Informative Design
4.1.1.2.2 Non-Informative Design
4.1.1.3 Coverage Rate
4.1.1.3.1 Informative Design
4.1.1.3.2 Non-Informative Design
4.1.2 Research Question Two
4.1.2.1 (Absolute) Relative Bias
4.1.2.1.1 Informative Design
4.1.2.1.2 Non-Informative Design
4.1.2.2 RMSE
4.1.2.2.1 Informative Design
4.1.2.2.2 Non-Informative Design
4.1.2.3 Coverage Rate
4.1.2.3.1 Informative Design
4.1.2.3.2 Non-Informative Design
4.1.3 Simulated Standard Errors and Standard Deviations
4.2 Results for ECLS-K:2011

CHAPTER 5 SUMMARY AND DISCUSSION
5.1 Summary of This Study
5.2 Discussion of Results
5.3 Implications
5.4 Limitations and Future Studies

APPENDICES
APPENDIX A. Stata Simulation Syntax in the Informative Sampling Design
APPENDIX B. Stata Simulation Syntax in the Non-Informative Sampling Design
APPENDIX C. Mplus Syntax

REFERENCES

LIST OF TABLES

Table 3.1. ECLS-K:2011 Variable Descriptive Statistics
Table 3.2. Simulation Design
Table 4.1. RB (%), RMSE, 95% CI CR for Covariates in the Informative Design
Table 4.2. RB (%), RMSE, 95% CI CR for Intercept and Variance Components in the Informative Design
Table 4.3. RB (%), RMSE, 95% CI CR for Covariates in the Non-Informative Design
Table 4.4. RB (%), RMSE, 95% CI CR for Intercept and Variance Components in the Non-Informative Design
Table 4.5. Simulation Standard Deviations and Standard Errors of Estimates in the Informative Design
Table 4.6. Simulation Standard Deviations and Standard Errors of Estimates in the Non-Informative Design
Table 4.7. Null Model for ECLS-K:2011 Mathematics and Reading
Table 4.8. Model with Student-Level Predictors for ECLS-K:2011 Mathematics and Reading
Table 4.9. Full Model for ECLS-K:2011 Mathematics and Reading
Table 5.1. Summary of Comparisons of the Estimators
Table 5.2. ICC Effect

LIST OF FIGURES

Figure 4.1. Relative bias (%) for covariates in the informative design
Figure 4.2. Relative bias (%) for intercept and variance components in the informative design
Figure 4.3. Relative bias (%) for covariates in the non-informative design
Figure 4.4. Relative bias (%) for intercept and variance components in the non-informative design
Figure 4.5. RMSE for covariates in the informative design
Figure 4.6. RMSE for intercept and variance components in the informative design
Figure 4.7. RMSE for covariates in the non-informative design
Figure 4.8. RMSE for intercept and variance components in the non-informative design
Figure 4.9. Coverage rate for covariates in the informative design
Figure 4.10. Coverage rate for intercept and variance components in the informative design
Figure 4.11. Coverage rate for covariates in the non-informative design
Figure 4.12. Coverage rate for intercept and variance components in the non-informative design
Figure 4.13. Relative bias (%) for covariates in the informative design
Figure 4.14. Relative bias (%) for intercept and variance components in the informative design
Figure 4.15. Relative bias (%) for covariates in the non-informative design
Figure 4.16. Relative bias (%) for intercept and variance components in the non-informative design
KEY TO ABBREVIATIONS

ECLS-K:2011  Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011
PML  Pseudo Maximum Likelihood
MPML  Multilevel Pseudo Maximum Likelihood
PWIGLS  Probability Weighted Iterative Generalized Least Squares
ICC  Intraclass Correlation Coefficient
RB  Relative Bias
RMSE  Root Mean Square Error
CR  Coverage Rate
UW  Unweighted Estimation Method
RW  Estimation Method with Raw Weights
CS  Estimation Method with Cluster Scaling
ES  Estimation Method with Effective Scaling
NAEP  National Assessment of Educational Progress
NCES  National Center for Education Statistics
NSF  National Science Foundation

CHAPTER 1 INTRODUCTION

A survey is a data collection tool commonly used in social science to collect self-report data from study participants. It allows researchers to collect a large amount of data quickly and inexpensively. Moreover, the samples in survey research are often large, and a wide variety of variables can be examined (Boslaugh, 2007; Koziol, Bovaird, & Suarez, 2017), including personal facts, attitudes, previous behaviors, and opinions. A survey can also often be created quickly and administered easily. Thus, secondary data analysis is becoming increasingly popular (Stapleton, 2006). Many large-scale survey programs in social science use complex sampling designs to collect data, such as unequal probabilities of selection, stratification, and/or cluster sampling, owing to the impracticality of simple random sampling.
In educational research, large-scale data collection efforts such as the National Assessment of Educational Progress (NAEP hereafter), the Early Childhood Longitudinal Study, Kindergarten Class of 1998-1999 (ECLS-K hereafter), and the Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011 (ECLS-K:2011 hereafter), available through the National Center for Education Statistics (NCES) or the National Science Foundation (NSF), use complex sampling plans. These three-stage surveys first involve sampling geographic areas with different probabilities of selection according to their characteristics. These areas are often termed primary sampling units (PSUs). Schools are then sampled with different probabilities from the selected areas, and lastly students are sampled from each of the selected schools, resulting in a cluster sampling design. Students chosen from the same school tend to be more alike than students chosen from other schools, and these groups of students show some degree of dependence when compared to students from other schools (Hox & Kreft, 1994; Kish, 1965; Skinner, Holt & Smith, 1989). This type of sampling design brings challenges when performing statistical analyses. If we disaggregate higher-order variables to individual variables, ignoring the nested structure of the data and assuming each observation is independent, the assumption of independence of observations is not tenable. Conventional parametric analytic methods (e.g., regression, analysis of variance, t-tests) do not work well because they violate the assumption of independence of observations (Cohen, West, & Aiken, 2003). The standard errors of the point estimates are estimated incorrectly, which could lead to erroneous conclusions arising from increased Type I errors due to the violation of this assumption (Arceneaux & Nickerson, 2009; Clarke, 2008; Hahs-Vaughn, 2005; Heck & Mahoe, 2004; Judd, McClelland, & Ryan, 2009; Musca et al., 2011).
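The size of this standard-error distortion can be quantified with Kish's (1965) design effect, DEFF = 1 + (n - 1) × ICC for clusters of size n. The following simulation sketch is my own illustration with made-up numbers (it is not part of this study's design): it compares the actual variance of a sample mean under cluster sampling with the variance a simple-random-sampling analysis would assume.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: m = 500 schools of n = 20 students, ICC = 0.2,
# with the total variance of the outcome fixed at 1.
m, n, icc = 500, 20, 0.2
s2_u, s2_e = icc, 1 - icc              # between-school and within-school variance

reps = 2000
means = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, np.sqrt(s2_u), size=m)                   # shared school effects
    y = u[:, None] + rng.normal(0.0, np.sqrt(s2_e), size=(m, n)) # student outcomes
    means[r] = y.mean()

emp_var = means.var()                  # observed variance of the sample mean
srs_var = 1.0 / (m * n)                # variance an SRS analysis would assume
deff = 1 + (n - 1) * icc               # Kish's design effect, 4.8 in this setup
print(emp_var / srs_var, deff)         # the empirical ratio is close to DEFF
```

With these numbers, naive standard errors are understated by a factor of about sqrt(4.8), which is exactly the mechanism behind the inflated Type I error rates cited above.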
However, if all the individual-level variables are aggregated to the higher level, then important information could be lost. Multilevel models, or hierarchical linear models (HLM), were proposed and have been widely used in education because they account for clustering and allow the variance of the dependent variable to be partitioned explicitly into within-group and between-group variance (Lee & Fish, 2010; Lubienski & Lubienski, 2006; Palardy, 2010; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). They are an alternative to some of the approaches used in survey analysis for dealing with nested data structures. Furthermore, some groups of the population are oversampled for various reasons. Units with higher data collection costs may be drawn with lower selection probabilities, and individuals from small subpopulations of particular interest may be sampled with higher probabilities. For example, both ECLS-K and ECLS-K:2011 oversampled Asians, Native Hawaiians, and Other Pacific Islanders at a rate of 2.5 relative to other racial groups. This feature calls for applying sampling weights in the model to reflect the unequal probabilities of selection whenever the selection probabilities are related to the outcome variable after conditioning on the covariates in the model. The sampling design is said to be informative in this case (Fuller, 2009; Grilli & Pratesi, 2004). Ignoring this feature and not using weights, parameter estimates can be severely biased (Korn & Graubard, 1995; Pfeffermann, Skinner & Goldstein, 1998; Rodriguez & Goldman, 1995, 2001; Zaccarin & Donati, 2008). However, using weights appropriately is not an easy task. In large-scale data sets such as ECLS-K:2011, there are many sampling weight variables, including school-level and student-level weights.
At the student level, these include weights generated for the child assessments, the teacher-level questionnaire, the student-level questionnaire, the parent interview, and the care provider questionnaire. Appropriate use of complex sampling weights is of great importance because ignoring them may produce erroneous standard errors and, consequently, inaccurate statistical inference. Yet there is not much guidance on how to incorporate sampling weights into multilevel models, even though work on this problem dates back to the late 1980s (e.g., Pfeffermann & LaVange, 1989). The pseudo maximum likelihood (PML) method, developed by Skinner (1989) following the ideas of Binder (1983), is a well-established estimation procedure for weighted single-level models. However, flexible techniques for estimating weighted multilevel models have been developed only recently (cf. Asparouhov, 2004, 2006; Grilli & Pratesi, 2004; Rabe-Hesketh & Skrondal, 2006; Koziol et al., 2017). One possible reason for this is that multilevel weights are often unavailable, which is frequently the case for public-release data files (Kovačević & Rai, 2003; Stapleton, 2012). A second reason might be that weighted multilevel modeling requires scaling of the lower-level sampling weights (Pfeffermann et al., 1998). Currently, there is no well-established, generally consistent multilevel estimation method incorporating weights, and it remains controversial whether to weight or not (Bertolet, 2008; Kish, 1992; Skinner, 1994; Smith, 1988; Xia & Torian, 2013). On the one hand, some researchers (e.g., Graubard & Korn, 1996; Korn & Graubard, 1995, 2003; Lohr & Liu, 1994) suggested using sampling weights in the model, as mentioned above, to account for the complex sampling scheme. On the other hand, Winship and Radbill (1994) preferred unweighted estimators because the estimates were unbiased and consistent and because they produced smaller standard errors.
However, although the use of sampling weights increases variance because of the unequal inclusion probabilities, weighting is still required and necessary because it prevents biased parameter estimates under informative sampling in multilevel models (Pfeffermann et al., 1998; Kim & Skinner, 2013), protects against misspecification, and makes full use of population-level information (Kim & Skinner, 2013). Estimation quality can be affected by a number of factors, some of which have been investigated in past research across different conditions: cluster size, distribution of the response variable, estimator/software program, informativeness of the sampling design, intraclass correlation coefficient (ICC), model type, invariance of selection across clusters, number of clusters, relative variance of the weights, sample design features, and weight approximation method. In this study, I focus on the multilevel pseudo maximum likelihood (MPML) estimation method. First, although various conditions have been examined, conclusions remain inconclusive and depend on the particular model or sampling mechanism. Second, only a limited number of studies have evaluated MPML (i.e., Asparouhov, 2006; Asparouhov & Muthén, 2006; Cai, 2013; Grilli & Pratesi, 2004; Koziol et al., 2017; Rabe-Hesketh & Skrondal, 2006; Stapleton, 2012). Third, MPML is more flexible than other estimators. Therefore, more studies are needed to evaluate MPML. The purpose of the present study is to evaluate the performance of MPML using different scaling procedures in the context of a two-stage sampling design with unequal probabilities of selection, in the informative and non-informative conditions, across different levels of ICC, using a
Monte Carlo simulation methods are used to estimate the relative bias (RB), root mean square error (RMSE) and coverage rate/probability (CR) of the corresponding 95% confidence in te rval estimators. The following factors are manipulated: ( a ) informat iveness; (b) ICC of the unconditional model ; and (c) estimation method. All factors are fully crossed. Cai (2013) conducted Monte Carlo simulations and found that the unweighted estimat or produce s biased estimates for the intercept and school - level variance, while the estimates for fixed effects and student - level var ia nce are nearly unbiased within 10% of the true value in terms of Muth n and Muth n (2002). Generally speaking, the MP ML estimators h a ve higher coverage rates than the unweighted estimator in the informative condition. I ncluding sampling weights increase s MSE substantially and produces biased estimates for the intercept and school - level variance in the informative samplin g design. Furth ermore, ignoring informative sampling design could produce biased estimates. Pfeffermann et al . (1998) pointed out that the unweighted method only produced biased est i m a tes for the intercept and school - level variance, not for student - level v ar iance when th e design is informative at school - level variance. Prior studies (e.g., Asparouhov & Muth n, 2006 ; Kova evi & Rai , 2003) show that as the ICC increase s the bias decrease s for all the parameters using an unconditional model. Asparouhov a nd Muth n (2007) also found that the MPML estimator outperforms substantially the other estimators. The plan of this study is as follows. Chapter 2 di s cusse s theoretical ba ckground and reviews the related literature . We briefly review multistage design and general multilevel models . Pseudo maximum likelihood estimation (MPML) method is presented , followed by two scaling methods. Intraclass correlation coefficient (ICC) an d i nformativeness are also describ ed in this section. 
In Chapter 3, I introduce the empirical data set used in this study, the ECLS-K:2011, and the simulation procedures for the present study. Chapter 4 presents the results of the empirical data analysis and the simulation analysis. Chapter 5 provides a discussion of the overall findings, limitations, and topics for future research.

CHAPTER 2 THEORETICAL BACKGROUND AND LITERATURE REVIEW

2.1 Research Goal

Using empirical and simulated data, the present study focuses on examining the performance of MPML in the context of a two-stage sampling design with unequal probabilities of selection. Since MPML is newly developed compared with PML, far fewer studies have examined it, and no consensus has been reached on which of the existing weighted multilevel estimators performs best and under which conditions. MPML is considered the most flexible and popular method for multilevel data when both the consistency of the estimates and the computational burden are taken into account. But it is also clear that weighted estimators produce larger standard errors than unweighted methods do, so it remains controversial whether to weight or not, and more studies are needed to compare these estimators. Furthermore, the previous literature on the effect of the scaling used in multilevel estimation is inconclusive. Lastly, to my knowledge, except for one study (cf. Koziol et al., 2017), all previous simulation studies manipulating ICC values use only an unconditional random-intercept model. Therefore, the main goal of this study is to examine the impact of sampling weights and to evaluate the performance of the MPML methods with different scaling techniques in the context of two-stage informative and non-informative sampling designs with unequal probabilities of selection, across different values of ICC, using a random-intercept model with covariates at both levels.
Monte Carlo simulation methods are used to evaluate several factors: (a) informativeness of the sample design (non-informative vs. informative at both stages); (b) ICC, with five different values; and (c) estimation method (unweighted, raw/unscaled weights, cluster scaling, effective scaling). All the factors are fully crossed, which gives rise to 2 × 5 × 4 = 40 combinations of conditions.

This study makes several contributions to the complex survey data literature. First, it provides a comparison between unweighted and weighted multilevel approaches in the context of unequal probabilities of selection. Second, it compares estimation methods between informative and non-informative sampling designs. Third, it compares estimation methods under different levels of ICC. In order to fill the gaps in the current body of literature, the following research questions are addressed:

1. How do MPML estimators differ from the unweighted estimator in multilevel models in the informative and non-informative sampling designs in terms of relative bias, root mean square error, and 95% confidence interval coverage rate?

2. How does the intraclass correlation influence the performance of the estimators under the informative and non-informative conditions in terms of relative bias, root mean square error, and 95% confidence interval coverage rate?

Large-scale surveys in the social sciences usually use complex sampling designs based on the characteristics of the population to glean information that addresses various research questions. This feature brings challenges to the analysis. This chapter covers several topics that are central to understanding weighted multilevel analysis of survey data.

2.2 Multistage Sampling

Multistage designs are commonly used in many practical settings.
For two-stage sampling in an educational setting, for example, clusters or PSUs such as schools are selected in the first stage. In the second stage, individual units, such as students, are sampled from the selected clusters. Each sampling stage corresponds to a level of the multilevel model: the second stage corresponds to Level 1 and the first stage to Level 2. At the first stage, cluster j is sampled with probability π_j, j = 1, ..., m, where m is the number of clusters to be sampled from the total number of clusters in the population, M. At the second stage, individual i is sampled from the cluster j selected at the first stage with conditional probability π_{i|j}, i = 1, ..., n_j, where n_j is the cluster sample size. Usually, clusters are sampled with probabilities proportional to their sizes, that is, to the number N_j of individual units in cluster j:

π_j = m N_j / N, (2.1)

and the weight at the cluster level is the inverse of that probability, w_j = 1/π_j. Each unit i is sampled from cluster j with conditional probability (assuming that an equal number of units is sampled from each cluster)

π_{i|j} = n_j / N_j, (2.2)

and the weight for individual i given cluster j is the inverse of the conditional probability, w_{i|j} = 1/π_{i|j}. The unconditional probability is then defined as

π_{ij} = π_j π_{i|j} = m n_j / N. (2.3)

2.3 Multilevel Model

A typical two-level linear model can be specified with two equations. The first equation describes the relationship between the dependent variable and the covariates at the student level, within each group. Some or all of the parameters of the student-level equation are viewed as varying randomly across the groups. The second equation, the school-level equation, defines these parameters as dependent variables with school-level variables as covariates. Combining the two, a two-level linear mixed model can be specified in matrix-vector form, following Laird and Ware (1982), as

y_j = X_j β + Z_j u_j + e_j. (2.4)

In the above equation, j indexes the cluster, j = 1, ..., m, where m is the number of clusters. For the cluster with size n_j, y_j is an n_j × 1 vector of observed responses, X_j is an n_j × p observed matrix for the fixed effects, β is a p × 1 vector of unknown coefficients, Z_j denotes an n_j × q random-effect design matrix, u_j is a q × 1 vector of cluster-specific random effects, and e_j is an n_j × 1 vector of random residual errors, where p is the number of unknown coefficients including the intercept and q is the number of random effects. Since a random intercept model is used in the current study, q equals 1. Either full maximum likelihood (ML/FIML) or restricted maximum likelihood (REML) estimation is typically used to estimate the unknown parameters of a general linear mixed model, such as the fixed regression coefficients and the variance components. Searle, Casella, and McCulloch (1992) define the likelihood function for a linear mixed model as

L(β, θ | y) = ∏_j (2π)^(−n_j/2) |V_j|^(−1/2) exp{−(1/2)(y_j − X_j β)′ V_j^(−1) (y_j − X_j β)}, (2.5)

where V_j is the covariance matrix of the vector y_j, V_j = Z_j G Z_j′ + σ² I, G denotes the covariance matrix of the random-effect vector u_j (in our case a scalar, τ), and σ² is the variance of the error term. For computational convenience, the log-likelihood is used more often than the likelihood itself. It is specified as

ℓ(β, θ | y) = −(n/2) log(2π) − (1/2) Σ_j log|V_j| − (1/2) Σ_j (y_j − X_j β)′ V_j^(−1) (y_j − X_j β), (2.6)

where n is the total number of observations, n = Σ_j n_j.

2.4 Multilevel Pseudo-Maximum Likelihood (MPML) Estimation Methods

To achieve valid inference for the population, sampling weights must be used at all levels of the data. However, the literature does not clearly describe when and how to use sampling weights properly in multilevel models. Using single-level weights in place of multilevel weights is not always appropriate, for the following reasons. First, in a single-level regression, sampling weights enter the sums of squares and cross-products, and final-level weights are the product of the multilevel weights.
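The two-stage probabilities and weights of equations (2.1)-(2.3) can be sketched in code. The population sizes below are hypothetical illustration values, not ECLS-K figures; NumPy is assumed.

```python
import numpy as np

# Hypothetical two-stage population: M = 5 clusters with sizes N_j.
N_j = np.array([100, 200, 300, 150, 250])
N = N_j.sum()        # total population size (1000)
m = 2                # clusters sampled at stage 1
n_j = 10             # students sampled per selected cluster (equal takes)

# Stage 1, PPS inclusion probability (eq 2.1): pi_j = m * N_j / N
pi_j = m * N_j / N
w_j = 1 / pi_j                    # cluster-level weight

# Stage 2, conditional probability (eq 2.2): pi_{i|j} = n_j / N_j
pi_i_given_j = n_j / N_j
w_i_given_j = 1 / pi_i_given_j    # within-cluster weight

# Unconditional probability (eq 2.3): pi_ij = pi_j * pi_{i|j}
pi_ij = pi_j * pi_i_given_j

# The single final-level weight is the product of the level weights,
# but the product alone cannot be decomposed back into its components.
w_ij = w_j * w_i_given_j

print(pi_ij)   # PPS with equal takes is self-weighting: pi_ij = m*n_j/N
print(w_ij)
```

With PPS at stage 1 and equal takes at stage 2, every student has the same unconditional probability m·n_j/N = 0.02, yet the two level-specific weights differ across clusters; that level-specific information is exactly what a multilevel estimator needs and what a single final-level weight loses.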
Based on Christ, Biemer, and Wiesen (2007), using final-level weights may lead to biased estimates in multilevel models. Second, Pfeffermann et al. (1998) noted that single final-level weights, or overall inclusion probabilities, may not contain sufficient information to correct for unequal sampling probabilities at the higher levels, because units at either level can be selected with differential probabilities. Therefore, multilevel weights need to be used in multilevel models. We use the sample data and the sampling weights to estimate the unknown parameters by maximizing the weighted sample likelihood. Researchers have explored different estimation methods that incorporate sampling weights for complex surveys, such as multilevel pseudo maximum likelihood (MPML) (Asparouhov, 2004, 2006; Grilli & Pratesi, 2004; Rabe-Hesketh & Skrondal, 2006), probability-weighted iterative generalized least squares (PWIGLS) (Pfeffermann et al., 1998), sample distribution methods (Eideh & Nathan, 2009; Pfeffermann, Moura, & Silva, 2006), weighted composite likelihood (WCL) estimation (Rao, Verret, & Hidiroglou, 2013), and pseudo empirical likelihoods (Chaudhuri, Handcock, & Rendall, 2010; Chen & Sitter, 1999; Francisco & Fuller, 1991; Fuller, 1984; Lin, Steel, & Chambers, 2004; Rao & Wu, 2010; Scott & Holt, 1982). As Asparouhov and Muthén (2006) stated, there is no single best estimation method for multilevel models when sampling weights are used. MPML and PWIGLS are the two most widely used estimation methods for multilevel models incorporating sampling weights. Compared with PWIGLS, MPML is more flexible and more widely applied from the perspective of software implementation. Currently, MPML has been implemented in Stata, Mplus, and SAS, while PWIGLS has been used in LISREL, HLM, and MLwiN. Different software packages can generate different output (Chantala, Blanchette, & Suchindran, 2011; Chantala & Suchindran, 2006).
The application of MPML, compared with PWIGLS, is less computationally intensive and much more flexible (Kovačević & Rai, 2003; Rabe-Hesketh & Skrondal, 2006). Besides, MPML can be applied to any general multilevel model (Rabe-Hesketh & Skrondal, 2006), just as the PML method can be used in any single-level model. A third advantage is that MPML is versatile and can be modified to address different estimation issues (Asparouhov, 2004; Asparouhov & Muthén, 2006). In addition, MPML can account for stratification and extra non-substantive clustering levels in the estimation of standard errors without having to incorporate such design features into the parameterization of the model (Asparouhov & Muthén, 2006; Koziol et al., 2017; Rabe-Hesketh & Skrondal, 2006). Because of these advantages, only MPML, with different scaling techniques, is considered in the present study. Let θ = (θ_1, θ_2) be the parameters. The likelihood function for a general two-level model can be expressed as

L(θ) = ∏_j ∫ [∏_i f(y_ij | u_j, x_ij; θ_1)] g(u_j | x_j; θ_2) du_j, (2.7)

where y_ij is the response of individual i in cluster j and u_j is the cluster-specific random effect; x_ij are the student-level covariates and x_j the cluster-level covariates; f is the density function of y_ij and g the density function of u_j; and θ_1 and θ_2 are the parameters to be estimated for the student level and the school level, respectively. When weighting is incorporated into the analysis, with scaling procedures also applied in order to reduce the bias arising from unequal probabilities of selection in complex survey data, the population log-likelihood is estimated directly by weighting the sample log-likelihood,

log L_w(θ) = Σ_j s_2 w_j log ∫ exp{Σ_i s_1 w_{i|j} log f(y_ij | u_j, x_ij; θ_1)} g(u_j | x_j; θ_2) du_j, (2.8)

where w_{i|j} = 1/π_{i|j} is the student-level weight, with π_{i|j} the conditional inclusion probability of the ith unit in the jth cluster given that the jth cluster is sampled; w_j = 1/π_j is the school-level weight, with π_j the inclusion probability of the jth cluster; and s_2 and s_1 are the scaling factors for the school-level and individual-level sampling weights, respectively. Numerical techniques are needed to integrate out the unobserved school-level random effect to approximate the weighted likelihood. A sandwich variance estimator is employed to obtain standard errors because such estimators are robust to nonnormality and heterogeneity (e.g., Huber, 1967; White, 1980). The asymptotic covariance matrix of the parameter estimates under this method is

Var(θ̂) = [ℓ″(θ̂)]⁻¹ Var[ℓ′(θ̂)] [ℓ″(θ̂)]⁻¹, (2.9)

where ℓ′ and ℓ″ refer to the first and second derivatives of the log-likelihood with respect to the parameters. Mplus (Muthén & Muthén, 1998-2017) implements this method using a robust variance estimator of the form

[Σ_j ℓ_j″(θ̂)]⁻¹ [Σ_j ℓ_j′(θ̂) ℓ_j′(θ̂)′] [Σ_j ℓ_j″(θ̂)]⁻¹. (2.10)

2.5 Scaling Sampling Weights for Multilevel Models

In the multilevel weighted estimation literature, one of the main problems is that the parameter estimates are usually only approximately unbiased. Many factors substantially influence the quality of the estimation, such as cluster sample size, informativeness of selection, variability of the sampling weights, intraclass correlation, and scaling method (Asparouhov, 2006; Asparouhov & Muthén, 2006; Bertolet, 2008; Cai, 2013; Grilli & Pratesi, 2004; Jia, Stokes, Harris, & Wang, 2011; Kovačević & Rai, 2003; Pfeffermann et al., 1998; Rabe-Hesketh & Skrondal, 2006). For instance, parameter estimation can be severely biased when the cluster sample size is not sufficiently large (Asparouhov, 2006; Rabe-Hesketh & Skrondal, 2006). In order to correct for this, two scaling methods were proposed by Pfeffermann et al. (1998).
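The two scaling methods of Pfeffermann et al. (1998), defined as equations (2.11) and (2.12) below, can be sketched in code. The within-cluster weight values here are hypothetical; NumPy is assumed.

```python
import numpy as np

# Hypothetical within-cluster (level-1) weights for one sampled cluster.
w = np.array([2.0, 2.0, 4.0, 8.0])
n_j = len(w)

# Method 1 / effective cluster scaling (ES), eq (2.11):
# scaled weights sum to the "effective" cluster size, (sum w)^2 / sum(w^2).
lam_es = w.sum() / (w ** 2).sum()
w_es = lam_es * w

# Method 2 / cluster scaling (CS), eq (2.12):
# scaled weights sum to the actual cluster sample size n_j.
lam_cs = n_j / w.sum()
w_cs = lam_cs * w

print(w_es.sum())   # effective cluster size, 16^2 / 88 ~ 2.909
print(w_cs.sum())   # 4.0, the actual cluster sample size
```

Both methods leave the relative sizes of the weights within a cluster untouched; they only change the total to which the weights sum, which is what drives the small-cluster bias of the variance-component estimates.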
The scaling method indicates how the weights are normalized at each level (Asparouhov, 2006). The first method, which assumes the individual-level weights are approximately non-informative, may produce approximately unbiased estimators for both variance components. This approach chooses a scaling factor such that the scaled individual-level weights sum to the effective cluster sample size (Longford, 1995, 1996; Pfeffermann et al., 1998). The scaling factor, referred to as s_1, is specified as

s_1 = Σ_i w_{i|j} / Σ_i w_{i|j}². (2.11)

Method 2 in Pfeffermann et al. (1998) is used when both levels of the sampling design are assumed to be informative. Its scaling factor is defined as

s_1 = n_j / Σ_i w_{i|j}, (2.12)

where n_j is the number of sample units in the jth cluster. This scaling factor is set so that the scaled individual-level weights sum to the actual cluster sample size. These two scaling methods are termed effective cluster scaling (ES) and cluster scaling (CS), respectively, in the current study. Currently, there is no consensus about which scaling method works better and under what conditions. For example, Pfeffermann et al. (1998) found that Method 2 (cluster scaling) works better at reducing bias in simulations under an informative sampling design, while Stapleton (2002) found that Method 1 (effective cluster scaling) produces unbiased estimates in multilevel SEM analysis. Asparouhov (2006) noted that different scaling methods may have different effects on different estimation techniques: a scaling method that performs well with the MPML approach does not necessarily perform well with other techniques such as PWIGLS. Sometimes the choice of scaling method depends on the purpose of the research: if the main interest is in point estimates, the cluster scaling method is recommended; if cluster variance estimates are of primary interest, the effective scaling method might be used (Asparouhov, 2006; Carle, 2009).

2.6 Intraclass Correlation Coefficient (ICC)

Besides cluster sample size, informativeness of selection, variability of the sampling weights, and scaling method, the ICC also affects estimation quality (Asparouhov, 2006; Asparouhov & Muthén, 2006; Bertolet, 2008; Cai, 2013; Grilli & Pratesi, 2004; Jia et al., 2011; Kovačević & Rai, 2003; Pfeffermann et al., 1998; Rabe-Hesketh & Skrondal, 2006). Prior simulation studies that manipulated the ICC using random intercept models without covariates at either level have found that the larger the ICC, the less biased the estimates (Asparouhov, 2006; Jia et al., 2011; Kovačević & Rai, 2003). The ICC is one of the factors examined in this study. It can be used in model construction because it helps to determine which predictors are most important for accounting for the outcome variable (Raudenbush & Bryk, 2002). It is also used as an index for including a cluster level in multilevel modeling when the ICC is not close to zero. Larger ICC values usually represent larger cluster-level variation, indicating that a larger proportion of the total variance in the response variable is accounted for by the clustering, and thus a larger clustering effect. In addition, the ICC value is informative for planning group-randomized experiments in education (Hedges & Hedberg, 2007, 2013). To estimate the ICC for a given outcome, y, a multilevel model is fit for the ith student in the jth school,

y_ij = γ_00 + u_j + e_ij, (2.13)

and the REML estimates of the variance of u_j (labeled τ), which is the variation between schools, and of the variance of e_ij (labeled σ²), which represents variation at the student level, are used to compute the ICC. The ICC estimate, ρ̂, is then defined as

ρ̂ = τ̂ / (τ̂ + σ̂²), (2.14)

which is the proportion of the total variability in scores that is due to school-to-school differences. Moreover, the ICC is used to calculate the design effect, which shows how much standard errors are underestimated.
The design effect is defined as follows:

Design effect = 1 + (average cluster size − 1) × ICC. (2.15)

Based on Kish (1965), a design effect greater than 2 indicates that the clustering in the data needs to be taken into account during estimation.

2.7 Informativeness of Selection

The informativeness of selection, according to Asparouhov (2006), indicates how biased the selection is. If the sampling design is informative, the inclusion probabilities are related to the response variable after conditioning on the variables in the model (Fuller, 2009; Grilli & Pratesi, 2004); otherwise, it is non-informative. Pfeffermann (1993) and Cai (2013) pointed out that informative weights are quite influential on the results and should therefore be considered in the multilevel analysis. However, if the sampling design or weights are not informative, the effect of the weights may be negligible and it is not necessary to include them in the analysis. It is therefore necessary to check whether the sampling design/weights are informative. Following Laukaityte and Wiberg (2018), weights are informative if the effective sample size is smaller than the actual sample size. The effective sample size for two-level models can be defined as follows. The effective sample size at level 2 (between schools) is calculated as

n_eff(2) = (Σ_j w_j)² / Σ_j w_j², (2.16)

and the effective sample size at level 1 (within school j) is obtained by

n_eff(1, j) = (Σ_i w_{i|j})² / Σ_i w_{i|j}². (2.17)

Pfeffermann (1993) developed a test to evaluate whether the sampling design is informative. The informativeness of the sampling design is examined with the statistic

I = (β̂_w − β̂_u)′ (V̂_w − V̂_u)⁻¹ (β̂_w − β̂_u), (2.18)

where β̂_w and β̂_u are the estimates from the weighted and unweighted analyses, respectively, and V̂_w and V̂_u are their variance estimates. The informativeness statistic follows a χ² distribution with p = dim(β) degrees of freedom.
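The effective sample size of equation (2.16) and the design effect of equation (2.15) can be sketched as follows; the level-2 weights, the ICC value, and the average cluster size of 19 are assumed illustration values, and NumPy is assumed.

```python
import numpy as np

# Hypothetical level-2 (school) weights for four sampled schools.
w2 = np.array([10.0, 10.0, 20.0, 40.0])

# Effective sample size at level 2, eq (2.16): (sum w)^2 / sum(w^2).
n_eff_L2 = w2.sum() ** 2 / (w2 ** 2).sum()
# Heuristically, weights are informative when n_eff < the actual n.
print(n_eff_L2, len(w2))

# Design effect, eq (2.15): 1 + (average cluster size - 1) * ICC.
icc = 0.2
avg_cluster_size = 19   # assumed average take per cluster
deff = 1 + (avg_cluster_size - 1) * icc
print(deff)             # greater than 2, so per Kish (1965) the
                        # clustering must be accounted for
```

Note that equal weights make n_eff equal the actual sample size; the more variable the weights, the smaller the effective sample size relative to n.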
CHAPTER 3
METHODS

Two primary sections are included in this chapter: one introduces the methods for the empirical data; the other introduces the simulation design.

3.1 Empirical Data

3.1.1 Data and Variables

This study uses data from the public-use Early Childhood Longitudinal Study, Kindergarten Class of 2010-2011 (ECLS-K:2011; see Mulligan, Hastedt, & McCarroll, 2012, for an overview), which is sponsored by the National Center for Education Statistics (NCES). It is the latest early childhood longitudinal study, following a nationally representative U.S. sample of students from kindergarten entry in 2010-11 through the spring of 2016 (fifth grade). ECLS-K:2011 provides descriptive information collected on children's family, classroom, and school environments, along with individual-level variables for studying how cognitive, social, and emotional development relates to them. The ECLS-K:2011 data are not a simple random sample of individuals or clusters. The study employed a three-stage cluster sampling design. At stage 1, 90 geographic areas (counties or groups of counties) were sampled as the primary sampling units (PSUs). At stage 2, samples of public and private schools were selected from the selected PSUs. At stage 3, five-year-old children were randomly sampled within the selected schools. Stratification and probability-proportional-to-size sampling were used at the first two stages of selection; stratification and unequal-probability sampling were used at the final stage. In the base year, Asian, Native Hawaiian, and other Pacific Islander children were oversampled. The ECLS-K:2011 kindergarten data file and electronic codebook, public version (Tourangeau et al., 2015), offers an excellent overview of the characteristics of complex sample designs, including clustering, stratification, unequal probabilities of selection, non-response, and poststratification.
The analytic samples in this paper include only kindergarteners with data collected in both the fall and the spring semesters. Approximately 18,200 children enrolled in about 970 schools during the 2010-11 school year participated during their kindergarten year. Although the use of sampling weights increases variance because of the unequal inclusion probabilities, it is still required and necessary: it prevents biased parameter estimates under informative sampling in multilevel models (Pfeffermann et al., 1998; Kim & Skinner, 2013), protects against misspecification, and makes full use of population-level information (Kim & Skinner, 2013). The supplied sampling weights, which adjust for school-level nonresponse and are inverses of the estimated student-level response probabilities, are used. Weights for the first sampling stage are not available. At the student level, I use composite variables based on the parent survey as the primary independent variables of interest, and I control for the student's fall test score in order to predict the spring score. Because the parent survey is a primary component of the analysis, the child base weight adjusted for non-response associated with either the fall or spring kindergarten parent interview (W1_2P0) is a good choice of weight. For the school level, the school base weight adjusted for non-response associated with the school administrator questionnaire (W2SCH0) is used. The academic outcome variables in this study are reading and mathematics scale scores calibrated using item response theory (IRT) procedures. The reading assessment (Mulligan et al., 2012) measures basic skills (print familiarity, letter recognition, beginning and ending sounds, rhyming words, word recognition), vocabulary knowledge, and reading comprehension.
Reading comprehension consists of questions that identify specific information in the text, make complex inferences within and across texts, and consider the text objectively to judge its appropriateness and quality. The mathematics assessment measures skills in conceptual knowledge, procedural knowledge, and problem solving. Construct validity has been established for the ECLS-K:2011 assessments: national and state performance standards in each of the domains were examined, and specifications for reading and mathematics were established based on the NAEP framework. Furthermore, curriculum specialists in the subject areas were recruited, and the pool of items created was examined for content and framework strand design, accuracy, lack of ambiguity in the response options, and appropriate formatting. The reliability of the reading score is 0.95 for both fall and spring kindergarten; the reliability of the mathematics score is 0.92 for fall kindergarten and 0.94 for spring. The kindergarten mean score was 61.26 (SD = 13.56). To model mathematics and reading achievement, we use three student-level covariates and two school-level covariates. Descriptive statistics of these variables are presented in Table 3.1.

Table 3.1. ECLS-K:2011 Variable Descriptive Statistics
Note: SD = standard deviation; MIN = minimum; MAX = maximum.

3.1.2 Statistical Models

Multilevel models can be used to estimate the unexplained variance in the outcomes of interest among randomly sampled clusters (e.g., schools), as well as the effects of covariates at each level. Researchers can use models with random intercepts to account for the correlations within clusters caused by longitudinal or clustered designs (West et al., 2015). In a survey with multistage samples there are always various levels of clustering, but usually only the lowest level of clustering has the greatest impact on individual outcomes (Asparouhov & Muthén, 2006).
Furthermore, Stapleton and Kang (2016) found that disregarding a first-stage sampling design beyond the levels included in the model has only minor impacts on inference, with no detectable difference. For large-scale data sets such as ECLS-K:2011, the first-stage weights are usually not provided; hence the first-stage sampling design is not considered in this study. Therefore, for simplicity, two-level random intercept regression models, in which individual students are nested in schools, are fit to the two academic dependent variables, the reading and mathematics IRT scale scores. IRT measurement error is not taken into account in the analysis. Three different two-level models with different sets of covariates are examined. Model 1 is an unconditional model without covariates at either level, Model 2 includes all the student-level predictors, and Model 3 is a full model consisting of all the student-level and school-level predictors.

Model 1: unconditional model
Level 1: y_ij = β_0j + e_ij (3.1)
Level 2: β_0j = γ_00 + u_0j (3.2)
Combined: y_ij = γ_00 + u_0j + e_ij (3.3)

Model 2: student model with three student-level predictors
Level 1: y_ij = β_0j + β_1j·Female_ij + β_2j·SES_ij + β_3j·Pretest_ij + e_ij (3.4)
Level 2: β_0j = γ_00 + u_0j (3.5)
Combined: y_ij = γ_00 + γ_10·Female_ij + γ_20·SES_ij + γ_30·Pretest_ij + u_0j + e_ij (3.6)

Model 3: full model including covariates at both levels
Level 1: y_ij = β_0j + β_1j·Female_ij + β_2j·SES_ij + β_3j·Pretest_ij + e_ij (3.7)
Level 2: β_0j = γ_00 + γ_01·Rural_j + γ_02·Suburban_j + u_0j (3.8)
Combined: y_ij = γ_00 + γ_10·Female_ij + γ_20·SES_ij + γ_30·Pretest_ij + γ_01·Rural_j + γ_02·Suburban_j + u_0j + e_ij (3.9)

Since many factors affect the quality of estimation under complex sampling designs, it is worthwhile to investigate both unweighted and weighted models.
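As an illustrative sketch, the unconditional model (3.1)-(3.3) can be simulated and its variance components recovered. The sizes and parameter values below are hypothetical, and simple one-way ANOVA (method-of-moments) estimators stand in for the REML estimates used in the study; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate the unconditional model: y_ij = gamma00 + u_j + e_ij.
# Hypothetical sizes and parameters; ICC = 12 / (12 + 48) = 0.2.
J, n = 200, 19
gamma00, tau, sigma2 = 17.0, 12.0, 48.0
u = rng.normal(0.0, np.sqrt(tau), J)                       # school effects
y = gamma00 + u[:, None] + rng.normal(0.0, np.sqrt(sigma2), (J, n))

# One-way ANOVA (method-of-moments) variance-component estimates.
ybar_j = y.mean(axis=1)
sigma2_hat = ((y - ybar_j[:, None]) ** 2).sum() / (J * (n - 1))  # within
tau_hat = ybar_j.var(ddof=1) - sigma2_hat / n                    # between
icc_hat = tau_hat / (tau_hat + sigma2_hat)                       # eq (2.14)
print(sigma2_hat, tau_hat, icc_hat)   # close to 48, 12, and 0.2
```

The between-school variance of the school means overstates τ by σ²/n, which is why the within-school variance estimate is subtracted off before forming the ICC.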
In this study, all three multilevel models above are explored using the following four estimation methods: (a) maximum likelihood with no weights (UW); (b) MPML using raw/unscaled weights (RW); (c) MPML using cluster scaling (CS); and (d) MPML using effective cluster scaling (ES). Missing data at level 1 range from 0.2% for female to 14.2% for the math pretest. Listwise deletion is used to handle level-1 missing data in the empirical study. Multiple imputation could be used here, but the exact models for the real data are of secondary importance, so listwise deletion is used to simplify the problem. Missing data at level 2 amount to 3.6% for rural and suburban. Level-2 missing values cannot simply be removed because they affect the lower level. Schafer and Graham (2002) noted that if the probabilities of missingness depend only on observed items, the missing data can be assumed to be missing at random (MAR hereafter). Therefore, I assume the missingness at level 2 is MAR. Two methods are recommended for handling MAR data: multiple imputation (Rubin, 1987; Enders, 2010; Howell, 2008) and full-information maximum likelihood (FIML) (Danielsen, Wiium, Wilhelmsen, & Wold, 2010; Enders, 2010; Laukaityte & Wiberg, 2018). I use FIML to handle the level-2 missing data in this study.

3.2 Simulations

3.2.1 Simulation Design

The informativeness of the sampling design (Asparouhov, 2006; Cai, 2013) and the intraclass correlation (Asparouhov, 2006; Jia et al., 2011; Kovačević & Rai, 2003) have been found to influence the performance of weighted estimation in multilevel models. Monte Carlo simulation methods are applied to evaluate the effect of the ICC and to examine the performance of MPML with different scaling techniques in the context of two-stage informative and non-informative sampling designs (see Table 3.2). All the conditions are fully crossed.
The full study design results in a total of 2 × 5 × 4 = 40 simulation conditions.

Table 3.2. Simulation Design
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling.

Five different ICC values are used in this simulation: 0.5, 0.3, 0.2, 0.1, and 0.01. The unconditional ICCs typically found in educational and psychological research in the United States are in the range of 0.15 to 0.25 for large-scale academic assessments (Bloom, Bos, & Lee, 1999; Bloom, Richburg-Hayes, & Black, 2007; Hedges & Hedberg, 2007, 2013; Kreft & Yoon, 1994; Schochet, 2008). Accordingly, the values 0.1, 0.2, and 0.3 are chosen for this study. The lowest ICC value found in Hedges and Hedberg (2013) is 0.02, for students nested in grades within each state. Raykov (2015) showed that the lower bound of a 95% confidence interval for the ICC can be as low as 0.014. Murray and Short (1995) found that in school-based intervention designs, ICC values were generally smaller, in the range of 0.01 to 0.05. The current study also considers that students may be nested in school districts, or even larger geographic areas, which may result in a lower ICC value. Therefore 0.01, a very small non-zero value, is chosen as well, because even a small ICC affects the estimates of standard errors if the dependency is ignored; Musca et al. (2011) showed that a small ICC can inflate the Type I error rate dramatically. Different values of τ and σ² are used while the total variance of y is kept fixed, τ + σ² = 60. This value is determined from the empirical data results (see Table 3.5). The five ICC values 0.5, 0.3, 0.2, 0.1, and 0.01 are obtained by setting τ to 30, 18, 12, 6, and 0.6, respectively, with σ² = 60 − τ, i.e., 30, 42, 48, 54, and 59.4, correspondingly.
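The fully crossed design and the mapping from ICC to variance components (with the total variance fixed at 60) can be enumerated as follows; the tuple layout is an illustrative choice.

```python
from itertools import product

# The fully crossed simulation design: 2 designs x 5 ICCs x 4 estimators.
designs = ["informative", "non-informative"]
iccs = [0.5, 0.3, 0.2, 0.1, 0.01]
estimators = ["UW", "RW", "CS", "ES"]

# Total variance fixed at 60, so tau = ICC * 60 and sigma2 = 60 - tau.
conditions = [
    (design, icc, est, icc * 60, 60 - icc * 60)
    for design, icc, est in product(designs, iccs, estimators)
]

print(len(conditions))    # 2 x 5 x 4 = 40
print(conditions[0])      # ('informative', 0.5, 'UW', 30.0, 30.0)
```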
3.2.2 Model

To evaluate the performance of the MPML approach for a linear two-level regression model under the informative and non-informative sampling conditions, the Monte Carlo simulation mimics the sampling design of ECLS-K:2011. Specifically, about 18,200 kindergarteners from 970 schools were sampled, i.e., about 19 students on average from each school. Mulligan et al. (2012) indicated that the school and student selection probabilities (i.e., sampling rates) are 0.02 and 0.25, respectively, so the overall student selection probability is 0.02 × 0.25 = 0.005. The school population is categorized into six groups based on the percentages of public schools in ECLS-K:2011: 5.69% of schools have 16 to 24 students; 11.49% have 25 to 49; 43.53% have 50 to 99; 25.3% have 100 to 149; 8.59% have 150 to 199; and 5.22% have more than 200 students. Finally, 150 schools and 3,915 students are drawn from the population, matching the expected sampling rates for schools and students in ECLS-K:2011. The true parameter values are all obtained from the empirical data set ECLS-K:2011 using maximum likelihood estimation (see Table 3.5). Thus, the data are generated using the following model:

y_ij = 17.43 + 0.91·Female_ij + 1.06·SES_ij + 0.92·Pretest_ij + 1.04·Rural_j + u_j + e_ij, (3.10)

where u_j is the school-level random effect and e_ij is the student-level error term; u_j and e_ij are normally distributed with mean 0, with the variance of u_j set to τ = 30, 18, 12, 6, or 0.6 and the corresponding variance of e_ij set to 60 − τ. The explanatory variables (female, socioeconomic status (SES), pretest, rural, and suburban) were chosen because they contribute significantly to the model and are also variables of interest to other researchers (e.g., Hedberg, 2016; Hedges & Hedberg, 2007).
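As a sketch, a finite population can be generated from model (3.10). The school count follows the text (970), while the equal per-school size of 75 and the ICC = 0.2 condition are assumed for illustration; the covariate distributions follow the values reported in this chapter, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(2019)

# Hypothetical finite population generated from model (3.10).
N_schools, n_per_school = 970, 75
tau, sigma2 = 12.0, 48.0                       # the ICC = 0.2 condition

u = rng.normal(0.0, np.sqrt(tau), N_schools)   # school-level random effects
rural = rng.binomial(1, 0.22, N_schools)       # school-level covariate

school_scores = []
for j in range(N_schools):
    female = rng.binomial(1, 0.49, n_per_school)
    ses = rng.normal(-0.05, np.sqrt(0.66), n_per_school)
    pretest = rng.normal(46.92, np.sqrt(132.22), n_per_school)
    e = rng.normal(0.0, np.sqrt(sigma2), n_per_school)
    y = (17.43 + 0.91 * female + 1.06 * ses + 0.92 * pretest
         + 1.04 * rural[j] + u[j] + e)
    school_scores.append(y)

y_all = np.concatenate(school_scores)
print(y_all.shape)    # (72750,)
```

The dissertation itself generates the population in Stata (Appendices A and B); this Python sketch only illustrates the structure of the generating model.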
Female follows a Bernoulli distribution with probability 0.49. Socioeconomic status (SES) follows a normal distribution with mean −0.05 and variance 0.66 (SD = 0.81). The pretest score follows a normal distribution with mean 46.92 and variance 132.22 (SD = 11.50). Suburban follows a Bernoulli distribution with probability 0.36, and rural follows a Bernoulli distribution with probability 0.22.

3.2.3 Sampling Selection

A finite population is generated according to the model described above. The expected sampling rates used in this study are again 0.02 for schools and 0.25 for students, as in ECLS-K:2011, which results in an overall sampling rate of 0.005. The sampling selection depends on whether the sampling design is informative or non-informative. In order to introduce unequal-probability sampling at both levels and make the sampling design informative, the present study uses a plan similar to those of Asparouhov (2006), Cai (2013), and Koziol et al. (2017). Poisson sampling is used to select the jth school with probability

prob(I_j = 1) = 1 / (1 + exp(a_2 + u_j*/2)), (3.11)

where u_j* equals u_j (the random intercept effect for the jth cluster) but is rescaled to have a variance of 2. For a selected school, Poisson sampling is used to select the ith student within the jth school with probability
A variance of 2 f or both random variables and the slope coefficients (1/2) are selected to have approximately 0.3 of informativeness for both the school level and student level , which Asparouhov (2006) used as a moderate level of informativeness in his simulation s . The intercept values ( 4.12 and 1.23 for school level and student level, respectively) are determined using expected sampling rates (0.02 and 0.25 for the school level and the student level, respectively) and the formulas above (equation 3.11 and 3 .12 ) to obtain desired sample s izes. Under the non - informative sampling condition , and are replaced by other variables that are not part of the population model. Still Po i sson sampling is used to sel ect the j th school with probability prob ( I j = 1) = ( 3.13 ) where ~ N (0, 2 ) and is not related to any variables in the model. Conditional on the selected school, Poisson sampling is used to select the i th student in the j th school with probability of prob ( I i|j = 1) = . ( 3.14 ) 29 where ~ N (0, 2 ) and is not related to any variables in the model. Although th is design uses unequal probability of selection, it is not informative, because the selectio n probability is not related to the response variable. Data are generated using the software Stata. The syntax for data generation is provided in A PPENDIX A and A PPENDIX B . 3.2.4 M plus and D ata A nalysis Each simulation is replicated 10 00 times for each st udy condition. Each 1000 replications are analyzed in M plus Version 8 ( Muthén & Muthén , 1998 - 201 7 ) using the TYPE = MONTECARLO option under the M plus DATA command. The M plus Muthén & Muthén , 1998 - 2017) provides guidance on how to incorporat e sampling weights and how to use scaling methods in a two - level model. The two scaling methods that are used are referred to E CLUSTER and C LUSTER respectively in M plus documentation , which correspond to effective cluster scaling and clustering scaling respectively in this study. 
Altogether, four estimation methods are considered: (a) the unweighted estimation method (UW); (b) MPML using raw/unscaled weights (RW); (c) MPML using cluster-scaled (CS) weights; and (d) MPML using effective-cluster-scaled (ES) weights. Sandwich variance estimators (ESTIMATOR = MLR) are used in all instances. The TYPE option is set to TWOLEVEL, and the appropriate variables are identified with the CLUSTER, WEIGHT, and BWEIGHT options. For the MPML models, WTSCALE and BWTSCALE are also specified according to the scaling method: UNSCALED and UNSCALED for the raw-weight method, CLUSTER and SAMPLE for the cluster scaling method, and ECLUSTER and SAMPLE for the effective scaling method, respectively. For the general multilevel model that ignores weighting in the present study, WTSCALE and BWTSCALE are not used under the VARIABLE command.

3.2.5 Evaluation Criteria

Empirical (absolute) relative bias, root mean square error (RMSE), and 95% confidence interval coverage rate are used as the primary criteria for evaluating the performance of the estimators, as in previous simulation studies (e.g., Cai, 2013; Eideh & Nathan, 2009). In measurement or sampling situations, bias is the systematic difference between the expected value of the measurements or test results and the true value; the true value can be under- or overestimated. Since a large number of replications is used in this study, even small values of bias may be deemed significantly different from 0. As such, relative bias is used instead of bias. The relative bias is defined as

RBias(θ̂) = (1/R) Σ_r (θ̂_r − θ) / θ, (3.15)

where θ is the true value, θ̂_r is the estimate in replication r, and R is the number of replications. Muthén and Muthén (2002) note that if the absolute relative bias is less than 10% of the true value, the parameter estimates can be considered unbiased. A common accuracy measure, the mean square error (MSE), is the mean of the squared differences between the estimates and the true value. It indicates how close the estimate is to the true value.
It indicates how close the estimate is to the true value. This measure incorporates both bias and precision, because it equals the sum of the variance of the estimates and the squared bias. The root MSE (RMSE) tells us how far the estimate is from the true value on average; it is used because it penalizes large errors. It is computed with the formula

RMSE(θ̂) = sqrt( (1/R) Σ_{r=1}^{R} (θ̂_r − θ)^2 ),   (3.16)

where θ̂_r is the estimate in replication r and θ is the true value. The smaller the RMSE, the better the estimate. The coverage rate/probability (CR) in this study is set at 95%. It evaluates the proportion of replications in which the interval estimator contains the population parameter value (Muthén & Muthén, 1998-2017). Muthén and Muthén (2002) recommend that the coverage rate be at least 0.91; that is, at least 91% of replications should have the true parameter value within the 95% confidence interval. The Mplus syntax for the analysis is provided in APPENDIX C.

CHAPTER 4
RESULTS

This chapter consists of two primary sections: one for the simulation results and the other for the empirical study results.

4.1 Simulation Results

The primary evaluation criteria are (absolute) relative bias, root mean square error (RMSE), and the coverage rate of the interval estimators. Simulation results are depicted in Tables 4.1-4.6 and Figures 4.1-4.16. Tables 4.1-4.2 present the Monte Carlo estimates of relative bias, RMSE, and 95% confidence interval coverage rate for the fixed effects, intercept, and variance components in the informative condition; Tables 4.3-4.4 present those for the non-informative condition. Tables 4.5-4.6 display the average standard errors of the estimates and the standard deviations in the informative and non-informative designs, respectively. Figures 4.1-4.2 and Figures 4.13-4.14 plot relative bias for the four covariates, intercept, and variance components in the informative condition, and Figures 4.3-4.
4 and Figures 4.15-4.16 for those in the non-informative condition. Dashed horizontal lines indicate the bounds for acceptable levels of relative bias (|RB%| ≤ 10; Muthén & Muthén, 2002). Figures 4.5-4.6 plot RMSE for the four covariates, intercept, and variance components in the informative design, and Figures 4.7-4.8 for those in the non-informative design. Figures 4.9-4.10 plot the coverage rate for the four covariates, intercept, and variance components in the informative design, and Figures 4.11-4.12 for those in the non-informative design. Dashed horizontal lines indicate the nominal coverage rate of 95%. Results are organized by research question and evaluation criterion. Under each evaluation criterion, the results are presented for the informative and non-informative conditions respectively.

4.1.1 Research Question One

Research question one allows me to evaluate the performance of the weighted and unweighted estimators under the informative and non-informative conditions in terms of (absolute) relative bias, RMSE, and 95% confidence interval coverage rate. Comparing the unweighted and weighted estimators gives a picture of whether differences among them are due to the application of sampling weights and which estimator performs best.

4.1.1.1 (Absolute) Relative Bias

In general, all the fixed effects are estimated with little bias in both the informative and non-informative conditions if the criterion of Muthén and Muthén (2002) is applied. However, a different story emerges for the intercept and variance component estimates. On average, the absolute relative bias is larger in magnitude under the informative condition than under the non-informative condition. The most variability in absolute relative bias occurs for the school-level variance estimators in both conditions.
4.1.1.1.1 Informative Design

From the simulation results presented in Table 4.1 and Figure 4.1, it is evident that the absolute relative bias estimates for the four fixed effects are all less than 10% of the true value across the four estimators and can be considered unbiased under the criterion of Muthén and Muthén (2002). The absolute relative biases for the three student-level covariates (i.e., female, SES, and pretest) are less than or close to 1%. Although the relative bias for the school-level covariate (i.e., rural) is higher than those of the student-level covariates, it is still within 10% of the true value. Table 4.2 and Figure 4.2 show that the intercept and student-level variance are estimated without bias (in terms of Muthén & Muthén, 2002) except for the intercept estimate in the unweighted case.

Table 4.1. RB(%), RMSE, 95% CI CR for Covariates in the Informative Design
Note: RB = relative bias; RMSE = root mean square error; CR = 95% confidence interval coverage rate; UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling.

Table 4.2. RB(%), RMSE, 95% CI CR for Intercept and Variance Components in the Informative Design
Note: RB = relative bias; RMSE = root mean square error; CR = 95% confidence interval coverage rate; UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling.

Figure 4.1. Relative bias (%) for covariates in the informative design
Figure 4.2. Relative bias (%) for intercept and variance components in the informative design

The three weighted estimators perform almost equally well, since all the relative biases of the intercept estimates they produce are less than 2%.
The unweighted estimator performs the worst, producing substantially larger relative bias than the weighted estimators do. As for the student-level variance, the absolute relative biases are all less than or close to 10%. Among the four estimators, the unweighted method produces larger absolute relative bias than the weighted methods do, and the cluster scaling method has the smallest values. Therefore, in terms of (absolute) relative bias, the cluster scaling method works best and the unweighted method works worst for the student-level variance. As for the school-level variance estimates, none of the four estimators performs well, and all have very large relative biases when the ICC is extremely small. To be specific, the relative bias is over 600% with the raw weighted method; even the best estimator, the unweighted one, has a relative bias of over 80%, far beyond the 10% standard used in the present study. In general, the raw weighted estimator performs the worst and the unweighted estimator performs the best for the school-level variance across all the ICC levels. In all, the weighted models perform quite similarly to each other and outperform the unweighted estimator for the intercept and student-level variance, while the unweighted model has smaller relative bias and outperforms the weighted estimators for the school-level variance. The intercept is always overestimated and the student-level variance is underestimated. The school-level variance is in most cases overestimated, except with the unweighted method and the effective scaling method when the ICC equals 0.5. The student-level variables female and SES are underestimated and pretest is overestimated. The school-level variable, rural, is overestimated in the weighted case and underestimated in the unweighted case.

4.1.1.1.2 Non-Informative Design

Table 4.3 and Figure 4.
3 show that the absolute relative biases of the four covariate estimates are all smaller than 10% in the non-informative condition, meaning that these four covariates can be considered unbiasedly estimated in terms of Muthén and Muthén (2002). The two continuous covariates also have smaller absolute relative biases than the two dichotomous covariates. At the same time, the unweighted method produces absolute relative bias for the four fixed effects that is lower than or equal to that of the three weighted estimators, so the unweighted estimator performs the best for all the fixed effects among the four estimators. The intercept is precisely estimated, since all the absolute relative biases are no more than 0.205 (see Table 4.4 and Figure 4.4). The unweighted method outperforms the other estimators when the ICC equals 0.01, 0.1, and 0.2, while it performs the worst when the ICC equals 0.5. Results also show that the student-level variance is estimated unbiasedly, since the absolute relative biases are all less than 5% across all the estimators. Among them, the raw weighted method has the largest relative bias, indicating it works the worst; the effective scaling and unweighted methods outperform the other two. As for the school-level variance, all four estimators produce substantially large relative bias when the ICC is extremely small, and none of them works well when the ICC is 0.01. Comparatively, the raw weighted method works the worst while the unweighted method performs the best across different levels of the ICC for the school-level variance estimates.

Table 4.3.
RB(%), RMSE, 95% CI CR for Covariates in the Non-Informative Design
Note: RB = relative bias; RMSE = root mean square error; CR = 95% confidence interval coverage rate; UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling.

Table 4.4. RB(%), RMSE, 95% CI CR for Intercept and Variance Components in the Non-Informative Design
Note: RB = relative bias; RMSE = root mean square error; CR = 95% confidence interval coverage rate; UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling.

Figure 4.3. Relative bias (%) for covariates in the non-informative design
Figure 4.4. Relative bias (%) for intercept and variance components in the non-informative design

4.1.1.2 RMSE

An overview of the RMSE of the fixed-effect point estimators and the intercept and variance component estimators across informativeness and ICC levels is provided in Tables 4.1-4.4 and Figures 4.5-4.8. There is not much difference in the RMSE for the fixed effects between the informative and non-informative conditions, although on average the RMSE is somewhat larger under the informative condition.

4.1.1.2.1 Informative Design

Compared with the weighted estimators, the unweighted estimator has smaller RMSE values for the four covariates under the informative condition (see Table 4.1 and Figure 4.5). The weighted estimates of the RMSE show almost the same patterns for the four covariates. The unweighted estimator performs the most efficiently among the four. As with the relative biases of the intercept and variance components, similar results are obtained for the RMSE.
For example, the unweighted method has a much larger RMSE for the intercept than the weighted estimators do, and the three weighted estimators perform very similarly to each other (see Table 4.2 and Figure 4.6). The unweighted estimator produces the largest RMSE for the student-level variance and performs the least efficiently among the four; the cluster scaling method performs the most efficiently. As for the school-level variance, the unweighted estimator has the smallest RMSE and performs the most efficiently among the four, while the raw weighted estimator is the least efficient. In all, in terms of RMSE in the informative design, the unweighted estimator performs the worst for the intercept and student-level variance estimates but the best for the school-level variance estimates.

Figure 4.5. RMSE for covariates in the informative design
Figure 4.6. RMSE for intercept and variance components in the informative design

4.1.1.2.2 Non-Informative Design

Table 4.3 and Figure 4.7 show that the unweighted method has smaller RMSE for the four covariates than the weighted methods do, and in most cases there is not much difference across the weighted methods at different levels of the ICC. Therefore, the unweighted method performs the best among the four estimators for all the fixed effects. The unweighted method also has the smallest RMSE for the intercept and the two variance components across all the conditions in the non-informative design (see Table 4.4 and Figure 4.8) and performs the most efficiently among the four estimators across all levels of the ICC, while the raw weighted method produces the largest RMSE for the intercept and the two variance component estimates.

Figure 4.7. RMSE for covariates in the non-informative design
Figure 4.8.
RMSE for intercept and variance components in the non-informative design

4.1.1.3 Coverage Rate

An overview of the coverage of the fixed-effect, intercept, and variance component estimators across informativeness and ICC levels is provided in Tables 4.1-4.4 and Figures 4.9-4.12. All the fixed effects are estimated without much bias (<10%) in both the informative and non-informative conditions if the criterion of Muthén and Muthén (2002) is applied; the corresponding coverage rates are good, with little difference among them. For the intercept and variance components, on average, the coverage rates are much lower under the informative condition than under the non-informative condition. Under the informative condition, the most variability in coverage occurs for the intercept estimators, whereas under the non-informative condition, it occurs for the school-level variance estimators.

4.1.1.3.1 Informative Design

Because the four covariates are estimated precisely or with only slight bias, their coverage rates are all above or close to 0.91, especially for the three level-one predictors (see Table 4.1 and Figure 4.9). Because the unweighted method produces substantially larger biases for the intercept and student-level variance estimates, it has very poor coverage rates for both (see Table 4.2 and Figure 4.10): 0 for the intercept and less than 3% for the student-level variance. The three weighted methods perform almost equally well, with coverage rates around or over 0.91 for the intercept. However, even the best student-level variance estimator, the cluster scaling estimator, has coverage rates of no more than 0.63. For the school-level variance estimates, the raw weighted method performs the worst while the unweighted estimator performs the best.

Figure 4.9.
Coverage rate for covariates in the informative design
Figure 4.10. Coverage rate for intercept and variance components in the informative design

4.1.1.3.2 Non-Informative Design

The coverage rates for the four covariate estimates in the non-informative condition are all above or close to 0.95. Among the four estimators, the unweighted method performs the best. The unweighted method also has the highest coverage rates for the intercept among the four estimators, all above or around 0.94. As for the student-level variance, the effective scaling method has the highest coverage rates, around 0.92, whereas the raw weighted method has the lowest, around 0.65; the unweighted estimator has coverage rates very similar to the effective scaling method. The coverage rates for the school-level variance with the unweighted method are the highest among all the estimators and are all larger than 0.93 except when the ICC is 0.01, while the raw weighted method has the smallest.

Figure 4.11. Coverage rate for covariates in the non-informative design
Figure 4.12. Coverage rate for intercept and variance components in the non-informative design

4.1.2 Research Question Two

Research question two addresses the effect of the ICC on the different estimation methods in the informative and non-informative designs.

4.1.2.1 (Absolute) Relative Bias

4.1.2.1.1 Informative Design

Table 4.1 and Figure 4.13 show that, as the ICC increases, the absolute relative biases for the two continuous covariates (i.e., SES and pretest) decrease. For the covariate female, there is no monotonic pattern in the relative bias: as the ICC increases, it first increases and then starts to decrease. For the covariate rural, the relative bias increases as the ICC increases in the weighted case, while the absolute relative bias decreases in the unweighted case.
Therefore, there is no overall consistent pattern for the fixed effects. It is evident (see Figure 4.14) that the absolute relative bias for the intercept estimate with the unweighted method increases as the ICC increases, but no consistent monotonic pattern can be found for the relative biases of the intercept estimates with the weighted methods, and they do not vary much across the weighted methods at different levels of the ICC (see Table 4.2). The absolute relative biases for the student-level variance estimates decrease as the ICC decreases with all four estimators, but the rate of decrease is very small and hard to see in Figure 4.14. There is an obvious increasing pattern in the relative bias of the school-level variance estimates as the ICC decreases with the four estimators (see Table 4.2 and Figure 4.14).

Figure 4.13. Relative bias (%) for covariates in the informative design
Figure 4.14. Relative bias (%) for intercept and variance components in the informative design

4.1.2.1.2 Non-Informative Design

Clear patterns can be found under the non-informative sampling design. Table 4.3 and Figure 4.15 indicate that, as the ICC increases, the absolute relative bias decreases for the three student-level covariates and increases for rural, the school-level covariate. Simulation results show that, as the ICC increases, the absolute relative bias for the intercept decreases with the three weighted methods, whereas it increases with the unweighted model (see Table 4.4 and Figure 4.16). The relative bias of the student-level variance decreases as the ICC decreases, but the rate of decrease is so small that similar patterns hold across the estimators at different ICC values. The relative bias for the school-level variance increases as the ICC decreases.

Figure 4.15. Relative bias (%) for covariates in the non-informative design
Figure 4.16.
Relative bias for intercept and variance components in the non-informative design

4.1.2.2 RMSE

4.1.2.2.1 Informative Design

Contrary to the relative bias, there are clear RMSE patterns for all the fixed effects (see Table 4.1 and Figure 4.5). As the ICC increases, the RMSE decreases for all the student-level fixed effects and increases for the school-level fixed effect with all the estimators. Table 4.2 and Figure 4.6 show that the RMSE for the intercept increases as the ICC increases. The rate of increase is quite obvious with the unweighted method, but so small with the three weighted methods that not much variation can be seen across ICC values. As for the variance components, there are clear patterns for both: as the ICC increases, the RMSE of the student-level variance decreases, whereas the RMSE of the school-level variance increases.

4.1.2.2.2 Non-Informative Design

As the ICC increases, the RMSE decreases for the three student-level covariates and increases for rural, the school-level covariate, with all four estimators (see Table 4.3 and Figure 4.7). Figure 4.8 shows that the RMSE for the intercept remains almost unchanged across different levels of the ICC with the four estimators, but Table 4.4 shows that the RMSE does increase consistently as the ICC increases. It is clear that, as the ICC increases, the RMSE for the student-level variance decreases whereas the RMSE for the school-level variance increases with all the estimators.

4.1.2.3 Coverage Rate

4.1.2.3.1 Informative Design

Table 4.1 and Figure 4.9 show that, as the ICC increases, there is not much variation in the coverage rates for the fixed effects. The coverage rates for the intercept and student-level variance remain almost the same as the ICC increases (see Table 4.2 and Figure 4.10).
For the school-level variance, although the coverage rate changes as the ICC increases, no consistent pattern can be seen for the estimators except with the raw weighted method. Overall, the coverage rate is not sensitive to changes in the ICC in the informative case.

4.1.2.3.2 Non-Informative Design

No obvious ICC effect can be found on the coverage rate for the parameter estimates except for the school-level variance (see Tables 4.3-4.4 and Figures 4.11-4.12). The coverage rates for the fixed effects, intercept, and student-level variance remain almost unchanged as the ICC increases. Although there are some variations in the coverage rates for the school-level variance, there is no clear pattern across the four estimators. For example, the coverage rates with the unweighted model and the cluster scaling method first increase and then decrease as the ICC increases; the coverage rate keeps increasing with the effective scaling method and decreasing with the raw weighted method as the ICC decreases. In sum, no ICC effect on the coverage rate can be found for the parameters in the non-informative condition.

4.1.3 Simulated Standard Errors and Standard Deviations

If we were to repeat the Monte Carlo simulation and record the sample mean each time, the distribution of the sample mean would be approximately normal (by the central limit theorem). To assess how well the standard errors of the estimates approximate the true sampling variation, the sample standard deviation of the point estimates across replications, that is, the Monte Carlo standard deviation, can be compared to the average of the estimated standard errors. We would expect the sample standard deviation to be close to the average of the standard errors; that is, the standard error is a good estimate of the standard deviation of the sampling distribution if the sample size is sufficiently large.
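The comparison just described can be sketched in a few lines of Python; the function names and toy values are mine.

```python
import math

def monte_carlo_sd(estimates):
    """Sample standard deviation of the point estimates across replications."""
    n = len(estimates)
    mean = sum(estimates) / n
    return math.sqrt(sum((x - mean) ** 2 for x in estimates) / (n - 1))

def sd_vs_average_se(estimates, standard_errors):
    """Compare the Monte Carlo SD with the average estimated SE; a small
    difference suggests the SEs track the true sampling variation."""
    sd = monte_carlo_sd(estimates)
    avg_se = sum(standard_errors) / len(standard_errors)
    return sd, avg_se, sd - avg_se

# toy replications: point estimates and their estimated standard errors
ests = [2.1, 1.9, 2.0, 2.2, 1.8]
ses = [0.15, 0.16, 0.14, 0.15, 0.15]
sd, avg_se, diff = sd_vs_average_se(ests, ses)
print(round(sd, 3), round(avg_se, 3), round(diff, 3))
```

In the study, this comparison is made per parameter over the 1000 replication estimates reported in Tables 4.5-4.6.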
The differences between the standard deviations and the averaged standard errors of the 1000 point estimates are calculated for all seven parameters: the four regression coefficients for female, SES, pretest, and rural; the intercept; and the two random effects (the student-level variance and the school-level variance). Table 4.5 presents the standard deviations of the simulation and the standard errors of the estimates in the informative sampling design. The differences for the four fixed effects and the intercept are in the second or even third decimal place. The differences for the student-level variance and school-level variance are somewhat larger, but still less than or close to 1. Clearly, the unweighted method produces the smallest standard errors and works best compared with the weighted estimators. Table 4.6 contains the standard deviations of the simulation and the standard errors of the estimates in the non-informative sampling design. It tells the same story as the informative setting. The differences for all the parameter estimates are even smaller, and the largest absolute difference is 0.273, indicating that the estimation performs quite well. Again, the unweighted method has the smallest standard errors and performs best compared with the three weighted models.

Table 4.5. Simulation Standard Deviations and Standard Errors of Estimates in the Informative Design
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling; SD = standard deviation; SE = standard error; Diff = difference.

Table 4.6.
Simulation Standard Deviations and Standard Errors of Estimates in the Non-Informative Design
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling; SD = standard deviation; SE = standard error; Diff = difference.

4.2 Results for ECLS-K:2011

First, the informativeness of the weights is examined following Laukaityte and Wiberg (2018). The student-level effective sample sizes are all smaller than the actual sample sizes except in those schools that have only one student. The school-level effective sample size is 614, which is smaller than the actual number of schools. Therefore, the weights at both levels are informative and would affect the results of the multilevel analysis. Three two-level HLM models with different sets of covariates are used to fit two dependent variables: reading achievement scores and mathematics achievement scores. The first model is a null model; the second adds student-level predictors (I label it the student model); and the third is a full model with both student-level and school-level predictors. Table 4.7 presents the results of the unweighted and weighted null models. Even this simple model shows important differences in the estimates of the variance components. Using no weights produces the largest estimates of student-level variance, whereas using raw weights produces the largest estimates of school-level variance. The intercept estimates are in the same direction and of similar size across the four estimators for reading and mathematics; still, the weighted intercept estimates are consistently larger than the unweighted estimate. Overall, the unweighted method consistently has the smallest standard errors and the largest test statistics among the four estimators.
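The informativeness check above rests on comparing effective with actual sample sizes. The usual (Kish) effective-sample-size formula, n_eff = (Σw)² / Σw², is an assumption of this sketch, since the exact formula used by Laukaityte and Wiberg (2018) is not reproduced here.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / sum of w^2.
    Equal weights give n_eff = n; unequal weights give n_eff < n,
    which is the signal that the weights carry information."""
    s = sum(weights)
    return s * s / sum(w * w for w in weights)

print(effective_sample_size([1.0] * 10))        # 10.0: equal weights
print(effective_sample_size([1.0, 3.0, 1.0]))   # 25/11, about 2.27 < 3
```

A cluster with a single student necessarily has n_eff equal to its actual size, matching the exception noted above.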
In addition, the two scaling methods perform similarly, with much closer point estimates, standard errors, and consequently test statistics. The ICCs (see Table 4.7) show that 19.6% and 16.2% of the total variance in mathematics and reading achievement, respectively, are attributable to schools. Based on Equation 2.15, the design effects are 13.61 and 13.65 for mathematics and reading, respectively. Both are greater than 2, indicating that using a multilevel model to analyze these data is reasonable.

Table 4.7. Null Model for ECLS-K:2011 Mathematics and Reading
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling; SE = standard error. *p < .05; **p < .01; ***p < .001.

The results of the model with student-level predictors are depicted in Table 4.8. Contrary to the null model, the intercept estimates in the weighted models are smaller than those in the unweighted model. As in the null model, the unweighted model produces the largest estimate of student-level variance and the raw weighted model produces the largest estimate of school-level variance. Furthermore, the goodness-of-fit indices AIC, BIC, and deviance are substantially larger when the raw weighted estimation method is applied. Compared with the null model, the standard errors for the intercept increase, while the standard errors for the student-level and school-level variances decrease. The within-school variance decreases by 67% for both mathematics and reading; the between-school variance decrease ranges from 68% to 72% for mathematics and from 61% to 64% for reading. Similar results are obtained when both student-level and school-level weights are used in the model.
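Equation 2.15 is not reproduced in this chunk; the common cluster design-effect formula, 1 + (c − 1) × ICC with c the average cluster size, is assumed in this sketch, and the variance components and cluster size are hypothetical rather than the ECLS-K:2011 values.

```python
def icc(tau2, sigma2):
    """Intraclass correlation: school-level share of the total variance."""
    return tau2 / (tau2 + sigma2)

def design_effect(avg_cluster_size, rho):
    """Common cluster design effect, 1 + (c - 1) * ICC; values above 2 are
    usually read as a warning that single-level SEs would be understated."""
    return 1.0 + (avg_cluster_size - 1.0) * rho

rho = icc(19.6, 80.4)            # hypothetical variance components
deff = design_effect(21.0, rho)  # hypothetical 21 students per school
print(round(rho, 3), round(deff, 2))   # 0.196 and about 4.92
```

The same two-step computation, ICC from the null-model variance components and then the design effect, underlies the decision rule applied in the text.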
The standard errors of all the parameters with the unweighted method are consistently smaller than those of the weighted methods, and the test statistics of the unweighted estimator are consistently larger than those of the weighted estimators, as expected. The significance is stable for all the parameters as well.

Table 4.8. Model with Student-Level Predictors for ECLS-K:2011 Mathematics and Reading
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling; SE = standard error. AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion. *p < .05; **p < .01; ***p < .001.

Table 4.9 reports the results of the full model. The covariate suburban is found not to contribute significantly to the model for either the reading or the mathematics data. Another model excluding suburban is also run, and the two models, one with suburban and one without, are compared using a likelihood ratio test. No significant difference is found. Therefore, I simplify the model and include the three student-level predictors and only one school-level predictor, rural, as the full model in this study.

Table 4.9. Full Model for ECLS-K:2011 Mathematics and Reading
Note: UW = unweighted estimation method; RW = estimation method with raw weights; CS = estimation method with cluster scaling; ES = estimation method with effective cluster scaling; SE = standard error. AIC = Akaike Information Criterion; BIC = Bayesian Information Criterion. *p < .05; **p < .01; ***p < .001.

The findings from the comparison of weighted and unweighted analyses are similar to those obtained from the model with only student-level predictors. The estimates, standard errors, and consequently the test statistics do not differ much between the full model and the student model.
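The suburban-versus-no-suburban comparison uses a standard likelihood ratio test for nested models; a generic sketch follows, where the log-likelihood values are hypothetical, not the ECLS-K:2011 results.

```python
import math

def lr_test_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter:
    the statistic 2 * (llf - llr) is compared to a chi-square with 1 df,
    whose upper-tail probability is erfc(sqrt(x / 2))."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# hypothetical log-likelihoods: dropping 'suburban' costs little fit
stat, p = lr_test_df1(-10234.6, -10233.9)
print(round(stat, 2), p > 0.05)   # 1.4 True -> retain the simpler model
```

A non-significant p-value, as in this toy case, supports dropping the extra covariate, which is the logic behind removing suburban from the full model.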
However, one can see that the significance remains unchanged for all the parameters except the school-level covariate rural. For the mathematics data, it changes from being significant at 0.01 with the raw weighted model to being significant at 0.001 with the other three models. For the reading data, the estimate for rural is significant at 0.001 with the unweighted model but at 0.01 with the three weighted models. In general, for both the reading and mathematics data in ECLS-K:2011, the weighted approaches produce larger standard errors and smaller test statistics than the unweighted model does. The larger standard errors and the resulting smaller test statistic values suggest that, given a different model, the chance of committing a Type II error (failing to detect a true effect) will increase substantially when weights are used. Although the rejection of the hypotheses remains the same across the weighted approaches, the raw weighted method produces larger standard errors than the other two weighted methods do. The two scaling methods perform quite similarly for all the parameters in all models.

CHAPTER 5
SUMMARY AND DISCUSSION

This chapter provides a summary, a discussion, and the limitations of the results. It consists of four sections. The first section summarizes the research objectives and results. The second section presents a discussion of the major findings, followed by the implications. Limitations of this study and directions for future research are discussed in the final section.

5.1 Summary of This Study

The primary aim of this study is to examine the performance of the four estimators and analyze the impact of sampling weights in multilevel models in the context of two-stage informative and non-informative sampling designs. Large-scale data in social science usually come from complex sampling designs, such as clustering and unequal probabilities of selection, which bring challenges for statistical analysis.
Using multilevel models to analyze complex large-scale assessment data while accounting for clustering is becoming more and more popular, but when and how to use sampling weights in such models to correct for unequal probability of selection remains an open question. For example, there is controversy over whether to use weights at all; the argument between the model-based and design-based schools has a long history. Even once we have decided to use weights, in a two-level model, for instance, it is debatable whether to use a single-level weight derived from the product of the weights from each level or to use multilevel weights. I use multilevel weights in this study because a single-level weight may not carry adequate information to correct for unequal probability of selection at each level. The analysis with real data shows that incorporating sampling weights in the model does produce parameter estimates, standard errors, test statistics, and sometimes even the significance of a particular variable that differ from those obtained without weights, much as in the simulation when both levels are informative. The weighted models have larger standard errors and smaller test statistics than the unweighted model does, and the cluster scaling and effective scaling methods produce results more similar to each other than to those of the unweighted and raw weighted models. Therefore, caution should be exercised when weights are applied in multilevel analysis. In this study, Monte Carlo simulations are conducted to evaluate the performance of the four estimation methods in informative and non-informative sampling designs with a linear random-intercept model, because prior studies (e.g., Cai, 2013) found that the estimates were biased if the informativeness was ignored. A summary of the comparisons of the estimators is given in Table 5.1. Substantial differences are found among the four estimation methods in estimating the intercept and variance components.
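The distinction between multilevel weights and a single-level product weight can be made concrete with a small sketch; the inclusion probabilities below are hypothetical, chosen only for illustration.

```python
# Hypothetical two-stage inclusion probabilities: pj for a school and
# pi_j for each sampled student conditional on that school being selected.
pj = 0.05              # P(school j selected)
pi_j = [0.5, 0.25]     # P(student i selected | school j selected)

# Multilevel weights keep the two levels separate:
wj = 1 / pj                      # level-2 (school) weight
wi_j = [1 / p for p in pi_j]     # level-1 (conditional student) weights

# A single-level weight collapses both into one product per student,
# discarding the information about which level drove the selection.
wij = [wj * w for w in wi_j]

print(wj, wi_j, wij)  # 20.0, [2.0, 4.0], [40.0, 80.0]
```

Two students with the same product weight wij can come from very different combinations of school-level and student-level selection, which is precisely the information MPML estimation needs at each level.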
In the informative design, in terms of bias, the weighted estimators outperform the unweighted estimator for the intercept and student-level variance estimation, whereas the unweighted estimator works best for school-level variance estimation. Although the three weighted estimators produce almost unbiased estimates for the intercept and student-level variance, they perform quite differently from one another. The three weighted estimators perform almost equally well for intercept estimation, while the cluster scaling estimator performs best for student-level variance estimation. The raw weighted method works worst and should be used with caution when estimating the school-level variance. The weighted methods give better coverage rates for the intercept and student-level variance, but the unweighted method does for the school-level variance in the informative design. In the non-informative setting, the unweighted method gives the better coverage rate for all the parameter estimates, and it performs best or second best in terms of relative bias. Furthermore, including sampling weights decreases the RMSE for the intercept and student-level variance and increases the RMSE for the school-level variance in the informative design; it increases the RMSE for the intercept, student-level variance, and school-level variance in the non-informative design. Therefore, the unweighted method works most efficiently for all the parameter estimates across different levels of the ICC in the non-informative design. Tentatively, the cluster scaling estimator and effective scaling estimator might be preferred in the informative condition.

Table 5.1. Summary of Comparisons of the Estimation Methods
Note: RB = relative bias; RMSE = root mean square error; 95% CR = 95% confidence interval coverage rate.
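The cluster scaling and effective scaling transformations compared above can be sketched as follows. The formulas are the commonly used cluster-size and effective-cluster-size scalings of within-cluster weights (cf. Pfeffermann et al., 1998); the weight vector is a hypothetical example, not taken from the simulation.

```python
def cluster_scale(w):
    """Scale within-cluster weights so they sum to the cluster
    sample size n_j (cluster scaling)."""
    factor = len(w) / sum(w)
    return [wi * factor for wi in w]

def effective_scale(w):
    """Scale within-cluster weights so they sum to the effective
    cluster size (sum w)^2 / (sum w^2) (effective scaling)."""
    factor = sum(w) / sum(wi ** 2 for wi in w)
    return [wi * factor for wi in w]

# Hypothetical level-1 weights for one sampled school
w = [1.2, 1.2, 2.5, 4.0]
print(sum(cluster_scale(w)))    # = 4, the raw cluster size
print(sum(effective_scale(w)))  # = (sum w)^2 / sum(w^2), the effective size
```

When the within-cluster weights are nearly equal, the two scalings (and hence the two estimators) give nearly identical results, consistent with their similar performance observed here.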
The ICC is one of the factors that influence the quality of estimation (e.g., Asparouhov & Muthén, 2006; Kovačević & Rai, 2003); therefore, it is manipulated in this study. Simulation results are summarized in Table 5.2 and show that the effect of the ICC appears in the relative bias and RMSE, but the coverage rate is not sensitive to it. As the ICC increases, the bias for the student-level variance increases and the bias for the school-level variance decreases in both conditions. These changes are quite obvious for the school-level variance but hard to see for the student-level variance. No monotonic pattern in the relative bias can be found for the fixed effects and intercept as the ICC increases in the informative condition, but clear patterns can be seen for the fixed effects and intercept in the non-informative condition. The RMSE shows similar patterns in both conditions for all the parameters: as the ICC increases, the RMSE decreases for the three student-level fixed effects and the student-level variance, and increases for the school-level fixed effect and variance with all four estimators. Take the scenario with ICC = 0.3 as an example. In the informative condition, the simulation results show that the cluster scaling estimator works best for the intercept and student-level variance in terms of relative bias, RMSE, and coverage rate. Although it is not the best weighted estimator for the school-level variance, it gives the best coverage rate and only a slightly higher RMSE than the best weighted estimator, the effective scaling estimator, and it produces unbiased estimates for the school-level variance. Therefore, in the informative setting, the cluster scaling estimator is preferred in most cases.

Table 5.2. ICC Effect
Note: RB = relative bias; RMSE = root mean square error; 95% CR = 95% confidence interval coverage rate.
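The ICC conditions manipulated in the simulation follow directly from the variance decomposition of the random-intercept model: a small sketch, assuming the total variance is fixed at 60 as in the simulation design described earlier.

```python
def icc(tau00, sigma2):
    """Intraclass correlation: share of total variance at the school level."""
    return tau00 / (tau00 + sigma2)

# The simulation fixes the total variance at 60 and varies the level-2
# share, giving the five ICC conditions used in this study.
for tau00 in (30, 18, 12, 6, 0.6):
    print(tau00, round(icc(tau00, 60 - tau00), 2))  # 0.5, 0.3, 0.2, 0.1, 0.01
```

The last condition (tau00 = 0.6, ICC = 0.01) is the "extremely small ICC" case in which all estimators struggle with the school-level variance.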
In the non-informative condition, when the ICC is 0.3, the unweighted estimator has the smallest (absolute) relative bias, the smallest RMSE, and the highest coverage rate in almost all cases. Therefore, the unweighted estimator is preferred in the non-informative condition.

5.2 Discussion of Results

The design of the current simulation captures the general features of large-scale data sets available in social studies, for example, a large number of clusters of different sizes, unequal probability of selection, and moderate informativeness values. Some of the findings from previous studies are confirmed in this study, and some are not. For example, prior studies showed that the unweighted method produces biased estimates for the intercept and school-level variance when the sampling design is informative at both levels (Cai, 2013; Pfeffermann et al., 1998). Pfeffermann et al. (1998) pointed out that when the design is informative at the cluster level, the unweighted method only produces biased estimates for the intercept and school-level variance, not for the student-level variance. However, the current study shows that the unweighted method works quite well most of the time for school-level variance estimation; it only fails in the informative design when the ICC is extremely small, and in that case none of the estimators works well. This is expected because, based on Equation 3.15, the denominator is then very small (0.6), which yields a very large relative bias compared with the conditions in which the ICC is comparatively larger. As for the student-level variance, although the unweighted estimator works worst in the informative condition, it still produces unbiased estimates. In addition, Cai (2013) pointed out that including the sampling weights substantially increases the MSE. This is confirmed only in the non-informative setting, not in the informative setting, in the current study.
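The denominator effect noted above is simple arithmetic. As a sketch, the absolute error of 0.3 below is a hypothetical illustration, not a simulated result:

```python
def relative_bias(estimate, true_value):
    """Relative bias of a point estimate with respect to the true value."""
    return (estimate - true_value) / true_value

# The same absolute error of 0.3 in the school-level variance estimate
# produces wildly different relative biases depending on the true value:
print(relative_bias(0.9, 0.6))    # ICC = 0.01 condition: 50% relative bias
print(relative_bias(30.3, 30.0))  # ICC = 0.5 condition: 1% relative bias
```

This is why the ICC = 0.01 condition looks catastrophic on the relative-bias metric even when the absolute estimation error is modest.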
All the fixed effects are estimated nearly without bias according to the criteria of Muthén and Muthén (2002). This is confirmed in both studies. In general, including sampling weights still produces biased estimates; this is confirmed by all the studies. Asparouhov and Muthén (2007) reported that the MPML estimator substantially outperforms the other estimators. This is partially confirmed in the present study, since the cluster scaling estimator performs better than the others in the informative condition, while the raw weighted estimator needs to be used with caution, especially when estimating variance components in the informative condition. Previous studies (e.g., Asparouhov & Muthén, 2006; Kovačević & Rai, 2003) found that the bias increases for all the parameters as the ICC decreases. This is only partially confirmed in the current study: the current results do not show monotonic patterns of relative bias for the fixed effects and intercept, but the bias increases for the student-level variance and decreases for the school-level variance as the ICC increases in the informative condition. In the non-informative condition, an increase in the ICC decreases the bias for the student-level fixed effects and variance, and increases the bias for the school-level fixed effect and variance. Therefore, the tentative conclusion is that the weighted estimators with cluster scaling and effective scaling weights are preferred when the ICC is not extremely small in the informative design, and the unweighted method could be used in the non-informative design. The differences above might be due to the different simulation settings: the estimators are examined either with a random-intercept model with no covariates at either level (cf. Asparouhov & Muthén, 2006; Kovačević & Rai, 2003) or with a linear random-intercept model with no school-level predictors (cf. Cai, 2013).
Therefore, it is possible that our results might not be replicated in different settings.

5.3 Implications

The major finding from this study confirms that including sampling weights in the analysis produces different estimates in the informative sampling design and that the unweighted method works best in the non-informative sampling design. The comparison between the weighted and unweighted methods, and between the informative and non-informative designs, suggests using sampling weights in the informative design and the unweighted estimation method in the non-informative design. Calculating the informativeness is therefore necessary, since it tells us the extent to which the design is informative and indicates whether sampling weights need to be included. Second, researchers should examine the ICC and evaluate the magnitude and significance of the variance components to determine whether multilevel modeling is necessary. Last but not least, caution should be taken in using sampling weights when the ICC is extremely small.

5.4 Limitations and Future Studies

There are several limitations in this study. The primary limitation is that only a simple linear random-intercept model is applied. It would be more realistic to allow random slopes and other types of outcome variables, such as Poisson or nominal outcomes; this might provide a clearer picture of which estimator works best. Second, besides scaling the sampling weights, trimming the weights can be an alternative, which is not considered in this study. Third, I only roughly divide the situations into two, informative or non-informative. It might be better to include different levels of informativeness, for example low, medium, and high, in the analysis; this might tell us under which degree of informativeness the parameter estimates can be estimated without bias.
Fourth, multistage sample selection is more complicated in real life, so the simulation design may not fully reflect reality. Not all the findings of the prior studies are confirmed in this study; therefore, more studies are needed to evaluate MPML performance in different settings. For example, different types of outcome variables, such as discrete responses or count data, can be used; there is more and more research focusing on them (Chaudhuri, Handcock, & Rendall, 2008; Natarajan, Lipsitz, Fitzmaurice, Moore, & Gonin, 2008; Nordberg, 1989; Rodriguez & Goldman, 1995, 2001). Higher-level HLM models (e.g., three-level models) can also be used. Furthermore, as is true of any simulation, conclusions from this study are restricted to a particular sampling design and modeling context; future research is necessary to see whether comparable findings hold in alternative situations. In this study, the simulation is conducted on the basis of a large number of clusters, but small samples are possible in practice, and the performance of the estimators might suffer from a small number of clusters (Asparouhov & Muthén, 2005; Li & Redden, 2015; Maas & Hox, 2005). Research examining the performance of different estimation methods in less-than-ideal conditions is necessary. Above all, future research is needed to enhance weighted multilevel models. Asparouhov and Muthén (2010) stated that, given informative priors, Bayesian estimation could be an alternative to maximum likelihood estimation when sample sizes are small, but few comparisons have been made in the context of informative sampling designs.

APPENDICES

APPENDIX A.
Stata Simulation Syntax in the Informative Sampling Design

/****************************************************************************/
set more off
local info 30 18 12 6 0.6                  /*level 2 variance*/
forvalues i = 1/1000 {                     /*to repeat the process 1000 times*/
display "iteration `i'"
foreach j in `info' {
clear
display "l2var `j'"
*generate school level data
quietly: set seed 1`i'1
quietly: set obs 75000
quietly: gen uj = rnormal(0, sqrt(`j'))    /*need sd here, so need to square root j*/
*uj rescaled
quietly: egen ujmean = mean(uj)
quietly: egen ujsd = sd(uj)
quietly: gen uj_scaled = ((uj - ujmean)/ujsd)*sqrt(2)
quietly: gen pj = 1/(1+exp(4.12 - uj_scaled/2))
quietly: gen wj = 1/pj
quietly: gsample 150 [aw=pj]               /*draws an unequal probability sample with sampling probabilities pj*/
quietly: gen index = 1
quietly: gen school = _n
*school covariates
quietly: gen rand = runiform()
quietly: gen locale = cond(rand < 0.22, 1, cond(rand < 0.58, 2, 3))
quietly: gen rural = locale==1
quietly: gen suburb = locale==2
quietly: gen urban = locale==3
*expand students based on percentages of different types of schools
quietly: expand 16+int((24-10+1)*runiform()) if school<=8                   /*5.69% of 150 schools: 8*/
quietly: expand 25+int((49-25+1)*runiform()) if school>=9 & school<=25      /*11.49% of 150 schools: 17*/
quietly: expand 50+int((99-50+1)*runiform()) if school>=26 & school<=91     /*43.53% of 150 schools: 66*/
quietly: expand 100+int((149-100+1)*runiform()) if school>=92 & school<=129 /*25.48% of 150 schools: 38*/
quietly: expand 150+int((199-150+1)*runiform()) if school>=130 & school<=142 /*8.59% of 150 schools: 13*/
quietly: expand 200+int((600-200+1)*runiform()) if school>=143 & school<=150 /*5.22% of 150 schools: 8*/
quietly: bysort school: generate student = _n
*generate student data
quietly: gen eij = rnormal(0, sqrt(60-`j'))
*eij rescaled
quietly: egen eijmean = mean(eij)
quietly: egen eijsd = sd(eij)
quietly: gen eij_scaled = ((eij - eijmean)/eijsd)*sqrt(2)
quietly: gen pi_j = 1/(1+exp(1.23 - eij_scaled/2))
quietly: gen wi_j = 1/pi_j
quietly: gen pij = pi_j*pj
quietly: gen wij = 1/pij
*generate correlated data for female, SES and pretest
quietly: local p = 0.49
quietly: matrix m = (0, -0.05, 46.92)
quietly: matrix sd = (0.5, 0.81, 11.50)
quietly: matrix input c = (1, 0.005, 1, 0.07, 0.409, 1)
quietly: corr2data female SES pretest, corr(c) means(m) sds(sd) cstorage(lower)
/* Steps 2-3 for the one Bernoulli variable */
quietly: replace female = cond(normal(female)>=(1-`p'),1,0)
/*merge two level data*/
quietly: gen yij = 17.43+0.91*female+1.06*SES+0.92*pretest+1.04*rural+uj+eij
quietly: rename yij achieve
quietly: rename wj schwgt
quietly: rename wi_j stdwgt
*select final sample
quietly: keep if index == 1
quietly: gsample 3915 [aw=pi_j]
if `j' == 30 local r = 1
if `j' == 18 local r = 2
if `j' == 12 local r = 3
if `j' == 6 local r = 4
if `j' == 0.6 local r = 5
quietly: keep student schwgt school locale rural suburb urban stdwgt female SES pretest achieve
gen iteration = `i'
}
}
/****************************************************************************/

APPENDIX B.
Stata Simulation Syntax in the Non-Informative Sampling Design

/****************************************************************************/
set more off
local info 30 18 12 6 0.6                  /*level 2 variance*/
forvalues i = 1/1000 {                     /*to repeat the process 1000 times*/
display "iteration `i'"
foreach j in `info' {
clear
display "l2var `j'"
*generate school level data
quietly: set seed 1`i'1
quietly: set obs 75000
quietly: gen uj = rnormal(0, sqrt(`j'))
*betaj rescaled
quietly: gen betaj = rnormal(0, sqrt(2))
quietly: egen betajmean = mean(betaj)
quietly: egen betajsd = sd(betaj)
quietly: gen betaj_scaled = ((betaj - betajmean)/betajsd)*sqrt(2)
quietly: gen pj = 1/(1+exp(4.12 - betaj_scaled/2))
quietly: gen wj = 1/pj
quietly: gsample 150 [aw=pj]               /*draws an unequal probability sample with sampling probabilities pj*/
quietly: gen index = 1
quietly: gen school = _n
*school covariates
quietly: gen rand = runiform()
quietly: gen locale = cond(rand < 0.22, 1, cond(rand < 0.58, 2, 3))
quietly: gen rural = locale==1
quietly: gen suburb = locale==2
quietly: gen urban = locale==3
*expand students based on percentages of different types of schools
quietly: expand 16+int((24-10+1)*runiform()) if school<=8                   /*5.69% of 150 schools: 8*/
quietly: expand 25+int((49-25+1)*runiform()) if school>=9 & school<=25      /*11.49% of 150 schools: 17*/
quietly: expand 50+int((99-50+1)*runiform()) if school>=26 & school<=91     /*43.53% of 150 schools: 66*/
quietly: expand 100+int((149-100+1)*runiform()) if school>=92 & school<=129 /*25.48% of 150 schools: 38*/
quietly: expand 150+int((199-150+1)*runiform()) if school>=130 & school<=142 /*8.59% of 150 schools: 13*/
quietly: expand 200+int((600-200+1)*runiform()) if school>=143 & school<=150 /*5.22% of 150 schools: 8*/
quietly: bysort school: generate student = _n
*generate student data
quietly: gen eij = rnormal(0, sqrt(60-`j'))
*rij rescaled
quietly: gen rij = rnormal(0, sqrt(2))
quietly: egen rijmean = mean(rij)
quietly: egen rijsd = sd(rij)
quietly: gen rij_scaled = ((rij - rijmean)/rijsd)*sqrt(2)
quietly: gen pi_j = 1/(1+exp(1.23 - rij_scaled/2))
quietly: gen wi_j = 1/pi_j
quietly: gen pij = pi_j*pj
quietly: gen wij = 1/pij
*generate correlated data for female, SES and pretest
quietly: local p = 0.49
quietly: matrix m = (0, -0.05, 46.92)
quietly: matrix sd = (0.5, 0.81, 11.50)
quietly: matrix input c = (1, 0.006, 1, 0.07, 0.409, 1)
quietly: corr2data female SES pretest, corr(c) means(m) sds(sd) cstorage(lower)
/* Steps 2-3 for the one Bernoulli variable */
quietly: replace female = cond(normal(female)>=(1-`p'),1,0)
/*merge two level data*/
quietly: gen yij = 17.43+0.91*female+1.06*SES+0.92*pretest+1.04*rural+uj+eij
quietly: rename yij achieve
quietly: rename wj schwgt
quietly: rename wi_j stdwgt
*select final sample
quietly: keep if index == 1
quietly: gsample 3915 [aw=pi_j]
if `j' == 30 local r = 1
if `j' == 18 local r = 2
if `j' == 12 local r = 3
if `j' == 6 local r = 4
if `j' == 0.6 local r = 5
quietly: keep student schwgt school locale rural suburb urban stdwgt female SES pretest achieve
gen iteration = `i'
}
}
/****************************************************************************/

APPENDIX C.
Mplus Syntax

/***************************** Mplus VERSION 8 *****************************/
/*********************** Unweighted estimation method **********************/
Title: READING with NO weights;
Data: File is iteration_list.csv;
      Type = MONTECARLO;
Variable: Names are schwgt school locale rural suburb urban student
      stdwgt female SES pretest achieve iteration;
      USEVARIABLES are achieve school female SES pretest rural;
      CLUSTER = school;
      WITHIN = female SES pretest;
      BETWEEN = rural;
MODEL:
      %WITHIN%
      achieve on female*.91 SES*1.06 pretest*.92;
      achieve*30;        !variance at level 1
      %BETWEEN%
      achieve on rural*1.04;
      [achieve*17.43];   ![gamma00]
      achieve*30;        !variance at level 2
ANALYSIS: TYPE = TWOLEVEL;

/******************* Estimation method with raw weights ********************/
Title: READING with raw weights (unscaled);
Data: File is iteration_list.csv;
      Type = MONTECARLO;
Variable: Names are schwgt school locale rural suburb urban student
      stdwgt female SES pretest achieve iteration;
      USEVARIABLES are achieve school female SES pretest rural;
      CLUSTER = school;
      WITHIN = female SES pretest;
      BETWEEN = rural;
      Weight is stdwgt;
      Bweight = schwgt;
      Wtscale = UNSCALED;
      Bwtscale = UNSCALED;
MODEL:
      %WITHIN%
      achieve on female*.91 SES*1.06 pretest*.92;
      achieve*30;        !variance at level 1
      %BETWEEN%
      achieve on rural*1.04;
      [achieve*17.43];   ![gamma00]
      achieve*30;        !variance at level 2
ANALYSIS: TYPE = TWOLEVEL;
      algorithm = integration;
      estimator = MLR;

/***************** Estimation method with cluster scaling ******************/
Title: READING with scaling1;
Data: File is iteration_list.csv;
      Type = MONTECARLO;
Variable: Names are schwgt school locale rural suburb urban student
      stdwgt female SES pretest achieve iteration;
      USEVARIABLES are achieve school female SES pretest rural;
      CLUSTER = school;
      WITHIN = female SES pretest;
      BETWEEN = rural;
      Weight is stdwgt;
      Bweight = schwgt;
      Wtscale = cluster;
      Bwtscale = sample;
MODEL:
      %WITHIN%
      achieve on female*.91 SES*1.06 pretest*.92;
      achieve*30;        !variance at level 1
      %BETWEEN%
      achieve on rural*1.04;
      [achieve*17.43];   ![gamma00]
      achieve*30;        !variance at level 2
ANALYSIS: TYPE = TWOLEVEL;
      algorithm = integration;
      estimator = MLR;

/********** Estimation method with effective scaling (ecluster) ************/
Title: READING with scaling2;
Data: File is iteration_list.csv;
      Type = MONTECARLO;
Variable: Names are schwgt school locale rural suburb urban student
      stdwgt female SES pretest achieve iteration;
      USEVARIABLES are achieve school female SES pretest rural;
      CLUSTER = school;
      WITHIN = female SES pretest;
      BETWEEN = rural;
      Weight is stdwgt;
      Bweight = schwgt;
      Wtscale = ecluster;
      Bwtscale = sample;
MODEL:
      %WITHIN%
      achieve on female*.91 SES*1.06 pretest*.92;
      achieve*30;        !variance at level 1
      %BETWEEN%
      achieve on rural*1.04;
      [achieve*17.43];   ![gamma00]
      achieve*30;        !variance at level 2
ANALYSIS: TYPE = TWOLEVEL;
      algorithm = integration;
      estimator = MLR;

REFERENCES

Arceneaux, K., & Nickerson, D. W. (2009). Modeling certainty with clustered data: A comparison of methods. Political Analysis, 17, 177-190. doi:10.1093/pan/mpp004

Asparouhov, T. (2004). Weighting for unequal probability of selection in multilevel modeling (Mplus Web Notes: No. 8). Available from http://www.statmodel.com/

Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12(3), 411-434.

Asparouhov, T. (2006). General multi-level modeling with sampling weights. Communications in Statistics - Theory and Methods, 35(3), 439-460.

Asparouhov, T., & Muthén, B. (2005). Multivariate statistical modeling with survey data (Mplus Web Notes). Los Angeles, CA: Muthén & Muthén.

Asparouhov, T., & Muthén, B. (2007). Testing for informative weights and weights trimming in multivariate modeling with survey data.
Retrieved August 2 1, 2012 from http://www.statmodel.com/download/JSM2007000745.pdf Asparouhov, T. , & Muth n , B. (2 010). Bayesian analysis o f latent variable models using Mplus (M plus Technical Re port Versi on 4). Los Angeles, CA: Muthén & Muthén . Retrieved from http://www.statmodel.com/download /Ba yes - Advantages18.pdf As parouhov, T. , & Muthén , B. (2006). Multilevel modeling o f complex survey data. Paper pres ented at the Proceedings of the Joint S tatistical Meeting in Seattle. Bainbridge, T. R. (1985). The Committee on standards: precision and bia s. ASTM Standardization News 13, 44 - 46. Bertolet, M. (200 8). To weight or not to weight? Incorporating sampling designs into model - based analyses. (Ph . D.), Carnegie Mellon University, Ann Arbor. Binder, D. A. (1983). On the variances of asymptotica lly normal estimators from c omplex surveys. International Stati stical Review, 51 ( 3), 279 - 292. Bloom, H. S., Bos, J. M., & Lee , S. (1999). Using cluster ran dom assignment to measure program impacts: statistical implications for the evaluation of education pro grams. Evaluation Review, 23 (4), 445 - 469. 82 Bloom, H. S., Ric hburg - Hayes, L., & Bl ack, A. R. ( 2007 ) . Using covariat es to improve precision for studies tha t randomize schools to evaluate educational interventions. Educational Evaluation and Policy Analysi s, 29 (1 ) , 30 - 59 . doi: 10.310 2/0162373707299550Schochet, 2008 Boslaugh, S. (2007). S econdary data sou rces for public health: A practical guide. New York, NY : Cambridge University Press. Cai, T. (2013). Investigation of ways to handle sampling weights for mul tilevel model analyses. S ociological Methodology, 43 (1), 178 - 219. Carle, A. C. ( 2009). Fitting multilevel models in complex survey data with design weig hts: Recommendations. BMC Medic al Research Methodology . doi:10.1186/1471 - 2288 - 9 - 49 Chantala, K. , & Suc hindran, C. M. (2006). 
Ad justing for unequal selection probability in multilevel models: a comparison of software packages. Proceedings of the American S tatistical Association, Seattle, WA: American Statistical Association, 2815 - 2824. Chantala, K., Bla nch ette, D., & Suchindran, C . M. (2011). Software to compute sampling weights for mu ltilevel analysis . Available from ht tp://www.cpc.unc.edu/rese arch/tools/data_analysis/ml_sampling_weights/Compute%20 W eights%20for%20Multilevel%20Analy sis.pdf . Chaudhuri, S., Handcock, M. S., & Rendall, M. S. (2008). Generalized linear models incorporating population level information: a n e mpirical - likelihood - based approach. Journal of the Royal Statistical Society: Ser ies B (Statistical Methodology), 70 ( 2), 311 - 328. Chaudhuri, S., Handcoc k, M. S., & Rendall, M. S. (2010). A conditional empirical likelihood approach to combine sampling d esi gn and population level i nformation. Technical report No. 3/2010, National Univer sity of Singapore, Singapore, 117 546. Chen , J. , & Sitter, R. R. (1999). A pseudo empirical likelihood approach to the effective use of auxiliary information in complex sur vey s. Statistical Sinica, 9 ( 2), 385 - 406. Christ, S., Biemer, P., & Wiesen, C. (2007 ). Guidelines for applying multil evel model ing t o the NSCAW data . Ithaca, NY: National Data Archive on Child Abuse and Neglect. Clarke, P. (2008). When can group level clu ste ring be ignored? Multilev el models versus single - level models with sparse data. J ournal of Epidemiology and Commun ity Health , 62 , 752 - 758. doi: 1 0. 1136/jech.2007.060798 Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regress ion/correlation analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum. 83 Danielsen, A. G., Wiium, N., Wilhelmsen, B . U., & Wold, B. (201 0). Perceived support provided - reporte d academic initiative. J ournal of School Psychology , 48 (3), 247 - 67. doi:10.1016/j.jsp.2010.02.002 Eideh, A. , & Nathan, G. 
(2009). Two - stage informative clu ster sampling wi th application in small area estimation. Journal of Statistical Planning and Inferen ce, 139 , 3088 - 3101. Enders, C. K. (2010). Applied missing data analysis. New York, NY: Guilford Press. Fra n cisco, C. A. , & Fuller, W. A. (1991). Quantile estimation with a complex survey design. The Annals of Statistics, 19 (1), 454 - 469. Fuller, W. (198 4). Least squares and related analyses for complex survey design. The Annals of Statistics, 10 (1 ), 99 - 118 . F uller, W. (2009). Sampling Statistics. Hoboken: Wiley. Goldste in, H. (1986). Multilevel mixed linear model analysis using iterative generalized l east squares. Biometr ika, 73 , 43 - 56. Graubard, B. I. , & Korn, E. L. (1996). Modeling the sampling desig n i n the analysis of health surveys. Statistical me thods in medical research, 5 (3), 43 - 56. Grilli, L. , & Pratesi, M. (2004). Weighted estimation in mu ltilevel ordinal and binary models in the presence of informative sampling designs. Survey Methodology, 3 0 ( 1 ) , 93 - 103. Hahs - Vaughn, D. L. (2005). A primer for using and un derstanding weights with national datasets. The Journal of Experimental Education, 7 3 (3) , 221 - 248. d oi: 1 0.3200/JEXE.73.3.221 - 248 Heck , R. H. , & Mahoe, R. (2004). An example of the impact of s ample weights and centering on multilevel SEM m odels. Paper pre sented at the annual meeting of the American Educational Research Association, San D iego, CA. Hedges, L. V. , & Hedberg, E. C. (2007). Intraclass correlation values for planning grou p - randomiz e d trials in education. Educational Evaluation a nd Policy Analys is, 29 (1) , 60 - 87. doi: 10.3102/0162373707299706 Hedges, L. V. , & Hedberg, E. C. (20 13). Intraclass corre lations and covariate outcome correlations for planning two - and three - level cluster - ra n domized experiments in education. Evaluation Re view, 37 (6 ), 445 - 489. Howell, D. C. (2008). The analysis of missing data. 
In Handbook of social sc ience methodology, ed . W. Outhwaite and S. Turner, (208 - 224). London, GB: Sage. Hox, J. J. , & Kre ft, I. G. ( 1994). Multilevel analysis methods. Sociologica l Methods & Rese arch, 22 (3) , 283 - 299. 84 Huber, P. J. (1967). The behavior of maximum likelihood estima tes under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical S tatistics a nd Probability (Vol. 1, pp. 221 - 233). Berkeley, CA: University of California Press. https://projecteucl id.org/euclid.bsmsp/1200512988 Jia, Y., Stokes, L., Harris, I., & Wang, Y. (2011). Pe r formance of random effects model estimators und er complex sampl ing designs. Journal of Educational and Behavioral Statistics, 36 ( 1), 6 - 32. Judd, C . M., McClelland, G. H., & Ryan, C. S. (2009). Data an alysis: A model comparison approach. New York, NY: Rou t ledge. Kim, J. K. , & Skinner, C. J. (2013). We ighting in surve y analysis under informative sampling. Biometrika, 100 (2 ), 385 - 398. https://www.js to r.org/stable/43304565 Kish, L. (1965). Survey samplin g . New York: Wiley. Kish, L. (1992). Weighting for unequal Pi. Journal of Official Statistics, 8 ( 2), 183 - 200. Korn, E. L. , & Graubard, B. I. (1995 ). Examples of differ ing weighted and unweighted es tim ates from a sample survey. The American Statistician, 4 9 ( 3), 291 - 295. Korn, E. L. , & Graubard, B. I. (2003). Estimati ng variance components by using survey data. Journal of the Royal Statistical Societ y : Series B (Statisti cal Methodology), 65 (1 ), 175 - 1 90. Kova evi , M. S. , & Rai, S. N. (2003). A pseud o maximum likelihood approach to multi - level mod eling of survey data. Communications in Statistics - Theory and Methods, 32 ( 1), 103 - 121. Koziol, N. A ., Bovaird, J. A., & Suarez, S. (2017). A compariso n of population - averaged and cluster - specific approaches i n the context of unequal probabilities of selec tion. Multivaria te Behavioral Research, 52 (3 ) , 325 - 349 . 
doi: 10.1080/00273171.2 - 17.12921 15 Kreft, I . G. G. , & Yoon, B. ( 1994). Are multilevel techniqu es necessary ? An attempt at demystification . Retrieved fr o m http://eric.ed .gov/?id=ED371033 Laird, N. M. , & Ware, J. H. (1982). Random - effects m odels for lo ngitudinal data. Biom etrics, 38, 963 - 974. Laukaity te, I. , & Wiberg, M. (2018). Importance of sampling weigh t s in multilevel modeling of international large - scale assessmen t data. Communications in Statistics - Theory and Methods, 47 ( 20), 4991 - 50 12. https://doi.org/10.1080/03610926.2017.1383429 Lee, J. , & Fish, R. M. (2010). International and inters tate gaps in val ue - added math - achievement: multilevel instrumental variable analysis of age effect a nd grade effect. Amer ican Journal of Education, 117 ( 1), 109 - 137. 85 Li, P. , & Redden, D. T. (2015). Small sampl e performance of bias - corrected sandwich estimat ors for cluster - randomized trials with binary outcomes. Statistics in Medicine, 34 , 281 - 296. http://d x.doi.org/10.1002/sim.6344 Lin, Y. X., Steel, D., & Cha m bers, R. L. (2004). Restricted quasi - score esti mating functions for sample survey data. Journal of Applied Probability, 41 , 119 - 130. Longford, N. T. (1995). Model - base d methods for analysis of data from 1990 NAEP trial state a ssessment. Washington, DC. L ongford, N. T. (1995). Random coefficient model s . Handbook of S tatistical Modeling for the Social and Behavioral Sciences, 519 - 570. Lubienski, S. T. , & Lubienski, C. ( 2006). School sector and acade mic achievement: a multilevel analysis of NAEP mathematic s data. American Educational Research Journal, 4 3 ( 4), 651 - 698. Mass, C. J. M. , & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling . Methodology, 1 (3) , 86 - 92. http://dx.doi.org/10.1 0 27/1614 - 1881.1.3.86 Mels, G. (2006). LISREL for windows: get ting started guide. Lincolnwood, IL: Scientific Software International. Mulligan, G . M., Hastedt, S., & McCarroll, J. C. 
(2012). First-Time Kindergartners in 2010-11: First Findings From the Kindergarten Rounds of the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011) (NCES 2012-049). U.S. Department of Education. Washington, DC: National Center for Education Statistics.
Murray, D. M., & Short, B. (1995). Intraclass correlation among measures related to alcohol use by young adults: Estimates, correlates, and applications in intervention studies. Journal of Studies on Alcohol, 56(6), 681-694.
Musca, S. C., Kamiejski, R., Nugier, A., Méot, A., Er-Rafiy, A., & Brauer, M. (2011). Data with hierarchical structure: Impact of intraclass correlation and sample size on Type-I error. Frontiers in Psychology, 2(74). doi: 10.3389/fpsyg.2011.00074
Muthén, L. K., & Muthén, B. O. (1998-2017). Mplus user's guide (8th ed.). Los Angeles, CA: Muthén & Muthén.
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on sample size and determine power. Structural Equation Modeling, 9(4), 599-620.
Natarajan, S., Lipsitz, S. R., Fitzmaurice, G., Moore, C. G., & Gonin, R. (2008). Variance estimation in complex survey sampling for generalized linear models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 57(1), 75-87.
Nordberg, L. (1989). Generalized linear modeling of sample survey data. Journal of Official Statistics, 5(3), 223.
Palardy, G. J. (2010). The multilevel crossed random effects growth model for estimating teacher and school effects: Issues and extensions. Educational and Psychological Measurement, 70(3), 401-419.
Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review, 61(2), 317-337. doi: 10.2307/1403631
Pfeffermann, D., & LaVange, L. (1989). Regression models for stratified multi-stage cluster samples. In C. J. Skinner, D. Holt, & T. M. F.
Smith (Eds.), Analysis of complex surveys (pp. 237-260). New York, NY: John Wiley & Sons.
Pfeffermann, D., Krieger, A. M., & Rinott, Y. (1998). Parametric distributions of complex survey data under informative probability sampling. Statistica Sinica, 8(4), 1087-1114.
Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society: Series B, 60(1), 23-40.
Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A, 169(4), 805-827. https://doi.org/10.1111/j.1467-985X.2006.00426.x
Rao, J. N. K., & Wu, C. (2010). Bayesian pseudo-empirical-likelihood intervals for complex surveys. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 533-544.
Rao, J. N. K., Verret, F., & Hidiroglou, M. A. (2013). A weighted composite likelihood approach to inference for two-level models from survey data. Survey Methodology, 39(2), 263-282.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). Thousand Oaks, CA: SAGE.
Raykov, T. (2011). Intraclass correlation coefficients in hierarchical designs: Evaluation using latent variable modeling. Structural Equation Modeling, 18(1), 73-90. doi: 10.1080/10705511.2011.534319
Raykov, T., & Marcoulides, G. A. (2015). Intraclass correlation coefficient in hierarchical design studies with discrete response variables: A note on a direct interval estimation procedure. Educational and Psychological Measurement, 75(6), 1063-1071.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.
Rodriguez, G., & Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses.
Journal of the Royal Statistical Society: Series A (Statistics in Society), 73-79.
Rodriguez, G., & Goldman, N. (2001). Improved estimation procedures for multilevel models with binary response: A case study. Journal of the Royal Statistical Society: Series A (Statistics in Society), 164(2), 339-355.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177. doi: 10.1037//1082-989X.7.2.147
Schochet, P. Z. (2008). Statistical power for random assignment evaluations of educational programs. Journal of Educational and Behavioral Statistics, 22(1), 62-87. doi: 10.3102/1076998607302714
Scientific Software International. (2005-2012). Multilevel models. LISREL documentation. Retrieved July 22, 2011, from http://www.ssicentral.com/lisrel/complexdocs/chapter4_web.pdf
Scott, A. J., & Holt, D. (1982). The effect of two-stage sampling on ordinary least squares methods. Journal of the American Statistical Association, 77(380), 848-854.
Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York: Wiley.
Skinner, C. J. (1994). Sample models and weights. Paper presented at the Proceedings of the Section on Survey Research Methods.
Skinner, C. J., Holt, D., & Smith, T. M. F. (1989). Analysis of complex surveys. Chichester, UK: Wiley.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage Publications Ltd.
Stapleton, L. M. (2006). An assessment of practical solutions for structural equation modeling with complex sample data. Structural Equation Modeling: A Multidisciplinary Journal, 13, 28-58. doi: 10.1207/s15328007sem1301_2
Stapleton, L. M. (2012). Evaluation of conditional weight approximations for two-level models. Communications in Statistics - Simulation and Computation, 41, 182-204.
doi: 10.1080/03610918.2011.579700
Stapleton, L. M., & Kang, Y. (2018). Design effects of multilevel estimates from national probability samples. Sociological Methods & Research, 47(3), 430-457.
Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., Hagedorn, M. C., Daly, P., & Najarian, M. (2015). Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011) User's Manual for the ECLS-K:2011 Kindergarten Data File and Electronic Codebook, Public Version (NCES 2015-074). National Center for Education Statistics.
West, B. T., Beer, L., Gremel, G. W., Weiser, J., Johnson, C. H., Garg, S., & Skarbinski, J. (2015). Weighted multilevel models: A case study. American Journal of Public Health, 105(11), 2214-2215.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-830. doi: 10.2307/1912934
Winship, C., & Radbill, L. (1994). Sampling weights and regression analysis. Sociological Methods & Research, 23(2), 230-257.
Xia, Q., & Torian, L. V. (2013). To weight or not to weight in time-location sampling: Why not do both? AIDS and Behavior, 17(9), 3120-3123.
Zaccarin, S., & Donati, C. (2008). The effects of sampling weights in multilevel analysis of PISA data (Working Paper No. 119). Università degli Studi di Trieste, Dipartimento di Scienze Economiche e Statistiche. Retrieved from http://www2.units.it/nirdses/sito_inglese/working%20papers/files%20for%20wp/wp119.pdf