THEORY AND APPLICATIONS OF INTRACLASS CORRELATION COEFFICIENTS AT CLUSTER RANDOMIZED DESIGN FOR STATISTICAL PLANNING VIA HIERARCHICAL MIXED MODELS

By

Chun-Lung Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Measurement and Quantitative Methods - Doctor of Philosophy

2019

ABSTRACT
Research investigators rely on information about intraclass correlation coefficients for planning and conducting designs and experiments for scientific inquiries in educational and social studies. Randomized controlled trials and cluster randomized studies are deemed the gold standard for evidence-based interventions, and both approaches have been applied successfully in many situations for more effective decision-making in education and social research. Cluster randomized designs for community-based research, in particular, have been widely used in the modern era, since they often operate at the group level, such as a whole community or worksite, so that researchers can more easily randomly assign an entire intact group rather than each individual subject. Hence, such cluster-randomized trials or group-randomized experiments have become important and useful in providing evidence-guided practice models for scientific inquiry and research.

The aim of this dissertation is to develop methods for the intraclass correlation coefficients for binary and continuous outcomes in cluster-based intervention designs using hierarchical mixed models, based on the scenarios of unconditional and conditional multilevel structures with cluster sampling schemes. Simulation studies are used to assess the statistical properties of intraclass correlation estimation and inference via the real data set of RSA-911 for people with disabilities served in the Michigan Rehabilitation Services Programs.

The results show that the average (unadjusted) intraclass correlation is about 0.01 for competitive employment and about 0.02 for weekly earnings (quality employment) in Michigan. These average (unadjusted) intraclass correlations from RSA-911 are relatively low in comparison to those for education interventions or academic programs for assessments in reading and mathematics across K-12 (Bloom et al., 1999, 2007; Hedges & Hedberg, 2007; Schochet, 2008); however, they seem comparable to some extent to those from psychological and mental health data in school-based intervention designs (Murray & Short, 1995).

For future study, researchers may look into different types of integrated large-scale complex data sets, such as RSA-911 data with a set of covariates from Census data, to investigate how intraclass correlation performs in statistical estimation and inference across multiple platforms. In addition, it would be interesting to study how to deal with missing values in the estimation procedure of intraclass correlation, and what remedial procedure can be added to improve the estimation process. For the proposed method, it is recommended that the total sample size be greater than 1,500 and that the within-group sample size be larger than 100 (with the number of groups about 15). In conclusion, this study provides a comprehensive methodology for intraclass correlation estimation and inference using the mixed analysis of variance approach along with the derived sampling distribution (i.e., F-distribution) for testing hypotheses as well as building confidence intervals on intraclass correlation estimates. Such proposed statistical procedures can be easily applied to any large-scale or small-scale data set, whereas a small total sample size, a small within-group size, and missing data are limitations on intraclass correlation estimation in terms of precision and accuracy.

Keywords: Intraclass Correlation Coefficient, Cluster Randomized Design, Multilevel Structure, Hierarchical Linear Modeling, Evidence-based Practice Models

This dissertation is dedicated to Mom and Dad (both of whom graciously and patiently tolerated me then and now) ~

Through all the years,
Thank you for always, always believing in me (that this would someday be completed)!

ACKNOWLEDGEMENTS

I would like to thank my dissertation/academic advisor, Dr. Kimberly Kelly, and my committee members, Drs. Richard Houang, Gloria Lee, and Sukyeong Pi, for their support. This dissertation is the final product of my (long and winding) PhD journey at Michigan State, and it could not have been done without the proper training of two (separate but equally important) groups: my MQM (measurement & quantitative methods) and PE (project excellence at rehab counseling). I am very fortunate to have had not just one (major of MQM) but two (aficionado of rehab counseling as well) unique experiences (within which there were challenges, difficulties, happiness and joys to make me grow into who I am today) on the special educational trip to the goal line (a doctorate degree). Although I did not attend the graduation ceremony, I was truly inspired by Kirk Cousins, who delivered a passionate commencement speech (MSU, Spring 2019; https://www.wkar.org/post/kirk-cousins-may-3-2019-michigan-state-university-commencement-address#stream/0), addressing that:

Through it all, enjoy the journey ... let us rejoice and be glad in it ... [...]d-deliver ... see life through a window, not a mirror ... and choose to be a great decision maker.

At the end of the day [...] While chasing/fighting [...] forget to stop and smell the roses along the way (to enjoy enough the tough road thru paradise). And also [...] The Lord blessed my time here in ways I never thought possible (God was preparing us for great things, wa[...])

Go Green, Go White, Go MQM, and Go PE!

PREFACE

The history of intraclass correlation can be traced back to the last century, when Sir Ronald A. Fisher introduced it to research communities as a new tool for measuring the level of similarity within a group. Since then, the intraclass correlation has been used as one of the most important statistical tools in scientific inquiries. In education, for example, the intraclass correlation coefficient (or ICC) is often used to measure the degree of intra-cluster resemblance in student educational outcomes (e.g., test scores) between different classrooms or schools. Although the ICC was a great success as an idea of how to measure within-group similarity, it was not until later that Allan Donner and his colleagues provided a comprehensive and practical framework of ICC estimation and inference (e.g., point estimates are derived by multivariate normal theory, and hypothesis tests are based on variance components using analysis of variance, ANOVA). In the contemporary era, the ICC plays another key role in quantifying the inherent clustering effect size (i.e., within-group variation) in multilevel designs by using hierarchical linear models (HLM). Stephen Raudenbush is a pioneer in the development and application of HLM in education, and he sheds light on how to evaluate the effect magnitude of multilevel structure by ICC. Moreover, Larry Hedges, renowned for his work on meta-analysis in education, finds a novel approach to powering (i.e., power analysis) sampling designs through the design effect (i.e., a function of ICC). Lastly, Tenko Raykov gives new insight into strategies for ICC estimates in the complex statistics setting (e.g., a categorical outcome variable) for HLM via latent variable models. The goal of this dissertation is to draw together in one place the major ICC developments, and then to further develop a new way of thinking in the statistical inquiry of ICC estimation and inference. In addition, the evidence-based paradigm in v[...].

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW OF STATISTICAL METHODS
  2.1 Fisher Approach
  2.2 Donner Approach
  2.3 Hedges Approach
  2.4 Raykov Approach
CHAPTER 3 LITERATURE IN REHABILITATION COUNSELING
  3.1 Multilevel Analysis
  3.2 Structural Equation Model
  3.3 Classification Tree Model
  3.4 Other Methods Such as Social Network Analysis and Spatial Analysis
  3.5 Justification for Covariates Used in Multilevel Analysis
CHAPTER 4 METHODS AND RESEARCH QUESTIONS
  4.1 Research Methods
  4.2 Proposed Models
  4.3 Research Questions
  4.4 Description of RSA-911 Data
  4.5 Simulation and Analysis Plan
  4.6 Theoretical Framework of HLM and HGLM in 2-Level Cluster Randomized Design
    4.6.1 HLM in 2-Level Cluster Randomized Structure via RSA-911
    4.6.2 HGLM in 2-Level Cluster Randomized Structure via RSA-911
CHAPTER 5 RESULTS
  5.1 Data Source and Sample Characteristics
  5.2 Models and Variables Used for Simulations of ICC Analysis
  5.3 ICC Estimation Method and Its Inferential Statistics
  5.4 Results of ICC Estimates and Inferential Statistics
    5.4.1 Competitive Employment Outcome Measure
    5.4.2 Earnings or Quality Employment Outcome Measure
CHAPTER 6 CONCLUSION & DISCUSSION
  6.1 Summary of the Results
  6.2 Implications
  6.3 Limitations of the Study
  6.4 Future Research
  6.5 Conclusion
APPENDICES
  APPENDIX A: Definitions of the VR Variables in RSA-911
  APPENDIX B: Descriptive Data Statistics
  APPENDIX C: Glossary of Abbreviations
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1 Analysis of Variance (ANOVA) for Intraclass Correlation (ICC) Calculations
Table 5.1 Individual Characteristics of the Usable Samples (n=11,819)
Table 5.2 Disability & Rehabilitation Characteristics of the Usable Samples (n=11,819)
Table 5.3 Outcomes of the Usable Samples (n=11,819)
Table 5.4 Correlation Structure of All Predictors and Outcome Y1 in Hierarchical Analysis
Table 5.5 Correlation Structure of All Predictors and Outcome Y2 in Hierarchical Analysis
Table 5.6 Summary of Mean Differences in the Outcomes between Type of Disability
Table 5.7 ICC Estimates of Unconditional Model M1 for Outcome Measure Y1
Table 5.8 ICC Estimates of Conditional Model M2 for Outcome Measure Y1
Table 5.9 ICC Estimates of Conditional Model M3 for Outcome Measure Y1
Table 5.10 ICC Estimates of Conditional Model M4 for Outcome Measure Y1
Table 5.11 Auxiliary Information of ICC Estimates for Outcome Measure Y1
Table 5.12 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y1
Table 5.13 ICC Estimates of Unconditional Model M1 for Outcome Measure Y2
Table 5.14 ICC Estimates of Conditional Model M2 for Outcome Measure Y2
Table 5.15 ICC Estimates of Conditional Model M3 for Outcome Measure Y2
Table 5.16 ICC Estimates of Conditional Model M4 for Outcome Measure Y2
Table 5.17 Auxiliary Information of ICC Estimates for Outcome Measure Y2
Table 5.18 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y2
Table A.1 List of the Definitions of VR Service Variables Used in the Study
Table A.2 List of the Definitions of VR Demographic Variables Used in the Study
Table A.3 List of the Definitions of VR Outcome Variables Used in the Study
Table B.1 Descriptive Summary of the Usable Sample by Office Level in Michigan (n=11,819)
Table B.2 A Summary of the Geographic Information System of Office Units in Michigan
Table C.1 Glossary of Abbreviations

LIST OF FIGURES

Figure 1.1 Conceptual Flowchart of the Intraclass Correlation Study at Hierarchical Design
Figure 2.1 Sampling Distributions of Non-Transformed and Transformed Correlations at Three Different Levels
Figure 2.2 Intraclass Correlation Between Two Classes of Measurements
Figure 2.3 Demonstration Example of Intraclass Correlation by Two Classes of Measurements
Figure 2.4 Intraclass Correlation & Design Effect in 2-Level Hierarchical Linear Model
Figure 2.5 Latent Variable Model for Estimation of Intraclass Correlation in 2-Level Design
Figure 4.1 A Workflow Diagram of Simulation-based Exploration and Evaluation for the ICC
Figure B.1 Spatial Network of Target Sample in Michigan by Hierarchical Structure

CHAPTER 1
INTRODUCTION

The need for more scientific, evidence-based research has become an increasing concern in 21st-century education (Schneider et al., 2007). The use of rigorous methods, such as randomized control trial (RCT) and cluster randomized trial (CRT) experiments in particular, is important not only to reinforce sound research but also to build a solid basis of evidence-guided knowledge for informing policymakers and practitioners (Menon et al., 2009; Slavin, 2002). Under the Every Student Succeeds Act of 2015 (which succeeded No Child Left Behind), the U.S. Department of Education (2016) wrote new guidelines for the implementation of scientific research. Specifically, for the use of evidence-based interventions, researchers need to be guided by auxiliary research evidence from previous studies in order to conduct scientifically rigorous research as well as promote better and more effective outcomes in education, according to the statistical standards and guidelines for the National Center for Education Statistics at the What Works Clearinghouse (https://ies.ed.gov/ncee/wwc/). With that goal in mind, RCTs and CRTs are often highly recommended by federal education research agencies, such as the Institute of Education Sciences and its affiliated centers, and are consistently deemed the gold standard in scientific research and evidence-informed practice, since both approaches have already proved successful in many circumstances for making decisions in education. One key element in drawing any meaningful scientific conclusion is to produce an evidential base through designs and experiments (Anderson & Shattuck, 2012; Barab & Squire, 2004; Cobb et al., 2003; Odom et al., 2005; Shavelson et al., 2003). For education policy and practice in the 21st century (Slavin, 2008), the pursuit of research soundness has already been reinforced persistently by means of education legislation, e.g., NCLB Legislation (2002) and ESRA Legislation (2008). The No Child Left Behind Act of 2001 (NCLB), for example, supported scientifically based research involving rigorous and systematic methods to obtain applicable and generalizable knowledge for improving school programs, teaching methods and learning outcomes. Furthermore, the Education Sciences Reform Act of 2002 (ESRA) was proposed to reform education sciences through principles of scientific research, such as randomized experiments to measure causal impacts on educational outcomes.

In the era of evidence-based practice (EBP), rehabilitation counseling is also embracing the concepts of best practice and knowledge translation to incorporate scientific advances and changes that have redefined the relationship between impairments and the capability to work (Leahy et al., 2014a). As for the state-federal vocational rehabilitation (VR) services, the public VR agencies are a major force of employment assistance for individuals with disabilities. Under the recent Workforce Innovation and Opportunity Act (WIOA) of 2014, state VR programs have to assist the target disability populations, with educational or vocational training services, to succeed in the labor market and further to compete, with professional competency skills, in the global economy (WIOA Legislation, 2018). Therefore, the rehabilitation counseling workforce (including counselors, educators, practitioners, and researchers) now needs to work together to embrace the new era of the EBP paradigm and to help VR customers improve their access to quality rehabilitation services with informed choices of effective interventions or treatments. Moreover, it is important to use data-driven or evidence-based rehabilitation counseling best practices to improve accountability and outcomes for people with disabilities by conducting systematic reviews and well-designed studies, as a way to get more reliable and valid evidence for translating knowledge and making good decisions in VR (Chan et al., 2009; Leahy et al., 2009; Leahy & Arokiasamy, 2010; Leahy et al., 2014b).

Evidence-based practice (EBP) has become a new norm today, built on conducting valid research and gathering reliable data for improving practices and outcomes (Eignor, 2013). In education (including rehabilitation counseling), EBP research along with well-constructed designs and experiments can provide fundamental and significant improvements over practices. Not only can the proper use of EBP results help make better decisions about individuals (e.g., people with disabilities) and programs (e.g., VR agencies), but it can also provide a successful pathway to gaining broader access to quality education or full employment, according to the standards for research conduct by educational researchers (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing, 2014).

Professionals in the field of rehabilitation counseling, such as VR counselors and practitioners, are often expected to integrate clinical judgement skills (including scientific attitude, cognitive complexity, evidence-based practice, and counselor biases) with research evidence via scientific-based methods to make the best informed decisions that maximize the well-being outcomes of the clients (e.g., people with disabilities in public VR) (Austin & Leahy, 2015; Menon et al., 2009). The emphasis on best EBP lends VR counselors a significantly renewed impetus, so that they can be more accurate in clinical judgement by getting research-informed knowledge of clinical issues of interventions and outcomes (Chan et al., 2010).

Since the correlation coefficient was introduced last century, it has been used as one of the most important statistical tools for scientific inquiries in educational and social research (Agresti & Finlay, 2009; Fisher, 1915; Olkin & Pratt, 1958; Pearson & Lee, 1903; Pearson, 1904; Pearson, 1920; Soper et al., 1917; Student, 1917; Thorndike, 2005). When using correlation to interpret results, researchers need to keep in mind that correlation does not imply causation; this issue, which has the potential for an informal fallacy, can be rectified by using a well-designed experiment, by means of which researchers are more likely to go the extra mile to obtain valid statistical inference or even causality in studies (Fisher, 1925a, 1942, 1958a, 1958b; Holland, 1986).

Of the different types of effect magnitude measures for the correlation ratio (e.g., intraclass correlation, eta-squared, omega-squared, R-squared, and rho-squared indexes), the intraclass correlation (ICC) is a parametric estimator in the random-effect (or mixed-effect) model that quantifies the true proportion of total variance accounted for in the outcome variable (Hays, 1994; Raudenbush & Bryk, 2002). Furthermore, the ICC can summarize the clustering effect magnitude (i.e., the relatedness) in a hierarchical design (Note: In statistics, this technique is called the random coefficients in multilevel models) (Hays, 1994; Hedges & Olkin, 1985). In this study, one main research goal is to investigate the ICC in hierarchical linear models (HLM) and hierarchical generalized linear models (HGLM) by using the mixed-effect analysis of variance (ANOVA), in order to better understand its statistical properties in different simulation-based scenarios with respect to complex modeling structures and sampling designs.

Both RCTs and CRTs have been widely viewed as one of the best EBP approaches (i.e., the gold standard) for appraising and measuring the efficacy and effectiveness of interventions or treatments in educational and social research studies, since not only can such a methodology for designing experiments efficiently [...] relation to the intervention or treatment given by using an experimental design, but it can also effectively provide more robust and valid evidence in EBP for scientific inquiry and research (Connolly et al., 2018; Menon et al., 2009; Schneider et al., 2007; Sullivan, 2011).
  Figure 1.1 Conceptual Flowchart of the Intraclass Correlation Study at Hierarchical Design

The current study addresses the following research questions under an EBP paradigm with the hierarchical design (i.e., CRT) driven by the intraclass correlation coefficient (ICC) ([...] ICC). Moreover, this research evaluates the statistical performance of ICC in various simulation-based scenarios designed from the complex hierarchical data structures of the existing RSA-911 data set, where clients in VR are represented as real-world connections in computer simulations in the two-level CRT setting (i.e., clients are in level 1, and offices are in level 2). Also note that, in simulations using real data via the RSA-911, the selected variables are incorporated into multilevel modeling (i.e., HLM & HGLM) to represent the pivotal VR relationships between demographic characteristics, rehabilitation services, and employment outcomes. In order to answer the proposed research questions, a computer simulation study (i.e., the Monte Carlo Method) is conducted using the bootstrapping procedure with the real data of RSA-911 (Note: The bootstrap method is a resampling technique without replacement from a given sample). More details of this computer simulation framework with RSA-911 data are provided in Chapter 4 (Methods and Research Questions) and Chapter 5 (Results).
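
The resampling design described above can be illustrated with a minimal Python sketch. It is not the dissertation's own code: the function names `anova_icc` and `resample_icc`, the synthetic data, and the seed are assumptions made for illustration. The ICC is computed with the standard one-way random-effects ANOVA estimator, (MSB - MSW) / (MSB + (n0 - 1) MSW), for a continuous outcome only. Following the note in the text, the sketch draws without replacement by default; the classical bootstrap draws with replacement, which `replace=True` selects.

```python
import random
from statistics import mean


def anova_icc(groups):
    """One-way random-effects ANOVA estimate of the ICC.

    groups: list of lists, one inner list of outcomes per cluster (office).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ss_between / (k - 1)        # mean square between clusters
    msw = ss_within / (n_total - k)   # mean square within clusters
    # n0 is the cluster size adjusted for imbalance (equals n if balanced)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


def resample_icc(data, n_groups, n_subjects, reps=100, replace=False, seed=0):
    """Two-stage resampling: draw offices, then clients within each office.

    data: dict mapping a (hypothetical) office id to a list of outcomes.
    Returns one ICC estimate per repetition.
    """
    rng = random.Random(seed)
    offices = list(data)
    draw = rng.choices if replace else rng.sample
    estimates = []
    for _ in range(reps):
        chosen = draw(offices, k=n_groups)
        sample = [draw(data[o], k=n_subjects) for o in chosen]
        estimates.append(anova_icc(sample))
    return estimates


if __name__ == "__main__":
    # Synthetic stand-in for RSA-911 (assumption): 30 offices, 300 clients
    # each, generated with a true ICC of 0.02 / (0.02 + 0.98) = 0.02.
    gen = random.Random(2019)
    data = {}
    for office in range(30):
        u = gen.gauss(0.0, 0.02 ** 0.5)
        data[office] = [u + gen.gauss(0.0, 0.98 ** 0.5) for _ in range(300)]
    est = resample_icc(data, n_groups=15, n_subjects=100, reps=100)
    print("mean ICC estimate over 100 resamples:", round(mean(est), 4))
```

Varying `n_groups` over 5, 15, 25 and `n_subjects` over 50, 100, 150 reproduces the shape of the nine scenarios in Research Question 2, although the real analysis also involves covariates and a binary outcome that this sketch leaves out.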

This study addresses the following three research questions with respect to the ICC.

Research Question 1. Consider RSA-911 data for people with disabilities served in Michigan in FY 2015. What are the empirical distributions of ICC (estimate, standard error, p-value and 95% confidence limits) for the usable samples of RSA-911 data?

(a) Compare the performance of statistical estimation and inference among Models 1-4, where Model 1 is fully unconditional, Model 2 is conditional on individual characteristics of gender, minority, age, education and social security insurance benefits, Model 3 is conditional on rehabilitation service predictors (job placement, on-the-job supports and rehabilitation technology), and Model 4 is a combination of Models 2 and 3.

(b) What are the empirical distributions of ICC estimates given by different breaking variables (disability type, disability significance and severity, and previous work experience) for subset analysis under Models 1-4? What are the differences among Models 1-4 in (a) and (b)?

Research Question 2. Given the cluster randomized design structure of RSA-911 data, there are three different cluster settings at level 2 (the number of groups = 5, 15, or 25) and three individual settings at level 1 (the number of subjects = 50, 100, or 150). Based on the bootstrapping procedure repeated 100 times, what are the empirical distributions of ICC (estimate, standard error, and p-value) in each bootstrap scenario under Models 1-4?

(a) For each of the bootstrap resampling scenarios (the number of bootstrap repetitions = 100), compare the ICC estimates among Models 1-4 and examine which model (from Models 1-4) can provide better statistical performance of ICC estimation and inference.

(b) Which bootstrap sampling scenarios (based on the number of groups and the number of subjects) can provide more accurate and precise ICC estimates (i.e., less bias and less mean squared error in statistical estimates)? What are the recommended sampling strategies (the number of groups and subjects) for cluster randomized trials using RSA-911 data?

Research Question 3. Comparing the results between Research Question 1 (RQ1: Population Model) and Research Question 2 (RQ2: Resampling Model), which model (in Models 1-4) can provide the best statistical properties of ICC estimation and inference, in terms of statistical bias (expected difference in ICC estimates between RQ1 and RQ2), mean squared error (mean squared deviations in ICC estimates between RQ1 and RQ2), and ICC parameter ([...] 95% confidence interval for ICC, based on the subsamples in RQ2, in comparison with the overall sample result in RQ1)?

The next two chapters present the literature review of statistical methods and applications for intraclass correlation plus the motivation for the study (Chapter 2), as well as the literature of statistical approaches in rehabilitation counseling using RSA-911 data (Chapter 3). The rest of the dissertation is organized as follows. Chapter 4 covers a mathematical framework and notation in the proposed methodology for investigating intraclass correlation in multilevel structure. Chapter 5 shows the results using the real data set of RSA-911 via an exploratory bootstrap simulation approach to ICC estimation and related statistical inference. Last but not least, simulation results and study findings are discussed in Chapter 6.
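
The evaluation criteria named in Research Questions 2 and 3 can be sketched in a few lines of Python. This is a hedged illustration rather than the study's implementation: the function name `evaluate_scenario`, its arguments, and the numbers in the usage lines are hypothetical, and treating the third criterion as confidence-interval coverage of the full-sample ICC is an assumption, since the corresponding wording in Research Question 3 is incomplete in the source.

```python
from statistics import mean


def evaluate_scenario(estimates, reference, intervals=None):
    """Summarize resampled ICC estimates against a reference ICC.

    estimates: ICC estimates from the resamples of one scenario (RQ2).
    reference: ICC estimate from the full usable sample (RQ1).
    intervals: optional list of (lower, upper) 95% confidence limits,
               one pair per resample.
    """
    errors = [e - reference for e in estimates]
    summary = {
        "bias": mean(errors),                      # average deviation
        "mse": mean([err ** 2 for err in errors]),  # mean squared deviation
    }
    if intervals is not None:
        # Share of resample intervals that contain the reference value
        hits = [lo <= reference <= hi for lo, hi in intervals]
        summary["coverage"] = sum(hits) / len(hits)
    return summary


if __name__ == "__main__":
    # Hypothetical replicate estimates and confidence limits
    reps = [0.015, 0.022, 0.031, 0.018]
    cis = [(0.000, 0.040), (0.005, 0.045), (0.021, 0.050), (0.002, 0.038)]
    print(evaluate_scenario(reps, reference=0.02, intervals=cis))
```

Because the mean squared error combines squared bias and variance, a scenario can show little bias and still rank poorly if its estimates scatter widely, which is why the two criteria are reported side by side.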

CHAPTER 2
LITERATURE REVIEW OF STATISTICAL METHODS

In this chapter, a comprehensive introduction to the history of intraclass correlation coefficients (ICC) in experimental designs is provided to serve as a basic framework for this study. The ICC has been one of the oldest statistical measures since Sir Ronald A. Fisher coined it last century. The fundamental idea of ICC is presented first to show the basic context of intraclass correlation, followed by a systematic review of its developments in statistical estimation and hypothesis testing, effect size measurement, and the Fisher transformation using ICC. In addition, a review of the literature pertinent to the current major developments in ICC by Allan Donner, Larry Hedges, and Tenko Raykov, as well as their proposed analytic strategies for using ICC, is provided to serve as a fundamental basis of the study and to understand the ICC phenomenon in multilevel designs, especially for cluster randomized trials.
  2.1 Fisher Approach
Since the correlation coefficient was introduced last century, it has been one of the most popular and important tools in scientific inquiry, including biometrical work as well as social and educational research (Fisher, 1915; Olkin & Pratt, 1958; Pearson, 1920; Rodgers & Nicewander, 1988; Soper et al., 1917; Student, 1917). The inheritance of physical and mental characters in humans is one classical example of how powerfully this statistical tool can be applied across scientific fields. For example, Pearson and his colleague used U.K. school children data in the late 1800s to investigate a variety of basic human mechanisms, from physical characteristics (e.g., age, body size, stature, and even eye color) to latent or psychic abilities (e.g., mental status or intelligence), and further to compare those measures, using the Pearson product-moment correlation, to understand ancestral heredity, natural inheritance, and family resemblance (Pearson & Lee, 1903; Pearson, 1904).
When using correlation to interpret statistical results, researchers need to be aware that correlation by itself does not establish causation. This issue can be dealt with by a carefully designed experiment (like a randomized controlled trial), which may help go the extra mile to test causality and further to make a valid statement of causal inference (Fisher, 1958a, 1958b; Holland, 1986). In the experimental field, scientific inquiries can be done synthetically with three key ingredients: replication (for adding precision), randomization (for bringing validity), and control (for reducing interference), so that research workers are able to

409-410); on the other hand, in an observational study, some may find it useful in the exploratory stages to express a statistical inquiry in the form of a correlation coefficient, but,

valid foundation for making causal links rather than simply producing spurious correlations or even counterfactual connections, due to a reasonable suspicion that the various possible contributory causes of a studied phenomenon cannot be controlled (Fisher, 1925a, Chapter Six

(like quasi-experimentation or regression discontinuity) to circumvent, or at least to alleviate, the difficulty by adjusting uncontrolled observations (or uncontrollable events) with artificially controlled (or statistically manipulated) quasi-experimental conditions (i.e., pseudo-experimental models) to properly estimate causal impacts (i.e., treatment or intervention effects) using the Neyman-Rubin model (Schneider et al., 2007).
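From a theoretical perspective, the mathematical features (algebraic relationships) and key properties (statistical functions) of the Pearson correlation coefficient are listed as follows. Let $(x_i, y_i)$, $i = 1, \ldots, N$, be $N$ pairs of independent samples from a bivariate normal distribution with means $(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$, and correlation $\rho$. The frequency can be written in the form

$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \frac{(y-\mu_2)^2}{\sigma_2^2}\right]\right\},$$

where the correlation $\rho$ may be positive or negative or zero but cannot exceed unity in magnitude (Fisher, 1925a; Roussas, 2002). If one variate has an assigned value (e.g., $x = a$), then by giving $x$ the constant value $a$, the conditional frequency (i.e., the total frequency above divided by the frequency with which $x = a$ occurs) can be expressed by the general formula

$$f(y \mid x = a) = \frac{1}{\sigma_2\sqrt{2\pi(1-\rho^2)}} \exp\left\{-\frac{\left[y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(a - \mu_1)\right]^2}{2\sigma_2^2(1-\rho^2)}\right\},$$

where the conditional distribution of $y$ given $x = a$ is normal with mean $\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(a - \mu_1)$ and variance $\sigma_2^2(1-\rho^2)$. This implies that the fraction $1-\rho^2$ of the total variance of $y$ is independent of $x$, while the remaining fraction $\rho^2$ of the variation of $y$ is determined by (and calculable from) the value of $x$ (Fisher, 1925a; Mood, Graybill, & Boes, 1974).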
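The conditional result above can be checked numerically. The following is a minimal sketch, not part of the original derivation; the function name `conditional_normal` and the parameter values are hypothetical choices made only for illustration.

```python
def conditional_normal(mu1, mu2, sigma1, sigma2, rho, a):
    """Mean and variance of y given x = a under a bivariate normal model.

    Hypothetical helper: argument names mirror the symbols in the text
    (means mu1, mu2; standard deviations sigma1, sigma2; correlation rho).
    """
    if abs(rho) >= 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    cond_mean = mu2 + rho * (sigma2 / sigma1) * (a - mu1)
    cond_var = sigma2 ** 2 * (1.0 - rho ** 2)
    return cond_mean, cond_var


# Illustrative (made-up) parameter values.
mean_y, var_y = conditional_normal(mu1=0.0, mu2=0.0, sigma1=1.0,
                                   sigma2=2.0, rho=0.5, a=1.0)

# Fraction of the total variance of y that does not depend on x: 1 - rho^2.
unexplained_fraction = var_y / 2.0 ** 2
```

With these illustrative values the unexplained fraction equals $1 - \rho^2$, in line with the statement that the remaining fraction $\rho^2$ is determined by $x$.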
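The statistical estimate of the correlation is the ratio of the covariance to the geometric mean of the two variances; if $x_i - \mu_1$ and $y_i - \mu_2$ represent the deviations of the two variates from their means $(\mu_1, \mu_2)$, then the correlation coefficient (or product-moment) estimator $r$ is given by

$$r = \frac{\sum_{i=1}^{N}(x_i - \hat{\mu}_1)(y_i - \hat{\mu}_2)}{\sqrt{\sum_{i=1}^{N}(x_i - \hat{\mu}_1)^2 \sum_{i=1}^{N}(y_i - \hat{\mu}_2)^2}},$$

where the mean estimates $(\hat{\mu}_1, \hat{\mu}_2)$ are given by the sample means $(\bar{x}, \bar{y})$. By Olkin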
 and Pratt (1958), the probability density of 
 is derived as
    , where 
  is the hypergeometric function, and the last 
term therefor
e can be computed and simplified as  
 . It is noteworthy that 
under the null hypothesis of 
 is true (i.e., 
), the asymptotic distribution of a sample 
correlation 
 is a normal density with mean 0 and variance 
. In the general case of 
 (i.e., 
), by applying a Laplace transformation and a Taylor series expansion to the previous density function of the sample correlation, Olkin and Pratt (1958) derived the uniformly minimum-variance unbiased estimator (UMVUE: an unbiased estimator that has lower variance than any other unbiased estimator for all plausible values of the parameter), which is shown to be
    Note that there is another simple estimator of correlation coefficient to adjust biased 
correlation, especially for a small sample size $N$, according to Kelly (2018) and Flom (2015):

$$r_{adj} = \sqrt{1 - \frac{(1-r^2)(N-1)}{N-2}},$$

where the formula results from the adjusted $R^2$ with $p = 1$ predictor.
To test whether a correlation is different from zero (i.e., $H_0: \rho = 0$), the test statistic is

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}},$$

which is $t$-distributed with $N-2$ degrees of freedom (Lomax & Hahs-Vaughn, 2012, pp. 267-268; Roussas, 2002, pp. 472-473). It is interesting to note that the probability density of a correlation (when $\rho = 0$) can be found by a change of variables from the $t$-statistic above:

$$f(r) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{N-2}{2}\right)}(1-r^2)^{(N-4)/2}, \quad r \in (-1, 1),$$

where this density holds only for the case of $\rho = 0$ (independence) (Roussas, 2002, p. 474). Compared with the previous $t$-statistic approach to testing the significance of a correlation coefficient, transforming the correlation is another way to test the significance of an observed correlation coefficient. By using the well-known standard normal $Z$ test statistic, Fisher (1925a) proposed a more reliable and accurate transformation method that employs the information in a given correlation $r$ to approximate the standard normal $Z$ distribution, so that this test can be carried out without laborious computation. The Fisher $z$ transformation is defined as

$$z = \frac{1}{2}\ln\frac{1+r}{1-r},$$

where the statistic value $z$ ranges from $-\infty$ to $+\infty$ as the sample correlation $r$ changes from $-1$ to $+1$
 
 can also be approximated by 
 , and the 
standard error of $z$ is derived in a simpler form approximately as $1/\sqrt{N-3}$, which is practically independent of the value of the correlation in the population from which the sample is drawn. There are three advantages of this transformation of $r$ into $z$ (Fisher, 1925a, pp. 198-199): (1) the standard error of $z$ does not depend on the true value of the correlation $\rho$, so it can provide a true weight for the value of the estimate (i.e., $N$ is a so-called ancillary statistic which contains no information about the parameter of interest $\rho$, but sometimes it paradoxically

the accuracy and precision of the estimate $z$ here; Casella & Berger, 2002, pp. 282-284; Cox, 1971; Efron & Hinkley, 1978; Fisher, 1925b, p. 724); (2) although the distribution of $r$ is not normal in small samples and even remains far from normal for large samples with a high correlation (e.g., the correlation $\rho$ is close to $\pm 1$), the sampling distribution of $z$ still tends to converge to asymptotic normality as the sample size increases, no matter what the value of the correlation may be (either large or small, positive or negative); (3) while the distribution of $r$ changes rapidly in terms of its shape (i.e., skewness and kurtosis) as the parameter $\rho$ is
changed (given by 
 ), the sampling distribution of 
$z$ is probabilistically more stable and nearly constant in the form of a symmetrical bell shape (i.e., values are normally distributed). The Fisher $z$ transformation follows approximate normality with mean $\zeta = \frac{1}{2}\ln\frac{1+\rho}{1-\rho}$ (the transformed true correlation parameter) and variance $1/(N-3)$. Also see Figure 2.1 below for a comparison of the sampling distributions of the non-transformed and transformed correlation coefficients at three different levels (i.e., correlation coefficients set at 0.2, 0.5, and 0.8, while

). In Figure 2.1, it can be seen that the $z$-transformed distributions are relatively more robust and stable than those of the non-transformed correlation coefficient across the continuum domain (i.e., $r \in (-1, 1)$ and $z \in (-\infty, \infty)$).

Figure 2.1 Sampling Distributions of Non-Transformed and Transformed Correlations at Three Different Levels

Note. The original idea of this graph (Figure 2.1) comes from Fisher's $z$ transformation (1925a, p. 200). The upper panel demonstrates the sampling distributions of the correlation at the levels of r = 0.2, 0.5, and 0.8, and the lower panel shows the respective sampling distributions under Fisher's $z$ transformation, in which the values are shown as z = 0.20, 0.55, and 1.10.
[Figure 2.1 panels. Upper: "Correlation Distribution by Three Different Levels" (x-axis: Correlation Value; y-axis: Density). Lower: transformed-correlation distribution by three different levels (y-axis: Density).]
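The significance test and the Fisher $z$ transformation discussed above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the normal critical value 1.96 is hard-coded as an assumption for a 95% interval, and the sample size of 27 in the usage lines is made up.

```python
import math


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def fisher_z(r):
    """Fisher z transformation, z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via the z scale.

    Uses the approximate standard error 1 / sqrt(n - 3) and maps the
    limits back to the correlation scale with tanh (the inverse of z).
    """
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# The three levels used in Figure 2.1.
z_values = [round(fisher_z(r), 2) for r in (0.2, 0.5, 0.8)]

# Hypothetical sample: r = 0.5 observed on n = 27 pairs.
t_value = t_statistic(0.5, 27)
lower, upper = fisher_z_interval(0.5, 27)
```

The rounded `z_values` reproduce the values z = 0.20, 0.55, and 1.10 quoted in the note to Figure 2.1.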
In terms of correlation-based measures, Jacob Cohen (1988, pp. 77-81) proposed $r = 0.1$ (or a threshold of $|r|$ around 0.1) as a small or weak effect, $r = 0.3$ (or $|r|$ around 0.3) as a medium or moderate effect, and $r = 0.5$ (or $|r|$ around 0.5) as a large or strong effect, with $r = 0.7$ (or $|r|$ around 0.7 or above) as a very large or extremely strong effect, to determine the effect size magnitude of a studied phenomenon of interest (Cohen, 1988; Ellis, 2009; Rosenthal, 1996). It should be noted that these correlation thresholds may need to be modified, or even re-evaluated and re-justified, in different areas of scientific inquiry, especially in fields other than the behavioral and social sciences (such as clinical and social psychology), since Cohen (a clinical and social psychologist) originally worked on this effect-size magnitude research using data from his own field (specifically, psychology and the social sciences) to develop qualitative descriptors of strength of association for the quantitative product-moment $r$. In the family of correlation-type effect size measures, there are other effect size estimates that are calculated from different variance components (e.g., effect magnitude (EM) = [explained variance] / [total variance], which in plain language means that EM is the proportion of the total variation within an experimental design model that is accounted for by the explained variance; Cohen, 1988, p. 78).
For instance, the coefficient of determination (aka R-squared, or $R^2$) is widely known and used, especially in regression models. In addition, the correlation ratio eta-squared ($\eta^2$) is another form of the squared correlation in analysis of variance (ANOVA) models (Pearson, 1923; Richardson, 2011). Also, Hays (1994) introduced a similar one, the omega-squared index ($\omega^2$), as the relative reduction in uncertainty about $Y$ due to $X$, which shows the variance component in $Y$ given by $X$; this index can be described as $\omega^2 = (\sigma_Y^2 - \sigma_{Y|X}^2)/\sigma_Y^2$. Last but not least, the intraclass correlation coefficient (ICC) is defined as

$$\rho_I = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2},$$

where $\sigma_A^2$ is the between-cluster variance and $\sigma_e^2$ is the within-cluster error variance. This formula is another way to quantify the true proportion of variance in the outcome accounted for (by the cluster effect) in random-effect mixed models (Hays, 1994; Raudenbush & Bryk, 2002). Note that the intraclass correlation (or the so-called cluster effect) is defined only in random-effect (esp. random-intercept) models, while the omega-squared index can also be used in fixed-effect analysis (Hays, 1994, p. 535; Hedges & Olkin, 1985, p. 103). In this study, one main focus is to investigate the ICC in hierarchical models (mixed-effects ANOVA) so as to better understand its properties under different scenarios of design effect and sample size.
Another application of intraclass correlation is to measure the level of similarity or resemblance (Fisher, 1925a; see Figure 2.2 below for an illustration of intraclass correlation). In one case, from plant biology, the resemblance between leaves or pods on the same tree was studied by, say, picking 30 seed pods from each of a number of (say 100) different trees. In another case, from human and family correlation studies, suppose we have a sample of anthropometric measurements on about 1500 pairs of siblings from the same family (e.g., two classes: elder kid vs. younger kid), and we want to calculate the correlation between siblings. Here, if the association of interest is based on differences between the two classes (or groups) of measurements, then it is the so-called interclass correlation, which is equivalent to the typical Pearson correlation coefficient $r$ between two sets of measurements. On the other hand, suppose that all the subjects (e.g., a combination of both older and younger siblings) belong to the same class (only one group for the whole study) with a common mean and a common standard deviation about that mean for all measurements; the correlation is then distinguished as the intraclass correlation (Fisher, 1925a, pp. 211-215).

Figure 2.2 Intraclass Correlation Between Two Classes of Measurements

Note. This illustration of ICC is motivated by an original idea by Fisher (1925a).
In the special case of having two classes of measurements given by $N$ pairs of samples $(x_i, x_i')$, $i = 1, \ldots, N$, the intraclass correlation is defined as

$$r_I = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(x_i' - \bar{x})}{N s^2},$$

where the common mean is $\bar{x} = \frac{1}{2N}\sum_{i=1}^{N}(x_i + x_i')$, and the common variance is $s^2 = \frac{1}{2N}\left[\sum_{i=1}^{N}(x_i - \bar{x})^2 + \sum_{i=1}^{N}(x_i' - \bar{x})^2\right]$. In the general case of having a set of $k$ classes of measurements given by $N$ samples, with $\bar{x}_i$ ($i = 1, \ldots, N$) representing the set of means of the $k$ classes in each sample, the general formula of intraclass correlation can be written as

$$r_I = \frac{k\sum_{i=1}^{N}(\bar{x}_i - \bar{x})^2}{(k-1)\,N s^2} - \frac{1}{k-1},$$

where the common mean is $\bar{x} = \frac{1}{kN}\sum_{i=1}^{N}\sum_{j=1}^{k} x_{ij}$, the common variance is $s^2 = \frac{1}{kN}\sum_{i=1}^{N}\sum_{j=1}^{k}(x_{ij} - \bar{x})^2$, and the intraclass correlation is usually positive and in any case cannot be less than $-1/(k-1)$. See Figure 2.3 for a geometric interpretation of ICC, illustrating the resemblance of 10 paired observations (i.e., siblings A and B) as a measure of within-pair association (or intraclass correlation) between the two siblings in the same family. It is interesting to note that the ICC in Figure 2.3 can be geometrically represented, as well as numerically approximated, by the overall Euclidean distance (or norm) between the paired samples on the standardized scale (i.e., the overall Euclidean length can be defined by the standardized difference between the measures of sibling A and sibling B in the Cartesian coordinate system).

Figure 2.3 Demonstration Example of Intraclass Correlation by Two Classes of Measurements
Note. This illustration comes from the concept of intraclass correlation in Fisher's framework of having a common mean and standard deviation for all the measurements (1925a, Section 38, Intraclass Correlations and the Analysis of Variance, pp. 211-214). The intraclass correlation (or within-pair correlation) can be estimated by the Euclidean distance of the paired measurements between the two related groups of samples (i.e., the true ICC is set at 0.303, and the estimated ICC is 0.298 using the standardized length between a pair of measurements from Sibling A and Sibling B).

[Figure 2.3 panel: "Example of Intraclass Correlation for Measurements of 10 Pairs of Siblings" (x-axis: Measure of Sibling A; y-axis: Measure of Sibling B; legend: Reg Line: r = 0.303; Cor Dist: r = 0.298; 45-Degree Line).]
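The link between the intraclass correlation for pairs and the within-pair Euclidean distance can be illustrated with a short sketch. It assumes Fisher's definition with a common mean and a common variance pooled over both members of each pair; under that assumption the cross-product form and the distance form agree exactly, since $\sum (x_i - x_i')^2 = 2Ns^2(1 - r_I)$. The function names and the four sibling pairs are hypothetical and are not the data behind Figure 2.3.

```python
def _common_mean_and_variance(pairs):
    """Common mean and variance pooled over both members of every pair."""
    n = len(pairs)
    mean = sum(a + b for a, b in pairs) / (2 * n)
    var = sum((a - mean) ** 2 + (b - mean) ** 2 for a, b in pairs) / (2 * n)
    return mean, var


def intraclass_pairs(pairs):
    """Intraclass correlation for N pairs via the cross-product form."""
    n = len(pairs)
    mean, var = _common_mean_and_variance(pairs)
    cross = sum((a - mean) * (b - mean) for a, b in pairs)
    return cross / (n * var)


def intraclass_from_distance(pairs):
    """Same quantity from the squared within-pair Euclidean distance."""
    n = len(pairs)
    _, var = _common_mean_and_variance(pairs)
    squared_distance = sum((a - b) ** 2 for a, b in pairs)
    return 1.0 - squared_distance / (2 * n * var)


# Hypothetical measurements for four sibling pairs (Sibling A, Sibling B).
sibling_pairs = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
icc_cross = intraclass_pairs(sibling_pairs)
icc_distance = intraclass_from_distance(sibling_pairs)
```

Pairs that lie closer to the 45-degree line have a smaller squared distance and therefore a larger intraclass correlation.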
2.2 Donner Approach
In the analysis of family data, the intraclass correlation coefficient is frequently used to measure the degree of intra-family resemblance among family members with regard to family health history in quantitative traits of biological or psychological attributes such as

intelligence (IQ). Donner & Koval (1980a) derived the maximum likelihood estimator (assuming no prior knowledge of the statistical estimates) of the intraclass correlation $\rho_I$ using multivariate normal theory in variance component models (allowing unequal group/family sample sizes).
In statistical theory, suppose one observation on the $j$-th member ($j = 1, \ldots, n_i$) of the $i$-th family ($i = 1, \ldots, k$) is used to investigate the intraclass resemblance $\rho_I$ among the class of $n_i$ samples from each of $k$ families, which can be stated mathematically as

$$y_{ij} = \mu + a_i + e_{ij},$$

where $y_{ij}$ is an observation for which $i$ is the index of a family or group factor ($i = 1, \ldots, k$) and $j$ is an individual member within that family or group factor ($j = 1, \ldots, n_i$), $\mu$ is the grand mean of all the observations, $a_i$ is the random effect (normally and independently distributed with mean 0 and variance $\sigma_A^2$, i.e., NID(0, $\sigma_A^2$)), $e_{ij}$ is a random normal error term for the $j$-th subject in the $i$-th group (i.e., independently and identically distributed with mean 0 and variance $\sigma_e^2$; viz., NID(0, $\sigma_e^2$)), and both random components, $\{a_i\}$ and $\{e_{ij}\}$, are assumed to be mutually independent.

By summing the additive variance components (i.e., the sum of between-group and within-group variation equals the total variation), the variance of $y_{ij}$ is given by $\sigma_y^2 = \sigma_A^2 + \sigma_e^2$, and then the intraclass correlation is defined as $\rho_I = \sigma_A^2/(\sigma_A^2 + \sigma_e^2)$, where this index will be zero when $\sigma_A^2 = 0$, and it will be unity if $\sigma_e^2 = 0$ (assuming that $\sigma_A^2 > 0$). Notice that the intraclass correlation represents the true proportion of variance attributable to Factor $A$, and that the intraclass correlation is similar to the omega-squared index ($\omega^2$) in its general form, although the intraclass correlation ($\rho_I$) applies to the random-effect model whereas the omega-squared index ($\omega^2$) often applies only to the fixed-effect model (Hays, 1994, p. 535).
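Equivalently, from the point of view of statistical theory, the intraclass correlation can also be defined as the ordinary correlation coefficient between any two observations in the same class (group or family), say $y_{ij}$ and $y_{ij'}$ with $j \neq j'$, since their statistical relationship holds that

$$\mathrm{Corr}(y_{ij}, y_{ij'}) = \frac{\mathrm{Cov}(y_{ij}, y_{ij'})}{\sqrt{\mathrm{Var}(y_{ij})\,\mathrm{Var}(y_{ij'})}} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2} = \rho_I,$$

where $\mathrm{Cov}(y_{ij}, y_{ij'}) = \sigma_A^2$, and $\mathrm{Var}(y_{ij}) = \mathrm{Var}(y_{ij'}) = \sigma_A^2 + \sigma_e^2$ (Donner & Koval, 1980a).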
 
Since the maximum likelihood estimator above requires distributional assumptions on the observations (based upon multivariate normal theory), the analysis of variance (ANOVA) provides an alternative estimator of intraclass correlation (relaxing those assumptions) in classical linear models (Donner & Koval, 1980a). This practical method for estimating intraclass correlation utilizes the relevant information in the ANOVA table shown below (without loss of generality, a balanced design with equal group/family size is assumed).
Table 2.1 Analysis of Variance (ANOVA) for Intraclass Correlation (ICC) Calculations

Source of Variation    Degrees of Freedom (DF)    Sum of Squares (SS)    Mean Squares (MS)    F Statistic
Among Groups           k-1                        SSA                    MSA                  MSA / MSW
Within Groups          k(n-1)                     SSW                    MSW
Total                  kn-1                       SST

Here the between-group variation is SSA = $n\sum_{i=1}^{k}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$, the within-group variation is SSW = $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i\cdot})^2$, the total variation is SST = $\sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{\cdot\cdot})^2$, the mean squares among groups is MSA = SSA / DF(Among Groups) = SSA / (k-1), the mean squares within groups (or the mean squared error) is MSW = SSW / DF(Within Groups) = SSW / [k(n-1)], the between-group degrees of freedom DF(Among Groups) is $k-1$ (for $k$ = the number of groups), and the within-group degrees of freedom DF(Within Groups) is $k(n-1)$ (for $n$ = the number of within-group subjects).
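It is interesting to note that, by Hays (1994, pp. 533-535), the expectation of the mean square among groups is E[MSA] = $\sigma_e^2 + n\sigma_A^2$, and the expectation of the mean square within groups is E[MSW] = $\sigma_e^2$ (i.e., MSW, also written MSE, is an unbiased estimate of the error variance; Hays, 1994, p. 532). Therefore, the intraclass correlation estimator can be obtained indirectly (via ANOVA) as

$$\hat{\rho}_I = \frac{\hat{\sigma}_A^2}{\hat{\sigma}_A^2 + \hat{\sigma}_e^2} = \frac{MSA - MSW}{MSA + (n-1)MSW},$$

where the total variance consists of two independent variance components and hence is given by $\sigma_y^2 = \sigma_A^2 + \sigma_e^2$, with $\hat{\sigma}_A^2 = (MSA - MSW)/n$ and $\hat{\sigma}_e^2 = MSW$; the best estimate of the total variance ($\sigma_y^2$) uses the estimates of the group variance ($\hat{\sigma}_A^2$) and the error variance ($\hat{\sigma}_e^2$), so that $\hat{\sigma}_y^2 = \hat{\sigma}_A^2 + \hat{\sigma}_e^2$. Also notice that the unbiased estimate of the group variance may be found to be zero or negative ($\hat{\sigma}_A^2 \le 0$) when MSW is greater than or equal to MSA (Hays, 1994, p. 534).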
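For an unbalanced design (for unequal group sizes $n_i$) in ANOVA, a common family or group size $n_0$ is calculated to represent the mean number of within-group individuals, and the intraclass correlation coefficient (Donner & Koval, 1982) is given by

$$\hat{\rho}_I = \frac{MSA - MSW}{MSA + (n_0 - 1)MSW},$$

where $n_0 = \frac{1}{k-1}\left(N - \frac{\sum_{i=1}^{k} n_i^2}{N}\right)$ and $N$ is the total sample size (i.e., $N = \sum_{i=1}^{k} n_i$). Also note that, by Donner & Koval (1980a), the mean number of within-group subjects can alternatively be calculated by $n_0 = \bar{n} - \frac{\sum_{i=1}^{k}(n_i - \bar{n})^2}{(k-1)N}$, where the approximate group size is $\bar{n} = N/k$; this latter formula for the average within-group size ($n_0$) is mathematically equivalent to the former, yet its computation is more laborious. Since $\hat{\sigma}_A^2 = (MSA - MSW)/n_0$ and $\hat{\sigma}_e^2 = MSW$ are deemed, respectively, the unbiased estimates of $\sigma_A^2$ and $\sigma_e^2$, it is intuitive and straightforward to find the estimator of $\rho_I$ as

$$\hat{\rho}_I = \frac{\hat{\sigma}_A^2}{\hat{\sigma}_A^2 + \hat{\sigma}_e^2},$$

which is equivalent to the previous formula after substituting $\hat{\sigma}_A^2 = (MSA - MSW)/n_0$ and $\hat{\sigma}_e^2 = MSW$ (Donner & Koval, 1980a).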
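The ANOVA route to the intraclass correlation described above can be sketched as follows. This is a minimal illustration under the one-way random-effects layout, with within-group degrees of freedom $N - k$ (which reduces to $k(n-1)$ in a balanced design); the function name `anova_icc` and the small data sets are hypothetical.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    `groups` is a list of lists, one inner list of observations per
    family or group; group sizes may be unequal.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ssa = sum(n_i * (m - grand_mean) ** 2
              for n_i, m in zip(sizes, group_means))
    ssw = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)

    msa = ssa / (k - 1)
    msw = ssw / (total_n - k)

    # Adjusted mean group size n0; equals n when all groups have size n.
    n0 = (total_n - sum(n_i ** 2 for n_i in sizes) / total_n) / (k - 1)

    icc = (msa - msw) / (msa + (n0 - 1) * msw)
    return {"MSA": msa, "MSW": msw, "n0": n0, "ICC": icc}


# Hypothetical balanced data: three families with three members each.
balanced = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Hypothetical unbalanced data: family sizes 2, 3, and 4.
unbalanced = anova_icc([[1, 2], [3, 4, 5], [6, 7, 8, 9]])
```

In the balanced example, `n0` reduces to the common group size, so the estimate coincides with the balanced-design formula.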
As for statistical testing of the intraclass correlation, by Donner & Koval (1980a), there is a test of significance for the estimate of intraclass correlation in the analysis of variance using the $F$-distribution with $k-1$ and $N-k$ degrees of freedom at the chosen level of significance, for testing the hypotheses $H_0: \rho_I = 0$ vs. $H_1: \rho_I > 0$. A significant $F$ test statistic (i.e., $F = MSA/MSW$ exceeding the critical value) implies that members of the same group tend to be more alike with respect to the attribute or characteristic in question than those from different groups, and that the estimated intraclass correlation coefficient conveys the true proportion of variance accounted for in the population by that factor of interest (e.g., families or groups).
For another mathematical and statistical expression of the intraclass correlation index, the intraclass correlation coefficient can be re-defined using the quantity $\theta = \sigma_A^2/\sigma_e^2$ as

$$\rho_I = \frac{\theta}{1 + \theta},$$

where there is a basic statistical assumption of the normal distribution for the random effect ($a_i$) and the error term ($e_{ij}$) (Hays, 1994, p. 535). Further, in linear modeling theory (Hays, 1994, pp. 535-536; Kutner et al., 2005, pp. 1040-1041; Stapleton, 2009, p. 285), the test statistic of the proposed intraclass correlation estimator can be shown to satisfy

$$\frac{MSA/MSW}{1 + n\theta} \sim F_{k-1,\,k(n-1)},$$

where this proposed method is mainly based on the random-effect ANOVA with a balanced design, and the statistic follows an $F$ distribution with $k-1$ and $k(n-1)$ degrees of freedom, so that a $100(1-\alpha)\%$ confidence interval on $\theta$ can be obtained by

$$L \le \theta \le U, \quad L = \frac{1}{n}\left[\frac{F^*}{F_{1-\alpha/2;\,k-1,\,k(n-1)}} - 1\right], \quad U = \frac{1}{n}\left[\frac{F^*}{F_{\alpha/2;\,k-1,\,k(n-1)}} - 1\right],$$

where $F^* = MSA/MSW$ is the sample $F$ ratio value in the ANOVA table. By the algebraic relationship $\rho_I = \theta/(1+\theta)$, the corresponding interval for intraclass correlation is

$$\frac{L}{1+L} \le \rho_I \le \frac{U}{1+U},$$

where this confidence limit, with confidence coefficient $1-\alpha$, for intraclass correlation $\rho_I$ represents the degree of total variability accounted for by the mean differences among different factor levels (or the extent of variation between groups or families in the analysis of family data). Note that this interval estimate (for either $\theta$ or $\rho_I$) may not be very precise if it results from a relatively small sample size, or if variance components are much more difficult to estimate precisely than means (e.g., with relatively low reliability in measurements). Also note that it may occasionally happen that the lower limit of the confidence interval for either $\theta$ or $\rho_I$ is negative, but since this ratio ($\theta$ or $\rho_I$) cannot be negative, the usual practice is to replace the negative lower limit with the zero lower bound; that is, simply, zero in this case.
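The interval construction above can be sketched in code. This is an illustrative sketch under the balanced random-effects assumptions stated in the text: the function names are hypothetical, and the two $F$ quantiles are passed in by the caller (for example from a table or a statistics package) rather than computed here. The numbers in the usage lines are placeholders chosen for easy arithmetic, not real $F$ quantiles.

```python
def icc_from_f(f_ratio, n):
    """Point estimate of the ICC from F* = MSA / MSW and group size n.

    Algebraically the same as (MSA - MSW) / (MSA + (n - 1) * MSW).
    """
    return (f_ratio - 1.0) / (f_ratio + n - 1.0)


def icc_interval(f_ratio, n, f_upper, f_lower):
    """Confidence limits for the ICC in a balanced design.

    f_upper: F quantile at probability 1 - alpha/2 with (k-1, k(n-1)) df.
    f_lower: F quantile at probability alpha/2 with the same df.
    Negative limits for theta are truncated at zero, as in the text.
    """
    theta_low = max((f_ratio / f_upper - 1.0) / n, 0.0)
    theta_high = max((f_ratio / f_lower - 1.0) / n, 0.0)
    return theta_low / (1.0 + theta_low), theta_high / (1.0 + theta_high)


# Placeholder inputs: F* = 27 with n = 3; the quantiles 9.0 and 0.5 are
# illustrative stand-ins only, not values taken from an F table.
point = icc_from_f(27.0, 3)
low, high = icc_interval(27.0, 3, f_upper=9.0, f_lower=0.5)
```

Keeping the quantiles as arguments avoids tying the sketch to any particular statistics library; the caller is responsible for supplying quantiles with the correct degrees of freedom.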
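The maximum likelihood estimator of intraclass correlation can be derived by using the theory of the multivariate normal distribution (with a common mean and variance-covariance structure). Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{in_i})'$ represent measurements taken on the $i$-th group ($i = 1, \ldots, k$), each consisting of $n_i$ subjects, with a total size $N = \sum_{i=1}^{k} n_i$. Assume this $n_i$-variate follows a multivariate normal distribution

$$\mathbf{y}_i \sim N_{n_i}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),$$

or equivalently, the ($n_i$-variate normal) probability density function is given by

$$f(\mathbf{y}_i) = (2\pi)^{-n_i/2}\,|\boldsymbol{\Sigma}_i|^{-1/2}\exp\left\{-\frac{1}{2}(\mathbf{y}_i - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y}_i - \boldsymbol{\mu}_i)\right\},$$

where the mean vector is $\boldsymbol{\mu}_i = \mu\mathbf{1}_{n_i}$ for a common mean $\mu$ across all groups, and the variance-covariance matrix is $\boldsymbol{\Sigma}_i = \sigma^2[(1-\rho_I)\mathbf{I}_{n_i} + \rho_I\mathbf{1}_{n_i}\mathbf{1}_{n_i}']$ with the diagonal element $\sigma^2$ (a common variance across groups) and the off-diagonal element $\rho_I\sigma^2$ (a common covariance over groups), $|\boldsymbol{\Sigma}_i|$ denotes the determinant of $\boldsymbol{\Sigma}_i$ (i.e., the scaling factor in matrix algebra), and $i$ is the index of groups for $i = 1, \ldots, k$. In a balanced design (the common correlation model), the estimate of intraclass correlation $\rho_I$ can be obtained by using the Pearson product-moment correlation (Donner & Koval, 1980a), and the explicit form of the estimator can be expressed by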
  
  , where 
 and 
 represent the common sample mean and variance, respectively, 
and can be 
computed across all observations 
 using the concept of intraclass correlation by Fisher 
(1925a). 
 And, by a large sample theory (asymptotic normality), the variance of the proposed 
estimator is 
  

Note that when a balanced design is considered (i.e., $n_i = n$ for all $i$), this estimator is also equivalent to the maximum likelihood estimate (MLE) of intraclass correlation (i.e., the estimate obtained by maximizing the multivariate normal density).
 On the other hand, for an unbalanced design, the asymptotic (large sample) variance of 
the proposed estimator 

 is given by
  

  , where the sampling weights are 
 & , total sample 
, and Pearson correlation is used as the estimate of 
 (Donner & Koval, 1982). 
  In addition, as for the estimators of 
 and 
, the
 MLE solutions can be found by 
  
  and
  
  Hence, with 

 and 

, the MLE of intraclass correlation in this case can be 
computed as


  Alternatively, it is equivalent to
  

Note that Karlin et al. (1981) derived this MLE of intraclass correlation in an unbalanced design by using the invariance property of the MLE (if $\hat{\theta}$ is the MLE of $\theta$, then for any function $g$, the MLE of $g(\theta)$ is $g(\hat{\theta})$), while Donner & Koval (1980b) used a different approach, solving for the MLE of $\rho_I$ by numerically maximizing the log-likelihood function (the logarithm of the $n_i$-variate normal
density) with a scaling factor of 
 :  


  , where this optimization method takes differentiation with respect to 
 to find the MLE.
2.3 Hedges Approach
Hedges used intraclass correlation to summarize the information on variance components in multilevel structures with 2-level, 3-level, and 4-level hierarchical designs (Hedges et al., 2012; Hedges & Hedberg, 2013). Further, intraclass correlation has been considered an important statistic for providing design effect parameters for statistical planning (power analysis) in experimental design and survey sampling (e.g., randomized controlled trials or large-scale experiments in education settings). In hierarchical linear models, intraclass correlation plays a key role in quantifying the amount of inherent clustering (i.e., within-cluster similarity) in multilevel data. Looking back at the development of the ICC in hierarchical designs, the ICC was first introduced by Fisher (1925a), who created the oldest measure of within-group correlation and provided a significance testing procedure for experimental designs (such as RCTs and CRTs). Later on, Raudenbush (1997) built on hierarchical linear models in education to evaluate the clustering effect of multilevel data structures through the ICC. Furthermore, Hedges used the meta-analytic framework to rethink the ICC, using the design effect to improve multilevel designs in education and social research.

The Hedges theoretical framework of intraclass correlation in a multilevel design (like a cluster randomized trial, CRT) using a hierarchical linear model (HLM) is as follows.
In a two-level HLM, consider the variance components associated with the fully unconditional model (no covariates at any level of the model). Let $\sigma^2$ and $\tau^2$ be the variance components at Level 1 and Level 2, respectively, and $\hat{\sigma}^2$ and $\hat{\tau}^2$ be the MLEs of $\sigma^2$ and $\tau^2$, respectively. Let the variances of $\hat{\sigma}^2$ and $\hat{\tau}^2$ be $v_\sigma$ and $v_\tau$, respectively. Without loss of generality, suppose that $v_\sigma = 0$ (note: in most large-scale studies with a hierarchical design, the Level-1 variance component is usually known, i.e., $\sigma^2$ is a given constant and $v_\sigma = 0$, or can most likely be estimated precisely, i.e., $v_\sigma \approx 0$, since there are many Level-1 units that provide sufficient information for estimation; Hedges et al., 2012). Let $m$ denote the number of groups or clusters (Level-2 units) and $n_i$ denote the number of Level-1 units in the $i$-th Level-2 unit (group or cluster). When the study is a balanced design (i.e., $n_i = n$ for all $i$), the intraclass correlation in the two-level HLM is

$$\rho = \frac{\tau^2}{\tau^2 + \sigma^2},$$

the intraclass correlation estimator (based on cluster random samples) is given by

$$\hat{\rho} = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2},$$

and the asymptotic variance (based on large sample theory and the delta method) is

$$\mathrm{Var}(\hat{\rho}) \approx \frac{(1-\rho)^2\,v_\tau}{\sigma_T^4},$$

where the total variance component is $\sigma_T^2 = \tau^2 + \sigma^2$, and the variance of $\hat{\tau}^2$ is $v_\tau$, which is the variance (or squared standard error) estimate of the Level-2 variance component. As for the estimate of the variance of $\hat{\rho}$ (i.e., the sampling variability of the sample ICC), the large sample variance is given by

  , where 
 is the intraclass correlation estimate 

 (or 

), and the variance of 
 is defined by 
, so 
that the estimate is 
 for 
. (Note: the assumption of $v_\sigma = 0$, or $v_\sigma \approx 0$, is imposed on the large-sample variance of intraclass correlation estimates.)
Fisher (1925a, p. 220) derived a similar formula (large sample variance) for the intraclass correlation in a balanced design (note: Fisher did not consider the assumption of $v_\sigma = 0$):

Donner & Koval (1980b) showed the large sample variance of intraclass correlation in an unbalanced design (note: Donner & Koval did not consider the assumption of $v_\sigma = 0$) as
  

In a cluster (or group) randomized design, researchers often operate interventions or assign treatments at a group level (say Level 2, such as classrooms, schools, or sites) rather than at an individual level (say Level 1, for individual subjects like students), for the practical reason that it is sometimes too expensive (or even not feasible) to deliver an intervention to each subject rather than to an entire intact group (e.g., a whole community, school, worksite, or family). Therefore, cluster-randomized trials (or group-randomized experiments) have become more and more important and popular in educational and social research for effectively and economically evaluating educational and social interventions (Donner et al., 1981; Hauck et al., 1991; Hedges & Hedberg, 2007; Klar & Donner, 2015). For example, a research investigator could save money (or increase cost-effectiveness) by using group interventions, e.g., CRTs, instead of individual ones like RCTs (Tachibana et al., 2018). Also note that researchers find CRTs more suitable than RCTs for the construction of economically efficient and economically productive samples that have the desired statistical properties (Connelly, 2003).
In a theoretical framework of cluster sampling experiments (i.e., cluster-randomized trials), suppose a sample of $N$ subjects is collected from $m$ clusters (or organizational units such as classrooms, schools, or district sites) of group size $n$, which are assigned to an intervention (or a treatment group) with randomization. In this cluster sampling design, the $N = mn$ individual samples are not independent of each other, but rather are highly dependent on the cluster to which a subject belongs or is assigned (Lohr, 1999, Chapter 2, Simple Probability Samples, & Chapter 5, Cluster Sampling with Equal Probability). Therefore, the sampling distribution of a statistic using cluster samples needs to take into account both between-group correlation and within-group variation at the same time in analysis. Suppose that in this cluster sampling structure, the total variance $\sigma_T^2$ consists of a within-cluster variance $\sigma_W^2$ and a between-cluster variance $\sigma_B^2$, i.e., $\sigma_T^2 = \sigma_W^2 + \sigma_B^2$. Then, compared with the variance of the population mean estimator for a simple random sample, $\sigma_T^2/(mn)$, the variance of the population mean estimator for a cluster sample (from $m$ clusters of size $n$) is

$$\operatorname{Var}(\bar{Y}) = \frac{\sigma_T^2}{mn}\,[1 + (n-1)\rho],$$

where the intraclass (sometimes called intra-cluster) correlation coefficient is $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$, which provides a statistical measure of homogeneity within the clusters (i.e., if the clusters are perfectly homogeneous, then $\sigma_W^2 = 0$ and $\rho = 1$), and the design effect (DE) or variance inflation factor (VIF) is defined as $DE = 1 + (n-1)\rho$ (Donner et al., 1981; Lohr, 1999, pp. 138-140). Note that cluster sampling has more variation than simple random sampling by a factor of DE (i.e., VIF > 1) due to the major part of cluster-to-cluster variability plus the minor portion of within-cluster variance (i.e., samples in different clusters often vary more than samples in the same cluster). See Figure 2.4 for an example of a 2-level hierarchical structure with regard to intraclass correlation and design effect.
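As a numerical illustration of the paragraph above, the following minimal Python sketch computes the intraclass correlation and the design effect, assuming the standard forms $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$ and $DE = 1 + (n-1)\rho$. The function names and the example variance values are hypothetical and are not taken from the dissertation.

```python
def intraclass_correlation(var_between: float, var_within: float) -> float:
    """ICC: share of the total variance that lies between clusters."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


def design_effect(icc: float, cluster_size: float) -> float:
    """Design effect (variance inflation factor): 1 + (n - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def variance_of_mean(var_total: float, n_clusters: int,
                     cluster_size: int, icc: float) -> float:
    """Variance of the sample mean under cluster sampling.

    Equals the simple-random-sample variance, var_total / (m * n),
    multiplied by the design effect.
    """
    srs_variance = var_total / (n_clusters * cluster_size)
    return srs_variance * design_effect(icc, cluster_size)


if __name__ == "__main__":
    # Hypothetical variance components, chosen only for illustration.
    rho = intraclass_correlation(var_between=0.2, var_within=0.8)
    de = design_effect(rho, cluster_size=20)
    print(f"ICC = {rho:.3f}, design effect = {de:.3f}")
    print(f"Var(mean) = {variance_of_mean(1.0, 10, 20, rho):.5f}")
```

With an ICC of zero the design effect is 1 and the cluster sample behaves like a simple random sample; with perfectly homogeneous clusters (ICC of 1) the design effect equals the cluster size.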
In experimental design, statistical planning for sample size determination and power calculation is critical for researchers to produce evidence-based conclusions by rigorously detecting true effects at the desired level of significance. Traditionally, the experimental planning approach to sample size and power computation relies on the classical assumption of simple random samples. Power analysis for cluster sampling designs or group-randomized experiments, by contrast, needs to use the intraclass correlation coefficient along with non-centrality parameters (of the noncentral test distribution) to account for variability in the multilevel design (e.g., between-group and within-group variation) (Cohen, 1992; Hedges & Hedberg, 2007, 2013; Raudenbush, 1997; Rutterford et al., 2015).
Figure 2.4 Intraclass Correlation & Design Effect in 2-Level Hierarchical Linear Model

Note. Each level has its own variation, where variation between sites is sigma-square of between, variation within site is sigma-square of within, and the total variation is the sum of the two.

In a two-level hierarchical design structure (i.e., individuals are at level 1, and groups or clusters are at level 2), the unconditional model (involving no covariates) is written as

$$Y_{ij} = \mu + u_j + e_{ij},$$

where $Y_{ij}$ represents an outcome for the $i$-th individual subject (at level 1) in the $j$-th cluster group (at level 2), $\mu$ is a grand mean outcome, $e_{ij}$ is a random error term at level 1 (i.e., $e_{ij} \sim N(0, \sigma_W^2)$) corresponding to the $i$-th person in the $j$-th group, $u_j$ is a random effect (i.e., $u_j \sim N(0, \sigma_B^2)$) associated with the $j$-th cluster (or a random error term at level 2), the within-group (between-person) variance component is given by $\operatorname{Var}(e_{ij}) = \sigma_W^2$, the between-group variance component is given by $\operatorname{Var}(u_j) = \sigma_B^2$, and the random error terms at level 1 and level 2 are not correlated (i.e., $\operatorname{Cov}(e_{ij}, u_j) = 0$). The (unconditional) intraclass correlation coefficient associated with the unconditional model is

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2},$$

where the (unconditional) total variance is defined as $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$, and $\sigma_W^2$ and $\sigma_B^2$ represent the error variances corresponding to the within- and between-group random variation, respectively.

In a hierarchical design (such as a cluster-randomized experiment) involving statistical adjustment by covariate(s), the (covariate-adjusted, or conditional) intraclass correlation is defined by

$$\rho_A = \frac{\sigma_{AB}^2}{\sigma_{AB}^2 + \sigma_{AW}^2} = \frac{\sigma_{AB}^2}{\sigma_{AT}^2},$$

where the (covariate-adjusted) total variance is defined as $\sigma_{AT}^2 = \sigma_{AB}^2 + \sigma_{AW}^2$, and $\sigma_{AW}^2$ and $\sigma_{AB}^2$ represent the error variances (random-effect variance components adjusted by covariates) corresponding to the within- and between-group random variation, respectively.
In order to evaluate the relative efficiency between unconditional and conditional hierarchical models, Hedges & Hedberg (2007) proposed two statistical auxiliary quantities,

$$\eta_B^2 = \frac{\sigma_{AB}^2}{\sigma_B^2} \quad \text{and} \quad \eta_W^2 = \frac{\sigma_{AW}^2}{\sigma_W^2},$$

where $\eta_B^2$ indicates the proportion of between-group variance remaining, and $\eta_W^2$ indicates the proportion of within-group variance remaining. Note that these two measures, along with $R_B^2$ and $R_W^2$, provide useful information on statistical variation for power and sample size computations, where $R_B^2 = 1 - \eta_B^2$ and $R_W^2 = 1 - \eta_W^2$ are defined as the proportion of between-group and within-group variance explained by covariate(s) in a hierarchical design, respectively.
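The relationship among the unconditional ICC, the covariate-adjusted ICC, and the Hedges & Hedberg (2007) auxiliary quantities can be sketched in Python as follows. This is a minimal illustration that assumes the variance components have already been estimated from an unconditional and a conditional model; the class name, function name, and numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    """Between- and within-group variance components of a two-level model."""
    between: float
    within: float

    @property
    def icc(self) -> float:
        # Ratio of between-group variance to total variance.
        return self.between / (self.between + self.within)


def efficiency_measures(unconditional: VarianceComponents,
                        conditional: VarianceComponents) -> dict:
    """Compare a covariate-adjusted model with the unconditional model.

    eta2_*: proportion of variance remaining after covariate adjustment.
    r2_*:   proportion of variance explained by the covariate(s).
    """
    eta2_between = conditional.between / unconditional.between
    eta2_within = conditional.within / unconditional.within
    return {
        "icc_unconditional": unconditional.icc,
        "icc_conditional": conditional.icc,
        "eta2_between": eta2_between,
        "eta2_within": eta2_within,
        "r2_between": 1.0 - eta2_between,
        "r2_within": 1.0 - eta2_within,
    }


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    base = VarianceComponents(between=0.2, within=0.8)
    adjusted = VarianceComponents(between=0.1, within=0.6)
    for name, value in efficiency_measures(base, adjusted).items():
        print(f"{name}: {value:.4f}")
```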
   2.4. Raykov Approach
In classical test theory (CTT), a given test score ($X$) consists of two parts: the true score ($T$) and the measurement error ($E$) (Raykov & Marcoulides, 2011, pp. 117-118); hence, the relationship can be mathematically described as $X = T + E$, where the true score variance is $\operatorname{Var}(T)$, the error variance is $\operatorname{Var}(E)$, and the true score and error score are assumed to be mutually independent, i.e., $\operatorname{Cov}(T, E) = 0$. According to the CTT equation, the reliability coefficient ($\rho_X$) is the ratio of the true score variance to the observed score variance, and can be expressed as $\rho_X = \operatorname{Var}(T)/\operatorname{Var}(X) = \operatorname{Var}(T)/[\operatorname{Var}(T) + \operatorname{Var}(E)]$, which is equivalent to the idea of the $R^2$ index in regression analysis when predicting the true score from the observed score. Moreover, it is interesting to note that the standard error of measurement (SEM) is $\sqrt{\operatorname{Var}(E)} = \sqrt{\operatorname{Var}(X)\,(1 - \rho_X)}$ (Raykov & Marcoulides, 2011, pp. 137-145). Thereby, within the CTT framework, there appears to be a strong connection between the reliability coefficient and the intraclass correlation coefficient in terms of statistical concepts and mathematical definitions (i.e., both are built on the proportion of variance accounted for).
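By the latent variable modeling (LVM) approach (Bartholomew, 1987), Raykov & Penev (2010) showed a procedure to evaluate reliability coefficients (such as point and interval estimators) in 2-level HLM unconditional and conditional models, and further derived standard error (SE) estimates for reliability coefficients with the logit transformation (i.e., $\kappa = \ln[\rho_X/(1 - \rho_X)]$) via the Taylor series expansion method (also known as the Delta method) as $SE(\hat{\kappa}) = SE(\hat{\rho}_X)/[\hat{\rho}_X(1 - \hat{\rho}_X)]$, which can lead to a $100(1-\alpha)\%$ large-sample confidence interval using the standard normal $Z$ distribution, $(\hat{\kappa} - z_{\alpha/2}\,SE(\hat{\kappa}),\ \hat{\kappa} + z_{\alpha/2}\,SE(\hat{\kappa}))$, whose limits are transformed back to the reliability scale by $1/(1 + e^{-\kappa})$, where $\hat{\kappa}$ is the logit-transformed reliability coefficient estimate and $SE(\hat{\kappa})$ is its standard error.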
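The logit-based interval described above can be sketched in Python. This is a minimal illustration of the Delta-method idea under stated assumptions (a coefficient strictly between 0 and 1 and a normal approximation on the logit scale), not a reproduction of Raykov & Penev's (2010) procedure; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def logit_confidence_interval(estimate: float, std_error: float,
                              level: float = 0.95) -> tuple:
    """Large-sample CI for a coefficient in (0, 1) via the logit transform.

    Delta method: SE(logit(r)) is approximately SE(r) / (r * (1 - r)).
    The interval is built on the logit scale and then mapped back, so its
    limits always stay inside (0, 1).
    """
    if not 0.0 < estimate < 1.0:
        raise ValueError("estimate must lie strictly between 0 and 1")
    kappa = math.log(estimate / (1.0 - estimate))
    se_kappa = std_error / (estimate * (1.0 - estimate))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    lower = kappa - z * se_kappa
    upper = kappa + z * se_kappa

    def expit(x: float) -> float:
        # Inverse of the logit transform.
        return 1.0 / (1.0 + math.exp(-x))

    return expit(lower), expit(upper)


if __name__ == "__main__":
    # Hypothetical estimate and standard error, for illustration only.
    low, high = logit_confidence_interval(estimate=0.80, std_error=0.05)
    print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Working on the logit scale is a design choice that avoids the out-of-range limits a symmetric interval on the original scale can produce when the estimate is close to 0 or 1.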
As for intraclass correlation coefficients (ICC) in hierarchical designs (e.g., two-level models) within the LVM framework (Bartholomew et al., 2011), Raykov (2011) used restricted maximum likelihood (REML) estimators to find the ICC in the two-level HLM structure (also known as one-way random-effects ANOVA):

$$Y_{ij} = \mu + u_j + e_{ij},$$

where $Y_{ij}$ represents a response outcome score for the $i$-th individual subject (at level 1; $i = 1, \ldots, n_j$) in the $j$-th cluster group (at level 2; $j = 1, \ldots, m$), $\mu$ is the grand mean, $e_{ij}$ is a random error term at level 1, assumed to be normally distributed with mean $0$ and within-group variance $\sigma_W^2$ (i.e., $e_{ij} \sim N(0, \sigma_W^2)$), corresponding to the $i$-th person in the $j$-th group, $u_j$ is a random effect, assumed to be normally distributed with mean $0$ and between-group variance $\sigma_B^2$ (i.e., $u_j \sim N(0, \sigma_B^2)$), associated with the $j$-th cluster as a deviation term at level 2, and the random error terms at level 1 and level 2 are supposed to be mutually uncorrelated (i.e., $\operatorname{Cov}(e_{ij}, u_j) = 0$). In this LVM framework, the ICC is defined as the ratio of the between-group variance to the observed total variance,

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2},$$

where the within-group variance is $\sigma_W^2 = \operatorname{Var}(e_{ij})$, and the between-group variance is $\sigma_B^2 = \operatorname{Var}(u_j)$. The visualization of this LVM modeling approach using a path diagram is shown in Figure 2.5.
Figure 2.5 Latent Variable Model for Estimation of Intraclass Correlation in 2-Level Design

Note. The path diagram is inspired by the visualization of 2-level random coefficient models in the book on statistical multilevel modeling (Muthén & Muthén, 2012, Chapters 9 & 10).

With the invariance property of MLE for the variance estimates in LVM, the ICC is
given by $\hat{\rho} = \hat{\sigma}_B^2/(\hat{\sigma}_B^2 + \hat{\sigma}_W^2)$, where $\hat{\sigma}_B^2$ and $\hat{\sigma}_W^2$ are the between- and within-group variance estimates, respectively, obtained by the REML method in the two-level LVM model. Note that, according to Casella & Berger (2002, p. 320), the invariance property of MLEs is stated as follows: if $\hat{\theta}$ is the MLE of $\theta$, then for any one-to-one function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat{\theta})$. As for hypothesis testing, the test statistic for intraclass correlation is
given by the pivotal quantity

$$Z = \frac{\hat{\rho} - \rho_0}{SE(\hat{\rho})},$$

which approximately follows a standard normal $Z$ distribution and is used to test the simple hypotheses $H_0: \rho = \rho_0$ vs. $H_1: \rho \neq \rho_0$ (i.e., a two-tailed test at the significance level of $\alpha$), or $H_0: \rho = \rho_0$ vs. $H_1: \rho > \rho_0$ (i.e., a one-tailed test at the $\alpha$ level), although this analytic strategy may only work in the large-sample case, and the lower bound of an interval estimate for $\rho$ by this method may fall below zero (i.e., an out-of-bounds value outside the valid domain of the ICC, $[0, 1]$).

The LVM procedure can also be extended and used to evaluate the ICC in two-level
designs with discrete response variables (Raykov & Marcoulides, 2015a). Suppose the same two-level LVM setting as above, but assume that the observed outcome score $Y_{ij}$ is recorded on a categorical scale (i.e., a discrete variable for the $i$-th unit at level 1, the individual subject ($i = 1, \ldots, n_j$), in the $j$-th unit at level 2, the cluster group ($j = 1, \ldots, m$)). In this situation with categorical responses, the traditional approach to ICC estimation (which presumes the outcome is continuous) needs to be modified by the following procedure via the LVM framework (Raykov & Marcoulides, 2011, Chapter 10, Introduction to Item Response Theory). First, consider the underlying latent variable $Y_{ij}^{*}$ for the observed categorical outcome $Y_{ij}$ ($K$ possible categories) as

$$Y_{ij} = k \quad \text{if } \tau_{k-1} < Y_{ij}^{*} \le \tau_k, \qquad k = 1, \ldots, K,$$

with $\tau_0 = -\infty$ and $\tau_K = +\infty$, where $Y_{ij}^{*}$ plays the important role of a continuous latent variable (i.e., $-\infty < Y_{ij}^{*} < \infty$), which is not only linked with the observed measure $Y_{ij}$ by a one-to-one linear transformation from one domain (latent space) to another (real space), but is also used to assign a specific categorical value through the given threshold points from $\tau_1$ to $\tau_{K-1}$ (note: each threshold or cut-off point is a real number, and it holds that $\tau_1 < \tau_2 < \cdots < \tau_{K-1}$) (Raykov & Marcoulides, 2015b).
Given this underlying latent structure above, the ICC estimator for a binary outcome (a special case of categorical outcome variables; Raudenbush & Bryk, 2002, p. 334) can be derived as

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \pi^2/3},$$

where $\sigma_B^2$ is the between-group variation, and $\pi$ is the mathematical constant $\pi \approx 3.14159$ (note: the standard logistic distribution, with location $0$ and scale $1$, has a variance of $\pi^2/3$). Also notice that this ICC estimator for the dichotomous outcome case (say, $Y_{ij} = 0$ or $1$) makes a strong assumption that the within-group variance $\sigma_W^2$ is held at a constant of $\pi^2/3$ over all groups, which may not hold for within-group variation in real-life data, and so modified analytic strategies are needed for building non-constant within-group variances (which are data-driven and more flexible for a real-world situation) into hierarchical generalized linear models (HGLM).
Furthermore, the standard error of the ICC above (for the binary response case) can be approximately derived via the Delta method (Raykov & Marcoulides, 2004; Hedges et al., 2012), which is given by

$$SE(\hat{\rho}) \approx \frac{1 - \hat{\rho}}{\hat{\sigma}_T^2}\,SE(\hat{\sigma}_B^2),$$

where $\hat{\rho}$ is the ICC estimate, the total variance estimate is $\hat{\sigma}_T^2 = \hat{\sigma}_B^2 + \pi^2/3$ (assuming the within-group variance is a constant of $\pi^2/3$), and $\hat{\sigma}_B^2$ is the between-group variance estimate.
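The binary-outcome ICC and its Delta-method standard error can be illustrated with a short Python sketch. It assumes the logistic latent-variable formulation, in which the within-group variance is fixed at $\pi^2/3$, and it takes the between-group variance estimate and its standard error as given inputs (in practice these would come from a fitted HGLM). The function names and numeric values are hypothetical.

```python
import math

# Variance of the standard logistic distribution (location 0, scale 1).
LOGISTIC_VARIANCE = math.pi ** 2 / 3.0


def latent_icc(var_between: float) -> float:
    """ICC for a binary outcome on the latent logistic scale."""
    return var_between / (var_between + LOGISTIC_VARIANCE)


def latent_icc_se(var_between: float, se_var_between: float) -> float:
    """Delta-method standard error of the latent-scale ICC.

    With rho = v / (v + c) and c = pi^2 / 3, the derivative with respect
    to v is c / (v + c)^2, which equals (1 - rho) / (v + c).
    """
    total = var_between + LOGISTIC_VARIANCE
    return LOGISTIC_VARIANCE * se_var_between / total ** 2


if __name__ == "__main__":
    # Hypothetical between-group variance estimate and its standard error.
    v_hat, se_v = 0.50, 0.20
    print(f"ICC = {latent_icc(v_hat):.4f}")
    print(f"SE(ICC) = {latent_icc_se(v_hat, se_v):.4f}")
```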
CHAPTER 3

LITERATURE IN REHABILITATION COUNSELING

This chapter presents the literature on evidence-based practice (EBP) in rehabilitation counseling using the RSA-911.
The state vocational rehabilitation (VR) agencies collect and report summary data in a federally mandated format called the Rehabilitation Services Administration (RSA) Case Service Report, also known as the RSA-911 (Schwanke & Smith, 2004). The RSA-911 provides researchers in the field of rehabilitation counseling with an open playground and an additional resource for deep learning and data mining. Not only does the RSA-911 allow multi-faceted explorations of complex issues about people with disabilities in VR, but rehabilitation researchers can also probe extensively into big data to examine the hidden components or latent factors contributing to successful VR outcomes (Pi & Thielsen, 2011). Moreover, rehabilitation practitioners and scholars can take full advantage of the RSA-911 data to develop evidence-based practices, particularly individual-level and employment-focused interventions, effective strategies, and best practices to promote independent living and positive outcomes for individuals with disabilities (Fleming et al., 2013).
EBP, as a cookbook approach to rehabilitation counseling (Kosciulek, 2010), provides the fundamental framework for rehabilitation counseling practitioners, incorporating the available scientific evidence with clinical judgment to make the best decisions about interventions, services, or treatments for people with disabilities. In this manner, EBP guidelines also suggest that rehabilitation counselors identify relevant literature and systematic research, and assess different available information resources such as the
RSA
-
 services for people 
with disabilities. So, with the data-driven framework using information from the RSA-911, which research method or statistical approach can provide insight into what works best for whom (target populations), how (intervention or treatment programs), and under what conditions (rehabilitation support or other types of services)? This literature review surveys recent academic knowledge on those key questions and provides a firm foundation for this study.

The following is a summary of the literature on statistical methods using the RSA-911.
3.1. Multilevel Analysis
Hierarchical data structures are often seen in educational and social research studies. For example, in rehabilitation counseling, VR clients are grouped into organizational buildings and structures or field offices, which are nested within different local districts, and local districts can be nested within states or regions, and so on. So, it is important to take into account all these hierarchical data structures and topological data relationships by using multilevel analysis (hierarchical linear models). Note that conventional regression models often perform poorly in statistical estimation and inference (e.g., biased standard errors and relative bias in the ICC) with hierarchically structured data due to non-normal residuals resulting from the interrelation between subjects (which leads to violations of the important assumptions of independence, homogeneity, and normality) (Maas & Hox, 2004; Raudenbush & Bryk, 2002).

Chan and his colleagues (2014) used RSA-911 data from FY 2005 (before the economic recession) and FY 2009 (after the economic recession) to study the impact of the contextual factor of state unemployment rate on employment opportunities and outcomes in VR. Using the (2-level) hierarchical (generalized) linear modeling approach, they found that the state unemployment rate (the contextual variable) had a significant moderating effect on the relationship between personal factors (demographic and disability variables) and competitive employment.
Alsaman & Lee (2017) examined the relationships between contextual factors, individual factors, and employment outcomes of transition youth with disabilities in VR using the RSA-911 in FY 2013 with 2-level hierarchical generalized linear modeling. They found that state unemployment rates had indirect interaction effects on the relationships between individual characteristics, rehabilitation services, and successful employment. For example, as the state unemployment rate increased, the disparity in successful VR closure decreased across some types of disabilities such as intellectual disabilities, TBI, or youth with autism and other communicative disabilities (in comparison to the reference group of physical disabilities).
Pi (2006) constructed a 2-level hierarchical structure model with the micro- and macro-level factors related to VR outcomes using the RSA-911 in FY 2002. Results showed that the micro-level variables (i.e., age, education, minority, SSI/DI, disability significance, and the services of rehabilitation technology, job placement assistance, on-the-job support, and diagnosis & treatment) were more related to rehabilitation outcomes than the macro-level variables (i.e., counselors who met CSPD requirements, proportion of clients with significant disabilities, unemployment rate, and proportion of minority population). Note: CSPD = Comprehensive System of Personnel Development.
3.2. Structural Equation Model

Structural equation modeling (SEM) with latent constructs (unobserved factors) and manifest variables (observed realizations) is one type of structural causal modeling (statistical models for causation) that is built (through a path diagram for visualization) to identify the underlying factor structure explaining the direct and/or indirect effects of latent constructs and their inter-relationships on outcomes of interest (Raykov & Marcoulides, 2006). In the VR context, SEM can be used to understand complex theoretical models (or EBP) and to find important predictive associations (using latent factor analysis) among individual characteristics, rehabilitation services, and employment outcomes (Austin & Lee, 2014).
Kosciulek & Merz (2001) conducted a structural analysis of the consumer-directed theory of empowerment for consumers with disabilities in the community rehabilitation program.

Chan et al. (2007) provided an overview of the basic concepts and applications of SEM (e.g., confirmatory factor analysis) in counseling, psychology, and rehabilitation research.

Austin & Lee (2014) built a structural equation model of VR services (consisting of job-related and person-related factors) using the RSA-911 in FY 2009 to study predictors of employment outcomes in VR for people with intellectual and co-occurring psychiatric disabilities. The study found that job-related services, such as job placement, job search, job readiness, and on-the-job support, significantly predicted competitive employment outcomes.

3.3. Classification Tree Model

The tree model is a data-mining technique that uses the CHAID (Chi-squared Automatic Interaction Detection) classification algorithm to explore hidden relationships and predictive information in a large database (Tan et al., 2005). In the classification tree procedure, the tree-based model is designed to classify all subjects into homogeneous subgroups by their attributes. Additionally, the tree model is quite useful for uncovering a complex multivariate system like the VR process by providing useful information.

Rosenthal et al. (2007) used the data mining approach with RSA-911 data in FY 2001 to examine factors (i.e., services) affecting outcomes in the VR process for individuals with psychiatric disabilities. Results showed that receiving job placement services was the most important variable and had a positive effect for the target population in VR.

Schoen (2010) and Schoen & Leahy (2012) examined demographics, services, and employment outcomes for people with spinal cord injury in VR between FY 2004 and FY 2008 using data mining models with RSA-911 data. Findings suggested that the most significant predictors of employment were level of education attained, cost of purchased services, days from application to closure, rehabilitation technology, job placement assistance, and job supports.
Lee and his colleagues (2012) and Lee (2014) sought to discover evidence-based best practices in VR using a data mining approach of decision (or classification) tree models with the RSA-911 data in FY 2011 and FY 2013, respectively, to study the inter-relationships of VR measurements among service delivery, personal backgrounds, and rehabilitation outcomes for people with disabilities in the State of Michigan.

3.4. Other Methods such as Social Network Analysis and Spatial Analysis
Spatial analysis is a type of geographical (locational) statistical analysis that seeks to explain patterns of human behavior (e.g., rehabilitation outcomes) and their spatial expression (residential areas). A geostatistical model can predict spatial patterns (using geographical information) in complex networks or systems (like the RSA-911) to support spatial decision-making and to solve geographic issues in planning and policy development (Mayhew, 2015).

Sink et al. (2014) developed location theory in VR to study the effectiveness of service delivery and consumption for persons with disabilities using a geographic information system (GIS) and data from the West Virginia Division of Rehabilitation Services (including RSA-911 and Census data). The findings supported the value of public VR field office or facility location and its effectiveness and efficiency in helping people with disabilities achieve or maintain employment.
Social network analysis is the process of investigating social structures through the use of graph and network theory. The social networking model characterizes individual links or ties (relationships or interactions) within a networked structure (such as the VR system). One key feature of social network analysis is visual representation (via sociograms), which provides pivotal information about attributes within a network (e.g., positive or negative relationships between services and outcomes in the VR network data) (Schneider, 2018).

Ditchman et al. (2018) applied social network analysis to the RSA-911 data in FY 2009 to examine service patterns and their relationships with employment outcomes for transition-age individuals with autism spectrum disorder (ASD). Six core VR services were found to be positively linked with a better employment outcome: assessment, counseling & guidance, job placement, job search, job support, and transportation.
3.5 Justification for Covariates Used in Multilevel Analysis

The Rehabilitation Act of 1973 (and its Amendments of 1986 and 1992) was legislated with the goal of providing individuals with disabilities with the same opportunities to achieve employment, independent living, and self-sufficiency as the general population without disabilities. Under the law, state VR programs are to help people with disabilities obtain or maintain employment through rehabilitation services, which may include, but are not limited to, assessment, vocational rehabilitation counseling & career guidance, educational training (e.g., colleges or universities), job coaching, job placement services, on-the-job support training, transportation, and miscellaneous services (see Appendix A for the definitions of VR variables used in the study; Rehabilitation Services Administration Policy Directive, 2013). Many research studies have been conducted to examine the relationships between various factors (i.e., individual characteristics, VR services, VR counselors, and environmental factors) and rehabilitation outcomes. Based on a systematic review of VR outcomes in relation to VR factors, previous rehabilitation studies confirm that the VR variables of interest in this study (including individual characteristics, employment backgrounds, and rehabilitation services) are all supported by the VR foundations, with significant associations with successful employment outcomes for people with disabilities (Alsaman & Lee, 2016; Bolton et al., 2000; Chan et al., 2014; Dutta et al., 2008; Moore et al., 2000, 2001, 2002a, 2002b, 2004).
CHAPTER 4

METHODS AND RESEARCH QUESTIONS

This chapter provides analytic strategies of experimental planning for cluster (or group) randomized designs with respect to power and sample size calculations using the intraclass correlation coefficient (ICC) via the hierarchical linear model (HLM) and the hierarchical generalized linear model (HGLM). Using bootstrapping simulations (Givens & Hoeting, 2012; Rizzo, 2007), methods are proposed to evaluate the statistical performance of the ICC, in terms of relative bias, estimation error, and inference on the parameter, via HLM & HGLM using the real RSA-911 data set from the U.S. Department of Education and Labor. In the RSA-911 data of this study, the target population consists of people with disabilities who were served in Michigan in fiscal year (FY) 2015. In addition, a two-stage sampling approach is used to generate the simulated data sets with the cluster-randomized design structure, where the individual subject (person with disability) is at Level 1 and the structure (rehabilitation office) is at Level 2.
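The two-stage sampling idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the source does not say whether each stage samples with or without replacement, so the sketch draws offices without replacement and subjects within each selected office with replacement, and it uses synthetic records rather than RSA-911 values. All function names and parameters are hypothetical.

```python
import random
from collections import defaultdict


def two_stage_sample(records, n_clusters, n_per_cluster, rng):
    """Draw a two-stage cluster sample from (cluster_id, outcome) records.

    Stage 1 selects clusters without replacement; stage 2 selects subjects
    with replacement inside each chosen cluster (both are assumptions).
    """
    by_cluster = defaultdict(list)
    for cluster_id, outcome in records:
        by_cluster[cluster_id].append(outcome)
    if n_clusters > len(by_cluster):
        raise ValueError("more clusters requested than available")
    chosen = rng.sample(sorted(by_cluster), n_clusters)
    return {cid: rng.choices(by_cluster[cid], k=n_per_cluster)
            for cid in chosen}


def make_synthetic_records(n_offices=33, seed=0):
    """Hypothetical stand-in for office-level data (not RSA-911 values)."""
    rng = random.Random(seed)
    records = []
    for office in range(n_offices):
        office_effect = rng.gauss(0.0, 1.0)      # level-2 random effect
        for _ in range(rng.randint(30, 60)):     # unequal office sizes
            records.append((office, office_effect + rng.gauss(0.0, 2.0)))
    return records


if __name__ == "__main__":
    data = make_synthetic_records()
    sample = two_stage_sample(data, n_clusters=5, n_per_cluster=10,
                              rng=random.Random(1))
    for office, outcomes in sample.items():
        print(office, len(outcomes))
```

Repeating the call with a fresh random state for each replication would give the repeated cluster samples on which an ICC estimator can be evaluated.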
4.1 Research Methods

Three proposed ICC estimation methods are shown for different statistical settings and experimental design purposes using multilevel models:

Method 1: the ICC estimator (via Pearson correlation & F of ANOVA) for a balanced design (equal size of $n$ individual subjects across $m$ groups) is shown in Equations 1 and 2:

$$\hat{\rho} = \frac{n}{n-1}\cdot\frac{\frac{1}{m}\sum_{j=1}^{m}(\bar{y}_j - \bar{y})^2}{s^2} - \frac{1}{n-1} \tag{1}$$

and

$$\hat{\rho} = \frac{MSA - MSW}{MSA + (n-1)\,MSW} \tag{2}$$

where $n$ is the group sample size, $j$ is the index of samples, the among-group mean is $\bar{y}_j$ (from the $j$-th sample over all $m$ groups), the common mean is $\bar{y}$, the common variance is $s^2$, and $MSA$ and $MSW$ are the Mean Squares Among and Mean Squares Within from ANOVA, respectively.
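Method 2: the ICC estimator (via Pearson correlation & F of ANOVA) for an unbalanced design (unequal size of $n_j$ individual subjects across $m$ groups, for $j = 1, \ldots, m$) is shown in Equation 3:

$$\hat{\rho} = \frac{MSA - MSW}{MSA + (n_0 - 1)\,MSW}, \qquad n_0 = \frac{1}{m-1}\left(N - \frac{\sum_{j=1}^{m} n_j^2}{N}\right) \tag{3}$$

where $n_0$ is the adjusted group sample size for ICC estimation, $N$ is the total sample size (i.e., $N = \sum_{j=1}^{m} n_j$), and $MSA$ and $MSW$ are the Mean Squares Among and Mean Squares Within from ANOVA, respectively. Note that the Pearson correlation estimate requires numerical approximation of -2 log likelihood.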
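The ANOVA-based estimator in Methods 1 and 2 can be sketched in Python. The sketch assumes the standard one-way ANOVA form, $(MSA - MSW)/(MSA + (n_0 - 1)\,MSW)$, with the usual adjusted group size $n_0$ for unequal groups (which reduces to $n$ when all groups have the same size). It is an illustration, not the dissertation's own code; the function names and the toy data are hypothetical.

```python
def adjusted_group_size(sizes):
    """Adjusted group size n0 for unequal groups; equals n if balanced."""
    m = len(sizes)
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (m - 1)


def anova_icc(groups):
    """One-way ANOVA estimator of the ICC.

    `groups` is a list of lists, one inner list of outcomes per cluster.
    """
    m = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    if m < 2 or total <= m:
        raise ValueError("need at least 2 groups and some replication")
    grand_mean = sum(sum(g) for g in groups) / total
    means = [sum(g) / len(g) for g in groups]
    # Sums of squares among (between) groups and within groups.
    ss_among = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    msa = ss_among / (m - 1)
    msw = ss_within / (total - m)
    n0 = adjusted_group_size(sizes)
    denominator = msa + (n0 - 1.0) * msw
    if denominator == 0:
        raise ValueError("no variation in the data")
    return (msa - msw) / denominator


if __name__ == "__main__":
    # Toy unbalanced data, for illustration only.
    toy = [[4.1, 3.9, 4.4], [5.2, 5.0], [3.1, 3.3, 2.9, 3.0]]
    print(f"n0 = {adjusted_group_size([len(g) for g in toy]):.3f}")
    print(f"ICC = {anova_icc(toy):.3f}")
```

The estimator can be negative when the within-group mean square exceeds the among-group mean square, which is one reason interval estimates for the ICC may fall outside its valid range.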
Method 3: find auxiliary information (based on the ICC estimate from Method 1 or 2) for experimental planning in designs (design effect and minimum detectable effect size with respect to desired power & required sample size):

(a) Design effect (DE), or variance inflation factor (VIF), is defined in Equation 4 as

$$DE = 1 + (n-1)\rho \tag{4}$$

where the intraclass correlation coefficient is $\rho = \sigma_B^2/\sigma_T^2$, or alternatively $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$, which provides a statistical measure of homogeneity within the clusters, and $n$ is the group sample size for a balanced design (or, alternatively, the adjusted group size $n_0$ from Method 2 can be substituted for $n$ in an unbalanced design).

(b) The unconditional intraclass correlation coefficient is shown in Equation 5:

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2} \tag{5}$$

where the unconditional total variance is $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$, and $\sigma_W^2$ and $\sigma_B^2$ represent the error variances corresponding to the within- and between-group variation, respectively.
In a hierarchical design, such as a cluster-randomized experiment, involving statistical adjustment by covariate(s), the conditional (or covariate-adjusted) intraclass correlation is described in Equation 6:

$$\rho_A = \frac{\sigma_{AB}^2}{\sigma_{AB}^2 + \sigma_{AW}^2} = \frac{\sigma_{AB}^2}{\sigma_{AT}^2} \tag{6}$$

where the covariate-adjusted total variance is $\sigma_{AT}^2 = \sigma_{AB}^2 + \sigma_{AW}^2$, and $\sigma_{AW}^2$ and $\sigma_{AB}^2$ represent the random-effect variance components, adjusted by covariates, corresponding to the within- and between-group random variation, respectively.
(c) Four proposed statistical auxiliary quantities for evaluating the relative efficiency between unconditional and conditional hierarchical models are shown as follows. The first two, for measuring the variance remaining, are described in Equations 7 and 8:

$$\eta_B^2 = \frac{\sigma_{AB}^2}{\sigma_B^2} \tag{7}$$

and

$$\eta_W^2 = \frac{\sigma_{AW}^2}{\sigma_W^2} \tag{8}$$

where $\eta_B^2$ indicates the proportion of between-group variance remaining, and $\eta_W^2$ indicates the proportion of within-group variance remaining.

The other two supplementary measures, for variance explained by covariates (serving as the complements of the measurements in Equations 7 and 8), are described in Equations 9 and 10:

$$R_B^2 = 1 - \eta_B^2 \tag{9}$$

and

$$R_W^2 = 1 - \eta_W^2 \tag{10}$$

where $R_B^2$ and $R_W^2$ are defined as the proportion of between-group and within-group variance explained by covariate(s) in a hierarchical design, respectively.
4.2 Proposed Models

Four hierarchical modeling structures (Models 1-4, shown below) are considered in the study to test the proposed methods. In addition, significance of disability (yes/no), type of disability (nominal measure with 10 categories), and previous work experience (yes/no) are included in all four models as breaking variables for separate (subgroup-specific) analyses, which break down the whole sample into different subsets based on shared characteristics.
Model 1: Unconditional Model (not covariate-adjusted)

Model 2: Conditional Model (covariate-adjusted by Covariate Set 1)
Covariate Set 1, consisting of demographic characteristics, includes: (a) gender (male or female); (b) minority (yes or no); (c) age (continuous measure); (d) SES by social security and/or insurance benefits (yes or no); (e) educational background (ordinal measure).

Model 3: Conditional Model (covariate-adjusted by Covariate Set 2)
Covariate Set 2, consisting of VR service variables, includes: (a) job placement assistance (binary; received or not received); (b) on-the-job supports (binary; received or not received); and (c) rehabilitation technology (binary; received or not received).

Model 4: Conditional Model (covariate-adjusted by Covariate Set 3)
Covariate Set 3 combines Covariate Sets 1 and 2 into one set.

There are two different VR outcomes used in the simulation analyses: (1) competitive employment outcome (yes/no); and (2) weekly earnings (a continuous measure) = rehabilitation outcome (a dichotomous 0 or 1 measure) × weekly income (a continuous measure), where weekly earnings can also be deemed an indicator of the quality of employment outcomes achieved at exit from VR (Chan et al., 2016;
 
 et al., 2015
). Note. The total number of analysis combinations is 4 models × 2 outcomes = 8.

4.3 Research Questions
Our proposed methods are used to address the following research questions. To evaluate the simulation results, descriptive statistics of the ICC are provided to answer Research Question 1 (RQ1) and Research Question 2 (RQ2) below. In addition, the statistical performance (precision and accuracy) of the ICC under the designated conditions using randomized cluster samples is examined by statistical bias (or average bias) and its error variance (or mean squared error) to answer Research Question 3 (RQ3) below. Furthermore, the usable samples in the whole RSA-911 data set are used as a collection of the true ICC parameters in RQ1; then, in the bootstrapping computations (Ross, 2013), the full RSA-911 data set is resampled 100 times (number of bootstrap repetitions = 100) under the given sampling conditions for ICC estimation using the bootstrap samples in RQ2. Finally, comparing the differences in ICC estimates between RQ1 and RQ2 shows which of the estimation methods, designated models, and sampling conditions provides the best statistical performance of ICC estimation and inference in a multilevel design with randomized cluster samples (RQ3).

Research Question 1 (RQ1): What are the intraclass correlation values (ICC estimate, standard error, p-value, and 95% confidence limits) in the usable samples given by the breaking variables for subset analysis (Models 1-4)? How are the ICC estimates distributed in Models 1-4? What are the differences in the ICC estimates among Models 1-4?

Research Question 2 (RQ2): Given the designated cluster randomized structure (i.e., the number of groups = 5, 15, 25; the number of subjects = 50, 100, 150), what are the intraclass correlation estimates (ICC estimate, standard error, and p-value) using the bootstrap samples (number of bootstrap repetitions = 100) given by the breaking variables under Models 1-4?

Research Question 3 (RQ3): Comparing the results between Research Question 1 (population model) and Research Question 2 (bootstrap subsample model), which modeling structure (Models 1-4) can provide the best statistical properties of ICC estimation and inference, in terms of statistical bias (mean difference in ICC estimates between RQ1 and RQ2), mean squared error or mean squared deviation (average squared difference in ICC estimates between RQ1 and RQ2), and parameter coverage rate (proportion of true parameter values covered by the confidence interval for the ICC, using the results of RQ1 and RQ2)?

4.4 Description of RSA-911 Data
The RSA-911 data for FY 2015 (the RSA-911 is supporting information submitted by state VR agencies to the Rehabilitation Services Administration of the U.S. Department of Education) is used to test the proposed methods for the ICC in different simulation scenarios of multilevel structure models. As for the foundations of evidence-based rehabilitation, the target population is people with disabilities (i.e., those who had an individualized plan for employment, IPE, and had already been receiving VR services under their IPE) from the public VR program in the State of Michigan. There are 33 VR office structures in Michigan, which are used as the level-2 units in the HLM and HGLM analyses in the simulation study.
4.5 Simulation and Analysis Plan
To address the proposed research questions, a simulation study using the existing RSA-911 data (representing a complex system in a real-world situation) is conducted with 2-level hierarchical design modeling, where individuals are on level 1 and offices are on level 2. Two types of the proposed hierarchical models are considered in the analyses: (1) the unconditional model (without covariates) is given by Model 1; and (2) the conditional model (with covariates) is given by Models 2-4. To test the proposed multilevel designs and their modeling structures in the study, we apply a simulation analysis to compare the results between unconditional and conditional models, with respect to four different sampling conditions (the whole usable data set in RQ1, plus three different cluster sampling procedures in RQ2). Furthermore, in test design and evaluation, three outcomes of interest in the study (rehabilitation outcome, competitive employment, and quality of employment) are used to examine the statistical performance (effectiveness analyses) of the proposed models and the simulation results, in terms of statistical bias, error bias, and accuracy and precision (in RQ3).
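The three performance criteria named for RQ3 (statistical bias, mean squared error, and parameter coverage rate) can be illustrated with a minimal Python sketch. The function name `evaluate_icc` and the toy numbers are assumptions made for illustration only; they are not results or code from the study.

```python
from statistics import mean

def evaluate_icc(true_icc, estimates, intervals):
    """Compare bootstrap ICC estimates with a reference ("true") ICC.

    true_icc  : reference ICC (e.g., from the full usable sample, as in RQ1)
    estimates : ICC estimates from the bootstrap samples (as in RQ2)
    intervals : (lower, upper) confidence limits, one pair per estimate
    """
    diffs = [est - true_icc for est in estimates]
    bias = mean(diffs)                      # mean difference
    mse = mean(d * d for d in diffs)        # mean squared difference
    covered = [lo <= true_icc <= hi for lo, hi in intervals]
    coverage = sum(covered) / len(covered)  # share of intervals containing the truth
    return bias, mse, coverage

# Toy illustration with made-up numbers (not study results)
bias, mse, coverage = evaluate_icc(
    0.02,
    [0.01, 0.03, 0.05],
    [(0.00, 0.04), (0.01, 0.06), (0.03, 0.08)],
)
```

In this toy case two of the three intervals contain the reference value, so the coverage rate is 2/3.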
A graphic overview of the simulation process in the study is shown as a workflow chart in Figure 4.1.

Figure 4.1 A Workflow Diagram of Simulation-based Exploration and Evaluation for the ICC
In computer simulations via the RSA-911, the statistical software R (linear mixed model lmer and generalized linear mixed model glmer in the lme4 package), IBM SPSS (mixed effects model by MIXED; generalized linear mixed model by GENLINMIXED; variance component analysis by VARCOMP), SAS (mixed effects modeling through PROC MIXED; generalized linear mixed model via PROC GLIMMIX), and Stata (multilevel mixed model through xtmixed or mixed) are used for statistical analysis and for evaluating the performance of the simulation results for ICC estimation and statistical inference.
4.6 Theoretical Framework of HLM and HGLM in 2-Level Cluster Randomized Design

This section provides mathematical details of the multilevel modeling structures used in the study.
4.6.1. HLM in 2-Level Cluster Randomized Structure via RSA-911

In the two-level hierarchical design structure (i.e., individuals are at level 1 and offices are at level 2), the unconditional model (involving no covariates) is described in Equation 11:

$$
\begin{aligned}
\text{Level 1: } & Y_{ij} = \beta_{0j} + e_{ij} \\
\text{Level 2: } & \beta_{0j} = \gamma_{00} + u_{0j}
\end{aligned}
\tag{11}
$$

where $Y_{ij}$ represents an outcome for the $i$-th individual subject (at level 1; $i = 1, \ldots, n_j$) in the $j$-th office (at level 2; $j = 1, \ldots, J$), $\gamma_{00}$ is a grand mean outcome that can be estimated by $\bar{Y}_{..}$, $e_{ij}$ is a random error term (or individual variation) at level 1 (i.e., $e_{ij} \sim N(0, \sigma_W^2)$) corresponding to the $i$-th person in the $j$-th group, $u_{0j}$ is a random effect (i.e., $u_{0j} \sim N(0, \sigma_B^2)$) associated with the $j$-th office (or cluster variation at level 2), the within-cluster (i.e., between-person) variance component is given by $\sigma_W^2$, the between-cluster variance component is given by $\sigma_B^2$, and the random error terms at level 1 and level 2 are assumed to be mutually uncorrelated (i.e., $\mathrm{Cov}(e_{ij}, u_{0j}) = 0$).

When a covariate (e.g., age group) is used in the hierarchical design, the conditional model (involving one covariate centered at the group mean) is written in Equation 12:

$$
\begin{aligned}
\text{Level 1: } & Y_{ij} = \beta_{0j} + \beta_{1j}\,(X_{ij} - \bar{X}_{.j}) + e_{ij} \\
\text{Level 2: } & \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10}
\end{aligned}
\tag{12}
$$

where the covariate model uses group (office) mean centering for reducing correlation between groups (Paccagnella, 2006; Raudenbush & Bryk, 2002), the Level 1 model is for the $i$-th person ($i = 1, \ldots, n_j$) and the Level 2 model is for the $j$-th group ($j = 1, \ldots, J$), $X_{ij}$ is the covariate for the $i$-th individual subject in the $j$-th office, $\bar{X}_{.j}$ is the group mean for the $j$-th group, $u_{0j}$ is a random effect of the $j$-th office (a random residual at Level 2), $e_{ij}$ is an individual error term for the $i$-th person (a random residual at Level 1), $\beta_{1j}$ is the slope of the covariate (assuming slopes are equal across offices), $\gamma_{00}$ is the grand mean, and the errors at levels 1 and 2 are assumed to be independent.
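The group-mean centering used in the conditional model can be sketched in a few lines of Python. This is only an illustration under assumed names: `group_mean_center` and the toy ages and office labels are hypothetical and do not come from the RSA-911 data.

```python
from collections import defaultdict

def group_mean_center(x, groups):
    """Subtract each group's mean from the covariate values in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, g in zip(x, groups):
        totals[g] += value
        counts[g] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    centered = [value - means[g] for value, g in zip(x, groups)]
    return centered, means

# Hypothetical ages for five people in two offices, "A" and "B"
ages = [20, 30, 40, 50, 70]
offices = ["A", "A", "A", "B", "B"]
centered, office_means = group_mean_center(ages, offices)
```

After centering, the covariate has mean zero within every office, so it carries only within-office information.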
4.6.2. HGLM in 2-Level Cluster Randomized Structure via RSA-911

Suppose that 
 is a binary outcome variable for the 
-th individual subject (at the level 
1; 
) from the 
-th cluster (office). In the 2
-level cluster randomized trial, the 2
-level 
hierarchical generalized linear model, HGLM, (involving with no covariates) i
s given
 in 
Equation 1
3:    

 (13)  , where 
 denote a dichotomous outcome (coded as zero or one) for the 
-th individual subject 
(at the level 1; 
) from the 
-th office (at the level 2; 
), 
 is grand 
mean, 
 is an individual error term at the level 1 (i.e., 

) corresponding to the 
-th person in the 
-th group,
  is a random effect (i.e., 

) associated with the 
-th 
group (or office variation at the level 2), the within
-group variance is given by 

, the between
-group variance is given by 

, and the random error terms at level 1 and level 2 are assumed to be mutually independent (i.e., 

).

When a covariate (e.g., minority group) is used in the 2-level generalized hierarchical design, the conditional model (involving one covariate centered at the group mean) is written in Equation 14:

 (14)  , where the generalized or binary covariate model is centered by cluster (office) mean, Level 1 
is denoted for the 
-th person (
) and Level 2 is denoted for the 
-th group (
),  is a covariate for the 
-th person in the 
-th gr
oup, 
 is group mean for the 
-th 
cluster, 
 is a random effect of the 
-th office (a residual at Level 2), 
 is an individual error 
for the 
-th subject (a residual at Level 1), 
 

opes are not the same across office structures), 
 is grand mean, and random 
errors at levels 1 and 2 are assumed to be mutually independent (Klar & Donner, 2001; 
Raudenbush & Bryk, 2002).
CHAPTER 5

RESULTS

5.1 Data Source and Sample Characteristics
This study used the real data set of the Rehabilitation Services Administration, RSA-911, for FY 2015 to examine and verify the proposed analytic methods of intraclass correlation (ICC) estimation and related inferential statistics (e.g., confidence interval and p-value) in different scenarios with respect to hierarchical design and modeling structure. The target samples are selected from people with disabilities who had been receiving services in the Michigan Rehabilitation Services Programs for vocational rehabilitation and supported employment. To select usable samples for the data simulations, this study includes only those subjects having an individualized plan for employment (IPE) for services in vocational rehabilitation (VR); all other subjects (ineligible for VR or not having an IPE) are excluded from the target samples and are not considered further in the ICC calculations. In the simulation analysis, the target sample is of size N=17,633, while the usable sample size is n=11,819 for ICC estimation and inference. Under the hierarchical design and model considerations (i.e., individuals are on Level 1 and offices are on Level 2), all usable samples are distributed across 33 office units statewide in Michigan (see Tables B.1 and B.2 and Figure B.1 in Appendix B for an illustration of the hierarchical spatial data structure for usable samples in Michigan from RSA-911). Individual characteristics of the usable samples are described in Tables 5.1, 5.2 and 5.3.
Table 5.1 Individual Characteristics of the Usable Samples (n=11,819)

Demographic Background          Frequency   Percentage
Gender
  Female                            5,069       42.90%
  Male                              6,750       57.10%
Age
  Younger than 22                   3,771       31.91%
  Ages 22-40                        2,905       24.58%
  Ages 40-64                        4,734       40.05%
  Older than 65                       409        3.46%
Minority
  Yes (Non-Whites)                  7,757       65.63%
  No (Whites)                       4,062       34.37%
Education
  Elementary or Secondary           3,177       26.88%
  Special Education                   840        7.11%
  High School                       5,075       42.94%
  College or Above                  2,727       23.07%
Social Security Benefits
  No                                9,168       77.60%
  Yes                               2,651       22.40%
Total                              11,819      100.00%

Note1. Minority group is defined as the non-white populations (e.g., Black or African American, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islanders). The non-minority (White) group includes persons of Middle Eastern or North African origin, according to the RSA-911 Report Manual; also see Appendix A.
Note2. Mean of Age = 36.4, and Standard Deviation of Age = 16.3.
Table 5.2 Disability & Rehabilitation Characteristics of the Usable Samples (n=11,819)

Disability & Rehabilitation Information   Frequency   Percentage
Type of Disability
  VI: Visual Impairments                         87        0.70%
  HI: Hearing Impairments                     1,989       16.80%
  PI: Physical Impairments                    2,154       18.20%
  LD: Learning Disability                     2,276       19.30%
  ADHD                                          443        3.70%
  ID                                            652        5.50%
  TBI                                           132        1.10%
  ASD: Autism                                   436        3.70%
  MI: Mental Illness                          3,073       26.00%
  SA: Substance Abuse                           577        4.90%
Significance of Disability
  No                                          1,259       10.65%
  Yes                                        10,560       89.35%
Previous Work Background
  No Work Experience                          8,836       74.76%
  Had Work Experience                         2,983       25.24%
Job Placement Assistance Service
  Not Received                                7,347       62.20%
  Received                                    4,472       37.80%
On-the-job Supports Service
  Not Received                               11,076       93.70%
  Received                                      743        6.30%
Rehabilitation Technology Service
  Not Received                                9,610       81.30%
  Received                                    2,209       18.70%
Total                                        11,819      100.00%

Note. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Table 5.3 Outcomes of the Usable Samples (n=11,819)

Outcome Measure                    Frequency   Percentage
Rehabilitation Outcome
  Not Employment                       5,201       44.01%
  Employment                           6,618       55.99%
Competitive Employment
  Not Competitive Employment           6,787       45.60%
  Competitive Employment               6,429       54.40%
Weekly Earnings
  Below $100 Weekly Income             5,409       45.80%
  $100-$200 Weekly Income              1,647       13.90%
  $200-$300 Weekly Income              1,506       12.70%
  Above $300 Weekly Income             3,257       27.60%
Total                                 11,819      100.00%

Note1. Median of Weekly Earnings = 148.0, Mean of Weekly Earnings = 224.5, and Standard Error of Mean (SEM) of Weekly Earnings = 3.0.
Note2. Weekly Earnings can also be deemed an indicator of quality employment.
5.2 Models and Variables Used for Simulations of ICC Analysis

There are four multilevel modeling structures (Models 1-4; M1-M4) in the study to test the proposed methods of ICC estimation and inference. Furthermore, three disability-related covariates, namely significance of disability (dichotomous; W1), type of disability (nominal; W2), and previous work experience (dichotomous; W3), are used as breaking variables for subset analysis (i.e., separating the whole usable sample into different and mutually exclusive sub-samples) in all four designated models (M1-M4). Three covariate sets are considered for statistical adjustment in the multilevel modeling procedure: (1) Covariate Set 1 (CVS1) includes demographic information such as gender (dichotomous; X1), minority (dichotomous; X2), age (continuous; X3), social security benefits (dichotomous; X4), and education background (ordinal or approximately continuous; X5); (2) Covariate Set 2 (CVS2) includes rehabilitation service information such as job placement assistance (dichotomous; X6), on-the-job supports (dichotomous; X7), and rehabilitation technology (dichotomous; X8); (3) Covariate Set 3 (CVS3) combines the previous covariate sets (both CVS1 and CVS2) to account for all individual information in multilevel modeling. Two different outcome measures, competitive employment (dichotomous; Y1) and weekly earnings (continuous; Y2), are examined. Note that the correlation structures between predictors, covariates and outcomes are shown in Tables 5.4 and 5.5, and that the associations between disability type and outcomes are described via one-way analysis of variance (ANOVA) in Table 5.6.

For outcome measure Y1, except for X1 (p-value=0.41), all other predictors (X2-X8) and covariates (W1 & W3) are correlated with the outcome at the significance level of 0.05 (see Table 5.4). For outcome measure Y2, all predictors (X1-X8) and covariates (W1 & W3) are correlated with the outcome at the significance level of 0.05 (see Table 5.5). For the association of W2 (Type of Disability) with both outcome measures Y1 & Y2, Table 5.6 demonstrates that disability type is a significant factor in explaining the total variation of both outcome measures, and that the measure of strength of association (i.e., the F-statistic in ANOVA, along with Eta-squared as an ICC effect size measure) is significant at the alpha level of 0.05. Overall, the results suggest that the predictors (X1-X8) and covariates (W1-W3) are associated with the key outcome variables (Y1-Y2), and this statistical evidence may provide supportive information for the ICC calculations in the study.
Table 5.4 Correlation Structure of All Predictors and Outcome Y1 in Hierarchical Analysis

        Y1     X1     X2     X3     X4     X5     X6     X7     X8     W1     W3
Y1    1.00   0.01  -0.09   0.18  -0.16   0.16   0.07   0.07   0.31  -0.21   0.31
X1    0.01   1.00  -0.01  -0.04  -0.02  -0.08   0.02   0.03  -0.05   0.03  -0.06
X2   -0.09  -0.01   1.00   0.01   0.10  -0.05  -0.01  -0.06  -0.23   0.13  -0.19
X3    0.18  -0.04   0.01   1.00   0.01   0.51  -0.16  -0.13   0.40  -0.26   0.36
X4   -0.16  -0.02   0.10   0.01   1.00   0.00   0.10   0.12  -0.15   0.18  -0.17
X5    0.16  -0.08  -0.05   0.51   0.00   1.00  -0.08  -0.10   0.29  -0.18   0.28
X6    0.07   0.02  -0.01  -0.16   0.10  -0.08   1.00   0.21  -0.25   0.17  -0.26
X7    0.07   0.03  -0.06  -0.13   0.12  -0.10   0.21   1.00  -0.10   0.07  -0.08
X8    0.31  -0.05  -0.23   0.40  -0.15   0.29  -0.25  -0.10   1.00  -0.38   0.56
W1   -0.21   0.03   0.13  -0.26   0.18  -0.18   0.17   0.07  -0.38   1.00  -0.39
W3    0.31  -0.06  -0.19   0.36  -0.17   0.28  -0.26  -0.08   0.56  -0.39   1.00

Note1. Y1=Competitive Employment; X1=Gender; X2=Minority; X3=Age; X4=Social Benefits; X5=Education; X6=Job Placement; X7=On-the-job Supports; X8=Rehabilitation Technology; W1=Significance of Disability; W3=Previous Work Experience.
Note2. Except for X1 (p-value=0.41), all other predictors (X2-X8) and covariates (W1 & W3) are correlated with the outcome measure Y1 at the significance level of 0.05.
Note3. W2 (Type of Disability) is not included, due to the categorical (nominal) measurement.
Table 5.5 Correlation Structure of All Predictors and Outcome Y2 in Hierarchical Analysis

        Y2     X1     X2     X3     X4     X5     X6     X7     X8     W1     W3
Y2    1.00   0.05  -0.14   0.32  -0.22   0.28  -0.16  -0.07   0.51  -0.34   0.47
X1    0.05   1.00  -0.01  -0.04  -0.02  -0.08   0.02   0.03  -0.05   0.03  -0.06
X2   -0.14  -0.01   1.00   0.01   0.10  -0.05  -0.01  -0.06  -0.23   0.13  -0.19
X3    0.32  -0.04   0.01   1.00   0.01   0.51  -0.16  -0.13   0.40  -0.26   0.36
X4   -0.22  -0.02   0.10   0.01   1.00   0.00   0.10   0.12  -0.15   0.18  -0.17
X5    0.28  -0.08  -0.05   0.51   0.00   1.00  -0.08  -0.10   0.29  -0.18   0.28
X6   -0.16   0.02  -0.01  -0.16   0.10  -0.08   1.00   0.21  -0.25   0.17  -0.26
X7   -0.07   0.03  -0.06  -0.13   0.12  -0.10   0.21   1.00  -0.10   0.07  -0.08
X8    0.51  -0.05  -0.23   0.40  -0.15   0.29  -0.25  -0.10   1.00  -0.38   0.56
W1   -0.34   0.03   0.13  -0.26   0.18  -0.18   0.17   0.07  -0.38   1.00  -0.39
W3    0.47  -0.06  -0.19   0.36  -0.17   0.28  -0.26  -0.08   0.56  -0.39   1.00

Note1. Y2=Weekly Earnings; X1=Gender; X2=Minority; X3=Age; X4=Social Benefits; X5=Education; X6=Job Placement; X7=On-the-job Supports; X8=Rehabilitation Technology; W1=Significance of Disability; W3=Previous Work Experience.
Note2. All predictors (X1-X8) and covariates (W1 & W3) are correlated with the outcome measure Y2 at the significance level of 0.05.
Note3. W2 (Type of Disability) is not included, due to the categorical (nominal) measurement.
Table 5.6 Summary of Mean Differences in the Outcomes between Type of Disability

Type of Disability (W2)         Competitive Employment     Quality of Employment
                                Outcome (Y1)               Outcome (Y2)
VI                              0.62                       250.15
HI                              0.86                       578.54
PI                              0.49                       199.74
LD                              0.48                       140.62
ADHD                            0.47                       135.19
ID                              0.48                       103.63
TBI                             0.48                       180.94
ASD                             0.52                       123.64
MI                              0.46                       138.34
SA                              0.50                       173.38
Overall Mean (Standard Error)   0.54 (SE=0.01)             224.48 (SE=3.02)
F-value (p-value)               118.36 (p-value < 0.01)    421.52 (p-value < 0.01)
Eta-squared (or ICC)            0.08                       0.24

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. F-value is based on One-way Analysis of Variance (ANOVA).
Note3. Eta-squared ($\eta^2$) is a measure of strength of association in ANOVA, and it can be computed as the between-group sum of squares divided by the total sum of squares, which is another form of effect-size measure of intraclass correlation coefficients (ICC). See more detail in Section 2.1.
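As a small illustration of the Eta-squared definition in Note3 of Table 5.6 (between-group sum of squares divided by total sum of squares), the following Python sketch may help. The function name `eta_squared` and the toy data are assumptions for illustration and are unrelated to the RSA-911 values.

```python
def eta_squared(groups):
    """Eta-squared = between-group sum of squares / total sum of squares."""
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

# Two made-up groups of three observations each
eta2 = eta_squared([[1, 2, 3], [4, 5, 6]])
```

Here the between-group sum of squares is 13.5 and the total sum of squares is 17.5, so most of the variation lies between the two groups.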
There are two types of multilevel modeling structures in the simulation study. The first is the unconditional model (Model 1, or M1) with no covariates adjusted; the second is the conditional model (Models 2-4, or M2-M4) with covariate adjustment (i.e., M2|CVS1, M3|CVS2, and M4|CVS3). Note that CVS1 is the pre-specified covariate set 1 of demographic information in Model 2 (M2), CVS2 covers rehabilitation service information in Model 3 (M3), and CVS3 covers all individual information, combining both CVS1 and CVS2, in Model 4 (M4).

The statistical model specification for both the unconditional model (M1) and the conditional models (M2-M4) is described as follows:
(1) Unconditional Model (Model 1; M1):

In the two-level multilevel design structure (i.e., individual subjects are on level 1, and office units are on level 2), the unconditional model with no covariate adjustment is shown in the system of Equation 15:

$$
\begin{aligned}
\text{Level 1: } & Y_{ij} = \beta_{0j} + e_{ij} \\
\text{Level 2: } & \beta_{0j} = \gamma_{00} + u_{0j}
\end{aligned}
\tag{15}
$$

where $Y_{ij}$ represents an outcome measure for the $i$-th individual subject (at level 1; $i = 1, \ldots, n_j$) in the $j$-th office unit (at level 2; $j = 1, \ldots, J$), $\gamma_{00}$ is a grand mean outcome that can be estimated as $\bar{Y}_{..}$, $e_{ij}$ is a random error term (or individual variation) at level 1 (i.e., $e_{ij} \sim N(0, \sigma_W^2)$) corresponding to the $i$-th person in the $j$-th group, $u_{0j}$ is a random effect (i.e., $u_{0j} \sim N(0, \sigma_B^2)$) associated with the $j$-th office (or cluster variation at level 2), the within-cluster (i.e., between-person) variance component is given by $\sigma_W^2$, the between-cluster variance component is given by $\sigma_B^2$, and the random error terms at level 1 and level 2 are assumed to be independent.

(2) Conditional Model (Models 2-4; M2-M4):
When a pre-specified covariate set (i.e., CVS1-CVS3) is added to the previous unconditional model (M1), the conditional model with covariate-set adjustment, where each covariate in the set is centered at its group mean, can be described in the system of Equation 16:
   

 (16)  , where the conditional model with covariate mean adjustment uses group
-mean centering for 
reducing correlation between groups (Paccagne
lla, 2006; Raudenbush & Bryk, 2002), Level 1 
is for the 
-th person (
) and Level 2 is for the 
-th group (
), 
 is the 
-th covariate for the 
-th individual subject in the 
-th office, 
 is group mean of the 
-th 
covariate for the 
-th group, 
 is a random effect of the 
-th office (a random residual at the 
level 2), 
 is an individual error term for the 
-th person (a random residual at the level 1), 
 is the 
-
-th group (assuming each of slopes are varied across 
offices), 
 is grand mean, 
 is the slope regressed on the grand mean for the 
-th covariate 
adjusted by group mean, and independence is assumed between errors at levels 1 and 2.
5.3 ICC Estimation Method and Its Inferential Statistics

The proposed intraclass correlation (ICC) estimator via Analysis of Variance (ANOVA), shown below in Equation 17, is suitable for either a balanced design (equal size over groups) or an unbalanced design (unequal size across groups):

$$
\hat{\rho} = \frac{MSA - MSW}{MSA + (n_0 - 1)\,MSW}
\tag{17}
$$

where MSA is the Mean Squares Among Groups in the ANOVA, MSW is the Mean Squares Within Groups in the ANOVA, $n_j$ is the $j$-th group size, $n_0 = \frac{1}{J-1}\left(N - \sum_{j=1}^{J} n_j^2 / N\right)$, and $N$ is the total sample size, i.e., $N = \sum_{j=1}^{J} n_j$. The computational details of the ANOVA for the ICC estimator (in Equation 17) are specified below.
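The ANOVA estimator in Equation 17 can be sketched in Python as follows. This is a minimal illustration that assumes the Donner and Koval (1980a) form of the estimator, with the adjusted group size written as `n0`; the function name `anova_icc` and the toy data are hypothetical and are not taken from the study.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for balanced or unbalanced groups.

    groups: list of lists, one inner list of outcome values per cluster.
    Returns (icc, msa, msw, n0).
    """
    J = len(groups)                       # number of groups
    sizes = [len(g) for g in groups]      # group sizes n_j
    N = sum(sizes)                        # total sample size
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Among-group and within-group sums of squares
    ssa = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    msa = ssa / (J - 1)                   # mean squares among groups
    msw = ssw / (N - J)                   # mean squares within groups

    # Adjusted group size; equals the common size in a balanced design
    n0 = (N - sum(n * n for n in sizes) / N) / (J - 1)

    icc = (msa - msw) / (msa + (n0 - 1) * msw)
    return icc, msa, msw, n0

# Toy balanced example: three clusters of three observations
icc, msa, msw, n0 = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
f_stat = msa / msw   # ratio of mean squares, used for testing ICC = 0
```

In this toy case MSA = 27, MSW = 1, and `n0` = 3, which gives an ICC of 26/29. The same function accepts groups of unequal size, in which case `n0` falls below the simple average group size.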
Suppose the total variation $SST$ is decomposed by analysis of variance (ANOVA) for the intraclass correlation (ICC) estimator, where $Y_{ij}$ is an outcome measure for the $i$-th person ($i = 1, \ldots, n_j$) in the $j$-th group ($j = 1, \ldots, J$). The source of overall variation (or sum of squares, SS) is defined by $SST = SSA + SSW$, where the among-group variation is $SSA = \sum_{j=1}^{J} n_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$, the within-group variation is $SSW = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{.j})^2$, and the total variation is $SST = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2$. The mean squares (MS) in ANOVA can be obtained through the formula $MS = SS/df$ (i.e., the average of variation), that is, $MSA = SSA/df_A$ and $MSW = SSW/df_W$, where $MSA$ is the mean variation among groups, $MSW$ is the mean variation within groups (or the mean squared error), $df_A$ is $J - 1$ (for $J$ = the number of groups), representing the between-group degrees of freedom, and the within-group degrees of freedom $df_W$ is $N - J = J(\bar{n} - 1)$ (for $\bar{n} = N/J$ = the average number of within-group subjects). Note that the original idea of analysis of variance (ANOVA) for ICC estimation can be found in Table 2.1 (Donner & Koval, 1980a
). Furthermore, the variance of the ICC estimate
 can be obtained by 
   

 (18)  , where the sampling weights are 
 and 
, the total 
sample size is 
, and 
 is the ICC estimate as 

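. Thus, the standard error of the ICC estimate is $SE(\hat{\rho}) = \sqrt{\widehat{\mathrm{Var}}(\hat{\rho})}$. The proposed test statistic of the ICC estimate ($\hat{\rho}$) can be written as

$$
F = \frac{MSA}{MSW}
\tag{19}
$$

where the test statistic $F$ follows an $F$ distribution with degrees of freedom $J - 1$ and $N - J$, for testing the hypothesis $H_0: \rho = 0$ versus $H_1: \rho > 0$. Given the sampling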
 distribution for the ICC estimate, the 
 confidence 
interval on the intraclass correlation can be obtained by 
   
 (20)  , where this 
 confidence limit for the ICC (

) represents the degree of 
total variability accounted for by betw
een-group variation in multilevel design. It is noteworthy 
that the interval estimate on 

 may not be very accurate and precise for a small sample 
size (i.e., small 
 or 
) or low reliability in measurements (i.e., large MSW or 
). Also, it 
should be pointed out that the lower confidence limit on 

 could be negative 
(especially when small sample size or large measurement error occurs in hierarchical 
modeling), but since 

 normally should 
not be negative anyway by its mathematical 
definition
 (i.e., 

), it is customary to replace the negative lower bound with 

For statistical planning in multilevel design, the proposed auxiliary statistics are used to help understand the minimum detectable effect size with respect to desired power and required sample size. Three types of measures linked with the intraclass correlation (ICC) estimator are:

(i) The design effect ($Deff$), or variance inflation factor ($VIF$), is written as

$$
Deff = 1 + (n - 1)\,\rho
\tag{21}
$$

where $n$ is the (average) group size and $\rho$ is the ICC (estimated by $\hat{\rho}$), which provides a statistical measure of homogeneity within groups (i.e., if within-group subjects are perfectly homogeneous, then $\rho = 1$ and hence $Deff = n$). In general, grouping creates more variation than simple random sampling by a factor of $Deff$ (or $VIF$), due to the major part of group-to-group variability plus the minor portion of within-group variation (i.e., samples in different groups vary more than those in the same group).
(ii) The unconditional intraclass correlation coefficient is given by

$$
\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2} = \frac{\sigma_B^2}{\sigma_T^2}
\tag{22}
$$

where the unconditional total variance is $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$, and $\sigma_W^2$ and $\sigma_B^2$ represent the error variances corresponding to the within- and between-group variation, respectively, in the unconditional model with no covariates adjusted in multilevel design.

In hierarchical models with covariates for statistical adjustment, the conditional intraclass correlation coefficient is defined as

$$
\rho_A = \frac{\sigma_{AB}^2}{\sigma_{AB}^2 + \sigma_{AW}^2} = \frac{\sigma_{AB}^2}{\sigma_{AT}^2}
\tag{23}
$$

where the covariate-adjusted total variance is $\sigma_{AT}^2 = \sigma_{AB}^2 + \sigma_{AW}^2$, and $\sigma_{AW}^2$ and $\sigma_{AB}^2$ represent the variance components, adjusted by covariates, corresponding to the within- and between-group variation, respectively, in the conditional multilevel model.

(iii) For evaluating the relative efficiency of measures of homogeneity and heterogeneity in multilevel design, two ancillary statistical quantities, based on the random variations of both the unconditional and conditional hierarchical models, are given by

$$
\eta_B^2 = \frac{\sigma_{AB}^2}{\sigma_B^2}
\tag{24}
$$

and

$$
\eta_W^2 = \frac{\sigma_{AW}^2}{\sigma_W^2}
\tag{25}
$$

where $\eta_B^2$ indicates the proportion of between-group variance remaining (after covariate adjustment) in multilevel design, and $\eta_W^2$ indicates the proportion of within-group variance remaining (after covariate adjustment) in multilevel design. Both $\eta_B^2$ and $\eta_W^2$ show the efficacy and effectiveness of covariate adjustment for between-group and within-group random variation in multilevel design and modeling.

The two complementary measures (like a pseudo R-squared) for random variation by covariate adjustment in hierarchical modeling are written as

$$
R_B^2 = 1 - \eta_B^2
\tag{26}
$$

and

$$
R_W^2 = 1 - \eta_W^2
\tag{27}
$$

where $R_B^2$ and $R_W^2$ are defined as the proportions of between-group and within-group variation, respectively, explained by the covariates adjusted in hierarchical design. Note that both $R_B^2$ and $R_W^2$ can also show the efficacy of covariate adjustment in multilevel design.
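The auxiliary measures in items (i) to (iii) are simple functions of the variance components, which the following Python sketch illustrates. It assumes the usual forms: design effect 1 + (n - 1) x ICC, ICC as the between-group share of total variance, and "remaining" and "explained" proportions as ratios of adjusted to unadjusted components. All function names and variance values are made up for illustration.

```python
def icc_from_components(var_between, var_within):
    """ICC as the between-group share of the total variance."""
    return var_between / (var_between + var_within)

def design_effect(icc, group_size):
    """Variance inflation due to clustering for a given (average) group size."""
    return 1 + (group_size - 1) * icc

def covariate_adjustment_summary(uncond, cond):
    """uncond and cond are (var_between, var_within) pairs.

    Returns the proportions of variance remaining (eta2) and explained (r2)
    at the between-group and within-group levels after covariate adjustment.
    """
    eta2_b = cond[0] / uncond[0]
    eta2_w = cond[1] / uncond[1]
    return {"eta2_b": eta2_b, "eta2_w": eta2_w,
            "r2_b": 1 - eta2_b, "r2_w": 1 - eta2_w}

# Made-up variance components: unconditional (2, 8), conditional (1, 6)
rho_u = icc_from_components(2.0, 8.0)      # unconditional ICC
rho_c = icc_from_components(1.0, 6.0)      # conditional ICC
deff = design_effect(rho_u, 21)            # 21 subjects per group
summary = covariate_adjustment_summary((2.0, 8.0), (1.0, 6.0))
```

With these toy values, half of the between-group variance and a quarter of the within-group variance are explained by the covariates, and a group size of 21 with an ICC of 0.2 inflates the variance by a factor of 5.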
        
5.4 Results of ICC Estimates and Inferential Statistics

Two outcome measures are considered: (1) one is a dichotomous measure for competitive employment (Y1); (2) the other is a continuous measure for weekly earned income, or quality of employment (Y2). Further, there are four different multilevel models for ICC calculations: (1) Unconditional Model (M1) has no covariate adjustment; (2) Conditional Model (M2) is fitted with covariate adjustment by the demographic predictors (Covariate Set 1); (3) Conditional Model (M3) is fitted with covariate adjustment by the rehabilitation service predictors (Covariate Set 2); (4) Conditional Model (M4) is fitted with covariate adjustment by both the demographic and service predictors (Covariate Set 3). In addition, three breaking variables are considered for subset analysis of ICC estimation and inference using the usable samples (n=11,819) in multilevel design: (1) Previous Work Experience: binary measure (i.e., yes or no); (2) Significance of Disability: binary measure (i.e., yes or no); (3) Disability Type: nominal measure with 10 different disability categories (i.e., VI, HI, PI, LD, ADHD, ID, TBI, ASD, MI, and SA). In this section, the main results of the study are presented in Tables 5.7-5.16.

5.4.1 Competitive Employment Outcome Measure

Competitive employment (Y1) is fitted as a dichotomous outcome measure in the 2-level hierarchical generalized linear modeling (HGLM) framework, where individual subjects are on level 1 and office units are on level 2. The main results of the unconditional model M1 (Model 1) are shown in Table 5.7; the conditional model M2 (Model 2) in Table 5.8; the conditional model M3 (Model 3) in Table 5.9; and the conditional model M4 (Model 4) in Table 5.10. Table 5.11 provides all the auxiliary information on the ICC estimates, such as the design effect and the related variance-ratio measures defined in Section 5.3, and Table 5.12 shows the ICC evaluation results based on the bootstrap sampling procedure (number of bootstrap repetitions = 100).

The ICC estimates (including standard error, p-value, and 95% confidence interval) for the competitive employment outcome measure (Y1) under the unconditional (Model 1) and conditional (Models 2-4) multilevel modeling structures are summarized as follows.
For competitive employment (Y1 under Model 1; refer to Table 5.7), the average (unadjusted) intraclass correlation is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). Given by work experience (binary coding of yes or no) for partitioning subset samples, both show the average (unadjusted) ICC of 0.01 (SE=0.004, p<0.01, 95% CI = [0.01, 0.02]). By significance of disability (binary coding of yes or no) for subset analyses, both show the average (unadjusted) ICC of 0.02 (SE=0.009, p<0.01, 95% CI = [0.01, 0.05]). Breaking down by disability types, autism spectrum disorder (ASD) has the highest (unadjusted) ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.00, 0.15]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.07]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). Also note that, at the significance level of 0.05, the unadjusted ICC estimates are non-significant for the following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.25), attention deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.06), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.36), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).
For competitive employment (Y1 under Model 2; refer to Table 5.8), the average (adjusted by demographic information) intraclass correlation is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). Given by work experience (binary coding of yes or no) for partitioning subset samples, both show the average (adjusted by demographic information) ICC of 0.01 (SE=0.004, p<0.01, 95% CI = [0.01, 0.03]). By significance of disability (binary coding of yes or no) for subset analyses, both show the average (adjusted by demographic information) ICC of 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.05]). Breaking down by disability types, autism spectrum disorder (ASD) has the highest (adjusted) ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.00, 0.15]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). Also note that, at the significance level of 0.05, the adjusted ICC estimates are non-significant for the following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.26), attention deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.37), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).

For competitive employment (Y1 under Model 3; see Table 5.9), the average intraclass
correlation, adjusted for rehabilitation services information, is about 0.01 (SE=0.003,
p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience (binary
coding of yes or no), the two subsets show an average adjusted ICC of 0.01 (SE=0.005, p<0.01,
95% CI = [0.01, 0.03]). When it is partitioned by disability significance (binary coding of yes
or no), the two subsets show an average adjusted ICC of 0.02 (SE=0.01, p<0.01, 95% CI = [0.01,
0.05]). By disability type, autism spectrum disorder (ASD) has the highest adjusted ICC of 0.08
(SE=0.04, p<0.01, 95% CI = [0.02, 0.17]), followed by learning disability (LD; ICC=0.03,
SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), intellectual disability (ID; ICC=0.03, SE=0.02,
p=0.02, 95% CI = [0.00, 0.09]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI =
[0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]),
and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05
significance level, the adjusted ICC estimates are not significant for the following
disability types: visual impairments (VI; ICC=0.09, SE=0.10, p=0.20), attention deficit
hyperactivity disorder (ADHD; ICC=0.01, SE=0.02, p=0.29), traumatic brain injury (TBI;
ICC=0.02, SE=0.05, p=0.35), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.47).

For competitive employment (Y1 under Model 4; see Table 5.10), the average intraclass
correlation, adjusted for both demographics and rehabilitation services, is about 0.01
(SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience
(binary coding of yes or no), the two subsets show an average adjusted ICC of 0.01 (SE=0.005,
p<0.01, 95% CI = [0.01, 0.03]). When it is partitioned by disability significance (binary
coding of yes or no), the two subsets show an average adjusted ICC of 0.02 (SE=0.01, p<0.01,
95% CI = [0.01, 0.05]). By disability type, autism spectrum disorder (ASD) has the highest
adjusted ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.01, 0.15]), followed by learning disability
(LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), hearing impairments (HI; ICC=0.02,
SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01,
95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01,
0.04]). At the 0.05 significance level, the adjusted ICC estimates are not significant for the
following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.26), attention
deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID;
ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.37), and
substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).

For the auxiliary information on the ICC estimates for outcome measure Y1 (see Table 5.11),
the unconditional model (Model 1; unconditional ICC=0.01 and design effect DE=4.44) is used as
the baseline for measuring the relative efficiency of the ICC estimates in terms of
between-group variance and within-group variance (for each, the ratio to the Model 1 variance
and the corresponding proportional change). The conditional model with the demographic
covariate set (Model 2; conditional ICC=0.01 and design effect DE=4.59) shows a 3.05% decrease
in within-group variation and no change (0.00%) in between-group variation, in comparison with
the unconditional model (Model 1). The conditional model with the rehabilitation service
covariate set (Model 3; conditional ICC=0.01 and design effect DE=4.83) shows an 8.06% decrease
in within-group variation and a 4.17% increase in between-group variation, in comparison with
Model 1. The conditional model with both the demographic and the rehabilitation service
covariate sets (Model 4; conditional ICC=0.01 and design effect DE=4.59) shows a 3.38% decrease
in within-group variation and no change (0.00%) in between-group variation, in comparison with
the unconditional model (Model 1).
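
The design effects reported for Y1 are numerically consistent with the common approximation DE = 1 + (m - 1) x ICC evaluated at the average within-group size m = 356, and the percentage changes are consistent with ratios of each conditional variance component to its Model 1 counterpart. A minimal Python sketch under that assumption follows; the function names are illustrative and do not come from the dissertation, and the inputs are the rounded Table 5.11 entries, so the results agree with the table only up to rounding.

```python
def design_effect(icc, avg_cluster_size):
    """Approximate design effect for clusters of average size m: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def variance_change(baseline, conditional):
    """Return (ratio to baseline, proportional reduction = 1 - ratio)."""
    ratio = conditional / baseline
    return ratio, 1.0 - ratio


# Rounded Table 5.11 entries for Y1: (ICC, between-group var, within-group var)
models = {
    "M1": (0.0097, 0.0024, 0.2458),
    "M2": (0.0101, 0.0024, 0.2383),
    "M3": (0.0108, 0.0025, 0.2260),
    "M4": (0.0101, 0.0024, 0.2375),
}
M = 356  # average within-group size for the overall sample

_, base_between, base_within = models["M1"]
for name, (icc, between, within) in models.items():
    de = design_effect(icc, M)
    b_ratio, _ = variance_change(base_between, between)
    w_ratio, w_reduction = variance_change(base_within, within)
    print(f"{name}: DE={de:.2f}  between ratio={b_ratio:.4f}  "
          f"within ratio={w_ratio:.4f}  within reduction={w_reduction:.2%}")
```

A negative reduction, as for the between-group variance under Model 3, means that the variance component grew after the covariates were added.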
      
The bootstrap evaluation of the ICC estimates for outcome measure Y1 (100 bootstrap
repetitions; see Table 5.12) compares resampling scenarios that vary the number of groups and
the number of subjects per group in the multilevel structure, based on Model 4 with the full
covariate set of demographics and rehabilitation services. For the low level of cluster
samples (number of groups=5), the mean bias is about 0.0068, the MSE is about 0.0004, and the
proportion of successful hits is about 34%. For the medium level of cluster samples (number of
groups=15), the mean bias is about 0.0049, the MSE is about 0.0002, and the proportion of
successful hits is about 66%. For the high level of cluster samples (number of groups=25), the
mean bias is about 0.0047, the MSE is about 0.0001, and the proportion of successful hits is
about 68%. Turning to the number of subjects, for the low level of subject samples (number of
subjects=50), the mean bias is about 0.0062, the MSE is about 0.0003, and the proportion of
successful hits is about 41%. For the medium level of subject samples (number of subjects=100),
the mean bias is about 0.0053, the MSE is about 0.0002, and the proportion of successful hits
is about 59%. For the high level of subject samples (number of subjects=150), the mean bias is
about 0.0047, the MSE is about 0.0001, and the proportion of successful hits is about 70%.
Overall, the sampling scheme with the high level of group samples (25) and the high level of
subject samples (150) achieves the best outcome (lowest bias and MSE, and highest proportion of
successful hits); the sampling scheme with moderate cluster and subject samples (number of
groups=15 and number of subjects=100) gives average performance of ICC estimation; and a
sampling scheme with the low level of group samples (5) or the low level of subject samples
(50) is more likely to result in poor performance of ICC estimates in the hierarchical
generalized linear modeling structure.

Table 5.7 ICC Estimates of Unconditional Model M1 for Outcome Measure Y1
Model 1                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0097    0.0031     0.00    0.0053    0.0187
Work Experience
  No                      8,821      33     266    0.0119    0.0038     0.00    0.0064    0.0232
  Yes                     2,998      33      90    0.0101    0.0053     0.00    0.0026    0.0254
Significance Disability
  No                      1,233      33      36    0.0297    0.0145     0.00    0.0093    0.0675
  Yes                    10,586      33     319    0.0107    0.0034     0.00    0.0058    0.0208
Disability Type
  VI                         87      29       3    0.0732    0.1008     0.25   -0.1241    0.3253
  HI                      1,989      32      61    0.0201    0.0093     0.00    0.0070    0.0459
  PI                      2,154      33      65    0.0187    0.0084     0.00    0.0067    0.0429
  LD                      2,276      33      68    0.0286    0.0105     0.00    0.0134    0.0585
  ADHD                      443      33      13   -0.0032    0.0149     0.54   -0.0303    0.0495
  ID                        652      33      19    0.0223    0.0173     0.06   -0.0041    0.0727
  TBI                       132      27       5    0.0212    0.0513     0.36   -0.0823    0.1919
  ASD                       436      33      13    0.0641    0.0329     0.00    0.0141    0.1505
  MI                      3,073      33      92    0.0175    0.0070     0.00    0.0075    0.0376
  SA                        577      31      18   -0.0006    0.0085     0.48   -0.0208    0.0405

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.8 ICC Estimates of Conditional Model M2 for Outcome Measure Y1

Model 2                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0101    0.0032     0.00    0.0055    0.0194
Work Experience
  No                      8,821      33     266    0.0119    0.0038     0.00    0.0064    0.0232
  Yes                     2,998      33      90    0.0119    0.0057     0.00    0.0038    0.0284
Significance Disability
  No                      1,233      33      36    0.0356    0.0160     0.00    0.0131    0.0767
  Yes                    10,586      33     319    0.0109    0.0034     0.00    0.0059    0.0211
Disability Type
  VI                         87      29       3    0.0671    0.0996     0.26   -0.1289    0.3190
  HI                      1,989      32      61    0.0236    0.0102     0.00    0.0093    0.0517
  PI                      2,154      33      65    0.0188    0.0084     0.00    0.0067    0.0430
  LD                      2,276      33      68    0.0289    0.0105     0.00    0.0136    0.0588
  ADHD                      443      33      13   -0.0032    0.0149     0.54   -0.0303    0.0495
  ID                        652      33      19    0.0234    0.0176     0.05   -0.0034    0.0743
  TBI                       132      27       5    0.0190    0.0501     0.37   -0.0837    0.1892
  ASD                       436      33      13    0.0640    0.0329     0.00    0.0140    0.1504
  MI                      3,073      33      92    0.0175    0.0070     0.00    0.0075    0.0376
  SA                        577      31      18   -0.0006    0.0085     0.48   -0.0208    0.0405

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.9 ICC Estimates of Conditional Model M3 for Outcome Measure Y1

Model 3                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0108    0.0033     0.00    0.0060    0.0206
Work Experience
  No                      8,821      33     266    0.0130    0.0041     0.00    0.0071    0.0250
  Yes                     2,998      33      90    0.0124    0.0059     0.00    0.0041    0.0291
Significance Disability
  No                      1,233      33      36    0.0356    0.0160     0.00    0.0131    0.0767
  Yes                    10,586      33     319    0.0119    0.0037     0.00    0.0066    0.0228
Disability Type
  VI                         87      29       3    0.0905    0.1038     0.20   -0.1105    0.3430
  HI                      1,989      32      61    0.0215    0.0097     0.00    0.0079    0.0482
  PI                      2,154      33      65    0.0192    0.0085     0.00    0.0070    0.0437
  LD                      2,276      33      68    0.0340    0.0116     0.00    0.0170    0.0672
  ADHD                      443      33      13    0.0096    0.0189     0.29   -0.0219    0.0694
  ID                        652      33      19    0.0320    0.0199     0.02    0.0023    0.0877
  TBI                       132      27       5    0.0224    0.0519     0.35   -0.0815    0.1935
  ASD                       436      33      13    0.0782    0.0356     0.00    0.0238    0.1705
  MI                      3,073      33      92    0.0187    0.0073     0.00    0.0083    0.0396
  SA                        577      31      18    0.0001    0.0089     0.47   -0.0204    0.0416

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.10 ICC Estimates of Conditional Model M4 for Outcome Measure Y1

Model 4                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0101    0.0032     0.00    0.0055    0.0195
Work Experience
  No                      8,821      33     266    0.0119    0.0038     0.00    0.0064    0.0232
  Yes                     2,998      33      90    0.0120    0.0058     0.00    0.0038    0.0286
Significance Disability
  No                      1,233      33      36    0.0359    0.0161     0.00    0.0133    0.0771
  Yes                    10,586      33     319    0.0109    0.0034     0.00    0.0060    0.0211
Disability Type
  VI                         87      29       3    0.0673    0.0996     0.26   -0.1287    0.3192
  HI                      1,989      32      61    0.0237    0.0102     0.00    0.0094    0.0519
  PI                      2,154      33      65    0.0188    0.0084     0.00    0.0067    0.0430
  LD                      2,276      33      68    0.0290    0.0106     0.00    0.0137    0.0591
  ADHD                      443      33      13   -0.0033    0.0148     0.54   -0.0304    0.0493
  ID                        652      33      19    0.0237    0.0177     0.05   -0.0032    0.0749
  TBI                       132      27       5    0.0191    0.0501     0.37   -0.0837    0.1893
  ASD                       436      33      13    0.0645    0.0330     0.00    0.0144    0.1511
  MI                      3,073      33      92    0.0175    0.0070     0.00    0.0075    0.0377
  SA                        577      31      18   -0.0006    0.0085     0.48   -0.0208    0.0405

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.11 Auxiliary Information of ICC Estimates for Outcome Measure Y1

Modeling       ICC        Between    Within     Design    Between    Within     1 minus    1 minus
Structure      Estimate   Group      Group      Effect    Variance   Variance   Between    Within
                          Variance   Variance   (DE)      Ratio      Ratio      Ratio      Ratio
Model 1 (M1)   0.0097     0.0024     0.2458     4.4436    NA         NA         NA         NA
Model 2 (M2)   0.0101     0.0024     0.2383     4.5856    1.0000     0.9695     0.0000     0.0305
Model 3 (M3)   0.0108     0.0025     0.2260     4.8341    1.0417     0.9194     -0.0417    0.0806
Model 4 (M4)   0.0101     0.0024     0.2375     4.5856    1.0000     0.9662     0.0000     0.0338

Note1. The ICC estimate for M1 shows the unconditional ICC quantity; M2-M4 show the
conditional ICC quantity.
Note2. Relative efficiency measures for ICC estimates between unconditional and conditional
models (M1 versus M2-M4) are, in column order, the ratio of the between-group variance to that
of M1, the ratio of the within-group variance to that of M1, and one minus each of these
ratios.

Table 5.12 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y1

Number of Groups   Within Group Size   Bias     MSE      Hits
5                  50                  0.0078   0.0005   0.19
5                  100                 0.0069   0.0004   0.34
5                  150                 0.0056   0.0003   0.50
15                 50                  0.0054   0.0003   0.50
15                 100                 0.0048   0.0002   0.70
15                 150                 0.0045   0.0001   0.78
25                 50                  0.0053   0.0002   0.54
25                 100                 0.0042   0.0001   0.73
25                 150                 0.0033   0.0000   0.82

Note1. Bias is defined as the mean difference between Bootstrap ICC and True ICC.
Note2. MSE is the mean squared error of the Bootstrap ICC estimates.
Note3. Hits shows the proportion of Bootstrap ICC estimates successfully lying within the 95%
confidence interval of True ICC.
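
The low, medium, and high summaries quoted in the text read as marginal averages of the Table 5.12 cells over the other design factor. The Python sketch below illustrates that averaging, assuming simple unweighted means; the helper name `marginal_mean` is illustrative. Because the inputs are the rounded table entries, the results can differ somewhat from figures computed on unrounded values.

```python
from statistics import mean

# Table 5.12 cells: (number of groups, within-group size) -> (bias, MSE, hits)
table_5_12 = {
    (5, 50): (0.0078, 0.0005, 0.19),
    (5, 100): (0.0069, 0.0004, 0.34),
    (5, 150): (0.0056, 0.0003, 0.50),
    (15, 50): (0.0054, 0.0003, 0.50),
    (15, 100): (0.0048, 0.0002, 0.70),
    (15, 150): (0.0045, 0.0001, 0.78),
    (25, 50): (0.0053, 0.0002, 0.54),
    (25, 100): (0.0042, 0.0001, 0.73),
    (25, 150): (0.0033, 0.0000, 0.82),
}


def marginal_mean(table, factor_index, level):
    """Average (bias, MSE, hits) over all cells where the chosen factor equals `level`.

    factor_index 0 selects the number of groups, 1 the within-group size.
    """
    rows = [vals for key, vals in table.items() if key[factor_index] == level]
    return tuple(mean(col) for col in zip(*rows))


for groups in (5, 15, 25):
    bias, mse, hits = marginal_mean(table_5_12, 0, groups)
    print(f"groups={groups:>2}: bias={bias:.4f}  MSE={mse:.4f}  hits={hits:.0%}")

for size in (50, 100, 150):
    bias, mse, hits = marginal_mean(table_5_12, 1, size)
    print(f"subjects={size:>3}: bias={bias:.4f}  MSE={mse:.4f}  hits={hits:.0%}")
```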

5.4.2 Earnings or Quality Employment Outcome Measure

Weekly earned income, or quality employment (Y2), is fitted as a continuous outcome measure in
the two-level hierarchical linear modeling (HLM) framework, where individual subjects are at
level 1 and office units are at level 2. The main results of the unconditional model M1
(Model 1) are shown in Table 5.13; the conditional model M2 (Model 2) in Table 5.14; the
conditional model M3 (Model 3) in Table 5.15; and the conditional model M4 (Model 4) in Table
5.16. Table 5.17 provides the auxiliary information on the ICC estimates, including the
relative efficiency measures, and Table 5.18 shows the ICC evaluation results based on the
bootstrap sampling procedure (the number of bootstrap repetitions=100).
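
For reference, a two-level random-intercept model of the kind described here is commonly written as below. The symbols ($\beta_{0j}$, $\gamma_{00}$, $u_{0j}$, $e_{ij}$, $\tau^2$, $\sigma^2$, $\rho$) follow conventional HLM notation and are an assumption of this sketch; they are not taken from this section.

```latex
\begin{align*}
\text{Level 1 (subject $i$ in office $j$):}\quad
  & Y_{ij} = \beta_{0j} + e_{ij}, & e_{ij} &\sim N(0, \sigma^2), \\
\text{Level 2 (office $j$):}\quad
  & \beta_{0j} = \gamma_{00} + u_{0j}, & u_{0j} &\sim N(0, \tau^2), \\
\text{Intraclass correlation:}\quad
  & \rho = \frac{\tau^2}{\tau^2 + \sigma^2}.
\end{align*}
```

Under this formulation the conditional models add covariates to the level-1 equation, and the conditional ICC uses the residual between-group and within-group variance components in place of $\tau^2$ and $\sigma^2$.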

The ICC estimates (with standard errors, p-values, and 95% confidence intervals) for the
quality employment outcome measure (Y2) under the unconditional (Model 1) and conditional
(Models 2-4) multilevel modeling structures are summarized as follows.

For quality employment (Y2 under Model 1; see Table 5.13), the average (unadjusted)
intraclass correlation is about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). When the sample
is partitioned by work experience (binary coding of yes or no), the two subsets show an
average (unadjusted) ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]). When it is
partitioned by disability significance (binary coding of yes or no), the two subsets show an
average (unadjusted) ICC of 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]). By disability type,
learning disability (LD) has the highest (unadjusted) ICC of 0.03 (SE=0.01, p<0.01, 95% CI =
[0.02, 0.07]), followed by substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00,
0.09]), hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), physical
impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI;
ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC
estimates are not significant for the following disability types: visual impairments (VI;
ICC=0.00, SE=0.08, p=0.50), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01,
p=0.87), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI;
ICC=-0.08, SE=0.03, p=0.88), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.13).

For quality employment (Y2 under Model 2; see Table 5.14), the average intraclass
correlation, adjusted for demographic information, is about 0.02 (SE=0.01, p<0.01, 95% CI =
[0.01, 0.04]). When the sample is partitioned by work experience (binary coding of yes or no),
the two subsets show an average adjusted ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]).
When it is partitioned by disability significance (binary coding of yes or no), the two subsets
show an average adjusted ICC of 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]). By disability
type, learning disability (LD) has the highest adjusted ICC of 0.03 (SE=0.01, p<0.01, 95% CI =
[0.02, 0.07]), followed by hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01,
0.06]), substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), physical
impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI;
ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC
estimates are not significant for the following disability types: visual impairments (VI;
ICC=0.00, SE=0.08, p=0.49), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01,
p=0.87), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI;
ICC=-0.08, SE=0.03, p=0.87), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.13).

For quality employment (Y2 under Model 3; see Table 5.15), the average intraclass
correlation, adjusted for rehabilitation services, is about 0.02 (SE=0.01, p<0.01, 95% CI =
[0.01, 0.04]). When the sample is partitioned by work experience (binary coding of yes or no),
the two subsets show an average adjusted ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]).
When it is partitioned by disability significance (binary coding of yes or no), the two subsets
show an average adjusted ICC of 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]). By disability
type, learning disability (LD) has the highest adjusted ICC of 0.04 (SE=0.01, p<0.01, 95% CI =
[0.02, 0.07]), followed by substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00,
0.09]), hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]),
intellectual disability (ID; ICC=0.03, SE=0.02, p=0.03, 95% CI = [0.00, 0.08]), physical
impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI;
ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC
estimates are not significant for the following disability types: visual impairments (VI;
ICC=-0.01, SE=0.08, p=0.52), attention deficit hyperactivity disorder (ADHD; ICC=-0.02,
SE=0.01, p=0.81), traumatic brain injury (TBI; ICC=-0.07, SE=0.03, p=0.86), and autism spectrum
disorder (ASD; ICC=0.02, SE=0.02, p=0.12).

For quality employment (Y2 under Model 4; see Table 5.16), the average intraclass
correlation, adjusted for both demographics and rehabilitation services, is about 0.02
(SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). When the sample is partitioned by work experience
(binary coding of yes or no), the two subsets show an average adjusted ICC of 0.03 (SE=0.01,
p<0.01, 95% CI = [0.02, 0.05]). When it is partitioned by disability significance (binary
coding of yes or no), the two subsets show an average adjusted ICC of 0.05 (SE=0.01, p<0.01,
95% CI = [0.03, 0.09]). By disability type, learning disability (LD) has the highest adjusted
ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), followed by substance abuse (SA;
ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), hearing impairments (HI; ICC=0.03, SE=0.01,
p<0.01, 95% CI = [0.01, 0.06]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI =
[0.01, 0.05]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At
the 0.05 significance level, the ICC estimates are not significant for the following
disability types: visual impairments (VI; ICC=0.00, SE=0.08, p=0.49), attention deficit
hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01, p=0.87), intellectual disability (ID;
ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=-0.08, SE=0.03, p=0.87), and
autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.12).

For the auxiliary information on the ICC estimates for outcome measure Y2 (see Table 5.17),
the unconditional model (Model 1; unconditional ICC=0.02 and design effect DE=8.49) is used as
the baseline for measuring the relative efficiency of the ICC estimates in terms of
between-group variance and within-group variance (for each, the ratio to the Model 1 variance
and the corresponding proportional change). The conditional model with the demographic
covariate set (Model 2; conditional ICC=0.02 and design effect DE=9.38) shows a 9.75% decrease
in within-group variation and a 1.27% increase in between-group variation, in comparison with
the unconditional model (Model 1). The conditional model with the rehabilitation service
covariate set (Model 3; conditional ICC=0.02 and design effect DE=8.70) shows a 2.47% decrease
in within-group variation and a 0.32% increase in between-group variation, in comparison with
Model 1. The conditional model with both the demographic and the rehabilitation service
covariate sets (Model 4; conditional ICC=0.02 and design effect DE=9.41) shows a 10.02%
decrease in within-group variation and a 1.31% increase in between-group variation, in
comparison with the unconditional model (Model 1).

The bootstrap evaluation of the ICC estimates for outcome measure Y2 (100 bootstrap
repetitions; see Table 5.18) compares resampling scenarios that vary the number of groups and
the number of subjects per group in the multilevel structure, based on Model 4 with the full
covariate set of demographics and rehabilitation services. For the low level of cluster
samples (number of groups=5), the mean bias is about 0.0164, the MSE is about 0.0009, and the
proportion of successful hits is about 34%. For the medium level of cluster samples (number of
groups=15), the mean bias is about 0.0152, the MSE is about 0.0004, and the proportion of
successful hits is about 55%. For the high level of cluster samples (number of groups=25), the
mean bias is about 0.0149, the MSE is about 0.0003, and the proportion of successful hits is
about 64%. Turning to the number of subjects, for the low level of subject samples (number of
subjects=50), the mean bias is about 0.0160, the MSE is about 0.0007, and the proportion of
successful hits is about 40%. For the medium level of subject samples (number of subjects=100),
the mean bias is about 0.0154, the MSE is about 0.0004, and the proportion of successful hits
is about 54%. For the high level of subject samples (number of subjects=150), the mean bias is
about 0.0148, the MSE is about 0.0004, and the proportion of successful hits is about 66%.
Overall, the sampling scheme with the high level of group samples (25) and the high level of
subject samples (150) achieves the best outcome (lowest bias and MSE, and highest proportion of
successful hits); the sampling scheme with moderate cluster or subject samples (number of
groups=15 or number of subjects=100) gives average performance of ICC estimates in the
multilevel structure; and a sampling scheme with the low level of group samples (5) or the low
level of subject samples (50) is more likely to result in poor performance of ICC estimates in
the hierarchical linear modeling structure.
      

Table 5.13 ICC Estimates of Unconditional Model M1 for Outcome Measure Y2

Model 1                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0211    0.0054     0.00    0.0127    0.0381
Work Experience
  No                      8,821      33     266    0.0134    0.0042     0.00    0.0073    0.0257
  Yes                     2,998      33      90    0.0408    0.0118     0.00    0.0227    0.0758
Significance Disability
  No                      1,233      33      36    0.0797    0.0237     0.00    0.0422    0.1434
  Yes                    10,586      33     319    0.0171    0.0048     0.00    0.0100    0.0316
Disability Type
  VI                         87      29       3   -0.0044    0.0798     0.50   -0.1832    0.2422
  HI                      1,989      32      61    0.0273    0.0110     0.00    0.0117    0.0577
  PI                      2,154      33      65    0.0223    0.0092     0.00    0.0090    0.0488
  LD                      2,276      33      68    0.0342    0.0116     0.00    0.0171    0.0676
  ADHD                      443      33      13   -0.0219    0.0081     0.87   -0.0425    0.0198
  ID                        652      33      19    0.0233    0.0176     0.05   -0.0034    0.0743
  TBI                       132      27       5   -0.0773    0.0281     0.88   -0.1447    0.0604
  ASD                       436      33      13    0.0226    0.0225     0.13   -0.0138    0.0899
  MI                      3,073      33      92    0.0190    0.0074     0.00    0.0085    0.0401
  SA                        577      31      18    0.0283    0.0207     0.04   -0.0026    0.0853

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.14 ICC Estimates of Conditional Model M2 for Outcome Measure Y2

Model 2                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0236    0.0059     0.00    0.0143    0.0423
Work Experience
  No                      8,821      33     266    0.0135    0.0042     0.00    0.0074    0.0259
  Yes                     2,998      33      90    0.0457    0.0126     0.00    0.0260    0.0838
Significance Disability
  No                      1,233      33      36    0.0869    0.0246     0.00    0.0471    0.1540
  Yes                    10,586      33     319    0.0183    0.0050     0.00    0.0108    0.0336
Disability Type
  VI                         87      29       3   -0.0016    0.0808     0.49   -0.1811    0.2453
  HI                      1,989      32      61    0.0293    0.0115     0.00    0.0130    0.0610
  PI                      2,154      33      65    0.0227    0.0093     0.00    0.0093    0.0495
  LD                      2,276      33      68    0.0346    0.0117     0.00    0.0174    0.0682
  ADHD                      443      33      13   -0.0218    0.0081     0.87   -0.0425    0.0198
  ID                        652      33      19    0.0238    0.0177     0.05   -0.0031    0.0750
  TBI                       132      27       5   -0.0750    0.0287     0.87   -0.1433    0.0636
  ASD                       436      33      13    0.0233    0.0226     0.13   -0.0134    0.0908
  MI                      3,073      33      92    0.0193    0.0074     0.00    0.0087    0.0405
  SA                        577      31      18    0.0283    0.0207     0.04   -0.0026    0.0853

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.15 ICC Estimates of Conditional Model M3 for Outcome Measure Y2

Model 3                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0217    0.0055     0.00    0.0131    0.0391
Work Experience
  No                      8,821      33     266    0.0136    0.0042     0.00    0.0074    0.0260
  Yes                     2,998      33      90    0.0429    0.0121     0.00    0.0241    0.0793
Significance Disability
  No                      1,233      33      36    0.0855    0.0244     0.00    0.0462    0.1520
  Yes                    10,586      33     319    0.0175    0.0049     0.00    0.0103    0.0324
Disability Type
  VI                         87      29       3   -0.0099    0.0777     0.52   -0.1872    0.2361
  HI                      1,989      32      61    0.0273    0.0111     0.00    0.0117    0.0577
  PI                      2,154      33      65    0.0223    0.0092     0.00    0.0091    0.0488
  LD                      2,276      33      68    0.0359    0.0120     0.00    0.0182    0.0703
  ADHD                      443      33      13   -0.0171    0.0100     0.81   -0.0394    0.0274
  ID                        652      33      19    0.0275    0.0187     0.03   -0.0007    0.0807
  TBI                       132      27       5   -0.0727    0.0292     0.86   -0.1419    0.0669
  ASD                       436      33      13    0.0242    0.0229     0.12   -0.0128    0.0922
  MI                      3,073      33      92    0.0193    0.0074     0.00    0.0087    0.0405
  SA                        577      31      18    0.0282    0.0207     0.04   -0.0027    0.0851

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.16 ICC Estimates of Conditional Model M4 for Outcome Measure Y2

Model 4                   Total  Number  Within       ICC     SE of              Lower     Upper
                         Sample      of   Group  Estimate       ICC  p-value     Bound     Bound
                           Size  Groups    Size            Estimate             of ICC    of ICC
Overall Sample           11,819      33     356    0.0237    0.0059     0.00    0.0144    0.0424
Work Experience
  No                      8,821      33     266    0.0135    0.0042     0.00    0.0074    0.0259
  Yes                     2,998      33      90    0.0458    0.0126     0.00    0.0261    0.0840
Significance Disability
  No                      1,233      33      36    0.0872    0.0246     0.00    0.0473    0.1544
  Yes                    10,586      33     319    0.0184    0.0050     0.00    0.0108    0.0337
Disability Type
  VI                         87      29       3   -0.0015    0.0808     0.49   -0.1810    0.2454
  HI                      1,989      32      61    0.0293    0.0115     0.00    0.0130    0.0610
  PI                      2,154      33      65    0.0227    0.0093     0.00    0.0093    0.0495
  LD                      2,276      33      68    0.0348    0.0117     0.00    0.0175    0.0684
  ADHD                      443      33      13   -0.0216    0.0082     0.87   -0.0424    0.0201
  ID                        652      33      19    0.0240    0.0178     0.05   -0.0030    0.0753
  TBI                       132      27       5   -0.0754    0.0286     0.87   -0.1436    0.0630
  ASD                       436      33      13    0.0235    0.0227     0.12   -0.0132    0.0912
  MI                      3,073      33      92    0.0193    0.0074     0.00    0.0087    0.0406
  SA                        577      31      18    0.0283    0.0207     0.04   -0.0026    0.0853

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness;
PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity
Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum
Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. P-value=0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.17 Auxiliary Information of ICC Estimates for Outcome Measure Y2

Modeling       ICC        Between    Within       Design    Between    Within     1 minus    1 minus
Structure      Estimate   Group      Group        Effect    Variance   Variance   Between    Within
                          Variance   Variance     (DE)      Ratio      Ratio      Ratio      Ratio
Model 1 (M1)   0.0211     2,275.62   105,264.82   8.4907    NA         NA         NA         NA
Model 2 (M2)   0.0236     2,304.53   95,000.22    9.3782    1.0127     0.9025     -0.0127    0.0975
Model 3 (M3)   0.0217     2,282.94   102,665.47   8.7037    1.0032     0.9753     -0.0032    0.0247
Model 4 (M4)   0.0237     2,305.32   94,718.63    9.4137    1.0131     0.8998     -0.0131    0.1002

Note1. The ICC estimate for M1 shows the unconditional ICC quantity; M2-M4 show the
conditional ICC quantity.
Note2. Relative efficiency measures for ICC estimates between unconditional and conditional
models (M1 versus M2-M4) are, in column order, the ratio of the between-group variance to that
of M1, the ratio of the within-group variance to that of M1, and one minus each of these
ratios.

Table 5.18 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y2

Number of Groups   Within Group Size   Bias     MSE      Hits
5                  50                  0.0175   0.0013   0.20
5                  100                 0.0162   0.0007   0.37
5                  150                 0.0156   0.0006   0.45
15                 50                  0.0154   0.0005   0.47
15                 100                 0.0153   0.0003   0.51
15                 150                 0.0148   0.0003   0.66
25                 50                  0.0152   0.0004   0.52
25                 100                 0.0147   0.0003   0.75
25                 150                 0.0139   0.0002   0.86

Note1. Bias is defined as the mean difference between Bootstrap ICC and True ICC.
Note2. MSE is the mean squared error of the Bootstrap ICC estimates.
Note3. Hits shows the proportion of Bootstrap ICC estimates successfully lying within the 95%
confidence interval of True ICC.
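
The three evaluation criteria in Notes 1-3 can be computed directly from a set of bootstrap ICC estimates. The Python sketch below assumes that bias is the mean signed deviation from the reference ("true") ICC, that MSE is the mean squared deviation from that same reference, and that hits is the share of estimates falling inside the reference 95% confidence interval. The function name and the five example estimates are hypothetical; the reference values are the overall Model 4 entries for Y2 from Table 5.16, used only as a stand-in.

```python
def evaluate_bootstrap(estimates, true_icc, ci_lower, ci_upper):
    """Return (bias, mse, hits) for a list of bootstrap ICC estimates."""
    n = len(estimates)
    if n == 0:
        raise ValueError("need at least one bootstrap estimate")
    deviations = [est - true_icc for est in estimates]
    bias = sum(deviations) / n                      # mean signed deviation
    mse = sum(d * d for d in deviations) / n        # mean squared deviation
    hits = sum(ci_lower <= est <= ci_upper for est in estimates) / n
    return bias, mse, hits


# Hypothetical bootstrap estimates, evaluated against a stand-in reference:
# ICC = 0.0237 with 95% CI = [0.0144, 0.0424] (overall sample, Table 5.16).
example = [0.020, 0.031, 0.045, 0.012, 0.027]
bias, mse, hits = evaluate_bootstrap(example, 0.0237, 0.0144, 0.0424)
print(f"bias={bias:.4f}  mse={mse:.6f}  hits={hits:.2f}")
```

Because MSE combines squared bias and sampling variance, a scenario can show a small bias and still a comparatively large MSE when the bootstrap estimates are widely spread.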

CHAPTER 6
CONCLUSION & DISCUSSION

6.1 Summary of the Results
The proposed method for ICC estimation and inference is applied to the real-world RSA-911 data set, where the usable sample consists of individuals with disabilities served by the Michigan Rehabilitation Services programs in FY 2015 (n=11,819). To address the research questions of the study, a two-level multilevel model for the cluster-randomized design data structure is fitted in the data simulations, where individual subjects are at level 1 (the average within-cluster size is 356 per unit) and rehabilitation offices are at level 2 (there are 33 vocational rehabilitation offices statewide in Michigan). Two types of multilevel models are used in the data simulations: (1) an unconditional model (Model 1); and (2) conditional models (Models 2-4). To evaluate which multilevel modeling structures match better with which sampling schemes, a bootstrap resampling procedure is adopted in the data simulation and analysis to compare the ICC estimates between the population (Research Question 1) and subsample (Research Question 2) models, in terms of the accuracy and precision of ICC estimation and inference (Research Question 3).
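A resampling scheme of this kind can be sketched as a two-stage draw: first clusters, then subjects within each drawn cluster. The sketch below is an assumption about the mechanics, not the procedure documented in the study: sampling with replacement at both stages, the function name, and the toy data are all hypothetical. Only the design grid (5, 15, 25 groups by 50, 100, 150 subjects) comes from the bootstrap evaluation tables.

```python
import random


def two_stage_resample(clusters, n_groups, group_size, rng):
    """Draw one bootstrap replicate from clustered data.

    clusters: dict mapping a cluster id to a list of outcome values.
    Assumption: clusters and subjects are both drawn with replacement.
    """
    ids = list(clusters)
    replicate = []
    for _ in range(n_groups):
        cid = rng.choice(ids)  # stage 1: pick a cluster
        # stage 2: pick subjects within the chosen cluster
        replicate.append([rng.choice(clusters[cid]) for _ in range(group_size)])
    return replicate


rng = random.Random(2019)
# Toy data: 33 hypothetical offices with 40 simulated outcomes each.
toy = {office: [rng.gauss(office, 5.0) for _ in range(40)] for office in range(33)}

design_grid = [(j, n) for j in (5, 15, 25) for n in (50, 100, 150)]
for j, n in design_grid:
    rep = two_stage_resample(toy, j, n, rng)
    print(j, n, len(rep), len(rep[0]))
```

Each replicate would then be passed to an ICC estimator, and the resulting estimates compared with the full-sample ICC.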
(a) Research Question 1 for Outcome Measure Y1 (see Tables 5.7-5.10)

For the overall sample of competitive employment, the ICC estimate is about 0.01 on average (SE=0.003, p<0.01). By work experience (no work experience, in particular), the ICC estimate is inflated slightly (i.e., 0.002), but so is the standard error (i.e., 0.002), compared with the overall sample. By disability significance (no disability significance, in particular), the ICC estimate is inflated more (i.e., 0.02) and so is the standard error (i.e., 0.01), compared with the overall sample. By disability type, the ICC estimate is inflated most (i.e., 0.05) for ASD, followed by LD (i.e., 0.02), HI (i.e., 0.01), PI (i.e., 0.01), and MI (i.e., 0.01). Also note that VI has the highest ICC (about 0.07), but the estimate is not significant at the 0.05 level because of the small sample size (the total sample size is 87 across 29 office units).
(b) Research Question 1 for Outcome Measure Y2 (see Tables 5.13-5.16)

For the overall sample of quality employment, the ICC estimate is about 0.02 on average (SE=0.005, p<0.01). By work experience (having work experience, in particular), the ICC estimate is inflated to some extent (i.e., 0.02) and so is the standard error (i.e., 0.006), compared with the overall sample. By disability significance (no disability significance, in particular), the ICC estimate is inflated considerably (i.e., 0.06) and so is the standard error (i.e., 0.02), compared with the overall sample. By disability type, the ICC estimate for LD is inflated most (i.e., 0.01), followed by SA (i.e., 0.01), HI (i.e., 0.01), and PI (i.e., 0.001). Also note that the ICC estimate for MI is lower than that of the overall sample by about 0.002.
(c) Research Question 2 for Outcome Measure Y1 (see Tables 5.11-5.12)

The examination of bootstrap ICC estimates (repetitions=100) for competitive employment under the different sampling scenarios provides important sampling design information about hierarchical modeling with the full set of covariates of individual characteristics and rehabilitation services. With an average cluster sample size (e.g., the number of clusters is about 10-15), the mean bias is about 0.005, the MSE is about 0.0002, and the proportion of successful hits is about 70%. With an average subject sample size (e.g., the number of subjects is around 100), the mean bias is about 0.0053, the MSE is about 0.0002, and the proportion of successful hits is close to 60%. That is, the within-cluster subject size plays an auxiliary role in the quality of ICC estimation and inference, while the between-cluster sample size determines the overall quality of the ICC estimates.

In general, with large cluster samples (e.g., the number of clusters is 15-25) and average within-cluster samples (e.g., the within-cluster size is 100-150), ICC estimation and inference can perform effectively in terms of accuracy and precision; on the other hand, with a smaller number of clusters (e.g., 5 or below) or a smaller within-cluster sample size (e.g., 50 or below), the ICC estimate tends to be less reliable and more biased in the hierarchical generalized linear modeling framework for a binary outcome measure.

(d) Research Question 2 for Outcome Measure Y2 (see Tables 5.17-5.18)

The examination of bootstrap ICC estimates (repetitions=100) for quality of employment under the different resampling scenarios provides crucial sampling design information about multilevel modeling with the full set of covariates of individual characteristics and rehabilitation services. With an average cluster sample size (e.g., the number of clusters is about 10-15), the mean bias is about 0.015, the MSE is about 0.0003, and the proportion of successful hits is about 55%. With an average subject sample size (e.g., the number of subjects is around 100), the mean bias is also about 0.015, the MSE is about 0.0004, and the proportion of successful hits is close to 55% as well. That is, the within-cluster size plays a supplemental role in ICC estimation and inference, while the between-cluster size can still boost the performance of the ICC estimates.

In general, with large cluster samples (e.g., the number of clusters is 15-25+) and average within-cluster samples (e.g., the within-cluster size is 100-150+), ICC estimation and inference can perform effectively in terms of accuracy and precision; on the other hand, with a smaller number of clusters (e.g., 10 or less) or a smaller within-cluster sample size (e.g., 50 or less), the ICC estimate is prone to be less consistent and more biased in the hierarchical linear modeling framework for a continuous outcome measure.
(e) Research Question 3 for Outcome Measure Y1 (see Tables 5.11-5.12)

As for the auxiliary statistics of the ICC estimates for competitive employment, the unadjusted ICC is about 0.01 (DE=4.44), while the adjusted ICC is also about 0.01 (DE=4.67). The unconditional model is used as a baseline to measure the relative efficiency of between- and within-group variances for ICC estimates in the conditional models. Among the three competing conditional models (Models 2-4), Model 3 (the one with a covariate set of service information) has the largest decrease in within-group variation (8.06%) as well as a notable increase of 4.17% in between-group variation, compared with the baseline model (Model 1). Note that Model 2 (demographic model) and Model 4 (full model) perform similarly, with a decrease of 3.05% in within-group variation and no change (0.00%) in between-group variation, compared with the baseline.
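The auxiliary statistics discussed here can be recomputed from the variance components. The sketch below uses the Model 1 and Model 4 values reported for Outcome Measure Y2 in Table 5.17, because that table is the one reproduced with its variance components. It assumes the standard design effect formula DE = 1 + (m - 1) x ICC with m the average cluster size (356 per unit, as stated in Section 6.1); the function names are hypothetical, and small differences from the tabulated values are expected from rounding.

```python
def icc(between_var, within_var):
    """Share of total variance that lies between groups."""
    return between_var / (between_var + within_var)


def design_effect(icc_value, avg_cluster_size):
    """Standard design effect (variance inflation factor): 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc_value


def relative_change(conditional_var, baseline_var):
    """Proportional change against the baseline; negative means a decrease."""
    return conditional_var / baseline_var - 1.0


# Variance components for Y2 as reported in Table 5.17.
m1_between, m1_within = 2275.62, 105264.82   # Model 1 (unconditional)
m4_between, m4_within = 2305.32, 94718.63    # Model 4 (full covariate set)

rho_m1 = icc(m1_between, m1_within)
print("ICC (M1):", round(rho_m1, 4))
print("DE  (M1):", round(design_effect(rho_m1, 356), 2))
print("within-group change, M4 vs M1:", round(relative_change(m4_within, m1_within), 4))
print("between-group change, M4 vs M1:", round(relative_change(m4_between, m1_between), 4))
```

Read this way, a covariate set is useful for planning when it lowers the within-group variance without shrinking the between-group variance it is meant to isolate.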
(f) Research Question 3 for Outcome Measure Y2 (see Tables 5.17-5.18)

As for the auxiliary statistics of the ICC estimates for quality of employment, the unadjusted ICC is about 0.02 (DE=8.49), while the adjusted ICC is also about 0.02 (DE=9.17). The unconditional model is used as a baseline to measure the relative efficiency of between- and within-group variances for ICC estimates in the conditional models. Among the three competing conditional models (Models 2-4), Model 4 (the one with the full covariate set of demographics and services) and Model 2 (the one with a covariate set of demographic information) have the largest decrease in within-group variation (about 9.88%) as well as a slight increase of 1.29% in between-group variation, compared with the baseline model (Model 1). Note that Model 3 (service model) performs relatively ineffectively, with a modest decrease of 2.47% in within-group variation and a tiny increase of 0.32% in between-group variation, compared with the baseline model.

6.2 Implications
(a) Statistical perspectives on the ICC estimation and inference

The intraclass correlation coefficient (ICC) in experimental design has been one of the oldest statistical measures since Sir R. A. Fisher introduced it in the last century (Fisher, 1925a). It has been used as one of the most popular and important tools in scientific inquiries, including educational and social research. From a theoretical perspective, the correlation coefficient and the intraclass correlation share mathematical similarities and features. For example, the ICC can be used to measure the level of similarity or resemblance within a group of measurements (e.g., students in a classroom or school), and the general formula of the intraclass correlation can be written in a form very similar to that of the correlation coefficient. Fisher (1925a) also pointed out that the ICC can be geometrically equivalent to the overall Euclidean distance between the paired samples on the standardized scale (see Figures 2.2 and 2.3 as examples). In terms of effect size measures, both the correlation and the ICC can determine the effect size magnitude of a studied phenomenon of interest; in particular, the ICC shows the amount of total variance explained by between-group variation in an experimental design model (e.g., hierarchical linear models), and it is another form of the squared correlation (R-squared) in analysis of variance models, which accounts for the true proportion of outcome variance across different clusters.
One research gap in the methodology for ICC estimation and inference concerns the test statistic and its sampling distribution. This study aims to address that important issue by developing the mathematical foundations of the ICC estimator in a hierarchical design (e.g., cluster randomized trials). Donner & Koval (1980a) derived the maximum likelihood estimator (MLE) of the intraclass correlation using variance components in analysis of variance (ANOVA) models. Since the traditional method (Fisher's approach) requires distributional assumptions (based on multivariate normal theory), classical ANOVA provides an alternative estimator of the intraclass correlation that relaxes the multivariate normal assumptions. This study extends it by utilizing relevant information in the ANOVA table (i.e., between- and within-group variation) to develop a general statistical framework for the ICC in the multilevel structure (i.e., a flexible approach to either a balanced design with equal group sizes or an unbalanced design with unequal group sizes).
It is noteworthy that the approximate group size (or the average within-group size by Donner & Koval, 1980a) is key in the unbalanced design case for computing the proposed ICC estimator (see Figure 2.4 as an illustration).

As for statistical testing of the proposed ICC estimator ($\hat{\rho}$), this study suggests the use of the F-distribution (with $J-1$ and $N-J$ degrees of freedom) and the F-test statistic (based on ANOVA) for determining whether the null or the alternative hypothesis about the magnitude of effects holds at the chosen level of significance (i.e., $H_0: \rho = 0$ vs. $H_1: \rho > 0$). A significant F-test statistic implies that members of the same group tend to be more alike with respect to the attribute or characteristic in question than those from different groups (i.e., if within-group subjects are perfectly homogeneous, or equivalently $\sigma^2 = 0$, then it implies $\rho = 1$). As for a $100(1-\alpha)\%$ confidence interval on the ICC, this study provides the formulas for the corresponding interval for an ICC estimand (i.e., the true proportion of variance accounted for by a grouping factor of interest in a hierarchical design). It is also worth pointing out that the lower confidence limit of an ICC interval estimate could be negative using the proposed method, especially when a small sample size or large measurement error occurs in hierarchical modeling; but since the ICC is normally non-negative by its mathematical definition, it is common practice to truncate a negative limit at zero as a post-hoc adjustment (Hays, 1994).

As for the variance of the proposed ICC estimator, this study uses the MLE approach (multivariate normality in large sample theory) by Donner & Koval (1980a) to obtain the standard error of the ICC estimate. It is interesting to note that the MLE of the ICC is available in closed form (i.e., a quick shortcut solution for ICC estimation) for a balanced design in hierarchical modeling; but for an unbalanced design, the MLE of the ICC needs to be solved by a different approach, either numerical optimization via the multivariate log-likelihood by Donner & Koval (1980b) or using the invariance property of MLEs by Karlin et al. (1981).

This study also relates to the approach (Hedges & Hedberg, 2007) that uses the ICC via hierarchical modeling to collect the clustering information of variance components in cluster randomized trials (CRT). Nowadays CRTs have become more and more popular in education and social studies for practical reasons: an RCT (randomized control trial) is too expensive because it assigns each individual subject, whereas a CRT is more economical because it deals with an entire intact group at one time. Since the ICC has been considered an ancillary statistic that provides the design effect (DE, or variance inflation factor, VIF) for statistical planning in multilevel design, the ICC can play a key role in effectively quantifying the amount of inherent clustering effects for a CRT survey study (Hedges et al., 2012; Hedges & Hedberg, 2013). It is important to note that a clustering design (CRT) has more total variation (i.e., cluster-to-cluster plus within-cluster variance) than simple random sampling (RCT) by a factor of DE (that is why it is also called VIF).

As for experimental design with a binary outcome (e.g., a dichotomous variable), the proposed ICC estimator in this study is derived by using the hierarchical generalized linear modeling framework (HGLM; Raudenbush & Bryk, 2002). It is conventional (and also mathematically convenient) to use a constant variance (i.e., $\pi^2/3$) as the within-group variance based on the standard logistic distribution (location $0$ and scale $1$). However, this strong assumption of a within-group variance fixed at $\pi^2/3$ is often not met in the real world, so the modification strategy recommended by the study is to introduce a more flexible estimation procedure that incorporates a data-driven within-group variance via HGLM for the proposed ICC estimation and inference.

Last but not least, the proposed ICC method is also connected with statistical planning in experimental design for sample size determination and power calculation, which is critical for researchers to conduct rigorous scientific investigations that detect true effects at a desired effect size, statistical power, and significance level. Traditionally, design and planning for sample size and power calculations require the classical restrictive assumption of simple random samples, which is not met in multilevel modeling. Hence, this study proposes a theoretical framework for the ICC estimator to circumvent this shortcoming by taking into account heterogeneity in the hierarchical structures of cluster samples (such as CRT). The proposed ICC estimation and inference is feasible via the use of between- and within-group variance in the ANOVA of hierarchical linear modeling, and the test statistic is based on the F-distribution to serve as
a foundation for statistical inference of the ICC estimand in a multilevel design.
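The ANOVA-based estimation described in this subsection can be illustrated with the textbook one-way ANOVA estimator of the ICC, which handles unequal group sizes through an adjusted average group size. This is a minimal sketch under stated assumptions and not the estimator derived in the dissertation: the function names, the adjusted size `n0`, and the toy data are illustrative. The second function shows the conventional latent-variable ICC for a binary outcome, where the within-group variance is fixed at pi^2/3. The p-value is omitted because the Python standard library has no F distribution; a statistics library could supply the upper tail of F with (J - 1, N - J) degrees of freedom.

```python
import math


def anova_icc(groups):
    """Textbook one-way ANOVA estimator of the ICC (unequal group sizes allowed).

    groups: list of lists of numeric outcomes, one inner list per cluster.
    Assumes some within-group variability (MSE > 0).
    """
    j = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msa = ss_between / (j - 1)          # between-group mean square
    mse = ss_within / (n_total - j)     # within-group mean square

    # Adjusted average group size for an unbalanced design.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (j - 1)

    raw = (msa - mse) / (msa + (n0 - 1.0) * mse)
    return {
        "raw_icc": raw,
        "icc": max(0.0, raw),           # post-hoc truncation at zero
        "F": msa / mse,
        "df": (j - 1, n_total - j),
        "n0": n0,
    }


def logistic_icc(between_var):
    """Latent-variable ICC for a binary outcome with within variance pi^2 / 3."""
    return between_var / (between_var + math.pi ** 2 / 3.0)


toy_groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]   # hypothetical data
result = anova_icc(toy_groups)
print(result)
print(logistic_icc(0.5))
```

In the toy data the group means differ far more than the values within each group, so the F ratio is large and the ICC is close to 1; when the between-group mean square falls below the within-group mean square, the raw estimate turns negative and the truncated value is reported as zero.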
(b) Policy perspectives on the ICC estimation and inference

In behavioral, educational, psychological, and social research, the cluster randomized design, which assigns intact groups (e.g., classrooms or schools) to interventions, has been increasingly adopted in the era of evidence-based education and policy (Lingard, 2013). Since an experimental design with such cluster randomization is deemed a hierarchical data structure (i.e., subjects nested within a cluster), statistical planning requires relevant ICC information to account for clustering effects in order to achieve adequate power and collect a sufficient sample. Through the real RSA-911 data set from the U.S. Department of Education, this study provides a comprehensive analysis of the ICC of employment outcomes (i.e., competitive employment and quality of employment measures), adjusted by covariates of interest (i.e., demographics and rehabilitation services), that can be used for statistical planning in CRT research (randomized trials or quasi-experiments) in future education studies. In addition, this study provides relative variance component information (i.e., between-group and within-group variation) that can be useful for understanding which types of covariates should be involved in multilevel design for statistical planning and analysis.
In an era of evidence-based practice (EBP) in rehabilitation counseling & education, researchers are more aware of incorporating scientific, evidence-based ways to empower people with impairments through effective services (Chan et al., 2009). Under the recent legislation of the Workforce Innovation and Opportunity Act of 2014 (WIOA), state and federal VR agencies have to assist the target disability populations to succeed in labor markets in the global economy (WIOA Legislation, 2018). Thus, rehabilitation counselors, educators, practitioners, and researchers all need to work together to adopt the new EBP paradigm to improve the quality of life for VR customers through rehabilitation services. Further, evidence-based best practices in rehabilitation counseling would significantly improve outcomes for people with disabilities by translating knowledge and making good decisions in VR (Leahy et al., 2009, 2010, 2014a, 2014b).

The use of EBP has become a new standard for conducting effective research and gathering reliable data to improve practices and outcomes (Eignor, 2013). Rehabilitation counselors and practitioners can integrate the best EBP research evidence with clinical judgement and expertise to make better decisions that enhance outcomes, so EBP can significantly improve knowledge translation in practice (Kosciulek, 2010). So, not only does EBP provide the foundations for incorporating scientific evidence as well as clinical judgement and expertise to make the best decisions about interventions, services, or treatments for people with disabilities, but EBP also assists rehabilitation counselors to identify relevant literature and assess available information such as the RSA-911 data on services for people with disabilities. So, under the data-driven framework with RSA-911, this study provides the proposed ICC method for a multilevel data structure (i.e., individual subjects are at level 1 and rehabilitation office units are at level 2) that can help rehabilitation counseling researchers better understand the target population of people with disabilities when conducting CRT design and analysis to gather relevant EBP information, by taking into account the clustering effects via the ICC (with respect to the office units statewide) in the RSA-911 data using hierarchical linear models.
Hierarchical data structures are ubiquitous in education and social studies (Raudenbush & Bryk, 1992). In rehabilitation counseling & education, for example, clients are nested in field office structures, which are in turn nested in local districts; local districts are nested in states, states are nested in regions, and so on. So, it is important to take into account all these multilevel structures and the related topological relationships by using the hierarchical modeling framework for design and analysis.

As for the origin of the RSA-911 data, the Rehabilitation Services Administration Case Service Report (RSA-911 for short) contains the summary data that state vocational rehabilitation agencies collect and report in a federally mandated format. The RSA-911 provides researchers with a good resource for gathering EBP evidence (Schwanke & Smith, 2004). Through data mining and deep learning of the RSA-911 data, rehabilitation researchers can study complex issues to build EBP for people with disabilities (Pi & Thielsen, 2011), and they can also explore the big data of RSA-911 to examine which factors (e.g., variables at the individual level or the office level) affect VR outcomes, and how, in which types of disability groups. Therefore, rehabilitation researchers can exploit the RSA-911 data to develop EBP (either by CRT design or quasi-experimental analysis), in particular for conducting individual-level and employment-related interventions, finding effective strategies for VR outcome improvement, and identifying best VR practices to achieve successful outcomes for individuals with disabilities (Fleming et al., 2013; Pi, 2006).

In the previous literature on multilevel modeling using RSA-911 data, Alsaman & Lee (2017) examined the cross-sectional inter-relationships between contextual factors (unemployment rates at the state level), individual factors (demographic background at the person level), and employment outcomes (competitive employment as a binary measure) for the youth population with disabilities using the 2-level hierarchical generalized linear modeling (HGLM) framework. Chan et al. (2014) studied the impact of the economic recession on VR employment by controlling for the contextual factor of the unemployment rate in each state, applying the 2-level HGLM approach. Pi (2006) used the 2-level HLM method with micro- and macro-level factors related to VR outcomes to study the relationships between predictors across levels in VR.
One knowledge gap in the rehabilitation counseling research and literature on ICC applications is how to incorporate relevant ICC information into design and analysis using the RSA-911 data by taking into account the clustering effects via the ICC and the related DE estimates using multilevel models. This study aims to address that important issue by examining the ICC values via HLM and HGLM. The proposed framework for ICC estimation and inference is examined via the real-life RSA-911 data set, where the target sample of interest is people with disabilities in Michigan Rehabilitation Services in FY 2015 (n=11,819). To address the ICC-related research questions of the study, the two-level HLM and HGLM approach to the CRT (or cluster RCT) type of study design is used to conduct the simulations, where person subjects are at level 1 and cluster units are at level 2. Results show that: (i) the overall ICC estimate for both outcome measures (competitive employment and quality employment) tends to be low (0.01 and 0.02, respectively), implying that the clustering effects of rehabilitation office structures capture little of the total variation in the RSA-911 data; (ii) rehabilitation services play a bigger role than individual characteristics in accounting for total variation in both employment outcome measures; (iii) previous work experience, significance of disability, and type of disability (i.e., covariates for subgroup analysis) can affect the outcome measures and also show differences in the ICC estimates, which indicates that researchers should pay attention to those groups with a high ICC value when conducting a CRT design study; (iv) should a CRT experiment be conducted, the recommended minimum cluster sample is about 10-15 units, and the recommended person sample is about 100-150 subjects, for attaining a sample of sufficient quality in analysis.
It is interesting to notice that the average (unadjusted) ICC estimates in the simulation study are comparable to those for psychological and mental health data in school-based intervention designs, in which ICCs range from 0.01 to 0.05 (Murray & Short, 1995), although they are relatively lower than the standards of 0.05-0.15 based on education data in reading and mathematics across Grades K-12 (Bloom et al., 1999, 2007; Hedges & Hedberg, 2007; Schochet, 2008). A low ICC indicates small clustering effects in the multilevel design and analysis. Even so, the design effect still exceeds 1, so the effective sample size (i.e., the total sample size divided by the design effect) is smaller than the nominal sample size, meaning that the bottom line (minimum sample size) must rise to maintain high statistical power and a low standard error under the same model.
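The effect of the design effect on planning can be shown with a short calculation. This is a minimal sketch with hypothetical function names. The total sample size (n=11,819) and the unadjusted design effect for quality of employment (DE=8.49) are the values reported in this chapter; the simple-random-sample target of 400 is purely illustrative.

```python
import math


def effective_sample_size(total_n, design_effect):
    """Total sample size divided by the design effect."""
    return total_n / design_effect


def required_total_n(srs_n, design_effect):
    """Total clustered sample needed to match a simple random sample of srs_n."""
    return math.ceil(srs_n * design_effect)


# Values reported in this chapter: n = 11,819 and DE = 8.49 (unadjusted, Y2).
print(round(effective_sample_size(11819, 8.49), 1))

# Illustrative planning target: a simple random sample of 400 subjects.
print(required_total_n(400, 8.49))
```

Even with an ICC of about 0.02, the large average cluster size means that the clustered sample carries only a fraction of the information of a simple random sample of the same nominal size.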
  6.3 Limitations of the Study
There are four limitations in the study.

(1) Of the different types of effect magnitude measures for the correlation ratio ($\eta^2$), the intraclass correlation (ICC; $\rho$) is a parametric estimator in ANOVA via HLM that quantifies the true proportion of total variance ($\tau^2 + \sigma^2$) accounted for in the outcome. Although the underlying ANOVA framework in HLM suggests that the total variance consists of two independent variance components (each always a positive real number), namely group variance ($\tau^2$) and error variance ($\sigma^2$), an unbiased estimate of the group variance may fail and be found non-positive ($\hat{\tau}^2 \le 0$), especially when MSE (the within-group mean square) is greater than or equal to MSA (the between-group mean square) (Hays, 1994). As a consequence, the ICC estimate is forced to become zero, which is shown as a warning of estimation failure by the command for HLM or HGLM in statistical software (like lmer or glmer from the lme4 package in R). In this case of estimation failure in HLM, model modification is suggested to remedy the situation in which there is more within-group error variation than between-group variation, i.e., MSE $\ge$ MSA, in ANOVA via HLM (Raudenbush & Bryk, 2002).

(2) In the simulation using the RSA-911 data, there are other options for building a different multilevel design and analysis for ICC estimation and inference. In this study, a two-level hierarchical design structure (i.e., individuals at level 1 and offices at level 2) is fitted by HLM and HGLM to find the unadjusted ICC (from the unconditional model without any covariates) and the adjusted ICC (from the conditional model with covariates). On the other hand, an alternative modeling choice is the latent variable modeling (LVM) approach to investigating the multilevel RSA-911 data. Austin & Lee (2014) built a structural equation model (SEM) of VR services via RSA-911 to study predictors of employment outcomes in VR for people with intellectual and co-occurring psychiatric disabilities. Alsaman & Lee (2017) examined the relationships between contextual factors, individual factors, and employment outcomes of transition youth with disabilities in VR using the RSA-911 data in a 2-level HGLM (individuals at Level 1 and states at Level 2). Since the current study does not use latent factors in the HLM and HGLM framework due to the limitation of the HLM and HGLM modeling structure, the alternative LVM approach can provide a holistic modeling structure with latent constructs and manifest variables at the same time to study latent factor structures of interest (Raykov & Marcoulides, 2006). In the VR context, SEM can also be used to examine important predictive associations between individual characteristics, rehabilitation services, and employment outcomes, while HLM is better suited to the multilevel design (such as CRT).
(3) The simulation study using the RSA-911 data does not consider any interactions at the person level or the office level (e.g., demographic variables and service indicators at level 1, or their group means at level 2) for the sake of statistical simplicity in the simulations, but two-way interactions may exist between the individual characteristic and rehabilitation service variables. For example, age group (X3) can be related to education (X5) and rehabilitation services (X6-X8) for both employment outcome measures (Y1 and Y2), according to the sample correlation structures of all predictors in the hierarchical analysis (see Tables 5.4 and 5.5). With those important two-way interactions added to HLM and HGLM, the ICC estimation and inference can be influenced to some degree because the between- and within-group variation is affected by the new predictors (those important two-way interactions) in the HLM and HGLM model. Theoretically, after adding those significant predictors to an HLM or HGLM model, MSE (within-group variation) would decrease to some extent, and the new ICC could increase to a certain degree, compared with the old ICC (based on the baseline model without the newly added two-way interactions).
(4) ICC estimation requires a minimum total sample size ($N$), number of groups ($J$), and within-group size ($n$). If one of the criteria (i.e., $N$, $J$, and $n$) is not met, an invalid ICC estimate is very likely (either the ICC estimate is negative or zero, or the lower bound of the confidence interval is not positive). For example, the lower bound of the ICC confidence interval (CI) for visual impairments (VI) on Y1 under Model 1 is not valid (see Table 5.7), due to the small total sample size and within-group size (87 individuals across 29 office units, as noted in Section 6.1); similarly, the lower bound of the ICC confidence interval (CI) for visual impairments (VI) on Y2 under Model 1 is not valid either (see Table 5.13), and the ICC estimate itself is negative, due again to the small total sample size and within-group size. Future research is needed to determine the minimum sample size thresholds for ICC estimation and inference in HLM and HGLM. From the simulations, the rule of thumb is a total sample size ($N$) greater than 600 and a within-group size ($n$) larger than 20, given about 30 groups. In other words, the quick formula is $N = J \times n$, where $N$ is the total sample size, $J$ is the number of groups, and $n$ is the within-group size; and the simulation finding in the study (based on the RSA-911 data) suggests that the sample size criterion $N > 600$, or $J \times n > 30 \times 20$, makes it more likely that ICC estimation and inference yield a valid and reliable result in the case of CRT (or cluster RCT) via the HLM and HGLM framework using the RSA-911 data.
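The rule of thumb above can be written as a simple check. This is a minimal sketch: the function name is hypothetical, the thresholds (total sample size above 600, within-group size above 20) are the ones stated in this limitation, and a balanced design with a common within-group size is assumed so that the total is the number of groups times the within-group size.

```python
def meets_rule_of_thumb(n_groups, group_size, min_total=600, min_group_size=20):
    """Check the sample size rule of thumb for ICC estimation.

    Assumes a balanced design, so total sample size = n_groups * group_size.
    """
    total = n_groups * group_size
    return total > min_total and group_size > min_group_size


print(meets_rule_of_thumb(30, 21))   # 630 subjects in 30 groups
print(meets_rule_of_thumb(29, 3))    # 87 subjects in 29 offices, as in the VI subgroup
```

For an unbalanced design, the average within-group size could be used in place of the common size, at the cost of hiding very small groups.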
   6.4 Future Research
Future work should address the following five potential issues that have not been fully addressed in this study.

First, as for the traditional approach to ICC estimation, the practical method is based on a two-level multilevel structure (e.g., the person level is defined as Level 1, and the group level is defined as Level 2), where ICC estimation utilizes relevant information from the ANOVA table, including the sources of both between- and within-group variation, in the HLM and HGLM framework. For more complex multilevel structures in CRT experiments (e.g., 3-level and 4-level hierarchical designs), ICC estimation (using variance component decomposition in ANOVA via HLM) has been discussed (Hedges et al., 2012; Hedges & Hedberg, 2013), but ICC inference (hypothesis testing by confidence interval and p-value) has not yet been developed for complex 3-level or 4-level multilevel models.
 For this development, 
one statistical challenge and difficulty is to find out an effective way to quantify standard 
error of ICC (based on the pooled weighted variance of ICC across different levels) in 
complex multilevel design via the HLM or HGLM
 framework, or to extend the 2
-level 
!118 multilevel framework in the study to 3
- or 4
-level HLM or HGLM by 
using multiple 

correction 
method, and Benjamin
i-Hochberg procedure) to control for familywise Type
 I error 
rate 
or the overall false discovery 
rate
 (i.e., the probability of making one or more Type I errors or false discoveries when 
performing multiple hypotheses tests)
.  Second, complex data integration (or data fusion) has become an important issue in 
the big-data era, and researchers may draw on multiple sources of large-scale complex data sets (or data platforms) to conduct interdisciplinary studies. For example, it would be interesting to integrate the RSA-911 data with a set of covariates from Census data for a comprehensive investigation of how the between- and within-group sources of variation change with the ICC estimates, from the perspective of statistical effectiveness in design and analysis, for estimation and inference at each level of the multilevel model across different data platforms. In such a setting, multilevel design models are inherently nested at each level within different data platforms (note: the data platform can be treated as an additional level in the HLM and HGLM framework). Given this complex design structure (multiple data platforms), it would be interesting to study how statistical planning can be conducted for power and sample size calculations, and to what extent ICC estimates vary (using sensitivity analysis) across platforms.
  Third
, covariate adjustment is an important technique in statistical modeling for taking confounding effects into account in a model (HLM or HGLM). In a complex multilevel design (i.e., more than two levels in hierarchical models), it would be interesting to understand how covariate adjustment (with or without subgroup analysis or stratification) affects adjusted ICC estimation and inference. In this study (the case of a 2-level hierarchical design), covariate adjustment improves the ICC estimates to some extent, yet in some cases (especially with a small total sample size or within-group sample size) ICC estimation and inference do not work at all (i.e., estimation failure). Therefore, it would be important to develop a remedial strategy for statistical adjustment and stratification in complex multilevel designs via HLM and HGLM, and to determine what type of statistical centering or standardizing procedures (e.g., group-mean and grand-mean centering or standardizing) can be used to customize covariate adjustment at each level so that ICC estimation and inference become more accurate and precise by accounting for the localized multilevel substructure adjusted by covariates.
  Fourth,
this study considers only one year of RSA-911 data (FY 2015) in the simulations used to test the proposed method of ICC estimation and inference. It would be interesting to study the statistical properties of the ICC by extending the current framework to a more complex multilevel structure, such as a longitudinal design across multiple years or a cross-cohort design with multiple-year data resources. In this type of complex multilevel modeling structure (e.g., longitudinal analysis in HLM and HGLM), the variance-covariance structure needs to be specified (e.g., compound symmetry for homogeneous data, and an autoregressive structure for heterogeneous data) so as to take into account the correlation structure across different time periods or cohorts. In addition, it would be interesting to use multiple-year RSA-911 data sets to verify the statistical performance of ICC estimation and inference in terms of consistency and efficiency.
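To make the two covariance structures named above concrete, the following minimal sketch builds a compound-symmetry matrix and a first-order autoregressive, AR(1), matrix for repeated measures on the same unit. The function names, the four time points, and the values of the variance and correlation are hypothetical and chosen only for illustration.

```python
def compound_symmetry(n_times, sigma2, rho):
    """Equal variances on the diagonal and one common correlation elsewhere."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, sigma2, rho):
    """AR(1): the correlation decays as rho ** |i - j| with the time lag."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


# Hypothetical example: four yearly measurements, unit variance, rho = 0.5.
cs = compound_symmetry(4, sigma2=1.0, rho=0.5)
ar = ar1(4, sigma2=1.0, rho=0.5)

for name, matrix in (("compound symmetry", cs), ("AR(1)", ar)):
    print(name)
    for row in matrix:
        print("  ", row)
```

Under compound symmetry the correlation between any two occasions is the same, which is the structure implied by a random-intercept model, where that common correlation equals the ICC. Under AR(1) the correlation shrinks as occasions move further apart in time.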
  Lastly, missing data are a common issue in statistics. Although listwise deletion (i.e., including only complete cases and excluding subjects with any incomplete information) is a convenient way to deal with missing data, it often loses much statistical information and compromises statistical power in the analysis (e.g., HLM or HGLM). Hence, it would be important to study how to cope with missing values (assuming missing at random) in a multilevel data structure for ICC estimation and inference, and what remedial procedures (EM or multiple imputation for discrete or continuous variables) can be applied to improve the ICC estimation process via sensitivity analysis in HLM or HGLM. For the proposed method of ICC with complete data, the simulation results suggest that the total sample size needs to be greater than 1,500 and the within-group sample size larger than 100 (over 15 groups). Nevertheless, these guidelines need to be adjusted for the incomplete-data case.
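The cost of listwise deletion described above can be seen by counting how many cases survive in each cluster. The sketch below is illustrative only: the records, the field names (`office`, `age`, `employed`), and the helper functions are hypothetical and are not taken from the RSA-911 file.

```python
from collections import Counter


def listwise_delete(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]


def cluster_sizes(records, cluster_key="office"):
    """Count the number of records in each cluster."""
    return Counter(r[cluster_key] for r in records)


# Hypothetical toy data: two offices, some values missing.
records = [
    {"office": "A", "age": 34, "employed": 1},
    {"office": "A", "age": None, "employed": 0},
    {"office": "A", "age": 51, "employed": 1},
    {"office": "B", "age": 29, "employed": None},
    {"office": "B", "age": 45, "employed": 0},
]

complete = listwise_delete(records)
before = cluster_sizes(records)
after = cluster_sizes(complete)
print("before:", dict(before))
print("after: ", dict(after))
```

Comparing the per-cluster counts before and after deletion shows directly whether the within-group sizes still meet a sample size guideline once incomplete cases are dropped.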
  6.5 Conclusion
  In conclusion, this study provides a comprehensive methodology for intraclass correlation (ICC) estimation and inference using the hierarchical mixed modeling framework. The proposed methodology incorporates the analysis of variance (ANOVA) approach to develop the ICC estimator and an inferential statistic, based on a pivotal quantity for the ICC estimand, whose sampling distribution (F-distribution) is used to test the ICC and to construct confidence intervals for it. The proposed statistical procedures for ICC estimation and inference can be readily applied to large-scale or small-scale data sets, although a small total sample size, a small within-group size, and missing data are limitations that can affect the precision and accuracy of the ICC estimates to a certain degree. More research is needed to better understand the ICC in complex multilevel design structures.
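As an illustration of the ANOVA approach summarized above, the following minimal sketch computes the textbook one-way ANOVA estimator of the ICC for balanced groups, ICC = (MSB - MSW) / (MSB + (n - 1) MSW), together with the ratio F = MSB / MSW. It assumes equal group sizes; the function name `anova_icc` and the toy data are hypothetical, and the estimator developed in this study (including its treatment of unbalanced groups and binary outcomes) may differ in detail.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for balanced groups.

    groups: list of lists, one inner list of outcomes per cluster.
    Returns (icc, f_stat), where f_stat = MSB / MSW.
    """
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equal group sizes")
    n = sizes.pop()          # within-group size
    j = len(groups)          # number of groups
    grand_mean = sum(sum(g) for g in groups) / (j * n)
    group_means = [sum(g) / n for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (j - 1)
    ms_within = ss_within / (j * (n - 1))

    f_stat = ms_between / ms_within
    icc = (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)
    return icc, f_stat


# Hypothetical toy data: three clusters of three observations each.
icc, f_stat = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(icc, 4), f_stat)
```

In the balanced case the estimate can equivalently be written as (F - 1) / (F + n - 1), which is why inference on the ICC can be carried out through the F-distribution with (J - 1) and J(n - 1) degrees of freedom, where J is the number of groups.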
APPENDICES

APPENDIX A: Definitions of the VR Variables in RSA-911

  The following are the definitions of the VR variables, according to the RSA-911 manual (Policy Directive RSA-PD-16-04, a revision of RSA-PD-14-01; https://www2.ed.gov/policy/speced/guid/rsa/subregulatory/pd-16-04.pdf).
  This appendix includes three tables: (1) VR services are shown in Table A.1; (2) demographic background variables are listed in Table A.2; and (3) rehabilitation outcomes are given in Table A.3.
Table A.1. List of the Definitions of VR Service Variables Used in the Study

  Rehabilitation Service | RSA Definition
  Job Placement Assistance | A referral to a specific job resulting in setting up a job interview and obtaining a job on behalf of a customer (1=received; 0=not received)
  On-the-Job Supports | Services such as job coaching and follow-along services to assist a customer in adjusting to the job and becoming stable, to enhance job retention (1=received; 0=not received)
  Rehabilitation Technology | The application of rehabilitation engineering, assistive devices, technologies, or services to meet the needs and address the barriers (1=received; 0=not received)

Table A.2. List of the Definitions of VR Demographic Variables Used in the Study

  VR Demographics | RSA Definition
  Age | Age of the individual when he or she applied for VR services (continuous measure)
  Gender | Indicates whether an individual is male or female (1=male; 0=female)
  Minority (Non-White) | Indicates whether s/he is a minority (including Black, Native, Asian, Pacific Islander, and Hispanic) or not (White) (1=minority; 0=non-minority)
  Social Security Benefits (Insurance Benefits) | Indicates if an individual receives Social Security Disability Insurance (SSDI) or Supplemental Security Income (SSI) (1=received; 0=not received)
  Employment Status at Application (Previous Work Background) | Employment status of the individual at application (1=employed; 0=not employed)
  Type of Disability | Impairment includes: blindness/visual impairment, deafness/hearing impairment, physical or orthopedic/neurological impairment, LD, ADHD, intellectual disability (ID), TBI, autism, mental illness (MI), substance abuse (SA) (categorical/qualitative measure)
  Level of Education | Level of education the individual had attained includes: elementary/secondary education, special education, high school graduate or equivalency certificate (GED), college or above (categorical/ordinal measure)
  Significance of Disability | Whether the individual was considered a person with a significant disability or a most significant disability during VR (1=yes; 0=no)

Table A.3. List of the Definitions of VR Outcome Variables Used in the Study

  Rehabilitation Outcome | RSA Definition
  Rehabilitation Outcome | Individual exited the VR program either with or without an employment outcome after receiving services (1=exited with an employment outcome; 0=exited without an employment outcome)
  Competitive Employment | Employed either at or above minimum wage in an integrated setting (1=yes; 0=no)
  Weekly Earnings (or Quality of Employment) | The approximate amount of money earned in a typical week (continuous measure)

APPENDIX B: Descriptive Data Statistics

Table B.1. Descriptive Summary of Usable Sample by Office Level in Michigan (n=11,819)

  Office Unit | Frequency | Percentage
  Adrian Unit | 244 | 2.06%
  Alpena Unit | 160 | 1.35%
  Ann Arbor Unit | 484 | 4.10%
  Battle Creek Unit | 298 | 2.52%
  Bay City Unit | 281 | 2.38%
  Benton Harbor Unit | 289 | 2.45%
  Big Rapids Unit | 175 | 1.48%
  Clinton Township Unit | 732 | 6.19%
  Detroit Fort Street Unit | 320 | 2.71%
  Detroit Grand River Unit | 423 | 3.58%
  Detroit Hamtramck Unit | 463 | 3.92%
  Detroit Mack Unit | 332 | 2.81%
  Detroit Porter Unit | 421 | 3.56%
  Flint Unit | 418 | 3.54%
  Gaylord Unit | 174 | 1.47%
  Grand Rapids Unit | 764 | 6.46%
  Holland Unit | 335 | 2.83%
  Jackson Unit | 163 | 1.38%
  Kalamazoo Unit | 345 | 2.92%
  Lansing Unit | 631 | 5.34%
  Livonia Unit | 441 | 3.73%
  Marquette Unit | 405 | 3.43%
  Midland Unit | 125 | 1.06%
  Monroe Unit | 200 | 1.69%
  Mt. Pleasant Unit | 136 | 1.15%
  Muskegon Unit | 366 | 3.10%
  Oak Park Unit | 540 | 4.57%
  Pontiac Unit | 416 | 3.52%
  Port Huron Unit | 485 | 4.10%
  Saginaw Unit | 281 | 2.38%
  Taylor Unit | 213 | 1.80%
  Traverse City Unit | 377 | 3.19%
  Wayne Unit | 382 | 3.23%
  Total | 11,819 | 100.00%

Note. There are 33 offices located statewide in Michigan, serving the target population of people with disabilities of N=17,633 in FY 2015. Of the target sample, the usable sample size is n=11,819 for data analysis in the study and ICC calculations.

Table B.2. A Summary of the Geographic Information System of Office Units in Michigan

  Latitude (N) | Longitude (W) | Abbreviation | MRS Unit
  41.90 | 84.04 | ADR | Adrian
  45.06 | 83.43 | ALP | Alpena
  42.28 | 83.73 | AA | Ann Arbor
  42.30 | 85.23 | BCK | Battle Creek
  43.60 | 83.89 | BC | Bay City
  42.10 | 86.48 | BH | Benton Harbor
  43.70 | 85.48 | BR | Big Rapids
  42.31 | 83.21 | CT | Clinton Township
  42.38 | 83.10 | DT | Detroit Fort Street; Detroit Grand River; Detroit Hamtramck; Detroit Mack; Detroit Porter
  43.02 | 83.69 | FL | Flint
  45.03 | 84.67 | GL | Gaylord
  42.96 | 85.66 | GR | Grand Rapids
  42.78 | 86.10 | HD | Holland
  42.25 | 84.40 | JAK | Jackson
  42.27 | 85.59 | KAZ | Kalamazoo
  42.71 | 84.55 | LAN | Lansing
  42.40 | 83.37 | LV | Livonia
  46.55 | 87.41 | MRQ | Marquette
  43.62 | 84.23 | ML | Midland
  41.92 | 83.40 | MR | Monroe
  43.60 | 84.77 | MP | Mt. Pleasant
  43.23 | 86.26 | MKG | Muskegon
  42.47 | 83.18 | OP | Oak Park
  42.65 | 83.29 | PT | Pontiac
  42.98 | 82.60 | PH | Port Huron
  43.42 | 83.95 | SAG | Saginaw
  42.24 | 83.27 | TL | Taylor
  44.77 | 85.62 | TC | Traverse City
  42.28 | 83.39 | WY | Wayne

Figure B.1. Spatial Network of Target Sample in Michigan by Hierarchical Structure (horizontal axis: Longitude (West); vertical axis: Latitude (North)).
Note 1. MRS represents the Michigan Rehabilitation Services Programs.
Note 2. Office units are plotted on a geometric graph according to the geographic information system (GIS) coordinates in Table B.2.

APPENDIX C: Glossary of Abbreviations

  This glossary contains abbreviations, acronyms, and some definitions used in this study.

Table C.1. Glossary of Abbreviations

  ANOVA | Analysis of Variance
  ASD | Autism Spectrum Disorder
  CSPD | Comprehensive System of Personnel Development
  CTT | Classical Test Theory
  EBP | Evidence Based Practice
  ESRA | Education Sciences Reform Act
  FY | Fiscal Year
  GIS | Geographic Information System
  HGLM | Hierarchical Generalized Linear Model
  HLM | Hierarchical Linear Model
  ICC | Intraclass Correlation Coefficient
  ID | Intellectual Disability
  IPE | Individualized Plan for Employment
  LVM | Latent Variable Modeling
  MI | Mental Illness
  MLE | Maximum Likelihood Estimate
  MRS | Michigan Rehabilitation Services
  NCLB | No Child Left Behind
  RCT | Randomized Controlled Trial
  REML | Restricted Maximum Likelihood
  RSA | Rehabilitation Services Administration
  SE | Standard Error
  SEM | Standard Error of Measurement
  SEM | Structural Equation Model
  TBI | Traumatic Brain Injury
  VR | Vocational Rehabilitation
  WIOA | Workforce Innovation and Opportunity Act

BIBLIOGRAPHY

Agresti, A., & Finlay, B. (2009). Statistical methods for the social sciences. Upper Saddle River, N.J: Pearson Prentice Hall.
Alsaman, M. A., & Lee, C.-L. (2017). Employment Outcomes of Youth With Disabilities in Vocational Rehabilitation: A Multilevel Analysis of RSA-911 Data. Rehabilitation Counseling Bulletin, 60(2), 98-107.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing (U.S.). (2014). Standards for educational and psychological testing.
Anderson, T., & Shattuck, J. (2012). Design-based research: A decade of progress in education research? Educational researcher, 41(1), 16-25.
Austin, B. S., & Leahy, M. J. (2015). Construction and validation of the clinical judgment skill inventory: Clinical judgment skill competencies that measure counselor debiasing techniques. Rehabilitation Research, Policy, and Education, 29(1), 27.
Austin, B. S., & Lee, C.-L. (2014). A structural equation model of vocational rehabilitation services: Predictors of employment outcomes for clients with intellectual and co-occurring psychiatric disabilities. Journal of Rehabilitation, 80(3), 11-20.
Barab, S., & Squire, K. (2004). Design-based research: Putting a stake in the ground. The journal of the learning sciences, 13(1), 1-14.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. Oxford University Press, Inc.
Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (Vol. 904). John Wiley & Sons.
Bloom, H.S., Bos, J.M., & Lee, S.W. (1999). Using Cluster Random Assignment to Measure Program Impacts: Statistical Implications for the Evaluation of Education Programs. Evaluation Review, 23(4), 445-469.
Bloom, H.S., Richburg-Hayes, L., & Black, A.R. (2007). Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions. Educational Evaluation and Policy Analysis, 29(1), 30-59.
Bolton, B. F., Bellini, J. L., & Brookings, J. B. (2000). Predicting client employment outcomes from personal history, functional limitations, and rehabilitation services. Rehabilitation Counseling Bulletin, 44(1), 10-21.
Casella, G., & Berger, R. L. (2002). Statistical inference. Australia: Thomson Learning.
Chan, F., Tarvydas, V., Blalock, K., Strauser, D., & Atkins, B. J. (2009). Unifying and elevating rehabilitation counseling through model-driven, diversity-sensitive evidence-based practice. Rehabilitation Counseling Bulletin, 52(2), 114-119.
Chan, F., Bezyak, J., Ramirez, M. R., Chiu, C. Y., Sung, C., & Fujikawa, M. (2010). Concepts, Challenges, Barriers, and Opportunities Related to Evidence-Based Practice in Rehabilitation Counseling. Rehabilitation Education, 24.
Chan, F., Wang, C. C., Fitzgerald, S., Muller, V., Ditchman, N., & Menz, F. (2016). Personal, environmental, and service-delivery determinants of employment quality for state vocational rehabilitation consumers: A multilevel analysis. Journal of Vocational Rehabilitation, 45(1), 5-18.
Cobb, P., Confrey, J., DiSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational researcher, 32(1), 9-13.
Chan, F., Lee, G. K., Lee, E., Kubota, C., & Allen, C. A. (2007). Structural equation modeling in rehabilitation counseling research. Rehabilitation Counseling Bulletin, 57(1), 44-57.
Chan, J. Y., Wang, C. C., Ditchman, N., Kim, J. H., Pete, J., Chan, F., & Dries, B. (2014). State unemployment rates and vocational rehabilitation outcomes: A multilevel analysis. Rehabilitation Counseling Bulletin, 57(4), 209-218.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, N.J: L. Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological bulletin, 112(1), 155.
Connelly, L. B. (2003). Balancing the number and size of sites: an economic approach to the optimal design of cluster samples. Controlled clinical trials, 24(5), 544-559.
Connolly, P., Keenan, C., & Urbanska, K. (2018). The trials of evidence-based practice in education: a systematic review of randomised controlled trials in education research 1980-2016. Educational Research, 60(3), 276-291.
Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society. Series B (Methodological), 251-255.
Ditchman, N. M., Miller, J. L., & Easton, A. B. (2018). Vocational Rehabilitation Service Patterns: An Application of Social Network Analysis to Examine Employment Outcomes of Transition-Age Individuals With Autism. Rehabilitation Counseling Bulletin, 61(3), 143-153.
Donner, A., Birkett, N., & Buck, C. (1981). Randomization by cluster: sample size requirements and analysis. American Journal of Epidemiology, 114(6), 906-914.
Donner, A., & Koval, J. J. (1980a). The estimation of intraclass correlation in the analysis of family data. Biometrics, 19-25.
Donner, A., & Koval, J. J. (1980b). The large sample variance of an intraclass correlation. Biometrika, 67(3), 719-722.
Donner, A., & Koval, J. J. (1982). Design considerations in the estimation of intraclass correlation. Annals of Human Genetics, 46(3), 271-277.
Dutta, A., Gervey, R., Chan, F., Chih-chin, C., & Ditchman, N. (2008). Vocational rehabilitation services and employment outcomes for people with disabilities: A United States study. Journal of Occupational Rehabilitation, 18(4), 326-334.
Efron, B., & Hinkley, D. (1978). Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information. Biometrika, 65(3), 457-482. doi:10.2307/2335893
Eignor, D. R. (2013). The standards for educational and psychological testing. American Psychological Association.
Ellis, P. D. (2009, September 7). Thresholds for interpreting effect sizes [Website log post on Hong Kong Polytechnic University]. Retrieved August 11, 2018, from http://www.polyu.edu.hk/mm/effectsizefaqs/thresholds_for_interpreting_effect_sizes2.html
ESRA Legislation - U.S. Department of Education. (May 2008). Public Law Print: Education Sciences Reform Act. Retrieved from https://ies.ed.gov/director/pdf/ESRAreauth.pdf
Fisher, R. A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507-521. doi:10.2307/2331838
(Editorial). (1915). On the Distribution of the Standard Deviations of Small Samples: Appendix I. To Papers by "Student" and R. A. Fisher. Biometrika, 10(4), 522-529. doi:10.2307/2331839
Fisher, R. A. (1925a). Statistical methods for research workers. Genesis Publishing Pvt Ltd.
Fisher, R. A. (1925b, July). Theory of statistical estimation. In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 22, No. 5, pp. 700-725). Cambridge University Press.
Fisher, R. A. (1942). The design of experiments. Edinburgh: Oliver and Boyd.
Fisher, R. A. (1958a). Cigarettes, cancer, and statistics. The Centennial Review of Arts & Science, 2, 151-166.
Fisher, R. A. (1958b). Lung cancer and cigarettes. Nature, 182(4628), 108.
Fleming, A. R., Del Valle, R., Kim, M., & Leahy, M. J. (2013). Best practice models of effective vocational rehabilitation service delivery in the public rehabilitation program: A review and synthesis of the empirical literature. Rehabilitation Counseling Bulletin, 56(3), 146-159.
Flom, P. (2015, March 10). What is adjusted correlation [Website log post on Quora]. Retrieved August 8, 2018, from https://www.quora.com/What-is-adjusted-correlation
Givens, G. H., & Hoeting, J. A. (2012). Computational statistics (Vol. 710). John Wiley & Sons.
Hauck, W. W., Gilliss, C. L., Donner, A., & Gortner, S. (1991). Randomization by cluster. Nursing research, 40(6), 356-358.
Hays, W. L. (1994). Statistics. Fort Worth: Harcourt Brace College Publishers.
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60-87.
Hedges, L. V., Hedberg, E. C., & Kuyper, A. M. (2012). The variance of intraclass correlations in three- and four-level models. Educational and Psychological Measurement, 72(6), 893-909.
Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation review, 37(6), 445-489.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American statistical Association, 81(396), 945-960.
Karlin, S., Cameron, E. C., & Williams, P. T. (1981). Sibling and parent-offspring correlation estimation with variable family size. Proceedings of the National Academy of Sciences, 78(5), 2664-2668.
Klar, N., & Donner, A. (2001). Current and future challenges in the design and analysis of cluster randomization trials. Statistics in medicine, 20(24), 3729-3740.
Klar, N., & Donner, A. (2015). The impact of EF Lindquist's text on cluster randomisation. Journal of the Royal Society of Medicine, 108(4), 142-144.
Kosciulek, J. F. (2010). Evidence-Based Rehabilitation Counseling Practice: A Pedagogical Imperative. Rehabilitation Education, 24.
Kosciulek, J. F., & Merz, M. (2001). Structural analysis of the consumer-directed theory of empowerment. Rehabilitation Counseling Bulletin, 44(4), 209-216.
Kutner, M. H., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models. Boston: McGraw-Hill Irwin.
Leahy, M. J., Thielsen, V. A., Millington, M. J., Austin, B., & Fleming, A. (2009). Quality assurance and program evaluation: Terms, models, and applications. Journal of Rehabilitation Administration, 33(2), 69.
Leahy, M. J., & Arokiasamy, C. V. (2010). Prologue: Evidence-based practice research and knowledge translation in rehabilitation counseling. Rehabilitation Research, Policy, and Education, 24(3/4), 173.
Leahy, M. J., Chan, F., & Lui, J. (2014a). Evidence-based best practices in the public vocational rehabilitation program that lead to employment outcomes. Journal of Vocational Rehabilitation, 41(2), 83-86.
Leahy, M. J., Chan, F., Lui, J., Rosenthal, D., Tansey, T., Wehman, P., Kundu, M., Dutta, A., Anderson, C. A., Del Valle, R., & Sherman, S. (2014b). An analysis of evidence-based best practices in the public vocational rehabilitation program: Gaps, future directions, and recommended steps to move forward. Journal of Vocational Rehabilitation, 41(2), 147-163.
Lee, C.-L. (2014). Linking paths between rehabilitation customer characteristics, services and outcomes by decision tree models (Unpublished apprenticeship paper. Michigan State University. Department of Counseling, Educational Psychology, Special Education).
Lee, C.-L., Pi, S., & Thielsen, V. (2012). Relationships of Customer Characteristics, Services and Outcomes Using a Data Mining Approach (An unpublished internal report to Michigan Rehabilitation Services. Project Excellence, Program of Rehabilitation Counseling, Department of Counseling, Educational Psychology, Special Education, Michigan State University).
Lee Rodgers, J., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66.
Lingard, B. (2013). The impact of research on education policy in an era of evidence-based policy. Critical Studies in Education, 54(2), 113-131.
Lohr, S. L. (1999). Sampling: Design and Analysis. Pacific Grove, CA: Duxbury Press.
Lomax, R. G., & Hahs-Vaughn, D. L. (2012). An Introduction to Statistical Concepts. New York: Routledge.
Kelly, K. (2018). CEP932: Quantitative Methods in Education Research I [Spring 2018, correlation coefficient]. College of Education, Michigan State University, East Lansing, Michigan, USA.
Mayhew, S. (2015). A dictionary of geography. Oxford University Press.
Maas, C. J., & Hox, J. J. (2004). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational statistics & data analysis, 46(3), 427-440.
Menon, A., Korner-Bitensky, N., Kastner, M., McKibbon, K., & Straus, S. (2009). Strategies for rehabilitation professionals to move evidence-based knowledge into practice: a systematic review. Journal of Rehabilitation Medicine, 41(13), 1024-1032.
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics. New York: McGraw-Hill.
Moore, C. L., Flowers, C. R., & Taylor, D. (2000). Vocational rehabilitation services: Indicators of successful rehabilitation for persons with mental retardation. Journal of Applied Rehabilitation Counseling, 31(2), 36-40.
Moore, C. L. (2001). Disparities in closure success rates for African Americans with mental retardation: An ex post-facto research design. Journal of Applied Rehabilitation Counseling, 32(2), 31-36.
Moore, C. L., Feist-Price, S., & Alston, R. J. (2002a). Competitive employment and mental retardation: Interplay among gender, race, secondary psychiatric disability, and rehabilitation services. Journal of Rehabilitation, 68(1), 14-19.
Moore, C. L., Feist-Price, S., & Alston, R. J. (2002b). VR services for persons with severe/profound mental retardation: Does race matter? Rehabilitation Counseling Bulletin, 45(3), 162-167.
Moore, C. L., Harley, D. A., & Gamble, D. (2004). Ex-post-facto analysis of competitive employment outcomes for individuals with mental retardation: National perspective. Mental Retardation, 42(4), 253-262.
Murray, D.M. & Short, B. (1995). Intra-Class Correlation Among Measures Related to Alcohol Use by Young Adults: Estimates, Correlates, and Applications in Intervention Studies. Journal of Studies on Alcohol, 56(6), 681-694.
Mplus: Statistical analysis with latent variables (Version 6). Los Angeles, CA: Muthén & Muthén.
NCLB Legislation - U.S. Department of Education. (January 8, 2002). Public Law Print: No Child Left Behind Act. Retrieved from https://www2.ed.gov/policy/elsec/leg/esea02/107-110.pdf
Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional children, 71(2), 137-148.
Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. The Annals of Mathematical Statistics, 29(1), 201-211.
Effect of college or university training on earnings of people with disabilities: A case control study. Journal of Vocational Rehabilitation, 43(2), 93-102.
Paccagnella, O. (2006). Centering or not centering in multilevel models? The role of the group mean and the assessment of group effects. Evaluation review, 30(1), 66-85.
Pearson, K., & Lee, A. (1903). On the Laws of Inheritance in Man: I. Inheritance of Physical Characters. Biometrika, 2(4), 357-462. doi:10.2307/2331507
Pearson, K. (1904). On the Laws of Inheritance in Man: II. On the Inheritance of the Mental and Moral Characters in Man, and Its Comparison with the Inheritance of the Physical Characters. Biometrika, 3(2/3), 131-190. doi:10.2307/2331479
Pearson, K. (1920). Notes on the History of Correlation. Biometrika, 13(1), 25-45. doi:10.2307/2331722
Biometrika, 14(3/4), 412-417. doi:10.2307/2331822
Pi, S. (2006). Micro- and Macro-level Factors Related to Vocational Rehabilitation Outcomes (Doctoral dissertation, Michigan State University. Department of Counseling, Educational Psychology, Special Education).
Pi, S., & Thielsen, V. (2011). RSA 911 Data Is a Gold Mine If You Have the Right Shovel, presented at the 4th Summit on Vocational Rehabilitation Program Evaluation & Quality Assurance. September 13th & 14th, 2011. Grand Hyatt Tampa Bay, Tampa, Florida, U.S.A. Retrieved from http://vocational-rehab.com/wp-content/uploads/2013/04/C802.0007.01.pdf
Raykov, T., & Marcoulides, G. A. (2004). Using the delta method for approximate interval estimation of parameter functions in SEM. Structural Equation Modeling, 11(4), 621-637.
Raykov, T., & Marcoulides, G. A. (2006). A first course in structural equation modeling. New York, NY: Psychology Press, Taylor and Francis Group, LLC.
Raykov, T., & Penev, S. (2010). Evaluation of reliability coefficients for two-level models via latent variable analysis. Structural Equation Modeling, 17(4), 629-641.
Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge.
Raykov, T. (2011). Intraclass correlation coefficients in hierarchical designs: Evaluation using latent variable modeling. Structural Equation Modeling, 18(1), 73-90.
Raykov, T., & Marcoulides, G. A. (2015a). Intraclass correlation coefficients in hierarchical design studies with discrete response variables: A note on a direct interval estimation procedure. Educational and psychological measurement, 75(6), 1063-1070.
Raykov, T., & Marcoulides, G. A. (2015b). On examining the underlying normal variable assumption in latent variable models with categorical indicators. Structural Equation Modeling: A Multidisciplinary Journal, 22(4), 581-587.
Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2(2), 173.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1 Advanced quantitative techniques in the social sciences). CA: Sage.
Rehabilitation Services Administration Policy Directive (2013). RSA-PD-14-01. Washington, DC. Retrieved from https://www2.ed.gov/policy/speced/guid/rsa/subregulatory/pd-14-01.pdf
Richardson, J. T. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135-147.
Rizzo, M. L. (2007). Statistical computing with R. Chapman and Hall/CRC.
  Rosenthal, J. A. (1996). Qualitative descriptors of strength of association 
and effect size. 
Journal of social service Research
, 21(4), 37-59.  !138 Rosenthal, D. A., Dalton, J. A., & Gervey, R. (2007). Analyzing vocational outcomes of 
individuals with psychiatric disabilities who received state vocational rehabilitation 
services: A da
ta mining approach. 
International Journal of Social Psychiatry
, 53(4), 357-368.  Ross, S. M. (2013). 
Simulation
. Amsterdam: Academic Press. 
  Roussas, G. G. (2002). 
A course in mathematical statistics
. San Diego: Academic. 
Rutterford, C., Copas, A., & Eldridge, S. (2015). Methods for sample size determination in cluster randomized trials. International Journal of Epidemiology, 44(3), 1051-1067.
Schoen, B. (2010). An examination of employment outcomes for individuals with spinal cord injury served by the state vocational rehabilitation services program between 2004 and 2008 (Doctoral dissertation, Michigan State University. Department of Counseling, Educational Psychology, Special Education).
Schoen, B. A., & Leahy, M. J. (2012). An Analysis of the Changing Demographics of Individuals with Spinal Cord Injury Who Received State Vocational Rehabilitation Services between 2004 and 2008. Journal of Rehabilitation, 78(3).
Schonbrun, S. L., Sales, A. P., & Kampfe, C. M. (2007). RSA Services and Employment Outcome in Consumers with Traumatic Brain Injury. Journal of Rehabilitation, 73(2).
Schneider, B. (Ed.). (2018). Handbook of the Sociology of Education in the 21st Century. Springer.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational design. American Educational Research Association.
Schochet, P. (2005). Statistical Power for Random Assignment Evaluations of Education Programs. Princeton, NJ: Mathematica Policy Research, Inc.
Schwanke, T., & Smith, R. O. (2004). Technical report - Vocational rehabilitation database analysis: RSA-911 case service report and database linking (Version 1.0). Rehabilitation Research Design & Disability: University of Wisconsin-Milwaukee.
Sink, T., Bua-Iam, P., Hampton, J. E., & Snuffer, D. W. (2014). Applying Location Theory in Vocational Rehabilitation. Journal of Rehabilitation Administration, 38(2), 73-86.
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15-21.
Slavin, R. E. (2008). Perspectives on evidence-based research in education - What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37(1), 5-14.
Shavelson, R. J., Phillips, D. C., Towne, L., & Feuer, M. J. (2003). On the science of education design studies. Educational Researcher, 32(1), 25-28.
Soper, H., Young, A., Cave, B., Lee, A., & Pearson, K. (1917). On the Distribution of the Correlation Coefficient in Small Samples. Appendix II to the Papers of "Student" and R. A. Fisher. Biometrika, 11(4), 328-413. doi:10.2307/2331830
Stapleton, J. H. (2009). Linear statistical models (Vol. 719). John Wiley & Sons.
Student. (1917). Tables for Estimating the Probability that the Mean of a Unique Sample of Observations Lies Between -∞ and Any Given Distance of the Mean of the Population from Which the Sample is Drawn. Biometrika, 11(4), 414-417. doi:10.2307/2331831
Sullivan, G. M. (2011). Getting off the "gold standard": Randomized controlled trials and education research. Journal of Graduate Medical Education, 3(3), 285-289.
Supporting Information for the RSA-911 Data. (n.d.). Retrieved November 18, 2018 from https://rsa.ed.gov/display.cfm?pageid=75
Tachibana, Y., Miyazaki, C., Mikami, M., Ota, E., Mori, R., Hwang, Y., Terasaka, A., Kobayashi, E., & Kamio, Y. (2018). Meta-analyses of individual versus group interventions for pre-school children with autism spectrum disorder (ASD). PloS one, 13(5), e0196272. https://doi.org/10.1371/journal.pone.0196272
Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Boston: Pearson Addison Wesley.
The What Works Clearinghouse (WWC). (n.d.). Standards Handbook Version 4.0. Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf
Thorndike, R. M. (2005). Measurement and evaluation in psychology and education. Upper Saddle River, New Jersey: Pearson Education, Inc.
U.S. Department of Education. (September 16, 2016). Guidance and Regulatory Information. Retrieved from https://www2.ed.gov/policy/elsec/leg/essa/guidanceuseseinvestment.pdf
WIOA Legislation - U.S. Department of Labor. (June 1, 2018). Overview and Highlight: Workforce Innovation and Opportunity Act. Retrieved from https://www.doleta.gov/WIOA/Overview.cfm