THEORY AND APPLICATIONS OF INTRACLASS CORRELATION COEFFICIENTS AT CLUSTER RANDOMIZED DESIGN FOR STATISTICAL PLANNING VIA HIERARCHICAL MIXED MODELS

By

Chun-Lung Lee

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Measurement and Quantitative Methods, Doctor of Philosophy

2019

ABSTRACT

Research investigators rely on information about intraclass correlation coefficients for planning and conducting designs and experiments for scientific inquiry in educational and social studies. Randomized controlled trials and cluster randomized studies are deemed the gold standard for evidence-based interventions, and both approaches have been applied successfully in many situations for more effective decision-making in education and social research. Cluster randomized designs for community-based research, in particular, have been widely used in the modern era, since they operate at the group level, such as a whole community or worksite, which makes it easier for researchers to randomly assign an entire intact group rather than each individual subject. Hence, such cluster-randomized trials or group-randomized experiments have become important and useful for providing evidence-guided practice models for scientific inquiry and research. The aim of this dissertation is to develop methods for estimating intraclass correlation coefficients for binary and continuous outcomes in cluster-based intervention designs using hierarchical mixed models, based on scenarios of unconditional and conditional multilevel structures with cluster sampling schemes.
Simulation studies based on the real RSA-911 data set for people with disabilities served in the Michigan Rehabilitation Services Programs are used to assess the statistical properties of intraclass correlation estimation and inference. The results show that the average (unadjusted) intraclass correlation is about 0.01 for competitive employment and about 0.02 for weekly earnings (quality employment) in Michigan. These average (unadjusted) intraclass correlations from RSA-911 are relatively low in comparison to those for education interventions or academic programs with assessments in reading and mathematics across K-12 (Bloom et al., 1999, 2007; Hedges & Hedberg, 2007; Schochet, 2008); however, they are to some extent comparable to those from psychological and mental health data in school-based intervention designs (Murray & Short, 1995). In future studies, researchers may look into different types of integrated large-scale complex data sets, such as RSA-911 data combined with a set of covariates from Census data, to investigate how intraclass correlation performs in statistical estimation and inference across multiple platforms. In addition, it would be interesting to study how to deal with missing values in the estimation of intraclass correlation, and what remedial procedures could be added to improve the estimation process. For the proposed method, the total sample size should be greater than 1,500, and the within-group sample size should preferably be larger than 100 (with about 15 groups). In conclusion, this study provides a comprehensive methodology for intraclass correlation estimation and inference using the mixed analysis of variance approach, along with the derived sampling distribution (i.e., the F-distribution) for testing hypotheses and building confidence intervals for intraclass correlation estimates.
Such proposed statistical procedures can be easily applied to large-scale or small-scale data sets, although small total sample sizes, small within-group sizes, and missing data limit the precision and accuracy of intraclass correlation estimates.

Keywords: Intraclass Correlation Coefficient, Cluster Randomized Design, Multilevel Structure, Hierarchical Linear Modeling, Evidence-based Practice Models

This dissertation is dedicated to Mom and Dad (both of whom graciously and patiently tolerated me then and now) ~ Through all the years, thank you for always, always believing in me (that this would someday be completed)!

ACKNOWLEDGEMENTS

I would like to thank my dissertation/academic advisor, Dr. Kimberly Kelly, and my committee members, Drs. Richard Houang, Gloria Lee, and Sukyeong Pi, for their support. This dissertation is the final product of my (long and winding) PhD journey at Michigan State, and it could not have been done without the proper training of two (separate but equally important) groups: MQM (measurement & quantitative methods) and PE (project excellence at rehab counseling). I am very fortunate to have had not just one (a major in MQM) but two (as an aficionado of rehab counseling as well) unique experiences (with challenges, difficulties, happiness, and joys that made me grow into who I am today) on this special educational trip to the goal line (a doctorate degree). Although I did not attend the graduation ceremony, I was truly inspired by Kirk Cousins, who delivered a passionate commencement speech (MSU, Spring 2019; https://www.wkar.org/post/kirk-cousins-may-3-2019-michigan-state-university-commencement-address#stream/0), saying: Through it all, enjoy the journey ... let us rejoice and be glad in it ... d-deliver ... see life through a window, not a mirror .... and choose to be a great decision maker.
At the end of the day, while chasing/fighting, don't forget to stop and smell the roses along the way (to fully enjoy the tough road through paradise). And also, the Lord blessed my time here in ways I never thought possible (God was preparing us for great things, wa Go Green, Go White, Go MQM, and Go PE!

PREFACE

The history of intraclass correlation can be traced back to the early twentieth century, when Sir Ronald A. Fisher introduced it to research communities as a new tool for measuring the level of similarity within a group. Since then, the intraclass correlation has been one of the most important statistical tools in scientific inquiry. In education, for example, the intraclass correlation coefficient (ICC) is often used to measure the degree of intra-cluster resemblance in student educational outcomes (e.g., test scores) across classrooms or schools. Although the ICC was a great success as a way to measure within-group similarity, it was not until later that Allan Donner and his colleagues provided a comprehensive and practical framework for ICC estimation and inference (e.g., point estimates derived from multivariate normal theory, and hypothesis tests based on variance components using analysis of variance, ANOVA). In the contemporary era, the ICC plays another key role in quantifying the inherent clustering effect size (i.e., the share of total variation attributable to differences between groups) in multilevel designs using hierarchical linear models (HLM). Stephen Raudenbush is a pioneer in the development and application of HLM in education, and he sheds light on how to evaluate the magnitude of the multilevel structure's effect through the ICC. Moreover, Larry Hedges, renowned for his work on meta-analysis in education, developed a novel approach to powering (i.e., power analysis) sampling designs through the design effect (i.e., a function of the ICC).
Lastly, Tenko Raykov gives new insight into strategies for ICC estimation in complex statistical settings (e.g., a categorical outcome variable) for HLM via latent variable models. The goal of this dissertation is to draw together in one place the major ICC developments, and then to develop new thinking in the statistical inquiry of ICC estimation and inference. In addition, the evidence-based paradigm in ...

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
CHAPTER 2 LITERATURE REVIEW OF STATISTICAL METHODS
2.1 Fisher Approach
2.2 Donner Approach
2.3 Hedges Approach
2.4 Raykov Approach
CHAPTER 3 LITERATURE IN REHABILITATION COUNSELING
3.1 Multilevel Analysis
3.2 Structural Equation Model
3.3 Classification Tree Model
3.4 Other Methods Such as Social Network Analysis and Spatial Analysis
3.5 Justification for Covariates Used in Multilevel Analysis
CHAPTER 4 METHODS AND RESEARCH QUESTIONS
4.1 Research Methods
4.2 Proposed Models
4.3 Research Questions
4.4 Description of RSA-911 Data
4.5 Simulation and Analysis Plan
4.6 Theoretical Framework of HLM and HGLM in 2-Level Cluster Randomized Design
4.6.1 HLM in 2-Level Cluster Randomized Structure via RSA-911
4.6.2 HGLM in 2-Level Cluster Randomized Structure via RSA-911
CHAPTER 5 RESULTS
5.1 Data Source and Sample Characteristics
5.2 Models and Variables Used for Simulations of ICC Analysis
5.3 ICC Estimation Method and Its Inferential Statistics
5.4 Results of ICC Estimates and Inferential Statistics
5.4.1 Competitive Employment Outcome Measure
5.4.2 Earnings or Quality Employment Outcome Measure
CHAPTER 6 CONCLUSION & DISCUSSION
6.1 Summary of the Results
6.2 Implications
6.3 Limitations of the Study
6.4 Future Research
6.5 Conclusion
APPENDICES
APPENDIX A: Definitions of the VR Variables in RSA-911
APPENDIX B: Descriptive Data Statistics
APPENDIX C: Glossary of Abbreviations
BIBLIOGRAPHY

LIST OF TABLES
Table 2.1 Analysis of Variance (ANOVA) for Intraclass Correlation (ICC) Calculations
Table 5.1 Individual Characteristics of the Usable Samples (n = 11,819)
Table 5.2 Disability & Rehabilitation Characteristics of the Usable Samples (n = 11,819)
Table 5.3 Outcomes of the Usable Samples (n = 11,819)
Table 5.4 Correlation Structure of All Predictors and Outcome Y1 in Hierarchical Analysis
Table 5.5 Correlation Structure of All Predictors and Outcome Y2 in Hierarchical Analysis
Table 5.6 Summary of Mean Differences in the Outcomes between Type of Disability
Table 5.7 ICC Estimates of Unconditional Model M1 for Outcome Measure Y1
Table 5.8 ICC Estimates of Conditional Model M2 for Outcome Measure Y1
Table 5.9 ICC Estimates of Conditional Model M3 for Outcome Measure Y1
Table 5.10 ICC Estimates of Conditional Model M4 for Outcome Measure Y1
Table 5.11 Auxiliary Information of ICC Estimates for Outcome Measure Y1
Table 5.12 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y1
Table 5.13 ICC Estimates of Unconditional Model M1 for Outcome Measure Y2
Table 5.14 ICC Estimates of Conditional Model M2 for Outcome Measure Y2
Table 5.15 ICC Estimates of Conditional Model M3 for Outcome Measure Y2
Table 5.16 ICC Estimates of Conditional Model M4 for Outcome Measure Y2
Table 5.17 Auxiliary Information of ICC Estimates for Outcome Measure Y2
Table 5.18 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y2
Table A.1 List of the Definitions of VR Service Variables Used in the Study
Table A.2 List of the Definitions of VR Demographic Variables Used in the Study
Table A.3 List of the Definitions of VR Outcome Variables Used in the Study
Table B.1 Descriptive Summary of the Usable Sample by Office Level in Michigan (n = 11,819)
Table B.2 A Summary of the Geographic Information System of Office Units in Michigan
Table C.1 Glossary of Abbreviations

LIST OF FIGURES
Figure 1.1 Conceptual Flowchart of the Intraclass Correlation Study at Hierarchical Design
Figure 2.1 Sampling Distributions of Non-Transformed and Transformed Correlations at Three Different Levels
Figure 2.2 Intraclass Correlation Between Two Classes of Measurements
Figure 2.3 Demonstration Example of Intraclass Correlation by Two Classes of Measurements
Figure 2.4 Intraclass Correlation & Design Effect in 2-Level Hierarchical Linear Model
Figure 2.5 Latent Variable Model for Estimation of Intraclass Correlation in 2-Level Design
Figure 4.1 A Workflow Diagram of Simulation-based Exploration and Evaluation for the ICC
Figure B.1 Spatial Network of Target Sample in Michigan by Hierarchical Structure
CHAPTER 1 INTRODUCTION

The need for more scientific, evidence-based research has become an increasing concern in 21st-century education (Schneider et al., 2007). The use of rigorous methods, such as randomized controlled trial (RCT) and cluster randomized trial (CRT) experiments in particular, is important not only to reinforce sound research but also to build a solid basis of evidence-guided knowledge for informing policymakers and practitioners (Menon et al., 2009; Slavin, 2002). Under the Every Student Succeeds Act of 2015 (the successor to No Child Left Behind), the U.S. Department of Education (2016) issued new guidance on the implementation of scientific research. Specifically, for the use of evidence-based interventions, researchers need to be guided by auxiliary research evidence from previous studies in order to conduct scientifically rigorous research and to promote better, more effective outcomes in education, according to the statistical standards and guidelines of the National Center for Education Statistics and the What Works Clearinghouse (https://ies.ed.gov/ncee/wwc/). With that goal in mind, RCTs and CRTs are often strongly recommended by federal education research agencies, such as the Institute of Education Sciences and its affiliated centers, and are consistently deemed the gold standard in scientific research and evidence-informed practice, since both approaches have already proved successful in many circumstances for making decisions in education. One key element in drawing any meaningful scientific conclusion is producing an evidential base through designs and experiments (Anderson & Shattuck, 2012; Barab & Squire, 2004; Cobb et al., 2003; Odom et al., 2005; Shavelson et al., 2003).
For education policy and practice in the 21st century (Slavin, 2008), the pursuit of research soundness has been persistently reinforced by education legislation, e.g., NCLB Legislation (2002) and ESRA Legislation (2008). The No Child Left Behind Act of 2001 (NCLB), for example, supported scientifically based research involving rigorous and systematic methods to obtain applicable and generalizable knowledge for improving school programs, teaching methods, and learning outcomes. Furthermore, the Education Sciences Reform Act of 2002 (ESRA) was enacted to reform the education sciences through principles of scientific research, such as randomized experiments to measure causal impacts on educational outcomes. In the era of evidence-based practice (EBP), rehabilitation counseling is also embracing the concepts of best practice and knowledge translation to incorporate scientific advances and changes that have redefined the relationship between impairments and the capability to work (Leahy et al., 2014a). Within the state-federal vocational rehabilitation (VR) system, public VR agencies are a major source of employment assistance for individuals with disabilities. Under the recent Workforce Innovation and Opportunity Act (WIOA) of 2014, state VR programs must help their target disability populations, through educational or vocational training services, to succeed in the labor market and further to compete, with professional competency skills, in the global economy (WIOA Legislation, 2018). Therefore, today's rehabilitation counseling workforce (including counselors, educators, practitioners, and researchers) needs to work together to embrace the new EBP paradigm and help VR customers improve their access to quality rehabilitation services with informed choices of effective interventions or treatments.
Moreover, it is important to use data-driven or evidence-based rehabilitation counseling best practices to improve accountability and outcomes for people with disabilities by conducting systematic reviews and well-designed studies, as a way to obtain more reliable and valid evidence for translating knowledge and making good decisions in VR (Chan et al., 2009; Leahy et al., 2009; Leahy & Arokiasamy, 2010; Leahy et al., 2014b). Evidence-based practice (EBP) has become a new norm today, built on conducting valid research and gathering reliable data to improve practices and outcomes (Eignor, 2013). In education (including rehabilitation counseling), EBP research along with well-constructed designs and experiments can provide fundamental and significant improvements in practice. Not only can the proper use of EBP results help make better decisions about individuals (e.g., people with disabilities) and programs (e.g., VR agencies), but it can also provide a successful pathway to broader access to quality education or full employment, according to the standards for research conduct by educational researchers (American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing, 2014). Professionals in the field of rehabilitation counseling, such as VR counselors and practitioners, are often expected to integrate clinical judgement skills (including scientific attitude, cognitive complexity, evidence-based practice, and counselor biases) with research evidence via scientifically based methods to make the best-informed decisions that maximize the well-being outcomes of clients (e.g., people with disabilities in public VR) (Austin & Leahy, 2015; Menon et al., 2009).
The emphasis on best EBP lends VR counselors a significantly renewed impetus, so that they can be more accurate in clinical judgement by gaining research-informed knowledge of clinical issues in interventions and outcomes (Chan et al., 2010). Since its introduction more than a century ago, the correlation coefficient has been one of the most important statistical tools for scientific inquiry in educational and social research (Agresti & Finlay, 2009; Fisher, 1915; Olkin & Pratt, 1958; Pearson & Lee, 1903; Pearson, 1904; Pearson, 1920; Soper et al., 1917; Student, 1917; Thorndike, 2005). When using ... this issue, which has the potential for an informal fallacy, can be rectified by using a well-designed experiment, through which researchers are more likely to go the extra mile to obtain valid statistical inference or even causality in their studies (Fisher, 1925a, 1942, 1958a, 1958b; Holland, 1986). Of the different effect magnitude measures in the correlation-ratio family (e.g., Intraclass Correlation, Eta-squared, Omega-squared, R-squared, and Rho-squared indexes), the intraclass correlation (ICC) is a parameter of the random-effects (or mixed-effects) model that quantifies the true proportion of the total variance in the outcome variable accounted for by between-cluster variation (Hays, 1994; Raudenbush & Bryk, 2002). Furthermore, the ICC can summarize the clustering effect magnitude (i.e., the relatedness of observations within the same cluster) in a hierarchical design (Note: In statistics, this technique is called random coefficients in multilevel models) (Hays, 1994; Hedges & Olkin, 1985). In this study, one main research goal is to investigate the ICC in hierarchical linear models (HLM) and hierarchical generalized linear models (HGLM) using mixed-effects analysis of variance (ANOVA), in order to better understand its statistical properties in different simulation-based scenarios with respect to complex modeling structures and sampling designs.
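To make the ANOVA route to the ICC concrete, the sketch below computes the one-way random-effects ANOVA estimator $\hat\rho = (MSB - MSW)/(MSB + (n_0 - 1)MSW)$, where $n_0 = (N - \sum_j n_j^2/N)/(k-1)$ is the usual average group size for $k$ groups, together with the F test of $H_0: \rho = 0$ and the F-based confidence interval. It is a minimal pure-Python sketch, not the dissertation's own code, and it assumes an unconditional (one-way) model: conditional models with covariates would need a mixed-model fit, and the interval is exact only for balanced groups ($n_0$ is substituted as an approximation otherwise). The function names are hypothetical; the F-distribution CDF is evaluated through the regularized incomplete beta function using the standard continued-fraction expansion.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction at step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F(d1, d2) distribution via the incomplete beta function."""
    if f <= 0.0:
        return 0.0
    return betainc_reg(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_ppf(p, d1, d2):
    """Quantile of F(d1, d2), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anova_icc(groups, alpha=0.05):
    """One-way random-effects ANOVA estimate of the ICC, F test of rho = 0, and F-based CI."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    msb = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_total - k)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)  # average group size
    d1, d2 = k - 1, n_total - k
    f_stat = msb / msw
    icc = (msb - msw) / (msb + (n0 - 1) * msw)
    # F / [(1 + (n0 - 1) rho) / (1 - rho)] follows F(d1, d2); invert it for the interval.
    lam_lo = f_stat / f_ppf(1 - alpha / 2, d1, d2)
    lam_hi = f_stat / f_ppf(alpha / 2, d1, d2)
    return {
        "icc": icc,
        "F": f_stat,
        "df": (d1, d2),
        "p_value": 1.0 - f_cdf(f_stat, d1, d2),
        "ci": ((lam_lo - 1) / (lam_lo + n0 - 1), (lam_hi - 1) / (lam_hi + n0 - 1)),
    }


result = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```

For the toy data of three groups of three, MSB = 27 and MSW = 1, so the estimate is 26/29 (about 0.897) with F = 27 on (2, 6) degrees of freedom and p of about 0.001. In practice a library F-distribution (for example in SciPy) would replace the hand-written CDF and quantile; they are spelled out here only to keep the sketch self-contained.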
Both RCTs and CRTs have been widely viewed as among the best EBP approaches (i.e., the gold standard) for appraising and measuring the efficacy and effectiveness of interventions or treatments in educational and social research studies, since such experimental designs can not only efficiently isolate the relationship of outcomes to the intervention or treatment given, but also effectively provide more robust and valid evidence in EBP for scientific inquiry and research (Connolly et al., 2018; Menon et al., 2009; Schneider et al., 2007; Sullivan, 2011).

Figure 1.1 Conceptual Flowchart of the Intraclass Correlation Study at Hierarchical Design

The current study addresses the following research questions under an EBP paradigm with a hierarchical design (i.e., CRT) driven by the intraclass correlation coefficient (ICC). Moreover, this research evaluates the statistical performance of the ICC in various simulation-based scenarios built on the complex hierarchical data structures of the existing RSA-911 data set, in which VR clients serve as the real-world basis for computer simulations in a two-level CRT setting (i.e., clients are at level 1, and offices are at level 2). Also note that, in simulations using the real RSA-911 data, the selected variables are incorporated into multilevel models (i.e., HLM & HGLM) to represent the pivotal VR relationships among demographic characteristics, rehabilitation services, and employment outcomes. To answer the proposed research questions, a computer simulation study (i.e., the Monte Carlo method) is conducted using a bootstrapping procedure with the real RSA-911 data (Note: The bootstrap is a resampling technique that repeatedly draws samples from a given sample; the classical bootstrap draws with replacement, whereas the resampling used here draws without replacement, which is sometimes called subsampling).
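The resampling scenarios used in this study (5, 15, or 25 offices crossed with 50, 100, or 150 clients per office, with 100 replications each) can be sketched as follows. This is a hypothetical illustration, not the dissertation's code: a synthetic two-level population stands in for RSA-911 (the 30 offices, 200 clients per office, and true ICC of 0.02 are assumptions chosen to resemble the magnitudes reported in the abstract), offices and then clients are drawn without replacement as in the note above, and each replicate's unconditional ANOVA ICC estimate is compared with the full-population estimate through bias and mean squared error. All function names are illustrative.

```python
import random
import statistics


def anova_icc(groups):
    """One-way random-effects ANOVA estimate of the ICC (handles unequal group sizes)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    msb = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_total - k)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)


def simulate_population(n_offices, clients_per_office, true_icc, rng):
    """Hypothetical stand-in for the real data: y_ij = u_j + e_ij with total variance 1."""
    population = []
    for _ in range(n_offices):
        u = rng.gauss(0.0, true_icc ** 0.5)  # office-level random effect
        population.append([u + rng.gauss(0.0, (1.0 - true_icc) ** 0.5)
                           for _ in range(clients_per_office)])
    return population


def resample_icc(population, n_groups, n_subjects, reps, rng):
    """Draw offices, then clients within each office, without replacement; return ICC estimates."""
    estimates = []
    for _ in range(reps):
        offices = rng.sample(population, n_groups)
        sample = [rng.sample(office, n_subjects) for office in offices]
        estimates.append(anova_icc(sample))
    return estimates


def bias_and_mse(estimates, reference):
    """Bias and mean squared error of replicate estimates relative to a reference value."""
    bias = statistics.fmean(estimates) - reference
    mse = statistics.fmean([(e - reference) ** 2 for e in estimates])
    return bias, mse


rng = random.Random(2019)
population = simulate_population(30, 200, 0.02, rng)
reference = anova_icc(population)  # plays the role of the full-sample (RQ1-style) estimate
for n_groups in (5, 15, 25):
    for n_subjects in (50, 100, 150):
        est = resample_icc(population, n_groups, n_subjects, 100, rng)
        bias, mse = bias_and_mse(est, reference)
        print(f"groups={n_groups:2d} subjects={n_subjects:3d} bias={bias:+.4f} mse={mse:.6f}")
```

With a true ICC as small as 0.02 and only a few offices drawn, individual ANOVA estimates can be negative; whether to truncate them at zero before summarizing is a reporting choice that this sketch leaves open.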
More details of this computer simulation framework with RSA-911 data are provided in Chapter 4 (Methods and Research Questions) and Chapter 5 (Results). This study addresses the following three research questions with respect to the ICC.

Research Question 1. Consider RSA-911 data for people with disabilities served in Michigan in FY 2015. What are the empirical distributions of the ICC (estimate, standard error, p-value, and 95% confidence limits) for the usable samples of RSA-911 data?
(a). Compare the performance of statistical estimation and inference among Models 1-4, where Model 1 is fully unconditional; Model 2 is conditional on individual characteristics (gender, minority status, age, education, and social security insurance benefits); Model 3 is conditional on rehabilitation service predictors (job placement, on-the-job supports, and rehabilitation technology); and Model 4 is a combination of Models 2 and 3.
(b). What are the empirical distributions of ICC estimates across different breaking variables (disability type, disability significance and severity, and previous work experience) for subset analysis under Models 1-4? What are the differences among Models 1-4 in (a) and (b)?

Research Question 2. Given the cluster randomized design structure of RSA-911 data, there are three cluster settings at level 2 (number of groups = 5, 15, or 25) and three individual settings at level 1 (number of subjects = 50, 100, or 150). Based on a bootstrapping procedure with 100 replications, what are the empirical distributions of the ICC (estimate, standard error, and p-value) in each bootstrap scenario under Models 1-4?
(a). For each bootstrap resampling scenario (number of bootstrap repetitions = 100), compare the ICC estimates among Models 1-4 and examine which model (from Models 1-4) provides better statistical performance of ICC estimation and inference.
(b).
Evaluate which bootstrap sampling scenarios (based on the number of groups and the number of subjects) provide more accurate and precise ICC estimates (i.e., less bias and smaller mean squared error). What are the recommended sampling strategies (number of groups and subjects) for cluster randomized trials using RSA-911 data?

Research Question 3. Comparing the results of Research Question 1 (RQ1: Population Model) and Research Question 2 (RQ2: Resampling Model), which model (of Models 1-4) provides the best statistical properties of ICC estimation and inference, in terms of statistical bias (expected difference in ICC estimates between RQ1 and RQ2), mean squared error (mean squared deviation of ICC estimates between RQ1 and RQ2), and the 95% confidence interval for the ICC parameter (based on the subsamples in RQ2, in comparison with the overall sample result in RQ1)?

The next two chapters present the literature review of statistical methods and applications for intraclass correlation, together with the motivation for the study (Chapter 2), and the literature on statistical approaches in rehabilitation counseling using RSA-911 data (Chapter 3). The rest of the dissertation is organized as follows. Chapter 4 covers the mathematical framework and notation of the proposed methodology for investigating intraclass correlation in multilevel structures. Chapter 5 presents the results from the real RSA-911 data set using an exploratory bootstrap simulation approach to ICC estimation and related statistical inference. Last but not least, the simulation results and study findings are discussed in Chapter 6.

CHAPTER 2 LITERATURE REVIEW OF STATISTICAL METHODS

In this chapter, a comprehensive introduction to the history of intraclass correlation coefficients (ICC) in experimental designs is provided to serve as the basic framework of this study.
The ICC is one of the oldest statistical measures, introduced by Sir Ronald A. Fisher in the early twentieth century. The fundamental idea of the ICC is presented first to show the basic context of intraclass correlation, followed by a systematic review of its developments in statistical estimation and hypothesis testing, effect size measurement, and the Fisher transformation. In addition, a review of the literature on the major current developments in ICC by Allan Donner, Larry Hedges, and Tenko Raykov, along with their proposed analytic strategies for using the ICC, is provided to serve as a fundamental basis for the study and to help understand the ICC phenomenon in multilevel designs, especially cluster randomized trials.

2.1 Fisher Approach

Since its introduction more than a century ago, the correlation coefficient has been one of the most popular and important tools in scientific inquiry, including biometrical work as well as social and educational research (Fisher, 1915; Olkin & Pratt, 1958; Pearson, 1920; Rodgers & Nicewander, 1988; Soper et al., 1917; Student, 1917). The inheritance of physical and mental characters in humans is one classical example of how powerfully this statistical tool can be applied across scientific fields. For example, Pearson and his colleague used data on U.K. school children in the late 1800s to investigate a variety of basic human mechanisms, from physical characteristics (e.g., age, body size, stature, and even eye color) to latent or psychic abilities (e.g., mental status or intelligence), and further compared those measures, using the Pearson product-moment correlation, to understand ancestral heredity, natural inheritance, and family resemblance (Pearson & Lee, 1903; Pearson, 1904).
When using correlation to interpret statistical results, researchers need to be aware that a correlation alone does not establish causation. This issue can be dealt with by a carefully designed experiment (such as a randomized controlled trial), which can go the extra mile to test causality and to make a valid statement of causal inference (Fisher, 1958a, 1958b; Holland, 1986). In experimental work, scientific inquiries can be carried out with three key ingredients: replication (for adding precision), randomization (for bringing validity), and control (for reducing interference), so research workers are therefore able to … (pp. 409-410). In an observational study, on the other hand, some may find it useful in the exploratory stages to express a statistical inquiry in the form of a correlation coefficient, but such a coefficient does not provide a valid foundation for making causal links rather than spurious correlations or even counterfactual connections, owing to a reasonable suspicion that various possible contributory causes of the studied phenomenon cannot be controlled (Fisher, 1925a, Chapter Six). Researchers may therefore use designs such as quasi-experimentation or regression discontinuity to circumvent, or at least alleviate, this difficulty by adjusting uncontrolled observations (or uncontrollable events) with artificially controlled (or statistically manipulated) quasi-experimental conditions (i.e., pseudo-experimental models), in order to estimate causal impacts (i.e., treatment or intervention effects) appropriately using the Neyman-Rubin model (Schneider et al., 2007).

From a theoretical perspective, the mathematical features (algebraic relationships) and key properties (statistical functions) of the Pearson correlation coefficient are as follows. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be $n$ pairs of independent samples from a bivariate normal distribution with means $\mu_x, \mu_y$, variances $\sigma_x^2, \sigma_y^2$, and correlation $\rho$. The frequency (density) can be written in the form

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\mu_x)^2}{\sigma_x^2} - \frac{2\rho(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right]\right\},$$

where the correlation $\rho$ may be positive, negative, or zero but cannot exceed unity in magnitude ($|\rho| < 1$ for a nondegenerate distribution) (Fisher, 1925a; Roussas, 2002).
If one variate has an assigned value (e.g., $X = x$), the conditional frequency (i.e., the joint frequency above divided by the frequency with which $X = x$ occurs) can be expressed by the general formula

$$f(y \mid x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{\left[y - \mu_y - \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)\right]^2}{2\sigma_y^2(1-\rho^2)}\right\},$$

so that the conditional distribution of $Y$ given $X = x$ is normal with mean $\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x-\mu_x)$ and variance $\sigma_y^2(1-\rho^2)$. This implies that the fraction $1-\rho^2$ of the total variance of $Y$ is independent of $X$, while the remaining fraction $\rho^2$ is determined by (and calculable from) the value of $X$ (Fisher, 1925a; Mood, Graybill, & Boes, 1974). The correlation is estimated by the ratio of the covariance to the geometric mean of the two variances; if $x_i - \bar x$ and $y_i - \bar y$ denote the deviations of the two variates from their means, the product-moment estimator is

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2\,\sum_{i=1}^{n}(y_i-\bar y)^2}},$$

where the population means are estimated by the sample means $\bar x$ and $\bar y$. Following Olkin and Pratt (1958), the probability density of $r$ is

$$f(r) = \frac{(n-2)\,\Gamma(n-1)\,(1-\rho^2)^{(n-1)/2}(1-r^2)^{(n-4)/2}}{\sqrt{2\pi}\,\Gamma\!\left(n-\tfrac12\right)(1-\rho r)^{n-3/2}}\; {}_2F_1\!\left(\tfrac12, \tfrac12; n-\tfrac12; \tfrac{1+\rho r}{2}\right),$$

where ${}_2F_1$ is the Gaussian hypergeometric function, which can be computed from its power series ${}_2F_1(a, b; c; z) = \sum_{j=0}^{\infty}\frac{(a)_j(b)_j}{(c)_j}\frac{z^j}{j!}$, with $(a)_j$ the rising factorial. Under the null hypothesis of independence ($\rho = 0$), the asymptotic distribution of the sample correlation is normal with mean 0 and variance $1/n$. In the general case ($\rho \ne 0$), by applying a Laplace transformation and a Taylor series expansion to the density of $r$ above, Olkin and Pratt (1958) derived the uniformly minimum-variance unbiased estimator (UMVUE: an unbiased estimator with lower variance than any other unbiased estimator for all plausible values of the parameter),

$$\hat\rho_{\mathrm{OP}} = r\,{}_2F_1\!\left(\tfrac12, \tfrac12; \tfrac{n-2}{2}; 1-r^2\right) \approx r\left[1 + \frac{1-r^2}{2(n-3)}\right].$$

Note that there is another simple estimator that adjusts the biased correlation, especially for small samples, according to Kelly (2018) and Flom (2015): $r_{\mathrm{adj}} = \sqrt{1 - \frac{(1-r^2)(n-1)}{n-2}}$, which results from the $R^2$ adjusted for the number of predictors (here, one predictor).
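The estimators above can be compared numerically. The following is a minimal sketch, assuming NumPy and SciPy are available; the helper names (`pearson_r`, `olkin_pratt`, `olkin_pratt_approx`, `adjusted_r`) are hypothetical and only transcribe the formulas as written, using `scipy.special.hyp2f1` for the hypergeometric term.

```python
import numpy as np
from scipy.special import hyp2f1


def pearson_r(x, y):
    """Product-moment correlation: covariance over the geometric mean of the two variances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def olkin_pratt(r, n):
    """Olkin-Pratt unbiased estimator r * 2F1(1/2, 1/2; (n - 2)/2; 1 - r^2)."""
    return float(r * hyp2f1(0.5, 0.5, (n - 2) / 2.0, 1.0 - r ** 2))


def olkin_pratt_approx(r, n):
    """First-order approximation r * [1 + (1 - r^2) / (2(n - 3))]."""
    return r * (1.0 + (1.0 - r ** 2) / (2.0 * (n - 3)))


def adjusted_r(r, n):
    """Correlation implied by adjusted R^2 with one predictor; returns 0 when adjusted R^2 <= 0."""
    r2_adj = 1.0 - (1.0 - r ** 2) * (n - 1) / (n - 2)
    return float(np.sign(r) * np.sqrt(r2_adj)) if r2_adj > 0 else 0.0


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))  # 0.7746
print(olkin_pratt(0.5, 30), olkin_pratt_approx(0.5, 30), adjusted_r(0.5, 30))
```

For moderate $n$ the exact hypergeometric form and the first-order approximation differ only slightly, while the adjusted $r$ shrinks the estimate toward zero instead of inflating it.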
To test whether a correlation differs from zero ($H_0: \rho = 0$), the test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which is $t$-distributed with $n-2$ degrees of freedom (Lomax & Hahs-Vaughn, 2012, pp. 267-268; Roussas, 2002, pp. 472-473). Interestingly, the probability density of $r$ when $\rho = 0$ can be found by transforming the $t$-statistic above, giving

$$f(r) = \frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n-2}{2}\right)}(1-r^2)^{(n-4)/2}, \quad -1 < r < 1,$$

which holds only under independence ($\rho = 0$) (Roussas, 2002, p. 474). Compared with the $t$-statistic approach, transforming the correlation is another way to test the significance of an observed correlation coefficient. Using the standard normal distribution, Fisher (1925a) proposed a more reliable and accurate transformation that maps a given correlation to an approximately normal variate, so that the test can be carried out without laborious computation. The transformation is defined as

$$z = \frac12\ln\frac{1+r}{1-r} = \tanh^{-1} r,$$

where $z$ ranges from $-\infty$ to $\infty$ as $r$ changes from $-1$ to $1$; it can also be written as the series $z = r + \frac{r^3}{3} + \frac{r^5}{5} + \cdots$, and the standard error of $z$ is approximately $\frac{1}{\sqrt{n-3}}$, which is practically independent of the value of the correlation in the population from which the sample is drawn. There are three advantages of transforming $r$ into $z$ (Fisher, 1925a, pp. 198-199): (1) the standard error of $z$ does not depend on the true value of the correlation $\rho$, so it can provide a true weight for the value of the estimate (i.e., it behaves like a so-called ancillary statistic, which contains no information about the parameter of interest but, paradoxically, can inform the accuracy and precision of the estimate; Casella & Berger, 2002, pp. 282-284; Cox, 1971; Efron & Hinkley, 1978; Fisher, 1925b, p.
724); (2) although the distribution of $r$ is not normal in small samples and remains far from normal even in large samples when the correlation is high (e.g., close to $\pm 1$), the distribution of $z$ tends rapidly to normality as the sample size increases, whatever the value of the correlation (large or small, positive or negative); and (3) while the distribution of $r$ changes rapidly in shape (i.e., skewness and kurtosis) as $\rho$ changes, the sampling distribution of $z$ is more stable and nearly constant in the form of a symmetric bell shape, i.e., $z$ is approximately normally distributed with mean $\zeta = \tanh^{-1}\rho = \frac12\ln\frac{1+\rho}{1-\rho}$ (the transformed true correlation) and variance $\frac{1}{n-3}$. Figure 2.1 compares the sampling distributions of non-transformed and transformed correlation coefficients at three levels (correlations of 0.2, 0.5, and 0.8, for a fixed sample size $n$). Figure 2.1 shows that the $z$-transformed distributions are more robust and stable than the non-transformed correlation coefficients across the domain (i.e., $-1 < r < 1$ and $-\infty < z < \infty$).

Figure 2.1 Sampling Distributions of Non-Transformed and Transformed Correlations at Three Different Levels

[Figure: upper panel "Correlation Distribution by Three Different Levels" (x-axis: correlation value; y-axis: density); lower panel "Distribution by Three Different Levels" for the transformed values (y-axis: density).]

Note. The idea of this graph comes from Fisher's $z$ transformation (1925a, p. 200). The upper panel shows the sampling distributions of the correlation at $r$ = 0.2, 0.5, and 0.8; the lower panel shows the corresponding sampling distributions under the $z$ transformation, at $z$ = 0.20, 0.55, and 1.10.

In terms of correlation-based measures, Jacob Cohen (1988, pp.
77-81) proposed $r = .10$ (a threshold of around 0.1) as a small or weak effect, $r = .30$ (around 0.3) as a medium or moderate effect, and $r = .50$ (around 0.5) as a large or strong effect, and $r \ge .70$ (around 0.7 or above) is treated as a very large or extremely strong effect, for judging the effect size magnitude of a phenomenon of interest (Cohen, 1988; Ellis, 2009; Rosenthal, 1996). These correlation thresholds may need to be modified, or even re-evaluated and re-justified, in other areas of scientific inquiry, especially outside the behavioral and social sciences (e.g., clinical and social psychology) in which they were developed, since J. Cohen, a clinical and social psychologist, originally built these effect-size benchmarks with data from his own field to provide qualitative descriptors of the strength of association for a product-moment correlation. In the family of correlation-type effect size measures, there are other estimates calculated from different variance components, e.g., effect magnitude (EM) = [explained variance] / [total variance], which in plain language is the proportion of the total variation in an experimental design model that is accounted for by the explained variance (Cohen, 1988, p. 78). For instance, the coefficient of determination (R-squared, $R^2$) is widely known and used, especially in regression models. In addition, the correlation ratio eta-squared, $\eta^2 = SS_{\text{between}}/SS_{\text{total}}$, is another form of squared correlation used in analysis of variance (ANOVA) models (Pearson, 1923; Richardson, 2011). Also, Hays (1994) introduced a similar index, omega-squared ($\omega^2$), as the relative reduction in uncertainty about $Y$ due to $X$, reflecting the variance component in $Y$ accounted for by $X$:

$$\omega^2_{Y \mid X} = \frac{\sigma_Y^2 - \sigma_{Y\mid X}^2}{\sigma_Y^2}.$$
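Returning to the significance test for $r$ and Fisher's $z$ transformation described earlier in this section, the sketch below computes the $t$ test and a $z$-based interval, and runs a small Monte Carlo check of the variance-stabilizing property illustrated in Figure 2.1. It is a minimal sketch assuming NumPy and SciPy; the function names, the simulated data, the sample size $n = 50$, and the number of replications are hypothetical choices, not the settings used for Figure 2.1.

```python
import numpy as np
from scipy import stats


def corr_t_test(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 df, with a two-sided p-value for H0: rho = 0."""
    t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)
    return float(t), float(2.0 * stats.t.sf(abs(t), df=n - 2))


def fisher_z_interval(r, n, level=0.95):
    """Interval for rho: artanh(r) +/- z_crit / sqrt(n - 3), mapped back with tanh."""
    z = np.arctanh(r)
    half = stats.norm.ppf(0.5 + level / 2.0) / np.sqrt(n - 3)
    return float(np.tanh(z - half)), float(np.tanh(z + half))


def sampling_sds(rho, n=50, reps=20000, seed=1):
    """Monte Carlo SDs of r and of z = artanh(r) for bivariate normal samples of size n."""
    rng = np.random.default_rng(seed)
    xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=(reps, n))  # (reps, n, 2)
    x = xy[..., 0] - xy[..., 0].mean(axis=1, keepdims=True)
    y = xy[..., 1] - xy[..., 1].mean(axis=1, keepdims=True)
    r = (x * y).sum(axis=1) / np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum(axis=1))
    return float(r.std(ddof=1)), float(np.arctanh(r).std(ddof=1))


rng = np.random.default_rng(7)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)
r = float(np.corrcoef(x, y)[0, 1])
t_stat, p_value = corr_t_test(r, len(x))
ci = fisher_z_interval(r, len(x))
sds = {rho: sampling_sds(rho) for rho in (0.2, 0.5, 0.8)}
print(r, t_stat, p_value, ci)
print(sds, 1.0 / np.sqrt(50 - 3))
```

In the simulation the standard deviation of $r$ shrinks markedly as $\rho$ grows, while the standard deviation of $z$ stays close to $1/\sqrt{n-3}$ at all three levels, which is the behavior Fisher's second and third advantages describe.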
Last but not least, the intraclass correlation coefficient (ICC), defined as $\rho_I = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2}$, is another way to quantify the true proportion of outcome variance accounted for by the cluster effect in random-effects mixed models (Hays, 1994; Raudenbush & Bryk, 2002), where $\sigma_A^2$ is the between-group (random-effect) variance and $\sigma_e^2$ is the within-group (error) variance. Note that the intraclass correlation (the so-called cluster effect) is defined only in random-effects (especially random-intercept) models, whereas the omega-squared index can also be used in fixed-effects analysis (Hays, 1994, p. 535; Hedges & Olkin, 1985, p. 103). One main focus of this study is to investigate the ICC in hierarchical models (mixed-effects ANOVA) in order to better understand its properties under different scenarios of design effect and sample size. Another use of the intraclass correlation is to measure the level of similarity or resemblance (Fisher, 1925a; see Figure 2.2 for an illustration of intraclass correlation). In plant biology, for instance, the resemblance between leaves or pods on the same tree was studied by picking 30 seed pods from each of 100 different trees. In human and family correlation studies, for example, we may have anthropometric measurements on about 1,500 pairs of siblings from the same families (e.g., two classes: elder child vs. younger child) and want to calculate the correlation between siblings. If the association of interest is based on the differences between two classes (or groups) of measurements, it is called an interclass correlation, which is equivalent to an ordinary correlation coefficient between two sets of measurements.
On the other hand, suppose that all the subjects (e.g., both older and younger siblings together) belong to the same class (a single group for the whole study) with a common mean and a common standard deviation about that mean for all measurements; the correlation is then distinguished as an intraclass correlation (Fisher, 1925a, pp. 211-215).

Figure 2.2 Intraclass Correlation Between Two Classes of Measurements

Note. This illustration of ICC is motivated by an original idea of Fisher (1925a).

In the special case of two classes of measurements given by $N$ pairs of samples $(x_i, x_i')$, $i = 1, \dots, N$, the intraclass correlation is defined as

$$r_I = \frac{\sum_{i=1}^{N}(x_i - \bar x)(x_i' - \bar x)}{N s^2},$$

where the common mean is $\bar x = \frac{1}{2N}\sum_{i=1}^{N}(x_i + x_i')$ and the common variance is $s^2 = \frac{1}{2N}\sum_{i=1}^{N}\left[(x_i - \bar x)^2 + (x_i' - \bar x)^2\right]$. In the general case of $k$ classes of measurements given by $N$ samples $(x_{i1}, \dots, x_{ik})$, with $\bar x_i$ denoting the mean of the $k$ measurements in the $i$-th sample, the intraclass correlation can be written as

$$r_I = \frac{\sum_{i=1}^{N}\sum_{j \ne j'}(x_{ij} - \bar x)(x_{ij'} - \bar x)}{N k (k-1) s^2} = \frac{1}{k-1}\left[\frac{k\sum_{i=1}^{N}(\bar x_i - \bar x)^2}{N s^2} - 1\right],$$

where the common mean is $\bar x = \frac{1}{Nk}\sum_{i}\sum_{j}x_{ij}$, the common variance is $s^2 = \frac{1}{Nk}\sum_{i}\sum_{j}(x_{ij} - \bar x)^2$, and the intraclass correlation can never be less than $-\frac{1}{k-1}$. See Figure 2.3 for a geometric interpretation of the ICC, illustrating the resemblance within 10 pairs of observations (i.e., siblings) as a measure of within-pair association (intraclass correlation) between two siblings in the same family. Interestingly, the ICC in Figure 2.3 can be geometrically represented, as well as numerically approximated, by the overall Euclidean distance (norm) between the paired samples on the standardized scale (i.e., the overall Euclidean length defined by the standardized differences between the measures of Sibling A and Sibling B in the Cartesian coordinate system).

Figure 2.3 Demonstration Example of Intraclass Correlation by Two Classes of Measurements

Note.
This illustration is based on the concept of intraclass correlation with a common mean and standard deviation for all measurements (Fisher, 1925a, Section 38, Intraclass Correlations and the Analysis of Variance, pp. 211-214). The intraclass correlation (within-pair correlation) can be estimated by the Euclidean distance between the paired measurements of the two related groups of samples (here the true ICC is set at 0.303, and the estimated ICC is 0.298, using the standardized length between each pair of measurements from Sibling A and Sibling B).

[Figure: "Example of Intraclass Correlation for Measurements of 10 Pairs of Siblings"; x-axis: Measure of Sibling A; y-axis: Measure of Sibling B (both from -2 to 2); legend: regression line (r = 0.303), correlation distance (r = 0.298), 45-degree line.]

2.2 Donner Approach

In the analysis of family data, the intraclass correlation coefficient is frequently used to measure the degree of intra-family resemblance among family members with regard to quantitative traits of biological or psychological attributes, such as intelligence (IQ), in family health history. Donner & Koval (1980a) derived the maximum likelihood estimator (which requires no prior information about the parameters) of the intraclass correlation using multivariate normal theory in variance component models, allowing unequal group (family) sizes.
In statistical terms, suppose that one observation on the $j$-th member ($j = 1, \dots, n_i$) of the $i$-th family ($i = 1, \dots, k$) is used to investigate the intraclass resemblance among the members of each of the $k$ families, which can be stated mathematically as

$$y_{ij} = \mu + a_i + e_{ij},$$

where $y_{ij}$ is the observation for which $i$ indexes the family or group factor $A$ and $j$ indexes an individual member within that family or group, $\mu$ is the grand mean of all observations, $a_i$ is the random group effect, normally and independently distributed with mean 0 and variance $\sigma_A^2$ (i.e., NID(0, $\sigma_A^2$)), $e_{ij}$ is a random normal error term for the $j$-th subject in the $i$-th group (independently and identically distributed with mean 0 and variance $\sigma_e^2$, i.e., NID(0, $\sigma_e^2$)), and the two sets of random components, $\{a_i\}$ and $\{e_{ij}\}$, are assumed to be mutually independent. Summing the additive variance components (i.e., between-group plus within-group variation equals total variation), the variance of $y_{ij}$ is $\sigma^2 = \sigma_A^2 + \sigma_e^2$, and the intraclass correlation is

$$\rho_I = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_e^2},$$

which is zero when $\sigma_A^2 = 0$ and unity when $\sigma_e^2 = 0$ (assuming $\sigma_A^2 + \sigma_e^2 > 0$). Notice that the intraclass correlation represents the true proportion of variance attributable to factor $A$, and that it is similar in general form to the omega-squared index ($\omega^2$), although the intraclass correlation ($\rho_I$) applies to the random-effects model while the omega-squared index ($\omega^2$) is usually applied to the fixed-effects model (Hays, 1994, p. 535). Equivalently, from the standpoint of statistical theory, the intraclass correlation can be defined as the ordinary correlation coefficient between any two observations in the same class (group or family), say $y_{ij}$ and $y_{ij'}$, since

$$\rho_I = \frac{\mathrm{Cov}(y_{ij}, y_{ij'})}{\sqrt{\mathrm{Var}(y_{ij})\,\mathrm{Var}(y_{ij'})}}, \quad \text{where } \mathrm{Cov}(y_{ij}, y_{ij'}) = \sigma_A^2 \ (j \ne j') \text{ and } \mathrm{Var}(y_{ij}) = \mathrm{Var}(y_{ij'}) = \sigma_A^2 + \sigma_e^2$$

(Donner & Koval, 1980a).
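A small simulation can illustrate the model above: the correlation between two members of the same group should be close to $\rho_I$, and Fisher's pairwise intraclass formula from Section 2.1 should recover the same value. This is a minimal sketch assuming NumPy; the function names and the values of $k$, $n$, $\mu$, $\sigma_A^2$, and $\sigma_e^2$ are hypothetical.

```python
import numpy as np


def simulate_one_way(k, n, mu, sigma_a2, sigma_e2, seed=0):
    """Draw y_ij = mu + a_i + e_ij, a_i ~ N(0, sigma_a2), e_ij ~ N(0, sigma_e2); returns a k x n array."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(sigma_a2), size=(k, 1))   # one random effect per group
    e = rng.normal(0.0, np.sqrt(sigma_e2), size=(k, n))   # independent within-group errors
    return mu + a + e


def fisher_pairwise_icc(y):
    """Fisher's intraclass r over all ordered pairs j != j' within rows, using the common mean and variance."""
    y = np.asarray(y, dtype=float)
    k, n = y.shape
    d = y - y.mean()
    s2 = (d ** 2).mean()
    cross = (d.sum(axis=1) ** 2 - (d ** 2).sum(axis=1)).sum()  # sum over j != j' of d_ij * d_ij'
    return float(cross / (k * n * (n - 1) * s2))


sigma_a2, sigma_e2 = 1.0, 3.0
rho_true = sigma_a2 / (sigma_a2 + sigma_e2)                     # 0.25
y = simulate_one_way(k=20000, n=5, mu=10.0, sigma_a2=sigma_a2, sigma_e2=sigma_e2, seed=42)
pair_corr = float(np.corrcoef(y[:, 0], y[:, 1])[0, 1])          # members j = 1 and j' = 2 of each group
print(rho_true, round(pair_corr, 3), round(fisher_pairwise_icc(y), 3))
```

The pairwise formula uses the identity $\sum_{j \ne j'} d_j d_{j'} = (\sum_j d_j)^2 - \sum_j d_j^2$ to avoid an explicit double loop over pairs.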
Whereas the maximum likelihood estimator above requires distributional assumptions about the observations (based on multivariate normal theory), analysis of variance (ANOVA) provides an alternative estimator of the intraclass correlation (relaxing these assumptions) in classical linear models (Donner & Koval, 1980a). This practical method for estimating the intraclass correlation uses the relevant information in the ANOVA table below (without loss of generality, a balanced design with equal group/family size $n$ is assumed).

Table 2.1 Analysis of Variance (ANOVA) for Intraclass Correlation (ICC) Calculations

Source of Variation | Degrees of Freedom (DF) | Sum of Squares (SS) | Mean Squares (MS) | F Statistic
Among Groups | k - 1 | SSA | MSA | MSA / MSW
Within Groups | k(n - 1) | SSW | MSW |
Total | kn - 1 | SST | |

Here the between-group variation is $SSA = n\sum_{i=1}^{k}(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2$, the within-group variation is $SSW = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{i\cdot})^2$, the total variation is $SST = \sum_{i=1}^{k}\sum_{j=1}^{n}(y_{ij} - \bar y_{\cdot\cdot})^2 = SSA + SSW$, the mean square among groups is $MSA = SSA/(k-1)$, and the mean square within groups (the mean squared error) is $MSW = SSW/[k(n-1)]$; the between-group degrees of freedom are $k - 1$ ($k$ = the number of groups), and the within-group degrees of freedom are $k(n-1)$ ($n$ = the number of subjects within each group). By Hays (1994, pp. 533-535), the expected mean square among groups is $E[MSA] = \sigma_e^2 + n\sigma_A^2$, and the expected mean square within groups is $E[MSW] = \sigma_e^2$ (i.e., MSW, also denoted MSE, is an unbiased estimate of the error variance; Hays, 1994, p. 532). Therefore, the intraclass correlation estimator can be obtained indirectly via ANOVA as

$$\hat\rho_I = \frac{\hat\sigma_A^2}{\hat\sigma_A^2 + \hat\sigma_e^2} = \frac{MSA - MSW}{MSA + (n-1)MSW},$$

where the total variance consists of two independent variance components, $\sigma^2 = \sigma_A^2 + \sigma_e^2$ for $\sigma_A^2 \ge 0$ and $\sigma_e^2 > 0$, and the best estimate of the total variance uses the estimates of the group variance, $\hat\sigma_A^2 = (MSA - MSW)/n$, and of the error variance, $\hat\sigma_e^2 = MSW$, so that $\hat\sigma^2 = \hat\sigma_A^2 + \hat\sigma_e^2 = [MSA + (n-1)MSW]/n$.
Also notice that the unbiased estimate of the group variance, $(MSA - MSW)/n$, is negative whenever MSW (MSE) is greater than MSA (Hays, 1994, p. 534). For an unbalanced design with unequal group sizes $n_i$ ($i = 1, \dots, k$), the ANOVA uses a common family or group size $n_0$ representing the average number of individuals per group, and the intraclass correlation coefficient (Donner & Koval, 1982) is

$$\hat\rho_I = \frac{MSA - MSW}{MSA + (n_0 - 1)MSW}, \qquad n_0 = \frac{1}{k-1}\left(N - \frac{\sum_{i=1}^{k} n_i^2}{N}\right),$$

where $N = \sum_{i=1}^{k} n_i$ is the total sample size and $MSW = SSW/(N-k)$. Also note that, by Donner & Koval (1980a), the average within-group size can alternatively be calculated as $n_0 = \bar n - \frac{\sum_{i=1}^{k}(n_i - \bar n)^2}{(k-1)N}$, where $\bar n = N/k$ is the mean group size; this latter formula is mathematically equivalent to the former, although its computation is more laborious. Since $MSW$ and $(MSA - MSW)/n_0$ are regarded, respectively, as unbiased estimates of $\sigma_e^2$ and $\sigma_A^2$, the estimator $\hat\rho_I = \hat\sigma_A^2/(\hat\sigma_A^2 + \hat\sigma_e^2)$ follows directly, and it is equivalent to the previous formula because $\hat\sigma_A^2 = (MSA - MSW)/n_0$ (Donner & Koval, 1980a). For statistical testing of the intraclass correlation, Donner & Koval (1980a) describe a test of significance in the analysis of variance using the F-distribution with $k - 1$ and $N - k$ degrees of freedom at the chosen level of significance $\alpha$, for the hypotheses $H_0: \rho_I = 0$ vs. $H_1: \rho_I > 0$. A significant F statistic (i.e., $F = MSA/MSW > F_{1-\alpha;\,k-1,\,N-k}$) implies that members of the same group tend to be more alike with respect to the attribute in question than members of different groups, and that the estimated intraclass correlation reflects the true proportion of variance accounted for in the population by the factor of interest (e.g., families or groups). As another mathematical expression of the intraclass correlation, the coefficient can be re-defined using the variance ratio $\theta = \sigma_A^2/\sigma_e^2$ as

$$\rho_I = \frac{\theta}{1 + \theta},$$

under the basic assumption that the random effect ($a_i$) and the error term ($e_{ij}$) are normally distributed (Hays, 1994, p. 535).
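The ANOVA estimator, the $n_0$ adjustment, and the F test above can be sketched as follows, together with the balanced-design F-based interval for $\theta$ and $\rho_I$ (Kutner et al., 2005) that the text presents in the next paragraph. This is a minimal sketch assuming NumPy and SciPy; `anova_icc` and `icc_f_interval` are hypothetical helper names, and the small data sets are made up for illustration.

```python
import numpy as np
from scipy import stats


def anova_icc(groups):
    """One-way random-effects ANOVA ICC with the n0 adjustment and the F test of H0: rho_I = 0."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = np.array([g.size for g in groups], dtype=float)
    k, N = len(groups), sizes.sum()
    grand = np.concatenate(groups).mean()
    ssa = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    msa, msw = ssa / (k - 1), ssw / (N - k)
    n0 = (N - (sizes ** 2).sum() / N) / (k - 1)          # equals n in a balanced design
    F = msa / msw
    return {
        "icc": float((msa - msw) / (msa + (n0 - 1) * msw)),
        "sigma_a2": float(max((msa - msw) / n0, 0.0)),   # (MSA - MSW) / n0, set to 0 when MSW > MSA
        "sigma_e2": float(msw),
        "n0": float(n0),
        "F": float(F),
        "p": float(stats.f.sf(F, k - 1, N - k)),          # one-sided alternative rho_I > 0
    }


def icc_f_interval(msa, msw, k, n, alpha=0.05):
    """Balanced-design interval for theta = sigma_A^2 / sigma_e^2 and rho_I = theta / (1 + theta)."""
    F = msa / msw
    df1, df2 = k - 1, k * (n - 1)
    lower = (F / stats.f.ppf(1 - alpha / 2, df1, df2) - 1.0) / n
    upper = (F / stats.f.ppf(alpha / 2, df1, df2) - 1.0) / n
    lower, upper = max(lower, 0.0), max(upper, 0.0)       # negative limits replaced by 0
    return (lower, upper), (lower / (1.0 + lower), upper / (1.0 + upper))


balanced = anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # MSA = 27, MSW = 1, ICC = 26/29
theta_ci, rho_ci = icc_f_interval(msa=27.0, msw=1.0, k=3, n=3)
unbalanced = anova_icc([[1.0, 2.0], [2.0, 4.0, 3.0], [5.0, 6.0, 4.0, 7.0, 6.0]])
print(balanced["icc"], balanced["p"], rho_ci, unbalanced["n0"])
```

For the unbalanced example ($n_i$ = 2, 3, 5), both expressions for $n_0$ give 3.1, slightly below the arithmetic mean group size of 3.33.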
Further, in linear modeling theory (Hays, 1994, pp. 535-536; Kutner et al., 2005, pp. 1040-1041; Stapleton, 2009, p. 285), the test statistic for the proposed intraclass correlation estimator can be shown to satisfy

$$\frac{MSA/(\sigma_e^2 + n\sigma_A^2)}{MSW/\sigma_e^2} = \frac{F}{1 + n\theta} \sim F_{k-1,\;k(n-1)},$$

where this result is based on the random-effects ANOVA with a balanced design, so that a $100(1-\alpha)\%$ confidence interval on $\theta$ is $[L, U]$ with

$$L = \frac{1}{n}\left(\frac{F}{F_{1-\alpha/2;\,k-1,\,k(n-1)}} - 1\right), \qquad U = \frac{1}{n}\left(\frac{F}{F_{\alpha/2;\,k-1,\,k(n-1)}} - 1\right),$$

and $F = MSA/MSW$ is the sample F ratio from the ANOVA table. By the algebraic relationship $\rho_I = \theta/(1+\theta)$, the corresponding interval for the intraclass correlation is

$$\frac{L}{1+L} \le \rho_I \le \frac{U}{1+U},$$

where this confidence interval, with confidence coefficient $1-\alpha$, describes the proportion of total variability accounted for by the differences among factor levels (i.e., the extent of variation between groups or families in the analysis of family data). Note that this interval estimate (for either $\theta$ or $\rho_I$) may not be very precise when it is based on a relatively small sample, or when the variance components are much harder to estimate precisely than means (e.g., because of relatively low measurement reliability). Also note that the lower limit of the confidence interval for $\theta$ or $\rho_I$ may occasionally be negative; since these ratios cannot be negative, the negative lower limit is replaced by the natural lower bound, which is simply zero in this case. The maximum likelihood estimator of the intraclass correlation can be derived using multivariate normal theory (with a common mean and variance-covariance structure). Let $\mathbf{y}_i = (y_{i1}, \dots, y_{in_i})'$ represent the measurements taken on the $i$-th group ($i = 1, \dots, k$), each consisting of $n_i$ subjects, with total size $N = \sum_{i=1}^{k} n_i$.
Assume that this $n_i$-variate vector follows a multivariate normal distribution, $\mathbf{y}_i \sim N_{n_i}(\boldsymbol\mu_i, \boldsymbol\Sigma_i)$; equivalently, its ($n_i$-variate normal) probability density function is

$$f(\mathbf{y}_i) = (2\pi)^{-n_i/2}\,|\boldsymbol\Sigma_i|^{-1/2}\exp\left\{-\frac12(\mathbf{y}_i - \boldsymbol\mu_i)'\boldsymbol\Sigma_i^{-1}(\mathbf{y}_i - \boldsymbol\mu_i)\right\},$$

where the mean vector is $\boldsymbol\mu_i = \mu\mathbf{1}_{n_i}$ for a common mean $\mu$ across all groups, the variance-covariance matrix $\boldsymbol\Sigma_i = \sigma^2\left[(1-\rho)\mathbf{I}_{n_i} + \rho\mathbf{J}_{n_i}\right]$, with $\mathbf{J}_{n_i} = \mathbf{1}_{n_i}\mathbf{1}_{n_i}'$, has diagonal elements $\sigma^2$ (a common variance across groups) and off-diagonal elements $\rho\sigma^2$ (a common covariance over groups), $|\boldsymbol\Sigma_i|$ denotes the determinant of $\boldsymbol\Sigma_i$, and $i = 1, \dots, k$ indexes the groups. In a balanced design (the common correlation model with $n_i = n$), the intraclass correlation can be estimated by the Pearson product-moment correlation computed over all within-group pairs (Donner & Koval, 1980a), whose explicit form is

$$r = \frac{\sum_{i=1}^{k}\sum_{j \ne j'}(y_{ij} - \bar y)(y_{ij'} - \bar y)}{k n (n-1)\, s^2},$$

where $\bar y$ and $s^2 = \frac{1}{kn}\sum_{i}\sum_{j}(y_{ij} - \bar y)^2$ are the common sample mean and variance, respectively, computed across all observations following Fisher's (1925a) concept of intraclass correlation. By large-sample theory (asymptotic normality), the variance of this estimator is

$$\mathrm{Var}(r) \approx \frac{2(1-\rho)^2\left[1 + (n-1)\rho\right]^2}{k n (n-1)}.$$

Note that in a balanced design ($n_i = n$ for all $i$), this estimator is also equivalent to the maximum likelihood estimate (MLE) of the intraclass correlation (i.e., the estimate obtained by maximizing the multivariate normal likelihood). For an unbalanced design, the asymptotic (large-sample) variance of this estimator involves group-specific sampling weights, the total sample size $N$, and the Pearson correlation in place of $\rho$ (Donner & Koval, 1982). In addition, the MLEs of $\mu$ and $\sigma^2$ can be found, and with these estimates the MLE of the intraclass correlation in this case can be computed in two equivalent forms. Note that Karlin et al.
(1981) derived this MLE of the intraclass correlation in an unbalanced design by using the invariance property of MLEs (i.e., if $\hat\psi$ is the MLE of $\psi$, then for any function $g(\psi)$, the MLE of $g(\psi)$ is $g(\hat\psi)$). Donner & Koval (1980b) used a different approach, solving for the MLE of $\rho$ numerically by maximizing the log-likelihood function (the logarithm of the $n_i$-variate normal density), which up to an additive constant is

$$\ell(\mu, \sigma^2, \rho) = -\frac12\sum_{i=1}^{k}\left[\ln|\boldsymbol\Sigma_i| + (\mathbf{y}_i - \mu\mathbf{1}_{n_i})'\boldsymbol\Sigma_i^{-1}(\mathbf{y}_i - \mu\mathbf{1}_{n_i})\right],$$

where this optimization method differentiates with respect to $\rho$ to find the MLE.

2.3 Hedges Approach

Hedges used the intraclass correlation to summarize the information on variance components in the multilevel structure of 2-level, 3-level, and 4-level hierarchical designs (Hedges et al., 2012; Hedges & Hedberg, 2013). Further, the intraclass correlation has been considered an important statistic for providing design effect parameters for statistical planning (power analysis) in experimental design and survey sampling (e.g., randomized controlled trials or large-scale experiments in education settings). In hierarchical linear models, the intraclass correlation plays a key role in quantifying the amount of inherent clustering (i.e., the degree of within-cluster dependence) in multilevel data. Looking back at the development of the ICC in hierarchical designs, the ICC was first introduced by Fisher (1925a), who created the oldest measure of within-group correlation and provided a significance testing procedure for experimental designs (such as RCTs and CRTs). Later, Raudenbush (1997) built on hierarchical linear models in education to evaluate the clustering effect of multilevel data structures through the ICC. Furthermore, Hedges used the meta-analytic framework to rethink the ICC, using the design effect to improve multilevel designs in education and social research.
The theoretical framework of the intraclass correlation in a multilevel design (such as a cluster randomized trial, CRT) using a hierarchical linear model (HLM) is as follows. In a two-level HLM, consider the variance components associated with the fully unconditional model (no covariates at any level of the model). Let $\sigma_W^2$ and $\sigma_B^2$ be the variance components at Level 1 and Level 2, respectively, and let $\hat\sigma_W^2$ and $\hat\sigma_B^2$ be their MLEs. Let the variances of $\hat\sigma_W^2$ and $\hat\sigma_B^2$ be $v_W$ and $v_B$, respectively. Without loss of generality, suppose that $v_W = 0$ (note: in most large-scale hierarchical studies, the Level-1 variance component is essentially known, i.e., $\sigma_W^2$ is a given constant and $v_W = 0$, or it can be estimated very precisely, i.e., $v_W \approx 0$, since the many Level-1 units provide ample information for estimation; Hedges et al., 2012). Let $m$ denote the number of groups or clusters (Level-2 units) and $n_i$ the number of Level-1 units in the $i$-th Level-2 unit. For a balanced design ($n_i = n$), the intraclass correlation in the two-level HLM is

$$\rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2},$$

and the intraclass correlation estimator (based on cluster random samples) is $\hat\rho = \hat\sigma_B^2/(\hat\sigma_B^2 + \hat\sigma_W^2)$. Its asymptotic variance, based on large-sample theory and the delta method, is

$$\mathrm{Var}(\hat\rho) \approx \frac{(1-\rho)^2\, v_B}{\sigma_T^4},$$

where the total variance is $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$ and $v_B = \mathrm{Var}(\hat\sigma_B^2)$ is the sampling variance (squared standard error) of the Level-2 variance component estimate. To estimate the sampling variability of the sample ICC, the large-sample variance is evaluated at the estimates,

$$\hat v_\rho = \frac{(1-\hat\rho)^2\, \hat v_B}{\hat\sigma_T^4},$$

where $\hat\rho = \hat\sigma_B^2/\hat\sigma_T^2$ is the intraclass correlation estimate, $\hat\sigma_T^2 = \hat\sigma_B^2 + \hat\sigma_W^2$, and $\hat v_B$ is the estimated sampling variance of $\hat\sigma_B^2$. (Note: the assumption $v_W = 0$ is imposed on this large-sample variance of intraclass correlation estimates.)
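The delta-method step can be made explicit. The sketch below, a minimal illustration under stated assumptions rather than Hedges et al.'s implementation, computes the first-order variance of $\hat\rho$ from the gradient of $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$; the optional arguments `v_w` and `cov_bw` (hypothetical names) show how the result changes when the Level-1 variance is not treated as known, and the plug-in values are made up.

```python
def icc_delta_variance(sigma_b2, sigma_w2, v_b, v_w=0.0, cov_bw=0.0):
    """Delta-method variance of rho_hat = sigma_b2_hat / (sigma_b2_hat + sigma_w2_hat).

    Partial derivatives: d rho / d sigma_b2 = sigma_w2 / total^2 and
    d rho / d sigma_w2 = -sigma_b2 / total^2, where total = sigma_b2 + sigma_w2.
    With v_w = 0 and cov_bw = 0 (Level-1 variance treated as known) the result
    reduces to (1 - rho)^2 * v_b / total^2, i.e. (1 - rho)^2 * v_B / sigma_T^4.
    """
    total = sigma_b2 + sigma_w2
    rho = sigma_b2 / total
    var_rho = (sigma_w2 ** 2 * v_b + sigma_b2 ** 2 * v_w
               - 2.0 * sigma_b2 * sigma_w2 * cov_bw) / total ** 4
    return rho, var_rho


# Hypothetical plug-in values: sigma_B^2 = 1, sigma_W^2 = 9, estimated Var(sigma_B^2 hat) = 0.04.
rho_hat, v_rho = icc_delta_variance(sigma_b2=1.0, sigma_w2=9.0, v_b=0.04)
print(rho_hat, v_rho ** 0.5)
```

Passing a positive `v_w` increases the variance, which is why the known-Level-1-variance assumption yields the smaller, simpler expression.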
Fisher (1925a, p. 220) derived a large-sample variance for the intraclass correlation in a balanced design similar to the delta-method result above (note: Fisher did not impose the assumption $v_W = 0$):

$$\mathrm{Var}(\hat\rho) \approx \frac{2(1-\rho)^2\left[1 + (n-1)\rho\right]^2}{m n (n-1)}.$$

Donner & Koval (1980b) gave the large-sample variance of the intraclass correlation in an unbalanced design (again without assuming $v_W = 0$).

In a cluster (or group) randomized design, researchers often implement interventions or assign treatments at the group level (Level 2, such as classrooms, schools, or sites) rather than at the individual level (Level 1, such as students), for the practical reason that it is sometimes too expensive, or even infeasible, to deliver interventions to each subject rather than to an entire intact group (e.g., a whole community, school, worksite, or family). Therefore, cluster-randomized trials (or group-randomized experiments) have recently become increasingly important and popular in educational and social research for evaluating educational and social interventions effectively and economically (Donner et al., 1981; Hauck et al., 1991; Hedges & Hedberg, 2007; Klar & Donner, 2015). For example, a research investigator could save money (i.e., improve cost-effectiveness) by using group interventions such as CRTs instead of individual ones such as RCTs (Tachibana et al., 2018). Researchers have also found CRTs more suitable than RCTs for constructing economically efficient and productive samples with the desired statistical properties (Connelly, 2003). In the theoretical framework of cluster sampling experiments (i.e., cluster-randomized trials), suppose that a sample of $N = mn$ subjects is collected from $m$ clusters (organizational units such as classrooms, schools, or district sites) of group size $n$, which are randomly assigned to an intervention (or treatment group).
In this cluster sampling design, the individual observations are not independent of each other but depend on the cluster to which each subject belongs or is assigned (Lohr, 1999, Chapter 2, Simple Probability Samples, and Chapter 5, Cluster Sampling with Equal Probability). Therefore, the sampling distribution of a statistic based on cluster samples must account for both the between-group variation and the within-group correlation in the analysis. Suppose that in this cluster sampling structure the total variance consists of a within-cluster variance $\sigma_W^2$ and a between-cluster variance $\sigma_B^2$, i.e., $\sigma^2 = \sigma_W^2 + \sigma_B^2$. Then, compared with the variance of the sample mean under simple random sampling, $\mathrm{Var}_{\mathrm{SRS}}(\bar y) = \sigma^2/(mn)$, the variance of the sample mean based on $m$ clusters of size $n$ is

$$\mathrm{Var}_{\mathrm{cluster}}(\bar y) = \frac{\sigma^2}{mn}\left[1 + (n-1)\rho\right],$$

where the intraclass (sometimes called intra-cluster) correlation coefficient $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$ measures homogeneity within the clusters (i.e., if the clusters are perfectly homogeneous, then $\sigma_W^2 = 0$ and $\rho = 1$), and the design effect (DE), or variance inflation factor (VIF), is defined as $DE = 1 + (n-1)\rho$ (Donner et al., 1981; Lohr, 1999, pp. 138-140). Note that when $\rho > 0$, cluster sampling has more variation than simple random sampling by a factor of DE (VIF > 1), because observations in different clusters usually vary more than observations in the same cluster, so the cluster-to-cluster variability dominates. See Figure 2.4 for an example of a 2-level hierarchical structure with regard to intraclass correlation and design effect. In experimental design, statistical planning for sample size determination and power calculation is critical for producing evidence-based conclusions by rigorously detecting true effects at the desired level of significance. Traditionally, the planning approach to sample size and power computation assumes simple random samples.
Therefore, power analysis for cluster sampling designs or group-randomized experiments needs to use the intraclass correlation coefficient together with noncentrality parameters (of the relevant noncentral test distribution) to account for variability in multilevel designs (e.g., between-group and within-group variation) (Cohen, 1992; Hedges & Hedberg, 2007, 2013; Raudenbush, 1997; Rutterford et al., 2015).

Figure 2.4 Intraclass Correlation & Design Effect in 2-Level Hierarchical Linear Model

Note. Each level has its own variation: the variation between sites is $\sigma_{\text{between}}^2$, the variation within sites is $\sigma_{\text{within}}^2$, and the total variation is their sum.

In a two-level hierarchical design (individuals at Level 1 and groups or clusters at Level 2), the unconditional model (with no covariates) is

$$Y_{ij} = \mu + u_j + e_{ij},$$

where $Y_{ij}$ is the outcome for the $i$-th individual (Level 1) in the $j$-th cluster (Level 2), $\mu$ is the grand mean outcome, $e_{ij} \sim N(0, \sigma_W^2)$ is the Level-1 random error for the $i$-th person in the $j$-th group, $u_j \sim N(0, \sigma_B^2)$ is the random effect associated with the $j$-th cluster (the Level-2 random error), the within-group (between-person) variance component is $\sigma_W^2 = \mathrm{Var}(e_{ij})$, the between-group variance component is $\sigma_B^2 = \mathrm{Var}(u_j)$, and the Level-1 and Level-2 random errors are uncorrelated (i.e., $\mathrm{Cov}(e_{ij}, u_j) = 0$). The (unconditional) intraclass correlation coefficient associated with this model is

$$\rho = \frac{\sigma_B^2}{\sigma_T^2} = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2},$$

where the (unconditional) total variance is $\sigma_T^2 = \sigma_B^2 + \sigma_W^2$, and $\sigma_W^2$ and $\sigma_B^2$ are the error variances corresponding to the within- and between-group random variation, respectively.
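The cluster-sampling variance inflation and design effect described above reduce to a few lines of arithmetic. The sketch below is a minimal illustration; the function names are hypothetical, and the example values (15 groups of 100 subjects with $\rho = 0.01$) are chosen only to echo the sample sizes and ICC magnitude reported in the abstract, not as a power analysis.

```python
def design_effect(n, rho):
    """Design effect (variance inflation factor) DE = 1 + (n - 1) * rho for clusters of size n."""
    return 1.0 + (n - 1) * rho


def cluster_mean_variance(sigma2, m, n, rho):
    """Variance of the overall mean from m clusters of size n: (sigma2 / (m * n)) * DE."""
    return sigma2 / (m * n) * design_effect(n, rho)


def effective_sample_size(m, n, rho):
    """Size of a simple random sample giving the same variance of the mean as m * n clustered observations."""
    return m * n / design_effect(n, rho)


# Illustrative values: 15 groups of 100 subjects with rho = 0.01.
de = design_effect(100, 0.01)
print(de, cluster_mean_variance(1.0, 15, 100, 0.01), effective_sample_size(15, 100, 0.01))
```

Even an ICC as small as 0.01 nearly doubles the variance of the mean when clusters hold 100 subjects, so 1,500 clustered observations carry roughly the information of about 750 independent ones.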
In a hierarchical design (such as a cluster-randomized experiment) involving statistical adjustment by covariate(s), the covariate-adjusted (or conditional) intraclass correlation is defined by
$$\rho_c = \frac{\sigma_{b|X}^2}{\sigma_{Y|X}^2} = \frac{\sigma_{b|X}^2}{\sigma_{b|X}^2 + \sigma_{w|X}^2},$$
where the covariate-adjusted total variance is $\sigma_{Y|X}^2 = \sigma_{b|X}^2 + \sigma_{w|X}^2$, and $\sigma_{w|X}^2$ and $\sigma_{b|X}^2$ are the random-effect variance components, adjusted by covariates, corresponding to the within- and between-group random variation, respectively. In order to evaluate the relative efficiency between unconditional and conditional hierarchical models, Hedges & Hedberg (2007) proposed two auxiliary statistical quantities, $\pi_b = \sigma_{b|X}^2/\sigma_b^2$ and $\pi_w = \sigma_{w|X}^2/\sigma_w^2$, where $\pi_b$ indicates the proportion of between-group variance remaining and $\pi_w$ indicates the proportion of within-group variance remaining. Note that these two measures, along with $R_b^2 = 1 - \pi_b$ and $R_w^2 = 1 - \pi_w$, provide information on statistical variation for power and sample size computations, where $R_b^2$ and $R_w^2$ are defined as the proportions of between-group and within-group variance explained by covariate(s) in a hierarchical design, respectively.

2.4. Raykov Approach
In classical test theory (CTT), a given test score ($X$) consists of two parts, the true score ($T$) and the measurement error ($E$) (Raykov & Marcoulides, 2011, pp. 117-118); hence, the relationship can be described mathematically as $X = T + E$, where the true score variance is $\sigma_T^2$, the error variance is $\sigma_E^2$, and the true score and error score are assumed to be mutually independent, i.e., $\mathrm{Cov}(T, E) = 0$, so that $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$. According to the CTT equation, the reliability coefficient ($\rho_{XX'}$) is the ratio of the true score variance to the observed score variance,
$$\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2},$$
which parallels the $R^2$ index in regression analysis when predicting the true score from the observed score. Moreover, the standard error of measurement (SEM) is $\sigma_E = \sigma_X\sqrt{1 - \rho_{XX'}}$ (Raykov & Marcoulides, 2011, pp. 137-145).
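The CTT quantities above, and their shared functional form with the ICC, can be checked numerically. This is a minimal illustrative sketch; the function names and the variance values in the example are hypothetical.

```python
def reliability(var_true: float, var_error: float) -> float:
    """CTT reliability: true-score variance over observed-score variance."""
    return var_true / (var_true + var_error)


def sem(sd_observed: float, rel: float) -> float:
    """Standard error of measurement: sigma_X * sqrt(1 - reliability)."""
    if not 0.0 <= rel <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_observed * (1.0 - rel) ** 0.5


def icc(var_between: float, var_within: float) -> float:
    """Same functional form as reliability: between-group variance over total variance."""
    return var_between / (var_between + var_within)


# Hypothetical example: var(T) = 80, var(E) = 20, so var(X) = 100 and sd(X) = 10.
rel = reliability(80.0, 20.0)
print(rel, sem(10.0, rel))  # the SEM equals sqrt(var(E)) = sqrt(20)
```

Because $\sigma_X^2(1 - \rho_{XX'}) = \sigma_E^2$, the SEM is simply the error standard deviation, which makes a convenient sanity check.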
Thus, within the CTT framework, there is a strong connection between the reliability coefficient and the intraclass correlation coefficient in terms of statistical concepts and mathematical definitions (i.e., both are defined as a proportion of variance accounted for). Using the latent variable modeling (LVM) approach (Bartholomew, 1987), Raykov & Penev (2010) showed a procedure to evaluate reliability coefficients (point and interval estimators) in 2-level HLM unconditional and conditional models, and further derived standard error (SE) estimates for reliability coefficients on the logit scale (i.e., $\kappa = \ln[\rho/(1-\rho)]$) via the Taylor series expansion method (aka the Delta method) as
$$\mathrm{SE}(\hat{\kappa}) \approx \frac{\mathrm{SE}(\hat{\rho})}{\hat{\rho}(1-\hat{\rho})},$$
which leads to a large-sample confidence interval using the standard normal Z distribution,
$$\left(\frac{1}{1 + e^{-[\hat{\kappa} - z_{\alpha/2}\,\mathrm{SE}(\hat{\kappa})]}},\ \frac{1}{1 + e^{-[\hat{\kappa} + z_{\alpha/2}\,\mathrm{SE}(\hat{\kappa})]}}\right),$$
where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution, $\hat{\kappa}$ is the logit-transformed reliability coefficient, and $\mathrm{SE}(\hat{\kappa})$ is its standard error. As for intraclass correlation coefficients (ICC) in hierarchical designs (e.g., two-level models) within the LVM framework (Bartholomew et al., 2011), Raykov (2011) used restricted maximum likelihood (REML) estimators to find the ICC in the two-level HLM structure (aka the one-way random-effects ANOVA model):
$$Y_{ij} = \mu + u_j + e_{ij},$$
where $Y_{ij}$ represents a response outcome score for the $i$-th individual subject (at level 1; $i = 1, \dots, n_j$) in the $j$-th cluster group (at level 2; $j = 1, \dots, J$), $\mu$ is the grand mean, $e_{ij}$ is a random error term at level 1, assumed to be normally distributed with mean 0 and within-group variance $\sigma_w^2$ (i.e., $e_{ij} \sim N(0, \sigma_w^2)$), corresponding to the $i$-th person in the $j$-th group, $u_j$ is a random effect, assumed to be normally distributed with mean 0 and between-group variance $\sigma_b^2$ (i.e., $u_j \sim N(0, \sigma_b^2)$), associated with the $j$-th cluster (a deviation term at level 2), and the random error terms at level 1 and level 2 are assumed to be mutually uncorrelated (i.e., $\mathrm{Cov}(u_j, e_{ij}) = 0$). In this LVM framework, the ICC is defined as the ratio of the between-group variance to the observed total variance, $\rho = \sigma_b^2/(\sigma_b^2 + \sigma_w^2)$, where the within-group variance is $\mathrm{Var}(e_{ij}) = \sigma_w^2$ and the between-group variance is $\mathrm{Var}(u_j) = \sigma_b^2$.
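The benefit of the logit transformation is easiest to see next to the symmetric large-sample (Wald) interval. The sketch below is a minimal illustration of the Delta-method construction under stated assumptions: a point estimate strictly inside (0, 1) and a known standard error; the function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci(estimate: float, se: float, level: float = 0.95):
    """Symmetric large-sample interval; its lower bound can fall below zero."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


def logit_ci(estimate: float, se: float, level: float = 0.95):
    """Interval built on the logit scale and back-transformed into (0, 1).

    Delta method: SE(logit(rho_hat)) ~= SE(rho_hat) / (rho_hat * (1 - rho_hat)).
    """
    if not 0.0 < estimate < 1.0:
        raise ValueError("estimate must lie strictly between 0 and 1")
    kappa = math.log(estimate / (1.0 - estimate))
    se_kappa = se / (estimate * (1.0 - estimate))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    def expit(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    return expit(kappa - z * se_kappa), expit(kappa + z * se_kappa)


# Hypothetical small ICC with a comparatively large standard error.
print(wald_ci(0.02, 0.015))   # lower bound is negative
print(logit_ci(0.02, 0.015))  # both bounds stay inside (0, 1)
```

The back-transformed interval is asymmetric around the estimate, which is the price paid for respecting the parameter space.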
The visualization of this LVM approach using a path diagram is shown in Figure 2.5.

Figure 2.5 Latent Variable Model for Estimation of Intraclass Correlation in 2-Level Design
Note. The path diagram is inspired by the visualization of 2-level random coefficient models in Muthén & Muthén (2012, Chapters 9 & 10).

By the invariance property of maximum likelihood estimators (MLEs) applied to the variance estimates in the LVM, the ICC estimate is given by
$$\hat{\rho} = \frac{\hat{\sigma}_b^2}{\hat{\sigma}_b^2 + \hat{\sigma}_w^2},$$
where $\hat{\sigma}_b^2$ and $\hat{\sigma}_w^2$ are the between- and within-group variance estimates, respectively, obtained by the REML method in the two-level LVM. Note that, according to Casella & Berger (2002, p. 320), the invariance property of MLEs is stated as follows: if $\hat{\theta}$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat{\theta})$. As for hypothesis testing, the test statistic for the intraclass correlation,
$$z = \frac{\hat{\rho} - \rho_0}{\mathrm{SE}(\hat{\rho})},$$
is treated as a pivotal quantity that approximately follows the standard normal distribution and is used to test the simple hypotheses $H_0\colon \rho = \rho_0$ vs $H_1\colon \rho \neq \rho_0$ (i.e., a two-tailed test at significance level $\alpha$), or $H_0\colon \rho = \rho_0$ vs $H_1\colon \rho > \rho_0$ (i.e., a one-tailed test at level $\alpha$). However, this analytic strategy may only work in large samples, and the lower bound of the interval estimate $\hat{\rho} - z_{\alpha/2}\,\mathrm{SE}(\hat{\rho})$ obtained by this method may fall below zero (i.e., an out-of-bounds value outside the valid domain of the ICC, $0 \le \rho \le 1$). The LVM procedure can also be extended to evaluate the ICC in two-level designs with discrete response variables (Raykov & Marcoulides, 2015a). Suppose the same two-level LVM setting as above, but assume that the observed outcome score $Y_{ij}$ is recorded on a categorical scale (i.e., a discrete variable for the $i$-th level-1 unit, the individual subject ($i = 1, \dots, n_j$), in the $j$-th level-2 unit, the cluster group ($j = 1, \dots, J$)).
In this situation with categorical responses, the traditional approach to ICC estimation (which presumes a continuous outcome) needs to be modified within the LVM framework as follows (Raykov & Marcoulides, 2011, Chapter 10 Introduction to Item Response Theory). First, consider the underlying latent variable $Y_{ij}^{*}$ for the observed categorical outcome $Y_{ij}$ (with $C$ possible categories) as
$$Y_{ij} = c \quad \text{if} \quad \tau_c < Y_{ij}^{*} \le \tau_{c+1}, \qquad c = 0, 1, \dots, C-1,$$
where $Y_{ij}^{*}$ ($-\infty < Y_{ij}^{*} < \infty$) plays the role of a continuous latent variable that is linked with the observed measure $Y_{ij}$ through the threshold (step) function above, which maps the latent continuum onto a specific categorical value through the thresholds $\tau_1 < \tau_2 < \dots < \tau_{C-1}$ (note: each threshold or cut-off point is a real number, with $\tau_0 = -\infty$ and $\tau_C = +\infty$) (Raykov & Marcoulides, 2015b). Given this underlying latent structure, the ICC for a binary outcome (a special case of categorical outcome variables; Raudenbush & Bryk, 2002, p. 334) can be derived as
$$\rho = \frac{\sigma_b^2}{\sigma_b^2 + \pi^2/3},$$
where $\sigma_b^2$ is the between-group variance and $\pi$ is the mathematical constant (note: the standard logistic distribution, with location 0 and scale 1, has a variance of $\pi^2/3 \approx 3.29$). Also notice that this ICC estimator for the dichotomous outcome case (say, $Y_{ij} = 0$ or $1$) makes the strong assumption that the within-group variance is held constant at $\pi^2/3$ over all groups, which may not hold for real-life data; hence, modified analytic strategies are needed for building non-constant within-group variances (which are data-driven and more flexible for real-world situations) into hierarchical generalized linear models (HGLM). Furthermore, the standard error of the ICC above (for the binary response case) can be approximated via the Delta method (Raykov & Marcoulides, 2004; Hedges et al., 2012) as
$$\mathrm{SE}(\hat{\rho}) \approx \frac{1 - \hat{\rho}}{\hat{\sigma}_Y^2}\,\mathrm{SE}(\hat{\sigma}_b^2),$$
where $\hat{\rho}$ is the ICC estimate, the total variance estimate is $\hat{\sigma}_Y^2 = \hat{\sigma}_b^2 + \pi^2/3$ (assuming the within-group variance is the constant $\pi^2/3$), and $\hat{\sigma}_b^2$ is the between-group variance estimate with standard error $\mathrm{SE}(\hat{\sigma}_b^2)$.
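The latent-scale ICC for a binary outcome and its Delta-method standard error can be sketched as follows. This assumes that the between-group variance and its standard error are already available from a random-intercept logistic model; the function names and the variance values in the example are hypothetical, not results from this study.

```python
import math

# Variance of the standard logistic distribution (location 0, scale 1).
LOGISTIC_VAR = math.pi ** 2 / 3.0


def binary_icc(sigma2_between: float) -> float:
    """Latent-scale ICC for a binary outcome: sigma_b^2 / (sigma_b^2 + pi^2/3)."""
    if sigma2_between < 0:
        raise ValueError("variance must be non-negative")
    return sigma2_between / (sigma2_between + LOGISTIC_VAR)


def binary_icc_se(sigma2_between: float, se_sigma2_between: float) -> float:
    """Delta-method SE: d rho / d sigma_b^2 = (pi^2/3) / total^2 = (1 - rho) / total."""
    total = sigma2_between + LOGISTIC_VAR
    rho = sigma2_between / total
    return (1.0 - rho) / total * se_sigma2_between


# Hypothetical variance-component output from a random-intercept logistic model.
sigma2_b, se_sigma2_b = 0.034, 0.012
print(round(binary_icc(sigma2_b), 4), round(binary_icc_se(sigma2_b, se_sigma2_b), 4))
```

The derivative used in `binary_icc_se` can be verified against a finite-difference approximation of `binary_icc`, which is a quick way to catch algebra slips in Delta-method formulas.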
CHAPTER 3 LITERATURE IN REHABILITATION COUNSELING
This chapter presents the literature on evidence-based practice (EBP) in rehabilitation counseling using the RSA-911. The state vocational rehabilitation (VR) agencies collect and report summary data in a federally mandated format called the Rehabilitation Services Administration (RSA) Case Service Report, aka the RSA-911 (Schwanke & Smith, 2004). The RSA-911 provides researchers in the field of rehabilitation counseling with an open playground and an additional resource for deep learning and data mining. Not only does the RSA-911 allow multi-faceted explorations of complex issues about people with disabilities in VR, but rehabilitation researchers can also probe extensively into big data to examine the hidden components or latent factors contributing to successful VR outcomes (Pi & Thielsen, 2011). Moreover, rehabilitation practitioners and scholars can take full advantage of the RSA-911 data to develop evidence-based practices, particularly individual-level and employment-focused interventions, effective strategies, and best practices to promote independent living and positive outcomes for individuals with disabilities (Fleming et al., 2013). With EBP as a cookbook approach to rehabilitation counseling (Kosciulek, 2010), it provides the fundamental framework for rehabilitation counseling practitioners that integrates the available scientific evidence with clinical judgment to make the best decisions about interventions, services, or treatments for people with disabilities. In this manner, EBP guidelines also suggest that rehabilitation counselors identify relevant literature and systematic research, and assess different available information resources such as the RSA-911 […] services for people with disabilities.
So, with a data-driven framework using information from the RSA-911, which research methods or statistical approaches can provide insights into what works best for whom (target populations), how (intervention or treatment programs), and under what conditions (rehabilitation support or other types of services)? This literature review surveys recent academic knowledge on these key questions and provides a firm foundation for this study. The following is a summary of the literature on statistical methods using the RSA-911.

3.1. Multilevel Analysis
Hierarchical data structures are often seen in educational and social research. For example, in rehabilitation counseling, VR clients are grouped into organizational units or field offices, which are nested in local districts, and local districts can in turn be nested in states or regions, and so on. It is therefore important to take these hierarchical data structures and their dependencies into account by using multilevel analysis (hierarchical linear models). Note that conventional regression models often perform poorly for statistical estimation and inference in hierarchically structured data (e.g., biased, typically underestimated, standard errors) because residuals are correlated among subjects within the same cluster, which violates the important assumption of independence and can also undermine the assumptions of homogeneity and normality (Maas & Hox, 2004; Raudenbush & Bryk, 2002). Chan and his colleagues (2014) used RSA-911 data from FY 2005 (before the economic recession) and FY 2009 (after the economic recession) to study the impact of a contextual factor, the state unemployment rate, on employment opportunities and outcomes in VR.
Using the (2-level) hierarchical (generalized) linear modeling approach, they found that the state unemployment rate (the contextual variable) had a significant moderating effect on the relationship between personal factors (demographic and disability variables) and competitive employment. Alsaman & Lee (2017) examined the relationships among contextual factors, individual factors, and employment outcomes of transition youth with disabilities in VR using the FY 2013 RSA-911 and 2-level hierarchical generalized linear modeling. They found that state unemployment rates had indirect interaction effects on the relationships among individual characteristics, rehabilitation services, and successful employment. For example, as the state unemployment rate increased, the disparity in successful VR closure decreased for some types of disabilities, such as intellectual disabilities, TBI, and autism and other communicative disabilities (in comparison to the reference group of physical disabilities). Pi (2006) constructed a 2-level hierarchical model with micro- and macro-level factors related to VR outcomes using the FY 2002 RSA-911. Results showed that the micro-level variables (i.e., age, education, minority, SSI/DI, disability significance, rehabilitation technology services, job placement assistance, on-the-job support, and diagnosis & treatment) were more strongly related to rehabilitation outcomes than the macro-level variables (i.e., counselors who met CSPD requirements, proportion of clients with significant disabilities, unemployment rate, and proportion of minority population). Note: CSPD = Comprehensive System of Personnel Development.

3.2. Structural Equation Model
Structural equation modeling (SEM) with latent constructs (unobserved factors) and manifest variables (observed indicators) is one type of structural causal modeling (statistical models for causation) that is built (through a path diagram for visualization) to identify the underlying factor structure explaining the direct and/or indirect effects of latent constructs and their interrelationships on outcomes of interest (Raykov & Marcoulides, 2006). In the VR context, SEM can be used to understand complex theoretical models (or EBP) and to find important predictive associations (using latent factor analysis) among individual characteristics, rehabilitation services, and employment outcomes (Austin & Lee, 2014). Kosciulek & Merz (2001) conducted a structural analysis of the consumer-directed theory of empowerment for consumers with disabilities in community rehabilitation programs. Chan et al. (2007) provided an overview of the basic concepts and applications of SEM (e.g., confirmatory factor analysis) in counseling, psychology, and rehabilitation research. Austin & Lee (2014) built a structural equation model of VR services (consisting of job-related and person-related factors) using the FY 2009 RSA-911 to study predictors of employment outcomes in VR for people with intellectual and co-occurring psychiatric disabilities. The study found that job-related services, such as job placement, job search, job readiness, and on-the-job support, significantly predicted competitive employment outcomes.

3.3. Classification Tree Model
The tree model is a data-mining technique that uses the CHAID (Chi-squared Automatic Interaction Detection) classification algorithm to explore hidden relationships and predictive information in a large database (Tan et al., 2005). In the classification tree procedure, the tree-based model is designed to classify all subjects into homogeneous subgroups by their attributes.
Additionally, the […] is quite useful for uncovering complex multivariate systems such as the VR process. Rosenthal et al. (2007) used a data mining approach with FY 2001 RSA-911 data to examine factors (i.e., services) affecting outcomes in the VR process for individuals with psychiatric disabilities. Results showed that receiving job placement services was the most important variable and had a positive effect for the target population in VR. Schoen (2010) and Schoen & Leahy (2012) examined demographics, services, and employment outcomes for people with spinal cord injury in VR between FY 2004 and FY 2008 using data mining models on RSA-911 data. Findings suggested that the most significant predictors of employment were level of education attained, cost of purchased services, days from application to closure, rehabilitation technology, job placement assistance, and job supports. Lee and his colleagues (2012) and Lee (2014) sought to discover evidence-based best practices in VR using a data mining approach of decision (or classification) tree models with the FY 2011 and FY 2013 RSA-911 data, respectively, to study the interrelationships among service delivery, personal backgrounds, and rehabilitation outcomes for people with disabilities in the State of Michigan.

3.4. Other Methods such as Social Network Analysis and Spatial Analysis
Spatial analysis is a type of geographical/locational analysis (statistics) that seeks to explain patterns of human behavior (e.g., rehabilitation outcomes) and their spatial expression (residential areas). Geostatistical models can predict spatial patterns (using geographical information) in complex networks or systems (like the RSA-911) to support spatial decision-making and to solve geographic issues in planning and policy development (Mayhew, 2015). Sink et al. (2014) developed location theory in VR to study the effectiveness of service delivery and consumption for persons with disabilities using a geographic information system (GIS) and data from the West Virginia Division of Rehabilitation Services (including RSA-911 and Census data). The findings supported the value of public VR field office or facility location, and its effectiveness and efficiency, for people with disabilities to achieve or maintain employment. Social network analysis is the process of investigating social structures through the use of graph and network theory. The social network model characterizes individual links or ties (relationships or interactions) within a networked structure (such as the VR system). One key feature of social network analysis is visual representation (via sociograms), which provides pivotal information about attributes within a network (e.g., positive or negative relationships between services and outcomes in VR network data) (Schneider, 2018). Ditchman et al. (2018) applied social network analysis to the FY 2009 RSA-911 data to examine service patterns and their relationships with employment outcomes for transition-age individuals with autism spectrum disorder (ASD). Six core VR services were found to be positively linked with better employment outcomes: assessment, counseling & guidance, job placement, job search, job support, and transportation.

3.5 Justification for Covariates Used in Multilevel Analysis
The Rehabilitation Act of 1973 (and its Amendments of 1986 and 1992) was enacted with the goal of providing individuals with disabilities the same opportunities as the general population without disabilities to achieve employment, independent living, and self-sufficiency.
Under the law, state VR programs help people with disabilities obtain or maintain employment through rehabilitation services, which may include, but are not limited to, assessment, vocational rehabilitation counseling & career guidance, educational training (e.g., colleges or universities), job coaching, job placement services, on-the-job support training, transportation, and miscellaneous services (see Appendix A for the definitions of VR variables used in the study; Rehabilitation Services Administration Policy Directive, 2013). Many research studies have examined the relationships between various factors (i.e., individual characteristics, VR services, VR counselors, and environmental factors) and rehabilitation outcomes. Based on a systematic review of VR outcomes in relation to VR factors, previous rehabilitation studies confirm that the VR variables of interest in this study (including individual characteristics, employment backgrounds, and rehabilitation services) are supported by the VR literature through significant associations with successful employment outcomes for people with disabilities (Alsaman & Lee, 2016; Bolton et al., 2000; Chan et al., 2014; Dutta et al., 2008; Moore et al., 2000, 2001, 2002a, 2002b, 2004).

CHAPTER 4 METHODS AND RESEARCH QUESTIONS
This chapter provides analytic strategies of experimental planning for cluster (or group) randomized designs with respect to power and sample size calculations using the intraclass correlation coefficient (ICC) via the hierarchical linear model (HLM) and the hierarchical generalized linear model (HGLM). Using bootstrapping simulations (Givens & Hoeting, 2012; Rizzo, 2007), the proposed methods evaluate the statistical performance of the ICC, in terms of relative bias, estimation error, and inference on the parameter, via HLM & HGLM using the real RSA-911 data set from the U.S. Department of Education and Labor.
In the RSA-911 data used in this study, the target population comprises people with disabilities who were served in Michigan in fiscal year (FY) 2015. In addition, a two-stage sampling approach is used to generate the simulated data sets with the cluster-randomized design structure, where the individual subject (person with a disability) is at Level 1 and the structure (rehabilitation office) is at Level 2.

4.1 Research Methods
Three proposed ICC estimation methods are shown for different statistical settings and experimental design purposes using multilevel models:

Method 1: the ICC estimator (via Pearson correlation & F of ANOVA) for a balanced design (equal numbers of individual subjects across groups) is shown in Equations 1 and 2:
$$\hat{\rho} = \frac{K}{K-1}\cdot\frac{N^{-1}\sum_{n=1}^{N}(\bar{x}_n - \bar{x})^2}{s^2} - \frac{1}{K-1} \tag{1}$$
and
$$\hat{\rho} = \frac{MSA - MSW}{MSA + (K-1)\,MSW}, \tag{2}$$
where $K$ is the group sample size, $n = 1, \dots, N$ is the index of samples (groups), the among-group mean $\bar{x}_n$ is the mean of the $n$-th sample (group), the common mean is $\bar{x} = (NK)^{-1}\sum_{n}\sum_{k} x_{nk}$, the common variance is $s^2 = (NK)^{-1}\sum_{n}\sum_{k}(x_{nk} - \bar{x})^2$, and $MSA$ and $MSW$ are the Mean Squares Among and Mean Squares Within from ANOVA, respectively.

Method 2: the ICC estimator (via Pearson correlation & F of ANOVA) for an unbalanced design (unequal numbers of individual subjects across groups, $n_j$ for $j = 1, \dots, J$) is shown in Equation 3:
$$\hat{\rho} = \frac{MSA - MSW}{MSA + (n_0 - 1)\,MSW}, \tag{3}$$
where $n_0 = \left(N_T - \sum_{j=1}^{J} n_j^2 / N_T\right) / (J - 1)$ is the adjusted group sample size for ICC estimation, $N_T$ is the total sample size (i.e., $N_T = \sum_{j=1}^{J} n_j$), and $MSA$ and $MSW$ are the Mean Squares Among and Mean Squares Within from ANOVA, respectively. Note that the Pearson correlation estimate requires numerical approximation of the −2 log likelihood.
Method 3: find auxiliary information (based on the ICC estimate from Method 1 or 2) for experimental planning in designs (design effect and minimum detectable effect size with respect to desired power & required sample size):

(a) The design effect (DE), or variance inflation factor (VIF), is defined in Equation 4 as
$$\mathrm{DE} = 1 + (m - 1)\rho, \tag{4}$$
where the intraclass correlation coefficient is $\rho = \sigma_b^2/(\sigma_b^2 + \sigma_w^2)$, or alternatively its estimate $\hat{\rho}$ from Method 1 or 2, which provides a statistical measure of homogeneity within the clusters, and $m$ is the group sample size for a balanced design (alternatively, $n_0$ can be substituted for $m$ in an unbalanced design).

(b) The unconditional intraclass correlation coefficient is shown in Equation 5:
$$\rho = \frac{\sigma_b^2}{\sigma_Y^2} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}, \tag{5}$$
where the unconditional total variance is $\sigma_Y^2 = \sigma_b^2 + \sigma_w^2$, and $\sigma_w^2$ and $\sigma_b^2$ represent the error variances corresponding to the within- and between-group variation, respectively. In a hierarchical design, such as a cluster-randomized experiment, involving statistical adjustment by covariate(s), the conditional (or covariate-adjusted) intraclass correlation is described in Equation 6:
$$\rho_c = \frac{\sigma_{b|X}^2}{\sigma_{Y|X}^2} = \frac{\sigma_{b|X}^2}{\sigma_{b|X}^2 + \sigma_{w|X}^2}, \tag{6}$$
where the covariate-adjusted total variance is $\sigma_{Y|X}^2 = \sigma_{b|X}^2 + \sigma_{w|X}^2$, and $\sigma_{w|X}^2$ and $\sigma_{b|X}^2$ represent the random-effect variance components, adjusted by covariates, corresponding to the within- and between-group random variation, respectively.

(c) Four proposed auxiliary statistical quantities for evaluating the relative efficiency between unconditional and conditional hierarchical models are as follows. The first two, which measure the variance remaining after covariate adjustment, are described in Equations 7 and 8:
$$\pi_b = \frac{\sigma_{b|X}^2}{\sigma_b^2} \tag{7}$$
and
$$\pi_w = \frac{\sigma_{w|X}^2}{\sigma_w^2}, \tag{8}$$
where $\pi_b$ indicates the proportion of between-group variance remaining, and $\pi_w$ indicates the proportion of within-group variance remaining. The other two supplementary measures, for the variance explained by covariates (the complements of the measures in Equations 7 and 8), are described in Equations 9 and 10:
$$R_b^2 = 1 - \pi_b = 1 - \frac{\sigma_{b|X}^2}{\sigma_b^2} \tag{9}$$
and
$$R_w^2 = 1 - \pi_w = 1 - \frac{\sigma_{w|X}^2}{\sigma_w^2}, \tag{10}$$
where $R_b^2$ and $R_w^2$ are defined as the proportions of between-group and within-group variance explained by covariate(s) in a hierarchical design, respectively.
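Methods 1 to 3 can be sketched in a few self-contained functions that take groups as plain lists of outcomes. This is a minimal illustration, not the study's implementation; the function names, the dictionary keys, and the toy data in the tests are hypothetical, and the variance components for Equations 5 to 10 are assumed to come from fitted unconditional and conditional models.

```python
def anova_icc(groups):
    """ANOVA estimator (Equations 2 and 3); n0 equals the common group size when balanced."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    if k < 2 or min(sizes) < 1 or n_total <= k:
        raise ValueError("need at least two groups and some within-group replication")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_among = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    msa = ss_among / (k - 1)
    msw = ss_within / (n_total - k)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msa - msw) / (msa + (n0 - 1) * msw)


def fisher_icc(groups):
    """Equation 1 for a balanced design (variances use N*K, not degrees of freedom)."""
    size = len(groups[0])
    if size < 2 or any(len(g) != size for g in groups):
        raise ValueError("Equation 1 requires equal group sizes of at least 2")
    values = [y for g in groups for y in g]
    mean = sum(values) / len(values)
    s2 = sum((y - mean) ** 2 for y in values) / len(values)
    between = sum((sum(g) / size - mean) ** 2 for g in groups) / len(groups)
    return size / (size - 1) * between / s2 - 1 / (size - 1)


def design_effect(icc, cluster_size):
    """Equation 4; pass n0 as cluster_size for an unbalanced design."""
    return 1.0 + (cluster_size - 1.0) * icc


def auxiliary_quantities(sb2, sw2, sb2_adj, sw2_adj):
    """Equations 5-10 from unconditional and covariate-adjusted variance components."""
    pi_b, pi_w = sb2_adj / sb2, sw2_adj / sw2
    return {
        "icc_unconditional": sb2 / (sb2 + sw2),
        "icc_conditional": sb2_adj / (sb2_adj + sw2_adj),
        "pi_b": pi_b,
        "pi_w": pi_w,
        "R2_b": 1.0 - pi_b,
        "R2_w": 1.0 - pi_w,
    }
```

On the toy balanced data `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, Equation 2 gives 26/29 (about 0.897) while Equation 1 gives 0.85; the two estimators agree asymptotically but differ in small samples because Equation 1 uses $NK$ rather than degrees-of-freedom denominators.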
4.2 Proposed Models
Four hierarchical modeling structures (Models 1-4 below) are considered in the study to test the proposed methods. Significance of disability (yes/no), type of disability (nominal measure with 10 categories), and previous work experience (yes/no) are applied in all four models for separate (subgroup-specific) analyses, breaking down the whole sample into subsets based on shared characteristics.

Model 1: Unconditional Model (no covariate adjustment)
Model 2: Conditional Model (adjusted for Covariate Set 1). Covariate Set 1, consisting of demographic characteristics, includes: (a) gender (male or female); (b) minority (yes or no); (c) age (continuous measure); (d) SES, as indicated by social security and/or insurance benefits (yes or no); and (e) educational background (ordinal measure).
Model 3: Conditional Model (adjusted for Covariate Set 2). Covariate Set 2, consisting of VR service variables, includes: (a) job placement assistance (binary; received or not received); (b) on-the-job supports (binary; received or not received); and (c) rehabilitation technology (binary; received or not received).
Model 4: Conditional Model (adjusted for Covariate Set 3). Covariate Set 3 combines Covariate Sets 1 and 2 into one set.

Two VR outcomes are used in the simulation analyses: (1) competitive employment outcome (yes/no); and (2) weekly earnings (a continuous measure) = rehabilitation outcome (a dichotomous 0 or 1 measure) × weekly income (a continuous measure), where weekly earnings can also be deemed an indicator of the quality of employment outcomes achieved at exit from VR (Chan et al., 2016; […] et al., 2015). Note. The total number of combinations of analyses (4 Models × 2 Outcomes) = 8.
4.3 Research Questions
The proposed methods are used to address the following research questions. To evaluate the simulation results, descriptive statistics of the ICC are provided to answer Research Question 1 (RQ1) and Research Question 2 (RQ2) below. In addition, the statistical performance (precision and accuracy) of the ICC under the designated conditions using randomized cluster samples is examined by statistical bias (or average bias) and its error variance (or mean squared error) to answer Research Question 3 (RQ3) below. Furthermore, the usable samples in the whole RSA-911 data set are used to obtain the true parameters of the ICC in RQ1; then, in the bootstrapping computations (Ross, 2013), the full RSA-911 data set is resampled 100 times (number of bootstrap repetitions = 100) under the given sampling conditions for ICC estimation using the bootstrap samples in RQ2. Finally, comparing the differences in ICC estimates between RQ1 and RQ2 shows which of the estimation methods, designated models, and sampling conditions provides the best statistical performance of ICC estimation and inference in multilevel designs with randomized cluster samples (RQ3).

Research Question 1 (RQ1): What are the intraclass correlation values (ICC estimate, standard error, p-value, and 95% confidence limits) in the usable samples given by the breaking variables for subset analysis (Models 1-4)? How are the ICC estimates distributed in Models 1-4? What are the differences in the ICC estimates among Models 1-4?

Research Question 2 (RQ2): Given the designated cluster randomized structure (i.e., number of groups = 5, 15, 25; number of subjects = 50, 100, 150), what are the intraclass correlation estimates (ICC estimate, standard error, and p-value) using the bootstrap samples (number of bootstrap repetitions = 100) given by the breaking variables under Models 1-4?
Research Question 3 (RQ3): Comparing the results between Research Question 1 (population model) and Research Question 2 (bootstrap subsample model), which modeling structure (Models 1-4) provides the best statistical properties of ICC estimation and inference, in terms of statistical bias (mean difference in ICC estimates between RQ1 and RQ2), mean squared error or mean squared deviation (average squared difference in ICC estimates between RQ1 and RQ2), and parameter coverage rate (proportion of confidence intervals for the ICC that contain the true parameter, using the results of RQ1 and RQ2)?

4.4 Description of RSA-911 Data
The FY 2015 RSA-911 data (information reported by state VR agencies to the Rehabilitation Services Administration of the U.S. Department of Education) are used to test the proposed methods for the ICC in different simulation scenarios of multilevel structure models. As for the foundations of evidence-based rehabilitation, the target population consists of people with disabilities who had an individualized plan for employment (IPE) and had already been receiving VR services under their IPE from the public VR program in the State of Michigan. There are 33 VR offices in Michigan, which are used as the level-2 units in the HLM & HGLM analyses in the simulation study.

4.5 Simulation and Analysis Plan
To address the proposed research questions, a simulation study using the existing RSA-911 data (representing a complex system in a real-world situation) is conducted with 2-level hierarchical design modeling, where the individual is at level 1 and the office is at level 2. Two types of hierarchical models are considered in the analyses: (1) the unconditional model (without covariates), Model 1; and (2) the conditional models (with covariates), Models 2-4.
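The two-stage resampling of RQ2 and the performance measures of RQ3 can be sketched as follows. This is a minimal illustration under explicit assumptions: the data are a hypothetical dictionary mapping each office to a list of client outcomes, stage 1 draws offices without replacement, stage 2 draws clients with replacement, and the ICC is the ANOVA estimator. The function names and the default seed are hypothetical, and the resampling scheme is an assumption rather than a description of the exact procedure used in this study.

```python
import random


def anova_icc(groups):
    """One-way ANOVA ICC estimator with the n0 adjustment for unequal group sizes."""
    k, sizes = len(groups), [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    msa = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_total - k)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    denom = msa + (n0 - 1) * msw
    # A sample with no variation at all (e.g., all-zero binary outcomes) has no defined ICC.
    return float("nan") if denom == 0 else (msa - msw) / denom


def two_stage_bootstrap(data, n_groups, n_per_group, n_reps=100, seed=2015):
    """Stage 1: draw offices without replacement; stage 2: draw clients with replacement."""
    if n_groups < 2 or n_per_group < 2:
        raise ValueError("need at least 2 offices and 2 clients per office")
    rng = random.Random(seed)
    offices = sorted(data)
    if n_groups > len(offices):
        raise ValueError("n_groups exceeds the number of available offices")
    estimates = []
    for _ in range(n_reps):
        chosen = rng.sample(offices, n_groups)
        sample = [rng.choices(data[o], k=n_per_group) for o in chosen]
        estimates.append(anova_icc(sample))
    return estimates


def bias(estimates, true_value):
    """Mean difference between bootstrap estimates and the full-data (RQ1) value."""
    return sum(e - true_value for e in estimates) / len(estimates)


def mse(estimates, true_value):
    """Mean squared difference between bootstrap estimates and the full-data value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


def coverage_rate(intervals, true_value):
    """Share of (lower, upper) intervals that contain the true value."""
    return sum(lo <= true_value <= hi for lo, hi in intervals) / len(intervals)
```

In this sketch the full-data ICC from RQ1 plays the role of `true_value`, and looping `two_stage_bootstrap` over 5, 15, or 25 offices and 50, 100, or 150 clients reproduces the grid of sampling conditions listed in RQ2.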
To test the proposed multilevel designs and their modeling structures, […] a simulation analysis is applied to compare the results between the unconditional and conditional models, with respect to four different […] data set in RQ1, plus three different cluster sampling procedures in RQ2. Furthermore, in test design and evaluation, three outcomes of interest in the study (rehabilitation outcome, competitive employment, and quality of employment) are used to examine the statistical performance (effectiveness analyses) of the proposed models and the simulation results, in terms of statistical bias, error bias, and accuracy & precision (in RQ3). A graphic overview of the simulation process in the study is shown as a workflow chart in Figure 4.1.

Figure 4.1 A Workflow Diagram of Simulation-based Exploration and Evaluation for the ICC

In the computer simulations with the RSA-911, the statistical software R (linear mixed models via lmer and generalized linear mixed models via glmer in the lme4 package; lme in the nlme package), IBM SPSS (mixed-effects models via MIXED; generalized linear mixed models via GENLINMIXED; variance component analysis via VARCOMP), SAS (mixed-effects modeling via PROC MIXED; generalized linear mixed models via PROC GLIMMIX), and Stata (multilevel mixed models via xtmixed or mixed) are used for statistical analysis and performance evaluation of the simulation results of ICC estimation and statistical inference.

4.6 Theoretical Framework of HLM and HGLM in 2-Level Cluster Randomized Design
This section provides the mathematical details of the multilevel modeling structures used in the study.

4.6.1. HLM in 2-Level Cluster Randomized Structure via RSA-911
In the two-level hierarchical design structure (i.e., individuals are at level 1, and offices are at level 2), the unconditional model (with no covariates) is described in Equation 11:
$$Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}, \tag{11}$$
where $Y_{ij}$ represents an outcome for the $i$-th individual subject (at level 1; $i = 1, \dots, n_j$) in the $j$-th office (at level 2; $j = 1, \dots, J$), $\gamma_{00}$ is the grand mean outcome, which can be estimated by the overall sample mean $\bar{Y}_{\cdot\cdot}$, $r_{ij}$ is a random error term (individual variation) at level 1 (i.e., $r_{ij} \sim N(0, \sigma_w^2)$) corresponding to the $i$-th person in the $j$-th group, $u_{0j}$ is a random effect (i.e., $u_{0j} \sim N(0, \sigma_b^2)$) associated with the $j$-th office (cluster variation at level 2), the within-cluster (i.e., between-person) variance component is $\mathrm{Var}(r_{ij}) = \sigma_w^2$, the between-cluster variance component is $\mathrm{Var}(u_{0j}) = \sigma_b^2$, and the random error terms at level 1 and level 2 are assumed to be mutually uncorrelated (i.e., $\mathrm{Cov}(u_{0j}, r_{ij}) = 0$). When a covariate (e.g., age group) is used in the hierarchical design, the conditional model (with one covariate centered at the group mean) is written in Equation 12:
$$\begin{aligned} \text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j}\,(X_{ij} - \bar{X}_{\cdot j}) + r_{ij},\\ \text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10}, \end{aligned} \tag{12}$$
where the covariate model uses group (office) mean centering, which separates the within-office variation in the covariate from between-office differences (Paccagnella, 2006; Raudenbush & Bryk, 2002), the Level 1 model is for the $i$-th person ($i = 1, \dots, n_j$) and the Level 2 model is for the $j$-th group ($j = 1, \dots, J$), $X_{ij}$ is the covariate for the $i$-th individual subject in the $j$-th office, $\bar{X}_{\cdot j}$ is the group mean for the $j$-th group, $u_{0j}$ is a random effect of the $j$-th office (a random residual at Level 2), $r_{ij}$ is an individual error term for the $i$-th person (a random residual at Level 1), $\gamma_{10}$ is the fixed covariate slope (i.e., slopes are equal across offices), $\gamma_{00}$ is the grand mean, and the errors at levels 1 and 2 are assumed to be independent.

4.6.2. HGLM in 2-Level Cluster Randomized Structure via RSA-911
Suppose that $Y_{ij}$ is a binary outcome variable for the $i$-th individual subject (at level 1; $i = 1, \dots, n_j$) from the $j$-th cluster (office).
In the 2-level cluster randomized trial, the 2-level hierarchical generalized linear model (HGLM) with no covariates is given in Equation 13:

Level 1: $Y_{ij} = 1$ if $Y^{*}_{ij} > 0$ and $Y_{ij} = 0$ otherwise, where $Y^{*}_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$    (13)

where $Y_{ij}$ denotes a dichotomous outcome (coded as zero or one) for the $i$-th individual subject (at level 1; $i = 1, \ldots, n_j$) from the $j$-th office (at level 2; $j = 1, \ldots, J$), $Y^{*}_{ij}$ is the underlying latent response, $\gamma_{00}$ is the grand mean, $r_{ij}$ is the individual error term at level 1 (with variance $\sigma^2$) for the $i$-th person in the $j$-th group, and $u_{0j} \sim N(0, \tau_{00})$ is the random effect associated with the $j$-th group (office variation at level 2). The within-group variance is $\sigma^2$, the between-group variance is $\tau_{00}$, and the random error terms at levels 1 and 2 are assumed to be mutually independent (i.e., $\operatorname{Cov}(r_{ij}, u_{0j}) = 0$).

When a covariate (e.g., minority group) is used in the 2-level generalized hierarchical design, the conditional model (with one covariate centered at the group mean) is written in Equation 14:

Level 1: $Y^{*}_{ij} = \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X}_{.j}) + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$, $\beta_{1j} = \gamma_{10} + u_{1j}$    (14)

where the generalized (binary) covariate model is centered at the cluster (office) mean; Level 1 denotes the $i$-th person ($i = 1, \ldots, n_j$) and Level 2 the $j$-th group ($j = 1, \ldots, J$); $X_{ij}$ is the covariate for the $i$-th person in the $j$-th group; $\bar{X}_{.j}$ is the group mean for the $j$-th cluster; $u_{0j}$ is the random effect of the $j$-th office (a residual at Level 2); $r_{ij}$ is the individual error for the $i$-th subject (a residual at Level 1); $\beta_{1j}$ is the slope for the $j$-th office, with average slope $\gamma_{10}$ and office-specific deviation $u_{1j}$ (assuming slopes are not the same across offices); $\gamma_{00}$ is the grand mean; and the random errors at levels 1 and 2 are assumed to be mutually independent (Klar & Donner, 2001; Raudenbush & Bryk, 2002).

CHAPTER 5 RESULTS

5.1 Data Source and Sample Characteristics

This study used the real data set of the Rehabilitation Services Administration, RSA-911, for FY 2015 to examine and verify the proposed analytic methods of intraclass correlation (ICC) estimation and the related inferential statistics (e.g., confidence interval and p-value) under different scenarios of hierarchical design and modeling structure.
The target samples are selected from people with disabilities who had been receiving services in the Michigan Rehabilitation Services Programs for vocational rehabilitation and supported employment. To obtain usable samples for the data simulations, this study includes only those subjects having an individualized plan for employment (IPE) for vocational rehabilitation (VR) services; all other subjects (those ineligible for VR or without an IPE) are excluded from the target samples and are not considered further in the ICC calculations. In the simulation analysis, the target sample is of size N=17,633, while the usable sample size for ICC estimation and inference is n=11,819. Following the hierarchical design and model considerations (i.e., individuals are on Level 1 and offices are on Level 2), all usable samples are distributed across 33 office units statewide in Michigan (see Tables B.1 and B.2 and Figure B.1 in Appendix B for an illustration of the hierarchical spatial data structure of the usable RSA-911 samples in Michigan). Individual characteristics of the usable samples are described in Tables 5.1, 5.2, and 5.3.

Table 5.1 Individual Characteristics of the Usable Samples (n=11,819)
Demographic Background | Frequency | Percentage
Gender: Female | 5,069 | 42.90%
Gender: Male | 6,750 | 57.10%
Age: Younger than 22 | 3,771 | 31.91%
Age: Ages 22-40 | 2,905 | 24.58%
Age: Ages 40-64 | 4,734 | 40.05%
Age: Older than 65 | 409 | 3.46%
Minority: Yes (Non-Whites) | 7,757 | 65.63%
Minority: No (Whites) | 4,062 | 34.37%
Education: Elementary or Secondary | 3,177 | 26.88%
Education: Special Education | 840 | 7.11%
Education: High School | 5,075 | 42.94%
Education: College or Above | 2,727 | 23.07%
Social Security Benefits: No | 9,168 | 77.60%
Social Security Benefits: Yes | 2,651 | 22.40%
Total | 11,819 | 100.00%
Note1. Minority group is defined as the non-white populations (e.g., Black or African American, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islanders). Non-[...] Middle East or North African, according to the RSA-911 Report Manual; also see Appendix A.
Note2. Mean of Age = 36.4, and Standard Deviation of Age = 16.3.

Table 5.2 Disability & Rehabilitation Characteristics of the Usable Samples (n=11,819)
Disability & Rehabilitation Information | Frequency | Percentage
Type of Disability: VI (Visual Impairments) | 87 | 0.70%
Type of Disability: HI (Hearing Impairments) | 1,989 | 16.80%
Type of Disability: PI (Physical Impairments) | 2,154 | 18.20%
Type of Disability: LD (Learning Disability) | 2,276 | 19.30%
Type of Disability: ADHD | 443 | 3.70%
Type of Disability: ID | 652 | 5.50%
Type of Disability: TBI | 132 | 1.10%
Type of Disability: ASD (Autism) | 436 | 3.70%
Type of Disability: MI (Mental Illness) | 3,073 | 26.00%
Type of Disability: SA (Substance Abuse) | 577 | 4.90%
Significance of Disability: No | 1,259 | 10.65%
Significance of Disability: Yes | 10,560 | 89.35%
Previous Work Background: No Work Experience | 8,836 | 74.76%
Previous Work Background: Had Work Experience | 2,983 | 25.24%
Job Placement Assistance Service: Not Received | 7,347 | 62.20%
Job Placement Assistance Service: Received | 4,472 | 37.80%
On-the-job Supports Service: Not Received | 11,076 | 93.70%
On-the-job Supports Service: Received | 743 | 6.30%
Rehabilitation Technology Service: Not Received | 9,610 | 81.30%
Rehabilitation Technology Service: Received | 2,209 | 18.70%
Total | 11,819 | 100.00%
Note. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.

Table 5.3 Outcomes of the Usable Samples (n=11,819)
Outcome Measure | Frequency | Percentage
Rehabilitation Outcome: Not Employment | 5,201 | 44.01%
Rehabilitation Outcome: Employment | 6,618 | 55.99%
Competitive Employment: Not Competitive Employment | 5,390 | 45.60%
Competitive Employment: Competitive Employment | 6,429 | 54.40%
Weekly Earnings: Below $100 Weekly Income | 5,409 | 45.80%
Weekly Earnings: $100-$200 Weekly Income | 1,647 | 13.90%
Weekly Earnings: $200-$300 Weekly Income | 1,506 | 12.70%
Weekly Earnings: Above $300 Weekly Income | 3,257 | 27.60%
Total | 11,819 | 100.00%
Note1. Median of Weekly Earnings = $148.0, Mean of Weekly Earnings = $224.5, and Standard Error of the Mean (SEM) of Weekly Earnings = $3.0.
Note2. Weekly Earnings can also be deemed an indicator of quality employment.
5.2 Models and Variables Used for Simulations of ICC Analysis

There are four multilevel modeling structures (Models 1-4; M1-M4) in the study to test the proposed methods of ICC estimation and inference. Furthermore, three disability-related covariates, significance of disability (dichotomous; W1), type of disability (nominal; W2), and previous work experience (dichotomous; W3), are used as breaking variables for subset analysis (i.e., separating the whole usable sample into different and mutually exclusive sub-samples) in all four designated models (M1-M4). Three covariate sets are considered for statistical adjustment in the multilevel modeling procedure: (1) Covariate Set 1 (CVS1) includes demographic information such as gender (dichotomous; X1), minority (dichotomous; X2), age (continuous; X3), social security benefits (dichotomous; X4), and education background (ordinal or approximately continuous; X5); (2) Covariate Set 2 (CVS2) includes rehabilitation service information such as job placement assistance (dichotomous; X6), on-the-job supports (dichotomous; X7), and rehabilitation technology (dichotomous; X8); and (3) Covariate Set 3 (CVS3) combines the previous covariate sets (both CVS1 and CVS2) to account for all individual information in multilevel modeling. Two outcome measures are examined: competitive employment (dichotomous; Y1) and weekly earnings (continuous; Y2). The correlation structures between predictors, covariates, and outcomes are shown in Tables 5.4 and 5.5, and the associations between disability type and the outcomes are described via one-way analysis of variance (ANOVA) in Table 5.6. For outcome measure Y1, all predictors (X2-X8) and covariates (W1 and W3) except X1 (p-value=0.41) are correlated with Y1 at the 0.05 significance level (see Table 5.4). For outcome measure Y2, all predictors (X1-X8) and covariates (W1 and W3) are correlated with Y2 at the 0.05 significance level (see Table 5.5).
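The significance statements above can be checked from the correlations themselves. The sketch below is a hedged illustration: it uses the usual t statistic for a Pearson (or point-biserial) correlation and, because n is in the thousands, approximates the t distribution by the standard normal; the helper names are hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, based on t = r*sqrt(n-2)/sqrt(1-r^2).

    For n in the thousands, t(n-2) is practically N(0, 1), so the normal
    tail probability (via math.erfc) is used as an approximation.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return math.erfc(abs(t) / math.sqrt(2.0))


def critical_r(n, z=1.959964):
    """Smallest |r| that is significant at the two-sided 0.05 level (normal approx.).

    Obtained by solving z = r*sqrt(n-2)/sqrt(1-r^2) for r.
    """
    return z / math.sqrt(n - 2 + z * z)
```

For n=11,819, critical_r gives roughly 0.018, so correlations of 0.07 or larger in Tables 5.4 and 5.5 are significant at the 0.05 level, while a correlation that rounds to 0.01 (such as X1 with Y1) is not.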
For the association of W2 (type of disability) with both outcome measures Y1 and Y2, Table 5.6 shows that disability type is a significant factor in explaining the total variation of both outcome measures: the F-statistic from the one-way ANOVA is significant at the 0.05 alpha level, and eta-squared, used here as an ICC-type effect size measure of the strength of association, is 0.08 for Y1 and 0.24 for Y2. Overall, these results suggest that the predictors (X1-X8) and covariates (W1-W3) are associated with the key outcome variables (Y1-Y2), and this statistical evidence lends support to the covariate-adjusted ICC calculations in the study.

Table 5.4 Correlation Structure of All Predictors and Outcome Y1 in Hierarchical Analysis
Variable | Y1 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | W1 | W3
Y1 | 1.00 | 0.01 | -0.09 | 0.18 | -0.16 | 0.16 | 0.07 | 0.07 | 0.31 | -0.21 | 0.31
X1 | 0.01 | 1.00 | -0.01 | -0.04 | -0.02 | -0.08 | 0.02 | 0.03 | -0.05 | 0.03 | -0.06
X2 | -0.09 | -0.01 | 1.00 | 0.01 | 0.10 | -0.05 | -0.01 | -0.06 | -0.23 | 0.13 | -0.19
X3 | 0.18 | -0.04 | 0.01 | 1.00 | 0.01 | 0.51 | -0.16 | -0.13 | 0.40 | -0.26 | 0.36
X4 | -0.16 | -0.02 | 0.10 | 0.01 | 1.00 | 0.00 | 0.10 | 0.12 | -0.15 | 0.18 | -0.17
X5 | 0.16 | -0.08 | -0.05 | 0.51 | 0.00 | 1.00 | -0.08 | -0.10 | 0.29 | -0.18 | 0.28
X6 | 0.07 | 0.02 | -0.01 | -0.16 | 0.10 | -0.08 | 1.00 | 0.21 | -0.25 | 0.17 | -0.26
X7 | 0.07 | 0.03 | -0.06 | -0.13 | 0.12 | -0.10 | 0.21 | 1.00 | -0.10 | 0.07 | -0.08
X8 | 0.31 | -0.05 | -0.23 | 0.40 | -0.15 | 0.29 | -0.25 | -0.10 | 1.00 | -0.38 | 0.56
W1 | -0.21 | 0.03 | 0.13 | -0.26 | 0.18 | -0.18 | 0.17 | 0.07 | -0.38 | 1.00 | -0.39
W3 | 0.31 | -0.06 | -0.19 | 0.36 | -0.17 | 0.28 | -0.26 | -0.08 | 0.56 | -0.39 | 1.00
Note1. Y1=Competitive Employment; X1=Gender; X2=Minority; X3=Age; X4=Social Benefits; X5=Education; X6=Job Placement; X7=On-the-job Supports; X8=Rehabilitation Technology; W1=Significance of Disability; W3=Previous Work Experience.
Note2. Except for X1 (p-value=0.41), all other predictors (X2-X8) and covariates (W1 and W3) are correlated with the outcome measure Y1 at the 0.05 significance level.
Note3. W2 (Type of Disability) is not included because of its categorical (nominal) measurement.
Table 5.5 Correlation Structure of All Predictors and Outcome Y2 in Hierarchical Analysis
Variable | Y2 | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | W1 | W3
Y2 | 1.00 | 0.05 | -0.14 | 0.32 | -0.22 | 0.28 | -0.16 | -0.07 | 0.51 | -0.34 | 0.47
X1 | 0.05 | 1.00 | -0.01 | -0.04 | -0.02 | -0.08 | 0.02 | 0.03 | -0.05 | 0.03 | -0.06
X2 | -0.14 | -0.01 | 1.00 | 0.01 | 0.10 | -0.05 | -0.01 | -0.06 | -0.23 | 0.13 | -0.19
X3 | 0.32 | -0.04 | 0.01 | 1.00 | 0.01 | 0.51 | -0.16 | -0.13 | 0.40 | -0.26 | 0.36
X4 | -0.22 | -0.02 | 0.10 | 0.01 | 1.00 | 0.00 | 0.10 | 0.12 | -0.15 | 0.18 | -0.17
X5 | 0.28 | -0.08 | -0.05 | 0.51 | 0.00 | 1.00 | -0.08 | -0.10 | 0.29 | -0.18 | 0.28
X6 | -0.16 | 0.02 | -0.01 | -0.16 | 0.10 | -0.08 | 1.00 | 0.21 | -0.25 | 0.17 | -0.26
X7 | -0.07 | 0.03 | -0.06 | -0.13 | 0.12 | -0.10 | 0.21 | 1.00 | -0.10 | 0.07 | -0.08
X8 | 0.51 | -0.05 | -0.23 | 0.40 | -0.15 | 0.29 | -0.25 | -0.10 | 1.00 | -0.38 | 0.56
W1 | -0.34 | 0.03 | 0.13 | -0.26 | 0.18 | -0.18 | 0.17 | 0.07 | -0.38 | 1.00 | -0.39
W3 | 0.47 | -0.06 | -0.19 | 0.36 | -0.17 | 0.28 | -0.26 | -0.08 | 0.56 | -0.39 | 1.00
Note1. Y2=Weekly Earnings; X1=Gender; X2=Minority; X3=Age; X4=Social Benefits; X5=Education; X6=Job Placement; X7=On-the-job Supports; X8=Rehabilitation Technology; W1=Significance of Disability; W3=Previous Work Experience.
Note2. All predictors (X1-X8) and covariates (W1 and W3) are correlated with the outcome measure Y2 at the 0.05 significance level.
Note3. W2 (Type of Disability) is not included because of its categorical (nominal) measurement.

Table 5.6 Summary of Mean Differences in the Outcomes between Types of Disability
Type of Disability (W2) | Competitive Employment Outcome (Y1, proportion) | Quality of Employment Outcome (Y2, mean weekly earnings in $)
VI | 0.62 | 250.15
HI | 0.86 | 578.54
PI | 0.49 | 199.74
LD | 0.48 | 140.62
ADHD | 0.47 | 135.19
ID | 0.48 | 103.63
TBI | 0.48 | 180.94
ASD | 0.52 | 123.64
MI | 0.46 | 138.34
SA | 0.50 | 173.38
Overall Mean (Standard Error) | 0.54 (SE=0.01) | 224.48 (SE=3.02)
F-value (p-value) | 118.36 (p<0.01) | 421.52 (p<0.01)
Eta-squared (or ICC) | 0.08 | 0.24
Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. The F-value is based on a one-way analysis of variance (ANOVA).
Note3. Eta-squared ($\eta^2$) is a measure of strength of association in ANOVA; it is computed as the between-group sum of squares divided by the total sum of squares and is another form of effect-size measure related to the intraclass correlation coefficient (ICC). See Section 2.1 for more detail.

There are two types of multilevel modeling structures in the simulation study. The first is the unconditional model (Model 1, or M1) with no covariate adjustment; the second is the conditional model (Models 2-4, or M2-M4) with covariate adjustment (i.e., M2|CVS1, M3|CVS2, and M4|CVS3). CVS1 is the pre-specified covariate set of demographic information in Model 2 (M2), CVS2 is the set of rehabilitation service information in Model 3 (M3), and CVS3 contains all individual information, combining CVS1 and CVS2, in Model 4 (M4).
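The eta-squared definition in Note3 of Table 5.6 can be sketched in Python as follows. The second helper recovers eta-squared from a reported F statistic, using the identity SSB/SSW = F x df_between / df_within; the helper names are hypothetical, and the degrees of freedom in the usage note assume that all 11,819 cases across the 10 disability types enter the one-way ANOVA.

```python
def eta_squared(groups):
    """Eta-squared = between-group SS / total SS for a one-way layout.

    groups: list of lists, one list of outcomes per category (e.g., disability type).
    """
    values = [y for g in groups for y in g]
    grand = sum(values) / len(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total


def eta_squared_from_f(f_value, df_between, df_within):
    """Recover eta-squared from a one-way ANOVA F statistic.

    Since F = (SSB/df_between) / (SSW/df_within), SSB/SSW = F*df_between/df_within,
    and eta^2 = SSB / (SSB + SSW).
    """
    return f_value * df_between / (f_value * df_between + df_within)
```

With df = 9 and 11,809, eta_squared_from_f(118.36, 9, 11809) is about 0.083 and eta_squared_from_f(421.52, 9, 11809) is about 0.243, which round to the 0.08 and 0.24 reported for Y1 and Y2.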
The statistical model specifications for the unconditional model (M1) and the conditional models (M2-M4) are as follows:

(1) Unconditional Model (Model 1; M1): In the two-level multilevel design (i.e., individual subjects are on level 1, and office units are on level 2), the unconditional model with no covariate adjustment is shown in Equation 15:

Level 1: $Y_{ij} = \beta_{0j} + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$    (15)

where $Y_{ij}$ represents the outcome measure for the $i$-th individual subject (at level 1; $i = 1, \ldots, n_j$) in the $j$-th office unit (at level 2; $j = 1, \ldots, J$, with $J = 33$), $\gamma_{00}$ is the grand mean outcome, which can be estimated as $\bar{Y}_{..}$, $r_{ij} \sim N(0, \sigma^2)$ is the random error term (individual variation) at level 1 for the $i$-th person in the $j$-th group, and $u_{0j} \sim N(0, \tau_{00})$ is the random effect associated with the $j$-th office (cluster variation at level 2). The within-cluster (i.e., between-person) variance component is $\sigma^2$, the between-cluster variance component is $\tau_{00}$, and the random error terms at level 1 and level 2 are assumed to be independent.
(2) Conditional Models (Models 2-4; M2-M4): When a pre-specified covariate set (i.e., CVS1-CVS3) is added to the unconditional model (M1), the covariate-adjusted conditional model, with each covariate centered at its group mean, can be described by Equation 16:

Level 1: $Y_{ij} = \beta_{0j} + \sum_{k=1}^{K} \beta_{kj}(X_{kij} - \bar{X}_{k.j}) + r_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$, $\beta_{kj} = \gamma_{k0} + u_{kj}$    (16)

where the conditional model uses group-mean centering of the covariates to reduce the correlation between groups (Paccagnella, 2006; Raudenbush & Bryk, 2002); Level 1 is for the $i$-th person ($i = 1, \ldots, n_j$) and Level 2 is for the $j$-th group ($j = 1, \ldots, J$); $X_{kij}$ is the $k$-th covariate ($k = 1, \ldots, K$) for the $i$-th individual subject in the $j$-th office; $\bar{X}_{k.j}$ is the group mean of the $k$-th covariate for the $j$-th group; $u_{0j}$ is the random effect of the $j$-th office (a random residual at level 2); $r_{ij}$ is the individual error term for the $i$-th person (a random residual at level 1); $\beta_{kj}$ is the $k$-th slope for the $j$-th group (assuming each slope varies across offices, with office-specific deviation $u_{kj}$); $\gamma_{00}$ is the grand mean; $\gamma_{k0}$ is the average slope for the $k$-th group-mean-centered covariate; and independence is assumed between the errors at levels 1 and 2.

5.3 ICC Estimation Method and Its Inferential Statistics

The proposed intraclass correlation (ICC) estimator based on the analysis of variance (ANOVA), shown in Equation 17, is suitable for either a balanced (equal group sizes) or an unbalanced (unequal group sizes) design:

$\hat{\rho} = \dfrac{MSA - MSW}{MSA + (n_0 - 1)\,MSW}$, with $n_0 = \dfrac{1}{J - 1}\left(N - \dfrac{\sum_{j=1}^{J} n_j^2}{N}\right)$    (17)

where MSA is the mean square among groups in the ANOVA, MSW is the mean square within groups, $n_j$ is the $j$-th group size, $n_0$ is the adjusted average group size, and $N$ is the total sample size, i.e., $N = \sum_{j=1}^{J} n_j$. The computational details of the ANOVA for the ICC estimator in Equation 17 are as follows. Suppose $Y_{ij}$ is decomposed by ANOVA for the ICC estimator $\hat{\rho}$, where $Y_{ij}$ is the outcome measure for the $i$-th person ($i = 1, \ldots, n_j$) in the $j$-th group ($j = 1, \ldots, J$).
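A minimal Python sketch of the ANOVA estimator in Equation 17 follows, computing the among- and within-group sums of squares and mean squares directly from grouped data, together with the adjusted average group size n0 = (N - sum of n_j^2 / N) / (J - 1). The input format (one list of outcomes per office) and the function name are assumptions for illustration; for a balanced design n0 reduces to the common group size.

```python
def anova_icc(groups):
    """ANOVA estimator of the ICC (Equation 17) for balanced or unbalanced groups.

    groups: list of lists, one list of outcomes per group (office).
    Returns (icc, msa, msw, n0).
    """
    n_groups = len(groups)                         # J
    sizes = [len(g) for g in groups]               # n_j
    n_total = sum(sizes)                           # N
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Among-group and within-group sums of squares.
    ssa = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    # Mean squares: SS divided by degrees of freedom (J - 1 and N - J).
    msa = ssa / (n_groups - 1)
    msw = ssw / (n_total - n_groups)

    # Adjusted average group size; equals the common n when groups are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    icc = (msa - msw) / (msa + (n0 - 1) * msw)
    return icc, msa, msw, n0
```

Note that the estimate can be negative when MSA < MSW, which is why some subsets in Tables 5.7-5.9 show slightly negative ICC estimates.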
The overall variation (sum of squares, SS) is decomposed as $SST = SSA + SSW$, where the among-group variation is $SSA = \sum_{j=1}^{J} n_j (\bar{Y}_{.j} - \bar{Y}_{..})^2$, the within-group variation is $SSW = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{.j})^2$, and the total variation is $SST = \sum_{j=1}^{J} \sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_{..})^2$. The mean squares (MS) in the ANOVA are obtained as $MS = SS/df$ (i.e., the average variation per degree of freedom), that is, $MSA = SSA/(J - 1)$ and $MSW = SSW/(N - J)$, where MSA is the mean variation among groups, MSW is the mean variation within groups (the mean squared error), $J - 1$ (for $J$ = the number of groups) is the between-group degrees of freedom, and the within-group degrees of freedom is $N - J = J(\bar{n} - 1)$ (for $\bar{n} = N/J$ = the average number of subjects per group). The original idea of using the analysis of variance (ANOVA) for ICC estimation can be found in Table 2.1 (Donner & Koval, 1980a). Furthermore, the variance of the ICC estimate, $\widehat{\operatorname{Var}}(\hat{\rho})$, can be obtained by

(18)

where the sampling weights are [...] and [...], the total sample size is $N$, and $\hat{\rho}$ is the ICC estimate from Equation 17. Thus, the standard error of the ICC estimate is $SE(\hat{\rho}) = \sqrt{\widehat{\operatorname{Var}}(\hat{\rho})}$. The proposed test statistic for the ICC estimate ($\hat{\rho}$) can be written as

$F = \dfrac{MSA}{MSW} = \dfrac{1 + (n_0 - 1)\hat{\rho}}{1 - \hat{\rho}}$    (19)

where the test statistic follows an $F$ distribution with $J - 1$ and $N - J$ degrees of freedom under $H_0: \rho = 0$, tested against $H_1: \rho > 0$. Given this sampling distribution, the $100(1-\alpha)\%$ confidence interval on the intraclass correlation can be obtained by

$\left[\dfrac{F_L - 1}{F_L + n_0 - 1},\ \dfrac{F_U - 1}{F_U + n_0 - 1}\right]$, with $F_L = \dfrac{F}{F_{1-\alpha/2}(J-1,\, N-J)}$ and $F_U = \dfrac{F}{F_{\alpha/2}(J-1,\, N-J)}$    (20)

where the ICC ($\rho$) represents the proportion of total variability accounted for by between-group variation in the multilevel design. Note that the interval estimate of $\rho$ may not be very accurate or precise for a small sample size (i.e., small $J$ or $n_j$) or low measurement reliability (i.e., large MSW or $\sigma^2$).
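The F test in Equation 19 and the interval in Equation 20 only require the F distribution. The sketch below evaluates its CDF through the regularized incomplete beta function (a standard continued-fraction evaluation) and inverts it by bisection, so it needs only the Python standard library; in practice a statistics library's F quantile function would serve the same purpose. The helper names are hypothetical; the inputs are the ICC estimate, n0, the number of groups J, and N, using F = MSA/MSW = (1 + (n0 - 1)ICC)/(1 - ICC) and the one-sided alternative H1: rho > 0.

```python
import math


def _betacf(a, b, x, max_iter=10000, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("incomplete beta continued fraction did not converge")


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """CDF of the F(d1, d2) distribution."""
    if f <= 0.0:
        return 0.0
    return _betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_ppf(p, d1, d2):
    """Quantile of F(d1, d2) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def icc_f_test_and_ci(icc, n0, n_groups, n_total, alpha=0.05):
    """F test of H0: rho = 0 vs H1: rho > 0 and a 100(1 - alpha)% CI for the ICC.

    The lower bound is returned untruncated; it may be replaced by zero if negative.
    """
    df1, df2 = n_groups - 1, n_total - n_groups
    f_obs = (1.0 + (n0 - 1.0) * icc) / (1.0 - icc)   # equals MSA / MSW
    p_value = 1.0 - f_cdf(f_obs, df1, df2)
    f_low = f_obs / f_ppf(1.0 - alpha / 2.0, df1, df2)
    f_up = f_obs / f_ppf(alpha / 2.0, df1, df2)
    lower = (f_low - 1.0) / (f_low + n0 - 1.0)
    upper = (f_up - 1.0) / (f_up + n0 - 1.0)
    return f_obs, p_value, lower, upper
```

With the overall-sample inputs from Table 5.7 (ICC=0.0097, within-group size 356 used as n0, J=33, N=11,819), the sketch gives a p-value far below 0.01 and bounds of about 0.0053 and 0.019, close to the tabled [0.0053, 0.0187]; the small difference in the upper bound is plausibly due to rounding of the inputs.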
Also, the lower confidence limit for $\rho$ can be negative (especially when the sample size is small or the measurement error is large in hierarchical modeling); since $\rho$ should not be negative by its mathematical definition (i.e., $0 \le \rho \le 1$), it is customary to replace a negative lower bound with zero.

For statistical planning in multilevel design, the proposed auxiliary statistics are used to help understand the minimum detectable effect size with respect to the desired power and the required sample size. Three types of measures linked with the intraclass correlation (ICC) estimator are:

(i) The design effect (DE), or variance inflation factor (VIF), is written as

$DE = VIF = 1 + (n_0 - 1)\hat{\rho}$    (21)

where $\hat{\rho}$ is the ICC estimate, which provides a statistical measure of homogeneity within groups (i.e., if the subjects within groups are perfectly homogeneous, then $\hat{\rho} = 1$ and hence $DE = n_0$), and $n_0$ is the average within-group size. In general, cluster sampling inflates the variance of an estimate relative to simple random sampling of the same size by a factor of DE (or VIF), because subjects in the same group tend to be more alike than subjects in different groups.

(ii) The unconditional intraclass correlation coefficient is given by

$\rho_U = \dfrac{\tau_{00}}{\tau_{00} + \sigma^2}$    (22)

where the unconditional total variance is $\tau_{00} + \sigma^2$, and $\sigma^2$ and $\tau_{00}$ are the error variances corresponding to the within- and between-group variation, respectively, in the unconditional model with no covariate adjustment. In hierarchical models with covariates for statistical adjustment, the conditional intraclass correlation coefficient is defined as

$\rho_C = \dfrac{\tilde{\tau}_{00}}{\tilde{\tau}_{00} + \tilde{\sigma}^2}$    (23)

where the covariate-adjusted total variance is $\tilde{\tau}_{00} + \tilde{\sigma}^2$, and $\tilde{\sigma}^2$ and $\tilde{\tau}_{00}$ are the covariate-adjusted variance components corresponding to the within- and between-group variation, respectively, in the conditional multilevel model.
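Equations 21-23 translate directly into code. The sketch below assumes the ICC and the average within-group size are already available (for example, from the ANOVA estimator); effective_sample_size is a hypothetical helper added for illustration, using the standard relation that N clustered observations carry about as much information as N/DE independent ones.

```python
def design_effect(icc, group_size):
    """Design effect (variance inflation factor), Equation 21: 1 + (n - 1) * ICC."""
    return 1.0 + (group_size - 1.0) * icc


def icc_from_components(tau00, sigma2):
    """ICC as between-group variance over total variance (Equations 22 and 23).

    Pass unconditional components for rho_U, covariate-adjusted ones for rho_C.
    """
    return tau00 / (tau00 + sigma2)


def effective_sample_size(n_total, deff):
    """Hypothetical helper: the SRS-equivalent sample size N / DE."""
    return n_total / deff
```

With the Table 5.7 overall-sample values (ICC=0.0097, within-group size 356), design_effect returns about 4.44, matching the DE reported for Model 1, and the effective sample size is roughly 11,819/4.44, or about 2,660.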
(iii) For evaluating the relative efficiency of the measures of homogeneity and heterogeneity in multilevel design, two ancillary statistics based on the random variation of the unconditional and conditional hierarchical models are given by

$\lambda_B = \dfrac{\tilde{\tau}_{00}}{\tau_{00}}$    (24)

and

$\lambda_W = \dfrac{\tilde{\sigma}^2}{\sigma^2}$    (25)

where $\tau_{00}$ and $\sigma^2$ are the unconditional between- and within-group variance components, $\tilde{\tau}_{00}$ and $\tilde{\sigma}^2$ are their covariate-adjusted counterparts, $\lambda_B$ indicates the proportion of between-group variance remaining after covariate adjustment, and $\lambda_W$ indicates the proportion of within-group variance remaining after covariate adjustment. Both $\lambda_B$ and $\lambda_W$ show the effectiveness of covariate adjustment for between-group and within-group random variation in multilevel design and modeling. The two complementary measures (similar to a pseudo R-squared) for the random variation explained by covariate adjustment in hierarchical modeling are

$R^2_B = 1 - \lambda_B$    (26)

and

$R^2_W = 1 - \lambda_W$    (27)

where $R^2_B$ and $R^2_W$ are the proportions of between-group and within-group variation, respectively, explained by the covariates in the hierarchical design. Both $R^2_B$ and $R^2_W$ can also show the efficacy of covariate adjustment in multilevel design.

5.4 Results of ICC Estimates and Inferential Statistics

Two outcome measures are analyzed: (1) a dichotomous measure for competitive employment (Y1); and (2) a continuous measure for weekly earned income, or quality employment (Y2). Further, there are four different multilevel models for the ICC calculations: (1) the unconditional model (M1) has no covariate adjustment; (2) the conditional model (M2) is fitted with covariate adjustment by the demographic predictors (Covariate Set 1); (3) the conditional model (M3) is fitted with covariate adjustment by the rehabilitation service predictors (Covariate Set 2); and (4) the conditional model (M4) is fitted with covariate adjustment by both the demographic and service predictors (Covariate Set 3).
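The relative-efficiency ratios of Equations 24-27 need only the variance components of the unconditional and conditional fits. A minimal sketch, assuming those components have already been extracted from the fitted models (the function and argument names are hypothetical):

```python
def variance_ratios(tau00_u, sigma2_u, tau00_c, sigma2_c):
    """Relative-efficiency ratios of Equations 24-27.

    *_u: unconditional (Model 1) variance components;
    *_c: covariate-adjusted (conditional model) variance components.
    """
    remain_between = tau00_c / tau00_u   # Equation 24: between-group variance remaining
    remain_within = sigma2_c / sigma2_u  # Equation 25: within-group variance remaining
    return {
        "remain_between": remain_between,
        "remain_within": remain_within,
        "explained_between": 1.0 - remain_between,  # Equation 26
        "explained_within": 1.0 - remain_within,    # Equation 27
    }
```

For instance, the 3.05% decrease in within-group variation reported for Model 2 in Section 5.4.1 corresponds to a remaining within-group proportion of about 0.97 (about 0.03 explained); when the within-group variance shrinks while the between-group variance does not, the conditional ICC rises slightly above the unconditional one.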
In addition, three breaking variables are considered for subset analysis of ICC estimation and inference using the usable samples (n=11,819) in multilevel design: (1) Previous Work Experience, a binary measure (yes or no); (2) Significance of Disability, a binary measure (yes or no); and (3) Disability Type, a nominal measure with 10 disability categories (VI, HI, PI, LD, ADHD, ID, TBI, ASD, MI, and SA). The main results of the study are presented in Tables 5.7-5.16.

5.4.1 Competitive Employment Outcome Measure

Competitive employment (Y1) is fitted as a dichotomous outcome measure in the 2-level hierarchical generalized linear modeling (HGLM) framework, where individual subjects are on level 1 and office units are on level 2. The main results are shown in Table 5.7 for the unconditional model M1 (Model 1), Table 5.8 for the conditional model M2 (Model 2), Table 5.9 for the conditional model M3 (Model 3), and Table 5.10 for the conditional model M4 (Model 4). Table 5.11 provides the auxiliary information on the ICC estimates, such as the design effect and the proportions of between- and within-group variance remaining and explained after covariate adjustment, and Table 5.12 shows the ICC evaluation results based on the bootstrap sampling procedure (with 100 bootstrap repetitions).

The ICC estimates (including standard error, p-value, and 95% confidence interval) for the competitive employment outcome measure (Y1) under the unconditional (Model 1) and conditional (Models 2-4) multilevel modeling structures are summarized as follows. For competitive employment (Y1) under Model 1 (see Table 5.7), the average (unadjusted) intraclass correlation is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience (yes or no), both subsets show an average (unadjusted) ICC of about 0.01 (SE=0.004, p<0.01, 95% CI = [0.01, 0.02]).
When the sample is partitioned by significance of disability (yes or no), both subsets show an average (unadjusted) ICC of about 0.02 (SE=0.009, p<0.01, 95% CI = [0.01, 0.05]). Broken down by disability type, autism spectrum disorder (ASD) has the highest unadjusted ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.01, 0.15]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the unadjusted ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.25), attention deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.06), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.36), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).

For competitive employment (Y1) under Model 2 (see Table 5.8), the average intraclass correlation adjusted for demographic information is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience (yes or no), both subsets show an average demographics-adjusted ICC of about 0.01 (SE=0.004, p<0.01, 95% CI = [0.01, 0.03]). When partitioned by significance of disability (yes or no), both subsets show an average demographics-adjusted ICC of about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.05]).
Broken down by disability type, autism spectrum disorder (ASD) has the highest adjusted ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.01, 0.15]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the adjusted ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.26), attention deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.37), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).

For competitive employment (Y1) under Model 3 (see Table 5.9), the average intraclass correlation adjusted for rehabilitation services information is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience (yes or no), both subsets show an average services-adjusted ICC of about 0.01 (SE=0.005, p<0.01, 95% CI = [0.01, 0.03]). When partitioned by significance of disability (yes or no), both subsets show an average services-adjusted ICC of about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.05]).
Broken down by disability type, autism spectrum disorder (ASD) has the highest adjusted ICC of 0.08 (SE=0.04, p<0.01, 95% CI = [0.02, 0.17]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), intellectual disability (ID; ICC=0.03, SE=0.02, p=0.02, 95% CI = [0.00, 0.09]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the adjusted ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.09, SE=0.10, p=0.20), attention deficit hyperactivity disorder (ADHD; ICC=0.01, SE=0.02, p=0.29), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.35), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.47).

For competitive employment (Y1) under Model 4 (see Table 5.10), the average intraclass correlation adjusted for both demographics and rehabilitation services is about 0.01 (SE=0.003, p<0.01, 95% CI = [0.01, 0.02]). When the sample is partitioned by work experience (yes or no), both subsets show an average ICC, adjusted for both demographics and rehabilitation services, of about 0.01 (SE=0.005, p<0.01, 95% CI = [0.01, 0.03]). When partitioned by significance of disability (yes or no), both subsets show an average ICC, adjusted for both demographics and rehabilitation services, of about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.05]).
Broken down by disability type, autism spectrum disorder (ASD) has the highest adjusted ICC of 0.06 (SE=0.03, p<0.01, 95% CI = [0.01, 0.15]), followed by learning disability (LD; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), hearing impairments (HI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the adjusted ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.07, SE=0.10, p=0.26), attention deficit hyperactivity disorder (ADHD; ICC=0.00, SE=0.01, p=0.54), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=0.02, SE=0.05, p=0.37), and substance abuse (SA; ICC=0.00, SE=0.01, p=0.48).

For the auxiliary information on the ICC estimates for outcome measure Y1 (see Table 5.11), the unconditional model (Model 1; unconditional ICC=0.01 and design effect DE=4.44) is used as the baseline for measuring the relative efficiency of the between-group and within-group variance components (i.e., the proportions of variance remaining and explained after covariate adjustment). The conditional model with the demographic covariate set (Model 2; conditional ICC=0.01 and DE=4.59) shows a 3.05% decrease in within-group variation and no change (0.00%) in between-group variation relative to the unconditional model (Model 1). The conditional model with the rehabilitation service covariate set (Model 3; conditional ICC=0.01 and DE=4.83) shows an 8.06% decrease in within-group variation and a 4.17% increase in between-group variation relative to Model 1.
The conditional model with both the demographic and rehabilitation service covariate sets (Model 4; conditional ICC=0.01 and DE=4.59) shows a 3.38% decrease in within-group variation and no change (0.00%) in between-group variation relative to Model 1.

The bootstrap evaluation of the ICC estimates (100 bootstrap repetitions) for outcome measure Y1 under different resampling scenarios for the number of groups and subjects (see Table 5.12) provides important information on sampling schemes in the multilevel structure (based on Model 4 with the full set of demographic and rehabilitation service covariates). For the low level of cluster sampling (number of groups=5), the mean bias is about 0.0068, the MSE is about 0.0004, and the proportion of successful hits is about 34%. For the medium level (number of groups=15), the mean bias is about 0.0049, the MSE is about 0.0002, and the proportion of successful hits is about 66%. For the high level (number of groups=25), the mean bias is about 0.0047, the MSE is about 0.0001, and the proportion of successful hits is about 68%. For subject sampling, at the low level (number of subjects=50) the mean bias is about 0.0062, the MSE is about 0.0003, and the proportion of successful hits is about 41%; at the medium level (number of subjects=100) the mean bias is about 0.0053, the MSE is about 0.0002, and the proportion of successful hits is about 59%; and at the high level (number of subjects=150) the mean bias is about 0.0047, the MSE is about 0.0001, and the proportion of successful hits is about 70%.
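The bootstrap evaluation summarized above can be sketched as follows. This is a hedged illustration, not the study's exact procedure: the resampling scheme (offices drawn without replacement, subjects drawn with replacement within offices), the use of the ANOVA estimator of Equation 17, the signed definition of mean bias, and especially the definition of a "successful hit" (an estimate within a hypothetical tolerance tol of the reference ICC) are all assumptions.

```python
import random


def anova_icc(groups):
    """ANOVA ICC estimator (Equation 17) for a list of per-group outcome lists."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    msa = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_total - k)
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msa - msw) / (msa + (n0 - 1) * msw)


def bootstrap_icc(groups, n_groups, n_subjects, reference, reps=100, tol=0.005, seed=None):
    """Summarize bootstrap ICC estimates against a reference value.

    Each replicate draws n_groups offices without replacement and n_subjects
    persons with replacement within each drawn office (an assumed scheme).
    Returns (mean_bias, mse, hit_rate); a "hit" is taken here to mean
    |estimate - reference| <= tol, a hypothetical criterion.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        offices = rng.sample(groups, n_groups)
        resample = [rng.choices(office, k=n_subjects) for office in offices]
        errors.append(anova_icc(resample) - reference)
    mean_bias = sum(errors) / reps                     # signed average error
    mse = sum(e * e for e in errors) / reps            # mean squared error
    hit_rate = sum(abs(e) <= tol for e in errors) / reps
    return mean_bias, mse, hit_rate
```

Calling bootstrap_icc(groups, 15, 100, reference) with the default 100 repetitions mirrors the medium scenario (15 groups, 100 subjects); looping over 5, 15, and 25 groups or 50, 100, and 150 subjects reproduces the structure of the comparison, not the study's numbers.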
Overall, the sampling scheme with the high level of group samples (25) and the high level of subject samples (150) achieves the best outcome (i.e., the lowest bias and MSE and the highest proportion of successful hits); the scheme with moderate cluster and subject samples (number of groups=15 and number of subjects=100) provides average performance of ICC estimation; and the scheme with the low level of group samples (5) or the low level of subject samples (50) is more likely to result in poor performance of ICC estimates in the hierarchical generalized linear modeling structure.

Table 5.7 ICC Estimates of Unconditional Model M1 for Outcome Measure Y1
Model 1 | Total Sample Size | Number of Groups | Within Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC
Overall Sample | 11,819 | 33 | 356 | 0.0097 | 0.0031 | 0.00 | 0.0053 | 0.0187
Work Experience: No | 8,821 | 33 | 266 | 0.0119 | 0.0038 | 0.00 | 0.0064 | 0.0232
Work Experience: Yes | 2,998 | 33 | 90 | 0.0101 | 0.0053 | 0.00 | 0.0026 | 0.0254
Significance of Disability: No | 1,233 | 33 | 36 | 0.0297 | 0.0145 | 0.00 | 0.0093 | 0.0675
Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0107 | 0.0034 | 0.00 | 0.0058 | 0.0208
Disability Type: VI | 87 | 29 | 3 | 0.0732 | 0.1008 | 0.25 | -0.1241 | 0.3253
Disability Type: HI | 1,989 | 32 | 61 | 0.0201 | 0.0093 | 0.00 | 0.0070 | 0.0459
Disability Type: PI | 2,154 | 33 | 65 | 0.0187 | 0.0084 | 0.00 | 0.0067 | 0.0429
Disability Type: LD | 2,276 | 33 | 68 | 0.0286 | 0.0105 | 0.00 | 0.0134 | 0.0585
Disability Type: ADHD | 443 | 33 | 13 | -0.0032 | 0.0149 | 0.54 | -0.0303 | 0.0495
Disability Type: ID | 652 | 33 | 19 | 0.0223 | 0.0173 | 0.06 | -0.0041 | 0.0727
Disability Type: TBI | 132 | 27 | 5 | 0.0212 | 0.0513 | 0.36 | -0.0823 | 0.1919
Disability Type: ASD | 436 | 33 | 13 | 0.0641 | 0.0329 | 0.00 | 0.0141 | 0.1505
Disability Type: MI | 3,073 | 33 | 92 | 0.0175 | 0.0070 | 0.00 | 0.0075 | 0.0376
Disability Type: SA | 577 | 31 | 18 | -0.0006 | 0.0085 | 0.48 | -0.0208 | 0.0405
Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.8 ICC Estimates of Conditional Model M2 for Outcome Measure Y1
Model 2 | Total Sample Size | Number of Groups | Within Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC
Overall Sample | 11,819 | 33 | 356 | 0.0101 | 0.0032 | 0.00 | 0.0055 | 0.0194
Work Experience: No | 8,821 | 33 | 266 | 0.0119 | 0.0038 | 0.00 | 0.0064 | 0.0232
Work Experience: Yes | 2,998 | 33 | 90 | 0.0119 | 0.0057 | 0.00 | 0.0038 | 0.0284
Significance of Disability: No | 1,233 | 33 | 36 | 0.0356 | 0.0160 | 0.00 | 0.0131 | 0.0767
Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0109 | 0.0034 | 0.00 | 0.0059 | 0.0211
Disability Type: VI | 87 | 29 | 3 | 0.0671 | 0.0996 | 0.26 | -0.1289 | 0.3190
Disability Type: HI | 1,989 | 32 | 61 | 0.0236 | 0.0102 | 0.00 | 0.0093 | 0.0517
Disability Type: PI | 2,154 | 33 | 65 | 0.0188 | 0.0084 | 0.00 | 0.0067 | 0.0430
Disability Type: LD | 2,276 | 33 | 68 | 0.0289 | 0.0105 | 0.00 | 0.0136 | 0.0588
Disability Type: ADHD | 443 | 33 | 13 | -0.0032 | 0.0149 | 0.54 | -0.0303 | 0.0495
Disability Type: ID | 652 | 33 | 19 | 0.0234 | 0.0176 | 0.05 | -0.0034 | 0.0743
Disability Type: TBI | 132 | 27 | 5 | 0.0190 | 0.0501 | 0.37 | -0.0837 | 0.1892
Disability Type: ASD | 436 | 33 | 13 | 0.0640 | 0.0329 | 0.00 | 0.0140 | 0.1504
Disability Type: MI | 3,073 | 33 | 92 | 0.0175 | 0.0070 | 0.00 | 0.0075 | 0.0376
Disability Type: SA | 577 | 31 | 18 | -0.0006 | 0.0085 | 0.48 | -0.0208 | 0.0405
Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).
Table 5.9 ICC Estimates of Conditional Model M3 for Outcome Measure Y1

| Model 3 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0108 | 0.0033 | 0.00 | 0.0060 | 0.0206 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0130 | 0.0041 | 0.00 | 0.0071 | 0.0250 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0124 | 0.0059 | 0.00 | 0.0041 | 0.0291 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0356 | 0.0160 | 0.00 | 0.0131 | 0.0767 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0119 | 0.0037 | 0.00 | 0.0066 | 0.0228 |
| Disability Type: VI | 87 | 29 | 3 | 0.0905 | 0.1038 | 0.20 | -0.1105 | 0.3430 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0215 | 0.0097 | 0.00 | 0.0079 | 0.0482 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0192 | 0.0085 | 0.00 | 0.0070 | 0.0437 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0340 | 0.0116 | 0.00 | 0.0170 | 0.0672 |
| Disability Type: ADHD | 443 | 33 | 13 | 0.0096 | 0.0189 | 0.29 | -0.0219 | 0.0694 |
| Disability Type: ID | 652 | 33 | 19 | 0.0320 | 0.0199 | 0.02 | 0.0023 | 0.0877 |
| Disability Type: TBI | 132 | 27 | 5 | 0.0224 | 0.0519 | 0.35 | -0.0815 | 0.1935 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0782 | 0.0356 | 0.00 | 0.0238 | 0.1705 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0187 | 0.0073 | 0.00 | 0.0083 | 0.0396 |
| Disability Type: SA | 577 | 31 | 18 | 0.0001 | 0.0089 | 0.47 | -0.0204 | 0.0416 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).
Table 5.10 ICC Estimates of Conditional Model M4 for Outcome Measure Y1

| Model 4 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0101 | 0.0032 | 0.00 | 0.0055 | 0.0195 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0119 | 0.0038 | 0.00 | 0.0064 | 0.0232 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0120 | 0.0058 | 0.00 | 0.0038 | 0.0286 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0359 | 0.0161 | 0.00 | 0.0133 | 0.0771 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0109 | 0.0034 | 0.00 | 0.0060 | 0.0211 |
| Disability Type: VI | 87 | 29 | 3 | 0.0673 | 0.0996 | 0.26 | -0.1287 | 0.3192 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0237 | 0.0102 | 0.00 | 0.0094 | 0.0519 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0188 | 0.0084 | 0.00 | 0.0067 | 0.0430 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0290 | 0.0106 | 0.00 | 0.0137 | 0.0591 |
| Disability Type: ADHD | 443 | 33 | 13 | -0.0033 | 0.0148 | 0.54 | -0.0304 | 0.0493 |
| Disability Type: ID | 652 | 33 | 19 | 0.0237 | 0.0177 | 0.05 | -0.0032 | 0.0749 |
| Disability Type: TBI | 132 | 27 | 5 | 0.0191 | 0.0501 | 0.37 | -0.0837 | 0.1893 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0645 | 0.0330 | 0.00 | 0.0144 | 0.1511 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0175 | 0.0070 | 0.00 | 0.0075 | 0.0377 |
| Disability Type: SA | 577 | 31 | 18 | -0.0006 | 0.0085 | 0.48 | -0.0208 | 0.0405 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.11 Auxiliary Information of ICC Estimates for Outcome Measure Y1

| Modeling Structure | ICC Estimate | Between-Group Variance | Within-Group Variance | Design Effect (DE) | Between-Group Variance Ratio (Mk/M1) | Within-Group Variance Ratio (Mk/M1) | 1 - Between-Group Ratio | 1 - Within-Group Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 (M1) | 0.0097 | 0.0024 | 0.2458 | 4.4436 | NA | NA | NA | NA |
| Model 2 (M2) | 0.0101 | 0.0024 | 0.2383 | 4.5856 | 1.0000 | 0.9695 | 0.0000 | 0.0305 |
| Model 3 (M3) | 0.0108 | 0.0025 | 0.2260 | 4.8341 | 1.0417 | 0.9194 | -0.0417 | 0.0806 |
| Model 4 (M4) | 0.0101 | 0.0024 | 0.2375 | 4.5856 | 1.0000 | 0.9662 | 0.0000 | 0.0338 |

Note1. The ICC estimate for M1 shows the unconditional ICC quantity; M2-M4 show the conditional ICC quantity.
Note2. Relative efficiency measures for ICC estimates between the unconditional and conditional models (M1 versus M2-M4) are the ratios of each conditional model's between-group and within-group variances to those of M1, together with one minus each ratio.

Table 5.12 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y1

| Number of Groups | Within-Group Size | Bias | MSE | Hits |
| --- | --- | --- | --- | --- |
| 5 | 50 | 0.0078 | 0.0005 | 0.19 |
| 5 | 100 | 0.0069 | 0.0004 | 0.34 |
| 5 | 150 | 0.0056 | 0.0003 | 0.50 |
| 15 | 50 | 0.0054 | 0.0003 | 0.50 |
| 15 | 100 | 0.0048 | 0.0002 | 0.70 |
| 15 | 150 | 0.0045 | 0.0001 | 0.78 |
| 25 | 50 | 0.0053 | 0.0002 | 0.54 |
| 25 | 100 | 0.0042 | 0.0001 | 0.73 |
| 25 | 150 | 0.0033 | 0.0000 | 0.82 |

Note1. Bias is defined as the mean difference between the bootstrap ICC estimates and the true ICC.
Note2. MSE is the mean squared difference between the bootstrap ICC estimates and the true ICC.
Note3. Hits is the proportion of bootstrap ICC estimates lying within the 95% confidence interval of the true ICC.

5.4.2 Earnings or Quality Employment Outcome Measure

The weekly earned income, or quality employment (Y2), is fitted as a continuous outcome measure in the two-level hierarchical linear modeling (HLM) framework, with individual subjects at level 1 and office units at level 2. The main results are shown in Table 5.13 for the unconditional model M1 (Model 1), Table 5.14 for the conditional model M2 (Model 2), Table 5.15 for the conditional model M3 (Model 3), and Table 5.16 for the conditional model M4 (Model 4). Table 5.17 provides the auxiliary information for the ICC estimates (variance components, design effects, and relative efficiency measures), and Table 5.18 shows the ICC evaluation results based on the bootstrap sampling procedure (100 bootstrap repetitions). The ICC estimates (including standard errors, p-values, and 95% confidence intervals) for the quality-of-employment outcome measure (Y2) under the unconditional (Model 1) and conditional (Models 2-4) multilevel modeling structures are summarized as follows.

For quality employment (Y2 under Model 1; see Table 5.13), the average (unadjusted) intraclass correlation is about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.04]).
Averaged over the two work-experience subsets (binary coding of yes or no), the (unadjusted) ICC is about 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]). Averaged over the two significance-of-disability subsets (binary coding of yes or no), the (unadjusted) ICC is about 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]). Broken down by disability type, learning disability (LD) has the highest (unadjusted) ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), followed by substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.00, SE=0.08, p=0.50), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01, p=0.87), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=-0.08, SE=0.03, p=0.88), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.13).

For quality employment (Y2 under Model 2; see Table 5.14), the average (adjusted by demographic information) intraclass correlation is about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). Averaged over the two work-experience subsets, the (demographically adjusted) ICC is about 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]). Averaged over the two significance-of-disability subsets, the (demographically adjusted) ICC is about 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]).
Broken down by disability type, learning disability (LD) has the highest (demographically adjusted) ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), followed by hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.00, SE=0.08, p=0.49), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01, p=0.87), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=-0.08, SE=0.03, p=0.87), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.13).

For quality employment (Y2 under Model 3; see Table 5.15), the average (adjusted by rehabilitation services) intraclass correlation is about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). Averaged over the two work-experience subsets, the (service-adjusted) ICC is about 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]). Averaged over the two significance-of-disability subsets, the (service-adjusted) ICC is about 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]).
Broken down by disability type, learning disability (LD) has the highest (service-adjusted) ICC of 0.04 (SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), followed by substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), intellectual disability (ID; ICC=0.03, SE=0.02, p=0.03, 95% CI = [0.00, 0.08]), hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=-0.01, SE=0.08, p=0.52), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01, p=0.81), traumatic brain injury (TBI; ICC=-0.07, SE=0.03, p=0.86), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.12).

For quality employment (Y2 under Model 4; see Table 5.16), the average (adjusted by both demographics and rehabilitation services) intraclass correlation is about 0.02 (SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). Averaged over the two work-experience subsets, the (fully adjusted) ICC is about 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.05]). Averaged over the two significance-of-disability subsets, the (fully adjusted) ICC is about 0.05 (SE=0.01, p<0.01, 95% CI = [0.03, 0.09]).
Broken down by disability type, learning disability (LD) has the highest (fully adjusted) ICC of 0.03 (SE=0.01, p<0.01, 95% CI = [0.02, 0.07]), followed by hearing impairments (HI; ICC=0.03, SE=0.01, p<0.01, 95% CI = [0.01, 0.06]), substance abuse (SA; ICC=0.03, SE=0.02, p=0.04, 95% CI = [0.00, 0.09]), physical impairments (PI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.05]), and mental illness (MI; ICC=0.02, SE=0.01, p<0.01, 95% CI = [0.01, 0.04]). At the 0.05 significance level, the ICC estimates are not significant for the following disability types: visual impairments (VI; ICC=0.00, SE=0.08, p=0.49), attention deficit hyperactivity disorder (ADHD; ICC=-0.02, SE=0.01, p=0.87), intellectual disability (ID; ICC=0.02, SE=0.02, p=0.05), traumatic brain injury (TBI; ICC=-0.08, SE=0.03, p=0.87), and autism spectrum disorder (ASD; ICC=0.02, SE=0.02, p=0.12).

For the auxiliary information on the ICC estimates for outcome measure Y2 (see Table 5.17), the unconditional model (Model 1; unconditional ICC=0.02 and design effect DE=8.49) serves as the baseline for measuring the relative efficiency of the between-group and within-group variances of the ICC estimates (each expressed as a ratio to the baseline and as one minus that ratio). The conditional model with a covariate set of demographic information (Model 2; conditional ICC=0.02 and DE=9.38) shows a 9.75% decrease in within-group variation and a 1.27% increase in between-group variation, in comparison with the unconditional model (Model 1). The conditional model with a covariate set of rehabilitation service information (Model 3; conditional ICC=0.02 and DE=8.70) shows a 2.47% decrease in within-group variation and a 0.32% increase in between-group variation, in comparison with the unconditional model (Model 1).
The conditional model with a covariate set of both demographic and rehabilitation service information (Model 4; conditional ICC=0.02 and DE=9.41) shows a 10.02% decrease in within-group variation and a 1.31% increase in between-group variation, in comparison with the unconditional model (Model 1).

The evaluation of bootstrapped ICC estimates (100 bootstrap repetitions) for outcome measure Y2 across resampling scenarios that vary the number of groups and subjects (see Table 5.18) provides important information about sampling schemes in the multilevel structure (based on Model 4 with the full set of demographic and rehabilitation-service covariates). For the low level of cluster samples (i.e., number of groups=5), the mean bias is about 0.0164, the MSE is about 0.0009, and the proportion of successful hits is about 34%. For the medium level of cluster samples (i.e., number of groups=15), the mean bias is about 0.0152, the MSE is about 0.0004, and the proportion of successful hits is about 55%. For the high level of cluster samples (i.e., number of groups=25), the mean bias is about 0.0146, the MSE is about 0.0003, and the proportion of successful hits is about 71%. For the low level of subject samples (i.e., number of subjects=50), the mean bias is about 0.0160, the MSE is about 0.0007, and the proportion of successful hits is about 40%. For the medium level of subject samples (i.e., number of subjects=100), the mean bias is about 0.0154, the MSE is about 0.0004, and the proportion of successful hits is about 54%. For the high level of subject samples (i.e., number of subjects=150), the mean bias is about 0.0148, the MSE is about 0.0004, and the proportion of successful hits is about 66%.
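The bootstrap evaluation summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code. Offices are drawn without replacement and subjects are drawn with replacement within each selected office. The "true" ICC and its 95% confidence interval are taken from the full-sample fit, and `estimator` stands for whatever ICC estimator is being evaluated (for example, an ANOVA-based one). All function names are hypothetical.

```python
import random
from statistics import fmean

def resample(data, n_groups, n_subjects, rng):
    """Draw n_groups offices without replacement, then n_subjects subjects
    with replacement within each selected office (an assumed scheme)."""
    offices = rng.sample(sorted(data), n_groups)
    return [rng.choices(data[office], k=n_subjects) for office in offices]

def evaluate(boot_iccs, true_icc, ci_lower, ci_upper):
    """Bias, MSE, and hit rate as defined in the notes to Tables 5.12 and 5.18."""
    bias = fmean(b - true_icc for b in boot_iccs)
    mse = fmean((b - true_icc) ** 2 for b in boot_iccs)
    # Proportion of bootstrap estimates that fall inside the full-sample 95% CI
    hits = fmean(ci_lower <= b <= ci_upper for b in boot_iccs)
    return bias, mse, hits

def run_bootstrap(data, n_groups, n_subjects, estimator, true_icc, ci, reps=100, seed=0):
    """Repeat the resampling `reps` times and summarize the resulting ICC estimates.

    data maps an office identifier to a list of outcome values;
    estimator maps a list of groups (lists of outcomes) to an ICC estimate.
    """
    rng = random.Random(seed)
    boot = [estimator(resample(data, n_groups, n_subjects, rng)) for _ in range(reps)]
    return evaluate(boot, true_icc, *ci)
```

Under these assumptions, one cell of Table 5.18 would correspond to a call such as `run_bootstrap(data, 25, 150, estimator, true_icc, ci)`, where `true_icc` and `ci` are taken from the full-sample Model 4 results in Table 5.16.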
Overall, the sampling scheme with the high level of group samples (i.e., 25) and the high level of subject samples (i.e., 150) achieves the best outcome (i.e., the lowest bias and MSE and the highest proportion of successful hits). The sampling scheme with moderate cluster or subject samples (i.e., number of groups=15 or number of subjects=100) provides average performance of ICC estimates in the multilevel structure. The sampling scheme with the low level of group samples (i.e., 5) or the low level of subject samples (i.e., 50) is more likely to result in poor ICC estimates in the hierarchical linear modeling structure.

Table 5.13 ICC Estimates of Unconditional Model M1 for Outcome Measure Y2

| Model 1 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0211 | 0.0054 | 0.00 | 0.0127 | 0.0381 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0134 | 0.0042 | 0.00 | 0.0073 | 0.0257 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0408 | 0.0118 | 0.00 | 0.0227 | 0.0758 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0797 | 0.0237 | 0.00 | 0.0422 | 0.1434 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0171 | 0.0048 | 0.00 | 0.0100 | 0.0316 |
| Disability Type: VI | 87 | 29 | 3 | -0.0044 | 0.0798 | 0.50 | -0.1832 | 0.2422 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0273 | 0.0110 | 0.00 | 0.0117 | 0.0577 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0223 | 0.0092 | 0.00 | 0.0090 | 0.0488 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0342 | 0.0116 | 0.00 | 0.0171 | 0.0676 |
| Disability Type: ADHD | 443 | 33 | 13 | -0.0219 | 0.0081 | 0.87 | -0.0425 | 0.0198 |
| Disability Type: ID | 652 | 33 | 19 | 0.0233 | 0.0176 | 0.05 | -0.0034 | 0.0743 |
| Disability Type: TBI | 132 | 27 | 5 | -0.0773 | 0.0281 | 0.88 | -0.1447 | 0.0604 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0226 | 0.0225 | 0.13 | -0.0138 | 0.0899 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0190 | 0.0074 | 0.00 | 0.0085 | 0.0401 |
| Disability Type: SA | 577 | 31 | 18 | 0.0283 | 0.0207 | 0.04 | -0.0026 | 0.0853 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.14 ICC Estimates of Conditional Model M2 for Outcome Measure Y2

| Model 2 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0236 | 0.0059 | 0.00 | 0.0143 | 0.0423 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0135 | 0.0042 | 0.00 | 0.0074 | 0.0259 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0457 | 0.0126 | 0.00 | 0.0260 | 0.0838 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0869 | 0.0246 | 0.00 | 0.0471 | 0.1540 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0183 | 0.0050 | 0.00 | 0.0108 | 0.0336 |
| Disability Type: VI | 87 | 29 | 3 | -0.0016 | 0.0808 | 0.49 | -0.1811 | 0.2453 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0293 | 0.0115 | 0.00 | 0.0130 | 0.0610 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0227 | 0.0093 | 0.00 | 0.0093 | 0.0495 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0346 | 0.0117 | 0.00 | 0.0174 | 0.0682 |
| Disability Type: ADHD | 443 | 33 | 13 | -0.0218 | 0.0081 | 0.87 | -0.0425 | 0.0198 |
| Disability Type: ID | 652 | 33 | 19 | 0.0238 | 0.0177 | 0.05 | -0.0031 | 0.0750 |
| Disability Type: TBI | 132 | 27 | 5 | -0.0750 | 0.0287 | 0.87 | -0.1433 | 0.0636 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0233 | 0.0226 | 0.13 | -0.0134 | 0.0908 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0193 | 0.0074 | 0.00 | 0.0087 | 0.0405 |
| Disability Type: SA | 577 | 31 | 18 | 0.0283 | 0.0207 | 0.04 | -0.0026 | 0.0853 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).
Table 5.15 ICC Estimates of Conditional Model M3 for Outcome Measure Y2

| Model 3 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0217 | 0.0055 | 0.00 | 0.0131 | 0.0391 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0136 | 0.0042 | 0.00 | 0.0074 | 0.0260 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0429 | 0.0121 | 0.00 | 0.0241 | 0.0793 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0855 | 0.0244 | 0.00 | 0.0462 | 0.1520 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0175 | 0.0049 | 0.00 | 0.0103 | 0.0324 |
| Disability Type: VI | 87 | 29 | 3 | -0.0099 | 0.0777 | 0.52 | -0.1872 | 0.2361 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0273 | 0.0111 | 0.00 | 0.0117 | 0.0577 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0223 | 0.0092 | 0.00 | 0.0091 | 0.0488 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0359 | 0.0120 | 0.00 | 0.0182 | 0.0703 |
| Disability Type: ADHD | 443 | 33 | 13 | -0.0171 | 0.0100 | 0.81 | -0.0394 | 0.0274 |
| Disability Type: ID | 652 | 33 | 19 | 0.0275 | 0.0187 | 0.03 | -0.0007 | 0.0807 |
| Disability Type: TBI | 132 | 27 | 5 | -0.0727 | 0.0292 | 0.86 | -0.1419 | 0.0669 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0242 | 0.0229 | 0.12 | -0.0128 | 0.0922 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0193 | 0.0074 | 0.00 | 0.0087 | 0.0405 |
| Disability Type: SA | 577 | 31 | 18 | 0.0282 | 0.0207 | 0.04 | -0.0027 | 0.0851 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).
Table 5.16 ICC Estimates of Conditional Model M4 for Outcome Measure Y2

| Model 4 | Total Sample Size | Number of Groups | Within-Group Size | ICC Estimate | SE of ICC Estimate | p-value | Lower Bound of ICC | Upper Bound of ICC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall Sample | 11,819 | 33 | 356 | 0.0237 | 0.0059 | 0.00 | 0.0144 | 0.0424 |
| Work Experience: No | 8,821 | 33 | 266 | 0.0135 | 0.0042 | 0.00 | 0.0074 | 0.0259 |
| Work Experience: Yes | 2,998 | 33 | 90 | 0.0458 | 0.0126 | 0.00 | 0.0261 | 0.0840 |
| Significance of Disability: No | 1,233 | 33 | 36 | 0.0872 | 0.0246 | 0.00 | 0.0473 | 0.1544 |
| Significance of Disability: Yes | 10,586 | 33 | 319 | 0.0184 | 0.0050 | 0.00 | 0.0108 | 0.0337 |
| Disability Type: VI | 87 | 29 | 3 | -0.0015 | 0.0808 | 0.49 | -0.1810 | 0.2454 |
| Disability Type: HI | 1,989 | 32 | 61 | 0.0293 | 0.0115 | 0.00 | 0.0130 | 0.0610 |
| Disability Type: PI | 2,154 | 33 | 65 | 0.0227 | 0.0093 | 0.00 | 0.0093 | 0.0495 |
| Disability Type: LD | 2,276 | 33 | 68 | 0.0348 | 0.0117 | 0.00 | 0.0175 | 0.0684 |
| Disability Type: ADHD | 443 | 33 | 13 | -0.0216 | 0.0082 | 0.87 | -0.0424 | 0.0201 |
| Disability Type: ID | 652 | 33 | 19 | 0.0240 | 0.0178 | 0.05 | -0.0030 | 0.0753 |
| Disability Type: TBI | 132 | 27 | 5 | -0.0754 | 0.0286 | 0.87 | -0.1436 | 0.0630 |
| Disability Type: ASD | 436 | 33 | 13 | 0.0235 | 0.0227 | 0.12 | -0.0132 | 0.0912 |
| Disability Type: MI | 3,073 | 33 | 92 | 0.0193 | 0.0074 | 0.00 | 0.0087 | 0.0406 |
| Disability Type: SA | 577 | 31 | 18 | 0.0283 | 0.0207 | 0.04 | -0.0026 | 0.0853 |

Note1. VI=Visual Impairments or Blindness; HI=Hearing Impairments or Deafness; PI=Physical Impairments; LD=Learning Disabilities; ADHD=Attention Deficit Hyperactivity Disorder; ID=Intellectual Disability; TBI=Traumatic Brain Injury; ASD=Autism Spectrum Disorder; MI=Mental Illness; SA=Substance Abuse.
Note2. A p-value of 0.00 indicates that the level of significance is below 0.01 (i.e., p<0.01).

Table 5.17 Auxiliary Information of ICC Estimates for Outcome Measure Y2

| Modeling Structure | ICC Estimate | Between-Group Variance | Within-Group Variance | Design Effect (DE) | Between-Group Variance Ratio (Mk/M1) | Within-Group Variance Ratio (Mk/M1) | 1 - Between-Group Ratio | 1 - Within-Group Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Model 1 (M1) | 0.0211 | 2,275.62 | 105,264.82 | 8.4907 | NA | NA | NA | NA |
| Model 2 (M2) | 0.0236 | 2,304.53 | 95,000.22 | 9.3782 | 1.0127 | 0.9025 | -0.0127 | 0.0975 |
| Model 3 (M3) | 0.0217 | 2,282.94 | 102,665.47 | 8.7037 | 1.0032 | 0.9753 | -0.0032 | 0.0247 |
| Model 4 (M4) | 0.0237 | 2,305.32 | 94,718.63 | 9.4137 | 1.0131 | 0.8998 | -0.0131 | 0.1002 |

Note1. The ICC estimate for M1 shows the unconditional ICC quantity; M2-M4 show the conditional ICC quantity.
Note2. Relative efficiency measures for ICC estimates between the unconditional and conditional models (M1 versus M2-M4) are the ratios of each conditional model's between-group and within-group variances to those of M1, together with one minus each ratio.

Table 5.18 Evaluation of Bootstrap ICC Estimates for Outcome Measure Y2

| Number of Groups | Within-Group Size | Bias | MSE | Hits |
| --- | --- | --- | --- | --- |
| 5 | 50 | 0.0175 | 0.0013 | 0.20 |
| 5 | 100 | 0.0162 | 0.0007 | 0.37 |
| 5 | 150 | 0.0156 | 0.0006 | 0.45 |
| 15 | 50 | 0.0154 | 0.0005 | 0.47 |
| 15 | 100 | 0.0153 | 0.0003 | 0.51 |
| 15 | 150 | 0.0148 | 0.0003 | 0.66 |
| 25 | 50 | 0.0152 | 0.0004 | 0.52 |
| 25 | 100 | 0.0147 | 0.0003 | 0.75 |
| 25 | 150 | 0.0139 | 0.0002 | 0.86 |

Note1. Bias is defined as the mean difference between the bootstrap ICC estimates and the true ICC.
Note2. MSE is the mean squared difference between the bootstrap ICC estimates and the true ICC.
Note3. Hits is the proportion of bootstrap ICC estimates lying within the 95% confidence interval of the true ICC.

CHAPTER 6 CONCLUSION & DISCUSSION

6.1 Summary of the Results

The proposed method for ICC estimation and inference is applied to the real-world RSA-911 data set, in which the usable sample consists of individuals with disabilities served in the Michigan Rehabilitation Services Programs in FY 2015 (n=11,819). To address the research questions of the study, a two-level multilevel modeling approach for the cluster-randomized design data structure is used to fit the data simulations. Individual subjects are at level 1 (the average within-cluster size is 356 per unit), and rehabilitation offices are at level 2 (there are 33 vocational rehabilitation offices statewide in Michigan). Two types of multilevel models are used in the data simulations: (1) the unconditional model (Model 1) and (2) the conditional models (Models 2-4).
To evaluate which multilevel modeling structures best match which sampling schemes, a bootstrap resampling procedure is adopted in the data simulation and analysis. It compares the ICC estimates between the population (Research Question 1) and subsample (Research Question 2) models in terms of the statistical properties of accuracy and precision in ICC estimation and inference (Research Question 3).

(a) Research Question 1 for Outcome Measure Y1 (see Tables 5.7-5.10)

For the overall sample, the ICC estimate for competitive employment is about 0.01 on average (SE=0.003, p<0.01). By work experience (i.e., the subset with no work experience, in particular), the ICC estimate is inflated slightly (by about 0.002), as is the standard error (by about 0.001), compared with the overall sample. By significance of disability (i.e., the subset coded "no", in particular), the ICC estimate is inflated more (by about 0.02), as is the standard error (by about 0.01), compared with the overall sample. By disability type, the ICC estimate is inflated most for ASD (by about 0.05), followed by LD (about 0.02), HI (about 0.01), PI (about 0.01), and MI (about 0.01). Note also that VI has the highest ICC (about 0.07), but the estimate is not significant at the 0.05 level because of its small sample size (a total of 87 subjects across 29 office units).

(b) Research Question 1 for Outcome Measure Y2 (see Tables 5.13-5.16)

For the overall sample, the ICC estimate for quality employment is about 0.02 on average (SE=0.005, p<0.01). By work experience (i.e., the subset with work experience, in particular), the ICC estimate is inflated to some extent (by about 0.02), as is the standard error (by about 0.006), compared with the overall sample. By significance of disability (i.e., the subset coded "no", in particular), the ICC estimate is inflated considerably (by about 0.06), as is the standard error (by about 0.02), compared with the overall sample.
By disability type, the ICC estimate is inflated most for LD (by about 0.01), followed by SA (about 0.01), HI (about 0.01), and PI (about 0.001). Note also that the ICC estimate for MI is lower than that of the overall sample by about 0.002.

(c) Research Question 2 for Outcome Measure Y1 (see Tables 5.11-5.12)

The bootstrapped ICC estimates (100 repetitions) for competitive employment across sampling scenarios provide important sampling-design information for hierarchical modeling with the full set of covariates on individual characteristics and rehabilitation services. With a moderate number of clusters (e.g., about 10-15), the mean bias is about 0.005, the MSE is about 0.0002, and the proportion of successful hits is about 66%. With a moderate number of subjects per cluster (e.g., around 100), the mean bias is about 0.0053, the MSE is about 0.0002, and the proportion of successful hits is close to 60%. That is, the within-cluster sample size plays an auxiliary role in the quality of ICC estimation and inference, while the number of clusters largely determines the overall quality of the ICC estimates. In general, with more clusters (e.g., 15-25) and moderate-to-large within-cluster samples (e.g., 100-150 subjects per cluster), ICC estimation and inference perform effectively in terms of accuracy and precision. By contrast, with fewer clusters (e.g., 5 or fewer) or smaller within-cluster samples (e.g., 50 or fewer), the ICC estimate tends to be less reliable and more biased in the hierarchical generalized linear modeling framework for a binary outcome measure.
(d) Research Question 2 for Outcome Measure Y2 (see Tables 5.17-5.18)

The bootstrapped ICC estimates (100 repetitions) for quality of employment across resampling scenarios provide crucial sampling-design information for multilevel modeling with the full set of covariates on individual characteristics and rehabilitation services. With a moderate number of clusters (e.g., about 10-15), the mean bias is about 0.015, the MSE is about 0.0004, and the proportion of successful hits is about 55%. With a moderate number of subjects per cluster (e.g., around 100), the mean bias is also about 0.015, the MSE is about 0.0004, and the proportion of successful hits is likewise close to 55%. That is, the within-cluster sample size plays a supplemental role in ICC estimation and inference, while increasing the number of clusters still boosts the performance of the ICC estimates. In general, with more clusters (e.g., 15-25 or more) and moderate-to-large within-cluster samples (e.g., 100-150 or more subjects per cluster), ICC estimation and inference perform effectively in terms of accuracy and precision. On the other hand, with fewer clusters (e.g., 10 or fewer) or smaller within-cluster samples (e.g., 50 or fewer), the ICC estimate is prone to be less consistent and more biased in the hierarchical linear modeling framework for a continuous outcome measure.

(e) Research Question 3 for Outcome Measure Y1 (see Tables 5.11-5.12)

For the auxiliary statistics of the ICC estimates for competitive employment, the unadjusted ICC is about 0.01 (DE=4.44), and the adjusted ICC is also about 0.01 (average DE across Models 2-4 = 4.67). The unconditional model serves as the baseline for measuring the relative efficiency of the between- and within-group variances of the ICC estimates in the conditional models.
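Judging from the values in Table 5.11, the four relative-efficiency columns appear to be each conditional model's between-group and within-group variance divided by the corresponding Model 1 variance, together with one minus each ratio, so a positive complement is a percentage decrease. A minimal sketch under that reading, using the variance components reported in Table 5.11 (the function name is hypothetical):

```python
# Between- and within-group variances copied from Table 5.11 (outcome Y1)
VARIANCES = {
    "M1": (0.0024, 0.2458),
    "M2": (0.0024, 0.2383),
    "M3": (0.0025, 0.2260),
    "M4": (0.0024, 0.2375),
}

def relative_efficiency(model, baseline="M1", variances=VARIANCES):
    """Return (between ratio, within ratio, 1 - between ratio, 1 - within ratio)."""
    between_k, within_k = variances[model]
    between_0, within_0 = variances[baseline]
    rb = between_k / between_0
    rw = within_k / within_0
    return rb, rw, 1 - rb, 1 - rw

for model in ("M2", "M3", "M4"):
    rb, rw, cb, cw = relative_efficiency(model)
    print(f"{model}: between ratio={rb:.4f}, within ratio={rw:.4f}, "
          f"within-group reduction={cw:.2%}")
```

Read this way, the complements of the within-group ratio give the 3.05%, 8.06%, and 3.38% reductions cited for Models 2, 3, and 4.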
Among the three competing conditional models (Models 2-4), Model 3 (the model with service-information covariates) shows the largest decrease in within-group variation (8.06%) together with a 4.17% increase in between-group variation, compared with the baseline model (Model 1). Model 2 (demographic model) and Model 4 (full model) perform similarly, with decreases in within-group variation of 3.05% and 3.38%, respectively, and no change (0.00%) in between-group variation relative to the baseline.

(f) Research Question 3 for Outcome Measure Y2 (see Tables 5.17-5.18)

For the auxiliary statistics of the ICC estimates for quality of employment, the unadjusted ICC is about 0.02 (DE=8.49), and the adjusted ICC is also about 0.02 (average DE across Models 2-4 = 9.17). The unconditional model serves as the baseline for measuring the relative efficiency of the between- and within-group variances of the ICC estimates in the conditional models. Among the three competing conditional models (Models 2-4), Model 4 (the full covariate set of demographics and services) and Model 2 (demographic covariates) show the largest decreases in within-group variation (10.02% and 9.75%, respectively; about 9.88% on average). They also show slight increases in between-group variation (1.31% and 1.27%; about 1.29% on average), compared with the baseline model (Model 1). Model 3 (service model) performs relatively poorly, with a modest 2.47% decrease in within-group variation and a small 0.32% increase in between-group variation, compared with the baseline model.

6.2 Implications

(a) Statistical perspectives on ICC estimation and inference

The intraclass correlation coefficient (ICC) is one of the oldest statistical measures in experimental design, dating back to its introduction by Sir R. A. Fisher (Fisher, 1925a). It has since become one of the most popular and important tools in scientific inquiry, including educational and social research.
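Fisher's original form of the intraclass correlation for paired measurements pools the mean and variance over both members of every pair. The sketch below (function names are illustrative) computes it directly. It then computes the same quantity through the identity r = 1 - D/2, where D is the mean squared distance between the standardized members of each pair. This identity is one concrete way to read the geometric link between the ICC and Euclidean distance.

```python
from statistics import fmean

def _pooled(pairs):
    """Pooled mean and variance (divisor 2N) over both members of every pair."""
    values = [v for pair in pairs for v in pair]
    m = fmean(values)
    return m, fmean((v - m) ** 2 for v in values)

def fisher_icc(pairs):
    """Fisher's intraclass correlation for N pairs of measurements."""
    m, s2 = _pooled(pairs)
    return fmean((a - m) * (b - m) for a, b in pairs) / s2

def fisher_icc_from_distance(pairs):
    """Same quantity via the mean squared Euclidean distance on the standardized scale."""
    m, s2 = _pooled(pairs)
    s = s2 ** 0.5
    d2 = fmean(((a - m) / s - (b - m) / s) ** 2 for a, b in pairs)
    return 1 - d2 / 2

twins = [(1.0, 2.0), (3.0, 3.0), (5.0, 4.0), (2.0, 1.0)]
print(fisher_icc(twins), fisher_icc_from_distance(twins))
```

Because the standardized values have mean square 1 over all 2N measurements, the mean squared within-pair distance equals 2(1 - r). Identical pairs therefore give r = 1, and pairs that are mirror images around the pooled mean give negative values.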
From a theoretical perspective, the correlation coefficient and the intraclass correlation share mathematical similarities and features. For example, the ICC can be used to measure the degree of similarity or resemblance within a group of measurements (e.g., students in a classroom or school). The general formula of the intraclass correlation can be written in a form very similar to that of the ordinary (Pearson) correlation, with the mean and variance pooled over all members of the group. Fisher (1925a) also pointed out that the ICC is geometrically related to the overall Euclidean distance between the paired samples on the standardized scale (see Figures 2.2 and 2.3 for examples). In terms of effect size measures, both the correlation and the ICC can describe the magnitude of a phenomenon of interest. In particular, the ICC shows the proportion of total variance explained by between-group variation in an experimental design model (e.g., hierarchical linear models). In this sense it is another form of the squared correlation (R-squared) in analysis of variance models, reflecting the true proportion of outcome variance across different clusters.

One research gap in the methodology for ICC estimation and inference concerns the test statistic and its sampling distribution. This study addresses this issue by developing the mathematical foundations of the ICC estimator in a hierarchical design (e.g., cluster randomized trials). Donner and Koval (1980a) derived the maximum likelihood estimator (MLE) of the intraclass correlation using variance components in analysis of variance (ANOVA) models. Because the traditional (Fisher) approach requires distributional assumptions based on multivariate normal theory, classical ANOVA provides an alternative estimator of the intraclass correlation that relaxes the multivariate normality assumption.
This extends the use of the relevant information in the ANOVA table (i.e., the between- and within-group variation) to develop a general statistical framework for the ICC in a multilevel structure (i.e., a flexible approach covering either a balanced design with equal group sizes or an unbalanced design with unequal group sizes). Note that the approximate group size (the average within-group size $n_0$ of Donner & Koval, 1980a) is key to computing the proposed ICC estimator in the unbalanced case (see Figure 2.4 for an illustration).

For statistical testing of the proposed ICC estimator ($\hat{\rho}$), this study suggests the F-distribution (with $J-1$ and $N-J$ degrees of freedom, where $J$ is the number of groups and $N$ the total sample size) and the ANOVA F statistic for determining whether the null or the alternative hypothesis about the magnitude of effects holds at the chosen level of significance (i.e., $H_0\colon \rho = 0$ vs. $H_1\colon \rho > 0$). A significant F statistic implies that members of the same group tend to be more alike with respect to the attribute or characteristic in question than those from different groups (in the extreme, if within-group subjects are perfectly homogeneous, or equivalently $\sigma^2 = 0$, then $\rho = 1$). For a confidence interval on the ICC, this study provides formulas for the corresponding interval on the ICC estimand (i.e., the true proportion of variance accounted for by a grouping factor of interest in a hierarchical design). Note also that the lower confidence limit of an ICC interval estimate can be negative under the proposed method, especially when the sample size is small or the measurement error is large; because the ICC is non-negative by definition, it is common practice to set negative limits to zero as a post hoc adjustment (Hays, 1994).
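The ANOVA-based estimate, F test, and confidence interval described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes a one-way random-effects layout with $J$ clusters, uses `scipy.stats.f` for the F distribution, and builds the interval by inverting the pivot $F/\theta$, with $\theta = (1 + (n_0 - 1)\rho)/(1 - \rho)$, which follows an F distribution with $J-1$ and $N-J$ degrees of freedom. The interval is exact for balanced designs and only approximate when $n_0$ stands in for unequal group sizes. The function name `anova_icc` is hypothetical.

```python
import numpy as np
from scipy import stats


def anova_icc(groups, alpha=0.05):
    """One-way random-effects ANOVA estimate of the ICC, with an F test and CI.

    groups: sequence of 1-D array-likes, one per cluster (sizes may differ).
    The point estimate is returned untruncated; only the CI limits are floored at 0.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = np.array([g.size for g in groups])
    J, N = len(groups), int(sizes.sum())
    grand_mean = np.concatenate(groups).mean()

    # Between-group (A) and within-group (E) sums of squares and mean squares
    ssa = sum(n * (g.mean() - grand_mean) ** 2 for n, g in zip(sizes, groups))
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df1, df2 = J - 1, N - J
    msa, mse = ssa / df1, sse / df2

    # Average group size n0; equals the common group size in a balanced design
    n0 = (N - (sizes ** 2).sum() / N) / (J - 1)

    icc = (msa - mse) / (msa + (n0 - 1) * mse)

    # F test of H0: rho = 0 against H1: rho > 0
    F = msa / mse
    p_value = stats.f.sf(F, df1, df2)

    # Invert F / theta ~ F(df1, df2), theta = (1 + (n0 - 1) rho) / (1 - rho)
    F_L = F / stats.f.ppf(1 - alpha / 2, df1, df2)
    F_U = F * stats.f.ppf(1 - alpha / 2, df2, df1)
    lower = (F_L - 1) / (F_L + n0 - 1)
    upper = (F_U - 1) / (F_U + n0 - 1)

    # Post hoc adjustment: negative limits are set to zero
    return {
        "icc": float(icc),
        "F": float(F),
        "df": (df1, df2),
        "p": float(p_value),
        "ci": (max(float(lower), 0.0), max(float(upper), 0.0)),
    }


if __name__ == "__main__":
    print(anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

For the toy input in the usage line, MSA = 27 and MSE = 1, so the estimate is 26/29 (about 0.897) with F = 27 on (2, 6) degrees of freedom.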
For the variance of the proposed ICC estimator, this study uses the MLE approach (multivariate normality under large-sample theory) of Donner & Koval (1980a) to obtain the standard error of the ICC estimate. Interestingly, for a balanced design in hierarchical modeling the MLE of the ICC has a closed form (i.e., a quick shortcut solution for ICC estimation); for an unbalanced design, however, the MLE of the ICC must be obtained by a different approach, either numerical optimization of the multivariate log-likelihood (Donner & Koval, 1980b) or the invariance property of MLEs (Karlin et al., 1981). The approach of Hedges & Hedberg (2007) uses the ICC via hierarchical modeling to capture the clustering information in the variance components of cluster randomized trials (CRTs). CRTs have become increasingly popular in education and social studies for practical reasons: an RCT (randomized control trial) that assigns each individual subject can be too expensive, whereas a CRT is more economical because it assigns an entire intact group at once. Since the ICC is considered an ancillary statistic that provides the design effect (DE, or variance inflation factor, VIF) for statistical planning in multilevel designs, it plays a key role in quantifying the amount of inherent clustering in a CRT survey study (Hedges et al., 2012; Hedges & Hedberg, 2013). It is important to note that, because of cluster-to-cluster as well as within-cluster variation, the sampling variance of an estimate under a clustered design (CRT) exceeds that under simple random sampling (RCT) by a factor of DE (which is why DE is also called the VIF). For experimental designs with a binary outcome (e.g., a dichotomous variable), the proposed ICC estimator in this study is derived within the hierarchical generalized linear modeling framework (HGLM; Raudenbush & Bryk, 2002).
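The quantities just discussed can be sketched in a few lines. This is a minimal illustration, not the dissertation's code, and the function names are hypothetical. Assumptions: the standard error uses the normal-theory large-sample variance commonly quoted for the ICC in balanced designs (unbalanced designs need the numerical approaches cited above); the design effect assumes equal cluster sizes; and the binary-outcome ICC uses the conventional logit-scale level-1 variance $\pi^2/3$ discussed in the next paragraph, with a `within_var` argument standing in for the data-driven alternative the study recommends.

```python
import math

# Variance of the standard logistic distribution: the conventional level-1
# (within-group) variance on the logit scale for a binary outcome
LOGISTIC_WITHIN_VAR = math.pi ** 2 / 3


def icc_se_balanced(icc: float, n: int, k: int) -> float:
    """Approximate large-sample SE of the ICC for k groups of equal size n.

    Assumes Var = 2 (1 - rho)^2 [1 + (n - 1) rho]^2 / [k n (n - 1)],
    the normal-theory variance often quoted for balanced designs.
    """
    var = 2.0 * (1.0 - icc) ** 2 * (1.0 + (n - 1) * icc) ** 2 / (k * n * (n - 1))
    return math.sqrt(var)


def design_effect(icc: float, cluster_size: float) -> float:
    """Design effect (VIF) for equal cluster sizes m: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def latent_icc_logit(tau00: float, within_var: float = LOGISTIC_WITHIN_VAR) -> float:
    """Latent-scale ICC for a two-level random-intercept logistic model.

    within_var defaults to pi^2 / 3; pass another value to relax that convention.
    """
    return tau00 / (tau00 + within_var)
```

For example, `design_effect(0.02, 101)` gives 3.0: with 101 subjects per cluster, an ICC of 0.02 triples the sampling variance relative to simple random sampling.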
It is conventional (and mathematically convenient) to use a constant within-group variance (i.e., $\pi^2/3 \approx 3.29$) based on the standard logistic distribution (location 0 and scale 1). Because this strong assumption about the within-group variance is often not met in the real world, the study recommends a more flexible estimation procedure that incorporates a data-driven within-group variance via HGLM for the proposed ICC estimation and inference.

Last but not least, the proposed ICC method is also connected with statistical planning in experimental design for sample size determination and power calculation, which is critical for researchers conducting rigorous scientific investigations that detect true effects at a desired effect size, statistical power, and significance level. Traditionally, sample size and power calculations require the restrictive classical assumption of simple random sampling, which is not met in multilevel modeling. Hence, this study proposes a theoretical framework for the ICC estimator that circumvents this shortcoming by taking into account heterogeneity in the hierarchical structures of cluster samples (such as CRTs). The proposed ICC estimation and inference is feasible via the between- and within-group variances in the ANOVA of hierarchical linear modeling, and the test statistic is based on the F-distribution, which serves as a foundation for statistical inference on the ICC estimand in a multilevel design.

(b) Policy perspectives on the ICC estimation and inference

In behavioral, educational, psychological, and social research, the cluster randomized design, which assigns intact groups (e.g., classrooms or schools) to interventions, has been increasingly adopted in the era of evidence-based education and policy (Lingard, 2013).
Since an experimental design with such cluster randomization yields a hierarchical data structure (i.e., subjects nested within clusters), statistical planning requires relevant ICC information to account for clustering effects, achieve adequate power, and collect a sufficient sample. Using the real RSA-911 data set from the U.S. Department of Education, this study provides a comprehensive analysis of the ICCs of employment outcomes (i.e., the competitive employment and quality of employment measures), adjusted by covariates of interest (i.e., demographics and rehabilitation services), that can be used for statistical planning of CRT research (randomized trials or quasi-experiments) in future education studies. In addition, this study provides information on the relative variance components (i.e., between-group and within-group variation) that can help researchers decide which types of covariates to include in a multilevel design for statistical planning and analysis.

In the era of evidence-based practice (EBP) in rehabilitation counseling and education, researchers are increasingly aware of the need to incorporate scientific approaches that empower people with impairments through effective services (Chan et al., 2009). Under the recent Workforce Innovation and Opportunity Act of 2014 (WIOA), state and federal VR agencies must assist the target disability populations to succeed in the labor markets of the global economy (WIOA Legislation, 2018). Thus, rehabilitation counselors, educators, practitioners, and researchers all need to work together to adopt the new EBP paradigm and improve the quality of life of VR customers through rehabilitation services. Further, evidence-based best practices in rehabilitation counseling would significantly improve outcomes for people with disabilities by translating knowledge and supporting good decisions in VR (Leahy et al., 2009, 2010, 2014a, 2014b).
The use of EBP has become a new standard for conducting effective research and gathering reliable data to improve practices and outcomes (Eignor, 2013). Rehabilitation counselors and practitioners can integrate the best EBP research evidence with clinical judgment expertise to make better decisions that enhance outcomes, so EBP can substantially improve knowledge translation in practice (Kosciulek, 2010). Thus, not only does EBP provide a foundation for incorporating scientific evidence together with clinical judgment expertise to make the best decisions about interventions, services, or treatments for people with disabilities, but it also helps rehabilitation counselors identify relevant literature and assess available information such as the RSA-911 data […] services for people with disabilities. Under this data-driven framework with the RSA-911, this study provides the proposed ICC method for a multilevel data structure (i.e., individual subjects at level 1 and rehabilitation office units at level 2), which can help rehabilitation counseling researchers better understand the target population of people with disabilities when conducting CRT design and analysis to gather relevant EBP information, by taking into account the clustering effects captured by the ICC (with respect to the office units statewide) in the RSA-911 data using hierarchical linear models.

Hierarchical data structures are ubiquitous in education and social studies (Raudenbush & Bryk, 1992). In rehabilitation counseling and education, for example, clients are nested within field offices, which are nested within local districts; local districts are in turn nested within states, states within regions, and so on. It is therefore important to take all these multilevel structures and their topological relationships into account by using the hierarchical modeling framework for design and analysis.
Regarding the origin of the RSA-911 data, the Rehabilitation Services Administration Case Service Report (RSA-911 for short) is the report through which state vocational rehabilitation agencies collect and report summary data in a federally mandated format. The RSA-911 provides researchers with a good resource for gathering evidence for EBP (Schwanke & Smith, 2004). Through data mining and deep learning on the RSA-911 data, rehabilitation researchers can study complex issues to build EBP for people with disabilities (Pi & Thielsen, 2011), and they can also explore the RSA-911 as big data to examine which factors (e.g., variables at the individual level or the office level) affect VR outcomes, how they do so, and for which disability groups. Therefore, rehabilitation researchers can exploit the RSA-911 data to develop EBP (either through CRT designs or quasi-experimental analyses), in particular for conducting individual-level and employment-related interventions, finding effective strategies for improving VR outcomes, and identifying best VR practices to achieve successful outcomes for individuals with disabilities (Fleming et al., 2013; Pi, 2006).

In the previous multilevel modeling literature using RSA-911 data, Alsaman & Lee (2017) examined the cross-sectional interrelationships among contextual factors (unemployment rates at the state level), individual factors (demographic background at the person level), and employment outcomes (competitive employment as a binary measure) for youth with disabilities using the two-level hierarchical generalized linear modeling (HGLM) framework. Chan et al. (2014) studied the impact of the economic recession on VR employment by controlling for the contextual factor of each state's unemployment rate, applying a two-level HGLM approach. Pi (2006) used a two-level HLM with micro- and macro-level factors related to VR outcomes to study the relationships among predictors across levels in VR.
One knowledge gap in the rehabilitation counseling research literature on ICC applications concerns how to incorporate relevant ICC information into the design and analysis of RSA-911 data, accounting for clustering effects through the ICC and the related DE estimates in multilevel models. This study addresses that issue by examining ICC values via HLM and HGLM. The proposed framework for ICC estimation and inference is examined with the real-life RSA-911 data set, where the target sample is people with disabilities served by Michigan Rehabilitation Services in FY 2015 (n = 11,819). To address the study's ICC-related research questions, the simulations use two-level HLM and HGLM approaches for a CRT (or clustered RCT) type of study design, with individual subjects at level 1 and cluster units at level 2. Results show that: (i) the overall ICC estimates for both outcome measures (competitive employment and quality employment) tend to be low (0.01 and 0.02, respectively), implying that the clustering of rehabilitation office structures does not capture much of the total variation in the RSA-911 data; (ii) rehabilitation services play a bigger role than individual characteristics in accounting for total variation in both employment outcome measures; (iii) previous work experience, significance of disability, and type of disability (i.e., covariates for subgroup analysis) can affect the outcome measures and also show differences in the ICC estimates, which indicates that researchers should pay attention to groups with high ICC values when conducting a CRT design study; and (iv) should a CRT experiment be conducted, the recommended minimum is about 10-15 cluster units and about 100-150 person subjects, to attain a sample of sufficient quality for analysis.
Interestingly, the average (unadjusted) ICC estimates in the simulation study are comparable to those from psychological and mental health data in school-based intervention designs, where ICCs range from 0.01 to 0.05 (Murray & Short, 1995), although they are lower than the benchmarks of 0.05-0.15 based on education data in reading and mathematics across Grades K-12 (Bloom et al., 1999, 2007; Hedges & Hedberg, 2007; Schochet, 2008). A low ICC indicates small clustering effects in multilevel design and analysis; even so, with large clusters the design effect can still be substantial, so the effective sample size (i.e., the total sample size divided by the design effect) is reduced, and the minimum total sample size needed to maintain high statistical power and low standard errors for the same model rises accordingly.

6.3 Limitations of the Study

There are four limitations in the study. (1) Among the different effect magnitude measures related to the correlation ratio ($\eta^2$), the intraclass correlation (ICC; $\rho$) is a parametric estimator in ANOVA via HLM that quantifies the true proportion of the total outcome variance ($\tau_{00} + \sigma^2$) that lies between groups. Although the underlying ANOVA framework in HLM assumes that the total variance consists of two independent, non-negative variance components, the group variance ($\tau_{00}$) and the error variance ($\sigma^2$), the unbiased (ANOVA) estimate of the group variance can turn out zero or negative, especially when MSE is greater than or equal to MSA (Hays, 1994). As a consequence, the ICC estimate is forced to zero, which is signaled by a warning of estimation failure from HLM or HGLM commands in statistical software (such as the lmer or glmer functions of the lme4 package in R). In such cases of estimation failure, model modification is suggested to remedy the situation in which there is more within-group error variation than between-group variation, i.e., MSE ≥ MSA, in ANOVA via HLM (Raudenbush & Bryk, 2002).
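The design-effect arithmetic behind the effective sample size, and the zero truncation described in limitation (1), can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the design effect assumes equal cluster sizes, and the truncation rule mirrors the post hoc adjustment described in the text rather than the behavior of any particular software package.

```python
def design_effect(icc: float, cluster_size: float) -> float:
    """1 + (m - 1) * rho for equal cluster sizes m."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(total_n: float, icc: float, cluster_size: float) -> float:
    """Total sample size divided by the design effect."""
    return total_n / design_effect(icc, cluster_size)


def required_total_n(n_srs: float, icc: float, cluster_size: float) -> float:
    """Sample size from a simple-random-sampling calculation, inflated by the DE."""
    return n_srs * design_effect(icc, cluster_size)


def truncated_icc(msa: float, mse: float, n0: float) -> float:
    """ANOVA ICC estimate, set to zero when MSE >= MSA makes the raw value non-positive."""
    raw = (msa - mse) / (msa + (n0 - 1.0) * mse)
    return max(raw, 0.0)


# Illustration with the study's overall figures, assuming equal office sizes
m = 11_819 / 33  # about 358 clients per office
print(round(design_effect(0.01, m), 2))
print(round(effective_sample_size(11_819, 0.01, m)))
```

Under that equal-size assumption, an ICC of about 0.01 with roughly 358 clients per office gives a design effect of roughly 4.57, so the 11,819 clients carry about as much information as roughly 2,585 independent observations. These are back-of-the-envelope numbers for illustration, not estimates reported by the study.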
(2) In the simulation using the RSA-911 data, there are other options for building a multilevel design and analysis for ICC estimation and inference. In this study, a two-level hierarchical design structure (i.e., individuals at level 1 and offices at level 2) is fitted with HLM and HGLM to find the unadjusted ICC (from the unconditional model without any covariates) and the adjusted ICC (from the conditional model with covariates). On the other hand, an alternative modeling choice is the latent variable modeling (LVM) approach to investigating the multilevel RSA-911 data. Austin & Lee (2014) built a structural equation model (SEM) of VR services with the RSA-911 to study predictors of employment outcomes in VR for people with intellectual and co-occurring psychiatric disabilities. Alsaman & Lee (2017) examined the relationships among contextual factors, individual factors, and employment outcomes of transition youth with disabilities in VR using the RSA-911 data with a two-level HGLM (individuals at level 1 and states at level 2). Since the current study does not use latent factors, owing to the limitations of the HLM and HGLM modeling structure, the alternative LVM approach could provide a holistic modeling structure with both latent constructs and manifest variables to study latent factor structures of interest (Raykov & Marcoulides, 2006). In the VR context, SEM can also be used to examine important predictive associations among individual characteristics, rehabilitation services, and employment outcomes, while HLM is suited to a multilevel design (such as CRT).
(3) For the sake of simplicity, the simulation study using the RSA-911 data does not consider any interactions at the person level or the office level (e.g., among demographic variables and service indicators at level 1, or their group means at level 2), although some two-way interactions may exist between the individual characteristic and rehabilitation service variables. For example, age group (X3) may be related to education (X5) and rehabilitation services (X6-X8) for both employment outcome measures (Y1 and Y2), according to the sample correlation structures of all predictors in the hierarchical analysis (see Tables 5.4 and 5.5). If such important two-way interactions were added to the HLM and HGLM, the ICC estimation and inference could be influenced to some degree, because the between- and within-group variation would be affected by the new predictors. Theoretically, after adding significant predictors to an HLM or HGLM model, the MSE (within-group variation) would decrease to some extent, and the new ICC could increase to a certain degree compared with the old ICC (based on the baseline model without the newly added two-way interactions). (4) ICC estimation requires a minimum total sample size ($N$), number of groups ($J$), and within-group size ($n$). If any of these criteria ($N$, $J$, and $n$) is not met, an invalid ICC estimate is very likely (either the ICC estimate is negative or zero, or the lower bound of the confidence interval is not positive).
For example, the lower bound of the ICC confidence interval (CI) for visual impairments (VI) on Y1 under Model 1 is not valid (see Table 5.7), owing to the small total sample size and within-group size; similarly, the lower bound of the ICC CI for VI on Y2 under Model 1 is not valid either (see Table 5.13), and the ICC estimate itself is negative, again owing to the small total sample size and within-group size. Future research is needed to determine the threshold of the sample size criteria, that is, the minimum sample size for ICC estimation and inference in HLM and HGLM. From the simulations, the rule of thumb is a total sample size ($N$) greater than 600 and a within-group size ($n$) larger than 20, given about 30 groups ($J$). In other words, the quick formula is $N = J \times n$, where $N$ is the total sample size, $J$ is the number of groups, and $n$ is the within-group size; the simulation finding of the study (based on the RSA-911 data) suggests that the criterion $N = J \times n > 600$ (e.g., $J \approx 30$ and $n > 20$) makes it more likely that ICC estimation and inference yield a valid and reliable result for a CRT (or clustered RCT) via the HLM and HGLM framework using the RSA-911 data.

6.4 Future Research

Future work should address the following five potential issues that have not been fully addressed in this study. First, in the traditional approach to ICC estimation, the practical method is based on a two-level multilevel structure (e.g., the person level as Level 1 and the group level as Level 2), where ICC estimation uses the relevant information in the ANOVA table, including the sources of both between- and within-group variation in the HLM and HGLM framework.
For more complex multilevel structures in CRT experiments (e.g., 3-level and 4-level hierarchical designs), ICC estimation (using variance component decomposition in ANOVA via HLM) has been discussed (Hedges et al., 2012; Hedges & Hedberg, 2013), but ICC inference (hypothesis testing with confidence intervals and p-values) has not yet been developed for complex 3-level or 4-level multilevel models. One statistical challenge in this development is to find an effective way to quantify the standard error of the ICC (based on the pooled weighted variance of the ICC across levels) in complex multilevel designs via the HLM or HGLM framework, or to extend the two-level framework of this study to 3- or 4-level HLM or HGLM by using multiple-comparison correction methods (e.g., […] and the Benjamini-Hochberg procedure) to control the familywise Type I error rate (the probability of making one or more Type I errors when performing multiple hypothesis tests) or the false discovery rate (the expected proportion of false discoveries among the rejected hypotheses).

Second, complex data integration (or data fusion) has become an important issue in today's big-data era, and researchers may look into multiple sources of large-scale complex data sets (or data platforms) to conduct interdisciplinary studies. For example, it would be interesting to integrate the RSA-911 data with a set of covariates from Census data for a comprehensive investigation of how the between- and within-group sources of variation behind the ICC estimates vary, from the perspective of statistical effectiveness in design and analysis, for estimation and inference at each level of multilevel modeling across different data platforms.
In such a way, multilevel design models are inherently nested at each level in different data platforms (note: a data platform can be treated as an additional level in the HLM and HGLM framework). Given this complex design structure (multiple data platforms), it would be interesting to study how statistical planning can be conducted for power and sample size calculations, and how much the ICC estimates vary across platforms (using sensitivity analysis). Third, covariate adjustment is an important technique in statistical modeling for taking confounder effects into account in a model (HLM or HGLM). In complex multilevel designs (i.e., more than two levels in hierarchical models), it would be interesting to understand how covariate adjustment (with or without subgroup analysis or stratification) affects adjusted ICC estimation and inference. In this study (a two-level hierarchical design), covariate adjustment can improve the ICC estimates to some extent, yet in some cases (especially with a small total sample size or within-group sample size) the ICC estimation and inference do not work at all (i.e., estimation failure). Therefore, it would be important to develop remedial strategies for statistical adjustment and stratification in complex multilevel designs via HLM and HGLM, and to identify which statistical centering or standardizing procedures (e.g., group-mean or grand-mean centering or standardizing) can be used to customize covariate adjustment at each level, making ICC estimation and inference more accurate and precise by accounting for the localized multilevel substructure adjusted by covariates. Fourth, this study considers only one year of RSA-911 data (FY 2015) in the simulations to test the proposed method of ICC estimation and inference.
It would be interesting to study the statistical properties of the ICC by extending the current framework to a complex multilevel structure, such as a longitudinal design across multiple years or a cross-cohort design with multiple years of data. In this type of complex multilevel modeling structure (e.g., longitudinal analysis in HLM and HGLM), the variance-covariance structure (e.g., compound symmetry for homogeneous data and an autoregressive structure for heterogeneous data) needs to be specified so as to take into account the correlation structure across different time periods or cohorts. In addition, it would be interesting to use multiple years of RSA-911 data sets to verify the statistical performance of ICC estimation and inference in terms of consistency and efficiency. Lastly, missing data is a common issue in statistics. Although listwise deletion (i.e., including only complete cases and excluding subjects with any incomplete information) is a convenient way to deal with missing data, it often loses much statistical information and compromises statistical power in the analysis (e.g., HLM or HGLM). Hence, it would be important to study how to cope with missing values (assuming missing at random) in a multilevel data structure for ICC estimation and inference, and which remedial procedures (EM or multiple imputation for discrete or continuous variables) can improve the ICC estimation process, via sensitivity analysis in HLM or HGLM. For the proposed ICC method with fully complete data, the simulation results suggest that the total sample size should be greater than 1,500 and the within-group sample size larger than 100 (with over 15 groups). Nevertheless, these guidelines need to be adjusted for the incomplete-data case.

6.5 Conclusion

In conclusion, this study provides a comprehensive methodology for intraclass correlation (ICC) estimation and inference using the hierarchical mixed modeling framework.
The proposed methodology for ICC estimation and inference incorporates the analysis of variance (ANOVA) approach into the development of the ICC estimator and of a pivotal quantity for the ICC estimand, from which the sampling distribution (F-distribution) is derived to test the ICC and to construct confidence intervals on it. The proposed statistical procedures for ICC estimation and inference can be readily applied to any large-scale or small-scale data set, although small total sample sizes, small within-group sizes, and missing data are limitations that can affect the precision and accuracy of the ICC estimates to a certain degree. More research is needed to better understand the ICC in complex multilevel design structures.

APPENDICES

APPENDIX A: Definitions of the VR Variables in RSA-911

The following are the definitions of the VR variables, according to the RSA-911 manual (Policy Directive RSA-PD-16-04, Revision of RSA-PD-14-01; https://www2.ed.gov/policy/speced/guid/rsa/subregulatory/pd-16-04.pdf). This appendix includes three tables: (1) VR services are shown in Table A.1; (2) demographic backgrounds are listed in Table A.2; and (3) rehabilitation outcomes are given in Table A.3.

Table A.1. List of the Definitions of VR Service Variables Used in the Study

Rehabilitation Service: RSA Definition
Job Placement Assistance: This is a referral to a specific job resulting in setting up a job interview and obtaining a job on behalf of a customer (1 = received; 0 = not received)
On-the-Job Supports: Services such as job coaching and follow-along services to assist a customer to adjust to the job and become stable, to enhance job retention (1 = received; 0 = not received)
Rehabilitation Technology: The application of rehabilitation engineering, assistive devices, technologies, or services to meet the needs and address the barriers (1 = received; 0 = not received)

Table A.2.
List of the Definitions of VR Demographic Variables Used in the Study

VR Demographics: RSA Definition
Age: Age of the individual when he or she applied for VR services (continuous measure)
Gender: Indicates whether an individual is male or female (1 = male; 0 = female)
Minority (Non-White): Indicates whether s/he is a minority (including Black, Native, Asian, Pacific Islander, and Hispanic) or not (White) (1 = minority; 0 = non-minority)
Social Security Benefits (Insurance Benefits): Indicates whether an individual receives Social Security Disability Insurance (SSDI) or Supplemental Security Income (SSI) (1 = received; 0 = not received)
Employment Status at Application (Previous Work Background): Employment status of the individual at application (1 = employed; 0 = not employed)
Type of Disability: Impairment types include blindness/visual impairment, deafness/hearing impairment, physical or orthopedic/neurological impairment, LD, ADHD, intellectual disability (ID), TBI, autism, mental illness (MI), and substance abuse (SA) (categorical/qualitative measure)
Level of Education: Level of education the individual had attained, including elementary/secondary education, special education, high school graduate or equivalency certificate (GED), and college or above (categorical/ordinal measure)
Significance of Disability: Whether the individual was considered a person with a significant disability or a most significant disability during VR (1 = yes; 0 = no)

Table A.3.
List of the Definitions of VR Outcome Variables Used in the Study

Rehabilitation Outcome: RSA Definition
Rehabilitation Outcome: Individual exited the VR program either with or without an employment outcome after receiving services (1 = exited with employment; 0 = exited without employment)
Competitive Employment: Employed at or above minimum wage in an integrated setting (1 = yes; 0 = no)
Weekly Earnings (or Quality of Employment): The approximate amount of money earned in a typical week (continuous measure)

APPENDIX B: Descriptive Data Statistics

Table B.1 Descriptive Summary of Usable Sample by Office Level in Michigan (n = 11,819)

Office Unit                 Frequency   Percentage
Adrian Unit                 244         2.06%
Alpena Unit                 160         1.35%
Ann Arbor Unit              484         4.10%
Battle Creek Unit           298         2.52%
Bay City Unit               281         2.38%
Benton Harbor Unit          289         2.45%
Big Rapids Unit             175         1.48%
Clinton Township Unit       732         6.19%
Detroit Fort Street Unit    320         2.71%
Detroit Grand River Unit    423         3.58%
Detroit Hamtramck Unit      463         3.92%
Detroit Mack Unit           332         2.81%
Detroit Porter Unit         421         3.56%
Flint Unit                  418         3.54%
Gaylord Unit                174         1.47%
Grand Rapids Unit           764         6.46%
Holland Unit                335         2.83%
Jackson Unit                163         1.38%
Kalamazoo Unit              345         2.92%
Lansing Unit                631         5.34%
Livonia Unit                441         3.73%
Marquette Unit              405         3.43%
Midland Unit                125         1.06%
Monroe Unit                 200         1.69%
Mt. Pleasant Unit           136         1.15%
Muskegon Unit               366         3.10%
Oak Park Unit               540         4.57%
Pontiac Unit                416         3.52%
Port Huron Unit             485         4.10%
Saginaw Unit                281         2.38%
Taylor Unit                 213         1.80%
Traverse City Unit          377         3.19%
Wayne Unit                  382         3.23%
Total                       11,819      100.00%

Note. There are 33 offices located statewide in Michigan, serving the target population of people with disabilities (N = 17,633) in FY 2015. Of the target sample, the usable sample size for data analysis and ICC calculations in the study is n = 11,819.

Table B.2.
A Summary of the Geographic Information System of Office Units in Michigan

Latitude (N)   Longitude (W)   Abbreviation   MRS Unit
41.90          84.04           ADR            Adrian
45.06          83.43           ALP            Alpena
42.28          83.73           AA             Ann Arbor
42.30          85.23           BCK            Battle Creek
43.60          83.89           BC             Bay City
42.10          86.48           BH             Benton Harbor
43.70          85.48           BR             Big Rapids
42.31          83.21           CT             Clinton Township
42.38          83.10           DT             Detroit Fort Street; Detroit Grand River; Detroit Hamtramck; Detroit Mack; Detroit Porter
43.02          83.69           FL             Flint
45.03          84.67           GL             Gaylord
42.96          85.66           GR             Grand Rapids
42.78          86.10           HD             Holland
42.25          84.40           JAK            Jackson
42.27          85.59           KAZ            Kalamazoo
42.71          84.55           LAN            Lansing
42.40          83.37           LV             Livonia
46.55          87.41           MRQ            Marquette
43.62          84.23           ML             Midland
41.92          83.40           MR             Monroe
43.60          84.77           MP             Mt. Pleasant
43.23          86.26           MKG            Muskegon
42.47          83.18           OP             Oak Park
42.65          83.29           PT             Pontiac
42.98          82.60           PH             Port Huron
43.42          83.95           SAG            Saginaw
42.24          83.27           TL             Taylor
44.77          85.62           TC             Traverse City
42.28          83.39           WY             Wayne

Figure B.1 Spatial Network of Target Sample in Michigan by Hierarchical Structure [figure; axes: Longitude (West), Latitude (North)]
Note 1. MRS represents the Michigan Rehabilitation Services Programs.
Note 2. Locations are plotted on a geometric graph according to the geographic information system (GIS) coordinates in Table B.2.

APPENDIX C: Glossary of Abbreviations

This glossary contains the abbreviations, acronyms, and some definitions used in this study.
Table C.1
Glossary of Abbreviations
ANOVA   Analysis of Variance
ASD     Autism Spectrum Disorder
CSPD    Comprehensive System of Personnel Development
CTT     Classical Test Theory
EBP     Evidence-Based Practice
ESRA    Education Sciences Reform Act
FY      Fiscal Year
GIS     Geographic Information System
HGLM    Hierarchical Generalized Linear Model
HLM     Hierarchical Linear Model
ICC     Intraclass Correlation Coefficient
ID      Intellectual Disability
IPE     Individualized Plan for Employment
LVM     Latent Variable Modeling
MI      Mental Illness
MLE     Maximum Likelihood Estimate
MRS     Michigan Rehabilitation Services
NCLB    No Child Left Behind
RCT     Randomized Controlled Trial
REML    Restricted Maximum Likelihood
RSA     Rehabilitation Services Administration
SE      Standard Error
SEM     Standard Error of Measurement
SEM     Structural Equation Model
TBI     Traumatic Brain Injury
VR      Vocational Rehabilitation
WIOA    Workforce Innovation and Opportunity Act

BIBLIOGRAPHY
Agresti, A., & Finlay, B. (2009). Statistical methods for the social sciences. Upper Saddle River, NJ: Pearson Prentice Hall.
Alsaman, M. A., & Lee, C.-L. (2017). Employment Outcomes of Youth With Disabilities in Vocational Rehabilitation: A Multilevel Analysis of RSA-911 Data. Rehabilitation Counseling Bulletin, 60(2), 98-107.
American Educational Research Association, American Psychological Association, National Council on Measurement in Education, & Joint Committee on Standards for Educational and Psychological Testing (U.S.). (2014). Standards for educational and psychological testing.
Anderson, T., & Shattuck, J. (2012). Design-based research: A decade of progress in education research? Educational Researcher, 41(1), 16-25.
Austin, B. S., & Leahy, M. J. (2015). Construction and validation of the clinical judgment skill inventory: Clinical judgment skill competencies that measure counselor debiasing techniques. Rehabilitation Research, Policy, and Education, 29(1), 27.
Austin, B. S., & Lee, C.-L. (2014).
A structural equation model of vocational rehabilitation services: Predictors of employment outcomes for clients with intellectual and co-occurring psychiatric disabilities. Journal of Rehabilitation, 80(3), 11-20.
Barab, S., & Squire, K. (2004). Design-based research: Putting a stake in the ground. The Journal of the Learning Sciences, 13(1), 1-14.
Bartholomew, D. J. (1987). Latent variable models and factor analysis. Oxford University Press, Inc.
Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (Vol. 904). John Wiley & Sons.
Bloom, H. S., Bos, J. M., & Lee, S. W. (1999). Using Cluster Random Assignment to Measure Program Impacts: Statistical Implications for the Evaluation of Education Programs. Evaluation Review, 23(4), 445-469.
Bloom, H. S., Richburg-Hayes, L., & Black, A. R. (2007). Using Covariates to Improve Precision: Empirical Guidance for Studies that Randomize Schools to Measure the Impacts of Educational Interventions. Educational Evaluation and Policy Analysis, 29(1), 30-59.
Bolton, B. F., Bellini, J. L., & Brookings, J. B. (2000). Predicting client employment outcomes from personal history, functional limitations, and rehabilitation services. Rehabilitation Counseling Bulletin, 44(1), 10-21.
Casella, G., & Berger, R. L. (2002). Statistical inference. Australia: Thomson Learning.
Chan, F., Tarvydas, V., Blalock, K., Strauser, D., & Atkins, B. J. (2009). Unifying and elevating rehabilitation counseling through model-driven, diversity-sensitive evidence-based practice. Rehabilitation Counseling Bulletin, 52(2), 114-119.
Chan, F., Bezyak, J., Ramirez, M. R., Chiu, C. Y., Sung, C., & Fujikawa, M. (2010). Concepts, Challenges, Barriers, and Opportunities Related to Evidence-Based Practice in Rehabilitation Counseling. Rehabilitation Education, 24.
Chan, F., Wang, C. C., Fitzgerald, S., Muller, V., Ditchman, N., & Menz, F. (2016).
Personal, environmental, and service-delivery determinants of employment quality for state vocational rehabilitation consumers: A multilevel analysis. Journal of Vocational Rehabilitation, 45(1), 5-18.
Chan, F., Lee, G. K., Lee, E., Kubota, C., & Allen, C. A. (2007). Structural equation modeling in rehabilitation counseling research. Rehabilitation Counseling Bulletin, 57(1), 44-57.
Chan, J. Y., Wang, C. C., Ditchman, N., Kim, J. H., Pete, J., Chan, F., & Dries, B. (2014). State unemployment rates and vocational rehabilitation outcomes: A multilevel analysis. Rehabilitation Counseling Bulletin, 57(4), 209-218.
Cobb, P., Confrey, J., DiSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9-13.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: L. Erlbaum Associates.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155.
Connelly, L. B. (2003). Balancing the number and size of sites: An economic approach to the optimal design of cluster samples. Controlled Clinical Trials, 24(5), 544-559.
Connolly, P., Keenan, C., & Urbanska, K. (2018). The trials of evidence-based practice in education: A systematic review of randomised controlled trials in education research 1980-2016. Educational Research, 60(3), 276-291.
Cox, D. R. (1971). The choice between alternative ancillary statistics. Journal of the Royal Statistical Society. Series B (Methodological), 251-255.
Ditchman, N. M., Miller, J. L., & Easton, A. B. (2018). Vocational Rehabilitation Service Patterns: An Application of Social Network Analysis to Examine Employment Outcomes of Transition-Age Individuals With Autism. Rehabilitation Counseling Bulletin, 61(3), 143-153.
Donner, A., Birkett, N., & Buck, C. (1981). Randomization by cluster: Sample size requirements and analysis. American Journal of Epidemiology, 114(6), 906-914.
Donner, A., & Koval, J. J. (1980a).
The estimation of intraclass correlation in the analysis of family data. Biometrics, 19-25.
Donner, A., & Koval, J. J. (1980b). The large sample variance of an intraclass correlation. Biometrika, 67(3), 719-722.
Donner, A., & Koval, J. J. (1982). Design considerations in the estimation of intraclass correlation. Annals of Human Genetics, 46(3), 271-277.
Dutta, A., Gervey, R., Chan, F., Chih-chin, C., & Ditchman, N. (2008). Vocational rehabilitation services and employment outcomes for people with disabilities: A United States study. Journal of Occupational Rehabilitation, 18(4), 326-334.
Efron, B., & Hinkley, D. (1978). Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected Fisher Information. Biometrika, 65(3), 457-482. doi:10.2307/2335893
Eignor, D. R. (2013). The standards for educational and psychological testing. American Psychological Association.
Ellis, P. D. (2009, September 7). Thresholds for interpreting effect sizes [Web log post on Hong Kong Polytechnic University]. Retrieved August 11, 2018, from http://www.polyu.edu.hk/mm/effectsizefaqs/thresholds_for_interpreting_effect_sizes2.html
ESRA Legislation - U.S. Department of Education. (May 2008). Public Law Print: Education Sciences Reform Act. Retrieved from https://ies.ed.gov/director/pdf/ESRAreauth.pdf
Fisher, R. A. (1915). Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population. Biometrika, 10(4), 507-521. doi:10.2307/2331838
(Editorial). (1915). On the Distribution of the Standard Deviations of Small Samples: Appendix I. To Papers by "Student" and R. A. Fisher. Biometrika, 10(4), 522-529. doi:10.2307/2331839
Fisher, R. A. (1925a). Statistical methods for research workers. Genesis Publishing Pvt Ltd.
Fisher, R. A. (1925b, July). Theory of statistical estimation. In Mathematical Proceedings of the Cambridge Philosophical Society (Vol. 22, No. 5, pp. 700-725). Cambridge University Press.
Fisher, R. A. (1942). The design of experiments. Edinburgh: Oliver and Boyd.
Fisher, R. A. (1958a). Cigarettes, cancer, and statistics. The Centennial Review of Arts & Science, 2, 151-166.
Fisher, R. A. (1958b). Lung cancer and cigarettes. Nature, 182(4628), 108.
Fleming, A. R., Del Valle, R., Kim, M., & Leahy, M. J. (2013). Best practice models of effective vocational rehabilitation service delivery in the public rehabilitation program: A review and synthesis of the empirical literature. Rehabilitation Counseling Bulletin, 56(3), 146-159.
Flom, P. (2015, March 10). What is adjusted correlation [Web log post on Quora]. Retrieved August 8, 2018, from https://www.quora.com/What-is-adjusted-correlation
Givens, G. H., & Hoeting, J. A. (2012). Computational statistics (Vol. 710). John Wiley & Sons.
Hauck, W. W., Gilliss, C. L., Donner, A., & Gortner, S. (1991). Randomization by cluster. Nursing Research, 40(6), 356-358.
Hays, W. L. (1994). Statistics. Fort Worth: Harcourt Brace College Publishers.
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60-87.
Hedges, L. V., Hedberg, E. C., & Kuyper, A. M. (2012). The variance of intraclass correlations in three- and four-level models. Educational and Psychological Measurement, 72(6), 893-909.
Hedges, L. V., & Hedberg, E. C. (2013). Intraclass correlations and covariate outcome correlations for planning two- and three-level cluster-randomized experiments in education. Evaluation Review, 37(6), 445-489.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando: Academic Press.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.
Karlin, S., Cameron, E. C., & Williams, P. T. (1981). Sibling and parent-offspring correlation estimation with variable family size.
Proceedings of the National Academy of Sciences, 78(5), 2664-2668.
Klar, N., & Donner, A. (2001). Current and future challenges in the design and analysis of cluster randomization trials. Statistics in Medicine, 20(24), 3729-3740.
Klar, N., & Donner, A. (2015). The impact of EF Lindquist's text on cluster randomisation. Journal of the Royal Society of Medicine, 108(4), 142-144.
Kosciulek, J. F. (2010). Evidence-Based Rehabilitation Counseling Practice: A Pedagogical Imperative. Rehabilitation Education, 24.
Kosciulek, J. F., & Merz, M. (2001). Structural analysis of the consumer-directed theory of empowerment. Rehabilitation Counseling Bulletin, 44(4), 209-216.
Kutner, M. H., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models. Boston: McGraw-Hill Irwin.
Leahy, M. J., Thielsen, V. A., Millington, M. J., Austin, B., & Fleming, A. (2009). Quality assurance and program evaluation: Terms, models, and applications. Journal of Rehabilitation Administration, 33(2), 69.
Leahy, M. J., & Arokiasamy, C. V. (2010). Prologue: Evidence-based practice research and knowledge translation in rehabilitation counseling. Rehabilitation Research, Policy, and Education, 24(3/4), 173.
Leahy, M. J., Chan, F., & Lui, J. (2014a). Evidence-based best practices in the public vocational rehabilitation program that lead to employment outcomes. Journal of Vocational Rehabilitation, 41(2), 83-86.
Leahy, M. J., Chan, F., Lui, J., Rosenthal, D., Tansey, T., Wehman, P., Kundu, M., Dutta, A., Anderson, C. A., Del Valle, R., & Sherman, S. (2014b). An analysis of evidence-based best practices in the public vocational rehabilitation program: Gaps, future directions, and recommended steps to move forward. Journal of Vocational Rehabilitation, 41(2), 147-163.
Lee, C.-L. (2014). Linking paths between rehabilitation customer characteristics, services and outcomes by decision tree models (Unpublished apprenticeship paper, Michigan State University, Department of Counseling, Educational Psychology, Special Education).
Lee, C.-L., Pi, S., & Thielsen, V. (2012). Relationships of Customer Characteristics, Services and Outcomes Using a Data Mining Approach (An unpublished internal report to Michigan Rehabilitation Services; Project Excellence, Program of Rehabilitation Counseling, Department of Counseling, Educational Psychology, Special Education, Michigan State University).
Lee Rodgers, J., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59-66.
Lingard, B. (2013). The impact of research on education policy in an era of evidence-based policy. Critical Studies in Education, 54(2), 113-131.
Lohr, S. L. (1999). Sampling: Design and Analysis. Pacific Grove, CA: Duxbury Press.
Lomax, R. G., & Hahs-Vaughn, D. L. (2012). An Introduction to Statistical Concepts. New York: Routledge.
Kelly, K. (2018). CEP932: Quantitative Methods in Education Research I [Spring 2018, correlation coefficient]. College of Education, Michigan State University, East Lansing, Michigan, USA.
Maas, C. J., & Hox, J. J. (2004). The influence of violations of assumptions on multilevel parameter estimates and their standard errors. Computational Statistics & Data Analysis, 46(3), 427-440.
Mayhew, S. (2015). A dictionary of geography. Oxford University Press.
Menon, A., Korner-Bitensky, N., Kastner, M., McKibbon, K., & Straus, S. (2009). Strategies for rehabilitation professionals to move evidence-based knowledge into practice: A systematic review. Journal of Rehabilitation Medicine, 41(13), 1024-1032.
Mood, A. M., Graybill, F. A., & Boes, D. C. (1974). Introduction to the theory of statistics. New York: McGraw-Hill.
Moore, C. L., Flowers, C. R., & Taylor, D. (2000). Vocational rehabilitation services: Indicators of successful rehabilitation for persons with mental retardation. Journal of Applied Rehabilitation Counseling, 31(2), 36-40.
Moore, C. L. (2001).
Disparities in closure success rates for African Americans with mental retardation: An ex post-facto research design. Journal of Applied Rehabilitation Counseling, 32(2), 31-36.
Moore, C. L., Feist-Price, S., & Alston, R. J. (2002a). Competitive employment and mental retardation: Interplay among gender, race, secondary psychiatric disability, and rehabilitation services. Journal of Rehabilitation, 68(1), 14-19.
Moore, C. L., Feist-Price, S., & Alston, R. J. (2002b). VR services for persons with severe/profound mental retardation: Does race matter? Rehabilitation Counseling Bulletin, 45(3), 162-167.
Moore, C. L., Harley, D. A., & Gamble, D. (2004). Ex-post-facto analysis of competitive employment outcomes for individuals with mental retardation: National perspective. Mental Retardation, 42(4), 253-262.
Murray, D. M., & Short, B. (1995). Intra-Class Correlation Among Measures Related to Alcohol Use by Young Adults: Estimates, Correlates, and Applications in Intervention Studies. Journal of Studies on Alcohol, 56(6), 681-694.
: Statistical analysis with latent variables (Version 6). Los Angeles, CA: Muthén & Muthén.
NCLB Legislation - U.S. Department of Education. (January 8, 2002). Public Law Print: No Child Left Behind Act. Retrieved from https://www2.ed.gov/policy/elsec/leg/esea02/107-110.pdf
Odom, S. L., Brantlinger, E., Gersten, R., Horner, R. H., Thompson, B., & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71(2), 137-148.
Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. The Annals of Mathematical Statistics, 29(1), 201-211.
Effect of college or university training on earnings of people with disabilities: A case control study. Journal of Vocational Rehabilitation, 43(2), 93-102.
Paccagnella, O. (2006). Centering or not centering in multilevel models? The role of the group mean and the assessment of group effects.
Evaluation Review, 30(1), 66-85.
Pearson, K., & Lee, A. (1903). On the Laws of Inheritance in Man: I. Inheritance of Physical Characters. Biometrika, 2(4), 357-462. doi:10.2307/2331507
Pearson, K. (1904). On the Laws of Inheritance in Man: II. On the Inheritance of the Mental and Moral Characters in Man, and Its Comparison with the Inheritance of the Physical Characters. Biometrika, 3(2/3), 131-190. doi:10.2307/2331479
Pearson, K. (1920). Notes on the History of Correlation. Biometrika, 13(1), 25-45. doi:10.2307/2331722
Biometrika, 14(3/4), 412-417. doi:10.2307/2331822
Pi, S. (2006). Micro- and Macro-level Factors Related to Vocational Rehabilitation Outcomes (Doctoral dissertation, Michigan State University. Department of Counseling, Educational Psychology, Special Education).
Pi, S., & Thielsen, V. (2011). RSA 911 Data Is a Gold Mine If You Have the Right Shovel, presented at the 4th Summit on Vocational Rehabilitation Program Evaluation & Quality Assurance. September 13th & 14th, 2011. Grand Hyatt Tampa Bay, Tampa, Florida, U.S.A. Retrieved from http://vocational-rehab.com/wp-content/uploads/2013/04/C802.0007.01.pdf
Raykov, T., & Marcoulides, G. A. (2004). Using the delta method for approximate interval estimation of parameter functions in SEM. Structural Equation Modeling, 11(4), 621-637.
Raykov, T., & Marcoulides, G. A. (2006). A first course in structural equation modeling. New York, NY: Psychology Press, Taylor and Francis Group, LLC.
Raykov, T., & Penev, S. (2010). Evaluation of reliability coefficients for two-level models via latent variable analysis. Structural Equation Modeling, 17(4), 629-641.
Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge.
Raykov, T. (2011). Intraclass correlation coefficients in hierarchical designs: Evaluation using latent variable modeling. Structural Equation Modeling, 18(1), 73-90.
Raykov, T., & Marcoulides, G. A. (2015a).
Intraclass correlation coefficients in hierarchical design studies with discrete response variables: A note on a direct interval estimation procedure. Educational and Psychological Measurement, 75(6), 1063-1070.
Raykov, T., & Marcoulides, G. A. (2015b). On examining the underlying normal variable assumption in latent variable models with categorical indicators. Structural Equation Modeling: A Multidisciplinary Journal, 22(4), 581-587.
Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized trials. Psychological Methods, 2(2), 173.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1, Advanced quantitative techniques in the social sciences). CA: Sage.
Rehabilitation Services Administration Policy Directive (2013). RSA-PD-14-01. Washington, DC. Retrieved from https://www2.ed.gov/policy/speced/guid/rsa/subregulatory/pd-14-01.pdf
Richardson, J. T. (2011). Eta squared and partial eta squared as measures of effect size in educational research. Educational Research Review, 6(2), 135-147.
Rizzo, M. L. (2007). Statistical computing with R. Chapman and Hall/CRC.
Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37-59.
Rosenthal, D. A., Dalton, J. A., & Gervey, R. (2007). Analyzing vocational outcomes of individuals with psychiatric disabilities who received state vocational rehabilitation services: A data mining approach. International Journal of Social Psychiatry, 53(4), 357-368.
Ross, S. M. (2013). Simulation. Amsterdam: Academic Press.
Roussas, G. G. (2002). A course in mathematical statistics. San Diego: Academic.
Rutterford, C., Copas, A., & Eldridge, S. (2015). Methods for sample size determination in cluster randomized trials. International Journal of Epidemiology, 44(3), 1051-1067.
Schoen, B. (2010).
An examination of employment outcomes for individuals with spinal cord injury served by the state vocational rehabilitation services program between 2004 and 2008 (Doctoral dissertation, Michigan State University. Department of Counseling, Educational Psychology, Special Education).
Schoen, B. A., & Leahy, M. J. (2012). An Analysis of the Changing Demographics of Individuals with Spinal Cord Injury Who Received State Vocational Rehabilitation Services between 2004 and 2008. Journal of Rehabilitation, 78(3).
Schonbrun, S. L., Sales, A. P., & Kampfe, C. M. (2007). RSA Services and Employment Outcome in Consumers with Traumatic Brain Injury. Journal of Rehabilitation, 73(2).
Schneider, B. (Ed.). (2018). Handbook of the Sociology of Education in the 21st Century. Springer.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational design. American Educational Research Association.
Schochet, P. (2005). Statistical Power for Random Assignment Evaluations of Education Programs. Princeton, NJ: Mathematica Policy Research, Inc.
Schwanke, T., & Smith, R. O. (2004). Technical report Vocational rehabilitation database analysis: RSA-911 case service report and database linking (Version 1.0). Rehabilitation Research Design & Disability: University of Wisconsin-Milwaukee.
Sink, T., Bua-Iam, P., Hampton, J. E., & Snuffer, D. W. (2014). Applying Location Theory in Vocational Rehabilitation. Journal of Rehabilitation Administration, 38(2), 73-86.
Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational Researcher, 31(7), 15-21.
Slavin, R. E. (2008). Perspectives on evidence-based research in education: What works? Issues in synthesizing educational program evaluations. Educational Researcher, 37(1), 5-14.
Shavelson, R. J., Phillips, D. C., Towne, L., & Feuer, M. J. (2003).
On the science of education design studies. Educational Researcher, 32(1), 25-28.
Soper, H., Young, A., Cave, B., Lee, A., & Pearson, K. (1917). On the Distribution of the Correlation Coefficient in Small Samples. Appendix II to the Papers of "Student" and R. A. Fisher. Biometrika, 11(4), 328-413. doi:10.2307/2331830
Stapleton, J. H. (2009). Linear statistical models (Vol. 719). John Wiley & Sons.
Student. (1917). Tables for Estimating the Probability that the Mean of a Unique Sample of Observations Lies Between - from Which the Sample is Drawn. Biometrika, 11(4), 414-417. doi:10.2307/2331831
education research. Journal of Graduate Medical Education, 3(3), 285-289.
Supporting Information for the RSA-911 Data. (n.d.). Retrieved November 18, 2018, from https://rsa.ed.gov/display.cfm?pageid=75
Tachibana, Y., Miyazaki, C., Mikami, M., Ota, E., Mori, R., Hwang, Y., Terasaka, A., Kobayashi, E., & Kamio, Y. (2018). Meta-analyses of individual versus group interventions for pre-school children with autism spectrum disorder (ASD). PLoS ONE, 13(5), e0196272. https://doi.org/10.1371/journal.pone.0196272
Tan, P.-N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Boston: Pearson Addison Wesley.
The What Works Clearinghouse (WWC). (n.d.). Standards Handbook Version 4.0. Retrieved from https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf
Thorndike, R. M. (2005). Measurement and evaluation in psychology and education. Upper Saddle River, New Jersey: Pearson Education, Inc.
U.S. Department of Education. (September 16, 2016). Guidance and Regulatory Information. Retrieved from https://www2.ed.gov/policy/elsec/leg/essa/guidanceuseseinvestment.pdf
WIOA Legislation - U.S. Department of Labor. (June 1, 2018). Overview and Highlight: Workforce Innovation and Opportunity Act. Retrieved from https://www.doleta.gov/WIOA/Overview.cfm