STRATIFIED INVERSE CLUS T ER SAMPLING WITH UPDATING PROCESS FOR SAMPLES FROM A RARE POPULA TION By Sewon Kim A DISSERTATION Submitted to Michigan State University in partial f ulfillment of the requirements for the degree of Measurement and Quantitative Methods Doctor of Philosophy 2020 ABSTRACT STRATIFIED INVERSE CLUS T ER SAMPLING WITH UPDATING PROCESS FOR SAMPLES FROM A RARE POPULA TION By Sewon Kim Survey s have been a popular research tool and have been used extensively in many fields including education . In practice, most of surveys are conducted with some part of the population, samples . As more surveys are conducted , the range of survey participants becom es wider than ever before . Groups of people , who because they were rare in the general population, are now considered populations of interest . However, they are hard to sample using conventional sam ple designs. Such situation motivated the development of a new sampl e design and Reckase, Kim, and Ju (2016) developed stratified inverse cluster sampling with updating process (SICSUP) in order to obtain a representative sample from such rare populations. The objective of this study is to evaluate the performance of SICSUP with respect to statistical and economic aspects . The statistical aspects are: (1) accuracy in parameter estimation, (2) required sample size to achieve desired pre c ision that results o f surveys should have, and (3) accuracy in group differentiation were examined. The economic aspect is the number of contacted schools in order to reach the predetermined sample size of elements in SICSUP as compared to that in stratified cluster sampling (SC) was investigated. The results suggest that SICSUP works as well as SC and can be a useful sample design for rare populations. Also, the results provide guidelines for the application of SICSUP in educational surveys. In terms of precision in mean, st andard deviation, and standard error estimation, in general, SICSUP performs as well as SC except with small sample size ( n = 50). The four replication - based standard error estimators, including the jackknife, bootstrap, BRR, do not make a substantial difference in standard error estimation. In terms of determination of sample size, on average, SICSUP needs a slightly larger sample than SC although the difference in sample size between the two sample designs is not sizable. W ith sampling weight, SICSUP and SC require a sample size about 2.30 and 2.21 times, respectively, larger than that in simple random sampling (SRS) in order to produce estimates as accurate as those in SRS. In terms of providing country rankings that are identical with those based on the population means , SICSUP work s as well as or , depending on the condition, slightly better than SC . However, the results impl y that rankings sho uld be interpreted with caution. With respect to economic aspect, SICSUP needs to contact fewer schools than SC in order to reach a predetermined sample size of elements and thus, is more economica l than SC. However, SICS UP might not have advantages for rare populations with large clusters or small number of strata. Copyright by SEWON KIM 20 20 v ACKNOWLEDGEMENTS I would like to express my deepest appreciation to my advisor and committee chair, Dr. Mark D. Reckase. The meetings and conversations every week were vital in inspiring and motivating me to keep working on my dissertation. Without his amazing support and encouragements especially during this challenging time , I would not have been able to complete my dissertation. I would also like to thank my committee members, Dr. Kimberly Kelly, Dr . Richard Ho uang, and Dr. Amita Chudgar , not only for their time and patience, but for insightful suggestions that substantially improved my work. Nobody has been more important to my life and the completion of my dissertation than my family. I dedicate this work to my mom and dad, Hyunsug Hong and Kyungman Kim, and my brother, Hansol. Many thanks to my family for all the love and support you have shown me throughout my work even though we are living on two different continents. vi TABLE OF CONTENT S LIST OF TABLES ................................ ................................ ................................ .................. vii i LIST OF FIGURES ................................ ................................ ................................ .................. xi CHAPTER 1. INTRODUCTION ................................ ................................ ................................ 1 1.1 Background ................................ ................................ ................................ ................... 1 1.2 Stratified Inverse Cluster Sampling with Updating Process (SICSUP) ........................... 3 1.3 Research Questions ................................ ................................ ................................ ........ 6 CHAPTER 2. LITERATURE REVIEW ................................ ................................ ...................... 8 2.1 Concepts and Definitions ................................ ................................ ............................... 8 2.2 SICSUP and Conventional Sampling techniques ................................ .......................... 10 2.2.1 Concept of Rare Population ................................ ................................ .................. 10 2.2.2 Relationship between SICSUP and Existing Sample Designs ............................... 12 2.2.3 Replication Method for Variance Estimation ................................ ........................ 17 CHAPTER 3. METHODS ................................ ................................ ................................ ......... 24 3.1 Research Question 1 ................................ ................................ ................................ .... 25 3.1.1 Data Generation ................................ ................................ ................................ .... 25 3.1.2 Simulation Design ................................ ................................ ................................ 28 3.1.3 Vari ance Estimator ................................ ................................ ............................... 30 3.1.4 Evaluation Criteria ................................ ................................ ............................... 31 3.2 Research Question 2 ................................ ................................ ................................ .... 34 3.2.1 Data and Simulation Design ................................ ................................ ................. 35 3.2.2 Evaluation Criteria ................................ ................................ ............................... 35 3.3 Research Question 3 ................................ ................................ ................................ .... 38 3.3.1 Data Generation ................................ ................................ ................................ .... 39 3.3.2 Simulation Design ................................ ................................ ................................ 41 3.3.3 Eval uation Criteria ................................ ................................ ............................... 42 3.4 Research Question 4 ................................ ................................ ................................ .... 43 3.4.1 Data and Simulation Design ................................ ................................ ................. 44 3.4.2 Evaluation Criteria ................................ ................................ ............................... 45 CHAPTER 4. RESULTS ................................ ................................ ................................ .......... 48 4.1 Research Question 1 ................................ ................................ ................................ .... 49 4.1.1 Mean and Standard Deviation ................................ ................................ ............... 49 4.1.2 Standard Error of the Sample Mean ................................ ................................ ...... 57 4.2 Research Question 2 ................................ ................................ ................................ .... 86 4.2.1 Design Effect and Sample Size ................................ ................................ ............. 86 4.2.2 Margin of Error and Sample Size ................................ ................................ .......... 95 4.3 Research Question 3 ................................ ................................ ................................ .. 101 4.3.1 Confidence Interval Coverage Probability ................................ .......................... 102 vii 4.3.2 Rank Order of Five Countries ................................ ................................ ............. 104 4.4 Research Question 4 ................................ ................................ ................................ .. 110 4.4.1 Results Based on Dataset 1 ................................ ................................ ................. 111 4.4.2 Results Based on Dataset 2 ................................ ................................ ................. 117 4. 4.3 Probability of Using Substitute Schools in SC ................................ .................... 122 CHAPTER 5. CONCLUSION AND DISCU SSION ................................ ............................... 124 5.1 Summary of Findings ................................ ................................ ................................ 124 5.2 Implications ................................ ................................ ................................ ............... 128 5.3 Limitation and Future Research ................................ ................................ ................. 130 APPENDIX ................................ ................................ ................................ ............................ 132 REFERENCES ................................ ................................ ................................ ....................... 142 viii LIST OF TABLES Table 3.1 Number of Novice Teachers per School ................................ ................................ ..... 27 Table 3.2 Number of Novice Teachers by Location of School ................................ ................... 27 Table 3.3 Initial Proportions for Sampling ................................ ................................ ................. 29 Table 3.4 Margin of Error and Required Sample Size for SRS ................................ .................. 38 Table 3.5 Summary of the Generated Data by Countries ................................ ............................ 40 Table 3.6 Stratification Variable by Country ................................ ................................ ............. 41 Table 3.7 Required Sample Size for SRS by Margin of Error and Country ................................ 42 Table 4.1 MSE of the Mean and Sta ndard Deviation Using SRS Samples ................................ . 49 Table 4.2 MSE of Mean Using SICSUP, SICS, and SC Samples ................................ ............... 50 Table 4.3 MSE of Standard Deviation Using SICSUP, SICS, and SC Samples .......................... 55 Table 4.4 Estimated Bias, Relative Bias, Relative MSE, and Confidence Interval Coverage Probability (CV) of the Standard Error Estimators Using SRS without Strata ............................ 58 Table 4.5 Estimated Bias and Relative Bias of the Standard Error Estimators Using SRS with Pseudo - Strata ................................ ................................ ................................ ............................ 59 Table 4.6 Relative MSE and Confidence Interval Coverage Probability of the Standard Error Estimators Using SRS with Pseudo - Strata ................................ ................................ ................. 60 Table 4.7 Estimated Bias for the Standard Error Estimators with Original Strata and Weight .... 62 Table 4.8 Relative Bias of the Standard Error Estimators with Original Strata and Weight ........ 66 Table 4.9 Relative MSE for the Standard Error Estimators with Original Strata and Weight ...... 68 Table 4.10 Confidence Interval Coverage Probability of the Standard Error Estimators with Original Strata and Weight ................................ ................................ ................................ ........ 70 Table 4.11 Estimated Bias of the Standard Error Estimators with Pseudo - Strata and Weight ..... 73 Table 4.12 Relative Bias of the Standard Error Estimators with Pseudo - Strata and Weight ........ 75 ix Table 4.13 Relative MSE of the Standard Error Estimators with Pseudo - Strata and Weight ....... 81 Table 4.14 Confidence Interval Coverage Probability of the Standard Error Estimators with Pseudo - Strata and Weight ................................ ................................ ................................ ......... 83 Table 4.15 Design Effect for the Variable of Interest ................................ ................................ . 88 Table 4.16 Desired Sample Size ................................ ................................ ................................ 89 Table 4.17 Margin of Error for a Sample Mean and Required Sample Size f or SRS .................. 95 Table 4.18 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, and SC ................................ ................................ ................................ ................................ ...... 96 Table 4.19 Sample Means by Country ................................ ................................ ..................... 101 Table 4.20 Coverage Probability of Confidence Interval for the Country Mean Using Weighted Samples ................................ ................................ ................................ ................................ .. 103 Tab le 4.21 Coverage Probability of Confidence Interval for the Country Mean Using Unweighted Samples ................................ ................................ ................................ ............... 104 Table 4.22 Rates of Producing Rankings That Are Identical with the Rankings Based on the Population Means Using SICSUP, SICS, SC, and the Combination of Two Designs ............... 105 Table 4.23 Number of Contacted Schools and Schools in the Sample, Based on Dataset 1 ...... 111 Table 4.24 Difference in the Number of Contacted Schools, Based on Dataset 1 ..................... 112 Table 4.25 Number of Contacted Schools and Schools in the Sample by Strata, Based on Datase t 1 ................................ ................................ ................................ ................................ .............. 113 Table 4.26 Difference in the Number of Contacted Schools by Strata, Based on Dataset 1 ...... 115 Table 4.27 Number of Contacted Schools and Schools in the Sample, Based on Dataset 2 ...... 118 Table 4.28 Difference in the Number of Contacted Schools, Based on Dataset 2 ..................... 119 Table 4.29 Number of Contacted Schools and Schools in the Sample by Strata, Based on Dataset 2 ................................ ................................ ................................ ................................ .............. 120 Table 4.30 Difference in the Number of Contacted Schools by Strata, Based on Dataset 2 ...... 121 Table 4.31 Probability of Using Substitute Schools, Based on Dataset 1 ................................ .. 123 Table 4.32 Probability of Using Substitute Schools, Based on Dataset 2 ................................ .. 123 x Table A.1 Estimated Bias for the Standard Error Estimators with Original Strata and without Weight ................................ ................................ ................................ ................................ .... 134 Table A.2 Relative Bias of the Standard Error Estimators with Original Strata and without Weight ................................ ................................ ................................ ................................ .... 135 Table A.3 Relative MSE for the Standard Error Estimators with Original Strata and Weight ... 136 Table A.4 Confidence Interval Coverage Probability of the Standard Error Estimators with Original Strata and without Weight ................................ ................................ ......................... 137 Table A.5 Estimated Bias of the Standard Error Estimators with Pseudo - Strata and without Weight ................................ ................................ ................................ ................................ .... 138 Table A.6 Relative Bias of the Standard Error Estimators with Pseudo - Strata and without Weight ................................ ................................ ................................ ................................ ............... 139 Table A.7 Relative MSE of the Standard Error Estimators with Pseudo - Strata and without Weight ................................ ................................ ................................ ................................ .... 140 Table A.8 Confidence Interval Coverage Probability of the Standard Error Estimators with Pseudo - Strata and without Weight ................................ ................................ ........................... 141 xi LIST OF FIGURES Figure 1.1 Procedure of SICSUP ................................ ................................ ................................ . 5 Figure 4.1 Empirical Selection Probability for n=50 (left) and n=1,000 (right) Using SICSUP .. 53 UJ UB ) Estimators with n=50 and Original Strata by Type of Initial Proportions: Initial Proportions Based on Data (Top), Informal Estimate Based on School Proportions (Middle), and Informal Estimate Based on Equal Proportions (Bottom) ................................ ................................ ................................ ................ 64 Estimate Based on Equal Proportions) ................................ ................................ ....................... 76 Figure 4.4 Relative Bias of the Standard Error Estimator with Weight (Blue Lines) and without Weight (Red Lines) by Sample Size Using SICSUP. ................................ ................................ . 79 Figure 4.5 Confidence Interval Coverage Probability with Pseudo - Strata and Weight ............... 85 Figure 4.6 Sample Size for SICSUP, SICS, and SC That Yields the Same Precision as SRS of 50 ................................ ................................ ................................ ................................ ................. 91 Figure 4.7 Sample Size for SICSUP, SICS, and SC That Yields the Same Precision as SRS of 100 ................................ ................................ ................................ ................................ ............ 92 Figure 4.8 Sample Size for SICSUP, SICS, and SC That Yields the Same Precision as SRS of 500 ................................ ................................ ................................ ................................ ............ 93 Figure 4.9 Sample Size for SICSUP, SICS, and SC That Yields the Same Precision as SRS of 1,000 ................................ ................................ ................................ ................................ ......... 94 Figure 4.10 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, ................................ ................................ ......................... 97 Figure 4.11 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, ................................ ................................ ......................... 98 Figure 4.12 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, ................................ ................................ ......................... 99 Figure 4.13 Estimated Means with 95% Confidence Interval by Country under the Condition of Initial Proportions Based on Data and with Weight: The First Scenario ................................ ... 106 xii Figure 4.14 Estimated Means with 95% Confidence Interval by Country under the Condition of Informal Estimate of Proportions Based on School Proportion and with Weight: The Second Scenario ................................ ................................ ................................ ................................ .. 107 Figure 4.15 Estimated Means with 95% Confidence Interval by Country under the Condition of Informal Estimate of Proportions Based on School Proportion: The Third Scenario ................ 109 Figure 4.16 Difference in t he Number of Contacted Schools by Strata, Based on the Dataset 1 116 Figure 4.17 Difference in the Number of Contacted Schools by Country: SC (Top Line) and SICSUP (Bottom Line) ................................ ................................ ................................ ........... 118 1 CHAPTER 1. INTRODUCTION 1.1 Background Survey s have been a popular research tool and have been used extensively in many fields including education, psychology, and sociology. In practice , most of surveys are conducted with some part of the population , samples, rather than with the whole population and ma ke inferences about the population. Eve ry year, more and more surveys are conducted , and the range of survey participants becomes wider than ever before. Groups of people such as cultural minority, the homeless, and nomads once seemed impossible to survey because they are rare in the general population, but now they are considered target populations for surveys although they are harder to survey than the general population. In the field of educational research, surveys are also popular and have been widely used to increase knowledge in the field. Like other areas, surveys have been done more frequently in recent years , and rare, thus, hard - to - sample populations , such as students from minority group (De Róiste & Dinneen, 2005 ) and children experiencing long - term fos ter care (Daly & Gilligan, 2005 ) , have gain ed increasing attention from educational researchers. Since the International Association for the Evaluation of Educational Achievement (IEA) conducted the First International M athematics Study (FIMS) in the early 1960s, which is one of the earliest modern - day international assessments of student skills (Rutkowski, von Davier, & Rutkowski, 2013), i nternational large - scale surveys and assessments in education also have gained impo rtance and popularit y among educational researchers and have become one of the most influential studies in current education (Kirsch et al., 2013). More than 50% of the 2 countries in the world have taken part in some type of international assessment (Kamen s & NcNeely, 2010). The results from international surveys do not only provide international comparison but also impact on education policies at the national level (Smith, 2016) . Countries as a whole have characteristics , and different characteristics across countries could make populations of interest difficult to survey at the country level in addition to at the individual level. For example, developing countries tend to lack the resources to carry out surveys and hence , likely to have more hard - to - survey populations. These countries might not have enough funding for surveys so that they might omit specific subpopulations (e.g., people in rural area and people who speak a minor language) or shorten the period for survey . They might not have statistics collected by government censuses and statistical agencies so that general sample frames that are often required for sampling procedure cannot be constructed readil y. In addition to the individual level factors (rare by nature ), factors at the country level (rare by operation) could increase difficulty in sampling rare populations. With respect to international surveys in education, different educational systems across countries sometimes raise challenges for obtaining samples. O ne of the most widely used sampl e designs for surveys in the area of educational research is cluster sampling because of the hierarchical structure of e ducation systems. S tudents , the major target population in educational research, ar e nested in classes, and classes are nested in schools. Stratification variables are also often used to improve the representativeness of the target population. Therefore, stratified cluster sampli ng is a frequently used sampling technique for educational surveys. As well as domestic educational surveys, international large - scale Trends in International Mathematics and Science Study (TIMSS), and the Progress in 3 Internatio nal Reading Literacy Study (PIRLS), use stratified multi - stage sample design s, which are a complex form of stratified cluster sampling . In order to apply multi - stage sampling to surveys, known proportions or frequencies of sampling units over strata are required. If sampling units are students in schools, researchers should have a list (a frame) of students in the target population before starting the sampling procedure. Applying these sampl e designs is inconvenient when target populations are rare po pulations at the country level, individual level, or both. For instance, a t the individual level, a significant proportion of clusters might not include any sampling unit, which can be considered a rare population by nature , and at the country level, a cou ntry might not know the distribution of sampling units due to the limited resource, which can be considered a rare population by operation . If the target population is a rare population at the individual level and also at the country level, obtaining samples for the survey would be come much harder. Such problem motivated the development of a new sampl e design , and Reckase, Kim, and Ju (2016) developed stratified inverse clust er sampling with updating process (SICSUP) 1 i n order to obtain a representative sample under these circumstances. 1.2 Stratified Inverse Cluster Sampling with Updating Process (SICSUP) Before describing the SICSUP procedure, I need to define a few key terms and describe situations that this dissertation focuses on. In this section, only the terms that are necessary for describing SICSUP were mentioned . More concepts and definitions of sampling are discussed in detail in Chapter 2. In this dissertation , samples are stratified and clustered. Clusters are taken first and elements in the clusters are taken later sampling units that are selected first 1 It was initially called s tratified s equential a daptive c luster s ampling (SSACS) . 4 ( SSU) refers to s ampling units in PSUs , which are elements. It is assumed that researchers need much about the proportions of SSUs over strata in the population . It is also assumed that As shown Figure 1.1, SICSUP can be implemented in a four - step procedure (Reckase, Kim, & Ju, 2016).The first step of SICSUP is to determine initial sa mple sizes for strata based on available information about the proportions of SSUs over strata in the population . The second step of SICSUP is contacting PSUs from the list of PSUs which is randomly ordered and identifying SSUs available from each PSU contacted. If there are any SSUs available, researchers recruit all of them and include them to the sample. If there i s no SSU available, researchers move on to the next PSU in the list of PSUs. Researchers repeat the second step until one of the strata reaches the initial sample size. The third step of SICSUP is updating the initial proportions of elements over strata . When one of the strata reaches the initial sample size , it becomes possible to update the initial proportions of SSUs over strata , which might not be accurate, based on the current sample proportions over strata at this point . Then, the updated sample sizes for strata would be obtained. The fourth step of SICSUP is contacting PSUs and recruiting SSUs from the PSU s contacted until all of the strata reach the updated sample sizes. The fourth step is basically the same as the sec ond step. The difference between the fourth step and the second step is whether updated sample sizes are us ed or the initial sample sizes are used. Once a stratum satisfies the desired number of SSUs, PSUs in that stratum would be ignored when contacting n ext PSUs in the list. After the fourth step, a final set of samples would be obtained. 5 Figure 1 . 1 Procedure of SICSUP 6 1.3 Research Questions Although previous studies evaluated the performance of SICSUP for rare populations, th ey are done before SICSUP was fully developed (Kim, Ju, & Reckase, 2015) or did not evaluate the entire procedure of SICSUP (Reckase, Kim, & Ju, 2016) , excluding the updating process of SICSUP and stratification. Therefore, it is difficult to determine whether SICSUP is a viable sample design for rare population s in education . T here is a need for evaluating the full SICSUP procedure . The results would provide guidelines and requirements for the application of SICSUP to educational survey s , enabl ing researchers to explore rare populations in education. A good sample design must provide necessary information with maximum precision for fixed allowed resource s . The objective of this study is to evaluate the performance of SICSUP with respect to statistical and economic aspects . In terms of statistical aspect , accuracy in parameter estimation, required sample size to achieve desired pre c ision that results of surveys should have , and accuracy in group differentiation are examined. In terms of economic aspect , the number of contacted schools during the sampling procedure in SICSUP as compared to those in stratified cluster sampling is investigated . The following research questions need to be addressed: 1. Do es SICSUP work as well as stratified clu ster sampling regarding parameter estimation? 2. How can the appropriate sample size for SICSUP be determined ? 3. Can the samples from SICSUP determine whether the means of groups are different from each other ? 4. Is SICSUP economically more advantageous than stratified cluster sampling ? 7 The four research questions are answered through simulation studies. Chapter 2 reviews the concept of rare populations, the relationship between SICSUP and existing sampling techniques , and the features of standard error estim ators (e.g., replication m ethods) . Chapter 3 presents details about data generation, simulation designs, and evaluation criteria. The last two chapters (Chapter 4 and 5) provide the findings from the simulation studies and discuss the performance of SICSUP and how SICSUP can be used for surveys that aim at rare populations in education . These two chapters also provide guidelines for applying SICSUP to rare populations in education . 8 CHAPTER 2. LITERATURE REVIEW This literature r eview chapter consists of three main sections. The first section briefly discusses the basic concept and definitions that are used in this dissertation . The second section explores the types of rare populations and sampling techniques useful for these populations. This section also explores the relationship between SICSUP and other sampling techniques. The third section summarizes the features of replica tion methods for standard error estimation. 2.1 Concepts and Definitions Different studies on sampling might use different statistical terms to describe the same concept. In order to avoid confusion due to various statistical terms, this section discusses t he basic concepts and definitions that are used in this dissertation. The concepts and definitions are based on the three sampling textbooks ( Kish , 1965 ; Murthy , 1967 ; Thompson , 2002 ). A sampling unit, or simply a unit, is an element or a group of elements, on which observations can be made and for which information is sought . A population is a collection of all units in a given region. In element sampling (e.g., simple random sampling), each sampling unit contains only one element; but in cluster s ampling, any sampling unit (or primary sampling unit (PSU)) called a cluster may contain several elements. In general, for using sampl e designs, a list, or a frame, of all sampling units belong to th e population is necessary , and such list or frame is term ed the sampling frame. The sampling frame illustrates the distribution of elements over the population. Surveys aim at estimating population values, which are obtained from all population elements. A population value is called a parameter. A sample val ue, or statistic, is an estimate computed from elements in a set of sample s 9 estimate of the population standard devi ation. The sampling distribution of an estimate is the theoretical distribution of all possible values of the estimate. The standard deviation of the sampling distribution is called the standard error. The squared standard error is called the variance of e stimate. With respect to symbols, capital letters refer to population values, parameters, and In general, this dissertation uses n for t he number of elements in the sample and N for the number of elements in the population. However, for stratified cluster sampling ( SC ), I may use different symbols: m for the number of elements in the sample and M f or the number of elements in the population; n for the number of clusters in the sample and N for the number of clusters in the population. A va lue of the variable of interest in a sample is expressed as y . The symbol of denotes any parameter such as mea n or standard deviation. For the purpose of comparison, this dissertation uses SC . It is slightly different from the stratified multi - stage sampling, which is widely used for national and international studies. SC can be considered stratified single - stage sampling. In each stratum , clusters are randomly sampled, and all elements in t he clusters are selected . In stratified multi - stage sampling, different sampling technique s can be applied to each stage. For example, for two - stage sampling, prima ry units can be selected with probabilities proportional to size, and secondary units can be selected using simple random sampling. 10 2.2 SICSUP and C onventional S ampling techniques 2.2.1 Con cept of Rare Population The development of SI C SUP was motivated by facing difficulties in obtaining samples from rare populations. One may question what rare populations refer to. A rare population sometimes is defined as a population with a low number of elements. However, t here is no universally accepted definition p - to - McDonald (2004) reviewed definitions of a rare population in the field of biology. Rare populations in biology possess one or more followi ng characteristics: first, the proportion of the elements in the population is small; second, elements practice elusive or secretive behavior; third, elements are sparsely distributed over large ranges; fourth, elements practice differently by time or seas on; fifth, application of ineffective sampling can make rare populations. Based on the four characteristics, there are two types of rare populations: rare populations by nature and operationally rare population s . Likewise, Riniolo (1999) discussed rare populations when sampling units are individuals and categorized rare population into five : (1) sparse populations, (2) limited access populations, (3) persons experiencing an infrequent event (e.g., persons wit h severe allergic reaction), (4 ) those who newly associated with a rare population (e.g., persons with brain injury), and ( 5 ) developmentally uncommon cases (e.g., teenage myocardial infarction patients) . Tourangeau ( Tourangeau et al . , 2014) discussed hard - to - survey populations mainly in the fields of psychology, sociology, and business . Some populations are hard to survey in different ways. The author distinguished hard - to - survey populations into five categories: populations that are hard to sample, those whose members who are hard to identify, those that are hard to find or 11 contact, those whose members are hard to persuade to take part, and those whose members are hard to interview. A hard - to - sample population is a population without a sampling frame or with an incomplete sample frame. In the absence of a complete sampling frame, if elements are rare, representing a small fraction of the larger population, the population can be hard to sample. The other factor making rare populations to hard - to - sample populations is the cost of screenin g. Screening is often used to detect rare elements in the population (for example, a few questions to identify elements in the larger population). If screening is expensive relative to main survey, it affects the final data collection from the main survey. Hard - to - sample populations also contain elusive or mobile populations, such as the homeless and migrant workers. In sum, a rare population is defined as a population with a small proportion of elements in the larger population and is a part of hard - to - sample populations and hard - to - survey populations ( see Tourangeau et al . , 2014, for the other four categories of hard - to - survey populations) . What kinds of rare populations are there in the field of educational research? Students with s pecial educational needs are an example of rare populations in education, including deaf and hard of hearing students (Scott & Hoffmeister, 2016). The U.S. Census Bureau annually conducts natio n wide survey known as t he Survey of Income and Program Particip ation (SIPP) in order to identifying the American population of persons with hearing loss or deafness including children (Mitchell, 2006) . Such data provide useful information about this rare population. As the United States population becomes increasingly diverse , t here has been growing interest in immigrant students (Bailey & Weininger, 2002) and bilingual students ( Burke, Morita - Mullaney , & Singh, 2016 ; Lesaux & Kieffer, 2010 ) . This increase in US bilingual populations also led federal author ity to conduct empirical research on bilingual students ( Greenberg Motamedi, Singh , & Thompson, 2016 ; Haas et al . , 2015 ). 12 Drop - out students are another example of rare populations in educational research (Kinnunen & Malmi, 2006; Lassibille & Navarro Gómez, 2008). In general, because they already left schools, researchers experience difficulties in finding them. Information before their drop - out and indication s of drop - out are often used for studies. There are rare populations including teachers as elements. Novice teachers or b eginning teachers are a rare population because of the low frequency in the larger population (population of teachers) and their mobility. Various research methods have been applied in order to study such populatio n of novice teachers ( Chubbuck et al . , 2001 ; Westerman, 1991 ). Schools can also be a rare population. Lee, Ready , and Johnson (2001) investigated schools - within - school, and in their s tudy, such schools are rare elements in the general population (population of schools). 2.2.2 Relation ship between SICSUP and Existing Sampl e Designs A wide variety of techniques ha s been suggested for dealing with samples from a rare population such as multipurpose samples, cumulation of a rare population, use of large clusters, controlled selection, batch testing, two - phase sampling, etc. Among the se techniques , Kish (1985) and Elliott (National Academies of Sciences, Engineering, and Medicine, 2018) suggested three techniques : (1) creating a list (a frame) of elements, (2) oversampling, and (3) screening. Rare populations often lack a list of elements. Creatin g a list of elements could help researchers locate rare elements. Network sampl e designs can be used to build up a list of elements. In network sampling, a simple random or stratified random sample is selected, and all the elements linked to the previously selected sample are used to create a list of elements 13 ( Thompson , 20 0 2 ). For example, if researchers want to create a list of nov ice teachers with less than five years of teaching experience , they t ake an initial sample of novice teachers. Then, the sampled novice teachers are asked whether they know any novice teachers. If they know any novice teachers, the researcher s add them to the list of novice teachers. If it is hard to create a list of elements, oversampling is another approach for sampling rare populations. In SC , if most of the rare elements are located in a small stratum, the more samples would be selected f rom th at stratum than from other strata. The last suggested technique is screening. Screening may involve a brief interview or short tests to identify rare elements. This technique can associate with different sampl e designs. Screening to find rare element s is practical if proportion of the elements is about 10 or 20 percent of the population. Adaptive Sampling . Selection procedures in conventional sampl e designs , such as SRS, stratified sampling, and cluster sampling, do not depend on observations made during sampling. However , in some sampling situations, making decisions during sampling process may be beneficial in order to obtain a set of samples that provides more precise estimates than conventional sampl e designs given sample size or cost. For rare populations, researchers often do not have a complete frame of sampling units before starting the sampling procedure. They can take advantage of the knowledge in po pulation characteristics that was obtained during the sampling process and improve accuracy in estimation. Adaptive sampling refers to sampl e designs in which the selection procedure may depend on values of the variables of interest observed during the sa mpling process (Thompson, 2002). Therefore, in general, the sample s ize tends to vary . Adaptive sampling is a general sampling strategy rather than a specific sampl e design. Sampl e designs that employ adaptive strategy can be consider adaptive sampling, such as adaptive cluster s ampling and sequential sampling. 14 The family of adaptive sampling tend s to estimate population density or abundance and thus, has been often studied and use d in biology (Thompson, 2004) . Surveys in educational research pay more attention to estimating variable s of intere st than estimating population density. SICSUP does not specially focus on estimating abundance of elements and can be used for estimating bot h of variables of interest and population density. Therefore, SICSUP is more applicable than adaptive sampling for educational research. Adaptive Cluster Sampling . Adaptive cluster sampling , introduced by Thompson (1990), is a sampl e design that uses adaptive strategy for selection procedure. Thompson (2002) describes adaptive cluster sampling as follows: Adaptive cluster sampling refers to designs in which an initial set of units is selected by some probability sampling procedure, and whenever the variable of interest of a selected unit satisfies a given criterion, additional units in the neighborhood of th at unit are added to the p.319 ). Adaptive cluster sampling has a large number of possible designs, and various adaptive cluster sample designs have been developed based on the basic adaptive cluster sampling: systematic and strip adaptive cluster sampling (Thompson, 1991a), stratified adaptive cluster sampling (Thompson, 1991b) , two - stage adaptive cluster sampling (Salehi & Seber, 1997), restricted adaptive cluster sampling (Lo, Giffith , & Hunter, 1997) , etc . Although t hey are different in terms of selecti on process or stopping rules , the ba sic concept of these designs is selecting additional units in t Also, the total number of units in the final sample is adaptive. The network in adaptive cluster sampling. Adaptive cluster sampling is advantageous when elements are highly aggregated or clu stered . However, that might not be the case for rare populations in education. For example, 15 consider one wants to sample students with special educational needs. If one school contains such students, it does not mean that the neighborin g schools also tend to include students with special educational needs. Therefore, in general, adaptive cluster sampling may not be very beneficial for rare populations in education. Sequential Sampling . Sequential analysis or sequential estimation is a method for testing statistical hypotheses in which the number of observations i s not fixed in advance but depended on the observations themselves (Wald, 194 5 ) . In sequential analysis, every time a sampling unit is added to th e set of samples, hypothesis testing is conducted. This sequential selection procedure continues until the hypothesis testing produces a significant result. A merit of the sequential method, as applied to testing statistical hypotheses , is that , on average , the test procedure requires a substantially smaller number of observations than equally reliable test procedures based on a predetermined num ber of observations (Wald, 194 7 ) . Sequential probability ratio test was developed for the purpose of testing statistical hypotheses . After sequential estimation was introduced, sampl e designs using the sequential estimation method have been developed. Hald a n e (1945) is one of them al though it is n 1953 ) . Because there is no sequential sampl e e designs in different fields. In general, s equential sampling is a type of adaptive sa mpling and uses sequential estimation. A t each observation in the sampling process , the decision to continue depends on the data recorded to that point. Data collection continues according to the initial de sign until the stopping rule is satisfied. Sequent ial sampling can be applied with SRS, stratification, or clusters (Christman, 2004) . 16 Inverse Sampling . Inverse ( binomial ) sampling uses adaptive strategy where the sample size is adaptive in that it depends on the information that is obtained during the sampling process. Inverse binomial sampling was introduced to select a set of samples from a rare population (Haldane, 1945) . Under conventional sampl e designs with a fixed sample size, one may not be able to observe enough numb er of rare events to produce precise estimates. Inverse sampling was developed to estimate the frequency of a rare event. Researchers keep selecting sam pling units until certain specified conditions are satisfied (Seber & Salehi, 2012). scholars inverse sampl ing ( Christman, 2004 ; Pathak, 19 76 ). The major difference between sequential sampling and inverse sampling is that , in general, inverse sampling focuses on estimating parameters such as total and mean while sequential sampling focuses on testing hypotheses. The similarity among SICSUP, sequential, and inverse sampling is to take samples sequentially and make d e cision s during the sampling procedure based on the information collected to that point. SICSUP is different from sequential and inverse sampling in terms of what kind of d ecision to be made. SICSUP makes decision s to adjust sample sizes for strata while sequential and inverse sampling make decisions to determine a stopping point of selection. SI CSUP is a combination of several sampling strategies : stratification , clustering, sequential estimation , and updating process. Without stratification and clustering, SICSUP is similar to inverse sampling. The sampling procedure is continued sequentially until certain specified conditions are satisfied. In SICSUP, a t each point of selection, a decision, whether any stratum reaches the predetermined sample size (initial sample size or updated sample size) , is made and the decision affects the later selection. 17 Sequential estimation and updating process are required because of stratification in SICSUP . At the beginning of the sampling procedure, the initial sample size for each stratum might not be proportional to the size of the stratum because of lack of information on the proporti ons of elements over strata . When an additional sample is added to the existing set of samples, researchers check whether there is a stratum that achieved the initial sample size. This process is related to inverse sampling and sequential estimation . If a stratum satisfies the required number of samples, sample sizes for strata are updated using the sampling distribution over strata that was obtained during the sampling proce ss . This updating process is the unique characteristic of SICSUP as compared to different sample design s. 2.2.3 Replication Method for Variance E stimation Complex sample design s often involve features such as stratification, multiple stage c omplex sample design s, SICSUP as well as SC can be considered a complex sample design . For such a complex sample design , unlike a simple sample design , special procedures are needed to estimate an unbiased or consistent sampling variance of an estimate of a parameter. There are two procedures to deal with those situations: the Taylor series linearization method and replication (or resampling) methods (Rutkowski, von Davier, & Rutkowski, 2013). In recent large - scale surveys, including educational large - scale survey s and assessments , replication methods have tended to be used more frequently than the Taylor series linearization method for estimating sampling variance. The major reason for the popularity of replication methods is that the Taylor linearizat ion method is , in general, mathematically complicated and, therefore, require significant computation burden as compared to replication methods. 18 The idea of subsample replication methods was introduced to simplify variance estimation for complex sample su rveys (Wolter, 1985). In terms of sample variance of means, the family of replication method s consists in selecting multiple samples from the parent sample; computing a separate estimate of mean from each sample; and computing the sample variance among the several estimates. The jackknife method and balanced repeated replication (BRR) method are commonly used replication methods along with the bootstrap method. The Jackknife Method . developed. The jackknife method, which was introduced by Quenouille (1949), is one of the most frequently used replication method s . Replicated datasets are typically cr ea ted by dropping secondary units from one PSU at time to form a replicate until all PSUs have been dropped from each stratum (Skinner et al., 1989). In general, the procedure of the jackknife method is as follows (Lee, Lee, & Shin, 2016; Wolter, 1985). First, the parent sample is divided into K random groups where K represents the numbe r of PSUs. Second, all secondary units in the parent sample possess the variable of interest, and the parameter of the variable is An estimate of based on the parent sample denotes . Third, after deleting the K th group, the weights of the remaining secondary units are doubled. With these replicate weights, ( ) is calculated using the elements in the remaining groups. Finally, the jackknife estimator of variance is then = ( 1 ) 2 = 1 . (2. 1 ) There are some differences in the jackknife procedures when they are applied to a stratified cluster sample design (Chen & Shen, 2019; Smith, Srinath , & Battaglia, 2000). In a stratified cluster sample design , the jackknife procedure is basically identical except that occurs in each stratum. F irst, in each stratum h , there are K clusters (or PSUs) . After deleting K th cluster, 19 the weights of the remaining element s in stratum h would be doubled to compensate for the deleted cluster and us ed to compute a variance estimate . Second, ( ) is calculated , and there would be K h estimates of ( ) for stratum h . Third, the jackknife estimator of variance is = ( 1 ) = 1 2 = 1 . (2. 2 ) A variety of variance estimators based on the jackknife method has been developed. The jackknife repeated replication (JRR) method was developed by Frankel (1971) who first applied jackknife procedure to compute sampling variance in complex surveys. The JRR was developed based on the jackknife estimation procedure and the BRR method. With the BRR method, each of the replications estimates the variance of the entire sample while , with the JRR method, each replication estimates the variance contributed by a single stratum (Kish & Frankel, 1974). The TIMSS and the PIRLS currently use the JRR method to estimate sampling variance. The major advantages of using the jackknife method are that it is c onceptually simple an d provides a precise estimate of sampling variance in general. As compared to the bootstrap method, it is less computationally intensive. The jackknife method ha s a limitation when it is applied to single - stage sample design s. In these sample design s, estimates of sampling variance of non - smooth statistics, such as median or quantiles, are tend to be unstable. Although this problem does not occur when multi - stage sample design s are used, it is advised to avoid using the jackk nife method for estimating sampling variance of median or quantiles (Betti, Gagliardi, & Verma, 2018; Rutkowski, von Davier , & Rutkowski, 2013). The Bootstrap Method . Bootstrapping, which was introduced by Efron (1979), is a technique that relies on random resampling with replacement , and the bootstrap method in statistics is designed to provide information about the population distribution using bootstrapping. The bootstrap is used in practice for a variety of purposes: estimating statistics on 20 a populatio n (e.g., mean and standard deviation); estimating variance of a statistical estimator; and constructing approximate confidence intervals for parameters of interest (Shalizi, 2016). In this dissertation , the bootstrap method refers to the method to estimate sampling variance of means. The bootstrap method procedure is as follows: first, a resample is drawn from the parent sample , and a stati stic (e.g., mean) is computed; s econd, after repeating the previous step B times, B sets of the statistic, , would be obtained; t hird, the bootstrap variance is calculated: = 1 2 = 1 , (2. 3 ) where is t he estimate based on the parent sample. In SC , n 1 sampling units out of the n elements are selected independently with replacement within each stratum. Because the selection is with replacement, a sampling unit may be chosen more than one (Statistics Canada, 2018). Given sample size of n , there are n n possible sets of samples with replace ment. Calculating a statistic (e.g., mean) from all n n bootstrap samples is basically impossible in practice , thus, researchers choose a number of bootstrap samples that they use to estimate sample variance of the statistic. The bootstrap variance involve s two sources of error: an error due to the fact that the sample size is finite and an error due to the fact that B is less than n n . The first source of error can be correct by multiplying it by ( n 1)/ n . The second source of error can be reduced by increas ing the number of B . Previous studies have suggested numbers of replications in order to obtain a reliable estimate using the bootstrap method. Although a minimum number of 200 to 300 for variance estimation was suggested (Efron & Tibshirani, 1993; Hall, 1989), larger B would be preferred to obtain a reliable estimate. 21 As compared to the jackknife method, estimated standard errors using the bootstrap method tend to be slight smaller (Efron, 1982). While the jackknife method provide s unstable estimates of sampling variance of non - smooth stat istics such as median and quantiles, the bootstrap method is generally work well for these statistics (Ghosh et al . , 1984 ; Riniolo, 1999 ). The bootstrap method requires less computa tional burden as compared to the jackknife method (Chen & Shen, 2019). The bootstrap method does not work well for the following situations: correlated data (e.g., time series data), missing data, and data with outliers. Balanced Repeated Replication and F . The balanced repeated replication method (BRR) involves dropping all elements within a PSU in a stratum, but it does so by creating half - samples. One PSU from each stratum is selected and its elements are retained, forming a pseudo - replicate, with the set of remaining PSU s f o r each stratum forming the complement replicate ( Stapleton, 2008). The principle of the BRR is the following: each of the two PSUs can provide an unbiased estimate of the parameter of interest of its stratum. The BRR design assumes that a population of PSUs is able to be grouped into H strata with two PSUs per stratum. The BRR can thus only be accomplished when the sample design has been undertaken with the selection of two PSUs fro m each stratum. In practice, it is hard to find such populations . If the sample design did not include the selection of two PSUs from each stratum, similar strata or PSUs can be artificially grouped to obtain such a design ( p seudo - strata). This process of allotting each pair of PSUs into pseudo - and complement replicates is repeated many times to create a large set of half - replicates. There is a complication in creating replicates using half of the PSUs because dependent replicates can produce parameter estimates that are correlated across replicates. In order to obtain a balanced design, a solution is to balance the formation of replicates by using an orthogonal 22 design matrix. A selection of these matrices, sometimes referred to as Hadamard mat rices, are developed and available from Wolter (1985). The BRR provides a way to extract from the complete set of 2 H possible replicates a much smaller subset that gives the very same measure of sampling error as the full set would . Using these matrices, a minimal set of K balanced half - samples are created. In order to obtain a fully balanced design, the number of replicates used needs to be four times greater than the number of strata (Chen et al . , 2007). For each of the retained PSUs as defined by the des ign matrix, the sampling weight is doubled to create a set of replicate weights from which to calculate replicate estimates . For any given replicate, two times of the sampling weight if the PSU in stratum is retained in the pseudo - replicate , and the weight is equal to zero otherwise (Rust & Rao, 1996). Once these sets of replicate weights are created, a conventional analysis is run for each set of weights, and the standard errors of the parameter estimates are a measure of the variability ac ross pseudo - replicates = 1 2 = 1 , (2. 4 ) where is the estimated variance using the parent sample, is the estimated variance based on the K th replicates, and K is the total number of half - sample replicates. . With larger datasets, the BRR estimates of variance are seen by some as less computationally taxing than JRR because they use only half - samples (Rao , Wu , & Yue , 1992; Rust & Rao, 1996). The replication meth ods work differently depending on the variable of interest. For ratio estimates, the jackknife is superior to the BRR or bootstrap (Rao & Wu, 1985) while for medians, the BRR works better than the jackknife (Kovar, Rao , & Wu, 1988). This issue in using ratio estimator for the BRR motivated the deve 23 Fay , & Morgansein, 1984). When ratio estimator is used , the BRR might pr oduce extremely large estimates because of zero weighted and double weight weights of 0.5 and 1.5 instead of 0 and 2 for the half samples within each stratum. (1990) supports the the BRR and the jackknife for the ratio and the reg ression coefficient . 24 CHAPTER 3. M ETHOD S T h is dissertation evaluate s stratified inverse cluster sampling with updating process (SICS U P ) through four research questions (See Section 1.3) . The first to third research questions evaluate SICSUP with respect to statistical aspects and the last research question evaluates SICSUP with respect to economic aspects . This chapter describes details about the research methods for answering to the four research que stions. In general, f or each research question, t he results from SICSUP are compared to the results obtained from simple random sampling (SRS) , stratified cluster sampling ( SC ) , and SICSUP without updating process (SICS) . Results of SRS provide a basis for comparison. Results of SC are n ecessary because S I CS U P also uses cluster and stratification . Result s of SICS are also necessary in order to examine the effect of updating process . The comparison of SICS and SC describes the effect of sequential process. SRS selects n distinct units from the N units in the population with the equal selection probability for each unit. SC randomly selects n clusters from the N clusters in each stratum in the population and samples all the units in the n clusters. The procedure of SICS is the same as the procedure of SICSUP except the updating process. In this dissertation, the population of novice teachers, who are defined as teachers with zero to five years of overall teaching experience, serves as the rare population, which is also a hard - to - survey population. The population of all teachers including novice and non - novice teachers is called here the general population. Novice teachers are rare and hence , hard to sample because of two possible reasons. First, in general, mobility or turn - over rate of novice teachers i s higher than veteran teachers ( Simon & Johnson, 2015 ; Smith & Ingersoll, 2004 ) . Even though 25 researchers h ave a frame of novice teachers and know where to find them, they may fail to find them in the schools where they are supposed to be due to high mobility. Second, in case of international survey s , some countries years of teaching experience. For example, in the Unite d States, it may be hard to identify years of teaching experience for each teacher whose previous school is in a different state. Altho ugh novice teachers are a rare and hard - to - sample population, at lea st we know they are in schools, meaning within clusters. Also, previous st udies on the general population support the usage of stratification at the school level, such as school type, location of school , and source of funding (OECD , 2017, 2019) . T herefo re , the rare population of novice teachers with stratification and clustering would be an appropriate population for the application of SICSUP. Data generation and analysis were done by using MATLAB R2015b (The MathWorks, INC., 1984 - 2015) and R software (R Core Team, 2019 ). 3.1 R esearch Question 1 The first research question is about whether SICSUP work s as well as SC in terms of parameter estimation. Simulations were conducted to examine the performance of SICSUP under the various conditions. It was assumed that when sample size is not small, the updating process would be beneficial to estimate parameters including mean, standard deviation, and standar d error of the mean, and hence, SICSUP would work at least as well as SC with respect to precision in parameter estimation. 3.1.1 Data Generation A data set was generated for simulations, and in order to generate a realistic population as possible , the Teaching and Learning International Survey (TALIS) 2018 (OECD, 2019) was used as a basis for generating parameters . The TALIS surveys teachers and school leaders across 26 countries about working conditions and learning environment at their schools, and the q uestionnaire for teachers includes a question about their years of teaching experience. It is available to determine whether a participant of the TALIS is a novice teacher or not. The distribution of novice teachers in schools by strata was used to generat e data for simulations. Specifically, the distribution of novice teachers in Canada was used because, with respect to the TALIS 2018, Canada is o ne of the countries that have high proportion s of schools with zero novice teacher , indicating a rare populatio n. Location of school was use d for stratification. The TALIS 2018 categorizes schools into three locales: rural, town, and city. In the Canada data, a bout 11% of novice teachers are in rural schools, about 25% of novice teachers are in town schools, and about 64% of novice teachers are in city schools. The three locales serve as stratification. - efficacy in instruction serves as the variable of interest for simulations. This variable was chosen because, in the TALIS2018, novi ce teachers in the same stratum (rural, town, or city) behaved similarly to each other . Stratification, which is one of the common sampling techniques used in SICSUP, SICS, and SC, is useful and provides precise estimates of a parameter when values of the variable of interest within each stratum are homogeneous. - efficacy in instruction, is an appropriate variable for employing stratification. A data set with 2,000 novice teachers in 949 schools was generated . In the data, about 19% of the schools have no novice teacher, and about 17% of the schools have only one novice teacher. In the TALIS2018 , about 7% of schools have more than five novice teachers. To avoid too complicated data, an adjustment was applied, so the nu mber of novice teachers in school range d from zero to five. Additionally, because the TALIS2018 Canada has no school with more 27 than three novice teachers in rural area, I added schools with four or five novice teachers to the generated data . Their proporti ons are very small, 2.4% and 1.1% , respectively. T he generated data match the TALIS2018 Canada with respect to proportion s of novice teachers over strata . Table 3. 1 Number of Novice Teachers per School Novice teachers in school Rural Town City N % N % N % 0 38 23.2 59 20.7 83 16.6 1 52 31.7 55 19.3 59 11.8 2 47 28.7 99 34.7 111 22.2 3 21 12.8 42 14.7 85 17.0 4 4 2.4 19 6.7 91 18.2 5 2 1.2 11 3.9 71 14.2 Total 164 100 285 100 500 100 Table 3. 2 Number of Novice Teachers by Location of School Area N % Rural 235 11.8 Town 510 25.5 City 1,255 62.8 Total 2,000 100.0 Based on these proportions of novice teachers over strata , three sets of the variable of interest were created: first, uncorrelated data, which h ave zero correlation between school size - efficacy in instruction; second, mildly correlated data, which ha ve ve 28 3.1.2 Simulation Desi gn T he simulations were conducted under the three conditions: sample size, type of initial proportions used , and correlation between the variable of interest and school size. School size was measured by the number of novice teachers within school. The simulations focus on examining under which conditions SICSUP would perform well whe n it is applied to the rare population of novice teachers. Four levels of sample size were used: 50, 100, 500, and 1,000. The previous research (Reckase, Kim , & Ju, 2016) i ndicated that the selection probability (or inclusion probability) was changed depending on sample size. As the size of the sample increase d , the selection probabilities of novice teachers in different size schools bec a me similar to each other. When a samp le was half the population si ze, selection probabilities beca me equal. That is, when the target sample size is a half of the population size, there is no selection bias due to clustering . Regarding the total population size of 2,000, the sample size of 1,0 00 is a half of the population size. Thus, the four levels of sample size can examine whether sample size affects accuracy in parameter estimation. This dissertation used three types of initial proportions of novice teachers over strata : initial proportions based on data and two types of informal estimate. To apply cluster sampling with stratification for surveys, a frame (list) of sampling unit s is required. If that is not available, at least one should know the proportions of sampling units over strata and the average number of sampling units per cluster. This dissertation focuses on the situations when the proportions of sampling units over strata are unknown before sampling . The situations can be categorized into three conditions. First, resear chers know the true proportions of novice teachers over strata in the Second, although 29 researchers do not know the true proportions, they may have an informal estimate of the prop ortions in the population . This estimate is base d on the proportions of schools over strata, which may be different from the proportions of novice teachers over strata. Third, researchers may have another type of informal est imate. This estimate is based on the assumption that the proportions of novice teachers over strata . Table 3. 3 Initial Propo rtions for Sampling Area Proportion s of novice teachers over strata Initial proportions b ased on d ata Informal est. based on school proportions Informal est. based on equal proportions Rural .20 .30 .33 Town .25 .29 .33 City .55 .42 .33 In practice, the first informal estimate could happen when the proportions of schools over strata are the best information available for researchers before starting the sampling procedure . The second informal estimate assumed that each stratum contains an equal number of novice teachers in the population . If researchers do not know anything a bout the proportions of novice teachers over strata before sampling , taking samples of equal size from strata could happen. Correlation between the variable of interest and school size, meaning number of novice teachers within school , was categorized i nto three: zero , medium, and high ( .0, .4, and .7 , respectively). T eachers , including novice teachers, in large schools tend to stay longer in teaching than those in small schools ( Allensworth, Ponisciak, & Mazzeo, 2009 ; Shin, 1995 ) . High - quality teachers may find greater opportunities, such as advancement and promotion, in large schools which have more positions . T hese teachers are also more likely to leave small schools because working conditions in small schools are usually worse than those in large 30 school s (e.g., heavy teaching and working loads). Additionally, t he previous research (Reckase, Kim , & Ju, 2016) showed that seque ntial cluster sampling (SICS without stratification) worked slightly worse in parameter estimation when school size was correlated with the variable of interest. To sum, t his dissertation considered total 36 conditions (4 sample sizes × 3 types of ini tial proportions × 3 levels of co rrelation). SICSUP, SICS , and SC were used to obtain sets of samples, and 10 sets of samples were created for each simulation condition. 3.1.3 Variance Estimator The firs t research question focuses on estimating mean, standard deviation, and variance of the mean estimate . In order to estimate variance of the sample mean , 2 ( ) , four replication methods were used: the jackknife , bootstrap , balanced repeated replica tion (BRR) , and the method s . The variance of the sampling distribution of is defined to be 2 = 2 = ( ) 2 , (3. 1 ) where is the value of calculated from sample S and P ( S ) is the selection probability . For S I CS U P, it is difficult to calculate variance directly due to the complex sampling procedure. In that case, re plication methods are used to obtain the variance of . The PISA uses the BRR with adjustment ( OECD, 2017 ). The TIMSS uses one variation of the jackknife method , the jackknife r epeated r eplication (JRR) ( Martin, Mullis , & Hooperm , 2016 ). The NAEP also uses the jackknife method 2 . The bootstrap method is also widely used for large - scale surveys ( Statistics Canada , 2018 , 2019 ) . The jackknife method is chosen due to its popularity, simplicity, and relative ease of computation (Canty & Davison, 1999) . The bootstrap method is relatively easy to implement and 2 NAEP Assessment Weighting Procedures. https://nces.ed.gov/nationsreportcard/tdw/weighting/ 31 enables researchers to more readily perform design - based analysis (Mach, Dumais , & Robin son, 2005). It also requires l ess c omputational burden as compared to other variance estimator s (Chen & Shen, 201 9 ) . In bootstrap method, 500 replicates were generated for each simulation condition. s , pseudo - strata were created. The s chools were paired within the original strata, and each pair served as pseudo - stratum. For the jackknife and bootstrap, original and pseudo - strata were used for estimation. The s use Hadamard matrices to create balanced half - sample ( Lumley, 2020) converts a sample to a sample with replicate - weights and estimates the standard error based on either of the jackknife s Hadamard matrices to creat e balanced half - samples. provide replicate weights or to create them by program, and I let the program create replicate weights. 3.1.4 Evaluation Criteria For mean and standard deviation estimates , the mean square errors (MSE) were used to evaluate accuracy in estimation. The MSE is given by = 1 2 = 1 = Variance +Bias 2 , (3. 2 ) where set is the number of sample sets, 10, is the estimate for each set of samples, and is the population parameter, which is the population mean or standard deviation . The MSE is a sum of the variance of estimates and squared bias of estimat es . Smaller MSE indicates a more accurate estimator . Sample means and standard deviations were estimated with sampling weight and without sampling weight. 32 The purpose of weighting on the data for surveys is to obtain estimates of population parameters that do not suffer from bias due to the use of a complex sample design ( Rutkowski, von Davier , & Rutkowski, 2013). Sampling weights are basically an inverse of selection probability . Sampling weights were applied to each novice teacher with respect to t he sample design used. For simple random sampling, all novice teachers have the equal sampling weight, and it is given by = , (3. 3 ) where N is the total number of novice teachers in the population, and n is the sample size. For SC , all novice teachers in the same school in each stratum have the equal sampling weight , and it is given by = × × = × . (3. 4 ) For each stratum, M h is the total number of novice teachers in stratum h , M is the total number of novice teachers in the population, N h is the total number of schools, n h is the number of schools in the sample, m h * is the number of novice teachers from n h schools, and m h is the number of novice teachers in the sample, which equals sample size . The righ t most term in the equation is equal to 1 because all teachers in the sampled school were added to samples. For SICSUP and SICS, the following sampling weights were used: = × × . (3. 5 ) For each stratum, M h is the total number of novice teachers in stratum h , M is the total number of novice teachers in the population, N h is the total number of schools, n h is the number of schools in the sample , m h is the number of novice teachers in the school that was selected at 33 the end, and m h * is the number of novice teachers sampled from the last school . All novice teachers in the same school receive the equal weight except the last school. Novice teachers in the last school might be selected randomly depending on the number of samples obtained before the last school. In the first research question, standard error s (square root of variance) of each sample mean w ere estimated. E stimated bias, relative bias, relative MSE, and confidence interval coverage probability were used in order to compare the four replication standard error estimators : the jackknife , bootstrap , estimators . Estimate d bias = EMP (3. 6 ) is the difference between the average of standard error estimates from the 10 sample means , , EMP . A positive value indicates that the standard error estimator tends to overestimate the empirical standard error and a negative value indicates that the standard error estimator tends to underestimate the empirical standard error. A value near zero is preferred and represents a good standard error estimator. The e mpirical s tandard error is the standard deviation of the 5 ,000 sample means. . = = . (3. 7 ) The relative bias is the estimated bias divided by the empirical standard error. Because estimated bias can be a negative or positive value, the relative bias can also be a negative or positive value. The relati ve bias would be zero when the estimated bias is zero, which is hardly ever the case in real world. The relative bias expresses the estimated bias as a proportion of the 34 empirical standard e rror . A small absolute value of relative bias is preferred and indicates a good standard error estimator. The third criterion is the relative MSE. The relative MSE of the standard error estimator, . = 2 = 1 ( ) 2 = 1 2 , (3. 8 ) where set is the number of the sample sets, 10. The relative MSE expresses the MSE as a proportion of the squared empirical standard error. Like the rela tive bias, a small value of relative MSE is preferred and indicates a good standard error estimator. Finally, the confidence interval coverage probability is the probability of the 10 samples for which the estimated 95% confidence interval covers the population mean. The confidence interval for the sample mean , , is given by = ± / 2 , (3. 9 ) where z dard error estimator. It is expected that the coverage probability would equal the nominal coverage probability of 95%. However, because 10 sets of samples were generated for each simulation condition , the coverage probability can only be expressed in tenth such as .9 and .8. Therefore, in this study, coverage probabil ity of . 9 or higher is preferred and interpreted as a good standard error estimator. 3.2 Research Question 2 The second research question is about how the appropriate sample size for SICSUP can be determined. The first question asked when a survey is being planned is what sample size to be used . The larger the sample size is, the better accuracy in estimation can be achieved although the more likely a hypothesis test will detect a small difference , increasing a probability of rejecting a null hypothesis. In addition, taking a large r sample requires more resource such as 35 time a nd cost. A survey should consider the maximum sample error one is willing to accept and the effect of the samp le design on estimation precision so that the sample size for the survey can be decided. 3.2.1 Data and Simulation Design S imulation studies are c onduc ted using the same dataset and simulation conditions that are used in the previous section (the first research question) . Each of SI CSUP, SICS, and SC takes 10 sets of samples under the 36 conditions (4 sample sizes × 3 correlations × 3 initial proportions), and each set of sample provides the standard error of the sample mean . The results in the previous section ( the first research question ) indicate that the four replication standard error estimators such as the jackknife , bootstrap, estimators work similarly to each other on average. For the second research question, the standard error for each simulation condition is obtained by averaging standard errors from the four standard error estimators with pseudo - strata. 3.2.2 Ev aluation Criteria Design Effect and Sample Size . Before data collection, sample size should be determined so that the results of the survey could provide a certain degree of precision in estimation. The sample size is determined by the margin of error, design effect , and confidence level . Complex sample design s such as SC usually require larger sample sizes than those for SRS in order to achieve the same level of prec ision. The design effect ( Deff ) is the ratio of the variance of a sample that is from a complex sample design to the variance of a SRS sample with the same sample size : = 2 2 . (3. 10 ) 36 The design effect summarizes the effect of various complexities in the sample design such as clustering and stratification (Kish, 1965). T he variance of stratified sample s could be smaller than the variance of SRS sample s due to stratification. Therefore, the design effect could be less than one. For clustered samples, the variance of clu stered sample s tends to be larger than the variance of SRS sample s due to clustering. Thus, the design effect is typically larger than one. For stratified clustered samples, the design effect depends on the effect of stratification and clustering. The required sample size , n , for the survey is a product of sample size for SRS and design effect and is given by n = n SRS × Deff , (3. 11 ) where n SRS is the sample size for SRS . The required sample size for a complex sample design, n , and n SRS can produce estimates at the same level of precision. For example, if a design effect is 2 and n SRS is 1 00, samples of 2 00 from the complex sample design are required in order to obtain the results as precise as those from 1 00 SRS samples. One of the simulation conditions used in this study is the different levels of sample size, ranged from 50 to 1,000. The design effect is computed for each level of sample size and used for calculating sample sizes for SICSUP, SICS and SC . The calculated s ample size s of SICSUP, SICS, and SC provide the same estimation precision as the given sample size of SRS would . It is expected that, under the same simulation condition, as the sample size increase s , the design effect decreases. Thus, the difference in sa mple size between SRS and the three complex sample design s (SICSUP, SICS, and SC ) would decrease . 37 Under the various simulation conditio ns, required sample sizes for S I C SUP are mainly compared to those for SC . S mall difference in sample size between S ICSUP and SC indicate s that SICSUP is as effective as SC . Margin of Error and Sample Size . A margin of error is another factor for determining sample size. A margin of error refers to a limit of accuracy of a sample estimate (Agresti & Finlay, 2009). In other words, it shows how many points the results can be differ from the population paramet er . To determine sample size, researchers should decide on the margin of error de sired. The margin of error of an estimate is the maximum likely estimation error exp ected when the sample statistic is used as an estimator (Peck, 2014). In this study, the sample statistic is mainly the sample mean. The margin of error is ME = z × 2 = 1.96 × 2 , (3. 12 ) w here 2 is the population variance and n is the sample size. With a conventional 95% confidence level , 1.96 is used for z . If the margin of error for the mean is d at a 95% confidence level, 95% sample means fall within the population mean plus or minus d . Given the margin of error (ME), population variance ( 2 ) , population size ( N ), and a 95% confidence level, the necessary sample size for SRS , n SRS , can be obtained by the following formula (Thompson, 2002): = 1 2 2 2 + 1 = 1 1 0 + 1 (3. 13 ) where 0 = 2 2 2 . In the dataset used for this research question , the population means are 12.37, 12.27, and 12.23 for uncorrelated data ( 38 = .7), respectively; the population standard deviations are 1.94, 2.05, 2.16 for uncorrelated data ( , respe ctively . Considering the standard deviations, the five levels of margin of error were examined, from .1 to .5. Table 3. 4 presents the required sample siz es given the level of margin of error. For the population used in the second research question, th e margin of error of .1 might require too many samples considering th at the design effects are between 1 and 3. Given the margin of error, .1, if the design e ffect is 3, the required sample sizes (3 times the last column in Table 3. 4 ) would be se sample sizes are larger than population size of 2,000, and then, it is impossible to achieve the margin of error of .1 in this population . T he lowest margin of erro r that can be obtained in this population is examined. Table 3. 4 Margin of Error and Required Sample Size for SRS Margin of Error 0.5 0.4 0.3 0.2 0.1 56 87 149 307 840 63 96 165 337 895 69 106 180 365 943 The sample sizes for SICSUP, SICS, and SC are computed based on the margin of error and sample sizes for SRS in Table 3. 4 . This study investigates whether SICSUP needs more samples than SC in order to achieve a given margin of error. If the required sample size for SICSUP is s imilar to that for SC, SICSUP can be considered as effective as SC. 3.3 Research Question 3 The third research question is about whether samples from SICSUP can determine group differences. The overall state ranking s or country ranking s compared with others are one of the 39 headline findings for m national or international large - scale surveys and assessments in education ( OECD, 2016). Education authorities and policy makers have paid great attention to rankings that provide information for their benchmarking tools to help develop educational strategies (Downing & Ganotice Jr., 2016) . For the general public, such as parents and students, rankings also provide relative performance as compared to those in other states or countries. There is strong media interest in ra nkings because they are clear and e asy to understand. Although results should not be interpreted naively and are often abused, state and country rankings are one of the most influential results from national and international surveys and assessments. In t his section, it is assumed that one would like to conduct an international survey in order to compare the ra nk order position of a country with the positions of other countries . Simu lation studies are conducted to examine whether, in such situation, SICSUP would perform as well as SC , which has been frequently used for international surveys. 3.3.1 Data Generation Like the first research question, the TALIS2018 (OECD, 2019) was used to generate datasets for the simulation studies. The data generation procedure h ere is basically the same as the one in the previous section. Five countries were selected in the TALIS 2018: Brazil, Canada, New Zealand, Portugal, and Taiwan. These five countries were selected because they have relatively higher proportions of schools w ith no novice teacher . In o ther words, novice teachers are rare in their countries as compared to other countries in the TALIS2018. The percentages of schools with no novice teacher are about 60% for Portugal, 31% for Brazil, 24% for Canada, 23% for Taiwan, and 20% for New Zealand. 40 Table 3. 5 provides the summary of the five generat ed datasets. Although the generated datasets are based on the distributional information of the five countries, they are simulated datasets rather than real datasets. Therefore, in this dissertation, country 1, country 2, country 3, country 4, and country 5 refer to the datasets based on Portugal, Brazil, Canada, Taiwan, and New Zealand, respectively. For each of the five countries, a population of 10,000 novice teachers was generated. In this research question , the variable of job satisfaction with profession was considered the variable of interest. The second and third columns in Table 3. 5 present the means and standard deviations of the varia ble of interest in the generated populations. In the simulations, the population means are estimated by sample s of SICSUP, SICS, and SC . Table 3. 5 Summary of the Generated Data by Countries Country M ean SD Total NT Total School % Sch. with No NT Max NT * Country 1 12.37 2.29 10000 162000 60.6 5 Country 2 11.31 2.06 10000 6222 31.8 9 Country 3 11.46 2.06 10000 5368 25.5 8 Country 4 11.70 1.75 10000 4950 23.8 12 Country 5 11.92 2.03 10000 4299 21.1 11 * Note: m aximum number of novice teachers (NT) in a school In the TALIS 2018, the five countries used one or more stratification variables at the school level. The stratification variables were selected based on the stratificati on variables that were used in the TALIS 2018 ( OECD, 2 019) . In the TALIS2018, country 4 ( Taiwan ) used two types of stratification variables, such as location of sc hool and source of funding. O ne of the strata had a very small fraction in the population (about 1.4 % ) . This stratum was excluded in the generated population. 41 Table 3. 6 Stratification Variable by Country Country Stratification variable Level Description Country 1 School location 4 (1) village, hamlet or rural area, (2) s mall town , (3) t own , and (4) c ity Country 2 Source of funding 2 (1) public school and (2) private school Country 3 School location 3 (1) r ural , (2) t own , and (3) c ity Country 4 School location and source of funding 5 (Rural, town, and city) × (Public school and private school) * Country 5 School size measured by number of enrolled students 5 (1) un der 250 , (2) 250 - 499 , (3) 500 - 749 , (4) 750 - 999 , and (5) 1000 and above * Note: p rivate schools in rural area were excluded from the population due to very small proportion in the population ( about 1.4 %) 3.3.2 Simulation Design For each country, the population mean (or country mean) is estimated. Given the margin of error (ME), population variance ( 2 ) , population size ( N ), and a 95% confidence level, the necessary sample size for SRS , n SRS , can be obtained by the following formula (Thompson, 2002): = 1 2 2 2 + 1 = 1 1 0 + 1 (3. 14 ) where 0 = 2 2 2 . Based on the formula above, the required sample sizes for SRS given the levels of margin of error were calculated for each country (see Table 3 . 7 ). Alth ough design effects of SICSUP, SICS, and SC for the five populations are not known, one may assume that the design effects are less than three considering the results of the second research question. Therefore, the sample sizes for SICSUP, SICS, and SC wou ld be about three times larger than the sample sizes in Table 3 . 7 . In order to achieve the margin of error, .1 for all five countries , around 5,000 samples ( 1679 42 × 3 = 5037) seem to be needed, which is a half of the population size of 10,000. About 1,500 samples are desired for the margin of error, .2, about 660 samples for the margin of error , .3, and 380 samples for the margin of error , .4. In this study, the sample s ize of 600 was chosen and it would give margin of errors slightly larger than .3 for country 1 and less than .3 for the other countries , if the design effect is three. Table 3 . 7 Required Sample Size for SRS by Margin of Error and Country Country Margin of Error 0.5 0.4 0.3 0.2 0.1 Country 1 80 125 219 480 1679 Country 2 65 101 178 391 1400 Country 3 65 100 177 390 1397 Country 4 63 98 174 382 1371 Country 5 47 73 129 286 1054 Two types of initial proportions of novice teacher s over strata were used for simulations: initial proportions based on data and informal estimate of the population proportions based on school proportions. For each country, 10 sets of samples were taken and the results from th e 10 sets of samples were averaged and reported. 3.3.3 Evaluation Criteria The population means of the five countries are estimated using SICSUP, SICS, and SC samples. For eac h sample mean , standard error is also estimated using the jackknife estimator with original strata and the BRR estimator w ith pseudo - strata. The results from the first research question suggest that, on average, they work slightly better than the other estimators in SICSUP. Based on the results of the 10 sets of samples, 95% confidence interval coverage probability is investigated. The preferred values are .9 and 1.0 conside ring the number of sample sets, which cannot provide probabilities in hundredth. 43 The five countries are ranked in descending order of the sample mea ns based on the samples from SICSUP , SICS, or SC , and the rankings are compared to those based on the population mean s. Four types of rank order are examined: country rankings based on each of the sample designs and country rankings based on a combination of SICSUP and SC . National or international large - scale surveys and assessments often use different sample design s regarding the situation s of the participating states or countries . For example, the overall sample design for the TALIS 2018 was a stratifi ed two - stage probability sample design (OECD, 2019). Stratification was applied based on the situation of each country. Geography, source of financing, type of educational program, and school size were used as stratification variables. In the case of the P ISA, there were countries that used a three - stage design while the overall sample design was a two - stage design (OECD, 2017). Country 1, 4, and 5 take samples using SICSUP while country 2 and 3 do it using SC . Country 1 has the rarest population in terms of number of schools with no novice teacher. Only 40% of schools contain at least one novice teacher. One of the strata in country 4 and 5 has a very small fraction: 8% in country 4 and 6% in country 5 . It is expected th at SICSUP would work well under these situations as compared to SC . Based on the population means, country 1 has the highest mean, followed by country 5 , 4, 3, and 2 . The rankings estimated by samples of SICSUP, SICS, and SC are compared to the rankings ba sed on the population mean s . 3.4 Research Question 4 The last research question evaluate s the economic aspect of SICSUP as compared with that of SICS and SC . What is a good sample design? What are the optimal characteristics of a sample design? It is often s aid that a good sample design can achieve a fixed level of precision 44 with the least amount resource s used such as cost and time. This description contains two aspects of good sample designs: statistical and econo mic aspects . The previous three research questions evaluate SICSUP wit h respect to statistical aspect . The last research question evaluates SIC SUP in terms of economic aspect . Drawing samples from a rare population often causes difficulties with respect to resource consumption because sampling u nits are hard to locate. If researchers take samples from a rare population using a conventional sample design, such as cluster sampling or multi - stage sampling, they would see a large proportion of units that do not satisfy the selection criterion. Based on the data used in this dissertation, if one draws schools from the population, there would be a large Usually less resource s , such as time and cost, are associate d with observing a school with no novice teacher than observing a school with at least one novice teacher. Schools with no novice teacher are discarded without administering the survey, so the amount of resource s used for such schools is less than for scho ols with at least one novice teacher. However, drawing schools still spends some resource s rega rdless whether they are added to the final set of samples or not. For example, obtaining approval and cooperation of schools often takes time and cost. One advan tage of SICSUP over conventional SC is that it can reduce the frequency of meeting such more economical than SC . 3.4.1 Data and Simulation Design To address the la st research question, the generated data for the first and third research questions are used. SICSUP, SICS, and SC are used to draw samples from the populations, and the number of schools that are contacted during the sampling procedure and the number of 45 s chools that are included in the final set of samples are examined. For the dataset from the first research question (dataset 1) , different levels of sample size are applied including sample sizes of 50, 100, 500, and 1,000. For the dataset from the third research question (dataset 2) , five different countries are examined given the sample size of 600. For both of the datasets, the results are reported by strata in addition to the results based on the whole samples. In some situations, the resource s requir ed to conduct a survey may be different between strata. For example, travel cost is proportional to the distance. If location of school such as rural, town, and city is used as stratification, survey ing samples in rural schools might be more expensive than those in city schools because the distance between rural schools tends to be greater than the distance between city schoo ls. If SICSUP can achieve a pre determined sample size of novice teachers in rural area, which may require more resource s than other st rata, with fewer schools contacted as compared to SICS or SC would , SICSUP is more economic than the others. For each dataset, 500 sets of samples are taken and the averaged results are reported in the result chapter. 3.4.2 Evaluation Criteria The economic as pect of SICSUP are measured by the number of schools that are contacted during the selection process ( n* ). These contacted schools consist of two types of schools: schools without novice teacher and schools with at least one novice teacher. The former is d iscarded without administering a survey, and the latte r is added to the final sample set of novice teachers. This can be expressed by = + . (3. 15 ) 46 Considering that the numbers of schools in the final set of samples are similar regardless of sample design, as the total number of contacted school s increases, the frequency of seeing economical. The ratio of the number of schools in the final sample set to the number of contacted schools is used as the evaluatio n criterion. The value of 1 indicates that researchers did not meet ave at least one novice teacher and are added to the final set of samples. With smaller value, researchers more s . The value of 0 indicates that all contacted schools are with no novice teacher, and researchers failed to sample any novice teacher through the sampling procedur e. The value close to 1 suggests an economic sample design. The ratio must be interpreted with the number of contacted schools or the number of schools in the final sample set because the ratio provides only relative information. For example, consider tha t , given a sample size, in SIC S UP, the number of contacted schools and the number of schools in the final sample set are 10 and 9, respectively; in SC , the numbers are 20 and 18, respectively. Both of the cases provide the ratio of .9, but one cannot say that they are equally economical because SICSUP used fewer schools to achieve th e given sample size than SC did . The ratio of tw o sample designs is also used in order to evaluate the performance of SICSUP; (1) the ratio of the number of conta cted schools in SICSUP ( ) to those in SC ( ), , (2) the ratio of the number of contacted schools in SICSUP to those in SICS ( ) , ; and (3) t he ratio of the number of contacted schools in SICS to those in SC , . The first ratio shows the effect of the updating process and sequential selection on the number of co ntacted schools, the second desc ribes the effect of the updating pr ocess, and the third reveals the effects of the sequential selection. For each ratio, the smaller the value, the 47 greater the effect of the updating process, sequential selection, or both upon the number of contacted schools during the sampling procedure. In addition to the two types of ratios, the probability of using substitute schools in SC is investigated. In this study, it is assumed that all sampled novice teachers participate in the survey, and using substitute schools only occurs when the novice teachers in the selecte d schools did not reach the pre determined sample size. Although it does not directly give information to evaluate the performance of SICSUP, the probability shows how SC works inapprop riately in the rare population of novice teachers. 48 CHAPTER 4. RESULTS This chapter summarizes the results of the analyses organized into four sections corresponding to the four research questions described in Chapter 1 . The first two sections report the results of the simulation studies that investigated the level of precision in estimating population parameters under various conditions. The third section also presents the results of the simulation studies t hat examined another statistical aspect of SICSUP. Unlike the first two sections, that assumed a national survey, the third section focuse s on the application of SI C SUP to international surveys. The last section focuses on the evaluating SICSUP in terms of economic aspect r ather than statistical aspect . Throughout the chapter, n represent between school size and the variable of interest. SICSUP refers to stratified inverse cluster sampling with updating process, SI C S to stratified inverse cluster sampling without updating process, and S C to stratified cluster sampling . A symbol of estimator, and its subscripts denote a simulation condition: J = jackknife standard error B = R = BRR standard error F = Fay's standard error estimator, UJ = jackknife standard error estimator using SI C SUP sample s IJ = jackknife standard error estimator using SI C S sample s S J = jackknife standard error estimator using SC sample s , UB = bootst rap standard error estimator usi ng SICSUP sample s IB = bootstrap standard error estimator using SICS sample s , S B = bootstrap standard error estimator using SC sample s , UR = BRR standard error estimator using SICSUP sample s IR = BRR standard error estimator using SICS sample s S R = BRR standard error estimator using SC 49 sample s UF = Fay's standard error estimator using SICSUP sample s IF = Fay's standard error estimator using SICS sample s , and S F = Fay's standard error estimator using SC sample s . 4.1 Research Question 1 The first research question is about whether SICSUP works at least as well as SC with respect to estimating population mean, standard deviation, and standard error (square root of variance) of the sample mean. For mean and standard deviation est imation, the mean squared errors (MSE) are reported in order to examine the estimatio n precision. For standard error estimation, estimated bias, relative bias, relative MSE, and 95% confidence interval coverage probability are reported for each simulation condition. 4.1.1 Mean and Standard Deviation Mean . Table 4 .1 shows the MSEs of the sample mean s and standard deviation s in SRS. In both of sample mean s a nd standard deviation s , as the sample size increases, the MSE decreases. This pattern stays the same regardless of correlation: no correlation between school size and the .0 .7). Table 4. 1 MSE of the M ean and S tandard D eviation U sing SRS S amples n Mean SD 50 0. 0 0.07 0.02 50 0.4 0.11 0.01 50 0.7 0.04 0.05 100 0. 0 0.03 0.03 100 0.4 0.02 0.03 100 0.7 0.04 0.04 500 0. 0 0.00 0.00 500 0.4 0.01 0.00 500 0.7 0.00 0.00 1000 0. 0 0.00 0.00 1000 0.4 0.00 0.00 1000 0.7 0.00 0.00 50 Table 4. 2 MSE of Mean Using SI CSUP, SICS, and SC S amples n Weighted Unweighted SICS UP SICS SC SICS UP SICS SC Initial Proportions Based on Data 50 0. 0 0.29 0.12 0.17 0.21 0.14 0.14 50 0.4 0.34 0.40 0.29 0.31 0.24 0.21 50 0.7 0.59 0.26 0.20 0.48 0.21 0.11 100 0. 0 0.12 0.11 0.10 0.12 0.11 0.10 100 0.4 0.12 0.17 0.13 0.09 0.18 0.12 100 0.7 0.08 0.20 0.16 0.03 0.17 0.13 500 0. 0 0.03 0.02 0.01 0.03 0.01 0.01 500 0.4 0.02 0.05 0.02 0.02 0.04 0.01 500 0.7 0.05 0.02 0.02 0.03 0.01 0.01 1000 0. 0 0.01 0.00 0.01 0.01 0.00 0.01 1000 0.4 0.01 0.01 0.02 0.01 0.01 0.01 1000 0.7 0.02 0.02 0.02 0.01 0.01 0.01 Informal Estimate Based on School Proportions 50 0. 0 0.07 0.19 0.46 0.09 0.10 0.27 50 0.4 0.42 0.15 0.33 0.31 0.09 0.25 50 0.7 0.42 0.30 0.25 0.25 0.26 0.18 100 0. 0 0.18 0.06 0.12 0.15 0.03 0.08 100 0.4 0.18 0.12 0.12 0.12 0.09 0.09 100 0.7 0.14 0.06 0.18 0.13 0.02 0.14 500 0. 0 0.01 0.02 0.02 0.01 0.02 0.01 500 0.4 0.02 0.02 0.03 0.01 0.01 0.02 500 0.7 0.03 0.02 0.04 0.02 0.01 0.03 1000 0. 0 0.01 0.01 0.00 0.01 0.00 0.00 1000 0.4 0.00 0.01 0.01 0.00 0.01 0.01 1000 0.7 0.02 0.01 0.01 0.01 0.01 0.01 Informal Estimate Based on Equal Proportions 50 0. 0 0.10 0.25 0.11 0.09 0.17 0.14 50 0.4 0.31 0.29 0.22 0.26 0.23 0.14 50 0.7 0.39 0.48 0.23 0.33 0.28 0.36 100 0. 0 0.15 0.15 0.07 0.05 0.12 0.08 100 0.4 0.14 0.15 0.11 0.07 0.08 0.06 100 0.7 0.13 0.14 0.12 0.07 0.07 0.05 500 0. 0 0.01 0.01 0.04 0.02 0.02 0.01 500 0.4 0.02 0.02 0.03 0.01 0.06 0.03 500 0.7 0.03 0.02 0.05 0.04 0.02 0.01 1000 0. 0 0.01 0.01 0.01 0.01 0.00 0.00 1000 0.4 0.01 0.00 0.01 0.00 0.01 0.01 1000 0.7 0.01 0.01 0.01 0.01 0.01 0.01 51 Table 4. 2 shows the MSEs of the sample means using the samples from three different sample design s : SICSUP, SICS, and SC. Weighted means and unweighted means were estimated under each simulation condition. The MSE s in SICSUP, SICS, and SC are greater than those in SRS when sample size is not large ( n 100). The MSEs in SRS are between .02 and .11 while those in the three sa mple design s are between .02 and .59. However, with medium to large sample sizes ( n 500), the MSEs in the three sample designs are very similar to those in SRS . That indicates that the sample means based on the three sample design s are almost as accurate as the sample means based on SRS. That also shows that SICSUP works as well as SC with medium to large sample sizes . Under the simulation condition of initial proportions based on data and small sample size ( n 100), the updating process of SICSUP is not helpful to estimate the population m ean as compared to the results in SICS and SC . For both of weighted and unweighted sample means , given the sample size of 50, the MSEs in SICSUP are larger than tho se in S I C S and SC in general. For example, in SICSUP are .29 for the weighted mean and .21 for the unweigh t ed mean while the MSEs in SC are .17 for the weighted mean and .14 for the unweighted mean. When the sample size is small, the updati ng process of SICSUP might not be able to find the true proportions of novice teachers over strata in the population. The updating process relies on the samples that were collected to this point . Therefore, the number of samples that would be used for the updating process ( n 1 ) is smaller than the predetermined sample size ( n ), expressed by n = n 1 + n 2 , where n 2 is the number of novice teachers who are sampled after the updating process. For example, given the sample size of 50, the updating process relies on samples less than 50. Because of the small number of samples that are used for the updating process, the 52 updated proportions of novice teachers over strata might be different from those in the population . If researchers already know the true proportions (the proportions in the population) befo re the sampling procedure, the updatin g process is not necessary and thus, could not be able to increase accuracy in parameter estimation . Therefore, the MSEs in SICSUP are larger than those in SC when sample size is small. As the sample size increases, the updating process can provide proper information about the proportions of novice teachers over strata , and the MSEs in SICSUP and SC become similar to each other. An interesting finding is that even with small sample size ( n = 50), under the condition of either of informal estimates of the proportions used , SICSUP works better than SC in terms of MSE. Although the updating process with small sample size may not be helpful for par ameter estima tion, if researchers do not know the true proportions over strata , the updating process at least provides some useful information about the proportions. Therefore, SICSUP could produce better estimates than SC . As shown in Figure 4.1, empiric al selection probabilities (selection probabilities based on 5,000 sets of samples) in SICSUP indicate that, in general, when the sample size is small ( n = 50), schools with more than one novice teachers have a slightly higher chance of being sampled than schools with one novice teacher. As the sample size increases, the selection probabilities become almost equal regardless of school size. This could influence accuracy in estimation especially under the simulation condition , large schools tend to have higher means than small schools because school size and school mean are po sitively correlated. In SICSUP as well as SICS, t he MSEs under . 0 and n = 50 are greater than those under and n = 50 . 53 * N ote: each bar represents the number of novice teachers in school (e.g., dark blue bar s refer to schools with one novice teacher) . Figure 4. 1 Empiric al Selection Probability for n= 50 (left) and n=1,000 (right) U sing SICSUP When researchers have = . 0, the MSEs in SICS are similar to those in SC rather than those in SICSUP . This is because of the similarity in sampling procedure between SICS and SC under such condition. Both of them use stratification, clusters, and fixed sample sizes for strata. The only difference is the selection method when the selected schools have more novice teachers tha n required. In SICS, all novice teachers would be sampled in contacted schools except those in the lastly contacted school . For example, given the sample size of 50, consider that a researcher has collected 49 samples so far. The next contacted school has two novice teachers while the required number of n ovice teachers from that school is only one. In that case, the researcher would pick one out of the two novice teachers. In SC used in this dissertation, all two novice teachers in that school are once sampled and later, one novice teacher is removed rando mly from the whole sample of 51. Except this difference, the sampling procedure between SICS and SC is similar to each other, that may lead the similar MSEs between the two sample designs under the condition mentioned above. 54 Taking all of results together , there are four main findings. First, as the sample size increases, the MSE s in SICSUP, SICS, and SC decrease and become close to those in SRS . Second, SICSUP works as well as SC when sample size is not small ( n 500). Three, the upda ting process of SICSUP may not be beneficial to estimate the population mean accurately with small sample size and correlated data . However, if researchers do not know the true proportions of novice teachers over strata before the sampling procedu re, under the condition of similar to SC under the condi tion of initial proportions based on data Standard Deviation . Table 4 . 3 gives the MSEs of the sample standard deviation s . Weighted standard deviations and unweighted standard deviations were estimated using the three sample designs : SICSUP, SICS, and SC . Like the sample means , the MSEs in SICSUP, SICS, and SC are greater than those in SRS when sample size is not large ( n 100). The MSEs in SRS are between .01 and .05 while those in the three sample design s are between .03 and .27 . However, with medium to large sample sizes ( n 500), the MSEs in the three sample design s are similar to those in SRS . That indicates that the sample standard deviation s based on the three sample design s are as accurate as those based on SRS. 55 Table 4 . 3 MSE of Standard Deviation Using SI CSUP, SICS, and SC S amples n Weighted Unweighted SICS UP SICS SC SICS UP SICS SC Initial Proportions Based on Data 50 0. 0 0.27 0.10 0.07 0.19 0.11 0.06 50 0.4 0.15 0.15 0.15 0.10 0.15 0.10 50 0.7 0.07 0.07 0.12 0.05 0.06 0.10 100 0. 0 0.04 0.05 0.06 0.05 0.03 0.04 100 0.4 0.07 0.05 0.03 0.07 0.04 0.03 100 0.7 0.07 0.06 0.16 0.06 0.03 0.10 500 0. 0 0.01 0.01 0.01 0.00 0.01 0.01 500 0.4 0.01 0.01 0.01 0.01 0.01 0.01 500 0.7 0.01 0.02 0.02 0.00 0.01 0.01 1000 0. 0 0.01 0.01 0.00 0.00 0.00 0.00 1000 0.4 0.00 0.00 0.00 0.00 0.00 0.00 1000 0.7 0.01 0.00 0.01 0.00 0.00 0.00 Informal Estimate Based on School Proportions 50 0. 0 0.24 0.10 0.12 0.17 0.05 0.11 50 0.4 0.26 0.18 0.12 0.18 0.14 0.08 50 0.7 0.09 0.10 0.16 0.05 0.09 0.13 100 0. 0 0.03 0.11 0.08 0.03 0.06 0.05 100 0.4 0.03 0.06 0.04 0.03 0.05 0.03 100 0.7 0.08 0.06 0.07 0.06 0.04 0.06 500 0. 0 0.01 0.02 0.01 0.01 0.02 0.01 500 0.4 0.01 0.01 0.01 0.01 0.01 0.01 500 0.7 0.03 0.03 0.01 0.01 0.02 0.01 1000 0. 0 0.00 0.00 0.00 0.00 0.00 0.00 1000 0.4 0.00 0.00 0.00 0.00 0.00 0.00 1000 0.7 0.01 0.01 0.01 0.01 0.01 0.01 Informal Estimate Based on Equal Proportions 50 0. 0 0.03 0.07 0.13 0.04 0.06 0.12 50 0.4 0.12 0.10 0.08 0.12 0.09 0.05 50 0.7 0.17 0.22 0.10 0.13 0.10 0.07 100 0. 0 0.10 0.09 0.09 0.07 0.03 0.02 100 0.4 0.07 0.05 0.08 0.09 0.04 0.02 100 0.7 0.12 0.09 0.12 0.12 0.06 0.03 500 0. 0 0.01 0.01 0.02 0.02 0.02 0.02 500 0.4 0.02 0.01 0.02 0.01 0.02 0.01 500 0.7 0.01 0.01 0.02 0.02 0.01 0.01 1000 0. 0 0.00 0.00 0.00 0.00 0.00 0.00 1000 0.4 0.00 0.00 0.01 0.00 0.01 0.00 1000 0.7 0.00 0.00 0.00 0.01 0.01 0.01 56 The MSEs of the sample standard deviation s show a similar pattern to those of the sample means . SICSUP works as well as SC in terms of MSE except when the sample size is very small ( n = 50). Regardless of type of initial proportions and correlation, when the sample size is not very small ( n in SICSUP are not very different from those in SC for both of the weighted and unweighted sample standard deviations . The greatest difference in MSE between SICSUP and SC occurs under the condition of initial proportions based on data, n = .0. On the other hand , under such condition, the differen c e between SICS and SC is not that great, showing the effect of sequential selection. These differences indicate that the updating process does not work well unde r the condition mentioned above. However, e ven under the condition of small sample size ( n when informal estimate based on equal proportions is used , SICSUP works better than SC. T he updating process provides at least some useful information about the proportions of novice with those in the previous section (mean estimation) . In SICSUP, the MSEs of the sample means under the condition of smaller than those under t he condition of or .7. However, that is not the case in the sample standard deviation s. There are some cases that the MSEs of the sample standard deviation s under those under ion of initial proportions based on data , n = 50, and weighted samples, the MSE when is .27 ; the corresponding value when 7 is . 07 . In SICSU P, w ith small sample size of 50, schools with more than one novice teacher tend to have a higher chance of being selected than schools with one novice teacher (see Figure 4. 1 ) . When > .0, large schools tend to have higher school means than small schools have. Th ese two 57 factors may make MSEs small when > .0 and n = 50 as compared to when = .0 and n = 50 . Under the former condition , i n each SICSU P sample set , schools are similar in size, and school means are similar to each other (less spread out) . Therefore, sample standard deviations in the sets of samples would be also similar to each other . This might reduce the MSEs , which refer to a sum of the variance of the estimate s a nd squared bias of the estimates . Like the mean estimation, under the condition of initial proportions based on data and = . 0, the MSEs in SICS are similar to those in SC rather than those in SICSUP because of the same reason as I mentioned in the previous section, which is the similarity in sampling procedure between the SICS and SC under such condition. T he MSEs of the sample standard deviation s show a similar pattern to those of the sample means . Taking all of the results together, there are four main findings. First, as the sample size in creases, the MSE s in SICSUP, SICS, and SC decrease and become close to those in SRS . Second, SICSUP works as well as SC when sample size is not very small ( n 100). However, under the condition of small sample size ( n = .0, and informal estimate based on equal proportion s , SICSUP works better than SC . Third, there are some cases that the MSEs of the sample standard deviation s when .0 are smaller than t hose when SICS and SC works similar ly to each other under the condi tion of i nitial proportions based on data and . 4.1.2 Standard Error of t he Sample Mean Table 4. 4 gives the estima ted bias, relative bias, relative MSE, and 95% confidence interval coverage probability for different standard error estimators using SRS. The estimates were obtained under the two conditions: without strata and with pseudo - strata. The jackknife and bootstrap estimators can be used for a sample without strata while the BRR a estimator s 58 require a sample with a certain type of strata. Therefore, pseudo - strata were generate d for samples in SRS. The novice teacher s in each set of samples were randomly paired , and each pair represented a stratum. Simple Random Sampling w ith out Strata . The standard errors were computed based on the jackknife and bootstrap methods without using strata. In terms of estimated bias, relative bias, relative MSE, and confidence interval coverage probability, the two standard error estimators performed similarly well regardless of sample size and level of correlation. The estimated bias is the difference between the empirical standard error and the average of standard error estimates from the 10 sets of samples and can be a negative value. The relative bias is the estimated bias divided by the empirical standard error , and hence, can be also a negative value. Table 4. 4 Estimated Bias, R elative B ias, R elative MSE, and C onfidence I nterval C overage P robability (CV) of the Standard Error E stimators U sing SRS without S trata n Jackknife B oot strap B ias R el . B ias Rel . MSE CV B ias R el . B ias Rel . MSE CV 50 0. 0 0.01 0.03 0.01 1.00 0.01 0.04 0.01 1.00 50 0.4 0.00 0.01 0.00 0.90 0.01 0.02 0.01 0.90 50 0.7 - 0.02 - 0.05 0.01 1.00 - 0.02 - 0.06 0.01 1.00 100 0. 0 - 0.01 - 0.04 0.01 0.90 - 0.01 - 0.03 0.01 0.90 100 0.4 0.00 0.02 0.01 1.00 0.01 0.04 0.01 1.00 100 0.7 0.01 0.03 0.01 0.90 0.01 0.04 0.01 0.90 500 0. 0 0.01 0.20 0.04 0.90 0.01 0.20 0.04 0.90 500 0.4 0.01 0.16 0.03 0.90 0.01 0.15 0.02 0.90 500 0.7 0.01 0.14 0.02 1.00 0.01 0.15 0.02 1.00 1000 0. 0 0.02 0.41 0.17 1.00 0.02 0.40 0.16 1.00 1000 0.4 0.02 0.38 0.14 1.00 0.02 0.35 0.12 1.00 1000 0.7 0.02 0.43 0.19 1.00 0.02 0.43 0.19 1.00 Simple Random Sampling w ith Pseudo - S trata . The standard errors were computed based on the jackknife , methods with pseudo - strata. Table 4. 5 shows the 59 estimated bi ases and relative biases . All standard error estimators exhibit similar estimated biases and relative biases regardless of sample size and level of correlation. Table 4. 5 Estimated Bias and R elative B ias of the Standard Error E stimators U sing SRS with Pseudo - S trata n Bias Rel. Bias J B Br F J B Br F 50 0. 0 0.02 0.01 0.02 0.02 0.06 0.05 0.06 0.06 50 0.4 0.01 0.01 0.01 0.01 0.03 0.04 0.03 0.03 50 0.7 - 0.03 - 0.02 - 0.03 - 0.03 - 0.09 - 0.08 - 0.09 - 0.09 100 0. 0 0.00 0.00 0.00 0.00 - 0.02 - 0.03 - 0.02 - 0.02 100 0.4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100 0.7 0.01 0.01 0.01 0.01 0.03 0.03 0.03 0.03 500 0. 0 0.01 0.01 0.01 0.01 0.19 0.18 0.19 0.19 500 0.4 0.01 0.01 0.01 0.01 0.15 0.14 0.15 0.15 500 0.7 0.01 0.01 0.01 0.01 0.14 0.15 0.14 0.14 1000 0. 0 0.02 0.02 0.02 0.02 0.43 0.40 0.43 0.43 1000 0.4 0.02 0.02 0.02 0.02 0.40 0.43 0.40 0.40 1000 0.7 0.02 0.02 0.02 0.02 0.42 0.42 0.42 0.42 Table 4.6 shows the relative MSE and confidence interval coverage probability . All standard error estimators report similar relative MSE s and confiden ce interval coverage probabilities regardless of sample size and level of correlation. 60 Table 4. 6 Relative MSE and Confidence Interval Coverage P robability of the Standard Error E stimators U sing SRS with Pseudo - S trata n Rel. MSE CI coverage probability J B R F J B R F 50 0 0.02 0.03 0.02 0.02 1.00 1.00 1.00 1.00 50 0.4 0.02 0.02 0.02 0.02 0.90 0.90 0.90 0.90 50 0.7 0.03 0.03 0.03 0.03 1.00 1.00 1.00 1.00 100 0 0.00 0.01 0.00 0.00 1.00 1.00 1.00 1.00 100 0.4 0.02 0.02 0.02 0.02 1.00 1.00 1.00 1.00 100 0.7 0.01 0.02 0.01 0.01 0.90 0.90 0.90 0.90 500 0 0.04 0.04 0.04 0.04 0.90 0.90 0.90 0.90 500 0.4 0.03 0.02 0.03 0.03 0.90 0.90 0.90 0.90 500 0.7 0.02 0.03 0.02 0.02 1.00 1.00 1.00 1.00 1000 0 0.18 0.16 0.18 0.18 1.00 1.00 1.00 1.00 1000 0.4 0.16 0.19 0.16 0.16 1.00 1.00 1.00 1.00 1000 0.7 0. 18 0. 18 0. 18 0. 18 1.00 1.00 1.00 1.00 As expected, the results of SRS indicate t hat all of the standard error estimators work similarly to each other regardless of simulation conditions and serve as a basis in order to evaluate the performance of the standard error estimators using samples in SICSUP, SICS, and SC . Three Sample Designs w ith Original Strata . Estimated Bias . Table 4.7 provides the estimated bias for different standard error estimators with the original strata (rural, town, and city) and weights. Only the jackknife and bootstrap estimators were applied because application of the BRR and estimator s require using pseudo - strata. The estimated bias is the difference between the empirical standard error and the average of standard error estimates from the 10 sets of samples and can be a negative value. In general, the standard errors in SICSUP are similar to tho se in SC. No substantial differen ce between SICSUP and SC was fou nd although some simulation conditions showed 61 larger difference than the other conditions did. When the sample size is small ( n = 50), SICSUP reported slightl y smaller biases than SC did. Considering the smaller bias is the better, SICSUP pe r formed better than SC under such condition. With non - small sample size, n sample design , type of standard error estimator, and other simulation conditions, the standard error estimators tend to overestimate the empirical standard err or, which was obtained using 5,000 sample means . With small sample size, n < 500, the estimated bias could be negative or positive. 62 Table 4. 7 Estimated B i as for the Standard Error E stimators with O riginal Strata and W eight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 - 0.02 0.03 0.04 - 0.03 0.02 0.03 50 0.4 0.01 0.09 - 0.05 - 0.01 0.07 - 0.07 50 0.7 - 0.02 0.00 - 0.04 - 0.03 - 0.02 - 0.05 100 0. 0 - 0.03 - 0.01 - 0.04 - 0.04 - 0.02 - 0.04 100 0.4 - 0.01 0.02 - 0.02 - 0.01 0.01 - 0.02 100 0.7 0.01 0.00 0.02 0.00 - 0.01 0.02 500 0. 0 0.03 0.02 0.02 0.02 0.02 0.02 500 0.4 0.02 0.02 0.02 0.02 0.02 0.02 500 0.7 0.03 0.03 0.03 0.03 0.03 0.03 1000 0. 0 0.04 0.04 0.04 0.04 0.03 0.04 1000 0.4 0.03 0.04 0.04 0.03 0.03 0.04 1000 0.7 0.05 0.04 0.05 0.04 0.04 0.04 Informal Estimate Based on School Proportion s 50 0. 0 - 0.02 - 0.05 - 0.03 - 0.04 - 0.05 - 0.04 50 0.4 0.00 - 0.01 - 0.03 - 0.01 - 0.03 - 0.05 50 0.7 0.00 0.05 0.07 - 0.01 0.03 0.06 100 0. 0 0.01 0.00 - 0.04 0.01 - 0.01 - 0.04 100 0.4 0.01 - 0.01 - 0.02 0.00 - 0.01 - 0.02 100 0.7 - 0.02 0.01 0.01 - 0.03 0.01 0.01 500 0. 0 0.03 0.01 0.01 0.03 0.01 0.01 500 0.4 0.02 0.03 0.02 0.02 0.02 0.02 500 0.7 0.03 0.02 0.02 0.03 0.02 0.02 1000 0. 0 0.03 0.03 0.02 0.04 0.03 0.02 1000 0.4 0.04 0.03 0.04 0.04 0.03 0.03 1000 0.7 0.04 0.03 0.03 0.04 0.03 0.03 Informal Estimate Based on Equal Proportions 50 0. 0 0.03 - 0.08 - 0.04 0.02 - 0.09 - 0.05 50 0.4 - 0.03 0.03 0.04 - 0.04 0.02 0.03 50 0.7 - 0.07 0.09 - 0.06 - 0.07 0.07 - 0.07 100 0. 0 0.01 0.00 0.01 0.01 - 0.03 - 0.03 100 0.4 - 0.03 - 0.04 - 0.01 - 0.02 0.00 - 0.01 100 0.7 - 0.08 - 0.09 - 0.05 - 0.05 - 0.02 - 0.03 500 0. 0 0.03 0.02 0.02 0.02 0.01 0.01 500 0.4 0.03 0.03 0.03 0.03 0.01 0.01 500 0.7 0.02 0.02 0.02 0.02 0.02 0.02 1000 0. 0 0.04 0.04 0.03 0.03 0.02 0.03 1000 0.4 0.04 0.04 0.03 0.03 0.03 0.03 1000 0.7 0.04 0.04 0.04 0.04 0.03 0.03 63 On average, in SICSUP, the jackknife and bootstrap estimator worked similarly to each other . The difference in standard error between the two standard error estimators is very small, with maximum difference of .03. Figure 4. 2 illustrates the absolute values of the estimated biases. W ith very small sample size ( n = 50), the jackknife estimator might be slightly better than the bootstrap estimator in SICSUP. 64 Figure 4. 2 Estimated B ias of the Jackknife UJ ) and B UB ) E stimators with n =50 and O riginal S trata by T ype of I nitial Proportions : I nitial Proportions B ased on D ata (T op), Informal Estimate Based on School Proportion s (M iddle), and Informal Estimate Based on Equal Proportion s ( B ottom) 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 UJ UB Estimated Bias Standard Error Estimator 0.0 0.4 0.7 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 UJ UB Estimated Bias Standard Error Estimator 0.0 0.4 0.7 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 UJ UB Estimated Bias Standard Error Estimator 0.0 0.4 0.7 65 T he estimated biases of standard error without sampling weight were also examined (see Appendix). The tw o results are not very different . The standard error estimators without weight tend to slightly less underestimate the empirical standard error than those with weight. This difference shows the influence of using sampling weight upon standard error estimation. Relative Bias . Table 4.8 presents the relative biases of the standard error estimat es with weights. Like the previous section , only the jackknife and bootstrap estimators were used. On av erage, SICSUP worked as well as SC in terms of relative bias. When the sample size is small ( n = 50), SICSUP reported slightl y smaller relative biases than SC did. Considering the smaller relative bias is the better, SICSUP pe r formed better than SC under such condition. As the sample size increases, the relative biases tend to increase regardless of sample design , type of standard error estimator, and other simulation conditions. The s tandard error decreases as the sample size increases, and this reduction in standard error might cause the increase in rel ative bias. For instance, with sample size of 50, if empirical standard error is .10 and a standard error estimate is .11, the bias is .01 (.11 .10). With sample size of 1,000, if the empirical standard error is .01 and a standard error estimate is .02, the bias is also .01 (.02 .01). Although the biases are the same, the relative biases are different: .1 (.01/.10) for the former case and 1.0 (.01/.01) for the latter case. This caused the increase in relative bias with increasing sample size in Table 4. 9 . Therefore, the results should be interpreted given the same sample size. On average, in SICSUP, the jackknife and bootstrap estimator worked sim ilarly to each other in terms of relative bias. With very small sample size ( n = 50), the jackknife estimator might be slightly better than the bootstrap estimator in SICSUP. Th ese result s agree with those of estimated bias. 66 Table 4. 8 Relative Bias of the Standard Error E stimators with O riginal Strata and W eight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 - 0.04 0.06 0.07 - 0.06 0.03 0.07 50 0.4 0.01 0.16 - 0.09 - 0.01 0.12 - 0.13 50 0.7 - 0.03 0.00 - 0.07 - 0.04 - 0.03 - 0.09 100 0. 0 - 0.07 - 0.03 - 0.11 - 0.10 - 0.05 - 0.12 100 0.4 - 0.03 0.05 - 0.05 - 0.03 0.04 - 0.05 100 0.7 0.01 0.00 0.04 0.00 - 0.02 0.05 500 0. 0 0.17 0.11 0.16 0.15 0.11 0.15 500 0.4 0.15 0.14 0.16 0.15 0.14 0.16 500 0.7 0.19 0.20 0.20 0.19 0.19 0.19 1000 0. 0 0.44 0.41 0.48 0.45 0.40 0.47 1000 0.4 0.37 0.40 0.48 0.38 0.37 0.49 1000 0.7 0.45 0.46 0.49 0.43 0.45 0.47 Informal Estimate Based on School Proportion s 50 0. 0 - 0.04 - 0.09 - 0.07 - 0.08 - 0.10 - 0.08 50 0.4 0.00 - 0.02 - 0.06 - 0.03 - 0.06 - 0.09 50 0.7 0.00 0.09 0.13 - 0.02 0.05 0.11 100 0. 0 0.03 0.00 - 0.11 0.02 - 0.02 - 0.12 100 0.4 0.02 - 0.03 - 0.05 0.00 - 0.02 - 0.06 100 0.7 - 0.05 0.03 0.01 - 0.07 0.03 0.01 500 0. 0 0.18 0.10 0.09 0.18 0.09 0.08 500 0.4 0.12 0.17 0.13 0.10 0.15 0.13 500 0.7 0.20 0.10 0.14 0.20 0.09 0.14 1000 0. 0 0.41 0.31 0.27 0.42 0.31 0.24 1000 0.4 0.42 0.36 0.40 0.42 0.36 0.39 1000 0.7 0.42 0.25 0.27 0.42 0.24 0.29 Informal Estimate Based on Equal Proportions 50 0. 0 0.06 - 0.16 - 0.08 0.04 - 0.18 - 0.10 50 0.4 - 0.06 0.06 0.08 - 0.08 0.04 0.05 50 0.7 - 0.12 0.15 - 0.10 - 0.12 0.12 - 0.13 100 0. 0 0.02 - 0.01 0.03 0.03 - 0.07 - 0.08 100 0.4 - 0.08 - 0.10 - 0.04 - 0.06 - 0.01 - 0.03 100 0.7 - 0.19 - 0.20 - 0.12 - 0.14 - 0.06 - 0.08 500 0. 0 0.18 0.17 0.16 0.15 0.10 0.09 500 0.4 0.21 0.21 0.22 0.21 0.07 0.06 500 0.7 0.14 0.14 0.13 0.13 0.11 0.12 1000 0. 0 0.43 0.41 0.42 0.41 0.26 0.27 1000 0.4 0.43 0.43 0.42 0.41 0.26 0.27 1000 0.7 0.44 0.44 0.43 0.45 0.28 0.28 67 The relative biases without sampling weight were also investigated (see Appendix). Although SICSUP does not show a substantial difference in relati ve bias between the two results , SICS and SC show a noticeable difference in relative bias under some simulation conditions, with the maximum difference of 30 % in SICS and 2 8 % in SC . This indicates the influence of using sampling weight for standard error estimation. Relative MSE . Table 4.9 presents the relative MSE s of the standard error estimates with sampling weight s . Like the previous sections , only the jackknife and bootstrap estimators were used. On a verage , SICSUP worked as well as SC in terms of relative MSE except under some simulation conditions. For example, under the condition of n based on data, the rela t ive MSE in SICSUP (.17) is fairly greater than that in SC (.05). This The MSE is a sum of the variance of estimate s a nd squared bia s of estimates . Under such condition , the standard errors may be widely spread out and lead to increase the MSE here. In general, in SICSUP, the jackknife and bootstrap estimators worked similarl y to each other . No substantial difference between the two standard error estimators was found, with the maximum difference of .03. Using either of the jackknife o r bootstrap estimator would not make a big difference in estimating standard errors in SICSUP. T he relative MSEs without weight were also calculated (see Appendix). The difference between the two results is not significant in general although some simulation conditions produced a relative ly large difference between results. This indicates the effect of using sampling weight upon standard error estimation. 68 Table 4. 9 Relative MSE for the Standard Error Estimators with O riginal S trata and W eight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 0.17 0.05 0.05 0.17 0.06 0.05 50 0.4 0.06 0.06 0.04 0.05 0.05 0.05 50 0.7 0.02 0.04 0.04 0.02 0.04 0.04 100 0. 0 0.02 0.02 0.03 0.03 0.03 0.03 100 0.4 0.03 0.03 0.01 0.03 0.03 0.02 100 0.7 0.02 0.03 0.07 0.03 0.04 0.07 500 0. 0 0.04 0.02 0.03 0.03 0.02 0.03 500 0.4 0.03 0.02 0.03 0.04 0.03 0.03 500 0.7 0.04 0.05 0.05 0.04 0.04 0.04 1000 0. 0 0.20 0.17 0.23 0.21 0.17 0.23 1000 0.4 0.14 0.16 0.23 0.15 0.14 0.24 1000 0.7 0.21 0.21 0.24 0.18 0.20 0.23 Informal Estimate Based on School Proportion s 50 0. 0 0.11 0.07 0.05 0.12 0.06 0.05 50 0.4 0.14 0.11 0.08 0.12 0.09 0.06 50 0.7 0.02 0.05 0.08 0.02 0.04 0.07 100 0. 0 0.01 0.09 0.05 0.02 0.09 0.04 100 0.4 0.03 0.03 0.02 0.03 0.03 0.02 100 0.7 0.04 0.05 0.04 0.04 0.07 0.04 500 0. 0 0.04 0.02 0.01 0.04 0.02 0.01 500 0.4 0.02 0.04 0.03 0.01 0.03 0.03 500 0.7 0.05 0.03 0.03 0.05 0.03 0.03 1000 0. 0 0.17 0.10 0.08 0.18 0.10 0.06 1000 0.4 0.18 0.13 0.16 0.18 0.13 0.15 1000 0.7 0.18 0.07 0.08 0.19 0.06 0.09 Informal Estimate Based on Equal Proportions 50 0. 0 0.05 0.07 0.05 0.05 0.07 0.05 50 0.4 0.05 0.07 0.09 0.05 0.05 0.07 50 0.7 0.07 0.20 0.06 0.07 0.17 0.06 100 0. 0 0.05 0.04 0.04 0.04 0.04 0.05 100 0.4 0.04 0.03 0.02 0.03 0.04 0.03 100 0.7 0.06 0.06 0.03 0.04 0.04 0.04 500 0. 0 0.04 0.03 0.03 0.03 0.02 0.03 500 0.4 0.06 0.06 0.06 0.05 0.01 0.01 500 0.7 0.02 0.02 0.02 0.02 0.02 0.03 1000 0. 0 0.19 0.17 0.18 0.17 0.07 0.08 1000 0.4 0.19 0.18 0.18 0.17 0.07 0.08 1000 0.7 0.20 0.20 0.19 0.21 0.08 0.08 69 Confidence Interval Coverage Probability . Table 4.10 presents the confidence interval coverage probabilities of the standard error estimate s with weights. Like the previous sections , only the jackknife and bootstrap estimators were used. In terms of confidenc e interval coverage probability, SICSUP worked as well as SC. Under the condition of n = 50, = .0, informal estimate based on school proporti o ns, and the bootstrap estimator, SICSUP worked much better than SC: 1 .0 in SICSUP and .7 in SC. T he corresponding probabilities using jackknife estimator are 1 .0 in SICSUP and .8 in SC. U nder such condition, SICSUP worked better than SC in terms of confidenc e interval coverage probability. In SICSUP, the jackkn i fe and bootstrap worked almost identically. Although most of the coverage probabilities are either of .9 or 1.0, there are some conditions that show the coverage probability of .8, whic h is lower than the preferred value (. 9 or higher ). For example, u nder the condition of n = 50, informal estimate based on equal proportions, = .7 , either of the jackknife or bootstrap estimator reported the coverage probability o f .8. 70 Table 4. 10 Confidence Interval Coverage Probability of the Standard Error Estimators with Original Strata and Weight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 1.00 1.00 0.90 1.00 1.00 0.90 50 0.4 0.90 0.80 0.90 0.90 0.90 0.90 50 0.7 0.90 1.00 0.80 0.80 0.90 0.80 100 0. 0 0.90 1.00 1.00 0.90 1.00 1.00 100 0.4 0.90 0.90 1.00 0.90 0.90 0.90 100 0.7 1.00 0.90 1.00 1.00 0.90 1.00 500 0. 0 0.80 1.00 1.00 0.80 1.00 1.00 500 0.4 1.00 0.90 1.00 1.00 0.90 1.00 500 0.7 0.90 1.00 1.00 0.90 1.00 1.00 1000 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 0.90 1.00 1.00 0.90 1.00 1.00 1000 0.7 1.00 1.00 1.00 1.00 1.00 1.00 Informal Estimate Based on School Proportion s 50 0. 0 1.00 1.00 0.80 1.00 1.00 0.70 50 0.4 0.80 1.00 0.80 0.90 1.00 0.80 50 0.7 1.00 0.90 1.00 1.00 0.90 1.00 100 0. 0 1.00 1.00 1.00 0.90 1.00 1.00 100 0.4 0.90 1.00 0.90 0.90 0.90 0.90 100 0.7 0.90 1.00 1.00 0.90 1.00 1.00 500 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 500 0.4 0.90 1.00 1.00 0.90 1.00 1.00 500 0.7 1.00 1.00 1.00 1.00 1.00 1.00 1000 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.7 0.90 1.00 1.00 0.90 1.00 1.00 Informal Estimate Based on Equal Proportions 50 0. 0 1.00 0.90 1.00 1.00 0.90 1.00 50 0.4 0.90 0.90 1.00 0.90 0.90 1.00 50 0.7 0.80 1.00 0.90 0.80 1.00 0.90 100 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 100 0.4 0.90 0.90 0.80 0.80 1.00 1.00 100 0.7 0.90 0.90 0.90 0.80 1.00 1.00 500 0. 0 1.00 1.00 1.00 1.00 0.90 0.90 500 0.4 1.00 1.00 1.00 1.00 1.00 1.00 500 0.7 1.00 1.00 1.00 1.00 1.00 1.00 1000 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.7 1.00 1.00 1.00 1.00 0.90 0.90 71 On average, SICS reported higher confidence interval coverage probabilities than the other two sampling design s did across the simulation conditions. Under some condit i ons, the coverage probabilities in SICSUP are slightly worse than those in SICS. This ca n be explained by two reasons: first, the mean was underestimated; second, standard error was underestimated. The range of confidence interval is determined by the sample mean and the standard error estimate. If both of the mean and the standard error are underestimated, the confidence interval coverage probability would decrease. Additionally, the confidence interval coverage probabilities were computed using the estimated standard errors and unweighted means (see Appendix). I n SICSUP, on average, the confidence interval coverage probabilities without sampling weight are similar to those with sampling weight. Three Sample Designs w ith Pseudo - S trata . The method s require using a special typ e of strata. Each stratum should have two PSUs. In reality, such populations are rarely found, and hence, the standard error estimators based on the s are often employed with pseudo - strata. Estimated Bias . The estimated bias is the difference between the empirical standard error and the average of standard error estimates from the 10 sets of samples and can be a negative value. As shown in Table 4.11 , SICSUP worked as well as SC in most simulation conditions with res p e c t to standard error estimation. Under the condition of n = information based on data, SICSU P worked relatively worse than SC. The estimated biases are .17 and .04 in SICSUP and SC, respectively. This result is different from that with original strata. When original strata were used, under the same condition, SICSUP worked sl ightly better 72 than SC in terms of estimated bias. The use of pseudo - much the standard errors in SC whi le it made a noticeable change to standard errors when SICSUP was used, with the difference about .1 5. This suggest s that using pseudo - strata may influence SICSUP more than SC. Under most of the conditions, the four standard error estimators tend to underestimate the empirical standard error. The estimated biases are mostly negative. Why this happened? When original strata were used, biases were either of positive or negative. When n jackknife and bootstrap estimators tend ed to overestimate the empirical standard error . W hen pseudo - strata were used, biases tend ed to be negative. Some previous studies reported similar results. For estimator ha d tendency to underesti mate the standard error (Paben, 1999). The jackknife estimator also seem ed to underestimate the standard error with pseudo - strata (Folsom, 2014). The results of estimated bias in this study show that not only the jackknife and s but also the bootstrap and BRR estimators tend ed to underestimate the empirical standa rd errors. The underestimation may be related to the us e of pseudo - strata. In SICSUP, on average, the four standard error estimators work ed similar ly to each other . No substantial difference in estimated bias was found among the four standard error estimat ors. When informal estimate based on equal proportions was used, the BRR estimator performed sl ightly better than the other standard error estimators , especially with n 73 Table 4. 11 Estimated Bias of the Standard Error Estimators with Pseudo - S trata and Weight n UJ IJ S J UB IB S B UR IR S R UF IF S F Initial Proportions Based on Data 50 0. 0 - 0.11 - 0.05 - 0.10 - 0.10 - 0.04 - 0.10 - 0.11 - 0.04 - 0.10 - 0.11 - 0.05 - 0.10 50 0.4 0.05 - 0.02 - 0.11 0.05 - 0.01 - 0.11 0.05 - 0.01 - 0.11 0.05 - 0.02 - 0.11 50 0.7 - 0.17 - 0.08 - 0.04 - 0.17 - 0.09 - 0.0 4 - 0.17 - 0.09 - 0.04 - 0.17 - 0.09 - 0.04 100 0. 0 - 0.10 - 0.12 - 0.08 - 0.10 - 0.12 - 0.08 - 0.10 - 0.12 - 0.08 - 0.10 - 0.12 - 0.08 100 0.4 - 0.05 - 0.04 - 0.08 - 0.06 - 0.04 - 0.08 - 0.05 - 0.04 - 0.08 - 0.05 - 0.04 - 0.08 100 0.7 - 0.04 - 0.07 - 0.09 - 0.04 - 0.07 - 0.09 - 0.04 - 0.07 - 0.09 - 0.04 - 0.07 - 0.09 500 0. 0 - 0.04 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 500 0.4 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 - 0.04 - 0.05 - 0.04 - 0.04 500 0.7 - 0.05 - 0.08 - 0.05 - 0.05 - 0.08 - 0.05 - 0.05 - 0.08 - 0.05 - 0.05 - 0.08 - 0.05 1000 0. 0 - 0.03 - 0.04 - 0.02 - 0.03 - 0.03 - 0.02 - 0.03 - 0.04 - 0.02 - 0.03 - 0.04 - 0.02 1000 0.4 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 1000 0.7 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 Informal Estimate Based on School Proportion s 50 0. 0 - 0.15 - 0.13 - 0.11 - 0.15 - 0.13 - 0.10 - 0.15 - 0.13 - 0.10 - 0.15 - 0.13 - 0.11 50 0.4 - 0.15 - 0.09 - 0.10 - 0.15 - 0.09 - 0.09 - 0.15 - 0.08 - 0.10 - 0.15 - 0.09 - 0.10 50 0.7 - 0.12 0.06 0.07 - 0.11 0.06 0.07 - 0.11 0.06 0.07 - 0.12 0.06 0.07 100 0. 0 - 0.07 - 0.02 - 0.09 - 0.08 - 0.02 - 0.09 - 0.07 - 0.02 - 0.09 - 0.07 - 0.02 - 0.09 100 0.4 - 0.07 - 0.03 - 0.06 - 0.07 - 0.03 - 0.06 - 0.07 - 0.03 - 0.06 - 0.07 - 0.03 - 0.06 100 0.7 - 0.02 - 0.03 - 0.09 - 0.02 - 0.04 - 0.09 - 0.02 - 0.03 - 0.09 - 0.02 - 0.03 - 0.09 500 0. 0 - 0.05 - 0.05 - 0.03 - 0.05 - 0.05 - 0.04 - 0.05 - 0.05 - 0.03 - 0.05 - 0.05 - 0.03 500 0.4 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 500 0.7 - 0.05 - 0.06 - 0.06 - 0.05 - 0.06 - 0.06 - 0.05 - 0.06 - 0.06 - 0.05 - 0.06 - 0.06 1000 0. 0 - 0.02 - 0.02 - 0.03 - 0.02 - 0.02 - 0.02 - 0.02 - 0.02 - 0.03 - 0.02 - 0.02 - 0.03 1000 0.4 - 0.03 - 0.04 - 0.02 - 0.03 - 0.04 - 0.02 - 0.03 - 0.04 - 0.02 - 0.03 - 0.04 - 0.02 1000 0. 7 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 Informal Estimate Based on Equal Proportions 50 0. 0 - 0.07 - 0.15 - 0.09 - 0.06 - 0.15 - 0.10 - 0.06 - 0.15 - 0.10 - 0.06 - 0.15 - 0.10 50 0.4 - 0.09 - 0.08 - 0.04 - 0.09 - 0.08 - 0.04 - 0.09 - 0.09 - 0.04 - 0.09 - 0.09 - 0.04 50 0.7 - 0.16 - 0.02 - 0.07 - 0.16 - 0.02 - 0.07 - 0.16 - 0.02 - 0.07 - 0.16 - 0.02 - 0.07 100 0. 0 - 0.05 - 0.05 - 0.05 - 0.05 - 0.04 - 0.04 - 0.04 - 0.04 - 0.08 - 0.09 - 0.09 - 0.09 100 0.4 - 0.06 - 0.06 - 0.06 - 0.06 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.03 - 0.04 100 0.7 - 0.11 - 0.11 - 0.11 - 0.11 - 0.07 - 0.07 - 0.07 - 0.07 - 0.06 - 0.06 - 0.06 - 0.06 500 0. 0 - 0.05 - 0.05 - 0.05 - 0.05 - 0.04 - 0.04 - 0.04 - 0.04 - 0.03 - 0.03 - 0.03 - 0.03 500 0.4 - 0.04 - 0.03 - 0.04 - 0.04 - 0.03 - 0.03 - 0.03 - 0.03 - 0.06 - 0.06 - 0.06 - 0.06 500 0.7 - 0.07 - 0.08 - 0.07 - 0.07 - 0.06 - 0.06 - 0.06 - 0.06 - 0.08 - 0.08 - 0.08 - 0.08 1000 0. 0 - 0.04 - 0.04 - 0.04 - 0.04 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 1000 0. 4 - 0.03 - 0.03 - 0.03 - 0.03 - 0.02 - 0.02 - 0.02 - 0.02 - 0.04 - 0.04 - 0.04 - 0.04 1000 0.7 - 0.04 - 0.03 - 0.04 - 0.04 - 0.03 - 0.03 - 0.03 - 0.03 - 0.04 - 0.04 - 0.04 - 0.04 74 Additionally, estimated bias e s of the standard error s without sampling weight were examined (see Appendix). For most of the simulation conditions, the estimated biases without weight are slightly s maller than those with weight and are more underestimated than those with weight . Relative Bias . As I mentioned previously in the results with original strata, r elative bias es of the standard error estimates do not necessarily decrease as the sample size increase s . On average , SICSUP worked as well as SC in terms of relative bias. Under some simulation conditions, SICSUP worked slightly worse than SC (e.g., n = information based on data) while under other conditions, SICSUP worked s lightly better than SC (e.g., n pattern that explains the different performances of SICSUP with respect to relative bias . Under the condition of n = tial information based on data, SICSUP worked relatively worse than SC, and this result agrees with that of estimated bias. 7 5 Table 4. 12 Relative Bias of the Standard Error Estimators with Pseudo - S trata and Weight n UJ IJ S J UB IB S B UR IR S R UF IF S F Initial Proportions Based on Data 50 0. 0 - 0.21 - 0.09 - 0.20 - 0.20 - 0.08 - 0.20 - 0.21 - 0.08 - 0.20 - 0.21 - 0.09 - 0.20 50 0.4 0.09 - 0.04 - 0.21 0.10 - 0.03 - 0.20 0.09 - 0.02 - 0.21 0.09 - 0.03 - 0.21 50 0.7 - 0.27 - 0.14 - 0.07 - 0.27 - 0.14 - 0.08 - 0.27 - 0.14 - 0.07 - 0.27 - 0.14 - 0.07 100 0. 0 - 0.27 - 0.33 - 0.23 - 0.27 - 0.34 - 0.22 - 0.27 - 0.33 - 0.24 - 0.27 - 0.33 - 0.24 100 0.4 - 0.14 - 0.09 - 0.22 - 0.15 - 0.11 - 0.22 - 0.13 - 0.09 - 0.23 - 0.13 - 0.09 - 0.22 100 0.7 - 0.09 - 0.17 - 0.22 - 0.10 - 0.16 - 0.22 - 0.09 - 0.16 - 0.22 - 0.09 - 0.16 - 0.22 500 0. 0 - 0.27 - 0.37 - 0.28 - 0.27 - 0.37 - 0.28 - 0.28 - 0.37 - 0.28 - 0.28 - 0.37 - 0.28 500 0.4 - 0.31 - 0.24 - 0.24 - 0.31 - 0.23 - 0.24 - 0.31 - 0.24 - 0.24 - 0.31 - 0.24 - 0.24 500 0.7 - 0.31 - 0.47 - 0.29 - 0.31 - 0.47 - 0.29 - 0.31 - 0.47 - 0.29 - 0.31 - 0.47 - 0.29 1000 0. 0 - 0.34 - 0.41 - 0.24 - 0.34 - 0.40 - 0.25 - 0.34 - 0.41 - 0.24 - 0.34 - 0.41 - 0.24 1000 0.4 - 0.38 - 0.33 - 0.29 - 0.38 - 0.33 - 0.29 - 0.37 - 0.33 - 0.29 - 0.37 - 0.33 - 0.29 1000 0.7 - 0.34 - 0.39 - 0.31 - 0.34 - 0.39 - 0.31 - 0.34 - 0.39 - 0.31 - 0.34 - 0.39 - 0.31 Informal Estimate Based on School Proportion s 50 0. 0 - 0.29 - 0.25 - 0.22 - 0.29 - 0.25 - 0.21 - 0.28 - 0.25 - 0.21 - 0.29 - 0.26 - 0.22 50 0.4 - 0.29 - 0.16 - 0.19 - 0.29 - 0.16 - 0.18 - 0.28 - 0.15 - 0.19 - 0.29 - 0.16 - 0.19 50 0.7 - 0.20 0.09 0.12 - 0.19 0.09 0.13 - 0.19 0.10 0.12 - 0.20 0.09 0.12 100 0. 0 - 0.20 - 0.05 - 0.25 - 0.21 - 0.06 - 0.25 - 0.19 - 0.05 - 0.24 - 0.20 - 0.05 - 0.25 100 0.4 - 0.17 - 0.07 - 0.17 - 0.17 - 0.07 - 0.15 - 0.17 - 0.07 - 0.17 - 0.17 - 0.07 - 0.17 100 0.7 - 0.05 - 0.08 - 0.23 - 0.05 - 0.09 - 0.23 - 0.05 - 0.08 - 0.22 - 0.05 - 0.08 - 0.23 500 0. 0 - 0.33 - 0.36 - 0.23 - 0.34 - 0.36 - 0.24 - 0.33 - 0.36 - 0.23 - 0.33 - 0.36 - 0.23 500 0.4 - 0.33 - 0.30 - 0.35 - 0.33 - 0.30 - 0.36 - 0.33 - 0.30 - 0.35 - 0.33 - 0.30 - 0.35 500 0.7 - 0.31 - 0.33 - 0.36 - 0.31 - 0.34 - 0.36 - 0.31 - 0.33 - 0.36 - 0.31 - 0.33 - 0.36 1000 0. 0 - 0.18 - 0.23 - 0.29 - 0.18 - 0.25 - 0.27 - 0.18 - 0.23 - 0.29 - 0.18 - 0.23 - 0.29 1000 0.4 - 0.32 - 0.40 - 0.21 - 0.32 - 0.40 - 0.22 - 0.31 - 0.40 - 0.21 - 0.31 - 0.40 - 0.21 1000 0.7 - 0.38 - 0.36 - 0.41 - 0.38 - 0.35 - 0.41 - 0.38 - 0.36 - 0.41 - 0.38 - 0.36 - 0.41 Informal Estimate Based on Equal Proportions 50 0. 0 - 0.13 - 0.30 - 0.20 - 0.12 - 0.29 - 0.20 - 0.12 - 0.30 - 0.20 - 0.13 - 0.30 - 0.20 50 0.4 - 0.16 - 0.16 - 0.08 - 0.16 - 0.16 - 0.07 - 0.16 - 0.16 - 0.08 - 0.16 - 0.16 - 0.08 50 0.7 - 0.27 - 0.04 - 0.12 - 0.27 - 0.04 - 0.12 - 0.27 - 0.04 - 0.12 - 0.27 - 0.04 - 0.12 100 0. 0 - 0.14 - 0.14 - 0.14 - 0.14 - 0.12 - 0.14 - 0.12 - 0.12 - 0.23 - 0.23 - 0.23 - 0.23 100 0.4 - 0.16 - 0.15 - 0.16 - 0.16 - 0.13 - 0.12 - 0.13 - 0.13 - 0.10 - 0.10 - 0.09 - 0.09 100 0.7 - 0.26 - 0.25 - 0.26 - 0.26 - 0.20 - 0.19 - 0.20 - 0.20 - 0.14 - 0.15 - 0.14 - 0.14 500 0. 0 - 0.34 - 0.33 - 0.34 - 0.34 - 0.34 - 0.34 - 0.34 - 0.34 - 0.21 - 0.22 - 0.21 - 0.21 500 0.4 - 0.23 - 0.22 - 0.23 - 0.23 - 0.22 - 0.23 - 0.22 - 0.22 - 0.37 - 0.38 - 0.37 - 0.37 500 0.7 - 0.43 - 0.44 - 0.43 - 0.43 - 0.38 - 0.38 - 0.38 - 0.38 - 0.45 - 0.44 - 0.45 - 0.45 1000 0. 0 - 0.41 - 0.42 - 0.41 - 0.41 - 0.34 - 0.35 - 0.34 - 0.34 - 0.28 - 0.29 - 0.28 - 0.28 1000 0.4 - 0.30 - 0.29 - 0.30 - 0.30 - 0.22 - 0.23 - 0.22 - 0.22 - 0.37 - 0.37 - 0.37 - 0.37 1000 0.7 - 0.36 - 0.36 - 0.36 - 0.36 - 0.36 - 0.36 - 0.36 - 0.36 - 0.40 - 0.40 - 0.40 - 0.40 76 Figure 4.3 shows the differences in relative bias under the condition of = .7 and i nformal e stimate b ased on e qual p roportion s . The relative biases in SICSUP are expressed in blue, and those in SC are expressed in green regardless of standard error estimator. The same color was used for each sample design regardless of standard error estimator in order to show the difference in relative bias between the two sample designs clearly. With small sample size ( n = 50), the relative biases in SICSUP (blue lines) are greater in absolute value than those in SC (green lines). However, as the sample size increase s , the difference between the two sample designs becomes small , and with the sample size of 1,000, almost standard error estimators wor k similarly to each other in SC (green dash line) . If researchers want to use SICSUP and SC together for their survey under such condition , using sample sizes more than 50 would be recommended in order to keep the estimation precision constant across the sample design s. Figure 4. 3 Relative Bias of the Standard Error Estimators by Sample D .7 and Informal Estimate Based on Equal Proportions) - 0.50 - 0.45 - 0.40 - 0.35 - 0.30 - 0.25 - 0.20 - 0.15 - 0.10 - 0.05 0.00 50 100 500 1000 Relative Bias Sample Size UJ TJ UB TB UR TR UF TF 77 In SICSUP, although all standard error estimators work similarly to each other in general, the BRR slightly work s better than the other standard error estimators in terms of relative bias. standard error estimators under some conditions, but it works worse than the others under different conditions. Therefore, the As the sample size increases, the difference in relative bias among the four standard error e stimators decreas es and becomes almost identical except under the condition of informal estimate based on equal proportions. Additionally, the relative bias without weight was calculated (see Appendix). Many of the simulation conditions produced similar re lative biases regardless of whether the weights were used or not. However, there are some conditions where the amount of difference in relative bias is greater than or equal to .1. Since the relative bias is expressed in a proportion, a difference of .1 re presents a difference of 10%. About 1.3 % ( 56 out of 432 simulation conditions) of the simulation conditions have differences greater than or equal to .1. Those relative ly large differences happened in SICS and SC rather than in SICSUP. That means whether o r not using weight ha s a more significant impact upon standard error estimation for samples in SICS and SC than those in SICSUP. In SICSUP, although the relative biases with weight and without weight show similar patterns, there are some differences. With sampling weight, a s the sample size increase s , t he relative biases increase . On the other hand, the relative biases without weight report relatively similar values as the sample size increases except under the condition of highly correlated = .7) . This is the same for all four standard error estimators. Figure 4.4 gives the relative biases with weight (blue lines) and those without weight (red lines) by sample size under some conditions . Four standard error estimato rs including the jackknife , boo 78 estimators were use d . In Figure 4.4, all relative biases with weight are in blue and those witho ut weight are in red in order to show the difference between weighted and unweighted samples clearly . The relative biases without weig ht are more constant across sample sizes than those with weight. 79 = . 0, Initial Proportion Based on Data = . 0, Informal Estimate Based on School Proportions = .4, Informal Estimate Based on School Proportions Figure 4. 4 Relative Bias of the Standard Error Estimator with W eight (Blue L ines) and without W eight ( Red L ines) by Sample S ize Using SICSUP. 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 50 100 500 1000 Relative Bias Sample size UJW UBW URW UFW UJ UB UR UF 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 50 100 500 1000 Relative Bias Sample size UJW UBW URW UFW UJ UB UR UF 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 50 100 500 1000 Relative Bias Sample size UJW UBW URW UFW UJ UB UR UF 80 Relative MSE . On average, SICSUP worked slightly worse than SC did in terms of relative MSE , but the difference between the two sample designs was not substantial. The greatest difference happened under the condition of n = on data. The difference is about .2, meaning about 20% difference in relative MSE. Alth ough this result is different from the results of estimated bias and relative bias, it agrees with the result of relative MSE with original strata. In S I C SUP, throughout the simulation conditions, the j ackknife , bootstrap, BRR, and perform ed similarly to each other in terms of r elative MSE except under the condition of informal estimate based on equal proportions . Under such condition, the BRR perfor m ed slightly better than the others. nder some conditions, but under different conditions, it worked worse than the others. It seems the equal proportions. As the sample size increases, the differ ence in relative MSE among the four standard error estimators decreas es and becomes almost identical except under the condition of informal estimate based on equal proportions. These results are similar to those of relative bias. 81 Table 4. 13 Relative MSE of the Standard Error Estimators with Pseudo - S trata and Weight n UJ IJ S J UB IB S B UR IR S R UF IF S F Initial Proportions Based on Data 50 0. 0 0.26 0.07 0.06 0.25 0.07 0.06 0.25 0.07 0.06 0.25 0.07 0.06 50 0.4 0.13 0.06 0.09 0.14 0.05 0.09 0.13 0.06 0.09 0.13 0.06 0.09 50 0.7 0.12 0.13 0.08 0.12 0.12 0.08 0.12 0.13 0.08 0.12 0.13 0.08 100 0. 0 0.08 0.13 0.08 0.08 0.14 0.07 0.08 0.13 0.08 0.08 0.13 0.08 100 0.4 0.07 0.06 0.06 0.07 0.06 0.06 0.07 0.05 0.06 0.07 0.06 0.06 100 0.7 0.09 0.11 0.09 0.09 0.11 0.10 0.09 0.11 0.10 0.09 0.11 0.10 500 0. 0 0.10 0.15 0.09 0.09 0.15 0.09 0.10 0.15 0.09 0.10 0.15 0.09 500 0.4 0.13 0.08 0.07 0.13 0.08 0.07 0.13 0.08 0.07 0.13 0.08 0.07 500 0.7 0.12 0.25 0.11 0.12 0.25 0.10 0.12 0.25 0.11 0.12 0.25 0.11 1000 0. 0 0.16 0.18 0.08 0.16 0.17 0.09 0.16 0.18 0.08 0.16 0.18 0.08 1000 0.4 0.16 0.13 0.09 0.16 0.13 0.10 0.16 0.13 0.09 0.16 0.13 0.09 1000 0.7 0.14 0.20 0.11 0.14 0.20 0.11 0.14 0.20 0.11 0.14 0.20 0.11 Informal Estimate Based on School Proportion s 50 0. 0 0.17 0.10 0.14 0.18 0.10 0.14 0.17 0.10 0.14 0.17 0.10 0.14 50 0.4 0.19 0.20 0.16 0.18 0.19 0.16 0.18 0.20 0.16 0.18 0.20 0.16 50 0.7 0.09 0.14 0.14 0.09 0.14 0.14 0.09 0.14 0.14 0.09 0.14 0.14 100 0. 0 0.09 0.12 0.10 0.09 0.11 0.10 0.09 0.12 0.10 0.09 0.12 0.10 100 0.4 0.10 0.08 0.07 0.10 0.08 0.06 0.10 0.08 0.07 0.10 0.08 0.07 100 0.7 0.08 0.14 0.07 0.09 0.14 0.07 0.08 0.13 0.07 0.08 0.13 0.07 500 0. 0 0.15 0.15 0.07 0.16 0.15 0.07 0.15 0.15 0.07 0.15 0.15 0.07 500 0.4 0.14 0.12 0.14 0.14 0.12 0.14 0.14 0.12 0.14 0.14 0.12 0.14 500 0.7 0.13 0.15 0.15 0.13 0.16 0.15 0.13 0.15 0.15 0.13 0.15 0.15 1000 0. 0 0.05 0.08 0.10 0.05 0.08 0.09 0.05 0.08 0.10 0.05 0.08 0.10 1000 0.4 0.13 0.18 0.06 0.12 0.18 0.06 0.13 0.18 0.06 0.13 0.18 0.06 1000 0.7 0.18 0.16 0.17 0.18 0.16 0.18 0.18 0.16 0.17 0.18 0.16 0.17 Informal Estimate Based on Equal Proportions 50 0. 0 0.08 0.14 0.11 0.09 0.14 0.11 0.09 0.14 0.12 0.09 0.14 0.11 50 0.4 0.07 0.17 0.14 0.07 0.18 0.14 0.07 0.18 0.14 0.07 0.18 0.14 50 0.7 0.14 0.18 0.08 0.14 0.18 0.08 0.14 0.18 0.08 0.14 0.17 0.08 100 0. 0 0.06 0.06 0.06 0.06 0.05 0.05 0.05 0.05 0.11 0.11 0.11 0.11 100 0.4 0.13 0.13 0.14 0.13 0.10 0.11 0.10 0.10 0.09 0.10 0.09 0.09 100 0.7 0.15 0.15 0.15 0.15 0.11 0.11 0.11 0.11 0.10 0.10 0.10 0.10 500 0. 0 0.15 0.15 0.15 0.15 0.14 0.14 0.14 0.14 0.09 0.08 0.09 0.09 500 0.4 0.07 0.07 0.07 0.07 0.06 0.07 0.06 0.06 0.16 0.16 0.16 0.16 500 0.7 0.20 0.21 0.20 0.20 0.16 0.15 0.16 0.16 0.24 0.24 0.24 0.24 1000 0. 0 0.22 0.23 0.22 0.22 0.14 0.14 0.14 0.14 0.11 0.11 0.11 0.11 1000 0.4 0.09 0.09 0.09 0.09 0.05 0.05 0.05 0.05 0.17 0.17 0.17 0.17 1000 0.7 0.16 0.16 0.16 0.16 0.15 0.16 0.15 0.15 0.19 0.19 0.19 0.19 82 Additionally, the relative MSE s without weight were examined (see Appendix). On average, the relative MSEs without weight and those with weight are similar to each other . The greatest difference in relative MSE occurred in SICSUP under the condition of sample size of 1,000 0 , and i nformal estimate based on equal proportions . Except that conditi on, in SICSUP, the relative MSE s with weight and those without weight are similar. Confidence Interval Coverage Probability . Due to the underestimated standard errors, many of the confidence interval coverage probabilities did not reach the preferred value, .9 or higher . Only 47 % of the simulation conditions re ached the preferred value, and other conditions reported probabilitie s less than .9. Underestimated standard errors reduce the range of confidence interval and hence, decrease the coverage probabilities. This is the same regardless of standard error estimator and sample design used . On average, SICSUP worked as well as SC i n terms of confidence interval coverage probability except under some conditions. Under the condition of n estimate based on school proportions, SICSUP reported much higher probability than SC: 1 .0 in SICSUP and about .5 in SC. O n the other hand, under the condition of n information based on data, SICSUP reported lower probability than SC: .6 in SICSUP and .9 in SC. 83 Table 4. 14 Confidence Interval Coverage Probability of the Standard Error Estimators with Pseudo - S trata and Weight n UJ IJ S J UB IB S B UR IR S R UF IF S F Initial Proportions Based on Data 50 0. 0 0.80 0.90 0.80 0.80 1.00 0.80 0.80 0.90 0.80 0.80 0.90 0.80 50 0.4 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 50 0.7 0.80 1.00 0.80 0.80 1.00 0.70 0.80 1.00 0.80 0.80 1.00 0.80 100 0. 0 0.90 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 100 0.4 0.90 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 100 0.7 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 0.90 500 0. 0 0.60 0.70 0.90 0.60 0.70 0.90 0.60 0.70 0.90 0.60 0.70 0.90 500 0.4 0.90 0.80 0.80 0.90 0.80 0.80 0.90 0.80 0.80 0.90 0.80 0.80 500 0.7 0.70 0.70 0.90 0.50 0.80 0.90 0.70 0.70 0.90 0.70 0.70 0.90 1000 0. 0 0.90 0.80 0.50 0.90 0.90 0.50 0.90 0.80 0.50 0.90 0.80 0.50 1000 0.4 0.80 0.70 0.50 0.70 0.70 0.50 0.80 0.70 0.50 0.80 0.70 0.50 1000 0.7 0.50 0.60 0.70 0.50 0.60 0.70 0.50 0.60 0.70 0.50 0.60 0.70 Informal Estimate Based on School Proportion s 50 0. 0 1.00 0.90 0.50 1.00 0.90 0.50 1.00 0.90 0.60 1.00 0.90 0.60 50 0.4 0.70 0.90 0.80 0.70 0.90 0.80 0.70 0.90 0.80 0.70 0.90 0.80 50 0.7 0.90 0.80 1.00 0.90 0.80 1.00 0.90 0.80 1.00 0.90 0.80 1.00 100 0. 0 0.80 0.80 0.70 0.80 0.80 0.70 0.80 0.80 0.70 0.80 0.80 0.70 100 0.4 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 100 0.7 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 500 0. 0 0.90 0.90 0.90 0.90 0.90 0.80 0.90 0.90 0.90 0.90 0.90 0.90 500 0.4 0.90 1.00 0.60 0.80 1.00 0.60 0.90 1.00 0.60 0.90 1.00 0.60 500 0.7 1.00 0.80 0.50 1.00 0.80 0.50 1.00 0.80 0.50 1.00 0.80 0.50 1000 0. 0 0.90 0.70 1.00 0.90 0.70 1.00 0.90 0.70 1.00 0.90 0.70 1.00 1000 0.4 1.00 0.80 0.90 1.00 0.80 0.80 1.00 0.80 0.90 1.00 0.80 0.90 1000 0.7 0.70 0.70 0.60 0.70 0.70 0.60 0.70 0.70 0.60 0.70 0.70 0.60 Informal Estimate Based on Equal Proportions 50 0. 0 1.00 0.90 1.00 1.00 0.90 1.00 1.00 0.90 1.00 1.00 0.90 1.00 50 0.4 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 50 0.7 0.70 0.90 0.90 0.70 0.80 0.80 0.70 0.90 0.90 0.70 0.90 0.90 100 0. 0 1.00 1.00 1.00 1.00 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 100 0.4 0.90 0.90 0.90 0.90 0.80 0.80 0.80 0.80 0.90 0.90 0.90 0.90 100 0.7 0.90 0.90 0.90 0.90 0.80 0.80 0.80 0.80 0.90 0.90 0.90 0.90 500 0. 0 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.70 0.70 0.70 0.70 500 0.4 1.00 1.00 1.00 1.00 1.00 0.90 1.00 1.00 0.80 0.80 0.80 0.80 500 0.7 0.60 0.60 0.60 0.60 0.70 0.60 0.70 0.70 0.50 0.50 0.50 0.50 1000 0. 0 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.70 0.90 0.90 0.90 0.90 1000 0.4 0.80 0.80 0.80 0.80 0.90 0.90 0.90 0.90 0.80 0.80 0.80 0.80 1000 0.7 0.70 0.70 0.70 0.70 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 84 In SICSUP, the four standard error estimators performed similarly to each other except under the condition of n bootstrap estimator s . Figure 4.5 illustrates the d ifference in confidence interval coverage probabilities in SICSUP. Under most of the simulation conditions, the four standard error estimators work almost identically except under conditions of i nformal estimate based on equal proportions (condition 13 to 18 in Figure 4.5 ). 85 # Simulation Condition # Simulation Condition 1 = . 0, initial proportions based on Data 10 Non - = .4 , informal estimate based on school proportions 2 Non - = . 0, initial proportions based on Data 11 = .7, informal estimate based on equal proportion 3 = .4, initial proportions based on Data 12 Non - = .7, informal estimate based on equal proportion 4 Non - = .4, initial proportions based on Data 13 = . 0, informal estimate based on equal proportion 5 = .7, initial proportions based on Data 14 Non - = . 0, informal estimate based on equal proportion 6 Non - = .7, initial proportions based on Data 15 = .4, informal estimate based on equal proportion 7 = . 0, informal estimate based on school proportions 16 Non - = .4, informal estimate based on equal proportion 8 Non - = . 0, informal estimate based on school proportions 17 = .7, informal estimate based on equal proportion 9 = .4, informal estimate based on school proportions 18 Non - = .7, informal estimate based on equal proportion Figure 4. 5 Confidence Interval Co verage Probability with Pseudo - S trata and Weight 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 CI Coverage Probability Simulation Condition UJ UB UR UF 86 Additionally, the confiden ce interval coverage probabilities without sampling weight were examined (see Appendix). In SICSUP, on average , coverage probabilities with weight tend to be slightly higher than those without weight . In line with the findings presented so far, on average, the performance of SICSUP was as good as SC i n estimating the population mean , the population standard deviation , and the standard error of the sample mean. For mean and standard deviation estimation, with n SC . In addition, with n = 1,000, the performance of the three sample design s became close to that of SRS, with only slight difference. For standard error estimation, SICSUP worked as well as SC except under some conditions. The conditions are different by evaluation criteria or type of strata used, but the common factor is small sample size ( n = 50). Therefore, very small sample size should be avoided when SICSUP is u sed. 4.2 Research Question 2 One critical issue in applying a complex sample design is the determination of sample size. This is typically done by determining amount of error that a researcher would allow. The second research question is about how the app ropriate sample size for SICSUP can be determined. To address this research question, first, the design effects and corresponding sample sizes were computed for each sample design , based on the standard errors that were obtained from the first research que stion; second, given the margin of error, required sample sizes for SICSUP, SICS, and SC were examined. 4.2.1 Design Effect and Sample Size Table 4. 15 shows the design effects of SICSUP, SICS, and SC by sample size and initial information about proportions of novice teacher s over strata. The standard errors based on the 87 four replication meth ods were averaged because of the only slight difference in the estimates among the replication methods. In addition, the levels of correlation between school size and the variable of interes t were averaged because of the same reason , and medians were used due to some outliers. On average, the design effects based on the weighted sample s are around 2.30, 2.55, and 2.21 in SICSUP, SICS, and SC , respectively. The design effects based on the samples without weight a re around 1.86, 2.01, and 1.89 in SICSUP, SICS, and SC , respectively. As expected, in general, the design effects seem to decrease as the sample size increase s regardless of the type of sample design. The design effect measures relative efficiency between a complex sample design and SRS. As the sample size increases, the effect of a complex sample design decreases and the design effect approaches to 1, meaning that its efficiency becomes close to that of SRS. The type of initial proportions of novice teachers over strata made noticeable differences in design effect when the sample size is small ( n = 50), especially in SICSUP and SICS. In the weighted samples, with initial proportions based on data, the designs effects for SICSUP and SICS are 3.03 and 3.63, respectively; with informal esti mate based on school proportions, 2.30 and 3.30, respectively; with informal estimate based on equal proportions, 2.74 and 3.01, respectively. The use of informal estimates led to reduce the design effects of SIC S UP as compared to the design effect when in itial proportions based on data used. On the other hand, the design effects of SICS rem ained similar regardless of type of initial proportions used. This shows the effect of the updating process in SICSUP on the efficiency of sample design. The updating pr ocess is beneficial when initial proportions are different from those in the population, especially for small sample size. With very large sample size ( n = 1,000), the effect of th e updating process disappears, and , on average, the design effects tend to b e similar among the 88 three sample designs. When the sample s ize is 1,000, regardless of type of sample design, a half of novice teachers are taken from the population. Table 4. 15 Design Effect for the Variable of Interest n Weighted Unweighted SICSUP SICS SC SICSUP SICS SC Initial Proportions Based on Data 50 3.03 3.63 2.38 2.00 2.72 2.07 100 2.89 3.27 2.14 2.30 2.52 1.94 500 2.08 1.67 2.03 1.63 1.53 1.74 1000 1.70 1.69 1.85 1.42 1.42 1.57 Informal Estimate Based on School Proportion s 50 2.30 3.30 2.56 1.97 2.32 2.14 100 2.90 3.58 2.36 1.97 2.37 1.88 500 1.90 2.04 1.73 1.65 1.38 1.50 1000 1.79 2.29 2.09 1.59 1.49 1.70 Informal Estimate B ased on Equal Proportions 50 2.74 3.01 3.03 2.25 2.12 2.42 100 2.61 2.55 2.75 2.11 2.42 2.22 500 1.93 1.88 1.84 1.68 1.91 1.77 1000 1.75 1.69 1.77 1.71 1.89 1.77 Table 4. 16 presents desired sample sizes based on the design effects in Table 4. 15 . Under the condi tion of relatively small sample size ( n times larger than those of SRS can achieve the same precision in estimation (e.g., 50 for SRS and 500), samples of SICSUP about two times larger than those of SRS can achieve the same precision in estimation (e.g., 500 for SRS and 1 , 040 for SICSUP). In order to achieve the same level of accuracy in estimation with 1,000 samples of SRS, SICSUP needs to take more than 85% of novice teachers in the population (more than 1,700 novice teachers). The required sample size for SICSUP in this situation seems hard to carry out in practice. 89 With weighted samples, 50% of the simulation conditions require more s amples in SICSUP than for SC . With unweighted samples, 25% of the simul ation conditions require more samples in SICSUP than in SC . Table 4. 16 Desired Sample Size n Weighted Unweighted SICSUP SICS SC SICSUP SICS SC Initial Proportions Based on Data 50 152 181 126 100 136 103 10 0 289 327 225 230 252 194 5 0 0 1040 836 913 815 763 868 100 0 1701 1686 1719 1416 1419 1571 Informal Estimate Based on School Proportion s 50 115 165 125 99 116 107 10 0 290 358 237 197 237 188 5 0 0 949 1018 823 823 691 750 100 0 1788 2294 2012 1595 1492 1700 Informal Estimate B ased on Equal Proportions 50 137 151 140 113 106 121 10 0 261 255 275 211 242 222 5 0 0 963 939 921 840 953 885 100 0 1746 1693 1772 1710 1890 1768 Figure 4. 6 to Figure 4. 9 illustrate the diffe rences in desired sample sizes, that can provide parameter estimates as accurate as the given SRS samples would , among SICSUP, SICS, and SC . The leftmost point on the horizontal axis in each figure represents the sample size of SRS. The vertical axis represents coefficient between school size and the variable of interest, and the second term denotes the type of initial proportions of novice teacher s based on informal estimate based on equal proportions). Figure 4. 6 to Figure 4. 9 give the required sample sizes for SICSUP, SICS, and SC given SRS samples of 50, 100, 500, and 1000, respectively. 90 In the figures, there are some odd sample sizes that are substantively different from the other sample sizes. For example, in Figure 4. 8 , the required sample size for SICS, under the .7 and initia l proportions based on data , seems too small ( n = 623) as compared to those for SICSUP ( n = 1 , 040) and SC ( n = 890). In this dissertation, 10 sets of samples were generated for each simulati on condition and som e sets with outliers might affect the results. Without sampling weight, the required sample sizes are smaller than with sampling weight. This is because the unweighted samples have smaller design effects than the weighted samples have. The difference s i n sample sizes among SICSUP, S I C S, and SC are not significant as compared to the differences with sampling weight. As shown in Figure 4. 9 , some simu lation conditions produce the required sample sizes greater than the population size of 2,000 ( blue vertical line in Figure 4. 9 ). These happen when either of informal estimates is used as initial proportions. The deviation of required sample size for SICSUP from the sample size of SRS becomes great as the sample size i ncreases because the sample size for SICSUP is a product of the sample size of SRS and the corresponding design effect . For example, with SRS samples of 50, the design effect of 3 for SICSUP gives the required sample size of 150, and the difference in samp le size between two sample designs is 100. On the other hand, with SRS samples of 1 , 000, the design effect of 3 for SICSUP gives the required sample size of 3,000, and the difference in them is 3,000. Although the design effect is not changed, the differen ce in sample size between SICSUP and SRS increases. 91 Figure 4. 6 Sample Size for SICSUP, SICS, and SC T hat Yields the Same Precision as SRS of 50 92 Figure 4. 7 Sample Size for SICSUP, SICS, and SC T hat Yields the Same Precision as SRS of 100 93 Figure 4. 8 Sample Size for SICSUP, SICS, and SC T hat Yields the Same Precision as SRS of 500 94 Figure 4. 9 Sample Size for SICSUP, SICS, and SC T hat Yields the Same Precision as SRS of 1,000 95 4.2.2 Margin of Error and Sample Size The marg in of error refers to the limit of accuracy of a sample estimate of a population parameter (Agresti & Finlay, 2009). In other words , it shows how many points the results can be differ from the population parameter. In this research question, it is the population mean. Table 4.17 presents the required sample sizes of SRS given the level of margin of error. Table 4. 17 Margin of Error for a Sample Mean and Required Sample Size for SRS Margin of Error 0.5 0.4 0.3 0.2 0.1 56 87 149 307 840 63 96 165 337 895 69 106 180 365 943 The required sample sizes for SICSUP, SICS, and SC were obtained (see Table 4. 18 ) by multiplying the samples sizes for SRS in Table 4.17 and the design effects in Table 4. 15 . The type s of initial proportions of novice teachers over strata and the level s of correlation between school size and the variable of interest were averaged. In this population, with sampling weight, it seems that one cannot use SICSUP with the margin of error of .1 because the required sample size is larger than the population size of 2,000. The minimum margin of er ror (the maximum precision) that SICSUP can achieve is .2 in this population. Therefore, under this situation, for SICSUP, drawing 761 novice teachers is recommended if the resources such as cost and time are enough to carry out this sampling plan. If samp ling weights are not used, the required sample size for SICSUP is 622 given . 2 margin of error . 96 Table 4. 18 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, and SC Margin of Error Weighted Unweighted SICSUP SICS SC SICSUP SICS SC 0.5 142 152 140 116 119 117 0.4 218 233 215 178 183 179 0.3 372 398 367 304 313 307 0.2 761 813 750 622 638 626 0.1 2018 2154 1992 1652 1695 1664 Figure 4. 10 to Figure 4. 12 illustrate the required sample sizes for SICSUP, SICS, and SC as compared to the sample sizes for SRS. In the figures, the left and right panels represent (1) weighted samples and (2) unweighted samples. The top, middle, and bottom panels represent (a) i nitial p roportion b ased on d ata , (b) i nformal estimate based on s chool p ropo rtion s , and (c) i nformal estimate based on equal proportion s, respectively. In the figures, the dotted line in black represents the number of SRS samples that can achieve the given margin of error. 97 Figure 4. 10 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, 98 Figure 4. 11 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, and SC under the 99 Figure 4. 12 Margin of Error for a Sample Mean and Required Sample Size for SICSUP, SICS, 100 All three figures reveal that as the margin of error increases, the required sa mple size decreases. In order to achieve the margin of error of .1, SICSUP as well as SICS and SC needs large sample sizes, close to or larger than the population size of 2,000. Therefore, it seems impossible to achieve .1 margin of error in this populatio n. The required sample size decreases rapidly as the margin of error increases . For some conditions, SICSUP, SICS, and SC require similar number of sample size. For nformal estimate based on s chool p roportion s , and un w eighted s ample (b2 of Figure 4. 10 ), the three lines are almost overlapped each other. This implies that one can use SICSUP and SC together for their s urvey with the same sample size, and the samples of SICSUP and SC would provide similar pr ecision in estimating the mean. Unlike the cases mentioned above, under some conditions, there are visible differences in sample sizes among SICSUP, SICS, and SC . Und er the three conditions of i nitial p roportion s b ased on d ata , and unweighted sample (a2 of Figure 4. 10 ) , = .4, i nitial p roportion s b ased on d ata , and w eighted s ample (a1 of Figure 4. 11 ) , and i nformal estimate based on s chool p roportion s , and w eighted s ample (b1 of Figure 4. 11 ), SICSUP requires less samples than SC does. On the other i nformal estimate based on s chool p roportion s , and w eighted or unweighted s ample s (b1 and b2 of Figure 4. 12 ), SICSUP requires more samples than SC . In line with the findings presented thus far in this section, in order to apply SICSUP to this population of novice teacher s , the sample sizes of about 760 and 620 seem the best choices with and without sampling weight, respectively , in terms of estimation precision. However, one should pay attention to the type of initial proportions of novice teachers over strata that are used for SICSUP and the correl ation between school size and the variable of interest because they 101 may influence the expected estimation precision either of positively or negatively given the sample size . 4.3 Research Question 3 The third research question is about whether SICSUP works well in terms of estimating group difference. For each of the five selected countries, 10 sets of samples were taken in order to estimate the population mean. In addition to means, standard errors w ere also estimated using the jackknife estimator with original strata and the BRR estimator with pseudo - strata. This section provides the results of 95% confidence interval coverage probabilities and rankings of the five countries based on the estimated me ans. As shown in Table 4. 19, the three sample design s tend to estimate the mean well for all countries. It was assumed that the approximate design effects of SICSUP, SICS, and SC for the populations are less than three for all countries, so the sample size of 600 could achieve the margin of error, .3. It se ems that all estimates regardless of simulation condition, sample design , and country achieve the margin of error, .3. Table 4. 19 Sample Means by Country Country Population mean With Weight Without Weight SICSUP SICS SC SICSUP SICS SC Initial Proportions Based on Data Country 1 12.37 12.35 12.35 12.43 12.35 12.35 12.40 Country 2 11.31 11.23 11.21 11.18 11.33 11.32 11.29 Country 3 11.46 11.40 11.44 11.39 11.44 11.48 11.45 Country 4 11.70 11.78 11.72 11.73 11.71 11.68 11.68 Country 5 11.92 11.93 11.88 11.90 11.92 11.88 11.90 Informal Estimate Based on School Proportions Country 1 12.37 12.32 12.37 12.33 12.31 12.36 12.34 Country 2 11.31 11.19 11.22 11.20 11.30 11.31 11.31 Country 3 11.46 11.44 11.40 11.43 11.51 11.46 11.49 Country 4 11.70 11.76 11.74 11.74 11.70 11.71 11.69 Country 5 11.92 11.90 11.95 11.89 11.91 11.95 11.89 102 4.3.1 Confidence Interval Coverage Probability Confidence interval coverage probability at a 95% confidence level was investigated (see Table 4. 20 and T able 4.2 1) . In general, SICSUP works slightly better than SC in terms of confidence interval coverage probability. Country 1 has the highest proportion of schools with no novice teacher among the fi ve countries, meaning the rarest population. SICSUP is supposed to work well with this type of populations, and it did for country 1. Under the condition of initial proportions based on data, the coverage probability is 1.0; under the condition of informal estimate based on school proportions, the coverage probability is still 1.0 while the corres ponding coverage probabilities in SC are 1.0 and .8, respectively. For country 2, SICSUP did not work well under the condition of informal estimate based on school proportions, with the low coverage probability of .5 with weight. Why did this happen? A possible reason is that country 2 has only two strata (public or private), and there is quite difference in stra tum mean, 11.08 for novice teachers in public schools and 11.78 for those in private schools. In the population, about 67% of novice teachers are in public schools and 33% are in private schools. If the updating process of SICSUP produces adjusted proportions which are different from those in the population, SICSUP would not work well in terms of estimating mean. With respect to type of standard error estimator, in general, the jackknife estimator performed better than the BRR estimator. In SICSUP samples, the jackknife estimator works slightly better than t he BRR estimator, but the difference is not substantial. However, in SC samples, the difference in coverage probability is clearer for some countries. For instance, the coverage probabilities for country 1 under the condition of informal estimate based on school 103 proportions are .8 with the jackknife estimator and .3 with the BRR estimator (or .9 and .5 without weight). It seems that the BRR estimator underestimat ed the standard errors, so they reduced the range of 95% confidence interval and caused the low coverage probability as compared to the other. Table 4. 20 Coverage Probability of Confidence Interval for the Country Mean U sing Weighted Samples Country Jackknif e BRR SICSUP SICS SC SICSUP SICS SC Initial Proportions Based on Data 1 1.0 1.0 1.0 1.0 1.0 1.0 2 0.9 0.9 0.7 0.9 0.9 0.7 3 1.0 1.0 0.9 1.0 1.0 0.9 4 0.8 0.9 0.7 0.8 0.9 0.7 5 0.9 1.0 1.0 0.9 1.0 1.0 Informal Estimate Based on School Proportions 1 1.0 0.9 0.8 0.9 0.6 0.3 2 0.5 0.8 0.8 0.5 0.6 0.5 3 1.0 1.0 1.0 0.8 0.5 0.9 4 0.9 0.9 0.8 0.7 0.7 0.7 5 1.0 0.9 1.0 1.0 0.6 1.0 104 Table 4. 21 Coverage Probability of Confidence Interval for the Country Mean U sing Unw eighted Samples Country Jackknife BRR SICSUP SICS SC SICSUP SICS SC Initial Proportions Based on Data 1 1.0 1.0 1.0 1.0 1.0 1.0 2 1.0 0.8 0.9 1.0 0.8 0.9 3 0.9 1.0 1.0 0.9 1.0 1.0 4 0.9 1.0 1.0 0.9 1.0 1.0 5 0.9 1.0 1.0 0.9 1.0 1.0 Informal Estimate Based on School Proportions 1 1.0 0.9 0.9 0.6 0.6 0.5 2 0.9 0.9 1.0 0.5 0.8 0.8 3 0.9 0.9 0.9 0.7 0.8 0.8 4 1.0 1.0 1.0 0.9 1.0 0.8 5 1.0 0.9 1.0 1.0 0.8 0.9 4.3.2 Rank Order of Five Countries Table 4. 22 gives the coverage probabilities of producing country rankings that are identical with the rankings based on the population means using the samples of SICSUP, SICS, SC , and the combination of SICSUP and SC . From the second to fourth columns in Table 4. 22 represent that a single sample design was applied to all five countries; the last column r epresent s that either of SICSUP or SC was applied to the five countries. Specifically, country 1, 4, and 5 drew samples using SICSUP; country 2 and 3 drew samples using SC . In the populations, country 1 has the highest mean, and county 2 has the lowest mean. The rank order of the five countries is as follows: country 1 , country 5 , country 4 , country 3 , and country 2 . As shown in Table 4. 22 , with weighted samples, SICSUP works as well as SC regardless of type of initial proportions used. Under the condition of initial proportions based on data, SICSUP performs slightly better than SC in terms of coverage probability : .9 for SICSUP and .8 for SC . 105 An interesting finding is the coverage probabilities based on the combination of two sample design s. Under the condition of informal estimate based on school proportions, the co mbination works as well as the cases in which a single sample design was used. On the other hand, under the condition of initial proportions based on data, the weighted samples from the combination works slightly worse than those from SICSUP: .8 for the co mbination and .9 for SICSUP. The coverage probability of the combination is equal to that of SC . This result suggests that SICSUP might be advantageous for all five countries in terms of estimating country rankings . Given the weighted samples, having info rmal estimate of proportions of novice teachers over strata at the beginning of the sampling procedure does not cause a significant impact up on the coverage probabi lity of rankings as compared to the coverage probability of population means. Under that condition, the difference in coverage probability of rankings between SICSUP and SC is smaller than that of population means. In terms of differentiating countries, SICSUP works as well as SC under that condition. Thus, the combination of SICSUP and SC produced the identical coverage probability with the cases in which SICSUP or SC was used alone. Table 4. 22 Rates of Producing Rankings T hat Are Identical with the Rankings Based on the Population Mea ns Using SICSUP, SICS, SC, and the Combination of Two Designs Weight SICSUP SICS SC Combination Initial Proportions Based on Data With Weight 0.9 1 0.8 0.8 Without Weight 0.7 0.9 0.9 0.8 Informal Estimate Based on School Proportions With Weight 0.9 0.9 0.9 0.9 Without Weight 0.9 0.9 0.9 0.9 106 Figure 4. 13 Estimated Mean s with 95% Confidence Interval by Country under the Condition of Initial Proportions Based on Data and with Weight: T he First Scenario Figure 4. 13 illustrates the population and sample mean s using the three sample design s, with 95% confidence interval s under the condition of initial propo rtions based on data and weighted samples. T he jackknife estimator was used to compute standard errors. For all three sample design s, each population mean falls into 95% confidence intervals of the estimates except a single case (SICS samples in country 2). In addition, the three sample design s provide the rankings that are identical with the rankings based on the population means. In this case, using either of SICSUP or SC does not make any difference in rankings of the countries. This illustrates the best scenario for all three sample design s and the combination with respect to coverage of the population means and country rankings. 107 Figure 4. 14 Estimated Mean s with 95% Confidence Interval by Country under the Condition of Informal Estimate of Proportions Based on School Proportion and with Weight : T he Second Scenario Figure 4. 14 shows the sample means with 95% confidence interval s based on another set of samples, and the jackknife estimator was used to compute standard errors. The results here are quite different from those in the first scenario. For some countries, there are cases that the population mean s do not fall into the 95% confidence interval o f the sample mean , which indicates hypothesis testing would reject t he null hypothesis at a 95% confidence level. For example, the sample means using SICSUP samples are significantly different from the parameters for country 5 and 4. The sample means using SC samples are significantly different from the parameters for most of the countries including country 2 to 5. 108 It is interesting to observe that the rankings of countries based on SICSUP samples are identical with those based on the population means even though some of the sample means are not very accurate. That is not the case when SC was used. For the samples of SC , the rankings are different from those based on the population means. The combination of SICSUP and SC provides the rankings that are identical with those based on the popu lation means. These results imply that rankings should be interpreted with caution although they are frequently reported as results of national or international surveys and assessments. For example, when SICSUP is used, the sample means of country 4 and 5 are not statistically different from each other at a 95% confidence level. However, their rank order positions are different. County 5 is ranked higher than country 4. To sum, although there are some limitations, and the results should be interpret ed cautiously, in this scenario, SICSUP performs better than SC with respect to coverage of the population means and country rankings. 109 Figure 4. 15 Estimated Mean s with 95% Confidence Interval by Country under the Condition of Informal Estimate of Proportio ns Based on School Proportion: T he Third S cenario The results shown in Figure 4. 15 illustrate ano ther scenario that the sample design s do not work well with the five countries. The simulation conditions here are exactly the same as those in the second scenario. SICSU P here works slightly better than in the second scenario (see the red circles and line s Figure 4. 14 ). Only the population mean of country 4 does not fall into the 95% confidence interval of the sample mean . SC here also works slightly better than those in the second scenario. In two out of the five counties (country 1 and 3), the populations means do not fall into the 95% confidence intervals of the sample means . How about the rankings of the countries in this scenario? The rankings based on the SICSUP samples are not identical with those based on the population means. That is the same for 110 the combination of SICSUP and SC . On the other hand, the rankings based on SC are identical with those based on the populat ion means. If one focuses on the sample means for country 4 and 5 based on the SICSUP samples (see the red circles and lines in Figure 4. 15 ), they are a lmost equal to each other: 11.86 for country 4 and 11.85 for country 5. After r ounding the sample means to the nearest tenth , they become identical. The two countries might be the same in rank depending on the decimal places reported. Because SICSUP is app lied to country 4 and 5 when the combination of two sample design s is used, the rankings under this combination are also not identical with those based on the population means. This scenario shows that although S I C SUP performs slightly better than SC with respect to coverage of the population means, it does not work as well as SC with respect to country rankings. Given the results present ed thus far in this section, SI C S UP functions as well as, or , depending on the condition, slightly better than, SC in t he rare populations across the five countries with respect to coverage of the population means. That is the same with respect to coverage of country rankings. However, the three scenarios mentioned in this section suggest that country rankings should be in terpreted with caution. 4.4 Research Question 4 The last research question evaluate s the economic aspect of SICSUP , and comparisons of SICSUP with SICS and SC were made on the basis of the number of contacted schools during the sampling procedure and the number of schools in the final set of samples. The results in this section are based on the 500 replications. 111 4.4.1 Results Based on Dataset 1 Table 4. 23 gives the numbers of contacted schools during the sampling procedure ( n* ) and schools in the final set of samples ( n ) by sample size. For all sample sizes, SICSUP contacted fewer schools than SICS and SC did. The numbers of schools in the final set of samples are similar across the three sample designs. Therefore, the ratio of the schools in the final set of samples to t he number of contacted schools in SICSUP is higher than those in SICS and SC , mean ing SICSUP is more economical than these two sample designs. In SICSUP, 77% of contacted schools were added to the final set of schools, 76% in SICS and 72% in SC . These show that both of the updating process and sequential selection have a positive effect on the reduction in the number of contacted schools in this population. Table 4. 23 Number of Contacted Schools and Schools in the Sample, Based on Dataset 1 Sample Size SICSUP SICS SC n* n n* n n* n 50 28.78 22.11 0.77 30.07 22.79 0.76 33.84 24.39 0.72 100 55.52 42.65 0.77 58.58 44.45 0.76 65.99 47.67 0.72 500 269.06 207.35 0.77 286.87 217.68 0.76 322.40 233.67 0.73 1000 536.52 413.39 0.77 572.02 434.08 0.76 636.88 464.02 0.73 Table 4. 24 illustrates the difference in the number of contacted schools by using the ratio of two sample designs. The second column of Table 4. 24 presents the effect of updating process and sequential selection on the number of contacted schools; the third column for the effect of the updating process, and the last column for the effect of sequential selection. Small ratio values indicate large effects on the number of contacted schools, meaning the sample design in the top of the ratio is more beneficial than the sample design in the bottom of the ra tio in terms of economic aspect . Rati o equal to 1 indicates no effect on the number of contacted schools. 112 The combination of the updating process and sequential selection is the most effective in the reduction of the number of contacted schools, with the ratio of about .85. However, the thir d column of Table 4. 24 suggests that this effect might be mostly due to the sequential selection rather than the upd ating process. The ratio in this column is close to 1, meaning that the updating process reduced the number of contac ted schools only slightly. The last column shows that the sequential selection reduced the number of contacted schools by 10%. The ratios tend to be constant across the dif ferent sample sizes. Table 4. 24 Difference in the Number of Contacted Schools , Based on Dataset 1 Sample Size 50 0.85 0.96 0.89 100 0.84 0.95 0.89 500 0.84 0.94 0.89 1000 0.84 0.94 0.90 113 Table 4. 25 Number of Contacted Schools and Schools in the Sample by Strata, Based on Dataset 1 m Stra. SICSUP SICS SC n* n n* n n* n 50 1 9.26 6.41 0.77 11.36 7.86 0.76 13.23 8.42 0.72 2 8.66 6.69 0.76 9.34 7.20 0.77 10.33 7.65 0.73 3 10.87 9.01 0.76 9.31 7.70 0.77 10.28 8.31 0.73 100 1 17.23 11.97 0.76 22.38 15.56 0.76 25.88 16.67 0.74 2 16.18 12.41 0.76 18.01 13.91 0.76 20.23 14.92 0.73 3 22.11 18.26 0.76 18.14 14.99 0.76 19.88 16.08 0.73 500 1 80.89 56.16 0.76 110.17 76.66 0.76 127.87 82.47 0.73 2 77.20 59.51 0.76 88.30 68.12 0.76 98.57 73.16 0.74 3 110.96 91.68 0.76 88.14 72.85 0.76 95.97 78.04 0.73 1000 1 161.24 111.78 0.76 220.48 153.11 0.76 249.86 163.13 0.73 2 153.05 117.88 0.76 175.83 135.57 0.76 196.38 145.68 0.73 3 222.23 183.73 0.76 175.99 145.43 0.76 190.63 155.22 0.74 Table 4.25 gives the numbers of contacted schools during the sampling procedure and schools in the final set of samples by strata. Dataset 1 uses location of school, such as (1) rural, (2) town, and (3) city, as stratification. Rural schools contain the smallest number of novice teachers (20%), and city schools contain the largest number of novice teachers (55%) . T he ratios of schools in the final set of samples to the contacted schools are constant across different sample sizes. In SICSUP, stratum 3 ( c ity) contacted the largest number of schools because this stratum tended to have largest sample size s as compare d to the other strata. Stratum 1 ( r ural) seems to contact more schools than stratum 2 ( t own) although the proportion of stratum 2 (25%) is slightly larger than stratum 1 (20%). This is due to the large number of small schools in rural area (stratum 1). Unlike SICSUP, stratum 3 did not contacted more schools than the other strata in SICS and SC . For all sample sizes, the first stratum (rural) contacted more schools than the other 114 strata. If drawing novice teachers in rural schools is more expensive than that in town and city schools, the larger number of contacted schools in this stratum might increase the resource consumption in SICS and SC . Some interesting results are found in Table 4. 26 . The ratios are quite different between strata. In rural area, the effect of the updating process and sequential selection is substantial, reducing the number of contacted schools by about 30% in SICSUP as compared to SC . However, that is not the same in city. The combination of updating process and sequential selection caused a negative effect, and SICSUP contacted more schools than SC did . The ratios are greater than 1. Distance between schools in rural tends to be greater than that in city, and this may increase the cost for sampling in rural area. If SICSUP requires fewer contacted schools than SC in order to reach the predetermined samp le size of elements espe cially in rural area, SICSUP might significantly reduce the cost for sampling as compared to SC. The similar pattern is observed in the fourth column of Table 4. 26 . In city, there are more large schools than in rural or town. In other words, average school size in city is larger than that in rural or town. These results s uggest that SICSUP might not have advanta ges with large schools or clusters. 115 Table 4. 26 Difference in the Number of Contacted Schools by Strata, Based on Dataset 1 Sample Size Location of School 50 Rural 0.73 0.85 0.86 Town 0.85 0.94 0.90 City 1.09 1.20 0.91 100 Rural 0.70 0.80 0.86 Town 0.81 0.91 0.89 City 1.15 1.27 0.91 500 Rural 0.66 0.77 0.86 Town 0.79 0.89 0.90 City 1.21 1.31 0.92 1000 Rural 0.67 0.77 0.88 Town 0.79 0.88 0.90 City 1.22 1.32 0.92 116 Figure 4. 16 Difference in the Number of Contacted Schools by Strata, Based on the Dat aset 1 Figure 4. 16 illustrates the difference in the number of contacted schools by strata. The blue circles refer to the number of contacted schools in SICSUP and the red circles refer to the number of contacte d schools in SC . The gray area between the two lines represents the difference in the number of contacted schools between SICSUP and SC . As the sample size increases, the difference becomes greater, showing SC contacted many more schools than SICSUP did. 117 H owever, the bottom panel of Figure 4. 16 shows the opposite pattern. As the sample size increases, SICSUP need ed more schools to contact than SC did . Wi th respect to the amount of difference, expressed by gray area, the effect of the updating process and sequential selection on the number of contacted schools is greatest in rural area . 4.4.2 Results Based on Dataset 2 Given the sample size of 600, the numbers of contacted schools during the sampling procedure by co untry are reported in Table 4. 27 . Country 1 has the rarest population, meaning a large portion of schools (about 60%) do es not have any novice teacher. Because of this fact, country 1 contacted many more schools than the other countries, and the ratio of schools in the final set of samples to contacted schools is very low, about 40% in SICSUP. This means that more than a half of the contacted schools were discarded and researchers should keep contacting scho ols in order to achieve the pre determined sample size of novice teachers. Country 2 also has a fairly small portion of novice teachers in the gen eral population, and about 30% of schools do not contain any novice teacher. This leads country 2 to have the second - worst ratio among the five countries, .68, .68, .64 for SICSUP, SICS, and SC , respectively. In SICSUP, only 68% of the contacted schools we re added to the final set of samples. Country 3 to 5 have relatively high proportions of novice teachers in the general population, and around 77% of schools include at least one novice teacher. Therefore, t he ratios for these three countries are higher th an country 1 and 2. In country 3 to 5, more than 70% of the contacted schools were added to the final set of samples. 118 Table 4. 27 Number of Contacted Schools and Schools in the Sample, Based on Dataset 2 CNT SICSUP SICS SC n* n n* n n* n 1 974.94 383.76 0.39 981.84 389.85 0.40 1270.37 413.96 0.33 2 374.11 255.03 0.68 373.83 255.16 0.68 437.32 280.45 0.64 3 325.03 241.99 0.74 336.94 249.35 0.74 380.66 267.76 0.70 4 298.06 228.18 0.77 311.10 235.41 0.76 352.03 251.93 0.72 5 261.90 206.06 0.79 276.27 211.17 0.77 308.23 223.76 0.73 Figure 4. 17 Difference in the Number of Contacted Schools by Country: SC (Top Line) and SICSUP (Bottom Line) Figure 4.17 describes the difference in the number of contacted schools by the five countries. For each box, the top line represents the number of contacted schools in SICSUP, and the bottom line represents those in SC . Country 1 (CNT1) shows the biggest difference in the number of contacted schools between SICSUP and SC as c ompared to the other countries does. 119 s, country 1 had to contact many more schools than the other countries did. For the five countries, the updati ng process of SICSUP seems not very beneficial while the sequential process is fairly advantageous (see Table 4. 28 ). The sequential selection reduced the number of contacted schools by 10 to 20% depending on the country (see the last column in Table 4.28 ). The third column describes the effect of the updating process on the number of contacted schools, and the values are very close to 1, meaning no effect. For some countries, the updating process was not helpful to reduce the number of contacted schools; for the othe r countries, the updating process worked differently for each stratum, and the effects were canceled out when it c ame to the whole sample. Table 4. 28 Difference in the Number of Contacted Schools , Based on Dataset 2 Country Country 1 0.77 0.99 0.77 Country 2 0.86 1.00 0.85 Country 3 0.86 0.97 0.89 Country 4 0.85 0.96 0.88 Country 5 0.85 0.95 0.90 Table 4. 29 and Table 4. 30 provide detailed descriptions of what happened within each stratum in each country. In country 1 (CNT1) with SICSUP, more than a half of the contacted schools in stratum 1 to 3 were discarded because they were a half of the contacted schools were added to the final set of samples. Despite of these results, SICSUP is still more economic than SC in country 1. SC discarded more schools than SICSUP did. The ratios in Table 4.29 show that SICSUP is more economical than SC i n country 1. 120 Each of country 4 (CNT4) and 5 (CNT5) have relatively small stratum as compared to others. For example, s tratum 3 and 4 have very small proportions in country 4, and most of the schools cont ain at least one novice teacher, meaning that novice teachers are not rare in such stratum. The ratios in the three designs are almost identical with each other. In this situation, sequential process and the updating process do not have substantial impact up on the reduction in the number of contacted school s. This suggests that the updating process and sequential selection is effective for rare populations, in which a large portion of clusters does not satisfy the selection criterion. Table 4. 29 Number of Contacted Schools and Schoo ls in the Sample by Strata, Based on Dataset 2 CNT. SICSUP SICS SC St. n* n n* n n* n CNT1 1 61.78 20.74 0.34 45.79 15.26 0.33 61.54 17.57 0.29 2 396.88 163.97 0.41 415.75 172.22 0.41 534.16 181.76 0.34 3 369.69 120.76 0.33 366.49 119.98 0.33 485.79 129.50 0.27 4 146.59 78.29 0.53 153.73 81.92 0.53 188.87 85.14 0.45 CNT2 1 261.05 171.43 0.66 263.17 173.03 0.66 310.52 190.78 0.61 2 113.06 83.59 0.74 112.17 82.81 0.74 126.81 89.67 0.71 CNT3 1 64.58 42.87 0.66 77.25 51.37 0.67 89.95 55.18 0.61 2 101.59 74.88 0.74 114.74 84.70 0.74 131.13 90.33 0.69 3 158.85 124.25 0.78 144.03 112.83 0.78 159.57 122.26 0.77 CNT4 1 41.04 36.52 0.89 36.73 32.48 0.88 39.01 34.34 0.88 2 107.08 75.41 0.70 114.94 80.75 0.70 132.53 87.27 0.66 3 13.23 13.23 1.00 9.71 9.71 1.00 10.10 10.08 1.00 4 119.26 87.28 0.73 135.46 98.90 0.73 154.16 105.74 0.69 5 17.44 15.74 0.90 15.42 13.85 0.90 16.23 14.51 0.89 CNT5 1 43.99 21.82 0.50 60.28 30.10 0.50 74.91 32.21 0.43 2 55.88 39.12 0.70 59.50 41.70 0.70 68.39 45.74 0.67 3 69.62 57.43 0.82 67.68 55.81 0.82 73.13 59.52 0.81 4 24.60 24.41 0.99 22.43 22.24 0.99 23.32 22.90 0.98 5 67.81 63.28 0.93 65.40 61.10 0.93 68.49 63.39 0.93 121 As shown in Table 4. 30 , in country 2 (CNT2), the number of contacted schools in SICSUP is very simi lar with that in SICS with the ratios close to 1. This simply shows that the updating process did not work well beca use there are only two strata in country 2. The updating process is based on the proportions of novice teachers over strata. If there are only two strata, the updated proportions may not make many changes to the initial sampling plan. Country 3 to 5 show the similar patterns. For some strata, the updating process was effective in reducing the number of contacted schools, showing small ratios, wh ile, for the others, it was not very effective, showing large rat ios (see the fourth column in Table 4.30 ). These differences disappear when they are combined into the whole set of samples (see the third column i n Table 4.28 ). Table 4. 30 Difference in the Number of Contacted Schools by Strata, Based on Dataset 2 Country Strata Country 1 1 1.00 1.35 0.74 2 0.74 0.95 0.78 3 0.76 1.01 0.75 4 0.78 0.95 0.81 Country 2 1 0.84 0.99 0.85 2 0.89 1.01 0.88 Country 3 1 0.72 0.84 0.86 2 0.77 0.89 0.87 3 1.00 1.10 0.90 Country 4 1 1.05 1.12 0.94 2 0.81 0.93 0.87 3 1.31 1.36 0.97 4 0.77 0.88 0.88 5 1.07 1.13 0.95 Country 5 1 0.59 0.73 0.80 2 0.82 0.94 0.87 3 0.95 1.03 0.93 4 1.05 1.10 0.96 5 0.99 1.04 0.95 122 4.4.3 Probability of Using Substitute Schools in SC In addition to the three evaluation criteria including the number of contacted schools, ratio of schools in the final set of samples to the contacted schools , and ratio of the contacted schools between sample designs, the probability of using substitute schools in SC was investigated. When cluster or multi - stage sample design is employed, surveys often prepare substitute or replacement clusters in advance. For example, the replacement schools in the PISA are the two neighboring schools of the initially sampled school in the sampling frame (OECD, 2017). These replacem ent schools are majorly due to non - response. In rare populations, substitute schools are require d because sampling units are hard to locate, and there is a high Timing is another important economic aspect when selecting a sample design for a survey. Usually surveys have strict closeout dates and p ublication deadlines. If multiple sets of substitute schools are necessary to reach the fixed sample size, it may take a long time and delay the plan of the survey. This can be considered a disadvantage of SC over SICSUP and SICS. Table 4. 31 and Table 4.32 report the probability of using substitute schoo ls in SC . As expected, in general, more than 80% of sets of samples used substitute sc hools in order to reach the pre determined sample size . In Table 4. 32, for some countries, such as country 1, 4, and 5, almost all sets of samples needed substitute schools. Although these results do not directly provide evidence that SICSUP is economically ad vantageous over SC , these suggest applications of alternative sample design instead of SC in such rare populations, such as SICSUP. 123 Table 4. 31 Probability of Using Substitute Schools, Based on Dataset 1 Sample Size 50 100 500 1000 Probability of Using Substitute Schools 0.83 0.85 0.87 0.85 Table 4. 32 Probability of Using Substitute Schools, Based on Dataset 2 Sample Size CNT 1 CNT 2 CNT 3 CNT 4 CNT 5 Probability of Using Substitute Schools 0.94 0.78 0.86 0.97 0.96 To sum, in general, SICSUP requires smaller number of contacted schools during the sampling proc edure in order to reach the pre determined sample size than SICS and SC do due to the updating process and sequential selection. This suggests that SICSUP may be beneficial for rare populati ons in terms of economic aspect as compared to SICS and SC . 124 CHAPTER 5. CONCLUSION AND DISCUSSION 5.1 Summary of Findings The aim of this dissertation was twofold . Firstly, it attempted to investigate the performance of stratified inverse cluster sampling with updating process (SICSUP) as compared to that of stratified cluster sampling ( SC ) with respect to statistical and economic aspects. The comparison was made because SICSUP was expected to serve as an alternative to S C for rare populations in education . Secondly, it was an attempt to provide guidelines for applying SICSUP to rare populations . Based on these aims, the research questions were the following: 1. Does SICSUP work as well as SC regarding parameter estimation? 2. How can the appropriate sample size for SICSUP be determined? 3. Can the samples from SICSUP determine whether the means of groups are different from each other? 4. Is SICSUP more economic than SC ? The first to third research questions evaluate d the statistic al aspect of SICSUP , and the last research question evaluated the economic aspect . From the simulation studies, four key findings were drawn. First , the results of simulation studies in Research Question 1 examined the performance of SICSUP with respect to the level of precision in estimating the population mean, the population standard deviation, and standard error of the sample mean as compared to that of SC . In terms of mean and standard deviation estimation, SICSUP worked as well as SC when sample size w as not very small ( n ) . SICSUP worked worse than SC under the condition of 125 very small sample size ( n = 50) and initial proportions of novice teachers based on data. However, if informal estimates of proportions were used, SICSUP performed better than SC even though the sample size was small. Although the updating process with small sample size is not very helpful for estimating parameters , if researchers do not know the proportions of novice teachers over strata in the population, the updating process at least provides some useful information about the proportions. In terms of standard error estimation, in general, SICSUP performed as well as SC except with very small sample size ( n = 50 ) . With n = 50, SICSUP worked better or worse than SC depending on the evaluation criteria and type of strata used. The results of the simulation studies showed that the jackknife, bootstrap, BRR, and among the four standard e rror estimators was not substantial. If one wants to choose one of them, the choice of standard error estimator in SICSUP would depend on the type of strata used for standard error estimators. When original strata were used, the jackknife estimator was slightly better than the bootstrap estimator with very small sample size ( n = 50) . When pseudo - strata were used, the BRR worked slightly better than the others when informal estimate based on equal proportions was used. However, the differen ce among the four standard error estimators was not great. Second , the simulation studies in Research Question 2 suggested some guidelines for sample - size determination for SICSUP. On average, the design effects based on the weighted samples were around 2.30 and 2.21 in SICSUP and SC , respectively. The design effects based on the samples without weight were around 1.86 and 1.89 in SICSUP and SC , respectively. These results indicated that the desired sample size in SICSUP was 2.3 0 times larger than that in SRS 126 with weight, or 1.86 times larger than that in SRS without weight, in order to produce estimates as accurate as those in SRS. The required sample sizes in SICSUP were similar to those in SC . Different margin of errors requir ed different sample sizes. In the studied population, i n order to achieve the margin of error of .1, SICSUP as well as SC need ed large sample sizes, close to or larger tha n the population size of 2,000. Therefore, it seem ed impractical or impossible to ach ieve the margin of error of .1 in this population. The best choice of margin of error in this population was the margin of error of .2, and hence, in SICSUP, the sample sizes of about 760 and 620 seem ed the best choices with and without sampling weight, re spectively. However, one should pay attention to type of initial proportions used and the correlation between school size and the variable of interest because they may influence sample - size determination either of positively or negatively. Third, the study in Research Question 3 examined the performance of SICSUP for multiple populations (e.g., statewide or international surveys) as compared that of SC with respect to rankings. For each country, the population mean (or country mean) was estimated and c onfidence interval coverage probability at a 95% confidence level was investigated . In general, SICSUP work ed slightly better than SC across the five countries in terms of confidence interval coverage probability of the population mean . Especially, for the country with the highest proportion of schools with no novice teacher, or the rarest population, among the five countries, SICSUP worked fairly better than SC . In terms of providing country rankings that are identical with those base d on the population means , SICSUP work ed as well as or , depending on the condition, slightly better than SC and the combination of SICSUP and SC . In Research Question 3, some interesting results were found. For example, the sample means in SICSUP were not very accurate, but it was able to 127 produce the county rankings that were identical with those based on the population means. This also occurred when SC or the combination of SICSUP and SC was used. These results impl ied that rankings should be interpreted w ith caution although they were frequently reported as results of national or international surveys and assessments. Last but not least, Research Question 4 evaluated the economic aspect of SICSUP in terms of number of contacted schools in order to achieve the predetermined sample size as compared to those in SC . Based on the dataset in Research Question 1, SICSUP contacted fewer schools than SC did. The numbers of schools in the final set of samples were similar between the two sample designs. Thus, the ratio of the number of schools in the final set of samples to the number of contacted schools during the sampling procedure, ( ) ) , showed that SICSUP was more economica l than SC . However, the different ratios by strata suggest ed that SICS UP might not be advantages for populations with large clusters. Based on the datasets in Research Question 3, SICSUP require d smaller number of contacted schools during the sampling procedure than SC did. However, t he updating process of SICSUP seem ed not very beneficial while the sequential selection of SICSUP was fairly advantageous . The sequential selection red uced the number of contacted school s by 10 to 20% depending on the country examined . For some countries, the updating process of SICSUP was not helpful to reduce the number of contacted schools (e.g., country with small number of strata) ; for the other cou ntries, the updating process worked differently for each stratum, and the effects were canceled out when it c ame to the whole sample. In this study, t he probability of using substitute schools in SC was also investigated. As expected, in general, more than 80% of sets of samples used substitute schools in order to reach 128 the predetermined sample size. Although these results did not directly provide evidence that SICSUP was economically advantageous over SC , these suggest ed a pplications of alternative sample design instead of SC in rare populations, such as SICSUP. The findings of the entire study reported that SICSUP worked at least as well as SC in terms of statistic aspect and was more economic than SC . SICSUP was sensi tive to sample size and type of initial proportions of elements when it comes to parameter estimation. In terms of economic aspect, it was sensitive to number of strata and average cluster size in the population. T he use of small number of strata or popula tions with large clusters could make SICSUP less economic. 5.2 Implications As societies become more complex and heterogeneous, the field of education also becomes broader and more diverse. This leads growing interest in groups of individuals who as students who share a distinctive culture or religion. Therefore, a substantial amount of studies have been conducted with these groups of individuals. As such studies increase, new challenges arise. Researchers who attempt to survey these groups of peop le often experience difficulty in locating them. These groups of individuals, such as those who are in a distinctive culture or religion group (e.g., migrant students), those who experienced a rare event (e.g., students who experienced cyber harassment), a nd those who share a special characteristic (e.g., students with special educational needs), are usually rare in the general populations and hence, hard to sample. The common characteristic of the groups mentioned above is that they are students in schools . This is the same for teachers in rare populations: they are found in schools. Researchers know where to find these individuals in general, but they cannot exactly locate them. SICSUP could be 129 advantageous especially to such situations. The simulation stu dies in this dissertation s uggest that SICSUP could provide results as precise as conventional SC would with contacting fewer clusters, mostly schools. Another advantage of SICSUP is the similarity in procedure to conventional SC . Both of the designs use stratification and clusters. If researchers are familiar with SC , the procedure of SICSUP would be easy to understand, and they may be less hesitant to give it a try as compared to unfamiliar sample designs. The results of this dissertation indicate that S ICSUP works as well as SC . Existing educational surveys that have used SC can change their sample design to SICSUP without facing many challenges. Existing statewide or international surveys can employ SICSUP for a part of participating states or countries that have experienced difficulties due to rarity of elements in their populations. As its name indicates, SICSUP has a close relationship with adaptive sampling, especially with inverse sampling. Adaptive sampling has a solid foundation within sampling theory ( Seber & Salehi, 2012 ; Thompson, 2002 ). There are well - established theories and sample designs that are related to adaptive sampling, and inverse sampling is one of them. These well - found bases would support SICSUP theoretically and may facilitate the understanding of concepts of the updating process and sequential selection in SICSUP. At the same time, the evaluation of SICSUP in this dissertation would contribute to the literature on adaptive sampling. Despite of a good foundation and the populari ty of adaptive sampling in sampling theory, it has been hardly used in the field of educational research. SICSUP may be able to make a connection between the two areas and encourage educational researchers to employ adaptive sampling including SICSUP in th eir studies. 130 5.3 Limitation and Future Research This section briefly discusses limitations of the study and proposes some of the directions for future research. First, a major limitation of this study is that the performance of SICSUP was evaluated only bas ed on the results from simulations with generated datasets. Although I tried to generate datasets as realistic as possible using the TALIS2018 datasets, the results still lack in realism. Future research needs to examine the performance of SICSUP with empi rical datasets. Since the development of SICSUP, it was used only once in practice for the field trial of the FIRSTMATH (First Five Years of Mathematics) Study (Tatto et al . , 2020). In addition to simulation studies with real datasets, empirical evidence is required in order to evaluate the performance of SICSUP. Second, the simulation conditions that were examined in this dissertation were (1) sample size, (2) type of init ial proportions of elements over strata, (3) level of correlation between cluster size and the variable of interest, and (4) number of strata. In surveys, response rate is one of the important considerations. Response rates for surveys seem to decrease eac h year in general (Tourangeau et al . , 2014), and increasing non - responses have caused difficulties in operation of survey, determination of sample size, and parameter estimation. With respect to rare populations, response rates of some groups of individual s tend to be low (e.g., parents with very high or low income). Evaluating the performance of SICSUP based on different levels of response rate would be suggested for future research. Third , the number of variable of interest used in this study was one for each population, and mainly the population mean and standard deviation were estimated using SICSUP. In practice, questionnaires, tests, and interview questions include many items, so the number of variable of interest in surveys and assessments is more tha n one. Statistical factors such as 131 estimation precision and required sample size tend to be different by variables of interest within a survey (OECD, 2017, 2019). Future studies could evaluate SICSUP with multiple variable of interest. Type of standard err or estimator is also an important statistical consideration when evaluating SICSUP. Along with the mean and standard deviation, different standard error estimators such as ratio, regression coefficient, and plausible value could be examined in order to eva luate the performance of SICSUP in terms of estimation precision. Finally, future studies may explore how the point at which the updating process takes pla ce affects the performance of SICSUP. The updating process relies on c urrent samples collected to the updating point. I f the size of the current samples is too small, the updating process may not work well. Based on the dataset generated for the first research question, with n = 50, some updating process occurred with few samples (less than 5 for a stratum), and such cases usually failed to produce accurate proportions. Another factor that affects the updating process is which stratum first reaches the initial sample size. With small sample size (e.g., n = 50), w hen the smallest stratum reached the initial sample size first , the updating process tended to produce less accurate proportions than when the largest stratum reached first. These may be topics for future research and provide directions to improve current SICSUP. 132 APPENDIX 133 Standard Error of the Sample Mean Using Samples without Weight The estimated bias of a standard error estimator is the difference between the average of standard error estimates from the 10 sets of samples and the empirical standard error . A positive value indicates that the standard error estimator tends to overestimate the empirical standard error and a negative value indicates that the standa rd error estimator tends to underestimate the empirical standard error. The relative bias is the estimated bias divided by the empirical standard error. Because an estimated bias can be a negative or positive value, the relative bias can also be a negative or positive value. 134 T able A. 1 Estimated B i as for the Standard Error Estimators with Original Strata and without Weight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 - 0.03 0.02 0.01 - 0.05 0.01 0.01 50 0.4 0.02 0.08 - 0.03 0.02 0.07 - 0.04 50 0.7 0.02 - 0.02 0.00 0.01 - 0.03 - 0.01 100 0. 0 - 0.02 0.00 - 0.02 - 0.02 0.00 - 0.02 100 0.4 0.01 0.01 0.00 0.00 0.00 - 0.01 100 0.7 0.02 0.00 0.02 0.01 0.00 0.01 500 0. 0 0.02 0.01 0.02 0.02 0.01 0.02 500 0.4 0.02 0.02 0.02 0.02 0.02 0.02 500 0.7 0.03 0.03 0.03 0.03 0.03 0.03 1000 0. 0 0.03 0.03 0.03 0.03 0.03 0.03 1000 0.4 0.03 0.03 0.04 0.03 0.03 0.04 1000 0.7 0.04 0.04 0.04 0.04 0.04 0.04 Informal Estimate of Based on School Proportion s 50 0. 0 - 0.02 - 0.01 - 0.03 - 0.03 - 0.01 - 0.03 50 0.4 0.03 - 0.02 0.00 0.03 - 0.02 - 0.01 50 0.7 0.01 0.02 0.05 0.00 0.02 0.05 100 0. 0 0.01 0.00 - 0.02 0.01 0.00 - 0.02 100 0.4 - 0.01 0.00 - 0.01 - 0.02 0.00 - 0.01 100 0.7 - 0.01 0.01 0.01 - 0.01 0.00 0.00 500 0. 0 0.02 0.02 0.02 0.02 0.02 0.02 500 0.4 0.02 0.03 0.02 0.02 0.03 0.02 500 0.7 0.03 0.02 0.02 0.03 0.02 0.02 1000 0. 0 0.03 0.03 0.03 0.03 0.03 0.03 1000 0.4 0.03 0.04 0.04 0.03 0.04 0.04 1000 0.7 0.04 0.03 0.03 0.03 0.03 0.03 Informal Estimate Based on Equal Proportions 50 0. 0 0.04 - 0.02 - 0.02 0.03 - 0.02 - 0.03 50 0.4 - 0.02 0.04 0.02 - 0.03 0.04 0.01 50 0.7 - 0.03 0.01 - 0.01 - 0.04 0.01 - 0.01 100 0. 0 - 0.01 - 0.01 0.00 0.00 0.01 0.01 100 0.4 - 0.01 - 0.01 - 0.01 - 0.02 0.01 0.01 100 0.7 - 0.01 - 0.01 - 0.05 - 0.06 - 0.01 - 0.01 500 0. 0 0.02 0.02 0.02 0.02 0.03 0.03 500 0.4 0.02 0.02 0.03 0.03 0.03 0.03 500 0.7 0.02 0.02 0.02 0.02 0.02 0.02 1000 0. 0 0.03 0.03 0.03 0.03 0.04 0.04 1000 0.4 0.03 0.04 0.02 0.02 0.03 0.03 1000 0.7 0.03 0.03 0.03 0.03 0.04 0.04 135 Table A. 2 Relative Bias of the Standard Error E stimators with O riginal Strata and without W eight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 - 0.07 0.05 0.03 - 0.11 0.02 0.01 50 0.4 0.05 0.17 - 0.07 0.04 0.14 - 0.08 50 0.7 0.03 - 0.05 0.00 0.02 - 0.05 - 0.02 100 0. 0 - 0.06 0.02 - 0.05 - 0.07 0.01 - 0.06 100 0.4 0.02 0.02 0.00 0.00 0.00 - 0.02 100 0.7 0.05 0.00 0.04 0.02 0.00 0.04 500 0. 0 0.16 0.11 0.17 0.17 0.10 0.16 500 0.4 0.17 0.13 0.15 0.17 0.13 0.16 500 0.7 0.18 0.21 0.19 0.19 0.21 0.18 1000 0. 0 0.44 0.41 0.47 0.44 0.41 0.47 1000 0.4 0.40 0.38 0.48 0.38 0.37 0.49 1000 0.7 0.45 0.46 0.49 0.45 0.46 0.52 Informal Estimate Based on School Proportion s 50 0. 0 - 0.04 - 0.02 - 0.06 - 0.06 - 0.02 - 0.08 50 0.4 0.06 - 0.04 0.01 0.06 - 0.05 - 0.02 50 0.7 0.01 0.05 0.11 - 0.01 0.04 0.10 100 0. 0 0.04 0.01 - 0.06 0.03 0.00 - 0.07 100 0.4 - 0.04 0.00 - 0.04 - 0.05 - 0.01 - 0.04 100 0.7 - 0.01 0.02 0.03 - 0.01 0.00 0.01 500 0. 0 0.19 0.13 0.13 0.18 0.14 0.13 500 0.4 0.15 0.21 0.16 0.13 0.21 0.15 500 0.7 0.20 0.15 0.17 0.23 0.15 0.18 1000 0. 0 0.39 0.43 0.40 0.40 0.44 0.39 1000 0.4 0.43 0.47 0.52 0.42 0.47 0.51 1000 0.7 0.41 0.37 0.41 0.41 0.37 0.40 Informal Estimate Based on Equal Proportions 50 0. 0 0.09 - 0.05 - 0.05 0.07 - 0.06 - 0.06 50 0.4 - 0.05 0.09 0.05 - 0.06 0.09 0.01 50 0.7 - 0.06 0.02 - 0.01 - 0.08 0.02 - 0.03 100 0. 0 - 0.02 - 0.03 0.01 - 0.01 0.04 0.04 100 0.4 - 0.03 - 0.04 - 0.03 - 0.04 0.02 0.02 100 0.7 - 0.02 - 0.03 - 0.13 - 0.14 - 0.03 - 0.03 500 0. 0 0.17 0.18 0.14 0.16 0.22 0.22 500 0.4 0.16 0.15 0.17 0.18 0.25 0.23 500 0.7 0.17 0.18 0.11 0.10 0.18 0.17 1000 0. 0 0.49 0.47 0.32 0.36 0.56 0.55 1000 0.4 0.48 0.50 0.26 0.25 0.50 0.48 1000 0.7 0.45 0.44 0.31 0.29 0.52 0.51 136 Table A. 3 Relative MSE for the Standard Error Estimators with O riginal S trata and W eight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 0.07 0.04 0.04 0.08 0.05 0.03 50 0.4 0.04 0.06 0.03 0.04 0.05 0.03 50 0.7 0.01 0.02 0.02 0.01 0.02 0.03 100 0. 0 0.02 0.01 0.02 0.01 0.02 0.02 100 0.4 0.03 0.02 0.01 0.03 0.03 0.01 100 0.7 0.02 0.01 0.04 0.02 0.01 0.04 500 0. 0 0.03 0.02 0.03 0.03 0.02 0.03 500 0.4 0.03 0.02 0.03 0.04 0.02 0.03 500 0.7 0.04 0.05 0.04 0.04 0.05 0.04 1000 0. 0 0.20 0.17 0.22 0.19 0.17 0.23 1000 0.4 0.16 0.15 0.23 0.14 0.14 0.24 1000 0.7 0.20 0.21 0.24 0.21 0.21 0.28 Informal Estimate Based on School Proportion s 50 0. 0 0.08 0.03 0.04 0.08 0.02 0.04 50 0.4 0.09 0.07 0.04 0.10 0.06 0.04 50 0.7 0.02 0.03 0.05 0.02 0.03 0.05 100 0. 0 0.01 0.03 0.02 0.01 0.03 0.03 100 0.4 0.01 0.03 0.01 0.01 0.03 0.01 100 0.7 0.02 0.02 0.03 0.03 0.01 0.03 500 0. 0 0.04 0.02 0.02 0.04 0.02 0.02 500 0.4 0.02 0.05 0.03 0.02 0.05 0.03 500 0.7 0.05 0.03 0.03 0.06 0.03 0.04 1000 0. 0 0.16 0.18 0.17 0.16 0.19 0.16 1000 0.4 0.18 0.22 0.27 0.18 0.23 0.27 1000 0.7 0.17 0.14 0.17 0.17 0.15 0.16 Informal Estimate Based on Equal Proportions 50 0. 0 0.05 0.03 0.04 0.04 0.02 0.04 50 0.4 0.05 0.07 0.04 0.05 0.05 0.03 50 0.7 0.05 0.07 0.03 0.04 0.06 0.02 100 0. 0 0.02 0.02 0.03 0.03 0.02 0.02 100 0.4 0.02 0.02 0.04 0.03 0.01 0.01 100 0.7 0.02 0.02 0.05 0.05 0.01 0.01 500 0. 0 0.04 0.04 0.04 0.05 0.06 0.06 500 0.4 0.03 0.03 0.05 0.05 0.07 0.06 500 0.7 0.04 0.04 0.02 0.02 0.04 0.04 1000 0. 0 0.24 0.22 0.11 0.14 0.32 0.30 1000 0.4 0.23 0.26 0.07 0.07 0.25 0.24 1000 0.7 0.20 0.20 0.11 0.10 0.28 0.27 137 Table A. 4 Confidence Interval Coverage Probability of the Standard Error Estimators with Original Strata and without Weight n UJ IJ S J UB IB S B Initial Proportions Based on Data 50 0. 0 0.90 1.00 0.90 0.90 1.00 0.90 50 0.4 0.90 0.70 0.90 0.90 0.70 0.90 50 0.7 0.80 0.90 0.90 0.80 0.90 0.90 100 0. 0 0.90 1.00 1.00 0.90 1.00 1.00 100 0.4 0.90 0.90 0.90 0.90 0.90 0.90 100 0.7 1.00 0.90 1.00 1.00 0.90 1.00 500 0. 0 0.80 1.00 1.00 0.80 1.00 1.00 500 0.4 1.00 0.90 1.00 0.90 0.90 1.00 500 0.7 0.90 1.00 1.00 0.90 1.00 1.00 1000 0. 0 1.00 1.00 0.90 1.00 1.00 0.90 1000 0.4 0.90 1.00 1.00 0.90 1.00 1.00 1000 0.7 1.00 0.90 1.00 1.00 0.90 1.00 Informal Estimate Based on School Proportion s 50 0. 0 1.00 0.90 0.80 1.00 0.90 0.80 50 0.4 0.90 1.00 0.80 0.90 0.90 0.80 50 0.7 1.00 0.90 1.00 1.00 0.90 1.00 100 0. 0 0.80 1.00 1.00 0.80 1.00 0.90 100 0.4 0.80 0.90 0.90 0.80 0.90 0.90 100 0.7 0.90 1.00 0.80 0.90 1.00 0.80 500 0. 0 1.00 0.90 1.00 1.00 0.90 1.00 500 0.4 0.90 1.00 1.00 0.90 1.00 0.90 500 0.7 1.00 0.90 0.90 1.00 0.90 1.00 1000 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.7 0.90 1.00 0.90 0.90 1.00 0.90 Informal Estimate Based on Equal Proportions 50 0. 0 1.00 0.90 1.00 1.00 0.90 1.00 50 0.4 0.90 0.90 0.90 0.90 0.90 0.90 50 0.7 0.80 0.90 0.90 0.80 0.90 0.90 100 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 100 0.4 1.00 1.00 1.00 1.00 1.00 1.00 100 0.7 1.00 1.00 1.00 1.00 1.00 1.00 500 0. 0 0.90 0.90 1.00 1.00 1.00 1.00 500 0.4 1.00 1.00 0.80 0.80 0.90 0.90 500 0.7 0.90 0.80 1.00 1.00 1.00 1.00 1000 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 1.00 1.00 0.90 1.00 1.00 1.00 1000 0.7 0.90 0.90 1.00 1.00 1.00 1.00 138 Table A. 5 Estimated Bias of the Standard Error Estimators with Pseudo - S trata and without Weight n UJ IJ S J UB IB S B UR IR S R UF IF S F Initial Proportions Based on Data 50 0. 0 - 0.13 - 0.04 - 0.09 - 0.13 - 0.04 - 0.09 - 0.13 - 0.04 - 0.08 - 0.13 - 0.04 - 0.09 50 0.4 0.05 - 0.02 - 0.10 0.05 - 0.02 - 0.09 0.05 - 0.01 - 0.10 0.05 - 0.01 - 0.10 50 0.7 - 0.12 - 0.09 - 0.02 - 0.12 - 0.09 - 0.03 - 0.12 - 0.09 - 0.02 - 0.12 - 0.09 - 0.02 100 0. 0 - 0.09 - 0.09 - 0.06 - 0.09 - 0.09 - 0.06 - 0.09 - 0.09 - 0.06 - 0.09 - 0.09 - 0.06 100 0.4 - 0.05 - 0.04 - 0.06 - 0.04 - 0.04 - 0.06 - 0.05 - 0.04 - 0.06 - 0.05 - 0.04 - 0.06 100 0.7 - 0.03 - 0.05 - 0.07 - 0.03 - 0.05 - 0.07 - 0.03 - 0.05 - 0.07 - 0.03 - 0.05 - 0.07 500 0. 0 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 500 0.4 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 - 0.04 - 0.03 - 0.03 500 0.7 - 0.05 - 0.06 - 0.04 - 0.05 - 0.06 - 0.04 - 0.05 - 0.06 - 0.04 - 0.05 - 0.06 - 0.04 1000 0. 0 - 0.02 - 0.03 - 0.01 - 0.02 - 0.03 - 0.01 - 0.02 - 0.03 - 0.01 - 0.02 - 0.03 - 0.01 1000 0.4 - 0.02 - 0.03 - 0.02 - 0.03 - 0.03 - 0.01 - 0.02 - 0.03 - 0.02 - 0.02 - 0.03 - 0.02 1000 0.7 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 - 0.03 - 0.03 - 0.02 Informal Estimate Based on School Proportion s 50 0. 0 - 0.13 - 0.10 - 0.10 - 0.14 - 0.10 - 0.10 - 0.13 - 0.10 - 0.09 - 0.13 - 0.10 - 0.09 50 0.4 - 0.11 - 0.08 - 0.06 - 0.12 - 0.08 - 0.07 - 0.11 - 0.08 - 0.06 - 0.11 - 0.08 - 0.06 50 0.7 - 0.10 0.00 0.05 - 0.10 0.00 0.05 - 0.10 0.00 0.05 - 0.10 0.00 0.05 100 0. 0 - 0.07 - 0.03 - 0.06 - 0.06 - 0.03 - 0.06 - 0.07 - 0.03 - 0.06 - 0.07 - 0.03 - 0.06 100 0.4 - 0.08 - 0.02 - 0.05 - 0.08 - 0.02 - 0.05 - 0.08 - 0.02 - 0.05 - 0.08 - 0.02 - 0.05 100 0.7 - 0.02 - 0.04 - 0.07 - 0.02 - 0.04 - 0.06 - 0.02 - 0.04 - 0.07 - 0.02 - 0.04 - 0.07 500 0. 0 - 0.04 - 0.04 - 0.02 - 0.04 - 0.04 - 0.02 - 0.04 - 0.04 - 0.02 - 0.04 - 0.04 - 0.02 500 0.4 - 0.04 - 0.04 - 0.04 - 0.03 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 500 0.7 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 - 0.04 1000 0. 0 - 0.02 - 0.01 - 0.01 - 0.02 - 0.01 - 0.01 - 0.02 - 0.01 - 0.01 - 0.02 - 0.01 - 0.01 1000 0.4 - 0.02 - 0.02 - 0.01 - 0.02 - 0.02 - 0.01 - 0.02 - 0.02 - 0.01 - 0.02 - 0.02 - 0.01 1000 0.7 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 - 0.03 Informal Estimate Based on Equal Proportions 50 0. 0 - 0.03 - 0.07 - 0.07 - 0.03 - 0.07 - 0.07 - 0.03 - 0.07 - 0.07 - 0.03 - 0.07 - 0.07 50 0.4 - 0.08 - 0.06 - 0.03 - 0.08 - 0.06 - 0.03 - 0.08 - 0.05 - 0.03 - 0.08 - 0.06 - 0.03 50 0.7 - 0.10 - 0.06 - 0.03 - 0.10 - 0.06 - 0.03 - 0.10 - 0.07 - 0.03 - 0.10 - 0.07 - 0.03 100 0. 0 - 0.07 - 0.07 - 0.07 - 0.07 - 0.04 - 0.04 - 0.03 - 0.04 - 0.02 - 0.02 - 0.02 - 0.02 100 0.4 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.05 - 0.04 - 0.04 - 0.04 - 0.04 100 0.7 - 0.04 - 0.04 - 0.04 - 0.04 - 0.09 - 0.09 - 0.09 - 0.09 - 0.05 - 0.05 - 0.05 - 0.05 500 0. 0 - 0.02 - 0.02 - 0.02 - 0.02 - 0.03 - 0.04 - 0.03 - 0.03 - 0.02 - 0.02 - 0.02 - 0.02 500 0.4 - 0.04 - 0.04 - 0.04 - 0.04 - 0.03 - 0.03 - 0.03 - 0.03 - 0.02 - 0.02 - 0.02 - 0.02 500 0.7 - 0.05 - 0.05 - 0.05 - 0.05 - 0.06 - 0.06 - 0.06 - 0.06 - 0.04 - 0.04 - 0.04 - 0.04 1000 0. 0 - 0.01 - 0.01 - 0.01 - 0.01 - 0.01 - 0.01 - 0.01 - 0.01 0.00 0.00 0.00 0.00 1000 0.4 - 0.02 - 0.02 - 0.02 - 0.02 - 0.02 - 0.02 - 0.02 - 0.02 - 0.01 - 0.01 - 0.01 - 0.01 1000 0.7 - 0.03 - 0.03 - 0.03 - 0.03 - 0.04 - 0.04 - 0.04 - 0.04 - 0.02 - 0.02 - 0.02 - 0.02 139 Table A. 6 Relative Bias of the Standard Error Estimators with Pseudo - S trata and without Weight n UJ IJ SJ UB IB SB UR IR SR UF IF SF Initial Proportions Based on Data 50 0. 0 - 0.29 - 0.08 - 0.21 - 0.29 - 0.09 - 0.20 - 0.28 - 0.08 - 0.17 - 0.29 - 0.09 - 0.21 50 0.4 0.10 - 0.04 - 0.20 0.10 - 0.03 - 0.20 0.09 - 0.01 - 0.20 0.09 - 0.03 - 0.20 50 0.7 - 0.23 - 0.18 - 0.05 - 0.23 - 0.18 - 0.05 - 0.23 - 0.18 - 0.05 - 0.23 - 0.18 - 0.05 100 0. 0 - 0.28 - 0.29 - 0.18 - 0.28 - 0.29 - 0.18 - 0.28 - 0.28 - 0.18 - 0.28 - 0.29 - 0.18 100 0.4 - 0.14 - 0.11 - 0.19 - 0.13 - 0.10 - 0.18 - 0.13 - 0.11 - 0.19 - 0.13 - 0.11 - 0.19 100 0.7 - 0.07 - 0.13 - 0.19 - 0.07 - 0.13 - 0.20 - 0.07 - 0.13 - 0.19 - 0.07 - 0.13 - 0.19 500 0. 0 - 0.26 - 0.32 - 0.25 - 0.26 - 0.32 - 0.24 - 0.26 - 0.32 - 0.25 - 0.26 - 0.32 - 0.25 500 0.4 - 0.27 - 0.25 - 0.23 - 0.27 - 0.25 - 0.25 - 0.27 - 0.25 - 0.23 - 0.28 - 0.25 - 0.23 500 0.7 - 0.31 - 0.43 - 0.28 - 0.31 - 0.43 - 0.28 - 0.31 - 0.43 - 0.27 - 0.31 - 0.42 - 0.27 1000 0. 0 - 0.29 - 0.33 - 0.19 - 0.29 - 0.33 - 0.20 - 0.29 - 0.33 - 0.20 - 0.30 - 0.33 - 0.20 1000 0.4 - 0.30 - 0.32 - 0.21 - 0.31 - 0.32 - 0.20 - 0.30 - 0.32 - 0.21 - 0.31 - 0.32 - 0.21 1000 0.7 - 0.35 - 0.38 - 0.29 - 0.34 - 0.38 - 0.30 - 0.34 - 0.38 - 0.29 - 0.34 - 0.38 - 0.29 Informal Estimate Based on School Proportion s 50 0. 0 - 0.29 - 0.22 - 0.22 - 0.30 - 0.21 - 0.22 - 0.28 - 0.22 - 0.21 - 0.28 - 0.22 - 0.22 50 0.4 - 0.23 - 0.17 - 0.14 - 0.24 - 0.17 - 0.15 - 0.23 - 0.16 - 0.13 - 0.23 - 0.17 - 0.14 50 0.7 - 0.20 0.01 0.10 - 0.20 0.01 0.10 - 0.19 0.00 0.10 - 0.19 0.01 0.10 100 0. 0 - 0.20 - 0.10 - 0.20 - 0.19 - 0.10 - 0.19 - 0.20 - 0.09 - 0.20 - 0.20 - 0.09 - 0.20 100 0.4 - 0.22 - 0.06 - 0.16 - 0.22 - 0.06 - 0.16 - 0.22 - 0.06 - 0.16 - 0.22 - 0.06 - 0.16 100 0.7 - 0.06 - 0.12 - 0.20 - 0.06 - 0.11 - 0.18 - 0.06 - 0.11 - 0.20 - 0.06 - 0.12 - 0.20 500 0. 0 - 0.28 - 0.31 - 0.19 - 0.29 - 0.32 - 0.19 - 0.29 - 0.31 - 0.20 - 0.29 - 0.31 - 0.20 500 0.4 - 0.25 - 0.27 - 0.28 - 0.24 - 0.28 - 0.28 - 0.25 - 0.27 - 0.28 - 0.26 - 0.27 - 0.28 500 0.7 - 0.30 - 0.31 - 0.31 - 0.30 - 0.30 - 0.33 - 0.30 - 0.31 - 0.31 - 0.30 - 0.31 - 0.32 1000 0. 0 - 0.20 - 0.19 - 0.22 - 0.21 - 0.19 - 0.21 - 0.20 - 0.19 - 0.21 - 0.20 - 0.18 - 0.21 1000 0.4 - 0.27 - 0.27 - 0.13 - 0.25 - 0.27 - 0.11 - 0.26 - 0.27 - 0.13 - 0.27 - 0.27 - 0.13 1000 0.7 - 0.36 - 0.31 - 0.34 - 0.36 - 0.31 - 0.36 - 0.36 - 0.31 - 0.34 - 0.36 - 0.31 - 0.34 Informal Estimate Based on Equal Proportions 50 0. 0 - 0.07 - 0.17 - 0.17 - 0.07 - 0.15 - 0.16 - 0.08 - 0.16 - 0.17 - 0.07 - 0.17 - 0.17 50 0.4 - 0.17 - 0.12 - 0.07 - 0.17 - 0.12 - 0.06 - 0.17 - 0.12 - 0.07 - 0.17 - 0.12 - 0.07 50 0.7 - 0.18 - 0.13 - 0.07 - 0.18 - 0.13 - 0.07 - 0.19 - 0.14 - 0.07 - 0.19 - 0.14 - 0.07 100 0. 0 - 0.23 - 0.21 - 0.23 - 0.23 - 0.10 - 0.10 - 0.10 - 0.10 - 0.08 - 0.08 - 0.08 - 0.08 100 0.4 - 0.15 - 0.14 - 0.14 - 0.15 - 0.14 - 0.14 - 0.14 - 0.14 - 0.11 - 0.11 - 0.11 - 0.11 100 0.7 - 0.13 - 0.12 - 0.13 - 0.13 - 0.23 - 0.23 - 0.22 - 0.23 - 0.16 - 0.16 - 0.16 - 0.16 500 0. 0 - 0.20 - 0.20 - 0.20 - 0.20 - 0.23 - 0.25 - 0.23 - 0.23 - 0.17 - 0.16 - 0.17 - 0.17 500 0.4 - 0.32 - 0.32 - 0.32 - 0.32 - 0.19 - 0.20 - 0.19 - 0.19 - 0.14 - 0.12 - 0.14 - 0.14 500 0.7 - 0.40 - 0.40 - 0.40 - 0.40 - 0.34 - 0.35 - 0.34 - 0.34 - 0.32 - 0.32 - 0.32 - 0.32 1000 0. 0 - 0.17 - 0.18 - 0.17 - 0.17 - 0.17 - 0.16 - 0.17 - 0.17 - 0.06 - 0.07 - 0.06 - 0.06 1000 0.4 - 0.23 - 0.23 - 0.23 - 0.23 - 0.26 - 0.26 - 0.26 - 0.26 - 0.11 - 0.10 - 0.11 - 0.11 1000 0.7 - 0.35 - 0.36 - 0.35 - 0.35 - 0.37 - 0.37 - 0.37 - 0.37 - 0.28 - 0.28 - 0.28 - 0.28 140 Table A. 7 Relative MSE of the Standard Error Estimators with Pseudo - S trata and without Weight n UJ IJ SJ UB IB SB UR IR SR UF IF SF Initial Proportions Based on Data 50 0. 0 0.15 0.05 0.08 0.15 0.05 0.08 0.15 0.04 0.10 0.15 0.04 0.08 50 0.4 0.10 0.08 0.09 0.11 0.09 0.09 0.10 0.09 0.09 0.10 0.08 0.09 50 0.7 0.09 0.10 0.07 0.09 0.10 0.08 0.09 0.10 0.07 0.09 0.10 0.07 100 0. 0 0.10 0.10 0.06 0.09 0.10 0.05 0.09 0.10 0.06 0.09 0.10 0.06 100 0.4 0.06 0.06 0.04 0.06 0.05 0.04 0.06 0.05 0.04 0.06 0.05 0.04 100 0.7 0.06 0.07 0.08 0.07 0.06 0.08 0.06 0.07 0.08 0.06 0.07 0.08 500 0. 0 0.08 0.11 0.07 0.08 0.11 0.07 0.08 0.11 0.07 0.08 0.11 0.07 500 0.4 0.09 0.07 0.06 0.09 0.07 0.07 0.09 0.07 0.06 0.10 0.07 0.06 500 0.7 0.11 0.20 0.09 0.11 0.20 0.09 0.11 0.20 0.09 0.11 0.20 0.09 1000 0. 0 0.10 0.12 0.05 0.10 0.12 0.06 0.10 0.12 0.05 0.11 0.12 0.05 1000 0.4 0.10 0.11 0.05 0.11 0.11 0.04 0.10 0.11 0.05 0.11 0.11 0.05 1000 0.7 0.13 0.18 0.10 0.13 0.18 0.10 0.13 0.18 0.10 0.13 0.18 0.10 Informal Estimate Based on School Proportion s 50 0. 0 0.16 0.07 0.09 0.16 0.06 0.09 0.15 0.07 0.09 0.15 0.07 0.09 50 0.4 0.15 0.15 0.09 0.15 0.16 0.10 0.15 0.15 0.09 0.15 0.15 0.09 50 0.7 0.09 0.09 0.11 0.09 0.08 0.11 0.09 0.09 0.11 0.09 0.09 0.11 100 0. 0 0.07 0.06 0.07 0.07 0.06 0.06 0.07 0.07 0.07 0.07 0.06 0.07 100 0.4 0.07 0.07 0.06 0.07 0.08 0.06 0.07 0.07 0.06 0.07 0.07 0.06 100 0.7 0.04 0.07 0.05 0.04 0.08 0.05 0.04 0.07 0.05 0.04 0.07 0.05 500 0. 0 0.10 0.11 0.05 0.10 0.11 0.05 0.10 0.11 0.05 0.11 0.11 0.05 500 0.4 0.08 0.09 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 0.09 500 0.7 0.10 0.12 0.11 0.10 0.12 0.13 0.10 0.12 0.12 0.10 0.12 0.12 1000 0. 0 0.05 0.05 0.06 0.05 0.05 0.06 0.05 0.05 0.06 0.05 0.05 0.06 1000 0.4 0.09 0.09 0.02 0.08 0.08 0.02 0.09 0.09 0.02 0.09 0.09 0.02 1000 0.7 0.15 0.12 0.13 0.15 0.11 0.14 0.15 0.12 0.13 0.16 0.12 0.13 Informal Estimate Based on Equal Proportions 50 0. 0 0.06 0.07 0.09 0.06 0.07 0.09 0.06 0.07 0.09 0.06 0.07 0.09 50 0.4 0.05 0.09 0.09 0.05 0.11 0.10 0.05 0.10 0.09 0.05 0.10 0.09 50 0.7 0.10 0.08 0.04 0.09 0.08 0.04 0.10 0.08 0.04 0.10 0.08 0.04 100 0. 0 0.09 0.08 0.09 0.09 0.06 0.07 0.07 0.07 0.04 0.05 0.05 0.05 100 0.4 0.07 0.07 0.07 0.07 0.09 0.08 0.09 0.09 0.04 0.04 0.04 0.04 100 0.7 0.07 0.07 0.07 0.07 0.11 0.10 0.11 0.11 0.05 0.05 0.05 0.05 500 0. 0 0.07 0.07 0.07 0.07 0.08 0.09 0.08 0.08 0.05 0.05 0.05 0.05 500 0.4 0.12 0.12 0.12 0.12 0.04 0.04 0.04 0.04 0.02 0.02 0.02 0.02 500 0.7 0.18 0.18 0.18 0.18 0.13 0.14 0.13 0.13 0.11 0.11 0.11 0.11 1000 0. 0 0.05 0.05 0.05 0.05 0.04 0.04 0.04 0.04 0.02 0.02 0.02 0.02 1000 0.4 0.06 0.07 0.06 0.06 0.09 0.09 0.09 0.09 0.02 0.02 0.02 0.02 1000 0.7 0.14 0.15 0.14 0.14 0.16 0.16 0.16 0.16 0.09 0.09 0.09 0.09 141 Table A. 8 Confidence Interval Coverage Probability of the Standard Error Estimators with Pseudo - S trata and without Weight n UJ IJ SJ UB IB SB UR IR SR UF IF SF Initial Proportions Based on Data 50 0. 0 0.80 0.90 0.80 0.80 0.90 0.80 0.80 0.90 0.90 0.80 0.90 0.80 50 0.4 0.90 0.70 0.90 0.90 0.70 0.80 0.90 0.70 0.90 0.90 0.70 0.90 50 0.7 0.80 0.90 0.60 0.80 0.90 0.60 0.80 0.90 0.60 0.80 0.90 0.60 100 0. 0 0.80 0.70 0.90 0.80 0.70 0.90 0.80 0.70 0.90 0.80 0.70 0.90 100 0.4 0.80 0.70 0.70 0.80 0.70 0.80 0.80 0.70 0.70 0.80 0.70 0.70 100 0.7 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 0.90 1.00 0.90 0.90 500 0. 0 0.60 0.80 0.90 0.60 0.80 0.90 0.60 0.80 0.90 0.60 0.80 0.90 500 0.4 0.90 0.60 0.80 0.90 0.60 0.80 0.90 0.60 0.80 0.90 0.60 0.80 500 0.7 0.50 0.80 0.80 0.50 0.80 0.80 0.50 0.80 0.80 0.50 0.80 0.80 1000 0. 0 0.80 0.90 0.70 0.80 0.90 0.70 0.80 0.90 0.70 0.80 0.90 0.70 1000 0.4 0.70 0.70 0.40 0.70 0.70 0.40 0.70 0.70 0.40 0.70 0.70 0.40 1000 0.7 0.50 0.50 0.60 0.50 0.50 0.60 0.50 0.50 0.60 0.50 0.50 0.60 Informal Estimate Based on School Proportion s 50 0. 0 1.00 0.90 0.60 1.00 0.90 0.60 1.00 0.90 0.60 1.00 0.90 0.60 50 0.4 0.60 0.90 0.80 0.60 0.90 0.80 0.60 0.90 0.80 0.60 0.90 0.80 50 0.7 0.70 0.80 1.00 0.70 0.80 1.00 0.70 0.80 1.00 0.70 0.80 1.00 100 0. 0 0.80 0.80 0.60 0.80 0.80 0.60 0.80 0.80 0.60 0.80 0.80 0.60 100 0.4 0.70 0.90 0.90 0.80 0.90 0.90 0.70 0.90 0.90 0.70 0.90 0.90 100 0.7 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 500 0. 0 0.80 0.90 0.90 0.90 0.90 0.90 0.80 0.90 0.90 0.80 0.90 0.90 500 0.4 0.90 1.00 0.60 0.90 1.00 0.60 0.90 1.00 0.60 0.90 1.00 0.60 500 0.7 0.80 0.80 0.50 0.80 0.80 0.50 0.80 0.80 0.50 0.80 0.80 0.50 1000 0. 0 0.80 0.70 1.00 0.80 0.70 0.90 0.80 0.70 1.00 0.80 0.70 1.00 1000 0.4 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1.00 0.80 0.90 1000 0.7 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 Informal Estimate Based on Equal Proportions 50 0. 0 1.00 0.90 1.00 1.00 0.90 1.00 1.00 0.90 1.00 1.00 0.90 1.00 50 0.4 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 50 0.7 0.70 0.80 0.90 0.70 0.80 0.90 0.70 0.80 0.90 0.70 0.80 0.90 100 0. 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 100 0.4 0.90 1.00 0.90 0.90 0.90 0.90 0.90 0.90 1.00 1.00 1.00 1.00 100 0.7 0.90 0.90 0.90 0.90 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 500 0. 0 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 500 0.4 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 0.80 500 0.7 0.70 0.70 0.70 0.70 0.90 0.90 0.90 0.90 0.90 0.90 0.90 0.90 1000 0. 0 0.90 0.90 0.90 0.90 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1000 0.4 1.00 1.00 1.00 1.00 0.80 0.90 0.80 0.80 0.90 0.90 0.90 0.90 1000 0.7 0.80 0.80 0.80 0.80 0.70 0.70 0.70 0.70 0.50 0.50 0.50 0.50 142 REFERENCES 143 REFERENCES Agresti, A. & Finlay, B., (2009). Statistical methods for the social sciences . Upper Saddle River, NJ : Prentice Hall. Allensworth, E., Ponisciak, S., & Mazzeo, C. (2009). The schools teachers leave: Teacher mobility in Chicago public schools . Chicago, IL: Consortium on Chicago School Research. Retrieved from https://consortium.uchicago.edu/sites/default/files/2018 - 10/CCSR_Teacher_Mobility.pdf Anscombe, F. J. (1953). Sequentia l estimation. Journal of the Royal Statistical Society: Series B (Methodological) , 15 (1), 1 - 21. Bailey, T., & Weininger, E. B. (2002). Performance, graduation, and transfer of immigrants and natives in City University of New York community colleges. Educational Evaluation and Policy Analysis , 24 (4), 359 - 377. Betti, G., Gagliardi, F., & Verma, V. (2018). Simplified j ackknife variance estimates for fuzzy measures of multidimensional poverty. International Statistical Review , 86 (1), 68 - 86. Burke, A. M ., Morita - Mullaney, T., & Singh, M. (2016). Indiana emergent bilingual student time to reclassification: A survival analysis. American Educational Research Journal , 53 (5), 1310 - 1342. Canty, A. J., & Davison, A. C. (1999). Resampling based variance estimat ion for labour force surveys. Journal of the Royal Statistical Society: Series D (The Statistician) , 48 (3), 379 - 391. Chen, T. C., Bobbitt, P. A., Himelein, J. A., Paben, S. P., Cho, M. J., & Ernst, L. R. (2007). Variance e stimation for i nternational p rice p rogram i ndexes. 2007 Proceedings of the American Statistical Association , 1427 - 1434. Chen, H., & Shen, Q. R. (2019). Variance estimation for survey - weighted data using bootstrap resampling methods: 2013 methods - of - payment survey questionnaire. In P. H. Kim, D. T. Jacho - Chavez, & G. Tripathi (Eds.), The Econometrics of Complex Survey Data: Theory and Applications (pp. 144 - 163) . Bingley , England : Emerald Publishing Limited. Christman, M. C. (2004). Sequential sampling for rare and geographically clustered p opulations . In W. Thompson (Ed.), Sam pling rare or elusive species: C oncepts, designs, and techniques for estimating population parameters (pp.134 - 145) . Washington, DC: Island Press. 144 Chubbuck, S. M., Clift, R. T., Allard, J., & Quinlan, J. (2001). Playing it safe as a novice teacher: Implications for programs for new teachers. Journal of Teacher E ducation , 52 (5), 365 - 376. Clark, R. G., & Steel, D. G. (2000). Optimum all ocation of sample to strata and stages with simple additional constraints. Journal of the Royal Statistical Society: Series D (The Statistician) , 49 (2), 197 - 207. Daly, F. & Gilligan, R. ( 2005) . Lives in foster care : The educational and social support experiences of young people aged 13 to 14 years in long - term foster case . Dublin , Ireland : Children's R esearch Centre, Trinity College, Dublin . De Róiste, A., & Dinneen, J. (2005). iews about o pportunities , b arriers and s upp orts to r ecreation and l eisure. Dubli n , Ireland : Government Publications. Dippo, C. S., Fay, R. E., & Morganstein, D. H. (1984). Computing variances from complex samples with replicate weights. Proceedings of the Survey Research Methods Section , 489 - 494 . Downing, K., & Ganotice Jr, F. A. (Eds.). (2016). World university rankings and the future of higher education . Hershey, PA : IGI Global. Ghosh , M., Parr, W. C., Singh, K., & Babu, G. J. (1984). A note on bootstrapping t he sample median. Annals of Statistics, 12 (3) , 1130 - 1135. Efron, B. (1979). Bootstrap methods: Another look at the j ackknife . Annals of Statistics, 7 (1) , 1 - 26. Efron, B. (1982). The j ackknife , the b ootstrap, and o ther r esampling p lans . Philadelphia , PA : SIAM. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap . Boca Raton, FL: Chapman & Hall. Folsom, R. (2014). National assessment approach to sampling error estimation (Report No. BK - 0013 - 1412) . Research Triangle Institute . Retrieved from https://www.rti .org/rti - press - publication/sampling - error - estimation/fulltext.pdf FrankelL, M. R. (1971). Inference from s urvey s amples. Ann Arbor , MI : Institute for Social Res earch, University of Michigan. Greenberg Motamedi, J., Singh, M., & Thompson, K. D. (2016). English learner student characteristics and time to reclassification: An example from Washington state ( Report No. REL 2016 128). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistanc e, Regional Educational 145 Laboratory Northwest. Retrieved from https://ies.ed.gov/ncee/edlabs/regions/northwest/pdf/REL_2016128.pdf Haas, E., Tran, L., Huang, M., & Yu, A. (2015). The achievement progress of English learner students in Arizona ( Report No. REL 2015 - 098). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West. Retrieved from https://ies.ed.gov/ncee/edlabs/regions/west/pdf/REL_2015098 .pdf Haldane, J.B.S. (1945). On a method of estimating frequencies. Biometrika , 33 (3) , 222 - 225 . Hall, P. (1989). On efficient bootstrap simulation, Biometrika , 76 ( 3 ) , 613 - 617. Journal of Official Statistics , 6 (3), 223 - 239. Kamens, D. H., & McNeely, C. L. (2010). Globalization and the growth of international educational testing and national assessment. Comparative Education R eview , 54 (1), 5 - 25. Kim, S ., Ju, U., & Reckase, M. D. (2015). Obtaining a representative sample when the distribution of element s is not known. A poster was presented at the meeting of the Psychometric Society. Beijing, China. Kinnunen, P., & Malmi, L. (2006 ). Why students drop out CS1 course?. Proceedings of the Second International Workshop on Computing Education R esearch , 97 - 108. Kirsch, I., Lennon, M., von Davier, M., Gonzalez, E., & Yamamoto, K. (2013). On the growing importance of international large - scale assessments. In M. Von Davier, E. Gonza les, I. Kirsc h, & K. Yamamoto (E ds . ), The role of international large - scale assessments: Perspectives from technology, economy, and educational research (pp. 1 - 11). Dordrecht , Netherlands : Springer. Kish, L. (1965). Survey s ampling . New York : John Wiley and Sons, Inc Kish, L. (1985). Sample s urveys v ersus e xperiments, c ontrolled o bservations, c ensuses, r egisters, and l ocal s tudies . Australian Journal of Statistics, 27 (2) , 111 - 122. Kish, L. (1991). Taxonomy of elusive populations. Journal of Official Statistics , 7 (3) , 340 - 347. Kish, L., & Frankel, M. R. (1974). Inference from complex samples. Journal of the Royal Statistical Society: Series B (Methodological) , 36 (1), 1 - 22. Kovar, J. G., Rao, J. N. K., & Wu, C. F. J. (1988). Bootstrap and other methods to measure errors in survey estimates. Canadian Journal of Statistics, 16 (S1) , 25 - 46. 146 Lassibille, G. & Navarro Gómez, L. (2008). Why do higher education students drop out? Evidence from Spain. Education Ec onomics , 16 (1), 89 - 105. Lee, S. E., Lee, P. R., & Shin, K. I. (2016). A composite estimator for stratified two stage cluster sampling. Communications for Statistical Applications and Methods , 23 (1), 47 - 55. Lee, V. E., Ready, D. D., & Johnson, D. J. (2001 ). The difficulty of identifying rare samples to study: The case of high schools divided into schools - within - schools. Educational Evaluation and Policy Analysis , 23 (4), 365 - 379. Lesaux, N. K., & Kieffer, M. J. (2010). Exploring sources of reading comprehension difficulties among language minority learners and their classmates in early adolescence. American Educational Research Journal , 47 (3), 596 - 632. Lumley, T. (2020). S urvey: A nalysis of complex survey samples. R package version 4.0. Lo, N., Griffith, D. & Hunter, J. (1997). Using a restricted adap tive cluster sampling to estimate pacific hake larval abundance. California Cooperative Oceanic Fisheries Invest igations Report, 38 , 103 - 113. Mach, L., Dumais, J., & Robinson, A. A. (2005). Study of the properties of a bootstrap variance estimator under sampling without replacement. A paper was presented at the Federal Committee on Statistical Methodology (FCSM) Research Conference. Arlington , VA. Martin, M. O., Mullis, I. V., & Hooper, M. (2016). Methods and procedures in TIMSS 2015 . Boston College, TIMSS & PIRLS International Study Center. Retrieved from http:// timssandpirls.bc.edu/publications/timss/2015 - methods.html The MathWorks, Inc. (1984 - 2015). MATLAB version 10.1. Natick, MA : The MathWorks Inc. McDonald, L. L. (2004) . Sampling rare populations. In W. Thompson (Ed.), Sampling rare or elusive species: Concepts, designs, and techniques for estimating population parameters (pp.11 - 42) . Washington, DC: Island Press. Mitchell, R. E. (20 06). How many deaf people are there in the United States? Estimates from the Survey of Income and Program Participation. Journal of deaf studies and deaf education , 11 (1), 112 - 119. Murthy, M. N. (1967). Sampling theory and methods . Calcutta , Canada : Eka Press. National Academies of Sciences, Engineering, and Medicine (2018). Improving h ealth r esearch on s mall p opulations: Proceedings of a w orkshop . Washington, DC : The National Academies Press. OECD (2016). Country note: Key findings from PISA 2015 for the United States. Retrieved from https://www.oecd.org/pisa/PISA - 2015 - United - States.pdf 147 OECD (2017). PISA 2015 technical report . Paris , France : PISA, OECD Publishing. Retrieved from http://www.oecd .org/pisa/sitedocument/PISA - 2015 - technical - report - final.pdf OECD (2019). TALIS s tarting s trong 2018 t echnical r eport . Paris , France : TALIS, OECD Publishing. Retrieved from https://www.oecd.org/education/talis/TALIS_2018_Technical_Report.pdf Paben , S. P. (1999) . Comparison of v ariance e stimation m ethods for the n ational c ompensation s urvey. Proceedings of the Section on Survey Research Methods, American Statistical Association , 709 - 795 . Pathak, P. K. ( 1976 ) . Unbiased e stimation in f ixed c ost s equential s ampling s chemes. The Annals of Statistics, 4 (5) , 1012 1017. Quenouille, M. H. (1949). The joint distribution of serial correlation coefficients . The Annals of Mathematical Statistics , 20 (4) , 561 571. Rao, J. N. K., & Wu, C. J. (1985). Inference from stratified samples: Second - order analysis of three methods for nonlinear statistics. Journal of the American Statistical Association , 80 (391), 620 - 630. Rao, J. N. K., Wu, C. F. J., & Yue, K. (1992). Some recent work on resampling methods fo r complex surveys. Survey Methodology, 18 (2) , 209 - 217. R Core Team. (201 9 ). R: A language and environment for statistical computing. Vienna, Austria. URL https://www.R - project.org/.: R Foundation for Statistical Computing. Reckase, M. D., Kim, S., & Ju, U. (2016). Sequential cluster sampling for international studies. A paper was presented at the meeting of the Psychometric Society. As heville, NC . Riniolo, T. C. (1999). Using a large control group for statistical comparison: Evaluation of a between - groups median test. The Journal of Experimental Education , 68 (1), 75 - 88. Robinson, A. P., & Hamann, J. D. (2008). Correcting for spatial autocorrelation in sequential sampling. Journal of Applied E cology , 45 (4), 1221 - 1227. Rust, K. F., & Rao, J. N. K. (1996). Variance estimation for complex surveys using replication techniques. Statistical Methods in Medical R esearch , 5 (3), 283 - 310. Rutkowski, L., von Davier, M., & Rutkowski, D. (Eds.). (2013). Handbook of international large - scale assessment: Backgrou nd, technical issues, and methods of data analysis . New York: CRC Press. Shalizi, C. R. (2016). Advanced d ata a nalysis from an e lementary p oint of view . Cambridge, England : Cambridge University Press. 148 Smith, P. J., Srinath, C. K., & Battaglia, M. P. (2000). Issues relating to the use of jackknife methods in the National Immunization Survey. A paper was presented at the American Statistical Association Meetings. Indianapolis, IN. Salehi, M. & Seber, G. A. (1997). Two - stage adaptive cluster sampling. Biometrics , 53 (3) , 959 - 970. Salehi, M. M. & Smith, D. R. (2005). Two - stage sequential sampling: A neighborhood - free adaptive sampling procedure. Journal of Agricultural, Biological, and Environmental Statistics , 10 (1), 84 - 103. Scott, J. A. & Hoffmeister, R. J. (2016). American s ign l anguage and academic English: Factors influencing the reading of bilingual secondary school deaf and hard of hearing students. The Journal of Deaf Studies and Deaf Education , 22 (1), 1 - 13. Seber, G. A. & Salehi, M. M. (2012). Adaptive sample design s: I nference for sparse and clustered populations . New York: Springer Science & Business Media. Shin, H. S. (1995). Estimating future teacher supply: Any policy implications for educational reform?. International Journal of Educational Reform, 4 (4), 422 - 433. Simon, N. S. & Johnson, S. M. (2015). Teacher turnover in high - poverty schools: What we kno w and can do. Teachers College Record , 117 (3), 1 - 36. Skinner, C. J., Holt, D., & Smith, T. M. F. (1989). Analysis of complex surveys . Chichester, England : Wiley. Smith, T. M. & Ingersoll, R. M. (2004). What are the effects of induction and mentoring on beginning teacher turnover?. American Educational Research J ournal , 41 (3), 681 - 714. Smith, W. C. (E d.) (2016). The global testing culture: Shaping educational policy, perceptions, and practice , Oxford , England : Symposium Books. Stapleton, L. M. (2008). Variance estimation using replication methods in structural equation modeling with complex sample data. Structural Equation Modeling: A Multidisciplinary Journal , 15 (2), 183 - 210. Statistics Canada (2018). General s ocial s urvey c ycle 30: Canadians at w ork and h ome . Minister of Industry. Retrieved from http://sda.chass.utoronto.ca/sdaweb/dli2/gss/gss30/gss30/more_doc/GSSC30ENgid.pdf Statistics Canada (2019). National Cannabis Survey , third quarter 2019, The Daily, October 30. Retrieved from https://www 150.statcan.gc.ca/n1/en/daily - quotidien/191030/dq191030a - eng.pdf?st=AzIUltJh 149 Tatto, M. T. (2014). Teacher Education Development Study - Mathematics (TEDS - M). In S. Lerman (Ed.) , Encyclopedia of Mathematics Education (pp. 586 - 592). Dordrecht, Netherlands: Springer. Tatto, M. T., Rodriguez, M. C., Reckase, M. D., Smith, W. M., Bankov, K., & Pippin, J. (2020). The F irst F ive Y ears of T eaching M ath ematics (FIRSTMATH): Concepts, methods and strategies for comparative i nternat ional r esearch . Dordrecht, Netherla nds : Springer Nature. Tourangeau, R., Edwards, B., Johnson, T. P., Wolter, K. M., & Bates, N. (Eds.). (2014). Hard - to - survey populations . Cambridge, England : Cambridge University Press. Thompson, S. K. (1990). Adaptive cluster sampling. Journal of the American Statistical Association , 85 (412), 1050 - 1059. Thompson, S. K. (1991 a). Adaptive cluster sampling: D esigns with primary and secondary units. Biometrics , 47 (3), 1103 - 1115. Thompson, S. K. (1991b). Stratified adaptive cluster sampling . Biometrika , 78 (2), 389 - 397. Thompson, S. K. (2002). Sampling . New York : John Wiley & Sons. Thompson, W. (Ed.). (2004). Sam pling rare or elusive species: C oncepts, designs, and techniques for estimating population parameters . Washington, DC: Island Press. Wald , A. (1945). Sequential t ests of s tatistical h ypotheses. Annals of Mathematical Statistics, 16 (2), 117 - 186 Wald, A. (1947). Sequential analysis . New York: John Wiley. Westerman, D. A. (1991). Expert and novice teacher decision making. Journal of teacher education , 42 (4), 292 - 305 . Wolter, K. (1985). Introduction to variance estimation . New York: Springer - Verlag.