INVESTIGATING IMMIGRANT STUDENT ACADEMIC ACHIEVEMENT ON PISA BY LINKING ADDITIONAL DATA ON ORIGIN CHARACTERISTICS

By

William Nicholas Bork Rodriguez

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Educational Psychology & Educational Technology - Doctor of Philosophy

2024

ABSTRACT

A secondary analysis of PISA 2018 data was conducted to investigate the reading achievement of students with an immigrant background. While prior research suggests the importance of demographic characteristics in secondary PISA analyses, a major critique is that PISA centers destination characteristics over origin ones, limiting the study of the association between origin characteristics and academic achievement. This study addressed the critique by linking PISA data with data of origin country characteristics to allow for analyses that could not be conducted with the PISA data set alone. The study linked PISA data with data of origin country characteristics, centered the origin-based characteristics in the analyses, and then evaluated the utility of linking this additional data for explaining education outcomes. Multilevel statistical modeling was used to model the association between origin country characteristics and the academic achievement of students with an immigrant background. Linked data were: (1) Language Distance: student-level measure of similarity between home and school language; (2) Human Development Index: country-level measure of human development; (3) Global Adaptation Index: country-level measure of climate vulnerability and readiness to improve resilience/climate adaptation; and (4) forced displacement ratio: country-level ratio between inward/outward forced displacement.
The principal findings were the negative association between language distance and PISA reading scores (i.e., a one-unit increase in language distance was associated with a -0.31 unit change in reading score) and that language distance afforded a closer look at the association between language and PISA reading scores when applied to four particular countries and their language pairs (Switzerland, Finland, Qatar, and Israel).

TABLE OF CONTENTS

INTRODUCTION
PHENOMENON & RATIONALE OF THE STUDY
REVIEW OF RESEARCH
THE PROPOSED STUDY
METHODS
RESULTS
DISCUSSION
CONCLUSION
REFERENCES
APPENDIX A: LIST OF PROMINENT ILSAS
APPENDIX B: BREAKDOWN OF DEMOGRAPHIC QUESTIONS
APPENDIX C: COMPLETE LIST OF VARIABLES
APPENDIX D: SOFTWARE CONSIDERED FOR STATISTICAL MODELING
APPENDIX E: LIST OF DATA FILES
APPENDIX F: FINAL LIST OF COUNTRIES USED IN STUDY
APPENDIX G: COMPLETE STEPS FOR ANALYTICAL METHOD
APPENDIX H: COMPLETE LIST OF IMMIGRATION PATHS
APPENDIX I: IMMIGRATION COUNTS AND PERCENTAGES BY COUNTRY

INTRODUCTION

The Programme for International Student Assessment (PISA) is an International Large-Scale Assessment (ILSA) that assesses the knowledge and skills of 15-year-old students and reports results in domain areas such as mathematics, reading, and science (OECD, 2019a). ILSAs are empirical studies that assess educational achievement (Rocher & Hastedt, 2020). The results inform stakeholders, such as educators, researchers, policymakers, and the general public, about the status of learner achievement and their educational system, and often guide decisions regarding the evaluation and reform of education at scale (Rocher & Hastedt, 2020).

Students with an immigrant background are one important group that participates in PISA. Prior research suggests that immigrant student achievement on PISA is typically studied using secondary analyses and that there are numerous benefits in doing so (Donnellan et al., 2011; Torney-Purta & Amadeo, 2013a). Prior research also suggests that demographic characteristics have been a typical research focus for PISA secondary analyses (Aloisi & Tymms, 2017). In recent years, PISA results have shown mixed achievement amongst immigrant students (Schleicher, 2006; OECD, 2016). In addition, PISA results have shown increasing immigration in most countries (OECD, 2019a).

While prior research suggests the importance of demographic characteristics, a major critique of this research is that PISA centers destination characteristics over origin ones in studies.
This is because PISA questionnaire items collect data from students who have since settled into their post-immigration country (i.e., the destination country). Therefore, most of the data is about students’ lived experiences within the destination country, with a much smaller number of items on characteristics related to participants’ origin country (e.g., parents’ country of birth). This limits researchers from studying how characteristics of the origin country may introduce variation into the academic achievement of students with an immigrant background.

One way to address this critique is to link PISA data with data of origin country characteristics, which enhances the analytic opportunities by allowing for analyses that could not be conducted with the PISA data set alone. A study was proposed to this end. The study was built on three pillars. The first pillar was linking PISA data with data of origin country characteristics to enhance the analytic opportunities. The second pillar was to center the origin-based characteristics in the analyses. The third pillar was to evaluate the utility of linking this additional data for explaining education outcomes. The study linked PISA data with four additional data sets, bringing origin country characteristics into the analysis. The study used multilevel statistical modeling to model the association between origin country characteristics and the academic achievement of students with an immigrant background.

PHENOMENON & RATIONALE OF THE STUDY

The Phenomenon of Study: Reading Achievement of Immigrant Students on the Programme for International Student Assessment (PISA)

This study investigated the phenomenon of reading achievement of students with an immigrant background on the Programme for International Student Assessment (PISA), an international large-scale assessment (ILSA).
Rationale for the Importance of Studying This Phenomenon

The rationale for studying this phenomenon came from a review of research, which suggested two important reasons. First, there is variation in immigrant student reading outcomes by their individual and country characteristics (OECD, 2016). Studying these characteristics can help explain and predict immigrant student achievement. Second, there is an increasing number of students with an immigrant background in most countries (OECD, 2019b). This increase suggests the growing importance of the attention and resources directed towards educating learners with an immigrant background.

REVIEW OF RESEARCH

The review of prior research highlighted topics relevant to this study. To start, an overview of the PISA assessment is given. The first relevant review topic was that immigrant student achievement on PISA has typically been studied using secondary analyses. Second was that demographic characteristics have been a typical research focus for PISA secondary analyses. Third was that PISA results have shown mixed achievement amongst immigrant students. Fourth was that PISA results have shown increasing immigration in most countries. Then a critique of the prior research was highlighted: while prior research suggests the importance of demographic characteristics, PISA centers destination characteristics over origin ones in studies. This limits researchers from studying how characteristics of the origin country may introduce variation into the academic achievement of students with an immigrant background. Finally, an assertion was made that one way to address the critique was to link PISA data with data of origin country characteristics, enhancing the analytic opportunities by allowing for analyses that could not be conducted with the PISA data set alone.
An Overview of the PISA Assessment

International Large-Scale Assessments (ILSAs) have a history spanning over twenty years, from the introduction of the Trends in International Mathematics and Science Study (TIMSS) in 1995 to the most recent Global Teaching Insights (GTI) study in 2020 (Rocher & Hastedt, 2020). Appendix A lists prominent ILSAs. ILSAs typically assess key knowledge and skills and report results in domain areas such as mathematics, reading, science, and more. This study focused on the ILSA reading achievement of students with an immigrant background because language-based skills are typically, though not always, more challenging for immigrant students to develop compared to more technical domains such as mathematics and science (Schnepf, 2006; Andon et al., 2014).

One of the most well-known ILSAs is the Programme for International Student Assessment (PISA). PISA was first introduced in 2000 by the Organisation for Economic Co-operation and Development (OECD). The rationale for PISA was to meet a need for internationally comparable evidence on student performance, to offer insights for education policy and practice, and to monitor trends in the acquisition of knowledge and skills (OECD, 2019a). The research questions of PISA, at a high level, ask to what extent 15-year-old students have acquired key knowledge and skills essential for full participation in social and economic life (OECD, 2019a). PISA instrumentation is a computer-based assessment of four 30-minute clusters of assessment material presented to students (OECD, 2019c). The clusters were drawn from mathematics, science, and reading domains (OECD, 2019c). Test items included both multiple-choice and constructed-response questions (OECD, 2019c). Additionally, contextual data were collected from 35-minute surveys administered separately to students, their teachers, and their school principals (OECD, 2019c).
PISA methods are complex, with affordances and constraints different from smaller-scoped assessments. Specific sample designs, weights, and plausible values are used, as is typical of other ILSAs (Bailey et al., 2023). For example, PISA uses a balanced-incomplete-block (BIB) design to assess reading, mathematics, science, and global competence (OECD, 2019c). BIB designs group test items into clusters which are presented to participants with equal frequency and position across tests (OECD, 2019c). Approximately 600,000 students participated in 79 countries/economies for PISA 2018 (OECD, 2019c). Participants were selected based on a two-stage stratified sampling procedure in which schools were sampled first and then students were sampled within the selected schools (OECD, 2019c). Participant performance was reported on a continuous scale based on item-response theory models and was also arranged into six numbered proficiency levels (i.e., 6 is highest) to help users interpret PISA scores (OECD, 2019c).

Immigrant Student Achievement Typically Studied with Secondary Analyses

The first review topic was that immigrant student achievement on PISA has typically been studied using secondary analyses. This is because PISA is designed to support secondary analyses with publicly available data and ample supporting documentation. Prior research suggests there are numerous benefits in conducting secondary analysis of immigrant student achievement on PISA (Donnellan et al., 2011; Torney-Purta & Amadeo, 2013a).

There are multiple reasons why secondary analyses of ILSAs are important. Torney-Purta & Amadeo (2013a) wrote that “...careful secondary analysis can address important research questions about human development, learning and education at both the macro and micro levels” (p. 249). In addition, quantitative secondary analysis of ILSAs can generate research questions to be answered by smaller mixed-methods studies (Torney-Purta & Amadeo, 2013a).
Torney-Purta & Amadeo (2013a) stated that secondary analyses can be especially impactful in the educational and developmental psychology field, where data are typically limited in sample size and scope. Secondary analyses give researchers the potential to make inferences at multiple levels of analysis, ranging from the macro level (e.g., understanding country-level education outcomes), down to the micro level (e.g., classroom practices), and including the levels between (Torney-Purta & Amadeo, 2013b). Secondary analyses also promote an open source approach to science by making the artifacts and processes of scientific research (e.g., data, analyses, etc.) broadly accessible to actors outside the study (Donnellan et al., 2011). This saves time and costs on data collection, aids reproducibility of findings, and promotes careful documentation of processes (Donnellan et al., 2011).

Demographic Characteristics are a Typical Research Focus for PISA Secondary Analyses

The second review topic was that demographic characteristics have been a typical research focus, as PISA surveys collect a large amount of demographic information. Survey questions typically ask students about their home environment, their school environment, and even how their parents engage with the world around them.

One example of a study making use of demographic characteristics data comes from Cattaneo and Wolter (2015), who investigated how country policy efforts impacted immigrant student outcomes. This study first controlled for demographic characteristics and then attributed the remaining variation to recent changes in migration policy in Switzerland. The rationale for the Cattaneo and Wolter (2015) study was that in Switzerland, between 2000 and 2009, improvements in socio-economic background were traditionally attributed to integration policies. However, migration policy may also have had an impact on socio-economic background (Cattaneo & Wolter, 2015).
The research question asked about the extent to which achievement improvements in 1st-generation immigrant students were attributable to changes in observable background variables due to migration policy changes. The methods used linear regression to model reading scores for PISA 2000 and PISA 2009. The authors’ key explanatory variables came from PISA and included parents’ socio-economic background characteristics such as an SES index, parents’ education, books in home, parental occupation, home language, and parent nationality (Cattaneo & Wolter, 2015). In addition to PISA survey data, the study incorporated the parental occupation index from the International Socio-Economic Index of Occupational Status (Cattaneo & Wolter, 2015). One finding of the study was that 1st-generation immigrants’ PISA scores rose by 40 points between 2000 and 2009 (Cattaneo & Wolter, 2015). The authors reported that 75% of that score increase was attributed to changes in parent background characteristics and school composition while the remaining 25% was attributed to changes in migration policy (e.g., immigration policy and law).

Other researchers have found demographic characteristics useful for studying the academic outcomes of immigrant students. A study by Aloisi and Tymms (2017) found that demographic factors are more impactful than policy and reform ones. The rationale for the Aloisi and Tymms (2017) study was to see if adjusting for factors outside the control of education policymakers could better isolate the impact of policy. The researchers asked three research questions: (1) What is the relationship between changes in the socio-economic and demographic characteristics of the PISA cohorts, and changes in country outcomes? (2) What is the relationship between changes in the curricular provision of PISA-participating countries and their outcomes?
(3) Overall, what is the relative importance of non-policy-malleable factors (student SES and demographics), when compared with policy-malleable factors (e.g., curricular changes) with respect to PISA scores? The researchers used multilevel growth models. In addition to PISA survey data between PISA 2000 and PISA 2015, external data on country reforms were incorporated (Aloisi & Tymms, 2017). The authors found that demographic characteristics of students (e.g., SES level) impacted PISA achievement more than reform policies (Aloisi & Tymms, 2017). In fact, the study reported an annual effect size of only 0.02 standard deviations for reform policies (Aloisi & Tymms, 2017). The study did not find strong evidence for the effectiveness of curricular reforms on PISA outcomes. The authors concluded that education reforms typically change the education goal or vision but the less malleable demographic factors explain more variation in PISA scores (Aloisi & Tymms, 2017).

PISA Results Suggest Mixed Achievement Amongst Immigrant Students

Another review topic was the mixed achievement amongst immigrant students’ PISA scores. One early report of immigrant learner outcomes came from a comparative review of PISA 2003 data in which Schleicher (2006) reported that not all immigrant PISA participants scored alike. The Schleicher (2006) report was commissioned to examine performance differences between students with and without an immigrant background and to identify what impacted the results, so as to provide implications for educational policy. One group of research questions in the report asked about an immigrant student performance gap, with sub-questions for economic, social, and cultural background characteristics, while the other group of questions asked about between-country differences and potential policy intervention points (Schleicher, 2006).
The analysis was conducted using multiple regression for continuous outcome variables and logistic regression for dichotomous outcome variables, and no unique instruments were used as the data were already collected for PISA 2003 (Schleicher, 2006). One finding was that immigrant students performed lower than native learners, with variation by country (Schleicher, 2006). A second finding was that there was no significant association between immigrant population size and performance outcomes (Schleicher, 2006). A third finding was that the background characteristics of immigrant students, once controlled for, only partially explained the variation in performance outcomes in some countries (Schleicher, 2006). A fourth finding was that differences between student language and the language of instruction also partially explained the variation in performance outcomes, with variation by country (Schleicher, 2006). A final finding was that countries with well-established language support programs featuring clear goals and standards tended to have immigrant learners who performed more similarly to native learners (Schleicher, 2006).

In more recent times, the official PISA 2015 results have also reported variation in immigrant student academic outcomes and demographic characteristics (OECD, 2016). One finding was the 1.4% increase in overall educated parents and student SES between PISA 2009 and PISA 2015 (OECD, 2016). Overall, 57.3% of first-generation immigrant students had a parent with an education level equal to that of the average non-immigrant parent (OECD, 2016). Countries with a 10+% increase were: Belgium, Croatia, Denmark, Ireland, and Luxembourg. Relatedly, some immigrant students had a similar or higher economic, social, and cultural status (ESCS) than their non-immigrant peers within: Estonia, Ireland, Latvia, Malta, Montenegro, Singapore, and the United Arab Emirates (OECD, 2016). A second finding was increased differences between home and school languages (OECD, 2016).
There was a 10+% increased difference in: Belgium, Germany, Greece, Ireland, Qatar, and Slovenia (OECD, 2016). Among immigrant students, those who did not speak the PISA test language at home scored 54 points lower than non-immigrant students, while those who did speak the PISA test language at home scored relatively better at only 31 points lower (OECD, 2016). This trend was true for all subjects but less pronounced in mathematics, with only a 15-point deficit (OECD, 2016). The countries with the highest language penalty were: Hong Kong, Luxembourg, Austria, Belgium, Jordan, Macao, Russia, and Switzerland (OECD, 2016).

A third finding was that immigrant learners scored 31 points lower in science compared to non-immigrant learners, after controlling for SES (OECD, 2016). The largest deficits (i.e., 40 to 55 points) were in: Austria, Belgium, Denmark, Germany, Slovenia, Sweden, and Switzerland (OECD, 2016). In other countries, immigrant and non-immigrant students scored similarly, with both groups above the OECD average: Australia, Canada, Estonia, Hong Kong, Ireland, and New Zealand (OECD, 2016). Furthermore, in some countries, immigrant learners outperformed non-immigrants, with the largest differences (i.e., 22 to 80 points) found in: Macao, Qatar, and the United Arab Emirates (OECD, 2016).

A fourth finding was that SES disadvantage only partially accounted for immigrant learner outcomes (OECD, 2016). In 22 of the 33 countries with at least a 6.25% immigrant student population, there were significant differences in science performance between immigrant and non-immigrant learners, after controlling for socio-economic status (OECD, 2016). In only 5 countries did the immigrant background effect disappear after controlling for socio-economic status: Costa Rica, Hong Kong, Israel, Singapore, and the United States. Similar results were reported for mathematics and reading (OECD, 2016).
A fifth finding was that 1st-generation immigrant learners performed better in culturally similar destinations compared to their peers in culturally different ones, after controlling for SES (OECD, 2016). For example, 1st-generation mainland Chinese students who moved to culturally similar locations (e.g., Hong Kong or Macao) scored better than those who moved to culturally different ones (e.g., Australia or New Zealand) (OECD, 2016). However, the cultural familiarity effect was not found for 2nd-generation immigrants (OECD, 2016). For example, 2nd-generation mainland Chinese students scored better in culturally different destinations (e.g., Australia or New Zealand) than 2nd-generation mainland Chinese students in culturally similar ones (e.g., Hong Kong or Macao) (OECD, 2016).

PISA Results Suggest Increasing Immigration in Most Countries

Another review topic was that PISA has detected an increase in the number of students with an immigrant background over time (OECD, 2019b). One finding from PISA 2018 was that the OECD country average share of students with an immigrant background had increased from 10% to 13% between PISA 2009 and PISA 2018 (OECD, 2019b). The largest increases were seen in: Canada, Ireland, Luxembourg, Malta, Norway, Qatar, Singapore, Sweden, Switzerland, and the United Kingdom (OECD, 2019b).

A second finding was an increase in socio-economically disadvantaged students as measured by the bottom quartile of PISA’s economic, social, and cultural status (ESCS) index (OECD, 2019b). At least 45% of immigrant students were disadvantaged in: Austria, Denmark, Finland, France, Germany, Greece, Iceland, the Netherlands, Norway, Slovenia, and Sweden (OECD, 2019b). Conversely, some other countries have gained immigrant students whose socio-economic status is higher than that of their own non-immigrant population (OECD, 2019b).
This suggested the migration of highly skilled workers to places such as: Brunei Darussalam, Panama, Qatar, Saudi Arabia, Singapore, and the United Arab Emirates (OECD, 2019b).

A third finding was that home and school language differences were common, with 48% of immigrant students not speaking the PISA assessment language at home (OECD, 2019b). This share was over 70% in: Austria, Brunei Darussalam, Finland, Iceland, Lebanon, Luxembourg, and Slovenia (OECD, 2019b). Conversely, this share was below 10% in: Costa Rica, Croatia, Jordan, and Kazakhstan (OECD, 2019b). Speaking the assessment language at home was associated with increased performance in: Brunei Darussalam, Germany, Luxembourg, Macao, Malta, and Switzerland (OECD, 2019b).

A fourth finding was that there were distinct effects of immigration status that were not explained by SES (OECD, 2019b). Generally, non-immigrant students outperform first- and second-generation immigrant students, when controlling for students’ and schools’ SES (OECD, 2019b). Furthermore, immigrant students with the same origin country and SES could still perform differently within different destination countries (OECD, 2019b). For example, immigrant students had a 30+ point reading deficit in: Austria, Denmark, Estonia, Finland, Iceland, Lebanon, Norway, and Sweden (OECD, 2019b). Conversely, some immigrant students performed better than non-immigrant peers in: Australia, Brunei Darussalam, Hong Kong, Jordan, Macao, Qatar, Saudi Arabia, the United Arab Emirates, and the United States (OECD, 2019b).

The Major Critique of Prior Research: Secondary PISA Research on Immigrant Students Centers Destination Characteristics over Origin Ones

While prior research suggests the importance of demographic characteristics, a critique of this research is that PISA centers destination characteristics over origin ones in studies.
This is because PISA questionnaire items collect data from students who have since settled into their post-immigration country (i.e., the destination country). Therefore, most of the data is about students’ lived experiences within the destination country, with a much smaller number of items on characteristics related to participants’ origin country (e.g., parents’ country of birth). This is a reasonable design choice given the goals of PISA. However, this means that PISA can provide little data about the life of students and their parents prior to immigration, which in turn limits researchers from studying how characteristics of the origin country may introduce variation into the academic achievement of students with an immigrant background.

Table 1 shows a breakdown of demographic question counts by origin country or destination country for the PISA 2018 survey. A third column is for demographic questions that overlap both origin and destination questions. See Appendix B for a more detailed breakdown of the count by specific questionnaire (i.e., student, teacher, or school questionnaires).

Table 1
Count of Demographic Questions on PISA 2018

Origin Demographics    Destination Demographics    Both
1 question             15 questions                7 questions

Addressing a Critique by Linking PISA with Data of Origin Country Characteristics

While destination country data is useful, incorporating more origin country characteristics may provide a more complete picture of the immigrant student experience. Some researchers have linked ILSA data with additional data sets to enhance the analytic opportunities. This allows for analysis that could not be conducted with a single ILSA data set alone.

Affordances of Data Linking: Linking ILSAs with Additional Data Enhances Analytic Opportunities

Data linking studies have been an effective approach for studying immigrant outcomes in a number of social sciences.
One such example is linking data sets to study health outcomes of people with an immigrant background. A study by Giuntella et al. (2018) linked immigration circumstances with health outcomes. The rationale for the study was to investigate the relationship between reasons for immigration and health outcomes in the United Kingdom. The research questions asked about health outcomes for four different immigration reasons: employment, family, study, and asylum. The study used linear regression models, accounting for age, education, gender, ethnicity, residence, and year. Data came from the UK quarterly Labour Force Survey. A main finding was that people who immigrated for employment, study, and family reported better health outcomes than native-born individuals (Giuntella et al., 2018). In addition, asylum seekers reported worse health outcomes (Giuntella et al., 2018).

Another example is linking data sets to study economic outcomes of people with an immigrant background. A study by Maskileyson et al. (2021) linked immigration circumstances with economic outcomes. The rationale for the study was to investigate the economic advantages of immigrants to Switzerland by their immigration circumstance (i.e., economic, political, family reunion, or educational pursuit). The research question asked if economic immigrants to Switzerland had higher income attainment compared to immigrants arriving for other reasons. The study used regression analysis of an existing data set. Data came from the 2007 Swiss Health Survey. One major finding was that immigration motive was a significant factor in income attainment and that the results varied by gender (Maskileyson et al., 2021). Overall, economic reasons for immigration resulted in the highest income levels, followed by educational reasons (Maskileyson et al., 2021). In addition, women tended to attain lower incomes than men (Maskileyson et al., 2021).
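The linking step that such studies share, attaching contextual records to individual records through a common key, can be sketched in a few lines of code. The Python sketch below is illustrative only and is not the procedure used in any study cited here: the function name `link_origin_data`, the field names (`student_id`, `origin_country`, `reading_score`, `development_index`), the country codes, and all values are invented placeholders rather than PISA variables or real indicator values.

```python
# Illustrative sketch only: every field name, country code, and value below
# is an invented placeholder, not a PISA variable or a real indicator value.

def link_origin_data(students, origin_table, key="origin_country"):
    """Left-join country-level origin attributes onto student-level records.

    Students whose origin country is missing from origin_table are kept,
    with None for each origin attribute, so unmatched cases stay visible.
    """
    # Collect every attribute name that appears in the country-level table.
    attribute_names = sorted({name for row in origin_table.values() for name in row})
    linked = []
    for student in students:
        origin_row = origin_table.get(student.get(key), {})
        merged = dict(student)  # copy so the input records are not mutated
        for name in attribute_names:
            merged[name] = origin_row.get(name)
        linked.append(merged)
    return linked


# Hypothetical student-level records (one dict per student).
students = [
    {"student_id": 1, "origin_country": "AAA", "reading_score": 480.0},
    {"student_id": 2, "origin_country": "BBB", "reading_score": 512.0},
    {"student_id": 3, "origin_country": "CCC", "reading_score": 455.0},
]

# Hypothetical country-level table keyed by origin country code.
origin_table = {
    "AAA": {"development_index": 0.70},
    "BBB": {"development_index": 0.85},
}

linked = link_origin_data(students, origin_table)
for row in linked:
    print(row)
```

Keeping unmatched students in the output with empty origin attributes, rather than silently dropping them, is one possible design choice; it makes the coverage of the linked data easy to inspect before any modeling.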
A Minor Critique of Prior Data Linking Studies: Lagging Adoption of Studies Linking Origin Data with Education Outcomes

Despite the affordances that data linking can provide for enhancing an analysis, a minor critique of prior data linking studies concerns the lagging adoption of studies linking origin data with education outcomes. Instead, this type of research is more common in other social science fields, such as in studying health or economic outcomes of people with an immigrant background. The lagging adoption in education means there is less prior research to rely upon when planning related studies on education outcomes. This leaves a research gap that, if filled, helps to evaluate the utility of linking additional origin country data for explaining the academic achievement of immigrant students on PISA (see Figure 1).

Figure 1
Research with Linked Data

Fortunately, despite the research gap, there is a blueprint to follow for linking ILSA data, such as PISA, with additional data sources to enhance the analysis. Specifically, other researchers have provided a set of rationales and a framework for doing these types of studies.

There Are Three Rationales for Linking Outside Data Sets

While ILSAs are designed to be stand-alone studies, some researchers have linked additional data sources, from outside the ILSA, to enhance the analytic opportunities of the study. An analysis of PISA data in conjunction with non-PISA data may better explain the association between origin country characteristics and the achievement of immigrant learners. There is already precedent for doing these types of data linking studies. Strietholt and Scherer (2016) provided three rationales.

One rationale is to research changes over time. For example, Nilsen and Gustafsson (2014) linked data from two consecutive administrations of the same ILSA. The rationale for the study was to illustrate how different cycles of the same ILSA can be combined at the test item level.
The research question asked if changes in Norwegian schools’ emphasis on academic skills caused an increase in science performance. The methods combined TIMSS 2007 and TIMSS 2011 (i.e., student and teacher/classroom data) on anchor items. The analysis was conducted using multilevel structural equation models. The study instruments were two prior implementations of TIMSS. One finding was that schools’ emphasis on academic success was associated with science performance (Nilsen & Gustafsson, 2014).

A second rationale is to extend outcome measures. For example, Martin et al. (2013) linked two different ILSAs to be able to say more about outcomes than a single ILSA could say. The rationale for the study was that no single ILSA assesses reading, math, and science at once. The research questions asked about the effects of home environment on students’ achievement. The study’s methods linked TIMSS 2011 data (math and science) with PIRLS 2011 data (reading). Additionally, context data for home, school, and classroom contexts were combined for both ILSAs. The data were linked using unique identification numbers of students who took both ILSAs in 2011. The analysis used multilevel regression. The instruments were one TIMSS and one PIRLS implementation. One finding was that the results were stable for the three combined academic domains of reading, math, and science, with country-level variation in how school characteristics were related to achievement (Martin et al., 2013).

A third rationale is to supplement outcome measures with context information. For example, Ruhose and Schwerdt (2016) linked ILSA data with information about academic tracking. The rationale for the study was that the researchers wanted to study the impact of early tracking of students (i.e., ability grouping) using UNESCO data on school structure by country. The researchers asked research questions about the effect of tracking on migrant and non-migrant students in various countries.
The methods linked achievement data from PIRLS, PISA, and TIMSS with supplemental data from the UNESCO International Bureau of Education on the country-specific age at which tracking begins. The instruments of the study were the three ILSAs. The analyses used a difference-in-differences strategy. The authors’ major finding was that early tracking did not impact achievement gaps between migrant and non-migrant learners (Ruhose & Schwerdt, 2016).

There Is a Framework to Guide the Linking of Data Sets

In addition to the rationales for data linking, Strietholt and Scherer (2016) also provided suggestions for evaluating data attachment points, based on a framework by Bray and Thomas (1995) for comparative and multilevel analyses in educational studies (see Figure 2). The framework consists of three dimensions. The first dimension is geographic/locational (Bray & Thomas, 1995). There are seven levels in this dimension: world regions/continents, countries, states/provinces, districts, schools, classrooms, and individuals. The second dimension is nonlocational demographic groupings (Bray & Thomas, 1995). Some groups in this dimension include ethnicity, religion, age, and gender. The third dimension concerns aspects of education and society (Bray & Thomas, 1995). Some of the aspects in this dimension include curriculum, teaching methods, finance, management structures, political change, and labor markets.

Figure 2
Attachment Points for Linking Data Sets

One illustrative example of the Bray and Thomas (1995) framework is seen in Arikan et al. (2017), who linked data at three different attachment points: (1) geographic/locational; (2) nonlocational demographic grouping; and (3) education and society. Arikan et al. (2017) linked external data with ILSA data as a way to measure the context of immigrant learners. The rationale for the Arikan et al.
(2017) study was the observed variation in the ILSA performance of Turkish immigrant students (i.e., nonlocational demographic grouping) by their destination countries (i.e., geographic/locational). The authors attempted to understand Turkish immigrant performance as a function of both individual- and country-level characteristics across multiple countries. The research question asked about the achievement differences across countries and how they related to national integration policies (i.e., education and society). The study’s method was multilevel regression analysis of factors predicting reading and mathematics performance. Predictor variables included both individual-level variables (e.g., immigration status, social status) and country-level variables. The country-level variables (e.g., longevity, health, knowledge, standard of living) were incorporated from the Human Development Index (HDI). Additional country-level data incorporated from the Migrant Integration Policy Index (MIPEX) provided measures of anti-discrimination and general integration of migrant learners. The study instruments were PISA 2009 data with linked country-level external data. One finding was that, at the individual level, immigrant learners performed lower than their mainstream peers in reading and math, even when controlling for an index of economic, social, and cultural status (Arikan et al., 2017). Overall, students higher in economic, social, and cultural status performed higher as well (Arikan et al., 2017). Another finding was that, at the country level, destination countries with higher Human Development Index scores were associated with higher reading performance, though not math (Arikan et al., 2017).

Turning specifically to PISA, the data provide some attachment points for researchers to link additional data. For instance, the questionnaire includes questions about students’ and parents’ birth country.
This provides a point of attachment for linking at the country level. Another example is the student's age of immigration. This provides a time frame in which relevant origin country data should be linked.

Summary of the Research Review & Critique Points

The review of prior research highlighted topics relevant to this study. To start, an overview of the PISA assessment was given. Then the first relevant review topic highlighted that immigrant student achievement on PISA has typically been studied using secondary analyses and that there are numerous benefits in doing so (Donnellan et al., 2011; Torney-Purta & Amadeo, 2013a). Second was that demographic characteristics have been a typical research focus for PISA secondary analyses. Demographic characteristics of students (e.g., SES level) impacted PISA achievement more than reform policies (Aloisi & Tymms, 2017). While education reforms typically change the education goal or vision, it is the less malleable demographic factors that explain more variation in PISA scores (Aloisi & Tymms, 2017). Third was that PISA results have shown mixed achievement amongst immigrant students. PISA results suggest that not all immigrant PISA participants scored alike (Schleicher, 2006). PISA reported variation in immigrant student academic outcomes and demographic characteristics (OECD, 2016). Fourth was the increasing immigration in most countries. PISA has detected an increase in the number of students with an immigrant background over time (OECD, 2019a).

Then the major critique of the prior research was highlighted: while prior research suggests the importance of demographic characteristics, PISA centers destination characteristics over origin ones. This is because PISA questionnaire items collect data from students who have since settled into their post-immigration country (i.e., the destination country).
Therefore, most of the data is about students’ lived experiences within the destination country, with a much smaller number of items on characteristics related to participants’ origin country (e.g., parents’ country of birth). This limits researchers from studying how characteristics of the origin country may introduce variation into the academic achievement of students with an immigrant background.

Finally, the study asserts that one way to address this critique is to link PISA data with data of origin country characteristics, enhancing the analytic opportunities by allowing for analyses that could not be conducted with the PISA data set alone. A minor critique of prior data linking studies was also stated: despite the affordances that data linking can provide for enhancing an analysis, there is less prior research to rely upon when planning related studies on education outcomes. This leaves a research gap that, if filled, would help to evaluate the utility of linking additional origin country data for explaining the academic achievement of immigrant students on PISA. Fortunately, despite the research gap, there is a blueprint to follow for linking ILSA data, such as PISA, with additional data sources to enhance the analysis. Specifically, other researchers have provided a set of rationales and a framework for doing these types of studies. Strietholt and Scherer (2016) provided three rationales: researching changes over time (Nilsen & Gustafsson, 2014), extending outcome measures (Martin et al., 2013), and supplementing outcome measures with context information (Ruhose & Schwerdt, 2016). The framework suggests three points of attachment: geographic/locational, nonlocational demographic groupings, and aspects of education and society (Bray & Thomas, 1995; Strietholt & Scherer, 2016).
THE PROPOSED STUDY

The Proposed Study: A Multilevel Statistical Analysis of the Association Between Origin Characteristics and the Academic Achievement of Students with an Immigrant Background

The review of research highlighted a major critique and made an assertion on how to address this critique. These aspects of the review of literature served as the foundation for the proposed research study. The study was built on three pillars (see Figure 3). The first pillar was linking PISA data with data of origin country characteristics to enhance the analytic opportunities, allowing for analyses that could not be conducted with the PISA data set alone. The second pillar was to center the origin-based characteristics in the analyses. The third pillar was to evaluate the utility of linking this additional data for explaining education outcomes. The study linked PISA data with four additional data sets, bringing origin country characteristics into the analysis. The study used multilevel statistical modeling to model the association between origin country characteristics and the academic achievement of students with an immigrant background.

Figure 3
Pillars of the Proposed Study

Research Questions Were Aligned to Pillars of the Study

This study posed two research questions, which were aligned to the pillars of the proposed study:

1. Which specific origin country characteristics from the linked data sets have statistical significance for interpreting immigrant students’ PISA reading achievement?
2. How much additional variation in immigrant students’ PISA reading achievement is explained by the linked data sets?

Research question #1 was aligned to the first two pillars. These pillars were about bringing in additional data related to origin-based characteristics and making them the focus of the statistical analysis. The answers to research question #1 indicated whether the selected data sets enhance the analysis.
Research question #2 was aligned with the third pillar. This pillar was about evaluating the assertion made in this study. The answers to research question #2 indicated whether the additional analyses afforded by the linked data sets are worth the effort of linking them in the first place.

The Primary Data Are Sourced from PISA 2018; The Analytic Sample Was Selected by Immigration Status

PISA 2018 was the primary data source. The data set, as it comes from PISA, includes 546,932 students from 77 countries or territories. The participants of this study were selected into the analytic sample by their immigration status. Immigrant students numbered 14,246 participants (2.6% of all students). The PISA assessment categorizes participants by their immigration status. Participants are categorized as immigrants when their mother and father were both born in a country different from the country where the PISA assessment was taken (OECD, 2016). Participants are categorized as non-immigrant when either their mother or father was born in the same country where the PISA assessment was taken (OECD, 2016). While PISA does differentiate between first-generation and second-generation immigrants, this study focused on just first-generation immigrant students. There are two reasons for this. One reason was that the PISA 2018 sample did not contain substantial numbers of second-generation immigrant students. The second reason was that working with second-generation students would add complexity to the study. This would be due to uncertainty around the non-immigrant side of the family’s familiarity with the destination country. This would confound the analysis in a way that is not present when working with just first-generation immigrant students. In the end, the analytic sample was further reduced due to missing language data, resulting in 9,493 students with an immigrant background, 3,514 schools, 42 destination countries, and 74 origin countries.
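The categorization rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the study: the `Student` record and its field names are hypothetical, the first-/second-generation split follows the standard PISA distinction based on the student's own birth country (which this section does not spell out), and missing birth-country responses are not handled.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical field names; the actual PISA variable names differ.
    test_country: str
    student_birth_country: str
    mother_birth_country: str
    father_birth_country: str


def immigration_status(s: Student) -> str:
    """Classify a student using the rule described in the text."""
    both_parents_abroad = (
        s.mother_birth_country != s.test_country
        and s.father_birth_country != s.test_country
    )
    # If either parent was born in the test country, the student is non-immigrant.
    if not both_parents_abroad:
        return "non-immigrant"
    # Assumption: first generation means the student was also born abroad.
    if s.student_birth_country != s.test_country:
        return "first-generation"
    return "second-generation"


def select_analytic_sample(students):
    """Keep only first-generation immigrant students."""
    return [s for s in students if immigration_status(s) == "first-generation"]
```

A real selection step would also need the later filter on missing language data, which is what reduced the sample to its final size.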
The Linked Data Were Sourced from Four Publicly Available Data Sets Using Four Criteria

The selection of data sources was based on four criteria. First, the data set must provide characteristics related to immigrant students’ origin country. The reason for this is that the study focuses on characteristics that were imparted by the origin country. Second, the data set must have data for the years in which a student has been alive, plus some years prior. The reason for this is to ensure there is data available for the relevant years leading up to the point of immigration and beyond. Third, the data set must be openly published. One reason for this was to make it easier to evaluate the available data sets freely without having to request access from a data broker. An additional reason is to allow future researchers to reproduce and extend this study without needing to obtain data permissions. Fourth, the data set must provide information that is not already measured by the ILSA. The reason for this was the study’s focus on bringing in new data not already available in PISA. These search criteria yielded four data sets, which are listed in Table 2. Further details on these data sets are explained in the paragraphs below. Each data set contributed a variable of interest which was used in the statistical modeling. The Procedures section of this research report provides details on the work done with each individual data set.

Table 2
Data Sources for Immigration Circumstance

Name | Source | Description | Contribution
Language Distance | Automated Similarity Judgment Program (ASJP) via the Max Planck Institute for the Science of Human History | Matrix measure of the degree of linguistic similarity between languages in origin and destination countries. | Student-level language characteristics imparted by origin/destination countries
Human Development Index | United Nations Development Program (UNDP) | Composite measure of country-level human development based on three dimensions: health, education, and standard of living. | Origin country-level development characteristics
Global Adaptation Index | University of Notre Dame (ND-GAIN) | Index measure of country-level climate vulnerability and readiness to improve resilience/climate adaptation. | Origin country-level climate characteristics
Refugee Population Statistics Database | United Nations High Commissioner for Refugees (UNHCR) | Count measures of the population of forcibly displaced persons in origin and destination countries. | Origin/Destination country-level forced displacement

Language Distance (LD)

The academic achievement of immigrant learners may be a function of characteristics of their origin country, their destination country, and how the two interact. An illustrative example of this dynamic is language use, and one way to use language analytically is with language distance. Language distance is a measure of how different one language is from another language (Chiswick & Miller, 2005; Gamallo et al., 2017). A few researchers have created and validated their own quantitative measures of language distance (Chiswick & Miller, 2005; Gamallo et al., 2017; Bakker et al., 2009) or developed tools based on that research (Wichmann et al., 2022). Language distance can help explain some of the variation in immigrant learners' educational experience. An early hypothesis was that the smaller the distance between origin and destination country languages, the faster the destination language is acquired and the higher the attainment level reached (Corder, 1981).
Seminal research by Heath and Heath (1983) provided evidence that learners who grow up outside the dominant culture do not learn the dominant culture’s ways of doing discourse, which has negative effects on participation within education spaces. Furthermore, language distance can also influence immigration destination choices. For example, a study by Chiswick and Miller (1994) found immigrants to Canada were more likely to live in Quebec, an area where the Romance language French is used, if they were arriving from a country that also used a Romance language. Additionally, those Romance language immigrants were also more likely to become French speakers after arrival (Chiswick & Miller, 1994).

Prior Research with Language Distance. Some researchers have used language distance to help explain the immigrant experience in a few ways. Beenstock et al. (2001) studied the Hebrew language proficiency of immigrants to Israel. The rationale for the study was to separately assess the effects of origin country and origin language on destination language proficiency of immigrants. The research question asked whether immigrant language proficiency is an effect of country characteristics or origin language characteristics. The study’s methods used an ordered logistic regression analysis. Data on Hebrew language fluency and literacy were linked from two data sources: a census data set and an immigrant absorption survey (Beenstock et al., 2001). One finding was that greater linguistic distance was associated with greater difficulty of learning the destination language (Beenstock et al., 2001). Arabic speakers had the smallest linguistic distance to Hebrew and thus Arabic-speaking immigrants were the most proficient with Hebrew (Beenstock et al., 2001). Additionally, immigrants who originated from multilingual countries exhibited higher Hebrew proficiency as well (Beenstock et al., 2001).
More recently, other researchers have used measures of linguistic distance to study student academic achievement. Figueiredo et al. (2016) stated that heritage language speakers often struggled in European classrooms. The research questions asked about the effect of age and home language on the development of verbal reasoning and vocabulary tasks in second language learners. The methods started with a sample of 106 Portuguese language learners. Regression analysis was conducted. Data were collected using tests of reading, writing, and comprehension skills. Results showed home language was a significant predictor of variation in learners’ outcomes, with linguistic distance explaining the relationships (Figueiredo et al., 2016). Regarding linguistic distance from the target language of Portuguese, learners coming from Indo-Aryan languages struggled the most while those coming from Romance languages struggled less (Figueiredo et al., 2016).

A study by Jain (2017) used language distance to investigate how the 1956 reorganization of South Indian states along strict linguistic lines affected academic achievement. The study’s rationale was that matching between students’ native language and school language may have influenced long-term academic achievement outcomes. The study asked two research questions: first, whether a language mismatch benefits or hinders long-term education achievement; second, whether alignment between home and school language leads to catch-up achievement. The study used linear regression analysis. Data were sourced from decennial Indian census records from 1951 through 1991. An additional data set was sourced from a different study to provide data on population characteristics. One finding was that mismatched home and school language was associated with lower achievement (Jain, 2017).
A second finding was that language-based reorganization of state boundaries may have remediated achievement gaps, with the greatest impact in areas with a longer history of language mismatch (Jain, 2017).

Why is Language Distance Important? Language distance is important because a smaller language distance can have positive results for learners, as they can more quickly and deeply access the educational benefits of the destination country. A study by Ammermueller (2007) provided evidence that a match between home and school language is associated with higher achievement. The rationale of the Ammermueller (2007) study was to follow up on the German PISA 2000 results that reported large differences between immigrant and non-immigrant students. The research question asked: Why did immigrant and non-immigrant students achieve so differently on PISA 2000? The study’s methods used linear regression and a Juhn-Murphy-Pierce decomposition method. In addition to PISA 2000 survey data, the study incorporated additional data from a Germany-specific PISA extension study. One finding was that parents' choice to use the destination language as the home language explained higher achievement (Ammermueller, 2007). Additionally, achievement could also be partially explained by enrolling earlier in German schools, thus gaining additional familiarity with the destination country’s schooling culture (Ammermueller, 2007).

When a language match between home and school languages cannot be achieved, there is evidence that a partial language match also provides benefits. One study that investigated this was Azzolini et al. (2012). The rationale for the Azzolini et al. (2012) study was to investigate if immigrant student achievement was similar in both the traditional and recently popular immigration destinations. The researchers’ questions asked: What is the variation in math and reading achievement by immigrant status in Italy and Spain?
How much of the variation is accounted for by family background? The study’s methods were hierarchical linear modeling at the student and school levels, fitted with predictor variables for family socioeconomic background, home language, and school characteristics. The Azzolini et al. (2012) analysis did not implement unique instruments as it relied on PISA 2009 survey data. One finding was that students with a home language different from the test language scored lower (Azzolini et al., 2012). However, the gap was smaller in Spain when the immigrant student spoke Latin American Spanish, compared to Italy where a language match was less common (Azzolini et al., 2012). These trends were similar in both traditional and newer immigration destinations (Azzolini et al., 2012). Another finding was that the variation in achievement was partially attributed to parent occupation and economic integration (Azzolini et al., 2012). Language distance is important for the present study because it informs how the origin country language may influence pre-immigration decisions and post-immigration communication.

Language Distance is Linked to PISA at the Student Level. A measure of language distance comes from the Automated Similarity Judgment Program (ASJP), created by researchers from the Max Planck Institute for Evolutionary Anthropology. ASJP is used to calculate a lexical distance between languages using the Levenshtein distance algorithm (Levenshtein, 1966) and a database of Swadesh word lists (Swadesh, 1955). The latest versions of software and tools (Wichmann et al., 2022) were developed based on research from Bakker et al. (2009). ASJP produces matrices of language distance between pairs of languages (Wichmann et al., 2022). Language distances are provided in decimal form, with lower values indicating less distance between languages and larger values indicating greater language distance (Wichmann et al., 2022).
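To make the idea of a Levenshtein-based lexical distance concrete, the following Python sketch computes an average length-normalized edit distance over paired word lists. It is a simplified illustration under stated assumptions and not the ASJP implementation: ASJP works on phonetic transcriptions of Swadesh-list items and applies additional normalization that is omitted here, and the function names and toy word pairs are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer word (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(word_pairs) -> float:
    """Mean normalized distance over same-meaning word pairs, as a percentage.
    Assumption: a plain average; ASJP's published measure normalizes further."""
    scores = [normalized_distance(a, b) for a, b in word_pairs]
    return 100 * sum(scores) / len(scores)


# Toy example with ordinary spellings rather than ASJP transcriptions.
toy_pairs = [("hand", "hand"), ("water", "wasser"), ("night", "nacht")]
toy_distance = language_distance(toy_pairs)
```

As in the ASJP matrices, a lower value from this sketch means the two word lists, and by extension the two languages, are more similar.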
For instance, the language distance between German and English is 67.00 while the distance between German and Spanish is 92.98, indicating that German is more similar to English than to Spanish (Wichmann et al., 2022). These values can be used to indicate the similarity or dissimilarity between an immigrant student’s origin country language and their destination country language. These data are linked at the student level, producing a language distance value for each participant in the PISA data. It is important to note that the language distance measure differs from the other measures used in this study in one key aspect. Language distance is a student-level measure while the other measures are country-level measures. Language distance represents language characteristics at the student level, which were imparted by the origin and destination countries. In other words, it reflects characteristics of the student’s origin country carried over into the destination country. For example, the language a student uses in the home is associated with their country of origin. Therefore, a student’s language distance values are influenced by the origin country.

Human Development Index (HDI)

The Human Development Index (HDI) is a composite measure of country-level human development. It is based on three dimensions: health, education, and standard of living. Health is assessed by life expectancy (UNDP, 2023). Education is measured by years of schooling (UNDP, 2023). Standard of living is measured by the logarithm of gross national income per capita (UNDP, 2023). The aim of HDI is to provide a measure of how people and their capabilities, not just economics, can be used to assess the development of a country (UNDP, 2023). HDI was developed with a new concept of what human development should mean: the process of enlarging the set of choices available to people (Klugman et al., 2011).
HDI was developed by a Pakistani economist named Mahbub ul Haq and collaborating scholars, including Amartya Sen, who wanted a different way of measuring development than the then-typical per capita income measure (Klugman et al., 2011). HDI is a relatively simple and transparent measure, which has facilitated its wide adoption since its inception in 1990 (Klugman et al., 2011). Since then, HDI has often been used by governments for informing policy or resource allocation decisions (Dervis & Klugman, 2011). However, HDI is not without criticisms. One critique is that HDI should broaden to include constructs such as gender equity (Dervis & Klugman, 2011; Sharma, 2010). A different critique is that some countries have large distribution gaps in their indicator levels (e.g., life expectancy) between subgroups (Ghislandi et al., 2019). Another critique is that HDI indicators do not allow much discrimination between the countries at the highest and lowest ends of the distributions (Kovacevic, 2010).

Prior Research with HDI. Within research, HDI is typically used as a country-level variable in studies where development level is suspected to be an influential factor in outcome measures. For instance, a study by Vargas-Montoya et al. (2023) investigated the relationship between the use of information and communications technologies (ICT) and student learning outcomes. The Vargas-Montoya et al. (2023) definition of ICT included the use of computers to search or practice skills. The rationale for the study was the inconclusive evidence on whether the use of ICT was associated with positive learning outcomes. The researchers asked questions about whether the country context was relevant to the association. The study’s methods utilized hierarchical linear modeling with HDI as one measure of country context to measure whether, and how, development levels impacted the relationship. The instruments included PISA 2018 data and HDI data.
One result was that this relationship did indeed differ depending on whether students were in a country deemed developed or developing (Vargas-Montoya et al., 2023). The authors concluded that country context is an important consideration because results can vary substantially by countries' categorical development level (Vargas-Montoya et al., 2023).

Why is HDI Important? An important contribution of HDI was that it decoupled human development from strictly economic measures, such as income and Gross Domestic Product (Dervis & Klugman, 2011; Klugman et al., 2011). HDI is an important index because of its wide acceptance, having gained an audience with political leaders, scholars, and activists (Dervis & Klugman, 2011). This means that studies which use the HDI can readily integrate into the broader conversation around country-level human development (Dervis & Klugman, 2011; Klugman et al., 2011). HDI is important for the present study because it provides a measure of country-level context related to indicators that may influence immigration decisions (i.e., health, education, and standard of living).

HDI is Linked to PISA Data at the Country Level. These data are linked at the country level, producing an HDI value for each participant in the PISA data.

Global Adaptation Index (GAIN)

The Notre Dame-Global Adaptation Initiative annually provides the free and open source Notre Dame-Global Adaptation Index (GAIN). The GAIN index includes 45 indicators across 181 countries to report a combined country-level measure of climate vulnerability and readiness to improve resilience/climate adaptation (Chen et al., 2015). The aim of GAIN is to help governments, businesses, and communities use the data to prioritize investments for climate adaptation and lowering the risk of climate vulnerability (Chen et al., 2015).

Prior Research with GAIN.
GAIN has been used in a variety of quantitative studies to investigate how climate change impacts a range of outcomes. Some examples of scholarly work that used GAIN include work on climate adaptation actions in agriculture (Nowak & Rosentock, 2020) and climate-induced vulnerability in the water sector (Nyiwul, 2023). A study by Chen et al. (2018) investigated the equity and efficiency of allocation of resources for climate change adaptation. A study by Werrell et al. (2015) investigated how climate conditions contributed to societal instability in Syria and Egypt in the lead-up to the 2011 Arab Spring, a series of anti-government protests, uprisings, and rebellions.

Why is GAIN Important? The GAIN index is important because it provides not only an assessment of country-level vulnerability to climate change but also information on the readiness for making adaptations to mitigate risk (Chen et al., 2015). This makes the GAIN index both present-focused and future-focused (Chen et al., 2015). Present-day vulnerabilities to climate change are reflected in the vulnerability indicators (e.g., food, water, health, ecosystem service, human habitat, and infrastructure) (Chen et al., 2015). Future ability to endure climate change is reflected in the readiness indicators (e.g., economic readiness, governance readiness, and social readiness) (Chen et al., 2015). This is important since the timetable of climate change has not progressed linearly (Chen et al., 2015). GAIN is important to the present study because it reflects factors that may guide decisions to immigrate. Specifically, when residents of a country look around and see indicators of the country’s readiness or lack of readiness for future climate change, this may influence decisions to immigrate before the full effects of climate change have been realized.

GAIN is Linked to PISA Data at the Country Level.
These data are linked at the country-level, producing a GAIN value for each participant in the PISA data.

Forced Displacement (FD): Refugee Population Statistics Database

Data on forced displacement is typically used to study human migration flows, specifically of vulnerable groups such as displaced persons, asylum seekers, refugees, and stateless people (UNHCR, 2023). While there are numerous data sets of forced displacement, they do not all cover the same groups of people, with some focusing on regional or national levels. The data for the present study come from the United Nations High Commissioner for Refugees (UNHCR) Refugee Population Statistics Database (UNHCR, 2023). This database provides data on forcibly displaced populations, which include refugees, asylum-seekers, internally displaced people, and stateless persons (UNHCR, 2023). Forced displacement has been described as an emerging development challenge, with extreme poverty increasingly concentrated among vulnerable groups such as those who are fleeing conflict and violence (World Bank, 2017).

Prior Research with Forced Displacement. Some researchers have used data on forced displacement in research on refugee experiences and refugee impacts on asylum countries. Education outcomes are one subcategory of that refugee experience. A report by Dryden-Peterson (2015) was written to address a literature gap on post-arrival refugee education experiences in the United States of America. The research asked which elements of refugee students’ prior education were relevant to their post-resettlement education. The Dryden-Peterson (2015) methods included an analysis of three data sources on refugee education experience: access and quality of education for refugees, field-based case studies, and key informant interviews. The instruments were existing UNHCR data for refugee education access and quality, in addition to the author’s own fieldwork.
Select findings were that refugee learners typically face disrupted schooling, resulting in skill and knowledge gaps; pre-resettlement schooling is typically sporadic; and refugee learners are exposed to multiple languages of instruction (Dryden-Peterson, 2015).

Why is Forced Displacement Important? These data are important because they allow researchers to make country-level inferences about the immigration climate and policy regarding immigrants in general (World Bank, 2017). Additionally, they provide information on both country of origin and destination, allowing researchers to follow country-level movement. Furthermore, displaced people can face xenophobia from the communities in which they are hosted (World Bank, 2017). The forced displacement data is important to the present study because it provides a high-level sense of the degree to which underlying factors may have driven immigration, both forced and unforced.

Forced Displacement is Linked to PISA Data at the Country-Level. The UNHCR reports data annually on country-level forced displacement. The forced displacement data used in this study was obtained from the World Bank Open Data website, measured by the refugee population by country or territory of origin and the refugee population by country or territory of asylum. These data are linked at the country-level, specifically connecting via PISA participants’ destination and origin country. In this study, these data are operationalized as a forced displacement ratio called FD Ratio. FD Ratio is a country-level ratio of inward forced displacement to outward forced displacement. For instance, if 2000 forcibly displaced persons entered a country while 1000 people were forced out of that same country, then the ratio is 2000:1000, simplified to 2:1, and 2.0 in decimal form. Decimal form is used for statistical modeling in this study. The decimal values ranged from 0.0 to around 12,000 at the high end.
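The FD Ratio arithmetic described above can be illustrated with a minimal Python sketch. This is not the study's code (the study used R); the function name `fd_ratio` is hypothetical, and the handling of a zero outward count is an assumption, since the text does not say how that case was treated.

```python
from typing import Optional


def fd_ratio(inward: float, outward: float) -> Optional[float]:
    """Return the inward-to-outward forced displacement ratio in decimal form."""
    if outward == 0:
        # Assumption: the source does not describe the zero-outflow case,
        # so this sketch returns None instead of guessing a value.
        return None
    return inward / outward


# Worked example from the text: 2000 arriving, 1000 leaving.
example = fd_ratio(2000, 1000)
print(example)
```

A country with no inward displacement and some outward displacement yields 0.0, which matches the low end of the range reported above.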
Therefore, a value of 12,000 means that there are 12,000 forcibly displaced persons arriving for every 1 person going out. Higher ratios tend to belong to wealthier or more desirable countries (i.e., many arrive, few leave). Examples include Denmark, Sweden, and Norway. Lower ratios tend to belong to less wealthy or less desirable countries (i.e., few arrive, many leave). Examples include the Philippines, Afghanistan, and Albania.

Acknowledging the Causality Limitations of ILSAs

A limitation of ILSAs is that they are cross-sectional studies, which means they collect data on a particular sample of students at one point in time. In contrast, longitudinal studies collect pre- and post- data from the same sample of students. A critique of ILSAs is that they are not longitudinal and therefore cannot control for prior levels of a given measure. Consequently, ILSAs are not able to establish a causal link between explanatory variables and an outcome variable, as they lack the pre- and post-test design. However, these constraints can be mitigated in two ways.

One way is to use findings from related, yet different, studies to estimate the expected effect that a prior achievement measure would have on regression outcomes. An example of this can be seen in a study by Carnoy et al. (2016), who tried to improve ILSA results by incorporating longitudinal data to provide more precise, less biased estimates. Carnoy et al. (2016) critiqued the use of ILSAs for making education policy recommendations, as cross-sectional studies can identify broad trends but are not able to provide causal inferences. The authors contended that this leads to simplified and misleading conclusions, compared to true longitudinal studies. Therefore, their study was intended to reduce the bias in typical estimates obtained from cross-sectional ILSAs (Carnoy et al., 2016). The researchers asked whether more accurate estimates could be obtained of the effects of classroom variables on students’ PISA performance.
The study used linear regression models to estimate PISA 2012 math achievement using a stepwise sequence of explanatory variables such as individual student characteristics; class and school characteristics; and teacher characteristics. The study instruments were two ILSA data sets, TIMSS 2011 and PISA 2012, both from Russia. The same sample of students took both assessments one year apart, allowing the researchers to control for prior academic achievement in a way that neither ILSA could provide alone (Carnoy et al., 2016). The major methodological finding of this study was that controlling for previous math achievement resulted in more modest estimates for the relationships between explanatory variables and PISA outcome scores (Carnoy et al., 2016). Before the researchers included prior achievement (i.e., TIMSS 2011 scores) in their models, the effect sizes of pre-service teacher training on PISA math scores ranged from -0.16 to -0.21 standard deviations (Carnoy et al., 2016). After controlling for prior achievement, the estimates reduced to a range of -0.14 to -0.16 standard deviations (Carnoy et al., 2016). Therefore, controlling for prior achievement may lessen the effect size on PISA outcomes by between 0.02 and 0.05 standard deviations (Carnoy et al., 2016).

Another example comes from Jerrim et al. (2022), who studied the effectiveness of inquiry-based science teaching by linking ILSA data with other measures of science attainment. The rationale for the study was the ongoing debate on the effectiveness of the widely used inquiry-based science teaching in England. There were two research questions: (1) Do young people who receive a higher frequency of inquiry-based science teaching have higher levels of science achievement? and (2) Is there a positive association between specific components of inquiry-based teaching and young people's achievement in science? Jerrim et al.
(2022) ran regression models to estimate science achievement on General Certificate of Secondary Education (GCSE) scores, controlling for demographic, prior achievement, and school-level measures. The main outcome variable was GCSE science achievement, with a secondary analysis using PISA science attainment as an alternative outcome. The instruments of the Jerrim et al. (2022) study included two main data sources: (1) England's PISA 2015 and (2) England's 2016 National Pupil Database (NPD). Data were linked across these data sets. This provided measures of prior achievement at age 11 (Key Stage 2), age 15 (PISA), and age 16 (GCSE). The main finding was that inquiry-based teaching had a weak and inconsistent relationship with GCSE science attainment (Jerrim et al., 2022). From the methodological perspective, when the researchers included prior achievement (Key Stage 2 scores) in their models, this reduced the effect sizes of inquiry-teaching on PISA science scores by 0.02, 0.03, and 0.12 standard deviations for the respective quartiles (Jerrim et al., 2022). Adding in all the remaining controls further reduced effect sizes by an additional 0.04, 0.03, and 0.00 standard deviations (Jerrim et al., 2022). Therefore, controlling for prior achievement may lessen the effect size on PISA outcomes by between 0.02 and 0.12 standard deviations (Jerrim et al., 2022).

An alternative way to address causality in ILSAs is to forgo accounting for prior achievement altogether. Researchers working from a critical-quantitative approach (i.e., QuantCrit) might argue against the assumption that some traditional controls, such as prior achievement, must be included (Sablan, 2019). Controlling for prior achievement may obscure the cumulative negative effects of systemic inequity on academic achievement (Frank et al., 2023a).
Take, for example, an intervention study measuring the effect of some teaching practice. In this example, a pre- and post-test of academic achievement is conducted. The resulting analysis then shows that the teaching practice increased achievement by 0.5 standard deviations. From a typical quantitative perspective, this suggests the effectiveness of the teaching practice. However, since the study only measured achievement between the pre-test and the post-test, the researcher may overlook that one group of students (e.g., students of a particular race) still scored significantly lower than other groups, despite similar growth in the time between the pre- and post-test (Frank et al., 2023a; Sablan, 2019). In other words, while all students improved at a similar rate due to the intervention, the preexisting achievement gaps between groups still persist (Frank et al., 2023a; Sablan, 2019). Therefore, omitting pre-tests of prior achievement allows the effects of negative systemic factors to be more obvious (Frank et al., 2023a; Sablan, 2019).

The above paragraphs provided two different approaches for addressing the limitation of ILSAs as cross-sectional studies. One approach is to rely on studies that have approximated the pre- and post-test designs with other data sources (Carnoy et al., 2016; Jerrim et al., 2022). The other approach is a purposeful omission of pre-tests (Frank et al., 2023a; Sablan, 2019). Both approaches are relevant to this study. On the one hand, this study can run models that incorporate a proxy measure of prior achievement based on related studies. On the other hand, the rationale for omitting controls of prior achievement is relevant to the study, as it investigates how immigrant students’ experience of society may impact their academic achievement rather than identifying specific causal elements. Taken together, the variation between the different approaches may provide helpful context to the analyses.
METHODS

One Dependent Variable Was Sourced from PISA 2018: Reading Scores

The outcome variable in the analysis was the student-level reading score on PISA 2018. The PISA assessment reports scores as plausible values (PV). Plausible values represent a distribution of possible scores for a given participant, rather than a single individual test score (OECD, 2009). They represent a range of scores that an individual student could reasonably be expected to obtain, based on their actual point estimates and the associated probability of these values (OECD, 2009). Plausible values are designed to reduce the measurement bias that arises from the relatively small number of items that a given PISA participant actually sees (OECD, 2009). Typically, modeling software runs a regression for each PV and then averages the runs (e.g., 10 PVs = 10 models). However, not all software handles plausible values in every case. In particular, one software package used in this study does not handle plausible values when fitting cross-nested multilevel models. This software was the Scientific Software International (SSI) HLM software (Raudenbush & Congdon, 2021). Not using plausible values introduces bias into estimates and is a limitation of the study. On the one hand, not using plausible values biases estimates towards more extreme scores (Von Davier et al., 2009). On the other hand, averaging PVs biases towards underestimated group-level variances (Von Davier et al., 2009). This study’s compromise is to use just a single plausible value when utilizing software with PV limitations. Overall, this approach is interpreted as providing less confidence in smaller score differences (Von Davier et al., 2009).

Four Explanatory Variables Were Sourced From The Linked Data Sets

The outcome variable in the study was a student-level reading PISA score.
Explanatory variables include a number of demographic variables that come from PISA as well as the variables from the non-PISA data sets which were linked to the PISA data. Additionally, the variables include operational variables needed to conduct the analysis, such as weighting variables and anonymized student identity numbers. The complete list of variables is found in Appendix C.

Materials: The Complexity of the Data Called for Multiple Tools

Multiple software tools were used for this study. Together, these tools were used to create an analytic sample (i.e., R, Tidyverse, EdSurvey, Countrycode, and fastDummies) and run statistical models (i.e., SSI HLM, WeMix). Table 3 provides more detail about the software tools used in this study. A few other software options were considered on the basis of their functionality for running HLMs. A comparison table is shown in Appendix D.

Table 3
Software used for Data Prep and Statistical Models

| Software | Purpose |
| --- | --- |
| R (v. 4.2.1) | A programming language and environment for statistical computing and graphics (R Core Team, 2023). |
| Tidyverse (v. 2.0.0) | A collection of R packages designed for data science, with a focus on data import, tidying, manipulation, visualization, and programming (Wickham et al., 2019). |
| EdSurvey (v. 3.1.0) | A collection of functions for working with complex sample designs, weights, and plausible values common to NCES, OECD, and IEA education survey and assessment data (Bailey et al., 2023). |
| Countrycode (v. 1.3.0) | An R package to convert between country names and coding schemes (Arel-Bundock et al., 2018). |
| fastDummies (v. 1.7.1) | The goal of fastDummies is to quickly create dummy variables (columns) and dummy rows (Kaplan & Schlegel, 2023). |
| WeMix (v. 4.0.0) | A collection of functions for conducting HLMs with multilevel data that includes weights at multiple levels (Bailey et al., 2021). |
| SSI HLM (v. 8.2) | A specially designed collection of tools for modeling a wide variety of hierarchical data (Raudenbush & Congdon, 2021). |

Procedures: Data Preparation Was Needed Prior to Linking Data Sets

Obtained PISA 2018 Data

The first step was to obtain the ILSA data. The PISA 2018 data was obtained from the OECD PISA 2018 data set website. The website provided a series of downloadable data files. The data download files come as a set of compressed files within compressed folders. The files are available as either SAS or SPSS files. For this study, the SPSS file was downloaded. However, either option is viable, as the EdSurvey R package, which was used to read in data, can handle a variety of data file types including CSV, SAS, and SPSS. These data were manually downloaded via a web browser onto the local computer. Then, the compressed data were placed into a project folder and uncompressed before working with them. The list of data files downloaded for this study, and their respective file sizes, are shown in Appendix E.

Imported PISA Data in R

Next, the EdSurvey R package was used to build the raw data set from the now uncompressed data files. EdSurvey contains a function called readPISA() which reads the folder containing the data files and links them into a single data object, which is used to start preparing the analytic sample.

Cleaned & Prepared PISA Data

The first step in preparing the analytic sample was to keyword search for potential variables of interest and select them into the analytic sample. Then, a process of data cleaning was conducted. One illustrative procedure was renaming variables to more human-readable names. For example, the variable name for a student’s year of birth was changed from “st003d03t” to “birth_year”. Another example was changing the name of the school-level weights from “w_schgrnrabwt” to “w_fschwt”, to bring it more in line with the variable name for student-level weights (i.e., “w_fstuwt”).
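The renaming step described above can be illustrated with a minimal Python sketch. The study itself used R and the Tidyverse, so this is only an analogue under stated assumptions: the helper `rename_columns` and the example row values are hypothetical, while the two variable name pairs are taken from the text.

```python
# Mapping from original PISA variable names to the more readable names in the text.
RENAMES = {
    "st003d03t": "birth_year",    # student's year of birth
    "w_schgrnrabwt": "w_fschwt",  # school-level weight
}


def rename_columns(row, renames):
    """Return a copy of `row` with keys renamed; unlisted keys are kept as-is."""
    return {renames.get(name, name): value for name, value in row.items()}


# Hypothetical example row; the values are placeholders, not PISA data.
raw_row = {"st003d03t": 2002, "w_schgrnrabwt": 1.5, "w_fstuwt": 3.2}
clean_row = rename_columns(raw_row, RENAMES)
print(clean_row)
```

Keeping the mapping in one dictionary makes the renaming decisions easy to audit alongside the variable list in Appendix C.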
Another illustrative procedure was creating a variable for the ISO3 country code (e.g., “AFG” = “Afghanistan”) because it is easier to reference a 3-letter country code while working on the data than to use the full text country name.

In addition, a number of derived variables were created. One illustrative procedure was creating a new variable called “linking_years”. This variable contains calendar year values used to link that student’s PISA data with the appropriately dated external country-level data. For an immigrant student, this means the linking_years variable contains a year value for their own year of immigration and the 5 years prior, inclusive. For example, an immigrant student who immigrated in the year 2010 would have linking years of 2010, 2009, 2008, 2007, 2006, and 2005. Therefore, after the data linking procedures, this same immigrant student will have added columns of external data with values for each respective linking year (i.e., 2010 to 2005). A similar process was done for non-immigrant (i.e., native) students, but their linking_years were built from the year they took the PISA assessment (i.e., 2018) and the 5 years prior, inclusive. For example, a native student would have linking years of 2018, 2017, 2016, 2015, 2014, and 2013.

The reason for using two different 5-year windows is as follows. This 5-year window represents a time period, for both immigrant and native students, when their families either did or did not make a decision to immigrate. Therefore, these 5-year windows provided an analogous snapshot of the context of the country in which students’ families were making a decision or non-decision to immigrate. Intuitively, this 5-year window makes sense for immigrant students, as it encompasses a period of time right before their eventual immigration event. For native students, the intuition might be a bit harder to understand, so an illustrative example is provided.
For example, if a Canadian student’s family has never immigrated from Canada, as of the time of PISA 2018 data collection, then that family has continually made the decision to maintain the status quo for the past 5+ years. The ongoing decision to remain (i.e., to not immigrate) may be related to the relatively stable state of the home country. However, since this study focused only on students with an immigrant background, this was a moot point for natively born students.

Yet another data preparation decision was made regarding the immigrant status variable (i.e., immig_status). The immigration status that comes ready-made in PISA 2018 had some peculiarities. For example, the immigration status flag marks some students as NATIVE when those students have a different birth country and test country. Conversely, there are some students who are marked as FIRST-GENERATION immigrants but have the same birth country and test country. Since there was no additional data to resolve these discrepancies, the issue was reconciled by creating a secondary immigrant status flag (i.e., immig_status2) based on whether each participant’s birth and test country matched. This immig_status2 variable thereby provided a simpler way to identify participants as either IMMIGRANT or NATIVE. The SECOND-GENERATION category was not used in this new immigrant status variable. This left any student with matching origin and destination country labeled as NATIVE and any student with non-matching origin and destination country labeled as IMMIGRANT. Any students with a missing value for either origin or destination country (i.e., NA) were dropped from the analytic sample, since it was not possible to link external country-level data to a participant who did not have a birth or test country associated with them.
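The immig_status2 rule described above can be sketched as follows. This is an illustrative Python analogue rather than the study's R code; the function name, the dictionary keys, and the three example students are hypothetical, and only the NATIVE/IMMIGRANT/drop logic comes from the text.

```python
from typing import Optional


def immig_status2(birth_country: Optional[str], test_country: Optional[str]) -> Optional[str]:
    """Classify a student by comparing birth (origin) and test (destination) country."""
    if birth_country is None or test_country is None:
        # A missing country cannot be linked to country-level data; caller drops the row.
        return None
    return "NATIVE" if birth_country == test_country else "IMMIGRANT"


# Hypothetical students identified by ISO3 codes; not real PISA records.
students = [
    {"id": 1, "birth": "CAN", "test": "CAN"},
    {"id": 2, "birth": "AFG", "test": "DNK"},
    {"id": 3, "birth": None, "test": "FIN"},
]

# Attach the flag, then drop students whose status could not be determined.
flagged = [dict(s, immig_status2=immig_status2(s["birth"], s["test"])) for s in students]
analytic = [s for s in flagged if s["immig_status2"] is not None]
print([(s["id"], s["immig_status2"]) for s in analytic])
```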
Finally, this study focused only on students with an immigrant background, so the sample was further reduced to just first-generation immigrant students. In the end, the analytic sample included 9493 students with an immigrant background, 3514 schools, 42 destination countries, and 74 origin countries.

Cleaned & Prepared External Data

After creating the initial analytic sample from ILSA data, the external data were cleaned and prepared ahead of linking with the analytic sample. These procedures are described in this section.

Some of the external data was relatively ready to be worked with, requiring minimal prep work for this study (e.g., HDI, GAIN). The data prep work for these data sets included subsetting for the years of interest. Additionally, some column names were changed to make later data joining easier.

Conversely, the forced displacement data set underwent comparatively more work. The main tasks were stripping out metadata, pivoting the data into long format, and joining the separate data for origin and destination counts into a single data set that contains a given country's inflow and outflow of displaced persons by country, by year.

In addition, the language distance data required extensive work. The first step in creating the language distance data was to create a list of all languages listed for participants in the PISA 2018 analytic sample. Then, this list was filtered to remove languages that were too broad or too specific to be useful for analysis or were simply not available in the ASJP database. A few illustrative examples included “NON-HAN ETHNIC LANGUAGES (QCN)”, “OTHER FORMER YOGSLAVIAN LANGUAGES (SVN)”, and “WESTERN EUROPEAN LANGUAGES”. After obtaining a list of relevant languages from the PISA 2018 analytic sample, the ASJP language matrix was generated on a separate Windows 10 computer.
A separate computer was used because the software was written with the Windows command line in mind, while the rest of this study was conducted on a Linux computer. This ASJP matrix was then saved and imported into R. Next, some language name cleaning was performed. An illustrative example was changing "STANDARD_GERMAN" to just "GERMAN". Finally, language pairs and their associated language distances were extracted from the ASJP matrix and arranged into a standard data frame object, using the PISA language list to filter for ASJP language pairs present in the analytic sample.

Joined External Data With PISA Data

After the external data were cleaned and prepared, they were joined to the ILSA data set by ISO3 country code and the appropriate linking years. An illustrative example of this linking procedure comes from the joining of the ND-GAIN data set to the ILSA data set. The first country, alphabetically, in the ND-GAIN data set is Afghanistan (AFG). The preprocessed ND-GAIN has AFG data for years 2001 to 2018. Therefore, after joining, all rows for AFG participants in the ILSA data had two columns added. One column is for an ND-GAIN value for AFG as an origin country and one column is for an ND-GAIN value for AFG as a destination country. The value in both new columns is the same. The reason for having two columns with the same GAIN values, but with different designations for origin or destination country, is that a country can be either an origin country or a destination country depending on a given immigrant student’s individual-level immigration history.

In addition to linking the GAIN data set, the other external data sets were linked by country code and the range of linking years. This resulted in a very large, long-format data set. For example, a single participant had a row for their country-level ND-GAIN value for each of the years 2001 to 2018, resulting in 18 rows of data per participant. Next, some additional data processing was done.
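One reading of the linking step, combining the linking_years window with the join by ISO3 code, can be sketched in Python as below. This is not the study's R code. The function names, dictionary keys, and numeric values are hypothetical placeholders (the values are not real ND-GAIN scores), and the sketch keeps only the rows inside a student's linking window, whereas the text describes an intermediate long-format data set that held all years from 2001 to 2018.

```python
def linking_years(anchor_year, years_back=5):
    """Return the anchor year and the `years_back` years before it, newest first."""
    return [anchor_year - offset for offset in range(years_back + 1)]


def link_country_values(student, country_values):
    """Build one long-format row per linking year with the origin-country value."""
    rows = []
    for year in linking_years(student["anchor_year"]):
        rows.append({
            "student_id": student["student_id"],
            "origin_iso3": student["origin_iso3"],
            "year": year,
            # .get returns None when no value exists for that country and year.
            "origin_value": country_values.get((student["origin_iso3"], year)),
        })
    return rows


# Hypothetical placeholder values keyed by (ISO3 code, year), covering 2001 to 2018.
country_values = {("AFG", year): 30.0 + 0.1 * (year - 2001) for year in range(2001, 2019)}

# Hypothetical immigrant student whose anchor year is the year of immigration.
student = {"student_id": 1, "origin_iso3": "AFG", "anchor_year": 2010}
linked = link_country_values(student, country_values)
print([row["year"] for row in linked])
```

For a native student, the same function would be called with an anchor year of 2018, giving the years 2018 through 2013.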
A 5-year mean value was calculated for the appropriate external country-level data for each student in the PISA data set. For immigrant students, their 5-year mean was calculated based on the 5 years prior to their year of immigration. This created a measure of a country-level variable over the 5 years before the immigration took place. The rationale for this decision was that the decision and actions to immigrate were likely made somewhere in that time frame. In other words, this measure provides a window into the state of the origin country at the time when decisions and actions regarding immigration were being made. For native students, their 5-year mean values were calculated based on the 5 years prior to the year of the PISA test (i.e., 2018). This created a measure of a country-level variable over the 5 years before the PISA data collection point. The rationale for basing this number on the year 2018 was to create a comparable 5-year window for the native and immigrant students in the analytic sample. However, in the native students' case, their window goes back from their most recent year of non-immigration, compared to immigrant students whose window goes back from their year of immigration. Another way to state this is that the 5-year means capture the state of the countries when native and immigrant student families were making or not making decisions to immigrate. One final process done to these 5-year means was to standardize them, since they exist on very different scales.

Adjusted Weights Prior to Modeling as Suggested by PISA Documentation

An additional procedure needed to be done regarding weights. Nguyen and Kelley (2018) recommend adjusting level-1 weights before fitting mixed-effects models, as unscaled weights may cause biased estimates. The EdSurvey R package uses a method from Rabe-Hesketh and Skrondal (2006) to automatically scale weights according to the respective ILSA data set; PISA in this case.
However, since EdSurvey’s HLM modeling function was not capable of running a 3-level model, it was not used for modeling. Therefore, this study could not rely on EdSurvey’s automatic scaling of weights. Thus, weight transformation was conducted manually, using the suggested method (Nguyen & Kelley, 2018; Rabe-Hesketh & Skrondal, 2006). Scaling weights is recommended for PISA at level 1, but not at level 2 or level 3 (Nguyen & Kelley, 2018). This procedure produced an additional column in the sample data frame to hold the scaled student-level weights, keeping the original weights column as well for cross-reference. Note that country-level weights remained at a value of 1 for all countries. This was done to satisfy potential software weight input requirements while preserving parity between countries.

Final Analytic Sample: PISA and External Data Linked

Finally, the joined analytic sample was reduced down to one row per participant, with the country-level variables representing 5-year means. There were 77 countries/territories represented in the analytic sample. This final list of countries is found in Appendix F.

Specified and Ran the Multilevel Models

Finally, the multilevel models were run. This study involved the creation of a “baseline model”/“null model”/“unconditioned model” followed by multiple models to test the association between the variables of interest and the outcome variable of PISA reading scores. Given the complexity of the data used in this study, the analytic methods require special attention. The following section summarizes those methods, with the complete report found in Appendix G.

The Analytic Method: An Asymmetric Cross-Classified Data Structure Called For A Two-Stage Approach to Multilevel Modeling

Hierarchical PISA Data Suggests Multilevel Models

PISA data has a hierarchical structure with students nested within schools nested within countries.
Data with a hierarchical structure is a candidate for multilevel modeling methods (Braun et al., 2009). Multilevel modeling encompasses a variety of closely related methods that also go by the names mixed modeling and hierarchical linear modeling. Multilevel modeling affords researchers the ability to model effects at each level of the data, thus obtaining estimates of covariates and variance at each respective level (Beaton et al., 2011).

Asymmetric Cross-Classified Data Structure

To understand the methods of this study, it is important to first understand the structure of the data. With hierarchical data, such as PISA, student data can vary systematically based on how students are grouped (e.g., school of attendance; country of residence). This means it is important to pay attention to these levels in statistical modeling. There are three levels: level 1 is the student-level, level 2 is the school-level, and level 3 is the country-level. Furthermore, part of the data structure is strictly nested while another part is cross-nested. The strictly nested part is students nested within schools nested within destination countries. The cross-nested part is the students also nested within the origin countries. This asymmetric cross-classified data structure presented a challenge for statistical modeling. Figure 4 is a visualization of the data structure.

Figure 4
Structure of the Data

The Data Structure Presented a Challenge for Statistical Modeling

The challenge for statistical modeling was that the available software did not support this model structure. The desired model specification was cross-classifying level 1 within two different country-level groups (i.e., destination country; origin country). Figure 5 shows the structure of the data.

Figure 5
Structure of the Data

Initially, the WeMix R package was targeted for use in statistical modeling for this study (Bailey et al., 2021).
However, a limitation of the software was that it only handles fully nested data, such as the 3-level fully nested structure shown in Figure 6. This was a problem because the data for this study has students cross-nested within two different country levels.

Figure 6
Structure of the Data

Then the SSI HLM software was targeted for statistical modeling in this study (Raudenbush & Congdon, 2021). However, while SSI HLM does support cross-classification, it requires level 2 to be cross-classified within two different level-3 groups (see Figure 7). The problem here is that level 2 (i.e., schools) is not really nested within level 3b (i.e., origin countries). The schools are located only within destination countries, not origin countries. This modeling challenge prompted an investigation into the variance decomposition to determine whether there really is a need to model with all available levels (i.e., student, school, destination country, and origin country). The results of this investigation are explained in the section below.

Figure 7
Structure of the Data

Examining Variance Proportions by Level Suggested That All Levels Are Important

Several unconditioned models were run to investigate the variance decomposition of each level, as a way to determine the importance of each level. Both two-level and three-level models were specified for comparison. The results showed that meaningful variance was attributed to each level of data, ranging from 23% to 55% of the variance within nested models. This suggested that all levels are important and should be included in the multilevel models.

The model outputs reported the proportion of variance explained by each respective level. Prior to modeling, data were sorted by either destination country or origin country, producing differing results. Table 4 and Table 5 report on the first set of models (i.e., 2 levels; students nested within schools).
The first column indicates the model levels. Columns 2 through 4 report the variance estimates from the WeMix R package or SSI HLM, respectively. The use of normalized weights is denoted where necessary. A run with WeMix using normalized weights was conducted to match what SSI HLM does automatically, so as to aid comparison. The results of these models indicated that the school level explained between 44% and 70% of the variance, depending on destination/origin sorting and software used. These results suggest that the school level is important for modeling immigrant students’ reading achievement on PISA 2018.

Table 4
Variance Decomposition for 2-Level Unconditioned Model (Students within Schools; Destination Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   0.9388                           0.5591                       0.2753
Level 2: School    0.0612                           0.4409                       0.7250

Table 5
Variance Decomposition for 2-Level Unconditioned Model (Students within Schools; Origin Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   0.9388                           0.5591                       0.1923
Level 2: School    0.0612                           0.4409                       0.8080

Table 6 and Table 7 report on the second set of models (i.e., 2 levels; students within countries). The column descriptions are the same as in the tables above. The results of these models indicated that the country level explained between 23% and 24% of the variance, depending on destination/origin sorting and software used. These results suggest that both the destination and origin country levels are important for modeling immigrant students’ reading achievement on PISA 2018.
Table 6
Variance Decomposition for 2-Level Unconditioned Model (Students within Countries; Destination Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   0.7663                           0.7663                       0.7663
Level 2: Country   0.2337                           0.2337                       0.2337

Table 7
Variance Decomposition for 2-Level Unconditioned Model (Students within Countries; Origin Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   0.7601                           0.7601                       0.7601
Level 2: Country   0.2399                           0.2399                       0.2400

Table 8 and Table 9 report on the third set of models (i.e., 3 levels; students nested within schools nested within destination countries or origin countries). The column descriptions are the same as in the tables above. The results of these models indicated that the school and country levels combined explained between 45% and 66% of the variance. These results suggest that the school level and country level combined are important when modeling immigrant students’ reading achievement on PISA 2018.

Table 8
Variance Decomposition for 3-Level Unconditioned Model (Students within Schools within Destination Countries; Destination Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   0.7444                           0.4992                       0.5520
Level 2: School    0.0208                           0.2747                       0.2220
Level 3: Country   0.2347                           0.2261                       0.2265

Table 9
Variance Decomposition for 3-Level Unconditioned Model (Students within Schools within Countries; Origin Country Sorted)

Levels             WeMix (Non-Normalized Weights)   WeMix (Normalized Weights)   SSI HLM (Normalized Weights)
Level 1: Student   NA                               NA                           0.3858
Level 2: School    NA                               NA                           0.3875
Level 3: Country   NA                               NA                           0.2660

Note. The NA values are due to WeMix giving an error when trying to use the origin country as level 3: “Not a nested model; WeMix only fits nested models.”

In summary, the structure of the data presented a modeling challenge, which prompted an investigation into the variance decomposition to determine the importance of each level. Several unconditioned two-level and three-level models were run and compared. The results showed that meaningful variance was attributed to each level of data, ranging from 23% to 55% of the variance within 3-level models. This suggested that all levels are important and should be included in the multilevel modeling of students’ reading achievement on PISA 2018. Therefore, having determined the importance of all levels, and because of the asymmetric cross-nested data structure, a two-stage approach to modeling was implemented. These stages are explained in the following sections.

Stage 1: Identifying Destination Countries with Outsized Influence; Chipping Away Destination Country Variance

The first stage of the two-stage approach to modeling was to identify destination countries that have an outsized impact on student achievement. This was done by specifying a sequence of 3-level hierarchical models with students nested within schools nested within destination countries. Figure 8 is a visualization of the model structure.

Figure 8
Structure of the Data

In Stage 1, three models were specified and run to identify destination countries that have an outsized impact on student achievement: Model 00, Model 01, and Model 02. For each model the following steps were taken:

1. Specify and run a 3-level hierarchical model.
2. Examine country-level residuals for outlier countries.
   a. The initial outlier threshold started with the largest positive and negative residual values (i.e., +/-100).
3. Add outliers as fixed effects in subsequent Stage 1 models.
4. Re-check variance.
5. Repeat until destination country-level variance was at 5% or less of the overall variance (i.e., non-significant).

Once that variance reduction goal had been achieved, twelve countries had been added into the model as fixed effects dummy variables (i.e., Korea, Ireland, New Zealand, Philippines, Dominican Republic, Indonesia, Canada, Turkey, Australia, Greece, North Macedonia, and Morocco). The destination country share of the variance fell with each model, from 22% to 10% and then 4%, meeting the goal of reducing it below 5% (see Table 10).

Table 10
Model 02 Variance Estimates Compared to Model 00

Model      Level     Variance   Variance By Level
Model 00   Student   7578       56%
           School    3004       22%
           Country   2971       22%
Model 01   Student   7575       64%
           School    3015       26%
           Country   1206       10%
Model 02   Student   7578       68%
           School    3006       27%
           Country   483        4%

The important takeaway here was this list of countries to carry forward into Stage 2, while essentially removing the destination country level as its own level going forward. Ultimately, the aim of this Stage 1 process was to control for these outlier countries by transforming them into fixed effects (i.e., predictor variables) at the school level for the models in Stage 2. The rationale for adding these countries as level-2 variables is that each school can be reasoned to have characteristics imparted upon it by the country in which it is located. This process resolves the issue with complex cross-nesting because it removes destination country as its own level and instead enters the impact of destination country as a covariate within the model, thereby allowing origin country influences to become the focal point of the modeling. Figure 6 shows the specifications of the final Stage 1 model. The complete report of this step-by-step Stage 1 process is found in Appendix G. Only the final Stage 1 model, Model 02, is shown below.
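The percentage column of Table 10 follows directly from the variance components: each level's share is its variance divided by the sum across levels. The sketch below reproduces those shares and applies the 5% stopping rule from step 5. It is an illustration in Python, not the WeMix or SSI HLM code used in the study; the function names are hypothetical, and the Model 00 student variance is assumed to be 7578, the value consistent with the reported 56% share.

```python
# Variance components (student, school, destination country) from Table 10.
# Assumption: Model 00 student variance is 7578, which matches the reported 56%.
TABLE_10 = {
    "Model 00": {"student": 7578, "school": 3004, "country": 2971},
    "Model 01": {"student": 7575, "school": 3015, "country": 1206},
    "Model 02": {"student": 7578, "school": 3006, "country": 483},
}

COUNTRY_SHARE_GOAL = 0.05  # stop once the destination-country share is 5% or less


def variance_shares(components):
    """Return each level's proportion of the total variance."""
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


def meets_goal(components, goal=COUNTRY_SHARE_GOAL):
    """True when the country-level share is at or below the goal."""
    return variance_shares(components)["country"] <= goal


for name, components in TABLE_10.items():
    shares = variance_shares(components)
    summary = ", ".join(f"{level} {share:.0%}" for level, share in shares.items())
    print(f"{name}: {summary}; goal met: {meets_goal(components)}")
```

Only Model 02 satisfies the rule, which is where the Stage 1 sequence stopped.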
Figure 6
Model 02 Specification

Table 11 shows the fixed effects for Model 02. The intercept (i.e., 425) represents the reference point from which to evaluate the twelve fixed effect coefficients (e.g., Australia is associated with +69 PISA reading score).

Table 11
Model 02 Fixed Effects

Table 12 shows the variance decomposition for Model 02. After adding the 12 fixed effects, the destination country variance decreased to 483.

Table 12
Model 02 Variance Components

Stage 2: Specify and Run Cross-Classified Multilevel Models Using Variables of Interest to Explain PISA Reading Scores of Immigrant Students

The second stage of the two-stage approach to modeling was to specify a 2-level, cross-classified multilevel model. Students are cross-nested within two different level-2 groups: schools and origin countries. Figure 9 is a visualization of the data as it is to be modeled. The model is level 1 (students) cross-nested within level 2a (schools) and level 2b (origin countries). Note that the destination country is not a level itself but rather a set of fixed effect variables at the school level. This essentially removed the destination country level, as it was controlled for by the 12 outlier countries found in Stage 1.

Figure 9
Structure of the Data

Nine total cross-nested models were specified and run for Stage 2. Each model was evaluated during the model building process. One important measure is the log-likelihood. Log-likelihood is a value output by modeling software which can be used to compare against other models. A higher value means a better model fit. Moreover, there are three commonly used goodness of fit measures based on the log-likelihood value: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and deviance. All three are used to compare models. A lower AIC, BIC, or deviance means a better model fit (Hox et al., 2017). These goodness of fit measures are used to assess whether additional predictors improve the model.
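As a sketch of how a deviance comparison between nested models works, the following Python snippet runs a likelihood ratio test on two models. The deviances are those reported for Model 2 and Model 3 in Table 13; the single degree of freedom is an assumption (Model 3 is taken to add one parameter, the language distance coefficient), and the function name is hypothetical. For one degree of freedom the chi-square tail probability has the closed form erfc(sqrt(x / 2)), so only the standard library is needed.

```python
import math


def lrt_one_df(deviance_reduced, deviance_full):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns the test statistic (the drop in deviance) and its p-value from a
    chi-square distribution with 1 degree of freedom.
    """
    statistic = deviance_reduced - deviance_full
    if statistic < 0:
        raise ValueError("the fuller model should not have a higher deviance")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Deviances as reported in Table 13 (baseline Model 2 vs. Model 3).
statistic, p_value = lrt_one_df(91720, 91625)
print(f"deviance drop = {statistic}, p = {p_value:.3g}")
```

A model that adds more than one parameter would need the chi-square distribution with the matching degrees of freedom instead of this one-parameter shortcut.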
Comparing the deviance of a model that includes a given variance component with the deviance of the same model without it is also a way to test the significance of that component (Hox et al., 2017). The collective measures of goodness of fit are used to holistically evaluate the models built for the study. It is recommended to compare the models according to a variety of criteria rather than a single fit criterion (Scott et al., 2013). Furthermore, it is informative to report how the models change from model to model rather than just report a single and final model (Scott et al., 2013).

Stage 2 started with a series of models building towards a baseline model to which the predictors of interest in this study could be added. This started with the null model (i.e., Model 0). Model 1 added the 12 fixed effects countries identified in Stage 1. Model 2 added control variables (e.g., sex, socio-economic status, school socio-economic status, etc.). Model 2 was then the baseline model against which the variables of interest were tested, one at a time.

Next, the primary variables were tested one at a time, each in its own model (i.e., Model 3 = LD, Model 4 = HDI, Model 5 = GAIN, and Model 6 = FD Ratio). One of these variables was the student-level language distance while the rest were country-level variables. Only Language Distance (i.e., Model 3) had a statistically significant coefficient.

Next, secondary versions of the country-level variables of interest were tested one at a time, each in its own model (i.e., Model 7 = Immigration Year HDI, Model 8 = Immigration Year GAIN, and Model 9 = Immigration Year FD Ratio). These were more specific versions of the country-level variables based on each student’s particular year of immigration. For example, Immigration Year HDI was a measure of the origin country’s HDI around each student’s year of immigration.
This differed from the initial HDI variable in that the initial version had a shared value for all students from that country. None of these models had a statistically significant coefficient. In the end 9 total cross-nested models were specified and run for Stage 2.

Table 13 compares the deviance and variance estimates between all models, which is used to assess the change in goodness of fit between models. Model 3 deviance was the lowest amongst the models. A lower deviance suggests a better model fit. A likelihood ratio test confirmed that the change in deviance from the baseline model (Model 2) to Model 3 was statistically significant (p < 0.001). Additionally, the variance decreased at all levels. In the end, this modeling procedure identified just one covariate of interest that was statistically significant. This covariate was language distance from Model 3. The other variables were not statistically significant when tested individually.

Table 13
Comparison of All Models

Model                                         Deviance   Level     Variance   Variance By Level
Model 0 (Null)                                92931      Student   3625       29%
                                                         School    6510       51%
                                                         Country   2518       20%
Model 1 (Null + Stage 1 Effects)              92355      Student   3638       35%
                                                         School    5205       50%
                                                         Country   1585       15%
Model 2 (Null + Stage 1 Effects + Controls)   91720      Student   3539       40%
                                                         School    4293       48%
                                                         Country   1112       12%
Model 3 (Language Distance)                   91625*     Student   3488       39%
                                                         School    4267       48%
                                                         Country   1079       12%
Model 4 (HDI)                                 91718      Student   3539       40%
                                                         School    4291       48%
                                                         Country   1089       12%
Model 5 (GAIN)                                91716*     Student   3540       40%
                                                         School    4289       48%
                                                         Country   1075       12%
Model 6 (FD Ratio)                            91716*     Student   3538       40%
                                                         School    4296       48%
                                                         Country   1041       12%
Model 7 (Immigration Year HDI)                91672*     Student   3515       33%
                                                         School    4169       39%
                                                         Country   3047       28%
Model 8 (Immigration Year GAIN)               91695*     Student   3506       36%
                                                         School    4276       43%
                                                         Country   2079       21%
Model 9 (Immigration Year FD Ratio)           91715*     Student   3528       40%
                                                         School    4319       48%
                                                         Country   1058       12%

Note. An asterisk denotes that the model deviance is statistically significantly different from that of the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

The complete report of this step-by-step Stage 2 process is found in Appendix G. Only the best fit Stage 2 model, Model 3, is shown below. Figure 10 shows the specifications of Model 3.

Figure 10
Model 3 Specification

Table 14 shows the fixed effects for Model 3. The intercept (i.e., 4756 represents the reference point from which to evaluate the coefficients (e.g., language distance is associated with -0.31 PISA reading score).

Table 14
Model 3 Fixed Effects

Table 15 shows the variance decomposition for Model 3.

Table 15
Model 3 Variance Components

RESULTS

How Findings Answered the Research Questions

Research question #1 asked: Which specific origin country characteristics from the linked data sets have statistical significance for interpreting immigrant students’ PISA reading achievement? Only one covariate from the linked data was statistically significant: Language Distance. The rest were not.

Research question #2 asked: How much additional variation in immigrant students’ PISA reading achievement is explained by the linked data sets? Table 16 shows that Model 3 explained an additional 30.2% of variance compared to the null model. Model 3 also explained an additional 1.2% of variance compared to the baseline model. Even though the proportion of additional variance explained relative to the baseline was small (+1.2%), the applications of the language distance model were of practical significance because of the additional context that the model provided around students’ language use.
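The two percentages above can be checked against the total unexplained variance per model reported in Table 16: the additional variance explained is the proportional reduction in unexplained variance relative to a comparison model. A minimal Python sketch, with a hypothetical function name:

```python
# Total unexplained variance by model, as reported in Table 16.
UNEXPLAINED = {"M0": 12652, "M2": 8944, "M3": 8834}


def additional_variance_explained(comparison, model):
    """Proportional reduction in unexplained variance from comparison to model."""
    return (comparison - model) / comparison


vs_null = additional_variance_explained(UNEXPLAINED["M0"], UNEXPLAINED["M3"])
vs_baseline = additional_variance_explained(UNEXPLAINED["M2"], UNEXPLAINED["M3"])
print(f"Model 3 vs. null model: {vs_null:.1%}")
print(f"Model 3 vs. baseline Model 2: {vs_baseline:.1%}")
```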
Table 16
Unexplained Variance Remaining By Model

Model   Variable                                     Unexplained Variance
M0      Null Model                                   12652
M1      Fixed Effects Destination Countries          10427
M2      Baseline                                     8944
M3      Language Distance                            8834
M4      Human Development Index                      8919
M5      Global Adaptation Index                      8904
M6      Forced Displacement                          8875
M7      Human Development Index (Immigration Year)   10731
M8      Global Adaptation Index (Immigration Year)   9861
M9      Forced Displacement (Immigration Year)       8905

First Descriptive Result: Immigration Patterns Were Asymmetric and Varied

One descriptive result was that students immigrated asymmetrically from a larger set of origin countries to a smaller set of destination countries. Figure 11 shows the distribution of countries as either destination countries, origin countries, or both. There were 43 countries that only sent immigrant students (i.e., Origin Only). There were 32 countries that sent and received (i.e., Both). There were 10 countries that only received immigrants (i.e., Destination Only).

Figure 11
Distribution of Countries by Destination, Origin, or Both

Regarding varied immigration patterns, these results show that there are different immigration patterns for the various destination countries. For instance, the immigration movement to Canada (i.e., CAN) has little origin country overlap with Qatar (i.e., QAT). One reason for this may be common characteristics between origin/destination pairings. For instance, some destination countries share at least one common language with their top origin pair (e.g., Philippines to Canada). Others are physically near and share a border (e.g., Russia to Azerbaijan). Yet others may be related to multiple combined factors such as (1) a major geopolitical event, (2) sharing a land border, and (3) sharing a language (e.g., Syria to Jordan). Figure 12 is a visual representation of the immigration patterns. Node placement (i.e., the boxes) was based on origin or destination country status.
For instance, China (i.e., CHN) was exclusively an origin country and thus placed far left, while Australia (i.e., AUS) was mostly a destination country and thus placed far right. Countries with similar immigration in and out, such as the Philippines (i.e., PHL), are placed more centrally. Node size represents case count.

Figure 12
Immigrant Paths from Destination to Origin Countries

Table 17 is a tabular form of the data shown in Figure 12 above. It shows the paths of immigrant students from origin to destination country, sorted by destination country, in descending order of total immigrant count. The first two columns indicate the origin to destination country pairings. The third column denotes how many students followed that route. The final column is a reminder of how many immigrants overall were in the sample for a given destination country. Only the Top 10 countries by overall immigrant count are shown. See Appendix H for the complete list.

Table 17
Immigrant Students Paths from Origins to Destinations, Sorted by Country

#    Destination Country   Origin Country   Origin to Dest. Count   Dest. Cntry. Immig. Count
1    CAN                   PHL              526                     1705
2    CAN                   USA              278                     1705
3    CAN                   CHN              253                     1705
4    CAN                   IND              171                     1705
5    CAN                   GBR              102                     1705
6    CAN                   PAK              100                     1705
7    CAN                   KOR              73                      1705
8    CAN                   FRA              66                      1705
9    CAN                   IRN              51                      1705
10   CAN                   SYR              44                      1705
11   CAN                   ARE              41                      1705
12   QAT                   EGY              1035                    1622
13   QAT                   JOR              333                     1622
14   QAT                   YEM              254                     1622
15   JOR                   SYR              849                     971
16   JOR                   IRQ              87                      971
17   JOR                   EGY              35                      971
18   AUS                   GBR              249                     905
19   AUS                   NZL              177                     905
20   AUS                   PHL              148                     905
21   AUS                   CHN              143                     905
22   AUS                   IND              143                     905
23   AUS                   VNM              27                      905
24   AUS                   GRC              10                      905
25   AUS                   ITA              8                       905
26   NZL                   GBR              161                     548
27   NZL                   ZAF              93                      548
28   NZL                   PHL              89                      548
29   NZL                   CHN              84                      548
30   NZL                   AUS              80                      548
31   NZL                   KOR              24                      548
32   NZL                   FJI              17                      548
33   CHE                   PRT              133                     404
34   CHE                   ITA              95                      404
35   CHE                   DEU              89                      404
36   CHE                   FRA              32                      404
37   CHE                   ESP              24                      404
38   CHE                   TUR              13                      404
39   CHE                   AUT              10                      404
40   CHE                   ALB              8                       404
41   BEL                   NLD              103                     247
42   BEL                   DEU              72                      247
43   BEL                   FRA              66                      247
44   BEL                   TUR              8                       247
45   AZE                   RUS              154                     201
46   AZE                   GEO              42                      201
47   AZE                   TUR              5                       201
48   ARG                   BOL              91                      191
49   ARG                   PRY              67                      191
50   ARG                   BRA              15                      191
51   ARG                   CHL              11                      191
52   ARG                   URY              7                       191
53   ARG                   NIC              99                      215
54   CRI                   NIC              171                     191
55   CRI                   COL              14                      191
56   CRI                   PAN              6                       191

Table 18 shows the paths of immigrant students from origin to destination country, sorted by the pairings with the highest counts overall. The first two columns indicate the origin to destination country pairings. The third column denotes how many students followed that route. Only the Top 25 paths are shown, with the complete list found in Appendix Table I2. These results highlight some of the most frequent migration paths. For example, within the Top 10 migration paths, Canada appeared as a destination country three times. Another example was Jordan appearing as both an origin country and a destination country.
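As an illustration of how such path summaries can be tallied, the Python sketch below takes the ten most frequent paths listed in Table 18 and counts how often each country appears as a destination, then finds countries that appear on both sides. The variable names are hypothetical and the snippet is not the code used in the study.

```python
from collections import Counter

# Top 10 paths from Table 18: (destination, origin, student count).
TOP_PATHS = [
    ("QAT", "EGY", 1035), ("JOR", "SYR", 849), ("CAN", "PHL", 526),
    ("QAT", "JOR", 333), ("CAN", "USA", 278), ("QAT", "YEM", 254),
    ("CAN", "CHN", 253), ("AUS", "GBR", 249), ("IRL", "GBR", 178),
    ("AUS", "NZL", 177),
]

# How many of the ten paths end in, or start from, each country.
destination_counts = Counter(dest for dest, _origin, _n in TOP_PATHS)
origin_counts = Counter(origin for _dest, origin, _n in TOP_PATHS)

# Countries that both send and receive students within these ten paths.
both_roles = sorted(set(destination_counts) & set(origin_counts))

print(destination_counts.most_common())
print(both_roles)
```

Applied to the full set of paths rather than the top ten, the same tally would yield the Origin Only, Destination Only, and Both groupings.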
Table 18
Immigrant Students Paths from Origins to Destinations, Sorted by Count

#    Destination Country   Origin Country   Origin to Destination Count
1    QAT                   EGY              1035
2    JOR                   SYR              849
3    CAN                   PHL              526
4    QAT                   JOR              333
5    CAN                   USA              278
6    QAT                   YEM              254
7    CAN                   CHN              253
8    AUS                   GBR              249
9    IRL                   GBR              178
10   AUS                   NZL              177
11   CAN                   IND              171
12   CRI                   NIC              171
13   NZL                   GBR              161
14   AZE                   RUS              154
15   AUS                   PHL              148
16   AUS                   CHN              143
17   AUS                   IND              143
18   CHE                   PRT              133
19   GEO                   RUS              133
20   GBR                   IRL              122
21   BEL                   NLD              103
22   CAN                   GBR              102
23   CAN                   PAK              100
24   GRC                   ALB              98
25   CHE                   ITA              95

The Second Descriptive Result: Same Language Pairings Were Frequent Among Immigrant Students

A second descriptive result was the frequency of same language pairs amongst immigrant students. There were many cases of immigrant students who used the same language at home as they did in school. This suggests that language familiarity may have driven immigration choices. It may also suggest that immigrant families change their home language to match the schooling language. However, the former is more likely than the latter for two reasons. The main reason is supporting research suggesting that adults are unlikely to change the home language to match the school language (Kang, 2013; Liu, 2018). Another reason is that the median age of immigration for students in the sample was 7 years old, likely past the age when family home language norms would have been solidified.

Table 19 shows selected language pairings, their language distance, and the number of instances of each pairing in the data. A minimum criterion of 20 cases was used for selection into this table. One aspect of this table to notice is that the most frequent language pairings are equivalent languages (n=6263). Non-equivalent language pairings are bolded within the table. Another aspect of the table to note is that some PISA countries did not capture useful data regarding the home language.
For example, the Canada data had 852 cases of home language recorded as “Another Language (CAN)”; the data did not provide a specific language. Therefore language distance could not be calculated for those cases.

Table 19
Language Pairs with 30+ Instances

#    Language Distance   Home Language             Test Language   Count
1    0.00                Arabic                    Arabic          2245
2    0.00                English                   English         1852
3    NA                  Another Language (CAN)    English         852
4    0.00                Spanish                   Spanish         543
5    0.00                Serbocroatian             Serbocroatian   304
6    97.76               Arabic                    English         273
7    0.00                French                    French          190
8    0.00                German                    German          182
9    0.00                Azerbaijani               Azerbaijani     169
10   NA                  Another Language (AUS)    English         162
11   102.30              Mandarin                  English         154
12   0.00                Russian                   Russian         125
13   NA                  Another Language (NZL)    English         101
14   0.00                Georgian                  Georgian        95
15   0.00                Dutch                     Dutch           78
16   NA                  Another Language (CAN)    French          72
17   0.00                Hebrew                    Hebrew          69
18   78.74               Portuguese                French          66
19   97.75               Albanian                  Greek           60
20   97.76               English                   Arabic          58
21   0.00                Portuguese                Portuguese      57
22   0.00                Romanian                  Romanian        57
23   0.00                Italian                   Italian         54
24   91.35               English                   French          47
25   93.53               Portuguese                German          44
26   0.00                Greek                     Greek           24
27   89.66               Russian                   Latvian         24
28   96.06               Hindi                     English         32
29   99.77               Turkish                   German          32
30   100.43              Russian                   Georgian        32
31   47.55               Estonian                  Finnish         31
32   0.00                Finnish                   Finnish         30
33   NA                  Quechua/Guaraní/Mapuche   Spanish         30
34   96.88               English                   Hebrew          28
35   95.73               German                    French          27
36   100.68              Russian                   Azerbaijani     27
37   0.00                Danish                    Danish          26
38   96.50               Polish                    German          26
39   98.83               Korean                    English         26
40   100.11              Russian                   Finnish         26
41   NA                  Another Language (JOR)    Arabic          26
42   0.00                Korean                    Korean          26
43   NA                  Missing                   Arabic          25
44   0.00                Czech                     Czech           24
45   61.92               Russian                   Ukrainian       24
46   103.10              Vietnamese                English         23
47   55.70               Ukrainian                 Czech           22
48   97.35               Cantonese                 English         22
49   67.96               Portuguese                Spanish         21
50   73.23               Ethiopic                  Hebrew          20
51   95.68               Albanian                  German          20
52   95.88               Punjabi                   English         20

Table 20 is similar to Table 19 above but with the 0.00 language distances filtered out. When the results are filtered for just non-zero language distances, more fine-grained relationships can be highlighted within the table. Illustrated examples are provided below the table.

Table 20
Language Pairs with 30+ Instances (LD=0.00 Removed)

#    Language Distance   Home Language   Test Language   Count   Countries (# of students)
1    97.76               Arabic          English         273     EGY -> QAT (203); JOR -> QAT (55); …
2    102.30              Mandarin        English         154     CHN -> AUS (75); CHN -> NZL (72); …
3    78.74               Portuguese      French          66      PRT -> CHE (66)
4    97.75               Albanian        Greek           60      ALB -> GRC (60)
5    97.76               English         Arabic          58      EGY -> QAT (32); …
6    91.35               English         French          47      USA -> CAN (20); …
7    93.53               Portuguese      German          44      PRT -> CHE (41); …
8    89.66               Russian         Latvian         34      RUS -> LVA (27); …
9    96.06               Hindi           English         32      IND -> AUS (31); …
10   99.77               Turkish         German          32      TUR -> AUT (11); TUR -> CHE (7); TUR -> DEU (6); …
11   100.43              Russian         Georgian        32      RUS -> GEO (31); …
12   47.55               Estonian        Finnish         31      EST -> FIN (31); …
13   96.88               English         Hebrew          28      USA -> ISR (26); …
14   95.73               German          French          27      DEU -> BEL (20); …
15   100.68              Russian         Azerbaijani     27      RUS -> AZE (23); …
16   96.50               Polish          German          26      POL -> DEU (26)
17   98.83               Korean          English         26      KOR -> NZL (22); …
18   100.11              Russian         Finnish         26      RUS -> FIN (20); …
19   61.92               Russian         Ukrainian       24      RUS -> UKR (19); …
20   103.10              Vietnamese      English         23      VNM -> AUS (22); …
21   55.70               Ukrainian       Czech           22      UKR -> CZE (21); …
22   97.35               Cantonese       English         22      CHN -> AUS (22)
23   67.96               Portuguese      Spanish         21      BRA -> URY (14); …
24   73.23               Ethiopic        Hebrew          20      ETH -> ISR (20)
25   95.68               Albanian        German          20      ALB -> CHE (5); ITA -> CHE (5); DEU -> CHE (4); …
26   95.88               Punjabi         English         20      IND -> AUS (16); …

Primary Inferential Result: Greater Language Distance Was Associated with Lower PISA Reading Scores

The primary inferential result of the analysis was the negative association between language distance and PISA reading scores. A one unit increase in language distance was associated with a -0.31 change in reading score, holding all other variables constant. Interpreting a one unit change requires some context for the language distance measure. The language distance measurement scale ranges from 0 to 104 (Wichmann et al., 2022). The larger the language distance (i.e., bigger number) between two languages, the more dissimilar they are (Wichmann et al., 2022). A few example language pairings from Wichmann et al. (2022) are provided below for context.

On the lowest end of the language distance measure are exact language pairs (e.g., home = Spanish, test = Spanish; LD = 0.00). Next are the smallest language distances, aside from 0 values, which start at approximately 24 for this study’s analytic sample. One example of very similar languages is Slovak and Czech (language distance = 33), which are highly mutually intelligible and belong to the same Czech-Slovak language family. In the middle of the range are Portuguese and Spanish (language distance = 68), which share mutual intelligibility as they come from the same Ibero-Romance language classification. An example of two dissimilar languages is Cantonese and Mandarin (language distance = 81), which are mutually unintelligible. An example at the furthest end of the language distance scale is Vietnamese and English (language distance = 103).

One way to utilize the results of the analysis is to use the association between language distance and PISA reading scores to anticipate the expected score differential an immigrant student may encounter given their home language and the language used in the school of their destination country.
Note that this score is not country-dependent, because the association between PISA reading scores and language distance is modeled across all language pairs in the data set, although the model controls for ESCS and sex at the student level and for ESCS at the school level.

One illustrative example of language distance in action comes from the Czech Republic. The Czech Republic received immigrant students who used Ukrainian at home (n=22). The language distance between Ukrainian and Czech is 55.70, which means that an immigrant student who used Ukrainian in the home but Czech in school would be associated with a PISA reading score 17.27 points lower than immigrant students who used Czech in both locations (n=24). These values compared within their respective formulas are as follows: (Ukrainian/Czech: 55.70 * -0.31 = -17.27) vs. (Czech/Czech: 0.00 * -0.31 = 0.00).

Also within the Czech Republic, some immigrant students used Slovak in the home (n=13). The language distance between Slovak and Czech is 32.82, which means that immigrant students who used Slovak in the home but Czech in school were associated with a PISA reading score 10.17 points lower than immigrant students who used Czech in both locations (n=24). These values compared within their respective formulas are as follows: (Slovak/Czech: 32.82 * -0.31 = -10.17) vs. (Czech/Czech: 0.00 * -0.31 = 0.00).

The comparison between language groups is where the language distance result is most useful. Amongst immigrant students in the Czech Republic, the difference between those who use Ukrainian at home and Czech in school (-17.27) and those who use Slovak at home and Czech in school (-10.17) is 7.1 points. A 7.1 point difference corresponds to an effect size of 0.07, which can be described as a small yet statistically significant effect.
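The arithmetic in the Czech Republic example can be written as a short Python sketch. It uses the reported language distance coefficient (-0.31) and the PISA scaling of roughly 100 score points per standard deviation (OECD, 2019a); the function names are hypothetical, and the output is a model-implied association rather than a prediction for any individual student.

```python
LD_COEFFICIENT = -0.31  # reported change in reading score per unit of language distance
PISA_SD = 100.0         # approximate PISA standard deviation in score points


def score_differential(language_distance):
    """Model-implied score difference relative to a same-language pair (LD = 0)."""
    return language_distance * LD_COEFFICIENT


def effect_size(points):
    """Express a score-point difference as a fraction of one standard deviation."""
    return points / PISA_SD


ukrainian_czech = score_differential(55.70)
slovak_czech = score_differential(32.82)
gap = slovak_czech - ukrainian_czech  # positive: the Slovak/Czech differential is smaller

print(f"Ukrainian/Czech: {ukrainian_czech:.2f}")
print(f"Slovak/Czech:    {slovak_czech:.2f}")
print(f"Gap: {gap:.1f} points, effect size {effect_size(gap):.2f}")
```

The same two functions apply to any home/school language pair for which a language distance is available.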
Interpreting PISA Score Differences

To interpret the significance of point differences such as those shown above, the PISA literature provides guidance. PISA results are:

    … scaled to fit approximately normal distributions, with means around 500 score points and standard deviations around 100 score points. In statistical terms, a one-point difference on the PISA scale therefore corresponds to an effect size (Cohen’s d) of 0.01; and a 10-point difference to an effect size of 0.10 (OECD, 2019a, p. 43).

Table 21 shows how to interpret a range of effect sizes.

Table 21
How to Interpret Effect Sizes Found in this Study

Cohen’s d   Effect Size   Proportion of One Standard Deviation
0.01        Very Small    1% of a Standard Deviation
0.2         Small         20% of a Standard Deviation
0.5         Medium        50% of a Standard Deviation
0.8         Large         80% of a Standard Deviation
1.0         Very Large    One Standard Deviation
2.0         Huge          Two Standard Deviations

Sources: (Cohen, 1988; Sawilowsky, 2009)

The effect sizes in this study tended towards the smaller side, but effect sizes tend towards the smaller side for many educational actions. Prior research suggests a typical effect size range of 0.10 to 0.20 for the association between most education efforts and PISA scores. One study found an effect size range of -0.14 to -0.16 for the association between pre-service teacher training and PISA math scores (Carnoy et al., 2016). A second study found an effect size range of 0.10 to 0.20 for the association between student perception of instruction quality and PISA reading performance (Hu & Wang, 2022). Another study found an effect size of 0.02 for annual education reform policies (Aloisi & Tymms, 2017). Yet another study found an effect size range of -0.10 to 0.2 for the association between inquiry-based teaching and PISA science scores (Jerrim et al., 2022).
A final study found an effect size range of 0.10 to 0.20 for the association between self-regulated learning and PISA reading scores (Lau & Ho, 2016).

The Language Distance Results Took On Greater Meaning Within Some Contexts Over Others

The language distance model had practical significance because of the additional context that language distance provided around students’ language use. Language distance affords a closer look at the association between language and PISA reading scores within countries, especially countries with linguistic diversity. The value of this additional context is illustrated using four particular countries and their language pairs: Switzerland, Finland, Qatar, and Israel.

Switzerland: Multilingualism Afforded Multiple Language Distances Associations with PISA Reading Scores; Results Most Impactful for Portuguese Speakers with Effect Size of 0.09 Depending on Choice of Schooling Language

The first example context is Switzerland, where Portuguese speaking immigrant students scored differently depending on their schooling language (see Table 22). The Swiss education system offers schooling in multiple languages (i.e., German, French, Italian, or Romansh) (Kużelewska, 2016). Unsurprisingly, immigrant students who used the same language in school and home did not suffer a language penalty. These students made up the reference group for interpreting the score differences. There were 121 cases of immigrants who used Portuguese in the home but attended school in a different language, because Portuguese speaking immigrant students did not have a Portuguese schooling option. Amongst immigrant students who used Portuguese at home but a different language at school, using French at school (n=66) was associated with a PISA reading score 24.50 points lower than immigrant students who used the same language at both home and school (e.g., 78.74 * -0.31 = -24.50).
Using Italian at school (n=11) was associated with a PISA reading score 20.41 points lower than those using the same language at home and school. Using German at school (n=44) was associated with a PISA reading score 28.99 points lower.

Table 22
Score Differentials for Portuguese to French, German, & Italian

Home Language    School Language    Count    LD       Score Difference    Effect Size
French           French             38       0.00     0.00                NA
Portuguese       French             66       78.74    -24.50              -0.25
Italian          French             11       78.25    -24.26              -0.24
Italian          Italian            54       0.00     0.00                NA
Portuguese       Italian            11       65.86    -20.41              -0.20
German           German             79       0.00     0.00                NA
Portuguese       German             44       93.53    -28.99              -0.29
Albanian         German             16       95.68    -29.66              -0.30
Italian          German             12       86.30    -26.75              -0.27

While it is useful to see how each group compares to the reference group, language distance is most useful for comparisons between language groups. For example, the difference between attending an Italian school and a German school was 8.58 points. This tells us that, for Portuguese speaking immigrant students, attending an Italian language school rather than a German one was associated with an effect size of approximately 0.09. While these effect size differences may seem small at first glance, within the context of other studies, they are indeed impactful, since prior research suggests a typical effect size range of 0.10 to 0.20 for the association between many education efforts and PISA scores. Therefore, language distance is as impactful as many other educational actions. These results call attention to the effect that school language choice can have on immigrant students.

Finland: Language Pairings Evenly Distributed Between Finnish, Estonian, and Russian; Language Distance Between Estonian and Russian Associated with Effect Size of 0.15 on PISA Reading Scores

The second example context was Finland, where Estonian and Russian speakers scored differently within Finnish schools (see Table 23).
Immigrant students who used the same language in school and home (n=30) did not suffer a language penalty. These students made up the reference group for interpreting the score differences. Amongst 57 cases of immigrants who used Finnish at school but a different language at home, students using Estonian at home were associated with a PISA reading score 14.74 points lower than immigrant students who used the same language at both home and school. Students using Russian at home were associated with a PISA reading score 31.03 points lower than those using the same language at both locations. The difference between using Estonian at home versus Russian at home was 16.29 points. This tells us that speaking Russian rather than Estonian at home was associated with an effect size of about 0.15. These results highlight the language-based disparity between Finland’s two most common immigrant language groups, suggesting which groups likely require more language support.

Table 23
Score Differentials For Finnish to Estonian & Russian

Home Language    School Language    Count    LD        Score Difference    Effect Size
Finnish          Finnish            30       0.00      0.00                NA
Estonian         Finnish            31       47.55     -14.74              -0.15
Russian          Finnish            26       100.11    -31.03              -0.31

Qatar: Arabic and English Were Common Instructional Languages; English Language Instruction Associated with 0.30 Effect Size on PISA Reading Scores

The third example context was Qatar, where 262 immigrant students chose to attend school in a different language from their home language. In Qatar, English is used as a Medium of Instruction (EMI) alongside Arabic (Hillman & Ocampo, 2018; Mustafawi & Shaaban, 2019). While most immigrant students used the same language in school and home (n=1123) and did not suffer a language penalty, there was interest in English language instruction from about 250 cases of either Egyptian or Jordanian students immigrating to Qatar to go to school in English while still using Arabic at home.
However, this interest in English language schooling was associated with language-based challenges (see Table 24). Students using English at school were associated with a PISA reading score 30.31 points lower than immigrant students who used the same language at both home and school. The difference between using Arabic at home and English at school was 30.31 points. This tells us that Qatar’s English language instruction option was associated with a 0.30 effect size. These results highlight the draw that English language instruction may be having for Qatar and the anticipated effects that may follow if English instruction demand continues to grow.

Table 24
Score Differentials Based on Arabic to English & Hindi

Home Language    School Language    Count    LD       Score Difference    Effect Size
Arabic           Arabic             1233     0.00     0.00                NA
English          Arabic             51       97.76    -30.31              -0.30
Arabic           English            262      97.76    -30.31              -0.30
Hindi            Arabic             9        77.91    -24.15              -0.24

Israel: Immigration from Jewish Ethiopians Led to Ethiopic and Hebrew Language Pairings; Speaking Ethiopic Languages Associated with 0.07 Effect Size Compared to English

The fourth example context was Israel. In Israel, Hebrew and Arabic are the two languages of instruction (Kelly et al., 2020). However, only Hebrew language schools were present in this data set. Immigrant students who used the same language in school and home (n=69) did not suffer a language penalty (see Table 25). These students made up the reference group for interpreting the score differences. There were 75 cases of immigrants who used Hebrew at school but a different language at home. Students using English at home were associated with a PISA reading score 30.03 points lower than immigrant students who used the same language at both home and school. Students using an Ethiopic language at home were associated with a PISA reading score 22.70 points lower. The difference between using English at home and using an Ethiopic language at home was 7.33 points.
This tells us that speaking Ethiopic languages was associated with an effect size of about 0.07 compared to English. These results are meaningful because the Ethiopian immigrants in this sample are likely Jewish Ethiopians. Other research shows that Jewish Ethiopians have difficulties with post-immigration integration, resulting in economic, educational, and social stress (Berhanu, 2005; Ringel et al., 2005). However, when looking at just the language distance results, they suggested that the Jewish Ethiopian students may have a linguistic advantage over other groups, which highlights how this group could benefit less from language support and more from social support.

Table 25
Score Differentials Based on Hebrew to English, Ethiopic, & Arabic

Home Language    School Language    Count    LD       Score Difference    Effect Size
Hebrew           Hebrew             69       0.00     0.00                NA
Arabic           Hebrew             11       78.11    -24.21              -0.24
English          Hebrew             28       96.88    -30.03              -0.30
Ethiopic         Hebrew             20       73.23    -22.70              -0.23
French           Hebrew             16       93.05    -28.85              -0.29

Summary of the Results

One descriptive result was that students immigrated asymmetrically from a larger set of origin countries to a smaller set of destination countries. A second descriptive result was the frequency of same language pairs amongst immigrant students. There were many cases of immigrant students who used the same language at home as they did in school. The primary inferential result of the analysis was the negative association between language distance and PISA reading scores. A one unit increase in language distance was associated with a -0.31 unit change in reading score, when holding all other variables constant. The language distance model had practical significance because of the additional context that language distance provided around students’ language use. Language distance affords a closer look at the association between language and PISA reading scores in countries, especially in countries with linguistic diversity.
Four countries and their language pairs illustrated the value of this additional context: Switzerland, Finland, Qatar, and Israel.

DISCUSSION

The Meaning & Significance of the Principal Finding

The principal finding was the statistical significance of the language distance measure. The meaning of that finding was that linking PISA data with language distance data did indeed provide additional context for interpreting how home and school language is associated with students' PISA reading scores. This finding is significant because it affords a deeper analysis into the association between language characteristics, which were imparted by the origin and destination countries, and academic outcomes. Language distance is a continuous scale that supports richer interpretation than the simpler binary language match measure that PISA can provide. This utility was shown in the four example country contexts. This deeper analysis could not be conducted using the PISA data alone.

The Utility of Language Distance Over Time

An affordance of language distance is its utility for measuring the variation between language groups across time. One example was shown in the Switzerland results. The Switzerland results suggested paying attention to the effect of school language choice for a particular immigrant group (i.e., Portuguese speakers). But the groups that require attention are not static. Table 26 indicates changes in the top immigration groups within Switzerland during a recent 10-year period. First, the dominant group became less dominant (i.e., Germany). Second, two groups increased representation (i.e., Italy & France). Third, the major group of interest decreased (i.e., Portugal). Finally, a new emerging group has appeared (i.e., Romania). Together, these changes highlight that as the language groups shift over time, the utility of language distance for measuring the variation between groups remains.
Table 26
Changes in Top Language Groups within Switzerland

Year: 2010-19       Year: 2020
Germany (16.8%)     Germany (14%)
Italy (10.7%)       Italy (12%)
France (9.2%)       France (11.5%)
Portugal (9.1%)     Portugal (5.5%)
Spain (4.1%)        Spain (4%)
UK (3%)             Romania (3.5%)
Poland (2.7%)       Poland (3.3%)

Source: (OECD, 2022)

Another example of the utility of language distance across time comes from the Finland results. In Finland, these results highlighted the language-based disparity between Finland’s two most common immigrant language groups (i.e., Estonian and Russian), suggesting which groups likely require more language support. The languages of instruction in Finland are primarily Finnish or, to a lesser extent, Swedish (Latomaa & Nuolijärvi, 2002). These are two languages without widespread global use, which means that most immigrant students to Finland will be coming from an origin country that does not use Finnish or Swedish. Importantly, there have been shifts in the top immigration groups over time (see Table 27). In Finland, in the year 2015, people from Estonia (16%) and Russia (10%) were the top two nationalities as a percentage of total incoming migrants (OECD, 2017). However, by 2020, people from Estonia had decreased to 7% while people from Russia had increased to 15% (OECD, 2022). These trends suggest an increasing importance for Finnish educators to attend to Russian immigrants, as the language distance results suggest large scoring gaps for the latter group.

Table 27
Changes in Top Language Groups within Finland

Year: 2015        Year: 2020
Estonia (16%)     Estonia (7%)
Russia (10%)      Russia (15%)

Source: (OECD, 2017; OECD, 2022)

The Utility of Language Distance For Anticipating Language Challenges

In Qatar, the language distance results highlighted the draw that English language instruction may be having for Qatar and the anticipated effects that may follow if English instruction demand continues to grow, especially without investment in language support.
The bifurcation of the Qatari educational system by either Arabic or English has caused some social discord regarding the adoption of Western, English language culture versus local Qatari traditions (Eslami et al., 2020). Furthermore, research on trends in language use in Qatar suggests that Arabic’s status is falling while the status of English has increased due to perceptions of advantages for education, business, and integration with international affairs (Ellili-Cherif & Alkhateeb, 2015). The policies that have brought English as a medium of instruction into the country have necessitated the introduction of different types of EMI approaches, including foundational/bridge programs to build language skills, or bilingual programs that support the use of both languages (Eslami et al., 2020). Whichever approaches are taken, the results of this study support the usefulness of language-specific academic supports for immigrant learners, regardless of whether learners come from Arabic speaking homes into English speaking classrooms or vice versa.

The Utility of Language Distance for Parsing Out Language from Culture

In Israel, the language distance results identified which immigrant students likely require the most academic support, based on language characteristics of immigrating students. There were multiple language pairings between Hebrew and English, Ethiopic, French, or Arabic. Among immigrant students in Israel, the smallest language deficit was associated with Ethiopic languages. Israel received multiple periods of immigration from Jewish Ethiopians, people of Jewish ancestry residing in present-day Ethiopia (Ringel et al., 2005). Importantly, Jewish Ethiopians who immigrate to Israel have difficulties with integration into the destination country society, resulting in economic and social stress (Ringel et al., 2005).
Some of the reasons for this include racial discrimination, intergenerational conflicts amongst Jewish Ethiopian families, and differences in communication styles (Ringel et al., 2005). Along educational dimensions, this particular immigrant group also lags in academic performance, partially due to the local school system being unfamiliar with how identity, belongingness, and negotiation of meaning take place within the cultural groups (Berhanu, 2005). In terms of language, similar results were reported in the current study. However, when looking at just the language distance results, they suggested that the Jewish Ethiopian students may have a linguistic advantage over other groups, which highlights how this group could benefit more from social support and less from language support.

The Meaning & Significance of the Analytic Method Developed for This Study

The analytic method also had meaning and significance. The meaning of this analytic method was that it provided one solution to the modeling challenge that the asymmetric, cross-nested data structure presented. This helped overcome the constraints of the statistical modeling software. The findings of the pre-work suggested that all levels should be addressed in the modeling of students’ reading achievement on PISA 2018. The significance of this finding was that it suggested that each level of the data has something important to contribute to explaining the outcome measure. In addition, Stage 1 produced a list of 12 destination countries that have larger than typical influence on student reading scores. The significance of this finding was that these 12 destination countries could be used as fixed effects covariates. Controlling for them resolves the issue with complex cross-nesting because it essentially removes destination country as its own level and instead places the impact of destination country within the model as a covariate.
This allowed the statistical modeling software to specify the desired models, something that was not previously possible due to the constraints of the statistical modeling software already explained in the methods section.

Connecting the Findings Back to the Literature Review

This section connects the findings of the study back to the literature review topics. The first review topic highlighted that immigrant student achievement on PISA has typically been studied using secondary analyses and that there are numerous benefits in doing so (Donnellan et al., 2011; Torney-Purta & Amadeo, 2013a). This study continued in this tradition by also conducting a secondary analysis. The second review topic was that demographic characteristics (e.g., SES, gender, language, nationality, parent occupation, etc.) were associated with PISA outcomes (Aloisi & Tymms, 2017). The findings of the study supported this prior research with the significance of a measure based on a demographic characteristic: language. The third review topic was that PISA results have shown mixed achievement amongst immigrant students (Schleicher, 2006). The findings of this study supported this prior research since the results also showed that immigrant students in this sample were a diverse subgroup with asymmetric immigration patterns, used a variety of languages, and scored differently on PISA reading. The fourth review topic was the increasing immigration in most countries (OECD, 2019a). This study did not investigate this longitudinal trend.

The major critique of prior research was that secondary PISA research on immigrant students centers destination country characteristics over origin ones. This study addressed this critique by linking and then testing a set of origin measures, finding one that did indeed enrich the analysis. A minor critique of prior data linking studies was regarding the lagging adoption of studies linking origin data with education outcomes.
This study addressed this research gap by entering the space and contributing a set of results towards this research area.

Implications of This Study

Focus and Design of Future Studies

First, there were implications for the focus and design of future studies. One implication is that it is important to include or at least address each level of multilevel data in related analyses. Another implication is that the multi-stage analytic method used in this study may be used to overcome software limitations of other studies with asymmetric, cross-classified data. Yet another implication is the potential to explore other data sets that can be linked with large-scale international assessments to enhance the analysis. One more implication is that efforts towards linking data at the lowest level possible may require pursuing restricted-use data, which is more likely to contain identifiable information at the student-level compared to higher level data with just country-level data. While open access data sets can be easier to find and work with, there may be more benefit in obtaining access to more restricted use data, with its greater degree of identifying information and therefore more potential attachment points for linking data.

Measurement in Large-Scale Assessments

Next are implications for measurement in large-scale assessments. The primary implication is that student-level origin data may better enhance the analysis over country-level origin data. Therefore, if ILSAs ever alter their scope of data collection, student-level origin data should be increased.

Data Linking Studies

There were implications for the level at which data sets are linked. The framework presented in the review of research demonstrated that there are multiple dimensions that serve as attachment points for which to link data (Bray & Thomas, 1995; Strietholt & Scherer, 2016). This study focused on two of those dimensions.
One dimension was nonlocational demographic groupings (e.g., language distance). Another dimension was the geographic/locational dimension (e.g., country-level). The geographic/locational dimension includes seven levels ranging from the macro level to micro level: world regions/continents, countries, states/provinces, districts, schools, classrooms, and individuals. The results of this study suggest that characteristics of the student may be more influential than the origin country characteristics. For instance, the one significant measure, language distance, was a student-level measure. This interpretation aligns with the findings from the analysis of the variance decomposition of the unconditioned models, which showed origin country characteristics with the lowest variance explained: student-level (~39%), school-level (~39%), and country-level (~27%). However, these same results also show that each level still merits attention for explaining immigrant students’ PISA reading results. Therefore, the implication is that the origin country is worth investigating and that, when linking PISA with origin country characteristics, those characteristics should be as close as possible to the individual-level (i.e., students), since higher levels (i.e., countries) provide less explanatory value. The student-level language distance measure accomplished this by measuring how the origin country language carries over into home life after immigration, while the particular country-level measures may have been too broad.

A counterpoint to linking at the lowest possible level is that there is a tradeoff for using lower level measures. The lower the level, the less data are readily available to link to each participant in the principal data set. An illustrative example with PISA comes from this study itself.
At the most micro level of PISA (i.e., student), some of the PISA participants used very uncommon languages at home (e.g., “OTHER FORMER YUGOSLAVIAN LANGUAGES”), which meant the linked language distance data was missing and these students were not included in the analysis. Conversely, at the most macro level of PISA (i.e., country), every student had an associated destination and origin country, which meant no students needed to be excluded for missing country-level data. Additionally, as missing data increase, it becomes more challenging to adequately model the outcome variable. This study dealt with this using listwise deletion, removing cases where data was missing. Therefore, the more micro level the research questions and the more selective the samples, the less likely it is that linkable data sets exist and that there will be meaningful results to report.

Practice & Policy in Education

Finally, there were implications for educational practice. One implication is to plan instruction around students’ particular language pairs and corresponding language distances, instead of just origin country identity. This is in service of culturally responsive teaching (Gay, 2000; Gay, 2002). Culturally responsive teaching centers students’ cultural characteristics, experiences, and perspectives (Gay, 2000; Gay, 2002). While two students may share the same origin country identity, they can have different origin languages. Closer attention to those pairs and the degree to which they differ is a recognition of the cultural variation that exists amongst students who share a country identity. Specifically, a funds of knowledge approach to instruction is suggested. The term “funds of knowledge” is in reference to the collective knowledge and skills that are historically and culturally developed by students (Moll et al., 2006).
A funds of knowledge approach to instruction highlights the strengths, interests, identities, social backgrounds, and cultural backgrounds that students bring from their origin countries (Moll et al., 2006). Teachers can incorporate linguistic funds of knowledge into their instruction. For instance, a literacy lesson can be planned around the loan words shared between students' origin and destination language. This affords immigrant students opportunities to use their existing origin language resources while continuing to develop their origin language. Furthermore, it affirms cultural and linguistic identities, which has been shown to improve academic achievement (Wu et al., 2021).

A second implication is for education policy, particularly for schools to allocate language support resources around the particular language pairs most associated with lower reading scores. Language support services encompass a wide range of services. For instance, effective language support services include many accommodations ranging from computer-administered glossaries to fully-translated assessments (Pennock-Roman & Rivera, 2011). Another example is providing translations on assessments so that language ability isn’t confounded with cognitive ability (Alt et al., 2013). This can reduce cases where language learning students are misdirected towards special education services (Sanchez et al., 2010). Similar recommendations have been made for school psychologists to use cognitive assessments in multiple languages or non-verbal versions, to reduce referrals to special needs services when language services are needed (Olvera & Gomez-Cerrillo, 2011). Taking findings from this study, the results suggested that in Finland, immigrant students who use Russian at home struggled compared to students using Estonian at home.
Therefore, language accommodations such as extra time on assessments can be differentiated by particular languages (i.e., Russian speaking students require more assessment time).

Another implication for education policy is regarding school accountability. It is recommended to keep a record of whether students have ever received language support services, even after exiting language support services (Robinson-Cimpian et al., 2016). This allows schools to continue to assess students’ language growth past the point of language proficiency deemed acceptable by the school. Likewise, recording students' language distances provides a starting reference point from which to gauge student academic growth after they exit language support services. Retaining a record of language distances can show a school which language pairs progress faster or slower relative to each other, rather than just knowing that students have reached the school’s desired language proficiency levels. For instance, schools can know if the approximately 15 point deficit between Russian and Estonian speakers in Finland remains constant as the two groups progress through language milestones over time.

Non-Significant Findings & Reconsidering Assumptions

The non-significant findings are important to discuss in relation to both the phenomenon investigated in this study (i.e., reading achievement of students with an immigrant background on PISA) and methodological considerations. They were also interpreted for how they prompted the researcher to reconsider prior assumptions. Prior to the analysis, there was a hypothesis that PISA did not have adequate coverage of important characteristics imparted by immigrant students' origin country. Additionally, there was a hypothesis that linking origin characteristics with ILSAs would therefore enhance related analyses. The findings of this study suggested a nuanced answer to these hypotheses.
On the one hand, the language distance result did support these hypotheses because the addition of language distance allowed for a more nuanced analysis for particular countries with particular language pairs, something that existing PISA data alone could not provide. On the other hand, the other linked measures did not enhance the analysis. Therefore, these results prompted a reconsideration of these two assumptions. One reconsideration is whether PISA 2018 already contains sufficient depth and breadth, such that the PISA data overlapped with whatever the linked data brought to the analysis, thereby reducing the benefit of linking additional data. For instance, perhaps PISA’s measure of social, economic, and cultural status already captures what the linked Human Development Index measure brought to the analysis. A second reconsideration is the importance of data levels for addressing PISA data collection gaps. Specifically, any linked origin characteristics should be as close as possible to the individual-level (i.e., students), since higher levels (i.e., countries) may provide less explanatory value, at least for the measures tested within this study.

Future Research Extending from this Study

One future extension from this study is to model language distance on top of language match (i.e., same home and schooling language). This would combine the linked contribution of language distance with the already available language match measure derived from PISA data. The reason is to model the degree of contribution that language distance provides beyond a binary language match covariate. To operationalize this, a binary language match variable would be used in the analysis. This variable captures whether each student’s home and schooling languages are the same. A value of 1 is assigned to cases where the home and school language match. A value of 0 is assigned to cases where the languages do not match.
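The coding rule just described can be sketched as follows. This is a minimal illustration, assuming each student record carries a home language and a school language as plain strings; the field names (home_language, school_language, language_match) and both helper functions are hypothetical and are not PISA variable names.

```python
from typing import Dict, List


def language_match(home_language: str, school_language: str) -> int:
    """Return 1 when home and school languages are the same, otherwise 0."""
    # Normalize whitespace and case so that "Finnish" and " finnish" compare equal.
    same = home_language.strip().lower() == school_language.strip().lower()
    return 1 if same else 0


def add_language_match(students: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return copies of the records with a binary 'language_match' field added."""
    return [
        {**s, "language_match": language_match(s["home_language"], s["school_language"])}
        for s in students
    ]


# Hypothetical records for illustration; not drawn from the PISA data.
example = [
    {"home_language": "Finnish", "school_language": "Finnish"},
    {"home_language": "Russian", "school_language": "Finnish"},
]
coded = add_language_match(example)
```

The resulting 0/1 variable could enter a model as a covariate before language distance is added, so that the contribution of the two measures can be compared.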
Then multilevel models would be specified first using language match followed by language distance, comparing the change in model between the two.

A second future extension is to explore the non-linear effect of language distance. The reason is to explore whether the linear relationship found in this study (i.e., -0.31 coefficient for language distance) has a turning point somewhere along the slope line (e.g., effect wears off at some point). To operationalize this, a new variable would be created. This variable would be the language distance variable squared. Then multilevel models would be specified using the squared language distance variable along with the linear effect (i.e., already established language distance variable).

A third future extension is to interact the time in the destination country with language distance in a statistical model. The reason for this is to explore whether language distance matters less the longer a student is in the country. To operationalize this, a derived variable for time since immigration would need to be created. This was already done for this study during the data processing procedures.

A fourth future extension is to interpret the fixed effects for the twelve destination countries. The reason for this is to investigate what made those destinations different from the rest. To operationalize this, the countries could be interpreted in a qualitative or mixed-methods manner. As noted earlier, quantitative secondary analysis of ILSAs can generate research questions to be answered by smaller mixed-methods studies (Torney-Purta & Amadeo, 2013a).

A fifth future extension is to switch the focus of the study from origin countries to destination countries. The reason would be to explore the characteristics imparted by the destination country which impact students’ PISA reading achievement. These results could then be paired with the results from the origin country analysis to tell a more holistic story.
To operationalize this, the analysis would again use a multi-stage approach to control for origin country fixed effects and explore the destination country variables that most impact student academic achievement.

One final extension of this study is to conduct a repeat analysis on the recently released PISA 2022 data. The reason would be to add a longitudinal component to this existing analysis. To operationalize this, the procedures for this study would be repeated but with improvements based on lessons learned from this initial study.

Limitations of the Study

There were a few limitations of the study to highlight. One limitation is that PISA anonymizes individual- and school-level data. This makes it harder to link data at these lower levels and easiest to link by country identity. This is important because the results of this study suggested linking data at the lowest level possible.

A second limitation was the fidelity with which immigrant students were included within the initial PISA sample. The PISA technical documents explain how the PISA sample was defined (OECD, 2019c). For instance, there were exclusion criteria for students who had insufficient experience in the assessment language (i.e., non-native speaker; limited proficiency; less than one year of instruction). However, it is uncertain how well each country followed those criteria. Furthermore, there has been evidence that language learners are overrepresented in special education services because many are misidentified as such (Rueda & Windmueller, 2006; Sanatullova-Allison & Robison-Young, 2016). This means that some immigrant students who would otherwise be eligible for PISA could have been excluded for other reasons.

A third limitation comes from the results of sensitivity analysis. Sensitivity analysis is a technique to examine the robustness of an inference to unobserved or hypothetical conditions that cannot be directly addressed with the observed data (Frank et al., 2023b).
The results are useful for quantifying the uncertainty in a model due to bias from omitted variables or sampling variability (Frank et al., 2013; Frank et al., 2023b). An online tool for sensitivity analysis was used to assess how robust the language distance result was (Rosenberg et al., 2023). To invalidate the principal inference from Model 3, 13.79% of the estimate would have to be due to bias. This is based on a threshold of -0.265 for statistical significance (alpha = 0.05). Put differently, to invalidate the inference, 634 observations would have to be replaced with cases for which the effect is 0 (RIR = 634). There is no specific threshold for interpreting the robustness of an inference, as this is context dependent (Frank et al., 2013). However, replacing approximately 14% of the observed cases with cases where no relationship exists between language distance and PISA reading scores (i.e., counterfactual cases) could invalidate the inference. This replacement rate is not so high as to be improbable. For instance, the sample size for this study dropped from 14,246 cases to 9,493 cases due to missing data. If these cases were not missing data, there could conceivably be at least 634 counterfactual cases among them, which could introduce sampling bias to the degree needed to invalidate the inference. Generally, the higher the cost of taking action based on the inference, the higher the robustness of the inference should be (Frank et al., 2013).

A fourth limitation is related to plausible values. As already stated, the SSI HLM software does not handle plausible values in cross-nested multilevel models. Therefore, this study’s compromise was to use just a single plausible value when utilizing software with PV limitations. Overall, this approach is interpreted as providing less confidence in smaller score differences (Von Davier et al., 2009).
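The logic behind the sensitivity analysis figures reported above can be illustrated with a short calculation. The sketch below assumes the standard definition from Frank et al. (2013), in which the share of an estimate that must be due to bias equals one minus the ratio of the significance threshold to the estimate. It is not the code of the online tool used in this study. The function names are hypothetical, the inputs are the rounded values reported in the text (-0.31 and -0.265), and `n_obs=1000` is an invented sample size, since the number of cases entering Model 3 is not stated in this section.

```python
def percent_bias_to_invalidate(estimate: float, threshold: float) -> float:
    """Share of the estimate that must be due to bias to reach the threshold."""
    if estimate == 0 or threshold / estimate < 0:
        raise ValueError("estimate must be non-zero and share the threshold's sign")
    if abs(estimate) <= abs(threshold):
        raise ValueError("estimate does not exceed the significance threshold")
    return 1 - threshold / estimate


def cases_to_replace(estimate: float, threshold: float, n_obs: int) -> int:
    """Number of cases to replace with zero-effect cases (RIR), given n_obs."""
    return round(percent_bias_to_invalidate(estimate, threshold) * n_obs)


# Rounded values from the text: coefficient -0.31, threshold -0.265.
share = percent_bias_to_invalidate(estimate=-0.31, threshold=-0.265)
print(f"Share of estimate due to bias needed: {share:.1%}")

# Hypothetical sample size, used only to show how RIR scales with n.
hypothetical_rir = cases_to_replace(-0.31, -0.265, n_obs=1000)
print(f"Cases to replace out of 1000: {hypothetical_rir}")
```

With these rounded inputs the share comes out near 14.5%, close to the reported 13.79%; the small gap presumably reflects rounding of the reported coefficient and threshold.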
A fifth limitation is that this study cannot make statements of causality between the explanatory variables and the outcome variable; results are limited to interpreting associations between the variables. This is because PISA uses a cross-sectional design that does not control for prior achievement.

CONCLUSION

A secondary analysis of PISA 2018 data was conducted to investigate the reading achievement of students with an immigrant background. While prior research suggests the importance of demographic characteristics in secondary PISA analyses, a major critique is that PISA centers destination characteristics over origin ones, limiting the study of the association between origin characteristics and academic achievement. This study was proposed to address the critique by linking PISA data with data on origin country characteristics to allow for analyses that could not be conducted with the PISA data set alone. The study linked PISA data with data on origin country characteristics, centered the origin-based characteristics in the analyses, and then evaluated the utility of linking this additional data for explaining education outcomes. Multilevel statistical modeling was used to model the association between origin country characteristics and the academic achievement of students with an immigrant background. Linked data were: (1) Language Distance, a student-level measure of similarity between home and school language; (2) Human Development Index, a country-level measure of human development; (3) Global Adaptation Index, a country-level measure of climate vulnerability and readiness to improve resilience/climate adaptation; and (4) forced displacement ratio, a country-level ratio between inward/outward forced displacement. The principal finding was the statistical significance of the language distance measure.
This means that linking PISA data with language distance did indeed provide additional context for interpreting how home and school language is associated with students' PISA reading scores. This is significant because it affords a deeper analysis of the association between language characteristics, which were imparted by the origin and destination countries, and students’ academic outcomes. This utility was shown in four example country contexts (Switzerland, Finland, Qatar, and Israel). This deeper analysis could not be conducted using the PISA data alone. Implications were made for (1) focus and design of future studies; (2) measurement in large-scale assessments; (3) data linking studies; and (4) practice and policy in education.

REFERENCES

Aloisi, C., & Tymms, P. (2017). PISA trends, social changes, and education reforms. Educational Research and Evaluation, 23(5-6), 180-220. https://doi.org/10.1080/13803611.2017.1455290
Alt, M., Arizmendi, G. D., Beal, C. R., & Hurtado, J. S. (2013). The effect of test translation on the performance of second grade English learners on the KeyMath‐3. Psychology in the Schools, 50(1), 27-36. https://doi.org/10.1002/pits.21656
Ammermueller, A. (2007). Poor background or low returns? Why immigrant students in Germany perform so poorly in the programme for international student assessment. Education Economics, 15(2), 215-230. https://doi.org/10.1080/09645290701263161
Andon, A., Thompson, C. G., & Becker, B. J. (2014). A quantitative synthesis of the immigrant achievement gap across OECD countries. Large-Scale Assessments in Education, 2(1), 1-20. https://doi.org/10.1186/s40536-014-0007-2
Arel-Bundock, V., Enevoldsen, N., & Yetman, C.J. (2018). Countrycode: an R package to convert country names and country codes. Journal of Open Source Software, 3(28), 848. https://doi.org/10.21105/joss.00848
Arikan, S., Van de Vijver, F. J., & Yagmur, K. (2017).
PISA mathematics and reading performance differences of mainstream European and Turkish immigrant students. Educational Assessment, Evaluation and Accountability, 29(3), 229-246. https://doi.org/10.1007/s11092-017-9260-6
Azzolini, D., Schnell, P., & Palmer, J. R. (2012). Educational achievement gaps between immigrant and native students in two “new” immigration countries: Italy and Spain in comparison. The Annals of the American Academy of Political and Social Science, 643(1), 46-77. https://doi.org/10.1177/0002716212441590
Bailey P., Emad A., Huo H., Lee M., Liao Y., Lishinski A., Nguyen T., Xie Q., Yu J., Zhang T., Buehler, E., Bundsgaard J., C'deBaca R., & Christensen AA. (2023). EdSurvey: Analysis of NCES education survey and assessment data. R package version 3.1.0, https://www.air.org/project/nces-data-r-project-edsurvey
Bailey, P., Kelley, C., Nguyen, T., Huo, H., & Kjeldsen, C. (2021). WeMix: Weighted mixed-effects models using multilevel pseudo maximum likelihood estimation. R package version 4.0.0, https://github.com/American-Institutes-for-Research/WeMix
Bakker, D., Müller, A., Velupillai, V., Wichmann, S., Brown, C. H., Brown, P., Egorov, D., Mailhammer, R., Grant, A., & Holman, E. W. (2009). Adding typology to lexicostatistics: A combined approach to language classification. https://doi.org/10.1515/LITY.2009.009
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Beaton, A.E., Rogers, A.M., Gonzalez, E., Hanly, M.B., Kolstad, A., Rust, K.F., Sikali, E., Stokes, L., & Jia, Y. (2011). The NAEP Primer (NCES 2001-463). U.S. Department of Education, National Center for Education Statistics. Washington, DC.
Beenstock, M., Chiswick, B.R., & Repetto, G.L. (2001). The effect of linguistic distance and country of origin on immigrant language skills: application to Israel. International Migration, 39(3), 33–60.
https://doi.org/10.1111/1468-2435.00155
Berhanu, G. (2005). Normality, deviance, identity, cultural tracking and school achievement: The case of Ethiopian Jews in Israel. Scandinavian Journal of Educational Research, 49(1), 51-82. https://doi.org/10.1080/0031383042000302137
Braun, H., Coley, R., Jia, Y., & Trapani, C. (2009). Exploring what works in science instruction: A look at the eighth-grade science classroom. Policy Information Report. Educational Testing Service. https://eric.ed.gov/?id=ED507837
Bray, M., & Thomas, R. M. (1995). Levels of comparison in educational studies: Different insights from different literatures and the value of multilevel analyses. Harvard Educational Review, 65(3), 472-491. Retrieved from https://www.proquest.com/docview/212255856
Carnoy, M., Khavenson, T., Loyalka, P., Schmidt, W. H., & Zakharov, A. (2016). Revisiting the relationship between international assessment outcomes and educational production: evidence from a longitudinal PISA-TIMSS sample. American Educational Research Journal, 53(4), 1054–1085. https://doi.org/10.3102/0002831216653180
Cattaneo, M. A., & Wolter, S. C. (2015). Better migrants, better PISA results: Findings from a natural experiment. IZA Journal of Migration, 4(1), 1-19. https://doi.org/10.1186/s40176-015-0042-y
Chen, C., Noble, I., Hellmann, J., Coffee, J., Murillo, M., & Chawla, N. (2015). University of Notre Dame global adaptation index. University of Notre Dame: Notre Dame, IN, USA. https://gain.nd.edu/
Chen, C., Hellmann, J., Berrang-Ford, L., Noble, I., & Regan, P. (2018). A global assessment of adaptation investment from the perspectives of equity and efficiency. Mitigation and Adaptation Strategies for Global Change, 23, 101-122. https://doi.org/10.1007/s11027-016-9731-y
Chiswick, B. R., & Miller, P. W. (1994). Language choice among immigrants in a multi-lingual destination. Journal of Population Economics, 7(2), 119-131. https://doi.org/10.1007/BF00173615
Chiswick, B. R., & Miller, P. W.
(2005). Linguistic distance: A quantitative measure of the distance between English and other languages. Journal of Multilingual and Multicultural Development, 26(1), 1-11. https://www.tandfonline.com/doi/abs/10.1080/14790710508668395
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
Corder, S. P. (1981). Error Analysis and Interlanguage. Oxford: Oxford University Press.
Dervis, K., & Klugman, J. (2011). Measuring human progress: the contribution of the Human Development Index and related indices. Revue d'économie politique, 121(1), 73-92. https://doi.org/10.3917/redp.211.0073
Donnellan, M., Trzesniewski, K., & Lucas, R. (2011). Introduction. In K. Trzesniewski, M. Donnellan, & R. Lucas (Eds.), Secondary Data Analysis: an introduction for psychologists (pp. 3-10). Washington, DC: American Psychological Association. https://doi.org/10.1037/12350-000
Dryden-Peterson, S. (2015). The educational experiences of refugee children in countries of first asylum. British Columbia Teachers' Federation.
Ellili-Cherif, M., & Alkhateeb, H. (2015). College students' attitude toward the medium of instruction: Arabic versus English dilemma. Universal Journal of Educational Research, 3(3), 207-213. https://doi.org/10.13189/ujer.2015.030306
Eslami, Z. R., Graham, K. M., & Bashir, H. (2020). English Medium Instruction in higher education in Qatar: A multi-dimensional analysis using the ROAD-MAPPING framework. In: Dimova, S., Kling, J. (eds) Integrating content and language in multilingual universities, Educational Linguistics, vol 44. 115-129. https://doi.org/10.1007/978-3-030-46947-4_7
Frank, K.A., Maroulis, S., Duong, M., & Kelcey, B. (2013). What would it take to change an inference? Using Rubin’s causal model to interpret the robustness of causal inferences. Educational Evaluation and Policy Analysis, 35, 437-460.
https://doi.org/10.3102/0162373713493129
Frank, K.A., Lin, Q., & Maroulis, S.J. (2023a). Embracing essential discourse in educational policy about causal inferences from observational studies: towards pragmatic social science. Handbook on Educational Policy Research. American Educational Research Association.
Frank, K.A., Lin, Q., Xu, R., Maroulis, S.J., & Mueller, A. (2023b). Quantifying the robustness of causal inferences: sensitivity analysis for pragmatic social science. Social Science Research, 110, 102815. https://doi.org/10.1016/j.ssresearch.2022.102815
Figueiredo, S., Alves Martins, M., & Silva, C. F. D. (2016). Second language education context and home language effect: language dissimilarities and variation in immigrant students’ outcomes. International Journal of Multilingualism, 13(2). https://doi.org/10.1080/14790718.2015.1079204
Gamallo, P., Pichel, J. R., & Alegria, I. (2017). From language identification to language distance. Physica A: Statistical Mechanics and its Applications, 484, 152-162. https://doi.org/10.1016/j.physa.2017.05.011
Garson, G. D. (2013). Hierarchical linear modeling: Guide and applications. Sage.
Gay, G. (2000). Culturally responsive teaching: Theory, research, and practice. New York: Teachers College Press.
Gay, G. (2002). Culturally responsive teaching in special education for ethnically diverse students: Setting the stage. International Journal of Qualitative Studies in Education, 15(6), 613-629. https://doi.org/10.1080/0951839022000014349
Ghislandi, S., Sanderson, W. C., & Scherbov, S. (2019). A simple measure of human development: The human life indicator. Population and Development Review, 45(1), 219. https://doi.org/10.1111/padr.12205
Giuntella, O., Kone, Z. L., Ruiz, I., & Vargas-Silva, C. (2018). Reason for immigration and immigrants' health. Public Health, 158, 102-109. https://doi.org/10.1016/j.puhe.2018.01.037
Goodwin, A. L. (2020). Globalization, global mindsets and teacher education.
Action in Teacher Education, 42(1), 6-18. https://doi.org/10.1080/01626620.2019.1700848
Grimes, J. E., & Grimes, B. F. (1993). Ethnologue: Languages of the World (13th ed.). Dallas: Summer Institute of Linguistics, Inc.
Hart-Gonzalez, L., & Lindemann, S. (1993). Expected achievement in speaking proficiency, 1993. School of Language Studies, Foreign Services Institute, Department of State.
Heath, S. B. (1983). Ways with words: Language, life and work in communities and classrooms. Cambridge University Press.
Hillman, S., & Ocampo Eibenschutz, E. (2018). English, super‐diversity, and identity in the State of Qatar. World Englishes, 37(2), 228-247. https://doi.org/10.1111/weng.12312
Hox, J. J., Moerbeek, M., & Van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. Routledge.
Hu, J., & Wang, Y. (2022). Influence of students’ perceptions of instruction quality on their digital reading performance in 29 OECD countries: A multilevel analysis. Computers & Education, 189, 104591. https://doi.org/10.1016/j.compedu.2022.104591
Jain, T. (2017). Common Tongue: The Impact of Language on Educational Outcomes. The Journal of Economic History, 77(2), 473-510. https://doi.org/10.1017/S0022050717000481
Jerrim, J., Oliver, M., & Sims, S. (2022). The relationship between inquiry-based teaching and students’ achievement. New evidence from a longitudinal PISA study in England. Learning and Instruction, 80, 101310. https://doi.org/10.1016/j.learninstruc.2020.101310
Kang, H. S. (2013). Korean-immigrant parents’ support of their American-born children’s development and maintenance of the home language. Early Childhood Education Journal, 41, 431-438. https://doi.org/10.1007/s10643-012-0566-1
Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast creation of dummy (binary) columns and rows from categorical variables. Version 1.7.1. https://github.com/jacobkap/fastDummies
Kelly, D.L., Centurino, V.A.S., Martin, M.O., & Mullis, I.V.S. (Eds.) (2020).
TIMSS 2019 encyclopedia: Education policy and curriculum in mathematics and science. Retrieved from Boston College, TIMSS & PIRLS International Study Center website: https://timssandpirls.bc.edu/timss2019/encyclopedia/
Klugman, J., Rodríguez, F., & Choi, H. J. (2011). The HDI 2010: new controversies, old critiques. The Journal of Economic Inequality, 9, 249-288. https://doi.org/10.1007/s10888-011-9178-z
Komatsu, H., & Rappleye, J. (2021). Rearticulating PISA. Globalisation, Societies and Education, 19(2), 245-258. https://doi.org/10.1080/14767724.2021.1878014
Kosnik, C., Beck, C., & Goodwin, A. L. (2016). Reform efforts in teacher education. In International handbook of teacher education (pp. 267-308). Springer, Singapore. https://doi.org/10.1007/978-981-10-0366-0_7
Kovacevic, M. (2010). Review of HDI critiques and potential improvements. Human development research paper, 33, 1-44.
Kużelewska, E. (2016). Language policy in Switzerland. Studies in Logic, Grammar and Rhetoric, 45(1), 125-140. https://doi.org/10.1515/slgr-2016-0020
Latomaa, S., & Nuolijärvi, P. (2002). The language situation in Finland. Current Issues in Language Planning, 3(2), 95-202. https://doi.org/10.1080/14664200208668040
Lau, K. L., & Ho, E. S. C. (2016). Reading performance and self-regulated learning of Hong Kong students: What we learnt from PISA 2009. The Asia-Pacific Education Researcher, 25, 159-171. https://doi.org/10.1007/s40299-015-0246-1
Ledger, S., Thier, M., Bailey, L., & Pitts, C. (2019). OECD’s Approach to Measuring Global Competency: Powerful Voices Shaping Education. Teachers College Record, 121(8), 1–40. https://doi.org/10.1177/016146811912100802
Levenshtein, V. I. (1966, February). Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady (Vol. 10, No. 8, pp. 707-710).
Liu, L. (2018).
“It’s Just Natural”: A Critical Case Study of Family Language Policy in a 1.5 Generation Chinese Immigrant Family on the West Coast of the United States. In: Sinner, M., Hult, F., Kupisch, T. (eds) Language Policy and Language Acquisition Planning. Language Policy, vol 15. Springer, Cham. p. 13-31. https://doi.org/10.1007/978-3-319-75963-0_2
Martin, M. O., Foy, P., Mullis, I. V., & O’Dwyer, L. M. (2011). Effective schools in reading, mathematics, and science at the fourth grade. TIMSS and PIRLS, 109-178. https://files.eric.ed.gov/fulltext/ED545256.pdf#page=117
Maskileyson, D., Semyonov, M., & Davidov, E. (2021). Economic integration of first‐and second‐generation immigrants in the Swiss labour market: Does the reason for immigration make a difference? Population, Space and Place, 27(6). https://doi.org/10.1002/psp.2426
Moll, L., Amanti, C., Neff, D., & Gonzalez, N. (2006). Funds of knowledge for teaching: Using a qualitative approach to connect homes and classrooms. In Funds of knowledge (pp. 71-87). Routledge.
Mustafawi, E., & Shaaban, K. (2019). Language policies in education in Qatar between 2003 and 2012: From local to global then back to local. Language Policy, 18, 209-242. https://doi.org/10.1007/s10993-018-9483-5
Nguyen, T. & Kelley, C. (2018). Methods used for estimating mixed-effects models in edsurvey. https://www.air.org/sites/default/files/EdSurvey-Mixed_Models.pdf
Nowak, A. C., & Rosenstock, T. S. (2020). Foundations for common approaches to measure global adaptation actions in the agriculture sector: Highlights from an analysis of existing climate adaptation frameworks. https://hdl.handle.net/10568/109718
Nilsen, T., & Gustafsson, J. E. (2014). School emphasis on academic success: exploring changes in science performance in Norway between 2007 and 2011 employing two-level SEM. Educational Research and Evaluation, 20(4), 308-327. https://doi.org/10.1080/13803611.2014.941371
Nyiwul, L. (2023).
Climate change adaptation innovation in the water sector in Africa: Dataset. Data in Brief, 46, 108782. https://doi.org/10.1016/j.dib.2022.108782
OECD (2009). “Plausible Values”, in PISA Data Analysis Manual: SPSS, Second Edition, OECD Publishing, Paris. https://doi.org/10.1787/9789264056275-7-en
OECD (2016). PISA 2015 results, Excellence and equity in education (Vol. I). Paris: OECD Publishing. https://doi.org/10.1787/9789264266490-en
OECD (2017). International migration outlook 2017. OECD Publishing. https://doi.org/10.1787/migr_outlook-2017-en
OECD (2019a). PISA 2018 Results (Volume I): What Students Know and Can Do, PISA, OECD Publishing, Paris, https://doi.org/10.1787/5f07c754-en
OECD (2019b). PISA 2018 Results (Volume II): Where All Students Can Succeed, PISA, OECD Publishing, Paris, https://doi.org/10.1787/b5fd1b8f-en
OECD (2019c). PISA 2018 technical report. https://www.oecd.org/pisa/data/pisa2018technicalreport/
OECD (2020). Global Teaching InSights: Technical Report, OECD Publishing, Paris, http://www.oecd.org/education/school/global-teaching-insights-technical-documents.htm
OECD (2022). International migration outlook 2022. OECD Publishing. https://doi.org/10.1787/30fe16d2-en
Olvera, P., & Gomez-Cerrillo, L. (2011). A bilingual (English & Spanish) psychoeducational assessment MODEL grounded in Cattell-Horn Carroll (CHC) theory: A cross battery approach. Contemporary School Psychology: Formerly “The California School Psychologist”, 15, 117-127. https://doi.org/10.1007/BF03340968
Pennock‐Roman, M., & Rivera, C. (2011). Mean effects of test accommodations for ELLs and non‐ELLs: A meta‐analysis of experimental studies. Educational Measurement: Issues and Practice, 30(3), 10-28. https://doi.org/10.1111/j.1745-3992.2011.00207.x
R Core Team (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Rabe-Hesketh, S., & Skrondal, A. (2006).
Multilevel modelling of complex survey data. Journal of the Royal Statistical Society Series A: Statistics in Society, 169(4), 805–827. https://doi.org/10.1111/j.1467-985X.2006.00426.x
Raudenbush, S.W., & Congdon, R.T. (2021). HLM 8: Hierarchical linear and nonlinear modeling. Chapel Hill, NC: Scientific Software International, Inc.
Ringel, S., Ronell, N., & Getahune, S. (2005). Factors in the integration process of adolescent immigrants: The case of Ethiopian Jews in Israel. International Social Work, 48(1), 63-76. https://doi.org/10.1177/0020872805048709
Robertson, S. L. (2021). Provincializing the OECD-PISA global competences project. Globalisation, Societies and Education, 19(2), 167-182. https://doi.org/10.1080/14767724.2021.1887725
Robinson-Cimpian, J. P., Thompson, K. D., & Umansky, I. M. (2016). Research and policy considerations for English learner equity. Policy Insights from the Behavioral and Brain Sciences, 3(1), 129-137. https://doi.org/10.1177/237273221562355
Rocher, T. & Hastedt, D. (2020, September). International large-scale assessments in education: a brief guide. IEA Compass: Briefs in Education. No. 10. International Association for the Evaluation of Educational Achievement.
Rosenberg, J. M., Narvaiz, S., Xu, R., Lin, Q., Maroulis, S., & Frank, K. A. (2023). Konfound-It!: Quantify the robustness of causal inferences (v. 2.0.0).
Rueda, R., & Windmueller, M. P. (2006). English language learners, LD, and overrepresentation: A multiple-level analysis. Journal of Learning Disabilities, 39(2), 99-107. https://doi.org/10.1177/00222194060390020801
Ruhose, J., & Schwerdt, G. (2016). Does early educational tracking increase migrant-native achievement gaps? Differences-in-differences evidence across countries. Economics of Education Review, 52, 134-154. https://doi.org/10.1016/j.econedurev.2016.02.004
Sablan, J. R. (2019). Can you really measure that? Combining critical race theory and quantitative methods.
American Educational Research Journal, 56(1), 178-203. https://doi.org/10.3102/0002831218798325
Sanatullova-Allison, E., & Robison-Young, V. A. (2016). Overrepresentation: An overview of the issues surrounding the identification of English language learners with learning disabilities. International Journal of Special Education, 31(2), n2.
Sánchez, M. T., Parker, C., Akbayin, B., & McTigue, A. (2010). Processes and Challenges in Identifying Learning Disabilities among Students Who Are English Language Learners in Three New York State Districts. Issues & Answers. REL 2010-No. 085. Regional Educational Laboratory Northeast & Islands.
Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal of Modern Applied Statistical Methods, 8(2), 26. https://doi.org/10.22237/jmasm/1257035100
Sharma, S. D. (2010). Making the human development index (HDI) gender-sensitive. Gender & Development, 5(1), 60-61. https://doi.org/10.1080/741922304
Schleicher, A. (2006). Where immigrant students succeed: a comparative review of performance and engagement in PISA 2003. Intercultural Education, 17(5), 507-516. https://doi.org/10.1080/14675980601063900
Schnepf, S. V. (2007). Immigrants’ educational disadvantage: an examination across ten countries and three surveys. Journal of Population Economics, 20, 527-545. https://doi.org/10.1007/s00148-006-0102-y
Scott, M. A., Simonoff, J. S., & Marx, B. D. (2013). The SAGE handbook of multilevel modeling. SAGE Publications Ltd. https://doi.org/10.4135/9781446247600
Strietholt, R., & Scherer, R. (2016). The contribution of international large-scale assessments to educational research: Combining individual and institutional data sources. Scandinavian Journal of Educational Research, 62(3), 368-385. https://doi.org/10.1080/00313831.2016.1258729
Swadesh, M. (1955). Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics, 21(2), 121-137. https://doi.org/10.1086/464321
Torney-Purta, J., & Amadeo, J. A.
(2013a). International large-scale assessments: Challenges in reporting and potentials for secondary analysis. Research in Comparative and International Education, 8(3), 248-258. https://doi.org/10.2304/rcie.2013.8.3.248
Torney-Purta, J., & Amadeo, J. A. (2013b). The contributions of international large-scale studies in civic education and engagement. In: von Davier, M., Gonzalez, E., Kirsch, I., Yamamoto, K. (eds) The role of international large-scale assessments: Perspectives from technology, economy, and educational research, 87-114. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-4629-9_6
United Nations Development Programme (UNDP). (2023, May 5). Human development index. Human Development Reports. https://hdr.undp.org/data-center/human-development-index#/indicies/HDI
United Nations High Commissioner for Refugees (UNHCR). (2023, May 5). Refugee Population Statistics Database. Refugee Data Finder. https://www.unhcr.org/refugee-statistics/
Vargas-Montoya, L., Gimenez, G., & Fernández-Gutiérrez, M. (2023). ICT use for learning and students' outcomes: Does the country's development level matter? Socio-Economic Planning Sciences, 101550. https://doi.org/10.1016/j.seps.2023.101550
Von Davier, M., Gonzalez, E., & Mislevy, R. (2009). What are plausible values and why are they useful. IERI Monograph Series, 2(1), 9-36.
Werrell, C. E., Femia, F., & Sternberg, T. (2015). Did we see it coming? State fragility, climate vulnerability, and the uprisings in Syria and Egypt. The SAIS Review of International Affairs, 35(1), 29-46. https://www.jstor.org/stable/27000974
Wichmann, S., Holman, E.W., & Brown, C.H. (2022). The ASJP Database (version 20).
Wickham H., Averick M., Bryan J., Chang W., McGowan LD., François R., Grolemund G., Hayes A., Henry L., Hester J., Kuhn M., Pedersen TL., Miller E., Bache SM., Müller K., Ooms J., Robinson D., Seidel DP., Spinu V., Takahashi K., Vaughan D., Wilke C., Woo K., Yutani H. (2019). Welcome to the tidyverse.
Journal of Open Source Software, 4(43), 1686. https://joss.theoj.org/papers/10.21105/joss.01686
World Bank. (2017). Forcibly Displaced: Toward a development approach supporting refugees, the internally displaced, and their hosts. Washington, DC: World Bank. http://hdl.handle.net/10986/25016
Wu, Z., Spreckelsen, T. F., & Cohen, G. L. (2021). A meta‐analysis of the effect of values affirmation on academic achievement. Journal of Social Issues, 77(3), 702-750. https://doi.org/10.1111/josi.12415
Zhao, Y. (2018). The changing context of teaching and implications for teacher education. Peabody Journal of Education, 93(3), 295-308. https://doi.org/10.1080/0161956X.2018.1449896
Zhao, Y. (2020). Two decades of havoc: A synthesis of criticism against PISA. Journal of Educational Change, 21(2), 244-246. https://doi.org/10.1007/s10833-019-09367-x

APPENDIX A: LIST OF PROMINENT ILSAS

Table A1 shows prominent ILSAs and the organizations that administer the respective assessments.

Table A1
List of Prominent ILSAs with Administering Organization

ILSA | Organization
Global Teaching Insights (GTI) | OECD
International Civic and Citizenship Education Study (ICCS) | IEA
International Computer and Information Literacy Study (ICILS) | IEA
Programme for International Student Assessment (PISA) | OECD
Progress in International Reading Literacy Study (PIRLS) | IEA & NCES
Trends in International Mathematics and Science Study (TIMSS) | IEA

APPENDIX B: BREAKDOWN OF DEMOGRAPHIC QUESTIONS

Tables B1 to B3 show a breakdown of demographic questions by origin country or destination country for the PISA 2018 questionnaires (i.e., student, teacher, and school questionnaire). A third column lists demographic questions that overlap both origin and destination.

Table B1
Breakdown of Demographic Questions on PISA 2018 Student Survey

Origin (1)
1. ST019 In what country were you and your parents born?

Destination (6)
1. ST011 Which of the following are in your home?
2. ST012 How many of these are there at your home?
3. ST013 How many books are there in your home?
4. ST014 The following two questions concern your mother’s job:
5. ST015 The following two questions concern your father’s job:
6. ST022 What language do you speak at home most of the time?

Both (5)
1. ST003 On what date were you born?
2. ST004 Are you female or male?
3. ST021 How old were you when you arrived in ?
4. ST023 Which language do you usually speak with the following people?
5. ST177 How many languages, including the language(s) you speak at home, do you and your parents speak well enough to converse with others?

Table B2
Breakdown of Demographic Questions on PISA 2018 Teacher Survey

Origin (0)
1. None

Destination (3)
1. TC186 In what country were you born?
2. TC007 How many years of work experience do you have?
3. TC014 Did you complete a teacher education or training programme?

Both (2)
1. TC001 Are you female or male?
2. TC002 How old are you?

Table B3
Breakdown of Demographic Questions on PISA 2018 School Survey

Origin (0)
1. None

Destination (6)
1. SC001 Which of the following definitions best describes the community in which your school is located?
2. SC002 As of , what was the total school enrolment (number of students)?
3. SC003 What is the average size of classes in in your school?
4. SC011 Which of the following statements best describes the schooling available to students in your location?
5. SC013 Is your school a public or a private school?
6. SC016 About what percentage of your total funding for a typical school year comes from the following sources?

Both (0)
1. None

APPENDIX C: COMPLETE LIST OF VARIABLES

The variables of interest for this study included variables related to identity (e.g., student id), immigration status, language use, mathematics scores, and more. Additionally, variables for student and school weights were identified.
Table C1 shows the outcome variable, which was a student-level mathematics PISA score.

Table C1 Outcome Variable
# | PISA Variable | Description | Level | Source
1 | pv1math - pv10math | The 10 plausible values for the PISA composite score of the mathematics performance subscales (e.g., algebra score, geometry score, etc.) | Student | PISA

Table C2 shows the explanatory variables that were used for statistical modeling. These variables were mostly pre-existing, student-level variables taken directly from PISA, along with a few derived variables. In addition, variables from the external data sets, mostly country-level variables with a few student-level variables, are shown here as well.

Table C2 Explanatory Variables
# | PISA Variable | Description | Level | Source
1 | lang_home | Language at home | Student | PISA
2 | lang_test | Language of questionnaire/assessment | Student | PISA
3 | lang_match | Derived indicator of whether lang_home & lang_test are equal | Student | PISA
4 | language_distance | Degree of linguistic similarity between languages in origin and destination countries. | Student | LD
5 | immig_status | Immigrant status based on country of birth | Student | PISA
6 | immig_status2 | Derived immigrant status based on country of birth | Student | PISA
7 | country_code_origin | Student country of birth (ISO3 code) | Student | PISA
8 | country_code_destination | Student country of PISA test (ISO3 code) | Student | PISA
9 | country_name_origin | Student country of birth (Full name) | Student | PISA
10 | country_name_destination | Student country of PISA test (Full name) | Student | PISA
11 | gain_origin_5yr_mean | Linked country-level ND-GAIN measure for student’s origin country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | GAIN
12 | gain_destination_5yr_mean | Linked country-level ND-GAIN measure for student’s destination country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | GAIN
13 | hdi_origin_5yr_mean | Linked country-level HDI measure for student’s origin country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | HDI
14 | hdi_destination_5yr_mean | Linked country-level HDI measure for student’s destination country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | HDI
15 | refugee_ratio_5yr_mean_standardized_origin | Linked country-level displacement measure for student’s origin country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | WB
16 | refugee_ratio_5yr_mean_standardized_destination | Linked country-level displacement measure for student’s destination country; Mean value of 5 years prior to year of immigration (immigrant) or year of test (native). | Country | WB

Table C3 shows the control variables. These variables were a mix of student-level and school-level variables that came directly from PISA and do not include any external variables.

Table C3 Control Variables
# | PISA Variable | Description | Level | Source
1 | sex | Student (Standardized) Sex | Student | PISA
2 | escs | Index of economic, social and cultural status | Student | PISA
3 | school_pct_ses_disadv | School-level measure of % of 15 yr-olds from socioeconomically disadvantaged homes | School | PISA
4 | pct_immig_destination_country | Derived country-level measure of % of 15 yr-olds from socioeconomically disadvantaged homes | Country | PISA

Table C4 shows the weight variables that were used in statistical modeling. These weights include student-, school-, and country-level weights. The student- and school-level weights came from PISA. The adjusted student-level weight was derived from the original student-level weight. The country-level weight was added for the purpose of running models. A value of 1 was assigned to each country since this study does not use any weighting scheme for country-level data.

Table C4 Weight Variables
# | PISA Variable | Description | Level | Source
1 | w_fstuwt | Student-level weights | Student | PISA
2 | w_fstuwt_adj | Student-level weights (scaled) | Student | PISA
3 | w_fschwt | School-level weights | School | PISA
4 | w_cntrywt | Country-level weights | Country | N/A

Table C5 shows a set of miscellaneous variables that were used for identification (e.g., student_id), used as grouping variables within modeling software (e.g., school_id), used to create derived variables (e.g., birth_year), or used for descriptive purposes (e.g., immig_count).

Table C5 Miscellaneous Variables
# | PISA Variable | Description | Level | Source
1 | student_id | Student ID number | Student | PISA
2 | test_year | Derived year of PISA assessment (i.e., 2018) | Student | N/A
3 | birth_year | Student birth year | Student | PISA
4 | age | Student age | Student | PISA
5 | immig_flag | Binary flag for immigration status | Student | PISA
6 | immig_year | Derived year of immigration | Student | PISA
7 | immig_age | Age at year of immigration | Student | PISA
8 | school_id | School ID number | School | PISA
9 | immig_count | Count of immigrant participants in PISA sample, per country. | Country | PISA

APPENDIX D: SOFTWARE CONSIDERED FOR STATISTICAL MODELING

Multiple software options were considered for conducting multilevel statistical modeling before selecting one. The R WeMix package (v. 4.0.0) was one option considered for conducting multilevel models (Bailey et al., 2021). WeMix was created by the American Institutes for Research (AIR) for running HLMs with multilevel data that includes weights at multiple levels. An advantage of the WeMix package is that it can accept up to 4 levels (e.g., student, school, country, etc.). A constraint of WeMix is that it does not automatically handle plausible values (i.e., PISA has 10 PVs for mathematics). A few other software options were considered for this study on the basis of their functionality for running HLMs.
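Because WeMix does not combine plausible values automatically, a model fitted with it would typically be run once per plausible value and the results pooled afterwards. The following is a minimal sketch of one common pooling approach (Rubin's combining rules). The source does not state which pooling procedure was used, so the function name `pool_plausible_values` and the numbers in the usage lines are hypothetical illustrations, not the study's method or results.

```python
from math import sqrt
from statistics import mean, variance


def pool_plausible_values(estimates, std_errors):
    """Pool one coefficient estimated separately on each plausible value.

    Assumption: Rubin's rules are used. The pooled estimate is the mean of
    the per-PV estimates, and the total variance is the mean sampling
    variance plus (1 + 1/m) times the between-PV variance.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates, each with a standard error")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # sample variance across PVs
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficients from ten model runs (one per plausible value).
example_estimates = [-0.30, -0.33, -0.29, -0.31, -0.32, -0.30, -0.34, -0.28, -0.31, -0.32]
example_std_errors = [0.13] * 10
pooled_estimate, pooled_se = pool_plausible_values(example_estimates, example_std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is never smaller than the average per-run standard error, because the between-PV variance adds measurement uncertainty on top of sampling uncertainty.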
Another modeling software considered was EdSurvey, also created by AIR, which offers a function for running HLMs on multilevel data with weights at multiple levels. EdSurvey specifically focuses on some of the most well-known large-scale education surveys, such as NAEP and PISA. This means EdSurvey is specifically written for the complex sample designs, weights, and plausible values common to these types of data sets (Bailey et al., 2023). An advantage of EdSurvey is that it automatically handles plausible values and weights. A constraint of EdSurvey is that it currently only accepts two levels (e.g., student- and school-levels).

Yet another commonly used R package for multilevel models is lme4 (Bates et al., 2015). However, lme4 does not automatically handle sampling weights or plausible values and thus is not recommended for PISA data.

A final consideration was the SSI HLM software by Scientific Software International. An advantage of the SSI HLM software is that it was developed by prominent scholars in the multilevel modeling field. A constraint of the SSI HLM software is that it is not free or open source software, which means it is restricted to paying users after a two-week trial period ends. A comparison between the considered software is shown in Table D1.

Table D1 Modeling Software Comparison
Modeling Software | Advantages | Constraints | Used in Study
WeMix (v. 4.0.0) | Automatically handles weights; Specify models up to 4 levels | Doesn’t automatically handle PVs | Yes
EdSurvey (v. 3.1.0) | Automatically handles weights; Automatically handles PVs | Specify models up to 2 levels | Yes
lme4 (v. 1.4-14) | None relevant for this study | Does not automatically handle weights and PVs | No
SSI HLM (v. 8.2) | Developed by prominent scholars in multilevel modeling | Not free or open source software | Yes

APPENDIX E: LIST OF DATA FILES

Table E1 List of Downloaded PISA 2018 Data
Data File | Data File Name
Student questionnaire data file (489 MB) | SPSS_STU_QQQ.zip
School questionnaire data file (3.1 MB) | SPSS_SCH_QQQ.zip
Teacher questionnaire data file (12.8 MB) | SPSS_TCH_QQQ.zip
Cognitive item data file (466 MB) | SPSS_STU_COG.zip

APPENDIX F: FINAL LIST OF COUNTRIES USED IN STUDY

The final list of countries/territories used in this study is shown in Table F1. The first column indicates the list count. The second column contains the country/territory name. The third column shows how many immigrant students were found in each country’s sample. The fourth column shows what percentage of each country’s sample were immigrants.

Table F1 List of Countries used in Model
# | Country | Immigrants Count in Sample | % Immigrants in Sample
1 | Albania | 0 | 0.0
2 | Azerbaijan (Baku) | 377 | 4.9
3 | Argentina | 215 | 4.7
4 | Australia | 1083 | 21.4
5 | Austria | 149 | 15.6
6 | Belgium | 255 | 13.1
7 | Bosnia and Herzegovina | 214 | 2.5
8 | Brazil | 0 | 0.0
9 | Brunei Darussalam | 373 | 6.9
10 | Bulgaria | 0 | 0.0
11 | Belarus | 99 | 3.8
12 | Canada | 1868 | 20.7
13 | Chile | 0 | 0.0
14 | Chinese Taipei (Taiwan) | 0 | 0.0
15 | Colombia | 0 | 0.0
16 | Costa Rica | 191 | 9.4
17 | Croatia | 80 | 8.8
18 | Czech Republic | 119 | 3.3
19 | Denmark | 118 | 19.4
20 | Dominican Republic | 82 | 2.4
21 | Estonia | 0 | 0.0
22 | Finland | 155 | 4.6
23 | France | 0 | 0.0
24 | Georgia | 150 | 1.0
25 | Germany | 82 | 18.7
26 | Greece | 110 | 10.3
27 | Hong Kong | 1007 | 36.8
28 | Hungary | 0 | 0.0
29 | Iceland | 0 | 0.0
30 | Indonesia | 37 | 0.2
31 | Ireland | 184 | 10.3
32 | Israel | 163 | 13.9
33 | Italy | 0 | 0.0
34 | Japan | 0 | 0.0
35 | Kazakhstan | 0 | 0.0
36 | Jordan | 971 | 17.2
37 | South Korea | 33 | 0.1
38 | Lebanon | 0 | 0.0
39 | Latvia | 48 | 4.3
40 | Lithuania | 0 | 0.0
41 | Luxembourg | 868 | 50.8
42 | Macao | 1074 | 62.4
43 | Malaysia | 0 | 0.0
44 | Malta | 0 | 0.0
45 | Mexico | 85 | 0.8
46 | Moldova | 112 | 1.2
47 | Montenegro | 201 | 5.2
48 | Morocco | 74 | 0.7
49 | Netherlands | 0 | 0.0
50 | New Zealand | 765 | 22.6
51 | Norway | 34 | 7.0
52 | Panama | 165 | 4.7
53 | Peru | 0 | 0.0
54 | Philippines | 42 | 0.6
55 | Poland | 0 | 0.0
56 | Portugal | 67 | 4.8
57 | Qatar | 1711 | 40.3
58 | Romania | 0 | 0.0
59 | Russian Federation | 0 | 0.0
60 | Saudi Arabia | 66 | 7.4
61 | Serbia | 0 | 0.0
62 | Singapore | 0 | 0.0
63 | Slovak Republic | 46 | 0.8
64 | Vietnam | 0 | 0.0
65 | Slovenia | 15 | 4.4
66 | Spain | 0 | 0.0
67 | Sweden | 0 | 0.0
68 | Switzerland | 443 | 30.6
69 | Thailand | 0 | 0.0
70 | United Arab Emirates | 0 | 0.0
71 | Turkey | 26 | 0.6
72 | Ukraine | 43 | 1.9
73 | North Macedonia | 24 | 1.1
74 | United Kingdom | 149 | 8.0
75 | United States | 0 | 0.0
76 | Uruguay | 63 | 1.0
77 | B-S-J-Z (CHINA) | 0 | 0.0

APPENDIX G: COMPLETE STEPS FOR ANALYTICAL METHOD

This study involves the creation of a baseline model called the “null model” or “unconditioned model” followed by multiple models to test the association between the variables of interest and the outcome variable of PISA reading scores. This appendix contains the complete methods already summarized in the main body text.

Stage 1: Identifying Destination Countries with Outsized Influence; Chipping Away Destination Country Variance

Model 00: Null Model

First, a null model was specified and run. A double-digit numbering convention (e.g., Model 00) was used for Stage 1 models to differentiate them from the models in Stage 2. Figure G1 shows the model specification.

Figure G1 Model 00 Specification

Table G2 shows the fixed effects for Model 00. Because the null model has no covariates, the only relevant fixed-effect term is the intercept (i.e., 423), which represents the grand mean reading score for all students in the data.

Table G2 Model 00 Fixed Effects

Table G3 shows the variance decomposition for the null model. Most of the variance decomposition belongs to level-1 (i.e.
7580), with a nearly equal amount of variance at level-2 (i.e., 3004) and level-3 (i.e., 2971).

Table G3 Model 00 Variance Components

Table G4 shows the model variance decomposition. The first column indicates the model name. The second column indicates the model level. The third column indicates the variance estimate at each level. The fourth column indicates each level’s share of the total variance.

Table G4 Model Variance Decomposition
Model | Level | Variance | Variance By Level
Model 00 | Student | 7580 | 56%
Model 00 | School | 3004 | 22%
Model 00 | Country | 2971 | 22%

Examining Residuals For Outliers

The residuals of the aforementioned Model 00 were examined. Particular attention was given to the level-3 residuals in order to identify destination countries that may be significantly different from the rest. These countries would be added into subsequent Stage 1 models (e.g., Model 01, Model 02) as fixed effects dummy variables. The goal was to reduce the variance for the destination countries until it became small (i.e., non-significant), at less than 5% of the overall variance.

Figure G2 is a plot of the level 3 residuals versus the fitted values. The x-axis is the fitted values scale while the y-axis is the residuals scale. Here, the expectation is for the residuals to be scattered randomly around the horizontal zero line at y = 0. A random scattering of points suggests that the residuals have no systematic relationship with the fitted values, meeting the assumption of linearity necessary for linear modeling. Figure G2 most obviously indicates the categorical nature of the level 3 residuals, as they are plotted with respect to the rest of the destination countries, which all share a mean effect in the null model. In other words, these residuals represent how much each destination country deviated from the predicted value that was itself based on the mean of all the countries. The figure indicated a large difference in deviation from the destination country mean residual.
However, the points are generally distributed symmetrically around the zero line. Therefore, this figure supports the assumption of linearity for level 3 residuals.

Figure G2 Residuals vs. Fitted Plot for Level 3 (Destination Countries)

Figure G3 is a normal Q-Q plot of the level 3 residuals. Some of the residual points fall near the reference line. However, the tails of the residual data are heavy and deviate from the reference line. Therefore, this indicates non-normality of the level 3 residuals due to outliers.

Figure G3 Normal Q-Q Plot of Residuals for Level 3 (Destination Countries)

Figure G4 is a histogram of the level 3 residuals. Figure G4 also suggests non-normality (e.g., bimodal) for the level 3 residuals, due to negative residual outliers on the left of the figure and positive residual outliers on the right.

Figure G4 Histogram of Residuals for Level 3 (Destination Countries)

The level-3 residuals are shown below in Table G5. The first column shows the destination country name while the second column shows the value of the residual for that country. Rows are sorted from highest residual value to lowest.
Table G5 Destination Country Residual Values
# | Country | Residual Value
1 | Korea | +102
2 | Ireland | +94
3 | New Zealand | +89
4 | Canada | +73
5 | Turkey | +72
6 | Australia | +68
7 | Latvia | +58
8 | Great Britain | +56
9 | Croatia | +45
10 | Norway | +41
11 | Portugal | +33
12 | Denmark | +31
13 | Belgium | +28
14 | Jordan | +23
15 | Switzerland | +22
16 | Ukraine | +16
17 | Belarus | +15
18 | Israel | +8
19 | Slovenia | +5
20 | Finland | +3
21 | Qatar | -3
22 | Czech Republic | -4
23 | Moldova | -5
24 | Montenegro | -10
25 | Panama | -11
26 | Saudi Arabia | -11
27 | Austria | -12
28 | Uruguay | -15
29 | Germany | -20
30 | Slovakia | -20
31 | Bosnia & Herzegovina | -27
32 | Mexico | -28
33 | Argentina | -34
34 | Azerbaijan | -35
35 | Costa Rica | -36
36 | Georgia | -53
37 | Greece | -55
38 | North Macedonia | -60
39 | Morocco | -86
40 | Philippines | -114
41 | Dominican Republic | -120
42 | Indonesia | -121

Chipping Away Destination Country Variance

The purpose of Stage 1 was to examine the residuals to identify which destination countries might differ from the rest. Ultimately, the aim of this Stage 1 process was to be able to control for these outlier countries by transforming them into fixed effects (i.e., predictor variables) at the school-level for the models in Stage 2. The rationale for adding these countries as level-2 variables is that each school can be reasoned as having characteristics imparted upon it by the country the school is located within. This process solves the issue with complex cross-nesting, as it essentially removes destination country as its own level and instead puts the impact of destination country as a covariate within the model, thereby allowing origin country influences to become the focal point of the modeling.

Model 01: Null Model + Destination Country Effects

Model 01 was specified and run. Figure G5 shows the model specification. Six destination countries were added as fixed effects to the destination country-level.
These 6 are the destination countries identified in Model 00 as having the largest residual absolute values (i.e., Korea, Ireland, New Zealand, Philippines, Dominican Republic, and Indonesia). They were entered into Model 01 as binary dummy variables at the country-level. For example, the variable for Korea takes on a value of either 1 (i.e., “Yes”) if the country is Korea or a value of 0 (i.e., “No”) if it is not.

Figure G5 Model 01 Specification

Table G7 shows the fixed effects for Model 01. The intercept (i.e., 425) represents the reference point from which to evaluate the six coefficients (e.g., Korea is associated with +102 PISA reading score). This means that immigrant students who settled in Korea were associated with scores 102 points higher than the mean score (i.e., 425) of students attending school in any other country, except for the other countries modeled as fixed effects.

Table G7 Model 01 Fixed Effects

Table G8 shows the variance decomposition for Model 01. After adding the 6 fixed effects, the destination country variance decreased to 1206.

Table G8 Model 01 Variance Components

After adding the destination country fixed effects to Model 00, the country-level share of variance decreased from 22% to 10%, not yet under the target of 5% or less of the overall variance (see Table G9).

Table G9 Model Variance Decomposition
Model | Level | Variance | Variance By Level
Model 00 | Student | 7580 | 56%
Model 00 | School | 3004 | 22%
Model 00 | Country | 2971 | 22%
Model 01 | Student | 7575 | 64%
Model 01 | School | 3015 | 26%
Model 01 | Country | 1206 | 10%

Model 02: Null Model + More Destination Country Effects

Model 02 was specified and run. Figure G6 shows the model specification. Six additional countries were added as fixed effects to the destination country-level. They were the destination countries identified in Model 00 with next largest residual absolute values (i.e., Canada, Turkey, Australia, Greece, North Macedonia, and Morocco).
They were entered into Model 02 as binary dummy variables at the country-level.

Figure G6 Model 02 Specification

Table G11 shows the fixed effects for Model 02. The intercept (i.e., 425) represents the reference point from which to evaluate the now twelve coefficients (e.g., Korea is associated with +103 PISA reading score).

Table G11 Model 02 Fixed Effects

Table G12 shows the variance decomposition for Model 02. After adding the 6 additional fixed effects, the destination country variance decreased to 483.

Table G12 Model 02 Variance Components

After adding the destination country fixed effects to the previous model, the country-level share of variance decreased from 10% to 4%, reducing the destination country variance below 5% of the overall variance (i.e., non-significant). Table G13 shows the model variance decomposition.

Table G13 Model Variance Decomposition
Model | Level | Variance | Variance By Level
Model 00 | Student | 7580 | 56%
Model 00 | School | 3004 | 22%
Model 00 | Country | 2971 | 22%
Model 01 | Student | 7575 | 64%
Model 01 | School | 3015 | 26%
Model 01 | Country | 1206 | 10%
Model 02 | Student | 7578 | 68%
Model 02 | School | 3006 | 27%
Model 02 | Country | 483 | 4%

Summary of Stage 1: Twelve Destination Countries Were Identified for Fixed Effects

Three total 3-level hierarchical models (i.e., students nested within schools nested within destination countries) were specified and run for Stage 1: Models 00, 01, and 02 were specified toward identifying destination countries that have an outsized impact on student achievement. First, a null model (i.e., Model 00) was specified and its variance decomposition was examined. In addition, the residuals were examined to identify destination countries that may be significantly different from the rest. These countries were added into the subsequent Stage 1 models (i.e., Model 01, Model 02) as fixed effects dummy variables. The goal was to reduce the variance for the destination countries until it became small (i.e., non-significant), at less than 5% of the overall variance.
Once that goal had been achieved, twelve countries had been designated as fixed effects dummy variables (i.e., Korea, Ireland, New Zealand, Philippines, Dominican Republic, Indonesia, Canada, Turkey, Australia, Greece, North Macedonia, and Morocco). The country-level share of variance decreased with each Stage 1 model, from 22% to 4%, meeting the goal of reducing the variance below 5%. Ultimately, the aim of this Stage 1 process was to be able to control for these outlier countries by transforming them into fixed effects (i.e., predictor variables) at the school-level for the models in Stage 2. The rationale for adding these countries as level-2 variables is that each school can be reasoned as having characteristics imparted upon it by the country the school is located within.
This process solves the issue with complex cross-nesting, as it essentially removes destination country as its own level and instead puts the impact of destination country as a covariate within the model, thereby allowing origin country influences to become the focal point of the modeling.

Stage 2: Modeling Cross-Classified Models

The second stage of the two-stage approach to modeling was to specify a 2-level, cross-classified multilevel model. Students are cross-nested within two different level-2 groups: schools and origin countries. Figure G7 is a visualization of the data as it is to be modeled. The model is level-1 (students) cross-nested within level-2a (schools) and level-2b (origin countries). Note that the destination country is not a level itself but rather fixed effect variables at student-level.

Figure G7 Structure of the Data

Nine total cross-nested models were specified and run for Stage 2: Models 0, 1, and 2 were specified towards creating a baseline model from which to add the predictors of interest in this study. This started with a null model (i.e., Model 0) followed by two subsequent models, building towards a baseline model. The first subsequent model added the fixed effects countries identified in Stage 1 (i.e., Model 02). The second subsequent model added typical control variables (e.g., sex, socio-economic status). This process formed the baseline model (i.e., Model 2) against which to test the variables of interest, one at a time. Then Models 3, 4, 5, and 6 swapped in the main variables of interest for this study (i.e., LD, HDI, GAIN, and FD Ratio), one at a time. LD was added at the student-level while HDI, GAIN, and FD Ratio were at the country-level. Next, Models 7, 8, and 9 tested immigration year versions of the country-level variables (i.e., HDI, GAIN, and FD Ratio).
For example, the Immigration Year HDI was a measure of each student’s HDI value around their unique time of immigration versus the shared country-wide value for all students from that country. These variables were individually swapped in one at a time and tested, just as the country-level variables were. In the end, this modeling procedure identified just one covariate of interest that was statistically significant. This covariate was language distance from Model 3. The other variables were not statistically significant when tested individually.

Model 0: Null Model

First, null Model 0 was specified and run. Figure G8 shows the model specification.

Figure G8 Model 0 Specification

Table G15 shows the fixed effects for the null model. Because the null model has no covariates, the only relevant fixed-effect term is the intercept (i.e., 421), which represents the grand mean reading score for all students in the data.

Table G15 Model 0 Fixed Effects

Table G16 shows the variance decomposition for the null model. Most of the variance belongs to the school-level (i.e., 6510), with the rest at the student-level (i.e., 3625) and origin country-level (i.e., 2518).

Table G16 Model 0 Variance Components

Table G17 shows the model deviance and variance decomposition. The first column indicates the model name. The second column indicates the deviance value. The third column indicates the model level. The fourth column indicates the variance estimate at each level. The fifth column indicates each level’s share of the total variance. We can interpret the distribution of variance as follows. 29% of the variance is at the individual student-level. 51% of the variance is at the school-level (level-2a). 20% of the variance is at the origin country-level (level-2b). Model 0 is a 2-level cross-nested model. This Model 0 serves as an initial model against which to compare subsequent 2-level cross-nested models.
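The percentages in the variance decomposition follow directly from the variance estimates: each level's share is its variance divided by the sum across levels. A minimal sketch of that arithmetic, using the Model 0 estimates reported in Table G17 (the function name `variance_shares` and the dictionary keys are illustrative choices, not names from the study's code):

```python
def variance_shares(components):
    """Return each level's share of the total variance as a fraction."""
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}


# Variance estimates for Model 0 as reported in Table G17.
model_0 = {"student": 3625, "school": 6510, "origin_country": 2518}

for level, share in variance_shares(model_0).items():
    # Prints student: 29%, school: 51%, origin_country: 20%
    print(f"{level}: {share:.0%}")
```

The same function applies to the Stage 1 models, where the country-level share was tracked against the 5% target.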
Table G17 Model 0 Deviance & Variance Estimates
Model | Deviance | Level | Variance | Variance By Level
Model 0 (Null) | 92931 | Student | 3625 | 29%
Model 0 (Null) | 92931 | School | 6510 | 51%
Model 0 (Null) | 92931 | Country | 2518 | 20%

Model 1: Null Model + Outlier Destination Countries Effects

Next Model 1 was specified and run. Model 1 adds the 12 fixed effects (i.e., predictor variables) from Stage 1. Figure G9 shows the model specification.

Figure G9 Model 1 Specification

Table G19 shows the fixed effects for Model 1. The intercept (i.e., 419) represents the reference point from which to evaluate the coefficients of the twelve fixed effects (i.e., Korea, Ireland, New Zealand, Philippines, Dominican Republic, Indonesia, Canada, Turkey, Australia, Greece, North Macedonia, and Morocco). This means that immigrant students who settled in Korea were associated with scores 141 points higher than the mean score (i.e., 419) of students attending school in any other country, except for the other countries modeled as fixed effects.

Table G19 Model 1 Fixed Effects

Table G20 shows the variance decomposition for Model 1. After adding the twelve fixed effects, the origin country-level variance decreased to 1585.

Table G20 Model 1 Variance Components

Table G21 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 1 deviance lowered by approximately 576, from 92931 to 92355. A lower deviance suggests a better model fit. A likelihood ratio test confirmed the change in deviance was statistically significant (p < 0.001). Additionally, the variance decreased at the school- and country-levels yet remained practically the same at the student-level.

Table G21 Model 1 Deviance & Variance Estimates Compared to Model 0
Model | Deviance | Level | Variance | Variance By Level
Model 0 (Null) | 92931 | Student | 3625 | 29%
Model 0 (Null) | 92931 | School | 6510 | 51%
Model 0 (Null) | 92931 | Country | 2518 | 20%
Model 1 (Null + Stage 1 Effects) | 92355* | Student | 3638 | 35%
Model 1 (Null + Stage 1 Effects) | 92355* | School | 5205 | 50%
Model 1 (Null + Stage 1 Effects) | 92355* | Country | 1585 | 15%
Note.
Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 0. Bold text denotes models with covariates that were statistically significant within the model.

Model 2: Null Model + Controls = (Baseline Model)

Next Model 2 was specified and run. Model 2 adds control variables for student socio-economic status and student sex, as well as for school-level socio-economic status. Figure G10 shows the model specification.

Figure G10 Model 2 Specification

Table G23 shows the fixed effects for Model 2. The intercept (i.e., 468) represents the reference point from which to evaluate the coefficients.

Table G23 Model 2 Fixed Effects

Table G24 shows the variance decomposition for Model 2. After adding the additional covariates, the origin country-level variance decreased to 1112.

Table G24 Model 2 Variance Components

Table G25 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 2 deviance lowered by 635, from 92355 to 91720. A lower deviance suggests a better model fit. A likelihood ratio test confirmed the change in deviance was statistically significant (p < 0.001). Additionally, the variance decreased at all levels.

Table G25 Model 2 Deviance & Variance Estimates Compared to Model 1
Model | Deviance | Level | Variance | Variance By Level
Model 1 (Null + Stage 1 Effects) | 92355 | Student | 3638 | 35%
Model 1 (Null + Stage 1 Effects) | 92355 | School | 5205 | 50%
Model 1 (Null + Stage 1 Effects) | 92355 | Country | 1585 | 15%
Model 2 (Null + Stage 1 Effects + Controls) | 91720* | Student | 3539 | 40%
Model 2 (Null + Stage 1 Effects + Controls) | 91720* | School | 4293 | 48%
Model 2 (Null + Stage 1 Effects + Controls) | 91720* | Country | 1112 | 12%
Note. Asterisk after the model name denotes the model deviance is statistically significantly different from Model 1. Bold text denotes models with covariates that were statistically significant within the model.
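The deviance comparisons in this appendix rest on a likelihood ratio test: the drop in deviance between two nested models is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below illustrates that calculation with the Python standard library only. It is not the study's code; the function names are hypothetical, and the degrees of freedom in the usage lines (3, one per control variable added in Model 2) are an assumption, since the source does not report them.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: start from the df = 1 case, erfc(sqrt(x/2)), then add
    # exp(-x/2) * (x/2)^(k - 1/2) / Gamma(k + 1/2) for each further pair of df.
    total = erfc(sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += exp(-half) * half ** (k - 0.5) / gamma(k + 0.5)
    return total


def likelihood_ratio_test(deviance_reduced, deviance_full, df_diff):
    """Return (deviance drop, p-value) for two nested models."""
    statistic = deviance_reduced - deviance_full
    p_value = 1.0 if statistic <= 0 else chi2_sf(statistic, df_diff)
    return statistic, p_value


# Model 1 vs. Model 2 deviances from Table G25; df_diff=3 is an assumption.
statistic, p_value = likelihood_ratio_test(92355, 91720, df_diff=3)
print(f"deviance drop = {statistic}, p < 0.001: {p_value < 0.001}")
```

For models that add a single covariate to the baseline (e.g., Model 3), the same function would be called with `df_diff=1`, again under the assumption that only one parameter was added.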
Since Model 2 includes both the fixed effects from Stage 1 and the control variables, this model serves as the baseline model against which to test the various coefficients of interest to the study.

Model 3: Language Distance

Next Model 3 was specified and run. Model 3 adds the first student-level predictor of interest for this study: language distance. Figure G11 shows the model specification.

Figure G11 Model 3 Specification

Table G27 shows the fixed effects for Model 3. The intercept (i.e., 476) represents the reference point from which to evaluate the coefficients. The student-level language distance measure was statistically significant (-0.31, SE=0.13, p=0.022). This means that a 1 unit increase in language distance is associated with a -0.31 change in PISA reading score.

Table G27 Model 3 Fixed Effects

Table G28 shows the variance decomposition for each level of Model 3.

Table G28 Model 3 Variance Components

Table G29 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 3 deviance lowered by 95, from 91720 to 91625. A lower deviance suggests a better model fit. A likelihood ratio test confirmed the change in deviance was statistically significant (p < 0.001). Additionally, the variance decreased at all levels.

Table G29 Model 3 Deviance & Variance Estimates Compared to Baseline
Model | Deviance | Level | Variance | Variance By Level
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Student | 3539 | 40%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | School | 4293 | 48%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Country | 1112 | 12%
Model 3 (Language Distance) | 91625* | Student | 3488 | 39%
Model 3 (Language Distance) | 91625* | School | 4267 | 48%
Model 3 (Language Distance) | 91625* | Country | 1079 | 12%
Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 4: HDI

Next Model 4 was specified and run.
Model 4 adds the first origin country predictor of interest for this study: Human Development Index (HDI). Figure G12 shows the model specification.

Figure G12 Model 4 Specification

Table G31 shows the fixed effects for Model 4. The intercept (i.e., 425) represents the reference value from which to evaluate the coefficients. The country-level HDI was not statistically significant (57, SE=41, p=0.165).

Table G31 Model 4 Fixed Effects

Table G32 shows the variance decomposition for each level of Model 4.

Table G32 Model 4 Variance Components

Table G33 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 4 deviance lowered by approximately 2, from 91720 to 91718. A lower deviance suggests a better model fit. However, a likelihood ratio test indicated the change in deviance was not statistically significant (p=0.113). Additionally, the variance decreased at the school- and country-levels yet remained the same at the student-level.

Table G33 Model 4 Deviance & Variance Estimates Compared to Baseline
Model | Deviance | Level | Variance | Variance By Level
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Student | 3539 | 40%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | School | 4293 | 48%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Country | 1112 | 12%
Model 4 (HDI) | 91718 | Student | 3539 | 40%
Model 4 (HDI) | 91718 | School | 4291 | 48%
Model 4 (HDI) | 91718 | Country | 1089 | 12%
Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 5: GAIN

Next Model 5 was specified and run. Model 5 swaps in the second origin country predictor of interest (i.e., Notre Dame-Global Adaptation Initiative (GAIN)) while swapping out the first (HDI). Figure G13 shows the model specification.

Figure G13 Model 5 Specification

Table G35 shows the fixed effects for Model 5. The intercept (i.e., 420) represents the reference value from which to evaluate the coefficients.
The country-level GAIN coefficient was not statistically significant (0.92, SE=0.52, p=0.085).

Table G35 Model 5 Fixed Effects

Table G36 shows the variance decomposition for each level of Model 5.

Table G36 Model 5 Variance Components

Table G37 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 5 deviance lowered by approximately 4, from 91720 to 91716. A lower deviance suggests a better model fit. A likelihood ratio test confirmed the change in deviance was statistically significant (p=0.049). Additionally, the variance decreased at school- and country-levels yet remained practically the same at student-level.

Table G37 Model 5 Deviance & Variance Estimates Compared to Baseline
Model | Deviance | Level | Variance | Variance By Level
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Student | 3539 | 40%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | School | 4293 | 48%
Model 2 (Null + Stage 1 Effects + Controls) | 91720 | Country | 1112 | 12%
Model 5 (GAIN) | 91716* | Student | 3540 | 40%
Model 5 (GAIN) | 91716* | School | 4289 | 48%
Model 5 (GAIN) | 91716* | Country | 1075 | 12%
Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 6: FD Ratio

Next Model 6 was specified and run. Model 6 swaps in the final origin country predictor of interest (i.e., FD Ratio) while swapping out the previous one (GAIN). Figure G14 shows the model specification.

Figure G14 Model 6 Specification

Table G39 shows the fixed effects for Model 6. The intercept (i.e., 468) represents the reference value from which to evaluate the coefficients. The country-level FD Ratio coefficient was not statistically significant (9.26, SE=4.89, p=0.062).

Table G39 Model 6 Fixed Effects

Table G40 shows the variance decomposition for each level of Model 6.
Table G40 Model 6 Variance Components

Table G41 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 6 deviance decreased by approximately 4, from 91720 to 91716. A lower deviance suggests a better model fit. A likelihood ratio test indicated that the change in deviance was statistically significant (p<0.046). Additionally, the variance decreased at the student and country levels, while the school-level variance increased slightly.

Table G41 Model 6 Deviance & Variance Estimates Compared to Baseline

Model                                        Deviance  Level    Variance  Variance By Level
Model 2 (Null + Stage 1 Effects + Controls)  91720     Student  3539      40%
                                                       School   4293      48%
                                                       Country  1112      12%
Model 6 (FD Ratio)                           91716*    Student  3538      40%
                                                       School   4296      48%
                                                       Country  1041      12%

Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 7: HDI (Immigration Year)

Next Model 7 was specified and run. Models 7 through 9 test immigration year versions of the country-level variables. The first was Immigration Year HDI. Figure G15 shows the model specification.

Figure G15 Model 7 Specification

Table G43 shows the fixed effects for Model 7. The intercept (i.e., 680) represents the reference point from which to evaluate the coefficients. While Immigration Year HDI had a statistically significant p-value (coefficient = -283, SE=106, p=0.007), two other indicators suggested not interpreting this covariate due to possible collinearity (i.e., a positive/negative sign switch and an increased standard error). First, the HDI coefficient sign changed from positive (+57) in the country-level version (Model 4) to negative (-283) in the immigration year version of HDI. Second, the standard error increased from the country-level value (41) to a value more than 5% larger (106) for Immigration Year HDI.
To further investigate, simpler, non-multilevel regressions were tested with both the country-level and student-level (immigration year) HDI variables to see if the unusual patterns were also found in a simpler model. In those cases both HDI variables had coefficients with positive signs and similar standard errors, suggesting that the immigration year, cross-classified HDI was statistically problematic and therefore Model 7 was not interpretable.

Table G43 Model 7 Fixed Effects

Table G44 shows the variance decomposition for each level of Model 7.

Table G44 Model 7 Variance Components

Table G45 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 7 deviance decreased by approximately 48, from 91720 to 91672. A lower deviance suggests a better model fit. A likelihood ratio test indicated that the change in deviance was statistically significant (p<0.001). Additionally, the variance decreased at the student and school levels, while the country-level variance increased by nearly 2000 (from 1112 to 3047).

Table G45 Model 7 Deviance & Variance Estimates Compared to Baseline

Model                                        Deviance  Level    Variance  Variance By Level
Model 2 (Null + Stage 1 Effects + Controls)  91720     Student  3539      40%
                                                       School   4293      48%
                                                       Country  1112      12%
Model 7 (Immigration Year HDI)               91672*    Student  3515      33%
                                                       School   4169      39%
                                                       Country  3047      28%

Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 8: GAIN (Immigration Year)

Next Model 8 was specified and run. Model 8 continues testing immigration year versions of the country-level variables, here Immigration Year GAIN. Figure G16 shows the model specification.

Figure G16 Model 8 Specification

Table G47 shows the fixed effects for Model 8. The intercept (i.e., 591) represents the reference point from which to evaluate the coefficients.
The Immigration Year GAIN coefficient was not statistically significant (-2.36, SE=1.45, p=0.103).

Table G47 Model 8 Fixed Effects

Table G48 shows the variance decomposition for each level of Model 8.

Table G48 Model 8 Variance Components

Table G49 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models. Model 8 deviance decreased by approximately 25, from 91720 to 91695. A lower deviance suggests a better model fit. A likelihood ratio test indicated that the change in deviance was statistically significant (p<0.001). Additionally, the variance decreased at the student and school levels, while the country-level variance increased (from 1112 to 2079).

Table G49 Model 8 Deviance & Variance Estimates Compared to Baseline

Model                                        Deviance  Level    Variance  Variance By Level
Model 2 (Null + Stage 1 Effects + Controls)  91720     Student  3539      40%
                                                       School   4293      48%
                                                       Country  1112      12%
Model 8 (Immigration Year GAIN)              91695*    Student  3506      36%
                                                       School   4276      43%
                                                       Country  2079      21%

Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Model 9: FD Ratio (Immigration Year)

Next Model 9 was specified and run. Model 9 tests the last immigration year version of the country-level variables, the Immigration Year FD Ratio. Figure G17 shows the model specification.

Figure G17 Model 9 Specification

Table G51 shows the fixed effects for Model 9. The intercept (i.e., 468) represents the reference value from which to evaluate the coefficients. The Immigration Year FD Ratio coefficient was not statistically significant (4.11, SE=4.17, p=0.324).

Table G51 Model 9 Fixed Effects

Table G52 shows the variance decomposition for each level of Model 9.

Table G52 Model 9 Variance Components

Table G53 compares the deviance and variance estimates between models, which is used to assess the change in goodness of fit between models.
Model 9 deviance decreased by approximately 5, from 91720 to 91715. A lower deviance suggests a better model fit. A likelihood ratio test indicated that the change in deviance was statistically significant (p<0.022). Additionally, the variance decreased at the student and country levels, while the school-level variance increased.

Table G53 Model 9 Deviance & Variance Estimates Compared to Baseline

Model                                        Deviance  Level    Variance  Variance By Level
Model 2 (Null + Stage 1 Effects + Controls)  91720     Student  3539      40%
                                                       School   4293      48%
                                                       Country  1112      12%
Model 9 (Immigration Year FD Ratio)          91715*    Student  3528      40%
                                                       School   4319      48%
                                                       Country  1058      12%

Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.

Summary of Stage 2: Two Variables Were Significant; One Model Was Preferred

Ten total cross-nested models (Models 0 through 9) were specified and run for Stage 2. Models 0, 1, and 2 were specified towards creating a baseline model from which to add the predictors of interest in this study. This started with a null model (i.e., Model 0) followed by two subsequent models, building towards a baseline model. The first subsequent model added fixed effects for the countries identified in Stage 1 (i.e., Model 1). The second subsequent model added typical control variables (e.g., sex, socio-economic status). This process formed the baseline model (i.e., Model 2) against which the variables of interest were tested, one at a time.

Then Models 3, 4, 5, and 6 swapped in the main variables of interest for this study (i.e., LD, HDI, GAIN, and FD Ratio), one at a time. LD was added at the student-level while HDI, GAIN, and FD Ratio were at the country-level. Next, Models 7, 8, and 9 tested immigration year versions of the country-level variables (i.e., HDI, GAIN, and FD Ratio).
For example, the Immigration Year HDI was a measure of each student’s HDI value around their unique time of immigration versus the shared country-wide value for all students from that country. These variables were individually swapped in one at a time and tested, just as the country-level variables were.

In the end, this modeling procedure identified just one covariate of interest that was both statistically significant and interpretable: language distance from Model 3. Immigration Year HDI was also statistically significant in Model 7 but was not interpreted because of possible collinearity. The other variables were not statistically significant when tested individually. Table G54 compares all models in Stage 2 at once.

Table G54 Models with Significant Covariates Compared to Baseline

Model                                        Deviance  Level    Variance  Variance By Level
Model 0 (Null)                               92931     Student  3625      29%
                                                       School   6510      51%
                                                       Country  2518      20%
Model 1 (Null + Stage 1 Effects)             92355     Student  3638      35%
                                                       School   5205      50%
                                                       Country  1585      15%
Model 2 (Null + Stage 1 Effects + Controls)  91720     Student  3539      40%
                                                       School   4293      48%
                                                       Country  1112      12%
Model 3 (Language Distance)                  91625*    Student  3488      39%
                                                       School   4267      48%
                                                       Country  1079      12%
Model 4 (HDI)                                91718     Student  3539      40%
                                                       School   4291      48%
                                                       Country  1089      12%
Model 5 (GAIN)                               91716*    Student  3540      40%
                                                       School   4289      48%
                                                       Country  1075      12%
Model 6 (FD Ratio)                           91716*    Student  3538      40%
                                                       School   4296      48%
                                                       Country  1041      12%
Model 7 (Immigration Year HDI)               91672*    Student  3515      33%
                                                       School   4169      39%
                                                       Country  3047      28%
Model 8 (Immigration Year GAIN)              91695*    Student  3506      36%
                                                       School   4276      43%
                                                       Country  2079      21%
Model 9 (Immigration Year FD Ratio)          91715*    Student  3528      40%
                                                       School   4319      48%
                                                       Country  1058      12%

Note. Asterisk after the model name denotes the model deviance is statistically significantly different from the baseline Model 2. Bold text denotes models with covariates that were statistically significant within the model.
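The two quantities compared throughout Stage 2, the likelihood ratio test on the change in deviance and the share of variance at each level, can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names are hypothetical, and the test assumes one degree of freedom (one added fixed effect). Because the deviances reported above are rounded, p-values recomputed this way are only approximate and will not match the reported values exactly.

```python
import math


def lrt_p_value_1df(deviance_baseline, deviance_model):
    """Approximate likelihood ratio test p-value, assuming 1 degree of freedom.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    diff = deviance_baseline - deviance_model
    if diff <= 0:
        return 1.0  # no improvement in fit over the baseline
    return math.erfc(math.sqrt(diff / 2.0))


def variance_shares(student, school, country):
    """Percentage of total variance at each level, rounded to whole percent."""
    total = student + school + country
    return tuple(round(100 * v / total) for v in (student, school, country))


# Rounded values from the baseline (Model 2) and Model 7 rows of Table G54.
p_model7 = lrt_p_value_1df(91720, 91672)
shares_baseline = variance_shares(3539, 4293, 1112)
shares_model7 = variance_shares(3515, 4169, 3047)

print(p_model7 < 0.001)
print(shares_baseline)
print(shares_model7)
```

With the rounded Table G54 values, the shares reproduce the 40%/48%/12% split for the baseline and the 33%/39%/28% split for Model 7.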
APPENDIX H: COMPLETE LIST OF IMMIGRATION PATHS

Table H1 Immigrant Students Paths from Origins to Destinations, Sorted by Country

#    Destination Country  Origin Country  Origin to Dest. Count  Dest. Cntry. Immig. Count
1    CAN                  PHL             526                    1705
2    CAN                  USA             278                    1705
3    CAN                  CHN             253                    1705
4    CAN                  IND             171                    1705
5    CAN                  GBR             102                    1705
6    CAN                  PAK             100                    1705
7    CAN                  KOR             73                     1705
8    CAN                  FRA             66                     1705
9    CAN                  IRN             51                     1705
10   CAN                  SYR             44                     1705
11   CAN                  ARE             41                     1705
12   QAT                  EGY             1035                   1622
13   QAT                  JOR             333                    1622
14   QAT                  YEM             254                    1622
15   JOR                  SYR             849                    971
16   JOR                  IRQ             87                     971
17   JOR                  EGY             35                     971
18   AUS                  GBR             249                    905
19   AUS                  NZL             177                    905
20   AUS                  PHL             148                    905
21   AUS                  CHN             143                    905
22   AUS                  IND             143                    905
23   AUS                  VNM             27                     905
24   AUS                  GRC             10                     905
25   AUS                  ITA             8                      905
26   NZL                  GBR             161                    548
27   NZL                  ZAF             93                     548
28   NZL                  PHL             89                     548
29   NZL                  CHN             84                     548
30   NZL                  AUS             80                     548
31   NZL                  KOR             24                     548
32   NZL                  FJI             17                     548
33   CHE                  PRT             133                    404
34   CHE                  ITA             95                     404
35   CHE                  DEU             89                     404
36   CHE                  FRA             32                     404
37   CHE                  ESP             24                     404
38   CHE                  TUR             13                     404
39   CHE                  AUT             10                     404
40   CHE                  ALB             8                      404
41   BEL                  NLD             103                    247
42   BEL                  DEU             72                     247
43   BEL                  FRA             66                     247
44   BEL                  TUR             8                      247
45   AZE                  RUS             154                    201
46   AZE                  GEO             42                     201
47   AZE                  TUR             5                      201
48   ARG                  BOL             91                     191
49   ARG                  PRY             67                     191
50   ARG                  BRA             15                     191
51   ARG                  CHL             11                     191
52   ARG                  URY             7                      191
53   ARG                  NIC             99                     215
54   CRI                  NIC             171                    191
55   CRI                  COL             14                     191
56   CRI                  PAN             6                      191
57   BIH                  HRV             92                     178
58   BIH                  SRB             84                     178
59   BIH                  MNE             2                      178
60   IRL                  GBR             178                    178
61   ISR                  USA             84                     156
62   ISR                  ETH             43                     156
63   ISR                  FRA             29                     156
64   GEO                  RUS             133                    148
65   GEO                  AZE             8                      148
66   GEO                  ARM             7                      148
67   AUT                  DEU             71                     127
68   AUT                  AFG             24                     127
69   AUT                  SYR             17                     127
70   AUT                  TUR             15                     127
71   FIN                  EST             43                     122
72   FIN                  RUS             31                     122
73   FIN                  CHN             17                     122
74   FIN                  SWE             15                     122
75   FIN                  IRQ             7                      122
76   FIN                  AFG             5                      122
77   FIN                  VNM             3                      122
78   FIN                  TUR             1                      122
79   GBR                  IRL             122                    122
80   MDA                  RUS             70                     109
81   MDA                  UKR             24                     109
82   MDA                  ROU             15                     109
83   MNE                  SRB             82                     103
84   MNE                  BIH             14                     103
85   MNE                  ALB             6                      103
86   MNE                  HRV             1                      103
87   PAN                  VEN             55                     102
88   PAN                  COL             30                     102
89   PAN                  NIC             8                      102
90   PAN                  DOM             6                      102
91   PAN                  CHN             3                      102
92   BLR                  RUS             53                     99
93   BLR                  UKR             27                     99
94   BLR                  KAZ             15                     99
95   BLR                  POL             4                      99
96   CZE                  UKR             32                     95
97   CZE                  SVK             28                     95
98   CZE                  CHN             12                     95
99   CZE                  RUS             12                     95
100  CZE                  VNM             11                     95
101  MEX                  USA             79                     77
102  HRV                  BIH             58                     77
103  HRV                  SRB             19                     77
104  DEU                  POL             31                     75
105  DEU                  ITA             10                     75
106  DEU                  TUR             9                      75
107  DEU                  HRV             8                      75
108  DEU                  BIH             7                      75
109  DEU                  GRC             5                      75
110  DEU                  SRB             4                      75
111  DEU                  MKD             1                      75
112  DNK                  SYR             19                     67
113  DNK                  ISL             9                      67
114  DNK                  AFG             7                      67
115  DNK                  IRQ             6                      67
116  DNK                  NOR             6                      67
117  DNK                  PAK             6                      67
118  DNK                  SWE             6                      67
119  DNK                  TUR             6                      67
120  DNK                  FIN             2                      67
121  MAR                  ESP             22                     64
122  MAR                  FRA             21                     64
123  MAR                  ITA             18                     64
124  MAR                  DEU             3                      64
125  SAU                  JOR             25                     63
126  SAU                  KWT             13                     63
127  SAU                  QAT             13                     63
128  SAU                  USA             9                      63
129  SAU                  AUS             5                      63
130  URY                  ARG             33                     63
131  URY                  BRA             30                     63
132  PRT                  BRA             59                     61
133  PRT                  CHN             2                      61
134  UKR                  RUS             29                     43
135  UKR                  MDA             7                      43
136  UKR                  BLR             4                      43
137  UKR                  KAZ             3                      43
138  LVA                  RUS             31                     42
139  LVA                  UKR             9                      42
140  LVA                  BLR             2                      42
141  PHL                  SAU             15                     37
142  PHL                  USA             10                     37
143  PHL                  CHN             8                      37
144  PHL                  ARE             4                      37
145  PHL                  SAU             15                     42
146  SVK                  CZE             21                     33
147  SVK                  HUN             12                     33
148  IDN                  MYS             17                     33
149  IDN                  NLD             9                      33
150  IDN                  AUS             3                      33
151  IDN                  SGP             3                      33
152  KOR                  USA             13                     32
153  KOR                  JPN             8                      32
154  KOR                  CHN             7                      32
155  KOR                  PHL             2                      32
156  KOR                  RUS             2                      32
157  NOR                  SWE             17                     29
158  NOR                  DNK             12                     29
159  TUR                  DEU             18                     26
160  TUR                  RUS             5                      26
161  TUR                  BGR             2                      26
162  TUR                  NLD             1                      26
163  MKD                  ALB             10                     20
164  MKD                  BIH             5                      20
165  MKD                  SRB             5                      20
166  DOM                  ESP             8                      18
167  DOM                  HTI             6                      18
168  DOM                  USA             4                      18
169  SVN                  ITA             8                      10
170  SVN                  HUN             2                      10

APPENDIX I: IMMIGRATION COUNTS AND PERCENTAGES BY COUNTRY

Further analysis of the asymmetric immigration paths was conducted for each country. This analysis shows how many immigrant students are found in each destination country (see Table I1). The first column is the row number. The second column indicates the destination country, and the third column shows how many immigrant students ended up in that country. The fourth column shows the percentage of that country’s PISA sample with an immigrant background, among immigrant and native students. For example, Canada (i.e., CAN) had 1705 students with an immigrant background in its PISA 2018 sample, and 20.70% of Canada’s PISA 2018 sample were immigrant students.
Table I1 Immigration Counts and Percentages by Destination Country

#    Country  Immigrant Count  Immigrant % Population
1    CAN      1705             20.70
2    QAT      1622             40.26
3    JOR      971              17.22
4    AUS      905              21.43
5    NZL      548              22.57
6    CHE      404              30.55
7    BEL      247              13.06
8    AZE      201              4.87
9    ARG      191              4.74
10   CRI      191              9.44
11   BIH      178              2.50
12   IRL      178              10.25
13   ISR      156              13.94
14   GEO      148              1.03
15   AUT      127              16.60
16   FIN      122              4.57
17   GBR      122              7.99
18   MDA      109              1.23
19   MNE      103              5.16
20   PAN      102              4.66
21   BLR      99               3.78
22   GRC      98               10.25
23   CZE      95               3.30
24   MEX      79               0.78
25   HRV      77               8.78
26   DEU      75               18.71
27   DNK      67               19.35
28   MAR      64               0.71
29   SAU      63               7.35
30   URY      63               0.97
31   PRT      61               4.78
32   UKR      43               1.88
33   LVA      42               4.31
34   PHL      37               0.64
35   SVK      33               0.76
36   IDN      32               0.24
37   KOR      32               0.09
38   NOR      29               7.01
39   TUR      26               0.59
40   MKD      20               1.10
41   DOM      18               2.44
42   SVN      10               4.38

Table I2 shows the unique country-to-country pairings of immigrant students within the sample. The first column is the row number. The second column indicates the destination country, and the third column shows the origin country. The fourth column is the count of cases for that particular country pair. For example, the path from Egypt (origin) to Qatar (destination) had 1035 cases.
Table I2 Immigrant Students Paths from Origins to Destinations, Sorted by Count

#    Destination Country  Origin Country  Origin to Destination Count
1    QAT                  EGY             1035
2    JOR                  SYR             849
3    CAN                  PHL             526
4    QAT                  JOR             333
5    CAN                  USA             278
6    QAT                  YEM             254
7    CAN                  CHN             253
8    AUS                  GBR             249
9    IRL                  GBR             178
10   AUS                  NZL             177
11   CAN                  IND             171
12   CRI                  NIC             171
13   NZL                  GBR             161
14   AZE                  RUS             154
15   AUS                  PHL             148
16   AUS                  CHN             143
17   AUS                  IND             143
18   CHE                  PRT             133
19   GEO                  RUS             133
20   GBR                  IRL             122
21   BEL                  NLD             103
22   CAN                  GBR             102
23   CAN                  PAK             100
24   GRC                  ALB             98
25   CHE                  ITA             95
26   NZL                  ZAF             93
27   BIH                  HRV             92
28   ARG                  BOL             91
29   CHE                  DEU             89
30   NZL                  PHL             89
31   JOR                  IRQ             87
32   BIH                  SRB             84
33   ISR                  USA             84
34   NZL                  CHN             84
35   MNE                  SRB             82
36   NZL                  AUS             80
37   MEX                  USA             79
38   CAN                  KOR             73
39   BEL                  DEU             72
40   AUT                  DEU             71
41   MDA                  RUS             70
42   ARG                  PRY             67
43   CAN                  FRA             66
44   BEL                  FRA             64
45   PRT                  BRA             59
46   HRV                  BIH             58
47   PAN                  VEN             55
48   BLR                  RUS             53
49   CAN                  IRN             51
50   CAN                  SYR             44
51   FIN                  EST             43
52   ISR                  ETH             43
53   AZE                  GEO             42
54   CAN                  ARE             41
55   JOR                  EGY             35
56   URY                  ARG             33
57   CHE                  FRA             32
58   CZE                  UKR             32
59   DEU                  POL             31
60   FIN                  RUS             31
61   LVA                  RUS             31
62   PAN                  COL             30
63   URY                  BRA             30
64   ISR                  FRA             29
65   UKR                  RUS             29
66   CZE                  SVK             28
67   AUS                  VNM             27
68   BLR                  UKR             27
69   SAU                  JOR             25
70   AUT                  AFG             24
71   CHE                  ESP             24
72   MDA                  UKR             24
73   NZL                  KOR             24
74   MAR                  ESP             22
75   MAR                  FRA             21
76   SVK                  CZE             21
77   DNK                  SYR             19
78   HRV                  SRB             19
79   MAR                  ITA             18
80   TUR                  DEU             18
81   AUT                  SYR             17
82   FIN                  CHN             17
83   IDN                  MYS             17
84   NOR                  SWE             17
85   NZL                  FJI             17
86   ARG                  BRA             15
87   AUT                  TUR             15
88   BLR                  KAZ             15
89   FIN                  SWE             15
90   MDA                  ROU             15
91   PHL                  SAU             15
92   CRI                  COL             14
93   MNE                  BIH             14
94   CHE                  TUR             13
95   KOR                  USA             13
96   CZE                  CHN             12
97   CZE                  RUS             12
98   NOR                  DNK             12
99   SAU                  KWT             12
100  SAU                  QAT             12
101  SVK                  HUN             12
102  ARG                  CHL             11
103  CZE                  VNM             11
104  AUS                  GRC             10
105  CHE                  AUT             10
106  DEU                  ITA             10
107  MKD                  ALB             10
108  PHL                  USA             10
109  DEU                  TUR             9
110  DNK                  ISL             9
111  IDN                  NLD             9
112  LVA                  UKR             9
113  SAU                  USA             9
114  AUS                  ITA             8
115  BEL                  TUR             8
116  CHE                  ALB             8
117  DEU                  HRV             8
118  DOM                  ESP             8
119  GEO                  AZE             8
120  KOR                  JPN             8
121  PAN                  NIC             8
122  PHL                  CHN             8
123  SVN                  ITA             8
124  ARG                  URY             7
125  DEU                  BIH             7
126  DNK                  AFG             7
127  FIN                  IRQ             7
128  GEO                  ARM             7
129  KOR                  CHN             7
130  UKR                  MDA             7
131  CRI                  PAN             6
132  DNK                  IRQ             6
133  DNK                  NOR             6
134  DNK                  PAK             6
135  DNK                  SWE             6
136  DNK                  TUR             6
137  DOM                  HTI             6
138  MNE                  ALB             6
139  PAN                  DOM             6
140  AZE                  TUR             5
141  DEU                  GRC             5
142  FIN                  AFG             5
143  MKD                  BIH             5
144  MKD                  SRB             5
145  SAU                  AUS             5
146  TUR                  RUS             5
147  BLR                  POL             4
148  DEU                  SRB             4
149  DOM                  USA             4
150  PHL                  ARE             4
151  UKR                  BLR             4
152  FIN                  VNM             3
153  IDN                  AUS             3
154  IDN                  SGP             3
155  MAR                  DEU             3
156  PAN                  CHN             3
157  UKR                  KAZ             3
158  BIH                  MNE             2
159  DNK                  FIN             2
160  KOR                  PHL             2
161  KOR                  RUS             2
162  LVA                  BLR             2
163  PRT                  CHN             2
164  SVN                  HUN             2
165  TUR                  BGR             2
166  DEU                  MKD             1
167  FIN                  TUR             1
168  MNE                  HRV             1
169  TUR                  NLD             1
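The relationship between the pair counts in Table I2 and the destination totals in Table I1 can be illustrated with a short aggregation. This is a sketch under the assumption that each destination total is the sum of its origin-to-destination counts; the function name is hypothetical, and only the rows for three destinations are copied from Table I2.

```python
from collections import defaultdict

# (destination, origin, count) rows copied from Table I2 for CAN, QAT, and JOR.
paths = [
    ("CAN", "PHL", 526), ("CAN", "USA", 278), ("CAN", "CHN", 253),
    ("CAN", "IND", 171), ("CAN", "GBR", 102), ("CAN", "PAK", 100),
    ("CAN", "KOR", 73), ("CAN", "FRA", 66), ("CAN", "IRN", 51),
    ("CAN", "SYR", 44), ("CAN", "ARE", 41),
    ("QAT", "EGY", 1035), ("QAT", "JOR", 333), ("QAT", "YEM", 254),
    ("JOR", "SYR", 849), ("JOR", "IRQ", 87), ("JOR", "EGY", 35),
]


def destination_totals(rows):
    """Sum origin-to-destination counts within each destination country."""
    totals = defaultdict(int)
    for destination, _origin, count in rows:
        totals[destination] += count
    return dict(totals)


totals = destination_totals(paths)
print(totals)  # {'CAN': 1705, 'QAT': 1622, 'JOR': 971}
```

For these three destinations the sums equal the immigrant counts listed in Table I1 (1705, 1622, and 971).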