MATCHING FOR BIAS REDUCTION IN TREATMENT EFFECT ESTIMATION OF HIERARCHICALLY STRUCTURED SYNTHETIC COHORT DESIGN DATA By Qiu Wang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Measurement and Quantitative Methods 2010 Abstract MATCHING FOR BIAS REDUCTION IN TREATMENT EFFECT ESTIMATION OF HIERARCHICALLY STRUCTURED SYNTHETIC COHORT DESIGN DATA By Qiu Wang This study uses a multi-level multivariate propensity score matching approach to examine the synthetic cohort design (SCD) in estimating the schooling effect on mathematics proficiency of the focal cohort 2 (8th graders). Collecting 7th and 8th graders at the same time point, the SCD is sufficient in estimating the schooling effect under the historical equivalency of groups (HEoG) assumption. A structural equation modeling (SEM) framework is used to define the HEoG assumption. It has shown that HEoG assures that the use of SCD results in an unbiased estimate of schooling effect without randomized data. The post-hoc group matching is used to achieve the HEoG assumption in order to produce an estimate of schooling effect that is unbiased in SCD. Three matching approaches, level-1 matching, level-2 matching, and dual matching, are evaluated using simulated data generated based on USA participants of the Second International Mathematics Study (SIMS-USA, IEA, 1977). Two-level latent variable models based on situations that violated the HEoG assumption are created in order to examine the ability of matching to reduce the simulated selection biases to improve the accuracy of the schooling effect estimate in SCD. The three simulated situations involve hierarchically structured data, surrogate covariates with measurement errors, and omitted covariates. Results suggest the following: 1) To reduce initial bias and assure the HEoG assumption, three different matching approaches should be conducted on the covariates according to where the initial bias occurs: on level-1 covariates, on level-2 covariates, and on both level-1 and level-2 covariates; 2) When reliability is low (e.g, .25), latent variable matching does not help improve group comparability, but using observed surrogate variables to match can reduce bias by more than 50 percent. When reliability is high (e.g, greater than .75), latent variable matching reduces bias as much as matching on observed surrogate variables does; 3) When level-2 initial bias is large, increasing level-2 R2 does help to improve level-2 matching. The bias reduction of either individual or dual propensity score matching is not sensitive to the increase of R2 . The dual propensity score matching is more robust to the magnitude of initial selection bias, achieving a large bias reduction rate when the initial bias is small. Either level-1 matching or level-2 matching achieves lower bias reduction rate when the initial bias is small. This dissertation study provides a theoretical basis for future research to examine the effectiveness of propensity score matching in reducing the selection bias of SCD for casual inference and program evaluation. Practical considerations and suggestions for future research on hierarchically structured data in program evaluation are discussed. Copyright by QIU WANG 2010 To: My Father v ACKNOWLEDGMENT Completing this doctoral dissertation has been a journey full of explorations, adventures, excitement, and sometimes struggles. During this process, I have been very fortunate to have the guidance, support, and help from my professors, colleagues, and friends. My deepest gratitude goes to my spiritual and academic mentors, Drs. Richard Houang, Kimberley Maier, Matthew Diemer, and William Schmidt. Their working philosophy, efficient working style, thinking-out-of-box ability shaped my own academic work style. Because of their influences, I am now able to work with my students with appreciation and respects. Dr. Kimberly Maier has been very encouraging. Her comments guide my dissertation study and writing, and specially help me efficiently go through the final revision stage. Dr. Maier’s caring nature deeply influenced me. The inner peace I gain through the scriptures she shared with me is the most valuable an advisee can receive. Dr. Richard Houang, with his insightful thoughts, has helped me see through and solve the technical issues of this dissertation. His philosophical metaphor of “the forest and individual trees” has shed light on problems that I am working on in the dissertation. Dr. Houang’s office door is always open to me and our discussions have been directing the study to a new direction. Without Dr. Houang’s help, I would not have been able to complete the dissertation study. Working with Dr. Williams Schmidt as both a graduate research assistant and a teaching assistant has benefited me at many levels. His guidance is invaluable to both me and my wife, as we both were exceptionally fortunate to have him on our dissertation committee. I have also been very fortunate to be financially supported by Dr. Diemer working several vi very important projects that helped shape my research interests on educational equity and school inclusion. I am also very grateful to Dr. Jack Schwille, Dr. David Wiley, Dr. Richard Wolfe, Dr. Ingrit Monk for their thoughtful suggestions and encouraging inputs during my development of my dissertation proposal. Throughout my doctoral studies, many people have supported my professional growth in one way or another: I’m grateful to professors from department of statistics, Dr. James Stapleton ( Categorical data analysis, experimental design, sampling) and Dr. Lijian Yang (Regression), and then student colleague and friend and now statistics professors, Dr. Weixing Song (Kansas State University). I really appreciated the research work done through the financial supports I was awarded by Drs.Betsy J. Becker (Meta-analysis) and Mary M. Kennedy (Science Education), Dr. Akiho Kamata ( Psychometrics and Equating ), Dr. Richard Tate (Multilevel-modeling), Dr. Yeo Meng Thum (Hierarchical Linear Modeling), Dr. Barbara Schneider (Causal inference), my coauthors and friends Dr. Hui Liu and Dr. Brandon Vaughn. I am especially thankful that I have studied with two great scholars, Dr. Tenko Raykov (Structural Equation Modeling and Reliability ) and Dr. Mark Reckase (Psychometrics and Multidimensional Item Response Theory). Many friends Benjamin Ong, Brian Lengseth, David Rayes-Gastelum, and Amanda Lewis also helped me through my doctoral studies. I greatly value their friendship and deeply appreciate their confidence in me. Their support and care helped me overcome setbacks and stay focused on my study. Dr. Yong Zhao and his wife Xi Chen and their son Yechen and daughter Athena have been very supportive to me. They made my years at MSU meaningful and enjoyable with vii warm memories. Mr. Blaine Morrow and Mrs. Linda Morrow are always there with love, support and encouragement to me and my family. My dear wife, Dr. Jing Lei, is always there by my side with her unconditional love. I am so lucky to share my life with her. Because of her, going through this process and finishing my dissertation becomes such a wonderful journey. This material is based upon work supported by the National Science Foundation under Grant No. DUE-0831581. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. PREFACE 11 Daniel then said to the guard whom the chief official had appointed over Daniel, Hananiah, Mishael and Azariah, 12 “Please test your servants for ten days: Give us nothing but vegetables to eat and water to drink. 13 Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see.” 14 So he agreed to this and tested them for ten days. 15 At the end of the ten days they looked healthier and better nourished than any of the young men who ate the royal food. 16 So the guard took away their choice food and the wine they were to drink and gave them vegetables instead. (Daniel 1:11-16 New International Version) ix Contents List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi Nomenclature xvii 1 Introduction 1.1 Research Goals . . . . . . . . 1.2 Solomon Four-Group Design . 1.3 Synthetic Cohort Design . . . 1.4 Why Is HEoG Critical in SCD 1.5 Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Literature Review 2.1 Definitions of Bias and Selection Bias . . . . . . . . . . . . . 2.1.1 Mathematical Definition of Bias at Individual Level . . 2.1.2 How Selection Bias Affects Treatment Effect Estimate . 2.1.3 Selection Bias in Hierarchically Structured Data . . . . 2.2 Propensity Score Matching for Bias Reduction . . . . . . . . 2.3 Matching on Hierarchically Structured Data . . . . . . . . . . 2.3.1 Level-1 Matching . . . . . . . . . . . . . . . . . . . . . 2.3.2 Level-2 Matching . . . . . . . . . . . . . . . . . . . . . 2.3.3 Dual-Matching . . . . . . . . . . . . . . . . . . . . . . 2.4 Measurement Errors and Matching . . . . . . . . . . . . . . . 2.4.1 Measurement Errors Adjusted Propensity Scores . . . 2.4.2 Structural Equation Modeling as an Alternative . . . . 2.5 Omitted Variables . . . . . . . . . . . . . . . . . . . . . . . . 3 Theoretical Framework 3.1 Solomon Four-Group Design in SEM Framework . . . . . . 3.1.1 SEM of Experimental Group 1 . . . . . . . . . . . 3.1.2 SEM of Control Group 1 . . . . . . . . . . . . . . 3.1.3 SEM of Experimental Group 2 . . . . . . . . . . . 3.1.4 SEM of Control Group 2 . . . . . . . . . . . . . . 3.1.5 Pre-Equivalence of Groups (PEoG) Assumption . . 3.2 Extended Solomon Four-Group Design in SEM Framework 3.2.1 Extended-PEoG Assumption . . . . . . . . . . . . x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 3 5 7 10 11 . . . . . . . . . . . . . 13 14 14 16 18 20 22 24 24 25 25 26 28 30 . . . . . . . . 33 33 35 36 37 37 37 40 43 3.3 3.4 Synthetic Cohort Design in the Context of Solomon Four-Group Design . . . Matching and HEoG Assumption . . . . . . . . . . . . . . . . . . . . . . . . 47 50 4 Simulation Study 53 4.1 Data and Conceptual Model . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.1.1 Two-Level Structural Equation Model Based on Data of SIMS-USA . 63 4.1.2 Longitudinal Data Generation . . . . . . . . . . . . . . . . . . . . . 73 4.2 Generate Synthetic Cohort Design Data with Simulated Selection Bias . . . 80 4.2.1 Generate Hierarchically Structured C1T 1 Data with Selection Bias . 83 4.2.1.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s . . . . 83 4.2.1.2 C1T 1’s Level-1 Covariate Variances Differ from C2T 0’s . . . 84 4.2.1.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s . . . . 84 4.2.1.4 C1T 1’s Level-2 Covariate Variances Differ from C2T 0’s . . . 85 4.2.1.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s 85 4.2.2 Generate Data for Matching on Latent Variables v.s. Matching on Surrogate Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.2.2.1 C1T 1’s Surrogate Variable Means Differ from C2T 0’s, with the Same Latent Means and Low Reliability . . . . . . . . . 86 4.2.2.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with the Same Surrogate Means and the Same Latent Means 88 4.2.2.3 C1T 1’s Surrogate Variables Have Higher Reliability, Different Latent Variable Mean from C2T 0’s . . . . . . . . . . . . 89 4.2.2.4 C1T 1’s Latent Variable Mean Differ from C2T 0’s, with Same Higher Reliability . . . . . . . . . . . . . . . . . . . . . . . . 89 4.2.3 Manipulate R2 to Generate Data for Matching . . . . . . . . . . . . . 90 4.2.3.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with 2 Level-1 Variance σepre Reduced by Half . . . . . . . . . . . 90 4.2.3.2 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with 2 Level-1 Variance σepre Reduced by Half, and Initial Difference Reduced . . . . . . . . . . . . . . . . . . . . . . . . . . 90 4.2.3.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with 2 Level-2 Variance σuα Reduced by Half . . . . . . . . . . . 91 0 4.2.3.4 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with 2 Level-2 Variance σuα Reduced by Half and Initial Difference 0 Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . . 91 4.2.3.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s with Both Level-1 and Level-2 Variances Reduced by Half . 92 4.2.3.6 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s, with Both Level-1 and Level-2 Variances Reduced by Half and Total Initial Difference Reduced . . . . . . . . . . . . . 92 4.3 Simulation Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 4.3.1 Compute Initial Difference . . . . . . . . . . . . . . . . . . . . . . . . 93 4.3.2 Compute After Matching Bias . . . . . . . . . . . . . . . . . . . . . . 94 xi 4.3.3 Compute Bias Reduction Rate . . . . . . . . . . . . . . . . . . . . . 95 5 Matching Simulation Results and Discussions 5.1 Three Types of Matching Routines . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Level-1 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.2 Level-2 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.3 Dual Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Simulation Results of Matching on Level-1 and/or Level-2 Covariates . . . . 5.2.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s . . . . . . . . . 5.2.2 C1T 1’s Level-1 Covariate Variances Differ from C2T 0’s . . . . . . . . 5.2.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s . . . . . . . . . 5.2.4 C1T 1’s Level-2 Covariate Variances Differ from C2T 0’s . . . . . . . . 5.2.5 Dual Matching Simulation Results . . . . . . . . . . . . . . . . . . . 5.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3 Simulation Results of Matching on Level-1 Latent Variable and Surrogate Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.1 C1T 1’s Surrogate Variable Means Differ from C2T 0’s, with the Same Latent Means and Low Reliability . . . . . . . . . . . . . . . . . . . . 5.3.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with the Same Surrogate Means and the Same Latent Means . . . . . . . . 5.3.3 C1T 1’s Surrogate Variables Have Higher Reliability, Different Latent Variable Mean from C2T 0’s . . . . . . . . . . . . . . . . . . . . . . . 5.3.4 C1T 1’s Latent Variable Mean Differ from C2T 0’s, with the Same Higher Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4 Simulation Results of Matching When R2 Being Manipulated . . . . . . . . 5.4.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Variance Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . . 5.4.2 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Variance Reduced by Half, and Initial Difference Reduced . . . . . . 5.4.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with Level-2 Variance Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.4 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with Level-2 Vari2 ance σuα Reduced by Half, and Initial Difference Reduced by Half . 0 5.4.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s with Both Level-1 and Level-2 Variances Reduced by Half . . . . . . . . . 5.4.6 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s, with Both Level-1 and Level-2 Variances Reduced by Half, and Total Initial Difference Reduced . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 96 96 98 99 100 100 100 101 101 103 103 104 105 107 108 108 109 110 112 112 113 113 114 114 115 6 Discussions 116 6.1 Extend the Analysis to another Type of Math Classes . . . . . . . . . . . . . 117 6.2 Incomplete Matching Due to Small Cluster Size . . . . . . . . . . . . . . . . 118 xii 6.3 Role of Covariates in Synthetic Cohort Design . . . . . . . . 6.3.1 On Which Covariates to Match . . . . . . . . . . . . 6.3.2 Concern on Chronological Variables such as Age and OTL . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.3 Two Types of Level-2 Covariates . . . . . . . . . . . 6.3.4 Interaction Terms as Omitted Covariates . . . . . . . 6.4 Deal with Students under Retention in Matching . . . . . . 6.5 Improve Measurement Accuracy in Education Studies . . . . 6.6 Situations Where HEoG May Fail . . . . . . . . . . . . . . . 6.7 Statistical Power as an After-Matching Evaluation Index . . 6.8 After-Matching Statistical Analyses . . . . . . . . . . . . . 6.9 Synthetic Cohorts Design and Life-Course Research . . . . . 6.10 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Simulation Code A.1 Mplus Code Fitting the Two-Level SEM on SIMS-USA Data A.2 Mplus Code Generating Data for Mont Carlo Simulation . . A.3 R Code for Level-1 Matching . . . . . . . . . . . . . . . . . . A.4 Code for Level-2 Matching . . . . . . . . . . . . . . . . . . . A.5 R Code for Dual Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . Grade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Specific . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122 123 123 124 125 126 127 129 129 132 136 . . . . . 138 138 141 143 149 153 . . . . . . . . . . . . . . . . . . . . 121 122 B Variance-Covariance Decomposition of the Extended Solomon Four-Group Design (SFGD) Based On Two-Level SEM Framework 158 B.1 Variance-Covariance Matrix of SFGD Experimental Group 1 . . . . . . . . . 159 B.2 Variance-Covariance Matrix of SFGD Experimental Group 2 . . . . . . . . . 160 B.3 Variance-Covariance Matrix of SFGD Control Group 1 . . . . . . . . . . . . 161 B.4 Variance-Covariance Matrix of SFGD Control Group 2 . . . . . . . . . . . . 162 B.5 Detailed Variance-Covariance Decomposition . . . . . . . . . . . . . . . . . . 163 xiii List of Tables 3.1 Solomon Four-Group Design in Structural Equation Modeling Framework . . 34 3.2 SEMs of the Extended Solomon Four-Group Design and Covariance Matrixes 42 4.1 Level-1 Descriptive Statistics of the Final Two-Level Structural Equation Model (N=2,296) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.1 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.2 Level-2 Descriptive Statistics of the Final Two-Level Structural Equation Model (N=126) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.3 ¯ The Level-1 Variance Covariance Matrix (S1 ) and Means (X1 ) . . . . . . . . 59 4.3 Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 4.4 ¯ The Level-2 Variance Covariance Matrix (S2 ) and Means (W2 ) . . . . . . . . 61 4.4 Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.5 Two-Level Structural Equation Model Estimates (a.k.a True Pseudo-population Parameter Values) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.5 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.6 ˆ Model Estimated Parameters: Level-1 Variance Covariance Matrix (Σ1 ) and Mean (ˆ1 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . µ 69 4.6 Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 4.7 ˆ Model Estimated Parameters: Level-2 Variance Covariance Matrix (Σ2 ) and Mean (ˆ2 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . µ 71 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 4.7 xiv 4.8 Covariance Matrix of the Five Latent Variables . . . . . . . . . . . . . . . . 73 4.9 Class Size Distribution of 126 Classes of SIMS-USA Data . . . . . . . . . . . 74 4.10 Recovery of Pseudo-Population Parameters. . . . . . . . . . . . . . . . . . . 75 4.10 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.10 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 4.11 Possible Simulation Manipulations on Comparability of C2T0 and C1T1 in SEM Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 4.12 Simulation Design of Matching on Latent and Surrogate Variables . . . . . . 87 5.1 Bias Reduction Rates of the Three Types of Matching . . . . . . . . . . . . . 102 5.2 Simulation Results of Matching on Level-1 Latent Variable and Surrogate Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.3 Bias Reduction Rates of Three Type of Matching with Higher R2 . . . . . . 111 xv List of Figures 1.1 The Solomon Four-Group Design: R represents group randomization, T treatment, and O assessment. Besides randomization, matching as the other approach to create comparable groups (Solomon, 1949). . . . . . . . . . . . . . 5 Longitudinal vs. Quasi-Longitudinal Comparison. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. . . . . . . . . . . . . . . . . . . . . . . 8 3.1 Synthetic cohort design in the context of Solomon Four-Group Design-G1 . . 47 3.2 Three data sets, two-way matching and the HEoG assumption . . . . . . . . 52 4.1 Conceptual framework model on SIMS-USA data . . . . . . . . . . . . . . . 63 4.2 Two-level structural equation model on SIMS-USA data . . . . . . . . . . . 65 1.2 xvi Nomenclature SCD Synthetic Cohort Design, a quasi-longitudinal design, page 1. TIMSS the Third International Mathematics and Science Study, page 1. SIMS the Second International Mathematics Study, page 1. CjT t Cohort j at Time t, with j = 1, 2 and t = 0, 1, page 1, page 51. δC2T 1−C2T 0 schooling effect based on Cohort 2 across Time 0 and Time 1 in longitudinal design, page 1. δC2T 1−C1T 1 schooling effect based on Cohort 1 and Cohort 2 at Time 1 in quasilongitudinal design, page 1. HEoG the historical equivalence of groups assumption, page 2. X vector of level-1 and level-2 covariates, page 2. x a covariate, page 2. p number of covariates in vector X, page 2. DF the discriminant function, page 2. V1 Σw the first eigenvector vector, page 2. Σb between-group variance covariance matrixes of X s, page 2. DDA descriptive discriminant analysis, page 2. R2 proportion of the explained variance by the model, page 3. SEM structural equation model, page 3. SFGD the Solomon Four-Group Design, page 4. R group randomization, page 5. O operation of assessment, page 5. within-group variance covariance matrixes of X s, page 2. xvii T treatment, page 5. Yti outcome vector of group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0,1, page 6. outcome vector of group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0,1, page 33. ¯ Yti mean of group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0,1, page 6. E1 , E2 Experimental Group 1, 2 in SFGD, respectively, page 6. C1 , C2 Control Group 1, 2 in SFGD, respectively, page 6. α main effect due to history or prior learning in SFGD, page 6. C2 α0 main effect due to history or Prior learning of Cohort 2(8th grade) at Time 0 in SCD, respectively, page 7. C1 α0 main effect due to history or Prior learning of Cohort 1(7th grade) at Time 0 in SCD, respectively, page 7. τ main effect due to taking the pre-test in SFGD, page 6. τ C1 main effect due to taking the pre-test of Cohort 1(7th grade) in SCD, page 7. τ C2 main effect due to taking the pre-test of Cohort 2 (8th grade)in SCD, page 7. α×τ joint effect (the interaction) of prior learning (α) and taking the pretest (τ ) in SFGD, page 7. γ main effect due to maturation from Time 0 to Time 1 in SFGD, page 6. γ C1 main effect due to maturation between Time 0 and Time 1 of Cohort 1(7th grade) in SCD, page 7. γ C2 main effect due to maturation between Time 0 and Time 1 of Cohort 2 (8th grade)in SCD, page 7. δ main effect due to the treatment in SFGD, page 6. population level treatment effect, see equation (2.7), page 16. intervention effect in SEM, see equation (3.2), page 35. intercept of the measurement model of outcome Y in the extended SEM-based SFGD, see equation (3.12), page 41. xviii δ C1 main effect due to schooling effect of 7th grade instruction in Cohort 1 in SCD, page 7. ¯ Ytl the mean of the dependent variable for cohort l at time t, l = C1, C2 and t = 0, 1 in SCD, page 7. E(.) Expectation/Mean function, page 6. C1, C2 Cohort 1 (7th grade), Cohort 2 (8th grade) in SCD, respectively, page 7. ⇒ reads implies, page 9. f (.) some additive function, page 6. a function estimating the propensity score, page 80. | reads given, page 9. BIAS(.) bias function of an estimator, see equation (1.2), page 9. ˆ BIAS(.) manipulated bias, see equation (4.6), page 84. ICC intraclass correlation, page 13. D yi ith response/outcome of group D, see equation (2.1), page 15. D group membership index variable: treatment (D = 1) or control (D = 0), see equation (2.1), page 15. nD Dth group size, see equation (2.1), page 15. uD i the ith random errors in Dth group, see equation (2.1), page 15. µD the mean of Dth group’s outcome variable y, see equation (2.1), page 15. µD X the mean vector of covariates X in Dth group, page 15. i random error, which is u1 minus u0 , see equation (2.2), page 15. i i M D (X) the mean function of Dth group in terms of covariates X, see equation (2.4), page 15. αD intercept of the regression equation of outcome y D in Dth group, see equation (2.4), page 15. βD covariates X’s regression coefficient vector in the regression equation of outcome y D in Dth group, see equation (2.4), page 15. xix δ(X) treatment effect of the counterfactual model including covariates X’s, see equation (2.6), page 16. ∆(X) treatment effect bias of the counterfactual model including covariates X’s, see equation (2.7), page 16. ∆X non-zero constant vector, the treatment and control group mean difference of covariates X , see equation (2.8), page 17. ∆β β 1 − β 0 , difference of covariates X’s regression coefficients between treatment group and control group, see equation (2.12), page 17. βX×D regression coefficient vector of the interaction terms between covariates X and treatment status variable D, page 18. D Yik outcome vector of ith individual in k th cluster of the Dth group, see equation (2.15), page 19. D Xik level-1 covariates X measured on ith individual in k th cluster of the Dth group, see equation (2.15), page 19. D Wik level-2 covariates W measured on ith individual in k th cluster of the Dth group, see equation (2.15), page 19. µX the population mean vector of level-1 covariates, see equation (2.15), page 19. µW the population mean vector of level-2 covariates, see equation (2.15), page 19. µX the population mean vector of k th class’s level-1 covariates, see equa- k tion (2.15), page 19. β pooled within-level-2-unit regression coefficient vector of the level-1 variables, see equation (2.15), page 19. βk within-level-2-unit regression coefficient specifically for k th level-2 school, see equation (2.15), page 19. βX the regression coefficient vector of the observed level-1 variables, see equation (2.15), page 19. βW the regression coefficient vector of the observed level-2 variables, see equation (2.15), page 19. CRT cluster-randomized trial, page 22. ECLS the Early Childhood Longitudinal Program, page 22. xx LSAY the Longitudinal Study of American Youth, page 22. ˜ β attenuated regression coefficient due to measurement errors in the covariate x, page 25. R attenuation rate due to measurement errors in the covariate x, page 25. ρ reliability coefficient, page 26. P r(D = 1|X) propensity score function in terms of covariates X, page 26. logit logit function, see equation (2.17), page 27. H the covariate vector,(h1 , ..., hq ) without measurement errors, see equation (2.17), page 27. X∗ the true covariates measured by vector X with errors, see equation (2.18), page 27. N sample size index, page 26. n1 , n2 sample size of sample 1 and sample 2, respectively, page 26. r residual term, see equation (2.18), page 27. logit−1 the inverse-logit function, or logistic function, see equation (2.21), page 27. X ∗ |X, H the true X ∗ given X and H, page 28. µX ∗ |X,H the mean vector of the conditional distribution of the true X ∗ given X and H, page 28. MN multivariate normal distribution, page 28. N univariate normal distribution, page 41. σX ∗ |X,H the variance-covariance matrix of the conditional distribution of the true X ∗ given X and H, page 28. MCMC Markov chain Monte Carol, page 28. ι0 , ι1 , ι∗ , ι∗ 0 1 regression coefficients in the hybrid model, see equation (2.26), page 29. 2 2 σ1 , σ2 within level-2 unit variance and between level-2 unit variamnce, respectively, see equation (2.27), page 32. η0 , η1 latent mathematics proficiency at pre-test and post-test time points, respectively, page 35. xxi ε0 , ε1 residual terms at pre-test and post-test time points, respectively, see equation (3.1), page 35. a0 , a1 factor loading vectors at pre-test and post-test time points, respectively, page 36. b0 , b1 item difficulty parameter vectors at at pre-test and post-test time points, respectively, page 36. ν acceleration effect of intervention, i.e., the regression coefficient in the structural equation of SEM , see equation (3.5), page 36. ⊥ reads perpendicular or independent of , page 38. PEoG Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based SFGD, page 37. PEoG-1 Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based SFGD’s Group 1, page 38. PEoG-2 Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based SFGD’s Group 2, page 38. SFGD-G2 SFGD’s Group 2, page 40. λ factor loading of the measurement model of outcome Y in the extended SEM-based SFGD, see equation (3.12), page 41. v, g intercept and factor loading of the measurement model of covariates X in the extended SEM-based SFGD, see equation (3.12), page 41. e1 , e2 residual terms of the measurement models of Y and X in the extended SEM-based SFGD, see equation (3.12), page 41. η, ξ latent variables (factors) of the measurement models of Y and X in the extended SEM-based SFGD, see equation (3.12), page 41. V(ξ) latent variable ξ’s variance, Ψξ in the extended SEM-based SFGD, see equation (B.3), page 163. Ψb , Ψw ξ ξ latent variable ξ’s within-cluster and within-cluster variances in the extended SEM-based SFGD, see equation (B.3), page 163. V(η) latent variable η’s variance, Ψη in the extended SEM-based SFGD, see equation (B.9), page 164. Ψb , Ψw η η latent variable η’s within-cluster and within-cluster variances in the extended SEM-based SFGD, see equation (B.9), page 164. xxii Y0 , Y1 outcome variable Y at Time 0 and Time 1, respectively, in the extended SEM-based SFGD, see equation (3.14), page 41. δ0 , δ1 intercept vectors of the measurement model of outcome Y0 and Y1 , respectively, in the extended SEM-based SFGD, see equation (3.14), page 41. λ0 , λ1 factor loading vectors of the measurement model of outcome Y0 and Y1 , respectively, in the extended SEM-based SFGD, see equation (3.14), page 41. e10 , e11 residual terms of the measurement model of outcome Y0 and Y1 , respectively, in the extended SEM-based SFGD, see equation (3.14), page 41. X0 , X1 covaraites X at Time 0 and Time 1, respectively, in the extended SEM-based SFGD, see equation (3.16), page 41. v0 , v1 intercept vectors of the measurement model of X0 and X1 , respectively, in the extended SEM-based SFGD, see equation (3.16), page 41. g0 , g1 factor loading vectors of the measurement model of X0 and X1 , respectively, in the extended SEM-based SFGD, see equation (3.16), page 41. e20 , e21 residual terms of the measurement model of X0 and X1 , respectively, in the extended SEM-based SFGD, see equation (3.16), page 41. Θe1 , Θe2 variances of the residual terms in the extended SEM-based SFGD, see equation (3.12), page 41. Θw , Θb e2 e2 within-cluster and within-cluster variances of the residual e2 in the extended SEM-based SFGD, see equation (B.4), page 164. Θw , Θb e1 e1 within-cluster and within-cluster variances of the residual e1 in the extended SEM-based SFGD, see equation (B.12), page 165. A, B, U intercept, factor loading, and residual term of the structural model in the extended SEM-based SFGD, see equation (3.13), page 41. ΘU variance of the residual term of the structural model in the extended SEM-based SFGD, see equation (3.13), page 41. Θb , Θw U U within-cluster and within-cluster variances of the residual U of the structural model in the extended SEM-based SFGD, see equation (B.8), page 164. xxiii a, π intercept and regression coefficient of the structural model in the extended SEM-based SFGD, see equation (3.17), page 43. A0 , A1 intercept vectors of the structural model in the extended SEM-based SFGD, see equation (3.18), page 43. B0 , B1 factor loading vectors of the structural model in the extended SEMbased SFGD, see equation (3.18), page 43. U0 , U1 residual terms , see equation (3.18), page 43. Y eari , Y eari+1 two adjacent years in the longitudinal design, page 48. Φw , Φb X X within-cluster and within-cluster variances of X in the extended SEM-based SFGD, see equation (B.5), page 164. Φw , Φb Y Y within-cluster and within-cluster variances of Y in the extended SEMbased SFGD, see equation (B.13), page 165. Ψw , Ψb η0 η0 within-cluster and within-cluster variances of latent variable η0 in the extended SEM-based SFGD, see equation (B.24), page 168. Ψw , Ψb ξ0 ξ0 within-cluster and within-cluster variances of latent variable ξ0 in the extended SEM-based SFGD, see equation (B.29), page 169. V(.) variance function, page 82. Cov(.) covariance function , page 159. ¯ X1 , S1 level-1 variable mean vector and variance-covariance matrix, page 58. SIMS-USA SIMS data collected in the United States, page 58. µ1 , Σ 1 ˆ ˆ estimated level-1 variable mean vector and variance-covariance matrix, page 66. ¯ W2 , S2 level-2 variable mean vector and variance-covariance matrix, page 58. µ2 , Σ 2 ˆ ˆ estimated level-2 variable mean vector and variance-covariance matrix, page 66. STU, SCH represent STUDENT and SCHOOL, respectively in Figure 4.1, page 63. OTL opportunity to learn, measured by the curriculum coverage, page 63. Coef. loading or regression coefficient, page 67, 68. SE standard error, page 67, 68. xxiv PV p-value, page 67, 68. POSTTEST post-test outcome variable, see equation (4.1), page 64. α0 , α1 intercept and regression coefficient of the level-1 model of post-test, see equation (4.1), page 64. epre , epost error terms of the level-1 model of pre- and post-test, respectively, page 64. PRETEST pre-test outcome variable, see equation (4.2), page 64. EDUCEPT education expectation, see equation (4.2), page 64. EDUINSP latent variable, educational inspiration, see equation (4.2), page 64. SLFENCRG latent variable, self encouragement, see equation (4.2), page 64. FMLSUPRT latent variable, family support, see equation (4.2), page 64. MTHIMPT latent variable, importance of learning mathematics, see equation (4.2), page 64. SES latent variable, socioeconomic status, see equation (4.2), page 64. β0 , . . . , β 9 intercept and regression coefficients of the level-1 model of pre-test, see equation (4.2), page 64. 2 2 σepre , σe post variances of error terms of the level-1 model of pre- and post-test, respectively, page 64. CLASSSIZE class-size, see equation (4.3), page 64. MTHONLY proportion of qualified math teachers, see equation (4.3), page 64. PRETTEST MEAN β0 , intercept of level-1 pre-test score model in Figure 4.2, page 65. POSTTEST MEAN α0 , intercept of level-1 post-test score model in Figure 4.2, page 65. γ0 , . . . , γ4 intercept and regression coefficients of the level-2 model of pre-test, see equation (4.3), page 64. γ5 , . . . , γ7 regression coefficients of the level-2 model of post-test, see equation (4.4), page 66. uβ , uα0 0 error terms of the level-2 model of pre- and post-test, respectively, page 64. 2 2 σu , σuα β0 0 variances of error terms of the level-2 model of pre- and post-test, respectively, page 66. xxv v C2T 0 , v C1T 1 intercept vector of the measurement model of covariates X in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. C2T C1T e2 0 , e2 1 residual vector of the measurement model of covariates X in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. ξ C2T 0 , ξ C1T 1 latent factor vector of the measurement model of covariates X in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. g C2T 0 , g C1T 1 factor loading vector of the measurement model of covariates X in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. δ C2T 0 , δ C1T 1 intercept vector of the measurement model of outcome Y in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. C2T C1T e1 0 , e1 1 residual vector of the measurement model of outcome Y in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. η C2T 0 , η C1T 1 latent factor vector of the measurement model of outcome Y in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. λC2T 0 , λC1T 1 factor loading vector of the measurement model of outcome Y in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82. AC2T 0 , AC1T 1 intercept vector of the structural model in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, see equation (4.6), page 82. B C2T 0 , B C1T 1 factor loading vector of the structural model in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, see equation (4.6), page 82. U C2T 0 , U C1T 1 residual term of the structural model in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively , see equation (4.6), page 82. c1 a (constant) vector , page 83. p1 a (multiplier) vector , page 84. p2 a (multiplier) vector, page 85. c3 a (constant) vector, page 86. SU M (.) sum function adding up all the components of a matrix, page 87. c4 a (constant) vector , page 89. ˆP δi osP re estimate of schooling effect δ P osP re from sample i (size is ni ) based upon longitudinal C2T 0 − C2T 1 data, see equation (4.10), page 93. xxvi ˆSCD δi ith estimate of schooling effect δ SCD based upon C1T 1 − C2T 1 data, see equation (4.11), page 93. δ BIASinitial initial estimation bias on schooling effect before matching , see equation (4.12), page 94. M δ SCD ˆ i estimate of schooling effect δ SCD based upon matched C1T 1−C2T 1 data, see equation (4.13), page 94. δ BIASmatching estimation bias on schooling effect after matching, see equation (4.14), page 94. δ BRRmatching after matching bias reduction rate on schooling effect estimate, see equation (4.15), page 95. nC2T 0 sample size of data from Cohort 2 at Time 0, page 97. (Y, X, W )C2T 0 i ith data record of the Cohort 2 at Time 0 sample, page 97. nC1T 1 sample size of data from Cohort 1 at Time 1, page 97. (Y, X, W )C1T 1 i i th data record of the Cohort 1 at Time 1 sample, page 97. p1 level-1 propensity score representing the probability that a student belongs to focal Cohort 2, page 97. p2 level-2 propensity score representing the probability that a class belongs to focal Cohort 2, page 98. M in[a, b] minimum distance between vector a and vector b, page 97. ATT average treatment effect, page 119. L N y1i , y1i L treatment units that can be matched from the local control group and from the non-local, respectively, page 120. xxvii Chapter 1 Introduction Synthetic Cohort Design (SCD) was proposed and used for cross-national comparisons of schooling (Wiley and Wolfe, 1992) in the Third International Mathematics and Science Study 1995 (TIMSS 1995). In this design, growth is determined by comparing data of adjacent grades. Two cohorts, 7th grade (Cohort 1) and 8th grade (Cohort 2, the focal cohort), are measured at the same time point (Time 1). The SCD by nature is a quasi-longitudinal design. In a longitudinal design as usd in the Second International Mathematics Study (SIMS, IEA, 1977), two waves of data are collected from only the focal cohort (i.e., Cohort 2) at Time 0 and Time 1. Cohort 2 at Time 0 serves the control (Burstein, 1992; Wolfe, 1987). After the “treatment” of one year of schooling, data from Cohort 2 at Time 1 are collected to assess the schooling effect (δC2T 1−C2T 0 ), defined as the average of “changes in mathematics achievement over the time-span of one school year at the particular grade level” (Wiley and Wolfe, 1992, p. 299). The contrast between the two cohorts in SCD, a quasi-longitudinal schooling effect 1 (δC2T 1−C1T 1 ) is a measure of δC2T 1−C2T 0 under the historical equivalence of groups (HEoG) assumption. The HEoG assumption asserts that students in adjacent grades are similar except for the additional year of schooling. The HEoG assumption implies that improving comparability 1 of adjacent grades will improve unbiasedness of the schooling effect estimate in the quasi-longitudinal SCD. The greater the comparability of the students in adjacent grades, the less bias of δC2T 1−C1T 1 . However, the HEoG assumption can be violated by selection bias 2 (Heckman, 1979), resulting from the difference between Cohort 1 at Time 1 and Cohort 2 at Time 0. This 1 The comparability can be statistically tested through the multivariate group comparison approach (Tatsuoka, 1971). The comparability of two groups is revealed by the discriminant function (DF ) of covariates X (Tatsuoka, 1971). X includes p column vectors such as level-1(student level) covariates and their interaction terms and level-2 (class or school level) covariates and their interaction terms. It is denoted as X = (x1 , . . . , xp ). The DF is a linear combination of the covariates X s. For example, the first DF of X s can be written as DF = v11 x1 + v12 x2 + · · · + v1p xp . Vector V1 = (v11 , . . . , v1p ) is the first eigenvector vector of Σw −1 Σb . Σw and Σb are the within-group and between-group variance covariance matrixes of X s , respectively. Notice that within-group variance-covariance matrix Σw should be computed by taking account of the hierarchical structure of the data (See Schmidt and Houang, 1986). If (Σw )−1 Σb has q non-zero eigenvalues, then we can define q DF ’s, namely, DF1 , DF2 , . . . , DFq . Using DF simplifies group comparability testing when the number of covariates is large. Following the descriptive discriminant analysis (DDA, Huberty and Olejnik, 2006), the group comparability testing can determine if Cohort 2 at Time 0 is comparable to Cohort 1 at Time 1 regarding covariates X. A two step testing approach can be conducted. First, one computes the latent roots of (Σw )−1 Σb to construct DF ’s and test if two groups are ominously comparable or not. Second, if they are not ominously comparable, univariate group comparison can reveal the non-comparability. Thus, a set of covariates will be identified. The two groups should be non-comparable on each of the covariates in terms of their means. The set of covariates then can be used as matching variables. 2 Selection bias, also called sample selection bias (Heckman, 1979), refers to the bias due to the use of non-random samples in estimating relationships among variables of interests. It can occur in two situations: 1) self selection by objects being studied, and 2) sample selection by researchers or data analysts. Using selection-biased samples results in a biased estimate of the effect of an intervention that should have been randomly assigned. The intervention can refer to “treatment of migration, manpower training, or unionism” (Heckman, 1979, p. 154). 2 dissertation study consider three sources of selection bias that is due to: 1) hierarchical school structure; 2) measurement errors on covariates; and 3) omitted variables. First, both cohorts in SCD are naturally observed in the hierarchical school system and selection bias can arise from level-1 and/or level-2 covariates. Second, biased estimate of schooling effect can occur due to implicit measurement errors of covariates, which are commonly treated as perfect measures in analysis. Third, biased estimate of schooling effect can be caused by the omitted covariates, whose effect is indicated by an attenuated R2 . R2 , as an index of goodness of fit of regression, indicates the proportion of explained variance by the model. The mathematical definition of bias of schooling effect estimate is discussed in Chapter 2. It outlines the potential of matching in reducing selection bias of SCD. A structural equation model (SEM) framework is used to define the HEoG assumption of SCD in Chapter 3. Simulation studies are designed in Chapter 4 and 5 to examine how well matching reduces each of the three types of selection bias. 1.1 Research Goals This dissertation study uses a multi-level multivariate propensity score matching approach to examine the SCD in estimating the schooling effect on the focal cohort’s learning of mathematics. The simulation is based on a two-level structural equation model developed from USA data of the Second International Mathematics Study (SIM S − U SA, IEA, 19781982). Three types of simulated selection bias correspond to the three sources of selection bias: hierarchical school structure, measurement errors on covariates, and omitted variables. Performances of matching in reducing estimation bias of the schooling effect estimate is evaluated 3 using the bias reduction rate. To reduce the simulated selection bias on level-1 and level-2 covariates in the hierarchical school structure, dual matching (combining both individual matching and cluster matching) is proposed. When simulated selection bias is due to measurement errors on the covariates, latent variable matching is proposed to reduce selection bias. Simulated selection bias due to omitted covariates is realized by manipulating the values of R2 , level-1 and level-2. These simulation designs examine how well matching can reduce selection bias in the use of SCD. In particular, contrasting the quasi-longitudinal SCD with an optimal longitudinal design - the Solomon Four-Group Design (SFGD, Solomon, 1949; Campbell & Stanley, 1966) in the SEM framework, this study focuses on the following three research questions: 1. What is the relationship between SCD and SFGD? 1.1 What are the strengths and weaknesses of SCD compared with SFGD? 1.2 How can each type of selection bias fail SCD in educational settings? 2. How does the HEoG assumption mathematically assure unbiased estimates in SCD? 2.1 What is the mathematical definition of HEoG assumption? 2.2 What are the statistical definitions of the three types of selection bias that violate the HEoG assumption? 3. Does matching reduce the selection bias, and, if so, to what extent? Section 1.3 and 1.4 introduce the SCD and the SFGD. Section 1.5 explains the importance of HEoG in SCD. Chapter 2 reviews related literature to identify the sources of selection bias and delineate the use of matching to reduce bias. Chapter 3 compares and contrasts 4 SCD and SFGD to mathematically delineate the HEoG assumption. Chapter 4 discusses the simulation models and parameter manipulations to mimic each situation that results in selection bias and violation of the HEoG assumption. Chapter 5 presents the simulation results and reveals to which level the proposed matching approach reduces the selection bias. Chapter 6 draws conclusions, discusses the limitations, and outlines future research plans. 1.2 Solomon Four-Group Design In the SFGD, participants are randomly assigned to one of four different groups (See Figure 1.1). For example, treatment T can be a particular instructional method. The dependent Figure 1.1: The Solomon Four-Group Design: R represents group randomization, T treatment, and O assessment. Besides randomization, matching as the other approach to create comparable groups (Solomon, 1949). variable is measured on O’s, administered as pre-test (Time 0, before T) and post-test (Time 1, after T). The SFGD investigates if changes on the dependent variable are due to some interaction between the pre-test effect (τ ) and the treatment effect(δ). The peculiarity of the SFGD is that it includes Group 2, which is an extension of the pre- and post-test control group design (Campbell and Stanley, 1966). The pre- and post-test control group design depicted in Group 1, Experimental and Control, provides the researcher an instrument to estimate gain at the individual level that is attributed to the treatment, plus potentially the 5 effect from taking the pre-test. Group 2, Experimental and Control, is a replicate of the treatment-control study, except that the subjects do not receive the pre-test and are thus free from the influence of the pre-test. Denote ¯j Yt : mean of group j at time t, with j = E1 , C1 , E2 , C2 ; t = 0,1, α : main effect due to history or prior learning3 , τ : main effect due to taking the pre-test, γ : main effect due to maturation (between Time 0 and Time 1), δ : main effect due to the treatment. The expected values of the means of the four groups can be expressed as ¯E ¯E Experimental Group 1: E(Y0 1 ) = α E(Y1 1 ) = α + γ + τ + δ ¯C ¯C Control Group 1: E(Y0 1 ) = α E(Y1 1 ) = α + γ + τ ¯E Experimental Group 2: E(Y1 2 ) = α + γ + δ ¯C Control Group 2: E(Y1 2 ) = α + γ. While this is a main effect model, interaction effects, if identifiable, can be parameterized using the four main effects, which accounts for additional differences among the means 4 (Solomon, 1949, p.143). Randomization, however, is a very powerful requirement because all initial differences among the groups are attributed to sampling variation. For example, with randomization, the main effect due to prior learning can be assumed to be constant and 3 Prior learning effect α was not specified in Solomon (1949). It is important to specify it in this study for three reasons. First, it is a quantity that relates or indicates initial comparability of the groups. Second, it involves the process of computing treatment effect (see the sub-section of Synthetic Cohort Design in this study.). Third, more importantly, it will be a critical criterion to match the groups. 4 For example, pre-test and treatment interaction effect is a function of the four quantities. The quantity in Experimental Group 1 is QE = f (α + δ + γ + τ + I), the quantity in 1 Experimental Group 2 is QE = f (α + γ + δ), The quantity in Control Group 1 is QC = 2 1 f (α + τ + γ), the quantity in Control Group 2 is QC = f (α + γ). Interaction effect denoted 2 as I is computed as QE − QE − QC + QC . 1 2 1 2 6 identical for all groups. This does not mean that the performances of groups on the pretest are identical, but that the differences are solely due to sampling variation. Furthermore, randomization also renders the interaction effects to be indistinguishable from the main effects. For example, the joint effect (the interaction) of prior learning (α) and taking the pre-test (τ ) is confounded with τ , in the sense that α × τ cannot be separated from τ (Solomon, 1949, p.148). In summary, randomization provides the main justification for the main effect model and the tools to obtain an unbiased estimate of the treatment effect, δ. 1.3 Synthetic Cohort Design Figure 1.2 depicts the time line of a SCD. Four possible sets of data, two cohorts at two time points, can be collected. But the SCD intends to collect data at only Time 1. The hypothetical data at Time 0 are important to the comparison of SCD with the SFGD. ¯ Following the notation used before, Ytl is denoted as the mean of the dependent variable for cohort l at time t, l = C1, C2 and t = 0, 1. Then the expected values of the dependent variable at four data points can be hypothetically parameterized as C2 ¯ C2 Cohort 2, Time 0: E(Y0 ) = α0 C1 ¯ C1 Cohort 1, Time 0: E(Y0 ) = α0 C2 ¯ C2 Cohort 2, Time 1: E(Y1 ) = α0 + δC2T 0−C2T 1 + γ C2 + τ C2 C1 ¯ C1 Cohort 1, Time 1: E(Y1 ) = α0 + δ C1 + γ C1 + τ C1 Putting Figure 1.2 into the context of TIMSS 1995, Cohort 1 corresponds to 7th graders, and Cohort 2 corresponds to 8th graders. Data are collected at only Time 1. The SCD investigates δC2T 0−C2T 1 , which is the effect of 8th grade instruction on student learning. 7 Figure 1.2: Longitudinal vs. Quasi-Longitudinal Comparison. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation. The schooling effect of 7th grade instruction is δ C1 . Students do not take pre-test at Time 0. Thus, τ C1 = τ C2 = 0. Effects due to maturation, γ’s, are confounded with effects due to history or prior learning α’s. The SCD model, at only Time 1, is as follows: C2 ¯ C2 Cohort 2, Time 1: E(Y1 ) = α0 + δC2T 0−C2T 1 C1 ¯ C1 Cohort 1, Time 1: E(Y1 ) = α0 + δ C1 An estimate of δC2T 0−C2T 1 through SCD is C1 C2 ¯ C2 ¯ C1 δC2T 1−C1T 1 = E(Y1 − Y1 ) = (α0 + δC2T 0−C2T 1 ) − (α0 + δ C1 ). (1.1) δC2T 1−C1T 1 represents what Cohort 1 at Time 1 would learn if they go through the school 8 system that students of Cohort 2 at Time 0 had gone through. Under the HEoG assumption, Cohort 1 at Time 1 ( Cohort 1 at 7th grade) are ( Cohort 1 at 7th grade) are comparable with Cohort 2 at Time 0 (Cohort 2 at 7th grade), i.e., C1 C2 C1 C1 α1 ≡ α0 , and α1 is (α0 + δ C1 ). Therefore, δC2T 1−C1T 1 produces an unbiased estimate of δC2T 0−C2T 1 under the HEoG assumption. The HEoG assumption allows the claim that schooling effect δC2T 1−C1T 1 and δC2T 0−C2T 1 are identical. That is, mathematically HEoG ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 . Mathematically “⇒” reads “implies”. However, the lack of HEoG can cause biased schooling effect estimation using SCD, and the bias can be defined as ˆ ˆ BIAS(δC2T 1−C1T 1 ) = E(δC2T 1−C1T 1 ) − δC2T 0−C2T 1 5 . (1.2) Schooling effect δC2T 1−C1T 1 and δC2T 0−C2T 1 are identical in the counterfactual sense. That is, (HEoG|Randomization) ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 , where “|” reads “given”. In other words, randomization assures the HEoG assumption that allows the claim that schooling effect δC2T 1−C1T 1 and δC2T 0−C2T 1 are identical. 5 The mean of the sample distrubtion of δ ˆ C2T 1−C1T 1 is δC2T 1−C1T 1 . At population level, δC2T 1−C1T 1 is an estimator of δC2T 0−C2T 1 . This way bias can be defined as BIAS(δC2T 1−C1T 1 ) = δC2T 1−C1T 1 − δC2T 0−C2T 1 . 9 1.4 Why Is HEoG Critical in SCD While the use of randomization is commonly found in natural science, it is often difficult, if not impossible to implement6 in educational settings. Randomization at the individual level is not practical in education (Hedges, 2007a) because the population of interest is hierarchically structured, and students are naturally nested in clusters such as classrooms and schools. Alternatively, classrooms may be randomly assigned to treatment and control groups as in the cluster-randomized trials design. Nevertheless, frequently we cannot conduct a study by assigning classrooms to a “control” or non-treatment condition, which leads to moral and legal concerns (Bloom, 2004). Even if it is legitimate in practice, randomization may still result in non-comparable groups (Berger, 2005). Without randomization, the comparability of Cohort 1 at Time 1 and Cohort 2 at Time 0 relies on the HEoG assumption. Because of the quasi-experimental nature of SCD, selection bias (Heckman, 1979) must be accounted for. Selection bias indicates the potential violation of the HEoG assumption in the quasi-experimental SCD. Matching has been successfully used to reduce bias in the equation (1.2) at level-1 units (Rosenbaum, 1986). It can also be used to reduce selection bias at level-2 units (Cox and Reid, 2000; Freedman et al., 1990; Hong and Raudenbush, 2006; Raab and Butcher, 2001, p. 29). Matching has the potential to reduce selection bias of SCD, so that (HEoG|M atching) ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 . 6 Rubin (1978) pointed out that “In some cases with strong prior knowledge, randomization may not be important” (p. 55). 10 In other words, matching creates a situation where HEoG assumption holds so that schooling effect δC2T 1−C1T 1 unbiasedly estimates δC2T 0−C2T 1 . Nevertheless, matching can only reduce selection bias to a certain degree because of the potential of having unobserved and unmatched characteristics. Further comparisons between SCD and SFGD in the structural equation modeling framework expand the mathematical definition of HEoG assumption that clarifies the use of SCD to estimate schooling effects in Chapter 3. 1.5 Significance This study is important for many reasons. First, this study provides empirical evidence for policy makers and educational researchers to examine the role of SCD in large scale research. Despite the fact that randomization has been difficult to apply in educational studies, it is possible to examine schooling effect through curriculum sensitive assessment for policy making if the intervention groups are assured comparable (Cochran, 1972 in Rubin, 2006). With the use of matching (Cochran and Rubin, 1973), the SCD retrospectively creates comparable groups to examine student learning in the school system. Furthermore, this study is methodologically informative and illustrative. It comprehensively evaluates the performance of post-hoc matching in validating the implementation of SCD in large scale education studies. It provides multifaceted measures such as bias reduction of schooling effect estimate to facilitate program evaluation. This study also provides a suitable bias reduction approach and an analytical tool for researchers when intact clusters such as schools and classrooms are used. It is important to find the optimal bias reduction methods to draw statistical inference for educational studies. 11 The study of multilevel matching in this work will help educational researchers achieve research goals. For example, if a study uses clusters (e.g. classrooms) as units to examine the effectiveness of a new intervention, level-2 matching will accomplish the analytical goal. However, the single use of level-2 matching may leave the hidden/micro individual differences, which are known to affect student learning. If the analytical units are individuals, level-1 matching will be needed to make treated and control individuals comparable. 12 Chapter 2 Literature Review This chapter identifies three types of selection bias: hierarchical school structure (Section 2.1.2), measurement errors on covariates (Section 2.4.1), and omitted variables (Section 2.5). The three sources of selection bias identified in this chapter are closely tied to corresponding matching approaches in the context of the SCD in Chapter 4. After reviewing bias reduction approaches such as propensity score matching in Section 2.2, I identify the necessity of dual matching (McCall, 1923) in reducing both level-1 and level-2 selection bias (Section 2.3). The attenuated bias reduction rate due to measurement errors is reviewed in Section 2.4.1. The measurement error adjusted propensity score model is introduced in Section 2.4.2. A hybrid propensity score estimation model accounting for measurement errors is delineated in Section 2.4.3. Section 2.5 reviews the selection bias problem due to omitted variables. This problem is generally indicated by a shrunk R2 because the effect of the omitted variables is “compressed” into residuals and inflates the residual variance. In hierarchically structured data, the inflated residual variance due to omitted variables will further affect intraclass correlation (ICC). 13 2.1 Definitions of Bias and Selection Bias Bias or error occurs when the expected value of estimate (observed score or observed treatment effect) differs from the value being estimated (true score or true effect) through sampling (S¨rndal et al., 2003). Bias on the estimate of treatment effect can be attributed to a measurement errors on outcome Y (Fuller, 1987) and/or to the initial difference on covariates X in the two groups being compared (Carroll et al., 2006; Cochran and Rubin, 1973). Thus, the outcome is a sum of three parts (Wooldridge, 2002): 1) the effect of treatment variety; 2) the effect of initial difference due to covariates X; and 3) the random measurement error. The negative effect of initial difference due to covariates X has been studied for decades. Neyman (1923) pointed out that the plot characteristic besides the treatment impacted potential yield, which implies that the plot characteristic can be a source of bias (in Rubin, 1990, p. 283). Similarly, Gosset (“Student”, 1923, in Rubin, 1990) found that the initial differences among the groups affected the outcome besides the intervention. The initial difference can bias the treatment effect estimation and mislead one’s conclusion (Campbell and Stanley, 1966). Given the hierarchically structured nature of educational settings, participants are not assigned to groups at random, initial difference can occur on the level-1 covariates and/or at level-2 covariates. The following two sections mathematically demonstrate how selection bias on covariates X affects the estimate of treatment effect at the individual or group level. 2.1.1 Mathematical Definition of Bias at Individual Level Counterfactual Model This is the ideal case, which involves no covariates X. The counterfactual responses (Holland, 1986; Morgan and Winship, 2007) in treatment and control 14 groups can be written as D y i = µD + u D , i (2.1) D where D = 0, 1 and i = 1, ..., nD . yi are the ith counterfactual responses under treatment (D = 1) or control (D = 0). µD represents the mean of Dth group’s responses. uD are the i ith random errors in Dth group. The composite equation is yi = µ0 + D ∗ (µ1 − µ0 ) + (u0 + D ∗ i ), (2.2) where i = u1 − u0 and E( i ) = 0. Let the population level treatment effect be δ. Then i i 1 0 δ = E(yi − yi ) = E(µ1 ) − E(µ0 ). (2.3) Covariates X can be added to the counterfactual model. Let µD = M D (X) be a function of covariates X (e.g., in Cochran and Rubin, 1973), such as a linear equation1 M D (X) = αD + (X − µD )β D . X (2.4) Drop the subscript and let residual be = u1 − u0 . 1 The layout of the regression equations in Section 2.1.1 and Section 2.1.2 is different from the layout of those in Section 2.1.3. That is, for example, the regression coefficient vector β of the covariates X are displayed in the equation as Xβ in Section 2.1.1 and 2.1.2, but as β X in Section 2.1.3. Because the regression coefficient vectors have superscribes indicating group membership, this way will simplify the layout of regression equations in Section 2.1.1 and 2.1.2. 15 Write y D = α0 + (X − µ0 )β 0 + u0 + D ∗ (α1 − α0 ) + D ∗ [X(β 1 − β 0 ) − µ1 β 1 + X X (2.5) µ0 β 0 )] + (u0 + D ∗ ). X Further simplify the equation above to obtain the treatment effect, denoted as δ(X), which is δ(X) = E{(α1 − α0 ) + [X(β 1 − β 0 ) − µ1 β 1 + µ0 β 0 )] + }. X X (2.6) Bias occurs when the estimate of treatment effect is NOT equal to the true value, i.e., δ(X) = δ. Thus, bias is defined as ∆(X) = δ(X) − δ. 2.1.2 (2.7) How Selection Bias Affects Treatment Effect Estimate The detailed decompositions below identify illustrative situations where bias many occur. Assume that there are no measurement errors on X and the expectations of the residuals u1 and u0 are zero, bias reduction will focus mainly on components related to M D (X). Initial Difference on Covariates X The initial difference on covariates X in treatment and control group can generate bias on estimating the treatment effect. Let ∆X be a non-zero constant vector representing the treatment and control group 16 mean difference of covariates. That is, µ1 = ∆X + µ0 . X X (2.8) The function is M D (X) linearly additive with D = 0, 1. That is, M D (X + ∆X ) = M D (X) + M D (∆X ). (2.9) The treatment effect estimate is E[M 1 (X + ∆X ) − M 0 (X)] = E{(α1 − α0 ) + [X(β 1 − β 0 ) − (µ0 + ∆X )β 1 + µ0 β 0 ]. (2.10) X X Because the initial difference ∆X is not equal to zero, the treatment effect is biased. If we assume the regression coefficients are the same, i.e., β 1 = β 0 = β, then the bias component can be identified as ∆(X) = β∆X . (2.11) Unequal Regression Coefficients of Treatment and Control Groups If the treatment and control group means are equal, i.e., µ1 = µ0 = µX , the difference between the regression X X coefficients of treatment and control groups will bias the treatment effect estimate. In this situation, the bias component is ∆(X) = E[(X − µX )∆β ], where ∆β = β 1 − β 0 . 17 (2.12) In practice, unequal regression coefficient of the treatment and control groups may be due to the interaction terms between covariates X and treatment status variable D. Let vector βX×D be the regression coefficient vector of the interaction terms. The regression coefficient vector of covariates X in treatment group is β 1 = β 0 + βX×D . (2.13) ∆β = βX×D . (2.14) This implies In practice, one can add an interaction term of a covariate x and D in the regression and test if this coefficient is statistically zero. 2.1.3 Selection Bias in Hierarchically Structured Data In a hierarchically structured population (Cochran, 1963), ith individual is assumed to be nested in k th class. At student level (level-1), outcome Yik and Xik covariates are observed. At class level (level-2), Wk covariates are also available. Let D be a binary treatment-control indicator with 1 representing the treatment group, 0 otherwise. The mathematical relationship between outcome variable and covariates is modified from (Schmidt and Houang, 1986) in a counterfactual sense: D D D Yik = αD + βX (µD − µX ) + βW (Wk − µW ) + uD + β (Xik − µD )+ Xk Xk k ∗ D (βk ) (Xik − µD ) + eD , Xk ik 18 (2.15) ∗ where D = 0, 1 and βk = (βk − β) . µX and µW are the population means in the vector format, µX is the population mean vector of the level-1 covariates in k th class, vector βX k includes the between-level-2-unit regression coefficients of the aggregated means of level-1 covariates, vector βW includes the regression coefficient of the observed level-2 covariates, vector β includes the pooled within-level-2-unit regression coefficient of the level-1 covariates, and vector βk includes the within-level-2-unit regression coefficients of the level-1 covariates in k th class. The counterfactual treatment effect is 1 0 ∗ 1 E[Yik − Yik ] = E(α1 − α0 ) + E{(βX − β − βk ) (µ1 − µ0 )} + E{βW (Wk − Xk Xk (2.16) 0 ∗ 1 0 Wk )} + E{(β + βk ) (Xik − Xik )} + E(u1 − u0 ) + E(e1 − e0 ). k k ik ik 1 0 1 0 In the counterfactual case 2 , (µ1 − µ0 ) = (Wk − Wk ) = (Xik − Xik ) ≡ 0 holds, and Xk Xk the expected treatment effect is E(α1 −α0 )+E(u1 )−E(u0 )+E(e1 )−E(e0 ). The treatment k k ik ik effect is unbiased because the residual expectations are zero. However, bias can result in at 1 0 1 0 least one of the three situations: µ1 − µ0 = 0, (Wk − Wk ) = 0, and (Xik − Xik ) = 0. Xk Xk 1 0 The three situations represent different sources of bias. (Wk − Wk ) = 0 indicates that 1 treatment and control groups are not comparable at level-2 units such as classes. (Xik − 0 Xik ) = 0 indicates the level-1 difference within k th class. µ1 − µ0 = 0 represents the Xk Xk difference due to the non-comparable aggregated means of the level-1 covariates X within k th class. 2 If randomization is used, equivalence is at the expection/mean level rather than at level1-2 units or level-1-1 units. The subscripts, k of level-2 units and i of level-1 units will be 1 0 dropped. Thus, [µ1 − µ0 ] = [E(W 1 ) − EW 0 ] = [E(Xk ) − E(Xk )] ≡ 0 holds. For the X X purpose of simplicity, Chapter 2 uses counterfactual model to define bias and demonstrates how selection bias occurs. 19 2.2 Propensity Score Matching for Bias Reduction Bias reduction (Cochran and Chambers, 1965; Cochran and Rubin, 1973; Rubin, 1973a,b, 1976a,b, 1979, 1980) is critical for treatment effect estimation in causal inference and in program evaluation. Initial difference on covariates X between treatment and control groups should be taken into account so that the bias on Y can be reduced and the treatment effect can be accurately estimated. Research on bias reduction has shown that combining matching and regression adjustment (e.g., Stuart and Rubin, 2008) can achieve the best bias reduction, even if the relationship between outcome Y and covariates X is nonlinear (Cochran and Rubin, 1973; Rubin, 1973b, 1979). Bias reduction techniques have been developed for observational studies in causal inference and in program evaluation. These techniques include Cochran’s three approaches including pairing, balancing, and stratification (Cochran, 1953)3 , post-hoc matching (Abadie and Imbens, 2006, 2007; Rubin, 1973a,b, 1976a,b, 1979, 1980), analysis of covariance (e.g., Cochran, 1957, 1969), inverse propensity score weighting (Angrist and Pischke, 2009; Horvitz and Thompson, 1952; McCaffrey and Hamilton, 2007), statistical modeling with adjustment (e.g. WLS estimation in HLM frame work, see Hong and Raudenbush, 2006), and double robust estimation using regression adjustment and inverse propensity score weighting (Kang and Schafer, 2007). Post-hoc matching depends on the summary measure, a functional composite of covariates (Rubin, 1985). The most commonly used composites in matching are the Mahalanobis distance (e.g., Rubin, 1980) and the propensity score (Rosenbaum and Rubin, 1983). This 3 Pairing is to exact match each unit of treatment with one from the control group; balancing is to match treatment and control on means of a covariate; and stratification is to stratify data on a covariate. 20 study mainly focuses on propensity score matching. Propensity score matching is a post-hoc bias reduction method, which has been commonly used on observational data to approximate the individual-randomized trials to study a treatment effect of interest (Cochran, 1953, 1968a; Cochran and Rubin, 1973; Rosenbaum and Rubin, 1983; Rubin, 1973a,b) Because the “golden rule” of randomization is generally broken in observational studies (Cochran, 1953; Rosenbaum, 2002), the post-hoc matching approach uses covariates or summary measures of covariates (e.g., Mahalanobis distance in Rubin, 1980) to match the treatment and control groups (Rosenbaum and Rubin, 1985) to remove bias. When the number of the covariates is not large, matching can be done on covariates (Cochran, 1953). Rosenbaum and Rubin (1983) further developed a holistic measure, the propensity score, to avoid dimensionality issues when the number of covariates increases dramatically and makes matching impossible on original covariates. Propensity scores play a critical role in bias reduction techniques. Research has found that accurately estimated propensity scores can reduce bias and assist researchers in drawing causal inferences in observational studies (e.g., Greenland, 2004). A propensity score represents conditional probability that an individual is assigned to the treatment group (Rosenbaum and Rubin, 1983). Generally, it is estimated by using logistic regression with the covariates collected from the participants as independent variables and the participant’s status on the treatment variable as the dependent variable (Rosenbaum, 1987). The covariates in the logistic regression are non-treatment variables such as the participant’s background characteristics. An propensity score summarizes the information of these covariates. Using such propensity scores, a researcher can match a participant from the treatment group with a participant from the control group to achieve group comparability to facilitate causal 21 inference (Rubin and Waterman, 2006). Rubin (1979) defines the bias reduction rate as the percentage reduction in expected squared bias of treatment effect, which is also adopted by Stuart and Rubin (2008). In this study, the index in Rubin (1979) and Stuart and Rubin (2008) will be used as a measure to evaluate how well the bias reduction methods perform. (See details in Chapter 4.). 2.3 Matching on Hierarchically Structured Data In educational experiments for program evaluation, often researchers sample larger units from a hierarchically structured population (Cochran, 1963; Scott and Smith, 1969). Examples of larger units include clusters (Donner, 1998), groups (Cornfield, 1978; Murray, 1998; Raudenbush, 1997), communities (Freedman et al., 1990; Martin et al., 1993; Thompson et al., 1997), or schools (Hedges, 2007a; Hong and Raudenbush, 2006; Murray et al., 1994; Raudenbush, 1997). This design, a cluster-randomized trial (CRT; Donner and Klar, 2000; Murray, 1998; Raudenbush, 1997) consists of clusters made up of multiple individuals (Bloom, 2004). For example, in educational settings, cluster sizes usually vary from 5 in the Early Childhood Longitudinal Program (ECLS) to 60 in the Longitudinal Study of American Youth (LSAY) and the average cluster size is about 13 (Hedges, 2007a). When clusters are assigned to interventions, non-comparable treatment-control groups can arise from either level-1 or level-2 covariates (Raab and Butcher, 2001) resulting in selection bias. Selection bias happens frequently in observational studies or in studies where randomization fails (Rosenbaum, 2002), and leads to an inappropriate estimate of the intervention effect (Rubin, 1973a). 22 When large scale hierarchically structured data are used, selection bias needs to be evaluated (Berger, 2005) and its influence removed from the estimate of the intervention effect in the analytical stage (Hong and Raudenbush, 2006). The dual matching method proposed in this study is used to approximate a matched cluster-randomized design to reduce bias in the estimate of the intervention effect for multilevel educational data. Matching has been widely used in observational studies to reduce estimation bias of the intervention effect because it can significantly improve comparability of the groups (Cochran, 1953; Rubin, 2001). Dual matching was used in experimental education using clusters such as classrooms and school districts to study intervention effects decades ago. Pittman (1921, as cited in McCall, 1923) conducted the delaying match after the final test score had been collected. In order to achieve comparability at both cluster- and level-1, Pittman (McCall, 1923, p. 49) matched individuals after higher level covariates such as wealth, quality of population, teacher quality were taken into account in matching. Unfortunately, the literature has not followed up to study the potential of Pittman’s duel matching approach for CRT designs. Using large scale group-randomized trial data Griffin et al. (2009) found that matching on different sets of level-2 covariates resulted in different levels of statistical power. Using propensity scores estimated from kindergarten retention data to approximate the CRT design has been studied by Hong and Raudenbush (2006). However, Hong and Raudenbush (2006) did not use the estimated propensity scores for matching. They used propensity scores to stratify the data, and then treated the propensity scores as a covariate to analyze stratified data to estimate the intervention effect using the hierarchical linear model. The three situations identified in Section 2.1.3 represent different sources of bias, which 23 need different matching strategies. 2.3.1 Level-1 Matching 1 0 There is level-1 bias within k th class. That is, (Xik − Xik ) = 0, implying that the counterfactual equivalence is not satisfied in practice. In other words, the ith student is either in the treatment group or in the control group, but not in both. For a student, say John in the treatment group, there is no exact John-equivalent in the control group. The two groups are not equivalent. Here, level-1 matching can be conducted using covariates X to match each treated individual with one from non-treated individuals. µ1 − µ0 = 0 results when the Xk Xk aggregated means of the level-1 covariates X within k th school are non-comparable. This second level bias can be reduced when the bias on level-1 covariates X is removed. By ignoring the hierarchical structure, treated individuals are matched with control individual to compute bias reduction rate. The analysis units for intervention effect are the outcomes of the matched individuals. 2.3.2 Level-2 Matching Treatment and control groups are not comparable at level-2 units such as classes or schools, 1 0 that is (Wk − Wk ) = 0 in the counterfactual sense. Bias reduction here focuses on second level units, and one would conduct level-2 matching. By ignoring level-1 variables, clusters are matched by using level-2 propensity scores to compute bias reduction rate. 24 2.3.3 Dual-Matching When both level-1 and level-2 covariates are not comparable, it needs dual matching, including both level-2 matching and level-1 matching. That is, treated clusters are first matched with control clusters, then, within each matched treatment-control pair, individuals are matched. The detailed dual-matching procedure is discussed in Chapter 5. 2.4 Measurement Errors and Matching Modeling errors of measurement (Cochran, 1968b) on observed (surrogate) variables has been well developed in general regression (Fuller, 1987), logistic regression (Carroll et al., 2006) (Spiegelman et al., 1997), and survey sampling (Biemer et al., 2004; Fuller, 1995; Hansen et al., 1961; Mahalanobis, 1946). Few studies have been done in matching after Cochran and Rubin (1973) reviewed the effect of measurement errors of covariate on bias reduction. Measurement issue can be a more serious issue in a study because measurement errors will reduce the efficiency of adjustment (Cochran, 1968a,b; Cochran and Chambers, 1965). While the literature is replete with guidelines on how to use matching to estimate treatment effect, there is little research on how to adjust the measurement errors of the covariates used for matching. Most researchers simply analyze and estimate propensity scores by taking the covariates as the perfect measures. Measurement errors attenuate the regression coefficient β of covariate x on outcome ˜ y (J¨reskog and S¨rbom, 1996). Let β be the attenuated regression coefficient. It has o o ˜ ˜ β < |β| and β = β × R 4 in bivariate regression (Cochran & Rubin, 1973). R is the 4 Statistically, R has the upper limit of 1. 25 attenuation rate due to measurement errors in the covariate x. Bias reduction rate on ˜ β covariate x is attenuated by R = due to measurement errors in covariate x (Cochran |β| & Rubin, 1973). Cochran (1968a, in Rubin, 2006, p. 20) found that under a simple linear 1 regression, the measurement error on x attenuates the bias reduction rate by a factor of 1+ρ , where ρ is the reliability of x. 2.4.1 Measurement Errors Adjusted Propensity Scores When the true covariates (X ∗ ) are measured by vector X with errors, matching should be based upon the propensity scores P r(D = 1|X ∗ ), rather than P r(D = 1|X). There are two methods (Carroll et al., 2006) to adjust for measurement errors in the logit model used to estimate propensity scores. The first method assumes the true covariates have not been observed and the na¨ ıve parameter estimates are obtained using the observed covariates. An approximately consistent estimator of the parameters is provided through a functional adjustment on the na¨ ıve estimator (see details below). The second method to adjust for measurement errors in logistic regression is through structural modeling, in which the distribution of the true covariates is parametrically modeled. For example, likelihood and Bayesian approach based structural equation modeling (Carroll et al., 2006; Lee, 2007, Chapter 9) can be used to accomplish this goal. The first method proposed by Rosner et al. (1990, 1989) is a two-step regression calibration logit model. The first step is to use one sample (N=n1 ) and fit a logit model logit[P r(D = 1|X, H)] = α0 + α1 X + α2 H, 26 (2.17) where X = (x1 , ..., xp ) is the observed surrogate covariate vector of the true X ∗ . H = (h1 , ..., hq ) is the covariate vector without measurement errors. The regression coefficient vectors are denoted as α1 = (α11 , . . . , α1p ) and α2 = (α21 , . . . , α2q ). Secondly, a model X ∗ = ι0 + ι1 X + ι2 H + r (2.18) is fit on the other sample (N = n2 ), in which both X and X ∗ are available. The regression coefficient vectors are denoted as ι1 = (ι11 , . . . , ι1p ) and ι2 = (ι21 , . . . , ι2q ). The mean and covariance of r are 0 and ΣX ∗ |(H,X) , respectively. The adjustment matrix is κ =   0 ι1  . The regression coefficients of the “true” logit model ι2 I logit[P r(D = 1|X ∗ , H)] = β0 + β1 X ∗ + β2 H (2.19) ˆ ˆ ˆ β = (β1 , β2 ) = κ−1 α. ˆ ˆ (2.20) can be obtained using This two-step adjusted method requires that the dimensions of X and X ∗ are equal. However, the measurement errors adjusted propensity scores cannot be obtained directly using this approach because the integral in the following propensity score function does not have a closed-form solution (Carrel et al, 2006, p.91). P r[D = 1|X ∗ , H] = L(.)exp[−(1/2){x∗ − µX ∗ } Σ−1 {x∗ − µX ∗ }]dx∗ X∗ , (2π)p/2 |ΣX ∗ |1/2 (2.21) where L(.) = logit−1 (β0 + β1 X ∗ + β2 H). The approximate approach is developed in Weller 27 et al. (2007). That is, P r[D = 1|X ∗ , H] ≈ exp(α0 + α1 X ∗ + α2 H), σ2 ∗ 2 X |X,H − ι β , α0 = β0 − β1 0 1 2 (2.23) α1 = ι1 β1 , (2.24) α2 = ι2 β1 + β2 . where (2.22) (2.25) and The distribution of (X ∗ |X, H) is a multivariate normal M N (µX ∗ |X,H , σ 2 ∗ ). X |X,H However,when the second sample having both X and X ∗ observed, is not available, one cannot estimate the measurement-errors-adjusted propensity scores. In this situation, an alternative method such as Bayesian logit model with the implementation of Markov chain Monte Carol (MCMC) can be fitted through WinBUGS (Lunn et al., 2000). The measurement error adjusted propensity scores P r[D = 1|X ∗ , H] can be simultaneously estimated when the regression coefficients and covariates X have been updated using MetropolisHastings algorithm. This MCMC-based propensity scores approach is then used for matching. 2.4.2 Structural Equation Modeling as an Alternative Structural equation modeling (SEM, Bollen, 1989; J¨reskog and S¨rbom, 1996) incorporates o o the latent variable to take into account measurement errors on surrogate variables. Propen28 sity scores can be estimated through the following hybrid model:    X = ι0 + ι1 X ∗ + eX . (2.26)   logit[Pr(D = 1|X ∗ )] = ι∗ + ι∗ X ∗ 0 1 The first equation, a measurement model, captures the linear relationship between the latent X ∗ and observed X in both treatment (D = 1) and control (D = 0) group. The second equation, a structural model, captures the nonlinear relationship between the latent X ∗ and a latent propensity score P r(D = 1|X ∗ ). Adapting a latent variable approach circumvents the post-hoc coefficient adjustment (e.g., Weller et al., 2007) discussed in Section 2.4.1. The estimated propensity scores can be used in matching. Note that the latent propensity score P r(D = 1|X ∗ ) and latent X ∗ have a oneto-one functional relationship. Matching on estimated propensity scores is mathematically equivalent to matching on the estimated factor scores 5 of the latent X ∗ . In educational studies, factor scores of the latent X ∗ such as academic proficiency measures and ability constructs have been used to match individuals to achieve comparable groups (e.g., classical true score in Van Van der Linden and Hambleton, 1997). The latent construct is measured by multiple surrogate items. A most commonly used model is the item response theory, in which individual ability is calibrated through a set of items with presumptive difficulty and discrimination parameters (Lord and Novick, 1968). The calibrated ability estimation represents an examinee’s academic proficiency that the set of items are designed to measure. However, matching on latent variables may fail to remove bias due to 5 The estimated factor scores can be derived using SEM software packages such as Mplus (Muth´n and Muth´n, 2009). e e 29 other observed covariates H that are free of measurement error. A composite measure such as propensity score that summarizes both the latent X ∗ and covariates H becomes necessary in matching. Further, matching needs to take into account the hierarchical structure of latentX ∗ and covariates H. 2.5 Omitted Variables Cochran and Rubin (1973) studied failing to include a confounding variable in matching. For example, he true linear regression has two covariates, x1 and x2 . Matching is on only x1 . x2 is omitted from matching. Bias reduction of matching on only x1 depends on the regression relationship between x1 and x2 . If the regression of x2 on x1 has equal slopes but non-equal intercepts in the two populations, treated and control, the final bias due to matching on only x1 is larger than the initial difference. This is referred to the “parallel but not identical” case (p. 45). If the regression of x2 on x1 has a “parallel but non-linear” (p. 45) relationship and the sample sizes are large, matching on only x1 reduces partial selection bias due to x2 . The reduced selection bias due to x2 is only proportional to the partial linear regression of x2 on x1 . When the number of covariates is large, the omitted covariates and the included have more complex relationships. The pattern of bias reduction due to omitted covariates will be different from what was found in Cochran and Rubin (1973). Instead of studying bias reduction due to the correlated relationship between the omitted covariates and the included covariates, the literature focuses on how the relationship between included variables and outcome variable affects bias reduction (Austin et al., 2007). Austin et al. (2007) conducted a Monte Carlo study to compare the strengths of different propensity 30 score models in matching treated and untreated groups. They found that correlation and association between the outcome and the covariates is required. For example, using covariates that are associated with exposure but independent of the outcome will result in a situation where more treatment units cannot be matched. If essential covariates are omitted from the model, it attenuates the association between the outcome and the covariates included in the model. In the general linear regression model, omitted covariates decrease the proportion of variation explained and inflate the residual variable. Thus, it results in an attenuated R2 , an index of goodness of fit of the regression. Matching on a measure that is not highly correlated with the outcome variable, results in ineffective matching (Martin et al., 1993). In order to obtain effective matching, the correlation between the matching covariate and the outcome variable needs to be at least .40, when there are 10 pairs of clusters being matched (Martin et al., 1993). It is more complicated when omitted variables occur in the analysis involving hierarchically structured data. Unlike the level-1 variation which increases when covariates are omitted, between-cluster variation will not necessarily increase when covariates are omitted from the model (Raudenbush, 1997). Thus the relationship between omitted variables and the intraclass correlation (ICC) is complex. The level-1 and level-2 residual variances define an index, ICC, which indicates the similarity among the units in a cluster. The decomposition of total variance of outcome variable 2 2 indicates the within level-2 unit variation (σ1 ) and between level-2 unit variation (σ2 ). Thus 31 ICC is defined in Raudenbush and Bryk (2002) as 2 σ2 . ICC = 2 2 σ2 + σ1 (2.27) 2 2 Increasing σ1 and/or σ2 will result in complication on ICC. That is, ICC, summarizing the two sources of variation, is not a clean-cut index to be linked to the bias reduction of either level-1 matching or level-2 matching (Abadie and Imbens, 2006). In order to examine the performance of matching, this dissertation study simulates data by manipulating the level-1 and/or level-2 residual variances, rather than the ICC index. 32 Chapter 3 Theoretical Framework This chapter defines the HEoG assumption using a structural equation model (SEM) framework. SEM (Bollen, 1989) takes into account measurement errors and depicts the measurement relationship between the surrogate variables and their latent variables, whose relationships are captured by the structural model (J¨reskog and S¨rbom, 1996) o o 3.1 Solomon Four-Group Design in SEM Framework SFGD includes two Experimental groups and two Control groups. It also involves two testing points, pre-test at Time 0 and post-test at Time 1. Using randomization, SFGD assumes that the four groups are comparable (Solomon, 1949). Table 3.1 displays the SEM framework of the SFGD. Each group involves two measurement models and one structural model capturing the latent growth relationship from pre-test to post-test. Let Yti be outcome variable in group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0, 1. 33 Table 3.1: Solomon Four-Group Design in Structural Equation Modeling Framework Intervention Group 1 (with pre-test) Group 2 (without pre-test) Experimental Acceleration Effect: Slope ν Intervention Effect: Intercept δ Maturation Effect: γ Pre-test Effect: τ E Y0 1 = δ0 + Λ0 η0 + ε0 E Y1 1 = (δ0 + δ) + Λ1 η1 + ε1 E Y0 2 = δ0 + Λ0 η0 + ε0 E Y1 2 = (δ0 + δ) + Λ1 η1 + ε1 η1 = τ + γ + νη0 η1 = γ + νη0 C Y0 2 = δ0 + Λ0 η0 + ε0 C Y1 2 = δ0 + Λ1 η1 + ε1 C Y0 2 = δ0 + Λ0 η0 + ε0 C Y1 2 = δ0 + Λ1 η1 + ε1 η1 = τ + γ + 1η0 η1 = γ + 1η0 Control Maturation Effect: γ Pre-test Effect: τ 34 3.1.1 SEM of Experimental Group 1 Let η0 and η1 represent latent mathematics proficiency at pre-test and post-test time points, E respectively. η0 is measured by k0 surrogate variables, which are denoted in vector Y0 1 = [Y1 , Y2 , · · · , Yk ]. η1 is measured by k1 surrogate variables, which are denoted in vector 0 E1 Y1 = [Y1 , Y2 , · · · , Yk ]. 1 The measurement equation for Experimental Group (denoted with the superscript E1 ) at pre-test time (denoted with the subscript 0) is E Y0 1 = δ0 + Λ0 η0 + ε0 . (3.1) The measurement equation for Experimental Group at post-test time (denoted with the subscript 1) is E Y1 1 = (δ0 + δ) + Λ1 η1 + ε1 . (3.2) The extra term δ in the intercept of the post-test measurement model indicates the interE E vention effect. If Y0 1 and Y1 1 are binary vectors (e.g., 1 or 0), the two measurement equations become item response theory models (Lord, 1980). In the two-parameter logistic (2PL) model (Lord and Novick, 1968), measurement equations for pre- and post-test are  E1 prb(Y0 = 1)  = a (η − b ); log  0 0 0 E1 1 − prb(Y0 = 1) (3.3)  E1 prb(Y1 = 1)  = a (η − b ), log  1 1 1 E1 1 − prb(Y1 = 1) (3.4)  and  respectively. b0 and b1 are the item difficulty parameter vectors. a1 and a0 are the discrim35 ination parameter vectors. The structural equation η1 = τ + γ + νη0 1 (3.5) reveals the latent mathematics proficiency growth between two time points. γ and τ indicate the maturation effect and learning effect due to taking pre-test, respectively. The latent growth rate, namely the acceleration effect of intervention, is captured by the slope ν. 3.1.2 SEM of Control Group 1 Control Group 1 is a pre-post test design without treatment involved. The measurement equations are the same as those in Treatment Group 1. However, ν is unity in the structural equation η1 = τ + γ + 1η0 , (3.6) indicating a “flat” latent growth rate due to the lack of intervention. Still, latent mathematics proficiency at the post-test time point is different from that at the pre-test time point by a sum of the maturation effect γ and the learning effect due to taking pre-test τ . 1 This equation specifies a general case. For the purpose of simplicity, ν can be set as 1 0 across all four groups. τ and γ are speculated in the structural model is because they reflect changes associated with the latent mathematics proficiency. The latent changes will further reveal their effects through the measurement equation. 36 3.1.3 SEM of Experimental Group 2 Experimental Group 2 only observes post-test data. Because there is no pre-test, the learning effect τ is zero and is dropped from the implicit structural model η1 = γ + νη0 . (3.7) The pre-test measurement model and structural model are not observable and are displayed in the dashed boxes (See Table 3.1, Row 2 Column 3). 3.1.4 SEM of Control Group 2 Treatment and pre-test are not applied to this group. The acceleration effect of intervention ν is unity and the learning effect τ is zero in the implicit structural model η1 = γ + 1η0 . (3.8) Pre-test measurement model and structural model are not observable and are displayed in the dashed boxes (See Table 3.1, Row 3 Column 3). 3.1.5 Pre-Equivalence of Groups (PEoG) Assumption The measurement model at Time 0 is written as Y0 = δ0 + Λ0 η0 + ε0 . (3.9) Definition The measurement model at Time 0 in the equation (3.9) holds equivalently in 37 the four groups. It is called pre-equivalence of groups (PEoG) assumption, which is mathematically equivalent to Y0 ⊥D, where ⊥ means “ independent of ” (e.g., Rosenbaum and Rubin, 1983). D is the binary group membership indicator variable, representing treatment (D = 1) or control (D = 0). Y0 ⊥D holds if η0 ⊥D holds because Y0 is a linear function of η0 . η0 ⊥D holds for two reasons, described below. First, Group 1 and Group 2 are independently selected randomly (Solomon, 1949) from the same population, whose latent mathematics proficiency is η0 . Second, participants have an equal chance to be assigned to either intervention or control through random assignment. Let D1 be the binary group membership indicator variable in Group 1, with D1 = E1 , C1 . Y0 ⊥D implies Y0 ⊥D1 ; and η0 ⊥D implies η0 ⊥D1 . Correspondingly, let PEoG-1 represent PEoG assumption in only Group 1. The PEoG-1 assumption implies the equation (3.9) holds equivalently in Group 1. Similarly, Let D2 be the binary group membership indicator variable in Group 2, with D2 = E2 , C2 . Y0 ⊥D2 and η0 ⊥D2 can be derived. Also, The PEoG-2 assumption implies the equation (3.9) holds equivalently in Group 2. Theorem 3.1.1. (Equivalence of using Group 1 and Group 2 to Estimate Latent Growth) Because of random assignment of treatment and control, participants at the pre-test time point have equal chance to be assigned to either the treatment or the control group. Given PEoG assumption, latent growth estimate derived from Group 2 is equivalent to that derived from Group 1. Proof. First, the latent growth can be estimated using data collected in Group 1, Experimental and Control. The latent growth is estimated as the latent mean difference between 38 two populations. That is, E(η1 |E1 ) − E(η1 |C1 ) = E(τ + γ + νη0 |E1 ) − E(τ + γ + 1η0 |C1 ) = E[(ν − 1)η0 ]. (3.10) This holds because of η0 ⊥D1 , with D1 = E1 , C1 . Second, given PEoG assumption, the latent growth can be estimated using Experimental Group 2 and Control Group 2. That is, E[(η1 |E2 ) − (η1 |C2 )] = E[(γ + νη0 |E2 ) − (γ + 1η0 |C2 )] = E[(ν − 1)η0 ]. (3.11) This holds because η0 ⊥D2 , with D2 = E2 , C2 . Thus, it proves that given PEoG assumption using Group 2 is equivalent to using Group 1 in estimating the latent growth. Theorem 3.1.2. (Equivalence of using Group 1 and Group 2 to Estimate True Gain) Given the random assignment and PEoG assumption, true gain score estimate derived from Group 2 is equivalent to that derived from Group 1. Proof. True gain estimate derived from Group 1 is E C C E E[(Y1 1 − Y0 1 ) − (Y1 2 − Y0 2 )] = E[δ + Λ1 (ν + γ + τ η0 ) − Λ0 η0 )] − E[Λ1 (ν + γ + 1η0 ) − Λ0 η0 ] = δ + E[Λ1 (τ − 1)η0 ]. True gain estimate derived from Group 2 is 39 E C E[Y1 2 − Y1 2 ] = E[δ + Λ1 (γ + τ η0 )] − E[Λ1 (γ + 1η0 )] = δ + E[Λ1 (τ − 1)η0 ]. The two estimates are equal given the PEoG assumption. In summary, under the PEoG assumption, using SFGD-Group 2 (SFGD-G2) sufficiently estimates the latent growth and the true gain score. Actually, how PEoG assumption assures the use of SFGD-G2 is the same as how HEoG assumption assures the use of SCD. 3.2 Extended Solomon Four-Group Design in SEM Framework Mathematically defining HEoG for SCD requires an extended version of SFGD. The SFGD is extended by including covariates X in the SEM framework. The SEM of the extended SFGD has two measurement models of outcome Y , two measurement models of X, and three structural models. Detailed model structures are in the following paragraphs, followed by the further graphical comparison between SCD and the extended SFGD. The SFGD is extended by including covariates X in the SEM framework. After including covariates X in the SFGD, the measurement models 2 are as follows:    Y = δ + λη + e ; 1 (3.12)   X v + gξ + e . = 2 2 For the purpose of simplicity, the superscripts (the group indices) are dropped. However, Table 3.2 clearly displays each group in a separate row. Adding subscripts may be redundant. Also, after covariates are included, the errors terms are now denoted by e’s rather than ε’s. 40 e1 ∼ N (0, Θe1 ) is independent of η, ξ and e2 , e2 ∼ N (0, Θe2 ) is independent of η, ξ and e1 . The structural model in LISREL8 notation (J¨reskog and S¨rbom, 1996) is o o η = A + Bξ + U. (3.13) U ∼ N (0, ΘU ) is independent of ξ, e1 and e2 . Intercept A is generally set at zero for the purpose of model identification (Lee, 2007). Table 3.2 displays the models for both pre-test at Time 0 (denoted as 0) and post-test at Time 1 (denoted as 1). The two measurement models for Y are as follows:     Y0    = Y1    δ0   λ0 + 0 δ1   0   η0    + λ1 η1  e10  , e11 (3.14) with δ1 = δ0 + δ. δ represents the intervention effect. The structural model is η1 = τ + γ + νη0 , (3.15) whose parameters are the same as those in Section 3.1.2. Similarly, the two measurement models for covariates X are        X0   v 0   g0  = + X1 v1 0    0   ξ0   e20   + . g1 ξ1 e21 41 (3.16) Table 3.2: SEMs of the Extended Solomon Four-Group Design and Covariance Matrixes Extended Solomon Four-Group Design SEMs and Constraints Y0 δ0 λ0 0 η0 e10 Experimental Group 1: = + + e11 Y1 δ1 0 Λ0 η1 e20 0 ξ0 g0 k0 X0 + + = e21 ξ1 0 g1 k1 X1 η1 = τ + γ + νη0 ξ1 = a + πξ0 U0 η0 A0 B0 0 ξ0 = + + U1 η1 A1 0 B1 ξ1 Control Group 1: Constraints on Experimental Group 1’s Model: Zero treatment effect: δ1 = δ0 + δand δ=0 ; Unity acceleration effect: Slope ν=1. Covariance Appendix B.1 Appendix B.2 Experimental Group 2: Constraints on Experimental Group 1’s Model: X0 is not observed: Y0 is not observed: No pre-test effect: τ =0. Appendix B.3 Control Group 2: Constraints on Experimental Group 1’s Model: X0 is not observed: Y0 is not observed: Zero treatment effect: δ1 = δ0 + δand δ=0 ; No pre-test effect: τ =0. Unity acceleration effect: Slope ν=1. Appendix B.4 42 The relationship between ξ1 and ξ0 is captured by a structural model, ξ1 = a + πξ0 . (3.17) When a=0 and π=1, the covariates are invariant across two time points. Further, the relationship of the latent variables of X and Y is revealed in the structural model        η0   A0   B0  = + η1 A1 0    0   ξ0   U0   + . B1 ξ1 U1 (3.18) The extended SFGD SEM by nature is a two-level factor analysis model (Muth´n, 1994) e because of the hierarchically structured school system. Its covariance can be decomposed into within-cluster (denoted as w) and between-cluster(denoted as b) components (Muth´n, e 1994; Schmidt, 1969). Appendix B.5 has the detailed procedures that derive the variance-covariance of the extended SFGD’s Experimental Group 1 listed in Appendix B.1. Appendix B.2-B.4 have the other three variance-covariance matrixes that are derived using the constraints listed in Table 3.2. 3.2.1 Extended-PEoG Assumption Still, data at Time 0 are not collected from Group 2, Experimental and Control. Time-0-SEM is     Y0 = δ0 + λ0 η0 + e10     X = v0 + g0 ξ0 + e20  0     η =A +B ξ +U 0 0 0 0 0 43 (3.19) Thus, this model is not testable. Group 2 produces an unbiased intervention effect estimate in the counterfactual sense because one needs to assume that Time-0-SEM implicitly holds equivalently in Group 1 and Group 2. Definition The assumption that the equation (3.19)implicitly holds equivalently in Group 1 and Group 2 is the extended-PEoG assumption. The extended-PEoG assumption assures that true gain score estimate derived from Group 2 is unbiased and equivalent to that derived from Group 1. Theorem 3.2.1. (Equivalence of using Group 1 and Group 2 to Estimate True Gain in Extended-SFGD) Given the random assignment of treatment and control and the extendedPEoG assumption, true gain score estimate derived from Group 2 is equivalent to that derived from Group 1. The proof includes two parts: 1) under the assumption, the extended-SFGD’s Group 2 and Group 1 are equivalent in estimating the true gain; and 2) the estimate of the true gain is unbiased. Proof. 1) Under the assumption, the extended-SFGD’s Group 2 and Group 1 are equivalent in estimating the true gain. Equation (3.19) implies Y0 = δ0 + λ0 A0 + λ0 B0 ξ0 + λ0 U0 + e10 , (3.20) −1 where ξ0 = g0 (X0 − v0 − e20 ). In Group 2, both Experimental and Control data at Time 0 are not observable. Thus, E C through Group 2 treatment effect is estimated by E(Y1 2 − Y1 2 ). 44 C E Further, the extended-PEoG assumption implies that Y0 1 ≡ Y0 1 . So that C T C E E(Y1 2 − Y1 2 ) = E(Y1 2 − Y1 2 − 0) C E C E C E = E[Y1 2 − Y1 2 − (Y0 1 − Y0 1 )] (becasue of Y0 1 − Y0 1 = 0) C C E E = E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )] C C E E = E[(Y1 2 − Y0 1 )] − E[(Y1 2 − Y0 1 )]. E E C C Note that E[(Y1 2 − Y0 1 ] − E[(Y1 2 − Y0 1 )] is the true gain estimate derived from Experimental Group 1 and Control Group 1. E E The true gain is the difference between the average treatment gain (E[(Y1 2 − Y0 1 )]) C C and the average control gain (E[(Y1 2 − Y0 1 )]). Thus, it proves that under the extend-PEoG assumption using Group 2 is equivalent to the use of Group 1 to estimate the true gain. 2) Estimate of the true gain is unbiased. Based on Time-1-SEM (see Table 3.1)     Y1 = δ1 + λ1 η1 + e11    ,  X1 = v1 g1 ξ1 + e21      η =A +B ξ +U 1 1 1 1 1 (3.21) along with η1 = τ + γ + νη0 , write Y1 = δ1 + Λ0 (τ + γ + νη1 ) + e11 . 45 (3.22) Because η1 = A1 + B1 ξ1 + U1 , write Y1 = δ1 + Λ0 (τ + γ + νη0 ) + e11 = δ1 + Λ0 [τ + γ + ν(A0 + B0 ξ0 + U0 )] + e11 . The average treatment gain across Time 0 and Time 1 is E E Y1 2 − Y0 1 = (δ1 − δ0 ) + Λ0 (τ + γ) + (Λ0 ν − λ0 )(A0 + B0 ξ0 + U0 ) + e11 − e10 . In Control Group 1 and Control Group 2, δ1 = δ0 and ν=1. The average treatment gain across Time 0 and Time 1 is C C Y2 2 − Y1 1 = (δ1 − δ1 ) + Λ1 (ν + γ) + (Λ1 − Λ0 )(A1 + B1 ξ1 + U1 ) + e12 − e11 . The true gain is T T C C E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )] = E[(δ1 − δ0 ) + Λ0 (ν − 1)(A0 + B0 ξ0 + U0 )] = δ + Λ0 (ν − 1)(A0 + B0 ξ0 ) = δ + E[Λ0 (ν − 1)A0 ] + E[Λ0 (ν − 1)B0 ξ0 ]. A0 is generally set at 0 in the SEM literature in a identifiable model (Lee, 2007). Because C T T C of E(U0 ) = 0, E(ξ0 ) = 0, E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )] = δ holds. Thus, the true gain estimate is unbiased. 46 3.3 Synthetic Cohort Design in the Context of Solomon Four-Group Design The SFGD-G1 is illustrated inside the black dashed box in Figure 3.1. Experimental Group 1 and Control Group 1 are represented by the black circles. Each group is tested twice: pre-test and post-test. The black-colored capital letter T in Figure 3.1 indicates treatment intervention administered after the pre-test. δ1 and δ0 , defined in Section 3.2.1, represent the average treatment gain and the average control gain, respectively. The PEoG assumption indicates that Experimental Group 1 and Control Group 1 are comparable at pre-test time point. Figure 3.1: Synthetic cohort design in the context of Solomon Four-Group Design-G1 The SCD is illustrated in green in Figure 3.1. Ideally, two cohorts, Cohort 2 and Cohort 1, can be followed longitudinally across years such as three adjacent years, Yeari−1 ,Yeari , 47 and Yeari+1 . Cohort 1 is in grade 7 at Yeari and grade 8 in Yeari+1 . Focal Cohort 2 is in grade 7 in Yeari−1 and grade 8 in Yeari . As a quasi-longitudinal design illustrated in the green dashed box, SCD collects data at only Yeari from the two adjacent cohorts. SCD requires the HEoG assumption implying that two 7th graders are comparable across Yeari−1 andYeari . In other words, the HEoG assumption assures that Cohort 1 at time 1 (7th grade at Time 1) are comparable to Cohort 2 at Time 0 (7th grade at Time 0). Figure 3.1 indicates a close relationship between two designs. Focal Cohort 2 (grade 8 in Yeari ) is the Experimental Group. Treatment intervention represents one year of schooling at 8th grade inYeari . δ1 is the schooling effect due to one year of schooling. δ0 is not estimable because “control”, without one year of schooling at 8th grade inYeari , is not applicable in educational practice. Thus, SCD cannot estimate the true gain, which is the difference between δ1 and δ0 in SFGD. Particularly, SCD is used to obtain δC2T 1−C1T 1 , the estimate schooling effect δ1 , due to one year of schooling of 8th grade inYeari . The estimator of δ1 in SCD, denoted as δC2T 1−C1T 1 , is a composite estimate of true treatment gain plus the maturation and learning effect due to previously taking the pre-test. E E That is, based on SEM framework, the SCD estimates δ1 is the expectation of Y1 2 − Y0 1 . That is, E E E(Y1 2 −Y0 1 ) = E[(δ1 −δ0 )+λ1 (τ +γ)+(Λ0 ν −λ0 )(A0 +B0 ξ0 +U0 )+e11 −e10 ]. (3.23) Adding constraints to the parameters can further simplify the estimation of schooling effect. First, temporal measurement invariance assumption (Cheung and Rensvold, 2002; Kaplan, 2008, p. 64) assumes that factor loading vectors across two time points are equal: 48 Λ0 ≡ λ0 . Second, it is plausible to assume a flat growth rate in the latent relationship the equation (3.15), that is, ν ≡ 1. This implies: 1) latent ability at Time 1 is invariant from Time 0; and 2) growth effect is fully captured by maturation and pre-testing effect, plus interaction effect, if there is any. Thus, δ1 = E[(δ1 − δ0 ) + Λ0 (τ + γ)] = (δ1 − δ0 ) + E(Λ0 (τ + γ)). (3.24) This indicates that school effect estimate equals to the true gain (δ1 − δ0 ) plus the growth effect due to maturation and pre-test effect. The use of SCD to investigate the effect of 8th grade instruction on student learning is determined by how comparable the two 7th grades are across Yeari−1 andYeari . If they are not comparable, schooling effect estimate will be biased. But if the two 7th grades are comparable, SCD approximates a longitudinal study SFGD-G1. The necessary condition that two 7th grades are comparable across two time points is assured by the HEoG assumption, which works in a counterfactual sense. This can be mathematically written as (HEoG|counterfactual) ⇒ δC2T 1−C1T 1 = δ1 . (3.25) It reads, “given the counterfactual condition, HEoG assumption holds and assures that SCD approximates a longitudinal study SFGD-G1 in terms of estimating the effect of one year schooling.” Definition Figure 3.1 graphically indicates that the PEoG assumption (see Section 3.1.5 49 and Section 3.2.2 for detail) is the SEM-version of the HEoG assumption. That is, the equation (3.19) holds equally at two 7th grades in Y eari−1 and Y eari . In practice, using randomization assures the (Extended-)PEoG assumption for (Extended)SFGD. If randomization is applicable in SCD, it can assure the HEoG. That is, (HEoG|randomization) ⇒ δC2T 1−C1T 1 = δ1 . (3.26) This reads, “under the randomization condition, HEoG assumption holds and assures that SCD approximates a longitudinal study SFGD-G1 in terms of estimating the effect of one year schooling.” In educational settings, randomization is not applicable in SCD and it cannot assure the HEoG, even though (Extended-)PEoG and HEoG are mathematically equivalent. Matching is proposed to assure HEoG in SCD. That is, (HEoG|matching) ⇒ δC2T 1−C1T 1 = δ1 . (3.27) This reads, “under the matching condition, HEoG assumption holds and assures that SCD approximates a longitudinal study SFGD-G1 in terms of estimating the effect of one year schooling.” 3.4 Matching and HEoG Assumption This section further depicts how matching will assure HEoG assumption. Let C2T 0, C1T 1, and C2T 1 represent three time-cohort knots (See the following Figure 3.2.). CjT t indicates 50 the knot of Cohort j at Time t, with j = 1, 2, and t = 0, 1. Conceptually, there are two types of matching. C2T0-C1T1 Matching Implementing the matching approach in this situation is to match individuals of Cohort 2 at Time 0 with those of Cohort 1 at Time 1. In other words, matching creates a group of 7th graders at Time 1 that are equivalent to 8th graders when they were at Time 0. In real longitudinal design, the treatment effect is the outcome difference on Y of 8th graders between Time 1 and Time 0. Because of matching, the 8th graders at Time 0 do not have to be measured. The assessment measure of matched 7th graders at Time 1 can be treated as the equivalent assessment measure of the 8th graders when they were in 7th grade at Time 0. However, this matching cannot be realized in the SCD because data of C2T0 are not available. It is only applicable in simulation studies in order to verify that matching can assure HEoG assumption. It will be discussed in details in Section 4.2 and 4.3 of Chapter 4. C2T1-C1T1 Matching Implementing the matching approach in this situation is to match individuals of two cohorts at Time 1. In other words, matching creates a group of 7th graders at Time 1 that are equivalent to those 8th graders at Time 1 in terms of the simulated student characteristic variables. If the covariates are hypothetically unchanged across Time 0 (Y eari−1 ) and Time 1 (Y eari ), then C2T 1 − C1T 1 matching will be equivalent to C2T 0 − C1T 1 matching. Quantifying HEoG assumption based on SEM framework provides a way of manipulating model parameters to generate non-comparable cohort data to examine how matching improves cohort comparability to assure HEoG for SCD. The hierarchical data structure collected through the SCD determines the proposed dual matching. The following paragraphs 51 Figure 3.2: Three data sets, two-way matching and the HEoG assumption will discuss how to match hierarchically structured data and how to match through the latent variable to account for measurement errors on surrogate variables. The detailed simulation plan is discussed in Chapter 4. The simulated data will be generated for C2T 0, C2T 1 and C1T 1. Detailed data generation procedure of C2T 0 and C2T 1 is discussed in section 4.1. The parameter manipulation for data generation of C1T1 is discussed in section 4.2. 52 Chapter 4 Simulation Study A number of Monte Carlo simulations are conducted to test the performance of the proposed matching approaches under different conditions. The purpose of simulation is to create a series of studies to examine how matching reduces bias of the schooling effect estimate in the SCD. Specifically, the simulation evaluates how effectively matching can reduce selection bias and improve the accuracy of the schooling effect estimate. In the use of SCD, selection bias is represented by the non-comparability of the two cohorts at two time points (i.e., C2T 0 and C1T 1). Selection bias violates the HEoG assumption. It inflates estimation bias and attenuates the efficiency of the SCD in examining student learning. Estimation bias is defined by the difference between the quasi-longitudinal growth and the true longitudinal growth (the schooling effect). Reducing selection bias will reduce estimation bias. Based upon the Second International Mathematics Study (SIMS, IEA, 1977) data and the two-level structural equation model, several selection bias situations are simulated to examine how matching improves the comparability of the two cohorts and reduces selection bias. Bias reduction rate and estimation 53 bias reduction rate indicate how well matching reduces bias of the schooling effect estimate in the SCD. Larger reduction rate indicates higher accuracy and efficiency. The bias reduction rate is defined in Section 4.3. Mplus (Muth´n and Muth´n, 2009) is used to fit the two-level SEM to estimate the e e parameters, which are treated as unknown values to generate quasi-population data. R (R Development Core Team, 2007) is used to conduct matching and examine its performance. Section 4.1 discusses how to generate longitudinal data of focal Cohort 2 at Tim 0 and Time 1 (denoted as C2T 0 − C2T 1); Section 4.2 discusses how to generate data for Cohort 1 at Time 1 (denoted as C1T 1). The SCD uses C2T 1 − C1T 1 data to estimate schooling effect. 4.1 Data and Conceptual Model The SIMS uses a longitudinal design to study the effects of the curriculum and the classroom instruction. The classroom process is “mapped” on the targeted 8th grade (focal Cohort 2) where the 13-year-old students are found. Two waves of mathematics achievement data are collected, with the first wave at the beginning of the school year (Time 0), and the second at the end of the school year (Time 1). In this design, Cohort 2 at Time 0 is in the control condition. After the “treatment” of one year of schooling, Cohort 2 at Time 1 data are collected to assess the schooling effect (δC2T 1−C2T 0 ), defined as the “changes in mathematics achievement over the time-span of one school year at the particular grade level” (Wiley and Wolfe, 1992, p. 299). 54 Table 4.1: Level-1 Descriptive Statistics of Variables Label Outcome Variable Post-Test Score POSTTEST Pre-Test Score PRETEST Student Level Latent Covariates Educational Inspiration (EDUINSP) YPWANT YPWWELL YPENC Self –Encouragement (SLFENCRG) Family support (FMLSUPRT) Math Importance (MTHIMPT) Socioeconomics Status (SES) the Final Two-Level Structural Equation Model (N=2,296) Description Mean Total post-test scores on 40 items Total pre-test scores on 40 items Learn more math (Inverse code, 1-5 a ) Parents want me do well on math (1-5 a ) Parents encourage me to do well on math (Inverse code, 1-5 a ) YIWANT I want to do well on math (1-5 a ) YMORMTH Looking forward to taking more math (1-5 a ) YNOMORE Take no more math if possible (Inverse code,1-5 a ) YPINT Parents are interested in helping with math (Inverse code, YFLIKES 1-5 a ) YMLIKES Father enjoys doing math (Inverse code , 1-5 a ) YFABLE Mother enjoys doing math (Inverse code, 1-5 a ) YMABLE Father is able to do math homework (Inverse code, 1-5 a ) Mother is able to do math homework (Inverse code,1-5 a ) YMIMPT Mother thinks math is important (1-5 a ) YFIMPT Father thinks math is important (1-5 a ) YFEDUC Father’s education level (1-4 b ) YMEDUC Mother’s education level (1-4 b ) YFOCCN Father’s occupation national code (1-8c ) YMOCCN Mother’s occupation national code (1-8c ) 55 17.67 13.79 4.73 4.24 4.37 4.32 3.24 3.73 3.72 3.53 3.25 3.92 3.71 4.60 4.55 3.38 3.35 4.26 4.11 Table 4.1: Continued. Variables Label Description Student level Observed Covariates Student Age XAGE Grand mean centered age Parental Help YFAMILY frequency of family help (1-3d ) Education Expectation EDUECPT Derived from YMOREED: how many years of education parents expected (1-4e ) Time use on homework YMHWKT Typical week hours math for homework per week a 1=not at all like ,..., 3=unsure,. . . , 5=exactly like; b 1=little schooling, 2=primary school, 3=secondary school, 4=college or university or tertiary education; c 1=unskilled worker, 2=semiunskilled worker , 3=skilled worker lower, 4=skilled worker higher, 5=clerk sales and related lower, 6=clerk sales and related higher, 7=professional and managerial lower, 8=professional and managerial higher; d 1=never/hardly, 2=occasionally, 3=regularly; e 1=up to 2 years, 2=2 to 5 years, 3=5 to 8 years, 4=more than 8 years. 56 Mean 0.00 1.75 2.97 2.98 Table 4.2: Level-2 Descriptive Statistics of the Final Two-Level Structural Equation Model (N=126) Variables Label Description Mean Teacher/Class Level Covariates Class Size CLASSIZE Created from the number of students in class 26.60 Opportunity to Learn OLDARITH Prior OTL of Arithmetic 7.10 OLDALG Prior OTL of Algebra NA OLDGEOM Prior OTL of Geometry 3.19 NEWARITH This year’s OTL of Arithmetic NA NEWALG This year’s OTL of Algebra 59.61 NEWGEOM This year’s OTL of Geometry 41.37 Class Instruction TPPWEEK Actual number of hours of math instructions per week 5.09 School Level Covariates Qualified Math Teacher MTHONLY Proportion of qualified math teachers: 0.14 Rate the sum of SSPECM and SSPECF divided by STCHS 57 SIMS data are collected from seven countries including the United States, Canada, France, Belgium, Japan, Thailand and New Zealand. This study uses only SIMS data collected in the United States (SIMS-USA, Wolfe, 1987). The targeted population is Population A, including all students in the second year of the general secondary education, technical secondary education, and vocational secondary education programs in both type I (non-traditional) and type II (traditional) forms of school organizations. In the SIMS-USA data, 8,332 students from 164 schools are sampled within 7 strata using the two stage complex sampling method. There are 5,584 students (of a total of 8,332) are nested in 211 classes, which belong to four types of classes (Kifer, 1992): Remedial (N=21), Regular (N=126), Enriched (N=46) and Algebra (N=18). The final data set includes 2,296 students in 126 Regular classes. The average class-size is about 27. Table 4.1 and Table 4.2 list the descriptive statistics of the outcome variables and covariates1 (Schmidt and Burstein, 1992) ¯ Level-1 variable means (X1 ) and variance-covariance matrix (S1 ) are listed in Table 4.3. ¯ Level-2 variable means (W2 ) and variance-covariance matrix (S2 ) are listed in Table 4.4. These means, variances, and covariances are computed using Mplus code listed in Appendix A.1. 1 The labels of the covariates are adapted from the abbreviations in the SIMS questionnaire (Wolfe, 1987). The newly created abbreviations of the latent variables and the outcome variables are listed in the nomenclature of this dissertation study. 58 Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY Mean 4.730 4.373 4.239 4.317 3.242 3.725 3.720 3.534 3.248 3.915 3.705 4.603 4.545 3.374 3.349 4.264 4.112 0.000 0.000 0.000 2.968 1.745 2.984 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ¯ Table 4.3: The Level-1 Variance Covariance Matrix (S1 ) and Means (X1 ) Variance Covariance 0.34 0.24 0.80 0.16 0.24 0.81 0.11 0.14 0.28 0.75 0.12 0.27 0.23 0.32 1.35 0.11 0.20 0.24 0.28 0.59 1.25 0.15 0.32 0.16 0.14 0.24 0.14 1.28 0.09 0.21 0.09 0.10 0.22 0.19 0.46 1.13 0.08 0.15 0.13 0.08 0.22 0.14 0.35 0.23 1.19 0.11 0.23 0.09 0.09 0.12 0.10 0.66 0.66 0.05 1.51 0.08 0.15 0.11 0.02 0.13 0.07 0.51 0.05 0.72 0.34 1.52 0.19 0.36 0.17 0.15 0.20 0.12 0.23 0.12 0.14 0.12 0.18 0.53 0.20 0.37 0.17 0.13 0.17 0.12 0.25 0.26 0.08 0.31 0.07 0.38 0.04 0.07 0.04 0.01 0.01 0.03 0.18 0.20 0.01 0.29 0.03 0.05 0.04 0.04 0.04 0.02 0.01 0.05 0.11 0.09 0.09 0.11 0.12 0.04 0.07 0.09 0.02 0.01 0.06 0.06 0.33 0.43 0.01 0.64 0.02 0.03 0.01 0.01 0.04 -.05 -.06 -.01 0.22 0.20 0.16 0.33 0.24 0.04 0.47 0.46 0.69 0.97 1.71 2.00 0.14 0.64 0.30 0.59 -.09 0.20 0.25 0.24 0.40 0.29 0.89 0.99 0.19 0.57 0.10 0.60 -.24 0.08 -.13 -.16 -.09 -.22 0.02 -.13 -.29 -.18 0.06 -.49 0.03 -.11 0.07 0.10 0.09 0.08 0.16 0.18 0.11 0.12 0.09 0.10 0.04 0.06 0.03 0.06 0.02 0.01 -.01 -.01 0.20 0.07 0.06 0.15 0.11 0.04 0.04 0.08 0.05 -.05 -.02 -.08 0.07 0.04 -.04 -.22 -.04 -.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 59 0.64 0.09 0.03 0.14 0.06 0.39 0.21 -.29 0.08 0.06 0.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.48 0.23 0.61 0.41 0.81 0.65 -.45 0.15 0.03 0.13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.41 0.38 0.47 0.72 0.52 -.41 0.14 0.02 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY Table 4.3: Continued Variance Covariance 4.43 1.10 2.35 1.88 -1.13 0.45 0.01 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.93 1.66 1.47 -1.19 0.25 0.00 0.87 0.00 0.00 0.00 0.00 0.00 0.00 0.00 48.51 22.02 -4.85 1.45 -0.64 -1.73 0.00 0.00 0.00 0.00 0.00 0.00 0.00 29.98 -3.34 1.11 -0.50 -1.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 36.06 -0.47 0.02 -0.29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.59 -0.01 -0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 60 0.35 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 47.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY ¯ Table 4.4: The Level-2 Variance Covariance Matrix (S2 ) and Means (W2 ) Mean Variance Covariance 0.000 0.00 0.000 0.00 0.00 0.000 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.672 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 13.793 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.319 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.961 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.137 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 5.087 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 26.600 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.139 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 61 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY Table 4.4: Continued Variance Covariance 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 20.17 14.68 0.00 0.00 0.00 0.00 0.52 0.39 3.03 3.20 0.12 -7.34 0.07 13.04 0.00 0.00 0.00 0.00 0.80 0.46 2.93 2.09 0.01 -5.27 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 62 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.03 0.25 -0.08 -0.45 -0.06 0.17 -0.01 0.39 -0.07 -0.37 -0.03 -0.13 -0.01 6.24 3.97 0.18 0.35 0.00 5.59 -0.20 -0.77 0.00 3.73 -0.67 0.00 29.01 0.10 0.02 4.1.1 Two-Level Structural Equation Model Based on Data of SIMS-USA The conceptual model fitted on SIMS-USA data is displayed in Figure 4.1. The post-test score is predicted by the pre-test score, and teacher variables. The pre-test score is predicted by student background variables and school characteristic variables. The two-level structural equation model (Muth´n, 1994) is displayed in Figure 4.2. This model is a particular case e of the general two-level SEM discussed in Section 3.2. Figure 4.1: Conceptual framework model on SIMS-USA data In the Level-1 model, the post-test outcome variable (POSTTEST) is predicted by the pre-test outcome variable (PRETEST). The pre-test score is predicted by student characteristics including age (XAGE), educational expectation (EDUCEPT), homework time (YMHWKT), and frequency of family help on homework (YFAMILY). The pre-test score is also predicted by five latent variables including educational inspiration (EDUINSP), self encouragement (SLFENCRG), family support (FMLSUPRT), importance of learning mathematics (MTHIMPT), and socioeconomic status (SES). The latent variables and their surrogate variables are displayed in Table 4.1. In the level-2 model, the intercept of pre-test score (denoted as β0 in equation 4.3 and PRETEST MEAN in Figure 4.2) is predicted by teacher/school-level variables including previous year’s opportunities to learn (OTL) of arithmetic (OLDARITH), previous year’s 63 OTL of algebra (OLDALG), class-size (CLASSSIZE), and qualified mathematics teacher rate in school (MTHONLY). The intercept of post-test score (denoted as α0 in equation 4.4 and POSTTEST MEAN in Figure 4.2 ) is predicted by β0 and three class-level variables, which are current year’s OTL of algebra (NEWALG), current year’s OTL of geometry (NEWGEOM), and weekly hours of math instructions (TPPWEEK). The residuals are independent of one another. Mplus (Muth´n and Muth´n, 2009) code that fits the two-level SEM is listed in Appendix e e A.1. Table 4.5 lists the factor loadings, regression coefficients, and residual variances. Level-1 Model: POSTTEST = α0 + α1 PRETEST + epost ; PRETEST = β0 + β1 XAGE + β2 EDUCEPT + β3 YFAMILY + β4 YMHWKT+ (4.1) (4.2) β5 EDUINSP + β6 SLFENCRG + β7 FMLSUPRT + β8 MTHIMPT + β9 SES + epre , 2 2 where epost ∼ N (0, σe ), e ∼ N (0, σepre ). post pre Level-2 Model: β0 = γ0 + γ1 OLDARITH + γ2 OLDALG + γ3 CLASSSIZE + γ4 MTHONLY + uβ0 , (4.3) 2 with uβ ∼ N (0, σu ); β0 0 α0 = β0 + γ5 NEWALG + γ6 NEWGEOM + γ7 TPPWEEK + uα0 , 64 (4.4) Figure 4.2: Two-level structural equation model on SIMS-USA data 65 2 with uα0 ∼ N (0, σuα ). 0 This two-level structural model results in the estimates of the variance-covariance S1 )and ˆ ˆ S2 . The estimated variance-covariance matrixes are denoted as Σ1 and Σ2 , respectively. These model-based estimates of level-1 variable means (ˆ1 ) and variance-covariance matrix µ ˆ (Σ1 ) are listed in Table 4.6. The model-based estimates of Level-2 variable means (ˆ2 ) and µ ˆ variance-covariance matrix (Σ2 ) are listed in Table 4.7. The variance-covariance matrix of the latent factors is listed in Table 4.8. These model-based parameter estimates are treated as known values and are used to generate longitudinal data of Cohort 2 at Time 0 (e.g., grade 7 in Y eari−1 ) and Time 1 (e.g., grade 8 in Year i). Details are in Section 4.1.3. Cohort 2 at Time 0 data are not collected in SCD. Cohort 1 at Time 1 (e.g., grade 7 at Y eari ) data are treated as the ˆ “replacement” to estimate schooling effect (δC2T 1−C1T 1 ). The schooling effect estimation bias in SCD is ˆ ˆ BIAS(δC2T 1−C1T 1 ) = E(δC2T 1−C1T 1 ) − δC2T 1−C2T 0 . (4.5) The goal is to simulate SCD by generating Cohort 1’s Time 1 data that are non-comparable with Cohort 2 at Time 0, so that matching can be used to reduce the “simulated selection bias”, to assure the HEoG assumption, and to decrease estimation bias of schooling effect. A series of parameter manipulations are used to generate data of Cohort 1 at Time 1 (See Section 4.2 for detail.). 66 Table 4.5: Two-Level Structural Equation Model Estimates (a.k.a True Pseudo-population Parameter Values) Level-1 Variable Label Loading Coefficient Residual Variance Regression Coefficient PRETEST POSTTEST Coef. SE PV Coef. SE PV Coef. SE PV Est. SE PV Pre-Test Score PRETEST - - - - - - .72 .03 .00 31.87 1.94 .00 Post-Test Score POSTTEST - - - - - - - - - 25.64 1.27 .00 EDUINSP YPWANT YPWWELL YPENC 1.00 1.05 1.82 .08 .11 .00 .00 .87 1.56 .58 - - - .21 .37 .66 .01 .03 .05 .00 .00 .00 SLFENCRG YIWANT YMORMTH YNOMORE 1.00 1.98 1.67 .18 .13 .00 .00 1.97 .56 .00 - - - .58 .67 .77 .04 .05 .05 .00 .00 .00 FMLSUPRT YPINT YFLIKES YMLIKES YFABLE YMABLE 1.00 .77 .46 1.00 .60 .05 .04 .06 .05 .00 .00 .00 .00 -.04 .25 .88 - - - .62 .73 1.05 .85 1.27 .04 .03 .04 .05 .05 .00 .00 .00 .00 .00 67 MTHIMPT YMIMPT YFIMPT 1.00 1.06 Table 4.5: Continued. -.89 .76 .05 .00 SES YFEDUC YMEDUC YFOCCN YMOCCN 1.00 .72 1.94 1.54 .04 .13 .14 .00 .00 .00 1.55 Age XAGE - - - Parental Help YFAMILY - - Ed. tion EDUECPT - YMHWKT .25 - - - .17 .24 .02 .03 .00 .00 .30 .00 - - - .17 .24 3.26 3.18 .01 .01 .13 .13 .00 .00 .00 .00 -.06 .02 .00 - - - - - - - -1.44 .16 .00 - - - - - - - - 1.28 .17 .00 - - - - - - - - - -.03 .01 .01 - - - - - - PRETEST MEAN 1.26 .91 .17 - - - 1.21 .07 .00 3.97 .93 POSTTEST MEAN 16.61 1.80 .00 - - - - - - 11.20 2.17 .00 Class Size CLASSIZE - - - -.20 .06 .00 - - - - - - Prior OTL OLDARITH OLDGEOM - - - .65 .79 .36 .94 .07 .41 - - - - - - Qualified Math Teacher Rate MTHONLY - - - 4.51 2.11 .03 - - - - - - Current OTL NEWALG NEWGEOM - - - - - - -.27 .37 .13 .14 .03 .01 - - - TPPWEEK - - - - - - .08 .02 .00 - - - Expecta- Homework Time Class tion Year Instruc- 68 .00 Table 4.6: Model Estimated Parameters: Level-1 Variance Variable Mean RYPWANT 4.731 0.34 RYPENC 4.373 0.24 0.80 YPWWELL 4.241 0.14 0.25 0.81 YIWANT 4.319 0.08 0.15 0.08 0.75 YMORMTH 3.247 0.16 0.29 0.17 0.34 1.35 RYNOMORE 3.732 0.13 0.24 0.14 0.29 0.57 1.25 RYPINT 3.719 0.15 0.27 0.15 0.11 0.21 0.18 1.28 RYFLIKE 3.536 0.11 0.20 0.12 0.08 0.16 0.14 0.51 1.13 RYMLIKE 3.247 0.07 0.12 0.07 0.05 0.10 0.08 0.31 0.24 RYFABLE 3.917 0.15 0.27 0.15 0.11 0.21 0.18 0.67 0.51 RYMABLE 3.702 0.09 0.16 0.09 0.06 0.13 0.11 0.40 0.31 RYMIMPT 4.603 0.19 0.34 0.20 0.09 0.18 0.15 0.22 0.17 RYFIMPT 4.546 0.20 0.36 0.21 0.10 0.19 0.16 0.24 0.18 YFEDUC 3.375 0.04 0.07 0.04 0.01 0.02 0.02 0.20 0.15 YMEDUC 3.349 0.03 0.05 0.03 0.01 0.02 0.02 0.14 0.11 YFOCCN 4.277 0.07 0.13 0.07 0.02 0.05 0.04 0.39 0.30 YMOCCN 4.128 0.06 0.10 0.06 0.02 0.04 0.03 0.31 0.24 TOTPOS 0.854 0.11 0.20 0.12 0.25 0.49 0.41 0.31 0.24 TOTPRE 1.185 0.16 0.28 0.16 0.34 0.68 0.57 0.42 0.33 XAGE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 EDUEPCT 2.968 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YFAMILY 1.745 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YMHWKT 2.984 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 OLDARITH 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 OLDGEOM 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 NEWALG 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 NEWGEOM 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 TPPWEEK 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CLASSSIZ 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 MTHONLY 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 69 ˆ Covariance Matrix (Σ1 ) and Mean (ˆ1 ) µ 1.19 0.31 0.18 0.10 0.11 0.09 0.07 0.18 0.14 0.14 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.52 0.40 0.22 0.24 0.20 0.14 0.39 0.31 0.31 0.43 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.52 0.13 0.14 0.12 0.09 0.23 0.19 0.18 0.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.53 0.38 0.06 0.04 0.11 0.09 0.08 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.64 0.06 0.04 0.12 0.10 0.08 0.11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.48 0.22 0.60 0.48 0.34 0.48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.41 0.43 0.34 0.25 0.34 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 4.42 0.93 0.67 0.93 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 3.92 0.53 0.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY Table 4.6: Continued Mean 46.95 20.93 -1.93 0.57 -0.38 -1.29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 29.06 -2.68 0.79 -0.52 -1.79 0.00 0.00 0.00 0.00 0.00 0.00 0.00 36.06 -0.47 0.02 -0.29 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.59 -0.01 -0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.35 0.17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 70 47.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Table 4.7: Model Estimated Parameters: Level-2 Variance Covariance Matrix Variabble Mean Variance Covariance RYPWANT 0.000 0.00 RYPENC 0.000 0.00 0.00 YPWWELL 0.000 0.00 0.00 0.00 YIWANT 0.000 0.00 0.00 0.00 0.00 YMORMTH 0.000 0.00 0.00 0.00 0.00 0.00 RYNOMORE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 RYPINT 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYFLIKE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYMLIKE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYFABLE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYMABLE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYMIMPT 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 RYFIMPT 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YFEDUC 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YMEDUC 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YFOCCN 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YMOCCN 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 TOTPOS 16.762 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 TOTPRE 12.597 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 XAGE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 EDUEPCT 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YFAMILY 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 YMHWKT 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 OLDARITH 0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 OLDGEOM 0.319 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 NEWALG 5.961 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 NEWGEOM 4.137 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 TPPWEEK 5.087 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 CLASSSIZ 26.600 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 MTHONLY 0.139 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 71 ˆ (Σ2 ) and Mean (ˆ2 ) µ 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Variabble RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE XAGE EDUEPCT YFAMILY YMHWKT OLDARITH OLDGEOM NEWALG NEWGEOM TPPWEEK CLASSSIZ MTHONLY Table 4.7: Continued. Variance Covariance 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 23.63 16.06 0.00 0.00 0.00 0.00 0.82 0.44 -0.38 0.50 0.25 -6.94 0.06 13.41 0.00 0.00 0.00 0.00 0.80 0.46 -0.16 -0.41 0.08 -5.40 0.05 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.03 0.25 -0.08 -0.45 -0.06 0.17 -0.01 0.39 -0.07 -0.37 -0.03 -0.13 -0.01 6.24 3.97 0.18 0.35 0.00 5.59 -0.20 -0.77 0.00 3.73 -0.67 0.00 29.01 0.10 0.02 Table 4.8: Covariance Matrix of the Five Latent Variables EDUINSP EDUINSP .13*** SLFENCRG .08 *** FMLSUPRT .15** MTHIMPT .19** SES .04** *** p < .001, *** p <.05 * 4.1.2 SLFENCRG FMLSUPRT MTHIMPT SES .17*** .11*** .09*** .01* p <.1 .67*** .22*** .2*** .36*** .06*** .31*** Longitudinal Data Generation The model-based estimates listed in Table 4.5, Table 4.6, Table 4.7 and Table 4.8 are treated as known parameter values and are plugged in the two-level SEM to generate longitudinal data of C2T 0 and C2T 1. Mplus code for the data generation is listed in Appendix A.2. Determine Class Sizes Simulated class sizes are based on the observed class sizes in SIMS-USA data. Table 4.9 displays the class-size distribution of the 126 Regular classes in SIMS-USA data. The range of the observed class sizes is [6,42]. The observed class sizes are rounded up resulting in four types of class sizes (with frequency): 10 (N=3), 20 (N=35), 30 (N=80), and 40 (N=8). The average class is 27.38, which is very close to the observed average class-size of 26.60. In the literature, the class-size in a simulated two-level model has been set at 30 (e.g., Tate and Wongbundhit, 1983). Pseudo-Population Size The SIMS-USA data are collected to represent 3,681,939 8th - graders nested in 136,368 classes across the seven strata in the United States (Wolfe, 1987). The simulated pseudo-population includes 12,600 classes and 345,000 students. The classsize distribution in the pseudo-population is: 10 (N=300), 20 (N=3500), 30 (N=8000), and 40 (N=800). The average class-size is 27.13. 73 Table 4.9: Class Size Distribution of 126 Class Size Frequency Percent 6 1 .8 14 2 1.6 15 1 .8 17 2 1.6 18 1 .8 19 3 2.4 20 4 3.2 21 4 3.2 22 6 4.8 23 11 8.7 24 3 2.4 25 11 8.7 26 6 4.8 27 14 11.1 28 11 8.7 29 7 5.6 30 6 4.8 31 14 11.1 32 3 2.4 33 7 5.6 34 1 .8 35 1 .8 36 3 2.4 37 2 1.6 38 1 .8 42 1 .8 Total 126 100.0 Classes of SIMS-USA Data Cumulative Percent .8 2.4 3.2 4.8 5.6 7.9 11.1 14.3 19.0 27.8 30.2 38.9 43.7 54.8 63.5 69.0 73.8 84.9 87.3 92.9 93.7 94.4 96.8 98.4 99.2 100.0 Evaluation of Pseudo-Population Parameter Recovery The two-level SEM is fit on the generated pseudo-population data. The estimated parameter values along with the “true” values are listed in Table 4.10. Except for a negative estimate bias (−0.2268) of the regression coefficient of latent construct Self Encouragement (β6 = .87), all other parameter estimation bias values are less than 0.09. 74 Table 4.10: Recovery of Pseudo-Population Parameters. Within Level Observed Variable RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN Population Estimated Bias Means 4.7310 4.3730 4.2410 4.3190 3.2470 3.7320 3.7190 3.5360 3.2470 3.9170 3.7020 4.6030 4.5460 3.3750 3.3490 4.2770 4.1280 4.7312 4.3741 4.2421 4.3226 3.2473 3.7337 3.7197 3.5375 3.2471 3.9191 3.7037 4.6034 4.5468 3.3742 3.3480 4.2811 4.1246 0.0002 0.0011 0.0011 0.0036 0.0003 0.0017 0.0007 0.0015 0.0001 0.0021 0.0017 0.0004 0.0008 -0.0008 -0.0010 0.0041 -0.0034 Latent Observed Variable Variable EDUISPR RYPWANT RYPENC YPWWELL SLFENCRGYIWANT YMORMTH RYNOMORE FMLYSUPT RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE MTHIMPT RYMIMPT RYFIMPT SES YFEDUC YMEDUC YFOCCN YMOCCN 75 Population Estimated Bias Factor Loadings 1.0000 1.8220 1.0540 1.0000 1.9840 1.6740 1.0000 0.7690 0.4580 1.0010 0.6010 1.0000 1.0590 1.0000 0.7180 1.9410 1.5400 1.0000 1.8288 1.0534 1.0000 1.9934 1.6816 1.0000 0.7704 0.4582 1.0011 0.5986 1.0000 1.0617 1.0000 0.7173 1.9391 1.5349 0.0000 0.0068 -0.0006 0.0000 0.0094 0.0076 0.0000 0.0014 0.0002 0.0001 -0.0024 0.0000 0.0027 0.0000 -0.0007 -0.0019 -0.0051 Table 4.10: Continued. Population Observed Residual Variable ances RYPWANT 0.2080 RYPENC 0.3650 YPWWELL 0.6600 YIWANT 0.5770 YMORMTH 0.6700 RYNOMORE 0.7720 RYPINT 0.6180 RYFLIKE 0.7330 RYMLIKE 1.0490 RYFABLE 0.8480 RYMABLE 1.2740 RYMIMPT 0.1700 Estimated Bias Vari0.2085 0.3650 0.6616 0.5774 0.6732 0.7697 0.6149 0.7328 1.0510 0.8448 1.2759 0.1703 0.0005 0.0000 0.0016 0.0004 0.0032 -0.0023 -0.0031 -0.0002 0.0020 -0.0032 0.0019 0.0003 RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN TOTPOS TOTPRE 0.2356 0.1639 0.2459 3.2391 3.1889 31.8988 25.5541 -0.0004 -0.0021 -0.0001 -0.0179 0.0059 0.0248 -0.0839 0.2360 0.1660 0.2460 3.2570 3.1830 31.8740 25.6380 Population Estimated Bias Latent Variable SLFENCRG FMLYSUPT FMLYSUPT MTHIMPT MTHIMPT MTHIMPT SES SES SES SES Latent Variable EDUISPR EDUISPR SLFENCRG EDUISPR SLFENCRG FMLYSUPT EDUISPR SLFENCRG FMLYSUPT MTHIMPT Outcome Variable Predictor Regression Coefficient EDUISPR SLFENCRG FMLYSUPT MTHIMPT SES XAGE EDUEPCT YFAMILY YMHWKT TOTPRE 0.8700 1.9730 -0.0380 -0.8850 1.5540 -0.0570 1.2770 -1.4390 -0.0320 0.7200 TOTPRE TOTPRE TOTPOS 76 Correlation Coefficient 0.0800 0.1460 0.1050 0.1880 0.0910 0.2220 0.0360 0.0120 0.2010 0.0580 Population 0.0795 0.1463 0.1054 0.1881 0.0907 0.2227 0.0362 0.0127 0.2033 0.0580 Estimated 0.6432 2.0168 -0.0160 -0.7823 1.5335 -0.0573 1.2781 -1.4015 -0.0320 0.7212 -0.0005 0.0003 0.0004 0.0001 -0.0003 0.0007 0.0002 0.0007 0.0023 0.0000 Bias -0.2268 0.0438 0.0220 0.1027 -0.0205 -0.0003 0.0011 0.0375 0.0000 0.0012 Table 4.10: Continued. Between Level Intercept TOTPRE TOTPOS Population Regression Coefficient OLDARITH 0.6500 OLDGEOM 0.7860 CLASSSIZE -0.2010 MTHONLY 4.5080 TOTPRE 1.2050 NEWALG -0.2690 NEWGEOM 0.3710 TPPWEEK 0.0750 Estimated 0.6106 0.7843 -0.2067 4.5822 1.2096 -0.2832 0.3741 0.0776 Bias -0.0394 -0.0017 -0.0057 0.0742 0.0046 -0.0142 0.0031 0.0026 77 Intercept TOTPOS TOTPRE Intercept TOTPOS TOTPRE Population Estimated Means 1.2640 1.3101 16.6120 16.7208 Residual Variance 3.9690 3.9681 11.1980 11.2468 Bias 0.0461 0.1088 -0.0009 0.0488 Longitudinal Data Generation Routines Longitudinal data sets of the focal Cohort 2 across Time 0 and Time 1 are generated through the following steps: 1. Level-2 independent covariates in the regression equation of the pre-test class-level means (denoted as vectorβ0 ) are generated through a multivariate normal distriC2T 0.β0 C2T 0.β0 bution, MN(µ2 , Σ2 ). These independent covariates include previous OTL of arithmetic (OLDARITH), previous OTL of algebra (OLDALG), class-size (CLASSSIZE), and qualified mathematics teacher rate (MTHONLY). Their mean C2T 0.β0 vector µ2 =[0.710, 0.319, 26.600, 0.139] and their variance-covariance matrix C2T 0.β0 Σ2 is in Table 4.7. The level-2 residuals uβ are generated through an univari0 2 ). The “true” variance σ 2 ate normal distribution, N (0, σu uβ is 11.198. β0 0 2. Based on the regression equation, pre-test class-level means β0 are generated by plugging in the regression coefficients γ0 , . . ., γ4 (Table 4.4), and the independent variables and residuals generated in the first step. 3. Level-2 independent covariates in the regression equation of the post-test class-level means (denoted as vector α0 ) are generated through a multivariate normal distribution, C2T 0.α0 MN(µC2T 0.α0 , Σ2 ). These independent covariates include current year OTL of algebra (NEWALG), current year OTL of geometry (NEWGEOM), and weekly hours of math instructions (TPPWEEK). Their means and variance-covariances are in Table 4.7. The level-2 residuals uα0 are generated through an univariate normal 2 2 distribution, N (0, σuα ). The variance σuα is 3.97. 0 0 4. Based on the regression equation, post-test class-level means α0 are generated by plugging in the regression coefficients γ5 , . . ., γ7 (Table 4.4), β0 generated in the first 78 two steps, and the independent variables and residuals generated in the third step. 5. Level-1 latent variables are generated through a multivariate normal distribution, C2T 0.ξ C2T 0.ξ MN(µ1 , Σ1 ). These latent variables (denoted as the vector ξ) include educational inspiration (EDUINSP), self encouragement (SLFENCRG), family support (FMLSUPRT), importance of learning mathematics (MTHIMPT), and socioecoC2T 0.ξ nomic status (SES). Their mean vector µ1 =[0,0,0,0,0]. Their variance-covariance C2T 0.ξ matrix Σ2 is in Table 4.6. The level-1 residuals epre are generated through 2 2 N (0, σepre ), with σepre =31.87. 6. Based on Ypre ’s regression equation, level-1 dependent variable YP re data are generated by plugging in β0 generated in the first two steps, the regression coefficients β1 , . . ., β9 (listed in Table 4.4), level-1 latent variables and residuals generated in the fifth step. 2 2 7. Level-1 residuals epost are generated through N (0,σe ), with σepre = 25.64. Based post on Ypost ’s regression equation, level-1 dependent variable Ypost data are generated by plugging in the level-1 YP re data generated in the sixth step, α0 generated in the fourth step, the regression coefficients α1 (listed in Table 4.4), and level-1 residuals epost . 8. The surrogate variables of each of the level-1 latent variables are generated through a multivariate normal distribution. For example, level-1 latent variable SES is associated with 4 surrogate variables through the measurement model C2T 0 X1.SES = µC2T 0 X1.SES + λX1.SES ηSES + eX1.SES , 79 (4.6) with eX1.SES ∼ N (0, ΘX1.SES ) and ηSES ∼ N (0, ΦSES ). The surrogate variC2T 0 ables are denoted as X1.SES , which include Father’s/Mother’s education level (YFEDUC /YMEDUC), and Father’s/Mother’s occupation national code (YFOCCN/YMOCCN). C2T 0 X1.SES ∼ M N (µC2T 0 , ΣC2T 0 ). Their mean vector µC2T 0 =[ 3.375, 3.349, X1.SES X1.SES X1.SES 4.277, 4.128]. ΣC2T 0 X1.SES is computed by λX1.SES ΦSES λ X1.SES +ΘX1.SES . λX1.SES is the factor loading vector. ΦSES is the variance of latent variable SES. ΘX1.SES is the diagonal matrix of residual variances. The parameter values of λX1.SES ΦSES and ΦSES are in Table 4.4. The computed ΣC2T 0 X1.SES is a diagonal matrix, whose diagonal entries are 0.475, 0.405, 4.421, and 3.916. The eight steps above together generate longitudinal data of focal Cohort 2 across Time 0 and Time 1. 4.2 Generate Synthetic Cohort Design Data with Simulated Selection Bias The goal is to simulate synthetic cohort design by generating Cohort 1’s Time 1 data that are non-comparable with Cohort 2 at Time 0 based on the conceptual two-level SEM. That is, C1T C2T Ypre 1 = Ypre 0 , indicating the baseline scores are different due to the simulated selection bias. ˆ Thus schooling effect estimate using SCD, δC2T 1−C1T 1 becomes biased compared with δC2T 1−C2T 0 . In order to reduce bias, matching is used to ‘assure’ a conditional equivalence, C1T C2T [Ypre 1 = Ypre 0 |f (X)], f (X) can be a function estimating the propensity score (RosenC1T C2T baum and Rubin, 1983). Given the conditional equivalence, using Ypre 1 to replace Ypre 0 80 as a baseline score to estimate schooling effect is applicable and accurate. The two-level SEM is complex because it involves a large number of parameters. For example, the Time-0-SEM in the the equation (3.19) includes (level-1’s and level-2’s) regression intercepts and coefficients, factor loadings, residual variances, and latent variable distribution parameters. For the purpose of simplicity, drop the subscript 0 of the parameters in the Time-0-SEM and add superscripts C1T 1 and C2T 0 to identify all possible situations where selection bias C1T C2T can cause Ypre 0 = Ypre 1 . Table 4.11 summarizes those situations, each of which may occur at level-1 and/or level2. There are too many possible situations, each of which can “break” the HEoG assumption and bias SCD’s schooling effect estimate. In order to make the simulation manageable, constraints and appropriate assumptions are needed to limit the situations. For example, factorial invariance (Cheung and Rensvold, 2002) and regression homogeneity (Wooldridge, 2002) rule out the situations that cause selection bias due to factor loadings and regression coefficients. Section 4.2.1 to Section 4.2.5 manipulate parameters such as means and variances to simulate selection bias due to the data hierarchical structure. Section 4.2.6 manipulates parameters such as latent variable mean and surrogate variables’ residual variances to simulate selection bias due to measurement errors. Section 4.2.7 manipulates level-1 and/or level -2 residual variances to simulate selection bias due to omitted variables. This section also examines if the association strength (indicated by R2 ) between the outcome Y and covariates (used in matching) affects the bias reduction rate of matching. 81 Table 4.11: Possible Simulation Manipulations on Comparability of C2T0 and C1T1 in SEM Framework Outcome Regression InVariable tercept Structural Model η C2T 0 η C1T 1 δ C2T 0 = δ C1T 1 E(eC2T 0 ) = E(eC1T 1 ) 2 2 E(ξ C2T 0 ) = E(ξ C1T 1 ) V(ξ C2T 0 ) = V(ξ C1T 1 ) E(eC2T 0 ) = E(eC1T 1 ) 1 1 E(η C2T 0 ) = E(η C1T 1 ) C1T V(eC2T 0 ) = V(e1 1 ) 1 Y v C2T 0 = v C1T 1 Distribution Parameter C1T V(eC2T 0 ) = V(e2 1 ) 2 Measurement X Model Residual V(η C2T 0 ) = V(η C1T 1 ) AC2T 0 = AC1T 1 U C2T 0 = U C1T 1 82 NA Coefficient g C2T 0 = g C1T 1 λC2T 0 = λC1T 1 B C2T 0 = B C1T 1 4.2.1 Generate Hierarchically Structured C1T 1 Data with Selection Bias These situations include the following practical issues: 1. Non-comparability occurs only where level-1 covariates and level-2 covariates are identical. It occurs when the two adjacent seventh grade classes being located in the same school and taught by the same teacher. That is, matching can be only on level-1 2. Level-2 covariates are not comparable and level-1 covariates are identical or level-1 comparability is not a concern. For instance, in the cluster randomized design, clusters including classes or schools are the sampling and intervention units; aggregated cluster means are the analysis units. Matching clusters to create level-2 comparability; 3. Level-1 and Level-2 covariates cause non-comparability. This is a concern when clusters are sampled from the population of interest, and intervention happens on individuals. Both level-1 and level-2 matching, i.e, the dual matching, is necessary. 4.2.1.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s There are four level-1 covariates: age (XAGE), education expectation (EDUCEPT), homework time (YMHWKT ), and frequency of family help on homework (YFAMILY). Their mean vector µC2T 0 = [0.000, 2.968, 1.745, 2.984] is manipulated by adding a constant vec1 tor c1 = (−1, 1, −1, −1). The manipulated mean vector is denoted as µC1T 1 = [-1.000, 1 3.968, 0.745, 1.984] and used to generate data of Cohort 1 at Time 1. Varying the value of C1T 1 and X C2T 0 . A smaller c will c1 will vary the overlap between the distribution of X1 1 1 C1T 1 and X C2T 0 ; and it is more likely to achieve create a bigger overlapping between X1 1 83 successfully matched units given a specific sample size. The simulated bias on pre-test score is 2.8052 . Thus, the manipulated population pretest mean of C1T 1 is increased from 13.711 to 16.576. Therefore, using SCD underestimates ˆ ˆ the learning effect by 2.805. That is BIAS(δC2T 1−C1T 1 )=2.805. After matching the ˆ ˆ level-1 covariates, BIAS(δC2T 1−C1T 1 ) will be shrunk. Thus a bias reduction rate can be computed to evaluate the performance of matching (See detail in Section 4.3). The logic of using matching is the same for the other six simulation situations. All simulation results are in Chapter 5. 4.2.1.2 C1T 1’s Level-1 Covariate Variances Differ from C2T 0’s The variances of the four level-1 covariates are manipulated by adding an extra 15% to each C2T 0 = [36.056, 0.590, 0.354, 47.260] of the original variances. The original variance vector σ1 is manipulated by multiplying a constant vector p1 = (1.15, 1.15,1.15,1.15). The manipuC1T 1 =[ 41.4644, 0.6785, 0.4071, 54.349]. Varying the lated variance vector is denoted as σ1 C1T 1 and X C2T 0 . A larger p will multiplier vector p1 will vary the overlap between X1 1 1 C1T 1 . This will decrease the chance of achieving successfully matched units given increase σ1 a specific sample size of Cohort 1 at Time 1. 4.2.1.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s The level-2 covariates include previous opportunities to learn arithmetic (OLDARITH) and algebra (OLDALG), class-size (CLASSSIZE), and qualified mathematics teacher rate (MTHONLY). The mean vector, µC2T 0 =[0.710, 0.319, 26.600, 0.139], is manipulated by 2 2 The four regression coefficients are, -0.057, 1.277, -1.439, and -0.032. It results in the simulated bias: (-0.057)*(-1)+ 1*1.277+(-1)*(-1.439)+(-1)*(-0.032)=2.805. 84 multiplying another constant vector p2 = (1.5, 1.5, -0.5,1.5). After manipulation, the mean vector µC1T 1 = [1.065, 0.4785, 13.3, 0.2085] is used to generate the data of Cohort 1 at Time 2 1. Note that the average class-size3 in Cohort 1 at Time 1 is half time as large as in Cohort 2 at Time 0. The regression coefficients of the four level-2 covariates are 0.65, 0.79, -0.2, and 4.51, respectively. This leads to a total bias of 3.33. 4.2.1.4 C1T 1’s Level-2 Covariate Variances Differ from C2T 0’s The variances of the four level-2 covariates are manipulated by adding a 15% to each of C2T 0 =[1.032, 0.385, 29.005, 0.018] is the original variances. The original variance vector, σ2 manipulated by multiplying a constant vector p2 = (1.15, 1.15,1.15,1.15). The manipulated C1T 1 =[ 1.187, 0.443, 33.356, 0.021]. Varying the multiplier variance vector is denoted as σ2 C1T 1 and X C2T 0 . A larger p will increase vector p1 will vary the overlap between X1 1 1 C1T 1 . This will decrease the chance of achieving successfully matched units given a specific σ1 level-2 sample size of Cohort 1 at Time 1. 4.2.1.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s In this situation, both level-1 and level-2 covariate means are manipulated by following procedures in Section 4.2.1.1 and Section 4.2.1.3. This way, the initial difference is inflated to 6.135, which is the sum of two initial difference values, 3.330 in Section 4.2.1.1 and 2.805 in Section 4.2.1.3 3 The class sizes shrink by half in Cohort 1 at Time 1. This requires to use Mplus command of CSIZES = 300 (5) 3500 (10) 8000 (15) 800(20) in Appendix A2, compared with the code of CSIZES = 300 (10) 3500 (20) 8000 (30) 800(40) in generating data of Cohort 2 at Time 0. 85 4.2.2 Generate Data for Matching on Latent Variables v.s. Matching on Surrogate Variables Data generation in this section involves manipulating random measurement errors and reliˆ ability values. Among the five latent constructs, SES (β9 = 1.55, SE = 0.30, p<.001) and ˆ Self Encouragement (β6 = 1.97, SE = 0.56, p<.001) are statistically significantly predictive of pre-test score. Because of its practical importance in education studies, only latent variable SES and its four surrogate variables are focused on in simulation manipulation. The four surrogate variables include: Father’s/Mother’s education level (YFEDUC/YMEDUC), and Father’s/Mother’s occupation national code (YFOCCN/YMOCCN). All manipulations are summarized in Table 4.12. 4.2.2.1 C1T 1’s Surrogate Variable Means Differ from C2T 0’s, with the Same Latent Means and Low Reliability C2T 0 The variance vector of the four surrogate variables is σX .SES =[0.475, 0.405, 4.421, 3.916] 1 respectively. The half standard deviation vector is c3 =[0.345, 0.318, 1.051, 0.989]. Their mean vector µC2T 0 =[3.375, 3.349, 4.277, 4.128] is manipulated by adding a half stanX1.SES dard deviation to each. The manipulated mean vector is denoted as µC1T 1 X1.SES =[ 3.720, 3.667, 5.328, 5.117] and used to generate data of Cohort 1 at Time 1 through a multivariate C1T 1 normal distribution X1.SES ∼ M N (µC1T 1 , ΣC1T 1 ). ΣC1T 1 X1.SES X1.SES X1.SES is computed by λX1.SES ΦSES λ X1.SES +ΘX1.SES . The factor loading vector λX1.SES =[1.000, 0.718, 1.941, 1.540] and the latent variable SES variance ΦSES =0.31. The values of ΘX1.SES are in Table 4.4. The re86 Table 4.12: Simulation Design of Matching on Latent and Surrogate Variables PreTest PostTest ICC Reliability Latent Variable Mean Observed Variable Mean Cohort 1 Time 1 (7th grade in Year i+1) ICC Reliability Latent Variable Mean Observed Variable Mean Cohort 2 Time 0 (7th grade in Year i) PreTest PostTest µ 0 Low 0.318 0.337 µ+c3 0 Low 0.311 0.331 µ 0 Low 0.318 0.337 µ 0 High 0.311 0.331 µ 0 Low 0.318 0.337 µ+c4 0.68 High 0.311 0.331 µ 0 High 0.318 0.337 µ+c4 0.68 High 0.311 0.331 liability coefficient is computed as the ratio of SUM(λX1.SES ΦSES λ X1.SES )4 and SUM(ΣC1T 1 ) (Lord and Novick, 1968; Raykov, 1997). The pseudo-population level X1.SES reliability coefficient, for both Cohort 2 at Time 0 and Cohort 1 at Time 1, is equal to 0.25, which is low. A two-level SEM is fit to the generated pseudo-population data to derive the factor score of latent variable SES. When fitting the two-level SEM, the mean of the latent variable SES is set to 0. This setting is the same for pseudo-population data of Cohort 1 at Time 1 and for pseudo-population data of Cohort 2 at Time 0. 4 Function SUM(.) add up all the components of a matrix. 87 4.2.2.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with the Same Surrogate Means and the Same Latent Means In this simulation, the residual variances of the four surrogate variables are reduced by 90%. Thus, the reliability will be increased to 0.78. The original residual variance matrix     0.166 0 0 0       0 0.246 0 0  ΘX1.SES =      0 0 3.257 0     0 0 0 3.183                   (4.7) is manipulated by reducing the diagonal entries by 90%. The manipulated residual variance matrix becomes     0.017 0 0 0       0 0.025 0 0  ∗ ΘX1.SES =      0 0 0.326 0     0 0 0 0.318          .         (4.8) C1T 1 The surrogate variables are generated through a multivariate normal distribution X1.SES ∼ C2T 0 M N (µC1T 1 , ΣC1T 1 ). The mean vector µC1T 1 X1.SES X1.SES X1.SES is µX1.SES =[ 3.375, 3.349, 4.277, ∗ 4.128]. ΣC1T 1 X1.SES is computed by λX1.SES ΦSES λ X1.SES +ΘX1.SES . The factor loading vector λX1.SES and the latent variable SES variance ΦSES are the same as those 88 in Section 4.2.2.1. The latent variable SES mean is set as 0 for both Cohort 1 at Time 1 and Cohort 2 at Time 0, when Mplus fits the two-level SEM to estimate factor scores. 4.2.2.3 C1T 1’s Surrogate Variables Have Higher Reliability, Different Latent Variable Mean from C2T 0’s In this simulation, the level-1 surrogate variables of Cohort 1 at Time 1 have a higher reliability. This manipulation is the same as what has been discussed in Section 4.2.2.2. The latent variable mean in Cohort 2 at Time is 0; the latent variable mean in Cohort 1 at Time 1 is 0.68, which is a half of the standard deviation of the latent variable. When fitting the two-level SEM to estimate factor scores, the mean of the latent variable SES is set to as 0 by default in Cohort 2 at Time 0. The latent mean is set to 0.68 in Cohort 1 at Time 1. Because of the latent mean difference, the surrogate variable means of two cohorts are different by a constant vector c4 . c4 = 0.68 ∗ λX1.SES , with λX1.SES displayed in Section 4.2.2.1. 4.2.2.4 C1T 1’s Latent Variable Mean Differ from C2T 0’s, with Same Higher Reliability Both Cohort 1 at Time 1 and Cohort 2 at Time 0 have a higher reliability of .78. This indicates a strong relationship between the four surrogate variables and the latent variable SES. The latent mean of SES in Cohort 1 at Time 1 is manipulated in the same way as that discussed in Section 4.2.2.3. This section adapts the latent mean settings of Mplus discussed in Section 4.2.2.3 to estimate the factor scores for both cohorts. 89 4.2.3 Manipulate R2 to Generate Data for Matching 2 2 In this simulation, level-1 variance σepre and level-2 residual variance σuα are reduced by 0 half, respectively. A value of ICC due to a manipulation is computed. Table 5.2 (Colum 2 and 3) lists the ICC’s. 4.2.3.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Vari2 ance σepre Reduced by Half In this simulation, level-1 covariates means in Cohort 1 at Time 1 are manipulated by 2 following the same procedure of Section 4.2.1.1. The residual variance σepre in both cohorts is set as 12.819, which is reduced by 50% of the value (25.638) in Section 4.2.1.1. The initial difference on Pre-test score is 2.805. 4.2.3.2 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Vari2 ance σepre Reduced by Half, and Initial Difference Reduced 2 In this simulation, the residual variance σepre in both cohorts is set as 12.819, which is a 50% reduction. In Cohort 2 at Time 0, the standard deviation vector of the four level-1 C2T 0 =[6.005, 0.768, 0.595, 6.875] and a half of standard deviation is [3.002, covariates, is σ1 0.384, 0.297, 3.437]. The mean vector of the four level-1 covariates µC2T 0 =[0.000, 2.968, 1 1.745, 2.984], is manipulated by deducting/adding a half of standard deviation to each. The deducting/adding operation is determined by the negative/positive sign of the regression coefficient. The manipulated mean vector µC1T 1 =[ -3.002, 3.352, 1.448, -0.453] is 1 used to generate data of Cohort 1 at Time 1. The regression coefficients of the four covariates are, -0.057, 1.277, -1.439, and -0.032, thus, the bias on Pre-test score is 1.1995 , which 5 (-3.002)*(-0.057)+0.384*1.277+(-0.297)*(-1.439)+(-3.437)*(-0.032)=1.199 90 is about 43% of the bias in Section 4.2.3.2. 4.2.3.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with Level-2 Vari2 ance σuα Reduced by Half 0 In this simulation, level-2 covariate means are manipulated by following the same procedures 2 as in Section 4.2.1.3. The residual variance σuα in both cohorts is set as 5.599, which is 0 reduced by 50% of the value (11.198) in Section 4.2.1.3. The initial difference on Pre-test score due to level-2 covariate means difference is 3.330, which is the same as that in Section 4.2.1.3. 4.2.3.4 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with Level-2 Vari2 ance σuα Reduced by Half and Initial Difference Reduced by Half 0 2 In this simulation, the residual variance σuα in both cohorts is set as 5.599, which is 0 reduced by 50% of the value (11.198) of 4.2.1.3. In Cohort 2 at Time 0, the standard C2T 0 =[1.016, 0.620, 5.386, 0.134] and a deviation vector of the four level-2 covariates, is σ2 half of the standard deviation is [0.508, 0.310, 2.693, 0.067]. The mean vector of the four level-2 covariates µC2T 0 =[ 0.710, 0.319, 26.600, 0.139] is manipulated by deducting/adding 2 a half of the standard deviation to each. The deduction or addition operation is determined by the negative/positive sign of the regression coefficient. The manipulated mean vector µC1T 1 =[1.218, 0.629, 23.90, 0.206] is 2 used to generate data of Cohort 1 at Time 1. The regression coefficients of the four covariates are, 0.65, 0.79, -0.2, and 4.51, thus, the bias on level-2 pre-test intercept is 1.4166 , which is 6 0.65*0.508+0.79*0.310+(-0.2)*(-2.693)+ 4.51* 0.067=1.416. 91 about 43% of the initial difference of the bias in 4.2.3.3. 4.2.3.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s with Both Level-1 and Level-2 Variances Reduced by Half This simulation combines section 4.2.3.1 and 4.2.3.3 to generate data for Cohort 1 at Time 1. In this way, the initial difference is inflated to 6.135. which is the same as that in 4.2.1.5. This simulation is different from that in 4.2.1.5 is that the former has higher level-1 R2 and level-2 R2 . 4.2.3.6 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s, with Both Level-1 and Level-2 Variances Reduced by Half and Total Initial Difference Reduced This simulation combines the manipulations of 4.2.3.2 and 4.2.3.4 to generate data for Cohort 1 at Time 1. In this way, the initial difference is 2.615, which is about 43% of the initial difference (6.135) in 4.2.3.5. Both this simulation and simulation 4.2.3.5 have the same lower level-1 R2 and lower level-2 R2 . Contrasting simulations of Section 4.2.3 with those of Section 4.2.1 allows examinations of the following: 1) if increasing R2 improves the bias reduction rate; and 2) if deceasing initial selection bias after increasing R2 further improves bias reduction rate. 4.3 Simulation Evaluation Generating C1T 1, C2T 0, and C2T 1 data sets allows the computation of quasi-longitudinal growth and real longitudinal growth to examine estimation bias reduction rate. The effec92 tiveness of matching is examined using bias reduction rate (Cochran and Rubin, 1973). It is computed as 100 1 − Schooling Effect Estimation Bias in SCD After Matching Schooling Effect Estimation Bias in SCD Without Matching %. (4.9) The detailed computation is described in the following three subsections. Because the computation involves a sample index, the notation of schooling effect estimates used in this section are different from those in previous sections. However, the schooling effect estimate based on longitudinal C2T 0 − C2T 1 data are conceptually invariant in the dissertation study. Also, the schooling effect estimate based on SCD’s C1T 1 − C2T 1 data are also conceptually invariant across chapters in this dissertation study. 4.3.1 Compute Initial Difference The longitudinal estimate of schooling effect δ P osP re from sample i (size is ni ) based upon C2T 0 − C2T 1 data will be denoted as ˆP ¯ C2T ¯ C2T δi osP re = YP ost 1 − YP re 0 , i i (4.10) with i = 1, 2, . . . , 200. The SCD used C1T 1 − C2T 1 data to estimate schooling effect δ SCD , which is computed as ˆSCD = Y C2T 1 − Y C1T 1 , ¯ ¯ δi P osti P rei with i = 1, 2, . . ., 200. 93 (4.11) δ The initial difference thus is defined as (BIASinitial ) and it is computed as δ ˆP ˆSCD ), BIASinitial = E(δi osP re − δi (4.12) with i = 1, 2, . . ., 200. It is the mean of 200 biases. Each bias is the difference between a schooling effect estimate ˆP ˆSCD . δi osP re and a SCD-based estimate δi 4.3.2 Compute After Matching Bias After matching, SCD estimate of schooling effect will be M δ SCD = [Y C2T 1 − Y C1T 1 |f (X C2T 0 ) = f (X C1T 1 )], ˆ ¯ ¯ i i i P osti P rei (4.13) with j = 1, 2, . . ., 200 and f (.) representing some function. For example, f (.) can be the propensity score function. δ After matching, bias is defined as BIASmatchng and it is computed as δ ˆP ˆSCD , BIASmatching = E(δi osP re − M δi (4.14) with i = 1, 2, . . ., 200. It is the mean of 200 biases. Each bias is the difference between a ˆP ˆSCD . schooling effect estimate δi osP re and a matched synthetic cohort design estimate M δi 94 4.3.3 Compute Bias Reduction Rate After-matching bias and initial difference together define the bias reduction rate (BRR), which is computed as δ BRRmatching = 100 ∗ (1 − δ BIASmatching )%. BIAS δ initial (4.15) For example, if initial difference is 6 before matching and 2 after matching, then the bias reduction rate due to matching is 100(1-2/6)%=67%. That is, two thirds of the initial difference has been accounted for by matching. Larger value of bias reduction rate indicates a better performance of matching in assuring HEoG of Synthetic Cohort Design. 95 Chapter 5 Matching Simulation Results and Discussions This chapter reports the detailed matching procedures and results of analyses of the simulated situations discussed in Chapter 4. 5.1 Three Types of Matching Routines Matching is conducted using the R (R Development Core Team, 2007) module MatchIt (Ho et al., 2009) and Match (Sekhon, 2007) . The R code is attached in Appendix A.3, A.4, and A.5. 5.1.1 Level-1 Matching By ignoring the hierarchical structure, individuals are matched. The detailed matching procedures are as following: 96 1. Randomly draw 100 classes from the pseudo-population of Cohort 2 at Time 0. Let nj 100 be the class-size of j th class. The sample size is nC2T 0 = nj . Randomly draw 100 j=1 classes from the pseudo-population of Cohort 1 at Time 1. Let n be the class-size of j 100 j th class. The sample size is nC1T 1 = nj . Let (Y, X, W )C2T 0 be the ith data i j =1 record of the Cohort 2 at Time 0 sample, with i =1,2,. . . , nC2T 0 . Vector Y represents the level-1 pre-test score variable. X is the level-1 variable vector including both the latent and the observed. W is the level-2 variable vector. Let (Y, X, W )C1T 1 be the i i th data record of the Cohort 1 at Time 1 sample, with i =1,2,. . . , nC1T 1 . 2. Pool the two random samples together to estimate the level-1 propensity scores. The propensity score (p1 ) represents the probability that a student belongs to focal Cohort p1 , is computed 2, whose cohort ID is coded as 1. The logarithm of the odds , log 1−p1 for each student and used for matching (Stuart & Rubin, 2008). Because the simulated bias occurs at only level-1 , the level-2 covariates W are not used to compute propensity scores. 3. Among the total of nC2T 0 cases in the sample drawn from Cohort 2 at Time 0, for the ith data record (Y, X, W )C2T 0 , find ONE data record (Y, X, W )C1T 1 from Cohort i i 1 at Time 1, so that Min[(X)C2T 0 , (X)C1T 1 ] reach a pre-set value, which is such a i i small number called caliper in matching literature (e.g., Stuart and Rubin, 2008). The smaller the caliper, the more comparable the two data points will be. Min [a, b] is a function that computes the minimum distance between quantity a and quantity b in terms of log-odds or Mahalanobis distance. The matched data are used to compute δ the ith bias reduction rate (BRR i ) using the formula in Section 4.3. matching 97 4. Repeat 1-3 for 200 times, which are the replications. This results in 200 bias reduction rates. 200 5. Compute and report the average bias reduction rate as i=1 δ BRR i matching 200 . Level-1 matching R code is displayed in Appendix A.3. 5.1.2 Level-2 Matching In level-2 matching, classes are matched using level-2 propensity scores. The analysis units are the means of the matched classes. 1. Randomly draw 100 classes from the pseudo-population of Cohort 2 at Time 0. Randomly draw 100 classes from the pseudo-population of Cohort 1 at Time 1. Let ¯ ¯ Y , X, W C2T 0 be the k th data record of the sample drawn from Cohort 2 at Time k ¯ ¯ 0, with k =1,2,. . . , 100. Let Y , X, W C1T 1 be the k th data record of the sample k ¯ drawn from Cohort 1 at Time 1, with k = 1 , 2 , . . .,100. Vector Y represents the class ¯ mean of pre-test score. VectorX represents the means of the level-1 variables including both latent and observed. Vector W represents level-2 variables. 2. Pool the two random samples together to estimate the level-2 propensity scores. The level-2 propensity score (p2 ) represents the probability that a class belongs to Cohort p2 , is computed 2, whose cohort ID is coded as 1. The logarithm of the odds, log 1−p2 for each class and used to for matching. Because of the hierarchical structure, level-2 covariates W play a critical role in computing propensity scores. ¯ ¯ ¯ ¯ 3. For the ith data record Y , X, W C2T 0 , find ONE data record Y , X, W C1T 1 from k k Cohort 1 at Time 1, so that Min [ (W )C2T 0 , (W )C1T 1 ] is less than a caliper (Stuart i i 98 & Rubin, 2008). The smaller the caliper, the more comparable the two data points will be. The matched classes are used to compute the ith bias reduction rate. 4. Replicate 1-3 for 200 times, which results in 200 bias reduction rates. 5. Compute and report the average of the 200 bias reduction rates. Level-2 matching R code is displayed in Appendix A.4. 5.1.3 Dual Matching Dual matching involves two parts. First, level-2 units such as classes are matched. Second, within each pair of matched treatment-control clusters, individual units are matched. The detailed matching procedures are: 1. Conduct level-2 matching following the first 3 steps of Section 5.1.2. The matched classes are used to compute the ith class-level bias reduction rate. 2. Within each pair of the matched classes from level-2 matching, conduct level-1 matching following Step 3 of Section 5.1.1. The matched units are used to compute the ith dual-matching bias reduction rate. 3. Replicate 1-2 for 200 times, which results in 200 class-level bias reduction rates and 200 dual-matching bias reduction rates. 4. Compute and report the average class-level bias reduction rate and the average dualmatching bias reduction rate. Dual matching R code is displayed in Appendix A.5. 99 5.2 Simulation Results of Matching on Level-1 and/or Level-2 Covariates Table 5.1 summarizes the simulation results of individual matching, cluster matching and dual-matching. The results are from simulations in Section 4.2.1 5.2.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s When the two cohorts’ hierarchically structured data are different on only level-1 covariates, matching on propensity scores estimated from level-1 covariates reduces estimation bias by 78 percent using a caliper of 0.01 standard deviations. If a larger caliper (i.e., 0.2 standard deviations) is used, propensity score matching reduces estimation by 72 percent. Mahalanobis distance matching only reduces estimation bias by 16 percent when a caliper of 0.2 standard deviations is used. The bias reduction rate is only 24 percent when a caliper of 0.01 standard deviations is used. 5.2.2 C1T 1’s Level-1 Covariate Variances Differ from C2T 0’s When the two cohorts are hierarchically different in terms of only level-1 covariate variances, matching on propensity scores estimated from level-1 covariates reduces estimation bias by only 1 percent when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01 standard deviation is used, propensity score matching reduces estimation bias by about 2 percent. Mahalanobis distance matching even increases estimation bias by 9 percent when a caliper of 0.01 standard deviations is used. The estimation bias is increased by about 8 percent when 100 a caliper of 0.2 standard deviations is used. 5.2.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s When the two cohorts are hierarchically different on only the level-2 covariate, matching on level-2 propensity scores estimated from level-2 covariates reduces estimation bias by 63.55 percent when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01 standard deviations is used, propensity score matching reduces estimation bias by 68.81 percent. Mahalanobis distance matching does not reduce estimation bias when a caliper of 0.2 standard deviations is used. The bias reduction rate is 5.26 percent when a caliper of 0.01 standard deviations is used. 5.2.4 C1T 1’s Level-2 Covariate Variances Differ from C2T 0’s When the two cohorts’ hierarchically structured data are different in terms of level-2 covariate variances, matching does not help reduce estimation bias at all. Matching on level-2 propensity scores increases 8.2 percent of estimation bias when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01 standard deviation is used, matching on level-2 propensity scores even increases estimation bias by 167.8 percent. Mahalanobis distance matching increases estimation bias by 21.18 percent when a caliper of 0.01 standard deviations is used. Estimation bias is reduced by about 5.34 percent when a caliper of 0.2 standard deviations is used. 101 Table 5.1: Bias Reduction Rates of the Three Types of Cohort 2 Cohort 1 Propensity Time 0 Time 1 score Matching (7th grade (7th grade in Yeari ) in Yeari+1 ) ICC ICC Larger Smaller Caliper Caliper Level-1 PrePost- PrePostMatching Test Test Test Test Noncomparable Level-1 0.32 0.34 0.31 0.33 72.03 78.44 Covariate Means Noncomparable Level-1 0.32 0.34 0.31 0.33 1.79 1.35 Covariate Variances Level-2 Matching Noncomparable Level-2 0.32 0.34 0.31 0.33 63.55 68.81 Covariate Means Noncomparable Level-2 0.32 0.34 0.32 0.34 -8.20 -167.80 Covariate Variances Dual Matching Noncomparable 37.01 NA Level-2 Covariate Means 0.32 0.34 0.32 0.34 Noncomparable 76.66 NA Level-1 Covariate Means 102 Matching Mahalanobis Distance Matching Larger Caliper Smaller Caliper 16.56 24.03 -7.51 -9.48 0.00 5.26 5.34 -21.18 NA NA NA NA 5.2.5 Dual Matching Simulation Results When the two cohorts’ hierarchically structured data have different covariate means at both level-1 and level-2, dual matching significantly reduces estimation bias. Matching on level-2 propensity scores reduce estimation bias by 37.01 percent when a caliper of 0.2 standard deviations is used. After cluster matching, matching level-1 propensity scores further reduces estimation bias by 39.65 percent. Dual matching for synthetic cohort design reduces total estimation bias by 76.66 percent. 5.2.6 Discussion When the cohorts’ hierarchically structured data are different on only level-1 or only level-2 covariate variances, matching does not help to reduce bias in synthetic cohort design. This is because when covariate means, either at level-1 or level-2, are identical between Cohort 2 at Time 0 and Cohort 1 at Time 1, the initial difference is very small and using synthetic cohort design can estimate the schooling effect accurately. Matching helps little in reducing, or even can increase bias in this situation. When the cohorts’ hierarchically structured data are different on both level-1 covariates and level-2 covariates, dual matching is the optimal approach. Only cluster matching helps reduce bias. But initial difference (about 40%) due to level-1 covariate means between Cohort 2 at Time 0 and Cohort 1 at Time 1 still exists. Research has suggested that when the true propensity score model is known and sample size is large, propensity score will be a better approach (Sekhon and Diamond, 2008). Each simulated condition determines a “true” and known propensity score model. Each replication 103 sample size is about 2,700, from 100 classes. Each class on average has 27 students. Thus, Mahalanobis distance matching cannot achieve the optimal results. Future studies may use smaller level-1 and level-2 sample sizes to examine the performances of the three proposed matching approaches on hierarchically structured data. 5.3 Simulation Results of Matching on Level-1 Latent Variable and Surrogate Variables For each simulation, there are four matching methods including two propensity score matching, factor score matching, and Mahalanobis matching. Table 5.2 summarizes the results. When C1T 1’s surrogate variable means differ from C2T 0’s and both cohorts have equal latent means and a low reliability (ρ = 0.25), matching on propensity scores estimated from surrogates reduces estimation bias by 51.53 percent. None of the other three types of matching can reduce estimation bias by more than 3.5 percent. Latent variable propensity score matching even increases estimation bias by 3.02 percent. When the two cohorts have equal surrogate variable means, equal latent means and C1T 1 has a higher reliability (ρ = 0.78), propensity score matching through latent variable SES is optimal and it reduces estimation bias by 8.5 percent. All the other three types of matching reduces estimation bias by less than 1.5 percent . Latent variable Mahalanobis distance matching even increased the bias by 4 percent. When C1T 1’s surrogate variable means and latent variable SES mean differ from C2T 0’s and C1T 1 has a higher reliability (ρ = 0.78), latent variable Mahalanobis distance matching only reduces estimation bias by 4.69 percent. Any of the other three types of matching 104 reduces estimation bias by about 53.3 to 54.8 percent. When C1T 1’s surrogate variable means and latent variable SES mean differ from C2T 0’s and both cohorts have high reliability (ρ = 0.78), latent variable Mahalanobis distance matching only reduces estimation bias by 2.91 percent. Other three types of matching work equally well, and any of them can reduce estimation bias by about 55 percent. Section 5.3.1 to Section 5.3.4 list the detailed results of the four matching approaches for each simulation. Each matching is conducted within a caliper of 0.2 standard deviations. 5.3.1 C1T 1’s Surrogate Variable Means Differ from C2T 0’s, with the Same Latent Means and Low Reliability The surrogate variable means are different between Cohort 1 at Time 1 and Cohort 2 at Time 0. The pseudo-population mean of the latent variable SES is 0 in the two cohorts. The pseudo-population level reliability coefficient of the surrogate variables is as low as 0.25 in the two cohorts. Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity scores estimated from the four surrogate variables reduces schooling effect estimation bias in SCD (shortened as “estimation bias”) by 51.53 percent. Propensity Score Matching Based on Latent Variable SES Matching on propensity scores estimated from the latent variable SES factor scores increases estimation bias by 3.02 percent. Matching on Latent Variable SES Factor Score When the estimated factor score of latent variable SES is used as a “propensity score”-like measure in matching, it reduces estimation bias by 2.02 percent. 105 Table 5.2: Simulation Results of Matching on Level-1 Latent Variable and Surrogate Variables Cohort 2 Time 0 (7th grade in Year i) Surrogate Variables Propensity Score Matching Cohort 1 Time 1 (7th grade in Year i+1) Latent Variable Propensity Score Matching Matching on Latent Variable Itself Latent Variable Mahalanobis Matching Latent Variable Mean Observed Variable Reliability Mean Latent Variable Mean Reliability µ 0 Low µ + c3 0 Low 51.53 -3.02 2.02 0.45 µ 0 Low µ 0 High 0.07 8.5 1.03 -4.0 µ 0 Low µ + c4 0.68 High 54.75 53.81 53.31 4.69 µ 0 High µ + c4 0.68 High 55.12 54.96 54.86 2.91 Observed Variable Mean 106 Mahalanobis Distance Matching Based on Latent Variable SES Factor Score If the estimated Mahalanobis distance of the estimated factor score of the latent variable SES is used for matching, it reduces estimation bias by 0.45 percent. 5.3.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with the Same Surrogate Means and the Same Latent Means The surrogate variable means are the same in the two cohorts. The pseudo-population mean of the latent variable SES is 0 in the two cohorts. The pseudo-population level reliability coefficient of the four surrogate variables is 0.25 in Cohort 2 at Time 0, but it is equal to .78 in Cohort 1 at Time 1. Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity scores estimated from the four surrogate variables reduces estimation bias by 0.07 percent. Propensity Score Matching Based on Latent Variable SES Matching on propensity scores estimated from the latent variable SES factor scores reduces estimation bias by 8.5 percent. Matching on Latent Variable SES Factor Score Matching on factor scores of the latent variable SES reduces estimation bias by 1.03 percent . Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores even increases estimation bias by 4 percent. 107 5.3.3 C1T 1’s Surrogate Variables Have Higher Reliability, Different Latent Variable Mean from C2T 0’s The pseudo-population mean of the latent variable SES is 0.68 in Cohort 1 at Time 1; but it is 0 in Cohort 2 at Time 0. Therefore, the surrogate variable means of the two cohorts are different by 0.68 ∗ λX1.SES ; λX1.SES being the factor loading vector. The pseudo-population level reliability coefficient of the four surrogate variables is 0.25 in Cohort 2 at Time 0, but it is .78 in Cohort 1 at Time 1 Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity score estimated from the four surrogate covariates reduces estimation bias by 54.75 percent Propensity Score Matching Based on Latent Variable SES Matching on propensity scores estimated from of latent variable SES factor scores reduces estimation bias by 53.81 percent. Matching on Latent Variable SES Factor Score Matching based on factor scores of the latent variable SES reduces estimation bias by 53.31 percent. Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores only reduces estimation bias by 4.69 percent. 5.3.4 C1T 1’s Latent Variable Mean Differ from C2T 0’s, with the Same Higher Reliability The pseudo-population level reliability coefficient of the surrogate variables is 0.78 in the two cohorts. The pseudo-population mean of the latent variable SES is 0.68 in Cohort 1 at Time 1; but it is 0 in Cohort 2 at Time 0. 108 Therefore, the surrogate variable means of the two cohorts are different by 0.68∗λX1.SES ; λX1.SES being the factor loading vector. Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity score estimated from the four surrogate covariates reduces estimation bias by 55.12 percent. Propensity Score Matching Based on Latent Variable Matching on propensity scores estimated from of latent variable SES factor scores reduces estimation bias by 54.96 percent. Matching on Latent Variable SES Factor Matching based on factor scores of the latent variable SES reduces estimation bias by 54.86 percent. Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores only reduces estimation bias by 2.91 percent. 5.3.5 Discussion Different studies may use different quality of data in terms of measurement reliability, which presents different requirements for matching. This section demonstrates the potential of using surrogate variables or a latent variable in matching, according to the different matching requirements to reduce bias of schooling effect estimate. Mahalanobis distance matching does not reduce bias as effectively as either propensity score matching or factor score matching. Propensity score matching generally performs better than Mahalanobis distance matching when the true propensity score model is known and the sample size is large (Sekhon and Diamond, 2008). The simulation settings favor propensity score matching. Each simulated condition de109 termines a “true” and known propensity score model. Each of the 200 replications uses 100 classes and the sample size is as large as about 2,700. Latent variable matching is effective if factor scores represent the difference between the two cohorts. If the two cohorts are comparable on the latent variable means, matching through the latent variable is not helpful at all. If the two cohorts are different only on surrogate variables with larger measurement errors, latent variable matching reduces little bias. But propensity score matching through these surrogate variables is optimal. Reliability of surrogate variables doesn’t impact the effectiveness of matching. If the two cohorts are different on the latent variable, regardless of whether the level of reliability is low or high, matching on propensity scores estimated from surrogate variables works as well as matching on factor scores or propensity scores estimated using the latent variable. Measurement errors on surrogate variables do not attenuate bias reduction rate. This is different from what was found by (Cochran and Rubin, 1973). Future study and simulation are needed to further examine and explain the inconsistency. 5.4 Simulation Results of Matching When R2 Being Manipulated 2 2 In this simulation, level-1 variance σepre and level-2 residual variance σuα are reduced by 0 half. It examines: 1) if increasing R2 improves bias reduction rate; and 2) if increasing R2 improves bias reduction rate more when the simulated selection bias is smaller. Table 5.3 summarizes the results. 110 Table 5.3: Bias Reduction Rates of Three Type of Matching with Higher R2 Cohort 2 Time 0 Cohort 1 Time 1 Propensity (7th grade in Year (7th grade in Year score i) i+1) Matching Larger Smaller ICC ICC Caliper Caliper Level-1 Matching Pre-Test Post-Test Pre-Test Post-Test Non-comparable Level-1 0.372 0.447 0.365 71.77 78.34 Covariate Means, Higher 0.454 2 R Non-comparable Level-1 0.372 0.447 0.365 62.96 64.06 Covariate Means with 0.454 2 and Reduced Higher R Initial Difference Level-2 Matching Non-Comparable Level-2 0.250 0.207 0.244 70.22 71.15 Covariate Means with 0.213 2 Higher R Non-Comparable Level-2 0.250 0.207 0.244 52.26 66.84 Covariate Means with 0.213 2 and Reduced Higher R Initial Difference Dual Matching Non-Comparable Level-1 0.280 0.320 0.274 36.74 NA and Level-2 Covariate 0.326 2 Means with Higher R 77.13 NA Non-Comparable Level-1 0.280 0.320 0.274 36.39 NA and Level-2 Covariate 0.326 2 and Means with Higher R Reduced Initial Difference 78.19 NA 111 Mahalanobis Distance Matching Larger Smaller Caliper Caliper 16.99 23.34 7.80 12.74 0.00 5.49 0.00 24.48 NA NA NA NA NA NA NA NA 5.4.1 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Variance Reduced by Half When the two cohorts are hierarchically different at only level-1 covariates and level-1 R2 is high, matching on propensity scores estimated from the level-1 covariates reduces the schooling effect estimation bias in SCD by 78.34 percent using a caliper of 0.01 standard deviations. If a lager caliper of 0.2 standard deviations is used, the bias reduction rate is 71.77 percent. Mahalanobis distance matching only reduces estimation bias by16.99 percent using a caliper of 0.2 standard deviations. The bias reduction rate is 23.34 percent when a caliper of 0.01 standard deviations is used. 5.4.2 C1T 1’s Level-1 Covariate Means Differ from C2T 0’s, with Level-1 Variance Reduced by Half, and Initial Difference Reduced When the two cohorts are hierarchically less different at level-1 covariates, that is, the initial difference is smaller, increasing level-1 R2 does not help to improve the performance of matching. Matching on propensity scores estimated from level-1 covariates reduces estimation bias by 64.06 percent when it uses a caliper of 0.01 standard deviations. If a larger caliper of 0.2 standard deviations is used, the bias reduction rate becomes 62.96 percent. Mahalanobis distance matching only reduces estimation bias by 7.8 percent when it uses a caliper of 0.2 standard deviations. The bias reduction rate is 12.74 percent when a caliper is of 0.01 standard deviations. 112 5.4.3 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with Level-2 Variance Reduced by Half When the two cohorts are hierarchically different at only level-2 covariate and level-2 R2 is high, matching on class level propensity scores estimated from level-2 covariates reduces estimation bias by 70.22 percent when it uses a caliper of 0.2 standard deviations. The estimation bias reduction rate is 63.55 percent when level-2 R2 is low (Section 5.2.3). If a smaller caliper of 0.01 standard deviations is used, propensity score matching reduces bias reduction rate by 71.15 percent. Mahalanobis distance matching does not reduce estimation bias when a caliper of 0.2 standard deviations is used. The bias reduction rate is 5.49 per cent when a caliper of 0.01 standard deviations is used. 5.4.4 C1T 1’s Level-2 Covariate Means Differ from C2T 0’s with 2 Level-2 Variance σuα0 Reduced by Half, and Initial Differ- ence Reduced by Half When the two cohorts are hierarchically less different at only level-2 covariates and the initial difference is smaller, increasing level-2 R2 does not help to improve the performance of matching. Matching on propensity scores estimated from level-2 covariates reduces estimation bias by 66.84 percent when it uses a caliper of 0.2 standard deviations. If a smaller caliper of 0.01 standard deviations is used, propensity score matching reduces estimation bias by 52.26 percent. Mahalanobis distance matching does not reduce estimation bias at all when a caliper of 113 0.2 standard deviations is used. The bias reduction rate is 24.48 per cent when a caliper of 0.01 standard deviations is used. 5.4.5 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s with Both Level-1 and Level-2 Variances Reduced by Half When the two cohorts’ hierarchically structured data are different at both level-1 and level-2 covariates, and both level-1 R2 and level-2 R2 are high, dual matching reduces estimation bias by 77.13 percent. Matching on only class-level propensity scores reduces estimation bias by 36.74 percent when a caliper of 0.2 standard deviations is used. After level-2 matching, matching on propensity scores estimated from level-1 covariates further reduces estimation bias by 40.39 percent. 5.4.6 C1T 1’s Level-1 and Level-2 Covariate Means Differ from C2T 0’s, with Both Level-1 and Level-2 Variances Reduced by Half, and Total Initial Difference Reduced When the two cohorts’ hierarchically structured data are less different at both level-1 and level-2 covariate means, that is, the initial difference is smaller, increasing both level-1 R2 and level-2 R2 dose not help to improve the performance of matching. Matching on propensity scores estimated from level-2 covariates reduce estimation bias by 36.39 percent when a caliper of 0.2 standard deviations is used. Further, after level-2 matching, matching on propensity scores estimated from level-1 114 covariates further reduces estimation bias by 41.80 percent. A total of 78.19 percent of estimation bias is reduced in the dual matching. 5.4.7 Discussion When level-1 R2 is high, the results are almost identical to those of Simulation 4.2.1, where level-1 R2 is low. This suggests that increasing level-1 R2 does not help to improve level-1 matching. When simulated level-1 selection bias is smaller, no matter level-1 R2 is high or low, individual matching does not work as effectively as it does when the initial difference is larger. This suggests that level-1 matching is not sensitive to the increase of level-1 R2 . When level-2 R2 is high, the results of propensity score matching are not identical to those of Simulation 4.2.1, where level-2 R2 is low. Specifically when a larger caliper of 0.20 standard deviations is used, increasing level-2 R2 does help to improve level-2 matching estimation bias reduction rate by about 7 percent. When the simulated level-2 selection bias becomes smaller, increasing level-2 R2 does not improve the performance of cluster matching. This further suggests that the accuracy of level-2 propensity score matching is more sensitive to the magnitude of initial difference than to the increase of level-2 R2 . The dual propensity score matching is robust. The performance of dual propensity score matching is not sensitive to the increase of R2 . More importantly, propensity score matching still achieves a large bias reduction rate when the initial difference is small. Still Mahalanobis distance matching is not comparable to propensity score matching. Using a smaller caliper will improve matching accuracy. 115 Chapter 6 Discussions The synthetic cohort design (SCD) is a cross-sectional design by nature. It is also a quasiexperimental design because it uses retrospective data rather than prospective data to study the effect of being in a specific cohort1 . The observed cohorts in the SCD of this dissertation study are matched to reduce the impact of selection bias on the schooling effect estimate. Because of the complexity of the hierarchical structure of school systems, there are four major school related issues that may limit the use of the proposed matching approaches. Sections 6.1 to 6.4 discuss these four issues. Specifically, section 6.1 examines extending the matching procedure that is developed on the regular math class data to other types of math classes such as remedial classes. Section 6.2 discusses incompletely matched data due to small class size. Section 6.3 discusses the role that the level-1 and level-2 covariates play in the SCD and in the process of matching. And section 6.4 discusses matching students that are held back at grade 8. To address the measurement error issue that commonly exists in education studies, it 1 In education studies, this effect is often referred to as schooling effect. 116 is often necessary to use latent variables at more than one levels in the hieratically structured school systems. This dissertation involves the use of only level-1 latent variables in multi-level structural equation modeling to address measurement error issues in matching. The issue of use level-2 latent variables in matching is discussed in Section 6.5. In addition, measurement-invariance-testing technique developed in the field of structural equation modeling is introduced in Section 6.6 to identify situations where the HEoG assumption may fail. In future studies, other statistical indices and analytical approaches should be considered in after-matching evaluation and data analysis. Section 6.7 discusses how statistical power can be used to evaluate the performance of matching. Section 6.8 discusses statistical analysis approaches, such as meta-analysis technique that can be used to analyze matched data. Further, how matched-cohort data can be used in a longitudinal study is addressed in section 6.9. This section also discusses why the SCD is needed in causal inference in education studies and how it differs from synthetic cohort analysis in other fields. And to illustrate, Section 6.10 discusses how matching approaches proposed in this dissertation study can be applied in the fields of international comparative mathematics education studies and program evaluation studies in higher education and secondary track/nontrack school system in three different examples. Finally, Section 6.11 briefly summarizes the dissertation study. 6.1 Extend the Analysis to another Type of Math Classes This study only uses the Regular Class data of SIMS, although there are three other class types: Remedial, Enriched, and Algebra. Focusing on one type of classes ignores information 117 contained in the other three types of classes. Because of its largest sample size in the four types classes, the Regular Class data set is used for matching to address the following question: What is the schooling effect of one year in a regular 8th grade class? However, the curriculum system is different across the four class types. Matching developed from the Regular Class data can be applied to any one of the other three types of classes. For example, using the same matching routine on the Remedial Class data would depict the schooling effect of one year in remedial class. 6.2 Incomplete Matching Due to Small Cluster Size This dissertation study considers a balanced design. That is the two cohorts are equally sized. The success of obtaining matches for the treated units is generally determined by the control group size. Stuart and Rubin (2008) recommends the use of a larger control group to assure successful matching for each unit in a smaller treatment group. In practice, larger treatment groups often occur due to a specific purpose of design. For example, in the TIMSS 1995 design there were two classrooms at the upper 8th grade and one classroom at the lower 7th grade. The upper 8th grade is the focal cohort, conceptually, in the “treatment” condition. Overall, the number of students in the treatment group is twice as large as the one in the control group. In this situation, incomplete matching may occur and lead to estimation bias of schooling effect. Three components of bias on an outcome are identified by Rosenbaum and Rubin (1985): 1) component due to departures from strongly ignorable treatment assignment; 2) component due to incomplete matching; and 3) component due to coarse or inexact matching. Bias due to departures from strongly ignorable treatment assignment indicates the selection bias. This 118 dissertation study currently uses one-to-one matching (called simple matching, Abadie and Imbens, 2006) to reduce selection bias to improve schooling effect estimation accuracy. Using simple matching can leave some treated units unmatched, which in turn leads to incomplete matching. Abadie and Imbens (2006) review the average treatment effects (ATTs) estimators based on simple matching. They show that ATT estimators in simple matching include a conditional bias term (i.e., efficiency loss) that does not converge to zero when more than one continuous variable is used for matching. Abadie and Imbens (2007) proposed a bias-corrected matching estimator for ATTs using multiple matching, which allows matching with replacement. Thus, every treated unit has a matched unit or several matched units from the control group. Therefore, one can find matching units with higher quality. For example, when matching is on one continuous variable, it has been shown that the efficiency loss can be forced to converge to zero through multiple matching with replacement. Although matching with replacement leads to a larger variance, it typically produces greater match quality and smaller bias. According to Cochran (1972), bias reduction is more important than the variance reduction. Future studies can apply the multiple matching approach in hieratically structured data to test if it can improve bias reduction rate and, if yes, to what degree. Another approach to address the incomplete matching issue is to use the non-local control group matching (Stuart and Rubin, 2008), in which the difference between the local and nonlocal control groups is accounted for in estimating the schooling effect. The local control group can be a control class located in the same school as the treatment class is. The 119 non-local group is a control class from another school. In educational settings, class (group) sizes are comparatively small, therefore researchers may fail to find a good match for every unit of a treatment class. In this situation, units from a non-local control group may be needed. Stuart and Rubin (2008) use an infinitely large non-local group for matching. For the treatment units that can be matched from the local L control group (y1i ), the researchers find corresponding matching units from the non-local N L N units (y1i L ). The regression model will be fit on data including y1i and y1i L to obtain quantities. The quantities are used to adjust the treatment units that can be matched only from a non-local control group. Thus it resolves the incomplete matching issue. This dissertation study did not use the non-local control group to adjust incomplete mataching. Future studies can explore how to use non-local group matching adjustment in the SCD to achieve a better bias reduction rate when classes are the intervention units. For example, future studies can simulate a situation where C2T 0 and C1T 1 are not equallysized: 2N2 for the former and N for the latter. In such situations, researchers can evaluated whether or not, and if yes to which degree, matching success can be improve by matching procedures with adjustment including non-local adjustment matching (Stuart and Rubin, 2008) and multiple matching with replacement (Abadie and Imbens, 2007). Besides conducting empirical simulation studies, future research should engage in systematic literature review on studies using non-balanced designs with or without randomization. The systematic literature review can focus on how the statistical inference is adjusted in the unbalanced design, specifically in the field of latent variable modeling. 2 N is the sample size. 120 6.3 Role of Covariates in Synthetic Cohort Design In the Solomon Four-Group Design or any other true experimental design, due to the use of randomization, inclusion of covariates is not necessary in the analytical step (Solomon, 1949). Randomization, in the long run, “evens out” the effect of covariates on intervention in treatment and control groups. However, in quasi-experimental designs such as the SCD, covariates play a very important role in statistical inference. The feasibility of matching depends on the availability of the covariates in a study. The upper grade and the lower grade in the SCD are conceptually treated as the treatment group and control group, respectively. Instead of using analysis of covariance (ANCOVA, Cochran, 1957) to partial out the contaminate effect of covariates on intervention, this study demonstrates three cases where matching approaches can take care of and “even out” the effect of covariates. The first case involves the hierarchical structure of the covariates. The second involves the measurement errors associated with observed covariates. The third involves incomplete or omitted covariates. These cases commonly occur in education studies due to the complex structure of school systems, which involve an endless list of variables such as student characteristics, family background variables, teacher variables, and variables at school and district levels. These variables are, directly or indirectly, relevant to student learning, although they are measured with errors in practice. Some researchers may treat one set of these relevant variables as intervention variables to study their effects on student learning, controlling for another set of variables (i.e., covariates). Other researchers study another set of variables and their effects on student learning, controlling for another set of covariates. Correspondingly, the covariates that used for matching will vary across studies. 121 6.3.1 On Which Covariates to Match Covariates used in matching and covariates used in ANCOVA belong to the same set of variables, which are potential confounders (Song and Herman, 2010). These confounders are the preexisting characteristics that, although not the intervention variables, also cause observable difference between the treatment and control groups. The difficulty is how to determine a set of covariates that would be appropriate for matching. Answering this question requires a two-step process. Step one requires distinguishing the intervention variables from the covariates among the available variables in a study. Step two requires testing on which covariates that the treatment and control groups are not comparable. The covariates used to compute the propensity scores will be those non-intervention variables that can significantly distinguish the two cohorts. In matching the upper and lower graders in the SCD, attention should be paid to the covariates due to their chronological nature. 6.3.2 Concern on Chronological Variables such as Age and Grade Specific OTL In practice, special considerations are needed when there is a chronological difference between the two groups or cohorts of individuals being matched. For example, in the SCD, the upper and lower grades are different by age and by curriculum coverage (e.g., OTL, in Schmidt and Burstein, 1992). Matching the two cohorts, the upper and the lower grades, involves collecting historical data on the upper grade students. These historical data would include their age and curriculum coverage when they are in 122 the lower grade a year before. Thus, the age of the upper grade used in matching should be reduced by one year. The previous year’s curriculum coverage of the upper grade should be used in matching with the current year’s curriculum coverage of the lower grade. 6.3.3 Two Types of Level-2 Covariates Two types of level-2 covariates are identified (L¨dtke et al., 2008): global and contextual u variables. The directly measured level-2 covariates, such as class-size, generally can be broken down to individuals at level-1. These covariates are referred to as global variables. The contextual variables are those covariates included at both levels. For example, a covariate x is used in the level-1 model, then the cluster-mean of x is included in the level-2 model. Researchers have studied multilevel modeling with the same covariate included at both level-1 and level-2. This type of multilevel model has been called contextual analysis modeling (Boyd and Iversen, 1979; Firebaugh, 1978; Raudenbush and Bryk, 2002; Schmidt and Houang, 1986). The possible problem is that the aggregated mean of a covariate based upon small number of individuals has low reliability. This dissertation study does not include the aggregated means as extra level-2 covariates in estimating the level-2 propensity scores. Future research should examine the effect of including contextual variables, such as the aggregated means, as extra level-2 covariates in matching. 6.3.4 Interaction Terms as Omitted Covariates Omitted covariates can be the interaction terms that are not included in the model and analysis. Gelman (2009) emphasizes the interaction between the treatment variable and pre123 treatment covariates and the possible impact on treatment effect estimation. The interaction term and its effect are often ignored when the main focus is to estimate a single coefficient of the treatment variable of interest. Ignoring the interaction term may result in a biased treatment effect estimate (Gelman and Hill, 2007). Ignoring the interaction term may directly cause non-comparability of two cohorts being compared in the SCD. Students in Cohort 2 at Time 0 can be more/less proficient than those in Cohort 1 at Time 1. That is those two 7th grades in two continuous years are not comparable in terms of the interaction between pre-test score and a pretreatment covariate. The treatment in this situation is one-more year of schooling in 8th grade. Future research should identify the interaction terms of treatment and pre-treatment and develop matching routines to account for selection bias due to ignoring the interaction terms. 6.4 Deal with Students under Retention in Matching Grade retention, as an indictor of educational process (Planty et al., 2009), affects the matching procedure in the SCD. It has been used as an intervention to hold a student for an extra year in a grade with a goal of improving his/her academic proficiency (Ou and Reynolds, 2010). These retained students are not comparable with their classmates due to the extra year of schooling they received in the same grade. However, if the research goal is to find how much the 8th graders learn after one school year, retained students should be excluded during the matching. This is because they have one year of schooling before retention. Matching retained students in the upper grade with students in the lower grade will not 124 be plausible. They themselves can be their matched units. The schooling effect estimate can be derived from retained students by using a longitudinal design. One can treat their observed learning outcome at the beginning of the retention year as a base-line score to compute the individual’s gain of the retaining year to estimate the schooling effect. 6.5 Improve Measurement Accuracy in Education Studies There is a clear trend of using randomized control trials design in studying how educational interventions and instructional inferences affect student performance (Raudenbush and Sadoff, 2008; Sloane, 2008; Spybrook, 2007). Assessing the efficacy and the efficiency of the design depends heavily on the hypothesis development, experimental design, controlled experimental trials, identification of the population of interest, and the implementation of the study. The most challenging task in educational randomized trials design is to obtain valid measurements of interventions and inferences to assess their efficacy and efficiency (Raudenbush and Sadoff, 2008; Sloane, 2008). Because of the hierarchically structured experimental design and data collection (Raudenbush and Sadoff, 2008), treatment units are generally classes or schools rather than individuals. The accuracy of measurement on interventions should be assessed at the classroom level. Raudenbush and Sadoff (2008) point out that intervention in classes is critical to student development and requires a large-scale measurement of classroom instruction. Measurement errors commonly occur in survey sampling data and analysis (Cochran, 1968b; S¨rndal et al., a 125 2003, Chapter 16). The measures of the classroom intervention and instruction that students receive can be subject to measurement errors (Raudenbush and Sadoff, 2008) in the randomized large-scale surveys. In general, research focuses on how treatment-control status affects classroom instruction (two-level modeling) and further how classroom instruction affects student outcomes across schools (three-level modeling). The measurement error on classroom instruction is accounted for by assuming a classroomlevel random effect within the school. Further, the mean of the classroom-level instruction is assumed to be predicted by treatment-control status. It is also allowed to vary across schools (Raudenbush and Sadoff, 2008). In addition, the classroom instruction can be predicted by treatment-control status at a level-1 equation, and the intercept and slope of the level-1 equation can be assumed to vary across schools. The measurement error effect on statistical inference testing can be done using the non-centrality F statistics (Raudenbush and Sadoff, 2008). Future research can explore how class-level measurement errors will affect the accuracy of cluster matching and dual matching in terms of bias reduction rate. 6.6 Situations Where HEoG May Fail In this study the simulated non-comparability of the two cohorts occurs only on the mean vector and the variances of the joint distribution of the covariates of interest. Other situations where HEoG may fail should be examined in future studies. For example, future research should consider a situation where the two cohorts have different factor loadings and/or regression coefficients. Measurement invariance testing (Cheung and Rensvold, 2002; Kaplan, 2008) can detect if the two groups or cohorts being matched 126 have the same parameters such as factor loadings and residual variance. Traditionally, three steps are required to test measurement invariance. First, configural invariance is assessed by imposing the same structure of free and fixed parameters on the factor loadings across groups. If such a model fits the data well, it can be concluded that the same conceptual framework underlies the respondents’ responses. Second, metric invariance is tested by constraining the factor loadings to be equal in the different groups. If metric invariance holds, changes in the latent variable lead to the same changes on the observed responses of the same items across groups. Third, constraining the intercepts to be invariant across groups provides a means to assess scalar invariance. Together with invariant factor loadings this condition assures that comparisons of latent means across groups are meaningful. These measurement-invariance-testing approaches can identify on which sets of parameters that the two groups are not comparable. Future research can simulate a corresponding set of parameters that cause the non-comparability of the two groups/cohorts to evaluate how matching on latent variables or surrogate variables can reduce selection bias. 6.7 Statistical Power as an After-Matching Evaluation Index Statistical power refers to the probability that an inference test achieves statistical significance given a true effect size (Cohen, 1988). Recently, experimental design is considered as a composite element in the power analysis of intervention studies. Experimental design includes analytical modeling and the sampling procedures, which consist of sample size and 127 sample assignment (Song and Herman, 2010). Given a specific value of the Type I error, statistical power will vary across true effect sizes in a study deign. More importantly, statistical power is affected by the process of “sample allocation” (Song and Herman, 2010, p. 358), which refers to whether intervention units are individuals or clusters of individuals. The power analysis in statistical modeling refers to including covariates for adjustment to boost the value of statistical power and achieve a smaller sample size (Bloom et al., 2007; Raudenbush et al., 2007). The boosted value of statistical power is called the gain on power. Based upon what has been discussed, there are a few strategies to evaluate matching under the framework of power analysis. First, if a matching approach can improve the effect size estimate, which will approximate the true effect size, this matching approach can achieve higher statistical power (e.g., Freedman et al., 1990; Griffin et al., 2009; Martin et al., 1993). Second, matching with non-local group adjustment (Stuart & Rubin, 2008) will help to successful find matching units for the treatment group. This will result in an orthogonal design, and will help to keep a larger sample size. These two aspects will improve the value of statistical power. The effect size can be computed as the standardized mean difference (Cohen, 1988; Hedges, 2007b, for multilevel data). The second strategy that future studies can use is to focus on comparisons between the gain on power in regression-covariate-adjustment in multi-level modeling and the gain on power in cluster/dual matching. Griffin et al. (2009) use post-hoc matching on observational data and find that matching on different sets of level-2 covariates results in different levels of statistical power. Studying the second strategy can shed light on how the level-1 and/or level-2 covariates affect statistical power in matching compared with covariate-adjustment 128 in regression. 6.8 After-Matching Statistical Analyses In general, it is recommended to use matching and regression-adjustment together to pursue a larger reduction rate of the initial selection bias (Rubin and Thomas, 2000; Stuart and Rubin, 2007). Future studies need to merge dual matching and multilevel modeling with covariates included as adjustment variables to examine if optimal results can be achieved in the context of using hierarchically structured data. For example, a simulation study can compare if including propensity scores as one of the covariates in the hieratical linear model can achieve an optimal result. The analytical approach of paired cluster randomized trials (PCRT) discussed in Thompson et al. (1997) can be used to analyze the data after level-2 matching. The PCRTs involve pairs of clusters that are matched in terms of covariates such as the demographic characteristics. Within each pair, one cluster is randomly assigned to the treatment group and the other to the control group. A random effects meta-analysis framework can take into account the between-cluster variation. Techniques such as the sample size calculations and the profile likelihood method can be applied to compute the confidence interval of the global effect size by taking into account the variation in estimating the variance across clusters. 6.9 Synthetic Cohorts Design and Life-Course Research Making causal inference from policy-based treatment is to understand the causal effect of one of the “turning point events and interventions on development trajectories” in life129 course (Haviland and Nagin, 2005, p. 576). In education studies, the treatment status needs to be defined in the context of multiple hierarchically structured sites. Such policy-based hierarchal treatment status has been developed in the literature. For example, policy-based treatment status in their school retention policy study, Hong and Raudenbush (2006) defined the school level binary treatment status as 1 if a school has high retention proportion, and 0 otherwise. Within each school, the level-1 treatment status is determined as 1 if a pupil is allowed to retain in the current grade for one-more year regarding school retention policy, and 0 otherwise. The SCD of this dissertation study can be treated as a specific case of the one-time-point treatment effect estimation approach discussed in Haviland and Nagin (2005). Compared with a longitudinal design, the SCD involves only one set of multiple informative data during several periods. However, the SCD and other one-time-point treatment effect estimation approaches are irreplaceable for two major reasons as discussed below. Longitudinal studies through growth modeling are statistically feasible; but they may not be morally applicable in practice to study life-course events. This is because most of the events during the life-course are one-time-occurrences, and some events can be emotionally negative and should/can not be repeated. Each event can create a turning point at a historically important time point in life-course. The use of SCD can examine the impact of one-time-occurrence-event in life-course and help researchers understand individual development and change (Haviland and Nagin, 2005). Even if longitudinal design can be used for certain research situations during the lifecourse, it is often unrealistic to follow participants longitudinally due to the high cost and the complexity of life events. Longitudinal studies collect multi-waves of follow-up data in 130 a time period. Increasing data collection frequency in a longer time period can improve the quality of research given a fixed number of sample, but it can significantly increase the research cost (Bloom et al., 2007). In the complex cluster randomized trials design such as using schools as study units, limited cost on data collection can reduce the number of clusters and can further attenuate the statistical power of a research (Raudenbush and Liu, 2001). The one-time-point treatment effect estimation approach, such as the first-time treatment effect estimate of a duration of life-course, can be generalized across multiple time points (Leon and Hedeker, 2005; Li et al., 2001). Individual development trajectory and pathway in life-course are shaped by the life-events’ effects (Elder, 1998). Haviland and Nagin (2005) point out that propensity score matching can be used to create comparable groups, which can be followed and studied across in the duration of life-course. These comparable groups created by researchers using observed data are called synthetic cohorts in epidemiology studies. Synthetic cohorts are commonly used in aging studies in the field of epidemiology to estimate the synthetic cohort effect (Heimberg et al., 2000; Kessler et al., 1998). For example, Campbell and Hudson (1985) use rare life events of seniors to pool observed panel survey data to create synthetic cohorts, which are comparable groups to be further analyzed through the discrete times series analysis. Kessler et al. (1998) use latent class analysis to predict individuals’ cohort membership and to create synthetic cohorts. Each cohort is longitudinally studied. A dummy variable can be created to represent the cohort membership index and can be used in data analysis. For example, the synthetic cohort effect is the non-comparability 131 between two cohorts and is captured by the statistical significance of the interactions of the dummy variable and the background characteristics (Heimberg et al., 2000). This way it can reveal on which background characteristics the two cohorts are different. 6.10 Illustrations Program evaluation is a challenging task to accomplish in education studies because valid inferences must account for errors and biases in the observed data collected from the hierarchically structured samples. In these samples, students are nested in classes, and classes are nested in schools and other higher level units. Matching is a tool to reduce bias of schooling effect estimation even when random assignment is achieved, and to account for selection bias when random assignment is not possible. Different studies use different types of data in terms of school systems, which present different requirements for matching. The following examples demonstrate how to use matching approaches proposed in this dissertation study to improve the accuracy of estimating intervention effect in the fields of international comparative mathematics education studies, and program evaluation studies in higher education and secondary track/nontrack school systems. Exemplary Case 1 In the Mathematics Teaching in the 21st Century (MT21) project, program evaluators study the program effect on the subject matter knowledge of the starters and the finishers on mathematics education in four teacher education programs located at four German universities (Schmidt et al., 2007). The treatment status of this study is the trainees’ finishing status in the mathematics education program in an across-nation comparative study. The treatment status is defined as 1 if a student has finished the training program and 0 if she/he just starts. The goal is to evaluate the effect of the treatment (i.e., 132 finishing the training program) by comparing the starters (the control group) and the finishers (the treatment group). This treatment status has a longitudinal nature because the finishers have spent a certain period of time in training and the starters have not. In this case, the strategy is to match each finisher with a starter within the same university in terms of grade point average, math course taking, and four motivation measures (pedagogical and subject-specific intrinsic motivation, and status-related and access-related extrinsic job motivation). This matching uses the level-1 matching proposed in this dissertation study. Exemplary Case 2 In TIMSS 1995 study, researchers examine students’ improvement on mathematics knowledge in track and nontrack classes across grades 7 and 8. These classes are located in schools across different states in the U.S. The hierarchically structured data and complex sampling design bring forth challenges in program evaluation and causal modeling. The policy-based treatment assignment at the school level is schools’ track/nontrack status, 1 for a track school, and 0 for a nontrack school. Within each school, the naturally observed grade levels construct the class-level treatment status, 1 for grade 8 and 0 for grade 7. In this case, there are two ways to match the hierarchically structured data. Matching 1 for Track/Nontrack School Comparison at Each Grade This matching can be referred to as the dual matching proposed in this dissertation study. Take the 7th grade as an example. First, at level-2, match each 7th grade class in a nontrack school with one 7th grade class in a track school. Level-2 propensity scores are estimated by using all level-2 covariates. The school-level and class-level covariates are treated as level-2 covariates. The level-2 matched data then will be analyzed by taking class means as analysis data points to determine, in general, if there is a track/nontrack school difference in terms of the math 133 learning of the 7th graders. This step is referred to as the level-2 matching proposed in this dissertation study. Similarly, the same matching procedures can be done for the 8th grade track/nontrack data. Further, level-1 matching will be conducted. Within each pair of the matched classes from the first step, level-1 propensity scores will be used to match one student from the nontrack class with one student from the track class. These doubly matched data can be analyzed to identify the individual differences on math learning among 7th graders between the track school system and the nontrack school system. The two steps demonstrate how the dual matching can be used in this situation. Matching 2 for track/nontrack school growth comparison on math learning This situation involves only level-1 matching. That is to mach 7th graders and 8th graders within each school. A growth score can be obtained from a pair of 7th − 8th graders. Within each school, level-1 propensity scores are used to match one 7th grader to one 8th grader. Only the level-1 covariates will be used to compute the propensity scores for matching. In the mixed effect (multilevel modeling) analysis, the school level covariates will be added to control for the potential confounders. Exemplary Case 3 Researchers at a midwest public university propose to investigate how well residential students have learned through the higher educational system. The comparison will be conducted within the university. The residential students are distributed in multiple colleges such as in Residential College in the Arts and Humanities (RCAH), College of Arts and Letters (CAL), a College (named JM in short ) in Social Science, a College (named LB in short ) in Natural Science. The policy-based treatment will be 1 for a residential student, 0 otherwise. The treatment status is uniform across colleges that 134 have residential students in the university. The clusters are referred to as the colleges being evaluated. Matching 1 for Residential/Nonresidential Students Comparisons This situation in- volves the within-cluster comparison, which means comparing residential students with their peers in the same college based upon the measuring knowledge specific to that college. For this within-cluster comparison, individual residential students will be matched with nonresidential students in the same college. Level-1 propensity scores are used for matching. The level-1 propensity scores are computed by using only the level-1 covariates. Matching 2 for College Comparison Using Residential Students This situation involves the between-cluster comparison. The between-cluster comparison will need a holistic measure of higher education success for all colleges. This between-cluster comparison involves only the residential students of all colleges in the university. For any two given colleges, RCAH and JM for instance, the treatment status will be 1 for being in college RCAH and 0 in college JM. So, residential students in the treatment group will be matched with residential students in the control group. The same matching can be done by setting up one college as the control group and matching each of other colleges with this control group. The matched data then can be analyzed. Matching 3 for College Effect on Residential Students The evaluation can be done by using the freshmen and the seniors among the residential students. The colleges can be treated the same as those universities in Exemplary Cases 1 and corresponding matching can be done as what has been explained in Exemplary Cases 1 above. This matching approach is more suitable to study how each college can add value to the learning effect of residential students. 135 The three approaches use different data for matching and address three different research questions: Matching 1 examines if there is any difference between residential students and nonresidential students within the college; Matching 2 examines if there is any difference on “residential effect” across colleges; and Matching 3 examines how much the “residential effect” is for each college. 6.11 Summary In education studies using the SCD, valid inferences must account for the selection bias of the hierarchical nature of the data. Matching is a tool to reduce estimation bias of treatment effect to account for selection bias when random assignment is impossible in the SCD. Different situations where the HEoG assumption may fail present different requirements for matching. This dissertation study demonstrates the potential of using propensity score matching in the SCD to reduce bias of schooling effect estimate in three simulated situations involving hierarchically structured data, surrogate covariates with measurement errors, and omitted covariates. Based on the structural equation modeling framework, this dissertation study provides a theoretical basis for future research to examine the effectiveness of post-hoc adjustment approaches such as propensity score matching in reducing the selection bias of SCD for casual inference and program evaluation. 136 APPENDICES 137 Appendix A Simulation Code A.1 Mplus Code Fitting the Two-Level SEM on SIMSUSA Data DATA: FILE IS ”SIMSRGLR.dat”; Format IS FREE; TYPE IS individual; DEFINE: STTHRATE = SCHSIZE/STCHS; NEWGEOM=NEWGEOM/10; NEWALG=NEWALG/10; OLDARITH=OLDARITH/10; OLDGEOM=OLDGEOM/10; VARIABLE: NAMES ARE IDTEACH IDSCH IDSTUD IDCLASS XAGE YFOCCN YMWORK YMOCCN EDUEPCT YPWWELL YIWANT YMORMTH 138 RYPWANT RYPENC RYNOMORE RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT INTERCTY YFAMILY YFEDUC YMEDUC YMHWKT YTUTORT OLDARITH OLDALG OLDGEOM NEWARITH NEWALG NEWGEOM THWRKT CTCBEHV TORDERT TBOTTOM TPPWEEK SAREA SENROLB SENROLG STCHS SSOMMM SSOMMF SALLMM SALLMF SSPECM SSPECF YGOWO YUSTAND YWRKLNG YNOTNEC YJOBUSE YMTHLOG YFLGOOD YNONEED YFUN YMTHJOB YNEVER YHELPO YHAPPY YING YCHALL YINMAZE YNOTWLL YHARDER YCALM Totpre Totpos CLASSSIZE MTHSTAF MTHTEACH MTHONLY SCHSIZE ; USEVARIABLES ARE XAGE EDUEPCT RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN YMHWKT oldarith oldgeom NEWALG NEWGEOM TPPWEEK Totpos Totpre CLASSSIZE MTHONLY; MISSING ARE ALL (-9); WITHIN= XAGE EDUEPCT RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN YMHWKT; BETWEEN = OLDARITH OLDGEOM NEWALG NEWGEOM 139 CLASSSIZE TPPWEEK MTHONLY; CENTERING = GRANDMEAN (XAGE); CLUSTER = IDCLASS; ANALYSIS: TYPE = TWOLEVEL ; MODEL: %WITHIN% ! LATENT VARIABELs EDUISPR BY RYPWANT RYPENC YPWWELL; SLFENCRG BY YIWANT YMORMTH RYNOMORE; FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE; MTHIMPT BY RYMIMPT RYFIMPT; SES BY YFEDUC YMEDUC YFOCCN YMOCCN; ! REGRESSION Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPTMTHIMPT SES ; Totpos on Totpre ; %BETWEEN% Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY; Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ; OUTPUT: sampstat TECH1 TECH8 CINTERVAL residual; ! Mont Carlo parameters SAVEDATA: ESTIMATES = newmodelfinal.dat; 140 A.2 Mplus Code Generating Data for Mont Carlo Simulation TITLE: Data Generation Mplus Code of Mont Carlo Simulation MONTECARLO: NAMES ARE XAGE EDUEPCT RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN YMHWKT oldarith oldgeom NEWALG NEWGEOM TPPWEEK Totpos Totpre CLASSSIZE MTHONLY; NOBSERVATIONS = 345000; NREPS = 1; SEED = 58459; POPULATION =newmodelfinal.dat; COVERAGE =newmodelfinal.dat; NCSIZES = 4; CSIZES = 300 (10) 3500 (20) 8000 (30) 800(40); WITHIN=XAGE EDUEPCT RYPWANT RYPENC YPWWELL YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN YMHWKT; BETWEEN = OLDARITH OLDGEOM NEWALG NEWGEOM CLASSSIZE TPPWEEK MTHONLY; 141 REPSAVE = ALL; SAVE = Newmodel8v2*.dat; MODEL POPULATION: %WITHIN% ! LATENT VARIABEL EDUISPR BY RYPWANT RYPENC YPWWELL; SLFENCRG BY YIWANT YMORMTH RYNOMORE; FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE; MTHIMPT BY RYMIMPT RYFIMPT; SES BY YFEDUC YMEDUC YFOCCN YMOCCN; Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPT MTHIMPT SES ; Totpos on Totpre ; %BETWEEN% Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY; Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ; ANALYSIS: TYPE = TWOLEVEL; MODEL: %WITHIN% ! LATENT VARIABEL EDUISPR BY RYPWANT RYPENC YPWWELL; SLFENCRG BY YIWANT YMORMTH RYNOMORE; FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE; 142 MTHIMPT BY RYMIMPT RYFIMPT; SES BY YFEDUC YMEDUC YFOCCN YMOCCN; Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPT MTHIMPT SES ; Totpos on Totpre; %BETWEEN% Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY; Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ; OUTPUT: TECH9; A.3 R Code for Level-1 Matching # Level-1 Matching # Author: Qiu Wang # Date: 2010-01-01/ 2010-01-06: Fist revision # Simulation in detail: This is a simulation set-up for the dissertation. In this simulation, the 200 independent schools with varying sample size (See Section 4.1) are generated, half from the pseudo-population of Cohort 1 at Time 1 and the other half from pseudo-population of Cohort 2 at Time 0. Data generation and parameter setting are discussed in Section 4.2. The covariates both level-1 and level-2 are generated from the multivariate normal distribution with corresponding mean vector and variance-covariance matrix, which are derived from the two-level SEM fitted in Mplus. The surrogate variables of a latent construct such as SES are also generated using multivariate normal distribution with corresponding mean vector and variance-covariance matrix. The variance-covariance matrix is derived based on 143 the (sub-)measurement model of the two-level SEM (See Section 4.2.7). # Propensity Score matching and Mahalanobis distance matching are proposed. The calipers are 0.2 standard deviation of pooled samples and 0.01 standard deviation of pooled samples. # 200 replications per conditions. # # # Part 1 Data Preparation # # # # #. library(MatchIt) setwd(”C:/Documents and Settings/wangqiu/Desktop/SIMS042010/QiuData/ REGULAR CLASS DATA/Data generate”) # cohort 2 at time 0 and time 1 data and cohort ID is coded as 1. cohort1T0T1<-read.table (file=’NewmodePOP.dat’) cohort1T0T1data<-cbind(cohort1T0T1,cohort=c(rep(1,length(cohort1T0T1[,1])))) # cohort 1 at time 1 data and # cohort ID is coded as 0. cohort2T0T11<-read.table (file=’popdata11.dat’) cohort2T0T1data1<-cbind(cohort2T0T11,cohort=c(rep(0,length(cohort2T0T11[,1])))) # varialbe list colnamess<-c(”RYPWANT”,”RYPENC”,”YPWWELL”,”YIWANT”,”YMORMTH”, ”RYNOMORE”,”RYPINT”,”RYFLIKE”,”RYMLIKE”,”RYFABLE”,”RYMABLE”, ”RYMIMPT”,”RYFIMPT”,”YFEDUC”,”YMEDUC”,”YFOCCN”,”YMOCCN”, ”TOTPOS”,”TOTPRE”,”XAGE”,”EDUEPCT”,”YFAMILY”,”YMHWKT”, ”OLDARITH”,”OLDGEOM”,”NEWALG”,”NEWGEOM”,”TPPWEEK”,”CLASSSIZ”, ”MTHONLY”,”CLUSTER”,”COHORT”) # Attach variable names to both cohorts. 144 colnames(cohort1T0T1data)<-colnamess colnames(cohort2T0T1data1)<-colnamess # population longitudinal schooling effect POP.lngtdnl.ef<-mean(cohort1T0T1data$TOTPOS)-mean(cohort1T0T1data$TOTPRE) #populaiton synthetic cohort schooling effect POP.synthetic.ef<-mean(cohort1T0T1data$TOTPOS)-mean(cohort2T0T1data1$TOTPRE) # # # # Part 2 Simulation: Matching # # # # cluster.id<-c(1:12600) lngtdnl.ef<-NULL synthetic.ef<-NULL matched.synthetic.ef1<-NULL matched.synthetic.ef2<-NULL matched.synthetic.ef3<-NULL matched.synthetic.ef4<-NULL for (j in 1:200){ classID.treat<-sort(sample(cluster.id,size=100,replace=F)) sample.data.Trt<-NULL sample.data.Cntr<-NULL for (i in 1:length(classID.treat)){ sample.data.Trt<-rbind(sample.data.Trt,cohort1T0T1data[ (cohort1T0T1data$CLUSTER==classID.treat[i]),] ) sample.data.Cntr<-rbind(sample.data.Cntr,cohort2T0T1data1[ (cohort2T0T1data1$CLUSTER==classID.treat[i]),] ) 145 } lngtdnl.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Trt$TOTPRE) synthetic.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Cntr$TOTPRE) # matching sample.data<-data.frame(rbind(sample.data.Trt,sample.data.Cntr)) #propensity score matching pro.pen<-glm(COHORT∼XAGE+EDUEPCT+YFAMILY+YMHWKT,family=binomial, data=sample.data) logodds<-log(pro.pen$fitted/(1-pro.pen$fitted)) ## Mahalanobis matching XX<-cbind(sample.data$XAGE,sample.data$EDUEPCT, sample.data$YFAMILY,sample.data$YMHWKT) mhd<-mahalanobis(XX,mean(XX),var(XX)) cutoff1 <-.01*sd(logodds) cutoff2<-0.2*sd(logodds) cutoff3<-.01*sd(mhd) cutoff4<-.2*sd(mhd) t.c.match1<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT, X=logodds,M=1,caliper=cutoff1,replace=FALSE) t.c.match2<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT, X=logodds,M=1,caliper=cutoff2,replace=FALSE) t.c.match3<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT, X=mhd,M=1,caliper=cutoff3,replace=FALSE) 146 t.c.match4<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT, X=mhd,M=1,caliper=cutoff4,replace=FALSE) # after matching SCD-based Shooling Effect matched.synthetic.ef1[j]