MATCHING FOR BIAS REDUCTION IN TREATMENT EFFECT ESTIMATION OF
HIERARCHICALLY STRUCTURED SYNTHETIC COHORT DESIGN DATA
By
Qiu Wang

A DISSERTATION
Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Measurement and Quantitative Methods
2010

Abstract
MATCHING FOR BIAS REDUCTION IN TREATMENT EFFECT
ESTIMATION OF HIERARCHICALLY STRUCTURED SYNTHETIC
COHORT DESIGN DATA
By
Qiu Wang

This study uses a multi-level multivariate propensity score matching approach to examine the
synthetic cohort design (SCD) in estimating the schooling eﬀect on mathematics proﬁciency
of the focal cohort 2 (8th graders). Collecting 7th and 8th graders at the same time point,
the SCD is suﬃcient in estimating the schooling eﬀect under the historical equivalency of
groups (HEoG) assumption.
A structural equation modeling (SEM) framework is used to deﬁne the HEoG assumption.
It has shown that HEoG assures that the use of SCD results in an unbiased estimate of
schooling eﬀect without randomized data. The post-hoc group matching is used to achieve
the HEoG assumption in order to produce an estimate of schooling eﬀect that is unbiased in
SCD. Three matching approaches, level-1 matching, level-2 matching, and dual matching,
are evaluated using simulated data generated based on USA participants of the Second
International Mathematics Study (SIMS-USA, IEA, 1977).
Two-level latent variable models based on situations that violated the HEoG assumption
are created in order to examine the ability of matching to reduce the simulated selection
biases to improve the accuracy of the schooling eﬀect estimate in SCD. The three simulated
situations involve hierarchically structured data, surrogate covariates with measurement errors, and omitted covariates.
Results suggest the following: 1) To reduce initial bias and assure the HEoG assumption,

three diﬀerent matching approaches should be conducted on the covariates according to
where the initial bias occurs: on level-1 covariates, on level-2 covariates, and on both level-1
and level-2 covariates; 2) When reliability is low (e.g, .25), latent variable matching does
not help improve group comparability, but using observed surrogate variables to match can
reduce bias by more than 50 percent. When reliability is high (e.g, greater than .75), latent
variable matching reduces bias as much as matching on observed surrogate variables does; 3)
When level-2 initial bias is large, increasing level-2 R2 does help to improve level-2 matching.
The bias reduction of either individual or dual propensity score matching is not sensitive to
the increase of R2 . The dual propensity score matching is more robust to the magnitude
of initial selection bias, achieving a large bias reduction rate when the initial bias is small.
Either level-1 matching or level-2 matching achieves lower bias reduction rate when the initial
bias is small.
This dissertation study provides a theoretical basis for future research to examine the
eﬀectiveness of propensity score matching in reducing the selection bias of SCD for casual inference and program evaluation. Practical considerations and suggestions for future research
on hierarchically structured data in program evaluation are discussed.

Copyright by
QIU WANG
2010

To: My Father

v

ACKNOWLEDGMENT

Completing this doctoral dissertation has been a journey full of explorations, adventures,
excitement, and sometimes struggles. During this process, I have been very fortunate to
have the guidance, support, and help from my professors, colleagues, and friends.
My deepest gratitude goes to my spiritual and academic mentors, Drs. Richard Houang,
Kimberley Maier, Matthew Diemer, and William Schmidt. Their working philosophy, eﬃcient working style, thinking-out-of-box ability shaped my own academic work style. Because
of their inﬂuences, I am now able to work with my students with appreciation and respects.
Dr. Kimberly Maier has been very encouraging. Her comments guide my dissertation
study and writing, and specially help me eﬃciently go through the ﬁnal revision stage. Dr.
Maier’s caring nature deeply inﬂuenced me. The inner peace I gain through the scriptures
she shared with me is the most valuable an advisee can receive.
Dr. Richard Houang, with his insightful thoughts, has helped me see through and solve
the technical issues of this dissertation. His philosophical metaphor of “the forest and individual trees” has shed light on problems that I am working on in the dissertation. Dr.
Houang’s oﬃce door is always open to me and our discussions have been directing the study
to a new direction. Without Dr. Houang’s help, I would not have been able to complete the
dissertation study.
Working with Dr. Williams Schmidt as both a graduate research assistant and a teaching
assistant has beneﬁted me at many levels. His guidance is invaluable to both me and my
wife, as we both were exceptionally fortunate to have him on our dissertation committee.
I have also been very fortunate to be ﬁnancially supported by Dr. Diemer working several
vi

very important projects that helped shape my research interests on educational equity and
school inclusion.
I am also very grateful to Dr. Jack Schwille, Dr. David Wiley, Dr. Richard Wolfe, Dr.
Ingrit Monk for their thoughtful suggestions and encouraging inputs during my development
of my dissertation proposal.
Throughout my doctoral studies, many people have supported my professional growth
in one way or another: I’m grateful to professors from department of statistics, Dr. James
Stapleton ( Categorical data analysis, experimental design, sampling) and Dr. Lijian Yang
(Regression), and then student colleague and friend and now statistics professors, Dr. Weixing Song (Kansas State University). I really appreciated the research work done through
the ﬁnancial supports I was awarded by Drs.Betsy J. Becker (Meta-analysis) and Mary M.
Kennedy (Science Education), Dr. Akiho Kamata ( Psychometrics and Equating ), Dr.
Richard Tate (Multilevel-modeling), Dr. Yeo Meng Thum (Hierarchical Linear Modeling),
Dr. Barbara Schneider (Causal inference), my coauthors and friends Dr. Hui Liu and Dr.
Brandon Vaughn. I am especially thankful that I have studied with two great scholars,
Dr. Tenko Raykov (Structural Equation Modeling and Reliability ) and Dr. Mark Reckase
(Psychometrics and Multidimensional Item Response Theory).
Many friends Benjamin Ong, Brian Lengseth, David Rayes-Gastelum, and Amanda Lewis
also helped me through my doctoral studies. I greatly value their friendship and deeply
appreciate their conﬁdence in me. Their support and care helped me overcome setbacks and
stay focused on my study.
Dr. Yong Zhao and his wife Xi Chen and their son Yechen and daughter Athena have
been very supportive to me. They made my years at MSU meaningful and enjoyable with
vii

warm memories. Mr. Blaine Morrow and Mrs. Linda Morrow are always there with love,
support and encouragement to me and my family.
My dear wife, Dr. Jing Lei, is always there by my side with her unconditional love. I am
so lucky to share my life with her. Because of her, going through this process and ﬁnishing
my dissertation becomes such a wonderful journey.
This material is based upon work supported by the National Science Foundation under
Grant No. DUE-0831581. Any opinions, ﬁndings, and conclusions or recommendations
expressed in this material are those of the author and do not necessarily reﬂect the views of
the National Science Foundation.

PREFACE

11 Daniel then said to the guard whom the chief oﬃcial had appointed over Daniel,
Hananiah, Mishael and Azariah, 12 “Please test your servants for ten days: Give us nothing
but vegetables to eat and water to drink. 13 Then compare our appearance with that of
the young men who eat the royal food, and treat your servants in accordance with what you
see.” 14 So he agreed to this and tested them for ten days.
15 At the end of the ten days they looked healthier and better nourished than any of the
young men who ate the royal food. 16 So the guard took away their choice food and the wine
they were to drink and gave them vegetables instead. (Daniel 1:11-16 New International
Version)

ix

Contents
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

xiv

List of Figures

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvi

Nomenclature

xvii

1 Introduction
1.1 Research Goals . . . . . . . .
1.2 Solomon Four-Group Design .
1.3 Synthetic Cohort Design . . .
1.4 Why Is HEoG Critical in SCD
1.5 Signiﬁcance . . . . . . . . . . .

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

2 Literature Review
2.1 Deﬁnitions of Bias and Selection Bias . . . . . . . . . . . . .
2.1.1 Mathematical Deﬁnition of Bias at Individual Level . .
2.1.2 How Selection Bias Aﬀects Treatment Eﬀect Estimate .
2.1.3 Selection Bias in Hierarchically Structured Data . . . .
2.2 Propensity Score Matching for Bias Reduction . . . . . . . .
2.3 Matching on Hierarchically Structured Data . . . . . . . . . .
2.3.1 Level-1 Matching . . . . . . . . . . . . . . . . . . . . .
2.3.2 Level-2 Matching . . . . . . . . . . . . . . . . . . . . .
2.3.3 Dual-Matching . . . . . . . . . . . . . . . . . . . . . .
2.4 Measurement Errors and Matching . . . . . . . . . . . . . . .
2.4.1 Measurement Errors Adjusted Propensity Scores . . .
2.4.2 Structural Equation Modeling as an Alternative . . . .
2.5 Omitted Variables . . . . . . . . . . . . . . . . . . . . . . . .
3 Theoretical Framework
3.1 Solomon Four-Group Design in SEM Framework . . . . . .
3.1.1 SEM of Experimental Group 1 . . . . . . . . . . .
3.1.2 SEM of Control Group 1 . . . . . . . . . . . . . .
3.1.3 SEM of Experimental Group 2 . . . . . . . . . . .
3.1.4 SEM of Control Group 2 . . . . . . . . . . . . . .
3.1.5 Pre-Equivalence of Groups (PEoG) Assumption . .
3.2 Extended Solomon Four-Group Design in SEM Framework
3.2.1 Extended-PEoG Assumption . . . . . . . . . . . .
x

.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.

.
.
.
.
.

1
3
5
7
10
11

.
.
.
.
.
.
.
.
.
.
.
.
.

13
14
14
16
18
20
22
24
24
25
25
26
28
30

.
.
.
.
.
.
.
.

33
33
35
36
37
37
37
40
43

3.3
3.4

Synthetic Cohort Design in the Context of Solomon Four-Group Design . . .
Matching and HEoG Assumption . . . . . . . . . . . . . . . . . . . . . . . .

47
50

4 Simulation Study
53
4.1 Data and Conceptual Model . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.1 Two-Level Structural Equation Model Based on Data of SIMS-USA . 63
4.1.2 Longitudinal Data Generation . . . . . . . . . . . . . . . . . . . . . 73
4.2 Generate Synthetic Cohort Design Data with Simulated Selection Bias . . . 80
4.2.1 Generate Hierarchically Structured C1T 1 Data with Selection Bias . 83
4.2.1.1 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s . . . . 83
4.2.1.2 C1T 1’s Level-1 Covariate Variances Diﬀer from C2T 0’s . . . 84
4.2.1.3 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s . . . . 84
4.2.1.4 C1T 1’s Level-2 Covariate Variances Diﬀer from C2T 0’s . . . 85
4.2.1.5 C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s 85
4.2.2 Generate Data for Matching on Latent Variables v.s. Matching on
Surrogate Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.2.2.1 C1T 1’s Surrogate Variable Means Diﬀer from C2T 0’s, with
the Same Latent Means and Low Reliability . . . . . . . . . 86
4.2.2.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s,
with the Same Surrogate Means and the Same Latent Means 88
4.2.2.3 C1T 1’s Surrogate Variables Have Higher Reliability, Diﬀerent Latent Variable Mean from C2T 0’s . . . . . . . . . . . . 89
4.2.2.4 C1T 1’s Latent Variable Mean Diﬀer from C2T 0’s, with Same
Higher Reliability . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2.3 Manipulate R2 to Generate Data for Matching . . . . . . . . . . . . . 90
4.2.3.1 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with
2
Level-1 Variance σepre Reduced by Half . . . . . . . . . . . 90
4.2.3.2 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with
2
Level-1 Variance σepre Reduced by Half, and Initial Diﬀerence Reduced . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2.3.3 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with
2
Level-2 Variance σuα Reduced by Half . . . . . . . . . . . 91
0
4.2.3.4 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with
2
Level-2 Variance σuα Reduced by Half and Initial Diﬀerence
0
Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.3.5 C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s
with Both Level-1 and Level-2 Variances Reduced by Half . 92
4.2.3.6 C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s,
with Both Level-1 and Level-2 Variances Reduced by Half
and Total Initial Diﬀerence Reduced . . . . . . . . . . . . . 92
4.3 Simulation Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.3.1 Compute Initial Diﬀerence . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3.2 Compute After Matching Bias . . . . . . . . . . . . . . . . . . . . . . 94
xi

4.3.3

Compute Bias Reduction Rate

. . . . . . . . . . . . . . . . . . . . .

95

5 Matching Simulation Results and Discussions
5.1 Three Types of Matching Routines . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 Level-1 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.2 Level-2 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.3 Dual Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 Simulation Results of Matching on Level-1 and/or Level-2 Covariates . . . .
5.2.1 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s . . . . . . . . .
5.2.2 C1T 1’s Level-1 Covariate Variances Diﬀer from C2T 0’s . . . . . . . .
5.2.3 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s . . . . . . . . .
5.2.4 C1T 1’s Level-2 Covariate Variances Diﬀer from C2T 0’s . . . . . . . .
5.2.5 Dual Matching Simulation Results . . . . . . . . . . . . . . . . . . .
5.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3 Simulation Results of Matching on Level-1 Latent Variable and Surrogate
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.1 C1T 1’s Surrogate Variable Means Diﬀer from C2T 0’s, with the Same
Latent Means and Low Reliability . . . . . . . . . . . . . . . . . . . .
5.3.2 C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with
the Same Surrogate Means and the Same Latent Means . . . . . . . .
5.3.3 C1T 1’s Surrogate Variables Have Higher Reliability, Diﬀerent Latent
Variable Mean from C2T 0’s . . . . . . . . . . . . . . . . . . . . . . .
5.3.4 C1T 1’s Latent Variable Mean Diﬀer from C2T 0’s, with the Same
Higher Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4 Simulation Results of Matching When R2 Being Manipulated . . . . . . . .
5.4.1 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with Level-1
Variance Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . .
5.4.2 C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with Level-1
Variance Reduced by Half, and Initial Diﬀerence Reduced . . . . . .
5.4.3 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with Level-2 Variance Reduced by Half . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4.4 C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with Level-2 Vari2
ance σuα Reduced by Half, and Initial Diﬀerence Reduced by Half .
0
5.4.5 C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s with
Both Level-1 and Level-2 Variances Reduced by Half . . . . . . . . .
5.4.6 C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s, with
Both Level-1 and Level-2 Variances Reduced by Half, and Total Initial
Diﬀerence Reduced . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

96
96
96
98
99
100
100
100
101
101
103
103
104
105
107
108
108
109
110
112
112
113
113
114

114
115

6 Discussions
116
6.1 Extend the Analysis to another Type of Math Classes . . . . . . . . . . . . . 117
6.2 Incomplete Matching Due to Small Cluster Size . . . . . . . . . . . . . . . . 118
xii

6.3

Role of Covariates in Synthetic Cohort Design . . . . . . . .
6.3.1 On Which Covariates to Match . . . . . . . . . . . .
6.3.2 Concern on Chronological Variables such as Age and
OTL . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.3.3 Two Types of Level-2 Covariates . . . . . . . . . . .
6.3.4 Interaction Terms as Omitted Covariates . . . . . . .
6.4 Deal with Students under Retention in Matching . . . . . .
6.5 Improve Measurement Accuracy in Education Studies . . . .
6.6 Situations Where HEoG May Fail . . . . . . . . . . . . . . .
6.7 Statistical Power as an After-Matching Evaluation Index . .
6.8 After-Matching Statistical Analyses . . . . . . . . . . . . .
6.9 Synthetic Cohorts Design and Life-Course Research . . . . .
6.10 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . .
A Simulation Code
A.1 Mplus Code Fitting the Two-Level SEM on SIMS-USA Data
A.2 Mplus Code Generating Data for Mont Carlo Simulation . .
A.3 R Code for Level-1 Matching . . . . . . . . . . . . . . . . . .
A.4 Code for Level-2 Matching . . . . . . . . . . . . . . . . . . .
A.5 R Code for Dual Matching . . . . . . . . . . . . . . . . . . .

. . . .
. . . .
Grade
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

. . . . .
. . . . .
Speciﬁc
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .

122
123
123
124
125
126
127
129
129
132
136

.
.
.
.
.

138
138
141
143
149
153

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

.
.
.
.
.

121
122

B Variance-Covariance Decomposition of the Extended Solomon Four-Group
Design (SFGD) Based On Two-Level SEM Framework
158
B.1 Variance-Covariance Matrix of SFGD Experimental Group 1 . . . . . . . . . 159
B.2 Variance-Covariance Matrix of SFGD Experimental Group 2 . . . . . . . . . 160
B.3 Variance-Covariance Matrix of SFGD Control Group 1 . . . . . . . . . . . . 161
B.4 Variance-Covariance Matrix of SFGD Control Group 2 . . . . . . . . . . . . 162
B.5 Detailed Variance-Covariance Decomposition . . . . . . . . . . . . . . . . . . 163

xiii

List of Tables

3.1

Solomon Four-Group Design in Structural Equation Modeling Framework . .

34

3.2

SEMs of the Extended Solomon Four-Group Design and Covariance Matrixes

42

4.1

Level-1 Descriptive Statistics of the Final Two-Level Structural Equation
Model (N=2,296) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55

4.1

Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

56

4.2

Level-2 Descriptive Statistics of the Final Two-Level Structural Equation
Model (N=126) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

57

4.3

¯
The Level-1 Variance Covariance Matrix (S1 ) and Means (X1 ) . . . . . . . .

59

4.3

Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

60

4.4

¯
The Level-2 Variance Covariance Matrix (S2 ) and Means (W2 ) . . . . . . . .

61

4.4

Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

62

4.5

Two-Level Structural Equation Model Estimates (a.k.a True Pseudo-population
Parameter Values) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.5

Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

68

4.6

ˆ
Model Estimated Parameters: Level-1 Variance Covariance Matrix (Σ1 ) and
Mean (ˆ1 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
µ

69

4.6

Continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

70

4.7

ˆ
Model Estimated Parameters: Level-2 Variance Covariance Matrix (Σ2 ) and
Mean (ˆ2 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
µ

71

Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

72

4.7

xiv

4.8

Covariance Matrix of the Five Latent Variables . . . . . . . . . . . . . . . .

73

4.9

Class Size Distribution of 126 Classes of SIMS-USA Data . . . . . . . . . . .

74

4.10 Recovery of Pseudo-Population Parameters. . . . . . . . . . . . . . . . . . .

75

4.10 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

76

4.10 Continued. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

77

4.11 Possible Simulation Manipulations on Comparability of C2T0 and C1T1 in
SEM Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

82

4.12 Simulation Design of Matching on Latent and Surrogate Variables . . . . . .

87

5.1

Bias Reduction Rates of the Three Types of Matching . . . . . . . . . . . . . 102

5.2

Simulation Results of Matching on Level-1 Latent Variable and Surrogate
Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

5.3

Bias Reduction Rates of Three Type of Matching with Higher R2 . . . . . . 111

xv

List of Figures
1.1

The Solomon Four-Group Design: R represents group randomization, T treatment, and O assessment. Besides randomization, matching as the other approach to create comparable groups (Solomon, 1949). . . . . . . . . . . . . .

5

Longitudinal vs. Quasi-Longitudinal Comparison. For interpretation of the
references to color in this and all other ﬁgures, the reader is referred to the
electronic version of this dissertation. . . . . . . . . . . . . . . . . . . . . . .

8

3.1

Synthetic cohort design in the context of Solomon Four-Group Design-G1 . .

47

3.2

Three data sets, two-way matching and the HEoG assumption . . . . . . . .

52

4.1

Conceptual framework model on SIMS-USA data . . . . . . . . . . . . . . .

63

4.2

Two-level structural equation model on SIMS-USA data . . . . . . . . . . .

65

1.2

xvi

Nomenclature
SCD

Synthetic Cohort Design, a quasi-longitudinal design, page 1.

TIMSS

the Third International Mathematics and Science Study, page 1.

SIMS

the Second International Mathematics Study, page 1.

CjT t

Cohort j at Time t, with j = 1, 2 and t = 0, 1, page 1, page 51.

δC2T 1−C2T 0

schooling eﬀect based on Cohort 2 across Time 0 and Time 1 in
longitudinal design, page 1.

δC2T 1−C1T 1

schooling eﬀect based on Cohort 1 and Cohort 2 at Time 1 in quasilongitudinal design, page 1.

HEoG

the historical equivalence of groups assumption, page 2.

X

vector of level-1 and level-2 covariates, page 2.

x

a covariate, page 2.

p

number of covariates in vector X, page 2.

DF

the discriminant function, page 2.

V1
Σw

the ﬁrst eigenvector vector, page 2.

Σb

between-group variance covariance matrixes of X s, page 2.

DDA

descriptive discriminant analysis, page 2.

R2

proportion of the explained variance by the model, page 3.

SEM

structural equation model, page 3.

SFGD

the Solomon Four-Group Design, page 4.

R

group randomization, page 5.

O

operation of assessment, page 5.

within-group variance covariance matrixes of X s, page 2.

xvii

T

treatment, page 5.

Yti

outcome vector of group i at time t, with i = E1 , C1 , E2 , C2 ; t =
0,1, page 6.
outcome vector of group i at time t, with i = E1 , C1 , E2 , C2 ; t =
0,1, page 33.

¯
Yti

mean of group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0,1, page 6.

E1 , E2

Experimental Group 1, 2 in SFGD, respectively, page 6.

C1 , C2

Control Group 1, 2 in SFGD, respectively, page 6.

α

main eﬀect due to history or prior learning in SFGD, page 6.

C2
α0

main eﬀect due to history or Prior learning of Cohort 2(8th grade)
at Time 0 in SCD, respectively, page 7.

C1
α0

main eﬀect due to history or Prior learning of Cohort 1(7th grade)
at Time 0 in SCD, respectively, page 7.

τ

main eﬀect due to taking the pre-test in SFGD, page 6.

τ C1

main eﬀect due to taking the pre-test of Cohort 1(7th grade) in SCD,
page 7.

τ C2

main eﬀect due to taking the pre-test of Cohort 2 (8th grade)in SCD,
page 7.

α×τ

joint eﬀect (the interaction) of prior learning (α) and taking the pretest (τ ) in SFGD, page 7.

γ

main eﬀect due to maturation from Time 0 to Time 1 in SFGD,
page 6.

γ C1

main eﬀect due to maturation between Time 0 and Time 1 of Cohort
1(7th grade) in SCD, page 7.

γ C2

main eﬀect due to maturation between Time 0 and Time 1 of Cohort
2 (8th grade)in SCD, page 7.

δ

main eﬀect due to the treatment in SFGD, page 6.
population level treatment eﬀect, see equation (2.7), page 16.
intervention eﬀect in SEM, see equation (3.2), page 35.
intercept of the measurement model of outcome Y in the extended
SEM-based SFGD, see equation (3.12), page 41.
xviii

δ C1

main eﬀect due to schooling eﬀect of 7th grade instruction in Cohort
1 in SCD, page 7.

¯
Ytl

the mean of the dependent variable for cohort l at time t, l = C1, C2
and t = 0, 1 in SCD, page 7.

E(.)

Expectation/Mean function, page 6.

C1, C2

Cohort 1 (7th grade), Cohort 2 (8th grade) in SCD, respectively,
page 7.

⇒

reads implies, page 9.

f (.)

some additive function, page 6.
a function estimating the propensity score, page 80.

|

reads given, page 9.

BIAS(.)

bias function of an estimator, see equation (1.2), page 9.

ˆ
BIAS(.)

manipulated bias, see equation (4.6), page 84.

ICC

intraclass correlation, page 13.

D
yi

ith response/outcome of group D, see equation (2.1), page 15.

D

group membership index variable: treatment (D = 1) or control
(D = 0), see equation (2.1), page 15.

nD

Dth group size, see equation (2.1), page 15.

uD
i

the ith random errors in Dth group, see equation (2.1), page 15.

µD

the mean of Dth group’s outcome variable y, see equation (2.1),
page 15.

µD
X

the mean vector of covariates X in Dth group, page 15.

i

random error, which is u1 minus u0 , see equation (2.2), page 15.
i
i

M D (X)

the mean function of Dth group in terms of covariates X, see equation (2.4), page 15.

αD

intercept of the regression equation of outcome y D in Dth group, see
equation (2.4), page 15.

βD

covariates X’s regression coeﬃcient vector in the regression equation
of outcome y D in Dth group, see equation (2.4), page 15.
xix

δ(X)

treatment eﬀect of the counterfactual model including covariates X’s,
see equation (2.6), page 16.

∆(X)

treatment eﬀect bias of the counterfactual model including covariates
X’s, see equation (2.7), page 16.

∆X

non-zero constant vector, the treatment and control group mean difference of covariates X , see equation (2.8), page 17.

∆β

β 1 − β 0 , diﬀerence of covariates X’s regression coeﬃcients between
treatment group and control group, see equation (2.12), page 17.

βX×D

regression coeﬃcient vector of the interaction terms between covariates X and treatment status variable D, page 18.

D
Yik

outcome vector of ith individual in k th cluster of the Dth group, see
equation (2.15), page 19.

D
Xik

level-1 covariates X measured on ith individual in k th cluster of the
Dth group, see equation (2.15), page 19.

D
Wik

level-2 covariates W measured on ith individual in k th cluster of the
Dth group, see equation (2.15), page 19.

µX

the population mean vector of level-1 covariates, see equation (2.15),
page 19.

µW

the population mean vector of level-2 covariates, see equation (2.15),
page 19.

µX

the population mean vector of k th class’s level-1 covariates, see equa-

k

tion (2.15), page 19.

β

pooled within-level-2-unit regression coeﬃcient vector of the level-1
variables, see equation (2.15), page 19.

βk

within-level-2-unit regression coeﬃcient speciﬁcally for k th level-2
school, see equation (2.15), page 19.

βX

the regression coeﬃcient vector of the observed level-1 variables, see
equation (2.15), page 19.

βW

the regression coeﬃcient vector of the observed level-2 variables, see
equation (2.15), page 19.

CRT

cluster-randomized trial, page 22.

ECLS

the Early Childhood Longitudinal Program, page 22.
xx

LSAY

the Longitudinal Study of American Youth, page 22.

˜
β

attenuated regression coeﬃcient due to measurement errors in the
covariate x, page 25.

R

attenuation rate due to measurement errors in the covariate x, page 25.

ρ

reliability coeﬃcient, page 26.

P r(D = 1|X)

propensity score function in terms of covariates X, page 26.

logit

logit function, see equation (2.17), page 27.

H

the covariate vector,(h1 , ..., hq ) without measurement errors, see
equation (2.17), page 27.

X∗

the true covariates measured by vector X with errors, see equation (2.18), page 27.

N

sample size index, page 26.

n1 , n2

sample size of sample 1 and sample 2, respectively, page 26.

r

residual term, see equation (2.18), page 27.

logit−1

the inverse-logit function, or logistic function, see equation (2.21),
page 27.

X ∗ |X, H

the true X ∗ given X and H, page 28.

µX ∗ |X,H

the mean vector of the conditional distribution of the true X ∗ given
X and H, page 28.

MN

multivariate normal distribution, page 28.

N

univariate normal distribution, page 41.

σX ∗ |X,H

the variance-covariance matrix of the conditional distribution of the
true X ∗ given X and H, page 28.

MCMC

Markov chain Monte Carol, page 28.

ι0 , ι1 , ι∗ , ι∗
0 1

regression coeﬃcients in the hybrid model, see equation (2.26), page 29.

2 2
σ1 , σ2

within level-2 unit variance and between level-2 unit variamnce, respectively, see equation (2.27), page 32.

η0 , η1

latent mathematics proﬁciency at pre-test and post-test time points,
respectively, page 35.
xxi

ε0 , ε1

residual terms at pre-test and post-test time points, respectively, see
equation (3.1), page 35.

a0 , a1

factor loading vectors at pre-test and post-test time points, respectively, page 36.

b0 , b1

item diﬃculty parameter vectors at at pre-test and post-test time
points, respectively, page 36.

ν

acceleration eﬀect of intervention, i.e., the regression coeﬃcient in
the structural equation of SEM , see equation (3.5), page 36.

⊥

reads perpendicular or independent of , page 38.

PEoG

Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based
SFGD, page 37.

PEoG-1

Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based
SFGD’s Group 1, page 38.

PEoG-2

Pre-Equivalence of Groups (PEoG) Assumption in the SEM-based
SFGD’s Group 2, page 38.

SFGD-G2

SFGD’s Group 2, page 40.

λ

factor loading of the measurement model of outcome Y in the extended SEM-based SFGD, see equation (3.12), page 41.

v, g

intercept and factor loading of the measurement model of covariates
X in the extended SEM-based SFGD, see equation (3.12), page 41.

e1 , e2

residual terms of the measurement models of Y and X in the extended
SEM-based SFGD, see equation (3.12), page 41.

η, ξ

latent variables (factors) of the measurement models of Y and X in
the extended SEM-based SFGD, see equation (3.12), page 41.

V(ξ)

latent variable ξ’s variance, Ψξ in the extended SEM-based SFGD,
see equation (B.3), page 163.

Ψb , Ψw
ξ ξ

latent variable ξ’s within-cluster and within-cluster variances in the
extended SEM-based SFGD, see equation (B.3), page 163.

V(η)

latent variable η’s variance, Ψη in the extended SEM-based SFGD,
see equation (B.9), page 164.

Ψb , Ψw
η η

latent variable η’s within-cluster and within-cluster variances in the
extended SEM-based SFGD, see equation (B.9), page 164.
xxii

Y0 , Y1

outcome variable Y at Time 0 and Time 1, respectively, in the extended SEM-based SFGD, see equation (3.14), page 41.

δ0 , δ1

intercept vectors of the measurement model of outcome Y0 and Y1 ,
respectively, in the extended SEM-based SFGD, see equation (3.14),
page 41.

λ0 , λ1

factor loading vectors of the measurement model of outcome Y0
and Y1 , respectively, in the extended SEM-based SFGD, see equation (3.14), page 41.

e10 , e11

residual terms of the measurement model of outcome Y0 and Y1 ,
respectively, in the extended SEM-based SFGD, see equation (3.14),
page 41.

X0 , X1

covaraites X at Time 0 and Time 1, respectively, in the extended
SEM-based SFGD, see equation (3.16), page 41.

v0 , v1

intercept vectors of the measurement model of X0 and X1 , respectively, in the extended SEM-based SFGD, see equation (3.16), page 41.

g0 , g1

factor loading vectors of the measurement model of X0 and X1 , respectively, in the extended SEM-based SFGD, see equation (3.16),
page 41.

e20 , e21

residual terms of the measurement model of X0 and X1 , respectively,
in the extended SEM-based SFGD, see equation (3.16), page 41.

Θe1 , Θe2

variances of the residual terms in the extended SEM-based SFGD,
see equation (3.12), page 41.

Θw , Θb
e2 e2

within-cluster and within-cluster variances of the residual e2 in the
extended SEM-based SFGD, see equation (B.4), page 164.

Θw , Θb
e1 e1

within-cluster and within-cluster variances of the residual e1 in the
extended SEM-based SFGD, see equation (B.12), page 165.

A, B, U

intercept, factor loading, and residual term of the structural model
in the extended SEM-based SFGD, see equation (3.13), page 41.

ΘU

variance of the residual term of the structural model in the extended
SEM-based SFGD, see equation (3.13), page 41.

Θb , Θw
U U

within-cluster and within-cluster variances of the residual U of the
structural model in the extended SEM-based SFGD, see equation (B.8),
page 164.
xxiii

a, π

intercept and regression coeﬃcient of the structural model in the
extended SEM-based SFGD, see equation (3.17), page 43.

A0 , A1

intercept vectors of the structural model in the extended SEM-based
SFGD, see equation (3.18), page 43.

B0 , B1

factor loading vectors of the structural model in the extended SEMbased SFGD, see equation (3.18), page 43.

U0 , U1

residual terms , see equation (3.18), page 43.

Y eari , Y eari+1

two adjacent years in the longitudinal design, page 48.

Φw , Φb
X X

within-cluster and within-cluster variances of X in the extended
SEM-based SFGD, see equation (B.5), page 164.

Φw , Φb
Y Y

within-cluster and within-cluster variances of Y in the extended SEMbased SFGD, see equation (B.13), page 165.

Ψw , Ψb
η0 η0

within-cluster and within-cluster variances of latent variable η0 in
the extended SEM-based SFGD, see equation (B.24), page 168.

Ψw , Ψb
ξ0 ξ0

within-cluster and within-cluster variances of latent variable ξ0 in
the extended SEM-based SFGD, see equation (B.29), page 169.

V(.)

variance function, page 82.

Cov(.)

covariance function , page 159.

¯
X1 , S1

level-1 variable mean vector and variance-covariance matrix, page 58.

SIMS-USA

SIMS data collected in the United States, page 58.

µ1 , Σ 1
ˆ ˆ

estimated level-1 variable mean vector and variance-covariance matrix, page 66.

¯
W2 , S2

level-2 variable mean vector and variance-covariance matrix, page 58.

µ2 , Σ 2
ˆ ˆ

estimated level-2 variable mean vector and variance-covariance matrix, page 66.

STU, SCH

represent STUDENT and SCHOOL, respectively in Figure 4.1, page 63.

OTL

opportunity to learn, measured by the curriculum coverage, page 63.

Coef.

loading or regression coeﬃcient, page 67, 68.

SE

standard error, page 67, 68.
xxiv

PV

p-value, page 67, 68.

POSTTEST

post-test outcome variable, see equation (4.1), page 64.

α0 , α1

intercept and regression coeﬃcient of the level-1 model of post-test,
see equation (4.1), page 64.

epre , epost

error terms of the level-1 model of pre- and post-test, respectively,
page 64.

PRETEST

pre-test outcome variable, see equation (4.2), page 64.

EDUCEPT

education expectation, see equation (4.2), page 64.

EDUINSP

latent variable, educational inspiration, see equation (4.2), page 64.

SLFENCRG

latent variable, self encouragement, see equation (4.2), page 64.

FMLSUPRT

latent variable, family support, see equation (4.2), page 64.

MTHIMPT

latent variable, importance of learning mathematics, see equation (4.2),
page 64.

SES

latent variable, socioeconomic status, see equation (4.2), page 64.

β0 , . . . , β 9

intercept and regression coeﬃcients of the level-1 model of pre-test,
see equation (4.2), page 64.

2
2
σepre , σe
post

variances of error terms of the level-1 model of pre- and post-test,
respectively, page 64.

CLASSSIZE

class-size, see equation (4.3), page 64.

MTHONLY

proportion of qualiﬁed math teachers, see equation (4.3), page 64.

PRETTEST MEAN

β0 , intercept of level-1 pre-test score model in Figure 4.2, page 65.

POSTTEST MEAN

α0 , intercept of level-1 post-test score model in Figure 4.2, page 65.

γ0 , . . . , γ4

intercept and regression coeﬃcients of the level-2 model of pre-test,
see equation (4.3), page 64.

γ5 , . . . , γ7

regression coeﬃcients of the level-2 model of post-test, see equation (4.4), page 66.

uβ , uα0
0

error terms of the level-2 model of pre- and post-test, respectively,
page 64.

2
2
σu , σuα
β0
0

variances of error terms of the level-2 model of pre- and post-test,
respectively, page 66.
xxv

v C2T 0 , v C1T 1

intercept vector of the measurement model of covariates X in Cohort
2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

C2T
C1T
e2 0 , e2 1

residual vector of the measurement model of covariates X in Cohort
2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

ξ C2T 0 , ξ C1T 1

latent factor vector of the measurement model of covariates X in
Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

g C2T 0 , g C1T 1

factor loading vector of the measurement model of covariates X in
Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

δ C2T 0 , δ C1T 1

intercept vector of the measurement model of outcome Y in Cohort
2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

C2T
C1T
e1 0 , e1 1

residual vector of the measurement model of outcome Y in Cohort 2
at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

η C2T 0 , η C1T 1

latent factor vector of the measurement model of outcome Y in Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

λC2T 0 , λC1T 1

factor loading vector of the measurement model of outcome Y in
Cohort 2 at Time 0 and in Cohort 1 at Time 1, respectively, page 82.

AC2T 0 , AC1T 1

intercept vector of the structural model in Cohort 2 at Time 0 and
in Cohort 1 at Time 1, respectively, see equation (4.6), page 82.

B C2T 0 , B C1T 1

factor loading vector of the structural model in Cohort 2 at Time 0
and in Cohort 1 at Time 1, respectively, see equation (4.6), page 82.

U C2T 0 , U C1T 1

residual term of the structural model in Cohort 2 at Time 0 and in
Cohort 1 at Time 1, respectively , see equation (4.6), page 82.

c1

a (constant) vector , page 83.

p1

a (multiplier) vector , page 84.

p2

a (multiplier) vector, page 85.

c3

a (constant) vector, page 86.

SU M (.)

sum function adding up all the components of a matrix, page 87.

c4

a (constant) vector , page 89.

ˆP
δi osP re

estimate of schooling eﬀect δ P osP re from sample i (size is ni ) based
upon longitudinal C2T 0 − C2T 1 data, see equation (4.10), page 93.
xxvi

ˆSCD
δi

ith estimate of schooling eﬀect δ SCD based upon C1T 1 − C2T 1
data, see equation (4.11), page 93.

δ
BIASinitial

initial estimation bias on schooling eﬀect before matching , see equation (4.12), page 94.

M δ SCD
ˆ
i

estimate of schooling eﬀect δ SCD based upon matched C1T 1−C2T 1
data, see equation (4.13), page 94.

δ
BIASmatching

estimation bias on schooling eﬀect after matching, see equation (4.14),
page 94.

δ
BRRmatching

after matching bias reduction rate on schooling eﬀect estimate, see
equation (4.15), page 95.

nC2T 0

sample size of data from Cohort 2 at Time 0, page 97.

(Y, X, W )C2T 0
i

ith data record of the Cohort 2 at Time 0 sample, page 97.

nC1T 1

sample size of data from Cohort 1 at Time 1, page 97.

(Y, X, W )C1T 1
i

i th data record of the Cohort 1 at Time 1 sample, page 97.

p1

level-1 propensity score representing the probability that a student
belongs to focal Cohort 2, page 97.

p2

level-2 propensity score representing the probability that a class belongs to focal Cohort 2, page 98.

M in[a, b]

minimum distance between vector a and vector b, page 97.

ATT

average treatment eﬀect, page 119.

L N
y1i , y1i L

treatment units that can be matched from the local control group
and from the non-local, respectively, page 120.

xxvii

Chapter 1
Introduction
Synthetic Cohort Design (SCD) was proposed and used for cross-national comparisons of
schooling (Wiley and Wolfe, 1992) in the Third International Mathematics and Science Study
1995 (TIMSS 1995). In this design, growth is determined by comparing data of adjacent
grades. Two cohorts, 7th grade (Cohort 1) and 8th grade (Cohort 2, the focal cohort),
are measured at the same time point (Time 1). The SCD by nature is a quasi-longitudinal
design.
In a longitudinal design as usd in the Second International Mathematics Study (SIMS,
IEA, 1977), two waves of data are collected from only the focal cohort (i.e., Cohort 2) at Time
0 and Time 1. Cohort 2 at Time 0 serves the control (Burstein, 1992; Wolfe, 1987). After the
“treatment” of one year of schooling, data from Cohort 2 at Time 1 are collected to assess
the schooling eﬀect (δC2T 1−C2T 0 ), deﬁned as the average of “changes in mathematics
achievement over the time-span of one school year at the particular grade level” (Wiley and
Wolfe, 1992, p. 299).
The contrast between the two cohorts in SCD, a quasi-longitudinal schooling eﬀect
1

(δC2T 1−C1T 1 ) is a measure of δC2T 1−C2T 0 under the historical equivalence of groups
(HEoG) assumption. The HEoG assumption asserts that students in adjacent grades are
similar except for the additional year of schooling. The HEoG assumption implies that improving comparability 1 of adjacent grades will improve unbiasedness of the schooling eﬀect
estimate in the quasi-longitudinal SCD. The greater the comparability of the students in
adjacent grades, the less bias of δC2T 1−C1T 1 .
However, the HEoG assumption can be violated by selection bias 2 (Heckman, 1979),
resulting from the diﬀerence between Cohort 1 at Time 1 and Cohort 2 at Time 0. This
1 The comparability can be statistically tested through the multivariate group comparison
approach (Tatsuoka, 1971). The comparability of two groups is revealed by the discriminant
function (DF ) of covariates X (Tatsuoka, 1971). X includes p column vectors such as
level-1(student level) covariates and their interaction terms and level-2 (class or school level)
covariates and their interaction terms. It is denoted as X = (x1 , . . . , xp ). The DF is a
linear combination of the covariates X s. For example, the ﬁrst DF of X s can be written
as DF = v11 x1 + v12 x2 + · · · + v1p xp . Vector V1 = (v11 , . . . , v1p ) is the ﬁrst eigenvector
vector of Σw −1 Σb . Σw and Σb are the within-group and between-group variance covariance
matrixes of X s , respectively. Notice that within-group variance-covariance matrix Σw
should be computed by taking account of the hierarchical structure of the data (See Schmidt
and Houang, 1986). If (Σw )−1 Σb has q non-zero eigenvalues, then we can deﬁne q DF ’s,
namely, DF1 , DF2 , . . . , DFq . Using DF simpliﬁes group comparability testing when the
number of covariates is large. Following the descriptive discriminant analysis (DDA, Huberty
and Olejnik, 2006), the group comparability testing can determine if Cohort 2 at Time 0
is comparable to Cohort 1 at Time 1 regarding covariates X. A two step testing approach
can be conducted. First, one computes the latent roots of (Σw )−1 Σb to construct DF ’s
and test if two groups are ominously comparable or not. Second, if they are not ominously
comparable, univariate group comparison can reveal the non-comparability. Thus, a set
of covariates will be identiﬁed. The two groups should be non-comparable on each of the
covariates in terms of their means. The set of covariates then can be used as matching
variables.
2 Selection bias, also called sample selection bias (Heckman, 1979), refers to the bias due
to the use of non-random samples in estimating relationships among variables of interests. It
can occur in two situations: 1) self selection by objects being studied, and 2) sample selection
by researchers or data analysts. Using selection-biased samples results in a biased estimate
of the eﬀect of an intervention that should have been randomly assigned. The intervention
can refer to “treatment of migration, manpower training, or unionism” (Heckman, 1979, p.
154).
2

dissertation study consider three sources of selection bias that is due to: 1) hierarchical
school structure; 2) measurement errors on covariates; and 3) omitted variables.
First, both cohorts in SCD are naturally observed in the hierarchical school system
and selection bias can arise from level-1 and/or level-2 covariates. Second, biased estimate
of schooling eﬀect can occur due to implicit measurement errors of covariates, which are
commonly treated as perfect measures in analysis. Third, biased estimate of schooling eﬀect
can be caused by the omitted covariates, whose eﬀect is indicated by an attenuated R2 . R2 ,
as an index of goodness of ﬁt of regression, indicates the proportion of explained variance
by the model. The mathematical deﬁnition of bias of schooling eﬀect estimate is discussed
in Chapter 2. It outlines the potential of matching in reducing selection bias of SCD. A
structural equation model (SEM) framework is used to deﬁne the HEoG assumption of SCD
in Chapter 3. Simulation studies are designed in Chapter 4 and 5 to examine how well
matching reduces each of the three types of selection bias.

1.1

Research Goals

This dissertation study uses a multi-level multivariate propensity score matching approach
to examine the SCD in estimating the schooling eﬀect on the focal cohort’s learning of
mathematics. The simulation is based on a two-level structural equation model developed
from USA data of the Second International Mathematics Study (SIM S − U SA, IEA, 19781982).
Three types of simulated selection bias correspond to the three sources of selection bias:
hierarchical school structure, measurement errors on covariates, and omitted variables. Performances of matching in reducing estimation bias of the schooling eﬀect estimate is evaluated
3

using the bias reduction rate.
To reduce the simulated selection bias on level-1 and level-2 covariates in the hierarchical
school structure, dual matching (combining both individual matching and cluster matching)
is proposed. When simulated selection bias is due to measurement errors on the covariates,
latent variable matching is proposed to reduce selection bias. Simulated selection bias due
to omitted covariates is realized by manipulating the values of R2 , level-1 and level-2. These
simulation designs examine how well matching can reduce selection bias in the use of SCD.
In particular, contrasting the quasi-longitudinal SCD with an optimal longitudinal design
- the Solomon Four-Group Design (SFGD, Solomon, 1949; Campbell & Stanley, 1966) in the
SEM framework, this study focuses on the following three research questions:

1. What is the relationship between SCD and SFGD?
1.1 What are the strengths and weaknesses of SCD compared with SFGD?
1.2 How can each type of selection bias fail SCD in educational settings?
2. How does the HEoG assumption mathematically assure unbiased estimates in SCD?
2.1 What is the mathematical deﬁnition of HEoG assumption?
2.2 What are the statistical deﬁnitions of the three types of selection bias that
violate the HEoG assumption?
3. Does matching reduce the selection bias, and, if so, to what extent?

Section 1.3 and 1.4 introduce the SCD and the SFGD. Section 1.5 explains the importance
of HEoG in SCD. Chapter 2 reviews related literature to identify the sources of selection
bias and delineate the use of matching to reduce bias. Chapter 3 compares and contrasts
4

SCD and SFGD to mathematically delineate the HEoG assumption. Chapter 4 discusses
the simulation models and parameter manipulations to mimic each situation that results
in selection bias and violation of the HEoG assumption. Chapter 5 presents the simulation
results and reveals to which level the proposed matching approach reduces the selection bias.
Chapter 6 draws conclusions, discusses the limitations, and outlines future research plans.

1.2

Solomon Four-Group Design

In the SFGD, participants are randomly assigned to one of four diﬀerent groups (See Figure
1.1). For example, treatment T can be a particular instructional method. The dependent

Figure 1.1: The Solomon Four-Group Design: R represents group randomization, T treatment, and O assessment. Besides randomization, matching as the other approach to create
comparable groups (Solomon, 1949).

variable is measured on O’s, administered as pre-test (Time 0, before T) and post-test (Time
1, after T). The SFGD investigates if changes on the dependent variable are due to some
interaction between the pre-test eﬀect (τ ) and the treatment eﬀect(δ). The peculiarity of
the SFGD is that it includes Group 2, which is an extension of the pre- and post-test control
group design (Campbell and Stanley, 1966). The pre- and post-test control group design
depicted in Group 1, Experimental and Control, provides the researcher an instrument to
estimate gain at the individual level that is attributed to the treatment, plus potentially the
5

eﬀect from taking the pre-test. Group 2, Experimental and Control, is a replicate of the
treatment-control study, except that the subjects do not receive the pre-test and are thus
free from the inﬂuence of the pre-test. Denote
¯j
Yt : mean of group j at time t, with j = E1 , C1 , E2 , C2 ; t = 0,1,
α : main eﬀect due to history or prior learning3 ,
τ : main eﬀect due to taking the pre-test,
γ : main eﬀect due to maturation (between Time 0 and Time 1),
δ : main eﬀect due to the treatment.
The expected values of the means of the four groups can be expressed as
¯E
¯E
Experimental Group 1: E(Y0 1 ) = α E(Y1 1 ) = α + γ + τ + δ
¯C
¯C
Control Group 1:
E(Y0 1 ) = α E(Y1 1 ) = α + γ + τ
¯E
Experimental Group 2:
E(Y1 2 ) = α + γ + δ
¯C
Control Group 2:
E(Y1 2 ) = α + γ.
While this is a main eﬀect model, interaction eﬀects, if identiﬁable, can be parameterized
using the four main eﬀects, which accounts for additional diﬀerences among the means
4 (Solomon, 1949, p.143). Randomization, however, is a very powerful requirement because
all initial diﬀerences among the groups are attributed to sampling variation. For example,
with randomization, the main eﬀect due to prior learning can be assumed to be constant and
3 Prior learning eﬀect α was not speciﬁed in Solomon (1949). It is important to specify
it in this study for three reasons. First, it is a quantity that relates or indicates initial
comparability of the groups. Second, it involves the process of computing treatment eﬀect
(see the sub-section of Synthetic Cohort Design in this study.). Third, more importantly, it
will be a critical criterion to match the groups.
4 For example, pre-test and treatment interaction eﬀect is a function of the four quantities.
The quantity in Experimental Group 1 is QE = f (α + δ + γ + τ + I), the quantity in
1
Experimental Group 2 is QE = f (α + γ + δ), The quantity in Control Group 1 is QC =
2
1
f (α + τ + γ), the quantity in Control Group 2 is QC = f (α + γ). Interaction eﬀect denoted
2
as I is computed as QE − QE − QC + QC .
1
2
1
2
6

identical for all groups. This does not mean that the performances of groups on the pretest
are identical, but that the diﬀerences are solely due to sampling variation.
Furthermore, randomization also renders the interaction eﬀects to be indistinguishable
from the main eﬀects. For example, the joint eﬀect (the interaction) of prior learning (α)
and taking the pre-test (τ ) is confounded with τ , in the sense that α × τ cannot be separated
from τ (Solomon, 1949, p.148). In summary, randomization provides the main justiﬁcation
for the main eﬀect model and the tools to obtain an unbiased estimate of the treatment
eﬀect, δ.

1.3

Synthetic Cohort Design

Figure 1.2 depicts the time line of a SCD. Four possible sets of data, two cohorts at two
time points, can be collected. But the SCD intends to collect data at only Time 1. The
hypothetical data at Time 0 are important to the comparison of SCD with the SFGD.
¯
Following the notation used before, Ytl is denoted as the mean of the dependent variable for
cohort l at time t, l = C1, C2 and t = 0, 1.

Then the expected values of the dependent

variable at four data points can be hypothetically parameterized as
C2
¯ C2
Cohort 2, Time 0: E(Y0 ) = α0
C1
¯ C1
Cohort 1, Time 0: E(Y0 ) = α0
C2
¯ C2
Cohort 2, Time 1: E(Y1 ) = α0 + δC2T 0−C2T 1 + γ C2 + τ C2
C1
¯ C1
Cohort 1, Time 1: E(Y1 ) = α0 + δ C1 + γ C1 + τ C1
Putting Figure 1.2 into the context of TIMSS 1995, Cohort 1 corresponds to 7th graders,
and Cohort 2 corresponds to 8th graders. Data are collected at only Time 1. The SCD
investigates δC2T 0−C2T 1 , which is the eﬀect of 8th grade instruction on student learning.
7

Figure 1.2: Longitudinal vs. Quasi-Longitudinal Comparison. For interpretation of the
references to color in this and all other ﬁgures, the reader is referred to the electronic version
of this dissertation.
The schooling eﬀect of 7th grade instruction is δ C1 .
Students do not take pre-test at Time 0. Thus, τ C1 = τ C2 = 0. Eﬀects due to
maturation, γ’s, are confounded with eﬀects due to history or prior learning α’s. The SCD
model, at only Time 1, is as follows:
C2
¯ C2
Cohort 2, Time 1: E(Y1 ) = α0 + δC2T 0−C2T 1
C1
¯ C1
Cohort 1, Time 1: E(Y1 ) = α0 + δ C1
An estimate of δC2T 0−C2T 1 through SCD is

C1
C2
¯ C2 ¯ C1
δC2T 1−C1T 1 = E(Y1 − Y1 ) = (α0 + δC2T 0−C2T 1 ) − (α0 + δ C1 ).

(1.1)

δC2T 1−C1T 1 represents what Cohort 1 at Time 1 would learn if they go through the school
8

system that students of Cohort 2 at Time 0 had gone through.
Under the HEoG assumption, Cohort 1 at Time 1 ( Cohort 1 at 7th grade) are ( Cohort
1 at 7th grade) are comparable with Cohort 2 at Time 0 (Cohort 2 at 7th grade), i.e.,
C1
C2
C1
C1
α1 ≡ α0 , and α1 is (α0 + δ C1 ).
Therefore, δC2T 1−C1T 1 produces an unbiased estimate of δC2T 0−C2T 1 under the
HEoG assumption. The HEoG assumption allows the claim that schooling eﬀect δC2T 1−C1T 1
and δC2T 0−C2T 1 are identical.
That is, mathematically HEoG ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 . Mathematically
“⇒” reads “implies”.
However, the lack of HEoG can cause biased schooling eﬀect estimation using SCD, and
the bias can be deﬁned as

ˆ
ˆ
BIAS(δC2T 1−C1T 1 ) = E(δC2T 1−C1T 1 ) − δC2T 0−C2T 1 5 .

(1.2)

Schooling eﬀect δC2T 1−C1T 1 and δC2T 0−C2T 1 are identical in the counterfactual sense.
That is,

(HEoG|Randomization) ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 ,

where “|” reads “given”.
In other words, randomization assures the HEoG assumption that allows the claim that
schooling eﬀect δC2T 1−C1T 1 and δC2T 0−C2T 1 are identical.
5 The mean of the sample distrubtion of δ
ˆ
C2T 1−C1T 1 is δC2T 1−C1T 1 . At population
level, δC2T 1−C1T 1 is an estimator of δC2T 0−C2T 1 . This way bias can be deﬁned as
BIAS(δC2T 1−C1T 1 ) = δC2T 1−C1T 1 − δC2T 0−C2T 1 .
9

1.4

Why Is HEoG Critical in SCD

While the use of randomization is commonly found in natural science, it is often diﬃcult,
if not impossible to implement6 in educational settings. Randomization at the individual
level is not practical in education (Hedges, 2007a) because the population of interest is
hierarchically structured, and students are naturally nested in clusters such as classrooms
and schools.
Alternatively, classrooms may be randomly assigned to treatment and control groups as
in the cluster-randomized trials design. Nevertheless, frequently we cannot conduct a study
by assigning classrooms to a “control” or non-treatment condition, which leads to moral and
legal concerns (Bloom, 2004). Even if it is legitimate in practice, randomization may still
result in non-comparable groups (Berger, 2005).
Without randomization, the comparability of Cohort 1 at Time 1 and Cohort 2 at Time 0
relies on the HEoG assumption. Because of the quasi-experimental nature of SCD, selection
bias (Heckman, 1979) must be accounted for. Selection bias indicates the potential violation
of the HEoG assumption in the quasi-experimental SCD.
Matching has been successfully used to reduce bias in the equation (1.2) at level-1 units
(Rosenbaum, 1986). It can also be used to reduce selection bias at level-2 units (Cox and
Reid, 2000; Freedman et al., 1990; Hong and Raudenbush, 2006; Raab and Butcher, 2001,
p. 29). Matching has the potential to reduce selection bias of SCD, so that

(HEoG|M atching) ⇒ δC2T 1−C1T 1 = δC2T 0−C2T 1 .
6 Rubin (1978) pointed out that “In some cases with strong prior knowledge, randomization may not be important” (p. 55).
10

In other words, matching creates a situation where HEoG assumption holds so that
schooling eﬀect δC2T 1−C1T 1 unbiasedly estimates δC2T 0−C2T 1 .
Nevertheless, matching can only reduce selection bias to a certain degree because of the
potential of having unobserved and unmatched characteristics. Further comparisons between
SCD and SFGD in the structural equation modeling framework expand the mathematical
deﬁnition of HEoG assumption that clariﬁes the use of SCD to estimate schooling eﬀects in
Chapter 3.

1.5

Signiﬁcance

This study is important for many reasons. First, this study provides empirical evidence
for policy makers and educational researchers to examine the role of SCD in large scale
research. Despite the fact that randomization has been diﬃcult to apply in educational
studies, it is possible to examine schooling eﬀect through curriculum sensitive assessment for
policy making if the intervention groups are assured comparable (Cochran, 1972 in Rubin,
2006). With the use of matching (Cochran and Rubin, 1973), the SCD retrospectively creates
comparable groups to examine student learning in the school system.
Furthermore, this study is methodologically informative and illustrative. It comprehensively evaluates the performance of post-hoc matching in validating the implementation of
SCD in large scale education studies. It provides multifaceted measures such as bias reduction of schooling eﬀect estimate to facilitate program evaluation.
This study also provides a suitable bias reduction approach and an analytical tool for
researchers when intact clusters such as schools and classrooms are used. It is important to
ﬁnd the optimal bias reduction methods to draw statistical inference for educational studies.
11

The study of multilevel matching in this work will help educational researchers achieve
research goals. For example, if a study uses clusters (e.g. classrooms) as units to examine
the eﬀectiveness of a new intervention, level-2 matching will accomplish the analytical goal.
However, the single use of level-2 matching may leave the hidden/micro individual diﬀerences,
which are known to aﬀect student learning. If the analytical units are individuals, level-1
matching will be needed to make treated and control individuals comparable.

12

Chapter 2
Literature Review
This chapter identiﬁes three types of selection bias: hierarchical school structure (Section
2.1.2), measurement errors on covariates (Section 2.4.1), and omitted variables (Section 2.5).
The three sources of selection bias identiﬁed in this chapter are closely tied to corresponding
matching approaches in the context of the SCD in Chapter 4.
After reviewing bias reduction approaches such as propensity score matching in Section
2.2, I identify the necessity of dual matching (McCall, 1923) in reducing both level-1 and
level-2 selection bias (Section 2.3). The attenuated bias reduction rate due to measurement
errors is reviewed in Section 2.4.1. The measurement error adjusted propensity score model
is introduced in Section 2.4.2. A hybrid propensity score estimation model accounting for
measurement errors is delineated in Section 2.4.3. Section 2.5 reviews the selection bias
problem due to omitted variables. This problem is generally indicated by a shrunk R2
because the eﬀect of the omitted variables is “compressed” into residuals and inﬂates the
residual variance. In hierarchically structured data, the inﬂated residual variance due to
omitted variables will further aﬀect intraclass correlation (ICC).
13

2.1

Deﬁnitions of Bias and Selection Bias

Bias or error occurs when the expected value of estimate (observed score or observed treatment eﬀect) diﬀers from the value being estimated (true score or true eﬀect) through sampling (S¨rndal et al., 2003). Bias on the estimate of treatment eﬀect can be attributed to
a
measurement errors on outcome Y (Fuller, 1987) and/or to the initial diﬀerence on covariates
X in the two groups being compared (Carroll et al., 2006; Cochran and Rubin, 1973). Thus,
the outcome is a sum of three parts (Wooldridge, 2002): 1) the eﬀect of treatment variety;
2) the eﬀect of initial diﬀerence due to covariates X; and 3) the random measurement error.
The negative eﬀect of initial diﬀerence due to covariates X has been studied for decades. Neyman (1923) pointed out that the plot characteristic besides the treatment impacted potential
yield, which implies that the plot characteristic can be a source of bias (in Rubin, 1990, p.
283). Similarly, Gosset (“Student”, 1923, in Rubin, 1990) found that the initial diﬀerences
among the groups aﬀected the outcome besides the intervention. The initial diﬀerence can
bias the treatment eﬀect estimation and mislead one’s conclusion (Campbell and Stanley,
1966).
Given the hierarchically structured nature of educational settings, participants are not
assigned to groups at random, initial diﬀerence can occur on the level-1 covariates and/or
at level-2 covariates. The following two sections mathematically demonstrate how selection
bias on covariates X aﬀects the estimate of treatment eﬀect at the individual or group level.

2.1.1

Mathematical Deﬁnition of Bias at Individual Level

Counterfactual Model This is the ideal case, which involves no covariates X. The counterfactual responses (Holland, 1986; Morgan and Winship, 2007) in treatment and control
14

groups can be written as
D
y i = µD + u D ,
i

(2.1)

D
where D = 0, 1 and i = 1, ..., nD . yi are the ith counterfactual responses under treatment
(D = 1) or control (D = 0). µD represents the mean of Dth group’s responses. uD are the
i
ith random errors in Dth group. The composite equation is

yi = µ0 + D ∗ (µ1 − µ0 ) + (u0 + D ∗ i ),

(2.2)

where i = u1 − u0 and E( i ) = 0. Let the population level treatment eﬀect be δ. Then
i
i
1
0
δ = E(yi − yi ) = E(µ1 ) − E(µ0 ).

(2.3)

Covariates X can be added to the counterfactual model. Let µD = M D (X) be a function
of covariates X (e.g., in Cochran and Rubin, 1973), such as a linear equation1

M D (X) = αD + (X − µD )β D .
X

(2.4)

Drop the subscript and let residual be = u1 − u0 .

1 The layout of the regression equations in Section 2.1.1 and Section 2.1.2 is diﬀerent from
the layout of those in Section 2.1.3. That is, for example, the regression coeﬃcient vector β
of the covariates X are displayed in the equation as Xβ in Section 2.1.1 and 2.1.2, but as
β X in Section 2.1.3. Because the regression coeﬃcient vectors have superscribes indicating
group membership, this way will simplify the layout of regression equations in Section 2.1.1
and 2.1.2.
15

Write
y D = α0 + (X − µ0 )β 0 + u0 + D ∗ (α1 − α0 ) + D ∗ [X(β 1 − β 0 ) − µ1 β 1 +
X
X

(2.5)

µ0 β 0 )] + (u0 + D ∗ ).
X
Further simplify the equation above to obtain the treatment eﬀect, denoted as δ(X),
which is
δ(X) = E{(α1 − α0 ) + [X(β 1 − β 0 ) − µ1 β 1 + µ0 β 0 )] + }.
X
X

(2.6)

Bias occurs when the estimate of treatment eﬀect is NOT equal to the true value, i.e.,
δ(X) = δ.
Thus, bias is deﬁned as
∆(X) = δ(X) − δ.

2.1.2

(2.7)

How Selection Bias Aﬀects Treatment Eﬀect Estimate

The detailed decompositions below identify illustrative situations where bias many occur.
Assume that there are no measurement errors on X and the expectations of the residuals u1
and u0 are zero, bias reduction will focus mainly on components related to M D (X).
Initial Diﬀerence on Covariates X

The initial diﬀerence on covariates X in treatment

and control group can generate bias on estimating the treatment eﬀect.
Let ∆X be a non-zero constant vector representing the treatment and control group
16

mean diﬀerence of covariates. That is,

µ1 = ∆X + µ0 .
X
X

(2.8)

The function is M D (X) linearly additive with D = 0, 1. That is,

M D (X + ∆X ) = M D (X) + M D (∆X ).

(2.9)

The treatment eﬀect estimate is

E[M 1 (X + ∆X ) − M 0 (X)] = E{(α1 − α0 ) + [X(β 1 − β 0 ) − (µ0 + ∆X )β 1 + µ0 β 0 ]. (2.10)
X
X

Because the initial diﬀerence ∆X is not equal to zero, the treatment eﬀect is biased. If we
assume the regression coeﬃcients are the same, i.e., β 1 = β 0 = β, then the bias component
can be identiﬁed as
∆(X) = β∆X .

(2.11)

Unequal Regression Coeﬃcients of Treatment and Control Groups If the treatment and
control group means are equal, i.e., µ1 = µ0 = µX , the diﬀerence between the regression
X
X
coeﬃcients of treatment and control groups will bias the treatment eﬀect estimate. In this
situation, the bias component is

∆(X) = E[(X − µX )∆β ],

where ∆β = β 1 − β 0 .
17

(2.12)

In practice, unequal regression coeﬃcient of the treatment and control groups may be
due to the interaction terms between covariates X and treatment status variable D. Let
vector βX×D be the regression coeﬃcient vector of the interaction terms. The regression
coeﬃcient vector of covariates X in treatment group is

β 1 = β 0 + βX×D .

(2.13)

∆β = βX×D .

(2.14)

This implies

In practice, one can add an interaction term of a covariate x and D in the regression and
test if this coeﬃcient is statistically zero.

2.1.3

Selection Bias in Hierarchically Structured Data

In a hierarchically structured population (Cochran, 1963), ith individual is assumed to be
nested in k th class. At student level (level-1), outcome Yik and Xik covariates are observed.
At class level (level-2), Wk covariates are also available.
Let D be a binary treatment-control indicator with 1 representing the treatment group,
0 otherwise. The mathematical relationship between outcome variable and covariates is
modiﬁed from (Schmidt and Houang, 1986) in a counterfactual sense:

D
D
D
Yik = αD + βX (µD − µX ) + βW (Wk − µW ) + uD + β (Xik − µD )+
Xk
Xk
k
∗
D
(βk ) (Xik − µD ) + eD ,
Xk
ik
18

(2.15)

∗
where D = 0, 1 and βk = (βk − β) . µX and µW are the population means in the vector
format, µX is the population mean vector of the level-1 covariates in k th class, vector βX
k
includes the between-level-2-unit regression coeﬃcients of the aggregated means of level-1
covariates, vector βW includes the regression coeﬃcient of the observed level-2 covariates,
vector β includes the pooled within-level-2-unit regression coeﬃcient of the level-1 covariates,
and vector βk includes the within-level-2-unit regression coeﬃcients of the level-1 covariates
in k th class.
The counterfactual treatment eﬀect is
1
0
∗
1
E[Yik − Yik ] = E(α1 − α0 ) + E{(βX − β − βk ) (µ1 − µ0 )} + E{βW (Wk −
Xk
Xk

(2.16)

0
∗
1
0
Wk )} + E{(β + βk ) (Xik − Xik )} + E(u1 − u0 ) + E(e1 − e0 ).
k
k
ik
ik
1
0
1
0
In the counterfactual case 2 , (µ1 − µ0 ) = (Wk − Wk ) = (Xik − Xik ) ≡ 0 holds, and
Xk
Xk
the expected treatment eﬀect is E(α1 −α0 )+E(u1 )−E(u0 )+E(e1 )−E(e0 ). The treatment
k
k
ik
ik
eﬀect is unbiased because the residual expectations are zero. However, bias can result in at
1
0
1
0
least one of the three situations: µ1 − µ0 = 0, (Wk − Wk ) = 0, and (Xik − Xik ) = 0.
Xk
Xk
1
0
The three situations represent diﬀerent sources of bias. (Wk − Wk ) = 0 indicates that
1
treatment and control groups are not comparable at level-2 units such as classes. (Xik −
0
Xik ) = 0 indicates the level-1 diﬀerence within k th class. µ1 − µ0 = 0 represents the
Xk
Xk
diﬀerence due to the non-comparable aggregated means of the level-1 covariates X within
k th class.
2 If randomization is used, equivalence is at the expection/mean level rather than at level1-2 units or level-1-1 units. The subscripts, k of level-2 units and i of level-1 units will be
1
0
dropped. Thus, [µ1 − µ0 ] = [E(W 1 ) − EW 0 ] = [E(Xk ) − E(Xk )] ≡ 0 holds. For the
X
X
purpose of simplicity, Chapter 2 uses counterfactual model to deﬁne bias and demonstrates
how selection bias occurs.
19

2.2

Propensity Score Matching for Bias Reduction

Bias reduction (Cochran and Chambers, 1965; Cochran and Rubin, 1973; Rubin, 1973a,b,
1976a,b, 1979, 1980) is critical for treatment eﬀect estimation in causal inference and in
program evaluation. Initial diﬀerence on covariates X between treatment and control groups
should be taken into account so that the bias on Y can be reduced and the treatment eﬀect
can be accurately estimated. Research on bias reduction has shown that combining matching
and regression adjustment (e.g., Stuart and Rubin, 2008) can achieve the best bias reduction,
even if the relationship between outcome Y and covariates X is nonlinear (Cochran and
Rubin, 1973; Rubin, 1973b, 1979).
Bias reduction techniques have been developed for observational studies in causal inference and in program evaluation. These techniques include Cochran’s three approaches including pairing, balancing, and stratiﬁcation (Cochran, 1953)3 , post-hoc matching (Abadie
and Imbens, 2006, 2007; Rubin, 1973a,b, 1976a,b, 1979, 1980), analysis of covariance (e.g.,
Cochran, 1957, 1969), inverse propensity score weighting (Angrist and Pischke, 2009; Horvitz
and Thompson, 1952; McCaﬀrey and Hamilton, 2007), statistical modeling with adjustment (e.g. WLS estimation in HLM frame work, see Hong and Raudenbush, 2006), and
double robust estimation using regression adjustment and inverse propensity score weighting (Kang and Schafer, 2007).
Post-hoc matching depends on the summary measure, a functional composite of covariates (Rubin, 1985). The most commonly used composites in matching are the Mahalanobis
distance (e.g., Rubin, 1980) and the propensity score (Rosenbaum and Rubin, 1983). This
3 Pairing is to exact match each unit of treatment with one from the control group;
balancing is to match treatment and control on means of a covariate; and stratiﬁcation is to
stratify data on a covariate.
20

study mainly focuses on propensity score matching.
Propensity score matching is a post-hoc bias reduction method, which has been commonly used on observational data to approximate the individual-randomized trials to study
a treatment eﬀect of interest (Cochran, 1953, 1968a; Cochran and Rubin, 1973; Rosenbaum
and Rubin, 1983; Rubin, 1973a,b) Because the “golden rule” of randomization is generally
broken in observational studies (Cochran, 1953; Rosenbaum, 2002), the post-hoc matching
approach uses covariates or summary measures of covariates (e.g., Mahalanobis distance in
Rubin, 1980) to match the treatment and control groups (Rosenbaum and Rubin, 1985)
to remove bias. When the number of the covariates is not large, matching can be done
on covariates (Cochran, 1953). Rosenbaum and Rubin (1983) further developed a holistic
measure, the propensity score, to avoid dimensionality issues when the number of covariates
increases dramatically and makes matching impossible on original covariates.
Propensity scores play a critical role in bias reduction techniques. Research has found
that accurately estimated propensity scores can reduce bias and assist researchers in drawing
causal inferences in observational studies (e.g., Greenland, 2004). A propensity score represents conditional probability that an individual is assigned to the treatment group (Rosenbaum and Rubin, 1983). Generally, it is estimated by using logistic regression with the
covariates collected from the participants as independent variables and the participant’s status on the treatment variable as the dependent variable (Rosenbaum, 1987). The covariates
in the logistic regression are non-treatment variables such as the participant’s background
characteristics. An propensity score summarizes the information of these covariates. Using such propensity scores, a researcher can match a participant from the treatment group
with a participant from the control group to achieve group comparability to facilitate causal
21

inference (Rubin and Waterman, 2006).
Rubin (1979) deﬁnes the bias reduction rate as the percentage reduction in expected
squared bias of treatment eﬀect, which is also adopted by Stuart and Rubin (2008). In this
study, the index in Rubin (1979) and Stuart and Rubin (2008) will be used as a measure to
evaluate how well the bias reduction methods perform. (See details in Chapter 4.).

2.3

Matching on Hierarchically Structured Data

In educational experiments for program evaluation, often researchers sample larger units
from a hierarchically structured population (Cochran, 1963; Scott and Smith, 1969). Examples of larger units include clusters (Donner, 1998), groups (Cornﬁeld, 1978; Murray, 1998;
Raudenbush, 1997), communities (Freedman et al., 1990; Martin et al., 1993; Thompson
et al., 1997), or schools (Hedges, 2007a; Hong and Raudenbush, 2006; Murray et al., 1994;
Raudenbush, 1997).
This design, a cluster-randomized trial (CRT; Donner and Klar, 2000; Murray, 1998;
Raudenbush, 1997) consists of clusters made up of multiple individuals (Bloom, 2004). For
example, in educational settings, cluster sizes usually vary from 5 in the Early Childhood
Longitudinal Program (ECLS) to 60 in the Longitudinal Study of American Youth (LSAY)
and the average cluster size is about 13 (Hedges, 2007a).
When clusters are assigned to interventions, non-comparable treatment-control groups
can arise from either level-1 or level-2 covariates (Raab and Butcher, 2001) resulting in selection bias. Selection bias happens frequently in observational studies or in studies where
randomization fails (Rosenbaum, 2002), and leads to an inappropriate estimate of the intervention eﬀect (Rubin, 1973a).
22

When large scale hierarchically structured data are used, selection bias needs to be evaluated (Berger, 2005) and its inﬂuence removed from the estimate of the intervention eﬀect
in the analytical stage (Hong and Raudenbush, 2006). The dual matching method proposed
in this study is used to approximate a matched cluster-randomized design to reduce bias in
the estimate of the intervention eﬀect for multilevel educational data.
Matching has been widely used in observational studies to reduce estimation bias of the
intervention eﬀect because it can signiﬁcantly improve comparability of the groups (Cochran,
1953; Rubin, 2001). Dual matching was used in experimental education using clusters such
as classrooms and school districts to study intervention eﬀects decades ago. Pittman (1921,
as cited in McCall, 1923) conducted the delaying match after the ﬁnal test score had been
collected. In order to achieve comparability at both cluster- and level-1, Pittman (McCall,
1923, p. 49) matched individuals after higher level covariates such as wealth, quality of population, teacher quality were taken into account in matching. Unfortunately, the literature
has not followed up to study the potential of Pittman’s duel matching approach for CRT
designs.
Using large scale group-randomized trial data Griﬃn et al. (2009) found that matching
on diﬀerent sets of level-2 covariates resulted in diﬀerent levels of statistical power. Using
propensity scores estimated from kindergarten retention data to approximate the CRT design
has been studied by Hong and Raudenbush (2006). However, Hong and Raudenbush (2006)
did not use the estimated propensity scores for matching. They used propensity scores to
stratify the data, and then treated the propensity scores as a covariate to analyze stratiﬁed
data to estimate the intervention eﬀect using the hierarchical linear model.
The three situations identiﬁed in Section 2.1.3 represent diﬀerent sources of bias, which
23

need diﬀerent matching strategies.

2.3.1

Level-1 Matching

1
0
There is level-1 bias within k th class. That is, (Xik − Xik ) = 0, implying that the counterfactual equivalence is not satisﬁed in practice. In other words, the ith student is either in
the treatment group or in the control group, but not in both. For a student, say John in the
treatment group, there is no exact John-equivalent in the control group. The two groups are
not equivalent. Here, level-1 matching can be conducted using covariates X to match each
treated individual with one from non-treated individuals. µ1 − µ0 = 0 results when the
Xk
Xk
aggregated means of the level-1 covariates X within k th school are non-comparable. This
second level bias can be reduced when the bias on level-1 covariates X is removed. By ignoring the hierarchical structure, treated individuals are matched with control individual to
compute bias reduction rate. The analysis units for intervention eﬀect are the outcomes of
the matched individuals.

2.3.2

Level-2 Matching

Treatment and control groups are not comparable at level-2 units such as classes or schools,
1
0
that is (Wk − Wk ) = 0 in the counterfactual sense. Bias reduction here focuses on second
level units, and one would conduct level-2 matching. By ignoring level-1 variables, clusters
are matched by using level-2 propensity scores to compute bias reduction rate.
24

2.3.3

Dual-Matching

When both level-1 and level-2 covariates are not comparable, it needs dual matching, including both level-2 matching and level-1 matching. That is, treated clusters are ﬁrst matched
with control clusters, then, within each matched treatment-control pair, individuals are
matched. The detailed dual-matching procedure is discussed in Chapter 5.

2.4

Measurement Errors and Matching

Modeling errors of measurement (Cochran, 1968b) on observed (surrogate) variables has
been well developed in general regression (Fuller, 1987), logistic regression (Carroll et al.,
2006) (Spiegelman et al., 1997), and survey sampling (Biemer et al., 2004; Fuller, 1995;
Hansen et al., 1961; Mahalanobis, 1946). Few studies have been done in matching after
Cochran and Rubin (1973) reviewed the eﬀect of measurement errors of covariate on bias
reduction.
Measurement issue can be a more serious issue in a study because measurement errors
will reduce the eﬃciency of adjustment (Cochran, 1968a,b; Cochran and Chambers, 1965).
While the literature is replete with guidelines on how to use matching to estimate treatment
eﬀect, there is little research on how to adjust the measurement errors of the covariates used
for matching. Most researchers simply analyze and estimate propensity scores by taking the
covariates as the perfect measures.
Measurement errors attenuate the regression coeﬃcient β of covariate x on outcome
˜
y (J¨reskog and S¨rbom, 1996). Let β be the attenuated regression coeﬃcient. It has
o
o
˜
˜
β < |β| and β = β × R 4 in bivariate regression (Cochran & Rubin, 1973). R is the
4 Statistically, R has the upper limit of 1.
25

attenuation rate due to measurement errors in the covariate x. Bias reduction rate on
˜
β
covariate x is attenuated by R =
due to measurement errors in covariate x (Cochran
|β|
& Rubin, 1973). Cochran (1968a, in Rubin, 2006, p. 20) found that under a simple linear
1
regression, the measurement error on x attenuates the bias reduction rate by a factor of 1+ρ ,
where ρ is the reliability of x.

2.4.1

Measurement Errors Adjusted Propensity Scores

When the true covariates (X ∗ ) are measured by vector X with errors, matching should be
based upon the propensity scores P r(D = 1|X ∗ ), rather than P r(D = 1|X). There are two
methods (Carroll et al., 2006) to adjust for measurement errors in the logit model used to
estimate propensity scores.
The ﬁrst method assumes the true covariates have not been observed and the na¨
ıve
parameter estimates are obtained using the observed covariates. An approximately consistent estimator of the parameters is provided through a functional adjustment on the na¨
ıve
estimator (see details below).
The second method to adjust for measurement errors in logistic regression is through
structural modeling, in which the distribution of the true covariates is parametrically modeled. For example, likelihood and Bayesian approach based structural equation modeling (Carroll et al., 2006; Lee, 2007, Chapter 9) can be used to accomplish this goal.
The ﬁrst method proposed by Rosner et al. (1990, 1989) is a two-step regression calibration logit model. The ﬁrst step is to use one sample (N=n1 ) and ﬁt a logit model

logit[P r(D = 1|X, H)] = α0 + α1 X + α2 H,
26

(2.17)

where X = (x1 , ..., xp ) is the observed surrogate covariate vector of the true X ∗ . H =
(h1 , ..., hq ) is the covariate vector without measurement errors. The regression coeﬃcient
vectors are denoted as α1 = (α11 , . . . , α1p ) and α2 = (α21 , . . . , α2q ).

Secondly, a model
X ∗ = ι0 + ι1 X + ι2 H + r

(2.18)

is ﬁt on the other sample (N = n2 ), in which both X and X ∗ are available. The regression
coeﬃcient vectors are denoted as ι1 = (ι11 , . . . , ι1p ) and ι2 = (ι21 , . . . , ι2q ). The mean
and covariance of r are 0 and ΣX ∗ |(H,X) , respectively. The adjustment matrix is κ =


0
ι1

. The regression coeﬃcients of the “true” logit model
ι2
I
logit[P r(D = 1|X ∗ , H)] = β0 + β1 X ∗ + β2 H

(2.19)

ˆ
ˆ ˆ
β = (β1 , β2 ) = κ−1 α.
ˆ ˆ

(2.20)

can be obtained using

This two-step adjusted method requires that the dimensions of X and X ∗ are equal. However, the measurement errors adjusted propensity scores cannot be obtained directly using
this approach because the integral in the following propensity score function does not have
a closed-form solution (Carrel et al, 2006, p.91).

P r[D = 1|X ∗ , H] =

L(.)exp[−(1/2){x∗ − µX ∗ } Σ−1 {x∗ − µX ∗ }]dx∗
X∗
,
(2π)p/2 |ΣX ∗ |1/2

(2.21)

where L(.) = logit−1 (β0 + β1 X ∗ + β2 H). The approximate approach is developed in Weller
27

et al. (2007). That is,

P r[D = 1|X ∗ , H] ≈ exp(α0 + α1 X ∗ + α2 H),

σ2 ∗
2 X |X,H − ι β ,
α0 = β0 − β1
0 1
2

(2.23)

α1 = ι1 β1 ,

(2.24)

α2 = ι2 β1 + β2 .

where

(2.22)

(2.25)

and

The distribution of (X ∗ |X, H) is a multivariate normal M N (µX ∗ |X,H , σ 2 ∗
).
X |X,H
However,when the second sample having both X and X ∗ observed, is not available,
one cannot estimate the measurement-errors-adjusted propensity scores. In this situation,
an alternative method such as Bayesian logit model with the implementation of Markov
chain Monte Carol (MCMC) can be ﬁtted through WinBUGS (Lunn et al., 2000). The
measurement error adjusted propensity scores P r[D = 1|X ∗ , H] can be simultaneously estimated when the regression coeﬃcients and covariates X have been updated using MetropolisHastings algorithm. This MCMC-based propensity scores approach is then used for matching.

2.4.2

Structural Equation Modeling as an Alternative

Structural equation modeling (SEM, Bollen, 1989; J¨reskog and S¨rbom, 1996) incorporates
o
o
the latent variable to take into account measurement errors on surrogate variables. Propen28

sity scores can be estimated through the following hybrid model:




X = ι0 + ι1 X ∗ + eX

.

(2.26)


 logit[Pr(D = 1|X ∗ )] = ι∗ + ι∗ X ∗
0
1

The ﬁrst equation, a measurement model, captures the linear relationship between the
latent X ∗ and observed X in both treatment (D = 1) and control (D = 0) group. The
second equation, a structural model, captures the nonlinear relationship between the latent
X ∗ and a latent propensity score P r(D = 1|X ∗ ).
Adapting a latent variable approach circumvents the post-hoc coeﬃcient adjustment (e.g.,
Weller et al., 2007) discussed in Section 2.4.1. The estimated propensity scores can be used
in matching. Note that the latent propensity score P r(D = 1|X ∗ ) and latent X ∗ have a oneto-one functional relationship. Matching on estimated propensity scores is mathematically
equivalent to matching on the estimated factor scores 5 of the latent X ∗ .
In educational studies, factor scores of the latent X ∗ such as academic proﬁciency measures and ability constructs have been used to match individuals to achieve comparable
groups (e.g., classical true score in Van Van der Linden and Hambleton, 1997). The latent
construct is measured by multiple surrogate items. A most commonly used model is the item
response theory, in which individual ability is calibrated through a set of items with presumptive diﬃculty and discrimination parameters (Lord and Novick, 1968). The calibrated
ability estimation represents an examinee’s academic proﬁciency that the set of items are
designed to measure. However, matching on latent variables may fail to remove bias due to
5 The estimated factor scores can be derived using SEM software packages such as
Mplus (Muth´n and Muth´n, 2009).
e
e
29

other observed covariates H that are free of measurement error. A composite measure such
as propensity score that summarizes both the latent X ∗ and covariates H becomes necessary in matching. Further, matching needs to take into account the hierarchical structure of
latentX ∗ and covariates H.

2.5

Omitted Variables

Cochran and Rubin (1973) studied failing to include a confounding variable in matching.
For example, he true linear regression has two covariates, x1 and x2 . Matching is on only
x1 . x2 is omitted from matching.
Bias reduction of matching on only x1 depends on the regression relationship between
x1 and x2 . If the regression of x2 on x1 has equal slopes but non-equal intercepts in the
two populations, treated and control, the ﬁnal bias due to matching on only x1 is larger
than the initial diﬀerence. This is referred to the “parallel but not identical” case (p. 45).
If the regression of x2 on x1 has a “parallel but non-linear” (p. 45) relationship and the
sample sizes are large, matching on only x1 reduces partial selection bias due to x2 . The
reduced selection bias due to x2 is only proportional to the partial linear regression of x2 on
x1 . When the number of covariates is large, the omitted covariates and the included have
more complex relationships. The pattern of bias reduction due to omitted covariates will be
diﬀerent from what was found in Cochran and Rubin (1973).
Instead of studying bias reduction due to the correlated relationship between the omitted
covariates and the included covariates, the literature focuses on how the relationship between
included variables and outcome variable aﬀects bias reduction (Austin et al., 2007). Austin
et al. (2007) conducted a Monte Carlo study to compare the strengths of diﬀerent propensity
30

score models in matching treated and untreated groups. They found that correlation and
association between the outcome and the covariates is required. For example, using covariates
that are associated with exposure but independent of the outcome will result in a situation
where more treatment units cannot be matched.

If essential covariates are omitted from the model, it attenuates the association between
the outcome and the covariates included in the model. In the general linear regression
model, omitted covariates decrease the proportion of variation explained and inﬂate the
residual variable. Thus, it results in an attenuated R2 , an index of goodness of ﬁt of the
regression.

Matching on a measure that is not highly correlated with the outcome variable, results
in ineﬀective matching (Martin et al., 1993). In order to obtain eﬀective matching, the
correlation between the matching covariate and the outcome variable needs to be at least
.40, when there are 10 pairs of clusters being matched (Martin et al., 1993).

It is more complicated when omitted variables occur in the analysis involving hierarchically structured data. Unlike the level-1 variation which increases when covariates are
omitted, between-cluster variation will not necessarily increase when covariates are omitted
from the model (Raudenbush, 1997). Thus the relationship between omitted variables and
the intraclass correlation (ICC) is complex.

The level-1 and level-2 residual variances deﬁne an index, ICC, which indicates the similarity among the units in a cluster. The decomposition of total variance of outcome variable
2
2
indicates the within level-2 unit variation (σ1 ) and between level-2 unit variation (σ2 ). Thus
31

ICC is deﬁned in Raudenbush and Bryk (2002) as
2
σ2
.
ICC =
2
2
σ2 + σ1

(2.27)

2
2
Increasing σ1 and/or σ2 will result in complication on ICC. That is, ICC, summarizing
the two sources of variation, is not a clean-cut index to be linked to the bias reduction of
either level-1 matching or level-2 matching (Abadie and Imbens, 2006). In order to examine
the performance of matching, this dissertation study simulates data by manipulating the
level-1 and/or level-2 residual variances, rather than the ICC index.

32

Chapter 3
Theoretical Framework
This chapter deﬁnes the HEoG assumption using a structural equation model (SEM) framework. SEM (Bollen, 1989) takes into account measurement errors and depicts the measurement relationship between the surrogate variables and their latent variables, whose relationships are captured by the structural model (J¨reskog and S¨rbom, 1996)
o
o

3.1

Solomon Four-Group Design in SEM Framework

SFGD includes two Experimental groups and two Control groups. It also involves two
testing points, pre-test at Time 0 and post-test at Time 1. Using randomization, SFGD
assumes that the four groups are comparable (Solomon, 1949). Table 3.1 displays the SEM
framework of the SFGD. Each group involves two measurement models and one structural
model capturing the latent growth relationship from pre-test to post-test. Let Yti be outcome
variable in group i at time t, with i = E1 , C1 , E2 , C2 ; t = 0, 1.

33

Table 3.1: Solomon Four-Group Design in Structural Equation Modeling Framework
Intervention

Group 1 (with pre-test)

Group 2 (without pre-test)

Experimental
Acceleration Eﬀect: Slope ν
Intervention Eﬀect: Intercept δ
Maturation Eﬀect: γ
Pre-test Eﬀect: τ

E
Y0 1 = δ0 + Λ0 η0 + ε0
E
Y1 1 = (δ0 + δ) + Λ1 η1 + ε1

E
Y0 2 = δ0 + Λ0 η0 + ε0
E
Y1 2 = (δ0 + δ) + Λ1 η1 + ε1

η1 = τ + γ + νη0

η1 = γ + νη0

C
Y0 2 = δ0 + Λ0 η0 + ε0
C
Y1 2 = δ0 + Λ1 η1 + ε1

C
Y0 2 = δ0 + Λ0 η0 + ε0
C
Y1 2 = δ0 + Λ1 η1 + ε1

η1 = τ + γ + 1η0

η1 = γ + 1η0

Control
Maturation Eﬀect: γ
Pre-test Eﬀect: τ

34

3.1.1

SEM of Experimental Group 1

Let η0 and η1 represent latent mathematics proﬁciency at pre-test and post-test time points,
E
respectively. η0 is measured by k0 surrogate variables, which are denoted in vector Y0 1 =
[Y1 , Y2 , · · · , Yk ]. η1 is measured by k1 surrogate variables, which are denoted in vector
0
E1
Y1 = [Y1 , Y2 , · · · , Yk ].
1
The measurement equation for Experimental Group (denoted with the superscript E1 )
at pre-test time (denoted with the subscript 0) is

E
Y0 1 = δ0 + Λ0 η0 + ε0 .

(3.1)

The measurement equation for Experimental Group at post-test time (denoted with the
subscript 1) is
E
Y1 1 = (δ0 + δ) + Λ1 η1 + ε1 .

(3.2)

The extra term δ in the intercept of the post-test measurement model indicates the interE
E
vention eﬀect. If Y0 1 and Y1 1 are binary vectors (e.g., 1 or 0), the two measurement
equations become item response theory models (Lord, 1980). In the two-parameter logistic
(2PL) model (Lord and Novick, 1968), measurement equations for pre- and post-test are

E1
prb(Y0 = 1)
 = a (η − b );
log 
0 0
0
E1
1 − prb(Y0 = 1)

(3.3)


E1
prb(Y1 = 1)
 = a (η − b ),
log 
1 1
1
E1
1 − prb(Y1 = 1)

(3.4)



and



respectively. b0 and b1 are the item diﬃculty parameter vectors. a1 and a0 are the discrim35

ination parameter vectors.
The structural equation
η1 = τ + γ + νη0 1

(3.5)

reveals the latent mathematics proﬁciency growth between two time points. γ and τ indicate
the maturation eﬀect and learning eﬀect due to taking pre-test, respectively.
The latent growth rate, namely the acceleration eﬀect of intervention, is captured by the
slope ν.

3.1.2

SEM of Control Group 1

Control Group 1 is a pre-post test design without treatment involved. The measurement
equations are the same as those in Treatment Group 1. However, ν is unity in the structural
equation
η1 = τ + γ + 1η0 ,

(3.6)

indicating a “ﬂat” latent growth rate due to the lack of intervention. Still, latent mathematics
proﬁciency at the post-test time point is diﬀerent from that at the pre-test time point by a
sum of the maturation eﬀect γ and the learning eﬀect due to taking pre-test τ .

1 This equation speciﬁes a general case. For the purpose of simplicity, ν can be set as 1
0
across all four groups. τ and γ are speculated in the structural model is because they reﬂect
changes associated with the latent mathematics proﬁciency. The latent changes will further
reveal their eﬀects through the measurement equation.
36

3.1.3

SEM of Experimental Group 2

Experimental Group 2 only observes post-test data. Because there is no pre-test, the learning
eﬀect τ is zero and is dropped from the implicit structural model

η1 = γ + νη0 .

(3.7)

The pre-test measurement model and structural model are not observable and are displayed
in the dashed boxes (See Table 3.1, Row 2 Column 3).

3.1.4

SEM of Control Group 2

Treatment and pre-test are not applied to this group. The acceleration eﬀect of intervention
ν is unity and the learning eﬀect τ is zero in the implicit structural model

η1 = γ + 1η0 .

(3.8)

Pre-test measurement model and structural model are not observable and are displayed in
the dashed boxes (See Table 3.1, Row 3 Column 3).

3.1.5

Pre-Equivalence of Groups (PEoG) Assumption

The measurement model at Time 0 is written as

Y0 = δ0 + Λ0 η0 + ε0 .

(3.9)

Deﬁnition The measurement model at Time 0 in the equation (3.9) holds equivalently in
37

the four groups. It is called pre-equivalence of groups (PEoG) assumption, which is mathematically equivalent to Y0 ⊥D, where ⊥ means “ independent of ” (e.g., Rosenbaum and
Rubin, 1983). D is the binary group membership indicator variable, representing treatment
(D = 1) or control (D = 0).
Y0 ⊥D holds if η0 ⊥D holds because Y0 is a linear function of η0 . η0 ⊥D holds for two
reasons, described below.
First, Group 1 and Group 2 are independently selected randomly (Solomon, 1949) from
the same population, whose latent mathematics proﬁciency is η0 . Second, participants have
an equal chance to be assigned to either intervention or control through random assignment.
Let D1 be the binary group membership indicator variable in Group 1, with D1 =
E1 , C1 . Y0 ⊥D implies Y0 ⊥D1 ; and η0 ⊥D implies η0 ⊥D1 . Correspondingly, let PEoG-1
represent PEoG assumption in only Group 1. The PEoG-1 assumption implies the equation
(3.9) holds equivalently in Group 1.
Similarly, Let D2 be the binary group membership indicator variable in Group 2, with
D2 = E2 , C2 . Y0 ⊥D2 and η0 ⊥D2 can be derived. Also, The PEoG-2 assumption implies
the equation (3.9) holds equivalently in Group 2.
Theorem 3.1.1. (Equivalence of using Group 1 and Group 2 to Estimate Latent Growth)
Because of random assignment of treatment and control, participants at the pre-test time
point have equal chance to be assigned to either the treatment or the control group. Given
PEoG assumption, latent growth estimate derived from Group 2 is equivalent to that derived
from Group 1.
Proof. First, the latent growth can be estimated using data collected in Group 1, Experimental and Control. The latent growth is estimated as the latent mean diﬀerence between
38

two populations. That is,

E(η1 |E1 ) − E(η1 |C1 ) = E(τ + γ + νη0 |E1 ) − E(τ + γ + 1η0 |C1 ) = E[(ν − 1)η0 ].

(3.10)

This holds because of η0 ⊥D1 , with D1 = E1 , C1 .
Second, given PEoG assumption, the latent growth can be estimated using Experimental
Group 2 and Control Group 2. That is,

E[(η1 |E2 ) − (η1 |C2 )] = E[(γ + νη0 |E2 ) − (γ + 1η0 |C2 )] = E[(ν − 1)η0 ].

(3.11)

This holds because η0 ⊥D2 , with D2 = E2 , C2 .
Thus, it proves that given PEoG assumption using Group 2 is equivalent to using Group
1 in estimating the latent growth.

Theorem 3.1.2. (Equivalence of using Group 1 and Group 2 to Estimate True Gain) Given
the random assignment and PEoG assumption, true gain score estimate derived from Group
2 is equivalent to that derived from Group 1.

Proof. True gain estimate derived from Group 1 is

E
C
C
E
E[(Y1 1 − Y0 1 ) − (Y1 2 − Y0 2 )]
= E[δ + Λ1 (ν + γ + τ η0 ) − Λ0 η0 )] − E[Λ1 (ν + γ + 1η0 ) − Λ0 η0 ]
= δ + E[Λ1 (τ − 1)η0 ].
True gain estimate derived from Group 2 is
39

E
C
E[Y1 2 − Y1 2 ]
= E[δ + Λ1 (γ + τ η0 )] − E[Λ1 (γ + 1η0 )]
= δ + E[Λ1 (τ − 1)η0 ].
The two estimates are equal given the PEoG assumption.
In summary, under the PEoG assumption, using SFGD-Group 2 (SFGD-G2) suﬃciently
estimates the latent growth and the true gain score. Actually, how PEoG assumption assures
the use of SFGD-G2 is the same as how HEoG assumption assures the use of SCD.

3.2

Extended Solomon Four-Group Design in SEM Framework

Mathematically deﬁning HEoG for SCD requires an extended version of SFGD. The SFGD
is extended by including covariates X in the SEM framework. The SEM of the extended
SFGD has two measurement models of outcome Y , two measurement models of X, and three
structural models. Detailed model structures are in the following paragraphs, followed by
the further graphical comparison between SCD and the extended SFGD.
The SFGD is extended by including covariates X in the SEM framework. After including
covariates X in the SFGD, the measurement models 2 are as follows:


 Y = δ + λη + e ;
1

(3.12)


 X v + gξ + e .
=
2
2 For the purpose of simplicity, the superscripts (the group indices) are dropped. However,
Table 3.2 clearly displays each group in a separate row. Adding subscripts may be redundant.
Also, after covariates are included, the errors terms are now denoted by e’s rather than ε’s.
40

e1 ∼ N (0, Θe1 ) is independent of η, ξ and e2 , e2 ∼ N (0, Θe2 ) is independent of η, ξ and
e1 .
The structural model in LISREL8 notation (J¨reskog and S¨rbom, 1996) is
o
o

η = A + Bξ + U.

(3.13)

U ∼ N (0, ΘU ) is independent of ξ, e1 and e2 . Intercept A is generally set at zero for the
purpose of model identiﬁcation (Lee, 2007).
Table 3.2 displays the models for both pre-test at Time 0 (denoted as 0) and post-test
at Time 1 (denoted as 1).
The two measurement models for Y are as follows:






 Y0  

=
Y1







δ0   λ0
+
0
δ1





0

  η0  

+
λ1
η1


e10 
,
e11

(3.14)

with δ1 = δ0 + δ. δ represents the intervention eﬀect. The structural model is

η1 = τ + γ + νη0 ,

(3.15)

whose parameters are the same as those in Section 3.1.2.
Similarly, the two measurement models for covariates X are












 X0   v 0   g0

=
+
X1
v1
0







0

  ξ0   e20 

+
.
g1
ξ1
e21

41

(3.16)

Table 3.2: SEMs of the Extended Solomon Four-Group Design and Covariance Matrixes
Extended Solomon Four-Group Design
SEMs and Constraints
Y0
δ0
λ0
0
η0
e10
Experimental Group 1:
=
+
+
e11
Y1
δ1
0
Λ0
η1
e20
0
ξ0
g0
k0
X0
+
+
=
e21
ξ1
0
g1
k1
X1
η1 = τ + γ + νη0
ξ1 = a + πξ0
U0
η0
A0
B0
0
ξ0
=
+
+
U1
η1
A1
0
B1
ξ1
Control Group 1:
Constraints on Experimental Group 1’s Model:
Zero treatment eﬀect: δ1 = δ0 + δand δ=0 ;
Unity acceleration eﬀect: Slope ν=1.

Covariance
Appendix
B.1

Appendix
B.2

Experimental Group 2:

Constraints on Experimental Group 1’s Model:
X0 is not observed:
Y0 is not observed:
No pre-test eﬀect: τ =0.

Appendix
B.3

Control Group 2:

Constraints on Experimental Group 1’s Model:
X0 is not observed:
Y0 is not observed:
Zero treatment eﬀect: δ1 = δ0 + δand δ=0 ;
No pre-test eﬀect: τ =0.
Unity acceleration eﬀect: Slope ν=1.

Appendix
B.4

42

The relationship between ξ1 and ξ0 is captured by a structural model,

ξ1 = a + πξ0 .

(3.17)

When a=0 and π=1, the covariates are invariant across two time points.
Further, the relationship of the latent variables of X and Y is revealed in the structural
model












 η0   A0   B0

=
+
η1
A1
0







0

  ξ0   U0 

+
.
B1
ξ1
U1

(3.18)

The extended SFGD SEM by nature is a two-level factor analysis model (Muth´n, 1994)
e
because of the hierarchically structured school system. Its covariance can be decomposed
into within-cluster (denoted as w) and between-cluster(denoted as b) components (Muth´n,
e
1994; Schmidt, 1969).
Appendix B.5 has the detailed procedures that derive the variance-covariance of the
extended SFGD’s Experimental Group 1 listed in Appendix B.1. Appendix B.2-B.4 have
the other three variance-covariance matrixes that are derived using the constraints listed in
Table 3.2.

3.2.1

Extended-PEoG Assumption

Still, data at Time 0 are not collected from Group 2, Experimental and Control. Time-0-SEM
is




 Y0 = δ0 + λ0 η0 + e10



 X = v0 + g0 ξ0 + e20
 0



 η =A +B ξ +U
0
0
0 0
0
43

(3.19)

Thus, this model is not testable. Group 2 produces an unbiased intervention eﬀect estimate
in the counterfactual sense because one needs to assume that Time-0-SEM implicitly holds
equivalently in Group 1 and Group 2.
Deﬁnition The assumption that the equation (3.19)implicitly holds equivalently in Group
1 and Group 2 is the extended-PEoG assumption.
The extended-PEoG assumption assures that true gain score estimate derived from Group
2 is unbiased and equivalent to that derived from Group 1.
Theorem 3.2.1. (Equivalence of using Group 1 and Group 2 to Estimate True Gain in
Extended-SFGD) Given the random assignment of treatment and control and the extendedPEoG assumption, true gain score estimate derived from Group 2 is equivalent to that derived
from Group 1.
The proof includes two parts: 1) under the assumption, the extended-SFGD’s Group 2
and Group 1 are equivalent in estimating the true gain; and 2) the estimate of the true gain
is unbiased.
Proof. 1) Under the assumption, the extended-SFGD’s Group 2 and Group 1 are equivalent
in estimating the true gain.
Equation (3.19) implies

Y0 = δ0 + λ0 A0 + λ0 B0 ξ0 + λ0 U0 + e10 ,

(3.20)

−1
where ξ0 = g0 (X0 − v0 − e20 ).
In Group 2, both Experimental and Control data at Time 0 are not observable. Thus,
E
C
through Group 2 treatment eﬀect is estimated by E(Y1 2 − Y1 2 ).
44

C
E
Further, the extended-PEoG assumption implies that Y0 1 ≡ Y0 1 . So that
C
T
C
E
E(Y1 2 − Y1 2 ) = E(Y1 2 − Y1 2 − 0)
C
E
C
E
C
E
= E[Y1 2 − Y1 2 − (Y0 1 − Y0 1 )] (becasue of Y0 1 − Y0 1 = 0)
C
C
E
E
= E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )]
C
C
E
E
= E[(Y1 2 − Y0 1 )] − E[(Y1 2 − Y0 1 )].

E
E
C
C
Note that E[(Y1 2 − Y0 1 ] − E[(Y1 2 − Y0 1 )] is the true gain estimate derived from
Experimental Group 1 and Control Group 1.
E
E
The true gain is the diﬀerence between the average treatment gain (E[(Y1 2 − Y0 1 )])
C
C
and the average control gain (E[(Y1 2 − Y0 1 )]).
Thus, it proves that under the extend-PEoG assumption using Group 2 is equivalent to
the use of Group 1 to estimate the true gain.
2) Estimate of the true gain is unbiased.
Based on Time-1-SEM (see Table 3.1)



 Y1 = δ1 + λ1 η1 + e11



,
 X1 = v1 g1 ξ1 + e21




 η =A +B ξ +U
1
1
1 1
1

(3.21)

along with η1 = τ + γ + νη0 , write

Y1 = δ1 + Λ0 (τ + γ + νη1 ) + e11 .
45

(3.22)

Because η1 = A1 + B1 ξ1 + U1 , write

Y1 = δ1 + Λ0 (τ + γ + νη0 ) + e11
= δ1 + Λ0 [τ + γ + ν(A0 + B0 ξ0 + U0 )] + e11 .
The average treatment gain across Time 0 and Time 1 is

E
E
Y1 2 − Y0 1 = (δ1 − δ0 ) + Λ0 (τ + γ) + (Λ0 ν − λ0 )(A0 + B0 ξ0 + U0 ) + e11 − e10 .

In Control Group 1 and Control Group 2, δ1 = δ0 and ν=1. The average treatment gain
across Time 0 and Time 1 is

C
C
Y2 2 − Y1 1 = (δ1 − δ1 ) + Λ1 (ν + γ) + (Λ1 − Λ0 )(A1 + B1 ξ1 + U1 ) + e12 − e11 .

The true gain is
T
T
C
C
E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )]
= E[(δ1 − δ0 ) + Λ0 (ν − 1)(A0 + B0 ξ0 + U0 )]
= δ + Λ0 (ν − 1)(A0 + B0 ξ0 )
= δ + E[Λ0 (ν − 1)A0 ] + E[Λ0 (ν − 1)B0 ξ0 ].
A0 is generally set at 0 in the SEM literature in a identiﬁable model (Lee, 2007). Because
C
T
T
C
of E(U0 ) = 0, E(ξ0 ) = 0, E[(Y1 2 − Y0 1 ) − (Y1 2 − Y0 1 )] = δ holds. Thus, the true gain
estimate is unbiased.

46

3.3

Synthetic Cohort Design in the Context of Solomon
Four-Group Design

The SFGD-G1 is illustrated inside the black dashed box in Figure 3.1. Experimental Group
1 and Control Group 1 are represented by the black circles. Each group is tested twice:
pre-test and post-test.
The black-colored capital letter T in Figure 3.1 indicates treatment intervention administered after the pre-test. δ1 and δ0 , deﬁned in Section 3.2.1, represent the average treatment
gain and the average control gain, respectively. The PEoG assumption indicates that Experimental Group 1 and Control Group 1 are comparable at pre-test time point.

Figure 3.1: Synthetic cohort design in the context of Solomon Four-Group Design-G1

The SCD is illustrated in green in Figure 3.1. Ideally, two cohorts, Cohort 2 and Cohort
1, can be followed longitudinally across years such as three adjacent years, Yeari−1 ,Yeari ,
47

and Yeari+1 . Cohort 1 is in grade 7 at Yeari and grade 8 in Yeari+1 . Focal Cohort 2
is in grade 7 in Yeari−1 and grade 8 in Yeari . As a quasi-longitudinal design illustrated
in the green dashed box, SCD collects data at only Yeari from the two adjacent cohorts.
SCD requires the HEoG assumption implying that two 7th graders are comparable across
Yeari−1 andYeari . In other words, the HEoG assumption assures that Cohort 1 at time 1
(7th grade at Time 1) are comparable to Cohort 2 at Time 0 (7th grade at Time 0).
Figure 3.1 indicates a close relationship between two designs. Focal Cohort 2 (grade 8 in
Yeari ) is the Experimental Group. Treatment intervention represents one year of schooling
at 8th grade inYeari . δ1 is the schooling eﬀect due to one year of schooling. δ0 is not
estimable because “control”, without one year of schooling at 8th grade inYeari , is not
applicable in educational practice. Thus, SCD cannot estimate the true gain, which is the
diﬀerence between δ1 and δ0 in SFGD. Particularly, SCD is used to obtain δC2T 1−C1T 1 ,
the estimate schooling eﬀect δ1 , due to one year of schooling of 8th grade inYeari .
The estimator of δ1 in SCD, denoted as δC2T 1−C1T 1 , is a composite estimate of true
treatment gain plus the maturation and learning eﬀect due to previously taking the pre-test.
E
E
That is, based on SEM framework, the SCD estimates δ1 is the expectation of Y1 2 − Y0 1 .
That is,

E
E
E(Y1 2 −Y0 1 ) = E[(δ1 −δ0 )+λ1 (τ +γ)+(Λ0 ν −λ0 )(A0 +B0 ξ0 +U0 )+e11 −e10 ]. (3.23)

Adding constraints to the parameters can further simplify the estimation of schooling
eﬀect.
First, temporal measurement invariance assumption (Cheung and Rensvold, 2002; Kaplan, 2008, p. 64) assumes that factor loading vectors across two time points are equal:
48

Λ0 ≡ λ0 .
Second, it is plausible to assume a ﬂat growth rate in the latent relationship the equation
(3.15), that is, ν ≡ 1. This implies: 1) latent ability at Time 1 is invariant from Time 0;
and 2) growth eﬀect is fully captured by maturation and pre-testing eﬀect, plus interaction
eﬀect, if there is any. Thus,

δ1 = E[(δ1 − δ0 ) + Λ0 (τ + γ)] = (δ1 − δ0 ) + E(Λ0 (τ + γ)).

(3.24)

This indicates that school eﬀect estimate equals to the true gain (δ1 − δ0 ) plus the growth
eﬀect due to maturation and pre-test eﬀect.
The use of SCD to investigate the eﬀect of 8th grade instruction on student learning is
determined by how comparable the two 7th grades are across Yeari−1 andYeari . If they
are not comparable, schooling eﬀect estimate will be biased. But if the two 7th grades are
comparable, SCD approximates a longitudinal study SFGD-G1.
The necessary condition that two 7th grades are comparable across two time points is
assured by the HEoG assumption, which works in a counterfactual sense. This can be
mathematically written as

(HEoG|counterfactual) ⇒ δC2T 1−C1T 1 = δ1 .

(3.25)

It reads, “given the counterfactual condition, HEoG assumption holds and assures that SCD
approximates a longitudinal study SFGD-G1 in terms of estimating the eﬀect of one year
schooling.”

Deﬁnition Figure 3.1 graphically indicates that the PEoG assumption (see Section 3.1.5
49

and Section 3.2.2 for detail) is the SEM-version of the HEoG assumption. That is, the
equation (3.19) holds equally at two 7th grades in Y eari−1 and Y eari .

In practice, using randomization assures the (Extended-)PEoG assumption for (Extended)SFGD. If randomization is applicable in SCD, it can assure the HEoG. That is,

(HEoG|randomization) ⇒ δC2T 1−C1T 1 = δ1 .

(3.26)

This reads, “under the randomization condition, HEoG assumption holds and assures that
SCD approximates a longitudinal study SFGD-G1 in terms of estimating the eﬀect of one
year schooling.”
In educational settings, randomization is not applicable in SCD and it cannot assure the
HEoG, even though (Extended-)PEoG and HEoG are mathematically equivalent. Matching
is proposed to assure HEoG in SCD. That is,

(HEoG|matching) ⇒ δC2T 1−C1T 1 = δ1 .

(3.27)

This reads, “under the matching condition, HEoG assumption holds and assures that SCD
approximates a longitudinal study SFGD-G1 in terms of estimating the eﬀect of one year
schooling.”

3.4

Matching and HEoG Assumption

This section further depicts how matching will assure HEoG assumption. Let C2T 0, C1T 1,
and C2T 1 represent three time-cohort knots (See the following Figure 3.2.). CjT t indicates
50

the knot of Cohort j at Time t, with j = 1, 2, and t = 0, 1. Conceptually, there are two types
of matching.
C2T0-C1T1 Matching Implementing the matching approach in this situation is to match
individuals of Cohort 2 at Time 0 with those of Cohort 1 at Time 1. In other words, matching creates a group of 7th graders at Time 1 that are equivalent to 8th graders when they
were at Time 0. In real longitudinal design, the treatment eﬀect is the outcome diﬀerence
on Y of 8th graders between Time 1 and Time 0. Because of matching, the 8th graders at
Time 0 do not have to be measured. The assessment measure of matched 7th graders at
Time 1 can be treated as the equivalent assessment measure of the 8th graders when they
were in 7th grade at Time 0. However, this matching cannot be realized in the SCD because
data of C2T0 are not available. It is only applicable in simulation studies in order to verify
that matching can assure HEoG assumption. It will be discussed in details in Section 4.2
and 4.3 of Chapter 4.
C2T1-C1T1 Matching Implementing the matching approach in this situation is to match
individuals of two cohorts at Time 1. In other words, matching creates a group of 7th
graders at Time 1 that are equivalent to those 8th graders at Time 1 in terms of the simulated student characteristic variables. If the covariates are hypothetically unchanged across
Time 0 (Y eari−1 ) and Time 1 (Y eari ), then C2T 1 − C1T 1 matching will be equivalent to
C2T 0 − C1T 1 matching.
Quantifying HEoG assumption based on SEM framework provides a way of manipulating model parameters to generate non-comparable cohort data to examine how matching
improves cohort comparability to assure HEoG for SCD. The hierarchical data structure collected through the SCD determines the proposed dual matching. The following paragraphs
51

Figure 3.2: Three data sets, two-way matching and the HEoG assumption
will discuss how to match hierarchically structured data and how to match through the latent
variable to account for measurement errors on surrogate variables. The detailed simulation
plan is discussed in Chapter 4. The simulated data will be generated for C2T 0, C2T 1 and
C1T 1. Detailed data generation procedure of C2T 0 and C2T 1 is discussed in section 4.1.
The parameter manipulation for data generation of C1T1 is discussed in section 4.2.

52

Chapter 4
Simulation Study
A number of Monte Carlo simulations are conducted to test the performance of the proposed
matching approaches under diﬀerent conditions. The purpose of simulation is to create a
series of studies to examine how matching reduces bias of the schooling eﬀect estimate in
the SCD. Speciﬁcally, the simulation evaluates how eﬀectively matching can reduce selection
bias and improve the accuracy of the schooling eﬀect estimate.
In the use of SCD, selection bias is represented by the non-comparability of the two
cohorts at two time points (i.e., C2T 0 and C1T 1). Selection bias violates the HEoG assumption. It inﬂates estimation bias and attenuates the eﬃciency of the SCD in examining
student learning. Estimation bias is deﬁned by the diﬀerence between the quasi-longitudinal
growth and the true longitudinal growth (the schooling eﬀect).
Reducing selection bias will reduce estimation bias. Based upon the Second International
Mathematics Study (SIMS, IEA, 1977) data and the two-level structural equation model,
several selection bias situations are simulated to examine how matching improves the comparability of the two cohorts and reduces selection bias. Bias reduction rate and estimation
53

bias reduction rate indicate how well matching reduces bias of the schooling eﬀect estimate in
the SCD. Larger reduction rate indicates higher accuracy and eﬃciency. The bias reduction
rate is deﬁned in Section 4.3.
Mplus (Muth´n and Muth´n, 2009) is used to ﬁt the two-level SEM to estimate the
e
e
parameters, which are treated as unknown values to generate quasi-population data. R (R
Development Core Team, 2007) is used to conduct matching and examine its performance.
Section 4.1 discusses how to generate longitudinal data of focal Cohort 2 at Tim 0 and Time
1 (denoted as C2T 0 − C2T 1); Section 4.2 discusses how to generate data for Cohort 1 at
Time 1 (denoted as C1T 1). The SCD uses C2T 1 − C1T 1 data to estimate schooling eﬀect.

4.1

Data and Conceptual Model

The SIMS uses a longitudinal design to study the eﬀects of the curriculum and the classroom
instruction. The classroom process is “mapped” on the targeted 8th grade (focal Cohort
2) where the 13-year-old students are found. Two waves of mathematics achievement data
are collected, with the ﬁrst wave at the beginning of the school year (Time 0), and the
second at the end of the school year (Time 1). In this design, Cohort 2 at Time 0 is in
the control condition. After the “treatment” of one year of schooling, Cohort 2 at Time
1 data are collected to assess the schooling eﬀect (δC2T 1−C2T 0 ), deﬁned as the “changes
in mathematics achievement over the time-span of one school year at the particular grade
level” (Wiley and Wolfe, 1992, p. 299).

54

Table 4.1: Level-1 Descriptive Statistics of
Variables
Label
Outcome Variable
Post-Test Score
POSTTEST
Pre-Test Score
PRETEST
Student Level Latent Covariates
Educational Inspiration (EDUINSP) YPWANT
YPWWELL
YPENC
Self –Encouragement (SLFENCRG)

Family support (FMLSUPRT)

Math Importance (MTHIMPT)
Socioeconomics Status (SES)

the Final Two-Level Structural Equation Model (N=2,296)
Description
Mean
Total post-test scores on 40 items
Total pre-test scores on 40 items

Learn more math (Inverse code, 1-5 a )
Parents want me do well on math (1-5 a )
Parents encourage me to do well on math (Inverse code,
1-5 a )
YIWANT
I want to do well on math (1-5 a )
YMORMTH Looking forward to taking more math (1-5 a )
YNOMORE Take no more math if possible (Inverse code,1-5 a )
YPINT
Parents are interested in helping with math (Inverse code,
YFLIKES
1-5 a )
YMLIKES Father enjoys doing math (Inverse code , 1-5 a )
YFABLE
Mother enjoys doing math (Inverse code, 1-5 a )
YMABLE
Father is able to do math homework (Inverse code, 1-5 a )
Mother is able to do math homework (Inverse code,1-5 a )
YMIMPT
Mother thinks math is important (1-5 a )
YFIMPT
Father thinks math is important (1-5 a )
YFEDUC
Father’s education level (1-4 b )
YMEDUC Mother’s education level (1-4 b )
YFOCCN
Father’s occupation national code (1-8c )
YMOCCN Mother’s occupation national code (1-8c )

55

17.67
13.79
4.73
4.24
4.37
4.32
3.24
3.73
3.72
3.53
3.25
3.92
3.71
4.60
4.55
3.38
3.35
4.26
4.11

Table 4.1: Continued.
Variables
Label
Description
Student level Observed Covariates
Student Age
XAGE
Grand mean centered age
Parental Help
YFAMILY frequency of family help (1-3d )
Education Expectation EDUECPT Derived from YMOREED:
how many years of education parents expected (1-4e )
Time use on homework YMHWKT Typical week hours math for homework per week
a 1=not at all like ,..., 3=unsure,. . . , 5=exactly like; b 1=little schooling, 2=primary school,
3=secondary school, 4=college or university or tertiary education; c 1=unskilled worker, 2=semiunskilled worker , 3=skilled worker lower, 4=skilled worker higher, 5=clerk sales and related lower,
6=clerk sales and related higher, 7=professional and managerial lower, 8=professional and managerial higher; d 1=never/hardly, 2=occasionally, 3=regularly; e 1=up to 2 years, 2=2 to 5 years,
3=5 to 8 years, 4=more than 8 years.

56

Mean
0.00
1.75
2.97
2.98

Table 4.2: Level-2 Descriptive Statistics of the Final Two-Level Structural Equation Model (N=126)
Variables
Label
Description
Mean
Teacher/Class Level Covariates
Class Size
CLASSIZE
Created from the number of students in class
26.60
Opportunity to Learn
OLDARITH
Prior OTL of Arithmetic
7.10
OLDALG
Prior OTL of Algebra
NA
OLDGEOM
Prior OTL of Geometry
3.19
NEWARITH
This year’s OTL of Arithmetic
NA
NEWALG
This year’s OTL of Algebra
59.61
NEWGEOM
This year’s OTL of Geometry
41.37
Class Instruction
TPPWEEK
Actual number of hours of math instructions per week
5.09
School Level Covariates
Qualiﬁed Math Teacher MTHONLY
Proportion of qualiﬁed math teachers:
0.14
Rate
the sum of SSPECM and SSPECF divided by STCHS

57

SIMS data are collected from seven countries including the United States, Canada,
France, Belgium, Japan, Thailand and New Zealand. This study uses only SIMS data
collected in the United States (SIMS-USA, Wolfe, 1987).
The targeted population is Population A, including all students in the second year of
the general secondary education, technical secondary education, and vocational secondary
education programs in both type I (non-traditional) and type II (traditional) forms of school
organizations. In the SIMS-USA data, 8,332 students from 164 schools are sampled within
7 strata using the two stage complex sampling method. There are 5,584 students (of a
total of 8,332) are nested in 211 classes, which belong to four types of classes (Kifer, 1992):
Remedial (N=21), Regular (N=126), Enriched (N=46) and Algebra (N=18). The ﬁnal
data set includes 2,296 students in 126 Regular classes. The average class-size is about
27. Table 4.1 and Table 4.2 list the descriptive statistics of the outcome variables and
covariates1 (Schmidt and Burstein, 1992)
¯
Level-1 variable means (X1 ) and variance-covariance matrix (S1 ) are listed in Table 4.3.
¯
Level-2 variable means (W2 ) and variance-covariance matrix (S2 ) are listed in Table 4.4.
These means, variances, and covariances are computed using Mplus code listed in Appendix
A.1.

1 The labels of the covariates are adapted from the abbreviations in the SIMS questionnaire (Wolfe, 1987). The newly created abbreviations of the latent variables and the outcome
variables are listed in the nomenclature of this dissertation study.
58

Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

Mean
4.730
4.373
4.239
4.317
3.242
3.725
3.720
3.534
3.248
3.915
3.705
4.603
4.545
3.374
3.349
4.264
4.112
0.000
0.000
0.000
2.968
1.745
2.984
0.000
0.000
0.000
0.000
0.000
0.000
0.000

¯
Table 4.3: The Level-1 Variance Covariance Matrix (S1 ) and Means (X1 )
Variance Covariance
0.34
0.24 0.80
0.16 0.24 0.81
0.11 0.14 0.28 0.75
0.12 0.27 0.23 0.32 1.35
0.11 0.20 0.24 0.28 0.59 1.25
0.15 0.32 0.16 0.14 0.24 0.14 1.28
0.09 0.21 0.09 0.10 0.22 0.19 0.46 1.13
0.08 0.15 0.13 0.08 0.22 0.14 0.35 0.23 1.19
0.11 0.23 0.09 0.09 0.12 0.10 0.66 0.66 0.05 1.51
0.08 0.15 0.11 0.02 0.13 0.07 0.51 0.05 0.72 0.34 1.52
0.19 0.36 0.17 0.15 0.20 0.12 0.23 0.12 0.14 0.12 0.18 0.53
0.20 0.37 0.17 0.13 0.17 0.12 0.25 0.26 0.08 0.31 0.07 0.38
0.04 0.07 0.04 0.01 0.01 0.03 0.18 0.20 0.01 0.29 0.03 0.05
0.04 0.04 0.04 0.02 0.01 0.05 0.11 0.09 0.09 0.11 0.12 0.04
0.07 0.09 0.02 0.01 0.06 0.06 0.33 0.43 0.01 0.64 0.02 0.03
0.01 0.01 0.04 -.05 -.06 -.01 0.22 0.20 0.16 0.33 0.24 0.04
0.47 0.46 0.69 0.97 1.71 2.00 0.14 0.64 0.30 0.59 -.09 0.20
0.25 0.24 0.40 0.29 0.89 0.99 0.19 0.57 0.10 0.60 -.24 0.08
-.13 -.16 -.09 -.22 0.02 -.13 -.29 -.18 0.06 -.49 0.03 -.11
0.07 0.10 0.09 0.08 0.16 0.18 0.11 0.12 0.09 0.10 0.04 0.06
0.03 0.06 0.02 0.01 -.01 -.01 0.20 0.07 0.06 0.15 0.11 0.04
0.04 0.08 0.05 -.05 -.02 -.08 0.07 0.04 -.04 -.22 -.04 -.01
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

59

0.64
0.09
0.03
0.14
0.06
0.39
0.21
-.29
0.08
0.06
0.08
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.48
0.23
0.61
0.41
0.81
0.65
-.45
0.15
0.03
0.13
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.41
0.38
0.47
0.72
0.52
-.41
0.14
0.02
0.02
0.00
0.00
0.00
0.00
0.00
0.00
0.00

Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

Table 4.3: Continued
Variance Covariance

4.43
1.10
2.35
1.88
-1.13
0.45
0.01
0.25
0.00
0.00
0.00
0.00
0.00
0.00
0.00

3.93
1.66
1.47
-1.19
0.25
0.00
0.87
0.00
0.00
0.00
0.00
0.00
0.00
0.00

48.51
22.02
-4.85
1.45
-0.64
-1.73
0.00
0.00
0.00
0.00
0.00
0.00
0.00

29.98
-3.34
1.11
-0.50
-1.69
0.00
0.00
0.00
0.00
0.00
0.00
0.00

36.06
-0.47
0.02
-0.29
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.59
-0.01
-0.05
0.00
0.00
0.00
0.00
0.00
0.00
0.00

60

0.35
0.17
0.00
0.00
0.00
0.00
0.00
0.00
0.00

47.26
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00

0.00
0.00
0.00

0.00
0.00

0.00

Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

¯
Table 4.4: The Level-2 Variance Covariance Matrix (S2 ) and Means (W2 )
Mean
Variance Covariance
0.000 0.00
0.000 0.00 0.00
0.000 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
17.672 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
13.793 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.319 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.961 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4.137 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
5.087 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
26.600 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.139 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

61

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

Table 4.4: Continued
Variance Covariance

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

20.17
14.68
0.00
0.00
0.00
0.00
0.52
0.39
3.03
3.20
0.12
-7.34
0.07

13.04
0.00
0.00
0.00
0.00
0.80
0.46
2.93
2.09
0.01
-5.27
0.05

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

62

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

1.03
0.25
-0.08
-0.45
-0.06
0.17
-0.01

0.39
-0.07
-0.37
-0.03
-0.13
-0.01

6.24
3.97
0.18
0.35
0.00

5.59
-0.20
-0.77
0.00

3.73
-0.67
0.00

29.01
0.10

0.02

4.1.1

Two-Level Structural Equation Model Based on Data of
SIMS-USA

The conceptual model ﬁtted on SIMS-USA data is displayed in Figure 4.1. The post-test
score is predicted by the pre-test score, and teacher variables. The pre-test score is predicted
by student background variables and school characteristic variables. The two-level structural
equation model (Muth´n, 1994) is displayed in Figure 4.2. This model is a particular case
e
of the general two-level SEM discussed in Section 3.2.

Figure 4.1: Conceptual framework model on SIMS-USA data
In the Level-1 model, the post-test outcome variable (POSTTEST) is predicted by the
pre-test outcome variable (PRETEST). The pre-test score is predicted by student characteristics including age (XAGE), educational expectation (EDUCEPT), homework time
(YMHWKT), and frequency of family help on homework (YFAMILY).
The pre-test score is also predicted by ﬁve latent variables including educational inspiration (EDUINSP), self encouragement (SLFENCRG), family support (FMLSUPRT), importance of learning mathematics (MTHIMPT), and socioeconomic status (SES). The latent
variables and their surrogate variables are displayed in Table 4.1.
In the level-2 model, the intercept of pre-test score (denoted as β0 in equation 4.3 and
PRETEST MEAN in Figure 4.2) is predicted by teacher/school-level variables including
previous year’s opportunities to learn (OTL) of arithmetic (OLDARITH), previous year’s
63

OTL of algebra (OLDALG), class-size (CLASSSIZE), and qualiﬁed mathematics teacher
rate in school (MTHONLY).
The intercept of post-test score (denoted as α0 in equation 4.4 and POSTTEST MEAN
in Figure 4.2 ) is predicted by β0 and three class-level variables, which are current year’s
OTL of algebra (NEWALG), current year’s OTL of geometry (NEWGEOM), and weekly
hours of math instructions (TPPWEEK). The residuals are independent of one another.
Mplus (Muth´n and Muth´n, 2009) code that ﬁts the two-level SEM is listed in Appendix
e
e
A.1. Table 4.5 lists the factor loadings, regression coeﬃcients, and residual variances.
Level-1 Model:

POSTTEST = α0 + α1 PRETEST + epost ;

PRETEST = β0 + β1 XAGE + β2 EDUCEPT + β3 YFAMILY + β4 YMHWKT+

(4.1)

(4.2)

β5 EDUINSP + β6 SLFENCRG + β7 FMLSUPRT + β8 MTHIMPT + β9 SES + epre ,
2
2
where epost ∼ N (0, σe
), e
∼ N (0, σepre ).
post pre
Level-2 Model:

β0 = γ0 + γ1 OLDARITH + γ2 OLDALG + γ3 CLASSSIZE + γ4 MTHONLY + uβ0 , (4.3)

2
with uβ ∼ N (0, σu );
β0
0
α0 = β0 + γ5 NEWALG + γ6 NEWGEOM + γ7 TPPWEEK + uα0 ,
64

(4.4)

Figure 4.2: Two-level structural equation model on SIMS-USA data

65

2
with uα0 ∼ N (0, σuα ).
0
This two-level structural model results in the estimates of the variance-covariance S1 )and
ˆ
ˆ
S2 . The estimated variance-covariance matrixes are denoted as Σ1 and Σ2 , respectively.
These model-based estimates of level-1 variable means (ˆ1 ) and variance-covariance matrix
µ
ˆ
(Σ1 ) are listed in Table 4.6. The model-based estimates of Level-2 variable means (ˆ2 ) and
µ
ˆ
variance-covariance matrix (Σ2 ) are listed in Table 4.7. The variance-covariance matrix of
the latent factors is listed in Table 4.8.
These model-based parameter estimates are treated as known values and are used to
generate longitudinal data of Cohort 2 at Time 0 (e.g., grade 7 in Y eari−1 ) and Time 1
(e.g., grade 8 in Year i). Details are in Section 4.1.3. Cohort 2 at Time 0 data are not
collected in SCD. Cohort 1 at Time 1 (e.g., grade 7 at Y eari ) data are treated as the
ˆ
“replacement” to estimate schooling eﬀect (δC2T 1−C1T 1 ). The schooling eﬀect estimation
bias in SCD is

ˆ
ˆ
BIAS(δC2T 1−C1T 1 ) = E(δC2T 1−C1T 1 ) − δC2T 1−C2T 0 .

(4.5)

The goal is to simulate SCD by generating Cohort 1’s Time 1 data that are non-comparable
with Cohort 2 at Time 0, so that matching can be used to reduce the “simulated selection
bias”, to assure the HEoG assumption, and to decrease estimation bias of schooling eﬀect.
A series of parameter manipulations are used to generate data of Cohort 1 at Time 1 (See
Section 4.2 for detail.).

66

Table 4.5: Two-Level Structural Equation Model Estimates (a.k.a True Pseudo-population Parameter Values)
Level-1 Variable

Label

Loading Coeﬃcient

Residual
Variance

Regression Coeﬃcient
PRETEST

POSTTEST

Coef. SE

PV

Coef. SE

PV

Coef. SE

PV

Est. SE

PV

Pre-Test Score

PRETEST

-

-

-

-

-

-

.72

.03

.00

31.87 1.94 .00

Post-Test Score

POSTTEST

-

-

-

-

-

-

-

-

-

25.64 1.27 .00

EDUINSP

YPWANT
YPWWELL
YPENC

1.00
1.05
1.82

.08
.11

.00
.00

.87

1.56 .58

-

-

-

.21
.37
.66

.01
.03
.05

.00
.00
.00

SLFENCRG

YIWANT
YMORMTH
YNOMORE

1.00
1.98
1.67

.18
.13

.00
.00

1.97

.56

.00

-

-

-

.58
.67
.77

.04
.05
.05

.00
.00
.00

FMLSUPRT

YPINT
YFLIKES
YMLIKES
YFABLE
YMABLE

1.00
.77
.46
1.00
.60

.05
.04
.06
.05

.00
.00
.00
.00

-.04

.25

.88

-

-

-

.62
.73
1.05
.85
1.27

.04
.03
.04
.05
.05

.00
.00
.00
.00
.00

67

MTHIMPT

YMIMPT
YFIMPT

1.00
1.06

Table 4.5: Continued.
-.89
.76
.05
.00

SES

YFEDUC
YMEDUC
YFOCCN
YMOCCN

1.00
.72
1.94
1.54

.04
.13
.14

.00
.00
.00

1.55

Age

XAGE

-

-

-

Parental Help

YFAMILY

-

-

Ed.
tion

EDUECPT

-

YMHWKT

.25

-

-

-

.17
.24

.02
.03

.00
.00

.30

.00

-

-

-

.17
.24
3.26
3.18

.01
.01
.13
.13

.00
.00
.00
.00

-.06

.02

.00

-

-

-

-

-

-

-

-1.44

.16

.00

-

-

-

-

-

-

-

-

1.28

.17

.00

-

-

-

-

-

-

-

-

-

-.03

.01

.01

-

-

-

-

-

-

PRETEST MEAN

1.26

.91

.17

-

-

-

1.21

.07

.00

3.97 .93

POSTTEST MEAN

16.61

1.80

.00

-

-

-

-

-

-

11.20 2.17 .00

Class Size

CLASSIZE

-

-

-

-.20

.06

.00

-

-

-

-

-

-

Prior OTL

OLDARITH
OLDGEOM

-

-

-

.65
.79

.36
.94

.07
.41

-

-

-

-

-

-

Qualiﬁed Math
Teacher Rate

MTHONLY

-

-

-

4.51

2.11 .03

-

-

-

-

-

-

Current
OTL

NEWALG
NEWGEOM

-

-

-

-

-

-

-.27
.37

.13
.14

.03
.01

-

-

-

TPPWEEK

-

-

-

-

-

-

.08

.02

.00

-

-

-

Expecta-

Homework
Time

Class
tion

Year

Instruc-

68

.00

Table 4.6: Model Estimated Parameters: Level-1 Variance
Variable Mean
RYPWANT 4.731 0.34
RYPENC 4.373 0.24 0.80
YPWWELL 4.241 0.14 0.25 0.81
YIWANT 4.319 0.08 0.15 0.08 0.75
YMORMTH 3.247 0.16 0.29 0.17 0.34 1.35
RYNOMORE 3.732 0.13 0.24 0.14 0.29 0.57 1.25
RYPINT 3.719 0.15 0.27 0.15 0.11 0.21 0.18 1.28
RYFLIKE 3.536 0.11 0.20 0.12 0.08 0.16 0.14 0.51 1.13
RYMLIKE 3.247 0.07 0.12 0.07 0.05 0.10 0.08 0.31 0.24
RYFABLE 3.917 0.15 0.27 0.15 0.11 0.21 0.18 0.67 0.51
RYMABLE 3.702 0.09 0.16 0.09 0.06 0.13 0.11 0.40 0.31
RYMIMPT 4.603 0.19 0.34 0.20 0.09 0.18 0.15 0.22 0.17
RYFIMPT 4.546 0.20 0.36 0.21 0.10 0.19 0.16 0.24 0.18
YFEDUC 3.375 0.04 0.07 0.04 0.01 0.02 0.02 0.20 0.15
YMEDUC 3.349 0.03 0.05 0.03 0.01 0.02 0.02 0.14 0.11
YFOCCN 4.277 0.07 0.13 0.07 0.02 0.05 0.04 0.39 0.30
YMOCCN 4.128 0.06 0.10 0.06 0.02 0.04 0.03 0.31 0.24
TOTPOS 0.854 0.11 0.20 0.12 0.25 0.49 0.41 0.31 0.24
TOTPRE 1.185 0.16 0.28 0.16 0.34 0.68 0.57 0.42 0.33
XAGE 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
EDUEPCT 2.968 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YFAMILY 1.745 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YMHWKT 2.984 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
OLDARITH 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
OLDGEOM 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NEWALG 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NEWGEOM 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TPPWEEK 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
CLASSSIZ 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MTHONLY 0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

69

ˆ
Covariance Matrix (Σ1 ) and Mean (ˆ1 )
µ

1.19
0.31
0.18
0.10
0.11
0.09
0.07
0.18
0.14
0.14
0.20
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

1.52
0.40
0.22
0.24
0.20
0.14
0.39
0.31
0.31
0.43
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

1.52
0.13
0.14
0.12
0.09
0.23
0.19
0.18
0.26
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.53
0.38
0.06
0.04
0.11
0.09
0.08
0.11
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.64
0.06
0.04
0.12
0.10
0.08
0.11
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.48
0.22
0.60
0.48
0.34
0.48
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.41
0.43
0.34
0.25
0.34
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

4.42
0.93
0.67
0.93
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

3.92
0.53
0.74
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

Table 4.6: Continued

Mean

46.95
20.93
-1.93
0.57
-0.38
-1.29
0.00
0.00
0.00
0.00
0.00
0.00
0.00

29.06
-2.68
0.79
-0.52
-1.79
0.00
0.00
0.00
0.00
0.00
0.00
0.00

36.06
-0.47
0.02
-0.29
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.59
-0.01
-0.05
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.35
0.17
0.00
0.00
0.00
0.00
0.00
0.00
0.00

70

47.26
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00

0.00
0.00
0.00

0.00
0.00

0.00

Table 4.7: Model Estimated Parameters: Level-2 Variance Covariance Matrix
Variabble Mean
Variance Covariance
RYPWANT
0.000 0.00
RYPENC
0.000 0.00 0.00
YPWWELL
0.000 0.00 0.00 0.00
YIWANT
0.000 0.00 0.00 0.00 0.00
YMORMTH
0.000 0.00 0.00 0.00 0.00 0.00
RYNOMORE
0.000 0.00 0.00 0.00 0.00 0.00 0.00
RYPINT
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYFLIKE
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYMLIKE
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYFABLE
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYMABLE
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYMIMPT
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
RYFIMPT
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YFEDUC
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YMEDUC
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YFOCCN
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YMOCCN
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TOTPOS 16.762 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TOTPRE 12.597 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
XAGE
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
EDUEPCT
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YFAMILY
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
YMHWKT
0.000 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
OLDARITH
0.710 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
OLDGEOM
0.319 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NEWALG
5.961 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
NEWGEOM
4.137 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
TPPWEEK
5.087 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
CLASSSIZ 26.600 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MTHONLY
0.139 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

71

ˆ
(Σ2 ) and Mean (ˆ2 )
µ

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

Variabble
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE
XAGE
EDUEPCT
YFAMILY
YMHWKT
OLDARITH
OLDGEOM
NEWALG
NEWGEOM
TPPWEEK
CLASSSIZ
MTHONLY

Table 4.7: Continued.
Variance Covariance

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

23.63
16.06
0.00
0.00
0.00
0.00
0.82
0.44
-0.38
0.50
0.25
-6.94
0.06

13.41
0.00
0.00
0.00
0.00
0.80
0.46
-0.16
-0.41
0.08
-5.40
0.05

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
72

0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00

1.03
0.25
-0.08
-0.45
-0.06
0.17
-0.01

0.39
-0.07
-0.37
-0.03
-0.13
-0.01

6.24
3.97
0.18
0.35
0.00

5.59
-0.20
-0.77
0.00

3.73
-0.67
0.00

29.01
0.10

0.02

Table 4.8: Covariance Matrix of the Five Latent Variables
EDUINSP
EDUINSP
.13***
SLFENCRG .08 ***
FMLSUPRT .15**
MTHIMPT
.19**
SES
.04**
*** p < .001, *** p <.05 *

4.1.2

SLFENCRG

FMLSUPRT

MTHIMPT

SES

.17***
.11***
.09***
.01*
p <.1

.67***
.22***
.2***

.36***
.06***

.31***

Longitudinal Data Generation

The model-based estimates listed in Table 4.5, Table 4.6, Table 4.7 and Table 4.8 are treated
as known parameter values and are plugged in the two-level SEM to generate longitudinal
data of C2T 0 and C2T 1. Mplus code for the data generation is listed in Appendix A.2.
Determine Class Sizes

Simulated class sizes are based on the observed class sizes in

SIMS-USA data. Table 4.9 displays the class-size distribution of the 126 Regular classes in
SIMS-USA data. The range of the observed class sizes is [6,42]. The observed class sizes
are rounded up resulting in four types of class sizes (with frequency): 10 (N=3), 20 (N=35),
30 (N=80), and 40 (N=8). The average class is 27.38, which is very close to the observed
average class-size of 26.60. In the literature, the class-size in a simulated two-level model
has been set at 30 (e.g., Tate and Wongbundhit, 1983).
Pseudo-Population Size

The SIMS-USA data are collected to represent 3,681,939 8th -

graders nested in 136,368 classes across the seven strata in the United States (Wolfe, 1987).
The simulated pseudo-population includes 12,600 classes and 345,000 students. The classsize distribution in the pseudo-population is: 10 (N=300), 20 (N=3500), 30 (N=8000), and
40 (N=800). The average class-size is 27.13.
73

Table 4.9: Class Size Distribution of 126
Class Size Frequency Percent
6
1
.8
14
2
1.6
15
1
.8
17
2
1.6
18
1
.8
19
3
2.4
20
4
3.2
21
4
3.2
22
6
4.8
23
11
8.7
24
3
2.4
25
11
8.7
26
6
4.8
27
14
11.1
28
11
8.7
29
7
5.6
30
6
4.8
31
14
11.1
32
3
2.4
33
7
5.6
34
1
.8
35
1
.8
36
3
2.4
37
2
1.6
38
1
.8
42
1
.8
Total
126
100.0

Classes of SIMS-USA Data
Cumulative Percent
.8
2.4
3.2
4.8
5.6
7.9
11.1
14.3
19.0
27.8
30.2
38.9
43.7
54.8
63.5
69.0
73.8
84.9
87.3
92.9
93.7
94.4
96.8
98.4
99.2
100.0

Evaluation of Pseudo-Population Parameter Recovery

The two-level SEM is ﬁt on the

generated pseudo-population data. The estimated parameter values along with the “true”
values are listed in Table 4.10. Except for a negative estimate bias (−0.2268) of the regression
coeﬃcient of latent construct Self Encouragement (β6 = .87), all other parameter estimation
bias values are less than 0.09.

74

Table 4.10: Recovery of Pseudo-Population Parameters.
Within Level
Observed
Variable
RYPWANT
RYPENC
YPWWELL
YIWANT
YMORMTH
RYNOMORE
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
RYMIMPT
RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN

Population Estimated Bias
Means
4.7310
4.3730
4.2410
4.3190
3.2470
3.7320
3.7190
3.5360
3.2470
3.9170
3.7020
4.6030
4.5460
3.3750
3.3490
4.2770
4.1280

4.7312
4.3741
4.2421
4.3226
3.2473
3.7337
3.7197
3.5375
3.2471
3.9191
3.7037
4.6034
4.5468
3.3742
3.3480
4.2811
4.1246

0.0002
0.0011
0.0011
0.0036
0.0003
0.0017
0.0007
0.0015
0.0001
0.0021
0.0017
0.0004
0.0008
-0.0008
-0.0010
0.0041
-0.0034

Latent
Observed
Variable Variable
EDUISPR RYPWANT
RYPENC
YPWWELL
SLFENCRGYIWANT
YMORMTH
RYNOMORE
FMLYSUPT
RYPINT
RYFLIKE
RYMLIKE
RYFABLE
RYMABLE
MTHIMPT RYMIMPT
RYFIMPT
SES
YFEDUC
YMEDUC
YFOCCN
YMOCCN

75

Population Estimated Bias
Factor Loadings
1.0000
1.8220
1.0540
1.0000
1.9840
1.6740
1.0000
0.7690
0.4580
1.0010
0.6010
1.0000
1.0590
1.0000
0.7180
1.9410
1.5400

1.0000
1.8288
1.0534
1.0000
1.9934
1.6816
1.0000
0.7704
0.4582
1.0011
0.5986
1.0000
1.0617
1.0000
0.7173
1.9391
1.5349

0.0000
0.0068
-0.0006
0.0000
0.0094
0.0076
0.0000
0.0014
0.0002
0.0001
-0.0024
0.0000
0.0027
0.0000
-0.0007
-0.0019
-0.0051

Table 4.10: Continued.
Population
Observed
Residual
Variable
ances
RYPWANT 0.2080
RYPENC
0.3650
YPWWELL 0.6600
YIWANT
0.5770
YMORMTH 0.6700
RYNOMORE 0.7720
RYPINT
0.6180
RYFLIKE
0.7330
RYMLIKE
1.0490
RYFABLE
0.8480
RYMABLE
1.2740
RYMIMPT
0.1700

Estimated Bias
Vari0.2085
0.3650
0.6616
0.5774
0.6732
0.7697
0.6149
0.7328
1.0510
0.8448
1.2759
0.1703

0.0005
0.0000
0.0016
0.0004
0.0032
-0.0023
-0.0031
-0.0002
0.0020
-0.0032
0.0019
0.0003

RYFIMPT
YFEDUC
YMEDUC
YFOCCN
YMOCCN
TOTPOS
TOTPRE

0.2356
0.1639
0.2459
3.2391
3.1889
31.8988
25.5541

-0.0004
-0.0021
-0.0001
-0.0179
0.0059
0.0248
-0.0839

0.2360
0.1660
0.2460
3.2570
3.1830
31.8740
25.6380

Population Estimated Bias
Latent
Variable
SLFENCRG
FMLYSUPT
FMLYSUPT
MTHIMPT
MTHIMPT
MTHIMPT
SES
SES
SES
SES

Latent
Variable
EDUISPR
EDUISPR
SLFENCRG
EDUISPR
SLFENCRG
FMLYSUPT
EDUISPR
SLFENCRG
FMLYSUPT
MTHIMPT

Outcome
Variable

Predictor

Regression Coeﬃcient

EDUISPR
SLFENCRG
FMLYSUPT
MTHIMPT
SES
XAGE
EDUEPCT
YFAMILY
YMHWKT
TOTPRE

0.8700
1.9730
-0.0380
-0.8850
1.5540
-0.0570
1.2770
-1.4390
-0.0320
0.7200

TOTPRE

TOTPRE

TOTPOS

76

Correlation Coeﬃcient
0.0800
0.1460
0.1050
0.1880
0.0910
0.2220
0.0360
0.0120
0.2010
0.0580
Population

0.0795
0.1463
0.1054
0.1881
0.0907
0.2227
0.0362
0.0127
0.2033
0.0580
Estimated

0.6432
2.0168
-0.0160
-0.7823
1.5335
-0.0573
1.2781
-1.4015
-0.0320
0.7212

-0.0005
0.0003
0.0004
0.0001
-0.0003
0.0007
0.0002
0.0007
0.0023
0.0000
Bias

-0.2268
0.0438
0.0220
0.1027
-0.0205
-0.0003
0.0011
0.0375
0.0000
0.0012

Table 4.10: Continued.
Between Level
Intercept
TOTPRE

TOTPOS

Population
Regression Coeﬃcient
OLDARITH 0.6500
OLDGEOM 0.7860
CLASSSIZE -0.2010
MTHONLY 4.5080
TOTPRE
1.2050
NEWALG
-0.2690
NEWGEOM 0.3710
TPPWEEK 0.0750

Estimated
0.6106
0.7843
-0.2067
4.5822
1.2096
-0.2832
0.3741
0.0776

Bias
-0.0394
-0.0017
-0.0057
0.0742
0.0046
-0.0142
0.0031
0.0026

77

Intercept
TOTPOS
TOTPRE
Intercept
TOTPOS
TOTPRE

Population Estimated
Means
1.2640
1.3101
16.6120
16.7208
Residual Variance
3.9690
3.9681
11.1980
11.2468

Bias
0.0461
0.1088
-0.0009
0.0488

Longitudinal Data Generation Routines

Longitudinal data sets of the focal Cohort 2

across Time 0 and Time 1 are generated through the following steps:

1. Level-2 independent covariates in the regression equation of the pre-test class-level
means (denoted as vectorβ0 ) are generated through a multivariate normal distriC2T 0.β0 C2T 0.β0
bution, MN(µ2
, Σ2
). These independent covariates include previous
OTL of arithmetic (OLDARITH), previous OTL of algebra (OLDALG), class-size
(CLASSSIZE), and qualiﬁed mathematics teacher rate (MTHONLY). Their mean
C2T 0.β0
vector µ2
=[0.710, 0.319, 26.600, 0.139] and their variance-covariance matrix
C2T 0.β0
Σ2
is in Table 4.7. The level-2 residuals uβ are generated through an univari0
2 ). The “true” variance σ 2
ate normal distribution, N (0, σu
uβ is 11.198.
β0
0
2. Based on the regression equation, pre-test class-level means β0 are generated by plugging in the regression coeﬃcients γ0 , . . ., γ4 (Table 4.4), and the independent variables
and residuals generated in the ﬁrst step.

3. Level-2 independent covariates in the regression equation of the post-test class-level
means (denoted as vector α0 ) are generated through a multivariate normal distribution,
C2T 0.α0
MN(µC2T 0.α0 , Σ2
). These independent covariates include current year OTL
of algebra (NEWALG), current year OTL of geometry (NEWGEOM), and weekly
hours of math instructions (TPPWEEK). Their means and variance-covariances are
in Table 4.7. The level-2 residuals uα0 are generated through an univariate normal
2
2
distribution, N (0, σuα ). The variance σuα is 3.97.
0
0
4. Based on the regression equation, post-test class-level means α0 are generated by
plugging in the regression coeﬃcients γ5 , . . ., γ7 (Table 4.4), β0 generated in the ﬁrst
78

two steps, and the independent variables and residuals generated in the third step.

5. Level-1 latent variables are generated through a multivariate normal distribution,
C2T 0.ξ
C2T 0.ξ
MN(µ1
, Σ1
). These latent variables (denoted as the vector ξ) include
educational inspiration (EDUINSP), self encouragement (SLFENCRG), family support (FMLSUPRT), importance of learning mathematics (MTHIMPT), and socioecoC2T 0.ξ
nomic status (SES). Their mean vector µ1
=[0,0,0,0,0]. Their variance-covariance
C2T 0.ξ
matrix Σ2
is in Table 4.6. The level-1 residuals epre are generated through
2
2
N (0, σepre ), with σepre =31.87.
6. Based on Ypre ’s regression equation, level-1 dependent variable YP re data are generated by plugging in β0 generated in the ﬁrst two steps, the regression coeﬃcients β1 ,
. . ., β9 (listed in Table 4.4), level-1 latent variables and residuals generated in the ﬁfth
step.
2
2
7. Level-1 residuals epost are generated through N (0,σe
), with σepre = 25.64. Based
post
on Ypost ’s regression equation, level-1 dependent variable Ypost data are generated by
plugging in the level-1 YP re data generated in the sixth step, α0 generated in the
fourth step, the regression coeﬃcients α1 (listed in Table 4.4), and level-1 residuals
epost .
8. The surrogate variables of each of the level-1 latent variables are generated through a
multivariate normal distribution. For example, level-1 latent variable SES is associated
with 4 surrogate variables through the measurement model

C2T 0
X1.SES = µC2T 0
X1.SES + λX1.SES ηSES + eX1.SES ,
79

(4.6)

with eX1.SES ∼ N (0, ΘX1.SES ) and ηSES ∼ N (0, ΦSES ). The surrogate variC2T 0
ables are denoted as X1.SES , which include Father’s/Mother’s education level (YFEDUC /YMEDUC), and Father’s/Mother’s occupation national code (YFOCCN/YMOCCN).
C2T 0
X1.SES ∼ M N (µC2T 0 , ΣC2T 0 ). Their mean vector µC2T 0 =[ 3.375, 3.349,
X1.SES X1.SES
X1.SES
4.277, 4.128]. ΣC2T 0
X1.SES is computed by λX1.SES ΦSES λ X1.SES +ΘX1.SES .
λX1.SES is the factor loading vector. ΦSES is the variance of latent variable SES.
ΘX1.SES is the diagonal matrix of residual variances. The parameter values of
λX1.SES ΦSES and ΦSES are in Table 4.4. The computed ΣC2T 0
X1.SES is a diagonal matrix, whose diagonal entries are 0.475, 0.405, 4.421, and 3.916.

The eight steps above together generate longitudinal data of focal Cohort 2 across Time
0 and Time 1.

4.2

Generate Synthetic Cohort Design Data with Simulated Selection Bias

The goal is to simulate synthetic cohort design by generating Cohort 1’s Time 1 data that are
non-comparable with Cohort 2 at Time 0 based on the conceptual two-level SEM. That is,
C1T
C2T
Ypre 1 = Ypre 0 , indicating the baseline scores are diﬀerent due to the simulated selection
bias.
ˆ
Thus schooling eﬀect estimate using SCD, δC2T 1−C1T 1 becomes biased compared with
δC2T 1−C2T 0 . In order to reduce bias, matching is used to ‘assure’ a conditional equivalence,
C1T
C2T
[Ypre 1 = Ypre 0 |f (X)], f (X) can be a function estimating the propensity score (RosenC1T
C2T
baum and Rubin, 1983). Given the conditional equivalence, using Ypre 1 to replace Ypre 0
80

as a baseline score to estimate schooling eﬀect is applicable and accurate.
The two-level SEM is complex because it involves a large number of parameters. For
example, the Time-0-SEM in the the equation (3.19) includes (level-1’s and level-2’s) regression intercepts and coeﬃcients, factor loadings, residual variances, and latent variable
distribution parameters.
For the purpose of simplicity, drop the subscript 0 of the parameters in the Time-0-SEM
and add superscripts C1T 1 and C2T 0 to identify all possible situations where selection bias
C1T
C2T
can cause Ypre 0 = Ypre 1 .
Table 4.11 summarizes those situations, each of which may occur at level-1 and/or level2. There are too many possible situations, each of which can “break” the HEoG assumption
and bias SCD’s schooling eﬀect estimate. In order to make the simulation manageable,
constraints and appropriate assumptions are needed to limit the situations. For example,
factorial invariance (Cheung and Rensvold, 2002) and regression homogeneity (Wooldridge,
2002) rule out the situations that cause selection bias due to factor loadings and regression
coeﬃcients.
Section 4.2.1 to Section 4.2.5 manipulate parameters such as means and variances to
simulate selection bias due to the data hierarchical structure. Section 4.2.6 manipulates parameters such as latent variable mean and surrogate variables’ residual variances to simulate
selection bias due to measurement errors. Section 4.2.7 manipulates level-1 and/or level -2
residual variances to simulate selection bias due to omitted variables. This section also examines if the association strength (indicated by R2 ) between the outcome Y and covariates
(used in matching) aﬀects the bias reduction rate of matching.

81

Table 4.11: Possible Simulation Manipulations on Comparability of C2T0 and C1T1 in SEM Framework
Outcome Regression InVariable tercept

Structural
Model

η C2T 0
η C1T 1

δ C2T 0 = δ C1T 1

E(eC2T 0 ) = E(eC1T 1 )
2
2

E(ξ C2T 0 ) = E(ξ C1T 1 )
V(ξ C2T 0 ) = V(ξ C1T 1 )

E(eC2T 0 ) = E(eC1T 1 )
1
1

E(η C2T 0 ) = E(η C1T 1 )

C1T
V(eC2T 0 ) = V(e1 1 )
1

Y

v C2T 0 = v C1T 1

Distribution Parameter

C1T
V(eC2T 0 ) = V(e2 1 )
2

Measurement X
Model

Residual

V(η C2T 0 ) = V(η C1T 1 )

AC2T 0 = AC1T 1 U C2T 0 = U C1T 1

82

NA

Coeﬃcient

g C2T 0 = g C1T 1

λC2T 0 = λC1T 1

B C2T 0 = B C1T 1

4.2.1

Generate Hierarchically Structured C1T 1 Data with Selection Bias

These situations include the following practical issues:
1. Non-comparability occurs only where level-1 covariates and level-2 covariates are identical. It occurs when the two adjacent seventh grade classes being located in the same
school and taught by the same teacher. That is, matching can be only on level-1
2. Level-2 covariates are not comparable and level-1 covariates are identical or level-1
comparability is not a concern. For instance, in the cluster randomized design, clusters
including classes or schools are the sampling and intervention units; aggregated cluster
means are the analysis units. Matching clusters to create level-2 comparability;
3. Level-1 and Level-2 covariates cause non-comparability. This is a concern when clusters
are sampled from the population of interest, and intervention happens on individuals.
Both level-1 and level-2 matching, i.e, the dual matching, is necessary.

4.2.1.1

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s

There are four level-1 covariates: age (XAGE), education expectation (EDUCEPT), homework time (YMHWKT ), and frequency of family help on homework (YFAMILY). Their
mean vector µC2T 0 = [0.000, 2.968, 1.745, 2.984] is manipulated by adding a constant vec1
tor c1 = (−1, 1, −1, −1). The manipulated mean vector is denoted as µC1T 1 = [-1.000,
1
3.968, 0.745, 1.984] and used to generate data of Cohort 1 at Time 1. Varying the value of
C1T 1 and X C2T 0 . A smaller c will
c1 will vary the overlap between the distribution of X1
1
1
C1T 1 and X C2T 0 ; and it is more likely to achieve
create a bigger overlapping between X1
1
83

successfully matched units given a speciﬁc sample size.
The simulated bias on pre-test score is 2.8052 . Thus, the manipulated population pretest mean of C1T 1 is increased from 13.711 to 16.576. Therefore, using SCD underestimates
ˆ
ˆ
the learning eﬀect by 2.805. That is BIAS(δC2T 1−C1T 1 )=2.805. After matching the
ˆ
ˆ
level-1 covariates, BIAS(δC2T 1−C1T 1 ) will be shrunk. Thus a bias reduction rate can be
computed to evaluate the performance of matching (See detail in Section 4.3). The logic of
using matching is the same for the other six simulation situations. All simulation results are
in Chapter 5.

4.2.1.2

C1T 1’s Level-1 Covariate Variances Diﬀer from C2T 0’s

The variances of the four level-1 covariates are manipulated by adding an extra 15% to each
C2T 0 = [36.056, 0.590, 0.354, 47.260]
of the original variances. The original variance vector σ1
is manipulated by multiplying a constant vector p1 = (1.15, 1.15,1.15,1.15). The manipuC1T 1 =[ 41.4644, 0.6785, 0.4071, 54.349]. Varying the
lated variance vector is denoted as σ1
C1T 1 and X C2T 0 . A larger p will
multiplier vector p1 will vary the overlap between X1
1
1
C1T 1 . This will decrease the chance of achieving successfully matched units given
increase σ1
a speciﬁc sample size of Cohort 1 at Time 1.

4.2.1.3

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s

The level-2 covariates include previous opportunities to learn arithmetic (OLDARITH)
and algebra (OLDALG), class-size (CLASSSIZE), and qualiﬁed mathematics teacher rate
(MTHONLY). The mean vector, µC2T 0 =[0.710, 0.319, 26.600, 0.139], is manipulated by
2
2 The four regression coeﬃcients are, -0.057, 1.277, -1.439, and -0.032. It results in the
simulated bias: (-0.057)*(-1)+ 1*1.277+(-1)*(-1.439)+(-1)*(-0.032)=2.805.

84

multiplying another constant vector p2 = (1.5, 1.5, -0.5,1.5). After manipulation, the mean
vector µC1T 1 = [1.065, 0.4785, 13.3, 0.2085] is used to generate the data of Cohort 1 at Time
2
1. Note that the average class-size3 in Cohort 1 at Time 1 is half time as large as in Cohort
2 at Time 0. The regression coeﬃcients of the four level-2 covariates are 0.65, 0.79, -0.2, and
4.51, respectively. This leads to a total bias of 3.33.

4.2.1.4

C1T 1’s Level-2 Covariate Variances Diﬀer from C2T 0’s

The variances of the four level-2 covariates are manipulated by adding a 15% to each of
C2T 0 =[1.032, 0.385, 29.005, 0.018] is
the original variances. The original variance vector, σ2
manipulated by multiplying a constant vector p2 = (1.15, 1.15,1.15,1.15). The manipulated
C1T 1 =[ 1.187, 0.443, 33.356, 0.021]. Varying the multiplier
variance vector is denoted as σ2
C1T 1 and X C2T 0 . A larger p will increase
vector p1 will vary the overlap between X1
1
1
C1T 1 . This will decrease the chance of achieving successfully matched units given a speciﬁc
σ1
level-2 sample size of Cohort 1 at Time 1.

4.2.1.5

C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s

In this situation, both level-1 and level-2 covariate means are manipulated by following
procedures in Section 4.2.1.1 and Section 4.2.1.3. This way, the initial diﬀerence is inﬂated
to 6.135, which is the sum of two initial diﬀerence values, 3.330 in Section 4.2.1.1 and 2.805
in Section 4.2.1.3
3 The class sizes shrink by half in Cohort 1 at Time 1. This requires to use Mplus command
of CSIZES = 300 (5) 3500 (10) 8000 (15) 800(20) in Appendix A2, compared with the code
of CSIZES = 300 (10) 3500 (20) 8000 (30) 800(40) in generating data of Cohort 2 at Time
0.
85

4.2.2

Generate Data for Matching on Latent Variables v.s. Matching on Surrogate Variables

Data generation in this section involves manipulating random measurement errors and reliˆ
ability values. Among the ﬁve latent constructs, SES (β9 = 1.55, SE = 0.30, p<.001) and
ˆ
Self Encouragement (β6 = 1.97, SE = 0.56, p<.001) are statistically signiﬁcantly predictive
of pre-test score. Because of its practical importance in education studies, only latent variable SES and its four surrogate variables are focused on in simulation manipulation. The
four surrogate variables include: Father’s/Mother’s education level (YFEDUC/YMEDUC),
and Father’s/Mother’s occupation national code (YFOCCN/YMOCCN). All manipulations
are summarized in Table 4.12.

4.2.2.1

C1T 1’s Surrogate Variable Means Diﬀer from C2T 0’s, with the Same
Latent Means and Low Reliability

C2T 0
The variance vector of the four surrogate variables is σX .SES =[0.475, 0.405, 4.421, 3.916]
1
respectively. The half standard deviation vector is c3 =[0.345, 0.318, 1.051, 0.989]. Their
mean vector µC2T 0 =[3.375, 3.349, 4.277, 4.128] is manipulated by adding a half stanX1.SES
dard deviation to each. The manipulated mean vector is denoted as µC1T 1
X1.SES =[ 3.720,
3.667, 5.328, 5.117] and used to generate data of Cohort 1 at Time 1 through a multivariate
C1T 1
normal distribution X1.SES ∼ M N (µC1T 1 , ΣC1T 1 ). ΣC1T 1
X1.SES X1.SES
X1.SES is computed by
λX1.SES ΦSES λ X1.SES +ΘX1.SES .
The factor loading vector λX1.SES =[1.000, 0.718, 1.941, 1.540] and the latent variable SES variance ΦSES =0.31. The values of ΘX1.SES are in Table 4.4. The re86

Table 4.12: Simulation Design of Matching on Latent and Surrogate Variables

PreTest

PostTest

ICC

Reliability

Latent Variable Mean

Observed Variable Mean

Cohort 1 Time 1
(7th grade in Year i+1)

ICC

Reliability

Latent Variable Mean

Observed Variable Mean

Cohort 2 Time 0
(7th grade in Year i)

PreTest

PostTest

µ

0

Low 0.318 0.337 µ+c3

0

Low 0.311 0.331

µ

0

Low 0.318 0.337 µ

0

High 0.311 0.331

µ

0

Low 0.318 0.337 µ+c4

0.68 High 0.311 0.331

µ

0

High 0.318 0.337 µ+c4

0.68 High 0.311 0.331

liability coeﬃcient is computed as the ratio of SUM(λX1.SES ΦSES λ X1.SES )4 and
SUM(ΣC1T 1 ) (Lord and Novick, 1968; Raykov, 1997). The pseudo-population level
X1.SES
reliability coeﬃcient, for both Cohort 2 at Time 0 and Cohort 1 at Time 1, is equal to 0.25,
which is low.
A two-level SEM is ﬁt to the generated pseudo-population data to derive the factor score
of latent variable SES. When ﬁtting the two-level SEM, the mean of the latent variable SES
is set to 0. This setting is the same for pseudo-population data of Cohort 1 at Time 1 and
for pseudo-population data of Cohort 2 at Time 0.
4 Function SUM(.) add up all the components of a matrix.
87

4.2.2.2

C1T 1’s Surrogate Variables Have Higher Reliability than C2T 0’s, with
the Same Surrogate Means and the Same Latent Means

In this simulation, the residual variances of the four surrogate variables are reduced by 90%.
Thus, the reliability will be increased to 0.78. The original residual variance matrix





 0.166
0
0
0





 0
0.246
0
0

ΘX1.SES = 



 0
0
3.257
0




0
0
0
3.183




















(4.7)

is manipulated by reducing the diagonal entries by 90%. The manipulated residual variance
matrix becomes






 0.017
0
0
0





 0
0.025
0
0

∗
ΘX1.SES = 



 0
0
0.326
0




0
0
0
0.318










.









(4.8)

C1T 1
The surrogate variables are generated through a multivariate normal distribution X1.SES ∼
C2T 0
M N (µC1T 1 , ΣC1T 1 ). The mean vector µC1T 1
X1.SES X1.SES
X1.SES is µX1.SES =[ 3.375, 3.349, 4.277,
∗
4.128]. ΣC1T 1
X1.SES is computed by λX1.SES ΦSES λ X1.SES +ΘX1.SES . The factor
loading vector λX1.SES and the latent variable SES variance ΦSES are the same as those
88

in Section 4.2.2.1. The latent variable SES mean is set as 0 for both Cohort 1 at Time 1 and
Cohort 2 at Time 0, when Mplus ﬁts the two-level SEM to estimate factor scores.

4.2.2.3

C1T 1’s Surrogate Variables Have Higher Reliability, Diﬀerent Latent
Variable Mean from C2T 0’s

In this simulation, the level-1 surrogate variables of Cohort 1 at Time 1 have a higher
reliability. This manipulation is the same as what has been discussed in Section 4.2.2.2. The
latent variable mean in Cohort 2 at Time is 0; the latent variable mean in Cohort 1 at Time
1 is 0.68, which is a half of the standard deviation of the latent variable.
When ﬁtting the two-level SEM to estimate factor scores, the mean of the latent variable
SES is set to as 0 by default in Cohort 2 at Time 0. The latent mean is set to 0.68 in
Cohort 1 at Time 1. Because of the latent mean diﬀerence, the surrogate variable means of
two cohorts are diﬀerent by a constant vector c4 . c4 = 0.68 ∗ λX1.SES , with λX1.SES
displayed in Section 4.2.2.1.

4.2.2.4

C1T 1’s Latent Variable Mean Diﬀer from C2T 0’s, with Same Higher
Reliability

Both Cohort 1 at Time 1 and Cohort 2 at Time 0 have a higher reliability of .78. This
indicates a strong relationship between the four surrogate variables and the latent variable
SES. The latent mean of SES in Cohort 1 at Time 1 is manipulated in the same way as that
discussed in Section 4.2.2.3. This section adapts the latent mean settings of Mplus discussed
in Section 4.2.2.3 to estimate the factor scores for both cohorts.
89

4.2.3

Manipulate R2 to Generate Data for Matching

2
2
In this simulation, level-1 variance σepre and level-2 residual variance σuα are reduced by
0
half, respectively. A value of ICC due to a manipulation is computed. Table 5.2 (Colum 2
and 3) lists the ICC’s.

4.2.3.1

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with Level-1 Vari2
ance σepre Reduced by Half

In this simulation, level-1 covariates means in Cohort 1 at Time 1 are manipulated by
2
following the same procedure of Section 4.2.1.1. The residual variance σepre in both cohorts
is set as 12.819, which is reduced by 50% of the value (25.638) in Section 4.2.1.1. The initial
diﬀerence on Pre-test score is 2.805.

4.2.3.2

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with Level-1 Vari2
ance σepre Reduced by Half, and Initial Diﬀerence Reduced

2
In this simulation, the residual variance σepre in both cohorts is set as 12.819, which is a
50% reduction. In Cohort 2 at Time 0, the standard deviation vector of the four level-1
C2T 0 =[6.005, 0.768, 0.595, 6.875] and a half of standard deviation is [3.002,
covariates, is σ1
0.384, 0.297, 3.437]. The mean vector of the four level-1 covariates µC2T 0 =[0.000, 2.968,
1
1.745, 2.984], is manipulated by deducting/adding a half of standard deviation to each.
The deducting/adding operation is determined by the negative/positive sign of the regression coeﬃcient. The manipulated mean vector µC1T 1 =[ -3.002, 3.352, 1.448, -0.453] is
1
used to generate data of Cohort 1 at Time 1. The regression coeﬃcients of the four covariates are, -0.057, 1.277, -1.439, and -0.032, thus, the bias on Pre-test score is 1.1995 , which
5 (-3.002)*(-0.057)+0.384*1.277+(-0.297)*(-1.439)+(-3.437)*(-0.032)=1.199
90

is about 43% of the bias in Section 4.2.3.2.

4.2.3.3

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with Level-2 Vari2
ance σuα Reduced by Half
0

In this simulation, level-2 covariate means are manipulated by following the same procedures
2
as in Section 4.2.1.3. The residual variance σuα in both cohorts is set as 5.599, which is
0
reduced by 50% of the value (11.198) in Section 4.2.1.3. The initial diﬀerence on Pre-test
score due to level-2 covariate means diﬀerence is 3.330, which is the same as that in Section
4.2.1.3.

4.2.3.4

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with Level-2 Vari2
ance σuα Reduced by Half and Initial Diﬀerence Reduced by Half
0

2
In this simulation, the residual variance σuα in both cohorts is set as 5.599, which is
0
reduced by 50% of the value (11.198) of 4.2.1.3. In Cohort 2 at Time 0, the standard
C2T 0 =[1.016, 0.620, 5.386, 0.134] and a
deviation vector of the four level-2 covariates, is σ2
half of the standard deviation is [0.508, 0.310, 2.693, 0.067]. The mean vector of the four
level-2 covariates µC2T 0 =[ 0.710, 0.319, 26.600, 0.139] is manipulated by deducting/adding
2
a half of the standard deviation to each.
The deduction or addition operation is determined by the negative/positive sign of the
regression coeﬃcient. The manipulated mean vector µC1T 1 =[1.218, 0.629, 23.90, 0.206] is
2
used to generate data of Cohort 1 at Time 1. The regression coeﬃcients of the four covariates
are, 0.65, 0.79, -0.2, and 4.51, thus, the bias on level-2 pre-test intercept is 1.4166 , which is
6 0.65*0.508+0.79*0.310+(-0.2)*(-2.693)+ 4.51* 0.067=1.416.
91

about 43% of the initial diﬀerence of the bias in 4.2.3.3.

4.2.3.5

C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s with
Both Level-1 and Level-2 Variances Reduced by Half

This simulation combines section 4.2.3.1 and 4.2.3.3 to generate data for Cohort 1 at Time
1. In this way, the initial diﬀerence is inﬂated to 6.135. which is the same as that in 4.2.1.5.
This simulation is diﬀerent from that in 4.2.1.5 is that the former has higher level-1 R2 and
level-2 R2 .

4.2.3.6

C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from C2T 0’s, with
Both Level-1 and Level-2 Variances Reduced by Half and Total Initial
Diﬀerence Reduced

This simulation combines the manipulations of 4.2.3.2 and 4.2.3.4 to generate data for Cohort
1 at Time 1. In this way, the initial diﬀerence is 2.615, which is about 43% of the initial
diﬀerence (6.135) in 4.2.3.5. Both this simulation and simulation 4.2.3.5 have the same lower
level-1 R2 and lower level-2 R2 .
Contrasting simulations of Section 4.2.3 with those of Section 4.2.1 allows examinations
of the following: 1) if increasing R2 improves the bias reduction rate; and 2) if deceasing
initial selection bias after increasing R2 further improves bias reduction rate.

4.3

Simulation Evaluation

Generating C1T 1, C2T 0, and C2T 1 data sets allows the computation of quasi-longitudinal
growth and real longitudinal growth to examine estimation bias reduction rate. The eﬀec92

tiveness of matching is examined using bias reduction rate (Cochran and Rubin, 1973). It
is computed as

100 1 −

Schooling Eﬀect Estimation Bias in SCD After Matching
Schooling Eﬀect Estimation Bias in SCD Without Matching

%.

(4.9)

The detailed computation is described in the following three subsections. Because the
computation involves a sample index, the notation of schooling eﬀect estimates used in this
section are diﬀerent from those in previous sections.
However, the schooling eﬀect estimate based on longitudinal C2T 0 − C2T 1 data are
conceptually invariant in the dissertation study. Also, the schooling eﬀect estimate based on
SCD’s C1T 1 − C2T 1 data are also conceptually invariant across chapters in this dissertation
study.

4.3.1

Compute Initial Diﬀerence

The longitudinal estimate of schooling eﬀect δ P osP re from sample i (size is ni ) based upon
C2T 0 − C2T 1 data will be denoted as

ˆP
¯ C2T
¯ C2T
δi osP re = YP ost 1 − YP re 0 ,
i
i

(4.10)

with i = 1, 2, . . . , 200.
The SCD used C1T 1 − C2T 1 data to estimate schooling eﬀect δ SCD , which is computed
as
ˆSCD = Y C2T 1 − Y C1T 1 ,
¯
¯
δi
P osti
P rei
with i = 1, 2, . . ., 200.
93

(4.11)

δ
The initial diﬀerence thus is deﬁned as (BIASinitial ) and it is computed as
δ
ˆP
ˆSCD ),
BIASinitial = E(δi osP re − δi

(4.12)

with i = 1, 2, . . ., 200.
It is the mean of 200 biases. Each bias is the diﬀerence between a schooling eﬀect estimate
ˆP
ˆSCD .
δi osP re and a SCD-based estimate δi

4.3.2

Compute After Matching Bias

After matching, SCD estimate of schooling eﬀect will be

M δ SCD = [Y C2T 1 − Y C1T 1 |f (X C2T 0 ) = f (X C1T 1 )],
ˆ
¯
¯
i
i
i
P osti
P rei

(4.13)

with j = 1, 2, . . ., 200 and f (.) representing some function. For example, f (.) can be the
propensity score function.
δ
After matching, bias is deﬁned as BIASmatchng and it is computed as
δ
ˆP
ˆSCD ,
BIASmatching = E(δi osP re − M δi

(4.14)

with i = 1, 2, . . ., 200. It is the mean of 200 biases. Each bias is the diﬀerence between a
ˆP
ˆSCD .
schooling eﬀect estimate δi osP re and a matched synthetic cohort design estimate M δi
94

4.3.3

Compute Bias Reduction Rate

After-matching bias and initial diﬀerence together deﬁne the bias reduction rate (BRR),
which is computed as

δ
BRRmatching = 100 ∗ (1 −

δ
BIASmatching
)%.
BIAS δ
initial

(4.15)

For example, if initial diﬀerence is 6 before matching and 2 after matching, then the bias
reduction rate due to matching is 100(1-2/6)%=67%. That is, two thirds of the initial
diﬀerence has been accounted for by matching. Larger value of bias reduction rate indicates
a better performance of matching in assuring HEoG of Synthetic Cohort Design.

95

Chapter 5
Matching Simulation Results and
Discussions
This chapter reports the detailed matching procedures and results of analyses of the simulated situations discussed in Chapter 4.

5.1

Three Types of Matching Routines

Matching is conducted using the R (R Development Core Team, 2007) module MatchIt (Ho
et al., 2009) and Match (Sekhon, 2007) . The R code is attached in Appendix A.3, A.4, and
A.5.

5.1.1

Level-1 Matching

By ignoring the hierarchical structure, individuals are matched. The detailed matching
procedures are as following:
96

1. Randomly draw 100 classes from the pseudo-population of Cohort 2 at Time 0. Let nj
100
be the class-size of j th class. The sample size is nC2T 0 =
nj . Randomly draw 100
j=1
classes from the pseudo-population of Cohort 1 at Time 1. Let n be the class-size of
j
100
j th class. The sample size is nC1T 1 =
nj . Let (Y, X, W )C2T 0 be the ith data
i
j =1
record of the Cohort 2 at Time 0 sample, with i =1,2,. . . , nC2T 0 . Vector Y represents
the level-1 pre-test score variable. X is the level-1 variable vector including both the
latent and the observed. W is the level-2 variable vector. Let (Y, X, W )C1T 1 be the
i
i th data record of the Cohort 1 at Time 1 sample, with i =1,2,. . . , nC1T 1 .

2. Pool the two random samples together to estimate the level-1 propensity scores. The
propensity score (p1 ) represents the probability that a student belongs to focal Cohort
p1
, is computed
2, whose cohort ID is coded as 1. The logarithm of the odds , log
1−p1
for each student and used for matching (Stuart & Rubin, 2008). Because the simulated
bias occurs at only level-1 , the level-2 covariates W are not used to compute propensity
scores.

3. Among the total of nC2T 0 cases in the sample drawn from Cohort 2 at Time 0, for the
ith data record (Y, X, W )C2T 0 , ﬁnd ONE data record (Y, X, W )C1T 1 from Cohort
i
i
1 at Time 1, so that Min[(X)C2T 0 , (X)C1T 1 ] reach a pre-set value, which is such a
i
i
small number called caliper in matching literature (e.g., Stuart and Rubin, 2008). The
smaller the caliper, the more comparable the two data points will be. Min [a, b] is a
function that computes the minimum distance between quantity a and quantity b in
terms of log-odds or Mahalanobis distance. The matched data are used to compute
δ
the ith bias reduction rate (BRR i
) using the formula in Section 4.3.
matching
97

4. Repeat 1-3 for 200 times, which are the replications. This results in 200 bias reduction
rates.
200
5. Compute and report the average bias reduction rate as
i=1

δ
BRR i
matching

200 .

Level-1 matching R code is displayed in Appendix A.3.

5.1.2

Level-2 Matching

In level-2 matching, classes are matched using level-2 propensity scores. The analysis units
are the means of the matched classes.
1. Randomly draw 100 classes from the pseudo-population of Cohort 2 at Time 0. Randomly draw 100 classes from the pseudo-population of Cohort 1 at Time 1. Let
¯ ¯
Y , X, W C2T 0 be the k th data record of the sample drawn from Cohort 2 at Time
k
¯ ¯
0, with k =1,2,. . . , 100. Let Y , X, W C1T 1 be the k th data record of the sample
k
¯
drawn from Cohort 1 at Time 1, with k = 1 , 2 , . . .,100. Vector Y represents the class
¯
mean of pre-test score. VectorX represents the means of the level-1 variables including
both latent and observed. Vector W represents level-2 variables.
2. Pool the two random samples together to estimate the level-2 propensity scores. The
level-2 propensity score (p2 ) represents the probability that a class belongs to Cohort
p2
, is computed
2, whose cohort ID is coded as 1. The logarithm of the odds, log
1−p2
for each class and used to for matching. Because of the hierarchical structure, level-2
covariates W play a critical role in computing propensity scores.
¯ ¯
¯ ¯
3. For the ith data record Y , X, W C2T 0 , ﬁnd ONE data record Y , X, W C1T 1 from
k
k
Cohort 1 at Time 1, so that Min [ (W )C2T 0 , (W )C1T 1 ] is less than a caliper (Stuart
i
i
98

& Rubin, 2008). The smaller the caliper, the more comparable the two data points
will be. The matched classes are used to compute the ith bias reduction rate.
4. Replicate 1-3 for 200 times, which results in 200 bias reduction rates.
5. Compute and report the average of the 200 bias reduction rates.
Level-2 matching R code is displayed in Appendix A.4.

5.1.3

Dual Matching

Dual matching involves two parts. First, level-2 units such as classes are matched. Second,
within each pair of matched treatment-control clusters, individual units are matched. The
detailed matching procedures are:
1. Conduct level-2 matching following the ﬁrst 3 steps of Section 5.1.2. The matched
classes are used to compute the ith class-level bias reduction rate.
2. Within each pair of the matched classes from level-2 matching, conduct level-1 matching following Step 3 of Section 5.1.1. The matched units are used to compute the ith
dual-matching bias reduction rate.
3. Replicate 1-2 for 200 times, which results in 200 class-level bias reduction rates and
200 dual-matching bias reduction rates.
4. Compute and report the average class-level bias reduction rate and the average dualmatching bias reduction rate.
Dual matching R code is displayed in Appendix A.5.
99

5.2

Simulation Results of Matching on Level-1 and/or
Level-2 Covariates

Table 5.1 summarizes the simulation results of individual matching, cluster matching and
dual-matching. The results are from simulations in Section 4.2.1

5.2.1

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s

When the two cohorts’ hierarchically structured data are diﬀerent on only level-1 covariates,
matching on propensity scores estimated from level-1 covariates reduces estimation bias by
78 percent using a caliper of 0.01 standard deviations. If a larger caliper (i.e., 0.2 standard
deviations) is used, propensity score matching reduces estimation by 72 percent.
Mahalanobis distance matching only reduces estimation bias by 16 percent when a caliper
of 0.2 standard deviations is used. The bias reduction rate is only 24 percent when a caliper
of 0.01 standard deviations is used.

5.2.2

C1T 1’s Level-1 Covariate Variances Diﬀer from C2T 0’s

When the two cohorts are hierarchically diﬀerent in terms of only level-1 covariate variances,
matching on propensity scores estimated from level-1 covariates reduces estimation bias by
only 1 percent when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01
standard deviation is used, propensity score matching reduces estimation bias by about 2
percent.
Mahalanobis distance matching even increases estimation bias by 9 percent when a caliper
of 0.01 standard deviations is used. The estimation bias is increased by about 8 percent when
100

a caliper of 0.2 standard deviations is used.

5.2.3

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s

When the two cohorts are hierarchically diﬀerent on only the level-2 covariate, matching on
level-2 propensity scores estimated from level-2 covariates reduces estimation bias by 63.55
percent when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01 standard
deviations is used, propensity score matching reduces estimation bias by 68.81 percent.
Mahalanobis distance matching does not reduce estimation bias when a caliper of 0.2
standard deviations is used. The bias reduction rate is 5.26 percent when a caliper of 0.01
standard deviations is used.

5.2.4

C1T 1’s Level-2 Covariate Variances Diﬀer from C2T 0’s

When the two cohorts’ hierarchically structured data are diﬀerent in terms of level-2 covariate variances, matching does not help reduce estimation bias at all. Matching on level-2
propensity scores increases 8.2 percent of estimation bias when a caliper of 0.2 standard deviations is used. If a smaller caliper of 0.01 standard deviation is used, matching on level-2
propensity scores even increases estimation bias by 167.8 percent.
Mahalanobis distance matching increases estimation bias by 21.18 percent when a caliper
of 0.01 standard deviations is used. Estimation bias is reduced by about 5.34 percent when
a caliper of 0.2 standard deviations is used.
101

Table 5.1: Bias Reduction Rates of the Three Types of
Cohort
2
Cohort
1
Propensity
Time
0
Time
1
score Matching
(7th grade
(7th grade
in Yeari )
in Yeari+1 )
ICC
ICC
Larger Smaller
Caliper Caliper
Level-1
PrePost- PrePostMatching
Test
Test
Test
Test
Noncomparable
Level-1
0.32
0.34
0.31
0.33
72.03
78.44
Covariate
Means
Noncomparable
Level-1
0.32
0.34
0.31
0.33
1.79
1.35
Covariate
Variances
Level-2
Matching
Noncomparable
Level-2
0.32
0.34
0.31
0.33
63.55
68.81
Covariate
Means
Noncomparable
Level-2
0.32
0.34
0.32
0.34
-8.20
-167.80
Covariate
Variances
Dual
Matching
Noncomparable
37.01
NA
Level-2
Covariate
Means
0.32
0.34
0.32
0.34
Noncomparable
76.66
NA
Level-1
Covariate
Means

102

Matching
Mahalanobis
Distance
Matching
Larger
Caliper

Smaller
Caliper

16.56

24.03

-7.51

-9.48

0.00

5.26

5.34

-21.18

NA

NA

NA

NA

5.2.5

Dual Matching Simulation Results

When the two cohorts’ hierarchically structured data have diﬀerent covariate means at both
level-1 and level-2, dual matching signiﬁcantly reduces estimation bias. Matching on level-2
propensity scores reduce estimation bias by 37.01 percent when a caliper of 0.2 standard
deviations is used.
After cluster matching, matching level-1 propensity scores further reduces estimation bias
by 39.65 percent. Dual matching for synthetic cohort design reduces total estimation bias
by 76.66 percent.

5.2.6

Discussion

When the cohorts’ hierarchically structured data are diﬀerent on only level-1 or only level-2
covariate variances, matching does not help to reduce bias in synthetic cohort design. This
is because when covariate means, either at level-1 or level-2, are identical between Cohort 2
at Time 0 and Cohort 1 at Time 1, the initial diﬀerence is very small and using synthetic
cohort design can estimate the schooling eﬀect accurately. Matching helps little in reducing,
or even can increase bias in this situation.
When the cohorts’ hierarchically structured data are diﬀerent on both level-1 covariates
and level-2 covariates, dual matching is the optimal approach. Only cluster matching helps
reduce bias. But initial diﬀerence (about 40%) due to level-1 covariate means between Cohort
2 at Time 0 and Cohort 1 at Time 1 still exists.
Research has suggested that when the true propensity score model is known and sample
size is large, propensity score will be a better approach (Sekhon and Diamond, 2008). Each
simulated condition determines a “true” and known propensity score model. Each replication
103

sample size is about 2,700, from 100 classes. Each class on average has 27 students. Thus,
Mahalanobis distance matching cannot achieve the optimal results.
Future studies may use smaller level-1 and level-2 sample sizes to examine the performances of the three proposed matching approaches on hierarchically structured data.

5.3

Simulation Results of Matching on Level-1 Latent
Variable and Surrogate Variables

For each simulation, there are four matching methods including two propensity score matching, factor score matching, and Mahalanobis matching. Table 5.2 summarizes the results.
When C1T 1’s surrogate variable means diﬀer from C2T 0’s and both cohorts have equal
latent means and a low reliability (ρ = 0.25), matching on propensity scores estimated
from surrogates reduces estimation bias by 51.53 percent. None of the other three types of
matching can reduce estimation bias by more than 3.5 percent. Latent variable propensity
score matching even increases estimation bias by 3.02 percent.
When the two cohorts have equal surrogate variable means, equal latent means and C1T 1
has a higher reliability (ρ = 0.78), propensity score matching through latent variable SES is
optimal and it reduces estimation bias by 8.5 percent. All the other three types of matching
reduces estimation bias by less than 1.5 percent . Latent variable Mahalanobis distance
matching even increased the bias by 4 percent.
When C1T 1’s surrogate variable means and latent variable SES mean diﬀer from C2T 0’s
and C1T 1 has a higher reliability (ρ = 0.78), latent variable Mahalanobis distance matching
only reduces estimation bias by 4.69 percent. Any of the other three types of matching
104

reduces estimation bias by about 53.3 to 54.8 percent.
When C1T 1’s surrogate variable means and latent variable SES mean diﬀer from C2T 0’s
and both cohorts have high reliability (ρ = 0.78), latent variable Mahalanobis distance
matching only reduces estimation bias by 2.91 percent. Other three types of matching work
equally well, and any of them can reduce estimation bias by about 55 percent.
Section 5.3.1 to Section 5.3.4 list the detailed results of the four matching approaches for
each simulation. Each matching is conducted within a caliper of 0.2 standard deviations.

5.3.1

C1T 1’s Surrogate Variable Means Diﬀer from C2T 0’s, with
the Same Latent Means and Low Reliability

The surrogate variable means are diﬀerent between Cohort 1 at Time 1 and Cohort 2 at
Time 0. The pseudo-population mean of the latent variable SES is 0 in the two cohorts. The
pseudo-population level reliability coeﬃcient of the surrogate variables is as low as 0.25 in
the two cohorts.
Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity scores estimated from the four surrogate variables reduces schooling eﬀect estimation
bias in SCD (shortened as “estimation bias”) by 51.53 percent.
Propensity Score Matching Based on Latent Variable SES Matching on propensity scores
estimated from the latent variable SES factor scores increases estimation bias by 3.02 percent.
Matching on Latent Variable SES Factor Score When the estimated factor score of
latent variable SES is used as a “propensity score”-like measure in matching, it reduces
estimation bias by 2.02 percent.

105

Table 5.2: Simulation Results of Matching on Level-1 Latent Variable and Surrogate Variables

Cohort 2 Time 0
(7th grade in Year i)

Surrogate
Variables
Propensity Score
Matching

Cohort 1 Time 1
(7th grade in Year i+1)

Latent
Variable
Propensity Score
Matching

Matching
on Latent
Variable
Itself

Latent
Variable
Mahalanobis
Matching

Latent
Variable
Mean

Observed
Variable
Reliability Mean

Latent
Variable
Mean

Reliability

µ

0

Low

µ + c3

0

Low

51.53

-3.02

2.02

0.45

µ

0

Low

µ

0

High

0.07

8.5

1.03

-4.0

µ

0

Low

µ + c4

0.68

High

54.75

53.81

53.31

4.69

µ

0

High

µ + c4

0.68

High

55.12

54.96

54.86

2.91

Observed
Variable
Mean

106

Mahalanobis Distance Matching Based on Latent Variable SES Factor Score If the estimated Mahalanobis distance of the estimated factor score of the latent variable SES is used
for matching, it reduces estimation bias by 0.45 percent.

5.3.2

C1T 1’s Surrogate Variables Have Higher Reliability than
C2T 0’s, with the Same Surrogate Means and the Same Latent Means

The surrogate variable means are the same in the two cohorts. The pseudo-population mean
of the latent variable SES is 0 in the two cohorts. The pseudo-population level reliability
coeﬃcient of the four surrogate variables is 0.25 in Cohort 2 at Time 0, but it is equal to .78
in Cohort 1 at Time 1.
Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity scores estimated from the four surrogate variables reduces estimation bias by 0.07 percent.
Propensity Score Matching Based on Latent Variable SES Matching on propensity scores
estimated from the latent variable SES factor scores reduces estimation bias by 8.5 percent.
Matching on Latent Variable SES Factor Score Matching on factor scores of the latent
variable SES reduces estimation bias by 1.03 percent .
Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores even increases
estimation bias by 4 percent.
107

5.3.3

C1T 1’s Surrogate Variables Have Higher Reliability, Diﬀerent Latent Variable Mean from C2T 0’s

The pseudo-population mean of the latent variable SES is 0.68 in Cohort 1 at Time 1; but
it is 0 in Cohort 2 at Time 0. Therefore, the surrogate variable means of the two cohorts are
diﬀerent by 0.68 ∗ λX1.SES ; λX1.SES being the factor loading vector.
The pseudo-population level reliability coeﬃcient of the four surrogate variables is 0.25
in Cohort 2 at Time 0, but it is .78 in Cohort 1 at Time 1
Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity score estimated from the four surrogate covariates reduces estimation bias by 54.75
percent
Propensity Score Matching Based on Latent Variable SES Matching on propensity scores
estimated from of latent variable SES factor scores reduces estimation bias by 53.81 percent.
Matching on Latent Variable SES Factor Score Matching based on factor scores of the
latent variable SES reduces estimation bias by 53.31 percent.
Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores only reduces estimation bias by 4.69 percent.

5.3.4

C1T 1’s Latent Variable Mean Diﬀer from C2T 0’s, with the
Same Higher Reliability

The pseudo-population level reliability coeﬃcient of the surrogate variables is 0.78 in the
two cohorts. The pseudo-population mean of the latent variable SES is 0.68 in Cohort 1 at
Time 1; but it is 0 in Cohort 2 at Time 0.
108

Therefore, the surrogate variable means of the two cohorts are diﬀerent by 0.68∗λX1.SES ;
λX1.SES being the factor loading vector.
Propensity Score Matching Based on Surrogate Variables of SES Matching on propensity score estimated from the four surrogate covariates reduces estimation bias by 55.12
percent.
Propensity Score Matching Based on Latent Variable Matching on propensity scores
estimated from of latent variable SES factor scores reduces estimation bias by 54.96 percent.
Matching on Latent Variable SES Factor Matching based on factor scores of the latent
variable SES reduces estimation bias by 54.86 percent.
Mahalanobis Distance Matching Based on Latent Variable SES Factor Score Mahalanobis distance matching based on the latent variable SES factor scores only reduces estimation bias by 2.91 percent.

5.3.5

Discussion

Diﬀerent studies may use diﬀerent quality of data in terms of measurement reliability, which
presents diﬀerent requirements for matching. This section demonstrates the potential of
using surrogate variables or a latent variable in matching, according to the diﬀerent matching
requirements to reduce bias of schooling eﬀect estimate.
Mahalanobis distance matching does not reduce bias as eﬀectively as either propensity
score matching or factor score matching. Propensity score matching generally performs
better than Mahalanobis distance matching when the true propensity score model is known
and the sample size is large (Sekhon and Diamond, 2008).
The simulation settings favor propensity score matching. Each simulated condition de109

termines a “true” and known propensity score model. Each of the 200 replications uses 100
classes and the sample size is as large as about 2,700.
Latent variable matching is eﬀective if factor scores represent the diﬀerence between the
two cohorts. If the two cohorts are comparable on the latent variable means, matching
through the latent variable is not helpful at all. If the two cohorts are diﬀerent only on
surrogate variables with larger measurement errors, latent variable matching reduces little
bias. But propensity score matching through these surrogate variables is optimal.
Reliability of surrogate variables doesn’t impact the eﬀectiveness of matching. If the
two cohorts are diﬀerent on the latent variable, regardless of whether the level of reliability
is low or high, matching on propensity scores estimated from surrogate variables works as
well as matching on factor scores or propensity scores estimated using the latent variable.
Measurement errors on surrogate variables do not attenuate bias reduction rate. This is
diﬀerent from what was found by (Cochran and Rubin, 1973). Future study and simulation
are needed to further examine and explain the inconsistency.

5.4

Simulation Results of Matching When R2 Being
Manipulated

2
2
In this simulation, level-1 variance σepre and level-2 residual variance σuα are reduced by
0
half. It examines: 1) if increasing R2 improves bias reduction rate; and 2) if increasing R2
improves bias reduction rate more when the simulated selection bias is smaller. Table 5.3
summarizes the results.

110

Table 5.3: Bias Reduction Rates of Three Type of Matching with Higher R2
Cohort 2 Time 0
Cohort 1 Time 1
Propensity
(7th grade in Year
(7th grade in Year
score
i)
i+1)
Matching
Larger Smaller
ICC
ICC
Caliper Caliper
Level-1 Matching
Pre-Test Post-Test Pre-Test Post-Test
Non-comparable
Level-1
0.372
0.447
0.365
71.77
78.34
Covariate Means, Higher 0.454
2
R
Non-comparable
Level-1
0.372
0.447
0.365
62.96
64.06
Covariate
Means
with 0.454
2 and Reduced
Higher R
Initial Diﬀerence
Level-2 Matching
Non-Comparable
Level-2
0.250
0.207
0.244
70.22
71.15
Covariate
Means
with 0.213
2
Higher R
Non-Comparable
Level-2
0.250
0.207
0.244
52.26
66.84
Covariate
Means
with 0.213
2 and Reduced
Higher R
Initial Diﬀerence
Dual Matching
Non-Comparable
Level-1
0.280
0.320
0.274
36.74
NA
and
Level-2
Covariate 0.326
2
Means with Higher R
77.13
NA
Non-Comparable
Level-1
0.280
0.320
0.274
36.39
NA
and
Level-2
Covariate 0.326
2 and
Means with Higher R
Reduced Initial Diﬀerence
78.19
NA
111

Mahalanobis
Distance
Matching
Larger Smaller
Caliper Caliper

16.99

23.34

7.80

12.74

0.00

5.49

0.00

24.48

NA

NA

NA

NA

NA

NA

NA

NA

5.4.1

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with
Level-1 Variance Reduced by Half

When the two cohorts are hierarchically diﬀerent at only level-1 covariates and level-1 R2
is high, matching on propensity scores estimated from the level-1 covariates reduces the
schooling eﬀect estimation bias in SCD by 78.34 percent using a caliper of 0.01 standard
deviations. If a lager caliper of 0.2 standard deviations is used, the bias reduction rate is
71.77 percent.
Mahalanobis distance matching only reduces estimation bias by16.99 percent using a
caliper of 0.2 standard deviations. The bias reduction rate is 23.34 percent when a caliper
of 0.01 standard deviations is used.

5.4.2

C1T 1’s Level-1 Covariate Means Diﬀer from C2T 0’s, with
Level-1 Variance Reduced by Half, and Initial Diﬀerence
Reduced

When the two cohorts are hierarchically less diﬀerent at level-1 covariates, that is, the initial diﬀerence is smaller, increasing level-1 R2 does not help to improve the performance of
matching. Matching on propensity scores estimated from level-1 covariates reduces estimation bias by 64.06 percent when it uses a caliper of 0.01 standard deviations. If a larger
caliper of 0.2 standard deviations is used, the bias reduction rate becomes 62.96 percent.
Mahalanobis distance matching only reduces estimation bias by 7.8 percent when it uses
a caliper of 0.2 standard deviations. The bias reduction rate is 12.74 percent when a caliper
is of 0.01 standard deviations.
112

5.4.3

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with
Level-2 Variance Reduced by Half

When the two cohorts are hierarchically diﬀerent at only level-2 covariate and level-2 R2
is high, matching on class level propensity scores estimated from level-2 covariates reduces
estimation bias by 70.22 percent when it uses a caliper of 0.2 standard deviations. The
estimation bias reduction rate is 63.55 percent when level-2 R2 is low (Section 5.2.3). If a
smaller caliper of 0.01 standard deviations is used, propensity score matching reduces bias
reduction rate by 71.15 percent.
Mahalanobis distance matching does not reduce estimation bias when a caliper of 0.2
standard deviations is used. The bias reduction rate is 5.49 per cent when a caliper of 0.01
standard deviations is used.

5.4.4

C1T 1’s Level-2 Covariate Means Diﬀer from C2T 0’s with
2
Level-2 Variance σuα0 Reduced by Half, and Initial Diﬀer-

ence Reduced by Half
When the two cohorts are hierarchically less diﬀerent at only level-2 covariates and the initial diﬀerence is smaller, increasing level-2 R2 does not help to improve the performance of
matching. Matching on propensity scores estimated from level-2 covariates reduces estimation bias by 66.84 percent when it uses a caliper of 0.2 standard deviations. If a smaller
caliper of 0.01 standard deviations is used, propensity score matching reduces estimation
bias by 52.26 percent.
Mahalanobis distance matching does not reduce estimation bias at all when a caliper of
113

0.2 standard deviations is used. The bias reduction rate is 24.48 per cent when a caliper of
0.01 standard deviations is used.

5.4.5

C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from
C2T 0’s with Both Level-1 and Level-2 Variances Reduced
by Half

When the two cohorts’ hierarchically structured data are diﬀerent at both level-1 and level-2
covariates, and both level-1 R2 and level-2 R2 are high, dual matching reduces estimation
bias by 77.13 percent. Matching on only class-level propensity scores reduces estimation bias
by 36.74 percent when a caliper of 0.2 standard deviations is used.
After level-2 matching, matching on propensity scores estimated from level-1 covariates
further reduces estimation bias by 40.39 percent.

5.4.6

C1T 1’s Level-1 and Level-2 Covariate Means Diﬀer from
C2T 0’s, with Both Level-1 and Level-2 Variances Reduced
by Half, and Total Initial Diﬀerence Reduced

When the two cohorts’ hierarchically structured data are less diﬀerent at both level-1 and
level-2 covariate means, that is, the initial diﬀerence is smaller, increasing both level-1 R2 and
level-2 R2 dose not help to improve the performance of matching. Matching on propensity
scores estimated from level-2 covariates reduce estimation bias by 36.39 percent when a
caliper of 0.2 standard deviations is used.
Further, after level-2 matching, matching on propensity scores estimated from level-1
114

covariates further reduces estimation bias by 41.80 percent. A total of 78.19 percent of
estimation bias is reduced in the dual matching.

5.4.7

Discussion

When level-1 R2 is high, the results are almost identical to those of Simulation 4.2.1, where
level-1 R2 is low. This suggests that increasing level-1 R2 does not help to improve level-1
matching.
When simulated level-1 selection bias is smaller, no matter level-1 R2 is high or low,
individual matching does not work as eﬀectively as it does when the initial diﬀerence is
larger. This suggests that level-1 matching is not sensitive to the increase of level-1 R2 .
When level-2 R2 is high, the results of propensity score matching are not identical to
those of Simulation 4.2.1, where level-2 R2 is low. Speciﬁcally when a larger caliper of 0.20
standard deviations is used, increasing level-2 R2 does help to improve level-2 matching
estimation bias reduction rate by about 7 percent.
When the simulated level-2 selection bias becomes smaller, increasing level-2 R2 does
not improve the performance of cluster matching. This further suggests that the accuracy
of level-2 propensity score matching is more sensitive to the magnitude of initial diﬀerence
than to the increase of level-2 R2 .
The dual propensity score matching is robust. The performance of dual propensity score
matching is not sensitive to the increase of R2 . More importantly, propensity score matching
still achieves a large bias reduction rate when the initial diﬀerence is small.
Still Mahalanobis distance matching is not comparable to propensity score matching.
Using a smaller caliper will improve matching accuracy.

115

Chapter 6
Discussions
The synthetic cohort design (SCD) is a cross-sectional design by nature. It is also a quasiexperimental design because it uses retrospective data rather than prospective data to study
the eﬀect of being in a speciﬁc cohort1 . The observed cohorts in the SCD of this dissertation
study are matched to reduce the impact of selection bias on the schooling eﬀect estimate.
Because of the complexity of the hierarchical structure of school systems, there are four
major school related issues that may limit the use of the proposed matching approaches.
Sections 6.1 to 6.4 discuss these four issues. Speciﬁcally, section 6.1 examines extending the
matching procedure that is developed on the regular math class data to other types of math
classes such as remedial classes. Section 6.2 discusses incompletely matched data due to
small class size. Section 6.3 discusses the role that the level-1 and level-2 covariates play in
the SCD and in the process of matching. And section 6.4 discusses matching students that
are held back at grade 8.
To address the measurement error issue that commonly exists in education studies, it
1 In education studies, this eﬀect is often referred to as schooling eﬀect.
116

is often necessary to use latent variables at more than one levels in the hieratically structured school systems. This dissertation involves the use of only level-1 latent variables in
multi-level structural equation modeling to address measurement error issues in matching.
The issue of use level-2 latent variables in matching is discussed in Section 6.5. In addition,
measurement-invariance-testing technique developed in the ﬁeld of structural equation modeling is introduced in Section 6.6 to identify situations where the HEoG assumption may
fail.
In future studies, other statistical indices and analytical approaches should be considered
in after-matching evaluation and data analysis. Section 6.7 discusses how statistical power
can be used to evaluate the performance of matching. Section 6.8 discusses statistical analysis
approaches, such as meta-analysis technique that can be used to analyze matched data.
Further, how matched-cohort data can be used in a longitudinal study is addressed in
section 6.9. This section also discusses why the SCD is needed in causal inference in education
studies and how it diﬀers from synthetic cohort analysis in other ﬁelds. And to illustrate,
Section 6.10 discusses how matching approaches proposed in this dissertation study can be
applied in the ﬁelds of international comparative mathematics education studies and program
evaluation studies in higher education and secondary track/nontrack school system in three
diﬀerent examples.
Finally, Section 6.11 brieﬂy summarizes the dissertation study.

6.1

Extend the Analysis to another Type of Math Classes

This study only uses the Regular Class data of SIMS, although there are three other class
types: Remedial, Enriched, and Algebra. Focusing on one type of classes ignores information
117

contained in the other three types of classes. Because of its largest sample size in the
four types classes, the Regular Class data set is used for matching to address the following
question: What is the schooling eﬀect of one year in a regular 8th grade class? However,
the curriculum system is diﬀerent across the four class types. Matching developed from
the Regular Class data can be applied to any one of the other three types of classes. For
example, using the same matching routine on the Remedial Class data would depict the
schooling eﬀect of one year in remedial class.

6.2

Incomplete Matching Due to Small Cluster Size

This dissertation study considers a balanced design. That is the two cohorts are equally
sized. The success of obtaining matches for the treated units is generally determined by the
control group size. Stuart and Rubin (2008) recommends the use of a larger control group
to assure successful matching for each unit in a smaller treatment group.
In practice, larger treatment groups often occur due to a speciﬁc purpose of design. For
example, in the TIMSS 1995 design there were two classrooms at the upper 8th grade and
one classroom at the lower 7th grade. The upper 8th grade is the focal cohort, conceptually,
in the “treatment” condition. Overall, the number of students in the treatment group is
twice as large as the one in the control group. In this situation, incomplete matching may
occur and lead to estimation bias of schooling eﬀect.
Three components of bias on an outcome are identiﬁed by Rosenbaum and Rubin (1985):
1) component due to departures from strongly ignorable treatment assignment; 2) component
due to incomplete matching; and 3) component due to coarse or inexact matching. Bias due
to departures from strongly ignorable treatment assignment indicates the selection bias. This
118

dissertation study currently uses one-to-one matching (called simple matching, Abadie and
Imbens, 2006) to reduce selection bias to improve schooling eﬀect estimation accuracy. Using
simple matching can leave some treated units unmatched, which in turn leads to incomplete
matching.
Abadie and Imbens (2006) review the average treatment eﬀects (ATTs) estimators based
on simple matching. They show that ATT estimators in simple matching include a conditional bias term (i.e., eﬃciency loss) that does not converge to zero when more than one continuous variable is used for matching. Abadie and Imbens (2007) proposed a bias-corrected
matching estimator for ATTs using multiple matching, which allows matching with replacement.
Thus, every treated unit has a matched unit or several matched units from the control
group. Therefore, one can ﬁnd matching units with higher quality. For example, when
matching is on one continuous variable, it has been shown that the eﬃciency loss can be
forced to converge to zero through multiple matching with replacement.
Although matching with replacement leads to a larger variance, it typically produces
greater match quality and smaller bias. According to Cochran (1972), bias reduction is
more important than the variance reduction. Future studies can apply the multiple matching
approach in hieratically structured data to test if it can improve bias reduction rate and, if
yes, to what degree.
Another approach to address the incomplete matching issue is to use the non-local control
group matching (Stuart and Rubin, 2008), in which the diﬀerence between the local and nonlocal control groups is accounted for in estimating the schooling eﬀect. The local control
group can be a control class located in the same school as the treatment class is. The
119

non-local group is a control class from another school.
In educational settings, class (group) sizes are comparatively small, therefore researchers
may fail to ﬁnd a good match for every unit of a treatment class. In this situation, units from
a non-local control group may be needed. Stuart and Rubin (2008) use an inﬁnitely large
non-local group for matching. For the treatment units that can be matched from the local
L
control group (y1i ), the researchers ﬁnd corresponding matching units from the non-local
N
L
N
units (y1i L ). The regression model will be ﬁt on data including y1i and y1i L to obtain
quantities. The quantities are used to adjust the treatment units that can be matched only
from a non-local control group. Thus it resolves the incomplete matching issue.
This dissertation study did not use the non-local control group to adjust incomplete
mataching. Future studies can explore how to use non-local group matching adjustment in
the SCD to achieve a better bias reduction rate when classes are the intervention units. For
example, future studies can simulate a situation where C2T 0 and C1T 1 are not equallysized: 2N2 for the former and N for the latter. In such situations, researchers can evaluated
whether or not, and if yes to which degree, matching success can be improve by matching
procedures with adjustment including non-local adjustment matching (Stuart and Rubin,
2008) and multiple matching with replacement (Abadie and Imbens, 2007).
Besides conducting empirical simulation studies, future research should engage in systematic literature review on studies using non-balanced designs with or without randomization.
The systematic literature review can focus on how the statistical inference is adjusted in the
unbalanced design, speciﬁcally in the ﬁeld of latent variable modeling.

2 N is the sample size.
120

6.3

Role of Covariates in Synthetic Cohort Design

In the Solomon Four-Group Design or any other true experimental design, due to the use
of randomization, inclusion of covariates is not necessary in the analytical step (Solomon,
1949). Randomization, in the long run, “evens out” the eﬀect of covariates on intervention
in treatment and control groups. However, in quasi-experimental designs such as the SCD,
covariates play a very important role in statistical inference.
The feasibility of matching depends on the availability of the covariates in a study. The
upper grade and the lower grade in the SCD are conceptually treated as the treatment
group and control group, respectively. Instead of using analysis of covariance (ANCOVA,
Cochran, 1957) to partial out the contaminate eﬀect of covariates on intervention, this study
demonstrates three cases where matching approaches can take care of and “even out” the
eﬀect of covariates. The ﬁrst case involves the hierarchical structure of the covariates. The
second involves the measurement errors associated with observed covariates. The third
involves incomplete or omitted covariates. These cases commonly occur in education studies
due to the complex structure of school systems, which involve an endless list of variables
such as student characteristics, family background variables, teacher variables, and variables
at school and district levels.
These variables are, directly or indirectly, relevant to student learning, although they
are measured with errors in practice. Some researchers may treat one set of these relevant
variables as intervention variables to study their eﬀects on student learning, controlling for
another set of variables (i.e., covariates). Other researchers study another set of variables and
their eﬀects on student learning, controlling for another set of covariates. Correspondingly,
the covariates that used for matching will vary across studies.
121

6.3.1

On Which Covariates to Match

Covariates used in matching and covariates used in ANCOVA belong to the same set of
variables, which are potential confounders (Song and Herman, 2010). These confounders
are the preexisting characteristics that, although not the intervention variables, also cause
observable diﬀerence between the treatment and control groups.
The diﬃculty is how to determine a set of covariates that would be appropriate for
matching. Answering this question requires a two-step process. Step one requires distinguishing the intervention variables from the covariates among the available variables in a
study. Step two requires testing on which covariates that the treatment and control groups
are not comparable.
The covariates used to compute the propensity scores will be those non-intervention
variables that can signiﬁcantly distinguish the two cohorts. In matching the upper and lower
graders in the SCD, attention should be paid to the covariates due to their chronological
nature.

6.3.2

Concern on Chronological Variables such as Age and Grade
Speciﬁc OTL

In practice, special considerations are needed when there is a chronological diﬀerence between
the two groups or cohorts of individuals being matched. For example, in the SCD, the upper
and lower grades are diﬀerent by age and by curriculum coverage (e.g., OTL, in Schmidt
and Burstein, 1992). Matching the two cohorts, the upper and the lower grades, involves
collecting historical data on the upper grade students.
These historical data would include their age and curriculum coverage when they are in
122

the lower grade a year before. Thus, the age of the upper grade used in matching should be
reduced by one year. The previous year’s curriculum coverage of the upper grade should be
used in matching with the current year’s curriculum coverage of the lower grade.

6.3.3

Two Types of Level-2 Covariates

Two types of level-2 covariates are identiﬁed (L¨dtke et al., 2008): global and contextual
u
variables. The directly measured level-2 covariates, such as class-size, generally can be broken
down to individuals at level-1. These covariates are referred to as global variables. The
contextual variables are those covariates included at both levels. For example, a covariate x
is used in the level-1 model, then the cluster-mean of x is included in the level-2 model.
Researchers have studied multilevel modeling with the same covariate included at both
level-1 and level-2. This type of multilevel model has been called contextual analysis modeling (Boyd and Iversen, 1979; Firebaugh, 1978; Raudenbush and Bryk, 2002; Schmidt and
Houang, 1986).
The possible problem is that the aggregated mean of a covariate based upon small number
of individuals has low reliability. This dissertation study does not include the aggregated
means as extra level-2 covariates in estimating the level-2 propensity scores. Future research
should examine the eﬀect of including contextual variables, such as the aggregated means,
as extra level-2 covariates in matching.

6.3.4

Interaction Terms as Omitted Covariates

Omitted covariates can be the interaction terms that are not included in the model and
analysis. Gelman (2009) emphasizes the interaction between the treatment variable and pre123

treatment covariates and the possible impact on treatment eﬀect estimation. The interaction
term and its eﬀect are often ignored when the main focus is to estimate a single coeﬃcient
of the treatment variable of interest.
Ignoring the interaction term may result in a biased treatment eﬀect estimate (Gelman
and Hill, 2007). Ignoring the interaction term may directly cause non-comparability of two
cohorts being compared in the SCD. Students in Cohort 2 at Time 0 can be more/less
proﬁcient than those in Cohort 1 at Time 1. That is those two 7th grades in two continuous
years are not comparable in terms of the interaction between pre-test score and a pretreatment covariate. The treatment in this situation is one-more year of schooling in 8th
grade.
Future research should identify the interaction terms of treatment and pre-treatment and
develop matching routines to account for selection bias due to ignoring the interaction terms.

6.4

Deal with Students under Retention in Matching

Grade retention, as an indictor of educational process (Planty et al., 2009), aﬀects the
matching procedure in the SCD. It has been used as an intervention to hold a student for
an extra year in a grade with a goal of improving his/her academic proﬁciency (Ou and
Reynolds, 2010).
These retained students are not comparable with their classmates due to the extra year of
schooling they received in the same grade. However, if the research goal is to ﬁnd how much
the 8th graders learn after one school year, retained students should be excluded during the
matching. This is because they have one year of schooling before retention.
Matching retained students in the upper grade with students in the lower grade will not
124

be plausible. They themselves can be their matched units. The schooling eﬀect estimate
can be derived from retained students by using a longitudinal design. One can treat their
observed learning outcome at the beginning of the retention year as a base-line score to
compute the individual’s gain of the retaining year to estimate the schooling eﬀect.

6.5

Improve Measurement Accuracy in Education Studies

There is a clear trend of using randomized control trials design in studying how educational
interventions and instructional inferences aﬀect student performance (Raudenbush and Sadoﬀ, 2008; Sloane, 2008; Spybrook, 2007). Assessing the eﬃcacy and the eﬃciency of the
design depends heavily on the hypothesis development, experimental design, controlled experimental trials, identiﬁcation of the population of interest, and the implementation of the
study.
The most challenging task in educational randomized trials design is to obtain valid measurements of interventions and inferences to assess their eﬃcacy and eﬃciency (Raudenbush
and Sadoﬀ, 2008; Sloane, 2008). Because of the hierarchically structured experimental design
and data collection (Raudenbush and Sadoﬀ, 2008), treatment units are generally classes or
schools rather than individuals. The accuracy of measurement on interventions should be
assessed at the classroom level.
Raudenbush and Sadoﬀ (2008) point out that intervention in classes is critical to student
development and requires a large-scale measurement of classroom instruction. Measurement
errors commonly occur in survey sampling data and analysis (Cochran, 1968b; S¨rndal et al.,
a
125

2003, Chapter 16). The measures of the classroom intervention and instruction that students receive can be subject to measurement errors (Raudenbush and Sadoﬀ, 2008) in the
randomized large-scale surveys. In general, research focuses on how treatment-control status aﬀects classroom instruction (two-level modeling) and further how classroom instruction
aﬀects student outcomes across schools (three-level modeling).
The measurement error on classroom instruction is accounted for by assuming a classroomlevel random eﬀect within the school. Further, the mean of the classroom-level instruction
is assumed to be predicted by treatment-control status. It is also allowed to vary across
schools (Raudenbush and Sadoﬀ, 2008). In addition, the classroom instruction can be predicted by treatment-control status at a level-1 equation, and the intercept and slope of the
level-1 equation can be assumed to vary across schools. The measurement error eﬀect on
statistical inference testing can be done using the non-centrality F statistics (Raudenbush
and Sadoﬀ, 2008).
Future research can explore how class-level measurement errors will aﬀect the accuracy
of cluster matching and dual matching in terms of bias reduction rate.

6.6

Situations Where HEoG May Fail

In this study the simulated non-comparability of the two cohorts occurs only on the mean
vector and the variances of the joint distribution of the covariates of interest. Other situations
where HEoG may fail should be examined in future studies.
For example, future research should consider a situation where the two cohorts have diﬀerent factor loadings and/or regression coeﬃcients. Measurement invariance testing (Cheung
and Rensvold, 2002; Kaplan, 2008) can detect if the two groups or cohorts being matched
126

have the same parameters such as factor loadings and residual variance. Traditionally, three
steps are required to test measurement invariance.
First, conﬁgural invariance is assessed by imposing the same structure of free and ﬁxed
parameters on the factor loadings across groups. If such a model ﬁts the data well, it can
be concluded that the same conceptual framework underlies the respondents’ responses.
Second, metric invariance is tested by constraining the factor loadings to be equal in the
diﬀerent groups. If metric invariance holds, changes in the latent variable lead to the same
changes on the observed responses of the same items across groups.
Third, constraining the intercepts to be invariant across groups provides a means to
assess scalar invariance. Together with invariant factor loadings this condition assures that
comparisons of latent means across groups are meaningful.
These measurement-invariance-testing approaches can identify on which sets of parameters that the two groups are not comparable. Future research can simulate a corresponding
set of parameters that cause the non-comparability of the two groups/cohorts to evaluate
how matching on latent variables or surrogate variables can reduce selection bias.

6.7

Statistical Power as an After-Matching Evaluation
Index

Statistical power refers to the probability that an inference test achieves statistical significance given a true eﬀect size (Cohen, 1988). Recently, experimental design is considered
as a composite element in the power analysis of intervention studies. Experimental design
includes analytical modeling and the sampling procedures, which consist of sample size and
127

sample assignment (Song and Herman, 2010).
Given a speciﬁc value of the Type I error, statistical power will vary across true eﬀect sizes
in a study deign. More importantly, statistical power is aﬀected by the process of “sample
allocation” (Song and Herman, 2010, p. 358), which refers to whether intervention units
are individuals or clusters of individuals. The power analysis in statistical modeling refers
to including covariates for adjustment to boost the value of statistical power and achieve
a smaller sample size (Bloom et al., 2007; Raudenbush et al., 2007). The boosted value of
statistical power is called the gain on power.
Based upon what has been discussed, there are a few strategies to evaluate matching
under the framework of power analysis. First, if a matching approach can improve the
eﬀect size estimate, which will approximate the true eﬀect size, this matching approach can
achieve higher statistical power (e.g., Freedman et al., 1990; Griﬃn et al., 2009; Martin
et al., 1993). Second, matching with non-local group adjustment (Stuart & Rubin, 2008)
will help to successful ﬁnd matching units for the treatment group. This will result in an
orthogonal design, and will help to keep a larger sample size. These two aspects will improve
the value of statistical power. The eﬀect size can be computed as the standardized mean
diﬀerence (Cohen, 1988; Hedges, 2007b, for multilevel data).
The second strategy that future studies can use is to focus on comparisons between the
gain on power in regression-covariate-adjustment in multi-level modeling and the gain on
power in cluster/dual matching. Griﬃn et al. (2009) use post-hoc matching on observational
data and ﬁnd that matching on diﬀerent sets of level-2 covariates results in diﬀerent levels
of statistical power. Studying the second strategy can shed light on how the level-1 and/or
level-2 covariates aﬀect statistical power in matching compared with covariate-adjustment
128

in regression.

6.8

After-Matching Statistical Analyses

In general, it is recommended to use matching and regression-adjustment together to pursue
a larger reduction rate of the initial selection bias (Rubin and Thomas, 2000; Stuart and
Rubin, 2007). Future studies need to merge dual matching and multilevel modeling with
covariates included as adjustment variables to examine if optimal results can be achieved
in the context of using hierarchically structured data. For example, a simulation study can
compare if including propensity scores as one of the covariates in the hieratical linear model
can achieve an optimal result.
The analytical approach of paired cluster randomized trials (PCRT) discussed in Thompson et al. (1997) can be used to analyze the data after level-2 matching. The PCRTs involve
pairs of clusters that are matched in terms of covariates such as the demographic characteristics. Within each pair, one cluster is randomly assigned to the treatment group and
the other to the control group. A random eﬀects meta-analysis framework can take into
account the between-cluster variation. Techniques such as the sample size calculations and
the proﬁle likelihood method can be applied to compute the conﬁdence interval of the global
eﬀect size by taking into account the variation in estimating the variance across clusters.

6.9

Synthetic Cohorts Design and Life-Course Research

Making causal inference from policy-based treatment is to understand the causal eﬀect of
one of the “turning point events and interventions on development trajectories” in life129

course (Haviland and Nagin, 2005, p. 576). In education studies, the treatment status needs
to be deﬁned in the context of multiple hierarchically structured sites. Such policy-based
hierarchal treatment status has been developed in the literature. For example, policy-based
treatment status in their school retention policy study, Hong and Raudenbush (2006) deﬁned
the school level binary treatment status as 1 if a school has high retention proportion, and
0 otherwise. Within each school, the level-1 treatment status is determined as 1 if a pupil
is allowed to retain in the current grade for one-more year regarding school retention policy,
and 0 otherwise. The SCD of this dissertation study can be treated as a speciﬁc case of
the one-time-point treatment eﬀect estimation approach discussed in Haviland and Nagin
(2005).
Compared with a longitudinal design, the SCD involves only one set of multiple informative data during several periods. However, the SCD and other one-time-point treatment
eﬀect estimation approaches are irreplaceable for two major reasons as discussed below.
Longitudinal studies through growth modeling are statistically feasible; but they may
not be morally applicable in practice to study life-course events. This is because most of
the events during the life-course are one-time-occurrences, and some events can be emotionally negative and should/can not be repeated. Each event can create a turning point
at a historically important time point in life-course. The use of SCD can examine the impact of one-time-occurrence-event in life-course and help researchers understand individual
development and change (Haviland and Nagin, 2005).
Even if longitudinal design can be used for certain research situations during the lifecourse, it is often unrealistic to follow participants longitudinally due to the high cost and
the complexity of life events. Longitudinal studies collect multi-waves of follow-up data in
130

a time period. Increasing data collection frequency in a longer time period can improve
the quality of research given a ﬁxed number of sample, but it can signiﬁcantly increase the
research cost (Bloom et al., 2007). In the complex cluster randomized trials design such
as using schools as study units, limited cost on data collection can reduce the number of
clusters and can further attenuate the statistical power of a research (Raudenbush and Liu,
2001).
The one-time-point treatment eﬀect estimation approach, such as the ﬁrst-time treatment eﬀect estimate of a duration of life-course, can be generalized across multiple time
points (Leon and Hedeker, 2005; Li et al., 2001). Individual development trajectory and
pathway in life-course are shaped by the life-events’ eﬀects (Elder, 1998).

Haviland and

Nagin (2005) point out that propensity score matching can be used to create comparable
groups, which can be followed and studied across in the duration of life-course. These comparable groups created by researchers using observed data are called synthetic cohorts in
epidemiology studies.
Synthetic cohorts are commonly used in aging studies in the ﬁeld of epidemiology to
estimate the synthetic cohort eﬀect (Heimberg et al., 2000; Kessler et al., 1998). For example,
Campbell and Hudson (1985) use rare life events of seniors to pool observed panel survey
data to create synthetic cohorts, which are comparable groups to be further analyzed through
the discrete times series analysis. Kessler et al. (1998) use latent class analysis to predict
individuals’ cohort membership and to create synthetic cohorts. Each cohort is longitudinally
studied.
A dummy variable can be created to represent the cohort membership index and can
be used in data analysis. For example, the synthetic cohort eﬀect is the non-comparability
131

between two cohorts and is captured by the statistical signiﬁcance of the interactions of the
dummy variable and the background characteristics (Heimberg et al., 2000). This way it can
reveal on which background characteristics the two cohorts are diﬀerent.

6.10

Illustrations

Program evaluation is a challenging task to accomplish in education studies because valid
inferences must account for errors and biases in the observed data collected from the hierarchically structured samples. In these samples, students are nested in classes, and classes are
nested in schools and other higher level units. Matching is a tool to reduce bias of schooling
eﬀect estimation even when random assignment is achieved, and to account for selection
bias when random assignment is not possible. Diﬀerent studies use diﬀerent types of data in
terms of school systems, which present diﬀerent requirements for matching. The following
examples demonstrate how to use matching approaches proposed in this dissertation study
to improve the accuracy of estimating intervention eﬀect in the ﬁelds of international comparative mathematics education studies, and program evaluation studies in higher education
and secondary track/nontrack school systems.
Exemplary Case 1 In the Mathematics Teaching in the 21st Century (MT21) project,
program evaluators study the program eﬀect on the subject matter knowledge of the starters
and the ﬁnishers on mathematics education in four teacher education programs located
at four German universities (Schmidt et al., 2007). The treatment status of this study
is the trainees’ ﬁnishing status in the mathematics education program in an across-nation
comparative study. The treatment status is deﬁned as 1 if a student has ﬁnished the training
program and 0 if she/he just starts. The goal is to evaluate the eﬀect of the treatment (i.e.,
132

ﬁnishing the training program) by comparing the starters (the control group) and the ﬁnishers
(the treatment group).
This treatment status has a longitudinal nature because the ﬁnishers have spent a certain
period of time in training and the starters have not. In this case, the strategy is to match
each ﬁnisher with a starter within the same university in terms of grade point average,
math course taking, and four motivation measures (pedagogical and subject-speciﬁc intrinsic
motivation, and status-related and access-related extrinsic job motivation). This matching
uses the level-1 matching proposed in this dissertation study.

Exemplary Case 2 In TIMSS 1995 study, researchers examine students’ improvement on
mathematics knowledge in track and nontrack classes across grades 7 and 8. These classes are
located in schools across diﬀerent states in the U.S. The hierarchically structured data and
complex sampling design bring forth challenges in program evaluation and causal modeling.
The policy-based treatment assignment at the school level is schools’ track/nontrack status,
1 for a track school, and 0 for a nontrack school. Within each school, the naturally observed
grade levels construct the class-level treatment status, 1 for grade 8 and 0 for grade 7. In
this case, there are two ways to match the hierarchically structured data.
Matching 1 for Track/Nontrack School Comparison at Each Grade This matching can
be referred to as the dual matching proposed in this dissertation study. Take the 7th grade
as an example. First, at level-2, match each 7th grade class in a nontrack school with one
7th grade class in a track school. Level-2 propensity scores are estimated by using all level-2
covariates. The school-level and class-level covariates are treated as level-2 covariates. The
level-2 matched data then will be analyzed by taking class means as analysis data points
to determine, in general, if there is a track/nontrack school diﬀerence in terms of the math
133

learning of the 7th graders. This step is referred to as the level-2 matching proposed in this
dissertation study. Similarly, the same matching procedures can be done for the 8th grade
track/nontrack data.
Further, level-1 matching will be conducted. Within each pair of the matched classes
from the ﬁrst step, level-1 propensity scores will be used to match one student from the
nontrack class with one student from the track class. These doubly matched data can be
analyzed to identify the individual diﬀerences on math learning among 7th graders between
the track school system and the nontrack school system. The two steps demonstrate how
the dual matching can be used in this situation.
Matching 2 for track/nontrack school growth comparison on math learning This situation involves only level-1 matching. That is to mach 7th graders and 8th graders within
each school. A growth score can be obtained from a pair of 7th − 8th graders. Within
each school, level-1 propensity scores are used to match one 7th grader to one 8th grader.
Only the level-1 covariates will be used to compute the propensity scores for matching. In
the mixed eﬀect (multilevel modeling) analysis, the school level covariates will be added to
control for the potential confounders.

Exemplary Case 3 Researchers at a midwest public university propose to investigate how
well residential students have learned through the higher educational system. The comparison will be conducted within the university. The residential students are distributed
in multiple colleges such as in Residential College in the Arts and Humanities (RCAH),
College of Arts and Letters (CAL), a College (named JM in short ) in Social Science, a
College (named LB in short ) in Natural Science. The policy-based treatment will be 1
for a residential student, 0 otherwise. The treatment status is uniform across colleges that
134

have residential students in the university. The clusters are referred to as the colleges being
evaluated.
Matching 1 for Residential/Nonresidential Students Comparisons

This situation in-

volves the within-cluster comparison, which means comparing residential students with their
peers in the same college based upon the measuring knowledge speciﬁc to that college. For
this within-cluster comparison, individual residential students will be matched with nonresidential students in the same college. Level-1 propensity scores are used for matching.
The level-1 propensity scores are computed by using only the level-1 covariates.
Matching 2 for College Comparison Using Residential Students

This situation involves

the between-cluster comparison. The between-cluster comparison will need a holistic measure
of higher education success for all colleges. This between-cluster comparison involves only
the residential students of all colleges in the university. For any two given colleges, RCAH
and JM for instance, the treatment status will be 1 for being in college RCAH and 0 in
college JM. So, residential students in the treatment group will be matched with residential
students in the control group. The same matching can be done by setting up one college as
the control group and matching each of other colleges with this control group. The matched
data then can be analyzed.
Matching 3 for College Eﬀect on Residential Students

The evaluation can be done

by using the freshmen and the seniors among the residential students. The colleges can be
treated the same as those universities in Exemplary Cases 1 and corresponding matching can
be done as what has been explained in Exemplary Cases 1 above. This matching approach
is more suitable to study how each college can add value to the learning eﬀect of residential
students.
135

The three approaches use diﬀerent data for matching and address three diﬀerent research
questions: Matching 1 examines if there is any diﬀerence between residential students and
nonresidential students within the college; Matching 2 examines if there is any diﬀerence
on “residential eﬀect” across colleges; and Matching 3 examines how much the “residential
eﬀect” is for each college.

6.11

Summary

In education studies using the SCD, valid inferences must account for the selection bias of
the hierarchical nature of the data. Matching is a tool to reduce estimation bias of treatment eﬀect to account for selection bias when random assignment is impossible in the SCD.
Diﬀerent situations where the HEoG assumption may fail present diﬀerent requirements
for matching. This dissertation study demonstrates the potential of using propensity score
matching in the SCD to reduce bias of schooling eﬀect estimate in three simulated situations
involving hierarchically structured data, surrogate covariates with measurement errors, and
omitted covariates. Based on the structural equation modeling framework, this dissertation
study provides a theoretical basis for future research to examine the eﬀectiveness of post-hoc
adjustment approaches such as propensity score matching in reducing the selection bias of
SCD for casual inference and program evaluation.

136

APPENDICES

137

Appendix A
Simulation Code

A.1

Mplus Code Fitting the Two-Level SEM on SIMSUSA Data

DATA:
FILE IS ”SIMSRGLR.dat”;
Format IS FREE;
TYPE IS individual;
DEFINE: STTHRATE = SCHSIZE/STCHS;
NEWGEOM=NEWGEOM/10;
NEWALG=NEWALG/10;
OLDARITH=OLDARITH/10;
OLDGEOM=OLDGEOM/10;
VARIABLE: NAMES ARE IDTEACH IDSCH IDSTUD IDCLASS XAGE
YFOCCN YMWORK YMOCCN EDUEPCT YPWWELL YIWANT YMORMTH
138

RYPWANT RYPENC RYNOMORE RYPINT RYFLIKE RYMLIKE
RYFABLE RYMABLE RYMIMPT RYFIMPT INTERCTY YFAMILY
YFEDUC YMEDUC YMHWKT YTUTORT OLDARITH OLDALG
OLDGEOM NEWARITH NEWALG NEWGEOM THWRKT CTCBEHV
TORDERT TBOTTOM TPPWEEK SAREA SENROLB SENROLG
STCHS SSOMMM SSOMMF SALLMM SALLMF SSPECM SSPECF
YGOWO YUSTAND YWRKLNG YNOTNEC YJOBUSE YMTHLOG
YFLGOOD YNONEED YFUN YMTHJOB YNEVER YHELPO
YHAPPY YING YCHALL YINMAZE YNOTWLL YHARDER
YCALM Totpre Totpos CLASSSIZE MTHSTAF
MTHTEACH MTHONLY SCHSIZE ;
USEVARIABLES ARE XAGE EDUEPCT RYPWANT RYPENC YPWWELL
YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE
RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN
YMOCCN YMHWKT oldarith oldgeom NEWALG NEWGEOM
TPPWEEK Totpos Totpre CLASSSIZE MTHONLY;
MISSING ARE ALL (-9);
WITHIN= XAGE EDUEPCT RYPWANT RYPENC YPWWELL
YIWANT YMORMTH RYNOMORE YFAMILY RYPINT
RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT
RYFIMPT YFEDUC YMEDUC YFOCCN YMOCCN
YMHWKT;
BETWEEN = OLDARITH OLDGEOM NEWALG NEWGEOM
139

CLASSSIZE TPPWEEK MTHONLY;
CENTERING = GRANDMEAN (XAGE);
CLUSTER = IDCLASS;
ANALYSIS: TYPE = TWOLEVEL ;
MODEL:
%WITHIN%
! LATENT VARIABELs
EDUISPR BY RYPWANT RYPENC YPWWELL;
SLFENCRG BY YIWANT YMORMTH RYNOMORE;
FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE;
MTHIMPT BY RYMIMPT RYFIMPT;
SES BY YFEDUC YMEDUC YFOCCN YMOCCN;
! REGRESSION
Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPTMTHIMPT SES ;
Totpos on Totpre ;
%BETWEEN%
Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY;
Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ;
OUTPUT: sampstat TECH1 TECH8 CINTERVAL residual;
! Mont Carlo parameters SAVEDATA: ESTIMATES = newmodelﬁnal.dat;
140

A.2

Mplus Code Generating Data for Mont Carlo Simulation

TITLE: Data Generation Mplus Code of Mont Carlo Simulation
MONTECARLO:
NAMES ARE XAGE EDUEPCT RYPWANT RYPENC YPWWELL
YIWANT YMORMTH RYNOMORE YFAMILY RYPINT RYFLIKE RYMLIKE
RYFABLE RYMABLE RYMIMPT RYFIMPT YFEDUC YMEDUC YFOCCN
YMOCCN YMHWKT oldarith oldgeom NEWALG NEWGEOM
TPPWEEK Totpos Totpre CLASSSIZE MTHONLY;
NOBSERVATIONS = 345000;
NREPS = 1;
SEED = 58459;
POPULATION =newmodelﬁnal.dat;
COVERAGE =newmodelﬁnal.dat;
NCSIZES = 4;
CSIZES = 300 (10) 3500 (20) 8000 (30) 800(40);
WITHIN=XAGE EDUEPCT RYPWANT RYPENC YPWWELL
YIWANT YMORMTH RYNOMORE YFAMILY RYPINT
RYFLIKE RYMLIKE RYFABLE RYMABLE RYMIMPT RYFIMPT
YFEDUC YMEDUC YFOCCN YMOCCN YMHWKT;
BETWEEN = OLDARITH OLDGEOM NEWALG NEWGEOM
CLASSSIZE TPPWEEK MTHONLY;
141

REPSAVE = ALL;
SAVE = Newmodel8v2*.dat;
MODEL POPULATION:
%WITHIN%
! LATENT VARIABEL
EDUISPR BY RYPWANT RYPENC YPWWELL;
SLFENCRG BY YIWANT YMORMTH RYNOMORE;
FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE;
MTHIMPT BY RYMIMPT RYFIMPT;
SES BY YFEDUC YMEDUC YFOCCN YMOCCN;
Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPT MTHIMPT SES ;
Totpos on Totpre ;
%BETWEEN%
Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY;
Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ;
ANALYSIS: TYPE = TWOLEVEL;
MODEL:
%WITHIN%
! LATENT VARIABEL
EDUISPR BY RYPWANT RYPENC YPWWELL;
SLFENCRG BY YIWANT YMORMTH RYNOMORE;
FMLYSUPT BY RYPINT RYFLIKE RYMLIKE RYFABLE RYMABLE;
142

MTHIMPT BY RYMIMPT RYFIMPT;
SES BY YFEDUC YMEDUC YFOCCN YMOCCN;
Totpre on XAGE EDUEPCT YFAMILY YMHWKT EDUISPR SLFENCRG FMLYSUPT MTHIMPT SES ;
Totpos on Totpre;
%BETWEEN%
Totpre ON OLDARITH OLDGEOM CLASSSIZE MTHONLY;
Totpos ON Totpre NEWALG NEWGEOM TPPWEEK ;
OUTPUT: TECH9;

A.3

R Code for Level-1 Matching

# Level-1 Matching
# Author: Qiu Wang
# Date: 2010-01-01/ 2010-01-06: Fist revision
# Simulation in detail: This is a simulation set-up for the dissertation. In this simulation,
the 200 independent schools with varying sample size (See Section 4.1) are generated, half
from the pseudo-population of Cohort 1 at Time 1 and the other half from pseudo-population
of Cohort 2 at Time 0. Data generation and parameter setting are discussed in Section 4.2.
The covariates both level-1 and level-2 are generated from the multivariate normal distribution with corresponding mean vector and variance-covariance matrix, which are derived
from the two-level SEM ﬁtted in Mplus. The surrogate variables of a latent construct such
as SES are also generated using multivariate normal distribution with corresponding mean
vector and variance-covariance matrix. The variance-covariance matrix is derived based on
143

the (sub-)measurement model of the two-level SEM (See Section 4.2.7).
# Propensity Score matching and Mahalanobis distance matching are proposed. The
calipers are 0.2 standard deviation of pooled samples and 0.01 standard deviation of pooled
samples.
# 200 replications per conditions.
# # # Part 1 Data Preparation # # # # #.
library(MatchIt)
setwd(”C:/Documents and Settings/wangqiu/Desktop/SIMS042010/QiuData/
REGULAR CLASS DATA/Data generate”)
# cohort 2 at time 0 and time 1 data and cohort ID is coded as 1.
cohort1T0T1<-read.table (ﬁle=’NewmodePOP.dat’)
cohort1T0T1data<-cbind(cohort1T0T1,cohort=c(rep(1,length(cohort1T0T1[,1]))))
# cohort 1 at time 1 data and # cohort ID is coded as 0.
cohort2T0T11<-read.table (ﬁle=’popdata11.dat’)
cohort2T0T1data1<-cbind(cohort2T0T11,cohort=c(rep(0,length(cohort2T0T11[,1]))))
# varialbe list
colnamess<-c(”RYPWANT”,”RYPENC”,”YPWWELL”,”YIWANT”,”YMORMTH”,
”RYNOMORE”,”RYPINT”,”RYFLIKE”,”RYMLIKE”,”RYFABLE”,”RYMABLE”,
”RYMIMPT”,”RYFIMPT”,”YFEDUC”,”YMEDUC”,”YFOCCN”,”YMOCCN”,
”TOTPOS”,”TOTPRE”,”XAGE”,”EDUEPCT”,”YFAMILY”,”YMHWKT”,
”OLDARITH”,”OLDGEOM”,”NEWALG”,”NEWGEOM”,”TPPWEEK”,”CLASSSIZ”,
”MTHONLY”,”CLUSTER”,”COHORT”)
# Attach variable names to both cohorts.
144

colnames(cohort1T0T1data)<-colnamess
colnames(cohort2T0T1data1)<-colnamess
# population longitudinal schooling eﬀect
POP.lngtdnl.ef<-mean(cohort1T0T1data$TOTPOS)-mean(cohort1T0T1data$TOTPRE)
#populaiton synthetic cohort schooling eﬀect
POP.synthetic.ef<-mean(cohort1T0T1data$TOTPOS)-mean(cohort2T0T1data1$TOTPRE)
# # # # Part 2 Simulation: Matching # # # #
cluster.id<-c(1:12600)
lngtdnl.ef<-NULL
synthetic.ef<-NULL
matched.synthetic.ef1<-NULL
matched.synthetic.ef2<-NULL
matched.synthetic.ef3<-NULL
matched.synthetic.ef4<-NULL
for (j in 1:200){
classID.treat<-sort(sample(cluster.id,size=100,replace=F))
sample.data.Trt<-NULL
sample.data.Cntr<-NULL
for (i in 1:length(classID.treat)){
sample.data.Trt<-rbind(sample.data.Trt,cohort1T0T1data[
(cohort1T0T1data$CLUSTER==classID.treat[i]),] )
sample.data.Cntr<-rbind(sample.data.Cntr,cohort2T0T1data1[
(cohort2T0T1data1$CLUSTER==classID.treat[i]),] )
145

}
lngtdnl.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Trt$TOTPRE)
synthetic.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Cntr$TOTPRE)
# matching
sample.data<-data.frame(rbind(sample.data.Trt,sample.data.Cntr))
#propensity score matching
pro.pen<-glm(COHORT∼XAGE+EDUEPCT+YFAMILY+YMHWKT,family=binomial,
data=sample.data)
logodds<-log(pro.pen$ﬁtted/(1-pro.pen$ﬁtted))
## Mahalanobis matching
XX<-cbind(sample.data$XAGE,sample.data$EDUEPCT,
sample.data$YFAMILY,sample.data$YMHWKT)
mhd<-mahalanobis(XX,mean(XX),var(XX))
cutoﬀ1 <-.01*sd(logodds)
cutoﬀ2<-0.2*sd(logodds)
cutoﬀ3<-.01*sd(mhd)
cutoﬀ4<-.2*sd(mhd)
t.c.match1<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=logodds,M=1,caliper=cutoﬀ1,replace=FALSE)
t.c.match2<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=logodds,M=1,caliper=cutoﬀ2,replace=FALSE)
t.c.match3<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=mhd,M=1,caliper=cutoﬀ3,replace=FALSE)
146

t.c.match4<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=mhd,M=1,caliper=cutoﬀ4,replace=FALSE)
# after matching SCD-based Shooling Eﬀect
matched.synthetic.ef1[j]<mean(sample.data.Trt$TOTPOS[(t.c.match1$index.treated)])mean(sample.data.Cntr$TOTPRE[(t.c.match1$index.controllength(sample.data.Trt$TOTPRE))])
matched.synthetic.ef2[j]<mean(sample.data.Trt$TOTPOS[(t.c.match2$index.treated)])mean(sample.data.Cntr$TOTPRE[(t.c.match2$index.controllength(sample.data.Trt$TOTPRE))])
matched.synthetic.ef3[j]<mean(sample.data.Trt$TOTPOS[(t.c.match3$index.treated)])mean(sample.data.Cntr$TOTPRE[(t.c.match3$index.controllength(sample.data.Trt$TOTPRE))])
matched.synthetic.ef4[j]<mean(sample.data.Trt$TOTPOS[(t.c.match4$index.treated)])mean(sample.data.Cntr$TOTPRE[(t.c.match4$index.controllength(sample.data.Trt$TOTPRE))])
}
initial.bias<-lngtdnl.ef-synthetic.ef
indicator<-1*(initial.bias<=0.5*sd(initial.bias))
aftermatch.bias1<-lngtdnl.ef-matched.synthetic.ef1
147

aftermatch.bias2<-lngtdnl.ef-matched.synthetic.ef2
aftermatch.bias3<-lngtdnl.ef-matched.synthetic.ef3
aftermatch.bias4<-lngtdnl.ef-matched.synthetic.ef4
# Bias Reduction Rate
bias.reduct1<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias1[(indicator==0)]))/abs(initial.bias[(indicator==0)])
bias.reduct2<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias2[(indicator==0)]))/abs(initial.bias[(indicator==0)])
bias.reduct3<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias3[(indicator==0)]))/abs(initial.bias[(indicator==0)])
bias.reduct4<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias4[(indicator==0)]))/abs(initial.bias[(indicator==0)])
mean(bias.reduct1)
mean(bias.reduct2)
mean(bias.reduct3)
mean(bias.reduct4)
layout(mat=matrix(1:4,2,2,byrow=FALSE) )
hist(bias.reduct1)
hist(bias.reduct2)
hist(bias.reduct3)
hist(bias.reduct4)
save.image(”sim1.RData”)
148

A.4

Code for Level-2 Matching

# Level-2 Matching
# Author: Qiu Wang
# Date: 2010-01-01/ 2010-01-06: Fist revision
# Propensity Score matching and Mahalanobis distance matching are proposed. The
calipers are 0.2 standard deviation of pooled samples and 0.01 standard deviation of pooled
samples.
# 200 replications per conditions.
# # # Part 1 Data Preparation # # # # #.
# Copy Part 1 of Appendix A.3.
## level-2 data
cluster.cohort1T0T1data<-aggregate(cohort1T0T1data,by=list(
cohort1T0T1data$CLUSTER),FUN=mean,na.rm=TRUE)
# length(cluster.cohort1T0T1data[,1]) # 12600 numver of clusters
cluster.cohort2T0T1data3<-aggregate(cohort2T0T1data3,by=list(
cohort2T0T1data3$CLUSTER),FUN=mean,na.rm=TRUE)
#length(cluster.cohort2T0T1data3[,1]) # 12600 numver of clusters
# population longitudinal schooling eﬀect using class means
POP.lngtdnl.ef<-mean(cluster.cohort1T0T1data$TOTPOS)-mean(
cluster.cohort1T0T1data$TOTPRE)
# # # # Part 2 Simulation: Matching # # # #
cluster.id<-c(1:12600)
lngtdnl.ef<-NULL
149

synthetic.ef<-NULL
matched.synthetic.ef1<-NULL
matched.synthetic.ef2<-NULL
matched.synthetic.ef3<-NULL
matched.synthetic.ef4<-NULL
for (j in 1:200){
classID.treat<-sort(sample(cluster.id,size=100,replace=F))
sample.data.Trt<-NULL
sample.data.Cntr<-NULL
for (i in 1:length(classID.treat)){
sample.data.Trt<-rbind(sample.data.Trt,cluster.cohort1T0T1data[(
cluster.cohort1T0T1data$CLUSTER==classID.treat[i]),] )
sample.data.Cntr<-rbind(sample.data.Cntr,cluster.cohort2T0T1data3[(
cluster.cohort2T0T1data3$CLUSTER==classID.treat[i]),] )
}
lngtdnl.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Trt$TOTPRE)
synthetic.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Cntr$TOTPRE)
# matching
sample.data<-data.frame(rbind(sample.data.Trt,sample.data.Cntr))
#propensity score matching
pro.pen<-glm(COHORT∼OLDARITH+OLDGEOM+CLASSSIZ+MTHONLY,
family=binomial, data=sample.data)
# testing if extra level-2 covariates improve matching # answer is NO!
150

# pro.pen<-glm(COHORT∼OLDARITH+OLDGEOM+NEWALG+NEWGEOM+
TPPWEEK+CLASSSIZ+MTHONLY,family=binomial, data=sample.data)
logodds<-log(pro.pen$ﬁtted/(1-pro.pen$ﬁtted))
XX<-cbind(sample.data$OLDARITH,sample.data$OLDGEOM,
sample.data$CLASSSIZ,sample.data$MTHONLY)
mhd<-mahalanobis(XX,mean(XX),var(XX))
## Mahalanobis matching
cutoﬀ1 <-.01*sd(logodds)
cutoﬀ2<-0.2*sd(logodds)
cutoﬀ3<-.01*sd(mhd)
cutoﬀ4<-.2*sd(mhd)
t.c.match1<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=logodds,M=1,caliper=cutoﬀ1,replace=FALSE)
t.c.match2<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=logodds,M=1,caliper=cutoﬀ2,replace=FALSE)
t.c.match3<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=mhd,M=1,caliper=cutoﬀ3,replace=FALSE)
t.c.match4<-Match(Y=sample.data$TOTPRE,Tr=sample.data$COHORT,
X=mhd,M=1,caliper=cutoﬀ4,replace=FALSE)
# after matching SCD-based Shooling Eﬀect
matched.synthetic.ef1[j]<mean(sample.data.Trt$TOTPOS[(t.c.match1$index.treated)])-mean(
sample.data.Cntr$TOTPRE[(t.c.match1$index.control151

length(sample.data.Trt$TOTPRE))])
matched.synthetic.ef2[j]<mean(sample.data.Trt$TOTPOS[(t.c.match2$index.treated)])-mean(
sample.data.Cntr$TOTPRE[(t.c.match2$index.controllength(sample.data.Trt$TOTPRE))])
matched.synthetic.ef3[j]<-mean(sample.data.Trt$TOTPOS[(t.c.match3$index.treated)])-

mean(sample.data.Cntr$TOTPRE[(t.c.match3$index.control-length(
sample.data.Trt$TOTPRE))])
matched.synthetic.ef4[j]<- mean(sample.data.Trt$TOTPOS[(t.c.match4$index.treated)])-

mean(sample.data.Cntr$TOTPRE[(t.c.match4$index.control-length(
sample.data.Trt$TOTPRE))])
}
initial.bias<-lngtdnl.ef-synthetic.ef
indicator<-1*(initial.bias<=0.5*sd(initial.bias))
# mean(abs(initial.bias[(indicator==0)]))
aftermatch.bias1<-lngtdnl.ef-matched.synthetic.ef1
aftermatch.bias2<-lngtdnl.ef-matched.synthetic.ef2
aftermatch.bias3<-lngtdnl.ef-matched.synthetic.ef3
aftermatch.bias4<-lngtdnl.ef-matched.synthetic.ef4
bias.reduct1<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias1[(indicator==0)]))/abs(initial.bias[(indicator==0)])
152

bias.reduct2<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias2[(indicator==0)]))/abs(initial.bias[(indicator==0)])
bias.reduct3<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias3[(indicator==0)]))/abs(initial.bias[(indicator==0)])
bias.reduct4<-(abs(initial.bias[(indicator==0)])-abs(
aftermatch.bias4[(indicator==0)]))/abs(initial.bias[(indicator==0)])
# Bias Reduction Rate
mean(bias.reduct1)
mean(bias.reduct2)
mean(bias.reduct3)
mean(bias.reduct4)
layout(mat=matrix(1:4,2,2,byrow=FALSE) )
hist(bias.reduct1)
hist(bias.reduct2)
hist(bias.reduct3)
hist(bias.reduct4)
save.image(”sim3v22.RData”)

A.5

R Code for Dual Matching

# Dual Matching
# Author: Qiu Wang
# Date: 2010-01-01/ 2010-01-06: Fist revision
153

# Propensity Score matching and Mahalanobis distance matching are proposed. The
caliper is 0.2 standard deviation of pooled samples.
# This simulation does not consider Mahalanobis matching.
# # # Part 1 Data Preparation # # # # #.
Copy Part 1 of Appendix A.4
# # # Part 2 Dual Matching # # # # #.
cluster.id<-c(1:12600)
lngtdnl.ef<-NULL
synthetic.ef<-NULL
matched.synthetic.ef<-NULL # dual propensity score based matching
cluster.lngtdnl.ef<-NULL
cluster.synthetic.ef<-NULL
cluster.matched.synthetic.ef1<-NULL
for (j in 1:200){
classID.treat<-sort(sample(cluster.id,size=100,replace=F))
sample.data.Trt<-NULL
sample.data.Cntr<-NULL
cluster.sample.data.Trt<-NULL
cluster.sample.data.Cntr<-NULL
LL<-length(classID.treat)
for (i in 1:LL){
#level-1 data
sample.data.Trt<-rbind(sample.data.Trt,cohort1T0T1data[(
154

cohort1T0T1data$CLUSTER==classID.treat[i]),] )
sample.data.Cntr<-rbind(sample.data.Cntr,cohort2T0T1data5[(
cohort2T0T1data5$CLUSTER==classID.treat[i]),] )
# level-2
cluster.sample.data.Trt<-rbind(cluster.sample.data.Trt,cluster.cohort1T0T1data[(
cluster.cohort1T0T1data$CLUSTER==classID.treat[i]),] )
cluster.sample.data.Cntr<-rbind(cluster.sample.data.Cntr,cluster.cohort2T0T1data5[(
cluster.cohort2T0T1data5$CLUSTER==classID.treat[i]),] )
}
# data sets data
sample.data<-data.frame(rbind(sample.data.Trt,sample.data.Cntr))
cluster.sample.data<-data.frame(rbind(cluster.sample.data.Trt,cluster.sample.data.Cntr))
#sample.data.Trt[,31]
#sample.data.Cntr[,31]
lngtdnl.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Trt$TOTPRE)
synthetic.ef[j]<-mean(sample.data.Trt$TOTPOS)-mean(sample.data.Cntr$TOTPRE)
cluster.lngtdnl.ef[j]<-mean(cluster.sample.data.Trt$TOTPOS)-mean(
cluster.sample.data.Trt$TOTPRE)
cluster.synthetic.ef[j]<-mean(cluster.sample.data.Trt$TOTPOS)-mean(
cluster.sample.data.Cntr$TOTPRE)
# dual matching step 1
#cluster matching
cluster.pro.pen<-glm(COHORT∼OLDARITH+OLDGEOM+CLASSSIZ,
155

family=binomial, data= cluster.sample.data)
# testing if extra level-2 covariates improve matching
# pro.pen<-glm(COHORT∼OLDARITH+OLDGEOM+NEWALG+NEWGEOM+
TPPWEEK+CLASSSIZ+MTHONLY,family=binomial,data=sample.data)
cluster.logodds<-log(cluster.pro.pen$ﬁtted/(1-cluster.pro.pen$ﬁtted))
cluster.cutoﬀ.logodds <- .2*sd(cluster.logodds)
cluster.t.c.match1<-Match(Y=cluster.sample.data$TOTPRE,Tr=
cluster.sample.data$COHORT, X=cluster.logodds,M=1,caliper=
cluster.cutoﬀ.logodds,replace=FALSE)
cluster.matched.synthetic.ef1[j]<-mean(cluster.sample.data.Trt$TOTPOS[(
cluster.t.c.match1$index.treated)])-mean(cluster.sample.data.Cntr$TOTPRE[(
cluster.t.c.match1$index.control-LL)])
# dual matching step 2
#individual matching
matched.cluster.data.Trt<-NULL
matched.cluster.data.Cntr<-NULL
for (k in 1:length(cluster.t.c.match1$index.treated))
{
matched.cluster.data.Trt<-rbind(matched.cluster.data.Trt,
sample.data.Trt[(sample.data.Trt$CLUSTER==cluster.sample.data.Trt$CLUSTER[
cluster.t.c.match1$index.treated[k]]),])
matched.cluster.data.Cntr<-rbind(matched.cluster.data.Cntr,sample.data.Cntr[(
sample.data.Cntr$CLUSTER==cluster.sample.data.Cntr$CLUSTER[(
156

cluster.t.c.match1$index.control[k]-LL)]),])
} # matched.sample.data$ matched.sample.data<-data.frame(rbind(matched.cluster.data.Trt,
matched.cluster.data.Cntr))
pro.pen<-glm(COHORT∼XAGE+EDUEPCT+YFAMILY+YMHWKT,family=binomial,
data=matched.sample.data)
logodds<-log(pro.pen$ﬁtted/(1-pro.pen$ﬁtted))
#hist(logodds)
cutoﬀ <- .2*sd(logodds)
t.c.match<-Match(Y=matched.sample.data$TOTPRE,Tr=matched.sample.data$COHORT,
X=logodds,M=1,caliper=cutoﬀ,replace=TRUE)
matched.synthetic.ef[j]<- mean(matched.cluster.data.Trt$TOTPOS[(
t.c.match$index.treated)])-mean(matched.cluster.data.Cntr$TOTPRE[(
t.c.match$index.control-length(matched.cluster.data.Trt$TOTPRE))])

157

Appendix B
Variance-Covariance Decomposition
of the Extended Solomon Four-Group
Design (SFGD) Based On Two-Level
SEM Framework

158

B.1

Variance-Covariance Matrix of SFGD Experimental Group 1

The Variance-Covariance Matrix of Y and X in Pre-Post Test Design [a.k.a, Covariance Matrix of Solomon Four-Group
Design’s Experimental Group 1: Pre-test = 0, Post-test = 1 ].




V(Y0 )








V(Y1 )
 V(Y )
  Cov(Y0 , Y1 )

=

Cov(Y, X) = 




 Cov(Y , X ) Cov(Y , X )

Cov[Y, X) V(X)
V(X0 )


0 0
1 0




Cov(Y0 , X1 ) Cov(Y1 , X1 ) Cov(X0 , X1 ) V(X1 )


b B + Ψb )λ + Θb
e10

 λ0 (B0 Ψξ
U0 0
0 0






b π B + Ψb )λ + Θb
b B + Ψb )λ β
λ1 (B1 πΨξ
λ0 (B0 Ψξ


e11
1
0
U1 1
U0 1
0
0

=




λ0 B0 Ψb g0
λ1 B1 πΨw g0
g0 Ψb g0 + Θb


e20
ξ0
ξ0
ξ0




λ0 B0 πΨb g1
λ1 B1 πΨb π g1
g0 Ψb g1 π
g1 πΨb π g1 + Θb
e21
ξ0
ξ0
ξ0
ξ0


w B + Ψw )λ + Θw
e10
 λ0 (B0 Ψξ 0

U0 0
0






λ0 (B0 Ψw B 0 + Ψw )λ1 β
λ1 (B1 πΨw π B1 + Ψw )λ1 + Θw


e11
U0
U1
ξ0
ξ0


+



λ0 B0 Ψw g0
λ1 B1 πΨw g0
g0 Ψw g0 + Θw


e20
ξ0
ξ0
ξ0




wg
wπg
wg π
w π g + Θw
λ0 B0 πΨξ 1
λ1 B1 πΨξ
g0 Ψξ 1
g1 πΨξ
e21
159
0
0 1
0
0 1




B.2

Variance-Covariance Matrix of SFGD Experimental Group 2




V(Y0 )








V(Y1 )
 V(Y )
  Cov(Y0 , Y1 )

=

Cov(Y, X) = 


 
 Cov(Y , X ) Cov(Y , X )

Cov[Y, X) V(X)
V(X0 )


0 0
1 0




Cov(Y0 , X1 ) Cov(Y1 , X1 ) Cov(X0 , X1 ) V(X1 )

b
b
b
 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10
0
0



λ1 (B1 πΨb π B1 + Ψb )λ1 + Θb
λ0 (B0 Ψb B0 + Ψb )λ1

e11
U1
U0
ξ0
ξ0

=

g0 Ψb g0 + Θb
λ1 B1 πΨw g0
λ0 B0 Ψb g0

e20
ξ0
ξ0
ξ0


g1 πΨb π g1 + Θb
g0 Ψb g1 π
λ1 B1 πΨb π g1
λ0 B0 πΨb g1
e21
ξ0
ξ0
ξ0
ξ0






w
w
w
 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10
0
0



λ0 (B0 Ψw B0 + Ψw )λ1
λ1 (B1 πΨw π B1 + Ψw )λ1 + Θw

e11
U0
U1
ξ0
ξ0
+


λ0 B0 Ψw g0
λ1 B1 πΨw g0
g0 Ψw g0 + Θw

e20
ξ0
ξ0
ξ0


λ0 B0 πΨw g1
λ1 B1 πΨw π g1
g0 Ψw g1 π
g1 πΨw π g1 + Θw
e21
ξ0
ξ0
ξ0
ξ0
160



























B.3

Variance-Covariance Matrix of SFGD Control Group 1








 V(Y )
 
=
Cov(Y, X) = 

 

Cov[Y, X) V(X)




















V(Y1 )

Cov(Y1 , X1 )







=


















V(X1 )


λ1 (B1 Ψb B1 + Ψb )λ1 + Θb
e11
U1
ξ1

g1 Ψb g1 + Θb
e21
ξ1


λ1 B1 Ψb g1
ξ1

λ1 (B1 Ψw B1 + Ψw )λ1 + Θw
e11
U1
ξ1

λ1 B1 Ψw g1
ξ1

g1 Ψw g1 + Θw
e21
ξ1

161


















+






B.4

Variance-Covariance Matrix of SFGD Control Group 2








 V(Y )
 
=
Cov(Y, X) = 

 

Cov[Y, X) V(X)




















V(Y1 )

Cov(Y1 , X1 )







=


















V(X1 )


λ1 (B1 Ψb B1 + Ψb )λ1 + Θb
e11
U1
ξ1

g1 Ψb g1 + Θb
e21
ξ1


λ1 B1 Ψb g1
ξ1

λ1 (B1 Ψw B1 + Ψw )λ1 + Θw
e11
U1
ξ1

λ1 B1 Ψw g1
ξ1

g1 Ψw g1 + Θw
e21
ξ1

162


















+






B.5

Detailed Variance-Covariance Decomposition

This appendix discusses the detailed variance-covariance decomposition of the SFGD’s Experimental Group 1. The measurement model is



 Y = δ + λη + e
1

,

(B.1)


 X = v + gξ + e

2
where e1 ∼ N (0, Θe1 ). e1 is independent of η, ξ and e2 , e2 ∼ N (0, Θe2 ) is independent of
η,ξ and e1 . The latent variable ξ is hierarchically structured and includes within-cluster and
between-cluster components. The latent variable η of Y is hierarchically structured (Muth´n,
e
1994, p. 379). This is because η has a functional relationship with ξ in the structural model.
If A is the intercept and B the loading matrix, the structural model (J¨reskog and
o
S¨rbom, 1996) is written as
o
η = A + Bξ + U,

(B.2)

where U ∼ N (0, ΘU ) is independent of ξ, e1 and e2 . The independence assumption will
be used in the computation of the covariance of Y and X. This model is a two-level factor
analysis model (Muth´n and Muth´n, 2009). Correspondingly, based on the SEM framework,
e
e
data variation can be decomposed into within- and between-cluster components (Muth´n,
e
1994, p. 380).
Decomposition of Variation of X The variation of the latent variable ξ can be decomposed
as
V(ξ) = Ψξ = Ψb + Ψw ;
ξ
ξ

(B.3)

the variation of X’s residual can be decomposed into between- and within-cluster components.
163

That is,
V(e2 ) = Θb + Θw .
e2
e2

(B.4)

The variation of outcome X is decomposed as

V(X) = Φb + Φw ,
X
X

(B.5)

Φb = gΨb g + Θb ,
e2
X
ξ

(B.6)

Φw = gΨw g + Θw .
e2
X
ξ

(B.7)

with

and

Decomposition of Latent variable η The variation of η

ξ can be decomposed using the

structural model. First, the residual variance is decomposed as

V(U ) = Θb + Θw .
U
U

(B.8)

V(η) = Ψη = Ψb + Ψw ,
η
η

(B.9)

Ψb = BΨb B + Ψb ,
η
U
ξ

(B.10)

Ψw = BΨw B + Ψw .
η
U
ξ

(B.11)

The variation of η is decomposed as

with

and

164

Decomposition of Variation of Y The variation of Y’s residual can also be ξ can be
decomposed into between- and within-cluster components,

V(e1 ) = Θb + Θw .
e1
e1

(B.12)

Now the variation of variable Y is decomposed as

V(Y ) = Φb + Φw ,
Y
Y

(B.13)

Φb = λΨb λ + Θb ,
η
e1
Y

(B.14)

Φw = λΨw λ + Θw .
η
e1
Y

(B.15)

with

and

Covariance of Y and X Using the independence assumptions deﬁned in the measurement
and structural models above, the covariance of Y and X is computed as

Cov[Y, X) = Cov[λ(A+Bξ+U )+e1 , v+gξ+e2 ] = λBgV ar(ξ) = λBgΨb +λBgΨw . (B.16)
ξ
ξ

Thus, the whole data variance-covariance matrix is shown below:


Cov(Y, X) = 



V(Y )



 
=
 

Cov[Y, X) V(X)
 
 Φb
  Φw
Y
Y

+
=
 
λBgΨb Φb
λBgΨw Φw
X
X
ξ
ξ


165


Φb + Φw
Y
Y

λBgΨb + λBgΨw Φb + Φw
X
X
ξ
ξ











 
b λ + Θb
 λΨη
  λΨw λ + Θw
e1
η
e1
+
=

 
λBgΨb
gΨb g + Θb
λBgΨw
gΨw g + Θw
e2
e2
ξ
ξ
ξ
ξ


 λ(BΨb B + Ψb )λ + Θb 1

e
E
ξ
+
=


λBgΨb
gΨb g + Θb
e2
ξ
ξ


 λ(BΨw B + Ψw )λ + Θw

e1
E
ξ




λBgΨw
gΨw g + Θw
e2
ξ
ξ






Variance-Covariance Decomposition across Times: This is the temporal decomposition
of variance-Covariance in Appendix B.1.

Note the measurement model in the equation (B.1), e1 ∼ N (0, Θe1 ). e1 is independent
of η, ξ and e2 , e2 ∼ N (0, Θe2 ) is independent of η, ξ and e1 . Rewrite each component
in the model into two parts. One part represents the measure collected in Time 1 and the






 λ
 δ 
 Y0 
, δ =  0 ,λ =  0
other in Time 2. For the ﬁrst equation, Y = 





0
δ1
Y1





0
λ1


,


 η0 
 e 
, e =  10 . Variation of variable Y is decomposed as
η=
i

 1 

η1
e11
V(Yt ) = Φb + Φw ,
Yt
Yt

(B.17)

Φb = λt Ψb λt + Θb ,
ηt
e1t
Yt

(B.18)

Φw = λt Ψw λt + Θw ,
ηt
e1t
Yt

(B.19)

with

and

166

for t= 0, 1.












 X0 
 v 
 g
, k =  1 ,g =  0
Similarly write, X = 





X1
v2
0







0


 ξ 
, ξ =  0 , e =


 2
g1
ξ1

 e20 

. Correspondingly, the variation of outcome X is decomposed as
i


e21
V(Xi ) = Φb + Φw ,
Xi
Xi

(B.20)

Φb = gi Ψb gi + Θb ,
e2i
Xi
ξi

(B.21)

Φw = gi Ψw gi + Θw ,
e2i
Xi
ξi

(B.22)

with

and

for i= 0,1.

The latent variables ξ0 , ξ1 , η0 , and η1 are hierarchically structured and includes withincluster and between-cluster components (Muth´n, 1994, p. 379). This is because η has a
e
functional relationship with ξ in the structural model as followings:













 η0   A1   B0

=
+

 
 
η1
A2
0







0

  ξ0   U0 

+
,

 

B1
ξ1
U1

(B.23)

with U ∼ N (0, ΘU ) is independent of ξ, e1 and e2 . Thus, it results in

V(η0 ) = Ψη0 = Ψb + Ψw ,
η0
η0
167

(B.24)

with
Ψb = B0 Ψb B0 + Ψb ,
η0
U0
ξ0

(B.25)

Ψw = B0 Ψw B0 + Ψw .
η0
U0
ξ0

(B.26)

and

Now, write the variance-covariance matrix as



Cov(Y, X) = 


V(Y )
Cov[Y, X) V(X)









V(Y0 )




V(Y1 )
 Cov(Y0 , Y1 )
=

 Cov(Y , X ) Cov(Y , X )
V(X0 )

0 0
1 0


Cov(Y0 , X1 ) Cov(Y1 , X1 ) Cov(X0 , X1 ) V(X1 )






.






Compute V(Y ) In order to determine Cov(Y0 , Y1 ), the structural relationship between
η1 , and η1 . That is,
η1 = τ + γ + ιη0 .

(B.27)

Slope ι represents schooling eﬀect and γ represents maturation eﬀect; τ represents the pretest eﬀect at Time 1 (Solomon, 1949).
Thus, Cov(Y0 , Y1 ) = Cov[δ1 +λ0 η0 +e10 , δ2 +λ1 η1 +e11 ) = Cov[λ0 η0 , λ1 (τ +γ+ιη0 )] =
λ0 V(η0 )λ1 ι , with Cov[e10 , e11 ) = 0.
Plug in those components and write





b + Φw
V(Y0 )

  ΦY

Y0

=
0

V(Y ) = Cov(Y, Y ) = 
 

Cov(Y0 , Y1 ) V(Y1 )
Cov(Y0 , Y1 ) Φb + Φw
Y1
Y1
168



 
b λ + Θb
  λ0 Ψw λ0 + Θw
 λ0 Ψη0 0
η0
e10
e10
+
=
 

λ1 Ψw λ1 + Θw
λ1 Ψb λ1 + Θb
λ0 Ψw λ1 τ
λ0 Ψb λ1 τ
e11
η1
e11
η1
η0
η0











b
b
b
 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10

0

0
+
=

b B + Ψb )λ ι
b B + Ψb )λ + Θb
λ0 (B0 Ψξ 0
λ1 (B1 Ψξ 1
e11
U0 1
U1 1
0
1


w
w
w
 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10

0

0



λ1 (B1 Ψw B1 + Ψw )λ1 + Θw
λ0 (B0 Ψw B0 + Ψw )λ1 ι
e11
U1
U0
ξ1
ξ0




b
b
b

 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10
0
+
0

=

λ1 (B1 πΨb π B1 + Ψb )λ1 + Θb
λ0 (B0 Ψb B0 + Ψb )λ1 ι
e11
U1
U0
ξ0
ξ0


w
w
w

 λ0 (B0 Ψξ B0 + ΨU )λ0 + Θe10
0

0



λ1 (B1 πΨw π B1 + Ψw )λ1 + Θw
λ0 (B0 Ψw B0 + Ψw )λ1 ι
e11
U1
U0
ξ0
ξ0

Compute V(X) In order to determine Cov(X0 , X1 ), the structural relationship between
ξ0 , and ξ1 , is displayed by
ξ1 = a + πξ0 .

(B.28)

Thus, Cov(X0 , X1 ) = Cov[k1 + g0 ξ0 + e20 , k2 + g1 ξ1 + e21 ) = Cov[g0 ξ0 , g1 (a + πξ0 )] =
g0 V(ξ0 )g1 π , with Cov[e20 , e21 ) = 0, and
V(ξ0 ) = Ψξ = Ψb + Ψw .
ξ0
ξ0
0





b + Φw
V(X0 )

  ΦX

X0

=

0
Thus,V(X) = Cov(X, X) = 
 

Cov(X0 , X1 ) V(X1 )
Cov(X0 , X1 ) Φb + Φw
X1
X1
169



(B.29)



 
b g + Θb
w
w
 g0 Ψξ 0
  g0 Ψξ g0 + Θe20
e20
0
+
0
=

 
g0 Ψb g1 π
g1 Ψb g1 + Θb
g0 Ψw g1 π
g1 Ψw g1 + Θw
e21
e21
ξ0
ξ1
ξ0
ξ1
 










b
w
b
w
  g0 Ψξ g0 + Θe20

 g0 Ψξ g0 + Θe20
+
.

0
0
=
 

g1 πΨb π g1 + Θb
g1 πΨw π g1 + Θw
g0 Ψb g1 π
g0 Ψw g1 π
e21
e21
ξ0
ξ0
ξ0
ξ0
ComputeCov(Yi , Xj )

Components of Cov(Yi , Xj ) for i, j=0,1. are computed as

Cov(Yi , Xj ) = Cov[λi (Ai + Bi ξi + Ui ) + e1i , kj + gj ξj + e2j ] = λi Bi Cov(ξi , ξj )gj . (B.30)

Thus, the four components are computed as

Cov(Y0 , X0 ) = λ0 B0 (Ψb + Ψw )g0 ,
ξ0
ξ0

(B.31)

Cov(Y0 , X1 ) = λ0 B0 π(Ψb + Ψw )g1 ,
ξ0
ξ0

(B.32)

Cov(Y1 , X0 ) = λ1 B1 π(Ψb + Ψw )g0 ,
ξ0
ξ0

(B.33)

Cov(Y1 , X1 ) = λ1 B1 π(Ψb + Ψw )π g1 .
ξ0
ξ0

(B.34)

and

These procedures derive all components displayed in the matrixes of Appendix B.1.

170

BIBLIOGRAPHY

171

Bibliography
Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for
average treatment eﬀects. Econometrica, 74(1):235–267.
Abadie, A. and Imbens, G. W. (2007).
Bias-corrected matching estimators for
average treatment eﬀects.
Derived January 1, 2009, from http://ksghome.harvard.edu/aabadie/research.html.
Angrist, J. D. and Pischke, J. S. (2009). Mostly harmless econometrics: an empiricist’s
companion. Princeton University Press.
Austin, P. C., Grootendorst, P., and Anderson, G. M. (2007). A comparison of the ability
of diﬀerent propensity score models to balance measured variables between treated and
untreated subjects: a monte carlo study. Statistics in Medicine, 26(4):734–753.
Berger, V. (2005). Selection bias and covariate imbalances in randomzied clinical trials. John
Wiley & Sons, England.
Biemer, P. P., Groves, R. M., and Lyberg, L. E. (2004). Measurement errors in surveys.
John Wiley & Sons, New Jesey.
Bloom, H. S. (2004). Randomizing Groups to Evaluate Place-based Programs,. Retrieved
July 18, 2009, http://www.wtgrantfoundation. org/usr doc/RSChapter4Final.pdf.
Bloom, H. S., Richburg-Hayes, L., and Black, A. R. (2007). Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educational
Evaluation and Policy Analysis, 29(1):30–59.
Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons, New
York.
Boyd, L. H. and Iversen, G. R. (1979). Contextual analysis: Concepts and statistical techniques. Wadsworth Publishing Company, Belmont, California.
Burstein, L. (1992). The IEA study of mathematics III: Student growth and classroom processes. Pergamon Press, Oxford.
Campbell, D. T. and Stanley, J. C. (1966). Experimental and quasi-experimental designs for
research. Rand McNally College Publishing Company, Chicago.
172

Campbell, R. T. and Hudson, C. M. (1985). Synthetic cohorts from panel surveys. Research
on Aging, 7(1):81–93.
Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006). Measurement
Error in Nonlinear Models: A Modern Perspective. CRC Press.
Cheung, G. W. and Rensvold, R. B. (2002). Evaluating goodness-of-ﬁt indexes for testing measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal,
9(2):233–255.
Cochran, W. G. (1953). Matching in analytical studies. American Journal of Public Health,
43:684–691.
Cochran, W. G. (1957). Analysis of covariance: its nature and uses. Biometrics, 13(3):261–
281.
Cochran, W. G. (1963). Sampling Techniques. John Willey & Sons, New York.
Cochran, W. G. (1968a). The eﬀectiveness of adjustment by subclassiﬁcation in removing
bias in observational studies. Biometrics, 24(2):295–313.
Cochran, W. G. (1968b). Errors of measurement in statistics. Technometrics, 10(4):637–666.
Cochran, W. G. (1969). The use of covariance in observational studies. Applied Statistics,
18(3):270–275.
Cochran, W. G. (1972). Observational studies. In Bancroft, T. A., editor, Statistical Papers
in Honor of George W. Snedecor. Iowa State University Press, Ames.
Cochran, W. G. and Chambers, S. P. (1965). The planning of observational studies of human
populations. Journal of the Royal Statistical Society. Series A (General), 128(2):234–266.
Cochran, W. G. and Rubin, D. B. (1973). Controlling bias in observational studies: A
review. Sankhy : The Indian Journal of Statistics, Series A, 35:417–446.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Lawrence Erlbaum,
Hillsdale, New Jersey.
Cornﬁeld, J. (1978). Randomization by group: a formal analysis. American Journal of
Epidemiology, 108(2):100–102.
Cox, D. R. and Reid, N. (2000). The theory of the design of experiments. CRC Press.
Donner, A. (1998). Some aspects of the design and analysis of cluster randomization trials.
Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(1):95–113.
Donner, A. and Klar, N. (2000). Design and Analysis of Cluster Randomization Trials in
Health Research. Arnold, London.
Elder, G. H. (1998). The life course as developmental theory. Child development, 69(1):1–12.
173

Firebaugh, G. (1978). A rule for inferring individual-level relationships from aggregate data.
American Sociological Review, 43(4):557–572.
Freedman, L. S., Green, S. B., and Byar, D. P. (1990). Assessing the gain in eﬃciency due
to matching in a community intervention study. Statistics in Medicine, 9(8):943–952.
Fuller, W. A. (1987). Measurement error models. John Wiley & Sons, New York.
Fuller, W. A. (1995). Estimation in the presence of measurement error. International Statistical Review/Revue Internationale de Statistique, 63(2):121–141.
Gelman, A. (2009). A statisticians perspective on Mostly Harmless Econometrics: An
Empiricists Companion, by Joshua D. Angrist and J¨rn-Steﬀen Pischke. Stata Journal,
o
9(2):315–320.
Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical
Models. Cambridge University Press, New York.
Greenland, S. (2004). An overview of methods for causal inference from observational studies.
In Gelman, A. and Meng, X.-L., editors, Applied Bayesian Modeling and Causal Inference
from Incomplete-Data Perspectives, pages 3–14. John Wiley & Sons, New York.
Griﬃn, B. A., McCaﬀrey, D. F., and J, F. P. (2009). Evaluating the impact of blocking on
power in group-randomized trials. In Annual Conference of the Society for Research on
Educational Eﬀectiveness (SREE), Washington D.C.
Hansen, M. H., Hurwitz, W. N., and Bershad, M. A. (1961). Measurement errors in censuses
and surveys. Bull. de Institut. International de Statistique, 38(2):359–374.
Haviland, A. M. and Nagin, D. S. (2005). Causal inferences with group based trajectory
models. Psychometrika, 70(3):557–578.
Heckman, J. J. (1979). Sample selection bias as a speciﬁcation error. Econometrica: Journal
of the econometric society, 47(1):153–161.
Hedges, L. V. (2007a). Correcting a signiﬁcance test for clustering. Journal of Educational
and Behavioral Statistics, 32(2):151–179.
Hedges, L. V. (2007b). Eﬀect sizes in cluster-randomized designs. Journal of Educational
and Behavioral Statistics, 32(4):341–370.
Heimberg, R. G., Stein, M. B., Hiripi, E., and Kessler, R. C. (2000). Trends in the prevalence
of social phobia in the united states: a synthetic cohort analysis of changes over four
decades. European Psychiatry, 15(1):29–37.
Ho, D. E., Imai, K., King, G., and Stuart, E. A. (2009). MatchIt: nonparametric preprocessing for parametric causal inference (version 2.211)[software]. Journal of Statistical
Software, http://imai.princeton.edu/research/ﬁles/matchit.pdf.
174

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical
Association, 81(396):945–960.
Hong, G. and Raudenbush, S. W. (2006). Evaluating kindergarten retention policy. Journal
of the American Statistical Association, 101(475):901–910.
Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a ﬁnite universe. Journal of the American Statistical Association, 47(260):663–
685.
Huberty, C. J. and Olejnik, S. (2006). Applied MANOVA and discriminant analysis. John
Wiley & Sons, Hoboken, New Jersey.
IEA (1977).
The Second International Mathematics Study.
The Netherlands:
the International Association for the Evaluation of Educational Achievement,
http://www.iea.nl/sims.html.
J¨reskog, K. G. and S¨rbom, D. (1996). LISREL 8: User’s reference guide. Scientiﬁc
o
o
Software International, Lincolnwood, Illinois.
Kang, J. D. Y. and Schafer, J. L. (2007). Demystifying double robustness: a comparison of
alternative strategies for estimating a population mean from incomplete data. Statistical
Science, 22(4):523–539.
Kaplan, D. (2008). Structural equation modeling: Foundations and extensions. Sage Publications, Thousand Oaks, California.
Kessler, R. C., Stein, M. B., and Berglund, P. (1998). Social phobia subtypes in the national
comorbidity survey. American Journal of Psychiatry, 155(5):613–619.
Kifer, E. (1992). Opportunities, talents and participation. In Burstein, L., editor, The
IEA Study of Mathematics III: Student Growth and Classroom Processes, pages 279–307.
Pergamon Press, Oxford.
Lee, S. Y. (2007). Structural equation modeling: A Bayesian approach. John Wiley & Sons,
West Sussex, England.
Leon, A. C. and Hedeker, D. (2005). A mixed-eﬀects quintile-stratiﬁed propensity adjustment
for eﬀectiveness analyses of ordered categorical doses. Statistics in medicine, 24(4):647–
658.
Li, Y. P., Propert, K. J., and Rosenbaum, P. R. (2001). Balanced risk set matching. Journal
of the American Statistical Association, 96(455):870–882.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Lord, F. M. and Novick, M. R. (1968). Statistical theories of mental test scores. AddisonWesley Publishing Company, Reading, Massachusett.
175

L¨dtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., and Muth´n, B.
u
e
(2008). The multilevel latent covariate model: A new, more reliable approach to grouplevel eﬀects in contextual studies. Psychological methods, 13(3):203–229.
Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. (2000). Winbugs-a bayesian
modelling framework: Concepts, structure, and extensibility. Statistics and Computing,
10(4):325–337.
Mahalanobis, P. C. (1946). A sample survey of after-eﬀects of the bengal famine of 1943.
Sankhy : The Indian Journal of Statistics, 7(4):337–400.
Martin, D. C., Diehr, P., Perrin, E. B., and Koepsell, T. D. (1993). The eﬀect of matching
on the power of randomized community intervention studies. Statistics in Medicine, 12(34):329–338.
McCaﬀrey, D. and Hamilton, L. (2007). Value-Added Assessment in Practice: Lessons from
the Pennsylvania Value-Added Assessment System Pilot Project. Rand Corporation, Santa
Monica, California.
McCall, W. A. (1923). How to experiment in education. The Macmillan Company, New
York.
Morgan, S. L. and Winship, C. (2007). Counterfactuals and causal inference: Methods and
principles for social research. Cambridge University Press, New York.
Murray, D. M. (1998). Design and analysis of group-randomized trials. Oxford University
Press, New York.
Murray, D. M., Rooney, B. L., Hannan, P. J., Peterson, A. V., Ary, D. V., Biglan, A.,
Botvin, G. J., Evans, R. I., Flay, B. R., and Futterman, R. (1994). Intraclass correlation
among common measures of adolescent smoking: estimates, correlates, and applications
in smoking prevention studies. American Journal of Epidemiology, 140(11):1038–1050.
Muth´n, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods &
e
Research, 22(3):376–398.
Muth´n, L. K. and Muth´n, B. O. (2009). Mplus users guide. Muth´n & Muth´n, Los
e
e
e
e
Angeles, California.
Neyman, J. (1923). On the application of probability theory to agricultural experiments:
Essay on principles, section 9.(translated in 1990). Statistical Science, 5:465–480.
Ou, S. R. and Reynolds, A. J. (2010). Grade retention, postsecondary education, and public
aid receipt. Educational Evaluation and Policy Analysis, 32(1):118–139.
Planty, M., Hussar, W., Snyder, T., Kena, G., KewalRamani, A., Kemp, J., Bianco, K., and
Dinkes, R. (2009). The condition of education 2009: Indicator 18 grade rentetion (NCES
2009-081). National Center for Education Statistics, Institute of Education Sciences, U.S.
Department of Education. Washington, D.C.
176

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Raab, G. M. and Butcher, I. (2001). Balance in cluster randomized trials. Statistics in
Medicine, 20(3):351–365.
Raudenbush, S. W. (1997). Statistical analysis and optimal design for cluster randomized
trials. Psychological Methods, 2(2):173–185.
Raudenbush, S. W. and Bryk, A. S. (2002). Hierarchical linear models: Applications and
data analysis methods. Sage, Thousand Oaks, California.
Raudenbush, S. W. and Liu, X. F. (2001). Eﬀects of study duration, frequency of observation, and sample size on power in studies of group diﬀerences in polynomial change.
Psychological Methods, 6(4):387–401.
Raudenbush, S. W., Martinez, A., and Spybrook, J. (2007). Strategies for improving precision
in group-randomized experiments. Educational Evaluation and Policy Analysis, 29(1):5–
29.
Raudenbush, S. W. and Sadoﬀ, S. (2008). Statistical inference when classroom quality is
measured with error. Journal of Research on Educational Eﬀectiveness, 1(2):138–154.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied
Psychological Measurement, 21(2):173–184.
Rosenbaum, P. R. (1986). Dropping out of high school in the united states: An observational
study. Journal of Educational and Behavioral Statistics, 11(3):207–224.
Rosenbaum, P. R. (1987). Model-based direct adjustment. Journal of the American Statistical Association, 82(398):387–394.
Rosenbaum, P. R. (2002). Observational study. Springer-Verlag, New York.
Rosenbaum, P. R. and Rubin, D. B. (1983). The central role of the propensity score in
observational studies for causal eﬀects. Biometrika, 70(1):41–55.
Rosenbaum, P. R. and Rubin, D. B. (1985). Constructing a control group using multivariate
matched sampling methods that incorporate the propensity score. American Statistician,
39(1):33–38.
Rosner, B., Spiegelman, D., and Willett, W. C. (1990). Correction of logistic regression
relative risk estimates and conﬁdence intervals for measurement error: the case of multiple
covariates measured with error. American Journal of Epidemiology, 132(4):734–745.
Rosner, B., Willett, W. C., and Spiegelman, D. (1989). Correction of logistic regression
relative risk estimates for non-random measurement error. Statistics in Medicine, 8:1051–
1069.
177

Rubin, D. B. (1973a). Matching to remove bias in observational studies. Biometrics,
29(1):159–183.
Rubin, D. B. (1973b). The use of matched sampling and regression adjustment to remove
bias in observational studies. Biometrics, 29(1):185–203.
Rubin, D. B. (1976a). Multivariate matching methods that are equal percent bias reducing,
i: Some examples. Biometrics, 32(1):109–120.
Rubin, D. B. (1976b). Multivariate matching methods that are equal percent bias reducing,
II: maximums on bias reduction for ﬁxed sample sizes. Biometrics, 32(1):121–132.
Rubin, D. B. (1978). Bayesian inference for causal eﬀects: The role of randomization. The
Annals of Statistics, 6(1):34–58.
Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to
control bias in observational studies. Journal of the American Statistical Association,
74(366):318–328.
Rubin, D. B. (1980).
36(2):293–298.

Bias reduction using mahalanobis-metric matching.

Biometrics,

Rubin, D. B. (1985). The use of propensity scores in applied bayesian inference. Bayesian
statistics, 2:463–472.
Rubin, D. B. (1990). Formal modes of statistical inference for causal eﬀects. Journal of
Statistical Planning and Inference, 25(3):279–292.
Rubin, D. B. (2001). Using propensity scores to help design observational studies: application
to the tobacco litigation. Health Services and Outcomes Research Methodology, 2(3):169–
188.
Rubin, D. B. (2006). Matched sampling for causal eﬀects. Cambridge University Press, New
York.
Rubin, D. B. and Thomas, N. (2000). Combining propensity score matching with additional
adjustments for prognostic covariates. Journal of the American Statistical Association,
95(450):573–585.
Rubin, D. B. and Waterman, R. P. (2006). Estimating the causal eﬀects of marketing
interventions using propensity score methodology. Statistical Science, 21(2):206–222.
S¨rndal, C. E., Swensson, B., and Wretman, J. (2003). Model assisted survey sampling.
a
Springer-Verlag, New York.
Schmidt, W. and Houang, T. R. (1986). Ein vergleich von drei analyseverfahren fur hierarchist strukturierte daten. In v. Saldern, M., editor, Mehrebenenanalyse, pages S. 71–81.
PVU, Weinheim.
178

Schmidt, W. H. (1969). Covariance structure analysis of the multivariate random eﬀects
model.
Schmidt, W. H. and Burstein, L. (1992). Concomitants of growth in mathematics achievement during the population a school year. In Burstein, L., editor, The IEA Study of Mathematics III: Student Growth and Classroom Processes, pages 309–327. Pergamon Press,
Oxford.
Schmidt, W. H., Tatto, M. T., Bankov, K., Blmeke, S., Cedillo, T., Cogan, L., Han, S. I.,
Houang, R., Hsieh, F. J., and Paine, L. (2007). The preparation gap: Teacher education
for middle school mathematics in six countries: Mathematics Teaching in the 21st Century
(MT21). East Lansing, MI: MSU. Retrieved December 12, 2007, from http://usteds. msu.
edu/related research.asp.
Scott, A. and Smith, T. M. F. (1969). Estimation in multi-stage surveys. Journal of the
American Statistical Association, 64(327):830–840.
Sekhon, J. S. (2007). Multivariate and propensity score matching software with automated
balance optimization: The matching package for r. Journal of Statistical Software, 10(2):1–
51.
Sekhon, J. S. and Diamond, A. (2008). Genetic matching for estimating causal eﬀects:
A general multivariate matching method for achieving balance in observational studies.
Retrieved July 18, 2009, from http://sekhon.berke- ley.edu/papers/GenMatch.pdf.
Sloane, F. C. (2008). Randomized trials in mathematics education: Recalibrating the proposed high watermark. Educational Researcher, 37(9):624–630.
Solomon, R. L. (1949). An extension of control group design. Psychological Bulletin,
46(2):137–150.
Song, M. and Herman, R. (2010). Critical issues and common pitfalls in designing and
conducting impact studies in education. Educational Evaluation and Policy Analysis,
32(3):351–371.
Spiegelman, D., Schneeweiss, S., and McDermott, A. (1997). Measurement error correction for logistic regression models with an” alloyed gold standard”. American Journal of
Epidemiology, 145(2):184.
Spybrook, J. K. (2007). Experimental designs and statistical power of group randomized
trials funded by the Institute of Education Sciences. (Unpublihsed doctoral dissertation.)
University of Michigan, Ann Arbor, Michigan.
Stuart, E. and Rubin, D. (2007). Best practices in quasi-experimental designs: matching
methods for causal inference. In Osborne, J., editor, Best Practices in Quantitative Social
Science, pages 155–176. Thousand Oaks, CA: Sage Publications.
179

Stuart, E. A. and Rubin, D. B. (2008). Matching with multiple control groups with adjustment for group diﬀerences. Journal of Educational and Behavioral Statistics, 33(3):279–
306.
Tate, R. L. and Wongbundhit, Y. (1983). Random versus nonrandom coeﬃcient models for
multilevel analysis. Journal of Educational and Behavioral Statistics, 8(2):103–120.
Tatsuoka, M. M. (1971). Multivariate Analysis: Techniques for Educational and Psychological Research. John Wiley & Sons, New York.
Thompson, S. G., Pyke, S. D. M., and Hardy, R. J. (1997). The design and analysis of
paired cluster randomized trials: an application of meta-analysis techniques. Statistics in
medicine, 16(18):2063–2079.
Van der Linden, W. J. and Hambleton, R. K. (1997). Handbook of modern item response
theory. Springer-Verlag, New York.
Weller, E. A., Milton, D. K., Eisen, E. A., and Spiegelman, D. (2007). Regression calibration
for logistic regression with multiple surrogates for one exposure. Journal of Statistical
Planning and Inference, 137(2):449–461.
Wiley, D. E. and Wolfe, R. G. (1992). Major survey design issues for the IEA third international mathematics and science study. Prospects, 22(3):297–304.
Wolfe, R. G. (1987). Second international mathematics study. training manual for use of
the databank of the longitudinal, classroom process surveys for population a in the IEA
second international mathematics study. contractor’s report. Contractor’s report, Center
for Education Statistics, Washington, D.C.
Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. MIT Press,
Cambridge, Massachusett.

180