This is to certify that the dissertation entitled

An Investigation of Methods for Mixed-Model Meta-Analysis in the Presence of Missing Data

presented by

Kyle R. Fahrbach

has been accepted towards fulfillment of the requirements for the Ph.D. degree in CEPSE.

Major professor

Date: 5-9-01


An Investigation of Methods for Mixed-Model Meta-Analysis in the Presence of Missing Data

By

Kyle R. Fahrbach

A DISSERTATION

Submitted to Michigan State University College of Education in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

2001


ABSTRACT

An Investigation of Methods for Mixed-Model Meta-Analyses in the Presence of Missing Data

By

Kyle R. Fahrbach

Meta-analysts often find that the data sets they are planning to synthesize have missing data on potentially important study characteristics. When the data are complete, these study characteristics may be controlled for and their effects estimated through techniques similar to weighted-least-squares multiple regression analysis (Hedges & Olkin, 1985; Raudenbush, 1994), where the study characteristics are treated as predictors and the effect-size magnitude is the outcome. When data are missing, however, new analysis techniques must be employed if the meta-analyst does not want to resort to either dropping potentially important study characteristics from the analysis, or dropping studies that are missing data on those characteristics.

In the present study I investigate the estimation of parameters in a mixed-model meta-analysis under the condition that there are missing data on the predictors. To date only Pigott (1992) has investigated estimation in meta-analytic models where data are missing, and she did not model the presence of random effects. The estimation procedures compared here include complete-case analysis (the default in meta-analysis today), available-case analysis, and maximum-likelihood estimation through the Expectation-Maximization (EM) algorithm. Each procedure is compared with regard to the bias and efficiency of its estimators for three different types of missing data: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). Bootstrapped standard errors for the MLE method (using the method outlined in Su, 1988) are generated and examined for accuracy, and a real meta-analytic dataset of juvenile delinquency reduction studies is examined using the methods mentioned above.

The findings show that on average, the EM maximum-likelihood estimation procedure produced substantively important gains in efficiency over both CC and AC estimation, though there were a few subsets of simulation parameters for which the improvement was either small or nonexistent. The bootstrapped standard errors for the slopes were, on average, very accurate and had acceptable non-coverage rates.
However, the bootstrapped standard errors for the estimation of τ proved to be too conservative. The estimation of multiple models for the juvenile delinquency dataset showed that the program that employs the EM estimation procedure can handle models with many variables and switch between models easily. With this program in hand, the added inconvenience of maximum-likelihood estimation in meta-analysis becomes minimal.


DEDICATION

To my father, who can now stop asking me “Are you done yet?”


ACKNOWLEDGMENTS

To start with, I want to thank the members of my committee -- Betsy Becker, Ken Frank, Mark Reckase, Richard Houang, and James Stapleton -- for their guidance and support. Without it, the final version wouldn't have been nearly as good.

Much thanks to Sam Larson. Sam, you are by far the coolest person I've ever done consulting for, and I've profited from your advice as much as your $s.

Much thanks also to Ken Frank. Ken, not only have I enjoyed and learned much by working with you, but without your referrals I'd have been living in the street for the last five years. This would have put a damper on my dissertation-writing activities.

And, of course, much, much, much, much thanks to my advisor, Betsy. I would put those “much”s in a 32-point font, but I believe that in that form they would not pass dissertation formatting muster. Betsy, without you, I'd be wrapping up my apprenticeship proposal right about now. And it would be written in salsa, on a napkin. You've been a fantastic mentor for me through this entire program, and for that, I owe you greatly. You've also been a great friend.

Speaking of friends, I would like to acknowledge Aaron Blodgett and Jarek Hruscik. They did little to help me finish the dissertation -- but they did help stop me from going insane while I was working on it.

And finally, to Anne Continelli -- without me, you'd probably be far less insane. Thanks for sticking with me. Now, I shall have to find some other excuse to avoid doing housework than “But I have to work on my dissertation!”


TABLE OF CONTENTS

CHAPTER I
INTRODUCTION .......... 1

CHAPTER II
REVIEW OF THE LITERATURE .......... 7
1. Types of Missing Data .......... 7
2. Estimation Techniques for Datasets with Missing Data .......... 10
   Complete-Case Analysis .......... 10
   Available-Case Analysis .......... 12
   Unconditional Mean Imputation and Conditional Mean Imputation .......... 15
   Maximum-Likelihood Estimation .......... 16
3. Summary .......... 19

CHAPTER III
MODELS AND ESTIMATION THEORY .......... 20
1. The Meta-Analytic Model .......... 20
2. Estimation Procedures .......... 22
   Complete-Case Estimation .......... 22
   Available-Case Estimation .......... 22
   Maximum-Likelihood Estimation .......... 25
      The E-Step .......... 26
      Sufficient Statistics for β, τ, Σ_X, μ_X .......... 29
      The M-Step .......... 32
      MLE Standard Errors .......... 36

CHAPTER IV
SIMULATION STUDY METHODOLOGY .......... 38
1. Hyperparameter Choices .......... 38
   The Outcome .......... 39
   Number of Studies per Meta-Analysis (k) .......... 40
   Random-Effects Variation (τ) .......... 41
   Population Correlation Matrix among Predictors and Outcomes .......... 41
   Variation in Outcome Caused by Predictor Variables (V_mod) .......... 42
   Incidence of Missing Data .......... 43
   Types of Missing Data .......... 44
2. Generation of Data .......... 47
   Bias and MSE Simulations .......... 48
   Standard Error Simulations .......... 50
3. Criteria for the Investigation of Estimators .......... 51

CHAPTER V
SIMULATION STUDY RESULTS .......... 54
1. Results: MCAR/MAR Data .......... 54
   Bias in ML Estimation of β .......... 54
   Bias in ML Estimation of τ .......... 58
   Bias in AC Estimation of β .......... 59
   Bias in AC Estimation of τ .......... 62
   Bias in CC Estimation of β and τ .......... 63
   MSE_CC to MSE_MLE Ratios .......... 64
   MSE_CC to MSE_AC Ratios .......... 72
2. Results: p-NMAR Data .......... 78
   Bias in ML Estimation of β .......... 78
   Bias in ML Estimation of τ .......... 81
   Bias in AC Estimation of β .......... 83
   Bias in AC Estimation of τ .......... 86
   Bias in CC Estimation of β and τ .......... 87
   MSE_CC to MSE_MLE Ratios .......... 89
   MSE_CC to MSE_AC Ratios .......... 94
3. Results: o-NMAR Data .......... 99
   Bias in ML Estimation of β .......... 99
   Bias in ML Estimation of τ .......... 101
   Bias in AC Estimation of β .......... 103
   Bias in AC Estimation of τ .......... 106
   Bias in CC Estimation of β .......... 108
   Bias in CC Estimation of τ .......... 109
   MSE_CC to MSE_MLE Ratios .......... 111
   MSE_CC to MSE_AC Ratios .......... 114
4. Estimation of the Population Mean .......... 118
   Bias in Mean Estimation: MAR Data .......... 119
   Bias in Mean Estimation: p-NMAR Data .......... 120
   Bias in Mean Estimation: o-NMAR Data .......... 122
5. Results: Dichotomous Predictor with Missing Data .......... 124
   Bias in ML Estimation of β .......... 126
   Bias in ML Estimation of τ .......... 127
   Bias in AC Estimation of β .......... 128
   Bias in AC Estimation of τ .......... 129
   Bias in CC Estimation of β and τ .......... 130
   MSE_CC to MSE_MLE Ratios .......... 131
   MSE_CC to MSE_AC Ratios .......... 135
6. Results: Bootstrapped Standard Errors .......... 139
   Bootstrapping Errors for the Slopes .......... 139
   Testing Homogeneity of Effects .......... 146

CHAPTER VI
SAMPLE META-ANALYSIS .......... 149
1. Selection of Study Effects and Study Characteristics .......... 150
2. The Initial Model .......... 154
3. The Final Model .......... 157
4. Conclusions .......... 160

CHAPTER VII
DISCUSSION AND CONCLUSION .......... 162
1. Some Practical Considerations .......... 162
2. Is Maximum-Likelihood Estimation Always Better? .......... 165
3. Is Maximum-Likelihood Estimation Always Substantively Better? .......... 166
4. Future Research .......... 178
5. Conclusion .......... 179

APPENDIX
Study Sample Sizes and Missing Data Patterns .......... 182

REFERENCES .......... 189

LIST OF TABLES

Table 5.1  Maximum-Likelihood MSE/Variance Ratios for β (MCAR/MAR Data) .......... 56
Table 5.2  Maximum-Likelihood MSE/Variance Ratios for τ (MCAR/MAR Data) .......... 59
Table 5.3  Available-Case Frequencies of Bias in Estimates of β (MCAR/MAR Data) .......... 60
Table 5.4  Available-Case MSE/Variance Ratios for β (MCAR Data) .......... 61
Table 5.5  Available-Case MSE/Variance Ratios for β (MAR Data) .......... 61
Table 5.6  Available-Case MSE/Variance Ratios and Biases for τ (MCAR/MAR Data) .......... 63
Table 5.7  MSE_CC/MSE_MLE Ratios (MCAR/MAR Data) .......... 64
Table 5.8  MSE_CC/MSE_MLE Ratios, Main Effects (MCAR/MAR Data) .......... 66
Table 5.9  MSE_CC/MSE_MLE Ratios for β0, MCAR/MAR Data (Predictor Intercorrelations x Missing Data Incidence) .......... 69
Table 5.10 MSE_CC/MSE_MLE Ratios for β0, MCAR/MAR Data (Predictor Intercorrelations x Missing-Data Mechanism) .......... 69
Table 5.11 MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data (Size of τ x Incidence of Missing Data) .......... 70
Table 5.12 MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data (Size of τ x Average Study Sample Size) .......... 70
Table 5.13 MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data (Size of τ x Number of Studies) .......... 71
Table 5.14 MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data (Incidence of Missing Data x Number of Studies) .......... 71
Table 5.15 MSE_CC/MSE_AC Ratios (MCAR/MAR Data) .......... 72
Table 5.16 MSE_CC/MSE_AC Ratios, Main Effects (MCAR/MAR Data) .......... 74
Table 5.17 MSE_CC/MSE_AC Ratios, MCAR/MAR Data (Incidence of Missing Data x k x Predictor Intercorrelations) .......... 75
Table 5.18 MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data (Size of τ x Number of Studies) .......... 75
Table 5.19 MSE_AC/MSE_MLE Ratios for MCAR/MAR Data, τ = .02 (Excluding k=40/75% Missing Data) .......... 77
Table 5.20 Maximum-Likelihood Frequencies of Bias in Estimates of β (p-NMAR Data) .......... 79
Table 5.21 Maximum-Likelihood MSE/Variance Ratios for β (p-NMAR Data) .......... 79
Table 5.22 Maximum-Likelihood MSE/Variance Ratios and Biases for β0 and β1 (p-NMAR Data) .......... 81
Table 5.23 Maximum-Likelihood MSE/Variance Ratios for τ (p-NMAR Data) .......... 82
Table 5.24 Available-Case Frequencies of Bias in Estimates of β (p-NMAR Data) .......... 83
Table 5.25 Available-Case MSE/Variance Ratios for β (sp-NMAR Data) .......... 84
Table 5.26 Available-Case MSE/Variance Ratios for β (mp-NMAR Data) .......... 84
Table 5.27 Available-Case MSE/Variance Ratios and Biases for β0 (p-NMAR Data) .......... 85
Table 5.28 Available-Case MSE/Variance Ratios and Biases for τ (p-NMAR Data) .......... 86
Table 5.29 Complete-Case MSE/Variance Ratios and Biases for τ (p-NMAR Data) .......... 88
Table 5.30 MSE_CC/MSE_MLE Ratios (p-NMAR Data) .......... 89
Table 5.31 MSE_CC/MSE_MLE Ratios, Main Effects (p-NMAR Data) .......... 91
Table 5.32 MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data (Incidence of Missing Data x Size of τ) .......... 92
Table 5.33 MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data (Average Study Sample Size x Size of τ) .......... 93
Table 5.34 MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data (Number of Studies x Size of τ) .......... 93
Table 5.35 MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data (Incidence of Missing Data x Number of Studies) .......... 94
Table 5.36 MSE_CC/MSE_AC Ratios, p-NMAR Data .......... 95
Table 5.37 MSE_CC/MSE_AC Ratios, Main Effects (p-NMAR Data) .......... 96
Table 5.38 MSE_CC/MSE_AC Ratios, p-NMAR Data (Incidence of Missing Data x k x Predictor Intercorrelations) .......... 97
Table 5.39 MSE_AC/MSE_MLE Ratios for p-NMAR Data, τ = .02 .......... 98
Table 5.40 Maximum-Likelihood MSE/Variance Ratios for β (o-NMAR Data) .......... 100
Table 5.41 Maximum-Likelihood MSE/Variance Ratios and Biases for β0 and β1, Main Effects (o-NMAR Data) .......... 101
Table 5.42 Maximum-Likelihood MSE/Variance Ratios and Biases for τ (o-NMAR Data) .......... 102
Table 5.43 MSE_CC/MSE_MLE Ratios for τ, o-NMAR Data (Average Study Sample Size x Population Variance) .......... 103
Table 5.44 Available-Case MSE/Variance Ratios for β, o-NMAR Data .......... 104
Table 5.45 Available-Case Biases for β, o-NMAR Data .......... 104
Table 5.46 Available-Case MSE/Variance Ratios and Biases for β0 and β1, Main Effects (o-NMAR Data) .......... 105
Table 5.47 Available-Case MSE/Variance Ratios and Biases for β2 and β3, Main Effects (o-NMAR Data) .......... 106
Table 5.48 Available-Case MSE/Variance Ratios and Biases for τ (o-NMAR Data) .......... 107
Table 5.49 Complete-Case MSE/Variance Ratios for β (o-NMAR Data) .......... 108
Table 5.50 Complete-Case Biases for β (o-NMAR Data) .......... 109
Table 5.51 Complete-Case MSE/Variance Ratios and Biases for τ (o-NMAR Data) .......... 110
Table 5.52 MSE_CC/MSE_MLE Ratios (o-NMAR Data) .......... 111
Table 5.53 MSE_CC/MSE_MLE Ratios, Main Effects (o-NMAR Data) .......... 112
Table 5.54 MSE_CC/MSE_AC Ratios (o-NMAR Data) .......... 114
Table 5.55 MSE_CC/MSE_AC Ratios, Main Effects (o-NMAR Data) .......... 115
Table 5.56 MSE_CC/MSE_AC Ratios for NMAR Data (Incidence of Missing Data x k) .......... 116
Table 5.57 MSE_AC/MSE_MLE Ratios for o-NMAR Data, τ = .02 (Excluding k=40/75% Missing Data) .......... 117
Table 5.58 MSE/Variance Ratios for Estimation of the Mean (MAR Data) .......... 119
Table 5.59 MSE/Variance Ratios for Estimation of the Mean (sp-NMAR Data) .......... 121
Table 5.60 MSE/Variance Ratios for Estimation of the Mean (mp-NMAR Data) .......... 122
Table 5.61 MSE/Variance Ratios for Estimation of the Mean (o-NMAR Data) .......... 123
Table 5.62 Maximum-Likelihood MSE/Variance Ratios for β (MCAR Data w/Dichotomous Predictor) .......... 126
Table 5.63 Maximum-Likelihood MSE/Variance Ratios for τ (MCAR Data w/Dichotomous Predictor) .......... 128
Table 5.64 Available-Case MSE/Variance Ratios for β (MCAR Data w/Dichotomous Predictor) .......... 129
Table 5.65 Available-Case MSE/Variance Ratios and Biases for τ (MCAR Data w/Dichotomous Predictor) .......... 130
Table 5.66 Complete-Case MSE/Variance Ratios and Biases for τ (MCAR Data w/Dichotomous Predictor) .......... 131
Table 5.67 MSE_CC/MSE_MLE Ratios (MCAR Data w/Dichotomous Predictor) .......... 132
Table 5.68 MSE_CC/MSE_MLE Ratios, Main Effects (MCAR Data w/Dichotomous Predictor) .......... 132
Table 5.69 MSE_CC/MSE_MLE Ratios for τ, MCAR Data w/Dichotomous Predictor (Size of τ x Incidence of Missing Data) .......... 134
Table 5.70 MSE_CC/MSE_MLE Ratios for τ, MCAR Data w/Dichotomous Predictor (Size of τ x Average Study Sample Size) .......... 134
Table 5.71 MSE_CC/MSE_MLE Ratios for τ, MCAR Data w/Dichotomous Predictor (Incidence of Missing Data x Number of Studies) .......... 135
Table 5.72 MSE_CC/MSE_AC Ratios (MCAR/MAR Data) .......... 135
Table 5.73 MSE_CC/MSE_AC Ratios, Main Effects (MCAR Data w/Dichotomous Predictor) .......... 136
Table 5.74 MSE_CC/MSE_MLE Ratios for τ, MCAR Data w/Dichotomous Predictor (Average Study Sample Size x Size of τ) .......... 137
Table 5.75 MSE_AC/MSE_MLE Ratios, τ = .02 (MCAR Data w/Dichotomous Predictor) .......... 138
Table 5.76 Average Mean and Median Bootstrapped Variance/MSE_ML Ratios for β0 and β1, Main Effects .......... 141
Table 5.77 Average Mean and Median Bootstrapped Variance/MSE_ML Ratios for β2 and β3, Main Effects .......... 142
Table 5.78 Average Mean and Median Bootstrapped Variance/MSE_ML Ratios (Incidence of Missing Data x Number of Studies Interaction) .......... 143
Table 5.79 Non-Coverage Rates using Bootstrapped Standard Errors, Main Effects .......... 145
Table 5.80 Non-Coverage Rates using Bootstrapped Standard Errors (Incidence of Missing Data x Number of Studies Interaction) .......... 146
Table 5.81 Empirical Rejection Rates using Bootstrapped Standard Errors .......... 147
Table 5.82 Average Empirical Percentiles for Estimates of τ, Average ni = 80, τ = 0 .......... 148
Table 6.1  Study Characteristics Investigated in Juvenile Delinquency Meta-Analysis .......... 151
Table 6.2  Variable Names and Frequency of Missing Data .......... 153
Table 6.3  Parameter Estimates and Significance Tests: Initial Model .......... 155
Table 6.4  Parameter Estimates and Significance Tests: Final Model .......... 158
Table 7.1  MSE_CC/MSE_MLE Ratios for MCAR Data Across Hyperparameters Known to Meta-Analyst .......... 168
Table 7.2  MSE_CC/MSE_AC Ratios for MCAR Data Across Hyperparameters Known to Meta-Analyst .......... 169
Table 7.3  Average Power of H0: β0 = 0 vs. HA: β0 = .20, Across Hyperparameters Known to Meta-Analyst .......... 172
Table 7.4  Average Power of H0: β1 = 0 vs. HA: β1 = βA, Across Hyperparameters Known to Meta-Analyst .......... 173
Table 7.5  Average Power of H0: β2 = 0 vs. HA: β2 = βA, Across Hyperparameters Known to Meta-Analyst .......... 174
Table 7.6  Average Power of H0: β3 = 0 vs. HA: β3 = βA, Across Hyperparameters Known to Meta-Analyst .......... 175
Table A.1  k = 40, Average ni = 80, 50% Incidence of Missing Data .......... 183
Table A.2  k = 40, Average ni = 80, 75% Incidence of Missing Data .......... 183
Table A.3  k = 40, Average ni = 400, 50% Incidence of Missing Data .......... 184
Table A.4  k = 40, Average ni = 400, 75% Incidence of Missing Data .......... 184
Table A.5  k = 100, Average ni = 80, 50% Incidence of Missing Data .......... 185
Table A.6  k = 100, Average ni = 80, 75% Incidence of Missing Data .......... 186
Table A.7  k = 100, Average ni = 400, 50% Incidence of Missing Data .......... 187
Table A.8  k = 100, Average ni = 400, 75% Incidence of Missing Data .......... 188

LIST OF FIGURES

Figure 5.1 Ratio of MSE to Variance for β3 (MCAR Data) .......... 56
Figure 5.2 Ratio of MSE to Variance for β3 (sp-NMAR Data) .......... 80
Figure 5.3 Ratio of MSE to Variance for β3 (mp-NMAR Data) .......... 80
Figure 5.4 Ratio of MSE to Variance for β3 (w/Dichotomous Predictor) .......... 127


CHAPTER I

INTRODUCTION

Researchers conducting meta-analysis often find themselves the victim of problems for which they have little recourse. Many of these problems center around an absence of data that theoretically should be reported in the studies of the field being investigated. For instance, the problem of publication bias, otherwise known as the “file-drawer problem,” concerns whether studies have gone unpublished because they failed to find significant results for whatever treatment or relationship was being studied. While methods exist to correct for this bias (see, e.g., Begg, 1994), it is a problem that is difficult to handle -- or to know the full extent of in any given field.

Another common problem, and the problem that is the focus of this study, is missing data that occur within studies. Missing data within studies occur when variables that the meta-analyst believes may moderate or mediate the effect magnitude under investigation (such as a correlation or effect size) are not reported. Missing data of this sort are uncomfortably common in primary studies. A good example is the set of studies in Lipsey's 1992 meta-analysis of juvenile-delinquency treatments. Lipsey collected 443 studies of youth delinquency interventions that had both a treatment and a control group. Fully 12% of these studies failed to report a variable as important as the average age of the juveniles during treatment, and over 25% failed to report the mean total number of hours of contact between the intervention staff and the juveniles under treatment. Missing data on important study characteristics occur throughout the social sciences, and it is the meta-analyst's continuous burden to handle such deficiencies.

When confronted by missing data, researchers are generally advised to take one (or both) of two routes. They may either drop studies that have missing data on the variables they think may be important, or, to avoid lowering their sample size, they may instead drop those variables from the analysis. The first option shrinks the sample size of the meta-analysis; the second is theoretically unsound. While researchers can try to combine both routes, the effect is piecemeal at best, as the different sub-analyses work with different numbers of studies and different parts of the population of studies. A third route is sometimes taken, though it is heavily advised against in the literature: some researchers impute overall means for missing values in order that neither their overall sample size nor their variable pool is compromised. As I will show in the review of literature, this practice can lead to strongly biased estimators of meta-analytic parameters.
The problem of missing data in meta-analysis is made worse in that meta-analytic datasets are often perceived to be very diverse -- more so than many other datasets analyzed through multiple regression. This perception has led to the “comparing apples and oranges” debate that has taken place within the meta-analysis community for the last twenty years (e.g., Green & Hall, 1984; Slavin, 1984). Given this perceived diversity and the fact that the average meta-analysis has fewer cases (i.e., studies) than the average social-science study (see, e.g., Harris & Rosenthal, 1985; Hunter & Schmidt, 1990; Lent, Auerbach, & Levin, 1971), it should be anathema to delete studies from one's analysis simply because they do not report complete data on all variables of interest.

As of this writing, the only researcher to examine the problem of missing data within meta-analytic studies in a statistically rigorous way has been Pigott (1992). Pigott derived maximum-likelihood (ML) estimators of regression slopes within a meta-analytic model. In this model, the outcome is some asymptotically normally distributed parameter, such as an effect size or a correlation, and the predictors are study characteristics, such as type of treatment or mean number of contact hours between juvenile and intervener. Her model was not a full model, however, in that it did not allow for the presence of a random effect for the intercept. In other words, her model assumes that after accounting for sampling error and variation in the observed predictors, all variation has been accounted for and all studies have the same population correlation or effect size. This assumption is restrictive and may be unrealistic in many fields, especially if not all important study characteristics were measured. Also, Pigott assumed that study characteristics are measured with a precision proportional to their study's sample size. Obviously, for many conceivable study characteristics (e.g., sex of author, length of treatment) this assumption is an unreasonable one; there is no reason to think that a more precise estimate of the sex of a study's author comes from a study with 20 participants than from a study with 200 or 2000. An estimation procedure that estimates a random effect, and does not make the above assumption, is the natural next step.

There is a potential problem in deriving a more complicated maximum-likelihood method, however. There is the danger that it will be too complicated or intimidating for those who generally conduct meta-analyses, i.e., subject-matter experts who may be uncomfortable or unfamiliar with complex statistical procedures. There are two possible solutions to this problem: find an easier method of handling missing data, or write software that is general enough and simple enough to be used by someone whose statistical expertise is limited to moderate exposure to multiple regression and facility with one of the simpler statistical packages (e.g., SPSS for Windows).

The purpose of this study is to present, and test, a method for the estimation of meta-analytic parameters while addressing each of the above issues. A statistically rigorous maximum-likelihood estimation procedure is derived that accounts for the possible presence of a study random effect, and is written into software that can handle most meta-analytic datasets.
A less statistically rigorous, but intuitively promising, method for handling missing data (available-case analysis, also known as pairwise deletion) is also derived and tested against the maximum-likelihood procedure. It is expected that the maximum-likelihood procedure will perform better in terms of bias and mean-squared error (MSE) of its estimators; however, it is an open question as to whether it will perform substantively better. If the only difference between maximum-likelihood (ML) analysis and available-case (AC) analysis is that the MSEs of the ML estimators are only one or two percent smaller than the MSEs of the AC estimators, then practically speaking, the two techniques are performing equally. The two techniques are tested against each other and against standard complete-case analysis while varying a number of study parameters, including study sample size, size of random effect, population correlation matrix between the predictors and outcome, number of studies, and type of missing data.

Thus, this investigation offers several important additions to the field. Meta-analytic methods have fallen behind those of related fields, such as Hierarchical Linear Modeling (HLM), in regard to their statistical sophistication. The derivation of a maximum-likelihood estimation procedure to estimate all important meta-analytic parameters, even in the presence of missing data, brings meta-analysis back to the state of the art. Yet, it is not taken on faith that the new methods, which are admittedly complex, give substantively different results than less statistically rigorous methods that are easier to understand. This study quantifies both the accuracy of all the methods studied and their accuracies relative to each other, in order to determine whether the “payoff” from the use of the more complicated method is worthwhile. Finally, the development of easy-to-use software will allow complicated meta-analytic models to be investigated by those who are subject-matter experts and not statisticians.

Chapter 2 presents a review of the literature that defines the different types of missing data and summarizes research regarding different methods of handling different types of missing data. Chapter 3 begins with a statement of the meta-analytic model under investigation. It then describes how complete-case analysis estimates are calculated, followed by a derivation of the available-case estimation procedure as well as the maximum-likelihood estimation method. These are followed by a summary of the simulations used to test both of these estimation procedures, as well as to test the “default” estimation procedure, complete-case analysis. Chapter 4 provides a summary of the parameters to be varied in the simulations, as well as rationales for their ranges of values. Chapter 5 describes the criteria for investigating the different estimators of the meta-analytic model and summarizes the results of the simulation work with regard to how biased and precise the estimators are for the different methods. Chapter 6 presents the analysis of a real meta-analytic dataset, provided by Dr. Mark Lipsey, concerning interventions to prevent juvenile delinquency. This data set has been the source of multiple papers (e.g., Lipsey, 1999a, 1999b). Finally, Chapter 7 summarizes the results of the simulation work and the meta-analysis of the Lipsey data set, and examines the possible substantive benefits of using the more statistically rigorous maximum-likelihood method.
CHAPTER II

REVIEW OF THE LITERATURE

Different types of missing data for the typical general linear model are introduced, followed by a description of the estimation procedures used to handle these different types of missing data. Literature (mostly simulation work) regarding the bias and efficiency of the estimators from the different procedures is summarized. While almost none of the literature that compares and contrasts the different procedures looks at a general linear model (GLM) with both a fixed error term and a random-effects component, or has any sort of meta-analytic context, it helps inform us as to what procedures would be best applied to the problem.

1. Types of Missing Data

Define Y = (Y_ij) as a k x (p+1) matrix of k observations measured on (p+1) variables, where the first variable measured is the dependent variable and the next p variables are the independent variables. Define the response indicator R = (R_ij), such that R_ij = 1 if Y_ij is observed, and R_ij = 0 if Y_ij is missing. This model treats R as a random variable; the type of missing data present will depend on the specification of Y and the distribution of R given Y, which is

    f(Y, R | φ, ψ) = f(Y | φ) f(R | Y, ψ).    (2.2)

The parameter φ represents the parameters concerning the Y_ij and their interrelationships. Parameter ψ represents an unknown variable or group of variables that affect the distribution of the missing-data mechanism. We can separate the values of the Y_ij into two groups: Y_obs and Y_mis, where Y_obs denotes the observed values and Y_mis denotes the missing values. The observed data, then, consist of the values of Y_obs as well as R. Data are considered to be missing at random (MAR) when the equality in 2.2 holds true for Y_obs (as opposed to Y); that is,

    f(Y_obs, R | φ, ψ) = f(Y_obs | φ) f(R | Y_obs, ψ).    (MAR data) (2.3)

To say that data are MAR is to say that the missing-data mechanism is ignorable, i.e., whether or not data are missing for some observation is completely dependent on a combination of random error and the observed values of other variables within that observation. If there are missing data on variables Y_i2, Y_i3, and Y_i4, then the mechanism for their missingness must depend solely on random error and Y_i1 for the missing-data mechanism to be MAR. MAR data might be thought of as “conditionally missing completely at random”; the missing-data mechanism is nothing but noise after observed values of the predictors are controlled for. Data are considered to be missing completely at random (MCAR) when the missing-data mechanism does not depend on the observed values; that is, when

    f(Y_obs, R | φ, ψ) = f(Y_obs | φ) f(R | ψ).    (MCAR data) (2.4)

If neither of these conditions holds, then the missing-data mechanism depends on Y_mis; this kind of data is referred to as NMAR (not missing at random).

No literature exists on whether meta-analytic data are generally MCAR, MAR, or NMAR in nature. However, it is easy to see how any of these types of missing data might arise in a group of studies. Suppose a sub-field in an area tends to find stronger correlations between two variables of interest than another sub-field that studies the same relationship in a different context. Because they are two different sub-fields, one will likely report variables the other does not, and vice-versa. The pattern of missingness on the predictors will be related to the outcome (the correlations), i.e., it will be MAR. MAR data will also arise if the missingness of any predictor (e.g., treatment length) is a function of the value of another predictor that is always observed (e.g., whether the program was private or public, or whether the paper the data were taken from was published in a peer-reviewed journal). Alternatively, assume that the missingness of any one predictor is in part related to the values of any of the other predictors for which there are missing data. Or, assume that the outcome (e.g., correlations) is unreliably measured and there is a relationship between the outcome and the missing-data pattern. In such a case, the missing-data pattern will be NMAR.
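To make the three mechanisms concrete, the sketch below simulates each one for a single predictor. It is a hypothetical illustration, not the dissertation's simulation code; the missingness probabilities and the logistic form are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10_000
# Two correlated variables: x1 is always observed, x2 may be missing.
x1, x2 = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=k).T

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

r_mcar = rng.random(k) < 0.25                        # constant probability
r_mar  = rng.random(k) < logistic(-1.1 + 1.5 * x1)   # depends on observed x1 only
r_nmar = rng.random(k) < logistic(-1.1 + 1.5 * x2)   # depends on x2 itself

for label, r in [("MCAR", r_mcar), ("MAR", r_mar), ("NMAR", r_nmar)]:
    # MCAR leaves the observed x2 representative of the population (mean ~ 0).
    # MAR and NMAR both distort the observed mean, but only the MAR distortion
    # disappears once the always-observed x1 is conditioned on.
    print(f"{label}: mean of observed x2 = {x2[~r].mean():+.3f}"
          f" ({r.mean():.0%} missing)")
```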
2. Estimation Techniques for Datasets with Missing Data

There are numerous procedures one can use to estimate parameters in a general linear model (including a meta-analytic model) with missing data. By far the most common method is complete-case analysis, in which only those studies that have complete data are used, usually in an OLS or WLS regression. Other methods to be considered for application to a meta-analytic model are available-case analysis, unconditional-mean imputation, conditional-mean imputation (Buck's method), and maximum-likelihood estimation (MLE). I consider each of these in turn with regard to evidence of their estimators' bias in the face of data that are MCAR, MAR, and NMAR. Finally, I discuss the ramifications of this literature with regard to the specific nature of meta-analytic data and models.

Complete-Case Analysis

Complete-case (CC) analysis, also known as listwise deletion, is the simplest estimation technique, and the most common. As with most of the other estimation techniques discussed below, complete-case analyses give unbiased estimators of population correlation matrices, slopes, and random-effects variance terms when the data are MCAR. When the data are MAR, and the missing data are confined to the predictors (the latter is common in meta-analysis, though perhaps only because no methods have been developed to handle missing data on the outcomes), CC analyses give unbiased estimators of the slopes in the underlying regression model, but biased estimators of the population correlation matrix and means of the predictors (Glynn et al., 1986; Little & Rubin, 1987). This somewhat counterintuitive fact stems from the same statistical argument that restriction of range on a predictor in a multiple regression causes a biased (i.e., lower) estimate of the population correlation between predictor and outcome, but does not affect the estimate of the slope for that predictor-outcome relationship.

While CC analyses give unbiased estimators of slopes even when the data are MAR, their estimators are inefficient compared to alternative estimators. CC analyses risk the loss of a significant fraction of one's sample of cases (or, in the meta-analytic context, studies). This loss of information can cause a large drop in efficiency compared to estimators from other estimation procedures (Kim & Curry, 1977; Little & Rubin, 1987; Little & Raghunathan, 1999). Also, while CC estimators are unbiased when the data are MAR, the same cannot be said when the data are not missing at random. Few studies have investigated the robustness of procedures with regard to data that are NMAR, however, so the evidence cannot be considered to be strong in any direction.
The little research that has been done simulating NMAR data (e.g., Little, 1992; Little & Raghunathan, 1999) suggests that CC analyses can lead to large biases relative to other estimation procedures, as well as poorer confidence interval coverage. Any procedure gives biased results when the data are NMAR if the estimation procedure is not specifically modeled to handle the missing-data mechanism that is causing the data to be NMAR (Little & Rubin, 1992, Ch. 12); however, the research suggests CC estimators have noticeably worse biases than estimators from other procedures.

Available-Case Analysis

There is good reason for distaste for the idea of throwing out otherwise good data merely because it is incomplete. A natural idea is to try pairwise deletion of the data (as opposed to listwise). This kind of deletion leads to what is known as available-case (AC) analysis. However, AC analysis has a potential problem in that different sets of cases are used to estimate different means and covariances among a group of variables. Depending on how the means and covariances are estimated, the covariance matrix has the chance of not being positive definite. A covariance matrix that is not positive definite makes multiple regression analysis difficult. However, low correlations among variables minimize the risk of obtaining such a matrix.

Simulation studies have shown that when multiple regressions are based on population correlation matrices that have low to moderate values for the correlations, AC performs well compared to CC methods (Kim & Curry, 1977; Little, 1992). In fact, in Little's paper the AC analysis performs comparably to the maximum-likelihood analysis. There is no evident bias for the AC estimators when the data are MCAR, a small bias for one predictor when the data are MAR, and the bias that arises when the data are NMAR is no greater than it is for the ML estimation. While the bias for the one predictor (the one that had missing values) cannot be ignored, it was not that much greater than the difference between the ML estimate and the correct value (.154 vs. .113, with the AC analysis having a standard error of .364 and the ML analysis a standard error of .471). For the other predictors, AC analysis performed equally well or better than MLE, and considerably better than CC.

In Little's 1992 study, standard errors for the estimates of the AC slopes are 6-8% larger than the standard errors for the ML slopes. Little points out that the standard errors he reports, taken from a BMDP algorithm (Dixon, 1988), seem to have “no theoretical basis” and “appear too small”. He suggests that correct estimators of standard errors require more complex formulas (e.g., see Van Praag, Dijkstra, & Van Veltzen, 1985).

There is one other perceived problem with AC analysis, and that is that it only performs this well when the intercorrelations among the variables are “low”, as they were in Little's 1992 study. When intercorrelations are high, AC analyses decline in performance to where CC analyses are superior (Azen & Van Guilder, 1971; Haitovsky, 1968). The natural question to ask is, how low is “low” and how high is “high”? As it turns out, the correlations need not be very low at all for AC analyses to provide more efficient estimators than CC analyses.
Kim and Curry (1977) point out that Haitovsky's 1968 paper compared AC and CC analyses using parameter values rarely found in social science data; all but two of the simulated multiple regressions had population multiple R²s of greater than .7, and half had multiple R²s of greater than .9! The two models that had lower R²s (.596 and .158) were still unrepresentative of social science regression models, as even these models had some correlations between the predictors that were greater than .8 and .9. Thus, the generalizability of Haitovsky's paper (and rarely is any other work cited regarding the weakness of AC estimation when intercorrelations are high) is limited at best.

A similar problem is found in Azen and Van Guilder (1971), in that most of the parameter values examined would rarely be found in a meta-analysis. Four conditions were investigated with regard to the correlation structure of the predictors and outcome. Only the first, where R² = .5 and ρ (the correlation between the predictors) = .25, resembles anything that might be found in a meta-analysis; many would argue that an R² of .5, corresponding to a multiple R of .71, is unrealistically high. The other conditions were R² = .5, ρ = .75; R² = .9, ρ = .25; and R² = .9, ρ = .75. Rarely do we see R²s this high, or correlations between predictors in a multiple regression this high, anywhere in the social sciences. Not surprisingly, Azen and Van Guilder found that when R² is high, available-case analysis should not be used. Similarly, they found that when the correlation structure among the predictors is strong (i.e., ρ is high), available-case analysis should not be used. They did find that in a case we are interested in, where R² = .5 and ρ = .25, “[Available case analysis] performs adequately when the . . . data is missing at random [MCAR] or in a related pattern [MAR]” (p. 54). When the data were NMAR (“truncated” in their language), they found that when R² = .5 and ρ = .25, complete-case analysis performed somewhat better than did maximum-likelihood estimation through the EM algorithm, which in turn performed slightly better than available-case analysis. This is a curious finding, as nowhere else in the literature has anyone found complete-case analysis to be superior to EM (or to available-case analysis when the correlations are low). Given that their study was based on only 50 replications, and given that the differences between the three methods were slight for the condition of interest, these results must be interpreted with caution.

Kim and Curry's study simulated correlation matrices of five variables; the intercorrelations varied between .322 and .596; they then investigated how well CC and AC methods estimated the population correlation and covariance matrices. AC estimators were superior in all cases, though sometimes the improvement was only modest. In the trivariate regression case where the proportion of missing values was uniform, Glasser (1964) found that AC analyses give more efficient estimators than CC analyses when the correlation between the two independent variables is less than .58 -- and the correlation between two independent variables is almost always less than .58 in the social sciences, including study characteristics in a meta-analysis. Thus, while Kim and Curry (1977) are often cited as finding evidence that AC dominates CC analyses when correlations are “modest” (Little & Rubin, 1987) or “small” (Rubin & Schafer), the fact is that the correlations need not be “small” or “modest” at all.
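The operational difference between listwise and pairwise use of the data is easy to see in code. The sketch below is a hypothetical illustration (not drawn from any of the studies reviewed here) that estimates a covariance matrix both ways from data with modest intercorrelations and 40% MCAR missingness on one variable.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 500
# Two modestly correlated variables (r = .4), in the range typical of
# social-science data discussed above.
data = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=k)
data[rng.random(k) < 0.4, 1] = np.nan     # 40% MCAR missingness on variable 2

# Complete-case (listwise): drop every row with any missing value.
cc_rows = data[~np.isnan(data).any(axis=1)]
cov_cc = np.cov(cc_rows, rowvar=False)

def pairwise_cov(y):
    """Available-case covariances: each entry uses every row observed on
    the variable(s) it involves, so different entries use different rows."""
    p = y.shape[1]
    s = np.empty((p, p))
    for v in range(p):
        for w in range(v, p):
            ok = ~np.isnan(y[:, v]) & ~np.isnan(y[:, w])
            s[v, w] = s[w, v] = np.cov(y[ok, v], y[ok, w])[0, 1]
    return s

cov_ac = pairwise_cov(data)
print("rows used listwise:", len(cc_rows), "of", k)
print("CC covariance:\n", cov_cc.round(3))
print("AC covariance:\n", cov_ac.round(3))
```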
Unconditional Mean Imputation and Conditional Mean Imputation

Reports of the poor performance of unconditional mean imputation are commonplace in the literature on missing data (e.g., Anderson et al., 1983; Little, 1992; Pigott, 1994). Variances and covariances are usually severely underestimated, because one is imputing the mean for all missing cases. If the estimators are not adjusted to take this underestimation into account, they will be both biased and inefficient. Adjustments for this underestimation lead to equations for the variances and covariances equivalent to those in available-case analysis.

Imputation of conditional means, also known as Buck's method (Buck, 1960), is superior to imputation of unconditional means. It involves regressing the missing variables onto the observed variables, and treating estimates of the missing data as “real” in the follow-up multiple regression analysis. This procedure tends to lead to underestimation of variances and covariances, just as unconditional mean imputation does, but the problem is less serious due to the regressions. In the simulations in Little (1992), Buck's method performed similarly to the AC analyses: it was less efficient than ML estimation, but about as biased. However, problems exist with the estimation of standard errors when the data have anything but a monotone pattern of missingness. Also, this procedure so closely resembles ML estimation in operation (ML estimation is essentially an iterative Buck's method) that it seems unreasonable to implement something as complicated as Buck's method without going one step further, i.e., to maximum-likelihood estimation.

Maximum-Likelihood Estimation

The method of maximum likelihood is, generally speaking, to choose as estimates of parameters those values of the parameters that maximize the likelihood of observing whatever data have been collected. Often, when the data are complete, ML estimators can be achieved by a straightforward derivation. Unfortunately, when data are missing, ML estimation is not so easy, as no closed-form solutions of the log-likelihood equations exist. Many iterative methods have been proposed to obtain ML estimators even though no closed-form solution exists, such as Fisher scoring and the Newton-Raphson algorithm. However, as Little and Rubin (1987) point out, both of these methods require calculating the matrix of second derivatives of the log-likelihood, which can be mathematically quite challenging. Two maximum-likelihood methods do not require this: multiple imputation (Schafer, 1997a) and the Expectation-Maximization (EM) algorithm (Dempster, Laird, & Rubin, 1977; Little & Rubin, 1992). The formulas for these methods are similar (Schafer, 1997a); the chief advantage of multiple imputation over EM is that it allows for the calculation of standard errors. Most simulation research investigating the bias and efficiency of maximum-likelihood estimators that account for missing data has been done using the EM algorithm, however; the strongest multiple-imputation work is very recent. To some extent, the method used to conduct a maximum-likelihood estimation is moot, as the methods are all based on the same maximum-likelihood equations and asymptotically should give the same results.
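Whichever variant is used, EM alternates between computing the expected complete-data sufficient statistics given the current parameter values (the E-step) and re-maximizing the likelihood as if those expectations were observed data (the M-step). The schematic loop below is a sketch with hypothetical placeholder callables, not the dissertation's software; the model-specific E- and M-step formulas are derived in Chapter III.

```python
import numpy as np

def em(y_obs, theta0, e_step, m_step, tol=1e-8, max_iter=500):
    """Schematic EM loop. The e_step and m_step callables are placeholders:
    e_step(y_obs, theta) returns the expected complete-data sufficient
    statistics given the current parameters, and m_step(stats) returns the
    parameter values that maximize the complete-data likelihood as if those
    statistics had been observed directly."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        theta_new = np.asarray(m_step(e_step(y_obs, theta)), dtype=float)
        if np.max(np.abs(theta_new - theta)) < tol:   # crude convergence check
            return theta_new
        theta = theta_new
    return theta
```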
As with CC and AC estimation, ML estimation gives unbiased estimators of slopes when the data are either MCAR or MAR. There are some important distinctions between ML estimation and CC or AC estimation, however. When there is any non-trivial degree of missingness in the data, ML estimators dominate CC estimators with regard to efficiency, and this margin grows larger when the data are NMAR (Little, 1992; Little & Raghunathan, 1999). The difference between AC and ML estimation is less stark; it seems that when the intercorrelations among the predictors are low to moderate, AC estimators perform similarly to ML estimators (Little, 1992), though as mentioned above the AC estimators may be more biased, and questions remain as to the correct standard errors for AC estimators. Unfortunately, all the evidence we have with regard to this comparison is from Little (1992), and it is dangerous to generalize from one study.

Besides Little (1992), only two studies (Little, 1988; Little & Raghunathan, 1999) contain findings from simulation research regarding how robust maximum-likelihood estimation is when the data are NMAR. In the first, Little tested through simulation how well two estimation techniques -- one which assumed the errors were distributed as a multivariate t, and one which assumed normal errors -- performed when the data were MCAR, MAR, and NMAR. His results indicate that when the data were NMAR, the ML estimation was about as good (i.e., very good) as when the data were MAR. It is doubtful, however, that these results generalize to the meta-analysis case. In Little's simulations, the population correlation matrix was

    [ 1   .95   .45   -.77 ]
    [       1   .52   -.58 ]
    [             1    .08 ]
    [                   1  ]

Variable 1 was always observed; variables 2, 3, and 4 were missing individually and in combinations across the data set. When the data were generated to be NMAR, the missing-data mechanism R was taken to be a function of the value of the 4th variable. Notice the weak relationship of variable 4 with variable 3 (only .0763) and the very, very strong correlations between variable 1 and variables 2 and 3 (respectively, .948 and .447). Variable 1 was never missing, meaning it could be used as a predictor of variables 2 and 3 in all cases within the EM algorithm. Unfortunately, the results of this study do little to answer the question of whether maximum-likelihood methods are robust to NMAR data.

In the study by Little and Raghunathan (1999), the authors investigate how well different types of EM estimation estimated slopes and a random-effects term in a multi-level model with varying types of missing data. When their data were NMAR, confidence interval coverages were generally very poor and the estimators had large biases.
The Meta-Analytic Model Consider a meta-analysis of k independent effect magnitudes, T,- (i = 1 to k). Typically. Ti= B0+ B1X1i+B2X2i + . . . + Bpoi+ u,- + e,- (fromi= 1 to k), where X1,- . . . Xpi are kx 1 matrices ofdata on thep predictors. If X,={X1,-|X2,-| . ..|Xp,-} and B = {B0 | B1 1 . . . | Bp_1}’, this maybe restated as T, = XiB + 11i+ e,, where (3.1) ei ~ N(O, 0%), and 0% is considered knownl, u,- ~ N(0, 1:), and X,- ~ NG‘Xa 2X). Furthermore, assume that for any given study i, data may be missing on any of the predictors in Xi. Estimators of B and 1:, and standard errors for these estimators, are straightforward for the this model when no data are missing (see, e.g., Hedges & Olkin, 1985; Raudenbush, 1994). However, the presence of missing data makes matters complex. A search of the literature found no available-case estimation procedure that would be directly applicable to the meta-analysis equation in 3.1 (i.e., estimation with a fixed ‘ I 0% is the fixed-effects variance of the errors. 0% = V(e,-) = 1/(n,-3) in the case of F isher’s-zs, and V(e,-)= l/niE + 1/niC +0i2/2(n,-E+nic) for standardized mean difl‘erences (d3). 20 variance term for one variable and random-effects terms for all variables). Similarly, no maximum-likelihood procedure was found that, without some modification, could be used to estimate the model in 3.1. Some models are very similar; Bryk and Raudenbush (1992) give a model with both a known variance term and a random-effects variance to be used for meta-analysis. However, their estimation procedure does not take into account missing data on the predictors. Schafer (1997b) outlines a general multivariate model into which the meta-analysis model falls. It would treat all of the study characteristics as outcomes, and estimate both fixed-effects and random-effects variance-covariance rmtrices. However, Schafer’s model assumes that the fixed-effects variance-covariance matrix needs to be estimated. This matrix is considered to be known in meta-analysis: the variance of the outcome is 0%, and the variances of the predictors, and covariances between the Xs and the outcome are zero (as the covariances between the Xs and the outcome are best represented in the random-effects covariance matrix, not the fixed-effects covariance matrix). This makes Schafer’s EM formulas impossible to use as they require an invertible fixed variance term. Finally, as already mentioned, the method given in Pigott (1992) accounts for missing data and was developed with the meta-analytic context in mind. However, it does not account for the presence of random-effects variation. Also, her model assumes that the variation in the predictors is proportional to the sample size of the study in question, which often will not be the case. For instance, if one study characteristic under investigation is the length of a treatment program, it would probably be wrong to expect variation in that characteristic to be smaller for studies with large numbers of subjects than with small numbers of subjects. 21 — 2. Estimation Procedures Below I describe how I will estimate the model in 3.1 employing three difl‘erent methods: complete—case estimation, available-case estimation, and maximum-likelihood estimation. Complete-Case Estimation The complete-case estimation follows the procedure described in Raudenbush (1994). The estimator ofB is the standard WLS estimator (X'leylx’vlr, where v is the diagonal k x k rmtrix of fixed effects variances. 
The WLS estimator of the population random-effects variance, 1.”, is found using the formulas given in Appendix A of Raudenbush (1994). Available-Case Estimation Little and Rubin (1992) outline three different methods to get pairwise covariance matrices fi'om datasets with missing data. Simulation work led to the conclusion that in the majority of cases, the estimation procedure used to constrain correlations between the estimates to between -1 and 1 (Matthai, 1951) led to estimators with the greatest precision and fewest outliers. This method does not constrain the covariance matrix to be positive definite, however. Define Y = (Y ij) as an k x (p+1) matrix of k observations incompletely measured for (p+1) variables (i = 1 to k, j = 1 to p+1). The matrix Y consists of one column for the outcome, Ti, and p columns for the p predictors. Introduce variable indexes v and w, 22 where v and w vary from 1 to (p+1). Let a statistic’s superscript (e.g., (W) or (V) ) represent the variable(s) for which complete observations are necessary to calculate that statistic. The estimator of the covariance between two variables v and w (Slim) is * _S(I) V3933? (3.2) va — , was where S0). 20”" —fsz) )(y (Y‘r'w —(VW))/(k(vw)_ 1). (W) (3.3) To estimate the slopes and the intercept alter the covariance matrix is estimated, I use the sweep operator defined in Beaton (1964). An excellent explanation of its use with regard to handling missing data appears in Chapter 6 of Little and Rubin (1992). It is straightforward to use each of the estimation procedures above to calculate pairwise covariance matrices. Weighted pairwise covariances might also be found; the formulas would change only in that each term in the equations is multiplied by the relative weight wi/ij , where Wi is based on the inverse of the sampling error for study i. The weight “’1 can be equal to zero based on the missingness of yiv, Yiw’ or both (depending on the formula). However, this addition severely complicates the procedure, especially for the non-statistician, given the many different weights that must be calculated for each variance and covariance. Thus, unweighted pairwise covariance matrices are estimated in the 23 method proposedz. Insofar as we are interested in estimating slopes, this procedure is a pairwise equivalent of OLS meta-analysis. There are two other key elements of a meta- analysis, however: an estimator of the random-effects variance (1."), and estimators of standard errors for the slopes. The estimation of 1: when data are complete is straightforward: we calculate the residual sums of squares, ZCT, - XiB)2, divide by (k-p—l), and subtract from that the average study sampling error. While this method does not take into accolmt the study weights, Raudenbush (1994) states that it yields an approximation that tends to be quite accurate. However, the residual sums of squares (RSS) cannot be directly calculated because of the missing Xs. We can bypass this difficulty by calculating the variance of the T,- and subtracting from it the variance explained by the model, B'ZB. After a degrees-of- freedom adjustment which corresponds exactly to use of an adjusted R2 in place of a normal R2 to explain the variance explained in the sample, we arrive at . A —2 - . —2 0% =,8'2,6+0'e +r,so(e%—fi2fl)*k/(k_4)goe +1. (3.4) The estimator of 1? is found by subtracting the average study sampling error from the residual variance. 
A straightforward procedure to test H₀: τ = 0 is to calculate what the value of the heterogeneity statistic, Q_E, would be given the estimator of τ. From Raudenbush (1994), we get

Q_E = ([τ̂·Σᵢwᵢ / k] + 1)·(k − p − 2).

If we were working with complete data, Q_E would be distributed as a chi-square with k − p − 2 degrees of freedom when the null hypothesis τ = 0 is true. However, there is not complete data on the p predictors. An intuitive adjustment to the degrees of freedom, used in a related context by Su (1988), is to multiply p by the fraction of missing data across the p predictors.

Maximum-Likelihood Estimation

Pigott (1992) showed that through a weighted EM estimation one could conduct a fixed-effects meta-analysis with missing data on the predictors. However, she noted that her algorithm treats all variables in the model, both predictors and outcome, as if the precision with which they are measured is proportional to the sample size of the study. As she points out (p. 53), this assumption may not be a valid one for data from a research synthesis. Below, I present a method for doing a weighted EM estimation that assumes that only the outcome is measured with a precision proportional to the size of the study. The predictors are assumed to be measured with equal precision across all studies and with no measurement error, regardless of the size of the study. My method also allows for the presence of a random-effects error term.

The goal of this section is to obtain estimators of X_{M,i}, β, Σ_X, μ_X, uᵢ, and τ through the EM algorithm. After finding sufficient statistics for the parameters that are to be treated as fixed, the distributions of the parameters to be treated as random (conditioning on the fixed parameters and the observed data) are derived, and estimators of the parameters treated as random are found. This is the "E-step". During the "M-step", these estimates are used in the sufficient statistics to obtain new estimates of the fixed parameters. The new estimates of the fixed parameters lead to new estimates of the random parameters, and the process iterates until convergence.

The E-Step

Dropping subscripts, for each study i, the joint distribution of T, X_O, X_M, and u is as follows:

$$
\begin{bmatrix} T \\ X_O \\ X_M \\ u \end{bmatrix} \sim N\left(
\begin{bmatrix} \beta'\mu_X + \beta_0 \\ \mu_O \\ \mu_M \\ \beta_0 \end{bmatrix},\;
\begin{bmatrix}
\beta'\Sigma_X\beta + \sigma_i^2 + \tau & \beta_O'\Sigma_O + \beta_M'\Sigma_{M,O} & \beta_M'\Sigma_M + \beta_O'\Sigma_{O,M} & \tau \\
\Sigma_O\beta_O + \Sigma_{O,M}\beta_M & \Sigma_O & \Sigma_{O,M} & 0 \\
\Sigma_M\beta_M + \Sigma_{M,O}\beta_O & \Sigma_{M,O} & \Sigma_M & 0 \\
\tau & 0 & 0 & \tau
\end{bmatrix}\right), \tag{3.5}
$$

where

$$ X = \begin{bmatrix} X_O \\ X_M \end{bmatrix}, \quad \Sigma_X = \begin{bmatrix} \Sigma_O & \Sigma_{O,M} \\ \Sigma_{M,O} & \Sigma_M \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_O \\ \beta_M \end{bmatrix}. \tag{3.6} $$

For study i, X_O consists of the observed Xs and X_M consists of the unobserved Xs. Similarly, β_O consists of the population slopes for the observed Xs and β_M consists of the population slopes for the missing Xs. The column of 1's typically included in the matrix of predictor variables is excluded, as β₀, the intercept, is defined as the mean of the random effects.

In this multivariate representation of the model for the ith study, only X_M and u are treated as random. All other parameters, such as Σ_X, μ_X, and the slopes for the study characteristics, β, are treated as fixed.
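To make the partitioned structure of (3.5) concrete, the following sketch assembles the joint covariance matrix of (T, X_O, X_M, u) for a single study from current parameter estimates. It is a minimal sketch with invented argument names, not the study's actual program:

```python
import numpy as np

def joint_cov(beta_O, beta_M, Sig_O, Sig_M, Sig_OM, tau, sig2_i):
    """Covariance matrix of (T, X_O, X_M, u) for one study, per (3.5).
    Sig_OM = Cov(X_O, X_M); sig2_i is the known sampling variance."""
    o, m = len(beta_O), len(beta_M)
    beta = np.concatenate([beta_O, beta_M])
    Sig_X = np.block([[Sig_O, Sig_OM], [Sig_OM.T, Sig_M]])
    C = np.zeros((o + m + 2, o + m + 2))
    C[0, 0] = beta @ Sig_X @ beta + sig2_i + tau        # Var(T)
    cov_TO = Sig_O @ beta_O + Sig_OM @ beta_M           # Cov(X_O, T)
    cov_TM = Sig_M @ beta_M + Sig_OM.T @ beta_O         # Cov(X_M, T)
    C[0, 1:1+o] = C[1:1+o, 0] = cov_TO
    C[0, 1+o:1+o+m] = C[1+o:1+o+m, 0] = cov_TM
    C[0, -1] = C[-1, 0] = tau                           # Cov(T, u) = tau
    C[1:1+o+m, 1:1+o+m] = Sig_X                         # Var(X)
    C[-1, -1] = tau                                     # Var(u); Cov(X, u) = 0
    return C
```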
It is standard when doing MLE/EM estimation to assume that the variance-covariance matrices are fixed (Bryk & Raudenbush, 1992; Little & Rubin, 1987; Pigott, 1992). This assumption is what differentiates the empirical Bayes solution from an exact Bayes solution, which can be extraordinarily computationally complex (Bryk & Raudenbush, 1992). When the assumption that β is fixed is made, the type of estimation being done is often referred to as full maximum likelihood (MLF) estimation. When the assumption is not made, and a non-informative prior is specified for β instead, it is referred to as restricted maximum likelihood (MLR) estimation. The difference between these two, derivation-wise, is not trivial. MLF estimation is easier than MLR, as fewer parameters are treated as random. Common wisdom is that in the end, the two usually give similar results except that MLR gives a degrees-of-freedom correction for the estimators.

There is more to the difference between MLF and MLR than a degrees-of-freedom correction, however. In a meta-analysis working with complete data, MLR estimation of β weights studies by 1/(τ + σᵢ²); see, for instance, the V-known model in Bryk and Raudenbush (1992) or Shadish and Haddock (1994). However, MLF estimation of β weights studies by 1/σᵢ². The reason for this stems from the lack of a non-informative prior for β in 3.5. Equation 3.5 treats β as a fixed effect; were we able to treat it as a random effect (and thus use the non-informative prior to describe its variance), the weighting would be different. The derivation would be made much more complicated by this step, however, as it would mean both the Xs and β are being treated as random.

Another difference is mentioned in Bryk and Raudenbush (1992): when the number of level-2 units (here, studies) is small, MLF estimators of τ can also be too small. This problem is exacerbated the more fixed effects the model contains.

In Equation 3.5, we assume that for study i, T and X_O are fully observed random variables, while X_M and u are unobserved random variables, and that we have estimates of the fixed parameters τ, Σ_X, β, and μ_X. Standard multivariate normal distribution theory (e.g., Morrison, 1967, p. 88) is used to get E(X_M, u | T, X_O, τ, Σ_X, β, μ_X): the estimators of X_M and u conditioned on T and X_O. Call these estimators X*_M and u*; specifically,

$$
\begin{bmatrix} X_M^* \\ u^* \end{bmatrix} =
\begin{bmatrix} \mu_M \\ \beta_0 \end{bmatrix} +
\begin{bmatrix} \Sigma_M\beta_M + \Sigma_{M,O}\beta_O & \Sigma_{M,O} \\ \tau & 0 \end{bmatrix}
\begin{bmatrix} \beta'\Sigma_X\beta + \sigma_i^2 + \tau & \beta_O'\Sigma_O + \beta_M'\Sigma_{M,O} \\ \Sigma_O\beta_O + \Sigma_{O,M}\beta_M & \Sigma_O \end{bmatrix}^{-1}
\begin{bmatrix} T - \beta'\mu_X - \beta_0 \\ X_O - \mu_O \end{bmatrix}. \tag{3.7}
$$

We also need Var(X_M, u | T, X_O, τ, Σ_X, β, μ_X). We call this matrix D*, which is

$$
D^* = \begin{bmatrix} D_X^* & D_{X,u}^* \\ D_{u,X}^* & D_u^* \end{bmatrix} =
\begin{bmatrix} \Sigma_M & 0 \\ 0 & \tau \end{bmatrix} -
\begin{bmatrix} \Sigma_M\beta_M + \Sigma_{M,O}\beta_O & \Sigma_{M,O} \\ \tau & 0 \end{bmatrix}
\begin{bmatrix} \beta'\Sigma_X\beta + \sigma_i^2 + \tau & \beta_O'\Sigma_O + \beta_M'\Sigma_{M,O} \\ \Sigma_O\beta_O + \Sigma_{O,M}\beta_M & \Sigma_O \end{bmatrix}^{-1}
\begin{bmatrix} \beta_M'\Sigma_M + \beta_O'\Sigma_{O,M} & \tau \\ \Sigma_{O,M} & 0 \end{bmatrix}. \tag{3.8}
$$

These expressions become necessary in the M-step, where the sufficient statistics for the fixed parameters are used to estimate the fixed parameters.
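Because (3.7) and (3.8) are just the usual multivariate-normal conditioning formulas applied to (3.5), they can be computed generically from the joint covariance matrix assembled in the sketch above. A minimal illustration, with invented names and the vector ordered (T, X_O, X_M, u) as above:

```python
import numpy as np

def e_step_moments(C, mu, t_i, x_obs):
    """Conditional mean (X_M*, u*) and covariance D* given (T, X_O),
    per (3.7)-(3.8).  C, mu: joint covariance and mean of the vector
    (T, X_O, X_M, u) from (3.5)."""
    y = np.concatenate([[t_i], x_obs])         # observed block (T, X_O)
    q = y.size
    C11, C12 = C[:q, :q], C[:q, q:]
    C21, C22 = C[q:, :q], C[q:, q:]
    gain = C21 @ np.linalg.inv(C11)
    cond_mean = mu[q:] + gain @ (y - mu[:q])   # (X_M*, u*), eq. (3.7)
    D_star = C22 - gain @ C12                  # eq. (3.8)
    return cond_mean, D_star
```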
Sufficient Statistics for β, τ, Σ_X, μ_X

Maximum-likelihood estimation in EM proceeds by assuming that we have complete data for all of the parameters that we are treating as random. As we are treating T, X_O, X_M, and u as random, we need the complete-data joint maximum likelihood of these four parameters, given all other parameters. Although, for any given study, there is an X_O and an X_M, if we are assuming we have complete data we can (temporarily) unpartition them and refer to the predictors collectively as X. The key to the derivation is to take advantage of the identity f(m, n) = g(n)h(m|n). Thus, we have

f(T, X, u | τ, Σ_X, β, μ_X)
= g₁(T | X, u, τ, Σ_X, β, μ_X) h(X, u | τ, Σ_X, β, μ_X)
= g₁(T | X, u, τ, Σ_X, β, μ_X) g₂(X | u, τ, Σ_X, β, μ_X) g₃(u | τ, Σ_X, β, μ_X)
= g₁(T | X, u, τ, Σ_X, β, μ_X) g₂(X | τ, Σ_X, β, μ_X) g₃(u | τ, Σ_X, β, μ_X),

given the independence of X and u; across the k studies, the values of the predictors and the values of the random-effects error terms are unrelated. This allows us to break the likelihood into three tractable pieces:

L[f(T, X, u | τ, Σ_X, β, μ_X)] = L[f(T | X, u, τ, Σ_X, β, μ_X)] · L[g(X | τ, Σ_X, β, μ_X)] · L[h(u | τ, Σ_X, β, μ_X)].

$$
L = \prod_{i=1}^k \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left[-\frac{(T_i - X_i\beta - u_i)^2}{2\sigma_i^2}\right]
\cdot \prod_{i=1}^k \frac{1}{\sqrt{(2\pi)^p|\Sigma_X|}} \exp\!\left[-\tfrac12 (X_i - \mu_X)'\Sigma_X^{-1}(X_i - \mu_X)\right]
\cdot \prod_{i=1}^k \frac{1}{\sqrt{2\pi\tau}} \exp\!\left[-\frac{u_i^2}{2\tau}\right]. \tag{3.9}
$$

This leads to the log-likelihood

$$
\log L = -\frac{k(p+2)}{2}\log 2\pi - \frac12 \sum_{i=1}^k \log \sigma_i^2 - \frac{k}{2}\log|\Sigma_X| - \frac{k}{2}\log\tau
- \sum_{i=1}^k \frac{(T_i - X_i\beta - u_i)^2}{2\sigma_i^2}
- \sum_{i=1}^k \frac12 (X_i - \mu_X)'\Sigma_X^{-1}(X_i - \mu_X)
- \sum_{i=1}^k \frac{u_i^2}{2\tau}. \tag{3.10}
$$

All that remains is to differentiate with respect to the fixed parameters τ, Σ_X, β, and μ_X and find the sufficient statistics for these parameters. After the sufficient statistics have been derived, the formulas for the expected values and variance-covariance matrix of X*_M and u* are used to complete the EM process. Setting the derivatives of (3.10) to zero gives

$$ \hat\mu_X = \frac{1}{k}\sum_{i=1}^k X_i, \tag{3.11} $$

$$ \hat\Sigma_X = \frac{1}{k}\sum_{i=1}^k (X_i - \hat\mu_X)(X_i - \hat\mu_X)', \tag{3.12} $$

$$ \sum_{i=1}^k \frac{1}{\sigma_i^2} X_i'X_i\,\beta = \sum_{i=1}^k \frac{1}{\sigma_i^2} X_i'(T_i - u_i), \tag{3.13} $$

$$ \hat\tau = \frac{1}{k}\sum_{i=1}^k u_i^2. \tag{3.14} $$

The log-likelihood allows us to use results from Tatsuoka (1988, p. 410) to get the estimator of Σ_X. The estimator of β has weights that do not involve τ because the uᵢ have been partialled out, as shown in Searle et al. (1992), on page 297.

The sufficient statistics for the maximum-likelihood estimators are therefore Σᵢ Xᵢ and Σᵢ XᵢXᵢ′ (unweighted), together with Σᵢ wᵢXᵢ′Xᵢ, Σᵢ wᵢXᵢ′Tᵢ, Σᵢ wᵢXᵢ′uᵢ, and Σᵢ uᵢ² (weighted), where wᵢ = 1/σᵢ². Note that we need both weighted and unweighted estimators of the sums of cross-products of the Xs.

The M-Step

The next stage of the EM algorithm is to calculate the expected value of the sufficient statistics after conditioning on the observed data. Doing so will allow us to estimate the parameters that we are treating as fixed, thus updating their estimates from whatever was used at the previous iteration. Below I index the p columns of Xᵢ by subscripts r and s, and P⁽ᵗ⁾ represents the estimates of the parameters that we are treating as random, i.e., X*_{M,i} and u*ᵢ, at iteration t. These parameters are shown in equation 3.7.

The expected value of X_ir (i.e., the expected value of predictor r, r = 1 . . . p, in study i), given the observed data and our estimates, is straightforward, specifically

$$
E(X_{ir} \mid T_i, X_{O,i}, P^{(t)}) =
\begin{cases} X_{ir} & \text{if } X_{ir} \text{ is observed} \\ X_{ir}^{*(t)} & \text{if } X_{ir} \text{ is missing.} \end{cases} \tag{3.16}
$$

The same is true for the expected value of X_ir Tᵢ, which is

$$
E(X_{ir}T_i \mid T_i, X_{O,i}, P^{(t)}) =
\begin{cases} X_{ir}T_i & \text{if } X_{ir} \text{ is observed} \\ X_{ir}^{*(t)}T_i & \text{if } X_{ir} \text{ is missing.} \end{cases} \tag{3.17}
$$

The expressions for the others are more complex because they include cross-products between "real" data and imputed data; that is,

$$
E(X_{ir}X_{is} \mid T_i, X_{O,i}, P^{(t)}) =
\begin{cases}
X_{ir}X_{is} & \text{if } X_{ir}, X_{is} \text{ are observed} \\
X_{ir}X_{is}^{*(t)} & \text{if } X_{ir} \text{ is observed, } X_{is} \text{ is missing} \\
X_{ir}^{*(t)}X_{is}^{*(t)} + \mathrm{Cov}\!\left(X_{ir}^{*(t)}, X_{is}^{*(t)}\right) & \text{if both } X_{ir} \text{ and } X_{is} \text{ are missing.}
\end{cases} \tag{3.18}
$$

The last expression arises from the identity E(X*₁X*₂) = E(X*₁)E(X*₂) + Cov(X*₁, X*₂). Here,

$$ \mathrm{Cov}\!\left(X_{ir}^{*(t)}, X_{is}^{*(t)}\right) = D^*_{X_i,rs}. \tag{3.19} $$

Thus, we can find the necessary covariance by consulting the r-th row and s-th column of our solution for D*ᵢ in (3.8).
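A short sketch of how (3.16)-(3.19) fill in one study's contribution to the cross-product sufficient statistic follows. The names are illustrative; x_star and D_X are the conditional moments from the E-step sketch above, with observed slots of x_star holding the observed values and D_X having zero rows and columns for observed entries:

```python
import numpy as np

def expected_xx(x_star, D_X, miss):
    """E(X_i' X_i | T_i, X_O,i, P^(t)) for one study, per (3.16)-(3.19):
    the outer product of filled-in values, plus the conditional
    covariance for pairs in which both entries were imputed."""
    exx = np.outer(x_star, x_star)
    exx[np.ix_(miss, miss)] += D_X[np.ix_(miss, miss)]
    return exx
# The weighted statistic discussed next simply multiplies this
# study's contribution by w_i = 1 / sigma_i^2.
```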
The solution for the weighted estimator of the sums of cross-products of the Xs is similar to the unweighted version; all that is added are the weights, which are proportional to the inverse of the (known) values of σᵢ². We have

$$
E(w_iX_{ir}X_{is} \mid T_i, X_{O,i}, P^{(t)}) =
\begin{cases}
w_iX_{ir}X_{is} & \text{if } X_{ir}, X_{is} \text{ are observed} \\
w_iX_{ir}X_{is}^{*(t)} & \text{if } X_{ir} \text{ is observed, } X_{is} \text{ is missing} \\
w_iX_{ir}^{*(t)}X_{is}^{*(t)} + w_i\,\mathrm{Cov}\!\left(X_{ir}^{*(t)}, X_{is}^{*(t)}\right) & \text{if both } X_{ir} \text{ and } X_{is} \text{ are missing.}
\end{cases} \tag{3.20}
$$

A similar argument gives the best estimator of the sum of squares of the uᵢ's that is required in order to estimate τ:

$$ E(u_iu_i \mid T_i, X_{O,i}, P^{(t)}) = u_i^{*(t)}u_i^{*(t)} + V\!\left(u_i^{*(t)}\right) = u_i^*u_i^* + D^*_{u_i}. \tag{3.21} $$

The estimator of the covariance between our estimate of Xᵢ and uᵢ is D*_{u,X} in (3.8). If the data were complete, the expected value of the product of Xᵢ and uᵢ would be zero. Because some X_ir must be estimated from the same data that lead to the estimates of the uᵢ, the product need not be exactly zero, especially in early iterations. Finally, we have

$$
E(w_iX_{ir}u_i \mid T_i, X_{O,i}, P^{(t)}) =
\begin{cases}
w_iX_{ir}u_i^{*(t)} & \text{if } X_{ir} \text{ is observed} \\
w_iX_{ir}^{*(t)}u_i^{*(t)} + w_i\,\mathrm{Cov}\!\left(X_{ir}^{*(t)}, u_i^{*(t)}\right) & \text{if } X_{ir} \text{ is missing.}
\end{cases} \tag{3.22}
$$

From the above expected values we can get updated estimates of each of the parameters that are considered fixed: τ, Σ_X, β, and μ_X. These updated estimates allow us to calculate estimates X*ᵢ⁽ᵗ⁺¹⁾, D*ᵢ⁽ᵗ⁺¹⁾, and u*⁽ᵗ⁺¹⁾, and the process continues until convergence. Following Dempster et al. (1981), the likelihood function to be monitored for convergence is f(T, X_O | β, Σ_X, μ_X, τ), and we can derive the likelihood function as follows:

$$ f(T, X_O \mid \beta, \Sigma_X, \mu_X, \tau) = \frac{g(T, X_O, X_M, u \mid \beta, \Sigma_X, \mu_X, \tau)}{h(X_M, u \mid T, X_O, \beta, \Sigma_X, \mu_X, \tau)}, \tag{3.23} $$

where g(. . .) was derived as in (3.9), and h(. . .) is the conditional distribution of the missing parameters treated as random given the observed random variables and the fixed parameters. The conditional distribution h(. . .) is normal with the mean given in (3.7) and variance-covariance matrix given in (3.8). Simplification of the likelihood function relies on recognizing that the identity holds regardless of the values of X_M and u. Thus, suppose X_M = X*_M and u = u*. This simplifies the denominator to a great extent, as the terms in its exponent equal 0. After taking the log we are left with an expression similar to that in (3.10), but with a term representing the conditional variance-covariance matrix of X*_M and u* (namely, (k/2) log|D*|):

$$
\log L(\text{convergence}) \propto -\frac{k}{2}\log|\Sigma_X| - \frac{k}{2}\log\tau + \frac{k}{2}\log|D^*|
- \sum_{i=1}^k \frac{(T_i - X_i^*\beta - u_i^*)^2}{2\sigma_i^2}
- \sum_{i=1}^k \frac{(u_i^*)^2}{2\tau}
- \frac12\sum_{i=1}^k (X_i^* - \hat\mu_X)'\Sigma_X^{-1}(X_i^* - \hat\mu_X). \tag{3.24}
$$

The value needed for convergence is somewhat arbitrary, and depends on weighing the importance of accuracy against the reality that computer running time is limited. Simulations showed that a criterion of .000001% change between iterations i and i+1 resulted in estimators of β and τ that were within .0005 of the estimates found using the far stricter criterion of .000000001%, and often the estimates were closer.

MLE Standard Errors

Standard errors can be generated after maximum-likelihood EM estimation in several different ways. Little and Rubin (1987) explain how to get asymptotic standard errors for the slopes from the inverse of the observed or expected information matrix. These are two different methods, and neither the observed nor the expected information matrix is a natural output of the EM algorithm. Pigott (1992) states that the procedure for calculating the information matrix depends on the specific model being estimated.
Other suggested procedures include bootstrapping (Little, 1988; Su, 1988), the SEM algorithm (Meng & Rubin, 1987), and use of an approximation formula (Beale & Little, 1975). Also, as noted above, the multiple-imputation technique given in Schafer (1997a) allows for calculation of standard errors. After doing extensive simulation work, Su (1988) demonstrated that standard errors based on the observed information matrix are slightly superior to those found through the bootstrap method or the expected information matrix, but that the bootstrap method was less affected by model misspecification than the other variance estimators. He recommends the use of bootstrapped standard errors in general, especially when robustness to model assumptions is of concern.

Given the relative ease of generating bootstrapped standard errors, and Su's positive review of the technique, the MLE program written to conduct meta-analyses allows for the calculation of bootstrapped standard errors for the slopes. The method follows Su's (1988) generation of conditional bootstrapped (CBOOT) standard errors. His method conditions on the observed missing-data pattern (also known as the response pattern), R. The cases within each pattern are treated as independently and identically distributed (i.i.d.) random variables from the conditional distribution of the predictors and outcome (Y) given the observed pattern. The procedure for estimating the covariance matrix is as follows:

1. Let M be the total number of observed patterns and k_m be the number of cases in pattern m, m = 1 to M. For data in pattern m, draw a "bootstrap subsample" of k_m studies, with replacement, from those studies with pattern m.

2. Combine the M bootstrap subsamples into a "bootstrap sample" and calculate estimates of β, denoted β*.

3. Independently repeat steps 1 and 2 B times, obtaining bootstrap replications β*¹ . . . β*ᴮ.

4. Calculate the variances and covariances of the bootstrap replications:

$$ \widehat{\mathrm{Cov}}(\hat\beta) = \frac{1}{B-1}\sum_{b=1}^B \left(\beta^{*b} - \bar\beta^*\right)\left(\beta^{*b} - \bar\beta^*\right)'. \tag{3.25} $$

The variances allow for the calculation of confidence intervals for the estimates of the slopes.
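A compact sketch of this CBOOT resampling scheme follows; it assumes a fitting routine fit_beta standing in for the EM estimation above, and all names are illustrative:

```python
import numpy as np

def cboot_cov(Y, fit_beta, B=25, seed=None):
    """Su's (1988) conditional bootstrap covariance of the slopes.
    Y: k x (p+1) data matrix with np.nan for missing values;
    fit_beta: function mapping a data matrix to slope estimates."""
    rng = np.random.default_rng(seed)
    # label each study by its missingness (response) pattern
    patterns = np.unique(np.isnan(Y), axis=0, return_inverse=True)[1]
    betas = []
    for _ in range(B):
        idx = []
        for m in np.unique(patterns):
            rows = np.flatnonzero(patterns == m)
            # resample with replacement *within* each pattern
            idx.append(rng.choice(rows, size=rows.size, replace=True))
        betas.append(fit_beta(Y[np.concatenate(idx)]))
    betas = np.asarray(betas)
    dev = betas - betas.mean(axis=0)
    return dev.T @ dev / (B - 1)        # eq. (3.25)
```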
CHAPTER IV
SIMULATION STUDY METHODOLOGY

In this chapter, I outline a Monte Carlo study to evaluate the proposed maximum-likelihood and available-case estimators of the meta-analytic parameters in (3.1). I begin by describing the ranges of hyperparameter values for the simulations, and explaining how these ranges of values were determined. (The term "hyperparameter" is used to describe the variables varied in the simulations, to differentiate them from the parameters that are to be estimated in the meta-analytic model.) I also describe how the data were generated using the values of the hyperparameters. The chapter concludes with a description of the criteria for measuring and comparing the performance of the complete-case, available-case, and maximum-likelihood estimators of β and τ, as well as for testing the bootstrapped standard errors for the maximum-likelihood method.

1. Hyperparameter Choices

I consider seven hyperparameters for variation within the simulation study: average study sample size (avg nᵢ), number of studies in the meta-analysis (k), random-effects variation (τ), study correlations between predictor and outcome and between predictors, variation in outcome explained by predictors, the incidence of missing data, and the type of missing data (MCAR vs. MAR vs. NMAR). Hyperparameters that do not change include the number of predictors (three) and the type of outcome (standardized mean differences).

The Outcome

The outcome to be simulated is the standardized mean difference, calculated using simulated raw data. Standardized mean differences are distributed as non-central t-statistics. This outcome was chosen instead of Fisher's z (the asymptotically normal transformation of the correlation coefficient) to see whether the maximum-likelihood method could handle the heavier tails of the non-central t-distribution.

Average Study Sample Size

Past simulations of meta-analytic data (Becker, 1985; Chang, 1992; Fahrbach, 1995) investigated study sample sizes ranging from 20 to 250. A survey of meta-analyses in Psychological Bulletin from 1995 to 1999 showed that average study sample sizes tend to range between 60 and 500, although there are exceptions in particular cases (e.g., if the primary sampling methodology in the field was mass community phone or mail surveys, or, conversely, if the primary study design was the case study). Based on this literature, two average study sample sizes were chosen for simulation: 80 and 400. The sampling errors for effect sizes generated using these study sample sizes are roughly equivalent to the sampling errors that would be generated if Fisher's zs were simulated for ns of 40 and 200. The former is near the lower end of what is considered an "acceptable" study sample size in the social sciences (Gay, 1992), and the latter is generally considered to be moderate to large.

There remains the question of how much the study sample sizes should vary. Optimally, there are three natural choices: none (equal study sample sizes), moderate, and high (e.g., where the imbalance is such that the largest study's sample size may hold half of the total sample in the meta-analysis). Because it is of interest to determine how well these estimation procedures work with weighted data, I assign study sample sizes in such a way that there is a high degree of imbalance across the k studies. The survey of recent meta-analyses in Psychological Bulletin found that most meta-analyses had large variation in study sample sizes, and that up to 60% of the total study sample size (N) could be within the largest one-fifth of the studies. I let one-fifth of the studies contain 50% of N, and the other four-fifths of the studies contain the other 50% of N.

Number of Studies per Meta-Analysis (k)

The simulation studies of Becker (1985) and Chang (1992) use values of k ranging between 2 and 50. The previously mentioned survey of meta-analyses published over the last five years in Psychological Bulletin showed meta-analyses ranging in size from 19 studies to over 300; most meta-analyses had at least 35 studies, and almost half had over 100. An important factor in determining the values of k for this simulation study is the presence of missing-data mechanisms. Such mechanisms cannot be fairly represented in a meta-analysis with a very small number of studies. In addition, it is rare (and inadvisable) to examine moderator variable effects with small numbers of studies due to the amount of imprecision expected. Just as one would not recommend conducting a multiple regression with an n of 20, one cannot recommend conducting a moderator meta-analysis with a k of 20. Given these facts and the review of the Psychological Bulletin meta-analyses, values for k of 40 and 100 were selected.

Random-Effects Variation (τ)

The values for τ were 0, .005, and .02.
The two nonzero values roughly correspond to 95% bands for ranges of the population effect size of ±.14 and ±.28, which translate roughly to small variation and large variation in the outcome after sampling error is accounted for. Cohen (1988) states that a small effect size is about .2, a moderate one about .5, and a large one about .8. Thus, values of τ much lower than .005 are often not substantively interesting (τ = .005 indicates that almost all studies are within .15 of the mean, and an effect-size difference of .15 is considered small).

Population Correlation Matrix among Predictors and Outcomes

The slopes were generated by starting with the following sets of correlations among the outcome (represented by the first row/column) and the predictors (represented by the 2nd, 3rd, and 4th rows/columns). The figures presented are rounded to two decimal places.

    | 1    .50  .62  .75 |        | 1    .67  .80  .93 |
    | .50  1    .10  .10 |        | .67  1    .40  .50 |
    | .62  .10  1    .10 |        | .80  .40  1    .60 |
    | .75  .10  .10  1   |        | .93  .50  .60  1   |

In each matrix the underlying R² is 1.00, corresponding to a fixed-effects model where the effect is fully explained by the Xs, and the predictors have varying strengths of relationships with the outcome. The primary difference is that in the first matrix the predictors are weakly correlated, while they are strongly correlated in the second. Each of these correlation matrices was used to compute standardized betas to be used in the data-generation process that is described below. The first matrix generates β₁ = .38, β₂ = .52, and β₃ = .66. The second matrix generates β₁ = .22, β₂ = .34, and β₃ = .62. The underlying R² for the relationship between the predictors and outcome is unity in these matrices, but the sample R² will be lower due to two previously mentioned hyperparameters: sample size (which causes sampling variation) and τ (which causes random-effects variation). The sample R² will decrease an amount depending on the size of these two types of variation relative to that caused by the next hyperparameter.
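As a check, these standardized betas can be reproduced by solving the normal equations R_xx β = r_xy implied by each matrix. A small sketch (since the matrices are shown rounded to two decimals, the last digit of the solved betas can differ slightly from the values reported above):

```python
import numpy as np

# Correlation matrices from the text: row/column 1 is the outcome,
# rows/columns 2-4 are the predictors.
R1 = np.array([[1.0, .50, .62, .75],
               [.50, 1.0, .10, .10],
               [.62, .10, 1.0, .10],
               [.75, .10, .10, 1.0]])
R2 = np.array([[1.0, .67, .80, .93],
               [.67, 1.0, .40, .50],
               [.80, .40, 1.0, .60],
               [.93, .50, .60, 1.0]])

for R in (R1, R2):
    beta = np.linalg.solve(R[1:, 1:], R[1:, 0])  # standardized betas
    # first matrix: [0.38 0.52 0.66]; second: approximately [.22 .34 .62]
    print(np.round(beta, 2))
```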
Variation in Outcome Caused by Predictor Variables (Vmod)

Correlation matrices alone cannot prescribe a relationship between the predictors and the outcome; the total amount of variation in the outcome caused by the predictors must also be set. This amount is defined by the hyperparameter Vmod; it allows R² to vary across simulations while keeping βs, sampling error, and random-effects error constant. Vmod is used to scale the variance of the predictors. A description of how Vmod was used follows in section 2. Values of Vmod of .006 and .03 were chosen, for reasons similar to those for choosing the values of τ. The first value corresponded to a small effect for the three predictors, while the second value corresponded to a large effect. The underlying sample R²s for Vmod = .006 ranged from around .06 (for high sampling error and large τ) to about .30 (for low sampling error and zero τ). Vmod = .03 led to sample R²s ranging from .25 to .70.

Incidence of Missing Data

All simulated meta-analyses were of models with three predictors. While simulation with more predictors would be preferable, EM estimation is computer-intensive and there is a geometric relationship between the number of predictors in the model and the time it takes to estimate the parameters in that model.

Past simulation research on MCAR and MAR data (e.g., Little, 1992; Su, 1988) provides little theoretical basis for choosing a specific missing-data mechanism or fraction of missing data on any given predictor. If anything, there is a focus on keeping things simple: Little (1992) simulates four predictors, but only the first has missing data (about 50%). Su (1988) has three predictors, and assumes the 2nd has 25% missing data and the 3rd, 50%. Because no studies have described missing-data patterns in meta-analytic datasets, there is little to rely on but these efforts, and good judgment.

Two patterns of missing data were generated. In both, the first predictor is always observed, while the second and third are sometimes missing. In the first pattern, 25% of the studies are missing information on both predictors, 25% of the studies are missing data on the 2nd predictor, 25% of the studies are missing data on the 3rd predictor, and 25% of the studies are complete. This results in the 2nd and 3rd predictors each being missing 50% of the time. In the second pattern, 10% of the studies are missing data on both predictors, 20% of the studies are missing data on the 2nd predictor, 20% of the studies are missing data on the 3rd predictor, and 50% of the studies are complete. This results in the 2nd and 3rd predictors each being missing 30% of the time. Appendix 1 contains details regarding these patterns and how they relate to variation in study sample size and the number of studies per meta-analysis. For simplicity's sake, these two conditions will be referred to based on the proportion of data that would be missing in a complete-case analysis, i.e., "75% Incidence of Missing Data" and "50% Incidence of Missing Data".

Types of Missing Data

As noted previously, there are three different types of missing-data mechanisms: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR). Within each missing-data pattern, different percentages of data can be missing on each variable, and different strengths of relationships may hold between any given variable and the missing-data mechanism. For instance, assume that there are five predictors of interest, and only the first three are completely observed. For MAR data, missing data on the 4th and 5th variables might be a direct function of the value of the 1st variable; or it might be strongly related to the value of the 2nd variable; or it might be weakly related to a function of both the 1st and 2nd variables.

Generation of the MCAR pattern of missing data is straightforward. For instance, for the first incidence-of-missing-data pattern mentioned in the last section (75% Missing Data), the first 25% of the studies are treated as complete, the next 50% are missing values, alternately, on the 2nd or the 3rd predictors, and in the last 25% both the 2nd and 3rd predictors' values are deleted. The variation in sample sizes is taken into account so that after study deletion, no pattern has more large studies than any other pattern (e.g., it would never be the case that most of the large studies are the studies in which data are missing on the 2nd predictor, or the studies with complete data, etc.).

One type of MAR data was generated. The construction of the missing-data pattern begins by generating a random normal deviate (r.n.d.) correlated .8 with the only completely observed predictor, X1. Each study thus has its own r.n.d. For any given meta-analysis's missing-data pattern and number of studies, x% of the studies will have complete data, y% will have data missing on either the 2nd or 3rd predictor, and z% will have data missing on two variables (x+y+z = 100). For instance, for the first incidence pattern described above, x% = 25%, y% = 50%, and z% = 25%. The studies with the top x% values of the r.n.d. are treated as complete: no variables are unobserved. The studies with the middle y% values of the r.n.d. are treated as partially observed; only one variable (the 2nd or 3rd, chosen at random) is treated as missing³. In the studies with the bottom z% values of the r.n.d., both the 2nd and 3rd predictors are treated as missing.

This process results in a correlation of approximately .50 between the value of the completely observed predictor used to generate the selection bias and a dummy variable indicating whether or not each predictor (2nd or 3rd) is observed for that study. This correlation is lower than the .80 value used to create the r.n.d. because "missingness" is dichotomous, limited to 0's and 1's.

³ Within any given meta-analysis, this process is controlled such that the 2nd and 3rd predictors are each treated as missing half the time.
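A minimal sketch of this MAR mechanism follows (names are illustrative; the study's actual algorithm also balances which of the 2nd and 3rd predictors is deleted, which this sketch approximates with random assignment):

```python
import numpy as np

def mar_pattern(X1, frac_complete, frac_one, rng=None):
    """Assign MAR missingness from a random normal deviate correlated
    .8 with the fully observed predictor X1; returns a (k, 2) boolean
    mask indicating whether predictors 2 and 3 are missing."""
    rng = np.random.default_rng(rng)
    k = X1.size
    z = (X1 - X1.mean()) / X1.std()
    rnd = .8 * z + np.sqrt(1 - .8**2) * rng.standard_normal(k)
    order = np.argsort(-rnd)                 # highest r.n.d. first
    miss = np.zeros((k, 2), dtype=bool)
    n_c = int(frac_complete * k)             # top x%: complete
    n_1 = int(frac_one * k)                  # middle y%: one missing
    one = order[n_c:n_c + n_1]
    which = rng.integers(0, 2, size=n_1)     # 2nd or 3rd predictor
    miss[one, which] = True
    miss[order[n_c + n_1:], :] = True        # bottom z%: both missing
    return miss
```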
Two types of NMAR data were generated in which the missing-data mechanism is dependent on the true value of the 2nd (incompletely observed) predictor. Data generated using this type of mechanism will be collectively referred to as "predictor-NMAR data", or p-NMAR data (to distinguish them from another type of NMAR data described below). The process used to generate these data is similar to that described above for the MAR data. One type assumes a strong relationship between the true values of the predictor and the chance that the 2nd or 3rd predictor will be missing; a correlation of .8 is used to create the r.n.d. These data will be referred to as sp-NMAR data, as there is a strong relationship between the predictor and the missing-data mechanism. The other type assumes a moderate relationship between the two variables; the correlation used to generate the r.n.d. is only .4. These data are referred to as mp-NMAR data.

One type of NMAR data is generated in which the missing-data mechanism is dependent on the true value of the outcome (i.e., the mechanism is dependent on δᵢ, the value of the outcome before sampling error is added to the model, but after random-effects variation is added). This type of data (referred to as outcome-NMAR data, or o-NMAR data) is perhaps of the most concern to the skeptical researcher, who may worry that unmeasured variables both affect the outcome and relate to whether or not the effect size is reported. Consider, for instance, a research domain in which private schools are investigated disproportionately more often than public schools; in this instance, in comparison to the population of schools, data on public schools are missing more often than data on private schools. Further suppose that the public/private dimension has a strong relationship to the outcome being investigated.
Because the sampling error is unrelated to whether the school is public or private, the selection is based on the population value of the outcome, not the observed value of the outcome, which would include sampling error. Another reason to investigate this type of missing-data mechanism is that in Little (1992), censoring based on the observed value of the outcome (which should lead to MAR data, as the outcome was always observed), showed poorer estimation of regression parameters than for an NMAR example. Further investigation is in order. 2. Generation of Data Two series of simulations were conducted. The primary series crossed the hyperparameters listed above and was designed to investigate the bias and MSE of the complete-case, available-case, and maximum-likelihood estimators. The second series was designed to test how well standard errors for the (is could be estimated using the conditional bootstrapping method CBOOT described by Su (1988) when applied to maximum-likelihood estimation in this context. 47 Bias and MSE Simulations For each of the 480 combinations of hyperparameters, 1000 simulated meta- analyses were generated and analyzed using the three methods (CC, AC, and ML estimation) described in Chapter 3. The values for the predictors and outcome were generated in the following fashion. 1.) The population [is are calculated based on which one of the two correlation matrices were chosen. 2.) Three columns of data, representing X1, X2, and X3, are generated based on the correlation matrix. These three columns of data have the correlation matrix chosen as their population correlation matrix, though of course the actual sample correlation matrices among generated Xs will vary. Each column has a population mean of zero and a population variance of one. 3.) Variation stemming from variation in the predictors is added to the outcome. 6;“ = 9 +fltxt + (Vmoapzxz + (vmoafiexe , where 9 represents the selected population mean effect size. A mean effect size of .8 was used in all simulations. 4.) Variation stemming fi'om random-effects error is added to the outcome. 6i = 9; + “i , where “i represents the random-effects error for study i (var(ui) = 1:). 5.) Two sets of normally distributed raw data (vis) with population mean efi’ect size 6i and sample size ni/2 are generated. An unbiased standardized mean difl‘erence, Ti, is calculated from these data using the formula from Hedges & Olkin (1985), p. 81: 48 T=(1- 3 )Yl-i—Y2i, 1 4n —9 s,- i where si is the square root of the pooled sample variance. 6.) A missing data pattern is generated based on the methods described above for MCAR, MAR, p-NMAR, and o-NMAR data. This does not change the values of any of the data, but leads to some observations of some predictors to be considered “missing”. The algorithm employed generates r.n.d.’s that (for all but the MCAR data) are correlated with the values of one of the predictors or the population value of the outcome. The algorithm then sorts the studies’ r.n.d.’s and assigns each study a missing data pattern based on its value of the r.n.d. For instance, in the first missing data pattern, the studies with the 25% highest values were considered to be “complete”, the studies with values in the middle 50% were considered to be “missing data on either the 2nd or 3rd predictor”, and the studies with the 25% lowest values were considered to be “missing data on both the 2nd and 3rd predictor”. 
After the simulations were completed, the bias and mean-squared error (MSE) were calculated for the 1000 CC, AC, and ML estimators of β₀, β₁, β₂, β₃, and τ. These statistics were also calculated for the CC, AC, and ML estimators of the population mean. For convenience of notation, call a parameter of interest θ and a given estimation technique's estimator of that parameter θ̂. The formula for the bias in θ̂ is Mean(θ̂) − θ. The formula for the MSE of θ̂ is bias² + Var(θ̂). The program to generate these data was written in SAS/IML Version 6.12 (SAS Institute, Inc.).

Standard Error Simulations

The purpose of these simulations was to test the accuracy of bootstrapped standard errors. The primary question was how many times (B) to run the bootstrap algorithm per meta-analysis. Efron and Tibshirani (1991) state that the benefit of increasing the value of B to over 200 is generally negligible, and that "values of B as small as 25 often give satisfactory results" (p. 391). Su (1988) used a value of B of only 2 in his simulated multiple regressions, determining that this was the optimal value after controlling for the total number of data sets and bootstrapped samples (see his Appendix B). Unfortunately, his method did not result in accurate bootstrap confidence intervals in each simulated regression, and he was forced to use additional calculations to compute the coverage probabilities (see his Appendix C).

Given the considerable amount of computing time necessary to run the maximum-likelihood estimation procedure for these simulated meta-analyses, even a B of 25 is demanding for a large number of simulation runs. However, I wanted to analyze the "real" coverage probabilities found using a higher B than the B = 2 method employed by Su (1988). Thus, a run was made of B = 25 for 100 simulated meta-analyses for each of 48 combinations of hyperparameters in which the data were MCAR and the correlation among the predictors was low. The incidence of missing data, the number of studies, the average study sample size, the size of the effect the predictors had on the outcome, and the size of the random-effects variance were all varied.

3. Criteria for the Investigation of Estimators

The estimators of most importance are β̂ and τ̂. Though estimators of other parameters exist (e.g., of Σ_X and μ_X), estimation of these parameters is generally considered important only insofar as it leads to good estimators of β and τ. The accuracy of these estimators is first investigated by examining the estimators' biases.
This is done in three ways: by examining the number of times the true value of the parameter fell outside the 99% confidence interval generated for its estimate (the 99% confidence interval was used given the large number of tests conducted and the high power of the tests), by looking at the actual size of the biases relative to the effect-size metric, and by calculating the ratios of the empirical MSEs to their empirical variances. The last method was employed by Su (1988) in his investigation of the bias of his MLE estimators, and allows one to judge the effect that the bias has relative to the error caused by sampling error in the estimates. This ratio is expected to be 1.00 if no bias exists; if the ratio had a value of 1.10, this would imply that 1/1.10, or about 91%, of the MSE stemmed from sampling variation in the estimator, with the remaining 9% attributable to bias.

After investigating the bias of each method's estimators of β and τ, the performance of the ML and AC estimators is compared to the performance of the CC estimators by calculating the ratios of their estimators' respective MSEs. Thus, to compare the ML estimation of β₁ to the CC estimation of β₁, MSE_CC/MSE_ML is calculated for that estimator. This ratio is similar to the ratio calculated to measure "relative efficiency" (Mendenhall et al., 1986), but is not exactly that ratio because the ML estimators are not necessarily unbiased. However, in the presence of only a small bias, this ratio will be something very similar to relative efficiency. Even when there is bias, the ratio still fulfills a similar role.

These ratios were investigated by Su (1988) by comparing mean values through tables suggested by ANOVAs. A similar approach is used here to gauge the relative effects of simulation hyperparameters (e.g., the importance of variation in average study sample size versus the importance of variation in the population value of τ). However, ANOVA significance values are not reported, as they cannot be directly interpreted due to the two-level nature of the data. For instance, when the null hypothesis that the MSE of an EM estimator is the same as the MSE of a CC estimator is true for a given condition, the MSE_CC/MSE_ML ratio should be distributed as an F-statistic with 999 degrees of freedom in both the numerator and the denominator. The sampling error for such ratios is very low: the expected value is 999/997 = 1.002, and the standard error is approximately .062. Tests of main effects, which as seen below may be based on as many as 196,000 observations, will have true standard errors of much less. This was confirmed by creating multiple random samples that were 25% the size of the largest dataset and comparing the average ratios across main effects. These averages changed by no more than .03 between these smaller subsamples. ANOVAs do not take the standard errors of these ratios into account. Also, the standard error of the ratios will differ depending on what the true ratio actually is; the standard error for an F₉₉₉,₉₉₉ will be .062 only when the null is true. It was decided to use ANOVAs only to investigate the relative size of effects, as mentioned above, and to pay close attention only to hyperparameters that had effects of at least .20.

The performance of the bootstrapped standard errors for the slopes was investigated in the two ways employed by Su (1988) in his investigation of bootstrapped standard errors.
First, for each estimator, empirical non-coverage rates (using 95% confidence intervals) were calculated for the 100 simulated meta-analyses for each of the 96 combinations of hyperparameters mentioned above. In the second procedure, for each estimator, the average empirical variance over the 100 simulated meta-analyses was divided by the empirical MSE of the matching estimator. When this ratio equals one, the average estimated variance of the estimated slope is equal to the MSE of the estimated slopes, implying precise estimators of standard errors and an overall rejection rate which should be equal to that which is theoretically expected. For reasons discussed in the next chapter, a ratio was calculated that used the median empirical variance as well.

CHAPTER V
SIMULATION STUDY RESULTS

This investigation is split into six parts. The first part investigates the conditions under which the assumptions of the maximum-likelihood method are met (the data are MCAR, or MAR with missingness related to the 1st predictor) when estimating β₀, β₁, β₂, β₃, and τ. In the second, the assumptions are not met (the data are p-NMAR due to missingness depending on the 2nd predictor), and in the third, the assumptions are not completely met (the data are o-NMAR due to missingness depending on the true value of the always-observed outcome). The fourth part investigates all three of these missing-data types with regard to estimation of the population mean. The fifth part briefly considers the estimation of β₀, β₁, β₂, β₃, and τ when there is a dichotomous predictor with missing data. The final section investigates the performance of the bootstrapped standard errors.

1. Results: MCAR/MAR Data

For 192 combinations of hyperparameters the data were MCAR or MAR. These data are referred to jointly as MCAR/MAR data.

Bias in ML Estimation of β

It was expected that there would be no statistically significant bias in the MLE estimators, given the normally distributed predictors and the almost normally distributed outcome. This was not the case. In the 192 conditions, there are 2 instances of bias for the intercept, 10 instances for β₁, 25 instances for β₂, and 34 instances for β₃ (p < .01). There were 58 conditions for which there was a bias for at least one of the slopes. Of the seven hyperparameters varied in the simulation study, four are moderately related to whether a significant bias was found in the slopes: Vmod (for 39 of the conditions, Vmod = .03), average nᵢ (for 37 of the conditions, the average nᵢ = 400), missing-data incidence (for 38 of the conditions, there was 50% missing data on both the 2nd and 3rd predictors, as opposed to 25%), and correlation matrix (for 38 of the conditions, the correlation matrix had low correlations among the predictors). There were no or very weak relationships between whether a condition showed a significant bias and that condition's values of k, τ, or whether the data were MCAR or MAR.

Table 5.1 contains the average, minimum, and maximum empirical MSE/variance ratios. A histogram of MSE/variance ratios for β₃ is in Figure 5.1. The third predictor was used to generate the figure as it is the slope for which there is the most evidence of bias. Both the table and the figure demonstrate that the biases are always very small relative to the amount of sampling error in the estimators. The largest ratio of 1.025 indicates that even the largest bias was only 2.5% the size of the normal sampling variation in the estimator.
Table 5.1
Maximum-Likelihood MSE/Variance Ratios for β (MCAR/MAR Data)

              β0       β1       β2       β3
Mean         1.001    1.002    1.003    1.004
Median       1.001    1.001    1.001    1.002
Minimum      1.000    1.000    1.000    1.000
Maximum      1.011    1.021    1.025    1.024

Figure 5.1. [Histogram of the ratio of MSE to variance for β₃ (MCAR data); the plotted ratios range from about 1.001 to 1.023.]

The presence of a substantively small, yet statistically significant, bias is curious given that all assumptions of the maximum-likelihood model are met in these data except for the slightly non-normal outcome. Little and Rubin (1987) show that maximum-likelihood estimation should lead to consistent estimates, and Little (2000) said that while bias might exist in some more complex applications of maximum-likelihood theory, such as with censored data or in nonlinear regression (e.g., Cook, Tsai, & Wei, 1986), he was not familiar with instances of bias in the standard regression model with normal slopes and normal outcomes. He said that small biases might exist for small sample sizes, however, though they should go to zero as sample size increases. This view is reinforced in Cordeiro and McCullagh (1991), in which they state, "It is well known that MLEs may be biased when the sample size or the total Fisher information is small. The bias is usually ignored in practice, the justification being that it is negligible compared with the standard errors." (p. 629)

The work of Su (1988) supports the idea that in some cases there might be some bias in maximum-likelihood estimates in the standard general linear model even when the outcome and predictors are normally distributed. Su found MSE ratios of 1.05 and 1.02 for his sample size of 40, and 1.00 and 1.01 for his sample size of 160, and his model did not concern itself with a known fixed covariance term of different sizes across studies or treat the random-effects variation the same way as the ML estimation in Chapter III. These numbers compare favorably with what is found in the present study.

Follow-up simulations were conducted in which k = 400, and a small number for which k = 1000. In approximately 2/3 of the cases, the bias shrunk to become statistically insignificant. In the other 1/3, however, the bias shrunk no faster than the standard error of the estimates. In other words, while the bias shrunk as might be expected if the ML estimates were consistent, there were still statistically significant biases for some sets of conditions at higher values of k. In some of this simulation work a normal outcome was simulated, and equally sized studies; neither of these changes affected the size of the bias. Because the MSE ratio was still small (1.01 to 1.02) and the biases for the higher values of k were so substantively uninteresting (on the order of one-hundredth of an effect size), the matter of the bias was left to later investigation.

Bias in ML Estimation of τ

Some bias was expected in the estimation of τ given that the method used was full maximum likelihood and not restricted maximum likelihood (Bryk & Raudenbush, 1992, p. 223). The size of the bias was expected to be lower for k = 100 than for k = 40. An ANOVA showed that all seven simulation hyperparameters and most 2-way interactions between those hyperparameters were significant; however, the only substantive effects came from missing-data pattern, number of studies, and the value of τ. Table 5.2 shows the ratios and biases for the ML estimators across the different values of these simulation hyperparameters.
The average ratio across all conditions was 1.327.

Table 5.2
Maximum-Likelihood MSE/Variance Ratios for τ (MCAR/MAR Data)

                                          Ratio for τ    Bias in τ
50% Incidence      k = 40     τ = 0          1.179         .0012
of Missing Data               τ = .005       1.276        -.0020
                              τ = .02        1.398        -.0075
                   k = 100    τ = 0          1.380         .0012
                              τ = .005       1.117        -.0011
                              τ = .02        1.111        -.0031
75% Incidence      k = 40     τ = 0          1.148         .0011
of Missing Data               τ = .005       1.608        -.0028
                              τ = .02        1.809        -.0110
                   k = 100    τ = 0          1.342         .0016
                              τ = .005       1.273        -.0017
                              τ = .02        1.284        -.0054

While the ratios are sometimes quite large (above 2.00 for some combinations of hyperparameters), the size of the bias is usually trivial. The only exception is when τ = .02 and the number of studies is small. In these instances biases average -.011, roughly half the size of the population variance to be estimated. As expected, this bias drops when study size increases or more data are observed.

Bias in AC Estimation of β

As noted in the review of literature, some biases in the AC estimates of β were not unexpected, though it was unknown how large they might be. Table 5.3 shows how often there were statistically significant (p < .01) biases in the estimates of β. MAR data led to more biases than MCAR data, especially for β₀ and β₁.

Table 5.3
Available-Case Frequencies of Bias in Estimates of β (MCAR/MAR Data)

     β0           β1           β2           β3
  8 (8.3%)    21 (21.9%)   31 (32.2%)   39 (40.6%)

The amount of bias in the AC estimators differed greatly depending on the simulation patterns. Not surprisingly, bias was the worst when there was the least complete data (75% missing data and k = 40). The size of the bias varied widely for this condition and often was affected by outliers across the 1000 simulations. While bias was widespread, the size of the bias in β̂ relative to the variance of β̂ was typically small. Tables 5.4 and 5.5 summarize the ratios of MSEs to the variances of the estimators. For both the MCAR and MAR data, the biases in β̂ are on average substantively insignificant relative to the mean-squared error of estimation.

Table 5.4
Available-Case MSE/Variance Ratios for β (MCAR Data)
[Cell values, broken down by 50% vs. 75% incidence of missing data, are illegible in the source scan.]

Table 5.5
Available-Case MSE/Variance Ratios for β (MAR Data)
[Cell values, broken down by 50% vs. 75% incidence of missing data, are illegible in the source scan.]
The bias for these conditions is especially large; an average bias of .0078 for population T = 0 implies a standard deviation in the outcome of .088, which could be substantively misleading. There are similarly substantively large biases for T = .005 when average ni = 80. The bias is negligible relative to the size of T for T = .02 across all conditions. 62 Table 5.6 Available-Case MSE/Variance Ratios and Biases for T (MCAR/MAR Data) ! Ratio for T 1.387 50% Average "i = 80 . 1.147 1.016 1.325 ‘ Incidence of Average "i = I Missing Data . 1.011 400 1.011 1.275 Average "i = 80 . 1.080 1.001 A 1.246 f . . verage "i = . M‘ssmg Data . 1.003 75% 7 Incidence of Bias in CC Estimation of B and T As expected by estimation theory, there was no bias in the complete-case estimators of B for the MCAR or MAR data. Because all negative estimates of T were set to zero before means and MSE/variance ratios were calculated, some positive bias was expected; however, biases were generally less than .003, and no ratio exceeded 1.025. 63 MSECC to MSEMI E Ratios Due to the lack of bias among over 70% of the simulation conditions, and the relatively small bias present for some predictors for the remaining 30%, it is fair to roughly characterize the MSECC/MSEMLE ratios below as measuring efficiencies of the ML estimators relative to the CC estimators. One ratio was calculated for the intercept, one ratio was calculated for each slope, and one ratio was calculated for T. Table 5.7 summarizes the MSE ratios across all simulation conditions. On average, the rmximum-likelihood analysis provides large gains in efficiency; estimation of the intercept and slope for the completely-observed variable is almost twice as efficient, and estimation of T over three-and-a-half times as efficient. However, while for each parameter to be estimated the maximum-likelihood method is more eflicient than the complete-case method across all conditions, the relative efficiency varies considerably depending on the combinations of hyperparameters. Table 5.7 : MSECC/MSEMLE Ratios (MCAR/MAR Data) All of the lowest relative efficiencies came from studies with an average study sample size of 400. The lowest relative efficiencies for B0 and B1 came from the set of 64 conditions with T=0, k=100, Vmod=.03, low correlation among predictors, and 50% missing MCAR data. The lowest relative efficiency for T came from a similar set of hyperparameters, except that T = .005. The lowest relative efliciencies for estimation of B2 and B3 came fiom somewhat difi‘erent looking conditions: they had in common T = .02, k=40, high correlations among predictors, and a 75% incidence of missing data per predictor. For B2, however, Vmod = .03 and the data was MAR; for B3, Vmod = .006 and the data was MCAR. For none of these sets of conditions was there any bias on any of the slopes except for a small bias in B2 for the T minimum relative efficiency. This supports the conclusion made above that the slight bias in the estimates ofB were substantively insignificant. Table 5.8 below summarizes the MSE ratios for the main effects. The effect of different sizes of T is substantively trivial for relative efliciency in estimation of B0 and B1, moderately affects relative efficiency for B2 and B3, and has a large effect on relative eficiency in the estimation of T. Overall, having more studies in a meta-analysis leads to slightly lower efficiencies for the estimation of the Bs, but leads to clearly better complete- case estimates of T, and thus a smaller gain in efficiency. 
The same is true regarding having a larger average study sample size, but the effect on the relative efficiencies of the Bs is slightly larger and the effect on the efliciencies for T is slightly smaller. The correlation among the predictors also has a small effect. Higher intercorrelations among predictors lead to srmller gains in relative efficiency for the Bs but a larger gain in relative eficiency for estimation of T. Incidence of missing data is the most important hyperparameter with regard to its effect on relative efficiency. 65 i MSECCIMSEMLE Ratios, Main Efl'ects (MCAR/MAR Data) Parameter Parameter B0 B1 B 2 B 3 T Value T o 1.969 1.894 1.431 1.456 4.700 .005 1.921 1.836 1.286 1.322 2.572 .02 1.892 1.781 1.181 1.220 1.694 k 40 1.974 1.909 1.329 1.353 3.657 100 1.880 1.765 1.269 1.311 2.321 Avg. ni 80 2.040 1.943 1.348 1.362 3.429 400 1.815 1.731 1.250 1.302 2.549 Predictor Low 2.052 1.921 1.333 1.367 2.825 intercom, High 1.802 1.752 1.265 1.298 3.152 Vmod .006 2.036 1.931 1.320 1.342 3.297 .03 1.818 1.743 1.278 1.323 2.681 Incidence of 50% 1.596 1.551 1.188 1.209 2.248 M, Data 75% 2.258 2.123 1.411 1.456 3.729 M. Data MCAR 1.677 1.691 1.303 1.330 3.011 Mechanism MAR 2.178 1.983 1.295 1.334 2.967 It has the largest main efl‘ect for each of the Bs, and the second largest main efi‘ect for the estimation of T; predictably, on average, the more missing data there is, the better the maximum-likelihood method performs relative to the complete-case method. Finally, the type of missing-data mechanism (MCAR vs. MAR) has (large) substantive main effects on 66 the efliciencies for B0 and B1, but little effect on B2, B3, or T. Somewhat surprisingly, even though the estimation of the Bs is unbiased for complete-case analysis in the presence of MAR data, the estimation of the parameters is clearly worse than it is for MCAR data. Note that the efficiencies for B2 and B3 are quite similar. This pattern was maintained through most of the simulation study. The values of B2 and B3 were not identical, but because both variables X2 and X3 were missing the same amount of the time, statistics regarding these two parameters behaved in similar ways, regarding interaction effects, relative efficiencies, and so on. Although most that can be learned about the relative efliciencies can be gained from Table 5 .8, there were two noticeable 2nd-order interaction effects for the MSEs for B0 and B1. These are shown in Tables 5.9 and 5.10. These tables show that the MI. estimation of B0 and B1 is especially efficient relative to CC estimation when there are low predictor intercorrelations and a large amount of missing data, or low predictor intercorrelations and MAR data. There were no interesting (i.e., substantively large) interactions for the MSEs for B2 and B 3 . There are several large interaction efi‘ects for the MSE ratios for the estimation of T, the largest of which involve the size of T and another hyperparameter. These interactions are reflected in Tables 5.11 - 5.14. Table 5.11 demonstrates that the interaction between the size of T and the incidence of missing data is such that when there is 75% missing data, the ML method is especially efficient for low values of T, especially T = 0. Table 5.12 shows that for a small average study sample size, the ML method is 67 especially efficient for T = 0 and T = .005. For a larger average study sample size, the ML method is only especially efficient (i.e., three to four times as efficient) for T = 0. 
In Table 5.13 the key facet of the interaction is that the ML method is especially efficient (almost six times as efficient) for a small number of studies and a low value of τ. Table 5.14 demonstrates that the ML method is especially efficient when there is the least amount of complete data, i.e., when there is a 75% incidence of missing data and only 40 studies.

Table 5.9
MSE_CC/MSE_MLE Ratios for β0 and β1, MCAR/MAR Data
(Predictor Intercorrelations x Missing Data Incidence)

                                                      Ratio for β0   Ratio for β1
Low Predictor       50% Incidence of Missing Data     1.568          1.559
Intercorrelations   75% Incidence of Missing Data     -              -
High Predictor      50% Incidence of Missing Data     -              -
Intercorrelations   75% Incidence of Missing Data     -              -

Table 5.10
MSE_CC/MSE_MLE Ratios for β0 and β1, MCAR/MAR Data
(Predictor Intercorrelations x Missing-Data Mechanism)

                                            Ratio for β0   Ratio for β1
Low Predictor Intercorrelations    MCAR     1.647          1.697
                                   MAR      2.457          2.145
High Predictor Intercorrelations   MCAR     1.706          1.685
                                   MAR      1.899          1.820

Table 5.11
MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data
(Size of τ x Incidence of Missing Data)

                                  Ratio for τ
50% Missing Data     τ = 0        3.375
                     τ = .005     1.922
                     τ = .02      1.447
75% Missing Data     τ = 0        6.026
                     τ = .005     3.222
                     τ = .02      1.940

Table 5.12
MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data
(Size of τ x Average Study Sample Size)

                                  Ratio for τ
Average n_i = 80     τ = 0        4.857
                     τ = .005     3.518
                     τ = .02      1.911
Average n_i = 400    τ = 0        4.544
                     τ = .005     1.625
                     τ = .02      1.477

Table 5.13
MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data
(Size of τ x Number of Studies)

                           Ratio for τ
k = 40        τ = 0        5.880
              τ = .005     3.203
              τ = .02      1.887
k = 100       τ = 0        3.521
              τ = .005     1.941
              τ = .02      1.500

Table 5.14
MSE_CC/MSE_MLE Ratios for τ, MCAR/MAR Data
(Incidence of Missing Data x Number of Studies)

                                Ratio for τ
50% Missing Data    k = 40      2.602
                    k = 100     1.894
75% Missing Data    k = 40      4.731
                    k = 100     2.747

MSE_CC to MSE_AC Ratios

Unlike for CC estimation and ML estimation, there is no theory that suggests that AC estimation is appropriate when the data are MAR. However, the results of ANOVAs suggested that the performance of AC estimation with MAR data was very similar to its performance with MCAR data; thus, the results for the two types of data are examined simultaneously. The few differences that existed are noted below. As will be seen, except for the estimation of τ, and except for the estimation of the βs when k = 40 and there was a 75% incidence of missing data, the biases among the AC estimators are small enough that it is fair to roughly characterize the MSE_CC/MSE_AC ratios below as measuring the efficiencies of the AC estimators relative to the CC estimators. Table 5.15 summarizes the MSE ratios across all simulation conditions.

Table 5.15
MSE_CC/MSE_AC Ratios (MCAR/MAR Data)

           β0      β1      β2      β3      τ
Mean       1.424   1.286   .862    .866    1.155
Median     1.343   1.321   .942    .939    1.072
Minimum    .010    .014    .007    .010    .081
Maximum    3.539   2.751   1.402   1.433   2.664

On average, available-case methods provide much better estimates than complete-case methods for β0 and β1, slightly better estimates for τ, and slightly worse estimates for β2 and β3. However, the results have high variability across the values of the hyperparameters.
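Since available-case analysis is referred to throughout as "pairwise" estimation, a brief sketch may help fix ideas. This is an unweighted illustration under assumed conventions (outcome in column 0, predictors in the remaining columns, np.nan marking missing values), not the estimator as implemented for this study; the meta-analytic version weights studies by their precision, as developed in Chapter III.

    import numpy as np

    def pairwise_cov(data):
        """Covariance matrix assembled one pair of columns at a time, using
        every row on which both members of the pair are observed."""
        p = data.shape[1]
        cov = np.empty((p, p))
        for i in range(p):
            for j in range(i, p):
                ok = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
                cov[i, j] = cov[j, i] = np.cov(data[ok, i], data[ok, j])[0, 1]
        return cov

    def available_case_slopes(data):
        """Slopes from the pairwise statistics; column 0 is the outcome."""
        cov = pairwise_cov(data)
        # Because each entry may be based on a different subset of studies,
        # the assembled matrix need not be positive definite.
        return np.linalg.solve(cov[1:, 1:], cov[1:, 0])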
Table 5.16 summarizes the MSE ratios by main effect. The largest effects are for the size of the correlation among predictors and the number of studies per meta-analysis. The only substantively interesting interaction effects for the βs involve the incidence of missing data, the correlation among the predictors, and the number of studies in the simulated meta-analysis. Table 5.17 shows the average sizes of these interactions across the combinations of the different values of these hyperparameters. The interaction between the correlation among predictors and missing data incidence is especially strong for β0 and β1, as is the interaction between incidence of missing data and the number of studies for β2 and β3. Available-case estimation of β0 and β1 is especially poor for k = 40 when there are high intercorrelations between predictors, and AC estimation of β2 and β3 is especially poor for k = 40 when the incidence of missing data is high.

Table 5.16
MSE_CC/MSE_AC Ratios, Main Effects (MCAR/MAR Data)

Parameter               Value    β0      β1      β2      β3      τ
τ                       0        1.226   1.137   .772    .771    1.053
                        .005     1.469   1.325   .888    .891    1.142
                        .02      1.577   1.395   .928    .937    1.270
k                       40       1.325   1.114   .736    .737    1.254
                        100      1.523   1.458   .989    .996    1.056
Avg. n_i                80       1.521   1.371   .907    .907    1.349
                        400      1.327   1.200   .818    .826    .960
Predictor intercorrs.   Low      1.654   1.479   .968    .981    1.179
                        High     1.194   1.093   .757    .752    1.131
Vmod                    .006     1.512   1.348   .873    .872    1.219
                        .03      1.337   1.224   .852    .861    1.090
Incidence of M. Data    50%      1.272   1.285   .939    .938    .932
                        75%      1.576   1.287   .786    .795    1.377
M. Data Mechanism       MCAR     1.295   1.267   .982    .988    1.063
                        MAR      1.626   1.563   .974    .977    1.038

Table 5.17
MSE_CC/MSE_AC Ratios, MCAR/MAR Data
(Incidence of Missing Data x k x Predictor Intercorrelations)

                                         β0      β1      β2
50% Incidence    k = 40    Low Corrs.    1.382   1.400   1.025
of M. Data                 High Corrs.   1.291   1.261   .888
                 k = 100   Low Corrs.    1.270   1.279   .963
                           High Corrs.   1.147   1.204   .878
75% Incidence    k = 40    Low Corrs.    1.837   1.345   .722
of M. Data                 High Corrs.   .792    .454    .697
                 k = 100   Low Corrs.    2.127   1.897   1.160
                           High Corrs.   1.548   1.451   .956

Table 5.18
MSE_CC/MSE_AC Ratios for τ, MCAR/MAR Data
(Average Study Sample Size x Size of τ)

                                  Ratio for τ
Average n_i = 80     τ = 0        1.201
                     τ = .005     1.212
                     τ = .02      1.254
Average n_i = 400    τ = 0        .660
                     τ = .005     .833
                     τ = .02      1.141

The patterns of ratios for the βs are roughly what would be expected given what we know about pairwise estimates' sensitivity to extreme correlation matrices. The pairwise estimates have the highest relative efficiency when the correlations among the variables are the lowest, and especially when the correlations between the predictors and the outcome are lowest: specifically, when τ is large, Vmod is low, or the average sample size is low. However, on average the ratios are much closer to 1.00 than they were for the maximum-likelihood analyses. Just as was the case with the ML/CC relative efficiencies, available-case analyses show greater gains in efficiency for MAR data than for MCAR data for estimation of β0 and β1. There is one moderate interaction between average n_i and τ; it is summarized in Table 5.18. The interaction represented in Table 5.18 shows that the effect of study sample size is relatively constant across values of τ for average n_i = 80, but graduated for average n_i = 400.

Because, in general, available-case estimation does not perform better than complete-case estimation for β2 and β3, it might be concluded that pairwise estimation is not recommended for use in a mixed-model meta-analysis with MCAR or MAR data. However, there are large gains in efficiency for β0 and β1 when there is a large amount of random-effects variation in the studies and there is either a large number of studies or a small percentage of missing data. Under these conditions, AC estimation compares favorably with even the maximum-likelihood method, as shown in Table 5.19. While maximum-likelihood estimation of τ is on average 26% more efficient even for these conditions, the estimation of the βs is on average only 10 to 13 percent more efficient, and on a few rare occasions (five conditions out of the 192) is marginally less efficient than AC estimation. These cases arose when Vmod = .006 and the correlations among the predictors were at their lowest (.1).

Table 5.19
MSE_AC/MSE_MLE Ratios, MCAR/MAR Data
(Excluding k=40/75% Missing Data)

           β0      β1      β2      β3      τ
Mean       1.121   1.108   1.100   1.134   1.278
Median     1.100   1.090   1.096   1.136   1.288
Minimum    1.013   .991    .927    .921    1.116
Maximum    1.629   2.086   1.598   1.746   1.470

Thus, ML estimation of the βs does most poorly relative to AC analysis when there are almost zero correlations between the predictors and the outcome, and very low correlations among the predictors. It should be noted that in these cases complete-case analysis is rarely appropriate; both the ML and AC estimation procedures are on average two to three times as efficient in estimating β0 and β1, and 30% to 40% more efficient in estimating β2 and β3.
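The sensitivity to extreme correlation matrices mentioned above has a simple source, which a tiny contrived example (values invented for illustration) makes concrete: each pairwise correlation can be legitimate on its own while the assembled matrix is not a possible joint correlation matrix.

    import numpy as np

    # No trivariate distribution can have all three of these correlations at
    # once, so the assembled "pairwise" matrix has a negative eigenvalue and
    # slope estimates based on it can be wildly unstable.
    r = np.array([[ 1.0,  0.9, -0.9],
                  [ 0.9,  1.0,  0.9],
                  [-0.9,  0.9,  1.0]])
    print(np.linalg.eigvalsh(r))    # smallest eigenvalue is -0.8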
2. Results: p-NMAR Data

There were a total of 192 combinations of hyperparameters for which the data were NMAR and the missing-data mechanism was dependent on the values of the predictors. Of these, 96 combinations had data that were strongly NMAR (sp-NMAR), while the other 96 had combinations of hyperparameters that were moderately NMAR (mp-NMAR).

Bias in ML Estimation of β

It was expected that there would be statistically significant biases in the MLE estimators of the slopes, given that ML theory assumes that the missing-data mechanism is MCAR or MAR. This proved to be the case, as shown in Table 5.20. Table 5.21 shows the average, minimum, and maximum MSE/variance ratios across all data. The frequency of bias is low for the slopes for the predictors, although the intercept is almost always biased. The primary difference between these results and the results for the MCAR/MAR data in the previous section is the frequency of bias in the intercept. This finding is borne out in histograms of the sizes of the biases. Figures 5.2 and 5.3 display the MSE/variance ratios for β3. Except for some rare cases, the ratios are very close to 1.00. The highest ratios come from cases with τ = .005, a 75% incidence of missing data, low correlations among predictors, and an average study sample size of 400. The actual sizes of the significant biases for the slopes ranged from -.06 to .09. While the larger biases are not substantively uninteresting, these figures show they are generally overwhelmed by sampling error.

Table 5.20
Maximum-Likelihood Frequencies of Bias in Estimates of β (p-NMAR Data)

Missing-data Mechanism   β0           β1           β2           β3
sp-NMAR                  95 (99.0%)   28 (29.2%)   13 (13.5%)   38 (39.6%)
mp-NMAR                  89 (92.7%)   18 (18.8%)   11 (11.5%)   20 (20.8%)

Table 5.21
Maximum-Likelihood MSE/Variance Ratios for β (p-NMAR Data)

           β0      β1      β2      β3
Mean       1.137   1.005   1.003   1.007
Median     1.078   1.002   1.001   1.002
Minimum    1.002   1.000   1.000   1.000
Maximum    1.740   1.021   1.084   1.087

Figure 5.2
Ratio of MSE to Variance for β3 (sp-NMAR Data). [Histogram; the ratios cluster between roughly 1.00 and 1.09.]

Figure 5.3
Ratio of MSE to Variance for β3 (mp-NMAR Data). [Histogram; the ratios cluster between roughly 1.00 and 1.05.]

The ratios for the biases for the intercept are on average much higher than for the slopes, but substantively the biases are small, as shown in Table 5.22. While the ratios are sometimes as high as 1.74, the maximum bias is only -.0297.
Given the relative unimportance of the difference between an average effect size of .80 and an average effect size of .77, the bias in the intercept often can be safely ignored.

Table 5.22
Maximum-Likelihood MSE/Variance Ratios and Biases for β0 (p-NMAR Data)

                         Ratio for β0   Bias in β0
sp-NMAR     Mean         1.204          -.0139
Data        Median       1.117          -.0125
            Minimum      1.003          -.0034
            Maximum      1.740          -.0297
mp-NMAR     Mean         1.070          -.0072
Data        Median       1.052          -.0068
            Minimum      1.002          -.0018
            Maximum      1.246          -.0177

Bias in ML Estimation of τ

An ANOVA in which the outcome was the size of the bias in the estimation of τ showed that all seven simulation hyperparameters and most two-way interactions between those hyperparameters were significant; however, the only substantive effects came from the missing-data pattern, the number of studies, and the value of τ. Whether the data were sp-NMAR or mp-NMAR was unimportant. Table 5.23 shows the ratios and biases for the ML estimates. All biases were in a positive direction. The large negative biases for τ = .02 found for the MCAR/MAR data are not present in the NMAR estimations. However, there are substantively large biases for k = 40 and a 75% incidence of missing data for the lower values of τ.

Table 5.23
Maximum-Likelihood MSE/Variance Ratios and Biases for τ (p-NMAR Data)

                                          Ratio for τ   Bias in τ
50% Incidence       k = 40    τ = 0       1.177         .0035
of Missing Data               τ = .005    1.274         .0028
                              τ = .02     1.364         .0019
                    k = 100   τ = 0       1.386         .0022
                              τ = .005    1.114         .0017
                              τ = .02     1.104         .0015
75% Incidence       k = 40    τ = 0       1.149         .0065
of Missing Data               τ = .005    1.561         .0056
                              τ = .02     1.824         .0046
                    k = 100   τ = 0       -             -
                              τ = .005    -             -
                              τ = .02     -             -

Bias in AC Estimation of β

The extent to which biases were statistically significant in the sp-NMAR and mp-NMAR data is shown below in Table 5.24.

Table 5.24
Available-Case Frequencies of Bias in Estimates of β (p-NMAR Data)

Missing-data Mechanism   β0           β1           β2           β3
sp-NMAR                  96 (100%)    40 (41.7%)   71 (74.0%)   64 (66.7%)
mp-NMAR                  93 (96.9%)   36 (37.5%)   40 (41.7%)   41 (42.7%)

While bias was widespread, the size of the bias relative to the variance of the estimators was similar to that found for the MAR data for all estimators but β0. Tables 5.25 and 5.26 summarize the ratios of the MSEs to the variances of the estimators. The biases for β2 and β3 are, on average, substantively insignificant relative to the mean-squared error, as they were for the MCAR and MAR data, especially for the mp-NMAR data. In contrast, the ratios for β0 were quite large, indicating that much of the error in the estimation of β0 came from biased estimation.

Table 5.25
Available-Case MSE/Variance Ratios for β (sp-NMAR Data)

                              β0      β1      β2      β3
50% Incidence     k = 40      1.361   1.006   1.018   1.009
of M. Data        k = 100     2.062   1.019   1.042   1.018
75% Incidence     k = 40      1.228   1.001   1.008   1.010
of M. Data        k = 100     2.592   1.021   1.038   1.018

Table 5.26
Available-Case MSE/Variance Ratios for β (mp-NMAR Data)

                              β0      β1      β2      β3
50% Incidence     k = 40      1.097   1.003   1.004   1.004
of M. Data        k = 100     1.247   1.002   1.004   1.003
75% Incidence     k = 40      1.117   1.004   1.007   1.013
of M. Data        k = 100     -       -       -       -

Table 5.27 summarizes the sizes of the average ratios and biases for β0 across the types and incidence of missing data and the values of Vmod. These variables were the most important in predicting the amount of bias. Biases are larger for Vmod = .03, the sp-NMAR data, and 75% missing data. There were no substantively large interactions. While the biases are larger than those for the ML estimator of β0 (shown in Table 5.22), on average they are small relative to the effect size metric.
Table 5.27
Available-Case MSE/Variance Ratios and Biases for β0 (p-NMAR Data)

                                               Ratio for β0   Bias in β0
sp-NMAR    50% Incidence      Vmod = .006      1.246          -.0150
Data       of Missing Data    Vmod = .03       2.177          -.0350
           75% Incidence      Vmod = .006      1.362          -.0320
           of Missing Data    Vmod = .03       2.458          -.0560
mp-NMAR    50% Incidence      Vmod = .006      1.069          -.0075
Data       of Missing Data    Vmod = .03       1.274          -.0160
           75% Incidence      Vmod = .006      1.120          -.0130
           of Missing Data    Vmod = .03       -              -

Bias in AC Estimation of τ

Table 5.28 summarizes the sizes of the average bias in τ for the hyperparameters that were most strongly related to the size of the bias. The results are very similar to those for the MCAR/MAR data. Estimation is worst for τ = 0, and biases are substantively large for the conditions in which τ = 0 or .005 and the average n_i = 80. Biases are substantively negligible for the other conditions.

Table 5.28
Available-Case MSE/Variance Ratios and Biases for τ (p-NMAR Data)

                                                  Ratio for τ   Bias in τ
50% Incidence      Average n_i = 80    τ = 0      1.406         .0081
of Missing Data                        τ = .005   1.158         .0060
                                       τ = .02    1.014         .0023
                   Average n_i = 400   τ = 0      1.343         .0025
                                       τ = .005   1.017         .0009
                                       τ = .02    1.007         .0007
75% Incidence      Average n_i = 80    τ = 0      1.324         .0074
of Missing Data                        τ = .005   1.099         .0056
                                       τ = .02    1.006         .0002
                   Average n_i = 400   τ = 0      1.277         .0025
                                       τ = .005   1.007         .0004
                                       τ = .02    -             -

Bias in CC Estimation of β and τ

Out of the 192 conditions, there were only 7 instances of a significant bias for estimation of β0, 8 instances of significant bias for β1, 8 instances for β2, and 4 instances for β3. These biases were very small relative to the sampling error; the ratios of MSEs to variances were all less than 1.02. Bias in the estimation of τ was more substantial. Table 5.29 summarizes the sizes of the bias over the simulation conditions to which it was most strongly related: τ, k, incidence of missing data, and average n_i. Biases in the estimation of low values of τ were very large, sometimes over .01, in cases where the average study sample size was small. There were substantial interaction effects between average study sample size and both the incidence of missing data and the number of studies. A high proportion of missing data or a low value of k exacerbated the bias in the estimation of τ for the conditions in which the average study sample size was 80.

Table 5.29
Complete-Case MSE/Variance Ratios and Biases for τ (p-NMAR Data)

                                         Ratio for τ   Bias in τ
τ = 0            Average n_i = 80        1.383         .0113
                 Average n_i = 400       1.363         .0021
τ = .005         Average n_i = 80        1.187         .0096
                 Average n_i = 400       1.026         .0011
τ = .02          Average n_i = 80        1.044         .0067
                 Average n_i = 400       1.004         .0005
k = 40           Average n_i = 80        1.202         .0126
                 Average n_i = 400       1.122         .0018
k = 100          Average n_i = 80        1.208         .0058
                 Average n_i = 400       1.140         .0007
50% Incidence    Average n_i = 80        1.202         .0120
of Missing Data  Average n_i = 400       1.127         .0017
75% Incidence    Average n_i = 80        1.209         .0064
of Missing Data  Average n_i = 400       1.134         .0008

MSE_CC to MSE_MLE Ratios

Table 5.30 summarizes the MSE ratios across all simulation conditions. As it did with the MCAR/MAR data, the maximum-likelihood analysis provides large gains in efficiency; estimation of the intercept and of the slope for the completely observed variable is almost twice as efficient, and estimation of τ is almost three times as efficient.
Table 5.30
MSE_CC/MSE_MLE Ratios (p-NMAR Data)

           β0      β1      β2      β3      τ
Mean       1.871   1.677   1.417   1.306   2.861
Median     1.764   1.536   1.347   1.246   2.373
Minimum    1.107   1.212   1.061   1.038   1.044
Maximum    3.383   2.559   2.463   2.388   10.73

As with the MCAR/MAR data, the relative efficiency varied depending on the simulation conditions. Table 5.31 shows the main effects of each of the seven simulation hyperparameters. For the intercept and slopes, the only hyperparameters that were very important were the incidence of missing data and the population value of τ. The lower the value of τ, the higher the relative efficiency; similarly, the more missing data there were, the higher the relative efficiency. The effects of the other five hyperparameters on the estimation of the βs were slight or non-existent. More hyperparameters affected the estimation of τ. The population value of τ, the average study sample size, the number of studies, the value of Vmod, and the incidence of missing data all had moderate to large effects on the relative efficiency. Whether the missing-data mechanism was sp-NMAR or mp-NMAR had little to no effect on the efficiencies for any of the parameters. The relative efficiencies for β2 and β3 were quite similar to each other, even though there was an important difference between the 2nd and 3rd predictors in this combination of simulation hyperparameters: the missing-data mechanism was related to a function of solely the 2nd predictor.

Table 5.31
MSE_CC/MSE_MLE Ratios, Main Effects (p-NMAR Data)

Parameter               Value    β0      β1      β2      β3      τ
τ                       0        1.952   1.746   1.550   1.430   4.417
                        .005     1.862   1.676   1.404   1.292   2.490
                        .02      1.799   1.610   1.295   1.194   1.676
k                       40       1.936   1.729   1.438   1.311   3.481
                        100      1.806   1.626   1.396   1.300   2.241
Avg. n_i                80       2.008   1.756   1.481   1.334   3.294
                        400      1.735   1.599   1.352   1.277   2.429
Predictor intercorrs.   Low      1.931   1.666   1.479   1.345   2.712
                        High     1.811   1.689   1.354   1.267   3.011
Vmod                    .006     2.002   1.742   1.450   1.305   3.168
                        .03      -       -       -       -       -
Incidence of M. Data    50%      -       -       -       -       -
                        75%      -       -       -       -       -

The lowest relative efficiencies for the estimation of β0 (between 1.10 and 1.25) and β1 (1.20 to 1.30) came from cases with an average study sample size of 400, k = 100, and only 50% missing data on the predictors. Vmod tended to be .03, but not always. The lowest relative efficiencies for β2 and β3 (about 1.05 to 1.10) came from simulated meta-analyses that had an average study sample size of 400, had large correlations among predictors, and used mp-NMAR data. Finally, the lowest relative efficiencies for estimation of τ (1.05 to 1.15) came from cases with an average study sample size of 400, Vmod = .03, and 50% missing data on the predictors.

The only interactions that were substantively large in causing variation in the ratios for τ were identical to those found for the MCAR/MAR data, and are reflected in Tables 5.32 through 5.35.

Table 5.32
MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data
(Incidence of Missing Data x Size of τ)

                                               Ratio for τ
50% Incidence of Missing Data     τ = 0        3.222
                                  τ = .005     1.838
                                  τ = .02      1.413
75% Incidence of Missing Data     τ = 0        5.613
                                  τ = .005     3.142
                                  τ = .02      1.934
Table 5.33
MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data
(Average Study Sample Size x Size of τ)

                                  Ratio for τ
Average n_i = 80     τ = 0        4.857
                     τ = .005     3.518
                     τ = .02      1.911
Average n_i = 400    τ = 0        4.544
                     τ = .005     1.625
                     τ = .02      1.477

Table 5.34
MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data
(Number of Studies x Size of τ)

                           Ratio for τ
k = 40        τ = 0        5.502
              τ = .005     3.072
              τ = .02      1.869
k = 100       τ = 0        3.333
              τ = .005     1.908
              τ = .02      1.482

Table 5.35
MSE_CC/MSE_MLE Ratios for τ, p-NMAR Data
(Incidence of Missing Data x Number of Studies)

                                              Ratio for τ
50% Incidence of Missing Data     k = 40      2.611
                                  k = 100     1.871
75% Incidence of Missing Data     k = 40      4.518
                                  k = 100     2.444

The patterns in the above tables are identical to those found in Tables 5.11 through 5.14 for the MCAR/MAR data. Estimation of τ is especially efficient for the ML method when τ is 0 and there is either 75% missing data on the predictors, a small average study sample size, or a small number of studies. Finally, ML estimation is especially efficient when there is a small number of studies and a large amount of missing data.

MSE_CC to MSE_AC Ratios

Table 5.36 summarizes the MSE ratios across all simulation conditions. As was the case with the MCAR/MAR data, ratios were sometimes practically zero due to poor estimation and extreme outliers in the estimates of β. These cases came about in conditions in which there was 75% missing data, k = 40, and there were high correlations among the predictors.

Table 5.36
MSE_CC/MSE_AC Ratios, p-NMAR Data

           β0      β1      β2      β3      τ
Mean       1.324   1.214   1.045   .924    1.011
Median     1.280   1.219   1.063   .969    1.061
Minimum    .037    .006    .005    .004    .240
Maximum    2.682   -       -       1.343   2.549

On average, available-case methods provide better estimates than complete-case methods for β0 and β1, but little to no improvement in estimation for β2, β3, and τ. The results vary considerably across the values of the hyperparameters; Table 5.37 summarizes the MSE ratios by main effect. Table 5.38 summarizes the ratios across combinations of the different values of missing-data incidence, k, and the size of the predictor intercorrelations. Just as for the MCAR/MAR data, the largest main effects and interactions were among and between these simulation hyperparameters. The patterns of ratios are similar to those for the MCAR/MAR data. The pairwise estimates have the highest relative efficiency when the correlations among the variables are the lowest, and especially when the correlations among the predictors and between the predictors and the outcome are lowest. Except for β2, the ratios are all closer to 1.00 than they were for the MCAR/MAR data.

Table 5.37
MSE_CC/MSE_AC Ratios, Main Effects (p-NMAR Data)

Parameter               Value      β0      β1      β2      β3      τ
τ                       0          1.201   1.132   .969    .856    1.074
                        .005       1.276   1.179   1.025   .903    1.124
                        .02        1.495   1.330   1.142   1.014   1.315
k                       40         1.344   1.100   .947    .831    1.309
                        100        1.304   1.327   1.114   1.017   1.033
Avg. n_i                80         1.493   1.310   1.183   .970    1.376
                        400        1.155   1.117   1.044   .878    .967
Predictor intercorrs.   Low        1.464   1.338   1.159   1.026   1.191
                        High       1.183   1.089   .931    .822    1.151
Vmod                    .006       1.484   1.232   1.059   .914    1.245
                        .03        1.163   1.195   1.031   .934    1.098
Incidence of M. Data    50%        1.141   1.135   1.042   .935    .898
                        75%        1.506   1.292   1.049   .913    1.445
M. Data Mechanism       sp-NMAR    1.378   1.199   1.132   .918    1.154
                        mp-NMAR    1.270   1.228   .959    .930    1.188

Table 5.38
MSE_CC/MSE_AC Ratios, p-NMAR Data
(Incidence of Missing Data x k x Predictor Intercorrelations)

                                         β0      β1      β2      β3
50% Missing      k = 40    Low Corrs.    1.272   1.207   1.119   1.001
Data                       High Corrs.   1.192   1.118   .989    .884
                 k = 100   Low Corrs.    1.101   1.133   1.081   .969
                           High Corrs.   1.000   1.084   .977    .887
75% Missing      k = 40    Low Corrs.    1.795   1.396   1.092   .947
Data                       High Corrs.   1.117   .681    .590    .491
                 k = 100   Low Corrs.    -       -       -       -
                           High Corrs.   -       -       -       -
AC estimation of the slopes has its largest gains in efficiency over the CC estimates when τ = .02, and in these conditions AC estimation sometimes compares favorably with the maximum-likelihood method, as shown in Table 5.39. Maximum-likelihood estimation of τ is on average 29% more efficient for these conditions, and estimation of the intercept 21% more efficient, but the estimation of the slopes is on average only 5 to 10 percent more efficient. In nine of the 192 conditions, at least one of the parameters was marginally less efficiently estimated through ML estimation. In all of these conditions τ was .02; in eight of them, the data were sp-NMAR, and in seven, Vmod was .006.

Table 5.39
MSE_AC/MSE_MLE Ratios for p-NMAR Data, τ = .02
(Excluding k=40/75% Incidence of Missing Data)

           β0      β1      β2      β3      τ
Mean       1.211   1.100   1.057   1.095   1.292
Median     1.160   1.102   1.070   1.104   1.279
Minimum    .976    1.025   .899    .921    1.095
Maximum    1.866   1.71    1.251   1.290   1.515

3. Results: o-NMAR Data

There were a total of 96 combinations of hyperparameters for which the missing-data mechanism was related to the population value of the outcome. Data of this type are referred to as o-NMAR data.

Bias in ML Estimation of β

Maximum-likelihood theory suggests that when the missing-data mechanism is a function of the observed values of the outcome, maximum-likelihood estimates will be asymptotically unbiased. However, as noted in Chapter III, Little's (1992) ML estimates seemed to be biased when the missing-data mechanism was a function of the values of the outcome. Also, as described in Chapter IV, in this investigation the missing-data mechanism was related to the population value of the outcome. The population value is not known to a meta-analyst, however; only the sample values are known.

For the 96 conditions, there were 85 biased estimates of β0, 80 biased estimates of β1, 43 biased estimates of β2, and 62 biased estimates of β3. Table 5.40 shows the average MSE/variance ratio across all conditions for each slope. Except for four outliers (all of which have in common τ = 0 or .005, Vmod = .03, low correlations among predictors, and average n_i = 400), all ratios for β2 and β3 are below 1.12, implying that most of the time the size of the bias is small relative to the sampling error in the estimates. The ratios for β0 and β1 are larger under some conditions. A breakdown of the ratios and biases across conditions is in Table 5.41.

Table 5.40
Maximum-Likelihood MSE/Variance Ratios for β (o-NMAR Data)

The MSE/variance ratios in Table 5.41 are largest for τ = 0; on average, over half the mean-squared error in the estimation of β0 is caused by bias in the estimation of β0. However, on average, the bias is slight, and substantively insignificant relative to the metric of standardized mean differences. The ratios for β1 are less extreme; they are highest for τ = 0, but the average biases are largest for τ = .02. However, for this condition the average ratio is only 1.02, indicating that the bias is generally insignificant relative to the sampling error in β1.

Table 5.41
Maximum-Likelihood MSE/Variance Ratios and Biases for β0 and β1, Main Effects (o-NMAR Data)
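Several findings in this section turn on exactly how deletion was tied to the outcome. The actual mechanism is the one defined in Chapter IV and is not reproduced here; the following is only a rough, hypothetical stand-in showing the general shape of an o-NMAR rule, with all names and constants invented for illustration.

    import numpy as np

    def delete_o_nmar(x, theta, base_rate, strength, rng):
        """Delete entries of predictor x with probability that rises as the
        population effect size theta of the study falls (hypothetical rule)."""
        z = (theta - theta.mean()) / theta.std()
        logit = np.log(base_rate / (1.0 - base_rate)) - strength * z
        p_missing = 1.0 / (1.0 + np.exp(-logit))   # averages roughly base_rate
        x = x.copy()
        x[rng.random(x.size) < p_missing] = np.nan
        return x

    rng = np.random.default_rng(1)
    theta = rng.normal(0.5, 0.1, size=100)   # hypothetical population effect sizes
    x2 = delete_o_nmar(rng.normal(size=100), theta, 0.5, 1.5, rng)

Because deletion here depends on the population value of the outcome rather than on anything observed, a likelihood that conditions only on the observed data cannot be expected to remove the resulting distortion completely.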
Bias in ML Estimation of τ

An ANOVA was conducted to determine the extent to which the seven simulation hyperparameters were related to variation in the size of the bias in the estimates of τ. All seven simulation hyperparameters and the two-way interactions between those hyperparameters were significant; however, the only substantive main effects came from the missing-data pattern, the number of studies, and the value of τ. Table 5.42 shows the average ratios and biases across combinations of these hyperparameters. There was a substantively important interaction effect between k and τ: large values of k lessen the negative bias in τ. There was one other substantively large interaction, shown in Table 5.43, between average study sample size and τ: large average study sample sizes both decrease the positive bias in τ for τ = 0 and decrease the negative bias in τ for τ = .02. These results are very similar to those for the MCAR/MAR data; in general, the biases are quite small, especially when k = 100.

Table 5.42
Maximum-Likelihood MSE/Variance Ratios and Biases for τ (o-NMAR Data)

                                          Ratio for τ   Bias in τ
50% Incidence       k = 40    τ = 0       1.173         .0015
of Missing Data               τ = .005    1.151         -.0016
                              τ = .02     1.302         -.0067
                    k = 100   τ = 0       1.385         .0014
                              τ = .005    1.042         -.0005
                              τ = .02     1.063         -.0024
75% Incidence       k = 40    τ = 0       1.150         .0014
of Missing Data               τ = .005    1.349         -.0021
                              τ = .02     1.754         -.0105
                    k = 100   τ = 0       1.353         .0016
                              τ = .005    1.098         -.0008
                              τ = .02     1.189         -.0044

Table 5.43
Maximum-Likelihood MSE/Variance Ratios and Biases for τ, o-NMAR Data
(Average Study Sample Size x Size of τ)

                                  Ratio for τ   Bias in τ
Average n_i = 80     τ = 0        1.252         .0024
                     τ = .005     1.030         -.0008
                     τ = .02      1.323         -.0076
Average n_i = 400    τ = 0        1.278         .0006
                     τ = .005     1.290         -.0017
                     τ = .02      1.332         -.0045

Bias in AC Estimation of β

Significant biases in the AC estimators were more frequent for the o-NMAR data than they were for any other type of data. For the 96 conditions, estimates of β0 were biased 95 times, estimates of β1 were biased 90 times, estimates of β2 were biased 86 times, and estimates of β3, 84 times. A summary of the MSE/variance ratios, which show the extent of the bias relative to the sampling error of the estimates, is in Table 5.44. These ratios are higher than they were for the other data types in the previous sections. A summary of the average biases is in Table 5.45, and the main effects of the simulation hyperparameters on ratio and bias are shown in Tables 5.46 and 5.47. However, as can be seen in Table 5.45, there were many outliers with regard to bias for β1, β2, and β3, which makes interpretation of the factors that cause bias in the estimators of those parameters difficult. There were fewer strong outliers for β0; it can be concluded from Table 5.46 that bias is lower, and the size of the bias relative to sampling error is lower, for high values of τ and low values of Vmod. These findings mimic those found for the MCAR/MAR and p-NMAR data.
Table 5.44
Available-Case MSE/Variance Ratios for β (o-NMAR Data)

           β0 Ratio   β1 Ratio   β2 Ratio   β3 Ratio
Mean       2.081      1.125      1.095      1.085
Median     1.425      1.058      1.034      1.032
Minimum    1.000      1.000      1.000      1.000
Maximum    8.370      1.639      1.821      1.529

Table 5.45
Available-Case Biases for β (o-NMAR Data)

           Bias in β0   Bias in β1   Bias in β2   Bias in β3
Mean       -.0400       .1083        .1333        .0910
Median     -.0360       .0985        .1161        .0730
Lowest     -.1474       -.0322       -.0530       -1.085

Table 5.46
Available-Case MSE/Variance Ratios and Biases for β0 and β1, Main Effects (o-NMAR Data)

Parameter               Value    Ratio for β0   Bias in β0   Ratio for β1   Bias in β1
τ                       0        2.658          -.0540       1.149          .1027
                        .005     2.091          -.0390       1.124          .1003
                        .02      1.493          -.0260       1.101          .1229
k                       40       1.396          -.0410       1.037          .1087
                        100      2.766          -.0390       1.213          .1078
Avg. n_i                80       1.581          -.0380       1.087          .1167
                        400      2.580          -.0420       1.162          -
Predictor intercorrs.   Low      2.173          -.0440       1.010          -
                        High     1.989          -.0360       1.152          -
Vmod                    .006     1.320          -.0190       1.058          -
                        .03      2.841          -.0610       1.192          -
Incidence of M. Data    50%      2.058          -.0300       1.123          -
                        75%      2.103          -.0500       1.125          -

Table 5.47
Available-Case MSE/Variance Ratios and Biases for β2 and β3, Main Effects (o-NMAR Data)

Parameter               Value    Ratio for β2   Bias in β2   Ratio for β3   Bias in β3
τ                       0        1.156          .1753        1.063          .0562
                        .005     1.092          .1135        1.073          -.1030
                        .02      1.036          .1112        1.118          -.2264
k                       40       1.048          .1679        1.032          -.0800
                        100      1.142          .0988        1.137          -.1019
Avg. n_i                80       1.049          .1280        1.060          -.1227
                        400      1.141          .1387        1.109          -.0590
Predictor intercorrs.   Low      1.106          .1147        1.066          .0313
                        High     1.084          .1520        1.103          -
Vmod                    .006     1.031          .1251        1.071          -
                        .03      1.158          .1416        1.098          -
Incidence of M. Data    50%      1.108          .1727        1.094          -
                        75%      1.082          .0940        1.077          -

Bias in AC Estimation of τ

Table 5.48 summarizes the sizes of the average bias in τ for the hyperparameters most strongly related to the size of the bias. Unlike for the MCAR/MAR and p-NMAR data, the percentage of missing data was relatively unimportant. The hyperparameter Vmod was the third most important hyperparameter, having roughly the same size of effect as variation in τ.

Table 5.48
Available-Case MSE/Variance Ratios and Biases for τ (o-NMAR Data)

                                               Ratio for τ   Bias in τ
Vmod = .006   Average n_i = 80    τ = 0        1.371         .0075
                                  τ = .005     1.153         .0057
                                  τ = .02      1.017         .0018
              Average n_i = 400   τ = 0        1.333         .0026
                                  τ = .005     1.031         .0011
                                  τ = .02      1.018         -.0004
Vmod = .03    Average n_i = 80    τ = 0        1.411         .0091
                                  τ = .005     1.194         .0080
                                  τ = .02      1.079         .0059
              Average n_i = 400   τ = 0        1.342         .0032
                                  τ = .005     1.073         .0022
                                  τ = .02      1.056         .0018

Average study sample size was the most important predictor, as was the case with the MCAR/MAR and p-NMAR data. When the average study sample size was only 80, there were very large biases in τ for τ = 0 and τ = .005. The bias was even higher when Vmod = .03. Biases were generally small for estimates coming from simulated meta-analyses with an average study sample size of 400, especially when the predictors explained only a small amount of variation in the outcome (Vmod = .006).

Bias in CC Estimation of β

Out of the 96 conditions, there were 64 instances of a significant bias for β0, 46 instances of significant bias for β1, 52 instances for β2, and 63 instances for β3. A summary of the overall MSE/variance ratios is in Table 5.49 and a summary of the biases is in Table 5.50. For the MCAR/MAR and p-NMAR data, the ratios and biases were typically very small; this is not the case for the o-NMAR data. Estimation of the intercept in particular can be very biased, and that bias can be large relative to the sampling error of the estimator. This is expected given that the true value of the outcome was strongly related to the missing-data mechanism.
All ratios are higher than they were for the MCAR/MAR and p-NMAR data.

Table 5.49
Complete-Case MSE/Variance Ratios for β (o-NMAR Data)

           Ratio for β0   Ratio for β1   Ratio for β2   Ratio for β3
Mean       2.393          1.020          1.041          1.072
Median     1.357          1.006          1.012          1.025
Minimum    1.000          1.000          1.000          1.000
Maximum    11.458         1.228          1.421          1.711

Table 5.50
Complete-Case Biases for β (o-NMAR Data)

           Bias in β0   Bias in β1   Bias in β2   Bias in β3
Mean       .0509        -.0484       -.0703       -.1111
Median     .0402        -.0397       -.0534       -.1088
Lowest     -.0100       -.1700       -.2500       -.3100
Highest    .1500        .0700        .1200        .0300

Bias in CC Estimation of τ

Patterns in the bias in the CC estimates of τ are similar to those found for the p-NMAR data. Table 5.51 summarizes the sizes of the bias over τ, k, incidence of missing data, and average n_i. Biases in the estimation of low values of τ were very large, sometimes over .01, in cases where the average study sample size was small. There were substantial interaction effects between average study sample size and both the incidence of missing data and the number of studies. A high proportion of missing data or a low value of k exacerbated the bias in the estimation of τ.

Table 5.51
Complete-Case MSE/Variance Ratios and Biases for τ (o-NMAR Data)

                                         Ratio for τ   Bias in τ
τ = 0            Average n_i = 80        1.400         .0119
                 Average n_i = 400       1.383         .0022
τ = .005         Average n_i = 80        1.183         .0094
                 Average n_i = 400       1.025         .0005
τ = .02          Average n_i = 80        1.020         .0030
                 Average n_i = 400       1.037         -.0052
k = 40           Average n_i = 80        1.193         .0117
                 Average n_i = 400       1.173         -.0002
k = 100          Average n_i = 80        1.209         .0045
                 Average n_i = 400       1.347         -.0015
50% Incidence    Average n_i = 80        1.205         .0052
of Missing Data  Average n_i = 400       1.285         -.0011
75% Incidence    Average n_i = 80        1.197         .0110
of Missing Data  Average n_i = 400       1.234         -.0005

MSE_CC to MSE_MLE Ratios

Table 5.52 summarizes the MSE ratios across all simulation conditions. The results for β1, β2, β3, and τ are similar to those for the MCAR/MAR and NMAR data, while the results for β0 are more varied. Estimation of β0 is usually over twice as efficient, but there are instances in which CC estimation is more efficient than ML estimation.

Table 5.52
MSE_CC/MSE_ML Ratios (o-NMAR Data)

           β0      β1      β2      β3      τ
Mean       2.332   1.612   1.261   1.248   2.505
Median     2.204   1.490   1.230   1.210   2.161
Minimum    .672    1.041   .868    .845    .924
Maximum    5.620   2.624   1.988   1.829   8.516

The relative efficiency varied depending on the simulation conditions. Table 5.53 shows the main effects of each of the seven simulation hyperparameters. There were no substantively interesting interaction effects among the simulation hyperparameters for any β or for τ. The effects are very similar to those for the MCAR/MAR and p-NMAR data: smaller values of τ are related to higher relative efficiencies for ML estimation of the slopes and τ, as are smaller average sample sizes and larger proportions of missing data. The primary difference in the o-NMAR data is that the effect of the size of τ on relative efficiency is reversed for estimation of β0.

Table 5.53
MSE_CC/MSE_MLE Ratios, Main Effects (o-NMAR Data)

Parameter               Value    β0      β1      β2      β3      τ
τ                       0        1.669   1.689   1.416   1.401   3.766
                        .005     2.178   1.616   1.258   1.242   2.275
                        .02      3.148   1.531   1.111   1.110   1.475
k                       40       2.148   1.671   1.290   1.248   3.028
                        100      2.515   1.553   1.232   1.247   1.983
Avg. n_i                80       2.288   1.727   1.336   1.289   2.969
                        400      2.375   1.497   1.187   1.206   2.041
Predictor intercorrs.   Low      2.164   1.635   1.306   1.287   2.339
                        High     2.499   1.589   1.217   1.208   2.672
Vmod                    .006     2.675   1.687   1.290   1.248   2.827
                        .03      1.988   1.537   1.232   1.247   2.184
Incidence of M. Data    50%      1.978   1.349   1.171   1.168   1.922
                        75%      2.685   1.876   1.352   1.327   3.088
ML estimation is over three times as efficient as CC estimation when τ = .02, but only about one-and-a-half times as efficient when τ = 0. The reason for this stems from the fact that CC estimation of data that are NMAR because of variation in the outcome is worse the more unexplained variation there is. When τ = 0, the only variation in the population value of the outcome stems from variation on the predictors, whose values are known. When τ = .02, most variation in the outcome stems from random error.

The lowest relative efficiencies for the estimation of β0 (there were six conditions between .67 and 1.20, four under 1.00) had much in common; all had τ = 0, as expected. For all but one of the six conditions, k = 100, Vmod = .03, average n_i = 400, there was 75% missing data, and there were low correlations between the predictors. The lower relative efficiencies for β1 (1.04 to 1.25) came from cases with average n_i = 400 and 75% missing data. There were eleven conditions for which the relative efficiency of either β2 or β3 (or both) was less than 1.00. In these conditions, it was generally the case that τ = .02, Vmod = .006, the average n_i = 400, and there were high correlations between predictors. Finally, there were only two conditions for which the relative efficiency of τ was less than 1.00; all of the lower efficiencies for τ (i.e., below 1.15) tended to come from conditions in which τ was .005 or .02 and there was 75% missing data on the predictors.

MSE_CC to MSE_AC Ratios

Table 5.54 summarizes the MSE ratios across all simulation conditions.

Table 5.54
MSE_CC/MSE_AC Ratios (o-NMAR Data)

           β0      β1      β2      β3      τ
Mean       1.564   1.158   .913    .883    1.073
Median     1.466   1.164   .965    .932    .995
Minimum    .101    .114    .059    .059    .245
Maximum    5.277   2.175   1.355   1.290   2.382

On average, available-case methods provide much better estimates than complete-case methods for β0, slightly better estimates for β1 and τ, and slightly inferior estimates for β2 and β3. The results vary considerably across the values of the hyperparameters; Table 5.55 summarizes the MSE ratios by main effect. The patterns of ratios are similar to those for the MCAR/MAR data. The pairwise estimates have the highest relative efficiency when the correlations among the variables are the lowest, and especially when the correlations among the predictors and between the predictors and the outcome are lowest.

Table 5.55
MSE_CC/MSE_AC Ratios, Main Effects (o-NMAR Data)

Parameter               Value    β0      β1      β2      β3      τ
τ                       0        .931    1.058   .821    .818    1.068
                        .005     1.374   1.171   .946    .910    1.002
                        .02      2.387   1.243   .971    .922    1.150
k                       40       1.365   1.065   .818    .783    1.205
                        100      1.763   1.250   1.007   .983    .941
Avg. n_i                80       1.610   1.293   .995    .949    1.299
                        400      1.518   1.022   .830    .817    .848
Predictor intercorrs.   Low      1.597   1.242   .956    .941    1.054
                        High     1.531   1.073   .879    .825    1.092
Vmod                    .006     2.002   1.275   .970    .923    1.171
                        .03      1.126   1.040   .855    .844    .976
Incidence of M. Data    50%      1.436   1.089   .930    .909    .818
                        75%      1.692   1.226   .895    .857    1.328

AC estimation of the intercept is far better for τ = .02 and Vmod = .006. For other values of τ, or for Vmod = .03, AC estimation of β0 is only marginally better than CC estimation. There are no hyperparameters that have as large an effect on the relative efficiency for estimation of β1; however, there are moderate-sized effects for all of the simulation hyperparameters. Similarly, there are no hyperparameters that have a large effect on the relative efficiencies for estimation of β2 or β3, but there are moderate effects
Similarly, there are no hyperparameters that have a large effect on the relative efficiencies for estimation of B2 or B3, but there are moderate effects 115 for k, average ni, Vmod and size of predictor intercorrelations. There were substantively interesting interaction efi‘ects for the slopes between k and incidence of missing data, as shown in Table 5 .56. Table 5.56 MSECC/MSEAC Ratios for NMAR Data (Incidence of Missing Data 1 k) Bo 131 I32 133 50% Incidence k = 40 1.373 1.120 .942 .903 ofMissing Data k = 100 1.499 1.059 .918 .915 75% Incidence k= 40 1.357 1.010 .694 .663 ofMiss' Data k = 100 2.027 1.442 1.096 1.052 Available-case estimation of B0 and B1 is especially good relative to complete-case estimation when there are a large amount of studies and a large amount of missing data, while AC estimation of B2 and B3 is especially poor when there are a small amount of studies and a large amount of missing data. While in general the results of the above sections suggest that for o-NMAR data, ML estimation is superior to AC estimation, there is a subset of conditions for which AC estimation is almost as eficient as ML estimation. 116 Table 5.57 MSEAC/MSEMU: Ratios for o-NMAR Data, T = .02 (Excluding k=40/75% Missing Data) 90 91 92 93 17 Mean 1.333 1.141 1.061 1.126 1.312 Median 1.196 1.121 1.046 1.088 1.305 Minimum .980 1.034 .894 .905 .990 2.079 __1405 1510 f 1.668 1.570 __ For a low incidence of missing data or a large number of studies, AC estimation ol the slopes is on average only marginally more eflicient than ML estimation. In some cases (all of which have in common a k of 100) estimation of some slopes is slightly worse using the ML method. 117 4. Estimation of the Population Mean One last parameter worth considering is the population mean of all of the efl'ect sizes. Whether this parameter is of interest depends on the purposes of the meta-analysis. Given estimates of 90: B1, B2, and B3, the meta-analyst can determine the expected value for any typical study, given values of the study predictors X1, X2, and X3. Thus, for example, the mean effect size (measuring eflicacy of treatment) might not be of interest in a meta-analysis where the first predictor is length of treatment (in weeks), the 2nd predictor is average contact time per week, and the 3rd predictor is the age of the subjects. Average treatment efficacy may be of less interest than how successful the treatment might be given specific values of the predictors (i.e., specific lengths of programs, ages of youths, and average contact time per week). On the other hand, there will be times when the meta-analyst is interested in the average efl‘ect, regardless of what the values of the predictors might be. For instance, in the example above, it is certainly possible that the meta-analyst would want to know whether the average treatment received by a juvenile delinquent today tends to help, irrespective of what the treatment might be. Because all predictors in the simulation had a population mean of zero (note: this is different than saying that all were “mean-centered”), estimation of the mean for all three methods is essentially identical to the estimation of B0. As noted early in the first section of this chapter, estimation of B0 was essentially unbiased for all three estimation procedures (ML estimation, AC estimation, and CC estimation). This was theoretically expected. 118 Bias in Mean Estimation: MAR Data Theoretically, there should be no bias in the ML estimator of the mean. 
Bias in Mean Estimation: MAR Data

Theoretically, there should be no bias in the ML estimator of the mean. However, given the missing-data mechanism in question, there should be a positive bias in the CC estimator. There is no theory to suggest that the AC estimator should be unbiased (though there is also no theory to suggest whether any existing bias should be positive or negative). These expectations were borne out in the analyses. Only one of the 96 combinations of hyperparameters examined for the MAR data showed a statistically significant bias (p < .01) in ML estimation of the mean. However, all 96 conditions led to bias in the CC estimation of the mean, and 70 of the 96 conditions led to bias in the AC estimation of the mean. As shown in Table 5.58, the size of the bias relative to the standard error of estimate (examined by considering the empirical MSE/variance ratios) was minimal for ML estimation and often trivial for AC estimation. However, for CC estimation the ratios were often considerable, relative to the size of the mean-squared error of estimation.

Table 5.58
MSE/Variance Ratios for Estimation of the Mean (MAR Data)

           ML      AC      CC
Mean       1.001   1.063   2.634
Median     1.001   1.015   2.137
Minimum    1.000   1.000   1.118
Maximum    1.010   3.706   10.035

While the ratios above seem large, the actual size of the bias for the CC estimator of the mean ranged from small to moderate. This is easily explained: the mean was usually measured with extreme precision. The average CC bias across the conditions ranged from +.02 to +.12, with an average of +.06. In the effect size metric, the accepted guidelines are that a small effect size is .2, a medium effect size is .5, and a large effect size is .8 (Cohen, 1988). The importance of the size of this bias will depend on what the mean effect size actually is. The difference in effect size between an estimate of .62 and an estimate of .50 is noticeable, but will perhaps be substantively unimportant to some researchers. On the other hand, the difference in effect size between an estimate of .20 and an estimate of .08 might be very important. Conceivably, it could mean the difference between wide-scale implementation of a treatment and sending it back to the drawing board.

This direction of bias was expected, as the missing-data mechanism was such that studies were more likely to be missing data when they had lower values on the 1st predictor. Thus, the observed studies were more likely to have higher values on the 1st predictor. Given the positive relationship between the 1st predictor and the outcome, the complete-case studies were more likely to have higher effect sizes than the population mean.

Bias in Mean Estimation: p-NMAR Data

Theoretically, there should be bias in all three estimators of the mean when there is this type of missing data. These expectations were borne out in the analysis, though ML estimation clearly excelled. There were two types of p-NMAR data: sp-NMAR data (strongly NMAR) and mp-NMAR data (moderately NMAR). Only 16 of the 96 combinations of hyperparameters examined for the sp-NMAR data led to a statistically significant bias in the ML estimator of the mean. However, all 96 conditions led to statistically significant biases in both the CC and AC estimators of the mean. As expected, the biases were somewhat less severe for the mp-NMAR data. Only three of the 96 combinations of hyperparameters examined for the mp-NMAR data led to a statistically significant bias in the ML estimator of the mean. Again, the CC estimator of the mean was biased across all 96 conditions. However, the AC estimator of the mean was biased in only 86 of those conditions.
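Counts such as these rest on a per-condition significance test of the bias. The dissertation's exact test statistic is not shown in this chapter; a simple stand-in (names and numbers invented for illustration) is a one-sample t-test of the replicated estimates of the mean against the known population value.

    import numpy as np
    from scipy import stats

    def biased_at_01(estimates, true_value):
        """Flag a simulation condition as showing significant bias at p < .01."""
        _, p = stats.ttest_1samp(estimates, popmean=true_value)
        return p < 0.01

    rng = np.random.default_rng(2)
    draws = rng.normal(0.52, 0.05, size=500)     # stand-in estimates; true mean .50
    print(biased_at_01(draws, true_value=0.50))  # True: the +.02 bias is detectable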
As shown in Tables 5.59 and 5.60, the size of the bias relative to the standard error of estimate (examined by considering the empirical MSE/variance ratios) was minimal for EM estimation and often trivial for AC estimation. However, for CC estimation it was often considerable, relative to the size of the mean-squared error of estimation.

Table 5.59
MSE/Variance Ratios for Estimation of the Mean (sp-NMAR Data)

           EM      AC      CC
Mean       1.004   1.347   3.491
Median     1.003   1.190   2.743
Minimum    1.000   1.003   1.175
Maximum    1.035   2.902   15.481

Table 5.60
MSE/Variance Ratios for Estimation of the Mean (mp-NMAR Data)

These ratios are similar to those found in Table 5.58. The average CC bias across the sp-NMAR conditions ranged from +.03 to +.14, with an average of +.07, while the AC biases for the same set of conditions ranged from -.05 to -.01, with an average of -.02. The average CC bias across the mp-NMAR conditions ranged from +.01 to +.07, with an average of +.04, while the AC biases for those conditions ranged from -.03 to +.01, with an average of -.01. Relative to the effect size metric, only the biases for the CC estimator are likely to be high enough to lead to substantively incorrect conclusions.

Bias in Mean Estimation: o-NMAR Data

As with the other NMAR data, there should theoretically be bias in all three estimators of the mean. These expectations were borne out in the analysis; again, ML estimation was clearly superior. In 31 of the 96 combinations of hyperparameters examined for the o-NMAR data, there was a statistically significant bias (p < .01) in the ML estimation of the mean. Again, all 96 conditions led to bias in the CC estimation of the mean, and most (89 of the 96 conditions) led to bias in the AC estimation of the mean. As shown in Table 5.61, the sizes of the biases relative to the standard error of estimate were greater for this condition than for any of the others. The bias was still minimal for ML estimation. However, it was small to moderate for AC estimation, and often very large for CC estimation.

Table 5.61
MSE/Variance Ratios for Estimation of the Mean (o-NMAR Data)

           ML      AC      CC
Mean       1.007   1.381   8.816
Median     1.003   1.174   7.763
Minimum    1.000   1.000   1.597
Maximum    1.065   -       28.31

The size of the bias ranged from +.05 to +.23 for the CC estimator of the mean, with an average bias of +.12. The bias of the AC estimator of the mean ranged from -.07 to zero, with an average bias of -.02. Again, the bias in the CC estimator is considerable enough relative to the effect size metric that it might lead researchers to conclusions that are, substantively speaking, far from correct.

5. Results: Dichotomous Predictor with Missing Data

The EM maximum-likelihood estimation method derived in Chapter III assumes that each predictor is normally distributed. However, it is generally accepted that EM estimation does not suffer when there are completely observed dichotomous variables as predictors. Nevertheless, there will be many times in meta-analysis when there are dichotomous variables of import that have some degree of missing data. One way to handle this problem is to derive a new EM estimation method for use in meta-analysis, one which combines continuous predictors and predictors with a multinomial distribution. However, that is a complex task, and the assumption of normality in maximum-likelihood estimation is often not that stringent.
Thus, before pursuing it, it is worthwhile to investigate the possibility that EM estimation with dichotomous predictors with missing data is comparable to EM estimation with continuous predictors with regard to bias and improvement in MSE over CC estimation.

Computing time was not available to investigate the effect of having a dichotomous predictor or multiple dichotomous predictors in EM estimation of the meta-analytic model. The task is worthy of a large simulation study in itself: not only would the seven variables already varied in the present investigation be considered as key simulation hyperparameters, but consideration would also have to be given to the proportions involved for the dichotomous variables. There might be different results for a dichotomous variable for which the division between the two groups is 50%/50% and a variable for which the division is 90%/10%. There is also the problem that, depending on the missing-data percentages and the number of studies in the simulated meta-analyses, there would be some simulations for which no complete-case analysis using all variables would be possible (because among the complete cases, all values for that variable would be identical). This would especially be a risk when there are small numbers of studies, large percentages of missing data, a large population proportion in one group or the other (such as a 90%/10% split), or a missing-data mechanism strongly dependent on the value of one of the predictors or the outcome.

Because of these concerns, only a preliminary investigation of the question was conducted. To avoid the problems described in the previous paragraph, the meta-analyses were limited to instances in which k = 100 and the missing-data mechanism was MCAR. Within these constraints, the values of Vmod, average n_i, τ, the strength of the intercorrelations between predictors, and the missing-data pattern were varied, leading to 48 combinations of hyperparameters. In these simulated meta-analyses, the third variable was treated as dichotomous; it was made dichotomous by following the procedure for generating data described in Chapter IV and then, once the values for the third variable were generated, changing each value to -1.0 if the value was negative and to +1.0 if the value was positive. Obviously, in future investigations of this issue, different values of k, different missing-data mechanisms, and different population proportions for the dichotomous variable should be considered.
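The recoding just described is simple enough to state exactly. A minimal sketch follows, with illustrative array names; the rule itself (negative values to -1.0, positive values to +1.0) is the one given above.

    import numpy as np

    rng = np.random.default_rng(3)
    x3 = rng.normal(size=100)              # continuous draws for the third predictor
    x3 = np.where(x3 < 0.0, -1.0, 1.0)     # recode: negative -> -1.0, positive -> +1.0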
Bias in ML Estimation of β

It was expected that, except for the slope for the third predictor, the patterns of bias would be what they were in Section 1 of this chapter. This indeed was the case; in the 48 conditions, there are 4 instances of bias for the intercept, 2 instances for β1, 4 instances for β2, and 12 instances for β3 (p < .01). The significant biases occurred more often when the average study sample size was 400 and the missing-data percentage was 75%. Table 5.62 contains the average, median, minimum, and maximum empirical MSE/variance ratios. A histogram of the MSE/variance ratios for β3 is in Figure 5.4. The table and the figure show that even though there are more significant biases for estimation of β3, the sizes of the biases remain small relative to the amount of sampling error in the estimators.

Table 5.62
Maximum-Likelihood MSE/Variance Ratios for β (MCAR Data w/Dichotomous Predictor)

           β0      β1      β2      β3
Mean       1.002   1.002   1.004   1.006
Median     1.001   1.001   1.002   1.003
Minimum    1.000   1.000   1.000   1.000
Maximum    1.010   1.015   1.033   1.025

Figure 5.4
Ratio of MSE to Variance for β3 (w/Dichotomous Predictor). [Histogram; the ratios cluster between 1.000 and about 1.025.]

Bias in ML Estimation of τ

Table 5.63 shows the ratios and biases for the ML estimates across the different values of the simulation hyperparameters; the average ratio across all conditions was 1.260. These results are almost identical to those for the MCAR/MAR data for k = 100 in Section 1 of this chapter.

Table 5.63
Maximum-Likelihood MSE/Variance Ratios for τ (MCAR Data w/Dichotomous Predictor)

Bias in AC Estimation of β

Of the 48 combinations of simulation hyperparameters, there were 2 conditions for which there was a significant bias in the estimation of β0, 4 conditions for estimation of β1, 8 conditions for estimation of β2, and 14 conditions for β3. The proportions of cases for which there were significant biases for each parameter are roughly equivalent to those for the continuous MCAR data, as expected, given that there is no assumption in AC estimation that the predictors have continuous distributions. Table 5.64 demonstrates that the amount of bias in the AC estimators was similar to that in the MCAR data with continuous predictors: very small relative to the size of the variance of the estimates, even in those few cases where the bias was statistically significant.

Table 5.64
Available-Case MSE/Variance Ratios for β (MCAR Data w/Dichotomous Predictor)

Bias in AC Estimation of τ

Table 5.65 summarizes the sizes of the average bias in τ for the hyperparameters that were most strongly related to the size of the bias. The hyperparameters selected (missing-data incidence, average study sample size, and the value of τ) are the same as for the MCAR data with continuous predictors. The findings are similar to those for the MCAR data with continuous predictors as well. On average, the biases are substantively small except for τ = 0 or .005 and average n_i = 80. Biases greater than zero are not unexpected for these conditions, due to the fact that all estimates of τ below zero were set to 0 before calculations commenced.

Table 5.65
Available-Case MSE/Variance Ratios and Biases for τ (MCAR Data w/Dichotomous Predictor)

                                                  Ratio for τ   Bias in τ
50% Incidence      Average n_i = 80    τ = 0      1.436         .0066
of Missing Data                        τ = .005   1.147         .0047
                                       τ = .02    1.009         .0015
                   Average n_i = 400   τ = 0      1.354         .0020
                                       τ = .005   1.005         .0003
                                       τ = .02    1.006         -.0006
75% Incidence      Average n_i = 80    τ = 0      1.392         .0061
of Missing Data                        τ = .005   1.119         .0042
                                       τ = .02    1.004         -.0002
                   Average n_i = 400   τ = 0      1.328         .0020
                                       τ = .005   1.004         .0003
                                       τ = .02    1.026         -.0014

Bias in CC Estimation of β and τ

As expected by estimation theory, there was little bias in the complete-case estimators of β for the MCAR data; of the 48 conditions, there were no instances of statistically significant bias for β1 and only one instance each for β0, β2, and β3. Bias in τ was strong, as it was in the MCAR data with continuous predictors, as can be seen in Table 5.66. The bias is especially strong for τ = 0 or .005, a low average sample size, and large amounts of missing data.

Table 5.66
Complete-Case MSE/Variance Ratios for τ (MCAR Data w/Dichotomous Predictor)

                                                  Ratio for τ
50% Incidence      Average n_i = 80    τ = 0      1.428
of Missing Data                        τ = .005   1.153
                                       τ = .02    1.011
                   Average n_i = 400   τ = 0      1.412
                                       τ = .005   1.002
                                       τ = .02    1.001
75% Incidence      Average n_i = 80    τ = 0      1.395
of Missing Data                        τ = .005   1.184
                                       τ = .02    1.029
                   Average n_i = 400   τ = 0      1.379
                                       τ = .005   1.016
                                       τ = .02    1.002
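The positive biases at τ = 0 in Tables 5.65 and 5.66 are exactly what the truncation rule noted above would produce. A small numerical illustration follows; the spread of the untruncated estimates is invented for the example.

    import numpy as np

    rng = np.random.default_rng(4)
    tau_hat = rng.normal(0.0, 0.004, size=100_000)  # estimates around a true tau of 0
    truncated = np.maximum(tau_hat, 0.0)            # the "set negatives to zero" rule
    print(tau_hat.mean())     # approximately 0
    print(truncated.mean())   # approximately +.0016 (= .004/sqrt(2*pi)): positive bias

Zeroing the negative half of a distribution centered at zero necessarily shifts its mean upward, which is why the bias is concentrated in the τ = 0 and τ = .005 conditions.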
MSE_CC to MSE_MLE Ratios

Table 5.67 summarizes the MSE ratios across all simulation conditions. As was the case with the MCAR data with continuous predictors, on average maximum-likelihood estimation provides large gains in efficiency, especially for the estimation of τ. As shown in Table 5.68, the actual gain in efficiency varies depending on the values of the simulation hyperparameters.

Table 5.67
MSE_CC/MSE_MLE Ratios (MCAR Data w/Dichotomous Predictor)

          β0     β1     β2     β3     τ
Mean     1.621  1.622  1.269  1.312  2.281
Median   1.468  1.515  1.243  1.245  1.818
Minimum  1.159  1.192  1.063  1.083  1.066
Maximum  2.160  2.152  1.654  1.682  5.682

Table 5.68
MSE_CC/MSE_MLE Ratios, Main Effects (MCAR Data w/Dichotomous Predictor)

Parameter     Value   β0     β1     β2     β3     τ
τ             0       1.629  1.637  1.367  1.410  3.432
              .005    1.636  1.638  1.263  1.316  1.959
              .02     1.599  1.591  1.176  1.211  1.451
Avg. ni       80      1.686  1.669  1.289  1.312  2.591
              400     1.556  1.575  1.248  1.312  1.971
Predictor     Low     1.587  1.619  1.311  1.348  2.171
intercorrs.   High    1.655  1.625  1.226  1.276  2.390
Vmod          .006    1.688  1.684  1.268  1.291  2.461
              .03     1.554  1.560  1.269  1.334  2.100
Incidence     50%     1.361  1.351  1.176  1.199  1.861
of M. Data    75%     1.882  1.893  1.361  1.425  2.700

As with the data with all continuous predictors, the superiority of MLE to CC estimation is greater when there is more missing data. The improvement is greatest for the estimation of β0, β1, and τ. The relative efficiency of the EM estimates of β2, β3, and τ is lower for large values of τ; this finding mirrors those in the previous sections as well.

Unlike in the previous sections, there were no substantively large interactions between simulation hyperparameters regarding the size of the relative efficiencies of the βs. There were three substantively large interactions for the relative efficiency for τ. Table 5.69 demonstrates that the interaction between the size of τ and the incidence of missing data is such that when there is 75% missing data, the ML method is especially efficient for low values of τ, particularly τ = 0. Table 5.70 shows that for a small average study sample size, the ML method is especially efficient for τ = 0 and τ = .005; for a larger average study sample size, the ML method is only especially efficient (i.e., three to four times as efficient) for τ = 0. Table 5.71 shows that for a small average study sample size, the ML method is especially efficient when there is 75% missing data.

Table 5.69
MSE_CC/MSE_MLE Ratios for τ (Size of τ × Incidence of Missing Data), MCAR Data w/Dichotomous Predictor

                                           Ratio for τ
50% Incidence of Missing Data  τ = 0         2.684
                               τ = .005      1.590
                               τ = .02       1.309
75% Incidence of Missing Data  τ = 0         4.179
                               τ = .005      2.328
                               τ = .02       1.593

Table 5.70
MSE_CC/MSE_MLE Ratios for τ (Size of τ × Average Study Sample Size), MCAR Data w/Dichotomous Predictor

                               Ratio for τ
Average ni = 80   τ = 0          3.678
                  τ = .005       2.635
                  τ = .02        1.460
Average ni = 400  τ = 0          3.186
                  τ = .005       1.484
                  τ = .02        1.443

Table 5.71
MSE_CC/MSE_MLE Ratios for τ (Incidence of Missing Data × Average Study Sample Size), MCAR Data w/Dichotomous Predictor

                                    Ratio for τ
50% Incidence    Average ni = 80      1.989
of Missing Data  Average ni = 400     1.734
75% Incidence    Average ni = 80      3.193
of Missing Data  Average ni = 400     2.207

MSE_CC to MSE_AC Ratios

Table 5.72 summarizes the MSE ratios across all simulation conditions.

Table 5.72
MSE_CC/MSE_AC Ratios (MCAR Data w/Dichotomous Predictor)
[The cell values of this table did not survive extraction.]

As with the MCAR data with all continuous predictors, available-case methods provide better estimates than complete-case methods for β0 and β1, and approximately equally efficient estimates for β2, β3, and τ. Table 5.73 summarizes the MSE ratios by main effect.

Table 5.73
MSE_CC/MSE_AC Ratios, Main Effects (MCAR Data w/Dichotomous Predictor)

Parameter     Value   β0     β1     β2     β3     τ
τ             0       1.200  1.188  .895   .930   1.053
              .005    1.338  1.326  .998   1.018  1.142
              .02     1.460  1.462  1.100  1.183  1.270
Avg. ni       80      1.429  1.409  1.053  1.036  1.209
              400     1.236  1.241  .949   .959   .879
Predictor     Low     1.327  1.368  1.070  1.067  1.018
intercorrs.   High    1.338  1.282  .931   .928   1.069
Vmod          .006    1.377  1.375  1.004  1.006  1.084
              .03     1.288  1.275  .998   .989   1.003
Incidence     50%     1.117  1.104  .930   .924   .819
of M. Data    75%     1.547  1.546  1.071  1.071  1.268

As was the case with the MCAR data with all continuous predictors, AC estimation tends to be more efficient relative to CC estimation under combinations of hyperparameters that make the correlations among predictors, and between predictors and outcome, lower. Correlations are lower for high τ, low average ni, low correlations among predictors, and low values of Vmod; relative efficiencies are higher (though sometimes only marginally so) for these combinations of hyperparameters. The largest effect on relative efficiency comes from the incidence of missing data; the efficiency of AC estimation is almost indistinguishable from that of CC estimation when there is less missing data.

There is one substantively important interaction effect for the relative efficiency in the estimation of τ. Table 5.74 demonstrates that for a large average study sample size the relative efficiency is strongly dependent on the value of τ, while in meta-analyses with a small average study sample size it is not.

Table 5.74
MSE_CC/MSE_AC Ratios for τ (Average Study Sample Size × Size of τ), MCAR Data w/Dichotomous Predictor
[Only the row labels (Average ni = 80, Average ni = 400) survived extraction; the cell values are not recoverable.]

The evidence in Tables 5.68 and 5.73 shows that the efficiency of ML estimation relative to CC estimation decreases as τ increases, while the efficiency of AC estimation relative to CC estimation increases as τ increases. Table 5.75 shows the average relative efficiency of ML estimation to AC estimation for τ = .02. These results are also similar to those for the MCAR data with continuous predictors. Maximum-likelihood estimation of τ is on average 24% more efficient for these conditions, but the estimation of the βs is on average only 7 to 10 percent more efficient.

Table 5.75
MSE_AC/MSE_MLE Ratios, τ = .02 (MCAR Data w/Dichotomous Predictor)
[The cell values of this table did not survive extraction.]

There is one condition out of the 48 for which ML estimation of β2 and β3 is marginally less efficient than AC estimation.
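As a concrete restatement of how the ratios in Tables 5.67 through 5.75 are formed, the following sketch computes an empirical relative efficiency from two sets of simulation estimates of the same parameter. The function names are illustrative and not taken from the dissertation's program.

import numpy as np

def mse(estimates, true_value):
    """Empirical mean-squared error over simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    return np.mean((estimates - true_value) ** 2)

def relative_efficiency(est_cc, est_mle, true_value):
    """MSE_CC / MSE_MLE; values above 1.00 favor the ML estimator."""
    return mse(est_cc, true_value) / mse(est_mle, true_value)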
In this condition, τ = .02, average ni = 80, Vmod = .006, and the correlations among predictors are .10. This is the condition for which the correlation between the predictors and the outcome is lowest; the sample R² for this condition averaged .075. The relative efficiency of ML to AC estimation of τ was 1.17 for this set of hyperparameters. The other 47 combinations of hyperparameters had MSE ratios above 1.00 for every parameter.
6. Results: Bootstrapped Standard Errors

Bootstrapping Errors for the Slopes

Tables 5.76 and 5.77 show the main effects of the simulation parameters on the size of the bootstrapped variance/MSE ratios. Ratios based on both the mean bootstrapped variance across the 100 simulations and the median bootstrapped variance across the 100 simulations are reported, because of the large effect that outliers had on the mean for some conditions in which k = 40 and the incidence of missing data was 75%. While most bootstrapped variances for slopes in those conditions were between .25 and 1.5, on some occasions the bootstrapped variance was over 5.00, and in one case over 50, even though the estimations still converged. Further investigation into the behavior of these estimates should lead to rules of thumb for bootstrapping that exclude unreasonably large values; for the purposes of this investigation, attention is focused on the ratios based on the medians.

These results are similar to what is found in Su (1988). A direct comparison is difficult, given that Su used different missing-data percentages and underlying covariance matrices. For his condition with 40 cases (equivalent to k = 40 in this simulation study) and a 62.5% incidence of missing data (i.e., only 15 of the cases had complete data), he found an average ratio of 1.77 across all slopes. As a comparison, the average ratio for the condition in this study with k = 40 and a 75% incidence of missing data was 2.617 (Table 5.78). The average ratio was considerably lower, only 1.170, when there was a 50% incidence of missing data. Thus, the average ratio for the k = 40 conditions was 1.894, similar to Su's average ratio. For his condition with 160 cases and a 62.5% incidence of missing data, the average ratio was 1.03, which is similar to what was found in this investigation for both the 75% and 50% incidence-of-missing-data conditions when k = 100. There was only one interesting interaction, between incidence of missing data and number of studies, and it too is reflected in Table 5.78. Ratios based on the medians are very close to 1.00 for all conditions except k = 40 with a 75% incidence of missing data.

Table 5.76
Average Mean and Median Bootstrapped Variance/MSE_ML Ratios for β0 and β1, Main Effects

Parameter     Value   β0 (Mean)  β0 (Median)  β1 (Mean)  β1 (Median)
τ             0       1.503      1.161        1.434      1.100
              .005    1.389      1.134        1.430      1.147
              .02     1.273      1.076        1.345      1.097
k             40      1.753      1.249        1.802      1.274
              100     1.023      .998         1.010      .956
Predictor     Low     1.344      1.089        1.406      1.126
intercorrs.   High    1.432      1.578        1.406      1.103
Avg. ni       80      1.424      1.140        1.500      1.154
              400     1.353      1.107        1.313      1.076
Vmod          .006    1.438      1.149        1.422      1.119
              .03     1.339      1.098        1.391      1.111
Incidence     50%     1.040      .984         1.076      .997
of M. Data    75%     1.736      1.263        1.737      1.232

Table 5.77
Average Mean and Median Bootstrapped Variance/MSE_ML Ratios for β2 and β3, Main Effects

Parameter     Value   β2 (Mean)  β2 (Median)  β3 (Mean)  β3 (Median)
τ             0       1.716      1.192        1.687      1.132
              .005    1.289      1.032        1.368      1.075
              .02     1.370      1.033        1.447      1.022
k             40      1.871      1.223        2.007      1.254
              100     1.045      .948         .994       .900
Avg. ni       80      1.500      1.091        1.570      1.095
              400     1.416      1.164        1.431      1.059
Predictor     Low     1.546      1.128        1.590      1.078
intercorrs.   High    1.371      1.044        1.410      1.076
Vmod          .006    1.472      1.117        1.456      1.077
              .03     1.444      1.054        1.545      1.077
Incidence     50%     1.090      .995         1.913      .971
of M. Data    75%     1.827      1.177        1.088      1.183

Table 5.78
Average Bootstrapped Variance/MSE Ratios (Incidence of Missing Data × Number of Studies)
[This table was printed in landscape orientation and did not survive extraction. Per the text, the k = 40 / 75%-incidence cell averaged 2.617, the k = 40 / 50%-incidence cell averaged 1.170, and the k = 100 cells were close to 1.00.]

Investigations of the average non-coverage rates lead to similar conclusions. Tables 5.79 and 5.80 show, respectively, the main effects and the incidence × k interaction effect on the frequency of non-coverage. A 5% rate is expected if the standard errors are being estimated correctly; the "non-coverage rate" nomenclature follows Su (1988), but it may also be thought of as a rejection rate. Most effects are quite small. It appears that for smaller amounts of missing data the non-coverage rate is slightly liberal; when there are larger amounts of missing data, the rate may be slightly liberal for large k but slightly conservative for small k. Overall, bootstrapped standard errors for the slopes lead to rejection rates close to .05 and standard errors that are generally the correct size, though the standard errors may be somewhat too large for meta-analyses with large amounts of missing data and few studies.

Table 5.79
Non-Coverage Rates Using Bootstrapped Standard Errors, Main Effects

Parameter     Value   β0       β1       β2       β3
τ             0       5.9%**   5.3%     5.2%     5.0%
              .005    5.6%     5.4%     6.5%**   5.6%
              .02     6.2%**   5.9%**   6.6%**   6.5%**
k             40      5.3%     4.7%     5.3%     4.8%
              100     6.5%**   6.4%**   6.9%**   6.6%**
Avg. ni       80      5.7%     5.7%     6.7%**   5.6%
              400     6.0%**   5.4%     5.5%     5.8%**
Predictor     Low     5.6%     5.4%     5.6%     5.7%
intercorrs.   High    6.2%**   5.7%     6.6%**   5.7%
Vmod          .006    5.5%     5.3%     6.2%**   5.9%**
              .03     6.2%**   5.8%     6.0%     5.5%
Incidence     50%     6.7%**   6.2%**   6.7%**   6.4%**
of M. Data    75%     5.0%     4.9%     5.4%     5.0%

** Non-coverage rate is significantly different from .05

Table 5.80
Non-Coverage Rates Using Bootstrapped Standard Errors (Incidence of Missing Data × Number of Studies Interaction)

                              β0       β1       β2       β3
50% Incidence    k = 40      6.5%**   5.8%     7.2%**   5.9%**
of Missing Data  k = 100     6.9%**   6.8%**   6.8%**   6.9%**
75% Incidence    k = 40      4.1%     3.6%**   3.4%**   3.8%**
of Missing Data  k = 100     6.0%**   6.1%**   7.0%**   6.3%**

** Non-coverage rate is significantly different from .05

Testing Homogeneity of Effects

The test of H0: τ = 0 is an important one in any meta-analysis. Unfortunately, the negative bias in the estimation of τ, together with the "floor" effect (estimates of τ < 0 being disallowed), combine to make the bootstrapped standard errors of the estimate of τ smaller than they should be, leading to many empirical rejection rates over 20%. A value of τ was considered significantly above zero if it exceeded 1.675 times the corresponding bootstrapped standard error (i.e., the test was one-tailed, as τ cannot be less than zero). The estimates of τ should be asymptotically normal, given the nature of maximum-likelihood estimation.
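The bootstrap scheme evaluated here resamples whole studies with replacement and takes the standard deviation of the re-estimates, in the spirit of Su (1988). A minimal sketch follows; `fit` is a stand-in for the EM estimation routine (the actual program is in SAS/IML), and the function names are assumptions for illustration.

import numpy as np

def bootstrap_se(data, fit, n_boot=100, seed=0):
    """Bootstrapped standard error(s) of fit(data), resampling studies."""
    rng = np.random.default_rng(seed)
    k = len(data)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, k, size=k)          # sample k studies with replacement
        stats.append(fit([data[i] for i in idx]))
    return np.std(np.asarray(stats, dtype=float), axis=0, ddof=1)

def reject_tau_zero(tau_hat, se_tau):
    """One-tailed test of H0: tau = 0 at the 1.675-SE cutoff described above."""
    return tau_hat > 1.675 * se_tau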
The results are summarized in Table 5.81.

Table 5.81
Empirical Rejection Rates Using Bootstrapped Standard Errors for the Test of H0: τ = 0

Parameter     Value   Rejection Rate
k             40      25.1%**
              100     23.0%**
Avg. ni       80      22.4%**
              400     25.6%**
Predictor     Low     20.0%**
intercorrs.   High    28.1%**
Vmod          .006    29.6%**
              .03     18.5%**
Incidence of  50%     23.0%**
Missing Data  75%     25.1%**

** Rejection rate is significantly different from .05

While the empirical rejection rates are clearly above the nominal ones, the problem is, practically speaking, a small one, given the phenomenal precision of the ML estimator of τ when τ = 0. Average medians and 90th and 95th percentiles for the 32 combinations of hyperparameters for which the data were MCAR and the average study sample size was 80 are presented in Table 5.82. The value of τ was estimated extraordinarily well for the 32 combinations for which the average sample size was 400; the largest 95th percentile across all of those conditions was .0028.

Table 5.82
Average Empirical Percentiles for Estimates of τ (Average ni = 80, τ = 0)
[The medians column and some row labels did not survive extraction; the level labels below are reconstructed from the design order used in the other tables.]

Parameter     Value   90th Percentile   95th Percentile
k             40      .0042             .0088
              100     .0054             .0082
Predictor     Low     .0046             .0089
intercorrs.   High    .0053             .0078
Vmod          .006    .0043             .0075
              .03     .0055             .0094
Incidence of  50%     .0057             .0091
M. Data       75%     .0042             .0069

Across all conditions, when τ = 0 the typical estimate of τ is very close to zero, even when the average study sample sizes are low. Improvement in the statistical significance test of the hypothesis H0: τ = 0 will be pursued in future research.

CHAPTER VI
SAMPLE META-ANALYSIS

To demonstrate that the program written to implement the EM estimation procedure derived in Chapter III could handle real data, a meta-analysis was conducted on data supplied by Professor Mark Lipsey of Vanderbilt University. The dataset contained 3905 effect sizes representing the efficacy of interventions designed to reduce the extent of juvenile delinquency among at-risk youth or youth who had already committed some delinquent acts. Over 150 study characteristics were coded for each effect. The most recent analyses of this dataset can be found in Lipsey (1999a) and Lipsey (1999b).

Due to the complexity of the data, problems that relate specifically to the substantive area (i.e., the types of study characteristics investigated in juvenile delinquency studies), and, more importantly, my lesser familiarity with both the dataset and the subject matter compared with Dr. Lipsey, a full replication of his analyses could not be conducted. However, a careful reading of Lipsey (1999a, 1999b) suggests a model that might be examined in a test of the EM program. This chapter begins with a description of how the sample of study effects was selected from the database and continues with a description of the study characteristics investigated in this mixed-model meta-analysis. Listwise-deletion (complete-case) parameter estimates and EM parameter estimates for the initial model and the final model are given, and the differences between the results of the two models are discussed.

1. Selection of Study Effects and Study Characteristics

While the Lipsey database included 3905 effect sizes, many effect sizes were dependent on one another. Most studies had both post-test and follow-up measurements and multiple outcomes. The most straightforward way to eliminate these dependencies was to pick one effect size from each study. While not all studies had follow-up measurements, all studies did have post-test measures; thus, only effect sizes based on post-test measures were used.
Lipsey (1999b) limits his analysis to studies that have a recidivism outcome measure, preferably police arrests; if that was not available, the most similar outcome was used. In this meta-analysis police arrests was likewise the preferred outcome; when that was unavailable, "institutionalization" was used, and when that was not available, probation, court, or parole contact was used. Lipsey also limits his analysis to studies in which the researcher did not implement the treatment, finding that such studies tended to be biased; the same restriction was made here. Finally, studies that did not have complete data on the sample sizes of the treatment and control groups were also eliminated. This procedure led to a dataset consisting of 328 studies.

Lipsey found many variables important in his 1999 analyses. However, not all of the variables that Lipsey found to be important in his sample had even small bivariate relationships with the effect sizes in this sample; this is another reason that this analysis cannot be considered a replication of his. For example, Lipsey found that four types of treatments related to probation led to higher treatment effect sizes; the same was not found in the sample used in this meta-analysis. Nevertheless, most of the variables Lipsey found to be important were considered in the initial model below.

Table 6.1
Study Characteristics Investigated in Juvenile Delinquency Meta-Analysis

Mean Age of Juveniles in Program at Time of Intervention
• Ranged from 10.9 to 21.0
Aggressive History of Juveniles
• 1 = At least some aggressive history, 0 = No aggressive history
Administrator of the Treatment
• 1 = Administrator was criminal justice personnel, 0 = Administrator was not criminal justice personnel
Reason for Entering the Treatment
• 1 = Admission was mandatory, 0 = Admission was not mandatory
Site of the Treatment
• 1 = Site was a criminal justice site, 0 = Site was not a criminal justice site
Periodicity of the Treatment
• 1 = Treatment took place daily or more infrequently, 0 = Treatment was "continuous" in nature
Amount of Weekly Contact
• 1 = Average number of weekly hours > 7.0, 0 = Average number of weekly hours < 6.99
Length of Program
• 1 = Program lasted 18 weeks or more, 0 = Program lasted 17.9 weeks or less
Deterrence/Wilderness Treatment
• 1 = Program was based on deterrence, a wilderness camp, or survival training, 0 = All other programs
Counseling Treatment
• 1 = Program's central focus was some form of individual or group counseling, 0 = All others
Demonstration Program
• 1 = Program was a "demonstration program", 0 = All others (e.g., public or private programs)
Private Program
• 1 = Program was privately sponsored, 0 = All others
Difficulty in Implementation of Program
• 1 = Program was difficult to implement, 2 = Program was possibly difficult to implement, 3 = Program was not difficult to implement

Table 6.2
Variable Names and Frequency of Missing Data

Variable       Proportion of Missing Data
MeanAge        38/328 (11.6%)
AggHist        105/328 (32.0%)
Adminstr       0/328 (0%)
Mandatory      4/328 (1.2%)
CJSite         22/328 (6.7%)
Periodicity    51/328 (15.5%)
WeeklyContct   22/328 (6.7%)
ProgLength     46/328 (14.0%)
Springer       0/328 (0%)
Counsel        0/328 (0%)
DemoProg       2/328 (.6%)
PrivProg       2/328 (.6%)
Difficulty     23/328 (7.0%)

Table 6.1 describes the thirteen variables used in the first model tested; all variables except "history of aggression" and "difficulty in implementing treatment" were analyzed by Lipsey.
The variable names in Table 6.2 correspond, in the same order, to the variables described in Table 6.1. (The Deterrence/Wilderness Camp/Survival Camp variable is named Springer due to the frequent appearance of advocates of these kinds of programs on the Jerry Springer TV show.) Note that all variables investigated except MeanAge and Difficulty are dichotomous in nature. Two characteristics that at first glance might best be considered continuous, program length and mean hours of weekly contact, were dichotomized by Lipsey (and by this researcher) because the distributions of values for those variables were highly skewed. Most programs lasted fewer than 20 weeks, but some programs lasted, according to the database, over 5 years. Similarly, while the majority of values for average hours of weekly contact were under 8 hours, many interventions took over 40 hours in a week. While Section 7 of Chapter V showed positive results for models containing a single dichotomous predictor with missing values, and most of these predictors had either no or low amounts of missing data (and it is generally accepted that dichotomous variables with no missing data are acceptable in EM estimation), it should be kept in mind that the EM estimation derived in Chapter III is based on normally distributed predictors.

2. The Initial Model

The first model investigated had the effect size as its outcome and the values of the 13 study characteristics mentioned above as its predictors. The complete-case analysis used 146 studies, while the EM analysis used all 328. This shows the usefulness of missing-data methods even when the dataset is not very sparse, as is the case with this collection of studies. Even though only one variable has a missing-data percentage above 20%, and the majority of variables have missing-data percentages below 15%, over half the studies were lost when all cases with any missing data were dropped. The mean ES was calculated by adding the estimate of β0 to the sum of the products of the estimated slopes of the predictors and the weighted mean values of those predictors; the calculation is restated in symbols after Table 6.3.

Table 6.3
Parameter Estimates and Significance Tests: Initial Model

Parameter    CC Estimate (SE)   CC p-value   EM Estimate (SE)   EM p-value
Mean ES      .1428 (.0258)      <.001        .1470 (.0175)      <.001
MeanAge      -.0271 (.0163)     .096         -.0242 (.0108)     .024
AggHist      -.0464 (.0641)     .470         -.0421 (.0484)     .383
Adminstr     -.0550 (.0810)     .498         -.0281 (.0407)     .491
Mandatory    .1470 (.0639)      .022         .0606 (.0391)      .121
CJSite       -.0737 (.0797)     .355         -.0643 (.0314)     .041
ProgLength   .0694 (.0552)      .209         .0125 (.0367)      .733
Periodicity  .0794 (.0729)      .276         .0513 (.0530)      .333
WeeklyCont   -.0473 (.0693)     .495         -.0395 (.0509)     .437
Springer     -.0890 (.1223)     .467         -.2434 (.0638)     <.001
Counsel      .0230 (.0807)      .776         .0476 (.0539)      .377
DemoProg     -.0040 (.0677)     .953         .0141 (.0392)      .718
PrivProg     .0386 (.1011)      .703         .0925 (.0522)      .076
Difficulty   .0476 (.0320)      .137         .0349 (.0211)      .098
τ            .0584              <.001        .0573              <.001*

* There is no strict test of H0: τ = 0 using the EM method, but as shown in Chapter V, the EM estimate of τ is usually far more accurate than the CC estimate.
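In symbols, the mean-effect-size calculation just described is

\overline{ES} \;=\; \hat{\beta}_{0} + \sum_{j=1}^{13} \hat{\beta}_{j}\,\bar{x}_{j},

where β̂j is the estimated slope for the j-th study characteristic and x̄j is the weighted mean value of that characteristic across the studies.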
Table 6.3 shows the parameter estimates and significance-test results for both the complete-case and EM estimations. The CC estimation finds that the mean effect size is significantly different from zero, although there is a large amount of random-effects variation. Only one other parameter is found to be significant: Mandatory, with a parameter value indicating a difference in average effect size of .147 favoring programs in which participation was mandatory rather than voluntary. Two other variables, MeanAge and Difficulty, tend toward marginal significance (p-values of .096 and .137, respectively).

The EM estimation tells a slightly different story. As with the CC estimation, the mean effect size is significantly different from zero, and there is a large amount of random-effects variation. However, the EM estimates find three variables with slopes statistically significantly different from zero: MeanAge, CJSite, and Springer. The difference stems from the lower standard errors of the EM method; if the CC estimates remained the same but had the lower EM standard errors, the CC estimates for MeanAge and CJSite would be statistically significant as well. The EM slopes for PrivProg, Difficulty, and Mandatory are all marginally statistically significant (with p-values of .076, .098, and .121, respectively). If the CC estimates remained the same but had the EM standard errors, the estimate of the slope for Difficulty would be statistically significant as well. Overall, the standard errors for the CC estimates range from 133% (for AggHist) to 254% (for CJSite) of the size of the corresponding EM standard errors. Of course, these calculations assume that the bootstrapped EM standard errors are accurate, but the findings for k = 100 with 50% missing data in Section 8 of Chapter V suggest that the standard errors for this condition are generally the correct size and on average lead to close to the correct confidence-interval coverage percentages.

3. The Final Model

The final model was determined by the less-than-theoretically-defensible method of simply dropping insignificant variables (from the EM estimation) one by one until all remaining variables were either statistically significant or at least marginally so. Each dropped variable was then tested again by adding it, alone, to the final model, to see whether its slope became statistically significant or marginally statistically significant. (A sketch of this procedure follows Table 6.4.) Obviously, a more careful, theory-backed method of model testing should be used in a future application of EM estimation to this dataset, but that seemed unwarranted at this juncture given the purpose of this analysis (which was not to replicate a specific Lipsey analysis), the fact that the sample of 328 effects was clearly not the sample used in either Lipsey (1999a) or Lipsey (1999b), and the fact that such an analysis should be conducted in partnership with a subject-matter expert. The final model had four predictors: MeanAge, Periodicity, Springer, and PrivProg. CC and EM parameter estimates are presented in Table 6.4.

Table 6.4
Parameter Estimates and Significance Tests: Final Model

Parameter         CC Estimate (SE)   CC p-value   EM Estimate (SE)   EM p-value
Mean Effect Size  .1341 (.0188)      <.001        .1443 (.0149)      <.001
MeanAge           -.0315 (.0106)     .004         -.0340 (.0108)     .002
Periodicity       .1023 (.0391)      .006         .0945 (.0376)      .012
Springer          -.2327 (.0827)     .010         -.2768 (.0846)     .001
PrivProg          .1148 (.0649)      .130         .1218 (.0451)      .007
τ                 .0522              <.001        .0590              <.001
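The informal selection procedure described above can be sketched as follows. The function and variable names are illustrative assumptions, and a fixed p-value cutoff stands in for the case-by-case judgments of "marginal" significance actually made in the analysis.

def backward_eliminate(predictors, fit_model, threshold=0.15):
    """fit_model(vars) is assumed to return {variable: p-value} for that model."""
    kept = list(predictors)
    dropped = []
    while kept:
        pvals = fit_model(kept)
        worst = max(kept, key=lambda v: pvals[v])   # least significant predictor
        if pvals[worst] <= threshold:               # all remaining clear the bar
            break
        kept.remove(worst)
        dropped.append(worst)
    # Re-test each dropped variable by adding it back to the final model, alone
    for var in dropped:
        pvals = fit_model(kept + [var])
        if pvals[var] <= threshold:
            kept.append(var)
    return kept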
The results indicate a very large negative effect for the Springer programs, similar to what Lipsey (1999a) found in his analysis. While the deterrence and wilderness programs seem to be popular with the public, they work less well than other types of programs, and they might actually have a detrimental effect in some instances. The result for MeanAge was also strong. For instance, holding other variables constant, treatment for juveniles 12 years of age would on average have an effect size .27 larger than treatment for juveniles 20 years of age. While this result makes substantive sense (it is plausible that behavior is easier to change among the young than among the old), it is the opposite of what Lipsey (1999a) found: that treatments were more effective when the average age was greater than the median (15.5 years of age) for all juveniles. The result in this analysis does not stem from the partial nature of the relationship; the unweighted bivariate correlation between MeanAge and the effect size is -.21. However, as noted above, while the sample used in this investigation is similar to that used in the Lipsey analyses, it is not identical.

The effects for PrivProg and Periodicity are smaller, but substantively important given the overall small average effect size of juvenile delinquency treatments. These effects are in line with those found in Lipsey's analyses; private programs tend to have larger effects than other programs, as do programs that deliver treatment in less than a continuous fashion.

The above findings are based on the EM estimates; however, the CC and EM estimates and significance tests are quite similar. This is not surprising, given that dropping the nine other predictors raised the number of complete cases considerably, from 146 to 256, leaving fully 78% of the studies with complete data. However, the improvement in the standard error for PrivProg does lead to a statistically significant EM estimate of that slope, while the corresponding CC estimate is only marginally significant. While the EM standard error for PrivProg is clearly superior to the CC standard error, the EM standard errors for the other predictors are very similar to the CC standard errors; in fact, for two of the predictors (MeanAge and Springer), the EM standard errors are marginally larger than the CC standard errors.

4. Conclusions

The purpose of the simulation study was to test the estimation equations derived in Chapter III; the purpose of this real-data meta-analysis was to show that the program implementing those equations could work in a non-simulation environment, with as many predictors as a meta-analyst might find pertinent to examine, and estimate slopes, standard errors for those slopes, and τ using both complete-case and EM maximum-likelihood methods. In this respect the purpose was achieved, and in testing the program it was demonstrated that EM estimation can lead to important improvements in standard errors in a real-data context in a model with many predictors. In the examination of the initial model's thirteen predictors, fully 55% of the cases were incomplete, even though the missing-data rates for most of the predictors were quite low. In this condition, assuming one trusts the EM bootstrapped standard errors (and more investigation needs to be done on the standard errors generated when most predictors are dichotomous), EM estimation of the parameters was at least 33% more efficient than CC estimation, and sometimes far more efficient.
While the differences between the CC and EM estimations in the final model are far smaller, as would be expected given the large number of studies with complete data, the fact remains that EM estimation allows meta-analysts to include more sparsely measured variables (such as AggHist here) in early models and have greater power to find significant results for all predictors.

Finally, it should be noted that the EM program used to estimate these models did not have to be customized in any but the most trivial ways to handle the Lipsey dataset. Any ASCII meta-analytic dataset that obeys certain rules with respect to variable order (such as which columns the study sample sizes or variance estimates are placed in) can be analyzed in short order. The program used to estimate the initial model can be changed to estimate the final model in a matter of seconds, and it can be changed to estimate models for a different dataset in a matter of minutes. The program outputs both CC and EM estimates, standard errors, 95% confidence intervals for each slope, and p-values for the tests of each slope, labeled by variable name. The user chooses which variables to include in the analysis and how many bootstrap replications to use to find the EM standard errors. A copy of the program, which is written in SAS/IML, can be obtained by writing the author at fahrbach@msu.edu or contacting him through Dr. Betsy Becker at 456 Erickson Hall, Michigan State University, East Lansing, MI 48824.

CHAPTER VII
DISCUSSION AND CONCLUSION

Maximum-likelihood estimation was expected to be more precise than available-case estimation and complete-case estimation. However, this does not necessarily imply that the maximum-likelihood method is always preferable, because it requires special software and is more difficult for non-statisticians to use and understand. The discussion in this chapter centers on two practical questions from the point of view of the non-statistician interested in conducting a meta-analysis:

1. If it is available and easy to use, should maximum-likelihood estimation always be used in meta-analysis? In other words, is maximum-likelihood estimation always better?

2. Are there situations in which CC or AC estimation is "good enough" because the ML method offers little additional efficiency? In other words, is maximum-likelihood estimation always substantively better?

1. Some Practical Considerations

Before addressing these questions, three pieces of information need to be considered. The first two concern the actual practice of meta-analysis (which differs from what transpired in the simulations in Chapter V); the third concerns generalization of the results in Chapter V.

First, a meta-analysis is rarely as straightforward as one simple estimation of the parameters across all studies of interest. In the analysis in Chapter VI, several models were estimated, as variables were dropped and (for the complete-case analysis) studies were added. The meta-analysis ended at that point, as its purpose was simply to show that the EM estimation program could handle real data. In a real meta-analysis, however, the analysis would only have begun. The final model could be tested within each level of each of the dichotomous variables in order to determine whether the effects were similar for subsets of the data.
This is not an uncommon practice (e.g., see Lipsey, 1999b); it is done because of the possibility that for some subsets of studies the predictors may explain most of the variation in the study effects, while in others the random-effects variation may dominate. Similarly, it is possible that there are interaction effects, and that predictors that are important within one subset of studies are not important in another. Because the last thing a practitioner wants to do is use different estimation techniques for different subsets of data (e.g., use available-case estimation for the entire dataset and complete-case or maximum-likelihood estimation for the subsets), I assume that unless the group of studies in the meta-analysis is truly huge (perhaps over 1000), or unless the meta-analyst has good reason to exclude the possibility of analyzing subsets of data across levels of categorical variables, the meta-analyst is likely to want methods that work for both small and large numbers of studies. Even if the number of studies is large, there is a strong possibility, as in the Lipsey dataset, that subsets of interest may have fewer than 50 studies.

The second consideration is that while seven simulation hyperparameters were studied, the meta-analyst will know with certainty the values of only three of them: the number of studies, the average study sample size, and how much data is missing. The meta-analyst will have only sample values for the correlations between predictors, and perhaps a rough estimate of the value of τ, both of which may change between subsets of data and may not accurately reflect the population values. The meta-analyst will have even less idea what the values of Vmod might be in the data, and will likely have no idea whether the missing-data mechanism is such that the data are MCAR, MAR, or NMAR (though many researchers will assume that any missing data are probably MCAR in nature). Thus, when considering which estimation techniques to recommend, I assume I know no more than a meta-analyst would know in this situation: the number of studies, the average study sample size, and whether missing data are (as defined in the simulations) sparse or heavy.

The third consideration is a caveat: while the statistics in Chapter V are informative, they may not generalize to all meta-analyses. The simulations attempted to examine the behavior of CC, ML, and AC estimation across a wide range of conditions, but there were restrictions. For instance, the simulations only looked at meta-analyses with three predictors, and only a partial analysis of the effects of categorical predictors was conducted. Thus, while it can be said that across the many conditions studied ML estimation provided estimates of τ that were about three times as efficient as complete-case estimation, it obviously cannot be said that across the population of all meta-analyses ML estimation of τ will be three times as efficient as CC estimation.

2. Is Maximum-Likelihood Estimation Always Better?

For the first question I temporarily set aside the complexity of maximum-likelihood estimation for non-statisticians by assuming that maximum-likelihood estimation for meta-analyses can be made easily accessible and employable. With this and the above three considerations in mind, question one is easy to address.
In the simulations conducted, across all types of missing data (MCAR, MAR, and the three types of NMAR data), maximum-likelihood estimation provided, relative to complete-case estimation, roughly 100% more efficient estimation of β0, 67% more efficient estimation of β1, 33% more efficient estimation of β2 and β3, and about 200% more efficient estimation of τ. The difference between the efficiencies for the estimation of β1 and those for β2 and β3 seems to stem overwhelmingly from the fact that the first predictor was always observed, while the 2nd and 3rd predictors were only partially observed. The efficiencies for the 2nd and 3rd predictors were often very similar, even in the p-NMAR data, where only the 2nd predictor was used to generate the missing-data pattern.

In addition, large biases occurred in the complete-case estimation of the mean effect size when the data were MAR or NMAR, but bias was zero or near zero for maximum-likelihood estimation. While substantial biases did not occur in the available-case estimation of the mean, for many combinations of hyperparameters (especially those with small numbers of studies and large degrees of missingness) available-case estimation was actually less efficient than complete-case estimation. While in some rare cases available-case estimation provided the most precise estimates of the three procedures for one or two parameters, even in those cases the difference in precision was marginal at best, and maximum-likelihood estimation was more precise for the other parameters of interest. It cannot be said, then, that on average across the hyperparameters available-case estimation was even marginally superior to maximum-likelihood estimation for any condition. Because on average across the 528 conditions maximum-likelihood estimation provided estimates as precise as or (far more often) more precise than either of the competing methods, question one must be answered in the affirmative: maximum-likelihood estimation is always better, at least across the values of the hyperparameters studied.

3. Is Maximum-Likelihood Estimation Always Substantively Better?

For some, the answer to the first question is enough. Some meta-analytic databases require hundreds of thousands of dollars and much time to build and code (e.g., the Lipsey dataset), and researchers with that much invested are likely to want the most advanced techniques possible to squeeze every last bit of precision out of their estimates. Others, however, do not have as much invested and may want more justification for moving from traditional complete-case methods to something more complex. These investigators may, justifiably, see little purpose in using a more complex method and specialized software unless it can be shown that the extra inconvenience results in a good enough "payoff". While it is impossible to say precisely how much extra inconvenience ML estimation might be worth, or how much "payoff" a meta-analyst would look for, the only way to address this question is to somehow quantify the improvement offered by EM methods.

"Payoff" has been chiefly measured in this dissertation as improvement in relative efficiency. Summaries of the average MLE-to-CC relative efficiencies and AC-to-CC relative efficiencies appear in Tables 7.1 and 7.2.
While the relative efficiencies reported in Chapter V were organized around whichever simulation hyperparameters mattered most in the ANOVAs, here they are summarized for the eight conditions reflecting the information available to a researcher given knowledge of the three hyperparameters mentioned above (number of studies, average study sample size, and incidence of missing data). The relative efficiencies given are those for MCAR data, as most meta-analysts assume that the patterns of missingness in their studies are MCAR in nature. Only for MCAR data does complete-case estimation provide an unbiased estimator of the mean effect; if a meta-analyst had reason to suspect that the data were not MCAR, he would be more inclined to switch methods, not less.

Table 7.1
Average MSE_CC/MSE_MLE Ratios by Number of Studies, Average Study Sample Size, and Incidence of Missing Data (MCAR Data)
[This table was printed in landscape orientation and did not survive extraction.]

Table 7.2
Average MSE_CC/MSE_AC Ratios by Number of Studies, Average Study Sample Size, and Incidence of Missing Data (MCAR Data)
[This table was printed in landscape orientation and did not survive extraction.]

The results in Tables 7.1 and 7.2 are as expected given the in-depth analysis in Chapter V. Maximum-likelihood estimation is always more efficient, often by a large margin. By contrast, there are often losses in efficiency for AC estimation, especially for β2 and β3. Also, AC estimation of τ is regularly worse than CC estimation of τ when there is a small amount of missing data.

Before this study, it seemed possible that available-case estimation might provide a substantively acceptable way to avoid listwise deletion without resorting to the complexity of maximum-likelihood methods. While, as noted above, available-case estimation does provide precision similar to that of ML methods for some individual conditions, practically speaking available-case estimation cannot be recommended.

Although available-case estimation cannot be recommended, it is not clear that the results in Table 7.1 lend unequivocal support to an ML meta-analysis regardless of the added inconvenience (unless, as she should be, the meta-analyst is concerned with precise estimation of τ; the superiority of ML methods for estimating τ is transparent). For instance, consider the condition with a 50% incidence of missing data, k = 100, and average ni = 400. This might be considered the "worst-case scenario" for improvement in efficiency over CC estimation, at least among the combinations of hyperparameters studied; the increases in efficiency for the estimation of β2 and β3 are less than 20%. This does not seem like much of an increase, and it is unclear what, substantively, such an increase in efficiency might mean.

This issue leads to another way to measure "payoff": in terms of the power of statistical tests. Statisticians and non-statisticians alike are more used to power concerns than to concerns about relative efficiency, and with the mean-squared errors available for the CC and ML estimates of β it is straightforward to determine the power of each statistical test.
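The power values in the tables below can be reproduced, at least approximately, with a normal-approximation calculation in which the empirical root-mean-squared error stands in for the standard error of the estimate. The two-sided form sketched here is my reconstruction; the exact Mendenhall et al. (1986) formula is not reproduced in the text.

from scipy.stats import norm

def power_normal(beta_alt, rmse, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: beta = 0 when the
    true slope is beta_alt and the estimator's SE is taken to be rmse."""
    z = norm.ppf(1 - alpha / 2)        # two-sided critical value
    shift = beta_alt / rmse            # standardized true effect
    return norm.cdf(shift - z) + norm.cdf(-shift - z)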
Thus, for a given condition, the power of the CC and ML estimates of β0, β1, β2, and β3 can be compared. Specifically, for β1, β2, and β3, the power of the test of H0: β = 0 vs. HA: β = βA was calculated (Mendenhall et al., 1986), where βA represents the actual population value of the parameter in the simulations and the empirical root-mean-squared error is used as the standard error of the estimate. For β0, the power of the test H0: β0 = 0 vs. HA: β0 = .2 was calculated, using the empirical root-mean-squared error as the standard error of the estimate of the intercept. While in the simulations β0 = .8, the power for the test against HA: β0 = .8 is simply too high across all conditions and estimation techniques to be interesting; thus, a lower value, .2 (a "low effect size", according to Cohen, 1988), was used. The HA values used for β1, β2, and β3 were the true values of the hyperparameters used in the simulations.

Average power values for the test for the intercept are in Table 7.3; the values of α used are .01 and .05, given the high power of the test. Average power values for the tests of the slopes are in Tables 7.4 through 7.6; the values of α used are .05 and .15, given the low power of these tests and the interest researchers might have in the marginal statistical significance of predictors' estimates when constructing their models.

Table 7.3
Average Power of H0: β0 = 0 vs. HA: β0 = .20, Across Hyperparameters Known to the Meta-Analyst

                                                      CC     ML
α = .01  50% Incidence  k = 40    Avg. ni = 80        .281   .548
         of M. Data               Avg. ni = 400       .721   .914
                        k = 100   Avg. ni = 80        .723   .929
                                  Avg. ni = 400       .978   .999
         75% Incidence  k = 40    Avg. ni = 80        .131   .428
         of M. Data               Avg. ni = 400       .355   .775
                        k = 100   Avg. ni = 80        .379   .895
                                  Avg. ni = 400       .837   .996
α = .05  50% Incidence  k = 40    Avg. ni = 80        .782   .980
         of M. Data               Avg. ni = 400       .992   1.00
                        k = 100   Avg. ni = 80        .998   1.00
                                  Avg. ni = 400       1.00   1.00
         75% Incidence  k = 40    Avg. ni = 80        .389   .926
         of M. Data               Avg. ni = 400       .842   .994
                        k = 100   Avg. ni = 80        .903   1.00
                                  Avg. ni = 400       .999   1.00

Table 7.4
Average Power of H0: β1 = 0 vs. HA: β1 = βA, Across Hyperparameters Known to the Meta-Analyst

                                                 HA: β1 = .222   HA: β1 = .380
                                                 CC     ML       CC     ML
α = .05  50% Incidence  k = 40   Avg. ni = 80    .057   .070     .095   .159
         of M. Data              Avg. ni = 400   .089   .141     .233   .367
                        k = 100  Avg. ni = 80    .091   .134     .241   .378
                                 Avg. ni = 400   .225   .337     .553   .700
         75% Incidence  k = 40   Avg. ni = 80    .037   .057     .055   .126
         of M. Data              Avg. ni = 400   .053   .100     .103   .245
                        k = 100  Avg. ni = 80    .059   .117     .122   .334
                                 Avg. ni = 400   .118   .272     .322   .618
α = .15  50% Incidence  k = 40   Avg. ni = 80    .131   .167     .211   .307
         of M. Data              Avg. ni = 400   .200   .278     .394   .540
                        k = 100  Avg. ni = 80    .204   .271     .407   .553
                                 Avg. ni = 400   .383   .505     .693   .814
         75% Incidence  k = 40   Avg. ni = 80    .104   .143     .139   .260
         of M. Data              Avg. ni = 400   .135   .218     .222   .416
                        k = 100  Avg. ni = 80    .146   .246     .253   .510
                                 Avg. ni = 400   .245   .438     .486   .758

Table 7.5
Average Power of H0: β2 = 0 vs. HA: β2 = βA, Across Hyperparameters Known to the Meta-Analyst

                                                 HA: β2 = .341   HA: β2 = .518
                                                 CC     ML       CC     ML
α = .05  50% Incidence  k = 40   Avg. ni = 80    .066   .081     .140   .197
         of M. Data              Avg. ni = 400   .142   .181     .366   .474
                        k = 100  Avg. ni = 80    .143   .178     .377   .481
                                 Avg. ni = 400   .366   .442     .701   .786
         75% Incidence  k = 40   Avg. ni = 80    .044   .058     .072   .119
         of M. Data              Avg. ni = 400   .069   .095     .148   .239
                        k = 100  Avg. ni = 80    .082   .119     .194   .341
                                 Avg. ni = 400   .189   .293     .474   .640
α = .15  50% Incidence  k = 40   Avg. ni = 80    .160   .187     .279   .356
         of M. Data              Avg. ni = 400   .280   .330     .534   .627
                        k = 100  Avg. ni = 80    .283   .328     .546   .634
                                 Avg. ni = 400   .303   .598     .813   .872
         75% Incidence  k = 40   Avg. ni = 80    .118   .146     .172   .249
         of M. Data              Avg. ni = 400   .166   .210     .290   .406
                        k = 100  Avg. ni = 80    .188   .247     .350   .510
                                 Avg. ni = 400   .342   .451     .628   .763

Table 7.6
Average Power of H0: β3 = 0 vs. HA: β3 = βA, Across Hyperparameters Known to the Meta-Analyst

                                                 HA: β3 = .619   HA: β3 = .656
                                                 CC     ML       CC     ML
α = .05  50% Incidence  k = 40   Avg. ni = 80    .124   .166     .203   .291
         of M. Data              Avg. ni = 400   .312   .411     .487   .600
                        k = 100  Avg. ni = 80    .326   .401     .512   .615
                                 Avg. ni = 400   .650   .727     .814   .880
         75% Incidence  k = 40   Avg. ni = 80    .065   .100     .092   .171
         of M. Data              Avg. ni = 400   .137   .216     .227   .371
                        k = 100  Avg. ni = 80    .162   .273     .273   .486
                                 Avg. ni = 400   .418   .586     .600   .776
α = .15  50% Incidence  k = 40   Avg. ni = 80    .255   .314     .362   .463
         of M. Data              Avg. ni = 400   .479   .566     .639   .731
                        k = 100  Avg. ni = 80    .497   .567     .661   .744
                                 Avg. ni = 400   .774   .830     .894   .937
         75% Incidence  k = 40   Avg. ni = 80    .158   .219     .206   .323
         of M. Data              Avg. ni = 400   .274   .374     .389   .544
                        k = 100  Avg. ni = 80    .309   .437     .441   .636
                                 Avg. ni = 400   .579   .708     .731   .860

A scan of the above tables allows some general conclusions. First, except for small meta-analyses with large amounts of missing data and small average study sample sizes, the power of the test of H0: β0 = 0 is very strong, even when the alternative hypothesis is that β0 is very small. Second, across all conditions the powers of the slope tests H0: βj = 0 (j = 1 to 3) are often quite low, though the power depends strongly on the size of the slope in the alternative hypothesis. Finally, because of the nature of the normal curve and of power calculations, the superiority of the power of the ML estimates is most apparent when the CC power is neither extremely high nor extremely low. If the power for a test of a CC estimator is about .05, it is unlikely that the power for the same test of an ML estimator will be greater than .08; however, in many instances when the CC power is about .30, the ML power will be .45 or better.

What can a researcher in a situation with 50% missing data, a large number of studies, and a large average study sample size gather from these tables? First, as long as the data are MCAR in nature, she will get a very precise estimate of β0 (that is, of the average effect size, assuming the predictors are grand-mean centered), even with CC estimation. Second, the added power from ML estimation will depend on the size of the effects she is trying to find. If she is trying to find small or moderate effects, EM estimation may give increases in power from 22.5% to 33.7%, or from 55.3% to 70.0% (Table 7.4). These figures are for the slope of the completely observed variable, β1. For the partially observed variable β2 the improvements are smaller: from 36.6% to 42.2% and from 70.1% to 78.6% (Table 7.5). The improvement is similar for β3, except that the average power levels are higher because larger values of β3 are examined.
The increases for the partially observed variable ‘32, using the power values cited above, are smaller: 15% and 12% increases in the odds of finding significant effects. It would be easier to argue that these improvements do border on trivial However, the formula for the calculation of a necessary sample size given a desired a and power level (Mendenhall et al., 1988) leaves us with one last way of measuring “payofl”: how many more studies like the ones already gathered would the researcher have to obtain an increase in efficiency comparable to that ofiered by NIL analysis? Using the “worst case scenario” above of 50% missing data, 100 studies, and an average study sample size of 400, the results are as follows: For the completely observed variable [31: an increase in power of 22.5% to 33.7% would require 28 more studies, while an increase in power of 55.3% to 70% would require 20 more studies. 5The ratio of power values for a small efl‘ect is 33.7%/22.5% = 1.50 177 For the partially observed variable [32: an increase in power of 36.6% to 42.2% would require 9 more studies, and an increase in power of 70. 1% to 78.6% would require 10 more studies. It is left to the meta-analyst whether the work of locating and coding these additional studies is as costly and time-consuming as learning to apply EM estimation. Suppose, though, that it takes a researcher 200 hours to find, code, and enter 100 studies. (This may be a conservative estimate in some subject areas, a h'beral estimate in others). It is fair to assume that most of the findings that the meta-analyst will make depend on the precision of his estimation. Thus, if it takes 200 hours to find and code 100 studies, it is worth 20 hours of work to learn how to conduct a maximum-likelihood estimation of one’s data, which would be equivalent to adding 10 studies to estimate those slopes for which there is missing data. Or perhaps it would be worth 40 hours of work to employ ML, as the increase in power to detect an effect for a completely observed variable would be equivalent to that achieved by finding 20 more studies. It should be added that these calculations ignore the large increases in efficiency in the estimation of 1: for maximum- likelihood, and the advantages of ML estimation in handling difi‘erent missing-data mechanisms. It also might be argued that these calculations ignore the many hours spent on primary studies, hours that are to some degree wasted when the most efficient estimation techniques are not employed. 4. Future Research As noted above, further simulation work (though perhaps not as intensive as what 178 was conducted here) is needed to confirm the superiority of ML estimation when there are many predictors. More important is an investigation into whether the current estimation procedure can handle dichotomous predictors that have missing data. While the brief investigation in Chapter V, Section 7 suggests that the ML estimation derived here is robust to the assumption of normality of predictors, further investigation is in order. If ML estimation is not as robust as expected for dichotomous predictors, a ML method for meta-analytic data that considers a multinomial distribution for categorical predictors should be derived. Within this investigation, the accuracy of bootstrapped standard errors for data with dichotomous predictors should be considered. 
4. Future Research

As noted above, further simulation work (though perhaps not as intensive as what was conducted here) is needed to confirm the superiority of ML estimation when there are many predictors. More important is an investigation into whether the current estimation procedure can handle dichotomous predictors that have missing data. While the brief investigation in Chapter V, Section 7 suggests that the ML estimation derived here is robust to the assumption of normality of predictors, further investigation is in order. If ML estimation is not as robust as expected for dichotomous predictors, an ML method for meta-analytic data that assumes a multinomial distribution for categorical predictors should be derived. Within this investigation, the accuracy of bootstrapped standard errors for data with dichotomous predictors should be considered. Similarly, it may be worthwhile to investigate the accuracy of bootstrapped standard errors under different missing-data mechanisms, though the work of Su (1988) suggests that such standard errors should still be accurate. An investigation into how this ML estimation procedure might handle measurement error in the predictors would also be worthwhile. Finally, it would be beneficial to derive a maximum-likelihood method that could handle dependent effects (such as the analysis of Moritz et al., 2000). This step adds a layer of complexity to an already complex model; however, many meta-analysts must find some way to deal with dependent effects, and it would be ideal for a maximum-likelihood estimation program to be able to handle them.

5. Conclusion

The purpose of this research was to investigate alternatives to complete-case estimation of meta-analytic parameters, with a focus on maximum-likelihood estimation but with consideration of available-case estimation as a simpler and perhaps substantively equivalent alternative. A maximum-likelihood method designed to handle the intricacies of meta-analytic data was derived, tested through simulation in Chapter V, and applied to a real dataset in Chapter VI. The analyses in Chapters V and VII suggest that ML methods lead to important gains in efficiency that are generally superior to those found through AC methods. However, the gains in efficiency are not always great, and they depend on many hyperparameters, some of them easily discernible to the meta-analyst, such as the number of studies, the amount of missing data, and the average study sample size. Relative efficiencies were lowest for large numbers of studies with large sample sizes and little missing data. Even in this case, however, the "payoff" of ML estimation may be acceptable relative to the time spent on the many other tasks associated with conducting a meta-analysis. When other factors are considered, such as the clearly superior efficiency of ML estimation of τ and the ability of ML estimation to handle different types of missing data, the payoff is even greater.

The "payoff" is not acceptable, of course, if there is no accessible software for the meta-analyst to run. Creating a program that conducts maximum-likelihood estimation takes a considerable amount of time. However, I am making the program used to analyze the Lipsey dataset in Chapter VI available*. The program runs in SAS/IML, accepts ASCII data, and can be learned in under an hour by anyone even slightly familiar with SAS. In addition, little customization is needed to make the program able to handle any given dataset. With this program available, the main hurdle to maximum-likelihood analysis of meta-analytic data is overcome. This research moves maximum-likelihood estimation from merely "something to be considered" to a method that, across a wide range of realistic conditions, must be considered strongly advisable.

*The author may be contacted at fahrbach@msu.edu or through Dr. Betsy Becker, Erickson Hall, Michigan State University, East Lansing, MI; (517) 355-9567.

APPENDIX
Study Sample Sizes and Missing Data Patterns

There were eight combinations of k, average study sample size, and incidence of missing data. The tables below detail the distribution of missing data on the 2nd and 3rd predictors for these eight combinations of hyperparameters. Each cell represents one study; the first number in a cell is that study's sample size, while the second refers to its pattern of missing data. The label "C" means that the data were complete: the study had observations on all three predictors. The label "M2" means that only the 2nd predictor was missing, "M3" means that only the 3rd predictor was missing, and "M23" means that both the 2nd and 3rd predictors were missing. Thus, within each combination of hyperparameters, the patterns of missingness, the average study sample size, and the mapping of missing-data patterns to particular groups of study sample sizes are all held constant. This was done to limit the variance in the findings to as few sources as possible (i.e., the seven hyperparameters). Future simulation work might allow more flexibility in how study sample sizes are assigned and allow missing-data patterns to vary between simulated meta-analyses.
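A sketch of how such a fixed pattern assignment can be coded appears below. The proportions in the example are illustrative, not the exact layouts of Tables A.1 through A.7.

PATTERNS = ("C", "M2", "M3", "M23")   # complete; X2 missing; X3 missing; both missing

def assign_patterns(k, proportions):
    """proportions: dict mapping each pattern label to its share of the k studies."""
    assert abs(sum(proportions.values()) - 1.0) < 1e-9
    labels = []
    for p in PATTERNS:
        labels.extend([p] * round(k * proportions[p]))
    labels += ["C"] * (k - len(labels))   # pad with complete cases if rounding falls short
    return labels[:k]

# e.g., a 50% incidence of missing data spread over the three missing patterns:
labels = assign_patterns(40, {"C": 0.5, "M2": 0.2, "M3": 0.2, "M23": 0.1})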
Finally, it would be beneficial to derive a maximum-likelihood method that could handle dependent effects (such as those in the analysis of Moritz et al., 2000). This step adds a layer of complexity to an already complex model; however, many meta-analysts must find some way to deal with dependent effects, and it would be ideal for a maximum-likelihood estimation program to be able to handle them.

5. Conclusion

The purpose of this research was to investigate alternatives to complete-case estimation of meta-analytic parameters, with a focus on maximum-likelihood estimation but with consideration of available-case estimation as a simpler and perhaps substantively equivalent alternative. A maximum-likelihood method designed to handle the intricacies of meta-analytic data was derived, tested through simulation in Chapter V, and applied to a real dataset in Chapter VI. The analyses in Chapters V and VII suggest that ML methods lead to important gains in efficiency that are generally superior to those found through AC methods. However, the gains in efficiency are not always great, and they depend on many hyperparameters, some of them easily discernible to the meta-analyst, such as the number of studies, the amount of missing data, and the average study sample size. Relative efficiencies were lowest for large numbers of studies with large sample sizes and little missing data. Even in this case, however, the “payoff” of ML estimation may be acceptable relative to the time spent on the many other tasks associated with conducting a meta-analysis. When other factors are considered, such as the clearly superior efficiency of ML estimation of τ and the ability of ML estimation to handle different types of missing data, the payoff is even greater.

The “payoff” is not acceptable, of course, if there is no accessible software for the meta-analyst to run. Creating a program that conducts maximum-likelihood estimation takes a considerable amount of time. However, I am making available the program used to analyze the Lipsey dataset in Chapter VI.⁶ The program runs in SAS/IML, accepts ASCII data, and can be learned in under an hour if one is even slightly familiar with SAS. In addition, little “customization” is needed to make the program able to handle any given dataset. With this program available, the main hurdle to maximum-likelihood analysis of meta-analytic data is overcome. This research moves maximum-likelihood estimation from merely “something to be considered” to a method that, across a wide range of realistic conditions, must be considered strongly advisable.

⁶ The author may be contacted at fahrbach@msu.edu or through Dr. Betsy Becker, Erickson Hall, Michigan State University, East Lansing, MI, (517) 355-9567.

APPENDIX

Study Sample Sizes and Missing Data Patterns

There were eight combinations of k, average study sample size, and incidence of missing data. The tables below detail the distribution of missing data on the 2nd and 3rd predictors for these eight combinations of hyperparameters. Each cell represents one study. The first number in a cell is the study sample size, while the second refers to a pattern of missing data. The label “C” means that data were complete: the study had observations on all three predictors. The label “M2” means that only the 2nd predictor was missing, “M3” means that only the 3rd predictor was missing, and “M23” means that both the 2nd and 3rd predictors were missing. Thus, within each combination of hyperparameters, the patterns of missingness, the average study sample size, and the mapping of the missing-data patterns to particular groups of study sample sizes are all held constant. This was done in order to limit the variance in the findings to as few sources as possible (i.e., the seven hyperparameters). Future simulation work might allow more flexibility in how study sample sizes are assigned and allow for missing-data patterns that vary between simulated meta-analyses.

Table A.1
k = 40, Average nᵢ = 80, 50% Incidence of Missing Data

40,M23    160,M2    40,C     40,C
40,M23    320,M2    40,C     40,C
40,M23    40,M3     40,C     40,C
160,M23   40,M3     40,C     40,C
40,M2     40,M3     40,C     40,C
40,M2     40,M3     40,C     40,C
40,M2     40,M3     40,C     40,C
40,M2     40,M3     40,C     40,C
40,M2     160,M3    160,C    160,C
40,M2     320,M3    320,C    320,C

Table A.2
k = 40, Average nᵢ = 80, 75% Incidence of Missing Data

40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
40,M23    40,M2     40,M3    40,C
160,M23   160,M2    160,M3   160,C
320,M23   320,M2    320,M3   320,C

Table A.3
k = 40, Average nᵢ = 400, 50% Incidence of Missing Data

80,M23 320,M23 800,M23 80,M2 80,M2 320,C 800,C 1600,M3 1600,C

Table A.4
k = 40, Average nᵢ = 400, 75% Incidence of Missing Data

80,M23 80,M3 800,M2 1600,M23 1600,M2 1600,M3 1600,C

Table A.5
k = 100, Average nᵢ = 80, 50% Incidence of Missing Data

40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
40,M23    40,M2     40,M3    40,C     40,C
160,M23   160,M2    160,M3   160,C    160,C
320,M23   320,M2    320,M3   320,C    320,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
40,M2     40,M3     40,C     40,C     40,C
160,M2    160,M3    160,C    160,C    160,C
320,M2    320,M3    320,C    320,C    320,C

Table A.6
k = 100, Average nᵢ = 80, 75% Incidence of Missing Data

40,M23    160,M23   40,M2    40,M3    40,C
40,M23    320,M23   40,M2    40,M3    40,C
40,M23    160,M23   40,M2    40,M3    40,C
40,M23    320,M23   40,M2    40,M3    40,C
40,M23    160,M23   40,M2    40,M3    40,C
40,M23    40,M2     320,M2   40,M3    40,C
40,M23    40,M2     160,M2   40,M3    40,C
40,M23    40,M2     320,M2   40,M3    40,C
40,M23    40,M2     160,M2   40,M3    40,C
40,M23    40,M2     320,M2   40,M3    40,C
40,M23    40,M2     40,M3    320,M3   40,C
40,M23    40,M2     40,M3    160,M3   40,C
40,M23    40,M2     40,M3    320,M3   40,C
40,M23    40,M2     40,M3    160,M3   40,C
40,M23    40,M2     40,M3    320,M3   40,C
40,M23    40,M2     40,M3    40,C     160,C
40,M23    40,M2     40,M3    40,C     320,C
40,M23    40,M2     40,M3    40,C     160,C
40,M23    40,M2     40,M3    40,C     320,C
40,M23    40,M2     40,M3    40,C     160,C

Table A.7
k = 100, Average nᵢ = 400, 50% Incidence of Missing Data

80,M23    80,M2     80,M3    80,C     80,C
80,M23    80,M2     80,M3    80,C     80,C
80,M23    80,M2     80,M3    80,C     80,C
80,M23    80,M2     80,M3    80,C     80,C
320,M23   320,M2    320,M3   320,C    320,C
320,M23   320,M2    320,M3   320,C    320,C
320,M23   320,M2    320,M3   320,C    320,C
320,M23   320,M2    320,M3   320,C    320,C
800,M23   800,M2    800,M3   800,C    800,C
1600,M23  1600,M2   1600,M3  1600,C   1600,C
80,M2     80,M3     80,C     80,C     80,C
80,M2     80,M3     80,C     80,C     80,C
80,M2     80,M3     80,C     80,C     80,C
80,M2     80,M3     80,C     80,C     80,C
320,M2    320,M3    320,C    320,C    320,C
320,M2    320,M3    320,C    320,C    320,C
320,M2    320,M3    320,C    320,C    320,C
320,M2    320,M3    320,C    320,C    320,C
800,M2    800,M3    800,C    800,C    800,C
1600,M2   1600,M3   1600,C   1600,C   1600,C

Table A.8
k = 100, Average nᵢ = 400, 75% Incidence of Missing Data

80,M23    800,M23   320,M2   320,M3   80,C
80,M23    1600,M23  320,M2   320,M3   80,C
80,M23    800,M23   320,M2   320,M3   80,C
80,M23    1600,M23  320,M2   320,M3   80,C
80,M23    800,M23   320,M2   320,M3   80,C
80,M23    80,M2     1600,M2  320,M3   320,C
80,M23    80,M2     800,M2   320,M3   320,C
80,M23    80,M2     1600,M2  320,M3   320,C
80,M23    80,M2     800,M2   320,M3   320,C
80,M23    80,M2     1600,M2  320,M3   320,C
320,M23   80,M2     80,M3    1600,M3  320,C
320,M23   80,M2     80,M3    800,M3   320,C
320,M23   80,M2     80,M3    1600,M3  320,C
320,M23   80,M2     80,M3    800,M3   320,C
320,M23   80,M2     80,M3    1600,M3  320,C
320,M23   320,M2    80,M3    80,C     800,C
320,M23   320,M2    80,M3    80,C     1600,C
320,M23   320,M2    80,M3    80,C     800,C
320,M23   320,M2    80,M3    80,C     1600,C
320,M23   320,M2    80,M3    80,C     800,C
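Because each grid fully specifies a simulated design, the quantities in a table’s caption can be checked mechanically. The short Python sketch below (illustrative only) re-keys Table A.2 as a list of (sample size, pattern) cells, using the pattern labels defined above, and recovers k, the average study sample size, and the incidence of missing data.

    # Self-check of the design encoded in Table A.2
    # (k = 40, average n = 80, 75% incidence of missing data).
    grid = (
        [(40, "M23"), (40, "M2"), (40, "M3"), (40, "C")] * 8  # eight identical rows
        + [(160, "M23"), (160, "M2"), (160, "M3"), (160, "C")]
        + [(320, "M23"), (320, "M2"), (320, "M3"), (320, "C")]
    )

    k = len(grid)
    avg_n = sum(n for n, _ in grid) / k
    incidence = sum(pattern != "C" for _, pattern in grid) / k
    print(k, avg_n, incidence)  # 40 80.0 0.75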
REFERENCES

Anderson, A., Basilevsky, A., & Hum, D. (1983). Missing data: A review of the literature. In J. D. Wright, P. H. Rossi, & A. B. Anderson (Eds.), Handbook of survey research. New York: Academic Press.

Azen, S. & Van Guilder, M. (1981). Conclusions regarding algorithms for handling incomplete data. Proceedings of the Statistical Computing Section, American Statistical Association, 1981, 53-56.

Beale, E. & Little, R. (1975). Missing values in multivariate analysis. Journal of the Royal Statistical Society, B37, 129-145.

Beaton, A. E. (1964). The use of special matrix operations in statistical calculus. Educational Testing Service Research Bulletin, RB-64-51.

Becker, B. J. (1985). Tests of combined significance: Hypotheses and power considerations. Unpublished doctoral dissertation, University of Chicago.

Begg, C. B. (1994). Publication bias. In H. Cooper & L. V. Hedges (Eds.), Handbook of research synthesis. New York: Russell Sage Foundation.

Bryk, A. S. & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage Publications, Inc.

Buck, S. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. Journal of the Royal Statistical Society, B22, 302-306.

Chang, L. (1992). A power analysis of the test of homogeneity in effect-size meta-analysis. Unpublished doctoral dissertation, Michigan State University.

Cohen, J. (1988). Statistical power analyses for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cook, R., Tsai, C., & Wei, B. (1986). Bias in nonlinear regression. Biometrika, 73(3), 615-623.

Cordeiro, G. & McCullagh, P. (1991). Bias correction in generalized linear models. Journal of the Royal Statistical Society, B53, 629-643.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society (with discussion), B39, 1-38.

Dixon, W. J. (1992). BMDP statistical software manual (Vol. 2). Berkeley, CA: University of California Press.

Efron, B. & Tibshirani, R. (1991). Statistical data analysis in the computer age. Science, 253, 390-395.

Fahrbach, K. (1995). A Monte-Carlo investigation of univariate and multivariate meta-analysis. Unpublished apprenticeship paper, Michigan State University.

Gay, L. R. (1992). Educational research: Competencies for analysis and application. New York: Merrill.

Glynn, R., Laird, N., & Rubin, D. B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (Ed.), Drawing inferences from self-selected samples. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Green, B. & Hall, J. (1984). Quantitative methods for literature review. Annual Review of Psychology, 35, 37-53.

Haitovsky, Y. (1968). Missing data in regression analysis. Journal of the Royal Statistical Society, B30, 67-82.

Harris, M. & Rosenthal, R. (1985). Mediation of interpersonal expectancy effects: 31 meta-analyses. Psychological Bulletin, 97(3), 363-386.

Hunter, J. E. & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Beverly Hills, CA: Sage.

Kim, J. & Curry, J. (1977). The treatment of missing data in multivariate analysis. Sociological Methods and Research, 6, 215-240.

Lent, R. H., Auerbach, H. A., & Levin, L. S. (1971). Research design and validity assessment. Personnel Psychology, 24, 247-274.

Lipsey, M. W. (1999a). Can intervention rehabilitate serious delinquents? Annals of the American Academy of Political and Social Science, 564, 142-166.

Lipsey, M. W. (1999b). Can rehabilitative programs reduce the recidivism of juvenile offenders? Virginia Journal of Social Policy and the Law, 6(3), 611-641.

Little, R. (1988). Robust estimation of the mean and covariance matrix from data with missing values. Applied Statistics, 37(1), 23-28.

Little, R. (1992). Regression with missing X's: A review. Journal of the American Statistical Association, 87, 1227-1237.

Little, R. (2000). Personal communication, August, 2000.

Little, R. J. A. & Raghunathan, T. E. (1999). On summary-measures analysis of the linear mixed-effects model for repeated measures when data are not missing completely at random. Statistics in Medicine, 18, 2465-2478.

Little, R. & Rubin, D. (1987). Statistical analysis with missing data. New York: John Wiley.

Matthai, A. (1951). Estimation of parameters from incomplete data with application to design of sample surveys. Sankhya, 2, 145-152.

Mendenhall, W., Scheaffer, R., & Wackerly, D. (1986). Mathematical statistics with applications. Boston: Duxbury Press.

Meng, X. & Rubin, D. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86, 899-909.

Moritz, S., Feltz, D., Fahrbach, K., & Mack, D. (2000). The relation of self-efficacy measures to sport performance: A meta-analytic review. Research Quarterly for Exercise and Sport, 71(3), 280-291.

Morrison, D. F. (1967). Multivariate statistical methods. New York: McGraw-Hill.

Pigott, T. (1992). The application of normal maximum-likelihood methods to missing data in meta-analysis. Unpublished doctoral dissertation, University of Chicago, Chicago, IL.

Pigott, T. (1994). Methods for handling missing data in research synthesis. In H. Cooper & L. V. Hedges (Eds.), Handbook of research synthesis. New York: Russell Sage Foundation.

Raudenbush, S. W. (1994). Random effects models. In H. Cooper & L. V. Hedges (Eds.), Handbook of research synthesis. New York: Russell Sage Foundation.

Schafer, J. (1997a). Analysis of incomplete multivariate data. New York: Chapman & Hall.

Schafer, J. (1997b). Imputation of missing covariates under a multivariate linear mixed model. Technical report, Department of Statistics, Pennsylvania State University.

Schafer, J. (1998). Some improved procedures for fixed linear models. Technical report, Department of Statistics, Pennsylvania State University.

Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York: John Wiley & Sons.

Shadish, W. R. & Haddock, C. K. (1994). Combining estimates of effect size. In H. Cooper & L. V. Hedges (Eds.), Handbook of research synthesis. New York: Russell Sage Foundation.
Slavin, R. E. (1984). Meta-analysis in education: How has it been used? Educational Researcher, 13(8), 6-15.

Su, H. (1988). Estimation of standard errors in some multivariate models with missing data. Unpublished doctoral dissertation, University of Michigan.

Tatsuoka, M. (1988). Multivariate analysis. New York: Macmillan.

Van Praag, B. M. S., Dijkstra, T. K., & Van Veltzen, J. (1985). Least squares theory based on distributional assumptions with an application to the incomplete observations problem. Psychometrika, 50, 25-36.