
A POWER ANALYSIS OF
THE TEST OF HOMOGENEITY

IN EFFECT-SIZE META-ANALYSIS

BY
Lin Chang

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Counseling, Educational Psychology,
and Special Education

1992

ABSTRACT
POWER ANALYSIS OF

THE TEST OF HOMOGENEITY IN META-ANALYSIS

BY

Lin Chang

The power of homogeneity tests in both fixed- and
random-effects models in effect-size meta-analyses is
studied. Power functions are approximated and simulated.
The impact of the power of the homogeneity test on
statistical errors of subsequent tests of effect magnitude
is also examined. The homogeneity test statistic had an
asymptotic central chi-squared distribution when effect
sizes were homogeneous. When the effect sizes were not
homogeneous, under the fixed-effects model, the distribution
of the statistic was well approximated by a noncentral
chi-squared distribution. The probability of a type I error
(a false rejection) was higher than the preset α level when
study effects came from many small samples. To maintain the
desired significance level, meta-analysts are advised to
lower the nominal type I error rate for reviews with many
small samples. The non-null distribution of the
homogeneity test H+ under the random-effects model is
approximated well by a combination of many noncentral
chi-squared distributions. Power values were compared for
subsequent tests of effect magnitude (z tests) calculated
with the fixed-effects variance (z_F) versus tests with the
random-effects variance (z_R) in the presence of a
statistical error at the first stage of testing. When the
first-stage test of homogeneity was falsely accepted, the
subsequent fixed-effects test (z_F) was slightly more
powerful than the appropriate random-effects test (z_R).
When the first-stage test of homogeneity was falsely
rejected, the subsequent random-effects test (z_R) was much
less powerful than the correct fixed-effects test (z_F). To
prevent the random-effects test (z_R) from being applied
inappropriately, reviewers could either use other
approaches that avoid the test until more is learned about
the estimator of parameter variance used in the
random-effects test, or lower the type I error rate (the
probability of false rejection) for the homogeneity test at
the first stage.

Copyright by
LIN CHANG

1992

To my parents Yu-Tai and Jen-Pin Han Chang

ACKNOWLEDGEMENT

Many thanks are due to those who have supported me in
the completion of this dissertation. I am grateful for
God's sufficient mercy and provision.

First I thank my advisor Dr. Betsy Becker, who was also
a great friend. She understood me well despite cultural
differences and helped me learn to write about statistical
problems. I was often inspired by her persistent
encouragement. I deeply appreciate her patience and
availability to me, especially the time and energy she
spent with me outside her office hours.

I sincerely thank all my committee members: Dr. Steve
Raudenbush for his consistent support and helpful
suggestions; Dr. James Stapleton for his constructive advice
and insightful assistance with the mathematical part of my
dissertation; and, last but not least, Dr. Susan Phillips
for her friendly encouragement and sincere concern
throughout the completion of this paper. I also thank
computer consultants Ryan Simmons and Randy Foutiu for their
useful assistance with computer operations.

I thank my parents for the way they raised me and for
their unconditional love for and trust in me. Finally, I
thank my loving and supportive husband, Jacob Chi, for his
tacit understanding and belief in me.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER
I.   INTRODUCTION
        Meta-analysis in Educational Research
        Purpose of the Study
        Need for a Power Study of the Homogeneity Test in
        Effect-size Meta-analysis
            Definition of Statistical Power
            Importance to the Test of Fit
            Power of the Homogeneity Test
            Need for a Power Study
        Comparison to the Unbalanced Analysis of Variance Case
II.  STATEMENT OF THE PROBLEM
        Power of the Statistical Test in Empirical Research
        Power of the Homogeneity Test in Meta-analysis
III. POWER OF HOMOGENEITY TESTS IN EFFECT-SIZE ANALYSES
        Definitions and Notation
            Population Effect Size
            Glass's Estimate of Effect Size
            Unbiased Estimate of Effect Size
        Analytical Approximation of Power
            Effect-size Analyses for Fixed-Effects Models
                Hypotheses
                Homogeneity Test Statistic
                Distribution of the Homogeneity Test for
                Fixed-effects Models
                Theorem
                Proof
            Effect-size Analyses for Random-Effects Models
                Hypotheses
                Homogeneity Test Statistic
                Distribution of the Homogeneity Test for
                Random-effects Models
                Theorem
                Proof
IV.  SIMULATION OF THE DISTRIBUTIONS OF THE STATISTICS FOR
     POWER UNDER FIXED- OR RANDOM-EFFECTS MODELS
        Parameters of the Simulation Study
            Number of Effect Sizes
            Sample Sizes
            Population Effect Sizes
            Variance of Population Effects
        Design of the Simulation Study
        Computation for Simulated Distributions
            Fixed-effects Models
            Random-effects Models
        Test for Goodness of Fit
        Results
            Power Discrepancies for Fixed-effects Models
                Number of Effect Sizes (k)
                Sample Sizes (N)
                Sampling Fractions
                Sampling Ratios (φ_i)
                Patterns of Effect-size Parameters
                Summary
            Power Discrepancies on Random-effects Models
            Power Analysis
                Fixed-effects Model
                Random-effects Model
V.   THE INFLUENCE OF THE SIGNIFICANCE LEVEL AND POWER OF THE
     FIRST STAGE TEST ON THE SECOND STAGE TEST: A SEQUENTIALLY
     RELATED TESTING PROCEDURE
        Two-stage Testing
        Influence of Sequentially Related Hypothesis Testing on
        Statistical Errors
            Acceptance of the Overall Homogeneity Test
            Rejection of the Overall Homogeneity Test
            Summary
        Simulation of Power for Sequential Tests
            Factors for Simulation of Subsequent z Tests
            Results
                Simulated vs. Theoretical Power Values
                    Fixed-effects Tests
                    Random-effects Tests
                    Summary
                Power of z Based on Decisions about Homogeneity
                    Homogeneous Population Effects
                    Heterogeneous Population Effects
                    Summary
                Adjustment to Maintain the Desired Testing
                Error Rates
VI.  CONCLUSIONS AND IMPLICATIONS
        Example
        Summary
        The Power of the Homogeneity Test
        The Power of the z Test
        Practical Implications
        Suggestions for Further Research
APPENDIX A: LIST OF SYNTHESIZED STUDIES
APPENDIX B: SUPPLEMENTARY TABLES
APPENDIX C: POWER TABLES
APPENDIX D: FIGURES
APPENDIX E: CHOOSE NUMBER OF REPLICATIONS

BIBLIOGRAPHY

LIST OF TABLES

Table
1.   Sampling Fractions for Power Study
2.   Paired t Test between Theoretical and Simulated Power
     (α = 0.05) for Fixed-effects Model
3.   Crosstabulation of Significant Discrepancies and k
4.   Crosstabulation of Significant Discrepancies by k
5.   Crosstabulation of Significant Discrepancies by Sample Size
6.   Crosstabulation of Significant Discrepancies by N and k
7.   Crosstabulation of Significant Discrepancies by Sampling
     Fraction
8.   Crosstabulation of Significant Discrepancies by Sampling
     Fraction and k
9.   Crosstabulation of Significant Discrepancies by φ_i
10.  Crosstabulation of Significant Discrepancies by φ_i and k
11.  Crosstabulation of Significant Discrepancies by Pattern of
     δ_i's
12.  Crosstabulation of Significant Discrepancies by Pattern of
     δ_i's and k
13.  Crosstabulation of Significant Discrepancies by Pattern of
     δ_i's, N, and k
14.  Means of Significant Discrepancies by Pattern of δ_i's, N,
     and k

15.  Paired t Test between Theoretical and Simulated Power for
     Random-effects Model
16.  Frequency Table for Significant Discrepancies for
     Random-effects Model
17.  Analysis of Variance for Power of H
18.  Means of Theoretical Power of H by Pattern of δ_i's, n, and k
19.  Means of Simulated Power of H by Pattern of δ_i's by n and k
20.  Means of Simulated Power of H for Homogeneous δ_i's by n by k
     (δ = 0)
21.  ANOVA on Power of H for δ_i's with One Extreme Value
22.  Mean of Power of H for δ_i's with One Extreme Value by n and k
22.a Mean of Simulated Power of H for δ_i's with One Extreme Value
     by n and k
23.  ANOVA on Power of H for δ_i's with Two Extreme Values
24.  Mean of Power of H for δ_i's with Two Extreme Values by n and k
24.a Mean of Simulated Power of H for δ_i's with Two Extreme Values
     by n by k
25.  ANOVA on Power of H for Three Equal Subsets of δ_i's
26.  ANOVA on Power of H for Three Equal Subsets of δ_i's by n and k
26.a Mean of Simulated Power of H for Three Equal Subsets of δ_i's
     by n and k
27.  ANOVA on Power of H for Five Equal Subsets of δ_i's
28.  Mean of Power of H for Five Equal Subsets of δ_i's by n and k
28.a Mean of Simulated Power of H for Five Equal Subsets of δ_i's
     by n and k

29.  Mean of Power of H+ at α = 0.05 for μ_δ = 0 for the
     Random-effects Model
30.  Two-Stage Testing Errors
31.  Paired t Tests on Mean Theoretical and Simulated z_F Power for
     Homogeneous Effects with δ = 0 (α = 0.05)
32.  Paired t Tests on Mean Theoretical and Simulated z_F Power for
     Homogeneous Effects with δ > 0 (α = 0.05)
33.  Frequencies of Significant Discrepancies for Power of z_F by k
     for Homogeneous Effects with δ > 0
34.  Frequencies of Significant Discrepancies for Power of z_F by N
     for Homogeneous Effects with δ > 0
35.  Frequencies of Significant Discrepancies for Power of z_F by
     Sampling Fraction for Homogeneous Effects with δ > 0
36.  Frequencies of Significant Discrepancies for Power of z_F by
     φ_i for Homogeneous Effects with δ > 0
37.  Frequencies of Significant Discrepancies for Power of z_F by δ
     for Homogeneous Effects with δ > 0
38.  Paired t Tests on Theoretical and Simulated Power of z_F for
     Heterogeneous Effects
39.  Frequencies of Significant Discrepancies of Power of z_F for
     Heterogeneous Effects
40.  Frequencies of Significant Discrepancies for Power of z_F by k
     for Heterogeneous Effects
41.  Significant Discrepancies for Power of z_F by Pattern of δ_i
     for Heterogeneous Effects
42.  Frequencies of Significant Discrepancies for Power of z_F by
     Sampling Fraction for Heterogeneous Effects

43.  Frequencies of Significant Discrepancies for Power of z_F by
     φ_i for Heterogeneous Effects
44.  Paired t Tests on Mean Theoretical and Simulated Power of z_R
     for Homogeneous Effects with δ = 0 (α = 0.05)
45.  Paired t Tests on Mean Theoretical and Simulated Power of z_R
     for Homogeneous Effects with δ > 0 (α = 0.05)
46.  Frequencies of Significant Discrepancies for Power of z_R by k
     for Homogeneous Effects with δ > 0
47.  Frequencies of Significant Discrepancies for Power of z_R by N
     for Homogeneous Effects with δ > 0
48.  Frequencies of Significant Discrepancies for Power of z_R by
     Sampling Fraction for Homogeneous Effects with δ > 0
49.  Frequencies of Significant Discrepancies for Power of z_R by
     φ_i for Homogeneous Effects with δ > 0
50.  Frequencies of Significant Discrepancies for Power of z_R by δ
     for Homogeneous Effects with δ > 0
51.  Paired t Tests on Theoretical and Simulated Power of z_R for
     Heterogeneous Effects
52.  Frequencies of Significant Discrepancies for Power of z_R by k
     for Heterogeneous Effects
53.  Frequencies of Significant Discrepancies for Power of z_R by N
     for Heterogeneous Effects
54.  Frequencies of Significant Discrepancies for Power of z_R by
     Sampling Fraction for Heterogeneous Effects
55.  Frequencies of Significant Discrepancies for Power of z_R by
     φ_i for Heterogeneous Effects

56.  Significant Discrepancies for Power of z_R by Pattern of δ_i
     for Heterogeneous Effects
57.  Paired t Tests on Power (Size α) of z_F versus z_R for
     Homogeneous Effects with δ = 0 (α = 0.05) and Homogeneity Was
     Rejected
58.  Mean z Power Values of z_F versus z_R for Homogeneous Effects
     with δ > 0 (α = 0.05) and Homogeneity Was Rejected
59.  Mean z Power Values of z_F versus z_R for Heterogeneous Effects
     (α = 0.05) and Homogeneity Was Accepted
60.  Computation of Noncentrality Parameter for the
     One-Extreme-Value Example
61.  Computation of Noncentrality Parameter for the
     Three-Equal-Values Example
62.  Values of Sample Sizes Used in the Simulation
63.  Values of δ_i's Used in the Simulation for k = 2
64.  Values of δ_i's Used in the Simulation for k = 5
65.  Values of δ_i's Used in the Simulation for k = 10
66.  Values of δ_i's Used in the Simulation for k = 30
67.  Mean of Power for δ_i's with One Extreme Value by n and k
     (α = 0.10)
67.a Mean of Simulated Power for δ_i's with One Extreme Value by n
     and k (α = 0.10)
68.  Mean of Power for δ_i's with Two Extreme Values by n and k
     (α = 0.10)
68.a Mean of Simulated Power for δ_i's with Two Extreme Values by n
     and k (α = 0.10)
69.  Mean of Power for Three Equal Subsets of δ_i's by n and k
     (α = 0.10)

69.a Mean of Simulated Power for Three Equal Subsets of δ_i's by n
     and k (α = 0.10)
70.  Mean of Power for Five Equal Subsets of δ_i's by n and k
     (α = 0.10)
70.a Mean of Simulated Power for Five Equal Subsets of δ_i's by n
     and k (α = 0.10)
71.  Mean of Power for δ_i's with One Extreme Value by n and k
     (α = 0.025)
71.a Mean of Simulated Power for δ_i's with One Extreme Value by n
     and k (α = 0.025)
72.  Mean of Power for δ_i's with Two Extreme Values by n and k
     (α = 0.025)
72.a Mean of Simulated Power for δ_i's with Two Extreme Values by n
     and k (α = 0.025)
73.  Mean of Power for Three Equal Subsets of δ_i's by n and k
     (α = 0.025)
73.a Mean of Simulated Power for Three Equal Subsets of δ_i's by n
     and k (α = 0.025)
74.  Mean of Power for Five Equal Subsets of δ_i's by n and k
     (α = 0.025)
74.a Mean of Simulated Power for Five Equal Subsets of δ_i's by n
     and k (α = 0.025)
75.  Mean of Power for δ_i's with One Extreme Value by n and k
     (α = 0.01)
75.a Mean of Simulated Power for δ_i's with One Extreme Value by n
     and k (α = 0.01)
76.  Mean of Power for δ_i's with Two Extreme Values by n and k
     (α = 0.01)
76.a Mean of Simulated Power for δ_i's with Two Extreme Values by n
     and k (α = 0.01)
77.  Mean of Power for Three Equal Subsets of δ_i's by n by k
     (α = 0.01)
77.a Mean of Simulated Power for Three Equal Subsets by n and k
     (α = 0.01)

78.  Mean of Power for Five Equal Subsets of δ_i's by n and k
     (α = 0.01)
78.a Mean of Simulated Power for Five Equal Subsets of δ_i's by n
     and k (α = 0.01)
79.  Mean of Power of H+ at α = 0.10 for μ_δ = 0 for the
     Random-effects Model
80.  Mean of Power of H+ at α = 0.025 for μ_δ = 0 for the
     Random-effects Model
81.  Mean of Power of H+ at α = 0.01 for μ_δ = 0 for the
     Random-effects Model

LIST OF FIGURES

Figure
4.1  Frequencies of Absolute Significant Discrepancies

     Power Curve with k = 2 (α = 0.05) for Fixed-effects Models
     (One Extreme Value)
     Power Curve with k = 5 (α = 0.05) for Fixed-effects Models
     (One Extreme Value)
     Power Curve with k = 10 (α = 0.05) for Fixed-effects Models
     (One Extreme Value)
     Power Curve with k = 30 (α = 0.05) for Fixed-effects Models
     (One Extreme Value)
     Power Curve with k = 10 (α = 0.05) for Fixed-effects Models
     (Two Extreme Values)
     Power Curve with k = 30 (α = 0.05) for Fixed-effects Models
     (Two Extreme Values)
     Power Curve with k = 5 (α = 0.05) for Fixed-effects Models
     (Three Equal Values)
     Power Curve with k = 10 (α = 0.05) for Fixed-effects Models
     (Three Equal Values)
     Power Curve with k = 30 (α = 0.05) for Fixed-effects Models
     (Three Equal Values)
     Power Curve with k = 10 (α = 0.05) for Fixed-effects Models
     (Five Equal Values)
     Power Curve with k = 30 (α = 0.05) for Fixed-effects Models
     (Five Equal Values)
     Power Curve with k = 2 (α = 0.05) for Random-effects Models
     with μ_δ = 0
     Power Curve with k = 5 (α = 0.05) for Random-effects Models
     with μ_δ = 0
     Power Curve with k = 10 (α = 0.05) for Random-effects Models
     with μ_δ = 0
     Power Curve with k = 30 (α = 0.05) for Random-effects Models
     with μ_δ = 0
     Power Curve with k = 2 (α = 0.05) for Random-effects Models
     with μ_δ = 0.10
     Power Curve with k = 5 (α = 0.05) for Random-effects Models
     with μ_δ = 0.10
     Power Curve with k = 10 (α = 0.05) for Random-effects Models
     with μ_δ = 0.10
     Power Curve with k = 30 (α = 0.05) for Random-effects Models
     with μ_δ = 0.10
     Power Curve with k = 2 (α = 0.05) for Random-effects Models
     with μ_δ = 0.25
     Power Curve with k = 5 (α = 0.05) for Random-effects Models
     with μ_δ = 0.25
     Power Curve with k = 10 (α = 0.05) for Random-effects Models
     with μ_δ = 0.25
     Power Curve with k = 30 (α = 0.05) for Random-effects Models
     with μ_δ = 0.25
     Power Curve with k = 2 (α = 0.05) for Random-effects Models
     with μ_δ = 0.50
     Power Curve with k = 5 (α = 0.05) for Random-effects Models
     with μ_δ = 0.50
     Power Curve with k = 10 (α = 0.05) for Random-effects Models
     with μ_δ = 0.50
     Power Curve with k = 30 (α = 0.05) for Random-effects Models
     with μ_δ = 0.50

CHAPTER I

INTRODUCTION

Meta-analysis in Educational Research

The application of quantitative methods in synthesizing
and analyzing the results of related studies has been of
growing interest to researchers in the social sciences. As
the number of related studies increases, drawing conclusions
about research questions becomes less straightforward.
Study results may be consistent with or
contradictory to each other. Features of the related
studies including sample sizes, experimental treatment
conditions, and sampled populations differ from study to
study. Drawing reasonable conclusions from those related
yet varied studies is the challenge for researchers.

Research reviewers utilize the results of many related
studies rather than results of single studies to draw
inferences. Such synthetic research is known as "meta-
analysis", a term coined by Glass (1976) to mean the
"analysis of analyses."

Various methods of research synthesis have been used
for many decades (e.g., since Tippett, 1931). The procedure
of meta-analysis in the social sciences was popularized by
Glass (1976), and has been developed by Rosenthal (1978),
Rosenthal and Rubin (1979), Pillemer and Light (1980),
Cooper (1982), Hedges and Olkin (1985), and others in the
last decade. This work has enabled research syntheses to
become quantitatively more precise through the analysis of
standardized effect sizes from primary studies.

Chang and Becker (1987) examined an empirical
application of three main approaches in meta-analysis: vote
counts and vote-counting estimation procedures (e.g.,
Hedges, 1986; Hedges & Olkin, 1980, 1985), tests of combined
significance (e.g., Fisher, 1932; Rosenthal, 1978; Tippett,
1931), and analyses of effect sizes (e.g., Hedges & Olkin,
1985). Chang and Becker compared the hypotheses,
statistical properties, and possible conclusions drawn from
the three approaches. In contrasting these methods, they
identified several areas for further research, noting in
particular a lack of information on the power of tests of
homogeneity in effect-size analyses.

Purpose of the Study

The purpose of this research is to study the power of
tests of homogeneity in effect-size analyses. The power of
homogeneity tests in both fixed- and random-effects models
in meta-analyses is studied. Power functions are
approximated and simulated. In addition, since typical
effect-size analyses involve tests at two or more stages,
the influence of the power of the homogeneity test on the
statistical errors of the subsequent tests is examined.

Power analysis of statistical tests is essential but
often ignored by empirical researchers (Brewer, 1972; Cohen,
1962, 1973, 1977; Daly & Hexamer, 1983; Sedlmeier &
Gigerenzer, 1989). Without information on power, the
results of statistical tests can be very difficult to
interpret. A null hypothesis may be accepted because the
null hypothesis is true, because the statistical test had
insufficient power to detect a true alternative hypothesis,
or because sampling error happened to produce a small result
even though the test had sufficient power. Brewer (1972) and
Cohen (1962, 1965) found that the neglect of power analysis
has resulted in generally low power in research. Brewer
argued that low power affects the validity of what otherwise
would be a proper rejection of Ho based on the research
data. Cohen (1973) emphasized power analysis as "the only
rational guide to planning the relevant details of the
research" (p. 227).

This study approximates power functions and serves
empirical meta-analysts by enabling them to estimate the
power of their statistical tests against an array of
possible outcomes. I will numerically simulate power values
for homogeneity tests in effect-size meta-analyses.
Comparisons will be made between power values calculated
through theoretical approximations and simulated values,
and power tables will be constructed. The influence of the
power of the homogeneity test on subsequent effect-magnitude
tests will also be examined. Below, I start by briefly
reviewing the concept of power and discussing the importance
of power analysis, especially for homogeneity tests.
Need for a Power Study of Homogeneity Tests in Meta-analysis

Definition of Statistical Power

Two types of error are involved in statistical
hypothesis testing. A type I error occurs if the researcher
rejects a null hypothesis when the null hypothesis is
actually true. A researcher commits a type II error when
accepting (failing to reject) a false null hypothesis. The
probability of a type I error is usually denoted α, whereas
the probability of a type II error is denoted β.
Statistical power is defined as the probability of rejecting
a false null hypothesis, and is denoted 1 - β.
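To make α, β, and 1 - β concrete, the following minimal Python sketch (an illustration, not a procedure from this study) computes the power of a two-sided, two-sample z test. It assumes a known unit variance and a hypothetical true standardized mean difference `d`; the function name `z_test_power` is illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def z_test_power(d, n1, n2, alpha=0.05):
    """Power and type II error rate of a two-sided, two-sample z test.

    d      -- assumed true standardized mean difference (hypothetical input)
    n1, n2 -- group sizes
    alpha  -- type I error rate
    Returns (power, beta), where beta = 1 - power.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # rejection cutoff for |z|
    shift = d / (1 / n1 + 1 / n2) ** 0.5         # mean of the z statistic when d is true
    # Probability that |z| exceeds the cutoff in either tail
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return power, 1 - power


power, beta = z_test_power(d=0.5, n1=64, n2=64)
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With d = 0.5 and 64 cases per group, power is roughly .81, so β is roughly .19. Setting d = 0 returns power equal to α, since then every rejection is a type I error.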

Educational researchers have tended to be more
concerned about type I errors than about type II errors. In
setting α, the researcher imagines the null hypothesis to be
true and then considers the risk of falsely rejecting Ho.
In considering power, on the other hand, the researcher
imagines the treatment to have "the minimum effect size"
worth detecting and then considers the risk of falsely
accepting Ho. Researchers limit the probability of a type I
error by setting low α levels, such as .05 or .01. Given a
preset α level, they then try to increase power. For
instance, they may increase sample sizes to increase the
statistical power (1 - β).
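Increasing sample size to reach a target power can be automated. The sketch below is a hypothetical helper, not a procedure from this study: it assumes equal group sizes, a two-sided z test with known unit variance, and a chosen target power, and searches for the smallest per-group n that reaches the target.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def power_two_sided(d, n_per_group, alpha=0.05):
    """Power of a two-sided z test for effect d with equal group sizes."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # equals d / sqrt(1/n + 1/n)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def smallest_n(d, target=0.80, alpha=0.05, n_max=100_000):
    """Smallest equal group size whose power reaches `target` (linear search)."""
    for n in range(2, n_max + 1):
        if power_two_sided(d, n, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")


print(smallest_n(0.5))  # per-group n for d = 0.5, alpha = .05, power = .80
```

A linear search is used for transparency; because power increases monotonically with n here, a bisection search would give the same answer faster.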

By setting low α levels rather than controlling the β
level, educational researchers are conservative about
accepting a new alternative hypothesis over an existing null
hypothesis. The existing null hypothesis will be retained
unless there is enough evidence against it. This
conservative attitude toward new alternative hypotheses in
educational settings is often practical: it reflects concern
over the extra time or cost that changes may involve.
Nevertheless, the tradeoff for a conservative attitude is an
increased probability of making a type II error.
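The tradeoff can be seen numerically. This sketch (illustrative only, with a hypothetical effect d = 0.5 and 30 cases per group, under the same two-sided z-test assumptions used above) holds the effect and sample size fixed and shows β rising as α is lowered.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def type_ii_error(d, n_per_group, alpha):
    """beta = 1 - power for a two-sided z test with equal group sizes."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return 1 - power


for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(d=0.5, n_per_group=30, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Each step toward a more "conservative" α widens the acceptance region, so a real effect of the same size is missed more often.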

This conservative attitude is reasonable in the context
of rejecting the null hypothesis, because rejecting a null
hypothesis does not cause a type II error. However, when
the null hypothesis is accepted (which sometimes results
from a "conservative attitude"), one needs to have
reasonably high power in order to be comfortable that the
acceptance of the null hypothesis implies a small or non-
existent effect. Thus, apart from limiting the type I
error, a power analysis is always valuable in research
planning.

Empirical researchers often do not report the power of
their statistical tests, for two reasons. First, the power
functions of some tests are not available, and second, some
researchers do not emphasize the importance of power.

Importance to the Test of Fit

The type I error is of primary concern and is often
used as the criterion for decisions in statistical tests.
However, one needs to be at least as concerned about
limiting the type II error when testing for fit.

The purpose of tests of "fit" is to test the hypothesis
that certain expectations about a distribution (under Ho)
are correct, that is, that the obtained data actually come
from the population specified by the hypothetical model
(Hays, 1981). Tests of fit differ from other tests in their
implied "attitude." In an ordinary test, researchers
usually accept Ho unless the treatment effect is
significantly large; therefore, researchers limit α in
ordinary tests. In a test of fit, one tends to accept Ha
unless the obtained data fit Ho. That is, the researcher
assumes the data do not fit and seeks evidence that they do
(i.e., seeks to accept Ho). Logically, then, one should
limit β in the test of fit. Applied to tests of fit, a
"conservative attitude" means limiting β rather than α,
because in a test of fit the conservative researcher would
rather "accept" Ha. Hence, to be consistent with a
"conservative attitude," one should emphasize statistical
power (1 - β) more in testing for fit than in ordinary
tests. Also, since the test of fit is usually preliminary
to other tests, its power needs to be high for one to
proceed comfortably on the assumption that the data "fit."
Power of the Homogeneity Test in Effect-size Meta-analysis

The simplest homogeneity test in meta-analysis (Hedges
& Olkin, 1985) examines whether all the studies share a
common effect size. Unless the effect sizes are shown to be
homogeneous, they are treated as heterogeneous. Thus, the
homogeneity test can be viewed as a test of fit, and for
that reason a study of its power is important. An analysis
of the power of homogeneity tests in meta-analyses not only
will aid our understanding of how homogeneity tests relate
to other meta-analysis summaries, as suggested by Chang and
Becker (1987), but also is essential per se.
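As a concrete illustration of such a test, the sketch below computes a weighted homogeneity statistic for k standardized mean differences and refers it to a central chi-square distribution with k - 1 degrees of freedom. It assumes the usual large-sample variance of the unbiased estimator, v_i = (n_E + n_C)/(n_E n_C) + d_i²/[2(n_E + n_C)], from Hedges and Olkin (1985). The chi-square CDF is written out in plain Python so the block has no dependencies. The function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def homogeneity_test(d, n_e, n_c):
    """Weighted homogeneity statistic for k effect sizes.

    d   -- unbiased effect-size estimates d_i
    n_e -- experimental-group sizes
    n_c -- control-group sizes
    Returns (statistic, df, p_value) using the chi-square(k - 1) reference.
    """
    # Large-sample variance of each d_i; weights are inverse variances
    v = [(ne + nc) / (ne * nc) + di ** 2 / (2 * (ne + nc))
         for di, ne, nc in zip(d, n_e, n_c)]
    w = [1 / vi for vi in v]
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)  # weighted mean effect
    stat = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
    df = len(d) - 1
    return stat, df, 1 - chi2_cdf(stat, df)


print(homogeneity_test([0.1, 0.5, 0.9], [20, 20, 20], [20, 20, 20]))
```

A small p-value leads to rejecting homogeneity; a large one leads to "accepting" it, which is exactly the decision whose power matters for a test of fit.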
Need for a Power Study

A power study can provide more understanding about the
homogeneity test. Practically, a power analysis can examine
how sensitive the test of homogeneity in meta-analysis is to
such important factors as the number of studies to be
integrated, sample sizes in each study, magnitudes of effect
sizes, and other factors. Thus, the examination of the
power of the homogeneity test is significant for both
theoretical and practical reasons. Based on the results of
this study, meta-analysts will be able to estimate the
statistical power of the homogeneity test prior to their
analysis, recognize factors influencing the power of the
test, and when possible choose appropriate values for those
influencing factors that can be manipulated to maintain
reasonable levels of power in their applications. Even if
they are unable (or choose not) to manipulate factors,
researchers will at least be able to evaluate how much power
they can obtain, based on this power analysis.
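The factors listed above (number of studies, sample sizes, and magnitudes of effect sizes) can be explored with a short calculation. This sketch assumes, as the abstract reports for the fixed-effects model, that the homogeneity statistic is approximately noncentral chi-square with k - 1 degrees of freedom. It further assumes the common noncentrality form λ = Σ w_i (δ_i - δ̄_w)², with large-sample weights computed from the population values δ_i. Both the λ formula and the helper names are illustrative assumptions, not the derivation given later in this dissertation.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF (series for the regularized lower incomplete gamma)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_critical(df, alpha):
    """Upper-alpha critical value of chi-square(df), found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def ncx2_cdf(x, df, lam, tol=1e-12):
    """Noncentral chi-square CDF as a Poisson(lam/2) mixture of central CDFs."""
    if lam == 0:
        return chi2_cdf(x, df)
    half = lam / 2
    total, j = 0.0, 0
    while True:
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        if j > half and weight < tol:  # past the Poisson mode and weights negligible
            break
    return total


def homogeneity_power(deltas, n_e, n_c, alpha=0.05):
    """Approximate power of the fixed-effects homogeneity test (assumed lambda form)."""
    v = [(ne + nc) / (ne * nc) + dl ** 2 / (2 * (ne + nc))
         for dl, ne, nc in zip(deltas, n_e, n_c)]
    w = [1 / vi for vi in v]
    delta_bar = sum(wi * dl for wi, dl in zip(w, deltas)) / sum(w)
    lam = sum(wi * (dl - delta_bar) ** 2 for wi, dl in zip(w, deltas))
    df = len(deltas) - 1
    return 1 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


# One extreme value among k = 5 studies, at two per-group sample sizes
for n in (20, 80):
    print(n, round(homogeneity_power([0.2, 0.2, 0.2, 0.2, 0.8], [n] * 5, [n] * 5), 3))
```

When all δ_i are equal, λ = 0 and the computed power reduces to α; larger samples or more dispersed δ_i raise λ and therefore power.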

Comparison to the Unbalanced Analysis of Variance Case

Parallels can be drawn between research synthesis and
the analysis of variance (ANOVA). Hypothesis testing in
ANOVA involves certain assumptions: observations are random
samples drawn from normally distributed populations, and the
numerator and denominator of the F ratio are independent and
(under Ho) estimate the same population variance, σ². In
ANOVA models, the total variation in scores is partitioned.
For example, the simplest ANOVA model partitions the total
variation into two parts, the between-groups variation and
the within-group variation. The ratio of the between-groups
variation to the within-group variation (each divided by its
degrees of freedom) has an F distribution under Ho and is
used to test, for example, the hypothesis of equal group
means in the one-way case.
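For reference, the one-way partition and F ratio described above can be computed directly. This is a minimal, unweighted sketch with a hypothetical function name; it treats every score as equally precise, which is the assumption that becomes problematic when studies are pooled.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    # F is the ratio of the two mean squares
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


print(one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```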

As with the analysis of variance, there are two models
for the population parameters in meta-analysis: the
fixed-effects case and the random-effects case. In the
fixed-effects case, the population effect sizes are assumed
to be constants (that is, the variance of the population
effect sizes is zero). By contrast, in the random-effects
case, the population effect sizes are random variables.
Therefore, in the random-effects case, population effect
sizes have a variance greater than zero.

In combining results, studies have been treated as a
blocking variable in ANOVA (Snedecor & Cochran, 1967;
Rosenthal, 1978). When the studies are regarded as a random
factor and the Treatment x Studies effect is large, this
interaction effect is used as the appropriate error term.
For effect sizes in the fixed-effects case, Hedges and Olkin
(1985) and others (e.g., Pigott, 1986) have also drawn
analogies between effect-size meta-analysis and the analysis
of variance.

However, for combining studies, the homogeneity test
proposed by Hedges and Olkin is often more accurate than the
F test based on the Treatment x Studies effect as an index of
the extent to which effect sizes vary across studies. This
is true primarily because, in combining studies, the scales
of measurement of the variables usually are not the same
across studies, whereas in ordinary ANOVA, treatment groups
within an experiment or study usually are measured on the
same scale.

Also, the homogeneity-of-variance assumption of ANOVA is
often violated when standard (unweighted) ANOVA is applied to
meta-analysis data. Studies in meta-analysis thus often
cannot be treated as blocks in an ANOVA, which assumes that
comparable measurements are used. However, a weighted ANOVA,
in which scores are weighted by their precision, would be
appropriate. Alternatively, if all of the reviewed studies
measure the outcome variable on a single metric and if the
sample sizes ($n$s) are the same (i.e., if homogeneity of
variance exists), then one could use the "treatment x
blocks (studies)" ANOVA to examine whether different studies
have different treatment effects.

Caution is needed in making homogeneity-of-variance
assumptions in meta-analysis. In combining studies, the
sample sizes of the studies are almost always different.
When studies do have equal sample sizes, one might treat the
study effects as having equal variances (which depend mainly
on the sample sizes). More realistically, however, most
studies will not be based on the same sample sizes; thus
homogeneity of variance in combining studies cannot
typically be assumed.

Therefore, Hedges and Olkin's homogeneity tests for
effect-size meta-analysis are often necessary, and usually
more accurate than F tests in ANOVA. The homogeneity test
proposed by Hedges and Olkin does not require the assumption
of homogeneity of variance across the effect sizes, and it
can be applied to studies with unequal sample sizes.

CHAPTER II

STATEMENT OF THE PROBLEM

Power of the Statistical Test in Empirical Research

As Cohen (1962) indicated nearly three decades ago, the
power of statistical tests in empirical research is rarely
reported. This is still true today. Though many
researchers have recognized the importance of statistical
power, few estimate and report the power of statistical
tests in their studies. For example, a review of studies
published during the last ten years in the Journal of
Research in Science Teaching (1980-1990) shows that few
researchers (less than 5%) report power based on their
proposed treatment effects or sample sizes.

Theoretically, the power of a statistical test to
detect some alternative hypotheses (versus a given null
hypothesis) should be computed before the initiation of a
study. Without information on power, the test's conclusion
may be questionable. When the power of a test is reasonably
high, the decision about the hypothesis is likely to be a
valid one. When the power of a test is low, however, the
decision about the hypothesis may be confounded and
confusing. Specifically, when the probability of rejecting
the null hypothesis is low, a retained null hypothesis may
have been accepted either because it is true or because the
test lacks power.

Tversky and Kahneman (1971) even suggested that research
studies can be wasteful, as the interpretation of results is
quite difficult with tests having low power.

Overall (1969) argued that when a test has low power,
the probability of rejecting a true null hypothesis ($\alpha$)
may be only slightly smaller than the probability of
rejecting a false null hypothesis ($1 - \beta$). "As a
consequence, false rejections of valid null hypotheses may
constitute a large proportion of all significant results"
(Overall, 1969, p. 286).

By Bayes' theorem, the ratio of the probability of an
invalid rejection of $H_0$ to the total probability of
rejecting $H_0$ depends upon (1) the $\alpha$ level specified by
the investigator, (2) the power of the test, and (3) the a
priori probability that the null hypothesis is valid
(Overall, 1969). With low power, and if "the a priori
probability of validity for the null hypothesis is
substantial, an even larger proportion of significant
results may be due to chance" (Overall, 1969, p. 286).
Overall's message supports the emphasis on the power
analysis of the homogeneity test in combining studies.
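The dependence Overall describes can be made concrete. Writing $p_0$ for the a priori probability that $H_0$ is valid, Bayes' theorem gives the share of significant results that are false rejections as $\alpha p_0 / [\alpha p_0 + (1 - \beta)(1 - p_0)]$. The Python sketch below only evaluates this ratio; the function name and the values of $\alpha$, power, and $p_0$ are illustrative assumptions, not figures taken from Overall (1969).

```python
def false_rejection_share(alpha: float, power: float, prior_null: float) -> float:
    """Share of all rejections of H0 that are false rejections (Bayes' theorem).

    alpha      -- type I error rate chosen by the investigator
    power      -- 1 - beta, probability of rejecting a false H0
    prior_null -- a priori probability that H0 is valid (p_0)
    """
    false_rej = alpha * prior_null           # P(reject and H0 true)
    true_rej = power * (1.0 - prior_null)    # P(reject and H0 false)
    return false_rej / (false_rej + true_rej)


# Illustrative values only: alpha = .05 combined with high or low power.
for power in (0.80, 0.20):
    for prior_null in (0.5, 0.8):
        share = false_rejection_share(0.05, power, prior_null)
        print(f"power={power:.2f}  p0={prior_null:.1f}  false share={share:.3f}")
```

With $\alpha$ = .05, power of .20, and $p_0$ = .8, half of all significant results are false rejections; with power of .80 and $p_0$ = .5, the share falls to about .06.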
Powe o t e - s s

The test for homogeneity of effect sizes has been
described as having "excessively high statistical power"
(Hunter et al., 1982). In detecting a true difference, the
concept of a test being "too powerful" is often not a
concern. A powerful test becomes problematic when its false
rejection rate (or type I error rate) exceeds the nominal
level. Alexander et al. (1989) examined the chi-square test
of homogeneity of effect sizes when the test is applied to
correlation coefficients. Their results showed that the test
on untransformed $r$s has excessively high type I error
rates, but the test performs nominally for Fisher's
$r$-to-$z$ transformation. However, the power of the test for
homogeneity of effect sizes has yet to be studied.

As mentioned above, in meta-analysis the effect-size
analysis can involve two levels of statistical tests.
Before testing the magnitude of the average of the effect
sizes drawn from related studies, one typically examines
whether the studies share a common effect size. The
reviewer first tests the homogeneity of the effect sizes
drawn from various studies and then tests whether the common
or average effect shared by those studies is greater than
zero.

Low statistical power in the first-stage homogeneity
test can also affect the second-stage test of the magnitude
of the common effect size. When power is low, the null
hypothesis for the homogeneity test tends to be accepted;
that is, the effect sizes from the studies are assumed to be
homogeneous. The subsequent test of the magnitude of the
common effect size may be wrong (or misleading) if the
effect sizes were actually heterogeneous and this has not
been detected.

In the extreme case, if the power of the homogeneity
test is approximately zero, one would always falsely accept
the hypothesis that the effects are from the same population
(i.e., that the effects are homogeneous). Subsequent tests
of effect magnitude would be based on the average effect
size, which would be wrongly assumed to be the common
effect. The test of the magnitude of the effect will then
generally be too lenient, and the concept of the common
effect is misleading. By assuming that the test of fit has
adequate power, the researcher also assumes that subsequent
tests will behave as they should. Thus a power analysis for
the test of fit in meta-analysis has indirect benefits as
well.
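The leniency described above can be illustrated with a small Monte Carlo sketch. It is an illustration under simplifying assumptions, not a result of this study: $k = 10$ studies share a sampling variance $v = 0.2$ (roughly 10 subjects per group), the true effects are drawn from $N(0, \tau^2)$ so that the average effect is zero, and the second-stage z test uses the fixed-effects variance $v/k$, as if the first-stage test had wrongly retained homogeneity. The function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def second_stage_rejection_rate(k=10, v=0.2, tau2=0.1, reps=20_000, alpha=0.05, seed=1):
    """Monte Carlo rejection rate of the fixed-effects z test of 'mean effect = 0'.

    Illustrative assumptions: every study has sampling variance v, true effects
    are drawn from N(0, tau2) so the average effect is zero, and
    d_i | delta_i ~ N(delta_i, v) exactly.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # With equal variances the precision-weighted mean is the simple mean, and its
    # fixed-effects variance is v / k: the between-study variance tau2 is ignored.
    se_fixed = (v / k) ** 0.5
    rejections = 0
    for _ in range(reps):
        d = [rng.gauss(rng.gauss(0.0, tau2 ** 0.5), v ** 0.5) for _ in range(k)]
        if abs(sum(d) / k / se_fixed) > z_crit:
            rejections += 1
    return rejections / reps


rate_null = second_stage_rejection_rate(tau2=0.0)  # homogeneous effects
rate_het = second_stage_rejection_rate(tau2=0.1)   # undetected heterogeneity
print(f"rejection rate, tau2 = 0.0: {rate_null:.3f}")
print(f"rejection rate, tau2 = 0.1: {rate_het:.3f}")
```

Under these assumptions the true variance of the mean is $(v + \tau^2)/k = 0.03$ rather than the assumed 0.02, so the nominal .05 test rejects a true "zero average effect" roughly 11% of the time.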

Another situation using two-stage testing involves the
homogeneity-of-variance test in the analysis of variance.
Suppose the within-group variances $\sigma_j^2$ are very
different from group to group. In this case, the standard
ANOVA would be unjustified. Here the researcher also goes
through two stages: (1) testing homogeneity of variance
across the groups and (2) if homogeneity is retained,
proceeding with the ANOVA. Testing at stage 2 will be valid
only if $H_0$ at stage 1 is true. In other words, the test at
stage 2 will lack validity if the result at stage 1 is a
type II error, wherein the $H_0$ of homogeneity is falsely
retained.

A similar analogy is (1) testing the blocks-by-treatment
interaction in a two-way ANOVA design and (2) if the
interaction effect is judged to be zero, either (a) pooling
the interaction sum of squares into the error sum of
squares, or (b) forming a one-way model with treatment as
the only factor by pooling the sums of squares for blocks
and for the interaction into the error sum of squares.
Fabian (1991) pointed out that proceeding as if the
interactions were zero after failing to reject the
zero-interaction hypothesis may give incorrect decisions
with a large probability. Fabian further studied whether
considering the power of the test and obtaining information
on the neglected interactions can provide improved methods
for obtaining "(1) an interval estimate of one of the cell
expectations, (2) a simultaneous interval estimate of the
cell expectations, and (3) an estimate of the cell with the
largest expectation" (p. 362). Fabian concluded that
replacing the two-way model by the one-way model is a better
method.

In effect-size meta-analyses, the goal is to estimate
the overall average treatment effect, and the procedures
differ from those in the ANOVA analogy. When effect sizes
are judged to be consistent, the variation among the
population effect sizes is likewise ignored. However,
instead of pooling error sums of squares as in ANOVA, one
applies the fixed-effects model, which excludes the
variation of the population effect sizes. The power of the
homogeneity test is again important because it allows one
to examine whether a recommendation similar to that for the
two-way ANOVA with blocks should be made for effect-size
meta-analyses.

CHAPTER III

POWER OF HOMOGENEITY TESTS IN EFFECT-SIZE ANALYSES

In this chapter, notation and definitions are first given
for the statistics used in this study. Second, procedures
are outlined for effect-size meta-analyses under both
fixed- and random-effects models. Third, the power of the
tests of homogeneity in effect-size meta-analyses is studied
for both fixed- and random-effects models.
Definitions and Notation

Population Effect Size

Consider the $i$th of a series of $k$ studies, each
comparing two groups. The population effect size for the
two groups within study $i$ is defined as

$$\delta_i = (\mu_i^E - \mu_i^C)/\sigma_i, \qquad i = 1, \ldots, k, \tag{1}$$

where $\mu_i^E$ and $\mu_i^C$ are the population means in the
$i$th study on some outcome variable $Y_{ij}$ in the
experimental and control groups, respectively, and $\sigma_i$
is the common population standard deviation for study $i$.
Glass's Estimator of Effect Size

Glass's estimator of effect size is often used in
integrative reviews. (Examples can be found in some reviews
in the Appendix.) Glass (1976) estimated the population
effect size by the sample standardized mean difference. The
formula for Glass's effect size for the $i$th study of a set
of $k$ studies is

$$g_i = (\bar{Y}_i^E - \bar{Y}_i^C)/S_i, \tag{2}$$

where $\bar{Y}_i^E$ and $\bar{Y}_i^C$ are the sample means in the
$i$th study for the experimental and control groups, and $S_i$
is the pooled sample standard deviation from the usual
two-sample t test for the experimental and control groups.
We assume that $Y_{ij}^E$, $j = 1, \ldots, n_i^E$, and $Y_{ij}^C$,
$j = 1, \ldots, n_i^C$, are independent and normal with means
$\mu_i^E$ and $\mu_i^C$, respectively, and common population
variance $\sigma_i^2$. This is the usual t test assumption.
Unbiased Estimator of Effect Size

Glass's estimator of the population effect size is
biased. Hedges (1981) obtained a corrected effect size $d_i$,
which is the minimum variance unbiased estimator of
$\delta_i$. The unbiased estimator is approximately

$$d_i = c(m_i)\, g_i, \tag{3}$$

where

$$c(m_i) \approx 1 - \frac{3}{4m_i - 1} \quad\text{and}\quad m_i = n_i^E + n_i^C - 2.$$

The large-sample distribution of $d_i$ tends towards
normality. Hedges and Olkin (1985, p. 86) noted that if
$n_i^E$ and $n_i^C$ increase at the same rate (that is, if
$n_i^E/N_i$ and $n_i^C/N_i$ are fixed, where $N_i = n_i^E + n_i^C$),
then the asymptotic distribution of $d_i$ is normal with mean
$\delta_i$ and asymptotic variance $\sigma^2(d_i)$. We may write

$$d_i \sim N\left(\delta_i, \sigma^2(d_i)\right), \tag{4}$$

where the variance of $d_i$ is approximated by

$$\sigma^2(d_i) = \frac{n_i^E + n_i^C}{n_i^E n_i^C} + \frac{\delta_i^2}{2(n_i^E + n_i^C)}. \tag{5}$$

The variance of $d_i$, $\sigma^2(d_i)$, is estimated by
$\hat{\sigma}^2(d_i)$, a sample estimate of $\sigma^2(d_i)$ in which
$d_i$ is substituted for $\delta_i$ in formula (5). I do not use
the notation $\sigma^2(\delta_i)$ to denote the variance of $d_i$,
to avoid confusion with $\sigma_\delta^2$, introduced below.
According to Hedges and Olkin (1985, p. 193; also Hedges,
1983), the exact conditional variance $\sigma^2(d_i \mid \delta_i)$
of $d_i$ is

$$\sigma^2(d_i \mid \delta_i) = \frac{a_i}{\tilde{n}_i} + (a_i - 1)\,\delta_i^2, \tag{6}$$

where $\tilde{n}_i = n_i^E n_i^C/(n_i^E + n_i^C)$,
$a_i = m_i\,(c(m_i))^2/(m_i - 2)$, and $m_i = n_i^E + n_i^C - 2$,
and $c(m_i)$ is approximated as in (3).
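Formulas (2), (3), (5), and (6) can be checked together in a short Python sketch. Under the t-test assumptions stated above, a study's mean difference is normal and its pooled variance is a scaled chi-square deviate, so the sketch generates $g_i$ from one normal and one chi-square deviate (the chi-square is drawn with `random.gammavariate`; this generation method, the function names, and all parameter values are illustrative assumptions) and then applies the bias correction and both variance formulas, taking $\sigma_i = 1$.

```python
import math
import random


def c_approx(m: int) -> float:
    """Approximate bias-correction factor c(m) = 1 - 3/(4m - 1) from (3)."""
    return 1.0 - 3.0 / (4.0 * m - 1.0)


def simulate_g(delta: float, n_e: int, n_c: int, rng: random.Random) -> float:
    """One Glass-type estimate g_i (2) under the two-sample t-test model, sigma_i = 1."""
    m = n_e + n_c - 2
    n_tilde = n_e * n_c / (n_e + n_c)
    mean_diff = rng.gauss(delta, math.sqrt(1.0 / n_tilde))  # Ybar_E - Ybar_C, normal
    chi2_m = rng.gammavariate(m / 2.0, 2.0)                 # chi-square deviate with m df
    return mean_diff / math.sqrt(chi2_m / m)                # divide by pooled S_i


def var_large_sample(delta: float, n_e: int, n_c: int) -> float:
    """Asymptotic variance sigma^2(d_i), formula (5)."""
    return (n_e + n_c) / (n_e * n_c) + delta ** 2 / (2.0 * (n_e + n_c))


def var_exact(delta: float, n_e: int, n_c: int) -> float:
    """Conditional variance sigma^2(d_i | delta_i), formula (6), with c(m) from (3)."""
    m = n_e + n_c - 2
    n_tilde = n_e * n_c / (n_e + n_c)
    a = m * c_approx(m) ** 2 / (m - 2)
    return a / n_tilde + (a - 1.0) * delta ** 2


# Monte Carlo check for one hypothetical study: delta = 0.5, 10 subjects per group
rng = random.Random(7)
delta, n_e, n_c = 0.5, 10, 10
ds = [c_approx(n_e + n_c - 2) * simulate_g(delta, n_e, n_c, rng) for _ in range(20_000)]
mean_d = sum(ds) / len(ds)
var_d = sum((x - mean_d) ** 2 for x in ds) / (len(ds) - 1)
print(f"mean of d = {mean_d:.3f} (delta = {delta})")
print(f"simulated var = {var_d:.4f}, (5) = {var_large_sample(delta, n_e, n_c):.4f}, "
      f"(6) = {var_exact(delta, n_e, n_c):.4f}")
```

For small samples such as these, the exact variance (6) is somewhat larger than the large-sample value (5), and the simulated variance of $d_i$ tracks (6).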


A 'ca ' ati 0 Power

Effect-size Analyses for Fixed-Effects Models

In this section, I review methods for effect-size meta-
analyses in the fixed-effects case. The procedures for
analysis and the statistical tests used are briefly
described. Full details are given in Hedges and Olkin
(1985).

Hypotheses. In effect-size analyses, an estimate of
effect size is first calculated for each study using (2) and
(3) above. Combining these estimates, one can obtain an
overall estimate of effect size. Reviewers are usually
interested in testing the magnitude of the overall effect
size; typically one tests the null hypothesis of no effect.

Hedges (1982) indicated that if the underlying
population effect sizes from a series of studies are not
identical, representing the results of a set of studies by a
single estimate of effect size can be misleading. Hedges
developed a two-stage testing procedure for effect-size
meta-analysis in the fixed-effects case. At the first
stage, one tests the homogeneity of the effect sizes from
all the collected studies, and decides if the studies share
a common population effect size. If the studies are not
homogeneous, the studies probably do not share a common
population effect size. The reviewer next may attempt to
"model" or describe the studies with categorical or
regression models using study features as factors or
predictors, or may decide to adopt a random-effects
approach. If the studies are homogeneous, one can test the
hypothesis that the common effect size equals zero at the
second stage of testing.
Hypotheses examined in the two-stage testing are

$$H_{01}: \delta_1 = \delta_2 = \cdots = \delta_k = \delta, \quad\text{and} \tag{7}$$

$$H_{02}: \delta = 0. \tag{7a}$$
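Homogeneity test statistic. The statistic for the
homogeneity test of $H_{01}$ proposed by Hedges and Olkin
(1985) is

$$H = \sum_{i=1}^{k} \frac{(d_i - d_{\cdot})^2}{\hat{\sigma}^2(d_i)} \stackrel{a}{\sim} \chi^2_{k-1} \tag{8}$$

under $H_{01}$, where

$$d_{\cdot} = \frac{\sum_{i=1}^{k} d_i/\hat{\sigma}^2(d_i)}{\sum_{i=1}^{k} 1/\hat{\sigma}^2(d_i)} \tag{9}$$

is the average of the $d_i$s, weighted by the precision of
each $d_i$.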
Hedges and Olkin (1985, p. 112) noted that if the
sample sizes of the experimental and control groups in each
of the $k$ studies, $n_1^E, \ldots, n_k^E, n_1^C, \ldots, n_k^C$,
increase at the same rates (i.e., $n_i^E/N_i$ and $n_i^C/N_i$
remain fixed, where $N_i$ is the total sample size for study
$i$), then the distribution of $d_{\cdot}$ tends to normality with
mean

$$\delta_{\cdot} = \frac{\sum_{i=1}^{k} \delta_i/\sigma^2(d_i)}{\sum_{i=1}^{k} 1/\sigma^2(d_i)} \tag{10}$$

and variance

$$\sigma^2(d_{\cdot}) = \frac{1}{\sum_{i=1}^{k} \sigma^{-2}(d_i)}, \tag{11}$$

where $\sigma^2(d_i)$ is defined in (5).

When the hypothesis of homogeneity is retained, one
tests $H_{02}$ by drawing a normal confidence interval around
the weighted average $d_{\cdot}$, or by doing a z test, since
$d_{\cdot}$ is asymptotically normally distributed with a mean
equal to the common effect $\delta$ (if all $\delta_i$ equal $\delta$,
then $\delta_{\cdot} = \delta$) and a variance of $\sigma^2(d_{\cdot})$.
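The fixed-effects computations in (8), (9), and (11), followed by the second-stage z test and interval for $H_{02}$, can be sketched as follows. The sketch estimates each variance from (5) with $d_i$ substituted for $\delta_i$, and it computes the chi-square tail probability with a hand-written series for the regularized incomplete gamma function so that only the Python standard library is needed. The function names and the input data are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf(q: float, df: int) -> float:
    """Upper-tail chi-square probability via the series for the regularized lower gamma."""
    if q <= 0.0:
        return 1.0
    s, x = df / 2.0, q / 2.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))
    total, n = term, 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= x / (s + n)
        total += term
    return max(0.0, 1.0 - total)


def fixed_effects_analysis(d, n_e, n_c):
    """Return H (8), d. (9), var(d.) (11), and the z statistic for H02: delta = 0."""
    var = [(ne + nc) / (ne * nc) + di ** 2 / (2.0 * (ne + nc))  # (5) with d_i for delta_i
           for di, ne, nc in zip(d, n_e, n_c)]
    w = [1.0 / v for v in var]                                  # precision weights
    d_dot = sum(wi * di for wi, di in zip(w, d)) / sum(w)       # (9)
    h = sum(wi * (di - d_dot) ** 2 for wi, di in zip(w, d))     # (8)
    var_d_dot = 1.0 / sum(w)                                    # (11)
    return h, d_dot, var_d_dot, d_dot / math.sqrt(var_d_dot)


# Hypothetical data for k = 5 studies
d = [0.10, 0.35, 0.42, 0.60, 0.25]
n_e = [10, 30, 20, 60, 40]
n_c = [10, 30, 25, 60, 40]
h, d_dot, var_d_dot, z = fixed_effects_analysis(d, n_e, n_c)
p_h = chi2_sf(h, len(d) - 1)                      # stage 1: homogeneity test of H01
p_z = 2.0 * (1.0 - NormalDist().cdf(abs(z)))      # stage 2: H02, if H01 is retained
half_width = 1.96 * math.sqrt(var_d_dot)          # approximate 95% interval for delta
print(f"H = {h:.3f} (p = {p_h:.3f})")
print(f"d. = {d_dot:.3f}, 95% CI = ({d_dot - half_width:.3f}, {d_dot + half_width:.3f})")
print(f"z = {z:.2f} (p = {p_z:.4f})")
```

The second stage is meaningful only when the first-stage test retains homogeneity; the sketch prints both results so the two stages can be read in order.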

Distribution of the homogeneity test for fixed-effects
models. As stated before, when the $d_i$s are asymptotically
normal, under the null hypothesis that the $k$ studies share
a common effect size, the homogeneity test statistic $H$ has
an approximate central chi-square distribution with $(k - 1)$
degrees of freedom. When the $\delta_i$s are not the same
across the $k$ studies, $H$ has an approximate noncentral
chi-square distribution with $(k - 1)$ degrees of freedom and
noncentrality parameter

$$\lambda_{\cdot} = \sum_{i=1}^{k} \frac{(\delta_i - \delta_{\cdot})^2}{\sigma^2(d_i)}, \tag{12}$$

where $\delta_{\cdot}$ is the weighted mean of the $\delta_i$s shown
in (10).
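Because $H$ is approximately noncentral chi-square under the alternative, its approximate power at level $\alpha$ is $1 - F(c_\alpha; k - 1, \lambda_{\cdot})$, where $c_\alpha$ is the upper-$\alpha$ critical value of the central $\chi^2_{k-1}$ distribution. A minimal standard-library sketch follows, assuming the large-sample variances $\sigma^2(d_i)$ of (5) are known. It evaluates the noncentral CDF as a Poisson mixture of central chi-square CDFs (a standard identity chosen for this sketch, not necessarily the computation used in this study) and finds the critical value by bisection. The function names, the $\delta_i$ pattern, and the sample sizes are illustrative.

```python
import math


def chi2_cdf(q: float, df: int) -> float:
    """Central chi-square CDF: regularized lower gamma P(df/2, q/2) by its power series."""
    if q <= 0.0:
        return 0.0
    s, x = df / 2.0, q / 2.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))
    total, n = term, 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= x / (s + n)
        total += term
    return min(total, 1.0)


def ncx2_cdf(q: float, df: int, lam: float, tol: float = 1e-12) -> float:
    """Noncentral chi-square CDF written as a Poisson(lam/2) mixture of central CDFs."""
    half = lam / 2.0
    weight = math.exp(-half)            # Poisson probability of j = 0
    total, cum_weight, j = 0.0, 0.0, 0
    while True:
        total += weight * chi2_cdf(q, df + 2 * j)
        cum_weight += weight
        if 1.0 - cum_weight < tol or j > 10_000:
            return total
        j += 1
        weight *= half / j              # Poisson probability of j


def chi2_critical(alpha: float, df: int) -> float:
    """Upper-alpha critical value of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def approx_power_fixed(deltas, n_e, n_c, alpha=0.05):
    """Return (lambda., approximate power of H) for given population effects."""
    var = [(ne + nc) / (ne * nc) + dl ** 2 / (2.0 * (ne + nc))      # formula (5)
           for dl, ne, nc in zip(deltas, n_e, n_c)]
    w = [1.0 / v for v in var]
    delta_dot = sum(wi * dl for wi, dl in zip(w, deltas)) / sum(w)      # formula (10)
    lam = sum(wi * (dl - delta_dot) ** 2 for wi, dl in zip(w, deltas))  # formula (12)
    df = len(deltas) - 1
    return lam, 1.0 - ncx2_cdf(chi2_critical(alpha, df), df, lam)


# Illustrative case: k = 5 studies with 10 per group, one study with delta = 1.0
lam, power = approx_power_fixed([0, 0, 0, 0, 1.0], [10] * 5, [10] * 5)
print(f"lambda. = {lam:.3f}, approximate power = {power:.3f}")
```

When all $\delta_i$ are equal, $\lambda_{\cdot} = 0$ and the computed "power" reduces to $\alpha$, which is a convenient check on the implementation.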

Theorem. Let $d_1, \ldots, d_k$ be defined as in (4) and the
homogeneity test $H$ be defined as in (8). Then when
$H_0: \delta_1 = \cdots = \delta_k = \delta$ is true, $H \sim \chi^2_{k-1}$, and when $H_0$ is false,
$H \sim \chi^2_{k-1}(\lambda_{\cdot})$, where $\lambda_{\cdot}$ is defined in (12).

Proof. We observe $d_1, \ldots, d_k$ independently, each with
mean $\delta_i$ and variance $\sigma^2(d_i)$; that is,

$$\mathbf{d} = (d_1, \ldots, d_k)' \sim N_k\left(\boldsymbol{\delta}, \operatorname{diag}(\sigma^2(d_1), \ldots, \sigma^2(d_k))\right), \tag{13}$$

where $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_k)'$. We wish to test the hypotheses

$$H_0: \delta_1 = \delta_2 = \cdots = \delta_k = \delta \quad\text{versus}\quad H_1: \text{at least one } \delta_i \text{ is different}, \; i = 1, \ldots, k. \tag{14}$$

The null hypothesis can be rewritten in matrix form as

$$H_0: \boldsymbol{\delta} = \delta\,\mathbf{j} \tag{15}$$

for some constant $\delta$, where $\mathbf{j} = (1, \ldots, 1)'$. Let
$w_i = d_i/\sigma(d_i)$ denote $d_i$ divided by its standard error
(i.e., weighted by the square root of its precision), so that
the vector $\mathbf{w}$ of the $w_i$s is normally distributed with
a mean vector of the $\delta_i$s weighted in the same way,
denoted $\boldsymbol{\mu}_w$, and with a variance matrix equal to the
$k \times k$ identity matrix, $I_k$. In matrix form,

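$$\mathbf{w} = \begin{pmatrix} d_1/\sigma(d_1) \\ \vdots \\ d_k/\sigma(d_k) \end{pmatrix} = \operatorname{diag}\left(\frac{1}{\sigma(d_1)}, \ldots, \frac{1}{\sigma(d_k)}\right)\mathbf{d} \sim N_k(\boldsymbol{\mu}_w, I_k), \tag{16}$$

where $\boldsymbol{\mu}_w = (\delta_1/\sigma(d_1), \ldots, \delta_k/\sigma(d_k))'$. Let vector
$\mathbf{x}_0 = (1/\sigma(d_1), \ldots, 1/\sigma(d_k))'$. Under the null
hypothesis, $\boldsymbol{\mu}_w = \delta\,\mathbf{x}_0$. The projection of vector
$\mathbf{w}$ on $\mathbf{x}_0$ is

$$p(\mathbf{w} \mid \mathbf{x}_0) = \frac{\mathbf{x}_0'\mathbf{w}}{\lVert \mathbf{x}_0 \rVert^2}\,\mathbf{x}_0 = \frac{\sum_{i=1}^{k} d_i/\sigma^2(d_i)}{\sum_{i=1}^{k} 1/\sigma^2(d_i)}\,\mathbf{x}_0 = d_{\cdot}\,\mathbf{x}_0, \tag{17}$$

where $d_{\cdot}$ is defined in (9). The projection of vector
$\mathbf{w}$ on the orthogonal complement of the space spanned by
$\mathbf{x}_0$ is, by definition, the difference between $\mathbf{w}$
and its projection on $\mathbf{x}_0$:

$$\mathbf{w} - p(\mathbf{w} \mid \mathbf{x}_0) = \begin{pmatrix} (d_1 - d_{\cdot})/\sigma(d_1) \\ \vdots \\ (d_k - d_{\cdot})/\sigma(d_k) \end{pmatrix}. \tag{18}$$

The vector $\mathbf{w} - p(\mathbf{w} \mid \mathbf{x}_0)$ has a multivariate
normal distribution with mean vector whose $i$th element is
$(\delta_i - \delta_{\cdot})/\sigma(d_i)$. The squared length of this
vector is

$$\lVert \mathbf{w} - p(\mathbf{w} \mid \mathbf{x}_0) \rVert^2 = \sum_{i=1}^{k} \frac{(d_i - d_{\cdot})^2}{\sigma^2(d_i)}, \tag{19}$$

which is asymptotically distributed as a noncentral
chi-square with $(k - 1)$ degrees of freedom and
noncentrality parameter $\lambda_{\cdot}$, where

$$\lambda_{\cdot} = \sum_{i=1}^{k} \frac{(\delta_i - \delta_{\cdot})^2}{\sigma^2(d_i)}. \tag{20}$$

Under the null hypothesis, where the $\delta_i$s are equal and
$\lambda_{\cdot}$ is zero, the $H$ statistic is asymptotically
distributed as a central chi-square with $(k - 1)$ degrees of
freedom.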
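The projection argument can be checked numerically. With $w_i = d_i/\sigma(d_i)$ and $\mathbf{x}_0 = (1/\sigma(d_1), \ldots, 1/\sigma(d_k))'$, the projection coefficient in (17) should equal $d_{\cdot}$, the residual (18) should be orthogonal to $\mathbf{x}_0$, and its squared length (19) should reproduce $H$ from (8). The sketch below uses made-up $d_i$ and variances as hypothetical inputs.

```python
import math

d = [0.10, 0.35, 0.42, 0.60, 0.25]          # hypothetical effect sizes d_i
var = [0.20, 0.07, 0.09, 0.035, 0.05]       # hypothetical variances sigma^2(d_i)

sd = [math.sqrt(v) for v in var]
w = [di / si for di, si in zip(d, sd)]      # w_i = d_i / sigma(d_i)
x0 = [1.0 / si for si in sd]                # x_0 = (1/sigma(d_1), ..., 1/sigma(d_k))'

# Projection of w on x_0, formula (17): coefficient x0'w / ||x0||^2
coef = sum(wi * xi for wi, xi in zip(w, x0)) / sum(xi * xi for xi in x0)
residual = [wi - coef * xi for wi, xi in zip(w, x0)]     # formula (18)
sq_length = sum(r * r for r in residual)                 # formula (19)

# Direct computation of d. (9) and H (8) for comparison
d_dot = sum(di / v for di, v in zip(d, var)) / sum(1.0 / v for v in var)
h = sum((di - d_dot) ** 2 / v for di, v in zip(d, var))
print(f"projection coefficient = {coef:.6f}, d. = {d_dot:.6f}")
print(f"squared length = {sq_length:.6f}, H = {h:.6f}")
```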
Effect-size Analyses for Random-Effects Models

Unlike the fixed-effects case, where the population
effect sizes, the $\delta_i$s (i.e., $(\delta_1, \ldots, \delta_k)'$), are
fixed constants, in the random-effects case the $\delta_i$s are
sampled from some population. Cronbach (1980) argued that in
educational research each treatment site (or study) may be a
sample from some universe of related sites rather than from
a single population. Under the random-effects model,
variations in treatments are viewed as more or less
effective in producing an outcome. In other words, in the
random-effects model there is no "single" true (population)
effect. The true effects are from a distribution of effects
with some variance.

Since random-effects models assume that the true values
of the effect sizes are sampled from a distribution, there
are at least two sources of variation in the observed
effects. One is the variability of the effect-size
parameters in the population distribution of effects.
Another is the variability of the estimator about the true
parameter value for a particular study (due to sampling
error).

The simplest case of a random-effects model specifies
that $d_1, \ldots, d_k$ are conditionally normal; that is, each
$d_i$ given $\delta_i$ is approximately normal for the $i$th study.
The distribution of the $\delta_i$ values is often assumed to be
normal, which implies that the unconditional distribution of
$d_i$ is also normal. The unconditional distribution of $d_i$ is
then

$$d_i \sim N\left(\mu_\delta, \sigma_\delta^2 + \sigma^2(d_i \mid \delta_i)\right), \tag{21}$$

where $\mu_\delta$ is the expected value of the population
effect-size values, $\sigma_\delta^2$ is the variance of the
population distribution of effect sizes, and
$\sigma^2(d_i \mid \delta_i)$ is the variance of the conditional
distribution of $d_i$ given $\delta_i$, described in formulas (5)
and (6).
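Model (21) can be read as a two-step sampling scheme: draw $\delta_i$ from $N(\mu_\delta, \sigma_\delta^2)$, then draw $d_i$ given $\delta_i$. The sketch below is an illustration only. It takes the conditional distribution of $d_i$ to be exactly normal with the large-sample variance (5) evaluated at $\delta_i$ (so the unconditional distribution is only approximately normal), and it checks by simulation that the unconditional variance is close to $\sigma_\delta^2$ plus the average conditional variance, as the law of total variance implies. The function names and parameter values are hypothetical.

```python
import random


def cond_var(delta, n_e, n_c):
    """Large-sample conditional variance sigma^2(d_i | delta_i), formula (5)."""
    return (n_e + n_c) / (n_e * n_c) + delta ** 2 / (2.0 * (n_e + n_c))


def draw_d(mu, tau2, n_e, n_c, rng):
    """One d_i from the two-step random-effects model behind (21)."""
    delta = rng.gauss(mu, tau2 ** 0.5)                          # study's population effect
    return rng.gauss(delta, cond_var(delta, n_e, n_c) ** 0.5)   # d_i given delta_i


rng = random.Random(2024)
mu, tau2, n_e, n_c = 0.5, 0.1, 10, 10                           # hypothetical values
draws = [draw_d(mu, tau2, n_e, n_c, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
# Law of total variance: tau2 + E[cond_var(delta)], with E[delta^2] = mu^2 + tau2
expected = tau2 + (n_e + n_c) / (n_e * n_c) + (mu ** 2 + tau2) / (2.0 * (n_e + n_c))
print(f"mean = {mean:.3f}, variance = {var:.4f}, expected variance = {expected:.4f}")
```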

Hypotheses. The steps in testing for a random-effects
model are, first, to estimate the mean effect size $\mu_\delta$
(the population mean of the $\delta$s) and the variance
$\sigma_\delta^2$ and, then, to test the hypothesis that $\sigma_\delta^2$ is
zero. If $\sigma_\delta^2 = 0$, then no variation exists among the
$\delta_i$s; that is, the unconditional variance of $d_i$ in (21)
reduces to the conditional variance $\sigma^2(d_i \mid \delta_i)$,
which is the variance $\sigma^2(d_i)$ of the fixed-effects model.
A test of $\sigma_\delta^2 = 0$ in the random-effects model therefore
corresponds to a test for homogeneity of effect sizes in the
fixed-effects model. Hence, the following two hypotheses are
the same:

$$H_0: \sigma_\delta^2 = 0, \quad\text{and}\quad H_0: \delta_1 = \delta_2 = \cdots = \delta_k = \delta \text{ for some } \delta. \tag{22}$$
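Homogeneity test statistic. Under the above null
hypothesis that the population effect sizes have no
variation, the homogeneity test statistic is

$$H_+ = \sum_{i=1}^{k} \frac{(d_i - d_+)^2}{\hat{\sigma}^2(d_i \mid \delta_i)} \stackrel{a}{\sim} \chi^2_{k-1}, \tag{23}$$

where

$$d_+ = \frac{\sum_{i=1}^{k} \hat{\sigma}^{-2}(d_i \mid \delta_i)\, d_i}{\sum_{i=1}^{k} \hat{\sigma}^{-2}(d_i \mid \delta_i)}. \tag{24}$$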
The estimate of the variance is obtained by substituting
$d_i$ for $\delta_i$ in the asymptotic variance in (5).

Distribution of the homogeneity test for random-effects
models. The statistical power of the homogeneity test is
the probability of rejecting the null hypothesis when the
alternative hypothesis is true, that is, when the true
variance of the $\delta_i$s is not zero. Under the alternative
hypothesis, the distribution of the $H_+$ statistic is no
longer a central $\chi^2$, as it is under the null hypothesis
that the $\delta_i$s have no variation. However, it is not a
simple noncentral $\chi^2$ distribution either; it is a
combination of many noncentral $\chi^2$ distributions.

Theorem. Let $d_1, \ldots, d_k$ be defined as in (21) and the
homogeneity test $H_+$ be defined as in (23). Then when
$H_0: \sigma_\delta^2 = 0$ is true, $H_+ \sim \chi^2_{k-1}$, and when $H_0$ is false,
$H_+$ is a combination of many $\chi^2_{k-1}(\lambda_{\cdot})$ variates,
where $\lambda_{\cdot}$ is defined in (12).

Proof. Let $\sigma_{p_i} = \sqrt{\sigma_\delta^2 + \sigma^2(d_i \mid \delta_i)}$, let
$v_i = d_i/\sigma_{p_i}$ denote $d_i$ weighted by the square root of
its precision, and let vector
$\boldsymbol{\mu}_v = \mu_\delta(1/\sigma_{p_1}, \ldots, 1/\sigma_{p_k})'$ denote $\mu_\delta$
weighted similarly, so that the vector $\mathbf{v}$ of the $v_i$s is
normally distributed with mean vector $\boldsymbol{\mu}_v$ and with
variance equal to the identity matrix $I_k$. In matrix form,

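$$\mathbf{v} = \begin{pmatrix} d_1/\sigma_{p_1} \\ \vdots \\ d_k/\sigma_{p_k} \end{pmatrix} = \operatorname{diag}\left(\frac{1}{\sigma_{p_1}}, \ldots, \frac{1}{\sigma_{p_k}}\right)\mathbf{d} \sim N_k(\boldsymbol{\mu}_v, I_k), \tag{25}$$

where $I_k$ is an identity matrix of dimension $k$. Let vector
$\mathbf{t}_0 = (1/\sigma_{p_1}, \ldots, 1/\sigma_{p_k})'$. Under the null
hypothesis that $\sigma_\delta^2 = 0$, vector $\mathbf{t}_0$ equals vector
$\mathbf{x}_0$ (as defined in the proof for the fixed-effects
model), and vector $\mathbf{v}$ is vector $\mathbf{w}$ in formula (16)
for the fixed-effects model. Thus, under the null hypothesis,
the projection of vector $\mathbf{v}$ on $\mathbf{t}_0$ in the
random-effects model equals the projection of vector
$\mathbf{w}$ on $\mathbf{x}_0$ in the fixed-effects model:

$$p(\mathbf{v} \mid \mathbf{t}_0) = p(\mathbf{w} \mid \mathbf{x}_0) = d_{\cdot}\,\mathbf{x}_0, \tag{26}$$

and

$$\mathbf{v} - p(\mathbf{v} \mid \mathbf{t}_0) = \mathbf{w} - p(\mathbf{w} \mid \mathbf{x}_0). \tag{27}$$

The squared length of the difference between vector $\mathbf{v}$
and its projection on $\mathbf{t}_0$ is thus distributed as a
central chi-square with $(k - 1)$ degrees of freedom under
$H_0$, as was $H$ in the fixed-effects case.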

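However, the nonnull distribution of $H_+$ for
random-effects models differs from that of $H$ for
fixed-effects models. For fixed-effects models, the
distribution of $H$ under the alternative hypothesis is a
noncentral chi-square distribution. In random-effects
models, the probability that $H_+ \le h$ is an average, over the
$k$-dimensional distribution of the $\delta_i$s, of the
conditional probability given the $\delta_i$s:

$$E_{\boldsymbol{\delta}}\left[P(H_+ \le h \mid \delta_1, \ldots, \delta_k)\right] = P(H_+ \le h), \tag{28}$$

for $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_k)'$. For each possible
$\boldsymbol{\delta}$ vector from the population of $\delta_i$s, $H_+$ has a
$\chi^2_{k-1}(\lambda_{\cdot})$ distribution with noncentrality parameter
$\lambda_{\cdot}$ as in (12):

$$P(H_+ \le h \mid \delta_1, \ldots, \delta_k) = P\left(\chi^2_{k-1}(\lambda_{\cdot}) \le h\right) = F(h; g(\boldsymbol{\delta})), \tag{29}$$

where $F(h; \lambda)$ is the cumulative distribution function of
the noncentral $\chi^2_{k-1}$ distribution with noncentrality
$\lambda$ (that is, the conditional distribution function of $H_+$),
and $g(\boldsymbol{\delta})$ is the noncentrality parameter $\lambda_{\cdot}$
expressed as a function of $\boldsymbol{\delta}$. Thus

$$E_{\boldsymbol{\delta}}\left[P(H_+ \le h \mid \delta_1, \ldots, \delta_k)\right] = E_{\boldsymbol{\delta}}\left[F(h; g(\boldsymbol{\delta}))\right]. \tag{30}$$

We can also write

$$E_{\boldsymbol{\delta}}\left[F(h; g(\boldsymbol{\delta}))\right] = \int \cdots \int F(h; g(\boldsymbol{\delta}))\, f(\boldsymbol{\delta})\, d\boldsymbol{\delta}, \tag{31}$$

where $f(\boldsymbol{\delta})$ is the normal density function of the
$\delta_i$s. With $h$ taken as the critical value of the test, the
power of the random-effects homogeneity test is

$$1 - P(H_+ \le h) = 1 - \int \cdots \int F(h; g(\boldsymbol{\delta}))\, f(\boldsymbol{\delta})\, d\boldsymbol{\delta}. \tag{32}$$

No simple form of the distribution of $H_+$ under the
alternative in the random-effects case can be written.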
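Although (32) has no simple closed form, the $k$-dimensional integral can be approximated by Monte Carlo: draw $\boldsymbol{\delta}$ vectors from $f(\boldsymbol{\delta})$, compute $\lambda_{\cdot} = g(\boldsymbol{\delta})$ as in (12), average $F(h; g(\boldsymbol{\delta}))$ at the critical value $h$, and subtract the average from one. The sketch below is self-contained and uses only the standard library. It assumes, for illustration, that the conditional variances are the large-sample $\sigma^2(d_i \mid \delta_i)$ of (5) evaluated at each drawn $\delta_i$, and it evaluates the noncentral CDF as a Poisson mixture of central chi-square CDFs, a standard identity chosen for this sketch rather than the computation used in the study. The function names, the number of draws, and all parameter values are hypothetical.

```python
import math
import random


def chi2_cdf(q, df):
    """Central chi-square CDF via the series for the regularized lower gamma."""
    if q <= 0.0:
        return 0.0
    s, x = df / 2.0, q / 2.0
    term = math.exp(s * math.log(x) - x - math.lgamma(s + 1.0))
    total, n = term, 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= x / (s + n)
        total += term
    return min(total, 1.0)


def ncx2_cdf(q, df, lam, tol=1e-12):
    """F(q; lam): noncentral chi-square CDF as a Poisson(lam/2) mixture."""
    half, j = lam / 2.0, 0
    weight = math.exp(-half)
    total, cum_weight = 0.0, 0.0
    while True:
        total += weight * chi2_cdf(q, df + 2 * j)
        cum_weight += weight
        if 1.0 - cum_weight < tol or j > 10_000:
            return total
        j += 1
        weight *= half / j


def chi2_critical(alpha, df):
    """Critical value h with P(chi2_df > h) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < 1.0 - alpha else (lo, mid)
    return (lo + hi) / 2.0


def random_effects_power(mu, tau2, n_e, n_c, alpha=0.05, draws=2000, seed=0):
    """Monte Carlo approximation of (32): 1 - E_delta[F(h; g(delta))]."""
    rng = random.Random(seed)
    df = len(n_e) - 1
    h = chi2_critical(alpha, df)
    avg_cdf = 0.0
    for _ in range(draws):
        delta = [rng.gauss(mu, math.sqrt(tau2)) for _ in n_e]          # delta ~ f(delta)
        w = [1.0 / ((ne + nc) / (ne * nc) + dl ** 2 / (2.0 * (ne + nc)))
             for dl, ne, nc in zip(delta, n_e, n_c)]                   # 1/sigma^2(d_i|delta_i) via (5)
        delta_dot = sum(wi * dl for wi, dl in zip(w, delta)) / sum(w)  # as in (10)
        lam = sum(wi * (dl - delta_dot) ** 2 for wi, dl in zip(w, delta))  # g(delta), as in (12)
        avg_cdf += ncx2_cdf(h, df, lam)
    return 1.0 - avg_cdf / draws


# Illustrative: k = 10 studies, 10 per group, mu_delta = 0.5, sigma_delta^2 = 0.05
print(random_effects_power(0.5, 0.05, [10] * 10, [10] * 10))
```

With $\sigma_\delta^2 = 0$ every draw gives $\lambda_{\cdot} = 0$ and the sketch returns $\alpha$; increasing the number of draws reduces the Monte Carlo error of the approximation.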

CHAPTER IV
SIMULATION OF THE DISTRIBUTIONS OF THE STATISTICS

FOR POWER UNDER FIXED- OR RANDOM-EFFECTS MODELS

In this chapter, the asymptotic distributions of the
homogeneity statistics $H$ and $H_+$ (for fixed- and
random-effects models, respectively) are compared to
numerical simulations of those distributions. Specifically,
differences between the cumulative distribution functions of
chi-square distributions (with $\lambda_{\cdot} \ge 0$) and the
simulated cumulative distribution functions of $H$ and $H_+$
are examined. Confidence intervals at the 95% level are
drawn for the differences. The parameters varied in the
simulation include (1) the significance criterion ($\alpha$
level), (2) the noncentrality parameter of the chi-square
density (the degree to which $H_0$ is false), (3) the number
of effect sizes ($k$), and (4) the sample sizes ($n$). It is
known that, other things being equal, power increases as
sample size increases. Power likewise increases with effect
size and with the $\alpha$ level.

Parameters of the Simulation Study

An empirical study of published reviews suggested
values for the parameters of the simulation study.
Practical ranges for variables in the simulation were
determined by reviewing a random sample of twenty published
meta-analyses (see Appendix E). Many of these twenty
meta-analyses did not report sufficient information on the
original studies they reviewed to inform the selection of
variable values for the simulation. Therefore, I examined
about 40 more reviews in Review of Educational Research from
the middle of 1985 to the beginning of 1990 (volumes §§(2)
through §2(3)).

Factors examined included the following: the number of
studies (or number of independent effect sizes), $k$; the
magnitude of effect sizes ($d_i$); the sample variance of
simulated effect sizes ($S_\delta^2$); the sample size of the
experimental group for each study $i$, $n_i^E$; and the sample
size of the control group for each study $i$, $n_i^C$. From
these factors, values of the population effect sizes,
$\delta_i$; the variance of the population effects,
$\sigma_\delta^2$; and the significance level, $\alpha$, were chosen
for the simulation.

Number of Effect Sizes

In contrast to previously examined reviews (Becker,
1985), the reviews examined here tended to include more
studies, that is, to have larger $k$ values. Of the reviews
that reported information about individual studies,
approximately one fourth included more than one hundred
studies, and about one fourth analyzed fewer than twenty.
One tenth of the reviews contained fewer than ten studies.
Very rarely was the homogeneity test applied to only two
studies ($k = 2$). Although the $k$ values (numbers of
studies) were generally quite large in this set of reviews,
power studies have often been performed assuming small
numbers of studies. For this reason, a broad range of $k$
values ($k$ = 2, 5, 10, and 30) was selected for this power
study.
Sample Sizes

Based on the empirical study, average study sample sizes
($n = \sum_{i=1}^{k} N_i/k$) of 20 (e.g., 10 in each of the
experimental and control groups), 60, 120, and 200 were
selected. In empirical reviews, studies rarely have equal
sample sizes. The sample-size values in the simulation were
determined by the total sample size across studies ($N$),
the total sample size of each study ($N_i$, $i = 1, \ldots, k$),
the sampling fractions ($\pi_i = N_i/N$, $i = 1, \ldots, k$), and
the ratio of the size of the experimental group to the
total sample size of a study ($\phi_i = n_i^E/N_i$,
$i = 1, \ldots, k$).

For example, in the case of $k = 2$, with a series sample
size of $N = 40$, sampling fractions $(\pi_1, \pi_2) = (.5, .5)$
and $(.3, .7)$, and within-study sampling fractions
$(\phi_1, \phi_2) = (.5, .5)$ and $(.35, .35)$, the simulation
includes the sets of parameters described below.

Sampling fractions $(\pi_1, \pi_2) = (.5, .5)$ indicated that
the studies had equal sample sizes, that is,
$(N_1, N_2) = (20, 20)$. Two values of the within-study
sampling fractions determined the sample sizes for two sets
of samples. For $(\phi_1, \phi_2) = (.5, .5)$, samples were equal
within studies. For $(\phi_1, \phi_2) = (.35, .35)$, the ratio of
the sample size of the experimental group to the total
sample size within each study was 0.35 (and was the same
across studies).

Symbolically,

$$(\pi_1, \pi_2) = (.5, .5) \Rightarrow (N_1, N_2) = (20, 20).$$

Then,

$$(\phi_1, \phi_2) = (.5, .5) \Rightarrow n_1^E = n_1^C = 10 \text{ and } n_2^E = n_2^C = 10,$$

and for $(\phi_1, \phi_2) = (.35, .35)$,

$$n_1^E = 7,\; n_1^C = 13 \text{ and } n_2^E = 7,\; n_2^C = 13.$$

Thus the combination of fixed values of $N$ and
$(\pi_1, \pi_2)$ with the pair of $(\phi_1, \phi_2)$ values produced
two sets of sample sizes for the simulation.

Unequal sampling fractions such as $(\pi_1, \pi_2) = (.3, .7)$
indicated that some studies had larger sample sizes than
others. In this example, the ratios of the study sample
sizes to the total of the sample sizes for the two studies
were 0.3 and 0.7. Thus for $N = 40$, $(N_1, N_2) = (12, 28)$.
The two values of within-study sampling fractions again
determined the within-study sample sizes. The sampling
fractions used within studies, $(\phi_1, \phi_2) = (.5, .5)$ or
$(.35, .35)$, were the same as outlined above. Thus

$$(\pi_1, \pi_2) = (.3, .7) \Rightarrow (N_1, N_2) = (12, 28).$$

Then,

$$(\phi_1, \phi_2) = (.5, .5) \Rightarrow n_1^E = n_1^C = 6 \text{ and } n_2^E = n_2^C = 14,$$

and for $(\phi_1, \phi_2) = (.35, .35)$,

$$n_1^E = 4,\; n_1^C = 8 \text{ and } n_2^E = 10,\; n_2^C = 18.$$
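The arithmetic in these examples can be reproduced with a short sketch: $N_i = \pi_i N$, $n_i^E = \phi_i N_i$, and $n_i^C = N_i - n_i^E$. Rounding to the nearest integer is an assumption of the sketch; it reproduces the values above, but the rounding rule used for the full set of simulation conditions is not stated here. The function name is hypothetical.

```python
def study_sample_sizes(total_n, pis, phis):
    """Return [(n_E, n_C), ...] from the series size N, fractions pi_i, and phi_i.

    Assumption: N_i and n_i^E are rounded to the nearest integer.
    """
    sizes = []
    for pi, phi in zip(pis, phis):
        n_i = round(pi * total_n)          # N_i = pi_i * N
        n_e = round(phi * n_i)             # n_i^E = phi_i * N_i
        sizes.append((n_e, n_i - n_e))     # n_i^C = N_i - n_i^E
    return sizes


# The four k = 2 conditions with N = 40 described in the text
for pis in [(0.5, 0.5), (0.3, 0.7)]:
    for phis in [(0.5, 0.5), (0.35, 0.35)]:
        print(pis, phis, study_sample_sizes(40, pis, phis))
```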

The values of $N$, $\pi_i$, $N_i$, and $\phi_i$ were selected based
on my empirical study of reviews. Total sample sizes across
$k$ studies with average sample size $n = \sum_{i=1}^{k} N_i/k$ were
$N = nk = 20k$, $60k$, $120k$, or $200k$. Sampling fractions were
the ratios of the sample size of each study to the total
sample size across studies. Sampling fractions differed for
each $k$ and are listed in Table 1. Two values of the
sampling fraction within studies were selected: 0.5 or 0.35.
That is, experimental and control sample sizes were either
balanced ($\phi_i = 0.5$) or unbalanced ($\phi_i = 0.35$) within
studies. Specific numbers used for the simulation are listed
in Table 62 in Appendix B.

Table 1

Sampling Fractions for Power Study

k    Sampling fractions (π1, ..., πk)
---  ------------------------------------------------------------
2    Equal:    (.5 .5)
     Unequal:  (.3 .7)
5    Equal:    (.2 .2 .2 .2 .2)
     Unequal:  (.15 .2 .2 .2 .25)
10   Equal:    (.1 .1 .1 .1 .1 .1 .1 .1 .1 .1)
     Unequal:  (.05 .06 .07 .07 .08 .08 .09 .1 .15 .25)
30   Equal:    (.03 .03 .03 .03 .03 .03 .03 .03 .03 .03
                .03 .03 .03 .03 .03 .03 .03 .03 .03 .03
                .03 .03 .03 .03 .03 .03 .03 .03 .03 .03)
     Unequal:  (.007 .01 .01 .01 .013 .02 .02 .02 .02 .02
                .02 .023 .023 .023 .027 .027 .027 .027 .037 .037
                .037 .04 .04 .047 .056 .056 .056 .067 .067 .113)

Population Effect Sizes

In the homogeneity test, the alternative hypothesis
that "at least one population effect size differs" is a
composite hypothesis. The number and complexity of possible
alternative hypotheses make the power study difficult.
However, by examining past reviews, I have selected sets of
typical values for the $\delta_i$. The conditions depicted
include (1) the null hypothesis, where all the estimates of
effect sizes share a common population parameter ($\delta$),
and (2) several alternative hypotheses, where at least one
sampled effect size arises from a different population.

For example, the empirical reviews showed that effect
sizes often vary from study to study. Thus, a typical
pattern of the effect sizes shows a set of $\delta_i$ values
that differ slightly from each other. Other possible sets of
$\delta_i$ values are also suggested by the empirical study. One
larger $\delta_i$ value with $(k - 1)$ smaller $\delta_i$ values is
of interest (Becker, 1985). The pattern of two larger
$\delta_i$ values will also be studied when $k \ge 10$. Another
pattern of interest is one in which the $\delta_i$ values are
more evenly distributed, for example, having equal values
within three or five equal subsets but differing between
subsets.

For the fixed-effects model, five patterns of δ_i's were
designed: (1) all equal to zero; (2) δ_1 = ... = δ_{k−1} = 0
and one nonzero value δ_k (taking values 0, 0.1, 0.25, 0.5,
0.75, and 1.0); (3) δ_1 = ... = δ_{k−2} = 0 and two nonzero
δ_i's (δ_{k−1} and δ_k) (for k ≥ 10); (4) three equal subsets
of δ_i's, in which one subset contains zeros and studies in
the other two subsets share the nonzero values δ and 2δ,
respectively; and (5) five equal subsets of δ_i's where,
again, one subset contains zeros and the other four subsets
have the nonzero values ½δ, δ, 1½δ, and 2δ. The patterns of
population effects used were:

(1) (0, ..., 0),

(2) (0, ..., 0, δ),

(3) (0, ..., 0, δ, δ),

(4) (0, ..., 0, δ, ..., δ, 2δ, ..., 2δ), and

(5) (0, ..., 0, ½δ, ..., ½δ, δ, ..., δ, 1½δ, ..., 1½δ,
2δ, ..., 2δ).
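The five patterns can be generated programmatically. The Python sketch below is illustrative only. This section does not say how "equal subsets" were formed when k is not a multiple of three or five (for example, k = 5 or k = 10 with three subsets), so the helper `split_sizes` uses a hypothetical rule that gives the extra studies to the earliest subsets, starting with the subset of zeros.

```python
from typing import List


def split_sizes(k: int, m: int) -> List[int]:
    """Split k studies into m subsets whose sizes differ by at most one.

    Hypothetical rule (not stated in the text): leftover studies go to the
    earliest subsets, beginning with the subset of zeros.
    """
    base, extra = divmod(k, m)
    return [base + (1 if j < extra else 0) for j in range(m)]


def effect_pattern(pattern: int, k: int, delta: float) -> List[float]:
    """Population effects delta_1..delta_k for fixed-effects patterns (1)-(5)."""
    if pattern == 1:  # null case: all effects zero
        return [0.0] * k
    if pattern == 2:  # one nonzero value delta_k
        return [0.0] * (k - 1) + [delta]
    if pattern == 3:  # two nonzero values delta_{k-1}, delta_k (used for k >= 10)
        return [0.0] * (k - 2) + [delta, delta]
    if pattern == 4:  # three subsets: 0, delta, 2*delta
        levels = [0.0, delta, 2.0 * delta]
    elif pattern == 5:  # five subsets: 0, delta/2, delta, 1.5*delta, 2*delta
        levels = [0.0, 0.5 * delta, delta, 1.5 * delta, 2.0 * delta]
    else:
        raise ValueError("pattern must be 1, 2, 3, 4, or 5")
    effects: List[float] = []
    for level, size in zip(levels, split_sizes(k, len(levels))):
        effects.extend([level] * size)
    return effects


if __name__ == "__main__":
    print(effect_pattern(5, 10, 0.5))  # [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 0.75, 0.75, 1.0, 1.0]
```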

The population effect sizes used for the fixed-effects
models are listed in Tables 63 to 66 in Appendix B.
Variance of Population Effects

Values of the variance of the population effect sizes
(σ²_δ) in the random-effects models were also suggested
by the empirical study. The variance values selected for
the random-effects models are 0.01, 0.03, 0.05, 0.07, 0.09,
and 0.1.

Design of the Simulation Study

Combinations of the variables outlined above formed 992
patterns of simulation parameter values for fixed-effects
models and about 2400 combinations for random-effects
models. The probability distribution of the homogeneity
statistic was simulated for each combination of variables.
Simulated distributions were compared with the corresponding
asymptotic distribution at fifteen percentile points (1 − α):
0.05, 0.10(0.10)0.90 (i.e., from 0.10 to 0.90 in increments
of 0.10), 0.95, 0.975, 0.99, 0.995, and 0.999. That is,
14,880 simulated and theoretical power values were obtained
from the 992 combinations of parameters for fixed-effects
models.
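The bookkeeping behind the 14,880 figure can be checked in a few lines of Python. This is only an illustration: the constant names are hypothetical, and the list simply transcribes the fifteen percentile points named above.

```python
# The fifteen percentile points (1 - alpha) named in the text.
PERCENTILES = [0.05] + [round(0.10 * j, 2) for j in range(1, 10)] + [0.95, 0.975, 0.99, 0.995, 0.999]
# Corresponding upper-tail significance levels alpha = 1 - percentile.
ALPHAS = [round(1.0 - p, 3) for p in PERCENTILES]

FIXED_EFFECTS_COMBINATIONS = 992  # parameter combinations for fixed-effects models

# One simulated and one theoretical power value per combination and percentile point.
TOTAL_COMPARISONS = FIXED_EFFECTS_COMBINATIONS * len(PERCENTILES)

if __name__ == "__main__":
    print(len(PERCENTILES), TOTAL_COMPARISONS)  # 15 14880
```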

The simulation followed these procedures:
Case I. σ²_δ = 0, for fixed-effects models:

A. Generate 2000 replications (see rationale in
Appendix A) of normal and chi-square deviates
and compute k effect sizes (d_1, ..., d_k) for
each combination of the parameters presented
in Table 1.

B. Calculate the homogeneity statistic H from the
k generated effect sizes for each of the 2000
replications. Computations for steps A and B
were done for each replication.

C. Compute the proportions of H values (from the
2000 replications) that fall beyond central
χ²_{k−1} critical values at fifteen significance
levels (α).

D. Compare the proportions of significant H
statistics at the 15 α levels from step C to the
probabilities based on the approximate
noncentral chi-squared distribution at each
significance (α) level.
Case II. σ²_δ = 0.01(0.02)0.09, 0.10, for random-effects
models:

A. Generate 2000 replications of δ_i's (i = 1, ...,
k) from normal deviates and given sets of (μ_δ,
σ²_δ) values.

B. Calculate the noncentrality parameter λ from
each vector of δ_i's. Randomly select a value
of H+ from the noncentral chi-squared
distribution based on λ. As in Case I,
computations for steps A and B were done for
each replication.

C. Compute the proportions of H+ values (from the
2000 replications) beyond central chi-squared
critical values (χ²_α).

D. Compare the proportions of significant H+
statistics at various significance levels (α)
to the probabilities based on the calculated
power values from formula (29) in Chapter III
(page 24).
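Steps C and D of Case I, and step C of Case II, reduce to counting how many simulated statistics exceed central chi-squared critical values. The original work obtained these values from IMSL routines; the stdlib-only Python sketch below substitutes a series evaluation of the chi-squared CDF with bisection for the quantile, an implementation choice made here rather than the author's. The demonstration feeds exact chi-squared draws instead of simulated H values, so the printed rates should sit near the nominal α.

```python
import math
import random
from typing import Dict, Sequence


def chi2_cdf(x: float, df: int) -> float:
    """Central chi-squared CDF, P(df/2, x/2), from the incomplete-gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= y / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(y) - y - math.lgamma(a)))


def chi2_ppf(p: float, df: int) -> float:
    """Central chi-squared quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, float(max(df, 1))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rejection_rates(h_values: Sequence[float], k: int,
                    alphas: Sequence[float]) -> Dict[float, float]:
    """Proportion of H values beyond the central chi-squared(k - 1) critical value."""
    df = k - 1
    rates = {}
    for alpha in alphas:
        critical = chi2_ppf(1.0 - alpha, df)
        rates[alpha] = sum(h > critical for h in h_values) / len(h_values)
    return rates


if __name__ == "__main__":
    rng = random.Random(1992)
    k = 10
    # Stand-in null draws: exact chi-squared(k - 1) values, so rates should sit near alpha.
    null_h = [rng.gammavariate((k - 1) / 2.0, 2.0) for _ in range(2000)]
    print(rejection_rates(null_h, k, [0.10, 0.05, 0.01]))
```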

Attention is drawn below to the differences between
simulated and theoretical power in cases involving extreme
values, especially small values of the δ_i's, k's, and n's.
The strength and nature of the relationships between power
and the simulation parameters are also examined.

Computation for Simulated Distributions

Simulations were conducted using FORTRAN programs written
by the author, and the resulting data were analyzed with the
SPSS-X and SAS statistical packages. The accuracy of the
programs and subroutines was checked by inspecting initial
detailed printouts of results on individual iterations. For
small numbers of iterations, results of the simulation were
listed and verified by hand calculation.

Fixed-effects Models

Sample effect sizes were obtained from noncentral t
statistics, computed using normal deviates and chi-squared
random numbers generated by the IMSL subroutines DRNNOR and
RNCHI. Note that g is, up to a constant factor, exactly a
noncentral t statistic even though it is asymptotically
normal; Glass's estimator of the effect size likewise has a
t distribution. The formula used for the unbiased effect-size
estimator was

$$ d_i = \left[ 1 - \frac{3}{4(n_i^E + n_i^C) - 9} \right] g_i, $$

$$ g_i = \frac{\delta_i + \sqrt{(n_i^E + n_i^C)/(n_i^E n_i^C)}\; z_i}{\sqrt{\chi^2_i / df_i}}, $$

where z_i is a normal deviate, χ²_i is a chi-squared random
value with df_i = n_i^E + n_i^C − 2 degrees of freedom, and
n_i^E and n_i^C are the experimental and control sample sizes
of study i. H statistics were calculated from those effect
sizes using FORTRAN programs. For each given set of
population effect sizes (δ_i's) and combination of other
simulation parameters, 2000 replications of H statistics
formed a simulated distribution. Upper-tail probability
values from the simulated distributions were compared with
upper-tail probabilities of noncentral chi-squared
distributions (provided by IMSL subroutine CSNDF) at 15
percentile points. Power values were calculated as the
proportions of H statistics exceeding critical values at the
15 significance levels.
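To make the generating scheme concrete, here is a Python sketch of one replication of steps A and B. It builds g_i from a normal deviate and a chi-squared deviate with n^E + n^C − 2 degrees of freedom and multiplies by 1 − 3/[4(n^E + n^C) − 9], as in the formula above. The weights in `homogeneity_h` use the usual large-sample variance (n^E + n^C)/(n^E n^C) + d²/[2(n^E + n^C)]; that form is an assumption here, because this section does not restate the Chapter III definition of H. Python's `random` generators stand in for the IMSL routines DRNNOR and RNCHI.

```python
import math
import random
from typing import List, Sequence, Tuple


def correction(n_e: int, n_c: int) -> float:
    """Small-sample factor 1 - 3/[4(n_e + n_c) - 9], i.e. 1 - 3/(4m - 1) with m = n_e + n_c - 2."""
    return 1.0 - 3.0 / (4.0 * (n_e + n_c) - 9.0)


def simulate_d(delta: float, n_e: int, n_c: int, rng: random.Random) -> float:
    """One unbiased effect size d_i built from a normal and a chi-squared deviate."""
    df = n_e + n_c - 2
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) as Gamma(shape df/2, scale 2)
    g = (delta + math.sqrt((n_e + n_c) / (n_e * n_c)) * z) / math.sqrt(chi2 / df)
    return correction(n_e, n_c) * g


def homogeneity_h(ds: Sequence[float], sizes: Sequence[Tuple[int, int]]) -> float:
    """H = sum w_i (d_i - weighted mean)^2, w_i = inverse large-sample variance (assumed form)."""
    weights = [1.0 / ((n_e + n_c) / (n_e * n_c) + d * d / (2.0 * (n_e + n_c)))
               for d, (n_e, n_c) in zip(ds, sizes)]
    d_bar = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    return sum(w * (d - d_bar) ** 2 for w, d in zip(weights, ds))


def simulate_h(deltas: Sequence[float], sizes: Sequence[Tuple[int, int]],
               reps: int = 2000, seed: int = 1) -> List[float]:
    """Simulated distribution of H: one H per replication of k effect sizes."""
    rng = random.Random(seed)
    return [homogeneity_h([simulate_d(dl, ne, nc, rng) for dl, (ne, nc) in zip(deltas, sizes)], sizes)
            for _ in range(reps)]


if __name__ == "__main__":
    sizes = [(10, 10)] * 10  # k = 10 studies of n = 20, balanced within studies
    hs = simulate_h([0.0] * 10, sizes)
    print(sum(hs) / len(hs))  # compare with the asymptotic mean k - 1 = 9
```

With n = 20 per study, the average H under homogeneity can be compared with its asymptotic expectation k − 1; the text reports that small samples push the simulated distribution somewhat above the chi-squared reference.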
Random-effects Models

In random-effects models, population effect sizes (δ_i's)
were not fixed values; rather, they were assumed to vary
randomly around a grand mean μ_δ. In the simulation, sets
of population effect sizes δ_i were generated from normal
distributions through IMSL subroutine DRNNOR with a given
mean μ_δ and variance σ²_δ. From one set of means and
variances, 2000 replications of δ_i's were generated. For
each set of δ_i's, a noncentrality parameter λ was calculated
to obtain probability values from a noncentral chi-squared
distribution using IMSL subroutine CSNDF. A homogeneity
test statistic (H+) was drawn randomly from each noncentral
chi-squared distribution to form a set of 2000 H+ values. I
did not generate d_1, ..., d_k to calculate H+ because the
results for H from the fixed-effects models showed that the
noncentral χ²(λ) based on asymptotic theory approximates the
distribution of H well for large sample sizes. Simulated
power values were calculated as the proportions of H+ values
exceeding various percentile points of the central chi-
squared (null) distribution obtained through subroutine
CHIIN. Simulated power values were compared with those
obtained from an average of 2000 noncentral chi-squared
probabilities.
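The random-effects computations can be sketched the same way in Python. Several pieces are assumptions for illustration: λ is computed with the same large-sample weights assumed for H (the dissertation's formula (29) is not reproduced in this section), the noncentral chi-squared draw uses the representation (Z + √λ)² + χ²_{k−2}, and tail areas come from a Poisson mixture of central chi-squared tails in place of IMSL's CSNDF. The function returns the two quantities compared in the text: the simulated power (proportion of H+ beyond the critical value) and the theoretical power (average of the noncentral tail probabilities).

```python
import math
import random
from typing import Sequence, Tuple


def chi2_cdf(x: float, df: int) -> float:
    """Central chi-squared CDF from the incomplete-gamma series."""
    if x <= 0.0:
        return 0.0
    a, y = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= y / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(y) - y - math.lgamma(a)))


def chi2_ppf(p: float, df: int) -> float:
    """Central chi-squared quantile by bisection."""
    lo, hi = 0.0, float(max(df, 1))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < p else (lo, mid)
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: int, lam: float) -> float:
    """Noncentral chi-squared upper tail as a Poisson(lam/2) mixture of central tails."""
    if lam <= 0.0:
        return 1.0 - chi2_cdf(x, df)
    half = lam / 2.0
    j_max = int(half + 10.0 * math.sqrt(half) + 50)
    return sum(math.exp(j * math.log(half) - half - math.lgamma(j + 1))
               * (1.0 - chi2_cdf(x, df + 2 * j)) for j in range(j_max + 1))


def noncentrality(deltas: Sequence[float], sizes: Sequence[Tuple[int, int]]) -> float:
    """lambda = sum w_i (delta_i - weighted mean)^2 with large-sample weights (assumed form)."""
    w = [1.0 / ((ne + nc) / (ne * nc) + d * d / (2.0 * (ne + nc)))
         for d, (ne, nc) in zip(deltas, sizes)]
    mean = sum(wi * d for wi, d in zip(w, deltas)) / sum(w)
    return sum(wi * (d - mean) ** 2 for wi, d in zip(w, deltas))


def random_effects_power(mu: float, var: float, sizes: Sequence[Tuple[int, int]],
                         alpha: float = 0.05, reps: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Return (simulated power, theoretical power averaged over the drawn lambdas)."""
    rng = random.Random(seed)
    df = len(sizes) - 1
    critical = chi2_ppf(1.0 - alpha, df)
    hits, theory = 0, 0.0
    for _ in range(reps):
        deltas = [rng.gauss(mu, math.sqrt(var)) for _ in sizes]
        lam = noncentrality(deltas, sizes)
        # Noncentral chi-squared(df, lam) draw: (Z + sqrt(lam))^2 plus a central chi-squared(df - 1).
        h_plus = (rng.gauss(0.0, 1.0) + math.sqrt(lam)) ** 2
        if df > 1:
            h_plus += rng.gammavariate((df - 1) / 2.0, 2.0)
        hits += h_plus > critical
        theory += ncx2_sf(critical, df, lam)
    return hits / reps, theory / reps


if __name__ == "__main__":
    print(random_effects_power(0.25, 0.05, [(30, 30)] * 10, reps=500))
```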
A goodness-of-fit test was used to examine how accurately
the theoretical distributions matched the simulated
distributions. Patterns of power of the homogeneity test
were studied, and power values were tabulated.

Test for Goodness of Fit

A slight modification of the Kolmogorov-Smirnov one-
sample test (Massey, 1956) was used to test the goodness of
fit between the asymptotic distribution and the simulated
distribution of H. The Kolmogorov-Smirnov test focuses on
the largest of the deviations between two distributions, one
of which is an empirical distribution based on R
observations. The maximum deviation is denoted D:

$$ D = \max_X \left| F_0(X) - S_R(X) \right| \qquad (33) $$

where

F_0(X) = the theoretical cumulative distribution, that is,
the proportion of values equal to or less than X, and

S_R(X) = the observed cumulative step function of R
observations, r/R, where r is the number of
observations equal to or less than X.

An approximate critical value for D at the 0.05 level is
1.36/√R if R > 35 (Massey, 1956).
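Formula (33) and its large-sample critical value are easy to check numerically. The Python sketch below is a minimal illustration with hypothetical function names; it also notes that comparing upper-tail areas, as done in this study, gives the same D as comparing cumulative distributions.

```python
import math
from typing import Sequence


def max_deviation(f0: Sequence[float], s_r: Sequence[float]) -> float:
    """D = max |F0(X) - S_R(X)| over the points where both functions are evaluated.

    Upper-tail areas give the same value, since |(1 - F) - (1 - S)| = |F - S|.
    """
    if len(f0) != len(s_r):
        raise ValueError("both functions must be evaluated at the same points")
    return max(abs(f - s) for f, s in zip(f0, s_r))


def ks_critical_05(r: int) -> float:
    """Massey's approximate 0.05-level critical value 1.36/sqrt(R), stated for R > 35."""
    if r <= 35:
        raise ValueError("use the exact table for R <= 35 (e.g. 0.338 for R = 15)")
    return 1.36 / math.sqrt(r)


if __name__ == "__main__":
    print(round(ks_critical_05(2000), 4))  # 0.0304, the 0.030 criterion used in the text
```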

For each combination of N, k, and pattern of effect sizes,
fifteen proportions (simulated power values) were compared
to fifteen noncentral chi-squared tail areas. Thus the
empirical power function could be considered to have been
observed on R = 15 occasions. Since the 15 measured
proportions differ slightly from S_R(X) in the formula for
D, the statistic can be called D*. When R = 15, the
Kolmogorov-Smirnov critical value for goodness of fit is
0.338 at α = 0.05 (Massey, 1956). This critical value is
lenient, and no significant differences were found with
R = 15. However, since each distribution was based on 2000
H statistics and sets of probability values (R = 2000), the
critical value of D* for rejecting goodness of fit was
revised to 0.030 (1.36/√2000 ≈ 0.030). Although only 15
differences (out of a possible 2000 based on all available
probabilities) were observed, using R = 2000 should provide a
more conservative measure of differences between the two
functions than the critical value for R = 15.

RESULTS

Power Discrepancies in Fixed-effects Models
For fixed-effects models, the simulated power values
generally tended to be greater than the theoretical power
values. The average differences between the theoretical
and simulated power values at α = 0.05 were computed for
each k and N.

Table 2 shows the results of paired t tests on the
difference (theoretical power − simulated power) for each
total sample size (N) and number of effect sizes (k). These
tests of the mean differences gave general information about
the two power values for each sample-size group within k.
Both the t values and the mean differences indicated that
the discrepancy between theoretical and simulated power
values increased as k increased or N decreased.
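A paired t test of this kind works on the per-combination differences (theoretical − simulated). The Python sketch below is a generic version with hypothetical names. `t_from_summary` shows that the first row of Table 2 (mean difference 0.0001, SD 0.008, df = 23) gives t ≈ 0.06; rows whose SDs are heavily rounded in the table reproduce the printed t only approximately.

```python
import math
from typing import Sequence, Tuple


def paired_t(theoretical: Sequence[float], simulated: Sequence[float]) -> Tuple[float, float, float, int]:
    """Paired t test on differences (theoretical - simulated).

    Returns (mean difference, SD of differences, t statistic, degrees of freedom).
    """
    diffs = [t - s for t, s in zip(theoretical, simulated)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, sd, mean / (sd / math.sqrt(n)), n - 1


def t_from_summary(mean_diff: float, sd: float, n_pairs: int) -> float:
    """t = mean / (SD / sqrt(n)) from summary statistics."""
    return mean_diff / (sd / math.sqrt(n_pairs))


if __name__ == "__main__":
    # First row of Table 2: df = 23, so 24 paired power values.
    print(round(t_from_summary(0.0001, 0.008, 24), 2))  # 0.06
```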

Table 2

Paired t Test between Theoretical and Simulated Power for
Fixed-effects Model (α = 0.05)

 k    N      Mean Diff.   SD      SE      Paired t   df   p
 2    20k     0.0001     0.008   0.002     0.06     23   0.950
      60k    -0.0004     0.008   0.002    -0.24     23   0.816
      120k   -0.0010     0.010   0.002    -0.48     23   0.632
      200k    0.0013     0.007   0.001     0.94     23   0.359
 5    20k    -0.0163     0.008   0.001   -14.57     47   0.000*
      60k    -0.0062     0.010   0.002    -4.13     47   0.000*
      120k   -0.0040     0.008   0.001    -3.67     47   0.001*
      200k   -0.0011     0.008   0.001    -0.88     47   0.382
10    20k    -0.0277     0.010   0.001   -26.04     87   0.000*
      60k    -0.0091     0.010   0.001    -8.39     87   0.000*
      120k   -0.0043     0.008   0.001    -4.89     87   0.000*
      200k   -0.0013     0.008   0.001    -1.49     87   0.141
30    20k    -0.0592     0.021   0.002   -26.90     87   0.000*
      60k    -0.0139     0.015   0.002    -8.82     87   0.000*
      120k   -0.0060     0.011   0.001    -4.94     87   0.000*
      200k   -0.0035     0.009   0.001    -3.82     87   0.000*
Note: * p ≤ 0.001; a positive mean difference indicates
theoretical power > simulated power.

The data were further examined using the modified
Kolmogorov-Smirnov test to detect significant discrepancies
between theoretical and simulated power functions. The
criterion for a "significant discrepancy" is 0.030, the
critical value for D associated with formula (33). Again,
significant discrepancies increased as the number of effect
sizes (k) increased. A frequency table of the significant
discrepancies crosstabulated by k is given in Table 3, where
the difference D stands for theoretical power values minus
simulated power values.
Table 3

Crosstabulation of Discrepancies by k

                         Number of effect sizes (k)
Discrepancy              2      5      10     30     Total
D < -0.030               0      19     81     125    225
                         0%     10%    23%    36%    23%
-0.030 ≤ D ≤ 0.030       94     171    268    227    760
D > 0.030                2      2      3      0      7
                                                     1%
Total                    96     192    352    352    992

χ² = 82.9909 (df = 6, p < 0.00001)

Since only 7 of 992 (about 0.7%) distributions had
higher theoretical power values, the following analyses
ignore the sign and focus on the frequency of significant
discrepancies. More detailed information on differences
between simulated and theoretical power values is summarized
below according to the following factors: total sample size
(N), number of effect sizes (k), sampling fractions (π_i),
sample ratios (φ_i), patterns of effect sizes (four patterns
of fixed effect-size parameters), and variation in effect
sizes.

Number of effect sizes. The chi-squared test for
independence between "number of effect sizes k (2, 5, 10,
30)" and "significant discrepancy (yes or no)" was
significant (χ² = 69.8485, df = 3, p < .00001). The data in
Table 4 indicate that discrepancies occurred most often for
k = 30 and least often (almost never) for k = 2. However, as
shown in Tables 63 to 66 in Appendix B, the values of the
effect-size parameters differ for different k values.

Table 4

Crosstabulation of Significant Discrepancies by k

                 Number of Effect Sizes (k)
Significant
Discrepancy      k = 2   k = 5   k = 10   k = 30   Total
Yes              2       21      84       125      232
                 2%      11%     24%      36%      23%
No               94      171     268      227      760
Total            96      192     352      352      992

χ²₃ = 69.8485 (df = 3, p < 0.00001)
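The chi-squared tests of independence reported in this section are ordinary Pearson statistics on crosstabulated counts. As a check, the minimal Python sketch below recomputes the statistic for Table 4 from its cell counts. No continuity correction is applied; that is an assumption, but it reproduces the printed value, and the same function gives 3.80 for the counts in Table 7.

```python
from typing import Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_totals) - 1)


# Table 4: rows are Yes / No, columns are k = 2, 5, 10, 30.
TABLE_4 = [[2, 21, 84, 125], [94, 171, 268, 227]]

if __name__ == "__main__":
    stat, df = pearson_chi2(TABLE_4)
    print(round(stat, 2), df)  # 69.85 3
```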

For k = 2, the possible conditions were the null case (all
δ_i's were zero) and the one-extreme-value case. For k = 5,
one additional condition had three equal subsets of parameter
effects. Only k = 10 and k = 30 contained all possible
conditions: the null case, the one-extreme-value case, the
two-extreme-values case, three equal subsets of parameter
effects, and five equal subsets of effects. Comparisons of
results across k values therefore overlook other important
factors, such as the pattern of δ_i's. Further analysis for
each k value was necessary and is described below.

Sample sizes (N). Discrepancies between simulated and
asymptotic distributions happened more often for small total
sample sizes (N), especially with larger k values. The
chi-squared test of the dependence between "total sample
size N (20k, 60k, 120k, 200k)" and "significant discrepancy
(yes or no)" gave 260.7375 (df = 3, p < 0.00001). The data
in Table 5 indicate that discrepancies occurred most often
for the smallest N and least often for the largest N. In
other words, when total sample sizes were small, especially
for N = 20k, simulated distributions showed higher power
values than theoretical distributions. The asymptotic power
fitted much better for effect sizes calculated from samples
of 120 (60 in each experimental or control group) or
greater.

For each value of k, the discrepancies between the
simulated and asymptotic distributions were generally
smaller for larger N's. For k = 2, simulated and theoretical
distributions fitted well: only 2 of 96 combinations had
significant discrepancies, and they are not discussed
further. Chi-square tests for the independence of "total
sample size" and "significant discrepancy" within each k
were as follows: for k = 5, χ²₃ = 0.16 (df = 3, p = 0.984);
for k = 10, χ²₃ = 100.95 (df = 3, p < 0.00001); and for
k = 30, χ²₃ = 223.43 (df = 3, p < 0.00001). Complete
information is listed in Table 6.

Table 5

Crosstabulation of Significant Discrepancies by Sample Size

                 Total Sample Size (N)
Significant
Discrepancy      20k      60k      120k     200k     Total
Yes              149      46       23       14       232
                 60%      18%      9%       6%       23%
No               99       202      225      234      760
                                                     77%
Total            248      248      248      248      992

χ²₃ = 260.7375 (df = 3, p < 0.00001)

These results suggest that simulated distributions
with large sample sizes (N) fitted the calculated noncentral
chi-squared distributions better, which illustrates the
"asymptotic" (large-sample) nature of those distributions.
Discrepancies occurred more often with small samples.
Results for each k showed that the differences among sample
sizes became stronger as k increased. When k increased,
small total sample sizes N were composed of more small
(within-study) samples.

Table 6

Crosstabulation of Significant Discrepancies by k and N

                        Total Sample Size
Significant
Discrepancy     N = 20k   N = 60k   N = 120k   N = 200k   Total
k = 5
  Yes           5         6         5          5          21
                10%       13%       10%        10%        11%
  No            43        42        43         43         171
  χ²₃ = 0.16 (p = 0.984)                                  192
k = 10
  Yes           55        16        9          4          84
                63%       18%       10%        5%         24%
  No            33        72        79         84         268
  χ²₃ = 100.95 (p < 0.00001)                              352
k = 30
  Yes           88        24        8          5          125
                100%      27%       9%         6%         36%
  No            0         64        80         83         227
  χ²₃ = 223.43 (p < 0.00001)                              352

 

Sampling fractions (π_i). Discrepancies between
simulated and calculated power values did not depend on the
"pattern of sample sizes" designated by the sampling
fractions (π_i). The sets of sampling fractions were either
balanced or unbalanced. When sample sizes were the same for
all effect sizes, sample sizes were considered balanced.
Unbalanced sample sizes were designed according to the
sampling fractions obtained from the empirical study
discussed at the beginning of Chapter IV and listed in Table
62 in Appendix B.

The chi-squared test of independence between "significant
discrepancy" and "sampling fraction" gave 3.80 (df = 1,
p = 0.051). Frequencies of discrepancies are listed in
Table 7.

Table 7

Crosstabulation of Significant Discrepancies by π_i

                 Sampling Fraction π_i
Significant
Discrepancy      Balanced      Unbalanced     Total
Yes              129 (26%)     103 (21%)      232 (23%)
No               367           393            760 (77%)
Total            496           496            992

χ²₁ = 3.80 (p = 0.051)

However, as noted in the description of the unbalanced
sample-size pattern, large effects were accompanied only by
large samples. The results were therefore not completely
independent (as also indicated by the observed significance
level of 0.051 for the chi-square test); simulated values
for unbalanced samples across studies tended to be higher
than the theoretical values. Detailed information for each
value of k is listed in Table 8.

Table 8

Crosstabulation of Significant Discrepancies by π_i and k

                    Sampling Fraction π_i
Significant
Discrepancy     Balanced     Unbalanced    Total
k = 5
  Yes           11 (12%)     10 (10%)      21 (11%)
  No            85           86            171
  χ²₁ = 0.054 (p = 0.817)                  192
k = 10
  Yes           50 (28%)     34 (19%)      84 (24%)
  No            126          142           268
  χ²₁ = 4.003 (p = 0.045)                  352
k = 30
  Yes           68 (39%)     57 (32%)      125 (36%)
  No            108          119           227
  χ²₁ = 1.501 (p = 0.22)                   352

 

Sample ratios (φ_i). Discrepancies between theoretical
and simulated power did not depend on the ratio φ_i of n_i^E
to the total sample size n_i within a study. The
chi-squared value for "significant discrepancy" and "sample
ratio (0.5 or 0.35)" was 0.563 (df = 1, p = 0.453). Results
are listed in Table 9. This result was consistent within
each k value; proportions of significant discrepancies for
each k are listed in Table 10.

Table 9

Crosstabulation of Significant Discrepancies by φ_i

                 Sample Ratio φ_i
Significant
Discrepancy      n_i^E/n_i = 0.5   n_i^E/n_i = 0.35   Total
Yes              121 (24%)         111 (22%)          232 (23%)
No               375               385                760 (77%)
Total            496               496                992

Table 10

Crosstabulation of Significant Discrepancies by φ_i and k

                        Sample Ratio φ_i
Significant
Discrepancy   k     n_i^E/n_i = 0.5   n_i^E/n_i = 0.35   Total
Yes           5     13 (14%)          8 (8%)             21 (11%)
              10    41 (23%)          43 (24%)           84 (24%)
              30    65 (37%)          60 (34%)           125 (36%)

 

Patterns of effect-size parameters. In the simulation,
the non-null effect-size parameters were designed with four
patterns: (1) one distinct value with the other values
zero, (2) two distinct values with the others zero, (3)
three subsets with values equal within each subset but
different across subsets, one subset containing zeros, and
(4) five subsets with values equal within but different
across subsets, one subset containing zeros. Significant
discrepancies between simulated and theoretical values
depended on the pattern of effect sizes.

The chi-square test for the independence of
"significant discrepancy (yes or no)" and "pattern of
effect sizes" gave 24.03 (df = 4, p < .0001). As listed in
Table 11, discrepancies occurred more often when population
effects had one or two extreme values. Simulated values
were higher than theoretical power values when one or two
extreme parameter values existed.

Table 11

Crosstabulation of Significant Discrepancies
by Pattern of δ_i's

                 Pattern of Effect-size Parameters
Significant      Zero      One       Two        Three     Five
Discrepancy      Effects   Extreme   Extremes   Subsets   Subsets   Total
Yes              8         86        54         47        37        232
                 13%       27%       34%        16%       23%       23%
No               56        234       106        241       123       760
Total            64        320       160        288       160       992

χ²₄ = 24.031 (p < 0.0001)

As was true for the other factors, when total sample
size N increased, the pattern of effect sizes became less
relevant in introducing discrepancies. However, even with
large sample sizes, significant discrepancies still occurred
more often when extreme population effects existed than when
effect sizes were more evenly distributed. As the number of
effects k increased, the differences in discrepancy rates
across sample sizes or patterns of effect sizes also
increased. Results for each k value are listed in Table 12.
Detailed information on power discrepancies and patterns of
effect sizes for each k by N combination is listed in
Table 13.

Table 12

Crosstabulation of Significant Discrepancies
by Pattern of δ_i's and k

Significant    Zero      One       Two        Three     Five
Discrepancy    Effects   Extreme   Extremes   Subsets   Subsets   Total
k = 5
  Yes          0         17        -          4         -         21
               0%        21%                  4%                  11%
  No           16        63        -          92        -         171
  χ²₂ = 15.217 (p < 0.001)                                        192
k = 10
  Yes          3         28        21         16        16        84
               19%       35%       26%        17%       20%       24%
  No           13        52        59         80        64        268
  χ²₄ = 9.336 (p = 0.053)                                         352
k = 30
  Yes          5         39        33         27        21        125
               31%       49%       41%        28%       26%       36%
  No           11        41        47         69        59        227
  χ²₄ = 12.683 (p = 0.013)                                        352

Table 13

Crosstabulation of Significant Discrepancies
by Pattern of δ_i's, k, and n

               Pattern of Effect-size Parameters
k(n)       Zero      One       Two        Three     Five      Total
           Effects   Extreme   Extremes   Subsets   Subsets   % (Count)
5(20)      0         15%       -          8%        -         10% (5)
5(60)      0         20%       -          8%        -         13% (6)
5(120)     0         25%       -          0         -         10% (5)
5(200)     0         25%       -          0         -         10% (5)
                                                    21/192 = 11%
10(20)     50%       65%       65%        58%       65%       63% (55)
10(60)     25%       35%       25%        4%        10%       18% (16)
10(120)    0         25%       15%        0         5%        10% (9)
10(200)    0         15%       0          4%        0         5% (4)
                                                    84/352 = 24%
30(20)     100%      100%      100%       100%      100%      100% (88)
30(60)     25%       55%       40%        13%       5%        27% (24)
30(120)    0         20%       20%        0         0         9% (8)
30(200)    0         20%       5%         0         0         6% (5)
                                                    125/352 = 36%
Total                                               232/992 = 23%

 

When there were many studies with small sample sizes,
discrepancies between the asymptotic and simulated
distributions increased. As described above, discrepancies
occurred most often when the set of parameters had one
extreme value. In fact, when k = 30 and n = 20, almost half
of the measured percentile points of each simulated
distribution were significantly higher than those of the
theoretical distribution (these values are not tabled).
The simulation data repeatedly indicated that when effect
sizes came from many studies (e.g., k = 30), all with small
sample sizes (e.g., n = 20), the homogeneity test produced
greater simulated power values than the asymptotic theory
predicted. The discrepancies between the asymptotic and
simulated distributions became nonsignificant as sample
sizes increased.

Further analyses examined the magnitudes of the power
discrepancies. Of the 14,880 measures (992 combinations ×
15 percentiles), 986 had significant discrepancies: 978 were
negative (theoretical values lower than simulated values),
and in 8 the theoretical values were higher than the
simulated values. The frequency distribution of the 986
significant differences (theoretical values − simulated
values) was negatively skewed, ranging from −0.15 to 0.04,
with a mean of −0.051, a mode of −0.035 (333 cases, or 33.8%,
showed this modal discrepancy), and a standard deviation of
0.02. Figure 4.1.0 is a frequency table of the absolute
values of these discrepancies.

A paired t test showed that, overall, theoretical values
were lower than simulated power values by an average of
0.008 (t = −46.40, p < 0.0001, for 14,880 records). Of the
986 absolute values of significant discrepancies, about one
third (34%) ranged from 0.03 to 0.04, more than one half
(56%) were less than 0.05, and almost all (99%) were less
than 0.10.

Figure 4.1.0

Frequencies of Absolute Significant Discrepancies

Count   Value      (each * = about 8 cases)
  333   .03-.04    ******************************************
  223   .04-.05    ****************************
  146   .05-.06    ******************
  114   .06-.07    **************
   87   .07-.08    ***********
   45   .08-.09    ******
   26   .09-.10    ***
    6   .10-.11    *
    2   .11-.12
    0   .12-.13
    3   .13-.14
    0   .14-.15
    1   .15-.16
-----
  986
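The percentages quoted above follow directly from the bin counts. The short Python check below reads the .07-.08 bin as 87 rather than 37, since only that reading makes the thirteen bins sum to 986 and puts 99% of the discrepancies below 0.10; the dictionary layout is an illustrative choice.

```python
# Counts from Figure 4.1.0, keyed by the lower edge of each bin in hundredths
# (3 means .03-.04). The .07-.08 count is read as 87 so that the bins total 986.
BINS = {3: 333, 4: 223, 5: 146, 6: 114, 7: 87, 8: 45, 9: 26,
        10: 6, 11: 2, 12: 0, 13: 3, 14: 0, 15: 1}

TOTAL = sum(BINS.values())


def share_below(limit_hundredths: int) -> float:
    """Share of significant discrepancies whose absolute value is below limit/100."""
    return sum(c for lower, c in BINS.items() if lower < limit_hundredths) / TOTAL


if __name__ == "__main__":
    print(TOTAL)  # 986
    for limit in (4, 5, 10):
        print(limit / 100, round(share_below(limit), 2))
```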

As discussed above, discrepancies occurred most often
for large k, small N, and extreme parameter effect sizes.
The magnitudes of the discrepancies also appeared to be
greater under these conditions. Mean discrepancies by
pattern of population effects, number of effects k, and
sample size are listed in Table 14. The mean significant
discrepancy for k = 30 and n = 20 was about 0.058 (for 594
records).

Table 14
Means of Significant Discrepancies by Pattern of δ_i's, k, and n

                  Pattern of Effect-size Parameters
k(n)       Zero        One          Two          Three        Five
           Effects     Extreme      Extremes     Subsets      Subsets      Total
2(20)      -           .031 (1)     -            -            -            .031 (1)
2(60)      -           - (0)        -            -            -            - (0)
2(120)     -           .031 (1)     -            -            -            .031 (1)
2(200)     -           - (0)        -            -            -            - (0)
5(20)      -           .037 (6)     -            .037 (5)     -            .037 (11)
5(60)      -           .044 (15)    -            .033 (2)     -            .035 (17)
5(120)     -           .042 (6)     -            - (0)        -            .042 (6)
5(200)     -           .036 (8)     -            - (0)        -            .036 (8)

10(20)     .038 (8)    .040 (47)    .036 (34)    .036 (33)    .038 (38)    .038 (160)
10(60)     .031 (1)    .044 (24)    .043 (15)    .036 (1)     .038 (3)     .043 (44)
10(120)    - (0)       .046 (10)    .041 (4)     - (0)        .033 (1)     .040 (15)
10(200)    - (0)       .035 (8)     - (0)        .033 (1)     - (0)        .028 (9)

30(20)     .052 (28)   .058 (134)   .058 (140)   .058 (151)   .057 (141)   .057 (594)
30(60)     .036 (2)    .048 (43)    .054 (28)    .033 (4)     .038 (1)     .049 (78)
30(120)    - (0)       .063 (15)    .049 (10)    - (0)        - (0)        .058 (25)
30(200)    - (0)       .050 (15)    .033 (2)     - (0)        - (0)        .048 (17)

* Underlining indicates that average theoretical power was
higher. Numbers in parentheses are counts of differences.

Summary. Simulated distributions tended to have fatter
upper tails than noncentral chi-squared distributions.
Simulated distributions fitted noncentral chi-squared
distributions quite well when studies had large sample sizes
or evenly distributed effects. Discrepancies occurred most
often, and were largest, when a review included many studies
(large k) with small sample sizes, or when studies had
extreme parameter effects.

In other words, homogeneity tests were more sensitive
than the theory indicated for data with small sample sizes
or extreme parameter effects. The noncentral chi-squared
distributions based on asymptotic theory were useful for
data with large samples and evenly distributed parameter
effects. Using the asymptotic theory to obtain power for the
homogeneity test would give conservative power estimates for
data with small samples or non-normal population effects.

In his paper, Bangert-Drowns (1986) questioned the use
of the homogeneity test because of the lack of understanding
of the behavior of the statistic for small or nonnormal
samples. The simulation data indicated that the simulated α
values for homogeneous population effects approximately
equaled the preset significance levels. Only for large
collections of small samples was the size of the test
significantly greater than 0.05. As shown in Table 11, when
k = 30 and the average within-study sample size n = 20,
simulated sizes and power values were consistently higher
than theoretical values. Simulated sizes were around 0.10
(0.05 higher than the nominal level) for n = 20 and k = 30
(Table 11). Under the null hypothesis, these higher values
indicate an inflated rate of false rejections (Type I
errors).

When effects were not homogeneous (i.e., under
alternative hypotheses), higher simulated power for small
samples and extreme parameter effects was not problematic.
In these cases (1) heterogeneity should be detected (since
H0 is false), and (2) simulated power values were not much
higher than the asymptotic power values. Asymptotic power
thus underestimated the power of the homogeneity test for
extreme parameter effects and small samples.
Power Discrepancies in Random-effects Models

Results (patterns of discrepancies) were similar across
the population effect-size means μ_δ of 0, 0.1, 0.25, and
0.5. Table 15 shows the results of paired t tests for each
k, comparing theoretical and simulated power values.

Table 15

Paired t Test between Theoretical and Simulated Power for
Random-effects Model

k    μ_δ    Mean Diff.   SD      SE      Paired t   df     p
2    0.00    0.0003     0.008   0.000    1.64     2399   0.102
     0.10    0.0002     0.008   0.000    1.07     2399   0.287
     0.25    0.0000     0.008   0.000    0.09     2399   0.931
     0.50   -0.0000     0.008   0.000   -0.06     2399   0.950
5    0.00   -0.0002     0.007   0.000   -1.14     2399   0.253
     0.10   -0.0004     0.007   0.000   -2.53     2399   0.012*
     0.25    0.0001     0.007   0.000    0.83     2399   0.405
     0.50    0.0003     0.008   0.000    1.90     2399   0.058
10   0.00   -0.0000     0.006   0.000   -0.08     2399   0.933
     0.10    0.0002     0.007   0.000    1.37     2399   0.172
     0.25   -0.0001     0.007   0.000   -0.58     2399   0.559
     0.50   -0.0006     0.006   0.000   -4.75     2399   0.000#
30   0.00   -0.0003     0.007   0.000   -1.90     1919   0.057
     0.10    0.0006     0.007   0.000    3.76     1919   0.000#
     0.25    0.0002     0.007   0.000    1.28     1919   0.202
     0.50   -0.0003     0.007   0.000   -2.00     1919   0.046*
Note: * p < 0.05, # p < 0.001; a positive mean difference
indicates theoretical power > simulated power.

For k = 2, none of the paired t tests was significant at
the 0.05 level. For k = 5 and k = 10, one average difference
was significant. For k = 30, one group was clearly
significant and another barely significant. The mean
differences were very small; their statistical significance
was largely due to the large degrees of freedom and small
standard errors. These average differences would not be
consequential for the interpretation of theoretical power
values.

The modified Kolmogorov-Smirnov one-sample test was
again used to determine the goodness of fit between the
asymptotic and simulated power functions. As in the
fixed-effects case, 2000 replications were used for the
random-effects models, so the critical value for the
maximum deviation D (formula (33)) was again 0.030.

Only 20 of 2432 (0.8%) distributions had significant
discrepancies. For k = 2, 9 of 640 (1.4%) distributions had
significant discrepancies; for k = 5, 6 of 640 (0.9%); for
k = 10, 2 of 640 (0.3%); and for k = 30, 3 of 512 (0.6%).
Frequencies of power discrepancies are listed in Table 16.

Significant discrepancies occurred in fewer than 1 of
100 distributions. Their occurrence depended on k, the
number of effect sizes (test of association, χ²₃ = 14.898,
p < 0.005), but not on sample size N (χ²₃ = 4.861,
p < 0.25). No significant association (χ²₉ = 10.18,
p < 0.40) was found among the number of effect sizes (k),
sample size (N), and the occurrence of significant power
discrepancies; in other words, the dependence of power
discrepancies on sample size N did not vary with k.
Significant discrepancies occurred most often for k = 2;
however, the occurrence rate was still less than 1.5%.

Table 16

Frequency Table for Significant Discrepancies
for Random-effects Model

                   μ_δ                        Total    Total
k     N     0.00   0.10   0.25   0.50       for N    for k
2     1     -      -      2      1          3
      2     -      -      2      -          2
      3     1      1      1      -          3
      4     -      -      1      -          1        9
5     1     1      -      1      1          3
      2     -      -      -      -          0
      3     -      -      -      1          1
      4     -      1      1      -          2        6
10    1     -      1      -      -          1
      2     -      -      -      -          0
      3     -      -      -      -          0
      4     -      1      -      -          1        2
30    1     -      1      -      -          1
      2     -      1      -      -          1
      3     -      -      -      -          0
      4     -      1      -      -          1        3

Total       2      7      8      3                   20

The magnitude of significant power discrepancies for
random-effects models was also examined. Significant
discrepancies occurred for 28 of 36,480 (0.08%) measures.
Unlike the fixed-effects models, where simulated power
values were almost always higher than theoretical power
values, for random-effects models a strong two-thirds
(19/28) of the discrepancies reflected lower simulated power
values. The mean of the 28 significant values was 0.009
(t = 1.48, p > .05); that is, the mean did not differ
significantly from zero. In other words, theoretical power
values were not consistently either higher or lower on
average than simulated power values.

The occurrence rates as well as the magnitudes of
significant discrepancies differed between random- and
fixed-effects models. The dissimilarity may have resulted
partly from the fact that population effect sizes in the
random-effects models were all normally distributed, unlike
the cases examined for fixed-effects models. Also, the H+
statistics in the random-effects models were generated from
asymptotic noncentral chi-squared distributions. Results
from the fixed-effects case had indicated that the
theoretical power functions approximated the simulated
functions well when sample sizes were large. However,
simulated power values in the random-effects simulations may
still underestimate the true power for small samples and
large k values. Because the simulation data did not indicate
many differences between simulated and theoretical power
values, the analysis of power for random-effects models will
focus on the theoretical values.
Power Analysis

Power values at α = 0.05 were selected for analysis.
Factors for the power analysis included the number of
effect sizes k (2, 5, 10, 30), total sample sizes N (20k,
60k, 120k, 200k), sampling fractions π_i (balanced vs.
unbalanced sample sizes across studies), sample ratios φ_i
(balanced vs. unbalanced sample sizes within studies), and
patterns of effect-size parameters. Relations between power
and these factors were studied through analysis of variance,
regression, correlation, and curve fitting.

Fixed-effects model. Power values for the homogeneity
test were positively related to the variance of simulated
effects, sample size N, and number of effects k. However,
since these variables were not directly (or linearly)
related to power, the correlation coefficients representing
the relationships appeared weak. For the fixed-effects model,
the correlation coefficient r(power, k) was 0.15 (p =
0.001), and r(power, √k) was 0.16. Between power and total
sample size N, the correlation coefficient r(power, N) was
0.38 (p < 0.001), and r(power, √N) was 0.43. The
relationship between power and the spread among population
effects was stronger: r(power, σ²_δ) was 0.47 (p < 0.001),
and r(power, σ_δ) was 0.64.¹ The relations between sampling
fraction or sample ratio and power were not significant.
The correlation coefficient r(power, η) was 0.05 (p = 0.14)
and r(power, ϕ) was -0.02 (p = 0.44).

A regression analysis of the power values used a
stepwise procedure. This stepwise procedure selected
predictor variables in order of the amount of variation
(change in R²) in power values explained by each predictor.
The variable representing the pattern of δ_i's was not
continuous and thus was not entered as a predictor
variable. As expected, the weighted average of parameter
effects, δ. (as in formula (9) in Chapter III, page 17), and
the spread of the δ_i's, σ_δ, increased linearly within each
pattern of δ_i's. The combination of δ. and σ_δ contained
information about the pattern of δ_i's. Therefore, δ. and σ_δ
were entered into the regression as predictor variables
instead. The association between the pattern of δ_i's and
power was also studied below via analysis of variance.

The predictor variable first selected in the regression
model was the index of spread among parameter effects σ_δ
(multiple R = 0.64, R² = 0.41, F_{1,990} = 678.49, p < 0.0001,
for 992 cases). The square root of total sample size, √N,
was next to be included in the model, with R increased to
0.83 (R² = 0.69, R² change = 0.28, F_{2,989} = 1105.84, p <
0.0001). The third predictor included in the regression
model was the square root of k, √k (R = 0.84, R² = 0.70, R²
change = 0.01, F_{3,988} = 765.44, p < 0.0001). Sampling
fraction between studies (η: 1 = balanced, 2 = unbalanced)
had a very small effect but was also selected into the
model last (R = 0.84, R² = 0.70, R² change = .002). The
final regression model for combination of parameters j for
the fixed-effects model is listed below:

$$ \widehat{\text{Power}}_j = -0.252 + 1.839\,(\sigma_\delta)_j + 0.012\,\sqrt{N_j} + (-0.032)\,\sqrt{k_j} + 0.035\,\eta_j. \qquad (34) $$

¹ In the fixed-effects case, σ_δ represents the distance
between fixed δ_i values.

As predicted, the spread in the δ_i's explained much of the
variation in power. Total sample size was also important.
Number of effects k had a smaller effect, since N = k·n had
already partially taken into account the effect of k.

Analysis of variance was also conducted for power, with
number of effects, sample sizes, sampling fraction, sample
ratio, and pattern of parameters as factors for the fixed-
effects model. Results are listed in Table 17.

The power of Q was explained most by sample size and
the pattern of δ_i's. Sampling fraction and sample ratio
again had little influence on the power of Q. This result
seems reasonable, since effect sizes for the homogeneity
test were weighted by their precision, which is nearly
proportional to the sample sizes (see formulas (5) and (9)
in Chapter III, pages 15 and 16), and the power of the
homogeneity test should depend on whether the effect sizes
were similar. Changes in sample sizes combined with values
of the δ_i's should therefore affect the power of the
homogeneity test: as the total sample size increased,
differences among the effect sizes were also emphasized.
However, the regression model explained the variation in
power better than the ANOVA model. The ANOVA model explained
about 31% of the variation, much less than the regression
model (70%).
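The argument above, that precision weights grow with sample size and so amplify differences among the δ_i's, can be made concrete with a minimal sketch of the asymptotic power of the fixed-effects homogeneity test. It assumes the usual large-sample variance of a standardized mean difference, σ²(g_i | δ_i) = (n_E + n_C)/(n_E n_C) + δ_i²/[2(n_E + n_C)], with each study's total size n split equally between two groups, and treats Q as noncentral chi-squared with k - 1 degrees of freedom and noncentrality λ = Σ w_i(δ_i - δ.)², where w_i = 1/σ²(g_i | δ_i). The function names are hypothetical, and the sketch is not offered as the program that produced Tables 18 to 28.

```python
import math


def chi2_cdf(x, df):
    """Central chi-squared CDF, computed as the regularized lower incomplete
    gamma function P(df/2, x/2) from its power series."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    # P(a, z) = z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-squared, by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:
        lo, hi = hi, 2.0 * hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_cdf(x, df, lam):
    """Noncentral chi-squared CDF as a Poisson(lam/2) mixture of central CDFs."""
    if lam == 0.0:
        return chi2_cdf(x, df)
    half = lam / 2.0
    jmax = int(half + 12.0 * math.sqrt(half) + 50)
    total = 0.0
    for j in range(jmax + 1):
        log_pois = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_pois) * chi2_cdf(x, df + 2 * j)
    return total


def q_power(deltas, n, alpha=0.05):
    """Asymptotic power of the fixed-effects homogeneity test Q when each study
    has total size n split equally between two groups (an assumption)."""
    k = len(deltas)
    ne = nc = n / 2.0
    w = [1.0 / ((ne + nc) / (ne * nc) + d * d / (2.0 * (ne + nc))) for d in deltas]
    dbar = sum(wi * di for wi, di in zip(w, deltas)) / sum(w)
    lam = sum(wi * (di - dbar) ** 2 for wi, di in zip(w, deltas))
    crit = chi2_critical(alpha, k - 1)
    return 1.0 - ncx2_cdf(crit, k - 1, lam)
```

For example, q_power([0.0, 0.0, 0.0, 0.0, 0.5], 20) and q_power([0.0, 0.0, 0.0, 0.0, 0.5], 200) compare a one-extreme-value pattern at two per-study sample sizes; the larger studies yield larger weights, a larger λ, and higher power.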

Table 17

Analysis of Variance for Power of Q

Source of             Sum of      Amt.    df     Mean        F        p
Variation             Squares     Exp.           Squares

Main Effect           39.031     (31%)    11     3.548     38.255   .000
  k                    1.642     ( 1%)     3      .547      5.900   .001
  Sample size         33.706     (27%)     3    11.235    121.131   .000
  Sampling fraction     .316     ( 0%)     1      .316      3.408   .065
  Sample ratio          .088     ( 0%)     1      .088       .944   .331
  Pattern of δ_i's     2.250     ( 2%)     3      .750      8.087   .000

Residual              84.963     (69%)   916      .093

Total                123.994             927      .134
 

Average power values at α = 0.05 were calculated. For
the fixed-effects model, the grand mean power was 0.44
(across 992 cases) with a standard deviation of 0.37.
Because too much information is aggregated in the grand
mean, this value has little practical meaning, and further
categorization of the data was necessary. Mean power values
for each pattern of δ_i's and total sample size are listed
in Table 18.

Table 18

Means of Theoretical Power of Q by Pattern of δ_i's,
k, and n (α = 0.05)

                      Pattern of Effect-size Parameters
k(n)      Zero      One        Two        Three      Five       Total
          Effects   Extreme    Extremes   Subsets    Subsets
2(20)     .050(4)   .146(20)      -          -          -       .1300(24)
2(60)     .050(4)   .315(20)      -          -          -       .2708(24)
2(120)    .050(4)   .470(20)      -          -          -       .4003(24)
2(200)    .050(4)   .571(20)      -          -          -       .4844(24)
5(20)     .050(4)   .143(20)      -       .125(24)      -       .1259(48)
5(60)     .050(4)   .337(20)      -       .301(24)      -       .2951(48)
5(120)    .050(4)   .500(20)      -       .496(24)      -       .4606(48)
5(200)    .050(4)   .596(20)      -       .641(24)      -       .5732(48)
10(20)    .050(4)   .154(20)   .189(20)   .171(24)   .158(20)   .1630(88)
10(60)    .050(4)   .360(20)   .460(20)   .432(24)   .409(20)   .3992(88)
10(120)   .050(4)   .515(20)   .595(20)   .636(24)   .606(20)   .5657(88)
10(200)   .050(4)   .607(20)   .669(20)   .765(24)   .718(20)   .6640(88)
30(20)    .050(4)   .134(20)   .196(20)   .296(24)   .256(20)   .2140(88)
30(60)    .050(4)   .315(20)   .430(20)   .624(24)   .566(20)   .4705(88)
30(120)   .050(4)   .462(20)   .573(20)   .797(24)   .723(20)   .6191(88)
30(200)   .050(4)   .564(20)   .651(20)   .870(24)   .808(20)   .6992(88)

 

Simulated power values were slightly higher; the grand
mean was 0.45 (992 cases) with a standard deviation of 0.36.
Means of simulated power values for each pattern of δ_i's by
total sample size are listed in Table 19.

Table 19

Means of Simulated Power of Q by Pattern of δ_i's,
k, and n (α = 0.05)

                      Pattern of Effect-size Parameters
k(n)      Zero      One        Two        Three      Five       Total
          Effects   Extreme    Extremes   Subsets    Subsets
2(20)     .049(4)   .146(20)      -          -          -       .1300(24)
2(60)     .053(4)   .315(20)      -          -          -       .2711(24)
2(120)    .050(4)   .471(20)      -          -          -       .4012(24)
2(200)    .045(4)   .571(20)      -          -          -       .4830(24)
5(20)     .058(4)   .159(20)      -       .142(24)      -       .1421(48)
5(60)     .054(4)   .348(20)      -       .304(24)      -       .3013(48)
5(120)    .051(4)   .504(20)      -       .501(24)      -       .4646(48)
5(200)    .055(4)   .598(20)      -       .641(24)      -       .5742(48)
10(20)    .078(4)   .184(20)   .217(20)   .196(24)   .186(20)   .1908(88)
10(60)    .053(4)   .373(20)   .470(20)   .439(24)   .415(20)   .4082(88)
10(120)   .055(4)   .521(20)   .601(20)   .638(24)   .609(20)   .5699(88)
10(200)   .054(4)   .610(20)   .671(20)   .764(24)   .718(20)   .6651(88)
30(20)    .100(4)   .194(20)   .257(20)   .354(24)   .306(20)   .2732(88)
30(60)    .062(4)   .337(20)   .449(20)   .631(24)   .576(20)   .4844(88)
30(120)   .055(4)   .477(20)   .582(20)   .796(24)   .726(20)   .6251(88)
30(200)   .051(4)   .574(20)   .651(20)   .871(24)   .811(20)   .7027(88)

 

When all effects were zero (homogeneous), the simulated
power was higher than the nominal α levels, especially for
small samples (e.g., n = 20) combined with many effects
(e.g., k = 30).
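The inflation of the type I error rate for small samples can be explored with a Monte Carlo sketch. It is a hypothetical illustration rather than the simulation used in this study: it assumes that n is the total size of each study, split into two normal groups of equal size, computes g_i from the pooled standard deviation without a small-sample bias correction, estimates σ²(g_i | δ_i) with g_i in place of δ_i, and uses the Wilson-Hilferty approximation for the chi-squared critical value. The function names and defaults are assumptions.

```python
import math
import random
import statistics


def chi2_critical_wh(alpha, df):
    """Wilson-Hilferty approximation to the upper-alpha chi-squared critical value."""
    z = statistics.NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def sample_var(xs):
    """Usual (n - 1) sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_g(rng, delta, m):
    """Standardized mean difference from two normal groups of size m each."""
    e = [rng.gauss(delta, 1.0) for _ in range(m)]
    c = [rng.gauss(0.0, 1.0) for _ in range(m)]
    pooled_sd = math.sqrt((sample_var(e) + sample_var(c)) / 2.0)
    return (sum(e) / m - sum(c) / m) / pooled_sd


def q_statistic(g, m):
    """Fixed-effects homogeneity statistic with variances estimated from the g_i."""
    v = [2.0 / m + gi * gi / (4.0 * m) for gi in g]
    w = [1.0 / vi for vi in v]
    gbar = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    return sum(wi * (gi - gbar) ** 2 for wi, gi in zip(w, g))


def type_i_rate(k, n, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo rejection rate of Q when all k population effects are zero."""
    rng = random.Random(seed)
    m = n // 2
    crit = chi2_critical_wh(alpha, k - 1)
    hits = 0
    for _ in range(reps):
        g = [simulate_g(rng, 0.0, m) for _ in range(k)]
        if q_statistic(g, m) > crit:
            hits += 1
    return hits / reps
```

Comparing type_i_rate(30, 20) with type_i_rate(30, 200) contrasts small and large studies; each estimate carries a Monte Carlo standard error of roughly sqrt(p(1 - p)/reps).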

Simulated power values for α = 0.10, 0.025, and 0.01 are
also listed in Table 20.

Table 20

Means of Simulated Power of Q for Homogeneous δ_i's by
α, k, and n (δ = 0)

k(n)      α = 0.10   0.05      0.025     0.01
2(20)     .092(4)    .049(4)   .026(4)   .014(4)
2(60)     .104(4)    .053(4)   .025(4)   .009(4)
2(120)    .100(4)    .050(4)   .026(4)   .011(4)
2(200)    .096(4)    .045(4)   .023(4)   .009(4)
5(20)     .113(4)    .058(4)   .033(4)   .015(4)
5(60)     .103(4)    .054(4)   .027(4)   .011(4)
5(120)    .101(4)    .051(4)   .025(4)   .010(4)
5(200)    .101(4)    .055(4)   .028(4)   .012(4)
10(20)    .132(4)    .078(4)   .047(4)   .025(4)
10(60)    .110(4)    .058(4)   .030(4)   .013(4)
10(120)   .103(4)    .055(4)   .029(4)   .014(4)
10(200)   .105(4)    .054(4)   .030(4)   .012(4)
30(20)    .160(4)    .100(4)   .066(4)   .038(4)
30(60)    .118(4)    .062(4)   .034(4)   .017(4)
30(120)   .105(4)    .055(4)   .027(4)   .012(4)
30(200)   .103(4)    .051(4)   .025(4)   .010(4)

 

Analysis of variance (ANOVA) was applied to power
values for each pattern of δ_i's. Within each pattern, as
the mean value or the spread of the population effects
increased, the power of the homogeneity test increased.
Sample size was again a significant factor. Tables 21 to 28
list the ANOVA results and mean power values for each
pattern of δ_i's.

Table 21

ANOVA on Power of Q for δ_i's with One Extreme Value

Source of                 Sum of             Mean
Variation                 Squares    df      Squares         F       p
Main Effect               31.780     10      3.178     307.305    .000
  k                         .079      3       .026       2.529    .058
  N                        8.874      3      2.958     285.771    .000
  Magnitude of δ_i's      22.828      4      5.707     551.361    .000
Two-way Interactions       4.658     33       .141      13.637    .000
  k × N                     .014      9       .002        .153    .998
  k × δ                     .044     12       .004        .353    .978
  N × δ                    4.600     12       .383      37.043    .000
Three-way Interactions      .051     36       .001        .137   1.00
Residual                   2.484    240       .010
Total                     38.973    319       .133
Table 22

Means of Power of Q for δ_i's with One Extreme Value
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.25      0.50      0.75      1.00       Total
2(20)     .053(4)    .066(4)   .114(4)   .195(4)   .304(4)    .1300(20)
2(60)     .058(4)    .097(4)   .246(4)   .472(4)   .702(4)    .2708(20)
2(120)    .065(4)    .148(4)   .436(4)   .762(4)   .940(4)    .4003(20)
2(200)    .075(4)    .215(4)   .640(4)   .930(4)   .995(4)    .4844(20)
5(20)     .052(4)    .063(4)   .106(4)   .187(4)   .305(4)    .1426(20)
5(60)     .056(4)    .092(4)   .247(4)   .515(4)   .775(4)    .3370(20)
5(120)    .063(4)    .141(4)   .475(4)   .843(4)   .979(4)    .5000(20)
5(200)    .071(4)    .213(4)   .720(4)   .976(4)   .999(4)    .5960(20)
10(20)    .052(4)    .063(4)   .110(4)   .203(4)   .342(4)    .1540(20)
10(60)    .056(4)    .094(4)   .276(4)   .572(4)   .801(4)    .3597(20)
10(120)   .063(4)    .149(4)   .532(4)   .856(4)   .973(4)    .5146(20)
10(200)   .072(4)    .235(4)   .759(4)   .970(4)   .999(4)    .6070(20)
30(20)    .051(4)    .060(4)   .096(4)   .172(4)   .294(4)    .1346(20)
30(60)    .055(4)    .083(4)   .236(4)   .503(4)   .698(4)    .3148(20)
30(120)   .059(4)    .127(4)   .468(4)   .749(4)   .906(4)    .4621(20)
30(200)   .066(4)    .199(4)   .663(4)   .901(4)   .990(4)    .5639(20)

Table 22.a

Means of Simulated Power of Q for δ_i's with One Extreme Value
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.25      0.50      0.75      1.00       Total
2(20)     .056(4)    .074(4)   .115(4)   .189(4)   .297(4)    .1461(20)
2(60)     .056(4)    .091(4)   .248(4)   .472(4)   .707(4)    .3147(20)
2(120)    .066(4)    .150(4)   .438(4)   .759(4)   .945(4)    .4714(20)
2(200)    .075(4)    .214(4)   .638(4)   .933(4)   .993(4)    .5706(20)
5(20)     .068(4)    .075(4)   .120(4)   .202(4)   .330(4)    .1589(20)
5(60)     .060(4)    .097(4)   .247(4)   .534(4)   .801(4)    .3477(20)
5(120)    .066(4)    .143(4)   .477(4)   .850(4)   .985(4)    .5041(20)
5(200)    .067(4)    .209(4)   .733(4)   .979(4)   1.000(4)   .5978(20)
10(20)    .078(4)    .089(4)   .138(4)   .232(4)   .383(4)    .1843(20)
10(60)    .061(4)    .103(4)   .286(4)   .588(4)   .827(4)    .3732(20)
10(120)   .065(4)    .154(4)   .540(4)   .864(4)   .983(4)    .5210(20)
10(200)   .070(4)    .235(4)   .771(4)   .976(4)   .999(4)    .6104(20)
30(20)    .111(4)    .113(4)   .152(4)   .234(4)   .366(4)    .1950(20)
30(60)    .069(4)    .091(4)   .258(4)   .533(4)   .735(4)    .3370(20)
30(120)   .060(4)    .139(4)   .486(4)   .767(4)   .934(4)    .4771(20)
30(200)   .070(4)    .211(4)   .676(4)   .919(4)   .994(4)    .5742(20)

Table 23

ANOVA on Power of Q for δ_i's with Two Extreme Values

Source of                 Sum of             Mean
Variation                 Squares    df      Squares         F       p
Main Effect               19.994      8      2.499     236.143    .000
  k                         .010      1       .010        .948    .332
  N                        5.067      3      1.689     159.576    .000
  Magnitude of δ_i's      14.918      4      3.729     352.367    .000
Two-way Interactions       2.419     19       .127      12.028    .000
  k × N                     .008      3       .003        .238    .869
  k × δ                     .009      4       .002        .223    .925
  N × δ                    2.402     12       .200      18.910    .000
Three-way Interactions      .016     12       .001        .126   1.00
Residual                   1.270    120       .011
Total                     23.699    159       .149

Table 24

Means of Power of Q for δ_i's with Two Extreme Values
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.25      0.50      0.75      1.00       Total
10(20)    .053(4)    .069(4)   .142(4)   .290(4)   .393(4)    .1892(20)
10(60)    .059(4)    .116(4)   .397(4)   .773(4)   .954(4)    .4598(20)
10(120)   .069(4)    .203(4)   .729(4)   .977(4)   .999(4)    .5954(20)
10(200)   .083(4)    .336(4)   .928(4)   .999(4)   1.000(4)   .6690(20)
30(20)    .052(4)    .066(4)   .129(4)   .269(4)   .465(4)    .1961(20)
30(60)    .057(4)    .106(4)   .375(4)   .715(4)   .897(4)    .4301(20)
30(120)   .065(4)    .187(4)   .679(4)   .937(4)   .997(4)    .5731(20)
30(200)   .077(4)    .315(4)   .867(4)   .996(4)   1.000(4)   .6509(20)
Note: The pattern of δ values with two extreme values was
(0, ..., 0, δ, δ).

Table 24.a

Means of Simulated Power of Q for δ_i's with Two Extreme
Values by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.25      0.50      0.75      1.00       Total
10(20)    .080(4)    .098(4)   .166(4)   .320(4)   .423(4)    .2175(20)
10(60)    .063(4)    .125(4)   .412(4)   .783(4)   .966(4)    .4698(20)
10(120)   .074(4)    .210(4)   .738(4)   .984(4)   1.000(4)   .6009(20)
10(200)   .084(4)    .350(4)   .924(4)   1.000(4)  1.000(4)   .6714(20)
30(20)    .105(4)    .122(4)   .188(4)   .332(4)   .536(4)    .2557(20)
30(60)    .068(4)    .119(4)   .400(4)   .733(4)   .924(4)    .4488(20)
30(120)   .072(4)    .201(4)   .688(4)   .949(4)   .998(4)    .5816(20)
30(200)   .075(4)    .307(4)   .877(4)   .996(4)   1.000(4)   .6512(20)
Note: The pattern of δ values with two extreme values was
(0, ..., 0, δ, δ).

Table 25

ANOVA on Power of Q for Three Equal Subsets of δ_i's

Source of                 Sum of             Mean
Variation                 Squares    df      Squares          F       p
Main Effect               32.018     10      3.202     4347.081    .000
  k                        3.160      2      1.580     2145.030    .000
  N                       13.005      3      4.335     5885.632    .000
  Magnitude of δ          15.853      5      3.171     4304.771    .000
Two-way Interactions       3.023     31       .098      132.419    .000
  k × N                     .195      6       .032       44.052    .000
  k × δ                     .450     10       .045       61.113    .000
  N × δ                    2.379     15       .159      215.303    .000
Three-way Interactions     1.221     30       .041       55.255    .000
Residual                    .159    216       .001
Total                     36.421    287       .127
Table 26

Means of Power of Q for Three Equal Subsets of δ_i's
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.20   0.25   0.30    0.40    0.50    Total
5(20)     .056       .076   .092   .112    .168    .243    .1245(24)
5(60)     .070       .138   .141   .272    .463    .666    .3010(24)
5(120)    .091       .247   .376   .524    .795    .944    .4962(24)
5(200)    .122       .401   .596   .772    .960    .997    .6414(24)
10(20)    .059       .088   .114   .147    .244    .377    .1714(24)
10(60)    .078       .190   .293   .423    .705    .902    .4319(24)
10(120)   .112       .379   .584   .774    .968    .998    .6359(24)
10(200)   .163       .621   .846   .960    .999    1.000   .6691(24)
30(20)    .064       .120   .174   .249    .463    .704    .2958(24)
30(60)    .100       .347   .559   .766    .973    .999    .6240(24)
30(120)   .169       .705   .917   .988    1.000   1.000   .7965(24)
30(200)   .285       .938   .996   1.000   1.000   1.000   .8699(24)
Note: The pattern of three equal subsets of δ_i values was
(0, ..., 0, δ, ..., δ, 2δ, ..., 2δ).
Table 26.a

Means of Simulated Power of Q for Three Equal Subsets of δ_i's
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.20   0.25   0.30    0.40    0.50    Total
5(20)     .069       .092   .106   .136    .184    .266    .1421(24)
5(60)     .076       .144   .199   .279    .460    .665    .3038(24)
5(120)    .092       .252   .381   .529    .801    .949    .5006(24)
5(200)    .120       .404   .595   .770    .962    .996    .6412(24)
10(20)    .080       .118   .138   .169    .269    .405    .1964(24)
10(60)    .080       .202   .307   .405    .705    .907    .4388(24)
10(120)   .121       .379   .590   .769    .970    .998    .6381(24)
10(200)   .153       .627   .843   .960    .999    1.000   .7637(24)
30(20)    .123       .189   .248   .324    .511    .728    .3541(24)
30(60)    .115       .360   .573   .767    .973    .998    .6311(24)
30(120)   .171       .701   .920   .985    1.000   1.000   .7961(24)
30(200)   .285       .938   .996   1.000   1.000   1.000   .8699(24)
Note: The pattern of three equal subsets of δ_i values was
(0, ..., 0, δ, ..., δ, 2δ, ..., 2δ).

Table 27

ANOVA on Power of Q for Five Equal Subsets of δ_i's

Source of                 Sum of             Mean
Variation                 Squares    df      Squares          F       p
Main Effect               18.832      8      2.354     2997.240    .000
  k                         .437      1       .437      556.972    .000
  N                        7.590      3      2.530     3221.503    .000
  Magnitude of δ_i's      10.804      4      2.701     3439.109    .000
Two-way Interactions       1.994     19       .105      133.602    .000
  k × N                     .057      3       .019       23.998    .000
  k × δ                     .162      4       .040       51.549    .000
  N × δ                    1.775     12       .148      188.353    .000
Three-way Interactions      .291     12       .024       30.846    .000
Residual                    .094    120       .001
Total                     21.210    159       .133

Table 28

Means of Power of Q for Five Equal Subsets of δ_i's
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.20      0.30      0.40      0.50       Total
10(20)    .057(4)    .082(4)   .129(4)   .207(4)   .317(4)    .1585(20)
10(60)    .073(4)    .164(4)   .355(4)   .614(4)   .835(4)    .4086(20)
10(120)   .101(4)    .318(4)   .686(4)   .932(4)   .994(4)    .6060(20)
10(200)   .313(4)    .532(4)   .917(4)   .997(4)   1.000(4)   .7519(20)
30(20)    .061(4)    .100(4)   .187(4)   .339(4)   .542(4)    .2458(20)
30(60)    .086(4)    .254(4)   .604(4)   .899(4)   .990(4)    .5664(20)
30(120)   .133(4)    .542(4)   .941(4)   .999(4)   1.000(4)   .7231(20)
30(200)   .211(4)    .830(4)   .998(4)   1.000(4)  1.000(4)   .8078(20)
Note: The pattern of five equal subsets of δ values was
(0, ..., 0, ½δ, ..., ½δ, δ, ..., δ, 1½δ, ..., 1½δ, 2δ, ..., 2δ).

Table 28.a

Means of Simulated Power of Q for Five Equal Subsets of δ_i's
by k, n, and δ (α = 0.05)

k(n)      δ = 0.10   0.20      0.30      0.40      0.50       Total
10(20)    .081(4)    .108(4)   .162(4)   .237(4)   .343(4)    .1863(20)
10(60)    .078(4)    .177(4)   .357(4)   .623(4)   .842(4)    .4154(20)
10(120)   .106(4)    .321(4)   .691(4)   .933(4)   .994(4)    .6090(20)
10(200)   .318(4)    .531(4)   .915(4)   .997(4)   1.000(4)   .7523(20)
30(20)    .120(4)    .167(4)   .251(4)   .400(4)   .589(4)    .3056(20)
30(60)    .105(4)    .267(4)   .613(4)   .903(4)   .989(4)    .5757(20)
30(120)   .140(4)    .547(4)   .942(4)   .999(4)   1.000(4)   .7256(20)
30(200)   .221(4)    .837(4)   .999(4)   1.000(4)  1.000(4)   .8113(20)
Note: The pattern of five equal subsets of δ values was
(0, ..., 0, ½δ, ..., ½δ, δ, ..., δ, 1½δ, ..., 1½δ, 2δ, ..., 2δ).

The main effect of k and the two-way interaction
effects of k by N and k by δ were not significant for the
one-extreme-value case or the two-extreme-values case;
power values did not vary with k when population effects had
extreme values. However, these effects were significant for
the three-equal-subsets and the five-equal-subsets patterns,
for which power values increased faster with large k.

Random-effects model. Correlation coefficients were
also obtained between the power of Q+ and the number of
effects, total sample size, variance of parameter effects,
sampling fraction, and sample ratio for the random-effects
model. In comparison to the fixed-effects model, the
relationships between power and the first three variables
were stronger for the random-effects model: r(power, k) was
0.29 (p < 0.001), r(power, √k) was 0.34, r(power, N) was
0.43 (p < 0.001), r(power, √N) was 0.53, r(power, σ²_δ) was
0.48 (p < 0.001), and r(power, σ_δ) was 0.55. Correlations
were not significant between power and the sampling
fraction η (r(power, η) was -0.24, p = 0.54) or between
power and the sample ratio ϕ (r(power, ϕ) was 0.02,
p = 0.64).

Regression analysis with a stepwise procedure was also
applied to the power of Q+. For random effects, instead of
the predictors δ. (the weighted average of the δ_i's) and
the index of spread among the fixed δ_i's, the standard
deviation of the random parameter effects (σ_δ) was included
in the regression analysis. For μ_δ = 0.00, the stepwise
procedure again selected the standard deviation of parameter
effects σ_δ as the most important predictor for the power of
Q+ (R = 0.55, R² = 0.30, F_{1,990} = 261.97, p < 0.0001). The
second predictor included in the regression was the square
root of the total sample size, √N (R = 0.87, R² = 0.76, R²
change = 0.46, F_{2,989} = 943.18, p < 0.0001). Only two
predictors were selected for the random-effects model;
however, the variation explained by the model reached 76%.
For μ_δ = 0.10, 0.25, and 0.50, results were similar to the
case of μ_δ = 0.00. The final regression model was:

$$ \widehat{\text{Power}}_j = -0.326 + 1.557\,(\sigma_\delta)_j + 0.013\,\sqrt{N_j}. \qquad (35) $$

Results indicated that the power of Q+ depended upon the
variation of effects σ_δ and the total sample size in the
random-effects model. It appeared that k had no effect;
however, since N = k·n, the total sample size had already
taken into account the effect of k.
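As an illustration only, the two fitted prediction equations can be evaluated directly. The sketch below transcribes the coefficients of (34) and (35), reading the scanned predictors as σ_δ, √N, √k, and η (1 = balanced, 2 = unbalanced); that reading is an assumption, and the function names are hypothetical. Because both equations are linear fits, their predictions can fall outside [0, 1], so a clipping helper is included for display.

```python
import math


def predicted_power_fixed(sigma_delta, N, k, eta):
    """Equation (34), fixed-effects model; eta = 1 (balanced) or 2 (unbalanced)."""
    return (-0.252 + 1.839 * sigma_delta + 0.012 * math.sqrt(N)
            - 0.032 * math.sqrt(k) + 0.035 * eta)


def predicted_power_random(sigma_delta, N):
    """Equation (35), random-effects model with mu_delta = 0."""
    return -0.326 + 1.557 * sigma_delta + 0.013 * math.sqrt(N)


def clip_probability(p):
    """A linear fit can stray outside [0, 1]; clip it for display only."""
    return min(1.0, max(0.0, p))
```

For instance, predicted_power_random(0.0, 100) is negative, which illustrates why a linear regression of power is a descriptive summary of the simulated conditions rather than a power function.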

The grand mean power value for μ_δ = 0.00 was 0.41 with
a standard deviation of 0.31. Mean power values for random
effects increased as the variance of population effects or
the sample sizes increased. Mean power values according to
the variance of parameter effects for random effects with
μ_δ = 0.00 are listed in Table 29.

Asymptotic and simulated power values were calculated,
and power curves were drawn, for α = .05; k = 2, 5, 10, 30;
and N = 20k, 60k, 120k, 200k for fixed-effects models in
Figures 4.1.1 to 4.4.2 in Appendix D. For random-effects
models, power values were calculated with μ_δ = 0, 0.10,
0.25, 0.50 and σ²_δ = 0.01(0.02)0.9, 0.10; see Figures
4.5.1 to 4.8.4 in Appendix D. Power tables for other α
levels are listed in Appendix C.

Table 29

Mean Power of Q+ at α = 0.05 for μ_δ = 0
for the Random-effects Model

σ²_δ       N = 20k     60k         120k        200k        Total
.00        0.05(16)    0.05(16)    0.05(16)    0.05(16)    0.05( 64)
.00-.02    0.06(16)    0.13(20)    0.23(20)    0.35(20)    0.20( 76)
.02-.04    0.09(16)    0.29(20)    0.50(20)    0.63(20)    0.39( 76)
.04-.06    0.13(16)    0.42(20)    0.54(16)    0.67(16)    0.44( 68)
.06-.08    0.17(16)    0.53(20)    0.51(12)    0.64(12)    0.45( 60)
.08-.10    0.23(32)    0.47(28)    0.59(24)    0.71(24)    0.48(108)
.15        0.34(16)    0.52(12)    0.70(12)    0.78(12)    0.57( 52)
.20        0.42(16)    0.60(12)    0.75(12)    0.81(12)    0.63( 52)
.25        0.48(16)    0.65(12)    0.78(12)    0.76( 8)    0.64( 48)
Total      0.22(160)   0.37(160)   0.48(144)   0.56(140)   0.41(604)

 

CHAPTER V
THE INFLUENCE OF THE SIGNIFICANCE LEVEL AND POWER
OF THE FIRST STAGE TEST ON THE SECOND STAGE TEST

-- A SEQUENTIALLY RELATED TESTING PROCEDURE --

In this section, I will first distinguish among several
similar terms: "sequential analysis" (Wald, 1952),
"sequential decision" (Sobel & Wald, 1949), and
"sequentially related testing procedure". Use of these
terms in the literature suggests that "sequential analysis"
defines the sampling procedure, "sequential decision"
relates to the selection of the hypothesis, and
"sequentially related testing procedure" refers to the
ordering of testing in a multi-stage testing process.

Wald (1952) defined sequential analysis as "a method of
statistical inference whose characteristic feature is that
the number of observations required by the procedure is not
determined in advance of the experiment. The decision to
terminate the experiment depends, at each stage, on the
results of the observations previously made" (p. 1).
Sequential analysis is often used in medical research (e.g.,
Anscombe, 1963; Armitage, 1960; Whitehead, 1983, 1987),
probably because fewer subjects are required in
sequential trials than in fixed trials (Lewis, 1990).

A sequential decision involves the sequential
examination of hypotheses. Sobel and Wald (1949) discussed
a sequential decision procedure for choosing one of three
hypotheses concerning the unknown mean of a normal
distribution. Consider a variable x that is normally
distributed with known variance σ² but unknown mean μ.
Given two real numbers a1 < a2 and a set of hypotheses
to be examined, say, H1: μ < a1, H2: a1 ≤ μ ≤ a2, and
H3: μ > a2, the problem is to choose one of these three
mutually exclusive and exhaustive hypotheses. This is a
process of making decisions about a sequence of hypotheses.

The third term, to be used in this study, is
"sequentially related testing procedure." Such a procedure
does not draw observations sequentially, nor does it involve
sequential decisions about several alternative hypotheses.
It involves testing more than one hypothesis in sequence for
one set of data. The sequentially related hypotheses tested
imply that one will test a second qualitatively different
hypothesis only after a specific decision is made at stage
1.

When tests are sequentially related, it is natural to
consider the relationship of the testing errors among the
tests. Will the testing error in the first test influence
errors made in conducting the next test? Does the impact

involve either one of, or both, type I and type II errors?

Effect-size meta-analysis involves sequentially related
testing, since many effect-size meta-analyses involve the
two-stage testing procedure outlined above in (7) and (7a)
in Chapter IV. Therefore, in studying the power of the
homogeneity test in effect-size meta-analysis, the
sequential impact of testing errors is a concern.

In this chapter, I will discuss the influence of
sequentially related hypothesis tests, and I will examine
the impact of the first-stage decisions on the second-stage
statistical errors.

Two-Stage Testing

Effect-size meta-analyses involve at least two tests in
sequence: the homogeneity test for the consistency of the
effect sizes and the test for the magnitude of the common
effect. When the study effects are determined to be
homogeneous, one further estimates the value of the probable
common population effect and tests whether the common value
is zero.

For example, consider a review of sex differences in
science achievement for grade-school students. After
computing effect sizes from a series of studies, the
reviewer first tests the homogeneity of all effects to
decide whether they are consistent. If the homogeneity of
effects is accepted, the reviewer then tests whether
gender has an effect on science achievement. If the
homogeneity test for the full set of effects is rejected,
one concludes that the magnitudes of sex differences in
science achievement may vary. To proceed with the analysis,
one either considers the effects to be random or seeks
homogeneity within smaller groups of effects. For instance,
effects may vary with grade level, such that girls perform
better than boys only in certain grades. The homogeneity
test would then be performed on the effects for each grade
level. If homogeneity of effects is accepted within a
subgroup or grade level, the second-stage test of the
magnitude of the average sex difference is conducted for
that subgroup.

Influence of Sequentially Related Hypothesis Testing
on Statistical Errors

The role of sequentially related hypothesis testing in
determining statistical errors is observed below in two
situations: acceptance or rejection of the overall
homogeneity test at the first stage.

Overall Homogeneity Accepted

Since the test for homogeneity and the test for the
common population effect are sequentially related, the
validity of the former test can affect the validity of the
latter. If, at stage one, the analyst made a type II error
in the homogeneity test, the second-stage test for the
common effect is misleading. More precisely, when population
effects are heterogeneous, the effect-size estimate in the
second-stage test is an estimate of an "average" effect
(μ_δ) from a set of random effects rather than of the
"common" effect (δ) representing a set of equal effects.
The interpretation of the test for the "average" effect
should differ from the interpretation of the test for the
"common" effect. As in a random-effects analysis-of-variance
model, in the heterogeneous case population effects are
random quantities with some distribution (i.e., σ²_δ ≠ 0),
and the sampled effect sizes do not share one population
effect. Wrongly accepting the homogeneity of effects
therefore treats an average effect as the common effect.

The variance used for calculating the t statistic for
testing the hypothesis H0: δ = 0 under the assumption of
homogeneity will not reflect the variation of population
effects, so the variance estimate used for the test
statistic for the hypothesis in (7a) (on p. 16) at the
second stage will be too small. Instead of using the
estimate of (σ²_δ + σ²(g_i | δ_i)) for the variance of the
ith effect size, calculation of the t statistic (say, t_F)
under the decision of homogeneity would use the estimate of
σ²(g_i | δ_i). Therefore, when the effects are heterogeneous
(i.e., σ²_δ > 0), the test statistic tends to be too large,
which likely results in a greater chance of type I error
(false rejection) or "too much power" in the second-stage
test.
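The size of this bias can be seen in the simplest case, assumed here only for illustration, in which all k studies share the same conditional variance v = σ²(g_i | δ_i). The precision weights are then equal under both models, so g. is the same in t_F and t_R, and

$$ \frac{t_F}{t_R} = \frac{g_{\cdot} \big/ \sqrt{v/k}}{g_{\cdot} \big/ \sqrt{(\sigma^2_\delta + v)/k}} = \sqrt{\frac{\sigma^2_\delta + v}{v}} = \sqrt{1 + \frac{\sigma^2_\delta}{v}} . $$

With v = 0.04 (for example, two groups of 50 subjects each, ignoring the δ² term of the variance) and σ²_δ = 0.04, t_F is √2 ≈ 1.41 times as large as t_R.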

Overall Homogeneity Rejected

When the overall homogeneity test is rejected, one
assumes that several "true" effects may exist. One common
approach to further study of these effects is to divide the
collection of effect sizes into subgroups by certain factors
and repeat the homogeneity test for each subgroup. Another
approach to analyzing these effects is applying a random-
effects model and testing for the average population effect.

As mentioned above, errors at the first stage will
impact the validity of tests at the second stage. When a
false rejection is made at the first stage, dividing effects
into small groups can lead to more errors. First, because
the population effects are truly homogeneous, classifying
the effects from the same population into subclasses and
conducting separate analyses is unnecessary. Second, the
effective sample sizes for t tests for each subgroup are
obviously reduced from the total sample size used for the t
test for the whole group. Therefore, when population
effect-sizes are homogeneous, tests of homogeneity for
smaller subgroups are conservative or less powerful relative
to the one test for the whole group.

Applying random-effects tests at the second stage is
sometimes considered after rejection of the homogeneity
test. In a random-effects test, the variance used for
calculating the t statistic (denoted t_R here) includes an
estimate of the variation in population effects (σ²_δ).
Including the estimated variance of population effects (σ²_δ)
rather than using σ²(g_i | δ_i) alone would overestimate the
variance when population effects are actually consistent.
The t_R test statistic will then be too small, and the test
will be less powerful than the test using t_F under the
fixed-effects model (which should be applied when effects
are truly homogeneous).

The additional simulation in this chapter examines
the statistical errors and the appropriateness of tests
using fixed- versus random-effects models. The simulation
addresses the following questions: When the homogeneity
test at the first stage is rightly or wrongly rejected,
will the statistical error rates of the t tests (t_F and
t_R) at the second stage be similar? Specifically, when
the homogeneity test is wrongly rejected (a type I error
occurs at stage one), how much is the power of the t_R test
(i.e., assuming random effects at the second stage)
decreased? And when the homogeneity test is wrongly
accepted (a type II error occurs at stage one), how much is
the power of t_F increased?

Summary

In conclusion, when the overall homogeneity test is
wrongly accepted (a type II error) at the first stage, the
fixed-effects test t_F would be wrongly applied at stage
two. Two errors will be made: the test is (1) conceptually
invalid and (2) subject to an inflated type I error rate.
When the overall homogeneity test is wrongly rejected (a
type I error) at the first stage, the test at the second
stage should be less powerful when the random-effects test
(t_R) is wrongly applied. Table 30 illustrates the
relationship among two-stage sequential testing errors.

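Table 30

Two-stage Testing Errors

                             True State at the First Stage
First-stage        δ_1 = δ_2 = ... = δ_k = δ         At least one δ_i differs
Decision on
Homogeneity

Accepted           Cell A                             Cell B (Type II)
                   Second stage, t_F:                 Second stage, t_F:
                               True State                         True State
                   Decision    δ = 0    δ ≠ 0         Decision    μ_δ = 0   μ_δ ≠ 0
                   δ = 0                  β           δ = 0                    β
                   δ ≠ 0        α       1 - β         δ ≠ 0        α        1 - β

Rejected           Cell C (Type I)                    Cell D
                   Second stage, t_R:                 Second stage, t_R:
                               True State                         True State
                   Decision    δ = 0    δ ≠ 0         Decision    μ_δ = 0   μ_δ ≠ 0
                   μ_δ = 0                β           μ_δ = 0                  β
                   μ_δ ≠ 0      α       1 - β         μ_δ ≠ 0      α        1 - β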

In Table 30, the four main cells represent the first-
stage test. For convenience, these cells are named A, B, C,
and D. The second-stage tests and their statistical errors
are illustrated by small tables within each cell of the
large table. Population effects for the first stage are
denoted by δ_i's. The common population effect for the
homogeneous effects is δ, and the average population effect
for the heterogeneous effects is denoted as μ_δ. Cells
marked "Type I" or "Type II" represent occurrences of the
two types of statistical errors at the first stage.

From the above summary, I predicted that the second-
stage tests in cell C using the random-effects test (t_R)
may have higher type II error rates than the correct fixed-
effects test t_F, and that second-stage tests in cell B
using the fixed-effects test t_F may have lower type II
error rates than the correct random-effects test (t_R) and
may have higher type I error rates.

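Second-stage z tests of the hypotheses H0: μ_δ = 0 vs. H1: μ_δ ≠ 0 for the fixed-effects and random-effects models are

$$ z_F = \frac{g_\cdot}{\sqrt{1 \Big/ \sum_{i=1}^{k} \left[ 1/\sigma^2(g_i \mid \delta_i) \right]}} \qquad (36) $$

and

$$ z_R = \frac{g_\cdot}{\sqrt{1 \Big/ \sum_{i=1}^{k} \left[ 1/\left( \sigma^2_\delta + \sigma^2(g_i \mid \delta_i) \right) \right]}} \qquad (37) $$

where g. is the average effect weighted by precision,

$$ g_\cdot = \sum_{i=1}^{k} w_i g_i \qquad (38) $$

$$ w_i = \frac{1/S_i^2}{\sum_{j=1}^{k} 1/S_j^2} \qquad (39) $$

The estimators of the variances S_i² for fixed and random effects differ. For fixed effects,

$$ S_i^2 = \hat\sigma^2(g_i \mid \delta_i) \qquad (40) $$

For random effects,

$$ S_i^2 = \hat\sigma^2_\delta + \hat\sigma^2(g_i \mid \delta_i) \qquad (41) $$

The estimator of the variance of population effects was an estimate developed by Hedges and Olkin (1985), specifically:

$$ \hat\sigma^2_\delta = s^2(g_i) - \frac{1}{k} \sum_{i=1}^{k} \hat\sigma^2(g_i \mid \delta_i) \qquad (42) $$

where s²(g_i) is the usual sample variance computed using the g_i values as data.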
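Equations (36) to (42) can be sketched in Python as follows. This is a minimal illustration, not the program used in this study. It assumes that formula (5), which is not reproduced in this section, is the usual large-sample variance of Hedges' g, (n_E + n_C)/(n_E n_C) + g²/[2(n_E + n_C)], evaluated with g_i in place of δ_i, and it assumes that a negative value of (42) is truncated at zero. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def est_cond_var(g, n_e, n_c):
    """Estimated sigma^2(g_i | delta_i): large-sample variance of Hedges' g
    with g_i plugged in for delta_i (assumed form of formula (5))."""
    n = n_e + n_c
    return n / (n_e * n_c) + g ** 2 / (2 * n)


def tau2_hat(gs, cond_vars):
    """Eq. (42): s^2(g_i) minus the mean estimated conditional variance.
    Truncating negative values at zero is an assumption of this sketch."""
    return max(0.0, variance(gs) - mean(cond_vars))


def z_statistic(gs, s2):
    """Eqs. (36)-(39): precision-weighted mean divided by its standard error."""
    inv = [1.0 / v for v in s2]                              # 1 / S_i^2
    g_dot = sum(w * g for w, g in zip(inv, gs)) / sum(inv)   # (38)-(39)
    return g_dot / math.sqrt(1.0 / sum(inv))


def z_fixed_and_random(gs, n_e, n_c):
    """Return (z_F, z_R, estimated sigma^2_delta) for one set of effects."""
    cond = [est_cond_var(g, a, b) for g, a, b in zip(gs, n_e, n_c)]
    t2 = tau2_hat(gs, cond)
    z_f = z_statistic(gs, cond)                     # S_i^2 from (40)
    z_r = z_statistic(gs, [t2 + v for v in cond])   # S_i^2 from (41)
    return z_f, z_r, t2


# Five hypothetical studies, each with 10 + 10 subjects
z_f, z_r, t2 = z_fixed_and_random([-0.5, 0.0, 0.5, 1.0, 1.5], [10] * 5, [10] * 5)
print(z_f, z_r, t2)
```

When all g_i are equal, s²(g_i) is zero, the truncated estimate of σ²_δ is zero, and the two statistics coincide; any positive estimate enlarges every S_i² in (41).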

Simulation of Power for Sequential Tests

Power values for the z tests were constructed through further simulation. Counts of both type I and type II errors for the second-stage z tests were recorded. Simulation will allow me to determine (1) whether or not the preset significance level of the z test is maintained, and (2) whether or not the second-stage z test given errors at the first stage is as powerful as it is following correct decisions.

Factors that produced high or low power in the homogeneity tests are crucial in studying errors of subsequent z tests. The power simulation in Chapter IV indicated that for certain non-normal distributions of δ values and for effects with small sample sizes, the actual power of homogeneity tests was greater than power based on the asymptotic theory. The primary goal of this chapter is to examine the statistical errors at the second stage based upon the decision at the first stage. Extra focus was placed on the level of errors at the second stage in conditions that showed higher power for the homogeneity test at stage one. Results from "non-normal" sets of δs (or "sets of δs with extreme values") or small sample sizes were compared to those from more evenly distributed sets of δs or large samples.

Factors for Simulation of Subsequent z Tests

Factors from previous simulations were chosen for the simulation of z-test behavior. The fixed-effects models were used to fully demonstrate the subsequent impact of the power of the first-stage test on the power of the second-stage test. Those combinations of factors that had resulted in differences between the simulated and asymptotic power values of homogeneity tests were closely examined. Other factors used in the additional simulation were the same as those for the simulation in Chapter IV, with the elimination of (1) cases where k = 2, and (2) patterns of population effects with two extreme values.

The simulation procedure for the power of the second-stage z tests followed the simulation for homogeneity tests in Chapter IV:

A. Test the significance of the homogeneity test (at α = 0.05). Assign the second-stage test to one of the four decision categories based on the homogeneity test and the known pattern of δ values. The four categories (shown as A through D in Table 30) are rightly accepting homogeneity (A), wrongly accepting homogeneity (B), wrongly rejecting homogeneity (C), and rightly rejecting homogeneity (D).

B. Calculate two z statistics using the two estimates of variance, and note which would be used based on the decision about homogeneity (z_R if homogeneity is rejected, or z_F if homogeneity is accepted) for each of 2000 sets of generated effects.

C. Continue to replicate until the count of z tests in each category of decision based on the homogeneity test reaches 2000 replications.

D. Compute proportions of z statistics (across the 2000 replications) exceeding normal critical values at various significance levels, separately for the four decision categories above.

E. Calculate theoretical power values for both fixed- and random-effects tests (z_F and z_R) based on the known parameters δ_i, i = 1 to k.

F. Compare the proportions of significant z statistics (as power values) with the theoretical power values.

G. Determine whether z tests were more powerful for cell B (z_F vs. z_R) than for cell A, or less powerful for cell C (z_R vs. z_F) than for cell D. (Note that in cells A and D, the z tests used would have been computed with the correct estimate of variance.)
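Step A, the mapping from the stage-one outcome to a cell of Table 30 and to the z test it implies, can be sketched as below. The sketch assumes the homogeneity statistic is the usual weighted sum of squares Q = Σ w_i (g_i − g.)² with w_i = 1/σ̂²(g_i|δ_i), compared with a chi-square critical value on k − 1 degrees of freedom that the caller supplies (about 9.488 for k = 5 at α = 0.05). The function names are hypothetical.

```python
def q_statistic(gs, cond_vars):
    """Weighted homogeneity statistic Q = sum w_i (g_i - g.)^2, w_i = 1/var_i."""
    w = [1.0 / v for v in cond_vars]
    g_dot = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
    return sum(wi * (gi - g_dot) ** 2 for wi, gi in zip(w, gs))


def decision_cell(truly_homogeneous, q, q_crit):
    """Map the stage-one outcome to a Table 30 cell and the implied z test."""
    reject = q > q_crit
    if truly_homogeneous:
        cell = "C" if reject else "A"   # C: type I error at stage one
    else:
        cell = "D" if reject else "B"   # B: type II error at stage one
    return cell, ("z_R" if reject else "z_F")


# k = 5, so the chi-square critical value on 4 df at alpha = .05 is about 9.488
print(decision_cell(True, q_statistic([0.1, 0.2, 0.2, 0.3, 0.2], [0.2] * 5), 9.488))
```

Because the simulation knows whether the δ_i are truly equal, the cell label records which kind of stage-one error, if any, preceded each second-stage test.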

Results

Simulated power values for z tests from the second stage of effect-size meta-analysis were compared to theoretical power values. Analyses of power for z were carried out for each of the four decision categories for the homogeneity test at the first stage: (A) rightly accept homogeneity, (B) wrongly accept homogeneity, (C) wrongly reject homogeneity, or (D) rightly reject homogeneity.

Simulated vs. Theoretical Power Values

Simulated power values based on tests with fixed- or random-effects variance estimates were compared with the corresponding theoretical power, based on either the fixed- or random-effects variance parameters.

Under the true state of homogeneity, theoretical power values of fixed- and random-effects tests were equal, since the variance of population effects (σ²_δ) was zero. Under heterogeneity, theoretical power values for random-effects tests were less than those for fixed-effects tests because the random-effects test z_R uses a larger variance in its denominator.
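The effect of the larger denominator on theoretical power can be made concrete with a short sketch. It assumes that g. is normally distributed with mean μ and variance 1/Σw_i, where w_i = 1/(σ²_δ + σ²(g_i|δ_i)) and σ²_δ = 0 gives the fixed-effects case; the two-sided power at level α is then Φ(λ − z_{α/2}) + Φ(−λ − z_{α/2}) with λ = μ√(Σw_i). The variance function is the standard large-sample variance of Hedges' g, assumed (not confirmed here) to be formula (5). Function names are hypothetical.

```python
import math
from statistics import NormalDist


def var_g(delta, n_e, n_c):
    """Large-sample variance of Hedges' g given delta (assumed formula (5))."""
    n = n_e + n_c
    return n / (n_e * n_c) + delta ** 2 / (2 * n)


def z_power(mu, cond_vars, tau2=0.0, alpha=0.05):
    """Two-sided power of z = g. / SE(g.) when g. ~ N(mu, 1 / sum w_i)."""
    w_sum = sum(1.0 / (tau2 + v) for v in cond_vars)
    lam = mu * math.sqrt(w_sum)              # noncentrality of the z test
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(lam - z_crit) + nd.cdf(-lam - z_crit)


# k = 5 studies with 10 + 10 subjects each and a common delta of 0.2
cond = [var_g(0.2, 10, 10)] * 5
print(z_power(0.2, cond))              # fixed-effects power
print(z_power(0.2, cond, tau2=0.02))   # lower once a positive tau2 is added
```

Setting μ = 0 returns exactly α, and any positive σ²_δ shrinks every weight, so λ and power fall.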

Fixed-effects tests. Theoretical power values were calculated with the fixed-effects variance σ²(g_i|δ_i). The simulated power values were obtained by computing z_F with the estimated fixed-effects variance (using g_i for δ_i in formula (5)).

When effects were homogeneous and the stage-one decision about homogeneity was correct (cell A), theoretical power values for z_F were slightly greater than simulated power values. The difference decreased as sample sizes increased. At α = 0.05, for common effect δ = 0, the mean difference across all homogeneous groups was .003 (.050 − .047). A paired t test for the equality of simulated and theoretical power means gave t = 4.36 (df = 47, p < .001). When power was analyzed according to sample size and k, the mean difference in theoretical versus simulated power was significant only for sample sizes n_i = 20. Paired t tests on mean theoretical and simulated power values for z_F for homogeneous groups with different sample sizes and δ = 0 are listed in Table 31.
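The paired t tests in Table 31 and the similar tables in this chapter follow the standard computation: for m paired power values, d̄ is the mean of (theoretical − simulated), Se = Sd/√m, t = d̄/Se, and df = m − 1. A minimal sketch with made-up illustrative inputs (not values from this study; the function name is hypothetical):

```python
import math
from statistics import mean, stdev


def paired_t(theoretical, simulated):
    """Paired t test on (theoretical - simulated) power values.

    Returns (mean difference, Sd, Se, t, df)."""
    diffs = [t - s for t, s in zip(theoretical, simulated)]
    m = len(diffs)
    d_bar = mean(diffs)
    sd = stdev(diffs)              # sample standard deviation (m - 1 denominator)
    se = sd / math.sqrt(m)
    return d_bar, sd, se, d_bar / se, m - 1


# Made-up power values for four combinations at alpha = .05
theo = [0.050, 0.050, 0.050, 0.050]
sim = [0.045, 0.047, 0.044, 0.046]
print(paired_t(theo, sim))
```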

Table 31

Paired t Tests on Mean Theoretical and Simulated z_F Power
for Homogeneous Effects with δ = 0 (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .0069    .005    .002       2.85     3    .065
      60k        .0043    .009    .004        .99     3    .395
     120k        .0040    .006    .003       1.31     3    .281
     200k        .0023    .003    .002       1.41     3    .254
10    20k        .0070    .004    .002       3.20     3    .049*
      60k       -.0016    .003    .002      -1.02     3    .385
     120k        .0033    .005    .003       1.28     3    .290
     200k        .0003    .006    .003        .09     3    .934
30    20k        .0100    .003    .002       5.73     3    .011*
      60k       -.0029    .002    .001      -3.05     3    .056
     120k        .0030    .003    .001       2.10     3    .127
     200k        .0048    .004    .002       2.66     3    .076

Note: * p < .05; a positive mean difference indicates theoretical power > simulated power.

For homogeneous effects with δ > 0, the mean difference was .01 (.719 − .709). The paired t test value was 7.56 (df = 143, p < .001). As in the case of δ = 0, the difference decreased as sample size increased. Results were similar for δ = 0.1, 0.2, or 0.3. Paired t tests on theoretical minus simulated power values for fixed-effects tests (z_F) for homogeneous groups with different sample sizes and δ > 0 are listed in Table 32.

Table 32

Paired t Tests on Mean Theoretical and Simulated z_F Power
for Homogeneous Effects with δ > 0 (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .0237    .012    .003       7.04    11    .000**
      60k        .0073    .011    .003       2.30    11    .042*
     120k        .0032    .009    .003       1.26    11    .234
     200k        .0014    .007    .002        .72    11    .484
10    20k        .0364    .012    .003      10.87    11    .000**
      60k        .0084    .011    .003       2.54    11    .028*
     120k        .0002    .006    .002        .13    11    .899
     200k        .0002    .004    .001        .21    11    .838
30    20k        .0296    .020    .006       5.14    11    .000**
      60k        .0044    .008    .002       1.99    11    .072
     120k        .0017    .005    .001       1.27    11    .229
     200k        .0005    .002    .001        .85    11    .412

Note: * p < .05; ** p < .001; a positive mean difference indicates theoretical power > simulated power.

 

Next I applied the modified Kolmogorov-Smirnov test, with critical value D = 0.030, to the distribution of (theoretical power − simulated power) values. For homogeneous population effects with δ = 0, only 1 of 48 combinations showed a significant difference between the theoretical and simulated z_F power functions. When δ > 0, 22% (32/144) had significant discrepancies, in which the theoretical power values were greater than the simulated ones. Discrepancies increased as the sample size decreased. Discrepancies were independent of the number of effect sizes k, the value of δ, and the sampling fractions between or within studies. Frequencies are listed by number of effects k, total sample size N, equal vs. unequal sample sizes between studies (π_i), within-study sample-size balance (φ_i), and the value of the common effect δ in Tables 33 to 37.
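The modified Kolmogorov-Smirnov comparison is not spelled out in this section. One plausible reading, offered here only as an assumption, is that the statistic is the largest absolute gap between the theoretical and simulated power curves across the nominal α levels, judged against the large-sample one-sample 5% critical value 1.36/√R. With R = 2000 replications this gives about 0.0304, in line with the 0.030 used above. A sketch with a hypothetical function name and made-up inputs:

```python
import math


def ks_power_discrepancy(theoretical, simulated, reps=2000, c_alpha=1.36):
    """Max |theoretical - simulated| over alpha levels vs. c_alpha / sqrt(reps).

    Returns (D, critical value, significant?). This is an assumed reading of
    the 'modified' Kolmogorov-Smirnov test, not the study's own code."""
    d = max(abs(t - s) for t, s in zip(theoretical, simulated))
    crit = c_alpha / math.sqrt(reps)
    return d, crit, d > crit


# Made-up power values at three nominal alpha levels
print(ks_power_discrepancy([0.010, 0.050, 0.100], [0.008, 0.047, 0.060]))
```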

Table 33

Frequencies of Significant Discrepancies for Power of z_F
by k for Homogeneous Effects with δ > 0

Significant    Number of effect sizes (k)
Discrepancy    5           10          30          Total
Yes            8 (17%)     11 (23%)    13 (27%)    32 (22%)
No             40          37          35          112 (78%)
Total          48          48          48          144

χ² = 1.527 (df = 2, p = .466)

Table 34

Frequencies of Significant Discrepancies for Power of z_F
by N for Homogeneous Effects with δ > 0

Significant    Total sample sizes (N)
Discrepancy    20k         60k         120k        200k        Total
Yes            26 (72%)    6 (17%)     0           0           32 (22%)
No             10          30          36          36          112 (78%)
Total          36          36          36          36          144

χ² = 73.286 (df = 3, p < .001)

Table 35

Frequencies of Significant Discrepancies for Power of z_F
by π_i for Homogeneous Effects with δ > 0

Significant    Sampling fraction between studies (π_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            18 (25%)    14 (19%)      32 (22%)
No             54          58            112 (78%)
Total          72          72            144

χ² = 0.643 (df = 1, p = .423)

Table 36

Frequencies of Significant Discrepancies for Power of z_F
by φ_i for Homogeneous Effects with δ > 0

Significant    Sampling fraction within studies (φ_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            18 (25%)    14 (19%)      32 (22%)
No             54          58            112 (78%)
Total          72          72            144

χ² = 0.643 (df = 1, p = .423)

Table 37

Frequencies of Significant Discrepancies for Power of z_F
by δ for Homogeneous Effects with δ > 0

Significant    Common population effect (δ)
Discrepancy    0.1         0.2         0.3         Total
Yes            7 (15%)     12 (25%)    13 (27%)    32 (22%)
No             41          36          35          112 (78%)
Total          48          48          48          144

χ² = 2.491 (df = 2, p = .288)
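The χ² values in Tables 33 to 37 (and in the later frequency tables) are Pearson tests of independence on the counts shown. The sketch below, using only the Python standard library, recomputes the statistic and an upper-tail p value. The p value comes from a series for the regularized lower incomplete gamma function, which is a choice of this sketch rather than anything described in the study, and the function names are hypothetical. Applied to the counts in Table 33 it gives χ² ≈ 1.527 with df = 2 and p ≈ .466.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the series for P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))
    total, n = term, 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum((obs - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i, r in enumerate(table) for j, obs in enumerate(r))
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df, chi2_sf(stat, df)


# Table 33: significant discrepancies (Yes/No) by k = 5, 10, 30
print(chi2_independence([[8, 11, 13], [40, 37, 35]]))
```

A 2 × 2 table such as Table 35 has (2 − 1)(2 − 1) = 1 degree of freedom.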

When population effects are truly heterogeneous, fixed-effects tests are not appropriate (cells B and D). However, the simulated power values for z_F were also compared with the theoretical power values calculated with the fixed-effects variances in cell B, because in this case the stage-one decision implies that z_F should be used. At α = 0.05, theoretical power values were significantly less than simulated power values, with a mean difference of −.040; the paired t-test value was −8.13 (df = 375, p < .001). Results were similar across sample sizes. Paired t tests on theoretical and simulated power values for z_F for heterogeneous groups with different sample sizes are listed in Table 38.

Table 38

Paired t Tests on Theoretical and Simulated Power of z_F
for Heterogeneous Effects (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .0113    .021    .004       2.62    23    .015*
      60k       -.0138    .024    .005      -2.79    23    .010*
     120k       -.0157    .034    .007      -2.27    23    .033*
     200k       -.0219    .041    .008      -2.64    23    .015*
10    20k       -.0361    .082    .014      -2.65    35    .012*
      60k       -.0590    .084    .014      -4.22    35    .000*
     120k       -.0626    .121    .020      -3.09    35    .004*
     200k       -.0665    .146    .024      -2.73    35    .010*
30    20k       -.0419    .082    .014      -3.07    35    .004*
      60k       -.0396    .086    .014      -2.77    35    .009*
     120k       -.0368    .090    .015      -2.47    35    .019*
     200k       -.0554    .143    .027      -2.04    27    .051

Note: * p < .05; a positive mean difference indicates theoretical value > simulated value.

Results of the modified Kolmogorov-Smirnov test for heterogeneous population effects with fixed-effects tests showed that 51% of 376 combinations had a significant difference between the theoretical and simulated z_F power. Most significant discrepancies were negative; that is, simulated values were higher than the theoretical values. Positive discrepancies were more common for smaller sample sizes; that is, when sample sizes were small, some theoretical values were higher than simulated power values. Discrepancies were not associated with patterns of population effects. Discrepancies were independent of the sampling ratio within studies but were associated with the sampling fraction between studies. When studies with large effects had large sample sizes, the simulated values were consistently higher than theoretical values. When sample sizes across studies were equal, the simulated values were consistently lower than the theoretical values. Cross-tabulations of significant discrepancies are listed in Tables 39 to 43.

Table 39

Frequencies of Significant Discrepancies for Power of z_F
by k for Heterogeneous Effects

Significant    Number of effect sizes (k)
Discrepancy    5           10          30          Total
Yes            47 (49%)    80 (56%)    66 (49%)    193 (51%)
No             49          64          70          183 (49%)
Total          96          144         136         376

χ² = 1.672 (df = 2, p = .433)

Table 40

Frequencies of Significant Discrepancies for Power of z_F
by N for Heterogeneous Effects

Significant    Total sample sizes (N)
Discrepancy    20k         60k         120k        200k        Total
Yes            67 (70%)    49 (51%)    42 (44%)    35 (40%)    193 (51%)
No             29          47          54          53          183 (49%)
Total          96          96          96          88          376

χ² = 20.0133 (df = 3, p < .001)

Table 41

Significant Discrepancies for Power of z_F by Pattern of δ_i
for Heterogeneous Effects

Significant    Pattern of population effects
Discrepancy    One Extreme    Three Subsets    Five Subsets    Total
Yes            70 (49%)       74 (53%)         49 (53%)        193 (51%)
No             74             66               43              183 (49%)
Total          144            140              92              376

χ² = 0.694 (df = 2, p = .707)

Table 42

Frequencies of Significant Discrepancies for Power of z_F
by π_i for Heterogeneous Effects

Significant    Sampling fraction between studies (π_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            34 (18%)    159 (85%)     193 (51%)
No             154         29            183 (49%)
Total          188         188           376

χ² = 166.341 (df = 1, p < .001)

Table 43

Frequencies of Significant Discrepancies for Power of z_F
by φ_i for Heterogeneous Effects

Significant    Sampling fraction within studies (φ_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            97          96            193 (51%)
No             91 (48%)    92 (49%)      183 (49%)
Total          188         188           376

χ² = 0.0107 (df = 1, p = .918)

Random-effects tests. Theoretical power values for z_R were calculated with the random-effects variance σ²_δ + σ²(g_i|δ_i). The simulated power values were obtained with the estimate of the random-effects variance (see formulas (40) and (41)).

When population effects are truly homogeneous, random-effects tests are not appropriate (cells A and C). However, in cell C the decision made at stage one is to reject H0, so this decision would lead (incorrectly) to the use of z_R at stage two. At α = 0.05, the discrepancy between the theoretical and simulated power values for z_R was large in comparison to that for z_F, the fixed-effects test. When δ = 0, the mean difference across all sample groups was .041 (.050 − .009). The paired t-test value was 49.92 (df = 47, p < .001), showing that the theoretical values were significantly greater than the simulated power values. Paired t tests on theoretical and simulated power values of z_R for homogeneous groups with δ = 0 and different sample sizes are listed in Table 44.

For δ > 0, at α = 0.05, the mean difference across all sample sizes was .1832 (.7187 − .5355). The paired t-test value was 15.90 (df = 143, p < .001), showing that theoretical values were significantly greater than simulated power values. Results were similar across sample sizes. However, as power values approached 1 for some large samples, the differences were forced to decrease. Paired t tests on mean theoretical and simulated power values for z_R for homogeneous groups with δ > 0 and different sample sizes are listed in Table 45.

Table 44

Paired t Tests on Theoretical and Simulated Power of z_R
for Homogeneous Effects with δ = 0 (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .0450    .002    .001      48.11     3    .000
      60k        .0474    .002    .001      55.68     3    .000
     120k        .0485    .001    .000     168.01     3    .000
     200k        .0453    .001    .001      62.70     3    .000
10    20k        .0421    .001    .001      64.07     3    .000
      60k        .0401    .001    .001      84.79     3    .000
     120k        .0425    .002    .001      38.66     3    .000
     200k        .0419    .001    .001      58.32     3    .000
30    20k        .0356    .007    .004      10.16     3    .002
      60k        .0351    .005    .002      14.27     3    .001
     120k        .0338    .002    .001      43.42     3    .000
     200k        .0346    .005    .002      14.53     3    .001

Note: * A positive mean difference indicates that theoretical power > simulated power.

Table 45

Paired t Tests on Theoretical and Simulated Power of z_R
for Homogeneous Effects with δ > 0 (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .2089    .091    .026       8.00    11    .000
      60k        .3228    .106    .030      10.58    11    .000
     120k        .3035    .100    .029      10.47    11    .000
     200k        .2433    .153    .044       5.51    11    .000
10    20k        .2380    .089    .026       9.29    11    .000
      60k        .2285    .085    .024       9.35    11    .000
     120k        .1627    .127    .037       4.43    11    .001
     200k        .1265    .157    .045       2.79    11    .018
30    20k        .2072    .101    .029       7.09    11    .000
      60k        .0887    .107    .031       2.87    11    .015
     120k        .0491    .080    .023       2.31    11    .056
     200k        .0192    .034    .010       1.98    11    .073

Note: * A positive mean difference indicates that theoretical power > simulated power.

Applying the modified Kolmogorov-Smirnov test to the differences for homogeneous population effects with δ = 0, all 48 combinations showed significant differences between the theoretical and simulated z_R power. Half of the significant discrepancies were positive and the other half were negative. Significant discrepancies for δ = 0 were not associated with any simulation factors.

When δ > 0, 89% of 144 combinations had significant discrepancies, most of which were positive; that is, the theoretical power values were greater than the simulated ones. Discrepancies increased as the number of effects k and the sample size N decreased. Discrepancies also increased as the value of δ decreased. Discrepancies were independent of the sampling fraction either between or within studies. Frequencies are listed by number of effects k, total sample size N, equal vs. unequal sample sizes between studies (π_i), within-study sample-size balance (φ_i), and the value of the common effect δ in Tables 46 to 50.

Table 46

Frequencies of Significant Discrepancies for Power of z_R
by k for Homogeneous Effects (δ > 0)

Significant    Number of effect sizes (k)
Discrepancy    5           10          30          Total
Yes            48 (100%)   47 (98%)    33 (69%)    128 (89%)
No             0           1           15          16 (11%)
Total          48          48          48          144

χ² = 29.672 (df = 2, p < .001)

Table 47

Frequencies of Significant Discrepancies for Power of z_R
by N for Homogeneous Effects (δ > 0)

Significant    Total sample sizes (N)
Discrepancy    20k         60k         120k        200k        Total
Yes            36 (100%)   34 (94%)    31 (86%)    27 (75%)    128 (89%)
No             0           2           5           9           16 (11%)
Total          36          36          36          36          144

χ² = 12.938 (df = 3, p < .01)

Table 48

Frequencies of Significant Discrepancies for Power of z_R
by π_i for Homogeneous Effects (δ > 0)

Significant    Sampling fraction between studies (π_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            62 (86%)    66 (92%)      128 (89%)
No             10          6             16 (11%)
Total          72          72            144

χ² = 1.125 (df = 1, p = .289)

Table 49

Frequencies of Significant Discrepancies for Power of z_R
by φ_i for Homogeneous Effects (δ > 0)

Significant    Sampling fraction within studies (φ_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            63 (88%)    65 (90%)      128 (89%)
No             9           7             16 (11%)
Total          72          72            144

χ² = 0.281 (df = 1, p = .596)

Table 50

Frequencies of Significant Discrepancies for Power of z_R
by δ for Homogeneous Effects (δ > 0)

Significant    Common population effect (δ)
Discrepancy    0.1         0.2         0.3         Total
Yes            48 (100%)   43 (90%)    37 (77%)    128 (89%)
No             0           5           11          16 (11%)
Total          48          48          48          144

χ² = 12.797 (df = 2, p < .01)

When the population effects were heterogeneous and the first-stage hypothesis was rejected (cell D), the random-effects test z_R was the correct test. At α = 0.05, theoretical power values for z_R were greater than the simulated values. The mean difference across all sample sizes, 0.10 (.463 − .363), was significant (t = 18.67, df = 375, p < .001). Results were similar across sample sizes. As above, when power values approached 1 for some large samples, discrepancies were limited and reduced. Paired t tests on theoretical and simulated z_R power values for heterogeneous groups and different sample sizes are listed in Table 51.

Table 51

Paired t Tests on Theoretical and Simulated Power of z_R
for Heterogeneous Effects (α = 0.05)

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .1467    .089    .018       8.06    23    .000
      60k        .2021    .116    .024       8.56    23    .000
     120k        .1776    .088    .018       9.90    23    .000
     200k        .1445    .107    .022       6.60    23    .000
10    20k        .1572    .087    .015      10.84    35    .000
      60k        .1454    .073    .012      12.01    35    .000
     120k        .0871    .105    .017       4.98    35    .000
     200k        .0447    .097    .016       2.75    35    .009
30    20k        .1057    .053    .009      11.87    35    .000
      60k        .0380    .061    .010       3.75    35    .001
     120k        .0141    .045    .007       1.88    35    .068
     200k       -.0012    .062    .012       -.10    27    .919

Note: * A positive mean difference indicates theoretical power > simulated power.

Applying the modified Kolmogorov-Smirnov test to power functions for z_R for heterogeneous effects, almost all (96%) of the 376 combinations showed significant differences between the theoretical and simulated power. Most of the significant differences were negative. About 33% of the measures (376 × 15 = 5640 measures) showed that theoretical power values were less than the simulated ones, and 13% showed that theoretical values were greater than the simulated values. Significant discrepancies decreased as the sample size or the number of effects k increased. Discrepancies occurred more often when population effects had extreme values than when population effects were more evenly dispersed. Frequencies are listed by number of effects k, total sample size N, equal vs. unequal sample sizes between studies (π_i), within-study sample-size balance (φ_i), and pattern of population effects in Tables 52 to 56.

Table 52

Frequencies of Significant Discrepancies for Power of z_R
by k for Heterogeneous Effects

Significant    Number of effect sizes (k)
Discrepancy    5           10          30          Total
Yes            96 (100%)   144 (100%)  120 (88%)   360 (96%)
No             0           0           16          16 (4%)
Total          96          144         136         376

χ² = 29.490 (df = 2, p < .001)

Table 53

Frequencies of Significant Discrepancies for Power of z_R
by N for Heterogeneous Effects

Significant    Total sample sizes (N)
Discrepancy    20k         60k         120k        200k        Total
Yes            96 (100%)   96 (100%)   88 (92%)    80 (91%)    360 (96%)
No             0           0           8           8           16 (4%)
Total          96          96          96          88          376

χ² = 17.502 (df = 3, p < .001)

Table 54

Frequencies of Significant Discrepancies for Power of z_R
by π_i for Heterogeneous Effects

Significant    Sampling fraction between studies (π_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            180 (96%)   180 (96%)     360 (96%)
No             8           8             16 (4%)
Total          188         188           376

χ² = 0.000 (df = 1, p = 1.000)

Table 55

Frequencies of Significant Discrepancies for Power of z_R
by φ_i for Heterogeneous Effects

Significant    Sampling fraction within studies (φ_i)
Discrepancy    Balanced    Unbalanced    Total
Yes            180 (96%)   180 (96%)     360 (96%)
No             8           8             16 (4%)
Total          188         188           376

χ² = 0.000 (df = 1, p = 1.000)

Table 56

Significant Discrepancies for Power of z_R by Pattern of δ_i
for Heterogeneous Effects

Significant    Pattern of population effects
Discrepancy    One Extreme    Three Subsets    Five Subsets    Total
Yes            144 (100%)     132 (94%)        84 (91%)        360 (96%)
No             0              8                8               16 (4%)
Total          144            140              92              376

χ² = 11.584 (df = 2, p < .01)

Summary. In general, theoretical and simulated values matched better for large samples than for small samples. Because the theoretical values are based on asymptotic theory, they should fit better for large samples. However, since both power values have an upper limit of 1 and both increase as the sample size increases, the discrepancies also tend to decrease as sample size increases, because both power functions approach one.

Theoretical values for z_F power fit best when homogeneity tests at the first stage were correctly accepted (cell A). For homogeneous effects with δ = 0, almost no significant discrepancies between simulated and theoretical power functions were found. When δ > 0, most discrepancies occurred when the sample size was small (e.g., n_i = 20), where theoretical values were significantly greater than the simulated values.

About half of the distributions showed significant discrepancies between theoretical and simulated power values for z_F when homogeneity was falsely accepted (cell B). Discrepancies increased as sample sizes decreased. When studies had equal sample sizes (equal π_i), theoretical values were closer to the simulated values than when studies had unequal samples. When large effects were combined with large samples, the theoretical values were lower than the simulated values.

Power functions for random-effects tests (z_R) did not fit as well as those for fixed-effects tests. When homogeneity was falsely rejected (cell C) and δ = 0, all combinations had significant discrepancies (half positive and half negative). Significant discrepancies were not clearly associated with any other simulation factors. When δ > 0, about nine tenths of the distributions had higher theoretical values. Discrepancies decreased as the number of effect sizes, the sample size, or the value of δ increased.

When homogeneity was correctly rejected (cell D), almost all theoretical power values (96%) for z_R were significantly different from the simulated values. When population effects were fairly evenly distributed, theoretical values were higher than simulated values. When the population effects included one extreme effect-size value, theoretical values could be either higher or lower than the simulated values. Discrepancies also decreased as the number of effects k increased.

Overall, theoretical power values did not fit the simulated values well for random-effects tests (z_R); theoretical values were sometimes greater and sometimes less than simulated values. This result raises a question about the precision of the estimate of the variance of population effects (σ²_δ).

Hedges and Olkin (1985) gave an approximation to the distribution of the estimator of the effect-size parameter variance. As they indicated, the estimator of the variance of population effects has an asymptotic normal distribution; however, the large-sample normal approximation to the distribution of the estimate of σ²_δ is probably not very good unless the number of effects k is quite large. More needs to be known about the accuracy of the large-sample approximation to the distribution of the estimate of the variance of population effects.

When effects were homogeneous, the power of the random-effects test z_R seemed excessively low. One possibility is that the variance of the population effects σ²_δ for homogeneous effects (σ²_δ = 0) may be systematically overestimated (biased). When effects were heterogeneous, the estimate of the population variance seemed appropriate and may be more accurate.

The behavior of the estimator of the population variance based on different homogeneity decisions at stage one was studied via further simulation. Two sample sizes n_i of 20 and 60 were selected, and two sets of effect-size parameters were set for the case where k = 5. The average effect size was the same for both homogeneous and heterogeneous effects: the δ values for σ²_δ = 0 were (0.2, 0.2, 0.2, 0.2, 0.2), and for σ²_δ > 0 the effects were (0, 0.2, 0.2, 0.2, 0.4). Two thousand replications were generated for both correct and incorrect decisions about homogeneity. When homogeneity was accepted, values of the variance estimates were close to zero and were less dispersed for both homogeneous and heterogeneous effects. As predicted, the bias of the estimate was greater when effects were homogeneous than when effects were heterogeneous.
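The follow-up simulation of the variance estimator can be sketched as below. This is a simplified stand-in, not the study's program. It draws each g_i directly from its large-sample normal approximation N(δ_i, σ²(g_i|δ_i)) instead of generating raw scores, splits n_i = 20 as 10 + 10, runs a fixed number of replications and then sorts them by the homogeneity decision (rather than continuing until each decision has 2000 replications), truncates negative values of (42) at zero, and uses 9.488 as the 5% chi-square critical value for k − 1 = 4. All names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def cond_var(d, n_e, n_c):
    """Large-sample variance of Hedges' g (assumed form of formula (5))."""
    n = n_e + n_c
    return n / (n_e * n_c) + d ** 2 / (2 * n)


def simulate_tau2(deltas, n_e=10, n_c=10, reps=2000, q_crit=9.488, seed=1):
    """Count and mean truncated estimate (42), split by the homogeneity decision."""
    rng = random.Random(seed)
    by_decision = {"accepted": [], "rejected": []}
    for _ in range(reps):
        gs = [rng.gauss(d, math.sqrt(cond_var(d, n_e, n_c))) for d in deltas]
        cv = [cond_var(g, n_e, n_c) for g in gs]      # g_i plugged in for delta_i
        w = [1.0 / v for v in cv]
        g_dot = sum(wi * gi for wi, gi in zip(w, gs)) / sum(w)
        q = sum(wi * (gi - g_dot) ** 2 for wi, gi in zip(w, gs))
        tau2 = max(0.0, variance(gs) - mean(cv))      # eq. (42), truncated at 0
        key = "rejected" if q > q_crit else "accepted"
        by_decision[key].append(tau2)
    return {key: (len(vals), mean(vals) if vals else float("nan"))
            for key, vals in by_decision.items()}


homogeneous = simulate_tau2([0.2] * 5)                     # true variance is 0
heterogeneous = simulate_tau2([0.0, 0.2, 0.2, 0.2, 0.4])   # true variance > 0
print(homogeneous)
print(heterogeneous)
```

Because the homogeneous set has σ²_δ = 0, any positive average of the truncated estimates is upward bias of the kind described above.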

Power of z Based on Decisions about Homogeneity

Power values for z_F and z_R were compared at α = 0.05. If homogeneity was rightly accepted (cell A) or rightly rejected (cell D), the second-stage z tests that follow from the stage-one decision are tests with correct variance components, so no comparison was necessary when the correct z test was applied. When homogeneity was falsely accepted (cell B) or falsely rejected (cell C), the subsequent z test suggested by the stage-one test would use the estimate of the wrong variance and be incorrect. Since population effects were known values in the simulation, both z tests were calculated for cells B and C. Simulated power values were compared for the two tests (i.e., for tests using the correct versus incorrect variance).

Homogeneous population effects. When effects were homogeneous and homogeneity was rejected (cell C), the recommended z test on the average effect would be calculated as z_R, that is, using the estimate of the random-effects variance σ²_δ + σ²(g_i|δ_i). However, the correct z test (z_F) should use the estimate of the fixed-effects variance σ²(g_i|δ_i). Since the estimate of σ²_δ must be greater than or equal to 0, power values based on z_R and the random-effects variance should always be less than values based on the fixed-effects test (z_F).
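A short derivation shows why this holds, at least in the simplest case. Assume, for illustration only, that every study has the same estimated conditional variance σ̂²(g_i|δ_i) = σ̂². The weights in (39) are then equal for both tests, so g. is the same unweighted mean in z_F and z_R, and

$$ z_F = \frac{g_\cdot}{\sqrt{\hat\sigma^2 / k}}, \qquad z_R = \frac{g_\cdot}{\sqrt{(\hat\sigma^2_\delta + \hat\sigma^2)/k}} = z_F \sqrt{\frac{\hat\sigma^2}{\hat\sigma^2_\delta + \hat\sigma^2}}. $$

Since σ̂²_δ ≥ 0, |z_R| ≤ |z_F| in every replication, with equality only when σ̂²_δ = 0, so any data set that leads z_R to reject also leads z_F to reject. With unequal conditional variances the weights differ between the two tests, and this replication-by-replication argument no longer applies exactly.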

For homogeneous effects with δ = 0, at α = 0.05, across all sample groups the mean power difference between z_F and z_R was .0387 (.0477 − .0090), with a paired t = 31.33 (df = 47, p < .001). When the common effect δ = 0, the probability of falsely rejecting H0 with the z test is the type I error rate. Mean simulated power values showed that both z_F and z_R had smaller type I error rates (0.0477 and 0.009) than the preset α level (0.05). However, the size of z_R is much lower than either the α level or the size of z_F. When the number of effects k increased, mean differences between the power of z_F and z_R decreased slightly. Paired t tests on homogeneous effects with δ = 0 for each sample-size group are listed in Table 57.
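One way to judge the sizes reported above is the Monte Carlo standard error of an estimated rejection rate. Treating the rejections in a single combination as binomial with 2000 replications gives a standard error of about .0049 at p = .05. This is only an illustrative yardstick: the means .0477 and .009 are averaged over 48 combinations, so their own standard errors are smaller. The function name is hypothetical.

```python
import math


def mc_standard_error(p, reps):
    """Binomial standard error of a rejection rate estimated from reps trials."""
    return math.sqrt(p * (1.0 - p) / reps)


se = mc_standard_error(0.05, 2000)   # about 0.0049
for label, size in [("z_F", 0.0477), ("z_R", 0.0090)]:
    # distance of the observed mean size from the nominal .05, in SE units
    print(label, round((size - 0.05) / se, 2))
```

On this yardstick the z_F size lies within one standard error of .05, while the z_R size lies far below it.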

Table 57

Paired t Tests on Power (Size) of z_F versus z_R
for Homogeneous Effects with δ = 0 (α = 0.05)
and Homogeneity Was Rejected

 k      N  Mean Diff.*      Sd      Se   Paired t    df    p
 5    20k        .0434    .004    .002      23.41     3    .000
      60k        .0432    .005    .003      16.42     3    .000
     120k        .0465    .008    .004      11.61     3    .000
     200k        .0522    .006    .003      16.37     3    .000
10    20k        .0327    .004    .002      15.26     3    .001
      60k        .0396    .002    .001      41.99     3    .000
     120k        .0417    .003    .001      30.66     3    .000
     200k        .0419    .001    .001      56.09     3    .000
30    20k        .0275    .009    .004       6.44     3    .008
      60k        .0327    .008    .004       8.14     3    .004
     120k        .0305    .003    .001      20.92     3    .000
     200k        .0320    .001    .001      52.26     3    .000

Note: * A positive mean difference indicates power of z_F > power of z_R.

For homogeneous effects with δ > 0, across all sample groups the mean power difference between z_F and z_R was .1751 (.7107 − .5355), with a paired t = 15.65 (df = 143, p < .001). Power values increased as either the value of δ or the sample size increased. However, power values for fixed-effects tests (z_F) increased faster than those for random-effects tests (z_R) as either δ or the sample size increased. When δ or the sample sizes were large, both power values approached 1. Mean power values for both tests for different sample sizes and δ values are listed in Table 58. Since population effects were homogeneous, z_F is still the correct test here.

Table 58

Mean z Power Values of z_F versus z_R
for Homogeneous Effects with δ > 0 (α = 0.05)
and Homogeneity Was Rejected

             δ = 0.10         δ = 0.20         δ = 0.30
 k      N     z_F     z_R      z_F     z_R      z_F     z_R
 5    20k   .1081   .0186    .2296   .0445    .4036   .1087
      60k   .2142   .0299    .5015   .1391    .8014   .4012
     120k   .3246   .0605    .7630   .3372    .9735   .7619
     200k   .4495   .1002    .9200   .5912    .9990   .9609
10    20k   .1502   .0415    .3599   .1352    .6292   .3351
      60k   .3166   .1066    .7669   .4449    .9672   .8331
     120k   .5117   .2170    .9561   .7831    .9994   .9876
     200k   .6996   .3690    .9971   .9529   1.0000   .9999
30    20k   .2939   .1331    .7437   .4971    .9656   .8187
      60k   .6567   .4396    .9922   .9535   1.0000   .9992
     120k   .9061   .7530   1.0000   .9994   1.0000  1.0000
     200k   .9830   .9256   1.0000  1.0000   1.0000  1.0000

 

Heterogeneous population effects. When effects were
heterogeneous but homogeneity was accepted (cell B), the
z test that follows from the stage-one decision would
typically be calculated as z_F, using the estimate of the
fixed-effects variance σ²(d_i|δ_i). The correct z test,
however, is z_R, which uses the estimate of the
random-effects variance σ²_δ + σ²(d_i|δ_i). Here the power of
the incorrect test (z_F) would be expected to be greater than
the power of the correct test. The mean power difference
between z_R and z_F (z_R minus z_F) across all sample groups
was −.0330 (.5122 − .5452), paired t = −20.05 (df = 375,
p < .001). Mean power values for each sample group and
pattern of δ_i's are listed in Table 59.

Table 59

Mean Power Values of z_F versus z_R
for Heterogeneous Effects (α = 0.05)
and Homogeneity Was Accepted

One Extreme Three Subsets Five Subsets
k n z_F z_R z_F z_R z_F z_R
5 20 .0792 .0654 .2808 .2389 - -
60 .1387 .1077 .5922 .5214 - -
120 .1997 .1437 .7745 .7188 - -
200 .2796 .1955 .8638 .8261 - -
10 20 .0895 .0731 .4099 .3591 .4116 .3628
60 .1608 .1284 .7256 .6693 .7258 .6744
120 .2418 .1986 .8637 .8294 .8596 .8258
200 .3118 .2623 .9302 .9039 .9683 .9498
30 20 .0747 .0641 .7292 .6989 .7203 .6947
60 .1233 .1053 .9198 .9020 .9224 .9055
120 .1711 .1490 .9802 .9750 .9806 .9745
200 .2272 .2018 .9958 .9938 .9960 .9941

The population effects for the pattern with one extreme
case were (0, ..., 0, δ_k), with δ_k = 0.1, 0.2, or 0.25. The
value of δ for three or five subsets varied as 0.1, 0.2, or
0.25, and can also be viewed as the average effect. At α =
0.05, when the average effect was small, the differences
between power values of the fixed- and random-effects tests
increased as sample sizes increased, as was true for the
one-extreme-value case. When the average effect was large,
power values reached 1, and the differences between power
values for the fixed- and random-effects tests were forced
to diminish.

Summary. The power difference between the fixed- and
random-effects tests at α = 0.05 increased as the value of
the average effect or the sample size increased. As the
average effect or sample size became large, power approached
1 and the differences diminished. Power differences were
smaller when homogeneity of effects was falsely accepted
(cell B) than when it was falsely rejected (cell C). The
fixed-effects z test, z_F, was always the more powerful test.

Caution is needed in any sequential testing procedure. To
achieve the desired significance level, the significance
level chosen at each stage sometimes needs to be adjusted.
At other times, corrections need to be made to estimates and
tests of hypotheses.

In Chapter IV, the actual size of the homogeneity test
for large k with all small samples (n_i = 20) was found to be
greater than the preset α value (see Table 2 and Figure
4.1.4). In other words, there was a slightly higher chance
(up to about 0.05 more) that homogeneity of effects would be
falsely rejected for large k with small samples than for
smaller k with large samples. Results in Chapter V showed
that the use of an incorrect z test (i.e., one with an
incorrect variance) was associated with greater type I and
type II error rates when homogeneity of effects was falsely
rejected than when homogeneity was falsely accepted.

Meta-analysts who encounter many studies, all with small
samples, need to be aware that the homogeneity test has an
inflated type I error rate. Also, subsequent z tests,
erroneously computed with random-effects variances, will be
much less sensitive to the magnitude of the common effect.
To maintain a desired error rate for H, for example 0.05,
one may want to lower the nominal α level of the homogeneity
test to 0.025 (for which the simulated rejection rate was
around 0.066) when there are many studies, all with sample
sizes less than or equal to 20.

Power values and type I error rates for the second-stage
z tests were computed for selected cases to examine the
consequences of lowering the α level of the homogeneity test
at the first stage from 0.05 to 0.025. For k = 30, n = 20,
and homogeneous effect sizes with common effect δ = 0, the
actual rejection rate of H was 0.0780 for a nominal α = 0.05.
The actual rejection rate for the z_R test was 0.0185 when
homogeneity was falsely rejected, and the rejection rate for
z_F was 0.0435 when homogeneity was correctly accepted. For
the same values of k and n, with α = 0.025 for the
homogeneity test, the rejection rate of H was 0.0465, the
rejection rate for the z_R test was 0.0200 when homogeneity
was falsely rejected, and the rejection rate for z_F was
0.0425 when homogeneity was correctly accepted.

The total rejection rates for the second-stage z tests
at the 0.05 level, P(R2), were compared under first-stage
α values of 0.05 and 0.025. The total rejection rate can be
written as

P(R2) = P(R2|R1)P(R1) + P(R2|R̄1)[1 − P(R1)],   (43)

where
P(R1) = the rejection rate of H at stage one,
P(R̄1) = 1 − P(R1),
P(R2|R1) = the chance of rejecting H0: μ_δ = 0,
given that homogeneity has been rejected, and
P(R2|R̄1) = the chance of rejecting H0: μ_δ = 0,
given that homogeneity has been accepted.

For α = 0.05 at stage one:

P(R2) = (0.0185)(0.0780) + (0.0435)(0.9220) = 0.0416.

For α = 0.025 at stage one:

P(R2) = (0.0200)(0.0465) + (0.0425)(0.9535) = 0.0415.

Thus, here, reducing the first-stage α has essentially no
effect on the size of the z test procedure. When effect
sizes were homogeneous with common effect δ = 0.2, the
rejection rates at the second stage for first-stage α values
of 0.05 and 0.025 are,

for α = 0.05 at stage one,

P(R2) = (0.6100)(0.0860) + (0.7565)(0.9140) = 0.7439,
and for α = 0.025 at stage one,
P(R2) = (0.6140)(0.0465) + (0.7690)(0.9535) = 0.7618.

The lower α value at stage one here is associated with
a slight increase in power at stage two, which is beneficial
since the stage-two null hypothesis is false (δ = 0.2). When
effect sizes were heterogeneous with average effect μ_δ = 0,
the rejection rates at the second stage under first-stage α
values of 0.05 and 0.025 are,

for α = 0.05 at stage one,

P(R2) = (0.0160)(0.1815) + (0.0420)(0.8185) = 0.0373,
and for α = 0.025 at stage one,
P(R2) = (0.0150)(0.1210) + (0.0400)(0.8790) = 0.0370.

Again the change in the type I error rate is minimal, so
reducing the stage-one α does not noticeably affect the
stage-two α value. When effect sizes were heterogeneous
with average effect μ_δ = 0.2, the rejection rates at the
second stage for first-stage α values of 0.05 and 0.025 are,

for α = 0.05 at stage one,

P(R2) = (0.5865)(0.1890) + (0.7635)(0.8110) = 0.7300,
and for α = 0.025 at stage one,
P(R2) = (0.5775)(0.1235) + (0.7625)(0.8765) = 0.7397.

Again a slight power increase is seen, though it is
minimal. In none of these instances is a reduction in the
stage-one α associated with detrimental effects at stage
two. From the above comparison, one can conclude that
lowering the significance level of the homogeneity test at
the first stage when k ≥ 30 and n ≤ 20 is appropriate. When
the first-stage α was lowered from 0.05 to 0.025, the false
rejection rates for the second-stage z tests decreased
slightly (for δ or μ_δ = 0), and the total power of these
z tests increased (for δ or μ_δ ≠ 0).
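Equation (43) is a weighted average of the two conditional rejection rates, so the eight totals above can be checked mechanically. The sketch below plugs in the conditional rates reported in the text; each result agrees with the hand calculation to within rounding in the fourth decimal place. The function and variable names are illustrative only.

```python
def total_rejection_rate(p_r2_given_r1, p_r1, p_r2_given_accept):
    """Equation (43): P(R2) = P(R2|R1) P(R1) + P(R2|not R1) [1 - P(R1)]."""
    return p_r2_given_r1 * p_r1 + p_r2_given_accept * (1.0 - p_r1)


# (condition, stage-one alpha, P(R2|R1), P(R1), P(R2|not R1)), from the text
cases = [
    ("homogeneous, delta = 0", 0.05, 0.0185, 0.0780, 0.0435),
    ("homogeneous, delta = 0", 0.025, 0.0200, 0.0465, 0.0425),
    ("homogeneous, delta = 0.2", 0.05, 0.6100, 0.0860, 0.7565),
    ("homogeneous, delta = 0.2", 0.025, 0.6140, 0.0465, 0.7690),
    ("heterogeneous, mu = 0", 0.05, 0.0160, 0.1815, 0.0420),
    ("heterogeneous, mu = 0", 0.025, 0.0150, 0.1210, 0.0400),
    ("heterogeneous, mu = 0.2", 0.05, 0.5865, 0.1890, 0.7635),
    ("heterogeneous, mu = 0.2", 0.025, 0.5775, 0.1235, 0.7625),
]
for label, alpha1, r2_r1, r1, r2_a1 in cases:
    p = total_rejection_rate(r2_r1, r1, r2_a1)
    print(f"{label:<26} stage-one alpha = {alpha1:<5}  P(R2) = {p:.4f}")
```

Because P(R1) is small in every case, P(R2) is dominated by the second term, which is why changing the stage-one α moves the totals so little.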

One can also consider other approaches, such as
categorizing the data into homogeneous subgroups instead of
using the random-effects test after homogeneity is rejected,
until more is learned about the estimate of the variance of
the population effects.

CHAPTER VI

CONCLUSIONS AND IMPLICATIONS

This chapter includes six sections. First, I give an
example with empirical data to illustrate how the power of
the homogeneity test can be useful to integrative reviewers.
Second, I summarize the simulation study. Third and fourth,
I discuss the results of the simulation: the power of the
homogeneity test and the power of the sequential z testing
procedure. Fifth, I present some practical implications for
integrative reviews. Finally, I make suggestions for
further research related to effect-size meta-analysis.

Example

The theoretical power of the homogeneity test was
computed for a subset of data originally from the published
reviews by Steinkamp and Maehr (1983, 1984) and reanalyzed
by Becker (1989). Five studies with six samples on gender
and geology achievement were chosen. Power was computed for
two sets of fixed-effects population effects: (0, 0, 0, 0,
0, 0.5) and (0, 0, 0.2, 0.2, 0.4, 0.4). The number of
effects was k = 6, and the sample sizes, conditional
variances of effects σ²(d_i|δ_i), and noncentrality parameter
λ for the noncentral chi-square are listed in Tables 60 and
61.

Table 60

Computation of Noncentrality Parameter for
the One-Extreme-Value Example

n_1 n_2 δ_i (δ_i − δ̄)² σ²(d_i|δ_i) (δ_i − δ̄)²/σ²(d_i|δ_i)
52 54 0 .00694 .0378 .1839
46 47 0 .00694 .0430 .1614
458 430 0 .00694 .0045 1.5397
47 47 0 .00694 .0426 .1632
64 56 0 .00694 .0335 .2074
48 48 0.5 .24174 .0430 5.6258
λ = 7.8814

Table 61

Computation of Noncentrality Parameter for
the Three-Equal-Values Example

n_1 n_2 δ_i (δ_i − δ̄)² σ²(d_i|δ_i) (δ_i − δ̄)²/σ²(d_i|δ_i)
52 54 0 .04 .0378 1.0596
46 47 0 .04 .0430 .9298
458 430 0.2 .00 .0045 .0000
47 47 0.2 .00 .0428 .0000
64 56 0.4 .04 .0341 1.1714
48 48 0.4 .04 .0425 .9412
λ = 4.1020
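The entries in Table 61 can be generated from the sample sizes alone. The sketch below assumes the usual large-sample variance of a standardized mean difference, σ²(d_i|δ_i) = (n_1 + n_2)/(n_1 n_2) + δ_i²/[2(n_1 + n_2)], which agrees with the σ²(d_i|δ_i) column of Table 61 to within about .0001. Centering on the simple mean of the δ_i, as the table does, reproduces λ = 4.1020. Centering on the inverse-variance-weighted mean, an alternative included here only for comparison, gives about 4.101 for these data because that weighted mean is about 0.202. The helper names d_variance and noncentrality are hypothetical.

```python
def d_variance(n1, n2, delta):
    """Large-sample conditional variance sigma^2(d_i | delta_i) of a
    standardized mean difference with group sizes n1 and n2."""
    n = n1 + n2
    return n / (n1 * n2) + delta ** 2 / (2.0 * n)


def noncentrality(deltas, variances, weighted=False):
    """lambda = sum of (delta_i - center)^2 / sigma^2(d_i | delta_i).

    weighted=False centers on the simple mean of the delta_i;
    weighted=True centers on the inverse-variance-weighted mean.
    """
    if weighted:
        w = [1.0 / v for v in variances]
        center = sum(wi * d for wi, d in zip(w, deltas)) / sum(w)
    else:
        center = sum(deltas) / len(deltas)
    return sum((d - center) ** 2 / v for d, v in zip(deltas, variances))


# Sample sizes and population effects of the three-equal-values example.
sizes = [(52, 54), (46, 47), (458, 430), (47, 47), (64, 56), (48, 48)]
deltas = [0.0, 0.0, 0.2, 0.2, 0.4, 0.4]
variances = [d_variance(n1, n2, d) for (n1, n2), d in zip(sizes, deltas)]
print(f"lambda (simple mean):   {noncentrality(deltas, variances):.4f}")
print(f"lambda (weighted mean): {noncentrality(deltas, variances, weighted=True):.4f}")
```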

 


For the given samples, power to detect the "true"
heterogeneity for population effects including only one
distinct value of 0.5 was about 0.55 (λ = 7.8814, df = 5).
With the given set of samples, the homogeneity test can
detect true differences (with the single distinct value
being 0.5) more than half of the time. Power decreases as
the one extreme value decreases. In other words, if the
extreme value were less than 0.5, the homogeneity test would
be less likely to reject the homogeneity of effects.
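Under the fixed-effects model, the power of the homogeneity test is the probability that a noncentral chi-square variable with k − 1 degrees of freedom and noncentrality λ exceeds the central chi-square critical value. The following is a standard-library sketch (no SciPy), assuming α = 0.05. It writes the noncentral distribution as a Poisson(λ/2) mixture of central chi-squares and evaluates each central CDF with the power series for the regularized incomplete gamma function. For λ = 7.8814 and 5 df it should return a value close to the 0.55 quoted above. All function names are illustrative.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF, via the power series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x, df, lam):
    """P(X > x) for a noncentral chi-square, written as a
    Poisson(lam / 2) mixture of central chi-squares with df + 2j df."""
    weight = math.exp(-lam / 2.0)  # Poisson probability for j = 0
    cumulative = total = 0.0
    j = 0
    while cumulative < 1.0 - 1e-12 and j < 1000:
        total += weight * (1.0 - chi2_cdf(x, df + 2 * j))
        cumulative += weight
        j += 1
        weight *= (lam / 2.0) / j
    return total


def homogeneity_power(lam, df, alpha=0.05):
    """Power of H: P(noncentral chi-square > central critical value)."""
    return ncx2_sf(chi2_quantile(1.0 - alpha, df), df, lam)


print(f"critical value (df = 5): {chi2_quantile(0.95, 5):.4f}")
print(f"power for lambda = 7.8814: {homogeneity_power(7.8814, 5):.3f}")
```

With λ = 0 the function returns α itself, which is a convenient check that the critical value and the mixture are consistent.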

Power for population effects with three equal values
(with an average of 0.2) was about 0.42 (λ = 4.1020, df = 5).
With the given set of data, homogeneity would be rejected
slightly less than half of the time. Again, when the values
of effects decrease or increase, the power of the
homogeneity test will decrease or increase accordingly.

The homogeneity test is also sensitive to the
dispersion of effects. Even though the mean effect of the
three-equal-values set (0.2) was greater than the mean
effect of the one-extreme-value set (0.0833), the power of
the homogeneity test was higher for the set of effects that
contained one extreme value.

Summary

Effect-size meta-analysis has enabled research
syntheses to become quantitatively more precise through
analyses of standardized effect sizes from primary studies.
Hedges & Olkin (1985) present both an unbiased estimator of
effect size and a homogeneity test for effect-size data.
They recommend examining the consistency of the effect sizes
before applying any test for the magnitude of the common or
average effect across studies. In this research, I derived
an approximate distribution for the homogeneity test under
alternative models and then studied the power of the
homogeneity test through numerical simulation. I also
explored the impact of decisions about homogeneity of effect
sizes on subsequent tests of effect magnitude. Suggestions
were made to assist meta-analysts in maintaining desirable
statistical error rates.
The Power of the Homogeneity Test

The H statistic, or homogeneity test, had an asymptotic
central chi-squared distribution when effect sizes were
homogeneous, that is, under the null hypothesis. In the
fixed-effects case, when alternative hypotheses were true,
the distribution of the H statistic was well approximated by
a noncentral chi-squared distribution. These theoretical
distributions fit the simulated distributions quite well for
effect sizes based on large samples. The asymptotic
distributions tended to underestimate power when some
effects had extreme values or when large numbers of effects
were based on small samples (e.g., total within-study sample
sizes of n_i = 20).

When effects are homogeneous, the power of H should
equal the α level, or size, of the test. In most cases the
nominal and simulated significance levels were quite close.
However, simulation data indicated that for a nominal α
level of 0.05, the proportion of false rejections approached
0.10 for situations in which k = 30 and n_i = 20. Simulated
significance levels were close to the nominal α level when
sample sizes were larger (n_i ≥ 60). When encountering many
studies (k ≥ 30), all or many of which have small samples
(e.g., n_i ≤ 20), meta-analysts may wish to lower the
nominal α level of the homogeneity test to 0.025 to achieve
an actual α nearer to 0.05.
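The inflated size for many small samples can be explored with a small Monte Carlo experiment. The sketch below is not the simulation program used in this study; it assumes unit-variance normal data with every δ_i = 0, Hedges' small-sample bias correction of d, and a variance estimate that plugs the estimated effect in for δ_i. It counts how often H exceeds the χ² critical value with 29 df (42.557) for k = 30 studies with n_i = 20 (10 per group). Because the result is a simulated proportion, it varies with the seed and the number of replications; compare it with the nominal 0.05. The function names are hypothetical.

```python
import math
import random


def g_and_variance(x, y):
    """Bias-corrected standardized mean difference g and the usual
    large-sample variance estimate with g plugged in for delta."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    sp = math.sqrt(ss / (n1 + n2 - 2))  # pooled within-group SD
    g = (1.0 - 3.0 / (4.0 * (n1 + n2) - 9.0)) * (m1 - m2) / sp
    var = (n1 + n2) / (n1 * n2) + g ** 2 / (2.0 * (n1 + n2))
    return g, var


def homogeneity_h(ds, vs):
    """H = sum of w_i (d_i - weighted mean)^2 with w_i = 1 / v_i."""
    w = [1.0 / v for v in vs]
    dbar = sum(wi * d for wi, d in zip(w, ds)) / sum(w)
    return sum(wi * (d - dbar) ** 2 for wi, d in zip(w, ds))


def simulated_size(k=30, n_per_group=10, reps=2000, crit=42.557, seed=1992):
    """Share of replications with H > crit when every delta_i = 0.

    The default crit (42.557) is the 0.95 quantile of chi-square with
    29 df, so it is only appropriate for k = 30.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        ds, vs = [], []
        for _ in range(k):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            y = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
            g, v = g_and_variance(x, y)
            ds.append(g)
            vs.append(v)
        if homogeneity_h(ds, vs) > crit:
            rejections += 1
    return rejections / reps


size = simulated_size(reps=1000)
print(f"simulated size of H at nominal 0.05: {size:.3f}")
```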

In the random-effects case, under alternative
hypotheses, the distribution of H could not be presented in
a simple form. The nonnull distribution of H is a
combination of many noncentral chi-squared distributions.
Theoretical power values based on the combination of
noncentral chi-squares corresponded closely to the simulated

values for the random-effects case.

The Power of the z Tests

Based on the particular decision about homogeneity from
the H test, a "second-stage" z test of effect magnitude can
be calculated. If homogeneity is accepted, the estimate of
the fixed-effects within-study variance is applied in the
z_F test. When homogeneity is rejected, the estimate of the
random-effects variance is used to compute z_R. The power
functions of z_F and z_R were examined in this
dissertation. In general, the theoretical power values were
lower than the simulated values for the fixed-effects tests,
and higher for the random-effects tests.
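The two-stage procedure can be sketched as follows. Stage one computes H and compares it with a chi-square critical value supplied by the caller; stage two reports z_F when homogeneity is accepted and z_R otherwise. The random-effects variance component in this sketch uses the DerSimonian-Laird moment estimator, swapped in for simplicity; it is not the Hedges & Olkin (1985) estimator discussed in this dissertation. The data are hypothetical. In the heterogeneous example z_R is about 1.73, whereas z_F computed on the same data would be 5.0, which illustrates why the random-effects test is much less powerful.

```python
import math


def two_stage_z(ds, vs, h_crit):
    """Stage 1: homogeneity test H. Stage 2: z_F if H <= h_crit, else z_R.

    Returns (H, name_of_test, z). The variance component for z_R uses the
    DerSimonian-Laird moment estimator (a stand-in for this sketch).
    """
    k = len(ds)
    w = [1.0 / v for v in vs]
    sw = sum(w)
    dbar = sum(wi * d for wi, d in zip(w, ds)) / sw
    h = sum(wi * (d - dbar) ** 2 for wi, d in zip(w, ds))
    if h <= h_crit:
        # Fixed effects: weighted mean over its standard error sqrt(1 / sum w).
        return h, "z_F", dbar * math.sqrt(sw)
    # Random effects: add the estimated between-study variance to each v_i.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (h - (k - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in vs]
    sw_star = sum(w_star)
    dbar_star = sum(wi * d for wi, d in zip(w_star, ds)) / sw_star
    return h, "z_R", dbar_star * math.sqrt(sw_star)


# Hypothetical data: four effects, each with conditional variance 0.04.
# 7.8147 is the 0.95 quantile of chi-square with k - 1 = 3 df.
print(two_stage_z([0.2, 0.2, 0.2, 0.2], [0.04] * 4, 7.8147))
print(two_stage_z([0.0, 0.0, 1.0, 1.0], [0.04] * 4, 7.8147))
```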

Power values were also compared for z tests calculated
with the fixed-effects variance (z_F) versus tests with the
random-effects variance (z_R), i.e., tests calculated in the
presence of a statistical error at stage one of testing.
Power values were always higher for the fixed-effects tests
(z_F) than for the random-effects tests (z_R) in these cases.
When homogeneity was falsely accepted, the more powerful
fixed-effects tests would be applied. When homogeneity was
falsely rejected, the much less powerful random-effects
tests would be applied.

To prevent the z_R test from having excessively low
power for homogeneous effects, the type I error rate (the
rate of false rejection) of the homogeneity test should be
limited. This recommendation is consistent with the
recommendation based on the simulation study of the
homogeneity test above. To maintain, if not to reduce, the
rate of false rejection, the α level of 0.05 for the
homogeneity test may be lowered for effect sizes based on
many small samples.

Practical Implications

The study of the power of the homogeneity test and the
power of the subsequent z test was theoretically useful in
understanding the distributions of both statistics.
Practically, these distributions enable reviewers to
estimate the power of homogeneity tests and to adjust for
possible inflation of statistical errors. Studying the
sequential process in meta-analysis gives a sense of the
impact of the first-stage homogeneity test on the
second-stage z test.

Simulation results showed that when many studies have
small samples, homogeneity was likely to be falsely
rejected, causing the subsequent z test to lack power.
Classifying effects into homogeneous subgroups, or applying
more complicated linear models, are alternative approaches
in which the reviewer explains variation among the effects.
Meta-analysts were advised to adjust the significance level
of the homogeneity test. However, a more general suggestion
to researchers is to include more subjects (i.e., larger
samples) in primary studies. It is always better to
integrate studies of higher quality or with stronger
evidence.

Suggestions for Further Research

More needs to be learned about the estimator of the
population variance component, which figures in
random-effects z tests. The estimator proposed by Hedges &
Olkin (1985) has an asymptotic normal distribution, but the
small-sample behavior of the estimator is unexplored. The
variance of the estimator, as well as the behavior of the
estimator for different numbers of studies or sample sizes,
should be studied further.

APPENDICES

APPENDIX A

CHOOSING THE NUMBER OF REPLICATIONS FOR SIMULATION

Simulated power values are measured by the proportion of
replications in which the null hypothesis is rejected. We
want to be able to draw 95% confidence intervals for these
proportions. With N replications, the proportions are
approximately normally distributed with expected value π
and variance π(1 − π)/N. We can write:

p ~ N(π, π(1 − π)/N).

Let π = .95, and let the desired 95% confidence interval for
the proportion be p ± .01. That is,

1.96 √[.95(1 − .95)/N] = .01.

The solution of this equation gives N ≈ 1825. Thus, I choose
N = 2000 as the number of replications for the simulation.
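The replication count follows directly from the normal-approximation interval: solving z √[π(1 − π)/N] = half-width for N and rounding up. A minimal sketch, assuming z = 1.96 for a 95% interval; the function name is hypothetical.

```python
import math


def replications_needed(pi, half_width, z=1.96):
    """Smallest N for which z * sqrt(pi * (1 - pi) / N) <= half_width."""
    return math.ceil(z ** 2 * pi * (1.0 - pi) / half_width ** 2)


n_needed = replications_needed(0.95, 0.01)
print(n_needed, n_needed <= 2000)
```

With π = .95 and a half-width of .01 this returns 1825, so 2000 replications meets the target.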


APPENDIX B


Table 62

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
2    1   .5    n1 =    10, 10    30, 30    60, 60    100, 100
               n2 =    10, 10    30, 30    60, 60    100, 100
         .35   n1 =    7, 13     20, 40    42, 78    70, 130
               n2 =    7, 13     20, 40    42, 78    70, 130
     2   .5    n1 =    6, 6      18, 18    36, 36    60, 60
               n2 =    14, 14    42, 42    84, 84    140, 140
         .35   n1 =    4, 8      12, 24    24, 48    40, 80
               n2 =    10, 18    30, 54    60, 108   100, 180
5    1   .5    n1 =    10, 10    30, 30    60, 60    100, 100
               n2 =    10, 10    30, 30    60, 60    100, 100
               n3 =    10, 10    30, 30    60, 60    100, 100
               n4 =    10, 10    30, 30    60, 60    100, 100
               n5 =    10, 10    30, 30    60, 60    100, 100
         .35   n1 =    7, 13     21, 39    42, 78    70, 130
               n2 =    7, 13     21, 39    42, 78    70, 130
               n3 =    7, 13     21, 39    42, 78    70, 130
               n4 =    7, 13     21, 39    42, 78    70, 130
               n5 =    7, 13     21, 39    42, 78    70, 130
     2   .5    n1 =    7, 8      22, 23    45, 45    75, 75
               n2 =    10, 10    30, 30    60, 60    100, 100
               n3 =    10, 10    30, 30    60, 60    100, 100
               n4 =    10, 10    30, 30    60, 60    100, 100
               n5 =    12, 13    37, 38    75, 75    125, 125
         .35   n1 =    4, 11     15, 30    32, 58    52, 98
               n2 =    7, 13     21, 39    42, 78    70, 130
               n3 =    7, 13     21, 39    42, 78    70, 130
               n4 =    7, 13     21, 39    42, 78    70, 130
               n5 =    9, 16     26, 49    52, 98    87, 163

 

 

 

Table 62 --- Continued

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
10   1   .5    n1 =    10, 10    30, 30    60, 60    100, 100
               n2 =    10, 10    30, 30    60, 60    100, 100
               n3 =    10, 10    30, 30    60, 60    100, 100
               n4 =    10, 10    30, 30    60, 60    100, 100
               n5 =    10, 10    30, 30    60, 60    100, 100
               n6 =    10, 10    30, 30    60, 60    100, 100
               n7 =    10, 10    30, 30    60, 60    100, 100
               n8 =    10, 10    30, 30    60, 60    100, 100
               n9 =    10, 10    30, 30    60, 60    100, 100
               n10 =   10, 10    30, 30    60, 60    100, 100
         .35   n1 =    7, 13     21, 39    42, 78    70, 130
               n2 =    7, 13     21, 39    42, 78    70, 130
               n3 =    7, 13     21, 39    42, 78    70, 130
               n4 =    7, 13     21, 39    42, 78    70, 130
               n5 =    7, 13     21, 39    42, 78    70, 130
               n6 =    7, 13     21, 39    42, 78    70, 130
               n7 =    7, 13     21, 39    42, 78    70, 130
               n8 =    7, 13     21, 39    42, 78    70, 130
               n9 =    7, 13     21, 39    42, 78    70, 130
               n10 =   7, 13     21, 39    42, 78    70, 130
     2   .5    n1 =    5, 5      15, 15    30, 30    50, 50
               n2 =    6, 6      18, 18    36, 36    60, 60
               n3 =    7, 7      21, 21    42, 42    70, 70
               n4 =    7, 7      21, 21    42, 42    70, 70
               n5 =    8, 8      24, 24    48, 48    80, 80
               n6 =    8, 8      24, 24    48, 48    80, 80
               n7 =    9, 9      27, 27    54, 54    90, 90
               n8 =    10, 10    30, 30    60, 60    100, 100
               n9 =    15, 15    45, 45    90, 90    150, 150
               n10 =   25, 25    75, 75    150, 150  250, 250
         .35   n1 =    3, 7      10, 20    21, 39    35, 65
               n2 =    4, 8      13, 23    25, 47    42, 78
               n3 =    5, 9      15, 27    29, 55    49, 91
               n4 =    5, 9      15, 27    29, 55    49, 91
               n5 =    6, 10     17, 31    34, 62    56, 104
               n6 =    6, 10     17, 31    34, 62    56, 104
               n7 =    6, 12     19, 35    38, 70    63, 117
               n8 =    7, 13     21, 39    42, 78    70, 130
               n9 =    11, 19    31, 59    63, 117   105, 195
               n10 =   17, 33    52, 98    105, 195  175, 325

Table 62 --- Continued

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
30   1   .5    n1 =    10, 10    30, 30    60, 60    100, 100
               n2 =    10, 10    30, 30    60, 60    100, 100
               n3 =    10, 10    30, 30    60, 60    100, 100
               n4 =    10, 10    30, 30    60, 60    100, 100
               n5 =    10, 10    30, 30    60, 60    100, 100
               n6 =    10, 10    30, 30    60, 60    100, 100
               n7 =    10, 10    30, 30    60, 60    100, 100
               n8 =    10, 10    30, 30    60, 60    100, 100
               n9 =    10, 10    30, 30    60, 60    100, 100
               n10 =   10, 10    30, 30    60, 60    100, 100
               n11 =   10, 10    30, 30    60, 60    100, 100
               n12 =   10, 10    30, 30    60, 60    100, 100
               n13 =   10, 10    30, 30    60, 60    100, 100
               n14 =   10, 10    30, 30    60, 60    100, 100
               n15 =   10, 10    30, 30    60, 60    100, 100
               n16 =   10, 10    30, 30    60, 60    100, 100
               n17 =   10, 10    30, 30    60, 60    100, 100
               n18 =   10, 10    30, 30    60, 60    100, 100
               n19 =   10, 10    30, 30    60, 60    100, 100
               n20 =   10, 10    30, 30    60, 60    100, 100
               n21 =   10, 10    30, 30    60, 60    100, 100
               n22 =   10, 10    30, 30    60, 60    100, 100
               n23 =   10, 10    30, 30    60, 60    100, 100
               n24 =   10, 10    30, 30    60, 60    100, 100
               n25 =   10, 10    30, 30    60, 60    100, 100
               n26 =   10, 10    30, 30    60, 60    100, 100
               n27 =   10, 10    30, 30    60, 60    100, 100
               n28 =   10, 10    30, 30    60, 60    100, 100
               n29 =   10, 10    30, 30    60, 60    100, 100
               n30 =   10, 10    30, 30    60, 60    100, 100

Table 62 --- Continued

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
30   1   .35   n1 =    7, 13     21, 39    42, 78    70, 130
               n2 =    7, 13     21, 39    42, 78    70, 130
               n3 =    7, 13     21, 39    42, 78    70, 130
               n4 =    7, 13     21, 39    42, 78    70, 130
               n5 =    7, 13     21, 39    42, 78    70, 130
               n6 =    7, 13     21, 39    42, 78    70, 130
               n7 =    7, 13     21, 39    42, 78    70, 130
               n8 =    7, 13     21, 39    42, 78    70, 130
               n9 =    7, 13     21, 39    42, 78    70, 130
               n10 =   7, 13     21, 39    42, 78    70, 130
               n11 =   7, 13     21, 39    42, 78    70, 130
               n12 =   7, 13     21, 39    42, 78    70, 130
               n13 =   7, 13     21, 39    42, 78    70, 130
               n14 =   7, 13     21, 39    42, 78    70, 130
               n15 =   7, 13     21, 39    42, 78    70, 130
               n16 =   7, 13     21, 39    42, 78    70, 130
               n17 =   7, 13     21, 39    42, 78    70, 130
               n18 =   7, 13     21, 39    42, 78    70, 130
               n19 =   7, 13     21, 39    42, 78    70, 130
               n20 =   7, 13     21, 39    42, 78    70, 130
               n21 =   7, 13     21, 39    42, 78    70, 130
               n22 =   7, 13     21, 39    42, 78    70, 130
               n23 =   7, 13     21, 39    42, 78    70, 130
               n24 =   7, 13     21, 39    42, 78    70, 130
               n25 =   7, 13     21, 39    42, 78    70, 130
               n26 =   7, 13     21, 39    42, 78    70, 130
               n27 =   7, 13     21, 39    42, 78    70, 130
               n28 =   7, 13     21, 39    42, 78    70, 130
               n29 =   7, 13     21, 39    42, 78    70, 130
               n30 =   7, 13     21, 39    42, 78    70, 130

Table 62 --- Continued

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
30   2   .5    n1 =    2, 2      6, 6      12, 12    20, 20
               n2 =    3, 3      9, 9      18, 18    30, 30
               n3 =    3, 3      9, 9      18, 18    30, 30
               n4 =    3, 3      9, 9      18, 18    30, 30
               n5 =    4, 4      12, 12    24, 24    40, 40
               n6 =    6, 6      18, 18    36, 36    60, 60
               n7 =    6, 6      18, 18    36, 36    60, 60
               n8 =    6, 6      18, 18    36, 36    60, 60
               n9 =    6, 6      18, 18    36, 36    60, 60
               n10 =   6, 6      18, 18    36, 36    60, 60
               n11 =   6, 6      18, 18    36, 36    60, 60
               n12 =   7, 7      21, 21    42, 42    70, 70
               n13 =   7, 7      21, 21    42, 42    70, 70
               n14 =   7, 7      21, 21    42, 42    70, 70
               n15 =   8, 8      24, 24    48, 48    80, 80
               n16 =   8, 8      24, 24    48, 48    80, 80
               n17 =   8, 8      24, 24    48, 48    80, 80
               n18 =   8, 8      24, 24    48, 48    80, 80
               n19 =   11, 11    33, 33    66, 66    110, 110
               n20 =   11, 11    33, 33    66, 66    110, 110
               n21 =   11, 11    33, 33    66, 66    110, 110
               n22 =   12, 12    36, 36    72, 72    120, 120
               n23 =   12, 12    36, 36    72, 72    120, 120
               n24 =   14, 14    42, 42    84, 84    140, 140
               n25 =   17, 17    51, 51    102, 102  170, 170
               n26 =   17, 17    51, 51    102, 102  170, 170
               n27 =   17, 17    51, 51    102, 102  170, 170
               n28 =   20, 20    60, 60    120, 120  200, 200
               n29 =   20, 20    60, 60    120, 120  200, 200
               n30 =   34, 34    102, 102  204, 204  340, 340

Table 62 --- Continued

Values of Sample Sizes Used in Simulation Study

k    π   φ     n =     20        60        120       200
30   2   .35   n1 =    2, 2      4, 8      8, 16     14, 26
               n2 =    2, 4      6, 12     13, 23    21, 39
               n3 =    2, 4      6, 12     13, 23    21, 39
               n4 =    2, 4      6, 12     13, 23    21, 39
               n5 =    3, 5      8, 16     17, 31    28, 52
               n6 =    4, 8      13, 23    25, 47    42, 78
               n7 =    4, 8      13, 23    25, 47    42, 78
               n8 =    4, 8      13, 23    25, 47    42, 78
               n9 =    4, 8      13, 23    25, 47    42, 78
               n10 =   4, 8      13, 23    25, 47    42, 78
               n11 =   4, 8      13, 23    25, 47    42, 78
               n12 =   5, 9      15, 27    29, 55    49, 91
               n13 =   5, 9      15, 27    29, 55    49, 91
               n14 =   5, 9      15, 27    29, 55    49, 91
               n15 =   6, 10     17, 31    33, 63    56, 104
               n16 =   6, 10     17, 31    33, 63    56, 104
               n17 =   6, 10     17, 31    33, 63    56, 104
               n18 =   6, 10     17, 31    33, 63    56, 104
               n19 =   8, 14     23, 43    46, 86    77, 143
               n20 =   8, 14     23, 43    46, 86    77, 143
               n21 =   8, 14     23, 43    46, 86    77, 143
               n22 =   8, 16     25, 47    50, 94    84, 156
               n23 =   8, 16     25, 47    50, 94    84, 156
               n24 =   10, 18    29, 55    59, 109   98, 182
               n25 =   12, 22    36, 66    71, 133   119, 221
               n26 =   12, 22    36, 66    71, 133   119, 221
               n27 =   12, 22    36, 66    71, 133   119, 221
               n28 =   14, 26    42, 78    84, 156   140, 260
               n29 =   14, 26    42, 78    84, 156   140, 260
               n30 =   24, 44    71, 133   143, 265  238, 442

Table 63

Values of δ_i's Used in the Simulation for k = 2

set = 1 2 3 4 5 6
1 0 0 0 0 0 0
2 0 .1 .25 .5 .75 1

Table 64

Values of δ_i's Used in the Simulation for k = 5

Table 65

Values of δ_i's Used in the Simulation for k = 10

Table 66

Values of δ_i's Used in the Simulation for k = 30

Table 66 --- Continued

Values of δ_i's Used in the Simulation for k = 30

set = 12 13 14 15 16 17 18 19 20 21 22
1 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 .05 .1 .15 .2 .25
8 0 0 0 0 0 0 .05 .1 .15 .2 .25
9 0 0 0 0 0 0 .05 .1 .15 .2 .25
10 0 0 0 0 0 0 .05 .1 .15 .2 .25
11 .1 .2 .25 .3 .4 .5 .05 .1 .15 .2 .25
12 .1 .2 .25 .3 .4 .5 .05 .1 .15 .2 .25
13 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
14 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
15 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
16 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
17 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
18 .1 .2 .25 .3 .4 .5 .1 .2 .3 .4 .5
19 .1 .2 .25 .3 .4 .5 .15 .3 .45 .6 .75
20 .1 .2 .25 .3 .4 .5 .15 .3 .45 .6 .75
21 .2 .4 .5 .6 .8 1 .15 .3 .45 .6 .75
22 .2 .4 .5 .6 .8 1 .15 .3 .45 .6 .75
23 .2 .4 .5 .6 .8 1 .15 .3 .45 .6 .75
24 .2 .4 .5 .6 .8 1 .15 .3 .45 .6 .75
25 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1
26 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1
27 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1
28 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1
29 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1
30 .2 .4 .5 .6 .8 1 .2 .45 .6 .8 1

 

 

 

APPENDIX C

Table 67

Means of Power of H for δ_i's with One Extreme Value
by n and k (α = 0.10)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .104(4) .123(4) .190(4) .294(4) .421(4) .2265(20)
2(60) .111(4) .167(4) .355(4) .596(4) .800(4) .4062(20)
2(120) .122(4) .235(4) .561(4) .847(4) .969(4) .5470(20)
2(200) .137(4) .319(4) .750(4) .963(4) .998(4) .6333(20)
5(20) .103(4) .120(4) .183(4) .289(4) .426(4) .2245(20)
5(60) .110(4) .163(4) .361(4) .639(4) .857(4) .4260(20)
5(120) .120(4) .230(4) .612(4) .906(4) .990(4) .5694(20)
5(200) .133(4) .321(4) .815(4) .989(4) 1.000(4) .6515(20)
10(20) .103(4) .121(4) .189(4) .308(4) .462(4) .2365(20)
10(60) .110(4) .166(4) .391(4) .681(4) .867(4) .4431(20)
10(120) .120(4) .241(4) .646(4) .908(4) .986(4) .5803(20)
10(200) .134(4) .346(4) .835(4) .985(4) 1.000(4) .6598(20)
30(20) .102(4) .116(4) .169(4) .270(4) .406(4) .2128(20)
30(60) .107(4) .151(4) .344(4) .605(4) .773(4) .3960(20)
30(120) .115(4) .213(4) .575(4) .817(4) .944(4) .5327(20)
30(200) .126(4) .302(4) .743(4) .940(4) .996(4) .6213(20)
Note: The pattern of δ_i values with one extreme value was
(0, ..., 0, δ).

 

 

 

Table 67.a

Means of Simulated Power of H for δ_i's with One Extreme Value
by n and k (α = 0.10)
k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .103(4) .131(4) .183(4) .283(4) .415(4) .2228(20)
2(60) .109(4) .163(4) .350(4) .597(4) .800(4) .4038(20)
2(120) .123(4) .233(4) .560(4) .849(4) .974(4) .5475(20)
2(200) .141(4) .320(4) .745(4) .966(4) .996(4) .6334(20)
5(20) .118(4) .130(4) .196(4) .302(4) .447(4) .2386(20)
5(60) .113(4) .166(4) .362(4) .656(4) .871(4) .4337(20)
5(120) .120(4) .235(4) .602(4) .913(4) .993(4) .5726(20)
5(200) .123(4) .319(4) .830(4) .990(4) 1.000(4) .6524(20)
10(20) .133(4) .147(4) .220(4) .336(4) .502(4) .2677(20)
10(60) .117(4) .177(4) .401(4) .700(4) .883(4) .4556(20)
10(120) .124(4) .241(4) .656(4) .913(4) .992(4) .5853(20)
10(200) .135(4) .344(4) .848(4) .987(4) 1.000(4) .6626(20)
30(20) .168(4) .173(4) .229(4) .328(4) .466(4) .2727(20)
30(60) .121(4) .166(4) .369(4) .625(4) .801(4) .4163(20)
30(120) .116(4) .226(4) .595(4) .831(4) .961(4) .5457(20)
30(200) .131(4) .312(4) .756(4) .951(4) 1.000(4) .6297(20)
Note: The pattern of δ_i values with one extreme value was
(0, ..., 0, δ).

Table 68

Means of Power of H for δ_i's with Two Extreme Values
by n and k (α = 0.10)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total

 

10(20) .105(4) .130(4) .232(4) .410(4) .510(4) .2776(20)
10(60) .114(4) .198(4) .524(4) .853(4) .976(4) .5330(20)
10(120) .129(4) .310(4) .819(4) .989(4) 1.000(4) .6495(20)
10(200) .150(4) .460(4) .960(4) 1.000(4) 1.000(4) .7141(20)

30(20) .104(4) .125(4) .216(4) .384(4) .580(4) .2819(20)
30(60) .112(4) .185(4) .495(4) .796(4) .937(4) .5049(20)
30(120) .124(4) .289(4) .766(4) .965(4) .999(4) .6288(20)
30(200) .142(4) .434(4) .915(4) .998(4) 1.000(4) .6979(20)

 

Note: The pattern of δ_i values with two extreme values was
(0, ..., 0, δ, δ).

Table 68.a

Means of Simulated Power of H for δ_i's with Two Extreme
Values by n and k (α = 0.10)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total

 

10(20) .131(4) .163(4) .254(4) .432(4) .628(4) .3015(20)
10(60) .120(4) .200(4) .535(4) .864(4) .982(4) .5398(20)
10(120) .138(4) .317(4) .827(4) .992(4) 1.000(4) .6547(20)
10(200) .156(4) .472(4) .956(4) 1.000(4) 1.000(4) .7166(20)

30(20) .164(4) .185(4) .274(4) .435(4) .639(4) .3396(20)
30(60) .129(4) .199(4) .516(4) .810(4) .959(4) .5228(20)
30(120) .133(4) .302(4) .775(4) .973(4) .999(4) .6364(20)
30(200) .138(4) .424(4) .921(4) .998(4) 1.000(4) .8957(20)

 

Note: The pattern of δ_i values with two extreme values was
(0, ..., 0, δ, δ).

Table 69

Means of Power of H for Three Equal Subsets of δ_i's
by n and k (α = 0.10)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .110 .140 .163 .191 .265 .356 .2042(24)
5(60) .130 .227 .301 .391 .590 .772 .4019(24)
5(120) .162 .362 .503 .648 .873 .971 .5864(24)
5(200) .206 .529 .713 .855 .980 .999 .7135(24)
10(20) .114 .159 .195 .240 .360 .506 .2622(24)
10(60) .144 .296 .416 .553 .805 .946 .5264(24)
10(120) .192 .508 .704 .858 .985 .999 .7076(24)
10(200) .261 .736 .909 .980 1.000 1.000 .8143(24)
30(20) .123 .206 .277 .369 .595 .805 .3957(24)
30(60) .177 .477 .684 .853 .987 1.000 .6962(24)
30(120) .271 .806 .956 .995 1.000 1.000 .8378(24)
30(200) .410 .968 .999 1.000 1.000 1.000 .8960(24)
Note: The pattern of three equal subsets of δ_i values was
(0, ..., 0, δ, ..., δ, 2δ, ..., 2δ).

Table 69.a

Means of Simulated Power of H for Three Equal Subsets of δ_i's
by n and k (α = 0.10)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .122 .160 .173 .216 .273 .372 .2193(24)
5(60) .134 .231 .306 .400 .584 .773 .4046(24)
5(120) .164 .365 .511 .649 .874 .972 .5890(24)
5(200) .206 .528 .710 .856 .981 .999 .7133(24)
10(20) .137 .192 .217 .260 .372 .524 .2837(24)
10(60) .144 .307 .425 .557 .801 .948 .5304(24)
10(120) .195 .503 .709 .853 .987 1.000 .7077(24)
10(200) .250 .740 .905 .980 1.000 1.000 .8125(24)
30(20) .190 .272 .349 .426 .625 .819 .4468(24)
30(60) .190 .487 .689 .851 .987 1.000 .7007(24)
30(120) .269 .802 .959 .992 1.000 1.000 .8370(24)
30(200) .407 .969 .998 1.000 1.000 1.000 .8957(24)
Note: The pattern of three equal subsets of δ_i values was
(0, ..., 0, δ, ..., δ, 2δ, ..., 2δ).

Table 70

Means of Power of H for Five Equal Subsets of δ_i's
by n and k (α = 0.10)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .112 .149 .216 .316 .442 .2470(20)
10(60) .136 .262 .484 .730 .902 .5029(20)
10(120) .176 .444 .789 .964 .998 .6742(20)
10(200) .234 .657 .955 1.000 1.000 .7690(20)
30(20) .117 .177 .294 .469 .669 .3451(20)
30(60) .156 .374 .723 .944 .996 .6386(20)
30(120) .223 .669 .970 1.000 1.000 .7722(20)
30(200) .323 .898 .999 1.000 1.000 .8442(20)
Note: The pattern of five equal subsets of δ_i values was
(0, ..., 0, ½δ, ..., ½δ, δ, ..., δ, 1½δ, ..., 1½δ, 2δ, ..., 2δ).

Table 70.a

Means of Simulated Power of H for Five Equal Subsets of δ_i's
by n and k (α = 0.10)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .139 .184 .245 .336 .460 .2727(20)
10(60) .143 .270 .485 .737 .902 .5073(20)
10(120) .181 .439 .791 .967 .997 .6750(20)
10(200) .233 .654 .952 .997 1.000 .7675(20)
30(20) .186 .253 .352 .515 .703 .4018(20)
30(60) .185 .388 .729 .943 .995 .6481(20)
30(120) .233 .671 .970 1.000 1.000 .7748(20)
30(200) .324 .904 1.000 1.000 1.000 .8455(20)
Note: The pattern of five equal subsets of δ_i values was (0,...,0, ½δ,...,½δ, δ,...,δ, 1½δ,...,1½δ, 2δ,...,2δ).

Table 71

Means of Power of Q for δ_i's with One Extreme Value
by k and n (α = 0.025)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .027(4) .035(4) .068(4) .126(4) .213(4) .0937(20)
2(60) .030(4) .056(4) .166(4) .362(4) .599(4) .2427(20)
2(120) .035(4) .092(4) .330(4) .668(4) .899(4) .4046(20)
2(200) .014(4) .142(4) .532(4) .885(4) .988(4) .5173(20)
5(20) .026(4) .033(4) .061(4) .119(4) .213(4) .0905(20)
5(60) .029(4) .052(4) .166(4) .404(4) .685(4) .2670(20)
5(120) .033(4) .085(4) .366(4) .768(4) .962(4) .4426(20)
5(200) .033(4) .139(4) .620(4) .956(4) .999(4) .5505(20)
10(20) .026(4) .033(4) .063(4) .132(4) .249(4) .1001(20)
10(60) .027(4) .053(4) .191(4) .472(4) .732(4) .2953(20)
10(120) .033(4) .091(4) .431(4) .799(4) .954(4) .4616(20)
10(200) .033(4) .156(4) .680(4) .950(4) .998(4) .5649(20)
30(20) .026(4) .031(4) .054(4) .109(4) .210(4) .0868(20)
30(60) .028(4) .045(4) .160(4) .414(4) .632(4) .2557(20)
30(120) .031(4) .076(4) .378(4) .687(4) .863(4) .4070(20)
30(200) .035(4) .130(4) .592(4) .857(4) .981(4) .5190(20)
Note: The pattern of δ_i values with one extreme value was (0,...,0, δ).
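Tables 69 through 78 pair approximated power values ("Means of Power") with simulated ones ("Means of Simulated Power"). For the fixed-effects model, the approximation treats Q as a noncentral chi-squared variable with k - 1 degrees of freedom. The Python sketch below computes such an approximation for the one-extreme-value pattern (0, ..., 0, δ), assuming the standard noncentrality λ = Σ w_i(δ_i - δ̄_w)², where w_i is the reciprocal of the large-sample variance of d_i evaluated at δ_i. It is an illustration under stated assumptions, not the program used for these tables. The group sizes n_e and n_c are hypothetical parameters: how the tabled n divides into groups is assumed here. The chi-squared tail probabilities are computed from series expansions rather than from a statistics library.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_iter=100_000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > tol * total and n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
    # gamma(a, x) = x^a e^{-x} * sum_{n>=0} x^n / (a (a+1) ... (a+n))
    return min(1.0, math.exp(a * math.log(x) - x - math.lgamma(a)) * total)


def chi2_sf(x, df):
    """Upper-tail probability of a central chi-squared variable with df degrees of freedom."""
    return 1.0 - reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_crit(alpha, df):
    """Upper-alpha critical value of central chi-squared(df), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x, df, lam):
    """Upper tail of noncentral chi-squared(df, lam) as a Poisson mixture of central tails."""
    half = lam / 2.0
    if half == 0.0:
        return chi2_sf(x, df)
    j_max = int(half + 12.0 * math.sqrt(half) + 50)
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * chi2_sf(x, df + 2 * j)
    return total


def one_extreme(k, delta):
    """Pattern (0, ..., 0, delta): k - 1 null effects plus one extreme value."""
    return [0.0] * (k - 1) + [delta]


def approx_power_fixed(deltas, n_e, n_c, alpha):
    """Noncentral chi-squared approximation to the power of the fixed-effects Q test."""
    # Large-sample variance of d evaluated at each population effect delta_i (assumed form).
    v = [(n_e + n_c) / (n_e * n_c) + d * d / (2.0 * (n_e + n_c)) for d in deltas]
    w = [1.0 / vi for vi in v]
    mean_w = sum(wi * d for wi, d in zip(w, deltas)) / sum(w)
    lam = sum(wi * (d - mean_w) ** 2 for wi, d in zip(w, deltas))
    df = len(deltas) - 1
    return ncx2_sf(chi2_crit(alpha, df), df, lam)


if __name__ == "__main__":
    # Hypothetical design: k = 2 studies, 10 subjects per group, delta = 1.00, alpha = .025.
    print(round(approx_power_fixed(one_extreme(2, 1.0), 10, 10, 0.025), 3))
```

Under these assumptions, k = 2, n_e = n_c = 10, δ = 1.00 and α = 0.025 give a value of about .24. The entries of Table 71 are means over several cells, so the sketch should not be expected to reproduce them exactly.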

Table 71.a

Means of Simulated Power of Q for δ_i's with One Extreme Value
by k and n (α = 0.025)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .031(4) .041(4) .068(4) .126(4) .211(4) .0953(20)
2(60) .027(4) .052(4) .168(4) .358(4) .602(4) .2412(20)
2(120) .036(4) .097(4) .328(4) .664(4) .902(4) .4054(20)
2(200) .044(4) .142(4) .525(4) .885(4) .986(4) .5162(20)
5(20) .041(4) .044(4) .079(4) .141(4) .241(4) .1091(20)
5(60) .032(4) .056(4) .175(4) .419(4) .713(4) .2790(20)
5(120) .035(4) .084(4) .369(4) .780(4) .971(4) .4476(20)
5(200) .035(4) .138(4) .633(4) .963(4) 1.000(4) .5537(20)
10(20) .047(4) .055(4) .094(4) .162(4) .291(4) .1298(20)
10(60) .034(4) .061(4) .203(4) .498(4) .763(4) .3117(20)
10(120) .031(4) .097(4) .440(4) .809(4) .970(4) .4692(20)
10(200) .039(4) .160(4) .695(4) .961(4) 1.000(4) .5706(20)
30(20) .072(4) .071(4) .103(4) .168(4) .284(4) .1395(20)
30(60) .035(4) .053(4) .182(4) .451(4) .671(4) .2783(20)
30(120) .033(4) .087(4) .393(4) .709(4) .896(4) .4235(20)
30(200) .039(4) .141(4) .607(4) .880(4) .989(4) .5308(20)
Note: The pattern of δ_i values with one extreme value was (0,...,0, δ).
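The simulated tables, such as Table 71.a, report how often Q rejects over replicated meta-analyses. A minimal Monte Carlo sketch of that idea follows. It assumes unit-variance normal data in each group, no small-sample bias correction of d, and weights based on the estimated variance of each d_i. The function names, seeds, and group sizes are hypothetical, and the replication scheme behind the tables may differ.

```python
import math
import random


def chi2_sf(x, df):
    """Upper tail of central chi-squared(df) via the power series for P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-14 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def chi2_crit(alpha, df):
    """Upper-alpha critical value of central chi-squared(df), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_sf(mid, df) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def sample_d(rng, delta, n_e, n_c):
    """Simulate one two-group study with unit-variance normal data; return d (pooled SD)."""
    xe = [rng.gauss(delta, 1.0) for _ in range(n_e)]
    xc = [rng.gauss(0.0, 1.0) for _ in range(n_c)]
    me, mc = sum(xe) / n_e, sum(xc) / n_c
    ss = sum((x - me) ** 2 for x in xe) + sum((x - mc) ** 2 for x in xc)
    return (me - mc) / math.sqrt(ss / (n_e + n_c - 2))


def q_statistic(d, n_e, n_c):
    """Fixed-effects homogeneity statistic Q with weights 1 / v_hat(d_i)."""
    w = [1.0 / ((n_e + n_c) / (n_e * n_c) + di * di / (2.0 * (n_e + n_c))) for di in d]
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    return sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))


def simulate_power_fixed(deltas, n_e, n_c, alpha, reps=2000, seed=1):
    """Monte Carlo rejection rate of Q for fixed population effects deltas."""
    rng = random.Random(seed)
    crit = chi2_crit(alpha, len(deltas) - 1)
    hits = 0
    for _ in range(reps):
        d = [sample_d(rng, delta, n_e, n_c) for delta in deltas]
        if q_statistic(d, n_e, n_c) > crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical null case: five studies of 10 + 10 subjects, nominal alpha = .10.
    print(simulate_power_fixed([0.0] * 5, 10, 10, 0.10))
```

Running simulate_power_fixed([0.0] * 5, 10, 10, 0.10) estimates the type I error rate for five small studies; comparing it with the nominal .10 shows whether the test is liberal at that sample size.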

Table 72

Means of Power of Q for δ_i's with Two Extreme Values
by k and n (α = 0.025)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total

10(20) .027(4) .037(4) .085(4) .201(4) .297(4) .1294(20)
10(60) .030(4) .068(4) .294(4) .687(4) .924(4) .4006(20)
10(120) .036(4) .131(4) .635(4) .960(4) .999(4) .5521(20)
10(200) .045(4) .240(4) .886(4) .999(4) 1.000(4) .6339(20)

30(20) .026(4) .035(4) .076(4) .185(4) .367(4) .1379(20)
30(60) .029(4) .061(4) .230(4) .637(4) .852(4) .3717(20)
30(120) .034(4) .119(4) .596(4) .904(4) .993(4) .5293(20)
30(200) .041(4) .225(4) .814(4) .992(4) 1.000(4) .6145(20)

 

Note: The pattern of δ_i values with two extreme values was (0,...,0, δ, δ).

Table 72.a

Means of Simulated Power of Q for δ_i's with Two Extreme Values
by k and n (α = 0.025)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total

10(20) .047(4) .064(4) .107(4) .237(4) .336(4) .1581(20)
10(60) .036(4) .075(4) .312(4) .707(4) .940(4) .4141(20)
10(120) .039(4) .130(4) .643(4) .967(4) .999(4) .5556(20)
10(200) .046(4) .249(4) .883(4) .999(4) 1.000(4) .6354(20)

30(20) .068(4) .081(4) .130(4) .245(4) .448(4) .1943(20)
30(60) .037(4) .072(4) .310(4) .662(4) .886(4) .3932(20)
30(120) .037(4) .133(4) .603(4) .920(4) .997(4) .5381(20)
30(200) .040(4) .224(4) .825(4) .993(4) 1.000(4) .6164(20)

 

Note: The pattern of δ_i values with two extreme values was (0,...,0, δ, δ).

Table 73

Means of Power of Q for Three Equal Subsets of δ_i's
by k and n (α = 0.025)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .029 .041 .052 .065 .105 .162 .0757(24)
5(60) .037 .084 .127 .186 .354 .560 .2246(24)
5(120) .051 .166 .274 .413 .709 .906 .4198(24)
5(200) .072 .297 .486 .681 .931 .993 .5766(24)
10(20) .030 .049 .066 .089 .162 .274 .1117(24)
10(60) .042 .121 .202 .315 .603 .846 .3548(24)
10(120) .064 .276 .472 .682 .944 .997 .5725(24)
10(200) .101 .511 .771 .930 .998 1.000 .7185(24)
30(20) .034 .070 .108 .165 .352 .600 .2213(24)
30(60) .056 .246 .446 .673 .951 .998 .5615(24)
30(120) .104 .602 .868 .976 1.000 1.000 .7583(24)
30(200) .194 .898 .992 1.000 1.000 1.000 .8472(24)
Note: The pattern of three equal subsets of δ_i values was (0,...,0, δ,...,δ, 2δ,...,2δ).

Table 73.a

Means of Simulated Power of Q for Three Equal Subsets of δ_i's
by k and n (α = 0.025)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .044 .059 .066 .085 .123 .192 .0947(24)
5(60) .041 .089 .130 .193 .355 .561 .2282(24)
5(120) .055 .171 .274 .421 .718 .910 .4248(24)
5(200) .071 .294 .485 .679 .939 .993 .5770(24)
10(20) .048 .073 .093 .108 .188 .308 .1362(24)
10(60) .044 .129 .211 .325 .604 .852 .3606(24)
10(120) .070 .281 .481 .682 .946 .996 .5761(24)
10(200) .095 .513 .771 .936 .998 1.000 .7187(24)
30(20) .084 .130 .172 .241 .407 .644 .2797(24)
30(60) .065 .260 .465 .679 .948 .997 .5688(24)
30(120) .111 .595 .869 .973 1.000 1.000 .7580(24)
30(200) .196 .901 .993 1.000 1.000 1.000 .8482(24)
Note: The pattern of three equal subsets of δ_i values was (0,...,0, δ,...,δ, 2δ,...,2δ).

Table 74

Means of Power of Q for Five Equal Subsets of δ_i's
by k and n (α = 0.025)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .029 .045 .077 .134 .222 .1012(20)
10(60) .039 .101 .225 .504 .758 .3316(20)
10(120) .057 .223 .581 .888 .987 .5472(20)
10(200) .086 .420 .867 .993 1.000 .6731(20)
30(20) .031 .056 .117 .240 .428 .1746(20)
30(60) .047 .169 .491 .842 .979 .5057(20)
30(120) .078 .429 .902 .997 1.000 .6813(20)
30(200) .135 .750 .996 1.000 1.000 .7764(20)

 

Note: The pattern of five equal subsets of δ_i values was (0,...,0, ½δ,...,½δ, δ,...,δ, 1½δ,...,1½δ, 2δ,...,2δ).

Table 74.a

Means of Simulated Power of Q for Five Equal Subsets of δ_i's
by k and n (α = 0.025)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .049 .068 .104 .163 .261 .1289(20)
10(60) .043 .110 .257 .517 .767 .3387(20)
10(120) .063 .232 .590 .887 .988 .5519(20)
10(200) .086 .424 .863 .994 1.000 .6733(20)
30(20) .082 .113 .181 .306 .484 .2332(20)
30(60) .057 .178 .502 .846 .977 .5119(20)
30(120) .083 .437 .903 .998 1.000 .6841(20)
30(200) .140 .757 .997 1.000 1.000 .7787(20)
Note: The pattern of five equal subsets of δ_i values was (0,...,0, ½δ,...,½δ, δ,...,δ, 1½δ,...,1½δ, 2δ,...,2δ).

Table 75

Means of Power of Q for δ_i's with One Extreme Value
by k and n (α = 0.01)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .011(4) .015(4) .033(4) .070(4) .129(4) .0517(20)
2(60) .012(4) .027(4) .096(4) .247(4) .467(4) .1699(20)
2(120) .015(4) .048(4) .220(4) .541(4) .828(4) .3303(20)
2(200) .019(4) .080(4) .401(4) .808(4) .974(4) .4563(20)
5(20) .011(4) .014(4) .029(4) .064(4) .129(4) .0495(20)
5(60) .012(4) .024(4) .096(4) .284(4) .563(4) .1956(20)
5(120) .014(4) .043(4) .251(4) .659(4) .927(4) .3788(20)
5(200) .017(4) .077(4) .493(4) .918(4) .997(4) .5005(20)
10(20) .011(4) .014(4) .030(4) .074(4) .160(4) .0577(20)
10(60) .012(4) .024(4) .115(4) .359(4) .639(4) .2299(20)
10(120) .014(4) .047(4) .319(4) .719(4) .922(4) .4041(20)
10(200) .017(4) .091(4) .579(4) .916(4) .995(4) .5196(20)
30(20) .010(4) .013(4) .025(4) .059(4) .132(4) .0478(20)
30(60) .011(4) .020(4) .094(4) .317(4) .555(4) .1994(20)
30(120) .013(4) .038(4) .281(4) .616(4) .804(4) .3502(20)
30(200) .015(4) .073(4) .509(4) .797(4) .964(4) .4713(20)
Note: The pattern of δ_i values with one extreme value was (0,...,0, δ).

Table 75.a

Means of Simulated Power of Q for δ_i's with One Extreme Value
by k and n (α = 0.01)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
2(20) .014(4) .020(4) .036(4) .074(4) .133(4) .0553(20)
2(60) .011(4) .025(4) .101(4) .242(4) .474(4) .1704(20)
2(120) .016(4) .051(4) .217(4) .537(4) .835(4) .3313(20)
2(200) .020(4) .077(4) .391(4) .807(4) .974(4) .4537(20)
5(20) .021(4) .022(4) .043(4) .082(4) .162(4) .0667(20)
5(60) .016(4) .026(4) .106(4) .301(4) .606(4) .2111(20)
5(120) .017(4) .047(4) .253(4) .673(4) .943(4) .3865(20)
5(200) .017(4) .076(4) .516(4) .934(4) .999(4) .5082(20)
10(20) .027(4) .027(4) .052(4) .102(4) .205(4) .0825(20)
10(60) .013(4) .030(4) .129(4) .393(4) .679(4) .2487(20)
10(120) .012(4) .051(4) .331(4) .740(4) .940(4) .4148(20)
10(200) .018(4) .093(4) .595(4) .928(4) .997(4) .5261(20)
30(20) .044(4) .043(4) .064(4) .109(4) .200(4) .0919(20)
30(60) .016(4) .025(4) .115(4) .356(4) .598(4) .2219(20)
30(120) .014(4) .046(4) .291(4) .634(4) .844(4) .3658(20)
30(200) .017(4) .080(4) .525(4) .825(4) .978(4) .4849(20)
Note: The pattern of δ_i values with one extreme value was (0,...,0, δ).

Table 76

Means of Power of Q for δ_i's with Two Extreme Values
by k and n (α = 0.01)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
10(20) .011(4) .016(4) .043(4) .121(4) .202(4) .0785(20)
10(60) .013(4) .033(4) .193(4) .572(4) .873(4) .3366(20)
10(120) .016(4) .072(4) .514(4) .927(4) .997(4) .5053(20)
10(200) .020(4) .150(4) .819(4) .996(4) 1.000(4) .5971(20)
30(20) .011(4) .015(4) .038(4) .111(4) .263(4) .0875(20)
30(60) .012(4) .029(4) .186(4) .542(4) .788(4) .3115(20)
30(120) .014(4) .065(4) .497(4) .852(4) .985(4) .4826(20)
30(200) .018(4) .142(4) .744(4) .983(4) 1.000(4) .5773(20)
Note: The pattern of δ_i values with two extreme values was (0,...,0, δ, δ).

Table 76.a

Means of Simulated Power of Q for δ_i's with Two Extreme Values
by k and n (α = 0.01)

k(n) δ = 0.10 0.25 0.50 0.75 1.00 Total
10(20) .025(4) .034(4) .063(4) .156(4) .243(4) .1041(20)
10(60) .015(4) .037(4) .206(4) .595(4) .899(4) .3505(20)
10(120) .019(4) .078(4) .528(4) .936(4) .998(4) .5119(20)
10(200) .022(4) .161(4) .819(4) .997(4) 1.000(4) .5997(20)
30(20) .042(4) .048(4) .083(4) .166(4) .353(4) .1383(20)
30(60) .017(4) .036(4) .212(4) .579(4) .835(4) .3358(20)
30(120) .016(4) .072(4) .507(4) .876(4) .992(4) .4926(20)
30(200) .021(4) .141(4) .751(4) .995(4) 1.000(4) .5796(20)
Note: The pattern of δ_i values with two extreme values was (0,...,0, δ, δ).

Table 77

Means of Power of Q for Three Equal Subsets of δ_i's
by k and n (α = 0.01)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .012 .018 .024 .032 .055 .093 .0390(24)
5(60) .016 .042 .069 .110 .241 .430 .1514(24)
5(120) .024 .096 .175 .291 .589 .840 .3359(24)
5(200) .036 .193 .358 .558 .877 .985 .5011(24)
10(20) .013 .022 .032 .045 .092 .175 .0631(24)
10(60) .019 .065 .120 .208 .474 .757 .2737(24)
10(120) .031 .176 .345 .559 .898 .991 .5000(24)
10(200) .052 .382 .663 .876 .996 1.000 .6613(24)
30(20) .014 .033 .056 .094 .237 .471 .1508(24)
30(60) .026 .152 .320 .549 .910 .995 .4917(24)
30(120) .054 .472 .783 .953 1.000 1.000 .7110(24)
30(200) .114 .830 .982 .999 1.000 1.000 .8208(24)
Note: The pattern of three equal subsets of δ_i values was (0,...,0, δ,...,δ, 2δ,...,2δ).

Table 77.a

Means of Simulated Power of Q for Three Equal Subsets of δ_i's
by k and n (α = 0.01)

k(n) δ = 0.10 0.20 0.25 0.30 0.40 0.50 Total
5(20) .023 .031 .034 .045 .073 .123 .0547(24)
5(60) .020 .043 .069 .120 .242 .434 .1554(24)
5(120) .028 .100 .175 .296 .598 .845 .3404(24)
5(200) .035 .192 .360 .561 .883 .983 .5020(24)
10(20) .025 .038 .052 .066 .120 .215 .0857(24)
10(60) .022 .070 .129 .220 .480 .766 .2811(24)
10(120) .038 .186 .349 .559 .901 .992 .5043(24)
10(200) .050 .381 .665 .883 .995 1.000 .6624(24)
30(20) .053 .081 .112 .162 .301 .528 .2060(24)
30(60) .030 .166 .336 .561 .912 .994 .4998(24)
30(120) .060 .467 .791 .949 1.000 1.000 .7110(24)
30(200) .120 .832 .984 .999 1.000 1.000 .8225(24)
Note: The pattern of three equal subsets of δ_i values was (0,...,0, δ,...,δ, 2δ,...,2δ).

Table 78

Means of Power of Q for Five Equal Subsets of δ_i's
by k and n (α = 0.01)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .012 .020 .038 .073 .135 .0556(20)
10(60) .017 .053 .160 .375 .646 .2502(20)
10(120) .027 .136 .451 .815 .972 .4800(20)
10(200) .043 .297 .785 .984 1.000 .6218(20)
30(20) .013 .026 .062 .147 .304 .1104(20)
30(60) .021 .096 .362 .752 .958 .4380(20)
30(120) .038 .304 .836 .994 1.000 .6343(20)
30(200) .074 .638 .990 1.000 1.000 .7404(20)
Note: The pattern of five equal subsets of δ_i values was (0,...,0, ½δ,...,½δ, δ,...,δ, 1½δ,...,1½δ, 2δ,...,2δ).

Table 78.a

Means of Simulated Power of Q for Five Equal Subsets of δ_i's
by k and n (α = 0.01)

k(n) δ = 0.10 0.20 0.30 0.40 0.50 Total
10(20) .023 .037 .060 .104 .169 .0786(20)
10(60) .021 .057 .165 .386 .660 .2575(20)
10(120) .030 .144 .458 .817 .976 .4848(20)
10(200) .042 .295 .785 .986 1.000 .6215(20)
30(20) .051 .067 .119 .214 .372 .1645(20)
30(60) .027 .103 .377 .760 .958 .4449(20)
30(120) .042 .313 .842 .994 1.000 .6383(20)
30(200) .075 .642 .991 1.000 1.000 .7416(20)
Note: The pattern of five equal subsets of δ_i values was (0,...,0, ½δ,...,½δ, δ,...,δ, 1½δ,...,1½δ, 2δ,...,2δ).

Table 79

Means of Power of Q at α = 0.10 for μ_δ = 0
for the Random-effects Model

σ²_δ n = 20s 60s 120s 200s Total
.00 0.10(16) 0.10(16) 0.10(16) 0.10(16) 0.10( 64)
.00-.02 0.12(16) 0.21(20) 0.33(20) 0.45(20) 0.29( 76)
.02-.04 0.17(16) 0.39(20) 0.58(20) 0.70(20) 0.47( 76)
.04-.06 0.22(16) 0.52(20) 0.62(16) 0.73(16) 0.52( 68)
.06-.08 0.27(16) 0.61(20) 0.59(12) 0.71(12) 0.53( 60)
.08-.10 0.33(32) 0.55(28) 0.66(24) 0.76(24) 0.56(108)
.15 0.44(16) 0.61(12) 0.75(12) 0.82(12) 0.64( 52)
.20 0.51(16) 0.67(12) 0.79(12) 0.85(12) 0.69( 52)
.25 0.56(16) 0.71(12) 0.81(12) 0.80( 8) 0.70( 48)
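Tables 79 to 81 give power of Q under the random-effects model, where the population effects vary around μ_δ with variance σ²_δ. As a simulation-based cross-check (not the mixture-of-noncentral-chi-squared approximation described for this model), the sketch below draws δ_i ~ N(μ_δ, σ²_δ). It then draws each d_i from its large-sample normal approximation N(δ_i, v_i) rather than simulating raw scores, and counts how often Q exceeds the central chi-squared critical value. The normality of δ_i, the group sizes n_e and n_c, and the function names are assumptions for illustration.

```python
import math
import random


def chi2_sf(x, df):
    """Upper tail of central chi-squared(df) via the power series for P(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-14 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def chi2_crit(alpha, df):
    """Upper-alpha critical value of central chi-squared(df), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_sf(mid, df) > alpha else (lo, mid)
    return 0.5 * (lo + hi)


def simulate_power_random(k, n_e, n_c, mu, tau2, alpha, reps=2000, seed=3):
    """Monte Carlo rejection rate of Q when delta_i ~ N(mu, tau2) (random-effects model)."""
    rng = random.Random(seed)
    crit = chi2_crit(alpha, k - 1)
    base = (n_e + n_c) / (n_e * n_c)
    hits = 0
    for _ in range(reps):
        d = []
        for _ in range(k):
            delta_i = rng.gauss(mu, math.sqrt(tau2))  # study-level population effect
            v_i = base + delta_i ** 2 / (2.0 * (n_e + n_c))
            d.append(rng.gauss(delta_i, math.sqrt(v_i)))  # large-sample normal draw of d_i
        # Weights use the estimated variance of each observed d_i.
        w = [1.0 / (base + di ** 2 / (2.0 * (n_e + n_c))) for di in d]
        d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)
        q = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
        hits += q > crit
    return hits / reps


if __name__ == "__main__":
    # Hypothetical design: k = 10 studies, 100 + 100 subjects, mu = 0, tau2 = 0.10.
    print(simulate_power_random(10, 100, 100, 0.0, 0.10, 0.10))
```

With tau2 = 0 the rejection rate should sit near the nominal α, which corresponds to the ".00" row of Tables 79 to 81; increasing tau2 traces out the other rows.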

 

Table 80

Means of Power of Q at α = 0.025 for μ_δ = 0
for the Random-effects Model

σ²_δ n = 20s 60s 120s 200s Total
.00 .025(16) .025(16) .025(16) .025(16) .025( 64)
.00-.02 .033(16) .080(20) .169(20) .288(20) .148( 76)
.02-.04 .054(16) .213(20) .425(20) .579(20) .332( 76)
.04-.06 .081(16) .351(20) .478(16) .615(16) .379( 68)
.06-.08 .114(16) .459(20) .433(12) .587(12) .387( 60)
.08-.10 .169(32) .394(28) .526(24) .660(24) .413(108)
.15 .260(16) .452(12) .646(12) .746(12) .506( 52)
.20 .340(16) .535(12) .706(12) .786(12) .572( 52)
.25 .406(16) .598(12) .741(12) .719( 8) .590( 48)

 

Table 81

Means of Power of Q at α = 0.01 for μ_δ = 0
for the Random-effects Model

σ²_δ n = 20s 60s 120s 200s Total
.00 .010(16) .010(16) .010(16) .010(16) .010( 64)
.00-.02 .014(16) .042(20) .110(20) .217(20) .100( 76)
.02-.04 .025(16) .142(20) .346(20) .516(20) .269( 76)
.04-.06 .042(16) .267(20) .404(16) .554(16) .314( 68)
.06-.08 .065(16) .380(20) .351(12) .518(12) .318( 60)
.08-.10 .100(32) .315(28) .450(24) .603(24) .345(108)
.15 .184(16) .370(12) .587(12) .704(12) .440( 52)
.20 .260(16) .460(12) .657(12) .752(12) .511( 52)
.25 .326(16) .531(12) .699(12) .676( 8) .529( 48)

 

APPENDIX D

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 2 (α = 0.050). Horizontal axis: one extreme population effect value; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 5 (α = 0.050). Horizontal axis: one extreme population effect value; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 10 (α = 0.050). Horizontal axis: one extreme population effect value; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 30 (α = 0.050). Horizontal axis: one extreme population effect value; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 10 (α = 0.050). Horizontal axis: two extreme population effect values; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 30 (α = 0.050). Horizontal axis: two extreme population effect values; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 5 (α = 0.050). Horizontal axis: population effect values in three equal subsets; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 10 (α = 0.050). Horizontal axis: population effect values in three equal subsets; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 30 (α = 0.050). Horizontal axis: population effect values in three equal subsets; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 10 (α = 0.050). Horizontal axis: population effect values in five equal subsets; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the fixed-effects model when K = 30 (α = 0.050). Horizontal axis: population effect values in five equal subsets; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

[Figure: Power curves of the homogeneity test for the random-effects model (α = 0.050). Horizontal axis: variances of population effect sizes; vertical axis: power of homogeneity test. Note: Dashed lines represent the simulated power curves.]

APPENDIX E

SYNTHESIZED STUDIES

Anderson, R. D., Kahl, S. R., Glass, G. V., & Smith, M. L.
(1983). Science education: A meta-analysis of major
questions. Journal of Research in Science Teaching, 20,
379-385.

Bucknam, R. B., & Brand, B. G. (1983). EBCE really works: A
meta-analysis on experience based career education.
Educational Leadership, 40(6), 66-71.

Cohen, P. A. (1981). Student ratings of instruction and
student achievement: A meta-analysis of multisection
validity studies. Review of Educational Research, 51,
281-309.

Fleming, M. L., & Malone, M. R. (1983). The relationship of
student characteristics and student performance in
science as viewed by meta-analysis research. Journal of
Research in Science Teaching, 20, 481-495.

Horak, V. M. (1981). A meta-analysis of research findings on
individualized instruction in mathematics. Journal of
Educational Research, 74, 249-253.

Johnson, D. W., Johnson, R. T., & Maruyama, G. (1983).
Interdependence and interpersonal attraction among
heterogeneous and homogeneous individuals: A theoretical
formulation and a meta-analysis of the research. Review
of Educational Research, 53, 5-54.

 


Kavale, K. (1980). Auditory-visual integration and its
relationship to reading achievement: A meta-analysis.
Perceptual and Motor Skills, 51, 947-955.

Kavale, K. (1981). Functions of the Illinois Test of
Psycholinguistic Abilities (ITPA): Are they trainable?
Exceptional Children, 47, 496-510.

Kavale, K., & Mattson, P. D. (1983). One jumped off the
balance beam: Meta-analysis of perceptual motor
training. Journal of Learning Disabilities, 16, 165-173.

Kulik, C. C., Kulik, J. A., & Cohen, P. A. (1979). A
meta-analysis of outcome studies of Keller's personalized
system of instruction. American Psychologist, 34,
307-318.

Parker, K. (1983). A meta-analysis of the reliability and
validity of the Rorschach. Journal of Personality
Assessment, 47, 227-231.

Shapiro, D. A., & Shapiro, D. (1983). Comparative therapy
outcome research: Methodological implications of
meta-analysis. Journal of Consulting and Clinical
Psychology, 51, 42-53.

Smith, M. L., & Glass, G. V. (1980). Meta-analysis of
research on class size and its relationship to attitudes
and instruction. American Educational Research Journal,
17, 419-433.


Steinkamp, M. W., & Maehr, M. L. (1983). Affect, ability, and
science achievement: A quantitative synthesis of
correlational research. Review of Educational Research,
53, 369-396.

Steinkamp, M. W., & Maehr, M. L. (1984). Gender differences in
motivational orientations toward achievement in school
science: A quantitative synthesis. American Educational
Research Journal, 21, 39-59.

Sweitzer, G. L., & Anderson, R. D. (1983). A meta-analysis of
research on science teacher-education practices
associated with inquiry strategy. Journal of Research in
Science Teaching, 20, 452-466.

White, K. R. (1982). The relation between socioeconomic
status and academic achievement. Psychological
Bulletin, 91, 461-481.

Whitley, B. E. (1983). Sex role orientation and self-esteem:
A critical meta-analytic review. Journal of Personality
and Social Psychology, 44, 765-778.

Willett, J. B., Yamashita, J. J. M., & Anderson, R. D. (1983).
A meta-analysis of instructional systems applied in
science teaching. Journal of Research in Science
Teaching, 20, 405-417.

Willson, V. L. (1983). A meta-analysis of the relationship
between science achievement and science attitude:
Kindergarten through college. Journal of Research in
Science Teaching, 20, 839-850.

Yeany, R. H., & Miller, P. A. (1983). Effects of diagnostic
remedial instruction on science learning: A
meta-analysis. Journal of Research in Science Teaching, 20,
19-26.

BIBLIOGRAPHY

Alexander, R. A., Scozzaro, M. J., & Borodkin, L. J. (1989).
Statistical and empirical examination of the chi-square
test for homogeneity of correlations in meta-analysis.
Psychological Bulletin, 106, 329-331.

Anscombe, F. J. (1963). Sequential medical trials. Journal
of the American Statistical Association, 58, 365-383.

Armitage, P. (1960). Sequential medical trials. Oxford:
Blackwell Scientific Publications.

Bangert-Drowns, R. L. (1986). Review of developments in
meta-analytic method. Psychological Bulletin, 99, 388-399.

Becker, B. J. (1985). Applying tests of combined
significance hypotheses and power considerations.
(Unpublished Doctoral dissertation, University of
Chicago, 1985).

Becker, B. J. (1989). Gender and science achievement: A
reanalysis of studies from two meta-analyses. Journal of
Research in Science Teaching, 26, 141-169.

Brewer, J. K. (1972). On the power of statistical tests in
the American Educational Research Journal. American
Educational Research Journal, 9, 391-401.


Chang, L. & Becker, B. J. (1987). A comparison of three
integrative review methods: Different methods, different
findings? Paper presented at the annual meeting of the
American Educational Research Association at San
Francisco.

Cohen, J. (1962). The statistical power of abnormal-social
psychological research: A review. Journal of Abnormal and
Social Psychology, 65, 145-153.

Cohen, J. (1969). Statistical power analysis for the
behavioral sciences. New York: Academic Press.

Cohen, J. (1973). Statistical power analysis and research
results. American Educational Research Journal, 10,
225-230.

Cohen, J. (1977). Statistical power analysis for the
behavioral sciences (Rev. ed.). New York: Academic
Press.

Cooper, H. M. (1982). Scientific guidelines for conducting
integrative research reviews. Review of Educational
Research, 52, 291-302.

Cronbach, L. J. (1980). Toward reform of program evaluation.
San Francisco: Jossey-Bass.

Daly, J. A., & Hexamer, A. (1983). Statistical power in
research in English education. Research in the Teaching
of English, 17, 157-164.


Fabian, V. (1991). On the problem of interactions in the
analysis of variance. Journal of the American Statistical
Association, 86, 362-367.

Fisher, R. A. (1932). Statistical methods for research
workers (4th ed.). London: Oliver and Boyd.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of
research. Educational Researcher, 5, 3-8.

Hedges, L. V. (1981). Distribution theory for Glass's
estimator of effect size and related estimators. Journal
of Educational Statistics, 6, 107-128.

Hedges, L. V. (1982). Fitting categorical models to effect
size data. Journal of Educational Statistics, 7,
245-270.

Hedges, L. V. (1983). A random effects model for effect sizes.
Psychological Bulletin, 93, 388-395.

Hedges, L. V. (1986). Estimating effect size from vote counts
or box score data. Paper presented at the annual meeting
of the American Educational Research Association at
Chicago.

Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in
research synthesis. Psychological Bulletin, 88, 359-369.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for
meta-analysis. Orlando: Academic Press, Inc.

Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982).
Meta-analysis: Cumulating research findings across studies.
Beverly Hills, CA: Sage.

Lewis, R. J. (1990). Sequential clinical trials in emergency
medicine. Annals of Emergency Medicine, 19, 1047.

Massey, F. J., Jr. (1956). The Kolmogorov-Smirnov test for
goodness of fit. Journal of the American Statistical
Association, 46, 68-78.

Overall, J. E. (1969). Classical statistical hypothesis
testing within the context of Bayesian theory.
Psychological Bulletin, 71, 285-292.

Pigott, T. D. (1986). An analogue to analysis of variance for
correlations. Paper presented at the annual meeting of
the American Educational Research Association at Chicago.

Pillemer, D. B., & Light, R. J. (1980). Synthesizing
outcomes: How to use research evidence from many
studies. Harvard Educational Review, 50, 176-195.

Rosenthal, R. (1978). Combining results of independent
studies. Psychological Bulletin, 85, 185-193.

Rosenthal, R., & Rubin, D. B. (1979). Comparing significance
levels of independent studies. Psychological Bulletin,
86, 1165-1168.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of
statistical power have an effect on the power of studies?
Psychological Bulletin, 105, 309-316.

Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods.
Iowa: The Iowa State University Press.


Sobel, M., & Wald, A. (1949). A sequential decision procedure
for choosing one of three hypotheses concerning the
unknown mean of a normal distribution. The Annals of
Mathematical Statistics, 20, 502-522.

Steinkamp, M. W., & Maehr, M. L. (1983). Affect, ability, and
science achievement: A quantitative synthesis of
correlational research. Review of Educational Research,
53, 369-396.

Steinkamp, M. W., & Maehr, M. L. (1984). Gender differences
in motivational orientations toward achievement in school
science: A quantitative synthesis. American Educational
Research Journal, 21, 39-59.

Tippett, L. H. C. (1931). The methods of statistics (1st
ed.). London: Williams and Norgate, Ltd.

Tversky, A., & Kahneman, D. (1971). Belief in the law of small
numbers. Psychological Bulletin, 76, 105-110.

Wald, A. (1952). Sequential analysis. New York: John Wiley
& Sons, Inc.

Whitehead, J. (1983). The design and analysis of sequential
medical trials. New York: John Wiley & Sons, Inc.

Whitehead, J. (1987). Supplementary analysis at the
conclusion of a sequential clinical trial. Biometrics,
42, 461-471.