PLACE IN REI'URN Box to remove this checkout from your record.
TO AVOID FINE return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

'JULcm 24008 2

 

NOV 0 5zgn§

 

111'; y;

 

 

 

 

 

 

 

 

 

 

 

 

moo “Wis-9.14

 

 

This is to certify that the
dissertation entitled
Large and Small Sample Properties of

Maximim Likelihood Estimates for the
Hierarchical Linear Model

presented by

Dina Bassiri

has been accepted towards fulﬁllment

 

 

of the requirements for _
Ph.D. . Counseling, Ed. Psychology
degree in Wal Educat ion ,
r I Major professor
Date 1 1-4-88

 

M I u u
3 U u an Airman" Amen/Equal Opportunity Institution 042771

 

 

 

LARGE AND SMALL SAMPLE PROPERTIES OF
MAXIMUM LIKELIHOOD ESTIMATES FOR THE
HIERARCHICAL LINEAR MODEL

13}!

Dina Bassiri

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational

Psychology and Special Education

1988

‘ ."¢\.7 7V7}?

 

 

 

ABSTRACT
LARGE AND SMALL SAMPLE PROPERTIES OF

MAXIMUM LIKELIHOOD ESTIMATES FOR THE
HIERARCHICAL LINEAR MODEL

by

Dina Bassiri

The multilevel character of educational data have
implications of a general methodological nature. Interest in
these methodological problems has recently been stimulated
by the development of the E.M. algorithmic approach to
variance component models. The EM algorithm produces
maximum likelihood estimates for variance components with
known large sample properties. That is, the estimates are
consistent, asymptotically efficient with known large sample
normal distributions. However, at present little is known
about the small sample behavior of the parameter estimates.

The primary purpose of this Monte Carlo investigation is
to understand the properties of maximum likelihood estimates
in small and moderate samples using a two stage hierarchical
linear model with standardized normal predictors at both
levels of hierarchy (i.e., using a standardized two-stage
hierarchical linear model). Specifically, this research will
investigate the effects of variance estimation via the EM

algorithm on properties of parameter estimates at the

 

 

Dina Bassiri
second stage of the hierarchy, that is, the macro or fixed
effects, yoo, Y01 , Ylo , and Y11 . These are the regression
coefficients in the equations for the mean and slope at the
second stage of the hierarchy. A secondary purpose is to
evaluate the robustness and power of asymptotic z-tests of
the macro parameters under various conditions determined by
the number of groups, the group size, and the effect size.

The following are the major conclusions drawn from the
investigation. (1) Macro estimators are unbiased,
consistent, and asymptotically efficient with asymptotically
known normal distribution. (2) Error estimates of macro
parameters are considerably affected by the number of
groups, but not so much by the group size. (3) Precision of
macro parameters is directly proportional to the number of
groups and inversely proportional to intraclass correlation
coefficient. Increasing group size increases precision as
well. Yet, the effect of one is not proportional to that of
the other. (4) Micro parameter variance estimator for the
slope and intercept of the first stage regression model are
biased but consistent and asymptotically efficient. Increa-
Sing the number of groups has a determinative effect on the
Parameter variance in slopes, but parameter variance in
intercepts is more influenced by group size. (5) Within
group error variance estimates ( o2 ) are unbiased, consis-
tent, asymptotically efficient and are considerably more

affected by group size than by the number of groups.

Dina Bassiri
(6) The precisions of variance components estimates, in
contrast to that of the macro parameters, are directly
related to intraclass correlation coefficient. (7) Depar-
tures of empirical type I error rates from nominal alpha for
tests of macro parameters are typically within 99% confi-
dence intervals. When outside the probability intervals,
empirical significance levels are all liberal. No pattern
developed between empirical type I error rates and number of
groups, group size or effect size. (8) For all macro parame-
ters, power increases as total sample size, number of
groups, group size, or effect size increase. However, group
size has a consistent determinative effect on power over the

number of groups.

To Mohammad and Yashaar

ii

Acknowledgements

I would like to take this opportunity to thank my
committee chairperson, counselor and friend Dr. Stephen W.
Raudenbush for his invaluable support, insightful comments
and understanding. Working with him contributed greatly to
my professional development.

I would also like to thank Dr. Richard F. Houang who has
been a constant source of inspiration, encouragement and
genuine support throughout my graduate studies, as well as
for his intellectual persuasion throughout this research.

I wish to express my appreciation to the rest of my
committee, Drs. Dennis Gilliland, William H. Schmidt, and
Robert E. Floden for reviewing my work and providing
suggestions for improvement. I would further like to take
this opportuniy to acknowledge the support I received from
the Spencer Foundation.

Most importantly, my deepest gratitude goes to my
dearest friend and husband, without whose understanding,
patience and moral support this work certainly would not
have been completed, the person who always knew I could do
it, Dr. Mohammad Ali Chaichian.

I wish to express my appreciation to my parents who have
been a source of love and strength from before my time at
Michigan State University. Last but not the least, my

iii

deepest appreciation goes to my son Yashaar for his patience

and understanding while I worked on my dissertaion.

iv

TABLE OF CONTENTS

Page
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . v
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . vii
Chapter
I. STATEMENT OF THE PROBLEM . . . . . . . . . . . . . 1
II. REVIEW OF THE LITERATURE . . . . . . . . . . . . . 7
Uni- Level Techniques . . . . . 7
Multilevel Techniques with Random Intercepts
and Fixed Slopes . . .11
The Multilevel Technique with Random Intercepts
and Random Slopes. HLM . . . . . . . . . . .15
Estimation of Dispersion Matrices. . .18
Asymptotic Properties of Maximum Likelihood
Estimates . . . . . . . . . . . . . . .20
III. TWO STAGE HIERARCHICAL LINEAR MODEL (HLM) . . . . .23
Estimation Under Known variance Components . . .27
Estimation Under Unknown Variance Components . .31
The Logic of EM Estimation . . . . . .35
Effects of Having to Estimate Variance
Components . . . . . . . . . . . . . . .38
IV. METHOD . . . . . . . . . . . . . . . . . . . . ._.43
Standardized Two-Stage Hierarchical Linear
Model . . . . . . . . . . . . . .43
Parameters of the study . . . . . . . . . . . .47
Design of the Study . . . . . . .50
Description of the Generation Routine . . . . .57
Monte Carlo Techniques . . . . . . . . . . .62
Random Number Generation . . . . . . . . . .62
Analysis Routine . . . . . . . . . . . . . . . .64
Checking the EM Algorithm . . . . . . . . .65
Type I Error Rate and Power . . . . . . . .69

V

V. RESULTS . . . . . . . .
Results for the Estimation Phase

Are the Macro Parameters Asymptotically
Unbiased and Consistent ? . . .

Are the Macro Parameters Asymptotically
Efficient ? .

Do the Macro Parameters Have Asymptotic
Normal Distribution ?

Are the Variance Components Asymptotically
Unbiased and Consistent ? .

Are the Variance Components Asymptotically
Efficient ? . .

Results for the Hypothesis Testing Phase
Robustness Under Various Conditions

Total Sample Size and Robustness
Number of Groups and Robustness
Sample Size and Robustness
Intraclass Correlation Coefficient
and Robustness . . . . .

Power Under Various Conditions
Total Sample Size and Power
Number of Groups and Power
Sample Size and Power
Intraclass Correlation Coefficient and
Power . . . .

EM Algorithm: Rate of Convergence

VI. DISCUSSION . . .
Conclusions . . . .
Guidelines for the Researcher
Suggestions for Future Research

APPENDIX .

REFERENCES

vi

.72

.72

.73

.83

.92

.94

102

102
105

106
108
108
113
114
114
115
119
122

122

125
125
127
130
132

149

LIST OF TABLES

5
Lay out of Type RBFP - 2 Design in Two Blocks
Treatment Combination . . . . . . . .
5
Alias Pattern for Type RBFP - 2 Design
Varying Combination of K and n for Block 0 and 1

5
of RBFP - 2 Design

Actual Values of Macro Parameters

Expected Errors of Estimate in the Macro

Parameters
Error Estimates in Yoo
Error Estimates in 701
Error Estimates in ;10

Error Estimates in Yll
Measures of Dispersion in Macro Parameters

Differences in Measures of Dispersion in Macro
Parameters Estimated by HLM via EM Algorithm
(VAR) and Cramer-Rao Lower Bound (CRLB) for
Different Number of Groups . . . . .

Differences in Measures of Dispersion in Macro
Parameters Estimated by HLM via EM Algorithm
(VAR) and Cramer-Rao Lower Bound (CRLB) for
Different Group Sizes . . . . . . .

Differences in Measures of Dispersion in Macro
Parameters Estimated by HLM via EM Algorithm
(VAR) and Cramer-Rao Lower Bound (CRLB) for
Different Intraclass Correlation Coeffi-
cients

Mean and Variances of the z-Statistics for the
Macro parameters . . .

Error Estimates in TUlw
Error Estimates in TBIW

vii

Page

.54

.54

.56

.56

.73
.75
.76
.77
.78

.84

.85

.86

.87

.94
.95

.96

5-21
5-22
5-23
5-24
5-25

Page
Error Estimates in 02 . . . . . . . . . . . . .97
Standard Errors for Nominal Alpha Levels and

Number of Replications Used in the Study . . 106
Probability Intervals for Nominal Alpha Levels and

Number of Replications Used in the Study . . 107
Type I Error Rates for Tests of Macro Estimators

Under a True Null . . . . . . . . . . . . . 107
Type I Error Rates of Macro Parameter,‘Y00.

Under a True Null . . . . . . . . . . . 109
Type I Error Rates of Macro parameter, Y01’

Under a True Null . . . . . . . . . . 110
Type I Error Rates of Macro Parameter, Ylo’

Under a True Null . . . . . . . . . . 111
Type I Error Rates of Macro Parameter, Yll’

Under a True Null . . . . . . . . . . . . 112
Power for Tests of Macro Parameters . . . . . . 115
Power for Tests of Macro Parameter Y01 . . . . . 116
Power for Tests of Macro Parameter Ylo . . . . . 117
Power for Tests of Macro Parameter 711 . . . . . 118
Average Convergence Rate of EM Algorithm . . . . 123

viii

LIST OF FIGURES

Page
Error Estimates in Y and‘YO for Different
Combinations of 0and n w th d= .10 . . . . .81
Error Estimates in Ylo and‘Yll for Different
Combinations of K and n with d = .10 . . . . .82
Plot of Stabilized VAR and CRLB for YOU and
YOI.............. ....89
Plot of Stabilized VAR and CRLB for Y10 and
Y11.....................90

Normal Probability Plot of 2- Statistics for
700, Y0]. , Y10 and Y11 . . .

A A

Error Estimates in Transformed T and T8
for Different Combinations SEW and n8 ith
d = .10 . . . . . . . . . . . . . . . . . . .99
Error Estimates in Transformed. o2 for Different
Combinations of K and n with d = .10 . . . . 100
Plot of Transformed Estimated and asymptotic
Variance of T ulw and.T BIW . . . . . . . . 103
Plot of Transformed Estimated and Asymptotic
Variance of 32 . . . . . . . . . . . . . . 104
Power Curves of ‘Y01 , Ylo and‘Yll for Different
Number of Groups, X . . . . . . . . . . . . 120
Power Curves of ‘Y01 , 'Y10 and Y11 for Different
Group Sizes, n . . . . . . . . . . . . . . . 121

ix

CHAPTER I

STATEMENT OF THE PROBLEM

Science's main job is to "explain" natural phenomena by
discovering and studying the relations among variables. In
the behavioral sciences, variability is itself a phenomenon
of great scientific curiosity and interest. In their
attempts to explain the variability of a phenomenon of
interest (often referred to as the dependent variable),
scientists study its relations or covariations with other
variables (referred to as the independent variables). Educa-
tional researchers seek to explain the variance of school
achievement by studying its relations with intelligence,
aptitude, social class, race, home background, school atmos-
phere, teacher characteristics, and so on. Various analytic
techniques have been developed for the purpose of studying
relations between independent variables and dependent
variables, or the effects of the former on the latter
(Pedhazur, 1973).

Perhaps the most powerful method of doing this is the
regression analysis, whose simplest form is one in which the
effect of an independent variable on a dependent variable is
being studied (Pedhazur, 1973). Under this simple conception

the two parameters of interest are the slope and intercept

which are usually called regression coefficients. The test
statistic used for either of the regression coefficients is
the Z-test (or t-test if the sampling variance, 0: , is
replaced by its unbiased estimator). So long as we deal with
a situation where the variables have the same level of
aggregation (e.g., both at individual or group level) and
where our measurement processes are assumed to be error
free, there is no real drawback to this approach.

But in educational research many, if not most, data have
multilevel characteristics. For example, students are nested
within classrooms, and classrooms are nested within grade
levels which are themselves nested within schools, dis-
tricts, or program sites. Thus, we can have variables of
different levels, describing students, classes, schools and
so on. Variables such as family background, prior achieve-
ment, parental educational level, and the like are indivi-
dual (or micro) variables identified with students and
variables such as whether the school is public or private
are group (or macro) variables. In a multilevel problem we
want to investigate the relations between variables at dif-
ferent levels of hierarchy as well as interactions across
levels.

There has been a great surge of interest in educational
statistics over the past decade to search for appropriate
statistical methods for hierarchical, multilevel data. As a
result of this search, a general approach to the problem of

multilevel data which is referred to as hierarchical linear

models (HLM) by Sternio (1981) has emerged.

The basic idea of a hierarchical linear model is fairly
simple. When data are available at two levels of aggrega-
tion, for example, on students and the schools to which they
belong, the model is specified in two sets of equations:
one within schools, and one between schools. The within-
school model is defined separately for each school with
student level predictors and a student level outcome
variable. This is a familiar linear regression model with
one major exception; the within-school parameters, regres-
sion coefficients, are allowed to vary randomly across
schools. This conception poses a second or between-school
model. The between-school model then regresses the within-
school regression coefficients on to the school level
predictors.

Two sets of parameters evolve from this formulation:
micro parameters, or random effects, and macro parameters,
or fixed effects. Research interest has focused on estima-
tion of both micro and macro parameters. As Raudenbush
(1988: 87) has pointed out:

"Two fundamentally different types of problems have

motivated the development of these HLM models. In the

first type, interest focuses on the micro parameters or

random effects. One seeks to estimate, for instance, a

regression equation for a particular school, the effect

size for a particular study, or the growth rate of a

particular child when the data available for that

school, study, or child are sparse. The empirical Bayes
approach strengthens estimation for each unit by utili-
zing data from many similar units: schools, studies, or
children. In the second type of application, attention
focuses on the macro parameters, or fixed effects. One

asks why some kinds of schools have smaller regression
slopes than others, why some studies report larger

effects than others, and why some children grow faster
than others."

The conception that "micro" parameters vary randomly
across the population of groups as a function of "macro"
parameters not only justifies the "slopes as outcomes" idea
(Burstein, 1980), but also introduces a new source of varia-
tion in micro parameters referred to as random effect varia-
nce or parameter variance. This is the variance among the
micro parameters themselves which is distinguished from the
sampling variance resulting from using a sample within each
macro unit to estimate these parameters.

The new advances in analyzing multilevel data have
evolved from statistical theory stimulated by the seminal
contributions of Lindley and Smith (1972), Novick, Jackson &
Thayer (1972), and Smith (1973), who developed Bayesian
estimation procedures for hierarchicaly structured data
When the variance components (i.e., sampling and parameter
variance) are known, estimates for the micro and macro
parameters can be derived from alternative estimation theo-
ries: least squares, Bayesian, and maximum likelihood (see
for example Raudenbush, 1984). The crucial difference bet-
ween Bayesian/ maximum likelihood approach and least squares
approach is the difference in assumptions.

In most applications, however, these variance components
have to be estimated. Unfortunately, no simple closed-form
estimate is available. However, a variety of numerical

approaches to maximum likelihood estimation of covariance

components are available among which EM algorithm (Dempster,
Laird & Rubin, 1977) is especially conceptually appealing.

The EM algorithm produces maximum likelihood estimates
for variance components with known large sample properties.
That is, the estimates are consistent, asymptotically
efficient with known large sample normal distributions. The
fact that the sampling distributions are known becomes espe-
cially important when inferences are to be made based on the
parameter estimates. The test statistic for a macro regres-
sion coefficient is a Z-test (asymptotic z-test or t-test
if the variance components are replaced by their maximum
likelihood estimates). But before asymptotic results become
exact, the number of levels of each random factor must
increase to infinity (Miller, 1977). That is, for example,
both the number of schools (call it K) and the number of
pupils (call it n) within each school must approach
infinity.

At present little is known about the small sample beha-
vior of these parameter estimates. To date it is not clear
how large n and K have to be in order for estimates and
their standard errors to become acceptable, thus justifying
the use of large sample theory.

The goal of this research is to understand the proper-
ties of maximum likelihood estimates obtained from small
and moderate samples, and to evaluate their implications for
research design. Because anlytic study of these properties

becomes intractable in the case of unknown variances and

covariances, empirical studies are needed. Clearly to gain a
comprehensive understanding of the inferential strength of
hierarchical linear model, and to understand the small sample
properties of its parameter estimates, many simulation
studies are needed. In other words, alternative HLM methods
with different model specifications and assumptions, or at
least the most interesting and realistic ones, have to be
studied.

This research will take the initial step and will
address these issues by considering. the two-stage
standardized hierarchical linear model. Specifically, this
research through simulated data generated for different
values of K and n, will investigate the effects of variance
estimation via EM algorithm on inferences about parameters
at the second stage of the hierarchy, that is, about the
macro or fixed effects.

The following chapters will review the literature with
respect to statistical approaches to multilevel data,
discuss maximum likelihood parameter estimation, present the
method used for investigating the small sample properties of
parameter estimates, and provide results and discussion of

their implications for research design.

CHAPTER II

REVIEN OF THE LITERATURE

A long-standing problem associated with educational
research has been the failure of many quantitative studies
to attend to the complexity of data usually produced by
hierarchical, multilevel educational field research
(Cronbach, 1976; Haney, 1980; Burstein, 1980; Cooley, Bond &
Mao, 1981; Rogosa, 1978).

Cronbach (1976) remarked that the majority of studies of
educational effects carried out until 1976 conceal more than
they reveal, and that "the established methods have gene-

rated false conclusion in many studies." (P. 1)

Uni-level Techniques

 

Traditionally, statistical approaches have attempted to
adopt uni-level techniques to multilevel situations. This
can often be done by using aggregation or disaggregation. A
student (micro) variable, such as intelligence, can be
aggregated to school level by assigning to a school the
average intelligence of its students. A school (macro)
variable, such as whether it is public or private, can be
disaggregated to the student level by assigning to each
student the type of school. But as de Leeuw and Kreft (1986)
pointed out "the operations of aggregation and disaggrega-

tion are highly nontrivial, both from the methodological and

from the statistical point of view". Conceptually, by aggre-
gating, a change in the meaning of the variables occurs.
Statistically this means we are ignoring all within-school
variation which sometimes results in dramatic increase in
the correlation between aggregated variables. Robinson
(1950) showed that not only does the correlation fluctuate
as a function of grouping but that the sign may even be
different at different levels. As a result we no longer can
make inferences on the student level without committing the
‘ecological fallacy' (Alker, 1969; Cronbach and Webb, 1975;
Hannan, 1971; Robinson, 1950). This refers to the practice
of interpreting correlations between aggregated variables as
if they were correlations between variables measured on
individuals (i.e., cross level inference). This is the most
commonly cited flaw in any early methodological treatments
of hierarchical data.

On the other hand if we disaggregate, we have to take
into account the fact that students within the same school
do not respond independently to school level variable. But
the traditional linear models require the assumption that
subjects respond independently to educational programs. Also
by ignoring the nested structure of data we will misestimate
the precision of parameter estimates, resulting in serious
inferential problem (Aitkin, Anderson & Hinde, 1981; Knapp,
1977; Walsh, 1947).

In the late 19605 and early 1970s, the topic of

aggregation and proper choice of analytic units (using the

student versus using the group) gained popularity in
educational field research (see Burstein, 1980 for a review
of these issues). This increased interest may be viewed as a
natural by-product of the then growing emphasis in
educational research on evaluation of social and educational
programs; evaluations that had to be designed and analyzed
in such a way as to take into account the ever-present
natural hierarchy found in all school systems. With the
awareness that students within a class and teachers within a
school cannot be considered truly independent and that
responses -to treatment may rightfully vary dramatically
across groups, came an increased interest in how to deal
with non-independence and differential effects for distinct
population groups often labeled aptitude-treatment interac-
tions (Cronbach and Webb, 1975).

Up to the early 19708, then, considerable research had
been done on the effects of aggregation on bias and
efficiency under various grouping strategies. However,
little of this research was grounded in practical
applications.

As Burnstein (1980) in his concluding remarks on choice
of units of analysis points out, if the goal is to learn
something about the effect of educational process on student
achievements, the "discussion, about the choice of an
appropriate unit are simply unnecessary digressions" (p.
196). The emphasis should be on choosing an appropriate

analytical model that accounts for the relationship among

variables observed at both levels of aggregation. Rogosa
(1978: 83) remarked, "no one level is uniquely responsible
for the delivery of and the response to educational
programs.... confining substantive questions to any one
level of analysis is unlikely to be a productive research
strategy". Burstein and his associates (Burstein, Linn &
Capell, 1978; Burstein, Miller & Linn, 1979; Burstein,
1980) argue that when relationships between the dependent
and independent variables are different in different groups,
single level analyses at either the individual or aggregate
level will produce misleading results. For example, conduc-
ting analyses at the individual level without regard for
group membership might lead to spurious null effects or to
spurious large effects; in either case, the actual effects
will only be uncovered through within-group analysis. Thus,
these researchers advocate conducting selected within-class
analyses and using the results of these regressions in
aggregate level analyses. Cooley et al. (1981) reached the
same conclusions and pointed out : "We must not ignore the
possibility of variation among groups (e.g., classrooms or
schools) in estimating a variable's effect. Examining this
variation can reveal grouping effects or specification
error, ignoring it will conceal them." (p. 74)

Such criticisms of single level analyses suggest that
multilevel approaches are needed in many settings. Such
models would aim simultaneously to discover: 1) what is

happening within macro units; 2) what differences there are

10

between macro units; and 3) how those differences influence
the quality of what is going on within the macro units. To
be valid, statistical analyses must account simultaneously
for effects at both levels.

Multilevel Techniques With Random Intercepts
and Fixed Slopes

 

In the mid-seventies, the problem of aggregation bias
was resolved by analyzing multilevel data with multilevel
techniques. some of these alternative analytical strategies
are: the separate between-group and pooled within-group
analyses as suggested by Cronbach (1976), and Cronbach and
Webb (1975); a two-stage hierarchical analysis proposed by
Keesling and Wiley (1974), and Wiley (1976); and a "full
model" analysis suggested by Keesling (1977). All three
strategies obtain their estimators through ordinary least
squares (OLS) technique, but they differ in the approach by
which the estimators are obtained. Notice that in the case
of random intercepts, OLS is an appropriate estimation
method only when balanced designs are considered.

Schmidt and Houang (1983) compare and contrast these
three approaches with respect to parameter estimation. They
concluded that these strategies differ in the way that the
relationship between the between-group effects and the
within-group effects are conceptualized. Analytically they
showed that all three procedures give the same estimate for
the within-group regression coefficient. With respect to

the between-group regression coefficient, the estimate

11

obtained by Cronbach's approach is different but related to
that of Keesling's. As far as the third approach is
concerned (i.e., Keesling and Wiley's two-stage analysis),
no estimate for the between group regression coefficient is
available in this case. These differences reflect different
conceptualizations in the three strategies. That is, where
in the Cronbach's and Keesling's approaches the individual
level variables are conceptualized to have direct impact on
the outcome variables, in the Keesling and Wiley's their
influence are indirect and mediated through other group
level variables. This difference in conceptualization of the
situation sets a criterion for choosing among these three .
multilevel techniques (Schmidt and Houang, 1983).

The model proposed by Wisenbaker and Schmidt (1979) may
be viewed as the extension of these multilevel techniques
to their multi-variate form. The application of a "compo-
nents of covariance structure" method (Schmidt, 1969) to the
proposed, multivariate random effects model (with random
intercepts and fixed structural parameters) allow the
simultaneous estimation of between-group and within-group
effects and their standard errors via maximum likelihood
procedure. The model potentially permits different specifi-
cations for the relations at two levels of hierarchy.

Although these analytical strategies accounted for
aggregation bias, however, the other technical problem,
i.e., misestimated precision, remained unresolved. The

problem is that these methods allow for random intercepts,

12

but they assume constant within-group slopes. The technical
consequences of ignoring slope heterogeneity when in fact it
exists are inefficient estimation of regression coefficients
and negatively biased standard errors of regression coeffi-
cient, which inflate type I error rate.

Cronbach (1976) cites three sources of variation in
within-class slopes: 1) sampling variability and stability
problems due to small class sizes when the process operating
in the classes are basically the same; 2) differences in the
selection factors operating to form the classes; and 3)
differences in causal processes going on in the classrooms.
If we can rule out chance effects and different selection
rules as reasonable explanations, the variation in within-
class slopes become a potent source of information to
researchers and policy makers.

Tate and Wongbundhit (1983) argued that random
coefficient models with random slopes and intercepts are
more appropriate than random coefficient models with only
a random intercept for multilevel analysis in educational
research. De Leeuw and Kreft (1986) further added that
random coefficient models are more general and that fixed
constants are special random variables. They argued that
"whether something is random or fixed should be decided by
considering what would happen if we replicate the experi-
ment. Would it be realistic to suppose that regression
coefficients stayed the same under replications? If not,

then random coefficients are appropriate" (p. 59).

13

Boyd and Iversen (1979) discussed a "separate equations"
approach in which both intercept and slope are allowed to be
random. But their estimation procedure is ordinary un-
weighted least squares for both sets of coefficients, which
ignores the information provided by the random coefficient
model. I

One class of multilevel approaches in which random
coefficients are estimated (Burstein and Miller, 1980;
Cooley et al., 1981), first estimates relationships within
each school; then these regression coefficients serve as
outcome variables for an assessment of the importance of
school policies and practices. Again, this approach is not
free from problems. Raudenbush and Bryk (1986) discussed
some of the technical difficulties associated with slopes-
as-outcomes approach. Among these problems are weak statis-
tical power to detect real differences in slopes, the need
for a multivariate formulation so that several regression
coefficients per unit can be studied, and the need for a
statistical model which matches the complexity of hierarchi-
cal, multilevel character of most educational field research
data.

Additional problems with Cooley et al.'s proposed model
is that all random variation in the effects of macro units
is assumed to be explained by the predictors included in the
model, so that the only unexplained variation results from
sampling of micro units (i.e., unsystematic or sampling

variation). In traditional analysis of variance terms, such

14

a model is a fixed effects model. Assuming the model is
completely specified, there is no drawback to this approach.
However, when the model specification is incomplete, which
will commonly be the case, the parameter estimates of regre-
ssion coefficients and their standard errors are
untrustworthy.

The Multilevel Technique With Random Intercepts
and Random SIopes:HLM

 

A general approach to the problem of multilevel data
(Aitken and Longford, 1986; de Leeuw and Kreft, 1986;
Goldstein, 1986; Mason, Wong & Entwisle, 1984; and Rauden-
bush and Bryk, 1986) incorporates the idea of "slopes as
outcomes" without its various deficiencies. This general
approach with random effects at each sampling level has
been proposed under a variety of names: variance component
models (Harville, 1977), mixed model ANOVA (Elston and
Grizzle, 1962), regression with random coefficients (Rao,
1972; Swamy, 1973; Rosenberg, 1973; and Dielman, 1983),
Bayesian estimation for linear models (Lindley and Smith,
1972; Smith, 1973,Dempster, Rubin & Tsutakawa, 1981; and
Morris, 1983), multilevel linear models (Mason et al.,
1984), mixed linear models (Goldstein, 1986), and hierarchi-
cal linear models (HLM) (Sternio, Weisberg & Bryk, 1983)
have all been used. The present study employs the term
hierarchical linear models, labeled HLM for convenience.

The HLM has a hierarchical structure in the sense that

parameters at a lower level of aggregation (i.e., micro

15

parameters) are assumed to vary over a population of groups
as a function of the parameters at the next higher level
(i.e., macro level). Micro parameters may be as diverse as
means, proportions, variances, linear regression coeffi-
cients and logit linear regression coefficients (see Rauden-
bush, 1988). Through such models, it is possible to assess
the strength of relationship between macro predictors and
micro parameters. This quality along with the "slopes as
outcomes" idea enables investigators to go beyond traditio-
nal questions (e.g., why do more schools have higher
achievement than others ?) and ask more fundamental
questions about why structural relationships vary across
groups. This class of questions (e.g., why is the effect of
social class or race stronger in some schools than others ?)
reflect the "slopes as outcomes" conceptualization popula-
rized by Burstein (1980). The HLM model identifies both
slope and intercept heterogeneity and tries to explain them
via related macro predictors.

Not only do such models enrich the class of research
questions asked about educational effects occuring within
and between educational units, they solve problems of
aggregation bias and misestimated precision long associated
with multilevel data.

Estimators of micro and macro parameters are available
through empirical Bayes methods. The empirical Bayes
estimates of the micro parameters (also called shrinkage or

Stein estimators) provide an improvement over the least

16

squares estimators. This improvement is most pronounced when
some or all groups have sparse data and when there is
heterogeneity among micro parameters, some of which can be
explained by group characteristics.

Estimation of the micro parameters can be improved by
shrinkage of least squares estimates around a grand mean
(known as "unconditional shrinkage") in the first situation
and by shrinkage toward a conditional expectation (known as
"conditional shrinkage") in the second situation.

The empirical Bayes approach also yields estimates for
macro parameters. This estimator, which is recognizable as
the generalized least squares estimator, weights each OLS
estimate of micro parameters proportional to its precision.

Estimation of macro parameters are of great importance
not primarily because these improve estimation of micro
parameters, but because it enriches the class of research
questions asked about educational effects which goes far
beyond what was plausible prior to the advent of HLM models.

Research interest has focused on estimation of both
micro parameters and macro parameters each addressing funda-
mentally different type of questions. Studies with the goal
of improving micro estimators (by either conditional or
unconditional shrinkage) include Laird and Ware, 1982;
Raudenbush and Bryk, 1985; and Sternio, et al., 1983 for the
first type of shrinkage and Braun, Jones a Rubin, 1983; Der
Simonian and Laird, 1983; Novick, et al., 1972; Novick and

Jackson, 1974; Rubin, 1980 and 1981; and Shigemasu, 1976 for

17

the second type of shrinkage. However, numerous investiga-
tors have recently found that the macro parameters them-
selves may be of greater interest (Aitkin and Longford,
1986; Aitkin, et al., 1981; de Leeuw and Kreft, 1986; Gol-
dstein, 1986; Laird and Ware, 1982; Lee, 1986; Mason, et
al., 1984; Raudenbush and Bryk, 1985 and 1986; and Sternio,
et al., 1933).

The HLM model has broad applicability in educational
research. The study of individual growth (Laird and Ware,
1982; Sternio, et al., 1983; Bock, 1983), the measurement of
change (Bryk and Raudenbush, 1987 ); contextual effects in
cross-national fertility research (Mason et al., 1984), and
research synthesis or "meta analysis" (Raudenbush and Bryk,
1985) are examples of HLM's broad applicability.

The major problem with this development is the
mathematical complexity of Bayesian covariance components
estimation. Fortunately, a variety of numerical approaches
to maximum likelihood estimation of covariance components
are now available.

Estimation gf Dispersion Matrices:

 

 

 

Estimation of dispersion matrices in multilevel linear
models with fixed and random effects (i.e., mixed models)
can be complex, particularly in an unbalanced case. The
traditional 'ANOVA' approach is essentially the only method
in use for balanced data. This method consists of equating
the observed sums of squares and cross-products matrices to

their expected values. For unbalanced data, the 'ANOVA'

18

approach leads to biased estimators of variance components.
Henderson (1953) developed analogous techniques to correct
this deficiency. Searle (1968, 1971a, and 1971b) gives
excellent descriptions of Henderson's methods and indicates
various generalizations. One problem with Henderson's method
for estimating variance and covariance components is that
the methods are not necessarily well defined. Moreover,
except for balanced data cases, little is known about the
properties of the Henderson estimators, other than that they
are unbiased and translation invariant. It is known that,
at least in particular cases, there are biased estimators
that have uniformly (assuming normality) smaller MSE's than
the Henderson estimators (see Klotz, Milton and Zacks,
1969). The discovery by Seely (1975) and by Olsen, Seely,
and Birkes (1976) proved that, at least in the case of most
unbalanced mixed or random effects models having one random
factor, there exist estimators that have uniformly smaller
variance than the Henderson estimators. These locally best
estimators are related closely to maximum likelihood estima-
tors (Hocking and Kutner, 1975).

Maximum likelihood and related procedures, which are
reviewed by Harville (1977), have received increased atten-
tion in the past ten years. However, maximum likelihood
approach has been somewhat ignored by practitioners because
of computational complexities and because it takes no
account of the loss in degrees of freedom (df) from the

estimation of fixed effects, leading in some instances to

19

large biases and large mean squared errors (Patterson and
Thompson, 1974). Improved computational procedures are now
available, and Patterson and Thompson (1971, 1974) have
devised a modified ML approach known as 'resticted maximum
likelihood', that adjusts automatically for losses in df. As
Harville (1977:320) states:

" Certain deficiencies of various other methods are not
shared by maximum likelihood. In particular, the maximum
likelihood approach is 'always' well defined, even for
the many useful generalizations of the ordinary ANOVA
models, and, with maximum likelihood, nonnegativity
constraints on the variance components or other
constraints on the parameter space cause no conceptual
difficulties. Moreover, the maximum likelihood estimates
and the information matrix for a given parameterization
of the model can be obtained readily from those for any
other parameterization".

Asymptotic Properties 22 Maximum Likelihood Estimates

 

The attractive features of maximum likelihood estimates
of variance-covariance components, discussed by Harville
(1977), are important. The maximum likelihood are functions
of sufficient statistic and are consistent, i.e., they
converge to the population values as the sample size becomes
indefinitely large. Their joint distribution is approximated
by the multivariate normal distribution with mean equal to
the population value and variance-covariance matrix equal to
the negative inverse of the matrix of second derivative of
the likelihood function. Moreover, the maximum likelihood
estimators are said to be asymptotically efficient (in the
sense described by Miller, 1973 and 1977) attaining the
Cramer-Rao lower bound for the covariance matrix under mild

regularity condition . There is, however, no guarantee or

20

unbiasedness or efficiency in small samples.

In order to obtain asymptotic results in the mixed
model, the number of levels of each random factor must
increase to infinity. More often in the analysis of variance_
a conceptual sequence of experiments with the number of
levels of each of the random factors increasing to infinity
is considered. Hartley and Rao (1967) were the first to
attempt an asymptotic theory that would be truly appropriate
for the more complicated of the ordinary ANOVA models. They
proved that under certain restrictions the estimates were
consistent and asymptotically normal as the size of the
experimental design increased. However, one of their
assumptions is that the number of observations at any level
of any factor must remain less than some fixed constant for
all designs in the sequence. This assumption eliminates many
crossed designs where the number of observations at a given
level of one factor is proportional to the number of levels
of another factor.

An alternative way of obtaining asymptotic results in
the mixed model is by considering repetitions of a given
experiment. Anderson (1969, 1971) considered maximum
likelihood estimates in a more general class of models
(multivariate models where the covariance matrix has linear
structure) and proposed a different solution; he proved that
the estimates were consistent and asymptotically normall as
the entire design was repeated. Miller (1973) developed an

asymptotic theory for the ordinary ANOVA models which, while

21

it is similar to that presented by Hartley and Rao (1967),
it does not exclude any cases of real interest. He consi-
dered asymptotic properties of the maximum likelihood esti-
mates for a large class of design sequences whose size
increases to infinity; this class of design sequences con-
tains all sequences treated by Hartley and Rao and most
sequences which could occur in practice. In other words he
took the basic model of Hartley and Rao, rewrote it in the
form used by Anderson and proved consistency and asymptotic
normality of the estimates in the model.

Raudenbush (1988) in his paper entitled ,"Educational
Application of Hierarchical Linear Models: A Review"
provides a comprehensive review of HLM model with respect to
estimation theory and application. In his concluding remarks
he states, "despite the clear potential of such models,
important questions about their statistical properties
remained unanswered. The questions concern small sample
properties, implications for research design and robustness
of violations of assumptions" (p. 111). This research will
take the initial step and will address questions about small
sample properties of the estimators and their implications
for research design by considering a two-stage standardized

hierarchical linear model.

22

CHAPTER III

TWO STAGE HIERARCHICAL LINEAR MODEL (HLM)

In this chapter, a mathematical model for the general
two-stage hierarchical linear model (HLM) is presented. This
is followed by the description of parameter estimation when
variance components are known and when they are unknown.
Then the logic of EM algorithm along with the steps involved
for the implementation is discussed. Finally, the effects of
estimating variance components on macro or fixed parameters
is described.

For reasons of simplicity and clarity, a two-stage
'hierarchical linear model is considered although the
statistical theory permits more (see Goldstein, 1986). The
basic idea of HLM is reasonably simple. We begin by
supposing that the researcher has data at two levels of
aggregation, for example, on students and the schools to
which they belong. The model is specified in two sets of
equations: one within schools, and one between schools. Our
fundamental assumption is that the outcome variable in some
way depends on the student level predictors and that the
micro regression coefficients may vary systematically as a
function of the school level predictors. The within school

model is defined separately for each school. This is a

23

familiar linear regression model, with student level
predictors, and with a student level outcome variable. The
between-school model then regresses the within-school
regression coefficients on to the school level predictors.
The present study restricts attention to the case in which
variation in the outcome variable, Y, is to be explained as
a function of one student (micro) level predictor, x, and
one school (macro) level predictor, W (theoretically there
is no limit as to the number of Y, X and W). In this case
of a simple univariate regression model the within-school

model (or micro model) becomes

Yij II “1 + 81 xij '5‘ R11 (3.1)
and

2
Rij~N(0.Oj)

where

Yij is the outcome score for student 1 in school j;
where j = 1,..., n

“j and 31 are the micro level regression coeffi-

cients within school j;

Xij is the micro level predictor for student 1 in
school j; and
RLj represents random error, assumed independently

normally distributed with zero mean and variance
2

0'3 '

By centering the micro level predictor around its

respective group mean, x21 , “1 represents the mean on the

24

outcome variable in school j. Equation (3.1) is a standard
linear "full rank" regression model with one major
exception; the within-school parameters, u and B are

1 j
allowed to vary randomly across schools. This conception

poses a second or between-school model.

The between-school model (or macro model) may be either
unconditional (involving no macro level predictors) or con-
ditional (involving macro level predictors). The uncondi-

tional model is:

“j ' “+U0j’ (3.2)

B - 8 + U

j 11, (3.3)

U ~ N (0. T“).

0.1

Ulj ~ N (0, TB),

cov ( U U ) ' T

Oj’ lj

H8

that is, p1 and 31 are viewed as a functions of their

respective grand mean across all schools plus random error.

Under this simple model, TU and 1 represent the parameter

8

variances in U01 and U11 respectively. That is, they signify

the variability in the true intercept and slope across the

population of schools, and that THE signifies the covariance

between them. Treating W as potential determinant of u

1
and 31 , leads to the following conditional between-school
model:

“3 - Yoo + Y01 Wj + "03 (3.4)

25

8j - 710 + Yll Wj + Ulj (3.5)

and

U ~ N ( 0 . T )

ulW

01
U11 ~ N ( 0 , TBIW )
cov ( UOj , UIj ) - TUBIW
where THIW and-rBIW are the conditional parameter variances
in 00;) and U11 respectively, and TIJBIW is. their conditional

covariance. The micro errors are assumed independent of the
macro errors. Equations (3.4) and (3.5) represent the
effects of macro predictor W on the two micro parameters,
pj and 31 . These two equations combined with equation
(3.1) define a multilevel model that can be written equiva-
lently as a single equation by substituting (3.4) and (3.5)

into (3.1):

(3.6)

+ (U + X U + R )

Y 0:1 ij 13 11

11 ' Yoo + Yo1wj + Yloxij + Yllxijwj

The brackets in equation (3.6) enclose error terms that
complicate the expression considerably, as they do its
estimation. The presence of macro error terms in (3.4) and
(3.5) make (3.6) a mixed model, because it contains fixed
coefficients (the y'S) and random coefficients ( the U's ).
The model shown in equation (3.6) is quite general in
that a number of familiar models can be derived from it. If
the macro errors are suppressed, the hierarchical linear

model (3.6) becomes equivalent to an ordinary regression

26

model (or fixed effect specification) that includes student

level variable Xij , school level variable, W1 , and their

interaction effect, and its estimation poses no

X W ,
113

special problem. Under this model we assume that all of the
variation in the micro parameters, p1 and sj , has been

perfectly explained by knowledge of the macro level

variable, Wj , whereas equation (3.6) allows for error. When

random effects remain (i.e., 051 and/or U13

zero), application of ordinary least squares to (3.6) is

are not equal to

inefficient, and the estimated standard errors are too

small. Another model that has received some attention is

"random intercept regression model". This model considers
the within-school intercepts, “j , as random, but the
regression slopes, 83’ as fixed. Some variant of this model
has been employed by Aitkin et al. (1981), Cronbach (1976),

Keesling (1977), Keesling and Wiley (1974), and Wisenbacker
and Schmidt (1979). There are hypothesis tests in each case
to decide whether or not it is justifiable to make these
simplifications (Raudenbush and Bryk, 1986). Mason et al.
(1984) provide a detailed discussion of the relationship
between the general hierarchiacl linear model (3.6) and
other simplified sub-models of potential interest that can

be derived from it.

Estimation Under Known Variance Components

 

Estimates for the parameters in HLM models assuming
known variance components can be derived from alternative

estimation theories: least squares, Bayesian and maximum

27

likelihood (see for example Raudenbush, 1984). Using matrix

notations to generalize the model, equation (3.1) becomes;

YJ -X16j+Rj, (3.7)
and
Rj” N (0) 21):
where
2
221- c.1 Inj,
51 u R1
Y ' 1 9 9 = j , and R - i
J . J B j - ’
Yn j Rn
J j

 

 

 

 

and equations (3.4) and (3.5) will be reduced into a single

equation of the form,

 

ej - pa + 111, (3.8)
and
U3 ~ N ( 0. I).
where

r T
T _ u uB

Tue TB

- U
“a! E 9 and U1. 0:]

B

Uij

Under the Bayesian approach, assuming variance compo-
nents are known, the minimum mean squared error point esti-

mators for micro and macro parameters are;

CD
I

>9

CD)

*

28

and

4 -1 A
ue ( £11) ( 2X1 9 ) . (3.10)

where
_ -1
81 is the ordinary least squares estimate of 8j for each

school with sampling error of

A a 2 ' -1 3
var ( 61 81 ) oJ ( X1 X1 ) Vj .

and.l represents a "multivariate ratio" of the true para-

1

meter variance in 81 to the total observed variance in

81 . This ratio signifies the reliability of BJ as an esti-

mator of school j's slope. It follows that 8j . (X5X3)-1x3Yj
is normally distributed with a mean of “8 and variance Vj + T

A

i.e.,

A

n s a A .
) ‘var ( 81 9:1 ) + var ( ej ) V3 +-T' j

var ( 6

J

The empirical Bayes estimator, u; is a generalized

least squares estimate of 8 , where the outcome vector

8 (i.e., OLS estimates of micro regression coefficients)

3
is weighted by its precision. The empirical Bayes estimator

a: is a weighted combination: first of 8j ,

derived for each school based only on the student data from

the OLS slopes

that school; and second, from the estimated mean slope u; ,
for the population of schools. That is, 6: is a vector
the elements of which are somewhere between the elements
contained in 81 derived entirely from within the macro unit

j, and the elements u; the estimated mean vector for the

29

entire sample.

The properties of these estimators are reviewed by Efron
and Morris (1975) and Morris (1983). Such estimators are
conditionally. biased, i.e., the bias is largest for ej

. * .
values far from the average. However, in general ej is a

more precise estimator (i.e., it has smaller expected mean

squared error) than 81 its OLS counterpart.

Sternio (1981) reasoned, however, that the precision of

*
61 could be improved even further by shrinking estimates

8j , not toward a grand mean u; , but toward a
conditional mean ij* . This is obtained by regressing
61 onto a macro parameter W as follows,
8 - W + U , 3.11
1 1" .1 ‘ )
where

Y

00 1 nj 0 o
' Y s W -

Y“ 3 o o 1 w

10 3

Y11

 

 

Under this formulation the empirical Bayes estimators or
equivalently the posterior means for micro and macro

Parameters are:

A

* *
6 - A 6 - l
1 j j + ( I j : WjY ,1 A (3.12)
- z ' A- - Z ' A- e o

where

Aj-(TIW)(TIW+V )'1

J

30

These results generalize to the case of multiple X's and
multiple W's. The posterior dispersion ofea and ‘Yare given
in equations (3.14) and (3.15) respectively (Raudenbush,

1988:91).

D I - _ t
93 11 VS] + ( I 1:1)3j (I 11) , (3.14)
where
s - w (z w' A'1 w )'1 w'
.‘l :l .‘l .‘l J 3'
and
13* - (z w' A“1 w )‘1 (3 15)
Y .1 J j ' '

It is worth noting that the crucial difference between
the three alternative estimation theories: least squares,
Bayesian and maximum likelihood is the difference in
assumptions. With regard to 8 , the Bayesian and maximum
likelihood method lead to identical result but different
from least squares. This is because the least squares method
makes no assumptions about the prior distribution of a
On the other hand, both Bayes and maximum likelihood assume
normality of the 83 in order to derive e; . With regard to
Y all three approaches effectively assume no prior dis-
tribution and therefore produce identical results (Rauden-

bush, 1984).

Estimation Under Unknown Variance Components
So far we have assumed that the variance components are

known. In most applications however, these will not be given

31

and have to be estimated. For balanced data it has been
common practice to equate the observed sums of squares and
cross-products matrices to their expected values (called
"ANOVA" approach). Estimating variance components from
unbalanced data is not as straight forward as from balanced
data. Henderson (1953) developed analogous techniques
dealing with variance component estimation from unbalanced
data. However, his method is computationally cumbersome when
a mixed model is assumed and when the number of classes is
large. Searle (1971a, Chapter 10) discussed problems with
the ANOVA approach when applied to unbalanced data.

As can be seen in the survey article by Searle (1971b)
there are many approaches to variance components estimation
from unbalanced data; many of them of a rather specialized
nature and many which depend on some form of balance or
symmetry in the problems addressed. The need for general
procedures for. handling unbalanced problems is quite well
known to the statisticians.

A complete Bayesian analysis can be performed by
specifying a joint prior distribution for all the parameters
involved.( 8 , y ,‘T and V in our case), combined with the
likelihood function for regression coefficients (here 8 and
Y ) in order to obtain a joint posterior distribution for
the four parameters. This distribution then has to be
integrated with respect to the variance-covariance compo-
nents (here 'r and V ), thus removing the nuisance parame-

ters, so that the posterior distribution of 8 and 'y can be

32

calculated (Lindley and Smith, 1972). While theoretically
satisfying, this approach is computationally complex.

As a practical alternative Sternio, et al. (1983)
followed the general strategy of Dempster, et al. (1981) and
developed an empirical Bayes analysis. The empirical Bayes
approach consists of first deriving Bayesian estimates based
on known variances and then substituting maximum likelihood
estimates for the unknown variances and covariances in the
estimation formulas. Similarly, they generated maximum
likelihood estimates for unknown variance-covariance compo-
nents via EM akgorithm (Dempster, et al., 1977), and then

replaced the true parameter values in their model by these

estimates.
As Harville (1977: 320) points out, ".... except in
relatively simple settings (cases), the computation of

maximum likelihood estimates requires the numerical solution
of a constrained non-linear optimization problem". For
unbalanced data maximum likelihood estimates of variance
components are not available in closed form and one has to
resort to iterative solutions to obtain them.

A variety of numerical approaches to maximum likelihood
estimation of variance-covariance components are available.
Among them EM algorithm is specially gaining prominence.
Dempster, et al. (1977), review many areas where the EM
algorithm has successfully been applied, or has potential
applications. These include missing value situations, aplli-

cations to grouped, censored or truncated data, variance

33

component estimations, iteratively reweighted least squares,
fixed mixture models, hyperparameter estimation and factor
analysis. They also derive theorems showing the monotonic
behavior of the likelihood function and the convergence of
the algorithm. some of the applications include Aitken,
et al. (1981), Dempster, et al. (1981), Laird and Ware
(1982), Mason, et al. (1984), Raudenbush and Bryk (1986),
Rubin (1980), and Sternio, et al. (1983). Other numerical
approaches to maximum likelihood estimation of covariance
components are the iterative generalized least squares
(Goldstein, 1986) and the Fisher scoring method (Longford,
1985; de Leeuw and Kreft, 1986).

All these three iterative methods avoid the inversion of
large matrices. Thus, they are computationally more feasible
than Newton-Raphson, which requires inversion of large
matrices at each iteration. S.J. Haberman, one of the
discussants of the paper by Dempster et al. (1977), pointed
out that the numerical stability and simplicity of
implementation of the EM algorithm are in its favor. The
Newton-Raphson and scoring algorithms are not especially
difficult to implement. However, convergence of the EM algo-
rithm is often slow. In contrast, the Newton-Raphson and
scoring method are superior from the point of view of rate
of convergence near a maximum since they converge quadrati-
cally rather than linearly. However, they do not have the
property of always increasing the likelihood, and can in

some instances move toward a local minimum. Consequently,

34

the choice of starting value may be more important under

Newton-Raphson and scoring method (Dempster, et al., 1977).

The £23 3.9; EM Estimation

The EM algorithm of Dempster, et al. (1977) provides an
iterative method of finding the maximum likelihood variance
estimates. The EM algorithm is a very general method for
finding maximum likelihood estimates. . In the variance
estimation situation, the EM algorithm alternates two steps
in each iteration. The E ("expectation") step finds the
posterior expectation of the sufficient statistics based on
the complete data (in our case y, e ) given the observed
data (in our case y) and given current estimates of parame-
ters (in our case r and o; ). The M ("maximizing") step
then uses the expected sufficient statistics to produce new
ML parameter estimates of variance components. Each step of
the EM algorithm increases the likelihood. This sequence of
alternate steps guarantees convergence to a local maximum of
the likelihood function. If data is normally distributed,
the local maximum will also be the absolute maximum since
the normal likelihood is unimodal.

One difficulty with EM algorithm is that it may require
many iterations to converge (Sternio, 1981). Thus, it is a
slow process of maximum likelihood estimation particularly
with poor starting values (Mason, et al., 1984). None-
theless, in favor of the EM algorithm are simplicity in

implementation and numerical stability.

35

The process of the EM algorithm along with the computa-
tional details are provided by Dempster, et al. (1981) to a
special version of the model considered in this research.
They considered the model in which there are no macro pre-
dictors (covariates) related to micro parameters, i.e.,

2

W = I and in which the 21 have the special formz):j - o In

where o2 is equal across all individuals. Sternio (1981)-has
broaden this appraoch to include estimation of r and 02 in
more general cases and provides a unified discussion of
theory and computation in such cases. Bryk, Raudenbush,
Seltzer, and Congdon (1987) have extended this approach even
further to general mixed model in which the full rank
j , and the
assumption that micro parameters are random are no longer

assumption of the within-group predictor matrix X

required. Hence, relaxing these two restrictive assumptions
broadens the range of application of the model (Braun, et
al., 1983; Rubin, 1983).

To illustrate the logic of EM algorithm, consider the
simple conditional univariate HLM model prescribed in
equations (3.7) and (3.11). The logic of EM works like this:
First, assume that r and a; are known. Equations (3.12)
through (3.15) provide posterior means and dispersions of Y
and 8.1 . Next, suppose that y and 81 were known i.e., Rj
and U1 had been observed and we want to estimate 1 and a;

It can be shown easily that the following two equations
(3.16) and (3.17) are maximum likelihood estimates for T

2
and Oj respectively.

36

'r- It" 21130:; (3.16)
A2 I. 1

EM algorithm utilizes the dependence of estimators 8j
and Y on knowledge of dispersion matrices and the dependence
of ML estimators of these matrices on knowledge of 8j and
7 via an iterative process with the following steps:

(1) Generate reasonable starting values for the unknown
variances, q; and T . Perhaps as suggested by Raudenbush
(1988), the within-group and between-group residuals from
ordinary least squares regression can be used.

(2) These starting values are substituted into equations
(3.12) and (3.13) yielding starting values of 8; and YE.

(3) To derive new estimates for T and o; , substitute
the sufficient statistics-ZUJUS and R3 R1 in equations
(3.16) and (3.17) by their posterior expectations.

These posterior expectations are derived by Dempster, et

al. (1981) and are as follows:

E{(R'R) Y}-E{(Y )IY}

ii
-(Yj-X

1' x13:

* 1*
)'(Yj '- x381) + E {(83 - :1
)'(Y1 - X181) + tr(Xj'Xj DBj

)' ( Y - XjBj

)‘x'x (e - 8*)IY}
j j J j
)

B
8

Lhasa.»—

J
' (Y3 " x3
and

* *
to: 11er.1 Y) - 2: 0303 + {{Ajvj + “351 ”3}

2
(4) The new estimates of dj and T are then used in a

* i:
repetition of step 2 to yield new value for 8j and yj

37

The process iterates until estimates converge to any degree
of accuracy required.

The estimated variance components after one iteration
are then the maximum likelihood estimates of the variances,
conditional on the values of the‘ structural. parameters
(i.e., regression coefficients). The proof of convergence to
the maximum likelihood estimates is given by Dempster, et

al. (1977).

Effects of Having £2 Estimate Variance Components

Best linear unbiased estimators of the fixed and random
effects (i.e., macro and micro parameters respectively) of
mixed linear models are available when the true values of
the variance components are known. If the true values are
replaced by estimated values, the mean squared errors of the
estimators of the macro and micro parameters increase in
size (Kackar and Harville, 1984). Clearly the magnitude of
this increase is unknown to us. Another problem resulting
from this situation is; that the. parametric family
distribution of micro and macro parameter estimates will
remain unknown. Thus, any statistical inference concerning
these parameters, if not impossible, will be inaccurate.

Fortunately, we can use large sample theory to find
asymptotic distributions of macro parameter estimates. But
finding an analogous sampling distribution for micro parame-
ters is not possible (Dempster, et al., 1981), because we

cannot simultaneously maximize the joint likelihood function

38

2
of all four parameters ( 8 , y , r and.o ). The data

3

simply will not support the estimation of so many parame-
ters. But the focus of the present research is on the effect
of variance estimation on inferences about macro parameters.
Of course, when variance components are unknown, substi-
tuting their maximum likelihood estimates ; and. d: in the
definition of £1 and then estimating 'y* by replacing
A1 for 81 in equation (3.13) is a natural idea. That is,
following empirical Bayes approach of first deriving
Bayesian estimates based on known variances and then substi-
tuting ML estimates for the unknown variances in the estima-
tion formulas. The resulting empirical Bayes estimator,
7* is a true maximum likelihood estimator. Therefore,
this estimate shares the desirable properties of maximum
likelihood estimators. But maximum likelihood estimators

rely on large sample theory. According to large sample

theory we know that:

1. y*- (2 w; 231 NJ)”1 2 W3 331 81 .

A

is the maximum likelihood estimate of Y* if A is the
maximum likelihood of A . This is the case since functions
of ML estimates are ML estimates of the same functions of
the parameters.

2. Under regularity condition the large sample distribu-
tion of ML estimates of Y*for K‘+ m with n's fixed is as
follows
-1

Mir (xv; A; wj>

39

where (2 W5 A31 wj ) is the Cramer Rao lower bound for

’“k
the covariance matrix of 7

But for n, K + m with n's/N fixed

(2 “5 A31 wj )'1 = (2 W5 T-l wJ )'1
since
A31 - ( Vj +:"c)-l = a: ( X3 X1 )-1+ 1' -1= 1-1
as
2 -1
c1 (x5 x3) + 0

thus 7* is indeed asymptotically efficient. It is clear

that we can use the asymptotic distribution of y* for

confidence interval and hypothesis testing:

* _ ASY ' -1 '1
(Y Y) ~ N 0 . (2 W3 Aj Wj )
or
(v; - vh / 3.2. 7;) A§Y N < o. 1 )

where subscript h refers to the elements in the Y vector
i'e" ( Yoo' 701’ Y10’ Y11 ) '
Thus, even though the estimates of the macro parameters are
numerically computed, their large sample properties are well
defined which facilitates the large sample hypothesis
testing and interval estimation.
The EM algorithm yields estimates of dispersion matrices
r and 2 which, in conjunction with Y* , maximize the

marginal density of Y . In other words, EM estimates of T

40

and 2: (i.e., maximum likelihood estimates of 'r and 2: )
when substituted into the equation (3.13) make‘f*a true
maximum likelihood estimator. These asymptotic properties of
ML estimates are of value only if there is reason to believe
that the data are extensive enough that the properties hold.
For these properties to hold exactly it is sufficient that
the number of groups (K in our case), approaches to infinity
(Miller, 1977). However, it would be interesting to observe
the behavior of the estimates as K and n (i.e., number of
individual within each group) each increase to infinity.
This does not imply that K and n be of the same order of
magnitude (Miller, 1977). In the present research, the main
question we set out to investigate concerns the small sample
behavior of the macro estimators (i'e"YOO’ YOI’ Ylo , and
Y11 ). The purposes of this study are three:

(1) To check on the EM algorithm, we can look at the
properties of the macro estimators and make sure that the
algorithm behaves as expected. That is, the macro estimators
are consistent, unbiased, asymptotically efficient, and with
known and asymptotic normal distribution. Also it is worth-
while to look at how well the EM algorithm does at estima-
ting the variance components. Again, this concerns the bias,
consistency and asymptotic efficiency of thse estimators. A
side concern with this algorithm is its rate of convergence
under varying combinations of K and n. This question is
addressed through examining the total number of iterations

prior to convergence to ML estimates.

41

(2) Investigating the effect of variance estimation on
inferences about macro parameters with respect to both
robustness and power.

(3) By constructing data sets that differ in the number
of K and n, we investigate how different combinations of K
and n affect the properties of the macro estimators as well
as the inferences about them.

Specific questions of interest concentrate on estimation
and hypothesis testing. The key issue in the estimation
phase concerns the bias, consistency and efficiency of the
macro estimators, y , and the effect of different combina-
tions of K and n on these properties. For hypothesis testing

interest centers on the type I errors and power.

42

CHAPTER IV

METHOD

The procedures employed in the study to answer the
research questions presented in the previous chapter will
now be discussed. The chapter begins by presenting
standardized two-stage hierarchical linear model. Next, a
description of the population parameters and the manner in
which they were chosen will be given. In the third section
details are presented about the computer routine utilized to
generate the data. The fourth section looks at the analysis
routines. Finally, the measures of biasedness, consistency,

efficiency, type I errors and power will be described.

Standardized Two-Stage Hierarchical Linear Model

This model which is special case of the two-stage HLM
model presented in the preceding chapter is adopted for
generating data in the present research. The standardized
HLM takes the same exact form of equations (3.1) through
(3.5) for the unconditional and conditional case but with
somewhat different assumptions. That is, the micro and macro
predictors both are assumed to be standardized normal
variables with mean of zero and variance of one. Clearly,
this reduction in the number of unknown parameters

simplifies the data generation process.

43

Within-School Model

 

11 J 13 ij.
and
Rij ~ N ( 0, 1 ).

Between-School Model (unconditional)

u = u + U .

J 03
Bj=é+ U13 ,
and
U0j ~ N ( 0. Tu)’
Ulj ~N(O,TB).
cov ( U01, Ulj ) = TuB.

Between-School Model (conditional)

 

“j ' Yoo + Y01 Wj + U03,

Bj ' Y10 + Y11 wj + ”13’

and
UOj ~ N ( O, Tulw )!

cov ( UOj’ Uljlw ) = TuBIw.

44

Further we assume that the micro and macro predictors

are each a unit normal random variable, i.e.,

x N ( 0, 1 ).

ij”
w. .. N 0. 1 .
J ( )

and that Xi R U and Ulj are mutually independent.

3 ' ij ' 03
This implies that cov ( Ubj' Ulj )= 0 , and that the
dispersion matrix I is diagonal (but we will still
investigate estimates of this covariance between macro level
errors).

In order to generate data we need to define the

following parameters:

1. c = TB/Tu, so T6 = c TU' (4.1)
2 §2=52 (62/52)
XY Y
where
E is pooled within group slope;
5¥y is pooled within group correlation coefficient;
3; is pooled within-group (unconditional) variance
in Y ; and
6: is pooled within variance in X
But
6; = 1,
and
3; = TB+ §2+ var (R) (4.2)

45

so that

EZ= < Biyl ( 1 - Egy >>< cru + 1 ). (4.3)

= '2
3. d Tu/(Tu + oy ), (4.4)

where d is the intraclass correlation of Y.
By substituting expression (4.3) into expression (4.2)

we will get:

32 = c1u + < Biy/c 1 - Exy >>< c1

y +1)+1,

u

and by substituting 32 into (4.4) and solving for Tu we
Y
will have:

Tu = d/(Cl-d ) ( 1 - a ) - cd). (4.5)

2
KY
This expression implies that the larger the intraclass

correlation the larger the parameter variance T and that

u I
the larger the pooled within-group correlation the smaller

the I“ . That is, Tu is directly related to d , but
inversely to 52
KY
But Tu and T8 are both positive quantities, therefore

c is constrained to be in the following range of values

0 < c < (( 1 - d )/d )( 1 - Eiy ). (4.6)

Also the conditional parameter variance in intercept and

slope are:

Tulw 3 Tu( 1 - pﬁw ), (4.7)

46

and

_ _, 2 o
TBIW _ TB( 1 pBW ), respectively (4,3)

The above specification of the standardized hierarchical
linear model reduces to five parameters. These parameters
are C , d , pxy , puw
can generate a large number of samples under these known

and 98W ; if predetermined, one

population parameters and investigate the properties of
resulting statistics (i.e., point estimates of and their

standard errors) by observing their sampling distributions.

Parameters of the Study

 

 

In order to investigate the small sample properties of
the macro parameters (i.e., YOO , Y01 , Y10 and Y11 ,
two more parameters need to be added to the list of five
model parameters previously mentioned. These two parameters
are: number of groups, x.and group size, n . This adds up
the number of parameters considered in the present study to

total of seven (K, n, d, and c ).

pxy"pﬂw' pBW'

The first three parameters (K, n and d ) are specially
of great concern in the present study because of their
significant implications in sampling and design of a study.
In a two-stage random sampling (or two-stage cluster
sampling using sampling design terminology), the coefficient
of intraclass correlation ( d ) measures the homogeneity of

the elements within clusters. For a fixed total sample size

of N = nK, the larger the intraclass correlation, the larger

47

the number of groups (K) and the smaller the number of
individuals within groups need to be sampled for optimum
efficiency in design given fixed cost. In contrast, the
smaller the d , the fewer the number of groups and the
larger the number of individuals within groups the better
the precision (Kish, 1983).

However, to consider asymptotic properties of macro
estimators it is sufficient that only K converges to
infinity (Miller, 1977). Accordingly, we may occasionally
define the population solely in terms of levels of K, on
other occasions in terms of varying combinations of K and n,
and still on some occasions redefine it in terms of all
three parameters. Now the values assigned to each of these
parameters will be given.

(A) Number of Groups, K; Small to moderate to large groups
with K = 10, 30, 60 and 150 are simulated in this study.
(gl‘ggggg'§i524 n; Situations with n = 5, 25, 60 and 150 in
each group are simulated.

In deciding the values of K and n, the main concern was
to select those values that provide us with reasonable
ground to investigate the small sample properties of the
macro parameters of interest. The other concern was to have
a reasonable coverage of those combinations of n and K which
occur in real research situations.

These realistic situations include; 1) the study of
growth model (Bock, 1983; Goldstein, 1986; Laird and Ware,
1982; and Sternio, et al., 1983) in which K is small and n

48

ranges from small, moderate to large; 2) school effects
research (Aitken and Longford, 1986; Raudenbush and Bryk,
1986), where K is moderate or large and n is either small or
moderate; and 3) sociological and contextual research
(Mason, et al., 1984; and Wong and Mason, 1985) in which K
is small to moderate and n is large.

3) Intraclass Correlation of 3, d L Two values of d =.10

 

and .25 are considered. These two values appear to be of
reasonable magnitude based on the following grounds. The
intraclass correlation of Y may be large if Tu is large
compared with 5; , and zero only when T = 0 , that is

u
when there is no variation in outcome variable among schools
in the population of schools, which will rarely happen in
practice (see expression 4.4) . But as a general rule,
intraclass correlations in educational research are small
positive values, mostly under .15 (Kish, 1983).

The range of the values are chosen to reflect the
values often obtained from educational field research. In
the school effect research conducted by Raudenbush and Bryk
(1986) the actual value of intraclass correlation was .177.
The mid point of the values considered in this Study is
.175.

4), g) and _6_)_ Correlation Coefficients o_f pxy _,_p and pBW .

uW-’-
For each of these correlation coefficients two values are
considered. These values which are considered to be of
moderate and almost high magnitude (considering educational

field research data), are .25 and .75.

49

11.52513 9; TB 52 Tu L c a Two values of c =.10 and .50 are
considered in this study. Both fall within the range of
permisable values for c given by expression (4.6). Two
interrelated factors have affected the selection of these
two values: First, as a general rule, regression coeffi-
cients have considerably greater sampling variability than
sample means( Burstein and Miller, 1980; Wiley, 1970).
Mathematically, the total variability in intercepts and
slopes can be decomposed into two parts; parameter variance
and sampling variance. Logically it follows that the parame-
ter variance in intercept is of larger magnitude than that
of slope. Second, in many applications one would expect that
much of the observed variation in slopes to be sampling
variation. For example, in the school effect research
conducted by Raudenbush and Bryk (1986) which utilized a
sample of 10231 students in 176 schools, student samples per
school ranged from 10 to 70, and samples less than 45 were
rare, and the value of c was equal to .10. Consequently,
this value is chosen in this study to act as a baseline, and
will be compared to a less realistic but certainly not

impossible larger value of c i.e., .50.

__Design o_f m eta—av
Considering the number of factors (total of seven) and
number of levels in each factor (K and n each have 4 levels,
and the remaining 5 factors each have 2 levels), if we were
to include all factor combinations in our study, we would

5 2
have a ( 2 x 4 ) design matrix with a total of 512 possible

50

cells, which is unmanagable given the large cost of
implementing the EM algorithm.

As a practical alternative, this study adopted a
fractional factorial design by which only a fraction of
factor combinations of a complete fractional design will be
considered. Specifically, this study has adopted a "one-
half" randomized block fractional factorial (RBFF).

Kirk (1968:386-87) made the following comment concerning
fractional factorial designs: "the use of a fractional
design can lead to a sizeable reduction in the number of
treatment combinations that must be included in a study.
This is accomplished by confounding main effects with higher
order interactions ..... however, if certain information
concerning the outcome of the experiment is of negligible
interest, an experimenter can employ confounding so as to
sacrifice only this information."

As a result of treatment-interaction confounding,
considerable ambiguity may exist in interpreting the results
of such experiments. This is the case since every sums of
squares can be given two or more designations referred to as
"aliases". To minimize this ambiguity, careful attention
must be given to the alias pattern of a proposed design.
Treatments are customarily aliased with next to highest-
order interactions which can be assumed to equal zero. This
is accomplished by using the highest-order interaction as
the "defining contrast" which is used to divide the treat-

ment combination into two blocks. The higher order interac-

51

tions are then pooled to form a residual error term. "If
these pooled interactions are insignificant, a complete
factorial design would have been a better design choice for
the data than the fractional factorial design. On the other
hand, if some of the interactions are significant, the
present analysis (i.e., fractional factorial design) offers
the advantage of a larger number of degrees of freedom for
experimental error and a within-all error term" (Kirk,
1968: 394).

Designs with mixed treatments (or factors), i.e., having
unequal number of levels, present special problems with
respect to layout and analysis (see Kempthorne, 1952: 419).
But this is the case in the present study which contains
mixed treatments of the form 25 x 42 design. As a
reasonable alternative this study, adopted a RBFF- 25 design
for the five factors with two levels (i.e., c , d , p ,

xy

puw , and 08w ), and to compensate for the two remaining

factors, K and n each with four levels, every two blocks of
the design layout of RBFF- 25 was crossed with different
level combinations of K and n.

Next, steps involved for laying out one-half replication
of a type RBFF- 25 fractional factorial will be discussed.
(1) Choose a defining contrast. Following Kirk's guideline
the highest order interaction (i.e., five order interaction)
is chosen in this study as the defining contrast. The 32
treatment combinations ( 25 = 32 ) of a complete factional

design can be reduced to one-half of that by the use of the

52

defining contrast.

(2) Confound an interaction with between-block variation.
The interaction which serves as the confounding interaction
must be insignificant and also different from the defining
contrast. For this purpose the interaction between qu and
paw is chosen which is thought to be insignificant. For
confounding an ‘interaction with blocks see Kirk (1968,
Chapter 9). As a result of this process, the 16 treatment
combinations are assigned to two blocks of eight
combinations each.

For simplicity the five factors are assigned the

following notations:

A = puw
B - p8w
C = pxy
D = d
E = C

If (ABCDE) is used as the defining contrast and (AB) as
confounding interaction, the design shown in Table 4.1 will
be obtained. Levels of each factor are denoted as zero and
one. Where zero corresponds to the low value and one to the
high value.

All treatments and interactions except AB (the
confounding interaction), its alias CDE, and the defining
contrast ABCDE are within-block effects. All main effects

are aliased with four-factor interactions. The alias

53

 

 

 

m< mm mo mo mm< mu< mo< mom mom nom< mum< mom< mao< maom moo mmﬁﬁ<
Am<v
cum nu< am< om< no cm on a< o< m a u m < mouaom
mxooam
cwwmon.mw I mmmm oa>9 L0H :Lmuumm mmwa<
M
Nuq mange
HHHOH Hoooﬂ oHooH oo~o~ HooHo oﬂoﬁo ooHHo HHHHO A gooﬁm
HHOHH MQHHH oﬁaﬁa oooHH HHooo HoHoo oHHoo ooooo o xooam
muons macaw ovonm muonm moose ovuna ouobm muons

 

COHuQCMDEOQ UCQEUOQHH

mxoon 039 :ﬁ :mwmoa N I mum: mums mo uzoamq

m

ch oHan

pattern for this design appear in Table 4.2.

A careful examination of the alias pattern in Table 4.2
reveals an interesting feature of this one-half fractional
factorial design. The incomplete five-treatment design con—
tains all of the treatment combinations of a complete four-
treatment design. This implies that the computational proce-
dures for a one-half replication of a 25 design are identi-
cal to those for a complete replication of a 24 design.
That is, by ignoring one of the treatments, the analysis of
an incomplete-design can be carried out as if all the treat-
ment combinations were included in the experiment. The
choice of which treatment to ignore is arbitrary (Kirk,
1968).

As mentioned earlier, different combinations of K and n
will be crossed with the blocks,contained in the RBFB- 25
design. There are a total of 42 different combinations of K
and n, each referred to as a "trial" for convenience. Within
each trial the first K by n level combination will be
crossed with "block 0" of the RBFF design and the second K
by n level combination will be crossed with "block 1". Table
4.3 contains all different combinations of K and n, their
designated block, along with the trial number.

By using a one-half RBFF- 25 design and by crossing this
design with particular combination of K and n a total of 128
(16 cells x 8 trials = 128) treatment combinations will

result. Notice that although levels of K are crossed with

both blocks of the design matrix, levels of n are not.

55

 

 

 

 

m

muq oHan

: can a mo coﬂumcwnsoo wcwaum>

NHooH. mamoo. oomeo. emcee. emmmH. ooeeH. HoHNH. mammn. HH.»
mHHee.H Hmoom. Homom. HmmsH.H Hmoom. «swam. HmmeH.H mHHee.H OH.»
monm. Hmoom. mmmmq. camwm. «ammo. oHHmH. NSHNH. mmHHN. Ho.»
0 o o o o o o o co.»
«qum. «comm. ommHN. «Hmmo. mmoHH. mmmao. omHao. mmnwo. HH>
mHemN. ommH~.H mHoHH.H «Homm. macaw. ommH~.H mHoHH.H sammm. oH »
canoe. Nmeoe. HHHmo. «Hamw. oeeoH. qmqu. «QHHH. mnemo. Ho_»
0 o o o o o o o co.»
mhmuQEmhmm Chum: HO MUSHQ> Hm=uu<
5-5 mHsme
cmHna mm": onHua mm": oman mNna onHua «Nu: H xuon
OnHuH OnHuH coax coax cmnx an"; oan oan
as“: mu can: mu con: mu ecu: mu o xuon
cmHux cmHux coax coax can: on"; oHux oan
m H o m e m N H
HmHue
cmHmmo N . mama H0 H can o mxuon How

I x3018

0 33018

56

Specifically samples of size 5 and 60 occur only in "block
0" and 25 and 150 in "block 1". Thus, each trial consists of
either the two lowest levels of n, 5 and 25 (call it n'), or
the two highest levels 60 and 150 (call it n").

As a result of this design with sixteen varying
combinations of factors A, B, C, D, and B, we obtain sixteen

different parameter values for yr andyul. as shown in

01
Table 4-4. With regard to .Yl the total number of parameter

0
values reduces to half of this size since 1w“) is defined
only in terms of C, D, and 8. Thus, it is not affected by
the high and low values of A and B. Irrespective of the

design Yoo is pre-fixed at zero.

Description of the Generation Routine

 

 

In generating data the present study make use of five
sufficient statistics. Generally speaking, sufficient
statistics are useful in that they reduce the number of
observations, say from n to r statistics ( where r s n ).
This is because these r statistics contain all the
"information" about 9 (iHe., parameters of the study) that
the n observations contain (Graybill, 1976). If r is
appreciably less than n , as it is in the present study
(i.e., r = 1/n), then the very fact that we have to consider
only r , rather than n simplifies our data generation
routine.

The five sufficient statistics are 23x , 22x2, 23R ,

2
ER , and ZXR . Assume that,

57

iid

x N(0.1).
RiEdN(o,1),
and

pxago

(i.e., the population correlation coefficient is zero).
The generation procedure is composed of the following

steps:

(1) Generate

XXj ~ N ( 0, n ).

(2) First generate 2(Xj - §-)2 ~ x2(n_1) and then compute
xx; = zc xj - i.)2 + ( ZXj)2/n

(3) Generate

ZRj ~ N (0, n ).

(4) First generate £(Rj - i.)2 ~ x2(n_1) and then compute
ZR; = 2(1:j - i.)2 + (XRj)2/n.

(5) To generate ZXR first generate t with ( n-2 ) degrees

of freedom, then compute

r - t / (t + n - 2)1/2,

58

and finally compute

2X1 Rj = (n - 1) r Sx SR + ZXjRj/n

(note: t = Z/(x2 /(n-2))1/2 )

n-Z)

After completing the steps involved in generating the
2
sufficient statistics, we can actually compute ZY , zY and

2 KY . But before doing so we need to generate three more
random variables which are contained in the conditional
between-group model. These random variables, which are part

of the expression for“j andggj and thus part of Y1" are
J
Also we need to assign values to the four

Wj, U and U

01 11'
macro parameters of interest. Assignment of values to the

slopes, 7' and 711 are accomplished through the following

01
expressions:

V T

y H
pp)“ p

01

Y ‘ngVJ/—?—

11 8
who is assumed equal to zero, and Y10 is assumed equal to
E . Now proceed with the steps in generation routine.

(6) Generate

w. r N , .
J ( 0 1 )

(7) Generate

U ~N(09T

03 ulw)°

59

(8) Generate

Iﬁj ~ N (0, TBIW).

Also notice that prior to generating Uoj and Ulj we need
to assign values to the five model parameters; i.e., c ,

d , p , p and p . The final step in
xy uW 8W 2
generation routine is to compute ZY , EY’ and ZXY .

(9) Compute

EY = EC “j + Bj Xij + Rij )

a ““j + Bj inj + ZRij.

Compute

2
Z a
Y 2(uj+Bjxij+Rij)

Exij + 2R2 + 2 u 8 Ex +

2 2
+8 1:1 11' 11

J .1

a nu

X Z
2 “j Rij + 28j Xij Rij'

Compute

ZXY = u Ex. + 6 2x2 + 2x R ..
j 13 J 13 ij i]

The generation program, completes each of the nine steps
as one observation is formed. The sample size chosen is five
so five such vectors Y compromise one sample. Thus, begin-
ning with the first "trial", values of 10 and 5 will be
assigned to K and n respectively of "block 0" in the RBFF-Z5
design, and similarly values of 10 and 25 to K and n of
"block 1". Then, starting with the cell one of the design

layout, first the remaining five parameters will be assigned

60

values (according to zero and ones), and then the nine data
generation steps will be completed and repeated for five
replications. This process will be repeated for each and
every 16 cells as one trial is completed. A total of 80 (16
cells x 5 observations) data points will be generated upon
the completion of this trial. Next, we move to the second
trial, assign values to K and n and repeat the same cycle as
in the first trial. This process continues until all eight
trials are completed and a total of 640 (80 sample points in
each trial x 8 trials) sample points is generated. In other
words 128 ( 16 cells x 8 trials) distinct samples each
containing five replications wil be generated.

Along with the generation of sample points, the
generalization program will compute two indices of disper-
sion in macro parameters. These indices are the mean squares

within and the Cramer Rao lower bound which is

- E ( azlog L/By'y) -1,

and equal to the asymptotic dispersion of y* i.e.,

( 2w; Agle )‘1.

The first analysis routine (i.e., HLM program) accepts
both raw data and summary statistics of the sample means and
sample covariance matrix. Considering the efficiency of
summary statistics, for each sample, the mean and the
covariance matrix is computed to be used as input in the

analysis phase.

61

Monte Carlo Techniques

 

As recognized by Hammersley and Handscomb (1964), a
Monte Carlo method is a general technique with different
areas of application, for solving a model by using a random
(or pseudo-random) numbers. One application is the
generation of sampling distribution. Through repeated
sampling under known population parameters, one can
investigate the properties of estimators by observing their
empirical (sampling) distribution.

The present study is a Monte Carlo study aimed at
generating sampling distributions of the macro estimators.
These empirical distributions are then compared to the
nominal distribution (in this case the normal distribution)
obtained under asymptotic theory (i.e., when K and n
converge to infinity).

A Fortran program is used to generate a total of 640
sample points; five observations for every 128 experimental

conditions.

Random Number Generation

 

The use of random number is considered to be an integral
part of a Monte Carlo study. Random numbers are of two
types: purely random numbers and pseudo-random numbers.
However, for a computer based Monte Carlo study the purely
random numbers are inefficient compared with pseudo random
numbers. There are two advantages in using pseudo-random

numbers: (1) the computer itself can generate sequence of

62

numbers by applying an algorithm, and (2) the same sequence
of numbers can be reproduced exactly for the future use.

Pseudo-random numbers are generated sequentially from a
completely specified algebraic formula. At best which at
best they behave as if they are random (i.e., uniformly
distributed and mutually independent). These algebraic for-
mulas are devised in such a way to resist any significant
deviation from randomness. However, there are many statisti-
cal tests that can be used to determine if this is the case.
Typically run tests, serial tests, and various Chi-square
tests for independence are applied to relatively short sec-
tions of the pseudo-random sequence. See Knuth (1969) for
discussions on many of these tests.

Two subroutines, GGNML and GGCHS from the International
Mathematical and Statistical Library (IMSL) were used to
obtain a sequence of pseudo-random normal (R), distributed
N (0,1), and Chi-square random deviates with n degrees of
freedom respectively. Once the procedure is started by an
initial number, called the seed, each new seed number will
be determined from the previous one.

Random normal (0, n) deviates can be obtained by
transforming GGNML output according to Y (I) = R (I) x nl/Z,
for I in (1, 2, ...., K). This transformation was done in
steps (1) and (3) of the generation routine. In steps (7)
and (8) a similar transformation was performed of the form

Y (I) = R (I) x Vl/2

where V represents TuW or TBW whichever the case may be.

63

Analysis Routine

Output from the generation program consists of summary
statistics for each sample. This serves as input to the
first analysis routine.

From the first routine, HLM (Bryk, et al., 1987), we
obtain a vector of the empirical Bayes estimates of the
macro parameters, 7* (as in equation 3.13), the*empirical
Bayes estimates of their dispersion matrix, DY (as in
equation 3.15), estimates of parameter variances Tu and TB ,
estimate of 02 , and number of iterations . These estimates
are numerically computed via EM algorithm. The convergence
criterion for the log likelihood function was set at .0001
with the maximum number of iterations allowed fixed at 500.
The empirical Bayes estimates of the macro parameters and
their dispersion, parameter variances and 02 , yet serve
as input to the second analysis routine.

The analysis routine computes: (1) the required summary
statistics for the estimation phase, (2) the proportion of
times the values of each test statistics exceeded its criti-
cal values for a given nominal significance level under true
null hypothesis, (3) the noncentrality parameter (ncp) as is
defined in the last section of this chapter, and (4) tabu-

lates population effect size ( y in our case ) against ncp

as a way of demonstrating power functions.

64

Checking the EM Algorithm

 

 

As a check on the algorithm, first we might wish to
examine the properties of the numerically computed estimates
of the macro parameters,‘Y . The ML estimators are functions
of every sufficient statistic and are consistent and
asymptotically normal and efficient. Additionally, given
normal data, ML estimates of regression coefficients are
unbiassed.

Key issues in estimation concentrate on bias and effi-
ciency of an estimator. An estimator is unbiased if its
expected value is equal to the population value of the
parameter. In other words, if an estimator is unbiased, the
estimated value minus its parameter value should have zero
expectation, i.e.,

E ( Y - v) = o

where

Y ‘ Y00’ 701’ Y10’ Y11

Thus, by deviating parameter estimate from the known
population value and averaging over the entire sample, one
can determine the degree of bias, if any, present in the
estimation procedure.

An estimator is said to be relatively efficient if it
has the smallest standard error term among the set of
unbiased estimators. Three estimates of the variance are
computed. The first is the variance of the macro estimators

estimated by HLM via EM algorithm:

65

var( ;*,' ) = Diag (- [W3 831 Wj )

-1

A

To the extent that variance components 1- and vi are
misestimated due to small sample problem, the estimated
variance of macro estimators will be in error. The second

estimate of variance is the mean squares within (MSW):

s we 7* 2
MSW-121( y - y )/s

where

7* 5 Ave

Y /5

Last measure is. the average squared bias or mean square
error (MSE):

MSE .2 m?" -Y)2/5

i=1
These last two measures of variance are similar except that
MSE takes advantage of the fact that the population value
( y ) is known.

Since the maximum likelihood estimates are
asymptotically normally distributed, it is of interest to
discover whether they are asymptotically efficient in the
sense of attaining the Cramer-Rao lower bound for the

covariance matrix. This minimum variance bound is the

inverse of the Fisher information matrix and is equal to

66

the asymptotic dispersion of Y*, i.e.,( 3w; A31 w )

3
Consequently, all three measures of variance are averaged

-1

over the entire sample and then compared with the asymptotic

variance of the macro parameters, i.e., diagonal elements of

matrix ( zw‘ A'1 w )-1

J i 1
these various measures of dispersion are:

Computational formulae for

K n A
1) VAR = Z 2 var ( y*) /Kn
jali-l
where K is number of groups, and n is group size.
Average estimate for the variance of the macro estimators

estimated by HLM via EM algorithm from each sample.

K n A* 7*
2) MSW - z 2 ( y - y )z/Kn
3:11-1

Average estimate for the variance of the macro estimators
which is based on the squared difference between the
estimates and the mean estimate.

K n A* * 2
3) MSE = z z ( y - Y ) /Kn

3-11-1
Average estimate for the variance of the macro estimators
which is based on the squared difference between the

estimators and the population parameters.

4) CRLB = Diag ( zw' A'1 w )'1

J j 1

Average of values for the minimum variance bound of the

macro estimators.

67

ML estimators of macro parameters have another desirable
property, their asymptotic sampling distribution is known

and normal. That is,

ASY

( y* - y ) N O, ( 2W3 A- W. )

or equivalently

(( y; - Yh )/S.E. ( y; )> A§Y N c o. 1)

where subscript h refers to the elements in the y vector,

i.e., ( Y , Y , Y , Y
00 01 10 11

One way to assess this property is through the use of
normal probability plots. Similarly, we can determine the
degree of bias in variance components produced by FM algo-
rithm. But the estimates of variance components are un-
stable. The size of the sampling variance of these estimates
depends on the size of the parameter variances they esti-
mate, i.e., the larger the parameter variances, the larger
the sampling variance of the statistics. To stabilize these
estimates a logarithmic transformation is performed on each

of these variance components. This then is followed by

making correction for bias. Formulas for Tu , T and 02

B
are given below:

A

log :ulW - log I + 1/K .

ulW

log - log I + l/K ,

TBlw slw
log 02 - log 02 + 1/nK ,

(note: log 02 = log 1 = 0 ).

68

where 1/K, 1/K and 1/nK are Cramer Rao lower bound for

A A A2 a
log TulW , log TBIW , and log a respectively, and are
used for bias corrections (Pitman, 1938).
2 2 2 2

(note: E (log (8 ) # logc: but E (log (S ) + 1/V) = logcz

where v is the correction for bias in 82 ). The efficiency
of these variance components will be examined by plotting
the log of the squared error estimates in'r

2 uIW ' TBIW
0 against their respective asymptotic variance, 2/K , 2/K,

, and

and 2/nK (see Bartlett and Kendall, 1946, for derivation of
these asymptotic variances). As a last check we look at the

FM convergence rate under varying combinations of K and n.

Type I Error Rate and Power

 

There are two ways to commit an error when making an
inference: (1) rejecting a null hypothesis when is true
(type I error), and (2) not rejecting a null hypothesis when

is false (type II error). Where

a probability of type I error,

probability of type II error,

8

and 1 - B = power

An experimenter wants to avoid errors and select a
statistical procedure which is powerful enough to detect an
"experimental effect" if it exists and in which the level of
significance (<2) is accurate, i.e., neither inflated, nor
conservative. One empirical question is what effect does the
estimated variance components have on type I error and

power .

69

Three specified significance levels .01, .05 and .10 are
considered in this study . For a given nominal alpha,
(100t1 )% of the values in a test statistic's distribution
will exceed the appropriate critical value under a true null
( Ho : y = 7*) where Ha : y = o ) with known variance compo-
nents. Actual significance level relates to the proportion
of the values in a test statistic's distribution that exceed
the appropriate critical value under true null and esti-
mated variance components. Hence, an empirical estimate
of the probability of type I error (i.e., actual signi-
ficance level under unknown variance components) is deter-
mined by counting the frequency with which the test statis-
tics ( z - ( Y* - y )/S.E. ( Y*) ) in each replication
exceeds the corresponding critical value, and the dividing
by the total number of replications.

Nominal power relates to the proportion of the values in
a test statistic's distribution that exceed the appropriate
critical value under a true alternative ( Ha : Y = 7* where
H0 : y = 0 ) and known variance components. Notice that the
null and the alternative hypotheses are the same as the
ones under robustness but have switched their position.
An empirical estimate of power (i.e., actual power when
variance components are estimated) is determined by counting
the frequency with which the observed test statistics
( z . (y*-y)/S.E.(y*) ) in each replication exceeds the
corrgggonding critical value, and then dividing by the total

number of replications. This count is made at all three

70

nominal significance levels.

Power is a function of the discrepancy between central
and noncentral distribution for a test statistics. In this
study actual noncentrality parameters (ncpi ) is defined as
the expected value of the observed test statistics i.e.,

,1, *
ncp = E ( z ) = E Y / S.E. ( Y )
OBS
where S.E. (7* ) is the standard error of the macro estima-
tors estimated by HLM via EM algorithm,

and

z .. N (ncp. 1)
0135

Actual power under noncentral distribution and unknown

variance components is simply equal to the probability of

 

z exceeding the corresponding critical values:
OBS

Actual power = P (‘ 2035 > C-V° (9/2) )

where

z - ncp = z, and 2 ~ N ( 0,1 )

OBS a

Empirical estimates of actual power are then compared to
the nominal power in which the nominal noncentrality para-

meter (ncp ) is defined as:
n

* 1!:
ncp = E ( Y /o ( Y ) = Y/OY
n

where 0(y*) is square root of the asymptotic dispersion

of y* ( i.e., the Cramer Rao minimum variance bound).

71

CHAPTER V

RESULTS

The results of the study are presented in this chapter
in three sections. The first section is a check on the EM
algorithm examining the properties of the maximum likelihood
estimates of macro parameters and variance components. The
second section will address robustness and power issues and
the implication of variance estimation on inferences about
macro parameters. The last section presents the rate of
convergence of the EM algorithm under varying combinations

of K and n used in this study.

Results for Estimation Phase

 

The objective for this phase of the study is to check
the EM algorithm with respect to macro parameters and

variance components (the vector notation “Y and.jz are used

~

throughout this section to refer to the macro parameters

Yoo , Y01 , 710 and Y11 , and their estimates Yoo , YOI , Ylo

A

and'y11 respectively). The question could be phrased: Does

the algorithm behave as expected ? That is, are the macro

I-<>

estimators (1) unbiased, i.e., E ( "I ) = 0; (2) asymptoti-
cally efficient; and (3) with known and asymptotic normal

distribution? Operationally this question could be phrased:

72

Does the estimation get better as a function of K and n ?
That is, are the macro parameters consistent, less biased,

and more efficient ? Similar questions will be addressed

A

with regard to the variance components,r 'thq and 02 .

ulW '

Are the Macro Parameters Asymptotically Unbiased and

 

 

Consistent 3 The error estimates in macro parameters are

 

calculated by subtracting the estimated values {E from
their corresponding parameter values I. . These values are
then averaged over the entire sample, 640 sample points. The
expected errors of all four macro parameters and their 95
percent confidence intervals are shown in Table (5-1) which

suggest that the maximum likelihood estimates of the macro

parameters are unbiased.

Table 5—1

Expected Errors of Estimate in the Macro Parameters*

 

Y00 Y01 Y10 Y11

.002178 .003224 -.001964 .005195

(-.OO78, .0118) (—.0088, .0148) (-.OO98, .0058) (-.0028, .0128)**

 

*From 640 replications
**9SZ confidence intervals

To assess the differential effects of K, n and, d on the.
error of estimates and to examine more explicitly the

differences among levels of each factor, Tables 5-2 through

73

5-5 give error of estimates in i reflecting these three
factors. In all four tables the same patterns emerged. Since
this was consistent across all four macro parameters, only
the results for the yoowill be discussed. The first eight
rows relate to the averaged within-cell error of estimates.
Generally these values considering the small number of
replications (five) tend to be small. In the lower part of
the tables absolute errors of estimates are summed within:
1) levels of d ; 2) levels of n (n' vs. n"); and 3) levels
of K. Within each level of n, error increased as d in-
creased. For example, with K = 10 under n' the absolute
error of estimates in $00 (Table 5-2) went from .517 under
low intraclass correlation to 1.013 with a high degree of
d ( d = .25). With K = 30, 60 and 150 under n' absolute
error increased from .325, .112 and .145 to .558, .276 and
.172,respectively.

This upward trend was remarkably consistent among all
macro parameters. The only difference among the four
parameters was one of magnitude. With Y00 and Y01 , error of
estimates tended to be slightly higher than for the y and

10
The reason for this difference in reduction in errors

711"
for slopes and intercepts seems to be due to the assumed
ratio of the parameter variance in slope to that of inter-
cept used to generate data. This ratio is prefixed to be
either 1/10 or 1/2 (i.e., c = .10, .50).

Having shown that error of estimates respond differently

to the levels of d , the following is an attempt to

74

mm

x we m~m>o~ swab“: reassm mumswumo mo muouuo muaaonn<tcat
: mo m~m>o~ cucuwz voEE:m oumeHumo mo muouuo musHoma<¢tu
a mo m~o>m~ :quwz coaszm mumeHumo uo mLoHLo muzmoma<tt

HcoHoHHuoou :oHHmHouHoo mmmmomuucw 0 cu“: AOmH Lo 00u:v:: vcm AmN Ho mu=0.= muwm no mazoum a mo mcoﬁumOHdamu n scum:

 

 

 

 

mmaaa0m. mcm. aNm.H Oam.N
asgNmN. Nam. nmm. 00m. 000. 000. 000. mm.m
:nmm_. 090. NNH. mq~. mNN. CNN. 0NN. NHH. ham. amN. 0mm. mNm. NNN. NON. m~0.~ mam.
m—0. m00. m~0.1 «H0. n~0.1 n~0.1 mmo. ado. m0H. 0N0. NmH.l 0N0. mH0.I ~00. mmowl HHO.
0.0. 000.1 .1 000. .I NHO. ago. 0H0. MNO. hmo. moo.) 000. 500.1 000. 000. 000.1
_00. 000. 0N0. 0N0. moo. 000. 0~0. 0N0. m-.l 0H0. 0N0. 0N0. mn~.1 mqo. N00.1 N00.1
mmo. mm0.1 Mao. MN0.I ~n~.1 MN0.1 qu. #00. 000. N00.1 NMN.I 000. MON. N00.I m~m.1 N00.
H2x%.. ~00. 000.1 N00. HNO. 0N0. nNO. HHO. Hmo.l who. 0N0. 000.1 000. «No.1 on~.1 0mm.
«N0. 0~0. 0N0. . N00. mH0.1 w~0.1 000. m00.1 HNO. «No.1 H00.1 000. mvH.I 0~0.1 mmo. ONN.
NN0.1 000.1 000.1 nmo. vm0.1 000.1 500. Mac. NH0.I moo. m~0.1 00H. mm0.1 mmoJ Haﬁ. c~0.1
Nm0. 000. 000.1 0~0.I 0H0. 0N0. m-.1 0N0. HNO. 0N0. 000. 0N0. 0cm. NNO. 00H. n~0.

mN.uc 0H.u= mN.nu 0~.uv mN.«c 0H.uv mN.uo 0a.:u mN.Iv 0H.uc mN.nv 0N.Iv mN.uu 0H.up nN.np 0~.nv
: c c : == .: :c .:

00H": 001x Omnu Can:

:8»
:w mmumswmmm Houum

Nlm manna

05

x no m~o>o~ easy“:
a mo m~m>o~ cquwz
u no m~m>m~ :quw3
HcoHoHOOOOO :oHHmHmcHoo mmmmomnucﬁ a soar AOmH ho 00ucv:: cam RMN Ho mu:0.: oumm

amassm mumawumo mo muouum musaomn<maaa
cmesam mumsﬁumo mo mpouuo condoma<344
oceanm oumeﬂumo mo mHoHHm oquomn<aa
mo masoum x mo mcowumOHHamH n sock:

 

 

 

 

**#*M0m. N00.~ 00M.H 00M.N

mas00N. 00N. 000. 000. 005. 0M0. 000. 00m.~
umN5H. 000. 00N. ~00. 050. QNH. 00N. NON. 00M. 00M. 00M. H5N. 55m. 0H0. NON.~ 00M.

_M0.1 000. 0~0.1 M00. 0M0. 000. MMO. 000. 000.1 MH0.I 0N0.1 000. 000. 5m0. H00.1 0m0.1

r~0. 5HO.I 500. O_0.1 ONO. M00. 0H0.I N00. 000. 000.1 NHO. 000.1 500.1 0N~. HMo.I 0Nm.

NNO. NN0.I ~NO. 000. NMO. 000.1 0N0.1 M00. 000.1 000. 0N0.I M00. 0N0. ~00. 500. 0m0.1

000. N00.1 0Mo.1 0~0. 0N~.1 000.1 0N0.I 000. 000.1 HH0.1 N00.1 0M0.1 00H.1 000. 00M. 500.1

000.1 000. 000.1 MN0.1 0N0.1 HHO. 000.1 0H0. 0N0.1 H~0.1 MOO. 5N0. 500.1 0H0.1 Nma. 0M~.

Mao. 0N0. 0Mo. M00. m00.1 500.1 0N0.I 000. 0H0. 0~0.1 N50.1 000. 0N0. NN0.1 0NH. N00.

m00.1 500.1 0~O. 0N0. 5H0.1 000. MNO. 000.1 000.1 000.1 MNO. mmO. H00. 0N0. 00N. -0.I

0N0.1 000. 000.1 MHO. 05H. 0N0.1 ~00.1 5H0. H00.1 Hmo. 0MH.1 NN0.1 00H. 000.1 00a.1 0N0.

mN.uc 0~.uu 0N.n1 O~.u0 MN.uu 0H.uv 0N.nu 0H.uu 0N.uu 0H.nv 0N.I0 0H.Ie MN.nO 0H.Iv mN.uu 0~.ut

..= .C 3: C 3: .C ::
Oman: 00a: OMuM Ouux

 

.30 >
:« mannedumm Houum

MIM omnmh

Hammowuumoo :oHHQHmLHou mmcHomuucw v zqu AOMH no 00ucv:: new AmN Ho nu:0.: mNHm

55

4 0o m~o>m~ casuH) thesam mumsﬂumm mo mbouum muzaoma<¢xum

: mo m~m>m~ swab“: amassm mumeﬂumm 0o msocum mquomn<zta
a mo m~o>o~ =quw3 amassm mumsﬁumo uo muouuo muaqoma<48
mo mason» 2 0o mcoHumoHHQOH m eoumt

 

 

 

 

cmasm0m. M00. 0M0. H50.~
#3400N. 55N. 50N. 05N. ~00. 0M0. 000. 55~.H
uuN5q. 000. 0MH. 0MH. NOH. M50. 00a. OHM. nMN. 00H. 00M. 00H. muM. 05H. 0M5. N00.
HMO.1 0~0. m00.1 N00. 000. 0H0. 500. N00. H00.1 0M0. 000. 000.1 0N0. 0M0. 000. 0M0.1
0H0. 5~0.1 MHO.1 NNO.1 H00.1 N00.1 000. 000. 000.1 .1 0m0.1 MHO.1 NNO.1 HNO. 0N0.1 H00.
NNO. NNO.1 000.1 NHO. H00.1 000. HMO. 500.1 NNO. 0N0. H00.1 NNO.1 0N0. 5N0.1 HNO. 0m0.
000. N00.1 000. MAO. 500.1 000.1 000. NHO. Mm0.1 000. 0M0. 000.1 NN~.1 NNO.1 0H0. mNO.1
000.1 000. 000. 0N0.1 HN0.1 NNO.1 0M0. MAO. NNO.1 0M0.! 000.1 0N0. N00. NNO.1 0MH. H00.
M~0. 0N0. M00.1 MHO. 000.1 0HO. 000.1 NNO.1 H00. 0N0. who. 0H0.1 MM0.1 mOO. 05~.1 5NO.1
m00.1 500.1 5N0. N~0.1 000.1 0N0.1 MN0.1 0m0.1 000. MH0.I 0M~.1 N00.1 M00. 000. 00a. M0N.1
0N0.1 000.1 0M0. NmO. 000. 000.1 M00. 000.1 M00. 0N0. 500.1 0H0.1 000.1 0N0. mON. 000..
MN.uc 0d.uu MN.uu O~.nc mN.Iv OH.Ic MN.IU 0H.ut 0N.Iu O~.Iv MN.IU 0H.Ic MN.nv 0H.nc MN.uv 0a.»:
: : :
a. .- .. .-
Omﬁux 00ux OMnx 0H

 

*OH >

mmumeﬁumm Houum

01m mﬂamb

05

x 00 m~m>m~ swab“: voeaam muceHumo 0o muouuo obaHomb<¢***

: 0o m~o>o~ :«zuH: emsszm mumswumo no muouum ousHoma<ata

c uo mHm>oH cﬁzqu seesaw mumsHumo mo muoHLo mquoma<zt
HcoHoHOOooo :oHHmHoHHoo mmmHOMHH:H a saw) 50nd Lo 00u:0:: sea AmN Ho nu:0.: QNHm no masoLw x 00 mcowumoHHdmu m Eoumt

 

 

 

 

*¢¢*0M0. 0N0. M50. 00M.~
n..u:¥NM— . 00M. NOH. 0M0. 0M0. 0mm. 050. ~50.
”$050. 0m0. m0H. ~0H. Nam. 000. ~5N. 00H. 00N. 00H. 0MN. 00M. NMM. MNA. 0N0. HmN.
m00. 000.1 5H0.1 000.1 000.1 OHO. MHH. NHO. MMH. mN0.1 N00.1 000. 0m0.1 0N0.1 500.1 Mo0.1
000. 000. 000.1 000. 0N0. 000. Mo0.l 000. NMO. MNO. NNO.1 5H0.1 500.1 000.1 N50. 0N0.
000. m00. 0_O.I M00. mOO. 000. 0M0.1 M00. NHO. H00.1 MH0.1 MM0.1 mHO. MHO. 000.1 0Mo.1
0:0. 0~0.I 0N0. 000. 5H0.1 H00. 000.1 HMO. 000. 000.1 ~00.1 ~H0.1 00H. 000.1 MOH. HHO.
_N0.1 000.1 MHO. m~0. 0N0. 500.1 5N0. H00.1 M00.1 M00.1 H00. 000.1 0M0. 000.1 000. 500.1
0.0. N00.1 0~0. 0N0. MHO. ~00. 000. 0M0.1 0N0. MN0.I ONO. 00H. 000. 500.1 0M0. NMO.
r00.1 mHO. HMo.1 000. MH0.1 ANO. 0N0.1 M00.1 NNO.1 0H0.1 000. 0M0. 000.1 0M0. QMH. 0m0.
o_O. 000. 000. 000. NNO.1 000. 0H0. N00.1 000. 500.1 000.1 M00.1 000. HMO. 000.1 N00.1

.1: 0~.1= mN.uc 0a.": mN.nv OH.I0 nN.Iv OH.nu MN.I0 OH.Iv 0N.It O~.uv mN.uu O~.nu mN.nu O~.ut
= c c .c :: .: :c .:

Omaux 00a: OMux 0H»:

*HHP
cw mmumsﬂmwm Houum

Mlm mash?

evaluate the effects of different levels of K and n on these
error estimates. Within each level of K, error decreased as
n increased. For example, with K = 10, 30, 60 and 150,
the absolute error in;00 dropped from 1.53, .883, .388 and
.317 to .989, .646, .355 and .252, respectiviely. This
downward trend was consistent over all four y’s. Again the

~

main difference across I was one of degree. Increasing the
number of groups substantially improved the results. Abso-
lute error for all four parameters decreased with larger
number of groups. Although the patterns are similar, reduc-

tion for Y00 and y are greater than that of 710 and

01
$11 . Moving from K = 10 to K = 150 reduces the error in $00
and $01 to almost 1/5 whereas this reduction is only 1/3
for $10 and ;11' Among the three factors the effect of K on
error of estimates is more pronounced than that of n and d.

These downward trends in d and n were compounded as
number of groups was introduced. For example, with regard to
sample size, n, the absolute error in $00 dropped only mini-
mally within each level of K (e.g., going from 1.53, .883,
.388, and .317 to .989, .646, .355, and .252, respectively).
Increasing the number of groups brought substantial improve-
ment in results. That is, the absolute error dropped from
1.53 under K = 10 to .252 under K = 150. A similar pattern
is present in the case of d . While absolute errors of
estimate decrease as d decreases under a given level of n

(e.g., dropping from 1.013, .558, .276 and .172 to .517,

.325, .112 and .145) this reduction is compounded across

79

levels of K i.e., going from 1.013 under K = 10 to as low as
.145 under K =150.

Figures 5-1 and 5-2 provide schematic representations of
error patterns in the macro parameters (symbols A-2 and *
signify frequencies 10-36, respectively, in the figures used
throughout the thesis). In these figures errors are plotted
against varying combinations of K and n when d = .10. The
plots follow the patterns described earlier in this section.
The further we move toward K = 150, the more concentrated
the errors are around zero. This suggests that ML estimates
of macro parameters are asymptotically consistent. The
errors are small for all levels of n when K = 150, and for
n = 30, 60 and 150 when K = 60, but not so when K = 10 or
30. This suggests that the latter combinations of K and n
produce less precise estimates of the macro parameters. When
intraclass correlation is increased to .25 (Figures A-1 and
A-2 in Appendix ) the exact same pattern is reproduced
except the spread of errors is magnified .

Results of multiple regression analysis applied to the
squared errors of the macro estimators support the foregoing
discussion of the differential effects of K, n and d.
Specifically when (Kn)-1and d were the assumed predictors in
the regression equations there was a significant linear
relationship between the squared errors and the two
predictors with (Kn)”1 accounting for more variability. That

-1 2

is, when (Kn) was the sole predictor. R for Yoo' YOI' Ylo

80

 

 

 

 

[was 3°19
‘C31 .6511
39.2 556
11433 168.61
QC“ .635
533 976
1.643 601.
1595 281.51
AU .1
O 0
Y AY
8651 .1 1 5733
982 .l 1383 11.
127311 255‘21
1‘27‘1 1. “6 5
2155331 1.. 111.127.5321
[2466‘ 2 227621.
0467122 2355312
11213 622 21. 2631131. 1 1
o 5 7 5 5 7 .3 0 5 7 s
3 o o 0 o 3 3 o o
o - o o o o c e
. I. 1 . 1.
. .
S T I M A T E D E R R O R

153

so
K=150

60
25 150
6C)
.10.

H0
30
81

60
25

respectively.

um
10

60

25
Note: Symbols A-Z signify frequencies 10-35,

Error estimates in YOO andY'01 for different

combinations of k and n with d

 

 

Figure 5-1.

 

 

 

 

 

 

LOH
A
Y10
.7
E .35
3 1
S 11 11 3 11
4 6 7 lo ‘0 1 4 3 6 3 2 2 7 1
T o 24 43 5: so ‘5 cs 1 18 xx
44 43 4 43 3 75 3 4 22
11 1 11 2
3 1 3
I
- 3s 2
M 1
A -01
T
-1005
E
1.05
D A
Y11
.7
E
.35 1
R
1 2 1
s 12 1 11 1 1
R 1 41 7 5 41 s 35 32 c 23
o 3 41 s 1 78 a 7 cs 8 4 1: J
48 35 4 55 3 82 21 3 2 1
O 2 11 1 1 12 1 2
3
1
R -035
'07
'1005
5 so 5 so 5 so 5 50
n
25 150 25 150 25 150 25 153
K=lO K=30 K=6O K=150
A A
Figure 5-2. Error estimates in Y10 and Y11 for different

combinations of k and n with d=.lO.

Note: Symbols A-Z signify frequencies 10-35,
respectively.

82

and 741. assumed the values of .11, .15, .19, and .11,
respectively and increased to .13, .17, .21, and .13,
respectively when d was added to the equation. This finding
was consistent across all macro estimators.

However, when the effects of K and n were considered

-1
separately, K always accounted for more variability
. -1
followed by d and lastly by n (using stepwise regression
-1
analysis) except in the case of where K was followed

first by n and then by d.

Are The Macro Parameters Asymptotically Efficient 3

 

Three estimates of dispersion in macro parameters are
computed. These estimates as defined in Chapter IV are, VAR,
MSE, and MSW. Table 5-6 contains the mean value of these
three statistics averaged over the entire sample along with
the asymptotic dispersion of {1* , the Cramer Rao lower bound
(CRLB).

Across all macro parameters the three measures of
dispersion are not significantly different from each other
and from CRLB. Within each dispersion measure, the magnitude
of dispersion for Y10 and Y11 is smaller than that of yoo
and. Y01' This pattern is consistent with previously estab-
lished results. In two occasions MSW gave smaller estimates
than (CRLB (.007872 and .008636 versus .008058 and .009595

respectively) which can be attributed to sampling variance.

83

Table 5-6

Measures of Dispersion in Macro Parameters*

 

Y 00 Y01 Y10 Y 11
VAR .018283 .021039 .009079 .010753
MSW .017085 .021947 .007872 .008636
MSE .018341 .021904 .008525 .008377
CRLB .016640 .019662 .008058 .009594

 

*From 640 replications. Dispersion measures are: HLM estimates
via EM algorithm, VAR; mean squares within, MSW; mean square
error, MSE; and Cramer-Rao lower bound, CRLB.

The values of MSW and MSE are very similar. This lends
credence to the notion that the estimates are unbiased since
the only difference between MSW and MSE is that one uses the
mean of the parameter estimates and the other uses the
parameter itself. Similarly, the resemblance in values of
VAR and CRLB lends itself to the notion that estimates of
macro parameters are asymptotically efficient. VAR is
different from CRLB in that it uses the ML estimates of
variance components and not the parameter variances.

To evaluate the differential effects of K, n and d on
VAR and to assess explicitly the discrepancy between VAR and
CRLB, Tables 5-7 through 5-9 provide differences between VAR
and CRLB (DIP = VAR - CRLB) as a function of K, n, and d ,
respectively.

The differences portray the extent to which VAR
approximates CRLB under varying levels of these three
factors . While larger d consistently increased the

asymptotic variance and its ML estimates, the difference

84

Table 5—7

Differences in Measures of Dispersion in Macro Parameters Estimated by
HLM via EM Algorithm (VAR) and Cramer-Rao Lower Bound (CRLB)
for Different Number of Groups*

 

K VAR CRLB VAR-CRLB
10 .238891 .221223 .017668
Y 30 .075480 .066246 .009234
00 60 .036600 .032476 .004124
150 .014689 .012854 .001835
10 .288704 .276495 .012209
30 .078131 .068933 .009198

Y
01 60 .039283 .034971 .004312
150 .014664 .012844 .001820
10 .124528 .108462 .016066
30 .034335 .031260 .003075

Y
10 60 .016307 .015386 .000921
150 .006406 .006052 .000354
10 .155503 .136727 .018776
30 .036303 .033094 .003209

Y
11 60 .016776 .015934 .000842
150 .006479 .006126 .000353

 

*From 160 replications of K groups.

85

Table 5-8

Differences in Measures of Dispersion in Macro Parameters Estimated by
HLM via EM Algorithm (VAR) and Cramer-Rao Lower Bound (CRLB)
for Different Group Sizes*

 

n VAR CRLB VAR-CRLB
5 .138723 .108442 .030281
-y00 25 .091868 .088070 .008061
60 .065642 .065254 .000388
150 .069427 .075295 -.005868
5 .149417 .126111 .023306
.Y01 25 .099376 .092726 .006650
60 .076937 .076393 .000544
150 .095051 .098014 -.002961
5 .090942 .073461 .017481
.YIO 25 .040522 .037811 .002711
60 .020843 .021008 —.OOOl65
150 .029270 .028880 .000390
5 .104826 .085723 .019103
YJJ. 25 .046579 .042289 .004290
60 .027006 .026608 .000398
150 .036650 .037262 -.000612

 

*From 160 replications of groups of size n.

86

Table 5-9

Differences in Measures of Dispersion in Macro Parameters Estimated by
HLM via EM Algorithm (VAR) and Cramer-Rao Lower Bound (CRLB)

for Different Intraclass Correlation Coefficients*

 

d VAR CRLB VAR-CRLB
IYOO .10 .053569 .044934 .008635
.25 .129261 .121465 .007796
.YOI .10 .064928 .055877 .009031
.25 .145462 .140745 .004717
y .10 .031157 .025678 .005479
10
.25 .059631 .054902 .004729
Y]J_ .10 .036040 .031199 .004841
.25 .071491 .064741 .004081

 

*From 320 replications of d intraclass correlation coefficient.

87

between the two estimates, although not statistically signi-
ficant, is reduced. This downward trend is considerably more
pronounced with an increase in K and consistent among all
macro parameters. Results for n, while generally following
the same trend, did not appear to be smallest with the
largest sample size. Under all four macro parameters, the
smallest difference occurred with n = 60. It might be
assumed that this result was due to the pattern of factor
combinations in matrix design, since n = 60 only occurs in
"block 0" and n = 150 in "block 1". Additionally, the values
of VAR and CRLB consistently get smaller proportionally with
an increase in K and decrease in d . This pattern is not
followed when n increased. This might be due to the fact
are affected by K and

that parameter variances T and r

ulw Blw
not by n. However, the situation is reversed with a2 which
is affected by sample size and not by so much number of
groups. But considering that 02 is fixed at 1 and is not
allowed to vary explains the pronounced effect of K over n
in reducing the size of the VAR and CRLB.

Schematically these close approximations between VAR and

CRLB are presented for and 711 in Figures 5-3

Y00' Y01 ' Y10
and 5-4 respectively. This is done by first transforming
VAR and CRLB as such:

/-K72*log (VAR) + 1/K and MFR7251og (CRLB), respectively.
The logarithmic transformation is used in order to stabilize

VAR and CRLB. When unstabilized the sampling variance of the

statistic, VAR, dependes on the size of CRLB, the parameter.

88

 

 

 

 

 

0
Y’ 2530
00 21157-4
14339362
-15 495121
23711
2 71142
3356851
171 2
. -30 56335
S 1GC41
7 42
2 3
T' 158
'45 1 3824
6 1662
11 5304
1 8662
73722
B -60 927
3 6 1
34 2
I 21
-75
I.
I
-90
z o
558
- ‘Y
E 11413
01- 146740511
‘15 141c75
94H§6
D 2283151
1543362
16302
'30 152:1
108331
5332
)I 11 4 1
21 3252
-‘5 122 356
I) 11 3452
3465
67962
R 1364411
~60 118551
243 1
421
22
-75
-9o
-78 -66 -54 -42 -3o -18 -6
-84 -72 -6o -48 -36 -24 —12
55‘? IX E3 I 15 12 Z 13 E) (I I? 1.28
Figure 5-3. Plot of stabilized VAR and CRLB
for Y andY .
0() 0].

Note: Symbols A-Z signify frequencies 10-35,

respectively.

89

 

 

 

 

 

 

0
'Y SW
10 202761
5 19HL3
'15 4566931
380821
4533 1
1158K84
2671561
-30 156F31
S 17 42
5868
33974
T 7752
-45 1 53
1121
29211
11 274852
511
~60 13 5
B 1272
27342
1 021
I 4246 1
~75 13414
L (621
I ~90
Z 0
Y’ 2755
H 11 0003
t‘ 13HE32
~15 31858
30093
D 18344
1 7HJ932
6AFF51
~30 225735
47111
3466
\I 137154
16361
~45 1 251
11 2264
76112
3279
R 2252 1
~60 3221
36 z
9852
22633
4226 1
~75 22421
152,
11
~90
~78 -66 ~54 ~42 ~30 ~18 ~6
~84 ~72 -60 ~48 ~36 ~24 ~12
ES'T 11 E3 I 1; II Z 13 I) (3 11 1.13
Figure 5—4. Plot of stabilized VAR and CRLB

for and .
Y10 Y11

Note: Symbols A-Z signify frequencies 10-35,
respectively.

90

 

value. The relationship although not perfect, is highly

predictable. The slope of regression lines is over .985 for

y and Y01 and over .999 for Y10 and yll . The intercepts

00
are -.03 and .13 respectively.

The preceding discussion of differential effects of K
and n and d is supported by applying multiple regression

analysis to the DIP squared calculated for each macro
-1
estimator. Specifically, when (Kn) and d were the assumed

predictors, there was a significant linear relationship
-1
between DIP squared and the two predictors with (Kn)

consistently accounting for more variability. This finding
was consistent across all macro estimators.

However, when the effect of K and n were considered
separately K-l always accounted for more variability
followed by d . n-ldid not account for more variability in
(DIP) above and beyond K-1 and d. The only exception was
for yoo wherein2 increased from .125 to .134 as a result of
introducing n to the equation.

It is worth noting that squared multiple correlation
( R2) for the latter set of prediction equations are
consistently higher than that of former set of equations.
This may be taken as an evidence to the pronounced effect of
K over n in reducing the size of dispersion in macro
parameters. This is because the second set of equations

reflect the effect of K separate from n as is opposed to the

combined effects of K and n in the first set of equations.

91

22 the Macro Parameters Have Asymptotic Normal Distribution?

 

 

One of the desirable properties of ML estimates is that
their asymptotic distribution is known to be normal. This
property indeed facilitates the process of inference making.

Macro estimators are distributed as:

“* ASY

( y ~ y ) N 0, ( 2w' 6’1 w. )

or

z =(( Y: - Yh )/S.E. ( Y: ))A§Y

 

NI...

where subscript h refers to the elements in the vector.
This property is examined through normal probability
plotting technique applied to the 2 statistics computed for
each macro parameter. This method applies a special
transformation to the vertical scale of a graph of the
assumed type cumulative distribution function. As a result,
the cumulative distribution of the assumed type will
transform into a straight line. If the z statistics are
indeed normally distributed the corresponding quantiles of
the model when plotted against the sample order statistics
will be nearly linear. When all sample points are considered
the sampling distribution of the z-statistics closely
approximate normal distribution over all macro parameters
(Figure 5-5). Slight deviations from linearity are due to
random sampling fluctuations. Notice that the 2 statistics
assume different range of values. These ranges are -3 to +4,
-4 to +3, -4 to +3, and -4 to +4 for'zyoo, zY , zY ,

01 10
and zY rspectively. Table 5-10 contains the mean and

11
variance of the z statistics.

92

3
E 2

IX
1

.P
E o

C
If -1

E
-2

D
~3

N’

C)

R.
3

14
A‘ 2

I.
1

'V
o

.A

I.
~1

U
E ~2
-3

 

 

 

 

 

 

 

 

Z Z 1
Yoo 1 Yo1 1
11 ~ 11
21 21
5 13
61 61
5‘1 37
03 78
D7 66
1’3 85
2t ‘7
a 3'
as a
46 on
:4 t2'
an .2
as 4.2
:4 3t
36 73
Ft 0‘
H6 6.
35 c
566 111
2E .7
B 6‘
3‘ 126
13 31
12 1 2
11 11
1 1
1 1
-2 o 2 -3 -1 1 3
*3 -1 1 3 -§ -2 o z
Z'Y Z‘Y 1
1.0 1 1 1
11
21 21
13 4
2’ 331
64 6‘
1" 1A5
‘ 79
G" 1;
Gt J‘
at as
6. ‘.
7o ,5
Ht .3
"‘ a!
I. 89
1‘ .5
.. .E
l. on
2‘ 18
K 12
2! 566
37 1‘
25 16
‘ 4
12 3
11 2
1 1
1
-3 ‘1 1 -’ -1
-‘ -2 o 2 -‘ -2 o 1 z 3

 

Figure 5-5.

Normal probability plot of 2 statistics

f°r Yoo' Yo1' Y1o ' andY1-1 -

Note: Symbols A-2 and * signify frequencies 10-36,

respectively.
93

 

 

Table 5-10
Mean and Variance of the
z-Statistics for the Macro Parameters*

 

Z Z Z Z

Yoo Y01 Y10 Y11
Mean .038 .000 ~.041 .072
Variance 1.072 1.137 1.047 1.127

 

*From 640 replications.

The mean and variance of the 2 statistics for different
number of groups, group sizes, and intraclass correlation
coefficients are presented in Tables A-l through A-3,
respectively. No pattern developed between the means or the
variances of the 2 statistics and the number of groups,

group sizes or intraclass correlation coefficients.

Are the Variance Components Asymptotically Unbiased and

Consistent 3

 

The expected errors of the variance components, gulw ,
gBIW and 32 are reported in Tables 5-11 through 5-13,
respectively. The cell values are calculated by first
deviating the estimated values from their respective

parameter values ( T and 02: 1) and then averaging

ulw ' TBlw
them over the number of replications. The 95 percent
confidence interval constructed for the expected errors of

variance-covariance components are given below:

95% C.I. for expected error of TUIW = (.0373, .0647)
95% C.I. for expected error of TBIW = (.0031, .0149)
95% C.I. for expected error of TUBIW = (-.0089, .0029)
95% C.I. for expected error of 02 = (-.0049, .0067)

94

mm

x we m~o>o~ swab“: voessm mumswumo mo muonho ouaﬁomn<taaa
: no m~o>m~ saga“: vasesm muoeﬂumo mo muouuo ousﬁomn<stt

u no m~o>o~ :ﬁzuw: toessm oumeﬂumo mo muouuo muaﬂomn<tt
ucmwowuumoo cowumamucou mmmﬁomuucw u now: ﬂow“ no ocncv:: vco Amm no mncv.: ouwm mo masoum a mo meowumoﬁaamu n scum:

 

 

 

 

asusmmo.~ NN©.~ coo.N ~n~.m

ssummw. moc.~ Hod. ~mw.~ non. mnc.~ mNM.H ©¢®.~
umxma. ooo. wmm. odm. Moo. OOH. mmo.~ men. «we. nNH. Nao. ~c0.~ mqo. omm. coq. Omw.

330.1 H00. «nan moo. mac. Hoo.l «Ca. moo.) Goa. moo.1 moo. moo.l coo. m~o.1 mHH. moo.

c~o.1 moo. qﬁo. mmo. HMO.) mﬁo. oHo.I «mo. oNo.I smo. NOD. one. Mao. «no.1 qmo.l mwo.

_~3.1 030.1 woo. woo. coo. oﬁo. NHO.I moo. 000.1 5N0. moo.) mmo.l ooo.1 Ono. Hmo.l mqo.

.cc. moo. NmH. .moo. moo. «No.1 «mm. mmo. mmo. oHo.I moH. oqo. omm.l wo~.1 ego. «we.

N33.I cmo. omo. mmm. coo.l moo. mmo. Nmm. NNo. 0H0. moo.) cam. moo. 000.1 mea. umm.

_oo.7 000.1 mmm. mac. 000.1 boo. mmm. «no. Mac. 500.1 nce. owe. mma. mHo.I HMO. omo.

moo. cmo. cqo. own. one. mmo.l moo. Now. nmo.l coo.l omo. com. who. moo. mmo.l Hwo.l

mmo. mac. ~ON. mmo. Oqo.| ¢~O.I ova. ONO. 50H.I NH0.1 maﬁ. ONO. mn~.1 NNO.1 mom. o~o.:

wm.u OH.nc mm.nu o~.uv mN.uu OH.nu mm.nv o~.uv mN.lt OH.nv mN.uc o~.uv mN.uv o~.uc mw.uc o~.n1

.. a. n.— ..F— o n.— .-
Omﬁnx 00 can: OH

 

i

if

<

cw monasaumm nouum

Hath maaob

cm

x 00 m~o>o~ c~zu~3
: wo m~o>o~ c~zu~3
c we m~m>m~ =~zu~3
uzo~o~uaoou co~bm~mcuou mmm~omuucq a :u~3 A0n~ no 001:0:2 1cm anm no nncv.: 0N~m

seesaw ouma~umo mo muouum ou=~oma<****
vmesam ouos~umm wo muouum mus~oma<¢zt
umEE:m mums~umo we muouuo ou=~omn<ut
mo masoum x «o mco~umu-qou n sonar

 

 

 

 

 

*«aumoN. n~m. 000. 0N~.~

***~O~. N0~. ONO. 00m. NGH. ~hc. 00¢. N~h.
xqu. m~0. ~m~. ~00. OQO. 0m0. ~N~. h-. 0m~. OMO. ~0N. 00~. ~mm. ~50. 00m. «OQ.
OmO. 000.1 0m0.1 000.1 0~0. NOO.I nm0.1 #00. nnO. 000. who. N~0.1 ~N~.1 000. mNO. «N0.
..Xﬁu. m00.1 OOOuI ~00. NOO.1 ~00. m00. 000. 000. N00. O~O.l n00.1 ~OO.1 000.1 m~0. NMO.
000.1 000. 000. N00.1 ~00. 000. ~00. ~00. ~00.1 N00.1 0~0. O~0. m00.1 0~0.1 mOO. 000.
m00.1 ~00. OmO. m00.1 000.1 .1 N00.1 MOO. m00.1 .1 n00.l 500.1 Nm~. m00.1 <00. 0~0.
O~0. v00. O~0. 0~O. m00.1 000.1 000. nNO. -0.1 «00.1 NMO. ~50. va. ~00.1 mm0.l 050.
«00.1 ~00.1 m~0. O~O. ~00. ~00. m~0. meO. 000. 000.1 ~m0. 0N0. N00.1 000. 050. ~00.
o_O. «00.1 0N0. m~0.1 «00.1 h~0. Q~O.1 0N0. mmO. 0N0. m00.1 0N0. «00.1 Nm0.1 moo. 0m~.
~00. «00. -O.1 m00.1 «~O. ~00. ~NO. 000. N~0. N00. 500. 0N0. MNO.1 .1 OmO. 5N0.

mm.nc 0~.nv mm.uc 0~.nc mN.nv 0~.lc mm.uv O~.Iv mN.nc O~.Iv mm.lu O~.nv mN.nv 0~.nv mN.uv O~.nc
:F— .P— ....- -= 0.: u: up: ...—
0m~ux Ooua Onlx O~ua
.4 ...—me

I\

:~ moumEHumm nouum

N~1n m~nmh

mm

x we m~o>o~ c~zu~3 amassm oume~umm mo mcouum muz~omn<tazu
: mo m~m>o~ :~zu~3 «weszm mumEHumo mo muouum oua~omn<ttt
u no m~u>o~ cwzuuz cmessm oume~umo mo muouuo oua~oma<tt

acmmomwuoou :owum~mccoo mmm~ocuu:~ c cu~z A0n~ no 001:0:c was Ana no mucv.: ouwm wo masoum a mo mco~umo-amu n socks

 

 

 

 

#***NMM. 00«. 000. 050.
4$9~00. ONN. ~00. m~«. ~M~. o««. ~0~. «mm.

arm0. Mm0. «M~. N«~. ONO. NmO. «MN. Mc~. m00. 000. «OM. M«~. 050. m0~. 0mm. ~mm.
Mo0.1 ~00.1 N~0.1 ~00. «00.1 mOO. ~N0. 0~O. M00.1 ~00. ~00. «00.1 N00. -0.1 NNO.1 nm0.1
M00.1 ~00.1 m~0.1 moo. ~00.1 000. -0. «~0. m~0. 0~O. 0~0.1 ~00. m00. ~00.1 o~0. mMo.1
.00. Mo0.1 0N0.1 ~00.1 N00. N00. «~0.1 «Mo.1 M00.1 ~00.1 000. ~N0. N~0. -O.1 «NO. m00.1
:00. M~0.1 «~0.1 o~0.1 n~0. m00.1 ~00.1 M00. 0~O.l «~0.1 «00.1 0~0. m00. m00.1 5N0. m~0.
:00. ~00.1 M00. MNO. ~00. m~O. 050.1 500. 000.1 «00. 0-4 “«O.1 M00. O~O.l ~«O.l 0N~.
000. m00.1 MNO.I MNO. ~00. 0~0. 0«O.l NNO.1 0N0.1 M~O. 000.1 h~0.1 M~0.I 0~O. MNO. NMO.1
~00.1 ~00.1 0M0.1 mmO. M00. -0. m«0. o«0.1 ~00. 000. «50. o~0.1 «~0.1 n«0. MNO. ««~.1
N00. 000. 000.1 0N0.1 000. «00. 0M0. ~NO. 000.1 «~0. -0.1 N~O.1 NNO. «00. Ono. Om~.

«N.nv 0~.us mm.uu O~.uc mm.uc O~.uc mm.nv 0~.uv nu.ne O~.nv mm.nv 0~.Ia MN.nv O~.uc mm.uc 0~.n=
C C :: .: :0— .C -.= ...—
Om~ux Oonx 0mm: o~nx
. b
a «4
=~ moums~umm nouum

M~1n o~aoh

A A

The expected errors of TUIW and TBIW are significantly
different from zero i.e., THIW and TBIW are biased estima-
tors of TuIW and TBIW respectively. However, their cova-
riance estimator TUBIW is an unbiased estimator of 'TUBIW

(note: TuBlwis assumed to be zero in model specification

A

section). Finally, 02 may considered to be an unbiased

estimator of 02 since zero is included in the 95% confi-

dence interval of its expected errors. Basically the error

estimates follow a similar general pattern to that observed

for the macro parameters. That is, error consistently de-
creased as either n or K or both increased. However the
intraclass correlation coefficient did not produce a consis-
tent effect across all 16 cells.

A A

Also the size of the errors in Tehqznuidzare consistent-

A

1y smaller than that of T This is due to the assumed

ulw
parameter variance in slope to that of intercept used to
generate data which is larger for 0 than is for 8
Figures 5-6 and 5-7 demonstrate these patterns graphi-
cally. In these plots the variance components are first

transformed logarithmically, corrected for bias and then

plotted against varying combinations of K and n with

d = .10 . This transformation has reversed the range of the
error of estimates in TUIW and IBIW' As a result TE'W now
has larger range of errors than TUIW Even though 02 re-

lies more on n than K, it behaves better when the compound

effects of K and n are present. For example, when d = .10

98

 

 

33—3
C71
3962

1‘25‘31

.3C21

1236

 

1

«JCS
2592 1
3962
‘3325 1
5.5

11175231

 

 

117821 2155111 1
12644 3 121 213511 12
NH 8.
A T T.
2633 2563‘
435 1265211 1 1
355312 11326211111
1 4432122 1 1 21162112 2 11 1
11325341 11 ‘62311 1
265151 21231311 12 11 1
232731 1 1 111‘2 122121 1 1
13 611111 1 111 1 1 1 1 31223 231
5 o 5 3 s
.3 3 5 0 5 a... o o . o
o o 1“ 1 1 ‘
7» 1 . . .
E S T I M A D E R R O R

60

60

60

60

153

150

60

25

150

30

ZS

K

150

10

25

K=150

K:

 

lw for

and 'TB

=.10

h d

N
1t

*0

Error estimates in transformed T

Figure 5-6.

different combinations of k and n

Symbols A-Z signify frequencies 10-35,

respectively.

Note

99

 

 

 

 

 

0‘05 A2
0'
m a 1
m 2
2
a 1
H 1
.u 1 1
3 11 11 2 1
p 2 22 31 2
1 6 1 1 3 2 4
a 55 3 1 594 36 52 331
m o 1 324 156 9 11 DH 5 391
1 225 274 6 36 1 1 522
U 1 322 6 1 2 11
2 1 2 2 3
211 11
m n15 1 3 3 2
m 2
1
w 1
1
O -03
73
.0‘5
11 s 60 5 60 s 60 5 60
25 150 25 150 25 150 25 150
K=10 K=3O K=6O K=150
62

Figure 5—7. Error estimates in transformed (3 for different
combinations of k and n with d=.lO.

Note: Symbols A-Z signify frequencies 10-35,
respectively.

100

A

with n = 150, 02 is reliable even when K = 10. But when K
increases to 30, or more, 82 becomes reliable even with
n = 60.

When the intraclass correlation is increased to .25 the
similar pattern is reproduced except the spread of errors
is generally reduced ( Figures A-3 and A-4 in Appendix ).
Notice that this adverse effect of intraclass correlation
coefficient is in contrast to its effect on macro parame-
ters discussed earlier.

Once again when multiple regression analysis is applied
to the squared errors of these transformed variance compo-
nents the foregoing discussion of differential effects of K,
n, and d is supported. Specifically, a significant linear
relationship is evidenced between the squared errors of
gulw and':;8h1 and (Kn)-1. The intraclass correlation
coefficient d, with negative slope across all variance
components, is significant only for;ulw . -1

When the effect of K and n are considered separately n
accounted for more variability in squared error of

A -1

Tuhl followed by K and then d. However, reverse happened
A -1

for .W3HJ' That is, K explained the most variability
-1
followed by n , but with no significant contribution from

d above and beyond the two predictors.

2 -1
When (I was considered, the predictor n acounted for
-1
the most variability followed by K . Furthermore, when the
-1 2
effect of these predictors were combined into (Kn) , R

increased from .25 to .31. However, intraclass correlation

101

coefficient did not contribute to either case.

Are the Variance Components Asymptotically Efficient 3

This property is examined graphically through plotting
the squared of the transformed values of the variance
components against their respective asymptotic variance

A

(Figures 5-8 and 5-9 ). The asymptotic variance of Tu'w ,

TBIW and 02 are 2/K, 2/K and Z/nK, respectively. 02

approximates its asymptotic variance quite well, as n and K
increase. The estimates of 02 seem to be reliable for all
combinations of K and n except when both take the lowest
value i.e., n = 5 and K = 10.

The two parameter variances approximate their respective
asymptotic variances quite well except when K = 10. However,

across all levels of K estimator TUIW gives a better

approximation of its asymptotic variance than that of

TBlw
Results for the Hypothesis Testing Phase

The objective for this portion of the study is to assess
the effect of variance estimation on inferences about macro
parameters with respect to both robustness and power. To the
extent that variance components are misestimated, the
estimated macro coefficients and their standard errors will
be in error thus affecting the inferences made about macro
parameters.

The situation considered is that of four different

102

 

 

 

 

 

21
A
'1' ..
ulW
E1 17 5
S
4
'T 1 z
I
10.5 ‘
1
M 1
1
7
A 2
1
'T 1
3.5 1 4
IE 2 2
1 1 3 5
9 3 l. c
D 7 C H a
0 91 s 8- a
21 A
T I
\7 B ‘1 1
17.5 1
IX 1
R“ 14
1
2
I
10.5
11
1
11 1
7 1 Z
1 1 2
(I 2 2
3 3
Z 4
El 3 5 2 3 1 3
1 1 3 '.
1 2 4 9
3 7 3 L
8 J a 2
o a a a r.
.0133 .0333 .0667 .2000

A S Y M P T O T I C V A R I A N C E , 2/k

 

Figure 5-8. Plot of transformed estimated and asymptotic
variance of E and T .
ulw Blw

Note: Symbols A-Z and * signify frequencies
10-36, respectively.

103

 

111 O
U)
19 15
#1
3
3’
.1
1% 2
11:1
0
.09 1
2
1
‘< 1
3, 3
.06 1 2
21 z 2
F, 2 1
2 1
>1 1 2 1
:z .03 2 2
1 2 4 4 z
(3 1 4 3 1 1
t0 1 1 2 4 9 2 3 6
2 1 2 4 7 4 8 c 4 8 0 6
o 6 6 6 a 6 6 6 a u 2 P 0 n L 8 9

 

 

 

1 2 3 lo 5 6 7 8 9 10 11 12 13 14 15 16

A S Y M P T O T I C V A R I A N C E , 2/nk

 

Figure 5—9. Plot of transformed estimated and asymptotic
variance of 62,

 

Where:
l=2/n4k4 2:2/113k4 3=2/n4k3 4:2/n4k2
5=2/n2k4 6=2/n3k3 7=2/n3k2 8:2/n4kl
9=2/n2k3 lO=2/n2k2 ll=2/nlk4 12:2/n3kl
13:2/nlk3 14:2/n2k1 15:2/n1k2 16:2/n1k1
kl=10 k2=30 k3=60 k4=150
n1: 5 n2=25 n3=60 n4=150

Note: Symbols A-Z and * signify frequencies
10-36, respectively.

104

combinations of K and n along with five other factors
described in Chapter IV. The design for this part of the
study allows for an assessment of robustness and power under
unknown variance components when: (1) total sample size is
considered, N; (2) the number of groups is varied (K = 10,
30, 60, and 150); (3) group size is varied (n = 5, 25, 60,
and 150); and (4) the intraclass correlation coefficient is
varied (d = .10, and .25).

The data for the first part consists of 640
replications, parts two and three contain 160 replications
each of four combinations of K and n, and the last part is

based on 320 replications each of two combinations of d.

Robustness Under Various Conditions

This section evaluates the effect of variance estimation
on tests of macro parameters based on total sample size, N,
different levels of K, n, and intraclass correlation
coefficient, d. Since the data are randomly generated via
Monte Carlo methods, random error in the data must be
considered. To take this error into account, the standard
error (8.3.) of a proportion for a sample size equal to the
number of replications is employed. The 3.3. for a
proportion is estimated by ( P(1 - P)/N)1/2 , where P is the
true value of the proportion, and N equals the number of
replications. Since the true value of P (i.e., nominal
alpha) is known, this formula is used to calculate the 8.3.

at the three nominal alpha levels considered. These are

given in Table 5-14.

105

Table 5-14

Standard Errors for Nominal Alpha Levels
and Number of Replications Used in the Study

 

 

Alpha N=640 N=320 N=160
.01 .0039 .0056 .0079
.05 .0086 .0122 .0172
.10 .0119 .0168 .0237

 

With a reduction in the number of generated data sets
comes an increase in standard errors. Given known parameters
(i.e., nominal alpha levels), the standard error of a
proportion may be used to calculate confidence intervals
around the known parameters instead of probability intervals
around the sample estimates. Using the standard procedure,
95 and 99 percent confidence intervals for the three nominal
levels considered are presented in Table 5-15. Thus,
obtained alpha levels within these intervals may be

considered to be within sampling error of nominal alpha.

Total Sample Size and Robustness

 

Table 5-16 contains the actual alpha levels for all
parameters under central unknown variance components
situation when all combinations of K and n considered
together. The empirical type I error rates are consistently

exceeding the nominal error rates across all macro

106

Table 5-15

Probability Intervals for Nominal Alpha Levels
and Number of Replications Used in the Study

 

a) 95% Probability Intervals

 

 

 

 

 

 

Alpha N=640 N=320 N=160
.01 (.0024, .0176) (.0000, .0210) .0000, .0255)
.05 (.0331, .0669) (.0261, .0739) .0163, .0837)
.10 (.0767, .1233) (.0671, .1329) .0535, .1465)
b) 99% Probability Intervals
Alpha N=640 N=320 N=160
.01 (.0000, .0201) (.0000, .0245) .0000, .0304)
.05 (.0278, .0722) (.0185, .0815) .0056, .0944)
.10 (.0693, .1307) (.0567, .1433) .0389, .1612)
Table 5-16
Type I Error Rates for Tests of Macro Estimators
Under a True Nu11*
e.s 01 =.Ol 0‘ -.OS 0% =.10
Y 00 0 .020“ .059 .119
Y 01 .31 .022** .066 .119
Y 10 .76 .013 .061 .112
Y 11 .16 .019** .072** .122

 

*From 640 replications with e.s. effect size.

#*Outside the 95% confidence interval.

107

parameters. Especially the error rates are relatively large

01 and 111 than that of 710

most values tend to be within 95% confidence intervals. The

when testing YOO ,y However,

exceptions are for y and Y01 at a = .01, and for at

00 Y11
.01 and .05 alpha where all are within 99% confidence
interval but Y01 . When outside the probability intervals,

empirical alpha levels are all liberal.

Number of Groups and Robustness

 

Tables 5-17 through 5-20 (part a) present the type I

error rates for tests of macro estimators Y00 and

' Y01’ Y10
Yll' respectively when the number of groups varied, with K =
10, 30, 60, and 150. The values for all macro parameters
tend to be within 95% confidence intervals of the nominal
alpha across all K levels. When outside the confidence
interval, empirical significance levels are all liberal.
Exceptions are for'y10 with K = 30 at .01 and .05 alpha
and for'y11 with K = 30 and 150 at .05, .10 and .05 alpha
respectively. However, these values are typically within
99% confidence interval. An unexpected finding from this set
of results is that for a given macro parameter the largest

type I error occurred randomly regardless of number of

groups .

Sample Size and Robustness
Tables 5-17 through 5-20 (part b) give the type I error

rates for tests of macro parameters-y00 7 and‘rll

' Y01’ Y10
respectively for experimental conditions with n = 5, 25, 60,

108

Table 5-17

Type I Error Rates for Test of Macro Parameter,'Y00 ,
Under a True Null

 

a) For different number of groups, k.*

k e.s. 0‘=.01 o‘=.05 0‘=.10
10 0 .025 .075 .119
30 .019 .075 .138
60 .013 .031 .087

150 .025 - .056 .131

*From 160 replications with e.s. effect size.

b) For different group size, n.*

n e.s. ' 6=.01 OL=.05 OL=.10
5 o .025 .063 .112
25 .019 .050 .119
60 .019 .050 .119
150 .019 .075 .125

*From 160 replications with e.s. effect size.

c) For different intraclass correlation coefficients, d.*

d e.s. 9:.01 “=.05 “=.10
.10 0 .022** .047 .100
.25 .019 .072 .138

*From 320 replications with e.s. effect size.

 

**Outside the 95% confidence interval.

109

Table 5—18

Type I Error Rates for Tests of Macro Parameter,)%)1,
Under a True Null

 

a) For different number of groups, k.*

k e.s. q =.01 a =.05 9.=.10
10 .31 .025 .081 .138
30 .019 ' .056 .112
60 .025 .050 .100

150 .019 .075 .125
*From 160 replications with e.s. effect size.
b) For different group size, n.*
n e.s. o.=.01 a.=.05 a.=.10
**

5 .31 .031 .075 .138

25 .32 .013 .050 .081
** **

60 .31 .031 .081 .156

150 .32 .013 .056 .100

*From 160 replications with e.s. effect size.

c) For different intraclass correlation coefficients, d.*

d e.s. 01:.01 0t=.05 0:.10

.10 .22 .019 .053 .109
** **

.25 .41 .025 .078 .128

*From 320 replications with e.s. effect size.

 

**Outside the 95% confidence interval.

110

Table 5-19

Type I Error Rates for Tests of Macro Parameter,Y10 ,
Under a True Null

 

a) For different number of groups, k.*

k e.s. 0:.01 10:.05 01:.10

10 .76 .013 .044 .106
** **

30 .031 .094 .119

60 .006 .056 .106

150 .000 .050 .119

*From 160 replications with e.s. effect size.

b) For different group size, n.*

h e.s 01:.01 01 =.05 0t =.10
5 .73 .006 .050 .100
25 .78 .006 .075 .144
60 .73 .025 .081 .125
150 .78 .013 .038 .081

*From 160 replications with e.s. effect size.

c) For different intraclass correlation coefficients, d.*

d e.s a =.01 a =.05 a =.10
.10 .72 .019 .059 .103
.25 .79 .006 .063 .122

*From 320 replications with e.s. effect size.

 

**0utside the 95% confidence interval.

111

Table 5-20

Type I Error Rates for Tests of Macro Parameter,Y11.
Under a True Null

 

a) For different number of groups, k.*

k e.s. a =.01 a =.05 a =.10
10 .16 .019 .069 .112
** **
30 .025 .087 .150
60 .019 .038 .087
**
150 .013 .094 .138

*From 160 replications with e.s. effect size.

b) For different group size, n.*

n e.s. a=.01 a=.05 a=.10
5 .16 .006 .044 .081
**
25 .17 .031 .069 .106
60 .16 .013 .081 .138
150 .17 .025 .094** .162**

*From 160 replications with e.s. effect size.

c) For different intraclass correlation coefficients, d.*

d e.s. 0:.01 OL=.05 01:.10
.10 .11 .025“ .081** .128
.25 .22 g .013 .063 .116

*From 320 replications with e.s. effect size.

 

**Outside the 95% confidence interval.

112

and 150. For all macro parameters, actual significance
levels tend to be within the 95% probability intervals of
nominal values across all levels of n. When outside the
confidence intervals, empirical alpha levels are all
liberal. Exceptions are forY01 with n = 5 at .01 alpha and
with n = 60 at .01 and .10 alpha, and forYEI with n = 25 at
.01 alpha and with n = 150 at .05 and .10 alpha levels.
However, all values are typically within 99% probability
intervals of the nominal alpha. Again no pattern emerged.
That is, for a given macro parameter the largest type I
error occurred randomly regardless of sample size and effect
size. The only exception is for Y01 where departure from
nominal alpha was fairly small with an increase in effect

size with n = 25 and 150.

Intraclass Correlation Coefficient and Robustness

Tables 5-17 through 5-20 (part c) report the type I
error rates for tests of macro estimators YOO , Y01 , Ylo ,
and'Y11 , respectively when the intraclass correlation
coefficient varied, with d = .10 and .25. Once again most
values tend to be within 95% confidence intervals of the
nominal alpha for both levels of d. The values outside of
this interval are all liberal. However, values not contained
within 95% confidence intervals are all within 99%
probability interval. These values are: YOO with d = .10

at a = .01; with d = .25 at a = .01 and .05; and Y11

Yo1
with d = .10 at a = .01 and .05. Again no pattern emerged

113

with respect to d and /or effect sizes.

Power Under Various Conditions

The goal of this portion of the study is to evaluate the
power of the tests of macro parameters in rejecting the null
hypothesis under unknown variance components situation by
considering total sample size, N, different levels of K, n
and intraclass correlation coefficient, d. The empirical
estimates of power (P') may also be compared to the
theoretical values of power (P") obtained through nominal
noncentrality parameter discussed in Chapter IV.

Because of the way the null and true alternative
hypotheses are set up (see Chapter IV) the implementation
and discussion of power analysis will be limited only to

macro parameters YOI , and Y11 . The actual parameter

’ Y10
value of Yoo is set at zero. Thus, power analysis cannot be

applied.

Total Sample Size and Power

 

As shown in Table 5-21 the empirical power for all macro
parameters considering all four combinations of K and n
together are quite high.

0f the three macro parameters 710 consistently obtains
the highest power, followed by Y01 and Y11 . This pattern
of order in power is highly consistent with the magnitude of

the effect sizes in the macro parameters. Within each macro

parameter power is always larger at larger nominal levels.

114

Table 5-21

Power for Tests of Macro Parameters*

 

e.s. p p p p p'** p"***

‘Y01 .31 .9382 .9495 .9846 .9881 .9932 .9949
'Y10 .76 .9999 .9999 .9999 .9999 .9999 .9999

'Y11 .16 .8023 .7995 .9292 .9265 .9625 .9616

 

*From 640 replications with e.s. effect size.
**Empirica1 power
***Nomina1 power

The power estimates of macro parameters are either equal
or very close to the theoretical values (differences are
statistically insignificant). Within each nominal alpha
level, empirical power is smaller than the nominal power for

YOI . The situation is reversed with respect to Yll

Number of Groups and Power

Tables 5-22 through 5-24 (part a) give empirical and

nominal power for macro parameters and Y11 ,

Y01 ' ‘Y10 ’
respectively when the number of groups varied, with K = 10,
30, 60, and 150. Again Y10 has the highest power across all

levels of K and all alpha levels. with effect size

Y01
e.s. = .31 obtains the next highest place but reaches the
same degree of power (.9999) with K = 150 at all levels of
alpha. On the other hand, Yll obtains the same power with

the same value of K but only at a = .05 and .10.

115

Table 5—22

Power for Tests of Macro Parameter

Y01

 

a) For different number of groups, k.*

a: =.Ol C1 =.05 C1 =.10
k e.s. pl pt! pl p" p!** p!'***
10 .31 .2546 .2119 .4801 .4286 .6064 .5517
30 .6736 .7054 .8577 .8770 .9162 .9292
60 .9686 .9772 .9934 .9955 .9974 .9983
150 .9999 .9999 .9999 .9999 .9999 .9999
*From 160 replications with e.s. effect size.
b) For different groups size, n.*
0. =.01 a =.05 OL =010
n e.s. p" p" pl p" p! p"
5 .31 .6141 .7517 .8186 .9015 .8888 .9463
25 .32 .9406 .9564 .9854 .9898 .9936 .9959
60 .31 .9772 .9778 .9956 .9957 .9984 .9984
150 .32 .9893 .9846 .9982 .9973 .9993 .9989
*From 160 replications with e.s. effect size.
. c) For different intraclass correlation coefficients, d.*
a =.01 a =.05 0' =.10
d .5. pi. p" pl p" p! p"
.10 .22 .8997 .9265 .9713 .9808 .9864 .9913
.25 .41 .9641 .9664 .9920 .9927 .9968 .9971

*From 320 replications with e.s. effect size.

 

**Empirical power
***Nominal power

116

Table 5-23

Power for Tests of Macro Parameter

Y10

 

a) For different number of groups, k.*

01=.01 a=.05 01:.10
k e.s. p' p" p' p" p'** p"***
10 .76 .9999 .9999 .9999 .9999 .9999 .9999
30 .9999 .9999 .9999 .9999 .9999 .9999
60 .9999 .9999 .9999 .9999 .9999 .9999
150 .9999 .9999 .9999 .9999 .9999 .9999

*From 160 replications with e.s. effect size.

b) For different group size, n.*

a =.Ol a =.05 a =.10
k 6-8- p' p" p' p" p' p"
5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .78 .9999 .9999 .9999 .9999 .9999 .9999
60 .73 .9999 .9999 .9999 .9999 .9999 .9999
150 .78 .9999 .9999 .9999 .9999 .9999 .9999

*From 160 replications with e.s. effect size.

. c) For different intraclass correlation coefficients, d.*

a =.01 a =.05 a =.10

d e.s. p' p" p, p" p' p"
.10 .72 .9999 .9999 .9999 .9999 .9999 .9999
.25 .79 .9999 .9999 .9999 .9999 .9999 .9999

*From 320 replications with e.s. effect size.

 

**Empirical power
***Nominal power

117

Table 5-24

Power for Tests of Macro Parameter

Y
11

 

a) For different number of groups, k.*

a =.Ol a =.05 ‘1 =.10
k e o S . p! p" p! p" p 0 ** p"***
10 .16 .1423 .1314 .3264 .3050 .4443 .4247
30 .5040 .4840 .7357 .7190 .8264 .8159
60 .8665 .8621 .9582 .9564 .9798 .9783
150 .9996 .9997 .9999 .9999 .9999 .9999
*From 160 replications with e.s. effect size.
b) For different group size, n.*
a = 01 a = 05 a =.10
n .S. p! p" p! p" p! p"
5 .16 .2743 .3156 .5040 .5517 .6293 .6736
25 .17 .7422 .7357 .8962 .8944 .9429 .9406
60 .16 .9162 .9082 .9767 .9744 .9896 .9881
150 .17 .9767 .9693 .9955 .9936 .9982 .9974
*From 160 replications with e.s. effect size.
c) For different intraclass correlation coefficients, d.*
a =.Ol a =.05 a =.10
d .S. p! P" p! p" p! P"
.10 .11 .6915 .6808 .8686 .8621 .9236 .9207
.25 .22 .8849 .8849 .9656 .9656 .9834 .9834

*From 320 replications with e.s. effect size.

 

HEmpirical power
***Nomina1 power

118

As shown in Figure 5-10 with K = 150 the power curves
for the three macro parameters are indistinguishable. With
respect to number of groups, power is best with K = 60 and
150 and worst with K = 10 when ‘031 is considered and best
with K = 150 and worst with K = 10 for 111

Empirical powers are always close but smaller than the
theoretical powers for ‘le except with K = 10 and close but
larger than the nominal power for ‘Yll except with K = 150
(differences between P' and P" are atatistically

insignificant).

Sample Size and Power

 

Tables 5-22 through 5-24 (part b) give tha actual and
nominal power for macro parameters ‘y01, 710 , ands-y11 ,
respectively for groups of size 5, 25, 60 and 150. Across
all levels of n power improved relative to what it was when
levels of K was considered. This was consistent across all
macro parameters and three alpha levels (to evaluate simul-
taneous effect of K and n on empirical and theoretical power

see Tables A-4 through A-6 in Appendix ). the macro

Y10
parameter with the larger effect size gained the most power.
With each macro parameter power increased as either effect
size, group size, or alpha level increased.

As shown in Figure 5-11 again with n = 150 the power
curves for the three macro parameters are indistinguishable.

With regard to sample size, power is best with n = 25 or

, but only with n = 60 and 150 when 'Y is

above for Y 11

01
considered.

119

1.00

.90

.80

.70

.60

.50

.40

.30

.20

.50

.40

.10

1.00

.90

.80

.70

.60

.50

I40

.30

.20

.10

 

 

 

/' .___.¥10
I “‘91 ———“Y01

”"”"Y11
o———+————-

    

 

o———-¢Y1 0
a --05 Y01
_ """ ‘Y11

 

 

 

 

 

/

___]10

~ ----- -ﬂ11
I l l I i l l l l l l l I f I
10 30 60 150

 

Power curves of Y 01, 710 and ‘Y11 for

different no. of groups, R.

120

 

.90

.80

.70

.60

.50

.40

.30

.20

.10

1.00

.90

.80

.70

.60

.50
.40

.30

.10

E 1.00

Figure 5-11.

.90

.80

.70

.60

.50

.40

.30

 

.1

_————_—_———

 

..-..Yio
- -.m Y01
”“"‘Y11

 

 

- h-—d_-_- —¥

’—-—0_o—o—.--—.— n—o- 0-4—ﬁ
a

.-—-<Y10

a -.m ‘————¥01
' ____X11

 

 

 

 

 

 

Power curves of'YOI, Y10 and 111 for
different group sizes,n.

121

Although statistically insignificant nominal power is
larger than empirical power for 701 except when n = 150.
The situation is reversed when Y11 is considered. That
is, empirical power always exceeds theoretical power except

with n = 5

Intraclass Correlation Coefficient and Power

Part c of Tables 5-22 through 5-24 reflects the effect
of intraclass correlation coefficient on power of the macro
parameters YOI' Y10 and Y11 , respectively. Within each
macro parameter power increased as intraclass correlation
coefficient increased. This upward trend was consistent
across all macro parameters. But this is an artifact due to
the effect size. Notice that across all macro parameters
larger values of effect size are paired with the larger
values of intraclass correlation coefficients. Obviously,
the highest power is attained by Ylo with the largest
effect size. Deviations from nominal power was consistently
small and statistically insignificant for all macro parame-
ters. Tables A-7 through A-12 in Appendix represent
differential effect of K, n, and d on empirical and nominal

powers of the macro parameters.

E5 Algorithm: Rate 92 Convergence

 

 

Often EM algorithm is praised for its simplicity of
implementation and numerical stability. Nonetheless, its
rate of convergence is criticized to be slow. A concern here

with this algorithm is its rate of convergence under varying

122

"1...:21 ,

combinations of K and n used in this study.

The convergence criterion was set at .0001 with the
maximum number of iterations allowed fixed at 500. This
limit was reached only three times out of 640; two occasions
with K = 10 and n = 5, and one occasion with K = 30 and
n = 5.

Table 5-25 reports average convergence rate of EM

algorithm with respect to sixteen different combinations of

 

K and n.
Table 5-25
Average Convergence Rate of
EM Algorithm
KalO K=30 K=60 K=150
n=5 259.125 148.750 116.750 106.875
n=25 49.750 19.250 15.625 7.875
n=60 17.875 6.625 3.875 2.125
n=150 3.625 2.125 2.000 2.000

 

There is a clear downward trend in average number of
iterations prior to convergence to ML estimates both with

regard to K and n . Within each level of n the average

123

number of iterations consistently reduced with an increase
in K. This reduction is even more dramatic as a result of n
increasing within any level of K. This is an evidence to the
pronounced effect of n over K in reducing the rate of con-
vergence in EM algorithm. However, any combination of K =
30, 60 and 150 with n = 60 and 150 results in a reasonably
small number of iterations and even more so when n = 150 is

combined with any level of K.

124

CHAPTER VI

DISCUSSION

The results presented in the previous chapter provide an
indication that maximum likelihood estimators of macro para-
meters generated by the EM algorithm behaved as expected.
Conclusions based on these results will be presented in this
chapter, followed by guidelines for the researcher analyzing

multi level data and suggestions for future research.

Conclusions

 

Under the conditions considered in this study, it
appears that:

(1) Macro coefficients generated by the EM algorithm
are asymptotically unbiased and consistent. However, as a
result of prefixing parameter c = “EB/Tu to values .10 and

A A

.50 errors in yoo and Y01 estimates tended to be slightly

higher than that of 910 and 911 . This reflected itself in
dispersion measure where again was higher among 900 and
901 . Additionally, error estimates are considerably affec-
ted by the number of groups and that their precision is
directly proportional to K. But, group size (n) does not
have a consistent effect on the precision, whereas intra-
class correlation coefficient is proportionally related to

the squared error of the macro parameters.

125

(2) Macro coefficients also tend to be asymptotically
efficient. Their variances when stabilized closely approxi-
mate Cramer Rao minimum variance bound. Slopes of regression
lines are over .98 across all macro coefficients.

(3) As far as variance components are concerned 92 is
extremely well behaved, while the two parameter variances
tend to have slightly positive bias. Within group variance

A

0.2

is considerably affected by group size. However, an
increase in the number of groups will farther reduce the

A A

estimated errors in 02 . On the other hand, T W is only

11 l
influenced by group size, and IBIW by number of groups. In
contrast to macro parameters precision in the variance com-
ponents is directly related to the intraclass correlation
coefficient. All three variance components approximated
their asymptotic variance quite well.

(4) Additionally, test statistics for all macro coeffi-
cients resemble normal curve except with slightly heavier
tails. This manifests itself in a larger empirical error
rates which fall beyond two standard errors from the nominal
alpha levels in less than half of the times.

(5) Furthermore, these tests maintain reasonable power
under unknown variance components, except for small sample

sizes. Specifically, macro coefficient with the largest

Y1o
effect size has the lead followed by Y01 and Y11 , respecti-
vely. Notice that Y01 despite having a lower precision ob-
tains a higher power over Yll across all levels of K and n.
This is the case since YOI has an effect size which is

126

double the size of that for Yll

Guidelines for the Researcher

 

The analysis of multilevel data may be undertaken with a
focus on estimating either micro parameters or macro
parameters or both. From the perspective of shrinkage
estimation, macro coefficients are merely vehicles for
improving estimation of micro effects. However, macro models
themselves may be of great importance primarily because they
enrich the class of research questions asked about
educational effects.

Design of research to optimize allocation of resources
is contingent upon the goal and scope of the study. It is
helpful to consider three different cases. In the first
case, suppose the interest is focused on the estimation of
micro parameters at school level with little or no concern
for a broader scale. With a large enough data gathered from
within a particular school (micro unit) micro parameters can
be estimated with reasonable precision i.e., sampling
variance V will be near zero. On the other hand, the micro
parameters can be estimated by utilizing data not only from
that particular school but from many similar schools
In this situation macro coefficients become a secondary
concern merely used for improving estimation of micro
parameters through empirical Bayes shrinking estimator. On
the other extreme, suppose the scope of the study goes
beyond one school system and has a potential of becoming a

national policy. This immediately entails consideration of

127

more than one group at the price of reducing the within
groups allocation. In this case, estimation of macro models
are of primary and only concern. Notice that situations two
and three reflect different conceptualizations of macro
models. Nonetheless, as far as estimation of macro coeffi-
cients is concerned the two situations are identical.

Since the focus of the present study is on the macro
parameters, only situations two and three will be addressed.
Either situation is concerned with the same question: How
can research resources be allocated for the sake of estima-
ting macro coefficients ? Specifically, how many subjects
should be sampled from within groups (n) and how many groups
(K) should be selected ?

Obviously, determination of statistical power is of
primary concern as a preinvestigation procedure. Findings
from this study provide strong evidence that the tests of
macro parameters maintain reasonable power under unknown
variance components, except for small sample sizes. Tables
A-4 through A-6 in Appendix may be consulted for an expected
power value for different combinations of K and n . Further-
more, Tables A-7 through A-9 in Appendix provide power
values for different combinations of K and n along with
effect size for macro parameters in the population. However,
allocation of samples within and between macro units would
not be optimal unless implemented through maximization of
power for a given cost. That is, selection of any combina-

tion of K and n for an expected power should be done by

128

minimizing total cost.

Assume the design that has been adopted for a particular
study calls for the subjects to be selected from within
micro units (e.g. schools) with moderately low intraclass
correlation coefficients ( i.e., d = .10 ). Additionally
assume that the effect size is approximately equal to .10.
The experimenter now needs to consider the same question of
optimal allocation of samples within and between units.
First consideration should be given to the cost of data
collection per group and per subject within a group. The
second consideration pertains to the power of the test. The
researcher may want a test which is powerful enough to
detect even small differences if they exist (e.g. .80 or
better). By utilizing information contained in Tables A-6
through A-12 the following combinations of K and n would
result in power of at least .81 for tests of macro parame-
ters:

1) K
n

30 2) K
150 n

60 3) K
25 n

150
5

Obviously, selection of any of these combinations is
determined by the relative cost of a unit of K and n, and by
a given budget allocation. On the other hand, with an
improvement in intraclass correlation coefficient of .25 and
of an effect size of approximately .20 one can obtain the
same magnitude of power for tests of macro parameters even
with K = 30 and n = 60. It is worth noting that the

suggested n in each set indicates the minimum allocation.

129

Assuming more budget, this can be exceeded to obtain even a
higher power.

As far as the robustness is concerned the results from
this study, although of preliminary nature, indicate that
macro coefficients' tests are all liberal. Thus, inferences

about macro parameters should be made with great caution.

Suggestions for Future Research

 

The guidelines just discussed are based on a
standardized two-stage hierarchical linear model with unit
normal predictors at both levels of hierarchy. An aspect
that needs to be considered although it is quite likely that
this would not produce any radical changes in the results is
to let both predictors assume general normal distributions.
Moreover, consideration should be given to multiple
predictors at each stage of hierarchy, thereby allowing
interaction effect to be studied as well. Also, further
research should let the within group variance matrix 2
assume a general matrix form. In the present study it was
assumed that 2 is a diagonal matrix with equal diagonal
elements.

Additionally, the intraclass correlation coefficient on
micro level predictor x would be a relevant factor to be
considered. Homogeneity of groups on x assumed by the
present study reduces the mixed model to being a fixed
effect model. But in cluster sampling physical distribution
of the population is generally not random but characterized

by some homogeneity. Thus, fixing this coefficient at zero

130

produces an unrealistic situation.

Although findings from this study are of a preliminary
nature, they provide strong evidence that, even for small
number of replications, tests of macro parameters are
reasonably powerful under unknown variance components,
except for small sample sizes. Nonetheless to substantiate
or refute the results of the present study with regard to
robustness to estimated variance components, consideration
should be given to a large scale study of a similar nature.
Given a larger number of replications, it is expected to

achieve some improvements in robustness.

131

 

APPENDIX

132

 

00

A V1

25931
11255
636

7‘5...

3181
16861
1:1“5 1

8631

6 13512:.

1

1‘

1

 

01

111

1

15‘3‘
12.61
1‘78

635 1

26931
67.24
1 ...-3531

116163 2

157151

 

 

11119.25 1.2238121
3332‘s,. 2 11‘6‘3 1
226225 1 1 323113221
2 52431. 32111163 21. 1.
:9 11.33512 1 1 32523 2!.
1. 3253 11 9.. 9.1213 332 3
2 123223 1‘. 311A 1 :9 ‘ 2 1
5 5 o 5 7 s s 7 5 o s 7 s
0 3 3 o o 0 o 3 3 o 0
o o o u o o o o q o
1. . 1a 1 . 1
. .
E S T I M. A E E R R O R

 

6O

60

60

60

153

K=150

150

60

25

K

150

30

25

150

25

K:

K=10

 

Error estimates inY'OO and Y 01 for different

combinations of k and n with d=.25.

Figure A-l.

Symbols A-Z signify frequencies 10-35,

respectively.

Note

n.

133

 

 

153
.361
3.9521

‘03

22c...
5°21
12‘8‘1

35732

 

36 1
3F:
“-11

140 2

3c32
5?...
1111951

1950.1

 

 

1
0 ..l
aVu
139232 127A
112C31 1 2334
116333 13344
1164331 1 11164322
249112 1 111165231 1
23653 1 13365
1 1 ‘32‘32 1 321552 1
3 2 122.61 6 1 122136211 11
5 s o s S s 7 s o 5 7 5
0 3 3 0 o o 3 3 o o
o o o o o o o . o
1 _ 1 1 u 1
. .
pa es mi T. M” pa nu pm p“ n“ no on

60

60

60

60

150

25

150

60

25

150

30

25

150

25

K=150

10

K

 

Error estimates in Y10 and Y 11 for different

combinations of k and n with d=.25.

Figure A-2.

Note: Symbols A-Z signify frequencies 10-35,

respectively.

134

 

 

 

2F3
361
6:.-

20‘1

3:53
587
7311

2‘331

 

3F).
5011
59‘ 1

1118152

677
167.51
121922 2

111632221

1

1

 

 

u. B
aT.
AT.
2631‘2 1:.‘5
2‘71 1693 1
16922 1 352‘311
‘326 131 1 122 5323
121219111 1 11332‘311
1163232 1 1 2256 2111
1.5363 1 312531 ‘51 1
z 333312.96 334.1 1111 22
s 5 o S s a: 3 s o ...” ‘H
o o o o o o a.
‘ 1 1 ‘ l‘ 1 1
. g - .
E S T I M D E R R O R

60

60

60

60

153

25

150

25

150

25

150

10

25

K=150

K=60

30

 

for

81%.;5

and T
1th d=

W

11
11

Error estimates in transformed'r

Figure A-3.

different combinations of k and

Symbols A-z signify frequencies 10-35,

respectively.

Note

135

 

 

 

 

.55 #
Az
0’
['11
.3
U)
1 3
1-3 1
H 1 1 1
.15 22 2
3 211 1
3:1 23 21 1
P3 122 32 331 1
2111 3554. 21.4. 2 z
[11 0 2365 379 5701 bhHJ
U 312‘ 1557 1611 6811
2‘1 113 12 22
111 11 1 1
1 4.
Fl -.15 1 1
SO 1 3 1
:1 2
1
O -03 1
w
-.‘5
n5 60 560 5 60 5 60
25 150 25 150 25 150 25 153
K=10 K=30 K=6O K=150

A2
Figure A-é. Error estimates in transformed G for different
combinations of k and n with d=.25.

Note: Symbols A-Z signify frequencies 10-35,
respectively.

136

Table A—1

Mean and Variance of the z-Statistics for the
Macro Parameters for Different Number of Groups*

 

Z Z Z Z

Yoo Y01 Y10 Y11
K
Mean .004 .085 .005 .073
10
Variance 1.185 1.187 .956 1.079
Mean .070 -.087 -.104 ~.007
30
Variance 1.120 1.094 1.248 1.232
Mean .048 .001 .039 .066
60
Variance .847 1.101 1.018 1.008
Mean .031 -.001 -.102 .158
150
Variance 1.153 1.171 .971 1.195

 

*From 160 replications.

137

Table A—2

Mean and Variance of the z-Statistics for the
Macro Parameters for Different Group Sizes*

 

z z z z
Yoo Y01 Y10 Y11
n -
Mean .111 .034 .004 .161
5
Variance 1.084 1.292 .990 .914
Mean .080 -.007 -.071 -.007
25
Variance 1.023 .943 1.130 1.212
Mean .009 -.034 .016 .028
60
Variance 1.018 1.235 1.145 1.105
Mean -.048 .005 -.111 .107
150
Variance 1.167 1.095 .934 ' 1.279

 

wv

*From 160 replications.

138

 

Table A-3

Mean and Variance of the Z-Statistics for the
Macro parameters for Different Intraclass
Correlation Coefficients*

 

 

zY zY zY zY
00 01 10 11

d
Mean .116 .052 -.056 .043
'10 Variance .970 1.062 1.080 1.192
Mean -.040 -.053 -.025 .102
'25 Variance 1.165 1.209 1.018 1.063

 

*From 320 replications.

139

1l

Table A— 4

Power for Tests of Macro Parameter

Y *

 

01
(1:,01 a3605 08.10
k n .s. p' p" p' p" p'** pn***

10 5 .31 .1423 .1762 .3264 .3783 .4443 .5000
25 .32 .2005 .1762 .4090 .3783 .5359 .5000

60 .31 .2514 .2327 .4801 .4562 .6026 .5793

150 .32 .4761 .2709 .7088 .5040 .8078 .6255

30 5 .31 .3336 .4483 .5714 .6844 .6915 .7881
25 .32 .7486 .7517 .9015 .9032 .9452 .9474

60 .31 .7517 .7454 .9032 .8997 .9463 .9441

150 .32 .8106 .8315 .9332 .9429‘ .9656 .9706

60 5 .31 .6331 .7823 .8315 .9192 .8980 .9564
25 .32 .9783 .9854 .9957 .9974 .9984 .9990

60 .31 .9940 .9934 .9990 .9990 .9997 .9997

150 .32 .9951 .9946 .9992 .9991 .9998 .9997

150 5 .31 .9898 .9987 .9984 .9998 .9994 .9999
25 .32 .9999 .9999 .9999 .9999 .9999 .9999

60 .31 .9999 .9999 .9999 .9999 .9999 .9999

150 .32 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 40 replications of k groups of size n with e.s. effect size.
**Empirical power

***Nomina1 power

140

 

Table A- 5

Power for Tests of Macro Parameter

 

Y10
a =.01 <1 =.05 a- =.10
k n e.s. p' p" p' .p" p'** p'hsm-n-

10 5 .73 .8051 .8980 .9306 .9699 .9633 .9861
25 .78 .9999 .9999 .9999 .9999 .9999 .9999

60 .73 .9999 .9999 .9999 .9999 .9999 .9999

150 .78 .9999 .9999 .9999 .9999 .9999 .9999

30 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .78 .9999 .9999 .9999 .9999 .9999 .9999

60 .73 .9999 .9999 .9999 .9999 .9999 .9999

150 .78 .9999 .9999 .9999 .9999 .9999 .9999

60 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .78 .9999 .9999 .9999 .9999 .9999 .9999

60 .73 .9999 .9999 .9999 .9999 .9999 .9999

150 .78 .9999 .9999 .9999 .9999 .9999 .9999

150 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .78 .9999 .9999 .9999 .9999 .9999 .9999

60 .73 .9999 .9999 .9999 .9999 .9999 .9999

150 .78 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 40 replications of k groups of size n with e.s. effect size.
**Empirical power
***Nominal power

141

 

Table A—6

Power for Tests of Macro Parameter

 

Y11
a =.01 01 =.05 a=.10
k n e.s. p' p" p' p" p'** p"***
10 5 .16 .0418 .0495 .1314 .1515 .2119 .2358
25 .17 .1112 .1170 .2709 .2843 .3859 .3974
60 .16 .2451 .2061 .4681 .4207 .5948 .5478,
150 .17 .2676 .2005 .5000 .4129 .6217 .5359
30 5 .16 .1492 .1788 .3372 .3783 .4562 .5040
25 .17 .3974 .3669 .6406 .6103 .7486 .7224
60 .16 .5160 .5080 .7454 .7357 .8340 .8289
150 .17 .9015 .8665 .9719 .9573 .9868 .9793
60 5 .16 .2546 .2810 .4840 .5160 .6064 .6368
25 .17 .8577 .8531 .9535 .9515 .9772 .9761
60 .16 .9656 .9573 .9925 .9904 .9970 .9961
150 .17 .9871 .9854 .9978 .9974 .9991 .9990
150 5 .16 .8461 .8888 .9484 .9664 .9744 .9842
25 .17 .9987 .9987 .9999 .9998 .9999 .9999
60 .16 .9999 .9999 .9999 .9999 .9999 .9999
150 .17 .9999 .9999 .9999 .9999 .9999 ' .9999

 

*From 40 replications of k groups of size n with e.s. effect size.
**Empirical power
***Nominal power

142

Table A-7

Power for Tests of Macro Parameter

 

701*
a =.0]. 0' 3.605 a =.10

d k n e.s. p! p" pt I)" pv** ptl***
.10 10 5 .22 .0735 .1251 .2005 .2981 .3015 .5871
25 .0838 .0778 .2206 .2119 .3264 '.3121

60 .1841 .1587 .3859 .3520 .5120 .4721

150 .4681 .2776 .7054 .4920 .8023 .6331

30 5 .2611 .4207 .4880 .6591 .6141 .7673

25 .6950 .7019 .8708 .8729 .9251 .9279

60 .6808 .6736 .8599 .8577 .9192 .9162

150 .8133 .8485 .9332 .9505 .9656 .9750

60 5 .5557 .7257 .7734 .8888 .8577 .9370

25 .9778 .9875 .9957 .9978 .9984 .9991

60 .9927 .9934 .9988 .9990 .9996 .9997

150 .9920 .9927 .9987 .9988 .9995 .9996

150 5 .9319 .9875 .9826 .9978 .9922 .9991

25 .9999 .9999 .9999 .9999 .9999 .9999

60 .9999 .9999 .9999 .9999 .9999 .9999

150 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 20 replications of k groups of size n with d intraclass correlation coefficient

of e.s. effect size.
**Empirical power
***Nominal power

143

Power for Tests of Macro Parameter

Table A-8

 

Yolak
a =.01 ‘1 =.05 01 =~10
d k n e.s p! p" pl p" p 1*.” P"***
.25 10 5 .39 .2451 .2389 .4721 .4641 .5948 .5871
25 .43 .3783 .3336 .6217 .5714 .7324 .6915
60 .39 .3300 .3228 .5714 .5636 .6915 .6808
150 .43 .4801 .2676 .7157 .5000 .8106 .6217
30 5 .39 .4129 .4721 .6554 .7088 .7611 .8051
25 .43 .7967 .7995 .9265 .9279 .9608 .9625
60 .39 .8159 .8078 .9345 .9319 .9664 .9641
150 .43 .8106 .8133 .9332 .9345 .9649 .9656
60 5 .39 .7054 .8315 .8770 .9418 .9292 .9706
25 .43 .9783 .9834 .9957 .9970 .9984 .9988
60 .39 .9952 .9934 .9993 .9990 .9997 .9997
150 .43 .9969 .9960 .9996 .9994 .9999 .9998
150 5 .39 .9991 .9999 .9999 .9999 .9999 .9999
25 .43 _ .9999 .9999 .9999 .9999 .9999 .9999
60 .39 .9999 .9999 .9999 .9999 .9999 .9999
150 .43 .9999 .9999 .9999 .9999 .9999 .9999

 

...-

*From 20 replications of k groups of size n with d intraclass correlation coefficient
of e.s. effect size.
**Empirical power
***Nominal power

144

Table A—9

Power for Tests of Macro Parameter

 

Y1o*
01=.01 a =.05 a =.10
d k n ' e o S. ‘ p! p" pl p" p I ** p"***
.10 10 5 .74 .7224 .8599 .8849 .9554 .9357 .9783
25 .71 .9999 .9999 .9999 .9999 .9999 .9999
60 .74 .9999 .9999 .9999 .9999 .9999 .9999
150 .71 .9999 .9999 .9999 .9999 .9999 .9999
30 5 .74 .9999 .9999 .9999 .9999 .9999 .9999
25 .71 .9999 .9999 .9999 .9999 .9999 .9999
60 .74 .9999 .9999 .9999 .9999 .9999 .9999
150 .71 .9999 .9999 .9999 .9999 .9999 .9999
60 5 .74 ..9999 .9999 .9999 .9999 .9999 .9999
25 .71 .9999 .9999 .9999 .9999 .9999 .9999
60 .74 .9999 .9999 .9999 .9999 .9999 .9999
150 .71 .9999 .9999 .9999 .9999 .9999 .9999
150 5 .74 .9999 .9999 .9999 .9999 .9999 .9999
25 .71 .9999 .9999 .9999 .9999 .9999 .9999
60 .74 .9999 .9999 .9999 .9999 .9999 .9999
150 .71 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 20 replications of k groups of size n with d intraclass correlation coefficient
of e.s. effect size.
**Empirical power
***Nomina1 power

145

Table A-IO

Power for Tests of Macro Parameter

 

710*
a=.01 a=.05 (1:.10
d k n e.s. pt P" p! P" p!” p"***
.25 10 5 .73 .8729 .9265 .9599 .9808 .9808 .9913
25 .85 .9890 .9854 .9982 .9974 .9993 .9990
60 .73 .9999 .9999 .9999 .9999 .9999 .9999
150 .85 .9999 .9990 .9999 .9999 .9999 .9999
30 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .85 .9999 .9999 .9999 .9999 .9999 .9999
60 .73 .9999 .9999 .9999 .9999 .9999 .9999
150 .85 .9999 .9999 .9999 .9999 .9999 .9999
60 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .85 .9999 .9999 .9999 .9999 .9999 .9999
60 .73 .9999 .9999 .9999 .9999 .9999 .9999
150 .85 .9999 .9999 .9999 .9999 .9999 .9999
150 5 .73 .9999 .9999 .9999 .9999 .9999 .9999
25 .85 .9999 .9999 .9999 .9999 .9999 .9999
60 .73 .9999 .9999 .9999 .9999 .9999 .9999
150 .85 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 20 replications of k groups of size n with d intraclass correlation coefficient
of e.s. effect size.

**Empirical power

***Nominal power

146

 

Table A-ll

Power for Tests of Macro Parameter

 

Y11*
O. =.01 _ 0' =.05 (1 =.10
d k n e.s. p' p" p' p" p'** p"***
.10 10 5 .12 .0228 .0294 .0838 .1003 .1446 .1685
25 .10 .0694 .0823 .1922 .2177 .2912 .3228
60 .12 .2005 .1446 .4129 .3300 .5359 .4483
150 .10 .3156 .1841 .5517 .3859 .6736 .4880
30 5 .12 .1112 .1469 .2743 .3336 .3859 .4522
25 .10 .3015 .2676 .5359 .4960 .6591 .6217
60 .12 .2643 .2578 .4920 .4840 .6179 .6103
150 .10 .8212 .7734 .9382 .9147 .9678 .9535
60 5 .12 .0934 .1093 .2420 .2709 .3520 .3859
25 .10 .8389 .8365 .9463 .9452 .9726 .9719
60 .12 .9082 .8869 .9744 .9664 .9881 .9990
150 .10 .9812 .9788 .9965 .9960 .9987 .9985
150 5 .12 .7088 .7881 .8790 .9207 .9306 .9582
25 .10 .9949 .9934 .9991 .9989 .9997 .9997
60 .12 .9999 .9998 .9999 .9999 .9999 .9999
150 .10 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 20 replications of k groups of size n with d intraclass correlation coefficient
of e.s. effect size.
**Empirical power
***Nominal power

147

Table A-12

Power for Tests of Macro Parameter

 

Y11*
a =.01 a =.05 a =.10
d k n e.s. p! . p" p! ‘ p" pkg”? p!!***
.25 10 5 .19 .0708 .0808 .1977 .2148 .2946 .3192
25 .24 .1685 .1611 .3632 .3557 .4880 .4801
60 .19 .2912 .2843 .5279 .5160 .6480 .6406
150 .24 .2266 .2177 .4483 .4364 .5714 ‘ .5596
30 5 .19 .1977 .2119 .4052 .4247 .5319 .5517
25 .24 .5000 .4801 .7324 .7123 .8238 .8106
60 .19 .7642 .7549 .9099 .9032 .9505 .9474
150 .24 .9515 .9279 .9887 .9808 .9953 .9916
60 5 .19 .4960 .5239 .7291 .7517 .8212 .8413
25 .24 .8729 .8665 .9608 .9582 .9808 .9793
60 .19 .9893 .9875 .9982 .9979 .9993 .9991
150 .24 .9916 .9904 .9987 .9984 .9995 .9994
150 5 .19 .9306 .9495 .9821 .9878 .9920 .9949
25 .24 .9997 .9998 .9999 .9999 .9999 .9999‘
60 .19 .9999 .9999 .9999 .9999 .9999 .9999

150 .24 .9999 .9999 .9999 .9999 .9999 .9999

 

*From 20 replications of k groups of size n with d intraclass correlation coefficient
of e.s. effect size.
**Empirica1 power
***Nomina1 power

148

REFERENCE S

References

Aitkin, M., Anderson, D. and Hinde, J. (1981) "Statistical
Modeling of Data on Teaching Styles", Journal 2; the
Royal Statistical Society, Series A, 144(4), 419-461.

 

Aitkin, M., and Longford, N. (1986) "Statistical Modeling
Issues in School Effectiveness Studies", Journal 23 the
Royal Statistical Society, Series A, 149(1), 1-43.

 

Alker, H.R. (1969) "A Typology of Ecological Fallacies", in
M. Dogan and S. Rokkan (eds.), Social Ecology. Cam-
bridge, MA: MIT Press.

Anderson, T.w. (1969) "Statistical Inference for Covariance
Matrices with Linear Strucutre", in P.R. Krishnaiah
(ed.), Multivariate Analysis-II, 55-66. New York: Acade-
mic Press. 1

 

------ . (1971) Estimation of Covariance Matrices with Linear
Structure and Moving Average Processes 92 Finite Order.
Technical Report No. 6, Department of Statistics, Stan-
ford University, Stanford, CA.

 

 

 

Bartlett, M.S., and Kendall, D.G. (1946) "The Statistical
Analysis of Variances- Heterogeneity and the Logarithmic
Transformation", Journal of the Royal Statistical
Society, 8, 128-138.

 

 

Bock, R.D. (1983) "The Discrete Bayesian", in H. Wainer and
Messick (eds.), Principles of Mgdern Psychological
Measurement: ‘5 Pestschrift for Fredrick M; Lord.
Hillsdale, N.J.: Lawrence Erlebaum Associates.

 

 

 

 

Boyd, L.H. Jr., and Iversen, G.R. (1979) Contextual
Analysis: Concepts and Statistical Techniques. Belmont,
CA: Wadsworth.

 

 

Braun, H.I.; Jones, D.H.; Robin, D.B.; and Thayer, D.T.
(1983) "Empirical Bayes Estimation of Coefficients in
the General Linear Model With Data of Deficient Rank",
Psychometrika, 48(2), 171-181.

 

Bryk, A.S., and Raudenbush, s.w. (1987) "Application of
Hierarchical Linear Models to Assessing Change", Psycho-
logical Bulletin, 101(1), 147-158.

 

149

 

 

Bryk, A.S., Raudenbush, S.W.; Seltzer, M.; and Congdon
(1987) An Introduction to HLM: Computer Program and
User's Guide. Version 1.0, University of Chicago.

 

Burstein, L. (1980) "The Analysis of Multi-Level Data in
Educational Research and Evaluation", Review of Research
in Education, 8, 158-233.

 

Burstein, L.; Linn, R.L.; and Capell, F. (1978) "Analyzing
Multilevel Data in the Presence of Heterogeneous Within-
class Regression", Journal of Educational Statistics,
3(4), 347-383.

Burstein, L.; Miller, M.D.; and Linn, R.L. (1979) The use
of Within-Group Slopes as Indices 32 Group outcomes.
Unpublished’Manuscript.

Burstein, L. and Miller, M.D. (1980) "Regression-Based
Analysis of Multi-Level Educational Data", New Direc-
tions for Methodology of Social and Behavioral sciences,
6, 194-211.

 

 

Cooley, W.W.; Bond, L.; and Mao, B. (1981) "Analyzing
Multi-Level Data", in R.A. Berk (ed.), Educational Eva-
luation Methodology. Baltimore, MD: John Hopkins Univer-
sity Press.

 

Cronbach, L.J. (1976) Research on Classrooms and Schools:
gormulation of questions, Design and Analysis. Stanford,
CA: Standard Evaluation Consortium.

 

Cronbach, L.J., and Webb, N. (1975) "Between-Class and
Within-Class Effects in a Reported Aptitude X Treatment
Interaction: Reanalysis of a Study by G.L. Anderson",
Journal of Educational Psychology, 67, 717-724.

de Leeuw, J.D., and Kreft,I. (1986) "Random Coefficients
Models for Multilevel Analysis", Journal of Educational
Studies, 11, 57-85.

Dempster, A.P.; Laird, N.M.; and Rubin, D.B. (1977)
"Maximum Likelihood Prom Incomplete Data via the EM
Algorithm (with discussion)", Journal of the Royal Sta-
tistical Society, Series B, 39, 1-18.

 

Dempster, A.P.; Rubin, D.B.; and Tsutakawa, R.D. (1981)
"Estimation in Covariance Components Models", Journal of
American statistical Association, 76, 341-353.

DerSimonian, R., and Laird, N.M. (1983) "Evaluating the

Effect of Coaching on SAT Scores: A Meta-Analysis",
Harvard Educational Review, 53(1), 1-15.

150

 

 

 

 

 

 

Dielman, T.E. (1983) "Pooled Cross-Sectional and Time-
series Data: A Survey of Current Statistical
Methodology", The American Statistician, 37, 111-122.

 

Efron, B. and Morris, C. (1975) "Data analysis Using
Stein's Estimator and its Generalizations", Journal 2:
the American Statistical Association, 74, 311-319.

 

Elston, R.C., and Grizzle, J.E. (1962) "Estimation of Time-
Response Curves and Their Confidence Bands", Biomtrics,
18, 148-159.

 

Goldstein, H.(1986) "Multilevel Mixed Linear Model analysis
Using Iterative Generalized Least Squares", Biometrika,
73(1), 43-56.

Graybill, P.A. (1976) Theory and Application 2; the linear
Model. Belmont, CA: Wadsworth Publishing Co.

 

 

Hammersley, J.M., and Handscomb, D.C. (1964) Monte Carlo
Methods. New York: Barnes and Noble, Inc.

 

Haney, W. (1980) "Units and Levels of Analysis in Large-
Scale Evaluation", New Directions for Methodology of
Social and Behavioral Sciences, 6, 1-15.

 

 

Hannan, M.T. (1971) Aggregation and Disaggregation in
Sociology. Lexington, MA: D.C. Heath.

 

 

Hartley, H.0., and Rao, J.N.K. (1967) "Maximum Likelihood
Estimation for the Mixed Analysis of Variance Model",
Biometrika, 54, 93-108.

 

Harville, D.A. (1977) "Maximum Likelihood Approaches to
Variance Component Estimation and to Related Problems",
Journal 3; American Statistical Association, 72, 320-
340.

 

Henderson, C.R. (1953) "Estimation of Variance and
Covariance Components", Biometrics, 9, 226-52.

 

Hocking, R.R., and Kutner, M.H. (1975) "Some Analytical and
Numerical Comparisons of estimators for the Mixed ANOVA
Model", Biometrics, 31, 19-28.

 

Kackar, R.N., and Harville, D.A. (1984) "Approximations for
Standard error of Estimation of fixed and Random Effects
in Mixed Linear Models", Journal of the American Statis-
tical Association, 79, 388, 853-862.

 

 

Keesling, J.W. (1977) Some Explorations in Multilevel
Anal sis. Santa Monica, CA: System Development Corpora-
tion.

 

151

 

Keesling, J.W., and Wiley, D.E. (1974) Regression Models
for Hierarchical Data . Paper presented at the Annual
Meeting of the Psychometrics Society.

 

 

Kirk, R.E. (1968) Experimental Design: Procedures for the
Behavioral Sciences. Belmont, CA: Brooks/Cole Publishing
co.

 

Kish, L. (1983) Survey Sampling. Malabar, Pla.: R.E.
Krieger.

Klotz, J.H.; Milton, R.C.; and Zacks, S. (1969) "Mean Square
Efficiency of Estimators of Variance Components", Journal
gf the American Statistical Association, 64, 1383-1402.

Knapp, T.R. (1977) "The Unit of Analysis Problem in
Applications of Simple Correctional Research", Journal
of Educational Statistics, 2(3), 171-186.

Knuth, D. (1969) Seminumerical Algorithms. Reading, Mass.:
Addison-Wesley.

 

Laird, N.M., and Ware, J.H. (1982) "Random-Effects Models
for Longitudinal Data", Biometrics, 38, 963-974.

Lee, V.E. (1986) Multi-Level Causal Models for Social Class
and Achievement. Paper presented at the annual meeting
d?’ the American Educational Research Association, San
Francisco.

 

Linley, D.V., and Smith, A.F.M. (1972) "Bayes Estimates for
the Linear Model", Journal of the Royal Statistical
Society, Series B, 34,1-41.

Longford, N.T. (1985) A Fast Scoring Algorithm for Maximum
Likelihood Estimation in unbalanced Mixed Models with
Nested Effects. An unpublished paper, Institute of
Applied Statistics. Lancaster University, Lancaster,
England.

 

Mason, W.M.; Wong, G.Y.; and Entwisle, B. (1984) "Contextual
Analysis Through the Multi-level Linear Model", Sociolo-
gical Methodology,. San Fransisco, CA: Jossey Bass,
72-103.

 

Miller, J. (1973) Asymptotic Properties and Computation of
Maximum Likelihood Estimates in the Mixed Model‘gf EEE
Analysis of Variance. Tecﬁdical Report No. 12,
Department 3? Statistics. Stanford University, Stanford,
California.

 

------ . (1977) "Asymptotic Properties of Maximum Likelihood
Estimates in the Mixed Model of Analysis of Variance",

152

Annals 3f Statistics, 5, 746-762.

 

Morris, C.N. (1983) "Parametric Empirical Bayes Inference:
Theory and Applications", Journal of the American
Statistical Association, 78(3815, 47-65.

Novick, M.R.; Jackson, P.H.; Thayer, D.T.; and Cole, N.S.
(1972) "Estimating Multiple Regressions in m Groups- A
Cross-Validation Study", British Journal 9; Mathematical
and Statistical Psychology; 25, 33-50.

 

Novick, M.R., and Jackson, P.H. (1974) "Further Cross-
Validation analysis of the Bayesian m-Groups Regression
Method", American Educational Research Journal A 11, 77-
85.

Olsen, A; Seely, J.; and Brikes, D. (1976) "Invariant
Quadratic Unbiased Estimation for Two Variance Compo-
nents", Annals of Statistics, 4, 878-890.

 

Patterson, H.D., and Thompson, R. (1974) Maximum Likelihood
Estimation of Compgnents of Variance, Proceedings of the
8th Internafional Biometrld Conference, 197-207.

 

 

 

Patterson, H.D., Thompson, R. (1971) "Recovery of Inter-
Block Information when Block Sizes are Unequal", Biomet-
rika, 58, 545-554.

Pedhazur, E.J. (1973) Multiple Regression in Behavioral
Research. New York: Holt, Reinhart and Winston.

 

 

 

Pitman, E.J.G. (1938) "The Estimation of the Location and
Scale Parameters of a Continuous Population of any Given
Form", Biometrika, 30, 391-421.

 

Rao, C.R. (1972) "Estimation of Variance and Covariance
Components in linear Models", Journal of the American
Statistical association, 67, 112-115.

 

 

Raudenbush, S.W. (1984) Application of a Hierarchical Linear
Model in educational Research ; Unpublished Doctoral
Dissertation. Harvard University: Graduate School of
Education.

 

 

 

Raudenbush, S.W., and Bryk, A.S. (1986) "Empirical Bayes
Meta-Analysis", Journal 2; Educational statistics, 10,
75-98.

 

Raudenbush, S.W., and Bryk, A.S. (1986) "A Hierarchical
Model for Studying School Effects", Sociology g: Educa-
tion, 59, 1-17.

Raudenbush, S.W. (1988) "Educational Applications of Hierar-
chical Linear Models: A Review", Journal of Educational

153

 

 

Satistics, 13(2), 85-116.

 

Robinson, W.S. (1950) "Ecological Correlations and the
Behavior of Individuals", American Sociological Review,
15, 351-357.

 

Rogosa, D. (1978) "Politics, Process, and Pyramids", Journal
of Educational Statistics, 3(1), 79-86.

 

Rosenberg, B. (1973) "Linear Regression with Randomly
Dispersed Parameters", Biometrika, 60, 61-75.

 

Rubin, ~D.B. (1980) "Using Empirical Bayes Techniques in the
Law School Validity studies", Journal of the American
Statistical Association, 75, 801-827.

 

 

 

------- . (1981) "Estimation in Parallel Randomized Experi-
ments", Journal of Educational Statistics, 6(4), 377-
400.

------ . (1983) "Some Applications of Bayesian Statistics to

Educational data," The Statistician, 32, 55-68.

 

Schmidt, W.H. (1969) Covariance Structure Analysis of the
Multi-variate Random Effects Model. Unpﬁblished disser-
tation, University of Chicago.

 

 

Schmidt, W.H., and Houang, R. T. (1983) A Comparison of Three
Analytical Strategies for Hierarchical Data. Paper pre-
sented at the AnnuallMeeting of the American Educational
Research Association, Los Angeles, California.

 

 

Searle, S.R. (1968) "Another Look at Henderson's Methods of
Estimating Variance Components", Biometrics, 24, 749-
788.

 

------ . (1971a) Linear Models. New York: Wiley & Sons.

------ . (1971b) "Topics in Variance Component Estimation",
Biometrics, 27,1-76.

 

Seely, J. (1975) "An Example of an Inquadmissible Analysis
of Variance Estimator for a Variance Component", Biomet-
rika, 62, 689-90.

Shigemasu, K. (1976) "Development and Validation of a
Simplified m-Group Regression Model", Journal of Educa-
tional Statistics, 1(2), 157-180.

 

Smith, A.F.M. (1973) "A General Bayesian Linear Model",
Journal of the Royal Statistical Society, Series B, 35,
67-75.

 

Sternio, J.L.F. (1981) Empirical Bayes Estimation for a

154

02/81 gié‘
1319797 “51:3

Hierarchical Linear Model ; Unpublished dissertation,
Department of Statistics, Harvard University.

 

Sternio, J.L.F.; Weisberg, H.I.; and Bryk, A.S. (1983)
"Empirical Bayes Estimation of Individual Growth Curves
Parameters and Their Relationship to Covariates",
Biometrics, 39, 71-86.

 

Swamy, P.A.V. (1973) "Criteria, Costraints, and
Multicollinearity in Random Coefficient Regression Mo-
dels", Annals of Economic and Social Measurement, 4,
429-450.

 

Tate, R.L., and Wongbundhit, Y. (1983) "Random Versus
Nonrandom Coefficient Model for Multilevel Analysis",
Journal of Educational Statistics, 8, 103-120.

 

Walsh, J.E. (1947) "Concerning the effect of the Intraclass
Correlation on Certain Significance Tests", Annals ‘2;
Mathematical Statistics, 18, 88-96.

 

Wiley, D.E. (1970) "Design and Analysis of Evaluation
Studies", in Witrock, M.C., and Wiley, D.E. (eds.), The
Evaluation of Instruction. New York: Holt, Rinehart add
Winston.

 

 

------ . (1976) "Another Hour, Another Day: Quality of
Schooling a Potent Path for Policy", in Sewell, W.H.;
Hauser, R.M.; and Featherman, D.L. (eds.), Schooling and
Achievement in American Society, New York: Academic.

 

 

 

Wisenbacker, J., and Schmidt, W.H. (1979) The Structural
Anal sis of Hierarchical Data. Paper presented at the
annual meeting of the American Educational Research
Association.

 

 

Wong, G.Y., and Mason, W.M. (1985) "The Hierarchical
Logistic Regression Model for Multilevel Analysis",
Journal 3; American Statistical Association, 80, 513-
524.

 

155

 

 

 

 

 

 

"I1111114111111)1111,1111?