This is to certify that the dissertation entitled

Computation of power in the nested random effects models

presented by

Xiaofeng Liu

has been accepted towards fulfillment of the requirements for the
Ph.D. degree in Measurement & Quantitative Methods

MSU is an Affirmative Action/Equal Opportunity Institution


Computation of Power in the Nested Random Effects Models

By

Xiaofeng Liu

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1999

ABSTRACT

Computation of Power in the Nested Random Effects Models

By

Xiaofeng Liu

Nested random effects models contain random effects that arise from nested
sampling units. Such models used to be framed as mixed analysis of variance
(ANOVA) models. Nested data are now often analyzed by means of more flexible
models such as hierarchical linear models (HLM), because HLMs accommodate
continuous covariates and unbalanced designs. However, few people
understand how to compute power in HLMs. This dissertation uses the relation
between ANOVA and HLM as a basis for the computation of power in HLM. In
essence, hierarchical linear models without continuous covariates are mixed
ANOVA models. Power functions in ANOVA can be derived through ANOVA
tables, though it is typically difficult to obtain ANOVA tables for complex designs.
The dissertation simplifies the derivation of the ANOVA table through the
structural representation of the models in HLM. The derived ANOVA tables can
then be translated into similar HLM tables for deriving power functions in HLM.
Knowledge of those power functions allows us to choose appropriate
sample sizes for prospective studies using HLM. Two hypothetical examples are
provided to illustrate the application of power functions in planning educational
studies.

To My Parents

ACKNOWLEDGMENTS

This dissertation is the culmination of my intellectual pursuit over the past three
years. I am grateful to all the people who helped me finish it.
I thank Dr. Stephen Raudenbush for his long-term academic guidance and
unlimited support in my doctoral studies. Dr. Raudenbush is a very
knowledgeable and generous person. He first introduced me to the
computation of power two years ago, and I have been working very closely with
him in this area. His insight and wisdom have greatly shaped my thinking and
have tremendously helped to lay the foundation of the dissertation. It is hard to
enumerate all his ideas in my dissertation, and I am sure that I will inadvertently
fail to give him due credit in many places. I am thankful to
Dr. Kenneth Frank, Dr. Mark Reckase, and Dr. Aaron Pallas for agreeing to serve
on my dissertation committee and for giving me valuable feedback.

I am also grateful to my parents, who provided me with a good education and
supported me in all of my endeavors. Last but not least, I thank all
my relatives for their unfailing belief in my intellectual ability since my childhood,
and for their help, which made my first trip to the United States possible.


TABLE OF CONTENTS

Introduction

Chapter 1: statistical power and its computation

Chapter 2: computation of power in the hierarchical linear model through
the ANOVA table

Chapter 3: computation of power in the hierarchical linear model through
the HLM table

Chapter 4: four examples

Chapter 5: numerical application of power functions in planning
educational studies

Conclusion


LIST OF TABLES

Table 1: HLM formulation of the example

Table 2: translation between HLM and ANOVA terms
Table 3: ANOVA table of the example

Table 4: derivation of EMS—step 1

Table 5: derivation of EMS—step 2

Table 6: derivation of EMS—step 3

Table 7: final ANOVA table of the example

Table 8: HLM table of the example

Table 9: HLM table of 3 level CRT

Table 10: HLM table of MST with a covariate at the site level
Table 11: HLM table of a combination of CRT and MST
Table 12: power in cluster randomized trial

Table 13: power in multi-site trial


LIST OF ABBREVIATIONS

HLM Hierarchical Linear Model
CRT Cluster Randomized Trial
MST Multi-site Clinical Trial

ANOVA Analysis of Variance


INTRODUCTION

A random effects model specifies more than one random term. The random
factors contain levels that are randomly selected from a population of possible
levels. As the number of possible levels of some factors may be very large, it is
impossible to assess all of them. Random inclusion of some levels in the model
then becomes economical and convenient. Suppose that we are studying the effect of
schools on implementing one instruction method. There are hundreds of
thousands of schools, and it would be impossible to examine them all. A
sensible strategy is to identify all the schools and randomly sample a few of them
to assess school effects (Littell et al., 1997).

A complete random effects model contains only a mean; the rest of the terms are
random factors. The key interest lies in the estimation of the grand mean
and the variance of each random component. The random effects may be
crossed in some cases, but nesting of random effects occurs in most real
sampling schemes. Large experimental units are first selected, and then
smaller units are randomly selected from each large experimental unit. In
education, school districts are the natural sampling units from which individual
schools may be randomly chosen. The institutional hierarchical structure often
determines the design of our studies, be it of an experimental or a survey nature. In
fact, the hierarchical structure of experimental units leads to the nesting of
random effects, which correspond to units of different sizes.

Random effects models can be analyzed through two frameworks. In the
past they were treated as mixed analysis of variance (ANOVA) models. The
mixed ANOVA approach always assumes a balanced design and no continuous
covariates, which limits its applicability to many real data analyses. Data
tend to come with some missing values, and continuous covariates are
common. People now often analyze such data with more flexible models like
hierarchical linear models (HLM). HLMs can include continuous covariates in the
models and can accommodate missing data.

In HLM the models are arranged in a few levels based on the hierarchy of the
sampling units. Corresponding to the experimental units, the hierarchical linear
model may be represented by a sub-linear model at each level. For simplicity we
assume one fixed effect at each level. The generic presentation at each level is
as follows:

Outcome = mean + coefficient*fixed effect + random effect.

The "mean" and "coefficient" may be the outcome variables for the next level.
Each level must contain a "mean", while the "fixed effect" and "random effect"
may be optional at the higher levels. To generalize the model further we may
include a continuous covariate for each level. The formula of each level changes
into

Outcome = mean + coefficient*fixed effect + coefficient*covariate + random effect.

The hierarchical formulation reflects the structure of the design. It appeals to
people's intuition, and, therefore, the hierarchical linear model has gained
popularity among researchers in various disciplines.

Much research on the hierarchical linear model has so far focused on estimation
theory and the algorithmic implementation of parameter estimation. Much is yet
to be known about the performance of the tests of the parameters in the model.
The power of these tests is rarely computed, for two
reasons. First, the complexity of the model itself deters people from computing
the power of the tests; it requires some mathematical sophistication to carry out
the computation. Second, it is hard to derive the power
function of key parameters in HLM.

Power functions are usually required for determining sample size in planning a
study. Many researchers who plan a study using HLM want to choose an
appropriate sample size. For example, a researcher may want to compare two
types of counseling practice in schools. It is logistically easy to have each school
practice one type of counseling, so the researcher decides to randomly assign
half of the schools to one type of counseling and the other half to the other
type. At the end of the study, students in the schools will be examined
on certain criterion outcomes, and the data will be analyzed as a cluster randomized
trial by HLM. The researcher is interested in knowing how many students from
each school should be recruited into the study. The question can be easily
answered if we have the power function for the test of the main effect of
treatment in the corresponding hierarchical linear model. In this case the power
function may be derived analytically if we know the variance of the estimate of
the treatment effect (see Raudenbush, 1997). However, there is no general way to
derive power functions for the general HLM when the design of the study
becomes complex.

Stroup (1998) provides a general way to compute the power for mixed linear
models. Since HLM is a subset of mixed linear models, his approach is
applicable to the power calculation of HLM. If we formulate the hierarchical linear
model in the framework of a linear mixed model, it can be expressed as

$$y = X\beta + Zu + e \tag{1}$$

$$\begin{bmatrix} u \\ e \end{bmatrix} \sim \mathrm{MVN}\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right)$$

The vector $\beta$ contains the parameters of fixed effects; the vector
$u$ represents the random effects, whose covariance matrix is $G$; and $e$ is a vector
of individual random errors whose covariance matrix $R$ is a diagonal matrix with
the diagonal elements being $\sigma^2$.

It is noted that

$$E(y) = X\beta; \qquad \mathrm{Var}(y) = V = ZGZ' + R.$$

Test $H_0: K'\beta = 0$,

where $K$ is a matrix containing contrast constants.

$$(K'\hat\beta)'\,[K'(X'V^{-1}X)^{-1}K]^{-1}\,(K'\hat\beta) \sim \chi^2_{\mathrm{rank}(K)}(\lambda) \tag{2}$$

where $\lambda$ is the non-centrality parameter,

$$\lambda = (K'\beta)'\,[K'(X'V^{-1}X)^{-1}K]^{-1}\,(K'\beta) \quad \text{(Stroup, 1998)}.$$

We essentially treat the model as a case of the generalized linear model. In practice
this cannot be used directly because $V$ will not be known exactly. However, we may
substitute for $V$ its estimate $\hat V$ from previously collected data. If $\hat V$ is
estimated from a large sample, the computed power will be a good
approximation.

The dilemma lies in the fact that unless the same study is replicated, we barely
know $V$ and $\beta$. In addition, people are interested in one or two key parameters
in their research. They want to have a power function in a closed form to work
with. The matrix representation rarely helps to assess the power of individual
parameters.

The way out of this dilemma is to break the model down by level. At each level
we look at the corresponding test, standardize a few parameters, and make the
power calculation feasible and meaningful. The practice of standardization is
both convenient and necessary. Standardization of elements in $V$ and $\beta$ allows
us to interpret parameters meaningfully. Standardized parameters make sense to educational
researchers because their studies do not use a common measurement scale.
Standardization allows us to disregard the measurement scale of a particular
study and evaluate a prospective study in a general sense. Raudenbush and Liu
(1999) have created a scheme to standardize the parameters for the cluster
randomized trial (CRT) and the multi-site clinical trial (MST).

The same approach can be used for HLM in general. Structure-wise, every pair
of levels is either like a cluster randomized trial or a multi-site trial. The same
standardization principles (see Raudenbush, 1997, and Raudenbush & Liu,
1999) used in CRT and MST can apply across different levels.

The key problem is how to construct the test for each parameter at
each level. Constructing the test requires
knowledge of the standard error of the parameter estimate. When the model is very simple,
there are ready formulas for those standard errors. When the model is
complex, standard errors are normally expressed through matrices, and there is
no simple procedure for writing them down
algebraically. However, this problem may be solved with the aid of an ANOVA
table.

Mixed ANOVA models are often used in experimental design, although
educational researchers and many other social scientists are now increasingly
replacing them with flexible models like HLM in data analysis. In fact these
models are equivalent if we take continuous covariates out of HLM. Raudenbush
(1993) shows that the nested random effects ANOVA is equivalent to HLM. If
restricted maximum likelihood is used for estimation, HLM duplicates the ANOVA
estimates of the same parameters. Since the power computation of the mixed
ANOVA is known, we may capitalize on this to come up with a general scheme to
compute the power for HLM.

We may translate HLM without continuous covariates into a mixed ANOVA to
derive the power functions, and then we may extend the computation of power to
HLM with continuous covariates. The computation of power in ANOVA is based
on the expectation of the mean squares in the ANOVA table. In fact the ANOVA
table for even a moderately complicated HLM is hard to obtain. Although there
are procedures by Scheffe (1959) and others to derive ANOVA tables, these
procedures are too unwieldy to be used. Most of the time, following these
procedures does not lead to the correct ANOVA table.

This dissertation provides a different set of rules and procedures to derive the
mixed ANOVA table. The rules hinge on the structural representation of the
mixed ANOVA model in its HLM format. The structural representation allows us
to identify the relation among all the mean squares in the ANOVA table. The
expectations of those mean squares may easily be obtained once we know
which mean squares are tested against which for a certain test.

The rules have two implications for methodology. First, they
highlight the fact that ANOVA and HLM without continuous covariates are related
in a systematic way. Once the relation is illustrated, it is easy for people
trained in either HLM or ANOVA to learn the other. The rules also enable
people to derive the ANOVA expected mean square table more easily than
other available rules do. The new rules complement the old ones by providing a
way to write down complicated models and check the derived results, and they are
easy to use. Second, ANOVA originates from experimental design. Planning an
ANOVA design is not new, and power computation is already known in ANOVA.
The relation between HLM and ANOVA allows us to design an HLM study with all
the planning tools from ANOVA.

The dissertation starts with a general introduction to power in the first chapter.
The second chapter identifies the above-mentioned rules and illustrates the
derivation of the ANOVA table for a generic example. In the third chapter we provide
algebraic proof to justify these rules. To simplify the computation of power in
HLM we replace the ANOVA table with a similar HLM table. The HLM table
has the same features as the ANOVA table except that all the parameters are in
HLM notation. In the fourth chapter four HLM tables are provided for four complex
HLM designs. In the fifth chapter the application of power functions is shown with
two examples. In particular, the power functions are used to choose sample sizes
for two hypothetical studies using HLM. The choice between two different
designs is discussed with respect to statistical power. The
conclusion discusses power in HLM with continuous covariates and with
missing data.

Chapter 1

STATISTICAL POWER AND ITS COMPUTATION

Hypothesis testing and statistical power

Hypothesis testing usually sets up two complementary hypotheses. One is the
null hypothesis; the other is the alternative hypothesis. It is the null hypothesis
that we usually test, and its rejection establishes the plausibility of the alternative
hypothesis. Normally we put whatever we wish to prove as the alternative
hypothesis and its complementary opposite as the null hypothesis. For example,
to test the differential effectiveness of two teaching methods we state the null
hypothesis that the two methods are equally effective and the alternative that
they are not.

In any case the rejection of the null hypothesis is of great interest. If the null
is true, its rejection is a type I error. If the alternative is true, our ultimate goal is
to reject the null. The probability of rejecting the null when the alternative is true
is called the power of the test. When the null is false, failure to reject it is a type II error.

In the following we provide the power functions for the T test, F test, and chi square
test. In addition, the power function is shown for the test of a random
component in a random effects model, and power functions are also derived for
the linear model and the mixed linear model.

T test
The two-sample t test is a widely used t test. We might suppose that responses
from the experimental group $y_{1e}, \dots, y_{ne}$ are i.i.d. $N(\mu_e, \sigma^2)$ and that responses from
the control group $y_{1c}, \dots, y_{nc}$ are i.i.d. $N(\mu_c, \sigma^2)$. If we try to produce evidence that
there is a difference in responses between the two groups because of the
treatment administered to the experimental group, we may set the model as
follows:

$$y_{ij} = \alpha + \beta X_{ij} + e_{ij}; \quad i = 1, 2, \dots, n; \quad j = e \text{ (experimental)},\ c \text{ (control)}$$

$X_{ij}$ is $1/2$ in the experimental condition or $-1/2$ in the control condition, and

$$\beta = \mu_e - \mu_c.$$

The hypotheses are

$$H_0: \beta = 0$$
$$H_1: \beta > 0$$

Here $\sigma^2$ is assumed to be unknown, and the test is a T statistic,

$$T = \frac{\sqrt{n/2}\;\hat\beta}{\sqrt{\dfrac{\sum_{i=1}^{n}(y_{ie}-\bar y_e)^2 + \sum_{i=1}^{n}(y_{ic}-\bar y_c)^2}{2n-2}}}. \tag{4}$$

The power function of the test at the 5 percent significance level¹ is

$$1 - \mathrm{probt}\!\left(\mathrm{tinv}(0.95,\ 2n-2),\ 2n-2,\ \delta\sqrt{n/2}\right), \tag{5}$$

where probt is the cumulative distribution function for the non-central T; tinv
is the quantile function for the central T (see Appendix A for the definition of the
functions); 0.95 is equal to 1 - the alpha level of the test (we assume a 0.05
alpha level from here on); $2n-2$ is the degrees of freedom for the central and
non-central T distribution; $\delta$ is the standardized effect size $\beta/\sigma$ (Cohen, 1988); and
$\delta\sqrt{n/2}$ is the non-centrality parameter of the non-central T distribution.
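As a rough numerical illustration of the power function for the two-sample t test, the sketch below assumes Python with SciPy, which is not the software used in the dissertation. Here `scipy.stats.t.ppf` is taken as the counterpart of tinv and `scipy.stats.nct.sf` as 1 - probt; the function name `power_two_sample_t` and the example values of n and the standardized effect size are hypothetical.

```python
from math import sqrt

from scipy import stats


def power_two_sample_t(n, delta, alpha=0.05):
    """One-sided power of the two-sample t test with n subjects per group.

    delta is the standardized effect size beta / sigma.
    """
    df = 2 * n - 2
    ncp = delta * sqrt(n / 2)               # non-centrality parameter
    t_crit = stats.t.ppf(1 - alpha, df)     # counterpart of tinv(0.95, 2n-2)
    return stats.nct.sf(t_crit, df, ncp)    # 1 - probt(t_crit, 2n-2, ncp)


# Hypothetical planning values: a medium effect with 20 or 64 subjects per group.
power_small = power_two_sample_t(20, 0.5)
power_large = power_two_sample_t(64, 0.5)
print(f"n = 20: {power_small:.3f}, n = 64: {power_large:.3f}")
```

Evaluating the function over a grid of n is one simple way to find the smallest group size that reaches a target power.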

F test

The multi-site clinical trial is a popular two-factor design. The treatment factor is
a fixed effect, and its power function can be expressed in terms of the cumulative
distribution function of a non-central F. The site is a
random effect, and its power function can be formulated in terms of the
cumulative distribution function of a central F.

The model can be written in the ANOVA notation:

$$Y_{ijk} = \mu + \alpha_k + \pi_j + (\alpha\pi)_{jk} + \varepsilon_{ijk}, \tag{6}$$

$$\pi_j \sim N(0, \tau^2), \quad (\alpha\pi)_{jk} \sim N(0, \sigma^2_{\alpha\pi}), \quad \varepsilon_{ijk} \sim N(0, \sigma^2),$$

where $Y_{ijk}$ is the outcome for the ith participant nested within the jth site and
receiving treatment k ($i = 1, \dots, n$; $j = 1, \dots, J$; $k = 1, \dots, K$). Here $\mu$ is the grand
mean, $\alpha_k$ is the main effect of treatment k, $\pi_j$ is the main effect of site j, and $(\alpha\pi)_{jk}$ is
the interaction effect between site j and treatment k. Note that $\pi$ and $\alpha\pi$ are
viewed as random effects, and that they are independent.

¹ For simplicity we only consider the one-sided test and an alpha level of 0.05 throughout the dissertation,
though the two-sided test can be derived similarly. All the cumulative distribution functions and
their inverse functions from here on use the same notation as SAS. Their definitions are provided
in Appendix A.

The test of the treatment effect assumes the null hypothesis that

$$\alpha_1 = \alpha_2 = \dots = \alpha_K.$$

The test statistic is the ratio of the mean square for the treatment over the mean
square for the treatment-by-site interaction. It assumes an F distribution with df
for the numerator as $K-1$, df for the denominator as $J-1$ (the two-treatment case;
in general the interaction has $(K-1)(J-1)$ degrees of freedom), and the non-centrality
parameter

$$\frac{nJ\sum_k \alpha_k^2}{\sigma^2 + n\sigma^2_{\alpha\pi}}.$$

The power function for the test at $\alpha = 0.05$ is

$$1 - \mathrm{probf}\!\left(\mathrm{finv}(0.95,\ K-1,\ J-1),\ K-1,\ J-1,\ \frac{nJ\sum_k \alpha_k^2}{\sigma^2 + n\sigma^2_{\alpha\pi}}\right), \tag{7}$$

where probf is the cumulative distribution function for the non-central F; finv is the
quantile function for the central F; 0.95 is equal to 1 - the alpha level of the test; $K-1$
is the df for the numerator for the central and non-central F distribution; $J-1$ is the
df for the denominator for the central and non-central F distribution; and
$nJ\sum_k \alpha_k^2 / (\sigma^2 + n\sigma^2_{\alpha\pi})$ is
the non-centrality parameter for the non-central F distribution.
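The power function for the treatment effect in the multi-site trial can be evaluated in the same spirit. The sketch below is an illustration under stated assumptions, not the dissertation's own code: it uses SciPy's `stats.f.ppf` in place of finv and `stats.ncf.sf` in place of 1 - probf, and it sets the denominator df to (K-1)(J-1), which equals J-1 in the two-treatment case considered in the text. The function name and all numerical inputs are hypothetical.

```python
from scipy import stats


def power_treatment_mst(n, J, K, sum_alpha_sq, sigma2, sigma2_ap, alpha=0.05):
    """Power of the F test for the fixed treatment effect in a multi-site trial.

    n: participants per treatment per site, J: sites, K: treatments,
    sum_alpha_sq: sum of squared treatment effects,
    sigma2: within-cell variance, sigma2_ap: treatment-by-site variance.
    """
    dfn = K - 1
    dfd = (K - 1) * (J - 1)    # assumption: general form; equals J - 1 when K = 2
    ncp = n * J * sum_alpha_sq / (sigma2 + n * sigma2_ap)
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    return stats.ncf.sf(f_crit, dfn, dfd, ncp)


# Hypothetical values: two treatments with effects +0.15 and -0.15.
power_10_sites = power_treatment_mst(20, 10, 2, 2 * 0.15**2, 1.0, 0.05)
power_30_sites = power_treatment_mst(20, 30, 2, 2 * 0.15**2, 1.0, 0.05)
print(f"J = 10: {power_10_sites:.3f}, J = 30: {power_30_sites:.3f}")
```

Because the interaction variance is multiplied by n in the denominator of the non-centrality parameter, adding sites tends to raise power more than adding participants per site.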

The test of the random site effect assumes a central F distribution after
transformation (see Raudenbush and Liu, 1999). The power function can be
expressed as follows:

$$1 - \mathrm{probf}\!\left(\mathrm{finv}(0.95,\ J-1,\ (n-1)KJ)\cdot\frac{\sigma^2}{\sigma^2 + nK\tau^2},\ J-1,\ (n-1)KJ\right), \tag{8}$$

where probf is the cumulative distribution function for the central F; finv is the
quantile function for the central F; 0.95 is equal to 1 - the alpha level of the test; $J-1$
is the df for the numerator for the central F distribution; $(n-1)KJ$ is the df for the
denominator for the central F distribution; and $\sigma^2/(\sigma^2 + nK\tau^2)$ times the finv function is
the quantile parameter for probf.
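A minimal sketch of the power calculation for the random site effect follows. It rests on two assumptions that should be checked against Raudenbush and Liu (1999): that the ratio of the site mean square to the within-cell mean square is distributed as (sigma^2 + nK tau^2)/sigma^2 times a central F, and that SciPy's `stats.f.ppf` and `stats.f.sf` stand in for finv and 1 - probf. The function name and input values are hypothetical.

```python
from scipy import stats


def power_site_variance(n, J, K, tau2, sigma2, alpha=0.05):
    """Power of the central-F test for the site variance tau2."""
    dfn = J - 1
    dfd = (n - 1) * K * J
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    # Under the alternative the critical value is rescaled by the variance ratio.
    shrink = sigma2 / (sigma2 + n * K * tau2)
    return stats.f.sf(f_crit * shrink, dfn, dfd)


# Hypothetical values: 10 sites, 2 treatments, 20 participants per cell.
power_null = power_site_variance(20, 10, 2, 0.0, 1.0)
power_small_tau = power_site_variance(20, 10, 2, 0.05, 1.0)
power_large_tau = power_site_variance(20, 10, 2, 0.10, 1.0)
print(power_null, power_small_tau, power_large_tau)
```

With tau2 set to zero the rescaling factor is one, so the function returns the alpha level, which is a convenient sanity check.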

χ² test

The Wald test is a $\chi^2$ test, although it is seldom used. Suppose that

$$Y = X\beta + e, \tag{9}$$

where $e \sim N(0, \sigma^2 I)$.

The null hypothesis can usually be set as

$$H_0: A\beta = 0.$$

$A\hat\beta$ assumes a normal distribution, i.e. $A\hat\beta \sim N(A\beta,\ A(X'X)^{-1}A'\sigma^2)$, and the test
statistic

$$Q = (A\hat\beta)'\,[A(X'X)^{-1}A'\sigma^2]^{-1}\,(A\hat\beta) \sim \chi^2_{\mathrm{rank}(A)}(\delta), \tag{10}$$

where the non-centrality parameter $\delta$ is $(A\beta)'[A(X'X)^{-1}A'\sigma^2]^{-1}(A\beta)$.

If we do not know $\sigma^2$, we usually substitute its estimate $\hat\sigma^2$. The estimate $\hat\sigma^2$
has a chi square distribution times a constant. Therefore, the new statistic

$$\frac{Q/\mathrm{rank}(A)}{\hat\sigma^2/\sigma^2}$$

follows an F distribution with non-centrality parameter
$\delta = (A\beta)'[A(X'X)^{-1}A'\sigma^2]^{-1}(A\beta)$. It can be proved that the new statistic is a
monotonic function of the likelihood ratio statistic (see p. 110, Stapleton, 1995).
It is also noted that $\hat\beta$ can be the maximum likelihood estimate or the least
squares estimate, and that they are identical if $e$ follows the normality and
independence assumptions.

The power function can be expressed as:

$$1 - \mathrm{probchi}\!\left(\mathrm{cinv}(0.95,\ \mathrm{rank}(A)),\ \mathrm{rank}(A),\ (A\beta)'[A(X'X)^{-1}A'\sigma^2]^{-1}(A\beta)\right), \tag{11}$$

where probchi is the cumulative distribution function for the non-central chi square,
and cinv is the quantile function for the central chi square.
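To make the matrix expression concrete, the following sketch computes the non-centrality parameter and the chi square power for a small linear model. It assumes NumPy and SciPy (with `stats.chi2.ppf` for cinv and `stats.ncx2.sf` for 1 - probchi) and uses the inverse of the covariance matrix of the contrast in the quadratic form, as in the definition of the test statistic Q. The function name `wald_power` and the two-group design are hypothetical.

```python
import numpy as np
from scipy import stats


def wald_power(X, A, beta, sigma2, alpha=0.05):
    """Power of the Wald chi square test of H0: A beta = 0 with known sigma2."""
    xtx_inv = np.linalg.inv(X.T @ X)
    cov = sigma2 * (A @ xtx_inv @ A.T)             # Var(A beta_hat)
    ab = A @ beta
    ncp = float(ab @ np.linalg.solve(cov, ab))     # non-centrality parameter
    q = np.linalg.matrix_rank(A)
    crit = stats.chi2.ppf(1 - alpha, q)            # counterpart of cinv
    return stats.ncx2.sf(crit, q, ncp), ncp        # 1 - probchi


# Hypothetical design: 20 subjects per group, treatment coded +1/2 and -1/2.
X = np.array([[1.0, 0.5]] * 20 + [[1.0, -0.5]] * 20)
A = np.array([[0.0, 1.0]])        # contrast picking out the treatment effect
beta = np.array([0.0, 0.5])       # standardized effect of 0.5 when sigma2 = 1
power, ncp = wald_power(X, A, beta, sigma2=1.0)
print(f"ncp = {ncp:.3f}, power = {power:.3f}")
```

For this design the non-centrality parameter is 2.5, the square of the non-centrality parameter of the two-sample t test with the same n and effect size.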

The test of parameters in the mixed model provides another example. We can
formulate the general model as:

$$y = X\beta + Zu + e \tag{12}$$

$$E(y) = X\beta; \qquad \mathrm{Var}(y) = V = ZGZ' + R$$

Test $H_0: K'\beta = 0$

$$(K'\hat\beta)'\,[K'(X'V^{-1}X)^{-1}K]^{-1}\,(K'\hat\beta) \sim \chi^2_{\mathrm{rank}(K)}(\lambda) \quad \text{(Stroup, 1998)}$$

Therefore, the power function can be expressed as:

$$1 - \mathrm{probchi}\!\left(\mathrm{cinv}(0.95,\ \mathrm{rank}(K)),\ \mathrm{rank}(K),\ (K'\beta)'[K'(X'V^{-1}X)^{-1}K]^{-1}(K'\beta)\right), \tag{13}$$

where $V$ is usually not known but may be replaced by its estimate from a
previous study. When the estimate is based on a large sample, the chi square
test is approximately valid and the formula is still applicable.
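The mixed-model calculation can be illustrated on a small balanced cluster randomized trial, where the matrix result can be checked against a closed form. This is a sketch under stated assumptions: NumPy and SciPy are assumed, the variance components are treated as known, and the closed form used for comparison, a treatment-effect variance of 4(tau^2 + sigma^2/n)/J with J clusters in total, is the standard balanced-design result rather than something stated in this chapter. All names and numbers are hypothetical.

```python
import numpy as np
from scipy import stats


def mixed_model_power(X, Z, G, R, K, beta, alpha=0.05):
    """Chi square power for H0: K' beta = 0 in y = X beta + Z u + e, V known."""
    V = Z @ G @ Z.T + R
    cov_beta = np.linalg.inv(X.T @ np.linalg.solve(V, X))    # (X' V^-1 X)^-1
    kb = K.T @ beta
    lam = float(kb @ np.linalg.solve(K.T @ cov_beta @ K, kb))
    q = np.linalg.matrix_rank(K)
    crit = stats.chi2.ppf(1 - alpha, q)
    return stats.ncx2.sf(crit, q, lam), lam


# Hypothetical trial: 8 clusters of 5 individuals, half of the clusters treated.
J, n = 8, 5
tau2, sigma2, effect = 0.1, 0.9, 0.4
cluster = np.repeat(np.arange(J), n)                  # cluster id of each individual
treat = np.where(cluster < J // 2, 0.5, -0.5)         # +1/2 and -1/2 coding
X = np.column_stack([np.ones(J * n), treat])
Z = (cluster[:, None] == np.arange(J)[None, :]).astype(float)
G = tau2 * np.eye(J)
R = sigma2 * np.eye(J * n)
K = np.array([[0.0], [1.0]])                          # contrast for the treatment effect
beta = np.array([0.0, effect])

power, lam = mixed_model_power(X, Z, G, R, K, beta)
closed_form = effect**2 / (4 * (tau2 + sigma2 / n) / J)
print(f"lambda = {lam:.4f}, closed form = {closed_form:.4f}, power = {power:.3f}")
```

The agreement between the two values of the non-centrality parameter shows why a closed-form expression in a few standardized parameters is often easier to work with than the full matrix representation.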


Chapter 2

COMPUTATION OF POWER IN THE HIERARCHICAL LINEAR MODEL

THROUGH THE ANOVA TABLE

Research in education and mental health often involves people as
subjects. People are usually situated within social and clinical settings.
For example, they may be nested in schools or in community health centers.
Randomly assigning individual people to experimental conditions is sometimes
not ethically or logistically possible. However, groups of people who are
geographically or socially related may be randomly assigned to experimental
conditions. In school-based experiments, classrooms are often assigned to
treatment or control conditions; in mental health research, clusters of patients who
attend the same clinic are assigned to a new therapy or a control treatment
(Raudenbush, 1997). Those experiments can be analyzed using HLM, and
they are closely related to mixed ANOVA.

Under the analytical framework of ANOVA the effect of a factor may be
considered as either fixed or random. Mean squares may be computed for the
factors in the model, and the test of each factor may be constructed according to
its expected mean square and those of other factors. The power functions can
then be constructed from their respective tests in the ANOVA table.

The idea of using the expected mean square ANOVA table assumes a balanced
design. For a given sample size with equal cost and variance per treatment, a
balanced design yields the maximum power. In planning a study a balanced
design is usually chosen. If the cost of the study is at issue, then an unbalanced
design may achieve better power through optimal sample allocation. For
example, we may enlarge the total sample by recruiting more of the subjects that cost
less. The optimality issue is beyond the scope of the dissertation, and
we limit our inquiry to balanced designs. We will discuss unbalanced designs
that result from missing data in the concluding section. In short, the
assumption of a balanced design makes it possible to relate HLM to ANOVA and
to plan a study using HLM with the aid of the power functions from ANOVA
tables.

Scheffe (1959), Bennett and Franklin (1954), and Searle (1971) all provide
general rules for deriving the ANOVA table of an experiment, be it a fixed factorial
or a mixed design. The correctness of the derived table depends crucially on
including every effect term and the correct subscripts of each effect term in the
model. Yet it is very easy to miss an effect term or some subscripts as the
design gets complex. The derivation of the ANOVA table for high-order mixed
designs requires extreme meticulousness and patience, and attempts commonly
fail. Moreover, the only way to check the correctness of the final
results is to repeat the same process.

In the following a new set of rules is introduced to simplify the procedure for
deriving ANOVA tables of mixed designs and the statistical power functions of the
relevant tests. The rules are largely based on the equivalence between the mixed
linear model (hierarchical linear model) and the mixed ANOVA model. Raudenbush
(1993) shows that a hierarchical linear model estimated by the restricted
maximum likelihood method produces the same results as the traditional ANOVA when the
nested random effects design is balanced.

For simplicity we will consider only the typical case of a hierarchical linear model,
in which each level has a constant and a random effect, with the option of a
categorical covariate. The constant and categorical covariate become the fixed
terms in ANOVA, and the random effect of each level transforms into a random
term or random interaction term in the ANOVA model. The definition of fixed
and random terms carries the same meaning for both HLM and ANOVA. The
random term in HLM usually results from randomly sampling units at a particular
level. It denotes the random variation among the units at that level after
accounting for the effect of the covariate of that level. Each random term in the
HLM corresponds to a unique random effect in the ANOVA model. In the latter
model the random term may change into a random interaction term, and its meaning
becomes less clear than that of its counterpart in the HLM. The fixed effect in the
HLM can be either an experimental condition or a classifying variable
that takes a finite number of values. When translated into the ANOVA model
those fixed effect terms may become fixed interaction terms. Although the
rules stated below apply to the HLM with a constant, an optional covariate,
and a random effect at each level, extension to a more general case can easily
be made with slight modification.

Procedures for writing the terms and their subscripts in an ANOVA model

The steps and rules are stated with reference to an example in which the cluster
randomized trial is replicated across a number of sites and sites are classified by
a dichotomous characteristic. A hypothetical educational study may be
constructed for this design. The site can be the school and the cluster can be the
classroom. At each school classrooms can be randomly assigned to two
different counseling types. Schools are then classified as public or Catholic.

This particular design is chosen for two reasons. First, it includes both crossing
and nesting and bears the features of a split-plot design. The site is the "plot."
"Plots" are classified by a fixed effect, and there is a randomized experiment in
each "plot." Second, the inclusion of cluster randomized trials at each "plot"
complicates the design and illustrates the generality of the rules.

Step 1: write down the design in terms of a hierarchical linear model (HLM)

Experimental units are specified as levels in HLM. A fixed effect term is included
at the level where random assignment into treatment conditions occurs or a
covariate is introduced. (For simplicity all fixed factors and covariates are treated
as dichotomous. The results generalize.) At each level, by default, an
observation is a mean value plus the random error of that level.

The "site" is the largest randomly sampled unit and is situated at the highest
level. It might be a school in educational research. The "cluster" can be a
classroom in each school. The "individual" can be a student in a certain
classroom. The subscript denotes the kth, jth, or lth unit at the
corresponding level.

 

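Level        Subscript   Linear model at each level²

Site         k           $\beta_{00k} = \gamma_{000} + \gamma_{001}W + u_{00k}$
                         $\beta_{01k} = \gamma_{010} + \gamma_{011}W + u_{01k}$

Cluster      j           $\pi_{0jk} = \beta_{00k} + \beta_{01k}X + r_{0jk}$

Individual   l           $Y_{ljk} = \pi_{0jk} + e_{ljk}$

Table 1: HLM formulation of the example

Combined model:

$$Y_{ljk} = \gamma_{000} + \gamma_{001}W + u_{00k} + \gamma_{010}X + \gamma_{011}WX + u_{01k}X + r_{0jk} + e_{ljk} \tag{14}$$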
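The combined model follows from the level-specific models of Table 1 by successive substitution. The derivation below spells out the intermediate steps; it uses only the symbols of Table 1, with W denoting the site characteristic and X the treatment indicator.

```latex
\begin{align*}
Y_{ljk} &= \pi_{0jk} + e_{ljk} \\
        &= \beta_{00k} + \beta_{01k}X + r_{0jk} + e_{ljk} \\
        &= (\gamma_{000} + \gamma_{001}W + u_{00k})
           + (\gamma_{010} + \gamma_{011}W + u_{01k})X + r_{0jk} + e_{ljk} \\
        &= \gamma_{000} + \gamma_{001}W + u_{00k} + \gamma_{010}X
           + \gamma_{011}WX + u_{01k}X + r_{0jk} + e_{ljk}.
\end{align*}
```

Expanding the product in the third line is what creates the fixed interaction term in WX and the random interaction term in which the site effect multiplies X.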

 

² Since the HLM terms are only used to identify their ANOVA counterparts, we omit the subscripts
for the HLM terms for simplicity.

Step 2: write down the ANOVA model with reference to the terms in HLM
notation

Rules on naming terms in HLM: The random effect at each level is named after
the variable name of that level. For example, r is the cluster effect and u is the
site effect. Fixed effects are determined by dichotomous variables in the model,
i.e. a covariate at each level, and coefficients have no bearing on naming the
terms in the model. Ignoring the coefficients, we can easily identify the
interaction terms by looking at the presence of random terms and fixed terms in
the model. The term map is provided for this example.

 

 

 

 

 

 

 

 

Term                  Name                                       ANOVA term

$\gamma_{001}W$       Effect of site characteristic              $\eta$
$u_{00k}$             Effect of site                             $\gamma$
$\gamma_{010}X$       Effect of treatment                        $\alpha$
$\gamma_{011}WX$      Effect of treatment*site characteristic    $\alpha\eta$
$u_{01k}X$            Effect of treatment*site                   $\alpha\gamma$
$r_{0jk}$             Effect of cluster                          $\beta$
$e_{ljk}$             Within cell error                          $\varepsilon$

Table 2: translation between HLM and ANOVA terms

Each term in the map table has its corresponding counterpart in the ANOVA
model. The order of the terms in HLM is kept the same in ANOVA. The letter for
the ANOVA terms may be different to conform to convention; the subscripts in
HLM can be adopted to show the relation between HLM and ANOVA. The

ANOVA model for this example is as below:

$Y = \mu + \eta + \gamma + \alpha + \alpha\eta + \alpha\gamma + \beta + \varepsilon.$ (15)

Step 3: add subscripts to the terms in the ANOVA model
Rule 1: Attach subscript i to the treatment effect and any letter not already used as a subscript, say h, to the site characteristic. Attach the same subscripts as in the HLM notation to the other single-letter terms.

$Y = \mu + \eta_h + \gamma_k + \alpha_i + \alpha\eta + \alpha\gamma + \beta_j + \varepsilon_l.$ (16)

Rule 2: If there is a treatment or covariate effect at a level in the HLM, then the units of that level are nested within the treatment or covariate, and the units of the lower levels are nested within the units of the higher levels and their covariates.

For this example, the cluster is nested within the treatment because level 2 has a treatment term in the HLM; the site is nested in the site characteristic because the site characteristic is the covariate at the site level. In addition, the cluster is nested within the site and the site characteristic because the cluster is at a level below the site level. In summary, the cluster is nested within the treatment, the site, and the site characteristic. The site is nested in the site characteristic. The individual is nested in the cluster, the treatment, the site, and the site characteristic because the individual sits lower than any other level in the HLM representation.

Rule 3: If an effect is nested within other effects, add the subscripts of the other

effects in parenthesis after its own subscript.

$Y = \mu + \eta_h + \gamma_{k(h)} + \alpha_i + \alpha\eta + \alpha\gamma + \beta_{j(ikh)} + \varepsilon_{l(ijkh)}.$ (17)

Rule 4: The subscript for the interaction term is the combination of subscripts of

those terms which constitute the interaction.

$Y = \mu + \eta_h + \gamma_{k(h)} + \alpha_i + \alpha\eta_{ih} + \alpha\gamma_{ik(h)} + \beta_{j(ikh)} + \varepsilon_{l(ijkh)}.$ (18)

In addition, if there is a higher level above the level where the treatment occurs, then the treatment is said to be crossed with that higher level.


Rules on constructing tests for mixed ANOVA

Rule 5: For the test of a fixed effect or interaction, we divide the mean square of that term by the mean square of the next available random effect to its right. In the pairs listed under the equation, each arrow points from the tested term to the term whose mean square serves as the divisor.

$Y = \mu + \eta_h + \gamma_{k(h)} + \alpha_i + \alpha\eta_{ih} + \alpha\gamma_{ik(h)} + \beta_{j(ikh)} + \varepsilon_{l(ijkh)}.$ (19)

$\eta_h \to \gamma_{k(h)}$; $\alpha_i \to \alpha\gamma_{ik(h)}$; $\alpha\eta_{ih} \to \alpha\gamma_{ik(h)}$.

Rule 6: For the test of a random effect, we divide the mean square of the random effect by the mean square of the next closest random effect on its right that does not share the same letter. Again, each arrow points from the tested term to the term whose mean square serves as the divisor.

$Y = \mu + \eta_h + \gamma_{k(h)} + \alpha_i + \alpha\eta_{ih} + \alpha\gamma_{ik(h)} + \beta_{j(ikh)} + \varepsilon_{l(ijkh)}.$ (20)

$\gamma_{k(h)} \to \beta_{j(ikh)}$; $\alpha\gamma_{ik(h)} \to \beta_{j(ikh)}$; $\beta_{j(ikh)} \to \varepsilon_{l(ijkh)}$.


Procedures and rules for deriving the ANOVA table and statistical power

functions

Once we know which mean square is tested against which, the expected mean square of each term can be derived easily, without the tedious task of establishing a subscript table.

Step 1: Construct the frame of an ANOVA table. Write each subscript and its number of levels next to the corresponding single-letter term, and write down the df according to the subscripts: the df of a term is the product of the numbers of levels of the subscripts inside the parentheses and of the number of levels minus one for each of the other subscripts.

 

 

 

 

 

 

 

 

 

Subscript : levels | Name | Term | df | EMS
h : d | Effect of site characteristic | $\eta_h$ | $d-1$ |
k : c | Effect of site | $\gamma_{k(h)}$ | $(c-1)d$ |
i : a | Effect of treatment | $\alpha_i$ | $a-1$ |
 | Effect of treatment*site characteristic | $\alpha\eta_{ih}$ | $(a-1)(d-1)$ |
 | Effect of treatment*site | $\alpha\gamma_{ik(h)}$ | $(a-1)(c-1)d$ |
j : b | Effect of cluster | $\beta_{j(ikh)}$ | $a(b-1)cd$ |
l : n | Within cell error | $\varepsilon_{l(ijkh)}$ | $abcd(n-1)$ |

Table 3: ANOVA table of the example
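The df rule of Step 1 can be checked mechanically. The following is a minimal sketch, not part of the original derivation: the numbers of levels (d = 2, c = 5, a = 2, b = 3, n = 20) and the term labels are hypothetical values chosen only for illustration.

```python
from math import prod

# Hypothetical numbers of levels for each subscript: d, c, a, b, n.
levels = {"h": 2, "k": 5, "i": 2, "j": 3, "l": 20}

# Each term: (subscripts outside the parentheses, subscripts inside them).
terms = {
    "eta_h": (["h"], []),
    "gamma_k(h)": (["k"], ["h"]),
    "alpha_i": (["i"], []),
    "alpha.eta_ih": (["i", "h"], []),
    "alpha.gamma_ik(h)": (["i", "k"], ["h"]),
    "beta_j(ikh)": (["j"], ["i", "k", "h"]),
    "eps_l(ijkh)": (["l"], ["i", "j", "k", "h"]),
}

def degrees_of_freedom(outside, inside, levels):
    """(levels - 1) for each outside subscript, times levels for each nested one."""
    return prod(levels[s] - 1 for s in outside) * prod(levels[s] for s in inside)

df = {name: degrees_of_freedom(out, ins, levels) for name, (out, ins) in terms.items()}
for name, value in df.items():
    print(f"{name:20s} df = {value}")
```

As a sanity check, the df of all terms should add up to the total number of observations minus one (here abcdn - 1).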


Rule 7: Each term in the ANOVA model has either a variance component (random effect) or a fixed factor (fixed effect) associated with it. The variance component for a random effect is $\sigma^2_{\text{term}}$, i.e. $\sigma^2$ subscripted by its term. The fixed effect is represented by the sum of squares of the model components associated with the factor divided by its degrees of freedom (Montgomery, 1996).

Rule 8: The expected mean square (EMS) of the within cell error is always $\sigma^2$. The EMS of a random term that is tested against the within cell error is $\sigma^2$ plus the variance component of that random term times the product of the numbers of levels of the subscripts that are not present on the term. Likewise, a previously derived EMS of a random term is used to write down the EMS of any other random or fixed term whose F test uses the MS of that random term as the denominator: the EMS of the new term is the EMS of the previous random term plus the variance component or fixed effect of the new term times the product of the numbers of levels of the subscripts that are not on the new term.
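The multiplier in Rule 8 (the product of the numbers of levels of the subscripts absent from a term) can be sketched as a small helper. This is an illustrative sketch only; the level counts are hypothetical and the function name is not from the original text.

```python
from math import prod

# Hypothetical numbers of levels for each subscript: d, c, a, b, n.
levels = {"h": 2, "k": 5, "i": 2, "j": 3, "l": 20}

def ems_multiplier(subscripts, levels):
    """Product of the numbers of levels of all subscripts NOT on the term."""
    return prod(n for s, n in levels.items() if s not in subscripts)

# Subscripts carried by each term, inside or outside the parentheses.
multipliers = {
    "beta_j(ikh)": ems_multiplier("jikh", levels),        # n
    "alpha.gamma_ik(h)": ems_multiplier("ikh", levels),   # b*n
    "gamma_k(h)": ems_multiplier("kh", levels),           # a*b*n
    "alpha_i": ems_multiplier("i", levels),               # b*c*d*n
    "alpha.eta_ih": ems_multiplier("ih", levels),         # b*c*n
    "eta_h": ems_multiplier("h", levels),                 # a*b*c*n
}
print(multipliers)
```

These multipliers are the coefficients that appear in front of the variance components and fixed-effect pieces in the EMS column.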

In the current example, the derivation is illustrated below in three steps:

 

 

 

 

 

 

 

 

 

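Subscript : levels | Name | Term | df | EMS
h : d | Effect of site characteristic | $\eta_h$ | $d-1$ |
k : c | Effect of site | $\gamma_{k(h)}$ | $(c-1)d$ |
i : a | Effect of treatment | $\alpha_i$ | $a-1$ |
 | Effect of treatment*site characteristic | $\alpha\eta_{ih}$ | $(a-1)(d-1)$ |
 | Effect of treatment*site | $\alpha\gamma_{ik(h)}$ | $(a-1)(c-1)d$ |
j : b | Effect of cluster | $\beta_{j(ikh)}$ | $a(b-1)cd$ | $\sigma^2 + n\sigma_\beta^2$
l : n | Within cell error | $\varepsilon_{l(ijkh)}$ | $abcd(n-1)$ | $\sigma^2$

Table 4: derivation of EMS, step 1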

 

 

 

 

 

 

 

 

 

 

 

 

 

 

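Subscript : levels | Name | Term | df | EMS
h : d | Effect of site characteristic | $\eta_h$ | $d-1$ |
k : c | Effect of site | $\gamma_{k(h)}$ | $(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2$
i : a | Effect of treatment | $\alpha_i$ | $a-1$ |
 | Effect of treatment*site characteristic | $\alpha\eta_{ih}$ | $(a-1)(d-1)$ |
 | Effect of treatment*site | $\alpha\gamma_{ik(h)}$ | $(a-1)(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2$
j : b | Effect of cluster | $\beta_{j(ikh)}$ | $a(b-1)cd$ | $\sigma^2 + n\sigma_\beta^2$
l : n | Within cell error | $\varepsilon_{l(ijkh)}$ | $abcd(n-1)$ | $\sigma^2$

Table 5: derivation of EMS, step 2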

 

 

 

 

 

 

 

 

 

 

 

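Subscript : levels | Name | Term | df | EMS
h : d | Effect of site characteristic | $\eta_h$ | $d-1$ |
k : c | Effect of site | $\gamma_{k(h)}$ | $(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2$
i : a | Effect of treatment | $\alpha_i$ | $a-1$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2 + bcdn \dfrac{\sum_i \alpha_i^2}{a-1}$
 | Effect of treatment*site characteristic | $\alpha\eta_{ih}$ | $(a-1)(d-1)$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2 + bcn \dfrac{\sum_i \sum_h \alpha\eta_{ih}^2}{(a-1)(d-1)}$
 | Effect of treatment*site | $\alpha\gamma_{ik(h)}$ | $(a-1)(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2$
j : b | Effect of cluster | $\beta_{j(ikh)}$ | $a(b-1)cd$ | $\sigma^2 + n\sigma_\beta^2$
l : n | Within cell error | $\varepsilon_{l(ijkh)}$ | $abcd(n-1)$ | $\sigma^2$

Table 6: derivation of EMS, step 3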

 

 

 

 

 

 

 

 

 

 

 

 

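Subscript : levels | Name | Term | df | EMS
h : d | Effect of site characteristic | $\eta_h$ | $d-1$ | $\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2 + abcn \dfrac{\sum_h \eta_h^2}{d-1}$
k : c | Effect of site | $\gamma_{k(h)}$ | $(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2$
i : a | Effect of treatment | $\alpha_i$ | $a-1$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2 + bcdn \dfrac{\sum_i \alpha_i^2}{a-1}$
 | Effect of treatment*site characteristic | $\alpha\eta_{ih}$ | $(a-1)(d-1)$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2 + bcn \dfrac{\sum_i \sum_h \alpha\eta_{ih}^2}{(a-1)(d-1)}$
 | Effect of treatment*site | $\alpha\gamma_{ik(h)}$ | $(a-1)(c-1)d$ | $\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2$
j : b | Effect of cluster | $\beta_{j(ikh)}$ | $a(b-1)cd$ | $\sigma^2 + n\sigma_\beta^2$
l : n | Within cell error | $\varepsilon_{l(ijkh)}$ | $abcd(n-1)$ | $\sigma^2$

Table 7: final ANOVA table of the example

The above ANOVA table is the same as the one derived directly according to the old rules.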

 

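With the aid of the ANOVA table, the power functions of the tests may be easily constructed. We provide the power functions for this example below.

The power function for testing the site characteristic is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ d-1,\ (c-1)d),\ d-1,\ (c-1)d,\ \dfrac{abcn \sum_h \eta_h^2}{\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2}\right).$ (21)

The power function for testing the site effect is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ (c-1)d,\ a(b-1)cd) \cdot \dfrac{\sigma^2 + n\sigma_\beta^2}{\sigma^2 + n\sigma_\beta^2 + abn\sigma_\gamma^2},\ (c-1)d,\ a(b-1)cd\right).$ (22)

The power function for testing the treatment is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ a-1,\ (a-1)(c-1)d),\ a-1,\ (a-1)(c-1)d,\ \dfrac{bcdn \sum_i \alpha_i^2}{\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2}\right).$ (23)

The power function for testing the treatment-by-site characteristic interaction is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ (a-1)(d-1),\ (a-1)(c-1)d),\ (a-1)(d-1),\ (a-1)(c-1)d,\ \dfrac{bcn \sum_i \sum_h \alpha\eta_{ih}^2}{\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2}\right).$ (24)

The power function for testing the treatment-by-site variance is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ (a-1)(c-1)d,\ a(b-1)cd) \cdot \dfrac{\sigma^2 + n\sigma_\beta^2}{\sigma^2 + n\sigma_\beta^2 + bn\sigma_{\alpha\gamma}^2},\ (a-1)(c-1)d,\ a(b-1)cd\right).$ (25)

The power function for testing the cluster variance is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ a(b-1)cd,\ abcd(n-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + n\sigma_\beta^2},\ a(b-1)cd,\ abcd(n-1)\right).$ (26)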

The same power functions can be constructed in HLM from a similar HLM table, which expresses all the parameters in HLM notation.

Chapter 3

COMPUTATION OF POWER IN HIERARCHICAL LINEAR MODEL THROUGH

HLM TABLE

In this chapter a proof is given to support the rules we used in the previous chapter. The same rules are then used to construct the HLM table and derive the power functions directly. The same example as in the previous chapter is used to construct an HLM table.

Proof
The rules we used in the previous chapter follow three principles. First, the parameters of fixed effects are always tested against the random term at the same level in HLM. Second, the random term at the higher level is always tested against the random term at the next lower level. Third, the conditional hierarchical linear model is equivalent to the mixed ANOVA model, and each term in the HLM has its corresponding term in the mixed ANOVA.

HLM can be divided into individual levels. At each level it takes the form of either
Outcome = mean + random component
or
Outcome = mean + parameter*fixed effect + random component.

We simplify the proof by having the “fixed effect” take only two levels, i.e. treatment or control, though the proof can be easily adapted to a fixed effect with more than two levels. Essentially, each level of HLM can be considered as requiring either a one sample t test or a two sample independent t test. We first prove that the first principle holds for either of the two forms at the 2nd level, and then that it holds at the (n+1)th level. Similarly, we prove the second principle at the 2nd level and then generalize it to the (n+1)th level. We give a reference for the third principle.

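Case 1: The 2nd level has a fixed effect

Level 2: $\beta_{0j} = \gamma_0 + \gamma_1 W_j + u_j$, $u_j \sim N(0, \tau)$, $j = 1, \ldots, J$ (27)

Level 1: $Y_{ij} = \beta_{0j} + r_{ij}$, $r_{ij} \sim N(0, \sigma^2)$, $i = 1, \ldots, n$ (28)

$W_j$ takes -1/2 or 1/2 for the control and experimental conditions.

$\hat{\beta}_{0j} = \bar{Y}_{\cdot j}$, and $\bar{Y}_{\cdot 1}, \ldots, \bar{Y}_{\cdot J}$ are independent and have variance $\dfrac{\sigma^2}{n} + \tau$.

The test of parameter $\gamma_1$ is the same as the two sample independent t test. The power of the test of $\gamma_1$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \dfrac{\gamma_1}{\sqrt{\dfrac{4}{J}\left(\dfrac{\sigma^2}{n} + \tau\right)}}\right).$ (29)

So the estimated $\gamma_1$ is tested against the estimated variance of the error term $u_j$.

Alternatively,

$\sum_{j=1}^{J} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2 = \dfrac{J}{2} \sum_{w=1}^{2} (\bar{Y}_{\cdot\cdot w} - \bar{Y}_{\cdot\cdot})^2 + \sum_{w=1}^{2} \sum_{j \in w} (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot w})^2$ (30)

where $w$ indexes the two conditions and $\bar{Y}_{\cdot\cdot w}$ is the mean of the cluster means in condition $w$; that is, TSS = SSB + SSE, where SSB is the between sum of squares and SSE is the within sum of squares as in the one-way ANOVA.

$E\left(\dfrac{SSE}{J-2}\right) = \sigma^2_{\bar{Y}} = \dfrac{\sigma^2}{n} + \tau$, and $\dfrac{SSE}{J-2} \sim \left(\dfrac{\sigma^2}{n} + \tau\right) \dfrac{\chi^2_{J-2}}{J-2}$. (31)

$E\left(\dfrac{SSB}{2-1}\right) = \dfrac{\sigma^2}{n} + \tau + \dfrac{J\gamma_1^2}{4}$, and $\dfrac{SSB}{2-1} \sim \left(\dfrac{\sigma^2}{n} + \tau\right) \chi^2_1\left(\delta = \dfrac{J\gamma_1^2}{4\left(\dfrac{\sigma^2}{n} + \tau\right)}\right)$. (32)

The test of $\gamma_1$ is $F = \dfrac{SSB/(2-1)}{SSE/(J-2)}$. It follows an F distribution. The power function is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ 1,\ J-2),\ 1,\ J-2,\ \dfrac{J\gamma_1^2}{4\left(\dfrac{\sigma^2}{n} + \tau\right)}\right).$ (33)

It is equivalent to the previous power function (see appendix B). Therefore, the first principle is true for the 2nd level in the first case.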
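The link between the non-centrality parameters of the t form (29) and the F form (33) can be written out in one line. This is a short worked step added for illustration; the symbols $\delta_t$ and $\delta_F$ are labels introduced here, not notation from the original text.

```latex
\delta_t = \frac{\gamma_1}{\sqrt{\dfrac{4}{J}\left(\dfrac{\sigma^2}{n} + \tau\right)}}
\quad\Longrightarrow\quad
\delta_t^{2} = \frac{J\,\gamma_1^{2}}{4\left(\dfrac{\sigma^2}{n} + \tau\right)} = \delta_F .
```

In other words, squaring the non-centrality parameter of the t statistic with J - 2 degrees of freedom gives the non-centrality parameter of the F statistic with 1 and J - 2 degrees of freedom.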

From textbooks on experimental design (see Montgomery, 1997), the estimate $\hat{\sigma}^2$ always has the expectation $\sigma^2$ and follows a $\sigma^2\chi^2$ distribution (scaled by its degrees of freedom) when the design is balanced. We know from the above that

$E\left(\dfrac{n\,SSE}{J-2}\right) = \sigma^2 + n\tau.$ (34)

The test statistic for $\tau$ is $\dfrac{n\,SSE/(J-2)}{\hat{\sigma}^2}$. After transformation, it follows a central F distribution. The power function is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-2,\ Jn-J) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau},\ J-2,\ Jn-J\right).$ (35)

Therefore, the second principle holds true too for the 2nd level.

Now we can generalize the proof to the (n+1)th level. The model can be formulated as follows:

Level n+1:

$\beta_j^{n+1} = \gamma_0^{n+1} + \gamma_1^{n+1} W_j^{n+1} + u_j^{n+1}$, $u_j^{n+1} \sim N(0, \tau^{n+1})$, $j = 1, \ldots, J$ (36)

Level n:

$Y_{ij}^{n} = \beta_{j}^{n} + r_{ij}^{n}$, $\mathrm{Var}(\bar{Y}_{\cdot j}^{n}) = \sigma_n^2$, $i = 1, \ldots, n$ (37)

$W_j^{n+1}$ takes -1/2 or 1/2 for the control and experimental conditions.

$\hat{\beta}_j^{n+1} = \bar{Y}_{\cdot j}^{n}$, and $\bar{Y}_{\cdot 1}^{n}, \ldots, \bar{Y}_{\cdot J}^{n}$ are independent since the units of the nth level are nested in the (n+1)th level. The test of $\gamma_1^{n+1}$ is the same as a two sample independent t test. The test statistic has a t distribution with J-2 degrees of freedom and a non-centrality parameter $\dfrac{\gamma_1^{n+1}}{\sqrt{4\sigma_n^2 / J}}$. The same results carry over to the (n+1)th level. Therefore the first and second principles are proved.

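Case 2: The 2nd level does not have a fixed effect.

Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$ (38)

$\beta_{1j} = \gamma_{10} + u_{1j}$ (39)

$\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_{00} & \tau_{10} \\ \tau_{01} & \tau_{11} \end{pmatrix}\right)$

Level 1:

$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$, $r_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$ (40)

$\hat{\beta}_{0j} = \bar{Y}_{\cdot j}$, and $\bar{Y}_{\cdot 1}, \ldots, \bar{Y}_{\cdot J}$ are independent and have variance $\dfrac{\sigma^2}{n} + \tau_{00}$.

$\hat{\beta}_{1j} = \bar{Y}_{Ej} - \bar{Y}_{Cj} = \Delta\bar{Y}_j$,

where $\bar{Y}_{Ej}$ is the mean for the experimental group at the jth site and $\bar{Y}_{Cj}$ is the mean for the control group at the jth site. $\Delta\bar{Y}_1, \ldots, \Delta\bar{Y}_J$ are independent and have variance $\dfrac{4\sigma^2}{n} + \tau_{11}$.

The tests of $\gamma_{00}$ and $\gamma_{10}$ are the same as a one sample t test. $\hat{\beta}_{0j}$ and $\hat{\beta}_{1j}$ are the corresponding observed scores. The estimates are:

$\hat{\gamma}_{00} = \dfrac{\sum_j \bar{Y}_{\cdot j}}{J}$, $\widehat{\mathrm{Var}}(\bar{Y}_{\cdot j}) = \dfrac{\sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2}{J-1}$, $E(\widehat{\mathrm{Var}}(\bar{Y}_{\cdot j})) = \sigma^2_{\bar{Y}_{\cdot j}} = \dfrac{\sigma^2}{n} + \tau_{00}$; (41)

$\hat{\gamma}_{10} = \dfrac{\sum_j \Delta\bar{Y}_j}{J}$, $\widehat{\mathrm{Var}}(\Delta\bar{Y}_j) = \dfrac{\sum_j (\Delta\bar{Y}_j - \overline{\Delta Y})^2}{J-1}$, $E(\widehat{\mathrm{Var}}(\Delta\bar{Y}_j)) = \sigma^2_{\Delta\bar{Y}_j} = \dfrac{4\sigma^2}{n} + \tau_{11}$. (42)

The power function for the test of $\gamma_{10}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-1),\ J-1,\ \dfrac{\gamma_{10}}{\sqrt{\left(\dfrac{4\sigma^2}{n} + \tau_{11}\right)/J}}\right).$ (43)

Therefore the first principle holds true in this case.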

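It should be noted that the estimates

$\widehat{\mathrm{Var}}(\bar{Y}_{\cdot j}) = \dfrac{\sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2}{J-1}$, $\hat{\gamma}_{10} = \dfrac{\sum_j \Delta\bar{Y}_j}{J}$, $\widehat{\mathrm{Var}}(\Delta\bar{Y}_j) = \dfrac{\sum_j (\Delta\bar{Y}_j - \overline{\Delta Y})^2}{J-1}$

are algebraically related to the mean squares for the terms in the ANOVA, which correspond to the parameters $u_{0j}$, $\gamma_{10}$, $u_{1j}$ in the HLM model.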

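Now we prove that the second principle is true in this case. The estimate $\hat{\sigma}^2$ again has the expectation $\sigma^2$ and follows a $\sigma^2\chi^2$ distribution (scaled by its degrees of freedom).

$\widehat{\mathrm{Var}}(\bar{Y}_{\cdot j}) = \dfrac{\sum_j (\bar{Y}_{\cdot j} - \bar{Y}_{\cdot\cdot})^2}{J-1}$ has a $\left(\dfrac{\sigma^2}{n} + \tau_{00}\right) \dfrac{\chi^2_{J-1}}{J-1}$ distribution. The test statistic for the parameter $u_{0j}$ is $\dfrac{n\,\widehat{\mathrm{Var}}(\bar{Y}_{\cdot j})}{\hat{\sigma}^2}$. The numerator is equivalent to the mean square for the site effect in the ANOVA model. After transformation, the statistic has a central F distribution. The power function for the test of $u_{0j}$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-1,\ nJ-2J) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau_{00}},\ J-1,\ nJ-2J\right).$ (44)

$\widehat{\mathrm{Var}}(\Delta\bar{Y}_j) = \dfrac{\sum_j (\Delta\bar{Y}_j - \overline{\Delta Y})^2}{J-1}$ has a $\left(\dfrac{4\sigma^2}{n} + \tau_{11}\right) \dfrac{\chi^2_{J-1}}{J-1}$ distribution. The test statistic for the parameter $u_{1j}$ is $\dfrac{(n/4)\,\widehat{\mathrm{Var}}(\Delta\bar{Y}_j)}{\hat{\sigma}^2}$. It is the same as the mean square ratio of the treatment-by-site interaction over the within-cell error. After transformation, the test statistic has a central F distribution. The power function is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-1,\ Jn-2J) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau_{11}/4},\ J-1,\ Jn-2J\right).$ (45)

For the (n+1)th level, we add n+1 superscripts to the parameters in the model. Essentially the proof is the same as the one provided in the previous case.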

Creating the HLM table

With the aid of the three principles we may construct an HLM table similar to the ANOVA table and derive the power functions directly. We use the same example from the previous chapter to illustrate the construction of an HLM table.

First, we put all the terms of the combined model into the table and write the subscripts and their corresponding total numbers of levels in the leftmost column. Second, we use the same rules as for the ANOVA table to construct the subscript for each term and derive the degrees of freedom. Third, we fill in the parameters for the random and fixed effects. There are three rules for doing so (for simplicity, we restrict all the covariates to take two values):

1. If the outcome at one level is the coefficient of the covariate at the next lower level, divide the variance of the random component of that level by 2. Otherwise, write down the variance of the random component.

2. For a fixed effect, write down the corresponding coefficient, square it, and then divide it by 2.

3. For a fixed interaction effect, write down the corresponding coefficient, square it, and then divide it by 4.

Fourth, we refer to Rule 8 in the previous chapter to write down the numerator or denominator of the non-centrality parameter for the power functions.

With the aid of the HLM table, we can derive power functions for the tests of the fixed and random effects. For the test of a random component, the construction of the power function is the same as in the previous chapter. For a fixed effect, all the power functions use a non-central t distribution. The non-centrality parameter is the square root of the ratio of the entry for the fixed effect in the last column to the expectation of the random term against which the fixed effect is tested. In general, non-centrality parameter = sqrt[(numerator of non-centrality parameter for the fixed effect) / (denominator of non-centrality parameter for the random effect)].
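The square-root rule can be sketched for the treatment effect of the running example. This is a minimal illustration, assuming that the treatment's last-column entry is bcdn times the squared coefficient over 2 and that the treatment-by-site entry is the error variance plus n times the cluster variance plus bn times half the slope variance; all numerical values and variable names are hypothetical.

```python
import math

def fixed_effect_ncp(numerator, denominator):
    """Non-centrality parameter of the t test: sqrt(numerator / denominator)."""
    return math.sqrt(numerator / denominator)

# Hypothetical design sizes and parameter values (not from the source).
a, b, c, d, n = 2, 2, 3, 2, 10
gamma_010 = 0.5      # treatment coefficient
sigma2 = 1.0         # within-cell error variance
tau_pi = 0.1         # cluster-level variance
tau_beta01 = 0.05    # variance of the treatment slope across sites

# Last-column entry for the treatment effect (rule 2: square, divide by 2).
numerator = b * c * d * n * gamma_010 ** 2 / 2
# Last-column entry for the treatment-by-site random term (rule 1: divide by 2).
denominator = sigma2 + n * tau_pi + b * n * tau_beta01 / 2

ncp = fixed_effect_ncp(numerator, denominator)
df = (a - 1) * (c - 1) * d   # df of the random term the treatment is tested against
print(ncp, df)
```

The pair (ncp, df) is what would be passed to a non-central t routine, such as the SAS functions used in the text, to obtain the power.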


The HLM table for the example in the previous chapter is provided below. The subscripts are kept the same to highlight the translation between ANOVA and HLM. In the following chapter all the HLM tables adopt the HLM subscripts and notation. Here h denotes the site characteristic; k the site; i the treatment; j the cluster; and l the individual. a is the number of levels for the treatment factor; b for the cluster factor; c for the site factor; d for the site characteristic factor; and n for the within-cell error.

 

 

 

 

 

 

 

 

 

 

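Subscript : levels | Term | Subscript | df | Parameter | Numerator or denominator of non-centrality parameter
h : d | $\gamma_{001} W$ | h | $d-1$ | $\dfrac{\gamma_{001}^2}{2}$ | $abcn \dfrac{\gamma_{001}^2}{2}$
k : c | $u_{00k}$ | k(h) | $(c-1)d$ | $\tau_{\beta_{00}}$ | $\sigma^2 + n\tau_\pi + abn\,\tau_{\beta_{00}}$
i : a | $\gamma_{010} X$ | i | $a-1$ | $\dfrac{\gamma_{010}^2}{2}$ | $bcdn \dfrac{\gamma_{010}^2}{2}$
 | $\gamma_{011} WX$ | ih | $(a-1)(d-1)$ | $\dfrac{\gamma_{011}^2}{4}$ | $bcn \dfrac{\gamma_{011}^2}{4}$
 | $u_{01k} X$ | ik(h) | $(a-1)(c-1)d$ | $\dfrac{\tau_{\beta_{01}}}{2}$ | $\sigma^2 + n\tau_\pi + bn \dfrac{\tau_{\beta_{01}}}{2}$
j : b | $r_{0jk}$ | j(ikh) | $a(b-1)cd$ | $\tau_\pi$ | $\sigma^2 + n\tau_\pi$
l : n | $e_{ljk}$ | l(ijkh) | $abcd(n-1)$ | $\sigma^2$ | $\sigma^2$

Table 8: HLM table of the example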

 

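The power function for testing the treatment is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ (a-1)(c-1)d),\ (a-1)(c-1)d,\ \sqrt{\dfrac{bcdn \dfrac{\gamma_{010}^2}{2}}{\sigma^2 + n\tau_\pi + bn \dfrac{\tau_{\beta_{01}}}{2}}}\right).$ (46)

The power function for testing the site characteristic is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ (c-1)d),\ (c-1)d,\ \sqrt{\dfrac{abcn \dfrac{\gamma_{001}^2}{2}}{\sigma^2 + n\tau_\pi + abn\,\tau_{\beta_{00}}}}\right).$ (47)

The power function for testing the treatment-by-site characteristic interaction is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ (a-1)(c-1)d),\ (a-1)(c-1)d,\ \sqrt{\dfrac{bcn \dfrac{\gamma_{011}^2}{4}}{\sigma^2 + n\tau_\pi + bn \dfrac{\tau_{\beta_{01}}}{2}}}\right).$ (48)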

Chapter 4

FOUR EXAMPLES

We provide the HLM tables for three different designs: a 3-level cluster randomized trial, a multi-site clinical trial with a site covariate, and a combination of a cluster randomized trial and a multi-site clinical trial (a cluster randomized trial replicated across multiple sites). For each design the power functions are given for the tests of the parameters in the model. Finally we provide the power function for a potential HLM analysis based on the Tennessee classroom size study (Finn & Achilles, 1990).

3-level cluster randomized trial
In school-based intervention studies, schools are randomly assigned to the treatment and control conditions. Classrooms are nested within each school. The design can be formulated as a 3-level HLM model:

Level 3:

$\beta_{00k} = \gamma_{000} + \gamma_{001} W_k + u_{00k}$, $u_{00k} \sim N(0, \tau_\beta)$, $k = 1, \ldots, K$ (49)

Level 2:

$\pi_{0jk} = \beta_{00k} + r_{0jk}$, $r_{0jk} \sim N(0, \tau_\pi)$, $j = 1, \ldots, J$ (50)

Level 1:

$Y_{ijk} = \pi_{0jk} + e_{ijk}$, $e_{ijk} \sim N(0, \sigma^2)$, $i = 1, \ldots, n$ (51)

$W_k$ takes -1/2 or 1/2 for the control and experimental conditions.

 

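Subscript : levels | Term | Subscript | df | Parameter | Numerator or denominator of non-centrality parameter
l : 2 | $\gamma_{001} W_k$ | l | $2-1$ | $\dfrac{\gamma_{001}^2}{2}$ | $JKn \dfrac{\gamma_{001}^2}{2}$
k : K | $u_{00k}$ | k(l) | $2(K-1)$ | $\tau_\beta$ | $\sigma^2 + n\tau_\pi + nJ\tau_\beta$
j : J | $r_{0jk}$ | j(kl) | $2K(J-1)$ | $\tau_\pi$ | $\sigma^2 + n\tau_\pi$
i : n | $e_{ijk}$ | i(jkl) | $2JK(n-1)$ | $\sigma^2$ | $\sigma^2$

Table 9: HLM table of 3 level CRT

The power function for the test of $\gamma_{001}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ 2(K-1)),\ 2(K-1),\ \sqrt{\dfrac{JKn \dfrac{\gamma_{001}^2}{2}}{\sigma^2 + n\tau_\pi + nJ\tau_\beta}}\right).$³ (52)

The power function for the test of $\tau_\beta$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ 2(K-1),\ 2K(J-1)) \cdot \dfrac{\sigma^2 + n\tau_\pi}{\sigma^2 + n\tau_\pi + nJ\tau_\beta},\ 2(K-1),\ 2K(J-1)\right).$ (53)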

 

3 SAS programs can be used to compute the power value of any of the listed functions and to plot a power curve. The functions are entered exactly as given here (see Appendix C, the SAS programs).

 

The power function for the test of $\tau_\pi$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ 2K(J-1),\ 2JK(n-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau_\pi},\ 2K(J-1),\ 2JK(n-1)\right).$ (54)
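The ingredients of the three power functions for the 3-level cluster randomized trial can be collected in one helper. This is a sketch under stated assumptions: the function name, dictionary keys, and numerical inputs are hypothetical, and the helper returns only the non-centrality parameter, scale factors, and degrees of freedom that would be passed to a routine such as the SAS functions probt and probf.

```python
import math

def crt3_quantities(J, K, n, gamma_001, sigma2, tau_pi, tau_beta):
    """Ingredients of the three power functions for the 3-level CRT.

    K is the number of schools per condition, J the number of classrooms
    per school, and n the number of students per classroom.
    """
    ems_cluster = sigma2 + n * tau_pi            # entry for r_0jk
    ems_site = ems_cluster + n * J * tau_beta    # entry for u_00k
    return {
        # non-central t test of the treatment effect
        "ncp_treatment": math.sqrt(J * K * n * gamma_001 ** 2 / 2 / ems_site),
        "df_treatment": 2 * (K - 1),
        # central F test of the school variance
        "scale_site": ems_cluster / ems_site,
        "df_site": (2 * (K - 1), 2 * K * (J - 1)),
        # central F test of the classroom variance
        "scale_cluster": sigma2 / ems_cluster,
        "df_cluster": (2 * K * (J - 1), 2 * J * K * (n - 1)),
    }

# Hypothetical planning values.
q = crt3_quantities(J=4, K=5, n=10, gamma_001=0.4,
                    sigma2=1.0, tau_pi=0.1, tau_beta=0.05)
for key, value in q.items():
    print(key, value)
```

Each scale factor multiplies the critical F value inside probf, and the degrees of freedom are those listed in the HLM table.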

Multi-site clinical trial with a site covariate

The multi-site clinical trial is widely used in mental health research. Patients are randomly assigned to the treatment or control condition at each clinical site. The same study is replicated across a number of clinical sites. The key interests surround the average treatment effect across the sites and the variability of the treatment effect among the sites (Raudenbush & Liu, 1999). When the treatment effects vary significantly across sites, it usually implies that the fluctuation of the treatment effect is not simply random but is related to some characteristic of the sites, i.e. a site covariate. The model may be formulated as a 2-level HLM:

Level 2:

$\beta_{0j} = \theta_{00} + \theta_{01} W_j + u_{0j}$, $u_{0j} \sim N(0, \tau_{00})$, $j = 1, \ldots, J$ (55)

$\beta_{1j} = \theta_{10} + \theta_{11} W_j + u_{1j}$, $u_{1j} \sim N(0, \tau_{11})$ (56)

Level 1:

$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$, $r_{ij} \sim \text{i.i.d. } N(0, \sigma^2)$, $i = 1, \ldots, n$ (57)

where $X_{ij} = 1/2$ if the subject is in the treatment condition and $X_{ij} = -1/2$ if the subject is in the control condition.

 

 

 

 

 

 

 

 

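Subscript : levels | Term | Subscript | df | Parameter | Numerator or denominator of non-centrality parameter
l : 2 | $\theta_{01} W_j$ | l | $2-1$ | $\dfrac{\theta_{01}^2}{2}$ | $2 \cdot \dfrac{J}{2} \cdot \dfrac{n}{2} \cdot \dfrac{\theta_{01}^2}{2}$
j : J/2 | $u_{0j}$ | j(l) | $2\left(\dfrac{J}{2} - 1\right)$ | $\tau_{00}$ | $\sigma^2 + 2 \cdot \dfrac{n}{2}\,\tau_{00}$
k : 2 | $\theta_{10} X_{ij}$ | k | $2-1$ | $\dfrac{\theta_{10}^2}{2}$ | $2 \cdot \dfrac{J}{2} \cdot \dfrac{n}{2} \cdot \dfrac{\theta_{10}^2}{2}$
 | $\theta_{11} W_j X_{ij}$ | kl | $(2-1)(2-1)$ | $\dfrac{\theta_{11}^2}{4}$ | $\dfrac{J}{2} \cdot \dfrac{n}{2} \cdot \dfrac{\theta_{11}^2}{4}$
 | $u_{1j} X_{ij}$ | jk(l) | $2\left(\dfrac{J}{2} - 1\right)$ | $\dfrac{\tau_{11}}{2}$ | $\sigma^2 + \dfrac{n}{2} \cdot \dfrac{\tau_{11}}{2}$
i : n/2 | $r_{ij}$ | i(jkl) | $2J\left(\dfrac{n}{2} - 1\right)$ | $\sigma^2$ | $\sigma^2$

Table 10: HLM table of MST with a covariate at the site level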

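The power function for the test of $\theta_{01}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \sqrt{\dfrac{\dfrac{Jn}{4}\,\theta_{01}^2}{\sigma^2 + n\tau_{00}}}\right).$ (58)

The power function for the test of $\tau_{00}$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-2,\ 2J(n/2-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau_{00}},\ J-2,\ 2J(n/2-1)\right).$ (59)

The power function for the test of $\theta_{10}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \sqrt{\dfrac{\dfrac{Jn}{4}\,\theta_{10}^2}{\sigma^2 + \dfrac{n}{2} \cdot \dfrac{\tau_{11}}{2}}}\right).$ (60)

The power function for the test of $\theta_{11}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \sqrt{\dfrac{\dfrac{J}{2} \cdot \dfrac{n}{2} \cdot \dfrac{\theta_{11}^2}{4}}{\sigma^2 + \dfrac{n}{2} \cdot \dfrac{\tau_{11}}{2}}}\right).$ (61)

The power function for the test of $\tau_{11}$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-2,\ 2J(n/2-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + \dfrac{n}{2} \cdot \dfrac{\tau_{11}}{2}},\ J-2,\ 2J(n/2-1)\right).$ (62)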

A combination of cluster randomized trial (CRT) and multi-site trial (MST)
This design has the features of both CRT and MST. At each site there is a CRT;
and the same CRT is replicated across a number of sites. For example, a
school-based intervention CRT can be conducted across a number of different

school districts. It then becomes a 3-level HLM; and the model is listed as

follows:

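Level 3:

$\beta_{00k} = \gamma_{000} + u_{00k}$, $u_{00k} \sim N(0, \tau_{\beta_{00}})$, $k = 1, \ldots, K$ (63)

$\beta_{01k} = \gamma_{010} + u_{01k}$, $u_{01k} \sim N(0, \tau_{\beta_{01}})$ (64)

Level 2:

$\pi_{0jk} = \beta_{00k} + \beta_{01k} X + r_{0jk}$, $r_{0jk} \sim N(0, \tau_\pi)$, $j = 1, \ldots, J$ (65)

Level 1:

$Y_{ijk} = \pi_{0jk} + e_{ijk}$, $e_{ijk} \sim N(0, \sigma^2)$, $i = 1, \ldots, n$ (66)

Subscript : levels | Term | Subscript | df | Parameter | Numerator or denominator of non-centrality parameter
k : K | $u_{00k}$ | k | $K-1$ | $\tau_{\beta_{00}}$ | $\sigma^2 + n\tau_\pi + 2 \cdot \dfrac{J}{2}\,n\,\tau_{\beta_{00}}$
l : 2 | $\gamma_{010} X$ | l | $2-1$ | $\dfrac{\gamma_{010}^2}{2}$ | $\dfrac{J}{2} K n \dfrac{\gamma_{010}^2}{2}$
 | $u_{01k} X$ | kl | $(K-1)(2-1)$ | $\dfrac{\tau_{\beta_{01}}}{2}$ | $\sigma^2 + n\tau_\pi + \dfrac{J}{2}\,n\,\dfrac{\tau_{\beta_{01}}}{2}$
j : J/2 | $r_{0jk}$ | j(kl) | $2K\left(\dfrac{J}{2} - 1\right)$ | $\tau_\pi$ | $\sigma^2 + n\tau_\pi$
i : n | $e_{ijk}$ | i(jkl) | $2 \cdot \dfrac{J}{2}\,K(n-1)$ | $\sigma^2$ | $\sigma^2$

Table 11: HLM table of a combination of CRT and MST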

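The power function for the test of $\tau_{\beta_{00}}$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ K-1,\ 2K(J/2-1)) \cdot \dfrac{\sigma^2 + n\tau_\pi}{\sigma^2 + n\tau_\pi + nJ\tau_{\beta_{00}}},\ K-1,\ 2K(J/2-1)\right).$ (67)

The power function for the test of $\gamma_{010}$ is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ (K-1)(2-1)),\ (K-1)(2-1),\ \sqrt{\dfrac{\dfrac{J}{2} K n \dfrac{\gamma_{010}^2}{2}}{\sigma^2 + n\tau_\pi + \dfrac{J}{2}\,n\,\dfrac{\tau_{\beta_{01}}}{2}}}\right).$ (68)

The power function for the test of $\tau_{\beta_{01}}$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ (K-1)(2-1),\ 2K(J/2-1)) \cdot \dfrac{\sigma^2 + n\tau_\pi}{\sigma^2 + n\tau_\pi + \dfrac{J}{2}\,n\,\dfrac{\tau_{\beta_{01}}}{2}},\ (K-1)(2-1),\ 2K(J/2-1)\right).$ (69)

The power function for the test of $\tau_\pi$ is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ 2K(J/2-1),\ JK(n-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau_\pi},\ 2K(J/2-1),\ JK(n-1)\right).$ (70)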

Multi-site trial with a continuous covariate and a site characteristic

Studies of school effectiveness relate different types of school policy to students' achievement. Data are often collected on students from a number of schools, which can be classified by their policy types. The students are nested in individual schools, and schools are nested in different policy types. HLM is often used to analyze such nested data. The school policy type is treated as a school-level categorical covariate; the students' background information is modeled as student-level variables, which can be either categorical or continuous. For example, students' gender is a categorical variable, and their scores on achievement tests are continuous variables. A generic model may be constructed as follows:

School-level:

$\beta_{0j} = \theta_{00} + \theta_{01} W_j + u_{0j}$, $u_{0j} \sim N(0, \tau_{00})$, $j = 1, \ldots, J$ (71)

$\beta_{1j} = \theta_{10} + \theta_{11} W_j + u_{1j}$, $u_{1j} \sim N(0, \tau_{11})$ (72)

$\beta_{2j} = \theta_{20} + \theta_{21} W_j + u_{2j}$, $u_{2j} \sim N(0, \tau_{22})$ (73)

where $W_j$ is a school-level categorical variable, assumed to be dichotomous for simplicity.

Student-level:

$Y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + r_{ij}$, $r_{ij} \sim \text{i.i.d. } N(0, \sigma_c^2)$, $i = 1, \ldots, n$ (74)

where $X_1$ is a categorical variable that takes 1/2 and -1/2 for a student-level dichotomous characteristic; $X_2$ is a continuous variable, i.e. a continuous covariate; and $\sigma_c^2$ is the student-level error variance after the inclusion of the continuous covariate.

Kreft (1993) used the same type of HLM model to study the effect of schools' selective recruitment on students' success. The sample contains 70 secondary schools in Amsterdam. Some schools selectively admit students based on their scores on achievement tests, and the other schools admit all students regardless of their scores. So the selective policy of schools is the school-level covariate, and it is represented by $W_j$ in the model. The student-level variables contain gender, the score on an achievement test, and their interaction. Gender corresponds to $X_1$ in the model and the test score to $X_2$. The interaction can be deemed an additional continuous covariate like $X_2$ (we limit the number of continuous covariates to one in the model for simplicity, though the results generalize).

If we plan a similar study, we can use the same model. Assuming the design is balanced, we may derive the power function for the test of school types, i.e. the test of parameter $\theta_{01}$. The power function is based on equation (58) except that $\sigma^2$ in (58) is replaced by $\sigma_c^2$, that is,

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \sqrt{\dfrac{\dfrac{Jn}{4}\,\theta_{01}^2}{\sigma_c^2 + n\tau_{00}}}\right).$ (75)

This is because we reduce the above-mentioned model to a multi-site trial with a site characteristic. If we move the continuous covariate $X_2$ to the left side of equation (74), (74) changes into (76), and (76) is equivalent to equation (57), which is the level 1 model in the multi-site trial with a site characteristic.

$Y_{ij}^{*} = Y_{ij} - \beta_{2j} X_{2ij} = \beta_{0j} + \beta_{1j} X_{1ij} + r_{ij}$ (76)

After changing $Y_{ij}$ into the adjusted $Y_{ij}^{*}$, we can apply the power functions of the multi-site trial with a site characteristic to the above-mentioned generic model.

Chapter 5

NUMERICAL APPLICATION OF POWER FUNCTIONS IN PLANNING
EDUCATIONAL STUDIES

The power function evaluates the probability of rejecting the null hypothesis in a study. Since most studies are designed to reject the null hypothesis, statistical power becomes a natural criterion for evaluating the soundness of a research plan. In the following we examine statistical power in two designs using HLM, i.e. the cluster randomized trial and the multi-site trial. In each design we pose a research question. Appropriate power functions are then chosen to determine the sample sizes. At the end the two designs are compared in terms of power performance.

Cluster randomized trial

The cluster randomized trial is used widely in educational research. For
example, schools are randomly assigned to the treatment or control condition.
Students in the same schools tend to share common characteristics; and their
responses to the treatment may not be independent of each other. The nesting
nature of the design requires HLM analysis (see Raudenbush, 1997).

The model may be formulated as follows:

Level 1:

$Y_{ij} = \beta_{0j} + r_{ij}$, $r_{ij} \sim N(0, \sigma^2)$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J$ (77)

where

$Y_{ij}$ is the individual score;

$\beta_{0j}$ is the mean of the jth cluster;

$r_{ij}$ is the individual error;

n is the number of subjects in each cluster;

J is the total number of clusters.

Level 2:

$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$, $u_{0j} \sim N(0, \tau_{00})$, (78)

where

$\gamma_{00}$ is the grand mean;

$\gamma_{01}$ is the treatment effect;

$W_j$ takes 1/2 for the treatment condition and -1/2 for the control condition;

$u_{0j}$ is the cluster effect.

The combined model is therefore:

$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + u_{0j} + r_{ij}$ (79)

The derived HLM table is as follows:

 

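Subscript : levels | Term | Subscript | df | Parameter | Numerator/denominator of non-centrality parameter
k : 2 | $\gamma_{01} W_j$ | k | 1 | $\dfrac{\gamma_{01}^2}{2}$ | $\dfrac{Jn\gamma_{01}^2}{4}$
j : J/2 | $u_{0j}$ | j(k) | $J-2$ | $\tau_{00}$ | $\sigma^2 + n\tau$
i : n | $r_{ij}$ | i(jk) | $J(n-1)$ | $\sigma^2$ | $\sigma^2$

Here $\tau$ is shorthand for $\tau_{00}$.

The power function for the test of the main treatment effect is

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \sqrt{\dfrac{Jn\gamma_{01}^2}{4(\sigma^2 + n\tau)}}\right).$ (80)

The power function for the test of the cluster effect is

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-2,\ J(n-1)) \cdot \dfrac{\sigma^2}{\sigma^2 + n\tau},\ J-2,\ J(n-1)\right).$ (81)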

The variance components and the effect size $\gamma_{01}$ are real-valued parameters and are influenced by their measurement scale. In planning a specific study we rarely know those parameters. However, functions of those parameters are available from previous studies of a similar nature. In the cluster randomized trial, $\rho = \dfrac{\tau}{\sigma^2 + \tau}$ is reported as an intraclass correlation coefficient in most of the previous studies using the same design. It varies from 0 to 1.0. $\delta = \dfrac{\gamma_{01}}{\sqrt{\sigma^2 + \tau}}$ is the standardized effect size, whose magnitude may easily be evaluated. $\delta$ may be assumed to take 0.2, 0.5, and 0.8 for small, medium, and large effects (Cohen, 1988). It is therefore natural to translate the variance components and effect size into these functional forms, whose values we can get from previous studies. After reparameterization the power function for the test of the main effect becomes

$1 - \mathrm{probt}\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\ \dfrac{\delta}{\sqrt{\dfrac{4}{J}\left(\dfrac{1-\rho}{n} + \rho\right)}}\right).$ (82)

The power function for the test of the cluster effect becomes

$1 - \mathrm{probf}\left(\mathrm{finv}(0.95,\ J-2,\ J(n-1)) \cdot \dfrac{1-\rho}{1-\rho+n\rho},\ J-2,\ J(n-1)\right).$ (83)
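The power function (83) involves only the central F distribution, so it can be evaluated without a statistics package. The sketch below is an illustration, not the SAS program of the appendix: it implements the F distribution function and its quantile in pure Python through the regularized incomplete beta function (a standard continued-fraction method), and the function names f_cdf, f_inv and cluster_effect_power are hypothetical stand-ins for probf and finv.

```python
import math

def _betacf(a, b, x, max_iter=500, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        steps = (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        )
        for aa in steps:
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(x, df1, df2):
    """Distribution function of the central F distribution."""
    if x <= 0.0:
        return 0.0
    return betainc(df1 / 2.0, df2 / 2.0, df1 * x / (df1 * x + df2))

def f_inv(p, df1, df2):
    """Quantile of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, df1, df2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, df1, df2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def cluster_effect_power(J, n, rho, alpha=0.05):
    """Power of the test of the cluster effect, following the form of (83)."""
    df1, df2 = J - 2, J * (n - 1)
    scale = (1.0 - rho) / (1.0 - rho + n * rho)
    return 1.0 - f_cdf(f_inv(1.0 - alpha, df1, df2) * scale, df1, df2)

print(cluster_effect_power(J=10, n=20, rho=0.05))
```

When rho is 0 the scale factor is 1 and the power reduces to the significance level, which gives a quick check of the implementation.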

We may substitute the hypothesized parameter values into the power function
and plot the power against a sample size variable, e.g. J, the number of clusters
or n the number of subjects in each cluster. An appropriate sample size may be
found from the power curve to obtain a desired power level. Depending on our
research question we may use different power functions in planning the study. In

the following we present a typical research problem.

An educational researcher wants to design a school-based intervention study. The researcher is interested in comparing the effects of two counseling programs on students' morale and academic aspiration. The outcome will be a composite score of morale and academic aspiration on a continuous scale. It is logistically feasible to administer only one counseling program within a school at a time, so the researcher decides to use the cluster randomized trial. The schools, as clusters, are randomly assigned to one counseling program or the other. The researcher has 10 participating schools and wants to know how many students should be recruited in each school. Since the effect of the counseling programs corresponds to the treatment effect in the model, the power function for the test of the main effect of treatment should be used to choose the sample size. Assume that the researcher gets an intraclass correlation coefficient from previous school-based studies, e.g. $\rho = 0.05$, and a standardized effect size of 0.5 from a preliminary study. The power function can be plotted over the sample size n as in figure 1 (see table 12 in appendix D for numerical values). If the cluster size n is set to 20, then the power will be 0.75. The choice of a sample size of 20 at each school is therefore justified.
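For the planning values of this example (J = 10, n = 20, intraclass correlation 0.05, standardized effect size 0.5), the non-centrality parameter of (82) can be computed directly. The sketch below is illustrative only; it also checks the reparameterization against the raw form (80) by fixing the total variance at 1, which is an assumption made here purely for the check.

```python
import math

def treatment_ncp(J, n, rho, delta):
    """Non-centrality parameter of the reparameterized power function (82)."""
    return delta / math.sqrt(4.0 / J * ((1.0 - rho) / n + rho))

def treatment_ncp_raw(J, n, gamma_01, sigma2, tau):
    """Non-centrality parameter in the raw parameterization of (80)."""
    return math.sqrt(J * n * gamma_01 ** 2 / (4.0 * (sigma2 + n * tau)))

J, n, rho, delta = 10, 20, 0.05, 0.5
ncp = treatment_ncp(J, n, rho, delta)
df = J - 2

# With sigma2 + tau = 1: tau = rho, sigma2 = 1 - rho, gamma_01 = delta.
ncp_raw = treatment_ncp_raw(J, n, gamma_01=delta, sigma2=1.0 - rho, tau=rho)

print(round(ncp, 3), round(ncp_raw, 3), df)
```

The non-centrality parameter is about 2.53 with 8 degrees of freedom; these are the two values a non-central t routine needs to return the power.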

[Figure: power (vertical axis, 0 to 1.0) plotted against cluster size N
(horizontal axis, 0 to 50). Upper curve for fixed effect; lower curve for
random effect.]

Figure 1: power in the cluster randomized trial


Observing the power function, we may notice how n and J influence the power
given the parameters $\rho$ and $\delta$. If $\rho$ is small, like 0.05 in the
previous case, then most of the variation among students' scores occurs within
schools. If, in addition, it is more costly to sample clusters than to sample
people within clusters, then increasing n is a more efficient way to raise the
power than increasing J. Increasing n reduces the term $(1-\rho)/n$ in the
denominator of the non-centrality parameter and thus increases the power. This
is reflected in Figure 1. On the contrary, if $\rho$ is large, the denominator
is dominated by $\rho$ itself, which n cannot reduce, and increasing J is the
more efficient way to get high power (see Raudenbush, 1997).
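How much n can help depends on $\rho$, and this can be seen from the non-centrality argument alone. The sketch below compares the gain from doubling n under a small and a large intraclass correlation. It assumes the non-centrality expression $\delta/\sqrt{(4/J)((1-\rho)/n+\rho)}$ used in the SAS program of Appendix D. The value 0.50 for a "large" $\rho$ and the helper names are illustrative assumptions.

```python
import math

def crt_noncentrality(delta, rho, n, J):
    """Non-centrality argument for the treatment test in the CRT."""
    return delta / math.sqrt(4.0 * ((1.0 - rho) / n + rho) / J)

def gain_from_doubling_n(rho, delta=0.5, n=20, J=10):
    """Ratio of the non-centrality argument at 2n to that at n."""
    return crt_noncentrality(delta, rho, 2 * n, J) / crt_noncentrality(delta, rho, n, J)

small_rho_gain = gain_from_doubling_n(rho=0.05)  # roughly a 15% increase
large_rho_gain = gain_from_doubling_n(rho=0.50)  # roughly a 1% increase
print(round(small_rho_gain, 3), round(large_rho_gain, 3))
```

Doubling J multiplies the same argument by $\sqrt{2}$ whatever $\rho$ is, and also adds degrees of freedom, so the preference for n when $\rho$ is small rests on the relative cost of clusters and subjects.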

High power for one test does not imply high power for the other tests. In the
cluster randomized trial, obtaining the desired power for the test of the
treatment effect does not guarantee high power for the test of the cluster
effect. In Figure 1 the lower curve represents the power for the test of the
cluster effect; it is clearly much lower than the power for the test of the
treatment. Such a conflict is easily resolved if the researcher ranks the
individual tests by the importance of the research questions they answer. The
power function of the test that answers the key research question is used to
choose the sample size. In the current example the main effect of treatment is
of primary interest. The test of the treatment effect outweighs the test of
the cluster effect, and the choice of sample size should be based on the power
of the test of the main effect of treatment.


Multi-site trial

The multi-site trial is a popular design because it is easy to administer. At each
site there is an independent randomized experiment; and the same experiment is
replicated across a number of sites (see Raudenbush and Liu, 1999). The model
may be formulated as a 2-level HLM:

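Level 1:

\[
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, \qquad
r_{ij} \sim N(0, \sigma^2), \qquad
i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, J \tag{84}
\]

where

$Y_{ij}$ is the individual score;
$\beta_{0j}$ is the mean at the j-th site;
$\beta_{1j}$ is the treatment effect at the j-th site;
$r_{ij}$ is the within-cell error.

Level 2:

\[
\beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}); \tag{85}
\]
\[
\beta_{1j} = \gamma_{10} + u_{1j}, \qquad u_{1j} \sim N(0, \tau_{11}); \tag{86}
\]

where

$\gamma_{00}$ is the grand mean;
$\gamma_{10}$ is the main effect of treatment;
$\mathrm{Cov}(u_{0j}, u_{1j}) = \tau_{01}$ is the covariance between the site
mean and the treatment effect.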


The combined model is

\[
Y_{ij} = \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + X_{ij} u_{1j} + r_{ij}. \tag{87}
\]

If we express the combined model in the ANOVA notation, it becomes equation
(6). The terms in both models are arranged in the same order.
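The combined model follows from substituting the level-2 equations (85) and (86) into the level-1 equation (84). A short worked version of that substitution, using only the symbols defined above, is:

```latex
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j} X_{ij} + r_{ij} \\
       &= (\gamma_{00} + u_{0j}) + (\gamma_{10} + u_{1j})\, X_{ij} + r_{ij} \\
       &= \gamma_{00} + \gamma_{10} X_{ij} + u_{0j} + X_{ij} u_{1j} + r_{ij}.
\end{aligned}
```

The first two terms are fixed, and the last three are the site, treatment-by-site, and within-cell random components.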

The HLM table for the multi-site trial is as follows:

Term              Subscript   df            Parameter        Numerator or denominator of
                                                             non-centrality parameter

k:2    $u_{1j}$   jk          J-1           $\tau_{11}$      $\sigma^2 + n\tau_{11}/4$
j:J    $u_{0j}$   j           J-1           $\tau_{00}$      $\sigma^2 + n\tau_{00}$
i:n/2  $r$        i(jk)       2J(n/2-1)     $\sigma^2$       $\sigma^2$

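The power function for the test of main effect of treatment is

\[
1 - \mathrm{probt}\!\left(\mathrm{tinv}(0.95,\ J-1),\ J-1,\
\frac{\gamma_{10}}{\sqrt{\dfrac{4\sigma^2 + n\tau_{11}}{nJ}}}\right). \tag{88}
\]

The power function for the test of treatment-by-site interaction is

\[
1 - \mathrm{probf}\!\left(\mathrm{finv}(0.95,\ J-1,\ J(n-2)) \cdot
\frac{\sigma^2}{\sigma^2 + n\tau_{11}/4},\ J-1,\ J(n-2)\right). \tag{89}
\]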

Observing the HLM model, we may notice that $\gamma_{10}$ is the
unstandardized treatment effect, and that $\tau_{11}$ is the variance of the
unstandardized treatment effects across individual sites. As in the cluster
randomized trial, we translate those parameters into functional forms whose
values can be conjectured. $\gamma_{10}$ is transformed into
$\delta = \gamma_{10}/\sigma$, and it becomes a standardized effect size.
Similarly, $\tau_{11}$ becomes $\sigma_\delta^2 = \tau_{11}/\sigma^2$, i.e.
the variance of standardized treatment effects across sites; its value may be
set at 0.05, 0.10, or 0.15 for small, medium, and large variability
(Raudenbush & Liu, 1999). The power functions with the new parameterization
are as follows:

 

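\[
1 - \mathrm{probt}\!\left(\mathrm{tinv}(0.95,\ J-1),\ J-1,\
\frac{\delta}{\sqrt{\dfrac{4}{Jn} + \dfrac{\sigma_\delta^2}{J}}}\right); \tag{90}
\]

and

\[
1 - \mathrm{probf}\!\left(\mathrm{finv}(0.95,\ J-1,\ J(n-2)) \cdot
\frac{1}{1 + n\sigma_\delta^2/4},\ J-1,\ J(n-2)\right). \tag{91}
\]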
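The step from the unstandardized to the standardized power functions is a division of numerator and denominator by $\sigma$ (or $\sigma^2$). The worked lines below spell this out with the definitions $\delta = \gamma_{10}/\sigma$ and $\sigma_\delta^2 = \tau_{11}/\sigma^2$. They are offered as a check on the algebra, not as an additional result.

```latex
\frac{\gamma_{10}}{\sqrt{\dfrac{4\sigma^2 + n\tau_{11}}{nJ}}}
  = \frac{\gamma_{10}/\sigma}{\sqrt{\dfrac{4}{nJ} + \dfrac{\tau_{11}/\sigma^2}{J}}}
  = \frac{\delta}{\sqrt{\dfrac{4}{Jn} + \dfrac{\sigma_\delta^2}{J}}},
\qquad
\frac{\sigma^2}{\sigma^2 + n\tau_{11}/4}
  = \frac{1}{1 + n\sigma_\delta^2/4}.
```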

We may use either power function to choose the sample size. The choice
depends on the research question in the study. If the researcher tries to find
out whether an innovative instruction program is better than the routine
program, then the main effect of instruction is of primary importance, and the
power function for the test of the treatment effect should be used to choose
the sample size. If, on the contrary, the researcher is concerned with whether
the treatment effect varies with the administration of the treatments at
individual sites, the power function for the treatment-by-site effect should
be used to select a sample size.

Suppose that a researcher is interested in the differential effect of two
tutoring methods, and that he or she conjectures a medium effect size of 0.5
and a medium effect size variability across sites of 0.10, and that there are
10 participating schools. He or she wants to know how many students should be
recruited at each school to maintain the power of the test of the treatment at
0.75. The power can be plotted over a range of possible sample sizes n (see
Figure 2 and Table 13 in Appendix D). The power of the test for the
treatment-by-site interaction (random effect) is also plotted over the same
range of n. The power for the treatment effect rises very quickly as n
increases, and it reaches 0.76 when n is 14. So a sample size of 14 gives the
researcher a good chance of detecting a medium treatment effect. It is easy to
see that the power for the treatment is much higher than the power for the
interaction. This does not affect the adequacy of the research design.
Although the interaction is included in the model, it is not the focus of the
study. Its inclusion allows us to trace the sources of the variance and obtain
a good estimate of each variance component.
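For this example the two quantities that enter the SAS distribution functions can be computed directly. The sketch below evaluates them at n = 14 in Python. The function and argument names are illustrative only, and the power itself still requires the noncentral t and central F functions listed in Appendix A.

```python
import math

def mst_noncentrality(delta, var_delta, n, J):
    """Non-centrality argument of probt for the treatment test in the MST."""
    return delta / math.sqrt(4.0 / (J * n) + var_delta / J)

def mst_interaction_scale(var_delta, n):
    """Factor that multiplies the F critical value in the interaction test."""
    return 1.0 / (1.0 + n * var_delta / 4.0)

# Values from the example: delta = 0.5, variability 0.10, J = 10, n = 14.
nc = mst_noncentrality(delta=0.5, var_delta=0.10, n=14, J=10)
omega = mst_interaction_scale(var_delta=0.10, n=14)
print(round(nc, 3), round(omega, 3))  # 2.546 0.741
```

These correspond to `NC` and `OMEGA` in the MST data step of Appendix D at N = 14.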

Observing the power function (90), we can also see the effect of J on power.
For a constant n, increasing J raises the power because it increases the
non-centrality parameter in the power function. This matters most when the
effect size variability is large, because the term $\sigma_\delta^2/J$ in the
denominator can be reduced only by increasing J, not n.
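The role of J can be illustrated numerically. As n grows, the term $4/(Jn)$ vanishes and the non-centrality argument approaches $\delta\sqrt{J/\sigma_\delta^2}$, so only J can raise that ceiling. The sketch below assumes a large variability of 0.15. The helper names and the choice of n = 20 are illustrative.

```python
import math

def mst_noncentrality(delta, var_delta, n, J):
    """Non-centrality argument for the treatment test in the MST."""
    return delta / math.sqrt(4.0 / (J * n) + var_delta / J)

def mst_ceiling(delta, var_delta, J):
    """Limit of the non-centrality argument as n grows without bound."""
    return delta * math.sqrt(J / var_delta)

results = {}
for J in (10, 20):
    at_n20 = mst_noncentrality(0.5, 0.15, 20, J)
    limit = mst_ceiling(0.5, 0.15, J)
    results[J] = (at_n20, limit)
    print(J, round(at_n20, 2), round(limit, 2))
# 10 2.67 4.08
# 20 3.78 5.77
```

With J = 10, no cluster size can push the argument beyond about 4.08, whereas doubling J raises both the value at n = 20 and the ceiling by a factor of $\sqrt{2}$.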

[Figure: power (vertical axis, 0 to 1.0) plotted against sample size N
(horizontal axis, 0 to 50). Upper curve for fixed effect; lower curve for
random effect.]

Figure 2: power in the multi-site trial

In the multi-site trial the treatment conditions are crossed with the sites,
and the design allows the estimation of the treatment-by-site interaction in
addition to the site random effect. In the cluster randomized trial, the
cluster-by-treatment interaction is not estimable and is swept under the
cluster random effect. This increases our uncertainty about the source of the
variation in subjects' responses to the treatments, and it in turn enlarges
the variance of our estimate of the treatment effect. As the variance of the
estimate of the treatment effect increases, the test is less likely to reject
the null hypothesis, and statistical power falls.

Comparing the two designs in terms of power, we can see that the multi-site
trial is superior to the cluster randomized trial. The same sample size n
returns higher power in the multi-site trial than in the cluster randomized
trial. For example, when n is set at 14, power is 0.76 for the multi-site
trial and 0.68 for the cluster randomized trial (see Tables 12 and 13 in
Appendix D). When n is chosen to be 20, power is 0.85 for the multi-site trial
and 0.75 for the cluster randomized trial. In addition, site and cluster
variability both work against the power of the test of the main effect of
treatment. In the examples the multi-site trial accommodates a higher site
variability than the cluster randomized trial. In the multi-site trial the
site variability is set at a moderate level, i.e. an effect size variability
of 0.10, whereas in the cluster randomized trial the ratio of cluster
variability to total variance is 0.05, which is considered low. In short, the
multi-site trial outperforms the cluster randomized trial in terms of power
even under less favorable conditions. However, the choice of design may depend
on other logistical issues. If the schools cannot give different treatments to
their students at the same time, the cluster randomized trial may become the
preferable design, since it does not require that subjects at one place
receive different treatments.
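The difference between the two designs at n = 20 can be traced to the non-centrality arguments of the two treatment tests. The worked lines below use the example values ($\delta = 0.5$, $J = 10$, $\rho = 0.05$, $\sigma_\delta^2 = 0.10$). The symbols $\lambda_{CRT}$ and $\lambda_{MST}$ are introduced here only as labels for those arguments.

```latex
\lambda_{CRT} = \frac{0.5}{\sqrt{\dfrac{4}{10}\left(\dfrac{0.95}{20} + 0.05\right)}}
              = \frac{0.5}{\sqrt{0.039}} \approx 2.53,
\qquad
\lambda_{MST} = \frac{0.5}{\sqrt{\dfrac{4}{200} + \dfrac{0.10}{10}}}
              = \frac{0.5}{\sqrt{0.030}} \approx 2.89.
```

The multi-site trial also tests with J - 1 = 9 rather than J - 2 = 8 degrees of freedom, which works in the same direction.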

In sum, the power functions may be used to assess the statistical adequacy of
the sample size in a given design. They can also be used to compare different
designs in terms of power. The key is to come up with a reasonable set of
parameter values that are meaningful to researchers. Once the parameter values
are chosen, the power function can be plotted over a plausible range of sample
sizes. It is then easy to determine a desirable power level and sample size
for the design.


Conclusion

Sample size plays an important role in educational and social research.
Prediction studies need a large enough sample to make sound generalizations.
The larger the sample, the more stable the estimates become. Sample size is
related to the extent to which the model can make an accurate prediction in
the general case (Brooks et al., 1996). Studies that test a research
hypothesis also involve a choice of sample size. The test of the parameter
needs a large enough sample to make the final inference defensible. The larger
the sample a study uses, the more information it can generate, and the more
confidently a conclusion can be drawn about the detected treatment effect.
Such confidence is related to the probability with which we can reject the
null hypothesis when the alternative hypothesis is true. This probability is
the statistical power of the test, and it is related to the sample size of the
study: the larger the sample, the higher the statistical power that can be
achieved.

Sample size determination depends on the power function of the relevant test.
When the model becomes complicated, it is hard to derive the power functions.
This is especially true in multi-level modeling. The model is complex because
the coefficients at the lower level are considered random at the higher level.
The estimation of parameters follows sophisticated algorithms, e.g. iterative
generalized least squares or the expectation maximization (EM) procedure. It
is hard to derive the power of all the tests in all cases. However, we can
simplify the derivation of the power functions by placing some reasonable
constraints on our model. We may impose a balanced design requirement on the
power analysis. Given that studies are usually planned with a balanced design,
this constraint is quite practical. Once we limit our investigation to
balanced designs, we eliminate the differences among many estimation methods
for the parameters: they converge on the same estimate when the design is
balanced. This gives us a unique solution to the power analysis of multi-level
models.

However, the power will vary under an unbalanced design. The unbalanced
design may result either from missing values or from an unbalanced sampling
plan. In the first case the power should be lower because of the information
lost from the data. The computed power may, however, be spuriously higher or
lower than it should be if the data are not missing at random or the
imputation methods are not properly used. We discuss below the logic of power
attrition when data are missing. Assume that we use the multiple imputation
method. Distributions are first hypothesized for the missing values, and then
multiple values are generated for each missing value from those distributions
to yield multiple complete data sets. The routine analysis is then performed
on those multiple data sets to produce multiple estimates of the same
parameter, and the multiple estimates are averaged to give the final estimate
of the parameter. The variance of the final estimate consists of two
components: the first is the average of the imputed estimates' variances; the
second is the sample variance of those estimates. When the data are complete,
only the first component exists. Therefore, the variance of the final estimate
from the multiple imputation method is larger than it would be if no data were
missing (Schafer and Olsen, 1998; Rubin, 1987). The larger the variance of the
estimate, the less likely the test is to reject the null hypothesis. The power
of the test therefore decreases.
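The two variance components can be made concrete with a small numerical sketch of the combining rules in Rubin (1987), where the between-imputation component is weighted by (1 + 1/m) for m imputed data sets. The estimates and variances below are hypothetical values chosen for illustration, and the function name is not from the dissertation.

```python
from statistics import mean, variance

def pool_imputations(estimates, variances):
    """Combine m completed-data results using Rubin's (1987) rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average of the imputed estimates' variances
    between = variance(estimates)   # sample variance of the m estimates
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, within, between, total

# Hypothetical results from m = 3 imputed data sets.
estimate, within, between, total = pool_imputations(
    estimates=[0.48, 0.52, 0.50],
    variances=[0.040, 0.040, 0.040],
)
print(round(estimate, 3), round(within, 5), round(total, 5))
```

The total variance exceeds the within-imputation component whenever the imputed estimates differ, which is the source of the power loss described above.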

The unbalanced design may also arise from a sampling plan. Some sampling
units may naturally have more subjects than other units. In general, given the
total sample size, the power will be lower than in a balanced design. It is
difficult to assess the power change without real cases. There are many
procedures to adjust for unbalanced designs in the data analysis, and those
procedures may vary in their power performance. In addition, the distribution
of the test statistic often depends on the specific procedure used. If the
departure from a balanced design is not severe, we may treat the design as
balanced and calculate power by substituting the average sample sizes or their
harmonic means into the power functions.
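As a small illustration of the substitution, the sketch below computes both the arithmetic and the harmonic mean of a set of unequal cluster sizes. The sizes are hypothetical. Either value could then be used as n in the balanced-design power functions.

```python
from statistics import harmonic_mean, mean

# Hypothetical cluster sizes from a mildly unbalanced sampling plan.
cluster_sizes = [12, 18, 20, 25, 30]

n_arithmetic = mean(cluster_sizes)
n_harmonic = harmonic_mean(cluster_sizes)
print(n_arithmetic, round(n_harmonic, 2))  # 21 19.07
```

The harmonic mean is never larger than the arithmetic mean, so it gives the more conservative value of n.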

Under the balanced design, multi-level modeling can be carried out with two
approaches: mixed ANOVA and hierarchical linear models. They are essentially
the same in the planning stage of a study. The dissertation shows the
connection between the two approaches: each can be translated into the other.
In the former approach it is easier to do the power analysis of tests of fixed
effects at more than one level. The second approach (HLM) offers flexibility
in the data analysis stage because it accommodates the missing values and
unbalanced designs of real data. The dissertation shows the power analysis for
both approaches. For the HLM approach the dissertation introduces an HLM table
from which the power functions of the parameters in the model can be derived.

In HLM the power analysis uses the estimates of parameters at the lower level
as the outcomes for the parameters at the higher level. At each level the
model is simplified to a linear regression, which takes the form of either a
one-sample t test or a two-sample independent t test. The power functions of
the relevant parameters are derived as in the case of the one-sample or
two-sample t test. The expectations of the estimated variances of the random
components follow a pattern from the lower level to the higher level. Through
algebraic transformations we may use the estimates of variance to test each
random component at each level. The estimates of the treatment and variance
components are algebraically related to the mean squares of their counterparts
in ANOVA. The ANOVA tables provide a scaffold for systematically developing
power functions for the key parameters in HLM.


With slight modification the power analysis can be extended to the case of
continuous covariates at each level in HLM. We use the CRT as an example to
illustrate the approach and then generalize it to any level. We may assume
some covariates at the second level and modify equation (27) as follows:

\[
\beta_{0j} = \gamma_0 + \gamma_1 W_j + \gamma_2 X_{1j} + \gamma_3 X_{2j} + \cdots + u_j,
\qquad u_j \sim N(0, \tau), \qquad j = 1, \ldots, J. \tag{92}
\]

To simplify the computation, we assume that the population coefficients of
those covariates are known, that the percentage of variation in $\beta_{0j}$
due to the covariates is known (empirical estimates from a previous study may
be substituted), and that the covariates are not collinear with the fixed
effects (randomization or matching subjects on covariates can help to achieve
that). If we leave those covariates out of the analysis, we force $\tau$ to be
larger than it should be, because the variation due to the covariates is swept
under the random error at that level. To assess the power change due to the
inclusion of covariates, we may adjust the $\tau$ parameters in the power
function by a percentage score, that is

\[
\eta = \frac{\tau_c}{\tau} \tag{93}
\]

where $\eta$ is the ratio of the reduced $\tau_c$, due to the inclusion of
covariates, to the original $\tau$. For example, the adjusted power function
for testing the treatment effect becomes

\[
1 - \mathrm{probt}\!\left(\mathrm{tinv}(0.95,\ J-2),\ J-2,\
\frac{\gamma_1}{\sqrt{\dfrac{4}{J}\left(\dfrac{\sigma^2}{n} + \eta\tau\right)}}\right). \tag{94}
\]

 


Of course, the computed power value will be approximate. It will be higher
than the real one because it ignores the estimation of the covariate
coefficients. The estimation of covariate coefficients consumes some
information in the data, which might otherwise be used to gain precision in
estimating the treatment effect. If there is collinearity between the
covariates and the fixed effect, then the variance estimate for the treatment
effect will increase correspondingly (see Raudenbush, 1997) and power will
decrease. In short, the real power value falls between the unadjusted power
function and the adjusted power function. To generalize the approach to any
level, we may hypothesize a percentage score $\eta = \tau_c/\tau$ for each
level, where $\tau$ is the random variance at that level and $\tau_c$ is the
reduced random variance due to the inclusion of covariates. We may obtain
those percentage scores from previous studies and then adjust the random error
parameters in the power function by their corresponding percentage scores.
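The effect of the percentage score on the non-centrality argument can be sketched numerically. The example below assumes the adjusted argument has the form $\gamma_1/\sqrt{(4/J)(\sigma^2/n + \eta\tau)}$. The parameter values ($\sigma^2 = 0.95$, $\tau = 0.05$, $\eta = 0.5$) and the function name are hypothetical choices for illustration.

```python
import math

def adjusted_noncentrality(gamma, sigma2, tau, n, J, eta=1.0):
    """Non-centrality argument with tau scaled by eta (eta = 1: no covariates)."""
    return gamma / math.sqrt(4.0 / J * (sigma2 / n + eta * tau))

# Hypothetical values: covariates assumed to explain half of the cluster variance.
unadjusted = adjusted_noncentrality(0.5, 0.95, 0.05, n=20, J=10)
adjusted = adjusted_noncentrality(0.5, 0.95, 0.05, n=20, J=10, eta=0.5)
print(round(unadjusted, 3), round(adjusted, 3))  # 2.532 2.936
```

Under these assumptions the real power would lie between the values that `probt` returns for the two arguments.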

When planning a study, researchers can standardize the parameters in the power
functions and thus avoid the need for full knowledge of the key parameters.
This makes it easier to plan a study. Raudenbush (1997) and Raudenbush & Liu
(1999) have proposed standardization schemes for the cluster randomized trial
and the multi-site clinical trial. They can be adapted to general cases,
because every two adjacent levels of an HLM essentially form a CRT or an MST.


The rules in the dissertation may form the basis of computer software that
computes the power of the tests of key parameters in HLM. Hopefully the
dissertation will become a stepping stone to further investigation of power
analysis for more general HLM, e.g. models with categorical or multivariate
outcomes.


APPENDICES


APPENDIX A

DEFINITIONS OF THE USED PROBABILITY FUNCTIONS

Noncentral t cumulative distribution function:

probt(x, degrees of freedom, non-centrality parameter)

Quantile function for the central t distribution:

tinv(cumulative probability, degrees of freedom)

Central F cumulative distribution function:

probf(x, df for the numerator, df for the denominator)

Noncentral F cumulative distribution function:

probf(x, df for the numerator, df for the denominator, non-centrality parameter)

Quantile function for the central F distribution:

finv(cumulative probability, df for the numerator, df for the denominator)

Noncentral chi-square cumulative distribution function:

probchi(x, df, non-centrality parameter)

Quantile function for the central chi-square distribution:

cinv(cumulative probability, df)


APPENDIX B
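CONVERSION BETWEEN NON-CENTRAL T' AND F'

I. Definition of non-central T' and F'

\[
T'_{\nu}(\delta) \sim \frac{U + \delta}{\sqrt{\chi^2_{\nu}/\nu}},
\]

where U is a standard normal random variable.

\[
F'_{1,\nu}(\delta^2) \sim \frac{\chi'^2_{1}(\delta^2)/1}{\chi^2_{\nu}/\nu}
= \frac{(U + \delta)^2}{\chi^2_{\nu}/\nu}
\]

Therefore

\[
T'_{\nu}(\delta) =
\begin{cases}
\sqrt{F'_{1,\nu}(\delta^2)}, & T'_{\nu}(\delta) \ge 0 \\
-\sqrt{F'_{1,\nu}(\delta^2)}, & T'_{\nu}(\delta) < 0
\end{cases}
\]

Also we state the following results without proof, since the proof uses the same
logic as the following derivation:

II. Conversion in power of two-sided test between non-central T' and F'

\[
\begin{aligned}
\text{power} &= P\left[T'_{\nu}(\delta) \ge t_{1-\alpha/2,\nu}\right]
              + P\left[T'_{\nu}(\delta) \le -t_{1-\alpha/2,\nu}\right] \\
&= P\left[T'_{\nu}(\delta) \ge \sqrt{F_{1-\alpha;1,\nu}}\right]
 + P\left[T'_{\nu}(\delta) \le -\sqrt{F_{1-\alpha;1,\nu}}\right] \\
&= P\left[\left(T'_{\nu}(\delta)\right)^2 \ge F_{1-\alpha;1,\nu}\right] \\
&= P\left[F'_{1,\nu}(\delta^2) \ge F_{1-\alpha;1,\nu}\right]
\end{aligned}
\]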


APPENDIX C

SAS PROGRAM TO COMPUTE POWER

 

 

 

/*==================================================
THIS SAS PROGRAM IS USED TO COMPUTE THE VALUES
OF POWER FUNCTIONS IN THE DISSERTATION.
THE FUNCTIONS SHOULD BE ENTERED AS THEY APPEAR
IN THE DISSERTATION; AND ALL THE PARAMETERS SHOULD
BE REAL VALUES.
==================================================*/

%KEYDEF F1 'END; PGM; REC; SUB';

%LET P=;
%LET FUN=;

%WINDOW FUNCTION COLOR=CYAN ROWS=30 COLUMNS=70

GROUP=FIRST
#5 @4 "INPUT THE POWER FUNCTION"
#6 @4 "ALL THE PARAMETER INPUTS SHOULD BE REAL VALUES"
#10 @4 "ENTER POWER FUNCTION BELOW"
#12 @4 FUN 60 ATTR=UNDERLINE REQUIRED=YES

GROUP=SECOND
#5 @4 FUN 60
#7 @4 'THE ABOVE FUNCTION IS EQUAL TO ' @36 P 3 ATTR=UNDERLINE
#12 @4 'PRESS' @10 'ENTER' A=UNDERLINE @16 'TO END'
#13 @4 'OR PRESS FUNCTION KEY' @28 'F1' A=UNDERLINE @32 'TO CONTINUE';

%DISPLAY FUNCTION.FIRST;

/* EVALUATE THE ENTERED EXPRESSION IN A DATA STEP */
DATA DSN1;
POWER=&FUN;
RUN;

/* COPY THE RESULT INTO MACRO VARIABLE P FOR DISPLAY */
DATA _NULL_;
SET DSN1;
CALL SYMPUT('P',TRIM(LEFT(POWER)));
RUN;

%DISPLAY FUNCTION.SECOND;

 

/*==================================================
THIS PROGRAM TAKES A VARIABLE NAME, ITS RANGE,
AND A POWER FUNCTION. IT THEN PLOTS THE POWER
FUNCTION AGAINST THE VARIABLE OVER THE PROVIDED
RANGE.
==================================================*/

%KEYDEF F1 'END; PGM; REC; SUB';

%LET X=;
%LET UPBOUND=;
%LET LOWBOUND=;
%LET FUN=;

%WINDOW PWPLOT COLOR=CYAN ROWS=30 COLUMNS=70

GROUP=FIRST
#5 @4 "INPUT THE VARIABLE NAME" @36 X 8 ATTR=UNDERLINE
#6 @4 "AGAINST WHICH POWER SHOULD BE PLOTTED"

GROUP=SECOND
#5 @4 "INPUT THE VARIABLE NAME" @36 X 8 ATTR=UNDERLINE
#6 @4 "AGAINST WHICH POWER SHOULD BE PLOTTED"

#8 @4 "ENTER THE UPBOUND"
@29 UPBOUND 8 ATTR=UNDERLINE REQUIRED=YES
@43 "FOR" @48 X PROTECT=YES

#10 @4 "ENTER THE LOWBOUND"
@29 LOWBOUND 8 ATTR=UNDERLINE REQUIRED=YES
@43 "FOR" @48 X PROTECT=YES

#13 @4 "ENTER POWER FUNCTION BELOW"
#14 @4 FUN 60 ATTR=UNDERLINE REQUIRED=YES;

%DISPLAY PWPLOT.FIRST;
%DISPLAY PWPLOT.SECOND;

/* EVALUATE THE POWER FUNCTION AT 101 EQUALLY SPACED POINTS */
DATA PW (KEEP=POWER &X);
LOW=INPUT(SYMGET('LOWBOUND'),BEST12.);
UP =INPUT(SYMGET('UPBOUND'),BEST12.);
INC=(UP-LOW)/100;

DO &X=LOW TO UP BY INC;
POWER=&FUN;
OUTPUT;
END;

RUN;

GOPTIONS HORIGIN=2 VORIGIN=2 VSIZE=5 HSIZE=4;
SYMBOL1 INTERPOL=JOIN WIDTH=2;

AXIS1 ORDER=(0 TO 1.0 BY 0.1);

PROC GPLOT DATA=PW;
PLOT POWER*&X / VAXIS=AXIS1 FRAME;
RUN;


APPENDIX D

SAS PROGRAM FOR FIGURE 1 AND 2 AND TABLE 12 AND 13

 

/*==================================================
THIS PROGRAM PRODUCES FIGURE 1 AND 2 AND TABLE 12 AND
13 IN THE DISSERTATION.
FIGURE 1 AND TABLE 12 ARE FOR CRT;
FIGURE 2 AND TABLE 13 ARE FOR MST;
==================================================*/

FILENAME TABLE1 'C:\liu\dissertation\table1.rtf';
FILENAME TABLE2 'C:\liu\dissertation\table2.rtf';

DATA CRT (KEEP=N POWER_F POWER_R);
FILE TABLE1;

/*========================
PARAMETERS FOR CRT
========================*/

ALPHA=0.05; * SIGNIFICANCE LEVEL;
DELTA=0.5; * DELTA STANDS FOR STANDARDIZED EFFECT SIZE;
RHO=0.05; * RHO IS THE INTRACLASS CORRELATION;
J=10; * J IS # OF CLUSTERS;

PUT @10 'n'
@20 'fixed effect'
@40 'random effect'
//;

DO N=5 TO 50;

/*==================================================
POWER FUNCTION IS THE SAME AS (82, 83) IN CHAPTER 5.
NC IS THE 4TH PARAMETER IN THE POWER
FUNCTION FOR FIXED EFFECT;
OMEGA IS THE SCALE IN THE POWER
FUNCTION FOR RANDOM EFFECT
==================================================*/

NC=DELTA/SQRT(4*( (1-RHO)/N + RHO )/J );
POWER_F=1-PROBT(TINV(1-ALPHA,J-2),J-2,NC);

OMEGA=(1-RHO)/( 1-RHO + N*RHO);
POWER_R=1-PROBF(FINV(1-ALPHA,J-2,J*(N-1))*OMEGA,J-2,J*(N-1) );

FORMAT POWER_F 8.2 POWER_R 8.2;
PUT @10 N @20 POWER_F @40 POWER_R;
OUTPUT;
END;
PUT //
@10 'Table 12: power in cluster randomized trial' ;
RUN;
*PROC PRINT DATA=CRT; RUN;

DATA MST (KEEP=N POWER_F POWER_R);
FILE TABLE2;
ALPHA=0.05;
DELTA=0.5;
SIG_DELT=0.10; * VARIABILITY OF DELTA ACROSS SITES;
J=10;
PUT @10 'n'
@20 'fixed effect'
@40 'random effect'
//;
DO N=4 TO 50 BY 2; *ASSUME A BALANCED DESIGN ;

/*==================================================
POWER FUNCTIONS ARE THE SAME AS (90,91)
IN CHAPTER 5.
==================================================*/

NC=DELTA/SQRT(4/(N*J)+SIG_DELT/J);
POWER_F=1-PROBT(TINV(1-ALPHA,J-1), J-1, NC);

OMEGA=1/(1+N*SIG_DELT/4);
POWER_R=1-PROBF(FINV(1-ALPHA, J-1, J*(N-2))*OMEGA, J-1, J*(N-2) );

FORMAT POWER_F 8.2 POWER_R 8.2;

PUT @10 N @20 POWER_F @40 POWER_R;
OUTPUT;

END;

PUT //
@10 'Table 13: power in multi-site trial' ;
RUN;

%MACRO PWPLOT(DSN);
GOPTIONS HORIGIN=2 VORIGIN=2 VSIZE=5 HSIZE=4;

SYMBOL1 INTERPOL=JOIN LINE=1 WIDTH=2;
SYMBOL2 INTERPOL=JOIN LINE=2 WIDTH=1;
FOOTNOTE1 J=C H=1 'upper curve for fixed effect';
FOOTNOTE2 J=C H=1 'lower curve for random effect';
AXIS1 ORDER=(0 TO 1.0 BY 0.1)
LABEL=(FONT=SWISS 'POWER');

PROC GPLOT DATA=&DSN;
PLOT POWER_F*N POWER_R*N /OVERLAY VAXIS=AXIS1 FRAME;
RUN;

%MEND PWPLOT;

%PWPLOT(CRT)
%PWPLOT(MST)


n treatment effect cluster effect
5 0.43 0.12
6 0.48 0.14
7 0.51 0.17
8 0.55 0.19
9 0.57 0.21
10 0.60 0.23
11 0.62 0.26
12 0.64 0.28
13 0.66 0.31
14 0.68 0.33
15 0.69 0.35
16 0.70 0.38
17 0.72 0.40
18 0.73 0.42
19 0.74 0.44
20 0.75 0.46
21 0.76 0.48
22 0.76 0.50
23 0.77 0.52
24 0.78 0.54
25 0.78 0.56
26 0.79 0.57
27 0.80 0.59
28 0.80 0.61
29 0.81 0.62
30 0.81 0.63
31 0.81 0.65
32 0.82 0.66
33 0.82 0.67
34 0.83 0.69
35 0.83 0.70
36 0.83 0.71
37 0.84 0.72
38 0.84 0.73
39 0.84 0.74
40 0.84 0.75
41 0.85 0.76
42 0.85 0.77
43 0.85 0.78
44 0.85 0.79
45 0.85 0.79
46 0.86 0.80
47 0.86 0.81
48 0.86 0.81
49 0.86 0.82
50 0.86 0.83

Table 12: power in cluster randomized trial


n treatment effect treatment*site

4 0.40 0.07
6 0.51 0.09
8 0.59 0.11
10 0.66 0.13
12 0.72 0.15
14 0.76 0.17
16 0.79 0.20
18 0.82 0.22
20 0.85 0.25
22 0.86 0.27
24 0.88 0.30
26 0.89 0.32
28 0.90 0.34
30 0.91 0.37
32 0.92 0.39
34 0.93 0.41
36 0.94 0.44
38 0.94 0.46
40 0.95 0.48
42 0.95 0.50
44 0.95 0.52
46 0.96 0.54
48 0.96 0.56
50 0.96 0.58

Table 13: power in multi-site trial


BIBLIOGRAPHY


Bennett, C. A. and N. L. Franklin (1954). Statistical analysis in chemistry and
the chemical industry. Wiley, New York.

Brooks, Gordon et al. (1996). Precision power and its application to the
selection of regression sample sizes. Mid-Western Educational Researcher,
9, 10-17.

Kreft, Ita (1993). Using multilevel analysis to assess school effectiveness: a
study of Dutch secondary schools. Sociology of Education, 66, 104-129.

Littell, et al. (1997). SAS System for Mixed Models. SAS, North Carolina.

Montgomery, D. C. (1996). Design and analysis of experiments. Wiley, New
York.

Raudenbush, S. (1993). Hierarchical linear models and experimental design. In
Edwards, L. (Ed.), Applied analysis of variance in behavioral science. New
York: Marcel Dekker, Inc.

Raudenbush, Stephen & Liu, Xiaofeng (1999). Statistical power and optimal
design in multi-site clinical trial, revised and resubmitted to Psychological
Methods.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. J. Wiley &
Sons, New York.

Schafer, J. L. and Olsen, M. K. (1998). Multiple imputation for multivariate
missing-data problems: a data analyst's perspective. Multivariate Behavioral
Research, 33 (4), 545-571.

Scheffe, H. (1959). The analysis of variance. Wiley, New York.

Searle, S. R. (1971). Linear models. Wiley, New York.

Stapleton, James. (1995). Linear statistical models. New York: John Wiley.

Stroup, Walter. (1998). An introduction to mixed model analysis. Course notes.
