THFRKI

Mill/lill/Illwg/QWMIMI/W

3 1293 0918

   
     

  

LIBRAR Y
Michigan St: re
University

W l l__.-,...- W};

  

 

This is to certify that the

thesis entitled

INVESTIGATION OF METHODS OF ANALYZING
HIERARCHICAL DATA

presented by

Boonreang Kajornsin

has been accepted towards fulﬁllment
of the requirements for

Ph . D . degree in Counseling

 

 

and Educational Psychology

(Statistics and Research Design)

waist,“ ﬁwi

Major professor

Date October 9, 1980

0-7639

 

' W 7 7 "
in? (€35?
M913 33 1009

 

 

ovsnw ' ma :
25¢ per day per it.

unwind LIBRARY "mums:

[Place in’ book return to move

from circulation records

 

 

INVESTIGATION OF METHODS OF ANALYZING

HIERARCHICAL DATA

By

Boonreang Kajornsin

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

\

DOCTOR OF PHILOSOPHY

Department of Counseling and Educational Psychology

1980

é ﬂee/é;—

ABSTRACT
INVESTIGATION OF METHODS OF ANALYZING
HIERARCHICAL DATA
By

Boonreang Kajornsin

In recent years researchers have become more cognizant of the
problems of analyzing hierarchical data. It has become increasingly
evident that efforts to investigate the relationship among educational
variables have suffered from a failure to understand complications
caused by hierarchical data. When faced with the analysis of hier-
archical data many researchers have proposed alternative ways of
analyzing such data.

The general purpose of this dissertation was to investigate
various alternatives used to analyze hierarchical data by applying them
to a set of simulated data. This study extends the regression model
presented by Burstein, Linn and Capell (1978) to its multivariate form.
The model used to simulate the data is the random effects model. The
main assumption used in this model is that there is homogeneity of the
within-group regression coefficients. The main concern of this dis-
sertation is to determine which approach gives the best estimates of
the between and within regression coefficients in terms of accuracy
(least amount of bias) and in terms of precision for various situations.
The bias ratio of each estimator was also computed to facilitate com-

parisons.

Two situations were investigated in this dissertation. The first

Boonreang Kajornsin

situation was one in which there were both individual level predictors.
which were aggregated to the group level and predictors which were
defined only at the group level. The second situation was one in
which there were only individual level predictors which could be ag-
gregated. For each situation, three different data sets were genera-
ted; first, there were no group level effects; second, group level
effects were equal to the individual level effects; third, group level
effects were not equal to the individual level effects.

The simulation results showed that all analysis approaches gave
the same estimates of the pooled within-group regression coefficients
for all six cases with good precision and small bias ratios.

In the situation where there were both individual level predictors
which were aggregated to the group level, the group level analysis
approach, full model analysis approach and substraction analysis
approach all gave essentially the same estimates of the regression
coefficients defined for the group level variables. In the case where
there was no group level effects, the two stage analysis approach gave
better estimates of the regression coefficients defined for the group
level variables than for the other three approaches. In the case
where the between-group regression coefficients were equal to the
pooled within-group coefficients, all four approaches gave essentially
the same estimates of the regression coefficients defined for the group
level variables. In the case where the between—group regression coef—
ficients were not equal to the pooled within-group regression coeffi—
cients and when the intraclass correlations were low (about 0.30) all
four approaches gave the same estimates, but when the intraclass cor-

relations were high (about 0.90) the two stage analysis approach did

Boonreang Kajornsin

not give estimates of the regression coefficients as good as those
given by the other three approaches.

In the situation where there were only individual level predictors
which could be aggregated to the group level, the simulation results
showed that for all three cases, the full model analysis approach and
the subtraction analysis approach gave exactly the same estimates of
the between-group regression coefficients but they were not close to
the true parameter values. The group level analysis and Bock appli-
cation approaches gave estimates of the between-group regression coef-
ficients that were not that different from each other and were also
close to the parameters. When the intraclass correlations were high
(about 0.90), the group level analysis approach seemed to give better
estimates of the between-group regression coefficients, but the Bock
application analysis approach gave better estimates when the intraclass
correlations were low (about 0.30) in the case where the between-
.group regression coefficients were not equal to the within regression
coefficients. When the between-group regression coefficients were
equal to zero, the Bock application analysis approach gave better
estimates of the between-group regression coefficients than the group
level analysis approach. However, when'the between-group regression
coefficients were equal to the pooled within group regression coef-
ficients, the two approaches gave essentially the same estimates for

the between-group level analysis approach.

ACKNOWLEDGEMENTS

I wish to express my sincere appreciation to my advisor and com—
mittee chairman, Professor William H. Schmidt for his assistance,
suggestions, and encouragement throughout all phases of my study.
Special thanks goes to Dr. Richard Houang, Dr. Robert Floden and Dr.
Dennis Gilliland for their help and valuable comments.

Working in the Office of Research Consultation provided me with
a most valuable experience which I will never forget. Many thanks to
Professor Joe L. Byers, the Director of the Office of Research Consul-
tation, who gave me this job, and to Professor William H. Schmidt, who
gave me four years of teaching assistantship experience.

I acknowledge with appreciation the support of the Thai Government
which allowed me to pursue my doctoral studies. Special thanks are
also extended to Apinya Assavanig for typing part of the rough draft
of this document, and to Donna Schmidt, who was kind enough to spend
many hours in correcting my English.

Most of all, I wish to thank my husband, Dr. Samnao Kajornsin, for
his encouragement, support, understanding and sympathy. Finally, I
wish to extend my gratitude to my parents, my brothers, and to Rungson

Kajornsin, my son.

ii

TABLE OF CONTENTS

LIST OF TABLES . . . . . . . . . . . . . . . . . .

LIST OF FIGURES . . . . . . . . . . . . . . .

Chapter
I.
II.

III.

IV.

V.

VI.

STATEMENT OF THE PROBLEM . . . . . . .

REVIEW OF THE LITERATURE . . . . . . . .

ALTERNATIVE APPROACHES FOR ANALYZING HIERARCHICAL

Two Stage Analysis Approach . . . . .
Group Level Analysis Approach . . . .
Subtraction Analysis Approach . . . .
The Full Model Analysis Approach .
Bock Application Analysis Approach . .
SIMULATION PROCEDURE . . . . . . . . . .
Description of Population Parameters
Description of the Generation Routine
Two Stage Analysis Approach . . . .
Group Level Analysis Approach . . . .
Subtraction Analysis Approach . . . .
Full Model Analysis Approach . . . . .
Bock Application Analysis Approach . .
SIMULATION RESULTS . . . . . . . . . . .

CONCLUSIONS AND RECOMMENDATIONS . . . . .

iii

Page

. vii

15

17

19

20

21

22

26

27

36

42

43

43

44

45

47

74

APPENDICES Page
A. COMPUTER PROGRAMS . . . . . . . . . . . . . . . . . . . . 83
Mydata program . . . . . . . . . . . . . . . . . . . . 83
Discrimination Analysis of Finn Manova Program . . . . 89
Bock Program . . . . . . . . . . . . . . . . . . . . . 90

B. RELATIONSHIP OF THE BETWEEN-GROUP REGRESSION COEFFICIENTS
FROM VARIOUS ANALYSIS APPROACHES . . . . . . . . . . . 92

BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

iv

Table

4—3

4-4

4-5

4-6

4—8

4-9

4-10

5-1

5-2

5-3

5-5

5-6

5-7

LIST OF TABLES

3x2 Design of Populations Defining the Structure

of (E + Ea) . . . . . . .

Population Compositions of Z and £8

Parameter Values for the First

Situation . . .

Parameter Values for the Second Situation

Population Covariance Matrices
Variables . . . . . . . .

Population Covariance Matrices
Situation . . . . . . .

Population Covariance Matrices
Situation . . . . . . . .

Parameter Values of the Second

Population Covariance Matrices
Variables of the Second Set

Population Covariance Matrices
of Data . . . . . . . .

of the Predictor

Under the First

Under the Second

Set of Data .

of the Predictor
of Data . . .

of the Second Set

Simulation Results of Population I-A . . . . .

Simulation Results of Population I-B

Simulation Results of Population I-C .

Simulation Results of Population Il-A . .

Simulation Results of Population II—B

Simulation Results of Population II-C . . . . .

Simulation Results of the Second Set of Data of

Population I-C . . .

Page

28
29
30

31

32

33

34

35

37

38
49
53
56
59
63

66

70

Table Page

5-8 Simulation Results of the Second Set of Data of
POPUlatIOD II-C o o o o o o o o o o o o o o o o o o o 72

vi

5-3

5-4

5-5

Sampling
Sampling
Sampling
Sampling
Sampling
Sampling
Sampling
Sampling
Sampling
Sampling
Sampling

Sampling

LIST OF FIGURES

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

Distribution

of

of

of

of

of

of

of

of

of

of

of

ID) ID) ID) m) m)

m )

of 8

vii

21

22

21

22

From

From

From

From

From

From

From

From

From

From

From

From

Population
Population
Population
Population
Population
Population
Population
Population
Population
Population
Population

Population

II-A

II-A

II-B

II-B

II—C

II-C

Page

. 52

. 55

. 62
. 65

. 65

CHAPTER I
STATEMENT OF THE PROBLEM

In recent years, the problems of analyzing hierarchical data have
been well known among researchers. It has become increasingly evident
that efforts to investigate the relationship between variables have
suffered from a failure to understand complications caused by hier-
archical data. Most educational data are hierarchically arranged,
i.e., students are grouped into classrooms which are grouped within
grade levels and within schools. The schools are also grouped within
school districts and these in turn are also grouped within state educa-
tional administrations.

Consider the problem of modeling the effects of school structure
on student achievement. Suppose we are interested in the effects of
some characteristic of school structure on achievement. There is a
systematic sorting of families into school districts that produces a
correlation of individual student attributes with school characteristics.
Therefore, an1 adequate description of the achievement process must
contain both student characteristics and school characteristics.

The practical problem is that the researcher may either analyze
individual level data (e.g., regressing individual achievement on
student characteristics and school characteristics) or he may analyze
school level averages (e.g., regressing average achievement on average
individual characteristics and school characteristics). Hannan and
Young (1976) have shown that in most realistic situations (i.e., when
models are not perfectly specified) results of the two different

analyses will be quite dissimilar.

Bidwell and Kasarda (1975) argue that when the question is posed
at the school level, the school level regression is most appropriate.
Hannan, Freeman and Meyer (1976) point out that researchers seldom
adequately specify school-level processes that are anything more than
the sum of individual—level processes. The causal arguments are con—
cerned with the impacts on individual students which are composed of
school—level outcomes. Consequently, the choice of level is open to
question.

Wiley (1973) points out the problem of analyzing data when using
large numbers of correlated explanatory variables. He indicates that
when variables defined at the level of the individual pupil are ag-
gregated to the level of the school their correlations tend to increase.
As a consequence, in the presence of large numbers of such variables,
effective analyses are hindered by excessive collinearity (high rela-
tions among independent variables). When the number of such collinear
variables becomes very large, the effects of individual variables
become very difficult to detect. Whenever variables are defined at
the school level, the appropriate unit of analysis is the school and
the number of degrees of freedom available is limited to the number of
schools.

Research on the differences between multiple regression models
applied at different levels of aggregated data indicates three things:
1) there are substantial differences in the magnitude of regression
coefficients across aggregated levels for specific models; 2) different
variables enter the models at different levels; and 3) aggregation of
individual characteristics generally inflates the estimated effects of

pupil background and thus decreases the likelihood of identifying

teacher and classroom characteristics that are effective. The results
cited above are not very comforting for the researcher who wishes to
draw conclusions about educational processes at one level but is
constrained to analysis at a different level.

When faced with the analysis of hierarchical data many researchers
have tried to propose alternative ways of analyzing such data (e.g.,
Keesling and Wiley, 1974; Cronbach and Webb, 1975; Keesling, 1976; and
Burnstein, 1976).

Keesling and Wiley (1974) prOpose a two stage analysis of hier-
archical data. They set out to define a model for disentangling the
effects of variables defined solely at the school level from those
defined at the level of the pupil.

Cronbach (1975) claims that the overall between-student coefficient
from the regression of individual outcome on individual outcome on
individual explanatory variables is a composite of the between groups
regression coefficient and the pooled within—group regression coef-
ficient. He recommends that between group effects and individual within
group effects should be examined separately.

According to Keesling (1976), to obtain the correct estimates of
the between school regression coefficient at least two models need to
be examined. These two models are a school level model and an indivi-
dual within school level model. Keesling recommends subtracting the
within school regression coefficient from the between school regression
coefficient to obtain the correct regression coefficient appropriate to
school level effects.

Burstein (1976) proposes an alternative approach by suggesting

the examination of determinants of heterogeneity of the within class

SIOpe. He suggests that the first step is to find the specific within
class adjusted intercept and SIOpe. The second step is to fit a model
at the class level with the adjusted intercept and slope used as out—
come variables and the class level explanatory variables used as
independent variables.

The general purpose of this dissertation is to investigate various
alternatives used to analyze hierarchical data by applying them to a
set of simulated data. This study extends the regression model
presented by Burstein, Linn and Capell (1978) to its multivariate form.
The model used to simulate the data is the random effects model. The
main assumption used in this model is the homogeneity of the within
group regression, that is, in contrast with Burstein's approach which
suggests allowing for the heterogeneity of the within group regressions.

The main concern of this dissertation is to determine which approach
gives the best estimates of the between and within regression coef-
ficients in terms of accuracy, least amount of bias and in terms of
precision for various situations. In other words, this dissertation
is concerned with determining how correctly the alternative procedures
tend to work, i.e., how similar the estimated coefficients at the
group level are to the known parameter values, and if the conclusions
arrived at under each analysis approach are the same.

Two situations are investigated. The first is where there are
both individual level predictors which can be aggregated, and predictors
defined only at the group level. For example, the predictors could be
length of the school day (group level), and average home background
(individual level aggregated to the group level). The second situation

is where there are only individual level predictors which are aggregated.

For example, the predictors are average home background, and average
pretest scores (both individual level variables aggragated to the
group level). For each situation, three different populations are
investigated; first, there are no group level effects; second, group
level effects are equal to the individual level effects; third, group
level effects are not equal to the individual level effects.

For the first situation, four analytical approaches will be
investigated; a two stage least squares—analysis recommended by
Keesling and Wiley, 3 group level analysis approach using only averages
recommended by Cronbach and Webb, a full model analysis approach
recommended by Keesling. For the second situation, four approaches
will be investigated: the group level analysis approach, an approach
’based on Bock (1968) using his method of estimating heritable varia-
tion in twin studies, the full model analysis approach and the sub-
traction analysis approach.

The method of investigating these various approaches will involve
the use of simulated data which are generated by computer algorithms
where the population parameters are known. For each pOpulation, fifty
samples were generated. By analyzing fifty samples, one can compare
1) the empirical distribution of the estimator for each analysis
approach to the others and 2) the empirical standard errors of the
parameter estimates. These results can be used to help determine the

appropriateness of each of the analyses for different data situations.

CHAPTER II
REVIEW OF THE LITERATURE

Traditionally, in a situation involving heirarchical data, a
variety of competing points of view have been cited as justification
for the choice of either pupils or groups (classroom, schools, etc.)
as the unit of analysis. Hannan (1976) has shown that in most inexact
cases (i.e., when the models are not perfectly specified) results of
individual level analyses and group level analyses will be quite
dissimilar. This finding makes the choice between models extremely
important. This section will review the methodologies that some invest—
igators have used to analyze multi-level data.

Cronbach and Webb (1975) reanalyzed a study by G. L. Anderson.
Anderson reported finding an interaction of drill and meaningful methods
of arithmetic instruction with student ability and achievement. Drill
was found to be superior for "overachievers" and meaningful instruction
for "underachievers" in 18 fourth-grade classrooms. Pretest measures
used in the study were the Minnesota School Ability Test and the Compass
Survey Test. Cronbach and Webb argued the importance of separating the
regression effects into the between—class and within—class categories.
In their reanalysis, separating between—class and within-class regres-
sion components of the outcome on aptitude, the Aptitude by Treatment
interaction finding disappeared. An apparent interaction in the
between-class analysis was dismissed as unreliable. No interactions
were found within classes. Finally, the concluded that studies of

interactions usually have not been powerful enough to evaluate outcome

on aptitude regressions accurately. Using the class as the unit of
analysis, even the rather large Anderson study could not set narrow
confidence limits on the regression slopes. They urged investigators
collecting data on intact classes to examine between group and within
group regressions separately.

Keesling and Wiley (1974) discussed the problem of disentangling
the effects of variables defined solely at the level of the school
(e.g., length of the school day or the highest degree held by the
principal) from those defined at the level of the individual pupil
(e.g., home background characteristics). They summarized the model
implicit in this situation by:

= + ' + +0! +
Y.. Yo -l-§i 01 & zij e

13 1}

where Yij is the outcome of the jth_pupil in the ith school, Yo is an

'

additive constant,_l is the vector of adjusted effects of school
characteristics on Y’-§i is the vector of school variables for the
ith school, Oi is an error component defined at the school level, 8'
is the vector of adjusted effects of individual characteristics on

Y,_§ is the vector of the characteristic of the jth individual in the

13'
i£h_school, and Eij is an error component defined at the individual
level.

In the context of hierarchically defined educational data, they
proposed three alternatives to obtain appropriate adjusted estimates
of the effects at the individual and school levels. The first alter—
native was to assume that the model was completely specified at the
school level, i.e., all of the school variables relevant to the out-

come are included in the model. Then the covariance (Oi, Xi) is equal

to zero, where:i is the mean of Xij for the ith school. This model

implies that individual level variables have direct impact on outcomes
only at the level of the individual; their effects at the school level
are mediated through other variables defined at the level of the school.

The second alternative was that if all the mediating variables
at the school level were not specified in the model, then the covariance
(0:, 2;) was not equal to zero where O: was the residual from the
measured school variables. In this case, the fitting of the model will
produce a biased estimate of 8, This source of bias may, however, be
eliminated by performing an analysis based on the variation within
schools. This may be done by subtracting the relevant school means
(school effect values) for the criterion variable and for each of the
pupil level explanatory variables from each of the individual values
for these variables. An analysis performed using these deviated values
will be adjusted for all sources of variation among schools. The
covariance matrix of the deviated values is called the pooled within
school covariance matrix. If this covariance matrix is computed for
all individually defined variables and used as the basis for the regres—
sion of the outcome on the ﬁij’ the resulting estimate of §_will not be
biased by specification errors at the school level.

After the adjusted effects of the individual level variables are
found, the average effect value for each school, aggregated over all
the individual pupils, may be subtracted from the criterion mean for
each school. Analyses using the school as a unit with variables
defined at the level of school as the independent variables and the
modified criterion means as the dependent variable will produce esti-
mates of the effect of the school variables adjusted for the effects

of individually defined variables. The model at the school level

becomes:

’

Yi-é’gi=yo+lgi+¢i

where T; is an achievement mean for i£h_school, éfgi is the estimated
average effect value for ith school, yo is the constant, l’is the
vector of the adjusted effects for the school variables, and oi is an
error component at the school level. Using this model, the analysis
will produce unbiased estimates of Y0 and l in the absence of specifi-
cation error.

The third alternative was that if there was some specification
bias at the level of the school-defined variables (i.e., some impor-
tant variables are missing) then the covariance (Oi, Xi) is not equal
to zero and the covairance (¢i, 2%) is not equal to zero, either.

Some of the biases in the estimate of 1_can be removed by including
the sum of the average effect values (8521) of the individually defined

varialbes as another variable in the school level analysis. The model

then becomes as follows:

I.

+' + _ +
1 Yo 151 A<§2<—i) ¢1

This technique allows the partial removal of some of the additional
bias due to the omission of relevant school level variables to the
extent that the sum of these average effect values is correlated with
the omitted variables.

Rock, Baird, and Linn (1972) studied the interaction between
college effects and students' aptitudes. They claimed that their
approach was designed to find groups of colleges that are about equally
effective for students with various levels of initial performance.

Then the characteristics of the identified criterion groups were com-

pared to see which characteristics were related to the relative

10

effectiveness of the groups. Their method attempted to provide an
intuitively simple approach which identified both overall college
effects and effects which interact with student ability. Specifically,
four steps were carried out: 1) all within school regression lines
were computed, i.e., Graduate Record Examination (GRE), area tests were
regressed on the College Entrance Examination Board, Scholastic
Aptitude Test (SAT) scores within school; 2) Ward's (1963) hierarch-
ical clustering technique was applied to group schools in the basis of
the similarity of their regression lines; 3) multiple group discriminant
functions using the estimates of the regression parameters as the
group discriminants were computed to test whether the newly formed
groups differed with respect to their pooled regression lines; and
4) discriminant functions using college descriptive variables as the
group discriminants were then computed. This method thus identified
criterion clusters of colleges that differed in effectiveness by
clustering on the SIOpe, the mean SAT scores of the students, and the
intercept. Therefore, one can identify and grouI3colleges that have
different levels of initial ability. Then the simultaneous evaluation
of the college along with the relative slopes of their pooled within
group regression lines indicated the college characteristics which
are associated with overall as well as differential effectiveness.
Burstein (1976) discussed two examples of multi-level analyses
found in studies by Rock, Baird and Linn (1972) about the interaction
of student aptitude and college characteristics and by Keesling and
Wiley (1974) in which they reanalyzed a subset of the Coleman data.
He stated that each method has certain merits and certain drawbacks.

The Keesling and Wiley approach provided effect parameters more nearly

ll

mirroring the structural form of school effects than the Rock, Baird
and Linn approach or the usual single level analysis models. Burstein's
concern was that the Keesling and Wiley approach fails to adequately
reflect the effects of between class differences in SIOpes. Moreover,
treating the resulting clusters as groups in a discriminant analysis
as Rock, Baird, and Linn did discarded any metric differences existing
among the clusters and thereby eliminated the possibility of describing
school effects in structural terms. The use of discriminant groups
results in some loss in generalizability of findings that should be
avoided. In the same paper Burstein also criticized Cronbach's approach
that recommended analyzing between class and within class separately
when intact classrooms are sampled. Burstein said that the between
class and within class analyses did not remove the need for concern
about homogeneity of regression.

Burstein proposed an alternative multilevel analysis atragety

that consisted of two stages, as follows:

1. perform within class regression (not pooled) of outcomes on
input, and
2. use the parameters (a,8) from the within class regressions

as "outcomes" in a between class analysis.

Burstein claimed that his strategy combined certain features of
approaches by Keesling and Wiley and by Rock, Baird and Linn. The
technique of using the within class parameter estimates as outcomes
should lead to more sensitive interpretation of effects and clearer
policy implications of the findings.

Burstein and Miller (1979) stated that because of its hierarchical

organization, the effects of schooling on individual pupil performance

ll

 

12

can exist both between and within the levels of the educational system.
Moreover, analyses at different levels address different questions and
thus analyses conducted at a single level were inherently inadequate.
While analyses of the relationships between "treatment" dimensions and
the mean outcomes of groups often provide useful information, impor-
tant differences in within group processes may be obscured. These
within group processes may arise due to group composition (e.g., ability
level and mixture affecting participation patterns), differential
allocation of instructional resources among the members of the group
(e.g., the grouping and pacing features of reading instruction), or
differential reactions of group members to the same instructional
treatment (aptitude-treatment interactions).

If important group-to-group differences in within group processes
exist then the use of group means as the only indicator of group out-
comes will result in misleading estimates of group (teacher, class,
treatment) effects.

Burstein and Miller's interest in alternative measures of group
outcomes has concentrated on the properties of the within-group slopes
from the regression of outcome on input. They have argued that within
group slopes are group level indicators of within group processes.
Their reason for considering slopes as outcomes was that there may be
instructional effects on the within group regression of outcomes on
input, whether there were instructional effects were present, the
analysis should attempt of isolate instructional process and practice
variables that were associated with slope variation. If such variables
can be found and alternative explanations cannot be ruled out then

variation in SIOpes becomes an important source of information for

13

researchers and policy makers, especially when considered along with
effects on other group level outcomes.

Keesling (1976) presented a model for analysis at two levels of
aggregation (e.g., pupil and school). The multivariate random effects

model for this situation is:

= + + ' = .....
‘Xij E. 2i Eij 1 1,2, ,k
j = 1,2,.....,n
all vectors are p x 1. This implies 2y = 2a +ifassuming that there are

k groups of n units, each unit having measures on p variables.

The above model, adopted from Schmidt (1969), was comprehensive
in that it permitted the estimation of effects and their standard
errors at both levels of aggregation simultaneously. Keesling, however,
did not analyze the data by using Schmidt's procedure.

Two sets of data were presented. One set dealt with data con-
structed to a particular specification. The second set dealt with
real data of a two-level nature. He analyzed data under three models.
The first model used pupil post-test score as the dependent variable,
ignoring the group structure in the data, and pretest, SES, average
pretest, average SES and hours per month of principal absence as pre—
dictors. The second model used school mean post—test as the depen—
dent variable, average pretest, average SES and hours per month of
principal absence as predictors. The third model used pupil posttest
score within school as the dependent variable with pretest and SES as
predictors. The results suggested that in order to obtain both the
correct parameter estimates and the correct standard errors, it is
necessary to perform at least two analyses. The first model gave the

correct parameter estimates, but it did not partition the residual sum

14

of squares by level of effect. The second model gave the aggregate
level standard errors, but the parameter estimates were the sum of the
between and within effects. The third model obtained the apprOpriate
estimates and standard errors for the within school effects. The
second and third model may be combined to produce correct estimates

of the between school effects by subtracting the within school

estimates from the between school estimates.

CHAPTER III
ALTERNATIVE APPROACHES FOR ANALYZING

HIERARCHICAL DATA

Of the different alternatives prOposed to analyze the hierarchical
data, four were selected for comparison in the present study. The two
stage analysis approach which was recommended by Keesling and Wiley,
group level analysis approach which was recommeded by Cronbach and
Webb, full model and 'subtraction. analysis approaches which were recom-
mended by Keesling, and Bock application analysis approach will be dis-
cussed in this chapter.

Consider the following general situation where person j is a
member of group i. The person has a set of scores, £1. and Yif’ Also
available are a set of explanatory variables defined only at the group
level which is denoted as 21. The relationship of}:ij and ii to Yi
can be decomposed into between group and within group components as

given in equation (3—1).

(3-1) Y..=u+§'(y_ —£)+B'(Z.-u)+6,+
13 Y 3 xi x ‘—Z *1 -—z 1
V _ _ 1 _
E-(Kij u i) + (8i E) (Eij 3x1) + Eij

where u , U and u represent the population means, u represents the
y -z —x —xi

i£h_group population mean,__8_a denotes the between-group regression

coefficients for the individual level variables, 82 represents the

regression coefficients defined for the group level variables, 8

represents the pooled within-group regression coefficients for the

15

16

individual level variables, and Ei represents the specific within—group
regression coefficients for group i for the individual level variables.
The (Si and Eij represent the error at the group level and at the indivi-
dual level, respectively.

This study will deal with the case where all within group slopes
are equal, resulting in (B1 - B) being equal to zero. The model for

the first simulated case is:

.. = ' - + V' - +S +
(32) Yij uy+§a(ui 11) 52(51 112) (i
8'(X. - u ) + e
-—13 -x. 13
1
Let a = u -
-x. —x, -x
1 1
a = Z. - u
—z. —1 ‘-z
1
a = X..- H
—x.. -1 —x.
13 1

The equation (3-2) can be rewritten as equation (3—3):

(3—3) Y.. = u + B'a + B‘a + Ci + B'a + 6,,
13 y -—a—xi --z—zi ---—xij 13

This implies that the variance of Y is:

(3-4) Var(Y) = B'zxa + 3'22 8 + q2 + s'zxs + 02
—aa—a —Za-Z a _ —

X . . . 7. .
where Ea is the between level variance—covariance matrix of g, 2a 18

. O l x I O O
the between level variance-covariance matrix of g, 2 IS the within

level covariance matrix of x, o: is the error variance defined at the

group level and ozis the error variance defined at the individual level.
Then there are only individual level explanatory variables, the
model is:

-' Y = + ' - +18 "l' ' -
(3 5) ij uy Ea (3x1 Bx) i E- (éij 'Exi) + Eij

And the variance of Y is:

17

(3-6) Var(Y) = £328? + o: + @ng + 02.

The five alternative analysis approaches (two—stage analysis,
group level analysis, full model analysis, subtraction analysis, and
Bock application analysis) that are investigated in this dissertation
can be related to the models as given in equations (3-2) and (3—5)
for the first and second situation, respectively. In the following

pages, the procedures of each alternative approach is discussed.

Two Stage Analysis Approach

 

The two stage analysis approach was recommended by Keesling and
Wiley (1974). Wiley mentions that one of the problems in the analysis
of multi-level data has been separation of the effects of the aggregated
variables into parts reflecting their individual level effects on one
hand, and their effects via school climate and organization on the other.
One way to describe an appropriate method of analysis of hierarchical
data is in terms of the general notions of statistical confounding and
control. If we wish to assess the impact of how one explanatory vari—
able is correlated with another one, then if we ignore the second, we
will attribute to the first not only its effect, but also a spurious
effect which is due to the correlation between it and the second, and
the effect of the second. If we utilize an apprOpriate method of
analysis which takes into account the second variable, i.e., its
effects and its relationship to the first, we may obtain an adjusted
assessment of the effect of the first variable which is not confounded
by the second.

Keesling and Wiley set out to define a model for disentangling

the effects of variables defined solely at the school level from those

18

defined at the level of the pupil. The process of disentanglement
involves two stages. The first stage adjusted the effects of indivi—
dual background characteristic on outcome for the effects of the
schools in which the individuals receive instruction. The second stage
used the adjusted effects of individual level variables aggregated
over pupils within schools to determine the adjusted effects of school
level variables. In practice, they carried out the following:

1. Determine the pooled within-school slopes under equation (3—7).

3—7 Y = u + 8' X.. - u ) + 6..
( ) ij yi “(‘13 —Xi 13

i = 1, 2, . . ., k; j = 1, 2, . . ., n
j is the outcome of the jth subject in the ith_school, uy is
i

the population mean of the ith school, gij is the vector of explanatory

where Yi

variables of the j£h_subject in the ith_school, and B is the vector
of pooled within—school slopes.

An analysis using the school mean deviated values of both explana-
tory and criterion variable will effectively "control" or adjust for
all sources of variation among schools. The covariance matrix of the
deviated values is called the pooled-within school covariance matrix.
If this covariance matrix is computed for all individually defined
variables and used as the basis for the regression of the outcome on
the set of explanatory variables, the resulting estimates of §_will
not be biased by specification errors at the school level.

2. Find the mean predicted outcome for each school.

_. A =A 4-".A —-A
(3 8) uy. uy §_(gx. Ex)
1 1
where u is the mean predicted outcome for the ith_school.
i

3. Fit a model at the school level regressing the observed school

19

mean outcome on school level explanatory variables and predicted

school mean outcomes,

_ v _
(3-9) uy - my +_§z(§i .32) + Any + 61

i i
where 82 is the vector of adjusted effects of the school level varia-
bles, 2i is the vector of the school level variables, 61 is the error

defined at the group level, Ais the coefficient allowing for partial
removal of some of the additional bias due to the omission of relevant
school level variables (to the extent that the sum of these average
effect values is correlated with the sum of the average individual
level effect values represented in u ). If all relevant school level

1
variables are included, then Anwill be equal to one.

 

Group Level Analysis Approach
Cronbach (1976) mentions that in the situation where pupil j is
a member of group i, Bt’ the overall between student coefficient from

the regression of Yij on X .,

iJ

- = + - +
(3 10) Yij uy Bt(Xij ux) eij

has been shown by Duncan, Cuzzort, and Duncan (1961) to be a composite
of 83, the between group regression coefficient and B, the pooled

within-group coefficient;

- = 2 _ 2
(3 11) 8t nXBa + (1 nx)8

where n: is the correlation ratio of X.

 

22 2
x . - u .
(3—12) n2 = 1 - 11~( 13 X1)
X XX (X _ )2
ij ij “x

20

Cronbach indicated that analyses at the group level and the
individual level give conflicting descriptive results because they
speak to different substantive questions. The investigator who wants
to know the relationship between two variables is not asking a clear
question until he tells whether the group or individual level relation—
ship is the one of interest. He recommended that between group effects
and individual within group effects should be examined separately. He
proposed the following:

1. Between groups:

(3-13) uyi = uy + EAQI — £2) + _B_(,:l(y,Xi — 10+ 61
where the 82 is the effect of school level variables on mean outcomes,

and Ea is the between groups effect that reflects any consistent tendency

of higher-X groups to do better or worse than others on the outcome

measure.
2. Pooled within groups:
3-13 Y.. = u + 8' X.. - u + 6..
< ) 13 yi “ (‘13 -_Xi) 1]

where §_is the common within-group effect that reflects the tendency
for students above the group average to outperform or underperform

the rest of the group.

Subtraction Analysis Approach

 

In the situation where subject j is nested within group i, Keesling
(1977) analyzed constructed data to show how well ordinary least square
estimators can retrieve the information. He analyzed the data under
two models, as follow:

1. The group level model uses group mean outcome as the indepen-

dent variable:

21

(3-15) uyi = uy ”'15-;‘51 - 112) + Eg'Qin - ix) + 61

Keesling claimed that this model gives the aggregated level standard
errors, but the parameter estimates are the sum to the between and within
effects.

2. The within group model is the model that uses the individual
level outcome variable within groups as the dependent variable. Accord-
ing to Keesling, this model obtains the apprOpriate estimates and

standard errors for the within group effects.

(3—16) Yij = 11y. +§ (£13, - 3x.) + 513’

Keesling concluded that to obtain the correct estimates of the
between school effects at least these two models need to be performed
and then substract the within group estimates from the between group

estimates. That is,
3 l7 - *
<— > ﬁa-éa-E

A
whereﬁa is the correct between group effects, 88 is the estimate of
the between groups effects using the group level data and_§ is the

within group effect.

The Full Model Analysis Approach

 

The full model is the model that uses the individual level outcome
variable as the dependent variable. The explanatory variables are:
l) the variables defined at the individual level but which can also
be aggregated, 2) the means of the variables defined at the individual
level, and 3) the variables defined at the group level only. The model

is shown in equation (3-18).

22

(3-18) Yij = uy + a2 (a, - 1.1.2) + £8 (3X1 ' 1%,.) + $5- ‘Eij ‘ Ex) + Co

where uy’-Ez and Ex are the population means,_y_xi is the ith group popu-
lation mean,§a represents the between—group regression coefficients

for the aggregated individual level variables, 82 represents the regres—
sion coefficients for the group level variables, 8 represents the

pooled within—group regression coefficients for the individual level

variables, and t. represents the error defined at the individual level.

13
Keesling (1977) at one time analyzed the heirarchical data under
the full model. He mentioned that this model gave the correct parameter

estimates, but it did not partition the residual sum of squares by the

level of effect.

Bock Application Analysis Approach

 

In the situation where there are students nested within schools
and the school is a random variable, the model is the random effects
model. In this dissertation, there is one dependent measure and two

antecedent measures for each subject; the random effects model is:

W,, = u + a, + e,, , i = 1, 2, . . . ., k.
—13 - -1 ~13

where all vectors are 3x1 in this application, Eij is the response
vector representing the dependent, and antecedent measures,_p is vector
of the population means on each measure, 2i and;ij are the random
vectors assumed to be multivariate normally and independently distri—
buted with zero mean vectors and covariance matrices Xa andllrespectively.

The above model implies 2w = 2a + X, where 2w is the total variance

23

covariance matrix, 23 is the between school variance matrix, and X is
the within school variance covariance matrix.

The use of the Bock application approach is to provide an estimate
of 28 which is at least a positive semi—definite variance covariance
matrix, and then from this matrix to estimate the group level regression
coefficients. Bock's method is presented in the context of twin studies
and is used to estimate the component of heritable variation. A more
detailed description of this approach can be found in Bock (1968).

Under the random effects model, the expected value of the mean
square matrix between schools is E + n23, and the expected value of

the mean square matrix within schools is I.

Let S X + n2
3 a
S = X
Then for a symmetric positive definite matrix S and a symmetric
positive definite matrix Sa’ it is possible to find a nonsingular

transformation T such that

¢

_ I
(3 19) T SaT

(3-20) T'S T I

where ¢ is diagonal with positive diagonal elements, and I is an
identity matrix. The columns of T are the solution of a system of

homogeneous equations of the form:

Q

(58 — ¢18)t = 0, l = l, 2, 3, and ¢1 IS a root of

Is - <I>SI =0.
a

In practice, the estimate of S is the mean square matrix within

schools, that is obtained from the equation (3-21).

k
(3-21) 3 = ———1——— 2 (14.. - w.)(w.. — w.)'
i ‘—1J - ~1

k(n - 1) 1 —Tj

u-PTS

24

where W_. is the individual response vector, W_ is the group mean

13 1

vector, k is the number of groups and n is the number of subjects in

each group.

<1

Yij _ i

z
I
lxl

Xij

The estimate of Sa is the mean square matrix between schools that

is obtained from the equation (3—22).

 

k
_ = z - _ '
(3 22) 83 k _ 1 n1 (34,i y)<yi 59
Y’
where W_is the grand mean vector. W_= __ .
X

From equations (3-19) and (3-20),

T'SaT - T'ST = ¢ - I

T'(Sa - S)T = ¢ - I
T'nX T = ¢ -1
a -1 1
28 = [(T )'<¢ — I)T’ ]/n

Practically, for the elements in the columns of T the discriminant
function coefficients are substituted and for the elements of ¢ the

corresponding significant canonical variances ¢ (1 = l, 2, . . . , s)

l
and p -s unities are substituted (p is the dimension of T). This esti—
mate has the following properties. Because the elements of the diagonal
matrix (¢ -I) are non-negative, it can be expressed as the product of

a matrix and its transpose and is therefore positive semi-definite.

Its rank is s and its nullity is p - 3. When all of the canonical var-
iances are significant (8 = p),

A

X
a

-1 —1
[(T )‘(T'SaT - T'ST)T ]/n

[88 - S]/n

25

where S8 is the estimate of mean square matrix between schools and S
is the estimate of mean square matrix within school.
The between school covariance matrix (23) estimated in this way

and guaranteeing positive semi—definiteness can then be used to estimate

the between school regression coefficient.

CHAPTER IV

SIMULATION PROCEDURE

Simulation procedures were used in this study to generate the
data. The use of simulated data enables us to determine which method
of analyzing hierarchical data gives the best estimator, in terms of
accuracy and precision, of the parameters under various situations.

The bias ratio of each estimator was also computed to facilitate com-
parisons. Two situations were investigated in this dissertation. The
first situation was one in which there were both individual level pre-
dictors which can be aggregated to the group—level and predictors
defined only at the group-level. The second situation was one in which
there were only individual—level predictors which can be aggregated.
For each situation, three different data sets will be generated. These
are described in the following ways:

1. No group level effect. The between-group regression coef—

 

ficient is set to zero, but the within-group regression coefficient is
non-zero. This case implies that there is no group level effect
(i.e., Ea = 0).

2. Group level effect is equal to the individual level effect.
The between group regression coefficient is not equal to zero but is
equal to the within—group regression coefficient. This case implies
that there is a group level effect, and that the group level effect
is equal to the within level group effect (i.e., Ea =.§ # 0).

3. Group level effect is not equal to the individual level effect.

26

27

The between group regression coefficient is neither equal to zero, nor
equal to the within-group regression coefficient. This case implies
that there is a group level effect but it is not equal to the within-

group effect.

Description of Population Parameters

 

The data generated for the present study were from a multivariate
normal distribution with a mean vector B_and a covariance matrix X + 23,
where X is the within covariance matrix, and 2a is the between covariance
matrix. The between-groups and within—groups regression coefficients

(§_and 83), the within and between covariance matrix of individual

(X) (X)

a

level predictors (Z and 2 ), the between covariance matrix of

predictors defined at the group level only (2(2)

), the between covariance
matrix of predictors defined at the individual level and at the group
level (2(xz)), the error variance at the individual level (02), the
error variance at the group level (0:), and the population mean (3)
were specified in advance.

The study by Keesling and Wiley (1974) was used as a guide to
choose parameter values which would be reasonable. Three population
covariance matrices were constructed based on the model (3-2) and (3-5)
for the first and second situations, respectively. The six possible
total covariance matrices ( 2 + 23) were derived from the 3x2 crossed
design of possible combinations of pOpulation parameters and situations
(see Table 4—1). Fifty samples of 1,500 subjects each were generated
for each cell in Table 4-1. In each sample there were fifty schools

with thirty subjects within each school. The structure of the within

covariance matrix (2) and the between covariance matrix (23) for the

28

Table 4-1

3x2 Design of POpulations Defining the Structure of (Z + Ea)

 

 

 

 

 

Both individual level Individual level
and school level varia— variables only.
bles.
E; g, g 3‘ g I-A II-A
= 8 # 0 I-A II—B
—a _. _.
_a E .8. E O I-C II-C

 

 

first and second situations are shown in Table 4-2. The vector of
populations means (2), pooled within-group regression

coefficient (8), between-groups regression coefficient (83), error
variance at the individual level (02), and at the group level (0:)
for both situations are given in Tables 4-3, and 4-4. The numerical

values of 2(x)’ 2(x)’ 2:2), Z:XZ)

, Z, and Z + 2a are given in Tables
4-5, 4—6, and 4—7.

The intraclass correlations of variables Y and §_are quite high
(about 0.6052 for Y and 0.6547 for g) in population I—C. In pOpula-
tion II—C, the intraclass correlations of Y and §_variables are all
about 0.98. Therefore, in order to check whether the analysis appro—
ches give the same result in the situation where the intraclass cor-
relations are not as high as the first set of data, a second set of

data for populations I-C and II-C were generated with a new set of

parameters as shown in Table 4-8 which have intraclass correlations of

(X) Z(X) Z(2)
’ a a

about 0.30. The numerical values of X ,

2(XZ), z, z ,

’ a a

29

 

 

 

mm mm ow I.
3 3 3a mesa
HH
Aoauuweezmv mo + mm. New. AoauuoEE%. o I. .I
. N G: . . 8 + mcswh
m m NI. m MI W I II II
3” sea sacs + arcs o o o
m NI w NI. m I: H
3“ mass + a3“ 3a mesa
3335?me on + om. «wow +~m mwmmN Mm swam Gan—owes; V o m. m
. N E . as . 3 . . m + 3H.
Ammo moamaum>oo cmmzumm Ame moamanm>oo assess coaumauam

 

 

mm cam w mo mcowumeQEoo coaumﬁsmom

qu oHan

30

Table 4-3

Parameter Values for the First Situation

 

 

IT:
loo
‘3)
o
M
Q
N

Case

 

12.0810 1
1.3491 2.53 0 4.08

2.4587 0.32 0 2.15 0.5276 0.0812
6.8660

1.0511

1 Ll

p-l r

2.0810
1.3491 2.53 2.53 4.08
1’3 2.4587 0.32 0.32 2.15 0.5276 0.8972
6.8660

1.0551

 

 

12.0810 1

1.3491 2.53 1.45

b

.08

1‘0 2.4587 0.32 0.89

N

.15 0.5276 0.9547

6.8660

 

 

1.0511
J

 

 

31

Table 4-4

Parameter Values for the Second Situation

 

 

 

 

 

 

 

2 2
Case p_ B_ Ba 0 03
"25.38101 2.53 0
II-A 12.5912 [9.32 0 0.5276 20.7321
11.4587
i—. .1
"25.3810j 2.53 2.53
II—B 12.5912 0.32 0.32 0.5276 32.7341
g1.4587_
25.38101 2.53 1.45
II—C 12.5912 0.32 0.89 0.5276 22.3454

 

 

11.4587_

 

 

32

Table 4-5

Population Covariance Matrices of the Predictor Variables

 

 

Covariance Matrices Situation I Situation II

 

(x) 0.0912 0.1901 0.0912 0.1901
3 0.1901 2.4775 0 1901 2.4775
2(x) 0.1729 0.1400 14.7149 2.9871
a
0.1400 0.3746 2.9871 25.8970
(2) 0.0072 0.0007 Not Applicable
3 0.0007 0.0009

2(xz) 0.0159 0.0065 Not applicable
a
0.0260 0.0088

 

 

33

 

 

000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0-0
000.0 000.0 000.0 000.0 000.0 000.0
00000055000 000.0 00000055000 000.0 00000052000 000.0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0 0
000 0 000.0 000.0 000.0 000 0 000.0 000.0 000.0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 010
000.0 000.0 000.0 000.0 000.0 000.0
00000005000 000.0 000000EE000 000.0 00000055000 000.0
000.0 000.0 000 0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 000.0 0.0
000.0 000.0 000.0 000.0 000.0 000.0
00000025000 000.0 00000055000 000.0 00000005000 000.0
000+00 0000 A00
oucmwum>oo cmmzuom mocmwum>ou manu03 mmmu

ousmﬁum>ou Hmuoe

 

 

one canoe

coaumnuﬁm umuwm mnu pupa: mmowuumz mocmaum>oo cowumasmom

34

 

 

000.00 000.00 000.00 000.0 000.00 000.0 000.0 000.0
000.00 000 00 000.00 000.00 000.0 000 0
00000055000 000.00 00000059000 000.00 000aumeE000 000.0
000.00 000.0 000.00 000.00 000.0 000.00 000.0 000.0 000.0
000.00 000.00 000.00 000.00 000.0 000.0 0-00
00000055000 000.000 00000055000 000.000 00000055000 000.0
000.00 000.0 000.0 000.00 000.0 0 000.0 000.0 000.0
000.00 000.0 000.00 0 000.0 000.0 0-00
00000055000 000.00 00000055000 000.00 00000055000 000.0
m m
0 0+3 A 00
moamwum>oo Hmuoe oocmwum>oo cowauom moamwum>oo :05003 ommu

 

 

:owumsuwm vacuum 050 Hoes: mmowuumz mocmﬁum>oo c00umasaom

01¢ manmw

35

Table 4-8

Parameter Values of the Second Set of Data

 

 

 

 

 

Case u_ §_ Ea 82 0 08
"12.08101
1.3491 2.53 1.45 4.08 100.2311 22.3454
1-c 2.4587 0.32 0.89 2.15
6.8660
1.1'05110
52.08107 0.08 0.05 None 35.0000 11.9994
II-C 1.3491 0.76 0.95

 

 

L245874

 

 

36

and X + 2a for the new set of data are given in Tables 4-9 and 4-10.
Ten samples of 1,500 subjects were generated for population I-C and
twenty-five samples of 1,500 subjects were generated for pOpulation
II-C. Populations I-C and II—C were chosen to have additional data
generated in addition to the first set because these two cases are the

most realistic.

Description of the Generation Routine

 

The present study requires that data be generated from a multi—
variate normal distribution with mean p_and covariance matrix X + Xa’
where the within covariance matrix (X) and the between covariance
matrix (23) are specified as in Table 4—2. The generation procedure
is composed of five steps:

1. Specify the values for the parameters so that they approximate
the actual data. The Keesling and Wiley (1974) which analyzed real
hierarchical data was used as a guide. This provided values for the
pooled within—group regression coefficients (8), the between-group
regression coefficients for the individual level variables (88), the
regression coefficients for the group level variables (82), the
population means (2), error variance defined at the individual level
(02), and at the group level (0:) as shown in Tables 4-3 and 4-4 for
the first and second situation respectively. The population covar-
iance matrices of the predictors were also specified based on the
Keesling and Wiley study as shown in Table 4—5. The number of schools
(k) and the number of subjects in each school (n) were specified a
priori.

2. Compute the within and between covariance matrices (Z and Z )
a

37

Table 4-9
Population Covariance Matrices of the Predictor

Variables of the Second Set of Data

 

 

 

Covariance Matrices Population I-C Population II-C
2(X) 36.3347 4.7243 81.0000 18.0000
4.7423 61.4263 18.0000 100.0000
2(X) 14.7149 2.9871 35.0000 11.6383
a 2.9871 25.8970 11.6383 43.0000
2(2) 0.0072 0.0007 Not Applicable
8 0 0007 0.0009
1(Xz) 0.0159 0.0065 Not Applicable
3 0.0260 0.0088

 

 

38

 

 

000.000 000.00 000.000 000.00 000.00 000.00 000.000 000.00 000.00
000.000 000.00 000.00 000.00 000.00 000.00 0-00
060uumEE000 000.000 06000055000 000.00 00000055000 000.00
000.0 000.0 000 0 000 0 000.0 000.0 000.0 000.0 000.0 000.0 0 0 0 0 0
000.0 000.0 000.0 000.0 000.0 000.0 000 0 000.0 0 0 0 0
000.00 000.00 000.00 000.00 000.0 000.00 000.00 000.0 000.00 0-0
000.00 000.000 000.00 000.00 000.00 000.00
06000005000 000.000 00000055000 000.00 06000655000 000.000
000+00 0000 000
moamﬁwm>ou Hmuoe mocmwum>oo som3uom moamwum>oo awnuwz ommo

 

 

mama mo uom paooom msu mo moowuumz oucmwum>ou cowumasmom

0010 magma

39

between the outcome measure and the predictors as specified in Table
4-2.

3. Generate a random sample of k vectors ai, where-ai is multi-
variate normally distributed with mean vector 9_and covariance matrix
Ea. A random sample of k vectors 21 are generated with the following
procedure.

a. Generate 12 independent random variables which are uniformly
distributed between zero and one. Software for the CDC 6500 has been
developed which generates independent values of a random variable
which is uniformly distributed over the range (0, 1), the values zero
and one are excluded. This function, called Ranf, is described in
Fortran reference manual version four (1978).

b. Convert the values from a uniform distribution to values
from the normal distribution by TeichrOew's method to approximate the
inverse of the probability function for the standard normal distri-
bution. Teichroew used a polynomial approximation to evaluate the
inverse function. His procedure generates 12 independent random vari-
ables, Ul’ U2, . . . , U12, uniformly distributed between zero and one.
Then R is defined as (Knuth, 1968):

R = (U1 + U2 + . . . + U12 - 6)/4

The normal deviate, z is then approximated by:

_ 2 2 2 2
z — ((((a9R + a7) R + 85) R + 33) R + al) R

where a1 = 3.949846138
a3 = 0.252408784
a5 = 0.076542912
a7 = 0.008355968
a = 0.029899776

40

For the first situation, each observation needed in this study
consisted of 5 measures. Those 5 measures are the outcome variable
(Y), two predictors defined at the individual level (X), and two pre-
dictors defined only at the group level (2). For the second situation,
each observation consists of 3 measures, the outcome variable (Y), and
two predictors defined at the individual level (x). Therefore, the
procedure from a to b is repeated to obtain a 5x1 vector_z for the first
situation and a 3x1 vector §_for the second situation which is normally
distributed with a mean vector of zero and an identity matrix as the
covariance matrix.

c. Transform E.t° a_where a'is normally distributed (g, 23). The
transformation is:

2.: TE
where T is the cholesky factor of 23. The cholesky factor is a lower
triangular matrix such that TT' = Ea. This is used because the covariance
matrix of the transformed variables a is:

Var(a) = T Var (§)T'.

In this case, Var(§) is the identity matrix. Thus,

Var(a) = TT' = 23
which gives the desired result (Morrison, 1976). After the transforma—
tion, a is multivariate distributed normally with mean vector Q_and
covariance matrix 23.

4. Generate a random sample of kn vectors Eij where Eij is multi-
variate normally distributed with mean vector Q_and covariance matrix X.
A random sample of kn vectors Eij are generated with the same procedure
as used in the generation of vector 2i except that here we generate kn

vectors, and the covariance matrix is 2 instead of X

41

5. Add the k values of a1 and kn values of Eij to the p according
to formula (4-1) resulting in kn values of Eij' The values of 21 are
constant for the i£h_group, i.e.,

(4-1) W..=u+a +e

ij —- -i -ij

where W_ for the first situation

1
lx

and W_= for the second situation.

Ix 1< IN

The program MYDATA (see appendix A) was written for this study to
generate a random sample of kn vectors of Eij where Eij is multivariate
normally distributed with mean vector u_and covariance matrix X + 28,
using the procedure described above.

For each sample the pooled within and between mean square matrices

(S and 88) are computed as shown in formulas (4—2) and (4-3) respectively:

1 k n
- - '
Z X (Wij Wi) (W,, W1)

(4'2) 5 = k(n — 1) 1 j 13

 

where the expected value of S is the pooulation within covariance
matrix and the E(S) = Z and
k
(4-3) 3 = 1 n X (E. - E) (3i. - ﬂ)‘
a k _ 1 i 1 1

where the expected value of S8 is the following:
E(S ) = Z + n2
a 3

Here, Ea is the population between levels covariance matrix. The

general structure of S is the following:

42

where Sy is the pooled within variance of Y, Sxy is the pooled within
covariance matrix between X and Y, Sx is the pooled within covariance
matrix 0f.§-

To compute an estimate of the pooled within—group regression

coefficient (8) for any approach the formula (4-4) is used.

—1
(4-4) _8 — sX sxy

For the first situation where there are both individual level pre-
dictors and group level predictors, four approaches were investigated:
two stage analysis, group level analysis, full model analysis, and the
sUbtraction approach. The main concern is to estimate the regrassion
coefficients for the group level variables (82) by those four approaches.

For the second situation where there were only individual level pre-
dictors, four approaches were investigated: group level analysis, Bock
application, subtraction analysis and full model analysis. The main
purpose of each approach is to estimate the between group regression
coefficients for the individual level variables (88). The procedure

for each analysis approach is described in following sections.

Two Stage Analysis Approach

 

The procedure to estimate 82 by using the two stage analysis ap-
proach is the following:
1. Compute an estimate of the pooled within-group regression coef-

icient (89 using formula (4-4).

43

A

2. Compute an estimate of the group mean (uy ) using formula
i

(4-5).

.. A =A+A'A _A
(4 5) uy' uy §_(gx Ex)
1 i
3. Compute 82 using equation (4—6) implemented by the Finn multi-

variance program (1972).

(4‘6) “y. = u + B'(z, - u ) + An + 6
1 y —2 —1 —2 y I

Group Level Analysis Approach

 

Under the group level analysis approach the 82 for the first
situation and_§a for the second situation are estimated separately
from 8. The Finn multivariance program (1972) is used to estimate B2

under equation (4—7) and pa under the equation (4—8).

(4-7) uy = uy + §z(_Z__i - 32) + §,(g . - Ex) + 6

ll
1:
+
no
A
II:
I
1:
v
+
0'»

(4-8) H
yi 1

Subtraction Analysis Approach

 

For the first situation, Z variables are defined only at the group
level. The procedure of estimating 82 by the subtraction approach is
the same as for the group level analysis approach. The Finn multivariance
program is used to estimate 82 under equation (4—7).

To obtain the correct estimates ofpa in the second situation
Keesling recommends performing three steps as follows.

1. Compute estimates of the pooled within—group regression

coefficient (8) using formula (4—4). This step is to compute_§ under

44

the model of (4-9):

(4-9) Y.. u + B'(X., -
1] Y

2. Compute the estimates of the between-group regression coef—
ficient (8;) with equation (4-10) using the Finn multivariance program.
_ = +7'c' _ +5
(4 10) uyi uy Ea (Ex. Ex) 1
3. Compute the correct estimates of the between—group regression
coefficients for the individual level variables (Ba) by using formula

(4-11).

<4-11) 83 = 8* - a

a

Full Model Analysis Approach

 

The full model analysis approach used the individual outcome as
the dependent variable, the individual level variables, the mean of the
individual variables and the variables that are defined at the group
level as the predictors. The Finn multivariance program is used to
estimate 82 for the first situation under equation (4-12) and Ea for

the second situation under equation (4-13).

(4-12) Yij = uy + §z(§i - £2) + Ea(B-x, - lJor) +
E. Oil]. - ix) + Eij
_ = v _ - _
(4 13) Yij uy + §a(gxi 2x) + §_(§ij EX) + Eij

45

Bock Application Analysis Approach

 

Bock's analysis approach provides an estimate of the between co-
variance matrix (Za) which is guaranteed to be at least a positive
semi-definite covariance matrix, and then from this matrix estimates
the regression coefficient. The’steps to this approach are as follows:

1. Use the Finn multivariance program to determine discrimination
function coefficients p1, and canonical variances 01 (l=l, 2, 3).

2. Compute the positive semi—definite between covariance matrix

(Ea) using formula (4-14).

(4-14) Ea = [(T)'1'( 0 - I)T'1]/n

where elements in the columns of T are the discrimination function
coefficients £1, and the diagonal elements of diagonal matrix ¢ are
significant canonical variances $1 (l=l, 2, . . . , s) and p—s unities
(p is the dimension of T and s is less than and equal to p). When all

canonical variance 0 are significant (s=p)
2 = -
a [Sa S]/n

where S and Sa are within and between mean square matrices that are
computed by formula (4-2) and (4-3) respectively.
3. Using Ea to estimate Ea by formula (4—15).

A

(4—15) B = E(X)‘IE(XY)

—a a a

A

The general structure of Ba is the following:

“gm g<yxﬂ
a a
Z =
a
Ea(IXY) £(X)
a

 

 

46

“(y)

where Z
a

A x
X; y) is the between covariance

is the between variation of Y,

“(X)

of §_and Y, and 2a is the between covariance of X,

CHAPTER V

SIMULATION RESULTS

The simulation procedures employed in this study to investigate
the methods of analyzing hierarchical data were reviewed in Chapter IV.
The main purpose of this dissertation is to determine which approach
gives the best estimates of the between and within-group regression
coefficients in terms of accuracy (least amount of bias) and in terms
of precision for various situations. Six populations as shown in Table
4-1 were used as the basis from which the questions of interest were
explored. For each pOpulation, 50 samples of size 1,500 were gen—

-
erated. In each sample, there were 50 schools with 30 students nested
within each school. In order to confirm the emerging conclusions
resulting from the analyses, 10 additional samples for population I-C
and 25 additional samples for population II—C with the new set of
parameter values as given in Table 4—8, were generated. Data from
population I-A, I—B and I-C were analyzed by the two stage analysis
approach (as suggested by Keesling and Wiley), group level analysis
approach (as suggested by Cronbach and Webb), subtraction analysis
approach (as suggested by Keesling) and full model analysis approach
(as suggested by Keesling). Data from pOpulation II-A, II-B, and II-C
were analyzed by Bock application analysis approach, group level
analysis approach, subtraction analysis approach and full model analy-

sis approach.

The results of the data analysis of population I—A are shown in

47

 

48

Table 5-1. All four approaches give good estimates of the pooled
within-group regression coefficients B1 and 82. The means of the
estimates of 81 and 82 over the 50 samples are 2.524 and 0.322 for

all four approaches while the values of the parameters of 81 and_82
are 2.530 and 0.320. The standarderrorsofé1 and 82 are 0.009 and
0.002 for all four approaches. The result of testing the hypotheses
that the mean of the estimates of the within—group regression coef-
ficients of all 50 samples are equal to the parameters 81 and 82 are
not significant at the 0.01 level (t = -0.667, t = 1.000). The ratios
of the bias squared to mean square error which are computed by formula
(5—1) for 81 and 82 are 0.008 and 0.028. The formula is:

2
(5-1) Bias ratio = (Bias)
mean square error

 

where Bias = average value of the estimates - parameter value and mean
square error = variance of estimator + (bias)2. We can conclude that
all four approaches give good estimates of 81 and 82 since the bias
ratios are quite small. The results of the hypothesis tests showed
that the means of El and 82 over the 50 samples are not different from
the values of the parameters 81 and 82.

The means of the estimates of 82 analyzed by the two

and 8
z

1 2

stage analysis approach are closer to the parameters than the other

A

three analysis approaches (821 = 3.905, 822 = 1.722, 821

2.15). The results of testing the hypotheses that the mean of the

= 4.08, 822 =

estimates of the regression coefficients defined for the group level

variables over all 50 samples are equal to the parameters 8 and 822
z

1

are not significant for the two stage analysis approach, whereas, the

tests for the other three analysis approaches are significant at the

49

Table 5-1

Simulation Results of Population I-A

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Two stage1 2.524 0.066 0.009 —0.667 .008
Bl= 2.53 Group level 2.524 0.066 0.009 -0.667 .008
Full model 2.524 0.066 0.009 —0.667 .008
Subtraction 2.524 0.066 0.009 -0.667 .008
Two stage 0.322 0.012 0.002 1.000 .028
82= 0.32 Group level 0.322 0.012 0.002 1.000 .028
Full model 0.322 0.012 0.002 1.000 .028
Subtraction 0.322 0.012 0.002 1.000 .028
Two stage 3.905 0.920 0.130 —1.346 .036
le=4.08 Group level 3.651 0.858 0.121 -3.545* .203
Full model 3.651 0.858 0.121 —3.545* .203
Subtraction 3.651 0.858 0.121 -3.545* .203
Two stage 1.722 1.795 0.254 -l.682 .055
822=2.15 Group level 1.405 1.709 0.242 —3.079* .162
Full model 1.405 1.709 0.242 —3.079 .162
Subtraction 1.405 1.709 0.242 -3.079 .162

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

1

All four approaches gave exactly the same estimates of 81 and 82

as was expected.

50

A

0.01 level of significance. The bias ratios of 82 and 822 analyzed by

l
the two stage analysis are 0.036 and 0.055 while the bias ratios for

the other three approaches are 0.203 and 0.162. We can conclude that

in the situation where there is no group level effect (Ea = _) two
stage analysis approach gives the best estimates of the regression coef-
ficients defined for the group level variables. The empirical sampling
and 82

distributions of 82 over the 50 samples are shown in Figures

1 2

5-1 and 5-2. According to Figures 5—1 and 5-2, all four approaches
have very similar distributions.

The results of the data analysis of population I-B are shown in
Table 5-2. The parameter values of 81 and 82 are 2.53 and 0.32,
respectively. All four approaches gave the same average estimates for
81 and 82 (2.519 and 0.322). The standard errors of 81 and 82 for
all four approaches are quite small, 0.011 and 0.001. The bias ratio
of 81 and 82 for all four approaches are 0.019 and 0.033. The result
of testing the hypotheses that the mean of the estimates for the within-
group regression coefficients over all 50 samples are equal to the
parameters 81 and 82 are not significant at the 0.01 level of signifi-
cance (t = -1.000, t = 2.000). Therefore, the results of the data
analysis of population I-B indicate that all four approaches gave good
estimates of the within—group regression coefficients with good pre—
cision and small bias ratios.

The means of the estimates of 82

and 82 analyzed by the four

1 2

approaches are almost the same. However, the two stage analysis approach

gave a somewhat better estimate than the other three approaches as

demonstrated by the fact that the bias ratios of 82 and 822 are smaller.

1

The bias ratio of the estimates of 82 and 822 are 0.005 and 0.018 for

l

51

 

 

45 1"
o——-0Two Stage
40 .. '- o---o Group Level, Full Model
,1
I ‘\ and Subtraction
35 v- \
\
\
30 0 “
\
‘1
25 - \
20 -
15 .. ,
I
I
10 - I
I
Id
5 I I, o
I
0 l i l l l I
1 5 7 9

Figure 51 Sampling Distribution of ’321me Population I-A

25

15

10

52

  

p
o—o Two Stage
.- O----OGroup Level, Full Model
and Subtraction
1-

 

 

Figure 5-2 Sampling Distribution of B 2 From Population IA
2

Simulation Results of POpulation I-B

53

Table 5-2

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Two stage 2.519 0.080 0.011 —1.000 0.019
81: 2.53 Group level 2.519 0.080 0.011 —1.000 0.019
Full model 2.519 0.080 0.011 -1.000 0.019
Subtraction 2.519 0.080 0.011 -1.000 0.019
Two stage 0.322 0.011 0.001 2.000 0.033
82= 0.32 Group level 0.322 0.011 0.001 2.000 0.033
Full model 0.322 0.011 0.001 2.000 0.033
Subtraction 0.322 0.011 0.001 2.000 0.033
Two stage 4.203 1.782 0,252 0.488 0.005
821=4.08 Group level 4.236 1.907 0.270 0.578 0.007
Full model 4.235 1.907 0.270 0.578 0.007
Subtraction 4.235 1.907 0.270 0.578 0.007
Two stage 1.381 5.651 0.799 —0.962 0.018
822=2.15 Group level 1.335 5.901 0.834 -0.977 0.019
Full model 1.330 5.901 0.834 -0.977 0.019
Subtraction 1.335 5.901 0.834 —0.977 0.019

 

**Ratio of the estimate of the bias to

mean square error .

54

the two stage analysis approach, and 0.007 and 0.019 for the other
three approaches. The results of testing the hypotheses that the
means of the estimates of the regression coefficients defined for the
group level variables over all 50 samples are equal to the parameters

B

21 and 822 are not significant at the 0.01 level of significance for
all four approaches. We conclude that in the situation where the
group level effects are equal to the individual effects (pa = 8) all
four approaches give good estimates of the regression coefficients
defined for the group level variables. The sampling distributions of

A

B

and B are shown in Figures 5-3 and 5-4.
21 z2

The results of the data analysis of population I—C are shown in
Table 5-3. The parameter values of 81 and 82 are 2.53 and 0.32. The

means of the estimates of B and 82 are 2.512 and 0.321 for all four

1
approaches. The standard errors for all four approaches are quite

small (0.010 and 0.002). The bias ratios of 81 and 82 for all four
approaches are 0.067 and 0.005. The results of testing the hypotheses
that the means of the estimates of within-group regression coefficient
of all 50 samples are equal to the parameters 81 and 82 are not sig-
nificant at the 0.01 level of significance (t = 0.067, t = 0.005). So,
all four approaches gave the same good estimates of within-group regres—
sion coefficients with high precision and small bias ratios.

The means of the estimates of the regression coefficients defined
for the group level variables (:21 and E;2) from the parameter values
than in the other three approaches. The results of testing the hypo-
theses that the means of the estimates of the regression coefficients

defined for the group level variables over all 50 samples are equal to

the parameters 821 and 822 are significant at the 0.01 level of

55

 

 

 

251’
H Two Stage
20 o---o Group Level, Full Model
" and Subtraction

15 -
10 r-

5 I-

0 l a 1 1

3 5 7 9

25

15

10

Figure 53 Sampling Distribution of P From Population l-B
21

0—0 Two Stage
1- O----o Group Level, Full Model
and Subtraction

 

 

 

 

Figure 54 Sampling Distribution of /3 2From Population l-B
z

56

Table 5-3

Simulation Results of Population I-C

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Two stage 2.512 0.068 0.010 —1.800 0.067
81 = 1.53 Group level 2.512 0.068 0.010 -1.800 0.067
Full model 2.512 0.068 0.010 -l.800 0.067
Subtraction 2.512 0.068 0.010 —l.800 0.067
Two stage 0.321 0.014 0.002 0.500 0.005
82 = 0.32 Group level 0.321 0.014 0.002 0.500 0.005
Full model 0.321 0.014 0.002 0.500 0.005
Subtraction 0.321 0.014 0.002 0.500 0.005
Two stage 5.325 1.986 0.281 4.527 0.286
821: 4.08 Group level 4.296 1.946 0.275 0.785 0.012
Full model 4.296 1.946 0.275 0.785 0.012
Subtraction 4.926 1.946 0.275 0.785 0.012
Two stage 4.932 4.937 0.697 3.991* 0.245
822=2.15 Group level 3.194 4.911 0.695 1.502 0.044
Full model 3.194 4.911 0.695 1.502 0.044
Subtraction 3.194 4.911 0.695 1.502 0.044

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

57

significance for the other three approaches (t = 3.062, t = 3.991).
These were not significant at the 0.01-level of significance for the
other three approaches (t = 0.785, t = 1.502). The bias ratios of

A

B and 822 analyzed by the other three approaches are 0.012 and 0.044.

21
We can conclude that in the situation where the group level effects
are not equal to the individual level effects (83318740), the group
level analysis approach, the subtraction analysis approach and the full
model analysis approach gave the better estimates of the regression

coefficients defined for the group level variables as opposed to the

two stage analysis approach. The sampling distributions of 821 and

A

B

22 are shown in Figures 5-5 and 5-6.
In the situation where there were only individual level explana-
tory variables, the data for three populations II-A, II-B, and II—C
were analyzed by the group level analysis approach, the Bock applica-
tion approach, the full model analysis approach, and the subtraction
analysis approach. The results of the data analysis of population
II-A are shown in Table 5—4. All four approaches gave the same esti-
mates of the pooled within-group regression coefficients. The means
of the estimates of 81 and 82 are 2.533 and 0.321 while the parameter
values of 81 and 82 are 2.530 and 0.320 respectively. The standard
errors of 81 and 82 are 0.010 and 0.002 for all four approaches. The
results of testing the hypotheses that the means of the estimates of
within regression coefficients of all 50 samples are equal to the para-
meters 81 and 82 are not significant at the 0.01 level of significance
(t = 0.300, t = 0.500) for all four approaches. The bias ratios of 81
and 82 are 0.002 and 0.004. All four approaches yielded the same esti-

mates of the within-group regression coefficients with high precision

25

25

 

58

R
[ ,’ \\ 0—-O TWO Stage
I’ \\ o---a Group Level, Full Model
I ‘\ and Subtraction

 

 

l
1 3 5 7 9
Figure 55 Sampling Distribution of [3 1 From Population l-C
2
f. I”, H Two Stage
I \\ o----o Group Level, Full Model
T ,l and Subtraction

 

 

 

-7 -1 5 11 17
Figure 5-6 Sampling Distribution of ,6 2 From Population l-C
z

59

Table 5-4

Simulation Results of POpulation II-A

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Group level 2.533 0.070 0.010 0.300 0.002
81 = 2.53 Bock application 2.533 0.070 0.010 0.300 0.002
Full model 2.533 0.070 0.010 0.300 0.002
Subtraction 2.533 0.070 0.010 0.300 0.002
Group level 0.321 0.015 0.002 0.500 0.004
82 = 0.32 Bock application 0.321 0.015 0.002 0.500 0.004
Full model 0.321 0.015 0.002 0.500 0.004
Subtraction 0.321 0.015 0.002 0.500 0.004
Group level 0.017 0.146 0.021 0.810 0.013
Sal: 0.00 Bock application 0.008 0.148 0.021 0.381 0.003
Full model —2.519 0.168 0.024 —104.958* 0.996
Subtraction -2.159 0.168 0.024 -104.958* 0.996
Group level -0.011 0.161 0.023 -O.478 0.002
Ba2= 0.00 Bock application -0.067 0.181 0.026 -2.577 0.123
Full model -O.338 0.168 0.024 -14.083* 0.805
Subtraction —0.338 0.168 0.024 —14.083* 0.805

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

60

and small bias ratios.

The means of the estimates of the between—group regression coef—

ficients ﬂnrthe individual level variables (Ba and 832) analyzed by

1

the group level analysis approach and the Bock application approach are

similar and closer to the parameters (83 = 0, Ba2= 0) . The full model

1

analysis approach and the subtraction analysis approach gave the same esti-

mates of Ba and Ba and they are not close to the parameter values

1 2

A

(gal: -2.519, 882== —0.338). The resultscﬂitesting the hypothesis that
the meanscﬂfthe estimates oftﬁmabetween-group regression coefficients
for the individual level variables over the 50 samples are equal to the
parameters Bal and Ba2 are not significant for the group level analysis ap—
proachanulthe Bock application analysis approach. However, they are

significant at 0.01 level of significance for the full model analysis ap-

A

proach and the subtraction analysis approach. The bias ratios of Bal and

A

B for the group level analysis approach and the Bock application analy-

a2

A

sis approach were quite small(}iuabias ratios of Ba for the group level

1

analysis approach and the Bock application analysis approach were 0.013
and 0.003, the bias ratio of 832 for the group level analysis approach and
the Bock application analysis approach were 0.002 and 0.123), while

the bias ratios of Ba and 882 for the other analysis approaches were

1

A

quite large (bias ratio of Bal and 832 for these two approaches were
0.996 and 0.805). We can conclude that in the situation where there
was no group level effect (Ba1 = Baz = 0), the group level analysis
approach and the Bock application analysis approach gave the better
estimates of the between-group regression coefficients while the full

model and subtraction analysis approach gave incorrect estimates of

the between-group regression coefficients. The sampling distributions

61

A

of 8 and 832 are shown in Figures 5—7 and 5-8.

al
The results of the data analyses for population II—B are shown in
Table 5-5. All four approaches gave the same estimates of the pooled
within regression coefficients. The means of the estimates of 81 and
‘82 were 2.546 and 0.320. The standard errors of 81 and 82 were 0.010
and 0.002 for all four approaches. The results of testing the hypo-
thesis that the means of the estimates of the pooled within regression
coefficients over all 50 samples were equal to the parameters 81 and
82 were not significant at 0.01 level of significance (t = 0.000,
t = 0.000). The bias ratios of 81 and 82 were 0.051 and 0.000. All
four approaches gave good estimates of the pooled within regression
coefficients.

The means of the estimates of the between-group regression coef-
ficients for the individual level variables analyzed by the group level
analysis approach and the Bock application approach were quite similar

A

and close to the paremeter values. The means of Ba and 832 for the

1
group level analysis approach were 2.525 and 0.310, and for the Bock

application analysis approach were 2.524 and 0.309. The full model
analysis approach and the subtraction analysis approach gave the same

estimates for Bal and Ba2 and they were not close to the parameter

A A

values (881 = —0.021, 8&2 = —0.014). The results of testing the hypo-

thesis that the means of the estimates of the between-group regression

coefficients were equal to the parameters Ba1 and BaZ were not signi-

ficant for the group level analysis approach or for the Bock applica-
tion analysis approach. However, they were significant at 0.01 level

of significance when analyzed by the full model and subtraction analy-

A

sis approaches. The bias ratios of Ba and 832 for the group level and

1

62

" o——o Group Level
O-u-O Bock Application
25 l" 3'". Subtraction

25

20

15

10

 

H Full Model

 

I I I

-1 O 1

Figure 57 Sampling Distribution of ,3 From Population Il-A
a1

 

-2.8

 

 

r
O—o Group Level
L = ‘ o----o Bock Application
1 \\ I’ \‘ I----I Subtraction
\ ,’ \‘ D——-n Full Model
1- ] \ \‘
’ I o \
I \
I I [I I \‘
1- ,, / \ \ o
I/ / \ “
I i \ ‘
- K
- ‘\ .
" \
o \
I I I I I P I I
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1.0

Figure 58 Sampling Distribution of Pa2 From Population lI-A

63

Table 5-5

Simulation Results of Population II-B

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Group level 2.546 .070 0.010 0.600 0.051
81 = 2.53 Bock application 2.546 .070 0.010 0.600 0.051
Full model 2.546 .070 0.010 0.600 0.051
Subtraction 2.546 .070 0.010 0.600 0.051
Group level 0.320 .012 0.002 0.000 0.000
82 = 0.32 Bock application 0.320 .012 0.002 0.000 0.000
Full model 0.320 .013 0.002 0.000 0.000
Subtraction 0.320 .012 0.010 0.000I 0.000
Group level 2.525 0.221 0.031 -0.161 0.001
Ba}: 2.53 Bock application 2.524 0.222 0.031 -0.194 0.001
Full model -0.021 0.228 0.032 -79.7l9* 0.992
Subtraction -0.021 0.228 0.032 ~79.719* 0.992
Group level 0.310 0.158 0.022 -O.455 0.004
Ba2= 0.32 Bock application 0.309 0.178 0.025 -0.440 0.004
Full model —0.010 0.157 0.022 -15.000* 0.818
Subtraction —0.010 0.157 0.022 -15.000* 0.818

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

64

Bock application analysis approaches were quite small (bias ratios of

A

8 and Ba2 for both the group level analysis approach and the Bock

a1

application analysis approach were 0.001 and 0.004) while the bias
ratios of Bal and 8&2 analyzed by the other two analysis approaches

A

were quite large (bias ratios of 8a and Ba2 equal to 0.992 and 0.822).

1
We can conclude that in the situation where the between-group regres-
sion coefficients (pa = B_# 0), the group level and Bock application
analysis approaches gave correct estimates of the between-group regres-
sion coefficients while the full model and subtraction analysis

approaches gave incorrect estimates of the between—group regression

and Ba are shown

coefficients. The sampling distributions of Ba 2

1
in Figures 5—9 and 5—10.

The results of the data analysis of population II-C are shown in
Table 5-6. All four approaches gave the same estimates of the pooled
within regression coefficients. The means of the estimates of B1 and
82 were 2.518 and 0.323 while the standard errors were 0.010 and 0.002
for all four approaches. The results of testing the hypothesis was
that the means of the estimates of the within regression coefficients
were equal to the parameters were not significant (t = -1.200, t =
1.500). The bias ratios of 81 and 82 were 0.028 and 0.060 for all four
approaches.

The means of the estimates of the between-group regression coef-
ficients for the group level and Bock application analysis approaches
were quite similar and close to the parameter values. The means of
gal and 8&2 analyzed by group level analysis approach were 1.424 and
0.934 and for the Bock application analysis approach they were 1.412

and 0.946 (881 = 1.45 and Ba2 = 0.89). The full model analysis

35F

30-
25-
20..
15+-

10"

 

-0.7

I
-0.1

65

o-—o Group Level
o----C Bock Application
H Subtraction 'and Full Model

     

l I l l L l
-0.4 1.0 1.6 2.2 2.8 3.1 3.7

Figure 5-9 Sampling Distribution of pa1 From Population ll-B

 

 

30 I
25 __ 1‘ o———o Group Level
,’ \‘ o----o Bock Application
" \‘ H Subtraction'and Full Model
20 -
15 '-
10 I-
\
\
5 l- \
J I I I I k I A l .
-0.5 -0.3 -0.1 0.1 0.3 0.5 0.7 0.8 1.0 1.3 1.5

Figure 5-10 Sampling Distribution of paZ From Population ll-B

66

Table 5-6

Simulation Results of Population II-C

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Group level 2.518 0.071 0.010 -1.200 0.028
81 = 2.53 Bock application 2.518 0.071 0.010 -1.200 0.028
Full model 2.518 0.071 0.010 -1.200 0.028
Subtraction 2.518 0.071 0.010 -1.200 0.028
Group level 0.323 0.012 0.002 1.500 0.060
82 = 0.32 Bock application 0.323 0.012 0.002 1.500 0.060
Full model 0.323 0.012 0.002 1.500 0.060
Subtraction 0.323 0.012 0.002 1.500 0.060
Group level 1.424 0.260 0.037 -0.703 0.010
Sal: 1.45 Bock application 1.412 0.184 0.026 -1.462 0.042
Full model —1.093 0.179 0.025 -101.720* 0.995
Subtraction -1.093 0.117 0.025 -101.720* 0.995
Group level 0.934 0.196 0.021 2.095 0.049
832: 0.89 Bock application 0.946 0.165 0.023 2.453 0.105
Full model 0.611 0.021 0.021 -13.286* 0.782
Subtraction 0.611 0.149 0.021 -13.286* 0.782

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

67

approach and the subtraction analysis approach yielded the same esti-

A A

mates for gal and Ba2 were not close to the parameter values (Sal =

1.093, 8a2 = 0.611). In testing the hypothesis that the means of the

estimates of the between-group regression coefficients were equal to

the parameters 8a and 832 were not significant for the group level

1
analysis approach and the Bock application analysis approach; however,
they were significant for the full model and subtraction analysis

approaches. The bias ratios of Ba and Ba2 for the group level analy-

1
sis approach and the Bock application analysis approach were quite
small while the same bias ratios for the other two analysis approaches
were quite large. We can conclude that in the situation where the
between—group regression coefficients were not equal to the within-
group regression coefficients (Ba # §_# 0), that the group level and
Bock application analysis approaches gave the correct estimates of

the between-group regression coefficients while the full model and the
subtraction analysis approaches gave incorrect estimates of the
between-group regression coefficients. The sampling distributions of

A

8

a1 and 832 are shown in Figures 5-11 and 5-12.
The specifications concerning Ea and §_for populations I—C and
II-C are most similar to the types of situations encountered in the
real world. The intraclass correlations of variables Y and §_are,
however, quite high for both cases (0.90). Therefore, in order to
check whether the analysis approaches give the same results for situ-
ations where the intraclass correlations are not as high as for the
data presented in the preceding pages, a second set of data for popu—

lations I-C and II-C was generated with new parameter values chosen so

that the intraclass correlations were lower than those in the first set

3O

25

20

15

10

35

30

25

20

15

10

68

 

 

 

 

 

' O—-—o Group Level
...._.... Bock Application
H Subtraction and
1- Full Model
1-
p-
in.
l I I I I I I I I l
-1.4 -1.0 -0.6 -0.2 0.2 0.6 1 1.4 1.8 2.2
Figure 5-11 Sampling Distribution of pa1 From Population ll-C
o-——o Group Level
- 0...-.. Bock Application
I-—-I Subtraction and Full Model

R

I \\

r
L I

I x
l I V
I \
l- 1 1
‘t
I ' ‘
I- " ‘\
I \
I \
I \
b . ‘
I
I
,1/ .
I - I I L j I I I I I
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2.0 2.2 2.4

Figure 5-12 Sampling Distribution of ﬁ 32 From Populatioq ll-C

69

of data. All analysis approaches were used to analyze the second set
of data in the same fashion as for the first set of data. The intra-
class correlations of the second set of data are 0.30.

The ten samples for pOpulation I-C were analyzed by the two stage
analysis approach, the group level analysis approach, the full model
analysis approach and the subtraction approach. The results of the
data analysis of the second set of data in population I-C are shown
in Table 5-7. The parameter values of 81 and 82 were 2.53 and 0.32.
The means of the estimates of 81 and 82 were 2.522 and 0.329 for all
four approaches. The standard errors of 81 and 82 for all four
approaches. The standard errors of 81 and 82 for all four approaches
were quite small (about 0.013 and 0.010). The bias ratios of 81 and
82 for all four approaches were 0.039 and 0.081. The results of
testing the hypothesis that the means of the estimates of the within
regression coefficients of all 10 samples were equal to the parameters
81 and 82 were not significant at 0.01 level of significance (t =
-0.615, t = 0.900). Therefore, all four approaches gave the same good
estimates of within regression coefficients.

The means of the estimates of the regression coefficients defined
for the group level variables (821 and 822) analyzed by the two stage
analysis approach were closer to the parameter values than when using
the other three approaches. However, the results of testing the hypo-
thesis that the means of the estimates of the regression coefficients
defined for the group level variables of all 10 samples were equal to
the parameters 821 and 822 were not significant at the 0.01 level of

significance for all four approaches. The bias ratios of 82 and 822

1
analyzed by two stage analysis approach were 0.001 and 0.007, while the

70

Table 5—7

Simulation Results of the Second Set of Data of Population I-C

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Two stage 2.522 0.042 0.013 -0.615 0.039
81 = 2.53 Group level 2.522 0.042 0.013 —0.615 0.039
Full model 2.522 0.042 0.013 —O.615 0.039
Subtraction 2.522 0.042 0.013 -0.615 0.039
Two stage 0.329 0.032 0.010 0.900 0.081
82 = 0.32 Group level 0.329 0.032 0.010 0.900 0.081
Full model 0.329 0.032 0.010 0.900 0.081
Subtraction 0.329 0.032 0.010 0.900 0.081
Two stage 4.221 7.162 2.265 0.062 0.001
B 1= 4.08 Group level 3.004 6.652 2.103 -O.512 0.028
z
Full model 3.004 6.652 2.103 —0.512 0.028
Subtraction 3.004 6.652 2.103 —0.512 0.028
Two stage 5.450 42.965 13.587 0.243 0.007
822: 2.15 Group level 7.509 44.182 13.971 0.384 0.016
Full model 7.509 44.182 13.971 0.384 0.016
Subtraction 7.509 44.182 13.971 0.384 0.016

 

**Ratio of

the estimate of the bias to mean square error.

71

A

bias ratios of 821 and 822 analyzed by the other three approaches were
0.028 and 0.016. We can conclude that in the situation where the

group level effects were not equal to the individual level effects

(pa # §_# 9) and the intraclass correlations were not high (about 0.30)
all four approaches gave the correct estimates of the regression
coefficients and with small bias ratios but that the two stage approach

A

resulted in the smallest bias. The standard errors for 82 and 82 for

1 2

all four approaches, however, were quite high (see Table 5-7).

Twenty five samples of the second set for population II-C were
analyzed by the group level analysis approach, the Bock application
analysis approach, the full model analysis approach and the subtrac-
tion analysis approach. The results of the data analyses of the
second set of data for population II-C are shown in Table 5—8. The
parameter values of 81 and 82 were 0.08 and 0.76. The means of the
estimates of B1 and 82 were 0.083 and 0.758 for all four approaches.
The standard errors of 81 and 82 of all four approaches were 0.004
and 0.003. The bias ratios of 81 and 82 of all four approaches were
0.025 and 0.014. The results of testing the hypothesis that the means
of the estimates of pooled within regression coefficients over the 25
samples were equal to the parameters 81 and 82 were not significant
at 0.01 level of significance (t'= 0.025, t = 0.014). Therefore, all
four approaches gave the same good estimates of the pooled within
regression coefficients.

The means of the estimates of the between—group regression coef-
ficients for the individual level variables analyzed by the group level
analysis approach and the Bock application analysis approach were quite
similar and close to the parameter values (83

= 0.05, B = 0.95).

1 32

72

Table 5—8

Simulation Results of the Second Set of Data

of Population II-C

 

 

 

Parameters Analysis Approach Estimators t Ratio**
Mean SD SE
Group level 0.083 0.019 0.004 0.750 0.025
81 = 0.08 Bock application 0.083 0.019 0.004 0.750 0.025
Full model 0.083 0.019 0.004 0.750 0.025
Subtraction 0.083 0.019 0.004 0.750 0.025
Group level 0.758 0.017 0.003 —0.667 0.014
82 = 0.76 Bock application 0.758 0.017 0.003 -0.667 0.014
Full model 0.758 0.017 0.003 -0.667 0.014
Subtraction 0.758 0.017 0.003 —0.667 0.014
Group level 0.051 0.104 0.021 0.048 0.000
Sal: 0.05 Bock application 0.047 0.114 0.023 -0.130 0.001
Full model -0.033 0.102 0.020 -4.150* 0.408
Subtraction —0.033 0.102 0.020 -4.150* 0.408
Group level ’ 0.933 0.078 0.016 —1.063 0.047
Ba2= 0.95 Bock application 0.946 0.085 0.017 -0.235 0.002
Full model 0.175 0.076 0.015 —51.667* 0.991
Subtraction 0.175 0.076 0.015 —51.667* 0.991

 

*Significant at 0.01 level of significance.

**Ratio of the estimate of the bias to mean square error.

73

The means of Ba and BaZ analyzed by the group level analysis approach

1
were 0.051 and 0.933 and when analyzed by the Bock application analysis
approach they were 0.047 and 0.946. The full model analysis approach
and the subtraction analysis approach gave the same average estimates
of Bal and 832 and they were not close to the parameter values

(Sal = —0.033, 882 = 0.175). The results of testing the hypothesis
that the means of the estimates of the between-group regression coef-
ficients were equal to the parameter values were not significant for
the group level analysis approach or for the Bock application analysis
approach, but they were significant (p < 0.01) for the full model and

A

the subtraction analysis approach. The bias ratios of Ba and Ba2 for

1

the group level and the Bock application analysis approaches were quite
small (bias ratios of gal when using the group level analysis approach
and the Bock application analysis approach were 0.000 and 0.001, and
bias ratios of 882 when using the group level analysis approach and

the Bock application analysis approach were 0.047 and 0.002) while the
bias ratio of Sal and 882 analyzed by the full model and the subtrac—

A

tion analysis approaches were quite large (bias ratios of Ba and Ba2

1
with these two apporaches were 0.408 and 0.991). From this we can
conclude that in the situation where the between-group regression coef-
ficients were not equal to the within-group regression coefficients

(Ea # §_# 9) and the intraclass correlations were not high (0.30) the
group level and Bock application analysis approaches still gave the
correct estimates of the between-group regression coefficients and the

full model and the subtraction analysis approaches continued to yield

incorrect estimates.

CHAPTER VI

CONCLUSIONS AND RECOMMENDATIONS

The main purpose of the present study was to investigate various
alternatives used to analyze hierarchical data by applying them to a
set of simulated data. We determined which approach would give the
best estimates of the between and within regression coefficients in
terms of accuracy (the least amount of bias) and in terms of preci-
sion for various situations. The bias ratio of each estimator was
computed to facilitate comparisons.

Two situations were investigated in this dissertation. The first
situation was one in which there were both individual-level predictors
which can be aggregated to the group—level and predictors defined only
at the group level. The second situation was one in which there were
only individual-level predictors which can be aggregated. For each sit-
uation, data were generated from three different populations: the first
in which there were no group level effects; the second in which group
level effects were equal to individual level effects; the third in which
group level effects were not equal to the individual level effects.

The original intention of this study was to do a simulation
study to contrast the various analysis methods. However, the simulation
results suggested several patterns that imply certain relationships
between the alternative approaches. Therefore, an analytical study of
the relationship among the alternative approaches was used as a follow
up. The results of the analytical work are presented in this section

74

75

supporting the simulation results.

The simulation results showed that all analysis approaches gave
the same estimates of the within-group regression coefficients for all
six cases with good precision and small bias ratios. All approaches
gave the same estimates of the within-group regression coefficients
because they used the same basic formula (formula 4-4) to compute the
pooled within-group regression coefficients.

For the situation in which there were both individual level
predictors which can be aggregated to the group level and predictors
defined only at the group level, the two stage analysis approach,
group level analysis approach, full model analysis approach and the
subtraction analysis approach were used to analyze the data. Analyti-
cally, these four approaches can be grouped into two sets. One set
includes only the two stage analysis approach. The other set includes
the group level analysis approach, the full model analysis approach,
and the subtraction analysis approach. Theoretically, these three
approaches should give the same estimates of the regression coefficients
defined for the group level variable (gz) by the two stage analysis
approach and the other three approaches are defined by formulas (6-1)

and (6—2) respectively.

II
US
l
03
m
X
A
>J >
re;

A -1
(6_1) E2 2 Bzy z 2

II
on
on

I

[00

(6-2) $2 2 zy Bz Bzx

where Bz, Bzy’ and Bzx are the between—group sum of squares and cross
product matrices of 2;, 2_and Y, and 2_and §_variables.

The simulation results showed that for all three cases, the group

level analysis approach, the full model approach and the subtraction

76

approach gave the same estimates of 82 and were consistent with the
theoretical results suggested above.

In the case where there were no group level effects, the two
stage analysis approach gave estimates of 82 better than those derived
from the other three approaches. Where the between-group regression
coefficients were equal to the pooled within group regression coef-
ficients, all four approaches gave essentially the same estimates of
p2, all with comparable bias ratios. In the case where the between-
group regression coefficients were neither equal to the pooled within-
group regression coefficients nor to zero, the simulation results
were different depending upon the value of the intraclass correlation
coefficient. For the case where the intraclass coefficient was high,
the two stage analysis did not give as good estimates of 82 as the
other three approaches. However, when the intraclass correlations
were more moderate in value (around 0.30) all four approaches gave the
same estimates of 82, although the two stage approach yielded better
bias ratios indicating less bias relative to mean square error.

When the situation was such that there were only individual level
predictors which could be aggregated to the group level, the group
level analysis approach, Bock application analysis approach, full model
analysis approach and subtraction analysis approach were used to
analyze the data. Theoretically, these four approaches can be grouped
into three sets: first, the group level analysis approach by itself;
second, the Bock application analysis approach by itself; and third,
the full model analysis approach and the subtraction analysis approach.
In theory, the estimates of the between-group regression coefficients

(Ba) by the full model analysis approach are equal to the differences

77

between the between-group regression coefficients that are estimated
from the between-group sum of squares and cross products matrix and
the pooled within group regression coefficients. Therefore, the esti-
mates of Ea that obtain from the full model analysis approach and the
subtraction analysis approach should be the same. Analytically, the
relationship between the between-group regression coefficients esti—
mated by the group level analysis approach, Bock application analysis

approach are shown in equation (6-3).

1AF

G+(B-1A-I)_B_
a

8-» 33-0,

B
where 8 , BG
—a -a

and 8: are the between-group regression coefficients
estimated by the Bock application analysis approach, the group level
analysis approach and the full model analysis approach, respectively,
B is the within-group mean of square divided by the number of subjects
in each group, A is the between-group mean of square divided by the
number of subjects in each group, and I is the identity matrix. The
derivation of this relationship is shown in Appendix B.

The simulation results showed that for all three cases, the full
model analysis approach and subtraction analysis approach gave exactly
the same estimates of the between-group regression coefficients. They
were also equal to the difference between the regression coefficients
that were estimated from the between-group sum of squares and cross
products matrix and the pooled within-group regression coefficients
which is consistent with the theoretical results. However, the esti-
mates of Ea from these two approaches were not close to the parameter

values. The bias ratios for the estimates resulting from these two

approaches were very high. From this, we can conclude that the

78

subtraction and the full model approach gave totally wrong estimates of
§_. For all three cases the group level analysis approach and the Bock
application analysis approach gave good estimates of Ba. The bias
ratios for these two approaches were quite small when compared to the
bias ratios for the other two approaches. When the between-group
regression coefficients were equal to zero, the Bock application
analysis approach gave better estimates of Ea than the group level
analysis approach. However, when the between—group and within—group
regression coefficients were equal, both approaches gave the same
estimates of Ea' For the situation where the between-group regression
coefficients were not equal to the pooled within—group regression coef—
ficients, the group level analysis approach gave better estimates of
Ea than the Bock application analysis approach when the intraclass
correlations were high (about 0.90), but the Bock application analysis
approach gave the better estimates of Ea when the intraclass correla-
tions were low (about 0.30).

From the simulation results, we can summarize which approach
gave good estimates of the parameters for the different populations.
This is shown in Table 6-1.

Table 6—1 shows the quality of the estimates of the between
regression coefficients defined for the group level variables (82)
and the between-group regression coefficients (83) under the alter-
native approaches in terms of accuracy (bias ratios less than 0.15).
In the situation where there were both individual level predictors
which were aggregated to the group level and predictors which were de-

fined only at the group level, the two stage analysis approach gave

good estimates of the regression coefficients defined for the group

79

 

oanmoﬁamam uoz

manoowaaam uoz

manmuwaaam uoz

manmowaamm uoz

0ooo

0oo0

0ooo

seven

Umm

0ooo

0oo0

«coco

oceau
loaouuoo mmoauouucw Lwﬁn

£003 muoomwo Ho>oH Hosp
lH>HvsH ecu 0» Hence uoc
0003 muomwwo Ho>oH @5000

mGOHumHou
luou mmmﬁomuuC0 oumuovoe
:uHB muoomwo Ho>oa Hosp
10>0pcﬁ ecu Ou Hence 00:
0003 muuommo Ho>oa anonw

muoommo Ho>oH
Honcw>ﬁpcw 050 cu Honda

0003 muoomwo Ho>oH maouo

muoowwo Ho>oH asouw oz

moanmwwm> Ho>oa
Hoocom can Ho>oa
Hmavﬁ>0vcﬁ :uom

 

cowumowama< xoom

0ooo noou
0oo0 0oo0
0oo0 0ooo
000 000
0oooz

cowuomuuQSm Adam

somouaa< mﬁmxamc<

Ho>oq
asouo

owmum
030

mC00umH=mom

 

 

macaumasaom ucoquMHQ How moumEHumm onu mo xuﬁamso

ﬁlo mHan

80

.mH.o 00 H0000

00 0050 0000000 00 00000 0000 0:0 00023 0000000 0w 00m««

.mH.O CNS“ mmmH mH Oﬂumh mmwﬂ mﬁu OHM—t» Vmﬁﬂwmﬁ wﬁ UOOUK

 

00000
n0H00000 0000000000 swan

 

0H000 :0H3 0000000 H0>0H H000
0000 000 000 0000 Iwﬂmam IH>chﬁ 0:0 00 H0300 00:
- 002 0003 0000000 H0>0H 0:000
mcoﬁ00H00000
000H0000cﬂ 00000008
0H£00 L003 0000000 H0>0H H000
0000 000 000 0000 Iwaaam Iw>ﬁvca 0:0 00 H0000 00:
002 0003 0000000 H0>0H 0:000 >H00 00H00H00>
H0>0H H0000>H00H
0000 000 000 0000 0Hn00 0000000 H0>0H
Iwammm H0000>H00H 0:0 00 H0000
0oz 0003 0000000 H0>0H 0:000
0H000
IHH000
0000 000 000 0000 002 0000000 H0>0H 00000 oz
H0002 H0>0A 0wm0m
:0H000waaa< x000 :0ﬂ00000asm Hank 0:000 038 0:0H00H000m

£000000< mwmzamc<

 

 

A.Pucouv To

magma

81

level variables: 1) in the case where there were no group level effects;
2) where the group level effects were equal to the individual level
effects; and 3) where the group level effects were not equal to the
individual level effects and when the intraclass correlations were low
(about 0.30). The group level analysis, the full model and the sub-
traction approach gave good estimates of the regression coefficients
defined for the group level variables in the case where: 1) the group
level effects were not equal to the individual level effects; and 2)
the group level effects were not equal to the individual level effects
and when the intraclass correlations were either low (about 0.30) or
high (about 0.90).

When the situation was such that there were only individual level
predictors which could be aggregated to the group level, the group level
analysis and Bock application approaches gave good estimates of the
between-group regression coefficients for all cases. The full model
and the subtraction approach gave bad estimates of the between-group
regression coefficients for all cases.

In the present study, we only dealt with the simulation of specific
parameter values and specific situations. We did not investigate all
types of parameter values or all types of situations. Therefore, the
results of this study can be generalized only to similar situations

and similar parameter values.

Recommendations for Further Study

 

The present study deals with situations where homogeneity of
within—group regression coefficients is assumed; therefore, one possible

extension of the present work is an investigation of the methods of

82

analyzing hierarchical data which allow for heterogeneity of the within-
group regression coefficients. The results of this study suggest‘that
the intraclass correlations have an effect on estimating the between—
group regression coefficients (ﬁg) and the between—group regression
coefficients defined for the group variables (éz)' This would suggest
the investigation of all analysis approaches that are used to analyze
hierarchical data for different sets of data that are generated from
populations which are described by intraclass correlations of dif-
ferent magnitudes. The present study, although not designed to examine
this issue, and upon finding the apparent relationship, was able to
suggest in a preliminary way the need for examining this issue more
thoroughly.

Another avenue of future work is to apply the analytical pro-
cedures based on the methods of analysis of covariance structures for
hierarchical data devised by Schmidt (1969) and Wisenbaker and Schmidt

(1979) to simulated data of the sort considered in this study.

APPENDICES

APPENDIX A

COMPUTER PROGRAMS

0000000

000000000

10

20

35

PROGRAM MYUﬁTﬂ(TNPUTrUUTPUTﬁbﬁyTAPE6$UUTPUT7TﬁPE59TAPE1ITOPEQITAPE
+3)

GENEROTION PRUGROH

SUBNUUTINLS NFL“

GENEA

CHOL

CHANGE

GENUﬁTﬁ

COUOR

HINENSION SIGMA(15)7T(575)92(571)rE(150075)70(15)9TEMP(571)

DIMENSION SIGH0A(15)rTﬁ(515)901(150075)9Y(150015)96TOTOL(5)

UIMENSTON UMH(5)7YBWR(150075)vSUN(5075)7U(lﬁ)vS(15 rMU(5)rSV(15

DIMENSION HOW(15)79UB(15)78UH(15)yUMB(5)7PH(5)IGTU(5)1PSU(15)

DIMENQION SHQT(]5)7GMEQN(5 VSUBT(15)

REAL MU

REOTJ IN Nvl'x'l'JerNSINTVNEINEUVNSAT’USIGN?”SIl'I-il‘lflu’inﬁl.’

KxNOo OF VOHIORLEQ

KwnNO. OF UHRIABLES FOR WITHIN SCHOOL

NSNOo OF SUBJECTS WITHIN KOCH SCHOOL

NSzNOo OF SCHOOL

NT==NOo OF TOT-(IL SUBJECTS

NEHNO. OF ELEMENTS 1N COVORIONCE MQTRTX

NEN=NOo OF ELEMENTS IN HITHIN COUARIﬁNOE MﬁTRIX

NSﬁMﬁND. OF SAMPLES

READ(5110)K7KU9N7NSrNTvNﬁvNEUINSOM

FORMAT(815) '

READ(5915)(SIGMA(I)rleyNEU)v(SIGHAN(I)71#19NE)r(MU(I)9I*HK)

F0RMAT(6F10.47/yOFlOo4r/y7F10o47/73F10o4) ‘

WRITE KIKUINvNSrNT7NE7NEU!NSOM7815HAVSIGNﬁﬁrNU

URITE(67?O)KvKHerNSyNTrNEyNEUyNSOMQ(SIGMO(I)91$19NEH)7
+(SIGMAA(I)71319NE)V(NU(T)71319K)

FORHQT(*1DATQ INFORMATION*!//VOXVTNO. OF UORIQBLES m *rISr/vSXv*NO
+0 0F WITHIN SCI'TUOI- UKTIRII‘NBL.IE§.8 31' *7:l.57.-/VSX7*T\“:”9 OF SUBJECTS UITHIN S
+CHUUL m *915’/VSX7*N00 OF SCHOOLS I *IIUr/VSXrXNOo OF TOTAL SUBJEC
+TS = *,ISY/73X7*Nf}o OF [il..E.i"’i|ll"‘-TT53;. IN (:O'v’f'llz:Il’df’iLE: MATRIX =3 *1157/75X7X
+N0o 0F ELEMENTS IN WITHIN COUONTNNCE NOTHIX m *VISr/7UX7*NO. 0F SA

+MPLES m *rISv/r$OTHE UITHIN COUHRIHNUE MﬂTRTXk://15XvF10.49/95X12F
+10.49/75X73Floo4v/rWOTHE BETHEEN OOUHHIONOE HATRIX$r//75X7F10.4y/v

+5X92F10.4y/v5Xr3F10.4r/75‘r4F10.4y/y5X:5F10.4y/r*OPOPULATION MEAN
+=*95F10.4)
START GENERATING DATA
DO 100 IJKﬁerSﬁM
PRINT SAMPLE NUMBER
URITE(0721)IJK
FORMAT<$OSAHPLE NO. #912)
GENERATE E(IyJ)
CALL GENEh(KwyNTvSIOMArErT)
WRITE TUO HORE COLUMNS FOR NITHIN COUARIONCE
DO 750 IuerT
E(Iv4)30.
E(I!5)'~‘Oo

CONTINUE
WRITE CHOLESKY FAACTOR OF NITHIN COUARIANCE
URITE(6y25

FORMAT<$OTHE CHULESLY FﬂCTOR OF WITHIN COUﬂRIANCEX)
DO 110 1317K”

URITE(6930)(T(I9J)7J$17KU)

Pom-mum»-,3F10.4)
CONTINUE
GENERATE AI(I)
CALL GENEACNrNSySIGHAAyﬁIvTA)
“RITE CHOLESKY FACTOR OF BETWEEN COUﬂRIANCE
URITE(6735)
FORMAT($OTHE CHOLLSKY FACTOR OF BFTNEEN COUﬂRIANCEX)
DO 120 1:1HK

83

000 00000

84

NRITD<A,AD>(TA<IyJ>.J:I,N)
4o FORHOT<$0Ty5FlO.4>
120 CONTINUE
OENERATE Y(IyJ) .
CALL GENUOTA(KyNyNSyNErNTrMUyETAITY:YBARyPﬁrPSUrGMBvSUByGMEANySUyS
+HAT)
PRINT POOLED MEAN OF EACH UARIADLE
NRITE(6.45)(RM(J>.JAI.N) ‘
45 FORMAT<T0UEOTOR 0F POOLED MEAN xxys<r10.4.3x>)
PRINT SAMPLE POOLED NITHIN ODDARIANOE
NRITE(6y50)(P8v(L)yLRlvNE)
so FORMAT<$OSRHPLE POOLED wITHIN OOUARIANOE MATRIXI.//.Ox,F10.4,/.Ov.
+2(F10.4v3X)9/95X93(F10.473X)y/rSXyaéF10.4y5X)y/ySXyS(F10.473X))
PRINT GRAND MEAN OF EACH UARIOBLETSCHUUL MDAN IS UNIT 0F ANALYSIR
HRITE(6960)(SUB(L)7L=1yNE)
60 FORMAT<IOOAMPLE DDTUREN OOvARIANOt MATRIRAr//,ON,P10.4,/,OXTO<F10.
+493X)1/95X93(F10.4v3X)r/ySXy4(F10.4:3x)y/yﬁXySiFlO.4y3X))
PRINT GRAND MEAN OF EACH UARIAARLD
URITE<6r65>(GMERN(J)TJR1yK)
65 FORMAT(*0UECTUR 0F GRAND MEAN ATTO<P10.ATIX)>
PRINT SAMPLE COUARIANCE MATRIN
NRITE(6y7O)(SU(L)yLRlyNE)
7o FORMAT(#OSAHPLE covARIANcE MATRIXI,//,nyrIo.4,/y5x.2(r10.4.3X)y/y
+5X73<F10.473X)7/75X74(F10.423X)r/vSXr4(FlO.473X))
PRINT PURE DETNEEN OOUARIANOE
NRITE<6.75)(SNAT<L>,L:1,NE)
75 FORMAT<T08AMPLE PURE BETNEEN CUUNRIAHCETrH/rSXyFIO.4y/y5Xy2(F10.4y
+3X)y/rSXr3(F10.4v3X)v/rSXy4(F]O.4r3X)r/ySXyS(F10.4r3X))
PRINT Y AND YDAR ON TAPEI
II=1
L=N
DO 125 Mxin8
DO 130 IzIIyL
wRITE(1980)M:Iy(Y(IyJ)erlyK)y(YBOR(HeJ)yJw273)
80 . FORMAT<IO.1X.14.7(F10.4)I
130 CONTINUE
IIaII+N
L=L+N
125 CONTINUE
ENDFILE 1
PRINT SCHOOL MEAN ON TAPE2
no 135 “=1 9N8
NRITE(2.OO)M.<YDAR<M,J),JAI,N>
85 FORMAT(12.O(3X.F10.4>) A
135 CONTINUE
ENDFILE 2
PRINT POOLED NITNIN COUARIANOE ON TRPEE
URITE(3.90>RSU<1>,Rsv<2>.Psv(4).Rsvc2),ROU<3),ROU(5),Psu<4>,Psv<5)
+:PSU(6) *
9o FORMAT<3F10.4./,3F10.4./.3F10.4)
ENDFILE 3
100 CONTINUE
END
SUBROUTINE OENEA<N.N,SIOMA.Y.T>
GENERATION PROGRAM FOR A<I> AND E(IyJ)
READ IN SIOMA
FIND CHOLESKY FACTOR 0F SIGMA =_I
GENERATE 5 vARIABLEs DISTRIDUTED N(0r1) — z~5x1
TRANSFORM A<I>=Tz, A<I> DISTRIBUTED N<0.OIOMA~A)

SUBROUTINES NEED

000

00000

300

200.

500

400

600
700

101

85

CHOL
CHANGE

DIMENSION STGMA(]5)9T(575)IZ(591)PY(150075)rﬂ(15)vTEMP(571)
$5L;4. [5t Til Iii} IIJI£ l‘fll(flfiE:T (If?
9133.949946138
A330.252408794
9520.0765‘2913
6730.008355968
9930.029899776
CALL RANSET ( CLUCI’HDUMMY))
FIND CHOLESKY FACTOR 0F SIGMA ”~'T 0N1] IJFIT'EE'R'MINKTNT OF T
CéLL CHUL.(SIGMAIAII‘JUEr)
CALL CI'UINUE((HT7K)
DO 700 IlerN
GENERATE 12 RﬂNDUM NUMBER DISTRIBUTED U(091)
DO 200 Jﬂlvﬁ
B3300 .
D0 300 IN31712 J
RXmRﬂNF(DUMMY)
BxB+RX
CONTINUE
TRGNSFORH UNIFORM RANDOM NUMBER T0 2 UﬁRIﬁBLE UISTRIBUTED N(071)
R3(B*6o)/4o

RS=RRR
Z(J91)=((((A9XRS+N7)*RS+AS)XRS+H3)*RS+R1)*R
CONTINUE

TRANQFORM Z VARIABLE TU Y VARIABLE DISTRIBUTED N(OrSIGHﬁ)
DO 400 JJ=19K
X==O 0
DD 500 KKﬂHK
X=X+T(JJIKK)*Z(KKII)
.CONTINUE
TENP(JJ91)$X
CONTINUE
DO 600 J=1IK

Y<II.J)mTEMP<U,1> , ‘1 '
CONTINUE J N . ,- . - IT2~
CONTINUE ‘4 ‘3 r,a,f I
RETURN ‘
END

SUBROUTINE CHOL(SIGM99A1KIUET)

SIGMA AN K BY K SYMMETRIC MATRIX

A AN ARRAY 0F AT LEAST K$(K+l)/2 LOCﬁTIONS
K NUMBER OF RUNS IN Slﬁﬁﬁ

BET THE DETERMINﬁNT 0F 9
DIMENSION SIGHH(15)99(15)
X=SQRT<SIGMA(1))
DETmX
A(1)3X
K1=K~1
KC=1
IFIR=1
DO 101 JnHKl
NCxKC+J
A(KC)3SIGHA(KC)/X
CONTINUE
DO 105 leyﬁl
IFIRﬁIFIR+I
KCleIR
X=O.

0000000

0000

102

104

103
105

4?
41

44
43

86

DO 102 J=le
X=X+A(KC)$*2
KC=NC+1

CONTINUE

XmSORT(SIGMﬁ(KC)~X)

DET =3DETXX

A(KC)=X

IIxI+1

]F(II.EO.M)MIIUHN

EJCI==IIF'III

IN) 103 .kkIIrKl
JCuJC+J
ICﬁJC
KCwIFIR
Y::::() 0
DO 104 LKIII

YWY+AQIC)$A(KC)
NCSKC+1
ICSICII
CONTINUE
A(IC):(SIOMﬂ(IC)*Y)/X
CON I I N UE
C(HQTIIHJE
RETURN
END
SUDROUT I NC [El-IT’H‘TOE ( A r T v N)

A MATRIX TO BE CDNUERTED
T ARRAY UHERE CONVERTED MATRIX UILL DE STORED
K DIMENSION OF MATRIX

CHANGE TO SQUARE MATRIX

DIMENSION A(IC),T(5.5)
LmN+1
LL:(L*K)/2+I ,
DO 41 Jn1yk ' f “
JR=L—J
DO 42 1:1.JR _
IRPJRMI+1 ~*"
LLxLLwl ‘*
T(JR91R)HM(LL)
CONTINUE
CONTINUE
DO 43 JREIK
Lme1
DO 44 IaIyL ‘ ,
T<I.J)mo. : , ;
CONTINUE
CONTINUE
RETURN
END
SUBROUTINE OENDATA(R,N,NS,NE.NT,MU,E,A,Y,YDAR.PM,PDU,OMD,SUD.OMEAN

pu-
‘I
\x

+ySUvSHGT)

GENRRATION PROGRAM FOR Y(I7J)

Y(IrJ)$MU+A(I)+E(I)

THhRE ARE 5 UﬁRIADLES FOR EACH SUBJECTS

VECTOR OF Y DISTRIBUTED N(HUrSIGMn)

DIMENSION E(ISOOIS)yﬁ(1500y5)rY(150095):GTOTAL(S)IGMEAN(5)rOM8(55
DIMENSION YDAR<150095>98UM<50r5>9U(15)98(JS)rMU(O)yOU(lS)rSSU(15)
DIMENSION SVD(15)18VU(]S)yOMD(5)yPM(O)beUk5)rPSU(15)rSHﬁT(15)
DIMENSION SUDT(15

REAL MU

000

87

K7”: NO. OF VORIAD’LES7 N3J=NOo OF SUBJECTS IN EOCH SCHOOL
NOV-NO. OF CTN‘TOULS7 ATE-“NO. OF ELENE "‘3‘ '3: - T-' ' " '0
COMPUTE Y(I,‘J) NT.” NT TOTAL NONILD UF SUBJECT.)
III=TI
Lr-‘N
DO 100 "”317NS
DO 200 I=“I[7L
DO 250 .1331le
Y(I7J)==’NU('J)+A(NIJ)+£<I7J)
9.50 CONTINUE:
200 CON-TINT”?
I 13341 I TN
L‘3l-'l"N
100 CONTIT'TUE
COT‘TF'UTE SUM AND MOON FOR EACH SCHOOL
11:31
LxN
ﬁN:N
[IO 450 N-T’JIVNO
[TO ITOO-J‘T'lva
SUN ( M 7 J ) "*0 o
['0 500 l‘iNiv‘ITvL
SUN(M7J)-'=Ol.llvl(f’i7J)‘T‘Y(l§|\'7J)
500 CONTINUE
YBIOR(M7.J)L:SUTT<M!J)/AN
400 CONTINUE '
IIiJII‘T'N
L'~'-'L+N
450 CONTINUE
CONF'UTE POOLED NEON
ONTWNT
DO 94 J35’17K
GTU(.J):-'0o0
DO 93 N’JI7N‘3
GTU(.J)'~'-VI'TT'U(\.J)‘T'SLJM(MrJ)
93 CONTINUE
F'N(J)lii’OTw(J)/ANT
94 CONTINUE ,
CONF'UTE F'OOLED WITHI N COVT’IHINM‘TCEL NI’I'ITFIIX
ANI==N"‘I
DO 97 C131 7NE
SS”(;J):ZOOO
97 CONTINUE
11:1
L33N
['0 9O 1317MB .
CALL CUUIOR‘: I I 7 L 7 Y 7 l"\' 7 Nr NE 7 OUT-J 7 OMS)
DO 99 J=317NE
SOU<J)'~"—SSU(J)+(ON1*SUU(J))
99 CONTINUE
IIﬂII+N
L=L+N
98 CONTINUE
PN‘xNT‘“NS
DO 96 .1117le
PSV(J)=SSU(J)/F'N
96 CONTINUE
COMF’UTE SAMPLE. BETWEEN COUARIANCE
113:1
L=N8
CALL COUAR(II7L7YBAR7K7NS7NErSUBT76MB)
AN=N

130

700

600

710
750

800

88

DO 451 I=17NE
. SVB(I)=SVBT(I)*AN

' CONTINUE

COMPUTE SAMPLE TOTAL COUARIANCE NOTRIX
11:1
L=NT
CALL COVOR(TI7L7Y7N7NT7NE7SU7OMEAN)
COMF‘UTE THE ESTIMATION OF PURE BELTUEQEN COUARIONCE
AN-‘ZN
DO 130 I=17NE
SHOT( I)“‘~'(SVB( I )‘F'SV(\I ) )/AN
CONTINUE
RETURN
END
OTIDIITOUTII‘KTE (Tl-:TK'T‘Il‘-.'(I17l...7 Y 7K 7 NT 5' NF: 7 OU 7 (EH-’TTW‘TN)

COMF'UTE ”Hf-TN (1er COUﬁRIl-‘INCE MATRIX
DIT‘IENSION Y( .1 1.30075) 7 53V( 15) 7(‘3NKON(O) 7 OTOTI’U..(5) 7b” 15) 78( IS)
COMF’UT 19‘: SUN (IND MEAN
ANTvT-TT
DO 600 J‘r-‘IH‘L'
GTOTTMJJF-ﬁoo
DO 700 I=~‘-II7|.-
GTOTT’TL(J)77:1'3TOTAL(J)+Y< I 7 J)
CONTINUE
ONEAT‘HJJ=OTOTAL(J)/ANT
CONTINUE
COMF'UTE YEAR BY YEAR TRANSF'OOE
Ic=o
DO 750 J=17K
DO 710 M==17J
IC=’IC+1
TANIC)~‘-?=OMEAN(J)*OMEAT‘T(M)
CONTINUE
CONTINUE
COMF'UTE NT BY (YEAR BY YEAR TRANOF'OCSE)
ANT-”4N1.
DO 800 13:17le
“(I)3=U(I)*ANT
CONTINUE
COMF'UTE Y TRANSF‘Of-‘TE: Y
IC=40
DO 925 .J3317K
DO 900 MﬂlrJ
X330.
['0 950 Kli'fitIIrL
X=X+Y(KI’I7J)*Y(KIK7M)
CONTINUE
ICSIC‘T'I
S(IC)='~'X
CONTINUE
CONTINUE
COMF‘UTE SAMPLE VARIANCE COUARIANCEMATNIX
ONT] “NT“ 1
DO 9535 J=17NE
8(J):S(J)*U(J)
SU(J)-‘*S(J)/ANT1

" CONTI NUE

RETURN
END

89

DISCRIMINATION ANALY?IF r FINN N A nnnuga1 TO
GET CHﬁRﬁCTERISTlC H60? nan vucr S 93 UEM 1% LOCK APPROACH
3 1 1 1 2o 1 1

CHQOU 50 '
‘AMPLE NUMBER 2
FINISH
(IQ!5X93(F10.4))
Y X1 X2

50
C0!
C1!
C2!
C3!
C4!
C57
C61
C79
C89
C99
0101
C11!
C121
C137
C149
C159
C169
C177
C189
0199
C201
C217
C229
023!
0249
C259
C269
C271
C28!
C29!
C309
C319
C321
C33!
C34!
C357
C367
C377
C381
C399
C401
C419
C429
C437
C447
C457
C469
C479
C48!
C49!

3 1 1

“1749. STOP

10

200
100

25
300

30

500
400

90

PROGRAM BOCK(INPUTrOUTPUrméﬁyTAPESvTAPE6m0UTPUT)
SAMPLE NUMBER 50

ESTIMATE ROOITIUE SFMI DEFINITE DETUEEN COUARIANCE

USE IMSL SUDROUTINE

DIMENSION UNANEA<200>.TINUTC3.3>.TINU<3,3>.AI(3.3).T<3,3)

DIMENSION PHI(373)9PHIMI(3:3)vRES<3y3>ySIGHA<3y3)yRESl(3y3)

DIMENSION SXX<292)y8XY(2v1)ySXXINU<2y2)yBHAT<2v1)

DIMENSION 5Yx<1.2).sE<2>

N=NO. OF SUBJECTS IN EACH SCHOOL

NsNO. OF DIMENSION 0F MATRICES

cho

N=3

READ IN T(IyJ)rPHI(IrJ)

DO 100 1:1,N
READ<S:10)(T(IvJ)yJ:19K)y(PHI(I:J):JR1:K)
FORMATIAF10.6>

CREATE IDENTITY MATRIX
DO 200 Jx1,N
IF (I.EQ.J) THEN

AI(I:J)=1.
ELSE

AI(I.J>=o.
ENDIF
CONTINUE

CONTINUE

PRINT T<I.J).RHI<I.J>.AI<I.J)

URITE<6.20)

FORMAT<$1MATRIX 0F CHARACTERISTIC vECTORt)

DO 300 1:1.N
URITE<A,25>(T(I,J>.Jz1,N)
F0RMAT<X0X7(3X:3(F10.613X)))

CONTINUE

URITE<6y30)

FORMAT(*0MNTRIX OF RHIt)

DO 350 I=1.N
URITE<6.25)(RHI<I.J>,J:I.N)

CONTINUE

URITE<6,50)

FORMAT(*OIDENTITY MATRIxx)

DO 375 1:1.N
URITE<A.25><AI<I,J).J=1.N>

CONTINUE

CUMPUTE PHI—I

DO 400 1:1,N
DO 500 JxlyK

PHIHI(I:J):PHI(IvJ)~AI(IyJ)
CONTINUE

CONTINUE

COMPUTE INVERSE OF T

CALL LINv2F(Ty3r3rTINUyOrUKAREA:IER)

CREATE TINUERSE TRANCFOOE

DO 600 1:1,N
DO 700 Jm1yx

TINUT<I.J):TINU(J,I)
CONTINUE

CONTINUE

PRINT INVERSE OF T

WRITE(6y60)

FORMAT<XOINUERSE OF TX)

DO 723 ImlyK
URITE<A.25><TINU(I.J).J:1.N)

CONTINUE

70

750

91

T4€IF!T 'lktniﬁﬂ’ULJZ LN: TIIJUIJIEH:
URITE(697O)
FORMAT(*OTRAN8POSE OF TINUERSE*)
DO 750 I=17K
URITE(6925)(TINUT(IIJ)9J$17K)
CONTINUE
COMPUTE THE PRODUCT OF TINUERSE TRﬁNSPOSF AND PHI~I
CALL UHULFF(TINUT1PHIMIv3r3rSy3§3yRE81v5rIER)
COMF'UTE THE F'FL‘TIJIIUCT OF TINUERSE TFn’INESF'TTffSErF-‘HI~~I (5ND TINUER‘SE

' CALL UNULFF(HE81rTINUv39313v3739RESr3rIER)

850
800

COMPUTE POSITIVE SEMI UEFINITEBETUEEN COUHHIHNCE(SIONA~A HAT)
AN=N
DO 800 I=19K

DO 850 J=17K

SIOMA(IIJ)*RES(I7J)/AN

CONTINUE
CONT ]' NUE
PRINT POSITIUE SEMI DEFINITE BETUEEN COUGRINNCE
URITE(6980)

80 FORMAT(*OPOSITIUE SEMI DIFINITE BETWEEN COUﬁRIﬁNCE MATRICES<SIOMQ“

900

30
200

40

50

300

+HFIT ) *)

DO 900 lerK
NRITE<6725)(SIONA(I!J)1J*17K)

CONTINUE

SYY=SIGNA(171)

SXY(1!1)3SIGMO(271)

SXY<291)=SIGMA(371)

SXX(111)=SIGNA(272)

SXX(1!2)=SIGNA(273)

SXX(271)=SIONA(372)

SXX(272)=SIGNA(373)

CALL BETAHAT(SYYISXYISXX)

END

SUBROUTINE HETAHAT(SYYFSXYISXX)

COMPUTE BETWEEN SLOPE BY ROCK APPLICﬁTION APPROHCH

USE INSL SUBROUTINE _

DIMENSION SXX(272)18XY(271)78XXINU(272)yBHﬂT(271)rUKAREA(ROO)

DIMENSION SYX(1,2)7SE(2)

N=NOo OF SUBJECTS

K=NOo OF X

N=JO

K=2

PRINT SXX

URITE<6720)

FORNGT(*1HATRIX OF SXXX)

DO 200 I=HK
URITE(6730)(SXX(I7J)9J$1!K)
FORMAT(*O*75XIQ(F10.4:3X))

CONTINUE

PRINT SXY

URITE(694O)

FORMOT(*ONATRIX OF SXYX)

DO 300 I=vi
URITE(6750)(SXY(III))
FORMAT(*O*’5X9F10.4)

CONTINUE

CONFUTE INUERSE OF SXX

CALL LINUIIF ( SXX v 2 7 2 r SXXINNl r 0 1 ”KOREA r I ER)

-PRINT INVERSE OF SXX

60

URITE(6760)
FORMAT(*OINVERSE OF SXXX)

APPENDIX B

RELATIONSHIP OF THE BETWEEN-GROUP REGRESSION COEFFICIENTS

FROM VARIOUS ANALYSIS APPROACHES

Relationship of the Between-Group Regression Coefficients

From Various Analysis Approaches

To estimate the between-group regression coefficients, Bock
application approach uses the estimated between-group variance—

covariance matrix, 23. An unbiased estimate of 2a is:

where Sa is the matrix of the between-group mean squares,
S is the matrix of the within-group mean squares,

and n is the number of subjects within each group.
I

 

 

 

 

Denote: Z _(k) _(k) _(k1) ____(k1)
i=1n(x1 - x )(xi - x )
A
n(I - l) KxK
I n
(k) —<k) (k1) -<k1)
z 2 (x - x )(x - x )
B 1=1 j=1 13 1 ij i
nI(n - 1) KxK
I
—- n(I - l) le
I n
X X (k) -(k) -
and §_= i=1 j=1 (X13 ’ X1 >(Yij ' Y1)
nI(n — l) le

where I is the number of groups.

92

93

Then fa can be written as

 

 

. . 1
,. 0002 E(xy) I
2 = a a
.a ,. ,.
Z(xy) 2(X) l
a a .-
3 32"” <2. - 19'
(g - g) (A - B) |
A 2 " "2 _ _' 2
where 0(y) = 1_ 211(Y1 - Y) 22(Yij Yi)

 

n(I - 1) n(I - 1)

%- MS(Between) - MS(Within)
The least squares estimate of the between-group regression coef-

ficient can then be written as

“B _ “(x)-1“(xy)_ -1
E3 — Ea 2a - (A - B) (a - b).

While the matrices A and B are in general non-singular, the dif-
ference matrix (A - B) may not be non-singular. Thus, Bock (1968)
proposed to use the orthogonal decomposition of (A - B) and retain only
those eigen values and eigen vectors that were statistically signifi-
cant to construct Che "inverse" of gix).

In order to relate the estimated between-group regression coef-
ficient to those obtained from the other approaches, A - B is also
assumed to be non-singular so that (A - BT-lexists. Furthermore, both

A and B are assumed to be non-singular. The least squares estimate

of the between-group regression coefficient is then:

94

m-BYHa-w

Ba '
-l -1
= (A - B) a — (A - B) b
= (A(I - A‘IB))'1a - (n(B’lA - I))'1b
= (I - A-lB)-lA-la - (B'lA - I)'lB'1b
But A—la = E: is the between—group regression coefficient esti-
1

mated from the group level apporach, and B- b = §_is the pooled within

group regression coefficient. Hence,

1A

1A - I)- .g.

= (I — A-lB)-l§: - (B-

Pu;

Applying a theroem presented by Nobel (1969, Theorem 5.22, p. 147),

(I - AmlB).l can be written as:

(I - A-lB)-l = I + (B-IA - I)'1.
Thus,

“B _ -1 -1 “G -1 -1 *

Ea — (I + (B A - 1) ga - (B A - I) a
_ “G -1 “G -1 -1 A
—§a+(B A-I)§a-(B A-I) g
_ “G -1 -1 “G “
- Ea + (B A - I) (Ea — g)

“B “G -1 -1‘F

Ea — ga + (B A - I) ga

A

F
where Ea is the between-group regression coefficient obtained from the

A

full model analysis approach, and g: is the between-group regression

coefficient obtained from the Bock application approach.

B IBLIOGRAPHY

BIBLIOGRAPHY

Bock, R. D. and Vandenberg, S. G. Components of Heritable Variation
in Mental Test Scores. Progress in Human Behavior Genetics.
S. G. Vandenberg (Ed.). Baltimore: Johns Hopkins Press, 1968.

 

. Bidwell, Charles E. and Kasarda, J. D. School District Organization
and Student Achievement. American Sociological Review, 1975,
39, 55-70.

 

Burstein, Leigh. Assessing Differences Between Grouped and Individual-
Level Regression Coefficients. Paper presented at the Annual
VMeeting of the American Educational Research Association, 1976.

Burstein, Leigh. Three Key Topics in Regression—Based Analysis of
Multilevel Data from Quasi-Experiments and Field Studies. Paper
presented at Institute for Research on Teaching, Michigan State
University, 1977.

Burstein, Leigh and Miller, M. David. The Use of Within-Group Slope
as Indices of Group Outcomes. Paper presented at the Annual
Meeting of the American Educational Research Association, 1979.

Burstein, Leigh, Linn, Robert L. and Capell, Frank J. Analyzing
Multilevel Data in the Presence of Heterogeneous Within-Class
Regression. Journal of Educational Statistics, 1978, 2, 347-383.

 

Control Data Corporation. Fortran Extended Version 4 Reference Manual,
California, Control Data Corporation, 1978.

 

Cronbach, L. J. and Webb, N. Between—Class and Within-Class Effects
in a Reported Aptitude X Treatment Interaction: Reanalysis of
a Study by G. L. Anderson. Journal of Educational Psychology,
1975, Q], 717-724.

 

Finn, J. D. MULTIVARIANCE: Univariate and Multivariate Analysis of
Variance, Covariance and Regression. Ann Arbor, Mich.: National
Educational Resources, Inc., 1972.

Hannan, Michael T., and Young, Alice A. On Certain Simularities in
the Estimation of Multi—Wave Panels and Panels and Multi—Level
Cross-Sections. Paper prepared for the conference on Methodology
for Aggregating Data in Educational Research, 1976.

95

96

Hannan, Michael T., Freeman, John, and Meyer, John W. Specification
of Models of Organizational Effectiveness: Comment on Bidwell
and Kasarda. American Sociological Review, 1976, 31, 136—143.

 

IMSL LIBRARY. The IMSL Library Volume 3. Houston, International
Mathematical & Statistical Libraries, Inc., 1979.

 

Keesling, J. W. Components of Variance Models in Multilevel Analysis.
Paper prepared for presentation at a conference on Methodology
for the National Institute of Education, 1976.

Keesling, J. W. Some Explorations in Multilevel Analysis. Santa
Monica: System Development Corporation, 1977.

Keesling, J. W. and Wiley, David E. Regression Models for Hierarchical
Data. Paper presented at the Annual Meeting of the Psychometric
Society, 1974.

Knuth, Donald E. Semi-numerical Algorithms: The Art of Computer
Programming. Mass.: Addison, Wesley Publishing Co., 1968.

 

 

Noble, Ben. Applied Linear Algebra. Englewood Cliff, N.J.: Prentice-
Hall, 1969.

 

Rock, Donald A., Baird, Leonard L. and Linn, Robert L. Interaction
Between Colleges Effects and Students' Aptitudes. American
Educational Research Journal, 1972, 19) 149-161.

 

Scheifley, Verda M. Analysis of Repeated Measures Data: A Simulation
Study. Doctoral Dissertation, Michigan State University, 1974.

Scheifley, Verda M. and Schmidt, William H. Jeremy D. Finn's Multi-
variance-Univariate and Multivariate Analysis of Variance,
Covariance, and Regression Modified and adopted for use on the
CDC 6500. Occasional Paper No. 22, Office of Research Consultation,
Michigan State University, 1973.

"111111111111111111