This is to certify that the thesis entitled INVESTIGATION OF METHODS OF ANALYZING HIERARCHICAL DATA presented by Boonreang Kajornsin has been accepted towards fulfillment of the requirements for the Ph.D. degree in Counseling and Educational Psychology (Statistics and Research Design).

Major professor

Date: October 9, 1980

INVESTIGATION OF METHODS OF ANALYZING HIERARCHICAL DATA

By

Boonreang Kajornsin

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Counseling and Educational Psychology

1980

ABSTRACT

INVESTIGATION OF METHODS OF ANALYZING HIERARCHICAL DATA

By Boonreang Kajornsin

In recent years researchers have become more cognizant of the problems of analyzing hierarchical data. It has become increasingly evident that efforts to investigate the relationships among educational variables have suffered from a failure to understand the complications caused by hierarchical data. When faced with the analysis of hierarchical data, many researchers have proposed alternative ways of analyzing such data.

The general purpose of this dissertation was to investigate various alternatives used to analyze hierarchical data by applying them to a set of simulated data. This study extends the regression model presented by Burstein, Linn and Capell (1978) to its multivariate form. The model used to simulate the data is the random effects model. The main assumption used in this model is that there is homogeneity of the within-group regression coefficients. The main concern of this dissertation is to determine which approach gives the best estimates of the between and within regression coefficients in terms of accuracy (least amount of bias) and in terms of precision for various situations. The bias ratio of each estimator was also computed to facilitate comparisons.

Two situations were investigated in this dissertation. The first situation was one in which there were both individual level predictors, which were aggregated to the group level, and predictors which were defined only at the group level. The second situation was one in which there were only individual level predictors which could be aggregated. For each situation, three different data sets were generated: first, there were no group level effects; second, group level effects were equal to the individual level effects; third, group level effects were not equal to the individual level effects.

The simulation results showed that all analysis approaches gave the same estimates of the pooled within-group regression coefficients for all six cases, with good precision and small bias ratios. In the situation where there were both individual level predictors which were aggregated to the group level and predictors defined only at the group level, the group level analysis approach, the full model analysis approach and the subtraction analysis approach all gave essentially the same estimates of the regression coefficients defined for the group level variables. In the case where there were no group level effects, the two stage analysis approach gave better estimates of the regression coefficients defined for the group level variables than the other three approaches.
In the case where the between-group regression coefficients were equal to the pooled within-group coefficients, all four approaches gave essentially the same estimates of the regression coefficients defined for the group level variables. In the case where the between-group regression coefficients were not equal to the pooled within-group regression coefficients, and when the intraclass correlations were low (about 0.30), all four approaches gave the same estimates; but when the intraclass correlations were high (about 0.90), the two stage analysis approach did not give estimates of the regression coefficients as good as those given by the other three approaches.

In the situation where there were only individual level predictors which could be aggregated to the group level, the simulation results showed that for all three cases the full model analysis approach and the subtraction analysis approach gave exactly the same estimates of the between-group regression coefficients, but these estimates were not close to the true parameter values. The group level analysis and Bock application approaches gave estimates of the between-group regression coefficients that were not very different from each other and were also close to the parameters. When the intraclass correlations were high (about 0.90), the group level analysis approach seemed to give better estimates of the between-group regression coefficients, but the Bock application analysis approach gave better estimates when the intraclass correlations were low (about 0.30) in the case where the between-group regression coefficients were not equal to the within regression coefficients. When the between-group regression coefficients were equal to zero, the Bock application analysis approach gave better estimates of the between-group regression coefficients than the group level analysis approach. However, when the between-group regression coefficients were equal to the pooled within-group regression coefficients, the two approaches gave essentially the same estimates of the between-group regression coefficients.

ACKNOWLEDGEMENTS

I wish to express my sincere appreciation to my advisor and committee chairman, Professor William H. Schmidt, for his assistance, suggestions, and encouragement throughout all phases of my study. Special thanks go to Dr. Richard Houang, Dr. Robert Floden and Dr. Dennis Gilliland for their help and valuable comments.

Working in the Office of Research Consultation provided me with a most valuable experience which I will never forget. Many thanks to Professor Joe L. Byers, the Director of the Office of Research Consultation, who gave me this job, and to Professor William H. Schmidt, who gave me four years of teaching assistantship experience.

I acknowledge with appreciation the support of the Thai Government, which allowed me to pursue my doctoral studies. Special thanks are also extended to Apinya Assavanig for typing part of the rough draft of this document, and to Donna Schmidt, who was kind enough to spend many hours correcting my English.

Most of all, I wish to thank my husband, Dr. Samnao Kajornsin, for his encouragement, support, understanding and sympathy. Finally, I wish to extend my gratitude to my parents, my brothers, and to Rungson Kajornsin, my son.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
I. STATEMENT OF THE PROBLEM
II. REVIEW OF THE LITERATURE
III. ALTERNATIVE APPROACHES FOR ANALYZING HIERARCHICAL DATA
    Two Stage Analysis Approach
    Group Level Analysis Approach
    Subtraction Analysis Approach
    The Full Model Analysis Approach
    Bock Application Analysis Approach
IV. SIMULATION PROCEDURE
    Description of Population Parameters
    Description of the Generation Routine
    Two Stage Analysis Approach
    Group Level Analysis Approach
    Subtraction Analysis Approach
    Full Model Analysis Approach
    Bock Application Analysis Approach
V. SIMULATION RESULTS
VI. CONCLUSIONS AND RECOMMENDATIONS

APPENDICES
A. COMPUTER PROGRAMS
    Mydata Program
    Discrimination Analysis of Finn Manova Program
    Bock Program
B. RELATIONSHIP OF THE BETWEEN-GROUP REGRESSION COEFFICIENTS FROM VARIOUS ANALYSIS APPROACHES

BIBLIOGRAPHY

LIST OF TABLES

4-1  3x2 Design of Populations Defining the Structure of (Σ + Σ_a)
4-2  Population Compositions of Σ and Σ_a
4-3  Parameter Values for the First Situation
4-4  Parameter Values for the Second Situation
4-5  Population Covariance Matrices of the Predictor Variables
4-6  Population Covariance Matrices Under the First Situation
4-7  Population Covariance Matrices Under the Second Situation
4-8  Parameter Values of the Second Set of Data
4-9  Population Covariance Matrices of the Predictor Variables of the Second Set of Data
4-10 Population Covariance Matrices of the Second Set of Data
5-1  Simulation Results of Population I-A
5-2  Simulation Results of Population I-B
5-3  Simulation Results of Population I-C
5-4  Simulation Results of Population II-A
5-5  Simulation Results of Population II-B
5-6  Simulation Results of Population II-C
5-7  Simulation Results of the Second Set of Data of Population I-C
5-8  Simulation Results of the Second Set of Data of Population II-C

LIST OF FIGURES

5-1 through 5-12  Sampling Distributions of the estimated between-group regression coefficients from Populations I-A, I-B, I-C, II-A, II-B, and II-C

CHAPTER I

STATEMENT OF THE PROBLEM

In recent years, the problems of analyzing hierarchical data have become well known among researchers. It has become increasingly evident that efforts to investigate the relationship between variables have suffered from a failure to understand the complications caused by hierarchical data. Most educational data are hierarchically arranged, i.e., students are grouped into classrooms which are grouped within grade levels and within schools.
The schools are also grouped within school districts, and these in turn are grouped within state educational administrations.

Consider the problem of modeling the effects of school structure on student achievement. Suppose we are interested in the effects of some characteristic of school structure on achievement. There is a systematic sorting of families into school districts that produces a correlation of individual student attributes with school characteristics. Therefore, an adequate description of the achievement process must contain both student characteristics and school characteristics. The practical problem is that the researcher may either analyze individual level data (e.g., regressing individual achievement on student characteristics and school characteristics) or he may analyze school level averages (e.g., regressing average achievement on average individual characteristics and school characteristics). Hannan and Young (1976) have shown that in most realistic situations (i.e., when models are not perfectly specified) results of the two different analyses will be quite dissimilar.

Bidwell and Kasarda (1975) argue that when the question is posed at the school level, the school level regression is most appropriate. Hannan, Freeman and Meyer (1976) point out that researchers seldom adequately specify school-level processes that are anything more than the sum of individual-level processes. The causal arguments are concerned with the impacts on individual students, out of which school-level outcomes are composed. Consequently, the choice of level is open to question.

Wiley (1973) points out the problem of analyzing data when using large numbers of correlated explanatory variables. He indicates that when variables defined at the level of the individual pupil are aggregated to the level of the school, their correlations tend to increase. As a consequence, in the presence of large numbers of such variables, effective analyses are hindered by excessive collinearity (high relations among independent variables). When the number of such collinear variables becomes very large, the effects of individual variables become very difficult to detect. Whenever variables are defined at the school level, the appropriate unit of analysis is the school, and the number of degrees of freedom available is limited to the number of schools.

Research on the differences between multiple regression models applied at different levels of aggregated data indicates three things: 1) there are substantial differences in the magnitude of regression coefficients across aggregated levels for specific models; 2) different variables enter the models at different levels; and 3) aggregation of individual characteristics generally inflates the estimated effects of pupil background and thus decreases the likelihood of identifying teacher and classroom characteristics that are effective.

The results cited above are not very comforting for the researcher who wishes to draw conclusions about educational processes at one level but is constrained to analysis at a different level. When faced with the analysis of hierarchical data, many researchers have tried to propose alternative ways of analyzing such data (e.g., Keesling and Wiley, 1974; Cronbach and Webb, 1975; Keesling, 1976; and Burstein, 1976).

Keesling and Wiley (1974) propose a two stage analysis of hierarchical data.
They set out to define a model for disentangling the effects of variables defined solely at the school level from those defined at the level of the pupil.

Cronbach (1975) claims that the overall between-student coefficient from the regression of individual outcome on individual explanatory variables is a composite of the between-groups regression coefficient and the pooled within-group regression coefficient. He recommends that between group effects and individual within group effects should be examined separately.

According to Keesling (1976), to obtain the correct estimates of the between school regression coefficients, at least two models need to be examined. These two models are a school level model and an individual within school level model. Keesling recommends subtracting the within school regression coefficient from the between school regression coefficient to obtain the correct regression coefficient appropriate to school level effects.

Burstein (1976) proposes an alternative approach by suggesting the examination of determinants of heterogeneity of the within class slope. He suggests that the first step is to find the specific within class adjusted intercept and slope. The second step is to fit a model at the class level with the adjusted intercept and slope used as outcome variables and the class level explanatory variables used as independent variables.

The general purpose of this dissertation is to investigate various alternatives used to analyze hierarchical data by applying them to a set of simulated data. This study extends the regression model presented by Burstein, Linn and Capell (1978) to its multivariate form. The model used to simulate the data is the random effects model. The main assumption used in this model is the homogeneity of the within group regressions; this is in contrast with Burstein's approach, which suggests allowing for heterogeneity of the within group regressions. The main concern of this dissertation is to determine which approach gives the best estimates of the between and within regression coefficients in terms of accuracy (least amount of bias) and in terms of precision for various situations. In other words, this dissertation is concerned with determining how correctly the alternative procedures tend to work, i.e., how similar the estimated coefficients at the group level are to the known parameter values, and whether the conclusions arrived at under each analysis approach are the same.

Two situations are investigated. The first is where there are both individual level predictors which can be aggregated, and predictors defined only at the group level. For example, the predictors could be length of the school day (group level) and average home background (individual level aggregated to the group level). The second situation is where there are only individual level predictors which are aggregated. For example, the predictors are average home background and average pretest scores (both individual level variables aggregated to the group level). For each situation, three different populations are investigated: first, there are no group level effects; second, group level effects are equal to the individual level effects; third, group level effects are not equal to the individual level effects.
For the first situation, four analytical approaches will be investigated: a two stage least squares analysis recommended by Keesling and Wiley, a group level analysis approach using only averages recommended by Cronbach and Webb, and a full model analysis approach and a subtraction analysis approach, both recommended by Keesling. For the second situation, four approaches will be investigated: the group level analysis approach, an approach based on Bock (1968) using his method of estimating heritable variation in twin studies, the full model analysis approach and the subtraction analysis approach.

The method of investigating these various approaches will involve the use of simulated data which are generated by computer algorithms where the population parameters are known. For each population, fifty samples were generated. By analyzing fifty samples, one can compare 1) the empirical distribution of the estimator for each analysis approach to the others and 2) the empirical standard errors of the parameter estimates. These results can be used to help determine the appropriateness of each of the analyses for different data situations.

CHAPTER II

REVIEW OF THE LITERATURE

Traditionally, in a situation involving hierarchical data, a variety of competing points of view have been cited as justification for the choice of either pupils or groups (classrooms, schools, etc.) as the unit of analysis. Hannan (1976) has shown that in most inexact cases (i.e., when the models are not perfectly specified) results of individual level analyses and group level analyses will be quite dissimilar. This finding makes the choice between models extremely important. This section will review the methodologies that some investigators have used to analyze multi-level data.

Cronbach and Webb (1975) reanalyzed a study by G. L. Anderson. Anderson reported finding an interaction of drill and meaningful methods of arithmetic instruction with student ability and achievement. Drill was found to be superior for "overachievers" and meaningful instruction for "underachievers" in 18 fourth-grade classrooms. Pretest measures used in the study were the Minnesota School Ability Test and the Compass Survey Test. Cronbach and Webb argued the importance of separating the regression effects into between-class and within-class categories. In their reanalysis, separating the between-class and within-class regression components of the outcome on aptitude, the Aptitude by Treatment interaction finding disappeared. An apparent interaction in the between-class analysis was dismissed as unreliable. No interactions were found within classes. Finally, they concluded that studies of interactions usually have not been powerful enough to evaluate outcome on aptitude regressions accurately. Using the class as the unit of analysis, even the rather large Anderson study could not set narrow confidence limits on the regression slopes. They urged investigators collecting data on intact classes to examine between group and within group regressions separately.

Keesling and Wiley (1974) discussed the problem of disentangling the effects of variables defined solely at the level of the school (e.g., length of the school day or the highest degree held by the principal) from those defined at the level of the individual pupil (e.g., home background characteristics). They summarized the model implicit in this situation by:
$Y_{ij} = \gamma_0 + \gamma' Z_i + \phi_i + \beta' X_{ij} + \epsilon_{ij}$

where $Y_{ij}$ is the outcome of the jth pupil in the ith school, $\gamma_0$ is an additive constant, $\gamma$ is the vector of adjusted effects of school characteristics on Y, $Z_i$ is the vector of school variables for the ith school, $\phi_i$ is an error component defined at the school level, $\beta$ is the vector of adjusted effects of individual characteristics on Y, $X_{ij}$ is the vector of characteristics of the jth individual in the ith school, and $\epsilon_{ij}$ is an error component defined at the individual level.

In the context of hierarchically defined educational data, they proposed three alternatives to obtain appropriate adjusted estimates of the effects at the individual and school levels. The first alternative was to assume that the model was completely specified at the school level, i.e., all of the school variables relevant to the outcome are included in the model. Then the covariance $(\phi_i, \bar{X}_i)$ is equal to zero, where $\bar{X}_i$ is the mean of $X_{ij}$ for the ith school. This model implies that individual level variables have direct impact on outcomes only at the level of the individual; their effects at the school level are mediated through other variables defined at the level of the school.

The second alternative was that if all the mediating variables at the school level were not specified in the model, then the covariance $(\phi_i^*, \bar{X}_i)$ was not equal to zero, where $\phi_i^*$ was the residual from the measured school variables. In this case, the fitting of the model will produce a biased estimate of $\beta$. This source of bias may, however, be eliminated by performing an analysis based on the variation within schools. This may be done by subtracting the relevant school means (school effect values) for the criterion variable and for each of the pupil level explanatory variables from each of the individual values for these variables. An analysis performed using these deviated values will be adjusted for all sources of variation among schools. The covariance matrix of the deviated values is called the pooled within school covariance matrix. If this covariance matrix is computed for all individually defined variables and used as the basis for the regression of the outcome on the $X_{ij}$, the resulting estimate of $\beta$ will not be biased by specification errors at the school level.

After the adjusted effects of the individual level variables are found, the average effect value for each school, aggregated over all the individual pupils, may be subtracted from the criterion mean for each school. Analyses using the school as a unit, with variables defined at the level of the school as the independent variables and the modified criterion means as the dependent variable, will produce estimates of the effect of the school variables adjusted for the effects of individually defined variables. The model at the school level becomes:

$\bar{Y}_i - \hat{\beta}'\bar{X}_i = \gamma_0 + \gamma' Z_i + \phi_i$

where $\bar{Y}_i$ is the achievement mean for the ith school, $\hat{\beta}'\bar{X}_i$ is the estimated average effect value for the ith school, $\gamma_0$ is the constant, $\gamma$ is the vector of the adjusted effects for the school variables, and $\phi_i$ is an error component at the school level. Using this model, the analysis will produce unbiased estimates of $\gamma_0$ and $\gamma$ in the absence of specification error.

The third alternative was that if there was some specification bias at the level of the school-defined variables (i.e., some important variables are missing), then the covariance $(\phi_i, \bar{X}_i)$ is not equal to zero and the covariance $(\phi_i, Z_i)$ is not equal to zero, either.
Some of the biases in the estimate of $\gamma$ can be removed by including the sum of the average effect values $(\hat{\beta}'\bar{X}_i)$ of the individually defined variables as another variable in the school level analysis. The model then becomes:

$\bar{Y}_i = \gamma_0 + \gamma' Z_i + \lambda(\hat{\beta}'\bar{X}_i) + \phi_i$

This technique allows the partial removal of some of the additional bias due to the omission of relevant school level variables, to the extent that the sum of these average effect values is correlated with the omitted variables.

Rock, Baird, and Linn (1972) studied the interaction between college effects and students' aptitudes. They claimed that their approach was designed to find groups of colleges that are about equally effective for students with various levels of initial performance. The characteristics of the identified criterion groups were then compared to see which characteristics were related to the relative effectiveness of the groups. Their method attempted to provide an intuitively simple approach which identified both overall college effects and effects which interact with student ability. Specifically, four steps were carried out: 1) all within school regression lines were computed, i.e., Graduate Record Examination (GRE) area tests were regressed on the College Entrance Examination Board Scholastic Aptitude Test (SAT) scores within school; 2) Ward's (1963) hierarchical clustering technique was applied to group schools on the basis of the similarity of their regression lines; 3) multiple group discriminant functions using the estimates of the regression parameters as the group discriminators were computed to test whether the newly formed groups differed with respect to their pooled regression lines; and 4) discriminant functions using college descriptive variables as the group discriminators were then computed. This method thus identified criterion clusters of colleges that differed in effectiveness by clustering on the slope, the mean SAT scores of the students, and the intercept. Therefore, one can identify and group colleges that have different levels of initial ability. The simultaneous evaluation of the colleges, along with the relative slopes of their pooled within group regression lines, then indicated the college characteristics which are associated with overall as well as differential effectiveness.

Burstein (1976) discussed two examples of multi-level analyses found in studies by Rock, Baird and Linn (1972), about the interaction of student aptitude and college characteristics, and by Keesling and Wiley (1974), in which they reanalyzed a subset of the Coleman data. He stated that each method has certain merits and certain drawbacks. The Keesling and Wiley approach provided effect parameters more nearly mirroring the structural form of school effects than the Rock, Baird and Linn approach or the usual single level analysis models. Burstein's concern was that the Keesling and Wiley approach fails to adequately reflect the effects of between class differences in slopes. Moreover, treating the resulting clusters as groups in a discriminant analysis, as Rock, Baird, and Linn did, discarded any metric differences existing among the clusters and thereby eliminated the possibility of describing school effects in structural terms. The use of discriminant groups results in some loss in generalizability of findings that should be avoided. In the same paper Burstein also criticized Cronbach's approach, which recommended analyzing between class and within class separately when intact classrooms are sampled.
Burstein said that the between class and within class analyses did not remove the need for concern about homogeneity of regression. Burstein proposed an alternative multilevel analysis strategy that consisted of two stages, as follows:

1. perform within class regressions (not pooled) of outcomes on input, and
2. use the parameters (α, β) from the within class regressions as "outcomes" in a between class analysis.

Burstein claimed that his strategy combined certain features of the approaches by Keesling and Wiley and by Rock, Baird and Linn. The technique of using the within class parameter estimates as outcomes should lead to more sensitive interpretation of effects and clearer policy implications of the findings.

Burstein and Miller (1979) stated that, because of its hierarchical organization, the effects of schooling on individual pupil performance can exist both between and within the levels of the educational system. Moreover, analyses at different levels address different questions, and thus analyses conducted at a single level are inherently inadequate. While analyses of the relationships between "treatment" dimensions and the mean outcomes of groups often provide useful information, important differences in within group processes may be obscured. These within group processes may arise due to group composition (e.g., ability level and mixture affecting participation patterns), differential allocation of instructional resources among the members of the group (e.g., the grouping and pacing features of reading instruction), or differential reactions of group members to the same instructional treatment (aptitude-treatment interactions). If important group-to-group differences in within group processes exist, then the use of group means as the only indicator of group outcomes will result in misleading estimates of group (teacher, class, treatment) effects.

Burstein and Miller's interest in alternative measures of group outcomes has concentrated on the properties of the within-group slopes from the regression of outcome on input. They have argued that within group slopes are group level indicators of within group processes. Their reason for considering slopes as outcomes was that there may be instructional effects on the within group regression of outcomes on input; whether or not instructional effects were present, the analysis should attempt to isolate instructional process and practice variables that were associated with slope variation. If such variables can be found and alternative explanations can be ruled out, then variation in slopes becomes an important source of information for researchers and policy makers, especially when considered along with effects on other group level outcomes.

Keesling (1976) presented a model for analysis at two levels of aggregation (e.g., pupil and school). The multivariate random effects model for this situation is:

$W_{ij} = \mu + a_i + e_{ij}, \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n$

where all vectors are p x 1. This implies $\Sigma_W = \Sigma_a + \Sigma$, assuming that there are k groups of n units, each unit having measures on p variables. The above model, adopted from Schmidt (1969), was comprehensive in that it permitted the estimation of effects and their standard errors at both levels of aggregation simultaneously. Keesling, however, did not analyze the data by using Schmidt's procedure.

Two sets of data were presented. One set dealt with data constructed to a particular specification. The second set dealt with real data of a two-level nature.
He analyzed the data under three models. The first model used pupil post-test score as the dependent variable, ignoring the group structure in the data, and pretest, SES, average pretest, average SES and hours per month of principal absence as predictors. The second model used school mean post-test as the dependent variable, and average pretest, average SES and hours per month of principal absence as predictors. The third model used pupil post-test score within school as the dependent variable, with pretest and SES as predictors. The results suggested that in order to obtain both the correct parameter estimates and the correct standard errors, it is necessary to perform at least two analyses. The first model gave the correct parameter estimates, but it did not partition the residual sum of squares by level of effect. The second model gave the aggregate level standard errors, but the parameter estimates were the sum of the between and within effects. The third model obtained the appropriate estimates and standard errors for the within school effects. The second and third models may be combined to produce correct estimates of the between school effects by subtracting the within school estimates from the between school estimates.

CHAPTER III

ALTERNATIVE APPROACHES FOR ANALYZING HIERARCHICAL DATA

Of the different alternatives proposed to analyze hierarchical data, four were selected for comparison in each situation of the present study. The two stage analysis approach recommended by Keesling and Wiley, the group level analysis approach recommended by Cronbach and Webb, the full model and subtraction analysis approaches recommended by Keesling, and the Bock application analysis approach will be discussed in this chapter.

Consider the following general situation where person j is a member of group i. The person has a set of scores, $X_{ij}$ and $Y_{ij}$. Also available is a set of explanatory variables defined only at the group level, denoted $Z_i$. The relationship of $X_{ij}$ and $Z_i$ to $Y_{ij}$ can be decomposed into between group and within group components as given in equation (3-1):

(3-1)  $Y_{ij} = \mu_y + \beta_a'(\mu_{x_i} - \mu_x) + \beta_z'(Z_i - \mu_z) + \delta_i + \beta'(X_{ij} - \mu_{x_i}) + (\beta_i - \beta)'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

where $\mu_y$, $\mu_z$ and $\mu_x$ represent the population means, $\mu_{x_i}$ represents the ith group population mean, $\beta_a$ denotes the between-group regression coefficients for the individual level variables, $\beta_z$ represents the regression coefficients defined for the group level variables, $\beta$ represents the pooled within-group regression coefficients for the individual level variables, and $\beta_i$ represents the specific within-group regression coefficients for group i for the individual level variables. The $\delta_i$ and $\epsilon_{ij}$ represent the error at the group level and at the individual level, respectively.

This study will deal with the case where all within group slopes are equal, resulting in $(\beta_i - \beta)$ being equal to zero. The model for the first simulated case is:

(3-2)  $Y_{ij} = \mu_y + \beta_a'(\mu_{x_i} - \mu_x) + \beta_z'(Z_i - \mu_z) + \delta_i + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

Let $a_{x_i} = \mu_{x_i} - \mu_x$, $a_{z_i} = Z_i - \mu_z$, and $a_{x_{ij}} = X_{ij} - \mu_{x_i}$. Equation (3-2) can then be rewritten as equation (3-3):

(3-3)  $Y_{ij} = \mu_y + \beta_a' a_{x_i} + \beta_z' a_{z_i} + \delta_i + \beta' a_{x_{ij}} + \epsilon_{ij}$

This implies that the variance of Y is:

(3-4)  $\operatorname{Var}(Y) = \beta_a' \Sigma_a^{(x)} \beta_a + \beta_z' \Sigma_a^{(z)} \beta_z + \sigma_\delta^2 + \beta' \Sigma^{(x)} \beta + \sigma^2$

where $\Sigma_a^{(x)}$ is the between level variance-covariance matrix of X, $\Sigma_a^{(z)}$ is the between level variance-covariance matrix of Z, $\Sigma^{(x)}$ is the within level covariance matrix of X, $\sigma_\delta^2$ is the error variance defined at the group level, and $\sigma^2$ is the error variance defined at the individual level.
When there are only individual level explanatory variables, the model is:

(3-5)  $Y_{ij} = \mu_y + \beta_a'(\mu_{x_i} - \mu_x) + \delta_i + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

and the variance of Y is:

(3-6)  $\operatorname{Var}(Y) = \beta_a' \Sigma_a^{(x)} \beta_a + \sigma_\delta^2 + \beta' \Sigma^{(x)} \beta + \sigma^2.$

The five alternative analysis approaches (two-stage analysis, group level analysis, full model analysis, subtraction analysis, and Bock application analysis) that are investigated in this dissertation can be related to the models given in equations (3-2) and (3-5) for the first and second situation, respectively. In the following pages, the procedure for each alternative approach is discussed.

Two Stage Analysis Approach

The two stage analysis approach was recommended by Keesling and Wiley (1974). Wiley mentions that one of the problems in the analysis of multi-level data has been separation of the effects of the aggregated variables into parts reflecting their individual level effects on one hand, and their effects via school climate and organization on the other. One way to describe an appropriate method of analysis of hierarchical data is in terms of the general notions of statistical confounding and control. If we wish to assess the impact of one explanatory variable that is correlated with another one, then if we ignore the second, we will attribute to the first not only its own effect, but also a spurious effect which is due to the correlation between it and the second, and the effect of the second. If we utilize an appropriate method of analysis which takes into account the second variable, i.e., its effects and its relationship to the first, we may obtain an adjusted assessment of the effect of the first variable which is not confounded by the second.

Keesling and Wiley set out to define a model for disentangling the effects of variables defined solely at the school level from those defined at the level of the pupil. The process of disentanglement involves two stages. The first stage adjusts the effects of individual background characteristics on outcome for the effects of the schools in which the individuals receive instruction. The second stage uses the adjusted effects of individual level variables, aggregated over pupils within schools, to determine the adjusted effects of school level variables. In practice, they carried out the following:

1. Determine the pooled within-school slopes under equation (3-7):

(3-7)  $Y_{ij} = \mu_{y_i} + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}, \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n$

where $Y_{ij}$ is the outcome of the jth subject in the ith school, $\mu_{y_i}$ is the population mean of the ith school, $X_{ij}$ is the vector of explanatory variables of the jth subject in the ith school, and $\beta$ is the vector of pooled within-school slopes. An analysis using the school mean deviated values of both the explanatory and criterion variables will effectively "control" or adjust for all sources of variation among schools. The covariance matrix of the deviated values is called the pooled within school covariance matrix. If this covariance matrix is computed for all individually defined variables and used as the basis for the regression of the outcome on the set of explanatory variables, the resulting estimates of $\beta$ will not be biased by specification errors at the school level.

2. Find the mean predicted outcome for each school:

(3-8)  $\hat{\mu}_{y_i} = \hat{\mu}_y + \hat{\beta}'(\hat{\mu}_{x_i} - \hat{\mu}_x)$

where $\hat{\mu}_{y_i}$ is the mean predicted outcome for the ith school.

3. Fit a model at the school level regressing the observed school mean outcome on the school level explanatory variables and the predicted school mean outcomes:

(3-9)  $\mu_{y_i} = \mu_y + \beta_z'(Z_i - \mu_z) + \lambda \hat{\mu}_{y_i} + \delta_i$

where $\beta_z$ is the vector of adjusted effects of the school level variables, $Z_i$ is the vector of the school level variables, $\delta_i$ is the error defined at the group level, and $\lambda$ is the coefficient allowing for partial removal of some of the additional bias due to the omission of relevant school level variables (to the extent that the sum of these average effect values is correlated with the sum of the average individual level effect values represented in $\hat{\mu}_{y_i}$). If all relevant school level variables are included, then $\lambda$ will be equal to one.
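The two stages translate directly into matrix computations following equations (3-7) through (3-9). The sketch below is only an illustration of that computation, not the dissertation's own programs (which were written in Fortran for the CDC 6500, Appendix A); the array names y, X, Z, and g, and the use of numpy least squares, are assumptions made for the sketch.

```python
import numpy as np

def two_stage(y, X, Z, g):
    """Two stage analysis sketch following (3-7)-(3-9).

    y : (N,) outcomes; X : (N, p) individual level predictors;
    Z : (k, q) group level predictors (one row per group);
    g : (N,) integer group labels 0..k-1.
    """
    k = Z.shape[0]
    Xbar = np.vstack([X[g == i].mean(axis=0) for i in range(k)])
    ybar = np.array([y[g == i].mean() for i in range(k)])
    # Stage 1: pooled within-group slopes from school-mean-deviated values (3-7).
    beta_w, *_ = np.linalg.lstsq(X - Xbar[g], y - ybar[g], rcond=None)
    # Stage 2: predicted group mean outcomes (3-8) ...
    mu_hat = y.mean() + (Xbar - X.mean(axis=0)) @ beta_w
    # ... regressed, together with Z, on the observed group means (3-9).
    D = np.column_stack([np.ones(k), Z - Z.mean(axis=0), mu_hat])
    coef, *_ = np.linalg.lstsq(D, ybar, rcond=None)
    beta_z, lam = coef[1:-1], coef[-1]   # lam should be near one if Z is complete
    return beta_w, beta_z, lam
```

The coefficient lam plays the role of λ in (3-9): values near one suggest that the school level variables in Z account for the school level component of the aggregated individual effects.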
Group Level Analysis Approach

Cronbach (1976) mentions that in the situation where pupil j is a member of group i, $\beta_t$, the overall between-student coefficient from the regression of $Y_{ij}$ on $X_{ij}$,

(3-10)  $Y_{ij} = \mu_y + \beta_t(X_{ij} - \mu_x) + \epsilon_{ij}$

has been shown by Duncan, Cuzzort, and Duncan (1961) to be a composite of $\beta_a$, the between group regression coefficient, and $\beta$, the pooled within-group coefficient:

(3-11)  $\beta_t = \eta_x^2 \beta_a + (1 - \eta_x^2)\beta$

where $\eta_x^2$ is the correlation ratio of X,

(3-12)  $\eta_x^2 = 1 - \frac{\sum_i \sum_j (X_{ij} - \mu_{x_i})^2}{\sum_i \sum_j (X_{ij} - \mu_x)^2}.$

Cronbach indicated that analyses at the group level and the individual level give conflicting descriptive results because they speak to different substantive questions. The investigator who wants to know the relationship between two variables is not asking a clear question until he tells whether the group or the individual level relationship is the one of interest. He recommended that between group effects and individual within group effects should be examined separately. He proposed the following:

1. Between groups:

(3-13)  $\mu_{y_i} = \mu_y + \beta_z'(Z_i - \mu_z) + \beta_a'(\mu_{x_i} - \mu_x) + \delta_i$

where $\beta_z$ is the effect of school level variables on mean outcomes, and $\beta_a$ is the between groups effect that reflects any consistent tendency of higher-X groups to do better or worse than others on the outcome measure.

2. Pooled within groups:

(3-14)  $Y_{ij} = \mu_{y_i} + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

where $\beta$ is the common within-group effect that reflects the tendency for students above the group average to outperform or underperform the rest of the group.

Subtraction Analysis Approach

In the situation where subject j is nested within group i, Keesling (1977) analyzed constructed data to show how well ordinary least squares estimators can retrieve the information. He analyzed the data under two models, as follows:

1. The group level model uses the group mean outcome as the dependent variable:

(3-15)  $\mu_{y_i} = \mu_y + \beta_z'(Z_i - \mu_z) + \beta_a^{*\prime}(\mu_{x_i} - \mu_x) + \delta_i$

Keesling claimed that this model gives the aggregated level standard errors, but the parameter estimates are the sum of the between and within effects.

2. The within group model uses the individual level outcome variable within groups as the dependent variable. According to Keesling, this model obtains the appropriate estimates and standard errors for the within group effects:

(3-16)  $Y_{ij} = \mu_{y_i} + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

Keesling concluded that to obtain the correct estimates of the between school effects, at least these two models need to be fitted, and the within group estimates then subtracted from the between group estimates. That is,

(3-17)  $\hat{\beta}_a = \hat{\beta}_a^* - \hat{\beta}$

where $\hat{\beta}_a$ is the correct between group effect, $\hat{\beta}_a^*$ is the estimate of the between groups effect using the group level data, and $\hat{\beta}$ is the within group effect.
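Both the decomposition in (3-11) and the subtraction identity in (3-17) are algebraic properties of least squares fits and can be checked numerically. The sketch below is illustrative only; the toy data, group sizes, and use of numpy are assumptions, not part of the dissertation's simulation design (for simplicity it uses a single predictor, the case in which (3-11) is stated).

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 30                                   # groups and subjects per group
g = np.repeat(np.arange(k), n)
x = rng.normal(size=k)[g] + rng.normal(size=k * n)          # grouped predictor
y = 2.0 * rng.normal(size=k)[g] + 0.5 * x + rng.normal(size=k * n)

xbar = np.array([x[g == i].mean() for i in range(k)])[g]    # group means, repeated
ybar = np.array([y[g == i].mean() for i in range(k)])[g]

def slope(u, v):                                # least squares slope of v on u
    u, v = u - u.mean(), v - v.mean()
    return (u @ v) / (u @ u)

b_t = slope(x, y)                               # total (between-student) slope
b_star = slope(xbar, ybar)                      # between-group slope (group level data)
b_w = slope(x - xbar, y - ybar)                 # pooled within-group slope
eta2 = 1 - ((x - xbar) ** 2).sum() / ((x - x.mean()) ** 2).sum()   # (3-12)

print(np.isclose(b_t, eta2 * b_star + (1 - eta2) * b_w))    # decomposition (3-11)
print(b_star - b_w)                             # subtraction estimate of the between effect (3-17)
```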
The Full Model Analysis Approach

The full model is the model that uses the individual level outcome variable as the dependent variable. The explanatory variables are: 1) the variables defined at the individual level but which can also be aggregated, 2) the means of the variables defined at the individual level, and 3) the variables defined at the group level only. The model is shown in equation (3-18):

(3-18)  $Y_{ij} = \mu_y + \beta_z'(Z_i - \mu_z) + \beta_a'(\mu_{x_i} - \mu_x) + \beta'(X_{ij} - \mu_x) + \epsilon_{ij}$

where $\mu_y$, $\mu_z$ and $\mu_x$ are the population means, $\mu_{x_i}$ is the ith group population mean, $\beta_a$ represents the between-group regression coefficients for the aggregated individual level variables, $\beta_z$ represents the regression coefficients for the group level variables, $\beta$ represents the pooled within-group regression coefficients for the individual level variables, and $\epsilon_{ij}$ represents the error defined at the individual level. Keesling (1977) at one time analyzed hierarchical data under the full model. He mentioned that this model gave the correct parameter estimates, but it did not partition the residual sum of squares by the level of effect.

Bock Application Analysis Approach

In the situation where students are nested within schools and the school is a random variable, the model is the random effects model. In this dissertation there is one dependent measure and two antecedent measures for each subject; the random effects model is:

$W_{ij} = \mu + a_i + e_{ij}, \qquad i = 1, 2, \ldots, k,$

where all vectors are 3x1 in this application, $W_{ij}$ is the response vector representing the dependent and antecedent measures, $\mu$ is the vector of the population means on each measure, and $a_i$ and $e_{ij}$ are random vectors assumed to be multivariate normally and independently distributed with zero mean vectors and covariance matrices $\Sigma_a$ and $\Sigma$, respectively. The above model implies $\Sigma_W = \Sigma_a + \Sigma$, where $\Sigma_W$ is the total variance-covariance matrix, $\Sigma_a$ is the between school variance-covariance matrix, and $\Sigma$ is the within school variance-covariance matrix. The Bock application approach is used to provide an estimate of $\Sigma_a$ which is at least a positive semi-definite variance-covariance matrix, and then from this matrix to estimate the group level regression coefficients. Bock's method is presented in the context of twin studies and is used to estimate the component of heritable variation. A more detailed description of this approach can be found in Bock (1968).

Under the random effects model, the expected value of the mean square matrix between schools is $\Sigma + n\Sigma_a$, and the expected value of the mean square matrix within schools is $\Sigma$. Let

$S_a = \Sigma + n\Sigma_a, \qquad S = \Sigma.$

Then for a symmetric positive definite matrix S and a symmetric positive definite matrix $S_a$, it is possible to find a nonsingular transformation T such that

(3-19)  $T'S_aT = \Phi$

(3-20)  $T'ST = I$

where $\Phi$ is diagonal with positive diagonal elements, and I is an identity matrix. The columns of T are the solutions of a system of homogeneous equations of the form

$(S_a - \phi_l S)\,t_l = 0, \qquad l = 1, 2, 3,$

and $\phi_l$ is a root of $|S_a - \phi S| = 0$.

In practice, the estimate of S is the mean square matrix within schools, obtained from equation (3-21):

(3-21)  $S = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (W_{ij} - \bar{W}_i)(W_{ij} - \bar{W}_i)'$

where $W_{ij}$ is the individual response vector, $\bar{W}_i$ is the group mean vector, k is the number of groups, and n is the number of subjects in each group. Here

$W_{ij} = \begin{pmatrix} Y_{ij} \\ X_{ij} \end{pmatrix}.$

The estimate of $S_a$ is the mean square matrix between schools, obtained from equation (3-22):

(3-22)  $S_a = \frac{1}{k-1} \sum_{i=1}^{k} n\,(\bar{W}_i - \bar{W})(\bar{W}_i - \bar{W})'$
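The expectations quoted above for the two mean square matrices follow from a standard argument under the random effects model; a brief sketch, filling in the step for the balanced case considered here, is:

$$
\bar{W}_i = \mu + a_i + \bar{e}_i, \qquad \operatorname{Var}(\bar{W}_i) = \Sigma_a + \tfrac{1}{n}\Sigma, \qquad
\operatorname{Var}(\bar{W}_i - \bar{W}) = \Big(1 - \tfrac{1}{k}\Big)\Big(\Sigma_a + \tfrac{1}{n}\Sigma\Big),
$$
$$
E(S_a) = \frac{n}{k-1}\sum_{i=1}^{k} E\big[(\bar{W}_i - \bar{W})(\bar{W}_i - \bar{W})'\big]
       = \frac{n}{k-1}\,k\Big(1 - \tfrac{1}{k}\Big)\Big(\Sigma_a + \tfrac{1}{n}\Sigma\Big)
       = \Sigma + n\Sigma_a .
$$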
CHAPTER IV

SIMULATION PROCEDURE

Description of Population Parameters

[Tables 4-1 (3x2 Design of Populations Defining the Structure of Σ + Σ_a), 4-2 (Population Compositions of Σ and Σ_a), 4-6 (Population Covariance Matrices Under the First Situation) and 4-7 (Population Covariance Matrices Under the Second Situation) were printed in landscape orientation and are not legible in the scan. Tables 4-3 (Parameter Values for the First Situation), 4-4 (Parameter Values for the Second Situation) and 4-8 (Parameter Values of the Second Set of Data) list the population means (μ), regression coefficients (β, β_a, β_z) and error variances (σ², σ_δ²) for cases I-A through II-C; their column layout is not reliably recoverable from the scan.]

Table 4-5. Population Covariance Matrices of the Predictor Variables

                      Situation I                 Situation II
  Σ^(x)        0.0912   0.1901             0.0912   0.1901
               0.1901   2.4775             0.1901   2.4775
  Σ_a^(x)      0.1729   0.1400            14.7149   2.9871
               0.1400   0.3746             2.9871  25.8970
  Σ_a^(z)      0.0072   0.0007             Not applicable
               0.0007   0.0009
  Σ_a^(xz)     0.0159   0.0065             Not applicable
               0.0260   0.0088

The parameter values for the second set of data are given in Table 4-8, and Σ and Σ + Σ_a for the new set of data are given in Tables 4-9 and 4-10.

Table 4-9. Population Covariance Matrices of the Predictor Variables of the Second Set of Data

                      Population I-C              Population II-C
  Σ^(x)       36.3347   4.7243            81.0000  18.0000
               4.7423  61.4263            18.0000 100.0000
  Σ_a^(x)     14.7149   2.9871            35.0000  11.6383
               2.9871  25.8970            11.6383  43.0000
  Σ_a^(z)      0.0072   0.0007             Not applicable
               0.0007   0.0009
  Σ_a^(xz)     0.0159   0.0065             Not applicable
               0.0260   0.0088

[Table 4-10, Population Covariance Matrices of the Second Set of Data, was printed in landscape orientation and is not legible in the scan.]

Ten samples of 1,500 subjects were generated for population I-C and twenty-five samples of 1,500 subjects were generated for population II-C. Populations I-C and II-C were chosen to have additional data generated, in addition to the first set, because these two cases are the most realistic.
Description of the Generation Routine

The present study requires that data be generated from a multivariate normal distribution with mean μ and covariance matrix Σ + Σ_a, where the within covariance matrix (Σ) and the between covariance matrix (Σ_a) are specified as in Table 4-2. The generation procedure is composed of five steps:

1. Specify the values for the parameters so that they approximate actual data. The Keesling and Wiley (1974) study, which analyzed real hierarchical data, was used as a guide. This provided values for the pooled within-group regression coefficients (β), the between-group regression coefficients for the individual level variables (β_a), the regression coefficients for the group level variables (β_z), the population means (μ), the error variance defined at the individual level (σ²), and the error variance defined at the group level (σ_δ²), as shown in Tables 4-3 and 4-4 for the first and second situation respectively. The population covariance matrices of the predictors were also specified based on the Keesling and Wiley study, as shown in Table 4-5. The number of schools (k) and the number of subjects in each school (n) were specified a priori.

2. Compute the within and between covariance matrices (Σ and Σ_a) between the outcome measure and the predictors, as specified in Table 4-2.

3. Generate a random sample of k vectors a_i, where a_i is multivariate normally distributed with mean vector 0 and covariance matrix Σ_a. A random sample of k vectors a_i is generated with the following procedure.

a. Generate 12 independent random variables which are uniformly distributed between zero and one. Software for the CDC 6500 has been developed which generates independent values of a random variable uniformly distributed over the range (0, 1); the values zero and one are excluded. This function, called Ranf, is described in the Fortran reference manual, version four (1978).

b. Convert the values from the uniform distribution to values from the normal distribution by Teichroew's method, which approximates the inverse of the probability function for the standard normal distribution. Teichroew used a polynomial approximation to evaluate the inverse function. His procedure generates 12 independent random variables, $U_1, U_2, \ldots, U_{12}$, uniformly distributed between zero and one. Then R is defined as (Knuth, 1968):

$R = (U_1 + U_2 + \cdots + U_{12} - 6)/4$

The normal deviate z is then approximated by:

$z = ((((a_9 R^2 + a_7)R^2 + a_5)R^2 + a_3)R^2 + a_1)R$

where $a_1 = 3.949846138$, $a_3 = 0.252408784$, $a_5 = 0.076542912$, $a_7 = 0.008355968$, and $a_9 = 0.029899776$.
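A minimal sketch of this generator in a modern language is given below. It is not the MYDATA program itself (that program is the Fortran routine of Appendix A); the function name and the use of Python's random module are assumptions made for illustration.

```python
import random

# Polynomial coefficients of Teichroew's approximation, as listed above.
A1, A3, A5, A7, A9 = 3.949846138, 0.252408784, 0.076542912, 0.008355968, 0.029899776

def teichroew_normal():
    """Approximate standard normal deviate from 12 uniform (0, 1) draws."""
    r = (sum(random.random() for _ in range(12)) - 6.0) / 4.0
    r2 = r * r
    return ((((A9 * r2 + A7) * r2 + A5) * r2 + A3) * r2 + A1) * r

# Example: a 3x1 vector of approximate standard normal deviates,
# as needed for one vector z in the second situation.
z = [teichroew_normal() for _ in range(3)]
```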
For the first situation, each observation needed in this study consists of 5 measures: the outcome variable (Y), two predictors defined at the individual level (X), and two predictors defined only at the group level (Z). For the second situation, each observation consists of 3 measures: the outcome variable (Y) and two predictors defined at the individual level (X). Therefore, the procedure from a to b is repeated to obtain a 5x1 vector z for the first situation and a 3x1 vector z for the second situation, which is normally distributed with a mean vector of zero and an identity matrix as the covariance matrix.

c. Transform z to a, where a is normally distributed (0, Σ_a). The transformation is a = Tz, where T is the Cholesky factor of Σ_a. The Cholesky factor is a lower triangular matrix such that TT' = Σ_a. This is used because the covariance matrix of the transformed variables a is Var(a) = T Var(z) T'. In this case, Var(z) is the identity matrix; thus Var(a) = TT' = Σ_a, which gives the desired result (Morrison, 1976). After the transformation, a is multivariate normally distributed with mean vector 0 and covariance matrix Σ_a.

4. Generate a random sample of kn vectors e_ij, where e_ij is multivariate normally distributed with mean vector 0 and covariance matrix Σ. A random sample of kn vectors e_ij is generated with the same procedure as used in the generation of the vectors a_i, except that here we generate kn vectors, and the covariance matrix is Σ instead of Σ_a.

5. Add the k values of a_i and the kn values of e_ij to μ according to formula (4-1), resulting in kn values of W_ij. The values of a_i are constant for the ith group, i.e.,

(4-1)  $W_{ij} = \mu + a_i + e_{ij}$

where $W_{ij} = (Y_{ij}, X_{ij}', Z_i')'$ for the first situation and $W_{ij} = (Y_{ij}, X_{ij}')'$ for the second situation.

The program MYDATA (see Appendix A) was written for this study to generate a random sample of kn vectors W_ij, where W_ij is multivariate normally distributed with mean vector μ and covariance matrix Σ + Σ_a, using the procedure described above. For each sample the pooled within and between mean square matrices (S and S_a) are computed as shown in formulas (4-2) and (4-3) respectively:

(4-2)  $S = \frac{1}{k(n-1)} \sum_{i=1}^{k} \sum_{j=1}^{n} (W_{ij} - \bar{W}_i)(W_{ij} - \bar{W}_i)'$

where the expected value of S is the population within covariance matrix, E(S) = Σ, and

(4-3)  $S_a = \frac{1}{k-1} \sum_{i=1}^{k} n\,(\bar{W}_i - \bar{W})(\bar{W}_i - \bar{W})'$

where the expected value of $S_a$ is the following:

$E(S_a) = \Sigma + n\Sigma_a$

Here, Σ_a is the population between levels covariance matrix. The general structure of S is

$S = \begin{pmatrix} S_y & S_{xy}' \\ S_{xy} & S_x \end{pmatrix}$

where $S_y$ is the pooled within variance of Y, $S_{xy}$ is the pooled within covariance matrix between X and Y, and $S_x$ is the pooled within covariance matrix of X. To compute an estimate of the pooled within-group regression coefficient (β) for any approach, formula (4-4) is used:

(4-4)  $\hat{\beta} = S_x^{-1} S_{xy}$
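The generation and summary steps above translate almost directly into matrix code. The sketch below generates one sample by the Cholesky method of step (c) and formula (4-1), then forms S, S_a, and β̂ as in (4-2) through (4-4). It is only an illustration of the procedure: the particular μ, Σ, and Σ_a shown are placeholder values, not the population values of Tables 4-3 through 4-10, and numpy stands in for the CDC 6500 routines.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, p = 60, 25, 3                       # schools, pupils per school, measures (Y, X1, X2)
mu = np.array([25.0, 12.0, 20.0])         # placeholder means
Sigma = np.array([[9.0, 2.0, 1.5],        # placeholder within covariance matrix
                  [2.0, 4.0, 0.8],
                  [1.5, 0.8, 5.0]])
Sigma_a = np.array([[4.0, 1.0, 0.5],      # placeholder between covariance matrix
                    [1.0, 2.0, 0.3],
                    [0.5, 0.3, 1.0]])

Ta, Te = np.linalg.cholesky(Sigma_a), np.linalg.cholesky(Sigma)
a = rng.standard_normal((k, p)) @ Ta.T            # step 3: a_i ~ N(0, Sigma_a)
e = rng.standard_normal((k, n, p)) @ Te.T         # step 4: e_ij ~ N(0, Sigma)
W = mu + a[:, None, :] + e                        # step 5, formula (4-1)

Wbar_i = W.mean(axis=1)                           # group mean vectors
Wbar = W.reshape(-1, p).mean(axis=0)              # grand mean vector
dev_w = W - Wbar_i[:, None, :]
S = np.einsum('ijp,ijq->pq', dev_w, dev_w) / (k * (n - 1))   # (4-2)
dev_b = Wbar_i - Wbar
S_a = n * dev_b.T @ dev_b / (k - 1)                          # (4-3)

# Pooled within-group regression of Y (first measure) on X (remaining measures), (4-4).
beta_w = np.linalg.solve(S[1:, 1:], S[1:, 0])
```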
For the first situation, where there are both individual level predictors and group level predictors, four approaches were investigated: two stage analysis, group level analysis, full model analysis, and the subtraction approach. The main concern is to estimate the regression coefficients for the group level variables (β_z) with these four approaches. For the second situation, where there are only individual level predictors, four approaches were investigated: group level analysis, Bock application, subtraction analysis and full model analysis. The main purpose of each approach is to estimate the between group regression coefficients for the individual level variables (β_a). The procedure for each analysis approach is described in the following sections.

Two Stage Analysis Approach

The procedure to estimate β_z using the two stage analysis approach is the following:

1. Compute an estimate of the pooled within-group regression coefficient (β̂) using formula (4-4).

2. Compute an estimate of the group mean ($\hat{\mu}_{y_i}$) using formula (4-5):

(4-5)  $\hat{\mu}_{y_i} = \hat{\mu}_y + \hat{\beta}'(\hat{\mu}_{x_i} - \hat{\mu}_x)$

3. Compute β_z using equation (4-6), implemented with the Finn multivariance program (1972):

(4-6)  $\mu_{y_i} = \mu_y + \beta_z'(Z_i - \mu_z) + \lambda\hat{\mu}_{y_i} + \delta_i$

Group Level Analysis Approach

Under the group level analysis approach, β_z for the first situation and β_a for the second situation are estimated separately from β. The Finn multivariance program (1972) is used to estimate β_z under equation (4-7) and β_a under equation (4-8):

(4-7)  $\mu_{y_i} = \mu_y + \beta_z'(Z_i - \mu_z) + \beta_a'(\mu_{x_i} - \mu_x) + \delta_i$

(4-8)  $\mu_{y_i} = \mu_y + \beta_a'(\mu_{x_i} - \mu_x) + \delta_i$

Subtraction Analysis Approach

For the first situation, the Z variables are defined only at the group level. The procedure for estimating β_z by the subtraction approach is the same as for the group level analysis approach: the Finn multivariance program is used to estimate β_z under equation (4-7). To obtain the correct estimates of β_a in the second situation, Keesling recommends performing three steps, as follows.

1. Compute estimates of the pooled within-group regression coefficient (β) using formula (4-4). This step computes β under the model of (4-9):

(4-9)  $Y_{ij} = \mu_{y_i} + \beta'(X_{ij} - \mu_{x_i}) + \epsilon_{ij}$

2. Compute the estimates of the between-group regression coefficient ($\beta_a^*$) with equation (4-10), using the Finn multivariance program:

(4-10)  $\mu_{y_i} = \mu_y + \beta_a^{*\prime}(\mu_{x_i} - \mu_x) + \delta_i$

3. Compute the correct estimates of the between-group regression coefficients for the individual level variables (β_a) using formula (4-11):

(4-11)  $\hat{\beta}_a = \hat{\beta}_a^* - \hat{\beta}$
Bock Application Analysis Approach

Bock's analysis approach provides an estimate of the between covariance matrix (Σ_a) that is guaranteed to be at least a positive semi-definite covariance matrix, and then estimates the regression coefficients from this matrix. The steps of this approach are as follows:

1. Use the Finn multivariance program to determine the discriminant function coefficients t_l and the canonical variances φ_l (l = 1, 2, 3).

2. Compute the positive semi-definite between covariance matrix (Σ̂_a) using formula (4-14):

(4-14)    \hat{\Sigma}_a = [(T^{-1})'(\Phi - I)T^{-1}]/n

where the elements in the columns of T are the discriminant function coefficients t_l, and the diagonal elements of the diagonal matrix Φ are the significant canonical variances φ_l (l = 1, 2, . . . , s) and p - s unities (p is the dimension of T and s is less than or equal to p). When all canonical variances φ_l are significant (s = p),

\hat{\Sigma}_a = [S_a - S]/n

where S and S_a are the within and between mean square matrices computed by formulas (4-2) and (4-3) respectively.

3. Use Σ̂_a to estimate β_a by formula (4-15):

(4-15)    \hat{\beta}_a = \hat{\Sigma}_a(X)^{-1} \hat{\Sigma}_a(XY)

The general structure of β̂_a parallels that of β̂ in formula (4-4), with the between-group blocks Σ̂_a(X) and Σ̂_a(XY) of Σ̂_a taking the places of S_X and S_XY.

The estimates of the regression coefficients defined for the group level variables can also be written directly in terms of the between-group sum of squares and cross products matrices:

(6-1)    \hat{\beta}_z = B_z^{-1} B_{zy}

(6-2)    \hat{\beta}_z = B_z^{-1}(B_{zy} - B_{zx}\hat{\beta})

where B_z, B_zy, and B_zx are the between-group sum of squares and cross products matrices of the Z variables, of Z and Y, and of Z and X.

The simulation results showed that for all three cases the group level analysis approach, the full model approach, and the subtraction approach gave the same estimates of β_z and were consistent with the theoretical results suggested above.

In the case where there were no group level effects, the two stage analysis approach gave better estimates of β_z than those derived from the other three approaches. Where the between-group regression coefficients were equal to the pooled within-group regression coefficients, all four approaches gave essentially the same estimates of β_z, all with comparable bias ratios. In the case where the between-group regression coefficients were neither equal to the pooled within-group regression coefficients nor equal to zero, the simulation results differed depending upon the value of the intraclass correlation coefficient. Where the intraclass correlation was high, the two stage analysis did not give as good estimates of β_z as the other three approaches. However, when the intraclass correlations were more moderate in value (around 0.30), all four approaches gave the same estimates of β_z, although the two stage approach yielded better bias ratios, indicating less bias relative to mean square error.

When the situation was such that there were only individual level predictors which could be aggregated to the group level, the group level analysis, Bock application, full model, and subtraction analysis approaches were used to analyze the data. Theoretically, these four approaches can be grouped into three sets: first, the group level analysis approach by itself; second, the Bock application analysis approach by itself; and third, the full model analysis approach together with the subtraction analysis approach. In theory, the estimates of the between-group regression coefficients (β_a) given by the full model analysis approach are equal to the difference between the between-group regression coefficients estimated from the between-group sum of squares and cross products matrix and the pooled within-group regression coefficients. Therefore, the estimates of β_a obtained from the full model analysis approach and the subtraction analysis approach should be the same. Analytically, the relationship among the between-group regression coefficients estimated by the Bock application, group level, and full model analysis approaches is shown in equation (6-3):

(6-3)    \hat{\beta}_a^B = \hat{\beta}_a^G + (B^{-1}A - I)^{-1}\hat{\beta}_a^F

where β̂_a^B, β̂_a^G, and β̂_a^F are the between-group regression coefficients estimated by the Bock application analysis approach, the group level analysis approach, and the full model analysis approach, respectively; B is the within-group mean square matrix divided by the number of subjects in each group; A is the between-group mean square matrix divided by the number of subjects in each group; and I is the identity matrix.
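One way to see the claim that the full model estimate equals the between-minus-within difference used by the subtraction approach (this short argument is mine, not quoted from the text) is to rewrite the predictors in (4-13) as

```latex
\beta_a'(\mu_{x_i}-\mu_x) + \beta'(X_{ij}-\mu_x)
  = (\beta_a+\beta)'(\mu_{x_i}-\mu_x) + \beta'(X_{ij}-\mu_{x_i}) .
```

The two regressor sets on the right are orthogonal: one varies only between groups, the other has mean zero within every group. Least squares therefore assigns them the between-group slope β̂*_a and the pooled within-group slope β̂, so the coefficient attached to the group means in (4-13) is β̂_a^F = β̂*_a - β̂, exactly the subtraction estimate (4-11).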
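The Bock estimator and relation (6-3) can also be checked numerically. The sketch below again uses NumPy with invented dimensions; truncating negative eigenvalues stands in for the significance test on the canonical variances, and when no truncation occurs the Bock estimate reduces to (S_a - S)/n, so (6-3) holds exactly.

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated second-situation data: W_ij = mu + a_i + e_ij, Y first, two X's.
k, n, p = 40, 20, 3
Tw = np.linalg.cholesky(np.array([[4.0, 1.0, 0.8],
                                  [1.0, 2.0, 0.3],
                                  [0.8, 0.3, 2.0]]))
Tb = np.linalg.cholesky(np.diag([1.0, 0.5, 0.5]))
W = np.stack([Tb @ rng.standard_normal(p) + rng.standard_normal((n, p)) @ Tw.T
              for _ in range(k)])

gm = W.mean(axis=1)
dev_w = W - gm[:, None, :]
S = np.einsum('ijp,ijq->pq', dev_w, dev_w) / (k * (n - 1))          # (4-2)
S_a = n * (gm - gm.mean(0)).T @ (gm - gm.mean(0)) / (k - 1)         # (4-3)

# Bock-style between covariance estimate: (S_a - S)/n with negative eigenvalues
# set to zero so the result stays positive semi-definite.
vals, vecs = np.linalg.eigh((S_a - S) / n)
sigma_a_hat = vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T

beta_B = np.linalg.solve(sigma_a_hat[1:, 1:], sigma_a_hat[1:, 0])   # (4-15)

# Relation (6-3): A and B are the between and within mean squares divided by n.
A, B = S_a / n, S / n
beta_G = np.linalg.solve(A[1:, 1:], A[1:, 0])        # group level estimate
beta_w = np.linalg.solve(B[1:, 1:], B[1:, 0])        # pooled within estimate
beta_F = beta_G - beta_w                             # full model / subtraction
rhs = beta_G + np.linalg.solve(np.linalg.solve(B[1:, 1:], A[1:, 1:]) - np.eye(2),
                               beta_F)
print(beta_B)   # Bock estimate
print(rhs)      # matches beta_B when no eigenvalue was truncated
```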
The derivation of this relationship is shown in Appendix B.

The simulation results showed that for all three cases the full model analysis approach and the subtraction analysis approach gave exactly the same estimates of the between-group regression coefficients. These estimates were also equal to the difference between the regression coefficients estimated from the between-group sum of squares and cross products matrix and the pooled within-group regression coefficients, which is consistent with the theoretical results. However, the estimates of β_a from these two approaches were not close to the parameter values, and the bias ratios for the estimates resulting from these two approaches were very high. From this we can conclude that the subtraction and full model approaches gave totally wrong estimates of β_a.

For all three cases the group level analysis approach and the Bock application analysis approach gave good estimates of β_a. The bias ratios for these two approaches were quite small when compared to the bias ratios for the other two approaches. When the between-group regression coefficients were equal to zero, the Bock application analysis approach gave better estimates of β_a than the group level analysis approach. However, when the between-group and within-group regression coefficients were equal, both approaches gave the same estimates of β_a. For the situation where the between-group regression coefficients were not equal to the pooled within-group regression coefficients, the group level analysis approach gave better estimates of β_a than the Bock application analysis approach when the intraclass correlations were high (about 0.90), but the Bock application analysis approach gave the better estimates of β_a when the intraclass correlations were low (about 0.30).

From the simulation results we can summarize which approach gave good estimates of the parameters for the different populations. This is shown in Table 6-1, which gives the quality of the estimates of the regression coefficients defined for the group level variables (β_z) and of the between-group regression coefficients (β_a) under the alternative approaches in terms of accuracy (bias ratios less than 0.15).
In the situation where there were both individual level predictors which were aggregated to the group level and predictors which were de- fined only at the group level, the two stage analysis approach gave good estimates of the regression coefficients defined for the group 79 oanmofiamam uoz manoowaaam uoz manmuwaaam uoz manmowaamm uoz 0ooo 0oo0 0ooo seven Umm 0ooo 0oo0 «coco oceau loaouuoo mmoauouucw Lwfin £003 muoomwo Ho>oH Hosp lH>HvsH ecu 0» Hence uoc 0003 muomwwo Ho>oH @5000 mGOHumHou luou mmmfiomuuC0 oumuovoe :uHB muoomwo Ho>oa Hosp 10>0pcfi ecu Ou Hence 00: 0003 muuommo Ho>oa anonw muoommo Ho>oH Honcw>fipcw 050 cu Honda 0003 muoomwo Ho>oH maouo muoowwo Ho>oH asouw oz moanmwwm> Ho>oa Hoocom can Ho>oa Hmavfi>0vcfi :uom cowumowama< xoom 0ooo noou 0oo0 0oo0 0oo0 0ooo 000 000 0oooz cowuomuuQSm Adam somouaa< mfimxamc< Ho>oq asouo owmum 030 mC00umH=mom macaumasaom ucoquMHQ How moumEHumm onu mo xufiamso filo mHan 80 .mH.o 00 H0000 00 0050 0000000 00 00000 0000 0:0 00023 0000000 0w 00m«« .mH.O CNS“ mmmH mH Oflumh mmwfl mfiu OHM—t» Vmfiflwmfi wfi UOOUK 00000 n0H00000 0000000000 swan 0H000 :0H3 0000000 H0>0H H000 0000 000 000 0000 Iwflmam IH>chfi 0:0 00 H0300 00: - 002 0003 0000000 H0>0H 0:000 mcofi00H00000 000H0000cfl 00000008 0H£00 L003 0000000 H0>0H H000 0000 000 000 0000 Iwaaam Iw>fivca 0:0 00 H0000 00: 002 0003 0000000 H0>0H 0:000 >H00 00H00H00> H0>0H H0000>H00H 0000 000 000 0000 0Hn00 0000000 H0>0H Iwammm H0000>H00H 0:0 00 H0000 0oz 0003 0000000 H0>0H 0:000 0H000 IHH000 0000 000 000 0000 002 0000000 H0>0H 00000 oz H0002 H0>0A 0wm0m :0H000waaa< x000 :0fl00000asm Hank 0:000 038 0:0H00H000m £000000< mwmzamc< A.Pucouv To magma 81 level variables: 1) in the case where there were no group level effects; 2) where the group level effects were equal to the individual level effects; and 3) where the group level effects were not equal to the individual level effects and when the intraclass correlations were low (about 0.30). The group level analysis, the full model and the sub- traction approach gave good estimates of the regression coefficients defined for the group level variables in the case where: 1) the group level effects were not equal to the individual level effects; and 2) the group level effects were not equal to the individual level effects and when the intraclass correlations were either low (about 0.30) or high (about 0.90). When the situation was such that there were only individual level predictors which could be aggregated to the group level, the group level analysis and Bock application approaches gave good estimates of the between-group regression coefficients for all cases. The full model and the subtraction approach gave bad estimates of the between-group regression coefficients for all cases. In the present study, we only dealt with the simulation of specific parameter values and specific situations. We did not investigate all types of parameter values or all types of situations. Therefore, the results of this study can be generalized only to similar situations and similar parameter values. Recommendations for Further Study The present study deals with situations where homogeneity of within—group regression coefficients is assumed; therefore, one possible extension of the present work is an investigation of the methods of 82 analyzing hierarchical data which allow for heterogeneity of the within- group regression coefficients. 
The results of this study suggest‘that the intraclass correlations have an effect on estimating the between— group regression coefficients (fig) and the between—group regression coefficients defined for the group variables (éz)' This would suggest the investigation of all analysis approaches that are used to analyze hierarchical data for different sets of data that are generated from populations which are described by intraclass correlations of dif- ferent magnitudes. The present study, although not designed to examine this issue, and upon finding the apparent relationship, was able to suggest in a preliminary way the need for examining this issue more thoroughly. Another avenue of future work is to apply the analytical pro- cedures based on the methods of analysis of covariance structures for hierarchical data devised by Schmidt (1969) and Wisenbaker and Schmidt (1979) to simulated data of the sort considered in this study. APPENDICES APPENDIX A COMPUTER PROGRAMS 0000000 000000000 10 20 35 PROGRAM MYUfiTfl(TNPUTrUUTPUTfibfiyTAPE6$UUTPUT7TfiPE59TAPE1ITOPEQITAPE +3) GENEROTION PRUGROH SUBNUUTINLS NFL“ GENEA CHOL CHANGE GENUfiTfi COUOR HINENSION SIGMA(15)7T(575)92(571)rE(150075)70(15)9TEMP(571) DIMENSION SIGH0A(15)rTfi(515)901(150075)9Y(150015)96TOTOL(5) UIMENSTON UMH(5)7YBWR(150075)vSUN(5075)7U(lfi)vS(15 rMU(5)rSV(15 DIMENSION HOW(15)79UB(15)78UH(15)yUMB(5)7PH(5)IGTU(5)1PSU(15) DIMENQION SHQT(]5)7GMEQN(5 VSUBT(15) REAL MU REOTJ IN Nvl'x'l'JerNSINTVNEINEUVNSAT’USIGN?”SIl'I-il‘lflu’infil.’ KxNOo OF VOHIORLEQ KwnNO. OF UHRIABLES FOR WITHIN SCHOOL NSNOo OF SUBJECTS WITHIN KOCH SCHOOL NSzNOo OF SCHOOL NT==NOo OF TOT-(IL SUBJECTS NEHNO. OF ELEMENTS 1N COVORIONCE MQTRTX NEN=NOo OF ELEMENTS IN HITHIN COUARIfiNOE MfiTRIX NSfiMfiND. OF SAMPLES READ(5110)K7KU9N7NSrNTvNfivNEUINSOM FORMAT(815) ' READ(5915)(SIGMA(I)rleyNEU)v(SIGHAN(I)71#19NE)r(MU(I)9I*HK) F0RMAT(6F10.47/yOFlOo4r/y7F10o47/73F10o4) ‘ WRITE KIKUINvNSrNT7NE7NEU!NSOM7815HAVSIGNfifirNU URITE(67?O)KvKHerNSyNTrNEyNEUyNSOMQ(SIGMO(I)91$19NEH)7 +(SIGMAA(I)71319NE)V(NU(T)71319K) FORHQT(*1DATQ INFORMATION*!//VOXVTNO. OF UORIQBLES m *rISr/vSXv*NO +0 0F WITHIN SCI'TUOI- UKTIRII‘NBL.IE§.8 31' *7:l.57.-/VSX7*T\“:”9 OF SUBJECTS UITHIN S +CHUUL m *915’/VSX7*N00 OF SCHOOLS I *IIUr/VSXrXNOo OF TOTAL SUBJEC +TS = *,ISY/73X7*Nf}o OF [il..E.i"’i|ll"‘-TT53;. IN (:O'v’f'llz:Il’df’iLE: MATRIX =3 *1157/75X7X +N0o 0F ELEMENTS IN WITHIN COUONTNNCE NOTHIX m *VISr/7UX7*NO. 0F SA +MPLES m *rISv/r$OTHE UITHIN COUHRIHNUE MflTRTXk://15XvF10.49/95X12F +10.49/75X73Floo4v/rWOTHE BETHEEN OOUHHIONOE HATRIX$r//75X7F10.4y/v +5X92F10.4y/v5Xr3F10.4r/75‘r4F10.4y/y5X:5F10.4y/r*OPOPULATION MEAN +=*95F10.4) START GENERATING DATA DO 100 IJKfierSfiM PRINT SAMPLE NUMBER URITE(0721)IJK FORMAT<$OSAHPLE NO. #912) GENERATE E(IyJ) CALL GENEh(KwyNTvSIOMArErT) WRITE TUO HORE COLUMNS FOR NITHIN COUARIONCE DO 750 IuerT E(Iv4)30. E(I!5)'~‘Oo CONTINUE WRITE CHOLESKY FAACTOR OF NITHIN COUARIANCE URITE(6y25 FORMAT<$OTHE CHULESLY FflCTOR OF WITHIN COUflRIANCEX) DO 110 1317K” URITE(6930)(T(I9J)7J$17KU) Pom-mum»-,3F10.4) CONTINUE GENERATE AI(I) CALL GENEACNrNSySIGHAAyfiIvTA) “RITE CHOLESKY FACTOR OF BETWEEN COUflRIANCE URITE(6735) FORMAT($OTHE CHOLLSKY FACTOR OF BFTNEEN COUflRIANCEX) DO 120 1:1HK 83 000 00000 84 NRITD(TA.J:I,N) 4o FORHOT<$0Ty5FlO.4> 120 CONTINUE OENERATE Y(IyJ) . 
Within the sample loop, MYDATA then calls GENDATA, which forms the observations Y(I,J) = MU(J) + A(M,J) + E(I,J) for the N subjects in each of the NS schools and computes the sample summary matrices. For each sample the program prints the pooled mean of each variable, the sample pooled within covariance matrix, the grand means, and the sample covariance matrix computed with the school mean as the unit of analysis; the generated data are written to TAPE2 and the pooled within covariance matrix to TAPE3.

The subroutine GENEA generates the vectors AI(I) and E(I,J). It seeds the random number generator with CALL RANSET(CLOCK(DUMMY)), obtains the Cholesky factor T of the covariance matrix from CHOL, and then, for each vector, sums twelve uniform random numbers from RANF, forms R = (B - 6)/4, applies the polynomial with the coefficients a_1 through a_9 given in Chapter IV to obtain each element of a standard normal vector z, and returns Tz. CHOL(SIGMA, A, K, DET) returns the Cholesky factor of the K by K symmetric matrix SIGMA in an array A of at least K(K+1)/2 locations, together with its determinant DET. GENDATA computes Y(I,J) = MU(J) + A(M,J) + E(I,J) for the N subjects in each of the NS schools and forms the sample summary matrices.

A second, short program reads the matrix of discriminant function coefficients T and the canonical variances PHI obtained from the multivariance program, computes PHI - I, inverts T with the IMSL routine LINV2F, and from these quantities computes the Bock estimate of the between covariance matrix in formula (4-14) and the corresponding between-group regression coefficients.

APPENDIX B

Let A and a denote the between-group mean square matrix of the X variables and the between-group mean cross product vector of X with Y, each divided by the number of subjects per group, n; let B and b denote the corresponding pooled within-group quantities, for example

b = \frac{1}{nI(n-1)} \sum_{i=1}^{I} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)(Y_{ij} - \bar{Y}_i)

where I is the number of groups. Then Σ̂_a can be written as
\hat{\Sigma}_a = \begin{pmatrix} \hat{\sigma}_a^2(y) & \hat{\Sigma}_a(xy)' \\ \hat{\Sigma}_a(xy) & \hat{\Sigma}_a(x) \end{pmatrix}
              = \begin{pmatrix} \hat{\sigma}_a^2(y) & (a - b)' \\ (a - b) & (A - B) \end{pmatrix}

where

\hat{\sigma}_a^2(y) = \frac{\sum_i (\bar{Y}_i - \bar{Y})^2}{I - 1} - \frac{\sum_i \sum_j (Y_{ij} - \bar{Y}_i)^2}{nI(n-1)} = \frac{1}{n}\bigl[ MS(\mathrm{Between}) - MS(\mathrm{Within}) \bigr].

The least squares estimate of the between-group regression coefficient can then be written as

\hat{\beta}_a^B = \hat{\Sigma}_a(x)^{-1}\hat{\Sigma}_a(xy) = (A - B)^{-1}(a - b).

While the matrices A and B are in general non-singular, the difference matrix (A - B) may not be non-singular. Thus Bock (1968) proposed to use the orthogonal decomposition of (A - B) and to retain only those eigenvalues and eigenvectors that were statistically significant to construct the "inverse" of Σ̂_a(x). In order to relate the estimated between-group regression coefficient to those obtained from the other approaches, A - B is here assumed to be non-singular, so that (A - B)^{-1} exists; furthermore, both A and B are assumed to be non-singular. The least squares estimate of the between-group regression coefficient is then:

\hat{\beta}_a^B = (A - B)^{-1}(a - b)
              = (A - B)^{-1}a - (A - B)^{-1}b
              = (A(I - A^{-1}B))^{-1}a - (B(B^{-1}A - I))^{-1}b
              = (I - A^{-1}B)^{-1}A^{-1}a - (B^{-1}A - I)^{-1}B^{-1}b

But A^{-1}a = β̂_a^G is the between-group regression coefficient estimated from the group level approach, and B^{-1}b = β̂ is the pooled within-group regression coefficient. Hence,

\hat{\beta}_a^B = (I - A^{-1}B)^{-1}\hat{\beta}_a^G - (B^{-1}A - I)^{-1}\hat{\beta}.

Applying a theorem presented by Noble (1969, Theorem 5.22, p. 147), (I - A^{-1}B)^{-1} can be written as

(I - A^{-1}B)^{-1} = I + (B^{-1}A - I)^{-1}.

Thus,

\hat{\beta}_a^B = [I + (B^{-1}A - I)^{-1}]\hat{\beta}_a^G - (B^{-1}A - I)^{-1}\hat{\beta}
              = \hat{\beta}_a^G + (B^{-1}A - I)^{-1}\hat{\beta}_a^G - (B^{-1}A - I)^{-1}\hat{\beta}
              = \hat{\beta}_a^G + (B^{-1}A - I)^{-1}(\hat{\beta}_a^G - \hat{\beta})

\hat{\beta}_a^B = \hat{\beta}_a^G + (B^{-1}A - I)^{-1}\hat{\beta}_a^F

where β̂_a^F is the between-group regression coefficient obtained from the full model analysis approach and β̂_a^B is the between-group regression coefficient obtained from the Bock application approach.

BIBLIOGRAPHY

Bock, R. D. and Vandenberg, S. G. Components of Heritable Variation in Mental Test Scores. In S. G. Vandenberg (Ed.), Progress in Human Behavior Genetics. Baltimore: Johns Hopkins Press, 1968.

Bidwell, Charles E. and Kasarda, J. D. School District Organization and Student Achievement. American Sociological Review, 1975, 40, 55-70.

Burstein, Leigh. Assessing Differences Between Grouped and Individual-Level Regression Coefficients. Paper presented at the Annual Meeting of the American Educational Research Association, 1976.

Burstein, Leigh. Three Key Topics in Regression-Based Analysis of Multilevel Data from Quasi-Experiments and Field Studies. Paper presented at the Institute for Research on Teaching, Michigan State University, 1977.

Burstein, Leigh and Miller, M. David. The Use of Within-Group Slopes as Indices of Group Outcomes. Paper presented at the Annual Meeting of the American Educational Research Association, 1979.

Burstein, Leigh, Linn, Robert L. and Capell, Frank J. Analyzing Multilevel Data in the Presence of Heterogeneous Within-Class Regression. Journal of Educational Statistics, 1978, 3, 347-383.

Control Data Corporation. FORTRAN Extended Version 4 Reference Manual. California: Control Data Corporation, 1978.

Cronbach, L. J. and Webb, N. Between-Class and Within-Class Effects in a Reported Aptitude x Treatment Interaction: Reanalysis of a Study by G. L. Anderson. Journal of Educational Psychology, 1975, 67, 717-724.

Finn, J. D. MULTIVARIANCE: Univariate and Multivariate Analysis of Variance, Covariance and Regression. Ann Arbor, Mich.: National Educational Resources, Inc., 1972.

Hannan, Michael T. and Young, Alice A.
On Certain Similarities in the Estimation of Multi-Wave Panels and Multi-Level Cross-Sections. Paper prepared for the Conference on Methodology for Aggregating Data in Educational Research, 1976.

Hannan, Michael T., Freeman, John, and Meyer, John W. Specification of Models of Organizational Effectiveness: Comment on Bidwell and Kasarda. American Sociological Review, 1976, 41, 136-143.

IMSL Library. The IMSL Library, Volume 3. Houston: International Mathematical & Statistical Libraries, Inc., 1979.

Keesling, J. W. Components of Variance Models in Multilevel Analysis. Paper prepared for presentation at a conference on methodology for the National Institute of Education, 1976.

Keesling, J. W. Some Explorations in Multilevel Analysis. Santa Monica: System Development Corporation, 1977.

Keesling, J. W. and Wiley, David E. Regression Models for Hierarchical Data. Paper presented at the Annual Meeting of the Psychometric Society, 1974.

Knuth, Donald E. Semi-numerical Algorithms: The Art of Computer Programming. Mass.: Addison-Wesley Publishing Co., 1968.

Noble, Ben. Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, 1969.

Rock, Donald A., Baird, Leonard L. and Linn, Robert L. Interaction Between College Effects and Students' Aptitudes. American Educational Research Journal, 1972, 9, 149-161.

Scheifley, Verda M. Analysis of Repeated Measures Data: A Simulation Study. Doctoral Dissertation, Michigan State University, 1974.

Scheifley, Verda M. and Schmidt, William H. Jeremy D. Finn's Multivariance-Univariate and Multivariate Analysis of Variance, Covariance, and Regression: Modified and Adapted for Use on the CDC 6500. Occasional Paper No. 22, Office of Research Consultation, Michigan State University, 1973.