IVA‘AI'H - _ v “-w v\( ~Kw’v ‘ u (A ~ H .. -Nn "——v » ~4Ay—w -.--.w«‘.n.-..-.» gs... «.7 2‘" 71.. ' “.4. ' u o-J. ,- 312: r r' “-49 3L“: :" t n 3 r 7‘ ‘;’o ‘yl .' - n .gjzfiggifig {3 "25%;, w, .; I.“ U i) '1 4.- .» _ ‘ Via ~ :1... .T .. 4t L 4 u 5. u v gv‘c‘fih “01* 4 : .nN. J ‘31:" .. "Tsar .._...,.f.\.:}' ‘v «(-1: 3. v E‘ z ‘u , P .\ .,, . ,, . 4 z: ,. _ Lv.,.,“ K 1“,, .‘. m A . U" w.;f..yhr _ K . 1;: . 7’ g A. {15 fl» wry: v .g -- - ,, . 7 . . . ~ ,_..1=.;~:.:§§L~'r “$555 I ‘_ an: i“ :23,“ .J- . a r 29%}.14“. ‘VLL; .. .. . f 4 u». , l v,“ ‘Lufikluw . w . ;. u 1‘1”:.33¥£.:9?I)£R‘l - . y ‘ .- . an... 1. .\ e fisw . -. - '1" . <~ ? y; .ng . , 4‘ 1.? {kn ‘ w,k1\’k;‘|.n" ‘ ‘u . .lqugunv 1.; . .‘K: ‘ 'fiygflmyz‘aggk I?“ ‘ .i:¥"y'ta .’-"\':t Nu- ~v«_ _ fi‘fié‘fi"; Sir-w fiv‘smw' émgyrsaw» \ g», t . ~_ .1. 9.. w.l~ u get ‘\ my "‘ _ ‘9‘. w " ' "5“- {llV .‘l-ag‘fi‘: .3333“" .‘u. m - Chianti; .l, .. 2. ~ ' n. ' ‘Ia. 1- I“. f ‘ 5 1 “4 «a, 'a “£444; x'I'IU L . . ‘aw‘ 1:33; ‘ “L. v xx ' ' villi Aide/‘93»; i..n¢u.~. m... 5 ‘ ' f < ‘ . An. 3. ? fig , .., f‘ u . ' r ' , , ' ’ j . y!» - a)“: » ‘, ., , .- ‘ w. ' . ‘ V 'rl‘ g :L' 44 ,.. 1 . 1.5.11.4}: ; . a .4 ‘t; . My, my: M7 N 4' , HP «2: .. “£19216?me ‘ffl-‘zfifi: n , ‘l‘ y.“ “0‘: .‘fiw.{~1 . . . ~ i‘ a _ . V . 9 M £1 . Y I A5“? ' ¢1 ‘ I! '31:: '¢_ ‘ .4 - 94¢ I" '31- .1 ‘i. A) £4; v ;u j." gang ‘3‘“, > , 15,5” . .. :$ru‘ m- z,‘ .w. . v.‘ ’ . , .' ‘ {‘Qan‘r.“::\' m ‘1 . .. ‘ n L . .' {.5 . . an! «E ’5‘. " ' in Ii“- 3; ~r: 14' gm“ , 4&4“ Vittirfi ma 1 ,J 15,, “2‘7"" 9’ - “2‘7?“ ;: r“. avg-fir? if Mm .~. ' , ' "an, '* .. M33?” 4 1“ 6‘11 ' 1’ éa—hwm um-Itit‘gbr Adv: é ; r 9 4“ «fit; :a , TN - new. .5 \- A 'I . my“; 4:23. ‘ . f- .' N . 3:. :1 ‘ 1‘ {gs 'gagv‘gé‘; 1": v , ‘ A . <. i ' 2 "' 7‘” ‘ fi ”.2“? {353.1 135» f: ..‘ flu m ‘ w « u N n ‘ I ' ' $351+" ‘ ! :qwlabu‘flilrl" ’ 5 J~ ” I“«'Er1' m, “fimfi tang; Aw . JV - 5:33.25 .~ .351.“ W 1 4 .3549? «wan: r1114 lama. ""fl‘r ‘ ! ' ‘I- T."- CIV A if! a?“ " . V 3”,. ‘ t':- 6 ~. <* 2-“ ;;. ' -. ‘ r \ (S, . 1%.. u 1 ¢ . . ’33» ‘ r, . .v r‘ .. .. “ ' f?» t' l . u S" :‘m n ' Ms ‘ v ' ‘ V“: U112 ‘ _ n pilfi,_ 2A." , .1 ‘ ~ ‘ ‘ r ‘ ,.. . v , q . 4 .3"; r . r . , ( I 41“}. «:l‘ -T:' V ... . ‘ , ., , v - q -/ -' [A Ntfl’: " , . “s V "' f if ‘ . . ., az- . n." ~“ vvfi'iy :5: :1 '::"Cr,'!7}?.~ 93‘” _ (Wing: 43% 3‘3» .fié‘xz' f'?” ' ' «J .f ‘ ,9: . . ,1 (ftyxl-u” ‘ H J .‘Y Ad 9‘ 1 "V “(3- . I‘ >' v ‘ V ’ h! , 7 -.¢.. .. ... . q .. . - 7. . .. (,1. """"""‘v ' ‘. . ~ “’ "v7" ' _, - * " - ; i533 _ aE—MSJT. 4:..12'...'12L'.'L'::‘“ This is to certify that the dissertation entitled A MIXED LINEAR MODEL WITH TWO—WAY CROSSED RANDOM EFFECTS AND ESTIMATION VIA THE EM ALGORITHM presented by SANG-J IN KANG has been accepted towards fulfillment of the requirements for Ph . D . , MEASUREMENT AND degree in QUANTITATIVE RESEARCH METHODS u). m . .9. Major professor Datejeb-ozzamz/ MS U is an Affirmative Action/Equal Opportunity Institution 0-1277 1 A. LIBRARY Michigan lute { Unlvmny N...‘ r fi‘ PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. usu Is An Affirmative Action/Equal Opportunity lnetitution email”: A MIXED LINEAR MODEL WITH TWO-WAY CROSSED RANDOM EFFECTS AND ESTIMATION VIA THE EM ALGORITHM BY Sang-Jin Kang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Counseling, Educational Psychology and Special Education 1992 ABSTRACT A MIXED LINEAR MODEL WITH TWO-WAY CROSSED RANDOM EFFECTS AND ESTIMATION VIA THE EM ALGORITHM BY Sang-Jin Kang In the past two decades, there has been a prominent methodological effort in educational statistics to develop analytical methods that account for the multilevel characteristics of the educational data. As a result, methodologists have developed estimation procedures appropriate for nested multilevel data assumed normally distributed. One limitation shared by all of the new multilevel analytic approaches is that they apply only to those multilevel data which are purely 'hierarchical. Although educational systems typically have hierarchical organizations- students are grouped together for learning within classrooms, classrooms within schools — very often the structure of a system is not a 'pure' hierarchy. Students may belong to more than one group simultaneously. If the data are nested within the cell of two cross-classified grouping factors, we call it crossed multilevel data. This study presented an appropriate statistical model for the analysis of crossed multilevel data for general applicability. The crossed multilevel model presented in this thesis allows us to consider simultaneously the multilevel and crossed features of higher level units. Such a model promises to increase the descriptive power and inferential Sang-Jin Kang advantages for both micro-and macro-parameters in the crossed multilevel context. Research interests in this thesis were five fold: a) to conceptualize the crossed multilevel model; b) to present estimation theory for the model using the empirical Bayes viewpoint; c) to provide a computational algorithm for parameter estimation using the EM algorithm; d) to provide empirical evidences for the accuracy of the computing algorithm; e) to present the application of the model; f) to provide the substantive applicability of the model in educational research. The conceptualization of the model was described via a crossed random effects ANOVA model and by linking it to the crossed multilevel model. Estimation theory was reviewed from an empirical Bayes viewpoint and the computational procedure implemented via the EM algorithm. The accuracy of the algorithm was tested using randomly generated data against the standard statistical packages (SAS and BMDP) and via a simulation study. An actual experimental data set was analyzed for illustration and the substantive applicability of the model in educational research was assessed. This dissertation is dedicated to my mother, for her love and special life for the family. To my brother, Sang-Yul Kang, and his wife, In-Sook Kim, for their support and sincere life. To my wife, Eun-Mi, for her love, patience, and ongoing support. I love you. To my daughters, Bona and Boyoung, who enriched my family life in East Lansing and my nephews, Bio, Bundo, and Yohan, for their growth and future lives. To all of my other relatives and friends who have understood and have been generous while I have worked in United States. iv ACKNOWLEDGEMENTS I wiSh to send my heartfelt thanks to Dr. Stephen W. Raudenbush, my advisor and committee chair, who has been my teacher, role model, and provided rock of support throughout my doctoral program. His encouragement, mentoring and ‘his faith in. me greatly aided in the completion of this dissertation. Dr. Raudenbush's insight, courage, and unusual motivation of hard work will continue to be an inspiration for me. I also wish to send my special thanks to Dr. Brian P. Rowan, who has enlightened, and mentored my professional work. Dr. Rowan has provided a constant financial support for my advanced study. Dr. Rowan's generosity and open-minded advices for my work have inspired me. My special appreciation goes to Dr. John Schwille who provided me financial support for more than one year and has willingness to give his valuable time for my needs. It has been my luck that he was near me. Dr. William H. Schmidt, my former advisor and a committee member, gave generously one-year research assistantship when I was a first year doctoral student. Dr. Schmidt also gave helpful comments and suggestions on my dissertation. Thanks. I would like to send my recognition with gratitude to my other committee members. To Dr. James Stapleton, who taught me statistics from the basic to the advance, for his kind suggestions and comments. To Dr. Robert Floden for his tough questions. To Dr. Richard Houang for his willingness to use his valuable time for me. I wish to acknowledge the professors in the Department of Education at Yonsei University. I realize that all my success in the advanced study is rooted in their teaching. Finally, I wish to recognize my wife, Eun-Mi, who has shared the hard life for my entire study at Michigan State University. vi TABLE OF CONTENTS LIST OF TABLES Chapter I. II. III. IV. CONCEPTUALIZATION OF CROSSED MULTILEVEL MODEL Introduction . Objectives . . Problems with Multilevel Contexts . Educational Research and Crossed Multilevel Data . Methodological Advances of Two- -Way Crossed Multi- level Model . Past Methodological Research on Crossed Multilevel Model The Two- -Way Crossed Multilevel Model . . The Two- -Way Crossed Random Effect ANOVA Model Crossed Multilevel Model Formulation . General Crossed Multilevel Model . General Mixed Linear Model . EMPIRICAL BAYES ESTIMATION OF RANDOM EFFECTS IN THE CROSSED MULTILEVEL MODEL . Introduction . . . The Bayesian Model with Known Covariances Theory of EM Algorithm . . . Bayesian Formulation of Crossed Multilevel Model . Joint Normal Distribution . Posterior Distribution of Parameters . Posterior Dispersion Matrix of Parameters Posterior Mean of Parameter Vector . COMPUTING ESTIMATES OF CROSSED MULTILEVEL MODEL PARAMETERS Model . . Sufficient Statistics of Complete Data . . EM Formula for Estimating the Fixed Parameters . EM Formula for Estimating the Parameter Variances EM Formula for Estimating the Within-Cell Variance . Observed Data Likelihood for the MLF Estimation. Log- -Likelihood for the Crossed Multilevel Model CHECKING THE ACCURACY OF THE COMPUTING ALGORITHM . Computational Results Crossed Random Effect Anova Variance Components Model I vii Page \JO‘U‘H l3 16 18 18 23 28 34 35 35 37 40 43 45 46 A7 50 S3 53 54 58 59 60 64 66 69 73 7h 76 ix Variance Components Model II . Covariance Components Model V. ILLUSTRATION . Data and Hypotheses Sample and Design Analysis . . . Base Model . . . . . . . . . Within-Cell Model Specification Between-Cell Model Specification . VI. CONCLUSION . Summary . Limitations Future Works . BIBLIOGRAPHY viii 79 82 88 88 89 91 91 95 102 108 107 112 114 120 Table 10. 11. 12. 13. 14“ LIST OF TABLES Procedure for checking the accuracy of the algorithm . Computational comparison between SAS program and the CML algorithm for crossed random effects ANOVA Computational comparison for the estimates of variance components between BMDP program and the CML algorithm for Model B . . . . . . . . . . . . . . . . . Computational comparison for the estimates of the fixed effects between BMDP program and the CML algorithm for Model B (Balanced Case) . . . . . . . . . Computational comparison for the estimates of the fixed effects between BMDP program and the CML algorithm for Model B (Unbalanced Case) . . . . . . . . Computational comparison for variance components between BMDP program and the CML algorithm for Model C . Computational comparison for the estimates of the fixed effects between BMDP program and the CML algorithm for Model C (Balanced Case) . . . . . . . . . . . . . Computational comparison for the estimates of the fixed effects between BMDP program and the CML algorithm for Model C (Unbalanced Case) . . . . . . . . . . . Computational comparison between the preassigned parameter values and the average values of the estimates from twenty simulations Computational comparison between the preassigned parameter values and the results of the analyses of the balanced and unbalanced data Design and Sample Sizes Results of crossed multilevel analysis of the base model . Results of crossed multilevel analysis with random slopes at within-cell model Results of crossed multilevel analysis with a fixed within- cell slope . ix Page 71 75 77 78 78 8O 81 82 84 86 90 93 97 . 100 15. Results of crossed multilevel analysis with both within— and between-cell variables . . . . . . . . . . . . . . . . . . . 103 16. Results of crossed multilevel analysis of the final model. . 105 CHAPTER I CONCEPTUALIZATION 0F CROSSED MULTILEVEL MODEL W In statistical analysis, a researcher needs to specify an appropriate model that guides inquiry as well as describes the structure of data as precisely as possible. Educational systems typically have hierarchical organizations in which "units" at one level (e.g., students) are "nested" within units at the next higher level (e.g., classrooms or schools). These educational systems often produce inherently hierarchical data. The problems of analyzing hierarchical data arise when. key variables of interest are measured at different levels of an organizational ‘hierarchy. The prime question for an appropriate statistical model in this case is whether the model takes into account the effects of variables measured at both the individual and the group levels. Failure to account for hierarchies may lead to trouble in terms of research validity and has been the core in methodological criticism of educational research (see Cronbach, 1976; Burstein, 1980; Cooley, Bond, & Mao, 1981; Raudenbush, and Bryk, 1988). Despite such fundamental warnings, many analysts used single-level models even if the key variables were measured at two different levels. This mismatch between single-level models and the multi-level data often leads the researchers to the analytic dilemmas in the choice of unit of analysis. In most educational research, students are not randomly assigned to groups such as classrooms or schools. The lack of ind Stu KO C0? .5 we 32:} ,I p‘ J! 2 independence of responses is further exacerbated by the fact that students receive treatment as a group. When students share common group histories, teachers, peer experiences, their responses will be correlated. In addition to the violation of statistical independence assumption, hierarchical data usually produce variables of interest at both student and group levels. Traditional linear models can account for only single level variables and fail to accommodate the variables measured at both levels which restricts model specification and appropriate inferences of interest as a result. Moreover researchers may raise the questions about how group processes (i.e . , some policy implementation) are interrelated with the processes within the groups (1 . e . , student behavior) when they have hierarchical data. These questions are hardly answered by the classical linear models. Educational researchers have long been concerned with multilevel issues. But traditional research methods have not provided adequate tools with which to analyze data arising in naturally occurring hierarchies. New advances on model specification for multilevel data were made by the methodologists who have advised researchers to formulate explicit, multi-level models which enable testing of hypotheses about the effects occurring within each level and interrelations among them (Burstein, 1980; Cooley, Bond,& Mao, 1981; Rogosa, 1978). A number of methodologists, working independently, have developed estimation procedures appropriate for hierarchical, or multilevel data assumed normally distributed (Aitkin & Longford, 1986; Goldstein, 1986; DeLeeuw & Kreft, 1986; Mason, Wong,& Entwisle, 1984; Raudenbush 8: Bryk, 1986). They made a substantial methodological advances in two key reasons exp} Educ Aha-v °55~ EI‘IC have (Raudenbush , 1988) : "First, such methods enable researchers to formulate and test explicit statistical models for process occurring within and between educational units. These models solve, in principle, the problem of aggregation bias Second, these models enable researchers to specify an appropriate error structure, including random intercepts and random coefficients In most settings, appropriate specification of error components solves the problems of misestimated precision which have plagued hypothesis testing in nested unbalanced data sets."(p.86). One limitation shared by all of the new multilevel analytic approaches is that they apply only to those multilevel data structures which are purely hierarchical. Although educational systems typically have a hierarchical organization - students are grouped together for learning within classrooms, classrooms within schools - very often the structure of a system is not a "pure” hierarchy. Students may belong to more than one group simultaneously. For example, all students are members of neighborhoods as well as of schools and not all students in the same neighborhood are in the same school and vice versa. Therefore schools and neighborhoods in this case are crossed with different number of students or no students within each cell classified by the two factors. If such a classification of students is possible, again a statistical model must reflect the crossed structure so that a researcher may investigate the variables at both higher level units, neighborhoods and schools. Because students are "nested" within cells of a neighborhoods by schools cross-classification, the structure of data is cross-classified as well as hierarchical. Also both neighborhood and school effects are considered random because we can assume the two higher level Golds: hieta: ic CCD‘. L car; the This 4 level units are randomly selected from larger populations of the two. Goldstein (1987) called this "crossed-multilevel" structure while the hierarchical data is "nested-multilevel". For the analysis of crossed-multilevel data, a researcher ought to pose a model that considers simultaneously both the multilevel and crossed features of data. Many researchers implicitly worked on this issue in the research on variance component analysis (Henderson, 1953; Cunningham & Henderson, 1968; Hartley 5: Rao, 1967; Patterson 5: Thompson, 1971; LaMotte, 1973). The results from these researchers, however, are applicable under restrictive conditions, as when no continuous group variables are available, and there was no multilevel model formulation which enables estimation of covariance components of micro-parameters and testing the effects of crosslevel interactions in multilevel contexts. Their results are limited to the estimation of variance components. Researchers working on multilevel models have already anticipated the analysis of crossed-multilevel data in the view point of general mixed model. Only a few computational attempts were made with simple models. Lindley and Smith (1972) showed how Bayes estimation can be applied to the analysis of crossed-multilevel data with a randomized block design. But their attempt was limited to the case of balanced design with one observation per cell. A more complete attempt was made by Dempster, Rubin, and Tsutakawa (1981) in the example of "Professional football scores" where the game scores were nested within offensive and defensive effects of the teams involved in each game. These two examples, however, do not lead directly to a general crossed-multilevel model. Rather researchers need to understand these exemplary results as educa each grad yea: sch: Six LA. - 5 pointing to the possible emergence of a general model. Unfortunately we do not have sudh a general crossed-multilevel model where any number of fixed and random effects, in principle, can be estimated at both within and between unit level along with the interaction effects for both balanced and unbalanced designs. Crossed-multilevel data are both common and poorly understood in educational research. Consider a school testing program. Students in each grade level receive multiple tests. Here students are nested within the cell of grade levels by test forms; grade level effects and test form effects are viewed as random; grade levels and test forms are crossed. Again suppose schools have an annual testing program for graduates. Students, in this case, are nested within schools as well as years; school effects and year effects may be viewed as random; and both schools and years are crossed. Objectives This research attempts to achieve six objectives corresponding to the six chapters of the thesis: 1. Conceptualization of crossed-multilevel model is described in chapter 1 through two parts. The first part provides background information for crossed multilevel model. The second part describes the mathematical model for crossed multilevel analysis and considers the research inquiries that it makes possible. 2- Estimation theory for the general crossed-multilevel model is Presented via the empirical Bayes viewpoint for both general and mixed linear models in chapter 2. 3- A computational algorithm is provided for the estimation of press 6.3? - v o w: \u‘ 6 parameters in the posed model using the EM Algorithm (Dempster, Laird, and Rubin, 1977) in chapter 3. 4. A computer program is developed in the Gauss language (version 2.0), and accuracy of the algorithm is examined in chapter 4. 5. Application of the crossed multilevel model to real data set is presented through illustration in chapter 5. 6. The properties of the estimators and value of the study is discussed with summary in chapter 6. b1 5 w t Mu eve o e As mentioned, most educational research has tended to ignore the nesting of individuals within groups and, instead, used single-level models to analyze the multilevel data. This mismatch between single- 1evel models and multilevel data causes the following major problems. W Analyzing multilevel data with single-level models often leads the researchers to the analytic dilemmas in the choice of unit of analysis. Researchers have stated that significance tests based on individual- level analysis are unacceptable, due to the violation of the independence assumption when subjects receive treatments as a group, even if individuals are randomly assigned to groups (Cronbach, 1975; Glass 6: Stanley, 1970; Page, 1975; Aitkin, Anderson, & Hinde, 1981; Knapp, 1977; Walsh, 1947). Inferences about individual behavior, such as individual aChievement, based on group level analysis can cause aggregation bias (Cronbach & Webb, 1975; Robinson, 1950; Hopkins, 1982). Some investigators may choose separate analysis at each level in the hope that mecca critici provide tron 'choice ‘5 Shelli 7 the convergent results of the two analyses would avoid the methodological criticism. Raudenbush and Bryk (1988) have stated that this strategy provides no guidelines on how to interpret the results when they diverge. Cronbach (1976), Burstein (1980) and others have recognized that “choice of unit of analysis" is the wrong question for analyzing multilevel data because the variation at each level is potentially of interest and ought not be ignored. ou c o In the multilevel contexts, the variables measured at different levels need to be taken into account in the model specification. Educational researchers are often interested in the association between school organization and student achievement. Consider a school district that implements a certain policy to improve student achievement. The effect of the policy on student achievement may be different across students due to the differences in student demographic backgrounds such as prior achievement and motivation, variables defined at student-level. If such student differences are ignored, one must assume the policy effects are evenly distributed across the students regardless the differences of student characteristics, which is h rdly acceptable. The effect of school organizational variables, i.e. policy variables, may also tend to be different across schools due to the differences of the contextual conditions across the schools. For example, schools may differ in their school climates such as principal leadership, staff cooperation, student disorder, and in socioeconomic levels of the communities in which schools are located. Whenever we study the effect of organizational variables on students 3.. de EX 97.. EH : ole 1?: a w- an 8 or teachers nested in the organization, we must adjust for confounding effects that could occur at both the individual level and the organizational level. Otherwise the estimates of the effects of the organizational variables will be biased. The restriction of the standard single-level linear model for the analysis of multilevel data is that it does not simultaneously control the variables defined at different levels. as - ve nteractions The effects of organizational variables may depend on variables defined at different levels of the processes of the organization. For example, the fast pacing of the curriculum is beneficial for high- aptitude students but not for low-aptitude student (Gamoran, 1991), an example of an aptitude-by-treatment interaction (Cronbach and Snow, 1977). Such interaction is a "cross-level interaction” (Bryk 6: Raudenbush, 1989) in terms of the multilevel viewpoint because a variable measured at higher level (i.e. pacing of the curriculum) interacts with a variable measured at a lower level (i.e. student aptitude). Cross-level interaction effects may have important implications for the formulation of scientific theories because the effects constrain the generalizability of findings. Cronbach and Webb (1975) realized that the characteristics of the classroom interacted with individual backgrounds. Such interactions rendered the treatment effect unique in each study and constrained the generalizability of a finding across the settings with the treatment. The lack of generalizability of organizational effects may often be foam goli the J . m9: beca ”v . R Us .1“ ~ 9 Eu (7 P9! .13 r.‘ 9 found in the studies of school effects because the effect of a certain policy or reform may depends on the characteristics of students or on the contextual conditions of the schools. Traditional single-level models do not allow the investigation of the cross-level interactions because they must assume that effects are homogeneous across the groups. duca e ea c and C o s d u leve ata o t l v n e t Consider the three educational research cases below which require the analysis of crossed multilevel data. Case 1. Assessment of School Effects A number of district administrators want to diagnose the secondary high school education of their districts and therefore implement an annual testing program for the graduates. The administrators want to know whether the schools provide stable education with good quality, (i.e. , excellence), in terms of graduates test scores across the years. If the test scores of the graduates are not stable across the years then they want to describe the changes. The administrators also want to know how much the schools differ in the test scores of the graduates. If the schools differ substantially in test scores, then they hope to see what characteristics of the schools are determinants of such differences. Another important question is whether the effects of student background characteristics are homogeneous across schools and years. Are outcomes more or less equitably distributed in different schools? Is the equity of distribution stable over time? Available data to the administrators in this case are graduates test scores in each year, students demographic backgr teache' Case I adzini: systems primary adsinis element at secc the sec achiEVe rharact E’“L‘ilste ‘ieek u set 10 background variables, and school characteristics (e.g., school size, teacher/student ratio). Case 2. o a betwee t e school evels Again the administrators want to get fundamental information about the schooling systems of the districts. In schooling systems, children attending a primary school may go to one of several secondary schools. The administrators need information about what characteristics of the elementary schools support the long-term progress of student achievement at secondary high schools. They also need to know the characteristics of the secondary high schools that have positive or negative effects on achievement. The more interesting question may be what combination of characteristics of the elementary and secondary schools supports student achievement. Another question is how well schools of different types serve children of different ethnic and social background. Available data to the administrator are achievement test scores in elementary schools and secondary schools, student background variables, and characteristics of the schools at both levels. Case 3- MW A Psychometrician wants to evaluate a standardized achievement test that was developed as a power test (Rudman & Raudenbush, 1987). Unlike a speed test, a power test is supposed to be insensitive to the duration of test administration. It is expected that examinees can not answer further questions correctly after the prescribed test time because their knowledge would be eXhausted after that elapsed time. Research hypotheses in this case are Whether there are the effects of excess test time on examinees' test 11 scores. The psychometrician first blocks classrooms on background characteristics and then randomly assigns classrooms to excess time conditions. Hence children are nested within cells of a blocks-by- treatment cross-classification. Both treatment and blocks are conceived as random. Available data to the psychometrician are student demographic variables, teacher characteristics, students' prior achievement, and the standard test scores. Analytical Comonalities The design characteristics of the above three research cases share four points. First, data of the all studies are basically multilevel. The studies use the variables measured at both individual and at higher levels. Second, the higher-level units of the studies are all considered as random. In the first case, we consider the sample schools as selected from the larger population and the years of the test administration are also random. In the second case, the elementary schools and secondary high schools are all considered as random. The third case uses a randomized block design where the block effects are typically viewed as random and the categories of different excess test times are viewed as random as long as the researcher increases the number of conditions. Third, each study uses two kinds of higher level units and the two factors are crossed. The first example shows that graduates are nested within cells classified by the schools and years. In the second example, students have dual membership of their elementary schools and secondary high schools. The current organization of schools implies that primary 12 schools will not be nested within secondary schools. Rather the two factors will be crossed. The last example utilizes the block design where the blocks are crossed with the categories of different time allocations for testing. Finally, the data in every case are unbalanced. In the first example, the number of students are not the same across the years as well as across the schools. This unbalanced character of the sample size may be more serious in the second example. Students attending a secondary high school do not come from all elementary schools. Again, the number of students from each elementary school is different in each high school. Therefore, in a natural situation, the numbers of students classified into each elementary-by-secondary cell are unequal and there will be many missing cells. A similar situation occurs in the third example. In education, experimental studies usually use volunteers as the source of data. The class sizes will vary, meaning that cell sizes will vary. Also not every block may contain the same number of students. In sum, each of the three studies has two crossed random factors within which individuals are nested. Because the higher level units are crossed and the lower level units are nested within cells in the two-way classification of the higher level units, we call the data "crossed- multilevel“ (Goldstein, 1987a). The appropriate model should then consider simultaneously both the multilevel and crossed features of data. The multilevel and crossed features require the specification of four random effects in a model: random individual effects, random effects of each of the two higher level units, and random interaction effects. In addition, individual scores within each cell can be described as a function of multiple individual characteristics; and some of the within- 13 cell effects vary significantly over rows and columns while the others may vary only over rows. e dva ta es 0 o-W C o sed ult eve Mode The limitation of the current available multilevel models for the analysis of crossed multilevel data is that those models have been developed for the analysis of one-way nested data. Those models are appropriate to the analysis of nested multilevel data, but only when one level is purely nested within the other level. The nested multilevel model allows estimation of covariance-components at each level. For the analysis of crossed multilevel data, a model should estimate the covariance components at each level but it should also decompose the covariance-components at higher levels into three components corresponding to specified random effects for two higher- level units and for their interactions. The model should handle with unbalanced data and covariates having either fixed or random effects. We now consider the additional information obtainable from crossed- multilevel models in connection with the previous three research examples . v - o o e ts decom tio The crossed multilevel model allows specification of the apprOpriate error terms in a model where schools and years are crossed. For the first example, the crossed multilevel model can estimate a within-cell variance, 02, and between-cell covariance components. The between-cell covariance-components are decomposed into the three parts; one is the covariance matrix for school residual effect, say fa’ the 14 other is for year residual effect, say 1 and for the residual b’ interaction effect between schools and years, rc. With these covariance components we could get various intra-unit correlations which inform us of the proportion of the total observed variance that lies within cells, between rows, between columns and between cells. We could also do a statistical test of the significance of random variation for each effect. These two pieces of information, the intra-unit correlations and the xz-test, provide us both practical and statistical information about further model specification. For example, the presence of a large intra-school correlation informs us that a large percentage of variation of the student scores lies among schools. The significant results of a x2-test for school residual dispersion effect, ’a’ informs us that there still remains a significant random residual effect not explained by the school characteristics in the model. Similar information is obtainable for the analyses of the two other examples. n e ec s We return now to the case in which primary and secondary schools are crossed. A major difference between the nested multilevel model and crossed multilevel model is that crossed multilevel model captures the crossed features of the two higher level units while the nested multilevel models do not. Because the crossed multilevel model allows the specification of random interaction effects of the two higher level units, primary and secondary schools in the second example, it produces estimates of the covariance components of random interaction effects of the two higher 15 level units. The presence of substantial random interaction effects tells us that certain primary school effects depend upon the secondary school attended; or, equivalently that the effects of attending a particular secondary school depend on the primary school attended. If we find a significant interaction effect of a particular primary school characteristic and a particular secondary school characteristic on student achievement, then the effect of the involved school characteristic can't be generalized to the entire schooling system. 3 t tica recision Imagine that a researcher has applied a nested-multilevel model to analyze multilevel data with a two-way classification. He has performed the analysis as a compromise because a crossed-multilevel model was not available. The nested-multilevel model specifies a single error term for higher level units. There are, however, three error sources in the higher level units: two higher level units (e.g., schools, years) and their interactions. The estimates of higher level covariance components in his analysis are then the sum of the three covariance components from each error source. In the multilevel contexts, the estimates of fixed and random effects are the functions of covariance components at both individual and group levels. Nested multilevel models use the sum of covariance components of the higher level units for parameter estimation without knowing the size of covariance components from each error source. Although the exact functional relationship between the nested and crossed multilevel models regarding the parameter estimation is not clear to the author at this point, the author suspects the precision of 16 the parameter estimation may be sub-optimal when the nested multilevel model is applied to the analysis of crossed multilevel data. Consider the case in which primary and secondary schools are crossed. Again suppose we are estimating the fixed effects of primary and secondary school characteristics. Then nested multilevel model uniformly applies the sum of covariance components for estimating the fixed effects of both primar and secondary schools while the size of covariance components pertinent to each higher level units are different. Crossed multilevel models allow specification of the error terms precisely for the analysis of nested data with two-way classification and properly decomposes the covariance components at higher level. Hence it uses the decomposed covariance components at higher level for parameter estimation of the model. et 0 o 0 ice Res a c o C as d leve de Standard texts on experimental design provide methods for simple crossed random effects models in which a single random component is associated with each cell in a fully balanced two-way cross- classification (see, for example, Kirk, 1982). However, as in the above examples, the interesting designs in education will typically be substantially unbalanced. For example, many cells in a neighborhood-by- schools cross-classification will be empty or small; and enrollments will vary in a schools-by-time cross-classification. Researchers working on variance component analysis have . tried to elaborate models in order to meet various conditions (Henderson, 1953; Cunningham & Henderson, 1968; Hartley 6: Rao, 1967; Patterson & Thompson, 1971; LaMotte, 1973). The results from these researchers, however, are l7 applicable under restrictive conditions, as when no continuous group variables are available and their results are limited to the estimation of variance components. Harville (1977) reviews methods of variance components estimation based on maximum likelihood for unbalanced designs with crossed random factors. However, although these methods are appropriate for variance components, they do not allow covariance components, which will often be of interest in ducation. For example, regression coefficients describing the relationship between social background and achievement may vary across schools or neighborhoods, or across time. Lindley and Smith (1972) showed how Bayes estimation can be applied to the analysis of crossed-multilevel data with a randomized block design. But their attempt was limited to the case of balanced designs with one observation per cell. Dempster, Rubin, and Tsutakawa (1981) provided numerical results in the example of "Professional football scores” where the game scores were nested within offensive and defensive effects of the teams involved in each game. Dempster et al. provided maximum likelihood estimates using the EM algorithm and explained the empirical Bayes estimation theory. These two examples are useful; however, they do not lead directly to the general crossed multilevel model. Multilevel methodologists anticipated the analysis of crossed- multilevel data in the viewpoint of the general mixed model. In this stream, Goldstein (1987b) outlined a procedure based on iterative generalized least squares for statistical estimation in crossed-random covariance components. He, however, supplied no computational strategy. 18 e o-Wa os ve de In this section, the two-way crossed-multilevel model is conceptualized and formulated. In order to promote the conce tual understanding of the crossed-multilevel model, we first consider crossed-random effects ANOVA model and formulate the crossed multilevel model from the ANOVA model. Then the general crossed multilevel model will be presented in two forms: matrix form, and no-subscript form. e o-Wa Crossed Random feet OVA ode Suppose a researcher wants to investigate the effects of schools and neighborhoods on students' achievement. He may first select J schools and obtain the information about students' K residential areas. The framework of this design shows; a) students are nested within the cells of a schools-by-neighborhoods cross-classification, b) school effects on students achievement are random, c) neighborhood effects are random. From this description, we may estimate some or all of the following effects; a) random main effects of schools, b) random main effects of neighborhoods, c) fixed effects of school characteristics, d) fixed effects of neighborhood characteristics, e) random interaction effects (school by neighborhood), and f) fixed interaction effects (fixed school-by-fixed neighborhood predictors). If we consider just the random effects, we can specify a statistical model as below Y -p+a+b+c +e ijk j k jk (1'1) izjk 19 for i - 1,2,..., - l,2,...,Jk, k - 1,2,...,Kj; and Y is ith njk’ J ijk student score in school j in neighborhood k; p is grand mean of all j is random effect of school j; bk is random effect of neighborhood k; cjk is random interaction effect between school j and neighborhood k; and finally e1. jk is a random error. The notation "izjk" means student i is nested in school j and students in the population; a neighborhood k. The subnotation of the cell size, njk’ the number of schools, Jk’ and the number of neighborhood, Kj’ imply the design characteristics which allow unbalanced and incomplete data. Since all 2 effects are random, we have a series of assumptions: a - N(0, a a); 2 . 2 . bk - N(0, ob ), cjk - N(0, ac ), ei:jk assumptions of Equation (1.1). The effects of aj, bk’ cjk’ and e J ~ N(0, aez) are all required izjk are mutually independent. Equation (1.1) considers only the random effects. Random main effects may or may not be of central interest for the inquiry of a researcher. A researcher interested how much of the variance among students' scores is attributable to the differences among schools and neighborhoods, may use this model for analyzing data from the above design. Very often, researchers inquire about the effects of school and neighborhood characteristics on student achievement and how students scores in a school differ across neighborhoods. Equation (1.1), assuming all predictors are centered around their means, can be elaborated into the model below in this case: Yijk - p + fllxlj + u3 + fi2x2k + vk + fi3x3jk + 'jk + ei:jk (1.2), 20 where 81 is fixed effect of school characteristic, X1 (e.g., school size); 82 is fixed effect of a neighborhood characteristic, X2 (e.g.,community SES level); and [33 is the fixed effect of the interaction between X1 and X2. Assuming X the random school effect, a J 9 Thus u is a residual random - X + u . J a, 11 J J school effect after accounting for the fixed school effect, 51x11. 1, X2, and X3 are orthogonal, in Equation (1.1) is decomposed into two parts in Equation (1.2); a Similarly, the random neighborhood effect, bk’ and interaction effect cjk of Equation (1) are also decomposed into two parts respectively; bk - 8 X + v , and c - B X + it . This model needs assumptions for 2 2k k jk 3 3jk jk 2 2 2 the random terms. u:l - N(0, au ), vk - N(0, av ), ”jk - N(0, a,r ), and 2 e - N(0, 0e ). Other alternative model specifications are possible ijk if a researcher has different aims of inquiry. For example, a researcher may utilize covariates with students level variables, say x4ijk (e.g., prior achievement) centered around the mean of group jk. Then the student level random error can be decomposed into two parts; e 1. jk - + where 84 is the fixed effect of student prior fi4x4ijk fizjk’ achievement level, X4. Whatever model a researcher poses, the model tells us that all specified effects in the model need to be estimated. We call Equation (1.1) a crossed-random effect ANOVA model which can be understood as multilevel model because it takes into account the contributions from group level units (the a], bk’ cjk) and individual effect, ei'jk' Equation (1.2) is also considered as a multilevel model because there are two kinds random terms; u j , vk, and tr jk are group level residuals and e is individual level residual. These two i:jk models show the basic features of crossed-random effect models and how 21 we specify a model of interest. Because these two models incorporate both individual and group level random effects, we can get considerable information from the random effect dispersion matrix through variance decomposition. d s e o a an V ce eco os ti n Equation (1.2) contains four fixed parameters; p, 51’ 82 , B3 and the random variables; u Here we assume the model is j’ Vk’ 'jk ' ei:jk' additive and that the four random variables are mutually independent each other. The variance of Yijk’ for given fixed effects, is 2+02+02+02, and the conditional covariance between students' score within the cell of school by neighborhood is ' 2 . Cov (uj+vk+1rjk+e1:jk,uj+vk+1rjk+e i:jk) - au v 1r Hence the correlation is given by definition, 2 2 2 2 2 2 p-(au +0V +a1r)/(au +av +01r +08) which we can consider "intra-cell" correlation. This correlation itself shows the proportion of between-cell variances over the total variance. Thus it tells us the proportion of variance attributable to variation between cells of school-by-neighborhood. 22 Again the conditional covariances for students in different schools but the same neighborhood is 2 Cov(uj+kaxjk+ei:jk, u'j+vk+« jk+e i:jk) - av and for students in different neighborhood but the same school is I I I 2 c°v(uj+vk+fljk+ei:jk’ uj+v k+fl jk+e i:jk) au From these covariances we can get two more "intra-unit" correlations; the "intra-neighborhood" correlation, 2 2 2 2 2 p - av / (au +0v +0,r +ae ) and the intra-school correlation 2 2 2 2 2 p - au / (au +0v +0,r +0e ). Again each intra-correlation shows the ‘proportion. of total variance between students which is due to the differences between neighborhood or between schools respectively. Now we have all necessary information to form the full dispersion matrix, V, for total sample. The full variance-covariance matrix, V, contains cell matrices of school j and year k where there are n , ij' jk students. We can form these cell matrices which can be classified into four types. The first type of cell matrices represent the variance- covariance matrices for those students in the same school and 23 neighborhood. These matrices take the diagonal positions of the full matrix, V. This matrix contains the variances of students within a cell, au2+av2+ax2+aez, as the diagonal elements, and the covariances, auz-i-avz-raxz , as the off-diagonal elements. For off-diagonal block matrices of the full matrix of V, there are three kinds of block matrices. One represents the blocks which consider the students in the same school but in the different neighborhood. In this case all elements are ouz. The other one represents the students in the same neighborhood but different schools. This block contains av2 in all elements. The last type of block matrices are for those students who have different neighborhood and school membership. This last type of matrices are null. Thus the full matrix, V, has somewhat complex structure, but is composed of only four kinds of block matrices; one kind of diagonal block matrices and three kinds off-diagonal block matrices. u t ev Mo u io The crossed random effects ANOVA model presented at the previous section is now reformulated into the crossed-multilevel model. Such reformulation will show the logic of crossed-multilevel modelling. W Recall the Equation (1.1) which reflects the research design where students are nested within the cells of a schools-by-neighborhoods cross-classification and both schools' and neighborhoods' effects are random. Since Equation (1.1) has a hierarchical structure, it can be represented into two stage formulae in terms of within-cell and between- 24 cell models. The within-cell model is Y1 jk - fljk + eijk (1.3) where the individual score, Yijk’ is composed of the average score of the cell classified by jg; school and kg; neighborhood plus an individual effect eijk - N(0, 02). Equation (1.3) is the traditional regression model with no predictors in the model except that fijk is allowed to vary randomly. Thus we pose between-cell model to explain the variation of the regression coefficients in the within-cell model. The mean score of a cell, fljk’ can be decomposed into the grand mean, p, across all cells plus the effects of school j, neighborhood k, and their interactions. Thus the between-cell model is specified as -p+a+b+c (1.4) ”3k J k 1k where u is the grand mean of all cell means, a is the effect of school J j, b is the effect of neighborhood k, c is the effect of cell jk. k jk Since the effects of school, neighborhood, and their interactions are all random, the distributional assumptions are; a J b)’ and cjk - N(0, rc). If we substitute the Equation (1.4) into the Equation (1.3) we get the Equation (1.1), crossed random effect ANOVA ~ N(0, fa), bk - N(0, T model. So the two models are identical. Here the random factor a.1 reflects the variation of cell means that is attributed to the effect of a school. Thus ,a is understood as a true parameter variance due to differences between schools. A similar meaning is applied to the Tb and 'c' Using this crossed-multilevel LI) 1' q. is). ‘ ~r: . 25 model we can get various information. First we can test the hypothesis that H0: p - 0 (i.e. the grand mean is zero), but it is not usually of interest in this model. We also pose hypotheses about the parameter variances as well. The null hypotheses are: var(aj) - 0; var(bk) - 0; and var(c - 0. Jk’ was The basic rationale of model specification of crossed-multilevel model for each study is the same as in the other available multilevel models (i.e. H M). First, we examine the variability among students in the hierarchical structure via the base model. Second, we specify the within-cell model only to reduce the within-cell variation and no predictor variables are used at between-cell model. In this stage we can determine whether the effects of the within-cell variables (i.e., student variables) are random or fixed across the higher level units by examining the intra-unit correlations of each within-cell slope. If the intra-unit correlations of a certain within-cell slope is close to zero then we can fix the effect of that predictor and the within-cell model become a mixed linear model. Third, after completion of the within-cell model specification we start the specification of between-cell model in order to identify variability among higher level units as a function of between-unit variables. The criterion of the model specification is a coe det a o , R2, which has similar meaning in the regression analysis. While R2 means the proportion of the explained variation by a model given the total observed variation in regression analysis, the R2 in crossed-multilevel model is obtained based on the true parameter variances. Therefore the meaning of R2 in this case is 26 that the proportion of explained true variance by a model given the total parameter variances from the model based on no predictors. Suppose we complete the model specification via the above three steps and have a final model where two within cell variables are employed but one of them has fixed effect, i.e. pretest score, while the intercept is random across the groups and the remaining student variable (e.g., SES level) has random effects across the schools only. Again suppose one school variable (e.g., school size) and one neighborhood variable (e.g., crime rate) are identified as an significant predictors in the final model. By using matrix form, the within-cell model is ij - xjk Bjk + ejk (1.5) I- r- - r- T Y1 i 1 x11 x21 [ 50 ] e1 - 51 + 52 JR LYndjk _1 x1n indjk _en‘jk where ij is a vector of students' posttest scores, xjk is the matrix of student predictors where X1 is students SES level and X2 is the pretest scores that have fixed effects. Since the within-cell model has both fixed and random effects it is a mixed linear model itself. In the following between-cell model we need to model the within-cell parameters, the 8's, as a function of the group level variables. The between-cell model is 27 ”3k ‘ “1k r 8 l w w w 0 0 0 F 1 - [pg] - [o 0102 03lw10] 182 52 jk o o o o o o 1 jk 702 703 710 711 . 720.1 + Rjk aJ + 631‘ bk + Tjk cjk 1 o a 1 [ b 1 + 1 [ c l + 01 [a2 L o 0 k 0 0 3“ oo jk o jk o jk (1.6) Here fijk is a vector of within cell parameters; ij is the matrix of group level variables; 1‘ is the matrix of fixed effects of group level predictors; Rjk’ cjk’ and Tjk are the selection matrices for random residual effects; aj, bk’ and cjk are the random residual effect vectors with distributions aJ - N(0, ra), bk - N(O, 1b), cjk An interesting feature of the above Equation is the presence of Rjk' cjk’ and Tjk’ which serve for identifying the residual random effects. If the all three within-cell parameters, ,80, [31, fiz, are random across - N(0, re). the schools and neighborhoods including the interactions between the two factors, then the three selection matrices, Rjk’ Cjk’ and Tjk’ are all identity matrices with the dimension of the number of within-cell parameters. If one represents the between-cell model in a scalar form, one will see how the within cell parameter, 132, has a fixed effect. These three selection matrices are useful for a practical purpose. Analysts may need to set some particular residual effects to zero to fix the relevant variable effects on outcome. If the three selection matrices are null, then analysts are assuming that the within-cell slopes are all fixed and the crossed-multilevel model become the 28 standard ordinary least squares regression equation with single variance component, 02. Another interesting feature of Equation (1.6) is the format of ij which shows a different set of predictors for each within-cell slope. Here, W1, W2, and W3 are the school size, crime rate, and their interactions of the jkgh cell respectively. Because the within-cell slope, 82, has no group level variation, it has been fixed and no group level predictors are included in the matrix of group level variables. Thus the model has great flexibility in model specification by allowing a different set of predictors for each within-cell slope. The above between-cell model is a multivariate model because there are multiple outcome variables (B's) for each cell, jk. If we combine the two equations by substituting the Equation (1.6) into the Equation (1.5), the model becomes Y jk - xjk ij I‘ 4» X8le a.j + ijk bk 4- chk cjk + ejk (1.7) where Xa X Equation (1.7) is jk " xijjk; x'bjk ' xjkcjk; cjk ' xjijk' considered as a mixed linear model where the first term of the right hand side is a fixed portion and and remaining terms are all random. e as eve o The crossed random effect ANOVA model and reformulation of it into the crossed multilevel model presented above explicitly describes the design characteristics of the data and allow specification of appropriate error structures that solve the problem of misestimated precision. We shall now see a general form of crossed-multilevel 29 model. To make the presentation concrete, suppose again that we wish to estimate a regression equation for each of many cells classified by schools and neighborhoods. Our aim is to discover whether these regression equations differ across the cells and , if they' do, to explore the reasons why they vary. To the extent that these regression parameters do vary, we want to ask: what school and neighborhood characteristics are associated with variation in these regression coefficients? To investigate this kind of problem we formulate a multilevel model which is composed of two submodels: a within-cell model and a between—cell model. The parameters of the within cell model are conceived as outcome variables in a between cell model. After formulation of multilevel model we will see how the model can be viewed as a special case of general mixed linear model. The presentation of crossed multilevel model is ordered in two forms according to its generalization: 1. matrix form with subscript 2. no subscript form. wts The regression formulas provided in the previous section are useful for seeing the exact structure of the equations. The formulation of a general model expands notation and facilitates derivation of the estimation formula. We first write the within-cell model which corresponds to the equation (1.5) as ij - xjkfljk + ejk’ (1.8) 30 T . where ij - [Yljk’ ’Ynjk] is an njk by 1 vector of achievement scores; T pjk - [flOjk’ ’fipjk’ ’fir-ljk] is an r by 1 vector of micro parameters; T ejk - [eljk’ ’enjk] is the njk by 1 vector of random errors assumed normally distributed with a mean vector of zero and dispersion matrix ijk; and jk xlljk ° ' ° xr-1,1jk _1 xlnjk . . . xr-l,njk" The between-cell model in matrix notation corresponding to the equation (1.6) is fijk - ijy + Rjkaj + Cjkbk + Tjkcjk (1.9) T T T T where 1 - [10 ,...,1p ""’1r-1 ] and T T 7p - [7p0’1pa1""7pas’7pbl’""7pbt’7pc1"'"7pcu] . The elements of 1PT are the parameters capturing the structural relationship between the p 3h within-cell slope and the predictor variables of schools, neighborhoods and their interactions. T "arl-l]j and a1 - N(0,ra), T 0"°"br2-l]k and bk - N(0,r a - [ao,.. J b-[b k b); 31 c and c - N(0,rc). - [c c 1 T jk O""’ r3-1jk jk The dispersion matrices 'a’ rb, and re are all full symmetric matrices with the dimension of r1, r2, and r3 which reflect the number of micro- parameters that vary randomly across the schools, neighborhoods and school-by-neighborhood interactions respectively. The matrix of group level variables is p ”3k ' 1 “Oalj"'w0cujk 0 l wpalj . . 'wpcujk O - 1 wr-lalj ' ' 'wr-lcujk‘ The elements of W are somewhat complex, because the predictors are jk chosen form a variety of resources; schools, neighborhoods and the interactions of the two macro units. For example, the first element, WOalj’ means that it is the value of first school characteristic, at al, school j, predicting the intercept of within-cell model, ,80. The design characteristics of ij allows a different set of predictors for each within-cell slope. Rjk is a random school effect indicator matrix with the dimension of r x r1, where r is the number of all micro parameters and r1 is the number of micro parameters that vary randomly across the schools. Cjk is a random neighborhood effect indicator matrix with the dimension of r x r2, where r2 is the number of micro parameters of within-cell model that vary randomly across the neighborhoods. '1‘jk is a interaction random effects indicator matrix with the dimension of r x r3, and r3 is 32 the number of randomly varying micro parameters across the cells. ij becomes a partitioned matrix with block diagonal structure where the row vectors are stacked along the main diagonal. 1 is constructed of subvectors, one for each of the r outcomes. The subvectors are "stacked" on top of each other. The total number of elements of 1 is F, where F- E Fp and F? - 1 + 3p + tp + up for p - 0,...,r-l, and sp is the number of fixed row effects predicting p th within cell slope, flpjk; tP is the number of fixed column effects predicting fipjk; up is the number of fixed interaction effects predicting flpjk' The between cell model is a multivariate model because there are "r" outcome variables for each cell, jk. Because we allow a different set of predictors for each within cell slope, fipjk’ we need to note FpflFp, for 5p and 8p,. a w u b t To further simplify notation and subsequent presentation of estimation formulas, we now rewrite the general crossed multilevel models without subscripts. This presentation is useful in two ways: First, it provides the matrix structure of the crossed multilevel model. Second, using the model without subscripts, it is easy to present estimation method. The within cell model becomes Y - Xfi + e, e - N (0, e) (1.10) where Y - [Y11T, ,deilT, a - [p11T, ... .fiinlT. e - [e11T, ,eJK?]T, and 33 X - Diag [xjk] - X11 0 xjk O . XJKJ The between cell model is p-w1+Ra+Cb+Tc, (1.11) T T T T where 1 - [1o ’ ... ,1p , ... '7r-1 ] , T T T W - [W11 , ... ,WJK ] , T T T a - [31 , , , . ,aJ ] , and a ~ N( 0, 08), T T T b - [b1 , , , . ,bK ] , and b - N( 0, 0b), T T T c - [c11 , . . . ’cJK ] , and c - N( 0, 0c), where 0a - subdiag(ra), 0b - subdiag(r and 0c - subdiag(rc). b)! R - Rjk O MR where Rjk are always r x r1 elements of a single '1' and all other elements are zero. As noted earlier matrices where each column has the elements of 'ls' in Rjk indicate the presence of random effects of within cell slopes. MR is G x J matrix which determines the row membership of a cell; ”G” is the total number of cells with data and ”J" is the total number of row groups, that is schools. The operator ” G ” means Kronecker product. C - Cjk a MC where MC is G x K matrix which determines the column membership of a cell. T - TJk e MT, where MT is G x G identity matrix. 34 Ge e a xed ea ode If we substitute the between-cell model (Equation 1.11) into the within- cell model (Equation 1.10), we obtain the single model Y - XW1 + XRa + XCb + XTc + e This model can be restated in simpler notation as Y - XW1 + Xla + X2b + X3c + e (1.12), The new matrices of X1 to X3 represent the XR, XC, and XT. This combined model can be reformulated into general mixed linear model as Y - A101 + A202 + e, (1.13) T where A - [X W], 01 is 1, A2 - [xlIXZIX - [aT, b , cT]T, and e is the 1 31' ”2 same as defined previously. The author will use these two models, Equation (1,12) and (1.13), for the presentation of estimation theory in chapter two. CHAPTER I I EMPRICAL BAYES ESTIMATION OF RANDOM EFFECTS IN THE CROSSED MULTILEVEL MODEL Wan At present, the estimation theories contributing to crossed multilevel data may be classified in two research streams. The first stream may be viewed as classical variance component analysis which started as early as the 19303. In this stream the researchers attempted to estimate various random effects ANOVA or ANCOVA models. Graybill (1961) extensively treated the estimation problems for the constants and variances in the linear model with balanced designs and demonstrated the optimality properties of the classical analysis of variance procedures. In the case of 'unbalanced factorial and nested data' , there may be four subgroups of estimation methods. They may be labeled, 1) Method of Moments (Henderson, 1953; Searle and Henderson, 1961; Cunningham and Henderson, 1968), 2) MIVQUE or MINQUE (Harville, 1969; LaMotte, 1970; Rao, 1972), 3) Maximum Likelihood (Hartley and Rao, 1967; Harville, 1977), 4) Restricted Maximum likelihood (REML) (Patterson and Thompson, 1971; 1974; Harville, 1977). These four methods are currently implemented in standard statistical packages, such as SAS and BMDP but all are limited to the estimation of variance components. Numerical applications are available under mixed ANOVA or ANCOVA models which do not allow continuous group-level variables (see BMDP,1985,pp413-435). The more general approach for hierarchically nested data was made in the second research stream. Researchers in this stream use the term of 35 36 'multilevel analysis' because they attempt to estimate both micro— and macro-parameters. There are several researchers who developed estimation methods independently. Lindley and Smith (1972) derived Bayesian estimates for the hierarchical linear model; Smith (1973) compared Bayesian and least squares estimates for' hierarchical linear' model; Dempster, Laird, and Rubin (1977) established the EM algorithm for ML estimation for covariance components models; Longford (1985) developed the Fisher scoring algorithm for ML estimation of covariance components in multilevel mixed linear models; Goldstein (1986) developed an iterative generalized least squares estimation; Laird and Ware (1982), Strenio, Weisberg, and Bryk (1983), Mason, Wong, and Entwisle (1984), and Raudenbush and Bryk (1986) developed. approaches to estimation rusing empirical Bayes estimation method via EM algorithm. These methods have been widely applied for the analysis of hierarchically nested data and can be considered as candidate methods for the analysis of crossed multilevel model. This thesis follows the flow of empirical Bayes estimation using the EM algorithm for the numerical analysis of crossed multilevel model. While the Bayesian theory formulates a prior density for the variance and covariances, the empirical Bayes theory uses maximum likelihood point estimates of the variance and covariances of the prior that maximize their' marginal posterior distribution (Efron and Morris, 1975). The term empirical Bayes came from the fact that it uses empirical data to estimate the variance and covariance parameters. For the application of empirical Bayes estimation, the EM algorithm produces the ML estimates of parameters by substituting the expected complete data sufficient statistics into the formula for complete-data ML 37 estimation of the parameters. The present chapter discusses how empirical Bayes estimation theory can be applied to the estimation of crossed multilevel model. The next chapter presents computational EM formulas for obtaining ML estimates of the parameters. Here, I will describe Bayesian estimation theory for the model given the known fixed effects, variances, and covariances. Then in chapter 3 I will show how EM algorithm can be used for estimation of crossed multilevel model. The Bayesian Model with Known Covariances Using the notation and results of Raudenbush (1988), the Bayesian linear model takes the form of general linear model, Y-A6+e, e - N(0, W) (2.1), where Y is N by 1 vector of outcomes; A is N by P matrix of predictors; 6 is P by 1 vector of parameters; e is N by 1 vector of random errors. The critical difference between the classical linear model and the Bayesian model lies on the conception of the parameters, 9. The conception of the parameters in classical linear model can be summarized as follows: 1) 9 is a fixed unknown; 2) we compute O as our estimate based on the sample of data; 3) if we replicate the same study many times, the 8 values computed will vary with a mean of 9 and a dispersion var(G) and the distribution of 0 is called the sampling distribution; 4) we use the 38 sampling distribution of 8 for the inferences about 9. In the Bayesian point of view, the parameters, 6, of Equation (2.1) themselves have a general linear structure in terms of other quantities which they call hyperparameters (Lindley, & Smith, 1972). Therefore all parameters of 9 are viewed as random and hyperparameters are viewed as fixed. Since 9 is random, we may propose its distribution, 9 - N( é , o) (2.2). The Bayesian views this as our prior distribution of 9 and the prior distribution represents our state of knowledge on 6 (Smith, 1973). Therefore our prior knowledge about the location of the parameter vector is 9, and the precision of this knowledge is measured by 0-1. In the context of Bayesian estimation, the prior distribution of parameters is assumed to be known, and the posterior distribution of parameters given the data and prior parameters needs to be found. The procedure for finding the posterior distribution involves Bayes theorem, which states that the posterior density function f(6|Y) is proportional to the product of two independent density functions: the likelihood L(Y|9) and the prior distribution P(9). Hence the posterior density function is, f(6|Y) a L(Y|6)P(6) (2.3). Bayes theorem incorporates the prior information of the parameters and the observed information from the sample. The right hand side of 39 Equation (2.3) can be easily seen as joint probability density function of observed data Y and prior information of 9. Under the assumption of known dispersion matrices, B and 0, the joint normal distribution of the data, Y, and the parameter 6 is f(Y,6) - L(Y|6)P(9) (2.4) - clexp[-1/2(Y-Ae)Tw'1(Y-Ae)1c2exp[-1/2(e-é)To'1(e-é)1 where C1 and 02 are normalizing constants for the two normal densities. Thus the posterior distribution of 6 is proportional to the joint normal density function. T -1 - T -1 - f(6IY) « exp[-1/2(Y-A9) w (Y-A6)] exp[-1/2(8-6) o (9-9] (2.5). * Equation (2.5) determines 6 , the posterior mean of 6, and the dispersion D * Th 1 e. at S f(9IY) « exp[-1/2 (e-e*)ne*’1(e-e*)], (2.6) where 9* - 06*(0'16 + AIW'IY) (2.7) and D9* - (0-1 + ATi-IA)-1. (2.8) Estimation dheory for the Bayes linear model is presented in Lindley’ and Smith (1972) and. Dempster, Rubin, and.‘Tsutakawa (1981). Bayesian inferences are based on these posterior estimators from which we can get point estimates and intervals. 4O eo o l o The Bayesian theory presented above requires known variances and covariances in order to get posterior means and dispersion matrices for parameters under consideration. In the empirical Bayes method, the parameters are estimated from the joint normal posterior distribution given the ML estimates or other consistent point estimates of the hyper- parameters for their' unknown 'values. This empirical Bayes estimation method is applied in the EM algorithm, where the ML estimates of hyperparameters defined through empirical Bayesian methods are used for finding the expected complete data sufficient statistics which in turn are used for ML estimation of the parameters (Dempster, Laird, and Rubin, 1977). The name EM_elge;1§hm comes from the characteristics of an iterative routine which cycles through an Expectation step and a Maximization step at each iteration. The expectation step finds the posterior expectation of the sufficient statistics based on the 'complete data' given the 'observed data' and current estimates of parameters. The Maximization step then uses the expected sufficient statistics to produce an estimated values of unknown parameters under estimation. These two steps cooperate to increase the likelihood function of the estimated parameters. To make the procedure of EM algorithm concrete, consider a linear model with a sample size of n and we want to estimate the variance components of the model as, Y’- X5 + e, where Y is a n by 1 vector of observations; X is a n by p matrix of predictor variables; 41 fl is a p by 1 vector of regression coefficients; e is a n by 1 vector of random error, and the distributional assumption is e - N (0, 02I) and p - N(6, F) as a prior. The EM algorithm uses the expected value of sufficient statistics of the complete data conditioned on observed data and current estimates of parameters as a proxy for the summary statistics of "complete data". The 'complete data" consist of Y and the true values of parameters, here fl and e. By employing the assumption of having complete data, we estimate the sufficient statistics of complete data by the conditional expectation as E(e'eIY,021) in E-step, where e'e is the complete data sufficient statistics, Y is the observed data, 021 is the current estimate of 02 at the i331 iteration. The term 'I' can be read as 'given the data of'. Suppose we have a vague prior on 8. Then the variance components of F become infinitely large and P-l-eO and the Equation (2.9) become the functionally same model as a standard linear model. Hence we pose the conditional distribution of 8 given the data as 21 - N(fii. V) (2.10) plY,a where pi - (x'X)'1x'Y v - a21(x'X)'1. The sufficient statistic is then 42 E(e'e|Y,021) - E[(Y-Xfi)'(Y-Xfi)IY,021] (2.11) - Er (Y-xaiwi-xm ' safari-Xe) b.0211 - (Y-xnb'w-xrsi) + arm-pirx'xw-fi‘)Iran] - SSres + tr[X'XVar(fllY,021)] - SSres + tr(X'X)021(X'X).1 - SSres + p021 The result of Equation (2.11) is the proxy of the sufficient statistics for the complete data. In the M-step we simply get maximum likelihood estimates of 02 as 02(i+1) - [SSres + p021]/n (2.12). 2(i+1) The resulting value of o is then used as input for the next E-step. A The reader will notice that if the current estimate 021 - SSres/(n-p), then 62(i+1)- SSres/(n—p) also. Indeed, if 621 differ from SSres/(n-p), the iteration will nevertheless converge to that estimate. For the use of EM algorithm, there remains a choice between two likelihood functions for ML estimation of variances and covariances, which is labeled as MLF and MLR (Dempster, Rubin, and Tsutakawa, 1981). Consider the general mixed Bayesian model as Y - A191 + A292 + e, [:1 - Ni {:1 .1: :11 and e - N(0. V) The MLF method treats 61, w, and T as fixed parameters to be estimated. 43 To achieve such estimation requires the conditional distribution f(82|Y,61,‘F,T). In contrast, the MLR method treats w and T as fixed parameters to be estimated. To achieve such estimation requires f(91,92IY,i,T). In this thesis, I have chosen the MLF approach. Consider the general mixed linear model as Equation (1.13), which was Y - A191 + A262 + e, where the A161 is the fixed part and remaining right hand side of the equation is random part. To obtain the conditional density of 62 given Y, 91, W, and T, the model is modified as, d - A292 + e, where d - Y - A161 (Dempster, Rubin, 6: Tsutakawa, 1981). Then the equation ‘become a general Bayesian. model which enables us to find posterior means and dispersion. matrix of 9 using Bayesian. method 2 presented in the previous section in this chapter. e a u ati o sed u t ve Mod In order to use EM algorithm, we need to formulate the crossed multilevel model into the general Bayesian form of' Equation (2.1). Recall the combined model Equation (1.12) was Y - XW1 + Xla + X2b + X3c + e and the mixed linear model, Equation (1.13), was 44 Y - A181 + A292 + e. Since these two models are equivalent, we can posit the following identities. A1 _ [xv] (2.13) A2-[x1 ' x2 |x31 and 61 - 1 62 - [aT | bT | CT]. The related assumptions are that the fixed and random parameters are independent and that the random parameters, a, b, and c, are also independent. We change the general mixed linear model into the model with only random parameters such as d - A292 + e (2.14), where d - Y - A191 and the distributional assumptions on the new outcome variable is d ~ N (0 , A TA + W) (2.15) Other terms of the model (2.14) keep the same distributional assumptions as the mixed model. In order to make the equivalent crossed multilevel model, we also Change the model (2.14) into; 45 d - x1e + xzb + x3c + e (2.16), where d - Y - XW1 and the distributional assumption is T T T 2 d - N (0 , xlnaxl + inbxz + x3ocx3 + a I) (2.17) Now, the models (2.16) and (2.14) are equivalent. Jo o l D str b t o The necessary step for posterior estimation is the specification of joint normal distribution of the modified model. The expected mean vector of the dependent variable is E(d) - E(A292 + e) - 0 and the expected mean vector of the parameter is E(62) - 0. The variance of outcome vector is T Var(d) - AZTA2 + i and the parameter variance is Var(62) - T. The covariance between 'd' and '62' is Cov(d, 62) - Cov(A262 + e, 62) - AZT' Hence, the joint normal distribution is T d _ 0 AZTAZ + w AZT N T (2.18) 92 0 , TA2 T The equivalent distribution of the crossed multilevel model can be 46 obtained by substituting A and T with their corresponding terms. Hence 2 'd‘ ' '0' ’xnxT+x XT+X0XT+021X0 x xo" 1 a 1 20b 2 3 c 3 1 a 20b 3 c a ' N o o x T o o 0 a1 a T b o obx2 0 ab 0 c o o x T o o o - - - . ~ - c 3 c - ~ P ste or st ibutio o arameter Having specified the joint normal distribution of modified crossed multilevel model, we now need to get the posterior distribution of the parameters, a, b, and c. By using the definitions of (2.13) and by applying the posterior estimation method, we now can pose the posterior distribution of parameter vector of the modified version of the Bayesian model. Then we translate the results into the terms of crossed multilevel model using the identities defined before. The posterior distribution of the parameter given the data is - * * 62Id,T,W N (92 , ”92 ) (2.20), where * * T -1 92 - D92 A2 1 d (2.21) and T -1 -1 -1 032 - (A2 0 A2 + T ) (2.22) Equations of (2.21) and of (2.22) show that the posterior distribution of parameters requires the prior information of the parameters; W, T, and 9 Here we assume that the information of the W, T, and 81 are known. 1. 47 I will first show the posterior dispersion matrix and then move on to the posterior mean of parameter vector in the crossed multilevel model. 0 e t 0 ar te S Equation (2.22) is now rewritten in terms of crossed multilevel model, T 2 -1 T T X1 X1 + a 08 X1 X2 X1 X3 -1 * 2 T T 2 -1 T D92 - 0 X2 X1 X2 X2 + a 0b X2 X3 (2.23). T T T 2 -1 X3 X1 X3 X2 X3 X3 + a Do To derive the inversion of the above matrix, the following definitions are useful. x TX + 020 '1 x Tx 311 312 + 1 l a l 2 B ' T T 2 -1 ' 21 22 (2'2“) x2 x1 x2 x2 + 0 ab 3 B T B - T - x2 x3 B2 T 2 -1 U - X3 X3 + 0 0c With these definitions we say * 3+ B -1 D _ (2.25) 92 3T 0 . The advantage of this representation is that we can now apply the partitioned matrix inversion method since (2.25) is partitioned in two- by- two form. The results of the inversion (Searle, 1982) is 48 * H-l -(GH-1)T D - (2.26) 92 -(GH 1) U 1 + CH 1GT , where H - 8+ - BU-IBT c - U'IBT. Equation (2.26) requires the two inverted matrices, U.1 and H4. The dimension of U.1 is determined by the number of columns of the matrix X But the 'H' matrix has two-by-two partitioned form again. We apply 3. the same inversion method as applied for (2.25). For the inversion of the 'H' we use the definitions at Equation (2.24) and rewrite it into simpler notation as 311 - B U'la T 312 - B U'IB T H n 1 1 1 2 11 12 H - 321 - B U'la T 322 - B U'ls T - H H (2.27) 2 1 2 2 21 22 _1 H11 H12 -1 H11 H12 “an H - - (2.28) n n H21 H22 21 22 11 -1 -1 where H - (H11 - H12H22 H21) 12 111 -1 H - -H H12H22 22 -1 -1 111 -1 -1 -1 H ' H22 + (“22 "21)H ("12“22 ) °r ("22 ' H21H11 "12) ° Once we get the inverted matrix 'H', we can apply the results to the inversion of the matrix (2.26). From (2.26) we have 49 p H11 H12 _H1181U-1_leBZU-1 . * 2 21 22 21 -1 22 -1 D92 - a H H -H 310 -H BZU (2.29) U-l+U'1[81TH11B1+B2TH2181 (Symmetric) T 12 T 22 -1 L +31 H 152+];2 H 152 ]U j * Now the task is to reexpress the each block of D82 in terms of the crossed multilevel model. The Equation (2.29) shows that the solutions of each block are linked each other. Thus more compact notation for the matrix (2.29) is useful. v11 v12 V13 Let D - a v21 v22 v23 (2.30). v31 v32 v33 Using the definitions at Equations of (2.24) and (2.27), and by collecting the results of Equations (2.26) through (2.29), we can obtain the following results : 11 -1 -1 -1 ‘V11." H ' (“11 ' H12H22 H21) ' (H11 ' Ga) (2'31) 12 111 -1 T T -1 v12 " H " '“ H12‘122 ' "’11 x1 Mx2n22 22 -1 -1 11 v22 " H '"22 + ("22 "21)H T -1 (“12"22 ) -1 -1 T x T -1 ' H22 + H22 x2 Mlell 1 Mx2H22 1 -1 -1 °’-' (“22 ' H21H11 H12) ' (“22 ‘ Cb) T T -1 v13 .. «(vllx1 X3 + v12X2 X3)U 50 v23 ' "(V21X1Tx3 + "22"213‘3)”.1 v33 - U'1 - (v13TX1TX3 + v23Tx2Tx3)U'1 v-vT v-vT 21 12 ' 31 13 where “11 - xlTMx1 + 0208-1, H22 ' szsz + 020b-1’ M - I - x3TU'1 3, Ga ' xlTMx2322-1x2TMx1' c - x T 'lx Tux . b 2 Mxlull 1 2 We need to note that the two terms H11 and H22 are subdiagonal matrices and their inversions are easily obtainable. The complexity of these computation can be reduced if we note the fact that each component is the function of other components. Hence once we have information of U-1, -l 22 , we can get v11 which serves for the estimation of v12 which, in turn, serves for the computation of v13, and so on. and H o te 0 Mean 0 e e Ve 0 Having obtained the posterior dispersion matrix of the crossed multilevel model, we now consider the posterior expectation. Recall the Equation (2.21) which was * Assuming we have the estimates of 91 and we have D62 , the matrix operation become straitforward. 51 Concerning the dispersion matrix, recall the assumptions for the crossed multilevel model in which the column, row, and interaction effects are mutually independent. In other words, the proposed prior dispersion matrix has block diagonal form where (Ia, (lb, and 0c are at its diagonal position. 08, ab, and 0c are also block diagonal matrices with submatrices of re, and rc at their diagonal position. These 'b’ three submatrices are full matrices with dimension of r1, r2, and r3 and they take the diagonal position of 08, 0b, and 0c respectively. However the posterior dispersion matrix which can be obtained after data observation is not block diagonal. The off diagonal submatrices, v12, v13, and v23, are not null. We need to use the obtained full matrix of s 062 for posterior estimation of the parameter vector as the equation (2.21). The equation (2.21) become as follow. * T T T a vnx1 + vux2 + v13X3 * T T T 92 - b* - v21X1 + v22X2 + v23X3 d (2.32) * T T T c v31Xl + v32X2 + v33X3 where X1, X2, and X3 terms reflect the data of the predictors classified as column, row, and their interaction variables respectively. To simplify the results of Equation (2.32), we need to first expand the results, using the operational results of Equation (2.31), and simplify them by noting the interrelationship of the submatrices of Equation (2.31). After a somewhat complicated operation of the relevant submatrices, we can arrive the following results. 52 ‘1’ b* " V22Qb' * -1 T * * c -u x3 [d -(X1a +X2b )1, 'r 'r -1 T where Qa - x1 Md - x1 £21122 x2 Md, 1' T -1 T ob - x2 Md - x2 MXlHn x1 Md. All terms included in Equation (2.33) are previously defined at Equation (2.31). CHAPTER III. COMPUTING ESTIMATES OF CROSSED MULTILEVEL MODEL PARAMETERS This chapter will present the technical aspects of implementing crossed multilevel analysis. The author will show how the EM algorithm provides the estimates of crossed multilevel model. Mgdel The model defined for MLF estimation was d - A292 + e, where d - Y - A161; 92 N(0, T); e - N(0, i) with appropriate dimension. 61 is fixed parameter while 62 is considered as random. The corresponding crossed multilevel model is d - Xla + X2b + X3 ‘where d - Y - XW1 c + e, a - N(0 , 0 ) a b - N(0 , 0b) c - N(0 , 0c) and. e ~ N(0 , 021). As before, the following definitions are useful to link the above two models; T T T T Al-XW, A2-[X1|X2|x3],61-1,62 -[a |b |c]. 53 54 The error, e, is an N by 1 vector which is stacked from the top with all within cell error vectors, ejk’ which are uncorrelated across the cells. The within cell errors are njk by 1 vectors where the njk elements are also independent. Therefore the dispersion matrix, W, is a N by N diagonal matrix which composed of the dispersion matrices of all non-null cells. Hence the dispersion matrix of each cell is 021 and the overall dispersion matrix is Pozlnll 1 0 VP - - 021, 0 _ 02anK . where N - E E njk - 2 n8. The random parameter variance matrices, Oa’ 0b, and 0c are both diagonal block matrices with submatrices of ’a’ Tb’ and fc at their diagonal position across the row, column units and their interactions. The three submatrices, r and re, are of full matrices with the a’ fb’ dimensions of the number of within-cell random slopes across the row, column units and their interactions as stated in chapter I. s ete ata In the empirical Bayes method, random effects are estimated given the ML estimates of the fixed unknown parameters. In ML estimation, the logic of EM algorithm is to use the expected ‘value of sufficient statistics of the 'complete data' given the observed data and the previous estimates of these fixed parameters as a substitute for the 55 summary statistics of 'complete data', and perform ML estimation based on the assumption that we had observed the complete data. In our case, the complete data consist of the outcome variable Y and the true values of random effects, 62 and e. The fixed unknown parameters include 02, 'a’ 'b’ 'c’ and 1. By employing the assumption of having the complete data, ML estimation can be simple to derive. To find the complete data sufficient statistics, I will first use the joint likelihood function of the complete data and parameters for MLF estimation and then translate the results in terms of the crossed multilevel model. Given the model defined at Equation (2.14) and (2.15) and by referencing the Equation (2.4), the joint density function of the parameters and the complete data is £(d,ezlel,T,w) - f(d|e T,w) f(92|el,T,w) (3.1), -l/2 2’61, T.w> - [<2«)NI*|1 -1/2 Where f(dle eXP[(-1/2)(d-A262)Tfl-1(d-A262)], 2'91' {(92|91,T,w) - [(2«)F|T|] T -l exp[('1/2)(92 T 62)]' The term, F, in the second equation on the right hand side of Equation (3.1) is the total number of elements of 62. Hence the log-likelihood function will be L a (-1/2)(3-A292)Tw'1(3-A292) - (1/2)92TT'192 A ‘where d - y - A 9 and 9 - (AITAI)-1A1T(Y - A282). Let e - d-A 6 1 l 1 2 2' From the above complete data log-likelihood, we see that 8T8 and 9262T are the sufficient statistics for 02 and the variance components T. Also T A1 Y and AlTAZG2 are the sufficient statistics for 61. Having obtained the set of complete data sufficient statistics of the 56 . “T“ T T modified baye51an model; (e e, 6262 , and A1 A292), we now need to translate them in terms of the crossed multilevel model. Concerning the term, 92621., we know that 62 is a vector of random effects as T T T T T T 92 - [a1 ,. ,aJ 'bl ,. . .,bK |c11,...,cJK ], T . and aj - [an’ . . . , arl-l,j]’ T . bk ' [bOk’ ' ' ' ' br2-l,k]’ c T - c T — [c c ] jk g Og’ ' ' ° ’ r3-l,g ’ for j -Ilq...,J rows; k - l,...,K columns; and (jk) - g - l,...,G cells classified by the jet; row and kg) column; where r1 is the number of within-cell slopes that are random across the row units; r2 is the number of within-cell slopes that are random across the column units; r3 is the number of random within-cell slopes regarding the cells classified by row and column units. Under the assumptions that the rows, the columns and their interaction effects are mutually independent and that each row unit is independent from other row units, each column unit is independent from other column units of complete data and, given the row and column effects, the cell is independent from other cells, the sufficient statistics we need for random residuals are; :2 a.a T, a b b T, 2 c c T (3.5) J J k k s 3 Now it become trivial to get the ML estimator of T as -J'12aaT (3-5) '4) -1 T - K 2 bkbk '5) O‘ 57 A A A A Concerning the terms of eT e, where e - d - A262 and d - Y-Alél, A we use the cell level notations of the vector e and use the identities between the Bayesian and crossed multilevel model. Then the sufficient statistic for 02 of complete data is A TA A T 2 2 eJk ejk - 2 2 [djk - (lekaj + x2jkbk + ijkcjk)] x [djk - (lekaj + x2jkbk + x3jkcjk)]' (3.7) Then the ML estimator of 02 is A 02 - (l/N) 2 2 ejkT ejk (3.8). Finally the sufficient statistics of the estimator of 61 are AITY and AltA292' the corresponding cell-level expression for AlTA292 in crossed-multilevel model is T T T A1 jk A2jk e23k ' “3k xjk (xljkaj + x2jkbk + x3jkcjk)' A Then the equation 81 - (AlTAl)-1A1T(Y - A262) can be rewritten in computational form in terms of crossed multilevel model as 7 - (2 2 wjk xjk ’51.ij (2 2 W3,, X31, 39,. - 3 E wjk x31. (X13133 + x2jkbk + X3chjk)) (3-9)‘ .Equation (3.9) shows that the complete data sufficient statistics for restimating the fixed effects of the crossed multilevel model are; 58 T T T T T T 22 wjk xjlc ij, 22 wJk xjk X13133: 22 wjk xJk ijkbk, 22 w T T (3.10) 31: xjk x3jkcjk' Having found the complete data sufficient statistics for ML estimation of r Tc’ and 02 and 1, we need to obtain the expected a’ rb’ values of the complete data sufficient statistics conditioned on the observed incomplete data Y and the estimates of the parameters. The EM algorithm then uses the values of the conditional expectation of the complete data sufficient statistics as the proxy of the complete data sufficient statistics for ML estimation of parameters. The following sections will show the EM formulas for parameter estimation. 9* EE_E2rmula_f2r_Estiaatina_ths_£iaad_£aramster ( 1 ) As noted at prior section, the complete data sufficient statistic for T 1 is the term A1 A292. If we have complete data we can directly apply the value of AITAZO2 for ML estimation. Since the actual data is not complete, the EM algorithm uses estimating the posterior value of 6 the following conditional expectation to get the substitute for the complete data sufficient statistic for ML estimation as T 2 T * A1 A2E[62|T, o , el, d] - A1 A262 (3.11), where the right side of 'I' includes incomplete data 'd' and the current estimates of T, 02,and 81. Since A1, A2 are given, the posterior expectation applies only to 6 . Therefore the result of (3.9) 2 at a- is A1TA262 , where 62 is the posterior mean of 6 as presented at 2 chapter two. Once we have the estimates of 02, T, and 1, we can obtain S9 * 'k 62 from Equation (2.33) and then the estimate of the fixed parameter 1 is T T T T * ij xjk ij - z 2 ”3k x.Jk (xljkaj * + x (3.12). T T * -1 1 - (2 2 ij X.jk xjkwjk) [2 2 * + x2jkbk 3jkcjk )1 0 st matin t e arameter Variances If the data is complete then the sufficient statistic is 2a to a T JJ get ML estimates of 18. With EM algorithm, the data are incomplete. Thus jT Finding this we substitute E[£ a aJTId, W, 1, 1] for 2 a a J J conditional expectation directly follow from the standard theory. The dispersion of aj, Var(a ), is defined as, J ) - E(a a T) - E(a )T 11 ”(8 Var(a (Searle, 1982). J J J This may be restated into a more useful form, E(a a T) - E(a )T 11 ”3‘“ ). + Var (a J J J This equation is for unconditional expectation. It is the analogue for conditional expectation given the 'incomplete data' Y and current estimates of (18, 'b’ fc’ 02, and 1), so the expectation is 2 * *T 2 T E(Z ajaj Id,ra, 'b’ 'c’ a , 1) - 2 a1 aJ + a 2 vlljj (3.13). vllj j are the matrices taken from the diagonal position of v11 at equation (2 . 31) . * The maximization step for the estimation of 'a is accomplished by 6O * the simple operation. The complete data ML estimator of fa is, * * a T + 022 v * -1 'a 'J ‘2 8.1 j 1111) (3.14). * This completes one iteration for estimating fa . The posterior estimates * * of other two estimates, 'b and TC , are presented at equation (3.15) and (3.16) below; * -1 * *T 2 rb - K ( 2: bk bk + a z v22kk) (3.15) * -1 * *T 2 Tc - G (22 cjk cjk + a 22 v33jkjk)° (3-16) v22kk and v33jl - I(2«)'1/21F|T|'l/zexp{(-1/2><92TT'192)1 (3.28) and the denominator function is; f(ezld, T, w .91) -1 ZIP -1 2 * - [(2 > /1F|De§| / epr(-1/2)(e2 - 62*)H0921(92 - 92 )1 (3.29) * T -1 -1 -l T -1 T -l -1 -1 where 62- (A2 i A2 + T ) A2 i d, and D92*- (A2 T A2 + T ) These three equations from (3.27) to (3.29) hold for all 92, therefore * if we evaluate Equation (3.27) at 62-92 , we can obtain more simplified equations since the only remaining term from the equation (3.29) will be 66 -1/2 after eliminating the constant terms. The resulting equation is Ivegl then f(dl61.T.w> - [<2«>N|*ll'l/zlbegll/ZITI’l/zexpt<-1/2)s1 (3.30) h s e * d A e * w'1 d A e * e *TT'le * " ere ( 2 ) ' ( ' 2 2 ) ( ' 2 2 ) + 2 2 T -l * * T -l -l T -l - d i (d - A262 ) (by using 92 -(A2 W A2+T )A2 8 d) Equation (3.30) is the likelihood we seek. ng-iikeiihood for the Crossed Multilevel Model Given the Equation (3.30), the log-likelihood function is LLF(61,T,02|d) -(-1/2)Tog|w| + 1/210g|092*| - l/2logITI - 1/2S(92*) (3.31) In order to translate this log-likelihood in terms of the crossed- multilevel model, let us consider the four terms; le, ITI, [Dag], and 3(62*). First we consider the equivalence, i-azI, in crossed multilevel model as noted at chapter 2. Thus 2 loglil - Nloga . (3.32) The ‘prior parameter ‘variance T is the 'block. diagonal. with the submatrices of cell-level dispersion matrices, 'a’ 'b’ and 'c that have dimensions of r1, r2, and r3 respectively. Hence the dimension of T is (Jr + Kr + Gr Therefore the log of the determinant of T is the sum 1 2 3" of all diagonal terms as thr' 5 as The}: The . 67 logITI - Jloglral + Kloglrbl + Gloglfcl. (3.33) * For the determinant of posterior dispersion matrix D92 which is a three-by-three partitioned full matrix, we represent the matrix into two by two partitioned matrix and obtain the determinant of the whole matrix as [ v v v 11 12 13 * 2 I I (2“1“K1'2"Gr3)I d22 d23 I lDez I ' ” v21 v22 v23 ' 0 d d I v v v 32 33 31 32 33 (3.34), Pv V 11 12 where d22 - v v ] . 21 22 ' PV 13 T d23 - v J d32 - d23 , 23 ' d33 ' v33“ The second term of the right hand side of the Equation (3.34) is then 1 22 23 d (3.35) ' Id22|ld33'd32d22 23I 32 33 1 d -1 - '|V11IIV22’V21V11 v12"d33’d32d22 23" Equation (3.35) can be simplified more by using the previous results of Equations of (2.26) and (2.31) in chapter 2. Using the Equation (2.31), we can obtain 1I -1 - Iv22'V21V11 V12| ' IH22 68 where H22 is the block diagonal matrix with dimension of Krz. Again using the Equation (2.26) we get -1 -1 Id «1231 - IU I. 33'd32d22 where U is the block diagonal matrix with dimension of Gr3. Finally at - 8(92 ) - dTi' 1(d - A292*) can be replaced with * T T * * * 2 8(82 ) - 22[djk djk-djk (leka:l +x2jkbk ”(311‘ch )]/a (3.36). 9: By substituting loglill, longe‘fi’I, logITI, and 8(62 ) of Equation (3.31) with Equation (3.32) through (3.36), we arrive the final form of log- likelihood function for the crossed multilevel model. 2 LLF(02, T, elld) - (Jr1+Kr +Gr3-N)1oga -(Jlog|raI+KlogIrbl+Gloglrcl) 2 -1 -1 T + 1°8|V11l+21°8'322k I-i-XXloglUJk I-[ZEdJk djk 1‘ * at 'k 2 - mdjk (xljkaj+x2jkbk ”(311‘ch )]/a (3.37). The EM algorithm evaluates the log-likelihood at each iteration to monitor the progress of the algorithm. The deviance, say 6, between the log-likelihood at ith iteration and its value at (i+l)1:_h iteration will be computed at each iteration. A value of 6 close to zero indicates that the log-likelihood at the ith iteration to be very small in comparison with its value at (ii-1);}; iteration. The actual EM iteration will stop at 65k, where k is predetermined level. Es: Es: '.-L : 0“ SEE CHAPTER IV CHECKING THE ACCURACY OF THE COMPUTING ALGORITHM The author deveIOped the computer program, "Qrossed Multi-Level (CML) algorithm," that provides estimates derived from the crossed multilevel model using Gauss (version 2.0) language. The program is designed to use sufficient statistics of the cross-product matrix as input data and to perform thousands of calculations over numerous iterations of the EM algorithm. It is also designed to analyze data under general crossed multilevel modelling. Therefore an accuracy check based on hand calculation of all equations in numerous situations would be unreasonably demanding and quite unreliable. An alternative reliable way is to utilize already available computer programs, such as SAS and BMDP, that support some special cases of the crossed multilevel analyses and to use simulation methods for other cases of the model. This approach for checking accuracy of the algorithm is reasonable in two points. First, the computer program performs MLF estimation of the EM algorithm which produces maximum likelihood estimates as noted earlier. Some standard packages, such as SAS and BMDP allow maximum likelihood estimation for variance components models in which only an intercept of regression model at the cell level can be specified as random and all other regression coefficients have to be fixed. The author decided to compare the estimates between the programs presented in this thesis and the standard packages under the assumption these standard packages produce true maximum likelihood estimates when the crossed multilevel model has a random intercept. For the analysis 69 70 of crossed multilevel model with multiple random slopes in the within- cell model, a covariance components model, I used a simulation method to see if the program recovers the known parameter values. The procedure for checking the accuracy of the program is organized in Table l. The first column of the table lists the crossed multilevel models from the simplest case to the most complicated. The second column of table 1 tells us how the computational results were confirmed to be accurate . 71 Table 1. Procedure for checking the accuracy of the algorithm Models Empirical Evidence d andom ects nova Within-cell model: Yijk ' fijk + 813k Between-cell model: fijk - 10 + aj + bk + cjk compared to SAS V nce Co onents Mode I Within-cell model: Y + e ijk ' 53k ijk Between-cell model: fljk ‘ 70 + 11W1jk + 72w23k + aj + bk + cjk C, Vagiance Components Model II Within-cell model: Yijk ' fiOjk + fllxlijk + eijk Between-cell model: ”031: " 70 + 71"131: + 72w2j +aj+bk+cjk va e o ode Within-cell model: Yijk " fiOjk + filjkxijk + eijk Between-cell model: fiOjk " 700 + 701w13k + 1'02"’2jk + aOJ + bOk + chk compared to BMDP compared to BMDP simulation study + a + b + c ”mt " 71o 13 1k ljk 72 Model A to model C represent the variance components models because there are only variances to be estimated. Thus the resulting estimates of variance components are all scalars. Model D is the covariance components model in which the two random within-cell slopes are correlated and the resulting parameter variance-covariance matrices are all two-by-two full matrices. The SAS (see SAS User's Guide, 5th Ed., chapter 41) program was used to obtain the maximum likelihood estimates of the posed models of A. BMDP (see BMDP Manual, Vol. 2., general mixed model analysis, pp.1144-llS3) was used for estimating the model B where two group level predictors are involved and the model C where a fixed effect covariate is involved at individual level, an analysis which SAS cannot perform. The common features of variance-components models in BMDP and SAS lie in the point that they are experimental in nature. The fixed effects sum to zero and the random effects are assumed to be sampled from normal populations with zero means. Although BMDP supports the model with fixed effects covariates at the individual level, SAS requires that all predictors in the model are the group level categorical variables and provides only the estimates of variance components even if fixed effects predictors are in the model. The algorithm for the crossed multilevel model presented in this thesis does not require the predictors in the model to be constrained as class or discrete variables. Because no available computer program allows the estimation of a crossed multilevel model with continuous variables in the model, the author selected the model with discrete variables only for computational comparison. Later at chapter 4 of this thesis, a crossed multilevel model with a continuous covariate in the model will be estimated. 73 In the case of model D, I conducted simulation analyses for balanced data, since no computer programs are available for estimation of crossed multilevel model when it has multiple random slopes. The simulation method is somewhat judgemental because one should decide the number of replications of the simulation. The current computer program for the crossed multilevel model involves complicated computation and takes a long time for computation. So the author decided to use relatively large size data, N-8400, but limited the replications to twenty times. For the unbalanced data of model D, another simulation analysis would be better for checking the accuracy of the program. However, I analyzed one unbalanced data set that originated from an already used balanced data and compared the results to the balanced case in the hope that if the two results were close enough, then the algorithm is believed to analyze the unbalanced data properly. 0 a e The author produced the all data sets to estimate the posed models through random generation using the Gauss programs. To explain the models and data used. for accuracy check, following, definitions are useful: 1. Design characteristics: J - number of macro units for rows (e.g. schools); K - number of macro units for columns (e.g. neighborhoods); G number of cells classified by the two sets of macro units; n - number of observations of each cell; N - total sample size. 74 2. Model characteristics Yijk - outcome score of person i in the cell classified by jth row and kth column units; flqjk - qth random regression slope of a cell classified by jgh row and kth column units; 2 . eijk - random effect of person i of cell jk. ejk - N(0, a ), 1 - fixed effect of a parameter; aJ - random effect of jth row units, a - N(0, ra); J bk cJk - random interaction effect of jth row and kth column units, b - random effect of kth column units, k ‘ N(ov fb); cjk-N(O,rc). a d f e ov Model A is a crossed random effects ANOVA model. The design characteristics are: J-lO, K-l4, 6-140, “F10, N—l400. The distributions from which the values of random effects selected are: a - J N(0, l6), bk ~ N(0, 25), c - N(0, 36), and e - N(0, 100). The fixed jk ijk parameter value was assigned as 1 - 10. The fixed effect, 1, and the random parameters at the higher level, a1, bk’ and cjk constituted the data for each cell mean, fijk' fljk and the individual random effect, eijk together produced the outcome values, Y Using Y two analyses ijk' ijk’ were performed using ML estimation of SAS and the algorithm developed by the author. For the analysis of’ unbalanced. data, fifteen. cells of' data. were arbitrarily selected out. Hence the design characteristics are changed as: J-10, K-l4, g-125, n-10, N-1250. The model is the same as in the balanced case. Table 2 shows the computational results of the CML 75 algorithm in comparison with the results of the ML estimation of SAS program. Table 2. Computational comparison between SAS program and the CML algorithm for crossed random effect ANOVA Covariance components SAS CML algorithm Balanced Data fa 37.287992 37.287925 fb 29.999248 29.999221 fc 31.867203 31.867350 02 94.260318 94.260317 1 12.671820 12.671820 Unbalanced Data fa 40.036423 40.036586 'b 36.3315730 36.315495 fc 30.811034 30.811064 02 94.260322 94.260317 1 12.799353 12.804980 The two sets of results clearly show that two programs produced identical estimates. One thing to note in Table 2 is that the variance components analysis of the SAS program does not produce the estimates of the fixed effects, 1, which is the grand mean of the sample in Model A, so I obtained the observed grand mean to check the accuracy of the estimates of fixed effects, 1, of the CML algorithm. cla: Hem des: acm mac: dam: 0 f< Alt} the COT] whe} Near Unii unit T1111 76 e o o t o e Model B is the typical 'variance components model for' which. most classical research on variance components have centered (see Searle and Henderson, 1961; Rao,l972; Hartley and Rao, 1967; Harville 1977). The design characteristics and model characteristics of the data used for the accuracy check are the same as the case of Model A except for the two macro variables in Model B. The two macro variables, lek and szk, are dummy variables defined on rows and columns respectively. They are coded 0 for the first half units and l for the remaining units respectively. Although BMDP provides the estimates of the fixed effects in the model, the parameters are estimated under a general mixed ANOVA model. The corresponding variance components model for model B that BMDP uses is Y - p + a (4.1), ijk 1 + am + aJ + bk + cjk + eijk where Y is the outcome value of student 1 in jkth cell; p is the grand ijk mean; a1 is the fixed effect of W1, the macro variable defined on row units; fin is the fixed effect of W2, the macro variable defined on column units; aj, bk’ cjk’ and eijk are the random effects as in the crossed multilevel model. In Equation (4.1), the fixed effects parameters, ,1, a1, and fim are not the same as 10, 11, and 12 in Model B. An alternative ‘way to check the accuracy of the estimated fixed effects of CML algorithm is to compute the predicted mean values of the groups classified by the two group-level variables using the estimates of the fixed effects from the two programs. The design matrix of the variance components model in BMDP, whose columns are the orthogonal contrasts, and the dummy coding of the fixed effects variables in the crossed multilevel model enable us to 77 compute the predicted group means from the two analyses. I will first show the computational comparison of the variance components and then move to the case of the fixed effects. The computational results for the estimates of the variance components from BMDP and crossed multilevel algorithm for balanced and unbalanced data are appeared at Table 3. Table 3. Computational comparison for the estimates of variance components between BMDP program and the CML algorithm for Model B. Covariance components BMDP CML algorithm Balanced Data 'a 12.665 12.665328 1b 28.111 28.111434 7c 31.879 31.879463 02 94.260 94.260318 1a 12.266 12.239884 rb 34.666 34.696003 rc 31.070 31.080777 62 94.260 94.260289 The first panel of Table 3 shows that the results from each program are identical. For the results from the analysis of unbalanced data, the results are virtually identical but the differences among the estimates, .03 for parameter variances and .0001 for within-cell variance, are a little larger than those of balanced case. The computational comparisons for the estimates of the fixed effects 78 Table 4. Computational comparison for the estimates of the fixed effects between BMDP and CML algorithm for Model B (Balanced case) BMDP CML u a1 fim 70 11 12 12.672 -4.909 -1.109 6.653551 9.818753 2.217782 Group Codes Predicted Group Means W1 W2 BMDP CML 0 0 p+a1+flm- 6.654 10- 6.654 1 0 p+a1-flm-l6.472 10+1I-16.472 0 1 p-a1+fimf 8.871 1o+12- 8.871 1 1 p-al-fimf18.690 10+11+12-18.690 Table 5. Computational comparison for the estimates of the fixed effects between BMDP and CML algorithm for Model B (Unbalanced case) BMDP CML u 01 flm 10 11 12 12.717 -5.098 - .911 6.480460 10.322841 1.667013 Group Codes Predicted Group Means W1 W2 BMDP CML 0 O p+al+fimf6.709 10-6.48O 1 O u+al-fimf16.904 10+11-16.803 0 l p-a1+fimf8.530 1o+12-8.148 l l p-al-fim-18.725 10+11+12-18.470 79 are appeared in Table 4 for the balanced case and in Table 5 for the unbalanced case. The first panels of Table 4 and TableS shows the estimated values of the fixed parameters and the second panels of the tables show the computational comparisons of the predicted group means obtained from the estimated fixed effects from the two programs. The equations of the second panel shows that the predicted mean values of the groups classified by the two group level variables are obtained using the design matrix in BMDP and the dummy coding of the variables in CML algorithm. The results show that the predicted mean values of the groups classified by the group level variables, W and W are virtually the same in both 1 2’ balanced and unbalanced case. WW Model C is a more complicated type of variance (components model that has fixed effect variables at both levels of units. SAS cannot analyze the data using this model because SAS allows only group level predictors in model specification and the predictors should not be continuous (see SAS User's Guide, chapter 41). BMDP can perform the analysis of this model but it also constrains the group level predictors not to be continuous. In order to compare the computational results, the author coded the individual level covariates, as zero, one, xlijk’ and two (a linear contrasts) and kept the coding systems for group level predictors as before. The sample size of each cell was increased as n-20 because the model has a within-cell covariate. The design characteristics for the balanced case are then: J-lO, K-14, G-l40, n-20, N-2800. The chosen random effect distributions are 80 the same as before. For the analysis of unbalanced data, fourteen cells were arbitrarily selected out and the resulting design characteristics are: J-lO, K-l4, G-126, n-20, N-2520. The computational results for variance components from BMDP and CML algorithm are presented at Table 6. Table 6. Computational comparison for variance components between BMDP and CML algorithm for Model C. Covariance components BMDP CML algorithm Balanced Data r8 13.565 13.565323 7b 17.402 17.402192 to 25.609 25.609418 02 140.532 140.53248 Unbalanced Data in 12.569 12.565704 rb 14.497 14.501222 ’6 26.852 26.853643 02 140.532 140.53247 Table 6 shows that the two sets of results from each program are equivalent. The results of covariance components of balanced case are closer each other than those of the unbalanced case. Concerning the estimates of the fixed effects, the variance components model that BMDP uses is Yijk - p + fllxijk + a1 + 5m + a3 + bk + cjk + eijk (3.38), where pl is the fixed effect of the characteristic of student 1 in jkgh 81 cell, and all other terms are the same as in model B. As in the case of model B, BMDP uses the design matrix for estimating the fixed effects, y, al, and pm, thus those fixed effects parameters are different from CML algorithm that estimates the regression coefficients in terms of general linear model. Table 7 and Table 8 show the computational comparisons for fixed effects between the BMDP and the CML algorithm. Table 7. Computational comparison for the estimates of the fixed effects between BMDP and CML algorithm for Model C (Balanced case) BMDP CML ' “ “1 pm ‘31 70 71 72 ‘91 12.185 .113 .207 7.363 12.505 -.414 -.225 7.363 Group Codes Adjusted Group Means (after covariate) W1 W2 BMDP CML O 0 p+a1+flmf12.505 10-12.505 l 0 p+al-fim-l2.09l 1o+11-12.09l 0 l p-a1+fimf12.279 1o+12-12.279 l 1 u-al-flm-18.865 10+11+12-18.865 82 Table 8. Computational comparison for the estimates of the fixed effects between BMDP and CML algorithm for Model C (Unbalanced case) BMDP CML 74 9:1 13m 181 ‘70 11 72 BI 12.215 -.029 .286 7.363 12.583 -.686 .084 7.363 Group Codes Adjusted Group Means (after covariate) W1 W2 BMDP CML 0 0 p+a1+fimfl2.472 1o-l2.583 l 0 p+a1-fim-11.900 1o+11-11.897 0 l p-a1+fim-12.530 10+12-l2.668 l l p-al-fim-ll.958 1O+11+12-1l.982 The first panels of Table 7 and Table 8 show the estimated values of the fixed parameters and. the second. panels of' the tables show’ the computational comparisons of the predicted group means after accounting for the effect of the fixed effect of individual covariate. The equations of the second panel show that the adjusted mean values of the groups classified by the two group level variables are obtained using the design matrix in BMDP and the dummy coding of the variables in CML algorithm. The results show that the two sets of adjusted group means are virtually the same in both balanced and unbalanced case. v nce o 5 od Model D has two random within-cell parameters and the variation of the two parameters across the macro units was explained with different sets of predictors. The two within-cell parameters are not necessarily independent. Hence the resulting dispersion matrices, 'a’ 1b, and 'c’ 83 are no longer scalars, rather they are all two by two full matrices. Standard statistical packages cannot analyze data under this kind of covariance components model. Since there is no available computer program for the Model D, the author conducted a simulation study to check the accuracy of the computational results for balanced data. For the complete accuracy check of the program performance, the number of replications of analyses should be large enough, for example ten thousands times, which is unreasonably demanding for the present thesis work. As a compromise, the author decided to conclude that the algorithm works properly if the results show sensible evidence of accurate results. The number of replications of analysis was limited to twenty times but the sample size was taken large enough, N-8400, in order to compensate for the small number of replications. The design characteristics for the simulation data are: J-14, Kr20, G-280, n930, N-8400. The distribution of the individual random effects are designated as e - N(0, 49). For ijk the group level random effects regarding fiOjk’ the distributions are: aOJ - N(0, l6), b0k - N(0, 25), COjk - N(0, 36). The other set of distributions of the group level random effects regarding pljk are: a 13 ' N(0, 9), b1 ~ N(0, 4), c ~ N(0,16). The fixed parameter values were k ljk assigned as: 700 - 110 - 10, 701 - 5, and 102 - 3. The group level predictors, lek and szk, are all dummy variables coded zeros and ones. The within-cell random effect covariate, has the values of zero, xlijk’ one, and two. The preassigned parameter values are now compared with the mean values of the estimates obtained through twenty replications of analysis. Table 9 shows the results. 84 Table 9. Computational comparison between the preassigned parameter values and the average values of the estimates from twenty simulations. Covariance Parameter Estimates components values Mean SD Min Max 62 49 49.09 .72 47.54 50.14 '0a 16 16.97 7.80 5.75 38.39 70b 25 23.82 9.31 8.12 36.57 70c 36 37.84 3.63 32.00 44.22 1 9 7.70 3.82 3.19 15.32 la 71b 4 4.52 2.21 1.70 8.52 71c 16 15.96 1.64 13.26 20.35 Fixed effects 100 10 9.96 3.52 3.10 15.23 701 5 3.47 3.09 -2.47 11.14 702 3 3.50 2.92 -4.05 7.50 110 10 9.99 1.21 7.33 12.32 Table 9 shows first that the mean values of the estimates are all reasonably close to the preassigned parameter values. Second, all the preassigned. parameter values fall within the limit of’ one standard deviation from the mean estimates. Third, the 02, f0c’ and flc are closest among the covariance components because these parameters are estimated with the large sample sizes, they are N-8400, G-280. Other estimates of the covariance components are not as close to the parameter values as are 02 r or 'lc’ due to the insufficient sample sizes such 0c as J-l4, K-20. Fourth, for the fixed effects, the mean value of 110 which is the mean. of' within-cell slopes, is closest to the £11k! parameter value and has high precision, SD-l.21. Generally speaking, the 85 slopes have less variation than does the intercept. The results of the fixed effect estimation support this fact. These four characters of the results establish that the CML algorithm produces sensible results for Model D even if the replications of the analysis were limited to twenty times. For the analysis of unbalanced data, I first built the data by selecting out, using a random digit table, the data of 30 cells from an already used balanced data set. Hence the design characteristics are; J-l4, K-20, G-250, n—30, N-7500, but the model characteristics and the preassigned parameter values are the same as in the balanced case. Table 10 show the results of the analysis that compared to the balanced case and the preassigned parameter values. 86 Table 10. Computational comparison beWeen the preassigned parameter values and the results of the analyses of the balanced and unbalanced data. Parameters Parameter Estimates values Balanced Data Unbalanced Data Randqm effects a 49 49.504685 49.152115 '0a 16 38.388547 23.765979 70c 25 17.432644 18.404436 71a 9 11.717677 5.432532 flb 4 6.440947 6.959900 '1c 16 18.057778 13.437822 Fixed effects 100 10 14.881167 10.308522 101 5 2.130050 3.770741 102 3 -l.010730 1.931768 110 10 12.318740 10.085683 Table 10 shows that two sets of estimated values have some differences from the preassigned parameter values. These differences occurred because the estimates were obtained from the single samples of the population with the preassigned parameter values. Similarly the differences between the two sets of estimates of the balanced and the unbalanced data are attributable to the fact that the observation was taken from one sample of data. If we replicate the attempt to make an unbalanced data set from the same balanced data and compute the mean values of the estimates from the analyses of the unbalanced data, then the two sets of estimates would be closer. Nevertheless, we can see some patterns of the results in Table 10 as we found in Table 9. The 87 estimates of 02 show the closest values to the parameter values and, both balanced and unbalanced cases show little differences, because the estimates were taken from the large sample size of 8400 and 7500 respectively. The estimates of '0a and rla show somewhat large differences from the preassigned parameter values as well as between the two data cases, because they were taken from the sample size of J-l4. The overall information of Table 10 is that the distribution of the estimated values are centered on their parameter values in either cases. CHAPTER V ILLUSTRATION In this chapter, crossed multilevel analysis is illustrated. by reanalyzing the data collected by Rudman and Raudenbush (1987). The experience with the crossed multilevel analysis will enhance our understanding on the logic of model specification and answer the following questions: 1. What parameters can be estimated ? 2. How are the hypothesis tested ? 3. How are the results interpreted meaningfully ? at d t Rudman and Raudenbush's (1987) study of the effects of excess testing time on test scores provides the data for illustration. The purpose of the study was to assess the influence of providing excess testing time on standardized reading comprehension test scores. The test was not intended to be a speed test, but rather a power test. The time limit of the standardized test has been determined by the "90 % criterion." Using this criterion, testing time is the time elapsed until 90 8 of the examinees complete the item analysis edition of the instrument. The assumption underlying the procedure is that the 90 % completing the test had arrived at correct answers to many of the items and had used informed guessing on the remainder. It is also assumed that the remaining 10 8 who had not completed the test would merely employ random guessing of given more time. Hence more time would not translate 88 89 into mean test score gains. If the test results are sensitive to the provided excess time it becomes important to discover the optimal testing time by estimating the functional form of the relationship between excess time and test scores for the susceptible test. Moreover, such tests may be sensitive to variations in test adminstration procedures and hence would have impact on decisions on student placement in advanced classes, promotion to a higher grade, teacher awards, or other recognitions because of their class' higher test scores. Sample and Design In the original study, 29 5th grade teachers from 16 of the 33 elementary schools in the Lansing school district in Michigan volunteered to serve as participants in the study. However data could be collected for only 23 of these classrooms. These 23 fifth grade classrooms supplied usable data for 471 pupils. The data set contains both demographic characteristics (including ethnicity, sex, eligibility for free lunch, etc.), and test scores, pretest scores (including reading subtest scores and total reading scores). The design of the study involved first, creation of seven blocks based on mean pretest scores each containing four classrooms from the original 29 classes. Four treatment groups ‘were established, each represent testing-time allotments defined by having 0, 5, 10, or 15 minutes excess time to complete the test. Within each of the seven blocks, classrooms were assigned at random to one of the four treatments. One remaining class was added to one of the classes of which the class size was only five. Therefore there are 22 cells of data in two-way classification. The data were well suited to a 90 polynomial trend analysis. The sample sizes in the design are presented at Table 11. Table 11. Design and Sample Sizes Excess Time None 5 Min 10 Min 15 Min Overall Block 1 - - - 24 24 Block 2 19 22 27 18 86 Block 3 28 23 20 - 71 Block 4 24 - 24 24 72 Block 5 22 ‘ 22 23 21 88 Block 6 25 20 15 - 60 Block 7 21 13 l7 19 70 Overall 139 100 126 106 471 If the sample had included 28 classrooms, the design would have been a balanced randomized block design (Kirk, 1982, chapter 6) with seven blocks and four treatments. In the study, however, six cells were missing and the sample sizes of the cells classified by the seven blocks and four treatments are not the same. The original study (Rudman and Raudenbush, 1987) used "the least squares solution” (see Searle, 1971) by applying a series of regression models with increasing complexity, and computed the reduction in residual variation on each step. This approach solves the inter-correlation of the main and interaction effects caused by the unbalanced character of the data. However, the amount of variation assigned to each effect will depend on the order of its inclusion. The interpretation of hypothesis testing is also conditional to the order of 91 its inclusion (Searle, 1971). In addition, the two crossed-factors are considered to have fixed effects in that study. However, the blocks are typically considered as having random effects and the excess time effects are viewed as random also because they are considered as a sample from the population of the excess testing times. Again classrooms are nested within the cells classified by the blocks and treatment groups. Hence the design reflects crossed multilevel data. na s's The general strategy for crossed multilevel analysis can be summarized in three steps: 1. examining variability among students in the hierarchical structure (base model), 2. examining variability among students within the cell, here classroom, classified. by 'blocks and treatment groups (within-cell model specification), 3. identifying variability as a function of group level variables (between—cell model specification). W In the first stage of model specification we examine variability of the data where students are nested within the cells, classrooms, classified by the blocks and treatment groups. Addleman (1970) stated two analytical principles related to the data structure. The first one may be considered as a general principle in experimental design, that is " The design and analysis of experiments should take into account all of the major sources of variability that are expected to influence the responses (p.1,095).' 92 Second, since students are nested within classrooms, both experimental unit (classrooms) error and the observational unit (individual) error should be included in the model (p. 1,097). With these classical conceptions of model specification, we can see that the present design reflects four sources of variability of students scores; individual differences, block differences, treatment group differences, and the interactions from different combinations between the blocks and the treatment groups. The errors from individual differences are the observational unit errors because the data were taken from the individual scores. The errors from the classroom differences are the experimental unit errors because the classrooms are the units that are randomly assigned to the cells of two-way classification. In the multilevel conceptions, the observational unit errors are specified under the within-cell model and the experimental unit errors are specified under the between-cell model where the errors are decomposed into three parts. We first pose the within cell model as Yijk ' fijk I e1:31: for i - l, ..., “31‘ student in the classroom that belongs to block j, j - 1, ..., 7, and treatment group k, k - 1, ..., 4. Yijk is the test score of student i in the classroom jk. fijk is a mean test score of students in classroom jk, and eizjk is an individual effect of student 1 nested within ”z" the classroom jk. It is assumed that ei:jk is normally distributed within each treatment by block with mean 0 and constant variance 02. The within-cell model is a traditional 93 regression model with no predictors in the model except class pjk’ means. Thus we pose between-cell model to examine variability of the classroom means, fljk’ The between-cell model is 10 + aj + bk + cjk' fijk Each class mean score, becomes the outcome variable which is a fijk’ function of the grand mean, 10, plus the jt_h, block effect, a the kth j! treatment effect, bk’ and the effect of interaction between jg]; block and kth treatment group. The distributional assumptions of the three error terms are; a - N(0, fa); bk - N(0, 7b); and c - N(0, 7c). The J three error terms are mutually independent. Hence the variance of class mean scores, parameter variance, is Var(fljk) - fa + fb + 'c’ which is the total variance attributable to the differences of the classrooms. The full results for this model are reported in Table 12. 94 Table 12. Results of crossed multilevel analysis of base model Fixed Effects Estimates SE t 10 41.535 .459 90.569 Covariance Components Within-cell variance 02 99.059 Between-cell variance Overall 1 r a b c 5.221 0.901 1.667 2.647 -21n(maximum likelihood) 3515.8118 Table 12 shows the estimation. results of' both fixed. effect, 10 (intercept), and the covariance components. These estimates are maximum likelihood estimates since the CML model uses MLF estimation method of EM algorithm. For the fixed effect, 10 (intercept), we can test the hypothesis, H0: 10 - 0, but it is not of interest at this point. The intercept is the grand mean of all classroom mean scores. The critical point in Table 12 is decomposition of the between-cell variance into three parts; fa , 'b’ 'c' The total observed variance is Var(Y 2 + fa + r + 'c - 104.28. The between-cell variance is ijk) " a b 5.221 and the within-cell variance is 99.059. We can compute various intra-unit correlations using the decomposed covariance components. First, we can estimate the proportion of variance that lies within- and between- classrooms. About 95 8 of the observed variance is at individual level. Only 5 8 of the observed variance is attributable to the differences among classroom memberships. Classical-single level models, i.e., regression or ANOVA, do not provide the parameter variance. 95 Nested-multilevel models such as ML2 or HLM provide the parameter variance but the analysis of variance components would stop here. The crossed multilevel model allows further decomposition of the variance. We would like to know what proportion of the variance lies between blocks or treatments or their interactions. Seventeen percents of the parameter variance and one percent of the observed variance reflects variation among blocks. About 32 8 of the parameter variance is attributable to the treatment group differences. Again 51 8 of the parameter variance and 2.5 8 of the observed variance reflect random interaction effects, which means the variation among students with different classroom membership after taking out the effect of blocks and treatment. Kang and Raudenbush (1988) performed a comparison between a classical analysis and HLM using the same data. In that study the observed variance was decomposed as 02-98.73 and r-5.5397. The two estimates are virtually the same as the estimates in Table 12. An interesting point is that the parameter variance from HLM analysis is the sum of the three variance components at macro level. With the HLM analysis we don't know the proportion of the parameter variance that is solely attributable to the differences among treatment group memberships of students. The tiny differences among the estimates from HLM and CML analysis may be caused by two things. One thing is that HLM uses MLR estimation which is equivalent to REML (Patterson and Thompson, 1971; 1974; Harville, 1977) but CML uses MLF estimation which produce ML estimates. The other thing is rounding errors when the two programs compute the estimates. In sum, the crossed multilevel analysis provides all necessary information regarding variance decomposition of the base model and guides further model specification. Having decomposed the covariance 96 components, we now move our interest to account for variability of the student scores at each level. We first attempt to account for the individual level variance. W h - o e a As shown in Table 12, about 95 8 of the observed variance lies at the individual level within each classroom. To account for the variation we need to identify the covariates from individual level variables. In the original study (Rudman & Raudenbush, 1987), the best single covariate proved to be the total reading pretest, r-.75. Only one covariate was needed because other likely covariates, (e.g. ethnicity, sex, parent education) were not significantly related to the outcome after adjusting for the effects of the best covariate. The traditional single-level analysis, as in the original study, treats the effects of individual- level variables as fixed, which implies that the effects of the individual characteristics on the outcome variable are constant across the all classrooms, blocks, and the treatment conditions. However we don't know whether the effects of the chosen covariate on students' test scores are constant across the all classrooms. The effects of the covariate may work differently across the classrooms. Suppose certain teachers provide individual teaching for the students who need remedial study after the pretest, but some other teachers do not. Then the correlations between the posttest scores and pretest scores may vary across the classes. The correlations of the two test scores of the classes with remedial teaching would be smaller than those of the other classes. Therefore we first examine the variation of the covariate effects across the classes, and specify it as having a random effect. We 97 pose the within-cell model as Yijk ' pOjk + pljk(Tread)ijk + ei:jk' The variable code ”Tread" indicates total reading pretest scores. Note the effect of the covariate, has subscripts in the model, which flljk’ allows the parameter to have different values across the classes. The between-cell model does not need any predictors at this moment, but we need to present a pair of equations because the two within-cell parameters are specified as random. The between-cell models are + aoj + b0k + cOjk + alj + b1k + cljk 503k ' 700 pljk ' 710 where aOJ - N(O, roe). 11 Tla)’ b1k - N(0, 11b), cljk - N(0, 71c). The random effects b - N(0, '08), 0k " N(09 70b), c0jk Similarly a - N(0, within each between-cell equation are mutually independent. However the two outcome variables, pOjk and £11k, are not necessarily independent of each other. Thus the random effects within each macro unit, for example an and alj’ are correlated across the two equations, and resulting covariance components at macro level will be two by two full matrices. The results of estimating this crossed multilevel model are shown at Table 13. 98 Table 13. Results of crossed multilevel analysis with random slopes at within-cell model Fixed Effects Estimates SE t 700 12.093 1.19576 10.114 110 . 3848 .01516 25. 3826 Covariance Components Within—cell variance 02 42 .474 Between-cell variances Overall r r r a b c 9 . 2207 - . 1025 2 .4059 - .0123 1 . 8698 - .0405 4. 9500 - .0497 . 0016 8 . 346E- 5 . 0010 . 0006 - 2ln(likelihood) 3122 . 8877 The major concern of this analysis was to see whether the covariate, Tread, has a fixed or random effect. The results of Table 13 clearly shows the slope of the covariate does not vary so much across the blocks, 118-8.346E-5; the treatment groups, 71 -.0010; and the interactions, b rlc-.0006. We can test the hypotheses, H0: Var(alj) - Var(blk) - Var(cljk) - 0 with the likelihood ratio test by using the values of -21n(likelihood). It is known that the statistic, -21n(L1/L2), has an asymptotic x2 distribution, where L1 is the maximum likelihood value of a less complex model and L2 is the value of a more complex model and the degrees of freedom of x2 statistic is the difference of the number of parameters to be estimated in each model. Hence the deviance between the two values of -2ln(likelihood) will have x2 distribution with the difference of the number of parameters between the two models as the degrees of freedom. The number of parameters of the model in Table 13 is twelve, while only five parameters were estimated in the model of 99 Table 12. The deviance between the two values of -21n(likelihood) is 392.9241 which is large enough to reject the x2 statistic with seven degrees of freedom. Therefore we reject the composite null hypotheses, - 0, Var(a )- Var(blk) - Var(ch) - 0. We, however, do not know “0‘ 710 13 which particular hypotheses are rejected given the all posed hypotheses. To identify the significant effects by using the loglikelihood ratio test, we need to specify another model that has fewer parameters. Table 13 shows that the within-cell variance has been reduced from 99.059 to 42.474 by virtue of the covariate. Concerning the fixed effects, the average slope of the covariate across the all cells is .3848. There are other results regarding the within-cell intercept, pOjk’ but the intercept is meaningless because the covariate was coded with its raw scores. Results shown at Table 13 reveal some characteristics of the crossed multilevel model. Traditional variance components models and single- level models that are available through SAS or BMDP can't perform this analysis, , where multiple random within-cell parameters are modeled with appropriate error terms. Under classical variance components models, all covariates must have fixed effects and estimation of random effects is limited to variance components. The difference between nested and crossed multilevel analysis is again shown at Table 13. While crossed multilevel analysis partitions the covariance components into three matrices, 'a’ 'b' and fc’ nested multilevel analysis, however would provide the overall matrix only. In Table 13, both overall and the three separate variance-covariance matrices show that the sizes of the variance of the covariate effect are small. Hence both analyses may agree to fix the effect of covariate in 100 this case. However when the overall parameter variance of the covariate is substantial, a nested multilevel analysis would provide no information about which macro units cause the substantial size of parameter variance while crossed-multilevel model identify what particular random effect among macro units are significant. Thus the crossed multilevel model provides clear guide for further model specification. Although the sizes of the variances of the covariate is small, we still need to test the significance of the variance components by fixing the effects of the covariate to get: a deviance of ~21n(likelihood) between the model of Table 13 and the new model with fixed covariate effects. Thus the second between-cell model is cancelled and the within- cell model become Yijk - fiOjk + £1 (Tread)ijk + eijk and the between-cell model has only one equation as + a + b + c fiOjk-7OO 03 0k Ojk’ The new results of the analysis are shown at Table 14. 101 Table 14. Results of crossed multilevel analysis with a fixed within cell slope Fixed Effects Estimates SE t 100 12.2879 1.2044 10.2029 81 .3826 .0153 25.055 Covariance Components Within-cell variance 02 43.0875 Between-cell variances Overall 1 r r a b c 2.984 1.1102 1.2682 .6058 -21n(likelihood) 3125.2379 Table 9 shows a highly significant (t-25.055) covariate effect. The within-cell variance at Table 12 was 99.059 but it was reduced to 43.0875 here. About 57 8 of the within-cell variance was accounted by the covariate. The overall between-cell variance was also reduced from 5.2205 to 2.984. The last line of Table 14 shows that -2ln(1ikelihood) is 3125.2379 and the deviance between the values in Table 13 and Table 14 is 2.3502. The x2 statistic with six degrees of freedom is 12.59 at 5 8 significance level. Hence we do not reject the null hypotheses, H ' 0° Var(alj) - Var(blk) - Var(cljk) - 0 and decide to fix the effects of the covariate across the blocks, treatment groups and the block-by-treatment interactions. The parameter variance estimates, Var(flojleread), become conditional variances. They measure the amount of variability remaining among the class means. Raudenbush and Bryk (1986) used a measure of model performance, R2 as in regression, by comparing it to the 102 unconditional parameter variance estimates from the first stage. Equivalent measures can be used for crossed multilevel model. Hence the proportion of explained parameter variance by the model is 22* - Var(fljk) - Var(fljkl Tread) _ 5.2205-2.984 _ 428 Var(fijk) 5.2205 About 43 8 of the parameter variance was explained by the model or the covariate in this case. The R2* presented above is never less than the one obtained through OLS estimation because the OLS estimates, fijk’ include true parameter value plus sampling error as 81k - fijk + elk but the EM estimates, is true parameter value. In the previous study Bjk! (Kang and Raudenbush, 1988) the within-cell variance was 43.22, the overall parameter variance was 2.915, and the total OLS between-cell variance, Var(B was 10.987. Based on these statistics, the proportion Jk). of explained parameter variance by HLM is 42.4 8 and by regression approach is 23.9 8. It clearly shows that both multilevel models are more reliable than the classical regression model in terms of the coefficient of determination (R2). It also shows that the parameter variance from nested multilevel analysis is the sum of the three covariance components at macro-level variances in crossed multilevel model. The nested multilevel model cannot identify the separate error sources in the model in crossed multilevel contexts. Having completed the specification of the within-cell model, we now need to identify the variability among classrooms as a function of 103 between-classroom variables . - e de f a The general principle for modelling strategy for the statistical model specification is to build a parsimonious model. In the crossed multilevel model, the x2 statistic which is the deviance of the- 21n(maximum likelihood) and which allows us to test a composite hypothesis, serves as a criterion for model specification. .Available group level predictors in the original study (Rudman and Raudenbush, 1987) were linear trend of the blocks, and the polynomial trends variables (linear, quadratic, and cubic) of the treatment groups. The linear trends of the blocks and treatment groups were significant in that study after examining all possible sets of predictors. In the present study, the author decided to use only the polynomial trends variables of the treatment groups for illustration purpose. By using the predictors taken from one dimension of two-way classification, we are able to see changes of each covariance component at the group level. We first pose the model with all polynomial trends variables and then take out the nonsignificant predictors from the model. The within- cell model is the same as before. Y1 + 81(Tread) jk ' flOjk ijk.+ eizjk but the between-cell model includes the predictors as fiOjk - 100 + 701(L1n)jk + 102(Quad)jk + 103(Cub)Jk + aOJ + b0k + COjk’ 104 where Lin - linear trend, Quad - quadratic trend, and Cub - cubic trend of the treatment effects variables. All other notations are the same as before. The results of the analysis of the above crossed multilevel model are shown at Table 15. Table 15. Results of crossed multilevel analysis with both within- and between- cell variables. Fixed Effects Estimates SE t 100 12.3233 1.2079 10.20 101 .5728 .1342 4.27 102 -.0670 .3049 -.22 103 .0158 .1383 .11 61 .3824 .0153 25.02 Covariance Components Within-cell Variance 2 0 42.9445 Between-cell Variances Overall 7 r r a b c 1.39728 1.08924 .01114 .2970 -21n(likelihood) 3116.4663 The results shows that the x2 value of the log-likelihood ratio test is 8.77 with three degrees of freedom, but x2 value at 5 8 significance level is 7.82. So the composite hypotheses, H - 0 is 0‘ 101 ' 102 ' 103 rejected. The first panel of Table 10 shows the results of t-test for the fixed effects and the fixed effects of quadratic and of cubic trends 105 effects are not significant in predicting the variation of within-cell intercept. So these two predictors ‘will be dropped in. next. model specification. The residual parameter variances are reduced by virtue of the predictors at group-level. But the residual parameter variance of the random block effect, fa' shows the least change because the employed predictors are mainly supposed to account for the random treatment effects. One notable result in Table 15 is that the within- cell variance was not changed by the effects of the group level predictors“ This is because the group level variables predict the variation of the responses only at group level. Since we know only the linear effect of the treatment is significant in predicting the variation of within-cell intercept, we pose the final crossed multilevel model. The within- and the between—cell models are Yijk ' fiOjk + 31(Tread)13k + ei:jk + a + b + c fiOjk ' 100 + 101(L1“)jk Oj 0k Ojk' The results are shown at Table 16. 106 Table 16. Results of crossed multilevel analysis of the final model Fixed Effects Estimates SE t 100 12.30087 1.20408 10.21 101 .57793 .13336 4.33 61 .38256 .01526 25.07 Covariance Components Within-Cell Variance 2 0 42.9431 Between-Cell Variances Overall 1 r r a b c 1.4134 1.1045 .0113 .2976 -21n(likelihood) 3116.5100 Table 16 shows that the estimated fixed effect of the linear trend is .578 and the change of the -2ln(likelihood) is almost zero. The classical interpretation of the estimated effects in regression analysis can be used for the fixed effects in the crossed multilevel model, that is the average change of outcome scores for one unit change of linear trend variable is .578. The linear trend variable was coded as; group 1 - -3; group 2 - -1; group 3 - 1; group 4 - 3; in the study. Hence the expected mean change of reading test scores for 5 minutes excess testing time is the twice of the coefficient which becomes 1.156. Concerning the covariance-components at macro level, we found the size of the variances at both within- and between—cell levels was not changed a lot after deleting the two non-significant trends effects. Comparison of the results with the results of Table 14 tells us that the within-cell variance was not affected by the effect of the linear trend 107 variable which is from treatment groups, but the magnitude of residual parameter variance of random treatment effects were reduced from 1.268 to .0113. Hence about 99 8 of the random treatment effects were accounted by the linear trend variable. We can compute the coefficient of determination for parameter variance here. The proportion of the explained parameter variance by the model is 2* Var(fijk) - Var(fijkl Tread,Lin) 5.2205 - 1.4134 R — - - .729 Var(fijk) 5.2205 About 73 8 of the parameter variance was explained by the model. Based on this results regarding fixed effects, we could conclude as in the original study: 1. Test designers should consider more precise methods of setting testing time limits, 2. tests which are sensitive to variations in the procedures of administration should not be used for high-stakes decisions unless the procedures can be carefully monitored (Rudman and Raudenbush, 1987, p.14) . CHAPTER VI CONCLUS ION 8mm Educational systems typically ‘have hierarchical organizations in which "units" at one level are "nested" within units at the next higher level. These educational systems often. produce 'hierarchical data. There have been many controversies over results found through traditional statistical analyses. .As Cronbach (1976) and others emphasized, many educational studies have used inappropriate analyses, including many important evaluation studies. A number of methodologists have developed multilevel models with estimation procedures appropriate for multilevel data. Although these multilevel models have made substantial methodological advances in analyzing multilevel data, they apply only to those multilevel data structures in which each lower-level unit belongs to only one unit at the next higher level. In many cases, however, the structure of a system is not so simple. Students may belong to more than one group simultaneously. For example, we could cross-classify students both by the school they attend and by the neighborhood they live in. We call this kind data as "crossed multilevel data." Despite the recognition of a need for an appropriate multilevel model analyzing crossed multilevel data, a major difficulty has been constrained by existing technologies such as one-way nested multilevel models, variance components models, and OLS regression models. This thesis has now expanded the multilevel techniques to include the two-way crossed multilevel model. The major products of this thesis are five things : 108 109 l. A crossed multilevel statistical model has been presented in general form. 2. The Empirical Bayes estimation procedure has been adapted to the estimation of the crossed multilevel model. 3. A computing algorithm (the CML algorithm) for numerical analysis of crossed multilevel data has been developed using the Gauss language. 4. The accuracy of the computing algorithm has been tested in reasonably comprehensive situations of crossed multilevel modelling both for balanced and unbalanced data sets. 5. The application of the crossed multilevel model to real data set was learned through illustration. In chapter one, the issues of crossed multilevel analysis in educational research were identified and the statistical model for crossed multilevel analysis was presented. The problems of existing statistical models for analyzing crossed multilevel data resulted because those models can't specify the error terms from all sources in crossed multilevel contexts. Traditional single level models, such as regression models, fail to incorporate the error sources from multilevel structure. Such a limitation of the models leads researchers to have an enforced choice of either individual level or group level analysis. Because the observations within a group are not independent, significance tests based on individual level analysis are not acceptable, due to the violation of independence assumption. Dependencies among observations cause over-estimation of the precision which is a function of sample size and intra-class correlations. The group level analysis does not violate the independence assumption but cannot use individual variables in the model. Inferences about individual behavior (e.g., individual student achievement) based.