This is to certify that the dissertation entitled EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM presented by See-Heyon Jo has been accepted towards fulfillment of the requirements for the Ph.D. degree in Counseling, Educational Psychology & Special Education (Statistics & Research Design). Major professor. Date: November 15, 1994. MSU is an Affirmative Action/Equal Opportunity Institution.

EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM

By

See-Heyon Jo

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology and Special Education

1994

ABSTRACT

EMPIRICAL BAYES ESTIMATION FOR UNBALANCED MULTILEVEL STRUCTURAL EQUATION MODELS VIA THE EM ALGORITHM

By See-Heyon Jo

The question of how to analyze unbalanced hierarchical data generated from structural equation models has been a common problem for educational researchers and analysts. Among the difficulties plaguing statistical modeling are estimation bias due to measurement error and the estimation of the effects of the hierarchical social milieu in which education takes place. Over the last two decades, substantial progress in multilevel structural modeling and estimation techniques has been made for the balanced sampling design. This dissertation presents empirical Bayes estimation procedures for multilevel structural equation models in the context of unbalanced sampling designs. The computational procedure is implemented via the EM algorithm. It is particularly useful for the problem of estimating a large number of parameters in multilevel structural equation models. A multilevel structural equation modeling process with an example illustrates the general principles of empirical Bayes estimation with the EM algorithm. The accuracy of the algorithm was tested using a set of artificial data. The numerical results suggest that this new methodology is a potentially useful means for studying hypothesized causal relations among latent variables varying at two levels of hierarchy.

© Copyright by See-Heyon Jo, 1994

To my parents, brothers, wife and friends

ACKNOWLEDGMENT

I wish to express my extreme gratitude to my major professor, Dr. Stephen W. Raudenbush, for his patient guidance over the last five years. His insights contributed greatly to the content of this dissertation, and his encouragement and his intellectual and financial support helped me complete it.
On a more general note, I would like to thank the entire faculty of the Department of Counseling, Educational Psychology and Special Education at Michigan State University for providing me the opportunity to pursue an advanced degree in statistics and research design, and for making my stay here a very rewarding experience. My special gratitude goes to Dr. William Schmidt for his support. In the winter of 1990, Dr. Raudenbush and Dr. Schmidt gave me a precious opportunity to study multilevel structural equation models under their supervision. That winter enriched the idea of this dissertation. I would like to express my gratitude to Dr. Richard Houang for his insightful suggestions and comments. I wish to express my appreciation to the rest of my committee, Dr. James Stapleton and Dr. Frank, for reviewing my work and providing suggestions for improvement. To my father, mother, brothers, and relatives, I owe a special "thank you" for their support and encouragement. To my wife, Kyung-Nam Lee, I dedicate this dissertation. Without her unfailing love and support this study could not exist.

Finally, I believe William Faulkner (1897-1962) also deserves some thanks for saying: "I would like to think that there was someone there at that time too, to reassure them that man is tough, that nothing, nothing - war, grief, hopelessness, despair - can last as long as man himself can last; that man himself will prevail over all his anguishes, provided he will make the effort to; make the effort to believe in man and in hope - to seek not for a mere crutch to lean on, but to stand erect on his own feet by believing in hope and in his own toughness and endurance."

TABLE OF CONTENTS

LIST OF TABLES ............................................................................................... viii
LIST OF FIGURES ................................................................................................ ix

Chapter                                                                                Page
1. INTRODUCTION ............................................................................................... 1
   1.1 General Problem .......................................................................................... 3
   1.2 Objectives .................................................................................................... 5
   1.3 Brief History of Single Level Structural Equation Models ............................ 6
   1.4 Prior Work on Multilevel Structural Equation Models .................................. 9
2. MULTILEVEL STRUCTURAL EQUATION MODELS ..................................... 20
   2.1 The Model and Basic Notation .................................................................... 22
3. EM ALGORITHM FOR MAXIMUM LIKELIHOOD ESTIMATES .................... 36
   3.1 General Description and Application to the Multilevel Structural Equation Model ... 37
   3.2 Computation of the Iterates ......................................................................... 42
   3.3 Log-Likelihood for the Multilevel Structural Equation Model ...................... 46
4. NUMERICAL RESULTS ................................................................................... 50
   4.1 Generation of Data ...................................................................................... 50
   4.2 Results of the Analysis ................................................................................. 53
5. CONCLUSION .................................................................................................. 61
   5.1 Summary, Implications and Conclusions ..................................................... 62
   5.2 Future Work ................................................................................................ 65
APPENDICES
   Appendix 1 ........................................................................................................ 67
   Appendix 2 ........................................................................................................ 69
   Appendix 3 ........................................................................................................ 73
   Appendix 4 ........................................................................................................ 74
   Appendix 5 ........................................................................................................ 75
BIBLIOGRAPHY .................................................................................................. 78

LIST OF TABLES

Table                                                                                  Page
1. Structural Parts of the Multilevel Structural Equation Model ............................. 31
2. Number of Groups per Group Size ..................................................................... 51
3. Summary Statistics of the Multilevel Structural Equation Model for Balanced and Unbalanced Data Sets with 500 Groups ... 53
4. Maximum Likelihood Estimates for the Example Model for Balanced Data Sets ... 54
5. Conditional Expectations of the Regression Coefficients for Balanced Data Sets ... 54
6. Maximum Likelihood Estimates for the Example Model for Unbalanced Data Sets ... 55
7. Conditional Expectations of the Regression Coefficients for Unbalanced Data Sets ... 55
8. The Values of the Observed Log-Likelihood ...................................................... 56
9. Maximum Likelihood Estimates for the Example Restricted Model for Unbalanced Data Sets ... 58
10. Conditional Expectations of the Coefficients for Exogenous Variables in the Restricted Model for Unbalanced Data Sets ... 59
11. Estimates of π's for Balanced Data ................................................................... 60
12. Estimates of π's for Unbalanced Data ............................................................... 60

LIST OF FIGURES

Figure                                                                                 Page
1. The Path Diagram for an Achievement Model .................................................... 21

CHAPTER 1

INTRODUCTION

As a consequence of various theoretical developments and of improvements in computing, maximum likelihood (ML) estimation has become a viable procedure for estimating parameters in multilevel structural equation models under the balanced sampling design. Many of these developments were reviewed in detail by Jo (1993). The initial interest in ML estimation of a multilevel covariance structure model was noted and developed by Schmidt (1969). Schmidt and Wisenbaker (1986) extended this work to the structural equation models (Joreskog, 1973) for balanced hierarchical data.
McDonald and Goldstein (1989) derived the likelihood equations and derivatives for a bilevel structural equation model which allows for variables measured strictly at a higher level, though no computational approach was made available. They also indicated that the procedure for computing ML estimates was less well developed for the unbalanced sampling design. Recently, building on the balanced-data theory provided by Schmidt (1969) and McDonald and Goldstein (1989), Muthen (1990) showed that the maximum likelihood fitting function could be rewritten such that the between and within structural models could be estimated by means of a multi-population analysis in LISCOMP (Muthen, 1987) or other comparable structural equation software. In the case of balanced data this can be accomplished by treating the within-group deviations as sampled from one population and the between-group deviations as sampled from a second population. In the case of unbalanced data, each cluster of groups having the same number of observations is treated as one population. Lee and Poon (1992) also used the strategy of classifying level-2 units into subsets of level-2 units having equal sample size. They proposed an estimator for such data which, though not maximum likelihood (ML), has the same asymptotic distribution as the ML estimator as the number of level-2 units per subset increases without bound. Computationally this estimator is available using standard software programs such as LISREL (Joreskog and Sorbom, 1993) or EQS (Bentler, 1989). More recently, Raudenbush (in press) proposed an alternative approach for the unbalanced case. He conceptualized the problem in the framework of groups which could all have the same number of sampled cases but are missing data for some individuals. In particular, in the M-step (maximization) the method uses a standard program such as EQS (Bentler, 1989). Vredevoogd (1993) applied this general approach to global models (where two indicators for a group-level latent variable are included) in her dissertation proposal. Jo (1993) also applied the general procedure to a set of linear structural equation models.

The purpose of this dissertation is to develop empirical Bayes estimation procedures for computing maximum likelihood (ML) estimates of the parameters in multilevel structural equation models in the context of the unbalanced sampling design. The procedures do not require classifying level-2 units into subsets of level-2 units having equal sample size. We present a multilevel structural equation modeling process with an artificial example, which illustrates the general principles of empirical Bayes estimation with the EM algorithm.

1.1 The General Problem

A distinguishing characteristic of the data encountered in many areas of educational, medical, social science (sociology, econometrics, management, marketing), and genetics research is that the sampling structure is hierarchical. For instance, students are nested within schools, workers within firms, patients within treatment-specific medical programs, family members within a family tree, or residents within census tracts. Individuals also can take the role of independently observed groups. Generally, students are taught in groups by a teacher, several classrooms and teachers are grouped together into a school, schools into districts, and districts are clustered in states. Students who attend the same school or classroom are therefore expected to share certain educational policies and practices.
As a result, the educational outcomes for these students will be, to varying degrees, intercorrelated. These effects of clusters are most validly viewed within the context of multilevel linear models. Much social science data comes from two- or three-stage sampling designs. Large-scale educational assessment, for example, is typically conducted by drawing a sample of schools and, from those schools, sampling the students who will take the assessment test. This hierarchical fashion of sampling is frequently chosen in large-scale surveys, such as the National Longitudinal Study, with data gathered on the educational aspirations and attainment of high school seniors of 1972, the Second International Mathematics Study (SIMS; Crosswhite et al., 1985), and the Third International Mathematics and Science Study (TIMSS; Schmidt, 1993). Under the standard assumption of independent observations, covariance structure modeling (Joreskog, 1973) of such data misguides statistical inference by not taking into account the intracluster correlations which are present in hierarchical data. An important implication of such structure is that the classical assumption of independence among nested observations is violated. Ignoring the existence of hierarchy in model building gives rise to several methodological and substantive problems that have been well documented in the literature (Burstein, 1980).

In the context of the linear model, statisticians (Lindley and Smith, 1972; Smith, 1973; Raudenbush, 1984, 1988; Aitkin and Longford, 1986; Goldstein, 1986) developed hierarchical linear models (HLM), which are appropriate and powerful means of modeling hierarchical data. Many of these developments and examples are found in the recent book by Bryk and Raudenbush (1992). It was not until hierarchical modeling techniques (Aitkin and Longford, 1986; Goldstein, 1986; Mason, Wong and Entwistle, 1984; Raudenbush, 1984; Raudenbush & Bryk, 1986) were developed that complex relationships among variables across all levels could be inferred. Such techniques have been widely used for various research topics such as cognitive growth and change (Bryk and Raudenbush, 1987; Goldstein, 1989), population studies (Mason et al., 1984), meta-analysis (Raudenbush and Bryk, 1985), and evaluation of educational effectiveness (Aitkin and Longford, 1986; Raudenbush and Bryk, 1986). However, there have been only rare attempts to apply the methodology to structural equation models for hierarchical data. In discussing the empirical Bayes approach, Muthen (1990) also indicated the plausibility of applying it to the multilevel structural equation model by estimating each group's factor value under the assumption of "exchangeability" (deFinetti, 1937; Lindley and Smith, 1972) of the groups.

The main tasks in this dissertation are (a) to incorporate the effects of two levels of social organization into statistical models for outcomes measured at the individual level and/or the cluster level, and (b) to develop latent variable models that simultaneously incorporate effects of structural relations and measurement error.
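To make the consequences of clustering concrete before turning to the objectives, the short sketch below simulates two-level outcomes and computes the intraclass correlation together with the resulting design effect, 1 + (n - 1)ρ, by which a single-level analysis understates the sampling variance of an estimate based on clustered observations. It is illustrative only; the group sizes, variance components, and all names in the code are hypothetical and are not taken from the data analyzed later in this dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

J, n = 500, 10            # hypothetical: 500 groups of 10 students each
tau, sigma2 = 4.0, 16.0   # hypothetical between- and within-group variances

# Simulate y_ij = mu + u_j + e_ij with u_j ~ N(0, tau), e_ij ~ N(0, sigma2).
u = rng.normal(0.0, np.sqrt(tau), size=J)
y = u[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(J, n))

# One-way ANOVA estimator of the intraclass correlation.
group_means = y.mean(axis=1)
msb = n * group_means.var(ddof=1)                               # between-group mean square
msw = ((y - group_means[:, None]) ** 2).sum() / (J * (n - 1))   # within-group mean square
icc = (msb - msw) / (msb + (n - 1) * msw)

# Design effect: factor by which the variance of the grand mean is inflated
# relative to a simple random sample of the same total size.
deff = 1.0 + (n - 1) * icc
print(f"estimated ICC = {icc:.3f}, design effect = {deff:.2f}")
```

With the values used here the population intraclass correlation is τ/(τ + σ²) = 0.20, so the variance of the estimated grand mean is inflated by a factor of about 2.8 relative to what a model assuming independent observations would report.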
1.2 Objectives

The primary objectives of this dissertation are: (1) to review previous relevant advances in statistical modeling and estimation procedures for the multilevel structural equation model; (2) to describe a multilevel structural equation model and develop empirical Bayes estimation procedures for ML estimates via the EM algorithm; (3) to write the computer program needed to implement this new estimation procedure; and (4) to demonstrate by the use of simulated data that this estimation procedure produces accurate parameter estimates.

1.3 Brief History of Single Level Structural Equation Models

The measurement of latent constructs with multiple manifest variables began with the work of Spearman (1904) early in this century. In 1904, Spearman proposed the method of factor analysis to investigate Galton's theory (Galton, 1883) of "intelligence," namely that a single common factor and a specific factor constitute cognitive ability as measured by educational tests. This model evolved to represent intelligence with a hierarchical structure (Vernon, 1950). Thurstone (1947) extended Spearman's theory to a multiple factor analysis model. Apart from Spearman's factor analysis there is the work of Wright (1934), who derived path analysis for research in genetics. Before Lawley's (1943) development of the maximum likelihood function for factor analysis, the classical method was not based on the statistical theory of random sampling. Although computational methods were not available at that time, Lawley derived the partial derivatives of the logarithm of the likelihood function with respect to each element of the covariance matrix. Building on Lawley's ML estimation theory, Joreskog (1967, 1973, 1977) developed the structural equation model. "Linear structural equation modeling (1977) represents an important combining of the traditions of econometric and psychometric methods producing a set of procedures that enable researchers to separate the structural part of the model from the measurement properties of the variables. The structural part of the model represents hypothesized networks among latent construct variables imperfectly projected in the observed indicators. This scheme of formulation allows the separation of issues of measurement error from the assessment of the structural relationships that embody the actual purposes of the research. This tradition has seen many applications in education, psychology, and sociology over the last 20 years." (Raudenbush and Schmidt, 1991)

Joreskog (1977) adapted two optimization algorithms, steepest descent and Davidon-Fletcher-Powell. More recently he extended LISREL to nonlinear structural models (Joreskog and Sorbom, 1993). The recent version, LISREL 8 with PRELIS 2 (Joreskog and Sorbom, 1993), provides the easy-to-use SIMPLIS command language in its PC version. As noted by Austin and Wolfle (1991), structural equation modeling is not a recent development (Bentler, 1983; McDonald, 1978), nor is it the work of any one individual or disciplinary area. There have, however, been other lines of inquiry. A second line of inquiry is represented by the work of Bock (1960), Bock and Bargmann (1966), Wiley (1967), and Wiley, Schmidt and Bramble (1973). They addressed a set of models formally parameterized as the factor analysis model but with different notions as to the roles of the parameters themselves.
The model for the observed score vector of p tests is : The model implies that the vector y has a multivariate normal distribution with mean vector u and covariance matrix 23: 2=A:,+n,2,] (1.4.12) n J" de zji‘Zde 'Y.d)()7,—d _j7..d)T (1-4-13) d j=l 1 D J4 ")4 _ _ T Sw = —Z 2(yijd _y.jd)(yrjd —y.jd) (1-4-14) N_J d=l j=l i=1 with the following definitions: )7, = the sample mean vector for the d-th subset. ya, = the outcome vector for the i-th individual in the j-th group classified into the d-th subset. Jd = the number of groups of the d-th subset. ,u = the population grand mean vector. 14 n, = the total number of individuals in d-th subset, d=1,...,D. njd = the number of individuals in j-th group classified into the d-th subset. j = index for the groups classified into a subset of distinct size N = the total number of individuals in a study. J = the total number of groups. 531,, = the sample group mean vector for the j-th group. Note that 1': 1,...,njd. j =1,...,Jd. d=1,...,D From a structural equation modeling point of view, the multilevel data ML fitting function can be viewed as corresponding to a simultaneous analysis of independent observations from D + 1 heterogeneous “populations”. with the D populations for SM ’s plus the within-group “population”. All of the parameters are constrained to be equal across the D between-group populations except for the scaling factor, nd. In equation (1.4.41) “it should be noted that the between sample covariance matrices may be singular due to being created by summation over fewer units than variables. This may prevent the use of certain conventional structural modeling software where positive definite matrices are assumed.” (Muthen, 1990). To find the ML estimates one has to set up a command file (see, Muthen, 1990) to run EQS or LISCOMP program according to the model equations. By use of the statistical concept of "missing data" (Dempster, Laird, and Rubin, 1977) Raudenbush (in press) developed a new estimation procedure. In theory the balanced data is equivalent to the complete data, while unbalanced data is incomplete data. In the unbalanced case, the E-step computes the conditional 15 expectations of the complete data sufficient statistics given observed data and the current parameter estimates. In the M-step the standard EQS software can be used to find the ML estimates of the parameters iteratively. Raudenbush (in press) postulates that "by supposing one has sampled n units within each of J clusters, one can apply the Muthen's balanced-data method. However, within each group k, only n”. observations are available with n—nU observations missing. Then one might regard the balanced data as the complete data, the observed unbalanced data as incomplete data. Consequently the maximization of the complete data likelihood is the same as the Muthen's balanced data approach for hierarchical data using the LISCOMP (Muthen, 1986) or EQS (Bentler, 1989) and so on”. The complete data sufficient statistics can not be observed but can be estimated by means of their conditional expectations given the observed data and a current estimates at the parameter values. This process constitutes the E-step. At level 1 (within cluster) we have p observations on each of n units, collected in the p by 1 vector y,. These vary randomly around the cluster mean ,u, according to the model: yq=flj+e,,e,~N(0,2) (1.4.15) The model at level 1 can be written as : [:lj]=[:lj:lpj +[:”] (1.4.16) 2 2; 21' 16 where y”. is pnlj by 1 observed vector, yzj is p122]. by 1 missing vector. A”. 
=1m®1p A2,. =lm®1p sq, =1,U®2 (1.4.17) ‘11,, 4,1,8): "=71” +1221. At level 2 the model can be written as: ”j : fly +upuj ~ N(O: Tim) (1.4.18) x1. is a vector of group level observed variables. The expected value of the complete data sufficient statistics for within group variance covariance matrix given the parameter estimates from the M step of the previous iteration is: J O 0 SW = Slw +W(J_ 1)2w +ijnj[Lj +(ylj -y2j)(j71j _y2j)T] (1419) j=l S“, is the usual pooled within-group variance-covariance estimate based on the observed data; T" 1420 T... (..) 17 L1. = (1211.2: +T')“n,j. ;' (1.421) W __ " n (1.4.22) w=1v,/n1 N2 is the total number of missing level-1 units. 57,}. is the observed group mean vector and 72']. is the posterior mean vector for the missing observed data in each group. ii,- = LT?”- +(1-L‘}-‘)l'x,- (1.423) r = Tali? (1.4.24) The expected value of the complete data sufficient statistics for S” given the parameter estimates from M-step of the previous iteration is: 5,, = S... +26, —r)1w,(i;,- —r.,.)—Wo7; —y.)1’ (1.4.25) 51,. = 2x)??- - Jfi’ (1.4.26) The sufficient statistic for Zn is: J Sm =ij f—Jfi’ (1.4.27) j=l The expected value of the complete data sufficient statistics for variance covariance matrix for y given the parameter estimates from the M step of the previous iteration is: 18 S» = 1:27:57; +nwaL}l +JWZW 41.077" —(n/J)Zw,2.L;' +172 (1.4.28) where 7; = (1--wj)ylj +wjj7;j (1.4.29) y‘ =(1—W)j7,+1772’ (1.4.30) _ 1 J _ yr =72"qu (1.4.31) 1 i=1 J N1 = Zn”. (1.4.32) j=l . 1 J . Y2 =1—v-Z(n-nl,-)Y2, (1.4.33) 2 j=1 Given the starting values produced by Muthen’s ad hoc estimator, expected values for the complete data sufficient statistics are calculated by a Fortran program (Jo, 1993). These estimates will then be used to obtain maximum likelihood estimates of the parameters using the EQS program. An executive computer program provides the mechanism to switch on the Fortran program for the E-step and then the packaged program for the M-step. The previous work in the field of multilevel structural equation models made substantial progress. In this dissertation we propose a new approach which does not require classifying level-2 units into a subset of equal sample size. In conclusion of this chapter we provide a brief preview of subsequent chapters. In chapter 2, we present the general structural equation model with an 19 example. We also transform the model in terms of mixed model form. And we will briefly describe some typical research questions that may be addressed by means of multilevel structural equation models. In chapter 3, empirical Bayes estimation procedures are developed. The maximum likelihood estimators for the parameters are given, and we present the observed log-likelihood function. In chapter 4, artificial data are generated for checking the accuracy of the parameter estimates. The analysis is carried out by use of a program in Gauss. The index of goodness-of-fit of the model is presented, and the likelihood ratio for two alternate models is also given. In chapter 5, the summaries of each chapter are given, and the implications of the models are discussed. And future research questions are also presented. CHAPTER 2 MULTILEVEL STRUCTURAL EQUATION MODELS To illustrate how measurement and substantive theory can be integrated between and within levels in one overall fi’amework, a hypothetical achievement model will be examined as an example. 
Consider a model where achievement scores of a mathematics test are believed to be influenced by a student's attitude toward mathematics, individual characteristics, e.g., gender, and class characteristics, e. g., teaching styles. The teaching styles such as discovery- oriented instruction or expository teaching are believed to influence attitude and achievement on the classroom level. Gender also is believed to be related to students' attitude and achievement on the individual level. The path—diagram for this hypothetical achievement model is shown in Figure 2.1.1. In our example attitudel measures an individual's view on the usefirlness of mathematics in our life and is based on the sum of scores on the four attitude items, each of them scaled as a Likert (1932) response with categories: strongly disagree (1), disagree (2), undecided (3), agree (4), and strongly agree (5). These items are: l. I can get along well in everyday day life without using mathematics. 2. A knowledge of mathematics is not necessary in most occupations. 3. Mathematics is not needed in everyday day living. 4. Most people do not use mathematics in their jobs. Attitude2 measuring "Attracted" to mathematics, is based on the sum of the scores on the five attitude items, each scaled as five-category Likert; strongly disagree (1), disagree (2), undecided (3), agree (4), and strongly agree (5). These items are: 1. I would like to work at a job that lets me use mathematics. 20 2 1 2. I think mathematics is fun. 3. Working with numbers makes me happy. 4. I am looking forward to taking more mathematics. 5. I refuse to spend a lot of my own time doing mathematics. The ACHl is the first part composed of basic facts and principles, while the ACH2 is composed of problem solving questions. teaching style betabl betab) class class attitude achievement alphabl 1.0 Lambdabl 1.0 LambdabZ attttudel attitudeZ ACH I I . 0 Lambdawl I .0 Lambde I ha! attitude 0 p \ / achievement beta] beta2 gender Fi 2.1.1AP hD’ mforM l' E u tionM e1 2.1 The Model and thg Basic Notation 22 A simple item level equation for each individual: yr] : Awntj +1‘b’7bj+ 81'} 4H ylr’j y 31; _y4tj y2ij 1 N‘t ‘- bfl 2-1 i Up\ 0 ‘ ' 1 o ‘ 0 7719' + ’11» 0 F771»): 1 7721'; 0 1 JIM): w2_ .. O 152.1 + 82" 33,]. (2.1.1) (2.1.2) where j=1,2,..,J for classroom, i=1,2,...,nj for students nested in classroom j. The subscript "w" means the within-level, while "b" means the between-level. In terms of our educational example, equation (2.1.2) can be expressed as follows : attitude2 ,1. ACH1, where 3:1 ~ N (O, 2), a typical form for 2‘. is : 2 = —attitudel,j _ _ ACH2, _ attitude”. achievement”. }. attitude“ I. _achievementb21. O 0 a: O 0 0% O 0 a ]+ Assuming structural linear relationships among constructs, the theoretical ' (2.1.3) 2 3 relationships on the within-level depicted at the bottom part of Figure 2.1.1 can be expressed through the following structural equation: ”10' = 0 0 Um- filo flzo z"). uuj [7721!] [a] Oiinzyi+ifi30 [3,0] [220.] +[u2ij] (2-1-4) where 11,]. = [uwuw]r, u”. ~ N(O,A), A = Diagonal(5p,p =1,...,P). In terms of our educational example, equation (2.1.4) can be expressed as follows : attitude”. = O O attitude”. + .510 320 1 + ”Ir (2.1.5) achievement,j a, 0 achievement”. '63,) ,640 gender}, 112,1. Equation (2.1.4) stipulates that on the individual-level the latent variables are captured as a structural linear function of themselves and the predictor variable. In our example, gender is used as a predictor variable. In equation (2.1.4) 22,]. 
is gender, while 2“,. is unit value so that the model has intercept terms. Now we reduce equation (2.1.4) into the equation (2.1.6). [“1 =1: 1 1% (2111111 ‘11") 772:; ‘a1 1 flso 540 22a _al 1 ”211' Then the reduced form for the within-level structural equations (2.1.4) is : 24 ”to ,. 2,. z ,. O O - 7: vi. '7" = " 2’ 2° + ” (2.1.7) 7720' O 0 21:1" Z;.~,-_ 7’ 30 V20 1.71.40- where P750- -1 ”20 =veco[l:1 0] [7610 £201) ”30 ‘a1 1 1630 7640 -7’403 where vec' stacks the transpose of each row of a matrix into a vector. In terms of our educational example equation (2.1.7) Mean be written: l'fllo'i attitude”. = 1 gender, 0 0 7:20 + V.) (2.1.8) achrevement,j O O 1 gender”. - 7:30 v20. _”w_ Now on the between-cluster level, the structural relationships depicted at the upper part of Figure 2.2.1 can be expressed as follows : 25 [77011:] ___I: 0 0:][77011] + [flat] [WI] + ubu‘ (2.1.9) 77sz ab] 0 77sz b2 "sz where uh}. = [umum]r, uh]. ~ N(0,A,,), A, = Diagonal(6,,p,p =1,...,P). In terms of our educational example, equation (2.1.11) can be expressed as follows : attitude ttr't de . . . bi = 0 O a. u b’ + p“ [teachingstyle.]+ u,” (2.1.10) achrevementbj a b, 0 achrevementbj ’ ub b2 21' Equation (2.1.9) is the expression for the structural relationships among latent variables and the predictor variable on the group level. In equation (2. 1.9) w]. is teaching style. All of exogenous predictor variables are observed directly Without error, e. g., school location (rural, urban), school sector (public, nonpublic), religion, gender, ethnicity, family size (numbers of a household), individual's age in months and years, current membership in a political party or sports club. Then we have the reduced form for between-level structural equations (2. 1.9) m” = l 0 ‘1 flu 1 0 4 ubli [771.2,] [—a,, 1] [WNW-FLO!“ 1] [um] (2.1.11) We can represent (2.1.11) in the following form: 26 77b} = Pyjflbj + Vb; (2.1.12) w 0 V - [mu]:[ 1 ][”b10]+[ bit] (2.1.13) ’7sz O W]. ”020 vblj 01' where Vblj $‘\0 Vb” :[ 1 0]“ uh”- v.~N(0T ) V321 "Qbr'l V52" —abl 1 ”b2,- ’ b] , m’ ’ ”bro __ 1 0 -1 flbl [”bzoi—[—ab1 1] [flbZ] In terms of our educational example, equation (2.1.10) can be written as follows : [ attitude,j ] ___ [teachingstylej 0 ][7rm:l +[Vb1 J] (2 1 14) achrevementbj 0 teachmgstyle 1' am v”). Representation of the Equations in Matrix Form We can represent the equation (2.1.1) in matrix form without subscripts: y=AWn+ABnb +3 (2.2.1) 27 where y : [y111y211y311y411,---,y1n,ly2n,/y3n,1y4n,.l IT: incur emple, 4N by l veCtor~ a = [amaznsmsm,...,£,nfleznfl£,nfle,nf,]r ,in our example, 4N by 1 vector. N 2 Zn}. "Aw, 0 0 0 ‘ 0 11,, 0 0 AW : 0 0 A103 0 s L 0 0 0 A,“ FA”, 0 0 0 I _ 1 0 _ 0 A”, 0 0 1,, 0 where, A”. = 0 O AM.3 0 , A”... = 0 l o o 0 Am, L 0 2.9L "A,l 0 0 0 1 _ _ 0 A 0 O 1 0 b2 A.“ 0 AB: 0 0 A,, 0 ,Abj- 0 1 0 11 _ 0 0 0 Ade - ”- 77,1. =[77,”772”,...,17W‘Jn,nr,]r is in our example, a2N by 1 vector 28 77,, = [nb1ln52,,..., 1),, Jaw? is in our example, a 2] by 1 vector The matrix form for the equation (2.1.4) without subscripts is: n: An+Bz+u where ’A, 0 0 0 A2 0 A = 0 0 A3 _ 0 0 0 "B, o 0 0 8,. 0 B, 0 0 0 B: o 0 B3 0 ,8I 0 _0 0 0 B, L0 filo #20] .. B.= forallor). " [.330 fi40 (2.2.2) u = [u,,,u2”,...,u,,, flu," 1,]7 is in our example, a 2N by 1 vector. 29 The matrix form for the equation (2. 
1.9) without subscripts is: 1],, = Ab 17,, + Bbw + ub (2.2.3) where FAbl o o o‘ "A”, 0 0 0' 0 A12 0 o O Ab}.2 0 0 0 0 Ab: 0 0 A,3 0 ,Abj= 0 0 Ab); 0 ,Abj,=[ ] ab, 0 _0 0 0 A,” L0 0 0 0 Arm Ab], is a lower triangular matrix with diagonal elements are zeros. 3,, 0 0 0 0 3,, 0 0 B - 0 0 B 0 B —[fl“°] ' b _ b3 9 bj — 801088811]. 761220 _ 0 0 0 Bud u, = [ub,,ub2,,...,uwum]r is in our example a 2] by 1 vector Now we can express the reduced form equation (2.1.7) in matrix form: 7] = Z7: + v (2.2.4) 30 where, 771.). = [7711177211,...,771,,J J 77an ,]T is in our example a 2N by 1 vector _ _:r,,.j 22,]. 0 0 up It)" ‘21)“ V.)- = [vmvm ,--«,V1..,JV2..,; ]T is in our example a 2N by 1 vector. We can express the between-level reduced form equation (2.1.12) in matrix form: ’71, = Wll'b +Vb (2.2.5) 272* I where 1], = [abut],21 ,..., 0,1,77sz ]7 is in our example a 2] by 1 vector 31 W: . , ng_=|:u/1J 0], 7Tb=|:flblo:| 0 Wu ”1220 3’ vb = [vbllvb2,,...,vmvb2,]" rs rn our example a 2M by 1 vector Table 2.1 Structural parts of the multilevel structural equation model Original Form Reduced Form where Withingroup n=A77+Bz+v n=Z7r+v 7r=vec[(12—AJ.,.)"BJ,] Between group 17,, = Abnb + Bbw + vb 7h = Wzrb + vb 71;, = (12 — Ab], )'1 Bb). e} $311 4 ‘ Transforming the Model into the Mixed Model Form By substituting the structural equations (2.2.4) and (2.2.5) shown in Table 2.1 into (2.2.1) without subscripts, we have the following combined equation (2.3.1). This representation permits us to develop a special version of the EM algorithm for multilevel structural equation models. y =[A,,Z |A,W][:]+[A,|A,][: ]+a (2.3.1) 32 In a more compact form we have: y: AOZa+ Aow+£ (2.3.2) where A0 : [AwiAbJ’ N1 ll NI 0 I__1 The model equation (2.3.2) is a special case of the general mixed model (Raudenbush, 1988) Y=4Q+AQ+E Q3” .A,=1\,Z, A2 = A0, g=w/*”‘” 03% 0, = 7r,'/ \ (7.1 4-1 ‘ .. E=£,"*""M 3 3 In equation (2.3.3), 6, ~ N (0,1"), since our prior knowledge about 19, is assumed null, the prior precision associated with 6, becomes null. 02 WM», 92 3893890 r 1,031, 0 0" 0 1,81“ E ~ N(0,‘I’). ‘I’=IN®Z, Based on this general model(2.3.2) I develop the empirical Bayes estimation procedure in Chapter 3. In our structural equation model, we consider a population of N level-one units, indexed k (group) and i (individual). Associated with each level-one unit are three vector- valued variables y, z and w. The values of the design variables, 2 and w, are completely known for all level-one units before observations are carried out, but the values of the outcome variables, y (the four indicators in our illustrative example), are not known at all. Design variables are considered fixed and known in our multilevel structural equation models. Then the marginal distribution of y is: y ~ N(p,) (2.3.5) where ,u= AOZn' (2.3.6) 34 ch: AOZHAOZ)’ + AorzAf, + z where And the conditional distribution of y given r] is: yl n~ N(Aon,>3) (2.3.7) - (2.3.8) Note that in the model there are not measurement errors at 2 levels that are distinct from the model residuals. This is a limitation on the illustrative example. To lmve an identifiable model we restrict the factor loadings for the first indicators of the latent variables to unit, the variance-covariance matrix 2 to a diagonal matrix, and “A” matrices to be lower triangular. In our model the total number ofvariance-covariance parameters to be estimated are 16, while the number of unique elements of the variance structure are 10 for each level. 
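As a small numerical illustration of the covariance structure in equations (2.3.5) through (2.3.8), the sketch below assembles A_w = I_n ⊗ Λ_w and A_b = 1_n ⊗ Λ_b for a single group of n = 10 members and forms the implied covariance matrix of the stacked outcome vector, conditional on the observed predictors z and w. This is a sketch only: it borrows the population parameter values later used to generate the artificial data in Chapter 4, and all variable names in the code are ours rather than the dissertation's.

```python
import numpy as np

n = 10  # members sampled in one group (the balanced case of Chapter 4)

# Measurement loadings of (attitude1, attitude2, ACH1, ACH2) on the two
# latent variables (attitude, achievement); values as in equation (4.1.1).
Lambda_w = np.array([[1.00, 0.00],
                     [0.82, 0.00],
                     [0.00, 1.00],
                     [0.00, 0.73]])
Lambda_b = np.array([[1.00, 0.00],
                     [0.75, 0.00],
                     [0.00, 1.00],
                     [0.00, 0.66]])

# Latent residual covariance matrices and unique variances
# (population values listed in Table 4.3).
T_pi  = np.array([[30.0,  9.0], [ 9.0, 32.7]])   # within-group level
T_pib = np.array([[20.0,  4.0], [ 4.0, 20.8]])   # between-group level
Sigma = np.diag([10.0, 12.0, 14.0, 16.0])        # measurement error variances

# Stacked design matrices for one group: A_w = I_n (x) Lambda_w, A_b = 1_n (x) Lambda_b.
A_w = np.kron(np.eye(n), Lambda_w)
A_b = np.kron(np.ones((n, 1)), Lambda_b)

# Implied covariance of the 4n x 1 stacked outcome vector, conditional on z and w:
# within-level part + shared between-level part + measurement error.
Phi = A_w @ np.kron(np.eye(n), T_pi) @ A_w.T \
    + A_b @ T_pib @ A_b.T \
    + np.kron(np.eye(n), Sigma)

# The 4 x 4 block for a single student is
# Lambda_w T_pi Lambda_w' + Lambda_b T_pib Lambda_b' + Sigma.
print(np.round(Phi[:4, :4], 2))
```

Conditional on the predictors, the implied variance of the first indicator is 30 + 20 + 10 = 60, close to the sample variance of Y1 reported in Table 4.2, while the blocks linking two different students in the same group contain only the shared between-group part Λ_b T_πb Λ_b'.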
One can also include the group-level observed variables (global variables) for latent constructs in the general multilevel structural equation model (2.1.2). For example, in the study of the United States SIMS data each of the constructs of teaching practices and training and experience are measured by two indicators (Vredevoogd, 1993). In that case we have the following form of the item-level equations for each individual: "y,,,‘ ’1 0‘ F1 0 0 ' ”an,” y 21:: ’1'; (I) r A: 0 O P772311: 521:: y3ld 77m 1 0 33k, ym = O ’12 _772u]+ 0 ’182 0 mm + 341:: (2.3.9) xw 0 0 0 1 “my: 35,“. _x2a., _ O 0 _ L O O ’183_ fan-J where x,,,, ,th are the indicators for the group-level construct, teacher's teaching practices, which is supposed to influence the class posttest (Vredevoogd, 1993). Then one can see that this model is a special form subsumed into the general model (2.3.1). In the conclusion of this section we note the measurement model specifications. There are three types of specifications. The first measurement model is implemented by requiring equal factor loadings for all manifest variables and equal unique variances (Joreskog, 1971). The second measurement model retains the assumption of identical error variances across measures, but allows factor loadings to difi‘er. The second model provides a more realistic description of actual data where observed measures are similar in content but differ in difficulty. The third measurement model is that the observed measures have identical factor loadings but have unequal error variances. Of particular importance are the measurement models in which those measurements have different factor loadings and unequal error variances, but the manifest variables are highly correlated (i.e., they measure the same thing to somewhat high degree). CHAPTER 3 EM ALGORITHM FOR MAXIMUM LIKELIHOOD ESTIMATES After a model has been formulated, the statistical problems are to estimate the parameters in the model and to test the fit of the model to the data. General descriptions of the EM algorithm for the multilevel structural equation models are given in this chapter. Dempster, Laird and Rubin (197 7) presented the EM algorithm as a general iterative method for computing maximum likelihood estimates fi'om “incomplete data”. Wu (1983) presented it in a more general context, viewing it as a special optimization algorithm. The EM algorithm is particularly usefirl when analytic expressions exist for the conditional expectation of the missing data and for the maximum likelihood estimates (MLE) of the model parameters given the observed data and missing data. Although in the literature it has been known as a method for estimating parameters of a model when observed data can be regarded as incomplete data, there were early uses of EM notions by Hartley (1958), Healy and Westmoratt (1956), Baum et al (1970), Brown (1974) and Sundberg (1974). In Rubin (1991) the essential idea of EM algorithm is briefly depicted: “The basic idea behind the EM algorithm is very old and very intuitive and can be colloquially described follows: 1. Given a problem that is difficult to solve, formulate it so that if missing data were observed, then the solution would be at hand; in particular, formulate the problem so that a good estimate (e.g., the maximum likelihood estimate, MLE) of the 36 37 parameter 9, 6, would be easy to find if the missing values, Y were observed in "HS ’ addition to the observed values, Y . 
Notice that ”missing data” is viewed quite broadly to include, for example, latent variables in psychometric models. 2. Consequently, fill in a set of values for Ym and solve the problem(i.e., find 6). 3. Using this 6, find better values of Y rm's to fill in, and then repeat Point 2 to find a new value of 9. 4. Iterate until the values of 9 converge." Based on this basic notion, one can conceptualize the implementation of the EM algorithm for the multilevel structural equation model. In section 3.1, we discuss the concepts of incomplete and complete data as applied to the multilevel structural equation model. We also develop the posterior distributions of the random vectors in equation (2.1.3). In section 3 .2, we present the iterates for the implementation of EM algorithm. We also present the maximum likelihood estimates. In section 3 .3, we present the observed-data log likelihood fimction. 3.1 General Description and Application to the Multilevel Structural Eguation M2191 Through casting the measurement model and the reduced form of the structural equation for latent variables into the general mixed model (Raudenbush, 1988), we can conceptualize our problem as having complete data and incomplete data. Note that in the multilevel structural equation model the factor loadings are parameters rather than observed predictors. 38 Then to compute maximum likelihood estimates of the dispersion matrices for the random vectors in the model (2.3.1), we apply the EM algorithm (Dempster, Laird and Rubin, 1977). We now discuss the concepts of complete data and incomplete data, as applied to model (3.1.1) From Chapter 2 we have the following combined equation: A1011“ + NW5 “W “cil- 34L, (11' “42 'ul 2' ya =[AWquA W]I::b°o]+A,,6,j+A,0w+gv (3,1,1) In more compact form we have : {it ' i GU i J L 9% , ya. =A 0Z 7r+AowU +80. (3.1.2) 4" [(WU' )0 ' \ where at“ I" "i ~ Zfi 0 "Ao=[A,|A,], Z..-—- 0 W“ (3.1.3) 44 4” W 4 2.". r o 7::[4'9], 7r~N(0,I‘),1“=[l 1. i141 ”b9 0 er Since our prior knowledge about it is assumed null, 1“", the prior precision (Dempster, Rubin and Tsutakawa, 1981) associated with it, becomes null, that is, I"I —> 0. And, 39.. 8,, ~ N(0,>:), 2 = In the model equations, yo. =A,, ,1. +Ab’hj +531. %=%%+% ”a =Wr”bo+‘9br O = {A,,A,,2,Tm, T0} is the set of parameters. yo,” = {Y,Z,W} is the set of observed data. c = {#0, It”, 65,, 0”. , .} is the set of missing data. r_l.n()\Or'V\ {OW} ‘1 § (3.1.4) 40 The conditional probability density function is proportional to-ithe joint probability , d density firnction : f (495.195,) °C (271)—N'I2l23l—m2 “FIFO-5):: 2 0’1,- — Aozafl" A0 975-) )T 2" (ya. — AoZng— A0 w, )] x (2 70"” |T,,|"‘"2 exp[(-0.5)Z 2 (6;T1 6 )] ’7 if x(27t "“2le rm exp[(—0.5)Z(0;T;9,.)]xh(n) (3.1.5) where c={ no, 750,0 .,0,,.,e,,} , G) = {A,,Ab,Z,Tm,Tn}, r=the number of indicators, p=the dimension of 6”. And s=the dimension of 0b,. The prior distribution h(7r) is considered a very small constant and it can be ignored while the empirical Bayes estimators are calculated (F otiu, 1989; Dempster, Rubin and Tsutakawa, 1981). If ”e" were observable, some function of "c", t(c) would be a vector of complete data sufficient statistics for the dispersion matrices. In reality, the vector ”c" is unobservable; however, the vector y, whose elements are linear functions of the elements of "c", is observable. In the realm of the EM algorithm, we regard the elements of "c” as the "missing data" and those of y as the "incomplete" data. 
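Before developing the E-step and M-step for the full model, it may help to see the same complete-data/incomplete-data logic in a deliberately simplified setting. The sketch below is an illustrative analogue, not the estimator derived in this chapter: it applies EM to a one-way random-effects model, treating the unobserved group effects as the missing data, so that the E-step computes their posterior means and variances given the data and current parameter values, and the M-step updates the parameters from the resulting expected complete-data sufficient statistics. All function and variable names are hypothetical.

```python
import numpy as np

def em_random_effects(y_groups, n_iter=500, tol=1e-8):
    """EM for y_ij = mu + u_j + e_ij, u_j ~ N(0, tau), e_ij ~ N(0, sigma2),
    treating the unobserved u_j as the 'missing data'."""
    J = len(y_groups)
    n = np.array([len(y) for y in y_groups], dtype=float)
    N = n.sum()
    ybar = np.array([np.mean(y) for y in y_groups])

    mu, tau, sigma2 = ybar.mean(), 1.0, 1.0       # crude starting values
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each u_j given y and current parameters.
        lam = n * tau / (n * tau + sigma2)        # shrinkage factors
        u_hat = lam * (ybar - mu)                 # E(u_j | y)
        v = tau * sigma2 / (n * tau + sigma2)     # Var(u_j | y)

        # M-step: maximize the expected complete-data log-likelihood.
        mu_new = sum(np.sum(y - u_hat[j]) for j, y in enumerate(y_groups)) / N
        tau_new = np.mean(u_hat ** 2 + v)
        sigma2_new = sum(np.sum((y - mu_new - u_hat[j]) ** 2) + n[j] * v[j]
                         for j, y in enumerate(y_groups)) / N

        change = max(abs(mu_new - mu), abs(tau_new - tau), abs(sigma2_new - sigma2))
        mu, tau, sigma2 = mu_new, tau_new, sigma2_new
        if change < tol:                          # stop when successive iterates agree
            break
    return mu, tau, sigma2

# Example with unbalanced group sizes (6 to 10 members per group).
rng = np.random.default_rng(1)
groups = [rng.normal(2.0, 1.0, size=m) + rng.normal(0.0, 1.5)
          for m in rng.integers(6, 11, size=200)]
print(em_random_effects(groups))
```

The multilevel structural equation model replaces the scalar group effect with latent vectors at the two levels and the scalar variances with T_π, T_πb, and Σ, but the alternation between conditional expectations and closed-form maximization steps is the same.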
Then we can develop the iterative E-step (expectation step) and M-step (maximizing step) for computing new parameter estimates. The E-step consists of estimating t(c), which would be a complete data sufficient statistics [if the vector "c" of complete data were available, by its conditional mean given the observed data and current estimates of O. The equations that are solved for parameters in the M- step can be regarded as an approximation to what would be the likelihood equations 41 if the vector "c" of complete data were observable. From Dempster et al. (1977), Each iteration of the EM algorithm solves : £[Q(®,®m)|e=ew, ]= 0 where Q6190") = E(1nlf (C;@)]|Y = 359"") The necessary posterior location vector and dispersion matrix of the random vectors in the model (2.3.3) is : (F D. = Apr-'54, +1“l Any-‘54, “= D; c;, A,‘I’"A, A,\P"A, +r" Cg, D; D; = 14‘?" A. - Ai‘I’“A5) = ( Nr/2)ln 21t+(-N/2)ln|2| 3 u . f a 1 +(—Np/2)ln(27t)+(— N/2)ln|T |+(- Js/2)ln(27r)+(— J/2)ln|T,,l_ -2243 {(A0 2, )D; (A 2.1") +(,\ 1); A7,) +§>1-§ZtrlT.;‘E(6b,9JIY= yo“ ”)1 (3.2.1) In order to maximize, we take the first derivatives of Q(®‘”,®‘"”) with respect to Tme, 2,Ao, respectively. 4 3 '7 _,- T r ' . ’/ I I; r "p I a. '. l ) 0‘Q(®“’,®""’) = _ g 51‘ 2 q (1 [2’r,;‘—D(T,;')] ii -i ”’éZZIzTg'EWMIY=y.®“"’)'r;'-D{T;'E(0t0;IY=y,®“"’)T;'}1 (3.2-2) where D means a diagonal matrix (Graybill, 1983). Thus D(T) is a diagonal matrix with i-th diagonal element equal to the i-th diagonal element of a matrix T. Setting the derivative equal to null matrix and solving gives the ML estimate (see, Press, 1982; Magnus and Neudecker, 1986). Then the ML estimate is : . 1 , T». #23656? +09. )1 (32-3) 5Q(@(",@("”)__1_V_ —1_ -1 (2) 51. - 2 [2T,,, D(Tq,)] 'Ib +%Z[2T;IE(%9LIY = x0“ >1: 'D{T;E(0br9;|Y = y,®““’>T;.‘ H (32-4) Setting the derivative equal to null matrix and solving gives the ML estimate. Then the ML estimate is : Tm =%[Z(0;9;f +D;U)] (3.2.5) {U e Y J C , y a 4 1.2- In. a"? :3 "T \‘l’ t ‘ . L I ‘+' j)4 44 (3) @(®U)a®(i-l)) af as, 502 I' = [—%[22-' —D(2‘.")] 1 _ ,._ _ _ _ . 4322122 'E(a,.,a,.f|Y= y,®< ”)2 ‘ —D{2 ‘E(a,.jg,.f|r= y,®( ”)2: 1}]] x1, 0 r 7-. (3.2.6) - x J I” where I: is the column indicator vector which has a 1 in theX—th position and zeros in other positions. And 2, is a full matrix. Setting the derivative equal to zero and solving gives the ML estimate,. ea. In appendix 4 we present the ML estimator for each element of 22. i=DiagonaI(6f,..,6f) (3.2.7) 52 _ ._ (3) g @ 5Q(®"’,®" 1)) 5A7 = _ -1 ”i 0 _ J“ -1 v ”r (4) M 0318,, [ [ZZZ AZUCW] [>2 ACWUZU] Q o “' ® o 422 2"AZVDJJ HZZ 2"ADWUJ v ‘/ ~ +022 2"yy-n"251+tzi>="yl will-[ZZZ >3“AZ".zr‘rr"Z.-I 1 (l ) (l) J {22 E" (Auditqz; +AZJ. 71" wig] —%z Z Z‘lAw; q?%]] x 1:27): (3.2.8) '4- ) (S) J 45 90) 90-1)) é’A where A is a fill matrix. The details of derivation of 6Q( are given in appendix 5. Setting the derivative equal to zero and solving gives the ML estimate, 28,. In appendix 5 we present the ML estimate for £0. ii, = [AAA (3.2.9) where [askL means placing element as, in the g-th row and k-th column of matrix A, and zeros elsewhere. In our example, g= 2, 4, k= 1,2,3,4. In sum the E-steps and M-steps are: (1) E-step : Find Elog[L(c,o)|y,Tf;-'>], M-step : Substitute the equation (3.2.3) with these quantities, and then we obtain new T”, set TS) equal to this new T", (2) E-step : Find Elog[L(c,®)|y,T,h("”], M-step : Substitute the equation (3.2.5) with these quantities, and then we obtain new Tm, set T5,? equal to this new Tab. 
(3) E-step : Find E log[L(c,®)| y, 23 (H) ], M-step : Substitute the equation (3.2.7) with these quantities, and then we obtain new 23 , set E“) equalto thisnew 2. 4 6 (4) E-step: Find It", 01:, Dz“, 0;, CW“. Notethat these are all fimctions of A‘s"). M-step : Substitute the equation (3.2.8) with these quantities, and then we obtain new A0, set A? equal to this new A0. Then here the first iteration of the E and M step is completed. This algorithm proceeds until some user-specified termination criteria are met. For example, the algorithm might terminated when successive iterates differ from each other by no more than some number (i.e., = O“5 ). 3.3 Likelihood Function We conclude this chapter with expressions for the observed log-likelihood function which is numerically simple to evaluate. Although the EM algorithm does not require an evaluation of the likelihood function, successive values of the fimction can be usefiJl in monitoring the progress of the algorithm toward convergence at each iteration. And it's used in testing fit of alternate models. Note the relationship among probability density functions : =Po’lg’Pw’ 331 (y) P(6ly) (..) In the framework of the general mixed linear model, the equation (3.3.1) is rewritten as: 47 P(y|0,‘o,w,A,)P(qo,w,Ao) P(61y,Q,‘P, A0) P(y|Q,‘P,Ao) = (3.3.2) The specific expressions for each of the density functions stated in equation (3.3.2): P(y|0.0,‘P.Ao)=[(2n)”|‘l’ll"”exp[(-0.5)(y-A9)“I"‘(y-A9)] (3.3.3) Hanna/x.)=[(2n)°lm1"”epr-asxsm-‘m (3.3.4) where y : the observed outcome vector for an individual A : the design matrix for the multilevel structural equation model a: [n1 af ]’ Finally, the denominator part in equation (3.3.2) can be specified as: P(6|y,fl,‘1’./\o) = t(2n)“ID;n"*’ exp[(-0.5)(6- Wozw— 6)] (3.3.5) In particular, when 0: 0’, we have: 1301:2311, A0) = (27: 'N’2|D;|"2 |‘I’|"’2 [arm exp[(—O.5)S(0')] (3.3.6) where: s<€>=y’\r"(y—A.0:—A.6;) (3.3.7) 48 Now the log-likelihood fimction for the structural equation model may be: LLR(Q, ‘1’, Ao| y) at (-0.5)log|‘I’|+(0.5)|D;|—(O.5)longl—(O.5)S(6' ) (3.3.8) First we evaluate: det(‘P) = det(2 8 IN) = [det(2‘. )]” det<2 )=(of) (oi) ...(of> (3'39) log(det(‘I’)) = N[log of +logo§+...+logaf] (3.3.10) And also: det(Q) = det(Q,,)det(Q,h) (3.3.11) log[det(Q)] = N log[det(T,, )] + J log[det(Tfl)] I‘ is considered large but fixed, from Dempster, Rubin and Tsutakawa (1981). Finally we have: V11 V12 V13 det(D;) = det V21 V22 V23 = [det(Vu)l[det(V22 - V21V1il 12)][det(d33 — d32d2-21 23)] V31 V32 V33 (3.3.12) The second term in equation (3.3.12) is given in appendix 2 as det(Q;'). Let the third term be det(U") 49 where V V V d = ” ‘2 d = '3 d =d' = 3.3.13 22 [V21 V22 ]’ 23 V23 ’ 32 [ 23] ,d33 V33 ( ) SW) = ZEUS-24m. -A...- n‘ -A... w; )1 (3.3.14) Then the log-likelihood function for the structural equation model is : LLR(‘I’,Q, Aoly) = (N/2)log(det 2) — (N/2)log(detT,,)— (K / 2)log(det T,) +(1/ 2)log(det V11)+(1/ 2)Z log(det Q; )+ (l / 2): log(det Ug‘) -ZZ[y£2"(yu -Amfl'-Ame~)l (3.3.15) At each iteration the algorithm evaluates the log-likelihood function to monitor the progress of the estimation. CHAPTER 4 NUMERICAL RESULTS In this chapter, I use a computer program written in Gauss (Version 2.2) to compute ML estimates fi'om a set of artificial data. To verify that the produced estimates of the parameters are accurate the data are randomly generated with known (predetermined) population parameters. The analysis was done for the balanced case and the unbalanced case. 
The Gauss program is designed to use cross-product matrices and initial starting values as input data and to perform computing over numerous iterations of the EM algorithm. The path diagram for the model is given as a figure 2.1.1. In the example the two indicators for the ATT (attitude) latent variable are ATTI, ATT2. They are the student- reported responses to the questions in the attitude scale. The ACHl variable measures achievement score in the "principle" parts, while the second indicator ACH2 measures achievement score in "problem solving" part. 4 1 cneratin the Data Before creating the necessary data we have to consider several issues. For the balanced data 10 subjects are selected per group. The distribution of the number of groups per group size is given in Table 4.1. Due to the heavy computational load 50 51 of estimating these model via the EM algorithm, only a single sample data will be generated. Table 4.1 Number of Grggps per group Size Group Size Balanced Data Unbalanced Data 6 10 7 10 8 10 9 20 10 500 450 Total 500 500 To create samples to be fit to the multilevel structural equation model specified in chapter 2, we modified the covariance structure by setting var( 7:) = 0. Then the observed outcome vector y is calculated by using equation (4.1.1) : 0.2 y", "1.0 0.0 1.0 0.0“"10 2,, 0.0 0.0 0.0 0.0“ 0.1 y”, 0.82 0.0 0.75 0.0 0.0 0.0 1.0 2,, 0.0 0.0 0.31 y” 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 w. 0.0 0.33 y..,,_ _0.0 0.73 0.0 0.66_L0.0 0.0 0.0 0.0 0.0 w" 0.25 0.35 '1.0 0.0 1.0 0.0“va a, . . 0.7 0.0 v. +082 00 5 2,, + a, (4.1.1) 0.0 1.0 0.0 1.0 v,” a, _0.0 0.73 0.0 0.66JLv,2,_ s,_ 52 The values corresponding to the vector 7: in (2.3.2) have determined by using the formula as shown in Appendix 1. The necessary values for ds and fl's are : ,6l = 0.20,,82 = 0.10,,63 = 025,54 2 0.30,,6,l = 0.25,,B,,2 = 0.30, awl = 0.30, or,l = 0.20. The values for 220.,w 1,12,11,50,“. 1.,va 131.52.53.84 are generated by the following: (1) first we generate 5000 "2" variable from a standard normal distribution. Then if the value is bigger than 0.0 we assign l to it, while the value is less than 0.0, we assign 0 to it, (2) do the same for "w" variable for 500 groups, (3) generate 500 between level random vectors from the population VC (variance covariance) matrix, Th, (4) generate 5000 within level random vectors from the population VC matrix, T”, (5) generate 5000 measurement error vectors from the population VC matrix, 2 , (6) then we use the equation (4.1.1) to obtain a balanced raw data. The IMSL FORTRAN library contains the necessary several subroutines. The dimension of the observed variables is four (r=4). The dimension of the latent variables is two (p=2). And then we create an unbalanced data (4890 data points) by randomly deleting 4 for each of 10 groups, deleting 3 for each another 10 groups, and deleting 2 for each of another 10 groups, and finally delete l for each of 20 groups. 53 Table 4.2 Descriptive Statistigs for ngbles (1) Balanced Data TOTAL SAMPLE SIZE = 5000 MEAN Y1 -0.291 Y2 -0.187 Y3 -0.145 Y4 -0.097 ST.DEV SKEWNESS KURTOSIS MIN 7.735 6.477 7.913 6.055 -0.081 -0.07O -0.003 -0.021 SAMPLE COVARIANCE MATRIX Y1 59.824 Y2 Y3 16.487 Y4 1 1. 125 (2) Unbalanced Data 39.570 41.951 12.887 8.927 62.620 35.242 TOTAL SAMPLE SIZE = 4890 MEAN ST. DEV. Y1 -0.285 Y2 -0.l87 Y3 -0. 167 Y4 -0. 100 7.805 6.533 7.957 6.041 -0.075 -0.092 -0.010 -0.017 ~0.010 -0.005 -O. 
125 -0.085 36.667 0.045 -0.018 -0.029 -O.137 ESTIMATED COVARIANCE MATRIX Y 1 Y2 Y3 Y4 60.922 40.477 16.370 10.376 42.686 13.360 8.746 4.2 Regults 91' the Analysis 63.309 35.342 36.492 -33.444 -23.317 -27.877 -20.807 -29.918 -22.221 -27.866 -21.051 FREQ. MAX FREQ. y—d—ia—sy—s 1 1 l 1 27.565 23.612 25.107 24.764 28.897 24.147 26.820 19.738 1 l 1 1 SKEWNESS KURTOSIS MIN FREQ. MAX FREQ. |—|p—I_s_-‘ The output of the Table 4.3 and 4.4 are the result of fitting balanced data and 54 unbalanced data to the same model. The focus of investigation is in discovering and testing the estimates of parameters are close to the predetermined population parameters within some what sampling error. Table 4.3 Resultg of Analysis for Balanced Data Population Starting Estimated Parameters Values Parameters it”, 0.82 0.582 0.857253 4.02 0.73 0.573 0.720532 4151 0.75 0.575 0.738310 21,, 0.66 0.566 0.652453 tn“ 30.00 20.000 29.006845 1,," 9.00 7.000 9.432951 1,,22 32.70 20.000 34.328632 tam 20.00 10.000 18.407147 tam 4.00 3.000 3.528167 1,,m 20.80 10.000 19.754972 of 10.00 8.000 10.851423 0'; 12.00 8.000 11.2542916 a“: 14.00 10.000 13.7741274 oi 16.00 10.000 16.1793495 aw, 0.30 0.350 0.3251942 at,l 0.20 0.300 0.1916734 Table 4.3.1 angitibnal gxbggtatibns of regressibn coefficients A 0.20 0.100 0.3550013 fl, 0.10 0.005 0.1492152 B3 0.25 0.100 0.6579540 [34 0.30 0.200 0.1218303 ,3“ 0.25 0.100 0.4485403 3,, 0.30 0.100 0.2161860 55 Table 4.4 Rgsults of Analysis for Unbalanced Data Population Starting Estimated Parameters Values Parameters 21,, 0.82 0.582 0.855920 21,, 0.73 0.573 0.719029 21,, 0.75 0.575 0.736739 2,, 0.66 0.566 0.651504 1,,“ 30.00 20.000 29.563270 t,l2 9.00 7 .000 9.569872 Inn 32.70 20.000 34.697681 1%“ 20.00 10.000 18.153446 tam 4.00 3.000 3.465942 t,m 20.80 10.000 19.38584 of 10.00 8.000 10.87648 0': 12.00 8.000 11.29153 0% 14.00 10.000 14.04039 oi 16.00 10.000 16.18727 a,,, 0.30 0.200 0.32372 at,l 0.20 0.150 0.19092 Table 4.4.1 Congitibngl gxbegtations of regression coefficients ,6, 0.20 0.150 0.29023 3, 0.10 0.500 0.13452 ,6, 0.25 0.150 0.74056 [3, 0.30 0.200 0.10906 13,, 0.25 0.150 0.47312 13,, 0.30 0.200 0.27258 56 Table 4.5 The Valuea bf tha Observed Log-likelihbod at Convergence Balanced Data Iteration 1353 -2902.3472413 Iteration 1354 -2902.3472410 Iteration 1356 -2902.3472408 Iteration 1357 -2902.3472405 Iteration 1358 -2902.3472402 Iteration 1359 -2902.3472400 Iteration 1360 -2902.3472397 T le 4.6 The Values f the serv dL -likelih dat Conve ence Unbalanced Data Iteration 1321 -2768.5733562 Iteration 1322 -2768.5733550 Iteration 1323 -2768.5733537 Iteration 1324 -2768.5733522 Iteration 1325 -2768.5733511 Iteration 1326 -2768.5733508 57 In discussing the results reported in Table 4.3 and 4.4 , we say that the EM algorithm recovered the population parameters values well. The criterion used for convergence of the observed log-likelihood is that log-likelihood is smaller than 0.1‘( 5 s 10‘6 ). In the 486 IBM PC with the spwd of 66 mhrz for the convergence it took about 45 hours. For the unbalanced case the hours spent are about 48 hours. The Table 4.5 and Table 4.6 show the list of log-likelihood values at the neighborhood of convergence. As the Table 4.4 shows the number of iterations is very large and the spent hours is very long. The slowness of convergence of the EM algorithm is the repeatedly criticized property of the algorithm. This seems to be caused by the fact that missing information (Meng and Rubin, 1991) is relatively large in the multilevel structural equation model. 
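For readers who wish to retrace the generation scheme of Section 4.1, the sketch below follows steps (1) through (6) with the population values of equation (4.1.1) and the group sizes of Table 4.1. It is only an illustration: the dissertation generated its data with IMSL Fortran subroutines and analyzed them in Gauss, whereas this Python sketch uses hypothetical names throughout and, for simplicity, truncates the first fifty groups rather than deleting students at random.

```python
import numpy as np

rng = np.random.default_rng(0)
J, n = 500, 10                                   # 500 classrooms, 10 students each

Lambda_w = np.array([[1, 0], [0.82, 0], [0, 1], [0, 0.73]])
Lambda_b = np.array([[1, 0], [0.75, 0], [0, 1], [0, 0.66]])
T_pi  = np.array([[30.0, 9.0], [9.0, 32.7]])
T_pib = np.array([[20.0, 4.0], [4.0, 20.8]])
Sigma = np.diag([10.0, 12.0, 14.0, 16.0])
pi_w  = np.array([0.20, 0.10, 0.31, 0.33])       # (pi_10, pi_20, pi_30, pi_40)
pi_b  = np.array([0.25, 0.35])                   # (pi_b10, pi_b20)

# Steps (1)-(2): dichotomize standard normal draws into 0/1 predictors.
z = (rng.standard_normal((J, n)) > 0.0).astype(float)   # student-level predictor (gender)
w = (rng.standard_normal(J) > 0.0).astype(float)        # class-level predictor (teaching style)

# Steps (3)-(5): between- and within-level latent residuals and measurement errors.
v_b = rng.multivariate_normal(np.zeros(2), T_pib, size=J)
v_w = rng.multivariate_normal(np.zeros(2), T_pi, size=(J, n))
eps = rng.multivariate_normal(np.zeros(4), Sigma, size=(J, n))

# Step (6): assemble y_ij via the reduced-form relations of equation (4.1.1).
y = np.empty((J, n, 4))
for j in range(J):
    eta_b = pi_b * w[j] + v_b[j]                          # between-level latent vector
    for i in range(n):
        eta_w = np.array([pi_w[0] + pi_w[1] * z[j, i],    # attitude
                          pi_w[2] + pi_w[3] * z[j, i]]) + v_w[j, i]  # achievement
        y[j, i] = Lambda_w @ eta_w + Lambda_b @ eta_b + eps[j, i]

# Unbalanced data: reproduce the group sizes of Table 4.1
# (10 groups of size 6, 10 of size 7, 10 of size 8, 20 of size 9, 450 of size 10).
sizes = np.array([6] * 10 + [7] * 10 + [8] * 10 + [9] * 20 + [10] * 450)
y_unbal = [y[j, :sizes[j]] for j in range(J)]
print(sum(len(g) for g in y_unbal))   # 4890 observations, as reported in Section 4.1
```

The resulting unbalanced data set has the 4890 observations described in Section 4.1, and the balanced array y can be analyzed directly for the balanced-data comparison.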
To obtain estimates of the alpha's and beta's we assume that the matrices Delta and Delta_b are diagonal. Then, as shown in Appendix 1, we obtain the estimates of the structural parameters. When extremely poor starting values are used, the estimates differ across runs; with moderately poor starting values, however, the results are very similar (almost identical) across various sets of starting values.

After the parameters have been estimated, likelihood ratio tests are available for comparing more and less complex models. It is known that the statistic -2 ln(L_1/L_2) has an asymptotic chi-square distribution, where L_1 is the maximized likelihood of the less complex model and L_2 is that of the more complex model. The degrees of freedom of the chi-square statistic equal the difference in the number of parameters to be estimated in the two models. In our educational example, the model for L_1 may be constrained as follows:

  (1) T_b = T_w
  (2) Lambda_b = Lambda_w

The simpler model has 10 parameters while the complex model has 16 parameters to be estimated, so the chi-square statistic has 6 degrees of freedom. The difference between the two values of -2 ln(Likelihood) is 672.22. This value is large enough to reject the adequacy of the simpler model. The likelihood ratio test does not tell us which of the restrictions is inadequate; to find out, each restriction may be tested separately against the more complex model.

Table 4.7  Results of Analysis for Unbalanced Data, Restricted Model

  Parameter   Population Value   Starting Value   Estimate
  lambda_w21       0.82             0.75           0.841037
  lambda_w42       0.73             0.70           0.692301
  lambda_b21       0.75             0.70           0.720219
  lambda_b42       0.66             0.50           0.837227
  tau_w11         30.00            15.00          32.647209
  tau_w12          9.00             5.00          10.538582
  tau_w22         32.70            15.00          36.943765
  tau_b11         20.00            15.00          16.459542
  tau_b12          4.00             5.00           7.358164
  tau_b22         20.80            15.00          25.980356
  sigma2_1        10.00             8.00           9.236718
  sigma2_2        12.00             8.00          12.450677
  sigma2_3        14.00             8.00          12.120453
  sigma2_4        16.00             8.00          13.863560
  alpha_w1         0.30             0.200          0.323054
  alpha_b1         0.20             0.150          0.447046

  Value of the log-likelihood: -3104.68428

  beta_1           0.20             0.150          0.92709
  beta_2           0.10             0.500          0.17796
  beta_3           0.25             0.150          2.05197
  beta_4           0.30             0.200          1.17230
  beta_b1          0.25             0.150          2.64902
  beta_b2          0.30             0.200          1.48537

The multilevel structural equation model postulates that the causal relationships at the within level and the between level differ because the explanatory variables differ: in our example, gender at the student level and teaching method at the classroom level. We may calculate the root mean square residual (Joreskog and Sorbom, 1993) as an overall goodness-of-fit measure; in the LISREL literature, the root mean square residual is used to compare the fit of two different models for the same data:

$$
\mathrm{RMSR} \;=\; \left[\, 2\sum_{i=1}^{r}\sum_{j\le i} (s_{ij} - \hat{\sigma}_{ij})^2 \,/\, (r^2 + r) \right]^{1/2}
\qquad (4.2.1)
$$

where s_ij is the element in the i-th row and j-th column of the sample total variance-covariance matrix and sigma-hat_ij is the corresponding element of the model-predicted total variance-covariance matrix. RMSR is a measure of the average of the fitted residuals. The model-predicted total variance-covariance matrix is given by Sigma-hat = Sigma-hat_w + Sigma-hat_b. We obtain RMSR = 0.73850 for the complex model and RMSR = 5.98066 for the simple model. Judged by this index, the fit of the simple model to the hierarchical data is not as adequate as that of the complex model. This conclusion was expected because the data were generated in a hierarchical fashion and the restrictions reduce the model to a single-level structural equation model.
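As an illustration of these two model-comparison tools, the sketch below recomputes the likelihood-ratio statistic from the converged log-likelihoods reported above and implements equation (4.2.1). It is a hedged Python example, not the original GAUSS code; the p-value call only shows how the chi-square reference distribution would be consulted.

    import numpy as np
    from scipy.stats import chi2

    # Likelihood-ratio test: complex vs. restricted model (unbalanced data).
    ll_complex, ll_simple = -2768.5733508, -3104.68428
    lr_stat = -2.0 * (ll_simple - ll_complex)      # -2 ln(L1 / L2)
    df = 16 - 10                                   # difference in free parameters
    p_value = chi2.sf(lr_stat, df)

    # Root mean square residual, equation (4.2.1).
    def rmsr(s, sigma_hat):
        r = s.shape[0]
        il = np.tril_indices(r)                    # lower triangle, including the diagonal
        return np.sqrt(2.0 * np.sum((s[il] - sigma_hat[il]) ** 2) / (r * r + r))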
We now make inferences about the pi's in the balanced-data case. Specifically, from the posterior variance estimates of the pi's we obtained the standard errors shown in Table 4.8. For the balanced data the t-statistics are [1.247, 0.8373, 0.5566, 0.8806, 1.09108, 0.6998]. Thus the expectation of the "attitude" latent variable for girl students in classrooms whose instruction style is expository is significantly different from zero, while the expectation of the "ACH" latent variable for those students is not. The total effect of gender on the "ACH" latent variable is not significantly different from zero, and the "gender gap" in the "attitude" latent variable is not significantly different from zero. At the between-group level, the total effect of the discovery teaching style on the "ACH" latent variable is not significantly different from zero, whereas its effect on the "attitude" latent variable is significantly different from zero.

Table 4.8  Estimates of the pi's for Two Sets of Data

                            Balanced Data            Unbalanced Data
  Parameter  Population   Estimate   Std. Error    Estimate   Std. Error
  pi_1         0.20        0.3550     0.28560       0.2902     0.2137
  pi_2         0.10        0.1492     0.17818       0.1345     0.1845
  pi_3         0.31        0.7734     0.30602       0.8345     0.2827
  pi_4         0.33        0.1703     0.19344       0.1526     0.2216
  pi_b1        0.25        0.4485     0.41105       0.5316     0.4363
  pi_b2        0.35        0.3021     0.43176       0.3629     0.5262

CHAPTER 5

CONCLUSION

Research in the field of education presents various challenges. For example, the random assignment of students to a set of conditions is not realistic in most cases. Even in an experimental setting the outcomes will have positive intracluster correlations because (1) students do not receive their instruction individually but in groups, and (2) interactions exist between treatments and students (Lumsdaine, 1963). This situation often makes the application of linear structural equation models (Joreskog, 1977) to real data inappropriate. As Cronbach (1976) pointed out, many studies in the field of education have produced inappropriate analyses, especially evaluation studies, because they failed to recognize the nature of hierarchical data. The difficulty of analyzing data arising from two levels lies in assessing the nature of the intervariable relationships at both levels simultaneously.

During the last two decades researchers have developed multilevel structural equation models for hierarchical data. Previous work on multilevel structural equation models involved a minimization fitting function and/or a balanced sampling design, and most of it utilized standard software. These minimization fitting-function approaches have made substantial methodological advances, but they require classifying groups into subsets of groups having equal sample size.

This dissertation has shown how multilevel structural equation models can be formulated for hierarchical data and how they can be analyzed by using empirical Bayes estimation with the EM algorithm. The model equations are linear at each level; the direct connection between variables is specified by the values of the coefficients, as in the LISREL (Joreskog, 1977) or EQS (Bentler, 1983) models.

The major outputs of this thesis are four:

1. I presented the general multilevel structural equation model in the mode of hierarchical empirical Bayes modeling.

2. I developed the empirical Bayes estimation procedure via the EM algorithm to find maximum likelihood estimates of the model.
3. A computer program for the numerical analysis of hierarchical data under structural equation models was developed in Gauss.

4. The accuracy of the computing algorithm has been tested across sampling designs and models.

5.1 Summary, Implications and Conclusions

I now summarize the major points that emerged in each chapter and discuss their implications for fitting multilevel structural equation models to hierarchical data.

In chapter one, the problems confronting single-level and multilevel structural equation models were identified. These problems are old ones. Traditional single-level structural equation models do not incorporate the random errors arising from the multilevel structure. Researchers therefore face the dilemma "What should be our unit of analysis?"; that is, they have to choose between an individual-level analysis and a group-level analysis. Ignoring the inherent hierarchical structure in our data sets results in the confounding of group-level effects with individual-level effects, and of course the individual-level analysis violates the independence assumption and results in overestimation of precision. In program evaluation studies, for example curriculum evaluation, we are interested in the interaction effects between the treatments and the individuals across individual-level background information and group-level variables.

In chapter two, the multilevel structural equation model and a few basic model assumptions were presented and then translated into the general mixed model (a hierarchical model of linear equations) at both the cluster level and the individual level. In particular, the model explicitly utilizes socio-demographic information at both the group and individual levels. We also provided an example of model specification.

In chapter three, the empirical Bayes estimation procedure via the EM algorithm was generalized to the multilevel structural equation model. In the "hierarchical prior distribution" specification we adopted the MLR (Dempster, Rubin and Tsutakawa, 1981) approach. We also presented the E and M steps for the ML estimates.

Much of chapter four assessed the accuracy of the EM algorithm for an artificial model under different sampling designs. The results, drawn from the simulation under the balanced and the unbalanced sampling designs, showed that the EM algorithm for the multilevel structural equation model is quite accurate for the data generated. Ignoring the second-level effects can yield a misleading interpretation of the structure of the causal map among the latent constructs under study. The presented methodology allows the simultaneous estimation of the parameters of unbalanced multilevel structural equation models.

The primary advantage of the EM method for multilevel structural equation models lies in the following key facts. (1) It does not require the calculation of the second-order partial derivatives of the likelihood function. Even though the models studied by several researchers differ in terms of balanced or unbalanced sampling designs, the calculation of partial derivatives is essential for the methods proposed by Schmidt and Wisenbaker (1986), McDonald and Goldstein (1989), Lee and Poon (1992), and Raudenbush (in press). (2) It does not require classifying level-2 units into subsets having equal sample size. However, the method has the shortcoming of slow convergence. Another shortcoming of the EM algorithm is that it does not provide standard errors for the parameter estimates.
When we apply the method to real-world problems we need further elaboration of the models, carefully taking into account the special features of the particular subject matter. Typical applications of structural equation models involve (1) the development of an a priori model representing hypothesized causal associations among a set of latent and manifest variables, (2) fitting the prior model to sample data, (3) the evaluation of the solution in terms of its parameter estimates and goodness of fit, and (4) the modification of the model so as to improve its parsimony and its fit to the data. This last step is known as "specification search" or "respecification." During such a search the researcher alters the model specification in search of a substantively meaningful model that fits the data well. Structural misspecifications involve the specification of the elements of the matrices A and A_b, and the issue is closely related to model identification.

The purpose of latent variable models is to improve the accuracy and validity of inferences from empirical data. To accomplish this goal, several assumptions about the structure of the data and the meaning of the associations between variables must be made. Ideally, each of these assumptions will be based either on special features of the subject-matter application area or on knowledge derived from past empirical evidence.

5.2 Future Work

One potential extension of the multilevel structural equation model is a model in which the slopes for the exogenous variables vary randomly across groups. There are numerous settings in which multilevel structural equation models with random slopes for exogenous variables are needed in order to represent the variance-covariance structure of the data adequately; for example, the gender gap in SAT mathematics test scores may be explained by group-level characteristics.

As is well known, the EM algorithm is simple to implement and numerically stable, but it is slow. Recently Jamshidian and Jennrich (1993) developed a conjugate gradient scheme for accelerating the EM algorithm. In their AEM (accelerated EM) algorithm the evaluation of the gradient of the likelihood function is essential; when the number of parameters is moderate, the implementation of the AEM algorithm seems not overly burdensome. To obtain the standard errors of the maximum likelihood estimates one may apply the SEM (supplemented EM) algorithm developed by Meng and Rubin (1991).

The current approach to drawing inferences concerning variance components is based on large-sample theory for ML estimators. When the number of groups is small, the normal approximation will be invalid; for that case, one may apply the Data Augmentation approach (Tanner and Wong, 1987). Future models can be expanded into larger and more complicated ones, and the robustness of the model remains to be studied.

APPENDICES

Appendix 1

To find the values of the beta's and alpha's, we have the elementwise algebraic relations

$$
\pi_1 = \beta_1, \qquad (A.1.1)
$$
$$
\pi_2 = \alpha_{21}\beta_1 + \beta_2. \qquad (A.1.2)
$$

Then we can derive estimates of the structural coefficients from this first set of within-group algebraic relations. In particular, the identification of the alpha's is carried out on the basis of the following assumption (A.1.3) and the second within-group algebraic relation (A.1.4):

$$
\mathrm{var}(u_{ij}) = \Delta = \mathrm{diag}(\delta_p,\; p = 1, \ldots, P), \qquad (A.1.3)
$$
$$
(I - A)^{-1}\,\Delta\,[(I - A)^{-1}]' = \mathrm{var}(v_{wij}) = T_w. \qquad (A.1.4)
$$

The elementwise relations of (A.1.4) are

$$
\delta_{11} = \tau_{w11}, \qquad \alpha_{21}\,\delta_{11} = \tau_{w21}. \qquad (A.1.5)
$$
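As a quick check of these relations, substituting the population values from chapter 4 (tau_w11 = 30, tau_w21 = 9; tau_b11 = 20, tau_b21 = 4) into (A.1.5) and its between-group analogue gives

$$
\alpha_{21} = \tau_{w21}/\tau_{w11} = 9/30 = 0.30, \qquad
\alpha_{b21} = \tau_{b21}/\tau_{b11} = 4/20 = 0.20,
$$

which match the population values alpha_w1 = 0.30 and alpha_b1 = 0.20 used in Tables 4.3 and 4.4.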
After finding the alpha's from the algebraic relations (A.1.5), we find the beta's from the first within-group algebraic relation (A.1.2).

Similarly, the first between-group algebraic relation is

$$
(I - A_b)^{-1} B_b = \Pi_b. \qquad (A.1.6)
$$

The corresponding elementwise relations are

$$
\pi_{b11} = \beta_{b11}, \qquad \pi_{b21} = \alpha_{b21}\beta_{b11} + \beta_{b21}. \qquad (A.1.7)
$$

In the same fashion, the identification of the alpha_b's is carried out on the basis of the following assumption (A.1.8) and the second between-group algebraic relation (A.1.9):

$$
\mathrm{var}(u_{bj}) = \Delta_b = \mathrm{diag}(\delta_{bp},\; p = 1, \ldots, P), \qquad (A.1.8)
$$
$$
(I - A_b)^{-1}\,\Delta_b\,[(I - A_b)^{-1}]' = \mathrm{var}(v_{bj}) = T_b. \qquad (A.1.9)
$$

The elementwise relations of (A.1.9) are

$$
\delta_{b11} = \tau_{b11}, \qquad \alpha_{b21}\,\delta_{b11} = \tau_{b21}. \qquad (A.1.10)
$$

After finding the alpha_b's from (A.1.10), we find the beta_b's from the first between-group algebraic relation (A.1.7).

Appendix 2

For the computational convenience of the necessary posterior expectations and dispersion matrices of the random vectors, equation (3.1.1) without subscripts becomes

$$
y = A_1\pi + A_2\theta_2 + A_3\theta_3 + \varepsilon,
$$

with

$$
\varepsilon \sim N(0,\, I_N \otimes \Sigma), \quad N = \textstyle\sum_j n_j, \qquad
\pi \sim N(0,\, \Gamma), \qquad
\theta_2 \sim N(0,\, \Omega_2), \qquad
\theta_3 \sim N(0,\, \Omega_3).
$$

Let

$$
y = A\theta + \varepsilon, \qquad A = [A_1 \,|\, A_2 \,|\, A_3], \qquad
\theta = \begin{bmatrix}\theta_1 \\ \theta_2 \\ \theta_3\end{bmatrix},
$$

where

$$
\theta_1 = \pi = \begin{bmatrix}\pi_1 \\ \vdots \\ \pi_t\end{bmatrix}, \qquad
\theta_3 = \begin{bmatrix}\theta_{31} \\ \vdots \\ \theta_{3K}\end{bmatrix}, \qquad
\theta_2 = [\theta_{211}', \ldots, \theta_{21 n_1}', \ldots, \theta_{2K1}', \ldots, \theta_{2K n_K}']',
$$

t is the dimension of pi,

$$
\Omega_2 = \mathrm{subdiag}(T_w), \qquad \Omega_3 = \mathrm{subdiag}(T_b), \qquad
\mathrm{var}(\theta) = \begin{bmatrix}\Gamma & 0 & 0 \\ 0 & \Omega_2 & 0 \\ 0 & 0 & \Omega_3\end{bmatrix}.
$$

By using the results for the standard multivariate normal distribution (Searle, 1971), one obtains the following joint distribution for the multilevel structural equation model:

$$
\begin{bmatrix} y \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix}
\sim N\!\left(
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix}
\Sigma_y & A_1\Gamma & A_2\Omega_2 & A_3\Omega_3 \\
\Gamma A_1' & \Gamma & 0 & 0 \\
\Omega_2 A_2' & 0 & \Omega_2 & 0 \\
\Omega_3 A_3' & 0 & 0 & \Omega_3
\end{bmatrix}\right),
\qquad
\Sigma_y = A_1\Gamma A_1' + A_2\Omega_2 A_2' + A_3\Omega_3 A_3' + I_N \otimes \Sigma.
$$

The posterior dispersion matrix D_theta of theta is the matrix [A' Psi^{-1} A + Omega^{-1}]^{-1}, with Psi = I_N (x) Sigma, translated into the context of the multilevel structural equation model:

$$
D_\theta = \left[A'\Psi^{-1}A + \Omega^{-1}\right]^{-1}
\;\Longleftrightarrow\;
\begin{bmatrix}
A_1'\Psi^{-1}A_1 + \Gamma^{-1} & A_1'\Psi^{-1}A_2 & A_1'\Psi^{-1}A_3 \\
A_2'\Psi^{-1}A_1 & A_2'\Psi^{-1}A_2 + \Omega_2^{-1} & A_2'\Psi^{-1}A_3 \\
A_3'\Psi^{-1}A_1 & A_3'\Psi^{-1}A_2 & A_3'\Psi^{-1}A_3 + \Omega_3^{-1}
\end{bmatrix}^{-1}.
$$

For convenience in applying the inversion formula for a partitioned matrix (Graybill, 1983), one can rewrite the matrix D_theta as

$$
\begin{bmatrix} B & L \\ L' & U \end{bmatrix}^{-1}
= \begin{bmatrix} Q^{-1} & -Q^{-1}G' \\ -GQ^{-1} & U^{-1} + GQ^{-1}G' \end{bmatrix},
\qquad Q = B - LU^{-1}L', \quad G = U^{-1}L',
$$

where B is the upper-left 2 x 2 block of blocks, L = [L_1' , L_2']' with L_1 = A_1' Psi^{-1} A_3 and L_2 = A_2' Psi^{-1} A_3, and U = A_3' Psi^{-1} A_3 + Omega_3^{-1}. Writing

$$
Q = \begin{bmatrix} Q_1 & Q_2 \\ Q_2' & Q_3 \end{bmatrix}, \qquad
Q^{-1} = \begin{bmatrix} R_1 & R_2 \\ R_2' & R_3 \end{bmatrix},
$$

with

$$
Q_1 = A_1'\Psi^{-1}A_1 + \Gamma^{-1} - A_1'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_1,
$$
$$
Q_2 = A_1'\Psi^{-1}A_2 - A_1'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_2,
$$
$$
Q_3 = A_2'\Psi^{-1}A_2 + \Omega_2^{-1} - A_2'\Psi^{-1}A_3\,(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}A_2,
$$

we obtain

$$
D_\theta =
\begin{bmatrix}
R_1 & R_2 & -(R_1L_1 + R_2L_2)U^{-1} \\
R_2' & R_3 & -(R_2'L_1 + R_3L_2)U^{-1} \\
\mathrm{sym.} & \mathrm{sym.} & U^{-1} + U^{-1}(L_1'R_1L_1 + L_1'R_2L_2 + L_2'R_2'L_1 + L_2'R_3L_2)U^{-1}
\end{bmatrix}
\equiv
\begin{bmatrix}
V_{11} & V_{12} & V_{13} \\
V_{12}' & V_{22} & V_{23} \\
V_{13}' & V_{23}' & V_{33}
\end{bmatrix}.
$$

Thus, in the E-step, the conditional distributions of theta_1, theta_2, and theta_3 given the data and the current parameter values have the following locations and dispersion matrices:

$$
\theta_1^{*} = V_{11}\,(A_1' - Q_2Q_3^{-1}A_2')\,\Psi^{-1}\,(I - A_3U^{-1}A_3'\Psi^{-1})\,y,
$$
$$
\theta_2^{*} = (V_{22}A_2' - Q_3^{-1}Q_2'V_{11}A_1')\,\Psi^{-1}\,\big[\,I - A_3(A_3'\Psi^{-1}A_3 + \Omega_3^{-1})^{-1}A_3'\Psi^{-1}\,\big]\,y,
$$
$$
\theta_3^{*} = U^{-1}A_3'\Psi^{-1}\,\big[\,y - (A_1\theta_1^{*} + A_2\theta_2^{*})\,\big],
$$

and

$$
V_{11} = \big[\,Q_1 - Q_2Q_3^{-1}Q_2'\,\big]^{-1}, \qquad
V_{12} = -V_{11}Q_2Q_3^{-1},
$$
$$
V_{13} = -\big[\,V_{11}A_1'\Psi^{-1}A_3 + V_{12}A_2'\Psi^{-1}A_3\,\big]U^{-1}, \qquad
V_{22} = Q_3^{-1} + Q_3^{-1}Q_2'V_{11}Q_2Q_3^{-1},
$$
$$
V_{23} = -\big[\,V_{12}'A_1'\Psi^{-1}A_3 + V_{22}A_2'\Psi^{-1}A_3\,\big]U^{-1}, \qquad
V_{33} = U^{-1} - \big[\,V_{13}'A_1'\Psi^{-1}A_3 + V_{23}'A_2'\Psi^{-1}A_3\,\big]U^{-1}.
$$
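The quantities above are the standard posterior moments of a Gaussian linear model, computed blockwise. The sketch below shows the direct dense version of the same computation in Python, purely to illustrate the formulas D_theta = (A' Psi^{-1} A + Omega^{-1})^{-1} and theta* = D_theta A' Psi^{-1} y; the appendix instead exploits the block structure of A = [A_1 | A_2 | A_3] so that only small matrices ever need to be inverted.

    import numpy as np

    def posterior_moments(y, A, Psi, Omega):
        # Posterior mean and dispersion of theta in y = A theta + eps,
        # with eps ~ N(0, Psi) and theta ~ N(0, Omega).
        Psi_inv = np.linalg.inv(Psi)
        D = np.linalg.inv(A.T @ Psi_inv @ A + np.linalg.inv(Omega))   # D_theta
        theta_star = D @ A.T @ Psi_inv @ y                            # posterior mean
        return theta_star, D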
Appendix 3

We show the explicit derivation of Q(Theta^(t), Theta^(t-1)) defined in chapter 3. By the definition of Dempster, Laird and Rubin (1977),

$$
Q(\Theta^{(t)}, \Theta^{(t-1)}) = E\big[\log f(C;\Theta^{(t)}) \,\big|\, Y = y;\, \Theta^{(t-1)}\big].
$$

By applying the theorem on the expectation of a quadratic form we have

$$
E\big[\theta_{2ij}' T_w^{-1}\theta_{2ij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[T_w^{-1} E\big(\theta_{2ij}\theta_{2ij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
$$
$$
E\big[\theta_{3j}' T_b^{-1}\theta_{3j} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[T_b^{-1} E\big(\theta_{3j}\theta_{3j}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
$$
$$
E\big[\varepsilon_{ij}' \Sigma^{-1}\varepsilon_{ij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \mathrm{tr}\big[\Sigma^{-1} E\big(\varepsilon_{ij}\varepsilon_{ij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big].
$$

After some algebraic calculation we have

$$
\begin{aligned}
Q(\Theta^{(t)},\Theta^{(t-1)})
&= -\tfrac{Nr}{2}\ln 2\pi - \tfrac{N}{2}\ln|\Sigma|
   - \tfrac{Np}{2}\ln 2\pi - \tfrac{N}{2}\ln|T_w|
   - \tfrac{Js}{2}\ln 2\pi - \tfrac{J}{2}\ln|T_b| \\
&\quad - \tfrac{1}{2}\sum_j\sum_i \mathrm{tr}\Big\{\Sigma^{-1}\Big[
      (A_0 Z_{ij})\,D_\pi\,(A_0 Z_{ij})' + A_0\,D_{w_{ij}}\,A_0'
      + A_0 Z_{ij}\,C_{\pi w_{ij}}\,A_0' + A_0\,C_{\pi w_{ij}}'\,Z_{ij}'A_0' \Big]\Big\} \\
&\quad - \tfrac{1}{2}\sum_j\sum_i
      \big(y_{ij} - A_0 Z_{ij}\pi^{*} - A_0 w_{ij}^{*}\big)'\,\Sigma^{-1}\,
      \big(y_{ij} - A_0 Z_{ij}\pi^{*} - A_0 w_{ij}^{*}\big) \\
&\quad - \tfrac{1}{2}\sum_j\sum_i \mathrm{tr}\big[T_w^{-1}
      E\big(v_{wij}v_{wij}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big]
   - \tfrac{1}{2}\sum_j \mathrm{tr}\big[T_b^{-1}
      E\big(\theta_{3j}\theta_{3j}' \,\big|\, Y = y;\,\Theta^{(t-1)}\big)\big],
\end{aligned}
\qquad (3.2.1)
$$

where pi* and w*_ij are the posterior means, D_pi and D_{w_ij} are the corresponding posterior dispersion matrices, and C_{pi w_ij} is the posterior cross-covariance between pi and w_ij.

Appendix 4

In this appendix we provide the formula for the elements of Sigma-hat. For the m-th measurement-error variance,

$$
\hat{\sigma}_m^2 = \frac{1}{N}\sum_j\sum_i
E\big[\varepsilon_{mij}^2 \,\big|\, Y = y;\,\Theta^{(t-1)}\big]
= \frac{1}{N}\sum_j\sum_i\Big\{
\mathrm{Var}\big(\varepsilon_{mij} \,\big|\, Y = y;\,\Theta^{(t-1)}\big)
+ \big(y_{mij} - [A_0 Z_{ij}\pi^{*} + \Lambda_w v_{wij}^{*} + \Lambda_b v_{bj}^{*}]_m\big)^2
\Big\},
\qquad m = 1, \ldots, 4,
$$

where [.]_m denotes the m-th element and the conditional variance term is the (m, m) element of the posterior dispersion of A_0 Z_ij pi + Lambda_w v_wij + Lambda_b v_bj, that is, a sum of elements of the posterior dispersion matrices D and cross-covariance matrices C weighted by the corresponding loadings.

Appendix 5

In this appendix we present the derivations for the elements of the loading matrices. First, by expanding the eighth term on the right-hand side of equation (3.2.1), we have the following five terms:

$$
(1)\;\; -2\,y_{ij}'\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}
\qquad
(2)\;\; -2\,y_{ij}'\,\Sigma^{-1}A_0 w_{ij}^{*}
\qquad
(3)\;\; (A_0 Z_{ij}\pi^{*})'\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}
$$
$$
(4)\;\; 2\,(A_0 Z_{ij}\pi^{*})'\,\Sigma^{-1}A_0 w_{ij}^{*}
\qquad
(5)\;\; (A_0 w_{ij}^{*})'\,\Sigma^{-1}A_0 w_{ij}^{*}.
$$

For each term we take the derivative with respect to A_0; note that in taking the derivatives we regard A_0 as a full matrix. Then we have

$$
(1)\;\; -2\,\Sigma^{-1}y_{ij}\,(Z_{ij}\pi^{*})'
\qquad
(2)\;\; -2\,\Sigma^{-1}y_{ij}\,w_{ij}^{*\prime}
\qquad
(3)\;\; 2\,\Sigma^{-1}A_0 Z_{ij}\pi^{*}\,(\pi^{*\prime}Z_{ij}')
\qquad
(4)\;\; 2\,\Sigma^{-1}A_0\,(\cdots)
$$
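For reference, the derivatives above follow from two standard matrix-calculus identities, stated here in general notation rather than taken from the original derivation:

$$
\frac{\partial}{\partial X}\, a' X b = a\,b', \qquad
\frac{\partial}{\partial X}\,(Xa)'\,\Sigma^{-1}(Xb) = \Sigma^{-1}X\,(a b' + b a')
\quad \text{for symmetric } \Sigma.
$$

Applying the first identity to term (1), with a = Sigma^{-1} y_ij and b = Z_ij pi*, gives the derivative -2 Sigma^{-1} y_ij (Z_ij pi*)'; applying the second identity to term (3), with a = b = Z_ij pi*, gives 2 Sigma^{-1} A_0 Z_ij pi* pi*' Z_ij'.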