1 a . . .1... i . . .2 f V. 2.. z )55.‘ : 23". 39:73.. «5&1 fun $535.. Event... .uhnfiaqnfiaw ‘ In”. . unfit W. n”. (I 3.3-.vriv9: 3.. 3.3.35 2 . . .2. :z .1. o in t 31.3353! :3... 1...: ....m.aquur a . V A 5mm. I . ‘4.- Lfixii...a 3 {$3. 3.3“... .135}. End», «Jihad , 3.5.5.2} 51...... int:? 349.. $3.3.) 1 . !_5G!.x...:\n.. . gill Y.... '15.)...1 ‘ .. 513%....1. (.55; 11.2. 118:... lllllllllllllllllllllllllllllllllllllllllllllll 31293021 Michigan State LIBRARY University This is to certify that the dissertation entitled INCORPORATING FACTOR ANALYSIS INTO HIERARCHICAL MODELS presented by Yasuo Miyazaki has been accepted towards fulfillment of the requirements for PH . D . degree in Coungeling, Educational Psychology & Special Education W/ Kama/«N M/ajor professor Date 7// é//§'5’ MSU is an Affirmative Action/Equal Opportunity Institution 0- 12771 PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE "’ 6%?) 2%? 11m c'JCIRODatoDuopas-p.“ INCORPORATING FACTOR ANALYSIS INTO HIERARCHICAL MODELS By Yasuo Miyazaki A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirement for the degree of DOCTOR OF PHILOSOPHY Department of Counseling, Educational Psychology and Special Education 2000 ABSTRACT INCORPORATING FACTOR ANALYSIS INTO HIERARCHICAL MODELS By Yasuo Miyazaki This dissertation incorporates a factor analysis model into a two-level hierarchical linear model (HLM). It provides the model layout in HLM format and derives the maximum likelihood estimators. A computational program to implement the theory is developed. Special attention is focused on the application of the model in order to create a bridge between the statistical model and applications in education and human development. That is, I describe in detail when and in what context the model might be useful and the kinds of research questions that can be addressed using this model, so that researchers who are interested in substantive issues rather than statistical issues can immediately apply the model to their research. Real as well as artificial data sets are used to illustrate the theory, the application, and the interpretation of the results. In the last Chapter, extensions of this model and their settings are mentioned. © Copyright by Yasuo Miyazaki 2000 DEDICATION This dissertation is dedicated to my parents, Yoshio and Sakiko Miyazaki, for their unconditional love and support. To my sister, Michiyo Kurakata, who cared about her elder brother, even though she was busy raising her four children. To my late elder brother, Tomoo Miyazaki, who didn’t have much happiness in his life, having passed away without reaching age 40, while I was in the United States. I apologized to him in front of his tomb for my absense, for not having come back when he died. That was three years ago already. This time I will bring this disseertation back home and will show it to him. He might ask, “Did you have fun in the U.S.?”. iv ACKNOWLEDGMENTS It took a longer time than I thought to complete this dissertation. It was a long and lonely journey. At midnight, I often dreamed of the scenery of my home town village in my childhood, such as the gentle shape of the round mountains and the sound of the rice paddies when the breezes blew across the leaves of rice. The dream was ofien the same and was so vivid even though my childhood was more than 30 years ago. During tough times, I often spoke the golden words to myself, “There is no night that doesn’t open (No night is endless),” and “Enduring snow, plums bloom sharper scents in the spring.” Additionally, the words from my father always relaxed me and made me realize that blue collar spirit was my blood: “There have been no persons in my family who can make a living by their brains. However, there is a way for such a person to survive. Use your body.” The words from my mother, “I will live and wait until you come back home,” lifted my spirit for completion of the program. During my journey, I fortunately had many accompanying me. I would like to extend my sincere gratitude to my dissertation director, Professor Stephen W. Raudenbush, my advisor, Professor Kenneth A. Frank, and dissertation committee members, Professor Betsy J. Becker, Professor Mark Reckase, and Professor James Stapleton, for their advice, insight and guidance throughout the process of writing this dissertation. My special thanks go to Dr. Randall P. Fotiu for his help with computer programming, which caused me a lot of trouble. Haiyan Zhang spent lots of her time to help and to support me. Thank you very much, Haiyan. I cannot enumerate here all the people to whom I am indebted, who offered me many kinds of help, support, and encouragement. I’m sure, however, that without their help, I wouldn’t have finished the dissertation. I want to make this fact my constant reminder for the rest of my life. “Do a little help. That makes you a slightly better man.” vi TABLE OF CONTENTS LIST OF TABLES ......................................................................................................... viii Chapter Page 1. INTRODUCTION ........................................................................................................ 1 1-1. Motivation of the Study ....................................................................................... 1 1-2. Overview of the Past Studies of Multilevel Latent Variable Modeling .............. 7 1-2-1. Papers on Latent Variables in HLM ........................................................... 8 1-2-2. Structural Equation Models (Mean and Covariance Structure Models) ................................................ 13 1-2-3. Modeling the Error Structure .................................................................... 15 1-2-4. Multilevel Factor Analysis ....................................................................... 16 2. SPECIFIC SETTINGS IN WHICH THE PROPOSED MODEL ARE USEFUL ...... 18 2-1. Slopes-as-Outcomes Model in Organizational Studies ...................................... 18 2-2. Growth Models in Human Development Studies ............................................... 20 2-3. Psychometric Analysis of Multivariate Outcomes ............................................. 22 3. APPROACH AND MODEL LAYOUTS .................................................................... 25 3-1. The Model ........................................................................................................... 25 3-2. Similarities and Differences from the Multilevel Factor Analysis Model .......... 26 3-3. Identification ...................................................................................................... 30 4. ESTIMATION AND INFERENCE ............................................................................. 50 5. COMPUTATIONAL EXAMPLES ............................................................................. 55 5-1. High School and Beyond .................................................................................... 55 5-2. Infant Vocabulary Growth .................................................................................. 60 5-3. Scholastic Aptitude Test (SAT) Meta-analysis of Coaching Effects ................ 67 -vii- 5-4. Results from the Simulated Data. ....................................................................... 81 6. CONCLUSIONS AND FUTURE DIRECTIONS ...................................................... 95 6-1. Conclusions ........................................................................................................ 95 6-1-1. Methodological Contributions .................................................................. 95 6-1-2. Substantive Contributions ......................................................................... 98 Expanding Modeling Possibility .......................................................... 98 Recovery of f ................................................................................ 98 Improvement over current ad hoc procedure .................................. 99 View of current ad hoc procedure from HLM2F framework ........ 101 Substantive Interpretation of latent variables ...................................... 102 Practical Interpretability of the Results ............................................... 103 6-1-3. Unsolved Methodological Problems ....................................................... 103 Confirmation of Global Maximum ..................................................... 103 Identification ....................................................................................... 106 6-2. Future Directions ............................................................................................... 107 6-2-1. Including Specificity Parameters ............................................................. 108 6-2-2. Multivariate HLM With a Factor Model .................................................. 110 (2-leve1 Multivariate Model) General Formulation of the Model ...................................................... 110 Example 1. Growth Model .................................................................. 113 Example 2. Multivariate Growth Model ............................................. 116 6-2-3. Application to Item Response Theory Model .......................................... 120 ANOVA type formulation of 2-P IRT using HGLM .......................... 122 Regression type formulation of 2-P IRT using HGLM ....................... 124 Rasch(1-P IRT) model .................................................................. 124 2-P IRT model ............................................................................... 127 Multi-dimensional IRT model ....................................................... l3 1 6-2-4. Other Possibilities .................................................................................... 132 APPENDIX .................................................................................................................... 134 REFERENCES ............................................................................................................... 154 -viii- 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 5.10 5.11 5.12 LIST OF TABLES Results of the Analysis for High School and Beyond Survey ............................. 59 Results of the Analysis for the Infant vocabulary Growth Data .......................... 65 Classification of SAT Studies .............................................................................. 68 Results of the Analysis for the SAT Coaching Effects Data ................................ 74 Descriptive Statistics of 4 Generated Outcomes .................................................. 82 Sample Covariance Matrix of 4 Generated Outcomes ......................................... 83 Sample Correlation Matrix of 4 Generated Outcomes ......................................... 83 Shapiro-Wilk Test for Normality for 4 Generated Outcomes .............................. 84 Characteristics of the Specified Models ............................................................... 88 Results of the Analyses for the Simulated Data by Four Models ........................ 90 Several Results for the Tests of the Model Fit ..................................................... 91 Estimated Coverage Probability and Its Confidence Interval for 1000 Simulations ........................................................................................................... 93 ix Chapter 1. Introduction l-l. Motivation of the Study Suppose that researchers are interested in how influences of students’ gender, race, SES, family structure, IQ, and pretest score on their achievements vary from school to school. This is a typical research question that motivated the development of the hierarchical linear model (HLM) (Bryk & Raudenbush, 1992), which is alternatively called the multilevel model (Goldstein, 1995), the random coefficient model (Longford, 1993); and, often in statistics and biostatistics literature, the mixed model. (See, for example, SAS manual (1996) for Proc Mixed). The HLM can be implemented by several software programs, such as HLM (Bryk, Raudenbush, & Congdon, 1996), MLWin (Goldstein et al., 1998), SAS Proc Mixed (SAS Institute, 1996), and MIXOR (I-Iedeker, D., & Gibbons, R. D., 1996). After the software became available, a fair amount of educational research was devoted to this issue (See, for example, Aitkin, Anderson, & Hinde (1980), Aitkin & Longford (1986), De Leeuw & Krefi (1986), Goldstein (1986), Raudenbush & Bryk (1986), and Lee & Bryk (1989) among others). Often, however, we don’t have enough data to support a complex model that has many parameters. Specifically, if we increase the number of randomly varying coefficients, we need a larger sample size per cluster. In an meta-analysis settings, suppose that we have multiple effect size estimates from a collection of independent studies that test the same hypotheses, and we wish to synthesize these effect sizes. In this setting, it is sometimes as important to make an inference about variance components as about fixed effects because the variances indicate -1- how much the effects sizes vary from study to study (Raudenbush, Becker, & Kalaian (1988), Kalaian & Raudenbush (1996)). Though in theory we can model as many the random effect sizes as we want, the number of the independent studies is usually relatively small in meta analysis (about 50 - 100 at most), naturally limiting for the number of random effects that can be estimated from the data at hand. A more classical and standard application of multivariate random effects is a psychometric analysis for multivariate outcomes. For example, if researchers want to know the reliabilities of a test that is supposed to measure multiple constructs, we may formulate a two-level hierarchical model using dummy variables at level-1 as predictors. Then we ask how scores from multiple domains or items vary among students. From the variance estimates, we can determine the test reliabilities for each domain. In the growth model literature within the hierarchical modeling framework, we often model the individual growth trajectory by a polynomial at level-1. At level-2 the coefficients of the level-1 model, that is, the person-specific growth parameters, become multivariate outcomes. In such a growth study, it is ofien researchers’ substantive interest to examine correlations among the growth parameters, for example, the correlation between initial status and rate of growth. If the growth function is complex, there will be a large number of random effects at level 2. The mixture of multivariate outcomes and growth models also applies to the context where we have many random effects in level-2. In this case, correlations within individuals occur for two reasons; one is that multiple measures are available for the same individual and the other is that a single measure is taken over time for the same individual. In a statistical sense, the common theme in the above examples is that the researchers wish to study many random effects simultaneously. If we have a large enough sample to estimate many parameters in the level-2 variance-covariance matrix (we will call this the “tau matrix”), it is possible to estimate such a large model. However, even in such a case, researchers who analyze data using a two-level hierarchical linear model still sometimes encounter the problem of slow convergence or even worse, noconvergence, because the tau matrix is really singular. In statistical literature, the problem of a singular tau estimate is known as the Heywood case (Heywood (1931)) or boundary solution problem (Catchpole & Morgan (1994)). This problem occurs when the estimates reside near/at the boundary of parameter space. The currently most common approach to remedy this problem is to get rid of some of the random terms at level-2 and to re-estimate the smaller number of parameters in the tau matrix. Sometimes this approach does not make sense since removing a random term not only removes the large correlation that we want to get rid of, but also removes the variances that we want to keep in the model when the variation of the corresponding term exists. Actually, we can find many examples of the above cases where the tau-matrix has high correlations among random terms. For example, Bryk & Raudenbush (P. 141-144, Chapter 6, 1992) reanalyzed Huttenlocher et al. (1991)’s children’s vocabulary growth data. The quadratic growth trajectory was formulated and they found the high correlations among random coefficients of the initial status, SIOpe, and rate of acceleration (Table 6.5, 1.000 — 0.982 - 0.895 P. 144, Bryk & Raudenbush, 1992), which was - 0.982 1.000 0.842 . Later the -— 0.895 0.842 1.000 intercept was dropped completely from the model, but the correlation between the slope and the rate of acceleration was still very high at 0.904. As a biostatistics example, Longford (P.136, 1993) analyzed the weights of newborn rats nested within litters and found that the correlation between the mean rat weights adjusted by diet contrasts and litter size and the gender gap in weight among 0505 — 0.139 . . , and the correlation rs litters was almost -1.0 (r = [_ 0.139 0.038 A £0! -00139 , . . u ., . = . ,. = as —l.0034,thou htlus estimate rs not A °’ u") JIM, J(0505)(0.038) g admissiblel). In psychometrics, we are often interested in measuring the true score of the student. Suppose there is a general aptitude test that consists of 4 domains such as Math, Science, Reading, and Social Studies and each domain consists of 10 testlets whose scores are given simply by adding up the correct responses for the items in the testlet. If we conceive the above situation as testlets nested within students and we know which domain each testlet is supposed to measure, then there are four true scores for each student and those are likely to be highly correlated because high ability students are likely produce high scores in any domains, in general. Then, if we have such data as repeated ' Longford used the Fisher scoring algorithm, which can produce the estimate outside of the parameter space. The results are the case of this out-of-bounds estimate by Fisher scoring. -4- measurements for students (testlets are nested within subjects), we can expect the high tau-correlation matrix for the random coefficients that describes between-student variation. If we take down our analysis down to the item level in the above scenario, then we have dichotomous outcome instead of continuous outcome. This is a scenario of using an item response theory (IRT) model (Hambleton, Swarninathan, & Rogers (1991)). The IRT model fits a nonlinear function that decomposes the log-odds of a correct response into an item-specific part and a person-specific part. The most standard IRT model is a unidimensional model and is in an exploratory factor analysis mode, but there is a multi- dimensional IRT model in terms of dimensionality, for example, see Reckase (1991). Even the multi-dimensional IRT model is still estimated by exploratory mode analysis, i.e., we are interested in how many dimensions we need to represent the item-person interaction. It is possible to execute a confirmatory mode analysis if we formulate the multi-dimensional IRT model with a hierarchical-model format. We will illustrate this point in Chapter 6. In this sense, the IRT model can be considered to be a non-linear version of correlated outcomes within subjects, corresponding to the previous linear and continuous version of correlated outcomes within subjects. In fact, Rasch model, the simplest IRT model, can be formulated by a hierarchical model by conceptualizing that items nested within examinee (Kamata (1998)). Raudenbush & Sampson (1999) applied the Rasch model to the items measuring neighborhood environments in terms of physical and social disorder. Their model can be seen as a multivariate Rasch model because they used dummy variables to represent different constructs. Most of the above cases involve micro aspects where we are interested in individual differences and in individual variability. However, we can find the examples in other substantive disciplines. For example, from organizational sociology, data from National Adult Literacy Survey study analyzed by Raudenbush and Kasim (1998) involves the regression of literacy on ethnicity conceived as European-American, African-American, Hispanic-American, Asian-American, and Other ethnicity. Thus, four dummy variables are required to represent ethnicity. If these dummy variables’ regression coefficients are allowed to vary from state to state, it can be speculated that those random coefficients might be highly correlated because of the similar social distribution of the minority disadvantage. Another example can be found in meta-analysis literature. Becker (1 988) found the rather high correlation 0.91 between the experimental and control standardized mean changes across studies. These above cases all suggest that those random coefficients share some part in common and tell us that once we know one of the random errors, we can discern the other to some extent. Then, it might be reasonable to think about a latent variable that is shared by those random coefficients and that produced the high correlation. In this scenario, a factor analysis model is one of the candidates to represent this idea. In factor analysis, we consider a small number of latent variables that explain the correlation among a larger number of observed variables. In this sense, the factor analysis model is useful for obtaining a parsimonious summary by reducing the number of parameters in a technical sense, but also it might extract the essential relationship among the constructs that play a central role in the social theory. Further, it might help explore and elaborate or test and confu'rn the theory in mind. Also, there are computational advantages of incorporating the factor analysis model to HLM. That is, in the current HLM, when some of the correlations get close to the boundary values such as 1 or - l , then its convergence gets very slow, or in the worst scenario, it terminates without giving the estimates. If we use the factor analysis model, we can expect quick and stable convergence because in factor analysis model, we are not estimating the covariances that are close to their boundary. 1-2. Overview of Past Studies of Multilevel Latent Variable Modeling In this section, I review the current state of the research in the field of multilevel latent variable modeling, which is an effort in combining multilevel modeling, structural equation modeling, and item response theory (IRT) modeling. It is a goal of scientific research to integrate different classes of models. Since the original model was developed in each tradition to solve specific types of problems, each modeling framework inherently has its own characteristic strengths and weaknesses. If we take the approach of incorporating another framework based on one framework, then the newly-developed model and the methodology are reminiscent of the parent’s model characteristics. Therefore, here we review those studies by emphasizing the underlying ideas to develop the model and summarize the similarities and differences as well as the strengths and weaknesses. We also describe a methodology that can be a potential building block of a broader class of model, specifically, that of Jennrich & Schluchter’s (1986) work. 1-2-1. Papers on Latent variables in HLM Incorporating latent variables within the hierarchical linear model (HLM) can be found in Raudenbush, Rowan, and Kang (1991), where classical test theory was introduced into their level-1 model to handle measurement error variation. Most social science research involves instruments that are supposed to measure certain constructs, but they produce measurement errors. If these measurement errors are ignored, then maximum likelihood estimates of correlations will be attenuated (Lord & Novick, 1968, page 69-74) as will be the regression coefficients. As a result, we will get biased estimates. The instrument used by Raudenbush, Rowan, and Kang (1991) is the Administrator Teacher Survey (ATS), which is a questionnaire that asked the high school teachers about school climate consisting of 35 items that are supposed to measure the five constructs. They used a three-level model in which the level-l units were items, the level- 2 units were teachers, and the level-3 units were schools. In their model, using five dummy variables at level-1 specified which items were intended to measure which of five constructs. Thus five latent variables which played a role of true scores in classical test theory were specified in the level-1 model. At level-2, those latent variables in turn became multivariate outcome variables defined on teachers. At level-3, the school mean of each entry varied over schools. The unrestricted model had no predictors in each level. This enabled the authors to do psychometric analysis, i.e., to estimate the internal consistency reliability of observed-scale scores. A major contribution of this approach was that they clarified the ambiguity that occurred from computing usual Cronbach’s a (1951) by ignoring the school-clustering effects and separated it to two components, the reliability of the teacher-level measures and the reliability of school-level measures. The results showed that we could recover the attenuated correlation by incorporating the measurement model. Another aspect of this level-1 model can be considered as a special case of confirmatory factor analysis with factor loading weights specified as unity. Actually they found evidence that the number of factors was less than five by computing the eigenvalues for each teacher or school-level covariance matrices. This paper incorporated the measurement error in dependent variables in their level-1 model which served as a measurement model, and then used the true scores as dependent variables at level-2 that served as a structural model. In this sense, this was a prelude for complete structural equation modeling (SEM) which incorporate measurement errors in both dependent and independent variables. In fact, Raudenbush and Sampson (1997) further extended this line of approach to formulate a model that allowed measurement errors both in dependent and independent variables. The model was formulated to examine the extent to which neighborhood social control (Z) mediates the association between neighborhood social composition (X) and violence (Y) in Chicago. The model was three-level model (measure, person, and neighborhood were the units for each level), and Y and Z involved measurement errors and X did not. Also Y and Z were measured at level-l and X was at level-3. Then at level-1, for measure 1' for person j in neighborhood k, a measurement model was formulated to represent the set of true scores and error scores using dummy variables as in the previous Raudenbush, Rowan, and Kang (199l)’s model. That is: Level-1: R0,, =D,,.j,,(Y,.k +gjk)+ 02,1.“ij +vjk), (1-1-1) 5,, ~ N(0,afi,, ) vjk ~ N(0,0'221.,,), where DW is an indicator variable taking on a value of 1 if R”, is a measure of perceived violence and 0 if not, and D2,], is an indicator variable taking on a value of 1 if Ry, is a measure of social control and 0 if not (i = 1,2; j = 1,...,Jk; k = 1,...,K). Note that this formulation allows missing data such as when a person provides a measure of perceived violence but not of social control, and vice versa. The level-2 model was formulated for person j in neighborhood k: Level-2: _ T Y/k — Yr +ij7‘yk +rjjk z}, = Z, + W114, + r”, (1-1-2) (”1") 01"” Q”) ’ka ~N( O ’ (2,, Q: )’ where W]: = (A ge j,‘ , Gender}, , SES,, ) that was used for adjusting for background differences. The level-3 model describes the variation across neighborhood of adjusted neighborhood mean perceived violence and social control: -10- Level-3: Y1: = Xkrfly +uyk 9 Zk = Xkrfl: 1" “:1: : (1'1'3) uv. O T T.- liww ‘- » uzjk 0 Try Tzz where X I was measures about neighborhood environments such as poverty concentration, ethnic isolation, and percent of foreign born, that were considered measured without error. For other regression components in level-2 model was fixed for parsimony: 7r... = flyo, (1-1-4) ”:1: = flzo ' Since the above model has the same structure of the model in Raudenbush, Rowan, and Kang (1991), the parameters such as ,6, , ,6: , ,Byo , .520 for fixed effects, and T», , Tyz , and T: for the random effects could be estimated by the standard method. However, their interest was to evaluate the mediating effect of social control (Z) on the regression model of social composition (X) on perceived violence (Y) at the neighborhood level. This was done by considering conditional distribution of YIZ, X from Y,Z|X . Since (YkT, Z I )T in the model Equation 1-1-3 had multivariate normal distribution, the conditional distribution of YIZ, X was obtained by standard normal theory. To obtain the decomposition formula, the standard structural equation modeling (SEM) (or originally, the path analysis) idea was used. That is, they first wrote the -11- regression of Y on X and Z, and the regression of Z on X. Creating the reduced form (the regression of Y on X) gave the formulae for the decomposition of total effect into direct and indirect effects. The all necessary quantities to evaluate the decomposition were the T functions of fly, ,6, , flyo, ,6;o T y, , and Tu. W ’ The decomposition of total effects into direct and indirect effect among latent variables is often executed by the SEM and thus it can be considered to be an attempt to integrate the HLM and the SEM to a broader class of model basing on HLM and then incorporating SEM. The only difference is that the model by the HLM used classical test theory model instead of factor analysis model that is often formulated by the SEM. If the’dependent variable is not continuous, say, dichotomous, we need a different model and the estimation method. Often, measurement instruments in social sciences such as cognitive achievement test consists of dichotomous items. In Raudenbush and Sampson (1999), the systematic social observation (hereafter, $80) was used to measure the physical and social disorders of face blocks in Chicago neighborhood. The evaluation of the SSO’s items were dichotomous. A three-level model involved items within face blocks within neighborhood. The level-1 model was a Rasch model (Rasch, 1960) that related item responses to item difficulties and face-block severities. At level-2, between face-blocks within cluster, the face blocks were outcomes depending on neighborhood - level intercepts. At level-3, neighborhood intercepts varied over neighborhoods. This analysis produced psychometric properties such as internal consistency reliabilities at each level. -12- Similarly, Cheong and Raudenbush (1999) applied Rasch Model to Child behavior Check List 4-18 (hereafter CBCL 4/18) (Achenbach, 1991) to calibrate the extent of severity of the externalizing behavioral problems of the children for each item. They used dummy variables to represent two different constructs (aggression and delinquency) indicator in level-1. In psychometrics, the application of HLM to Item Response Theory (IRT) model can be found in Kamata (1998), where he used the Rasch Model. Note that in psychometric literature, the item severity to measure the extent to which the neighborhood was disordered is replaced by item difficulty and the neighborhood propensity to social disorder is referred to as person ability. Bock (1988), and Adams, Wilson & Wu (1997) took a different direction, where they started from a 2 or 3- pararneter IRT model. 1-2-2. Structural Equation Models (Mean and Covariance Structure Models) As we mentioned in the above when we refer to Raudenbush and Sampson (1997), the framework of separating a statistical model into a measurement model and structural model has a long tradition in mean and covariance structure models or structural equation models (SEM), which are implemented by package softwares such as LISREL (Joreskog & Sorbon (1995), EQS (Bentler & Wu (1995)), AMOS (Arbuckle (1995)), and SAS CALIS procedure (SAS Institute Inc. (1990)), and M-Plus (Muthén (1998)) among others. In mean and covariance structure models, the measurement model is usually expressed as a confirmatory factor analysis model (CF A), which utilizes factors -13- as latent variables. The primary goal of factor analysis is to explain the covariances or correlations between many observed variables by relatively few underlying unobserved factors. In this sense, it is a data reduction technique. CF A, as contrasted to traditional exploratory factor analysis (EF A), implies that a model is constructed in advance, the number of factors is set by the researcher, whether a factor influences an observed variable is specified, some direct effect of factor on observed variables are fixed to zero or some other constant, covariances of factors can be estimated or set to any value, etc. Thus, in CFA, there are more opportunities that the researcher’s idea are reflected on the model. As suggested in the previous section, hierarchical linear model (HLM) and structural equation modeling (SEM) has much in common as methodology. In growth modeling context, Willet & Sayers (1994) showed that growth trajectory approach taken by HLM could be done by SEM if the data are balanced. Several methodologists claim that, compared to HLM, SEM approach has an advantage in terms of flexibility of modeling of the complex structure of variance-covariance matrix that naturally arises when we formulate a complicated causal model (Muthén & Curran (1997), Willet & Sayer (1996), Chou & Bentler (1998)). However, it is limited in dealing with nested structure of data and unbalanced designs because the model is estimated fi'om sufficient statistics such as either the correlations or the means and the covariances. An attempt to incorporate clustered data structure and dealing with unbalanced and missing data which is a feature advantage of HLM can be found in, for example, Muthén (1989). -14- Recently Muthén (1998) developed a methodology which integrates categorical and continuous latent variable models. Latent categorical variable plays a role of a classification variable in multiple-group SEM, but the group membership is unobserved and is determined from the data such that a person is classified into the class that has the highest probability. In growth modeling context, this methodology not only adds more flexibility of the modeling that can reflect a class of substantive developmental theories which concerns with discrete transitions or qualitative changes rather than quantitative changes, but also provides a practically very useful way of predictive diagnostics in preventive or intervention research. Comprehensive and systematic treatment of latent variable models including factor analysis, latent trait analysis (IRT), latent profile analysis, and latent class analysis can be found in Bartholomew & Knott (1999). 1-2-3. Modeling the Error Structure An attempt to incorporate various patterns of the error covariance structure into unbalanced repeated measures was made by Jennrich and Schluchter (1986) in a longitudinal study context, where we can consider the case when the study was executed perfectly in a designed way, i.e., no missing observations. If there are no missing observation, we can utilize the standard MANOVA, or MAN COVA model. The key idea of their model was to introduce a covariance structure model for the complete data and then to link the observed outcome and the complete data by missing observation indicators. They formulated the standard MANCOVA growth model for that complete -15- data and considered various error structures for the complete data error terms that were used in the standard models such as time series model. Those include compound symmetry, heteroskedastic errors, auto-regressive errors, and so on. Exploratory mode factor analysis model which put a constraint on covariance matrix (D , (I) = I (identity matrix) was one of the structures they listed. The key assumption for the missing data matrix was missing at random (MAR). This line of approach can be seen in Thum (1997). He explicitly stated that the key assumption for the missing data matrix was missing at random (MAR). Thurn specially focused on individual differences and individual variation. Starting from standard MANCOVA model for repeated measurement on human subjects, he formulated the two-level hierarchical linear model. In addition to showing various patterns of covariance structure of both level-1 and level-2, he addressed the sensitivity of inferences for small sample size data that is often the case in psychological studies by using a multivariate t-distribution instead of a multivariate normal distribution for modeling the variation of random coefficients. 1-2-4. Multilevel Factor Analysis Longford and Muthén (1992) analyzed data having eight domains in mathematics for the 8th grade from the Second International Mathematics Study (SIMS) carried out in 1982, and they considered a model with factor structures at both the student and school level. They used Fisher scoring to obtain maximum likelihood estimates for this model. Their theory was developed for the exploratory factor analysis phase, not for the -16- confirmatory phase. The main objective was to see whether the factor loading matrices had the same patterns at the student and school level. Muthén (1991) developed the multilevel factor analysis model from the SEM perspective and showed that if sample size per the level-2 unit was large, the conventional SEM approximate estimator worked well, providing similar results to those of the Longford and Muthén (1992) for the same SIMS data set. Muthén, Khoo, and Gustafsson (1998) extended this approach to multiple groups. They used the eighth graders’ 16 achievement scores from National Educational Longitudinal Study (NELS) administered in 1988, conceptualizing that urban Catholic and urban public schools are two distinct populations. Thus this methodology is a generalization of conventional latent variable multiple-group analysis to two-level clustered data. Though multilevel factor analysis is already developed, there are certain cases where it appears useful to incorporate the factor analysis model directly into standard hierarchical linear and non-linear models. That is, seeing the level-1 model of HLM as a model that generates latent variables that represent the true scores , in a broader sense, we apply the factor model for those random parameters such that the true score can be decomposed into a common score and a specific score in level-2 model. Then, we can clearly assess how much variation is explained by communality and how much is by the specificity. This decomposition can not be done by standard factor analysis because specificity is confounded with error variance in the model. Thus, in Chapter 2, we will give a rationale for incorporating a factor structure in level-2 in HLM in more detail. -17- Chapter 2. Specific Settings in which the Proposed Models are Useful In this chapter, contexts in which incorporating factor analysis structures into HLM would be useful will be discussed in more detail. Next the model can be laid out in terms of a hierarchical linear model. Technical issues such as method of estimation and derivation of computational formulae will be discussed in Chapter 3. 2-1. Slopes-as-Outcomes Model in Organizational Studies Suppose that in a school effectiveness study such as High School and Beyond (Coleman, Hoffer, & Kilgore (1982)), we want to know that how much the relationship between math achievement and student characteristics such as race, gender, SES vary among schools. Suppose, in fact, that there are six covariates at level 1, then naturally, we might formulate L-l: Y1,- = :30,- +fljxlij +fl21x2ij +ij3ij + ,6,ij +flSJ'x5i'j +36%“,- + 5i} , (2'1'1) it'd where 8,}. ~ N(0,0'2) and xly,x20,x3ij,x4y,x50, and x6”. is the certain covarrates of student characteristics for student i in school j. At level 2, suppose that those coefficinets all randomly vary among schools after being accounted for by a certain school characteristic W! . -13- A); =700 +701Wj +qu H} =7ro +711Wj +qu :82} =720 +721Wj +u2; :63} =730 +731Wj +u3j flu =74o +7.41% +u4j :65} =750 +751W1 +qu flea,- =76o +7ij 1'qu (212) where u J. = (uoj, u”, uzj, u3j, us], u61)T is considered to be distributed as mean of 0 and the covariance of r , a 7 x 7 symmetric matrix. This is fairly large model since the number of unique parameters in tau-matrix is (7 x 8)/2=28 and the data may not provide enough information to detect all the variances and covariances, or the current algorithm may not converge in reasonable time. However, if indeed u”. and u2j go together and an , u“. , us]. and u6j. go together, then we formulate the factor model for T u)- " (uojgurjiuzpuli’urtj’qu’uéi) as fuel) ( 1 u”. 0 uzj 0 u” = 0 u“ 0 us}. 0 \ucj) KO ooookr—o .N3°&*~—oo 0\ ’70} 7701' 0 W00 Won 771} a 771} NN( 0 3 W10 W11 772; 7721' 0 W20 W21 In matrix notation, the above models can be written as L-l: y} = XPBJ. +8}, 8,. ~ N(0,0'21,,/) -19- 1” 02 W12 )- (2'1-3) W22 (2-1-4) ,6]. = ij + A17]. (2-1-5) Note that r = D(uj) = A‘I’AT. (2-1-6) By using a factor model, we reduce the number of variance-covariance parameters estimated from 28 to 10. 2-2. Growth Models in Human Development Studies Suppose that in a child development study, an outcome is modeled by a linear function of age and want to know how much variability exists among children for intercept and the rate of growth. Then, the model is L-l: Y". = no, + ”Na”. + a". , 8". ~ N(0,0'2) (2-2-1) where a,,. is the age of child i at time t for t = 1,..., T, , and i = 1,..., n. Or, in more compactly, Y". = Air, + 6‘". , (2-2-2) where A". = (1, a,,)T and 7:, = (7:0,, 7:1,)T. In matrix notation for child 1', Y, = Ag; + 8,. , 8,. ~ N(0,O’21T’) (2-2-3) -20- where Y, =(K,,...,YTI,)T, A, = E ,and g, =(g,,,...,g,,,)r. Inis the T, x T,identity matrix. The level-2 model describes the variability of those person-specific parameters among children. L-2: 7r , = + u u , 0 r r UNION: :1» Or, in matrix notation, 7:, = y + u,, u, ~ N(0, 1). (2-2-5) Now suppose we suspect that u,, and u,, are correlated with the correlation of 1 or -1. Then using the proposed model, we formulate u” - '7‘” 27 ~ N(0 w ) (2-2-6) u.,=/1no,-’ °’ ’ °° ' In matrix notation, /_\ K. \- \___/ ll 1 [,)—§Iog1V,I—§ejV, 'e,.. (4-8) Longford (1987) gives the score vector and Hessian of the log-likelihood function (4-8) in terms of element wise formula. Raudenbush (1994) showed the formulae in more compact matrix form. The key results are for the q x 1 score vector 61, 1 6vecV, r S, EE=§( §¢ ) (vecM,) (4-9) where M, = V,"(e,ef — V, )V,’1 (4-10) and J S = 25,. (4-11) j=l For the q x q Hessian matrix, H _ E 621, 1(6vecV,]T V" ®V" (6%ch 412 j: [wa¢T]_'—2 a¢ (j j) a¢ (' ) and J Hz 2H,. (4-13) 1:1 We apply the above general formulae to obtain (3,4,5. The details of the derivation and the algorithm are in the Appendix. -52- To implement the Fisher scoring algorithm, we need a starting value of the parameters in the variance-covariance matrix V, . Though it is arbitrary, we need to provide a good starting value to obtain the MLE within the parameter space and its quicker convergence. Though V, is a function of three components, i.e., ,1 , w , and 0'2 (See Equation (A-4) in the Appendix), the hard part of providing a good starting value is in 71 and 1,11 , which are the vector of unknown elements of A and unique elements of ‘1’ (Starting values for the estimate of 0'2 can be found in the section of Starting Values in the Appendix). One way to obtain the starting values for the estimates of 2. and 11/ is by analogy to the factor analysis. That is, we execute principal component factor analysis on the positive semi-definite f , which is obtained by using the same method as current HLM2, and then take advantage of the pre-specified structure of A to find the initial estimate of A , which is denoted by A”). Then solving 1 = A‘I’AT for ‘1’ and substituting A by A”) and r by zT , we can obtain ‘1"0’ , the initial estimate of ‘1’. The details of the step was shown in Appendix. Finally I note that in order to obtain approximate standard errors for 6M”. , we will compute the information matrix by Inf0.= —H (4-14) where H is evaluated at (3mg. Then, the asymptotic standard errors for grim are computed by s.e.(¢3,) = (r = 1,..., Q) (4-15) 1 Info." -53- where ¢, is the rth element of vector ¢ and Info.” is the corresponding rth diagonal element of the information matrix ( Info. ). For the standard error of 6 , the MLE of fixed effects 61 , we will compute the asymptotic covariance Dim-(AMA. )" we where V is evaluated at q; , the MLE of covariance structure parameter ((1. Then the standard error of the rth element of a vectoré is given by S.e.(6A,,) = [(A,TV"A, )-' 1,, V (4-17) for r = 1,2,..., F. Note that since 6, and 61, are known to be asymptotically normally distributed, we can use them for the inference by keeping in mind that it’s an approximation.3 3 The large sample normal approximation for the standard error of 6 is poor in most applications so that other techniques are required in practice (Bryk & Raudenbush, 1992, Chapter 3). -54- Chapter 5. Computational Examples In this chapter, I will demonstrate how the proposed model can be applied to real data sets and when it is useful to do so. I first show that the HLM2 factor analysis model (HLM2F) includes the standard HLM2 model as a submodel, by using a model formulated in Bryk & Raudenbush (1992) for the High School and Beyond data. The next example shows a case when the HLM2F model is usefirl, using the data from Huttenlocher et a1. (1991) on children’s vocabulary growth. The third example shows how to specify the factor analysis model when the Tau-matrix (the level-2 variance- covariance matrix) is relatively large such that specification of the factor structure is less obvious. The fourth example uses artificial data with known parameters. The purpose of demonstrating this example is not only to provide a check for the validity of the computation, but also to demonstrate how to formulate a reasonable model, i.e., a substantively meaningful and an identifiable model. Further, it illustrates how to evaluate and correct mis-specified models. 5-1. High School and Beyond I use the High School and Beyond (HS & B) to demonstrate that the standard HLM2 model is a submodel of the two-level Hierarchical Linear Model with Factor structure (HLM2F). The data is a subsarnple from the 1982 High School and Beyond Survey, and includes information on 7,185 students nested within 160 schools. In Table 4.5 on page 72 of Bryk & Raudenbush (1992), the results of the following model are presented: -55- HLM2 model: L-l: _ iid Y, :60, +6,,(SES—SES_,),, +8,,8,, ~N(0,0'2) L-2: fl),- = yo, + y,,(Sector),- + 702(SES.,-)j + “0} (5-1) ,6, = 7,, +y,,(Sector), +y,2(SES,,), + u,,, (2:1)TN((3),(:$3 if: )1 Here, Y, is the math achievement score for student i in school j; SES,, is a measure of the socio-economic status (SES) of student i in school j; SES.)- is the sample mean of SES of the school j; and Sector, is the indicator variable taking on a value of ‘one’ for Catholic schools and ‘zero’ for public schools. Since the model which I developed in Chapter 3 uses the full maximum likelihood with the Fisher scoring algorithm and the default HLM2 uses the restricted maximum likelihood with the EM algorithm, we set the estimation method of HLM2 as comparable as possible to the HLM2F. This can be done by setting the HLM2 optional specification to MLF (Full maximum likelihood) and the number of Fisher iterations to 1. Setting the number of Fisher iterations to l specifies the program to run all the way by Fisher -56- scoring, but if the algorithm fails for some reason], then the algorithm switches back to the EM algorithm. The results of HLM2 by MLF with the number of Fisher scoring = l are in the second column of Table 5.1. The HLM2F model equivalent to the above model can be specified by setting the factor loading matrix as the identity matrix of the size of the number of random effects in HLM2. This can be shown as follows. HLMZF model: L-l: — iid Y, = 6,, + 6,, (SES — SES,), + 8,, ,8, ~ N(0,0'2)(Same as L-l model oquuation (5.1)) L-2: flu} =700+701(S86t0r)j+y02(SES,j)j+1.771} +0'772j . (5'2) 6, = 7,, +7,,(Sector), +7,2(SES,,), +0- 77,, +1472, where (:22) N16111:. 1;» In matrix form, the level-2 model can be written as 6, = W,7 + A77, (5-3) I Most typical reason of this failure is that Fisher will produce negative definite i" . -57- 1 Sector}. SES,}. 0 o 0 J h W.= __ , w ere ’ [0 0 O 1 Sector}. SESJ. ' l 0 7=(7ooa 701’ 702’ 710’ 711’ 712)r’A=(0 J’and nf=(noj’ ’7'1)T' Thus ifwe set A =12 in the HLM2 model, then uj = n}. and r = A‘PAT = ‘1’. The results using HLMZF are in the third column of Table 5.1. As can be seen by comparing the results, the HLMZF reproduced almost exactly the same estimates for all of the parameters including the standard errors. This is evidence that the newly developed program is working. -53- Table 5.1 Results of the Analysis for High School and Beyond Survey # iterations until convergence HLM2 5 HLMZF 3 Fixed effects estimates 700 12.126 (0.197)2 12.126(0.197) 701 1.227 (0.303) 1.227(0.303) 702 5.332 (0.366) 5.332(0.366) 710 2.946 (0.154) 2.946(0.154) 711 -1.644 (0.237) -1.644(0.237) 712 1.042 (0.296) 1.042(0.296) Random effects estimates 0,2 36.721 (0.626) 36.721(0.619) 2'00 (or woo for HLMZF) 2.317 (0.355) 2.317(0.353) 1,0 (or w", for HLMZF) 0.188 (0.196) (0.483 as corr.) 0.166(0.193) r” (or 1;!” for HLMZF) 0.065 (0.208) 0.065(0.204) Log-likelihood at convergence -23247-30 -23248.22 # parameters estimated 10 10 46494.592 46496.44 Deviance 2 Note. ( ) represents the standard error computed from the Information matrix. -59- 5-2. Infant Vocabulary Growth The purpose of demonstrating the analysis of infant vocabulary grth data is to show where it is useful to apply the HLMZF model. The data come from a recent study of children’s vocabulary development during the second year of life ((Huttenlocher, Haight, Bryk, Seltzer, & Lyons, 1991). Huttenlocher et al. investigated the relationship between a child’s early vocabulary acquisition and maternal use of language, hypothesizing that exposure to the mother’s spoken language has a positive relationship to the growth of the vocabulary of the young child. Gender differences in vocabulary grth were also considered (Huttenlocher et al., 1991). Two groups of mother-infant pairs, with six boys and five girls in each, provided from 5 to 7 observations per infant in their early stages of life. More specifically, the first group consisted of 11 children who were observed at their home on six or seven occasions at 2-month intervals during the period from 14 to 26 months of age. For some cases, the 14-months data point was missing. The second group consisted of another 11 children who were observed at 16, 20, and 24 months. The time dimension was (Age — 12),,- months, assuming that at 12 months a child’s vocabulary size was zero. Huttenlocher et al. (1991) first formulated the following full quadratic unconditional model. iid L-l: Y): 3,]. +6,j(Age-12),j+ 621(Age- 12)2,,+ 8,, a, ~N(O,0'2) I -60- A1j=700+u01 (5'4) “0) M O 700 701 2'02 “11 ”N O 1 1'01 711 2'12 “2 j 0 702 2'12 713 Obtaining the results that H0: r00 = 0 and 700 = O and 7,0 = O can hold statistically and A knowing that Corr(u,j. , uzj) z 1, they formulated the following final level-1 model which left only the quadratic term: L-l: Y1} = 621(Age -12); + 8,], £0. TN(0,0'2) At level-2, they modeled the rate of acceleration by the individual characteristics such as group membership, gender, and the amount that the mother spoke to the child. L-2: 6,]. = 720 + y2,(Group)j + 722(Gender), + 7,, log( Momspeak)j + uzj , (5-5) iid (“21) ~ N((O)’(Too))- where (Group) I. = 1 if the child is in group 1, and 0 otherwise; (Gender) 1 = 1 if the child is a girl, and 0 if a boy; and log( Momspeak)j is the natural logarithm of the number of -61- words that the mother spoke to the child j, measured once at the first occasion of the observation. The results for this model are in the second column of Table 5.23. The fact that the estimate of the correlation between an. and “21 was high may not necessarily indicate that the rate of growth does not vary from child to child, which is a question that was left in Huttenlocher et al. (1991)’s model. To investigate this point, Bryk & Raudenbush (1992) used the following model: L-l: if]. = mug.» -12),, + 6,j(Age —12);+ 5,}, 5,, TN(O,O'2) L-2 6.1.:qu 6,]. = 720 + 72,( group) j. + 722( gender) J. + 723 log( Momspeak), + uzj (5-6) where (2:3)TN((3).(::; 2:» The results showed that f” = 16.94 (See the row for r” at column 3 in Table 5.2), which was significantly different from 0, and the uU and uzj were highly correlated. This suggested that it was not necessary to estimate the covariance 1'12 , because the null 3 The results were obtained by reanalyzing the model by HLM2 version 4.92, specifying full maximum likelihood (MLF) and # Fisher acceleration = I. This specification induced slightly different results from Huttenlocher et al. (1991) which used the restricted maximum likelihood (MLR) and EM algorithm. The same thing can be said in the next results of Bryk and Raudenbush’s model. -62- hypothesis 2'12 = ,/ r” - 2'22 held, in the p0pulation, at .05 level, but it was necessary to estimate both variances, r” and In because those were not zero. The above result implies a factor analytic type model. Reparameterizing the 122 as . . . T . . 122 = 1:17,, by usmg the variance ratio 1,2, = —22- , and then setting 1,, = [turn gives the 11 same constraint as 1,, = ,/ r” - 122 . Thus, we obtain a factor analytic type (HLM2F) model by giving certain constraints on the elements of r matrix of the original HLM2 model, which in turn, implies that the HLMZF model that will be formulated is nested within the HLM2 model. This idea is formally formulated by a factor analysis type model. That is, In L-2 model, we let “0} :77” “1} =121771j, or, in matrix form, and the covariance equation is W11 3211””) D . =A‘l’ T: [uj] A (lull/11 331W” Thus, we formulate the following model: L-l: _ 2 K; - Aj(Age—12),j +flzj(Age—12),j +80. -63- (5-7) (5-8) it‘d 3,, ~ N(0,a’) L-2: A}: 771} (5'9) 6,}. = 720 + 7,,(Group)j + 722(Gender)j + 72, log( MomSpeak)j + 3,177,]. id where n”. l~ N(0,1//00). The results of this reduced model are in the fifth column of Table 5.2. In order to use the likelihood ratio test to examine whether the above factor model is statistically acceptable, we run the standard HLM2 model in two ways, one is to run the HLM2 sofiware and another is to run the HLMZF program by setting A = 1,. Those results are in the third column of the table below, under the caption HLM2 (Bryk and Raudenbush’s model), and in the fourth column, under the caption HLM2F (Bryk and Raudenbush’s model)‘. 4 It should be noted that the results for the standard HLM2 model executed by HLMZF showed a close match to the HLM2 results. However, a run by HLMZF was problematic because the estimate of LI" became a negative definite matrix in the early stage of scoring iteration even though several different starting values were tried. The values on the table were obtained when we ignored the fact that the estimate of ‘P became a negative definite matrix during the iteration. For this reason, we take the results by HLMZF for the saturated model as a reference. On the other hand, HLM2 was executed by choosing MLF and the number of Fisher iterations = 1. This specification lets HLM2 run repeatedly under Fisher scoring algorithm as long as the estimate of ‘1’ does not become a negative definite matrix during the iteration; if it does, then HLM2 returns to the EM algorithm, which is a default algorithm. Thus, probably this is how HLM2 produced the results: HLM2 first tried to estimate by scoring but it failed, and then used BM. -64- Table 5.2 Results of the Analysis for the Infant Vocabulary Growth Datas # iterations until convergence HLM2 (Huttenlocher et al’s model) HLM2 (Bryk and Raudenbush’s Model) 8 HLMZF (Bryk and Raudenbush’s Model) 4 HLM2F (Reduced Model) Fixed effects estimates 720 2.150(0.277) 2.088(0.140) 2.181(0.139) 2.180(0.139) 721 0.802(0.345) 0.599(0.290) 0.593(0.288) 0.598(0.289) 722 -1.105(0.327) -0.904(0.279) -0.902(0.278) -o.904(0.279) 723 0.886(0.327) 0.827(0.268) 0.826(0.267) 0.827(0.268) Random effects estimates 2 819.366(113.622) 707.866(107.832) 711.025(98.197) 708.412(98.233) CT 7110/41 for HLMZF) N.A. 16.940(15.402) 16.3266(13.436) 16.6429(5.256) I21 N.A. 1.709(0.978) 1.77651(0.864) N.A. (0.985 as earn.) 132 0.566(0.177) 0.178(0.128) 0.172(0.119) N.A. A?! N.A. N.A. N.A. 0.102(0.0262) Log-likelihood at - 639 . 22 ~ 632 . 505 ~ 632 . 502 - 632 . 504 convergence # parameters estimated 6 8 8 7 I)evjance 1276.436 1265.010 1265.004 1265.008 We compare the results in column 3 for the full model (Bryk and Raduenbush’s model) and those in column 5 for the reduced model (reduced by factor model). First, notice that the point estimates and their estimates of the standard error for the fixed effects are almost the same for the two models. The estimates of level-1 error variance (0”) are also almost the 5 Note: I. HLM2 results were obtained by setting Full maximum likelihood and # Fisher acceleration =1. When we set the # Fisher acceleration =5. then the results in column 3 took 867 iterations to converge. 2. ( ) is the standard error. 3. The cutoff criterion for getting out of the Fisher scoring loop for HLMZF is ‘relative change in the squared length of parameter estimates’ < 10’”. which is the same value as HLM2. though HLM2 uses changes in log- likelihood. (0'2) are also almost the same, but there is a small difference for the standard errors. For the level-2 variance-covariances, we can recover the estimates of the original variance- covariances if we compute them using the points estimates in the reduced model and the covariance equation in Equation (5-8) for the original Tau-matrix. Thus: in)?” = 0.10229 x 16.8429 = 1.72286, 21310?“ = 0.102292 x 16. 8429 = 0.176231. These values closely match those in the second column of Table 5.2, i.e., I], = 1.709 and 522 = 0.178 respectively, which are the corresponding elements in the 2' matrix. This means that we can recover the original I estimate, and this fact is especially useful when we cannot directly obtain the estimate of r , which is often the case when 5 is not fiill rank, as happened in this data when we used the Fisher scoring in HLM2F. The comparison of the deviances for the nested models can be used to assess model fit using a likelihood ratio test. The deviances for the full model (1265.010) and for the reduced model (1265.008) are basically the same. Thus, the deviance test clearly shows that the reduced model is statistically acceptable. -66- 5-3. Scholastic Aptitude Test (SAT) Meta-analysis of Coaching Effects The purpose of this demonstration is to give an example where the 1 matrix is of relatively large dimensions. When the size of r is large, we can formulate many different factor structures, which adds a complexity to the modeling. At this point, discipline specific theory and knowledge about particular data and variables should inform decisions regarding the number of factors (M), and which elements of factor loading matrix (A) to fix and what values to use, etc. After deciding on the general framework of the factor structure, mathematical knowledge of model identification can be applied to decide whether the model in mind is an identified model. In the SAT meta-analysis for coaching effect data, we create a 4 x 4 1' matrix. The data consist of 48 studies on coaching effects on the SAT scores (Becker, 1990). Only 46 studies were analyzed for our analysis because two studies had no information on coaching hours. The SAT scores were reported for the two subtests, SAT-Math (SAT- M) and SAT-Verbal (SAT-V). Each study provides a part of the information on standardized mean change scores between pretest and posttest on SAT-M, SAT-V for coached and uncoached groups. A study is considered to be complete if it provides four standardized mean change scores, i.e., standardized mean change score for the coached group on SAT-M, standardized mean change score for the uncoached group on SAT-M, standardized mean change score for the coached group on SAT-V, and standardized mean change score for the uncoached group on SAT—V. Table 5.3 shows a number of studies classified by available standardized mean change scores and whether a control group -67- (uncoached group) was used in the study. From that table, we see that only 19 studies, which is about 42% of the studies, had a complete set of information. Table 5.3 Classification of SAT Studies Existence of Control Group Available Standardized Yes No Row Subtotal Mean Change Both SAT-M and SAT-V 19 2 21 SAT-M only 1 3 4 SAT-V only 13 8 21 Column Subtotal 33 13 Total 46 One of the strengths of a hierarchical analysis is that diverse patterns of data information can be put together and all the information can be used for statistical inference assuming that data are missing at random (Little and Rubin, 1989). This will be the method of analysis used on the above meta-analysis data, which utilizes data from 27 studies (about 58 %) that are incomplete. Unless using a hierarchical model, information from more than half of the studies can be totally discarded or it will be used less effectively. In a small data set such as this meta-analysis data, it is too costly. Model: Let dy. be the standardized mean change score in the ith observation of the jth study, which can be defined as -68- 4(n,.j—2) )7 —l_’. 21'] 11/ ) '7 = 4n”. — 5 SM (5-10) where 172,]. is the sample post-test average for observation i of study j; 17“.]. is the sample pre-test average for observation i of study j, S,” is the sample standard deviation of the post test for observation i of study j which is assumed to well approximates the pooled 4("1/ ’2) for Y2); - Yw sample standard deviation, and the coefficient —— 4n”. — 5 S is used as a ray multiplying factor to ensure that d ,.j is an unbiased estimator of the population #217 standardized mean change 6,}. = —% (Becker, 1988), where p, ,1. is the population post-test mean for observation i of study j, .111.)- is the population pre-test mean for observation i of study j, and a is the population standard deviation that is common to both pre-test and post-test. Let X 1 be the indicator for SAT-M score for the coached group, X 2 be the indicator for SAT-M score for the uncoached group, X 3 be the indicator for SAT-V score for the coached group, and X 4 be the indicator for SAT-V score for the uncoached group. Thus, X 1 ,.j = 1 if d0 is the observation of SAT-M score for coached group, and 0 otherwise; X 2 ,.j = 1 if d”. is the observation of SAT-M score for uncoached group, and 0 otherwise, X 3 = 1 if d0 is the observation of SAT-V score for coached group, and 0 ij otherwise; X 4 ,j = 1 if d U is the observation of SAT-V score for uncoached group, and 0 otherwise. Defining the variables as above, we formulate the level-l HLM model: -69- L-l: d.)=A1X1y+flszzg+flstyj+,34,-X4.-j+5ya (5-11) where i = 1,...pj for p]. is the number of observations of standard mean difi’erence d”. available for the study j, and j = 1,2, ....,46. Within the study, the error 8,)— ’s are correlated because even within a study, sources of errors come from the same set of people. This is the difference between this model and the standard HLM model where the level-1 errors are assumed independent. The dependency of the within-study errors occurred because the model is multivariate, which implies that the data have common sources of variation within a study. The variances and covariances of the covariance matrix of the vector of within-study error a}. = (8”,...,£ij. )T (VJ. = D[sj], a pj x pj symmetric matrix) can be computed by the following formula (Becker, 1990): 2(1 " pp?) 502 + a . 2n . .I J Var(£,.j) a V,.]. = (5-12) where n j is the number of subjects in the study j, and p,.,. is the population pretest- posttest correlation. In this analysis, the value of p ,.P = 0.88 will be used as the approximate population correlation between pre- and posttest SAT scores for both SAT- M and SAT-V for all subjects, following DerSimonian and Laird (1983). Similarly the covariances of the covariance matrix of the vector of within-study error can be computed by the following formula (Becker, 1990). 6.6..- Cov(a..a..)aVii.~=:" + ’zjzp’. (5-13) I j -70- where p,,..j is the population correlation between the dependent variables associated with estimated standardized mean differences i and i ' . This correlation may be estimated from the sample, deduced from published test information or imputed on the basis of past research. In the SAT meta-analysis case, we use pm}. = 0.66 if the pair of (i , i’) is Math and Verbal in the same experimental group (either in the coached group or uncoached group), which is taken from the test manual (Gleser & Olkin, 1994). If the pair of the observation unit (i , i’ ) is in the different groups, i.e., coached and uncoahed group, p“, = 0.00 since those observations are independent. As we saw in the above, the within-study error variance-covariance are known in the meta-analysis settings. In HLM terms, this type of model is said to be the V-known model. The level-2 model is a multivariate regression model and the time for coaching in hours is used to model 6” and A}. , the means of standardized mean difference for SAT- M and SAT-V for the coached groups. L-2: 6U= 7,0 + 72,(Coaching Hours) j + u”. 62,. = rm + “2,- (5-13) 63]. = 720 + 721(Coaching Hours) J. + “31 64}. = y“, + u,1 j where -71- 7 ( ulj\ 0 711 712 T13 T14 \ “21' ”d 0 721 1'22 723 724 ’ u3 j 0 731 r32 733 734 1 W41) \0 741 742 2'43 2'“) and (Coaching Hours) 1 is the hours of coaching for the study j. This model is substantively interesting and meaningful by two reasons: One is that if we center Coaching Hours around zero as we just did in the Equation (5-13), then the fixed effect 710 is the expected standardized mean change for SAT-M for the coached groups at the absence of coaching and thus y“, — 720 represents the expected difference between coached and uncoached groups for SAT-M under no treatment (coaching), which is a possible result because some of the studies were not randomized experiments. Similar interpretation can be made for 730 — 7’40 , which is the expected difference between coached and uncoached groups for SAT-V under no treatment. The second reason is that If we center Coaching Hours around its grand mean, then 710 — he represents the mean gains in the experimental groups compared to the standardized mean change of the control groups for SAT-M, and 730 — 740 is the mean gains in the experimental groups compared to the standardized mean change of the control groups for SAT-V, assuming that the model is correctly specified6. To solve V-known HLM model, we capitalize the fact that the level-1 error variance-covariance matrix V]. is known for all j; j = 1,2,...,], where J is the number of 6 Actually model misspecification is possible. Kalaian and Raudenbush (1996) used the natural logarithmic transformation of the hours of coaching, being concerned with a curvilinear relationship from the observations of the scatterplots. Thus, including quadratic term may show a better fit. Here I used a linear model for simplicity of demonstrating the point. -72- studies that was used in Kalaian and Raudenbush (1996). That is, using Cholesky . . _ 'r . . . . . factorization, VJ. — Fij , where F]. is a p j x p 1 lower triangular posrtrve defimte matrix, we transform the level-1 within study model in Equation (5-11) by multiplying F1." from the left. Then the transformed level-1 model has i.i.d. error variance for all the studies with the fixed unit variance. Therefore, we can apply HLM2 and HLM2F software by fixing the level-1 error variance (02) = l. The results of this model by HLM2 is shown in the second column of Table 5.4. The results for the same model by HLM2F was not obtained because the estimate of ‘1’ becomes negative definite during the early stage of scoring iteration. The estimate of rwas 0.0775 0.0874 0.0874 0.1045 - 0.0360 - 0.0352 -0.0156 -0.0191 f: and as a correlation, 1.000 . _ 0.972 p‘ " -O.562 -0307 0.972 1.000 - 0.472 - 0.324 -0.0360 -0.0156 -00352 -00191 , (5-14) 0.0532 0.0364 0.0364 0.0334 -O.562 -0.307 -0472 -0.324 . (5-15) 1.000 0.863 0.863 1.000 As we can see from the correlation estimates, 2° is almost singular and has two high correlation blocks, which caused the slow convergence that is shown by the very large number of iterations in HLM2 as 2911 EM iterations, and caused the inability of obtaining the estimation by HLM2F which utilized Fisher scoring. -73- Table 5.4 Results of the Analysis for the SAT Coaching Effects Data7 # iterations until convergence HLM2 (Full model) 2911 HLMZF (Reduced model a) 13 HLMZF (Reduced model b) 8 Fixed effects estimates 0.290(0.071) 7m 0-317(0.0692) 0.284(0.0679) y” 0.00716(0.000896) 0.00701(0.00110) 0.00760(0.00115) 720 0.263(0.0709) 0.254(0.0614) 0.253(0.0655) 73o 0-183(0.0517) 0.176(0.0459) 0.187(0.0537) 7“ 0.00467(0.00158) 0.00481(0.00091) 0.00455(0.00123) y“, 0-140(0-0357) 0.1409(0.0321) 0.1417(0.0343) Random effects estimates 0'2 N-A- N.A. N.A. 7,,(1/1” for HLMZF a& b) 0-0775(0-0241) 0.0867(0.0243) 0.0812(0.0208) 61 0.0874(0.0269) 11.4. 11.4. 722 0.104(0.0330) 11.11. "J. 23,062, for HLM2F a& b) '0-036010-0‘54) ~0o0269(0-0135) -0.0325(0.00893) r32 -0.0352(0.0180) 11.11, "J. 133(sz for HLMZF a& b) 0.0532(0.0137) 0.0496(0.0117) 0:0536(0.00811) {“0631 for l-ILMZF b) -0.0156(0.0125) N.A. -0.0167(0.00673) T42 -0.0191(0.0146) NOA. N.A. T43(V/32 for HLM2F b) 0.0364(0.0102) NOA. 0.0329(0.00325) 744(1/133 for HLM2F b) 0.0334(0.00958) N.A. 0.0324(0.00464) ,1“ 11.4. 1.010(0.0541) 1.122(0.0717) N.A. 0.817 0.0556 11.4. ’142 1 ) Log-likelihood at convergence '279-689 -307-693 -284.424 # parameters estimated 16 11 13 Deviance 559.378 615.386 568.848 7 Note 1. ( ) is the standard error 2. The cutoff criterion for getting out of the Fisher scoring loop for l-ILMZF is ‘relative change in the squared length of parameter estimates’ < 10", which is the same value as HLM2. though HLM2 uses changes in log- hkehhood. -74- From the pattern of f , we can imagine a simple pattern of factor structure. That . T rs, we formulate the factor model for level-2 error vector u j. = (an. , uzj , 11,]. , u“) as ”U = ’71)" “21 =12lnlj’ (5-16) “31 = 7b}, “41 = 427b,“- or, in matrix form, (uU\ l 0 _ “21 121 O [’70] ll]. _ “3)- _ 0 1 Tbj _' A7719 (5’17) (1141.) O ’142 l 0 0 where A = ’12 , and 77]. =[011] 1 772} o 4.. The pattern of factor loading matrix A and the length of factor score vector 77!. characterizes our idea about how the level-2 errors are correlated. That is, we hypothesize that the 4 level-2 random errors (u j = (“1)- , u2j , u”. , u“. )T) after controlling for the coaching time can be represented by two unique level-2 unit latent random errors (77]. = (77,]. , 77217), and the SAT-M unique variability for coached and uncoahed groups ((uU , uzj )) can be explained by a common factor 77,}. , and the SAT-V variability for coached and uncoahed groups ((“31 , u“. )) can be explained by a common factor 7721.. Formally, we formulate a HLM2 factor (HLM2F) model as follows: -75- d”. zflij +flle2ij +14sz1; +6,ij +511 L-2: 6U= 7,0 + 7,,(Coaching Hours) j + 77,]. (5-18) '62} =720 "72217711 63!. = 730 + y3,(Coaching Hours) j + 772]. ’34} = 740 '1' 142772}, W11 W12 ) . We refer to this model as the reduced W 21 W22 where [77"] ~ N(0,\1’) for 111 =[ 21 ‘model a’. Note that having specified the level-2 errors as Equation (5-16) or (5-17), we structured the r covariance matrix as W11 121W” W12 ’142W12 121W“ A§1W11 421W12 ’121’1'42W12 W21 ’142 W21 W22 ’10sz 142 W21 ’1'21’1'42 W21 142 W 22 1'12 W 22 D[uj] = AD[nj]Ar = MFA" = . (5-19) Note also that ‘model a’ in Equation (5-18) is nested within the full model in Equation (5-13). This fact can be shown in a similar way that I did for the 2 by 2 2' matrix case for the infant vocabulary growth data in section 2 of Chapter 5 (see page 62). Another way of finding the constraints on the 7 matrix in order for the reduced model (model a) be nested within the full model is obtained by comparing Equation (5-19), the expression of the structured I created by formulating the HLMZF model (model a), with the unstructured and saturated r in Equation (5-13), which was formulated by HLM2. -75- Using this comparison and with a little algebra, we find five constraints on the 2' matrix: 711722 = 72211 733744 = 131231 73,133 = 21:11-43 1 73314, = 7437311 (5'20) and 1,,2'332'42 = 721743731 - Thus we obtain Equation (5-19) from the saturated 1 matrix in Equation (5-13) by defining T 7 W11: 7111 421=—2'l_1 W22 = 7331 ’1'42 =i1 (5‘21) 711 2'33 and W21: 731 with these five constraints. Note that the first two constraints in Equation (5-20) correspond to the statements that the squared correlation between u, j and 212, equals to one (p3,,2 = 1) and the squared correlation between u,, and u“. equals to one ( p3,", = 1). The results of ‘model a’ is shown in the third column of Table 5.4. To test the model fit, we use a deviance statistic. The deviance test suggests that ‘model a’ does not fit as well as the full model, D, — D0 = 615.386 - 559.378 = 56.008 with 5 (16-11) d.f., P- value < 0.0001 , where D, is the deviance for model a, and D0 is the deviance for HLM2 model. The next step is to try to find a statistically acceptable model. Since the ntunber of parameters is reduced from 16 to 11 and the estimate of 7 matrix is almost insufficient rank, the number of parameters of a model that shows a good fit must be between 11 and 16. -77- One model that satisfies this condition can be obtained by carefully looking at the estimated tau matrix produced by the HLM2 full model. The estimated tau matrix with its correlation matrix form reveals that the upper lefi 2 by 2 block matrix is closer to singular that the lower right 2 by 2 block matrix because the upper has the correlation 0.972 whereas the lower’s one is 0.863. Based on this observation, we consider a model that only u, j and u,, completely share the common variance. Thus we formulate the following model: L-l: d1; = AJXHI +152.in01' (BUX‘M'I 1' 6,,X40. 1' 50' L-2: 6, j: y“, + 7,,(C0aching Hours) j + 77,]. (5-22) flu = 720 + 42.171,- 631. = rm + 73,(C0aching Hours) 1 + 772,. 11’. j = 7... + 773,. ’71; W11 W12 W13 where 77,]. ~ N (031’) for ‘1’ = 11/2, 81,, 01,3 . We refer to this model as the ’73) W31 W32 W33 reduced ‘model b’. Similar argument that we did for ‘model a’ tells that ‘model b’ is nested in the HLM2 model, and it is clear that ‘model a’ is a nested model in ‘model b’. Note that the factor model that we used in ‘model b’ for the level-2 error vector 1‘. uj=(u,,., u,,, u”, u“) rs -78- (up 1 0 0 7],. ”21' 421 0 0 I ui_ 1131- — O 1 O ’72} "Anjr (5'23) 04,-) 0 o 1 ’7” 1 0 0 , 0 0 ”‘1' whereA= 0 1 O ,andn,= 772,. 0 0 1 "31' The results of fitting ‘model b’ is in column 4 of Table 5.4. The deviance test reveals that ‘model b’ does not fit as well as the full model, but we marginally reject the null hypothesis at 0.05 level because D2 — D0 = 568.848 - 559.378 = 9.470 with 3 (l6- 13)d.f., and the P-value is 0.024, where D2 is the deviance for model a, and D0 is the deviance for HLM2 model. Knowing that ‘model b’ is rejected though it is marginal, we still continue the investigation of the model that fits to the data. The number of parameters of the model that fits now must be between 13 and 16. One model that satisfies this condition is specified by adding a specificity component to the model, which will be presented in section 1 of Chapter 6. This model adds four specific variance parameters on ‘model a’, and thus if all of these are statistically significantly different from zero, the number of parameters of the model is 15, which is one smaller that the HLM2 model. To fit the model that has unique specificity is a t0pic of future study whose model is formulated in section 1 of Chapter 6. Finally, I would mention that the results and substantial interpretation for the two dimension HLM2 factor model are consistent with those of Kalaian & Raudenbush -79- (1996), which analyzed the studies that had the pair of SAT-M and SAT-V, where major substantive conclusions were that coaching was more effective for SAT-M than SAT-V, and the between-study unique effect of SAT-M and SAT-V was negatively correlated. However, the current analysis provides a more precise information on what way study variability emerged and it utilized all of the available studies, whereas Kalaian & Raudenbush (1996)’s model required the existence of control group for each study, which will result in 33 studies for this data set (see Table 5.3). -30- 54. Results from the Simulated Data The purpose of this analysis is not only to see whether the theory and the computer program that I developed can recover the parameter values well, but also to evaluate the capacity of the methodology to distinguish between true model and alternative incorrect models in the context of large data application. To answer these questions, I generate the artificial data with known parameters. We consider a situation in which subtests nested within students that was already stated in section 3 of Chapter 3. To summarize the settings, a test consists of 4 subtests, mathl, math2, verball, and verbalZ, and each student, the total of 100 students, takes one form of the test at the first testing occasion and takes a parallel alternative form of the test at the second testing occasion. No missing observations were assumed. Thus we created a situation that can be conceived as subtests are nested within students. Each student has 8 observations for 100 students, with the total observations of 800. We assumed that true scores of subtests for the same domain such as mathl and math2, and verball and verba12, are perfectly correlated and also assumed that the correlation between mathematical and verbal proficiency was 0.50. Therefore, non-orthogonal two factors, mathematical and verbal proficiency were considered. The model was formulated as in Equations (3-29) and (3-31), and the parameters in the model were setas 7.0 =720 =730 =74, =500, 0'2 = 25, 1., =0.8 2., =12, 01,, = 100 , 1/122 = 100 , W12 = 50. It should be noted that the model is an identified model as shown in section 3 of Chapter 3. -31- Checking the Data that are generated: We generate the 800 observations in total, 2 observations from each subtest and thus 8 observations for each subject. The descriptive statistics for this data is in the following tables. Table 5.5 Descriptive Statistics of 4 Generated Outcomes The MEANS Procedure Variable N Mean Std Dev Minimu- Maxilun matht 200 500.7896272 10.7851935 472.3574900 536.1797700 math2 200 500.0342420 9.1171437 468.6287800 523.6194500 verb1 200 499.2721306 11.5485774 475.0847300 528.3763000 verb2 200 499.4860767 13.2175895 466.3821800 535.9121100 To see whether the generated data are reasonable with regards to the means and the standard devratrons, we let y”, a 7018f! , yzy- E- Jig-(,2, =1 . 3’3.)- 5 ymxw-l . and y“, s y'jl’w“ . Then srnce y”, =fl, +5.). )5.) =13),- +6‘,-,-. 3’3.)- =.33,- +31), and .741} :16” +3111 we have E(71g)=710 =5001 E(yzij)=720 =5003 E(y3ii)‘= 730 =5001 and E(yw) = no = 500 for the respective means, and s.d.(y,,,.) = W: W: We 11.18, s.d.(y,,.)= W: W: J15 5943, s.d.(y,,)=W=W=./iz_sslns,and s.d.(y,,j) = W = W = «[169 513.00 for each standard deviation. Comparing these population means and standard deviations with the corresponding sample means and the standard deviations in Table 5.4, we recognize that -82- the sample statistics of the means and the standard deviations are close to the populations values. be seen in Table 5.6 or as a correlation matrix in Table 5.7. For the second moment property, the sample variance and covariance matrix can Table 5.6 Sample Covariance Matrix of 4 Generated Outcomes Covariance Matrix, DF = 199 natht math2 verb1 verbz matn1 116.3203994 74.5273374 47.8142300 59.2826475 math2 74.5273374 83.122308? 29.7039157 41.4890456 verb1 47.8142300 29.7039157 133.3696391 128.1651858 verbz 59.2826475 41.4890456 128.1651658 174.7046711 Table 5.7 Sample Correlation Matrix of 4 Generated Outcomes Pearson Correlation Coefficients, N s 200 Prob > |r| under H0: Rho=0 nath1 natnz verbt verb2 math1 1.00000 0.75793 0.38388 0.41588 <.0001 <.0001 <.0001 math2 0.75793 1.00000 0.28212 0.34429 <.0001 <.0001 <.0001 verb1 0.38388 0.28212 1.00000 0.83983 <.0001 <.0001 <.0001 verb2 0.41586 0.34429 0.83963 1.00000 <.0001 <.0001 <.0001 The population covariance matrix can be represented as y. 14. +02 11,,” (11,, 1,12,, 125 80 50 60 D yr _ ’11W11 ’112W11‘1'0'2 4W1: ’h’lrWn = 80 89 40 48 , y. 1.1., 4,111,, (11,, +02 2,01,, 50 40 125 120 _ y4 _ ‘ 12le ’11’1'1W12 12sz 122sz +02 60 48 120 169 -33- and as a correlation matrix, FY11 l Comparing these expected values with the sample values, the generated data are again reasonable. Finally we check the distribution of each observed variable, y, , y2 , y3 , and y4 , using the Shapiro-Wilk test for normality. The results are in Table 5.8. Table 5.8 Shapiro-Wilk Test for Normality for the Generated Outcomes y2 0.758 p y, ‘ 0.400 0379 1 0.826 ' y, _ 0.413 0391 0.826 1 0.758 0.400 0.413 1 0379 0391 Outcome Shapiro-Wilk Statistic P-value (W) (Pr. < W) Math 1 0 . 9958 0 . 8432 Math 2 0 . 9936 0 . 5480 Verbal 1 0.9881 0.0918 Verbal 2 0 . 9919 0 . 3347 All of the P-values computed from the Shapiro-Wilk statistic are greater than .05. That implies that each outcome can be considered to have a normal distribution, which we expect because each outcome was generated from the sum of two independent normal variates. The above results all indicate that the data were generated accurately as we specified in the model. -34- fill-15.8 Now we move to the analysis. Here we formulate a series of identified models discussed in section 3 of Chapter 3, and fit those series of models that only differ in the level-2 variance-covariance structure to the generated data. Thus, all of the following four models, model 0, model 1, model 3, and model 4, have the same level-l model as written in Equation (3 -29). The difference comes in at the level-2 model error structure. The standard HLM2 expresses the level-2 model for this artificial data as, L-2: 161} = 710 1' “11 6 . =7 +u 2, 20 21 (5_24) fl3j = 730 + “3} flu = 740 '1' “4} where “U K 0 711 7,2 713 714 \ “27' TN 0 712 2'22 723 2’2, “31 O 713 1'23 1'33 73, “41 K O 714 724 2'3, 744 1 Or in matrix notation, 6j=Wj7+uj, u]. ~N(0,r) (5-25) V T Where16j=(flj1 18219 flip 164;)1szl417=(711 721 731 74)1and T uj=(u,j, u,,, u,,, “41): All of the models deal with factoring u j and they can be written as in the matrix form -35- u j = A17]. (5-26) in general. The covariance structure can be written as r = A‘I’AT. (527) Thus, structuring the 7 matrix and adding restrictions on it, all the models can be derived from the standard HLM2 model, which I call here “model 0”. That means that all of the following models are nested within model 0. Model 0: Model 0 can be specified by a standard HLM model, but if we use the HLMZF formulation, this model can be expressed by setting A as 4 x 4 identity matrix, i.e., T A =1, and by setting 77,. = (77,], 772,, 773,, 774,.) , and thus by setting ‘1’ as a 4 x 4 symmetric variance-covariance matrix. Model 1: Model 1 is the model from which we generated the data. Therefore, it is the model 1 that should best fit to the data. Model 1 is obtained by setting A = g 0 0 O l 9 12 T 77,. = (71),, 772,.) in Equation (5-26), and ‘1’ as a 2 x 2 variance-covariance matrix in Equation (5-27). -86- Note that model 1 is nested within model 0, which was already shown at section 3 of Chapter 5 when we analyzed the SAT meta-analysis of coaching effects data (see Equation (5-17).). Model 2: Since model 2 was an unidentified as shown in Section 3 of Chapter 3, it will be omitted from the analysis. Model 3: 0 . . . ,1 T Mode13 rs specified by settrng A: ‘ , 77,. =(77U, 772,.) and ‘I’ asa 31o}:— 1 11 2 x 2 variance-covariance matrix. Note that model 3 is nested within model 0, and then, model 1 is nested within model 3. The fact that model 3 is nested within model 0 can be shown in a similar way that I did for the SAT meta-analysis of coaching effect data in section 3 of Chapter 5 by comparing the structured r with the unstructured r. The fact that model 1 is nested within model 3 can be easily obtained by constraining A, = 0 and ,1, =0. -37- Model 4: Model 4 is specified by setting A = , 77, = (77,1) and ‘I’ as a 1x lvariance- ,1, 32 covariance matrix (scalar). Note that model 4 is nested within model 1 as shown in section 3 of Chapter 3 and thus it is the most restrictive model. To summarize, a series of 4 models that are formulated above has the following nested models structure, Model 4 c Model 1 c Model 3 c: Model 0 (‘ c ’ reads ‘nested within’). The model ntunber, its characteristics, and the number of parameters estimated in the 2' matrix is summarized in Table 5.9. Table 5.9. Characteristics of the Specified Modelsl Model Model Characteristics # parameters estimated in the Tau matrix 0 Saturated HLM2 model 4(5)/2 = 10 4 factors 3 2 factors cross loading 4 + 3 = 7 mis-specified model 1 2 factors simple loading 2 + 3 = 5 correct model 4 1 factor 3 + l = 4 mis-specified model Results: The results are shown in Table 5.10. The estimates of fixed effects are very similar across all the models including the standard errors and all the 95% confidence ' Note: Model 4 c Model 1 c Model 3 c Model 0. -33- intervals captures the true values, i.e., 7,0 = 720 = y}, = 7,0 = 500. The large number of iteration of Model 0 indicates that the level-2 variance-covariance is near singular. It was estimated by HLM2 as: 124.3 00 98.940 48.818 65.533 .. _ 98.940 81.295 40.195 49.139 T — 48.818 40.195 94.183 1 12.590 ’ 65533 49.139 112590 143.791 and as a correlation matrix, 1.000 0.984 0.451 0.490 0.984 1.000 0.459 0.454 0.451 0.459 1.000 0.967 ’ 0.490 0.454 0.967 1.000 P5: which supports near singularity argument. The results for Model 1 represent that the estimates capture all of the true parameters in the model within the range of 95% confidence intervals. -39- Table 5.10. Results of the Analysis for the Simulated Data by Four Models2 # iterations until convergence Model 0 1309 Model 1 4 Model 3 6 Model 4 5 Fixed effects estimates 499.225(1.164) 499.225(1.166) 499.225(1.013) 499.225(1.045) 2: 499.055(0.960) 499.055(0.963) 499.055(0.655) 499.055(0.693) 730 500.200(1.028) 500.2(1.021) 500.2(1.092) 500.2(0.964) 74o 500.910(1.246) 500.91(1.243) 500.91(1.273) 500.91(1.166) Factor Loading estimates ,1, 11.4. 0.80569(0.0407) 0.66261(0.0472) 0.79521(0.0676) ,1, 11.4. 1.24466(0.0562) 1.15753(0.0556) 11.4. A; 11.4. 11.4. 0.05610(0.0438) 1.15745(0.0695) ,1, 11.4. 11.4. -0.0903(0.0493) 11.4. ,1, 11.4. 11.4. 11.4. 0.61993(0.0745) Random effects estimates 02 23.609(1.669) 25.3519(1.462) 24.0167(1.365) 57.8552(3.091) 7,, 124.300(19.266) 11.4. 11.4. 11.4. 2.2, 96.940(14.967) 11.4. 11.4. .11.4. r,, 81 .295(13.193) 11.4. 11.4. 11.4. 7,, 48.616(12.965) 11.4. 11.4. 11.4. Tn 40.195(10.716) 11.4. 11.4. 11.4. 733 94.163(15.012) 11.4. 11.4. 11.4. 7,, 65.532(15.960) 11.4. 11.4. 11.4. ,4? 49.139(13.000) 11.4. 11.4. 11.4. 7,, 112.590(17.079) 11.4. 11.4. 11.4. 7,, 143.792(22.020) 11.4. 11.4. 11.4. v,” 11.4. 123.201 (14.283) 90.5223(11.071) 60.2666(12.432) W12 11.4. 50.4351(8.190) 46.2939(7.666) 11.4. W22 11.4. 91.5432(10.901) 107.166(12.766) 11.4. Log-likelihood 2704.02 -2707.09 -2706.93 -2660.67 # parameters estimated ‘5 1° ‘2 9 Deviance 5408. 032 5414.18 5413.86 5761 .74 2 Note: 1. Cutoff criterion for getting out of the Fisher scoring loop for HLM2F is ‘relative change in the squared length of parameter estimates’ < 10", which is the same value as HLM2, though HLM2 used changes in log-likelihood). 2. ( ) is the standard error 3. The saturated model was tried by HLM2F. but during the iteration. ‘1’ estimate became negative definite. 4. Model 2 was unidentified and thus was excluded from the analysis. -90- We focus on the model fit by comparing the deviance statistics. Since all the models are nested, we can perform the deviance test. Since the order of nesting of the models is Model 4 c Model 1 c Model 3 c Model 0, we perform the deviance test as in the table below. Table 5.11. Several Results for the Tests of the Model Fit Test Deviance d.f. P-value Model 3 against Model 0 5.83 4 p = 0.212 (5413.86 - 5408.03) (15-11) Model 1 against Model 0 6.15 6 p = 0.407 (5414.18 - 5408.03) (15 - 9) Model 4 against Model 0 353.71 7 p < 0.001 (5761.74 - 5408.03) (15 - 8) Model 1 against Model 3 0.32 2 p = 0.852 (5414.18 - 5413.86) (11 - 9) Model 4 against Model 1 347.56 1 p < 0.001 (5761.74 - 5414.18) (9 - 8) From the table, the deviance test identifies that model 1 is the most appropriate model among the 4 models, which is the model that generated the data. Finally, in order to evaluate how good the method of estimation shown in Chapter 4 is, 1000 data sets were generated from the model in Equation (3-29) on page 41 and in Equation (3-31) on page 42 and parameter values on page 81. They were analyzed by Model 1, the correct model (see page 43 and page 86). A 95 % confidence interval on 6, where 19 is a generic symbol for any one of the parameters in the model, was constructed for each data set by the form, 19 i 1.96s. 6(9) , where s. e.(é) is the estimated standard error for 61 , which was obtained either from the -91- method of generalized least squares for the fixed effects parameters (see Equation (4-17) on page 54) or from the diagonal element of the information matrix (see Equation (4-15) on page 53). The value of the multiplier of s.e.(é) was chosen as 1.96, which assumes that 61 is normally distributed; this assumption can be justified by a reasonable conjecture that since we have a relatively large number of repetitions (1000 times), the normal approximation was appropriate. Let p be the probability that the confidence interval covers the true parameter value. For each of the 1000 samples a confidence interval was constructed by the above method. We estimate p by the proportion of coverages among the 1000 repetitions, and denote this by 13. For example, for a fixed effect parameter 7,, , 945 times out of 1000 repetitions the interval captured the true parameter value (7,0 = 500) in the interval. Therefore, the estimated coverage probability of the interval, p , is 0.945. Similarly, the estimated coverage probability was computed for the other eight parameters in the model and is represented in the second column of Table 5.12. -92- Table 5.12. Estimated Coverage Probability and Its Confidence Interval for 1000 Simulations Parameter Estimated coverage probability A 95% confidence interval on p (15) (0.931, 0.959) 0.950 (0.936, 0.964) 0.942 (0.927, 0.957) 0.949 (0.935, 0.963) 1,, 0.938 (0.923, 0.953) 4,, 0.939 (0.924, 0.954) ([1,, H 0.945 (0.931, 0.959) 1,12, 0.946 (0.932, 0.960) 1,122 H 0.943 (0.929, 0.957) Based on the point estimate of the coverage probability ( p ), we can compute an .. ,_ ,. approximate 95% confidence interval on p by [3 i1.96,’&n—p—) , where n is the number of replications. Thus, in our case, n = 1000. Then, for example, an approximate 95% confidence interval on p for 7,0 is (0.945 1- 1.96 - 0.0072) 5 (0.945 i 0.014) = (0931,0959) . An approximate 95% confidence interval on p for the other eight parameters in the model was computed in a similar way and is shown in the third column of Table 5.12. -93- The average estimated coverage probability of the confidence interval is slightly less than 0.95, with each estimated coverage probability being close to this value. All of the 95% confidence intervals on p for every parameter captured the true value, 0.95. These results add credibility to the method developed in this dissertation estimating parameters and standard errors, when the cluster level (level-2) sample size is large enough and the parameter values are not at the boundary of the parameter space. -94- Chapter 6. Conclusions and Future Directions 6- 1 . Conclusions In this dissertation, a method that incorporates factor analysis into the level-2 variance-covariance matrix of the hierarchical linear model was discussed. This approach provides several contributions, methodologically and substantively. 6-1-1. Methodological contributions HLMZF avoids excess simplification of level-2 variance-covariance matrix when some pair(s) of random effects have high correlation. If we specify many regression coefficients as random using the standard HLM, with maximum likelihood (ml) estimated via the EM algorithm, the number of iterations to obtain the estimates gets too large. However, as long as parameter estimates remain in the interior of the parameter space, the increase in the log-likelihood is assured and we can expect convergence in a reasonable amount of time. Fisher scoring, when combined with EM, can accelerate convergence quite dramatically. When Fisher scoring fails to converge within the parameter space or it fails to show much improvement on each iteration, a possible cause should be that parameter estimates are close to the boundary of the parameter space. Within this boundary estimation problem, there are two possibilities. One is that some variance estimates are close to zero and the other is that some correlations are close to 1 or -1. The former problem can be easily solved by setting the corresponding random level-2 error to zero. One caution needs to be noted, however. Since setting one random level-2 error to zero -95- implies that all the covariances related to this random error as well as the variance be zero, we need to check whether we can drop the term by executing a multiparameter test such as a deviance test. Solving the latter problem is one of the main topics of this dissertation. That is, the problem of either failure of convergence or slow convergence, caused by correlations which are close to 1 or -1, was solved by using factor analysis at level-2. There is another methodological contribution that considering factor analysis in the level-2 variance-covariance matrix provides. That is, this methodology offers a natural framework for modeling. If we don’t have a strong theory or prior knowledge to make regression coefficients fixed, we want to make as many of these coefficients as possible be random so that the model is more general such that it allows the regression coefficients to vary among level-2 cluster units. However, if these random coefiicients are highly correlated, the HLM2 methodology fails under Fisher scoring or gives a very slow convergence under EM algorithm, even when variance estimates of each random coefficient differ from zero. In this situation, HLMZF is useful because it constrains covariances, whose estimates have come close to the boundary using factor analysis, while allowing variance to differ from zero. Factor analysis achieves this purpose by decomposing the original covariance matrix into a factor loading matrix and a smaller covariance matrix of factors. Bryk and Raudenbush (1992) provided a guideline of level-1 model building in their Chapter 9 (pp. 201-204). The key issue of the procedure that they discussed was whether we fix the level-l regression slope or make it random. In this regard, they -96- suggested using statistical evidence such as point estimates, univariate 12 tests, and multivariate deviance tests. Also they mentioned that low estimated reliabilities and the slow rate of convergence for the EM were useful indicators, and diagnostic themselves, for respecifying a random level-l coefficient as either fixed or nonrandonrly varying. Even after polishing the level-1 model using the above recommended treatment, the slow rate of convergence for the EM may still occur because of highly correlated random coefficients as we have seen in the examples in Chapter 5. Therefore, I propose a slight modification of the level-1 model building procedure. That is, 1) Make as many level-1 regression coefficients random as needed, based on evidence of non-zero variance in the coefficients; 2) If the rate of convergence is very slow, then consider using the HLM2F model. In terms of specifying factor structure, employ substantive theory of the field in question as well as the estimated correlation structure of f , if it is obtained. 3) Further, a more active use of the HLMZF model would be, when a researcher wants to test his/her theory regarding the simultaneous variability of the random effects among the level-2 units, to specify this theory by HLM2F regarding how random effects are related. This procedure solves a key dilemma that most of the analysts who use multilevel modeling encounter. That is, when analysts don’t have a strong theory or prior knowledge to make regression coefficients fixed, they want to make as many of these coefficients random as is possible, but often the data don’t have enough information to estimate all of the parameters. The procedure that analysts use to deal with this dilemma is to fix some -97- of the coefficients that are not their focus, even when they suspect that there is no reason that those coefficients shouldn’t vary among level-2 clusters (Bryk & Frank, 1991). HLM2F is a solution of this dilemma. 6-1-2. Substantive contributions Expanding modeling possibilities I would argue that the HLMZF methodology will contribute to educational research because this model certainly expands modeling possibilities, while maintaining natural continuity with the standard HLM2. This statement consists of three claims: recovery of f , improvement over current ad hoc procedures, and view of current ad hoc procedures from the HLM2F framework. Recovery of 2‘ Suppose a researcher wants to study many random effects simultaneously but he/she cannot obtain a stable estimate of 7 because of limitations of both the data and the modeling framework. Suppose, however, he/she can run the HLM2F model by reducing the dimensionality of r. In this situation, the researcher can recover the 2' estimate by constructing it from the estimates of A and ‘1‘ by f = A‘I’AT. This certainly gives the educational researchers an opportunity to study the large 7 -matrix from the limited data he/she has at hand if some of the random effects in fact go together. -98- Improvement over the current ad hoc procedure The second claim is that the HLMZF model represents an improvement over the ad hoc procedures currently practiced by educational researchers when they encounter the above situation, i.e., when they want to make many regression coefficients random, but HLM2 does not allow for this. Let’s consider the current practice of the researchers who are in this dilemma. To illustrate this claim specifically, recall the Equation (2-1-1), the level-1 model formulated in a hypothetical school effectiveness study. L-l: iid Y}, = 60,. +fljxw +6,sz, +63jx3y. +6,wa +6565”. +6666”. +80, 8,]. ~N(0,0’2) At level 2, the researcher formulated the model in which all the level-1 coefficients vary randomly among schools afier being accounted for by a certain school characteristic W]. because he/she thought that there is no reason to fix some of the coefficients. L-2: A); =700 +701Wj +u0j A)“ ==710 +711Wj +qu 182} =720 +721Wj +u2j #1} =730 +731Wj +u3j A} =740 +741Wj +u4j 165} =750 +751Wj +qu 166} =760 +761Wj +u6j where u j. = (uo, , u, ,. , uzj , u,,, u,,, u,, )T is assumed to be distributed as multivariate normal with a mean of 0 and a covariance of r , a 7 x 7 symmetric matrix. This is a fairly large -99- model that has 28 unique parameters in the r matrix. Such a matrix is generally estimable only if the number of level-l units is quite large. If the level-1 sample size is not large, the problem that the researcher faces is that the data are insufficient to estimate all the variances and covariances. Following the current practice, the researcher fixes the slopes that are not central to his/her research question, and let the slopes of interest be random; thereafter he/she writes the report. This practice is understandable, but we know that it is not the best way. It is a compromise forced by the data and the limitation of the model. However, suppose a second researcher with a different theoretical focus analyzes the same data with a different 1' specification, producing different results. There could be no way to evaluate the adequacy of the tWo summaries of evidence. Using the HLM2F approach, both researchers could estimate a full 7 by 7 2' and produce identical results. Alternative interpretations could then be evaluated in light of a common summary of evidence. My claim is that even with the same data, we can do a better job by improving the modeling practice, i.e., formulating the HLM2F model with some a priori substantive information or insight that some of the random slopes are highly correlated. Also, when we think about the consequences that the report can produce when used for decision making of educational policy, the impacts that the results induce can be substantial. -100- View of the current ad hoc procedure from the HLM2F framework The third claim goes to the fact that the current practice also can be considered as a special case of the HLM2F model. For example, suppose the researcher’s interest is on 60, and 6,. Then, the model the researcher would use is A); =700 +701Wj +qu A] =710 +711Wj +qu 1621 =720 +721Wj A} =730 +731Wj flu =740 +741Wj 165)- =750 +751Wj fl..- =rw+r..W,-- This model actually is equivalent to HLM2F model with (um) (I 0) “U u,, 114,- OOOOO qu Kuej/ KO 0) which is a case in which the researcher used two latent factors and assigned 1 to (1,1) and (2,2) elements, and 0’s to all the rest of the elements of A . Thus, the current practice is a part of the HLM2F model framework, where the researcher naively determined the number of factors as two and the values of factor loading matrix as 1’s and 0’s in the specified position as in the above. If we use the HLM2F model, we can allow the elements of A to be unknown, where the current practice fixed these at zero. Thus, the use of HLM2F not only can solve the dilemma that researchers currently face deciding -101- which slopes they should fix, but it also reminds the researchers of alternative modeling possibilities. Therefore, the HLM2F model not only expands the flexibility of modeling, but also can reflect the researcher’s substantive knowledge of the model. This, in turn, means that the HLM2F modeling requires more consideration of substantive theory because the researcher does have to say which element of the level-2 error u, ”go together.” Thus, in HLM2F, it is especially important to note that the decision of specifying a factor structure must be made based on interplay between substantive theories and statistical methods. Substantive integpretation of latent variables The second contribution of the HLM2F model to educational research from a substantive standpoint involves interpretation of the latent variables which are used at level-2 in HLM2F. HLM2F provides an opportunity to link level-2 random errors which share common latent variables. Those latent variables may be meaningful substantively and can be interpreted, as we have seen in the infant vocabulary growth example, where we found that one and only one latent variable was necessary to describe the differential growth pattern of vocabulary of young children. This result may refine the current existing theory on infant vocabulary growth or motivate the researchers to create a new substantive theory. -102- Practical integpretabilig of the results The third substantive contribution would have to do with interpretation of the results from a practical viewpoint. When we have many intercorrelated random effects, HLM2 results are hard to interpret practically, even in the case when a large Tau matrix is successfully estimated. It is especially difficult to interpret many covariance parameters simultaneously. Therefore we often interpret only the variance components, and may overlook important evidence of shared variance. HLM2F offers an opportunity for analysts to carefully look at the off-diagonal covariance terms to detect the shared variance and to explore latent variables which might have important meanings. If it’s possible to make a concise summary by reducing the dimensionality and by using latent variables, a picture of what kind of factors influence the variability among the level-2 units then emerges. HLM2F provides this possibility. Overall, the proposed HLM2F model creates new flexibility by integrating two major statistical methodologies, HLM and confirmatory factor analysis. In terms of how to use theory and evidence to formulate models, more work needs to be undertaken. 6-1-3. U_nsolved Methodological Problems Confirmation of global maximum When maximum likelihood is chosen as a method of estimation, there is always a possibility that the likelihood function has multiple maxirna, or a flat likelihood in the neighborhood of the maximum likelihood estimate. Thus, it is a good idea to graph the likelihood or the log-likelihood to detect this problem. However, this is difficult since we -103- have many parameters and we want to sketch the log-likelihood curve, which is a function of many parameters, while the standard graph is limited to three-dimensions. To address this issue, one might consider using the profile likelihood. This method is appealing because we can visually see the shape of the likelihood as the function of the target parameter by taking into account other estimates of parameters. The profile likelihood can be defined (see for example, Gorthwaite, Jolliffe, & Jones, 1995) as follows: Definition: Profile likelihood Consider a vector of parameters 19 = (61 , 192) , with likelihood function L(6l , 61,;y), where y is an observed vector which comes from a density fY (y; 19). Suppose that 03,, is the MLE of 62 for a given value of 61 , then the profile likelihood for 61 is L(6l , 612,;y). The method of computing the profile likelihood involves the following steps: 1) Fix a parameter a , in which you are interested, to some specific value, e.g., 61(0). 2) Obtain the MLE of (92 , conditional on the fixed value of the target parameter, 6;. 3) Compute the likelihood L(6, , 0, ; y) by plugging those values in the likelihood formula, i.e., obtain L09, = 6(‘°’,0, = é,,;y). 4) Change the value of the target parameter 61 and repeat the steps 1) ~ 3) for all possible 6). 5) Draw the profile likelihood curve against 61. -104- One technical difficulty arises when we consider the implementation of this methodology. That is, in step 2, we need to compute the MLE for a given value of 61. If 62 involves a variance component and we choose a bad value for 6 , Fisher scoring fails by providing a negative estimate. In our HLM2F model in Equation (3-9) (or Equation (A-l) in Appendix), 6 = (7, 05) ,where ¢ = (2., 1/1, 0'2 ) (see Equation (A-9)). Suppose, for example, 21 = (21,4, )T and (u = (1,11,, , 1,112, , W22 )7. Suppose also that we are interested in the behavior of 1//,, . Then, in this case, 6 = 01,, , and 62 = (77, ’1’, W211 ((122) . To compute the profile likelihood of 1,11,, , we compute the MLE of 62 given (11”. But if we fix W11 at an unlikely value, then Fisher scoring produces an unacceptable MLE of 62 , e. g., W22 becomes a negative value, etc. Thus, as long as we use Fisher scoring, we will face this problem in computing the profile likelihood. A solution for this might be to use the EM algorithm. The EM algorithm assures convergence to a local maximum and thus we can compute the profile likelihood at any point evaluated, at least, at the local MLE. This certainly motivates one to develop an EM algorithm for the HLM2F model, but whether it is the global maximum or not is still in question. A non-graphical way of checking global maximum was developed by Gan and Jiang (1999). Their claim is that a global maximizer would satisfy ac 662 “’ ’ which is consistent with the property that at the global MLE, an equality to derive the Cramer-Rao lower bound should be satisfied, -105- at 2 0‘”! E4126) +£43.79)”: where I is the log-likelihood function, 6 is an unknown parameter and 60 is the true 6. Based on the observation on the above approximate equality, they developed a large- sarnple test that is practically usable. Their Monte Carlo studies on distributions that involve local maxima, including a normal mixture distribution, showed that the observed significance level was close to a level and the power was very high in a sample size of 500 in a simple random sampling. Identification Necessary and sufficient conditions for model identification are still an unsolved question when we use the factor analytic model. There are no sufficient conditions known. In practice, what we can do now is, once a model satisfies the necessary conditions, try to run the software and see if the software can produce the estimates. Then we might try different starting values for the estimates of the parameters and see if the software can produce the same estimates. If it does, we can have a little confidence that the model is identified, though it is not mathematically proved. This practice is based on the concept of local identification, which means that the information matrix is invertible at the MLE. Wald (1950) provided an alternative sufficient condition for local identification using a Jacobian determinant which is known as Wald’s rank rule (Bollen, 1989). The covariance equations that need to be solved are non-linear simultaneous equations in -106- terms of parameters. A general way of solving non-linear equations is an iterative algorithm using the Jacobian of the simultaneous equations. Wald’s suggestion is that if the Jacobian has non-zero determinant at the value we plugged in to the equations, then the parameter is identified at that value. Since these days symbolic computation of a determinant is possible using a software package such as Mathematica (Wolfran (1991)), Wald’s suggestion could be extended to a statement of necessary and sufficient condition of model identification, such that if the Jacobian determinant is not zero at the arbitrary point in the parameter space, then the model is identified, and vice versa. 6-2. Future Directions The issues of confirmation of global maximum and model identification, which were mentioned in the section of unsolved problems, are more general statistical problems that apply to many other models than HLM2F which I developed in this dissertation. Clearly more studies on these topics are needed to solve these problems. In this section, I mainly discuss extensions of the HLM2F model. Since correlated variables are common in social and behavioral sciences, a variety of the extensions of HLM2F can be considered. Those include: 6. Including a specificity parameter; b. Multivariate hierarchical linear model with factor structure (MHLMF), where factor analysis is applied to the multivariate HLM model; c. Nonlinear hierarchical generalized linear model with factor structure (HGLMF), where factor analysis is applied to the two-level hierarchical generalized linear model -107- (HGLM). When the outcome variables are dichotomous responses to test items, it can be shown that HGLMF is equivalent to a multi-dimensional two-parameter (2-P) IRT model. This is a confirmatory item factor analysis. 6-2-1. Including smcificig parameters Including specificity parameters into HLM2F is straight forward and is useful. As I suggested in section 5.3, this model is immediately applicable to the SAT meta-analysis data. Mel: L-l: Y, =X,6, +r,, (6-2-1.1) where Y,isa n, x lvector, X,isa n, x R matrix, 6, isa Rxl vector, and r,isa 11, x1 vector, and r, ~ N (0,021,,1) (6-2-1.2) where 1", denotes an identity matrix of size 71,-. L-2: fl) = W + u,- (6-2-1.3) u, = A77, + 5, (6-2-1.4) where W,isa RxF matrix, yisa Fxl vector, u, isa Rxl vector, Aisa Rx M matrix, 77, is a M x 1 vector (R 2 M), and 5, is a R x1 diagonal vector, and u, ~ N(0, r), (6-2-l.5) -108- 17,- ~ N (0.‘I’). (6-2—1.6) 4'} ~ N(0.Q). (6-2-1.7) where Q = diag(a), ,w2,...,a)R). (6-2-1.8) Thus, we have 2' = A‘PAT + Q. (6-2-1.9) and the total variance D[Y,] E 22 , a R x R matrix, is decomposed into 2 = z'+o'2I,,l 6-2-1.10 =A‘1’AT +Q+azln,. ( ) In factor analysis terms, the first term in Equation (6-2-1.10) is called the communality (common factor variance), and the second term is called the specificity (part of a test’s true variance which is not shared with any other tests in a battery) (Thurston, 1947). Note that since we are formulating a factor model at level-2, there is no error variance left at level-2 variance-covariance 2'. That is, the true variance 7 is decomposed to communality and uniqueness by the level-2 model. In his factor analysis model, Thurston (1947, Chapter 3) actually considered the model in terms of variance decomposition as follows: -109- Total Variance = True Score Variance + Error Score Variance Communality + Specificity + Error Variance (6-2-1.11) True Score Variance Communality + Emcificiw + Error Varinace. Uniqueness Usually in factor analysis, the equation that is represented by the third decomposition in (6-2-1.11) is used. But by using the model represented by Equation (6-2-l.1) to (6-2- 1.10), we are able to further decompose the uniqueness into the specificity and the error variance. Note that if we constrain a), = 0 for all r, then it reduces to the proposed model in Chapter 3. In section 6-2-2, I will present the model without the specificity term, i.e., 5, =0 for all j, because in a technical sense, the model seems more useful in solving the nonconvergence problem when some of the correlation estimates are close to 1 or -1, although inclusion of the specificity term is straightforward. Then, in section 6-2-3, I will show that the 2-Parameter Item Response Theory (IRT) model can be formulated by a nonlinear version of the HLM2F model. 6-2-2. Multivariate HLM with a Factor Model (2-level Multivariate Model) General Formulation of the Model Suppose the complete data consists of P observations. For example, if a test consists of Pth subtests and the examinee responds to all the subtests, then we have P -110- outcomes for that subject. In a longitudinal study, if Pth waves of observations are planned, and if we have succeeded to follow up, then we have complete P waves of observations. Often in these settings, we have missing observations. However, if we can assume that missing observations occurred at random (MAR), we can use all the information we have gotten to estimate the complete data model without bias. To develop a general multivariate model, we utilize the idea of Jennrich and Schluchter (1986) and Thum (1997). That is, we formulate a level-l model by two steps. The first step of the level-1 model links observed outcome and complete outcome, which is L-l: Y, = Z M r‘ (6-2-2.1) where Y, is a scalar and ith observation of jth level-2 unit, I; is a scalar and pth observation of level-l unit, and M ,p, is an indicator and is 1 if ith observation is the observation of pth level-1 unit, 0 if otherwise. Thus the matrix M ,.p, indicates which observation of complete data we have got, or it tells us the missing pattern of the observations. In matrix notation, we can write Y, = M ,Y, , (6—2-2.2) where Y, is a p, x 1 vector of observed data outcome, M, is a p, x P matrix of rrrissing pattern, and Y,’ is a P x 1 vector of complete data observation of the outcome. Then, we formulate the second part of the level-1 model, which is the standard formulation of the -111- level-1 model in HLM. Thus, we formulate the standard model for the complete data, not for the observed data. Y,’ = X,’6, + r,‘, r,‘ ~ N(0, 2), (6-2-2.3) where )3 is the P x P variance-covariance matrix for r,’ , i.e., E = D[r,°] = D[Y,°| 6,]. Plugging Equation (6-2-2.3) into Equation (6-2-2.2), we obtain the level-1 model by letting X, = M,X, and r, = M,r,°. L-l: Y, = X,6, +r,, r, ~ N(0,Z,) (6-2-2.4) where Z , = M, M} , a p, x p, symmetric matrix. Notice that now we have the subscript j in )3 that was induced by a missing indicator matrix M, and that 22 , is actually a submatrix of 2 , a P x P symmetric matrix. Thus, what multiplying M, and M,.T from the left and the right does is to pick up the subset of the 2 matrix. The level-2 model is a standard formulation. L-2: fl) = W,-7 + u). (6-2-2.5) Now we consider the case when u, = A77, , then we have the level-2 model 6,. = W,7 + A77,, 77,. ~ N(0,‘l’). (6-2-2.6) If we put Equation (6-2-2.2), (6-2-2.3), and (6-2-2.6) together, we obtain the combined model. -112- Y, = M,(X,°.W,7+X,'.Ar7, +r,). (6-2-2.7) We write this general two-level multivariate linear model, which has a factor structure at level-2, with a slightly more general form: Y, = M,(A,,6l +A2,A77, +r,) = M,A,,6, + M,A2,A77, + Mr. 11’ (6-2-2.8) where r, ~ N(0,Z), 77, ~ N(0,‘I’).Ifwe let X, = M,A A, = M,A2,,and e, = Mr. I j ’ J J ’ then we have a standard form of HLM with level-2 factor structure as in Equation (3-9), Y, = X,6( + A,A77, + e, , (6-2-2.9) where 6, ~ N(O,Z,)for X,isa p, x p,symmetric matrixand 2,. = M,2M,T. Heterogeneity of variance (Note the suffix for 2 ) appears because of varying missing pattern matrices M, , and 2, is a subset of 2 . Now we show two examples to illustrate how to apply the above general formulation to the specific cases. Example 1. Growth Model Suppose that the same math test was administered for n elementary school students from 1st grade to 5th grade, but some of the students did not take the test at some occasions of testing. We wish, for example, to know how math proficiency of the students grows over time and what the variation of the proficiency among the students is. Then, in a general formulation of the multivariate model, Y,’ , complete data for student j, -ll3- 1s Y,°= (Y' , Y;,,Y 3,, 13,, 1,5171 a 5 x 1 vector in Equation (6-2-2.2), and the number of occasions P = 5. To be specific, if student 1 took the test at all of the grades, then we have ()m Y Y31= Y.) 112,) If student 2 missed the third occasion, then we have 2 2 Y1 Y2 Y3: Y, OOOH 2 fl 0 0 0 \0 COO—‘0 co—1oo OOHO COCO OF-‘OOO o—oo O) (1") ll HOOO Y4] Y3}. 1Y5) (6-2—2. 1 0) (6-2-2.1 1) To formulate the complete data level-1 model in Equation (6-2-2.3), suppose we decide to use a quadratic function for modeling mean growth for studentj,.and the grade was coded as (1st grade, 2nd grade, 3rd grade, 4th grade, 5th grade) = (-2, -1, 0, 1, 2), which centers the age variable around 3rd grade. Then, the parameter 6, is a 3 x 1 vector 6, = (60,,6,,6,,)Tand the design matrix X, is (1 -2 41 l —1 1 1 1 l \l 2 4) -114- X, = 1 0 0 for all students. Thus, the whole level-1 model for complete data is (11;) f1 —2 4) 0;.) "1 f0) J 7“,, ’21 1 “1 1 flu r2.) r2.) 0 Y3} = 1 0 0 6, + r.) . 6’,- ~N( 0 ,2) (6-2-2.12) ’71 1 l 1 .32; r,°, r;, 0 1Y5) \1 2 4) vs); vs); KO} where 2: 0'3, 0'3, 0'3 0'3, 0'3, . C741 0'42 043 0'4 0'45 (0'5, 0'52 05, 0'54 052 1 This is an unconstrained (or saturated) level-l covariance. Modeling this covariance structure using a smaller number of parameters by giving restrictions such as homogeneous case, heterogeneous but independent case, a first order autoregressive model (AR(1)) case, factor analysis (exploratory model) case, and so forth, were mentioned in Jennrich and Schluchter (1986) and Thum (1997). For the level-2 model in Equation (6-2-2.5), suppose that we found a high negative correlation between residual intercept uo, and the residual slope 11,, , but no correlation between uo, and the residual rate of acceleration uz, , and between u, , and u,, , even after we included all of the necessary predictors that could explain individual differences such as gender, race, socio-economic status (SES), then we might consider the model in Equation (6-2-2.6). That is, it might be useful to think about uo, 1 0 77 ’7 u,. = 4 0 [ °’),[ W] ~ N(0,‘1’) (6-2-2.13) J 0 1 u,, 0 1 ”U -115- W11 W12) where ‘I’ =[ W21 W22 This factor model structures the 7 as T 700 701 702 1 0 1 0 W11 1W” 0 W11 W12 2 7,0 7,, 7,2 = 2. 0 W W /1. 0 = lit/1,, 2.171,, 0 (6-2-2.l4) 1,, 1,, 7,, 0 1 " ’2 0 1 0 0 11,, and reduces the number of parameters in r from 6 to 3. Example 2. Multivariate Outcome Growth Model Suppose that students in primary school took 2 tests such as Reading and Mathematics each year from 1st grade to 3rd grade. In this type of study, it is natural that we have missing observations for either or both tests at some time points for some students. We assume that these missing patterns are missing at random (MAR). In this setting, the complete data for each student is Y,’ = (Y'. Y' Y. Y. Y' Y' )T where If, is 11’ 2j’ 3j’ 4j’ Sj’ 6} the reading score for student j at grade 1, Y2, is the mathematics score for student j at grade 1, Y3, is the reading score for student j at grade 2, If, is the mathematics score for student j at grade 2, If, is the reading score for student j at grade 3, Y6‘, is the mathematics score for student j at grade 3. To formulate the complete data level-1 model in Equation (6-2-2.3), suppose we decide to use linear function for the mean growth for student j and the grade was coded as (lst grade, 2nd grade, 3rd grade) = (-l , 0, l), which centers the age variable around 2nd grade. Then, the parameter 6, is a 4 x 1 vector ,3, = (60, , 6, , 62,, 6,, )T , where 6,, is the mean reading intercept for student j at grade 2, -ll6- 6, is the mean reading slope for student j , 6,, is the mean mathematics intercept for student j at grade 2, 6,, is the mean mathematics slope for student j. The design matrix X, in Equation (6.2.2.3) is (1 —1 0 0) 0 0 l —l , 1 0 0 0 X, = 0 0 1 0 for all students. Thus, the whole level-1 model for complete data 1 l 0 0 K0 0 l l) is (11,) (1 -1 0 0) 0,3.) 0,3.) to) 2,. 0 0 1 “16301) r2, r2, 0 Y3, 1 0 0 0 6, r3, r3", 0 = + , , , ~N ,2 6-2-2.15) ,, 0 0 1 0 6,,- r,, r,, (0 ) ( Y5.) 1 1 0 0 K1311} ’5‘} ’3‘} O \61/ ‘0 O 1 1’ V6.1} \r6.j) ‘0) (0'1 012 0'13 0'14 0'15 0'19 021 of (In CB4 025 026 where 2: 0'31 0'32 0'32 0'34 0'35 036 0'51 0'52 0'53 0'54 0'5 056 2 )061 062 063 064 065 0'6 / Thus 2 matrix, i.e., within-student variance-covariance, is 6 x 6 in this case. In addition to the ways of structuring the covariance matrix mentioned in example 1., we can think about other patterns here because there is a content area factor, i.e., reading vs -ll7- A 0 0 mathematics, as well as a time factor within students. For example, 2 = 0 A 0 , 0 0 A 5:2 42 52, 522] and 015 32 x 2 null matrix, represents a block homogeneous where A = [ variance-covariance pattern which corresponds to 2 = 021, a homogeneous independent variance-covariance for example 1. The within-subject variance-covariance pattern 22 = c: :3 D» ab: 0 0 implies that reading and mathematics scores are correlated at each A time point within students as in A , and the within-student variance covariance is constant over three time points, but has no correlation with other time points. Now for the level-2 model, suppose that we found a high positive correlation between the residual reading intercept uoj and the residual mathematics intercept uzj , and between the residual reading slope u”. and the residual mathematics slope 113, , but no correlation between uoj and ul j , between uoj. and u” , between u1 j and uzj , and between uzj and "31 even after being accounted for by all of the necessary predictors such as gender, race, socio-economic status (SES) for example, that could explain individual differences. Then we might consider the model in Equation (6-2-2.6). That is, it might be useful to think about fuel) 1 O u. 0 1 . . ‘1 = [0”), (770,) ~ N(0,‘P) (6-2-2.16) “21 A1 0 7711 ”I; W31) 0 32 -118- W11 W12) where ‘P =[ W21 W22 This factor model structures the r as T o 1 o 12 0290*— 1 0 2'10 2'” 1,2 713 = 0 1(V’n W12] 1'20 715 722 723 ’11 0 W21 W22 0 32 W11 ./’12 31%| ’12le _ W21 W22 Ail/21 Az'l’zz _ 11W” 11%2 312W” 3132an lei/’21 ’12sz Aszr 122sz (6-2-2. 17) and reduces the number of parameters in r from 10 to 5. -119- 6-2-3. Application to Item Resmnse Theory Model The purpose of presenting an alternative formulation for a unidimensional 2- parameter Item Response Theory (IRT) model here is that, since the reformulation makes the IRT model a special case of nonlinear mixed model with factor structure, it gives us an opportunity to examine the model and the theory, from a standard statistical point of view. Also, the reformulation gives us a model that is easier to extend to a multidimensional IRT model and to multiple levels of nesting. We will first give the perspective of IRT as a nonlinear factor analysis. What we mean by “nonlinear” here is a nonlinear transformation of the expected value of the outcome is modeled by a linear combination of parameters and independent variables. From this view, we will show that if we adopt the IRT as a nonlinear factor analysis perspective, it is straightforward to extend the usual unidimensional IRT model to a multidimensional IRT model. Though the current IRT model, whether uni-dimensional or multi-dimensional, is used in the exploratory factor analysis mode, it will be shown that a confirmatory mode non-linear factor analysis can be executed without much difficulty, if we formulate the IRT model as a hierarchical generalized linear model (HGLM) (See Chapter 5 & 6, Bryk, Raudenbush, & Congdon (1996)). Usually a test consists of many items, and the items are supposed to measure the student’s proficiency. Then we want to know the current status of the proficiency of the examinee and how his/her proficiency grew over time if the longitudinal data are available. The key for the modeling here is to represent the IRT model as the HGLM level-1 model for a nonlinear factor analysis, and if we use dummy variables, the same ~120- model can be represented by HGLM with level-2 factor structure. Suppose that a test that consists of n dichotomous items is administered for J students, whose response Ya is scored 1 if correct, 0 if not, and suppose that J students are a random sample from the student population. The distribution of random variable Y”. conditional on the probability of correct response of student j for item 1' (denoted as pg. ) is Bernoulli with mean pg. (0 S pg. S 1) i.e., Pr(Y,J. = 1| p0.) = p”. . If we write this data generation process in a regression form, then the sampling part of the level-1 model (the unit is item) is: L-l Sampling Model: X,- = p,- + 3,, (6-2-3.l) where 5,). ~ Ber(0, py.(l — p0. )) and the 8,]. ’s are independent from each other. Note that the conditional mean and the conditional variance of the outcome variable I; are E(Kjlpy) =p,-,-, and Var(1£j|pg)=py(1-pg)- ‘ (6-2-3-2) Now, for the model for the mean p”. , since the range of p”. is restricted between 0 and 1, and in order for the transformed variable to have the range theoretically -oo to +00, we make a logit transformation, i.e., Pi} 1-p 77"} = log( ) =-—. 10g it(py) a (6'2‘3-3) y and then we assume that 17,]. has a normal distribution. For notational convenience, we write the logit inverse transformation as 1 1+ exp(—n,.j)' 1),-,- E logit"(r7,-,~) , where p, = (6-2-3.4) ~121- Now, the level-1 structural model for the 2-parameter IRT model for item 1' of student j would be: L-l Structural Model: n. = /1,-(0,- - 6.). (6-2—35) Combining the level-1 sampling and structural models, we obtain the Level-l model as: L-l Model: Y, = logir'[x,(oj - (2)] + 5,, (6-2-3.6) where 8,]- ~ Ber(0,p,.j(1— pg. )) for O S pg. 3 l. (6-2-3.7) and A, is the item discrimination parameter for item i, 01 is person j’s ability (proficiency), 6,. is the ith item’s difficulty parameter. The scaling constant D 5 1.7 that makes the logistic model comparable to the normal threshold model is omitted from the model representation without loss of generality. Note that if we consider all the 2., ’s to be the same for all the items, then it will be a Rasch model. AN OVA gge formulation of 2-P IRT using HGLM We now show that the inside of the inverse logit function of Equation (6-2-3.6) can be written in the same format as the linear factor analysis model. Specifically, let’s consider that we have only three items in the test. Then the level-1 structural model of the 2-parameter IRT model can be written as -122- L-l Structural Model: Itemlznlj =X1(6j —c5;) Item 2: 772]. = 12(0). —62), (6-2-3.8) Item 3: 7}”. = 23(6]. — 63) where 6]. ~ N (0,1) . The variance of 6}. was fixed to 1 in order to identify all of the item discrimination parameters. In matrix format, Equation (6-2-3.8) can be written as ’7” ”M 31 fl. ,1, '72.- = 4,5, + ,1, (0,)= y, + A. (9,), (6-2-3.9) ’13.; ”/1353 13 #3 33 where ,u, E -/1,.5,. for all i. (6-2-3.10) Or, we can write it in matrix form as 77]. =,u+A6j. (6-2-3.11) Then, at level-2 whose unit is student, we have, 6]. = fl” + uj, (6-2-3.12) where u J. ~ N (0,1). Note that the Var(uj) = 1 corresponds to Var(61.) = 1. Thus the logit-transformed mean vector 77]. conditional on 6]. has the same structure as the conditional mean of the standard one-population factor analysis model, i.e., E(YI 0) = ,u + A19. Remember that the key assumption of the linear factor analysis is that given the factor score (latent ability), the Y’s (outcome variables) are independent. In the IRT case, the local independence assumption (that, given 91. , the observations Yr ’5 are independent) corresponds to the conditional independence assumption for the linear -123- factor analysis model. The difference between the linear factor analysis model and IRT model is whether the link function is identity or logit. Thus, we can view the IRT model as a non-linear factor analysis model. If we do so, then the item difficulty parameter is the mean of the logit, and the item discrimination parameters are contained in the factor loadings matrix A . Regression ape formulation of 2-P IRT using HGLM Rasch (l-P IRT) model Though the above formulation is useful for recognizing that the IRT model involves a factor structure, it does not facilitate finding the estimation procedure similar to the one which is executed by the current HGLM. In order to do that, we reformulate the above IRT model by a hierarchical model with dummy variables, which was introduced by Kamata (1998) and Cheong & Raudenbush (1999), who applied this model to the 1-parameter (l -P) IRT (Rasch) model. In the following presentation, since the level-1 sampling model is always the same, i.e., Bernoulli distribution, 1 will omit it. To facilitate the idea of representing a 2-parameter (2-P) IRT model by HGLM, we use no-intercept model first. As before, we have n items and J students. For item i and student j, L-l Structural Model: 77,]. =A1Xw +fl2jX20. +AJX3U+---+flquqy+-~+,B,,X,,,.j = Zflinqij q=l (6-2-3.13) -124- where X W. is the indicator variable and is 1 if q = i and 0 if q a: i for all i. At level-2, all the coefficients randomly vary among students, but with the same ul j. L-2: fl,- = 710 + “u :62} = 720 + ulj (6-2-3.14) flqj : qu + ulj flnj = 710 + “119 where u“. ~ N (0, Too). Or, in short, ,6,” = 7,10 + u”. for q = 1,...,n. (6-2-3.15) Notice that ul 1. is the same for all items and reflects the idea of unidirnensionality of the IRT model, that all the items in a test are supposed to measure a single construct. Notice also that in the Rasch model it is not necessary to fix r00 = 1, though in IRT it is customary to do so. On the other hand, fixing some parameters is necessary for the 2-P IRT model in order for the model to be identified. If we put the L-l and L2 models together, we get the combined model. Combined Model: '7.) = 27.0%.», MHZ/12,. (6-2-3.16) q-I q=l But since 2 X W. E 1 by definition of indicator variables, it reduces to (1:1 77,-,- = Z quXq, + u” . (6-2-3.17) (Pl -125- I now show that the model in Equation (6-2—3.17) is equivalent to the model formulated by Kamata (1998), and Cheong & Raudenbush (1999), that used an intercept model in order to fit the Rasch model to the existing HGLM setup. That is, at level-1, we let one item, for example, the first item, be the reference item, and create (n-l) dummy variables to represent differential item effects on response probability of student j. L-l structural Model: ’7"; = A); +£1ij +13sz3;;+°"+/3q,-Xm+”'+fln-me-j = Ian + Zflqulij q-l (6-2-3.18) where X W. is the indicator variable and is 1 if q = iand 0 if q ¢ i for all i. At level-2, only intercept floj varies. L-2: flu} =7oo+u0) A} = 710 a, = 720 (6-2-3.l9) flqj = rqO Air-l” = Yup—no where uoj ~ N (0, 100) . The combined model can be written as, Combined Model: 27,, = 7,, + Z yqoxq, + u,,. (6-2—3.20) (Pl -126- If we let 6,. s -{700 + 2 yqu q,1} , 6}. E uoj , then the combined model is exactly ([21 the same as the l-P IRT Rasch model except that the constant D s. 1.7 , which adjusts it to the normal ogive model, was omitted. In IRT model terminology, 6,. represents ith item difficulty, 6]. is jth student ability. To interpret the combined model, we consider item 1 and item 2. For item 1, 7}”. = 700 +u0j, and for item 2, 17,]. = 700 +710 +uoj. Thus, the interpretation in words can be that the logit correct response probability of student j for item 1 is student’s ability uoj adjusted by the item difficulty 700 (item easiness may reflect the exact meaning of the parameter) for item 1. And the logit correct response probability of student j for item 2 is student’s ability uoj adjusted by the item difficulty (700 + 7,0) for item 2. 7,0 reflects the difference in terms of the item difficulty of item 2 compared to item 1. If item 2 is easier than item 1, then 7,0 > 0 and thus p2,.2_plj , where p” and p21. are the correct response probabilities for item 1 and item 2 of student j. It should be noted that the l-P IRT model can be considered as a non-linear two- way AN OVA additive model with the factors, item difficulty and student ability. 2-P IRT model Since using all n indicator variables for items without including the intercept is more natural, I’m going to adopt a no-intercept hierarchical model formulation to represent the following 2-P IRT model. The 2-P IRT model addsanother parameter called item discrimination on each item. Item discrimination can be interpreted as a kind of -127- item’s sharpness in distinguishing two students who have different abilities by the item. This concept suggests the interaction between item characteristics and student ability. That is, the effect of student ability on item response depends on an item characteristic which we call item discrimination even afier controlling item difficulty. In other words, some items are sharper than others in reflecting student’s ability. By this conceptualization, we formulate the level-1 structural model as L-l: 77y = AJXUJ +1321X2ij+m+ q/Xw+'”+flannr " (6-2-3.21) = Z 'qu Xqij ’ qal or, in matrix notation, 17,,- = X374, (6-2-3.22) where X; = (X10, ng, ,Xqij, ,ij). This notation is actually simpleth it seems: the ith element is 1 and the rest of them are 0, i.e., X; = (0, O, ,1, ,0). Note that this is the same as the level-1 model ofthe Rasch model. But the level-2 model is different. L-2: A] =710 ““11"” fir} =720 +22“; (6-2-3.23) flq] = 7qo + ’1qu flnj = 7110 +173“) -128- where v”. ~ N (O, 1). It should be noted that the variance of v”. is now set to 1 for model identification. Notice the similarity and difference between the 2-P IRT model and the Rasch model in Equation (6-2-3.14). The same v”. reflects that all items are measuring one kind of ability and the different coefficient 2 implies that the sensitivity of each item to reflect ability v”. is different; in other words, itq is the interaction effect between the membership of rtem q (X) and person j ’s ability ( v”. ). In matrix notation, ‘7’} ,5}. =7 +Avj , vj ~ N(O, 1) (6-2-3.24) Whereflj =(AJ’IBU’W ”Bin/0T 7=(710,720,...,y"0)r,A=(2.l, ’12, ”'a 1,.)7, 4.1,). If we combine the L-l and L-2 models together, we get 77'} = {2M1} qu} + {ZIA'X q qy' }vlj a ‘ (6'2'3 .25) «PI and ifwe let ,u, a 274,-qu , )1, a Z}. X and 6.- = v0j then the above equation can be q 0'} ’ q-l qul written as 77,-,- = A1.- + 1.63, (6-2-326) where ,u, is the item intercept for item i, A, is the item discrimination for item i, and 0] is the ability of student j. Using the item intercept as an item parameter is a way of representing the 2-P IRT model (See, for example, Bock & Aitkin, 1981), but if we impose a constraint on ,u, such as p, = -/1,.c5} for all i, then the model becomes = 249,. — 5,.) (6-2-3.27) -129- and this is the standard 2-P IRT model. We now write the models for all n items together. At level-l we have L-l: ,7]. = A13]. (6-2-3.28) where 77]. =(mj,...,n,,j)r, A]. = E ,and ,6]. =(Aj,...,,6nj)r.Note that Alisthe nx n identity matrix, i.e., A j = In if student i responded to all of the items, which is a case of balanced design. The level-2 model is written as Equation (6-2-3.24), and thus the combined model is, Combined Model: 77]. = A17 + AjAvj , vj ~ N(O,‘P) (6-2-3.29) where 77]. is the n x 1 outcome vector, y is the n x 1 item intercept vector, A] is the n x n design matrix, A is the n x 1 item discrimination vector (later, it will be n x M matrix if we use M multidimensional IRT model), vj is the l x lscalar (later we extend it to a M x 1 vector), and ‘P is a scalar variance of latent ability of student j (later, a M x M covariance matrix). For the above 3 items case, Equation (6-2-3.29) becomes ”I; 1 0 O 700 l 0 0 I11 77,]. = o 1 o y,, + o 1 o 12(vlj). (6-2-3.30) 0 0 1 773 j 7 20 0 0 1 ’13 Note that the Rasch Model is a special case of the 2-P IRT model in that it lets A = 1 , a vector of all 1’s in Equation (6-2-3.29) or (6-2-3.30). Note that though the design matrix ~130- A 1 seems trivial (A j = I" for all i, which is true if all the students take the same set of items and if there is no missing observation), it can be different from student to student. Now, notice the close similarity of Equation (6-2-3.29) to Equation (3-9), which is the HLM2F model developed in this dissertation. In fact, by letting u j = Av j and denoting D[uj] a r in Equation (6-2-3.29), we have I = A‘I’AT . Thus, by formulating the IRT model by HGLM with a factor structure, we recognize that the standard 2-parameter IRT model is a non-linear version of the model Equation (3 -9), which I call HGLMZF. This is the reason why I say the IRT model is a non-linear version of HLM2F. Instead of standardizing woo = 1, which is the standard procedure of IRT programs, we can set 20 = 1 for model identification. Then, we can directly incorporate the algorithm of HLM2F into the micro iteration of HGLM, which consists of a doubly iterative procedure. Multi-dimensional IRT model The nice thing about this formulation is that we can easily expand it to a multidimensional model. That is, if we think that the test is measuring multiple abilities, then we change Equation (6-2-3.9) to 7M #1 111112 (a!) 7721 = #2 + 121122 92' (6-2-3.31) ’73} #3 131 ’132 J and Equation (6-2-3.12) to 31 __ A») (“01] [gzjj—[Ao + u”. , (6-2-3.32) -l31- W38 SCI t0 1' “0} O 1 0 . . where u, ~ N2( 0 ,I2 = O 1 ). Note that the covanance matrix of J the identity matrix so that the model can be identified except for the rotational indeterminacy. For a HGLM formulation with dummy variables, we just need to change A by (31‘ {’111 312‘ 112 121 122 . E E 5 v1, . addrn another column from A = to A = , and v. from v . to [ )m g ’14 ’14! 1'42 j ( U) v2} KAN) Kl,” 1M2) Equation (6-2-3.29). By this formulation, we can perform the unidimensionality test by formulating the null hypothesis H091,2 = O for all i. Further, if we model the level-2 by some individual characteristics, then we can perform a statistical test for Differential Item Functioning (DIF) (Hambleton, Swaminathan, & Rogers, 1991). For the longitudinal data, we can extend the IRT model to a latent ability growth model based on this formulation. In that case, the scaling issue should be carefully addressed in order for the latent ability growth to be meaningful. 6-2-4. Other Possibilities Incorporating the factor model into a three level model is a natural extension of the two level model. In the three level case, there are two possibilities of using the factor -132- analysis model; one is in level-2 covariance matrix 1',r , and another is in level-3 covariance matrix 11,. Incorporating the factor model into a three level multivariate hierarchical model is also straightforward for conceptualization. As shown in the previous IRT application section, applying the factor analysis model to hierarchical generalized linear model (HGLM), which has a non-linear outcome, seems to be promising, at least to measurement models in psychometrics. ~133- Appendix. Derivation and Algorithm The purpose of this appendix is to derive the Maximum Likelihood Estimators (MLE) via Fisher scoring algorithm and to give the detailed steps of algorithm for computation. M: Subscript Model (Model for each group) Y}. = Aug + Aszryj +rj , (i =1,2,...,J) (A-l) where rj ~ N(0,0'21,,j), 71,- ~ N (031’). and rjand njare independent forallj. ins n]. x 1, AUis nj x F, 19, is Fx 1, 77} isthe Mxl factorscore vector, rj isa nj x1; Azjisa nj x R, Aist M, Injisthe n j x n 1 identity matrix, and ‘I’ is the positive definite M x M symmetric matrix. By writing ej = Asznj + rj (A-2) in Equation (A-l), we see the model (A-l) as general linear model, 1’]. =A,j.0,+ej. (A-3) The variance-covariance matrix is V]. a Var(ej) = Asz‘I’ATAzrj + 021,,‘ . (A4) -134- The log-likelihood: The Log-Likelihood I is 1 = — %[N log(27r) + (N - JM)log(0'2 ) + J logl ‘I’I-Z logl cf] + 2(efe, — 4,1,, -AC;‘AT «Le, )/ 02],(A-5) where C;' = (AT/1,3 Asz + 02‘1“)" that will be explained in Equation (A-16), ejTej is given by Equation (A-27), and Azrjej is given by Equation (A-28), to be shown later. Computation: In the following, the subscript for the A2 will be omitted for simplicity of the formula and will be written as A , and all the subscript j will be omitted. If it is necessary for clarity, we will explicitly use the subscript. Review of Standard 2-level HLM Elements required for standard 2 level MLF, Fisher Scoring Algorithm are: dvec(z') do'2 32 log L02, 2302;3’) - — d¢T ,F=W,andforH=E[ 6¢T5¢ ],ATV'A,ATV2A, ”(V-2), and eTV'ze since 1 many _. -. o‘vec(V) H:--[— V ®V —— 2 5W ‘ ) aw (A-6) = —-12-[E’(ATV"A® A’V"A)E + ETvec(ATV‘2A)F+ FTvec(ATV’2A)E+ tr(V'2)FTF]; 6”] L , , 2; . . for S = 0g (:61 0' y), E, F, ATV"e, tr(V"), and eTV’Ze are required srnce -135- S = QMYW“ e V" )vec(eeT - V) 2 5¢ 1 (A-7) = ~2-[Ervec(ATV"eeTV"A — ATV"A) + FT{eTV'Ze — tr(V'l )}]. Now for 2 level MLF factor model, we need the same things. But there are some differences, because in this case, 2' = A‘I’AT (A-8) 2' is R x R and is not full and matrix which is not invertible A is R x M, and ‘I’ is a M x M positive defrnite symmetric matrix. (I) Computation 01E and F Definition 1. Definition of ¢ vector ¢ =({’11j}a{‘/’M}’02)T A-9) =(AT,WT,0'2)T, ( is a Q x 1 vector of unknown unique parameters in the variance covariance matrix V]. for all j, and it is composed of three elements, a Ql x lvector A. , a Q2 x lvector l/I , a scalar 0'2. (a) A QI x lvector ,1 consists of Ql unknown elements in the R x M factor loading matrix A (Ql 5 RM). Thus, some of the elements (Q, ) are unknown and to be estimated from the data, and the rest of the elements (RM — Q) are known. To construct A , we align the unknown elements in the order of starting from (1 ,1) element of A and going down the column, if we find the unknown element, then put it in the A vector. Then, ~136- move on to next element in the same column of A . Once we finish searching in the first column, we move to the second column, and so on to the last Mth column of A . Exam 1e Ex 1. O O For example, if A = A” 1 then 11 = (A?!) . o a. o 2.. Ex 2. 1 0 ’121 A IfA= ’1“ A” ,then A: 4'. 0 l [in [141 1'42 2'42 ,1,, ,1,, . . 12. 112‘: Note that rn exploratory factor analysrs model, we let A = 32 A, , all the l 2 2'41 1‘42 elements in the A (factor loading matrix) are unknown. But, we are here thinking about confirmatory factor analysis in which we specify some of the elements as fixed number. In the first example, we fixed 2“ =1,,1,Jl = 0,24, = 0,2,2, 2 0,222 = 0,132 =1, and in the second example, we fixed as ’11, = 1, A.“ = 0, , ,1,, = O, 1.32 =1 . Thus, in our model, some of the elements in A are known and some of them are unknown. The number of known elements is RM — QI and the number of unknown elements is Ql . (b) A Q2 x 1 vector (11 consists of Q2 elements, where Q2 be the number of unique elements in M x M variance-covariance matrix ‘I’ , and since ‘1’ is always symmetric, -l37- the number of the unique elements Q2 is Q2 = M(M + 1) / 2. We align the Q2 x lvectort/I as follows. W11 W12 V’lM W21 ll’zz ‘11: : :9V’=(V’llaV’zls”'aWMlaV’zza'l’zsama'l’m:W33a”°aV’m)T- Wm WM This operation is often written as w = vech(‘1’) (vector half). Adding the number of unique elements in 02 , which is clearly 1, we have in total, Q=Q1+Q2+1=Q1+M(—A:+2+1 elements in the vector ¢. fig. Max(Q,) = RM and Q2 = M(M + 1) / 2 . Thus, Q2 is always exactly M ( M +1) / 2 , the number of unique elements in the symmetric matrix ‘1’ , but Q, satisfies inequality restrictions Q1 3 RM because we fix some of the elements in A , depending on our theory. Definition 2. Definition of E matrix We define E matrix as E_dvecr_(dvecr dvecr dvecr) - d¢T _ all" ’dw’ ’de” . (A-lO) =(E,,E,,,Ea,) -138- The dimension of E is a R2 x Q matrix, and E matrix consists of three parts, E, , EV , and E62 , which is always a R2 x 1 vector of 0 because I does not depend on 0'2. Thus, E matrix always has a form E=(E,,EV,0). (A-ll) And, E, is a R2 x Ql matrix, E, is a R2 x Q2 matrix. Now, (a) For E, , we compute this column by column. The qth column of E, corresponds to dvec 2' d/l 1} , where x1, is the qth element of A (that is, the qth unknown element in A). The order is determined by counting for the same column down the rows and then move to the next column, go down the rows. Thus, _ area! _ aiec(A‘PAT) _ 5(A‘I’AT) 1., " d1, ‘ at, ‘ m‘ 0%,. ) ’ ’ T ’ (A-12) - vec(-éA ‘i’AT + N? 6A )- vec(D ‘I’AT + A‘I’DT ) ‘ at, at, ‘ ‘11 1v ’ 3A .. .. . . . . . . where D1,;- = a: = {1 for posrtlon (1,1) 1n A and 0 for other posrtrons 1n A} for all 1, j. 1} Example. 1 O 321 0 diecz' firecr when A = 1 , E, =(E,“,E,u)=( €12] , 07142 ).Then, 0 2.2 E,“ = vec(Dkl‘I’AT + MID; ) = vec{D,,l‘PAT + (D,l\iIAT)T}. —139- 0 0 D l 0 ’12! - O O ’ 0 0 0 O O 0 O 0 DA, (VAT: 1 0(11’11 91’1sz ’12] O 0]: W11 ’1le” V12 ’14z'l’42 ,and ' 0 0 W2, ([1,, 0 O l 11,, 0 O 0 0 0 0 O 0 0 0 O O 0 O 0 W11 0 0 Wu Ami/’11 W12 1421/42 0 321W” 0 O T A‘PDT = D123“ + ‘2' o o 0 o + 0 v1.2 0 o 0 0 0 0 0 lat/1,2 0 O 0 Wu 0 0 W11 21211/41 W12 ’142‘l’42 O ([112 0 O 0 1'42 V’ 42 O 0 Then taking vec, we obtain E ,1! , which is a 16 x 1 column vector. Explicitly, it is _ o 0) (0) (0) o o o o o T E,“ — (0, V’ 1(1 ),0,0, W 1(1 ’ZAQIV/ll 2 W 12 s 1:2)1” 12 )90, V/l(2 )109090’ 1:2)W1(2)’09O) , Where the superscript on the parameters means that we use the current values of estimates to compute E ,‘ . Similarly, E1, = (0,0,0, Wig):0,0’0a121'/’2(i))’0’0,09 VS”, wéi”,1§?’w§?’a W§§’,2/1§3’W§§’)’ - (b) Next, for E w , we compute it like the old I in HLM2. That is, we can compute EV one at a time by E, =(A®A)D;, (A-13) where -140- where w is the part in the vector ¢ whose elements are the unique elements of ‘1’ , as defined in (A-9) in Definition 1. 1 O O Example.whenA= 2;! 1 ,LIJ=(ZH Zn),andy/=(Wn, W219 W22)T,then 21 22 0 142 W11 W21 W12 0' _ o'liec‘P _ W22 _(0’\’ecw o'vecy/ area/I) 1v WT 011/” 1,12, 1112,) 31/1,, , W21 , W22 (1 O 0 0 l 0 ‘ o 1 0' KO 0 1 Then, by Equation A-13, we get the 16 x 3 matrix E, by -141- 1 321 o o 2,, o 2,, o = (A ® A)(D. ’ DLZI ’ DJ’rz) Vll = (A 18 A)(vecD,,” , vecDV, , veCDVn ) E,=(A®A)D;={ HOO 2.}: '-‘OO CO OH—‘O #090 = ((A ® A)vecD,,” , (A ® A)vecD,,u , (A ® A)vecD,,u) (EWII Ell’zl Esz) ( l 0 0 I 121 0 0 0 1 O 0 2,, 0 2,, 0 0 1221 0 0 0 2,1 0 = 0 121/142 0 O 1 0 ' o 2,l o O O 1 o 0 2,, o 2,, o 0 221/142 0 o 0 2,, \ O 0 242,} Definition 3. Definition of F vector A 1 x Q row vector F is defined by (A- 14) E_xa_1_n1212. -142- 1 O 0 when A: A“ , ‘1’ =(Wn W”) , ¢=(,1,,,A,,,1//”,1/I,,,y/,,,0'2)T,a6 x 1 column 0 1 W21 W22 0 142 MM 1 22 1 vectorandQ=Q,+Q,+1=2+-L2—+—2+1=2+—(—2:—)+1=2+3+1=6.Then, F=(o, o, o, o, o, 1). (2) Computation at H (Expected Hessian matrix) and S (Score vector) Note that H is a Q x Q matrix, and S is a Q x lvector. (a) Computation of H (expected Hessian matrix). H“) = my l°g 2203?», ”2’11,” = i. H)” . where Hy) = _ g {( 5"ng )) T(Vj" ® V,"' )[fl‘;:#}|”’m = —%{ET(A,TJ.VJ."A,, ® ALV;‘A,,)E + ETvee(A,T,V;2A,,)F (A-IS) + FT[vec(A,T,-Vj-2A2, )]T E + (’(V;2)°FTF}1,.,M 2 where o'lzec(z') _ id: a¢T E: To compute H3.” , we need to know A; V,."AU , A; Vj‘2A2j , and tr(Vj‘2 ). These quantities are function of Weighted Sufficient Statistics (WSS), which are -143- A]; V,.-'14,, A,’",V;'A,, WSS = AITJ. V,"A,, ALI/j"); , for all j, the WSS is in the functions of Sufficient Airs-'2 14,521, , A,",A,. 1 Statistics (SS), SS = ASA”. ALY, , for all j, and ACjT'AT for all j. Therefore, we first Air,- show the computational formulae of ACJT'AT and the WSS, and then we show the computational formulae specific to H)”. Formula of ACjT'AT : We first compute a M x M matrix C;‘ = (AT/1,321,» + 0'2‘11")’l ,j = 1,2,...,], (A-16) and then compute a R x R matrix ACJT'AT for all j. Then, we compute the WSS for the current ACjT'AT. Computation of WSS: As we can see in the following, WSS is a functions of SS and AC;'AT. . A,§V;'A,, = {lug/1,, — A514,, -AC;'AT 2,214”); (A-17) . 27,421, , = 0'2(A,TJ.A,1. — 21,324,, ~ACJT'AT -A,T,A,j); (A-18) . AijfY, = 04015.); — 21,321,, .AC;‘AT - 21,313.); (A-19) . ALVflAN = 0‘2(A,TJ.A,j — 21,221,, -AC;‘AT -A2TjA,j); (A-20) . .43).“); = a‘2(A,",Y, — A,’,A,, -AC;'AT - A5123). (A-21) -144- Computational formulae of A2T,V,"A,, , A27, V,.-2A,, , and tr( V,'2) are as follows. 0 AI, V,‘l A2, is computed by the formula (A-20). . A,T,V,‘2A,, = 0‘2(A,T,V,"A,, — A{,V,"A,, -AC;'AT -A,T,A,, ) (A-22) 0 tr(V,'2)=(n, — MM“ +tr{(C,‘"1’")2}. (A-23) (b) Computation of S (Score vector). . alogL(Y;r.A.r.az)| ’ . 5(1) = = st!) , a¢ l2,” ,2 . where . 1 a) Vj T —l -l T Si." =§[{_e%(;_)} {(V, ®V, )vec(e,e, -V,)}i,= I _ . _ _ _ - = 2[ETVBC(A2T1VJ 13191“er IAZ} - A,T,V, l"121“" FT{e,7.'V, 23/ —”(V1 1),“ ‘(U (A-24) M‘” ' (i) T -l T -| T -2 -I To compute S, ,we needto know A,,V, e,, A,,V, A2,, e,V, e,,and tr(V, ). . T -I T -l T -2 Thus we need to know computatlonal formulae of A,,V, e, , A,,V, A,, , e, V, e, , and tr(V,’l ). To compute AI, V,"e, , AI, V,"'A,, , e,7.'V,'2e, , and tr(V,‘l ) , we need to have evaluated e, , 61") , efe, , and Age, . Those quantities are computed as in the following. We first note that e, = Y, - AUQIU) , (A-ZS) where 6}") is the current estimate of 61 , and is computed by J J 61‘" = (Z A], V,"A,, )T' 2 AI, V," Y, . (A-26) j=l jal -145- Note that this is a function of WSS, i.e., 61‘” = f (WSS). Next we compute efe, and A222, by efe, = YfY, — 26;"’TA,T,Y, + 61“’TA,T,.A,,19,<”; (A-27) A,T,e, = 2,31; — Jig/1,21“), for all j, (A-28) which are also functions of SS and WSS, i.e., ere, = f (SS, WSS) and A,T,e, = f (SS, WSS). Then, using these values, we can compute the quantities necessary to compute S)". T —l _ T -l T -l (i), A,,V, e, _ A,,V, Y, —A,,V, A,,6, , (A-29) AT.V."A . was already given in formula (A-18) in the computation of H1"; 21 J 21 J e,TV,'2e, =a“(efe, -efA,, -AC,T'AT- T,e, +efA,, -AC,T'AT -A,T,A,, -AC,"AT -A,’,e,) (A-30) tr(V,"‘) = (n, — M)o"2 + tr(C,T"I’"). (A-31) Overpll Steps via Fisher Scoring Algorithm: Step 1. Compute and Store the Sufficient Statistics (SS), i.e., Alli/1U AZTJ‘AZJ' $3 = 21,321,, A5,); (A-32) A1516- for all j. Step 2. Compute Statistics values 61°) , ¢(°’ = (21‘0", w‘o’r,az(°’)rby -146- J J 61‘” = (Z A.C-A.,-)" (2 Asia). (A—33) j=l jai and compute the initial values of 21°) and VI”) by the method which is provided in the next section. 2“” is computed by first COIDPUIing Note: 0' 9,3.) = (A; A, ,)'l (A,T,Y, — ALA, ,191‘”) , (A-34) and then 1 J 0 0'“ ) = mg“,- ‘ Aljglm) " 142199)“)?- " Aljglw) ‘ A210i3))- (A’35) Step 3. Compute a M x Mmatrix C," = (ATA2T, A,,A + 02‘1’")"' ,j = 1,2,...,], by constructing A , ‘1’ from 12“” , and then compute AC,'AT for all j. Note: We construct A , ‘1’ from ¢(°’ , and use them to compute C,’l with 02”). Step 4. Compute the Weighted Sufficient Statistics (WSS), i.e., A,T,V,"A,, A,T,V,"A,, wss= A,’,.V."A,, A,T,V,“Y, (A-36) J Asa-'1:- for all j. Step 5. Compute the matrix E and the vector F. Note 1: F doesn’t change, though E changes for each Fisher Scoring iteration. Note 2: E”) = f (¢‘°’). That is, E”) depends on ¢‘°’and E“) will change as (2“) changes. -147- Step 6. Compute H ,S by the formulae in Equation (A-15) and (A-24) along with computational formulae of the components. Note: H,S=f(E, WSS). Step 7. Compute the new e5“) by Fisher Scoring Algorithm, .i.e., r5“) = 451°) - H"S. (A-37) Note: ¢“) = f(H"S). Step 8. Compute the new J -1 J 61‘” = (Z Ali-W1.) 2 Am"):- (~38) j=1 j=l based on the new ¢"’. Note: 6,“) = f (WSS) Step 9. Compute the Log-Likelihood l. The Log-Likelihood I is computed by 1 J J I: —-2-[Nlog(27t)+(N - JM)log(0'2)+ Jlogl‘l’I-ZloglCfl +Z(e,’e, — e121,, -AC;'A’ -A’-e-)/ 0’], i=| j=| 2’ I as in Equation (A-S), where efe, is computed by Equation (A-27), and A,T,e, is computed by Equation (A-28). Step 10. Go back to Step 3. Note: We use the same cutoff criterion as current HLM2, i.e., change in log-likelihood, to get out of the loop. -148- Starting Values: To start the Fisher Scoring, we use the following quantities. 9:°’— = (Z A,,A,,)"(Z AUY) , (A-39) j=l 9,3.) = (A5, A, ,)"(A,T,Y, — 2}, 211,191”); (A-40) J 02“”: ——Z(Y, - 2,91» — Azflé‘j’m’, — 2,910)— A 910)) (A41) N— JR ,_ . _ 0,2(0) J r _l V = 2042.242.) ; (A42) j-l _ 1 J D = 72 away”; (A-43) F. f = 1'5 — i7 ; (A-44) F is the method of moments estimate in a balanced design. At this point, we check the positive semi-definitness of the zT matrix. If the 27 matrix is not positive semi-definite (p.s.d.), we fix it up in the following way: i 1) If i, so, set a, to .113]. 2) If l 5,12 if, , let T, =.9\/??,,—. (use the sign of the original ii, ); this reduction is 10 percent from the original value. 3) Now check if f is p.s.d. If yes, go to the next step. If no, go back to step 2), and let 2,: —.8 f, ?, an,other 10 percent reduction from the original value. Check if i" is p.s.d. again. Keep repeating this process. If after five times of this repetition and if f is still not p.s.d., then set all of the off-diagonal elements to O. -149- With the above procedure, we obtain a p.s.d. ? , an estimate of 1. Using this i" , we first obtain the starting values for A . We start with ‘I’ = [M . Then, since I = A‘PAT , we have t = AAT. Since i" is p.s.d., it can be decomposed into 5 : CACT , where A is an R x R matrix in which eigenvalues are on the diagonal, and C is an R x R matrix that the corresponding eigenvectors are in columns. We approximate C and A by taking the first M largest eigenvalues and the corresponding eigenvectors. We denote them as A' and C' . Note that A. is a M x M matrix and C' is R x M matrix. Then an equality holds .1 .1 .1 approximately f = NA” 5 C'A'C'T = (CA 2 )(A 2C”). Thus we obtain A. = C'A 2. Now we use the information for A specification. For clarity, we use an example. Suppose that we specified the A as A = (’111 2,, Us: 1122 3a: 1.2 in) ’123 333 2'43 in) ( 1 0 2,) 0 1 A, O 21, 1 and suppose we obtained 2, ,1, 0 ' U, 0 0} from the previous decomposition. Then, we let the f 1 0 313/133) 0 1 123 / 1a A“) = 0 2,, /2,, 1 (A-46) 1'41 /A1, A42 /’122 0 ‘151/111 0 0 / -150- and use it as the starting values for the estimate of A . In case the pivotal estimates in A10) are close to zero (in this example, either A“ or 2,, or ,1,, ), then we use the initial estimate in A. as it is. Note that we need to check if rank( A101) = M . Next, we compute the starting value for the estimate of ‘I’ based on A“) and F . Since I = A‘I’AT and rank(A) = M , we can obtain ‘1’ by L11 = (ATA)“(ATrA)(ATA)" because rank(ATA) = M and thus it is invertible. Therefore, we let the starting value for the estimate of ‘I’ be @(0) = ( [\(ov 8(0))-1( A(°)T?A‘°’ X Nov 11(0))4 . ( A _47) Note that we need to check whether ‘1’”) is positive semi-definite before we use it as the starting value. We leave an option for the users that they can specify the starting values for A“) and ‘1’”) . Endnote: Derivation of log-likelihood formula in Equation (A-S ): In general, linear model with normal distribution as in Equation (A-3), the log- likelihood is given by N 1 J _, 1 J T _, l=——log(2:r)+§Zlog|V, 1728in e,, (B-1) 2 j=l j=l where V,’1 is defined in Equation (A-4) and e, is defined in Equation (A-2). By Smith’s Theorem 2, (AQAT + ‘11)“1 = ‘1’" - ‘1’"A(AT‘I’"A + Q" )'1 A7"1’", -151- we have -1_ —2 -1 T T V. -0 (I,,—A,,AC,AA,,), I where C,‘ = (A"A,T,A,,A +02‘I’")". Now, log| V,"'l = log|0"2(lnl - A,,AC,T'ATA,T, )l = logl 0’21", |+ loglIn1 - A,,AC,'ATA,T,| T T 2 ' 2' _ =41, log(rr )+10gA2j’A Ilrlcjnl (since B=|A||D—CA"B|(=|D||A—BD"C|) |D-CA"B|=A BIA"I) C D ’ C D ' TT 1A2) 2 =_n,log(0' )+logA,,A I + logl C,"| A B (Agaiaapplylc D'=(IAIID-CA"BI)=IDIIA-BD"C1-) = -n, log(02)+ logllnll C, — ATA,T.I lA,,A j ll + log| C,'| = -n, log(az)+1og(1)+loglC, — ATA2T,A2,A|+1081C;'| _—_ _n, log(a'z ) + loglC, - ATALAUAI + log! Cj'l = —n, log(a’) + logl 02‘1’"|+log|C,7'| (of = (ATA2T, A, ,A + 0'2‘1’")") = —n, log(a’)+ M log(02)+ l0gl‘*"'|+log|C}'| (I‘P"|=I‘PI") = —{(n, - M)1og(a’)+10g|‘P|—log|C,'|}. (B-2) -152- J J 2 log| V,."l = — {(N — JM)log(0'2 ) + Jlog|‘1’|—Z loglC,'|} . (13-3) j=l j-l And e,TV,"e, = e,T(In, — A,,AC,"ATJ€1,T,)e,/0'2 =(efe,—eIA,,-AC,'AT-Ar.e )/0'2, 211' (34} where efe, is computed by Equation (A-27), and A,T,e . is computed by Equation (A-28). J Thus, J J Zerfe, = 2(efe, — 4A,, -AC;'AT . Are )/ 0'2 (13-5) 21 J' j=l j=| Therefore, plugging Equations (B-3) and (B-5) into Equation (B-l), we obtain the log- likelihood l ’ J . 1: -5“); log(27r) +(N - JM)log(az)+ JlogM-Zloglc;'| + 2(efe, - 42,, ~AC,"AT ~A,T,e,)/ 02], I" I" which is Equation (A-S). Notice that both of them are functions of WSS and AC,'AT. -153- References Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-I8 and 1991 Profile. Burlington: University of Vermont Department of Psychiatry. Adams, R. 1., Wilson, M., and Wu, M. (1997). Multilevel Item response Models: An Approach to Errors in Variables Regression. Journal of Educational and Behavioral Statistics, Spring 1997, Vol. 22, No. I, 47-76. Aitkin, M., Anderson, D., & Hinde, J. (1981). Statistical Modeling of Data on Teaching Style. Journal of Royal Statistical Society A, 144, Part 4, 419-461. Aitkin, M., & Longford, N. (1986). Statistical Modeling Issues in School Effectiveness Studies. Journal of Royal Statistical Society A, 1 4 9, Part 1, [-43. Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis, second edition. New York, John Wiley & Sons. Arbuckle, J. L. (1995). Amos user’s guide. Chicago: SmallWaters. Bartholomew, D. J., & Knott, M. (1999). Latent Variable Models and Factor Analysis, second edition. Kendall’s Library of Statistics 7. London, Edward Arnold. Becker, B. J. (1988). Synthesizing Standard Mean-change measures. British Journal of Mathematical and Statistical Psychology 41, 257-378. Becker, B. J. (1990). Coaching for the Scholastic Aptitude Test: Further Synthesis and Appraisal. Review of Educational Research, vol. 60, No. 3, 373-417. Bentler, P. M., & Wu, E. J. C. (1995). EQS for Windows User ’s Guide. Encino, CA: Multivariate Software, Inc. -154- Bock, R. D. (1988). Multilevel Analysis of Educational Data. London: Academic Press. Bock, R. D., & Aitkin, M. (1981). Marginal Maximum Likelihood Estimation of Item Parameters: Application of EM algorithm. Psychometrika, 46, 443-459. Bollen, K. A. (1989). Structural equations with latent variables. New York,: Wiley. Bollen, K. A. and Joreskog, K. G. (1985). Uniqueness does not imply identification: A note on confirmatory factor analysis. Sociological Methods and Research, I 4, 155-163. Bryk, A. S., & Frank, K. (1991). The specialization of teachers’ work: An initial exploratuion. In S. W. Raudenbush & J. D. Willms (Eds.), Schools, classrooms, and pupils: International studies of schooling fiom a multilevel perspective (pp. 185-204). Orlando, FL: Academic Press. Bryk, A. S., & Raudenbush, S. W. ( 1992 ). Hierarchical Linear Models. Applications and Data Analysis Methods. Newbury Park, CA: Sage. Bryk, A. S., Raudenbush, S. W., & Congdon, R. T. (1996). Hierarchical Linear and Nonlinear Modeling with the HLM/2L and HLM/3L. Chicago: Scientific Software International. Inc. Catchpole, E. A., & Morgan, B. J. T. (1994). Boundary estimation in ring recovery models. Journal of Royal Statistical Society, Series B, 49, 95-101. Cheong, Y. F., & Raudenbush, S. W. (1999) Measurement and Structural Models for Children ’s Problem Behaviors (under review). -155- Chou, C., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of Two Statistical Approaches to Study Growth Curves: The Multilevel Model and the Latent Curve Analysis. Structural Equation Modeling, 5(3), 247-246. Coleman, J. S., Hoffer, T., Kilgore, S B. (1982). High school achievement: Public, Catholic and other schools compared. New York: Basic Books. Cronbach, L. J. (1951). Coefficients alpha and the internal structure of tests. Psychometrika, 16, 297-334. Davis, W. R. (1993). The F CI rule of identification for confirmatory factor analysis: A general sufficient condition. Sociological Methods and Research, 21, 403-43 7. De Leeuw, J ., & Kreft, I. (1986). Random Coefficients Models for Multilevel Analysis. Journal of Educational Statistics, 1 1(1), 57-85. DerSimonian, R., & Laird, N. M. (1983). Evaluating the effect of coaching on SAT scores: A meta-analysis. Harvard Educational Review, 53, 1-15. Gan, L. & Jiang, J. (1999). A Test for Global Maximum. Journal of American Statistical Association, September, 1999, Vol. 94, No. 44 7, 847-854. Garthwaite, P. H., Jolliffe, I. T., & Jones, B. (1995). Statistical Inference. London: Prentice Hall. Gleser, L. J. and Olkin, I. (1994). Stochastically Dependent Effect Sizes. In H. Cooper & L. V. Hedges (Eds.), The Handbook of Research Synthesis, (pp. 339-355). New York: Russell Sage Foundation. Goldstein, H. G. (1986). Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares. Biometrika, 73(1), 43-56. -156- Goldstein, H. G. (1995). Multilevel Statistical Models, Second Edition. Kendall’s Library of Statistics 3. London: Edward Arnold. Goldstein, H., and McDonald, R. P. (1988). A general model for the analysis of multilevel data. Psychometrika, 53. 435-67. Goldstein, H., Rashbash, J ., Plewis, 1., Draper, D., Browne, W., Yan, M., Woodhouse, G., and Healy, M. (1998). A User ’s Guide to ML Win. London: University of London, Institute of Education. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, California: Sage Publication. Hedeker, D., & Gibbons, R. D. (1996). MIXOR: a computer program for mixed-effects ordinal regression analysis. Computer Methods and Programs in Biomedicine, 49, 229-252. Heywood, H. B. (1931). On finite sequences of real numbers. Proc. Roy. Soc. Lon. 4: 245-253. Howe, W. G. (1955). “Some contributions to factor analysis.” Report No. ORNL-l9l9. Oak Ridge, TN: Oak Ridge National Laboratory. Huttenlocher, J. E., Haight, W., Bryk, A. S., & Selzer, M. (1991). Early Vocabulary Growth: Relation to Language Input and Gender. Developmental Psychology, 27(2), 236-249. Jennrich, R. J ., & Schluchter, M. D. (1986). Unbalanced Repeated-Measures Models with Structured Covariance Matrices. Biometrics, 42, 805-820. -157- Joreskog, K. G. (1979). “Author’s adendum to: a general approach to confirmatory factor analysis,” in K. G. Joreskog and D. Sorbom (eds.) Advances in Factor Analysis and Structural Equation Models. Cambridge, MA: Abt Books. J oreskog, K. G. and Sorbom, D. (1979). Advances in factor analysis and structural equation models. Cambridge, MA: Abt Books. J oreskog, K. G., & Sorbom, D. (1995). LISREL8 User ’s Manual. Chicago: Scientific Software International. Kamata, A. (1998). Some Generalizations of the Rasch Model: An Application of the Hierarchical Generalized Linear Model. Unpublished doctoral dissertation, Michigan State University, East Lansing. Kalaian H. A., & Raudenbush, S. W. (1996). A Multivariate Mixed Linear Model for Meta-Analysis. Psychological Methods, Vol. 1. No. 3, 227-335. Kline, R. B. (1998). Principles and Practices of Structural Equation Modeling, New York: Guilford Press. Lee, V., & Bryk, A. S. (1989). A Multilevel Model of the Social Distribution of High School Achievement. Sociology of Education, 62, 172-192. Little, R. J. A., & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley. Longford, N. T. (1987). Fast Scoring Algorithm for Maximum Likelihood Estimation in Unbalanced Mixed Models with Nested Random Effects. Biometrika, 74(4), 817- 827. Longford, N. T. (1993). Random Coefficient Models. Oxford: Oxford University Press. -158- Longford, N. T., & Muthén, B. O. (1992). Factor analysis for clustered observations. . Psychometrika 5 7, 581-97. Lord, F. M., & Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Menlo Park, Calif: Addison-Wesley. Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557-585. Muthén, B. O. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28, 338-354. Muthén, B. O. (1998). M-Plus User ’3. Muthén & Muthén, Los Angeles. Muthén, B. O. (1998). Second-generation structural equation modeling with a combination of categorical and continuous latent variables: New opportunities for latent class/latent growth modeling. Paper presented at the conference New Methods for the Analysis of Change, Pennsylvania State Univerisity, October 1998. Muthén, B. O. & Curran, P. J. (1997). General Longitudinal Modeling of Individual Differences in Experimental Designs: A Latent Variable Framework for Analysis and Power Estimation. Psychological Methods, Vol. 2, No. 4, 371-402. Muthén, B. O., Khoo, S. T. & Gustafsson, J. E. (1998). Multilevel latent variable modeling in multiple population. Under review. Rasch, G. (1960). Probabilistic Models for some intelligence and attainment tests. Copenhagen: Danish Institute of Educational Research. -159- Raudenbush, S. W. (1994). Fisher Scoring/[GL5 and EM algorithm for a variety of two- level models, College of Education, Michigan State university, Unpublished Manuscript. Raudenbush, S. W., & Bryk, A. S. (1986). A Hierarchical Model for Studying School Effects. Sociology of Education, 59, 1-17. Raudenbush, S. W., Becker, B. J., & Kalaian, H. (1988). Modeling Multivariate Effect Sizes. Psychological Bulletin, Vol. 103, No. 1, 111-120. Raudenbush, S. W., Rowan, B., & Kang, S. J. (1991). A Multilevel, Multivariate Model for Studying School Climate With Estimation Via the EM Algorithm and Application to US. High-School Data Journal of Educational Statistics, Vol. 16, No. 4, 295-330. Raudenbush, S. W., & Sampson, R. (1997). Assessing Direct and Indirect Associations in Multilevel Designs with Latent Variables. (in press). Raudenbush, S. W., & Kasirn, R. M. (1998). Cognitive Skill and Economic Inequality: Findings from the National Adult Literacy Survey. Harvard Educational Review, Vol. 68, No. 1, Spring 33-79. Raudenbush, S. W., & Sampson, R. (1999). “Ecometrics” Toward a Scientific Ecological Settings, with Application to the Systematic Social Observation. Sociological Methodology, Vol. 29, 1-41. Reckase, M. D. (1991). The Discriminating Power of Items That Measure More Than One Dimension. Applied Psychological Measurement, Vol. 15, No. 4, December, 361-373. -l60- SAS Institute (1990). SAS/STAT User ’s Guide, Version 6, Fourth Edition, Volume 1. SAS Institute Inc., Cary, NC, USA. SAS Institute (1996). SAS/STATSofrware: Changes and Enhancements through Release 6.11, Cary, NC, USA. Thum, Y. M. (1997). Hierarchical Linear Models for Multivariate Outcomes. Journal of Educational and Behavioral Statistics, Spring 199 7, Vol. 22, No. 1 , 77-108. Thurston, L. L. (1947). Multiple Factor Analysis. Chicago, IL: University of Chicago Press. Wald, A. (1950). A note on the identification of economic relations. In T. C. Koopmans, ed., Statistical Inference in Dynamic Economic Models. New York: Wiley, pp. 238-244. Wolfi'an, S. (1991). Mathematica: A system for Doing Mathematics by Computer. Second Edition. Reading, MA: Addison-Wesley. Willet, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of change. Pychological Bulletin, 116, 363 -381. Willet, J. B., & Sayer, A. G. (1996). Growth modeling and covariance structure analysis. In Advanced Structural Equation Modeling, Issues and Techniques (Ed. by Marcoulides, G. A. & Schumacker, R. E.) 125-157. -161- "1111111111111111171111“