This is to certify that the dissertation entitled A MULTIVARIATE MIXED LINEAR MODEL FOR META-ANALYSIS presented by HRIPSIME A. KALAIAN has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

Major professor

Date: August 4, 1994

Michigan State University
A MULTIVARIATE MIXED LINEAR MODEL FOR META-ANALYSIS

By

Hripsime A. Kalaian

A DISSERTATION

Submitted to Michigan State University College of Education in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology, and Special Education

1994

ABSTRACT

A MULTIVARIATE MIXED LINEAR MODEL FOR META-ANALYSIS

By Hripsime A. Kalaian

Meta-analysts often encounter data sets with multiple effect sizes from each primary study in a review, because of either multiple measures or multiple treatments. These correlated multiple effect sizes require multivariate analytical techniques that take into account the intercorrelations among them. In the present study, the multivariate mixed-effects model for meta-analysis is developed and presented. This multivariate model takes into account three important characteristics which often arise in meta-analysis. The first is having multiple correlated effect sizes. The second is that different studies can have different subsets of effect sizes depending on the design of the primary study. The third is that these multiple effect sizes may be random realizations from a population of possible effect sizes. The proposed model enables meta-analysts to obtain multivariate empirical Bayes estimates of the model parameters without excluding studies in which some effect sizes are missing. The application of the multivariate mixed-effects model is illustrated using artificial multivariate effect sizes (generated from the multivariate normal distribution) and a real data set. The real data set involves Scholastic Aptitude Test (SAT) coaching studies evaluating the effects of coaching on the two SAT subtests (SAT-Verbal and SAT-Math).
Also, the fixed-effects model parameter estimates obtained from analyzing the transformed GLS model are compared to the mixed-effects model parameter estimates obtained from the HLM program. In conclusion, the multivariate mixed-effects model using the HLM program can be applied to multivariate meta-analysis studies with missing effect sizes to obtain empirical Bayes estimates. Also, the proposed model can be used to perform multivariate fixed-effects analysis. Finally, the findings of the present study can be generalized to studies with more than two outcomes (effect sizes), and within-study characteristics can be incorporated in these applications.

ACKNOWLEDGEMENTS

This study would not have been possible without the help of many individuals. My appreciation is offered to them for their encouragement and support. In one way or another, each of the members of my doctoral committee has shared the decade with me. First, I would like to thank Dr. Steve Raudenbush, my advisor and dissertation chair, for his constant support and valuable advice. I thank him and his family for helping us (me and my family) through our life crisis by providing a loving friendship. Second, I would like to thank Dr. Betsy Becker for her belief in my work. I thank her for valuing me by listening and for responding seriously and quickly to my educational and personal problems. Third, I would like to thank Dr. Richard Houang for his constant help, support, and advice through my study. Finally, I would like to thank Dr. Dennis Gilliland for being the best teacher. He taught me not only statistical subjects, but how to deal with students and treat them with respect. I would also like to express my deepest gratitude to my husband, Rafa, for his support and help in any way he can to make this goal attainable. Our wonderful three children, Nader, Neda, and Nabeel deserve special thanks for their sacrifices and patience so I can finish my studies.
I also want to thank my colleagues in the Office of Medical Education who have upheld me and this study in both professional and practical ways. Dr. Bob Bridgham, Dr. Patricia Mullan, Dr. Andrew Hogan, Dr. Rebecca Henry, and Mrs. Karen Boatman all in one way or another supported and helped me to achieve my goals. Finally, I am deeply indebted to both of my parents and my sisters and brother, each of whom has prayed for me, supported me in every aspect of my life, taught me to value education and hard work, and loved me through this experience.

TABLE OF CONTENTS

I. INTRODUCTION
  1. Meta-Analysis in Educational and Social Sciences
  2. Meta-Analysis in Medical Sciences
  3. Multiple Dependent Effect Sizes
  4. Multivariate Statistics
  5. Purpose of the Present Study
  6. Advantages of Using Multivariate Mixed Model
  7. Organization of the Present Study

II. REVIEW OF THE LITERATURE
  1. Univariate Approaches
    1.1 Univariate Fixed-Effects
    1.2 Univariate Random-Effects
    1.3 Univariate Mixed-Effects
  2. Multivariate Approaches
    2.1 Multivariate Fixed-Effects
  3. Summary of Previous Meta-Analysis Techniques

III. NOTATION FOR MULTIVARIATE MIXED LINEAR MODEL
  1. Multiple Measures for Each Study
    1.1 Glass's Estimate of Effect Size
    1.2 Population Effect Size
    1.3 Unbiased Estimate of Effect Size
    1.4 Distribution of Multiple Effect Sizes
  2. Pre-Post Multiple Measures for Each Study
    2.1 Estimated Standardized Mean-Change Measure
    2.2 Unbiased Standardized Mean-Change Measure
    2.3 Distribution of Standardized Mean-Change Measure
    2.4 Effect Size Estimate
    2.5 Distribution of Effect Sizes
  3. Multiple Treatments for Each Study
    3.1 Population Effect Size
    3.2 Sample Effect Size
    3.3 Distribution of Effect Sizes

IV.
MULTIVARIATE MIXED LINEAR MODEL
  1. Within-Study Model
    1.1 Illustrative Example
    1.2 GLS Within-Study Model
  2. Between-Studies Model
    2.1 Unconditional Between-Studies Model
    2.2 Conditional Between-Studies Model
  3. Within-Study and Between-Studies Models Combined

V. ESTIMATION OF MULTIVARIATE MIXED MODEL
  1. Estimation when \tau and \Sigma are Known
    1.1 Posterior Distribution of \theta = (\gamma, U)'
    1.2 Posterior Distribution of \theta
  2. M.L.E. Estimation of the Dispersion Matrices via EM
    2.1 E-Step (Expectation Step)
    2.2 M-Step (Maximization Step)

VI. EMPIRICAL APPLICATION OF MULTIVARIATE HIERARCHICAL LINEAR MODEL
  1. Introduction to the HLM Computer Program
  2. Multivariate Effect-Size Data Generation
  3. Results
    3.1 Description of the Generated Data
    3.2 The V-Known Program Results
    3.3 The HLM Program Results
  4. Conclusions

VII. SAT COACHING EFFECTIVENESS: A META-ANALYSIS USING MULTIVARIATE HIERARCHICAL LINEAR MODEL
  1. Introduction
  2. Description of Scholastic Aptitude Test (SAT)
  3. Past Research on SAT Coaching Effectiveness
  4. Methodology
    4.1 Studies in the Review
    4.2 Study Features
    4.3 Statistical Procedures
  5. Results
  6. Fixed- and Mixed-Effects Models Compared
  7. Discussion

VIII. DISCUSSION AND IMPLICATIONS

APPENDICES
  APPENDIX A: V-KNOWN COMPUTER OUTPUT
  APPENDIX B: HLM COMPUTER OUTPUT

REFERENCES

LIST OF TABLES

Table 1: Previous Meta-Analysis Approaches for Effect-Size Data
Table 2: Generated Multivariate Effect Sizes
Table 3: Effect Sizes of SAT Coaching Studies
Table 4: Characteristics and Features of SAT Coaching Studies
Table 5: Frequency Distribution of Student Contact Hours
Table 6: Fitting Unconditional HLM Model Results
Table 7: Fitting Conditional Model Results
Table 8: Comparison Between Fixed- and Mixed-Effects Model Estimates

LIST OF FIGURES

Figure 1: Frequency Distribution of SAT Effect Sizes
Figure 2: Relationships Between SAT Effect Sizes and Log(Contact Time)

CHAPTER I

INTRODUCTION

1. META-ANALYSIS IN EDUCATIONAL AND SOCIAL SCIENCES

In the last two decades there has been a surge of interest among educational and social researchers in applying quantitative methods for synthesizing and aggregating the results of related primary studies. The goals of research synthesis are accumulating and combining research evidence from many studies testing the same research hypothesis, and generating new evidence which helps to formulate new research hypotheses and plan future research studies. In other words, meta-analysis is, potentially, a powerful tool for synthesizing existing knowledge, criticizing the design of existing research, and stimulating more meaningful interdisciplinary research. Various quantitative methods for research synthesis have been developed and applied within the last twenty years. One way of synthesizing and summarizing the research findings from previous investigations is by aggregating effect magnitudes using meta-analysis statistical techniques. The term "meta-analysis" was first introduced and popularized in the social science literature by Glass (1976), and the methods have also been developed by others, such as Rosenthal (1978) and Rosenthal and Rubin (1979). Pillemer and Light (1980) and Cooper (1982) provided a conceptual framework for research synthesis. Cooper (1982, 1984) developed a systematic approach (a five-stage model) for carrying out a research synthesis and an integrative research review.
Hedges (1981, 1982, 1983) and Hedges and Olkin (1985) introduced the technical statistical methods for meta-analysis. Rosenthal (1978) presented a collection of statistical procedures for combining significance levels from primary research. Meta-analysis can be defined as the statistical analysis of a large collection of primary research studies which focus on the same research question, for the purpose of accumulating previous findings and consequently generating new research evidence. The most popular meta-analysis technique is first calculating an effect size for each primary study in the sample of collected studies in the review and then finding an overall effect-size estimate (here we assume that the effect sizes from the primary studies share a common population effect size). Thus, for treatment-control studies, the effect size can be defined as the standardized mean difference between the experimental and control groups from each study in the integrative review.

2. META-ANALYSIS IN MEDICAL SCIENCES

Since the mid-1980s the application of meta-analysis techniques for research review purposes has spread from the social and behavioral sciences to many other disciplines, especially the medical sciences and health care disciplines. Meta-analyses of clinical trials (e.g., Yusuf et al., 1987; Havens et al., 1988) and epidemiologic studies (e.g., Longnecker et al., 1988; Shinton and Beevers, 1989; Berlin and Colditz, 1990; Greenland, 1993) have been used frequently as an attempt to improve on traditional methods of narrative review. As in the educational and behavioral sciences, the aim of meta-analysis in health-care disciplines is systematically aggregating and summarizing data from primary clinical trial studies to obtain a quantitative estimate of the overall effect of a particular treatment or clinical procedure on a defined outcome.
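The two-step procedure described above — a standardized mean difference for each study, then an inverse-variance weighted overall estimate — can be sketched in Python. This is a minimal illustration: the summary statistics are invented, and the large-sample variance formula is the standard one from Hedges and Olkin (1985), not a computation taken from this dissertation.

```python
import math

# Hypothetical per-study summaries: (mean_E, mean_C, pooled_sd, n_E, n_C)
studies = [
    (105.0, 100.0, 15.0, 30, 30),
    (103.0, 100.0, 14.0, 50, 45),
    (108.0, 101.0, 16.0, 25, 28),
]

def effect_size(mean_e, mean_c, sd_pooled):
    """Standardized mean difference between experimental and control groups."""
    return (mean_e - mean_c) / sd_pooled

def sampling_variance(d, n_e, n_c):
    """Large-sample sampling variance of d (Hedges & Olkin, 1985)."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))

ds = [effect_size(me, mc, sd) for me, mc, sd, _, _ in studies]
vs = [sampling_variance(d, ne, nc)
      for d, (_, _, _, ne, nc) in zip(ds, studies)]

# Overall estimate: inverse-variance weighted mean of the study effect sizes
weights = [1.0 / v for v in vs]
d_bar = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))
print(f"pooled effect size = {d_bar:.3f} (SE = {se:.3f})")
```

Note that the inverse-variance weights give larger studies (smaller sampling variance) more influence on the overall estimate, which is what justifies the weighting over a simple unweighted average.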
Many meta-analysts have reviewed and examined the methodology of meta-analysis as applied to clinical problems, especially to randomized controlled trials (Ottenbacher and Petersen, 1983; DerSimonian and Laird, 1986; L'Abbé, Detsky, and O'Rourke, 1987; Sacks et al., 1987; Jenicek, 1989; Thacker, 1988; Greenland, 1987). Gerberg and Horwitz (1988) presented guidelines for conducting meta-analysis for clinical studies. Huque (1988) defines meta-analysis as a statistical analysis which combines or integrates the results of several independent clinical trials considered by the meta-analyst to be integrable.

3. MULTIPLE DEPENDENT EFFECT SIZES

Educational and social researchers often try to examine and explain a behavioral phenomenon by collecting multiple measurements from each individual in the study. As a result of having multiple measurements, primary research studies are not always so simple to integrate and summarize. Thus, meta-analysts usually calculate multiple measures of the effect of the experimental treatment, depending on the number of outcome variables in each study in the review. Some of these studies compare different treatment groups to a single control group and are called multiple-treatments studies. Other studies compare a single treatment group to a single control group, but instead of obtaining a single outcome measure, multiple outcome measures are obtained where there are several subscales in the outcome measure or test. These will be referred to as multiple-measures studies. Moreover, another set of studies, which can be characterized as pretest-posttest study designs, compare a single treatment group to a single control group, and multiple pretest and posttest outcome measures are obtained from each study. These types of studies are referred to as pre-post multiple-measures studies.

4.
MULTIVARIATE STATISTICS

Having these correlated multiple effect magnitudes from each primary study in the review requires multivariate procedures of analysis (Hedges & Olkin, 1985; Raudenbush, Becker & Kalaian, 1988). Multivariate analysis refers to a collection of descriptive and inferential methods that have been developed for situations where we have more than one outcome variable and these outcome variables are correlated. Using multivariate procedures for analyzing meta-analysis data sets with a multivariate characterization has various advantages. For example, (a) it provides better parameter estimates because it handles the multiple effect sizes simultaneously, taking into account the interdependence among the outcome variables; (b) it controls Type I error rates; and (c) it facilitates statistical comparisons among outcomes.

5. PURPOSE OF THE PRESENT STUDY

This thesis will present a multivariate mixed-effects model (a multivariate hierarchical linear model) for meta-analysis that considers the multiple effect sizes from multiple-outcome or multiple-treatment studies as random, and then models these effect sizes or correlation coefficients as a function of study characteristics plus random error. Thus, this multivariate model takes into account three important characteristics of this type of data which often arise in meta-analysis. The first is having multiple effect sizes based on multiple dependent variables from each study. The second important characteristic is that different studies can have different subsets of dependent variables and consequently different numbers of effect sizes and correlations for each study. The third characteristic is that the effect sizes and the product-moment correlation coefficients from several studies are often viewed as random realizations from a population of possible effect sizes and correlation coefficients.
The application of the proposed multivariate mixed-effects model will be evaluated and examined empirically using artificial and real data sets. The artificial multiple effect sizes will be generated from the multivariate normal distribution with a specified mean vector and variance-covariance matrix. These effect sizes will be analyzed and compared by using the Hierarchical Linear Model (HLM) program (designed for analyzing multi-level data) and the V-Known routine (designed for meta-analysis purposes when the within-study variance-covariance matrices are known). The real data set represents the Scholastic Aptitude Test (SAT) coaching studies. These multiple effect sizes represent the effects of coaching on SAT-Verbal and SAT-Math scores. These effect sizes will be evaluated by using the HLM program.

6. ADVANTAGES OF USING MULTIVARIATE MIXED MODEL

The estimates and hypothesis-testing procedures generated by using the multivariate mixed linear model are fully multivariate techniques, since they take into account the correlations among the multiple effect sizes from each study, and they have several important properties. They allow one:

1. To distinguish between variation in the true multiple effect-size parameters for each study and the sampling covariation which results because effect sizes are estimated with error. That is,

   Total Covariation = Parameter Covariation + Error Covariation

2. To examine the differential effects of the treatment on the multiple outcome measures;

3. To test hypotheses about the effects of study characteristics and features on multiple study outcomes;

4. To estimate the variance-covariance matrix of the multiple random effects and test the hypothesis of no variation-covariation among the multiple effect-size parameters;

5. To find improved empirical Bayes estimates of multiple effect sizes and multiple product-moment correlation coefficients in each study;

6.
To include in the analysis different numbers of outcomes from each study as well as different predictors for the different outcome measures;

7. To provide more precise and stable parameter estimates.

7. ORGANIZATION OF THE PRESENT STUDY

This study contains eight chapters dealing with the theory and application of the multivariate mixed-effects model for meta-analysis and research integration. Chapter 2 will review the existing literature on the statistical approaches and methods of meta-analysis. Chapter 3 will present a description of the notation and the statistical terms used for the multivariate hierarchical linear model. Also, the theoretical background and notation for meta-analysis will be reviewed in this chapter. The multivariate mixed-effects model for meta-analysis will be introduced and developed in Chapter 4. First, the unconditional model (with no predictors in the model) will be illustrated. Second, the conditional model (where the variation among the multiple effect sizes is explained by study-level predictors) will be explained. Chapter 5 will deal with the estimation of the multivariate mixed-effects model proposed in this study. Also, the maximum likelihood method of estimation and the EM algorithm will be presented in order to obtain empirical Bayes estimates of the parameters in the model. In Chapter 6, an artificial multivariate effect-size data set will be generated using FORTRAN and IMSL subroutines. The results of applying the proposed model to these generated data using the HLM program for analyzing multi-level data and the V-Known routine for analyzing effect-size data will be compared. The findings of this chapter will help us to pursue the use of the HLM program for meta-analysis purposes, especially when there are missing effect sizes in the data set. Chapter 7 will present empirical results of applying the proposed multivariate mixed-effects model to Scholastic Aptitude Test (SAT) coaching data.
The results and conclusions based on fitting unconditional and conditional hierarchical linear models will be documented. Also, in this chapter, the applicability of the proposed multivariate mixed-effects model to obtaining multivariate fixed-effects parameter estimates of the effects of SAT coaching will be illustrated, and these parameter estimates will be compared to the estimates from the multivariate mixed-effects model. Finally, in Chapter 8, a concluding statement on the results of applying the proposed model to the artificially generated data and the SAT coaching studies will be presented. Also, the implications of the findings for further research related to multivariate effect-size meta-analysis will be discussed.

CHAPTER II

REVIEW OF THE LITERATURE

There has been much research and development progress in meta-analysis techniques in the last two decades. The developments have included tests of homogeneity of effect sizes, modeling heterogeneity using fixed-effects and random-effects models for univariate effect sizes and correlation coefficients, and modeling multivariate effect sizes for fixed-effects cases. In this chapter the statistical techniques used previously to analyze data from studies that have multiple outcome measures are reviewed.

1. UNIVARIATE APPROACHES

Despite the multivariate character of situations with multiple outcome variables from each study, the most frequently used procedure is to treat the multiple effect sizes separately, with one meta-analysis for each outcome measure (e.g., Giaconia & Hedges, 1982; Kulik & Kulik, 1984; Rosenthal & Rubin, 1978; White, 1976). This practice of dealing with multiple outcome effect sizes and correlation coefficients individually inflates Type I error rates for quantitative review results, which in turn decreases the future replicability of the research findings.
Moreover, conducting a separate meta-analysis for each outcome measure limits the kinds of research questions that the meta-analyst can address. For example, the research questions 'Does a specific treatment have differential effects on the multiple outcomes?' or 'Does a specific study characteristic have differential effects on the multiple product-moment correlation coefficients?' cannot be answered precisely and accurately using univariate meta-analysis procedures. Another common method of meta-analysis is to combine the estimates of the multiple effect sizes, such as by averaging or summing the effect sizes for the multiple outcomes or the multiple correlation coefficients (e.g., Iaffaldano & Muchinsky, 1985). Employing this pooling procedure may result in losing important information about variation between the multiple effect sizes, because a single treatment may have different effects on different outcome measures. This procedure is more appropriate when the outcomes represent or measure the same construct. Hedges and Olkin (1985) proposed a test for homogeneity of multiple effect sizes within each study and a pooling procedure under the assumption that the multiple outcomes are measures of a single construct. Univariate statistical theories for synthesizing research studies are described below.

1.1 Univariate Fixed-Effects

This approach stresses the estimation of a fixed and common population effect of the treatment across a series of studies which test the same research hypothesis (Glass, 1976; Hedges, 1981). The method involves the calculation of an estimate of effect size from each single study. The average of the effect-size estimates across studies for each outcome measure is used as an index of the overall effect size for each of the multiple outcome measures. Hedges (1982a) developed a test of homogeneity of effect-size estimates.
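A minimal sketch of such a homogeneity test, in the spirit of Hedges (1982a): with weights w_i = 1/v_i, the statistic Q = sum of w_i (d_i - d_bar)^2 is referred to a chi-square distribution with k - 1 degrees of freedom under the null hypothesis of a common population effect size. The effect sizes and variances below are hypothetical, invented for illustration.

```python
# Hypothetical effect sizes and their known sampling variances from k = 3 studies
d = [0.33, 0.21, 0.44]
v = [0.068, 0.042, 0.078]

w = [1.0 / vi for vi in v]
d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)

# Homogeneity statistic: Q ~ chi-square(k - 1) under H0 of a common effect size
Q = sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))
df = len(d) - 1
# Compare to a chi-square critical value, e.g. 5.991 for df = 2 at alpha = .05;
# a small Q is consistent with a common underlying population effect size.
print(f"Q = {Q:.3f} on {df} df")
```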
This test examines whether the observed effect-size estimates vary by more than would be expected if all studies shared a common underlying population effect size. Further, if the test of homogeneity fails, the meta-analyst tries to construct a weighted least squares regression model or a categorical model by regressing effect-size estimates on various known study features (Hedges, 1982b). The main reason to use a regression model is to explain the variability among the effect-size estimates from different studies by using known study characteristics as predictors.

1.2 Univariate Random-Effects

Contrary to the fixed-effects model, which assumes that there is a single underlying population effect of the treatment across all studies or that all the variation between studies can be explained by known study characteristics, the random-effects model assumes that the values of the effect sizes are sampled from a distribution of effect-size parameters. In other words, in the random-effects model there is no single true population effect; the true effects come from a distribution of effects. Thus, by using random-effects models, we can estimate the variance components of the distribution of the population of effect-size parameters as well as the variance components of the sampling distribution of the effect sizes. In other words, there are two sources of variation in the observed effect sizes: variability in the population effect-size parameter distribution, and variability of the effect-size estimates about the true parameter values. Rubin (1981) suggested a random-effects model to summarize the results from parallel randomized experiments.
Thus, his/model views study effects as being {andpmnrealizsimnset a Population of treatment fife???- Moreover, this model enables the researcher to estimate the variance of the treatment effect parameters. However, since the parallel randomized experiments have the same outcome measure, he did not incorporate the standardized effect-size estimates in his model. Also, he did not model the variation among the parallel experiments as a function of experiment characteristics. DerSimonian and Laird (1983) used the univariate random effects model in their meta-analysis to estimate an overall average effect of SAT coaching; .Also, they obtained empirical Bayes estimates of the individual study and program effects as well as their estimated variances via the EM algorithm using the maximum likelihood estimation procedure. Their outcome was not the effect size, 4, rather they looked at raw mean differences. Hedges (1983) developed the statistical theory for the random-effects model for effect sizes. ID1 this model the effect sizes are not assumed fixed but instead are viewed as sample realizations from a distribution of possible population effect size parameters with a :mean and ‘Variance to be estimated via methods of moments. Thus, by using this model, the observed variance among treatment effects can be l6 decomposed into two components (a) sampling error or conditional variability of the estimated effect sizes around its population effect sizes and (b) random variation of the individual study effect sizes around the mean population effect size. 1.3 Univariate Mixed-Effects The mixed-effects model corresponds to a setup with both fixed and random treatment effects. The random effects are the residuals (effect parameters minus predicted values) and the fixed effects are the effects of between study predictors. 
Raudenbush and Bryk (1985), building on the work of Rubin, provided a statistical theory for a univariate hierarchical linear model (mixed-effects model) for meta-analysis. Their model views the effect sizes as random and models the variation among the effect sizes as a function of study characteristics plus error. Also, their model enables the meta-analyst to find improved empirical Bayes estimates of individual effect sizes. Raudenbush (1988) reformulated the hierarchical linear model as the general mixed model. This model allows estimation of the random and fixed effects when the within-group predictor matrices are less than full rank.

2. MULTIVARIATE APPROACHES

We characterize a procedure as being "multivariate" when we have multiple effect sizes on the basis of having multiple dependent measures or multiple treatment groups compared to a common control group for each study. Consequently, we analyze this kind of data simultaneously by taking into account the intercorrelations among the multiple outcomes or the multiple treatments. That is, we consider a procedure as being multivariate when several measurements or treatments are modeled jointly.

2.1 Multivariate Fixed-Effects

Hedges and Olkin (1985) proposed a multivariate statistical theory for summarizing the results from different studies with multiple outcome measures. Their approach requires that all studies use the same number of outcome measures. However, they didn't provide a statistical model to explain the variability in multiple effect sizes as a function of study features and experimental conditions. Rosenthal and Rubin (1986) presented another method for
Also, they described a procedure for estimating the magnitude of the effect for a contrast among the multiple effect sizes of an individual study and for testing the significance of this contrast effect size. Their proposed meta-analytic procedures do not allow different predictors for the various dependent variables. They also did not provide a model to explain the variability in multiple effect sizes as a function of study characteristics. Raudenbush, Becker, and. Kalaian (1988) proposed generalized least squares (GLS) regression. to :model the variation between studies and to account for the interdependence among multiple.outcomes within studies. 'Eheir approach allows the meta-analyst to include in the analysis different numbers of outcome measures from each study and different sets of predictors for each outcome measure. They / view study effects as fixed, which means that (all the variation among the multiple study effects other than sampling variance and covariance can be explained as a function of study characteristics. 19 3. SUMIVIARY OF PREVIOUS META-ANALYSIS TECHNIQUES Four main techniques have been used previously to deal with studies that have multiple outcomes and consequently multiple effect sizes. The first and the most commonly used approach.is the univariate fixed-effects model where the meta- analyst conducts a separate meta-analysis for each outcome measure. The basic assumption of this model is that the treatment and control populations share a common effect size, and the existing differences among these effect sizes can be determined through the knowledge of some study characteristics (Glass, 1976; Hedges, 1981). The univariate random-effects model is the second approach where the investigator also deals with the multiple outcomes separatelyu By using this approach the researcher assumes that there is a distribution of true effects for the experimental and control populations (Rubin, 1981; Hedges, 1983). 
The third approach is the univariate mixed-effects approach (Raudenbush & Bryk, 1985; Raudenbush, 1988), where the estimated effect sizes can be modeled as a function of study characteristics plus random error. These univariate approaches all assume that multiple outcomes from each study are independent. The fourth approach is the multivariate fixed-effects model (Raudenbush, Becker & Kalaian, 1988; Gleser & Olkin, 1993), which assumes that the study effects are fixed and considers all the variation and covariation among the standardized multiple study effects, other than sampling variances and covariances, to be explainable as a function of study characteristics (study design, treatment conditions, contexts, etc.).

In summary, these previous meta-analysis techniques either did not account for the intercorrelations between the multiple outcome measures (univariate procedures) or assumed that the sizes of the multiple effects reported in each study depend strictly on known study characteristics and that all of the variation between these studies can be explained by these known predictors.

Table 1
Previous Meta-Analysis Approaches for Effect Size Data
[Table garbled in the source; it cross-classifies the univariate and multivariate approaches cited above by fixed-effects, random-effects, and mixed-effects models.]

CHAPTER III

NOTATION FOR MULTIVARIATE MIXED LINEAR MODEL

Here we should distinguish between three kinds of studies: multiple measures studies, multiple treatments studies, and pre-post multiple measures studies. In multiple measures studies, a single treatment group is compared to a single control group in each study, and multiple outcome measures are obtained from each study. On the other hand, in multiple-treatments studies, multiple treatment groups are compared to a common control group in each study on a single outcome variable, or multiple treatment group means are contrasted in each study.
As in multiple measures studies, in the third kind of study a single treatment group is compared to a single control group in each pretest-posttest study, and multiple pretest and posttest outcome measures are obtained from each study. This differentiation is made because (a) the estimated effect sizes and their variances for pre-post study designs are different from those of the other two kinds of studies, and (b) the formulas for estimating the covariances between the estimated effect sizes are different for the three kinds of studies. Thus, each type of study must be considered separately.

1. MULTIPLE MEASURES FOR EACH STUDY

The model for multivariate mixed meta-analysis for multiple measures studies assumes that we have K studies, each comparing an experimental treatment (E) to a control condition (C) on one or more of P_i outcome measures (in study i), where i = 1, 2, ..., K. Let the outcome measures Y^E_{ijp} and Y^C_{ijp} for person j on outcome p in study i be normally distributed with means \mu^E_{ip} and \mu^C_{ip}, respectively, and with common variance \sigma^2_{ip}. Thus, we assume that

Y^E_{ijp} ~ N(\mu^E_{ip}, \sigma^2_{ip}),
Y^C_{ijp} ~ N(\mu^C_{ip}, \sigma^2_{ip}),

where j = 1, 2, ..., n^E_i or j = 1, 2, ..., n^C_i subjects, i = 1, 2, ..., K studies, and p = 1, 2, ..., P_i outcome measures.

1.1 Glass's Estimate of Effect Size

Glass (1976) proposed that the standardized mean difference between the experimental and control groups for the pth outcome measure, Y_{ip}, in the ith study is

g_{ip} = (\bar{Y}^E_{ip} - \bar{Y}^C_{ip}) / S_{ip},

where \bar{Y}^E_{ip} and \bar{Y}^C_{ip} are the ith experimental and control group means, respectively, for the pth outcome measure, and S^2_{ip} is the pooled within-groups estimate of the sample variance, which can be calculated as

S^2_{ip} = [ (n^E_i - 1)(S^E_{ip})^2 + (n^C_i - 1)(S^C_{ip})^2 ] / (n^E_i + n^C_i - 2),

where S^E_{ip} and S^C_{ip} are the experimental and control group standard deviations, respectively.

1.2 Population Effect Size

Hedges (1981) developed the distribution theory for the effect size.
He indicated that g_{ip} estimates a population effect size for the pth outcome measure in the ith study. The parameter \delta_{ip} can be represented as

\delta_{ip} = (\mu^E_{ip} - \mu^C_{ip}) / \sigma_{ip},

where \sigma_{ip} is the pooled within-groups population standard deviation and \mu^E_{ip} and \mu^C_{ip} are the ith experimental and control population means for the pth outcome measure, respectively.

1.3 Unbiased Estimate of Effect Size

Hedges (1981) also indicated that Glass's estimator g_{ip} is a biased estimator of the population effect size \delta_{ip}, and he derived the minimum variance unbiased estimator, d_{ip}, which is approximately

d_{ip} = c(m_i) g_{ip},

where

m_i = n^E_i + n^C_i - 2,

and c(m_i) is approximated by

c(m_i) = 1 - 3 / (4 m_i - 1).

1.4 Distribution of Multiple Effect Sizes

For fixed values of \delta_{ip}, Hedges (1981) showed that this standardized effect-size estimator, d_{ip}, is asymptotically normally distributed with mean \delta_{ip} and variance \sigma^2(d_{ip}), which can be represented as

\sigma^2(d_{ip}) = (n^E_i + n^C_i) / (n^E_i n^C_i) + \delta^2_{ip} / [2(n^E_i + n^C_i)].

Since \delta_{ip} is not known, Hedges (1982a) provided the large-sample approximation of \sigma^2(d_{ip}) by substituting d_{ip} for \delta_{ip}. Thus, estimating \sigma^2(d_{ip}) for the pth outcome measure in the ith study requires one to replace \delta_{ip} by its estimate d_{ip} in the previous equation, or

\hat{\sigma}^2(d_{ip}) = (n^E_i + n^C_i) / (n^E_i n^C_i) + d^2_{ip} / [2(n^E_i + n^C_i)].

Given that this model allows different numbers of effect sizes based on different numbers of outcome measures for each study, the total number of comparisons between experimental and control groups is P, where P = \sum_i P_i. As noted above, P_i denotes the number of outcome measures in study i. Because the measurements for any subject within a study are correlated, the estimated multiple effect sizes will also be correlated. The correlations between the effect sizes d_{ip}, p = 1, 2, ..., P_i, in study i depend upon the correlations between the outcome measures for subjects in the experimental and control groups.
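As a concrete sketch of the estimators defined so far (in Python, with illustrative argument names that are not taken from the original text), Glass's g_{ip}, the bias-corrected d_{ip}, and its large-sample variance can be computed as follows:

```python
import math

def glass_g(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Glass's standardized mean difference using the pooled within-groups SD."""
    pooled_var = ((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2)
    return (mean_e - mean_c) / math.sqrt(pooled_var)

def hedges_d(g, n_e, n_c):
    """Approximately unbiased estimator d = c(m) * g with m = n_e + n_c - 2."""
    m = n_e + n_c - 2
    return (1.0 - 3.0 / (4.0 * m - 1.0)) * g

def var_d(d, n_e, n_c):
    """Large-sample variance of d, substituting d for the unknown delta."""
    return (n_e + n_c) / (n_e * n_c) + d**2 / (2.0 * (n_e + n_c))
```

For example, with group means 105 and 100, a common standard deviation of 10, and 30 subjects per group, glass_g returns 0.5 and hedges_d shrinks it slightly toward zero, reflecting the small-sample bias correction.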
However, not all studies report sample correlations among the outcome measures, which forces us to impute values for the population correlations from other sources (published test manuals, other studies, etc.). Thus, the covariances between the effect sizes of any two outcome measures p and p' in a study can be calculated using the correlation coefficient between the outcome measures (\rho_{ip,ip'}), the population effect sizes for the pair of outcome measures, and the sample sizes for the experimental and control groups. Gleser and Olkin (1994) derived the large-sample covariance \sigma(d_{ip}, d_{ip'}) between d_{ip} and d_{ip'}, which can be calculated as follows:

\sigma(d_{ip}, d_{ip'}) = (1/n^E_i + 1/n^C_i) \rho_{ip,ip'} + [\delta_{ip} \delta_{ip'} \rho^2_{ip,ip'}] / [2(n^E_i + n^C_i)].

Estimating \sigma(d_{ip}, d_{ip'}) requires us to replace the effect sizes \delta_{ip} by their estimates d_{ip} and to replace \rho_{ip,ip'} by either the calculated sample correlations from each study or the imputed values r_{ip,ip'}. Thus,

\hat{\sigma}(d_{ip}, d_{ip'}) = (1/n^E_i + 1/n^C_i) r_{ip,ip'} + [d_{ip} d_{ip'} r^2_{ip,ip'}] / [2(n^E_i + n^C_i)].

Thus, having estimated the variances and the covariances of the effect sizes for each study, we obtain the estimated variance-covariance matrix \Sigma_i for each study. Its diagonal elements are the variances and the off-diagonal elements are the covariances. By "stacking up" these K covariance matrices along the diagonal of a matrix we get the estimated covariance matrix, \Sigma, of the sampling errors. So, \Sigma is a P by P matrix with the \Sigma_i stacked along the diagonal, and all off-diagonal block matrices are zero because we assume that the individual studies are independent. Thus, the matrix \Sigma can be represented as

\Sigma = diag(\Sigma_1, \Sigma_2, ..., \Sigma_K).

2. PRE-POST MULTIPLE MEASURES FOR EACH STUDY

Another method for estimating effect sizes is using the standardized mean-change measure for pretest-posttest designs outlined by Becker (1988).
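Before turning to pre-post designs, the covariance computation and the block-diagonal stacking of \Sigma just described can be sketched as follows (a Python illustration with hypothetical inputs; plain numpy, not any particular meta-analysis package):

```python
import numpy as np

def cov_d(d_p, d_q, r_pq, n_e, n_c):
    """Estimated large-sample covariance between two effect sizes from the
    same study, using the sample (or imputed) correlation r_pq."""
    return (1.0 / n_e + 1.0 / n_c) * r_pq + (d_p * d_q * r_pq**2) / (2.0 * (n_e + n_c))

def study_sigma(d, r, n_e, n_c):
    """P_i x P_i matrix Sigma_i for one study: variances on the diagonal,
    covariances off the diagonal. d is the list of effect sizes, r the
    correlation matrix among the study's outcome measures."""
    p = len(d)
    sigma = np.empty((p, p))
    for a in range(p):
        for b in range(p):
            if a == b:
                sigma[a, b] = (n_e + n_c) / (n_e * n_c) + d[a]**2 / (2.0 * (n_e + n_c))
            else:
                sigma[a, b] = cov_d(d[a], d[b], r[a][b], n_e, n_c)
    return sigma

def stack_sigma(sigmas):
    """Stack the K study matrices along the diagonal; off-diagonal blocks
    stay zero because the studies are assumed independent."""
    p = sum(s.shape[0] for s in sigmas)
    out = np.zeros((p, p))
    start = 0
    for s in sigmas:
        k = s.shape[0]
        out[start:start + k, start:start + k] = s
        start += k
    return out
```

A study reporting two outcomes contributes a 2 x 2 block, a study reporting one outcome a 1 x 1 block, so studies with different subsets of outcomes fit into the same P x P matrix.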
For multiple outcome measures from each study, the standardized mean-change measure is estimated separately for each of the multiple outcomes for the experimental and control samples. For instance, a study with one experimental and one control group for each outcome measure would have two standardized mean changes for each outcome, each computed as the difference in mean performance between the posttest and pretest divided by the pretest standard deviation.

2.1 Estimated Standardized Mean-Change Measure

For each of the K (i = 1, 2, ..., K) studies, let g^E_{ip} and g^C_{ip} denote the standardized mean-change measures for the experimental and control groups, respectively, which can be represented as

g^E_{ip} = (\bar{Y}^{E,post}_{ip} - \bar{Y}^{E,pre}_{ip}) / S^E_{ip}   and   g^C_{ip} = (\bar{Y}^{C,post}_{ip} - \bar{Y}^{C,pre}_{ip}) / S^C_{ip},

where \bar{Y}^{E,pre}_{ip} and \bar{Y}^{C,pre}_{ip} represent the pretest means for the experimental and control groups, respectively; \bar{Y}^{E,post}_{ip} and \bar{Y}^{C,post}_{ip} represent the posttest means for the experimental and control groups, respectively; and S^E_{ip} and S^C_{ip} represent their respective pretest standard deviations. For each of the multiple outcome measures, separate standardized mean-change measures are computed for the experimental and control groups.

2.2 Unbiased Standardized Mean-Change Measure

Becker (1988) indicated that these standardized mean-change measures are slightly biased estimates of the population standardized mean-change parameters, and she derived the unbiased estimates of these standardized mean-change measures. The unbiased estimates of the experimental and control standardized mean changes are

d^E_{ip} = c(n^E_i - 1) g^E_{ip}   and   d^C_{ip} = c(n^C_i - 1) g^C_{ip},

where n^E_i and n^C_i are the sample sizes for the experimental and control groups.

2.3 Distribution of Standardized Mean-Change Measure

For fixed values of the population standardized mean-change measures, the estimated experimental and control standardized mean-change measures (d^E_{ip} and d^C_{ip}) are asymptotically normally distributed with means \delta^E_{ip} and \delta^C_{ip} and variances \sigma^2(d^E_{ip}) and \sigma^2(d^C_{ip}), respectively.
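The per-group computations of Sections 2.1 and 2.2 can be sketched as follows. Note that the source's equations for the bias correction were garbled, so applying the c(m) = 1 - 3/(4m - 1) factor with m = n - 1 is an assumption reconstructed by analogy with the Hedges correction used earlier in this chapter, not a quotation of Becker's (1988) result:

```python
def std_mean_change(post_mean, pre_mean, pre_sd):
    """One group's standardized mean change: (posttest - pretest) / pretest SD."""
    return (post_mean - pre_mean) / pre_sd

def unbiased_mean_change(g, n):
    """Approximate small-sample bias correction for a standardized mean change.
    The form c(m) = 1 - 3/(4m - 1) with m = n - 1 is an assumed reconstruction."""
    m = n - 1
    return (1.0 - 3.0 / (4.0 * m - 1.0)) * g
```

For instance, a group whose mean rises from 100 to 110 with a pretest standard deviation of 20 has a standardized mean change of 0.5, which the correction shrinks slightly toward zero.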
Thus, the estimated variances of d^E_{ip} and d^C_{ip} are

Var(d^E_{ip}) = [4(1 - r^E_{ip}) + (d^E_{ip})^2] / (2 n^E_i),

and

Var(d^C_{ip}) = [4(1 - r^C_{ip}) + (d^C_{ip})^2] / (2 n^C_i).

2.4 Effect Size Estimate

The estimated effect size \Delta_{ip} for each outcome measure is the difference between the experimental and control unbiased standardized mean-change measures for each of the outcome measures within each of the K studies, denoted as

\Delta_{ip} = d^E_{ip} - d^C_{ip}.

Thus, studies that examine the effects of an experimental treatment on p outcome measures will have p effect sizes.

2.5 Distribution of Effect Sizes

For fixed values of \Delta_{ip}, the estimate of the asymptotic variance of each of the estimated multiple effect sizes \Delta_{ip} is

Var(\Delta_{ip}) = [4(1 - r^E_{ip}) + (d^E_{ip})^2] / (2 n^E_i) + [4(1 - r^C_{ip}) + (d^C_{ip})^2] / (2 n^C_i),

where r^E_{ip} and r^C_{ip} are the estimates of the pretest-posttest correlations for the experimental and control groups, respectively. The covariance between \Delta_{ip} and \Delta_{ip'} is estimated as

Cov(\Delta_{ip}, \Delta_{ip'}) = r_{ip,ip'} [ sqrt( V(d^E_{ip}) V(d^E_{ip'}) ) + sqrt( V(d^C_{ip}) V(d^C_{ip'}) ) ],

where r_{ip,ip'} is the estimated correlation coefficient between the pairs of correlated outcome measures within study i.

As with multiple measures studies, having the estimated variances and covariances of the effect sizes for each study, we obtain the estimated variance-covariance matrix \Sigma_i for each study. Its diagonal elements are the variances and the off-diagonal elements are the covariances. Stacking up these K covariance matrices along the diagonal of a matrix produces the estimated covariance matrix, \Sigma, of the sampling errors. This \Sigma variance-covariance matrix has the same structure as the variance-covariance matrix for multiple measures studies developed in the previous section of this chapter.

3. MULTIPLE TREATMENTS FOR EACH STUDY

The model for multivariate mixed meta-analysis for multiple-treatments studies assumes that we have K studies, each comparing T experimental treatment groups (E_t) to a common control group.
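The pre-post quantities of Sections 2.4 and 2.5 can be sketched as follows (Python, with illustrative names; a numerical companion to the formulas above rather than the dissertation's own software):

```python
import math

def var_mean_change(d, r_prepost, n):
    """Estimated variance of one group's standardized mean change:
    [4(1 - r) + d^2] / (2n)."""
    return (4.0 * (1.0 - r_prepost) + d**2) / (2.0 * n)

def prepost_effect(d_e, d_c, r_e, r_c, n_e, n_c):
    """Effect size Delta = d_E - d_C and its large-sample variance,
    the sum of the two groups' variances."""
    delta = d_e - d_c
    var = var_mean_change(d_e, r_e, n_e) + var_mean_change(d_c, r_c, n_c)
    return delta, var

def cov_prepost(r_pq, v_e_p, v_e_q, v_c_p, v_c_q):
    """Covariance between Delta_p and Delta_q from the same study:
    r_pq * [sqrt(V(d_E_p) V(d_E_q)) + sqrt(V(d_C_p) V(d_C_q))]."""
    return r_pq * (math.sqrt(v_e_p * v_e_q) + math.sqrt(v_c_p * v_c_q))
```

These per-study variances and covariances fill the diagonal blocks \Sigma_i exactly as in the multiple measures case.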
[Figure: SAT coaching effect sizes plotted against log(coaching hours), shown separately for SAT-Verbal and SAT-Math.]

The results of fitting the conditional hierarchical linear model (Table 7) show that the logarithmically transformed duration of coaching has a significant positive effect on SAT-Math coaching effect sizes even after controlling for other variables in the model (B = 0.15, p = 0.04). As we can see in Table 7, no other variables studied in this review had a significant effect on SAT scores. Also, the results show that, after accounting for some of the study characteristics, considerable and significant variability was still left in the coaching effect sizes (\hat{\tau}^2 = 0.008, p = 0.03 for SAT-Verbal and \hat{\tau}^2 = 0.03, p = 0.000 for SAT-Math). Furthermore, the results show that the estimated covariance between SAT-Verbal and SAT-Math effect sizes is about -0.01.

6. FIXED- AND MIXED-EFFECTS MODELS COMPARED

Although the fixed-effects approach is statistically developed (Raudenbush, Becker, and Kalaian, 1988; Gleser and Olkin, 1994), the actual analytical procedure is complex and requires special computer skills from meta-analysts in order to perform a meta-analytic review. Thus, in this section, the multivariate fixed-effects model is carried out by applying multiple regression analysis, using the available standard statistical packages (SPSS-PC, SAS, SYSTAT, etc.), to the transformed GLS within-study model which is developed in Chapter 4. Additionally, the parameter estimates of this application (the multivariate fixed-effects model) to the SAT coaching data set are compared to the parameter estimates obtained from applying the multivariate mixed-effects model which is developed in Chapter 4. From the findings of the application of the multivariate mixed-effects model to the SAT coaching data in the previous section, I learned that duration of coaching was the only significant
explanatory variable. Thus, for comparison purposes, the number of coaching hours is considered in this section as the predictor variable in the model. The results of fitting the conditional multivariate mixed-effects model show that the logarithmically transformed duration of coaching has a significant positive effect on SAT-Math coaching effect sizes (Table 8). On the other hand, the results of fitting the conditional multivariate fixed-effects model (Table 8) show that the logarithmically transformed coaching hours is not statistically significant. Also, from these results, we can see that the multivariate fixed-effects model yielded standard errors for the beta coefficients that were smaller than those from the mixed-effects model.

7. DISCUSSION

The results of the multivariate hierarchical linear model for coaching effect sizes showed that both SAT coaching programs, on average, had positive effects of about 0.11 of a standard deviation, or about six points, for both SAT-Verbal and SAT-Math scores. Also, the results indicated that the average SAT-Verbal effect size is not significantly different from the average SAT-Math effect size. However, although we found great variability in the effects of coaching for both subtests, the coaching effects for SAT-Math were more variable than the SAT-Verbal coaching effects. When we modeled the variability of the effect sizes as a function of study features, student contact hours was the only significant predictor (especially for SAT-Math effect sizes), even after we controlled for the other predictors in the model. This result agrees with the previous findings of Messick and Jungeblut (1981) and Kalaian and Becker (1986), who found that duration of coaching had a strong effect on SAT scores. I also discovered that the design of the study, the publication year, and whether or not the coaching program was sponsored by Educational Testing Service did not have significant effects in explaining the variability in coaching studies.
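The fixed-effects comparison in Section 6 rests on GLS estimation of the regression of the stacked effect sizes on study characteristics. A minimal sketch of the GLS estimator beta = (X' \Sigma^{-1} X)^{-1} X' \Sigma^{-1} d with its standard errors (an illustration under stated assumptions, not the dissertation's actual procedure or software):

```python
import numpy as np

def gls_fixed_effects(d, X, sigma):
    """GLS fit of stacked effect sizes d on design matrix X, given a known
    sampling covariance matrix sigma. Returns (beta, standard errors)."""
    sigma_inv = np.linalg.inv(sigma)
    info = X.T @ sigma_inv @ X          # information matrix X' S^-1 X
    cov_beta = np.linalg.inv(info)      # covariance of beta-hat
    beta = cov_beta @ (X.T @ sigma_inv @ d)
    return beta, np.sqrt(np.diag(cov_beta))
```

With sigma equal to the identity this reduces to ordinary least squares. In the mixed-effects model, sigma would additionally carry the between-study variance components tau^2, which is consistent with the observation above that the mixed-effects model yields larger standard errors than the fixed-effects model.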
In comparing the results of analyzing the SAT coaching effect sizes using the multivariate mixed-effects model and the multivariate fixed-effects model, the logarithmically transformed coaching hours yielded a significant positive effect on SAT-Math effect sizes only under the multivariate mixed-effects model. These results support the existence of parameter variability in the coaching studies that should be accounted for by using mixed-effects models.

Table 3
Effect Sizes of SAT Coaching Studies

Study | Year | n_C | n_E | Delta_V | Delta_M | Hours | ETS | Study Type | Homework

Randomized Studies
Alderman & Powers (A) | 1980 | 28 | 22 | 0.22 | . | 7 | 1 | 1 | 1
Alderman & Powers (B) | 1980 | 39 | 40 | 0.09 | . | 10 | 1 | 1 | 1
Alderman & Powers (C) | 1980 | 22 | 17 | 0.14 | . | 10.5 | 1 | 1 | 1
Alderman & Powers (D) | 1980 | 48 | 43 | 0.14 | . | 10 | 1 | 1 | 1
Alderman & Powers (E) | 1980 | 25 | 74 | -0.01 | . | 6 | 1 | 1 | 1
Alderman & Powers (F) | 1980 | 37 | 35 | 0.14 | . | 5 | 1 | 1 | 1
Alderman & Powers (G) | 1980 | 24 | 70 | 0.18 | . | 11 | 1 | 1 | 1
Alderman & Powers (H) | 1980 | 16 | 19 | 0.01 | . | 45 | 1 | 1 | 1
Evans & Pike (A) | 1973 | 145 | 129 | 0.13 | 0.12 | 21 | 1 | 1 | 1
Evans & Pike (B) | 1973 | 72 | 129 | 0.25 | 0.08 | 21 | 1 | 1 | 1
Evans & Pike (C) | 1973 | 71 | 129 | 0.31 | 0.09 | 21 | 1 | 1 | 1
Laschewer | 1986 | 13 | 14 | 0.00 | 0.08 | 8.9 | 0 | 1 | 0
Roberts & Oppenheim (A) | 1966 | 43 | 37 | 0.01 | . | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (B) | 1966 | 19 | 13 | 0.67 | . | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (D) | 1966 | 16 | 11 | -0.66 | . | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (E) | 1966 | 20 | 12 | -0.21 | . | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (F) | 1966 | 39 | 28 | 0.31 | . | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (G) | 1966 | 38 | 25 | . | 0.26 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (H) | 1966 | 18 | 13 | . | -0.41 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (I) | 1966 | 19 | 13 | . | 0.08 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (J) | 1966 | 37 | 22 | . | 0.30 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (K) | 1966 | 19 | 11 | . | -0.53 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (L) | 1966 | 17 | 13 | . | 0.12 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (M) | 1966 | 20 | 12 | . | 0.26 | 7.5 | 1 | 1 | 0
Roberts & Oppenheim (N) | 1966 | 20 | 13 | . | 0.47 | 7.5 | 1 | 1 | 0
Zuman (B) | 1988 | 16 | 17 | 0.14 | 0.51 | 24 | 0 | 1 | 1

Table 3 (cont.)
Effect Sizes of SAT Coaching Studies

Study | Year | n_C | n_E | Delta_V | Delta_M | Hours | ETS | Study Type | Homework
Matched Studies
Burke (A) | 1986 | 25 | 25 | 0.50 | . | 50 | 0 | 2 | 1
Burke (B) | 1986 | 25 | 25 | 0.74 | . | 50 | 0 | 2 | 1
Coffin (A) | 1987 | 8 | 8 | -0.20 | 0.37 | 18 | 0 | 2 | 0
Davis | 1985 | 22 | 21 | 0.14 | -0.14 | 15 | 0 | 2 | 0
Frankel | 1960 | 45 | 45 | 0.13 | 0.35 | 30 | 0 | 2 | 0
Kintisch | 1979 | 38 | 38 | 0.14 | . | 20 | 0 | 2 | 1
Whitla | 1962 | 52* | 52* | 0.09 | -0.11 | 10 | 1 | 2 | 1

Nonequivalent Comparison Studies
Curran (A) | 1988 | 21 | 17 | . | . | 6 | 0 | 3 | 0
Curran (B) | 1988 | 24 | 17 | . | . | 6 | 0 | 3 | 0
Curran (C) | 1988 | 20 | 17 | . | . | 6 | 0 | 3 | 0
Curran (D) | 1988 | 20 | 17 | . | . | 6 | 0 | 3 | 0
Dear | 1958 | 60 | 526 | -0.02 | 0.21 | 15 | 1 | 3 | 1
Dyer | 1953 | 225 | 193 | 0.06 | 0.27 | 15 | 1 | 3 | 1
French (B) | 1955 | 110 | 158 | 0.06 | . | 4.5 | 1 | 3 | 1
French (C) | 1955 | 161 | 158 | 0.20 | . | 15 | 1 | 3 | 1
FTC (A) | 1978 | 192 | 684 | 0.34 | 0.31 | 40 | 0 | 3 | 0
Keefauver | 1976 | 16 | 25 | 0.19 | -0.20 | 14 | 0 | 3 | 0
Lass | 1961 | 38 | 82 | 0.03 | 0.11 | . | 1 | 3 | 1
Reynolds & Oberman | 1987 | 93 | 47 | -0.04 | 0.59 | 63 | 0 | 3 | 1
Teague | 1992 | 10 | 15 | 0.40 | . | 18 | 0 | 3 | 0
Zuman (A) | 1988 | 21 | 34 | 0.56 | 0.59 | 27 | 0 | 3 | 1

* The sample sizes for SAT-V were n_C = 52 and n_E = 52.

Table 4
Characteristics and Features of SAT Coaching Studies

Characteristic | Coded Values
Randomized Study | (1) yes (0) no
Student Voluntariness | (1) yes (0) no
Presence of Verbal Coaching | (1) yes (0) no
Presence of Math Coaching | (1) yes (0) no
Assignment of Homework | (1) yes (0) no
ETS Sponsored Research | (1) yes (0) no
Publication Year | last two digits of the year
Coaching Duration | log (hours)

Table 5
Frequency Distribution of Student Contact Hours

Categories (in hours) | SAT-V Samples | SAT-M Samples
4.5 - 10 | 18 | 15
10.5 - 20 | 10 | 6
20.5 - 30 | 6 | 6
30.5 - 40 | 1 | 1
40.5 - 50 | 3 | 0
> 50.5 | 1 | 1
Mean | 17.2 | 15.4
S.D. | 14.4 | 12.8
Total | 39 | 28

Table 6
Fitting Unconditional Model Results

Fixed and Random Effects | Coefficient | Standard Error | t-ratio | P-value
For SAT-V:
Intercept | 0.118 | 0.021 | 5.51 | 0.00
tau^2 estimate | [garbled in source]
For SAT-M:
Intercept | 0.125 | 0.039 | 3.18 | 0.004
tau^2 estimate | 0.03

Table 7
Fitting Conditional Model Results
Fixed and Random Effects | Coefficient | Standard Error | t-ratio | P-value
For SAT-V:
Intercept | 0.099 | 0.049 | 2.06 | 0.06
Year | 0.002 | 0.004 | 0.48 | 0.39
log (hours) | 0.075 | 0.002 | 1.94 | 0.13
ETS | 0.079 | 0.118 | 0.68 | 0.36
Randomized | 0.003 | 0.089 | 0.03 | 0.38
tau^2_v estimate | 0.008
For SAT-M:
Intercept | 0.057 | 0.32 | 0.77 | 0.29
Year | -0.000 | 0.39 | -0.19 | 0.39
log (hours) | 0.15 | 0.04 | 2.47 | 0.02
ETS | -0.016 | 0.39 | -0.12 | 0.39
Randomized | 0.07 | 0.34 | 0.63 | 0.32
tau^2_m estimate | 0.03

Table 8
Comparison of Multivariate Fixed-Effects and Mixed-Effects Model Estimates
[Table garbled in source; it reports the intercept and log(coaching hours) coefficients with standard errors for both models, with a footnote indicating significance at p < .05.]