This is to certify that the thesis entitled Robustness and Power of Multivariate Tests for Trends in Repeated Measures Data Under Variance-Covariance Heterogeneity, presented by Gabriella Belli, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Counseling, Educational Psychology, & Special Education (Statistics & Research Design).
ROBUSTNESS AND POWER OF MULTIVARIATE TESTS FOR TRENDS IN REPEATED MEASURES DATA UNDER VARIANCE-COVARIANCE HETEROGENEITY

by

Gabriella M. Belli

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Counseling, Educational Psychology and Special Education

1983

© Copyright by GABRIELLA BELLI 1984

ABSTRACT

ROBUSTNESS AND POWER OF MULTIVARIATE TESTS FOR TRENDS IN REPEATED MEASURES DATA UNDER VARIANCE-COVARIANCE HETEROGENEITY

By

Gabriella M. Belli

Multivariate statistics are subject to the assumption of homoscedasticity (i.e., equal covariance matrices across groups). In a repeated measures (RM) design with time-ordered data, three hypotheses are tested: (1) between-group differences, (2) within-group trends over occasions, and (3) group by occasion interactions. Although the effects of assumption violation on tests of the between-group hypothesis have been investigated, the effects on tests of the within-group and interaction hypotheses have not. An argument is presented indicating that multivariate tests for interactions should behave like between-group tests, but that tests for within-group trends should not.

The primary purpose of this Monte Carlo investigation was to determine whether heteroscedasticity has a differential effect on the robustness of multivariate tests of main effects in the RM case. A secondary purpose was to evaluate the robustness and power of multivariate tests of two within-group hypotheses: (1) overall tests of trends, and (2) subsequent tests of trends higher than linear, under various combinations of number of groups and equal sample sizes.
The test statistics were Roy's largest root, R; the Hotelling-Lawley trace, T; the Pillai-Bartlett trace, V; and Wilks' likelihood ratio, W.

The following are the major conclusions drawn from the investigation.

(1) Multivariate tests of within-group trends are considerably more robust to heteroscedasticity than are multivariate tests of between-group differences.
(2) Within-group tests of trends higher than linear are slightly more robust than overall tests of trends.
(3) Departures of empirical Type I error from nominal alpha for within-group tests increase as heterogeneity, sample size, or alpha increase, but not as dramatically as for between-group tests.
(4) Increasing the number of equal groups does not have a consistent detrimental effect on the robustness of within-group tests.
(5) For low and moderate heterogeneity (i.e., covariance matrices differing by factors of two or four), the power of within-group tests increases as total sample size, N, increases.
(6) For high heterogeneity (i.e., covariance matrices differing by a factor of nine), the power of within-group tests increases with a decrease in the number of discrepant score vectors, rather than with an increase in N.

ACKNOWLEDGEMENTS

In appreciation for his support, insightful questions and comments, and willingness to let me pursue my ideas, I wish to thank Dr. Andrew C. Porter. He has been a valuable friend, colleague, and chairperson and has provided a strong model of professionalism and excellence as both teacher and researcher throughout my master's and doctoral programs.

I would also like to thank Dr. Richard F. Houang for the many thought-provoking discussions, as well as for his constant availability, during the conceptualization stage of this research.
I wish to express my appreciation to the rest of my committee, Drs. James H. Stapleton, Robert E. Floden, and William H. Schmidt, for reviewing my work and providing suggestions for improvement.

I would further like to express my appreciation to Dr. Joe L. Byers, who helped provide the computer time, and to Mr. Jeff Glass, who coded the FORTRAN program. Both were essential to making this research possible.

Most importantly, I express my gratitude to my husband, Dr. Robert E. Krapfel, without whose moral support and encouragement this work would not have been completed. His patience and understanding have been immeasurable.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

Chapter
I. STATEMENT OF THE PROBLEM
II. MULTIVARIATE ANALYSIS OF VARIANCE
   General Multivariate Linear Model
   Multivariate Generalization for Repeated Measures
   Hypothesis Testing
   Tests of Significance
   Theoretical Comparison of Tests
III. REVIEW OF THE LITERATURE
   Strategies for Investigating Robustness to Heterogeneity
   Consequences of Non-independence and Non-normality
   Consequences of Heterogeneity
      Fixed-model ANOVA
      Mixed-model Repeated Measures
      Two-sample MANOVA: Hotelling's T²
      General MANOVA Test Statistics
IV. METHOD
   Reduction to Canonical Form
   Parameters of the Study
   Procedures
      Determination of Critical Values
      Design for the Study
      Empirical Comparisons of Tests
      Analysis of Within-group Tests
      Interpretation of Obtained Probability Values
   Monte Carlo Techniques
      Random Number Generator
      Creation of Normal Deviates
      Transformation to Desired Structure
V. RESULTS
   Comparison of Tests on Robustness
   Power of Within-group Tests of Trends
   Robustness Under Various Conditions
      Sample Size and Robustness
      Number of Groups and Robustness
   Power Under Various Conditions
      Sample Size and Power
      Number of Groups and Power
      Total Sample Size and Power
VI. DISCUSSION
   Conclusions
   Guidelines for the Researcher
   Suggestions for Future Research
APPENDICES
   A. Computational Details and Computer Program
   B. Monte Carlo Critical Values
   C. Significance Levels for Between-group Tests
   D. Significance Levels for Within-group Tests of Non-linearity
   E. Power Rates for Within-group Tests
BIBLIOGRAPHY

LIST OF TABLES

Table
2-1 Multivariate Analysis of Variance (k-sample)
2-2 Multivariate Analysis of Variance for Repeated Measures
2-3 SSCP Matrices for RM Tests
2-4 Multivariate Test Statistics
4-1 Design for the Study
4-2 Standard Errors for Nominal Alpha Levels and Number of Replications Used in the Study
5-1 Percentage Exceedance Rates of Monte Carlo Critical Values for Multivariate Tests under True Null Hypotheses
5-2 Differences in Percentage Exceedance Rates Under True Null Hypotheses
5-3 Percentage Exceedance Rates for Within-group Tests under True Alternatives
5-4 Percentage Exceedance Rates for Within-group Tests with Modified True Alternatives
5-5 Percentage Exceedance Rates Under a True Null for Tests of Trends with k = 3
5-6 Percentage Exceedance Rates Under a True Null for Tests of Trends with n = 20
5-7 Average Percentage Exceedance Rates Under True Alternatives for Tests of Trends with k = 3
5-8 Average Percentage Exceedance Rates Under True Alternatives for Tests of Trends with n = 20

LIST OF FIGURES

Figure
5-1 Trend transformation for power results of Table 5-3 with mean vectors (0 .4 .8 .5 .1) for p = 5 and (0 .4 .8 .5) for p = 4
5-2 Second trend transformation for power results of Table 5-4 with mean vectors (0 .6 .7 .2 .05) for p = 5 and (0 .6 .7 .2) for p = 4
5-3 Power curves averaged over four test statistics for different total sample sizes N, where:
   N = 30, 40, 60, 120, 150
   k = 3, 2, 3, 6, 3
   n = 10, 20, 20, 20, 50

CHAPTER I

STATEMENT OF THE PROBLEM

Classical experimental research involves investigating the effect of manipulating one or more independent variables on a single dependent variable. This involves either testing the null hypothesis of equal group means against a general alternative or testing specific planned comparisons among the group means. The test statistic used is the F-test (or the t-test for two groups). Given parametric assumptions, this is the uniformly most powerful test that is invariant with respect to linear transformations (Scheffé, 1959).

Generalizing to the multivariate case, where there are two or more dependent variables (say, p),
the corresponding null hypothesis is that of no differences among the k group vectors, where each vector consists of the group means on the p dependent measures. The F-test is a univariate test statistic, and several generalizations of it have been proposed for significance testing in the multivariate case. Among those tests that are invariant under linear transformation of the dependent variables, Hotelling's T² statistic is the uniformly most powerful for one-sample tests of means and two-sample tests of mean differences (Anderson, 1958). Four other commonly used test statistics are Roy's largest root, R; the Hotelling-Lawley trace, T; the Pillai-Bartlett trace, V; and Wilks' likelihood ratio, W. However, for situations where there are both multiple dependent variables and more than two groups, no test has emerged that is both invariant with respect to linear transformations and uniformly most powerful.

A specialized case of the multivariate analysis of variance (MANOVA) deals with situations where the same measure is taken repeatedly on the same individuals. The design on the measures, or occasions of testing, may reflect the passage of time, with the same measure taken at equally spaced intervals, or it may represent a factorial structure, with the same measure taken after various treatment interventions. In addition to the usual multivariate hypothesis of group differences, hypotheses about the occasions and, if there are multiple groups, about group by occasion interaction may be tested. The null hypothesis for occasions is that of no differences among the p occasion vectors, where each vector consists of the occasion means for the k groups. When there is only one group, or when no group by occasion interaction exists, of even greater interest is the testing of hypotheses about the trend the data follow, assuming equally spaced time points, or about contrasts among the various measures, assuming a factorial design. Tests for these hypotheses are all within-group tests, as opposed to between-group tests in the usual MANOVA sense.

In both the univariate and the multivariate cases, the test statistics used are based on certain distributional assumptions. These are that the random errors or error vectors for the p measures are (1) independently and (2) normally or multivariate normally distributed (3) with a common variance or variance-covariance matrix. Violations of these assumptions may lead to erroneous conclusions. However, if a particular test is insensitive to violation of one or more of the assumptions when the null hypothesis is true (i.e., if it leads to conclusions similar to what would be expected given the assumptions), then the test is said to be "robust" with respect to the violation.

The assumption of independence is critical, and no test can favorably withstand its violation. Non-independence of the observations or of observational vectors due to faulty operationalization of the experimental design is a serious threat to nominal alpha levels. In univariate situations, the F-test for fixed effects has been shown to be fairly robust with respect to violation of normality and, for balanced designs, of homogeneity (see Glass, Peckham, and Sanders, 1972). However, severe departures from the nominal significance level may occur under heterogeneity when samples are small and unequal (Scheffé, 1959).

Regarding between-group differences in multivariate situations, the several tests respond differently to violations of the assumptions (for a review, see Ito, 1980). Generalizing, it may be said that robustness results for fixed effects of at least some of these tests are similar to those in the univariate case.
They are robust to non-normality and also fairly robust to heteroscedasticity (i.e., violation of homogeneity of variance and covariance) in balanced, two-group designs, but are not so for unbalanced designs. However, even with two groups, the tests become liberal with increases in the number of dependent variables or the amount of heterogeneity. With more than two groups, tests are robust only if samples are equal and extremely large. If samples are unequal, even moderate heterogeneity has large effects on significance level and power (Ito and Schull, 1964).

To date, no studies have considered the robustness to violation of multivariate test assumptions for tests of within-group differences in a repeated measures (RM) situation. Regarding the nature of RM studies, Morrison (1976) states that "many experimental conditions which lead to higher mean values may also produce responses with larger variances" (p. 141). Different populations are likely to respond differently to successive measurements or treatment conditions, thereby also causing correlations between measures to differ from group to group. This is particularly true in studies of naturally occurring groups (e.g., a comparison of learning disabled and normal children on learning retention rates over time). Subjects within a classification group may be expected to respond in a similar fashion, but it is unrealistic to expect that scores for the two groups come from the same multivariate normal population. Hence, it is important to determine the validity of the multivariate tests of RM in the presence of heterogeneity.

Just as findings from robustness studies for tests of between-group differences have parallels in the univariate and multivariate cases, it may be presumed that similar parallels would hold for tests of within-group differences when homogeneity is violated. However,
results from mixed-model RM studies would not apply to multivariate tests, since the univariate tests are based on the assumptions of equal variances and equal pairwise correlations across the measures, which are unnecessary for multivariate tests to be valid. The effect on within-group tests of using a covariance matrix pooled from heterogeneous population covariance matrices is not known.

The robustness of a parametric test is idiosyncratic rather than general with respect to any violation, and changes in one parameter may produce different degrees of departure from the nominal significance level. Tests of within-group differences are based on transformations of the dependent variables, and the assumptions are made on the transformed scores. It will be shown in Chapter II that multivariate tests of between-group and within-group differences are based on sums of squares and cross products (SSCP) matrices that differ in both form and size, and that the relationship between the eigenvalues needed for calculating the test statistics for the two tests is not obvious. Hence, it is not possible to predict the behavior of one type of test from that of the other. Since current robustness results from studies of multivariate between-group tests may not apply directly to within-group tests, separate investigations are needed. Furthermore, subtests of particular trends for RM data make use of subcomponents of the appropriate SSCP matrices for hypothesis and error. Since between-group tests are known to become more robust as the number of variables decreases, and since tests of successively higher order trends involve successively fewer transformed variables, tests of higher order trends are expected to show greater robustness than tests of lower order trends.

The present research was an investigation of the robustness and power of multivariate within-group tests for a repeated measures design with the same measure taken over a series of equally spaced time points.
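Judging robustness in a Monte Carlo study comes down to comparing an empirical rejection rate under a true null hypothesis with nominal alpha, and that rate carries binomial sampling error of its own. The sketch below is a minimal illustration, assuming independent replications; the replication count and the two-standard-error band are hypothetical choices, not the criteria used in this study (the study's own standard errors appear in Table 4-2).

```python
import math


def mc_standard_error(alpha: float, replications: int) -> float:
    """Binomial standard error of an empirical rejection rate whose true value is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / replications)


def within_band(empirical_rate: float, alpha: float, replications: int, z: float = 2.0) -> bool:
    """True if the empirical rate lies within z standard errors of nominal alpha.
    The default z = 2.0 is a hypothetical criterion chosen for illustration."""
    return abs(empirical_rate - alpha) <= z * mc_standard_error(alpha, replications)


# Hypothetical example: 1,000 replications at nominal alpha = .05
print(round(mc_standard_error(0.05, 1000), 4))  # 0.0069
print(within_band(0.062, 0.05, 1000))           # True  (.012 <= 2 x .0069)
print(within_band(0.080, 0.05, 1000))           # False (.030 >  2 x .0069)
```

A fixed band like this treats liberal and conservative departures symmetrically; a study may reasonably weigh liberal departures more heavily.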
Non-normality does not seem to cause serious problems under any circumstances thus far investigated, whereas heterogeneity may be a serious problem in certain cases. Therefore, given that heterogeneity is typically the violation of greater concern, the focus of this study was limited to the effect that violating the assumption of a common covariance matrix has on the sampling distributions of four multivariate test statistics.

The purpose of the first part of the investigation was to determine whether tests of between-group and within-group hypotheses differ in their reactions to heterogeneous covariance matrices across groups. The second part was to examine whether covariance matrix heterogeneity produces differential effects on within-group tests when the number of groups or of subjects within groups is varied. Third, comparisons were made between overall tests of trends and tests of non-linearity. In all cases, actual significance levels obtained under a true null hypothesis and a given amount of heterogeneity were compared to nominal values. Also, actual powers for within-group tests obtained under a true alternative and a given amount of heterogeneity were compared to the nominal powers expected if no violation were present.

The following chapters present the general multivariate and repeated measures models, along with their hypotheses and test statistics; a review of the robustness literature; the method used for investigating the robustness and power of multivariate within-group tests; and the results and a discussion of them.

CHAPTER II

MULTIVARIATE ANALYSIS OF VARIANCE

In this chapter, the mathematical models for the general multivariate analysis of variance (MANOVA) and for the multivariate generalization to repeated measures (RM) are described.
These are followed by a description of the hypothesis testing procedures through the separation of the total source of variation into component parts, the tests of significance used in multivariate analyses, and the assumptions on which they are based. The final section deals with a comparison of the sums of squares and cross products (SSCP) matrices used to test between-group and within-group differences.

General Multivariate Linear Model

Assuming there are $n_j$ ($j = 1, \ldots, k$) independent observations in each of $k$ groups, the $i$th observation in the $j$th group is a $p \times 1$ vector consisting of a constant term $\boldsymbol{\mu}$, a group effect $\boldsymbol{\alpha}_j$, and a random error component $\boldsymbol{\epsilon}_{ij}$:

$$\mathbf{y}_{ij} = \boldsymbol{\mu} + \boldsymbol{\alpha}_j + \boldsymbol{\epsilon}_{ij}.$$

The $\mathbf{y}_{ij}$ and the $\boldsymbol{\epsilon}_{ij}$ are distributed in the population of subjects as $N(\boldsymbol{\mu}_j, \Sigma)$ and $N(\mathbf{0}, \Sigma)$, respectively, where $\Sigma$ is any $p \times p$ symmetric positive definite matrix. The null hypothesis tested in MANOVA is that the $p \times 1$ mean vectors of all groups are equal:

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_k.$$

By letting $\boldsymbol{\mu}_j = \boldsymbol{\mu} + \boldsymbol{\alpha}_j$, this hypothesis is equivalent to testing that all the $\boldsymbol{\alpha}_j = \mathbf{0}$ (i.e., that all the treatment or group effects are equal) (Bock, 1975).

The general MANOVA model for $k$ group means may be expressed in matrix terms as

$$Y_{\cdot} = A\Xi + E_{\cdot},$$

where:
$Y_{\cdot}$ = a $k \times p$ data matrix of $k$ group means on $p$ measures
$A$ = a $k \times m$ known design matrix
$\Xi$ = an $m \times p$ matrix of unknown parameters
$E_{\cdot}$ = a $k \times p$ matrix of random errors

The error matrix $E_{\cdot}$ is distributed $N(\mathbf{0}, D^{-1} \otimes \Sigma)$, where $D = \mathrm{diag}(n_1, n_2, \ldots, n_k)$. Since $A$ typically is not of full column rank, $(A'A)^{-1}$ does not exist, and the unknown parameters therefore cannot be solved for uniquely. One solution is to reparameterize the model, which may be done by factoring $A$ into the product of two matrices, $K$ and $L$:

$$A = KL,$$

where $L$ is an $l \times m$ contrast matrix that describes a set of $l$ linear combinations of the parameters in $\Xi$, and $K$ is the corresponding $k \times l$ column basis for the design matrix $A$. Then

$$E(Y_{\cdot}) = A\Xi = KL\Xi = K\Theta,$$

where $\Theta = L\Xi$ is an $l \times p$ matrix of new parameters describing the linear combinations that reflect the research interest regarding differences among the groups (Bock, 1975, pp. 239-240).

Multivariate Generalization for Repeated Measures

Multivariate analysis of variance of repeated measures (MANOVA of RM) is a variation of MANOVA that includes a test for the occasions or repeated measures. What distinguishes these data from general multivariate data is that in RM the multiple dependent scores are assumed to be in the same metric (i.e., having the same origin and unit), whereas in general the scores are qualitatively distinct (i.e., having different origins and units). The underlying model for the $i$th observation in the $j$th group is a $p \times 1$ vector that contains a component for occasions, $\boldsymbol{\gamma}$; for groups, $\boldsymbol{\alpha}_j$; and for random subject error, $\boldsymbol{\epsilon}_{ij}$:

$$\mathbf{y}_{ij} = \boldsymbol{\gamma} + \boldsymbol{\alpha}_j + \boldsymbol{\epsilon}_{ij}.$$

As before, the $\boldsymbol{\epsilon}_{ij}$ are distributed $N(\mathbf{0}, \Sigma)$. But unlike the general MANOVA model, where the common term $\boldsymbol{\mu}$ does not provide any additional information, the common term in this model, $\boldsymbol{\gamma}$, represents a $p \times 1$ vector of constants and general means for the $p$ occasions. The second term, $\boldsymbol{\alpha}_j$, is a $p \times 1$ vector of effects for the $j$th group that incorporates both group and group by occasion interaction effects. The model allows for a design on the occasions and a design on the subjects (Bock, 1975).

In the one-sample case or, assuming no interactions, in the $k$-sample case, the objective is to characterize the occasion vector $\boldsymbol{\gamma}$. The appropriate characterization depends on the structure of the repeated measures dimension. If the measures correspond to points along a continuum, a polynomial representation is used,

$$\boldsymbol{\gamma} = X\boldsymbol{\beta},$$

where $X$ is a regression model matrix and $\boldsymbol{\beta}$ is a vector of unknown regression coefficients. If the measures correspond to a factorial classification, then a treatment contrasts and interaction representation is used,

$$\boldsymbol{\gamma} = A\boldsymbol{\xi},$$

where $A$ is a design matrix for the occasions and $\boldsymbol{\xi}$ is a vector of unknown occasion effects. In the former case,
$X$ is of full rank while, in the latter case, $A$ is not, and the model may be reparameterized a second time. While this reparameterization follows the same pattern as before, with $A = KL$, $A$ is now the design matrix for the occasions and not for the groups.

Under the usual MANOVA model, the general occasion effect $\boldsymbol{\gamma}$ is not estimable, and hypotheses about it are not testable in the presence of group effects. Bock (1963) and Potthoff and Roy (1964) have suggested a variation of MANOVA that involves transforming the dependent variables to within-subject differences. A new set of measured variables is formed as linear combinations of the original measures,

$$\mathbf{y}_{ij}^{*} = P'\mathbf{y}_{ij},$$

where $P$ is a matrix representation of the design over the measures. In terms of the previous discussion of the characterization of $\boldsymbol{\gamma}$, $P$ is either (1) the regression model matrix $X$, if the measures are taken at ordered time points, or (2) the orthonormalization of $K$, where $K$ is the basis for the reparameterization of $A$, the design matrix for the occasions.

Assuming a full rank model for group means, the transformation in matrix terms consists of postmultiplying the components of the MANOVA model by a known matrix $P$, which may be any $p \times q$ matrix. Preferably, $P$ should be an orthogonal matrix, and this is now assumed. Then

$$Y_{\cdot}P = K\Theta P + E_{\cdot}P,$$

or equivalently,

$$Y_{\cdot}^{*} = K\Theta P + E_{\cdot}^{*},$$

where:
$Y_{\cdot}^{*}$ = a $k \times p$ matrix of transformed scores
$K$ = a $k \times l$ basis matrix for transformations on groups
$\Theta$ = an $l \times p$ matrix of parameters
$P$ = a $p \times p$ basis matrix for transformations on occasions
$E_{\cdot}^{*}$ = a $k \times p$ matrix of transformed errors

Analysis now proceeds as usual, with the transformed scores in $Y_{\cdot}^{*}$ replacing the original dependent measures. That the standard procedures apply can be seen from the fact that the transformation $Y_{\cdot} = Y_{\cdot}^{*}P^{-1}$ reduces the RM model to a standard MANOVA model (Timm, 1980, p. 76). Furthermore, if $P$ is orthogonal (i.e., $P'P = I$), so that $P^{-1} = P'$, each vector of scores may be transformed using $P'$, as was shown previously.
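The reparameterization $A = KL$ and the orthonormal trend matrix $P$ can be checked numerically. The sketch below is a minimal illustration assuming NumPy; the reference-cell choice of $K$, the QR-based construction of orthonormal polynomials, and the simulated scores are illustrative assumptions, not the computations of the study's own FORTRAN program. It confirms that $A'A$ is singular while $KL$ reproduces $A$, that $P'P = I$, and that the sample covariance of the transformed scores is exactly $P'SP$, the sample analogue of the covariance $P'\Sigma P$ of $P'\mathbf{y}_{ij}$.

```python
import numpy as np


def one_way_design(k: int) -> np.ndarray:
    """k x (k+1) design matrix [1 | I_k]: a constant plus k group effects (not full rank)."""
    return np.hstack([np.ones((k, 1)), np.eye(k)])


def reference_cell_basis(k: int) -> np.ndarray:
    """Hypothetical k x k basis K: a constant column plus indicators for groups 1..k-1."""
    return np.hstack([np.ones((k, 1)), np.eye(k)[:, : k - 1]])


def orthonormal_polynomials(p: int) -> np.ndarray:
    """p x p orthonormal trend matrix (constant, linear, quadratic, ...) for p equally
    spaced occasions, from a QR decomposition of the centered polynomial matrix X."""
    t = np.arange(p) - (p - 1) / 2.0
    Q, R = np.linalg.qr(np.vander(t, N=p, increasing=True))
    return Q * np.sign(np.diag(R))   # give each trend a positive leading coefficient


# Reparameterization A = KL for k = 3 groups
A = one_way_design(3)
K = reference_cell_basis(3)
L = np.linalg.solve(K, A)               # rows: mu + alpha_3, alpha_1 - alpha_3, alpha_2 - alpha_3
print(np.linalg.matrix_rank(A.T @ A))   # 3: the 4 x 4 matrix A'A is singular
print(np.allclose(K @ L, A))            # True

# Transformation to trend scores for p = 5 occasions
P = orthonormal_polynomials(5)
print(np.allclose(P.T @ P, np.eye(5)))  # True, so P^{-1} = P'
rng = np.random.default_rng(1983)
Y = rng.normal(size=(40, 5)) * np.array([1.0, 1.5, 2.0, 2.5, 3.0])   # hypothetical raw scores
Y_star = Y @ P                          # row i is (P' y_i)'
S = np.cov(Y, rowvar=False)
S_star = np.cov(Y_star, rowvar=False)
print(np.allclose(S_star, P.T @ S @ P))  # True: sample analogue of P' Sigma P
```

Any other valid basis $K$ gives a different but equivalent set of parameters $\Theta$; the reference-cell coding is used here only because its rows are easy to read.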
When $P$ is either non-singular or has rank $p$, the transformation has desirable properties with respect to the distributional assumptions. Given that the $\mathbf{y}_{ij}$ are independent and distributed $N(\boldsymbol{\mu}, \Sigma)$, the $\mathbf{y}_{ij}^{*}$ are also independent and are distributed $N(P'\boldsymbol{\mu}, P'\Sigma P)$ (Bock, 1975, p. 140).

Three basic hypotheses are of interest with $k$-sample RM data. These deal with comparisons among the mean curves or profiles of the groups and may be phrased as the following questions: (1) Are the curves or profiles of the $k$ groups parallel? (2) If parallel, are they also coincident? (3) If coincident, are they also constant? (Bock, 1975). The first question asks about the presence of any group by occasion interactions. The second relates to group differences, and the third to occasion differences.

Subhypotheses to assess the effect of the treatment structure or the trend over the occasions may also be tested. Assuming a polynomial representation for the RM dimension, this involves partitioning the sources of variation for occasion and for group by occasion into constant, linear, quadratic, etc. terms. A hypothesized trend may then be tested by a multivariate test that all higher order trends are zero. The interpretation of these tests on occasions is straightforward and provides information about the type of trend the RM follow over time. However, a q-degree trend among the interactions implies that "any contrast among the groups can presumably be described as a polynomial of this degree. For example, a degree-2 interaction would imply that differences between groups, in addition to a possible linear trend, are accelerating or decelerating with respect to occasions" (Bock, 1975, p. 474).

Hypothesis Testing

The multivariate hypothesis testing stage involves partitioning the sums of squares and cross products (SSCP) matrix for total variation into a constant, a between-groups, and a within-groups part.
The MANOVA table for the general multivariate analysis is given in Table 2-1 (adapted from Bock, 1975). The SSCP matrices for RM may be calculated directly by substituting $Y^{*}$ for $Y$ in Table 2-1. The same results may be obtained by transforming the MANOVA SSCP matrices as shown in Table 2-2 (Bock, 1975).

Table 2-1
Multivariate Analysis of Variance (k-sample case)

Source of variation              df     SSCP (p x p)*, equal n's       SSCP (p x p)*, general
Constant (occasion effect)       1      Q_c = (n/k) Y.'11'Y.           (1/N) Y.'D11'DY.
Between groups (group effect)    k-1    Q_b = n Y.'Y. - Q_c            Y.'DY. - Q_c
Within groups (error)            N-k    Q_w = Y'Y - n Y.'Y.            Y'Y - Y.'DY.
Total                            N      Q_t = Y'Y                      Y'Y

* where D = diag(n_1, ..., n_k) and 1 = a unit vector.

Table 2-2
Multivariate Analysis of Variance for Repeated Measures

Source of variation     df     SSCP (p x p)
Constant                1      Q_c* = P'Q_c P
Between groups          k-1    Q_b* = P'Q_b P
Within groups error     N-k    Q_w* = P'Q_w P
Total                   N      Q_t* = P'Q_t P

The multivariate test statistics are functions of the appropriate SSCP matrices for hypothesis and error (say, $H$ and $E$, respectively). The MANOVA hypothesis of equal group means may be tested by setting $H = Q_b$ and $E = Q_w$. For RM, the matrices in Table 2-2 may be partitioned in the following manner:

$$Q_c^{*} = \begin{bmatrix} c & \mathbf{c}' \\ \mathbf{c} & C \end{bmatrix}, \qquad Q_b^{*} = \begin{bmatrix} b & \mathbf{b}' \\ \mathbf{b} & B \end{bmatrix}, \qquad Q_w^{*} = \begin{bmatrix} w & \mathbf{w}' \\ \mathbf{w} & W \end{bmatrix}.$$

Assuming a polynomial decomposition, the scalars $c$, $b$, and $w$ represent the sums of squares for the constant, group effect, and error terms that would be used in a univariate analysis. The $(p-1) \times (p-1)$ matrices $C$, $B$, and $W$ are the SSCP matrices for occasion effects, group by occasion effects, and subject within group by occasion error. The diagonal elements of these submatrices are the univariate sums of squares for the respective linear, quadratic, etc. trends. Table 2-3 shows how these matrices are used for the three omnibus tests in an RM situation. With no group by occasion interaction, the full matrices $Q_b^{*}$ and $Q_w^{*}$ are the $H$ and $E$ matrices for the group effect and the corresponding error in a multivariate test of group differences.
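The computations in Tables 2-1 through 2-3 can be sketched in a few lines. This is a minimal illustration assuming NumPy, simulated data with hypothetical unequal sample sizes, and an orthonormal polynomial $P$ whose first column is the constant trend; the helper names are hypothetical. It uses the general (unequal-n) column of Table 2-1, checks $Q_w$ and $Q_b$ against the familiar pooled-deviation forms, transforms with $P$ as in Table 2-2, and assembles the $H$ and $E$ pairs for the three omnibus RM tests following Table 2-3.

```python
import numpy as np


def sscp_table(groups):
    """Q_c, Q_b, Q_w, Q_t from the general (unequal-n) column of Table 2-1.
    `groups` is a list of (n_j x p) arrays of raw scores."""
    Y = np.vstack(groups)                                    # N x p raw scores
    Ybar = np.vstack([g.mean(axis=0) for g in groups])       # k x p matrix Y.
    D = np.diag([float(g.shape[0]) for g in groups])         # diag(n_1, ..., n_k)
    one = np.ones((len(groups), 1))
    N = D.trace()
    Qc = Ybar.T @ D @ one @ one.T @ D @ Ybar / N             # (1/N) Y.'D11'DY.
    Qb = Ybar.T @ D @ Ybar - Qc                              # Y.'DY. - Q_c
    Qw = Y.T @ Y - Ybar.T @ D @ Ybar                         # Y'Y - Y.'DY.
    Qt = Y.T @ Y                                             # Y'Y
    return Qc, Qb, Qw, Qt


def orthonormal_polynomials(p):
    """Orthonormal constant, linear, quadratic, ... trends for p equally spaced occasions."""
    t = np.arange(p) - (p - 1) / 2.0
    Q, R = np.linalg.qr(np.vander(t, N=p, increasing=True))
    return Q * np.sign(np.diag(R))


def rm_tests(Qc, Qb, Qw, P):
    """(H, E) pairs for the omnibus RM tests, following Table 2-3."""
    Qc_s, Qb_s, Qw_s = (P.T @ M @ P for M in (Qc, Qb, Qw))   # Table 2-2
    C, B, W = Qc_s[1:, 1:], Qb_s[1:, 1:], Qw_s[1:, 1:]       # drop constant row and column
    return {
        "parallelism": (B, W),          # group by occasion interaction
        "coincidence": (Qb_s, Qw_s),    # group effect
        "constancy": (C, W),            # occasion effect
    }


rng = np.random.default_rng(0)
p = 5
groups = [rng.normal(size=(n_j, p)) for n_j in (10, 20, 20)]  # hypothetical n_j
Qc, Qb, Qw, Qt = sscp_table(groups)

# Checks against the usual pooled-deviation forms
grand = np.vstack(groups).mean(axis=0)
Qw_direct = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
Qb_direct = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand) for g in groups)
print(np.allclose(Qw, Qw_direct), np.allclose(Qb, Qb_direct))   # True True

for name, (H, E) in rm_tests(Qc, Qb, Qw, orthonormal_polynomials(p)).items():
    print(name, H.shape)   # parallelism (4, 4), coincidence (5, 5), constancy (4, 4)
```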
When P is orthogonal, a test using these transformed matrices gives results identical to those obtained with Qb and Qw, because test statistics based on either determinants or trace functions remain invariant under orthogonal transformation (Anderson, 1958, p. 277).

Table 2-3
SSCP Matrices for RM Tests

Hypothesis                       H      E      Dimension
Parallelism (interaction)        B      W      (p−1)×(p−1)
Coincidence (group effect)       Qb*    Qw*    p×p
Constancy (occasion effect)      C      W      (p−1)×(p−1)

The submatrices C, B, and W may be partitioned further to provide tests for particular trends. To test for any q
0 for all x ≠ 0) (Anderson, 1958, p. 337). This will usually be the case if the number of dependent variables (p) is less than the degrees of freedom for error (dfe).
Let λ1 ≥ λ2 ≥ ... ≥ λs > 0 be the nonzero eigenvalues of HE⁻¹, where s = min(dfh, u), with dfh = the degrees of freedom for hypothesis and u = the number of variables after any transformation. Four commonly used multivariate test criteria are then defined in Table 2-4 (Timm, 1975). These are exact tests, with known central and noncentral distributions. When s = 1 (i.e., if p = 1 or k = 2), they are equivalent and may be represented as an exact F distribution. There are also F approximations for the multivariate tests (see, e.g., Tatsuoka, 1971).
The only parameters necessary to define the distribution of the statistics under valid assumptions and a true null hypothesis are the number of variates, the degrees of freedom for hypothesis, and the degrees of freedom for error (Ito, 1962). Noncentrality parameters are additionally needed under true alternatives. Based on these parameters, Timm (1975) provides tables of the upper percentile points of R, T, and V and of the lower percentile points of W. The null hypothesis is rejected at significance level α if the obtained value of W is less than the 100α-th percentile of its null distribution. For the other tests, the null hypothesis is rejected if the obtained value of the statistic exceeds the 100(1−α)-th percentile of the corresponding distribution.
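Table 2-4
Multivariate Test Statistics

$$
\begin{aligned}
&\text{Roy's largest root:} && R = \frac{\lambda_1}{1+\lambda_1}\\
&\text{Hotelling-Lawley trace:} && T = \sum_{i=1}^{s}\lambda_i = \operatorname{tr}\left(HE^{-1}\right)\\
&\text{Pillai-Bartlett trace:} && V = \sum_{i=1}^{s}\frac{\lambda_i}{1+\lambda_i} = \operatorname{tr}\left[H(H+E)^{-1}\right]\\
&\text{Wilks' likelihood ratio:} && W = \prod_{i=1}^{s}\frac{1}{1+\lambda_i} = |E|\,|H+E|^{-1}
\end{aligned}
$$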
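The four criteria in Table 2-4 can be computed either from the eigenvalues of HE⁻¹ or from the equivalent trace and determinant forms. The sketch below (function name hypothetical, H and E simulated) computes them from the eigenvalues and checks the closed forms; it also shows that when s = 1 every criterion is a function of the single root λ1, which is why the tests coincide in that case.

```python
import numpy as np

def multivariate_criteria(H, E, tol=1e-10):
    """R, T, V, W of Table 2-4 from the nonzero eigenvalues of H E^-1."""
    lam = np.linalg.eigvals(H @ np.linalg.inv(E)).real
    lam = np.sort(lam[lam > tol])[::-1]          # the s nonzero roots, largest first
    R = lam[0] / (1 + lam[0])
    T = lam.sum()
    V = np.sum(lam / (1 + lam))
    W = np.prod(1 / (1 + lam))
    return R, T, V, W

rng = np.random.default_rng(2)
p, dfh, dfe = 4, 2, 20
A = rng.normal(size=(dfh, p))
H = A.T @ A                                      # rank dfh, so s = min(dfh, p) = 2
Z = rng.normal(size=(dfe, p))
E = Z.T @ Z                                      # positive definite since dfe > p
R, T, V, W = multivariate_criteria(H, E)

# s = 1: a rank-one hypothesis matrix, as with k = 2 groups
a = rng.normal(size=(1, p))
R1, T1, V1, W1 = multivariate_criteria(a.T @ a, E)
```

Because W is a lower-tail statistic while R, T, and V are upper-tail statistics, any decision rule built on these values must compare W with a lower percentile and the others with upper percentiles.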
Theoretical Comparison of Tests
Although the multivariate test statistics for tests of between-group and within-group differences are identical, they operate on different SSCP matrices. The question of interest, then, is whether these matrices, whose expected values are functions of the common covariance matrix, are equally subject to violations of homoscedasticity. The following discussion outlines the relationship between the matrices used for the two tests.
Multivariate test criteria are functions of the eigenvalues of HE⁻¹, where H and E are the SSCP matrices for hypothesis and error, respectively. For a within-group test, HE⁻¹ is the lower (p−1)×(p−1) submatrix of the appropriately transformed

$$Q_c Q_w^{-1} \tag{1}$$

and for a between-group test, it is the p×p matrix

$$Q_b Q_w^{-1} \tag{2}$$

where Qc, Qb, and Qw are defined in Table 2-1.
From the robustness literature, which is reviewed in Chapter III, we have general conclusions about the effects of particular types of homogeneity violations in the population covariance matrices when (2) is used to test for group differences. These results are based on the distributions of the p eigenvalues of (2). Tests for group by occasion interactions are based on the eigenvalues of the order-(p−1) submatrices of (2). Since the same SSCP matrices are used for both interaction and between-group tests, and since exceedance of the nominal alpha level tends to decrease as dimensionality is reduced, the lower dimensionality of the portion of those matrices used for interaction tests should tend to make them slightly more robust than the between-group tests.
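Tests of occasion differences with RM data are based on the p−1 eigenvalues of the order-(p−1) submatrix of (1). Since Qb = Y.'DY. − Qc (Table 2-1), substitution gives

$$Q_c Q_w^{-1} = \left(Y_{\cdot}'DY_{\cdot} - Q_b\right)Q_w^{-1} = \left(Y_{\cdot}'DY_{\cdot}\right)Q_w^{-1} - Q_b Q_w^{-1}.$$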
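The identity above is easy to confirm numerically. The sketch also shows a consequence of Table 2-1 that bears on the within-group tests: Qc has rank one (its df is 1), so (1) has at most one nonzero eigenvalue, and the occasion block C of the transformed Qc also has rank at most one, so s = 1 for the constancy test. The data are simulated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
groups = np.repeat([0, 1], [5, 7])
Y = rng.normal(size=(groups.size, 3)) + 1.0      # hypothetical data, p = 3 occasions
labels = np.unique(groups)
n = np.array([np.sum(groups == g) for g in labels], dtype=float)
Ybar = np.vstack([Y[groups == g].mean(axis=0) for g in labels])
D, one, N = np.diag(n), np.ones((labels.size, 1)), n.sum()

YDY = Ybar.T @ D @ Ybar                          # Y.'DY.
Qc = Ybar.T @ D @ one @ one.T @ D @ Ybar / N
Qb = YDY - Qc
Qw = Y.T @ Y - YDY
Qw_inv = np.linalg.inv(Qw)

lhs = Qc @ Qw_inv                                # matrix (1)
rhs = YDY @ Qw_inv - Qb @ Qw_inv                 # (Y.'DY.)Qw^-1 - (2)

P, _ = np.linalg.qr(np.vander(np.arange(1.0, 4.0), 3, increasing=True))
C = (P.T @ Qc @ P)[1:, 1:]                       # occasion SSCP block
```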
Even though a relationship exists between matrices (1) and (2), knowledge about the distributions of the eigenvalues of (2) does not provide direct information about the distribution of the eigenvalues of (1). Since within-group tests are actually based on a submatrix of (1), it would further be necessary to establish the relationship between the p eigenvalues of the full matrix (1) and the p−1 eigenvalues of the submatrix used for these tests in order to fully specify the relationship between the matrices for the two types of tests.
Each subsequent within-group test of successively higher order trends is based on submatrices of (1) that decrease in dimension. Therefore, each test of a higher order trend should result in a slight increase in robustness over the previous within-group test.
It is not obvious whether heterogeneity in the population covariance matrices would differentially affect the robustness of between-group and within-group tests, and the mathematics needed to demonstrate the necessary relationships is intractable. Therefore, an empirical study was conducted to determine whether the distributions of any of the four multivariate test statistics presented earlier are comparable for testing the two types of hypotheses. In this way, it could be determined whether the tests respond similarly to the same violation of homogeneity. A further comparison was made between the robustness of within-group tests for any trend across time and that of the subsequent tests for trends higher than linear. The study involved the simulation of a large number of experiments so that the actual significance levels could be compared to nominal levels with minimal standard error.
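The precision of such a simulation follows from the binomial distribution: if a test's true rejection rate is α, the empirical rate from `reps` independent replications has standard error √(α(1−α)/reps). The replication counts below are hypothetical and are not the ones used in this study.

```python
import math

def empirical_alpha_se(alpha, reps):
    """Binomial standard error of an empirical rejection rate."""
    return math.sqrt(alpha * (1 - alpha) / reps)

for reps in (1000, 5000, 10000):
    se = empirical_alpha_se(0.05, reps)
    # approximate 95% band for the empirical rate when the test is exactly at nominal .05
    print(f"reps={reps:5d}  SE={se:.4f}  band: {0.05 - 1.96 * se:.4f} to {0.05 + 1.96 * se:.4f}")
```

Because the standard error shrinks with the square root of the number of replications, quadrupling the replications only halves it.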
The second part of the study was an investigation of the effects on the robustness and power of within-group tests when sample size and number of groups are varied.
CHAPTER III
REVIEW OF THE LITERATURE
Consequences of assumption violations have been thoroughly investigated for univariate test statistics from both the large-sample and small-sample points of view. Only recently have similar studies been undertaken for multivariate test statistics. While some of this work has been theoretical, involving large-sample theory and asymptotic approximations, most of it has been empirical. Since the mathematics involved in a theoretical study of multivariate statistics is quite complex, Ito and Schull (1964) remarked that "the small sample treatment of the problem ... is very difficult if not impossible" (p. 72).
Researchers in the multivariate area have focused on a one-way fixed-effects classification for the independent factor and have considered tests of between-group differences on multiple dependent measures. Robustness studies of within-group tests have dealt only with violations of the univariate mixed-model assumptions of equal variances and covariances across the repeated measures (RM). Typically, comparisons have been made between the usual F-test and the F adjusted by a correction factor (e.g., Collier, Baker, Mandeville, and Hayes, 1967) or between univariate and multivariate analyses (e.g., Scheifley, 1974). However, in all cases with more than one group, the groups were assumed to have a common covariance matrix.
The following review will briefly summarize the fixed- and mixed-model univariate results and then present the multivariate results in greater detail. As a preliminary, an overview of a common strategy used to model heterogeneity will be presented.
Variance heterogeneity in univariate studies is easily portrayed by a ratio of population variances. For multivariate problems, modeling is more complicated, since there are many ways of introducing heterogeneity into population covariance matrices. Two single-valued multivariate analogs to a variance are the trace and the determinant of the covariance matrix. The trace represents the total variation, and the determinant represents a generalized variance (Tatsuoka, 1971). Ratios of covariance matrix determinants parallel the univariate case, forming a convenient index of multivariate heterogeneity.
A typical tactic used in empirical studies of the robustness of multivariate test statistics against violation of the assumption of homoscedasticity is to reduce the problem to canonical form. This procedure, which was used in all but two of the multivariate studies reviewed, produces diagonal covariance matrices, thereby reducing the number of parameters that need to be considered by p(p−1)/2.
The procedure is based on theorems for matrix transformations (see Tatsuoka, 1971, pp. 125-129). It consists of applying a linear transformation, say C (where C is orthogonal, i.e., C'C = I and |C| = 1), to the matrix of observations X, thus producing a new set of uncorrelated variables Y = XC. The matrix C represents a rigid (or angle-preserving) rotation from the original variates to the principal axes and consists of columns of eigenvectors of the original covariance matrix Σ. Using the same transformation matrix, Σ is transformed into a diagonal matrix C'ΣC = diag(λ1, λ2, ..., λp), with the variances of the canonical or transformed variates (the eigenvalues) as diagonal elements. This is called "diagonalizing the matrix" (Tatsuoka, 1971, p. 128). The trace and determinant of the original covariance matrix are equal to the trace and determinant of the transformed matrix. A multivariate analysis of the canonical variates produces the same results as one of the original variates, since the MANOVA test criteria are invariant under any nonsingular linear transformation (Anderson, 1958, p. 277).
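A small NumPy illustration of diagonalizing a covariance matrix, using a hypothetical Σ: the columns of C are eigenvectors from `numpy.linalg.eigh`, C'ΣC is diagonal, and the trace (total variation) and determinant (generalized variance) are unchanged. The last lines check the invariance of a determinant-based criterion, Wilks' W = |E|/|H+E|, under both the rotation C and an arbitrary nonsingular transformation of simulated three-group data; the helper `wilks` is hypothetical.

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.6],
                  [2.0, 3.0, 0.5],
                  [0.6, 0.5, 1.0]])          # hypothetical covariance matrix
evals, C = np.linalg.eigh(Sigma)             # orthonormal eigenvectors in the columns of C
Lam = C.T @ Sigma @ C                        # diag(lambda_1, ..., lambda_p), up to rounding

def wilks(X, groups):
    """W = |E| / |H + E| for a one-way layout (H between groups, E within groups)."""
    grand = X.mean(axis=0)
    H = np.zeros((X.shape[1], X.shape[1]))
    E = np.zeros_like(H)
    for g in np.unique(groups):
        Xg = X[groups == g]
        dev = Xg.mean(axis=0) - grand
        H += len(Xg) * np.outer(dev, dev)
        E += (Xg - Xg.mean(axis=0)).T @ (Xg - Xg.mean(axis=0))
    return np.linalg.det(E) / np.linalg.det(H + E)

rng = np.random.default_rng(4)
groups = np.repeat([0, 1, 2], 8)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=groups.size)
A = rng.normal(size=(3, 3))                  # arbitrary matrix, nonsingular with probability one
```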
The operationalization of this procedure in MANOVA robustness studies of heterogeneity relies on the fact that two population covariance matrices, say V1 and V2, may be linearly transformed to the identity matrix, I, and to a diagonal matrix, D, whose diagonal elements are the eigenvalues of V2V1⁻¹ (Holloway and Dunn, 1967). D is called the diagonal matrix of latent roots.
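One way to obtain such a transformation, sketched below, goes through the Cholesky factor of V1: with V1 = LL', the symmetric matrix L⁻¹V2L⁻ᵀ has the same eigenvalues as V2V1⁻¹, and its eigenvectors U give A = U'L⁻¹ with AV1A' = I and AV2A' = D. This construction is an assumption for illustration; the cited studies may have computed the transformation differently. V1, V2, and the helper name are hypothetical.

```python
import numpy as np

def simultaneous_diagonalization(V1, V2):
    """Return A and d with A V1 A' = I and A V2 A' = diag(d)."""
    L = np.linalg.cholesky(V1)               # V1 = L L'
    L_inv = np.linalg.inv(L)
    M = L_inv @ V2 @ L_inv.T                 # symmetric and similar to V2 V1^-1
    d, U = np.linalg.eigh(M)                 # latent roots d, orthonormal eigenvectors U
    return U.T @ L_inv, d

V1 = np.array([[2.0, 0.5], [0.5, 1.0]])
V2 = np.array([[6.0, 1.0], [1.0, 1.5]])
A, d = simultaneous_diagonalization(V1, V2)
```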
MANOVA test criteria for a given test based on any mixture of N(0, V1) and N(0, V2) are equivalent to those based on a mixture of N(0, I) and N(0, D) (Olson, 1973). To model situations with nonzero mean vectors, a mixture of N(μ1, I) and N(μ2, D) may be used to represent the canonical forms of N((C')⁻¹μ1, V1) and N((C')⁻¹μ2, V2). This applies to both the central case, with equal population means, and the noncentral case, with unequal population means.
Heterogeneity is typically introduced either equally in all of the canonical dimensions, with D = dI, or in only one dimension, with D = diag(d, 1, ..., 1). Variations on this theme allow heterogeneity to vary across canonical dimensions, with some di = 1 while other di = d. In this way, a researcher need only vary the value of d to simulate a variety of heterogeneous conditions. For more than two groups, one or more groups are sampled from a population with covariance matrix D and the rest from a population with covariance matrix I. An alternative is to sample groups from populations with covariance matrices I, D, and multiples of D.
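A minimal sketch of the sampling scheme just described, assuming NumPy's `Generator.multivariate_normal`; the helper names, group sizes, and value of d are hypothetical. Because D is diagonal, each group could equally be generated by scaling independent standard normals by the square roots of the diagonal elements of D.

```python
import numpy as np

def canonical_D(p, d, pattern):
    """D = dI ('all' dimensions) or D = diag(d, 1, ..., 1) ('one' dimension)."""
    if pattern == "all":
        return d * np.eye(p)
    if pattern == "one":
        return np.diag([float(d)] + [1.0] * (p - 1))
    raise ValueError("pattern must be 'all' or 'one'")

def sample_groups(ns, covs, rng, means=None):
    """Stack samples of size ns[j] from N(means[j], covs[j]); return data and group labels."""
    p = covs[0].shape[0]
    means = [np.zeros(p)] * len(ns) if means is None else means
    Y = np.vstack([rng.multivariate_normal(m, S, size=n) for n, m, S in zip(ns, means, covs)])
    return Y, np.repeat(np.arange(len(ns)), ns)

rng = np.random.default_rng(5)
p, d = 3, 4.0
Ip, D = np.eye(p), canonical_D(p, d, "all")
Y, groups = sample_groups([10, 10, 10], [Ip, Ip, D], rng)       # one of k = 3 groups heterogeneous
big, _ = sample_groups([20000], [canonical_D(p, d, "one")], rng)  # large sample to check variances
```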
Independence
Violation of the independence assumption is quite serious. For analysis of variance (ANOVA), positive correlations among the errors yield a liberal test (i.e., too many significant results), and negative correlations yield a conservative test (i.e., too few significant results). This is true for both equal and unequal sample sizes, and the discrepancy between nominal and actual levels of significance increases as the absolute value of the correlations increases (see Scheffé, 1959).
For a two-group matched-subjects design in univariate situations, use of the correlated or dependent t-test is an appropriate technique for handling the problem. For correlated observations that arise from an RM situation, the problem is identical to that of a mixed-model analysis, and two avenues are open. One is to use the correction factors of Box (1954) or Greenhouse and Geisser (1959). These adjust the degrees of freedom for the F-test, and the latter produces conservative results. The other is to use exact multivariate tests, which do not make the ANOVA assumption of independence of errors across measures taken on the same subject. However, independence of errors between subjects must still be maintained.
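Box's correction factor is usually written ε = [tr(Σ*)]² / [(p−1) tr(Σ*²)], where Σ* = C'ΣC for any p×(p−1) matrix C of orthonormal contrasts; it equals 1 under uniformity and has lower bound 1/(p−1), the conservative value associated with Greenhouse and Geisser. The adjusted F-test multiplies both the numerator and denominator degrees of freedom by ε. The sketch below works with population matrices (in practice a sample estimate replaces Σ); the helper names and the lag-decay matrix are hypothetical.

```python
import numpy as np

def box_epsilon(Sigma):
    """Box's epsilon computed on p-1 orthonormal polynomial contrasts."""
    p = Sigma.shape[0]
    Q, _ = np.linalg.qr(np.vander(np.arange(1.0, p + 1), p, increasing=True))
    C = Q[:, 1:]                              # p x (p-1), orthonormal and orthogonal to the unit vector
    S = C.T @ Sigma @ C
    return np.trace(S) ** 2 / ((p - 1) * np.trace(S @ S))

def adjusted_df(eps, p, N, k):
    """Occasion-effect df (p-1, (N-k)(p-1)) multiplied by epsilon."""
    return eps * (p - 1), eps * (N - k) * (p - 1)

p = 4
uniform = 0.5 * np.eye(p) + 0.5                               # equal variances, equal correlations
lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
simplex = 0.8 ** lags                                         # correlations decay with lag
eps_u, eps_s = box_epsilon(uniform), box_epsilon(simplex)
lower_bound = 1 / (p - 1)                                     # Greenhouse-Geisser conservative value
```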
Glass, Peckham, and Sanders (1972) provided a thorough review of the univariate literature for fixed-effects designs. Their general conclusions were that violation of the normality assumption does not present a problem for either the t-test or the F-test in an analysis of mean differences. For both equal and unequal sample sizes, discrepancies between actual and nominal significance levels are slight, and with equal n's the F-test proves to be robust even in the extreme case of dichotomous data. However, non-normality does affect inferences about variances, such as in tests of random effects or of equality of variances (Scheffé, 1959).
Considering six multivariate tests and using equal n's, Olson (1973, 1974) found that departures from normality in the direction of positive kurtosis (occasional extreme observations) had only minor conservative effects on Type I error rates. From the asymptotic expressions for the central and noncentral distributions of Hotelling's T² and T0² (a generalized T² for more than two groups), which were obtained by Ito (1969), approximate values for actual significance and power may be found. In a recent review, Ito (1980) mathematically demonstrated that, for sufficiently large sample sizes, non-normality did not appreciably affect either the significance level or the power of these test statistics. The question of what sample size is to be considered "sufficiently large" was left open, since this is difficult to demonstrate theoretically. Ito (1980) further stated that, from Monte Carlo studies, the T² test in the two-sample case has been found to be particularly robust against non-normality for tests about means. However, as in the univariate case, non-normality has serious consequences for tests of equality of covariance matrices.
Homogeneity
Studies of both univariate and multivariate cases indicate that violation of the homogeneity assumption may cause a serious discrepancy between actual and nominal significance levels and that this is typically a more serious problem than non-normality. Since this violation is more serious, as well as being the focus of the present research, greater attention will be given to studies of robustness in the face of heterogeneity. Consequences in both the univariate and multivariate cases will be reviewed.
Fixed-model ANOVA
Extensive work has been done to examine the consequences of departures from homogeneity of variance for univariate test procedures (for reviews, see Scheffé, 1959, Chapter 10, and Glass, Peckham, and Sanders, 1972). In the univariate two-sample case, inequality of population variances has little effect on either the significance level or the power of the t-test if sample sizes are about equal. However, if sample sizes are markedly disparate, large deviations from the nominal error rate occur in both large- and small-sample cases. The test is conservative if the larger group has the larger variance and liberal if the larger group has the smaller variance (Scheffé, 1959).
For more than two groups, heterogeneity does have a slight effect on the Type I error rate of the F-test even when groups are of about equal size, in which case the test is liberal (Scheffé, 1959). However, general conclusions from both theoretical and empirical work have been that the ANOVA F-test is robust to heterogeneity of variance. A major exception is the case of small and unequal sample sizes, where the effects are serious. Results for unequal n's follow the same pattern as for the t-test, with either conservative or liberal results.
It should be noted, however, that these general conclusions have boundary conditions, which depend on sample size or the ratio of sample sizes, on the amount of heterogeneity, and on the value of nominal alpha. Ramsey (1980) found that even for the equal-sample t-test, robustness depends on certain conditions. For example, with n's greater than 15, the t-test will not exceed a significance level of .06 at a nominal level of .05 regardless of the amount of heterogeneity, but robustness may be achieved with n's as small as five if the ratio of variances in the two populations is 1:4 or less. Also, there is an inverse relationship between the nominal alpha level used and the sample size needed for robustness.
Mixed-model Repeated Measures
In univariate mixed-model analysis, the RM dimensions are treated as additional design factors. Two assumptions are made for a valid univariate test: (1) equality of covariance matrices across levels of the between-group factor, and (2) uniformity of the common covariance matrix (i.e., equality of the variances of the RM and of the pairwise correlations between these measures). In an RM situation, variances might change between observations, possibly due to treatment effects on each occasion. Also, there is potential for lack of independence between error components of the observations, particularly if the RM factor reflects time.
Huynh and Feldt (1970) demonstrated that uniformity is merely a sufficient and not a necessary condition for the validity of within-group F-tests. What is required is that the assumptions stated above be met by the covariance matrices of orthonormalized variates rather than of the original variates. Nevertheless, the majority of the robustness literature in the mixed-model case has focused on violation of the uniformity assumption with the original variates. Some of these studies are reviewed below.
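Huynh and Feldt's point can be illustrated with a covariance matrix of the form σij = ai + aj + λδij (δij = 1 if i = j and 0 otherwise). Its variances differ, so it is not uniform, yet for orthonormal contrasts C orthogonal to the unit vector, C'ΣC = λI, so the orthonormalized variates satisfy the required conditions. The values of a and λ below are hypothetical.

```python
import numpy as np

a = np.array([0.5, 1.0, 2.0, 3.5])
lam = 2.0
p = a.size
Sigma = a[:, None] + a[None, :] + lam * np.eye(p)     # sigma_ij = a_i + a_j + lam * delta_ij

Q, _ = np.linalg.qr(np.vander(np.arange(1.0, p + 1), p, increasing=True))
C = Q[:, 1:]                                          # orthonormal contrasts (orthogonal to 1)
Sigma_star = C.T @ Sigma @ C                          # covariance of the orthonormalized variates
```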
While the studies in this section have a different focus from the rest of this paper, since variances and covariances are equal across groups, they are included as background to a study of the consequences of assumption violations in an RM study. They also provide another indication of the idiosyncratic behavior of test statistics under different forms of violation.
In a theoretical study, Box (1954) assessed the approximate effects of unequal variances and serial correlations in one factor of a two-way design with one observation per cell. He showed that these conditions reduced the apparent number of degrees of freedom in both the numerator and the denominator of the F-ratio and that the effect was to produce a slightly liberal test.
Empirical results for a k-sample RM study (with k = 3 and p = 4) were obtained by Collier, Baker, Mandeville, and Hayes (1967). They compared Type I error rates for three ANOVA F-tests: unadjusted, adjusted by Box's correction factor, and adjusted by Greenhouse and Geisser's conservative lower bound for the correction factor. They considered 15 different patterns of covariance matrices, in which both variances and pairwise correlations were varied, although the covariance matrices were common across groups.
As expected, their results showed close agreement between empirical and nominal alpha for the F-test of group differences, but not for the F-tests of occasion and group by occasion effects. In both of the latter cases, the unadjusted F was liberal, and the adjusted F was fairly robust with Box's correction factor but conservative with the lower-bound test. An unexpected finding was that departures from nominal alpha did not significantly decrease, and in some cases actually increased, when sample sizes increased from five to 15. A similar but smaller study conducted by Mendoza, Toothaker, and Nicewander (1974) upheld these conclusions.
In an empirical study comparing the mixed-model ANOVA, MANOVA of RM, and analysis of covariance structures (ANCOVST), Scheifley (1974; Scheifley and Schmidt, 1978) considered a one-group RM case with a 2×2 design on the measures. Three covariance matrices were used, each conforming to the assumptions of one of the analyses. When the ANOVA assumption of uniformity was not met, all three tests were generally conservative. ANCOVST had the greatest power when a significant difference in means was present in only one of the RM factors, and MANOVA of RM had the greatest power when the null hypotheses for both RM factors and the interaction were false.
Significance level results for the univariate test under violation of uniformity in the above study were not consistent with those of the previous two empirical studies in this section, where results tended to be liberal. This may be partly due to the fact that the two covariance matrices used to model univariate assumption violation in Scheifley's (1974) study had variances that were fairly close to being equal, while the other two studies had larger discrepancies between variances. Another possibility is that the opposite results were due to the different patterns of the covariance matrices. The first two studies considered successive trials on one RM factor, and the covariance matrices had simplex patterns (i.e., successive diagonals had lower values). Because of the two-way factorial structure on the RM in the third study, those covariance matrices had circumplex patterns (i.e., values in successive diagonals first increased and then decreased).
Hotelling's T²
Unlike the mixed-model case, in multivariate analysis the separate repeated measurements are considered as multiple criterion variables. They may have unequal variances and a general pattern of correlations. The assumption is that this general covariance matrix is common across groups. To test for differences among the dependent variables, the original variables are transformed into contrasts of interest. Hotelling's T² statistic is the multivariate analog of the t-test and is the uniformly most powerful invariant test for comparing two groups on p variables (Anderson, 1958, pp. 115-118). Several researchers have found it to behave in a fashion similar to the t-test under heterogeneity conditions.
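For reference, the two-sample statistic is T² = [n1n2/(n1+n2)] d'S⁻¹d, where d is the difference between the sample mean vectors and S is the pooled covariance matrix; under the null hypothesis and the usual assumptions, [(n1+n2−p−1)/((n1+n2−2)p)]T² has an F distribution with p and n1+n2−p−1 degrees of freedom. The NumPy sketch below (helper name hypothetical, data simulated) also checks that with p = 1, T² reduces to the square of the pooled-variance t.

```python
import numpy as np

def hotelling_t2(X1, X2):
    """Two-sample Hotelling T^2, its exact F transform, and the F degrees of freedom."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)      # pooled covariance
    T2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S, d)
    F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
    return T2, F, (p, n1 + n2 - p - 1)

rng = np.random.default_rng(6)
x1, x2 = rng.normal(size=(8, 1)), rng.normal(0.7, 1.0, size=(11, 1))
T2, F, df = hotelling_t2(x1, x2)

# Pooled-variance t for the same univariate data
sp2 = (7 * x1.var(ddof=1) + 10 * x2.var(ddof=1)) / 17
t = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / 8 + 1 / 11))
```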
In an empirical study using Monte Carlo methods with relatively small samples (N = n1 + n2 ranging from 10 to 40), Hopkins and Clay (1963) examined the Type I error rates of Hotelling's T² statistic for testing the equality of two independent mean vectors in the p = 2 case. The two populations studied were N(0, σ1²I) and N(0, σ2²I), where heterogeneity between covariance matrices was present equally in both canonical dimensions and was of the form σ2²/σ1² = 1.6 and 3.2. Under these circumstances, they found that with n1 = n2 > 10, heterogeneity had little effect on test results, but that, as in the univariate case, this robustness does not extend to unequal sample sizes. Everything else being equal, the greater the heterogeneity, the greater the departure of the observed significance level from the nominal alpha level. Furthermore, regardless of the amount of heterogeneity, the T² test was conservative if the larger group had more variability and liberal if it had less variability.
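The unequal-n pattern reported by Hopkins and Clay can be reproduced with a small Monte Carlo sketch. The example assumes p = 2, a variance ratio of 3.2 applied equally to both dimensions, hypothetical group sizes of 30 and 10, and 2,000 replications; it uses SciPy's `scipy.stats.f.ppf` for the exact F critical value. These settings are illustrative and are not taken from the original study.

```python
import numpy as np
from scipy.stats import f

def t2_type1_rate(n1, n2, var1, var2, p=2, alpha=0.05, reps=2000, seed=0):
    """Empirical Type I error of the two-sample T^2 test when H0 (equal means) is true."""
    rng = np.random.default_rng(seed)
    df2 = n1 + n2 - p - 1
    crit = f.ppf(1 - alpha, p, df2)                           # exact F critical value
    rejections = 0
    for _ in range(reps):
        X1 = rng.normal(scale=np.sqrt(var1), size=(n1, p))   # N(0, var1 * I)
        X2 = rng.normal(scale=np.sqrt(var2), size=(n2, p))   # N(0, var2 * I)
        d = X1.mean(axis=0) - X2.mean(axis=0)
        S = ((n1 - 1) * np.cov(X1, rowvar=False) + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
        T2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(S, d)
        rejections += int(df2 / ((n1 + n2 - 2) * p) * T2 > crit)
    return rejections / reps

larger_group_smaller_var = t2_type1_rate(30, 10, var1=1.0, var2=3.2)
larger_group_larger_var = t2_type1_rate(30, 10, var1=3.2, var2=1.0)
print(larger_group_smaller_var, larger_group_larger_var)
```

Substituting equal group sizes (e.g., 20 and 20) is a quick way to see the empirical rate move back toward the nominal .05.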
Another empirical study of the effect of inequality of covariance matrices and of sample size on the distribution of Hotelling's T² statistic was conducted by Holloway and Dunn (1967). They considered both level of significance and power, with the number of variables ranging from one to 10 and total sample sizes from five to 100. In canonical form, the covariance matrix for one population was equal to the identity, I, and for the other it was either dI or diag(d, 1, ..., 1), with d = 1, 1.5, 3, 10, and 100. They confirmed the robustness of T² for p = 2 found by Hopkins and Clay and concluded that the actual level of significance increases when any of the following occurs: (1) the number of variables increases, (2) the total sample size with equal groups decreases, or (3) the number of heterogeneous dimensions increases (i.e., all di = d). They also stated that "equal sample sizes help in keeping the level of significance close to the supposed level, but have little effect in maintaining the power of the test" (p. 125). In general, power was often considerably reduced by departures that left the significance level satisfactory.
In a third empirical study of the robustness of T², with p = 2, 6, or 10, Hakstian, Roed, and Lind (1979) did not use covariance matrices in canonical form. However, all variances in one population were equal to one, and the covariances had an irregular pattern. Two distinct matrices were used for a second population, in which all elements were greater than in the first by a factor of 1.44 or 2.25. For two variates, robustness was evident with equal sample sizes as small as six. For unequal sample sizes, their results paralleled those of the previous studies. Additionally, they found that increasing the total sample size while keeping the ratio of sample sizes constant does not help, and may actually hurt, the situation.
In summary, while the T² test is robust to covariance matrix heterogeneity with equal n's, it is not robust with unequal n's. The latter is true even for relatively mild departures from equality of the covariance matrices and of the sample sizes.
General MANOVA Test Statistics
The MANOVA tests discussed in this section are all functions of the eigenvalues of HE⁻¹, where H and E are the hypothesis and error SSCP matrices. For two groups, the tests are equivalent. Hotelling's T0² is a generalization of the T² test that may be used with more than two groups. The T statistic is often used in place of T0², since the two are directly related (i.e., T0² = dfe·T). Robustness studies of multivariate test statistics for more than two groups have shown that, in general, these test statistics behave in a manner comparable to the univariate F-test.
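The relation T0² = dfe·T, and the fact that with two groups it reproduces the two-sample T², can be checked numerically. The data below are simulated and purely illustrative; H and E are the between- and within-groups SSCP matrices (Qb and Qw of Table 2-1).

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, p = 7, 9, 3
X1 = rng.normal(size=(n1, p))
X2 = rng.normal(size=(n2, p)) + 0.5
grand = np.vstack([X1, X2]).mean(axis=0)

H = np.zeros((p, p))
E = np.zeros((p, p))
for Xg in (X1, X2):
    dev = Xg.mean(axis=0) - grand
    H += len(Xg) * np.outer(dev, dev)                           # between-groups SSCP
    E += (Xg - Xg.mean(axis=0)).T @ (Xg - Xg.mean(axis=0))       # within-groups SSCP

dfe = n1 + n2 - 2
T = np.trace(H @ np.linalg.inv(E))                # Hotelling-Lawley trace
T0_sq = dfe * T                                   # Hotelling's T0^2

d = X1.mean(axis=0) - X2.mean(axis=0)
T2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(E / dfe, d)   # two-sample Hotelling T^2
```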
One of the earliest and most cited theoretical studies of multivariate robustness to heterogeneity of covariance matrices was conducted by Ito and Schull (1964). They investigated the asymptotic distribution of Hotelling's T0² statistic with one to four variables and two to five groups. For the case of two large samples of equal size, they showed analytically that the test is fairly well behaved, with respect to both significance level and power, in the presence of heterogeneity. Also, for samples of nearly equal size, robustness holds as long as the characteristic roots of Σ2Σ1⁻¹ fall in the range (.5, 2). For two large samples of unequal size, the departure from a nominal alpha level of .05 increased as (1) the ratio of sample sizes (n1/n2) departed from one, (2) the degree of heterogeneity (d = the characteristic roots of Σ2Σ1⁻¹) departed from one, or (3) the number of dependent variables increased. For more than two groups and equal samples, there was a tendency to overestimate significance, but the effect was not serious with moderate heterogeneity. However, if one or some of the groups were of unequal size, even moderate heterogeneity produced large effects on the significance level and the power of the test. In all cases with unequal sample sizes, actual significance was greater than .05 if the larger group had the smaller variance and less than .05 if the larger group had the larger variance.
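Ito and Schull's condition for nearly equal samples can be checked directly from the characteristic roots of Σ2Σ1⁻¹. A small sketch with hypothetical covariance matrices; the helper name is an assumption, and the check says nothing about the unequal-n case, where robustness fails even under moderate heterogeneity.

```python
import numpy as np

def roots_in_ito_schull_range(Sigma1, Sigma2, lo=0.5, hi=2.0):
    """Characteristic roots of Sigma2 Sigma1^-1 and whether all lie in (lo, hi)."""
    roots = np.sort(np.linalg.eigvals(Sigma2 @ np.linalg.inv(Sigma1)).real)
    return roots, bool(np.all((roots > lo) & (roots < hi)))

Sigma1 = np.array([[1.0, 0.3], [0.3, 1.0]])
mild_roots, mild_ok = roots_in_ito_schull_range(Sigma1, 1.5 * Sigma1)      # every root equals 1.5
severe_roots, severe_ok = roots_in_ito_schull_range(Sigma1, 3.0 * Sigma1)  # every root equals 3
```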
In an empirical study of the robustness properties of Hotelling's T, Wilks' likelihood ratio criterion W, and Roy's largest root R with small equal samples (n = 5 or 10), Korin (1972) specified departures from equality of covariance matrices in two ways, symbolized by A(d) and B(d), with d = 1.5 or 10. A(d) represents cases where only one population covariance matrix differed (i.e., (I, I, dI) for k = 3 and (I, I, I, I, I, dI) for k = 6), while B(d) represents cases where two differed (i.e., (I, dI, 2dI) for k = 3 and (I, I, I, I, dI, 2dI) for k = 6). Results showed that the three tests were somewhat comparable and that, although all were liberal, R tended to be more so than the other two. The discrepancy between nominal and actual values was slight with small violations of covariance homogeneity (d = 1.5) but was pronounced with larger violations (d = 10). This indicates that, unlike the large-sample case, with small n even equal samples do not guarantee robustness.
A very extensive Monte Carlo investigation of the performance of six multivariate test criteria under heterogeneity conditions was conducted by Olson (1973, 1974). He considered groups of equal size (n = 5, 10, and 50), with both the number of variables and the number of groups equal to 2, 3, 6, and 10. With populations distributed N(0, I) or N(0, D), he used two types of contaminating covariance matrices (i.e., where all canonical dimensions varied equally, D = dI, or where only one dimension varied, D = diag(pd − p + 1, 1, ..., 1)), with d = 4, 9, or 36. For a given value of d, the total variability in both matrices, as measured by the trace of D, was equal. Therefore, only the manner in which variability was allocated was varied for a given d, not the total variability; the latter was varied through the choice of d.
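Olson's two contamination patterns can be written down directly. The sketch (helper name hypothetical) confirms that for a given d both matrices have trace pd, so total variability is held fixed, while their determinants, the generalized variances, differ sharply.

```python
import numpy as np

def olson_contamination(p, d, pattern):
    """D = dI (all dimensions) or D = diag(pd - p + 1, 1, ..., 1) (one dimension)."""
    if pattern == "all":
        return d * np.eye(p)
    if pattern == "one":
        return np.diag([p * d - p + 1.0] + [1.0] * (p - 1))
    raise ValueError("pattern must be 'all' or 'one'")

for p in (2, 3, 6, 10):
    for d in (4, 9, 36):
        D_all, D_one = olson_contamination(p, d, "all"), olson_contamination(p, d, "one")
        # equal traces (total variation); very different determinants (generalized variance)
        print(p, d, np.trace(D_all), np.trace(D_one), np.linalg.det(D_all), np.linalg.det(D_one))
```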
Under various combinations of these factors, Olson examined the Type I error rates and power of Roy's largest root, R, two trace-type tests (Hotelling-Lawley's T and Pillai-Bartlett's V), and three determinantal tests (Wilks' likelihood ratio, W, Gnanadesikan's criterion, U, and Olson's alternative criterion, S). The U and S tests tended to be quite conservative and did not respond favorably to violations, and so will not be discussed further.
Olson concluded that, although the remaining four tests all tended to be liberal, the R test was far too liberal and should be rejected if any heterogeneity is suspected. For large samples, the V, W, and T tests are asymptotically equivalent, and he suggests as a rule of thumb that they may be so considered whenever the degrees of freedom for error are at least 10p times the degrees of freedom for hypothesis. For smaller samples, the T, W, and V tests were robust against mild heterogeneity, but in general T and W did not fare as well as V. Findings showed that even though it tended to be liberal, the V test was the most robust under the conditions examined. These results with equal samples uphold Korin's (1972) conclusion that the significance level is overestimated for small samples and extend it to even moderately large samples (n = 50).
Although departures from assumptions have substantially different effects on the distributions of the four test statistics to be considered in this study (see Chapter IV), the general conclusions for equal samples are that exceedance of nominal alpha may be decreased by reducing the dimensionality, p, or the number of groups, k. However, increasing sample size with equal n's does not always help. Also, even though the percentage exceedance tended to be greater at larger nominal alpha, Olson (1974) found that "different proportions of contamination showed their effects in much the same way at all three significance levels" (i.e., for .01, .05, and .10) (p. 898). In general, exceedance rates increased with greater heterogeneity, but they "tended to increase more as d increased from 1 to 4 and from 4 to 9 than as it increased from 9 to 36" (p. 898). Furthermore, regardless of p and k, effects were relatively minor when only one canonical dimension varied (D = diag(pd − p + 1, 1, ..., 1)) but severe when they all did (D = dI).
For situations where only one canonical dimension varied, larger n's corresponded to lower exceedance rates for R, T, and W, whereas for V, rates either decreased or increased as necessary to converge to those of T and W for large n. This is because, for small n, V was significantly better than the other tests in many of the cases. It should be noted that, for equal samples, when D = dI, "effects of kurtosis and heterogeneity tend to be in opposite directions, the former yielding conservative rates and the latter producing too many significant results" (Olson, 1974, p. 901).
With respect to power, differences among the R, T, V, and W statistics were typically small. However, the R statistic tended to have slightly higher power if differences in the population mean vectors were confined to one of the s dimensions, while the V statistic had a slight advantage if the differences were equally pronounced in all s dimensions. Furthermore, holding the noncentrality parameter constant, increasing the number of groups tended to decrease power, while increasing group size had no consistent effect on power.
Another Monte Carlo study on the significance levels
of the R, T, W, and V test criteria with equal n's, but where
heterogeneity was modeled on the original covariance
matrices and not on the canonical dimensions, was conducted
by Ceurvorst (1980). He considered a variety of situations
that included varying the number of dependent variables (2
and 3), number of groups (2, 3, and 6), degrees of freedom
for error (18, 60, and 180), and both type and degree of
heterogeneity. For differences of type, he considered
inequality of variances alone, of correlations alone, and of
both together, with combinations of three variances (1, 4,
and 9) and three correlations (.2, .5, and .8).
For heterogeneity of correlation, he found only mildly
liberal exceedance rates for the four test statistics using
a .05 nominal alpha. The observed significance levels were
always less than .09 and proved to be fairly robust in most
cases. Results for heterogeneity of variance confirmed
previous results for canonical forms and also indicated that
the effects did not depend on the magnitude of the common
within-group correlation(s) for any of the cases
considered. Comparisons among the test criteria showed
that they were consistently ordered R-T-W-V from highest to
lowest exceedance rate of the nominal alpha. The most
serious discrepancies occurred when k = 6 and five groups
had variances equal to unity, while the sixth had variances
equal to nine.
When both heterogeneity of variance and of correlation
were present, results differed depending on the relative
size of variances and correlations. If groups with the
largest variances had the largest correlations (LVLC),
violations became increasingly more serious than for
heterogeneity of variance (HV) alone. If groups with the
largest variances had the smallest correlations (LVSC), the
reverse was true, with violations being less serious.
Comparisons of the criteria under LVSC conditions were
similar to the HV situations, with the V test being
uniformly most robust, followed in order by W, T, and R.
Under the LVLC conditions, no criterion was uniformly best.
When only one variance differed, R was often the best
choice, but it was the worst when all variances differed.
Also, when R was best, the other tests generally had
exceedance rates of .07 or less.
Pillai and Sudjana (1975) studied the effects of
unequal covariance matrices on the R, T, V, and W
statistics in the exact case by deriving central and
noncentral distributions and applying them in a numerical
study with n = 5, 15, and 40. Considering p = 2, they
stated that low heterogeneity produces modest changes in
the powers of the test statistics, but that changes become
pronounced as heterogeneity increases. None of the four
statistics showed an advantage over the rest.
In summary, the discrepancy between actual and nominal
alpha tends to decrease with lower degrees of
heterogeneity and with smaller numbers of variables and of
groups. It appears that, for two equal samples, neither
the significance level nor the power of Hotelling's T² is
seriously affected by heterogeneity, but that this is not
necessarily true for unequal n's. For more than two groups
of large equal samples, robustness may be achieved with
moderate departures from homogeneity, but even moderate
heterogeneity produces large effects on both significance
level and power when samples are unequal. For several
small or moderately large groups, even equal samples do not
protect against departure from nominal significance levels,
with test criteria tending to be liberal. The consequences of
violating the homogeneity assumption through a
contaminating covariance matrix are generally worse if all
canonical variances differ by an equal amount than when
only one differs. The case of only some equally discrepant
variances falls between the two extremes. In general,
Roy's largest root, R, appears to be the worst of the
invariant tests and the Pillai-Bartlett trace, V, the best,
with respect to both robustness and power.
CHAPTER IV
METHOD
Previous work exploring the robustness of MANOVA test
criteria to violation of homogeneous covariance matrices
across groups has dealt only with fixed-effects between-group
tests in a one-way classification. In the present
research, the effect of violating the assumption of
homoscedasticity was considered in a repeated measures (RM)
situation with data from ordered time points and a
fixed-effects, one-way design over the subjects. The purpose of
the study was twofold: (1) to compare the robustness of
multivariate test statistics for between-group and
within-group tests, and (2) to analyze the behavior of
within-group tests under various conditions with respect to both
robustness and power.
When data in the p-variate response measures reflect
the passage of time, and assuming no group by measures
interaction, overall within-group tests encompass all p-1
degrees of freedom (df) for trend, thereby testing the null
hypothesis of no trend in the data or, equivalently, of
equality of occasion means. Subsequent tests may be
confined to any of the p-q-1 df remaining
after a trend of degree q (q < p) is hypothesized.
In this chapter, details are presented about the use
of covariance matrices in canonical form for RM analyses,
the parameters and procedures for the study, and the Monte
Carlo techniques that were used.
Canonical Form
The assumption of homoscedasticity for tests of
between-group differences relates to population covariance
matrices for the original score vectors. For simplicity,
canonical forms of the covariance matrices are typically
used in MANOVA robustness studies (see Chapter III).
For MANOVA of RM, the score vectors are linearly
transformed to reflect the design on the measures. The
transformed vectors remain multivariate normal if the
original vectors are multivariate normal (Finn, 1974,
p. 62), and the assumption now relates to the transformed
covariance matrices.
With time-ordered data, the transformation consists of
a matrix of normalized orthogonal polynomial coefficients.
When such a matrix is applied to populations with
covariance matrices reduced to the canonical forms I, dI,
or C(d) = diag(d, 1, ..., 1), the transformation does not
alter I or dI. Although C(d) becomes a general matrix, it
is reduced to C(d) when diagonalized. Therefore, the same
underlying violation is modeled for both between-group and
within-group tests when the original covariance matrices
are in canonical form.
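The invariance argument above can be checked numerically. The sketch below assumes p = 5 equally spaced occasions and builds the full p × p matrix of normalized orthogonal polynomial coefficients (constant row included, so the matrix is orthogonal) by orthonormalizing the columns of a Vandermonde matrix with a QR decomposition. The helper name `orthonormal_polynomials` and the value d = 4 are illustrative assumptions, and the rows may differ in sign from tabled coefficients without affecting the result.

```python
import numpy as np


def orthonormal_polynomials(p):
    """Hypothetical helper: return a p x p orthogonal matrix whose rows are
    normalized orthogonal polynomial coefficients (constant, linear,
    quadratic, ...) for p equally spaced occasions."""
    t = np.arange(1, p + 1, dtype=float)
    # Columns of the Vandermonde matrix are 1, t, t^2, ..., t^(p-1).
    vander = np.vander(t, p, increasing=True)
    # QR orthonormalizes those columns in order, giving the orthogonal
    # polynomials up to sign.
    q, _ = np.linalg.qr(vander)
    return q.T


p, d = 5, 4.0
P = orthonormal_polynomials(p)
I = np.eye(p)
C = np.diag([d] + [1.0] * (p - 1))  # canonical form C(d) = diag(d, 1, ..., 1)

print(np.allclose(P @ I @ P.T, I))             # I is unchanged
print(np.allclose(P @ (d * I) @ P.T, d * I))   # dI is unchanged
PC = P @ C @ P.T
print(np.allclose(PC, np.diag(np.diag(PC))))   # C(d) is no longer diagonal
print(np.sort(np.linalg.eigvalsh(PC)))         # but its eigenvalues are still 1, ..., 1, d
```

If only the p-1 trend rows were used, the transformed C(d) would generally have a largest eigenvalue smaller than d, so the full matrix is used here to match the statement in the text.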
Parameters
A major problem in any study of robustness of
multivariate test statistics comes from the seemingly vast
number of ways in which the assumption of covariance
homogeneity can be violated and the many factors that have
some bearing on levels of robustness. Therefore, it
becomes necessary to specify these factors and
nonconforming populations in terms of some relevant
parameters and to choose particular levels of each in order
to have a systematic coverage of different forms of
violations under various conditions. The parameters
considered in this study are described in this section.
Tests of Multivariate Hypotheses. For each simulated data
set, tests of a between-group hypothesis and two within-group
hypotheses were conducted. The hypotheses tested
were: (1) the null of no between-group differences on
p-variate mean vectors, using k-1 df; (2) the null of no
trends across the p-variate data, using p-1 df, for an
overall test on occasions; and (3) the null of no trend
higher than linear, using p-2 df. Rejection of the second
null hypothesis, but not of the third, indicates the existence
of a linear trend across time.
Test Criteria. For each hypothesis, four multivariate test
statistics, defined in Table 2-4, were calculated: Roy's
largest root, R; Hotelling-Lawley trace, T; Pillai-Bartlett
trace, V; and Wilks' likelihood ratio, W.
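Table 2-4 is not reproduced in this section, so the sketch below assumes the common definitions of the four criteria in terms of the eigenvalues λ_j of HE^{-1}, where H and E are the hypothesis and error SSCP matrices: T = Σλ_j, V = Σλ_j/(1 + λ_j), W = Π1/(1 + λ_j), and R taken as λ_max/(1 + λ_max). Some texts report λ_max itself as Roy's criterion, so R may be scaled differently in Table 2-4. The function name is hypothetical.

```python
import numpy as np


def manova_criteria(H, E):
    """Roy (R), Hotelling-Lawley (T), Pillai-Bartlett (V), and Wilks (W)
    criteria from hypothesis (H) and error (E) SSCP matrices."""
    # E^{-1} H has the same eigenvalues as H E^{-1}. For symmetric H >= 0 and
    # E > 0 they are real and nonnegative, so round-off residue is trimmed.
    lam = np.linalg.eigvals(np.linalg.solve(E, H))
    lam = np.clip(np.real(lam), 0.0, None)
    lam_max = lam.max()
    return {
        "R": lam_max / (1.0 + lam_max),    # large values lead to rejection
        "T": lam.sum(),                    # large values lead to rejection
        "V": np.sum(lam / (1.0 + lam)),    # large values lead to rejection
        "W": np.prod(1.0 / (1.0 + lam)),   # small values lead to rejection
    }


stats = manova_criteria(np.diag([1.0, 3.0]), np.eye(2))
print({name: round(float(value), 4) for name, value in stats.items()})
# {'R': 0.75, 'T': 4.0, 'V': 1.25, 'W': 0.125}
```

The comments on rejection direction match how the critical values are applied later in this chapter: W is compared from below, the other three from above.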
Number of Measures, p. Experiments were simulated with
p = 4 or 5 response measures. This enabled both within-group
tests to be multivariate. Since a multivariate test
for linearity uses SSCP matrices of order p-2, and a
multivariate test requires matrices of order at least two, the
smallest value for p that allows for such a test is four. The SSCP
matrices for hypothesis and error were: (1) of order 4 or 5
for between-group tests, (2) of order 3 or 4 for overall
occasion tests of no trends, and (3) of order 2 or 3 for
tests of no trends higher than linear.
Number of Groups, k. The simulated experiments had simple
one-way fixed designs on the independent factor with k = 2,
3, or 6 groups.
Group Size, n. Small to moderately large experiments were
simulated, with n = 10, 20, or 50 in each group. In all
cases, groups of equal size were considered.
Type of Heterogeneity. The identity matrix, I, was used to
model homogeneous populations. For heterogeneity
conditions, two populations with covariance matrices equal
to I and dI were used. This type of diffuse structure was
chosen for the contaminating matrix because it is the kind
of violation that typically produces the most severe
departures from nominal significance levels.
Degree of Heterogeneity, d. This factor relates to the
size of the violation (i.e., to how much more variable one
distribution is relative to another). Small to large
violations were modeled, with d = 2, 4, or 9. For
homogeneity conditions, d = 1.
Significance Level, α. The probability of making a Type I
error was considered at the .01, .05, and .10 nominal alpha
levels. For a given nominal level, (100α)% of the values
in a test statistic's distribution will exceed the
appropriate critical value under a true null with no
assumption violation. Hence, a dependent variable in the
Monte Carlo experiments was the empirical estimate of a
statistic's percentage exceedance of its critical value at
significance level alpha, given a true null and
heterogeneous covariance matrices. (The phrase percentage
exceedance is used throughout the thesis to refer to the
percentage of replications of a statistic that exceed a
critical value.)
Power. This is equal to 1 - P(Type II error). Nominal
power relates to the percentage of values in a test
statistic's distribution that will exceed the critical
value under a true alternative with no assumption
violation. A second dependent variable in the Monte Carlo
experiments was the empirical estimate of actual power
(i.e., the percentage exceedance given a true alternative
and heterogeneous covariance matrices). This was estimated
at all three nominal alpha levels.
Power is a function of the discrepancy between central
and noncentral distributions for a test statistic. The
MANOVA noncentrality parameter (ncp) is a standardized
measure of the distance between group means in the
population (Olson, 1974) and may be defined as the sum of
the eigenvalues $g_j$ ($j = 1, \ldots, p$), or trace, of a matrix $G$,
where
$$G = F V^{-1},$$
$V$ is the population covariance matrix, and
$$F = \sum_{i=1}^{k} n_i (\mu_i - \mu)(\mu_i - \mu)',$$
where $\mu_i$ is the population mean vector for the ith of k
groups and $\mu$ is the grand mean vector in the population.
When data are ordered according to time, the ncp for tests
of within-group hypotheses incorporates the time dimension.
This is done by representing the elements of the covariance
matrix and the means in the above equations as functions of
time (Morrison, 1972).
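As a sketch of the definition above, the function below forms F, then G = FV^{-1}, and returns the ncp as the trace of G. It assumes the grand mean μ is the n_i-weighted average of the group means (the same as the simple average with equal n's). The function name and the two-group example means are hypothetical, and the time-dependent parameterization for within-group hypotheses (Morrison, 1972) is not shown.

```python
import numpy as np


def noncentrality(means, sizes, V):
    """ncp = trace(F V^{-1}), where F = sum_i n_i (mu_i - mu)(mu_i - mu)'."""
    means = np.asarray(means, dtype=float)   # k x p: row i is mu_i
    sizes = np.asarray(sizes, dtype=float)   # k group sizes n_i
    grand = sizes @ means / sizes.sum()      # grand mean vector mu (weighted)
    dev = means - grand                      # rows are mu_i - mu
    F = (dev.T * sizes) @ dev                # sum of n_i * outer(dev_i, dev_i)
    G = F @ np.linalg.inv(V)
    return float(np.trace(G))                # equals the sum of the eigenvalues g_j


means = [[1.0, 0.0], [-1.0, 0.0]]            # hypothetical means for k = 2, p = 2
print(noncentrality(means, [10, 10], np.eye(2)))        # 20.0
print(noncentrality(means, [10, 10], 2.0 * np.eye(2)))  # 10.0
```

Doubling V halves the ncp here, which is one way to see why the choice of V matters when the group covariance matrices differ.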
Since power depends on the common covariance matrix V,
no theoretical power values exist under heterogeneous
conditions, and the choice for V is open. Therefore, the
noncontaminated covariance matrix (I in canonical form) is
typically used for V in order to calculate the ncp. In
this way, a test's ability to detect differences when
assumptions are violated can be compared with its
ability to do so when they are met.
Procedures
Monte Carlo techniques were used to generate either
10,000 or 2,000 replications of multivariate data sets
distributed N(0, I). Each data set represented a particular
combination of k and n with p = 5 measures across time.
Using these data, critical values were calculated for tests
of three multivariate hypotheses using four test statistics
at three nominal alpha levels. The data in each set were
then transformed seven times to calculate: (1) actual
significance levels in three central heterogeneous cases
for between-group and within-group tests, (2) nominal power
in a noncentral homogeneous case for within-group tests, and
(3) actual power in three noncentral heterogeneous cases
for within-group tests. All calculations were performed a
second time on the same data sets using only the first four
measures to simulate conditions with p = 4. Since
noncentral situations refer only to tests of within-group
differences, in these cases the null hypotheses of no
group by occasion interaction and of no group differences
remained true.
A FORTRAN V program was written to generate,
transform, and analyze the data. A detailed description of
the computational procedures appears in Appendix A. These
procedures guided the creation of the computer program,
which also appears in Appendix A. The remainder of this
section describes the determination of critical values, the
design for the study, the analysis procedures, and the
interpretation of computed significance levels and power
values.
Critical Values
Critical values for the multivariate test criteria and
the combinations of p, k, n, and alpha levels used in the
study were not all available in published tables. Also,
tabled values have generally been obtained analytically
rather than empirically. Therefore, values used in the
study were empirically determined via Monte Carlo
techniques.
Using three nominal significance levels, critical
values were calculated such that (100α)% of the N
noncontaminated replications (where N = 10,000 or 2,000)
under a true null would be judged significant using that
critical value. This was accomplished by taking the
arithmetic average of the (Nα)th and (Nα+1)th smallest of
the N values for W and the corresponding largest of the N
values for R, T, and V. Values thus obtained will be
referred to as Monte Carlo critical values to distinguish
them from tabled values.
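The averaging rule just described can be written as a short function. This is a sketch rather than the study's FORTRAN V code: it assumes Nα is a whole number (true for N = 2,000 or 10,000 with α = .01, .05, or .10), and the function name and the stand-in null values are hypothetical.

```python
def monte_carlo_critical_value(values, alpha, upper=True):
    """Average of the (N*alpha)th and (N*alpha + 1)th most extreme values.

    upper=True takes the extremes from the top (R, T, V reject for large
    values); upper=False takes them from the bottom (W rejects for small values).
    """
    m = round(len(values) * alpha)           # N*alpha, assumed to be an integer
    ordered = sorted(values, reverse=upper)  # most extreme values first
    return (ordered[m - 1] + ordered[m]) / 2.0


null_values = list(range(1, 101))            # stand-in for N = 100 null replications
print(monte_carlo_critical_value(null_values, 0.05))               # 95.5
print(monte_carlo_critical_value(null_values, 0.05, upper=False))  # 5.5
```

With these cutoffs exactly five of the 100 stand-in values lie beyond each critical value, that is, (100α)% of them, which is the defining property stated above.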
Design
The design for the study is given in Table 4-1, where
combinations of k and n used for all levels of p and d are
denoted by an x in part (a). Hypotheses tested under
central and noncentral conditions with four statistics at
three nominal levels are indicated in part (b). The matrix
in part (c) shows how the two types of conditions from (a)
and (b) were combined to create the necessary statistics.
53
Table 4-1
Design for the Study

a) Number of measures (p), of groups (k), and equal sample sizes (n)
   under heterogeneity conditions (d).*

                     k:   2          3          6
                     n:   10 20 50   10 20 50   10 20 50
Condition       d   p
Homogeneity     1   5     x x x x x
                    4     x x x x x
Heterogeneity   2   5     x x x x x
                    4     x x x x x
                4   5     x x x x x
                    4     x x x x x
                9   5     x x x x x
                    4     x x x x x

* x indicates conditions replicated 2,000 times.
  Conditions replicated 10,000 times were with k = 3 and n = 20.

b) Statistics calculated under central and noncentral conditions for
   various hypotheses at three nominal alpha levels and for every
   combination of factors indicated in (a).*

                          Alpha:      .01       .05       .10
                          Statistic:  R T V W   R T V W   R T V W
Condition    Hypothesis
Central      B
             C
             L
Noncentral   C
             L

* Hypotheses tested were: between-group differences, B; within-group
  test of trends, C; and within-group test of trends higher than
  linear, L; using test statistics: Roy's largest root, R;
  Hotelling-Lawley trace, T; Pillai-Bartlett trace, V; and
  Wilks' likelihood ratio, W.
Table 4-1 (Cont.)

c) Empirical values derived from each replicated data set by
   crossing elements from conditions on covariance matrices in (a)
   and conditions on hypotheses in (b).

                           Conditions on Covariance Matrices
Condition on Hypothesis    Homogeneity                    Heterogeneity
Central                    Monte Carlo critical values    Actual significance levels
Noncentral                 Nominal power values           Actual power values
For the first part of the study, five-variate vector
scores from a population distributed N(0, I) were generated
for 10,000 replications of one situation with three equal
groups of size 20. Four-variate situations were simulated
by using the same data and dropping the fifth measure in
each vector score.
For the second part of the study, new sets of 2,000
replications from the same population were generated for
each of the five combinations of k and n indicated in Table
4-1(a). Equal cell sizes were used throughout the study,
and the same procedures were followed for every combination of
p, k, and n, regardless of the number of replications.
Calculated statistics from the data in each set of N
replications under homogeneous conditions were used to
determine Monte Carlo critical values for all combinations
of multivariate tests, test statistics, and nominal alpha
levels shown under the central case of Table 4-1(b).
Regardless of the number of groups represented, score
vectors for only one group in each case were transformed to
simulate data that might arise from populations distributed
N(0, dI). The data sets represented central heterogeneous
conditions, and all test statistics were recalculated.
Each resulting value was then compared to the
corresponding Monte Carlo critical value for the three
alpha levels considered. Actual significance levels (i.e.,
empirical Type I error rates under heterogeneity) were
determined by counting the number of replications in which
the statistic was (1) greater than the corresponding
critical value for the R, T, and V statistics, or (2) less
than the corresponding critical value for the W statistic, and
then dividing by N, the number of replications.
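The counting rule can be sketched as follows, assuming the statistics for one condition are held in a list. The function name and the four example values are hypothetical; only the direction of comparison (greater than for R, T, and V, less than for W) comes from the text.

```python
def exceedance_rate(values, critical, upper=True):
    """Proportion of replications whose statistic lies beyond the critical value.

    upper=True counts values greater than the critical value (R, T, V);
    upper=False counts values less than it (W).
    """
    if upper:
        hits = sum(1 for v in values if v > critical)
    else:
        hits = sum(1 for v in values if v < critical)
    return hits / len(values)


# Hypothetical heterogeneous-condition values compared with a critical value of 95.5.
rate = exceedance_rate([90.0, 96.0, 97.0, 80.0], 95.5)
print(rate, 100 * rate)   # 0.5 50.0
```

Multiplying the returned proportion by 100 gives the percentage exceedance rate used in the results chapter.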
To investigate the power of multivariate within-group
tests under true alternatives for the occasions, the
original noncontaminated data sets (with d = 1) were
transformed to reflect a given curvilinear trend across
time. Under homogeneity and an alternative condition for
within-group tests only, the above calculations were again
performed to determine Monte Carlo nominal power values for
tests of within-group differences.
The final step in the process was to add the
curvilinear trend to the heterogeneous data sets and repeat
the calculations to determine actual powers for within-
group tests under noncentral heterogeneous conditions. By
comparing these values to those for nominal power, the
effects of heterogeneity on the power of within-group tests
under an alternative hypothesis could be evaluated.
Comparison of Between-Group and Within-Group Tests
In order to empirically determine whether between-group
and within-group test statistics respond
differentially to identical heterogeneity conditions under
true null hypotheses, one experimental situation with k = 3
and n = 20 was replicated 10,000 times. The large number
of replications was used in order to ensure relatively
small standard errors.
The comparison of actual significance levels for the
group and occasion tests, which was of main interest, was
examined from two perspectives. First, tests were compared
within a given p to simulate practical analyses where tests
of both hypotheses are performed on the same data set.
However, discrepancies between actual significance levels
evidenced here might occur because group and occasion tests
are based on SSCP matrices of order p and p-1,
respectively. Therefore, a second comparison was made
between the group tests with p = 4 and the occasion tests
with p = 5, so that both would be based on order-4 SSCP
matrices.
Comparison of Within-Group Tests
Using the same 10,000 replications, comparisons of
actual significance levels were made between the two sets
of within-group tests for general trends and for trends
higher than linear. With the data modified to reflect true
alternatives for within-group hypotheses, the power to
reject the null under heterogeneity was also evaluated.
The second stage of the research was an attempt to
examine the effects of heterogeneity on within-group tests
when the number of groups and the (equal) sample size are
varied. Both robustness and power were considered, with
2,000 replications each for five combinations of k and n.
Tests of between-group differences in the central case were
also made in order to determine if discrepancies between
these tests and within-group tests were sensitive to
changes in number of groups and sample size.
Interpretation of Significance Levels and Power
The critical values and probability levels for
significance (Type I error) and power were obtained via
Monte Carlo methods and are therefore subject to sampling
error. To take this error into account, the standard error
(S.E.) of a proportion for a sample size equal to the
number of replications was employed.
The S.E. for a proportion depends on the true value of
the proportion, P, and is equal to $\sqrt{P(1-P)/N}$, where N
equals the number of replications. Since the true value of
P (i.e., nominal alpha) is known, this formula may be used
to calculate the S.E. at the three nominal alpha levels
considered. These are given in Table 4-2.
Table 4-2
Standard Errors for Nominal Alpha Levels
and Number of Replications Used in the Study

Alpha    N = 2,000    N = 10,000
.01      .0022        .0010
.05      .0049        .0022
.10      .0067        .0030
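The entries of Table 4-2 follow directly from the S.E. formula. A minimal sketch (the function name is hypothetical) that prints the same grid to four decimals:

```python
import math


def proportion_se(p, n):
    """Standard error of a proportion p estimated from n replications."""
    return math.sqrt(p * (1.0 - p) / n)


print("Alpha  N = 2,000  N = 10,000")
for alpha in (0.01, 0.05, 0.10):
    print(f"{alpha:.2f}   {proportion_se(alpha, 2000):.4f}     "
          f"{proportion_se(alpha, 10000):.4f}")
```

Because the S.E. shrinks with the square root of N, going from 2,000 to 10,000 replications reduces it by a factor of about 2.24.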
Monte Carlo Techniques
The methods for exploring the issues of robustness in
this study involved the use of simulated data generated by
computer algorithms. Through the analysis of a large
number of samples under known population parameters, one
can investigate the properties of statistics by observing
their resulting distributions. These empirical
distributions obtained under heterogeneity are then
compared to the nominal distributions obtained under
homogeneity for the statistics in question. The FORTRAN
program was used to generate either 10,000 or 2,000 samples
of vector observations for each experimental condition and
to perform the required data transformations and analyses.
The procedures followed are specified in this section.
The required data were 5 × 1 vector observations,
normally distributed with known mean vector and covariance
matrix. The generation and transformation procedures
consisted of three steps:
1) Generate a set of independent random observations
uniformly distributed on the interval 0 to 1.
2) Combine the uniform variates to create a set of
observations normally distributed with mean vector
zero and covariance matrix equal to the identity.
3) Transform these observations to obtain the desired
structure with mean vector μ and covariance matrix V.
Each step will be considered separately.
Random Number Generation
Hammersley and Handscomb (1964) stated that "the
essential feature common to all Monte Carlo computations is
that at some point we have to substitute for a random
variable a corresponding set of actual values, having the
statistical properties of the random variable" (p. 25).
These values are called random numbers. In practice, what
is actually produced via computer programs is a set of
pseudo-random numbers calculated sequentially from a
completely specified algorithm. This algorithm is devised
in such a way that a statistical test should not detect any
significant departure from randomness.
The subroutine GGUBS from the International
Mathematical and Statistical Library was used to obtain a
sequence of uniform random numbers $U_1, \ldots, U_n$ distributed
U(0, 1). This routine uses a congruential generator based
on the following relation:
$$X_i = a X_{i-1} \pmod{m},$$
where $a = 7^5$ and $m = 2^{31} - 1$. Once the procedure is
started by an initial seed value, each $X_i$ is determined
from the previous value. The constants a and m are
chosen so as to maximize the period of the generator, since
a sequence repeats itself when a value for $X_i$ reappears.
The numbers $U_i = X_i / 2^{31}$ are a pseudo-random sequence
in the interval 0 to 1. For practical purposes they may be
treated as independent of each other, and they behave as if
they were random.
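The recurrence can be sketched directly. The code below is not the IMSL GGUBS source; it only reproduces the relation stated above with a = 7^5 = 16807 and m = 2^31 - 1, scaling by 2^31 as in the text. Python's unbounded integers sidestep the overflow handling that a 32-bit FORTRAN implementation needs. The function names are hypothetical.

```python
A = 7 ** 5          # multiplier a = 16807
M = 2 ** 31 - 1     # modulus m = 2147483647, a prime


def next_state(x):
    """One step of the multiplicative congruential generator X_i = a X_{i-1} mod m."""
    return (A * x) % M


def uniform_stream(seed, count):
    """Return `count` pseudo-random values U_i = X_i / 2^31, all in (0, 1)."""
    if not 0 < seed < M:
        raise ValueError("seed must satisfy 0 < seed < m")
    x, out = seed, []
    for _ in range(count):
        x = next_state(x)
        out.append(x / 2 ** 31)
    return out


x = 1
for _ in range(3):
    x = next_state(x)
    print(x)        # 16807, then 282475249, then 1622650073
```

A zero seed would stay at zero forever, which is why the sketch rejects it.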
Normal Deviates
Several approaches are available to create independent
normal deviates from uniform random numbers. A simple
approach to program is based on the Central Limit Theorem
(CLT) and uses a summation of a fixed number of values,
where this number may be as low as 12 for reasonable
approximations. However, the procedure "is very slow and
it does not adequately sample in the extreme tails of the
normal distribution" (Lehman, 1977, p. 148).
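To illustrate the CLT approach and its tail limitation, the sketch below sums 12 uniforms and subtracts 6. It uses Python's standard `random` module as a stand-in uniform generator (an assumption; the study used GGUBS), and the seed is arbitrary. The sum has mean 0 and variance 1, but every deviate is confined to [-6, 6).

```python
import random
import statistics


def clt_normal(rng):
    """Approximate N(0, 1) deviate: the sum of 12 U(0, 1) values minus 6.
    Each uniform has mean 1/2 and variance 1/12, so the sum has mean 6 and
    variance 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(12345)                  # arbitrary seed for reproducibility
draws = [clt_normal(rng) for _ in range(100_000)]
print(round(statistics.fmean(draws), 3), round(statistics.pvariance(draws), 3))
# The sum is bounded, so no deviate can ever fall beyond +/-6.
print(min(draws) >= -6.0 and max(draws) < 6.0)   # True
```

Each deviate also costs 12 uniform draws, against one uniform per deviate for the pairwise method adopted in the study.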
The method used in this study for generating normal
deviates from independent random numbers, which was devised
to be reliable in the tails, was suggested by Box and
Muller (1958). They cite a detailed comparison with
several other methods, including the Central Limit
summation, and state that their approach gives higher
accuracy and compares favorably in terms of speed. The
procedure uses a pair of independent random numbers $U_1$ and
$U_2$ from the same distribution on the interval (0, 1) to
generate a pair of normal deviates from the same normal
distribution, N(0, 1). The following transformations are used:
$$Z_1 = (-2 \log_e U_1)^{1/2} \cos 2\pi U_2,$$
$$Z_2 = (-2 \log_e U_1)^{1/2} \sin 2\pi U_2.$$
The resulting values are a pair of independent random
variables, normally distributed with zero mean and unit
variance.
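A minimal sketch of the Box-Muller transformation, plus a helper that assembles a 5 × 1 vector of deviates. Python's `random` module stands in for GGUBS (an assumption), and the helper names are hypothetical. Because deviates come in pairs, building five values uses three pairs and discards the leftover.

```python
import math
import random


def box_muller(u1, u2):
    """Transform two independent U(0, 1) values into two independent N(0, 1)
    deviates: Z1 = sqrt(-2 ln U1) cos(2 pi U2), Z2 = sqrt(-2 ln U1) sin(2 pi U2)."""
    r = math.sqrt(-2.0 * math.log(u1))   # requires 0 < u1 <= 1
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)


def normal_vector(rng, p=5):
    """Return p independent N(0, 1) deviates (a p x 1 vector as a list).
    Pairs are generated until p values exist; a leftover value is discarded."""
    z = []
    while len(z) < p:
        u1 = 1.0 - rng.random()          # maps [0, 1) to (0, 1] so log(u1) is finite
        z.extend(box_muller(u1, rng.random()))
    return z[:p]


print(box_muller(math.exp(-0.5), 0.0))   # approximately (1.0, 0.0)
print(normal_vector(random.Random(7)))   # five N(0, 1) deviates
```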
Vectors of five such variables taken together
represent 5 × 1 observational vectors, which are multivariate
normal and distributed N(0, I) (Anderson, 1958, pp. 19-27).
Observations of this form were used to simulate the central
case with homogeneous covariance matrices.
Transformation of Normal Deviates
The first step to determine a vector with specified
variances and intercorrelations among the variables is to
factor a known covariance matrix V into a lower triangular
matrix T such that
$$V = TT'.$$
This is the square root method or Cholesky factorization of
a symmetric positive-definite matrix V (Bock, 1975, p. 85).
Then, a transformation of a vector of normal deviates $z$,
$$y = Tz + \mu,$$
produces a normally distributed vector $y$ with the desired
characteristics, since
$$\mathrm{Var}(y) = T\,\mathrm{Var}(z)\,T' = TT' = V$$
when $\mathrm{Var}(z) = I$. The only effect of adding a known
vector of means $\mu$ is to change the point of central
tendency for the distribution of $y$.
In the present study, where $V = dI$,
$$T = d^{1/2} I,$$
and therefore
$$y = (d^{1/2} I) z + \mu = d^{1/2} z + \mu$$
was the transformation used for one group to simulate data
from heterogeneous populations in the noncentral case.
Other transformations used the above equation with
(1) $\mu = 0$ for the central heterogeneous case, and (2) $d = 1$
for the noncentral homogeneous case.
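The transformation can be sketched with NumPy's Cholesky routine. The mean vector below is purely hypothetical (the study's curvilinear trend values are not given in this section), the seed is arbitrary, and observations are stored as rows, so y = Tz + μ is written as `z @ T.T + mu`.

```python
import numpy as np


def to_structure(z, V, mu):
    """Map rows of z ~ N(0, I) to rows of y = T z + mu, where V = T T'."""
    T = np.linalg.cholesky(V)           # lower-triangular Cholesky factor
    return z @ T.T + mu                 # row-vector form of y = T z + mu


rng = np.random.default_rng(2024)       # arbitrary seed
p, d = 5, 4.0
V = d * np.eye(p)
print(np.allclose(np.linalg.cholesky(V), np.sqrt(d) * np.eye(p)))  # True: T = d^(1/2) I

mu = np.array([0.0, 1.0, 0.0, -1.0, 0.0])  # hypothetical mean vector, not the study's trend
z = rng.standard_normal((200_000, p))
y = to_structure(z, V, mu)
print(np.round(y.mean(axis=0), 2))               # close to mu
print(np.round(np.cov(y, rowvar=False), 1))      # close to V = 4I
```

Setting `mu` to zeros gives the central heterogeneous case, and setting d = 1 gives the noncentral homogeneous case, mirroring the two variants described above.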
After generating the data, the program performed the
required multivariate tests, calculated the critical
values, and tabulated the proportion of times the values of
each statistic exceeded its critical value for a given
nominal significance level when (1) a null hypothesis was
true and (2) an alternative hypothesis was true. The obtained
proportions were the actual Type I error rates and powers,
respectively, for the statistics. Multiplying these
obtained values by 100 produces percentage exceedance rates
under heterogeneity.
CHAPTER V
RESULTS
The results of the study are presented in this chapter
in four sections. The first two sections are based on
10,000 replications of experiments with k = 3 and n = 20
and deal first with comparisons of multivariate between-group
and within-group tests with respect to robustness and
then with the power of within-group tests under
heterogeneity of group covariance matrices. The latter two
sections present the effects of varying sample size and
number of groups first on the robustness and then on the
power of within-group tests under heterogeneity conditions.
Results in these latter sections are based on 2,000
replications for each of five combinations of k and n.
Critical values for each set of N replications (where
N = 10,000 or 2,000) under central homogeneous conditions
were obtained empirically through Monte Carlo methods and
are tabled in Appendix B. Actual significance levels under
central heterogeneous conditions and powers under
noncentral conditions were calculated by determining the
number of times obtained test statistics exceeded the
corresponding critical values and then dividing by the
number of replications. These empirical values were
multiplied by 100 and are reported in this chapter in terms
of percentage exceedance rates of the Monte Carlo critical
values.
Robustness of Between-Group and Within-Group Tests
The objective for this portion of the study was to
determine whether heterogeneity of group covariance
matrices produces differential effects for multivariate
tests of between-group and within-group differences. The
question could be phrased: Given no interaction effects
and no main effects for either groups or occasions in the
populations from which the data are sampled, are there
differences in the frequency with which rival test
statistics indicate a significant effect for tests of
between-group and within-group hypotheses under
heteroscedastic conditions? A secondary question relates
to differences between two within-group hypotheses (i.e., of
no trends in the occasion means and of no trends higher
than linear).
The situation considered was that of three equal
groups of size 20 with either five or four measures across
occasions. The procedures for central conditions, which
were detailed in Chapter IV, were followed.
Since the data were randomly generated using computer
algorithms, random error in the data must be considered.
To ensure that this error was small, 10,000 replications
were used. Given known parameters (i.e., nominal alpha
levels), the standard error of a proportion with 10,000
replications (see Chapter IV) may be used to calculate 95%
probability intervals around the known parameters instead
of confidence intervals around the sample estimates. This
produces the following intervals for the three nominal
levels considered:
.01 ± .0020
.05 ± .0043
.10 ± .0059
Expressed in terms of percentages, to correspond with
tabled values, the 95% probability intervals are:
(0.80, 1.20) at .01
(4.57, 5.43) at .05
(9.41, 10.59) at .10
Thus, obtained percentage values within these intervals may
be considered to be within sampling error of nominal
percentages.
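The listed intervals can be reproduced under one assumption not stated in the text: a normal multiplier of 1.96 applied to the standard error, with the half-width rounded to four decimals before conversion to percentages. The function name is hypothetical.

```python
import math


def probability_interval_pct(alpha, n, z=1.96):
    """95% probability interval, in percent, around a known nominal alpha.
    The half-width z * sqrt(alpha(1 - alpha)/n) is rounded to four decimals
    first, matching the half-widths listed above."""
    half = round(z * math.sqrt(alpha * (1.0 - alpha) / n), 4)
    return round(100 * (alpha - half), 2), round(100 * (alpha + half), 2)


for alpha in (0.01, 0.05, 0.10):
    print(alpha, probability_interval_pct(alpha, 10_000))
```

The same function with n = 2,000 gives the wider intervals appropriate to the second stage of the study.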
Critical values were estimated with Monte Carlo
methods and, therefore, are subject to error. Since
exceedance rates were derived from the same data sets
transformed to heterogeneous conditions, the deviations
from nominal levels in the following tables reflect only
added error due to heterogeneity.
As far as possible, parameters used in this part of
the study will be discussed separately in terms of their
effects on the percentage exceedance of Monte Carlo
critical values for the three hypotheses under
investigation (i.e., of no between-group differences, B; of
no trend over occasions, C; and of no trend higher than
linear, L). Table 5-1 contains the actual percentage
exceedance rates (i.e., empirical Type I error times 100)
for central heterogeneous situations.
.Significance_negel‘;1. Percentage exceedance rates for all
three hypotheses tested increased with larger nominal alpha
levels. except where obtained values were within 95%
probability intervals of the nominals. Although the
patterns were similar. increases in exceedance rates were
greatest for the between-group tests. B. and lowest for the
within-group tests of no trends higher than linear. L.
However, when tests for a given hypothesis are
considered with respect to standard errors, which also
increase with alpha level, different amounts of
heterogeneity showed consistent effects regardless of
significance level. For example, at all three alpha
levels, departures from the nominal for tests of B ranged
from about one standard error with the V statistic at d = 2
to over 50 times the standard error with the R statistic
at d = 9. Departures for the within-group tests were
typically around one standard error with all test
statistics at d = 2, and never exceeded 13 standard errors
for tests of C and eight standard errors for tests of L at
d = 9. The larger numerical departures in exceedance rates
as alpha increases are therefore apparently a function of
correspondingly larger standard errors.
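Because the binomial standard error √(α(1 − α)/N) grows with α, the same raw departure carries less weight at α = .10 than at α = .01. A small sketch of this standardization, again assuming N = 10,000 replications; the function name is hypothetical, and the observed rates in the example are illustrative numbers, not values from Table 5-1.

```python
import math


def departure_in_se_units(observed_pct, alpha, n_reps=10_000):
    """(observed - nominal) / SE, with the observed rate given in percent."""
    se_pct = 100 * math.sqrt(alpha * (1 - alpha) / n_reps)
    return (observed_pct - 100 * alpha) / se_pct


# Standard error (in percentage points) at each nominal level.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(100 * math.sqrt(alpha * (1 - alpha) / 10_000), 3))

# An identical 1-percentage-point excess is far larger relative to the SE at .01.
print(departure_in_se_units(2.0, 0.01))   # about 10 standard errors
print(departure_in_se_units(11.0, 0.10))  # about 3.3 standard errors
```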
Table 5-1. Percentage exceedance rates (empirical Type I error times 100) for central heterogeneous conditions: tests of B, C, and L with the R, T, V, and W statistics at nominal α = .01, .05, and .10, for p = 4 and 5 and d = 2, 4, and 9. [Tabled values are illegible in the scanned source.]
Number of variates, p. Percentage exceedance rates were
generally larger with five dependent variates than with
four. Whenever this was not the case, discrepancies
between corresponding exceedance rates at p = 4 and 5 were
less than twice the standard error for a difference of two
proportions. The smallest differences between exceedance
rates for corresponding tests at the two levels of p
occurred for the L tests. This may be due to the
relatively small departures from nominal levels for tests
of L, regardless of the number of variates.
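The comparison against twice the standard error for a difference of two proportions can be sketched as follows. The independent-samples formula √(p₁(1 − p₁)/N₁ + p₂(1 − p₂)/N₂) is assumed, with N = 10,000 replications per condition; since the p = 4 and p = 5 analyses reuse the same data sets, the two rates are correlated and this formula is only an approximation. Function names and example rates are hypothetical.

```python
import math


def se_difference(p1, p2, n1=10_000, n2=10_000):
    """Standard error of p1 - p2, treating the two proportions as independent."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def differ_reliably(pct1, pct2, n1=10_000, n2=10_000, multiple=2.0):
    """True if two percentage rates differ by more than `multiple` standard errors."""
    p1, p2 = pct1 / 100, pct2 / 100
    return abs(p1 - p2) > multiple * se_difference(p1, p2, n1, n2)


# Hypothetical exceedance rates (percent) at p = 5 and p = 4.
print(differ_reliably(5.6, 5.2))  # False: within two standard errors
print(differ_reliably(6.5, 5.2))  # True: beyond two standard errors
```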
Degree of heterogeneity, d. In general, tabled values
tended to be within 95% probability intervals of nominal
values with low heterogeneity and, in all cases, the
percentage exceedance rates increased with d. The effects
of greater heterogeneity were most pronounced for the B
tests, where actual Type I error departed by as much as .16
from a nominal .10 level. However, discrepancies between
actual and nominal values were less than .04 for the C
tests and .02 for the L tests at a nominal .10 level.
Test statistic. Considering low heterogeneity, percentage
exceedance rates tended to fall within 95% probability
intervals of nominal values with the V or W statistics when
testing the between-group hypothesis, whereas they did so
with all four statistics when testing either within-group
hypothesis. As expected from previous research on the
robustness of between-group tests (e.g., Olson, 1973), the
four test statistics were ordered V-W-T-R from best to
worst when testing B. Differences between actual Type I
errors for the V and R statistics for B were always greater
than twice the standard error of a difference, reaching as
high as .13 with high heterogeneity and five variates.
While results for tests of C generally followed the
same ordering from best to worst statistic, those for tests
of L did not. However, differences in departures from
nominal levels among the statistics for tests of both
within-group hypotheses were negligible, generally being
less than twice the standard error of a difference and only
once reaching a .01 difference. Except for tests of L,
greater heterogeneity increased the differences
between the best and worst statistic. This increase was
considerably more pronounced for tests of between-group
differences than for within-group tests of trends.
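All four criteria can be written as functions of the eigenvalues λᵢ of E⁻¹H, the product of the inverse error SSCP matrix and the hypothesis SSCP matrix. The sketch below assumes the usual reading of these labels in the robustness literature (V = Pillai-Bartlett trace, W = Wilks' Λ, T = Hotelling-Lawley trace, R = Roy's largest root, expressed here as θ = λmax/(1 + λmax); some authors report λmax itself). The function name is hypothetical, and conversion to critical values is not shown.

```python
import math


def manova_criteria(eigenvalues):
    """Four classical MANOVA criteria from the eigenvalues of E^-1 H.

    Large V, T, and R favor rejection; small W favors rejection.
    """
    lams = [float(lam) for lam in eigenvalues]
    return {
        "V": sum(lam / (1 + lam) for lam in lams),      # Pillai-Bartlett trace
        "W": math.prod(1 / (1 + lam) for lam in lams),  # Wilks' lambda
        "T": sum(lams),                                 # Hotelling-Lawley trace
        "R": max(lams) / (1 + max(lams)),               # Roy's largest root (theta form)
    }


print(manova_criteria([3.0, 1.0]))  # {'V': 1.25, 'W': 0.125, 'T': 4.0, 'R': 0.75}
```

Because V shrinks each eigenvalue toward at most one before summing while R depends only on the largest one, a single inflated root (as heterogeneity can produce) moves R much more than V; this is one common explanation for the V-to-R ordering.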
Tests of multivariate hypotheses. Exceedance rates for
within-group tests tended to be within 95% probability
intervals of nominal levels only under low heterogeneity.
For between-group tests, this tended to be true only when
the V or W statistics were used. To evaluate robustness in
terms of acceptable Type I error, results were considered
too liberal if they exceeded .015, .06, and .12 at nominal
levels of .01, .05, and .10, respectively. Using these
criteria when k = 3 and n = 20, only the between-group
tests with T, V, and W statistics would be considered
robust under low heterogeneity. For within-group tests,
robustness would extend to all four statistics and to
moderate heterogeneity (d = 4).
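The liberal-robustness rule just stated reduces to a lookup of the upper limit for each nominal level. A minimal sketch; the function name and the example rates are hypothetical.

```python
# Upper limits (as proportions) beyond which an empirical Type I error rate
# was judged too liberal, keyed by nominal alpha, as stated in the text.
LIBERAL_LIMITS = {0.01: 0.015, 0.05: 0.06, 0.10: 0.12}


def is_robust(actual_rate, nominal_alpha):
    """True unless the empirical Type I error exceeds the liberal limit.
    Only the liberal side is checked, as in the criterion above."""
    return actual_rate <= LIBERAL_LIMITS[nominal_alpha]


# Hypothetical rates, not values from Table 5-1.
print(is_robust(0.058, 0.05))  # True: within the .06 limit
print(is_robust(0.13, 0.10))   # False: exceeds the .12 limit
```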
To summarize the differential effects of heterogeneity
on tests of the three hypotheses and to examine more
explicitly the differences among them, Table 5-2 provides
differences in the actual percentage exceedance rates. The
first two sets of rows relate to tests with a given p, to
simulate practical analyses with tests for both B and C
performed on the same data sets. In the third set of rows,
the comparisons between B and C control for the size of the
SSCP matrices from which the tests are derived, so that
both sets of tests are based on matrices of order 4.
The lower half of the table presents similar
comparisons for the two within-group tests. As before, the
first two sets of rows relate to within-group tests with
the same initial p, while the third set compares the tests
with equal SSCP matrices of order 3.
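The SSCP orders quoted above are consistent with the usual way trend hypotheses are tested on p equally spaced occasions: C (no trend) uses all p − 1 orthogonal polynomial contrasts, and L (no trend higher than linear) drops the linear one, leaving p − 2, while B with p variates uses the full p × p matrix. Assuming that standard construction (the thesis's own contrast matrices are not shown in this section), the sketch builds normalized orthogonal polynomials by Gram-Schmidt and reports the order for each test; the function names are hypothetical.

```python
import math


def orthogonal_polynomials(p):
    """Normalized orthogonal polynomial contrasts of degrees 1..p-1 for
    p equally spaced occasions (Gram-Schmidt on 1, t, t^2, ...)."""
    times = range(1, p + 1)
    basis = []
    for degree in range(p):
        v = [float(t ** degree) for t in times]
        for b in basis:  # remove components along lower-degree vectors
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis[1:]  # drop the constant (degree 0) vector


def sscp_order(hypothesis, p):
    """Order of the SSCP matrix on which each test is based."""
    if hypothesis == "B":
        return p                                   # all p variates
    if hypothesis == "C":
        return len(orthogonal_polynomials(p))      # linear and higher
    if hypothesis == "L":
        return len(orthogonal_polynomials(p)[1:])  # quadratic and higher
    raise ValueError(f"unknown hypothesis: {hypothesis}")


for hyp, p in [("B", 4), ("C", 5), ("C", 4), ("L", 5)]:
    print(hyp, p, sscp_order(hyp, p))  # orders 4, 4, 3, 3
```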
The differences portray the extent to which departures
from nominal levels were typically greater for tests on B
than on C. Regardless of which set of comparisons was
considered, the differences followed similar general
patterns. The discrepancies in exceedance rates between
tests on B and C tended to be less than two standard errors
of a difference of two proportions when d = 2 or when the V
statistic was used.
Table 5-2. Differences in percentage exceedance rates between tests of B and C (with a common p of 4 or 5, and with SSCP matrices of order 4) and between tests of C and L (with a common p of 4 or 5, and with SSCP matrices of order 3), by test statistic, nominal α, and d. [Tabled values are illegible in the scanned source.]
For a given statistic, differences in percentage
exceedance rates decreased as either nominal alpha or
heterogeneity decreased. Consistently, the smallest
differences between B and C tests occurred with the V
statistic, typically being less than two percentage points.
The largest occurred with the R statistic, where
differences were as high as 12 percentage points. These
patterns reflect that between-group tests tend toward
robustness when heterogeneity is low or when the V test
statistic is used, and that actual significance levels for
the B tests increase considerably from V to R, while they
remain relatively stable across the four statistics for the
C tests.
Differences between the two within-group tests did not
follow the patterns of the B and C differences. The
discrepancies in percentage exceedance rates between tests
on C and L were less than two standard errors of a
difference with both d = 2 and 4, as well as in over half
the cases with d = 9. Regardless of the alpha level, these
differences were typically negligible and rarely exceeded
one percentage point.
Power of Within-Group Tests of Trends
The power of the tests to reject the null hypothesis was
evaluated under a homogeneous (d = 1) and three
heterogeneous conditions. The original 10,000 data sets
for three equal groups of size 20 were transformed to
reflect the same quadratic trend over the five time points
for each vector score. Since all the groups were equally
transformed, this provides a situation with neither
interaction nor between-group main effects, but with a
within-group main effect. The percentage of rejections for
the null hypotheses of no trend (C) and of no trend higher
than linear (L) was determined.
As shown in Table 5-3, power values were quite stable
across the four statistics within a given heterogeneity
condition and alpha level. This is fairly consistent with
previous findings for power of between-group tests under
heterogeneity (e.g., Olson, 1973), where differences across
the test statistics, although sometimes present, were
relatively minor.
Regardless of heterogeneity, power was always larger
at larger nominal levels. This trend follows what is
expected under general homogeneity conditions, since
"...setting alpha larger makes for relatively more powerful
tests of H0" (Hays, 1973, p. 359).
Within each nominal alpha level, power decreased as
heterogeneity increased. For example, with p = 5, power of
the C test at nominal .01 went from over 90% under
homogeneity to around 30% with a high degree of
heterogeneity (d = 9). At .05 and .10, power dropped from
98% and 99% to slightly over 50% and 65%, respectively.
This downward trend was remarkably consistent among all
test statistics for both hypotheses using both values of p.
The only difference among the four conditions was one of
magnitude.

Table 5-3. Power (percentage of rejections) of the within-group tests of C and L under the first trend transformation (Figure 5-1), by test statistic (R, T, V, W), nominal α (.01, .05, .10), p (4 and 5), and d (1, 2, 4, 9). [Tabled values are illegible in the scanned source.]
With five variates, power for the subsequent L tests
tended to be slightly better than for the corresponding C
tests. However, with four variates, the reverse was true,
with a dramatic loss in power occurring between the C and
corresponding L tests (e.g., going from 86% to 48% under
homogeneity at the .01 nominal level). Comparing p = 5 to
p = 4, power dropped only slightly for the C test, but
substantially for the L test.
The reason for the substantial reduction in power for
the L test with four variates appears to lie in the nature
of the transformation used to create an alternative-hypothesis
condition. While the curve was strongly
curvilinear with five measures, a linear trend serves as a
reasonable approximation of the data when only the first
four measures are used in the analyses (see Figure 5-1).
To test this explanation and further explore
effects of heterogeneity on power, a second trend
transformation was used that resulted in more pronounced
curvilinearity at four time points (see Figure 5-2).
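One rough check of this explanation is to project the mean vectors listed in the captions of Figures 5-1 and 5-2 onto orthogonal polynomial contrasts and ask what share of the trend's sum of squares lies beyond the linear component. This is a heuristic, not part of the reported analysis: it ignores the covariance structure, which also governs power, and it assumes the standard integer orthogonal polynomial coefficients for equally spaced occasions. The function name is hypothetical.

```python
# Standard integer orthogonal polynomial coefficients (linear, quadratic, ...)
# for equally spaced occasions.
CONTRASTS = {
    4: [(-3, -1, 1, 3), (1, -1, -1, 1), (-1, 3, -3, 1)],
    5: [(-2, -1, 0, 1, 2), (2, -1, -2, -1, 2), (-1, 2, 0, -2, 1), (1, -4, 6, -4, 1)],
}


def nonlinear_share(means):
    """Share of the between-occasion sum of squares of a mean vector that
    lies in the quadratic and higher-degree components."""
    squares = []
    for c in CONTRASTS[len(means)]:
        dot = sum(ci * m for ci, m in zip(c, means))
        squares.append(dot ** 2 / sum(ci * ci for ci in c))  # normalized contrast SS
    return sum(squares[1:]) / sum(squares)


curves = {
    "Figure 5-1, p = 5": (0, .4, .8, .5, .1),
    "Figure 5-1, p = 4": (0, .4, .8, .5),
    "Figure 5-2, p = 5": (0, .6, .7, .2, .05),
    "Figure 5-2, p = 4": (0, .6, .7, .2),
}
for label, means in curves.items():
    print(f"{label}: {nonlinear_share(means):.2f}")
```

On this measure the first curve has about 98% of its variation beyond linear at p = 5 but only about 45% at p = 4, whereas the second curve retains about 93% at p = 4, which is in line with the explanation given above.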
Power results for this second curve are presented in
Table 5-4. Comparing Tables 5-3 and 5-4 shows that both
Monte Carlo nominal values and obtained values under
homogeneity were quite similar in all cases when p = 5, as
Figure 5-1. Trend transformation for power results of
Table 5-3 with mean vectors:
(0, .4, .8, .5, .1) for p = 5
(0, .4, .8, .5) for p = 4
[Panels: means plotted against occasions 1 to 5 (p = 5) and 1 to 4 (p = 4).]

Figure 5-2. Second trend transformation for power results
of Table 5-4 with mean vectors:
(0, .6, .7, .2, .05) for p = 5
(0, .6, .7, .2) for p = 4
[Panels: means plotted against occasions 1 to 5 (p = 5) and 1 to 4 (p = 4).]
[Table 5-4: power of the within-group trend tests under the second trend transformation (Figure 5-2); tabled values are illegible in the scanned source.]