OPERATIONALLY DEFINING THE ASSUMPTION OF INDEPENDENCE AND CHOOSING THE APPROPRIATE UNIT OF ANALYSIS

A Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
Linda K. Glendening
1977

This is to certify that the thesis entitled Operationally Defining the Assumption of Independence and Choosing the Appropriate Unit of Analysis presented by Linda K. Glendening has been accepted towards fulfillment of the requirements for the Ph.D. degree in Educational Psychology.

Major professor
Date: January 28, 1977

ABSTRACT

OPERATIONALLY DEFINING THE ASSUMPTION OF INDEPENDENCE AND CHOOSING THE APPROPRIATE UNIT OF ANALYSIS

By

Linda K. Glendening

The assumption of independence was operationally defined as follows: individual units (such as students) can be considered independent on some dimension whenever the variance of the grouped units (such as classrooms) can be predicted from the grouping size and the variance of the individual units. When this definition of independence is satisfied, the expected mean squares between and within groups are equal. Given this operational definition, two types of dependence are possible, positive and negative. Positive dependence was defined by the expected mean square between groups being larger than the expected mean square within groups. Negative dependence was defined by the expected mean square within groups being larger than the expected mean square between groups.

Both empirical and analytical methods were used to study the effect of violating the assumption of independence, where the design model was balanced and had two levels of nesting: subjects within groups and groups within treatments.
Group data were independent of each other, while the data for subjects within groups were manipulated to create different degrees and types of dependence. The simulated data were analyzed using two ANOVA models: the "never pool" model, in which group was the unit of analysis and which was therefore always a correct model, and the "always pool" model, with student as the unit of analysis.

First, sampling distributions using the "never pool" model and the "always pool" model were compared for independent, positively dependent, and negatively dependent conditions. Given independence of subject responses, either subject or group can be used as the unit of analysis, as both the "never pool" and the "always pool" tests proved to have acceptable Type I error rates for the test of treatment effects. The "always pool" test is preferable, however, as it had more power than the "never pool" test. Given positive dependence, the proper unit of analysis is the grouped unit: using subject as the unit of analysis caused the pooled error term for the "always pool" test to be too small, so the "always pool" test was too liberal and had spuriously high power. Given negative dependence, the correct unit of analysis is again the grouped unit: using subject as the unit of analysis caused the pooled error term for the "always pool" F test to be too large, so the "always pool" test was too conservative and had spuriously low power. The empirical results indicated clearly that the F test is not robust to violations of the assumption of independence, even given small degrees of positive and negative dependence.

Next, a conditional testing procedure (a "sometimes pool" model) was studied in which an initial test of independence was done and, on the basis of that test, a unit of analysis was chosen for the primary test of treatment effects. Sampling distributions using the "never pool" and the "sometimes pool" models were compared for independent, positively dependent, and negatively dependent conditions. Given independence of ungrouped units, the "sometimes pool" F test had acceptable Type I error rates for the test of treatment differences, as did the "never pool" test. In addition, the powers of the "sometimes pool" test tended to be greater than those of the "never pool" test. Given positive dependence, the "sometimes pool" F test generally was too liberal and thus had spuriously high empirical power. Given negative dependence, the "sometimes pool" test was somewhat conservative and generally had less power than the "never pool" F test. These results suggest that, as a general rule of thumb, a preliminary test of independence should not be done to choose a unit of analysis for testing treatment differences.

OPERATIONALLY DEFINING THE ASSUMPTION OF INDEPENDENCE AND CHOOSING THE APPROPRIATE UNIT OF ANALYSIS

By

Linda K. Glendening

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Counseling, Personnel Services, and Educational Psychology

1977

ACKNOWLEDGMENTS

First and foremost, I would like to acknowledge the members of my dissertation committee for their assistance. Special thanks go to my advisor and friend, Professor Andrew Porter. Working with him has strengthened my capabilities as a researcher and broadened my research interests and experiences. In addition, I would like to thank Drs. William Schmidt, Lee Shulman, and James Stapleton for their interest in my research.

I also wish to acknowledge the National Institute of Education, where I did my research. The Institute has provided me with opportunities to increase my awareness of and concern for current educational research problems.

TABLE OF CONTENTS

LIST OF TABLES

Chapter

I. STATEMENT OF THE PROBLEM
II. REVIEW OF LITERATURE
     Definitions of Independence
     Threats to Independence
     Selection of Analytic Units
          Statistical Arguments for Unit Selection
          Logical Arguments for Unit Selection
III. AN OPERATIONAL DEFINITION OF INDEPENDENCE
IV. ANALYTIC RESULTS
     The Effects of Dependence
          Classroom as Unit
          Student as Unit
     The Preliminary Test
V. SIMULATION PROCEDURES
     Simulation Parameters
     Data Generation Routine
VI. UNITS OF ANALYSIS: EMPIRICAL ESTIMATES OF EFFECTS
     Classroom as Unit
          Independence
          Positive Dependence
          Negative Dependence
     Student as Unit
          Independence
          Positive Dependence
          Negative Dependence
VII. THE CONDITIONAL F TEST: EMPIRICAL ANALYSIS
     The Two-Tailed Preliminary Test
          Independence
          Positive Dependence
          Negative Dependence
     The Upper-Tailed Preliminary Test
          Independence
          Positive Dependence
          Negative Dependence
     The Lower-Tailed Preliminary Test
          Independence
          Positive Dependence
          Negative Dependence
VIII. SUMMARY AND CONCLUSIONS

Appendix

A. PRELIMINARY ANALYSIS OF SIMULATED DATA
B. ESTIMATED ALPHAS OF THE RESCALED F STATISTIC
C. POWERS OF THE CONDITIONAL F GIVEN A TWO-TAILED PRELIMINARY TEST
D. POWERS OF THE CONDITIONAL F GIVEN AN UPPER-TAILED PRELIMINARY TEST
E. POWERS OF THE CONDITIONAL F GIVEN A LOWER-TAILED PRELIMINARY TEST

BIBLIOGRAPHY

LIST OF TABLES

Table

1. Power Computations Using Groups and Individuals as Units of Analysis, Given Individuals Are Independent
2. Expected Mean Squares for Model A
3. Expected Mean Squares for Model B
4. Theoretical Intraclass Correlation Coefficients for Selected Numbers of Students and Degrees of Dependence
5. Design of Study
6. Empirical Type I Errors for F = MST/MSC:T
7. Empirical Powers for F = MST/MSC:T
8. Empirical Type I Errors for F = MST/MSS:T
9. Empirical Powers for F = MST/MSS:T
10. Discrepancy Between Observed and Theoretical E[...]

[...] d/n), where d equals the critical F value at any nominal alpha, given (t - 1) and (sc - 1)t degrees of freedom. The estimated Type I errors, given positive dependence, obtained using the rescaled F statistic are reported at the bottom of Table B-1 in Appendix B. These estimated values closely match those found in the Monte Carlo study (Table 8). (Ninety percent of the matched alpha values from the two empirical analyses were within 1.96 standard errors of each other.)
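The liberal behavior of F = MST/MSS:T under positive dependence, and its conservative behavior under negative dependence, can be anticipated from expected mean squares. In the balanced design the pooled error MSS:T combines the classes-within-treatments and students-within-classes sums of squares, so its expectation is the degrees-of-freedom-weighted average of E(MSC:T) and E(MSS:CT), while under the null hypothesis MST has the same expectation as MSC:T. The Python sketch below is a heuristic based only on ratios of expectations, not a reproduction of Table 10; the function name is hypothetical. It returns E(MSC:T)/E(MSS:T) for a given r = E(MSC:T)/E(MSS:CT): values above 1 point to a liberal "always pool" test, values below 1 to a conservative one.

```python
def pooled_discrepancy(r, s, c):
    """Ratio E(MSC:T) / E(MSS:T) for a balanced design under H0.

    r -- E(MSC:T) / E(MSS:CT): 1 = independence, > 1 positive, < 1 negative
    s -- students per class; c -- classes per treatment.
    The number of treatments t multiplies both dfs and cancels out.
    """
    df_ct = c - 1          # classes within one treatment
    df_sct = c * (s - 1)   # students within classes of one treatment
    # E(MSS:T) = (df_ct * E(MSC:T) + df_sct * E(MSS:CT)) / (df_ct + df_sct),
    # with E(MSS:CT) scaled to 1 so that E(MSC:T) = r.
    return r * (df_ct + df_sct) / (df_ct * r + df_sct)


if __name__ == "__main__":
    for r in (2.0, 0.5):
        for s in (5, 12, 20):
            print(f"r={r}, c=5, s={s}: {pooled_discrepancy(r, s, 5):.3f}")
```

For r = 2 and c = 5 the ratio rises from about 1.71 (s = 5) to 1.87 (s = 12) and 1.92 (s = 20), matching the direction of the growing liberalness with s described in this section; for r = .5 it falls from about .55 to .52 and .51, the direction that the Table 10 discussion associates with increasing conservativeness.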
Because the empirical alpha levels of both techniques were similar in both absolute value and trend, the remaining analysis of the empirical effects on the alpha level of increasing s, increasing c, and increasing degree of positive dependence will be discussed and illustrated using only the simulated, or Monte Carlo, data found in Table 8. With c and the degree of positive dependence held constant and the actual alpha levels averaged across the five nominal alpha levels, an increase in s was directly related to an increase in liberalness. For example, at E(MSC:T)/E(MSS:CT) equal to 2 and c equal to 5, averaging across the five nominal alpha levels gave mean observed alpha levels equal to .159 for s = 5, .178 for s = 12, and .198 for s = 20 (Table 8). This direct relationship between liberalness and increasing s, given positive dependence, occurs because as s is increased the discrepancy between the E(MSC:T) and the E[...]

Negative Dependence

The effects of negative dependence between disaggregate units on the actual significance levels of the test F = MST/MSS:T, estimated using Monte Carlo procedures, are reported at the top of Table 8. None of the empirical alpha levels were within 1.96 standard errors of the nominal alpha levels. All empirical alpha levels were smaller than their nominal counterparts, which means the test statistics were too conservative. This indicates that, given that the variance of the class means was smaller than the student-level variance divided by s and there were no treatment effects, F = MST/MSS:T was not distributed as a central F; instead it had a distribution located to the left of the central F distribution obtained for the same statistic given independence of individual units. This finding concurs with the analytic work of Scheffé (1959) and Cochran (1947) and also with the analytic work presented in Chapter IV.

The rescaled F statistic was also used to estimate the magnitude of effects given negative dependence and prespecified parameters.
The results of this analysis are reported at the top of Table B-1 in Appendix B. Once again the estimated alpha values in Table B-1 closely match, in both absolute value and trend, those reported in Table 8. (Ninety-eight percent of the matched alpha levels from the two empirical analyses were within 1.96 standard errors of each other.) Because of this, the effects on the alpha level of increasing s, increasing c, and increasing the level of negative dependence will be discussed and illustrated using only the simulated data found in Table 8.

The theoretical and observed discrepancies (Table 10) between the E(MSC:T) and the E(MSS:T) for each level of negative dependence indicate that, as s increases, F = MST/MSS:T should become more conservative, because the E(MSS:T) becomes increasingly larger than the E(MSC:T) as s increases. Table 10 indicates that the opposite should occur as c is increased. Neither of these two expectations appeared in the simulated data. It may be that the observed alpha values were simply too close to zero, and that the three levels of the number of students and of the number of classes were not different enough to bring out the expected trends.

Table 8 also shows that the degree of conservativeness is monotonically related to the degree of negative dependence. For example, at E(MSC:T)/E(MSS:CT) equal to .5, averaging across the five combinations of s and c gave mean observed alpha levels equal to .000 for α = .01, .003 for α = .025, .008 for α = .05, .022 for α = .10, and .110 for α = .25; while at E(MSC:T)/E(MSS:CT) equal to .33, averaging across the combinations of s and c gave mean observed alpha levels equal to .000 for α = .01, .000 for α = .025, .001 for α = .05, .007 for α = .10, and .054 for α = .25.

The empirical powers for F = MST/MSS:T for the two simulated negative dependence conditions are shown at the top of Table 9.
As expected, the estimated power values increased both as the number of students per class increased and as the number of classes per treatment increased. For example, given the least degree of negative dependence, i.e., E(MSC:T)/E(MSS:CT) equal to .5, averaging across the five nominal alpha levels and keeping c constant at 5 gave mean power values of .304 for s = 5, .635 for s = 12, and .859 for s = 20. Given that same degree of negative dependence, averaging across the nominal alpha levels and keeping s constant at 12 gave mean power values of .284 for c = 2, .635 for c = 5, and .928 for c = 10.

By itself, a conservative test statistic should have spuriously low power. Thus the power of the test F = MST/MSS:T should be reduced as the negative dependence is increased from E(MSC:T)/E(MSS:CT) equal to .5 to E(MSC:T)/E(MSS:CT) equal to .33. For small error degrees of freedom, i.e., (sc - 1)t equal to 46 and 48, increasing negative dependence did decrease the power of F = MST/MSS:T. However, for large error degrees of freedom, the reverse occurred across most nominal alpha levels. Of greatest significance to the practitioner, however, F = MST/MSS:T had less power than the F = MST/MSC:T test in all cases but one (E[MSC:T]/E[MSS:CT] = .5, c = 2, s = 12, and α = .025). Thus, in this simulation situation, increasing the error degrees of freedom for the F statistic by using student, rather than class, as the unit of analysis did not compensate for the fact that using student made the test of no treatment effects too conservative.

CHAPTER VII

THE CONDITIONAL F TEST: EMPIRICAL ANALYSIS

The most desirable situation in testing hypotheses is, of course, both a small probability of a Type I error (α) and a small probability of a Type II error (β).
Table 7 of the preceding chapter showed that the probability of a Type II error, given the always correct F = MST/MSC:T test, the most commonly used alpha level of .05, and the operational definition of independence, was relatively high for four of the five simulated combinations of s and c. For c = 2 and s = 12, β equalled .847; for c = 5 and s = 5, β equalled .773; for c = 5 and s = 12, β equalled .522; for c = 5 and s = 20, β equalled .301; and for c = 10 and s = 12, β equalled .148. Clearly, it would be desirable to reduce these rather high probabilities, if possible, without increasing the probability of Type I errors.

It was shown earlier, both analytically and empirically (Table 9), that using F tests with disaggregate units as the units of analysis, F = MST/MSS:T, would reduce the probability of Type II errors, given independence, by increasing the error degrees of freedom. However, if the individual data values were positively dependent upon each other, which appears to be quite common in ordinary classroom situations, then using the test F = MST/MSS:T increased the probability of Type I errors. On the other hand, if observations within groups were negatively dependent, using the test F = MST/MSS:T decreased the probability of Type I errors, and the accompanying increase in error degrees of freedom was not enough to offset the increase in the probability of Type II errors caused by this spurious decrease in the probability of Type I errors. All in all, given simulated conditions common to educational data, it seemed "best" to use classroom as the unit of analysis given dependence (either positive or negative) between student responses, and student as the unit of analysis when student responses were independent of each other. Herein lies the motivation for performing a preliminary test of independence: the sole purpose of this preliminary test is to choose the appropriate unit of analysis for the primary test of treatment effects.
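The three F ratios discussed in this chapter (never pool, always pool, and the preliminary test of independence) all come from the same nested analysis of variance. The Python sketch below computes them for a balanced design, assuming data indexed as data[i][j][k] (treatment i, class j, student k) with t treatments, c classes per treatment, and s students per class; the function name and the tiny data set are hypothetical. Under the operational definition, independence means the variance of a class mean is the student-level variance divided by s, so E(MSC:T) = s times Var(class mean) equals E(MSS:CT), and the preliminary ratio MSC:T/MSS:CT should hover around 1. The sketch also shows the gain in error degrees of freedom, from t(c - 1) to (sc - 1)t, that motivates using student as the unit.

```python
from statistics import fmean


def nested_anova(data):
    """Mean squares and dfs for a balanced treatments/classes/students design.

    data[i][j][k] is the score of student k in class j under treatment i.
    Keys: "T" treatments, "C:T" classes within treatments,
    "S:CT" students within classes, "S:T" pooled students within treatments.
    """
    t, c, s = len(data), len(data[0]), len(data[0][0])
    class_means = [[fmean(cls) for cls in trt] for trt in data]
    trt_means = [fmean(row) for row in class_means]
    grand = fmean(trt_means)

    ss = {
        "T": c * s * sum((m - grand) ** 2 for m in trt_means),
        "C:T": s * sum((cm - tm) ** 2
                       for row, tm in zip(class_means, trt_means)
                       for cm in row),
        "S:CT": sum((y - cm) ** 2
                    for trt, row in zip(data, class_means)
                    for cls, cm in zip(trt, row)
                    for y in cls),
    }
    ss["S:T"] = ss["C:T"] + ss["S:CT"]  # pooling: student as the unit

    df = {"T": t - 1, "C:T": t * (c - 1), "S:CT": t * c * (s - 1)}
    df["S:T"] = df["C:T"] + df["S:CT"]  # equals (sc - 1)t

    ms = {key: ss[key] / df[key] for key in ss}
    return ms, df


# Hypothetical data: t = 2 treatments, c = 2 classes, s = 2 students.
example = [[[1, 3], [5, 7]],
           [[2, 2], [4, 6]]]
ms, df = nested_anova(example)
f_never_pool = ms["T"] / ms["C:T"]   # class as unit; df (t - 1, t(c - 1))
f_always_pool = ms["T"] / ms["S:T"]  # student as unit; df (t - 1, (sc - 1)t)
f_prelim = ms["C:T"] / ms["S:CT"]    # operational test of independence
print(df, f_never_pool, f_always_pool, f_prelim)
```

In this toy example the error degrees of freedom rise from 2 to 6 when students are pooled. For the design c = 2 and s = 12, with t = 2 as implied by the value (sc - 1)t = 46 quoted in Chapter VI, they would rise from 2 to 46.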
The problem with using this operational test of independence, F = MSC:T/MSS:CT, to select a unit of analysis for the primary test is that the procedure gives the primary test of no treatment effects a conditional F distribution. Of interest, then, is the difference between the distribution of the conditional F test statistic (also called the "sometimes pool" test statistic) and the distribution of the appropriate unconditional and always correct F statistic from the "never pool" model, F = MST/MSC:T. Variables examined to see how they affected this difference included the number of students per class, the number of classes per treatment, the type and degree of dependence, the alpha level of the preliminary test, and the alpha level of the primary test.

The effects of each of these variables on the distributional properties of the conditional F test were studied empirically within the context of three different preliminary F tests: (a) a two-tailed preliminary F test, (b) the usual, upper-tailed only preliminary F test, and (c) a lower-tailed only preliminary F test. In order to claim this two-stage testing procedure a success, the observed alpha level of the conditional F test should be close to the nominal alpha level at which the researcher thinks he is working, and the procedure should have greater power than the always correct, unconditional test F = MST/MSC:T. Simulated data, identical to those used for looking at the effects of correlated units of analysis, were used to examine both empirical probabilities of Type I errors and empirical powers of the conditional F tests. Based on the results of one of the preliminary tests, either Model A (Table 2), F = MST/MSC:T, or Model B (Table 3), F = MST/MSS:T, was designated as the appropriate model to use in testing the primary hypothesis of no treatment effects.
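The two-stage procedure can be sketched as follows. This is a minimal Python illustration, not the simulation code used in the study: it assumes the mean squares and degrees of freedom are already available as dictionaries keyed "T", "C:T", "S:CT", and "S:T", and that the caller supplies the preliminary test's critical values from an F table with the C:T and S:CT degrees of freedom. Function and parameter names are hypothetical. Supplying only an upper critical value gives the usual upper-tailed preliminary test, only a lower one gives the lower-tailed test, and both give the two-tailed test. The second function weights the actual alphas of Models A and B by how often each model was chosen, which is the overall rejection rate of the conditional test.

```python
import math


def sometimes_pool(ms, df, lower_crit=0.0, upper_crit=math.inf):
    """Choose the error term for the primary test of no treatment effects.

    ms, df -- dicts keyed "T", "C:T", "S:CT", "S:T".
    lower_crit, upper_crit -- critical values of the preliminary
    F = MSC:T / MSS:CT; keep a default to switch that tail off.
    Returns (model, F, (numerator df, denominator df)).
    """
    f_prelim = ms["C:T"] / ms["S:CT"]
    if f_prelim < lower_crit or f_prelim > upper_crit:
        # Independence rejected: Model A, class as the unit of analysis.
        return "A", ms["T"] / ms["C:T"], (df["T"], df["C:T"])
    # Independence retained: Model B, pooled error, student as the unit.
    return "B", ms["T"] / ms["S:T"], (df["T"], df["S:T"])


def conditional_alpha(n_a, alpha_a, n_b, alpha_b):
    """(nA * alphaA + nB * alphaB) / (nA + nB)."""
    return (n_a * alpha_a + n_b * alpha_b) / (n_a + n_b)


# Hypothetical mean squares and critical values, for illustration only.
ms = {"T": 10.0, "C:T": 2.0, "S:CT": 1.0, "S:T": 4.0 / 3.0}
df = {"T": 1, "C:T": 2, "S:CT": 4, "S:T": 6}
print(sometimes_pool(ms, df, lower_crit=0.1, upper_crit=5.0))  # model B
print(sometimes_pool(ms, df, upper_crit=1.5))                   # model A
print(conditional_alpha(100, 0.20, 300, 0.04))
```

Passing the critical values in, rather than computing them, keeps the sketch free of any statistics library; with both defaults left in place the preliminary test never rejects and the procedure reduces to the "always pool" test.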
The actual alpha level for the conditional F test was defined by (nA αA + nB αB)/(nA + nB), where nA and nB equalled the number of preliminary F tests rejected and not rejected, respectively, and αA and αB equalled the actual alpha levels for the primary tests of no treatment effects analyzed by Models A and B, respectively.

The Two-Tailed Preliminary Test

The two-tailed preliminary F test tested the hypothesis that the E(MSC:T) equalled the E(MSS:CT), or equivalently that the intraclass correlation ρI equalled zero. The effects of the two-tailed preliminary test were examined at five different preliminary test alpha levels (i.e., .02, .05, .10, .20, and .50). Actual conditional test alpha levels, given the two-tailed preliminary test, are shown in Tables 11 through 15. Corresponding differences between the empirical powers of the conditional test and the unconditional, always correct test F = MST/MSC:T are shown in Tables 16 through 20. Appendix C (Tables C-1 through C-5) contains the actual statistical powers of the conditional F test, given the two-tailed preliminary test.

Each separate table describes, for one specific combination of s and c, the effect on the conditional F test alpha level or power of varying the type and degree of dependence, the two-tailed preliminary test alpha level, and the conditional test alpha level. Examining the effects of s and c therefore requires between-table comparisons. In this study each combination of s and c will be referred to as a "design." Thus, this study includes five designs: c = 2 and s = 12, c = 5 and s = 5, c = 5 and s = 12, c = 5 and s = 20, and c = 10 and s = 12.

Independence

Independence is that condition where the variance of the aggregate units is predictable given the variance of the disaggregate units and the grouping size. Operationally speaking, within the context of this study, independence occurs whenever the ratio of E(MSC:T) over E(MSS:CT) equals 1.
Given this situation, the two-tailed preliminary test should ideally not reject its null hypothesis, H0: E(MSC:T) = E(MSS:CT), thereby designating the disaggregate unit (students) as the appropriate unit of analysis in testing for treatment effects.

Table 11
Actual Alphas of the Conditional F Test Given a Two-Tailed Preliminary Test, c = 2 and s = 12

E(MSC:T)/  Prelim.     Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha  .010   .025   .050   .100   .250   alpha
.33        .02         .007a  .015a  .017   .025   .081   .029
           .05         .010a  .024a  .033a  .051   .114   .046
           .10         .010a  .029a  .044a  .070a  .151   .061
           .20         .010a  .029a  .048a  .085a  .200a  .074
           .50         .010   .029   .050   .093   .257   .088
Mean alpha             .009   .025   .038   .065   .161
.50        .02         .008a  .014a  .023   .044   .116   .041
           .05         .011a  .023a  .035a  .060   .139   .054
           .10         .011a  .026a  .045a  .075a  .161   .064
           .20         .011a  .029a  .052a  .091a  .200a  .077
           .50         .010   .028   .052   .098   .254   .088
Mean alpha             .010   .024   .041   .074   .174
1          .02         .016a  .042   .070   .102a  .251a  .096
           .05         .018   .045   .072   .104a  .250a  .098
           .10         .019   .048   .078   .113   .260a  .104
           .20         .019a  .048   .085   .122   .268   .108
           .50         .016   .042   .079   .126   .289   .110
Mean alpha             .018   .045   .077   .113   .264
2          .02         .065   .090   .138   .210   .357   .172
           .05         .066   .093   .138   .201   .340   .168
           .10         .066   .091   .134   .193   .325   .162
           .20         .055   .082   .123   .181   .309   .150
           .50         .035   .062   .098   .152   .280   .125
Mean alpha             .057   .084   .126   .187   .322
3          .02         .093   .144   .188   .261   .383   .214
           .05         .085   .136   .178   .244   .357   .200
           .10         .079   .125   .161   .219   .329   .183
           .20         .067   .108   .139   .196   .306   .163
           .50         .044   .070   .100   .143   .274   .126
Mean alpha             .074   .117   .153   .213   .330

aActual alpha is within 1.96 standard errors of the nominal alpha.
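The superscript a in Tables 11 through 15 flags an actual alpha within 1.96 standard errors of its nominal alpha. The number of replications behind each estimate is not given in this section, so the Python sketch below is only an illustration under two assumptions: the standard error is the binomial one, the square root of α(1 - α)/N, and N (the hypothetical parameter n_reps) is the number of independent replications.

```python
import math


def within_band(actual, nominal, n_reps, z=1.96):
    """True if an observed rejection rate lies within z standard errors of nominal.

    Assumes the binomial standard error sqrt(alpha * (1 - alpha) / n_reps)
    for n_reps independent replications (n_reps is a hypothetical input).
    """
    se = math.sqrt(nominal * (1.0 - nominal) / n_reps)
    return abs(actual - nominal) <= z * se


# With a hypothetical 1,000 replications the band around .05 is about +/- .0135.
print(within_band(0.055, 0.05, 1000))  # True
print(within_band(0.077, 0.05, 1000))  # False
```

Because the band shrinks with the square root of N, whether a given discrepancy earns the superscript depends on the number of replications as well as on the size of the discrepancy.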
Table 12
Actual Alphas of the Conditional F Test Given a Two-Tailed Preliminary Test, c = 5 and s = 5

E(MSC:T)/  Prelim.     Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha  .010   .025   .050   .100   .250   alpha
.33        .02         .008a  .019a  .044a  .070   .151   .058
           .05         .009a  .020a  .049a  .076a  .192   .069
           .10         .010a  .022a  .055a  .098a  .209   .079
           .20         .011a  .024a  .059   .107   .230a  .086
           .50         .011   .025   .061a  .108a  .245a  .090
Mean alpha             .010   .022   .054   .092   .205
.50        .02         .006a  .018a  .032a  .053   .152   .052
           .05         .008a  .024a  .045a  .070a  .180   .065
           .10         .008a  .026a  .053a  .084a  .201a  .074
           .20         .010a  .026a  .057a  .098a  .224a  .083
           .50         .011   .026   .062   .108   .241   .090
Mean alpha             .009   .024   .050   .083   .200
1          .02         .013a  .024a  .053a  .100a  .235a  .085
           .05         .016a  .029a  .058a  .106a  .234a  .089
           .10         .016   .031a  .060   .108   .237   .090
           .20         .018a  .030a  .064   .118a  .244a  .095
           .50         .014   .031   .069   .118   .247   .096
Mean alpha             .015   .029   .061   .110   .239
2          .02         .039   .073   .112   .171   .308   .141
           .05         .036   .067   .104   .155   .289   .130
           .10         .033   .060   .095   .144   .278a  .122
           .20         .026   .054   .091   .133   .263a  .113
           .50         .020   .040   .074   .119   .248   .110
Mean alpha             .031   .059   .095   .144   .277
3          .02         .056   .082   .119   .170   .319   .149
           .05         .047   .073   .109   .159   .299   .137
           .10         .038   .059   .095   .147   .285a  .125
           .20         .031   .046   .079   .127a  .274a  .111
           .50         .019   .033   .071   .115   .253   .098
Mean alpha             .038   .059   .095   .144   .286

aActual alpha is within 1.96 standard errors of the nominal alpha.
Table 13
Actual Alphas of the Conditional F Test Given a Two-Tailed Preliminary Test, c = 5 and s = 12

E(MSC:T)/  Prelim.     Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha  .010   .025   .050   .100   .250   alpha
.33        .02         .006a  .015   .030   .049   .120   .044
           .05         .006a  .015a  .036a  .071   .155   .057
           .10         .007a  .017a  .040a  .080a  .190   .067
           .20         .007a  .018a  .045a  .091a  .212   .075
           .50         .007   .018   .046   .095   .224a  .078
Mean alpha             .007   .017   .039   .077   .180
.50        .02         .003a  .008a  .021   .042   .136   .042
           .05         .005a  .016a  .032   .058   .157   .054
           .10         .006a  .017a  .036a  .070a  .175   .061
           .20         .006a  .017a  .042a  .086a  .198   .070
           .50         .007   .018   .046   .096   .222   .078
Mean alpha             .005   .015   .035   .070   .178
1          .02         .009a  .021a  .047a  .095a  .235a  .081
           .05         .009a  .021a  .047a  .099a  .235a  .082
           .10         .010a  .023a  .054a  .106a  .235a  .086
           .20         .011a  .025a  .057a  .110a  .237a  .088
           .50         .011   .027   .056   .117   .234   .089
Mean alpha             .010   .023   .052   .105   .235
2          .02         .040   .081   .111   .167   .323   .144
           .05         .033   .067   .095   .148   .298a  .128
           .10         .028   .058   .084   .132a  .274a  .115
           .20         .024a  .050a  .074a  .115a  .253a  .103
           .50         .014   .029   .055   .101   .237   .087
Mean alpha             .057   .057   .084   .133   .277
3          .02         .055   .076   .108   .157   .298   .139
           .05         .042   .056   .085   .132   .280a  .119
           .10         .032   .044a  .072a  .120a  .265a  .107
           .20         .023a  .030a  .058a  .109a  .244a  .093
           .50         .012   .023   .053   .099   .231   .084
Mean alpha             .033   .046   .061   .123   .264

aActual alpha is within 1.96 standard errors of the nominal alpha.
Table 14
Actual Alphas of the Conditional F Test Given a Two-Tailed Preliminary Test, c = 5 and s = 20

E(MSC:T)/  Prelim.     Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha  .010   .025   .050   .100   .250   alpha
.33        .02         .006a  .019a  .030a  .051   .126   .046
           .05         .007a  .024a  .040a  .077a  .168   .063
           .10         .007a  .026a  .045a  .089a  .209a  .075
           .20         .007a  .028a  .049a  .097a  .233a  .083
           .50         .007   .028   .049   .100   .248   .086
Mean alpha             .007   .025   .043   .083   .197
.50        .02         .005a  .012a  .020   .037   .143   .043
           .05         .005a  .018a  .029a  .054   .168   .055
           .10         .007a  .023a  .039a  .069a  .192   .066
           .20         .007a  .025a  .044a  .089a  .211a  .075
           .50         .007   .028   .049   .100   .245   .086
Mean alpha             .006   .021   .036   .070   .192
1          .02         .014a  .025a  .050a  .110a  .260a  .092
           .05         .016a  .028a  .054a  .111a  .260a  .094
           .10         .016a  .030a  .056a  .112a  .262a  .095
           .20         .016a  .031   .054a  .110a  .260a  .094
           .50         .015   .035   .057   .107   .259   .095
Mean alpha             .015   .030   .054   .110   .260
2          .02         .052   .092   .120   .193   .328   .157
           .05         .048   .085   .111   .175   .305   .145
           .10         .040   .075   .102   .157   .290a  .133
           .20         .032   .063   .084a  .129   .274   .116
           .50         .021   .039   .060   .112a  .252a  .097
Mean alpha             .039   .071   .095   .153   .290
3          .02         .064   .089   .123   .170   .303   .150
           .05         .050   .069   .098   .144   .286   .129
           .10         .032   .052   .082   .136   .266a  .114
           .20         .024a  .043   .067   .118a  .257a  .102
           .50         .010   .031a  .054a  .103a  .250a  .090
Mean alpha             .036   .057   .085   .134   .272

aActual alpha is within 1.96 standard errors of the nominal alpha.
Table 15
Actual Alphas of the Conditional F Test Given a Two-Tailed Preliminary Test, c = 10 and s = 12

E(MSC:T)/  Prelim.     Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha  .010   .025   .050   .100   .250   alpha
.33        .02         .006a  .016a  .044a  .078a  .200   .069
           .05         .006a  .017a  .046a  .088a  .217a  .075
           .10         .006a  .017a  .048a  .092a  .225a  .078
           .20         .006a  .017a  .049a  .094a  .230a  .079
           .50         .006   .017   .049   .094   .231   .079
Mean alpha             .006   .017   .047   .089   .221
.50        .02         .003   .009   .023a  .049   .160   .049
           .05         .006a  .014a  .037a  .070   .191   .064
           .10         .006a  .016a  .044a  .079a  .210a  .071
           .20         .006a  .017a  .047a  .085a  .225a  .076
           .50         .006   .017   .049   .094   .230   .079
Mean alpha             .005   .015   .040   .075   .203
1          .02         .007a  .023a  .054a  .099a  .237a  .084
           .05         .007a  .023a  .054a  .099a  .238   .084
           .10         .007a  .023a  .054a  .098a  .238a  .084
           .20         .006   .021   .056   .097   .235   .083
           .50         .006a  .022a  .053a  .095a  .232a  .082
Mean alpha             .007   .022   .054   .098   .236
2          .02         .032   .053   .087   .152   .277a  .120
           .05         .024   .043   .071   .132   .262   .106
           .10         .015a  .033a  .063a  .117a  .254a  .096
           .20         .011a  .026a  .054a  .104a  .240a  .087
           .50         .007   .021   .051   .096   .235   .082
Mean alpha             .018   .035   .065   .120   .254
3          .02         .019a  .031a  .064   .116a  .248a  .096
           .05         .014a  .026a  .056a  .102a  .243a  .088
           .10         .011a  .021   .053   .099   .241   .085
           .20         .007a  .018a  .050a  .097a  .235a  .081
           .50         .006   .017   .049   .094   .231a  .079
Mean alpha             .011   .023   .054   .102   .240

aActual alpha is within 1.96 standard errors of the nominal alpha.

Actual alpha levels. It was expected that, given independence of student data and no treatment effects, the empirical and nominal alpha levels for all the conditional F tests would be equal. Generally the simulated data verified this expectation. Excluding all situations where c equalled 2 and s equalled 12, 96 (96%) of the remaining 100 observed alpha levels in Tables 12 through 15, given E(MSC:T)/E(MSS:CT) = 1, were within 1.96 standard errors of the nominal alpha levels.
However, given all situations where c equalled 2 and s equalled 12 (Table 11), only 9 of the 25 observed alpha values (36%) were within 1.96 standard errors of the nominal alpha levels. The remaining 16 observed alpha levels were too liberal; that is, their probabilities of a Type I error were consistently too large. These 16 liberal observed alpha levels were concentrated at the lower conditional test nominal alpha levels (i.e., .01, .025, and .05). Given independence, the actual alpha levels of the conditional F tests, averaged across the five preliminary test alpha levels and the four designs c = 5 and s = 5, c = 5 and s = 12, c = 5 and s = 20, and c = 10 and s = 12, increased from .012 to .026 to .055 as the nominal alpha levels increased from .01 to .025 to .05. At those same three conditional test nominal alpha levels, however, the actual alpha levels of the conditional tests for c = 2 and s = 12, averaged across the five preliminary test alpha levels, increased from .018 to .045 to .077. Because there seemed to be no reasonable explanation for the liberalness that dominated when c equalled 2 and s equalled 12, a second simulation run was done for that particular design. The results of this run deviated even more from what was expected given independence, as 22 of the 25 (88%) actual alpha values were too liberal.

Estimated powers. Given that E(MSC:T)/E(MSS:CT) equalled one, the statistical powers of the conditional F tests were, almost without exception, greater than the powers of their respective "never pool," unconditional F = MST/MSC:T tests (Tables 16 through 20). Across the five designs and the five preliminary alpha levels, as the five nominal alpha levels increased from .01 to .25, the average difference between the conditional test powers and the "never pool" test powers decreased from .102 to .091 to .076 to .057 to .021.
Comparing the estimated power values of the conditional F tests (Appendix C) with comparable power values of the unconditional F = MST/MSC:T tests (Table 7) shows that this decrease in discrepancy is probably due to the fact that the average powers of the "never pool" tests are rather high given an alpha level of .25, and thus it is harder for the "sometimes pool" tests to improve on that already "high" power. This is especially so for the two designs c = 5 and s = 20, and c = 10 and s = 12. While this negative relationship held up across the five designs, or combinations of s and c, it did not hold up within each combination of s and c. Consider the design c = 2 and s = 12 (Table 16). Averaged across the five preliminary test alpha levels, the observed power differences for this one design equalled .085, .115, .142, .146, and .060 as their respective nominal alphas increased from .01 to .25.

Table 16
Power of the Conditional F Test Minus Power of the Test F = MST/MSC:T Given a Two-Tailed Preliminary Test, c = 2 and s = 12

E(MSC:T)/  Prelim.      Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha   .010   .025   .050   .100   .250   dif.a
.33        .02         -.018  -.034  -.097  -.137  -.099  -.077
           .05          .014   .006  -.057  -.108  -.087  -.046
           .10          .019   .034  -.012  -.065  -.060  -.017
           .20          .020   .055   .055   .006  -.027   .022
           .50          .013   .039   .067   .075   .016   .042
Mean power dif.a        .010   .020  -.009  -.046  -.051
.50        .02          .019   .019   .011   .001  -.046   .001
           .05          .036   .045   .033   .019  -.036   .019
           .10          .046   .069   .060   .039  -.024   .038
           .20          .046   .089   .094   .088   .006   .065
           .50          .028   .064   .093   .114   .021   .082
Mean power dif.a        .035   .057   .058   .052  -.016
1          .02          .089   .115   .143   .155   .077   .116
           .05          .090   .116   .142   .152   .068   .114
           .10          .092   .122   .150   .156   .065   .117
           .20          .090   .127   .154   .156   .051   .116
           .50          .062   .093   .120   .113   .038   .085
Mean power dif.a        .085   .115   .142   .146   .060
2          .02          .152   .191   .223   .218   .159   .189
           .05          .145   .180   .210   .202   .139   .175
           .10          .134   .166   .191   .175   .110   .155
           .20          .114   .149   .167   .147   .082   .132
           .50          .068   .088   .101   .070   .029   .071
Mean power dif.a        .123   .155   .178   .162   .104
3          .02          .171   .221   .250   .218   .166   .205
           .05          .154   .196   .219   .183   .136   .178
           .10          .137   .175   .194   .157   .108   .154
           .20          .114   .144   .146   .105   .058   .113
           .50          .059   .078   .073   .043   .028   .056
Mean power dif.a        .127   .163   .176   .141   .099

aMean power differences.

Table 17
Power of the Conditional F Test Minus Power of the Test F = MST/MSC:T Given a Two-Tailed Preliminary Test, c = 5 and s = 5

E(MSC:T)/  Prelim.      Conditional test nominal alpha     Mean
E(MSS:CT)  test alpha   .010   .025   .050   .100   .250   dif.a
.33        .02         -.101  -.177  -.239  -.228  -.099  -.169
           .05         -.045  -.094  -.150  -.153  -.068  -.102
           .10         -.016  -.044  -.083  -.092  -.054  -.058
           .20         -.004  -.018  -.034  -.042  -.025  -.025
           .50          .000   .004   .002   .001  -.003   .001
Mean power dif.a       -.033  -.066  -.101  -.103  -.050
.50        .02         -.060  -.096  -.141  -.141  -.092  -.106
           .05         -.031  -.057  -.099  -.104  -.074  -.073
           .10         -.013  -.025  -.060  -.072  -.052  -.044
           .20          .005   .008  -.012  -.026  -.032  -.011
           .50          .012   .022   .011   .008  -.002   .010
Mean power dif.a       -.017  -.030  -.060  -.067  -.050
1          .02          .029   .041   .051   .044   .022   .037
           .05          .030   .043   .052   .042   .024   .038
           .10          .032   .043   .051   .040   .028   .039
           .20          .039   .041   .043   .035   .022   .036
           .50          .025   .025   .025   .024   .007   .021
Mean power dif.a        .031   .039   .044   .037   .021
2          .02          .090   .101   .106   .108   .102   .101
           .05          .076   .078   .074   .077   .075   .076
           .10          .066   .071   .066   .062   .057   .064
           .20          .053   .050   .043   .044   .039   .046
           .50          .024   .022   .017   .020   .008   .018
Mean power dif.a        .062   .064   .061   .062   .056
3          .02          .083   .083   .101   .093   .082   .088
           .05          .069   .063   .080   .072   .055   .068
           .10          .050   .046   .060   .052   .036   .049
           .20          .033   .032   .038   .022   .025   .030
           .50          .019   .011   .014   .003   .006   .011
Mean power dif.a        .051   .047   .059   .048   .041

aMean power differences.
Table 18
Power of the Conditional F Test Minus Power of the Test $F = MS_T/MS_{C:T}$
Given a Two-Tailed Preliminary Test, c = 5 and s = 12

                                  Conditional test nominal alpha          Mean
E(MS_C:T)/   Preliminary test                                            power
E(MS_S:CT)   nominal alpha       .010    .025    .050    .100    .250    dif.a
.33          .02                -.218   -.240   -.150   -.079   -.026    -.143
             .05                -.135   -.158   -.099   -.052   -.015    -.092
             .10                -.050   -.086   -.061   -.033   -.011    -.048
             .20                -.003   -.026   -.023   -.011   -.005    -.014
             .50                 .018    .007    .000   -.002    .000     .005
             Mean power dif.a   -.078   -.101   -.067   -.035   -.011

.50          .02                -.085   -.127   -.120   -.073   -.028    -.087
             .05                -.051   -.090   -.083   -.056   -.021    -.060
             .10                -.002   -.046   -.049   -.039   -.015    -.030
             .20                 .033   -.003   -.010   -.020   -.006    -.001
             .50                 .045    .041    .018    .003    .002     .022
             Mean power dif.    -.012   -.045   -.049   -.037   -.014

1            .02                 .148    .141    .105    .062    .026     .096
             .05                 .140    .130    .096    .058    .025     .090
             .10                 .141    .130    .092    .054    .018     .087
             .20                 .135    .119    .082    .044    .016     .079
             .50                 .090    .073    .052    .032    .006     .051
             Mean power dif.     .131    .119    .085    .050    .018

2            .02                 .192    .180    .161    .130    .078     .148
             .05                 .153    .142    .122    .100    .050     .113
             .10                 .119    .105    .090    .081    .031     .085
             .20                 .081    .067    .045    .045    .016     .051
             .50                 .034    .030    .016    .018    .004     .020
             Mean power dif.     .116    .105    .087    .075    .036

3            .02                 .114    .107    .100    .081    .061     .093
             .05                 .075    .074    .071    .056    .040     .063
             .10                 .056    .057    .054    .037    .027     .046
             .20                 .030    .032    .031    .016    .013     .024
             .50                 .014    .010    .010    .006    .004     .009
             Mean power dif.     .058    .056    .053    .039    .029

a Mean power differences.

Table 19
Power of the Conditional F Test Minus Power of the Test $F = MS_T/MS_{C:T}$
Given a Two-Tailed Preliminary Test, c = 5 and s = 20

                                  Conditional test nominal alpha          Mean
E(MS_C:T)/   Preliminary test                                            power
E(MS_S:CT)   nominal alpha       .010    .025    .050    .100    .250    dif.a
.33          .02                -.143   -.072   -.043   -.016    .001    -.055
             .05                -.090   -.049   -.029   -.012    .001    -.036
             .10                -.041   -.027   -.015   -.005    .000    -.018
             .20                -.004   -.009   -.008   -.001    .001    -.004
             .50                 .020    .004    .002    .000    .001     .005
             Mean power dif.a   -.052   -.031   -.019   -.007    .000

.50          .02                -.050   -.034   -.035   -.028    .006    -.031
             .05                -.027   -.016   -.023   -.023    .006    -.019
             .10                 .003    .002   -.011   -.014    .005    -.005
             .20                 .039    .025    .001   -.007    .004     .011
             .50                 .059    .037    .013    .000    .000     .022
             Mean power dif.     .005    .003   -.011   -.014    .004

1            .02                 .218    .173    .106    .056    .007     .112
             .05                 .207    .161    .095    .053    .007     .105
             .10                 .195    .152    .087    .049    .005     .098
             .20                 .178    .134    .083    .040    .001     .087
             .50                 .113    .080    .041    .015    .001     .050
             Mean power dif.     .182    .140    .082    .043    .004

2            .02                 .259    .235    .173    .116    .047     .166
             .05                 .209    .187    .132    .087    .033     .130
             .10                 .168    .141    .088    .058    .020     .095
             .20                 .115    .095    .059    .035    .011     .063
             .50                 .043    .037    .018    .006    .004     .022
             Mean power dif.     .159    .139    .094    .060    .023

3            .02                 .158    .137    .123    .073    .039     .106
             .05                 .101    .086    .080    .043    .026     .067
             .10                 .073    .061    .058    .026    .015     .047
             .20                 .049    .031    .031    .012    .008     .026
             .50                 .014    .008    .011    .002    .003     .008
             Mean power dif.     .079    .065    .061    .031    .018

a Mean power differences.

Table 20
Power of the Conditional F Test Minus Power of the Test $F = MS_T/MS_{C:T}$
Given a Two-Tailed Preliminary Test, c = 10 and s = 12

                                  Conditional test nominal alpha          Mean
E(MS_C:T)/   Preliminary test                                            power
E(MS_S:CT)   nominal alpha       .010    .025    .050    .100    .250    dif.a
.33          .02                -.033   -.015   -.007   -.001    .000    -.011
             .05                -.014   -.005   -.003   -.001    .000    -.005
             .10                -.004   -.003   -.002   -.001    .000    -.002
             .20                 .001    .000    .000    .000    .000     .000
             .50                 .001    .000    .000    .000    .000     .000
             Mean power dif.a    .004   -.005   -.002   -.001    .000

.50          .02                -.070   -.043   -.021   -.011    .002    -.029
             .05                -.047   -.025   -.013   -.008    .000    -.019
             .10                -.025   -.015   -.007   -.006    .000    -.011
             .20                -.009   -.003   -.002   -.004    .000    -.004
             .50                 .005    .000    .000    .000    .000     .001
             Mean power dif.    -.029   -.017   -.017   -.006    .000

1            .02                 .100    .054    .031    .015    .003     .041
             .05                 .095    .051    .028    .013    .003     .038
             .10                 .092    .048    .028    .011    .002     .036
             .20                 .079    .038    .023    .011    .001     .030
             .50                 .047    .021    .010    .006    .001     .017
             Mean power dif.     .083    .042    .024    .011    .002

2            .02                 .165    .131    .090    .047    .019     .090
             .05                 .109    .081    .053    .026    .013     .056
             .10                 .070    .046    .033    .017    .008     .035
             .20                 .044    .021    .015    .009    .002     .018
             .50                 .013    .004    .004    .001    .000     .004
             Mean power dif.     .080    .057    .039    .020    .008

3            .02                 .040    .035    .020    .020    .006     .024
             .05                 .022    .019    .010    .012    .003     .013
             .10                 .006    .007    .005    .008    .002     .006
             .20                 .003    .004    .003    .004    .001     .003
             .50                 .001    .000    .000    .001    .000     .000
             Mean power dif.     .014    .013    .008    .009    .002

a Mean power differences.

As the number of observations per class increased (compare across Tables 17, 18, and 19), the discrepancy between the power of the conditional test and the power of the unconditional $F = MS_T/MS_{C:T}$ test was expected to increase. The rationale for this expectation follows. Given independence of student responses, $E(MS_{C:T})/E(MS_{S:CT}) = 1$, so pooling should be prescribed all the time. If there were only one student per class, the power of the conditional test and the power of $F = MS_T/MS_{C:T}$ should be identical. As the number of students increases, however, the power of the conditional test and the power of the test $F = MS_T/MS_{C:T}$ should become more discrepant, with the power of the conditional test being greater because it would have more error degrees of freedom. Basically, the simulated data upheld this prediction, especially given the more stringent nominal conditional test alpha levels (.01 and .025). Given a conditional test nominal alpha of .01, averaging across the five preliminary test alpha levels gave average differences between the power of the conditional F test and the respective power of the test $F = MS_T/MS_{C:T}$ of .031 for c = 5 and s = 5, .131 for c = 5 and s = 12, and .182 for c = 5 and s = 20.
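The averages quoted in the preceding paragraph are simple means over the five preliminary test alpha levels. As an arithmetic check, the sketch below averages the .010 column of the $E(MS_{C:T})/E(MS_{S:CT}) = 1$ block of Tables 17, 18, and 19; the values are transcribed from those tables, and the dictionary and helper names are illustrative only.

```python
from statistics import mean

# Power differences (conditional test minus F = MS_T/MS_C:T), transcribed from the
# E(MS_C:T)/E(MS_S:CT) = 1 block of Tables 17 to 19, conditional test alpha .010.
# Each list runs over preliminary test alphas .02, .05, .10, .20, .50.
COLUMN_010 = {
    "c = 5, s = 5 (Table 17)": [0.029, 0.030, 0.032, 0.039, 0.025],
    "c = 5, s = 12 (Table 18)": [0.148, 0.140, 0.141, 0.135, 0.090],
    "c = 5, s = 20 (Table 19)": [0.218, 0.207, 0.195, 0.178, 0.113],
}


def mean_power_difference(values: list[float]) -> float:
    """Average a power-difference column over the preliminary test alphas, to three places."""
    return round(mean(values), 3)


for design, values in COLUMN_010.items():
    print(design, mean_power_difference(values))
```

It prints 0.031, 0.131, and 0.182, matching the reported values. The same helper applied to the .250 column of the corresponding blocks in Tables 16, 18, and 20 gives .060, .018, and .002.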
As the number of students per class increased, the empirical powers of the unconditional test $F = MS_T/MS_{C:T}$ became rather high (Table 7) given the larger nominal alpha values. Thus it became more difficult to detect this expected difference in powers between the conditional test procedure and the unconditional test $F = MS_T/MS_{C:T}$ as s increased at the higher nominal conditional test alpha levels.

On the other hand, with the exception of the design c = 2 and s = 12 at the smallest alpha levels, as the number of classes increased (compare across Tables 16, 18, and 20), the discrepancies between the powers of the conditional F test and the unconditional test $F = MS_T/MS_{C:T}$ tended to decrease. For example, for the conditional test alpha of .25, averaging the discrepancies across the five preliminary test alpha levels gave power differences of .060 for the design c = 2 and s = 12, .018 for the design c = 5 and s = 12, and .002 for the design c = 10 and s = 12. This trend was expected, as "sometimes pooling" should be more advantageous for increasing power than "never pooling" when only a few classrooms per treatment have been sampled. The increased degrees of freedom brought on by pooling have a larger effect when the "never pool" test has relatively few error degrees of freedom than when it has many.

The powers of the conditional test for the design c = 2 and s = 12 turned out very curiously, as it was the one design where the discrepancies between the conditional test power and the $F = MS_T/MS_{C:T}$ test power did not fit the predicted trend as c was varied, for nominal conditional test alphas of .01 and .025. At those two alpha levels, the estimated powers of the conditional tests were spuriously high because the actual alpha levels were too liberal. Thus, one would have expected the average discrepancies between powers of the conditional test and powers of the test $F = MS_T/MS_{C:T}$ also to be spuriously large. However, the opposite occurred.
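The degrees-of-freedom argument above can be made concrete with a small sketch of the "sometimes pool" decision. It assumes the balanced nested design used in the study (t treatments, c classes per treatment, s students per class) with the standard nested-ANOVA error degrees of freedom, t(c - 1) for C:T and tc(s - 1) for S:CT. The number of treatments is not given in this section, so t = 2 below is hypothetical, and all function names are illustrative. The two-tailed preliminary test is expressed against critical values the caller supplies (for example, from F tables), so no distribution library is assumed.

```python
from dataclasses import dataclass


@dataclass
class ErrorTerm:
    label: str
    ms: float  # mean square used as the F-test denominator
    df: int    # its error degrees of freedom


def nested_dfs(t: int, c: int, s: int) -> tuple[int, int]:
    """Error df for C:T (classes within treatments) and S:CT (students within classes)."""
    return t * (c - 1), t * c * (s - 1)


def preliminary_rejects(ms_ct: float, ms_sct: float, lower_crit: float, upper_crit: float) -> bool:
    """Two-tailed preliminary test of F = MS_C:T / MS_S:CT; critical values come from the caller."""
    f_prelim = ms_ct / ms_sct
    return f_prelim < lower_crit or f_prelim > upper_crit


def choose_error_term(ss_ct: float, ss_sct: float, t: int, c: int, s: int, reject: bool) -> ErrorTerm:
    """'Sometimes pool': keep MS_C:T if the preliminary test rejects, otherwise pool into MS_S:T."""
    df_ct, df_sct = nested_dfs(t, c, s)
    if reject:
        return ErrorTerm("MS_C:T (never pool)", ss_ct / df_ct, df_ct)
    df_pooled = df_ct + df_sct  # equals t * (c * s - 1)
    return ErrorTerm("MS_S:T (pooled)", (ss_ct + ss_sct) / df_pooled, df_pooled)


T = 2  # hypothetical number of treatments; not stated in this section
for c in (2, 5, 10):
    df_ct, df_sct = nested_dfs(T, c, 12)
    print(f"c = {c:2d}, s = 12: never-pool error df = {df_ct:3d}, pooled error df = {df_ct + df_sct:3d}")
```

With t = 2, the never-pool error term has 2, 8, and 18 degrees of freedom for c = 2, 5, and 10, compared with 46, 118, and 238 after pooling, which illustrates why pooling matters most when only a few classrooms per treatment are sampled.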
Although the trend was not perfect across the five conditional test nominal alpha levels and across the five combinations of s and c, the simulated data generally showed the discrepancies between the power of the "sometimes pool," conditional test and the power of the "never pool" test to decrease with an increase in the nominal alpha level of the preliminary test. This too was predictable, as the distributions of the "sometimes pool" test and the "never pool" test become more similar as the nominal preliminary test alpha level increases. As the alpha level of the preliminary test increases to .50, the power of the preliminary test, $F = MS_{C:T}/MS_{S:CT}$, increases, and thus pooling of degrees of freedom and mean squares between and within classrooms is prescribed less often, which makes the conditional F test more of a "never pool" test. A good example of this inverse relationship between the power difference (conditional minus unconditional test) and the alpha level of the preliminary test is evident when the nominal alpha level of the conditional test equals .25 and the design is c = 2 and s = 12 (Table 16). Given these three prespecified parameters, the discrepancies between the power of the conditional test and the $F = MS_T/MS_{C:T}$ test equal .077, .068, .065, .051, and .038 for preliminary test alpha levels of .02, .05, .10, .20, and .50.

Given independence of individual responses within and between groups, one might also wonder how the power of the conditional, "sometimes pool" test compared to the power of the "always pool" $F = MS_T/MS_{S:T}$ test. These two powers can be compared by looking at the $E(MS_{C:T})/E(MS_{S:CT}) = 1$ sections in Appendix C (Tables C-1 through C-5) and Table 9. One would expect the power of the "always pool" test always to exceed the power of the "sometimes pool" test. While that was usually the case, it was not always the case.
For example, given the design c = 2 and s = 12 (Table C-1), 17 of the 25 (68%) conditional test powers exceeded the "always pool" $F = MS_T/MS_{S:T}$ test powers. A little over two-thirds of these "exceptions," given this design, coincided with alpha levels that were too liberal, which would explain this result. However, as one example of a curious and unexplained result, given a preliminary alpha of .02, a conditional alpha of .01, c = 2 and s = 12, the power of the conditional test equalled .125 (Table C-1), while the power of the "always pool" test only equalled .114 (Table 9), even though the actual alpha level of this conditional test was within 1.96 standard errors of the nominal value. The other design that had several of these surprising and unexplainable findings was c = 5 and s = 5 (Table C-2). Here 14 of the 25 (56%) conditional test powers exceeded the $F = MS_T/MS_{S:T}$ test powers. And for this particular design, in all cases but one, these "exceptions" occurred even though the conditional test alphas were not too liberal.

Positive Dependence

Positive dependence is that condition where the variance of the aggregate units exceeds that predicted from the variance of the disaggregate units and the grouping size, given random assignment of individual units to groups. Given positive dependence, the two-tailed preliminary test should reject its null hypothesis, Ho: E