This is to certify that the dissertation entitled “Factors influencing Pearson’s chi-squared statistic’s fit to its asymptotic distributions: Implications for sample size guidelines,” presented by Shelly Johann Naud, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

FACTORS INFLUENCING PEARSON’S CHI-SQUARED STATISTIC’S FIT TO ITS ASYMPTOTIC DISTRIBUTIONS: IMPLICATIONS FOR SAMPLE SIZE GUIDELINES

BY SHELLY JOHANN NAUD

A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Measurement and Quantitative Methods
College of Education
1999

ABSTRACT

Recent sample size guidelines for Pearson’s chi-squared statistic (X²) have generally been based on simulation studies. These previous studies have mainly focused on the impact of small sample size on Type I error for a single test. A simulation study was carried out to evaluate the impact of small sample size on both Type I error and power approximation across four tests. It was found that power may be overestimated even though the sample size is large enough for the Type I error rate to be close to α.
This problem is more serious for the test of independence than for the goodness of fit test. A quantitative index, Pn, was proposed for contingency table tests. When sample size is larger than Pn, both Type I error and power of X² are fairly well approximated by the asymptotic distributions.

ACKNOWLEDGMENTS

I owe a debt of gratitude to the members of my dissertation committee: Dr. Alexander von Eye, committee chair, Dr. Betsy Jane Becker, adviser and guidance committee chair, Dr. Richard Houang, and Dr. Alka Indurkhya. I also wish to thank Dr. David Wagstaff for his extensive editorial comments on my first draft.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
CHAPTER 1 THEORETICAL BACKGROUND
CHAPTER 2 SIMULATIONS
CHAPTER 3 THE FIRST QUESTION
CHAPTER 4 THE GOODNESS OF FIT TEST UNDER THE MULTINOMIAL SAMPLING MODEL
CHAPTER 5 THE GOODNESS OF FIT TEST UNDER THE PRODUCT MULTINOMIAL SAMPLING MODEL
CHAPTER 6 THE TEST OF INDEPENDENCE
CHAPTER 7 THE HOMOGENEITY TEST
CHAPTER 8 SUMMARY AND RECOMMENDATIONS
APPENDIX A
TABLES AND FIGURES
BIBLIOGRAPHY

LIST OF TABLES

Table 4-1. Lower limits for the minimum expected cell frequency (e_min).
Table 4-2. Cell probabilities of tables generated for part 2.

LIST OF FIGURES

Figure 2-1. Normal plots of the standardized residuals of the cell means.
Figure 3-1. Power plots for the test of independence.
Figure 3-2. Power plots for the homogeneity test.
Figure 3-3. Power plots for the multinomial case of the goodness of fit test.
Figure 3-4. Power plots for the product multinomial case of the goodness of fit test.
Figure 4-1. Type I error rate versus sample size.
Figure 4-2. Power plots: Rejection rate in percent versus effect size, w.
Figure 4-3. Rejection rates versus sample size for alternative hypotheses.
Figure 5-1. Differences in fit between the product multinomial and the multinomial cases of the goodness of fit test.
Figure 5-2. Unequal sample sizes, 860 and 875 series.
Figure 5-3. The number of small cell expectations within rows as a factor affecting power.
Figure 5-4. Application problem.
Figure 6-1. Type I error rate versus sample size, test of independence.
Figure 6-2. Power plots, test of independence.
Figure 6-3. Power plots for n = Pn.
Figure 6-4. Differences between observed and expected power versus expected power.
Figure 6-5. Application problem.
Figure 7-1. Differences in fit between the homogeneity test and the test of independence.

LIST OF ABBREVIATIONS

df            Degrees of freedom.
e_i or e_ij   Expected cell frequency (p. 2).
e_min         A table’s minimum expected cell frequency (p. 5).
ES            Effect size (p. 5).
k             Number of cells in a table.
H0            Null hypothesis.
H1            Alternate hypothesis.
n             Sample size.
n5            The sample size where power is expected to be .5 for a large effect size (p. 5).
n8            The sample size where power is expected to be .8 for a moderate effect size (p. 5).
n(p)          The number of cells with expected probabilities less than 1/k (p. 6).
o_i or o_ij   Observed cell frequency (p. 2).
p_i or p_ij   Expected cell probability (p. 2).
p_i+, p_+j    Marginal probabilities (p. 2).
p_min         A table’s minimum expected cell probability (p. 6).
Pn            Sample size where the probability of getting a marginal total of zero is 1% (p. 43).
r             Number of cells with expectations less than 5 (p. 6).
R             A global index evaluating the skewness of the distribution of cell expectations (p. 6).
w             Cohen’s index of effect size (p. 4).
X²            Pearson’s chi-squared statistic.
λ             Noncentrality parameter (p. 3).

INTRODUCTION

Pearson’s chi-squared statistic, X², first introduced in 1900, is widely known and used. Many researchers who use X² may not realize that there is no consensus on sample size guidelines - available guidelines actually vary a great deal. Why is there such variability? It is partly due to the different approaches for determining when an asymptotic distribution is a reasonable approximation. When sample sizes are small, the distribution of X² is a step function that cannot be well approximated by any continuous function. The older guidelines required that the distribution of X² be fairly smooth. To attain this criterion, sample sizes need to be large. Recent guidelines are generally based on simulation studies. As long as the actual Type I error rate is reasonably close to the nominal Type I error rate, α, the asymptotic distribution is considered adequate. The resulting sample size recommendations are considerably less stringent.

Though a considerable number of simulation studies have been done, the question of sample size has not been entirely resolved because additional factors complicate the problem. One such factor is the table’s distribution of cell expectations. Tables where some of the expected cell frequencies are very small in comparison to the other cells apparently require different guidelines than tables with uniform expectations.

This study proposes to address some of the gaps in the simulation research. One is related to the fact that a majority of the research has dealt with only one of the several tests that use X² as the test statistic. Although the asymptotic distributions of X² are the same across tests, the actual distribution of X² across tests is not necessarily similar when the sample size is small.
This issue has not been studied systematically. A second issue addressed in this study is power. Although there have been studies on the impact of small sample sizes on power, these have had much less influence on sample size guidelines than the studies focusing on Type I error.

The comparison between tests is the focus of Chapter 3. Each test is then considered individually in the following chapters. The first two chapters cover theoretical and methodological issues.

In summary, this study will explore the behavior of Pearson’s chi-squared statistic when the sample size is small and the table has a skewed distribution of expected cell frequencies. These are the conditions where the asymptotic distributions do not hold well. Both power and Type I error will be considered across different tests. Current recommendations for sample size will be evaluated based on these findings.

Chapter 1 THEORETICAL BACKGROUND

The first sections of this chapter define the notation and terminology related to X², hypothesis testing and power, and some proposed indices. The sampling distributions and tests associated with categorical data are described in the last section.

Notation and formulas

The two-way frequency tables have I rows and J columns. The number of cells in the table is denoted by k with k = IJ. Marginal row and column probabilities, p_i+ and p_+j, are obtained by dividing the row and column totals, n_i+ and n_+j, by the total sample size, n (e.g., p_1+ = n_1+/n). Depending on the sampling plan that is assumed to have generated the data, one or more of the marginal totals may be fixed or treated as constants. With such sampling plans, the marginal totals will be used in some formulas instead of n. The expected cell probabilities are denoted by p_ij with p_ij = p_i+ p_+j. The expected cell frequencies (or expectations) are related to the cell probabilities: e_ij = n p_ij. Each cell’s count is referred to as the observed cell frequency (o_ij).
Pearson’s chi-squared statistic provides a measure of the discrepancy between observed and expected cell frequencies:

$$X^2 = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

If the expected values are close to the observed values, the value of X² is small; if the expected values are far from the observed values, the value of X² is large. Because the deviations are squared, the X² statistic gives more weight to observed cell frequencies that are much larger (or much smaller) than the expected cell frequencies.

When the null hypothesis is true, Pearson’s X² is asymptotically distributed as the chi-square (χ²). On the other hand, when the null hypothesis is false, the asymptotic distribution is the noncentral chi-square distribution. Both distributions have degrees of freedom (df) as a parameter. The noncentral chi-square distribution further depends on a second parameter,

$$\lambda = n \sum_{i}\sum_{j} \frac{(p_{1ij} - p_{0ij})^2}{p_{0ij}}$$

where p_0ij refers to the cell probability under the null hypothesis (H0) while p_1ij refers to the cell probability under the alternate hypothesis (H1). Lambda increases in value as the two hypotheses become more discrepant, and it increases with the sample size. When λ is set to zero, the noncentral χ² is equivalent to the chi-square distribution.

Type I error, power, and Cohen’s effect size index

When deciding whether to accept or reject the null hypothesis, researchers can make two types of error. They can reject the null hypothesis when it is true or they can fail to reject the null hypothesis when it is false. The former is referred to as a Type I error and the probability of its occurrence is α. The second is a Type II error and its probability of occurrence is β. Power is the probability of rejecting H0 when it is false; it is equal to 1 - β.

Researchers have to balance the costs associated with the two errors. Choosing a very small α decreases the risk of rejecting a null hypothesis that is true; however, power also decreases as a result.
Choosing a relatively large α will result in a smaller β and, therefore, more power; however, this choice increases the risk of rejecting the null hypothesis when it is true. There is another alternative: Researchers can achieve an increase in power by increasing their sample size.

In order to determine how large the sample should be, the researcher should have a reasonable estimate of the population effect size, ES. A small effect size indicates that the alternative hypothesis is not much different from the null hypothesis. Small differences are unlikely to be detected unless the sample size is large. On the other hand, a researcher can expect to detect large effect sizes with smaller samples. Estimates of effect size are determined from previous research or pilot studies whenever possible.

Cohen (1988, 1992) has defined a measure of effect size that is widely used. If previous findings are available, Cohen’s index¹ can be calculated as follows:

$$w = \sqrt{\sum_{i}\sum_{j} \frac{(p_{1ij} - p_{0ij})^2}{p_{0ij}}}$$

This index is closely related to the noncentrality parameter: λ = nw². Cohen provided the following guidelines for interpreting the values of w: 0.1 corresponds to a small effect size, 0.3 to a moderate effect size, while 0.5 is considered large. In practice, w is not likely to be greater than 0.9. Several effect size surveys have found the average w to be approximately 0.3, at least in the field of psychology (Haase et al., 1982; Cooper and Findley, 1982). Therefore, if a researcher lacks an empirically based alternative hypothesis, setting w to 0.3 is a plausible alternative. Cohen suggests that power should be set at 0.8.

¹ In his first edition of Statistical Power Analysis for the Behavioral Sciences, Cohen proposed a slightly different ES index: e = λ/n = w².

This study will evaluate the behavior of X² at two target sample sizes.
The first, n5, is the sample size where the theoretical power is .5 for a large ES, i.e., n is determined after constraining w to be .5 and the power given by the noncentral χ² to be .5. If a researcher has a sample size equal to n5, he or she will have a 50-50 chance of detecting a large effect size. The second target sample size, n8, corresponds to a power of .8 for a medium ES (w = .3). The first, n5, serves as the lower bound to sample sizes that may be considered by researchers while n8 reflects a reasonable goal for most research. The specific sample sizes that correspond to each target sample size are listed at the end of this chapter.

Measures of discrepancy

One problem that exists in the literature on categorical data is the lack of a quantitative index for describing tables where the expected cell frequencies are not all equal. Researchers often resort to qualitative descriptions such as “a highly skewed distribution of expected cell frequencies.” Three quantitative indices are proposed here.

Many authors use the minimum expected cell frequency (e_min) as their criterion for indicating how discrepant the observed table is from a table where all cell frequencies are the same, i.e., the uniform table. Some researchers are also interested in the number of cells with small expectations (Cochran, 1952, 1954; Yarnold, 1970). In particular, Yarnold proposed r, the number of cells with an expected frequency of less than five. The disadvantage of using e_min or r is that they vary with the sample size. I propose two alternate indices that remain invariant: the minimum expected cell probability, p_min, and n(p), the number of cells that have probabilities less than 1/k, where k is the total number of cells. In a uniform table, p_min = 1/k and n(p) = 0. The third index used in the present study to indicate how discrepant an observed table is from the uniform table is a global index, R = Σ 1/p_ij.
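The indices described above can be computed directly from a table's expected cell probabilities. The following is a minimal sketch in Python, not the author's code; the function name `table_indices` and the dictionary keys are hypothetical labels chosen for illustration, and the table is passed as a flat list of cell probabilities.

```python
def table_indices(probs, n):
    """Descriptive indices for a table, given its expected cell
    probabilities (flat list) and the total sample size n.

    Names are illustrative only; they do not come from the source.
    """
    k = len(probs)                       # number of cells
    expected = [n * p for p in probs]    # e = n * p for each cell
    return {
        # Indices that change with the sample size
        "e_min": min(expected),                      # minimum expected frequency
        "r": sum(1 for e in expected if e < 5),      # cells with expectation below 5
        # Indices that do not depend on the sample size
        "p_min": min(probs),                         # minimum expected probability
        "n_p": sum(1 for p in probs if p < 1 / k),   # cells with probability below 1/k
        "R": sum(1 / p for p in probs),              # global index, sum of 1/p
    }


# Usage: a uniform four-cell table versus a skewed one (made-up probabilities).
uniform = table_indices([0.25, 0.25, 0.25, 0.25], n=24)
skewed = table_indices([0.70, 0.20, 0.09, 0.01], n=100)
print(uniform)
print(skewed)
```

For the uniform table the sketch returns p_min = 1/k and n(p) = 0, as stated in the text. Doubling n changes e_min and r but leaves p_min, n(p), and R untouched, which is the property motivating the two proposed indices.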
R is an element of three formulas for estimating the variance of X² (Pearson given in Lawal & Upton, 1980; Haldane given in Lawal, 1992; and Morris given in Koehler and Larntz, 1980). The use of R is of interest since it is a key component of the variance estimates, and the fit of X²’s distribution to its asymptotic distributions is thought to be related to the variance of X². When there are small cell expectations, the variance of X² can be much greater than the variance of χ² (Lawal, 1991).

Sampling distributions and tests for categorical data

There are two tests that are usually thought of whenever one deals with categorical data, namely the test of independence and the goodness of fit test. Researchers would conduct a test of independence to determine whether two variables are related, e.g., gender and level of job satisfaction. The degrees of freedom for χ² are (I - 1)(J - 1). The expected cell frequencies are calculated from the marginal probabilities: e_ij = n p_i+ p_+j.

The procedure for the goodness of fit test differs from the test of independence in two ways. The expected cell probabilities are specified by the null hypothesis and the degrees of freedom are k - 1. One application of this test, given by Pearson when he introduced his statistic in 1900 (cited in Agresti, 1990), is analyzing the outcomes from a roulette wheel. If the wheel shows no bias then each outcome has an equal probability of occurring; therefore, under the null hypothesis e_i = n p_i = n(1/37). Only one subscript is used since these tables are one-dimensional.

In the two examples described above, the data are sampled from a single population. There are two possible sampling distributions, namely, the multinomial and the Poisson. The Poisson differs from the multinomial in that the sample size is not fixed; n itself has a Poisson distribution (Agresti, 1990).

It is possible to sample from more than one population. The relevant sampling distribution is the product multinomial.
For the goodness of fit test, the degrees of freedom will be reduced by the number of groups sampled. Given I groups and J categories, df = I(J - 1) = IJ - I = k - I. The corresponding contingency table test, i.e., the test of homogeneity, has the same degrees of freedom as the test of independence; both are constrained by the marginal totals.

This discussion of sampling distributions would not be complete without mentioning the hypergeometric sampling distribution. In this case, all of the marginal frequencies are fixed: e_ij = n_i+ n_+j / n. Agresti (1990) maintains that the only appropriate test in this situation is Fisher’s exact probability test; therefore it will not be considered in this study. Wickens (1989) presents other alternatives that also are appropriate.

Agresti (1990, p. 39) uses an example to clarify the differences among the above sampling models. A two-way table is defined by seat-belt use (yes, no) and whether the driver survives the accident (yes, no). If the data include all reported accidents occurring on the Massachusetts turnpike in a year, then the cell frequencies are Poisson random variables. The cell observations have a multinomial distribution when a subset of the population is randomly chosen, say 100 accident reports. If the researcher decides to sample 50 drivers who didn’t wear seat belts and 50 seat-belt users, then we have a product multinomial sampling distribution. A different outcome needs to be chosen in order to illustrate the hypergeometric sampling distribution. Let’s say the sample of accident reports (from the product multinomial case) is given to an expert who is asked to determine which 50 drivers were most likely to have worn seat belts. The expert’s answers (likely, not likely) are compared to the actual data which were withheld from the expert (seat belt, no seat belt). The resulting 2 x 2 table will have marginal totals that are all fixed to equal 50 by design.
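The degrees-of-freedom rules given in this section for the four tests can be collected in one place. The sketch below is an illustration only; the function name `degrees_of_freedom` and the string labels for the tests are hypothetical and do not appear in the source.

```python
def degrees_of_freedom(test, rows, cols):
    """Degrees of freedom for the four tests, for a table with
    `rows` groups (I) and `cols` categories (J), so k = I * J cells.

    The test labels are illustrative names, not the author's.
    """
    k = rows * cols
    if test == "gof_multinomial":
        # Goodness of fit, one population: k - 1
        return k - 1
    if test == "gof_product_multinomial":
        # Goodness of fit, I populations: I(J - 1) = k - I
        return rows * (cols - 1)
    if test in ("independence", "homogeneity"):
        # Contingency table tests: (I - 1)(J - 1)
        return (rows - 1) * (cols - 1)
    raise ValueError(f"unknown test: {test!r}")


# Usage: the 4 x 4 table (k = 16) under each of the four tests.
for name in ("gof_multinomial", "gof_product_multinomial",
             "independence", "homogeneity"):
    print(name, degrees_of_freedom(name, 4, 4))
```

Only the cell count matters for the multinomial goodness of fit test, so a one-dimensional table with k cells can be passed as 1 x k.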
This study will focus on four tests for which Pearson’s X² is appropriate. These tests are defined by the two dimensions that were described in this section, namely the sampling distribution and the method of calculating the expected cell frequencies - the goodness of fit tests depend on H0 while the contingency table tests depend on the marginal totals. The target sample sizes described earlier (n5 and n8) will vary depending on the tests as well as the table size. The degrees of freedom will vary also. These are listed in the following table.

Test (sampling model)                        k    df    n5    n8
Goodness of fit tests
  Multinomial                                4     3    24   122
                                            16    15    48   216
  Product multinomial                        4     2    20   108
                                            16    12    40   196
Contingency table tests
  Test of independence (Multinomial)         4     1    16    88
                                            16     9    40   176
  Homogeneity test (Product multinomial)     4     1    16    88
                                            16     9    40   176

The distribution of X² for these four tests will be compared in Chapter 3 and considered separately in the following chapters.

Chapter 2 SIMULATIONS

The next five chapters will present the results from multiple simulations. The underlying procedures common to all are described in this chapter. Two related topics are treated in separate sections: confidence intervals and a description of the tables that are used in more than one chapter.

The simulation programs were written in UNIX SAS version 6.07 (SAS Institute Inc., 1990). Four data sets were generated to assess the behavior of X² across tests. Using the same data eliminates variation in the generated data as a possible cause for any differences seen in the results.

The general strategy of the simulation programs was to partition the table by the predetermined cell probabilities (p_i). For example, the limits for cell 1 are [0, p_1)², the limits for cell 2 are [p_1, p_2), and so on. Each generated random number, u, was then assigned to the cell for which p_i ≤ u < p_i+1. The product multinomial case differed from the multinomial in that each row was treated as a separate table.
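The partitioning strategy described above can be sketched in a few lines. This is a minimal illustration in Python under stated assumptions, not a reproduction of the SAS programs: it assumes the interval limits are cumulative sums of the cell probabilities, it uses Python's built-in generator rather than the SAS generator, and the function names (`simulate_table`, `simulate_product_multinomial`, `pearson_x2`) are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def simulate_table(probs, n, rng):
    """Draw n uniform random numbers and count them by cell.

    Cell i covers [c_(i-1), c_i), where c_i is the cumulative sum of the
    cell probabilities (an assumption about how the limits were formed).
    """
    cuts = list(accumulate(probs))       # upper limit of each cell's interval
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random()                 # uniform on [0, 1)
        # bisect_right counts the cuts that are <= u, which is the cell index;
        # min() guards against round-off leaving the last cut slightly below 1.
        cell = min(bisect_right(cuts, u), len(probs) - 1)
        counts[cell] += 1
    return counts


def simulate_product_multinomial(row_probs, row_sizes, rng):
    """Treat each row as a separate table with its own fixed sample size."""
    return [simulate_table(p, m, rng) for p, m in zip(row_probs, row_sizes)]


def pearson_x2(observed, expected):
    """Pearson's X2: sum of (o - e)^2 / e over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Usage: one multinomial sample from a made-up four-cell table.
rng = random.Random(12345)               # fixed seed so the run is repeatable
probs = [0.4, 0.3, 0.2, 0.1]
n = 24
observed = simulate_table(probs, n, rng)
expected = [n * p for p in probs]
print(observed, pearson_x2(observed, expected))
```

Repeating the draw many times and recording how often X² exceeds the critical value gives an empirical rejection rate, which is the quantity the later chapters compare with the asymptotic distributions.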
One critical aspect of simulations is the process used to generate the random numbers. The SAS uniform random number generator uses a prime modulus multiplicative generator with modulus 2^31 - 1 and multiplier 397204094. This particular combination has been tested and found to be one of the better random number generators (Fishman and Moore, 1982).

² The bracket is inclusive while the parenthesis is not: “[0” means “including zero” while “p_1)” means “up to, but not including, p_1.”

The programs were tested to see how well the generated data conformed to the target sampling distribution. The observed cell means were compared to their theoretical values: d_i = ō_i - np_i. The standardized residuals, r_i (each d_i divided by its standard deviation), were plotted against the expected z scores [...]

[...] = 7.815. At n = 12 the similar set (0, 0, 1, 11) is no longer significant: 6.73 < χ²(.05). This results in the drop observed in the power function. In Panel b the plot appears to smooth out near n = 100 for n(p) = 1 (where e_min = 4) and at n = 120 for n(p) = 3 (where e_min = 4.8). The power plots are comparatively smooth for the large table (Panel c), even though two-thirds of the cells have very small expectations: n(p) = 12, p_min = .012.

The two extreme H1s show that very different power plots can be created for the same table. At the maximum sample size e_min is 2.8. In none of the plots do the two H1s converge. In Panel a, there is a difference of 13% in the rejection rate between the two alternate hypotheses at n = 200 (e_min = .2). In Panel b, the disparity in the rejection rates between the two hypotheses is 6% when e_min = 8. In Panel c, at the maximum sample size, e_min = 2.8 and the disparity in the rejection rates is nearly .10. The observed power plots are all outside the confidence interval of the asymptotic distribution - even when all cell expectations are greater than five (Panel b).
Discussion

Type I error was found to be sensitive to several factors: the size of the minimum expectations, the number of small expectations, and the size of the table. Power was found to be sensitive to an additional factor, namely the pattern of differences posited by the alternative hypothesis. Power plots where the small cells were larger under H1 were quite different from those where a majority of the small cells were smaller.

The approximation of X²’s distribution by χ² does appear to be satisfactory for sample sizes smaller than those generally recommended. However, under the same conditions the power distributions of X² are not well approximated by the noncentral χ². As suggested by Figure 4-3 Panel b, power can be underestimated by the noncentral χ² even when the sample size is larger than that recommended by any of the present guidelines. Admittedly, the observed power is not greatly overestimated and the case used is extreme. Any recommendations based on this limited number of cases would be premature. Further work controlling all four known factors is needed in order to develop reliable guidelines.

Application

This section is meant to illustrate how to apply the simulation results to a hypothetical example. A simulation was run to test the predictions made.⁵

An example was described previously in the section “Implications for researchers.” The four-cell table had two small cells. These cells represent the extremes on the spectrum of educational level. If local teachers are higher than the national average at one end of the educational spectrum, they are likely to be lower than the national average at the other. In other words, it’s unlikely that a state having a higher percentage of teachers with advanced graduate degrees would also have more teachers who have not attained a bachelor’s degree. Therefore, the alternate hypothesis is not likely to be an extreme case where both small cells are smaller than under H0.
From the simulations in part 1, we can expect the Type I error rate to be acceptable as long as e_min ≥ .96. (Refer to Table 4-1, k = 4, n(p) = 2.) Given that p_min is .009 for this example, n should therefore be at least 107. The results of part 3 suggest that power is likely to be somewhat less than predicted by the noncentral χ² even when n = 122 (n8). (Refer to Figure 4-2 Panel d, case F215.) The actual power for this specific case was .03 less at n = n8 for the H1 which posited that local teachers would have higher educational levels than the national average. Contrary to expectations, the other H1 tested showed more power (+.02) than predicted by the asymptotic power distribution. The second H1 posited that local teachers are less well educated than their national peers: p_min became larger under H1. The predictions based on the previous simulations were therefore not entirely misleading, although the power trend for one of the alternative hypotheses was the opposite of what was expected. Power cannot yet be accurately predicted by the results of this simulation study.

⁵ The data presented in all four application sections are made up. The confirmatory simulation runs used data generated by Numerical Recipes’ RAN2 (Press et al., 1992). This program uses a L’Ecuyer generator with a Bays-Durham shuffle.

Chapter 5 THE GOODNESS OF FIT TEST UNDER THE PRODUCT MULTINOMIAL SAMPLING MODEL

It may be best to explain the product multinomial case of the goodness of fit test by contrasting it with the usual multinomial case. In the example used in the previous chapter, we were interested in teachers’ level of education. Let’s say that it is known that teachers’ level of education is not homogeneous across all groups, specifically that high school teachers are more likely than any other group to have a graduate degree. If our sample has a higher percentage of high school teachers than the national sample, this bias may cause us to erroneously reject the null hypothesis.
One option for controlling this bias is to sample from each group and test against the expected proportions for each separate group. This, then, is the product multinomial version of the goodness of fit test. The research question remains the same as for the multinomial case: Are local teachers comparable in level of education to the nation as a whole? The number of degrees of freedom, however, differs. For I groups and J categories, the correct degrees of freedom is I(J - 1) or k - I. Otherwise the goodness of fit test is carried out in the usual manner.

I have found no empirical studies for this version of the goodness of fit test. In Chapter 3, it was seen that the product multinomial case followed the same trends as its multinomial analog. In part 1, the extent of this similarity is evaluated by comparing the simulation results for the two tests. In part 2, the impact of varying the size of the samples is considered.

Part 1. Comparison to the multinomial case

Methodology. Set I tables with n(p) = 12 were used. These were chosen because the fit of the observed power distribution to the asymptotic was found to be poor. The differences in fit for the two sampling models had to be evaluated indirectly because of the discrepancy in the degrees of freedom:

[Observed power (product multinomial case) - predicted power (df = 12)] - [Observed power (multinomial case) - predicted power (df = 15)].

Results. The differences in fit are plotted in Figure 5-1. At n = n5 the differences in fit are nearly all negative (Panel a). For the small effect sizes, where power is slightly underestimated for both sampling models, the negative differences mean that the multinomial case has a stronger liberal trend than the product multinomial case. The interpretation is different when the effect sizes are large. Power is overestimated in both cases, but more so for the product multinomial. These differences, however, are small with the largest (in absolute terms) being -.023.
At n = n8 the differences in fit are random - the product multinomial case does not show a consistent bias. The differences, again, are generally small. The two cases can therefore be considered as equivalent, at least when the group samples are all equal in size. This simulation is replicated in the next part with tables where the groups are not equal in size.

Part 2. Varying the size of the samples

Methodology. Set II tables are used, along with their set I counterparts, namely S860 and S875. These are the tables where the minimum expected cell frequencies are held constant while the marginal probabilities are varied. Since it was found in the previous chapter that the patterns of differences under H1 affected power, this factor was controlled as much as possible. Specifically, I attempted to set the smallest frequencies equal across all tables for a given w. The table specifications can be found in Appendix A.

Results. Figure 5-2 presents the power plots. For both series, the best fit to the asymptotic power distribution occurs when the samples are equal (S860, S875). What is striking is the fact that both the 860 and 875 series have similar plots even though the minimum cell frequencies are smaller for the latter. The 875 series has only slightly less power (approximately -.02) than the 860 series when the effect size is large and n = n5. Both are reasonably well approximated by the noncentral χ² when n = n8.

The discrepancies seen in the power distributions at n = n5 (Panels a and c) cannot be explained by the factors that have been considered previously. The e_min and H1 patterns can be ruled out since these were held constant. Although the number of small cells does vary somewhat, discrepancies are seen between tables with the exact same n(p). For example, table c’s observed power at w = .5 is .27 more than that of table a even though they both have n(p) = 6. Two other possible factors are marginal totals and the distribution of e_ij within the rows.
Let’s first consider marginal totals as a possible factor. There are two pairs of tables with the same fixed row totals (1: a and b; 2: c and d). Tables c and d do have similar Type I error rates and observed power distributions. The same cannot be said for tables a and b. They show a .17 disparity in power at w = .5. This finding seems to rule out marginal totals as a factor affecting the power of X².

The possibility that the distribution of e_ij within each sample is the explanatory factor cannot be answered with the sample sizes used in this section. At n = 40, all of the e_min are below the minimum observed values found in Chapter 4 while they are all larger than the minimum values at n = 196. Other sample sizes are considered in the next simulation.

Part 3. Distribution of e_ij within samples

Methodology. The same tables are used as in part 2. Fewer effect sizes were considered, namely w = .3 to .8. One sample size was chosen so that tables a and c would have e_min larger than the minimum observed value for e_min (as reported in Table 4-1) while tables b and d, with three small cells, would have an e_min below the minimum observed value. This sample size is 96 for the 860 series and 128 for the 875 series. A second sample size was chosen near the minimum observed value for tables b and d.

Results. The power plots are presented in Figure 5-3. The distribution for S875d shows markedly less power. It is a case where n(p) = 3; therefore it and, to a lesser extent, S860d appear to confirm the expectation that power plots associated with tables having three small cells per group would have less power than the plots for tables with n(p) = 2 in each row. However, the other two tables with n(p) = 3, namely S860b and S875b, do not support this hypothesis. Their power plots are not consistently worse than those of other tables for the smaller sample size.
Therefore, the number of small cells within each group does not appear to explain the discrepancies in the observed power distributions noted in part 2.

Discussion

When all samples are equal in size, the power distributions for the product multinomial case of the goodness of fit test are comparable to those for the multinomial case. When sample sizes are not equal, the fit of the observed power distributions to the asymptotic is not as good, although this does not necessarily translate into a loss of power. In the two series of tables with e_min held constant, three of the four tables with unequal samples had more power than the tables with equal sample sizes. I was not able to isolate the specific factor or, more likely, the combination of factors that could explain the discrepancies of the observed power from the asymptotic power distribution.

Application

The application problem will follow up on the example used at the beginning of this chapter. Let’s say that the national survey of teachers’ level of education yielded the following results when broken down into four groups. The total sample size is 13,060.

                              < Bachelor’s   Bachelor’s   Master’s   Master’s + 30   Total
Primary         N                  64           2925        1577           4          4571
                % of group          1.4          64.0        34.5          0.09
                % of all            0.49         22.4        12.1          0.03
Upper Primary   N                  26           1698        1528          13          3265
                % of group          0.8          52.0        46.8          0.4
                % of all            0.20         13.0        11.7          0.1
Junior High     N                  13            654         706          65          1437
                % of group          0.9          45.5        49.1          4.5
                % of all            0.1           5.0         5.4          0.5
High School     N                  11           1428        2049         299          3787
                % of group          0.3          37.7        54.1          7.9
                % of all            0.08         10.9        15.7          2.3

From Table 4-1, we can expect that the Type I error rate will be acceptable if e_min is at least .44 (k = 18, n(p) = 8). As p_min is .0003, n should be 1437. The Type I error rate will be liberal for smaller sample sizes. The simulation results showed that X²’s power tends to be close to the power approximation. (Refer to Figure 4-2 Panel I.)
However, the application table has cell expectations much smaller than any of the simulation tables; therefore power can be expected to be less.

The results from the confirmatory simulation run are presented in Figure 5-4. The group sizes are all equal. The four sample sizes considered correspond to expected powers of .80, .90, .95, and .99. The Type I error rates are all liberal, as predicted above. Observed power is considerably less than that of the noncentral χ² approximation for two of the alternative hypotheses. The difference is more marked for the “Shift down” case where smaller cell frequencies were predicted for the Master’s + 30 level. This result runs counter to the Chapter 4 application result where the “Shift down” H1 showed more power! The hypothesis which posited no change for the small cells (“No extremes”) had observed power close to the nominal values.

In summary, the predicted trends were correct for both Type I error and power under the two hypotheses predicting differences for the small cells. Power, however, was much lower than I expected.

Chapter 6

THE TEST OF INDEPENDENCE

The test of independence differs from the goodness of fit test in that the expected cell probabilities are not predetermined but are calculated based on the marginal probabilities: $e_{ij} = n\,p_{i\cdot}\,p_{\cdot j}$. These expectations cannot be known precisely before collecting the data; therefore determining sample size will be a process of guess-estimating. Some have suggested a multi-stage sampling procedure when there is very little information about the possible values of the marginal probabilities (e.g., Horn, 1977).

Simulation studies (Camilli and Hopkins, 1978; Craddock and Flood, 1970; Bradley et al., 1979) have consistently found that X² is robust as long as the marginal probabilities are not extremely skewed.
For tables varying in size from 2x3 to 5x5 and with nearly equal expected frequencies, Craddock and Flood found that the χ² approximation of X² is accurate at the 90th, 95th and 98th percentiles for n as small as k. In their extensive simulation study, Bradley et al. found that Type I error rates will not exceed .06 unless both sets of marginal probabilities are extremely skewed. If one set of marginal probabilities is highly skewed while the other is nearly uniform, the Type I error rates are conservative. This conservative bias, as remarked by Bradley, appears to be tolerable to many researchers even though power may be adversely affected.

Koehler (1986) and Agresti and Yang (as cited in Agresti, 1990) considered much larger tables. For 10x10 and 20x20 tables, e_ij can be as small as 0.5 when all the expected frequencies are equal. When both sets of marginal probabilities are highly skewed, Koehler found the χ² approximation to be poor for large, sparse tables. Agresti and Yang, on the other hand, found that the chi-square approximation is adequate given a large table (100 cells) and n = k for marginal probabilities as small as .05. Their tables were not as skewed as those in Koehler’s study.

An empirical study on the power of Pearson’s chi-squared test of independence for 2x2 tables was carried out by Bradley and Seely (1977). They found errors of approximation when n is small. These errors are most serious when a small n is combined with highly skewed marginal probabilities. For example, given n = 20 and marginal probabilities of .1 and .9, the actual power is .8 whereas the power based on the noncentral χ² distribution is greater than .95. In an earlier study Harkness and Katz (1964) compared power estimated by normal approximation methods developed by Patnaik and Sillitto with an exact test, the uniformly most powerful unbiased size α test (UMPUT), for three types of contingency tables.
The power of all three tests was overestimated by the normal approximations, though the discrepancies were not large. Only 2x2 tables and n ≤ 30 were considered.

In summary, the simulation studies focusing on Type I error suggest that X² is robust when n is small unless the marginal probabilities are highly skewed. On the other hand, power simulations (i.e., Bradley and Seely, 1977) found that the noncentral χ² approximation is more sensitive to these factors, at least for 2x2 tables. The initial results presented in Chapter 3 bear this out: Power was found to be seriously overestimated for the generally recommended sample size, n = k or 16, and even for the larger sample size of 40 (n5).

Implications for researchers

Many different guidelines for sample size have been proposed. Cochran’s (1952, 1954) guidelines are still frequently cited in textbooks. He suggested that at least 80% of cells should have e_ij ≥ 5 while the remaining cells can have expected values as small as 1. As stringent as Cochran’s guidelines are, there are researchers who have recommended even larger sample sizes. Hays (cited in Bradley et al., 1979) recommended that all e_ij ≥ 10 when df = 1 and a minimum of 5 for larger tables. Tate and Hyer (cited in Bradley et al., 1979) argued for a minimum e_ij of 20. Bradley et al. considered these recommendations as prohibitive and remarked that “traditional rules of thumb based on minimum expected frequency, without regard to the marginal distributions, do not provide selective protection against errors of approximation where such protection is needed most” (p. 1295). Roscoe and Byars (1971)⁶ suggested the following guidelines, given α = .05: n ≥ 2k when the marginal probabilities are uniform; n ≥ 4k when the probabilities are moderately skewed; n ≥ 6k for tables with extremely skewed marginals.
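To give a rough sense of how far apart these rules land, the sketch below turns three of them into sample sizes. It is only an illustration under stated assumptions: each minimum-expectation rule is applied to the smallest cell probability (so the required n is the target expectation divided by p_min, rounded up), the Roscoe and Byars rule is taken at its extremely-skewed setting, and the function name and the inputs p_min = .0047, k = 16 (the values for table S475 in this chapter) are illustrative choices rather than part of the cited guidelines.

```python
import math

def guideline_sample_sizes(p_min, k):
    """Sample sizes implied by three rules of thumb (illustrative sketch).

    Assumption: a minimum-expectation rule e_min >= c is applied to the
    smallest cell probability, so the required n is c / p_min, rounded up.
    """
    return {
        "e_min = 5 (Cochran/Hays style)": math.ceil(5 / p_min),
        "e_min = 20 (Tate & Hyer)": math.ceil(20 / p_min),
        "n = 6k (Roscoe & Byars, extreme skew)": 6 * k,
    }

sizes = guideline_sample_sizes(p_min=0.0047, k=16)
for rule, n in sizes.items():
    print(f"{rule}: n >= {n}")
```

Because the published p_min is rounded to four decimals, sizes computed this way can differ by one or two from values worked out with the unrounded probability.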
⁶ This study is cited frequently in the literature related to the test of independence although the actual sampling distribution used is the product multinomial. As the two sampling distributions were found to give similar results in Chapter 3, Roscoe and Byars’s guidelines are included in this section.

A more recent set of guidelines based on simulation studies was offered by Wickens (1989, p. 30):

1. For tests with 1 degree of freedom, all the μ_ij [cell expectations] should exceed 2 or 3.
2. With more degrees of freedom, μ_ij ≈ 1 in a few cells is tolerable.
3. In large tables up to 20% of the cells can have μ_ij appreciably less than 1.
4. The total sample should be at least 4 or 5 times the number of cells.
5. Samples should be appreciably larger when the marginal categories are not equally likely.

The main drawback to these guidelines is the vagueness of some of the terminology. When should one consider the marginal probabilities to be extremely rather than moderately skewed? How much is “substantially more”? Obviously, these different guidelines lead to different sample sizes. To illustrate how different the sample sizes can be, values of n are calculated for a few tables that will be used in the simulations.

Table    p_min    Cochran/Hays   Tate & Hyer    Roscoe & Byars   Wickens       Power   Power   Pn
                  (e_min = 5)    (e_min = 20)   (n = 6k)         (e_min > 1)   n5      n8
S475     .0047    1064           4255           96               >213          40      176     241
S475b    .0047    1064           4255           96               >213          40      176     86
S875     .0085    589            2353           96               >118          40      176     153
S875b    .0085    589            2353           96               >118          40      176     75

Pn in the last column refers to an index that I wish to introduce here. When n is small, it is possible to end up with a marginal total of zero, especially if the marginal probabilities are skewed. When that happens the expectations for that row’s (or column’s) cells are zero and it then becomes impossible to calculate X² for all cells. The probability of getting a marginal total of zero for a specific sample size can be calculated using $\sum_i (1 - p_{i\cdot})^n + \sum_j (1 - p_{\cdot j})^n$.
This estimate is accurate for small probabilities (i.e., less than .05). Pn is the sample size where the probability of getting a marginal total of zero is .01. This index will be considered along with the other factors, namely e_min and R, in the following simulation. If any of these indices are useful in predicting when Type I error is close to α, we would then have a quantitative index that can be helpful in determining sample size.

Part 1. Type I error rate

Methodology. Set II tables were used where p_min was held constant within each series of tables while the marginal probabilities were manipulated. These tables are described in Chapter 2 and Appendix A. Sample sizes ranged from 16 to 1000. The 95% confidence interval for the Type I error rate is .04 to .06. Whenever a generated table did have a marginal total of 0, it was treated as a failure to reject H0.

Results. The Type I error rates are plotted in Figure 6-1. The error rates substantiate Bradley et al.’s (1979) conclusion: When both sets of marginal probabilities are extremely skewed the error rates are higher than the nominal α; otherwise X² tends to be conservative. Apparently both sets of marginals need to have at least one probability less than .1 for the Type I error to become liberal (i.e., larger than expected).

For some of the tables, n must be quite large before Type I error falls within the confidence interval (notably S875b). If one sets wider tolerance limits, as did several of the researchers cited above, then these results do substantiate their conclusion that X² is fairly well approximated by χ² for the test of independence, even when the marginal probabilities are extremely skewed. The majority of tables have distributions that are within (.03, .07) for n ≥ 32. There are exceptions, the more notable being S475, S860, S860c, and S875.

Neither p_min nor R appears to be useful for predicting how close the Type I error rate will be to the nominal α. If p_min (or, alternatively, e_min)
were the determining factor, then the error rates would be similar within each series. However, this is not the case. For example, S475a falls within the tolerance limits at n = 40, e_min = .188, while this doesn’t happen for S475 until n = 136 and e_min = .64. There would also be noticeable differences across series. The 875 series should be worse than the 860 series (p_min = .0085 versus .0115 for the 860 series).

The same argument can be made against R. The tables with the largest values are not necessarily the worst. By this criterion, all of the 475 tables should have poorer fit than the 875 tables (excepting S875 itself). Though the lowest R values (S875a, b, c and S860a) do tend to have good fits, this is not consistently true (S860).

The index based on the marginal totals, Pn, does show some usefulness in controlling Type I error. Sample sizes that are greater than Pn have error rates well within the tolerance limits.

Part 2. Power

Methodology. For comparative purposes, set I tables are presented here along with two tables from set II, namely S860b and S875b. These latter tables have Type I error rates that are higher than expected. Two sample sizes are considered for these sixteen-cell tables: n = 40 (n5) and 176 (n8).

Results. The power plots are presented in Figure 6-2. Power is well approximated by the noncentral χ² at n = n8. However, this is not the case when n = 40. For these tables with skewed marginals, power is fairly consistently overestimated by the noncentral chi-square distribution. This is true even for the tables associated with a liberal Type I error rate (Panel e, S860b and S875b). The observed power distributions for these tables are also overestimated in the range of interest, namely w = .5 to .7.

Four-cell tables with extremely skewed marginal probabilities have a particularity in that they have a restricted range for the effect size. If one column (or row) total is small relative to the other, there is an upper limit to the size of ES.
In these trials, the largest effect size is w = .4. Power is overestimated for small n, but well approximated by the noncentral χ² at n = n8.

As was seen in Chapter 3, the overestimation of power is much greater for the test of independence than for the goodness of fit test. Given k = 16, when the number of small expectations was not very large (n(p) ≤ 8), the fit of the observed power distribution by the noncentral χ² was good for the latter test. For the test of independence, the observed power can be as little as half of that predicted by the noncentral χ². Another difference between the two tests is that the number of small cells does not seem to be a factor affecting power for the test of independence. The power plots are fairly similar across n(p) (i.e., compare Panels c, e, and g).

In summary, the power plots of S860b and S875b eliminate p_min/e_min and R as determining factors. If the first case were true, these plots would be similar to those of their respective set I counterparts, S860 and S875. The observed power for the former tables was greater for all effect sizes. If R were the determining factor, then their power plots would have shown less power than that of S860. However, this expectation is contradicted by the results.

In part 1, it was found that when n ≥ Pn, Type I error was within the tolerance limits. Can the same be said for power? This question is the motivation for the next simulation.

Part 3. Pn and asymptotic fit

Methodology. The same set of sixteen-cell tables used in Part 2 is used here. The sample size was set to Pn rounded up to the nearest multiple of .5k.

Results. The power plots are presented in Figure 6-3. The fit of the observed power distribution to the noncentral χ² is not ideal for all values of w. It seems worse when power is in the middle ranges. The differences between the observed and nominal powers are plotted against the nominal values in Figure 6-4. The relationship is parabolic for power estimates between .05 and .80.
The maximum difference in fit is .09, corresponding to a decrease of 9 percentage points in the rejection rate.

Discussion

The above simulations confirm previous research: The chi-squared test of independence is quite robust as far as Type I error is concerned, as long as one accepts tolerance limits that are somewhat wider than the confidence interval. However, when marginal distributions are skewed and n is small, power can be seriously overestimated by the noncentral χ². Most of the available guidelines for determining sample size recommend sample sizes that are much larger than needed. It was also found that the distribution of marginal probabilities is a better indicator of the Pearson statistic’s fit to its asymptotic distribution than e_min.

One practical issue not raised in the literature on the test of independence is that small sample sizes may result in marginal totals of 0. A researcher can avoid this problem by calculating Pn, defined in this study as the sample size where the probability of getting a marginal total of zero is 1%. An easier method that yields a similar answer is to divide 5.5 by the minimum estimated marginal probability. This sample size is large enough for the Type I error to be reasonably close to α. Power, however, can be overestimated by as much as .09 when n = Pn. Some adjustment to power estimates is recommended.

An application

A professor is interested in knowing whether the level of exposure to advanced math courses is related to success in her introductory statistics course. Based on a survey she finds the following distribution for highest level of math course taken.

Factor 1: Highest level of math taken    Percentage
No college level math                    10
College algebra                          55
1 year of calculus                       15
1 year or more beyond calculus           20

Based on previous experience, she expects the following distribution for grades.

Factor 2: Grade    Expected percentage
4.0                30
3.5                20
3.0                40
≤ 2.5              10

Her current enrollment is 40 students.
Is the sample size large enough for a reasonable level of power? To answer the question a plausible effect size must first be determined. One strategy is to calculate w for a possible set of data if a high (but not perfect) correlation exists. If the students are distributed as shown in the following table, w = .87, a very large ES. The expected power is better than .90 for w greater than .7.

                     ≤ 2.5    3.0    3.5    4.0
No college math        2       2      0      0
College algebra        2      14      4      2
1 year calculus        0       0      2      4
> 1 year calculus      0       0      2      6

It was found in this chapter’s simulations that Type I error rates generally fell within the range .03 to .07 when the sample size was at least 32 for sixteen-cell tables. (Refer to part 1.) The marginal totals of the application table are not extremely skewed (no proportion is expected to be less than .1); therefore the trend of the Type I error should be conservative.

Marginal totals of zero are not a concern here but two marginal totals are less than five; a sample size of 40 is therefore less than Pn (which equals 51 for this example). Actual power can be expected to be overestimated by the noncentral χ². (Refer to Figure 6-2, Panels c and e for n(p) = 4 and 8 respectively. n(p) is 6 for the application table.) The overestimation will decrease as the effect size increases. (Refer to Figure 6-4.) In spite of the overestimation, a sample size of 40 appears to be large enough for detecting a large effect size with a power greater than .80.

The confirmatory simulation run had a Type I error rate of 4.4%, which does fall within the expected range. The power distribution is given in Figure 6-5. The discrepancy between observed and predicted power does not consistently decrease as the effect size increases, as was predicted above. The largest discrepancy, though, is for w = .5. Observed power is not too seriously overestimated, supporting the conclusion that the sample size is large enough.
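The two quantities quoted in this application, the effect size for the hypothetical grade table and the value of Pn for the planned margins, can be checked with a few lines of code. The sketch below is not the program used for the simulations; it assumes that w is computed as the square root of X²/n with expectations taken from the table’s own margins, and that Pn is the smallest n at which the zero-margin probability given earlier in this chapter falls to .01 or below. The function names are illustrative.

```python
def cohen_w(table):
    """Effect size w = sqrt(X^2 / n), expectations from the table's margins."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # nonzero: every margin is positive here
            x2 += (obs - exp) ** 2 / exp
    return (x2 / n) ** 0.5

def prob_zero_margin(n, row_p, col_p):
    """Approximate probability that at least one marginal total is zero."""
    return sum((1 - p) ** n for p in row_p) + sum((1 - p) ** n for p in col_p)

def pn(row_p, col_p, target=0.01):
    """Smallest n whose zero-margin probability is at or below the target (assumed definition)."""
    n = 1
    while prob_zero_margin(n, row_p, col_p) > target:
        n += 1
    return n

# Hypothetical data table from the application (rows: math level, columns: grade).
grades = [[2, 2, 0, 0],
          [2, 14, 4, 2],
          [0, 0, 2, 4],
          [0, 0, 2, 6]]
math_p = [0.10, 0.55, 0.15, 0.20]   # Factor 1 proportions
grade_p = [0.10, 0.40, 0.20, 0.30]  # Factor 2 proportions, ordered <= 2.5 to 4.0

w = cohen_w(grades)
print(round(w, 2))          # 0.87
print(pn(math_p, grade_p))  # 51
```

Both results agree with the values reported in the text, which suggests the assumed definitions are at least close to the ones used there.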
Chapter 7

THE HOMOGENEITY TEST

The calculations for the test of homogeneity are carried out in the same manner as the test of independence. The difference is entirely in the sampling procedure. One set of marginal totals corresponds to the samples taken from the various populations. The objective is to determine whether the populations are similar on the characteristic measured. For example, one may ask whether career aspirations of medical students are similar across ethnic groups.

The homogeneity test has been studied less frequently than the test of independence. Camilli and Hopkins (1978) found the homogeneity test to be somewhat conservative when both sets of marginal probabilities were skewed (e_min ≤ 2) but otherwise it was robust for 2 by 2 tables when the sample size was at least 20. A simulation study by Roscoe and Byars (1971) considered two equal groups and varying marginal probabilities on the second dimension (uniform, moderately, and extremely skewed). They reported X² to be “strikingly robust.” At the .05 level, Type I error was conservative for the smallest sample sizes when the column totals were skewed. They also reported that when both sets of marginals were extremely skewed, the Type I errors were “a bit erratic (though generally conservative).” Garside and Mack (1976) calculated the exact Type I error rates for 2 x 2 tables. All but a very few error rates fell in the .04 to .06 range for α = .05. Larntz (1978) tested a 2x3 table with two equal-sized groups. X² was close to nominal values for n ≥ 16 and below nominal for smaller n. These studies are therefore consistent in finding that X² is robust and tends to be conservative when n is small, much like the results found for the test of independence.

I have not found any simulation studies on the power of the homogeneity test but there has been some theoretical work done. Meng and Chapman (1966) present Neyman’s proof that the optimum sample size for a table with two groups is n₁ = n₂ = N/2.
The test of independence has less power than a homogeneity test with equal group sizes. Harkness and Katz’s (1964) theoretical study of exact power found that this superiority in power held for n ≤ 30 and when the two groups were not equal in size. Although higher in power than the test of independence, the homogeneity test’s power is still overestimated by the normal approximations developed by Patnaik.

Implications for researchers

Recommendations made for the test of independence appear appropriate for the homogeneity test. Ideally all the samples would be equal in size as this would maximize power. When the marginal totals are skewed and/or n is small, the power of X² will not be closely approximated by the noncentral χ², but research suggests that the test of homogeneity is more robust than the test of independence. How much more robust is the question considered below.

Methodology. Set I tables with n(p) = 8 are used along with two tables from set II, namely S860b and S875b. The set I tables have equal sized groups while the set II tables have skewed marginals on both dimensions. Two sample sizes are used: n = n5, which is 40 for both tests, and n = Pn. The value of Pn will depend on the table.

Results. The power plots for the test of independence and the homogeneity test are presented in Figure 7-1, along with the differences found between the two tests’ observed power. In Panel a one can see that the homogeneity test does tend to have more power for the larger effect sizes when n = 40 and its Type I error rate (w = 0) is slightly more conservative. This superiority does not hold when n increases (Panel d). The two tables with unequal sample sizes show the same pattern (Panel g): The homogeneity test’s superiority in power appears to exist only for large effect sizes and small n. The maximum observed difference in power is .05 (Panel g) with nearly all other positive differences being less than .03.
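The “predicted” or “nominal” power against which the observed power is compared throughout is the upper tail area of a noncentral chi-square distribution beyond the usual critical value. The sketch below shows one way such a value could be computed with only the standard library. It is an illustration, not the procedure used in this study: it assumes Cohen’s convention that the noncentrality parameter is n times w squared, evaluates the noncentral distribution as a Poisson mixture of central chi-squares, and uses a simple series for the central CDF that is adequate for moderate arguments but is not tuned for very large noncentrality. All function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def chi2_critical(df, alpha=0.05):
    """Upper-alpha critical value of the central chi-square, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def noncentral_power(w, n, df, alpha=0.05):
    """Approximate power, assuming noncentrality lambda = n * w**2."""
    lam = n * w ** 2
    crit = chi2_critical(df, alpha)
    weight = math.exp(-lam / 2)  # Poisson(lam / 2) probability of j = 0
    power, cum, j = 0.0, 0.0, 0
    while cum < 1 - 1e-12 and j < 10_000:
        power += weight * (1 - chi2_cdf(crit, df + 2 * j))
        cum += weight
        j += 1
        weight *= (lam / 2) / j
    return power

# Sixteen-cell (4 x 4) table, df = 9, with an assumed moderate effect size of w = .3.
for n in (40, 176):
    print(n, round(noncentral_power(0.3, n, df=9), 3))
```

With w = 0 the function returns α, which is a convenient sanity check; comparing its output with simulated rejection rates is the kind of fit assessment reported in the power plots.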
Discussion

The homogeneity test’s theoretical superiority in power over the test of independence was confirmed but found to be significant only for large ES and small n. Guidelines developed for the test of independence appear to be generalizable to the homogeneity test.

Application

From a ten-year-old large-scale study, it was found that career aspirations among medical students differed across ethnic groups. A replication study is being considered. Previous data provide the following information. Sixty-five percent of medical students are white, 25% are black, 7% are Hispanic, and 3% are Asian. The breakdown for career aspirations is: Private practice, 54.0%; Salaried positions, 12.9%; Faculty positions, 29.5%; the remaining 2.7% are lumped together as “Other.” The effect size is expected to be moderate at best.

If the smallest group size is 10 to 24% of the overall sample size, Pn will be 175. Since only one set of the marginals will be extremely skewed, the Type I error rate can be expected to be conservative. A sample size of 176 is theoretically large enough to detect a moderate effect size with a power of .8. The simulation results suggest that when the sample size is greater than Pn, Type I error will be reasonably close to α and the power approximation will also be close to the observed power. (Refer to Figure 7-1, Panel e, Table S875.)

The confirmatory simulation run with the smallest group making up 10% of the overall sample does substantiate the predictions: The Type I error rate was 5.8% and power was .82. The results were slightly better when all groups were set equal: The Type I error rate was 4.2% and power was .80.

Chapter 8

SUMMARY AND RECOMMENDATIONS

Although the asymptotic distributions for Pearson’s chi-squared statistic are the same across tests, it was shown here that X² behaves differently when n is small.
The fit of X²’s observed distributions to the asymptotic is further worsened when the distribution of expected cell frequencies is not uniform. Under these conditions, the goodness of fit X² tends to have a liberal Type I error. In contrast, the test of independence is generally conservative unless both sets of marginal probabilities are extremely skewed. For both tests it was found that power estimation is more sensitive than Type I error. Overestimation of power is much more serious for the test of independence than the goodness of fit test. The product multinomial analogs of these tests have similar trends.

Several sample size guidelines were considered for each test. These yielded greatly divergent sample sizes. The objective of the earliest guidelines was to have a close approximation of X²’s Type I error rate by χ². These guidelines are stringent and their recommended sample sizes tend to be large. Later guidelines based on simulations considered a looser fit as acceptable; therefore these sample sizes are often considerably smaller. Though there have been empirical power studies, these haven’t led to sample size guidelines. This study attempted to combine both perspectives for evaluating sample size guidelines.

A related problem is how best to describe tables with cell expectations that are not uniform. The minimum cell expectation is frequently the criterion used by sample size recommendations. It was found not to be a sufficient criterion for the goodness of fit test and it is not as useful as marginal totals for the test of independence. Several factors are involved in the former case: not only the size of the minimum cell expectation, but also the number of small expectations, the size of the table, and, for power, whether the small cells are smaller or larger under the alternative hypothesis. These factors cannot all be combined into a single index nor can a simple guideline be developed that would account for all of the factors.
The test of independence was easier to deal with. A quantitative index based on the marginal totals, Pn, was described. If the sample size is larger than Pn, a researcher can be confident that the actual distribution of X² is fairly well approximated by its asymptotic distributions.

Recommendations for future research

A tension exists between “good enough” for practical purposes and the theoretical perspective. Ideally the sample size should be large enough that the statistic’s actual distribution will match its asymptotic distribution. Extreme cases, though, pose a dilemma for practitioners. Given a table with extremely small expectations, the sample size needs to be very large before one can expect a good approximation by the asymptotic distributions. This may be neither feasible nor even desirable. If the researcher is only interested in detecting a moderate to large effect size but the recommended sample size is so large that it will detect a small to moderate effect size with better than .9 power, the researcher would be justified in thinking that some middle ground should be found! Guidelines that provide adjustments for less than ideal cases would help in this type of situation.

The tentative guidelines suggested here need to be refined and tested on other table sizes in order to make them more generalizable. Determining adjustments for less than ideal sample sizes would also require a large systematic simulation study. Extensions to smaller αs and multi-dimensional tables are two other areas where further research is needed.

Pearson’s chi-squared statistic, in spite of its well-known shortcomings, is still the most used test for categorical data. With the growing emphasis on power issues, research on the factors influencing the power estimation of X² should become a greater priority.

APPENDICES

Table A-1. Marginal probabilities for tables. Set I.
Id          S135     S145     S160     S175     S435     S445     S460     S475
k           16       16       16       16       16       16       16       16
n(p)        1        1        1        1        4        4        4        4
Var(X²)     35       45       60       75       35       45       60       75
R           366      526      766      1006     366      526      766      1006
row 1       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 2       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 3       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 4       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
column 1    0.195    0.191    0.1894   0.1888   0.060    0.044    0.026    0.019
column 2    0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
column 3    0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
column 4    0.305    0.309    0.3106   0.3112   0.440    0.456    0.474    0.481

Id          S835     S845     S860     S875     S1235    S1245    S1260    S1275
k           16       16       16       16       16       16       16       16
n(p)        8        8        8        8        12       12       12       12
Var(X²)     35       45       60       75       35       45       60       75
R           366      526      766      1006     366      526      766      1006
row 1       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 2       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 3       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
row 4       0.25     0.25     0.25     0.25     0.25     0.25     0.25     0.25
column 1    0.113    0.071    0.046    0.034    0.142    0.095    0.066    0.049
column 2    0.113    0.071    0.046    0.034    0.142    0.095    0.066    0.049
column 3    0.387    0.429    0.454    0.466    0.142    0.095    0.066    0.049
column 4    0.387    0.429    0.454    0.466    0.574    0.714    0.802    0.854

Id          S1535    S1545    S1560    S1575
k           16       16       16       16
n(p)        15       15       15       15
Var(X²)     35       45       60       75
R           366      526      766      1006
row 1       0.25     0.25     0.25     0.25
row 2       0.25     0.25     0.25     0.25
row 3       0.25     0.25     0.25     0.25
row 4       0.25     0.25     0.25     0.25
column 1    0.165    0.114    0.079    0.060
column 2    0.165    0.114    0.079    0.060
column 3    0.165    0.114    0.079    0.060
column 4    0.505    0.658    0.763    0.820

Table A-1 continued. Set I.
I Id F—107——-——_F109 F112 F115 F207 F209 F212 F215 hr 4 4 4 4 4 4 4 4 np 1 1 1 1 2 2 2 2 Ver(X2) 7 9 12 15 7 9 12 15 R 32 52 62 112 32 52 62 112 row 1 0.362 0.349 0.342 0.340 0.50 0.50 0.50 0.50 row 2 0.638 0.651 0.658 0.660 0.50 0.50 0.50 0.50 column 1 0.362 0.349 0.342 0.340 0.146 0.084 0.051 0.037 column 2 0.638 0.651 0.658 0.660 0.854 0.916 0.949 0.963 Id F307 F309 F312 F315 k 4 4 4 4 np 3 3 3 3 Var(X2) 7 9 12 15 R 32 52 62 112 row 1 0.196 0.118 0.074 0.054 row 2 0.804 0.882 0.926 0.946 column 1 0.196 0.118 0.074 0.054 column 2 0.804 0.882 0.926 0.946 60 Table A-1 continued. Set ll. Id S47"5a S4" '7'5b $860a S660b' 6'66" To 666' '0d' k 16 16 16 16 16 16 np 4 4 8 8 8 8 Var(X2) 75 75 60 60 60 60 R 1216 1137 662 870 647 913 row1 0.125 0.063 0.125 0.125 0.063 0.063 row2 0.125 0.063 0.125 0.125 0.063 0.063 row3 0.375 0.436 0.125 0.125 0.063 0.063 row4 0.375 0.438 0.625 0.625 0.613 0.813 column10.038 0.076 0.092 0.092 0.164 0.164 column2 0.038 0.076 0.092 0.092 0.184 0.184 column3 0.462 0.424 0.406 0.092 0.316 0.164 column4 0.462 0.424 0.4060 0.7240 0.316 0.446 Id S875a ss—75'b"S6"' 7'5c"—Ss75d"l k 16 16 16 16 np 8 8 8 8 Var(X2) 75 75 75 75 R 666 1157 992 1165 row1 0.125 0.125 0.063 0.063 row2 0.125 0.125 0.063 0.063 row3 0.125 0.125 0.063 0.063 row4 0.625 0.625 0.813 0.613 column10.068 0.068 0.137 0.137 column2 0.068 0.068 0.137 0.137 column3 0.432 0.068 0.364 0.137 column4 0.432 0.795 0.364 0.591 61 .a 835.55 65 53 58x6 385 .9503 93 2 68:39 So 833.305 =8 =< ”262 3. .8.L .5.L .58. La 8.5 .88...» Lo 3. .8.L.5.L.8. co 3. .8.L .5.L .88. so 33. .8.L .8.L .35. no 3. .8.L .5.L .88. no 33. .8.L .8.L .8 5. so 3. .8.L .8.L .88. to 33. .5.L .8.L .85. no 33. .8.L .8.L .85. no N3. .8.L .8.L .88. No 33. .8.L .8.L .55. No «3. .5.L .8.L .555. 3.5 m3. .5.L .8.L .88. 3 «3.833.538. 5 «3:83:58. 5 85 u o. 85%. 3. 8 L 5.L .88. 8 3L .8. .5. .88L 3 3. ..8L 5.L.88. m5 8...m.5..338.L co 3. ..8 L ..5L .8. no 85.53588. no 33. ..8 L 8 L 55. so 3. .8.L .5.L .58. so 33. 8 L .8.L .55. no 3. 
Table A-2 continued. [Rotated pages; entries not recoverable from the scan.]

TABLES AND FIGURES

Figure 2-1. Normal plots of the standardized residuals of the cell means. [Panels: Multinomial Distribution Residuals; Product Multinomial Distribution Residuals. Horizontal axis: Z (rank), from -3 to 3.]

[Rotated figure; caption and plotted values not recoverable from the scan.]
[Rotated figures; captions and plotted values not recoverable from the scan.]

Table 4-1. Lower limits for the minimum expected cell frequency (em)

                                  Observed    Recommended minimums
                                  minimum                Yarnold      Roscoe    Trial
 k   n(p)     R      pm      n       e       Cochran    (Modified)    & Byars   index
16     1     366   0.0075   16     0.12        0.5        0.31         1.00     0.05
16     2     370   0.0125   16     0.2         1          0.63         1.00     0.10
16     4     345   0.0225   16     0.36        5          1.25         1.00     0.20
16     8     373   0.0275   16     0.44        5          2.50         1.00     0.40
16    12     349   0.0375   16     0.6         5          3.75         1.00     0.59
16    15     319   0.0475   16     0.76        5          4.69         1.00     0.74
 8     1     163   0.008    16     0.096       0.5        0.63         1.00     0.16
 8     2     146   0.019    16     0.228       1          1.25         1.00     0.31
 8     4     217   0.02     16     0.24        5          2.50         1.00     0.63
 8     6     139   0.045    16     0.54        5          3.75         1.00     0.94
 8     7     122   0.058    16     0.696       5          4.38         1.00     1.09
 4     1      34   0.04     10     0.4         0.5        1.25         1.00     0.63
              49   0.025    16     0.4
 4     2      27   0.09     10     0.9         1          2.50         1.00     1.25
              38   0.06     16     0.96
 4     3      29   0.14     10     1.4         5          3.75         1.00     1.88
              33   0.095    16     1.52

Table 4-2. Cell probabilities of tables generated for part 2.

                                                  Ratio
Set    k   n(p)    pm      Subset  n(pM)    pM    pM/pm      R
A      4     1    0.01       a       3     0.33    33       109
                             b       2     0.37    37       109
                             c       1     0.49    49       110
B      4     2    0.01       a       2     0.49    49       204
                             b       1     0.73    73       205
C     16     2    0.0065     a      14     0.07    10.8     508
                             b       8     0.08    11.8     504
                             c       1     0.17    26.8     522
D     16     4    0.0065     a      12     0.08    12.5     763
                             b       8     0.09    13.9     768
                             c       1     0.29    44.1     795
E     16     8    0.0065     a       8     0.12    18.2    1298
                             b       1     0.51    78.5    1345

Figure 4-1. Type I error rate (in percent) versus sample size, 4-cell tables, (A) 1 and (B) 2 small cell expectations. [Plotted series: Subset a, Subset b, Subset c. Horizontal axis: sample size, 0 to 250.]
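The Type I error rates plotted in Figure 4-1 are empirical rejection rates under the null hypothesis. A minimal sketch of how such a rate could be estimated is given below; it is not the simulation program used in this study. It assumes simple multinomial sampling, a nominal α of .05 (critical value 7.8147 for 3 degrees of freedom), and the Set A, subset a probabilities from Table 4-2 (one cell at 0.01, three cells at 0.33). The function names, seed, replication count, and sample sizes are illustrative choices.

```python
import random

# Upper 5% point of the chi-squared distribution with 3 degrees of freedom.
# Assumption: nominal alpha = .05 for a 4-cell goodness of fit test.
CRITICAL_3DF_05 = 7.8147


def pearson_x2(counts, probs):
    """Pearson's X^2 for observed counts against hypothesized cell probabilities."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def type1_rate(probs, n, reps=5000, seed=1):
    """Estimate the rejection rate when the null hypothesis is true.

    Each replication draws one multinomial sample of size n from `probs`
    and tests it against the same `probs`.
    """
    rng = random.Random(seed)
    cells = range(len(probs))
    rejections = 0
    for _ in range(reps):
        counts = [0] * len(probs)
        for cell in rng.choices(cells, weights=probs, k=n):
            counts[cell] += 1
        if pearson_x2(counts, probs) > CRITICAL_3DF_05:
            rejections += 1
    return rejections / reps


# Hypothetical run: Table 4-2, Set A, subset a (one small cell, three equal cells).
probs = [0.01, 0.33, 0.33, 0.33]
for n in (20, 50, 100, 200):
    print(n, round(100 * type1_rate(probs, n), 2))
```

Plotting the printed percentages against n gives a curve of the kind shown in Figure 4-1; increasing the number of replications reduces the Monte Carlo error of each point.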
Figure 4-1 continued. 16-cell tables, (C) 2, (D) 4, and (E) 8 small cell expectations. [Plotted series: Subset a, Subset b, Subset c. Horizontal axis: sample size, 0 to 800.]

Figure 4-2. [Rotated pages; caption and plotted values not recoverable from the scan.]

Figure 4-3. Rejection rates (%) versus sample size for alternative hypotheses: small cells increasing (+H1) or small cells decreasing (-H1), (a) and (b) 4-cell tables, n(p)=1 or 3. (c) 16-cell tables, n(p)=12.

[Rotated figures; captions and plotted values not recoverable from the scan.]

Figure 5-4. Application problem, confirmatory simulation results. [Upper panel: discrepancy between observed and predicted rejection rates, plotted against predicted rejection rates (%) of 80, 90, 95, and 99; series: EQUAL N, Shift down, No extremes, Shift up. Lower panel: Type I error (%) and alpha plotted against sample size (predicted rejection rate): 194 (.80), 248 (.90), 288 (.95), 380 (.99).]

Figure 6-1. Type I error rate versus sample size, test of independence: (a) 475 series, (b) 860 series, and (c) 875 series. [Separate markers identify the sample sizes with n > Pn. Horizontal axis: sample size, 0 to 1000.]

[Rotated figures; captions and plotted values not recoverable from the scan.]

Figure 6-4. Differences between observed and expected power versus expected power. [Plotted series: np = 4, np = 8, np = 12.]

Figure 6-5. Application problem, confirmatory simulation results. [Panel title: Power distributions. Plotted series: Predicted power n=40, Observed power. Horizontal axis: effect size (Cohen's w), 0.5 to 0.9.]
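Figure 6-5 compares observed power with the power predicted from the asymptotic distribution. Under the usual asymptotic approximation (Cohen, 1988), X² follows a noncentral chi-squared distribution under the alternative, with noncentrality parameter λ = n·w², so predicted power is the probability that this distribution exceeds the central critical value. The sketch below computes that probability with the Python standard library only. It is an illustration rather than the procedure used in this study: the function names are hypothetical, and df = 3 is an assumed value because the dimensions of the application table are not given in this section.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    """Central chi-squared CDF."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_critical(df, alpha):
    """Upper-alpha critical value of the central chi-squared, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_cdf(x, df, lam):
    """Noncentral chi-squared CDF as a Poisson(lam / 2) mixture of central CDFs."""
    if lam == 0.0:
        return chi2_cdf(x, df)
    weight = math.exp(-lam / 2.0)  # Poisson probability of j = 0
    total = 0.0
    cumulative_weight = 0.0
    j = 0
    while cumulative_weight < 1.0 - 1e-12 and j < 10000:
        total += weight * chi2_cdf(x, df + 2 * j)
        cumulative_weight += weight
        j += 1
        weight *= (lam / 2.0) / j
    return total


def approx_power(w, n, df, alpha=0.05):
    """Asymptotic power of Pearson's X^2 for effect size w and sample size n."""
    lam = n * w ** 2
    return 1.0 - ncx2_cdf(chi2_critical(df, alpha), df, lam)


# Hypothetical example: n = 40 as in Figure 6-5; df = 3 is an assumption.
for w in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(w, round(approx_power(w, 40, 3), 3))
```

Comparing values of this kind with simulated rejection rates at the same n and w is one way to see where the asymptotic approximation overstates power.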
[Rotated figures; captions and plotted values not recoverable from the scan.]

BIBLIOGRAPHY

Agresti, A. (1990). Categorical data analysis. John Wiley & Sons.

Bradley, D. R., Bradley, T. D., McGrath, S. G., & Cutcomb, S. D. (1979). Type I error rate of the chi-square test of independence in r x c tables that have small expected frequencies. Psychological Bulletin, 86(6), 1290-1297.

Bradley, D. R., & Seely, D. L. (1977). Empirical determination of the power of the chi-square test of independence in 2 x 2 tables. Proceedings of the Statistical Computing Section of the American Statistical Association, 138-144.

Camilli, G., & Hopkins, K. D. (1978). Applicability of chi-square to 2x2 contingency tables with small expected cell frequencies. Psychological Bulletin, 85(1), 163-167.

Cochran, W. G. (1952). The χ² test of goodness of fit. Annals of Mathematical Statistics, 23, 315-345.

Cochran, W. G. (1954). Some methods for strengthening the common χ² tests. Biometrics, 10, 417-451.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd ed. Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Cooper, H., & Findley, M. (1982). Expected effect sizes: Estimates for statistical power analysis in social psychology. Personality and Social Psychology Bulletin, 9, 168-173.

Craddock, J. M., & Flood, C. R. (1970). The distribution of the χ² statistic in small contingency tables. Applied Statistics. Journal of the Royal Statistical Society, Series C, 19, 173-181.

Fishman, G. S., & Moore, L. R. (1982). A statistical evaluation of multiplicative congruential random number generators with modulus 2^31 - 1. Journal of the American Statistical Association, 77, 129-136.

Frosini, B. V. (1978). On the power function of the X² test. Metron, 34, 3-36.

Garside, G. R., & Mack, C. (1976). Actual Type I error probabilities for various tests in the homogeneity case of the 2x2 contingency table. The American Statistician, 30(1), 16-20.

Harkness, W. L., & Katz, L. (1964). Comparison of the power functions for the test of independence in 2x2 contingency tables. Annals of Mathematical Statistics, 35, 1115-1127.

Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a significant difference? Average effect size of research in counseling psychology. Journal of Counseling Psychology, 29, 58-65.

Horn, S. D. (1977). Goodness of fit tests for discrete data: A review and an application to a health impairment scale. Biometrics, 33, 237-248.

Hayman, G. E., & Leone, F. C. (1964). Comparison of the power functions for the test of independence in 2x2 contingency tables. Annals of Mathematical Statistics, 35, 1115-1127.

Koehler, K. J., & Larntz, K. (1980). An empirical investigation of goodness of fit statistics for sparse multinomials. Journal of the American Statistical Association, 75(370), 336-344.

Koehler, K. J. (1986). Goodness of fit tests for log-linear models in sparse contingency tables. Journal of the American Statistical Association, 81(394), 483-493.

Larntz, K. (1978). Small-sample comparisons of exact levels for chi-squared goodness of fit statistics. Journal of the American Statistical Association, 73(362), 253-263.

Lawal, H. B. (1992). A modified X² test when some cells have small expectations in the multinomial distribution. Journal of Statistical Computation and Simulation, 40, 15-27.

Lawal, H. B., & Upton, G. J. G. (1980). An approximation to the distribution of the X² goodness-of-fit statistic for use with small expectations. Biometrika, 67(2), 447-453.

Meng, R. C., & Chapman, D. G. (1966). Journal of the American Statistical Association, 61, 965-975.

Moore, D. S. (1986). Tests of chi-squared type. In R. B. D'Agostino and M. A. Stephens (Eds.), Goodness-of-fit techniques. New York: Marcel Dekker, Inc.

Ott, R. L. (1993). An introduction to statistical methods and data analysis (4th edition). Belmont, CA: Duxbury Press.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in Fortran, 2nd edition. Cambridge University Press.

Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data. New York: Springer-Verlag.

Roscoe, J. T., & Byars, J. A. (1971). An investigation of the restraints with respect to sample size commonly imposed on the use of the chi-square statistic. Journal of the American Statistical Association, 66(336), 755-759.

SAS Institute Inc. (1990). SAS language: Reference, version 6, first edition. Cary, NC: SAS Institute, Inc.

Slakter, M. J. (1968). Accuracy of an approximation to the power of the chi-square goodness of fit test with small but equal expected frequencies. Journal of the American Statistical Association, 63, 912-924.

Von Eye, A. (1990). Introduction to configural frequency analysis: The search for types and antitypes in cross-classifications. Cambridge University Press.

Wickens, T. D. (1989). Multiway contingency tables analysis for the social sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates, Publishers.

Wise, M. E. (1963). Multinomial probabilities and the χ² and X² distributions. Biometrika, 50, 145-154.

Yarnold, J. K. (1970). The minimum expectation in X² goodness of fit tests and the accuracy of approximations for the null distribution. Journal of the American Statistical Association, 65(330), 864-886.