This is to certify that the

dissertation entitled

Factors influencing Pearson's chi-squared statistic's fit to its asymptotic
distributions: Implications for sample size guidelines

presented by

Shelly Johann Naud

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Education

Major professor

 

Date g/Z'7/(lf


FACTORS INFLUENCING PEARSON’S CHI-SQUARED STATISTIC’S FIT TO
ITS ASYMPTOTIC DISTRIBUTIONS:
IMPLICATIONS FOR SAMPLE SIZE GUIDELINES

BY

SHELLY JOHANN NAUD

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Measurement and Quantitative Methods
College of Education

1999

ABSTRACT

FACTORS INFLUENCING PEARSON'S CHI-SQUARED STATISTIC'S FIT TO
ITS ASYMPTOTIC DISTRIBUTIONS:
IMPLICATIONS FOR SAMPLE SIZE GUIDELINES

BY

SHELLY JOHANN NAUD

Recent sample size guidelines for Pearson's chi-squared statistic ($X^2$)
have generally been based on simulation studies. These previous studies have
mainly focused on the impact of small sample size on Type I error for a single
test. A simulation study was carried out to evaluate the impact of small sample
size on both Type I error and power approximation across four tests. It was
found that power may be overestimated even though the sample size is large
enough for the Type I error rate to be close to $\alpha$. This problem is more
serious for the test of independence than for the goodness of fit test.

A quantitative index, Pn, was proposed for contingency table tests. When
sample size is larger than Pn, both Type I error and power of $X^2$ are fairly
well approximated by the asymptotic distributions.

ACKNOWLEDGMENTS

I owe a debt of gratitude to the members of my dissertation committee:
Dr. Alexander von Eye, committee chair, Dr. Betsy Jane Becker, adviser and
guidance committee chair, Dr. Richard Houang, and Dr. Alka Indurkhya. I also
wish to thank Dr. David Wagstaff for his extensive editorial comments on my
first draft.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
INTRODUCTION
CHAPTER 1
THEORETICAL BACKGROUND
CHAPTER 2
SIMULATIONS
CHAPTER 3
THE FIRST QUESTION
CHAPTER 4
THE GOODNESS OF FIT TEST UNDER THE MULTINOMIAL SAMPLING MODEL
CHAPTER 5
THE GOODNESS OF FIT TEST UNDER THE PRODUCT MULTINOMIAL SAMPLING MODEL
CHAPTER 6
THE TEST OF INDEPENDENCE
CHAPTER 7
THE HOMOGENEITY TEST
CHAPTER 8
SUMMARY AND RECOMMENDATIONS
APPENDIX A
TABLES AND FIGURES
BIBLIOGRAPHY

LIST OF TABLES

Table 4-1. Lower limits for the minimum expected cell frequency ($e_{min}$).
Table 4-2. Cell probabilities of tables generated for part 2.

LIST OF FIGURES

Figure 2-1. Normal plots of the standardized residuals of the cell means.
Figure 3-1. Power plots for the test of independence.
Figure 3-2. Power plots for the homogeneity test.
Figure 3-3. Power plots for the multinomial case of the goodness of fit test.
Figure 3-4. Power plots for the product multinomial case of the goodness of fit
test.
Figure 4-1. Type I error rate versus sample size.
Figure 4-2. Power plots: Rejection rate in percent versus effect size, w.
Figure 4-3. Rejection rates versus sample size for alternative hypotheses.
Figure 5-1. Differences in fit between the product multinomial and the
multinomial cases of the goodness of fit test.
Figure 5-2. Unequal sample sizes, S60 and S75 series.
Figure 5-3. The number of small cell expectations within rows as a factor
affecting power.
Figure 5-4. Application problem.
Figure 6-1. Type I error rate versus sample size, test of independence.
Figure 6-2. Power plots, test of independence.
Figure 6-3. Power plots for n = Pn.
Figure 6-4. Differences between observed and expected power versus expected
power.
Figure 6-5. Application problem.
Figure 7-1. Differences in fit between the homogeneity test and the test of
independence.

LIST OF ABBREVIATIONS

df    Degrees of freedom.
$e_i$ or $e_{ij}$    Expected cell frequency (p. 2).
$e_{min}$    A table's minimum expected cell frequency (p. 5).
ES    Effect size (p. 5).
k    Number of cells in a table.
$H_0$    Null hypothesis.
$H_1$    Alternate hypothesis.
n    Sample size.
n5    The sample size where power is expected to be .5 for a large effect size
      (p. 5).
n8    The sample size where power is expected to be .8 for a moderate effect
      size (p. 5).
n(p)    The number of cells with expected probabilities less than 1/k (p. 6).
$O_i$ or $O_{ij}$    Observed cell frequency (p. 2).
$p_i$ or $p_{ij}$    Expected cell probability (p. 2).
$p_{i\cdot}$, $p_{\cdot j}$    Marginal probabilities (p. 2).
$p_{min}$    A table's minimum expected cell probability (p. 6).
Pn    Sample size where the probability of getting a marginal total of zero is
      1% (p. 43).
r    Number of cell expectations less than 5 (p. 6).
R    A global index evaluating the skewness of the distribution of cell
     expectations (p. 6).
w    Cohen's index of effect size (p. 4).
$X^2$    Pearson's chi-squared statistic.
$\lambda$    Noncentrality parameter (p. 3).

INTRODUCTION

Pearson's chi-squared statistic, $X^2$, first introduced in 1900, is currently
widely known and used. What many of the researchers who use $X^2$ may not
realize is that there is no consensus on sample size guidelines; available
guidelines actually vary a great deal. Why is there such variability? It is
partly due to the different approaches for determining when an asymptotic
distribution is a reasonable approximation. When sample sizes are small, the
distribution of $X^2$ is a step function that cannot be well approximated by
any continuous function. The older guidelines required that the distribution of
$X^2$ be fairly smooth. To meet this criterion, sample sizes need to be large.
Recent guidelines are generally based on simulation studies. As long as the
actual Type I error rate is reasonably close to the nominal Type I error rate,
$\alpha$, the asymptotic distribution is considered adequate. The resulting
sample size recommendations are considerably less stringent.

Though a considerable number of simulation studies have been done, the
question of sample size has not been entirely resolved because additional
factors complicate the problem. One such factor is the table's distribution of
cell expectations. Tables where some of the expected cell frequencies are very
small in comparison to the other cells apparently require different guidelines
than tables with uniform expectations.

This study proposes to address some of the gaps in the simulation
research. One is related to the fact that a majority of the research has dealt
with only one of the several tests that use $X^2$ as the test statistic.
Although the asymptotic distributions of $X^2$ are the same across tests, the
actual distribution of $X^2$ across tests is not necessarily similar when the
sample size is small. This issue has not been studied systematically. A second
issue addressed in this study is power. Although there have been studies on the
impact of small sample sizes on power, these have had much less influence on
sample size guidelines than the studies focusing on Type I error.

The comparison between tests is the focus of Chapter 3. Each test is
then considered individually in the following chapters. The first two chapters
will cover theoretical and methodological issues.

In summary, this study will explore the behavior of Pearson's chi-squared
statistic when the sample size is small and the table has a skewed distribution
of expected cell frequencies. These are the conditions where the asymptotic
distributions do not hold well. Both power and Type I error will be considered
across different tests. Current recommendations for sample size will be
evaluated based on these findings.

Chapter 1

THEORETICAL BACKGROUND

The first sections of this chapter will define the notation and terminology
related to $X^2$, hypothesis testing and power, and some proposed indices. The
sampling distributions and tests associated with categorical data are described
in the last section.

Notation and formulas

The two-way frequency tables have I rows and J columns. The number of
cells in the table is denoted by k with k = IJ. Marginal row and column
probabilities, $p_{i\cdot}$ and $p_{\cdot j}$, are obtained by dividing the row
and column totals, $n_{i\cdot}$ and $n_{\cdot j}$, by the total sample size, n
(e.g., $p_{1\cdot} = n_{1\cdot}/n$). Depending on the sampling plan that is
assumed to have generated the data, one or more of the marginal totals may be
fixed or treated as constants. With such sampling plans, the marginal totals
will be used in some formulas instead of n.

The expected cell probabilities are denoted by $p_{ij}$ with
$p_{ij} = p_{i\cdot} p_{\cdot j}$. The expected cell frequencies (or
expectations) are related to the cell probabilities: $e_{ij} = np_{ij}$. Each
cell's count is referred to as the observed cell frequency ($O_{ij}$).

Pearson's chi-squared statistic provides a measure of the discrepancy
between observed and expected cell frequencies:

\[ X^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \]

If the expected values are close to the observed values, the value of $X^2$ is
small; if the expected values are far from the observed values, the value of
$X^2$ is large. Because the deviations are squared, the $X^2$ statistic gives
more weight to observed cell frequencies that are much larger (or much smaller)
than the expected cell frequencies.
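
As a minimal illustration of the formula for Pearson's statistic (not part of the original study, whose programs were written in SAS), the computation can be sketched in Python. The function name `pearson_x2` and the example counts are hypothetical.

```python
def pearson_x2(observed, expected):
    """Sum (O - e)^2 / e over all cells; tables are given as lists of rows."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total

# Hypothetical 2 x 2 table with n = 40 and uniform expectations (e_ij = 10).
observed = [[14, 6], [8, 12]]
expected = [[10.0, 10.0], [10.0, 10.0]]
x2 = pearson_x2(observed, expected)  # (16 + 16 + 4 + 4) / 10
```

Each cell contributes its squared deviation scaled by its expectation, so a given deviation counts for more in a cell with a small expected frequency.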

When the null hypothesis is true, Pearson's $X^2$ is asymptotically
distributed as the chi-square ($\chi^2$). On the other hand, when the null
hypothesis is false, the asymptotic distribution is the noncentral chi-square
distribution. Both distributions have degrees of freedom (df) as a parameter.
The noncentral chi-square distribution further depends on a second parameter

\[ \lambda = n \sum_{i}\sum_{j} \frac{(p_{1ij} - p_{0ij})^2}{p_{0ij}} \]

where $p_{0ij}$ refers to the cell probability under the null hypothesis
($H_0$) while $p_{1ij}$ refers to the cell probability under the alternate
hypothesis ($H_1$). Lambda increases in value as the two hypotheses become more
discrepant, and it increases with the sample size. When $\lambda$ is set to
zero, the noncentral $\chi^2$ is equivalent to the chi-square distribution.

Type I error, power, and Cohen's effect size index

When deciding whether to accept or reject the null hypothesis,
researchers can make two types of error. They can reject the null hypothesis
when it is true or they can fail to reject the null hypothesis when it is
false. The former is referred to as Type I error and the probability of its
occurrence is $\alpha$. The second is a Type II error and its probability of
occurrence is $\beta$. Power is the probability of rejecting $H_0$ when it is
false; it is equal to $1 - \beta$.

Researchers have to balance the costs associated with the two errors.
Choosing to make $\alpha$ very small decreases the risk of rejecting a null
hypothesis that is true; however, power also decreases as a result. Choosing a
relatively large $\alpha$ will result in a smaller $\beta$ and, therefore, more
power; however, this choice increases the risk of rejecting the null hypothesis
when it is true. There is another alternative: Researchers can achieve an
increase in power by increasing their sample size. In order to determine how
large the sample should be, the researcher should have a reasonable estimate of
the population effect size, ES. A small effect size indicates that the
alternative hypothesis is not much different from the null hypothesis. Small
differences are unlikely to be detected unless the sample size is large. On the
other hand, a researcher can expect to detect large effect sizes with smaller
samples. Estimates of effect size are determined from previous research or
pilot studies whenever possible.

Cohen (1988, 1992) has defined a measure of effect size that is widely
used. If previous findings are available, Cohen's index¹ can be calculated as
follows:

\[ w = \sqrt{\sum_{i}\sum_{j} \frac{(p_{1ij} - p_{0ij})^2}{p_{0ij}}} \]

This index is closely related to the noncentrality parameter,
$\lambda = nw^2$. Cohen provided the following guidelines for interpreting the
values of w: 0.1 corresponds to a small effect size, 0.3 to a moderate effect
size, while 0.5 is considered large. In practice, w is not likely to be greater
than 0.9. Several effect size surveys have found the average w to be
approximately 0.3, at least in the field of psychology (Haase et al., 1982;
Cooper and Findley, 1982). Therefore, if a researcher lacks an empirically
based alternative hypothesis, setting w to 0.3 is a plausible alternative.
Cohen suggests that power should be set at 0.8.

¹ In his first edition of Statistical Power Analysis for the Behavioral
Sciences, Cohen proposed a slightly different ES index: $e = \lambda/n = w^2$.
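
The relation between Cohen's w and the noncentrality parameter can be checked numerically. The following is a small sketch under assumed inputs: the function names and the two probability vectors are hypothetical, chosen so that w works out to the "moderate" value of 0.3.

```python
import math

def cohen_w(p0, p1):
    """Cohen's w from flat lists of null (p0) and alternative (p1) cell probabilities."""
    return math.sqrt(sum((a - b) ** 2 / b for a, b in zip(p1, p0)))

def noncentrality(p0, p1, n):
    """Noncentrality parameter, lambda = n * w^2."""
    return n * cohen_w(p0, p1) ** 2

# Hypothetical four-cell example: uniform null, mildly skewed alternative.
p0 = [0.25, 0.25, 0.25, 0.25]
p1 = [0.325, 0.175, 0.175, 0.325]
w = cohen_w(p0, p1)                  # each cell contributes 0.075^2 / 0.25
lam = noncentrality(p0, p1, n=88)
```

Because w does not depend on n, it describes the alternative hypothesis alone, while lambda combines the effect size with the sample size.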

This study will evaluate the behavior of $X^2$ at two target sample sizes.
The first, n5, is the sample size where the theoretical power is .5 for a large
ES, i.e., n is determined by setting the effect size index, w, to .5 and the
power given by the noncentral $\chi^2$ distribution to .5. If a researcher has
a sample size equal to n5, he or she will have a 50-50 chance of detecting a
large effect size. The second target sample size, n8, corresponds to a power of
.8 for a medium ES (w = .3). The sample size n5 serves as the lower bound to
sample sizes that may be considered by researchers while n8 reflects a
reasonable goal for most research. The specific sample sizes that correspond to
each target sample size are listed at the end of this chapter.
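
To make the definition of the target sample sizes concrete, the sketch below searches for the smallest n whose theoretical power reaches a target, using the relation between the noncentrality parameter and w (lambda = n w^2). It is an illustration in plain Python, not the procedure used in the study: the noncentral chi-square tail is computed as a Poisson mixture of central chi-square tails, alpha = .05 is assumed, and all function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a central chi-square variate."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_critical(df, alpha=0.05):
    """Critical value found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power(df, lam, crit):
    """P(noncentral chi-square(df, lam) > crit), as a Poisson-weighted sum."""
    total = 0.0
    weight = math.exp(-lam / 2.0)  # Poisson probability of j = 0
    for j in range(400):
        total += weight * chi2_sf(crit, df + 2 * j)
        weight *= (lam / 2.0) / (j + 1)
    return total

def target_n(df, w, target_power, alpha=0.05):
    """Smallest n whose theoretical power reaches target_power."""
    crit = chi2_critical(df, alpha)
    n = 1
    while power(df, n * w * w, crit) < target_power:
        n += 1
    return n

n5 = target_n(df=1, w=0.5, target_power=0.5)
n8 = target_n(df=1, w=0.3, target_power=0.8)
```

For df = 1 this search gives 16 and 88. For other degrees of freedom the result is the exact minimum, which need not coincide with a sample size that was rounded up for convenience.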
Measures of discrepancy

One problem that exists in the literature on categorical data is the lack of
a quantitative index for describing tables where the expected cell frequencies
are not all equal. Researchers often resort to qualitative descriptions such as
"a highly skewed distribution of expected cell frequencies." Three quantitative
indices are proposed here.

Many authors use the minimum expected cell frequency ($e_{min}$) as their
criterion for indicating how discrepant the observed table is from a table
where all cell frequencies are the same, i.e., the uniform table. Some
researchers are also interested in the number of cells with small expectations
(Cochran, 1952, 1954; Yarnold, 1970). In particular, Yarnold proposed r, the
number of cells with an expected frequency of less than five. The disadvantage
of using $e_{min}$ or r is that they vary with the sample size. I propose two
alternate indices that remain invariant: the minimum expected cell probability,
$p_{min}$, and n(p), the number of cells that have probabilities less than 1/k,
where k is the total number of cells. In a uniform table, $p_{min} = 1/k$ and
n(p) = 0.

The third index used in the present study to indicate how discrepant an
observed table is from the uniform table is a global index, $R = \sum 1/p_i$.
R is an element of three formulas for estimating the variance of $X^2$ (Pearson
given in Lawal & Upton, 1980; Haldane given in Lawal, 1992; and Morris given in
Koehler and Larntz, 1980). The use of R is of interest since it is a key
component of the variance estimates, and the fit of the distribution of $X^2$
to its asymptotic distributions is thought to be related to the variance of
$X^2$. When there are small cell expectations, the variance of $X^2$ can be
much greater than the variance of $\chi^2$ (Lawal, 1991).
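
The indices described in this section can be computed directly from a table's expected cell probabilities. The sketch below is illustrative only: the function name, the dictionary keys, and the two example tables are hypothetical.

```python
def table_indices(probs, n):
    """Discrepancy indices for a table with the given expected cell probabilities."""
    k = len(probs)
    expected = [n * p for p in probs]
    return {
        "e_min": min(expected),                        # depends on n
        "r": sum(1 for e in expected if e < 5),        # depends on n
        "p_min": min(probs),                           # invariant to n
        "n_p": sum(1 for p in probs if p < 1.0 / k),   # invariant to n
        "R": sum(1.0 / p for p in probs),              # global index
    }

uniform = table_indices([0.25] * 4, n=20)
skewed = table_indices([0.05, 0.05, 0.45, 0.45], n=20)
```

For the uniform four-cell table R equals k squared (16), its smallest possible value, and R grows as the cell probabilities become more unequal.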

Sampling distributions and tests for categorical data

There are two tests that are usually thought of whenever one deals with
categorical data, namely the test of independence and the goodness of fit test.
Researchers would conduct a test of independence to determine whether two
variables are related, e.g., gender and level of job satisfaction. The degrees
of freedom for $\chi^2$ is (I-1)(J-1). The expected cell frequencies are
calculated from the marginal probabilities:
$e_{ij} = np_{i\cdot}p_{\cdot j}$.

The procedure for the goodness of fit test differs from the test of
independence in two ways. The expected cell probabilities are specified by the
null hypothesis and the degrees of freedom is k-1. One application of this
test, given by Pearson when he introduced his statistic in 1900 (cited in
Agresti, 1990), is analyzing the outcomes from a roulette wheel. If the wheel
shows no bias then each outcome has an equal probability of occurring;
therefore, under the null hypothesis $e_i = np_i = n(1/37)$. Only one subscript
is used since these tables are one-dimensional.

In the two examples described above, the data are sampled from a single
population. There are two possible sampling distributions, namely, the
multinomial and the Poisson. The Poisson differs from the multinomial in that
the sample size is not fixed; n itself has a Poisson distribution (Agresti,
1990).

It is possible to sample from more than one population. The relevant
sampling distribution is the product multinomial. For the goodness of fit test,
the degrees of freedom will be reduced by the number of groups sampled. Given I
groups and J categories, df = I(J-1) = IJ - I = k - I. The corresponding
contingency table test, i.e., the test of homogeneity, has the same degrees of
freedom as the test of independence; both are constrained by the marginal
totals.
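
The degrees of freedom rules stated so far can be collected in one small helper. This is a sketch for checking the arithmetic; the function name and the test labels are hypothetical.

```python
def degrees_of_freedom(test, rows, cols):
    """Degrees of freedom for the four tests, for a table with rows x cols cells."""
    k = rows * cols
    if test == "gof_multinomial":
        return k - 1                      # k - 1
    if test == "gof_product_multinomial":
        return rows * (cols - 1)          # I(J - 1) = k - I
    if test in ("independence", "homogeneity"):
        return (rows - 1) * (cols - 1)    # (I - 1)(J - 1)
    raise ValueError("unknown test: " + test)

tests = ["gof_multinomial", "gof_product_multinomial", "independence", "homogeneity"]
df_2x2 = [degrees_of_freedom(t, 2, 2) for t in tests]
df_4x4 = [degrees_of_freedom(t, 4, 4) for t in tests]
```

For the four-cell and sixteen-cell tables, arranged here as 2 x 2 and 4 x 4, this gives 3, 2, 1, 1 and 15, 12, 9, 9 respectively.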

This discussion of sampling distributions would not be complete without
mentioning the hypergeometric sampling distribution. In this case, all of the
marginal frequencies are fixed: $e_{ij} = np_{i\cdot}p_{\cdot j}$. Agresti
(1990) maintains that the only appropriate test in this situation is Fisher's
exact probability test; therefore it will not be considered in this study.
Wickens (1989) presents other alternatives that are also appropriate.

Agresti (1990, p. 39) uses an example to clarify the differences among the
above sampling models. A two-way table is deﬁned by seat-belt use (yes, no)
and whether the driver survives the accident (yes, no). If the data include all
reported accidents occurring on the Massachusetts turnpike in a year, then the
cell frequencies are Poisson random variables. The cell observations have a
multinomial distribution when a subset of the population is randomly chosen, say
100 accident reports. If the researcher decides to sample 50 drivers who didn’t
wear seat belts and 50 seat-belt users, then we have a product multinomial
sampling distribution.

A different outcome needs to be chosen in order to illustrate the
hypergeometric sampling distribution. Let's say the sample of accident reports
(from the product multinomial case) is given to an expert who is asked to
determine which 50 drivers were most likely to have worn seat belts. The
expert's answers (likely, not likely) are compared to the actual data which
were withheld from the expert (seat belt, no seat belt). The resulting 2 x 2
table will have marginal totals that are all fixed to equal 50 by design.

This study will focus on four tests for which Pearson's $X^2$ is
appropriate. These tests are defined by the two dimensions that were described
in this section, namely the sampling distribution and the method of calculating
the expected cell frequencies: the goodness of fit tests depend on $H_0$ while
the contingency table tests depend on the marginal totals. The target sample
sizes described earlier (n5 and n8) will vary depending on the tests as well as
the table size. The degrees of freedom will vary also. These are listed in the
following table.

Test                         k     df     n5     n8

Goodness of fit tests
  Multinomial                4      3     24    122
                            16     15     48    216
  Product multinomial        4      2     20    108
                            16     12     40    196

Contingency table tests
  Test of independence       4      1     16     88
  (Multinomial)             16      9     40    176
  Homogeneity test           4      1     16     88
  (Product multinomial)     16      9     40    176

 

 

 

 

 

 

 

 

The distribution of $X^2$ for these four tests will be compared in Chapter 3
and considered separately in the following chapters.

Chapter 2

SIMULATIONS

The next ﬁve chapters will present the results from multiple simulations.
The underlying procedures common to all are described in this chapter. Two
related topics are treated in separate sections: conﬁdence intervals and a
description of the tables that are used in more than one chapter.

The simulation programs were written in UNIX SAS version 6.07 (SAS
Institute Inc., 1990).

Four data sets were generated to assess the behavior of $X^2$ across tests.
Using the same data eliminates variation in the generated data as a possible
cause for any differences seen in the results.

The general strategy of the simulation programs was to partition the table
by the predetermined cell probabilities ($p_i$). For example, the limits for
cell 1 are $[0, p_1)$², the limits for cell 2 are $[p_1, p_2)$, and so on. Each
generated random number, u, was then assigned to the cell for which
$p_i \le u < p_{i+1}$. The product multinomial case differed from the
multinomial in that each row was treated as a separate table.
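
The partitioning strategy can be sketched in Python as follows. This is an illustration rather than the original SAS program, and it assumes that the interval limits are the running (cumulative) sums of the cell probabilities; the function names, the seed, and the example probabilities are hypothetical.

```python
import bisect
import itertools
import random

def assign_cell(upper_limits, u):
    """Index of the half-open interval [lower, upper) that contains u."""
    cell = bisect.bisect_right(upper_limits, u)
    return min(cell, len(upper_limits) - 1)  # guard against rounding in the last limit

def simulate_table(cell_probs, n, rng):
    """Draw n uniform numbers and count how many fall in each cell."""
    upper_limits = list(itertools.accumulate(cell_probs))
    counts = [0] * len(cell_probs)
    for _ in range(n):
        counts[assign_cell(upper_limits, rng.random())] += 1
    return counts

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
counts = simulate_table([0.05, 0.05, 0.45, 0.45], n=16, rng=rng)
```

For the product multinomial case the same function would be called once per row, with that row's conditional probabilities and its fixed row total in place of n.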

² The bracket is inclusive while the parenthesis is not: "[0" means "including
zero" while "$p_1$)" means "up to, but not including, $p_1$."

One critical aspect of simulations is the process used to generate the
random numbers. The SAS uniform random number generator uses a prime modulus
multiplicative generator with modulus $2^{31} - 1$ and multiplier 397204094.
This particular combination has been tested and found to be one of the better
random number generators (Fishman and Moore, 1982). The programs were tested to
see how well the generated data conformed to the target sampling distribution.
The observed cell means were compared to their theoretical values:
$d_i = \bar{O}_i - np_i$. The standardized residuals, $r_i = d_i/\sigma_i$³,
were plotted against the expected z scores,
$z_i = \Phi^{-1}(\text{percentile rank}_i)$. These normal probability plots
(Figure 2-1) are linear. The observed cell frequencies therefore follow the
expected distribution.
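
For readers unfamiliar with this class of generator, the recurrence behind a prime modulus multiplicative generator is short enough to write out. The sketch below implements only the recurrence with the modulus and multiplier quoted in the text; it makes no claim to reproduce the SAS stream itself, since seeding and any further processing are not described here. The function name is hypothetical.

```python
MODULUS = 2**31 - 1      # prime modulus
MULTIPLIER = 397204094   # multiplier quoted in the text

def lehmer_uniforms(seed, count):
    """Yield uniforms in (0, 1) from x_next = (MULTIPLIER * x) mod MODULUS."""
    x = seed  # must lie between 1 and MODULUS - 1
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS
        yield x / MODULUS

values = list(lehmer_uniforms(seed=1, count=5))
```

Because the modulus is prime and the seed is nonzero, the state never reaches zero, so every value lies strictly between 0 and 1.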

The mean observed cell variances were compared to their theoretical
values:

\[ \frac{s_i^2}{\sigma_i^2} - 1 \]

The average deviation of the 96 cells from tables with a multinomial
distribution is slightly below the expected value of 0 (-.0035) while it is
slightly above 0 for the product multinomial case (.0046). The cell means and
standard deviations are therefore both close to their expected values.

³ The expected standard deviation depends on the sampling distribution and the
number of replications. Given 1000 replications,
$\sigma_i = [np_i(1 - p_i)/1000]^{1/2}$ for the multinomial case. The standard
deviation for the product multinomial case is
$\sigma_{ij} = [n_i p_{ij}(1 - p_{ij})/1000]^{1/2}$.

Conﬁdence intervals

Cochran (1952) and a number of other researchers have suggested the
range .04 to .06 as acceptable lower and upper limits for the observed Type I
error rate when the nominal rate is .05. Other researchers have proposed a
range that is more liberal (e.g., .03 to .07; Koehler and Larntz, 1980), and at
least one researcher has proposed a range that is asymmetric, (.03, .06).
Bradley et al. (1979) justified the latter by remarking that many researchers
would accept a conservative bias. In actuality, these are tolerance limits and
not confidence intervals since they were all set independently of the
simulation. These ranges of varying widths do lead to differing interpretations
of the behavior of $X^2$; a wider range obviously makes $X^2$ appear to behave
better than would a stringent one.

A 95% confidence interval is calculated based on the number of
replications: $\theta \pm 1.96\,[\alpha(1 - \alpha)/(\text{number of
replications})]^{1/2}$, where $\theta = \alpha$ for w = 0, and $\theta$ is the
expected power for all other values of the effect size index. A total of 1825
tables were generated for most of the simulations so that the resulting
confidence interval would equal Cochran's limits (.04, .06).
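
The choice of 1825 replications can be verified with a few lines of arithmetic. The following sketch assumes the normal-approximation interval for a proportion, with half-width 1.96 times the square root of alpha(1 - alpha) over the number of replications; the function name is hypothetical.

```python
import math

def half_width(alpha, replications, z=1.96):
    """Half-width of the 95% interval around a nominal rejection rate."""
    return z * math.sqrt(alpha * (1 - alpha) / replications)

hw = half_width(0.05, 1825)
limits = (0.05 - hw, 0.05 + hw)
```

With 1825 replications the half-width is almost exactly .01, which places the interval around a nominal rate of .05 at Cochran's range of .04 to .06.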

Description of the tables

Two factors described in Chapter 1 were used to create the tables of set I,
namely the global index, R, and the number of small cell expectations, n(p).
The four values of R were chosen so that the variance⁴ of $X^2$ would range
from being slightly discrepant from the theoretical variance to being two and a
half times greater. The ratio of the Pearson estimate of the variance to the
theoretical variance was kept the same for both table sizes. The following
table lists the values of R and the corresponding estimates of the variance of
$X^2$ for the two table sizes.

                            Ratio of variance ($X^2$) to variance ($\chi^2$)

                     k      7/6     1.5       2     2.5

R                    4       32      52      82     112
Variance ($X^2$)     4        7       9      12      15
R                   16      366     526     759    1006
Variance ($X^2$)    16       35      45      60      75
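
The tabulated variances can be reproduced from R, k, and n. The sketch below assumes that Pearson's estimate takes the form 2(k - 1) + (R - k^2 - 2k + 2)/n, with n = 10 for the four-cell tables and n = 16 for the sixteen-cell tables; under that assumption the computed values agree with the table after rounding. The function name is hypothetical.

```python
def pearson_variance(R, k, n):
    """Pearson's estimate of the variance of X^2 (assumed form, see lead-in)."""
    return 2 * (k - 1) + (R - k**2 - 2 * k + 2) / n

four_cell = [pearson_variance(R, k=4, n=10) for R in (32, 52, 82, 112)]
sixteen_cell = [pearson_variance(R, k=16, n=16) for R in (366, 526, 759, 1006)]

# Ratio to the variance of the chi-square distribution, 2(k - 1).
ratios = [v / (2 * (4 - 1)) for v in four_cell]
```

The first term, 2(k - 1), is the variance of the chi-square distribution, so the second term is the excess variance attributable to unequal cell probabilities at a given n.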

 

 

 

 

 

 

 

The tables of set II were created to evaluate the effect of changing
marginal probabilities while holding $p_{min}$ constant. Other tables were
generated to deal with specific questions and are described in the appropriate
chapters.

Detailed descriptions of set I and II tables are given in Appendix A. An
alphanumeric code is used to identify these tables. The first symbol is a
letter that represents the number of cells in the table: E, F, and S indicate
that the table consists of 8, 4, and 16 cells respectively. Following the
letter are the values of n(p) and the variance of $X^2$. For example, S435
refers to a sixteen-cell table with four small cells and a Pearson estimate of
the variance of $X^2$ of 35. Because the tables in set II are variations of a
specific table from set I, an additional letter is added to distinguish between
the tables.

⁴ Pearson's estimate of the variance of $X^2$ is:
$2(k - 1) + (R - k^2 - 2k + 2)/n$. I set n = 10 for four-cell tables and n = 16
for sixteen-cell tables. I prefer Pearson's formula to the others since it most
closely matched the observed variance of $X^2$.

Chapter 3

THE FIRST QUESTION

Because the asymptotic distribution of Pearson's $X^2$ is the same across
all tests whatever the underlying sampling model may be, there is a tendency to
generalize the results from one test to all cases. Such generalizations may not
be justified according to Cochran (1952, p. 326):

Is the same $X^2$ test to be used for all cases [i.e., contingency tables
with three underlying sampling distributions: multinomial, product
multinomial, and hypergeometric]? In large samples there is no conflict,
because $X^2$ has the same limiting distribution however the linear
restrictions arise. This is not so in small samples, where the distribution
of $X^2$ differs in the three cases.

A theoretical study of power substantiates this observation. Harkness
and Katz (1964) found that both the hypergeometric and product multinomial
cases of $X^2$ have more power than the multinomial (i.e., the test of
independence). The hypergeometric case's superiority did not hold, though,
when the marginal probabilities were skewed and n = 20.

Few simulation studies have looked at more than one test. Roscoe and
Byers (1971) considered the goodness of fit test and the homogeneity test
(though the latter was referred to as the test of independence) and proposed
sample size guidelines that are different for each test. Camilli and Hopkins
(1978) considered both the homogeneity test and the test of independence and
found the behavior of $X^2$ to be similar for both. However, neither of these
studies used identical tables to assess the tests.

The above evidence suggests that the behavior of $X^2$ may indeed be
different across tests when n is small but this issue has not been studied
systematically. The differences in the behavior of $X^2$ across tests may not
be a serious problem if the differences are small. Therefore the question is:
How variable is the behavior of $X^2$ across tests when n is small?

Part 1. Type I error and power across tests when n = 16

Methodology. The simulation programs and table specifications are
described in Chapter 2. For this chapter, a subset of the set I tables was
used, specifically the sixteen-cell tables with four small cell expectations
(n(p) = 4). The sample size of 16 was determined based on the most liberal
available guidelines, e.g., Koehler and Larntz (1980) for the goodness of fit
test, and Craddock and Flood (1970) and Bradley et al. (1979) for the test of
independence.

Results. The power plots for the test of independence (Figure 3-1, Panel
a) and the homogeneity test (Figure 3-2, Panel a) appear very similar. The
latter does have a smaller rejection rate when the effect size is small while
power is greater for large ES but these differences are generally within the
confidence limits (± .01) or not much larger. These two tests will be compared
more extensively in Chapter 7. For now it suffices to say that the distribution
of $X^2$ is similar for these two tests.

The multinomial and product multinomial cases of the goodness of fit test
(Figures 3-3 and 3-4, Panels a) are also similar to each other. The observed
power distributions are not as much alike as those of the contingency tables,
but these two cases do not have the same expected distribution as they differ
in the degrees of freedom. Therefore, a greater variability is to be expected.
The product multinomial case does show a slightly more liberal trend. These two
tests will be compared in Chapter 5.

Marked differences are to be found between the contingency table tests
and the goodness of ﬁt tests. For the former, the Type I error rate (i.e., when w
= 0) and power are both lower than predicted by the asymptotic distributions.
The goodness of ﬁt tests have a rejection rate that is greater than expected
when the effect size is small while power tends to be overestimated for large ES.
This overestimation, however, is not as dramatic as it is for the contingency table
tests.

The simulations listed in the methodology section were focused on Type I
error. The recommended sample size based on these studies is not sufficiently
large for power to be well approximated by the noncentral χ2, even for the table
that is the least discrepant from the uniform (S435). The estimated power,
however, is low: the maximum is .45 for w = .7. In other words, when the
sample size is only 16, one is not likely to detect even a very large effect size.
Therefore, from a practical point of view, these results are not of much interest. It
may well be that power is well approximated by the noncentral χ2 when n is large
enough to detect a large or a moderate ES. If the asymptotic distributions have
an acceptable fit to the actual distribution of X2 when n is somewhat larger, then
we can ignore the erratic behaviors of X2 noted in this section.


Part 2. Type I error and power across tests for larger n

Methodology. The same tables are used again, although the sample sizes
now correspond to n5 and n8. As defined in Chapter 2, n5 is the sample size that
is large enough to detect a large effect size with a power of .5, while n8 is the
sample size where X2 is expected to detect a moderate ES with a power of .8.
Again, α = .05 and the 95% confidence limits are ± .01 of the theoretical values.
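The definitions of n5 and n8 can be sketched numerically. The code below is only an illustration, not the program described in Chapter 2. It assumes that the noncentrality parameter is n times w squared (Cohen's convention), that "large" and "moderate" effect sizes mean w = .5 and w = .3, and that the table has four cells (df = 3, critical value 7.815). The function names are hypothetical.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-17 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def noncentral_chi2_power(df, lam, crit):
    """P(X2 > crit) when X2 follows a noncentral chi-square(df, lam).

    The noncentral CDF is written as a Poisson(lam / 2) mixture of
    central chi-square CDFs with df + 2j degrees of freedom.
    """
    half = lam / 2.0
    weight = math.exp(-half)          # Poisson weight for j = 0
    cdf = 0.0
    for j in range(200):              # 200 terms is ample for moderate lam
        cdf += weight * reg_lower_gamma(df / 2.0 + j, crit / 2.0)
        weight *= half / (j + 1)
    return 1.0 - cdf

def sample_size_for_power(w, df, target, crit):
    """Smallest n whose approximate power reaches the target (assumed rule)."""
    n = 1
    while noncentral_chi2_power(df, n * w * w, crit) < target:
        n += 1
    return n

CRIT_DF3 = 7.815                      # .05 critical value for df = 3
n5 = sample_size_for_power(0.5, 3, 0.50, CRIT_DF3)
n8 = sample_size_for_power(0.3, 3, 0.80, CRIT_DF3)
print("n5 =", n5, " n8 =", n8)
```

Setting lam to zero in noncentral_chi2_power returns the nominal Type I error rate, which gives a quick check on the series.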

Results. The observed power of the test of independence is still seriously
overestimated by the noncentral χ2 distribution at n5 (Figure 3-1, Panel b). The
actual power of X2 is as low as 50% of the estimated power when the effect size
is large. The fit between the actual and the theoretical distributions is much
better at n8 (Figure 3-1, Panel c). At this sample size Pearson’s X2 has an
observed power that is only .02 to .05 below that of the estimated power for the
table with the most extreme cell probabilities (S475).

The power plots associated with the test of homogeneity follow the same
trends (Figure 3-2, Panels b and c). The discrepancy between the observed
and the theoretical power is actually less although the difference is too small to
be discernible from the plots.

The power distributions of both goodness of fit tests are well
approximated by the noncentral χ2 distribution at n8 (Figures 3-3 and 3-4,
Panels c). At n5 the Type I error rate is somewhat liberal and there is some
overestimation for the larger ES, but both power plots show a reasonable fit to
the noncentral χ2 (Figures 3-3 and 3-4, Panels b).


To return to the question: How variable is the behavior of X2 across tests
when n is small? When n = 16 or n5, the lack of fit between the noncentral χ2
and the actual power distributions of X2 is most marked for the contingency
table tests. The dissimilarity across tests appears to be minor when n = n8.
When the sample size is large enough to detect a moderate effect size with
adequate power, X2 is well approximated by its asymptotic distributions for all
four tests considered.

As the major differences found were between the goodness of fit cases on
the one hand and the two contingency table tests on the other, this study will
focus on the two multinomial cases.


Chapter 4

THE GOODNESS OF FIT TEST UNDER THE MULTINOMIAL SAMPLING
MODEL

The earliest sample size recommendations for the goodness of ﬁt test
were based on the fact that Pearson used the multivariate normal distribution to
approximate the multinomial distribution of the cells (Cochran, 1952). This
approximation is valid only when expectations are large. It therefore became
customary to recommend that all expected frequencies be at least 5 or even 10.
Cochran proposed guidelines for assessing goodness of fit in the case of a
unimodal distribution with only one or two small expectations. These guidelines
were less stringent than those of his predecessors. He suggested that the
minimum cell frequency could be as small as 0.5 when there was only one small
e_i; that the minimum could be 1 when there were two cells with small e_i; and that
all other cells should have frequencies of 5 or more. These guidelines are still
cited although subsequent research, described below, has found them to be
restrictive.

There is as yet no universally accepted set of guidelines, although a
consensus has formed around the following findings (Roscoe and Byars, 1971;
Moore, 1986; Read and Cressie, 1988):

1. X2 has been found to be erratic when there is only one degree of freedom
(Larntz, 1978). Roscoe and Byars recommend the exact binomial test in this
situation.

2. When the expected cell probabilities are uniform, X2 is robust for very small
sample sizes (Wise, 1963). How small n can be is still disputed. Tate and
Hyer (cited in Roscoe and Byars, 1971) suggest that e_i can be as small as 1.
Koehler and Larntz (1980) suggest that the sample size, n, must be greater
than (10k)^(1/2) and no less than 10. The expected frequencies can become as
small as .25 for large tables.

3. The distribution of X2 is not well approximated by χ2 when sample sizes are
small and the expected cell frequencies are extremely different. Given an α
of .05, Roscoe and Byars suggest all e_i ≥ 1 when the departure from the
uniform is moderate. For extreme departures, the minimum e_i should be 2.
Koehler and Larntz (1980) suggest that their formula cited above can still be
applied but the minimum n should be 15 when there is a departure from the
uniform. They warn, however, that the Type I error rate will be inflated if
there are many e_i < 1. Yarnold (1970) argues that when there are too many
small cell expectations a distribution other than χ2 should be used to
approximate the distribution of X2. He provided a lower bound for using X2:
e_min ≥ 5r/k with r = n(e_i < 5). This can be modified in order to calculate a
sample size: n ≥ 5r/(k p_min).

A few simulation studies have looked at the power of X2. Hayman and
Leone (1964), Slakter (1968), and Frosini (1978) showed that the power of X2 is
well approximated by the asymptotic distribution when the cell expectations are
equal, but the approximation can be poor when there are some small e_i. Slakter
recommended reducing the estimated power by 20% to get a better
approximation of the actual power when n is less than 50.

Implications for researchers

Let's use an example to illustrate what happens when one applies the
different guidelines given above. A statistician working for a state department
wants to compare local statistics to the following national statistics for teachers’
level of education.

Level of Education                  Percentage
Less than Bachelor's                    0.9
Bachelor's                             51.3
Master's                               44.9
Master's + 30 graduate credits          2.9

By Cochran’s (1952, 1954) guidelines the sample size should be 112. Using
Roscoe and Byars’s (1971) recommendations for tables with an extreme
departure, n should be 223. The sample size is 15 by Koehler and Larntz’s
(1980) formula, but their caveat about too many small e_i probably applies to this
case. Applying Yarnold’s (1970) guidelines gives an n of 278. When power is
the criterion for choosing the sample size, one finds that n5 is 24 while n8 is 122.

In summary, the various guidelines yield very different sample sizes; all
but Cochran’s are either smaller than n5 or larger than n8! These guidelines
will be compared empirically in the following simulations.
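The guideline-based sample sizes quoted above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: each guideline is read as a minimum expected frequency divided by p_min and rounded up, and Yarnold's r is taken to be 2 (the two small cells), a choice that reproduces the reported 278. Variable names are hypothetical.

```python
import math

# National proportions from the teacher education table.
p = [0.009, 0.513, 0.449, 0.029]
k = len(p)
p_min = min(p)

# Cochran: with two small cells, the smallest expectation may be 1.
n_cochran = math.ceil(1 / p_min)

# Roscoe and Byars, extreme departure from uniform: minimum expectation of 2.
n_roscoe_byars = math.ceil(2 / p_min)

# Koehler and Larntz: n above (10k)^(1/2), with a floor of 15 for
# tables that depart from the uniform.
n_koehler_larntz = max(math.ceil(math.sqrt(10 * k)), 15)

# Yarnold, rearranged for n: n >= 5r / (k * p_min), with r = 2 assumed.
r = 2
n_yarnold = math.ceil(5 * r / (k * p_min))

print(n_cochran, n_roscoe_byars, n_koehler_larntz, n_yarnold)
```

Writing the rules this way makes it plain that three of the four guidelines are driven entirely by the smallest cell probability.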


In summary, previous research suggests that the size of e_min depends on
the size of the table (e_min decreases as k increases) and the number of small e_i
(e_min increases as n(p) increases). X2 apparently becomes unstable when there
are both small and large e_i. The first two factors will be considered in part 1 and
all three will be considered in part 2.

Part 1. Number and size of e_min

Methodology. Tables of dimensions 1 x k were generated, where k was
equal to 4, 8, and 16. The number of cells with small expectations (n(p)) also
varied for a total of 14 different tables (refer to Table 4-1). For each of these 14
tables, simulations were run for various sample sizes. Two minimum sample
sizes, 10 and 16, were used for k = 4. The first minimum sample size is
appropriate when the cell expectations are fairly uniform; the second minimum n
is more appropriate when the cell expectations are skewed (Koehler and Larntz,
1980). The larger tables had the minimum sample size set to 16. The sample
size was increased by increments of 0.5k until the maximum of 5k was reached.

The small cell probabilities of the tables were decreased until one or more
Type I error rates fell outside the range (.036, .064). This range corresponds to
the 95% confidence limits for α = .05 when there are 1000 replications. For each
table, k and n(p) remained fixed.
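The acceptance range and the basic simulation loop can be sketched as follows. This is a minimal illustration rather than the original program: it uses Python's built-in generator instead of RAN2, a normal-approximation interval with z = 1.96, and a hypothetical uniform four-cell table. All function names are assumptions.

```python
import math
import random

def pearson_x2(observed, expected):
    """Pearson's chi-squared statistic for one sample."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def type1_limits(alpha=0.05, reps=1000, z=1.96):
    """Normal-approximation confidence limits for an observed rejection rate."""
    half = z * math.sqrt(alpha * (1 - alpha) / reps)
    return alpha - half, alpha + half

def simulate_type1(probs, n, crit, reps=1000, seed=0):
    """Proportion of multinomial samples drawn under H0 that reject H0."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    rejections = 0
    for _ in range(reps):
        counts = [0] * k
        for cell in rng.choices(range(k), weights=probs, k=n):
            counts[cell] += 1
        if pearson_x2(counts, expected) > crit:
            rejections += 1
    return rejections / reps

low, high = type1_limits()
rate = simulate_type1([0.25, 0.25, 0.25, 0.25], n=16, crit=7.815)
print(round(low, 3), round(high, 3), rate, low <= rate <= high)
```

Repeating the last two lines while shrinking the small cell probabilities mirrors the search described above: a table is dropped once its rejection rate leaves the interval.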

Several suggested minimum e_i are reported in Table 4-1. Yarnold
developed his index for n ≥ 5k. His formula, e_min ≥ 5r/k, was modified so that it
could be used here. I substituted n(p) for his r. A trial index was created by
combining Koehler and Larntz’s formula for n with the modified form of Yarnold’s
index: n_min n(p) / k^2 with n_min = (10k)^(1/2) or 10, whichever is larger.
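The two indices just described can be written as small functions. This is a sketch of the formulas as I read them from the text; the function names are hypothetical and the example calls are illustrative only.

```python
import math

def modified_yarnold(n_small, k):
    """Yarnold's lower bound 5r/k with n(p) substituted for r."""
    return 5 * n_small / k

def trial_index(n_small, k):
    """Trial index n_min * n(p) / k^2, with n_min = max((10k)^(1/2), 10)."""
    n_min = max(math.sqrt(10 * k), 10)
    return n_min * n_small / k ** 2

for k, n_small in [(4, 2), (16, 8)]:
    print(k, n_small, modified_yarnold(n_small, k), round(trial_index(n_small, k), 3))
```

Because n_min grows only with the square root of k while the denominator grows with k squared, the trial index shrinks quickly for larger tables.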

Results. The smallest e_min that had all Type I error rates falling within the
confidence interval are listed in Table 4-1. As predicted by previous research,
the minimum cell frequency, e_min, increases as n(p) increases for any specific k;
e_min is larger when there are fewer small cells in the table. The lower limits
suggested for e_min by previous researchers are, in general, larger than the e_min
observed by this simulation study. The e_min given by the modified version of
Yarnold’s formula is of particular interest since his values follow a similar pattern
to that of the observed e_min. The trial index is closer to the observed e_min than
any of the other guidelines, but it falls below the observed e_min when k = 16;
therefore, it may be too small for larger tables.

Part 2. Large e_i

Methodology. Five sets of tables (for a total of 13 tables) were generated
where the size and number of p_min were held constant while the size and number
of the maximum expected cell probabilities, p_max, were varied. The values of
these cell probabilities are listed in Table 4-2. The ratios of p_max to p_min ranged
from 10.8 to 78.5. All other cell expectations were set to n/k. The sample sizes
were increased by increments of 0.5k. An arbitrarily large n was chosen as the
maximum.

Results. The Type I error rates are plotted against n in Figure 4-1. The
plots show time-series type trends because the sample sizes are accumulative.
The behavior of X2 does not appear to be affected by the value of p_max. The
tables with the largest ratios of p_max to p_min are not that much different from the
tables with less extreme cell expectations. For example, table b in Panel B has
a ratio of 73 but its distribution of Type I error rates is similar to that of table a
with a ratio of 49. However, the other case with a very large ratio, namely table
b in Panel E, does show a more liberal trend. The two tables in this panel have
the largest discrepancy in their respective ratios: 18 for table a and 78 for table
b. For such a very large discrepancy in ratios, the difference between the Type I
error rates is hardly dramatic.

Controlling the number of small cells acted as a constraint on the size of
p_max. After a point, the only way to increase the size of p_max is to increase the
number of small cells. What appears to be true is that tables with the same
number of small cells are similar, irrespective of the size of p_max, at least when k
= 4 or 16. Panel E suggests that different results may be found for larger tables
with many small cells. In these cases, n(p) would be less of a constraint on
the relative size of p_max, resulting in much greater extremes in cell expectations.

The next section will look at the relationship between n(p), e_min, and power
approximations. Will these factors, which were found to influence the Type I
error rate, also influence how well X2’s power is estimated by the noncentral χ2
distribution?

Part 3. Power

Methodology. In Chapter 3, only one series of tables from set I was used
for the comparison across tests. The power distributions of X2 are presented
here for all of the set I tables. The sample sizes correspond to n5 and n8.
There are five different values of n(p) for k = 16: 1, 4, 8, 12, and 15. The four-cell
tables have three possible values for n(p): 1, 2, and 3. At least one table in
each series is expected to have Type I error rates close to α for very small
sample sizes.

Results. The power plots for the sixteen-cell tables are presented in
Figure 4-2. From Chapter 3 one would expect that the noncentral χ2 would be a
good approximation of actual power at n8 and less so at n5. The results provide
a few surprises. In Panel f (k = 4, n(p) = 3) the fit is fairly good, as expected for n8.
On the other hand, the four-cell tables with fewer small cells (Panels b and d)
show a poorer fit for the same sample size.

Among the sixteen-cell tables, it is the table with only one small cell that
shows a poor fit at n8 (Panel h). These results suggest that it may be the size
of e_min, independent of the number of small cells, that affects the fit of the
noncentral χ2 to the observed power distribution of X2. But this hypothesis is
contradicted by the results for n5. Panels i and k (n(p) = 4 and 8 respectively)
show a good fit although these tables have smaller cell probabilities than Panels
m and o (n(p) = 12 and 15 respectively).

To further complicate matters, Panel c indicates that another factor is
involved. There are three jumps in power: F212 jumps at w = .3, F209 jumps at
w = .4, F201 jumps at w = .5. These jumps correspond to a change in the
pattern of cell probabilities in H1. The lower power corresponds to an H1 where
there is a trade-off between like cells, e.g., one small e_i
decreases by the amount that the other small e_i increases. A jump in power
corresponds to an H1 where a cell with a large expectation decreases while the
other three cells increase. This latter pattern was used consistently for n(p) = 1
and 3. It thus appears that actual power was maximized (inadvertently!) by the
H1 used in these two series of simulations.

The above observation led to a question: Is degenerate power
associated with an H1 which posits that some small cell probabilities become
even smaller? The tables with n(p) = 12 and 15 have a large number (6 to 12) of
cells that are posited to have probabilities smaller than e_min under the null. This
may explain in part the discrepancy seen between observed and expected
power. This issue is explored in the next section.

Part 4. Power for two different H1

Methodology. Two sets of alternative hypotheses were created. One had
a positive pattern, meaning that all cells with small probabilities were larger
under H1. Under the negative pattern, at least two-thirds of the small cells were
set to .001. Simulations were run for four-cell tables and one sixteen-cell table,
S1275. The effect size was set to w = .3 (moderate). The 95% confidence
interval for the power distribution is ± 1%.
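Because the two patterns are compared at a fixed effect size, it may help to see how w is computed from a pair of hypotheses. The sketch below assumes that w is Cohen's index, the square root of the sum over cells of (p1 - p0)^2 / p0, and that the noncentrality parameter is n times w squared. The alternative shown is a made-up negative-pattern example, not one of the tables in this study.

```python
import math

def effect_size_w(p_null, p_alt):
    """Cohen's w for a goodness of fit alternative (assumed definition)."""
    return math.sqrt(sum((a - b) ** 2 / b for a, b in zip(p_alt, p_null)))

def noncentrality(n, w):
    """Noncentrality parameter of the asymptotic power distribution."""
    return n * w ** 2

# Hypothetical table with k = 4 and n(p) = 3.
p_null = [0.01, 0.01, 0.01, 0.97]
# Hypothetical negative pattern: two of the three small cells set to .001.
p_negative = [0.001, 0.001, 0.03, 0.968]

w = effect_size_w(p_null, p_negative)
print(round(w, 3), round(noncentrality(100, w), 2))
```

Two alternatives can share the same w and still place their departures in different cells, which is the situation the two patterns are designed to contrast.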

Results. In Figure 4-3, the observed rejection rates are plotted against
the sample size. Several of the observed power functions in Panels a and b do
not increase smoothly as the sample size increases. In these extreme cases,
the possible values for X2 are restricted and the distribution for X2 is a step
function. The rejection rate decreases sharply when a specific set of observed
values yields an X2 that falls just below the critical value. For example, for the
case k = 4, n(p) = 3, p_min = .01, under the negative pattern, the set of observed
values (0, 0, 1, 9) occurs fairly frequently when n = 10. Its X2 is larger than the
critical value: 8.35 > χ2_.05 = 7.815. At n = 12 the similar set (0, 0, 1, 11) is no
longer significant: 6.73 < χ2_.05. This results in the drop observed in the power
function. In Panel b the plot appears to smooth out near n = 100 for n(p) = 1
(where e_min = 4) and at n = 120 for n(p) = 3 (where e_min = 4.8).
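The two X2 values in this example can be checked directly. The sketch assumes that the fourth cell carries the remaining null probability, .97; with that assumption the computation reproduces the 8.35 and 6.73 quoted above.

```python
def pearson_x2(observed, probs):
    """Pearson's X2 against null cell probabilities."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

null_probs = [0.01, 0.01, 0.01, 0.97]   # k = 4, n(p) = 3, p_min = .01
CRIT = 7.815                            # .05 critical value for df = 3

results = {}
for counts in [(0, 0, 1, 9), (0, 0, 1, 11)]:
    x2 = pearson_x2(counts, null_probs)
    results[counts] = x2
    print(counts, round(x2, 2), "reject" if x2 > CRIT else "retain")
```

Nearly all of the statistic comes from the single observation in a cell whose expectation is only .10 or .12, so a small change in n moves that term across the critical value.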

The power plots are comparatively smooth for the large table (Panel c),
even though two-thirds of the cells have very small expectations: n(p) = 12, p_min
= .012. The two extreme H1s show that very different power plots can be created
for the same table. At the maximum sample size e_min is 2.8.

In none of the plots do the two H1s converge. In Panel a, there is a
difference of 13% in the rejection rate between the two alternate hypotheses at n
= 200 (e_min = .2). In Panel b, the disparity in the rejection rates between the two
hypotheses is 6% when e_min = 8. In Panel c, at the maximum sample size,
e_min = 2.8 and the disparity in the rejection rates is nearly .10. The observed
power plots are all outside the confidence interval of the asymptotic distribution,
even when all cell expectations are greater than five (Panel b).


Discussion

Type I error was found to be sensitive to several factors: the size of the
minimum expectations, the number of small expectations, and the size of the
table. Power was found to be sensitive to an additional factor, namely the
pattern of differences posited by the alternative hypothesis. Power plots where
the small cells were larger under H1 were quite different from those where a
majority of the small cells were smaller.

The approximation of X2’s distribution by χ2 does appear to be satisfactory
for sample sizes smaller than those generally recommended. However, under
the same conditions the power distributions of X2 are not well approximated by
the noncentral χ2. As suggested by Figure 4-3 Panel b, power can be
underestimated by the noncentral χ2 even when the sample size is larger than
that recommended by any of the present guidelines. Admittedly, the observed
power is not greatly overestimated and the case used is extreme.

Any recommendations based on this limited number of cases would be
premature. Further work controlling all four known factors is needed in order to
develop reliable guidelines.


Application

This section is meant to illustrate how to apply the simulation results to a
hypothetical example. A simulation was run to test the predictions made.5

An example was described previously in the section “Implications for
researchers.” The four-cell table had two small cells. These cells represent the
extremes on the spectrum of educational level. If local teachers are higher than
the national average at one end of the educational spectrum, they are likely to
be lower than the national average at the other. In other words, it’s unlikely that
a state having a higher percentage of teachers with advanced graduate degrees
would also have more teachers who have not attained a bachelor's degree.
Therefore, the alternate hypothesis is not likely to be an extreme case where
both small cells are smaller than under H0.

From the simulations in part 1, we can expect the Type I error rate to be
acceptable as long as e_min ≥ .96. (Refer to Table 4-1, k = 4, n(p) = 2.) Given
that p_min is .009 for this example, n should therefore be at least 107. The results
of part 3 suggest that power is likely to be somewhat less than predicted by the
noncentral χ2 even when n = 122 (n8). (Refer to Figure 4-2 Panel d, case F215.)
The actual power for this specific case was .03 less at n = n8 for the H1 which

 

5 The data presented in all four application sections are made up. The
confirmatory simulation runs used data generated by Numerical Recipes’ RAN2
(Press et al., 1992). This program uses a L’Ecuyer generator with a
Bays-Durham shuffle.


posited that local teachers would have higher educational levels than the
national average. Contrary to expectations, the other H1 tested showed more
power (+.02) than predicted by the asymptotic power distribution. The second
H1 posited that local teachers are less well educated than their national peers:
p_min became larger under H1.

The predictions based on the previous simulations were therefore not
entirely misleading, although the power trend for one of the alternative
hypotheses was the opposite of what was expected. Power cannot yet be accurately
predicted by the results of this simulation study.


Chapter 5

THE GOODNESS OF FIT TEST UNDER THE PRODUCT MULTINOMIAL
SAMPLING MODEL

It may be best to explain the product multinomial case of the goodness of
ﬁt test by contrasting it with the usual multinomial case. In the example used in
the previous chapter, we were interested in teachers’ level of education. Let's
say that it is known that teachers’ level of education is not homogeneous across
all groups, speciﬁcally that high school teachers are more likely than any other
group to have a graduate degree. If our sample has a higher percentage of high
school teachers than in the national sample, this bias may cause us to
erroneously reject the null hypothesis. One option for controlling this bias is to
sample from each group and test against the expected proportions for each
separate group. This, then, is the product multinomial version of the goodness
of fit test.

The research question remains the same as for the multinomial case: Are
local teachers comparable in level of education to the nation as a whole? The
number of degrees of freedom, however, differs. For I groups and J categories,
the correct degrees of freedom is I(J - 1) or k - I. Otherwise the goodness of fit
test is carried out in the usual manner.
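The difference in degrees of freedom between the two sampling models is easy to tabulate. The sketch below uses hypothetical function names; the example of four groups by four categories matches the sixteen-cell tables, for which df = 12 and df = 15 are used later in this chapter.

```python
def df_multinomial(k):
    """Goodness of fit, single multinomial sample with k cells."""
    return k - 1

def df_product_multinomial(groups, categories):
    """Goodness of fit with a separate sample for each group: I(J - 1)."""
    return groups * (categories - 1)

groups, categories = 4, 4
k = groups * categories
print(df_multinomial(k), df_product_multinomial(groups, categories))
```

One degree of freedom is lost for each group because every group total is fixed by design, which is why I(J - 1) equals k - I.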

I have found no empirical studies for this version of the goodness of fit
test. In Chapter 3, it was seen that the product multinomial case followed the
same trends as its multinomial analog. In part 1, the extent of this similarity is
evaluated by comparing the simulation results for the two tests. In part 2, the
impact of varying the size of the samples is considered.

Part 1. Comparison to the multinomial case

Methodology. Set I tables with n(p) = 12 were used. These were chosen
because the ﬁt of the observed power distribution to the asymptotic was found to
be poor. The differences in ﬁt for the two sampling models had to be evaluated
indirectly because of the discrepancy in the degrees of freedom: [Observed
power (product multinomial case) - predicted power (df = 12)] - [Observed
power (multinomial case) - predicted power (df = 15)].

Results. The differences in ﬁt are plotted in Figure 5-1. At n = n5 the
differences in ﬁt are nearly all negative (Panel a). For the small effect sizes,
where power is slightly underestimated for both sampling models, the negative
differences mean that the multinomial case has a stronger liberal trend than the
product multinomial case. The interpretation is different when the effect sizes
are large. Power is overestimated in both cases, but more so for the product
multinomial. These differences, however, are small with the largest (in absolute
terms) being -.023.

At n = n8 the differences in ﬁt are random - the product multinomial case
does not show a consistent bias. The differences, again, are generally small.
The two cases can therefore be considered as equivalent, at least when the
group samples are all equal in size. This simulation is replicated in the next part
with tables where the groups are not equal in size.

Part 2. Varying the size of the samples

Methodology. Set II tables are used, along with their set I counterparts,
namely S860 and S875. These are the tables where the minimum expected cell
frequencies are held constant while the marginal probabilities are varied. Since
it was found in the previous chapter that the patterns of differences under H1
affected power, this factor was controlled as much as possible. Speciﬁcally, I
attempted to set the smallest frequencies equal across all tables for a given w.
The table speciﬁcations can be found in Appendix A.

Results. Figure 5-2 presents the power plots. For both series, the best fit
to the asymptotic power distribution occurs when the samples are equal (S860,
S875). What is striking is the fact that both the 860 and 875 series have similar
plots even though the minimum cell frequencies are smaller for the latter. The
875 series has only slightly less power (approximately -.02) than the 860 series
when the effect size is large and n = n5. Both are reasonably well approximated
by the noncentral χ2 when n = n8.

The discrepancies seen in the power distributions at n = n5 (Panels a and
c) cannot be explained by the factors that have been considered previously. Both
e_min and H1 patterns can be ruled out since these were held constant. Although the
number of small cells does vary somewhat, discrepancies are seen between tables
with the exact same n(p). For example, table c’s observed power at w = .5 is
.27 more than that of table a even though they both have n(p) = 6. Two other
possible factors are marginal totals and the distribution of e_i within the rows.


Let’s ﬁrst consider marginal totals as a possible factor. There are two
pairs of tables with the same ﬁxed row totals (1: a and b; 2: c and d). Tables c
and d do have similar Type I error rates and observed power distributions. The
same cannot be said for tables a and b. They show a .17 disparity in power at w
= .5. This ﬁnding seems to rule out marginal totals as a factor affecting the
power of X2.

The possibility that the distribution of e_i within each sample is the
explanatory factor cannot be evaluated with the sample sizes used in this
section. At n = 40, all of the e_min are below the minimum observed values found
in Chapter 4 while they are all larger than the minimum values at n = 196. Other
sample sizes are considered in the next simulation.

Part 3. Distribution of e_i within samples

Methodology. The same tables are used as in part 2. Fewer effect sizes
were considered, namely w = .3 to .8. One sample size was chosen so that
tables a and c would have e_min larger than the minimum observed value for e_min
(as reported in Table 4-1) while tables b and d, with three small cells, would have
an e_min below the minimum observed value. This sample size is 96 for the 860
series and 128 for the 875 series. A second sample size was chosen near the
minimum observed value for tables b and d.

Results. The power plots are presented in Figure 5-3. The distribution
for S875d shows markedly less power. It is a case where n(p) = 3; therefore it
and, to a lesser extent, S860d appear to confirm the expectation that power plots
associated with tables having three small cells per group would have less power
than the plots for tables with n(p) = 2 in each row. However, the other two tables
with n(p) = 3, namely S860b and S875b, do not support this hypothesis. Their
power plots are not consistently worse than those of other tables for the smaller
sample size. Therefore, the number of small cells within each group does not
appear to explain the discrepancies in the observed power distributions noted in
part 2.

Discussion

When all samples are equal in size, the power distributions for the
product multinomial case of the goodness of ﬁt test are comparable to those for
the multinomial case. When sample sizes are not equal, the ﬁt of the observed
power distributions to the asymptotic is not as good although this does not
necessarily translate as loss of power. In the two series of tables with e_min held
constant, three of the four tables with unequal samples had more power than the
tables with equal sample sizes. I was not able to isolate the specific factor or,
more likely, the combination of factors that could explain the discrepancies of the
observed power from the asymptotic power distribution.

Application

The application problem will follow up on the example used at the
beginning of this chapter. Let's say that the national survey of teachers’ level of
education yielded the following results when broken down into four groups. The
total sample size is 13,060.

                   < Bachelor's   Bachelor's   Master's   Master's + 30   Total
Primary
  N                     64           2925        1577            4         4571
  % of group             1.4           64.0        34.5          0.09
  % of all               0.49          22.4        12.1          0.03
Upper Primary
  N                     26           1698        1528           13         3265
  % of group             0.8           52.0        46.8          0.4
  % of all               0.20          13.0        11.7          0.1
Junior High
  N                     13            654         706           65         1437
  % of group             0.9           45.5        49.1          4.5
  % of all               0.1            5.0         5.4          0.5
High School
  N                     11           1428        2049          299         3787
  % of group             0.3           37.7        54.1          7.9
  % of all               0.08          10.9        15.7          2.3

From Table 4-1, we can expect that the Type I error rate will be
acceptable if e_min is at least .44 (k = 16, n(p) = 8). As p_min is .0003, n should be
at least 1437. The Type I error rate will be liberal for smaller sample sizes. The
simulation results showed that X2’s power tends to be close to the power
approximation. (Refer to Figure 4-2 Panel l.) However, the application table has
cell expectations much smaller than any of the simulation tables; therefore,
power can be expected to be less.
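The minimum sample size in both application sections follows from dividing the observed e_min threshold by the smallest cell probability. The sketch below is an illustration with a hypothetical function name. It takes p_min from the smallest cell count in the table above (4 of 13,060) and reads the garbled High School entry as 11, which is an assumption.

```python
import math

def min_sample_size(e_min, p_min):
    """Smallest n for which n * p_min reaches the required e_min."""
    return math.ceil(e_min / p_min)

# Cell counts from the national survey table (made-up data, see footnote 5).
counts = {
    "Primary":       [64, 2925, 1577, 4],
    "Upper Primary": [26, 1698, 1528, 13],
    "Junior High":   [13, 654, 706, 65],
    "High School":   [11, 1428, 2049, 299],
}
total = sum(sum(row) for row in counts.values())
p_min = min(min(row) for row in counts.values()) / total

n_chapter5 = min_sample_size(0.44, p_min)    # k = 16, n(p) = 8
n_chapter4 = min_sample_size(0.96, 0.009)    # k = 4, n(p) = 2
print(total, round(p_min, 4), n_chapter5, n_chapter4)
```

Using the unrounded p_min matters here: with the rounded value .0003 the quotient would come out near 1467 rather than 1437.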

The results from the confirmatory simulation run are presented in
Figure 5-4. The group sizes are all equal. The four sample sizes considered
correspond to expected powers of .80, .90, .95, and .99. The Type I error rates
are all liberal, as predicted above. Observed power is considerably less than
that of the noncentral χ2 approximation for two of the alternative hypotheses.
The difference is more marked for the “Shift down” case where smaller cell
frequencies were predicted for the Master's + 30 level. This result runs counter
to the Chapter 4 application result where the “Shift down” H1 showed more
power! The hypothesis which posited no change for the small cells (“No
extremes”) had observed power close to the nominal values.

In summary, the predicted trends were correct for both Type I error and
power under the two hypotheses predicting differences for the small cells.
Power, however, was much lower than I expected.

Chapter 6

THE TEST OF INDEPENDENCE

The test of independence differs from the goodness of fit test in that the
expected cell probabilities are not predetermined but are calculated based on
the marginal probabilities: e_ij = n p_i. p_.j. These expectations cannot be known
precisely before collecting the data; therefore, determining sample size will be a
process of guess-estimating. Some have suggested a multi-stage sampling
procedure when there is very little information about the possible values of the
marginal probabilities (e.g., Horn, 1977).
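The expected frequencies for the test of independence can be sketched in two forms: from guessed marginal probabilities at the planning stage, and from the observed table once data are in hand. Function names and the example marginals are hypothetical.

```python
def expected_from_marginals(n, row_probs, col_probs):
    """e_ij = n * p_i. * p_.j for assumed marginal probabilities."""
    return [[n * pr * pc for pc in col_probs] for pr in row_probs]

def expected_from_table(observed):
    """e_ij = (row total * column total) / n, estimated from the data."""
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]

# Planning stage: n = 20 with highly skewed marginals of .1 and .9.
planned = expected_from_marginals(20, [0.1, 0.9], [0.1, 0.9])
print([[round(e, 2) for e in row] for row in planned])

# After data collection (made-up 2x2 counts).
print(expected_from_table([[1, 2], [3, 14]]))
```

With both sets of marginals skewed, the smallest expectation falls to a fraction of one observation even though the total sample exceeds the number of cells.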

Simulation studies (Camilli and Hopkins, 1978; Craddock and Flood,
1970; Bradley et al., 1979) have consistently found that X2 is robust as long as
the marginal probabilities are not extremely skewed. For tables varying in size
from 2x3 to 5x5 and with nearly equal expected frequencies, Craddock and
Flood found that the χ2 approximation of X2 is accurate at the 90th, 95th and 98th
percentiles for n as small as k. In their extensive simulation study, Bradley et al.
found that Type I error rates will not exceed .06 unless both sets of marginal
probabilities are extremely skewed. If one set of marginal probabilities is highly
skewed while the other is nearly uniform, the Type I error rates are conservative.
This conservative bias, as remarked by Bradley, appears to be tolerable to many
researchers even though power may be adversely affected. Koehler (1986) and
Agresti and Yang (as cited in Agresti, 1990) considered much larger tables. For
10x10 and 20x20 tables, e_ij can be as small as 0.5 when all the expected
frequencies are equal. When both sets of marginal probabilities are highly
skewed, Koehler found the χ2 approximation to be poor for large, sparse tables.
Agresti and Yang, on the other hand, found that the chi-square approximation is
adequate given a large table (100 cells) and n = k for marginal probabilities as
small as .05. Their tables were not as skewed as those in Koehler’s study.

An empirical study on the power of Pearson’s chi-squared test of
independence for 2x2 tables was carried out by Bradley and Seely (1977). They
found errors of approximation when n is small. These errors are most serious
when a small n is combined with highly skewed marginal probabilities. For
example, given n = 20 and marginal probabilities of .1 and .9, the actual power is
.8 whereas the power based on the noncentral χ2 distribution is greater than .95.

In an earlier study Harkness and Katz (1964) compared power estimated
by normal approximation methods developed by Patnaik and Sillitto with an
exact test, the uniformly most powerful unbiased size α test (UMPUT), for three
types of contingency tables. The power of all three tests was overestimated by
the normal approximations though the discrepancies were not large. Only 2x2
tables and n ≤ 30 were considered.

In summary, the simulation studies focusing on Type I error suggest that
X2 is robust when n is small unless the marginal probabilities are highly skewed.
On the other hand, power simulations (i.e., Bradley and Seely, 1977) found that
the noncentral χ2 approximation is more sensitive to these factors, at least for
2x2 tables. The initial results presented in Chapter 3 bear this out: Power was
found to be seriously overestimated for the generally recommended sample size,
n = k or 16, and even for the larger sample size of 40 (n5).

Implications for researchers
Many different guidelines for sample size have been proposed.
Cochran’s (1952, 1954) guidelines are still frequently cited in textbooks. He
suggested that at least 80% of cells should have e_i ≥ 5 while the remaining cells
can have expected values as small as 1. As stringent as Cochran’s guidelines
are, there are researchers who have recommended even larger sample sizes.
Hays (cited in Bradley et al., 1979) recommended that all e_i ≥ 10 when df=1 and
a minimum of 5 for larger tables. Tate and Hyer (cited in Bradley et al., 1979)
argued for a minimum e_i of 20. Bradley et al. considered these
recommendations as prohibitive and remarked that “traditional rules of thumb
based on minimum expected frequency, without regard to the marginal
distributions, do not provide selective protection against errors of approximation
where such protection is needed most” (p. 1295).

Roscoe and Byars (1971)⁶ suggested the following guidelines,
given α = .05: n ≥ 2k when the marginal probabilities are uniform; n ≥ 4k when
the probabilities are moderately skewed; n ≥ 6k for tables with extremely skewed
marginals.

⁶ This study is cited frequently in the literature related to the test of
independence although the actual sampling distribution used is the product
multinomial. As the two sampling distributions were found to give similar results
in Chapter 3, Roscoe and Byars’s guidelines are included in this section.

A more recent set of guidelines based on simulation studies was offered
by Wickens (1989, p. 30):

1. For tests with 1 degree of freedom, all the μ_ij [cell expectations] should
exceed 2 or 3.

2. With more degrees of freedom, μ_ij ≈ 1 in a few cells is tolerable.

3. In large tables up to 20% of the cells can have μ_ij appreciably less
than 1.

4. The total sample should be at least 4 or 5 times the number of cells.

5. Samples should be appreciably larger when the marginal categories
are not equally likely.

The main drawback to these guidelines is the vagueness of some of the
terminology. When should one consider the marginal probabilities to be
extremely rather than moderately skewed? How much is “substantially more”?
Obviously, these different guidelines lead to different sample sizes. To
illustrate how different the sample sizes can be, ns are calculated for a few
tables that will be used in the simulations.

 

Table    p_min   Cochran,   Tate &     Roscoe &   Wickens    Power   Power   Pn
                 Hays       Hyer       Byars                 n5      n8
                 e_min=5    e_min=20   n=6k       e_min>1
S475     .0047   1064       4255       96         >213       40      176     241
S475b    .0047   1064       4255       96         >213       40      176      86
S875     .0085    589       2353       96         >118       40      176     153
S875b    .0085    589       2353       96         >118       40      176      75
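
The guideline columns of the table follow from simple arithmetic on p_min and k. The sketch below is only an illustration of that arithmetic: the function name and dictionary keys are hypothetical, and rounding up is an assumption, so results may differ from the tabled values by one unit.

```python
import math

def guideline_sample_sizes(p_min, k):
    """Sample sizes implied by each guideline for a table with k cells.

    p_min is the smallest cell probability. Rounding up is an assumption;
    the tabled values appear to use slightly different rounding.
    """
    return {
        # n such that the smallest expected frequency is at least 5
        "cochran_hays_e_min_5": math.ceil(5 / p_min),
        # n such that the smallest expected frequency is at least 20
        "tate_hyer_e_min_20": math.ceil(20 / p_min),
        # Roscoe and Byars: n >= 6k for extremely skewed marginals
        "roscoe_byars_6k": 6 * k,
        # n must exceed this for the smallest expected frequency to exceed 1
        "wickens_e_min_1": math.ceil(1 / p_min),
    }

if __name__ == "__main__":
    for name, p_min in [("S475", 0.0047), ("S875", 0.0085)]:
        print(name, guideline_sample_sizes(p_min, k=16))
```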

 

Pn in the last column refers to an index that I wish to introduce here. When n is
small, it is possible to end up with a marginal total of zero especially if the
marginal probabilities are skewed. When that happens the expectations for that
row’s (or column’s) cells are zero and it then becomes impossible to calculate X2
for all cells. The probability of getting a marginal total of zero for a specific
sample size can be calculated using $\sum_i (1 - p_{i\cdot})^n + \sum_j (1 - p_{\cdot j})^n$. This estimate is
accurate for small probabilities (i.e., less than .05). Pn is the sample size where
the probability of getting a marginal total of zero is .01. This index will be
considered along with the other factors, namely e_min and R, in the following
simulation. If any of these indices are useful in predicting when Type I error is
close to α, we would then have a quantitative index that can be helpful in
determining sample size.
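
The calculation of Pn can be sketched as follows. This is a minimal illustration, assuming Pn is taken as the smallest n at which the summed probability falls to .01 or below; the function names are hypothetical. With the S475b marginals listed in Table A-1 it gives 86, which agrees with the table above.

```python
def prob_zero_marginal(n, row_probs, col_probs):
    """Approximate probability that at least one marginal total is zero.

    Uses the sum of (1 - p)^n over all row and column marginal
    probabilities, which is accurate only when the result is small.
    """
    return sum((1.0 - p) ** n for p in list(row_probs) + list(col_probs))

def smallest_n_for_target(row_probs, col_probs, target=0.01, n_max=100000):
    """Smallest n whose approximate zero-marginal probability is <= target.

    Treating Pn as this smallest n is an assumption of the sketch.
    """
    for n in range(1, n_max + 1):
        if prob_zero_marginal(n, row_probs, col_probs) <= target:
            return n
    raise ValueError("target probability not reached within n_max")

if __name__ == "__main__":
    # Marginal probabilities of table S475b as listed in Table A-1.
    rows = [0.063, 0.063, 0.436, 0.438]
    cols = [0.076, 0.076, 0.424, 0.424]
    print(smallest_n_for_target(rows, cols))
```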

Part 1. Type I error rate

Methodology. Set II tables were used where p_min was held constant within
each series of tables while the marginal probabilities were manipulated. These
tables are described in Chapter 2 and Appendix A. Sample sizes ranged from
16 to 1000. The 95% confidence interval for the Type I error rate is .04 to .06.
Whenever a generated table did have a marginal total of 0, it was treated as a
failure to reject H0.
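
The procedure can be sketched as a small Monte Carlo routine. This is not the program used in this study; it is a minimal illustration that assumes multinomial sampling under independence, a 4x4 table (9 degrees of freedom, α = .05), and hypothetical function names. Tables with a zero marginal total are counted as failures to reject, as described above.

```python
import random

# Upper 5% point of the chi-square distribution with 9 df (4x4 table).
CRITICAL_VALUE_DF9 = 16.919

def pearson_x2(table):
    """Pearson's X2 for a two-way table; None if any marginal total is 0."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return None
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2

def type1_error_rate(row_probs, col_probs, n, replications, seed=0):
    """Proportion of simulated null tables whose X2 exceeds the critical value."""
    rng = random.Random(seed)
    n_rows, n_cols = len(row_probs), len(col_probs)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    # Under independence each cell probability is the product of its marginals.
    weights = [row_probs[i] * col_probs[j] for i, j in cells]
    rejections = 0
    for _ in range(replications):
        table = [[0] * n_cols for _ in range(n_rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            table[i][j] += 1
        x2 = pearson_x2(table)
        # A zero marginal total (x2 is None) counts as a failure to reject.
        if x2 is not None and x2 > CRITICAL_VALUE_DF9:
            rejections += 1
    return rejections / replications

if __name__ == "__main__":
    print(type1_error_rate([0.25] * 4, [0.019, 0.25, 0.25, 0.481], n=40,
                           replications=2000))
```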

Results. The Type I error rates are plotted in Figure 6-1. The error rates
substantiate Bradley et al.’s (1979) conclusion: When both sets of marginal
probabilities are extremely skewed the error rates are higher than the nominal α;
otherwise X2 tends to be conservative. Apparently both sets of marginals need
to have at least one probability less than .1 for the Type I error to become liberal
(i.e., larger than expected).

For some of the tables, n must be quite large before Type I error falls
within the confidence interval (notably S875b). If one sets wider tolerance limits,
as did several of the researchers cited above, then these results do substantiate
their conclusion that X2 is fairly well approximated by χ2 for the test of
independence, even when the marginal probabilities are extremely skewed. The
majority of tables have error rates that are within (.03, .07) for n ≥ 32. There are
exceptions, the more notable being S475, S860, S860c, and S875.

Neither p_min nor R appears to be useful for predicting how close the Type I
error rate will be to the nominal α. If p_min (or, alternatively, e_min) were the
determining factor, then the error rates would be similar within each series.
However, this is not the case. For example, S475a falls within the tolerance
limits at n = 40, e_min = .188 while this doesn’t happen for S475 until n = 136 and
e_min = .64. There would also be noticeable differences across series. The 875
series should be worse than the 860 series (p_min = .0085 versus .0115 for the
860 series). The same argument can be made against R. The tables with the
largest values are not necessarily the worst. By this criterion, all of the 475
tables should have poorer fit than the 875 tables (excepting S875 itself).
Though the lowest R values (S875e, b, c and S860a) do tend to have good fits,
this is not consistently true (S860).

The index based on the marginal totals, Pn, does show some usefulness
in controlling Type I error. Sample sizes that are greater than Pn have error
rates well within the tolerance limits.

Part 2. Power

Methodology. For comparative purposes, set I tables are presented here
along with two tables from set II, namely S860b and S875b. These latter tables
have Type I error rates that are higher than expected. Two sample sizes are
considered for these sixteen-cell tables: n = 40 (n5) and 176 (n8).

Results. The power plots are presented in Figure 6-2. Power is well
approximated by the noncentral χ2 at n = n8. However, this is not the case when
n = 40. For these tables with skewed marginals, power is fairly consistently
overestimated by the noncentral chi-square distribution. This is true even for the
tables associated with a liberal Type I error rate (Panel e, S860b and S875b).
The observed power distributions for these tables are also overestimated in the
range of interest, namely w = .5 to .7.
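
The nominal power against which the observed power is compared can be sketched as below. This is an illustration only: it assumes the usual noncentrality parameter λ = n·w² (not restated in this section), a hard-coded critical value for 9 degrees of freedom at α = .05, and hypothetical function names. The noncentral χ2 distribution function is computed as a Poisson mixture of central χ2 distribution functions.

```python
import math

def reg_lower_gamma(a, x, tol=1e-12):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0
    total = 1.0
    k = 0
    while term > tol * total:
        k += 1
        term *= x / (a + k)
        total += term
    log_prefactor = -x + a * math.log(x) - math.lgamma(a + 1.0)
    return min(1.0, total * math.exp(log_prefactor))

def chi2_cdf(x, df):
    """Central chi-square distribution function."""
    return reg_lower_gamma(df / 2.0, x / 2.0)

def noncentral_chi2_power(df, noncentrality, critical_value):
    """P(noncentral chi-square > critical_value), as a Poisson mixture."""
    mu = noncentrality / 2.0
    if mu == 0:
        return 1.0 - chi2_cdf(critical_value, df)
    cdf = 0.0
    j = 0
    while True:
        weight = math.exp(-mu + j * math.log(mu) - math.lgamma(j + 1.0))
        cdf += weight * chi2_cdf(critical_value, df + 2 * j)
        if j > mu and weight < 1e-12:
            break
        j += 1
    return 1.0 - cdf

if __name__ == "__main__":
    critical_value = 16.919  # chi-square, 9 df, alpha = .05
    for n in (40, 176):
        for w in (0.3, 0.5, 0.7):
            # Assumption: noncentrality parameter is n * w**2.
            print(n, w, round(noncentral_chi2_power(9, n * w ** 2, critical_value), 3))
```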

Four-cell tables with extremely skewed marginal probabilities have a
particularity in that they have a restricted range for the effect size. If one column
(or row) total is small relative to the other, there is an upper limit to the size of
ES. In these trials, the largest effect size is w = .4. Power is overestimated for
small n, but well approximated by the noncentral χ2 at n = n8.

As was seen in Chapter 3, the overestimation of power is much greater for
the test of independence than for the goodness of fit test. Given k = 16, when
the number of small expectations was not very large (n(p) ≤ 8), the fit of the
observed power distribution by the noncentral χ2 was good for the latter test.
For the test of independence, the observed power can be as little as half of that
predicted by the noncentral χ2. Another difference between the two tests is that
the number of small cells does not seem to be a factor affecting power for the
test of independence. The power plots are fairly similar across n(p) (i.e.,
compare Panels c, e, and g).


In summary, the power plots of S860b and S875b eliminate p_min/e_min and R
as determining factors. If the first case were true, these plots would be similar to
those of their respective set I counterparts, S860 and S875. The observed
power for the former tables was greater for all effect sizes. If R were the
determining factor, then their power plots would have shown less power than
that of S860. However, this expectation is contradicted by the results.

In part 1, it was found that when n ≥ Pn, Type I error was within the
tolerance limits. Can the same be said for power? This question is the
motivation for the next simulation.

Part 3. Pn and asymptotic ﬁt

Methodology. The same set of sixteen-cell tables used in Part 2 is used
here. The sample size was set to Pn rounded up to the nearest multiple of .5k.

Results. The power plots are presented in Figure 6-3. The fit of the
observed power distribution to the noncentral χ2 is not ideal for all values of w. It
seems worse when power is in the middle ranges. The differences between the
observed and nominal powers are plotted against the nominal values in Figure
6-4. The relationship is parabolic for power estimates between .05 and .80. The
maximum difference in fit is .09, corresponding to a 9% decrease in the rejection
rate.

DISCUSSION

The above simulations confirm previous research: The chi-squared test
of independence is quite robust as far as Type I error is concerned - as long as
one accepts tolerance limits that are somewhat wider than the confidence
interval. However, when marginal distributions are skewed and n is small, power
can be seriously overestimated by the noncentral χ2.

Most of the available guidelines for determining sample size recommend
sample sizes that are much larger than needed. It was also found that the
distribution of marginal probabilities is a better indicator of the Pearson statistic’s
fit to its asymptotic distribution than e_min.

One practical issue not raised in the literature on the test of
independence is that small sample sizes may result in marginal totals of 0. A
researcher can avoid this problem by calculating Pn, deﬁned in this study as the
sample size where the probability of getting a marginal total of zero is 1%. An
easier method that yields a similar answer is to multiply the minimum estimated
marginal probability by 5.5. This sample size is large enough for the Type I error
to be reasonably close to α. Power, however, can be overestimated by as much
as .09 when n = Pn. Some adjustment to power estimates is recommended.

An application

A professor is interested in knowing whether the level of exposure to
advanced math courses is related to success in her introductory statistics
course. Based on a survey she finds the following distribution for highest level
of math course taken.

 

Factor 1: Highest level of math taken Percentage
No college level math 10
College algebra 55
1 year of calculus 15
1 year or more beyond calculus 20

 


Based on previous experience, she expects the following distribution for grades.

 

Factor 2: Grade Expected percentage
4.0 30
3.5 20
3.0 40
≤ 2.5 10

 

Her current enrollment is 40 students. Is the sample size large enough for a
reasonable level of power?

To answer the question a plausible effect size must ﬁrst be determined.
One strategy is to calculate w for a possible set of data if a high (but not perfect)
correlation exists. If the students are distributed as shown in the following table,
w = .87, a considerably large ES. The expected power is better than .90 for w
greater than .7.

 

                   ≤ 2.5   3.0   3.5   4.0
No college math      2      2     0     0
College algebra      2     14     4     2
1 year calculus      0      0     2     4
> 1 year calculus    0      0     2     6
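
The effect size for this hypothetical table can be checked with a short calculation. The sketch assumes w is obtained as the square root of X2 divided by n, with expected frequencies taken from the table's own marginal totals; the function name is hypothetical.

```python
import math

def effect_size_w(table):
    """Effect size w = sqrt(X2 / n) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence, from the marginal totals.
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return math.sqrt(x2 / n)

if __name__ == "__main__":
    # Rows: level of math taken; columns: grades <= 2.5, 3.0, 3.5, 4.0.
    application_table = [
        [2, 2, 0, 0],
        [2, 14, 4, 2],
        [0, 0, 2, 4],
        [0, 0, 2, 6],
    ]
    print(round(effect_size_w(application_table), 2))
```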

 

It was found in this chapter’s simulations that Type I error rates generally
fell within the range .03 to .07 when the sample size was at least 32 for sixteen-cell
tables. (Refer to part 1.) The marginal totals of the application table are not
extremely skewed - no proportion is expected to be less than .1 - therefore the
trend of the Type I error should be conservative.

Marginal totals of zero are not a concern here but two marginal totals are
less than five; a sample size of 40 is therefore less than Pn (which equals 51 for
this example). Actual power can be expected to be overestimated by the
noncentral χ2. (Refer to Figure 6-2, Panels c and e for n(p) = 4 and 8
respectively. n(p) is 6 for the application table.) The overestimation will
decrease as the effect size increases. (Refer to Figure 6-4.) In spite of the
overestimation, a sample size of 40 appears to be large enough for detecting a
large effect size with a power greater than .80.

The confirmatory simulation run had a Type I error rate of 4.4% which
does fall within the expected range. The power distribution is given in Figure 6-5.
The discrepancy between observed and nominal power does not consistently
decrease as the effect size increases, as was predicted above. The largest
discrepancy, though, is for w = .5. Power is not too seriously
overestimated, supporting the conclusion that the sample size is large enough.


Chapter 7

THE HOMOGENEITY TEST

The calculations for the test of homogeneity are carried out in the same
manner as the test of independence. The difference is entirely in the sampling
procedure. One set of marginal totals corresponds to the samples taken from
the various populations. The objective is to determine whether the populations
are similar on the characteristic measured. For example, one may ask whether
career aspirations of medical students are similar across ethnic groups.
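
The difference in sampling procedure can be illustrated with a short sketch. It is not the program used in this study; the function names are hypothetical, and the null hypothesis (identical column probabilities in every group) is assumed in both cases.

```python
import random

def sample_independence(row_probs, col_probs, n, rng):
    """Multinomial sampling: only n is fixed; both sets of margins are random."""
    n_rows, n_cols = len(row_probs), len(col_probs)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [row_probs[i] * col_probs[j] for i, j in cells]
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, j in rng.choices(cells, weights=weights, k=n):
        table[i][j] += 1
    return table

def sample_homogeneity(group_sizes, col_probs, rng):
    """Product multinomial sampling: each group (row) total is fixed in advance."""
    table = []
    for size in group_sizes:
        row = [0] * len(col_probs)
        for j in rng.choices(range(len(col_probs)), weights=col_probs, k=size):
            row[j] += 1
        table.append(row)
    return table

if __name__ == "__main__":
    rng = random.Random(0)
    col_probs = [0.1, 0.2, 0.3, 0.4]
    print(sample_independence([0.25] * 4, col_probs, 40, rng))
    print(sample_homogeneity([10, 10, 10, 10], col_probs, rng))
```

Either table can then be passed to the same X2 calculation, since only the way the counts are generated differs.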

The homogeneity test has been studied less frequently than the test of
independence. Camilli and Hopkins (1978) found the homogeneity test to be
somewhat conservative when both sets of marginal probabilities were skewed
(e_min ≤ 2) but otherwise it was robust for 2 by 2 tables when the sample size was
at least 20. A simulation study by Roscoe and Byars (1971) considered two
equal groups and varying marginal probabilities on the second dimension
(uniform, moderately, and extremely skewed). They reported X2 to be “strikingly
robust.” At the .05 level, Type I error was conservative for the smallest sample
sizes when the column totals were skewed. They also reported that when both
sets of marginals were extremely skewed, the Type I errors were “a bit erratic
(though generally conservative).” Garside and Mack (1976) calculated the exact
Type I error rates for 2 x 2 tables. All but a very few error rates fell in the .04 to
.06 range for α = .05. Larntz (1978) tested a 2x3 table with two equal-sized
groups. X2 was close to nominal values for n ≥ 16 and below nominal for smaller
n. These three studies are therefore consistent in finding that X2 is robust and
tends to be conservative when n is small, much like the results found for the test
of independence.

I have not found any simulation studies on the power of the homogeneity
test but there has been some theoretical work done. Meng and Chapman
(1966) present Neyman’s proof that the optimum sample size for a 2xc table is
n1 = n2 = N/2. The test of independence has less power than a homogeneity test
with equal group sizes. Harkness and Katz’s (1964) theoretical study of exact
power found that this superiority in power held for n ≤ 30 and when the two
groups were not equal in size. Although higher in power than the test of
independence, the homogeneity test’s power is still overestimated by the normal
approximations developed by Patnaik.

Implications for researchers

Recommendations made for the test of independence appear appropriate
for the homogeneity test. Ideally all the samples would be equal in size as
this would maximize power. When the marginal totals are skewed and/or n is
small, the power of X2 will not be closely approximated by the noncentral χ2, but
research suggests that the test of homogeneity is more robust than the test of
independence. How much more robust is the question considered below.

Methodology. Set I tables with n(p) = 8 are used along with two tables
from set II, namely S860b and S875b. The set I tables have equal sized groups
while the set II tables have skewed marginals on both dimensions. Two sample
sizes are used: n = n5 which is 40 for both tests, and n = Pn. The value of Pn
will depend on the table.

Results. The power plots for the test of independence and the
homogeneity test are presented in Figure 7-1, along with the differences found
between the two tests’ observed power. In Panel a one can see that the
homogeneity test does tend to have more power for the larger effect sizes when
n = 40 and its Type I error rate (w = 0) is slightly more conservative. This
superiority does not hold when n increases (Panel d). The two tables with
unequal sample sizes show the same pattern (Panel g): The homogeneity test’s
superiority in power appears to exist only for large effect sizes and small n. The
maximum observed difference in power is .05 (Panel g) with nearly all other
positive differences being less than .03.

Discussion

The homogeneity test's theoretical superiority in power over the test of
independence was confirmed but found to be significant only for large ES and
small n. Guidelines developed for the test of independence appear to be
generalizable to the homogeneity test.

Application

From a ten-year-old large-scale study, it was found that career aspirations
among medical students differed across ethnic groups. A replication study is
being considered. Previous data provide the following information. Sixty-five
percent of medical students are white, 25% are black, 7% are Hispanic, and 3%
are Asian. The breakdown for career aspirations is: Private practice, 54.0%;
Salaried positions, 12.9%; Faculty positions, 29.5%; the remaining 2.7% are
lumped together as “Other.” The effect size is expected to be moderate at best.

If the smallest group size is 10 to 24% of the overall sample size, Pn will be 175.
Since only one set of the marginals will be extremely skewed, the Type I error
rate can be expected to be conservative. A sample size of 176 is theoretically
large enough to detect a moderate effect size with a power of .8. The simulation
results suggest that when the sample size is greater than Pn, Type I error will be
reasonably close to α and the power approximation will also be close to the
observed power. (Refer to Figure 7-1, Panel e, Table S875.)

The confirmatory simulation run with the smallest group making up 10% of
the overall sample does substantiate the predictions: The Type I error rate was
5.8% and power was .82. The results were slightly better when all groups were
set equal: The Type I error rate was 4.2% and power was .80.

Chapter 8

SUMMARY AND RECOMMENDATIONS

Although the asymptotic distributions for Pearson’s chi-squared statistic
are the same across tests, it was shown here that X2 behaves differently when n
is small. The fit of X2’s observed distributions to the asymptotic is further
worsened when the distribution of expected cell frequencies is not uniform.
Under these conditions, the goodness of fit X2 tends to have a liberal Type I
error. In contrast, the test of independence is generally conservative unless
both sets of marginal probabilities are extremely skewed. For both tests it was
found that power estimation is more sensitive than Type I error. Overestimation
of power is much more serious for the test of independence than the goodness
of ﬁt test. The product multinomial analogs of these tests have similar trends.

Several sample size guidelines were considered for each test. These
yielded greatly divergent sample sizes. The objective of the earliest guidelines
was to have a close approximation of X2’s Type I error rate by χ2. These
guidelines are stringent and their recommended sample sizes tend to be large.
Later guidelines based on simulations considered a looser ﬁt as acceptable,
therefore these sample sizes are often considerably smaller. Though there have
been empirical power studies, these haven’t led to sample size guidelines. This
study attempted to combine both perspectives for evaluating sample size
guidelines.

A related problem is how best to describe tables with cell expectations
that are not uniform. The minimum cell expectation is frequently the criterion
used by sample size recommendations. It was found to not be a sufficient
criterion for the goodness of fit test and it is not as useful as marginal totals for
the test of independence. Several factors are involved in the former case: not
only the size of the minimum cell expectation, but also the number of small
expectations, the size of the table, and, for power, whether the small cells are
smaller or larger under the alternative hypothesis. These factors cannot be all
combined into a single index nor can a simple guideline be developed that would
account for all of the factors.

The test of independence was easier to deal with. A quantitative index
based on the marginal totals, Pn, was described. If the sample size is larger
than Pn, a researcher can be conﬁdent that the actual distribution of X2 is fairly
well approximated by its asymptotic distributions.

Recommendations for future research

A tension exists between “good enough” for practical purposes and the
theoretical perspective. Ideally the sample size should be large enough that the
statistic’s actual distribution will match its asymptotic distribution. Extreme
cases, though, pose a dilemma for practitioners. Given a table with extremely
small expectations, the sample size needs to be very large before one can
expect a good approximation by the asymptotic distributions. This may neither
be feasible nor even desirable. If the researcher is only interested in evaluating
a moderate to large effect size but the recommended sample size is so large that
it will detect a small to moderate effect size with better than .9 power, the
researcher would be justified in thinking that some middle ground should be
found! Guidelines that provide adjustments for less than ideal cases would help
in this type of situation.

The tentative guidelines suggested here need to be refined and tested on
other table sizes in order to make them more generalizable. Determining
adjustments for less than ideal sample sizes would also require a large
systematic simulation study. Extensions to smaller αs and multi-dimensional
tables are two other areas where further research is needed.

Pearson’s chi-squared statistic, in spite of its well-known shortcomings, is
still the most used test for categorical data. With the growing emphasis on
power issues, research on the factors influencing the power estimation of X2
should become a greater priority.

APPENDICES


Table A-1. Marginal probabilities for tables

 

 

 

 

 

 

 

 

 

 

Set I.

Id S135 S145 S160 S175 S435 S445 S460 S475

k 16 16 16 16 16 16 16 16

np 1 1 1 1 4 4 4 4

Var(X2) 35 45 60 75 35 45 60 75

R 366 526 766 1006 366 526 766 1006

row1 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row2 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row3 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row4 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

column1 0.195 0.191 0.1894 0.1888 0.060 0.044 0.026 0.019

column2 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

column3 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

column 4 0.305 0.309 0.3106 0.3112 0.420 0.456 0.474 0.461
Id S835 S845 S860 S875 S1235 S1245 S1260 S1275

k 16 16 16 16 16 16 16 16

np 8 8 8 8 12 12 12 12

Var(X2) 35 45 60 75 35 45 60 75

R 366 526 766 1006 366 526 766 1006

row1 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row2 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row3 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

row4 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25

column1 0.113 0.071 0.046 0.034 0.142 0.095 0.066 0.049

column2 0.113 0.071 0.046 0.034 0.142 0.095 0.066 0.049

column3 0.367 0.429 0.454 0.466 0.142 0.095 0.066 0.049

column4 0.387 0.429 0.454 0.466 0.574 0.714 0.602 0.854
Id S1535 S1545 S1560 S1575

k 16 16 16 16

np 15 15 15 15

Var(X2) 35 45 60 75

R 366 526 766 1006

row 1 0.25 0.25 0.25 0.25

row2 0.25 0.25 0.25 0.25

row 3 0.25 0.25 0.25 0.25

row 4 0.25 0.25 0.25 0.25

column1 0.165 0.114 0.079 0.060

column2 0.165 0.114 0.079 0.060

column3 0.165 0.114 0.079 0.060

column4 0.505 0.658 0.763 0.620

 

 


 

Table A-1 continued.

 

 

Set I.
Id F107 F109 F112 F115 F207 F209 F212 F215
k 4 4 4 4 4 4 4 4

np 1 1 1 1 2 2 2 2
Var(X2) 7 9 12 15 7 9 12 15

R 32 52 62 112 32 52 62 112

row 1 0.362 0.349 0.342 0.340 0.50 0.50 0.50 0.50
row 2 0.638 0.651 0.658 0.660 0.50 0.50 0.50 0.50
column 1 0.362 0.349 0.342 0.340 0.146 0.084 0.051 0.037
column 2 0.638 0.651 0.658 0.660 0.854 0.916 0.949 0.963

 

 

 

Id F307 F309 F312 F315

 

k 4 4 4 4
np 3 3 3 3
Var(X2) 7 9 12 15
R 32 52 62 112

row 1 0.196 0.118 0.074 0.054
row 2 0.804 0.882 0.926 0.946
column 1 0.196 0.118 0.074 0.054
column 2 0.804 0.882 0.926 0.946

 

 

 


Table A-1 continued.

 

 

 

 

 

 

Set ll.
Id S475a S475b S860a S860b S860c S860d
k 16 16 16 16 16 16
np 4 4 8 8 8 8
Var(X2) 75 75 60 60 60 60
R 1216 1137 662 870 647 913
row1 0.125 0.063 0.125 0.125 0.063 0.063
row2 0.125 0.063 0.125 0.125 0.063 0.063
row3 0.375 0.436 0.125 0.125 0.063 0.063
row4 0.375 0.438 0.625 0.625 0.613 0.813
column1 0.038 0.076 0.092 0.092 0.164 0.164
column2 0.038 0.076 0.092 0.092 0.184 0.184
column3 0.462 0.424 0.406 0.092 0.316 0.164
column4 0.462 0.424 0.4060 0.7240 0.316 0.446
Id S875a S875b S875c S875d
k 16 16 16 16
np 8 8 8 8
Var(X2) 75 75 75 75
R 666 1157 992 1165
row1 0.125 0.125 0.063 0.063
row2 0.125 0.125 0.063 0.063
row3 0.125 0.125 0.063 0.063
row4 0.625 0.625 0.813 0.613
column1 0.068 0.068 0.137 0.137
column2 0.068 0.068 0.137 0.137
column3 0.432 0.068 0.364 0.137
column4 0.432 0.795 0.364 0.591

 

 

 


 

[Table A-2 and its continuation pages, with their notes, were printed sideways and are illegible in the scanned source.]

TABLES AND FIGURES

[Figure 2-1. Two normal probability plots, titled "Multinomial Distribution" and "Product Multinomial Distribution". Vertical axis: Residuals. Horizontal axis: Z (rank), from -3 to 3.]

Figure 2-1. Normal plots of the standardized residuals of the cell means.


Table 4-1. Lower limits for the minimum expected cell frequency (e_m)

                                     Observed   Recommended minimums
                                     minimum               Yarnold     Roscoe    Trial
  k   n(p)     R     p_m       n     e          Cochran    (Modified)  & Byars   index

 16     1     366   0.0075    16    0.12        0.5        0.31        1.00      0.05
 16     2     370   0.0125    16    0.2         1          0.63        1.00      0.10
 16     4     345   0.0225    16    0.36        5          1.25        1.00      0.20
 16     8     373   0.0275    16    0.44        5          2.50        1.00      0.40
 16    12     349   0.0375    16    0.6         5          3.75        1.00      0.59
 16    15     319   0.0475    16    0.76        5          4.69        1.00      0.74

  8     1     163   0.008     16    0.096       0.5        0.63        1.00      0.16
  8     2     146   0.019     16    0.228       1          1.25        1.00      0.31
  8     4     217   0.02      16    0.24        5          2.50        1.00      0.63
  8     6     139   0.045     16    0.54        5          3.75        1.00      0.94
  8     7     122   0.058     16    0.696       5          4.38        1.00      1.09

  4     1      34   0.04      10    0.4         0.5        1.25        1.00      0.63
               49   0.025     16    0.4
  4     2      27   0.09      10    0.9         1          2.50        1.00      1.25
               38   0.06      16    0.96
  4     3      29   0.14      10    1.4         5          3.75        1.00      1.88
               33   0.095     16    1.52
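
As a rough check on how two columns of Table 4-1 relate to the others, the sketch below recomputes them for a few rows. It assumes that the observed minimum e is the product n x p_m and that the "Yarnold (Modified)" column equals 5 x n(p) / k. Both readings are inferred from the tabled values for the rows used here, and the function names are illustrative rather than taken from the dissertation.

```python
def observed_minimum_expectation(n: int, p_min: float) -> float:
    """Smallest expected cell frequency: sample size times smallest cell probability."""
    return n * p_min


def yarnold_modified_minimum(k: int, n_small: int) -> float:
    """Assumed reading of the 'Yarnold (Modified)' column: 5 * n(p) / k."""
    return 5 * n_small / k


# (k, n(p), p_m, n) copied from three rows of Table 4-1
rows = [(16, 1, 0.0075, 16), (16, 4, 0.0225, 16), (4, 3, 0.095, 16)]

for k, n_small, p_min, n in rows:
    e_obs = observed_minimum_expectation(n, p_min)
    e_rec = yarnold_modified_minimum(k, n_small)
    print(f"k={k:2d}  n(p)={n_small:2d}  observed e={e_obs:.3f}  "
          f"Yarnold (modified)={e_rec:.4f}")
```

In each of these rows the observed minimum expectation is smaller than the recomputed Yarnold-type limit, which matches the ordering of the two columns in the table.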

 

 

 


Table 4-2. Cell probabilities of tables generated for part 2.

                                                      Ratio
 Set    k   n(p)   p_m      Subset   n(p_M)   p_M     p_M/p_m     R

 A      4     1    0.01     a           3     0.33     33         109
                            b           2     0.37     37         109
                            c           1     0.49     49         110

 B      4     2    0.01     a           2     0.49     49         204
                            b           1     0.73     73         205

 C     16     2    0.0065   a          14     0.07     10.8       508
                            b           8     0.08     11.8       504
                            c           1     0.17     26.8       522

 D     16     4    0.0065   a          12     0.08     12.5       763
                            b           8     0.09     13.9       768
                            c           1     0.29     44.1       795

 E     16     8    0.0065   a           8     0.12     18.2      1298
                            b           1     0.51     78.5      1345

 

[Figure 4-1, panels (A) and (B). Legend: Subset a, Subset b, Subset c. Horizontal axis: sample size, 0 to 250.]

Figure 4-1. Type I error rate (in percent) versus sample size, 4-cell
tables, (A) 1 and (B) 2 small cell expectations.
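
A Type I error rate of the kind plotted in Figure 4-1 can be estimated by simulation. The sketch below is a minimal illustration, not the dissertation's program: it draws multinomial samples under the null hypothesis, computes Pearson's X², and counts how often the statistic exceeds the chi-squared critical value for 3 degrees of freedom. The cell probabilities (one cell at 0.01 and three at 0.33, as listed for Set A, subset a, in Table 4-2), the sample sizes, the replication count, the seed, and the function names are all assumptions chosen for the example.

```python
import random

CRITICAL_VALUE_DF3 = 7.815  # upper 5% point of chi-squared with 3 df


def pearson_x2(observed, probs):
    """Pearson's X^2 = sum over cells of (O - E)^2 / E, with E = n * p."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))


def simulate_type1_rate(probs, n, reps, rng):
    """Share of null-hypothesis samples whose X^2 exceeds the df = 3 critical value."""
    cells = list(range(len(probs)))
    rejections = 0
    for _ in range(reps):
        counts = [0] * len(probs)
        # Draw n observations, each falling in a cell with the null probabilities.
        for cell in rng.choices(cells, weights=probs, k=n):
            counts[cell] += 1
        if pearson_x2(counts, probs) > CRITICAL_VALUE_DF3:
            rejections += 1
    return rejections / reps


probs = [0.01, 0.33, 0.33, 0.33]  # one small cell, three large cells
rng = random.Random(1999)         # fixed seed so the run is repeatable
for n in (20, 50, 100, 200):
    rate = simulate_type1_rate(probs, n, reps=2000, rng=rng)
    print(f"n={n:3d}  estimated Type I error = {100 * rate:.1f}%")
```

With only 2000 replications per sample size the estimates are noisy, so they indicate the general pattern rather than reproduce any plotted value.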

[Figure 4-1 continued, panels (C) to (E). Legend: Subset a, Subset b, Subset c. Horizontal axis: sample size.]

Figure 4-1 continued. 16-cell tables, (C) 2, (D) 4, and (E) 8 small cell
expectations.

 

 

[Figure 4-2 and Figure 4-2 continued: rotated plots whose captions and data are illegible in the scanned source.]

[Figure 4-3, panels (a) to (c). Each panel plots rejection rates against sample size together with a power curve (df = 3 in panels a and b, df = 15 in panel c).]

Figure 4-3. Rejection rates (%) versus sample size for alternative
hypotheses: small cells increasing (+H1) or small cells decreasing (-H1),
(a) and (b) 4-cell tables, n(p)=1 or 3. (c) 16-cell tables, n(p)=12.

[Figure 5-4, upper panel. Vertical axis: discrepancy between observed and predicted rejection rates. Horizontal axis: predicted rejection rates (%), at 80, 90, 95, and 99. Legend: Equal N, Shift down, No extremes, Shift up.]

[Figure 5-4, lower panel. Vertical axis: Type I error (%), with a reference line for alpha. Horizontal axis: sample size (predicted rejection rates): 194 (.80), 248 (.90), 288 (.95), 380 (.99).]

Figure 5-4. Application problem, confirmatory simulation results.

[Figure 6-1, panels (a) to (c). Horizontal axis: sample size, 0 to 1000.]

Figure 6-1. Type I error rate versus sample size, test of independence:
(a) 475 series, (b) 860 series, and (c) 875 series.

 

 

 

 

 

 

 

 

[Figure 6-4. Scatter plot; horizontal axis from 0 to 100. Legend: np = 4, np = 8, np = 12.]

Figure 6-4. Differences between observed and expected power versus
expected power.

[Figure 6-5. Title: Power distributions. Horizontal axis: effect size (Cohen's w), 0.5 to 0.9. Legend: Predicted power (n = 40), Observed power.]

Figure 6-5. Application problem, confirmatory simulation results.
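
The horizontal axis of Figure 6-5 is Cohen's w. As a reminder of how that effect size is defined (Cohen, 1988), w is the square root of the sum over cells of (p1 - p0)^2 / p0, where p0 and p1 are the cell probabilities under the null and alternative hypotheses. The sketch below computes w for a made-up pair of probability vectors; the probabilities and the function name are assumptions for illustration, and the noncentrality value n * w^2 is the standard large-sample relation rather than something stated on this page.

```python
import math


def cohens_w(p_null, p_alt):
    """Cohen's w: sqrt of the sum over cells of (p_alt - p_null)^2 / p_null."""
    return math.sqrt(sum((a - b) ** 2 / b for a, b in zip(p_alt, p_null)))


# Hypothetical 4-cell example: equal null probabilities, skewed alternative.
p_null = [0.25, 0.25, 0.25, 0.25]
p_alt = [0.40, 0.30, 0.20, 0.10]

w = cohens_w(p_null, p_alt)
n = 40  # sample size shown in the figure legend
noncentrality = n * w ** 2  # large-sample noncentrality parameter for X^2

print(f"Cohen's w = {w:.3f}, noncentrality at n = {n}: {noncentrality:.2f}")
```

The noncentrality parameter is what a chi-squared power approximation uses, so w and n together determine the predicted power for a given number of degrees of freedom.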

BIBLIOGRAPHY


Agresti, A. (1990). Categorical data analysis. John Wiley & Sons.

Bradley, D. R., Bradley, T. D., McGrath, S. G., & Cutcomb, S. D. (1979). Type I
error rate of the chi-square test of independence in r x c tables that have
small expected frequencies. Psychological Bulletin, 86(6), 1290-1297.

Bradley, D. R., & Seely, D. L. (1977). Empirical determination of the power of
the chi-square test of independence in 2 x 2 tables. Proceedings of the
Statistical Computing Section of the American Statistical Association,
138-144.

Camilli, G., & Hopkins, K. D. (1978). Applicability of chi-square to 2x2
contingency tables with small expected cell frequencies. Psychological
Bulletin, 85(1), 163-167.

Cochran, W. G. (1952). The χ² test of goodness of fit. Annals of Mathematical
Statistics, 23, 315-345.

Cochran, W. G. (1954). Some methods for strengthening the common χ² tests.
Biometrics, 10, 417-451.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd ed.
Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.

Cooper, H., & Findley, M. (1982). Expected effect sizes: Estimates for
statistical power analysis in social psychology. Personality and Social
Psychology Bulletin, 9, 168-173.

Craddock, J. M., & Flood, C. R. (1970). The distribution of the χ² statistic in
small contingency tables. Applied Statistics. Journal of the Royal
Statistical Society, Series C, 19, 173-181.

Fishman, G. S., & Moore, L. R. (1982). A statistical evaluation of multiplicative
congruential random number generators with modulus 2^31 - 1. Journal of
the American Statistical Association, 77, 129-136.

Frosini, B. V. (1978). On the power function of the X² test. Metron, 34, 3-36.

Garside, G. R., & Mack, C. (1976). Actual Type I error probabilities for various
tests in the homogeneity case of the 2x2 contingency table. The
American Statistician, 30(1), 16-20.

Harkness, W. L., & Katz, L. (1964). Comparison of the power functions for the
test of independence in 2x2 contingency tables. Annals of Mathematical
Statistics, 35, 1115-1127.

Haase, R. F., Waechter, D. M., & Solomon, G. S. (1982). How significant is a
significant difference? Average effect size of research in counseling
psychology. Journal of Counseling Psychology, 29, 58-65.

Horn, S. D. (1977). Goodness of fit tests for discrete data: A review and an
application to a health impairment scale. Biometrics, 33, 237-248.

Hayman, G. E., & Leone, F. C. (1964). Comparison of the power functions for
the test of independence in 2x2 contingency tables. Annals of
Mathematical Statistics, 35, 1115-1127.

Koehler, K. J., & Larntz, K. (1980). An empirical investigation of goodness of fit
statistics for sparse multinomials. Journal of the American Statistical
Association, 75(370), 336-344.

Koehler, K. J. (1986). Goodness of fit tests for log-linear models in sparse
contingency tables. Journal of the American Statistical Association,
81(394), 483-493.

Larntz, K. (1978). Small-sample comparisons of exact levels for chi-squared
goodness of fit statistics. Journal of the American Statistical Association,
73(362), 253-263.

Lawal, H. B. (1992). A modified X² test when some cells have small
expectations in the multinomial distribution. Journal of Statistical
Computation and Simulation, 40, 15-27.

Lawal, H. B., & Upton, G. J. G. (1980). An approximation to the distribution of
the X² goodness-of-fit statistic for use with small expectations.
Biometrika, 67(2), 447-453.

Meng, R. C., & Chapman, D. G. (1966). Journal of the American Statistical
Association, 61, 965-975.

Moore, D. S. (1986). Tests of chi-squared type. In R. B. D'Agostino and M. A.
Stephens (Eds.), Goodness-of-fit techniques. New York: Marcel Dekker,
Inc.

Ott, R. L. (1993). An introduction to statistical methods and data analysis (4th
edition). Belmont, CA: Duxbury Press.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992).
Numerical recipes in Fortran, 2nd edition. Cambridge University Press.

Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete
multivariate data. New York: Springer-Verlag.

Roscoe, J. T., & Byars, J. A. (1971). An investigation of the restraints with
respect to sample size commonly imposed on the use of the chi-square
statistic. Journal of the American Statistical Association, 66(336),
755-759.

SAS Institute Inc. (1990). SAS language: Reference, version 6, first edition.
Cary, NC: SAS Institute, Inc.

Slakter, M. J. (1968). Accuracy of an approximation to the power of the
chi-square goodness of fit test with small but equal expected frequencies.
Journal of the American Statistical Association, 63, 912-924.

Von Eye, A. (1990). Introduction to configural frequency analysis: The search
for types and antitypes in cross-classifications. Cambridge University
Press.

Wickens, T. D. (1989). Multiway contingency tables analysis for the social
sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates,
Publishers.

Wise, M. E. (1963). Multinomial probabilities and the χ² and X² distributions.
Biometrika, 50, 145-154.

Yarnold, J. K. (1970). The minimum expectation in X² goodness of fit tests and
the accuracy of approximations for the null distribution. Journal of the
American Statistical Association, 65(330), 864-886.