This is to certify that the thesis entitled

An Investigation of the Power Function for the Test of Independence in 2 x 2 Contingency Tables

presented by William Leonard Harkness has been accepted towards fulfillment of the requirements for the Ph. D. degree in Statistics.

Major professor
Date: August 1959

AN INVESTIGATION OF THE POWER FUNCTION FOR THE TEST OF INDEPENDENCE IN 2 x 2 CONTINGENCY TABLES

By WILLIAM LEONARD HARKNESS

A THESIS

Submitted to the School of Graduate Studies of Michigan State University of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics

1959

William Leonard Harkness
Candidate for the degree of Doctor of Philosophy

Final examination, July 29, 1959, 9:00 A.M., Physics-Mathematics Building.

Dissertation: An Investigation of the Power Function for the Test of Independence in 2 x 2 Contingency Tables

Outline of Studies
Major subjects: Mathematical Statistics, Probability
Minor subjects: Algebra, Analysis

Biographical Items
Born, June 25, 1934, Lansing, Michigan
Undergraduate Studies, Hillsdale College, 1951-52, Michigan State College, 1952-55
Graduate Studies, Michigan State University, 1955-56, cont. 1957-59, University of Chicago, 1956-57

Experience: Graduate Assistant, Michigan State College, 1955, Special Graduate Research Assistant 1955- , cont.
1957-5 , University Fellow, University of Chicago, 1956-57, Mathematician, Institute for Air Weapons Research, 1957, Temporary Instructor, Michigan State University, 1959

Member of Phi Kappa Phi, Pi Mu Epsilon, Sigma Xi, Institute of Mathematical Statistics, American Mathematical Society

ACKNOWLEDGEMENTS

The author wishes to express his sincere gratitude to Dr. Leo Katz, his major professor, for suggesting the problem and for his continuous support, endless patience, and encouragement during the completion of the problem.

The writer also deeply appreciates the financial support of the Office of Naval Research, which made it possible for him to complete this investigation.

The author is also very grateful to Dr. J. F. Hannan for several invaluable suggestions, and to Mrs. Helen Spence, who set up the machine computation of the tables.

The author wishes to dedicate this thesis to his wife, Mary Lou, for her kind understanding and patience during the course of the writing of this manuscript.

TABLE OF CONTENTS

CHAPTER 1 ANALYTICAL REVIEW OF PREVIOUS WORK
1.1 Abstract Models in 2 x 2 Contingency Tables
1.2 Probability Models
1.3 Statement of Hypotheses, and the Problem of Testing for Independence
1.4 Order of Presentation
1.5 The Uniformly Most Powerful Unbiased Test for One-Sided Alternatives and Two-Sided Alternatives
1.6 2 x 2 Independence Trial
1.7 2 x 2 Comparative Trial
1.8 The Double Dichotomy
1.9 Results

CHAPTER 2 AN INVESTIGATION OF A PROBABILITY FUNCTION
2.1 General Properties
2.2 Limiting Distributions and Approximations

CHAPTER 3 POWER FOR THE TEST OF INDEPENDENCE
3.1 2 x 2 Independence Trial
3.2 2 x 2 Comparative Trial
3.3 Double Dichotomy
3.4 Asymptotic Power

CHAPTER 4 COMPARISON OF POWER FUNCTIONS
4.1 Preliminaries
4.2 Comparison of Exact and Approximate Power

SUMMARY

APPENDIX A. EXACT POWER FOR THE DOUBLE DICHOTOMY
A.1 Tables of $P_{10}(\lambda, P_A, P_B)$
A.2 Tables of $P_{20}(\lambda, P_A, P_B)$
A.3 Tables of $P_{30}(\lambda, P_A, P_B)$

APPENDIX B. EXACT POWER FOR THE 2 x 2 COMPARATIVE TRIAL
B.1 Tables of $P_{10}(p_1, p_2, m_1)$
B.2 Tables of $P_{20}(p_1, p_2, m_1)$
B.3 Tables of $P_{30}(p_1, p_2, m_1)$

[...]
C.2 Tables of $P_{20}(t \mid m_1, m_2)$
C.3 Tables of $P_{30}(t \mid m_1, m_2)$

APPENDIX D. COMPARISON OF POWER FUNCTIONS
D.1 Tables of the Three Exact Power Functions and the Normal Approximation

APPENDIX E. APPROXIMATIONS TO POWER IN THE 2 x 2 COMPARATIVE TRIAL
E.1 Graphs
E.2 Tables of Sillitto's and Patnaik's Approximation, and Values of $P_n(p_1, p_2, m_1)$ and $P_n(t \mid m_1, m_2)$ at $t = p_1 q_2 / (p_2 q_1)$

BIBLIOGRAPHY

ABSTRACT

William Leonard Harkness

This thesis is concerned with an examination of the power function for the test of independence in 2 x 2 contingency tables. Three distinct types of experiments leading to the presentation of data in the form of a 2 x 2 table have been delineated, and several tests for independence for each have been proposed, but not much is known about the power functions of these tests.

In Chapter 1, which is a systematic review of previous work in 2 x 2 tables, the uniformly most powerful unbiased test for independence is discussed quite thoroughly. All the results and computations in this thesis are based on this test.

The content of Chapter 2 is a study of the probability function $k(m_1, m_2, n; t)\, h(n_1 \mid m_1, m_2, n)\, t^{n_1}$, $0 < t < \infty$, where $h(n_1 \mid m_1, m_2, n)$ is the ordinary hypergeometric function, and $k(m_1, m_2, n; t)$ is the reciprocal of the sum of $h(n_1 \mid m_1, m_2, n)\, t^{n_1}$ over all possible values of $n_1$; some asymptotic properties are included in this study.

The exact power function for the test of independence in each of the three cases is given in Chapter 3. In this chapter, the three power functions are related to one another, and asymptotic power is investigated. The asymptotic power function for the $\chi^2$-test of independence is given here.

In Chapter 4, some of the approximations proposed in Chapter 3 are compared with the exact power. Rather extensive tables of exact power for each of the three cases are in the appendices. These exact computations provide the means for evaluating the adequacy of the various approximations.

Chapter 1. Analytical Review of Previous Work

[...] subject. Since that time, many other writers have devoted themselves to the same problem.

[...] is moderately large, and consequently, for this case, the problem can be said to be solved for all practical purposes.

The problem of testing for independence in small samples, however, has led to considerable controversy. Several tests have been proposed, but basic disagreement remains. This is partially due to the fact that there are three distinct experimental situations which lead to the presentation of data in the form of 2 x 2 contingency tables. Abstractly, these three experiments may be described in the following manner:

I. A total of $n$ similar balls, $m_1$ marked $A_1$ and $n - m_1$ marked $A_2$, are placed in an urn, then withdrawn randomly in order. They are then placed in order in a row of $n$ cells, of which $m_2$ have been labeled $B_1$ and $n - m_2$ labeled $B_2$.
The result of the experiment is presented in Table I, where $n_1$ is the observed number of balls marked $A_1$ in receptacles labeled $B_1$. In this experiment both sets of marginal totals are fixed.

TABLE I

              B1           B2                  Total
A1            n1           m1 - n1             m1
A2            m2 - n1      n - m1 - m2 + n1    n - m1
Total         m2           n - m2              n

II. From two urns, $A_1$ and $A_2$, each containing a large number of balls marked $B_1$ and $B_2$, samples of $m_1$ and $n - m_1$ are taken. It is observed that $m_2$ of the balls are labeled $B_1$, and $n - m_2$ are labeled $B_2$. It is assumed that the proportion of balls marked $B_1$ in urn $A_i$ is $p_i$, $i = 1, 2$. With this type of experiment, one set of marginal totals is fixed in Table I, namely, $m_1$ and $n - m_1$. Table II gives the relevant probabilities of occurrence of the balls with specified markings.

TABLE II

        B1      B2
A1      p1      1 - p1
A2      p2      1 - p2

III. A total of $n$ similar balls is randomly selected from an urn containing a large number of balls, each ball labeled $A_1$ or $A_2$ and also labeled $B_1$ or $B_2$. An observed result of the experiment is represented in the form of Table I, where none of the marginal totals is fixed.

Following G. A. Barnard's [3] nomenclature, we will call I the 2 x 2 "independence trial", II the 2 x 2 "comparative trial", and III the double dichotomy.

1.2. Probability Models.

Let us now consider the appropriate probability model for each of the three experiments described above. It will be seen that these probability models have a natural "hierarchical order" in that the probability models for the 2 x 2 independence trial and the 2 x 2 comparative trial are obtained as conditional probabilities of the probability distribution in the double dichotomy.

Considering first, then, the double dichotomy, the probability of observing the sample point $(n_1, n_2, n_3, n_4)$ is, by the multinomial probability law,

$$\Pr\{n_1, n_2, n_3, n_4\} = \frac{n!}{n_1!\, n_2!\, n_3!\, n_4!}\, \pi_1^{n_1} \pi_2^{n_2} \pi_3^{n_3} \pi_4^{n_4}. \tag{1.2.1}$$

If we replace

$\pi_1$ by $\lambda P_A P_B$,
$\pi_3$ by $P_A(1 - \lambda P_B)$,
$\pi_2$ by $P_B(1 - \lambda P_A)$,
$\pi_4$ by $1 - P_A - P_B + \lambda P_A P_B$,

where

$$\max\left[0,\ \frac{P_A + P_B - 1}{P_A P_B}\right] < \lambda < \min\left[\frac{1}{P_A},\ \frac{1}{P_B}\right],$$

and replace $n_2$ by $(m_2 - n_1)$, $n_3$ by $(m_1 - n_1)$, and $n_4$ by $n - m_1 - m_2 + n_1$, then (1.2.1) may be rewritten as

$$\Pr\{n_1, m_1, m_2 \mid n, \lambda, P_A, P_B\} = b(m_1; n, P_A)\; b(n_1; m_1, \lambda P_B)\; b\!\left(m_2 - n_1;\ n - m_1,\ \frac{P_B(1 - \lambda P_A)}{1 - P_A}\right), \tag{1.2.2}$$

where $b(x; n, p)$ is the usual binomial probability. Thus, (1.2.2) gives the probability of the sample point $(n_1, m_1, m_2)$, given $n$, $\lambda$, $P_A$, $P_B$. Computing the conditional probability of $(n_1, m_2)$, given $m_1$, $n$, $\lambda$, $P_A$, and $P_B$, one obtains

$$\Pr\{n_1, m_2 \mid m_1, n, \lambda\} = b(n_1; m_1, \lambda P_B)\; b\!\left(m_2 - n_1;\ n - m_1,\ \frac{P_B(1 - \lambda P_A)}{1 - P_A}\right). \tag{1.2.3}$$

Letting

$$p_1 = \lambda P_B, \qquad p_2 = \frac{P_B(1 - \lambda P_A)}{1 - P_A},$$

(1.2.3) yields the probability model for the 2 x 2 comparative trial. When the 2 x 2 comparative trial is discussed without reference to the double dichotomy, $\Pr\{n_1, m_2 \mid p_1, p_2, m_1, n\}$ will be used to denote the probability function rather than $\Pr\{n_1, m_2 \mid m_1, n, \lambda\}$.

Summing (1.2.3) over all possible values of $n_1$ gives the marginal probability for $m_2$ as

$$\Pr\{m_2 \mid m_1, n, \lambda\} = \sum_{n_1} b(n_1; m_1, \lambda P_B)\; b\!\left(m_2 - n_1;\ n - m_1,\ \frac{P_B(1 - \lambda P_A)}{1 - P_A}\right). \tag{1.2.4}$$

$\Pr\{m_2 \mid m_1, n, \lambda\}$ is seen to be the convolution of two binomial distributions. Hence, from (1.2.3) and (1.2.4), we obtain the conditional probability

$$\Pr\{n_1 \mid m_1, m_2, n, \lambda\} = \frac{\Pr\{n_1, m_2 \mid m_1, n, \lambda\}}{\Pr\{m_2 \mid m_1, n, \lambda\}} = \frac{\binom{m_1}{n_1}\binom{n - m_1}{m_2 - n_1}\, t^{n_1}}{\sum_j \binom{m_1}{j}\binom{n - m_1}{m_2 - j}\, t^{j}}, \tag{1.2.5}$$

where

$$t = \frac{\lambda(1 - P_A - P_B + \lambda P_A P_B)}{(1 - \lambda P_A)(1 - \lambda P_B)} = \frac{p_1 q_2}{p_2 q_1}, \qquad q_i = 1 - p_i,\ i = 1, 2.$$

[...]

1.3 Statement of Hypotheses, and the Problem of Testing for Independence

[...] Since $\pi_1 = P_A P_B \iff \lambda = 1$, $p_1 = p_2 \iff \lambda = 1$, and $t = 1 \iff \lambda = 1$, we may specify independence by $H_0: \lambda = 1$, for each case. If $\lambda = 1$, (1.2.2) and (1.2.3) reduce to

$$\Pr\{n_1, m_1, m_2 \mid n\} = b(m_1; n, P_A)\; b(m_2; n, P_B)\; h(n_1 \mid m_1, m_2, n), \tag{1.3.1}$$

$$\Pr\{n_1, m_2 \mid m_1, n\} = b(m_2; n, P_B)\; h(n_1 \mid m_1, m_2, n), \tag{1.3.2}$$

where $h(n_1 \mid m_1, m_2, n)$ is the hypergeometric probability function, and the conditional distribution of $n_1$, given $m_1$, $m_2$, $P_A$, $P_B$, and $\lambda = 1$, given by (1.2.6), holds for all three cases.
Any alternative hypothesis may be expressed as $H_1: \lambda \neq 1$, for any of the three cases, so that $H_1$ is composite. In terms of $\lambda$, $H_0$ is simple. The nuisance parameters $P_A$ and $P_B$ make it composite. Ideally, one would hope to find a uniformly most powerful test for the class of alternative hypotheses. The case for one-tailed tests has been disposed of for all practical purposes by Tocher [25], Sverdrup [24], and Katz [9]. They showed that the same test procedure should be used in each of the three situations, and that the test, which is a slight modification of Fisher's classic test, is most powerful, in the sense of Neyman and Pearson. This test procedure will be completely described in Section 1.5.

For two-tailed tests, no uniformly most powerful test exists. Several tests have been proposed, for each of the three cases, with disagreements as to the appropriate one. The main point about which the controversy hinges is whether the marginal totals are intrinsic or nuisance parameters.

One school of thought, led by Fisher, maintains that the marginal totals per se furnish no relevant information as to the probabilities of the observed frequencies, and hence are nuisance parameters. Hence, Fisher would advocate a conditional test, given the marginal totals, so that each of the three cases would be handled in a similar way.

Another view is taken by E. S. Pearson [16], among others, who believes "it is an artificial procedure to restrict the experimental probability set to a linear set," as Fisher does. Barnard [3], 1947, page 136, expressed the opinion that "significance tests for the 2 x 2 independence trial will not necessarily be appropriate for the 2 x 2 comparative trial," as well as the double dichotomy. He constructed an alternative test for the 2 x 2 comparative trial, claiming this test had greater power than Fisher's "exact" test.
However, Barnard [2], in 1949, wrote "On the 2 x 2 table I arrived at a test, the CSM test, which seemed to be considerably more powerful than the "exact" test of Professor R. A. Fisher, by taking as a reference set a class of results different from that considered by Professor Fisher. This led to some controversy with Professor Fisher, in which he maintained that the Neyman-Pearson notion, that the reference set involved in a test of significance consists of the set of all results which could have arisen in the given circumstances, was ill-conceived. In private correspondence following on this controversy, Professor Fisher drew my attention to a particular case where there did seem to be some difficulty in using the Neyman-Pearson approach. I discussed this case in another paper (Barnard, 1947b), in which I attempted to show how the Neyman-Pearson approach could be extended to cover such a case. However, I was not myself satisfied with the position, and further meditation has led me to think that Professor Fisher was right after all."

When uniformly most powerful tests do not exist, various procedures are available. A very commonly used technique is to restrict the class of possible tests to a smaller class of tests, with the hope of finding in this smaller class of tests one which is uniformly most powerful. For example, one might require that the test be a "similar" test, or that it be "unbiased", or that it be an "invariant" test. There are circumstances in which making such restrictions would be quite reasonable, and then again, some statisticians might feel there is good reason why such restrictions should not be made.

In restricting ourselves to a similar test of size $\alpha$ we are requiring that the test make incorrect decisions at not more than the full allowable rate for all hypotheses under $H_0$.
The principle of unbiasedness seems to be a very reasonable one: it requires that a test should accept the alternative $H_1$ more frequently when to accept is the correct decision than when it is incorrect.

In the two-tailed tests which have been proposed for the test of independence, generally the class of tests has not been restricted, and more or less "subjective" criteria have been used to judge the efficacy of the test. Clearly, the merits of a test should be judged from its power function, and not by intricate intuitive processes.

1.4 Order of Presentation

While many people have written on 2 x 2 contingency tables, for the most part they have considered the three distinct types of experiments as separate problems, except for large sample sizes, where the $\chi^2$ test becomes applicable in all three situations. As a result, there is an abundance of papers on the subject, but no very unified treatment of the whole area. Thus, as a preliminary, in the remainder of Chapter 1 a comprehensive and systematic survey of past work is included. This survey reveals that there are still some unsolved problems left. It shows that whereas many tests for independence have been proposed, very little has been done on the power function. This is particularly true for the 2 x 2 independence trial and the double dichotomy. Our main concern, therefore, is an examination of the power function for the test of independence.

In Section 1.5 of this chapter, the test on which we will be basing computations of power for all three cases is described. This test is the uniformly most powerful unbiased test, first proposed by Katz in 1942, and discussed in some detail by Tocher in 1950 and Sverdrup in 1953. Sections 1.6, 1.7, and 1.8 are devoted specifically to past treatment of the 2 x 2 independence trial, the 2 x 2 comparative trial, and the double dichotomy.
In order to evaluate power for the above mentioned unbiased test of independence, some properties of the conditional distribution given by (1.2.5) are described in Chapter 2, including moments, asymptotic distributions, and several approximations. It will be shown in Section 2.2 that the asymptotic distribution of (1.2.5) is normal, and this result will be the foundation and key tool in the study of the power function for the test of independence. The derivation of the asymptotic distribution is patterned after Feller's normal approximation for the binomial distribution, and the assumptions made are essentially the same as for the binomial case.

In Chapter 3, the exact power function for each of the Problems I, II, and III is given. The order of presentation begins in Section 3.1 with the 2 x 2 independence trial, and proceeds naturally to the 2 x 2 comparative trial and the double dichotomy in Sections 3.2 and 3.3, respectively. Several approximations to the power are given in these sections, using in the case of the 2 x 2 comparative trial an approximation given by Sillitto [21]. Theorem 3.4.B in Chapter 3 shows that, asymptotically, there is no difference in the power functions, for "corresponding" alternative hypotheses and suitable choice of marginal totals, in the 2 x 2 independence trial and the 2 x 2 comparative trial. As a consequence of this theorem, an additional approximation to power for the 2 x 2 comparative trial and the double dichotomy is proposed. Also in Section 3.4, the asymptotic power function for the $\chi^2$-test of independence is studied.

Finally, Chapter 4 serves to unify the results obtained in Chapter 3, and some of the approximations proposed in Chapter 3 are compared with the exact power. Rather extensive tables of exact power for each of the three cases are in the appendix. These exact computations provide the means for evaluating the adequacy of the various approximations.
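Since the conditional distribution (1.2.5) underlies everything that follows, a small numerical sketch may help fix ideas. It is an illustration only, not the machine computation used for the tables in this thesis: the function names `odds_ratio_t`, `n1_range`, and `conditional_pmf`, and the parameter values in the example, are assumptions introduced here.

```python
from math import comb


def odds_ratio_t(lam, pa, pb):
    """t = lam(1 - PA - PB + lam*PA*PB) / ((1 - lam*PA)(1 - lam*PB))."""
    return lam * (1 - pa - pb + lam * pa * pb) / ((1 - lam * pa) * (1 - lam * pb))


def n1_range(m1, m2, n):
    """Admissible values of n1 for fixed marginal totals."""
    return range(max(0, m1 + m2 - n), min(m1, m2) + 1)


def conditional_pmf(m1, m2, n, t):
    """Pr{n1 | m1, m2, n; t} in the form of (1.2.5), as a dict {n1: probability}."""
    weights = {
        k: comb(m1, k) * comb(n - m1, m2 - k) * t**k for k in n1_range(m1, m2, n)
    }
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Illustrative (hypothetical) parameter values for the double dichotomy.
lam, pa, pb = 1.2, 0.4, 0.5
t = odds_ratio_t(lam, pa, pb)

# Equivalent comparative-trial parameters p1 and p2.
p1 = lam * pb
p2 = pb * (1 - lam * pa) / (1 - pa)

pmf = conditional_pmf(6, 5, 12, t)         # under the alternative
null_pmf = conditional_pmf(6, 5, 12, 1.0)  # t = 1: ordinary hypergeometric
```

In this sketch `t` agrees with $p_1 q_2 / (p_2 q_1)$, and setting $t = 1$ recovers the hypergeometric probabilities, which is the sense in which independence corresponds to $t = 1$ in all three experiments.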
1.5 The Uniformly Most Powerful Unbiased Test for One-Sided Alternatives and Two-Sided Alternatives.

As noted in Section 1.4, Katz [9], Tocher [25], and Sverdrup [24] are responsible for the development of the tests to be described here.

Katz, in 1942, assumed that any alternative distribution to the null distribution for the 2 x 2 independence trial could be taken as that given in (1.2.5), i.e.

$$\Pr\{n_1 \mid m_1, m_2, n; \lambda\} = \frac{\binom{m_1}{n_1}\binom{n - m_1}{m_2 - n_1}\, t^{n_1}}{\sum_j \binom{m_1}{j}\binom{n - m_1}{m_2 - j}\, t^{j}}, \tag{1.5.1}$$

where

$$t = \frac{\lambda(1 - P_A - P_B + \lambda P_A P_B)}{(1 - \lambda P_A)(1 - \lambda P_B)}.$$

Thus, the test for independence amounts to testing $H_0: t = 1$ vs. $H_1: t \neq 1$ or, equivalently, $H_0: \lambda = 1$ vs. $H_1: \lambda \neq 1$. Considering first the special case

$$H_0: t = 1 \quad \text{vs.} \quad H_1: t = t_0,\ t_0 \neq 1,$$

and applying the Neyman-Pearson lemma for testing a simple hypothesis against a simple alternative, Katz finds that $H_0$ should be rejected for all values of $n_1$ such that $t_0^{\,n_1} \geq k$, where $k$ is some fixed constant chosen so that the test has size $\alpha$, or

$$n_1 \log t_0 \geq \log k. \tag{1.5.2}$$

For all $t < 1$, the inequality (1.5.2) is satisfied for all $n_1 \leq a$; for all $t > 1$, by all $n_1 \geq b$. Thus, if we wish to test the hypothesis $t = 1$ only with respect to one-sided alternative hypotheses $t < 1$ (or $t > 1$), the uniformly most powerful critical region is a tail of the conditional distribution (1.2.6), $n_1 \leq a$ (or $n_1 \geq b$).

For testing alternatives $t \neq 1$, no uniformly most powerful test exists. Katz, therefore, restricts the class of tests to those which are unbiased, in the expectation that within this smaller class of tests a uniformly most powerful test exists. Katz notes that a necessary condition for unbiasedness is that the power function have a minimum at $t = 1$, or that

$$\frac{\partial}{\partial t}\left[\frac{\sum_{n_1 \in w(m_1, m_2)} \binom{m_1}{n_1}\binom{n - m_1}{m_2 - n_1}\, t^{n_1}}{\sum_j \binom{m_1}{j}\binom{n - m_1}{m_2 - j}\, t^{j}}\right] = 0 \quad \text{at } t = 1, \tag{1.5.3}$$

where $w(m_1, m_2)$ is the critical region for fixed $m_1$, $m_2$, and $n$.
A little algebra reduces the necessary condition (1.5.3) for unbiasedness to the form

$$\sum_j \Pr\{j \mid m_1, m_2, n\} \sum_{n_1 \in w(m_1, m_2)} n_1 \Pr\{n_1 \mid m_1, m_2, n\} = \sum_j j \Pr\{j \mid m_1, m_2, n\} \sum_{n_1 \in w(m_1, m_2)} \Pr\{n_1 \mid m_1, m_2, n\}. \tag{1.5.4}$$

This requires that the mean value of $n_1$ in the critical region be the same as the mean value for the entire range of $n_1$. Thus, the critical region $w(m_1, m_2)$ must contain values above the mean and below the mean, such that [...]

$$\sum_{n_1 < a} \Pr\{n_1 \mid m_1, m_2, n\} + \epsilon_1 \Pr\{a \mid m_1, m_2, n\} + \epsilon_2 \Pr\{b \mid m_1, m_2, n\} + \sum_{n_1 > b} \Pr\{n_1 \mid m_1, m_2, n\} = \alpha \tag{1.5.6}$$

and

$$\sum_{n_1 < a} n_1 \Pr\{n_1 \mid m_1, m_2, n\} + a\,\epsilon_1 \Pr\{a \mid m_1, m_2, n\} + b\,\epsilon_2 \Pr\{b \mid m_1, m_2, n\} + \sum_{n_1 > b} n_1 \Pr\{n_1 \mid m_1, m_2, n\} = \alpha\, \frac{m_1 m_2}{n}, \tag{1.5.7}$$

where $a$, $b$, $\epsilon_1$, and $\epsilon_2$ are determined by these equations and $0 \leq \epsilon_1, \epsilon_2 < 1$.

In the one-sided case (testing $t = 1$ vs. $t$ [...] $1$, decreasing when $\lambda < 1$. Thus, the common best similar procedure for the test against all alternatives $\lambda < 1$, say, is the conditional test, for fixed $m_1$ and $m_2$, defined by

$$\begin{aligned}
w(n_1, m_1, m_2) &= 1, && n_1 < a(m_1, m_2), \\
w(a[m_1, m_2], m_1, m_2) &= \frac{\alpha - \sum_{n_1 < a(m_1, m_2)} \Pr\{n_1 \mid m_1, m_2, n\}}{\Pr\{a(m_1, m_2) \mid m_1, m_2\}}, \\
w(n_1, m_1, m_2) &= 0, && n_1 > a(m_1, m_2),
\end{aligned} \tag{1.5.14}$$

where $a(m_1, m_2)$, for fixed $m_1$ and $m_2$, is determined by

$$\sum_{n_1 < a(m_1, m_2)} \Pr\{n_1 \mid m_1, m_2, n\} \leq \alpha < \sum_{n_1 \leq a(m_1, m_2)} \Pr\{n_1 \mid m_1, m_2, n\}.$$

For fixed $m_1$ and $m_2$, this test says to reject $H_0$ if $n_1 < a(m_1, m_2)$ with probability 1, if $n_1 = a(m_1, m_2)$ with probability $\epsilon$, and to accept $H_0$ if $n_1 > a(m_1, m_2)$, where $a(m_1, m_2)$ and $\epsilon$ are determined such that

$$\sum_{n_1 < a} \Pr\{n_1 \mid m_1, m_2, n\} + \epsilon \Pr\{a \mid m_1, m_2, n\} = \alpha. \tag{1.5.15}$$

[...]

Alternative                   lambda        Type of dependence       Comparative trial   Double dichotomy
[...]
One-tailed alternative        lambda > 1    "positive" dependence    p1 > p2             pi1 > PA PB
Two-tailed alternative        lambda ≠ 1    dependence               p1 ≠ p2             pi1 ≠ PA PB

If we let $T_1$ be the one-tailed test defined by (1.5.8), and $T_2$ be the two-tailed test defined by (1.5.6) and (1.5.7), then Sverdrup has proven the following theorem:

Theorem: For the one-tailed tests, test $T_1$ is the uniformly most powerful unbiased test at the level of significance $\alpha$. For two-tailed tests, $T_2$ is the uniformly most powerful unbiased test at the level of significance $\alpha$.
Test $T_1$ is also uniformly most powerful among all tests in the wider class of all similar tests, and $T_1$ is uniformly most powerful among all tests for independence in the 2 x 2 independence trial.

Thus Tocher and Sverdrup have shown that properties of the test procedures given by Katz extend to the 2 x 2 comparative trial and the double dichotomy, with each of Tocher and Sverdrup obtaining similar results for one-tailed tests.

In the remainder of this chapter, a description will be given of some other tests which have been proposed for each of the three problems, beginning in the next section with the 2 x 2 independence trial, and discussing the 2 x 2 comparative trial and the double dichotomy in Sections 1.7 and 1.8.

1.6 2 x 2 Independence Trial

The situation in the 2 x 2 independence trial is best illustrated by Fisher's tea-tasting experiment. A lady is given $n$ cups of tea, $m_1$ of which had milk added first, then the tea, and $n - m_1$ cups with tea put in first, followed by the milk. The cups are presented to her in random order. Her problem is to sort the cups. If she is told the number of cups with milk added first, presumably she will guess that $m_1$ of the cups had milk added first, and the rest tea added first. It is therefore natural to regard both sets of marginal totals as fixed, in repeated sampling.

In the 2 x 2 independence trial, the null hypothesis is that the markings $A_1$ or $A_2$ are independent of the labelings $B_1$ or $B_2$. If the hypothesis is correct, the distribution of $n_1$, the number marked $A_1$ and $B_1$, given $m_1$ and $m_2$, is the hypergeometric distribution (1.2.6), i.e.

$$\Pr\{n_1 \mid m_1, m_2, n\} = \frac{\binom{m_1}{n_1}\binom{n - m_1}{m_2 - n_1}}{\binom{n}{m_2}}, \tag{1.6.1}$$

where the range of $n_1$ is given by

$$\max[0,\ m_1 + m_2 - n] \leq n_1 \leq \min[m_1, m_2].$$

Equation (1.6.1), for the null hypothesis, has been described by Yates as Fisher's "exact" distribution based on the hypergeometric probability law. It is a completely known one-variate discrete probability distribution.
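The randomized one-tailed rule of Section 1.5 can be applied directly to the null distribution (1.6.1). The following is a minimal sketch, assuming the lower tail is of interest (alternatives $\lambda < 1$); the function names `hypergeom_pmf` and `lower_tail_rule` and the numerical values are hypothetical, chosen only to illustrate how $a$ and $\epsilon$ might be found so that the size is exactly $\alpha$.

```python
from math import comb


def hypergeom_pmf(n1, m1, m2, n):
    """Null probability (1.6.1) of n1, given the marginal totals."""
    return comb(m1, n1) * comb(n - m1, m2 - n1) / comb(n, m2)


def lower_tail_rule(m1, m2, n, alpha):
    """Return (a, eps): reject H0 if n1 < a, and with probability eps if n1 == a.

    a is the first value at which the cumulative null probability exceeds
    alpha; eps makes the size of the test exactly alpha.
    """
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)
    cum = 0.0  # Pr{n1 < a} accumulated so far
    for a in range(lo, hi + 1):
        p = hypergeom_pmf(a, m1, m2, n)
        if cum + p > alpha:
            return a, (alpha - cum) / p
        cum += p
    raise ValueError("alpha must be less than 1")


# Illustrative values: 8 cups, 4 of each kind, nominal size 0.05.
a, eps = lower_tail_rule(4, 4, 8, 0.05)
size = sum(hypergeom_pmf(k, 4, 4, 8) for k in range(0, a)) + eps * hypergeom_pmf(a, 4, 4, 8)
```

With these illustrative totals the non-randomized tail $n_1 < a$ has probability $1/70$, well below $0.05$, so the randomization at $n_1 = a$ accounts for most of the size of the test.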
Except for specifying non-independence, the only clearly defined alternatives to independence are those proposed by Katz.

For one-tailed tests, the region of rejection of $H_0$ usually consists of extreme values in one of the tails, rejecting all those values in the tail whose probabilities sum to a number equal to or less than $\alpha$, given in advance. By employing a randomized decision rule, we can make the test of exact size $\alpha$.

However, for a two-tailed test, several alternative regions are possible, since no uniformly most powerful test exists. Some criterion must therefore be adopted in order to determine which of the several possible critical regions should be taken. The usual convention is to make the probability associated with each tail region sum to $\alpha/2$ or less. Because $n_1$ assumes only discrete values, it is not always possible to obtain critical regions of size exactly $\alpha$ in terms of integer values of $n_1$. In order to obtain exact size-$\alpha$ tests, we must adopt a procedure of randomization for a decision. In this way, we can put probability $\alpha/2$ in each tail, under $H_0$.

P. Armsen [1] suggests two other possible rules for the selection of points in the critical region, which are denoted by D2 and D3 in his paper.

D2. "Arrange the possible events in ascending order of the size of their probabilities under the null hypothesis; include in the $100\alpha\%$, two-tailed rejection region those events for which the cumulative sum of these ordered probabilities is smaller than or equal to $\alpha$."

D3. "Define F(E), or the 'first tail' probability, as the cumulative sum of the probabilities under the null hypothesis of all possible events which are more extreme, in the same direction, than given event E, including the probability of E itself.
Define S(E), or the 'second tail' probability, as the cumulative sum of probabilities, starting with that of the event most extreme in the opposite direction as compared with E, and cumulating up to but not exceeding the value of F(E). If and only if, F(E) + S(E) $\leq \alpha$, include E in the rejection region for the two-tailed $100\alpha\%$ level of significance."

One might also restrict the tests to a class of "nice" tests, and then look for one which is uniformly most powerful within this class, as was done in Section 1.5. In any case, a class of alternative distributions for the alternative hypotheses must be specified. It is clear that in order to decide which test is "best", one must examine the power function of the test. Katz, Tocher, and Sverdrup proposed their test with this consideration in mind.

Finally, as Armsen has pointed out, there are certain peculiarities in any of the first three definitions of the critical regions. It is possible, in certain cases, to construct, on the basis of his definitions, critical regions in which all points of the region come from one tail of the distribution.

The classic test of the null hypothesis is that given by Fisher [7]. It consists in computing (1.6.1) for the probability of the observed value and all values less likely. If the sum of these probabilities is equal to or less than $\alpha$, then $H_0$ is rejected. Extensive tables have been prepared by Finney [6], Latscha [10], and others, for small $n$, indicating the critical points for which the hypothesis of independence is to be rejected.

For small values of $n$, exact computation of the tail terms is practical, but for large $n$, it is quite tedious. One may, in this case, approximate the exact hypergeometric distribution by a normal distribution with a mean and variance corresponding to the mean and variance of (1.6.1).
$$E(n_1 \mid m_1, m_2, n) = \frac{m_1 m_2}{n} = \bar{n}_1, \tag{1.6.2}$$

$$V(n_1 \mid m_1, m_2, n) = \frac{m_1(n - m_1)\, m_2(n - m_2)}{n^2(n - 1)} = h^2. \tag{1.6.3}$$

Let

$$u = \frac{n_1 - \dfrac{m_1 m_2}{n}}{\sqrt{\dfrac{m_1(n - m_1)\, m_2(n - m_2)}{n^2(n - 1)}}} = \frac{n_1 - \bar{n}_1}{h}, \tag{1.6.4}$$

and write $u_\alpha$ for the $100\alpha$-percent point of the standard normal distribution, i.e., $u_\alpha$ is defined by the equation

$$\int_{u_\alpha}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du = \alpha. \tag{1.6.5}$$

Then, for a one-tailed test, reject $H_0$ if $u > u_\alpha$ or $u < u_{1-\alpha}$, depending on the appropriate case. For two-tailed tests, reject if $|u| > u_{\alpha/2}$. Very often a correction factor for continuity is included in $u$, i.e., the absolute value of the numerator of $u$ is reduced by $\tfrac{1}{2}$. If this is done, then for very large $n$, we get the usual $\chi^2$ test in a 2 x 2 table, with one degree of freedom, if we square the corrected $u$, i.e.,

$$\chi^2 = \frac{n\left(|n\, n_1 - m_1 m_2| - \dfrac{n}{2}\right)^2}{m_1(n - m_1)\, m_2(n - m_2)}.$$

We reject if $\chi^2 \geq \chi^2_\alpha$, where $\chi^2_\alpha$ is the $100\alpha$-percent point of $\chi^2$ with one degree of freedom.

1.7 2 x 2 Comparative Trial

To fix the notion of a 2 x 2 comparative trial, we give an example. Groups of $m_1$ men and $n - m_1$ women are randomly selected. They are then examined to see whether they are smokers or non-smokers. It is assumed that the proportion of men smokers is $p_1$, and the proportion of women smokers is $p_2$. In repeated sampling, it is supposed that we always take the same number of men in each sample, and the same number of women, but that the number of smokers in each sub-sample is free to vary.

In the 2 x 2 comparative trial, the hypothesis to be tested is the composite hypothesis $p_1 = p_2 = p$, against alternatives $p_1 \neq p_2$. The probability of observing $(n_1, n_2)$ is, in general,

$$\Pr\{(n_1, n_2) \mid p_1, p_2, m_1, n\} = b(n_1; m_1, p_1)\, b(n_2; n - m_1, p_2) = b(n_1; m_1, p_1)\, b(m_2 - n_1; n - m_1, p_2). \tag{1.7.1}$$

In contrast to the 2 x 2 independence trial, the probability distribution (1.7.1) is over a two-dimensional lattice of points. This distinguishes the problem from the one discussed previously.
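To make the two-dimensional lattice of (1.7.1) concrete, the sketch below enumerates the lattice and accumulates the distribution of $m_2 = n_1 + n_2$, which is the convolution of two binomials. The function names and the numerical values are illustrative assumptions, not part of the original computations.

```python
from math import comb


def binom_pmf(x, size, p):
    """Usual binomial probability b(x; size, p); zero outside 0..size."""
    if x < 0 or x > size:
        return 0.0
    return comb(size, x) * p**x * (1 - p) ** (size - x)


def lattice_probability(n1, n2, m1, n, p1, p2):
    """Pr{(n1, n2) | p1, p2, m1, n} as the product of binomials in (1.7.1)."""
    return binom_pmf(n1, m1, p1) * binom_pmf(n2, n - m1, p2)


def m2_distribution(m1, n, p1, p2):
    """Distribution of m2 = n1 + n2, summed over the whole lattice."""
    dist = [0.0] * (n + 1)
    for n1 in range(m1 + 1):
        for n2 in range(n - m1 + 1):
            dist[n1 + n2] += lattice_probability(n1, n2, m1, n, p1, p2)
    return dist


# Illustrative values: 5 men, 7 women, smoking proportions 0.3 and 0.6.
m1, n, p1, p2 = 5, 12, 0.3, 0.6
dist = m2_distribution(m1, n, p1, p2)
mean = sum(k * p for k, p in enumerate(dist))
var = sum((k - mean) ** 2 * p for k, p in enumerate(dist))
```

The mean and variance computed this way agree with $m_1 p_1 + (n - m_1) p_2$ and $m_1 p_1 q_1 + (n - m_1) p_2 q_2$, the exact moments of $m_2$ in the comparative trial.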
It is necessary for testing purposes to decide whether $m_2$ is an intrinsic parameter or a nuisance parameter. Several tests for the null hypothesis have been proposed.

Barnard's test [3] may be described as follows. "Taking rectangular axes in a plane, one can represent an observed result (as in Table I) as the point whose coordinates are $(n_1, n_2)$, where $0 \leq n_1 \leq m_1$, $0 \leq n_2 \leq n - m_1$ ($n_1$, $n_2$ integral). Call the totality of possible points a lattice diagram. The points in this lattice diagram will be given a total ordering. First, the same rank should be given to the point $(m_1 - n_1,\ n - m_1 - n_2)$ as to the point $(n_1, n_2)$. This is called the symmetry condition, or condition S. Secondly, two points which, respectively, have the same abscissa or the same ordinate as $(n_1, n_2)$, and which lie farther from the diagonal line joining $(0, 0)$ and $(m_1, n - m_1)$, should be considered as indicating a wider difference than $(n_1, n_2)$. This is called the 'convexity' condition, or condition C. Conditions S and C generate a partial ordering. To make it a total ordering, one further condition, called the maximum condition, or condition M, is imposed. Conditions C and S require that the points $(m_1, 0)$ and $(0, n - m_1)$ be given the lowest rank. Associate a function $P$ with these two points defined by

$$P(0, n - m_1, p) = p^{m_1}(1 - p)^{n - m_1} + p^{n - m_1}(1 - p)^{m_1}, \tag{1.7.2}$$

where $p_1 = p_2 = p$ in (1.7.1). Let

$$P_m(0, n - m_1) = \max_{0 \leq p \leq 1} P(0, n - m_1, p). \tag{1.7.3}$$

Considering only points for which $\dfrac{n_1}{m_1} < \dfrac{n_2}{n - m_1}$ (by symmetry), if the first $(K - 1)$ points $(a_1, b_1), (a_2, b_2), \ldots, (a_{K-1}, b_{K-1})$ in order of increasing rank have been chosen, and $(a_{K-1}, b_{K-1})$ is associated with the function

$$P(a_{K-1}, b_{K-1}, p) = P(a_{K-2}, b_{K-2}, p) + \binom{m_1}{a_{K-1}}\binom{n - m_1}{b_{K-1}}\left[p^{m_2}(1 - p)^{n - m_2} + p^{n - m_2}(1 - p)^{m_2}\right], \tag{1.7.4}$$

then the $K$th point, $(a_K, b_K)$, is that point, of all points $(n_1, n_2)$ permitted by the C condition, for which

$$P_m(n_1, n_2) = \max_{0 \leq p \leq 1} P(n_1, n_2, p) \tag{1.7.5}$$

[...]

Let

$$L_1 = \bigcup_{m_2 = 0}^{n} \left\{(n_1, m_2 - n_1):\ \frac{n_1 - \dfrac{m_1 m_2}{n} + \dfrac{1}{2}}{h} \leq -u_{\alpha/2}\right\}$$

and

$$L_2 = \bigcup_{m_2 = 0}^{n} \left\{(n_1, m_2 - n_1):\ \frac{n_1 - \dfrac{m_1 m_2}{n} - \dfrac{1}{2}}{h} \geq u_{\alpha/2}\right\},$$

where $h^2$ is the variance (1.6.3), and put $L = L_1 \cup L_2$. Then, if a one-tailed test is performed, $H_0$ is rejected at level $\alpha/2$ if the observed sample point falls in the appropriate component of the critical region, either $L_1$ or $L_2$, while for two-tailed tests, $H_0$ is rejected at level $\alpha$ if the sample point is in $L$.

Pearson notes that since the probability density is discrete, the "true" probability of rejection will quite generally be much smaller than $\alpha$. For each value of $m_2$ let

$$\alpha_{m_2} = \sum_{n_1 \in L} \Pr\{n_1 \mid m_1, m_2, n\}.$$

Then $\alpha_{m_2} \leq \alpha$ for each $m_2 = 0, 1, \ldots, n$, and hence

$$\sum_{m_2 = 0}^{n} \Pr\{m_2 \mid p, n\}\, \alpha_{m_2} \leq \alpha.$$

He then remarks that whereas the correction for continuity is appropriate for Problem I, "it is not helpful for the Problem II where we are concerned with a two-dimensional experimental probability set." He proposes, therefore, that the correction for continuity be omitted in (1.7.8). If this is done, then new critical regions $L_1'$ and $L_2'$ are obtained, where

$$L_1' = \bigcup_{m_2 = 0}^{n} \left\{(n_1, m_2 - n_1):\ \frac{n_1 - \dfrac{m_1 m_2}{n}}{h} \leq -u_{\alpha/2}\right\}$$

and

$$L_2' = \bigcup_{m_2 = 0}^{n} \left\{(n_1, m_2 - n_1):\ \frac{n_1 - \dfrac{m_1 m_2}{n}}{h} \geq u_{\alpha/2}\right\}.$$

Corresponding to $\alpha_{m_2}$, one has

$$\alpha'_{m_2} = \sum_{n_1 \in L'} \Pr\{n_1 \mid m_1, m_2, n\}, \qquad L' = L_1' \cup L_2'.$$

With this modification, [...] $-\, h(m_2)$ [...], where, following his notation, $\beta(m_2)$ denotes the conditional power function, and

$$h(m_2) = \sqrt{\frac{m_1(n - m_1)\, m_2(n - m_2)}{n^2(n - 1)}}\ \log t.$$

Next he approximates the distribution of $m_2$ by a normal distribution with mean

$$\mu = m_1 p_1 + (n - m_1) p_2$$

and variance

$$\sigma^2 = m_1 p_1(1 - p_1) + (n - m_1) p_2(1 - p_2) = \sigma_1^2 + \sigma_2^2;$$

$\mu$ and $\sigma^2$ are the exact mean and variance of $m_2$. This leads to the approximation

$$\beta \approx \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left(\frac{m_2 - \mu}{\sigma}\right)^2} \beta(m_2)\, dm_2, \tag{1.7.15}$$

where $\beta$ denotes the unrestricted power function. Since this still represents no essential simplification, he considers two approximations to (1.7.15).
The first consists in expanding $\beta(m_2)$ in a Taylor's series (about $\gamma$) to obtain

(1.7.16)  $\beta \sim \dfrac{1}{\sqrt{2\pi}\,\tau} \displaystyle\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left(\frac{m_2-\gamma}{\tau}\right)^2} \sum_{j=0}^{\infty} \dfrac{\beta^{(j)}(\gamma)}{j!}(m_2-\gamma)^j\,dm_2 = \beta(\gamma) + \dfrac{\beta''(\gamma)\,\tau^2}{2!} + \dfrac{3\,\beta^{(iv)}(\gamma)\,\tau^4}{4!} + \cdots$

and then taking $\beta(\gamma)$ as a first term approximation. Thus, his first approximation to power is given by

(1.7.17)  $\beta \sim 1 - \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-u_{\alpha/2} - h(\gamma)}^{u_{\alpha/2} - h(\gamma)} e^{-x^2/2}\,dx$.

The second approximation is derived through the method of approximate product-integration developed by R. E. Beard. This leads to his second approximation, given by

(1.7.18)  $\beta \sim \tfrac{1}{6}\,\beta(\gamma - \sqrt{3}\,\tau) + \tfrac{2}{3}\,\beta(\gamma) + \tfrac{1}{6}\,\beta(\gamma + \sqrt{3}\,\tau)$.

Patnaik claims that (1.7.18) gives a better approximation than the first three or four terms of the above-mentioned Taylor's series. Thus, Patnaik chooses (1.7.18) as his approximation to the power function for the test of independence in the 2 x 2 comparative trial.

It should be noted, however, that the assumption that $m_2$ is normally distributed is irrelevant as far as the first two terms of (1.7.16) are concerned, since

(1.7.19)  $\displaystyle\sum_{m_2=0}^{n} \Pr\{m_2 \mid p_1,p_2,m_1,n\} \sum_{j=0}^{\infty} \dfrac{\beta^{(j)}(\gamma)}{j!}(m_2-\gamma)^j = \beta(\gamma) + \dfrac{\beta''(\gamma)\,\tau^2}{2!} + \dfrac{\beta'''(\gamma)\left[(q_1-p_1)\sigma_1^2 + (q_2-p_2)\sigma_2^2\right]}{3!} + \cdots$

Both of Patnaik's approximations rely heavily on the approximation to the conditional distribution of $n_1$ given $m_1$, $m_2$, $n$, and $t$. In section 2.2, it will be shown that this approximation may lead to rather nonsensical results, and hence that both of Patnaik's approximations may be unreliable. The implication of the results in 2.2 on Patnaik's approximations will be discussed in section 3.2.

Sillitto [21] also obtained an approximation to power for the same test. The approximation is based on the arc sine transformation for a binomial variable. Let

(1.7.20)  $C(p_1,p_2,m_1) = \dfrac{\arcsin\sqrt{p_2} - \arcsin\sqrt{p_1}}{\sigma}$,

where $\sigma = \dfrac{1}{2}\sqrt{\dfrac{1}{m_1} + \dfrac{1}{n-m_1}}$, and the angle is measured in radians.
Then Sillitto gives as an approximation to power

(1.7.21)  $\beta \sim 1 - \left\{ \Phi\!\left[u_{\alpha/2} - C(p_1,p_2,m_1)\right] - \Phi\!\left[-u_{\alpha/2} - C(p_1,p_2,m_1)\right] \right\}$

for two-tailed tests, and

(1.7.22)  $\beta \sim 1 - \Phi\!\left[u_\alpha - C(p_1,p_2,m_1)\right]$

for one-tailed tests.

For comparisons of Patnaik's and Sillitto's approximations, see tables E.2.1 and E.2.2 in appendix E. It appears from the available data that Sillitto's approximation is much more satisfactory than Patnaik's.

One could also make a conditional test, for fixed $m_2$, putting probability $\alpha/2$ or less in each tail, and then compute the power function. G. C. Sekar [20] and others have given tables of the power function for the special case where $m_1 = n - m_1 = n/2$ and the critical region for each $m_2$ is symmetric, i.e., if $(n_1, m_2-n_1)$ is in the critical region, then so is $(m_2-n_1, n_1)$.

1.8 The Double Dichotomy

As in the previous two sections, an example of an experiment illustrating the situation in the double dichotomy is given.

A group of $n$ college students, selected at random from the totality of all college students, are classified according to sex and according to whether or not they are drinkers or non-drinkers. The result of the classification is represented in the form of Table V, along with the hypothetical proportions of all students having the double classification (Table VI):

TABLE V

                 Male      Female
Drinker          $n_1$     $n_3$       $m_1$
Non-drinker      $n_2$     $n_4$       $n-m_1$
                 $m_2$     $n-m_2$     $n$

TABLE VI

                 Male      Female
Drinker          $\pi_1$   $\pi_3$     $P_A$
Non-drinker      $\pi_2$   $\pi_4$     $1-P_A$
                 $P_B$     $1-P_B$     $1$

It is natural to regard both sets of marginal totals as free to vary in repeated experimentation. The hypothesis of independence takes the form

$H_0$: $\pi_1 = P_A P_B$.

If $H_0$ is true, the two classifications are independent. As previously noted, any alternative hypothesis may be expressed as

$H_1$: $\pi_1 = \lambda P_A P_B$,

where $\lambda$ satisfies the inequalities

$\max\!\left[0,\ \dfrac{P_A + P_B - 1}{P_A P_B}\right] \le \lambda \le \dfrac{1}{\max[P_A, P_B]}$.

The double dichotomy appears to have received the least attention of the three cases.
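The parametrization of the double dichotomy by $\lambda$ can be made concrete with a short sketch. It assumes that the remaining cell proportions are obtained from $\pi_1 = \lambda P_A P_B$ by subtraction from the marginal proportions of Table VI, and it computes the cross-product ratio $\pi_1\pi_4/(\pi_2\pi_3)$ as a measure of dependence. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
def lambda_bounds(PA, PB):
    """Admissible range of lambda so that all four cell proportions are >= 0."""
    lower = max(0.0, (PA + PB - 1.0) / (PA * PB))
    upper = 1.0 / max(PA, PB)
    return lower, upper


def cell_probabilities(lam, PA, PB):
    """Cell proportions implied by pi_1 = lam * PA * PB and the margins."""
    lower, upper = lambda_bounds(PA, PB)
    if not (lower <= lam <= upper):
        raise ValueError("lambda outside the admissible range")
    pi1 = lam * PA * PB
    pi2 = PB - pi1              # shares the marginal proportion PB with pi_1
    pi3 = PA - pi1              # shares the marginal proportion PA with pi_1
    pi4 = 1.0 - PA - PB + pi1   # remaining cell
    return pi1, pi2, pi3, pi4


def cross_product_ratio(lam, PA, PB):
    """pi_1 * pi_4 / (pi_2 * pi_3); equals 1 under independence (lam = 1)."""
    pi1, pi2, pi3, pi4 = cell_probabilities(lam, PA, PB)
    return (pi1 * pi4) / (pi2 * pi3)


# Hypothetical marginal proportions.
print(lambda_bounds(0.3, 0.4))
print(cell_probabilities(1.2, 0.3, 0.4))
print(cross_product_ratio(1.2, 0.3, 0.4))
```

Values of $\lambda$ above one give a cross-product ratio above one (positive association), and values below one give a ratio below one.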
For small samples, Fisher, as in the previous cases, asserts that the test should be performed regarding the marginal totals as fixed by the observed sample totals; in other words, the conditional test for fixed $m_1$ and $m_2$ is the appropriate test. Pearson [16] suggests that Barnard's method be extended, for small $n$, to the double dichotomy, and for large $n$ suggests the use of the normal approximation in a fashion analogous to the 2 x 2 comparative trial. There is no available literature on the power of any test in the double dichotomy.

1.9 Results

In concluding this survey of past work, one sees that while much has been done with 2 x 2 contingency tables, there are some aspects which have not been investigated. Thus, in the remainder of this thesis, the author proposes to do the following:

1. Investigate the properties of the conditional distribution $\Pr\{n_1 \mid m_1,m_2,n;t\}$ given by equation (1.2.5), including its asymptotic distribution and several approximations to it. As a part of this investigation, it is shown by Theorem (2.2.F) that there are at least two published results in the literature which may be invalid, namely, the results of Patnaik [14] on the power function for the test of independence in the 2 x 2 comparative trial, and Moore's [13] power function for a test for randomness in a sequence of two alternatives involving a 2 x 2 table.

2. Give the exact power functions for the test of independence based on the uniformly most powerful unbiased test described in Section 1.5, for each of the 2 x 2 independence trial, the 2 x 2 comparative trial, and the double dichotomy, and some approximations to these exact power functions.

3. Investigate some relations between these exact power functions, and prove that for large sample sizes there is little difference in power between the three cases for appropriately chosen alternative hypotheses and marginal totals.
Theorem (3.#.A) in Chapter 3 gives the limiting power function in the 2 x 2 independence trial, with modest conditions, and Theorem (3.#.B) shows that asymptotically the power functions for the test of independence in the 2 x 2 independence trial, 2 x 2 comparative trial, and double dichotomy, are equal. h. Study the asymptotic power function for the )(2-test. 5. Investigate the adequacy of various approximations by making comparisons between the various approximations. _34 —'-'IIP' - - .._——.- ”...... Chapter 2. An Investigation of 8 Probability Function —kl- 2.1 General Properties The conditional probability of n , given the two marginal totals ml and m2, and the dependence parameter t, 0 < t < 00, obtained in section 1.2 and given byfl.25) takes the form @)(mfi tnl Fran [m In n}tnl 2 n1 1’ 2’ ”‘1‘“ prim 1;|ml,m2:n t}: (ELM Wall) 2:11” = Z Pr )1 lh’mz’n} where max [0,ml + m2 - n] S n1 3 min [ml,m2], 0 g m1,m2 g n. In order to evaluate power for the various models, some of the properties of this distribution will be useful. Let a = -ml; b = —m2, c = n - m1 - m2 + 1. Then m 1 —r 7 (“if ”‘11) .3 z ”1 . WW .4 J “2‘3 P0 since t > 0, which confirms the obvious fact that "1' is a monotone increasing function of t. Putting k = 2 in (2.1.11), obtaining ' , 9 t P3 ' P2 “1 #2 - -————17——————— an“ 2 O noting that #2 = ”2' - (pl) , and using(2.l.9)and(2.l.12), it is easily seen that -us- gg- 3u2 21 + 2(ul)3 (2.1.13) (a) g? 22 = t = Eta u - u (2.1. 13) and ...,1,’= —3——-2——2 . t Using well-known linear relations between hypergeometric functions (see Snow [22], pages 31—32), one derives some recurrence relations between moments. For example, using the linear relation (2.1.1h) t(1-t) (a+1) (b+1) F(a+2, b+2; c+2; t) + (c+1)[c-(a+b+l)t] °F(a+l , b+1; c+l; t) = c(c+l) F(a,b;c;t) and (2.1.5) and (2.1.6), it follows, after some simplication, _that I ' - b 1 t ” (2.1.15) p; = p' (1) + pl(l) _ .2? 
t _ [c 1?: + ) ] 1 + ”i = 111112 t— [n - (m1+m2) (1—t)] “1' l - t ° I Using this form for p2 , one then obtains, m1m2 t - [n - (m l+m2)(l- -t) + (l- -t)u1] “'1 (2.1-16) P2: 1 — t Using the same relation (2.1.1h), of contiguity above, after some algebra, one finds that . [mlmzt + n - m1 - m2]p£ -[(n-l) - (m1+m2)(1-t)]ué p3 = 1 - t (2.1.17) and (2.1.18) P1: ={[m1m2t + 2(n-ml-m2)]ul. + [(m1-2)(m2-2)t 4- 3(n-m1-m2+3)+3t(m1+m2-S)-ll(l-t)]pz' - [(n-ml-m2-3) + (ml+m2+l)t]u3'} (l-t)_l Other recurrence forms may also be derived, using linear relations between hypergeometric functions. Using (2.1.lh) again, one finds the relation (2.1.19) m m t E inllml,m2,n;t} = 1 2 (l-t)[E(nl [ml-1,m2-l,n-1;t)-(m1+m2-1)]+n from which one can compute E'inllml,m2,n;t:} from smaller values of m1,m2, and n. As soon as pi is computed, we can obtain p2, u; , and a; by substituting this value into the recurrence relations given by (2.1.16),(2.l.l7), and (2.1.18) respectively, and once these moments are known, p3 and uh can be calculated using (2.1.9). The variance p2 could also be obtained using (2.1.12), which asserts that “$1337.“;- An example illustrating the computation of the first four moments about the origin, and the central moments p2,p3, and “h’ is given at the end of this section. The procedure just outlined for computed moments is still quite tedious, even for small values of m1,m2, and n. Therefore, we now consider approximations for the moments. Solving for p1 in (2.1.16), and p1 takes the form the relation between 92 (2.1.20) ' n-(ml+m2)(l-t)+ -(m1+m2)(l- t) +m1m2t #1 = 2 t- 1 ' 2(1't)"" ' "2 where the negative square root is taken if t > 1 and the positive root if t < 1. If t = 1, then we have ' m1m2 F1 = n ‘ Defining 11* by (n) (2.1.21) 2 + - n n-(ml+m2)(l~t) n-(m1+m2)(l-t) m! 
met l(n) ‘ mlm2 [: 2(t-1) + 2(1-t) + l-t and J\(;) in the same manner, except the negative square root replaces the positive square root, we obtain the inequalities m (2.1.22)(a) u; S 7‘-(;)n m12 if t g l, and (2.1.22)(b) p; 2 1 (a) 3%22 12 t 1, Iv using the fact that p2 2 0. It is easily verified that + - . h(n) and ' X(n) satisfy the quadratic equation in h(n) l(;) are the roots of the equation) ml _ m2 Elfg) " PA and BIB N as n —> 00, then (2.1.23) 4- 1 1‘(P +PB)(l-t) " (n) m> m [7% l (P +PB )(1- t)— 2t-l 2 '(P +PB )(l- t) R (n)n ———>oo> 37‘21)‘ PAPBIii-(P +PB)(1-t1]2tPAPB - 4—“ + 2(t-1) and 7s + and X- are roots of the equation x (l—P -P RP P (2.1.2h) A B+ A B) = 1(1— APA)(1- APB) r’ We recall from section 1.2 that this constant arose in deriving the conditional probability of n1, given m1:m2,n, IMPA, and PB. The implication of (2.1.22)(a),(2.1.22)(b), (2.1.23) is that P'1 + . (2.1.25) 11m — g i PAPB if t g 1 n —> con P" . 1 — . 11m —n 2 R PAPB 1f tgl. n—>oo In the next section, it will be shown that the equality sign holds in(2.1.25). Hence, for ml,m2, and n moderately large, "1' can be approximated very well by m m (2.1.26) 90:): 10:) I]; 2 if t g 1, _ _ - mlm 9(n)- 1\(n) —n—2 if t 2 1‘ Furthermore, it will be proven that, under the same hypotheses on the order of magnitude of m1 and m2, _ n —1 (2 1 27) 11 ”2 Z 1 . . m — = — n ->oon 1Ti i=1 where HI = KPAPB 172 = PB - K PAPB 1T3 = PA - A PAPB 7T1+=1-PA-PB+ kPAPB. Hence, as an approximation to p2 we have _’+ —1 A 1 (2.1.28) 112 = n '—'— ”1 i=1 _ where "i. is obtained from Tri by replacing PA and P]3 by m m + . 3']; and 11—2 respectively, and 7\ by h(n) if A _<_ l, and by 101) if 1 _>_1. ___ ___.. ...—... . -50- For small values of m1,m2, and n, the approximation (2.1.26) can be improved if we replace p2 by 32 in (2.1.20),i.e., we approximate pi by (2.1 .29) A. = n - (m1+m2)(l-t) + n-(m1+m2)(l—t) + mlm2t - A “1 ' 2(t-l) " " 2 t-l '1'-' "2 square root if t > 1. 
In (2.1.29), the positive square root is taken if $t < 1$, and the negative square root if $t > 1$.

If it is desired to approximate $\mu_2'$, $\mu_3'$, $\mu_4'$, $\mu_3$, and $\mu_4$, then the recurrence relations given in (2.1.17) and (2.1.18), together with (2.1.9), may be used with $\mu_1'$ replaced by $\hat{\mu}_1'$, being sure to take the correct root. However, these approximations, especially for $\mu_4'$ and $\mu_4$, will not be as close to the true values as $\hat{\mu}_1'$ is to $\mu_1'$, since a very small error in approximating $\mu_1'$ can build up to a sizable error when $\mu_4'$ is approximated.

For small values of $m_1$, $m_2$, and $n$, (2.1.1), (2.1.10), and (2.1.16) may be used to compute exact probabilities, moments about the origin, and the variance, using tables of the hypergeometric probability distribution. Roksar [19] has four-place tables of the hypergeometric distribution for selected values of $m_1$, $m_2$, and $n$, ranging to $n = 100$. To approximate the moments, (2.1.28) and (2.1.29) may be used.

We now give two examples illustrating some of the recurrence relations, approximations and other properties of the probability function (2.1.1).

Example 1. We take $m_1 = 6$, $m_2 = 8$, $n = 20$, $t = 2$. The individual probabilities, computed to nine decimal places, are given in Table VII, where the first six places are shown.

Table VII

$n_1$    $\Pr\{n_1 \mid 6,8,20;2\}$
0        .003 489
1        .047 852
2        .209 356
3        .372 188
4        .279 141
5        .081 204
6        .006 767

Exact computation of the moments about the origin and about the mean, using (2.1.10) and (2.1.9), yields the central moments

$\mu_2 = 1.07562455$,  $\mu_3 = -.05019623$,  $\mu_4 = 3.27492641$.

In the next section we will show that $n_1$ is asymptotically normally distributed. As an indication of the rate of convergence to the normal distribution, we compute

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = .002025$  and  $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 2.83061$.

These values compare very well with the values 0 and 3 for a normally distributed random variable. The two constants $\beta_1$ and $\beta_2$ are measures of the skewness and peakedness of a distribution.
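The figures of Example 1 can be checked directly from the definition of the probability function: each probability is proportional to the hypergeometric term multiplied by $t^{n_1}$. The following is a minimal sketch; the function names are illustrative and are not notation from the thesis.

```python
from math import comb


def conditional_pmf(m1, m2, n, t):
    """Distribution of n1 given both margins: weights C(m1,k) C(n-m1,m2-k) t^k,
    normalized over max(0, m1+m2-n) <= k <= min(m1, m2)."""
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)
    weights = {k: comb(m1, k) * comb(n - m1, m2 - k) * t**k
               for k in range(lo, hi + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def mean_and_central_moments(pmf):
    """Return the mean and the second, third and fourth central moments."""
    mean = sum(k * p for k, p in pmf.items())
    mu = {r: sum((k - mean) ** r * p for k, p in pmf.items()) for r in (2, 3, 4)}
    return mean, mu[2], mu[3], mu[4]


# Example 1: m1 = 6, m2 = 8, n = 20, t = 2.
pmf = conditional_pmf(6, 8, 20, 2)
mean, mu2, mu3, mu4 = mean_and_central_moments(pmf)
beta1 = mu3**2 / mu2**3   # skewness measure
beta2 = mu4 / mu2**2      # peakedness measure

for k in sorted(pmf):
    print(k, round(pmf[k], 9))
print(mean, mu2, mu3, mu4, beta1, beta2)
```

Because the weights are integers for integral $t$, the only floating-point step is the final division, so the printed probabilities are accurate well beyond the digits shown in Table VII.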
Using (2.1.21) we find that $\lambda_{(20)}\,m_1 m_2/n^2 = .1553778$, and so as a first approximation to $\mu_1'$ we have

$\theta_{(20)} = 3.107556$.

From (2.1.28), we find that $\hat{\mu}_2 = 1.018620$, so that the improved approximate value of $\mu_1'$ employing (2.1.29) is given by

$\hat{\mu}_1' = 3.144265$.

The higher moments about the origin are approximated by using the recursion relations between moments; thus, from (2.1.15), (2.1.17), and (2.1.18), we find successively that

$\hat{\mu}_2' = 10.9050$,  $\hat{\mu}_3' = 39.1503$,  $\hat{\mu}_4' = 104.8391$.

Except for $\mu_4'$, the approximations seem to be adequate.

Example 2. Let $m_1 = 12$, $m_2 = 15$, $n = 30$, $t = 6$. The exact probabilities $\Pr\{n_1 \mid 12,15,30;t\}$ are given in Table VIII. The procedure for computing the exact and approximate values was given in Example 1, so only the numerical data are presented here.

Table VIII

$n_1$    $\Pr\{n_1 \mid 12,15,30;6\}$        $n_1$    $\Pr\{n_1 \mid 12,15,30;6\}$
0        .0000 0000                          7        .0752 0799
1        .0000 0002                          8        .2051 1270
2        .0000 0016                          9        .3190 6421
3        .0000 0684                          10       .2650 6873
4        .0001 5827                          11       .1032 7353
5        .0020 8911                          12       .0137 6980
6        .0162 4864

Exact values:  $\mu_1' = 9.09946$,  $\mu_2' = 84.28220$,  $\mu_3' = 793.51303$,  $\mu_4' = 7{,}584.74691$,  $\mu_2 = 1.48202$,  $\mu_3 = -.38066$,  $\beta_1 = .04452$,  $\beta_2 = 2.9423$.

Approximate values:  $\theta_{(30)} = 9.00000$,  $\lambda_{(30)} = 1.5$,  $\hat{\mu}_1' = 9.09660$,  $\hat{\mu}_2' = 82.44000$,  $\hat{\mu}_3' = 791.03628$,  $\hat{\mu}_4' = 7{,}940.10768$.

2.2 Limiting Distributions and Approximations

The asymptotic properties of the probability function (2.1.1) will be of great importance in finding approximations for both the individual probabilities and the corresponding distribution function. In this section, three limit theorems, (2.2.D), (2.2.E) and (2.2.F), are given for (2.1.1).

Since (2.1.1) is of the form

(2.2.1)  $\Pr\{n_1 \mid m_1,m_2,n;t\} = k(m_1,m_2,n;t)\,\Pr\{n_1 \mid m_1,m_2,n\}\,t^{n_1}$,

where $\Pr\{n_1 \mid m_1,m_2,n\}$ is the ordinary hypergeometric function as given by (1.2.6), and $k(m_1,m_2,n;t)$ is the reciprocal of the sum of $\Pr\{n_1 \mid m_1,m_2,n\}\,t^{n_1}$ over all possible values of $n_1$, it would seem plausible that limit theorems for (2.1.1) would follow readily from corresponding limit theorems for $\Pr\{n_1 \mid m_1,m_2,n\}$.
Three such limit theorems have been established by Feller for the hypergeometric function. Th. 2.2.A. (Normal Convergence: Feller [5], page 180). m m If n—l- —>r,r—1-2» -—>s,(0 x , where h =\// 1 2 2 m1 2 , n (n-l) _ -1 n1 1an _ x n1 then ‘1 (2.2.2) Prinllmlmyn} N h #0511) where the symbol "rv " means that the ratio of the two sides v . a p—y—u. .. . -._ ....-.u... _ . .- ._-._--.. — .1- -55.. 1 -— 2 tends to unity, and 0' is the normal density (2v) 2 exp [%%— , Th. 2.2.B (Binomial Convergence: Feller [5], page 180). m1 ““1 If H- --> r > O, —E- -—-> l—r > 0, then for each fixed m2 m n ‘ (2.2.3) 11:: Pr {nllmrmyn} = (n5) r l (l-r)n12 n1 . Th. 2.2.0 (Poisson Convergence: Feller [5], page 180). In1““2 If -;;- --> v, 0 < v < 00, as m1,m2,n ———> 00, then n . e.v v 1 (2.2.4) 11m Pr inllmlmzn} = n! . I1 In the next few pages, similar theorems on convergence to the normal, binomial, and Poisson laws will be proven for (2.1.1), with convergence to the binomial and Poisson Laws following almost trivially from Theorems 2.2.B and 2.2.C., whereas convergence to the normal density will be proven by suitably modifying Feller's proof of the convergence of the binomial distribution to the normal distribution. Theorems 2.2.D and 2.2.E below are easy consequences or (2.2.3) and (2.2.H), and are therefore given first. n Th. 2.2.D If gfi —-> r > 0, nm1 —> l-r > 0, then for each fixed m2, m m2-n li P - = 2 (x Ill—F 1 - r n goo r{nllml,m2,n,t} (n1) tr 4- tl - r) tr 4- l - r Proof: In proving Th. 2.2.B , Feller has shown that n m -n m m n l n- m -n 2 1 2 ( 1 _ 1) < E; _ 2 1) (2°2'5) (n1) n n n n <1» {simian} <e>nl (2.2.6) (b) m2-n n1 (m2)(m1t - lt> (n-ml - m2-nl n m.) ). g Gastfefifijé- 1' Pr {nllml,m2,n;t} < (::)(3§)n1 (131) 1(1 - 33—2) 232890,? - ‘31:) J03“ - “TV For each fixed m.2 both of the sums in the denominators in 'J 1 ”n1 i .. ..-.____—. (2.2.7)(a) 112mm lPr{n1|m1.m2.nst} {(39 use) -1.) 
2 ] Similarly, it is easily shown that V (2.2.7)(b) lim Pr'in 1lm1,m2,n; t}< n->oo Thus, (2.2.7)(a) and (2.2.7)(b) result.' m m t Th.2.2.E If 4—2——- = vn —> v, m1t+n-ml m1,m2,n > 00, then n 'V 1 Lim Pr-in lm m ,n°t = 2—-X—- n_>oo l 1’ 2 ’ } n: mlt Proof: Let pn =Eitifitfi‘ . Pfi‘—-> 0 in such a way that m2pn -> v, and the Poisson limit theorem for the binomial distribution implies the conclusion of the theorem. One would probably suspect that convergence of (2.1.1), (2.2.6)(a) and (2.2.6)(b) are finite and positive, so that (3%) (rt)nl(1-r)mz-nl Zoo , and k -—>00 in such a way that hxk3 --> 0, and where p is held fixed. However, if p is free to vary but still hxk3 -—> 0 as n ->oo , k ->u> and n-k -—>oo, Feller's theorem still holds since the proof remains the same, except p is to be replaced by pn, h by hn = (npnqn) 2 , and xk by (k-npn)hn. In the derivation of the asymptotic -62- formula for the individual probabilities Prgnl\m1,m2,n;t} below, the quantities p1 and 132 to be introduced are not fixed, depending on n , but the notation will not show this. The quantity t in Pr inlIml,m2,n;t} is a positive constant and independent of m1,m2, and n. Write Pr {nllm1,m2,n;t} in the form b(n ;r ,p ) b(m -n ;n- ,p ) (2.2.13) Pr {n1|ml,m2,n;t} = 1 “1 1 2 1 m1 2 E: b(nl;m1,pl) b(m2-n1;n-m1,p2) n1 \ mm; “3W 2 zeta-ma“ 1: n1 piqz where p1 and p2 are chosen so that -——=t and P2q1 ! ‘; m2 = mlpl + (n-ml)p2. \- Letting l NIH 2 2 _ (2.2.1)-D h1 = (mlplql) , h2 = [(n-m1)p2q2] , h « hl + h2 xl,n1 = h1(n1-mlpl), X2,nl = h2[nl'(n'ml)p2]! X n1 h(n1 -mlp1) and noting that since 1112 - (n-m1)p2 = m 1P1 ’ (2.2.15) X2’m2_n1 = h2[(m2-n1) -(n-ml)p2] = ”1120111" lpl), two applications of the extended Feller—binomial-approximation yield (2.2.16) b(n13m12P1)b(m2'n13n'm1’p2)’NIh1%(xl,nl)h2d(x2,m27n1) as h ->O, hxg -—>0. This is true since h1 g h, so that l 3 ____ 3 3 hxn1 >0 obviously implies h1x1,n >0 and hx2 __>0. 1 'mz‘nl Conversely, if hlxi n —-->0, and hzxg m _n--->0, then ’ 1 ’ 2. 
l 3 3 1393 3 hxnl ——>0 follows from hlxl,n1 - hl (11 xnl h 3 3 _ _ (_2) 3 and h2x2,m2_n1 — h2 h xn1 , and the fact that h -—>o <::::j> hl and h2 ——>o. Using the definition of h and (2.2.15), (2.2.16) is easily transformed to l (2.2.17) b(nl;m1,pl)b(m2- 1;n-m1,p2) N (2n) 2 h1h2 ,5 (xnl) Ix ,, i i,nl and hence I ' - h (2.2.18) (2w) 2 @ b(nl;ml,pl)b(m2-nl;n-ml,p2)N h if (xnl) as h,hxi -—->0. Just as in Feller's treatment of the 1 binomial case, it follows that Fwy- ~6h- p 1 5 _h_ . . (2.2.19) 2 (277) 111112 b(n13m11Pl)b(m2‘nl :n'mlapz) n1: 0( B NZhfi{(xn)N§(x 1)-_§(x 1) c1 1 3* 5 °“ 5 as h, bx; , MB3 —> 00 Thus, if there exists tails n:L < ex and n1) B, with X 3 _ _ ___. 4. : > oo xB > 00 while still hx and th3 --> 0, such that the sum of the left handlside of (2.2.17) over these tail values converges to zero, then $- h 2 (21r) F132 b(n1;m1,p1)b(m2-n1;n-m1,p2) —> 1, and so n1 (2.2.20) Pr {rill m1,m2,n;t} n) h d(xn1) as h, hit: 1 >0. But if B > mlp1 and is such that XB --> 00 while hxp3 —> 0, then 2 b(nl;m1,p1)b(m2-n‘l;n-ml,p2) < b(Bimlml) X b(m2-n1;n-m1:P2) < b(Bgm1,pl)~ hlp'(x1,fl), and therefore -65- 1 ‘ h (2.2.21) ES (2")2 5132 b(n13m1,p1)b(m2—n1;n-m1.p2) nl>B ‘ i A hl «210 t: mxmxmnfiu + E ) mm) —> o, 1 . 5 h l- and similarly for Z (21f) F171; b(nl3mlip1)b(m2-nlin-m1:P2)o n1< 0L Therefore (2.2.19) is established. 5 The remaining question, since (2.2.19) is expressed in \ terms of m1,m2,n, and t only thru the parameters p1 and p2, 1 > o in is to try to interpret the conditions h, hxf’1 terms of m1,m2, and n for t fixed in (0,00). \' The condition h % 0 clearly implies that hl and 112 -—> 0, or mlplql ---> co and (n-m1)p2q2 --> co, and therefore mlpl, mlql, (n—ml)p2, (n-ml)q2 —> 00. But then 7 m2 = mlpl * (n‘m1)P2 7") a’ i and 1 (n-m2 = mlq1 + (n-m1)q2 ——-> oo ‘ Conversely “ m‘2 —> oo‘<’—_=) max. [m1p1, (n-m1)p2] —> a) 1‘ n—m2—> 00 :) max. 
[m1q1, (n—m1)q2] -—> oo __—-—-.— - - P q and with t held constant and equal to 1 2 , if m and n-m2 —-> 00 then either m p2q1 2 lplq1 or (n-m1)p2q2 -—> 00. Finally, if both m1 and n—m1 --> oo , then obviously mlplq1 and (n-m1)p2q2 -—-> 00. Hence, it remains only to determine the values of n1 such that hxn 3 --> O, or 1 3 h x _ > 0. 2 2,m2 n1 In general it may be rather equivalently, those n1 such that both h x 3 and \ l l,n1 difficult to determine the values of n1 for which hxn 3 -—-> 0 , but for the important 1 special case when m1 and m2 are both of the same order of magnitude as n , it is very easy. Thus, suppose El > P n A ’ plq > PB. Then the two conditions, t = “—g (0 < t < co) and m2 = mlpl + (n-m1)p2 imply l qu 2— n __ 2: n —>C nhl - m1P1Q1 > c1 ’ nh2 (n-m1)p2q2 2 ' where c1 and c2 are finite positive constants and consequently 3 __ m1 hxnl = h1+(n1-mlpl)3 — [(B-plql) 3 n and hence hx 3 2 _ n-m _ (n - p ) l + [0‘a‘l’P2q2] ] __l_E%_l__ n ——9 0 if and only if -l-§-—l- -—-> 0, 1 I1 which is the same condition as in Feller's binomial case. The above results are collected in the following theorem. Th.2.2.F. If «1 and {31 vary so that th3 —> o and 1 ‘31 Pr inl|m1,m2,n;t} u h d (xnl). uniformly for all 0<1 < n1 < B1 , and ‘31 (2.2.22) Z Pr-{nllmlmlz’nflz} N Q(x51+ l) " @(X o(1_% ). n1= °<1 2 The last assertion of Th.2.2.F follows exactly as in Feller. Finally, it follows, again as in Feller, that for every fixed a < b, (2.2.23) Pr {a g h(nl-mlpl) g b} —> § (lb) - t (a). The quantities p1 and p2 have not yet been expressed in terms of m1,m2,n, and t. Solving first for p‘2 in terms 01‘ p1 and t, one has p2 = p1(p1 4- qlt)-1, and then solving for p1 in the resulting quadratic equation '1 m2 = 1111131 + (n-m1)p1(pl+q1t) , it is found that -1 7\ + m2 m A + m1 m1 (2.2.219 p1 = 49..., p2 = 1, 7%; is to be replaced by Rub] where Z and ;K(n) are given by (2.1.21), and satisfy the equation , 1 _ ii _ i ., &) up n n . Routine algebra ( _ A2m1>2<1_ firs—"12> , shows that. 
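putting $\sigma^2 = h^{-2}$ and omitting the $+$ and $-$ signs,

(2.2.25)  $\sigma^{-2} = n^{-1} \displaystyle\sum_{i=1}^{4} \dfrac{1}{\pi_i'}$,

where

$\pi_1' = \lambda_{(n)} \dfrac{m_1}{n}\dfrac{m_2}{n}$,  $\pi_2' = \dfrac{m_2}{n}\left(1 - \lambda_{(n)}\dfrac{m_1}{n}\right)$,  $\pi_3' = \dfrac{m_1}{n}\left(1 - \lambda_{(n)}\dfrac{m_2}{n}\right)$,  and  $\pi_4' = 1 - \dfrac{m_1}{n} - \dfrac{m_2}{n} + \lambda_{(n)}\dfrac{m_1}{n}\dfrac{m_2}{n}$,

so that the asymptotic mean and variance, expressed as functions of $m_1$, $m_2$, $n$, and $t$, are given by

(2.2.26)  $\theta_{(n)} = m_1 p_1 = \lambda_{(n)}\dfrac{m_1 m_2}{n}$  and  $\sigma^2 = n\left[\displaystyle\sum_{i=1}^{4} \dfrac{1}{\pi_i'}\right]^{-1}$.

Therefore, if $\dfrac{m_1}{n} \to P_A$, $\dfrac{m_2}{n} \to P_B$, $0 < P_A, P_B < 1$,

(2.2.27)  $\displaystyle\lim_{n\to\infty} \dfrac{\theta_{(n)}}{n} = \lambda P_A P_B$,  $\displaystyle\lim_{n\to\infty} \dfrac{\sigma^2}{n} = \left[\sum_{i=1}^{4} \dfrac{1}{\pi_i}\right]^{-1}$,

where $\lambda = \lim_{n\to\infty} \lambda_{(n)}$, verifying the assertions made in section 2.1 in equations (2.1.25) and (2.1.27), with $\pi_i$ as defined in (2.1.27).

Finally, in concluding this chapter, several approximations to the probability function (2.1.1), based on the limit theorems established above, are given below. If $n$ is large and $\dfrac{m_1}{n} = r$, with $m_2$ small as compared to $n$, then the binomial approximation

(2.2.28)  $\Pr\{n_1 \mid m_1,m_2,n;t\} \sim \dbinom{m_2}{n_1}\left(\dfrac{rt}{rt+1-r}\right)^{n_1}\left(\dfrac{1-r}{rt+1-r}\right)^{m_2-n_1}$

should be suitable. If $\dfrac{m_1}{n}$ is small and $v_n = \dfrac{m_1 m_2 t}{m_1 t + n - m_1}$ is of moderate magnitude, then the Poisson approximation

(2.2.29)  $\Pr\{n_1 \mid m_1,m_2,n;t\} \sim \dfrac{e^{-v_n}\,v_n^{\,n_1}}{n_1!}$

may be used. Finally, if $m_1$, $m_2$, $n-m_1$, and $n-m_2$ are all moderately large, then the normal approximation

(2.2.30)  $\Pr\{n_1 \mid m_1,m_2,n;t\} \sim \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\dfrac{1}{2}\left(\dfrac{n_1 - \theta_{(n)}}{\sigma}\right)^2\right]$

will give satisfactory results. It is recommended that Patnaik's approximation not be used.

Chapter 3. Power for the Test of Independence

3.1 2 x 2 Independence Trial

In all subsequent discussion of power functions, the power will be evaluated for the unbiased test described in section 1.5. For the 2 x 2 independence trial, the exact power of this test is given by

(3.1.1)  $P_n(\lambda,P_A,P_B,m_1,m_2) = \dfrac{\displaystyle\sum_{n_1 \in w(m_1,m_2)} \binom{m_1}{n_1}\binom{n-m_1}{m_2-n_1} t^{n_1}}{\displaystyle\sum_{n_1} \binom{m_1}{n_1}\binom{n-m_1}{m_2-n_1} t^{n_1}} = \dfrac{\displaystyle\sum_{n_1 \in w(m_1,m_2)} \Pr\{n_1 \mid m_1,m_2,n\}\,t^{n_1}}{\displaystyle\sum_{n_1} \Pr\{n_1 \mid m_1,m_2,n\}\,t^{n_1}}$,

where $t = \dfrac{\lambda(1 - P_A - P_B + \lambda P_A P_B)}{(1-\lambda P_A)(1-\lambda P_B)}$ and $w(m_1,m_2)$ is the region of rejection of $H_0$, for fixed $m_1$ and $m_2$, defined by equations (1.5.6) and (1.5.7).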
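A minimal sketch of how (3.1.1) can be evaluated for small margins is given below. Equations (1.5.6) and (1.5.7), which define the rejection region of the unbiased test, are not reproduced in this section, so the region used here is a hypothetical, non-randomized set of values of $n_1$ chosen only to show the mechanics; the function names are likewise illustrative.

```python
from math import comb


def t_from_lambda(lam, PA, PB):
    """The parameter t expressed through lambda, P_A and P_B as in (3.1.1)."""
    return lam * (1.0 - PA - PB + lam * PA * PB) / ((1.0 - lam * PA) * (1.0 - lam * PB))


def conditional_power(m1, m2, n, t, region):
    """Ratio of sums in (3.1.1): probability, given both margins, that n1
    falls in `region` when the dependence parameter equals t."""
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)
    weights = {k: comb(m1, k) * comb(n - m1, m2 - k) * t**k
               for k in range(lo, hi + 1)}
    return sum(weights[k] for k in region if k in weights) / sum(weights.values())


# Hypothetical rejection region for m1 = 6, m2 = 8, n = 20 (NOT the region
# of (1.5.6)-(1.5.7); it is used only to illustrate the computation).
region = {0, 5, 6}
size = conditional_power(6, 8, 20, 1, region)    # t = 1 corresponds to H0
power = conditional_power(6, 8, 20, 2, region)   # alternative t = 2
print(size, power)
```

With $t = 1$ the weights reduce to hypergeometric terms, so the first call returns the conditional size of the chosen region, and the second call returns its conditional power at $t = 2$.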
The notation $P_n(t \mid m_1,m_2)$ will also be used to denote the power function in the 2 x 2 independence trial, since it is much easier to tabulate power as a function of $t$. Using tables of the hypergeometric distribution, (3.1.1) is easily evaluated for small values of the parameters. Notice that

(3.1.2)  $P_n(\lambda,P_A,P_B,m_1,m_2) = P_n(\lambda,P_A,P_B,m_2,m_1) = P_n(\lambda,P_A,P_B,n-m_1,n-m_2)$,

using the symmetry of $\Pr\{n_1 \mid m_1,m_2,n\}$.

When $n$ is large, and $m_1$ and $m_2$ are both fairly large, the computation of (3.1.1) becomes laborious. Therefore, there is a need of approximating this expression. Fortunately, such approximations may be immediately obtained from the corresponding approximations to $\Pr\{n_1 \mid m_1,m_2,n;t\}$. If all of $m_1$, $m_2$, $n-m_1$, $n-m_2$ are large enough to require approximations, but are of moderate magnitude, then the normal approximation (2.2.30) may be used. Employing a $\tfrac{1}{2}$ factor for continuity, this approximation becomes

(3.1.3)  $P_n(\lambda,P_A,P_B,m_1,m_2) \sim \Phi\!\left(\dfrac{a - \frac{1}{2} - \theta_{(n)}}{\sigma}\right) + \varepsilon_1\left[\Phi\!\left(\dfrac{a + \frac{1}{2} - \theta_{(n)}}{\sigma}\right) - \Phi\!\left(\dfrac{a - \frac{1}{2} - \theta_{(n)}}{\sigma}\right)\right] + 1 - \Phi\!\left(\dfrac{b + \frac{1}{2} - \theta_{(n)}}{\sigma}\right) + \varepsilon_2\left[\Phi\!\left(\dfrac{b + \frac{1}{2} - \theta_{(n)}}{\sigma}\right) - \Phi\!\left(\dfrac{b - \frac{1}{2} - \theta_{(n)}}{\sigma}\right)\right]$

$\qquad = (1-\varepsilon_1)\,\Phi\!\left(\dfrac{a - \frac{1}{2} - \theta_{(n)}}{\sigma}\right) + \varepsilon_1\,\Phi\!\left(\dfrac{a + \frac{1}{2} - \theta_{(n)}}{\sigma}\right) + 1 - (1-\varepsilon_2)\,\Phi\!\left(\dfrac{b + \frac{1}{2} - \theta_{(n)}}{\sigma}\right) - \varepsilon_2\,\Phi\!\left(\dfrac{b - \frac{1}{2} - \theta_{(n)}}{\sigma}\right)$,

where $a$, $b$, $\varepsilon_1$, and $\varepsilon_2$ are determined by equations (1.5.6) and (1.5.7), and $\theta_{(n)}$ and $\sigma^2$ are given by (2.2.26).

If $m_2$ is small compared to $n$, the binomial approximation (2.2.28) can be used, while if both $m_1$ and $m_2$ are small relative to $n$, the Poisson approximation (2.2.29) is suitable. The summations may be performed with the aid of tables of the incomplete beta function or the incomplete gamma function. No computations of the power function are included for the Poisson and binomial approximations.

One can also approximate the power for the test of independence by using the test procedure described in section 1.6, based on assuming a normal approximation under the null hypothesis. The test consists of rejecting $H_0$ when $|u| > u_{\alpha/2}$, where $u$ and $u_{\alpha/2}$ are defined by (1.6.4) and (1.6.5) respectively.
Evaluating the power of this test assuming the normal approximation (2.2.30) for (2.1.1) yields

(3.1.4)  $P_n(\lambda,P_A,P_B,m_1,m_2) \sim 1 - \Phi\!\left(\dfrac{h\,u_{\alpha/2} + (1-\lambda_{(n)})\frac{m_1 m_2}{n}}{\sigma}\right) + \Phi\!\left(\dfrac{-h\,u_{\alpha/2} + (1-\lambda_{(n)})\frac{m_1 m_2}{n}}{\sigma}\right)$,

where $h^2 = \dfrac{m_1(n-m_1)\,m_2(n-m_2)}{n^2(n-1)}$, $\lambda_{(n)}$ is given by (2.1.21), and $\sigma^2$ by (2.2.26).

Exact values have been computed from (3.1.1) for various values of $m_1$, $m_2$, $n$, and $t$, and compared with approximate values from (3.1.3) and (3.1.4). These exact and approximate values may be found in the appendix in tables D.3.1, D.3.2, and D.3.3.

In section 3.4 there is further discussion of the power function (3.1.1), including the relation between the power function for the 2 x 2 independence trial and those for the 2 x 2 comparative trial and double dichotomy. In Chapter 4, the adequacy of the approximations (3.1.3) and (3.1.4) is discussed.

3.2 2 x 2 Comparative Trial

The probability distribution of the sample point is

(3.2.1)  $\Pr\{n_1,m_2 \mid m_1,n,\lambda\} = \dbinom{m_1}{n_1} p_1^{\,n_1}(1-p_1)^{m_1-n_1}\dbinom{n-m_1}{m_2-n_1} p_2^{\,m_2-n_1}(1-p_2)^{n-m_1-m_2+n_1}$,

where $p_1 = \lambda P_B$ and $p_2 = \dfrac{P_B(1-\lambda P_A)}{1-P_A}$. Using the unbiased test procedure discussed in section 1.5, the exact power function for testing $H_0$: $p_1 = p_2$ against $H_1$: $p_1 \ne p_2$, or, since

$t = \dfrac{p_1(1-p_2)}{p_2(1-p_1)} = \dfrac{\lambda(1-P_A-P_B+\lambda P_A P_B)}{(1-\lambda P_A)(1-\lambda P_B)}$,

for testing $H_0$: $\lambda = 1$ against $H_1$: $\lambda \ne 1$, is

(3.2.2)  $P_n(\lambda,P_A,P_B,m_1) = \displaystyle\sum_{m_2=0}^{n}\ \sum_{n_1 \in w_{m_1}(m_2)} \Pr\{n_1,m_2 \mid m_1,n,\lambda\}$,

where $w_{m_1}(m_2)$ is the critical region with $m_1$ fixed. It is convenient to use the notation $P_n(\lambda,P_A,P_B,m_1)$ to denote the power function, but the notation $P_n(p_1,p_2,m_1)$ will also be used to indicate that the power function is a function of $p_1$ and $p_2$ when it is desired to think in terms of the 2 x 2 comparative trial by itself. Since

(3.2.3)  $\Pr\{m_2 \mid m_1,n,\lambda\} = \displaystyle\sum_{n_1} \Pr\{n_1,m_2 \mid m_1,n,\lambda\} = (1-p_1)^{m_1}\,p_2^{\,m_2}(1-p_2)^{n-m_1-m_2} \sum_{n_1} \binom{m_1}{n_1}\binom{n-m_1}{m_2-n_1} t^{n_1}$

and

(3.2.4)  $\Pr\{n_1,m_2 \mid m_1,n,\lambda\} = \Pr\{n_1 \mid m_1,m_2,n,\lambda\}\,\Pr\{m_2 \mid m_1,n,\lambda\}$,

where $\Pr\{n_1 \mid m_1,m_2,n,\lambda\}$ is given by (1.2.5), one sees that the expression in (3.2.2) for the exact power function may be rewritten as

(3.2.5)  $P_n(\lambda,P_A,P_B,m_1) = \displaystyle\sum_{m_2=0}^{n} \Pr\{m_2 \mid m_1,n,\lambda\} \sum_{n_1 \in w_{m_1}(m_2)} \Pr\{n_1 \mid m_1,m_2,n,\lambda\} = \sum_{m_2=0}^{n} \Pr\{m_2 \mid m_1,n,\lambda\}\,P_n(\lambda,P_A,P_B,m_1,m_2)$,

where $P_n(\lambda,P_A,P_B,m_1,m_2)$ is the power function in the 2 x 2 independence trial. This last form of the power function is interesting but not very useful for computing. In disguised form, (3.2.3) is the convolution of two independent random variables $X$ and $Y$, with $X$ and $Y$ distributed binomially, $b(n_1;m_1,p_1)$ and $b(m_2-n_1;n-m_1,p_2)$ respectively.

It is thus seen that one must evaluate the exact power function either by forming sums of products of binomial probabilities, or by finding suitable approximations. The tabled exact values in the appendix of the power function (3.2.2) were computed by forming such sums, and three approximations are suggested below; the first has already appeared in the literature, whereas the last two have not.

First, there is Sillitto's approximation (1.7.21), which is repeated for the sake of completeness. Letting

$C(p_1,p_2,m_1) = \dfrac{\sin^{-1}\sqrt{p_2} - \sin^{-1}\sqrt{p_1}}{\dfrac{1}{2}\sqrt{\dfrac{1}{m_1} + \dfrac{1}{n-m_1}}}$,

Sillitto's approximation is

(3.2.6)  $P_n(p_1,p_2,m_1) \sim 1 - \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-u_{\alpha/2} - C(p_1,p_2,m_1)}^{u_{\alpha/2} - C(p_1,p_2,m_1)} e^{-t^2/2}\,dt$,

where the angles are measured in radians.

A second approximation, which is considerably easier to evaluate than Sillitto's but perhaps not as good, may be obtained as follows. Let $u = \dfrac{x_1 - x_2}{\sigma}$, where $x_1 = \dfrac{n_1}{m_1}$, $x_2 = \dfrac{m_2-n_1}{n-m_1}$, and $\sigma^2 = \dfrac{p_1 q_1}{m_1} + \dfrac{p_2 q_2}{n-m_1}$. Then $E(u) = \dfrac{p_1 - p_2}{\sigma}$ and $V(u) = 1$. If $H_0$ is true, $E(u) = 0$. Assume that $u$ is normally distributed. If $H_0$ is rejected whenever $|u| > u_{\alpha/2}$, then an approximation to power is given by

(3.2.7)  $P_n(p_1,p_2,m_1) \sim 1 - \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-u_{\alpha/2} - E(u)}^{u_{\alpha/2} - E(u)} e^{-t^2/2}\,dt$.

To obtain the third approximation, we assume the normal approximation (3.1.4) for the conditional power function $P_n(\lambda,P_A,P_B,m_1,m_2)$ and expand it as a function of $m_2$ in a Taylor series about the mean $\gamma$ of $m_2$, where $\gamma = m_1 p_1 + (n-m_1)p_2$. Then from (3.2.5), one has

(3.2.8)  $P_n(\lambda,P_A,P_B,m_1) \sim \displaystyle\sum_{m_2=0}^{n} \Pr\{m_2 \mid m_1,n,\lambda\}\left[P_n(\lambda,P_A,P_B,m_1,\gamma) + P_n'(\lambda,P_A,P_B,m_1,\gamma)(m_2-\gamma) + \dfrac{P_n''(\lambda,P_A,P_B,m_1,\gamma)(m_2-\gamma)^2}{2!} + \cdots\right]$

$\qquad = P_n(\lambda,P_A,P_B,m_1,\gamma) + \dfrac{P_n''(\lambda,P_A,P_B,m_1,\gamma)\,\tau^2}{2!} + \cdots$,

where $\tau^2 = m_1 p_1 q_1 + (n-m_1)p_2 q_2$, and $P_n^{(j)}(\lambda,P_A,P_B,m_1,\gamma)$ is the $j$th derivative of $P_n(\lambda,P_A,P_B,m_1,m_2)$ with respect to $m_2$, evaluated at $m_2 = \gamma$. Then as a first term approximation, one has

(3.2.9)  $P_n(\lambda,P_A,P_B,m_1) \sim P_n(\lambda,P_A,P_B,m_1,\gamma)$.

It will be recalled from section 1.7 that Patnaik also gave an approximation to the power function. In view of Theorem 2.2.F and the remarks preceding and following the proof of the theorem, it is seen that there is considerable doubt as to the validity of the approximation. Perhaps the approximation can be justified on the grounds that as the sample size gets larger one is interested in only those values of $t$ near one (since for any fixed value of $t \ne 1$ the power of the test for independence tends to 1 as the integers $m_1 \to \infty$, $n \to \infty$, which means that the test is consistent), in which case the mean and variance of Patnaik's approximation to the conditional distribution of $n_1$, given $m_1$, $m_2$, $n$, $t$, may not differ significantly from the true mean and variance. Nevertheless, there is nothing in the way of simplicity in his approximation which recommends its use over (3.2.9), which is based on sound considerations.

3.3 Double Dichotomy

For the double dichotomy classification, the probability of observing the sample point $(n_1,m_1,m_2)$, given $\lambda$, $P_A$, $P_B$, and $n$, is

(3.3.1)  $\Pr\{n_1,m_1,m_2 \mid n,\lambda\} = b(m_1;n,P_A)\;b(n_1;m_1,\lambda P_B)\;b\!\left(m_2-n_1;\,n-m_1,\,\dfrac{P_B(1-\lambda P_A)}{1-P_A}\right)$.

[...] as $n \to \infty$, and also the ratio of $P_n(\lambda,P_A,P_B)$ and $P_n(\lambda,P_A,P_B,m_1,m_2)\big|_{m_1=nP_A,\;m_2=nP_B}$ goes to 1 as $n \to \infty$.
This suggests several approximations for $P_n(\lambda,P_A,P_B)$, namely, the exact power function $P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A}$ in the 2 x 2 comparative trial and any of the three approximations in 3.2 for $P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A}$. Also, one may use as an approximation the exact power function $P_n(\lambda,P_A,P_B,m_1,m_2)\big]_{m_1=nP_A,\,m_2=nP_B}$ for the test of independence in the 2 x 2 independence trial, evaluated for marginal totals equal to the expected values of $m_1$ and $m_2$. In appendix D.1, $P_n(\lambda,P_A,P_B)$, $P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A}$, and $P_n(\lambda,P_A,P_B,m_1,m_2)\big]_{m_1=nP_A,\,m_2=nP_B}$ are compared for several values of $n$, $P_A$, $P_B$, and $\lambda$. Finally, one can use the normal approximation (3.1.4), with $\lambda(n)$ replaced by $\lambda$. Thus, as a very easily computed approximation to $P_n(\lambda,P_A,P_B)$, we have

$$P_n(\lambda,P_A,P_B) \approx \Phi\!\left(\frac{-u_{\alpha/2}\,h + (1-\lambda)\,nP_AP_B}{\sigma}\right) + 1 - \Phi\!\left(\frac{u_{\alpha/2}\,h + (1-\lambda)\,nP_AP_B}{\sigma}\right). \tag{3.3.5}$$

The approximate values given by (3.3.5) are compared with the exact values as computed from (3.3.2) for a large number of cases, and judging from the simplicity of the approximation and its accuracy, (3.3.5) should be quite adequate. These comparisons may be found in appendix D.

3.4 Asymptotic Power

We shall be concerned in this section with an investigation of the asymptotic power function for the test of independence in the 2 x 2 independence trial, the 2 x 2 comparative trial, and the double dichotomy. The results presented in this section are not surprising; rather, they confirm what statisticians have believed for some time about the nature of the power function for the test of independence in 2 x 2 contingency tables for large sample sizes, i.e., that there is probably very little difference in power for the test of independence in the 2 x 2 independence trial, the 2 x 2 comparative trial, and the double dichotomy. Thus, our main contribution to the theory of the 2 x 2 contingency table given in this section is the limiting power function for the test of independence in the 2 x 2 independence trial.
It follows almost trivially from this result that, asymptotically, power for the test of independence is the same for each of the three possible experimental situations leading to the presentation of data in the form of a 2 x 2 contingency table. In turn, the limiting power function in the 2 x 2 independence trial is almost a trivial consequence of Th. 2.2.F on the asymptotic distribution of the conditional distribution of $n_1$, given $m_1$, $m_2$, $n$, and $t$. The main results of this section are contained in Theorems 3.4.A and 3.4.B.

We recall that the hypothesis of independence takes the form $H_0: t = 1$, while any alternative to independence is given by $H_1: t \neq 1$; $H_0$ and $H_1$ may be expressed also by $H_0: \lambda = 1$ vs. $H_1: \lambda \neq 1$. Under the null hypothesis (by Th. 2.2.F) the conditional distribution of $n_1$, given $m_1$, $m_2$, and $n$, is asymptotically normal with mean $\frac{m_1 m_2}{n}$ and variance $h^2 = \frac{m_1 m_2 (n-m_1)(n-m_2)}{n^2(n-1)}$ if the conditions of Th. 2.2.F, with $t = 1$, are satisfied. This implies that, asymptotically, the unbiased test for independence described in 1.5 consists of rejecting $H_0$ at level $\alpha$ if

$$|u| = \frac{\left|n_1 - \frac{m_1 m_2}{n}\right|}{h} \geq u_{\alpha/2}.$$

The main tool in studying power for large samples will be Th. 2.2.E on the asymptotic distribution of $n_1$ under the alternative hypothesis that $t \neq 1$. If $t > 0$ is arbitrary but fixed, and the conditions of Th. 2.2.E are satisfied, then it follows that

$$P_n(t \mid m_1,m_2) \approx \Phi\!\left(\frac{-u_{\alpha/2}\,h + \frac{m_1 m_2}{n} - m_1p_1}{\sigma}\right) + 1 - \Phi\!\left(\frac{u_{\alpha/2}\,h + \frac{m_1 m_2}{n} - m_1p_1}{\sigma}\right), \tag{3.4.1}$$

where $\sigma^{-2} = \frac{1}{m_1p_1q_1} + \frac{1}{(n-m_1)p_2q_2}$, and $p_1$ and $p_2$ are determined such that $t = \frac{p_1(1-p_2)}{p_2(1-p_1)}$ and $m_2 = m_1p_1 + (n-m_1)p_2$. Here the assumptions of Th. 2.2.F require that

$$\left(u_{\alpha/2}\,h + \frac{m_1 m_2}{n} - m_1p_1\right)^{3} \sigma^{-4} \to 0 \quad \text{as } \sigma^{-1} \to 0. \tag{3.4.2}$$

We shall limit ourselves to values of $m_1$, $m_2$, $n$, and $t$ for which (3.4.2) is satisfied, since this will cover most cases of interest.
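The null mean $m_1m_2/n$ and variance $h^2$ quoted above are not only asymptotic: for $t = 1$ the conditional distribution of $n_1$ is the ordinary hypergeometric distribution, and these are its exact moments. The following is a minimal sketch of that check in exact rational arithmetic; the function names and the particular values of $m_1$, $m_2$, $n$ are illustrative choices, not taken from the thesis.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(n1, m1, m2, n):
    """Pr{n1 | m1, m2, n} under independence (t = 1), as an exact fraction."""
    return Fraction(comb(m1, n1) * comb(n - m1, m2 - n1), comb(n, m2))


def null_moments(m1, m2, n):
    """Exact mean and variance of n1 given both sets of marginal totals."""
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)  # support of n1
    support = range(lo, hi + 1)
    mean = sum(k * hypergeom_pmf(k, m1, m2, n) for k in support)
    var = sum((k - mean) ** 2 * hypergeom_pmf(k, m1, m2, n) for k in support)
    return mean, var


# Illustrative marginal totals (an assumption for the example only).
m1, m2, n = 10, 8, 20
mean, var = null_moments(m1, m2, n)

# h^2 as defined in the text.
h2 = Fraction(m1 * m2 * (n - m1) * (n - m2), n * n * (n - 1))
print(mean, var, h2)
```

Because the moments agree exactly, any error in the large-sample test comes from the normal shape assumption and not from the centering or scaling.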
If $t$ is kept fixed as $m_1, m_2, n \to \infty$, then for a fixed level of significance $\alpha$ it turns out in the cases we shall examine that the power of the test for independence tends to 1. In order to examine the situation in which the power is not close to 1 in large samples, we must either let the significance probability decrease to 0 as $m_1$, $m_2$, and $n$ increase, or consider a sequence of alternative hypotheses converging to the null hypothesis. We shall discuss the second case. In the 2 x 2 independence trial, we shall let $t \to 1$ in such a way that $\sqrt{n}\,(1-t) \to c$, where $c$ is any arbitrary but fixed constant, and for the 2 x 2 comparative trial and double dichotomy, we shall let $\lambda \to 1$ such that $\sqrt{n}\,(1-\lambda)$ converges to a constant.

Before proceeding to the main results of this section on asymptotic power, it will be necessary to first note that the assumption that $t$ is fixed in Th. 2.2.F is superfluous, as long as $\sigma^{-1} \to 0$ and $\sigma^{-4}x^3 \to 0$. In all that follows it is assumed that $t$, $P_A$, $P_B$, and $\lambda$ are positive; also, the quantities $p_1$ and $p_2$ used in the proof of Th. 3.4.A depend on $m_1$, $m_2$, $n$, and $t$, but this dependence will not be indicated by the notation.

Th. 3.4.A. For all real numbers $a, b$ $(0 < a, b < 1)$ such that

$$\frac{|m_1 - na|}{\sqrt{na(1-a)}} \leq f_1(n) \quad \text{and} \quad \frac{|m_2 - nb|}{\sqrt{nb(1-b)}} \leq f_2(n),$$

where $f_1$ and $f_2$ are such that $n^{-1/2} f_1 \to 0$, $n^{-1/2} f_2 \to 0$, we have

$$P_n(t \mid m_1,m_2)\big]_{t = 1 - c/\sqrt{n}} \;\xrightarrow[n \to \infty]{}\; \Phi(-u_{\alpha/2} + \delta) + 1 - \Phi(u_{\alpha/2} + \delta), \tag{3.4.3}$$

where $\delta = \sqrt{a(1-a)b(1-b)}\;c$.

Proof: First we note that the hypotheses of the theorem imply $\frac{m_1}{n} \to a$, $\frac{m_2}{n} \to b$. We will show that

$$\frac{u_{\alpha/2}\,h + \frac{m_1 m_2}{n} - m_1p_1}{\sigma} \to u_{\alpha/2} + \delta, \tag{3.4.4}$$

where $p_1$ and $\sigma$ are as given in (3.4.1). Since $t$ is assumed to be positive, the choice of $p_1$ and $p_2$ such that $t = \frac{p_1q_2}{p_2q_1}$ and $m_2 = m_1p_1 + (n-m_1)p_2$ implies $p_1$, $p_2$, $q_1$, and $q_2$ are positive, so that $\sigma \to \infty$. It will be shown below that (3.4.2) is satisfied, so that (3.4.1) holds. To establish the theorem it is only necessary to show that (3.4.4) holds.
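Putting $t = 1 - \frac{c}{\sqrt{n}} = \frac{p_1q_2}{p_2q_1}$, it is clear that $t \to 1$ as $n \to \infty$, so that, using the fact that $m_2 = m_1p_1 + (n-m_1)p_2$ and $\frac{m_1}{n} \to a$, $\frac{m_2}{n} \to b$, it follows that $p_1, p_2 \to b$, and therefore, since $p_1 - p_2 = p_1q_2 - p_2q_1 = -\frac{c}{\sqrt{n}}\,p_2q_1$,

$$\sqrt{n}\,(p_2 - p_1) = p_2q_1c \to b(1-b)c.$$

Also, from the above, it follows that

$$\frac{\sigma^2}{n} = \left[\frac{n}{m_1p_1q_1} + \frac{n}{(n-m_1)p_2q_2}\right]^{-1} \to a(1-a)b(1-b).$$

Therefore

$$\frac{u_{\alpha/2}\,h}{\sigma} = u_{\alpha/2} \left[\frac{m_1(n-m_1)m_2(n-m_2)}{n^3(n-1)} \cdot \frac{n}{\sigma^2}\right]^{1/2} \to u_{\alpha/2}$$

and

$$\left[\frac{m_1m_2}{n} - m_1p_1\right]\sigma^{-1} = \frac{m_1}{n}\bigl[m_1p_1 + (n-m_1)p_2 - np_1\bigr]\sigma^{-1} = \left(\frac{\sigma^2}{n}\right)^{-1/2} \left(\frac{m_1}{n}\right)\left(\frac{n-m_1}{n}\right)\sqrt{n}\,(p_2-p_1) \to \bigl[a(1-a)b(1-b)\bigr]^{1/2} c = \delta.$$

We have thus established (3.4.4). It follows immediately that (3.4.2) holds, so that the theorem is proven.

We now consider a sequence of 2 x 2 tables with fixed marginal probabilities $P_A$ and $P_B$. Applying Th. 3.4.A with $a = P_A$, $b = P_B$, and

$$t = \frac{\lambda(1-P_A-P_B+\lambda P_AP_B)}{(1-\lambda P_A)(1-\lambda P_B)}, \qquad \lambda = 1 - \frac{d}{\sqrt{n}},$$

we have, upon rewriting the expression for $t$ with $\lambda = 1 - \frac{d}{\sqrt{n}}$, that $t = 1 - \frac{d/\sqrt{n}}{(1-\lambda P_A)(1-\lambda P_B)}$ and $\sqrt{n}\,(1-t) \to \frac{d}{Q_AQ_B}$, so that

$$P_n(\lambda,P_A,P_B,m_1,m_2)\big]_{\lambda = 1 - d/\sqrt{n}} \to \Phi\!\left(-u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right) + 1 - \Phi\!\left(u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right). \tag{3.4.5}$$

We immediately infer from Th. 3.4.A the following two corollaries:

Corollary 1. For all values of $m_1$ such that $\frac{|m_1 - nP_A|}{\sqrt{nP_AQ_A}} \leq f_1(n)$, where $n^{-1/2} f_1(n) \to 0$,

$$P_n(\lambda,P_A,P_B,m_1)\big]_{\lambda = 1 - d/\sqrt{n}} \to \Phi\!\left(-u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right) + 1 - \Phi\!\left(u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right). \tag{3.4.6}$$

Corollary 2.

$$P_n(\lambda,P_A,P_B)\big]_{\lambda = 1 - d/\sqrt{n}} \to \Phi\!\left(-u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right) + 1 - \Phi\!\left(u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right). \tag{3.4.7}$$

Th. 3.4.A also shows that the limiting power function for the $\chi^2$-test for independence is the non-central $\chi^2$-distribution with non-centrality parameter $\frac{P_AP_B}{Q_AQ_B}\,d^2$, assuming that the conditions of the normal-approximation Theorem 2.2.F are satisfied, and $\lambda = 1 - \frac{d}{\sqrt{n}}$.

As a particular case of (3.4.1), we put $m_1 = nP_A$, $m_2 = nP_B$ and obtain, after some algebra,

$$P_n(\lambda,P_A,P_B,m_1,m_2) \approx \Phi\!\left(\frac{-u_{\alpha/2}\,h + n(1-\lambda)P_AP_B}{\sigma}\right) + 1 - \Phi\!\left(\frac{u_{\alpha/2}\,h + n(1-\lambda)P_AP_B}{\sigma}\right), \tag{3.4.8}$$

where $m_1 = nP_A$, $m_2 = nP_B$, $h = \sqrt{nP_AP_BQ_AQ_B}$, $\sigma^{-2} = n^{-1}\sum_{i=1}^{4} \frac{1}{\pi_i}$, and $\pi_i$, $i = 1,2,3,4$, is given by (2.1.27).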
Setting $\lambda = 1 - \frac{d}{\sqrt{n}}$ in (3.4.8), it is easily seen that

$$P_n(\lambda,P_A,P_B,m_1,m_2) \to \Phi\!\left(-u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right) + 1 - \Phi\!\left(u_{\alpha/2} + \sqrt{\frac{P_AP_B}{Q_AQ_B}}\,d\right), \tag{3.4.9}$$

where $\lambda = 1 - \frac{d}{\sqrt{n}}$, $m_1 = nP_A$, $m_2 = nP_B$. Therefore, from (3.4.5), (3.4.6), (3.4.7), and (3.4.9), we obtain the following three asymptotic relations, which we group together in Th. 3.4.B.

Th. 3.4.B. For all $m_1, m_2$ such that

$$\frac{|m_1 - nP_A|}{\sqrt{nP_AQ_A}} \leq f_1(n), \qquad \frac{|m_2 - nP_B|}{\sqrt{nP_BQ_B}} \leq f_2(n),$$

where $n^{-1/2} f_i(n) \to 0$, $i = 1,2$,

$$P_n(\lambda,P_A,P_B,m_1,m_2)\big]_{\lambda = 1 - d/\sqrt{n}} \sim P_n(\lambda,P_A,P_B,m_1,m_2), \tag{3.4.10}$$
$$P_n(\lambda,P_A,P_B,m_1)\big]_{\lambda = 1 - d/\sqrt{n}} \sim P_n(\lambda,P_A,P_B,m_1,m_2), \tag{3.4.11}$$
$$P_n(\lambda,P_A,P_B)\big]_{\lambda = 1 - d/\sqrt{n}} \sim P_n(\lambda,P_A,P_B,m_1,m_2), \tag{3.4.12}$$

where $P_n(\lambda,P_A,P_B,m_1,m_2)$ on the right-hand side of (3.4.10), (3.4.11), and (3.4.12) is evaluated at $\lambda = 1 - \frac{d}{\sqrt{n}}$, $m_1 = nP_A$, and $m_2 = nP_B$ in each case.

(3.4.12) implies that for $n$ moderately large we can approximate the power function for the test of independence in the double dichotomy quite well by evaluating the power function in the 2 x 2 independence trial with marginal totals at their expectations, i.e., with $m_1 = nP_A$, $m_2 = nP_B$. The implication of (3.4.11) is that the power function in the 2 x 2 comparative trial can similarly be approximated by the power function in the 2 x 2 independence trial, this time with marginal totals $m_1$ and $m_2 = m_1p_1 + (n-m_1)p_2$, where $p_1 = \lambda P_B$ and $p_2 = \frac{P_B(1-\lambda P_A)}{1-P_A}$. The power function $P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A}$ may also be used to approximate $P_n(\lambda,P_A,P_B)$. For numerical illustrations of these last approximations, see appendix D.

Chapter 4. Comparison of Power Functions

4.1 Preliminaries

In Chapter 3 we gave the exact power function for the uniformly most powerful unbiased test of independence in the 2 x 2 independence trial, 2 x 2 comparative trial, and the double dichotomy. We proposed several approximations for these exact power functions, including some based on asymptotic properties as described in section 3.4.
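The limiting power function of section 3.4, $\Phi(-u_{\alpha/2} + \delta) + 1 - \Phi(u_{\alpha/2} + \delta)$ with $\delta = \sqrt{P_AP_B/(Q_AQ_B)}\,d$ and $\lambda = 1 - d/\sqrt{n}$, is simple enough to evaluate directly. The sketch below assumes that form of the limit; the function name `limiting_power` and the default $\alpha = .05$ are illustrative choices. It relies only on the standard normal distribution from Python's standard library.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def limiting_power(d, PA, PB, alpha=0.05):
    """Limiting power for local alternatives lambda = 1 - d / sqrt(n).

    Assumes the limit Phi(-u + delta) + 1 - Phi(u + delta), where u is the
    upper alpha/2 point of the standard normal distribution and
    delta = sqrt(PA * PB / (QA * QB)) * d.
    """
    QA, QB = 1.0 - PA, 1.0 - PB
    u = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    delta = sqrt(PA * PB / (QA * QB)) * d
    return _STD_NORMAL.cdf(-u + delta) + 1.0 - _STD_NORMAL.cdf(u + delta)


# At d = 0 the alternative coincides with the null hypothesis, so the
# limiting power reduces to the significance level.
print(limiting_power(0.0, 0.3, 0.2))
print(limiting_power(2.0, 0.5, 0.5))
```

The value depends on $d$ only through $\delta^2$, which is the non-centrality parameter of the limiting non-central $\chi^2$ distribution with one degree of freedom; the same number is therefore obtained for $d$ and $-d$.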
We also suggested that the exact power functions for the 2 x 2 independence trial and 2 x 2 comparative trial might be used as approximations for the power function in the double dichotomy. In this chapter we shall examine the adequacy of some of these approximations. We shall also see how the various exact power functions compare for small values of $m_1$, $m_2$, and $n$, knowing from the results in section 3.4 that for large values of $m_1$, $m_2$, and $n$ there is little difference between them.

The notation $P_n(\lambda,P_A,P_B)$ was used to denote the exact power function in the double dichotomy. The conditional power function, with one set of marginal totals fixed, say $m_1$ and $n-m_1$, was denoted by $P_n(\lambda,P_A,P_B,m_1)$, and is the exact power function in the 2 x 2 comparative trial, with $p_1 = \lambda P_B$ and $p_2 = \frac{P_B(1-\lambda P_A)}{1-P_A}$. Going one step further, we denoted the conditional power function with both sets of marginal totals fixed by $P_n(\lambda,P_A,P_B,m_1,m_2)$, which is the exact power function in the 2 x 2 independence trial. It is hoped that this choice of notation emphasizes the relation between the three exact power functions, i.e., that the exact power functions in the 2 x 2 independence trial and 2 x 2 comparative trial are conditional power functions with respect to the exact power function in the double dichotomy. In the 2 x 2 comparative trial, we will use the alternative notation $P_n(p_1,p_2,m_1)$ to denote the power function when $p_1$ and $p_2$ are not related to $\lambda$, $P_A$, and $P_B$.

4.2 Comparison of Exact and Approximate Power Functions

Probably the most interesting tables of power which have been computed are given in Appendix D, pages 159-170. In these tables, values of the three exact power functions have been tabulated for certain combinations of $n$, $P_A$, and $P_B$, together with a few approximate values using the normal approximations (3.1.4) and (3.3.5).
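The link between the two notations recalled in section 4.1 can be made concrete. The sketch below maps $(\lambda, P_A, P_B)$ to the comparative-trial parameters $p_1 = \lambda P_B$ and $p_2 = P_B(1-\lambda P_A)/(1-P_A)$ and to $t = p_1q_2/(p_2q_1)$; the function name and the range check are assumptions added for illustration.

```python
def comparative_parameters(lam, PA, PB):
    """Map (lambda, PA, PB) to (p1, p2, t) for the 2 x 2 comparative trial."""
    p1 = lam * PB
    p2 = PB * (1.0 - lam * PA) / (1.0 - PA)
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("lambda is outside the admissible range for PA, PB")
    t = p1 * (1.0 - p2) / (p2 * (1.0 - p1))
    return p1, p2, t


# Independence (lambda = 1) gives p1 = p2 = PB and t = 1.
print(comparative_parameters(1.0, 0.3, 0.2))

# An alternative with lambda = 1.5.
print(comparative_parameters(1.5, 0.3, 0.2))
```

Whatever the value of $\lambda$, the mixture $P_Ap_1 + (1-P_A)p_2$ equals $P_B$, which is why the three power functions can be compared at a common pair of marginal probabilities.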
The power function in the 2 x 2 independence trial is evaluated at $m_1 = nP_A$ and $m_2 = nP_B$, and the power function in the 2 x 2 comparative trial is evaluated at $m_1 = nP_A$.

There are two important observations to be made from the data given there. The first is that there are cases in which

$$P_n(\lambda,P_A,P_B) < P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A} < P_n(\lambda,P_A,P_B,m_1,m_2)\big]_{m_1=nP_A,\,m_2=nP_B},$$

and also cases for which there are inequalities in the opposite direction, with the exception that there is no case in the tables for which power in the double dichotomy is greater than that in the 2 x 2 comparative trial. However, if $nP_A = 1$, then $P_n(\lambda,P_A,P_B,m_1)\big]_{m_1=nP_A} = .05$ for all $\lambda$. From table A.1.1, $P_{10}(\lambda,.1,.1) > .05$ for all $\lambda \neq 1$, so that no theorem is possible concerning the order of the power functions.

The second observation is that the functional values of the three power functions draw together as $n$ increases: for $n = 10$ there are wide differences; at $n = 20$ the differences decrease, while still being fairly large; and for $n = 30$ the differences are smaller, but there are cases where they are large enough to be considered seriously. It is also rather interesting to examine the pairwise differences between the power functions. It appears that the differences between power in the 2 x 2 independence trial and the 2 x 2 comparative trial tend to be somewhat [...] than the differences between power in the 2 x 2 comparative trial and the double dichotomy.

The adequacy of the normal approximation (3.3.5) to $P_n(\lambda,P_A,P_B)$ is reflected in the tables in Appendix D, section D.1. Values computed from (3.3.5) tend to overestimate $P_n(\lambda,P_A,P_B)$ for small and large values of $\lambda$, and to underestimate it for values of $\lambda$ in the neighborhood of 1. Overall, the approximation seems to be quite adequate.

Values of power in the 2 x 2 comparative trial, computed using Sillitto's and Patnaik's approximations, are given in E.2, together with the exact values $P_n(p_1,p_2,m_1)$.
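For readers who wish to reproduce values of the kind tabled for Sillitto's approximation, a minimal sketch follows. It assumes the arcsine form of section 3.2, namely $C = (\sin^{-1}\sqrt{p_2} - \sin^{-1}\sqrt{p_1}) \big/ \tfrac12\sqrt{1/m_1 + 1/(n-m_1)}$ with power approximated by $1 - [\Phi(u_{\alpha/2} - C) - \Phi(-u_{\alpha/2} - C)]$; the function name and the example arguments are illustrative and are not taken from the thesis tables.

```python
from math import asin, sqrt
from statistics import NormalDist


def sillitto_power(p1, p2, m1, n, alpha=0.05):
    """Arcsine (variance-stabilized) approximation to power in the
    2 x 2 comparative trial with sample sizes m1 and n - m1.

    Angles are in radians, as returned by math.asin.
    """
    std = NormalDist()
    u = std.inv_cdf(1.0 - alpha / 2.0)
    # Standardized difference of the transformed proportions.
    c = (asin(sqrt(p2)) - asin(sqrt(p1))) / (0.5 * sqrt(1.0 / m1 + 1.0 / (n - m1)))
    # One minus the probability that a N(c, 1) variable falls in (-u, u).
    return 1.0 - (std.cdf(u - c) - std.cdf(-u - c))


# Equal proportions give back the significance level.
print(sillitto_power(0.3, 0.3, 10, 20))

# An alternative with unequal proportions.
print(sillitto_power(0.7, 0.2, 10, 20))
```

The approximation is symmetric in $p_1$ and $p_2$ when the two sample sizes are interchanged with them, and in particular whenever $m_1 = n - m_1$, as in the special cases $m_1 = 10$, $n = 20$ and $m_1 = 15$, $n = 30$.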
The tables given are for the special cases $m_1 = 10$, $n = 20$ and $m_1 = 15$, $n = 30$, since for these cases the region for rejecting $H_0$ is about the same for the unbiased test and for the Pearson test, on which Sillitto and Patnaik based their computations of power. Pearson's test is described in section 1.7. It is very apparent that Patnaik's approximation is uniformly worse than Sillitto's approximation, which tends to overestimate the correct values.

The exact power function $P_n(t \mid m_1,m_2)$ in the 2 x 2 independence trial may also be used to approximate $P_n(p_1,p_2,m_1)$ by putting $t = \frac{p_1q_2}{p_2q_1}$ and $m_2 = m_1p_1 + (n-m_1)p_2 = \gamma$, where $\gamma$ is the expected value of $m_2$. A few values were computed from $P_n(t \mid m_1,\gamma)$ and compared with those obtained from Sillitto's approximation. These values were sometimes closer to the correct values than Sillitto's, but there is not enough data to draw any conclusions. These values may also be found in E.2. In all, Sillitto's approximation appears to be quite accurate and is not difficult to compute, so for small integral values of $m_1$ and $n$ it is probably to be preferred as an approximation. In the 2 x 2 independence trial, the normal approximation (3.1.4) should be satisfactory, as indicated by the data in D.1.

In view of Th. 3.4.B we have that

$$P_n(t \mid m_1,\gamma) \sim P_n(p_1,p_2,m_1)$$

for a wide range of values of $m_1$, with $t = \frac{p_1q_2}{p_2q_1}$. This implies that power in the 2 x 2 comparative trial should be nearly constant for all combinations of $p_1$ and $p_2$ for which $t$ is constant.

On the basis of the data in the appendices, we suggest that the following approximations to the power functions be used, considering both ease of computation and accuracy:

I. 2 x 2 Independence Trial - use the normal approximation given by (3.1.4).

II. 2 x 2 Comparative Trial - use either Sillitto's approximation given by (3.2.6), or the normal approximation (3.1.4), with $m_2$ evaluated at its expected value, given by $\gamma = m_1p_1 + (n-m_1)p_2$.

III.
Double Dichotomy - use the normal approximation given in (3.3.5).

Summary

We have re-examined one of the oldest problems in mathematical statistics, a problem which has become classical within its field: that of testing for independence in 2 x 2 tables. At various times in the past few years, it has been thought that the theory of 2 x 2 contingency tables was completely known, that it was dead as a subject of research. In view of the controversy that has stirred the attention of such noted statisticians as R. A. Fisher, E. S. Pearson, and G. A. Barnard, it is difficult to understand how the problem can be considered as solved in all its many facets. This is particularly true when we realize that very little was previously known about the power for the test of independence in small samples. This appears to be very surprising, since practicing statisticians and research workers have used the results of tests for independence in 2 x 2 tables for many years as a basis for making decisions. We have undertaken the task of remedying this situation.

Throughout this thesis, our one underlying objective was to thoroughly examine the power for the test of independence in small samples, even though we have investigated power for large samples also. In preparation for this examination, we first tried to determine precisely what had been accomplished previously in 2 x 2 tables. It was discovered that only in one of three possible cases had much been done in the way of studying power for a test of independence, this case being the 2 x 2 comparative trial. Therefore, we chose a particular test for independence, the uniformly most powerful unbiased test first proposed by Katz [9] in 1942, and were able to use Katz' formulation of alternative hypotheses as a basis for investigating power for this test of independence.
The unique feature of Katz' formulation was that it provided a logical and consistent method for examining and comparing power in each of the three cases corresponding to both sets of marginal totals fixed, one set fixed, and neither fixed. It was necessary first to study some properties of the conditional distribution for fixed marginal totals under the class of alternative hypotheses. Once these properties were known, it was fairly easy to examine systematically power for the test of independence, and that is what we hope we have done in this paper.

Briefly, we will now summarize what we have done.

1. In section 2.1 of Chapter 2, we gave expressions for computing moments of the modified hypergeometric probability function $\Pr\{n_1 \mid m_1,m_2,n;t\}$, including also several recursion relations between the moments of the distribution, and were able to indicate the form of the asymptotic mean and variance of this distribution.

2. In section 2.2 of the same chapter, it was shown that the limiting distribution of the conditional distribution of $n_1$ for fixed marginal totals was binomial, Poisson, or normal, depending on the mode in which the marginal totals $m_1$ and $m_2$ increased. As a result of Theorem 2.2.F, it was suggested that there are at least two published results in the literature which may be invalid.

3. We gave the exact power function for the test of independence in the 2 x 2 independence trial, 2 x 2 comparative trial, and the double dichotomy in the first three sections of Chapter 3, and suggested several approximations for these power functions. We investigated the adequacy of these approximations in Chapter 4.

4. We examined asymptotic power for the test of independence in 3.4, and confirmed what many statisticians have believed for some time: that for very large sample sizes, the difference in power between the three cases corresponding to the nature of the marginal totals is negligible.

5.
We provided extensive tables of power for each of the three cases and used the values of exact power to evaluate the adequacy of the various approximations proposed in Chapter 3. The tables of exact power are in appendices A, B, C, and D. On the basis of these exact values, we suggested three specific approximations for the exact power functions.

We hope that we have succeeded in thoroughly examining the power for the tests of independence, and that the results and computations which we have presented in this thesis will be useful for applied statisticians and research workers.

Appendix A. Exact Power for the Double Dichotomy.

We consider a given population in which the members of the population are classified by two attributes, each having two categories, so that there are four distinct classes of members. We denote these classes by $A_1B_1$, $A_2B_1$, $A_1B_2$, and $A_2B_2$. The proportions of members in these four classes are $\pi_1$, $\pi_2$, $\pi_3$, and $\pi_4$ respectively. Putting $P_A = \pi_1 + \pi_3$ and $P_B = \pi_1 + \pi_2$, we propose to test the null hypothesis $H_0$ of no association in the occurrence of one attribute and the occurrence of the second, i.e., that $\pi_1 = P_AP_B$, by selecting a sample of size $n$ and making use of the observed numbers in the four classes. Any alternative hypothesis may be expressed in the form $\pi_1 = \lambda P_AP_B$. We use the uniformly most powerful unbiased test, given in section 1.5 of Chapter 1, to test $H_0$.

The tables in this appendix give the exact values of the power function for this test. This power function is denoted by $P_n(\lambda,P_A,P_B)$, and its functional form is given by (3.3.2) on page 79. The significance level $\alpha$ is .05 in all the tables.

The values of $P_n(\lambda,P_A,P_B)$ were obtained by first computing the conditional power function $P_n(\lambda,P_A,P_B,m_1)$ for the 2 x 2 comparative trial, and then weighting these values with binomial probabilities, as suggested in 3.3, equation (3.3.4). All the computations were performed on the Michigan State University digital computer, MISTIC.
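The weighting step just described can be sketched in a few lines. The sketch assumes the relation $P_n(\lambda,P_A,P_B) = \sum_{m_1=0}^{n} b(m_1;n,P_A)\,P_n(\lambda,P_A,P_B,m_1)$; the argument `conditional_power` is a hypothetical placeholder for whatever routine supplies the comparative-trial power at each $m_1$, and is not a reconstruction of the original MISTIC program.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability b(k; n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def double_dichotomy_power(n, PA, conditional_power):
    """Weight conditional powers by the binomial distribution of m1.

    conditional_power: callable m1 -> power of the comparative-trial test
    with that m1 (hypothetical placeholder supplied by the caller).
    """
    return sum(
        binom_pmf(m1, n, PA) * conditional_power(m1) for m1 in range(n + 1)
    )


# If every conditional power equals the significance level, as it does at
# lambda = 1, the weighted power equals the significance level as well.
flat = double_dichotomy_power(10, 0.3, lambda m1: 0.05)
print(flat)
```

This is the property exploited in the accuracy check described below: at $\lambda = 1$ each conditional power function equals .05, so the weighted sum must return .05 up to rounding.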
The tables are divided into three sections, A.1, A.2, and A.3, corresponding to $n = 10$, 20, and 30 respectively. In A.1 and A.2, tables of $P_{10}(\lambda,P_A,P_B)$ and $P_{20}(\lambda,P_A,P_B)$ are given for all 15 combinations of $P_A$ and $P_B$ with $P_B \leq P_A$ and $P_A, P_B = .1, .2, .3, .4, .5$, and in A.3 tables of $P_{30}(\lambda,P_A,P_B)$ are given for the same combinations of $P_A$ and $P_B$, except that the three cases $P_A = P_B = .2$; $P_A = .2$, $P_B = .1$; and $P_A = P_B = .1$ are not available. For the combinations of $P_A$ and $P_B$ with $P_A = .3$, .4, and .5, $P_n(\lambda,P_A,P_B)$ was computed for $\lambda = .1, .2, .3, \ldots, [1/P_A]$. For $P_A = .2$, the range of $\lambda$ was from .2 to 4.8 with an increment of .2; for $P_A = .1$ and $n = 10$, the range of $\lambda$ is .4 to 9.6, the increment being .4, while for $n = 20$, the range of $\lambda$ is .3 to 9.9, increasing by .3.

To check the accuracy of the computations, we used the fact that the test of independence is a conditional test. Thus, since

$$P_n(\lambda,P_A,P_B) = \sum_{m_1=0}^{n} \sum_{m_2=0}^{n} \Pr\{m_1,m_2 \mid n,\lambda\}\, P_n(\lambda,P_A,P_B,m_1,m_2) = \sum_{m_1=0}^{n} \Pr\{m_1 \mid n,P_A\}\, P_n(\lambda,P_A,P_B,m_1),$$

where $P_n(\lambda,P_A,P_B,m_1,m_2)$ is the exact conditional power function for fixed $m_1$ and $m_2$, we obtained a partial check on the exact computations for $P_n(\lambda,P_A,P_B)$ by checking the values of the two conditional power functions at certain selected points. In particular, we have that each of the power functions is equal to .05 for $\lambda = 1$. Finally, for $P_A = .5$, $P_n(\lambda,P_A,P_B)$ should be symmetric about $\lambda = 1$, which provided one further check on the accuracy of the computations. All of the computed values were given to nine decimal places, and rounded off to five places in the table. For $n = 10$, $P_{10}(\lambda,P_A,P_B)$ evaluated at $\lambda = 1$ was exact to nine decimal places for all 15 combinations of $P_A$ and $P_B$; for $n = 20$ it was accurate to 7 places; and for $n = 30$ there was 6-place accuracy.
The partial check on the exact values obtained by computing the conditional power functions for certain selected values indicated that the exact computations were probably accurate to at least five decimal places for all 5L , with h-place accuracy virtually certain. The table numbers indicate the value of n and PA , with the first number giving the value of 31—0 , and the second number PA. For example, A.2.h gives values of P20( A’.,+,pB )_e Table A.1.1 P10(A’91’PB) .18005 1.2 1.h 1.8 2.0 2.2 2.1; 2.6 2.8 3.0 3.2 3.6 3.8 h'.o m2 Table A.l.2 P100 ,.2,PB) Table A.1.3 1310(A a 03.9133) X PB : .3 PB = .2 PB : .1 ‘ f .1; .08116 .06320 .05323 ‘ g .5 .07151 .05920 .05226 1 ‘ .6 .06372 .05591 .051h5 ‘ . .7 .05771 .05335 .05082 i l .8 .053hb. .05150 .0503? b .9 .05086 .05038 .05009 1.0 .05000 .05000 .05000 1.1 .05088 .05039 .05009 1.2 .05356 .05156 .05038 1.3 .0581h .05355 .05086 1.1; .06h72 .osoho .051Sh 1.5 .o73hh .060 .0523 1.6 .oshhh .06h83 .05353 1.? .09790 .07051 .05118 1.8 .11398 .07723 .05639 1.9 .1328? .08505 .0581? 2.0 .1Sh7b .09h02 .06018 2.1 .1797h .10h21 .062hh 2.2 .20802 .1156? .06h95 2.3 .23968 .12815 .06773 2.1; .27h78 .1b263 .0707? 2.5 .3131; .1582; .o7h09 , 2.6 .3552? .17533 .07769 ; 2.? .hooho .19396 .08159 .8 .1118 oZJJ-fl-h .08579 2 L 1‘7 .23592 .09030 .25930 .09513 .28h28 .10029 .31085 .10578 .33898 01.1162 -106- Table Aeloh P10 (2 , .h’PB) PB='3 PB='2 PIE-'1 1 .1927? .10626 .0626? 1 .15980 .09390 .06000 .13202 .0832? .05766 .10896 .07h2h .05563 .09020 .06673 .05392 .07535 .0606? .05251 .06h12 .05599 105m .0562; .0526? .05063 f .05156 .0506? .05016 1 .05000 .05000 .05000 .05158 .05068 .05016 .05638 .05271. .0506 .06h59 .05623 .051h6 .076h6 .06122 .05262 .09230 .06781 .05h12 .11th .07608 .05599 .137hl .08615 .05822 .167h6 .0981h .06083 .2030h .11219 .06385 .2hhh6 .128h3 .0672? 229191; '.lh699 .07112 .3h55h .16801 .075h1 ' 10513 .19162 .08016 : 70211 .21793 .08538 .107- Table A.1'.5 P10(2 5 OS’PB) .29699 3116111; .0707? .23832 .1210? .0662? 
'.1891h .1056? .0623? ‘ .Jl;8?8 .09008 .05903 i \ .116h6 .0??3h .05 623 ‘ .0913? .0672. .0539? .07275 .05958 .05223 .0599h .05h22 .05099 .052h6 .05105 .05025 .05000 .05000 . 000 .052h6 .0510; .05025 i I .0599h .05h22 .05099 ; .07275 .05958 .05223 ‘ .0913? .0672; .0539? .116h6 .0773h .05623 .111878 .09008 .05903 I .1893; .1056? .0623? .238 32 .12h3? .0662? * .29699 41.611. .0707? g I Table A. 201 1320‘ ’7" ‘1’PB) Table 11.2.2 P20(1’.2’PB) i 3 PB = 02 PB " .1 l E .0898h .0608h ‘ .072h8 .0562h .06010 .0528h I .0525? .05073 I 005000 005000 .05271 .0507? .06115 '.05315 .0758; .05726 .09731 .06321 ‘ . 1 2.0 .12593' 'j'.07113 7 2.2 .16190, ".0811; 2.h .20519 .0932; I 1 . 2.6 .25539 £10759 ‘ 2.8 .3117? .12 21 ‘ 3.0 .37323 . 312 ‘ 3-2 .h3831 .16h31 30 o 0531 .18776 3.6 .57237 .21339 3.8 .63762 2 1.0 .6993h .27078 h.2 .75613 30223 hob 080712 .3352? 14.6 .85212 .36966 a” .[Tm .- I Table A.2.3 P20( 2,.3,PB) 005910 ~05537 .05333 .05150 .05038 $05000 .05039 .0515? 1.9 2.0 2,2 2&3 2. Table A.2.h P20( 2, ,Oh,PB) .m. Table A.2.5 P20” "S’PB) PB=.h PB=.3 PB=.2 .9hh59 .75763 .h336o .8192 .62362 .3hh3h .71369 .h9071 .2 00 .55819 .3692? .20h65 .h0553 .26593 .15373 .27353 .1836? . .17173 .12265 .0851? .1022; .0813h .06530 .06271 .05768 .0537? .05000 .05000 .05000 .06271 .05768 .0537? .10223 .0813h .06530 .17173 .12265 .0851? .27352 .18366 . .1052 .2659? .15373 .55818 .3692? .20h65 .71368 .h9070 .26800 .8h922 '.62362 .3hh3h .9hb59 .75762 .h3360 .312— 2.0 2.3 2.1. 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 Table A.3.3 P30(1,.3,23) .08735 .07736 .06900 .06219 Table 11.3.1; 1330(13 0,4323) Table 2.3.5 P30(2 ,.S’PB) PB=°5 PBS-.1}, PB=.3 PB=.2 PB:.1 .99999 .99565 .9396h .70h78 .26299 .99889 .96969 .8hh01 .5728h .212h5 .98h85 .89698 .71078 .hh751 .17016 .92299 '.76575 .55793 .33608 .13516 .7805h .59101 .h0683 .2h282 .10759 .57085 .80716 .27511 .16923 .08591 .18073 .13532 .10278 .07801 .05865 .08131 ".07061 .06285 .0568? .05211. _'.05000 .05000 .05000 .05000 .05000 .0813]. .07061 .06285 .0568? .05211. 
.18073 .13532 .10278 ".07801 .05865 .35181 .28866 .1728? .mau .06978 .57085 .h0716 .27511 .16923 .08591 .7805h .59101 .h0683 .2h282 .10759 .92299 .76575 .55793 .33608 .135h5 .98h85 .89698 .71078 #1751 .1701 1 .99889 .9696? .8hl101 .5728); .212145 , .99999 .99565 .93961; .70h78 .26299 : Appendix B. Exact Power for the 2 x 2 Comparative Trial. In the 2 x 2 comparative trial, one set of marginal totals is fixed, say m1 and n-ml, which we may regard as the sizes of two independent random samples from two different populations A1 and A2. The probability that an observation from A1 falls in B1 is pi, i = 1.2. The null hypothesis states that p1 = p2, and the alternative hypothesis is that pBu -AP) pl # p2. If we put p1 = 2.83 and p2 =--jffirii:JL-, then the probability that nl of the observations from pupulation A1 fall in B1 and m2-nl of the observations from A2 fall in B1 is the conditional probability, given m1, of the probability distribution in the double dichotomy. We thus use two notations for the power function for the test that p1 = p2, against alternatives pl‘fi p2, namely, Pn()"PA’PB’ml) and Pn(p1:P2:m1)o It is convenient for tabulating purposes to give tables for Pn(p1,p2,m1) Where p1,p2 = 01, 12, 03, ... 9 .9, p2 S p1. In Appendix D, when we compare power among the three cases corresponding to the number of restrictions on the marginal totals m1 and m2, it will be convenient to tabulate Pn(A,PA,PB,m1). As in appendix A, we have divided this appendix into three sections, B.l, 3.2, and B.3, corresponding to n = 10, 20, and 30 respectively. The tables in these sections contain exact values of Pn(pl,p2,ml) for m1 = l,2,...,% , and all combina- tions of pl,p2 with p2 5 pl. These values are rounded to -117 - five decimal places, and are virtually certain to be accurate to this number of figures. The table numbers indicate the value of 7111 and n, with the first number being %5 and the sec0nd the value of m1. 
For example, table B.3.ll contains computed values of n = 30, m1 = 11. ~118- Table 3.1.2.. P10(P19P292) g .91 .1 ..2 ..3 .1 .5 .6 .7 .8 .9 1"2 1 ;; . . . . '- .1 .05000 .05316 .06266 .078h8 .10063 .12910 .16391 .2050h .25250 1 j 1 . j .2 .05000 .05260 .060h0 .o73hl .09161 .11502 .18363 .177h3 % .3 .05000 .0522; .0589? .07019 .08590 .10609 .1307? .1; .05000 .05205 .05819 .068h2 .08275 .1011? .5 .05000 .05198 .0579h .06786 .08175 .6 .05000 .05205 .05819 .068h2 : .7 .05000 .0521; .0589? .05000 .05260 .05000 Table B.1.3 P10(p1,p233) .1 .2 .3 .h .5 .6 .7 .8 .9 .05000 .0555? .0737? .10685 .1570h .22659 .31773 .83270 .57376 .2 .05000 .0555h .07333 -.10515 .1527? .21795 .302h6 .h0809 .3 .05000 .05551 .072Bh .10321 .11788 .20793 .28h69 .h .05000 .05585 .07222 .10093 .1h218 .19659 .5 .05000 .0553? .0711? .09830 .13587 .6 .05000 .05525 .07059 .095h0 .7 .05000 .05510 .06961 .8 .05000 .051195 .05000 .9 Table B.1.h 1>10‘1’11’P2I’L‘) .1 .2 .3 .b. .5 .6 .7 .8 .9 . .05000 .05?L:h .08192 .12638 .19332 .28h81 1.0250 .5h760 .72092 .05000 .05775 .08266 .12735 .19872 .28790 .h102? .565h5 .3 .05000 .05800 .08326 .12813 .19560 .2892; .h1328 J. .05000 .0581h .08313 .1277? .19386 .28519 .5 .05000 .05811 .08288 .12556 .18826 .6 .05000 .05790 .08152 .1213h .7 .05000 .05752 .079h3 .8 .05000 .05700 Table 3.1.5 210(p13p235) .1 .2 .3 .h .5 .6 .7 .8 .9 .05000 .05812 .085hh .1366h .21566 .32h33 .116102 .61919 .78608 .2 .05000 .05881 .08719 {138111; .21576 .32153 .h5653 .61919 .3 .05000 .05913 .08780 .13873 .2153? .32153 .h6102 .8 .05000 .05922 .08791 .13873 .21576 .32133 .5 .05000 .05922 .08780 .138hh .21566 .6 .05000 .05913 .08719 .1366). .7 .05000 .05881 .085hh .8 .05000 .05812 .05000 .1 .2 .3 .5 .6 ~122- Table 3.2.2 P20 (P1313232) .1 .2 .3 9h ’5 .6 I? _ .8 .9 .05000 .05h72 .06889 .09250 .12555 .16801. .21998 .28136 .35219 .05000 .0530? .622? .07762 .09910 .12672 .1608? .2003? .05000 .05238 .05951 .07139 .08803 .109h3 .1355? 
.05000 .05208 .05833 .06875 .08333 .10208 .05000 .05200 .05800 .06800 .08200 .05000 .05208 .05833 .06875 .05000 .05238 .05951 .05000 .0530? .05000 "u M”’*”“ . Table 3.2.3 P20@1.p2.3) .3 .h .5 .6 .7 .8 .9 .1 .05000 .05823 mm .1302. .19810 .28993 .10776 .55365 .7296h .2 .05000 .05725 .08083 .123h6 .18788 .27681 .39299 .53915 .3' .05000 .05673 .07825 .11650 .173h5 .25107 .35131 ”.1; .05000 .05625 .07561 .10903 .15725 .22180 .5 .05000 .05596 .07382 .10360 .111528 .6 .05000 .05593 .07311 .10060 .7 .05000 .05608 .07301 .05000 .0563}; .05000 Table E.2.l; .1 ..a. 32:- m1.“ P20(pl.p2,h) pl .1 .2 .3 .h .5 .6 .? .8 .9 1’2 .1 .05000 .06299 .1oh58 .1766h .2781? .h0539 .55169 .7076h .86098 .2 .05000 .06116 .09680 .15973 .25222 .37598 .53216 .72135 .3 .05000 .06060 .09h61_ .15599 . 961 .38117 .55729 .h .05000 .06076 .09h85 .1560? .28959 .38198 .5 .05000 .06072 .09363 .15096 .236h5 .6 .05000 .06020 .0903? .1h092 .7 .05000 .05972 .08753 .05000 .0599h .05000 .325. Table 3.2.5 1’20“???” .2 .3 .h .5 .6 .7 .8 .9 .1 .05000 .06699 .12073 .21175 .33591 .1852 .6hh52 .79860 .92536 .2 .05000 .0618? .11176 .19311 .30912 .15662 .62792 .80969 .3 .05000 .061;16 ..108511 .18665 .30219 .h5825 .65683 .1. .05000 .06396 .10771 .18623 .30716 .h8051 .5 .05000 .06393 .10767 .18715 .31219 .6 .05000 .06382 .10672 .18361 .7 .05000 .06360 .1013? .8 .05000 .06362 .05000 Table 3.2.6 .9 P20 (pl.pz,6) .1 .2 .3 .h .5 .6 .7 .8 .9 F2 .1 ..05000 .07113 .1383). .2h882 .39355 .55703 .719h8 .85953 .95758 .2 .05000 .0685? .1258h .22270 .35650 .5192. .69622 .86h95 .3 .05000 .06725 .12025 .21192 .3hh65 .51782 .?2528 .h .05000 .06655 .1180? .210h6 .35268 .55521 .5 .05000 .066h7 .11909 .21751 .3778? .6 .05000 .0669h .12170 .22571. .7 .05000 .067h2 .12252 .8 .05000 .06779 .05000 .127... . Table 3.2.7 P20(p1,122,7) .1 .2 . .3 . .h . .S . .6 . .7 .8 . .9 .05000 .07h02 .111855 .27081 .h2876 .60213 .7658? .895h3 .97383 .05000 .07112 .13661 .2558 .39325 .5668h .7hh91 .89882 .3 .05000 .0696? .12959 .232luo .3788? 
.56350 .77029 .8 .05000 .06863 .12656 .23012 .38715 .60293 .5 .05000 .06851 .12791 .23959 .12161. .6 .05000 .06930 .13300 .25790 .? .05000 .07055 .138h2 .8 .05000 .071111; .05000 Table B.2.8 .3 .h .5 .6 .7 .8 .9 .1 .05000 .07661 .15872 .29171. .h5922 .63602 .79h72 .91296 .979Sh .2 .05000 .073h7 .1113118 .26865 1.1255 .59051 .7687? .9160? .3 .05000 .07112 .13h8h .2h387 .3988h .59182 .79988 E .1; .05000 .06995 .13208 .2h36h .h123h .63885 3 1‘ .5 .05000 .06992 .13th .25678 .5612 .6 .05000 .07098 .114172 .28h39 7 .7 . . .05000 .0729? .15226 .8 .05000 .07h82 ..05000 -129- Table 13.2.9 P20051317?” .3 0’4- 05 O6 .7 .8 O9 .1 005% 307850 016,489 030176 0147055 0614620 .8025? 0918011» 098181} T .2 .05000 .o7h65 .1868? .26h52 .h210? .60119 .77980 .9261. 1 f .3 .05000 .07188 .13750 .2h963 .h0859 .60h75 .81096 3 3 j J. .05000 .07062 .13h85 .25028 .h2h02 .6522? 1.7 ' ' j; .5 .05000 .07063 .1376? .26502 .h7022 ‘ 1 17 .6 .05000 .07185 .1h618 .29691 .7 .05000 .07836 .16062 .3 - .05000 .07738 .05000 Table 3.2.10 P20815510) .3 .h .5 .6 .7 .8 .9 .1 .05000 .0782? .16’4110 .3035? .h773h .65690 .812hh .92306 .98260 .2' .05000 '.0?h92 .11832 .268h5 1.2714? .60785 ..78338 .92306 .3 .05000 .072211 .13890 .25239 .81219 .60785 .812hh .1. .05000 .07089 .1358h .25239 .h27h? .65690 .5 .05000 .07089 .13890 .268h5 .h773h .6 .05000 .072zh .111832 .3035? .7 .05000 .07h92 .161410 .8 .05000 .0782? .05000 ~13], Table 3.3.2 P3o(p1’P2’2) .h .5 .6 .7 .8 .9 .1 .05000 .05h9? .06990 .09h77 .12958 .17h35 .22906 .29373 .3683h .2 .05000 .05310 .062112 .07795 .0996? .12763 .16179 .20216 .3 .05000 .05238 .05952 .071112 .08809 .10951 .13569 .1: .05000 .05208 .05833 .06875 .08333 .10208 .5 .05000 .05200 .05800 .06800 .08200 .6 .05000 .05208 .05833 .06875 .7 .05000 .05238 .05952 .05000 .05310 .05000 .132. Table 13.3.3 P30013172,» .h .5 .6 .? .8 .9 .1 .05000 .05909 .0875? .13723 .20990 .30738 113188 .58399 .7667h .2 .05000 .05778 .08332 .12993 .20090 .2995h .h2915 .59302 .3 .05000 .05711 .07995 .12082 .18200 .2657? 
.37hh2 .1; .05000 .0563? .07617 .110h0 .16008 .22662 .5 .05000 .05600 .07398 .10396 .11593 .6 .05000 .056oh .073h? .10130 .7 .05000 .05635 .07386 .8 .05000 .05668 .05000 Table B.3 .h P3o®l.p2.h) .3 .h .5 .6 .7 .8 .9 .1 .05000 .0618. .11109 .19057 .30090 .h36h? .5882? .7h389 .88750 .2 .05000 .06189 .09992 .16705 .2653? .39619 .55996 .7563h .3 .05000 .06118 .097h? .16380 .26621 .h1186 .60898 .h .05000 .06152 .098h8 .16572 .26976 .h1883 .5 - .05000 .061116 .09671 .1583h .25065 .6 .05000 .06075 .09232 .1hh90 .05000 .06009 .08870 .05000 .06050 .05000 ~138- Table 3.3.5 P30(p1.1>2.5) p1 .1 .2 .3 .h .5 .6 .7 .8 .9 P2 .1 .05000 .07015 .13309 .23731 .37520 .5301 .6968? .8h377 .9525? .2 .05000 .0665? .11932 .21101 .3h076 .50229 .68225 .85850 .3 .05000 .06559 .111193 .20222 .33109 .50326 .?1?26 J: .05000 .06586 .1311 .20312 .3h2h2 .5hh36 .5 .05000 .06558 .111191 .2057? .351111 .6 .05000 .0651? .11221 .19681 .7 .05000 .06h66 .10805 .8 .05000 .06882 .05000 .135. Table 3.3.6 P30(P1.pg.6) .h .5 .6 .7 .8 .9 .1 ..05000 ..075h9 -.15h20,..281118 ..hh3oh 1.61732 .77918 .90h90 .97862 .2 .05000 .07121 .13788 .2496? .h0226 .58136 .76369 .91671 .3 .05000 .06978 .13110 .23856 .39189 .5888: .7967h .1: .05000 .06910 .12890 .23638 .170th .62895 .5 .05000 .06893 .1298? .28529 .1559? .6 .05000 .069h2 .132hh .2535h .7 .05000 .069h6 .13030 .05000 .0696h .05000 P30 (P1913237) . 1 1 .1 .2 .3 Ch .5 .6 .7 .8 .9 .1 .05000 .08103 .17550 .32392 .50361 .68hh5 .83692 .98035 ..98986 Table 3.3.7 1 1 .2 .05000 .07571 .lShSh .28h82 $5551 .6hh35 .81978 .9h715 .3 .05000 .07319 .11552 .2'6856 .1095? .61318 .81759 l 1 'J 1 1 .1 .05000 .07213 .11130 .26886 .hh901 .69050 1 1 . 1 1 ,5 .05000 .07189 .1h27h .27752 .h9911 I 1 ‘1 .6 .05000 .07295 .11975 .3ohzo 1 13: 1? .7 “ .05000 .07822 .151.21 1.. 
1 .8 .05000 .07h86 Table 3.3.8 P3o(pl,p2.8) 1’1 .1 .2 .3 .1 .5 .6 .7 .8- .9 92 ‘ .1 .05000 .08655 .19508 .3599h .55093 .73232 .87370 .95960 .99118 .2 .05000 .07963 .16866 .31256 .89528 .68807 .85183 .96319 .3 .05000 .07636 .15636 .29189 1.17526 .68536 .87972 .1 .05000 .0785); .1512h .28765 .1866? .73361 .5 .05000 .07130 .1531? .30263 .5127? .6 .05000 .07566 .16281 .3h169 .7 .05000 .07818 .17573 .8 .05000 .08022 Table B.3.9 P306112») .3 .14 .5 .6 .7 .8 .9 .1 .05000 .09138 .21293 .393117 .593h5 .77265 .90158 .9720? .99685 .2 .05000 .0832). .18129 .3661; .52858 .72301 .88103 .97391 .3 .05000 .07885 .16590 .31178 .50655 .72089 .90503 .14 .05000 .07676 .16019 .30893 .52158 .77202 .5 .05000 .07658 .16325 .32702 .58280 .6 .05000 .0781? .17h79 .37h28 .7 .05000 .0816? .19529 Table B03 010 -139- P3o(p1.p2.10) .h .5 .6 .7 .8 .9 .1 .05000 .09h6h .22812 11171; .6216? .79961 .91991 .9798? .9981? .2 .05000 .08606 .19116 .35689 .557h8 .75360 .90328 .98193 .3 .05000 .08109 .17195 .33172 .53720 .753h8' .92108 .8 .05000 .07890 .16953 .32913 .55181 .79971 .5 .05000 .07870 .1722. .3h737 .61319 .6 .05000 .08038 .18896 .h0086 ,7 .05000 .08h61 .21200 .8 .05000 .0903? .05000 Table B.3.11 P30(p1.p2,11) P1 .1 .2 .3 .h .5 .6 .7 .8 .9 R2 .1 .05000 .0980h .235h1 .h3283 .6h210 .81655 .9299? .98355 .99865 .2 .05000 .08801 .19786 .36903 .5?hlo .7698? .91353 .981186 .3 .05000 .08255 .180h3 .3h299 .5532. .76932 .93305 .1. .05000 .08019 .17h72 .3h073 .56979 .81739 .5 .05000 .08002 .17826 .36218 .63659 .6 .05000 .08200 .19309 .h2277 .7 .05000 .08692 .2261? .05000 .09h70 .05000 .1171. Table 3.3.12 Paofi’l’Pa’lz) .1. .5 .6 .7 .8 .9 .1 .05000 .10216 .2911 .h5530 .66722 .83705 .98168 .987h3 .99906 .2 .05000 .09026 .20590 .38h32 .59383 .787h8 .923h6 .98736 .3 .05000 .oeho9 .18629 .35h70 .56922 .78h18 .9h053 .2: .05000 .0811): .17981 .35183 .58603 .83153 .5 .05000 .0812? .18379 .37501. .65bh6 .6 .05000 .083h7 .1999? .h3861 .7 .05000 .08892 .23668 .05000 .09858 .05000 Table B. 3.13 .152. 
P30(p1, p2, 13)

p2\p1   .1      .2      .3      .4      .5      .6      .7      .8      .9
 .1   .05000  .10290  .2960   .45571  .66896  .83987  .98390  .98830  .99918
 .2           .05000  .09064  .20683  .38658  .59819  .79312  .92789  .98882
 .3                   .05000  .08861  .18835  .35988  .57815  .79397  .94588
 .4                           .05000  .08216  .18304  .35964  .59826  .88271
 .5                                   .05000  .08212  .18785  .38517  .66988
 .6                                           .05000  .08444  .20526  .45253
 .7                                                   .05000  .09015  .28862
 .8                                                           .05000  .10089

Table B.3.14. P30(p1, p2, 14)

p2\p1   .1      .2      .3      .4      .5      .6      .7      .8      .9
 .1   .05000  .10361  .25268  .46201  .67596  .84463  .94601  .98890  .99924
 .2           .05000  .09136  .20919  .39064  .60315  .79768  .93070  .98955
 .3                   .05000  .08513  .19024  .36381  .58390  .79959  .94826
 .4                           .05000  .08265  .18508  .36436  .60501  .84659
 .5                                   .05000  .08266  .19031  .39040  .67M.1
 .6                                           .05000  .08506  .20801  .45762
 .7                                                   .05000  .09098  .21917
 .8                                                           .05000  .10266
 .9                                                                   .05000

Table B.3.15. P30(p1, p2, 15)

p2\p1   .1      .2      .3      .4      .5      .6      .7      .8      .9
 .1   .05000  .10396  .25347  .46462  .68196  .85159  .94997  .98980  .99928
 .2           .05000  .09165  .21097  .39544  .60946  .80201  .93215  .98980
 .3                   .05000  .08561  .19201  .36671  .58686  .80201  .94997
 .4                           .05000  .08295  .18612  .36671  .60946  .85159
 .5                                   .05000  .08295  .19201  .39544  .68196
 .6                                           .05000  .08561  .21097  .46462
 .7                                                   .05000  .09165  .25347
 .8                                                           .05000  .10396
 .9                                                                   .05000

Appendix C. Exact Power in the 2 x 2 Independence Trial.

In the 2 x 2 independence trial, both sets of marginal totals are fixed. The classic example of an experiment in which the marginal totals should be so regarded is Fisher's tea-tasting experiment, described earlier in section 1.6. The appropriate probability distribution is obtained conditionally from the multinomial distribution, and is given by (1.2.5) in Chapter 1. The exact power function for the test of independence is a conditional power function, for fixed $m_1$ and $m_2$, and is given by

$$P_n(t \mid m_1, m_2) = \frac{\sum_{n_1 \in w(m_1, m_2)} \Pr\{n_1 \mid m_1, m_2, n\}\, t^{n_1}}{\sum_{j} \Pr\{j \mid m_1, m_2, n\}\, t^{j}}, \qquad 0 < t < \infty.$$

In order to relate $P_n(t \mid m_1, m_2)$ to the exact power function in the double dichotomy, we put

$$t = \frac{\lambda (1 - P_A - P_B + \lambda P_A P_B)}{(1 - \lambda P_A)(1 - \lambda P_B)},$$

and to relate it to the power function in the 2 x 2 comparative trial, we set

$$t = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)}.$$

Exact values of $P_n(t \mid m_1, m_2)$ were computed using a distinctly different program from that used for the previous two cases. The accuracy of these values was checked by direct hand computation for several cases, which indicated that the values computed were accurate to a minimum of five decimal places in every case given in the tables. The machine-computed results were given to 11 decimal places, but were rounded off to five places. When $t = 1$, $P_n(t \mid m_1, m_2)$ is equal to .05. This served as a further check on the values computed on the digital computer. For all combinations of $m_1$, $m_2$, and $n$ for which computations were made, the computed value of $P_n(t \mid m_1, m_2)$ for $t = 1$ was accurate to 10 decimal places, which again indicated that the values of power given in this appendix are certainly correct to five decimal places.

The tables in this appendix are divided into three sections, C.1, C.2, and C.3, corresponding to $n = 10$, 20, and 30. Since there are many combinations of $m_1$ and $m_2$ for each value of $n$, exact values for $P_n(t \mid m_1, m_2)$ are given for selected combinations of $m_1$ and $m_2$.

It was convenient to tabulate the power function as a function of $t$ rather than as a function of $\lambda$. For this reason it is necessary first to determine $t$ from the relation

$$t = \frac{\lambda (1 - P_A - P_B + \lambda P_A P_B)}{(1 - \lambda P_A)(1 - \lambda P_B)},$$

where $P_A = m_1/n$, $P_B = m_2/n$. Then exact power may be found by looking in the table for which $m_1$, $m_2$, and $n$ are the marginal totals and sample size. More values of power in the 2 x 2 independence trial may be found in Appendix D.

Table C.1.1

m2 = 4    m2 = 3    m2 = 2
.82859    .27223    .10714
.75971    .25985    .10466
.69941    .24835    .10229
.64626    .23763    .10003
.53773    .21383    .09482    .
51480 .19362 .09018 .38981 .17629 .08603 .33783 .16133 .08231 .2955? .1h832 .07898 ‘ .20716 .11818 .07083 .16855 .103h9 .0666h .11830 .08261 .060142 .08883 .06923 .0562; .07098 .0606h .05350 .06019 .0552; .0517h .05398 .0520? .05069 .05089 .050h6 .05015 .05000 .05000 .05000 .05072 .05038 .05013 .05265 .05138 .050h6 .05550 .05285 .05095 .05906 .05h6? .05155 .06318 . 5676 .05223 .08883 .06923 .05625 .1h858 .09505 . 29 .20716 .11818 .07083 .26071 .13696 .0759? .30873 .1526 .08005 .35157 .16538 .08333 .38981 .17629 .08603 .hzhoh .18559 .08828 .h5h80 .19362 .09018 .5705? .22133 .09 9 451.626 .23763 .10003 . 991:1 .2h835 .10229 .73873 .25593 .10386 .7929? .26593 .10589 .82859 .27 223 .1071h .872b9 .27973 .10862 .88696 .28215 .10909 Table 0.1.2 P10“! h’m2) , 1': m2 = 1112 g 3 m2 _—_ 2 00.02 .31b58 '_.16970 .08722 00.03 .2993? .16h89 .08590 ‘ 00.01; .28528 .16031 .08h63 ‘ 7 00.05 .27220 .15592 .083 0 _ » 00.10 .21893 .1366h .0778). I . 00.125 .19816 .128h2 .07539 > ‘ 00.15 .18031 .12098 .073 g \ F 000175 01611-8? 011112 007107 [ 00.20 .151hh .10813 .0691 00.25 .12935 .097h9 . 79 00.30 .1121? .08863 .06293 00. 0 .08786 .07505 .058h5 00.50 .07236 .06555 .0552 . 00.60 .oézho .05898 .0530h ' “>070 0056111» 0051460 .0515 E 00.80 .052h3 .0518? 43506 ' 00.90 .05055 .05oh3 .0501h Table 0.1.3 .0735h Table 0.2.1 P20“; ' 10’1“?) mag-:8 Table 0.2 .2 "29-6 Table 0.2.3 2200.! 6,6} onu I 6,1.) race I mu) Table 0.3.1 P3005 '15: m2) ~2 - 2 '- .99679 .99703 .98960 .98996 .97779 .9778h .96170 '.9 3 .90681 .9023h .83922 .83012 .7668? .75356 .69h88 .67837 .62618 .60759 .h5101 .h3lh9 .36112 .3 3119 .23355 .22118 .15h7h .JL?05 .1 0 .10208 .0772? .07513 . 1 .0597? .05236 .0521? .05000 .05000 .05193 .05178 .05707 _‘.05651 .06h69 .06353 .07h25 .0723h .08533 .08258 .150717 .1h705 .31171 .29576 .h5101 $31179 .56229 .5h265 .60860 .63058 .71521 .69950 .7 7 .75356 .80729 .79622 .83922 .83012 .92701 .92396 .96170 .9 .97779 .9778h .98620 .986h9 .9932? @3703 o 9 . 
.99809 .99836 .99888 .99901 Table 0.3 .2 P 30 9 (t '15: 312) .98990 .91721 Table 0.3.3 P30“; I 11" m2) P30“! I 13: m2) m2 = 1h 211.2 a 12 m2 = 10 :11.2 = 13 m2 : 12 .99702; .99278 .97696 .99288 .9915? .98999 .98062 .9536}; .9808? .9Sh9h .97789 .9628h .9258? .96332 .97031 .96103 .9h0h8 ’.89530 .9 O .95130 .90262 .8713 .81355 .87278 .88976 .83063 .7936? .73185 .79570 .81720 .75h30 .7561 .65h95 .71810 .7h17h .67929 .58h70 . .66822 .60 7 .5731? .521le .57612 .59921 1.382 . 116528 . .53591 113275 1.0703 .37132 .h0989 .h2733 .3hh71 .32508 .29805 .3276h .3 .22218 .2113? .1963l. .21321 .2201; .114776 .10210 .1339? 41.330 . 685 .10253 .09973 .09552 .100h5 .10203 .07539 .07 .0721h .07h51 .07512 .05988 .059h2 .05866 .05958 .0597? .05220 .05210 .05193 .052 .0521? .05000 .05000 .05000 .05000 .05000 .05180 .05173 .05159 .05176 .0517? .05662 .05635 .0558h .056119 .0 9 ‘.o6376 .06323 .06211. .06352 .063h? .07273 .07188 .07006 .07239 .07223 .08320 .08195 .07926 .08273 .08238 .1h923 .10570 .13726 .M832 .11592 .30201 .29373 .27170 .3009 .29081 1.11123 . 29148 .39632 .hho 6 1.2202 .5965 .5h095 ”.50059 .55h25 .529hh .6h375 .62923 .58503 .60366 . 1h 3 .71303 .698h7 .6528h .7131? .68228 .76695 .75280 .70735 .7673h .73573 .80913 .79566 .75139 .80951 .77837 .8h239 .82972 .7872}; .8h282 . 12 .9325h .92393 .89286 .93292 .91063 .96673 . .938h9 .9670h .9515? .98175 .97791 .96161 .98200 .9712? .98928 .98655 .97hh6 .98908 .98182 .9958? .99h16 .98705 .99588 .99163 .997 .99706 .99255 .99818 .99556 .9985? .99835 .99533 .99879 .99705 .999 0 .99902 .99688 .999 .99 36 .99952 .9993? .99782 .9995? .9989? Table 0.3.1; P30(t'12, m2) 13:12 111-9 10-6 ~257- Twhcds P300; l9.9) . P30“; l 9,6) P30“ '6,“ '88917 ' 029359 .13802 '83966 0281.08 .135”? '79363 .2719? .13299 °6558h .2,-L592 .12h76 .5756? .22758 .11931 ~39921 .18220 .10h85 .35599 .16973 .10060 .28535 .18799 .09285 .23230 .12985 .08601 .15861 .1019h .o7h73 £81166 .06876 .05972 '06706 435961: .0551? .0567h .0539b, .05218 .0515? 
.05091  .05052
.05000  .05000  .05000
.0512   .05079  .05017
.05466  .ogg98  .ggiégg
. 7h .0 3o . .059 .8205; .ggggé ‘ .0 6 . 5 . .173 g .0988? .08295 ‘ .23560 ’lgiéé .21 23 . 8 .2 . 35; 8°27 57.28 . 2 o . 0702 . 39%; 1.6523 1.01136
.64745  .51586  .05581
.69440  .55996  .5018
.73369  .59889  .58297
.85626  .73271  .69258
.91462  .80961  .782314
.94558  .85755  .83948
.96345  .88941  .87773
7 781-3832 . 8 .9 919 o
-;39§§i .96230  .96384
.99584  .97097  .97343
.99713  .97689  .97986

Appendix D. Comparison of Power Functions.

The three exact power functions $P_n(\lambda, P_A, P_B)$, $P_n(\lambda, P_A, P_B, m_1)$, and $P_n(\lambda, P_A, P_B, m_1, m_2)$ are compared in this appendix. More precisely, tables are given comparing the three power functions simultaneously, with $P_n(\lambda, P_A, P_B, m_1)$ evaluated at $m_1 = nP_A$, and $P_n(\lambda, P_A, P_B, m_1, m_2)$ evaluated at $m_1 = nP_A$ and $m_2 = nP_B$. Comparisons are made for $n = 10$, 20, and 30, with various combinations of $P_A$ and $P_B$. The first column gives the dependence parameter $\lambda$, and then values of power for the double dichotomy, the 2 x 2 comparative trial, and the 2 x 2 independence trial are given in the next three columns. In the last column values of the normal approximations (3.1.4) and (3.3.5), which are identical for the particular choices of $P_A$, $P_B$, $m_1$, and $m_2$ in these tables, are given. This last set of figures may be used to evaluate the adequacy of either (3.1.4) as an approximation to $P_n(\lambda, P_A, P_B, m_1, m_2)$, or (3.3.5) as an approximation to $P_n(\lambda, P_A, P_B)$.

Table D.1.1

  λ       I        II       III
  .1     ----    .93284   .80107
  .2    .80866   .78609   .64688
  .3    .62018   .61666   .49672
  .4    .8870?   .45653   .36521
  .5    .30848   .32107   .25826
  .6    .20593   .21537   .17650
  .7    68       .13874   .11779
  .8    .08598   .08791   .07897
 1.0    .05000   .05000   .05000
 1.1    .0 8     .05923   .05706
 1.2    .08598   .08791   .07897
 1.3    368      .13874   .11779
 1.4    .20593   .21537   .17650
 1.5    .30848   .32107   .25826
 1.6    .1070?   .45653   .36521
 1.7    .62018   .61      .49672
 1.9     ----    .93284   .80107

I. P10(λ, .5, .5, 5, 5)     III. P10
.75763  .62362  .75763
I. P20(λ, .4, .4, 8, 8)     II.
P20( 2.1.1.8) Table 13.2.3 II .85130 271252 .56531 .h2576 .9998); P20< R, 01-19014) I .80315 .6257? .87265 .38589 . 2115 25 .16882 .113811 .07732 .0566h .05000 \ . 1.1 .05653 1.2 .07632 1.3 .11022 . 1.8 ' .15936 ; 1.5 .22h66 = 1.6 .30608 1.7 1.0205 1.8 050891 1.9 .62080 2.0 0730014 2.1 .82809 2g .%NH: 2 .3 096199 2.h .9916? I. P20(),,3,.h,6,8) II. 220( ),.3,.u,6) Table D.2.h .63602 .73382 .82200 .89501 .9h895 III. P20(;\,.3,.h) .50226 .59932 .69h79 .78313 .85918 .91905 Table D. 2 .5 .30596 036726 . 62 “51‘0" -28980 .3332. .207 28 .2261? .21375 016616 .17h71 .1 013106 .13380 .12811 -10 .10231 .09363 ~07953 .07886 .07633 .063 .062 67 .0617 3531111 .05315 .05292 ~88: 200° - . .0 317 o 05 293 o06398 .06281. .06186 .08199 .07933 .07708 .10783 .10305 .09897 . 77 1613 .12791 .18391 'j.1737h .16h20 o 2 09 I. 2 . 2079 2 .291811 .27682 .25892 .35631 .33903 .31666 2 0796 . 380214 I .50011 1.8171. 11.830 r .57589 .55810. .519 7 . 1112 .63559 .5906? 321131. .71063 . 72 .792214 .78076 .72700 .85276 .8h336 .7875 .90383 .89618 .8u038 .9 .9377h .88b69 .97210 .9675h .91998 .98916 .98630 .9 663 E .99722 .99595 .9657? --—-- .99913 .97910 } ""'"- 1. 098861-1- 1. P20( 1’03, 03.9696) mo P20( 3’03, 03) II. P20( A, 03,0336) Table D.3.1 I II 1.00000 1.00000 .99929 .99928 .99130 .98788 .98201 .93215 .8093? .79571 .59762 .58686 .36830 .36323 .20028 .18612 .‘9 .08983 .08263 1.0 .05000 .05000 1.1 .08983 .08263 1.2 .20028 .18612 1.3 .36830 .36323 1.8 .59762 .58686 1.5 .8093? .79571 1.6 .98201 .93215 1.7 .99130 .98788 1.8 .99929 .99928 1.9 1.00000 1.00000 10- P30(A, 05305915915) no 1330(2305, 05,15) IV. .92299 '.98b85 .99889 .99999 III. P30(;\,.5,.5) .180h9 .07832 .05000 .07832 .180h9 .3567? .5852? .80522 .9h695 .99501 Normal Approximation 1,2 41.682 1.3 .27391 1.1. 111675 1.5 .63796 1.6 .80775 1.7 .92823 1.8 .981h6 11.9 0 98314 I. P30(A,.h,.5,12,15) II. P (A,.h,.5,15) 30 Table D.3.2 111. P30( A,.h, .5) .13532 .07061 .05 000 .07061 .6131? 
.45190  .30659  .19122  .11101
.94896
 2.0   .98853
 2.1   .99706
 2.2   .99953
 2.3  1.00000
 2.4  1.00000

I. P30(λ, .4, .4, 12, 12)     II. P30(λ, .4, .4, 12)     III. P30(λ, .4, .4)     IV. Normal Approximation

III:  .95914  .87368  .74398  .58794  .99957  .99983
IV:   .99584  .92299  us.“  .591129  .2818?  .Iooh9  .05000  ----  .1279?  .29294  m  .58387  .85915  .985112  .99995  1.00000

Table D.3.4

  λ       I        II       III
  .1    .77941   .64201   .60156
  .2    . 7297   .50126   .47297
  .3    .43930   .38103   .36179
  .4    .31821   .28227   .26957
  .5    .22500   .20809   .19602
  .6    .15572   311.51;  .13972
  .7    .10655   .10131   .09872
  .8    .07012   .07220   .07107
  .9    .05585   .05545   .05517
 1.0    .05000   .05000   .05000
 1.1    .05568   .05542   .05
 1.2    .07272   .07188   .07078
 1.3    .10140   .09990   .09738
 1.4    .10219   .14006   .13558
 1.5    .19536   .19266   .18561
 1.6    .26067   .25738   .24721
 1.7    .33705   .33302   .21926
 1.8    .42238   .41736   .39967
 1.9    .51346   .50716   .
 2.0    .60619   .59838   .57286
 2.1    .69589   .68      .65791
 2.2    .77793   .76751   .73672
 2.3    .8 833   .83751   .80609
 2.4    .90450   .89824   .86392
 2.5    .9558    .93685   .90936
 2.6    .97269   .96613   .94289
 2.7    .98834   .98816   .96602
 2.8    .99605   .99383   .98086
 2.9     ----    .99814   .98973
 3.0     ----    .99962   .99167
 3.1     ----    .99997   .99727
 3.2     ----   1.00000   .99862
 3.3     ----   1.00000   .99936

I. P30(λ, .3, .3, 9, 9)     II. P30(λ, .3, .3, 9)     III. P30(λ, .3, .3)

Appendix E. Approximations to Power in the 2 x 2 Comparative Trial.

We discussed the power for the test of independence in the 2 x 2 comparative trial in section 3.2, and gave tables of exact power in Appendix B and in Appendix D. In E.1 of this appendix, three graphs are given, two of which illustrate how the power changes as $m_1$ varies from 0 to $n/2$, and the third graph contains some power curves for $m_1 = 15$ and $n = 30$. We fix $p_2$ and then plot $P_{30}(p_1, p_2, 15)$ as a function of $p_1$.
This is done for $p_2 = .1, .2, .3, \ldots, .9$. In E.2, Sillitto's approximation, given by (3.2.6), and Patnaik's second approximation, given by (1.7.18), are compared with the exact values of power in the 2 x 2 comparative trial, for the two special cases $m_1 = 10$, $n = 20$, and $m_1 = 15$, $n = 30$. These values occur in groups of four for each $p_1, p_2$ combination, with the top number being the exact one, the second and third obtained from Sillitto's and Patnaik's approximations respectively, and the fourth number being $P_n(t \mid m_1, X)$, where $X = m_1 p_1 + (n - m_1) p_2$ and

$$t = \frac{p_1 q_2}{p_2 q_1}.$$

Not all $P_n(t \mid m_1, X)$ values are given for each $p_1, p_2$ combination.

[Graphs E.1.1 and E.1.2: power as a function of m1, with horizontal axes running from 0 to 10 and from 0 to 15; one curve for each of |p1 - p2| = .1, .2, ..., .8.]

[Graph E.1.3: P30(p1, p2, 15) as a function of p1; vertical axis from .100 to 1.000.]

Table E.2.1
Comparisons of Power in the 2 x 2 Comparative Trial

p1:  .2 .3 .4 .5 .6 .7 .8 .9
.0783 .1681; .3036 .8773 .6569 .8121; .9231 .9826
.0973 .2109 .3683 .5152 .7137 .8893 .9397 .9856
.0500 .0977 .2273 .1011. .6183 .7930 .9118 .9785 .9983
.8688 .9909
.0500 .0789 .1883 .2681. .0275 .6078 .7831. .9231
.0500 .0813 .1671 .3015 .8722 .6550 .8206 .9397
.0500 .0820 .1717 .3162 .5018 .6999 .8703 .9785
.0500 1.3116 .8123
.0500 .0722 .1389 .2521. 2 .6078 .8121:
.0500 .0756 . 2 .2780 1.525 .6550 .8493
.0500 .0? 1 .1535 2860 .8736 .6999 .9118
.0500 .4259
0 05w 007$ o 58 0 25214 018275 . 6569
. 00 .0736 . .2780 .8722 .7137
.0500 .07 .1482 .2860 .5018 .7930
.0500 .1373 1.31:6
.0500 .0709 .1389 .2681. .8773
.0500 .0736 .1512 .3015 .5852
.0500 .0781 .1535 .3162 .6113
.0500 . 688
.0500 .0722 .1183 .3036
.0500 .0756 .1671 .3683
.0500 .0761 .1717 .
.0500
.0500 08119 .1611; 1 . . .21 1‘ .7

First No.: P20(p1, p2, 10)
Second No.: Sillitto's Approx.
.8538 .0820 .223;
'0500 .0 00 . 8
.OEOO .8797;
.0500 .0997
.0500 .0500 .0500 .0500 .0500

Third No.: Patnaik's Approx.
Fourth No.: P20(t | 10, X)

Table E.2.2
Comparisons of Power in the 2 x 2 Comparative Trial

p1:  .1 .2 .3 .4 .5 .6 .7 .8 .9
.0500 .1080 .2535 .8686 .6820 .8516 .9500 .9848 .9993
.0500 .1216 .2927 .5117 .7193 .8711; .9560 .9905 .9991
.0500 .1285 .3163 .5663 .7872 .9251 .9828 .9982 1.0000
.0500 6773 .9595 . .9993
.0500 .0916 .2310 .3951: .6095 .8020 .9322 .9898
.0500 .0971; .2279 .18220 .6387 .8240 .9 .9
.0500 .0982 .2339 .hlal .6715 .8598 .9668 .9982
.0500 - .6380 .9420
.0500 .0888 .2036 .3897 .6161 .8280 .9560
.0500 .0892 .2069 . .6315 .8598 .9828
.0500 .2037 .5976 .9595
.0500 .0830. .1861 .3667 .6095 .8516
.0500 .0855 .1968 .3896 .6387 .8711;
.0500 .0862 .1993 016006 c 6715 O9 251
.0500 .2003 .6380
.0500 .0830 .1920 .3951; .6820
.0500 08.55 .2036 . 220 .7193
.0500 .0862 .2069 . .7872
.0500 .2037 .6773
.0500 .0856 .2110 4.6116 5
.0500 .0888 .2279 .5117 .
.0500 .0892 .2339 .5663
.0500
.0500 .0916 .2535
.0500 .0971; .2927
.0500 .0982 .3163
.0500
.0500 .101“)
.0500 .th5
.0500 .0500 .0500 .9 .0500 .0500

First No.: P30(p1, p2, 15)
Second No.: Sillitto's Approx.
Third No.: Patnaik's Approx.
Fourth No.: P30(t | 15, X)

BIBLIOGRAPHY

1. Armsen, P., "A new form of table for significance tests in a 2 x 2 contingency table," Biometrika, Vol. 42 (1955), pp. 494-511.

2. Barnard, G. A., "Statistical inference," Journal of the Royal Statistical Society, Series B, Vol. 11 (1949), pp. 115-139.

3. Barnard, G. A., "Significance tests for 2 x 2 tables," Biometrika, Vol. 34 (1947), pp. 123-138.

4. Cochran, W. G., "The χ²-test of goodness of fit," Annals of Mathematical Statistics, Vol. 23 (1952), pp. 315-3

5. Feller, W., "An Introduction to Probability Theory and Its Applications," 2nd edition, John Wiley and Sons, New York, 1957.

6. Finney, D. J., "The Fisher-Yates test of significance in 2 x 2 contingency tables," Biometrika, Vol. 35 (1948), pp. 145-156.

7. Fisher, R. A., "Statistical Methods for Research Workers," 2nd edition, Oliver and Boyd Ltd., Edinburgh, 1935.

8. Fisher, R. A., "The logic of inductive inference," Journal of the Royal Statistical Society, Vol. 98 (1935), pp. 39-

9. Katz, Leo, "The test of the hypothesis of no association in the four-fold table in light of the Neyman-Pearson Theory," unpublished memorandum, Michigan State University.

10. Latscha, R., "Tests of significance in 2 x 2 tables: extension of Finney's table," Biometrika, Vol. 40 (1953), pp. 74-86.

11. Lehmann, E. L., "Significance level and power," Annals of Mathematical Statistics, Vol. 29 (1958), pp. 1167-1176.

12. Loeve, M., "Probability Theory," D. Van Nostrand Co., Inc.

13. Moore, P. G., "A test for randomness in a sequence of two alternatives involving a 2 x 2 table," Biometrika, Vol. 36 (1949), pp. 305-316.

14. Patnaik, P. B., "The power function of the test for the difference between two proportions in a 2 x 2 table," Biometrika, Vol. 35 (1948), pp. 157-175.

15. Patnaik, P. B., "The non-central χ²- and F-distributions and their applications," Biometrika, Vol. 36 (1949), pp. 202-232.

16. Pearson, E. S., "The choice of statistical tests illustrated on the interpretation of data classed in a 2 x 2 table," Biometrika, Vol. 34 (1947), pp. 139-167.

17. Pearson, K., "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling," Philosophical Magazine, Series 5, Vol. 50 (1900), pp. 157-172.

18. Riordan, J., "An Introduction to Combinatorial Analysis," John Wiley and Sons, Inc., New York, 19 .

19. Rokhar, A. E., "The effect of hypergeometric probability distribution on the design of sampling plans for small lot sizes," Master's thesis, Lehigh University, 1958.

20. Sekar, C. G., Agarwala, S. P., and Chakraborty, P. N., "On the power function of a test of significance for the difference between two proportions," Sankhya, Vol. 15 (1955), pp. 381-390.

21. Sillitto, G. P., "Note on approximation to the power function of the 2 x 2 comparative trial," Biometrika, Vol. 36 (1949), pp. 347-352.

22. Snow, C., "Hypergeometric and Legendre Functions with Applications to Integral Equations of Potential Theory," National Bureau of Standards Applied Mathematics Series 19, 1952.

23. Steck, G. P., "Limit theorems for conditional distributions," University of California Publications in Statistics, Vol. 2, No. 12 (1957), pp. 237-284.

24. Sverdrup, E., "Similarity, unbiasedness, minimaxibility and admissibility of statistical test procedures," Skandinavisk Aktuarietidskrift, Vol. 36 (1953), pp. 64-86.

25. Tocher, K. D., "Extension of the Neyman-Pearson Theory of tests to discontinuous variates," Biometrika, Vol. 37 (1950), pp. 130-144.

26. Yates, F., "Contingency tables involving small numbers and the χ²-test," Supplement, Journal of the Royal Statistical Society, Vol. 1 (193 ), pp. 217-235.
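As a computational footnote to Appendix C, the conditional power function defined there can be sketched in a few lines. This is an illustration under stated assumptions, not the program used for the tables: the function names are hypothetical, the critical region is passed in as a plain set of values of $n_1$ chosen by the caller, and no randomization on the boundary is performed. Since the tabulated power equals .05 exactly at $t = 1$, the test in the thesis presumably randomizes, so this sketch will not reproduce the tables as it stands. The null probabilities are taken to be hypergeometric, whose normalizing constant cancels between numerator and denominator.

```python
from math import comb


def odds_ratio(lam, P_A, P_B):
    """t as a function of (lambda, P_A, P_B), as given in Appendix C."""
    num = lam * (1.0 - P_A - P_B + lam * P_A * P_B)
    den = (1.0 - lam * P_A) * (1.0 - lam * P_B)
    return num / den


def conditional_pmf(t, m1, m2, n):
    """Distribution of n1 given both margins: weights C(m1, j) C(n-m1, m2-j) t^j,
    normalized over all admissible j."""
    lo, hi = max(0, m1 + m2 - n), min(m1, m2)
    weights = {j: comb(m1, j) * comb(n - m1, m2 - j) * t ** j
               for j in range(lo, hi + 1)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def conditional_power(t, m1, m2, n, region):
    """Probability, at odds ratio t, that n1 falls in the given critical region
    (a non-randomized region supplied by the caller)."""
    pmf = conditional_pmf(t, m1, m2, n)
    return sum(pmf.get(j, 0.0) for j in region)


# Illustrative values only: n = 10, m1 = 4, m2 = 3, region {n1 = 3}.
size = conditional_power(1.0, 4, 3, 10, {3})
power = conditional_power(odds_ratio(1.2, 0.4, 0.3), 4, 3, 10, {3})
```

At $t = 1$ the function returns the size of the chosen region under independence, which is the same kind of check the thesis describes for the machine computations.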