This is to certify that the dissertation entitled

Interval Estimation for the Difference of Two Binomial Proportions in Non-adaptive and Adaptive Designs

presented by Yichuan Xia has been accepted towards fulfillment of the requirements for the degree of Doctor of Philosophy.

Co-Major professor: Connie Page
Co-Major professor: Vincent Melfi
Date: July 12, 2002

Interval Estimation for the Difference of Two Binomial Proportions in Non-adaptive and Adaptive Designs

By Yichuan Xia

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2002

ABSTRACT

When comparing two treatments with dichotomous responses, the difference in proportions of successful responses of the two groups is often of primary interest. Confidence intervals are typically provided to estimate the treatment difference.
This interval estimation problem is studied in this dissertation for both non-adaptive and adaptive designs.

Several methods of constructing confidence intervals for the difference of two proportions are evaluated in non-adaptive designs. We begin by exploring the poor performance of the most widely used confidence interval, the Wald interval. We show that the poor behavior results mainly from its inappropriate center: the coverage performance can be improved greatly by simply recentering the Wald interval. We then derive a formula that gives a smooth approximation of the coverage probability of the Wald interval. With the oscillation set aside, this approximation shows how far the coverage probability of the Wald interval falls below the nominal level. Our analysis demonstrates that the Wald interval is rather anti-conservative and often behaves much worse than is commonly expected. As alternatives, we evaluate the Wald interval with continuity correction, two confidence intervals with adjusted centers (a Bayesian interval derived from Beta priors and the Agresti-Coull interval, which adds 2 successes and 2 failures), and the profile likelihood based confidence interval. We compare both their coverage performance and their expected lengths with those of the standard Wald interval. To replace the Wald interval, intervals with adjusted centers are recommended.

Adaptive designs are gaining more attention nowadays. For adaptive designs, the validity of the confidence interval constructions discussed for non-adaptive designs is verified. We evaluate the performance of those confidence intervals in two general categories of adaptive designs: allocation adaptive designs and response adaptive designs. We develop theorems concerning the connections between the coverage performance and expected lengths of confidence intervals based on non-adaptive and allocation adaptive designs.
The theorems suggest that the Wald interval does not behave satisfactorily and that the intervals with adjusted centers should be used in allocation adaptive designs. Extensive simulation supports the same conclusion in response adaptive designs.

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my dissertation advisors, Professor Vincent Melfi and Professor Connie Page, for your constant guidance, generous support and extreme patience during the writing of this dissertation. Your dedication and contribution to statistics have been my main source of inspiration and encouragement during my research. I am extremely grateful you suggested that I work in adaptive designs. I sincerely appreciate the time you put into our weekly meetings and the help you offered whenever I needed it. Your willingness to help at any time encouraged me to keep going no matter what difficulties I met during the research of this dissertation. I also thank you for all the time you spent on proofreading my dissertation drafts. I understand that it was not comfortable to read and correct a non-native speaker's tedious and stiff writing on statistics. If time could flow backwards, I would not hesitate even a second to ask you to be my thesis advisors again.

I would also like to thank Professor Roy Erickson and Professor Habib Salehi for serving on my thesis committee. Your help is highly appreciated. My special thanks go to Professor Roy Erickson and Professor Hira Koul, who taught me Theory of Probability and Theory of Statistics during my first year of graduate study. The two courses turned out to be very important in my research. I also thank Professor Connie Page for training me to be a (good) statistical consultant, which helped me a lot in my job search and will benefit my future career.
I also owe Professor Vincent Melfi many thanks for teaching me two courses, Modern Statistics and Sequential Analysis, and for guiding me on the usage of LaTeX and C/C++ when I was in trouble with them.

I will not forget to express my sincere gratitude to Professor Stapleton. Thank you for being a true friend for so many years. The times I spent with you are lovely memories. I will not forget your warm but firm encouragement when I faced some trouble just before and after I came to the department; your suggestions on how to be a (good) teaching assistant; your corrections of my English errors in emails; the jokes we played back and forth...

I also thank the department for offering me the assistantship for four years.

Last but not least, I want to thank all the professors and friends who helped me during my stay at Michigan State University. Life here became easy, meaningful and colorful because of you.

List of Figures

2.1 Exact coverage probability of the nominal 95% Wald interval for $p_1 = 0.9$, $p_2 = 0.1$ and $n_1 = n_2 = n = 6$ to 100
2.2 Exact coverage probability of the nominal 95% Wald intervals for $p_1 = p_2 = 0.5$ and $p_1 = p_2 = 0.9$ with $n_1 = n_2 = n = 20$ to 100
2.3 Exact coverage probability of the nominal 95% Wald interval at $n_1 = n_2 = 10$ and $n_1 = 10$, $n_2 = 100$ with $p_2 = 0.9$ and $p_1 = 0.8$ to 0.999 with jump size 0.001
2.4 Exact and SE approximate coverage probabilities of the nominal 95% Wald intervals for $p_1 = 0.9$, $p_2 = 0.1$ and $n_1 = n_2 = n$
2.5 Exact and SE approximate coverage probabilities of the nominal 95% Wald intervals for $p_1 = 0.8$, $p_2 = 0.3$ and $n_1 = n_2 = n$
2.6 Exact and SE approximate coverage probabilities of the nominal 95% Wald intervals for $p_1 = 0.8$, $p_2 = 0.3$ and $n_2 = 2n_1$
3.1 Exact coverage probability boxplots of some 95% nominal intervals
3.2 Comparison of exact coverage probabilities for $p_2 = 0.5$, $n_1 = n_2 = 20$ at 95% nominal level
3.3 Comparison of exact coverage probabilities for $p_1 = p_2 = 0.01$ to 0.99, $n_1 = n_2 = 20$ at 95% nominal level
3.4 Comparison of exact coverage probabilities at $p_1 = 0.7$, $p_2 = 0.5$, $n_2 = 2n_1$ at 95% nominal level
3.5 Comparison of exact coverage probabilities at $p_1 = 0.9$, $p_2 = 0.8$, $n_2 = 2n_1$ at 95% nominal level
3.6 Comparison of approximate expected lengths of some confidence intervals for $p_2 = 0.5$ and $n_1 = n_2 = 25$
3.7 Comparison of approximate expected lengths of some confidence intervals for $p_2 = 0.1$ and $n_1 = 10$, $n_2 = 20$
3.8 Comparison of coverage probabilities for $p_1 = 0.21$ to 0.99, $p_2 = p_1 - 0.2$, $n_1 = n_2 = 20$ at 95% nominal level
4.1 Coverage probability boxplots of some 95% nominal intervals under RPW(1,1)
4.2 Expected length boxplots of some 95% nominal intervals under RPW(1,1)
4.3 Coverage probabilities of three 95% nominal intervals for $n = 20$ and $p_A = 0.5$ under RPW(1,1)
4.4 Expected lengths of three 95% nominal intervals for $n = 20$ and $p_A = 0.5$ under RPW(1,1)
4.5 Coverage probabilities of three 95% nominal intervals for $n = 20$ and $p_A = 0.9$ under RPW(1,1)
4.6 Expected lengths of three 95% nominal intervals for $n = 20$ and $p_A = 0.9$ under RPW(1,1)
4.7 Coverage probabilities of three 95% nominal intervals for $n = 10$ to 100 and $p_A = 0.7$, $p_B = 0.4$ under RPW(1,1)
4.8 Expected lengths of three 95% nominal intervals for $n = 10$ to 100 and $p_A = 0.7$, $p_B = 0.4$ under RPW(1,1)

TABLE OF CONTENTS

1 Literature Review
  1.1 Introduction
  1.2 Some Confidence Intervals and Comparisons
  1.3 Application
2 Wald Interval Estimation for the Difference of two Binomial Proportions
  2.1 Introduction
  2.2 Coverage Properties of the Wald Confidence Interval
  2.3 A Reason for Inadequate Coverage
  2.4 A smoothing formula obtained by Edgeworth Expansion methods
3 Interval Estimation for the Difference of two Binomial Proportions
  3.1 Introduction
  3.2 Some Alternative Intervals
    3.2.1 The Wald interval with continuity correction
    3.2.2 Intervals with adjusted center
    3.2.3 The profile likelihood based intervals
  3.3 Comparison of Intervals with Explicit Forms
    3.3.1 Coverage Properties
    3.3.2 Expected Lengths
  3.4 Comparison of the Wald Interval and all Proposed Alternatives
4 Interval Estimation for the Difference of Two Binomial Proportions in Adaptive Designs
  4.1 Introduction
  4.2 Notation and Some Adaptive Designs
    4.2.1 Some Allocation-Adaptive Designs
    4.2.2 Some Response-Adaptive Designs
  4.3 The Confidence Intervals in Adaptive Designs
  4.4 Comparison of Confidence Intervals in Allocation Adaptive Designs
  4.5 Comparison of Confidence Intervals in Response Adaptive Designs
  4.6 Conclusion
Bibliography

Chapter 1

Literature Review

1.1 Introduction

In clinical trials and in industrial work, when a new treatment is compared with a standard (control) treatment, the difference in the probabilities of successful responses of the two groups is often of primary interest. Confidence intervals are typically provided to estimate the treatment difference. Many methods exist for constructing confidence intervals for the difference of the two success probabilities.
The most widely used confidence interval, the Wald interval, is an asymptotic confidence interval based on a normal approximation, and it does not behave satisfactorily. In this dissertation, the poor performance of the Wald interval and the reason for its poor coverage are explored in Chapter 2. In Chapter 3, some selected methods for constructing confidence intervals for the difference of two treatment proportions are evaluated and compared with the Wald interval. We restrict attention to non-adaptive designs in these two chapters.

Nowadays adaptive designs, in which the allocation of the next subject to a treatment depends on the accumulating information, are more widely used. The interval estimation problem is studied for adaptive designs in Chapter 4.

In this dissertation, we use three types of "coverage" probabilities: exact, approximate and nominal coverage probabilities. The exact coverage probability of a confidence interval is the actual coverage probability of that interval. The approximate coverage probability is an approximation of the exact coverage probability. We will use an Edgeworth expansion to derive the approximate coverage probability of the Wald interval. The nominal coverage probability is the stated confidence level. For example, a 95% confidence interval has nominal coverage probability 0.95, though its exact coverage probability might differ from the claimed level. Sometimes we do not specify whether a coverage probability is exact, approximate or nominal if it is obvious from the context.

Before presenting our findings, it is useful to give a survey of related literature.

1.2 Some Confidence Intervals and Comparisons

Though we are interested in confidence intervals for the difference of two proportions, it is worthwhile to mention two papers on confidence intervals for one proportion which have influenced our study.
Let us begin by introducing the Wald intervals for one proportion and for the difference of two proportions.

Let $X$ denote the number of successes from $n$ Bernoulli trials with success probability $p$ and let $\hat p = X/n$ denote the sample proportion. For two independent treatments, let $X_1$, $X_2$ denote the numbers of successes from treatment 1 and treatment 2 respectively, so that $X_i \sim \mathrm{Bin}(n_i, p_i)$ for $i = 1, 2$, and let $\hat p_i = X_i/n_i$. Let $z_\alpha$ represent the $100(1-\alpha)$ percentile of the standard normal distribution.

1. The $100(1-\alpha)\%$ Wald confidence interval for $p$ is
$$\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}.$$
2. The $100(1-\alpha)\%$ Wald confidence interval for $p_1 - p_2$ is
$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.$$

One way to derive these confidence intervals is to invert large sample Wald tests, which evaluate standard errors at the maximum likelihood estimates. For instance, the interval for $p$ is the set of $p_0$ values having P-value exceeding $\alpha$ in testing $H_0: p = p_0$ versus $H_a: p \neq p_0$ using the approximately normal Wald test statistic. The Wald intervals are sometimes called the standard intervals. Although these two intervals are simple and are applied most often, a considerable literature shows that they behave poorly.

Brown et al. (2002) consider confidence intervals for one proportion. They note a widespread misconception that the problems of the Wald interval are serious only when $p$ is close to either boundary or when the sample size $n$ is rather small. Brown et al. (2002) show that virtually all of the conventional wisdom and popular prescriptions are misplaced, because the Wald interval has a pronounced systematic bias due to its inappropriate center. They derive two-term Edgeworth expansions as an analytical tool to compare and rank some selected intervals with regard to their coverage probabilities. They also give two-term expansions for the expected lengths of the Wald interval and some alternative intervals.
When deriving the two-term Edgeworth expansions for the coverage probabilities of those intervals for $p$, Brown et al. (2002) express all the confidence intervals in a unified form:
$$\left\{ l_* \le \frac{n^{1/2}(\hat p - p)}{\sqrt{p(1-p)}} \le u_* \right\},$$
where $l_*$ and $u_*$ are not related to the sample proportion $\hat p$. Since the statistic $n^{1/2}(\hat p - p)/\sqrt{p(1-p)}$ has lattice structure, a direct application of a theorem of Bhattacharya and Ranga Rao (1976) gives the desired Edgeworth expansions. But this method does not apply in two treatment problems.

Brown et al. (2002) show that the Wilson confidence interval for $p$, due to Wilson (1927), behaves much better than the standard interval. The Wilson interval is based on inverting the test with the standard error evaluated at the null hypothesis value, which is the score test approach. Given level of significance $\alpha$, this interval contains all $p_0$ values for which
$$\frac{n^{1/2}|\hat p - p_0|}{\sqrt{p_0(1-p_0)}} < z_{\alpha/2}$$
and has the form
$$\frac{X + z_{\alpha/2}^2/2}{n + z_{\alpha/2}^2} \pm \frac{z_{\alpha/2}\, n^{1/2}}{n + z_{\alpha/2}^2}\sqrt{\hat p(1-\hat p) + \frac{z_{\alpha/2}^2}{4n}}. \tag{1.2.1}$$
This interval turns out to behave better than the Wald interval for $p$.

Some confidence intervals for $p_1 - p_2$ are motivated by the Wilson interval for $p$. Agresti and Coull (1998) noticed that (1.2.1) can be rewritten in the following way:
$$\hat p\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{2}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right) \pm z_{\alpha/2}\sqrt{\frac{1}{n + z_{\alpha/2}^2}\left[\hat p(1-\hat p)\left(\frac{n}{n + z_{\alpha/2}^2}\right) + \frac{1}{4}\left(\frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2}\right)\right]}.$$
Hence, the midpoint of the Wilson interval is a weighted average of $\hat p$ and $1/2$, and it equals the sample proportion after adding $z_{\alpha/2}^2$ pseudo observations, half of each type. The square of the coefficient of $z_{\alpha/2}$ in this formula is a weighted average of the variance of a sample proportion when $p = \hat p$ and the variance of a sample proportion when $p = 1/2$, using $n + z_{\alpha/2}^2$ in place of the usual sample size $n$.
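The equivalence between the closed form (1.2.1) and its weighted-average rewriting can be checked numerically. The following Python sketch is an illustration only: the function names `wilson_interval` and `wilson_as_weighted_average` and the constant `Z_975` are hypothetical choices made here and do not come from the dissertation, and the default critical value assumes a nominal 95% level.

```python
from math import sqrt

# Assumed critical value: upper 2.5% point of N(0, 1), i.e. a nominal 95% level.
Z_975 = 1.959963984540054


def wilson_interval(x, n, z=Z_975):
    """Wilson (score) interval for one proportion in the closed form (1.2.1)."""
    p_hat = x / n
    center = (x + z * z / 2) / (n + z * z)
    half = z * sqrt(n) / (n + z * z) * sqrt(p_hat * (1 - p_hat) + z * z / (4 * n))
    return center - half, center + half


def wilson_as_weighted_average(x, n, z=Z_975):
    """Same interval written with the weights n/(n+z^2) and z^2/(n+z^2)."""
    p_hat = x / n
    w = n / (n + z * z)                  # weight given to the data
    center = w * p_hat + (1 - w) * 0.5   # midpoint: average of p_hat and 1/2
    # Weighted average of the two variances, with n + z^2 as the sample size.
    variance = (w * p_hat * (1 - p_hat) + (1 - w) * 0.25) / (n + z * z)
    half = z * sqrt(variance)
    return center - half, center + half


if __name__ == "__main__":
    for x, n in [(7, 20), (1, 10), (45, 50)]:
        print((x, n), wilson_interval(x, n), wilson_as_weighted_average(x, n))
```

Both functions should return the same endpoints up to rounding error, and each endpoint solves the score equation $n^{1/2}|\hat p - p_0| = z_{\alpha/2}\sqrt{p_0(1-p_0)}$.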
Motivated by this decomposition of the Wilson interval, Agresti and Caffo (2000) proposed adding 4 pseudo observations, one success and one failure for each treatment, to get a confidence interval for $p_1 - p_2$; that is, the Wald formula is applied with $X_i + 1$ successes out of $n_i + 2$ trials for $i = 1, 2$.

Also motivated by the Wilson confidence interval for $p$, Newcombe (1998) proposed a method that performs substantially better than the Wald interval. This confidence interval is built from the single-sample score intervals for $p_1$ and $p_2$. Specifically, let $l_i < u_i$ be the roots for $p_i$ in
$$\frac{n_i^{1/2}|\hat p_i - p_i|}{\sqrt{p_i(1-p_i)}} = z_{\alpha/2}$$
for $i = 1, 2$. Newcombe's hybrid score $100(1-\alpha)\%$ interval is defined as
$$\left( (\hat p_1 - \hat p_2) - z_{\alpha/2}\sqrt{\frac{l_1(1-l_1)}{n_1} + \frac{u_2(1-u_2)}{n_2}},\ (\hat p_1 - \hat p_2) + z_{\alpha/2}\sqrt{\frac{u_1(1-u_1)}{n_1} + \frac{l_2(1-l_2)}{n_2}} \right).$$
Unlike many other confidence intervals, Newcombe's interval is not symmetric around $\hat p_1 - \hat p_2$. Its margins of error differ from those of the Wald interval.

Newcombe (1998) evaluates eleven methods of constructing confidence intervals for $p_1 - p_2$ through simulation. Some of those confidence intervals have relatively complicated expressions compared to the intervals discussed so far, such as the Wald interval and the Agresti-Coull interval. Newcombe (1998) suggests replacing the Wald interval with the Newcombe hybrid score interval. The profile likelihood method (introduced in detail later), involving
$$\left\{ \Delta \in (-1, 1) : 2\left( l(\hat\Delta, \hat p_2) - \tilde l(\Delta) \right) \le \chi^2_1(\alpha) \right\},$$
where $\Delta = p_1 - p_2$, $\hat\Delta = \hat p_1 - \hat p_2$, $l$ denotes the log-likelihood function of $(\Delta, p_2)$ and $\tilde l(\Delta) = \max_{p_2} l(\Delta, p_2)$, is also considered in his paper. Newcombe (1998) shows that this interval has the best coverage and location properties among all eleven confidence intervals, but it displays an undesirable anomaly. Suppose $X_1$, $n_1$ and $\hat p_2$ are held constant, while $n_2 \to \infty$ through values which keep $X_2$ integer valued. One expects that a good method would produce a sequence of intervals, each nested within its predecessor, tending asymptotically towards some corresponding interval for the single proportion, shifted by the constant $\hat p_2$.
Yet the profile likelihood method gives a sequence of lower limits which increase up to a certain $n_2$ but subsequently decrease, violating the above consideration.

Agresti and Caffo (2000) evaluate the Wald interval, the Agresti-Coull interval, a Bayesian interval (considered in detail in Chapter 3) and Newcombe's hybrid score interval. They find the exact coverage probabilities and mean expected lengths of these intervals at some specific pairs $(n_1, n_2)$ with $p_1$ and $p_2$ taking values from the unit square. It is shown that the Agresti-Coull interval has better coverage performance than Newcombe's hybrid score interval.

The above results all involve non-adaptive designs with constant sample sizes. The sample sizes in adaptive designs are not constants but random variables. To distinguish them from the non-adaptive case, we use $N_i(k)$ and $S_i(k)$ to denote the sample size and the number of successes from treatment $i$ for $i = 1, 2$.

Wei et al. (1990) studied the interval estimation problem for $p_1 - p_2$ under a specific adaptive design, the randomized play-the-winner design, which is due to Wei and Durham (1978) and tends to assign more study subjects to the better treatment. Wei et al. (1990) developed a network algorithm to find the joint distribution of $(N_1(k), S_1(k) + S_2(k), S_1(k))$, through which exact confidence intervals for $p_1 - p_2$ can be derived. The authors suggest using this method when the sample size $n$ is small or moderate. Two disadvantages of this method limit its wide application. First, though the network algorithm can be easily modified to accommodate other adaptive designs, the computation of the joint distribution of $(N_1(k), S_1(k) + S_2(k), S_1(k))$ is not very easy. Second, the exact confidence interval does not have an explicit form. Wei et al. (1990) also evaluated the Wald interval and the profile likelihood based interval for $p_1 - p_2$ under randomized play-the-winner designs through simulation. They found that the Wald interval was rather anti-conservative.
The profile likelihood method was recommended in Wei et al. (1990) for a moderate-sized or large sample design.

1.3 Application

All the confidence intervals studied in this dissertation are based on asymptotic theory. However, some confidence intervals behave rather well even when sample sizes are small or moderate. Therefore, the confidence intervals that will be suggested may be used over a broad range of sample sizes.

Moreover, the simplicity of all the confidence intervals except the profile likelihood interval is an attractive feature from the point of view of applications.

People who wish to use adaptive designs have a wide variety of adaptive allocation procedures at their disposal, and the corresponding asymptotic theory has been reported for many adaptive designs. There are numerous references on the interval estimation problem for $p_1 - p_2$, but most of them concentrate on non-adaptive designs. We hope this dissertation will be useful for constructing good confidence intervals for $p_1 - p_2$, especially in adaptive designs.

Chapter 2

Wald Interval Estimation for the Difference of two Binomial Proportions

2.1 Introduction

Interval estimation for a single binomial proportion and for the difference of two binomial proportions is used extensively in practice and has been widely discussed in the literature. It is well known that the standard Wald intervals behave poorly. Brown et al. (2002) focused on interval estimation for one binomial proportion and explored the reason why the coverage probabilities of the Wald interval for one binomial proportion are often far less than the nominal level even when the sample size is moderate or quite large. They evaluated the approximate coverage probabilities and expected lengths of the Wald interval for one binomial proportion and its candidate replacements. Inspired by their article, we studied interval estimation for the difference of two binomial proportions.
In Section 2.2, we focus on the poor performance of the Wald interval for the difference of two binomial proportions by exhibiting its behavior through a few examples. As will be shown, the Wald interval for the difference of two binomial proportions, defined in (2.2.1), shares some properties with the Wald interval for one binomial proportion addressed in Brown et al. (2002). For example, the discreteness of the binomial distribution leads to oscillatory coverage probabilities, and the true coverage probabilities often differ significantly from the nominal level even when the two proportions are near 0.5 and the sample sizes are moderate or large. We also note that unbalanced sample sizes, when the two proportions are close, among some other issues, may have severe effects on the coverage probabilities. In Section 2.3, we explore the reason for the poor performance of the Wald interval. Section 2.4 deals with a smooth approximation of the coverage probability of the Wald interval obtained by applying Edgeworth expansion methods.

2.2 Coverage Properties of the Wald Confidence Interval

Let $X_1$ and $X_2$ be two independent random variables, $X_i \sim \mathrm{Bin}(n_i, p_i)$, where $p_i \in (0, 1)$ for $i = 1, 2$. Let $\hat p_i = X_i/n_i$. As mentioned in Chapter 1, the $100(1-\alpha)\%$ Wald confidence interval for $p_1 - p_2$ is
$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}, \tag{2.2.1}$$
where $z_{\alpha/2}$ denotes the $100(1 - \alpha/2)$ percentile of the standard normal distribution. We will use $CI_W(n_1, n_2, X_1, X_2)$ to denote this interval and $CP_W(n_1, n_2, p_1, p_2)$ to denote its exact coverage probability. Then
$$CP_W(n_1, n_2, p_1, p_2) = P\{p_1 - p_2 \in CI_W(n_1, n_2, X_1, X_2)\} = \sum_{x_1=0}^{n_1}\sum_{x_2=0}^{n_2}\binom{n_1}{x_1}p_1^{x_1}(1-p_1)^{n_1-x_1}\binom{n_2}{x_2}p_2^{x_2}(1-p_2)^{n_2-x_2}\, I_{A_p}(x_1, x_2), \tag{2.2.2}$$
where $A_p = \{(x_1, x_2) \mid p_1 - p_2 \in CI_W(n_1, n_2, x_1, x_2)\}$.

We will present a few examples to show that the coverage probabilities of the Wald interval are typically lower than its nominal level.
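The double sum in (2.2.2) can be evaluated directly by enumerating all pairs $(x_1, x_2)$. The Python sketch below is a minimal illustration under stated assumptions, not the S-Plus code used for the dissertation: the names `binom_pmf`, `wald_interval`, `exact_coverage` and `Z_975` are hypothetical, the default critical value assumes a nominal 95% level, and the interval is treated as open, the convention adopted later in this section.

```python
from math import comb, sqrt

# Assumed critical value: upper 2.5% point of N(0, 1), i.e. a nominal 95% level.
Z_975 = 1.959963984540054


def binom_pmf(x, n, p):
    """P{X = x} for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def wald_interval(x1, n1, x2, n2, z=Z_975):
    """Endpoints of the Wald interval (2.2.1) for p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    center = p1_hat - p2_hat
    return center - z * se, center + z * se


def exact_coverage(n1, n2, p1, p2, z=Z_975):
    """Exact coverage probability of the Wald interval via the double sum (2.2.2)."""
    delta = p1 - p2
    total = 0.0
    for x1 in range(n1 + 1):
        weight1 = binom_pmf(x1, n1, p1)
        for x2 in range(n2 + 1):
            lower, upper = wald_interval(x1, n1, x2, n2, z)
            # Strict inequalities: the interval is open, so a degenerate
            # interval (a, a) never covers the true difference.
            if lower < delta < upper:
                total += weight1 * binom_pmf(x2, n2, p2)
    return total


if __name__ == "__main__":
    print(exact_coverage(10, 10, 0.9, 0.9))
    print(exact_coverage(10, 100, 0.9, 0.9))
```

Under these assumptions, the two calls in the demonstration should give values close to 0.8282 and 0.6474, the figures quoted in Example 3 below.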
The probabilities reported in the following plots and tables, unless otherwise specified, are the result of exact probability calculations produced in S-Plus. Instead of using the algorithm given in equation (2.2.2), which contains two loops, we apply a more efficient one:
$$CP_W(n_1, n_2, p_1, p_2) = \sum_{i=0}^{n_2} P\{L_W(n_1, n_2, p_1, p_2, i) < X_1 < U_W(n_1, n_2, p_1, p_2, i)\}\, P\{X_2 = i\}, \tag{2.2.3}$$
where $L_W(n_1, n_2, p_1, p_2, i) < U_W(n_1, n_2, p_1, p_2, i)$ are the two proper real roots of the following equation, viewed as a function of $X_1$:
$$\left| \frac{X_1}{n_1} - \frac{i}{n_2} - (p_1 - p_2) \right| = z_{\alpha/2}\sqrt{\frac{X_1}{n_1^2}\left(1 - \frac{X_1}{n_1}\right) + \frac{i}{n_2^2}\left(1 - \frac{i}{n_2}\right)}.$$
The algorithm given in (2.2.3) contains only one loop.

Example 1. Figure 2.1 plots the exact coverage probabilities of the nominal 95% standard Wald interval for $p_1 = 0.9$, $p_2 = 0.1$, $n_1 = n_2 = n$ when $n$ varies from 6 to 100. Two important features of the Wald interval are exhibited in the figure.

First, there is a very strong oscillation, which is due to the discreteness of the binomial distribution. Therefore, the coverage probability does not get steadily closer to the nominal level as $n$ grows, though the magnitude of the oscillation tends to decrease. For example, at $n = 14$ the coverage probability is 0.942, but it is only 0.808 at $n = 15$. Even when $n$ is as large as 67, the coverage probability is only 0.914. When $n = 100$, the coverage probability is still not satisfactory: it is only 0.927. Only for $n \ge 300$ does the coverage probability fluctuate above 0.94.

Second, the Wald interval is anti-conservative: the coverage probabilities at most values of $n$ are less than the nominal level. Among all the coverage probabilities (for $n = 6$ up to $n = 100$), only three reach the nominal level. There are 50 coverage probabilities less than 0.93 and 31 less than 0.92.

Similar to the phenomenon pointed out in Brown et al. (2002) for the one sample interval, the oscillation of the coverage probability makes some quadruples lucky and some unlucky.
For instance, the quadruple $(n_1, n_2, p_1, p_2) = (53, 53, 0.9, 0.1)$ is lucky, with the exact coverage probability of the Wald interval equal to 0.9501. But $(n_1, n_2, p_1, p_2) = (54, 54, 0.9, 0.1)$ is unlucky, with exact coverage probability 0.9132. Similarly, changing the proportions may result in some lucky or unlucky quadruples as well. Further, the lucky or unlucky quadruples are not predictable. There is no obvious pattern for telling whether a quadruple is lucky or not.

Figure 2.1: Exact coverage probability of the nominal 95% Wald interval for $p_1 = 0.9$, $p_2 = 0.1$ and $n_1 = n_2 = n = 6$ to 100.

Example 2. Suppose the confidence level is 95%, $n_1 = n_2 = n$ and $n$ varies from 20 to 100. Compare the coverage probabilities in two cases: case one, $p_1 = p_2 = 0.9$; case two, $p_1 = p_2 = 0.5$. Conventional wisdom might suggest that the coverage probabilities in case two would be higher than those in case one. But this is not true. Figure 2.2 plots the coverage probabilities in the two cases. It is surprising to see that $CP_W(n, n, 0.5, 0.5)$ is not obviously higher than $CP_W(n, n, 0.9, 0.9)$. When $n$ varies from 20 to 100, $CP_W(n, n, 0.9, 0.9)$ has less oscillation: all values of $CP_W(n, n, 0.9, 0.9)$ lie between 0.940 and 0.949, while the range of $CP_W(n, n, 0.5, 0.5)$ is $[0.919, 0.953]$.

Figure 2.2: Exact coverage probability of the nominal 95% Wald intervals for $p_1 = p_2 = 0.5$ and $p_1 = p_2 = 0.9$ with $n_1 = n_2 = n = 20$ to 100. (Legend: coverage at $p_1 = p_2 = 0.5$; coverage at $p_1 = p_2 = 0.9$.)

This example demonstrates that we cannot judge the coverage probabilities of the Wald interval by whether $p_1$ and $p_2$ are close to the center or not. The relative positions of $p_1$ and $p_2$ affect the coverage probability.
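The comparison in Example 2 can be reproduced by scanning $n$ and recording the smallest and largest exact coverage in each case. The Python sketch below is illustrative only and is not the dissertation's S-Plus code: the names `pmf_table`, `coverage_equal_n`, `coverage_range` and `Z_975` are hypothetical, a nominal 95% level is assumed, and the interval is treated as open.

```python
from math import comb, sqrt

# Assumed critical value: upper 2.5% point of N(0, 1), i.e. a nominal 95% level.
Z_975 = 1.959963984540054


def pmf_table(n, p):
    """Binomial(n, p) probabilities for x = 0, ..., n, computed once per n."""
    return [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]


def coverage_equal_n(n, p1, p2, z=Z_975):
    """Exact coverage of the open Wald interval for p1 - p2 when n1 = n2 = n."""
    f1, f2 = pmf_table(n, p1), pmf_table(n, p2)
    delta = p1 - p2
    total = 0.0
    for x1 in range(n + 1):
        a = x1 / n
        for x2 in range(n + 1):
            b = x2 / n
            half_width = z * sqrt((a * (1 - a) + b * (1 - b)) / n)
            # Covered when the true difference lies strictly inside the interval.
            if abs((a - b) - delta) < half_width:
                total += f1[x1] * f2[x2]
    return total


def coverage_range(p, n_values, z=Z_975):
    """Smallest and largest coverage over n_values when p1 = p2 = p."""
    values = [coverage_equal_n(n, p, p, z) for n in n_values]
    return min(values), max(values)


if __name__ == "__main__":
    for p in (0.5, 0.9):
        low, high = coverage_range(p, range(20, 101))
        print(f"p1 = p2 = {p}: coverage between {low:.3f} and {high:.3f}")
```

If the sketch matches the calculation behind Figure 2.2, the printed ranges should be close to those reported in Example 2, with the narrower range at $p_1 = p_2 = 0.9$.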
Since there are four quantities $n_1$, $n_2$, $p_1$ and $p_2$ affecting the coverage probability, considering only the magnitudes of the proportions is not enough. In fact, not only the relative positions of $p_1$ and $p_2$ but also the relative sizes of $n_1$ and $n_2$ may influence $CP_W(n_1, n_2, p_1, p_2)$ significantly. Moreover, the four quantities interact.

Example 3. Fix $p_1 = p_2 = 0.9$. Consider the coverage probabilities at $n_1 = n_2 = 10$ and at $n_1 = 10$, $n_2 = 100$ with nominal confidence level 95%. Which coverage probability is greater? It is striking to see that $CP_W(10, 10, 0.9, 0.9) = 0.8282$ and $CP_W(10, 100, 0.9, 0.9) = 0.6474$. The larger sample size does not improve but reduces the coverage probability in this special case. And the big difference between the two coverage probabilities cannot be explained by the phenomenon of oscillation alone. This suggests that the drop in the coverage probability in the second case is caused by the unbalanced sample sizes. Figure 2.3 plots the coverage probabilities for $p_1$ varying between 0.8 and 0.999 with step size 0.001.

Figure 2.3: Exact coverage probability of the nominal 95% Wald interval at $n_1 = n_2 = 10$ and $n_1 = 10$, $n_2 = 100$ with $p_2 = 0.9$ and $p_1 = 0.8$ to 0.999 with jump size 0.001. (Legend: coverage at $n_1 = n_2 = 10$; coverage at $n_1 = 10$, $n_2 = 100$. Horizontal axis: $p_1$.)

Tables 2.1 and 2.2 list some coverage probabilities under different confidence levels for some $p_1$, $p_2$, $n_1$ and $n_2$. Observe how much the sample sizes might affect the coverage probability. From Figure 2.3 and the two tables, we may conclude that a larger sample size $n_i$ in one group does not guarantee a better coverage probability. The
The 15 Table 2.1: Exact coverage probability of the nominal 95% Wald interval nl 10 10 10 30 30 30 100 100 100 712 10 30 100 10 30 100 10 30 100 p1=.9,p2=.5 .911 .920 .804 .906 .934 .937 .896 .940 .947 p1=.9,p2— .871 .849 .688 .849 .938 .908 .688 .908 .927 p1=.8,p2— .894 .900 .877 .898 .940 .929 .893 .931 .939 p1=.6,p2— .922 .913 .908 .913 .934 .941 .908 .941 .948 p1=.9,192— .949 .869 .647 .869 .945 .918 .647 .918 .948 p1=.5,p2=.5 .912 .917 .905 .917 .948 .939 .905 .939 .944 Table 2.2: Exact coverage probability of the nominal 99% Wald interval 711 10 10 10 30 30 30 100 100 100 712 10 30 100 10 30 100 10 30 100 p1 = .9,p2 = .5 .963 .963 .892 .968 .984 .978 .968 .982 .988 p1 = .9, p2 = .1 .878 .856 .754 .856 .946 .953 .754 .953 .982 p1 = 8,122 = .3 .972 .948 .895 .954 .976 .979 .958 .981 .986 p1 = .6, p2 = .4 .974 .964 .954 .964 .982 .983 .954 .983 .988 p1 = .9, p2 = .8 .991 .973 .692 .973 .991 .956 .692 .956 .989 p1 = .5,p2 = .5 .958 .966 .967 .966 .986 .985 .967 .985 .987 16 relative magnitudes(balanced or not) of the two sample sizes is another issue that influence the coverage probability significantly. It is obvious from above examples that the exact coverage probability of Wald interval seldom achieves the nominal level. We will examine the reason theoretically in next section. At the end of this section, it is worthwhile to mention an issue that might cause a non-negligible loss of the coverage probability of the Wald interval. Unlike a lot of alternative intervals, the Wald interval is sensitive to whether a confidence interval is defined as open or closed. The next remark gives such an example. Neither Brown et al. (2002) nor Agresti and Coull (1998) specifically mentioned whether their confidence intervals were closed or not. But their results are consistent with open confidence intervals. In Wei et al. (1990), the authors specified open confidence intervals. In this report, we define a confidence interval to be Open. Remark 2.2.1. 
The shrinkage of the Wald interval to an empty set, $(a, a)$, at some realizations of $(n_1, \hat p_1; n_2, \hat p_2)$ can cause its poor coverage performance, especially when both sample sizes are small and both proportions approach the boundaries. The coverage probability of the Wald interval is at most $1 - (p_1^{n_1} + q_1^{n_1})(p_2^{n_2} + q_2^{n_2})$ regardless of the nominal level. For example, when $p_1 = p_2 = 0.95$ and both sample sizes are 20, the coverage probability of the Wald interval is at most 0.872 regardless of the nominal level. A simpler and more instructive example is $n_1 = n_2 = 10$ and $p_1 = p_2 = 0.9$. If $X_1 = X_2 = 10$, then the confidence interval shrinks to $(0, 0)$. Although $p_1 - p_2 = 0$, since $0 \notin (0, 0)$, the pair $(10, 10)$ is not a proper pair in the set AP defined at the beginning of this section. Note that $P\{X_1 = X_2 = 10\} = 0.1216$, which makes $CP_W(10, 10, 0.9, 0.9)$ at most 0.8784.

2.3 A Reason for Inadequate Coverage

Similar to the reason for the inadequate coverage of the Wald interval for one binomial proportion explored in Brown et al. (2002), we will show that the poor performance of the Wald interval is due mainly to the fact that the Wald confidence interval is symmetric about a "wrong" center. Although $\hat p_1 - \hat p_2$ is the MLE and an unbiased estimator of $p_1 - p_2$, as the center of a confidence interval it causes a systematic loss of coverage from the nominal level. As we will see in the next chapter, by simply recentering the interval, one can improve the coverage performance significantly.

One way to derive the Wald interval is to invert the large-sample Wald test. The nominal $100(1 - \alpha)\%$ Wald interval for $p_1 - p_2$ is the set of $\delta$ for which
\[ \frac{|\hat p_1 - \hat p_2 - \delta|}{\sqrt{\hat p_1(1 - \hat p_1)/n_1 + \hat p_2(1 - \hat p_2)/n_2}} < z_{\alpha/2}. \]
Hence, in deriving the Wald interval, the following consequence of the central limit theorem plays an important role:
\[ \frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{\hat p_1(1 - \hat p_1)/n_1 + \hat p_2(1 - \hat p_2)/n_2}} \xrightarrow{\;d\;} N(0, 1). \]
For simplicity, denote the left side above by $W_{n_1,n_2}$.
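The exact coverage probabilities quoted in Section 2.2 and the bound in Remark 2.2.1 can be checked by direct enumeration over all outcomes $(x_1, x_2)$. The following is a minimal sketch, assuming the open-interval convention adopted above; the function names `wald_coverage` and `coverage_upper_bound` are illustrative and not taken from the dissertation.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(n, p):
    """Return [P(X = x) for x = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


def wald_coverage(n1, n2, p1, p2, level=0.95):
    """Exact coverage of the open Wald interval for p1 - p2 (assumed convention)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    pmf1, pmf2 = binom_pmf(n1, p1), binom_pmf(n2, p2)
    delta = p1 - p2
    cover = 0.0
    for x1 in range(n1 + 1):
        ph1 = x1 / n1
        for x2 in range(n2 + 1):
            ph2 = x2 / n2
            half = z * sqrt(ph1 * (1 - ph1) / n1 + ph2 * (1 - ph2) / n2)
            # Open interval: when half == 0 the interval is empty and covers nothing.
            if abs(ph1 - ph2 - delta) < half:
                cover += pmf1[x1] * pmf2[x2]
    return cover


def coverage_upper_bound(n1, n2, p1, p2):
    """Bound of Remark 2.2.1: 1 - P(both sample proportions are 0 or 1)."""
    q1, q2 = 1 - p1, 1 - p2
    return 1 - (p1**n1 + q1**n1) * (p2**n2 + q2**n2)


if __name__ == "__main__":
    for n1, n2 in [(10, 10), (10, 100)]:
        print(n1, n2, round(wald_coverage(n1, n2, 0.9, 0.9), 4),
              round(coverage_upper_bound(n1, n2, 0.9, 0.9), 4))
```

Enumeration is cheap here because only $(n_1 + 1)(n_2 + 1)$ outcomes exist; the strict inequality in the coverage test is what encodes the open interval.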
Even for quite large values of 711 and 712, the actual distribution of an, can be far from the the standard normal distribution for many p1 and p2 as we will show next. Thus the very premise on 18 which the Wald interval is based is seriously flawed for moderate and even quite large values of 711 and 712. The bias of anm which is EWmm, from the mean of standard normal distribution can be analytically computed by doing standard expansions. Denote can, = 15,- —— p,- for 2' = 1,2. Then simple algebra gives W — ”"1 ’ “"2 nimz "‘ PIQX+(91-Pllwn1'W121L + P202+(02-P2)wn2 —w3,2 n1 ’12 where q,- = 1 — p,- for 1' = 1, 2. Let u = 9%? + 3:733. Denote the denominator b, then l/b can be expressed as l 2 2 -- 11—1/2 (1 + (91*Pllwn1 + (Q2 —' 1’2)an _ (Wm + wn2)) 2 77.121. T1211. 77.11]. 71211. =u'1/2(1+x)’1/2, 2 .2 _ (QI—Pllwn] (92-p2)wn2 _ wn, w. - _ 71/2 where 2: —- mu + "2" mu + 47.2.. . Since can, — 0,,(71, ), a Taylor expansion yields _ :r 31:2 5a:3 _ (Wm ‘7 wmlu U20 " 5 + ? — T6_ + 0p((711/\ 712) 3/2)) Wn1.n2 = The formulas for central moments of the binomial distribution then yield an approximation to the bias: p1—1/2 1 9 p2-1/2 1 9 EW,, .-.-.-——-—-— 1 ———--—1 --—-———— 1 -—-———--~1 ""2 n1u1u1/2 ( + 111(2111 ) n2u2u1/2 + n2(2u2 ) + 30p. - 1/2)-(p2—1/2)) 2n1n2u1u2u1/2 _ 15021 -1/2)(p2 -1/2) (122 -1/2 _ 91—1/2) 2n1n2u1u2u3/2 n2 111 + 0((n1 /\ n2)'3/2) (2.3.1) 19 where From (2.3.1), it can be seen that when both p1 and p2 approach 1 / 2 for fixed sample sizes, the bias tends to decrease. Therefore, ignoring the oscillation effect, one can expect to increase the coverage probability by shifting the terms of the cen- ter of Wald confidence interval from 151 and :52 toward 1/2 and 1 / 2. When the two proportions are close for comparable sample sizes, the effect from p1 could coun- teract that from p2. 
On the other hand, it also explains why the Wald confidence interval behaves poorly when the sample sizes are extremely unbalanced for some $p_1$ and $p_2$: the effect from $p_1$ cannot cancel out most of the effect from $p_2$. In general, equation (2.3.1) can be used as a rule of thumb to explain how the interaction among $n_1$, $n_2$, $p_1$ and $p_2$ affects the bias and thus the coverage probability.

2.4 A smoothing formula obtained by Edgeworth Expansion methods

In this section, we will not justify Edgeworth expansions, but rather will use Edgeworth expansion techniques to derive a formula that works well in approximating coverage probabilities of the Wald interval in a variety of settings. See Bhattacharya and Ranga Rao (1976) and Hall (1992) for more details on Edgeworth expansions.

First, a theorem from Hall (1992) on Edgeworth expansions is presented. It gives general conditions under which the Edgeworth expansion is valid and will be used as a tool to derive the smooth approximation of the coverage probability of the Wald interval.

Theorem 2.4.1. (Hall, 1992, page 56) Let $X, X_1, X_2, \ldots$ be independent and identically distributed random column $d$-vectors with mean $\mu$, and put $\bar X = n^{-1}\sum_{i=1}^{n} X_i$. Assume the function $A: R^d \to R$ has $j + 2$ continuous derivatives in a neighborhood of $\mu = E(X)$, that $A(\mu) = 0$, that $E(\|X\|^{j+2}) < \infty$, and that the characteristic function $\chi$ of $X$ satisfies
\[ \limsup_{\|t\| \to \infty} |\chi(t)| < 1. \tag{2.4.1} \]
The above inequality is called Cramér's condition. Denote the asymptotic variance of $n^{1/2}A(\bar X)$ by $\sigma^2$. Suppose $\sigma > 0$. Then for $j \ge 0$,
\[ P\big(n^{1/2}A(\bar X)/\sigma \le x\big) = \Phi(x) + n^{-1/2}r_1(x)\phi(x) + n^{-1}r_2(x)\phi(x) + \cdots + n^{-j/2}r_j(x)\phi(x) + o(n^{-j/2}) \tag{2.4.2} \]
uniformly in $x$, where $r_j$ is a polynomial of degree at most $3j - 1$, odd for even $j$ and even for odd $j$, with coefficients depending on moments of $X$ up to order $j + 2$.
According to the arguments (pages 47, 48) in Hall (1992), the $r_j$ for $j = 1, 2$ in Theorem 2.4.1 are given by
\[ r_1(x) = -\left\{k_{1,2} + \tfrac{1}{6}k_{3,1}(x^2 - 1)\right\} \tag{2.4.3} \]
and
\[ r_2(x) = -x\left\{\tfrac{1}{2}(k_{2,2} + k_{1,2}^2) + \tfrac{1}{24}(k_{4,1} + 4k_{1,2}k_{3,1})(x^2 - 3) + \tfrac{1}{72}k_{3,1}^2(x^4 - 10x^2 + 15)\right\}, \tag{2.4.4} \]
where the $k$'s are determined through the cumulants $\kappa_{j,n}$, which may be expanded in terms of the $k$'s as a power series in $n^{-1}$:
\[ \kappa_{j,n} = n^{-(j-2)/2}\left(k_{j,1} + n^{-1}k_{j,2} + n^{-2}k_{j,3} + \cdots\right), \quad j \ge 1. \tag{2.4.5} \]
Let $S_n = n^{1/2}(\hat\theta - \theta_0)/\hat\sigma$. The $\kappa$'s are defined by
\[ \begin{aligned}
\kappa_{1,n} &= E(S_n), \\
\kappa_{2,n} &= E(S_n^2) - (E(S_n))^2 = \mathrm{var}(S_n), \\
\kappa_{3,n} &= E(S_n^3) - 3E(S_n^2)E(S_n) + 2(E(S_n))^3 = E(S_n - ES_n)^3, \\
\kappa_{4,n} &= E(S_n^4) - 4E(S_n^3)E(S_n) - 3(E(S_n^2))^2 + 12E(S_n^2)(E(S_n))^2 - 6(E(S_n))^4 \\
&= E(S_n - ES_n)^4 - 3(\mathrm{var}(S_n))^2.
\end{aligned} \tag{2.4.6} \]

To derive the smooth approximation of the coverage probability of the Wald interval, we define some notation. Let $\{Y_{1,j} : j = 1, 2, \ldots\}$ and $\{Y_{2,j} : j = 1, 2, \ldots\}$ be two independent sequences of independent Bernoulli random variables, $Y_{i,j} \sim \mathrm{Bernoulli}(p_i)$, and let $X_i = \sum_{j=1}^{n_i} Y_{i,j}$, where $i = 1, 2$. Let $\gamma$ and $\kappa$ stand for the skewness and kurtosis of $D = Y_{1,1} - Y_{2,1}$ respectively. Then
\[ \gamma = E(D - ED)^3 = E\big(Y_{1,1} - Y_{2,1} - (p_1 - p_2)\big)^3 = p_1q_1(q_1 - p_1) - p_2q_2(q_2 - p_2) \]
and
\[ \kappa = E(D - ED)^4 - 3(\mathrm{var}(D))^2 = p_1q_1(q_1 - p_1)^2 + p_2q_2(q_2 - p_2)^2 - 2p_1^2q_1^2 - 2p_2^2q_2^2. \]

We do not have appropriate random variables to apply Theorem 2.4.1 directly, since Bernoulli random variables do not satisfy Cramér's condition. In general, absolutely continuous random variables satisfy Cramér's condition. Therefore, we need to smooth the Bernoulli random variables first. However, another problem arises after smoothing: the Wald test statistic, through which we may define the exact coverage probability of the Wald interval, is
\[ W_{n_1,n_2} = \frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{\hat p_1(1 - \hat p_1)/n_1 + \hat p_2(1 - \hat p_2)/n_2}} \]
on the set
\[ \Pi_{n_1,n_2} = \left\{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2} > 0\right\} \]
and is undefined on
\[ \bar\Pi_{n_1,n_2} = \left\{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2} = 0\right\}. \]
Consequently, we need to consider the smoothed random variables on $\Pi_{n_1,n_2}$. But Theorem 2.4.1 does not apply to random variables defined on a proper subset. Since $P(\bar\Pi_{n_1,n_2}) = (p_1^{n_1} + q_1^{n_1})(p_2^{n_2} + q_2^{n_2})$ decays faster than $n^{-3/2}$, the probability that $W_{n_1,n_2}$ is undefined can be absorbed in $O(n^{-3/2})$, and a smooth approximation of the coverage probability of the Wald interval will be given in an expression with error term $O(n^{-3/2})$. What would happen if the Edgeworth expansions were theoretically valid on the subset $\Pi_{n_1,n_2}$? We will focus on $\Pi_{n_1,n_2}$ henceforth.

For simplicity, we consider the case $n_1 = n_2 = n \ge 2$. The procedure for deriving the smooth approximation contains four steps. First, we create two sequences of random variables and define a statistic $T_{n,n}$ using those created random variables. We then show that the statistic $T_{n,n}$ can be used to approximate the exact coverage probability of the Wald interval. In step 2, we verify the validity of the Edgeworth expansion for the statistic $T_{n,n}$, if the expansions were valid on a subset. The Edgeworth expansion for $T_{n,n}$ is derived in step 3. Last, in step 4, the smoothing formula for the coverage probability of the Wald interval is obtained by applying the results of the first three steps.

Step 1. We first create two sequences of random variables to be used in the Edgeworth expansion and show that the exact coverage probability of the Wald interval can be approximated using a statistic defined through the created random variables.

Suppose $\xi_{i,j}$ and $\eta_{i,j}$ are two independent sequences of i.i.d. random variables for $j = 1, 2, \ldots$, both independent of $Y_{i,j}$ for $i = 1, 2$, with $\xi_{i,j} \sim U(-1/n^4, 1/n^4)$ and $\eta_{i,j} \sim U(1 - 1/n^4, 1 + 1/n^4)$. For $i = 1, 2$ and $j = 1, 2, \ldots$, define
\[ T_{i,j} = \xi_{i,j}\, I[Y_{i,j} = 0] + \eta_{i,j}\, I[Y_{i,j} = 1]. \tag{2.4.7} \]
Then
\[ Y_{i,j} - \frac{1}{n^4} < T_{i,j} < Y_{i,j} + \frac{1}{n^4}. \tag{2.4.8} \]
Put $\bar T_i = \sum_{j=1}^{n} T_{i,j}/n$. The following inequality holds by applying inequality (2.4.8):
\[ \hat p_i - \frac{1}{n^4} < \bar T_i < \hat p_i + \frac{1}{n^4}. \tag{2.4.9} \]
Consider the quantity under the square root in $W_{n,n}$.
151(1‘ 151)+ 1520‘ P2) n Tl < (71+ 1/n‘)(1 - TI +1/n“) + (72 +1/n“)(1 — 273+ 1/n") 12 TL 7“ —T Tl—T <—§—1—-‘-)-+—3-€———Q+%. (2.4.10) 11 n TL Similarly, . _* *1_* Tl—T‘ Tl—T MO 111) +pz( 122) > _1_l__L)_+.L_2_)_ 35 (2.411) n n n n n NotethatQU—J—m+fl§E—)—fg SOifandonlyif’i—‘(l—nul’fl+‘i'2—(-1;;_—p‘22 =0, which is out of our consideration according to previous analysis. Then, on Hum, define = TT—TZ-(pl—pz) \/T1(1-T1) + ran—113) n n Tn,n Therefore, applying inequalities (2.4.10) and (2.4.11), the following inequality chain holds, Tl-T'2:-(P1—P2)"2/n4z§” + it<2>z§2))| < 1. (2.4.16) It“)l+lt‘2’I-+oo By an argument (page 65) in Hall (1992), Cramer’s condition (2.4.16) holds if the distribution of the random vector ZJ- has a non-degenerate, absolutely continuous component, which is satisfied in current settings. The former inequality (2.4.15) is guaranteed by k 2 E(||Zj||"+2) = E(D/1,312 + |Y2,j|2)k—;—2 S 4+. Step 3, derive the two-term Edgeworth expansion of Til/214(2). 27 For simplicity, use 5,, to denote Til/224(2). Let W,- = T,- — p., for 2' = 1, 2, then E(W,) = o E(WE) = 13% + o(n-8) E(W?) = piQi(::2- pi) + 0(71—8) . . a 3 - 2 2 E(Wi‘i) = pigt(pr + q,)1-:;3(n 1)p1 Q: + 0(71-8) ~ 0(n-2) ? .2 . _ . E(Wis) = 1019,41,); p.) + 0(n'4) + 0(n'8) ~ O(n"3) 3 a E(Wis) = 13:1qu + 0(n-4) + 0(n‘8) ~ 0(n‘3) (2.4.17) Then with 5,. = n1/2A(-Z-) and W1 = 012(71-1/2), Sn = 111”(W — W2)((W1+ p1)(-W1+ 41) + (W2 + p2)(—VV2 + (fin-V2 = n1/2(W1 — W2) (plql + p242 + (ql - pawl + (q2 — p2)VV2 — (W,2 + Wyn—W _ _ —1/2 = 111/2(W1 - way-”2 (1+ ————(q‘ T 7")W1+ ___(q2 T 1MW... — 1(11'3 + 14/22)) 7. 2 1 . = n1/2T—1/2(W1 — W2){1 —' 57-12(01- P0”? i=1 1 2 3 3 + 2T—1 ;[1 + ZT—1(Qi - Pi)2]Wi2 + 37-2% - P1)(Q2 - P2)W1W2} :9 Therefore, apply the moment equations (2.4.17) and the independence of W1 and 28 W2, we have E6.) 
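\[ = n^{1/2}\tau^{-1/2} E\Big((W_1 - W_2)\Big(1 - \tfrac{1}{2}\tau^{-1}\sum_{i=1}^{2}(q_i - p_i)W_i\Big)\Big) + O(n^{-1}) = -\tfrac{1}{2}n^{-1/2}\tau^{-3/2}\gamma + O(n^{-1}), \]
\[ \begin{aligned}
E(S_n^2) &= n\tau^{-1}E\Big\{(W_1 - W_2)^2\Big(1 - \tau^{-1}\sum_{i=1}^{2}(q_i - p_i)W_i + \tau^{-1}\sum_{i=1}^{2}\big(1 + \tau^{-1}(q_i - p_i)^2\big)W_i^2 \\
&\qquad + 2\tau^{-2}(q_1 - p_1)(q_2 - p_2)W_1W_2\Big)\Big\} + O(n^{-3/2}) \\
&= 1 + n^{-1} + 2n^{-1}\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) + 2n^{-1}\tau^{-3}\gamma^2 + O(n^{-3/2}),
\end{aligned} \]
\[ E(S_n^3) = n^{3/2}\tau^{-3/2}E\Big((W_1 - W_2)^3\Big(1 - \tfrac{3}{2}\tau^{-1}\sum_{i=1}^{2}(q_i - p_i)W_i\Big)\Big) + O(n^{-1}) = -\tfrac{7}{2}n^{-1/2}\tau^{-3/2}\gamma + O(n^{-1}), \]
and
\[ \begin{aligned}
E(S_n^4) &= n^{2}\tau^{-2}E\Big\{(W_1 - W_2)^4\Big(1 - 2\tau^{-1}\sum_{i=1}^{2}(q_i - p_i)W_i + \tau^{-1}\sum_{i=1}^{2}\big(2 + 3\tau^{-1}(q_i - p_i)^2\big)W_i^2 \\
&\qquad + 6\tau^{-2}(q_1 - p_1)(q_2 - p_2)W_1W_2\Big)\Big\} + O(n^{-3/2}) \\
&= 3 + 6n^{-1} + 18n^{-1}\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) - 2n^{-1}\tau^{-2}\kappa + 28n^{-1}\tau^{-3}\gamma^2 + O(n^{-3/2}).
\end{aligned} \]
Hence, by equations (2.4.6),
\[ \begin{aligned}
\kappa_{1,n} &= E(S_n) = -\tfrac{1}{2}n^{-1/2}\tau^{-3/2}\gamma + O(n^{-1}), \\
\kappa_{2,n} &= E(S_n^2) - (E(S_n))^2 = 1 + n^{-1} + 2n^{-1}\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) + \tfrac{7}{4}n^{-1}\tau^{-3}\gamma^2 + O(n^{-3/2}), \\
\kappa_{3,n} &= E(S_n^3) - 3E(S_n^2)E(S_n) + 2(E(S_n))^3 = -2n^{-1/2}\tau^{-3/2}\gamma + O(n^{-1}),
\end{aligned} \]
and
\[ \begin{aligned}
\kappa_{4,n} &= E(S_n^4) - 4E(S_n^3)E(S_n) - 3(E(S_n^2))^2 + 12E(S_n^2)(E(S_n))^2 - 6(E(S_n))^4 \\
&= 6n^{-1}\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) - 2n^{-1}\tau^{-2}\kappa + 12n^{-1}\tau^{-3}\gamma^2 + O(n^{-3/2}).
\end{aligned} \]
Therefore, in the notation of (2.4.5), the two-term Edgeworth expansion for $T_{n,n}$ is
\[ P(T_{n,n} < a) = \Phi(a) + n^{-1/2}r_1(a)\phi(a) + n^{-1}r_2(a)\phi(a) + O(n^{-3/2}), \]
where $r_1(a)$ and $r_2(a)$ are given in equations (2.4.3) and (2.4.4) with
\[ \begin{aligned}
k_{1,2} &= -\tfrac{1}{2}\tau^{-3/2}\gamma, \\
k_{2,2} &= 1 + 2\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) + \tfrac{7}{4}\tau^{-3}\gamma^2, \\
k_{3,1} &= -2\tau^{-3/2}\gamma, \\
k_{4,1} &= 6\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) - 2\tau^{-2}\kappa + 12\tau^{-3}\gamma^2,
\end{aligned} \]
in which $\tau = p_1q_1 + p_2q_2$.

Step 4, compute the smooth approximation of the coverage probability of the Wald interval.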
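As a numerical sketch of the quantity this step arrives at, the smooth approximation $(1 - \alpha) + 2n^{-1}r_2(z_{\alpha/2})\phi(z_{\alpha/2})$ of (2.4.21) can be evaluated from the Step 3 constants. The code assumes $n_1 = n_2 = n$ as in the text and uses the constants $k_{1,2}, k_{2,2}, k_{3,1}, k_{4,1}$ as read from the derivation above; the function name `smooth_wald_coverage` is hypothetical.

```python
from statistics import NormalDist


def smooth_wald_coverage(n, p1, p2, level=0.95):
    """Smooth approximation of the Wald coverage for n1 = n2 = n (sketch)."""
    q1, q2 = 1 - p1, 1 - p2
    tau = p1 * q1 + p2 * q2
    # Third and fourth cumulants of D = Y_{1,1} - Y_{2,1}.
    gamma = p1 * q1 * (q1 - p1) - p2 * q2 * (q2 - p2)
    kappa = (p1 * q1 * (q1 - p1) ** 2 + p2 * q2 * (q2 - p2) ** 2
             - 2 * (p1 * q1) ** 2 - 2 * (p2 * q2) ** 2)
    s2 = (p1 * q1) ** 2 + (p2 * q2) ** 2
    # Constants from Step 3.
    k12 = -0.5 * gamma / tau**1.5
    k22 = 1 + 2 * s2 / tau**2 + 1.75 * gamma**2 / tau**3
    k31 = -2 * gamma / tau**1.5
    k41 = 6 * s2 / tau**2 - 2 * kappa / tau**2 + 12 * gamma**2 / tau**3
    nd = NormalDist()
    z = nd.inv_cdf(1 - (1 - level) / 2)
    # Polynomial r2 of equation (2.4.4), evaluated at z.
    r2 = -z * (0.5 * (k22 + k12**2)
               + (k41 + 4 * k12 * k31) * (z**2 - 3) / 24
               + k31**2 * (z**4 - 10 * z**2 + 15) / 72)
    return level + 2 * r2 * nd.pdf(z) / n


if __name__ == "__main__":
    for n in (10, 30, 100):
        print(n, round(smooth_wald_coverage(n, 0.5, 0.5), 4))
```

Because the correction term is of order $n^{-1}$ and carries no oscillation, the returned value is a smooth curve in $n$; it is meant to show the systematic shortfall from the nominal level, not to reproduce individual exact coverages.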
By equation (2.4.12),
\[ \begin{aligned}
P_W &\le P\big(T_{n,n} \le (z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big) - P\big(T_{n,n} < -(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big) \\
&= \Phi\big((z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big) - \Phi\big(-(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big) \\
&\quad + n^{-1/2}r_1\big((z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big)\phi\big((z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big) \\
&\quad - n^{-1/2}r_1\big(-(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big)\phi\big(-(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big) \\
&\quad + n^{-1}r_2\big((z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big)\phi\big((z_{\alpha/2} + n^{-2})/(1 - 2/n^3)\big) \\
&\quad - n^{-1}r_2\big(-(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big)\phi\big(-(z_{\alpha/2} + n^{-2})/(1 + 2/n^3)\big) + O(n^{-3/2}) \\
&= (1 - \alpha) + 2n^{-1}r_2(z_{\alpha/2})\phi(z_{\alpha/2}) + O(n^{-3/2}).
\end{aligned} \tag{2.4.19} \]
The cancellation is valid because all functions appearing in the two-term Edgeworth expansion of $S_n$ are continuous and the $n^{-2}$ terms can be absorbed in $O(n^{-3/2})$. That $r_1(x)$ and $\phi(x)$ are even functions and $r_2(x)$ is an odd function also guarantees the last two steps. Similarly, it can be shown that
\[ P_W \ge (1 - \alpha) + 2n^{-1}r_2(z_{\alpha/2})\phi(z_{\alpha/2}) + O(n^{-3/2}). \tag{2.4.20} \]
Combining inequalities (2.4.19) and (2.4.20), we have the smoothing formula for the coverage probability of the Wald interval: the coverage probability of the Wald interval is at most $1 - (p_1^{n} + q_1^{n})(p_2^{n} + q_2^{n})$ and can be expanded as
\[ P_W = (1 - \alpha) + 2n^{-1}r_2(z_{\alpha/2})\phi(z_{\alpha/2}) + O(n^{-3/2}), \tag{2.4.21} \]
where $r_2(z_{\alpha/2})$ is given by equation (2.4.4) with
\[ \begin{aligned}
k_{1,2} &= -\tfrac{1}{2}\tau^{-3/2}\gamma, \\
k_{2,2} &= 1 + 2\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) + \tfrac{7}{4}\tau^{-3}\gamma^2, \\
k_{3,1} &= -2\tau^{-3/2}\gamma, \\
k_{4,1} &= 6\tau^{-2}(p_1^2q_1^2 + p_2^2q_2^2) - 2\tau^{-2}\kappa + 12\tau^{-3}\gamma^2,
\end{aligned} \]
in which $\tau = p_1q_1 + p_2q_2$, when $0$ [the remainder of this condition and the text that follows it are illegible in the source].

[Figure: exact coverage plotted against $n_1$ from 20 to 100; caption and remaining legend entries not recoverable.]

Chapter 3

Interval Estimation for the Difference of Two Binomial Proportions

3.1 Introduction

The poor performance of the Wald interval for the difference of two binomial proportions was addressed in the last chapter. Consequently, quite a few methods of developing alternative intervals have been suggested. Their performances differ significantly.

In Section 2, we present several interval estimation methods as candidates to replace the Wald interval, each with its motivation. The candidate intervals are classified into three groups: (1) The Wald interval with continuity correction. It has the same center as the original Wald interval. (2) Confidence intervals with adjusted centers. We select two such intervals. One is derived through a Bayesian approach with Beta prior distributions followed by a normal approximation; we refer to it as the (approximate) Bayes interval in this report. The other is proposed by Agresti and Coull (1998); the main idea is to add four pseudo observations. Both intervals have centers different from that of the Wald interval. (3) The profile likelihood based confidence interval, which, unlike the other intervals, does not have an explicit form.

In Section 3, the performances of the above intervals with explicit forms, along with the Wald interval, are evaluated. We assess those intervals on two aspects. One is their coverage probabilities. All the coverage probabilities in this section are computed exactly rather than by simulation. The other is their expected lengths.

The profile likelihood based interval is taken into consideration in Section 4. We compare the coverage probabilities and lengths of all the alternative intervals along with the Wald interval through simulation. According to our analysis, we recommend the intervals with adjusted centers as substitutes for the Wald interval.

We concentrate on intervals with 95% nominal level in this chapter. The conclusions also hold for intervals with other nominal levels.

3.2 Some Alternative Intervals

The following are some alternatives to the Wald interval.

3.2.1 The Wald interval with continuity correction

There are a few intervals with different correction terms in this category. The most widely used one is
\[ \hat p_1 - \hat p_2 \pm \left( z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}} + \frac{1}{2n_1} + \frac{1}{2n_2} \right). \tag{3.2.1} \]
It results from inverting the Wald test: when computing the p-value, a continuity correction is applied to improve the accuracy of the central limit theorem approximation. This interval has the same center as the Wald interval and a greater margin of error.
3.2.2 Intervals with adjusted center

Approximate Bayes interval

The method is motivated by using Bayesian estimates instead of maximum likelihood estimates when deriving confidence intervals. For $i = 1, 2$, the independent conjugate $\mathrm{Beta}(a, b)$ prior distribution results in a posterior distribution of $p_i$ that is $\mathrm{Beta}(a + X_i, b + n_i - X_i)$, with mean $\tilde p_i = (X_i + a)/(n_i + a + b)$ and variance $\tilde p_i(1 - \tilde p_i)/(n_i + a + b + 1)$. Using a normal approximation for the distribution of the difference of the posterior beta variates leads to the approximate Bayes interval
\[ \tilde p_1 - \tilde p_2 \pm z_{\alpha/2}\sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + a + b + 1} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + a + b + 1}}. \]
In particular, if $a = b$, the estimators $\tilde p_1$ and $\tilde p_2$ are driven closer to $1/2$ than $\hat p_1$ and $\hat p_2$ respectively, unless $\hat p_i = 1/2$. As suggested by Berry (1996) (p. 291), the approximate Bayes interval in this report is specified to take $a = b = 1$, which leads to
\[ \tilde p_1 - \tilde p_2 \pm z_{\alpha/2}\sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + 3} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + 3}}, \tag{3.2.2} \]
where $\tilde p_i = (X_i + 1)/(n_i + 2)$ for $i = 1, 2$.

The Agresti-Coull interval

As mentioned in Chapter 1, motivated by the Wilson interval for one binomial proportion as shown in Wilson (1927), Agresti and Coull (1998) suggested an interval with $z_{0.025}^2 \approx 4$ pseudo observations, one success and one failure from each binomial population. The sample proportions are then $\tilde p_i = (X_i + 1)/(n_i + 2)$ for $i = 1, 2$. Replacing $\hat p_i$ with $\tilde p_i$ and $n_i$ with $n_i + 2$ for $i = 1, 2$ in the ordinary Wald interval yields the Agresti-Coull interval
\[ CI_{AC} = \tilde p_1 - \tilde p_2 \pm z_{\alpha/2}\sqrt{\frac{\tilde p_1(1 - \tilde p_1)}{n_1 + 2} + \frac{\tilde p_2(1 - \tilde p_2)}{n_2 + 2}}. \tag{3.2.3} \]
The above two intervals have the same center. The approximate Bayes interval is a subset of the Agresti-Coull interval. Since they have a center different from that of the Wald interval (with or without continuity correction) but a similar form, we call them intervals with adjusted centers.

3.2.3 The profile likelihood based intervals

Unlike the other intervals discussed so far, profile likelihood based intervals do not have explicit forms.
Suppose the log-likelihood function of 0 = (A, p2) is l (A, p2), where A = p1 — p2 is the parameter of interest and p2 is regarded as a nuisance parameter. Let {(A) = rriaxl(A,p2), which is called the profile likelihood for A, where the range of po for the maximization is (0,1 — A) if A Z 0 and (—A, 1) otherwise. Then an approximate 100(1 — a)% profile likelihood interval for p1 — 122 is {A e (-1,1) =213? 2 2 4 (n.- + 2)4(n,- + 3) (n,- + 2)2(n,- + 3) i=1 + fB(w1,w2)) + Op(n-2) (3.3.6) where f3 ((421,012) only contains terms of w,- and wlwg and has mean 0. The second step is again achieved by multivariate Taylor Expansion and w,- = 0,,(n’i). The t has the expression :Pl(1 - P2) + P2(1 — P2) n1 '1’ 3 n2 '1' 3 1 — 7 1 - 7 =Pl_‘11 + 219141 +___p2t12+ P292 0 (71-3) 1 2 _ 7 2 =11 (1 + + r (P141 + r P242)) + 0(n‘3) (3.3.7) "(P141 + 1'P2(I2) Therefore, replacing the t with the expression given by (3.3.7) and taking expectation of equation (3.3.6) with respect to wl and wg gives the desired result (3.3.3). The proof of (3.3.4) is very similar to the proof for the approximate Bayes interval and is omitted. El A direct conclusion of Theorem 3.3.1 is the comparison results of the expected lengths of the intervals. 50 Corollary 3.3.1. Denote r = nl/ng and n1 = n, then up to an error of 0(n‘2), ELw Z ELB if and only if 7(P191 + 1'2P242) _>_ 1 + 7‘2 and ELWCC Z ELB if and only if 1 + r2 - 7(12121 + ”12222) 1 + r > \/n(piqi + r2292) Remark 3.3.1. The above corollary can be applied to the Agresti-Coull interval after replacing the 7’s by 6’s. Remark 3.3.2. Based on the corollary, when both p1 and p2 are in (0.173, 0.827), the Bayes interval is shorter than the Wald interval and it is the shortest among the four intervals if the 0(n'2) error is neglected. In addition, if p,- E G — % a? is satisfied for either i = 1 or i = 2, the Bayes interval is shorter than the Wald interval with continuity correction if the 0(n‘2) error is neglected. 
Therefore, when one sample size is not too small and the corresponding proportion is not too close to the boundaries, the approximate Bayes interval is shorter than the Wald interval with continuity correction.

Remark 3.3.3. The Wald interval with continuity correction is often much longer than the other three intervals.

Figures 3.6 and 3.7 plot the approximate expected lengths of the four intervals under different conditions when the nominal confidence level is 95%. They demonstrate that the expected lengths of those intervals, except the Wald interval with continuity correction, are comparable.

Figure 3.6: Comparison of approximate expected lengths of some confidence intervals for $p_2 = 0.5$ and $n_1 = n_2 = 25$. [Plot of expected length against $p_1$ from 0 to 1; curves: Wald interval, Wald interval with correction, Bayes interval, Agresti-Coull interval.]

Figure 3.7: Comparison of approximate expected lengths of some confidence intervals for $p_2 = 0.1$ and $n_1 = 10$, $n_2 = 20$. [Plot of expected length against $p_1$ from 0 to 1; curves: Wald interval, Wald interval with correction, Bayes interval, Agresti-Coull interval.]

Now we may conclude that the poor coverage performance of the Wald interval is not because it is short. On the contrary, compared to the intervals with adjusted centers, the Wald interval is often longer but has lower coverage probability. The high coverage probability of the Wald interval with continuity correction is achieved by widening the Wald interval dramatically. Hence, in replacing the Wald interval, intervals with adjusted centers are much more preferable.
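To make the comparison concrete, the four explicit-form intervals (the Wald interval, (3.2.1), (3.2.2) and (3.2.3)) can be coded directly and their exact coverage and exact expected length obtained by enumeration. This is a minimal sketch under stated assumptions: intervals are open, lengths are not truncated to $[-1, 1]$, the nominal level is 95%, and the helper names `wald`, `adjusted` and `exact_summary` are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for a nominal 95% interval


def wald(x1, n1, x2, n2, cc=False):
    """Wald interval; cc=True adds the continuity correction of (3.2.1)."""
    p1, p2 = x1 / n1, x2 / n2
    half = Z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if cc:
        half += 1 / (2 * n1) + 1 / (2 * n2)
    return p1 - p2 - half, p1 - p2 + half


def adjusted(x1, n1, x2, n2, extra):
    """extra=3: approximate Bayes (3.2.2); extra=2: Agresti-Coull (3.2.3)."""
    p1, p2 = (x1 + 1) / (n1 + 2), (x2 + 1) / (n2 + 2)
    half = Z * sqrt(p1 * (1 - p1) / (n1 + extra) + p2 * (1 - p2) / (n2 + extra))
    return p1 - p2 - half, p1 - p2 + half


def exact_summary(interval, n1, n2, p1, p2):
    """Return (coverage, expected length) by summing over all (x1, x2)."""
    cover = length = 0.0
    for x1 in range(n1 + 1):
        w1 = comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
        for x2 in range(n2 + 1):
            w = w1 * comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2)
            lo, hi = interval(x1, n1, x2, n2)
            length += w * (hi - lo)
            if lo < p1 - p2 < hi:  # open interval
                cover += w
    return cover, length


METHODS = {
    "Wald": wald,
    "WCC": lambda *a: wald(*a, cc=True),
    "Bayes": lambda *a: adjusted(*a, extra=3),
    "AC": lambda *a: adjusted(*a, extra=2),
}

if __name__ == "__main__":
    for name, method in METHODS.items():
        cov, el = exact_summary(method, 10, 10, 0.9, 0.1)
        print(f"{name:5s} coverage={cov:.3f} expected length={el:.3f}")
```

Two structural facts are visible directly in the code: the continuity correction adds exactly $1/n_1 + 1/n_2$ to every interval length, and for $n_1 = n_2$ the Agresti-Coull interval is the Bayes interval stretched by the constant factor $\sqrt{(n + 3)/(n + 2)}$ about the same center.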
3.4 Comparison of the Wald Interval and all Proposed Alternatives

The comparison is based on a simulation with 10000 iterations for each selected $(n_1, p_1; n_2, p_2)$. The simulation results are summarized in Table 3.1, in which we use WCC and PLB to indicate the Wald interval with continuity correction and the profile likelihood based interval respectively. Since the Wald interval with continuity correction is not as good as the intervals with adjusted centers, we will not compare it with the other intervals.

From Table 3.1, we can see that the profile likelihood based interval does improve upon the Wald interval on coverage probabilities in a frequentist sense. As listed in the table, its coverage probabilities are (much) higher than the coverage probabilities of the Wald interval except at a few points $(n_1, p_1; n_2, p_2)$. Hence, the coverage of this interval is more reliable than that of the Wald interval. This suggests that the profile likelihood interval might be a good alternative to the Wald interval.

Table 3.1: Comparison of confidence intervals at 95% level

                  Coverage Probability             Length
n1  n2  p1  p2    Wald  WCC   Bayes AC    PLB      Wald  WCC   Bayes AC    PLB
10  10  .9  .1    .870  .878  .953  .953  .857     .458  .658  .556  .579  .467
        .9  .8    .874  .989  .976  .985  .921     .567  .767  .601  .626  .638
        .8  .3    .896  .973  .951  .953  .949     .707  .907  .671  .699  .708
        .8  .7    .917  .975  .956  .968  .913     .707  .907  .671  .699  .728
        .6  .4    .919  .972  .955  .961  .941     .812  1.01  .730  .760  .812
        .6  .5    .907  .966  .955  .957  .941     .821  1.02  .736  .766  .822
        .5  .5    .910  .955  .954  .954  .945     .830  1.03  .741  .771  .831
15  15  .9  .1    .809  .955  .962  .971  .930     .396  .529  .450  .463  .406
        .9  .8    .933  .979  .960  .970  .923     .479  .613  .496  .511  .527
        .8  .3    .932  .955  .937  .956  .940     .591  .724  .568  .584  .596
        .8  .7    .931  .976  .953  .958  .933     .591  .724  .568  .584  .614
        .6  .4    .932  .973  .956  .959  .949     .676  .810  .626  .644  .676
        .6  .5    .930  .974  .933  .933  .931     .684  .817  .631  .649  .684
        .5  .5    .939  .954  .951  .954  .951     .691  .824  .636  .654  .689
20  20  .9  .1    .916  .920  .956  .956  .967     .352  .452  .387  .396  .367
        .9  .8    .943  .972  .960  .971  .931     .423  .523  .433  .445  .463
        .8  .3    .938  .965  .954  .954  .931     .517  .617  .501  .512  .523
        .8  .7    .940  .970  .949  .951  .939     .517  .617  .501  .512  .544
        .6  .4    .942  .973  .938  .956  .942     .591  .691  .556  .569  .596
        .6  .5    .931  .963  .949  .955  .949     .598  .698  .561  .574  .601
        .5  .5    .918  .958  .954  .954  .954     .604  .704  .566  .578  .606
50  50  .9  .1    .932  .969  .949  .949  .953     .231  .271  .240  .242  .248
        .9  .8    .944  .967  .953  .957  .941     .273  .313  .276  .279  .294
        .8  .3    .945  .969  .951  .951  .946     .333  .373  .329  .332  .354
        .8  .7    .940  .966  .944  .948  .943     .333  .373  .329  .332  .355
        .6  .4    .941  .957  .944  .944  .944     .380  .420  .370  .374  .394
        .6  .5    .939  .966  .940  .940  .940     .384  .424  .374  .377  .396
        .5  .5    .939  .962  .939  .939  .939     .388  .428  .377  .381  .397
20  10  .9  .1    .856  .955  .956  .960  .930     .409  .559  .479  .495  .434
        .9  .8    .873  .945  .972  .972  .906     .520  .670  .530  .549  .557
        .8  .3    .908  .966  .943  .951  .937     .632  .782  .597  .618  .631
        .8  .7    .913  .966  .949  .949  .939     .632  .782  .597  .618  .644
        .6  .4    .921  .964  .946  .951  .944     .710  .860  .649  .671  .701
        .6  .5    .917  .966  .947  .947  .945     .720  .870  .655  .677  .710
        .5  .5    .926  .969  .944  .947  .944     .725  .875  .659  .682  .714
10  20  .9  .1    .856  .958  .953  .957  .932     .410  .560  .479  .495  .434
        .9  .8    .945  .983  .970  .984  .927     .475  .626  .516  .533  .540
        .8  .3    .911  .963  .946  .955  .924     .604  .754  .587  .607  .609
        .8  .7    .921  .966  .954  .963  .923     .604  .754  .587  .607  .629
        .6  .4    .921  .962  .945  .950  .941     .710  .860  .645  .671  .701
        .6  .5    .919  .962  .944  .948  .939     .715  .865  .653  .675  .706
        .5  .5    .925  .967  .941  .945  .941     .725  .875  .659  .681  .714

Figure 3.8: Comparison of coverage probabilities for $p_1 = 0.21$ to $0.99$, $p_2 = p_1 - 0.2$, $n_1 = n_2 = 20$ at 95% nominal level. [Plot of coverage probability against $p_2$ from 0 to 0.8; curves: Wald, Wald with correction, Bayes, Agresti-Coull, Profile likelihood.]
However, for small or moderate balanced sample sizes, the coverage behavior of the profile likelihood based interval is questionable. Figure 3.8 plots the coverage probabilities of the five 95% nominal intervals at $n_1 = n_2 = 20$ and $p_1 = 0.21$ to $0.99$ with step size 0.01 and $p_2 = p_1 - 0.2$. In this specific case, though the lengths of the profile likelihood based interval are always greater than those of the Wald interval, its coverage behavior is even worse than that of the Wald interval. In general, one disadvantage of the profile likelihood based interval is that, for balanced sample sizes, its lengths are greater than those of the Wald interval.

There is an "outlier" among the coverage probabilities of the profile likelihood based interval in the table. When $n_1 = n_2 = 10$ and $p_1 = 0.9$, $p_2 = 0.1$, the coverage probability of the profile likelihood based interval is only 0.857, while all the other coverage probabilities of this interval listed in the table are greater than 0.90. This is due to the discrete nature of the binomial distribution. Some quadruples $(n_1, p_1, n_2, p_2)$ are lucky and some are unlucky. The quadruple $(n_1, p_1, n_2, p_2) = (10, 0.9, 10, 0.1)$ is an unlucky one for the profile likelihood interval.

According to Table 3.1, the profile likelihood interval does not behave better than the intervals with adjusted centers. The coverage probabilities of the Bayes interval and the Agresti-Coull interval are seldom less than those of the profile likelihood interval and show less deviation. For small or moderate sample sizes, when $p_1$ and $p_2$ are close, the coverage probabilities of the intervals with adjusted centers tend to be greater than or equal to those of the profile likelihood interval. This property makes the intervals with adjusted centers more attractive, because it is more common that the difference of the two proportions of interest is small or not very large.
In addition, except when $p_1$ and $p_2$ are close to the boundaries, intervals with adjusted centers are always shorter than the corresponding profile likelihood based intervals. The other disadvantage of the profile likelihood based interval is that it does not have an explicit form.

Based on our evaluation, all the candidate intervals improve greatly upon the coverage probabilities of the Wald interval in a frequentist sense. Coverage above the nominal level occurs quite often for the Wald interval with continuity correction, which tends to have the largest expected length. The profile likelihood based interval has better coverage performance but greater expected length than the Wald interval when the binomial proportions are not close to the boundaries, and the computation of this interval is complex. Moreover, our extensive simulation shows that the performance of the intervals with adjusted centers is better than that of the other intervals.

Among the five confidence interval methods discussed for constructing approximate $100(1 - \alpha)\%$ two-sided intervals, we recommend the intervals with adjusted centers as substitutes for the Wald interval. Because of their stable coverage behavior, they have relatively reliable coverage performance even when $n_1$, $n_2$ are very small and $p_1$, $p_2$ are close to the boundaries. Their simple expressions make the computation easier. Moreover, their lengths are not greater than those of the other intervals in a frequentist sense. In particular, when the proportions are not very close to the boundaries, the lengths of the intervals with adjusted centers tend to be smaller than the others. The choice between the Bayes interval and the Agresti-Coull interval is a matter of preference: the former is shorter and slightly less conservative.
Chapter 4

Interval Estimation for the Difference of Two Binomial Proportions in Adaptive Designs

4.1 Introduction

In clinical trials and in industrial work, adaptive designs, which use accumulating information to assign subjects to different treatments, are often highly desirable. Adaptive designs are applied with two possible aims: first, to draw reliable statistical inferences for the benefit of future subjects, which can be thought of as a utilitarian goal; second, to assign each subject to the treatment with better performance, which is the individualistic goal.

In this chapter, the approaches for constructing confidence intervals in non-adaptive designs will be applied to adaptive designs. A sequential adaptive model is considered in which two treatments are compared and the responses are binary: success or failure. In Section 4.2, notation and some existing adaptive designs will be introduced. The validity of extending non-adaptive methods to adaptive designs will be checked in Section 4.3. As will be explained in more detail, adaptive designs are classified into two categories: allocation adaptive designs and response adaptive designs. In Section 4.4, the connection between the coverage performance and expected length of a confidence interval derived from a non-adaptive design and those of its counterpart from an allocation adaptive design is stated and proved. In Section 4.5, simulation results are given for response adaptive designs.

4.2 Notation and Some Adaptive Designs

The two populations to be compared are referred to as Population A and Population B, and $\{X_k : k \ge 1\}$ and $\{Y_k : k \ge 1\}$ denote the potential independent observations from populations A and B respectively. For each $k \ge 1$, exactly one of $(X_k, Y_k)$ is actually observed. It is assumed that $(X_1, Y_1), (X_2, Y_2), \ldots$ are i.i.d., where $X_1 \sim \mathrm{Bernoulli}(p_A)$ and $Y_1 \sim \mathrm{Bernoulli}(p_B)$. The total sample size $n$ is the total number of observations from populations A and B.
For each $k \ge 1$, define $\delta_k$ to be 1 or 0 according to whether the $k$th subject is assigned to population A or B. The symbols $N_A(k)$ and $N_B(k)$ denote the numbers of the first $k$ observations that are allocated to populations A and B through stage $k$. Then
\[ N_A(k) = \sum_{i=1}^{k} \delta_i \]
and
\[ N_B(k) = \sum_{i=1}^{k} (1 - \delta_i) = k - N_A(k). \]
Further, define $S_A(k)$ and $S_B(k)$ to be the numbers of successes from populations A and B through stage $k$. Then
\[ S_A(k) = \sum_{i=1}^{k} \delta_i X_i \]
and
\[ S_B(k) = \sum_{i=1}^{k} (1 - \delta_i) Y_i. \]

As stated in Geraldes (1999), most adaptive designs fit into one of two general categories: allocation adaptive designs and response adaptive designs. The former encompasses those approaches for which the allocation of each subject does not depend on the responses of previous subjects but only on the subject's covariate levels (when covariate information is taken into consideration) and the allocations and covariate levels of the previous subjects. The second category includes those approaches for which the allocation of each subject also depends on the responses of the previous subjects. Hence, the main difference between the two categories is that $(X_1, Y_1), (X_2, Y_2), \ldots$ are independent of the $\delta$'s in allocation adaptive designs, whereas they are dependent on the $\delta$'s in response adaptive designs.

There are many adaptive designs in the literature, for example the doubly adaptive biased coin design proposed by Eisele (1994) and the play-the-winner design proposed by Smythe and Rosenberger (1995). Woodroofe (1982) considers the problem of sequentially allocating patients to treatments when covariate information is present. We introduce some adaptive designs in the next two subsections.

4.2.1 Some Allocation-Adaptive Designs

The Biased Coin Design, proposed by Efron (1971), allocates the next subject to one of the two populations, A or B, according to the following rule. Let $D_k = N_A(k)/k - N_B(k)/k$ denote the difference of $N_A(k)/k$ and $N_B(k)/k$. Let $p_0$ be a constant in $[0.5, 1]$.
Then
\[ P(\delta_{k+1} = 1) = \begin{cases} 1 - p_0, & \text{if } D_k > 0; \\ 1/2, & \text{if } D_k = 0; \\ p_0, & \text{if } D_k < 0. \end{cases} \]
This allocation policy tends to balance the number of observations from both populations.

The Adaptive Biased Coin Design, proposed by Wei (1978), allocates subjects to A or B according to the following rule. Let $D_k$ denote the difference of $N_A(k)/k$ and $N_B(k)/k$. Let $h : [-1, 1] \to [0, 1]$ be a non-increasing function such that $h(x) = 1 - h(-x)$ for any $x \in [-1, 1]$. Then
\[ P(\delta_{k+1} = 1) = h(D_k). \]
This allocation policy may force an extremely imbalanced experiment to be balanced very quickly.

4.2.2 Some Response-Adaptive Designs

The Randomized Play-the-Winner Rule, proposed by Wei and Durham (1978), tends to allocate more subjects to the population with the higher success proportion. This rule can be described with an urn model. An urn has balls of two different types, marked A or B. We start with $\alpha$ balls of each type. When a subject enters the study, a ball is drawn at random and replaced. If it is of type A, the subject is assigned to A; otherwise the subject is assigned to B. If the observation of the subject is a success, then $\beta$ balls of the same type are added to the urn. Otherwise, $\beta$ balls of the other type are added to the urn. This rule is denoted by RPW($\alpha$, $\beta$).

The Randomized Adaptive Design, proposed by Melfi and Page (1995), tends to allocate subjects to both populations according to an optimal proportion. Suppose $\{U_k : k \ge 1\}$ is a sequence of i.i.d. random variables whose common distribution is $U(0, 1)$ and which are independent of both $\{X_k : k \ge 1\}$ and $\{Y_k : k \ge 1\}$. To minimize the variance of the estimator $\hat p_A(k) - \hat p_B(k)$, the desired proportion is
\[ \pi(p_A, p_B) = \frac{\sqrt{p_A(1 - p_A)}}{\sqrt{p_A(1 - p_A)} + \sqrt{p_B(1 - p_B)}}. \]
Let $\pi_k = \pi(\hat p_A(k), \hat p_B(k))$, where $\hat p_A(k)$, $\hat p_B(k)$ are two estimators of the success probabilities $p_A$ and $p_B$. Then
\[ \delta_{k+1} = I\{U_{k+1} < \pi_k\}. \]

4.3 The Confidence Intervals in Adaptive Designs

Because of the adaptive nature of the design, the distribution of $S_A(k)$ may no longer be Binomial $\mathrm{Bin}(N_A(k), p_A)$.
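Before turning to the intervals themselves, the allocation probabilities described in Section 4.2 can be made concrete with a short Python sketch. The function names, the default $p_0 = 2/3$, and the choice $h(x) = (1 - x)/2$ are illustrative assumptions (that $h$ is one function satisfying the stated conditions); the target proportion is written in the square-root form that minimizes the variance of the difference of sample proportions.

```python
import random
from math import sqrt


def biased_coin_prob(n_a, k, p0=2 / 3):
    """P(delta_{k+1} = 1) under the biased coin rule, given N_A(k) = n_a."""
    d = 2 * n_a - k  # same sign as D_k = N_A(k)/k - N_B(k)/k
    if d > 0:
        return 1 - p0
    if d < 0:
        return p0
    return 0.5


def adaptive_biased_coin_prob(n_a, k, h=lambda x: (1 - x) / 2):
    """P(delta_{k+1} = 1) = h(D_k); h is an assumed example function."""
    d = 0.0 if k == 0 else (2 * n_a - k) / k
    return h(d)


def target_proportion(p_a, p_b):
    """Proportion of subjects on A minimizing Var(p_A-hat - p_B-hat)."""
    s_a, s_b = sqrt(p_a * (1 - p_a)), sqrt(p_b * (1 - p_b))
    return 0.5 if s_a + s_b == 0 else s_a / (s_a + s_b)


def allocate(n, prob_rule, rng=None):
    """Allocate n subjects sequentially; return N_A(n).

    prob_rule(n_a, k) must return P(delta_{k+1} = 1) given N_A(k) = n_a.
    """
    rng = rng or random.Random()
    n_a = 0
    for k in range(n):
        if rng.random() < prob_rule(n_a, k):
            n_a += 1
    return n_a


if __name__ == "__main__":
    rng = random.Random(2002)
    print("biased coin, N_A(50):", allocate(50, biased_coin_prob, rng))
    print("adaptive biased coin, N_A(50):",
          allocate(50, adaptive_biased_coin_prob, rng))
    print("target proportion for (0.7, 0.4):", target_proportion(0.7, 0.4))
```

Both rules here use only past allocations, never past responses, which is what makes them allocation adaptive. With $p_0 = 1$ the biased coin rule alternates deterministically after the first subject, so an even number of subjects is split exactly in half.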
In an adaptive design, $S_A(k)$ and $S_B(k)$ are also no longer independent. Therefore, the validity of constructing confidence intervals using the non-adaptive formulas needs to be verified.

The maximum likelihood estimators, at stage $k$, of the success probabilities $p_A$ and $p_B$ are
\[ \hat p_A(k) = \frac{S_A(k)}{N_A(k)} \quad \text{and} \quad \hat p_B(k) = \frac{S_B(k)}{N_B(k)}. \]
Asymptotic properties in some adaptive design settings have been studied by Eisele and Woodroofe (1995), Bai et al. (2002), Rosenberger (1993) and Rosenberger et al. (1997), among others. Melfi et al. (2001) prove some theorems and apply them to show that
\[ \left( \frac{N_A(k)^{1/2}(\hat p_A(k) - p_A)}{(p_A q_A)^{1/2}}, \frac{N_B(k)^{1/2}(\hat p_B(k) - p_B)}{(p_B q_B)^{1/2}} \right) \Rightarrow (Z_1, Z_2) \tag{4.3.1} \]
under a wide range of adaptive design rules, where $q_A = 1 - p_A$, $q_B = 1 - p_B$, and $Z_1$ and $Z_2$ are independent standard normal random variables. Wei et al. (1990) proved the same result under the randomized play-the-winner rule using a martingale technique. Therefore, the adaptive versions of the Wald confidence interval and the Wald interval with continuity correction up to stage $k$ with nominal level $100(1-\alpha)\%$ are
\[ CI_W(k) = \hat p_A(k) - \hat p_B(k) \pm z_{\alpha/2} \sqrt{\frac{\hat p_A(k)\hat q_A(k)}{N_A(k)} + \frac{\hat p_B(k)\hat q_B(k)}{N_B(k)}} \tag{4.3.2} \]
and
\[ CI_{CC}(k) = \hat p_A(k) - \hat p_B(k) \pm \left( z_{\alpha/2} \sqrt{\frac{\hat p_A(k)\hat q_A(k)}{N_A(k)} + \frac{\hat p_B(k)\hat q_B(k)}{N_B(k)}} + \frac{1}{2}\left( \frac{1}{N_A(k)} + \frac{1}{N_B(k)} \right) \right). \tag{4.3.3} \]

When $\Delta = \Delta_0$ for $\Delta = p_A - p_B$, it follows from (4.3.1) and the arguments in Cox and Hinkley (1974, pages 322-323) that the variable $2\{l(\hat\Delta(k), \hat p_B(k)) - l(\Delta_0)\}$ is asymptotically chi-squared distributed with one degree of freedom, where $\hat\Delta(k) = \hat p_A(k) - \hat p_B(k)$. Thus an approximate $100(1-\alpha)\%$ profile likelihood based confidence interval for $p_A - p_B$ in adaptive designs is
\[ CI_{PL}(k) = \{\Delta \in (-1, 1) : 2(l(\hat\Delta(k), \hat p_B(k)) - l(\Delta)) \le \chi^2_1(\alpha)\}. \tag{4.3.4} \]

To derive the confidence intervals with adjusted centers for adaptive designs, we define two estimators for $p_A$ and $p_B$:
\[ \tilde p_A(k) = \frac{S_A(k) + 1}{N_A(k) + 2} \quad \text{and} \quad \tilde p_B(k) = \frac{S_B(k) + 1}{N_B(k) + 2}. \]

Theorem 4.3.1. In the above adaptive setting, suppose that $N_A(k)/a_k \to 1$ and $N_B(k)/b_k \to 1$ in probability as $k \to \infty$, where $\{a_k\}$ and $\{b_k\}$ are sequences of positive constants tending to infinity.
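Then,
\[ \left( \frac{(N_A(k) + c)^{1/2}(\tilde p_A(k) - p_A)}{\sqrt{p_A q_A}}, \frac{(N_B(k) + c)^{1/2}(\tilde p_B(k) - p_B)}{\sqrt{p_B q_B}} \right) \Rightarrow (Z_1, Z_2), \tag{4.3.5} \]
where $c$ is a constant and $Z_1$, $Z_2$ are independent standard normal random variables.

Proof. The desired conclusion is a direct result of Corollary 3.1 in Melfi et al. (2001). □

This theorem establishes the validity of the confidence intervals with adjusted centers for adaptive designs. Hence, the nominal level $100(1-\alpha)\%$ Bayes and Agresti-Coull confidence intervals for adaptive designs are [...]

Theorem 4.4.1. In allocation adaptive designs,
\[ P(p_A - p_B \in CI^*(k)) = \sum_{j=0}^{k} P(p_A - p_B \in CI^*(j, k-j))\, P(N_A(k) = j), \]
where $*$ may be any confidence interval whose non-adaptive version $CI^*(j, k-j)$ involves only the sufficient statistics $S_A$ and $S_B$.

The proof of this theorem is based on the next lemma.

Lemma 4.4.1. In allocation adaptive designs, suppose $a$ and $b$ are any non-negative integers satisfying $a \le j$ and $b \le k - j$. Then

1. $P(S_A(k) = a \mid N_A(k) = j) = \binom{j}{a} p_A^a (1 - p_A)^{j-a}$;

2. $P(S_B(k) = b \mid N_A(k) = j) = \binom{k-j}{b} p_B^b (1 - p_B)^{k-j-b}$; and

3. $P(S_A(k) = a, S_B(k) = b \mid N_A(k) = j) = P(S_A(k) = a \mid N_A(k) = j)\, P(S_B(k) = b \mid N_A(k) = j)$.

Proof. Let $\vec\delta = (\delta_1, \ldots, \delta_k)$. For any $j \in \{0, 1, \ldots, k\}$,
\[ \{N_A(k) = j\} = \bigcup_{t=1}^{C_j} \{\vec\delta = \vec\delta_t\}, \]
where $\vec\delta_t = (\delta_{t,1}, \ldots, \delta_{t,k})$ is a $k$-dimensional vector that has $j$ elements with value 1 and the other $k - j$ elements with value 0. There are $C_j = \binom{k}{j}$ different such vectors. We put them in order. Note that
\begin{align*}
P(S_A(k) = a \mid N_A(k) = j)
&= \frac{1}{P(N_A(k) = j)} \sum_{t=1}^{C_j} P\Big(\sum_{i=1}^{k} \delta_i X_i = a,\ \vec\delta = \vec\delta_t\Big) \\
&= \frac{1}{P(N_A(k) = j)} \sum_{t=1}^{C_j} P\Big(\sum_{i=1}^{k} \delta_{t,i} X_i = a,\ \vec\delta = \vec\delta_t\Big) \\
&= \frac{1}{P(N_A(k) = j)} \sum_{t=1}^{C_j} P\Big(\sum_{i=1}^{k} \delta_{t,i} X_i = a\Big) P(\vec\delta = \vec\delta_t) \\
&= \binom{j}{a} p_A^a (1 - p_A)^{j-a}. \tag{4.4.2}
\end{align*}
The third step is valid because $\{\delta_l, l \ge 1\}$ is independent of the i.i.d. sequence $\{X_i, i = 1, 2, \ldots\}$ in allocation adaptive designs. Hence, the $\delta$'s only indicate when to take observations from A and B. We have proved that $S_A(k)$ has a conditionally binomial distribution. Similarly, we can prove the conclusion related to $P(S_B(k) = b \mid N_A(k) = j)$.

It remains to prove that, given $N_A(k) = j$, $S_A(k)$ and $S_B(k)$ are independent.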
The conditional joint distribution of $S_A(k)$ and $S_B(k)$ is
\[ P(S_A(k) = a, S_B(k) = b \mid N_A(k) = j) = P\Big(\sum_{i=1}^{j} X_i = a,\ \sum_{i=1}^{k-j} Y_i = b \,\Big|\, \sum_{i=1}^{k} \delta_i = j\Big). \tag{4.4.3} \]
Therefore, by the independence of the responses and the allocations, equation (4.4.3) can be rewritten in the following way:
\begin{align*}
P(S_A(k) = a, S_B(k) = b \mid N_A(k) = j)
&= P\Big(\sum_{i=1}^{j} X_i = a \,\Big|\, N_A(k) = j\Big)\, P\Big(\sum_{i=1}^{k-j} Y_i = b \,\Big|\, N_A(k) = j\Big) \\
&= P(S_A(k) = a \mid N_A(k) = j)\, P(S_B(k) = b \mid N_A(k) = j). \tag{4.4.4}
\end{align*}
Hence, the lemma holds. □

Proof (of Theorem 4.4.1). For any confidence interval,
\begin{align*}
P(p_A - p_B \in CI^*(k))
&= \sum_{j=0}^{k} P(p_A - p_B \in CI^*(k),\ N_A(k) = j) \\
&= \sum_{j=0}^{k} P(p_A - p_B \in CI^*(k) \mid N_A(k) = j)\, P(N_A(k) = j). \tag{4.4.5}
\end{align*}
If the non-adaptive version of the confidence interval involves only the sufficient statistics $S_A$ and $S_B$, the desired conclusion is obtained by applying Lemma 4.4.1 to (4.4.5). □

Remark 4.4.1. The condition that $N_A(k)/a_k \to 1$ and $N_B(k)/b_k \to 1$ in probability as $k \to \infty$ is not needed for the proof, but it does guarantee the validity of the asymptotic normality needed in constructing those confidence intervals in general adaptive designs.

There is a similar theorem concerning the connection between the expected lengths of confidence intervals in non-adaptive designs and in allocation adaptive designs.

Theorem 4.4.2. In allocation adaptive designs,
\[ EL^*_k = \sum_{j=0}^{k} EL^*(j, k-j)\, P(N_A(k) = j), \]
where $*$ may be any confidence interval whose non-adaptive version $CI^*(j, k-j)$ involves only the sufficient statistics $S_A$ and $S_B$.

Proof. This proof is similar to that of Theorem 4.4.1. □

Remark 4.4.2. The five confidence intervals considered in the dissertation satisfy the requirements of the two theorems.

Remark 4.4.3. The two theorems imply that, for allocation adaptive designs, a confidence interval should behave well if it behaves well in non-adaptive designs.

4.5 Comparison of Confidence Intervals in Response Adaptive Designs

For response adaptive designs, we do not have simple results as we do in allocation adaptive designs.
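For contrast, the simple result available for allocation adaptive designs (Theorem 4.4.1) can be evaluated exactly. The Python sketch below mixes the exact non-adaptive coverage probabilities over the distribution of $N_A(k)$. It is only an illustration under stated assumptions: the allocation rule is taken to be the biased coin design, the interval is an adjusted-center interval that adds one success and one failure per population, the nominal level is 95%, and all function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for a 95% nominal level


def covers(a, j, b, m, delta):
    """True if the adjusted-center interval from (a of j, b of m) covers delta."""
    pa, pb = (a + 1) / (j + 2), (b + 1) / (m + 2)
    half = Z * sqrt(pa * (1 - pa) / (j + 2) + pb * (1 - pb) / (m + 2))
    return pa - pb - half <= delta <= pa - pb + half


def fixed_coverage(j, m, p_a, p_b):
    """Exact coverage of the non-adaptive interval with sample sizes j and m."""
    total = 0.0
    for a in range(j + 1):
        for b in range(m + 1):
            if covers(a, j, b, m, p_a - p_b):
                total += (comb(j, a) * p_a**a * (1 - p_a) ** (j - a)
                          * comb(m, b) * p_b**b * (1 - p_b) ** (m - b))
    return total


def biased_coin_allocation_dist(k, p0):
    """Distribution of N_A(k) under the biased coin rule (forward recursion)."""
    dist = {0: 1.0}  # N_A(0) = 0 with probability one
    for step in range(k):
        new = {}
        for n_a, pr in dist.items():
            d = 2 * n_a - step  # same sign as D_step
            to_a = 0.5 if d == 0 else (1 - p0 if d > 0 else p0)
            new[n_a + 1] = new.get(n_a + 1, 0.0) + pr * to_a
            new[n_a] = new.get(n_a, 0.0) + pr * (1 - to_a)
        dist = new
    return dist


def allocation_adaptive_coverage(k, p_a, p_b, p0):
    """Coverage at stage k as the mixture sum_j C(j, k - j) P(N_A(k) = j)."""
    dist = biased_coin_allocation_dist(k, p0)
    return sum(pr * fixed_coverage(j, k - j, p_a, p_b)
               for j, pr in dist.items())


if __name__ == "__main__":
    print(allocation_adaptive_coverage(20, 0.5, 0.3, p0=2 / 3))
```

The adjusted-center interval is convenient in this sketch because it remains defined when one population receives no observations ($j = 0$ or $j = k$), which has positive probability under the biased coin rule with $p_0 < 1$.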
The main reason such results are unavailable for response adaptive designs is that Lemma 4.4.1 does not hold there, because the $\delta$'s are not independent of the responses $X$'s and $Y$'s. However, for response adaptive designs we reach the same conclusion via simulation: if a confidence interval behaves well in non-adaptive designs, one may expect it to behave well in response adaptive designs. We obtain this conclusion through extensive simulation studies on some response adaptive designs. All the results shown use a simulation with 10,000 iterations for each realized $(n, p_A, p_B)$. We concentrate on RPW(1,1), the randomized play-the-winner design with $\alpha = 1$ and $\beta = 1$, in this dissertation. Similar conclusions hold for some other response adaptive designs such as the randomized adaptive designs.

As we did for non-adaptive designs, to explore the average performance of the five confidence intervals, we randomly sampled 10,000 values of $(n, p_A, p_B)$, taking $p_A$ and $p_B$ independently from $U(0, 1)$ and taking $n$ from the uniform distribution over $\{10, 11, \ldots, 100\}$. We then applied the RPW(1,1) rule to the sampled $n$ to obtain the sample sizes from the two treatments.

Figure 4.1: Coverage probability boxplots of some 95% nominal intervals under RPW(1,1). Wald: median = .914, mean = .875; Wald_CC: median = .962, mean = .948; Bayes: median = .950, mean = .952; Agresti-Coull: median = .955, mean = .958; PLB: median = .949, mean = .950.

Figure 4.1 shows the average coverage performance of the five intervals with the means and medians of the coverage probabilities listed. Similar to the results in non-adaptive designs, the Wald interval behaves poorly: the coverage probability is unstable and very low, with median 0.914 and mean 0.875 at the 95% nominal level. It also occasionally has very low coverage probabilities.
Though the coverage probabilities of the Wald interval with continuity correction are higher than those of the Wald interval, it inherits some disadvantages of the Wald interval too: occasional very low coverage probabilities and unstable performance. The average coverage behaviors of the Bayes interval, the Agresti-Coull interval and the profile likelihood based interval are very similar: their means and medians are close to the 95% nominal level. The profile likelihood interval is not as stable as the intervals with adjusted centers.

When comparing Figures 3.1 and 4.1, or simply comparing the corresponding mean and median coverage probabilities, we notice one interesting point. The average coverage performance of the Wald interval and the Wald interval with continuity correction under RPW(1,1) is much worse than it is in non-adaptive designs. However, this is not true for intervals with adjusted centers, which makes the intervals with adjusted centers desirable with RPW(1,1) and some other adaptive designs because of their stable performance.

Figure 4.2: Expected length boxplots of some 95% nominal intervals under RPW(1,1) (vertical axis: mean length). Wald: median = .475, mean = .499; Wald_CC: median = .587, mean = .646; Bayes: median = .488, mean = .517; Agresti-Coull: median = .499, mean = .539; PLB: median = .504, mean = .537.

We also plot the mean length boxplots of the five intervals with RPW(1,1) in Figure 4.2. Since the Wald interval with continuity correction is too wide compared to the other intervals, we discard it in the following comparisons. And because the two intervals with adjusted centers are very similar, we will only consider the Agresti-Coull interval henceforth.
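A simulation of this kind can be sketched in a few lines of Python. The code below estimates the coverage of the Wald interval and of an adjusted-center interval under RPW(1,1) for one fixed $(n, p_A, p_B)$. It is a minimal sketch, not the simulation code used for the figures: the function names, the reduced default number of replications, the one-success-one-failure form of the adjustment, and the convention that an undefined Wald interval (an empty treatment arm) counts as a miss are all assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # z_{alpha/2} for a 95% nominal level


def rpw_trial(n, p_a, p_b, rng, alpha=1, beta=1):
    """Run one RPW(alpha, beta) trial; return (S_A, N_A, S_B, N_B)."""
    balls = {"A": alpha, "B": alpha}
    p = {"A": p_a, "B": p_b}
    n_arm = {"A": 0, "B": 0}
    s_arm = {"A": 0, "B": 0}
    for _ in range(n):
        # Draw a ball at random (with replacement) to choose the treatment.
        draw = rng.random() * (balls["A"] + balls["B"])
        arm = "A" if draw < balls["A"] else "B"
        success = rng.random() < p[arm]
        n_arm[arm] += 1
        s_arm[arm] += int(success)
        # Success: add beta balls of the same type; failure: of the other type.
        rewarded = arm if success else ("B" if arm == "A" else "A")
        balls[rewarded] += beta
    return s_arm["A"], n_arm["A"], s_arm["B"], n_arm["B"]


def wald(s_a, n_a, s_b, n_b):
    if n_a == 0 or n_b == 0:
        return None  # MLE undefined; counted as a miss below (assumption)
    pa, pb = s_a / n_a, s_b / n_b
    half = Z * sqrt(pa * (1 - pa) / n_a + pb * (1 - pb) / n_b)
    return pa - pb - half, pa - pb + half


def adjusted(s_a, n_a, s_b, n_b):
    pa, pb = (s_a + 1) / (n_a + 2), (s_b + 1) / (n_b + 2)
    half = Z * sqrt(pa * (1 - pa) / (n_a + 2) + pb * (1 - pb) / (n_b + 2))
    return pa - pb - half, pa - pb + half


def coverage(n, p_a, p_b, reps=2000, seed=0):
    """Monte Carlo coverage of both intervals for p_A - p_B under RPW(1,1)."""
    rng = random.Random(seed)
    delta = p_a - p_b
    hits = {"wald": 0, "adjusted": 0}
    for _ in range(reps):
        data = rpw_trial(n, p_a, p_b, rng)
        for name, method in (("wald", wald), ("adjusted", adjusted)):
            ci = method(*data)
            if ci is not None and ci[0] <= delta <= ci[1]:
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    print(coverage(20, 0.5, 0.3))
```

Both intervals are evaluated on the same simulated trials, so differences in the estimated coverage reflect the interval construction rather than different random allocations. Averaging over randomly drawn $(n, p_A, p_B)$, as in the study described above, would wrap `coverage` in one more loop.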
Figure 4.3: Coverage probabilities of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 20$ and $p_A = 0.5$ under RPW(1,1), plotted against $p_B$.

Figure 4.4: Expected lengths of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 20$ and $p_A = 0.5$ under RPW(1,1), plotted against $p_B$.

Figure 4.5: Coverage probabilities of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 20$ and $p_A = 0.9$ under RPW(1,1), plotted against $p_B$.

Figure 4.3 plots the coverage probabilities of the Wald interval, the Agresti-Coull interval and the profile likelihood based interval at the 95% nominal level for $n = 20$, $p_A = 0.5$, and $p_B$ from 0.05 through 0.95 with step size 0.05 under RPW(1,1). Figure 4.4 plots the corresponding mean lengths of those three intervals. We see that the Agresti-Coull interval has both satisfactory coverage performance and satisfactory expected length in this setup. Its coverage probabilities are almost all just above the nominal level, and it has the shortest length unless $p_B$ is close to either boundary, that is, as long as the difference $|p_A - p_B|$ is not very large. Though the values taken by $p_B$ are symmetric around $p_A = 0.5$, Figure 4.3 and Figure 4.4 do not exhibit any symmetry. This is due to the adaptive nature of the RPW(1,1) design and to $p_A/p_B$ not being symmetric around $p_A = 0.5$.

Figure 4.6: Expected lengths of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 20$ and $p_A = 0.9$ under RPW(1,1), plotted against $p_B$.
In contrast to the setup of Figure 4.3 and Figure 4.4, let $p_A = 0.9$ in Figure 4.5 and Figure 4.6. When $p_A$ is far from $p_B$, the coverage probability of the profile likelihood interval is rather high. It drops when $p_A$ and $p_B$ get closer. Contrary to the profile likelihood interval, the coverage probability of the Agresti-Coull interval is much less sensitive to the relative positions of $p_A$ and $p_B$. The coverage remains above the nominal level. Though the expected length of the Agresti-Coull interval is much greater than that of the profile likelihood based interval when $p_A - p_B$ is large, it is close to the latter when $p_A - p_B$ is not very large. This is verified through our extensive simulation. In fact, as the total sample size $n$ increases, the disadvantage in expected length of the Agresti-Coull interval when $p_A - p_B$ is large decreases. For example, when $n = 100$ and $p_A$ and $p_B$ are kept the same as in Figure 4.6, the mean lengths of the three intervals are comparable. The expected length of the Agresti-Coull interval is less than that of the profile likelihood based interval most of the time, and it has the smallest length when $p_B$ is not very close to either boundary.

In Figure 4.6, one may notice that the expected length of the Wald interval is much smaller than those of the other two intervals, especially when $p_B$ is close to 0 or 1. This is due to the high frequency of occurrence of the empty Wald interval when the sample size is small and the success proportions are close to the boundaries. This is also the reason for the low coverage probability of the Wald interval in Figure 4.5. The much lower expected length of the Wald interval is less pronounced or absent for moderate (say, $n = 50$) or large (say, $n = 100$) sample sizes.

Let us compare the three confidence intervals from another point of view: let the total sample size $n$ vary and keep $p_A$ and $p_B$ constant.
Figure 4.7 and Figure 4.8, respectively, plot the coverage probabilities and mean lengths of the three confidence intervals for $n$ varying from 10 through 100 with $p_A = 0.7$ and $p_B = 0.4$. The Agresti-Coull interval has both the highest coverage probability and the shortest expected length for most values of $n$. This makes the Agresti-Coull interval very attractive in application. Another advantage of the Agresti-Coull interval is that it may achieve the nominal level for very small sample sizes. When the sample size increases, the coverage probability of the Agresti-Coull interval tends to go down and fluctuate around the nominal level, which may be explained by the central limit theory for adaptive designs.

Figure 4.7: Coverage probabilities of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 10$ to $100$ and $p_A = 0.7$, $p_B = 0.4$ under RPW(1,1), plotted against $n$.

Figure 4.8: Expected lengths of three 95% nominal intervals (Wald, Agresti-Coull, profile likelihood) for $n = 10$ to $100$ and $p_A = 0.7$, $p_B = 0.4$ under RPW(1,1), plotted against $n$.

Our extensive simulation shows that the Agresti-Coull interval always has the most satisfactory coverage probability when $p_A$ and $p_B$ are not far apart from each other (say, $|p_A - p_B| < 0.5$). When $|p_A - p_B|$ is very large, the profile likelihood based interval has the highest coverage probability. The expected length of the Agresti-Coull interval is also satisfactory unless the two proportions are close to the boundaries.

4.6 Conclusion

In summary, compared to the other intervals discussed, the intervals with adjusted centers behave best with RPW(1,1). They have both stable and satisfactory coverage probabilities and expected lengths in a frequentist sense. The stability of the two intervals makes them good intervals in other adaptive designs.
Our simulation with some other adaptive designs, such as the randomized adaptive designs and the adaptive weighted difference designs due to Geraldes (1999), confirms this conclusion. Therefore, we suggest that the intervals with adjusted centers be used in adaptive designs.

One may expect to improve the coverage performance of the intervals with adjusted centers for large sample sizes by adjusting the weights of $\hat p_A(k)$ and $1/2$ when defining $\tilde p_A(k)$ for large $k$. We may adjust $\tilde p_B(k)$ in the same way.

Bibliography

AGRESTI, A. and CAFFO, B. (2000). Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Amer. Statist. 54 280–288.
AGRESTI, A. and COULL, B. A. (1998). Approximate is better than "exact" for interval estimation of binomial proportions. Amer. Statist. 52 119–126.
BAI, Z. D., HU, F. and ROSENBERGER, W. F. (2002). Asymptotic properties of adaptive designs for clinical trials with delayed response. Ann. Statist. 30 122–139.
BERRY, D. (1996). Statistics: A Bayesian Perspective. Wadsworth, Belmont, CA.
BHATTACHARYA, R. N. and RANGA RAO, R. (1976). Normal approximation and asymptotic expansions. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York-London-Sydney.
BROWN, L. D., CAI, T. T. and DASGUPTA, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Statist. 30 160–201.
COX, D. R. and HINKLEY, D. V. (1974). Theoretical statistics. Chapman and Hall, London.
EFRON, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58 403–417.
EISELE, J. R. (1994). The doubly adaptive biased coin design for sequential clinical trials. J. Statist. Plann. Inference 38 249–261.
EISELE, J. R. and WOODROOFE, M. B. (1995). Central limit theorems for doubly adaptive biased coin designs. Ann. Statist. 23 234–254.
GERALDES, M. (1999). Covariates in adaptive designs for clinical trials.
Ph.D. Dissertation, Michigan State University.
HALL, P. (1992). The bootstrap and Edgeworth expansion. Springer-Verlag, New York.
MELFI, V. F. and PAGE, C. (1995). Randomized adaptive designs. Inst. Math. Statist., Hayward, CA.
MELFI, V. F., PAGE, C. and GERALDES, M. (2001). An adaptive randomized design with application to estimation. Canad. J. Statist. 29 107–116.
NEWCOMBE, R. G. (1998). Interval estimation for the difference between independent proportions: Comparison of eleven methods. Statistics in Medicine 17 873–890.
ROSENBERGER, W. F. (1993). Asymptotic inference with response-adaptive treatment allocation designs. Ann. Statist. 21 2098–2107.
ROSENBERGER, W. F., FLOURNOY, N. and DURHAM, S. D. (1997). Asymptotic normality of maximum likelihood estimators from multiparameter response-driven designs. J. Statist. Plann. Inference 60 69–76.
SMYTHE, R. T. and ROSENBERGER, W. F. (1995). Play-the-winner designs, generalized Polya urns, and Markov branching processes. In Adaptive designs (South Hadley, MA, 1992). Inst. Math. Statist., Hayward, CA 13–22.
WEI, L. J. (1978). The adaptive biased coin design for sequential experiments. Ann. Statist. 6 92–100.
WEI, L. J. and DURHAM, S. (1978). The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc. 73 840–843.
WEI, L. J., SMYTHE, R. T., LIN, D. Y. and PARK, T. S. (1990). Statistical inference with data-dependent treatment allocation rules. J. Amer. Statist. Assoc. 85 156–162.
WILSON, E. (1927). Probable inference, the law of succession, and statistical inference. J. Amer. Statist. Assoc. 22 209–212.
WOODROOFE, M. (1982). Sequential allocation with covariates. Sankhya Ser. A 44 403–414.