This is to certify that the dissertation entitled Covariates in Adaptive Designs for Clinical Trials, presented by Margarida Cristina Geraldes, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

COVARIATES IN ADAPTIVE DESIGNS FOR CLINICAL TRIALS

By

Margarida Cristina Geraldes

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1999

ABSTRACT

This dissertation addresses the problem of designing clinical trials in such a way that a good compromise is achieved between the need to draw reliable statistical inferences from the data collected in the trial (utilitarian goal of a design) and the concern that each patient enrolled in the trial is receiving the best possible medical care (individualistic goal). These goals are often conflicting.
The problem can be further complicated when relevant clinical information (covariates), which is likely to affect the responses of the patients to the treatments, is to be incorporated into the design.

In this dissertation we develop and study three new designs that seek a compromise between the utilitarian and individualistic goals. These procedures can be implemented in clinical trials for which eligible patients arrive sequentially and can be given either one of the two treatments being compared or evaluated in the trial. It is assumed that the responses of patients to the treatments are dichotomous (i.e. either a success or a failure). The new designs are randomized and adaptive. The first design is called The Adaptive Weighted Differences Design, abbreviated AWD, and does not use covariate information on the patients. It can be seen as a generalization of The Adaptive Biased Coin Design of Wei (1978), so that ethical issues are also taken into consideration. This is achieved by taking into account, at each stage of the trial, both the proportion of patients assigned to each treatment and the proportion of patients successfully treated in each treatment group. The second design is called The Covariate Adaptive Weighted Differences Design, abbreviated CAWD, and incorporates covariates into the AWD design using an innovative approach that consists of crossing over information on the responses of patients from stratum to stratum. The third design is called The Covariate Randomized Play-the-Winner Rule, abbreviated CRPW, and corresponds to a multiple urn model; each urn represents a stratum. The CRPW rule can be seen as a generalization of The Randomized Play-the-Winner Rule of Wei and Durham (1978) within strata.
It allows the responses of patients in one stratum to change the composition of the urns corresponding to all the possible strata of the population of patients.

Strong laws of large numbers and a central limit theorem are proved for each design. The proofs rely on martingale techniques. The main result for the CRPW rule is that the proportions of balls representing each treatment in each urn converge almost surely as the number of patients enrolled in the trial tends to infinity. The proof of this type of result for single urn models is rather involved. The crossover of information on the responses of patients significantly complicates the arguments needed to prove this type of convergence.

Monte Carlo simulations are used to evaluate the performance of the designs. The simulations illustrate the excellent combined performance of the designs (in terms of both the proportion of patients successfully treated and the mean squared error for estimating the treatment difference), for suitable choices of their parameters, when compared to complete randomization and the randomized play-the-winner rule.

To Miguel.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my dissertation advisers, Professor Connie Page and Professor Vincent Melfi, for their constant guidance and mentorship during the past two and a half years. Thank you for always having a word of encouragement when I needed it the most. I sincerely appreciate the time you put into our weekly meetings and that you were still willing to answer my questions when I came knocking at your doors. Thank you for all the hours you spent proofreading my many dissertation drafts. My job search showed me how important a good research theme is; I am extremely grateful you suggested that I work in adaptive designs. Finally, thank you so much for believing in me and giving me the courage to pursue a career in the pharmaceutical industry.
I would also like to thank Professor James Stapleton and Professor Peter Lappan for serving on my guidance committee and carefully reading my dissertation. I am sure you could think of better things to do in these beautiful spring days.

I want to thank Professor Page for teaching me how to be a good consultant (or so I would hope). Thank you for giving me the opportunity to learn applied statistics, SAS and SPSS and for your words of encouragement during the entire SCS experience. Thank you for being so friendly and patient, and for always having the time and tact to fix problems that arose when I allowed clients to step over the line. There is nothing I would want to change in the way you supervised the activities of the SCS. Finally, I can never thank you enough for agreeing to be the chair of my guidance committee. It's how all this started.

I also want to thank Professor Melfi for first teaching me two of my favorite subjects: martingales and adaptive designs. You are amongst the best teachers I have had in my many years as a student. Thank you also for introducing me to LaTeX, re-introducing me to Fortran and for not losing your patience every time I went running to your office asking you to fix my errors. I shudder to imagine how much time I would have spent if I had had to learn, on my own, all the computer related things that you taught me.

I would not want to forget to thank Professor Stapleton for being so friendly when I first contacted the Department of Statistics and Probability. Thanks to your efficiency and warm e-mails I never once doubted my decision to come to a country on the other side of the Atlantic Ocean to continue my graduate studies.
Thank you for offering me an assistantship even though I had not requested one and for helping me whenever my University back home tried to make my life difficult. Finally, I truly enjoyed taking your classes in linear models and categorical data analysis; the subject and your teaching of it were like a breath of fresh air after some of the courses I had to take in this department.

Last, but not least, I want to thank my parents and brothers, who encouraged me to live my life on my own terms and to make my own decisions. I know you will always be there to pick up the pieces...

Contents

LIST OF FIGURES

1 Literature Review
1.1 Introduction
1.2 Allocation-Adaptive Designs
1.3 Response-Adaptive Designs
1.4 Applications

2 The Adaptive Weighted Differences Design
2.1 Introduction
2.2 The Allocation Policy
2.3 Strong Laws of Large Numbers
2.4 Central Limit Theorem
2.5 Evaluation of the Design

3 The Covariate Adaptive Weighted Differences Design
3.1 Introduction
3.2 The Allocation Policy
3.2.1 General Notation and Assumptions
3.2.2 The CAWD Allocation Policy
3.3 Strong Laws of Large Numbers
3.4 Central Limit Theorem
3.5 Evaluation of the Design

4 The Covariate Randomized Play-the-Winner Rule
4.1 Introduction
4.2 The Allocation Policy
4.3 Strong Laws of Large Numbers
4.4 Central Limit Theorem
4.5 Evaluation of the Design

List of Figures

2.1 Comparisons (AWD design) for $n = 20$ and $p_A = 0.05$
2.2 Comparisons (AWD design) for $n = 20$ and $p_A = 0.35$
2.3 Comparisons (AWD design) for $n = 20$ and $p_A = 0.50$
2.4 Comparisons (AWD design) for $n = 20$ and $p_A = 0.65$
2.5 Comparisons (AWD design) for $n = 20$ and $p_A = 0.95$
2.6 Comparisons (AWD design) for $n = 100$ and $p_A = 0.05$
2.7 Comparisons (AWD design) for $n = 100$ and $p_A = 0.35$
2.8 Comparisons (AWD design) for $n = 100$ and $p_A = 0.50$
2.9 Comparisons (AWD design) for $n = 100$ and $p_A = 0.65$
2.10 Comparisons (AWD design) for $n = 100$ and $p_A = 0.95$
3.1 Comparisons of proportion of successes (CAWD design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.2 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.3 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.4 Comparisons of proportion of successes (CAWD design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.5 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.6 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$
3.7 Comparisons of proportion of successes (CAWD design) for $n = 30$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.8 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 30$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.9 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 30$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.10 Comparisons of proportion of successes (CAWD design) for $n = 150$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.11 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 150$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.12 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 150$, $p_A(1) = 0.90$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.15$
3.13 Comparisons of proportion of successes (CAWD design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
3.14 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
3.15 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
3.16 Comparisons of proportion of successes (CAWD design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
3.17 Comparisons of mean squared error in stratum 1 (CAWD design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
3.18 Comparisons of mean squared error in stratum 2 (CAWD design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.50$, $p_A(2) = 0.15$ and $p_B(2) = 0.85$
4.1 Comparisons of proportion of successes (CRPW design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.2 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.3 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.4 Comparisons of proportion of successes (CRPW design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.5 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.6 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$
4.7 Comparisons of proportion of successes (CRPW design) for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.8 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.9 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.10 Comparisons of proportion of successes (CRPW design) for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.11 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.12 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$
4.13 Comparisons of proportion of successes (CRPW design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$
4.14 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$
4.15 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$
4.16 Comparisons of proportion of successes (CRPW design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$
4.17 Comparisons of mean squared error in stratum 1 (CRPW design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$
4.18 Comparisons of mean squared error in stratum 2 (CRPW design) for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$

Chapter 1

Literature Review

1.1 Introduction

Consider a clinical trial to evaluate the relative effectiveness of two drugs, A and B, in reducing the risk of rejection following kidney transplant. Suppose that eligible patients arrive sequentially and must be treated immediately and that each patient can be given either one of the drugs, A or B, and will be assigned to exactly one of them. The problem is to decide which patients involved in the study will be allocated to which drug so that two goals are achieved. On one hand it is desirable that the data collected in the study can be used to draw reliable statistical inferences for the benefit of future patients (this can be thought of as a utilitarian goal [Sarkar (1991)]). On the other hand each patient should be allocated to the drug showing the best performance thus far in the study (this can be thought of as an individualistic goal [Sarkar (1991)]).

A design for the clinical trial is a procedure that attempts to provide the physician with a solution to this problem. Unfortunately, the utilitarian and individualistic goals are usually conflicting. Thus, designs are developed that either focus on one of the goals or seek a compromise between them.
The problem can be further complicated when relevant clinical information (such as age, sex, general physical status, severity of the disease, etc.), which is likely to affect the responses of the patients to the drugs, is to be incorporated into the design. Using this covariate information should make the design more efficient in terms of achieving the individualistic goal but is likely to complicate statistical inference.

In this dissertation three new designs that seek a compromise between the utilitarian and individualistic goals are developed and studied. The first, called The Adaptive Weighted Differences Design, is presented in Chapter 2 and does not use covariate information on the patients. The second, called The Covariate Adaptive Weighted Differences Design, is presented in Chapter 3 and incorporates covariates into the adaptive weighted differences design using an innovative approach. The third, called The Covariate Randomized Play-the-Winner Rule, is presented in Chapter 4 and uses the approach introduced in Chapter 3 to incorporate covariates into the Randomized Play-the-Winner Rule proposed by Wei and Durham (1978).

Before presenting and studying these new designs we begin by reviewing some of the numerous procedures that have already been proposed in the literature. This review will serve several purposes. Firstly, it introduces the designs that are used as basic points of reference for developing the three new procedures described and studied in later chapters. Secondly, it gives an idea about the different approaches (for example, Bayesian theory, iterative procedures, optimum design theory) that have been considered for developing designs for clinical trials.
Finally, this review can be regarded as an introduction and motivation for the three new procedures presented in this dissertation; it will be particularly useful in showing the novel aspects of these new designs.

The emphasis of this dissertation is on designs with covariates. Such designs will henceforth be referred to as covariate designs, as opposed to non-covariate designs, which do not take into consideration covariate information on the patients involved in the clinical trial. The main focus of this chapter is on reviewing covariate designs.

Most of the designs that will now be presented fit in one of two general categories. The first includes those approaches for which the allocation of each patient depends on his or her covariate levels (when covariate information is taken into consideration) and on the allocations and covariate levels of the previous patients; these will be referred to as allocation-adaptive designs. The second category includes those approaches for which the allocation of each patient depends also on the responses of the previous patients; these will be called response-adaptive designs.

1.2 Allocation-Adaptive Designs

As defined in the previous section, allocation-adaptive designs are such that the allocation of each patient depends only on the allocations of the previous patients (and possibly on the covariate levels of the present and previous patients). These designs do not take into consideration the responses of patients to the treatments under evaluation in a clinical trial and, therefore, do not address the individualistic issue. They are oriented towards statistical inference.
A popular approach, with this goal in mind, is assigning treatments in such a way that some degree of balance is achieved in terms of the number of patients allocated to each treatment. When covariates are considered, balance is also sought in the distribution of treatments across covariate levels.

This section reviews some of the approaches developed for achieving balanced allocations. Although the main emphasis will be on designs which incorporate covariates, procedures which do not make use of this extra information will also be presented. Traditional designs such as Complete Randomization, Randomized Permuted Blocks [Blackwell and Hodges (1957)] and The Biased Coin Design [Efron (1971)] are often used as basic points of reference when developing approaches that do use covariates. We start with a brief review of these three non-covariate designs and show how they have been modified to include covariates. The new procedures that will be developed in Chapters 2 and 3 are related to The Adaptive Biased Coin Design proposed by Wei (1978); this design is also included in this review.

In the remainder of this section, unless otherwise noted, it is assumed that patients arrive sequentially at the experimental site and that each patient can be assigned to either one of $T \geq 2$ treatments and will be assigned to exactly one of them. Furthermore, when covariate information is to be used in the design, it is assumed that covariate measurements are obtained on each patient before allocation to a treatment. Throughout this section, reference is made only to categorical covariates. It is worth mentioning that discrete or continuous covariates can also be considered by grouping the possible values of the covariates into finitely many levels.
Suppose that there are $T \geq 2$ treatments under evaluation. Complete Randomization consists of assigning each patient to treatment $t$, $t \in \{1, \ldots, T\}$, with probability $1/T$. As discussed in Efron (1971), when the main emphasis of a design for a clinical trial is on the utilitarian aspect, complete randomization is well established, for it can be used as a basis for statistical inference while minimizing the possibility of conscious or unconscious selection bias [Blackwell and Hodges (1957)] on the part of the medical investigator. Ethical considerations aside, a major disadvantage of complete randomization is that some unpleasant imbalances in treatment totals are likely, especially when the number of patients in the trial is small. This problem becomes even more apparent when covariates are considered relevant and strata are formed by considering subgroups of patients with common combinations of covariate levels. Complete randomization in this setting can create serious imbalances in treatment totals across strata.

The Randomized Permuted Blocks Design, introduced by Blackwell and Hodges (1957), assumes that the total number of patients involved in the trial is known in advance. The design can be described as follows. If there are two treatments under evaluation, randomly divide the patients into blocks of length $2b$. Then randomly assign, within each block, $b$ patients to each one of the treatments. Although this design can be quite effective in achieving balance while retaining some randomization, its main disadvantage is that the assignments of some patients will be known in advance.

This procedure can be modified to allow for covariates by simply assigning patients within each stratum by means of randomized permuted blocks, as proposed by Zelen (1974).
However, the total number of patients in each stratum is not usually known in advance, which is a restriction on the applicability of this procedure. Also, Pocock and Simon (1975) note that a major difficulty with this approach is that the number of strata increases rapidly as the number of covariates and their levels increase. Furthermore, they argue that randomized permuted blocks within strata can prove inadequate to achieve its basic goal, balance within each stratum, as the number of strata approaches the number of patients involved in the clinical trial.

The Biased Coin Design, proposed by Efron (1971), allocates patients to one of two treatments, A or B, according to the following rule. Suppose that after $k$ assignments, $N_{A,k}$ patients have been allocated to treatment A and the remainder $N_{B,k} = k - N_{A,k}$ have been allocated to treatment B. Let $D_k$ denote the difference $N_{A,k}/k - N_{B,k}/k$. Let $p$ be a constant in $[0, 1]$. Then, for $p \geq 1/2$:

if $D_k > 0$, allocate the $(k+1)$st patient to treatment A with probability $1 - p$;
if $D_k = 0$, allocate the $(k+1)$st patient to treatment A with probability $1/2$;
if $D_k < 0$, allocate the $(k+1)$st patient to treatment A with probability $p$.

This allocation policy tends to balance the number of patients allocated to both treatments, the tendency being weakest if $p = 1/2$ (which corresponds to complete randomization) and strongest if $p = 1$ (which corresponds to randomized permuted blocks with $b = 1$). Wei (1978) notes that a disadvantage of this procedure is that, in assigning the next patient to a treatment, the allocation policy neither takes into consideration the number of patients treated thus far, nor does it discriminate between small and large absolute values of $D_k$.
Wei (1978) proposes a new procedure of the biased coin type that takes these issues into consideration.

The Adaptive Biased Coin Design, proposed by Wei (1978), allocates patients according to the following rule. Suppose that after $k$ assignments, $N_{A,k}$ patients have been allocated to treatment A and the remainder $N_{B,k} = k - N_{A,k}$ have been allocated to treatment B. Let $D_k$ denote the difference $N_{A,k}/k - N_{B,k}/k$ and let $h : [-1, 1] \to [0, 1]$ be a non-increasing function such that $h(x) = 1 - h(-x)$, for all $x \in [-1, 1]$. Then, allocate the $(k+1)$st patient to treatment A with probability $h(D_k)$. This allocation policy forces an extremely imbalanced experiment to be balanced but tends to complete randomization as the difference $D_k$ tends to zero (i.e. as the experiment approaches perfect balance). Efron's biased coin design with parameter $p$ is the particular case of the adaptive biased coin design corresponding to setting $h(x) = p$ for $-1 \leq x < 0$ and $h(0) = 1/2$ (so that, by the symmetry condition, $h(x) = 1 - p$ for $0 < x \leq 1$).

Both Efron (1971) and Wei (1978) mention that if covariate information is available, it can be taken into consideration by applying the biased coin design procedures separately within each stratum.

Pocock and Simon (1975) suggest an allocation rule which can be viewed as a generalization of Efron's biased coin design to more than two treatments and several covariates. The design relies on a function $G$ which measures the total amount of imbalance (in the distribution of treatment numbers within the levels of each covariate) resulting from each one of the possible assignments at each stage.
Treatments are then ranked according to their $G$-values (rank 1 = minimal imbalance value) and allocation probabilities are such that the smaller the $G$-value for a treatment, the bigger its chance of being assigned to the patient. Several possibilities for the $G$ function and allocation probabilities are suggested. Finally, simulations are used to compare the performance of this new approach and three traditional designs: complete randomization, randomly permuted blocks and randomly permuted blocks within strata. For small trials, two treatments, a specific function $G$ and other assumptions on the distribution of covariate levels, Pocock and Simon's procedure is shown to enable treatments to be balanced across several covariates more effectively than the other three designs.

Atkinson (1982) uses optimum design theory to develop a randomized balanced design of the biased coin type, for clinical trials with two or more treatments and in the presence, or absence, of covariate information. The procedure can be applied to clinical trials for which interest lies mainly in contrasts between treatment effects on a response variable; the expected response is assumed to be linearly related to the treatments and (if present) covariates. In this context, Atkinson refers to $D_A$-optimum design theory [Sibson (1974)] to give a procedure for obtaining the assignment probabilities in a biased coin rule. This procedure is called the $D_A$-Optimal Biased Coin Design. Analytic expressions for the assignment probabilities are obtained only in the absence of covariates. Similarly to Wei's adaptive biased coin design, this new procedure is also shown to respond to increasing imbalance.
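To make concrete what "responding to increasing imbalance" means, the two biased coin rules reviewed earlier in this section can be sketched as follows. This is only an illustrative sketch: the function names, the value $p = 2/3$, and the particular choice $h(x) = (1 - x)/2$ (which is non-increasing and satisfies $h(x) = 1 - h(-x)$) are assumptions made for the example and are not taken from Efron (1971) or Wei (1978).

```python
import random
from fractions import Fraction


def imbalance(n_a, n_b):
    """D_k = N_A/k - N_B/k after k = n_a + n_b assignments (0 before any)."""
    k = n_a + n_b
    return Fraction(n_a - n_b, k) if k else Fraction(0)


def efron_prob_a(d_k, p=Fraction(2, 3)):
    """Efron's rule: P(next patient -> A) depends only on the sign of D_k.

    p = 2/3 is an illustrative value chosen for this sketch.
    """
    if d_k > 0:
        return 1 - p
    if d_k < 0:
        return p
    return Fraction(1, 2)


def wei_prob_a(d_k, h=lambda x: (1 - x) / 2):
    """Wei's rule: P(next patient -> A) = h(D_k).

    The default h is an assumed example satisfying h(x) = 1 - h(-x).
    """
    return h(d_k)


def simulate(prob_a, n_patients, seed=0):
    """Allocate n_patients sequentially; return the totals (N_A, N_B)."""
    rng = random.Random(seed)
    n_a = n_b = 0
    for _ in range(n_patients):
        if rng.random() < prob_a(imbalance(n_a, n_b)):
            n_a += 1
        else:
            n_b += 1
    return n_a, n_b


# Mild imbalance (6 vs 4) and severe imbalance (9 vs 1) after 10 patients.
for n_a, n_b in [(6, 4), (9, 1)]:
    d = imbalance(n_a, n_b)
    print(n_a, n_b, efron_prob_a(d), wei_prob_a(d))
# prints "6 4 1/3 2/5" then "9 1 1/3 1/10"
```

Under these assumptions Efron's probability is the same in both cases, whereas the adaptive rule pushes much harder towards treatment B when the imbalance is severe, which is the behaviour Wei (1978) asks for.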
Atkinson (1998) follows up Atkinson (1982) and uses simulations to compare the properties of the $D_A$-optimal biased coin design with those of three other procedures: complete randomization, a balanced deterministic approach and Efron's biased coin design. Such properties are related to the loss of information due to imbalance, which can be expressed as the number of patients on whom information is unavailable at each stage of the trial. For two treatments the loss is defined as the difference between the number of patients treated thus far and the ratio between the variance of the response variable and the variance of the estimated contrast of treatment effects. An analytical expression for the loss is also derived when more than two treatments are considered; the loss is, in this case, expressed as a function of the number of patients treated thus far, the number of treatments, the design matrix and the matrix of contrasts. For two treatments only, simulations are run both under the assumption of independent covariates and correlated covariates, and for more than two treatments simulations are run only under the assumption of independent covariates. The results show, as expected, that the balanced deterministic approach has the best performance, in terms of the loss, while complete randomization has the worst. Efron's biased coin design is consistently better than Atkinson's $D_A$-optimal biased coin design. It is also noted that, for all these designs, the loss increases as the number of covariates increases, and that it is higher for correlated than for independent covariates.

Ball, Smith, and Verdinelli (1993) develop, within the Bayesian framework, a randomized balanced design of the biased coin type for clinical trials with $T \geq 2$ treatments and in the presence of one covariate.
The procedure only has direct practical relevance to clinical trials where a pool of covariate-categorized potential patients is available from which the new patients can be selected; this is one of the main disadvantages of the design. The allocations and the model for the responses of patients to treatments can be described as follows. Suppose that the covariate has levels $v_j$ ($j = 1, \ldots, J$) and that $k$ patients have already been assigned, with proportions $f_{ij} = n_{ij}/k$ allocated to treatment $i$ ($i = 1, \ldots, T$) when the covariate level was $v_j$. Suppose that $k$ more patients are to be allocated, with $m_{ij} \ge 0$ patients with covariate level $v_j$ to be allocated to treatment $i$. Let $p_{ij} = m_{ij}/k$. So, if $s_{ij} = p_{ij} + f_{ij}$, then the overall design for the $2k$ observations allocates $k s_{ij} = k p_{ij} + k f_{ij}$ patients with covariate value $v_j$ to treatment $i$. The responses, $y_{ijl}$, of patients to treatments are assumed to be such that
\[ y_{ijl} \sim N(\theta_i + \beta v_j, \sigma^2), \]
where $i = 1, \ldots, T$, $j = 1, \ldots, J$, $l = 1, \ldots, n_{ij} + m_{ij}$, $\sum_i \sum_j (n_{ij} + m_{ij}) = 2k$, and the $y_{ijl}$ are conditionally independent given $\theta_i$, $\beta$ and $\sigma^2$. It is assumed that the effects of the $T$ treatments, or of the $T - 1$ new treatments (when a control treatment is considered), are exchangeable with a common variance $\tau^2$ and that both $\sigma^2$ and $\tau^2$ are known. Ball, Smith, and Verdinelli (1993) show how the optimal proportions $p^{*}_{ij}$ can be identified with respect to design criteria such as D-optimality and A-optimality [Silvey (1980)]. The probability of choosing the $(k + 1)$st patient to have covariate value $v_j$ and to be assigned to treatment $i$ is then computed from the optimal proportions $p^{*}_{ij}$.
Asymptotic properties of the above allocation scheme are considered when there is either a vague prior knowledge of the exchangeable treatments or a strong prior specification of the exchangeable treatments implying that they are essentially identical. It is shown that, in these particular cases, the proportion of patients allocated to each treatment converges almost surely to determined limits as the total number of patients tends to infinity.

Wu (1981) gives an iterative construction of nearly balanced assignments that can be applied in the context of clinical trials with $T \ge 2$ treatments and in the presence of $p \ge 1$ covariates with, possibly, different numbers of levels. The procedure can only be implemented if the covariate measurements of all the patients that will be involved in the trial are known in advance. The new design criterion, called the B-Criterion, can be described as follows. Suppose that $N$ patients are available for the experiment. Each patient has $p$ covariate measurements. Assume that the $i$th covariate can take $r_i$ different levels, so that each patient has a combination of levels $u = (u_1, \ldots, u_p)$, $1 \le u_i \le r_i$, where the $i$th covariate is at level $u_i$. A measure of imbalance of the assignment, within the set of patients with $i$th covariate at level $u_i$, is defined as
\[ \sum_{t=1}^{T} \Big( n_t(u_i) - \frac{1}{T} \sum_{s=1}^{T} n_s(u_i) \Big)^2, \]
where $n_t(u_i)$ represents the number of patients allocated to treatment $t$ and with $i$th covariate at level $u_i$. Let $\lambda_i$ be a weight reflecting the importance of covariate $i$. Then, the total imbalance of the assignment is defined as
\[ I_B = \sum_{i=1}^{p} \lambda_i \sum_{u_i=1}^{r_i} \sum_{t=1}^{T} \Big( n_t(u_i) - \frac{1}{T} \sum_{s=1}^{T} n_s(u_i) \Big)^2, \]
and a treatment assignment is called B-optimal if the corresponding $I_B$ measure is minimized among all possible assignments. A sufficient condition for B-optimality is derived. The first step in the procedure is to check whether the number of patients with level combination $u$, denoted $n(u)$, is at least $T$. If this is the case, say $n(u) = qT + r$ with $0 \le r < T$, randomly assign $qT$ patients in stratum $u$ to the $T$ treatments so that each treatment is assigned to $q$ patients. This does not change the $I_B$ measure. Now the number of remaining patients in stratum $u$ is less than $T$ and the so-called B-algorithm can be executed. This heuristic method for constructing nearly B-optimal designs consists, essentially, of applying a routine which is based on measuring two effects: the first is the effect of switching a patient with a combination of levels $u$ from receiving treatment $s$ to receiving treatment $t$; the second is the effect of exchanging the treatment assignments of a patient with a combination of levels $u$ and another patient with a combination of levels $v$, from $(s, t)$ to $(t, s)$. As initially described, this balanced design is deterministic and tries to balance assignments only across main effects. Wu (1981) also presents some modifications that can be introduced in the B-algorithm, so that higher order effects and randomization can be taken into consideration. Possible advantages of this design over, for example, the design of Atkinson (1982) are that it neither assumes the existence of a regression model for the responses nor requires matrix inversion. Still, the relative merits of these approaches are not discussed. The B-optimal design can be particularly useful when $N$, the number of patients involved in the trial, is not much larger than $r_1 \times \cdots \times r_p$, the total number of possible level combinations. Its main disadvantage is that information on the covariate levels of the patients may not be available in advance.
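The total imbalance $I_B$ described above can be evaluated directly from the covariate levels and a candidate assignment. The following is a minimal sketch only, not Wu's B-algorithm: the function name `b_imbalance`, the data layout (one tuple of covariate levels per patient, treatments labelled 0 to T-1) and the weights in the usage lines are assumptions made for illustration.

```python
from collections import Counter


def b_imbalance(covariates, assignment, n_treatments, weights):
    """Total imbalance I_B of a treatment assignment (illustrative sketch).

    covariates   : list of tuples; covariates[n][i] is the level of
                   covariate i for patient n (hypothetical layout)
    assignment   : list of treatment labels in range(n_treatments)
    n_treatments : T, the number of treatments
    weights      : weights[i] is the importance weight of covariate i
    """
    total = 0.0
    for i, weight in enumerate(weights):
        # n_t(u_i): number of patients on treatment t with covariate i at level u_i
        counts = Counter((u[i], t) for u, t in zip(covariates, assignment))
        levels = {u[i] for u in covariates}
        for level in levels:
            n_t = [counts[(level, t)] for t in range(n_treatments)]
            mean = sum(n_t) / n_treatments
            total += weight * sum((n - mean) ** 2 for n in n_t)
    return total


# Four patients, two binary covariates, two treatments.
patients = [(0, 0), (0, 1), (1, 0), (1, 1)]
balanced = b_imbalance(patients, [0, 1, 1, 0], 2, [1.0, 1.0])
unbalanced = b_imbalance(patients, [0, 0, 1, 1], 2, [1.0, 1.0])
```

In this toy example the first assignment balances both covariates marginally, while the second confounds treatment with the first covariate and is penalized accordingly. Levels with no patients contribute nothing to the sum, so only observed levels are visited.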
1.3 Response-Adaptive Designs

Recall that response-adaptive designs are such that the allocation of each patient depends both on the previous allocations and on the responses of the patients to the treatments under evaluation. Such designs are used when the individualistic goal takes priority or when some compromise is sought between the individualistic and the utilitarian goals. Incorporating covariates in this setting is quite natural and can, in some cases, simplify the design (see, for example, Woodroofe (1979)).

This section will focus on response-adaptive designs with covariates. It begins, however, with a traditional non-covariate design, The Randomized Play-the-Winner Rule proposed by Wei and Durham (1978), which is the basic point of reference for the development of The Covariate Randomized Play-the-Winner Rule in Chapter 4.

In what follows it is assumed that patients arrive sequentially at the experimental site, that each patient can be assigned to either one of two treatments and will be assigned to exactly one of them, and that the response of each patient is observed before the next patient is assigned to a treatment. Furthermore, when covariate information is to be used in the design, it is assumed that covariate measurements are obtained on each patient before allocation to a treatment.
Wei and Durham (1978) introduce The Randomized Play-the-Winner Rule as a possible solution to the problem of designing a clinical trial in such a way that a good compromise is achieved between the need to derive information about the relative effectiveness of the treatments and the desire to treat each patient in the best possible way. The design can be described with an urn model as follows. Suppose that there are two treatments, A and B, under evaluation and that the responses of the patients to the treatments are dichotomous (either a success or a failure). The urn has, initially, $u$ balls of each type, $u \ge 1$. When a patient enters the study, a ball is randomly drawn from the urn. If the ball is of type $i$, then the patient is assigned to treatment $i$, where $i \in \{A, B\}$. The ball is then replaced in the urn and the response of the patient is observed. The composition of the urn is now changed according to the following rule. If a success has occurred on treatment $i$, add $\beta$ balls of type $i$ and $\alpha$ balls of type $j$; if a failure has occurred on treatment $i$, add $\alpha$ balls of type $i$ and $\beta$ balls of type $j$, where $\beta \ge \alpha \ge 0$, $i, j \in \{A, B\}$ and $i \ne j$. This rule is denoted by RPW($u$, $\alpha$, $\beta$). It is shown that the RPW($u$, $\alpha$, $\beta$) rule introduces more randomization when $\beta/\alpha$ is small, but tends to put more patients on the better treatment when $\beta/\alpha$ is large. So, different choices for the pair $(\alpha, \beta)$ give different levels of compromise between the two mentioned goals. The main advantage of this design is that it is easy to implement in a real clinical trial.
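The urn scheme just described is short enough to simulate directly. The sketch below is illustrative only: the function name `rpw_trial`, its return values and the use of simulated Bernoulli responses with success probabilities `p_a` and `p_b` are assumptions made here, not part of the original rule's description.

```python
import random


def rpw_trial(n, p_a, p_b, u=1, alpha=0, beta=1, rng=None):
    """Simulate one trial of n patients under an RPW(u, alpha, beta) urn.

    p_a, p_b : success probabilities used to simulate the responses
               (in a real trial the responses are observed, not simulated)
    """
    rng = rng or random.Random()
    balls = {"A": u, "B": u}                 # initial urn composition
    success_prob = {"A": p_a, "B": p_b}
    assigned = {"A": 0, "B": 0}
    successes = {"A": 0, "B": 0}
    for _ in range(n):
        # Draw a ball at random; the ball is returned to the urn.
        urn_size = balls["A"] + balls["B"]
        i = "A" if rng.random() * urn_size < balls["A"] else "B"
        j = "B" if i == "A" else "A"
        assigned[i] += 1
        if rng.random() < success_prob[i]:   # success on treatment i
            successes[i] += 1
            balls[i] += beta
            balls[j] += alpha
        else:                                # failure on treatment i
            balls[i] += alpha
            balls[j] += beta
    return assigned, successes, balls


assigned, successes, balls = rpw_trial(50, p_a=0.7, p_b=0.4, rng=random.Random(0))
```

Each patient adds exactly $\alpha + \beta$ balls to the urn, so after $n$ patients the urn holds $2u + n(\alpha + \beta)$ balls; this gives a quick sanity check on any implementation.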
The new design introduced in Chapter 4 incorporates covariates into a multiple urn model, and can be regarded as a generalization of the RPW($u$, 0, $\beta$) rule. A more complete discussion of the properties of the randomized play-the-winner rule is deferred to Chapter 4.

This section will conclude with three designs whose goal is to assign patients to treatments in order to maximize the expected total response of the patients to the treatments. This problem is referred to, in the literature, as the bandit problem. A thorough discussion of classical bandit models appears in Berry and Fristedt (1985). Here, the focus is on bandit models which incorporate covariates. The first of these models was introduced by Woodroofe (1979) and is described below. All the models that will now be presented were developed under the assumption that each patient enrolled in the study can be treated either by a standard treatment, A, whose statistical characteristics are known, or by a new treatment, B, whose characteristics are unknown.

Woodroofe (1979) and Woodroofe (1982) consider the problem of sequentially allocating patients to treatments, when covariate information is present, in such a way that the expected value of a response variable is maximized. Woodroofe adopts a Bayesian approach by assuming that the distribution of responses for the new treatment depends on an unknown parameter which, in turn, is assumed to have a known prior distribution. The problem can, formally, be described as follows. Let $X_k$ and $Y_k$ denote the potential responses of patient $k$ to treatments A and B, respectively. For each $k \ge 1$ exactly one of $(X_k, Y_k)$ is actually observed.
Suppose that before assigning patient $k$ to one of the treatments, a covariate $V_k$ can be observed on the patient. It is assumed that

(i) $(V_k, X_k, Y_k)$, for $k \ge 1$, are conditionally independent and identically distributed as $(V, X, Y)$ given the value of an unknown parameter $\Theta = \theta$;
(ii) $V$ has a known distribution $F$;
(iii) the conditional distribution $G_X(\cdot \mid v)$ of $X$ given $V = v$ is known;
(iv) the conditional distribution $G_Y(\cdot \mid v)$ of $Y$ given $V = v$ depends on $\Theta = \theta$;
(v) $\Theta$ has a known prior distribution $\pi$.

Here, $X$ and $Y$ are assumed to be real-valued but $V$ and $\Theta$ can be quite general, taking values in Polish spaces. The distributions $F$ and $\pi$ are assumed to yield finite expectations for $X$ and $Y$. An allocation policy is a sequence $\delta = \{\delta_k : k \ge 1\}$, where $\delta_k = 1$ or $0$ if the $k$th patient is allocated to A or B, respectively. Furthermore, for each $k \ge 1$, $\delta_k$ is a measurable function of $\{\delta_j, \delta_j X_j + (1 - \delta_j) Y_j, V_j, V_k : j = 1, \ldots, k - 1\}$, i.e. a measurable function of the previous allocations, responses and covariate values, and of the covariate value of the present patient. Woodroofe (1979) defines, for an infinite population of patients, the expected $\alpha$-worth of a policy $\delta$ when the prior is $\pi$, as
\[ W_\alpha(\delta, \pi) = E_\pi \Big\{ \sum_{k=1}^{\infty} \alpha^{k-1} \big[ \delta_k X_k + (1 - \delta_k) Y_k \big] \Big\}, \]
where $0 < \alpha < 1$ and $E_\pi$ denotes the expectation with respect to the joint distribution of $\Theta$ and $\{(V_k, X_k, Y_k) : k \ge 1\}$. Given $\alpha$ and $\pi$, the goal is to maximize $W_\alpha(\delta, \pi)$ with respect to $\delta$. Under certain regularity conditions, Woodroofe (1979) describes a class of asymptotically optimal allocation policies.

Woodroofe (1982) focuses on finite populations. If $N$ is the number of patients enrolled in the trial, and $\delta = \{\delta_1, \ldots, \delta_N\}$ is an allocation policy as defined above, then the expected response of $\delta$ is defined as
\[ R_N(\delta, \pi) = E_\pi \Big\{ \sum_{i=1}^{N} \big[ \delta_i X_i + (1 - \delta_i) Y_i \big] \Big\}, \]
where $E_\pi$ denotes the expectation with respect to the joint distribution of $\Theta$ and $\{(V_k, X_k, Y_k) : k = 1, \ldots, N\}$. Given $\pi$, the goal is to maximize $R_N(\delta, \pi)$ with respect to $\delta$. Woodroofe (1982) shows that the optimal policy can be determined by an algorithm based on backward induction and investigates the asymptotic properties of the policy in the case of a large population (i.e. when $N \to \infty$).

Clayton (1989) proposes a covariate model for a Bernoulli bandit, i.e. a bandit in which the responses of the patients to the treatments are Bernoulli random variables. In a typical bandit it is assumed that all patients receiving the same treatment have the same marginal probability of success. By introducing a covariate, Clayton extends this notion by assuming that the probability of success depends both on the treatment and on the covariate. The model can formally be described as follows. Let $X_k$ and $Y_k$ denote the potential dichotomous responses (0 for failure and 1 for success) of patient $k$ to treatments B and A, respectively. For each $k \ge 1$ exactly one of $(X_k, Y_k)$ is actually observed. Suppose that before assigning patient $k$ to one of the treatments, a covariate $V_k$ can be observed on the patient. It is assumed that the covariate values are unknown before their observation, but that the random variables $V_1, V_2, \ldots$ are i.i.d. with a known distribution $F$. Suppose that functions $\rho$ and $\lambda$ exist such that $P(X_k = 1 \mid \rho(V_k)) = \rho(V_k)$ and $P(Y_k = 1 \mid \lambda(V_k)) = \lambda(V_k)$. The functions $\rho(\cdot)$ and $\lambda(\cdot)$ are linked by $H$ (a known increasing and invertible function) in such a way that $\rho(v) = H(\alpha + \beta v)$ and $\lambda(v) = H(c + d v)$.
Here, $\alpha$ and $\beta$ are unknown constants and $c$ and $d$ are known constants such that

(i) $\beta \ge 0$ and $d \ge 0$, and so $\rho(v)$ and $\lambda(v)$ are nondecreasing in $v$;
(ii) $\alpha$, $\beta$, $c$, $d$ and $v$ are constrained so that $\rho(v)$ and $\lambda(v)$ lie in $[0, 1]$;
(iii) prior information regarding $\alpha$ and $\beta$ is given by a probability distribution $R$.

Two link functions are studied in the paper: the logit link, $H(x) = e^x/(1 + e^x)$, and the log-linear link, $H(x) = e^x$. The worth of a strategy (i.e. allocation policy) is defined as the expected sum of the first $n$ observations for all possible histories resulting from that strategy. A strategy is called optimal if it yields the maximal expected sum. Rather than investigating the asymptotic performance of strategies as in Woodroofe (1979), Clayton focuses on the determination of the structural characteristics of exactly optimal strategies for the covariate bandit. The relationship between the standard bandit (which corresponds to the case where $F$ is degenerate at a point) and the covariate bandit is also studied.

1.4 Applications

Medical investigators who wish to perform clinical trials currently have a wide variety of adaptive allocation procedures at their disposal. Still, very few clinical trials based on adaptive designs have been reported in the literature. Cornell, Landenberger, and Bartlett (1986) report on an adaptive clinical trial to test the efficacy of extracorporeal membrane oxygenation (ECMO) for the treatment of persistent pulmonary hypertension of newborn infants. The design used (the RPW(1, 0, 1) rule) and the subsequent analysis of the ECMO trial data created great controversy, and much of the criticism of adaptive designs has centered on this trial.
Later, Tamura, Faries, Andersen, and Heiligenstein (1994) describe the rationale, design, and statistical analysis of a double-blind, stratified (two strata), placebo-controlled trial of out-patients suffering from depressive disorder. Patients were allocated to treatment or placebo according to an RPW(1, 0, 1) rule within strata.

The simplicity of implementation of the randomized play-the-winner rule is perhaps its most attractive feature from the point of view of applications. The choice of a design should be driven not only by the simplicity of its implementation but also by its statistical properties and the nature of the clinical trial. We hope that the new designs proposed in this dissertation can be successfully used in adaptive clinical trials.

Chapter 2

The Adaptive Weighted Differences Design

2.1 Introduction

In this chapter, a new adaptive design is defined and studied. It is called The Adaptive Weighted Differences Design, abbreviated AWD, and it offers a compromise between the individualistic and utilitarian goals of a design for clinical trials. Patients are randomly assigned to treatments according to a response-adaptive rule. This new randomized response-adaptive design can be applied in clinical trials for which

• patients arrive sequentially;
• each patient can be assigned to either one of two treatments and will be assigned to exactly one of them;
• the responses of patients to treatments are dichotomous (either a success or a failure);
• the response of each patient is observed before the next patient is assigned to a treatment.

Recall, from Section 1.2, that Wei's adaptive biased coin design is a randomized allocation-adaptive design that attempts to assign patients to treatments in such a way that, at the end of the trial, both treatment groups have received the same number of patients. As mentioned previously, this design is oriented towards statistical inference (utilitarian goal).
The AWD design can be seen as a generalization of Wei's adaptive biased coin design so that ethical issues (individualistic goal) are also taken into consideration. This is achieved by taking into account, at each stage of the trial, both the proportion of patients assigned to each treatment and the proportion of patients successfully treated in each treatment group.

In what follows, the allocation policy for the AWD design is formally described and strong laws of large numbers and a central limit theorem are proved. The design is evaluated by comparing its performance with that of complete randomization and the RPW(1, 0, 1) rule (see Section 1.3).

2.2 The Allocation Policy

Suppose that patients arrive sequentially for treatment and are immediately allocated to one of two treatments, A or B. For each $k \ge 1$, define $\delta_k$ to be 1 or 0 according to whether the $k$th patient is assigned to treatment A or to treatment B. Let $X_k$ and $Y_k$ denote the potential dichotomous responses (0 for failure and 1 for success) of patient $k$ to treatments A and B, respectively. For each $k \ge 1$ exactly one of $(X_k, Y_k)$ is actually observed. Suppose that $\{(X_k, Y_k) : k \ge 1\}$ is a sequence of i.i.d. random vectors.

From the point of view of statistical inference we are interested in estimating the true (unknown) success probabilities, $p_A$ and $p_B$, for treatments A and B, respectively. Here, $p_A = P(X_k = 1)$ and $p_B = P(Y_k = 1)$. The usual point estimators of $p_A$ and $p_B$ are the proportions of patients successfully treated in the trial by treatments A and B. To formally define these estimators we need to introduce some notation. Define $N_{A,k}$ and $N_{B,k}$ to be the number of patients allocated to treatments A and B through stage $k$. Then
\[ N_{A,k} = \sum_{i=1}^{k} \delta_i \]
and
\[ N_{B,k} = \sum_{i=1}^{k} (1 - \delta_i) = k - N_{A,k}. \]
Define $S_{A,k}$ and $S_{B,k}$ to be the number of patients successfully treated by treatments A and B through stage $k$. Then
\[ S_{A,k} = \sum_{i=1}^{k} \delta_i X_i \]
and
\[ S_{B,k} = \sum_{i=1}^{k} (1 - \delta_i) Y_i. \]
The point estimators, at stage $k$, of the success probabilities $p_A$ and $p_B$ are then defined to be
\[ \hat{p}_{A,k} = \frac{S_{A,k}}{N_{A,k}} \]
and
\[ \hat{p}_{B,k} = \frac{S_{B,k}}{N_{B,k}}, \]
respectively.

Let $\{U_k : k \ge 1\}$ denote a sequence of i.i.d. Uniform[0, 1] random variables, independent of the sequence $\{(X_k, Y_k) : k \ge 1\}$. The sequence of $U_k$'s is used to describe the randomization in the allocation policy. Denote by $I(\cdot)$ the indicator function.

Wei's adaptive biased coin design (see Section 1.2) is based on the difference, $D_k$, between the proportions of patients allocated to treatments A and B through stage $k$:
\[ D_k = \frac{N_{A,k}}{k} - \frac{N_{B,k}}{k} = 2\,\frac{N_{A,k}}{k} - 1. \]
$D_k$ gives a measure of imbalance of the experiment at stage $k$. The allocation policy for the adaptive biased coin design consists of allocating patient $k + 1$ to treatment A with probability $h(D_k)$, where $h : [-1, 1] \to [0, 1]$ is a non-increasing function such that $h(x) = 1 - h(-x)$ for all $x \in [-1, 1]$. Wei (1978) suggests using the function $h(\cdot)$ defined by $h(x) = (1 - x)/2$, which yields
\[ \delta_{k+1} = I\Big\{ U_{k+1} \le \frac{1 - D_k}{2} \Big\}. \tag{2.1} \]
So, if $D_k = 0$ then there is perfect balance and the next patient will be allocated to treatment A with probability $\frac{1}{2}$ (which corresponds to complete randomization). Then, as $D_k$ increases from zero (i.e. the more treatment B gets under-represented), the probability of allocating the next patient to treatment A decreases to zero. So, this allocation policy forces an extremely imbalanced experiment to be balanced but tends to complete randomization as the difference $D_k$ tends to zero (i.e. as the experiment approaches perfect balance).

This same reasoning can be applied when the focus is on ethical allocation. In this setting, the difference between the proportions of patients successfully treated by A and B, $\hat{p}_{A,k} - \hat{p}_{B,k}$, is used. This difference will be denoted by $\Delta_k$ and the difference between the success probabilities, $p_A - p_B$, will be denoted by $\Delta$.
If $\Delta_k = 0$ then both treatments are performing equally well and the next patient is allocated to treatment A with probability $\frac{1}{2}$ (which, once again, corresponds to complete randomization). Then, as $\Delta_k$ increases from zero (i.e. the better treatment A is performing in comparison with treatment B), the probability of allocating the next patient to treatment A increases to 1. This leads to an allocation policy of the form
\[ \delta_{k+1} = I\Big\{ U_{k+1} \le \frac{1 + \Delta_k}{2} \Big\}. \tag{2.2} \]
To achieve a compromise between the two previous allocation policies, (2.1) and (2.2), we consider a weighted combination of $\Delta_k$ and $D_k$, namely,
\[ \lambda \Delta_k - (1 - \lambda) D_k \tag{2.3} \]
with $\lambda \in (0, 1)$. Then (2.3) can be used to allocate patients according to the policy
\[ \delta_{k+1} = I\Big\{ U_{k+1} \le \frac{1 + \lambda \Delta_k - (1 - \lambda) D_k}{2} \Big\}. \tag{2.4} \]
Henceforth, the notation AWD($\lambda$) will refer to the allocation policy (2.4) for a constant $\lambda \in (0, 1)$, and $\lambda$ will be referred to as the compromise weight. Note that as $\lambda$ increases from 0 to 1, the allocation policy AWD($\lambda$) places less emphasis on balance and more emphasis on ethical allocation.

2.3 Strong Laws of Large Numbers

In this section, unless otherwise stated, it is supposed that patients are allocated to treatments A or B according to the AWD($\lambda$) allocation policy, where $\lambda$ is a constant in $(0, 1)$. A fundamental result that will be proved here is the almost sure convergence of the proportion of patients allocated to each treatment, as the number of patients treated tends to infinity. While working towards proving this result it will be shown that $\hat{p}_{A,k}$ and $\hat{p}_{B,k}$ are strongly consistent estimators of $p_A$ and $p_B$, the true success probabilities for each treatment. An expression for the asymptotic proportion of patients successfully treated following AWD($\lambda$) allocations is derived and compared with the corresponding expression resulting from allocating patients according to complete randomization.
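The allocation policy (2.4) translates almost line by line into a simulation of a single trial. The sketch below is illustrative and rests on two assumptions that the text does not fix: the imbalance $D_0$ before the first patient is taken to be 0, and the estimated success proportion of a treatment that has not yet received any patient is taken to be 0. The function name `awd_trial` and the simulated Bernoulli responses are likewise introduced here for illustration.

```python
import random


def awd_trial(n, p_a, p_b, lam, rng=None):
    """Simulate one trial of n patients under the AWD(lam) policy (2.4).

    Returns (N_A, N_B, S_A, S_B) at the end of the trial.
    """
    rng = rng or random.Random()
    n_a = n_b = s_a = s_b = 0
    for k in range(n):
        # Assumption: an arm with no patients yet has estimated proportion 0.
        phat_a = s_a / n_a if n_a else 0.0
        phat_b = s_b / n_b if n_b else 0.0
        delta_k = phat_a - phat_b               # ethical difference
        d_k = (n_a - n_b) / k if k else 0.0     # imbalance; D_0 = 0 (assumption)
        prob_a = (1 + lam * delta_k - (1 - lam) * d_k) / 2
        # Strict inequality; it differs from (2.4) only on an event of probability 0.
        if rng.random() < prob_a:
            n_a += 1
            s_a += rng.random() < p_a           # simulated response to A
        else:
            n_b += 1
            s_b += rng.random() < p_b           # simulated response to B
    return n_a, n_b, s_a, s_b


n_a, n_b, s_a, s_b = awd_trial(200, p_a=0.8, p_b=0.4, lam=0.5, rng=random.Random(1))
```

Since $\lambda \Delta_k \in [-\lambda, \lambda]$ and $(1 - \lambda) D_k \in [-(1 - \lambda), 1 - \lambda]$, the quantity `prob_a` always lies in $[0, 1]$. Setting `lam=0` in this sketch (outside the range $(0, 1)$ considered in the text) recovers rule (2.1), which is a convenient check.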
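For $k \ge 1$, let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the first $k$ allocations, potential responses and auxiliary randomization, i.e.
\[ \mathcal{F}_k = \sigma\{\delta_i, X_i, Y_i, U_i : 1 \le i \le k\}, \]
and let $\mathcal{F}_0$ denote the trivial $\sigma$-algebra. It is also useful, in the proofs that follow, to consider the $\sigma$-algebra
\[ \mathcal{G}_k = \mathcal{F}_k \vee \sigma\{U_{k+1}\}. \]
Note that $\delta_{k+1}$ is $\mathcal{G}_k$-measurable and the random vector $(X_{k+1}, Y_{k+1})$ is independent of $\mathcal{G}_k$. Although the results of this section are only proved for treatment A, similar results hold for treatment B.

Proposition 2.3.1
\[ \lim_{k \to \infty} N_{A,k} = \infty \quad \text{a.s.} \]

Proof. Since $\{N_{A,k} : k \ge 1\}$ is a nondecreasing sequence of integer-valued random variables, $\lim_{k \to \infty} N_{A,k}$ exists and
\[ \Big\{ \lim_{k \to \infty} N_{A,k} < \infty \Big\} = \bigcup_{k=1}^{\infty} \{ N_{A,j} = N_{A,k}, \ \forall j > k \}. \tag{2.5} \]
Fix $k \ge 1$. Note that
\[ N_{A,k+1} = N_{A,k} + \delta_{k+1} = N_{A,k} + I\Big\{ U_{k+1} \le \frac{1 + \lambda \Delta_k - (1 - \lambda)\big(2 \frac{N_{A,k}}{k} - 1\big)}{2} \Big\}. \]
Therefore,
\[ \{ N_{A,j} = N_{A,k}, \ \forall j > k \} \subseteq \Big\{ U_{j+1} > \frac{1 + \lambda \Delta_j - (1 - \lambda)\big(2 \frac{N_{A,j}}{j} - 1\big)}{2}, \ \forall j > k \Big\} \]
\[ \subseteq \Big\{ U_{j+1} > \frac{1 + \lambda(-1) - (1 - \lambda)\big(2 \cdot \frac{1}{2} - 1\big)}{2}, \ \forall j \ge 2k \Big\} = \Big\{ U_{j+1} > \frac{1 - \lambda}{2}, \ \forall j \ge 2k \Big\}, \tag{2.6} \]
since $\Delta_j \ge -1$ and $N_{A,k}/j \le 1/2$ for $j \ge 2k$. But (2.6) has probability zero for all $\lambda$ in $(0, 1)$ and so, by (2.5),
\[ P\Big\{ \lim_{k \to \infty} N_{A,k} < \infty \Big\} \le \sum_{k=1}^{\infty} P\{ N_{A,j} = N_{A,k}, \ \forall j > k \} \le \sum_{k=1}^{\infty} P\Big\{ U_{j+1} > \frac{1 - \lambda}{2}, \ \forall j \ge 2k \Big\} = 0. \]
The result follows. ∎

The following lemma is a technical result useful in proving the strong consistency of $\hat{p}_{A,k}$ and $\hat{p}_{B,k}$ as estimators of the success probabilities $p_A$ and $p_B$.

Lemma 2.3.1 Let $\{Z_k : k \ge 1\}$ be a sequence of random variables such that $Z_k = 0$ or $1$, for each $k \ge 1$. Then
\[ \sum_{k=1}^{\infty} \frac{Z_k}{(Z_1 + \cdots + Z_k)^2} < \infty. \]

Proof. For each $\omega$ and each $k \ge 1$ let $Z_k(\omega) = z_k$. Fix $n \ge 1$ and $\omega$. Let $n_0 = \sum_{k=1}^{n} I\{z_k = 1\}$. Then
\[ \sum_{k=1}^{n} \frac{Z_k(\omega)}{(Z_1(\omega) + \cdots + Z_k(\omega))^2} = \sum_{k=1}^{n} \frac{z_k}{(z_1 + \cdots + z_k)^2} = \sum_{k=1}^{n_0} \frac{1}{k^2} \le \sum_{k=1}^{n} \frac{1}{k^2}. \]
Since $\sum_{k=1}^{\infty} 1/k^2 < \infty$, the result follows. ∎

The proof of Theorem 2.3.1 below uses a result from Hall and Heyde (1980) which is included in Appendix A of this dissertation.

Theorem 2.3.1 $\hat{p}_{A,k}$ and $\hat{p}_{B,k}$ are strongly consistent estimators of the success probabilities $p_A$ and $p_B$, i.e.,
\[ \lim_{k \to \infty} \hat{p}_{A,k} = p_A \quad \text{a.s.} \]
and
\[ \lim_{k \to \infty} \hat{p}_{B,k} = p_B \quad \text{a.s.} \]

Proof. Fix $k \ge 1$. We can write
\[ \hat{p}_{A,k} - p_A = \frac{1}{N_{A,k}} \sum_{j=1}^{k} \delta_j (X_j - p_A). \]
Now, let
\[ M_k = \sum_{j=1}^{k} \delta_j (X_j - p_A). \]
Since $\delta_k$ is $\mathcal{G}_{k-1}$-measurable and $X_k$ is independent of $\mathcal{G}_{k-1}$,
\[ E[\delta_k (X_k - p_A) \mid \mathcal{G}_{k-1}] = \delta_k E[X_k - p_A] = \delta_k (p_A - p_A) = 0. \]
Hence, $\{M_k, \mathcal{G}_k : k \ge 1\}$ is a martingale. Furthermore, $\{N_{A,k} : k \ge 1\}$ is a nondecreasing sequence of non-negative random variables such that $N_{A,k}$ is $\mathcal{G}_{k-1}$-measurable for each $k \ge 1$. Finally, Lemma 2.3.1 and the definition of $N_{A,k}$ imply that
\[ \sum_{k=1}^{\infty} \frac{1}{N_{A,k}^2} E[\delta_k (X_k - p_A)^2 \mid \mathcal{G}_{k-1}] < \infty \quad \text{a.s.} \]
The result follows by Proposition 2.3.1 and a direct application of Theorem A.0.1. ∎

It can now be proved that the proportion of patients allocated to each treatment converges almost surely to a constant, which depends on $\lambda$, the compromise weight, and $\Delta$, the difference between the success probabilities for treatments A and B. Before formally stating and proving this result, we give some heuristics to show what the limiting constant should be. So, suppose that
\[ \lim_{k \to \infty} \frac{N_{A,k}}{k} = \eta \quad \text{a.s.} \tag{2.7} \]
where $\eta$ can be a random variable. Then,
\[ D_k = \frac{N_{A,k}}{k} - \frac{N_{B,k}}{k} = 2\,\frac{N_{A,k}}{k} - 1 \xrightarrow[k \to \infty]{} 2\eta - 1 \quad \text{a.s.} \tag{2.8} \]
We expect that
\[ \lim_{k \to \infty} \Big[ \frac{N_{A,k}}{k} - P(\delta_{k+1} = 1 \mid \mathcal{F}_k) \Big] = 0 \quad \text{a.s.} \tag{2.9} \]
Now, Theorem 2.3.1 and (2.8) imply that
\[ P(\delta_{k+1} = 1 \mid \mathcal{F}_k) = \frac{1 + \lambda \Delta_k - (1 - \lambda) D_k}{2} \xrightarrow[k \to \infty]{} \frac{1 + \lambda \Delta - (1 - \lambda)(2\eta - 1)}{2} \quad \text{a.s.} \]
So, if (2.9) holds then
\[ \eta = \frac{1 + \lambda \Delta - (1 - \lambda)(2\eta - 1)}{2} \quad \text{a.s.} \]
which, solving for $\eta$, yields
\[ \eta = \frac{1}{2} \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta \Big). \tag{2.10} \]
Therefore, if $N_{A,k}/k$ converges, we expect that it converges almost surely to (2.10).

Theorem 2.3.2
\[ \lim_{k \to \infty} \frac{N_{A,k}}{k} = \frac{1}{2} \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta \Big) \quad \text{a.s.} \tag{2.11} \]
and
\[ \lim_{k \to \infty} \frac{N_{B,k}}{k} = \frac{1}{2} \Big( 1 - \frac{\lambda}{2 - \lambda} \Delta \Big) \quad \text{a.s.} \tag{2.12} \]

Proof. Define a function $q$ on $(0, 1) \times (0, 1)$ by setting $q(s, t) = (2 - \lambda) t - (1 - \lambda) s$. Note that $q$ satisfies the regularity conditions of Section 2 in Eisele (1990), namely

(i) $q$ is continuous;
(ii) $q(s, s) = s$;
(iii) $q(s, t)$ is strictly decreasing in $s$ and strictly increasing in $t$.

Now, the allocation rule for the AWD($\lambda$) design can be written as
\[ \delta_{k+1} = I\Big\{ U_{k+1} \le q\Big( \frac{N_{A,k}}{k}, \ \frac{1}{2} \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta_k \Big) \Big) \Big\}, \]
where, by Theorem 2.3.1,
\[ \lim_{k \to \infty} \frac{1}{2} \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta_k \Big) = \frac{1}{2} \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta \Big) \quad \text{a.s.} \]
Hence, relation (2.11) can be proved by following the same arguments used in the proof of part (iii) of Lemma 1 in Eisele (1990). Relation (2.12) follows from (2.11) and the fact that $N_{B,k}/k = 1 - N_{A,k}/k$. ∎

Note that a direct consequence of Theorem 2.3.2 is that the relation (2.9) (that we used to heuristically deduce the limiting proportion of patients allocated to A) holds.

From an ethical point of view (individualistic goal) we are interested in the proportion of patients successfully treated following AWD($\lambda$) allocations. Below it is proved that the asymptotic proportion of patients successfully treated as a result of AWD($\lambda$) allocations is almost surely at least as large as that for complete randomization (and strictly greater when $p_A \ne p_B$); as expected, the difference between the ethical performances of AWD($\lambda$) and complete randomization increases as $\lambda$ increases or as the difference between the treatments increases.

For each $k \ge 1$, let $S_k$ denote the number of patients successfully treated through stage $k$, i.e. $S_k = S_{A,k} + S_{B,k}$.

Proposition 2.3.2 For complete randomization,
\[ \lim_{k \to \infty} \frac{S_k}{k} = \frac{1}{2} (p_A + p_B) \quad \text{a.s.} \tag{2.13} \]
and for AWD($\lambda$) allocations,
\[ \lim_{k \to \infty} \frac{S_k}{k} = \frac{1}{2} \Big[ (p_A + p_B) + \frac{\lambda}{2 - \lambda} (p_A - p_B)^2 \Big] \quad \text{a.s.} \tag{2.14} \]

Proof. To prove (2.14) note that
\[ \frac{S_k}{k} = \hat{p}_{A,k} \frac{N_{A,k}}{k} + \hat{p}_{B,k} \frac{N_{B,k}}{k} \]
and use Theorem 2.3.1 (to get the a.s. limits of $\hat{p}_{A,k}$ and $\hat{p}_{B,k}$), Theorem 2.3.2 (to get the a.s. limits of $N_{A,k}/k$ and $N_{B,k}/k$) and algebra. Similar arguments can be used to prove that (2.13) holds. ∎

2.4 Central Limit Theorem

In this section it is shown that the strongly consistent estimators of the success probabilities $p_A$ and $p_B$ are asymptotically independent and normally distributed. The proof of the theorem below uses results from Hall and Heyde (1980) which are included in Appendix A of this dissertation.

Theorem 2.4.1 As $k \to \infty$,
\[ \begin{pmatrix} \sqrt{N_{A,k}}\,(\hat{p}_{A,k} - p_A) \\ \sqrt{N_{B,k}}\,(\hat{p}_{B,k} - p_B) \end{pmatrix} \xrightarrow{D} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} p_A q_A & 0 \\ 0 & p_B q_B \end{pmatrix} \right), \]
where $q_A = 1 - p_A$ and $q_B = 1 - p_B$.

Proof.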
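Fix real constants $a$ and $b$, define for each $k \ge 1$ and $i = 1, \ldots, k$,
\[ M_{k,i} = \frac{1}{\sqrt{k}} \sum_{j=1}^{i} \big[ a (X_j - p_A) \delta_j + b (Y_j - p_B)(1 - \delta_j) \big], \]
and let $\mathcal{G}_{k,i} = \mathcal{G}_i$. For each $j = 1, \ldots, i$ let
\[ Z_{k,j} = \frac{1}{\sqrt{k}} \big[ a (X_j - p_A) \delta_j + b (Y_j - p_B)(1 - \delta_j) \big]. \]
The proof of the theorem uses the following three lemmas.

Lemma 2.4.1 $\{M_{k,i}, \mathcal{G}_{k,i} : k \ge 1, 1 \le i \le k\}$ is a zero-mean and square-integrable martingale array with differences $\{Z_{k,i} : 1 \le i \le k, k \ge 1\}$.

Proof. For each $k \ge 1$ and each $j = 1, \ldots, k$, $Z_{k,j}$ is $\mathcal{G}_{k,j}$-measurable and integrable, and $E[Z_{k,j} \mid \mathcal{G}_{k,j-1}] = 0$ (which can be shown as in the proof of Theorem 2.3.1). Hence, $\{Z_{k,j}, \mathcal{G}_{k,j} : 1 \le j \le k, k \ge 1\}$ is a martingale difference array. Therefore, $\{M_{k,i}, \mathcal{G}_{k,i} : k \ge 1, 1 \le i \le k\}$ is a zero-mean martingale array with differences $\{Z_{k,i} : 1 \le i \le k, k \ge 1\}$ and, for each $k \ge 1$ and each $i = 1, \ldots, k$,
\[ E(M_{k,i}^2) = \sum_{j=1}^{i} E(Z_{k,j}^2) \le \frac{i}{k} \max(a^2, b^2) < \infty. \]
The result follows. ∎

Lemma 2.4.2 As $k \to \infty$,
\[ \sum_{i=1}^{k} Z_{k,i} \xrightarrow{D} N\Big( 0, \ \frac{a^2}{2} p_A q_A \Big( 1 + \frac{\lambda}{2 - \lambda} \Delta \Big) + \frac{b^2}{2} p_B q_B \Big( 1 - \frac{\lambda}{2 - \lambda} \Delta \Big) \Big), \]
where $\Delta = p_A - p_B$.

Proof. To prove the lemma we use Theorems A.0.2 and A.0.3 in Appendix A. Note that condition (A.4) is trivially satisfied by the $\sigma$-algebras $\{\mathcal{G}_{k,i} : 1 \le i \le k, k \ge 1\}$. Furthermore,
\[ \max_{1 \le i \le k} |Z_{k,i}| \le \frac{1}{\sqrt{k}} (|a| + |b|) \to 0, \quad \text{as } k \to \infty. \]
Moreover, by Theorems 2.3.1 and 2.3.2, $\sum_{i=1}^{k} Z_{k,i}^2$ converges almost surely to the limiting variance above.

Lemma 2.4.3 As $k \to \infty$,
\[ \frac{1}{\sqrt{k}} \begin{pmatrix} S_{A,k} - p_A N_{A,k} \\ S_{B,k} - p_B N_{B,k} \end{pmatrix} \xrightarrow{D} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \frac{1}{2} p_A q_A \big( 1 + \frac{\lambda}{2 - \lambda} \Delta \big) & 0 \\ 0 & \frac{1}{2} p_B q_B \big( 1 - \frac{\lambda}{2 - \lambda} \Delta \big) \end{pmatrix} \right). \]

Proof. Since
\[ \sum_{i=1}^{k} Z_{k,i} = \frac{a}{\sqrt{k}} (S_{A,k} - p_A N_{A,k}) + \frac{b}{\sqrt{k}} (S_{B,k} - p_B N_{B,k}), \]
the result follows from Lemma 2.4.2 and the Cramér-Wold technique. ∎

It is now easy to complete the proof of Theorem 2.4.1. Note that
\[ \sqrt{N_{A,k}}\,(\hat{p}_{A,k} - p_A) = \sqrt{\frac{k}{N_{A,k}}} \, \frac{1}{\sqrt{k}} (S_{A,k} - p_A N_{A,k}) \]
and
\[ \sqrt{N_{B,k}}\,(\hat{p}_{B,k} - p_B) = \sqrt{\frac{k}{N_{B,k}}} \, \frac{1}{\sqrt{k}} (S_{B,k} - p_B N_{B,k}), \]
and use Theorem 2.3.2, Lemma 2.4.3, Slutsky's Theorem and algebra. ∎

2.5 Evaluation of the Design

Recall that the adaptive weighted differences design seeks a compromise between the individualistic and utilitarian goals. So, we evaluate the AWD design at two levels:

• how ethical are the assignments of patients to treatments?
• how good is the estimator of the treatments difference $(p_A - p_B)$?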
Monte Carlo simulations are used to address these questions. For the first one we look at the proportion of patients successfully treated as a result of AWD($\lambda$) allocations, for different values of $\lambda$ in (0, 1); for the second question we look at the empirical mean squared error.

Graphical comparisons, in terms of proportion of patients successfully treated and mean squared error, are made between the AWD design (with compromise weights equal to 0.2, 0.5 and 0.8), complete randomization (which focuses on utilitarian issues) and the RPW(1, 0, 1) rule (which puts more emphasis on individualistic aspects).

Figures 2.1 through 2.10 show the results of 100,000 replications of clinical trials with sample sizes $n$ = 20 and 100, success probabilities for treatment A, $p_A$ = 0.05, 0.35, 0.50, 0.65 and 0.95 and a range of values of the success probability for treatment B, $p_B$ = 0.05 through 0.95. For each allocation policy and each value of $n$, of $p_A$ and of $p_B$, the proportion of successes is computed as the average, over 100,000 replications, of the proportion of patients successfully treated in the simulated trial; the empirical mean squared error is computed as the average, over 100,000 replications, of the squared difference between the estimates of $(p_A - p_B)$ and the parameter.

The following labels are used in Figures 2.1 through 2.10.
• r = complete randomization;
• s = AWD(0.2);
• m = AWD(0.5);
• l = AWD(0.8);
• w = RPW(1, 0, 1) rule.

Simulations confirm some natural expectations on the performance of the five allocation rules being compared (see Figures 2.1 through 2.10). The RPW(1, 0, 1) rule and AWD(0.8) yield the highest proportion of successes followed by AWD(0.5), AWD(0.2) and complete randomization, in this order. Also, the differences between the proportion of patients successfully treated following the five allocation rules increase as the treatments difference, $|p_A - p_B|$, increases and as the trial size, $n$, increases.
Finally, as the trial size increases the mean squared error decreases for each one of the allocation rules.

Figures 2.1 and 2.6 show that the two goals of a design for a clinical trial (individualistic and utilitarian) are not always conflicting. If, for example, treatment A has a very small success probability and the success probability for treatment B is not large, then ethical rules like RPW(1, 0, 1) and AWD with large $\lambda$ yield not only the highest proportion of patients successfully treated but also the lowest mean squared errors for estimating the treatments difference. Note also that, if the success probability for treatment A is very small but the success probability for treatment B is large, then the RPW(1, 0, 1) rule yields a high proportion of successes but also very high mean squared errors (when compared to the other rules); the AWD allocation policy with large $\lambda$ performs nearly as well as the RPW(1, 0, 1) rule in terms of proportion of successes but much better in terms of estimating the treatments difference.

Figures 2.5 and 2.10 illustrate the case when the individualistic and utilitarian goals are, in fact, conflicting. If, for example, treatment A has a very large success probability and treatment B has a small or moderate success probability, then using an ethical rule (versus using a rule that focuses on statistical inference) gives a much higher proportion of patients successfully treated in the trial but performs rather poorly when estimating the treatments difference.

Figures 2.1 through 2.10 illustrate the excellent combined performance (in terms of both the proportion of patients successfully treated and the mean squared error for estimating the treatments difference) of the AWD allocation policy for suitable choices of $\lambda$.
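The simulation procedure described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce Figures 2.1 through 2.10. It assumes that the AWD($\lambda$) rule assigns the next patient to A with probability $[1 + \lambda \hat\Delta_k - (1-\lambda) D_k]/2$ (the form obtained from (3.4) by dropping the stratum argument). The function names, the convention of estimating a success probability by 1/2 for an arm that has no patients yet, and the default number of replications are all assumptions made here for illustration.

```python
import random


def awd_prob(lam, s_a, n_a, s_b, n_b):
    """Probability that the next patient is assigned to A under AWD(lam).

    Assumptions (not from the text): the first patient is assigned with
    probability 1/2, and an arm with no patients yet has its success
    probability estimated by 1/2.
    """
    k = n_a + n_b
    if k == 0:
        return 0.5
    p_hat_a = s_a / n_a if n_a else 0.5
    p_hat_b = s_b / n_b if n_b else 0.5
    delta_hat = p_hat_a - p_hat_b   # estimated treatments difference
    d_k = (n_a - n_b) / k           # difference in allocation proportions
    return (1 + lam * delta_hat - (1 - lam) * d_k) / 2


def simulate_trial(n, p_a, p_b, lam, rng):
    """Simulate one trial of size n; lam=None means complete randomization.

    Returns (proportion of successes, estimate of p_a - p_b).
    """
    s_a = n_a = s_b = n_b = 0
    for _ in range(n):
        prob_a = 0.5 if lam is None else awd_prob(lam, s_a, n_a, s_b, n_b)
        if rng.random() <= prob_a:
            n_a += 1
            s_a += rng.random() < p_a   # success on A with probability p_a
        else:
            n_b += 1
            s_b += rng.random() < p_b   # success on B with probability p_b
    est_a = s_a / n_a if n_a else 0.5
    est_b = s_b / n_b if n_b else 0.5
    return (s_a + s_b) / n, est_a - est_b


def monte_carlo(n, p_a, p_b, lam, reps=2000, seed=0):
    """Average proportion of successes and empirical mean squared error."""
    rng = random.Random(seed)
    total_prop = total_sq = 0.0
    for _ in range(reps):
        prop, est = simulate_trial(n, p_a, p_b, lam, rng)
        total_prop += prop
        total_sq += (est - (p_a - p_b)) ** 2
    return total_prop / reps, total_sq / reps


if __name__ == "__main__":
    # Labels follow the figures: r, s, m, l (the RPW rule is not sketched).
    for label, lam in [("r", None), ("s", 0.2), ("m", 0.5), ("l", 0.8)]:
        prop, mse = monte_carlo(20, 0.65, 0.35, lam)
        print(label, round(prop, 3), round(mse, 4))
```

Passing `lam=None` gives complete randomization rather than AWD(0), since under the assumed form of the rule $\lambda = 0$ still corrects allocation imbalance through $D_k$.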
If, as in many real clinical trials, there is some previous information on the success probability of one of the treatments, the simulations give rough guidelines as to good choices of the compromise weight for each situation. So, suppose there is some information on the success probability of treatment A. We suggest choosing compromise weights as follows.
• If $p_A$ is very small then use a large $\lambda$ for both small (Figure 2.1) and large (Figure 2.6) trials;
• if $p_A$ is moderately small then use a moderate $\lambda$ for small trials (Figure 2.2) and a moderately large $\lambda$ for large trials (Figure 2.7);
• if $p_A$ is moderate then use a moderate $\lambda$ for small trials (Figure 2.3) and a large $\lambda$ for large trials (Figure 2.8);
• if $p_A$ is moderately large then use a moderately large $\lambda$ for small trials (Figure 2.4) and a large $\lambda$ for large trials (Figure 2.9);
• if $p_A$ is very large then use a moderately large $\lambda$ for small trials (Figure 2.5) and a large $\lambda$ for large trials (Figure 2.10).

[Figures 2.1 through 2.10 each have two panels, Proportion of Successes and Mean Squared Error, plotted against $p_B$.]

Figure 2.1: Comparisons in terms of proportion of successes and mean squared error for $n$ = 20 and $p_A$ = 0.05.

Figure 2.2: Comparisons in terms of proportion of successes and mean squared error for $n$ = 20 and $p_A$ = 0.35.

Figure 2.3: Comparisons in terms of proportion of successes and mean squared error for $n$ = 20 and $p_A$ = 0.50.

Figure 2.4: Comparisons in terms of proportion of successes and mean squared error for $n$ = 20 and $p_A$ = 0.65.

Figure 2.5: Comparisons in terms of proportion of successes and mean squared error for $n$ = 20 and $p_A$ = 0.95.

Figure 2.6: Comparisons in terms of proportion of successes and mean squared error for $n$ = 100 and $p_A$ = 0.05.

Figure 2.7: Comparisons in terms of proportion of successes and mean squared error for $n$ = 100 and $p_A$ = 0.35.

Figure 2.8: Comparisons in terms of proportion of successes and mean squared error for $n$ = 100 and $p_A$ = 0.50.

Figure 2.9: Comparisons in terms of proportion of successes and mean squared error for $n$ = 100 and $p_A$ = 0.65.

Figure 2.10: Comparisons in terms of proportion of successes and mean squared error for $n$ = 100 and $p_A$ = 0.95.

Chapter 3

The Covariate Adaptive Weighted Differences Design

3.1 Introduction

As seen in Chapter 1, a widely used approach for incorporating covariate information into an adaptive design consists of first forming strata by considering all possible combinations of levels of relevant covariates and, then, using an adaptive design to independently allocate patients within each stratum. Clearly, this procedure can also be used to incorporate covariates into the AWD allocation policy. In this chapter, we introduce a new approach that generalizes the one described above by allowing the allocation of patients in one stratum to depend on the allocations and responses of patients previously treated in the same and other strata. This new adaptive design is called The Covariate Adaptive Weighted Differences Design, abbreviated CAWD.

Similarly to the AWD design, this new design can be applied in clinical trials for which
• patients arrive sequentially;
• each patient can be assigned to either one of two treatments and will be assigned to exactly one of them;
• the responses of patients to treatments are dichotomous (either a success or a failure);
• the response of each patient is observed before the next patient is assigned to a treatment.

Since the CAWD procedure makes use of covariate information, further assumptions are needed. It will also be assumed that
• relevant covariates on a patient (concomitant information such as age, sex, general physical status, severity of the disease, etc.) are available before assigning him/her to a treatment;
• all the strata that can be formed by considering common combinations of covariate levels are known before the trial begins and there will be at least one patient treated in each one of those strata.
With the CAWD design, patients are randomly assigned to treatments according to a response-adaptive covariate design. A compromise is again sought between the individualistic and utilitarian goals of a design for clinical trials.

In what follows, the allocation policy for the CAWD design is formally described and strong laws of large numbers and a central limit theorem are proved. The design is evaluated by comparing its performance with that of the AWD design within strata, complete randomization within strata and the RPW(1, 0, 1) rule within strata (see Section 1.3).

3.2 The Allocation Policy

3.2.1 General Notation and Assumptions

The following notation and assumptions will be used here and in Chapter 4.

Suppose that patients arrive sequentially for treatment. Upon arrival each patient is examined to determine his/her covariate levels. Let $v$ denote the stratum to which the patient belongs, where $v \in \{1, 2, \ldots, r\}$. The patient is then allocated to one of two treatments, A or B. For each $k \ge 1$, define $\delta_k$ to be 1 or 0 according to whether the $k$th patient is assigned to treatment A or to treatment B. Let $X_k$ and $Y_k$ denote the potential dichotomous responses (0 for failure and 1 for success) of patient $k$ to treatments A and B, respectively. For each $k \ge 1$ exactly one of $(X_k, Y_k)$ is actually observed. Suppose that $\{(X_k, Y_k) : k \ge 1\}$ is a sequence of i.i.d. random vectors.

For each $k \ge 1$, let $V_k$ denote the stratum to which patient $k$ belongs. Denote by $\mathcal{V}$ the set of all possible strata $\{1, 2, \ldots, r\}$. Suppose that $\{V_k : k \ge 1\}$ is a sequence of i.i.d. random variables such that
$$P(V_1 = v) = c(v),\quad \forall v \in \mathcal{V},$$
where, for each $v \in \mathcal{V}$, $c(v)$ is an unknown constant in (0, 1) and $\sum_{v \in \mathcal{V}} c(v) = 1$.
From the point of view of statistical inference we are interested in estimating the true (unknown) success probabilities, $p_A(v)$ and $p_B(v)$, for treatments A and B, within each stratum $v \in \mathcal{V}$. Here, for each $v \in \mathcal{V}$, $p_A(v) = P(X_k = 1 \mid V_k = v)$ and $p_B(v) = P(Y_k = 1 \mid V_k = v)$.

Throughout the remainder of this subsection, $k \ge 1$ and $v \in \mathcal{V}$ are fixed.

Usual point estimators of $p_A(v)$ and $p_B(v)$ are the proportions of patients successfully treated in the trial by treatments A and B within stratum $v$. To formally define these estimators we need to introduce some notation. Let $N_k(v)$ denote the number of patients treated in stratum $v$ through stage $k$. Then
$$N_k(v) = \sum_{i=1}^{k} I\{V_i = v\}.$$
Define $N_{A,k}(v)$ and $N_{B,k}(v)$ to be the number of patients in stratum $v$ which are allocated, through stage $k$, to treatments A and B, respectively. Then,
$$N_{A,k}(v) = \sum_{i=1}^{k} I\{V_i = v\}\,\delta_i \quad\text{and}\quad N_{B,k}(v) = \sum_{i=1}^{k} I\{V_i = v\}\,(1-\delta_i) = N_k(v) - N_{A,k}(v).$$
Define $S_{A,k}(v)$ and $S_{B,k}(v)$ to be the number of patients which belong to stratum $v$ and are successfully treated by treatments A and B through stage $k$. Then
$$S_{A,k}(v) = \sum_{i=1}^{k} I\{V_i = v\}\,\delta_i\,X_i \quad\text{and}\quad S_{B,k}(v) = \sum_{i=1}^{k} I\{V_i = v\}\,(1-\delta_i)\,Y_i.$$
The point estimators, at stage $k$, of the success probabilities within stratum $v$, $p_A(v)$ and $p_B(v)$, are then defined to be
$$\hat p_{A,k}(v) = \frac{S_{A,k}(v)}{N_{A,k}(v)} \quad\text{and}\quad \hat p_{B,k}(v) = \frac{S_{B,k}(v)}{N_{B,k}(v)},$$
respectively.

Let $\{U_k : k \ge 1\}$ denote a sequence of i.i.d. Uniform[0, 1] random variables independent of the sequence $\{(V_k, X_k, Y_k) : k \ge 1\}$. The sequence of $U_k$'s is used to describe the randomization in the allocation policy.

3.2.2 The CAWD Allocation Policy

In what follows, the general allocation policy for the CAWD design is of the form
$$\delta_{k+1} = \sum_{v \in \mathcal{V}} \delta_{k+1}(v)\, I\{V_{k+1} = v\} \qquad (3.1)$$
where, for each $v \in \mathcal{V}$, $\delta_{k+1}(v)$ is to be specified.

The allocation policy will first be described in the particular case corresponding to allocating patients using an AWD allocation policy (2.4) within strata. Suppose that patient $k + 1$ is in stratum $v_0$.
Then, in the expression for the AWD allocation policy, replace $\hat\Delta_k$ (the difference between the estimated success probabilities for A and B) by the difference between the estimated success probabilities for A and B within stratum $v_0$,
$$\hat\Delta_k(v_0) = \hat p_{A,k}(v_0) - \hat p_{B,k}(v_0) \qquad (3.2)$$
and replace $D_k$ (the difference between the proportion of patients allocated to A and B) by the difference between the proportion of patients allocated to A and B within stratum $v_0$,
$$D_k(v_0) = \frac{N_{A,k}(v_0)}{N_k(v_0)} - \frac{N_{B,k}(v_0)}{N_k(v_0)}. \qquad (3.3)$$
This yields
$$\delta_{k+1}(v_0) = I\left\{U_{k+1} \le \frac{1 + \lambda\,\hat\Delta_k(v_0) - (1-\lambda)\,D_k(v_0)}{2}\right\}. \qquad (3.4)$$

The idea behind the CAWD design is that, when allocating a patient with covariate value $v_0$, it may be possible to increase the overall proportion of patients successfully treated in the trial by using information on the responses of patients treated in the previous stages in strata $v \neq v_0$. So, instead of simply using $\hat\Delta_k(v_0)$ in the allocation policy, we use a weighted average of the $\hat\Delta_k(v)$'s, namely
$$\tilde\Delta_k(v_0, M) = \sum_{v \in \mathcal{V}} m(v_0, v)\,\hat\Delta_k(v) \qquad (3.5)$$
where the $m(v_0, v)$'s are non-negative real numbers such that $\sum_{v \in \mathcal{V}} m(v_0, v) = 1$. This yields
$$\delta_{k+1}(v_0) = I\left\{U_{k+1} \le \frac{1 + \lambda\,\tilde\Delta_k(v_0, M) - (1-\lambda)\,D_k(v_0)}{2}\right\}. \qquad (3.6)$$
Suitable choices for the constants $m(v_0, v)$ will be discussed in Sections 3.3 and 3.5.

The $m(v_0, v)$'s can be interpreted as the weights to be placed on responses of patients previously treated in strata $v \in \mathcal{V}$, when allocating a patient in stratum $v_0$; these constants will be referred to as the crossover weights (from strata $v$ to stratum $v_0$). Denote by $M$ the matrix of crossover weights,
$$M = \begin{pmatrix} m(1,1) & m(1,2) & \cdots & m(1,r)\\ m(2,1) & m(2,2) & \cdots & m(2,r)\\ \vdots & \vdots & & \vdots\\ m(r,1) & m(r,2) & \cdots & m(r,r) \end{pmatrix} \qquad (3.7)$$
with non-negative elements and such that the sum of the elements in each row equals 1.

Note that (3.4) (which is the AWD($\lambda$) allocation policy within strata) is the particular case of (3.6) corresponding to choosing a diagonal matrix of crossover weights with diagonal elements equal to 1. We will denote such a matrix by $M_s$.
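The allocation probability in (3.6) can be computed directly from the per-stratum counts. The following Python function is a minimal sketch under stated assumptions: strata are indexed from 0 rather than 1, the function and argument names are hypothetical, and the conventions used when a count is zero (an estimated success probability of 1/2 for an empty arm, and $D_k(v_0) = 0$ for an empty stratum) are choices made here for illustration, not part of the design as described.

```python
def cawd_prob(lam, M, v0, s_a, n_a, s_b, n_b):
    """Probability that a patient in stratum v0 is assigned to A, per (3.5)-(3.6).

    lam              : compromise weight in (0, 1)
    M                : crossover weights, M[v0][v], each row summing to 1
    s_a, n_a         : per-stratum successes / patients on treatment A
    s_b, n_b         : per-stratum successes / patients on treatment B
    Strata are indexed 0, ..., r-1 (an assumption of this sketch).
    """
    def delta_hat(v):
        # Assumption: an arm with no patients gets estimate 1/2.
        p_a = s_a[v] / n_a[v] if n_a[v] else 0.5
        p_b = s_b[v] / n_b[v] if n_b[v] else 0.5
        return p_a - p_b

    # Weighted average of the estimated differences, equation (3.5).
    delta_tilde = sum(M[v0][v] * delta_hat(v) for v in range(len(M)))

    # Allocation imbalance within stratum v0, equation (3.3).
    n_v0 = n_a[v0] + n_b[v0]
    d = (n_a[v0] - n_b[v0]) / n_v0 if n_v0 else 0.0

    return (1 + lam * delta_tilde - (1 - lam) * d) / 2


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]       # M_s: AWD(lam) within strata
    s_a, n_a = [3, 1], [5, 4]
    s_b, n_b = [1, 3], [5, 4]
    print(cawd_prob(0.8, identity, 0, s_a, n_a, s_b, n_b))
```

With the identity matrix the function reduces to the within-stratum rule (3.4); any other row-stochastic `M` lets responses from other strata shift the allocation probability in stratum `v0`.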
Henceforth, the notation CAWD($\lambda$, $M$) will refer to the allocation policy given by (3.1), (3.6) and (3.5), for a constant $\lambda \in (0, 1)$ (again referred to as the compromise weight) and a matrix of crossover weights $M$.

3.3 Strong Laws of Large Numbers

In this section, unless otherwise stated, it is supposed that patients are allocated to treatments A or B according to the CAWD($\lambda$, $M$) allocation policy. The asymptotic results proved in this section are the analogues to the asymptotic results proved for the AWD($\lambda$) allocation policy in Section 2.3.

A fundamental result that will be proved here is the almost sure convergence of the proportion of patients allocated to each treatment within each stratum, as the number of patients treated converges to $\infty$. While working towards proving this result it will be shown that $\hat p_{A,k}(v)$ and $\hat p_{B,k}(v)$ are strongly consistent estimators of $p_A(v)$ and $p_B(v)$, the true success probabilities for treatments A and B, within each stratum $v \in \mathcal{V}$. An expression for the asymptotic proportion of patients successfully treated following a CAWD($\lambda$, $M$) rule is derived and compared with the corresponding expressions resulting from allocating patients according to complete randomization within strata and according to the AWD($\lambda$) rule within strata.

For $k \ge 1$, let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the first $k$ allocations, potential responses, strata and auxiliary randomization, i.e.
$$\mathcal{F}_k = \sigma\{\delta_j, X_j, Y_j, V_j, U_j : 1 \le j \le k\},$$
and let $\mathcal{F}_0$ denote the trivial $\sigma$-algebra. It is also useful in the proofs that follow to consider, for $k \ge 1$, the $\sigma$-algebras
$$\mathcal{G}_k = \mathcal{F}_k \vee \sigma\{V_{k+1}\} \quad\text{and}\quad \mathcal{H}_k = \mathcal{F}_k \vee \sigma\{V_{k+1}, U_{k+1}\}.$$
Note that $\delta_{k+1}$ is $\mathcal{H}_k$-measurable.

Lemma 3.3.1 For each $v \in \mathcal{V}$,
$$\lim_{k\to\infty} \frac{N_k(v)}{k} = c(v) \quad\text{a.s.}$$

Proof. Fix $v \in \mathcal{V}$. Since
$$N_k(v) = \sum_{i=1}^{k} I\{V_i = v\}$$
and $\{V_k : k \ge 1\}$ is a sequence of i.i.d. random variables with $E[I\{V_1 = v\}] = c(v)$, the result follows by the Strong Law of Large Numbers. ∎

Lemma 3.3.2 For each $v \in \mathcal{V}$,
$$\lim_{k\to\infty} N_k(v) = \infty \quad\text{a.s.}$$

Proof. Follows directly from Lemma 3.3.1.
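∎

Although the results that follow are only proved for treatment A, similar results hold for treatment B.

Proposition 3.3.1 For each $v \in \mathcal{V}$,
$$\lim_{k\to\infty} N_{A,k}(v) = \infty \quad\text{a.s.}$$

Proof. Fix $v \in \mathcal{V}$. Since $\{N_{A,k}(v) : k \ge 1\}$ is a nondecreasing sequence of random variables, $\lim_{k\to\infty} N_{A,k}(v)$ exists and
$$\left\{\lim_{k\to\infty} N_{A,k}(v) < \infty\right\} = \bigcup_{k \ge 1} \left\{N_{A,j}(v) = N_{A,k}(v),\ \forall j > k\right\}. \qquad (3.8)$$
Fix $k \ge 1$. Note that
$$N_{A,k+1}(v) = N_{A,k}(v) + I\{V_{k+1} = v\}\,\delta_{k+1}(v) = N_{A,k}(v) + I\{V_{k+1} = v\}\, I\left\{U_{k+1} \le \frac{1 + \lambda\,\tilde\Delta_k(v, M) - (1-\lambda)\left[2\,\frac{N_{A,k}(v)}{N_k(v)} - 1\right]}{2}\right\}. \qquad (3.9)$$
By Lemma 3.3.2, there exists $k_0 = k_0(k, v) > k$ such that $N_{k_0}(v) \ge 2k$. Consider any such $k_0$. Then, for any $j \ge k_0$, $N_j(v) \ge 2k$ and so
$$\frac{N_{A,k}(v)}{N_j(v)} \le \frac{k}{2k} = \frac{1}{2},\quad \forall j \ge k_0. \qquad (3.10)$$
Clearly
$$\tilde\Delta_j(v, M) \ge -1,\quad \forall j \ge k_0. \qquad (3.11)$$
Now, (3.9), (3.10) and (3.11) together yield
$$\left\{N_{A,j}(v) = N_{A,k}(v),\ \forall j > k\right\} \subseteq \left\{U_{j+1} > \frac{1 + \lambda\,\tilde\Delta_j(v, M) - (1-\lambda)\left[2\,\frac{N_{A,k}(v)}{N_j(v)} - 1\right]}{2} \ \text{or}\ V_{j+1} \neq v,\ \forall j > k\right\}$$
$$\subseteq \left\{U_{j+1} > \frac{1-\lambda}{2} \ \text{or}\ V_{j+1} \neq v,\ \forall j \ge k_0\right\}. \qquad (3.12)$$
We now show that (3.12) has probability zero. We can rewrite (3.12) as $\bigcap_{j \ge k_0} C_j$ where, for each $j \ge k_0$,
$$C_j = \left\{U_{j+1} > \frac{1-\lambda}{2}\right\} \cup \left\{V_{j+1} \neq v\right\}.$$
Note that $\{C_j,\ j \ge k_0\}$ is a set of independent events with
$$P(C_j) = 1 - c(v)\,\frac{1-\lambda}{2}.$$
Since $c(v) \in (0, 1)$ and $\lambda \in (0, 1)$, then $P\{\bigcap_{j \ge k_0} C_j\} = 0$, i.e., (3.12) has probability zero. Hence, by (3.8) and (3.12),
$$P\left\{\lim_{k\to\infty} N_{A,k}(v) < \infty\right\} \le \sum_{k \ge 1} P\left\{N_{A,j}(v) = N_{A,k}(v),\ \forall j > k\right\} = 0.$$
The result follows. ∎

The proof of Theorem 3.3.1 below uses a technical result proved in Section 2.3 (Lemma 2.3.1) and a theorem from Hall and Heyde (1980) which is included in Appendix A of this dissertation.

Theorem 3.3.1 For each $v \in \mathcal{V}$, $\hat p_{A,k}(v)$ and $\hat p_{B,k}(v)$ are strongly consistent estimators of the success probabilities $p_A(v)$ and $p_B(v)$ within stratum $v$, i.e.,
$$\lim_{k\to\infty} \hat p_{A,k}(v) = p_A(v) \quad\text{a.s.}$$
and
$$\lim_{k\to\infty} \hat p_{B,k}(v) = p_B(v) \quad\text{a.s.}$$

Proof. Fix $v \in \mathcal{V}$ and $k \ge 1$. We can write [...]

[...] Suppose that $N_{A,k}(v)/N_k(v)$ converges almost surely to some limit $\eta(v)$, so that
$$\lim_{k\to\infty} D_k(v) = 2\,\eta(v) - 1 \quad\text{a.s.} \qquad (3.16)$$
We expect that
$$\lim_{k\to\infty} \left[\frac{N_{A,k}(v)}{N_k(v)} - P\left(\delta_{k+1}(v) = 1 \mid \mathcal{G}_k\right)\right] = 0 \quad\text{a.s.} \qquad (3.17)$$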
Now Theorem 3.3.1 and (3.16) imply that
$$\lim_{k\to\infty} P\left(\delta_{k+1}(v) = 1 \mid \mathcal{G}_k\right) = \lim_{k\to\infty} \frac{1 + \lambda\,\tilde\Delta_k(v, M) - (1-\lambda)\,D_k(v)}{2} = \frac{1 + \lambda\,\tilde\Delta(v, M) - (1-\lambda)\,(2\,\eta(v) - 1)}{2} \quad\text{a.s.,}$$
where $\tilde\Delta(v, M) = \sum_{w \in \mathcal{V}} m(v, w)\,\Delta(w)$ and $\Delta(w) = p_A(w) - p_B(w)$. So, if (3.17) holds then
$$\eta(v) = \frac{1 + \lambda\,\tilde\Delta(v, M) - (1-\lambda)\,(2\,\eta(v) - 1)}{2}$$
which, solving for $\eta(v)$, gives
$$\eta(v) = \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta(v, M)\right). \qquad (3.18)$$
Therefore, if $N_{A,k}(v)/N_k(v)$ converges, we expect that it converges almost surely to (3.18). The formal proof of this result follows arguments similar to those used to prove Theorem 2.3.2.

Theorem 3.3.2 For each $v \in \mathcal{V}$,
$$\lim_{k\to\infty} \frac{N_{A,k}(v)}{N_k(v)} = \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta(v, M)\right) \quad\text{a.s.} \qquad (3.19)$$
and
$$\lim_{k\to\infty} \frac{N_{B,k}(v)}{N_k(v)} = \frac{1}{2}\left(1 - \frac{\lambda}{2-\lambda}\,\tilde\Delta(v, M)\right) \quad\text{a.s.} \qquad (3.20)$$

Proof. Define a function $q$ on $(0, 1) \times (0, 1)$ by setting
$$q(s, t) = (2-\lambda)\,t - (1-\lambda)\,s.$$
Note that $q$ satisfies the regularity conditions of Section 2 in Eisele (1990), namely
(i) $q$ is continuous;
(ii) $q(s, s) = s$;
(iii) $q(s, t)$ is strictly decreasing in $s$ and strictly increasing in $t$.
Fix $v \in \mathcal{V}$. Now, for the allocation rule CAWD($\lambda$, $M$) we can write
$$\delta_{k+1}(v) = I\left\{U_{k+1} \le q\left(\frac{N_{A,k}(v)}{N_k(v)},\ \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta_k(v, M)\right)\right)\right\}$$
where, by Theorem 3.3.1,
$$\lim_{k\to\infty} \left[\frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta_k(v, M)\right)\right] = \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta(v, M)\right) \quad\text{a.s.}$$
Hence, relation (3.19) can be proved by following the same arguments used in the proof of part (iii) of Lemma 1 in Eisele (1990). Since $N_{B,k}(v)/N_k(v) = 1 - N_{A,k}(v)/N_k(v)$, then (3.20) follows from (3.19). ∎

Note that a direct consequence of Theorem 3.3.2 is that the relation (3.17) (that we used to heuristically deduce the limiting proportion of patients allocated to A within stratum $v$) holds.

From an ethical point of view (individualistic goal) we are interested in the proportion of patients successfully treated following CAWD($\lambda$, $M$) allocations. This asymptotic proportion is compared, in the proposition below, with the corresponding expressions for complete randomization within strata and AWD($\lambda$) within strata.
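The step from the fixed-point relation for $\eta(v)$ to the closed form (3.18) is a short piece of algebra, written out below for convenience. The intermediate lines are our own expansion. Only the first and last lines appear in the text.

```latex
\begin{align*}
\eta(v) &= \frac{1 + \lambda\,\tilde\Delta(v,M) - (1-\lambda)\,\bigl(2\,\eta(v) - 1\bigr)}{2} \\
2\,\eta(v) &= 1 + \lambda\,\tilde\Delta(v,M) - 2\,(1-\lambda)\,\eta(v) + (1-\lambda) \\
2\,(2-\lambda)\,\eta(v) &= (2-\lambda) + \lambda\,\tilde\Delta(v,M) \\
\eta(v) &= \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta(v,M)\right)
\end{align*}
```

The same expansion shows that the threshold in (3.6) equals $q(s, t)$ evaluated at $s = N_{A,k}(v)/N_k(v)$ and $t = \frac{1}{2}\left(1 + \frac{\lambda}{2-\lambda}\,\tilde\Delta_k(v, M)\right)$, which is the form used in the proof of Theorem 3.3.2.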
Recall that allocating patients according to AWD($\lambda$) within strata corresponds to choosing a diagonal matrix ($M_s$) of crossover weights with diagonal elements equal to 1. Hence, for this policy
$$\tilde\Delta(v, M_s) = \Delta(v). \qquad (3.21)$$
For each $k \ge 1$, let $S_k$ denote the number of patients successfully treated through stage $k$, i.e. $S_k = \sum_{v \in \mathcal{V}} [S_{A,k}(v) + S_{B,k}(v)]$. Note that the true success probabilities for treatments A and B, $p_A$ and $p_B$, can be written as
$$p_A = \sum_{v \in \mathcal{V}} c(v)\,p_A(v) \quad\text{and}\quad p_B = \sum_{v \in \mathcal{V}} c(v)\,p_B(v),$$
respectively.

Proposition 3.3.2 For complete randomization within strata,
$$\lim_{k\to\infty} \frac{S_k}{k} = \frac{1}{2}\,(p_A + p_B) \quad\text{a.s.} \qquad (3.22)$$
For the CAWD($\lambda$, $M$) allocation policy,
$$\lim_{k\to\infty} \frac{S_k}{k} = \frac{1}{2}\left[(p_A + p_B) + \frac{\lambda}{2-\lambda} \sum_{v \in \mathcal{V}} c(v)\,\tilde\Delta(v, M)\,\Delta(v)\right] \quad\text{a.s.} \qquad (3.23)$$
For the AWD($\lambda$) allocation policy within strata,
$$\lim_{k\to\infty} \frac{S_k}{k} = \frac{1}{2}\left[(p_A + p_B) + \frac{\lambda}{2-\lambda} \sum_{v \in \mathcal{V}} c(v)\,\Delta^2(v)\right] \quad\text{a.s.} \qquad (3.24)$$

Proof. Relation (3.24) follows from (3.21) and (3.23). To show that (3.23) holds write
$$\frac{S_k}{k} = \sum_{v \in \mathcal{V}} \left[\hat p_{A,k}(v)\,\frac{N_{A,k}(v)}{N_k(v)} + \hat p_{B,k}(v)\,\frac{N_{B,k}(v)}{N_k(v)}\right] \frac{N_k(v)}{k}.$$
The result follows by Theorem 3.3.1 (a.s. convergence of $\hat p_{A,k}(v)$ and $\hat p_{B,k}(v)$), Theorem 3.3.2 (a.s. convergence of $N_{A,k}(v)/N_k(v)$ and $N_{B,k}(v)/N_k(v)$), Lemma 3.3.1 (a.s. convergence of $N_k(v)/k$) and algebra. A similar argument proves that (3.22) holds. ∎

As mentioned in Section 3.2.2, the idea behind the CAWD design is that it might be possible, for suitable choices of the crossover weights, to improve upon the AWD design within strata in terms of the proportion of patients successfully treated. We now use Proposition 3.3.2 to derive the best choices, in terms of ethical allocation, for the crossover weights.

In the case of two strata the best choices for crossover weights have simple interpretations. Only the case of two strata is considered here and when evaluating the CAWD design in Section 3.5. In the remainder of this section we assume that $\mathcal{V} = \{1, 2\}$.
Hence
$$M = \begin{pmatrix} m(1,1) & m(1,2)\\ m(2,1) & m(2,2) \end{pmatrix}.$$
Recall that, in this setting,
• the crossover weights for allocating a patient in stratum 1 are $m(1, 1)$ and $m(1, 2)$ with $m(1, 1) + m(1, 2) = 1$, and the crossover weights for allocating a patient in stratum 2 are $m(2, 1)$ and $m(2, 2)$ with $m(2, 1) + m(2, 2) = 1$;
• $\Delta(1) = p_A(1) - p_B(1)$ and $\Delta(2) = p_A(2) - p_B(2)$;
• $\tilde\Delta(1, M) = m(1, 1)\,\Delta(1) + m(1, 2)\,\Delta(2)$ and $\tilde\Delta(2, M) = m(2, 1)\,\Delta(1) + m(2, 2)\,\Delta(2)$.

Theorem 3.3.3 If $\Delta(1)\,\Delta(2) \le 0$, the choice of crossover weights that yields the largest asymptotic proportion of patients successfully treated is given by
$$M_s = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \qquad (3.25)$$
which corresponds to allocating patients according to AWD($\lambda$) within strata. If $\Delta(1)\,\Delta(2) > 0$ and $|\Delta(1)| > |\Delta(2)|$, the choice of crossover weights that yields the largest asymptotic proportion of patients successfully treated is
$$M_1 = \begin{pmatrix} 1 & 0\\ 1 & 0 \end{pmatrix}. \qquad (3.26)$$
If $\Delta(1)\,\Delta(2) > 0$ and $|\Delta(1)| < |\Delta(2)|$, the choice of crossover weights that yields the largest asymptotic proportion of patients successfully treated is
$$M_2 = \begin{pmatrix} 0 & 1\\ 0 & 1 \end{pmatrix}. \qquad (3.27)$$

Proof. Let $S_{gen}$ denote the a.s. asymptotic proportion of successes for CAWD($\lambda$, $M$) with a general choice of crossover weights. By (3.23), the maximum of $S_{gen}$ is achieved for the choice of $M$ that maximizes $\sum_{v \in \mathcal{V}} c(v)\,\tilde\Delta(v, M)\,\Delta(v)$. Since $m(1, 1) = 1 - m(1, 2)$ and $m(2, 2) = 1 - m(2, 1)$, we may write
$$\sum_{v \in \mathcal{V}} c(v)\,\tilde\Delta(v, M)\,\Delta(v) = \sum_{v \in \mathcal{V}} c(v)\,\Delta^2(v) + c(1)\,m(1, 2)\,\Delta(1)\,(\Delta(2) - \Delta(1)) + c(2)\,m(2, 1)\,\Delta(2)\,(\Delta(1) - \Delta(2)). \qquad (3.28)$$
If $\Delta(1)\,\Delta(2) \le 0$, then both $\Delta(1)(\Delta(2) - \Delta(1)) \le 0$ and $\Delta(2)(\Delta(1) - \Delta(2)) \le 0$. Hence (3.28) is largest when $m(1, 2) = m(2, 1) = 0$ and the first part of the theorem follows. If $\Delta(1)\,\Delta(2) > 0$ and $|\Delta(1)| > |\Delta(2)|$, then both $\Delta(1)(\Delta(2) - \Delta(1)) < 0$ and $\Delta(2)(\Delta(1) - \Delta(2)) > 0$. Hence (3.28) is largest when $m(1, 2) = 0$ and $m(2, 1) = 1$. The second part of the theorem follows. The proof of the result for $\Delta(1)\,\Delta(2) > 0$ and $|\Delta(1)| < |\Delta(2)|$ is similar. ∎

The results stated in Theorem 3.3.3 can be interpreted, from the point of view of ethical allocations, as follows.
Suppose stratum 1 corresponds to female patients and stratum 2 corresponds to male patients.
• If A is as good as B for treating either female or male patients or if A is better than B for treating females but B is better than A for treating males, then females (respectively, males) should be allocated without using information on the responses of males (respectively, females) previously treated; there should be no crossover of information.
• If A is better than B for treating both female and male patients and the difference between the success probabilities is largest for the female group (respectively, male group), then both females and males should be allocated by using only information on the responses of females (respectively, males); there should be crossover of information only from the stratum with the largest difference between treatments.

3.4 Central Limit Theorem

In this section it is shown that within each stratum $v \in \mathcal{V}$, the strongly consistent estimators of the success probabilities $p_A(v)$ and $p_B(v)$ are asymptotically independent and normally distributed.

Theorem 3.4.1 For each $v \in \mathcal{V}$, as $k \to \infty$,
$$\begin{pmatrix}\sqrt{N_{A,k}(v)}\,\bigl(\hat p_{A,k}(v) - p_A(v)\bigr)\\ \sqrt{N_{B,k}(v)}\,\bigl(\hat p_{B,k}(v) - p_B(v)\bigr)\end{pmatrix}\xrightarrow{D}N\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}p_A(v)\,q_A(v)&0\\0&p_B(v)\,q_B(v)\end{pmatrix}\right)$$
where $q_A(v) = 1 - p_A(v)$ and $q_B(v) = 1 - p_B(v)$.

Proof. Fix $v \in \mathcal{V}$. Fix real constants $a$ and $b$, and define for each $k \ge 1$ and $i = 1, \ldots, k$,
$$M_{k,i} = \frac{1}{\sqrt{k}} \sum_{j=1}^{i} \left[a\,(X_j - p_A(v))\,\delta_j\, I\{V_j = v\} + b\,(Y_j - p_B(v))\,(1 - \delta_j)\, I\{V_j = v\}\right],$$
and let $\mathcal{H}_{k,i} = \mathcal{H}_i$. For each $j = 1, \ldots, i$ let
$$Z_{k,j} = \frac{1}{\sqrt{k}} \left[a\,(X_j - p_A(v))\,\delta_j\, I\{V_j = v\} + b\,(Y_j - p_B(v))\,(1 - \delta_j)\, I\{V_j = v\}\right].$$
The theorem can now be established using similar arguments to the ones used in the proof of Theorem 2.4.1. ∎

3.5 Evaluation of the Design

Evaluating the performance of the CAWD design is not a simple task. Many parameters ($p_A(v)$, $p_B(v)$ and $c(v)$ for $v \in \mathcal{V}$) and choices of weights ($\lambda$ and $M$) have to be considered. The number of parameters increases rapidly as $r$, the number of strata, increases.
Simulations are, in this setting, extremely time consuming; a thorough evaluation of the CAWD design is relegated to future work. Here our goal is simply to show that the idea behind the CAWD design has merit.

Only the case $r = 2$ is considered. Monte Carlo simulations are used to evaluate the CAWD design from the individualistic point of view:
• how ethical are the assignments of patients to treatments?
and from the utilitarian point of view:
• how good are the estimators of the treatments difference within strata, $p_A(1) - p_B(1)$ and $p_A(2) - p_B(2)$?
These questions are addressed by looking, respectively, at the proportion of patients successfully treated and at the empirical mean squared error within strata obtained as a result of CAWD($\lambda$, $M$) allocations.

Theorem 3.3.3 shows the best choices for the crossover weights from an individualistic perspective. Now we will use simulations to verify, both for small and large trials, how good these ethical choices can be and how much may be lost in terms of statistical inference. Comparisons are made between the CAWD design with $\lambda = 0.8$ and matrices of crossover weights given by Theorem 3.3.3, the CAWD design with $\lambda = 0.8$ and a matrix of crossover weights, denoted $M_h$, with all elements equal to 0.5, complete randomization within strata and the RPW(1, 0, 1) rule within strata. Here the choice of $\lambda = 0.8$ is justified by the good performance shown by an AWD design with a large compromise weight.

Figures 3.1 through 3.18 show the results of 10,000 replications of clinical trials with sample sizes $n$ = 30 and 150, success probabilities $(p_A(1), p_B(1), p_A(2), p_B(2))$ = (0.50, 0.10, 0.15, 0.40), (0.90, 0.10, 0.60, 0.15) and (0.35, 0.50, 0.15, 0.85) and a range of values for $c(1)$ from 0.20 through 0.80.
For each of the allocation policies considered and each combination of values for $n$, $(p_A(1), p_B(1), p_A(2), p_B(2))$ and $c(1)$, the proportion of successes is computed as the average, over 10,000 replications, of the proportion of patients successfully treated in the simulated trial; the empirical mean squared error within stratum $v$ ($v \in \{1, 2\}$) is computed as the average, over 10,000 replications, of the squared difference between the estimates of $(p_A(v) - p_B(v))$ and the parameter.

The following labels are used in Figures 3.1 through 3.18.
• r = complete randomization within strata;
• a = CAWD(0.8, $M_s$);
• b = CAWD(0.8, $M_1$);
• c = CAWD(0.8, $M_2$);
• d = CAWD(0.8, $M_h$);
• w = RPW(1, 0, 1) rule within strata.
Here, $M_s$, $M_1$ and $M_2$ are the matrices defined in (3.25), (3.26) and (3.27), respectively, and $M_h$ is a 2×2 matrix with all elements equal to 0.5.

For Figures 3.1 through 3.6, the success probabilities satisfy $\Delta(1)\,\Delta(2) < 0$. In this case, Theorem 3.3.3 states that the best ethical choice of crossover weights for the CAWD design is given by $M_s$ (which is equivalent to AWD allocations within strata). Simulations show that (for a particular choice of the success probabilities) not only the overall proportion of patients successfully treated is larger for the CAWD(0.8, $M_s$) design than for the CAWD(0.8, $M_h$) design and complete randomization within strata, but the mean squared error for estimating the treatments difference within each stratum is also smaller for the CAWD(0.8, $M_s$) design than for the other two rules. The RPW(1, 0, 1) rule within strata and the CAWD(0.8, $M_s$) design show similar combined performances in these figures.

For Figures 3.7 through 3.12, the success probabilities satisfy $\Delta(1)\,\Delta(2) > 0$ and $|\Delta(1)| > |\Delta(2)|$. In this case, Theorem 3.3.3 states that the best ethical choice of crossover weights for the CAWD design is given by $M_1$.
Simulations show that (for a particular choice of the success probabilities) CAWD(0.8, $M_1$) yields, as expected, a larger proportion of successes than CAWD(0.8, $M_h$). The RPW(1, 0, 1) rule within strata and CAWD(0.8, $M_1$) have similar performances with respect to the proportion of patients successfully treated. Complete randomization within strata is markedly deficient in this respect. A surprising feature of the CAWD(0.8, $M_1$) design and the RPW(1, 0, 1) rule within strata is shown in Figures 3.8, 3.9, 3.11 and 3.12. CAWD(0.8, $M_1$) does better than RPW(1, 0, 1) within strata when estimating the treatments difference in stratum 1, but the reverse happens when estimating the treatments difference in stratum 2. Still, RPW(1, 0, 1) within strata performs well in this respect, comparably to complete randomization within strata. This suggests that the crossover weights may be adjusted to increase the proportion of patients successfully treated while yielding low mean squared errors for estimating the treatments difference within strata.

Finally, for Figures 3.13 through 3.18, the success probabilities satisfy $\Delta(1)\Delta(2) > 0$ and $|\Delta(1)| < |\Delta(2)|$. In this case, Theorem 3.3.3 states that the best ethical choice of crossover weights for the CAWD design is given by $M_2$. Simulations yielded conclusions similar to the ones presented in the previous paragraph, but now for the CAWD(0.8, $M_2$) design.

[Plot: proportion of successes against $c(1)$.]
Figure 3.1: Comparisons in terms of proportion of successes for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.
[Plot: mean squared error in stratum 1 against $c(1)$.]
Figure 3.2: Comparisons in terms of mean squared error in stratum 1 for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.

[Plot: mean squared error in stratum 2 against $c(1)$.]
Figure 3.3: Comparisons in terms of mean squared error in stratum 2 for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.

[Plot: proportion of successes against $c(1)$.]
Figure 3.4: Comparisons in terms of proportion of successes for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.

[Plot: mean squared error in stratum 1 against $c(1)$.]
Figure 3.5: Comparisons in terms of mean squared error in stratum 1 for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.

[Plot: mean squared error in stratum 2 against $c(1)$.]
Figure 3.6: Comparisons in terms of mean squared error in stratum 2 for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.15$ and $p_B(2) = 0.40$.

[...] (4.3)

and, for $k \ge 1$,

$$A_k(v^*) = A_0(v^*) + \sum_{v=1}^{r} \beta(v^*, v)\,\bigl[S_{A,k}(v) + N_{B,k}(v) - S_{B,k}(v)\bigr] \tag{4.4}$$

and

$$B_k(v^*) = B_0(v^*) + \sum_{v=1}^{r} \beta(v^*, v)\,\bigl[S_{B,k}(v) + N_{A,k}(v) - S_{A,k}(v)\bigr]. \tag{4.5}$$

So, according to the above description of the model,

$$\delta_{k+1} = I\left\{U_{k+1} \le \frac{A_k(v^*)}{A_k(v^*) + B_k(v^*)}\right\}. \tag{4.6}$$

Henceforth, the notation CRPW($C_0$, $B$) will refer to the allocation policy described in this section, where the set $C_0$ gives the initial composition of the urns and the matrix $B$ gives the number of balls to be added to each urn at each stage of the trial. Note that the RPW($u$, 0, $\beta$) rule within strata is the particular case of the CRPW($C_0$, $B$) rule corresponding to choosing a set $C_0$ with all pairs equal to $(\tfrac{1}{2}, 2u)$ and a diagonal matrix $B$ with all diagonal components equal to $\beta$.

4.3 Strong Laws of Large Numbers

In this section it is assumed that patients are allocated to treatments A or B according to the CRPW($C_0$, $B$) rule. The main result proved here is that the proportion of balls of type A in each urn converges almost surely to a constant as the trial size converges to $\infty$. The proof of this type of result for single urn models is rather involved. See, for example, Hill, Lane, and Sudderth (1980), Athreya and Karlin (1967) and Bai and Hu (1999). The crossover of information on the responses of patients significantly complicates the arguments needed to prove this type of convergence. We prove the result, for the CRPW rule, in the case of two urns and for matrices $B$ with components satisfying some conditions. Our proof follows the structure of the proof of Theorem 2.1 in Hill, Lane, and Sudderth (1980), with several modifications to allow for the specific characteristics of a multiple urn model with crossover of information.

It is also proved, in this section, that for any number of urns and any matrices $B$ for which the above convergence holds, $\hat{p}_{A,k}(v)$ and $\hat{p}_{B,k}(v)$ are strongly consistent estimators of $p_A(v)$ and $p_B(v)$, the success probabilities for treatments A and B, within each stratum $v \in V$.
For $k \ge 1$, let $\mathcal{F}_k$ be the $\sigma$-algebra generated by the first $k$ allocations, potential responses, strata and auxiliary randomization,

$$\mathcal{F}_k = \sigma\{\delta_i, V_i, X_i, Y_i, U_i : 1 \le i \le k\},$$

and let $\mathcal{F}_0$ denote the trivial $\sigma$-algebra. It is also useful in what follows to consider, for $k \ge 1$, the $\sigma$-algebras

$$\mathcal{G}_k = \mathcal{F}_k \vee \sigma\{V_{k+1}\} \quad \text{and} \quad \mathcal{H}_k = \mathcal{F}_k \vee \sigma\{V_{k+1}, U_{k+1}\}.$$

Note that $\delta_{k+1}$ is $\mathcal{H}_k$-measurable.

The following lemma is useful in proving some of the theorems in this section.

Lemma 4.3.1 For each $v \in V$,

$$\lim_{k \to \infty} \frac{N_k(v)}{k} = c(v) \quad \text{a.s.}$$

Proof. The proof is similar to that of Lemma 3.3.1. ∎

It is reasonable to assume that, for each $v \in V$, the limiting proportion of type A balls in urn $v$ is almost surely the same as the limiting proportion of patients allocated to treatment A within stratum $v$ (if the limits exist). So, in what follows, unless otherwise noted, we assume that for each $v \in V$

$$\lim_{k \to \infty} \left(\frac{A_k(v)}{A_k(v) + B_k(v)} - \frac{N_{A,k}(v)}{N_k(v)}\right) = 0 \quad \text{a.s.} \tag{4.7}$$

If (4.7) holds, then Lemma 4.3.1 implies that

$$\lim_{k \to \infty} \left(\frac{B_k(v)}{A_k(v) + B_k(v)} - \frac{N_{B,k}(v)}{N_k(v)}\right) = 0 \quad \text{a.s.} \tag{4.8}$$

For each $v \in V = \{1, 2, \ldots, r\}$ and each $k \ge 0$ define $R_k(v)$ to be the proportion of type A balls in urn $v$ just before patient $k + 1$ is assigned to a treatment. Then

$$R_k(v) = \frac{A_k(v)}{A_k(v) + B_k(v)}. \tag{4.9}$$

Note that $R_0(v) = r_0(v)$.

The next theorem states that, for any specific urn, if the proportion of type A balls converges a.s., then the usual estimators of the success probabilities for treatments A and B within the stratum corresponding to that urn are strongly consistent.

Theorem 4.3.1 For each $v \in V$, if $R_k(v)$ converges almost surely as $k \to \infty$ then $\hat{p}_{A,k}(v)$ and $\hat{p}_{B,k}(v)$ are strongly consistent estimators of the success probabilities $p_A(v)$ and $p_B(v)$ within stratum $v$, i.e.,

(i) $\lim_{k \to \infty} \hat{p}_{A,k}(v) = p_A(v)$ a.s.;
(ii) $\lim_{k \to \infty} \hat{p}_{B,k}(v) = p_B(v)$ a.s.

Proof. First note that Lemma 4.3.1 and Assumption (4.7) imply that

$$\lim_{k \to \infty} N_{A,k}(v) = \infty \quad \text{a.s.}$$
The result can now be proved by following the arguments used in the proof of Theorem 3.3.1. ∎

The previous results were proved with no restrictions on the number of strata or on the matrix $B$. The next theorems summarize the main result of this section. The theorems are proved under the assumption that for each $v \in V$,

$$q_A(v) + q_B(v) > 0,$$

where $q_A(v) = 1 - p_A(v)$ and $q_B(v) = 1 - p_B(v)$. So, it is assumed that, within each stratum, at least one of the success probabilities is less than 1. The proofs are only valid for $r = 2$. Henceforth, $V = \{1, 2\}$ and the matrix of $\beta$'s is of the form

$$B = \begin{pmatrix} \beta(1,1) & \beta(1,2) \\ \beta(2,1) & \beta(2,2) \end{pmatrix}.$$

Theorem 4.3.2 below states that, under the above assumptions and some further constraints on the components of $B$, the proportion of type A balls in each urn converges a.s. as the trial size converges to $\infty$. Note that the convergence for urn $v^*$ ($v^* \in V$) is proved by constraining only the $\beta$'s in row $v^*$ of $B$. Assumption (4.7) is not needed to prove the theorem.

Theorem 4.3.2 The following hold.

(i) If $1 \le \beta(1,1) \le \beta(1,2)$ then $R_k(1)$ converges almost surely as $k \to \infty$;
(ii) if $1 \le \beta(2,2) \le \beta(2,1)$ then $R_k(2)$ converges almost surely as $k \to \infty$.

The proof of the theorem follows three lemmas. Before stating these lemmas we introduce some notation, define some concepts and present general arguments that will be used in the proofs.

For simplicity, in what follows we write $\beta_{i,j}$ instead of $\beta(i,j)$, for $i, j \in \{1, 2\}$. Fix $v^* \in V$. For each $k \ge 1$, define $\tau_{A,k}(v^*)$ to be the number of A balls added to urn $v^*$ after observing the response of patient $k$. Then,

$$\tau_{A,k}(v^*) = \bigl(\beta_{v^*,1}\, I\{V_k = 1\} + \beta_{v^*,2}\, I\{V_k = 2\}\bigr)\bigl(\delta_k X_k + (1 - \delta_k)(1 - Y_k)\bigr).$$

Now,

$$P\bigl[\tau_{A,k}(v^*) = \beta_{v^*,1} \mid \mathcal{H}_{k-1}\bigr] = I\{V_k = 1\}\, I\{\delta_k = 1\}\, p_A(1) + I\{V_k = 1\}\, I\{\delta_k = 0\}\,(1 - p_B(1))$$

and so

$$P\bigl[\tau_{A,k}(v^*) = \beta_{v^*,1} \mid \mathcal{G}_{k-1}\bigr] = p_A(1)\, I\{V_k = 1\}\, P(\delta_k = 1 \mid \mathcal{G}_{k-1}) + (1 - p_B(1))\, I\{V_k = 1\}\, P(\delta_k = 0 \mid \mathcal{G}_{k-1})$$
$$= \bigl[p_A(1) R_{k-1}(1) + (1 - p_B(1))(1 - R_{k-1}(1))\bigr]\, I\{V_k = 1\}.$$

Similarly, it can be shown that

$$P\bigl[\tau_{A,k}(v^*) = \beta_{v^*,2} \mid \mathcal{G}_{k-1}\bigr] = \bigl[p_A(2) R_{k-1}(2) + (1 - p_B(2))(1 - R_{k-1}(2))\bigr]\, I\{V_k = 2\}.$$

Define on $V \times [0,1] \times [0,1]$ a function $f_{v^*}$ by setting

$$f_{v^*}(v, r_1, r_2) = \begin{cases} p_A(1)\, r_1 + (1 - p_B(1))(1 - r_1), & \text{if } v = 1 \\ p_A(2)\, r_2 + (1 - p_B(2))(1 - r_2), & \text{if } v = 2. \end{cases} \tag{4.10}$$

So, if $\{U_k : k \ge 1\}$ is a sequence of i.i.d. Uniform[0, 1] random variables independent of the sequence $\{(V_k, X_k, Y_k) : k \ge 1\}$, then

$$\tau_{A,k}(v^*) = \beta_{v^*,1}\, I\bigl\{U_k \le I\{V_k = 1\}\, f_{v^*}(V_k, R_{k-1}(1), R_{k-1}(2))\bigr\} + \beta_{v^*,2}\, I\bigl\{U_k \le I\{V_k = 2\}\, f_{v^*}(V_k, R_{k-1}(1), R_{k-1}(2))\bigr\}. \tag{4.11}$$

Also, note that

$$R_k(v^*) = \frac{\bigl(A_{k-1}(v^*) + B_{k-1}(v^*)\bigr) R_{k-1}(v^*) + \tau_{A,k}(v^*)}{A_k(v^*) + B_k(v^*)} \tag{4.12}$$

where, because of the restrictions on the components of $B$ in Theorem 4.3.2 (both entries in row $v^*$ are at least 1),

$$A_k(v^*) + B_k(v^*) = n_0(v^*) + \beta_{v^*,1} N_k(1) + \beta_{v^*,2} N_k(2) \ge n_0(v^*) + k. \tag{4.13}$$

The notions of urn process and urn function defined in Hill, Lane, and Sudderth (1980) can be generalized as follows. For each $v^* \in V$, we say that $\{R_k(v^*), k \ge 1\}$ is the urn $v^*$ process, with urn function $f_{v^*}$ and initial urn composition $(r_0(v^*), n_0(v^*))$. The distribution of such an urn process will be denoted by $P_{(r_0(v^*),\, n_0(v^*))}$.

We prove the theorem for $v = 1$ under the assumption $1 \le \beta(1,1) \le \beta(1,2)$. The proof for $v = 2$ under the assumption $1 \le \beta(2,2) \le \beta(2,1)$ follows similarly.

For simplicity, when an argument or statement refers only to the urn 1 function and the initial composition of urn 1, we omit the subscript $v^*$ ($v^* = 1$) from the urn function and the argument $v^*$ from the initial urn composition.

We now state and prove the three lemmas used in the proof of Theorem 4.3.2.

Let $I = (a, b)$ and $J = (c, d)$, with $0 \le a < c < d < b \le 1$, and let $U_{R(1)}$ be the event that $\{R_k(1), k \ge 1\}$ upcrosses the interval $I$ infinitely often.
Lemma 4.3.2 If $P_{(r_0, n_0)}(U_{R(1)}) > 0$ then for every $\epsilon > 0$ and positive integer $M$, there exist $s_0 \in J$ and $m_0 \ge M$ such that $P_{(s_0, m_0)}(U_{R(1)}) \ge 1 - \epsilon$.

Proof. The proof of the lemma follows three claims.

Claim 1. The increments of $\{R_k(1), k \ge 1\}$ converge uniformly to zero, i.e. for every $\epsilon > 0$ there exists a positive integer $M_1$ such that for every $k \ge M_1$,

$$\sup_{\omega \in \Omega} |R_{k+1}(1)(\omega) - R_k(1)(\omega)| < \epsilon.$$

Proof of Claim 1. Fix $\omega \in \Omega$. Then (4.13) implies

$$|R_{k+1}(1)(\omega) - R_k(1)(\omega)| = \left|\frac{\{A_k(1)(\omega) + B_k(1)(\omega)\}\, R_k(1)(\omega) + \tau_{A,k+1}(1)(\omega)}{A_{k+1}(1)(\omega) + B_{k+1}(1)(\omega)} - R_k(1)(\omega)\right|$$
$$\le \frac{\bigl|\tau_{A,k+1}(1)(\omega) - \{\beta_{1,1}\, I\{V_{k+1}(\omega) = 1\} + \beta_{1,2}\, I\{V_{k+1}(\omega) = 2\}\}\, R_k(1)(\omega)\bigr|}{n_0 + k} \le \frac{2(\beta_{1,1} + \beta_{1,2})}{n_0 + k}. \tag{4.14}$$

The expression in (4.14) does not depend on $\omega$ and converges to 0 as $k \to \infty$. The claim follows. □

Claim 2. Any path in $U_{R(1)}$ must visit $J$ infinitely often, i.e. for every $\omega \in U_{R(1)}$ and positive integer $M_2$, there exists $k \ge M_2$ such that $R_k(1)(\omega) \in J$.

Proof of Claim 2. We prove the validity of the claim by contradiction. So, assume that there exist $\omega_0 \in U_{R(1)}$ and a positive integer $M_2$ such that $R_k(1)(\omega_0) \notin J$, for all $k \ge M_2$. Let $\epsilon = d - c$. Note that $\epsilon > 0$. Then, the previous assumption and Claim 1 imply that, for $M_3 = \max\{M_1, M_2\}$,

$$R_k(1)(\omega_0) \notin J \quad \text{and} \quad |R_{k+1}(1)(\omega_0) - R_k(1)(\omega_0)| < d - c, \quad \forall k \ge M_3.$$

Hence, if $R_{M_3}(1)(\omega_0) \le c$ then $R_{M_3+1}(1)(\omega_0) \le c$. Iterating this reasoning yields

$$R_{M_3}(1)(\omega_0) \le c \implies R_{M_3+k}(1)(\omega_0) \le c, \quad \forall k \ge 1. \tag{4.15}$$

Similarly,

$$R_{M_3}(1)(\omega_0) \ge d \implies R_{M_3+k}(1)(\omega_0) \ge d, \quad \forall k \ge 1. \tag{4.16}$$

Expressions (4.15) and (4.16) contradict the assumption that $\omega_0 \in U_{R(1)}$. Claim 2 follows. □

Claim 3. Let $C$ be an event. Then $I_C$ is the function defined in $\Omega$ as

$$I_C(\omega) = \begin{cases} 1 & \text{if } \omega \in C \\ 0 & \text{if } \omega \in C^c. \end{cases}$$

Then [...] Hence

$$P_{(r_0, n_0)}(G_1) \le P_{(r_0, n_0)}(U^c_{R(1)}) < 1$$

and, consequently, $P_{(r_0, n_0)}(G_2) > 0$.

Consider a fixed $\omega_0 \in G_2$. Fix $\epsilon > 0$ and a positive integer $M$. By (4.13) there exists a positive integer $M_1$ such that

$$A_k(1)(\omega_0) + B_k(1)(\omega_0) > M, \quad \forall k \ge M_1.$$
(4.17)

Since $\omega_0 \in G_2$, there exists a positive integer $M_3$ such that

$$P_{(R_k(1)(\omega_0),\; A_k(1)(\omega_0) + B_k(1)(\omega_0))}(U_{R(1)}) \ge 1 - \epsilon, \quad \forall k \ge M_3. \tag{4.18}$$

Let $M_0 = \max\{M, M_1, M_3\}$. Then, Claim 2, (4.17) and (4.18) imply that there exists an integer $M_4 \ge M_0$ such that $R_{M_4}(1)(\omega_0) \in J$, $A_{M_4}(1)(\omega_0) + B_{M_4}(1)(\omega_0) > M$, and

$$P_{(R_{M_4}(1)(\omega_0),\; A_{M_4}(1)(\omega_0) + B_{M_4}(1)(\omega_0))}(U_{R(1)}) \ge 1 - \epsilon.$$

Consider any such $M_4$. Let $s_0 = R_{M_4}(1)(\omega_0)$ and $m_0 = A_{M_4}(1)(\omega_0) + B_{M_4}(1)(\omega_0)$. The lemma follows. ∎

For each $v^* \in V$, let $\mathcal{T}(v^*) = \{T_k(v^*), k \ge 1\}$ and $\mathcal{Q}(v^*) = \{Q_k(v^*), k \ge 1\}$ denote urn $v^*$ processes with urn functions $g_{v^*}$ and $h_{v^*}$, and initial urn compositions $(t_0(v^*), n_0(v^*))$ and $(q_0(v^*), n_0(v^*))$, respectively. Recall that when an argument or statement refers only to the urn 1 function and the initial composition of urn 1, we omit the subscript $v^*$ ($v^* = 1$) from the urn function and the argument $v^*$ from the initial urn composition.

Lemma 4.3.3 If $g(v, r_1, r_2) = r_1$, for all $(v, r_1, r_2) \in V \times [0,1] \times [0,1]$, then there exists a random variable $T$ such that

$$\lim_{k \to \infty} T_k(1) = T \quad \text{a.s.,} \tag{4.19}$$

$$E_{(t_0, n_0)}(T) = t_0, \tag{4.20}$$

$$\lim_{n_0 \to \infty}\; \sup_{t_0 \in [0,1]} E_{(t_0, n_0)}(T - t_0)^2 = 0 \tag{4.21}$$

and, for every $\epsilon > 0$,

$$P_{(t_0, n_0)}\Bigl\{\sup_{k \ge 1} |T_k(1) - t_0| \ge \epsilon\Bigr\} \le \frac{1}{\epsilon^2}\, E_{(t_0, n_0)}(T - t_0)^2. \tag{4.22}$$

Proof. For simplicity, omit the subscript $(t_0, n_0)$ when referring to expectations or probabilities in this proof. To distinguish between the several urn processes now being considered, denote by $\tau_{g,A,k}(1)$ the number of A balls added to urn 1 after observing the response of patient $k$, when the urn function is $g$. Hence (see (4.11))

$$\tau_{g,A,k}(1) = \beta_{1,1}\, I\bigl\{U_k \le I\{V_k = 1\}\, g(V_k, T_{k-1}(1), T_{k-1}(2))\bigr\} + \beta_{1,2}\, I\bigl\{U_k \le I\{V_k = 2\}\, g(V_k, T_{k-1}(1), T_{k-1}(2))\bigr\}. \tag{4.23}$$

Since the total number of balls in urn 1 at any given stage does not depend on the urn function, we can write

$$T_k(1) = \frac{\bigl(A_{k-1}(1) + B_{k-1}(1)\bigr) T_{k-1}(1) + \tau_{g,A,k}(1)}{A_k(1) + B_k(1)}.$$

Now, the assumption on $g$ and (4.23) imply that

$$E\bigl[\tau_{g,A,k}(1) \mid \mathcal{G}_{k-1}\bigr] = \beta_{1,1}\, I\{V_k = 1\}\, T_{k-1}(1) + \beta_{1,2}\, I\{V_k = 2\}\, T_{k-1}(1) \quad \text{a.s.}$$
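(4.24)

Hence,

$$E\bigl[T_k(1) \mid \mathcal{G}_{k-1}\bigr] = \frac{\bigl(A_{k-1}(1) + B_{k-1}(1)\bigr) T_{k-1}(1) + E\bigl[\tau_{g,A,k}(1) \mid \mathcal{G}_{k-1}\bigr]}{A_k(1) + B_k(1)} = T_{k-1}(1) \quad \text{a.s.}$$

So, $\{T_k(1), \mathcal{G}_k : k \ge 1\}$ is a martingale. Furthermore, $|T_k(1)| \le 1$ for all $k \ge 1$. By the $L^2$-martingale convergence theorem, $T_k(1)$ converges almost surely (to an almost surely finite random variable, $T$) as $k \to \infty$. So (4.19) holds.

Since $\{T_k(1) : k \ge 1\}$ is bounded (and hence uniformly integrable), (4.19) implies that

$$\lim_{k \to \infty} E(T_k(1)) = E(T).$$

But, since $\{T_k(1), \mathcal{G}_k : k \ge 1\}$ is a martingale,

$$E(T_k(1)) = E(T_0(1)) = t_0, \quad \forall k \ge 1.$$

Hence (4.20) holds.

Since $\{(T_k(1) - t_0)^2 : k \ge 1\}$ is uniformly integrable, (4.19) also implies that

$$\lim_{k \to \infty} E(T_k(1) - t_0)^2 = E(T - t_0)^2. \tag{4.25}$$

Now, because $\{T_k(1), \mathcal{G}_k : k \ge 1\}$ is a martingale,

$$E(T_k(1) - t_0)^2 = E\Bigl[\sum_{i=1}^{k} \bigl(T_i(1) - T_{i-1}(1)\bigr)\Bigr]^2 = \sum_{i=1}^{k} E\bigl(T_i(1) - T_{i-1}(1)\bigr)^2.$$

Fix $i \in \{1, \ldots, k\}$. We can write

$$T_i(1) - T_{i-1}(1) = \frac{\tau_{g,A,i}(1) - \bigl(\beta_{1,1}\, I\{V_i = 1\} + \beta_{1,2}\, I\{V_i = 2\}\bigr) T_{i-1}(1)}{A_i(1) + B_i(1)}.$$

The assumptions on the $\beta$'s and (4.13) imply that

$$(n_0 + i)^2 \bigl(T_i(1) - T_{i-1}(1)\bigr)^2 \le \bigl(\tau_{g,A,i}(1)\bigr)^2 - 2\, \tau_{g,A,i}(1)\bigl(\beta_{1,1}\, I\{V_i = 1\} + \beta_{1,2}\, I\{V_i = 2\}\bigr) T_{i-1}(1) + \bigl(\beta_{1,1}^2\, I\{V_i = 1\} + \beta_{1,2}^2\, I\{V_i = 2\}\bigr) T_{i-1}^2(1).$$

Now, an argument similar to the one used to obtain (4.24) yields

$$E\bigl[(\tau_{g,A,i}(1))^2 \mid \mathcal{G}_{i-1}\bigr] = \beta_{1,1}^2\, I\{V_i = 1\}\, T_{i-1}(1) + \beta_{1,2}^2\, I\{V_i = 2\}\, T_{i-1}(1)$$

and

$$E\bigl[\tau_{g,A,i}(1)\bigl(\beta_{1,1}\, I\{V_i = 1\} + \beta_{1,2}\, I\{V_i = 2\}\bigr) T_{i-1}(1) \mid \mathcal{G}_{i-1}\bigr] = \bigl(\beta_{1,1}^2\, I\{V_i = 1\} + \beta_{1,2}^2\, I\{V_i = 2\}\bigr) \bigl(T_{i-1}(1)\bigr)^2.$$

Thus,

$$E\bigl[(T_i(1) - T_{i-1}(1))^2 \mid \mathcal{G}_{i-1}\bigr] \le \frac{\beta_{1,1}^2\, I\{V_i = 1\} + \beta_{1,2}^2\, I\{V_i = 2\}}{(n_0 + i)^2}\; T_{i-1}(1)\bigl(1 - T_{i-1}(1)\bigr).$$

Since $V_i$ and $T_{i-1}(1)$ are independent,

$$E\bigl[(T_i(1) - T_{i-1}(1))^2\bigr] = E\bigl\{E\bigl[(T_i(1) - T_{i-1}(1))^2 \mid \mathcal{G}_{i-1}\bigr]\bigr\} \le \frac{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, c(2)}{(n_0 + i)^2}\; E\bigl[T_{i-1}(1)(1 - T_{i-1}(1))\bigr] \le \frac{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))}{(n_0 + i)^2}.$$

Therefore, for any $k \ge 1$ and $t_0 \in [0, 1]$,

$$E(T_k(1) - t_0)^2 \le \sum_{i=1}^{k} \frac{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))}{(n_0 + i)^2} \le \bigl\{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))\bigr\} \sum_{i=1}^{k} \frac{1}{i\,(n_0 + i)}$$
$$= \bigl\{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))\bigr\}\, \frac{1}{n_0} \sum_{i=1}^{k} \Bigl(\frac{1}{i} - \frac{1}{n_0 + i}\Bigr) \le \bigl\{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))\bigr\}\, \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{1}{i}.$$

Hence,

$$\sup_{t_0 \in [0,1]} E(T - t_0)^2 = \sup_{t_0 \in [0,1]}\, \lim_{k \to \infty} E(T_k(1) - t_0)^2 \le \bigl\{\beta_{1,1}^2\, c(1) + \beta_{1,2}^2\, (1 - c(1))\bigr\}\, \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{1}{i}. \tag{4.26}$$

By Cesàro's Lemma, (4.26) converges to 0 as $n_0 \to \infty$. So (4.21) holds.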
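Finally, since $\{(T_k(1) - t_0)^2, \mathcal{G}_k : k \ge 1\}$ is a submartingale, Doob's inequality implies that for any $m \ge 1$

$$P\Bigl\{\max_{1 \le k \le m} |T_k(1) - t_0| \ge \epsilon\Bigr\} \le \frac{1}{\epsilon^2}\, E(T_m(1) - t_0)^2. \tag{4.27}$$

But, as $m \to \infty$,

$$\Bigl\{\max_{1 \le k \le m} |T_k(1) - t_0| \ge \epsilon\Bigr\} \uparrow \Bigl\{\sup_{k \ge 1} |T_k(1) - t_0| \ge \epsilon\Bigr\}$$

and by continuity

$$\lim_{m \to \infty} P\Bigl\{\max_{1 \le k \le m} |T_k(1) - t_0| \ge \epsilon\Bigr\} = P\Bigl\{\sup_{k \ge 1} |T_k(1) - t_0| \ge \epsilon\Bigr\}. \tag{4.28}$$

So, (4.22) follows from (4.25), (4.27) and (4.28). ∎

Lemma 4.3.4 If $h(v, r_1, r_2) = r_1$, for all $(v, r_1, r_2) \in V \times I \times [0,1]$, then

$$\lim_{n_0 \to \infty}\; \sup_{s_0 \in J} P_{(s_0, n_0)}\bigl\{\{Q_k(1), k \ge 1\} \text{ visits } I^c\bigr\} = 0. \tag{4.29}$$

Proof. Let $\eta = \min\{c - a,\, b - d\}$. Note that $\eta > 0$. For each $s_0 \in J$, define $\Omega_{\eta, s_0}$ as

$$\Omega_{\eta, s_0} = \Bigl\{\sup_{k \ge 1} |T_k(1) - s_0| < \eta\Bigr\}.$$

Suppose that the urn 1 process $\mathcal{T}(1)$ has urn 1 function $g$ satisfying $g(v, r_1, r_2) = r_1$ for all $(v, r_1, r_2) \in V \times [0,1] \times [0,1]$, and the initial urn 1 composition is $(s_0, n_0)$. Then Lemma 4.3.3 implies that

$$P_{(s_0, n_0)}(\Omega_{\eta, s_0}) \ge 1 - \frac{E_{(s_0, n_0)}(T - s_0)^2}{\eta^2} \tag{4.30}$$

and

$$\lim_{n_0 \to \infty}\; \sup_{s_0 \in [0,1]} E_{(s_0, n_0)}(T - s_0)^2 = 0. \tag{4.31}$$

Fix $\epsilon > 0$. Then by (4.30) and (4.31) there exists a positive integer $M$ such that

$$P_{(s_0, n_0)}(\Omega_{\eta, s_0}) \ge 1 - \epsilon, \quad \forall n_0 \ge M,\; s_0 \in J.$$

Consider any such $M$. Fix $n_0 \ge M$ and $s_0 \in J$. Suppose that $(s_0, n_0) = (q_0(1), n_0(1))$ is the initial urn 1 composition for the urn 1 processes $\mathcal{T}(1)$ and $\mathcal{Q}(1)$. Recall that $\tau_{g,A,k}(1)$ and $\tau_{h,A,k}(1)$ denote the number of A balls added to urn 1 after observing the response of patient $k$, when the urn 1 function is $g$ and $h$, respectively. Also,

$$\tau_{h,A,k}(1) = \beta_{1,1}\, I\bigl\{U_k \le I\{V_k = 1\}\, h(V_k, Q_{k-1}(1), Q_{k-1}(2))\bigr\} + \beta_{1,2}\, I\bigl\{U_k \le I\{V_k = 2\}\, h(V_k, Q_{k-1}(1), Q_{k-1}(2))\bigr\}, \tag{4.32}$$

$$Q_k(1) = \frac{\bigl(A_{k-1}(1) + B_{k-1}(1)\bigr) Q_{k-1}(1) + \tau_{h,A,k}(1)}{A_k(1) + B_k(1)}, \tag{4.33}$$

$$\tau_{g,A,k}(1) = \beta_{1,1}\, I\bigl\{U_k \le I\{V_k = 1\}\, g(V_k, T_{k-1}(1), T_{k-1}(2))\bigr\} + \beta_{1,2}\, I\bigl\{U_k \le I\{V_k = 2\}\, g(V_k, T_{k-1}(1), T_{k-1}(2))\bigr\} \tag{4.34}$$

and

$$T_k(1) = \frac{\bigl(A_{k-1}(1) + B_{k-1}(1)\bigr) T_{k-1}(1) + \tau_{g,A,k}(1)}{A_k(1) + B_k(1)}. \tag{4.35}$$

Now, fix $\omega \in \Omega_{\eta, s_0}$.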
Since $s_0 \in J \subset I$, the assumptions on $g$ and $h$ imply that

$$g(V_1(\omega), s_0, t_0(2)) = s_0 = h(V_1(\omega), s_0, q_0(2)).$$

Then, by (4.32) and (4.34),

$$\tau_{g,A,1}(1)(\omega) = \tau_{h,A,1}(1)(\omega)$$

and by (4.33) and (4.35)

$$T_1(1)(\omega) = Q_1(1)(\omega).$$

Since $\omega \in \Omega_{\eta, s_0}$ and $\eta = \min\{c - a,\, b - d\}$, then

$$Q_1(1)(\omega) = T_1(1)(\omega) \in I.$$

Hence

$$g(V_2(\omega), T_1(1)(\omega), T_1(2)(\omega)) = T_1(1)(\omega) = Q_1(1)(\omega) = h(V_2(\omega), Q_1(1)(\omega), Q_1(2)(\omega))$$

and, as before, (4.32), (4.34), (4.33), (4.35) and the choices of $\omega$ and $\eta$ yield

$$Q_2(1)(\omega) = T_2(1)(\omega) \in I.$$

Continuing in this way we conclude that

$$Q_k(1)(\omega) = T_k(1)(\omega) \in I, \quad \forall k \ge 1$$

and, consequently, $\{Q_k(1)(\omega) : k \ge 1\}$ does not visit $I^c$. So, for every $n_0 \ge M$ and $s_0 \in J$,

$$P_{(s_0, n_0)}\bigl\{\{Q_k(1), k \ge 1\} \text{ visits } I^c\bigr\} \le P_{(s_0, n_0)}(\Omega^c_{\eta, s_0}) \le \epsilon,$$

and the lemma follows. ∎

We can now prove the theorem.

Proof of Theorem 4.3.2. Recall that $\{R_k(1), k \ge 1\}$ is the urn 1 process with urn 1 function $f$ and initial urn 1 composition $(r_0, n_0)$. To prove that $R_k(1)$ converges almost surely as $k \to \infty$ it is sufficient to show that

$$P_{(r_0, n_0)}(U_{R(1)}) = 0, \quad \forall I = (a, b) \subset [0, 1]. \tag{4.36}$$

Suppose, to the contrary, that (4.36) does not hold. Let $I_0 = (a_0, b_0) \subset [0, 1]$ be such that

$$P_{(r_0, n_0)}(U_{R(1)}) > 0. \tag{4.37}$$

Now, for every non-degenerate interval $I_1 \subseteq I_0$, $v \in V$ and $r_2 \in [0, 1]$, there exists $r_1 \in I_1$ such that $f(v, r_1, r_2) \ne r_1$. So, a non-degenerate interval $I_1 \subseteq I_0$ can always be found so that either

$$f(v, r_1, r_2) < r_1, \quad \forall (v, r_1, r_2) \in V \times I_1 \times [0, 1] \tag{4.38}$$

or

$$f(v, r_1, r_2) > r_1, \quad \forall (v, r_1, r_2) \in V \times I_1 \times [0, 1]. \tag{4.39}$$

Let us construct one such interval $I_1$. It follows from (4.10), the urn 1 function for the urn 1 process $\{R_k(1), k \ge 1\}$, that

$$f(1, r_1, r_2) \lessgtr r_1 \iff r_1 \gtrless \frac{q_B(1)}{q_A(1) + q_B(1)},$$
$$f(2, r_1, r_2) \lessgtr r_1 \iff r_1 \gtrless r_2\,\bigl(p_A(2) - q_B(2)\bigr) + q_B(2).$$

As $r_2$ varies from 0 to 1, the expression $r_2\,(p_A(2) - q_B(2)) + q_B(2)$ varies from $\min\{p_A(2), q_B(2)\}$ to $\max\{p_A(2), q_B(2)\}$. Let

$$m = \min\left\{\frac{q_B(1)}{q_A(1) + q_B(1)},\; p_A(2),\; q_B(2)\right\}, \qquad M = \max\left\{\frac{q_B(1)}{q_A(1) + q_B(1)},\; p_A(2),\; q_B(2)\right\}.$$

Recall that $I_0 = (a_0, b_0)$.
Note that

(1) if $M \le a_0$, then (4.38) holds for a choice $I_1 = I_0$;
(2) if $m \ge b_0$, then (4.39) holds for a choice $I_1 = I_0$;
(3) if $M \in (a_0, b_0)$, then (4.38) holds for a choice $I_1 = (M, b_0)$;
(4) if $m \in (a_0, b_0)$, then (4.39) holds for a choice $I_1 = (a_0, m)$.

These are the only possible four cases for values of $m$ and $M$.

To be precise, suppose that $I_1$ is such that (4.38) holds; the other case can be treated similarly. Let $J_1$ be a proper subinterval of $I_1$. Define an urn 1 function $h$ on $V \times [0,1] \times [0,1]$ by setting

$$h(v, r_1, r_2) = \begin{cases} f(v, r_1, r_2) & \text{if } r_1 \notin I_1 \\ r_1 & \text{if } r_1 \in I_1. \end{cases} \tag{4.40}$$

Let $\{Q_k(1), k \ge 1\}$ be the corresponding urn 1 process with initial urn 1 composition $(r_0, n_0)$.

Since $\{\{R_k(1), k \ge 1\} \text{ upcrosses } I_1 \text{ infinitely often}\} \supseteq U_{R(1)}$, (4.37) and Lemma 4.3.2 imply that

$$\lim_{n_0 \to \infty}\; \sup_{s_0 \in J_1} P_{(s_0, n_0)}\bigl\{\{R_k(1), k \ge 1\} \text{ upcrosses } I_1 \text{ infinitely often}\bigr\} = 1. \tag{4.41}$$

On the other hand, since $h(v, r_1, r_2) = r_1$ for all $r_1 \in I_1$, Lemma 4.3.4 implies that

$$\lim_{n_0 \to \infty}\; \sup_{s_0 \in J_1} P_{(s_0, n_0)}\bigl\{\{Q_k(1), k \ge 1\} \text{ visits } I_1^c\bigr\} = 0. \tag{4.42}$$

Define $\Omega_R$ and $\Omega_Q$ as

$$\Omega_R = \bigl\{\omega \in \Omega : \{R_k(1)(\omega), k \ge 1\} \text{ upcrosses } I_1 \text{ infinitely often}\bigr\}$$

and

$$\Omega_Q = \bigl\{\omega \in \Omega : \{Q_k(1)(\omega), k \ge 1\} \text{ visits } I_1^c\bigr\}.$$

Suppose $\Omega_R \cap \Omega_Q^c \ne \emptyset$. Fix $\omega_0 \in \Omega_R \cap \Omega_Q^c$. Note that

$$\Omega_Q^c = \bigl\{\omega \in \Omega : Q_k(1)(\omega) \in I_1, \text{ for all } k \ge 1\bigr\}.$$

We now show, by induction, that

$$R_k(1)(\omega_0) \le Q_k(1)(\omega_0), \quad \forall k \ge 0. \tag{4.43}$$

Since the initial urn 1 composition is the same for both processes, the above relation trivially holds when $k = 0$. Suppose (induction hypothesis) that for $k \ge 1$

$$R_i(1)(\omega_0) \le Q_i(1)(\omega_0), \quad \forall i \in \{0, \ldots, k\}.$$

There are three possible cases to be considered.

Case 1. $R_k(1)(\omega_0) \in I_1$.

In this case, (4.38), the induction hypothesis, the choice of $\omega_0 \in \Omega_Q^c$ and the definition of $h$ (see (4.40)) yield

$$f(V_{k+1}(\omega_0), R_k(1)(\omega_0), R_k(2)(\omega_0)) < R_k(1)(\omega_0) \le Q_k(1)(\omega_0) = h(V_{k+1}(\omega_0), Q_k(1)(\omega_0), Q_k(2)(\omega_0)).$$

Hence, $\tau_{f,A,k+1}(1)(\omega_0) \le \tau_{h,A,k+1}(1)(\omega_0)$ and, consequently,

$$R_{k+1}(1)(\omega_0) \le Q_{k+1}(1)(\omega_0).$$

Case 2. $R_k(1)(\omega_0) \notin I_1$ and $V_{k+1}(\omega_0) = 2$.
In this case, the definitions of $f$ and $h$ (see (4.10) and (4.40)), the choice of $\omega_0 \in \Omega_Q^c$ and the fact that if $r_1 \in I_1$ then $r_1 > r_2\,(p_A(2) - q_B(2)) + q_B(2)$, for any $r_2 \in [0, 1]$, yield

$$f(V_{k+1}(\omega_0), R_k(1)(\omega_0), R_k(2)(\omega_0)) = f(2, R_k(1)(\omega_0), R_k(2)(\omega_0)) = R_k(2)(\omega_0)\bigl(p_A(2) - q_B(2)\bigr) + q_B(2)$$
$$< Q_k(1)(\omega_0) = h(V_{k+1}(\omega_0), Q_k(1)(\omega_0), Q_k(2)(\omega_0)).$$

Hence, $\tau_{f,A,k+1}(1)(\omega_0) \le \tau_{h,A,k+1}(1)(\omega_0)$ and, consequently,

$$R_{k+1}(1)(\omega_0) \le Q_{k+1}(1)(\omega_0).$$

Case 3. $R_k(1)(\omega_0) \notin I_1$ and $V_{k+1}(\omega_0) = 1$.

The assumption $\beta_{1,1} \le \beta_{1,2}$ is used only to prove this part of the theorem. Since $R_k(1)(\omega_0) \notin I_1$ and $Q_k(1)(\omega_0) \in I_1$, the induction hypothesis implies that

$$R_k(1)(\omega_0) < Q_k(1)(\omega_0).$$

The number of A balls in urn 1 (for both the $f$ and $h$ urn 1 functions) is increased at each stage by either 0, $\beta_{1,1}$ or $\beta_{1,2}$. This together with the induction hypothesis yields

$$\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) R_k(1)(\omega_0) \le \bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) Q_k(1)(\omega_0) - \min\{\beta_{1,1}, \beta_{1,2}\}.$$

Hence, because $V_{k+1}(\omega_0) = 1$ and $\beta_{1,1} \le \beta_{1,2}$,

$$R_{k+1}(1)(\omega_0) = \frac{\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) R_k(1)(\omega_0) + \tau_{f,A,k+1}(1)(\omega_0)}{A_{k+1}(1)(\omega_0) + B_{k+1}(1)(\omega_0)}$$
$$\le \frac{\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) R_k(1)(\omega_0) + \beta_{1,1}}{A_{k+1}(1)(\omega_0) + B_{k+1}(1)(\omega_0)}$$
$$\le \frac{\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) Q_k(1)(\omega_0) - \min\{\beta_{1,1}, \beta_{1,2}\} + \beta_{1,1}}{A_{k+1}(1)(\omega_0) + B_{k+1}(1)(\omega_0)}$$
$$= \frac{\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) Q_k(1)(\omega_0) + 0}{A_{k+1}(1)(\omega_0) + B_{k+1}(1)(\omega_0)}$$
$$\le \frac{\bigl(A_k(1)(\omega_0) + B_k(1)(\omega_0)\bigr) Q_k(1)(\omega_0) + \tau_{h,A,k+1}(1)(\omega_0)}{A_{k+1}(1)(\omega_0) + B_{k+1}(1)(\omega_0)} = Q_{k+1}(1)(\omega_0).$$

So (4.43) follows by the induction principle.

We have proved that if $\omega_0 \in \Omega_Q^c$ then (4.43) holds. But then $\omega_0 \notin \Omega_R$. This contradicts the assumption $\Omega_R \cap \Omega_Q^c \ne \emptyset$. Hence, $\Omega_R \cap \Omega_Q^c = \emptyset$, i.e. $\Omega_R \subseteq \Omega_Q$. Now (4.41) and (4.42) imply that

$$1 = \lim_{n_0 \to \infty}\; \sup_{s_0 \in J_1} P_{(s_0, n_0)}(\Omega_R) \le \lim_{n_0 \to \infty}\; \sup_{s_0 \in J_1} P_{(s_0, n_0)}(\Omega_Q) = 0,$$

which is impossible. So assuming (4.37) yields a contradiction. Therefore (4.36) must hold. The result follows. ∎

Theorem 4.3.3 below allows us to relax the constraints imposed on the components of $B$ (in Theorem 4.3.2) and still ensure that the proportion of type
A balls in both urns converges almost surely.

Theorem 4.3.3 The following two conditions are equivalent.

(i) $R_k(1)$ converges almost surely as $k \to \infty$;
(ii) $R_k(2)$ converges almost surely as $k \to \infty$.

Proof. Suppose that there exists a random variable $R(1)$ such that

$$\lim_{k \to \infty} R_k(1) = R(1) \quad \text{a.s.} \tag{4.44}$$

Recall that $R_k(1) = A_k(1)/(A_k(1) + B_k(1))$. Lemma 4.3.1, (4.3), (4.4) and (4.5) imply that, for each $v^* \in V$,

$$\frac{A_k(v^*) + B_k(v^*)}{k} = \frac{A_0(v^*) + B_0(v^*)}{k} + \beta_{v^*,1}\, \frac{N_k(1)}{k} + \beta_{v^*,2}\, \frac{N_k(2)}{k} \;\xrightarrow[k \to \infty]{}\; \beta_{v^*,1}\, c(1) + \beta_{v^*,2}\, c(2) \quad \text{a.s.} \tag{4.45}$$

We can write

$$\frac{A_k(1)}{k} = \frac{A_0(1)}{k} + \beta_{1,1}\, \frac{S_{A,k}(1) + N_{B,k}(1) - S_{B,k}(1)}{k} + \beta_{1,2}\, \frac{S_{A,k}(2) + N_{B,k}(2) - S_{B,k}(2)}{k}. \tag{4.46}$$

By (4.44) and (4.45), $A_k(1)/k$ converges a.s.; Lemma 4.3.1, Assumption (4.7), (4.8), Theorem 4.3.1, (4.44) and algebra imply that $(S_{A,k}(1) + N_{B,k}(1) - S_{B,k}(1))/k$ converges a.s. Hence, by (4.46), $(S_{A,k}(2) + N_{B,k}(2) - S_{B,k}(2))/k$ also converges a.s. as $k \to \infty$. Since

$$\frac{A_k(2)}{k} = \frac{A_0(2)}{k} + \beta_{2,1}\, \frac{S_{A,k}(1) + N_{B,k}(1) - S_{B,k}(1)}{k} + \beta_{2,2}\, \frac{S_{A,k}(2) + N_{B,k}(2) - S_{B,k}(2)}{k},$$

$A_k(2)/k$ converges a.s. This together with (4.45) yields the almost sure convergence of $A_k(2)/(A_k(2) + B_k(2))$, as $k \to \infty$. A similar argument proves the other part of the theorem. ∎

By Theorem 4.3.2 and Theorem 4.3.3, a sufficient condition for the almost sure convergence of the proportion of type A balls in both urns is that either $1 \le \beta(1,1) \le \beta(1,2)$ or $1 \le \beta(2,2) \le \beta(2,1)$. In future work we will try to establish this result for $r > 2$ and matrices $B$ with one zero component in each row. Simulations (see Section 4.5) do suggest convergence of the proportion of type A balls in the urns for this type of matrix.

We conclude this section by giving an expression for the limiting proportion of patients allocated to treatment A within each of two strata.
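The limiting proportions $\xi(1)$ and $\xi(2)$ discussed in the remainder of this section solve a pair of linear equations, (4.51). The following Python sketch is only an illustration under stated assumptions: the function names `limiting_proportions` and `residual` are hypothetical, the closed form is the one obtained by solving (4.51) for $r = 2$, and the inputs are chosen for illustration. It evaluates the closed form and checks it against the balance equations.

```python
def limiting_proportions(B, c1, qA, qB):
    """Closed-form xi(1), xi(2) obtained by solving the 2x2 system (4.51).

    B  = ((b11, b12), (b21, b22)) is the matrix of beta's,
    c1 = c(1) is the probability of stratum 1,
    qA, qB = failure probabilities (stratum 1, stratum 2) of treatments A, B.
    """
    (b11, b12), (b21, b22) = B
    c2 = 1.0 - c1
    det = b11 * b22 - b12 * b21
    Q1, Q2 = qA[0] + qB[0], qA[1] + qB[1]
    s1 = b11 * c1 + b12 * c2      # limit of (A_k(1) + B_k(1)) / k
    s2 = b21 * c1 + b22 * c2      # limit of (A_k(2) + B_k(2)) / k
    common = s1 * b21 * c1 * qB[0] + s2 * b12 * c2 * qB[1]
    D = det * c1 * c2 * Q1 * Q2 + s1 * b21 * c1 * Q1 + s2 * b12 * c2 * Q2
    xi1 = (det * c1 * c2 * Q2 * qB[0] + common) / D
    xi2 = (det * c1 * c2 * Q1 * qB[1] + common) / D
    return xi1, xi2


def residual(B, c1, qA, qB, xi):
    """Left side minus right side of (4.51) for v = 1, 2 (should be ~0)."""
    c = (c1, 1.0 - c1)
    out = []
    for v in (0, 1):
        lhs = xi[v] * sum(B[v][u] * c[u] for u in (0, 1))
        rhs = sum(
            B[v][u] * c[u] * ((1.0 - qA[u]) * xi[u] + qB[u] * (1.0 - xi[u]))
            for u in (0, 1)
        )
        out.append(lhs - rhs)
    return out
```

For a diagonal $B$ (no crossover of information) the sketch reduces to $\xi(v) = q_B(v)/(q_A(v) + q_B(v))$ in each stratum, which is the limit one expects when each urn evolves on its own.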
So, suppose conditions on the components of the matrix $B$ are satisfied that ensure the existence of random variables $\xi(v)$ such that, for each $v \in V = \{1, 2\}$,

$$\lim_{k \to \infty} \frac{A_k(v)}{A_k(v) + B_k(v)} = \xi(v) \quad \text{a.s.} \tag{4.47}$$

Fix $v \in V$. Assumption (4.7) and (4.8) imply that

$$\lim_{k \to \infty} \frac{N_{A,k}(v)}{N_k(v)} = \xi(v) \quad \text{a.s.} \tag{4.48}$$

As in the proof of Theorem 4.3.3,

$$\lim_{k \to \infty} \frac{A_k(v) + B_k(v)}{k} = \beta_{v,1}\, c(1) + \beta_{v,2}\, (1 - c(1)) \quad \text{a.s.} \tag{4.49}$$

Theorem 4.3.1, Lemma 4.3.1, (4.48), (4.4) and algebra yield

$$\lim_{k \to \infty} \frac{A_k(v)}{k} = \beta_{v,1}\, c(1)\, \bigl\{p_A(1)\, \xi(1) + q_B(1)\, (1 - \xi(1))\bigr\} + \beta_{v,2}\, (1 - c(1))\, \bigl\{p_A(2)\, \xi(2) + q_B(2)\, (1 - \xi(2))\bigr\} \quad \text{a.s.} \tag{4.50}$$

Take the ratio of (4.50) and (4.49) and use (4.47) to get, for each $v \in V$,

$$\xi(v) = \bigl\{\beta_{v,1}\, c(1) + \beta_{v,2}\, (1 - c(1))\bigr\}^{-1} \Bigl[\beta_{v,1}\, c(1)\, \bigl\{p_A(1)\, \xi(1) + q_B(1)\, (1 - \xi(1))\bigr\} + \beta_{v,2}\, (1 - c(1))\, \bigl\{p_A(2)\, \xi(2) + q_B(2)\, (1 - \xi(2))\bigr\}\Bigr] \quad \text{a.s.} \tag{4.51}$$

Write (4.51) for $v = 1$ and for $v = 2$. Solve the resulting system of two equations for $\xi(1)$ and $\xi(2)$.

We have shown that, if the proportions of patients allocated to treatment A within strata 1 and 2 converge a.s., they converge a.s. to constants $\xi(1)$ and $\xi(2)$, respectively, which depend on $c(1)$, $B$ and the failure probabilities within strata. Explicit expressions for these limiting constants are given below. First, denote by $\det(B)$ the determinant of the matrix $B$ and let

$$D = \det(B)\, c(1)\, (1 - c(1))\, \bigl(q_A(1) + q_B(1)\bigr)\bigl(q_A(2) + q_B(2)\bigr) + \bigl\{\beta_{1,1}\, c(1) + \beta_{1,2}\, (1 - c(1))\bigr\}\, \beta_{2,1}\, c(1)\, \bigl(q_A(1) + q_B(1)\bigr) + \bigl\{\beta_{2,1}\, c(1) + \beta_{2,2}\, (1 - c(1))\bigr\}\, \beta_{1,2}\, (1 - c(1))\, \bigl(q_A(2) + q_B(2)\bigr).$$

Then

$$D \cdot \xi(1) = \det(B)\, c(1)\, (1 - c(1))\, \bigl(q_A(2) + q_B(2)\bigr)\, q_B(1) + \bigl\{\beta_{1,1}\, c(1) + \beta_{1,2}\, (1 - c(1))\bigr\}\, \beta_{2,1}\, c(1)\, q_B(1) + \bigl\{\beta_{2,1}\, c(1) + \beta_{2,2}\, (1 - c(1))\bigr\}\, \beta_{1,2}\, (1 - c(1))\, q_B(2)$$

and

$$D \cdot \xi(2) = \det(B)\, c(1)\, (1 - c(1))\, \bigl(q_A(1) + q_B(1)\bigr)\, q_B(2) + \bigl\{\beta_{1,1}\, c(1) + \beta_{1,2}\, (1 - c(1))\bigr\}\, \beta_{2,1}\, c(1)\, q_B(1) + \bigl\{\beta_{2,1}\, c(1) + \beta_{2,2}\, (1 - c(1))\bigr\}\, \beta_{1,2}\, (1 - c(1))\, q_B(2).$$

4.4 Central Limit Theorem

The arguments used to prove Theorem 3.4.1 would allow us to show that, if the proportion of type A balls in urn $v \in V = \{1, 2, \ldots, r\}$ converges almost surely as the trial size converges to $\infty$, then the strongly consistent estimators of the success probabilities $p_A(v)$ and $p_B(v)$ within stratum $v$ are asymptotically independent and normally distributed.

Theorem 4.4.1 For each $v \in V$, if $R_k(v)$ converges almost surely as $k \to \infty$ then

$$\begin{pmatrix} \sqrt{N_{A,k}(v)}\, \bigl(\hat{p}_{A,k}(v) - p_A(v)\bigr) \\ \sqrt{N_{B,k}(v)}\, \bigl(\hat{p}_{B,k}(v) - p_B(v)\bigr) \end{pmatrix} \xrightarrow{\;\mathcal{D}\;} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},\; \begin{pmatrix} p_A(v)\, q_A(v) & 0 \\ 0 & p_B(v)\, q_B(v) \end{pmatrix}\right)$$

as $k \to \infty$.

4.5 Evaluation of the Design

For the same reasons mentioned when studying the CAWD design, the CRPW rule is only evaluated in the case $r = 2$.

Although we have not yet been able to prove convergence results (to be more precise, Theorem 4.3.2) when at least one of the elements of the matrix $B$ is 0, simulations support that convergence holds in those cases. Simulations also suggest that the best ethical choices of $\beta$, according to the values of $q_A(v)$ and $q_B(v)$ ($v \in V$), correspond to cases where, in each row of $B$, one of the $\beta$'s is 0.

Let $B_s$ be the matrix corresponding to allocating patients according to the RPW rule within strata. Let $B_1$ be the matrix corresponding to allocating patients by using only information on the responses of patients previously treated in stratum 1 when allocating patients in any of the two strata. Finally, let $B_2$ be the matrix corresponding to allocating patients by using only information on the responses of patients previously treated in stratum 2 when allocating patients in any of the two strata.

For each $v \in V$, let $\Gamma(v) = q_A(v)/q_B(v)$. Simulations indicate that the best choices in terms of ethical allocation are as follows.

• If $(\Gamma(1) - 1)(\Gamma(2) - 1) \le 0$ then $B_s$ yields the best ethical choice.
• If $(\Gamma(1) - 1)(\Gamma(2) - 1) > 0$ and $|\Gamma(1) - 1| > |\Gamma(2) - 1|$ then $B_1$ yields the best ethical choice.
• If $(\Gamma(1) - 1)(\Gamma(2) - 1) > 0$ and $|\Gamma(1) - 1| < |\Gamma(2) - 1|$ then $B_2$ yields the best ethical choice.
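The three bullets above amount to a small decision rule on the ratios $q_A(v)/q_B(v)$. The sketch below is a hedged illustration only: the function name `best_ethical_choice` and the string labels are hypothetical, the rule is the simulation-based guideline stated above (not a proved result), and the tie case, which the bullets do not cover, returns `None`.

```python
def best_ethical_choice(qA, qB):
    """Apply the simulation-based guideline for choosing B.

    qA, qB: failure probabilities (stratum 1, stratum 2) of treatments
    A and B; both entries of qB must be positive.
    Returns "B_s", "B_1" or "B_2", or None when the guideline is silent.
    """
    d1 = qA[0] / qB[0] - 1.0   # ratio for stratum 1, minus 1
    d2 = qA[1] / qB[1] - 1.0   # ratio for stratum 2, minus 1
    if d1 * d2 <= 0:
        return "B_s"           # no crossover of information
    if abs(d1) > abs(d2):
        return "B_1"           # use only responses from stratum 1
    if abs(d1) < abs(d2):
        return "B_2"           # use only responses from stratum 2
    return None                # tie: not covered by the guideline


# The three sets of success probabilities used in the simulations of this
# section, written as (p_A(1), p_B(1), p_A(2), p_B(2)).
for pA1, pB1, pA2, pB2 in [(0.50, 0.10, 0.60, 0.85),
                           (0.65, 0.10, 0.40, 0.15),
                           (0.35, 0.10, 0.60, 0.15)]:
    choice = best_ethical_choice((1 - pA1, 1 - pA2), (1 - pB1, 1 - pB2))
    print((pA1, pB1, pA2, pB2), choice)
```

Each of the three parameter sets falls under a different bullet, so the simulations exercise all three cases of the guideline.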
Note that $(\Gamma(1) - 1)(\Gamma(2) - 1) < 0$ means that A is better than B for treating patients in stratum 1 (respectively 2) but B is better than A for treating patients in stratum 2 (respectively 1). So, similarly to what was proved for the CAWD design, we again observe that in this case, and from an ethical point of view, there should be no crossover of information from stratum to stratum.

Monte Carlo simulations were used to evaluate the CRPW rule from the individualistic point of view (how ethical are the assignments of patients to treatments?) and from the utilitarian point of view (how good are the estimators of the treatments difference within strata, $p_A(1) - p_B(1)$ and $p_A(2) - p_B(2)$?). These questions were addressed by looking, respectively, at the proportion of patients successfully treated and at the empirical mean squared error within strata obtained as a result of CRPW($C_0$, $B$) allocations.

Comparisons were made between complete randomization within strata and the CRPW rule with one ball of each type initially in each of the two urns, i.e. with all pairs in $C_0$ equal to $(\tfrac{1}{2}, 2)$, and matrices $B_s$, $B_1$ and $B_2$.

Note that $B_s$ and $B_1$ yield the same mean squared errors in stratum 1 and that $B_s$ and $B_2$ yield the same mean squared errors in stratum 2. Therefore, in the figures corresponding to mean squared errors within strata, only three designs are represented.

The following labels are used in Figures 4.1 through 4.18.

• r = complete randomization within strata;
• f = CRPW($C_0$, $B_1$);
• s = CRPW($C_0$, $B_2$);
• w = CRPW($C_0$, $B_s$) = RPW(1, 0, 1) rule within strata.

Figures 4.1 through 4.18 show the results of 10,000 replications of clinical trials with sample sizes $n = 30$ and $150$, success probabilities $(p_A(1), p_B(1), p_A(2), p_B(2)) = (0.35, 0.10, 0.60, 0.15)$, $(0.50, 0.10, 0.60, 0.85)$ and $(0.65, 0.10, 0.40, 0.15)$, and a range of values for $c(1)$ from 0.20 through 0.80.
For each of the allocation policies considered and each combination of values for $n$, $(p_A(1), p_B(1), p_A(2), p_B(2))$ and $c(1)$, the proportion of successes is computed as the average, over 10,000 replications, of the proportion of patients successfully treated in the simulated trial; the empirical mean squared error within stratum $v$ ($v \in \{1, 2\}$) is computed as the average, over 10,000 replications, of the squared difference between the estimate of $p_A(v) - p_B(v)$ and the parameter.

In Figures 4.1 through 4.6 the success probabilities satisfy $(\Gamma(1) - 1)(\Gamma(2) - 1) < 0$. Simulations show that the overall proportion of patients successfully treated is larger for the CRPW($C_0$, $B_s$) rule than for complete randomization within strata and for the CRPW rule with the other two choices of $B$ matrices. As seen in Figures 4.2, 4.3, 4.5 and 4.6, the CRPW($C_0$, $B_s$) rule has the best performance when estimating the treatment difference in stratum 1, but the worst when estimating the treatment difference in stratum 2.

In Figures 4.7 through 4.12 the success probabilities satisfy $(\Gamma(1) - 1)(\Gamma(2) - 1) > 0$ and $|\Gamma(1) - 1| > |\Gamma(2) - 1|$. Simulations show that the overall proportion of patients successfully treated is larger for the CRPW($C_0$, $B_1$) rule than for complete randomization within strata and for the CRPW rule with the other two choices of $B$ matrices. The best ethical choice for $B$ yields a good performance when estimating the treatment difference in stratum 1, both for small and large sample trials.

In Figures 4.13 through 4.18 the success probabilities satisfy $(\Gamma(1) - 1)(\Gamma(2) - 1) > 0$ and $|\Gamma(1) - 1| < |\Gamma(2) - 1|$. Simulations show that the overall proportion of patients successfully treated is larger for the CRPW($C_0$, $B_2$) rule than for complete randomization within strata and for the CRPW rule with the other two choices of $B$ matrices. The best ethical choice for $B$ yields a good performance when estimating the treatment difference in stratum 1, both for small and large sample trials.
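The two performance measures used in this evaluation (average proportion of successes and empirical mean squared error within strata) can be illustrated with a short Monte Carlo sketch. It covers only the baseline labelled r, complete randomization within strata, and not the CRPW rule itself. Several details are assumptions made for illustration and are not taken from the text: $c(1)$ is treated as the probability that an arriving patient belongs to stratum 1, $p_A(v)$ and $p_B(v)$ are estimated by sample proportions, and an estimate is set to 0.5 when a treatment group within a stratum is empty. The function name and argument layout are hypothetical.

```python
import random


def simulate_complete_randomization(n, p, c1, reps, seed=0):
    """Monte Carlo estimate of the two performance measures for the baseline design.

    n    : trial size
    p    : dict mapping (stratum, treatment) -> success probability
    c1   : assumed probability that a patient falls in stratum 1
    reps : number of simulated trials
    Returns (average proportion of successes, {stratum: empirical MSE}).
    """
    rng = random.Random(seed)
    keys = [(v, t) for v in (1, 2) for t in "AB"]
    total_prop = 0.0
    sq_err = {1: 0.0, 2: 0.0}

    for _ in range(reps):
        successes = dict.fromkeys(keys, 0)
        counts = dict.fromkeys(keys, 0)
        for _ in range(n):
            v = 1 if rng.random() < c1 else 2        # stratum of the patient
            t = "A" if rng.random() < 0.5 else "B"   # fair coin within stratum
            counts[v, t] += 1
            if rng.random() < p[v, t]:               # dichotomous response
                successes[v, t] += 1

        total_prop += sum(successes.values()) / n

        for v in (1, 2):
            est = {}
            for t in "AB":
                # Assumed convention: 0.5 when no patient received t in stratum v.
                est[t] = successes[v, t] / counts[v, t] if counts[v, t] else 0.5
            true_diff = p[v, "A"] - p[v, "B"]
            sq_err[v] += (est["A"] - est["B"] - true_diff) ** 2

    return total_prop / reps, {v: sq_err[v] / reps for v in (1, 2)}


if __name__ == "__main__":
    p = {(1, "A"): 0.50, (1, "B"): 0.10, (2, "A"): 0.60, (2, "B"): 0.85}
    prop, mse = simulate_complete_randomization(n=30, p=p, c1=0.5, reps=10_000)
    print(prop, mse)
```

Evaluating an adaptive rule in the same way would only require replacing the fair-coin assignment with the rule's allocation probability at each stage; the bookkeeping of successes and squared errors stays the same.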
Figure 4.1: Comparisons in terms of proportion of successes for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.2: Comparisons in terms of mean squared error in stratum 1 for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.3: Comparisons in terms of mean squared error in stratum 2 for $n = 30$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.4: Comparisons in terms of proportion of successes for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.5: Comparisons in terms of mean squared error in stratum 1 for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.6: Comparisons in terms of mean squared error in stratum 2 for $n = 150$, $p_A(1) = 0.50$, $p_B(1) = 0.10$, $p_A(2) = 0.60$ and $p_B(2) = 0.85$.

Figure 4.7: Comparisons in terms of proportion of successes for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.8: Comparisons in terms of mean squared error in stratum 1 for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.9: Comparisons in terms of mean squared error in stratum 2 for $n = 30$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.10: Comparisons in terms of proportion of successes for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.11: Comparisons in terms of mean squared error in stratum 1 for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.12: Comparisons in terms of mean squared error in stratum 2 for $n = 150$, $p_A(1) = 0.65$, $p_B(1) = 0.10$, $p_A(2) = 0.40$ and $p_B(2) = 0.15$.

Figure 4.13: Comparisons in terms of proportion of successes for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.

Figure 4.14: Comparisons in terms of mean squared error in stratum 1 for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.

Figure 4.15: Comparisons in terms of mean squared error in stratum 2 for $n = 30$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.
Figure 4.16: Comparisons in terms of proportion of successes for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.

Figure 4.17: Comparisons in terms of mean squared error in stratum 1 for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.

Figure 4.18: Comparisons in terms of mean squared error in stratum 2 for $n = 150$, $p_A(1) = 0.35$, $p_B(1) = 0.10$, $p_A(2) = 0.65$ and $p_B(2) = 0.15$.

APPENDIX

Appendix A

Theorem A.0.1 (Hall and Heyde (1980)) Let $\{M_k = \sum_{i=1}^{k} W_i, \mathcal{H}_k : k \ge 1\}$ be a martingale and $\{T_k : k \ge 1\}$ be a nondecreasing sequence of positive random variables such that $T_k$ is $\mathcal{H}_{k-1}$-measurable for each $k \ge 1$. Then
\[
\lim_{k \to \infty} \frac{M_k}{T_k} = 0 \quad \text{a.s.}
\]
on the set
\[
\left\{ \lim_{k \to \infty} T_k = \infty, \ \sum_{k=1}^{\infty} \frac{1}{T_k^2}\, E\bigl(W_k^2 \mid \mathcal{H}_{k-1}\bigr) < \infty \right\}.
\]

Proof. See Theorem 2.18 in Hall and Heyde (1980). $\blacksquare$

Theorem A.0.2 (Hall and Heyde (1980)) Let $\{M_{k,i}, \mathcal{H}_{k,i} : 1 \le i \le n_k, k \ge 1\}$ be a zero-mean and square integrable martingale array with differences $\{W_{k,i}\}$, and let $\eta^2$ be an almost surely finite random variable. Suppose that
\[
\max_{1 \le i \le n_k} |W_{k,i}| \xrightarrow{p} 0, \tag{A.1}
\]
\[
\sum_{i=1}^{n_k} W_{k,i}^2 \xrightarrow{p} \eta^2, \tag{A.2}
\]
\[
E\Bigl( \max_{1 \le i \le n_k} W_{k,i}^2 \Bigr) \text{ is bounded in } k, \tag{A.3}
\]
and the $\sigma$-algebras are nested:
\[
\mathcal{H}_{k,i} \subseteq \mathcal{H}_{k+1,i}, \quad \text{for } 1 \le i \le n_k, \ k \ge 1. \tag{A.4}
\]
Then $M_{k,n_k} = \sum_{i=1}^{n_k} W_{k,i} \xrightarrow{\mathcal{D}} N(0, \eta^2)$.

Proof.
See Theorem 3.2 in Hall and Heyde (1980). $\blacksquare$

Theorem A.0.3 (Hall and Heyde (1980)) Let $\{\sum_{i=1}^{k} W_i, \mathcal{H}_k : k \ge 1\}$ be a martingale and define, for each $k \ge 1$ and each $i = 1, \ldots, k$, $W_{k,i} = W_i\, I\{|W_i| \le k\}$. Suppose that, as $k \to \infty$,
\[
\sum_{i=1}^{k} P(|W_i| > k) \to 0, \tag{A.5}
\]
\[
\frac{1}{k} \sum_{i=1}^{k} E\bigl(W_{k,i} \mid \mathcal{H}_{i-1}\bigr) \xrightarrow{p} 0, \tag{A.6}
\]
and
\[
\frac{1}{k^2} \sum_{i=1}^{k} \Bigl\{ E W_{k,i}^2 - E\bigl[ E\bigl(W_{k,i} \mid \mathcal{H}_{i-1}\bigr) \bigr]^2 \Bigr\} \to 0. \tag{A.7}
\]
Then $k^{-1} \sum_{i=1}^{k} W_i \xrightarrow{p} 0$, as $k \to \infty$.

Proof. See Theorem 2.13 in Hall and Heyde (1980). $\blacksquare$

BIBLIOGRAPHY

Athreya, K. B. and S. Karlin (1967). Limit theorems for the split times of branching processes. J. Math. Mech. 17, 257–277.

Atkinson, A. C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors. Biometrika 69(1), 61–67.

Atkinson, A. C. (1998). Optimum experimental designs for chemical kinetics and clinical trials. In New developments and applications in experimental design, pp. 36–49. Institute of Mathematical Statistics.

Bai, Z. D. and F. Hu (1999). Asymptotic theorems for urn models with nonhomogeneous generating matrices. Stochastic Process. Appl. 80(1), 87–101.

Ball, F. G., A. F. M. Smith, and I. Verdinelli (1993). Biased coin designs with a Bayesian bias. J. Statist. Plann. Inference 34(3), 403–421.

Berry, D. A. and B. Fristedt (1985). Bandit problems. Chapman & Hall, London-New York. Sequential allocation of experiments.

Blackwell, D. and J. L. Hodges, Jr. (1957). Design for the control of selection bias. Ann. Math. Statist. 28, 449–460.

Clayton, M. K. (1989). Covariate models for Bernoulli bandits. Sequential Anal. 8(4), 405–426.

Cornell, R. G., B. D. Landenberger, and R. H. Bartlett (1986). Randomized play-the-winner clinical trials. Communications in Statistics, Part A - Theory and Methods 15, 159–178.

Efron, B. (1971). Forcing a sequential experiment to be balanced. Biometrika 58, 403–417.

Eisele, J. R. (1990). An adaptive biased coin design for the Behrens-Fisher problem. Sequential Anal. 9(4), 343–359 (1991).

Hall, P. and C. C. Heyde (1980).
Martingale limit theory and its application. New York: Academic Press Inc. [Harcourt Brace Jovanovich Publishers]. Probability and Mathematical Statistics.

Hill, B. M., D. Lane, and W. Sudderth (1980). A strong law for some generalized urn processes. Ann. Probab. 8(2), 214–226.

Pocock, S. J. and R. Simon (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31, 103–115.

Sarkar, J. (1991). One-armed bandit problems with covariates. Ann. Statist. 19(4), 1978–2002.

Sibson, R. (1974). DA-optimality and duality. pp. 677–692. Colloq. Math. Soc. János Bolyai, Vol. 9.

Silvey, S. D. (1980). Optimal design. London: Chapman & Hall. An introduction to the theory for parameter estimation, Monographs on Applied Probability and Statistics.

Tamura, R. N., D. E. Faries, J. S. Andersen, and J. H. Heiligenstein (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J. Amer. Statist. Assoc. 89(427), 768–776.

Wei, L. J. (1978). The adaptive biased coin design for sequential experiments. Ann. Statist. 6, 92–100.

Wei, L. J. and S. Durham (1978). The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc. 73(364), 840–843.

Woodroofe, M. (1979). A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74(368), 799–806.

Woodroofe, M. (1982). Sequential allocation with covariates. Sankhyā Ser. A 44(3), 403–414.

Wu, C.-F. (1981). Iterative construction of nearly balanced assignments. I. Categorical covariates. Technometrics 23(1), 37–44.

Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases 27, 365–375.