This is to certify that the dissertation entitled ADAPTIVE DESIGNS WITH COVARIATES presented by GEORGE SIRBU has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

ADAPTIVE DESIGNS WITH COVARIATES

By George Sirbu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2004

ABSTRACT

The potential benefits of adaptive allocation have been recognized as a great contribution, especially for clinical trials. Implementation of these allocation schemes eases the ethical problem involved in trials on human subjects. While response-adaptive randomization does not eliminate the ethical problem of randomizing patients to the inferior treatment, it can mitigate it by making the probability of assignment to the inferior treatment smaller. The relatively new techniques of response-adaptive randomization are attractive to industry and also to the Food and Drug Administration. An essential feature is to balance the conflict between information gathering (collective ethics) and immediate payoff (individual ethics). In this sense response-adaptive randomization represents a middle ground between community benefit and individual patient benefit.

In this dissertation we concentrate on designs with covariate information and response-adaptive randomization procedures.
In some studies it is desired to conduct an analysis that is 'adjusted' for other covariates. Stratification is one approach to adjust for covariates or to increase the efficiency by accounting for a highly influential covariate. This approach, however, is only applicable to qualitative covariates, or discretized quantitative covariates, and they must be few in number or else the number of strata grows exponentially. In many respects it is more natural to perform an adjustment using a regression model that allows for both qualitative and quantitative covariates simultaneously.

The problem consists of choosing, in a sequential manner, one of two treatments while we continuously observe the information from the process: the response and the covariates. We try to choose the best allocation while at the same time we learn from the process.

We are able to prove strong consistency and asymptotic normality for maximum quasi-likelihood estimators of regression parameters in generalized linear models with a condition of smoothness for the link function. The key for the applicability of the results is to have a martingale difference structure for the errors of the model and to have a matrix for the covariates that is compact enough, where the compactness condition is expressed in terms of the eigenvalues of the design matrix. The results are general enough to allow us to define the design in a large variety of situations.

For obvious ethical reasons, we also investigate what number of patients allocated to the inferior treatment we can expect from the design under some given underlying distributions for the covariate. Various Monte Carlo simulations are used to evaluate the performance of the design for suitable choices of the parameters.

To my dear wife Mihaela.

ACKNOWLEDGEMENT

I would like to express my gratitude to all those who gave me the possibility to complete this thesis. I am deeply indebted to my supervisors Dr. C. Page and V.
Melfi, whose help, stimulating suggestions and encouragement helped me throughout the research for and the writing of this thesis. I wish to thank the members of my committee, Dr. H. Salehi and R. Erickson, who have given me valuable suggestions.

It would have been impossible to have my research career without my parents' love and support, as well as my colleagues' and friends' encouragement. I appreciate all their friendships and their collective encouragement to finish this dissertation. Especially, I would like to give my special thanks to my wife Mihaela, whose patient love enabled me to overcome the obstacles through this journey.

Table of contents

List of Tables
List of Figures
1 Chapter 1
  1.1 Introduction
  1.2 A Short Review of Adaptive Designs
  1.3 Applications
2 Chapter 2
  2.1 Introduction
  2.2 The Model and Notation
  2.3 The Distribution of the Responses under Adaptive Design
    2.3.1 A counterexample
3 Chapter 3
  3.1 Introduction
  3.2 A Generalized Linear Model Relating the Response, the Adaptive Design and the Covariates
  3.3 A Consistency Result
  3.4 An Asymptotic Normality Result
4 Chapter 4
  4.1 Introduction
  4.2 A Normal Response Example
    4.2.1 An Evaluation Function for the Number of Patients on the Inferior Treatment
    4.2.2 Simulations of the Design
  4.3 Conclusions and Future Work
Appendix
Bibliography

List of Tables

4.1 Allocation probabilities to treatment A
4.2 Simulation results for $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4$, in the case n = 30
4.3 Simulation results for $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4$, in the case n = 50
4.4 Simulation results for $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4$, in the case n = 100
4.5 Simulation results for $C_{30}$, in the case n = 30
4.6 Simulation results for $C_{50}$, in the case n = 50
4.7 Simulation results for $C_{100}$, in the case n = 100
4.8 Simulation results for $C_{50}$, in the case n = 50
4.9 Simulation results for $C_{50}/40$, in the case n = 50
4.10 Simulation results for $C_{100}$, in the case n = 100
4.11 Simulation results for $C_{100}/90$, in the case n = 100

List of Figures

4.1 Graph of the function $f_1(y)$
4.2 Illustration of Corollary 4.2.2
4.3 The regression lines for treatment A and B
4.4 A simulation design
4.5 Histogram of 10000 replications for bias of $\hat\beta_1$ when n = 100, $\sigma$ = 0.1, 0.5, 1, 2
4.6 Histogram of 10000 replications for bias of $\hat\beta_2$ when n = 100, $\sigma$ = 0.1, 0.5, 1, 2
4.7 Histogram of 10000 replications for bias of $\hat\beta_3$ when n = 100, $\sigma$ = 0.1, 0.5, 1, 2
4.8 Histogram of 10000 replications for bias of $\hat\beta_4$ when n = 100, $\sigma$ = 0.1, 0.5, 1, 2
4.9 Histogram of 10000 replications for estimates of $C_{100}$ when n = 100, $\sigma$ = 0.1, 0.5, 1, 2

Chapter 1

Literature Review

1.1 Introduction

The current dissertation is motivated by the following scenario: in a clinical trial in which two drugs, A and B, are to be evaluated, a design is wanted such that the treatment assignment takes into consideration possible covariates and past responses. A design that incorporates in the allocation rule the information obtained from past observations is called an adaptive design.
We will review in this chapter some of the designs proposed in the literature, and we will see in later chapters how we can construct a design for the problem formulated above.

As usual in the case of a design problem, we want to balance two goals. On one hand we want the design to be 'ethical' in the sense that the allocation should be done toward the treatment that shows the best performance thus far in the trial (this can be thought of as an 'individualistic goal' as defined by Sarkar (1991)). On the other hand we want the design to allow us to draw reliable statistical inference from the data to be collected (this can be thought of as a 'utilitarian goal', as also defined by Sarkar (1991)). Unfortunately the utilitarian and individualistic goals are usually conflicting.

Adaptive designs may address both of the concerns raised above. The adaptiveness of the design reflects that the treatment allocation is based on responses available thus far in the trial. But the problem is more complicated here because we set our goal to make the design dependent on covariates too. As one may guess, the covariate information should make the design more 'ethical', but at the same time it is likely to complicate the statistical inference.

In the current dissertation we will develop and study a new design that will satisfy the goals described before. In the second chapter we will look at some general properties of the distribution of responses in the case of an adaptive design with covariates. In chapter three we will prove some asymptotic results for such designs in the case of a specific model for the responses. In the final chapter we will look at some particular designs and how to evaluate them from an 'ethical' point of view, and simulation results will be shown.

The dissertation work concentrates on designs where the response from the treatment depends on a vector of covariates.
Such designs will henceforth be referred to as covariate designs, as opposed to non-covariate designs, which do not take into consideration covariate information on the patients involved in the clinical trial. The other focus of the dissertation is on the adaptiveness of the design as described before. Following Rosenberger's (2002) classification, the designs where the allocation is done based on the previous treatment allocations, used primarily when a balanced allocation is desired, are called allocation-adaptive designs. The designs where the allocation is done based on the previous responses from the treatments, primarily used when ethical considerations are preferred, are called response-adaptive designs.

Before presenting and studying the new design we begin by reviewing some of the procedures that have already been proposed in the literature. This review can be regarded as an introduction and motivation for the new procedure proposed in this dissertation; it will be particularly useful in showing what the novel aspects of the new design are. It also introduces the designs that are to be used as basic points of reference for the new design.

1.2 A Short Review of Adaptive Designs

The main focus of this section is to review some of the adaptive designs available in the literature. Before we start to review some of these, it is important to acknowledge the importance of randomization in a clinical trial.

The randomized clinical trial guards against systematic bias. Another property of randomization is that it promotes comparability among the study groups. Such comparability can only be attempted in observational studies by adjusting for or matching on known covariates. Moreover, the act of randomization provides a probabilistic basis for an inference from the observed results.

In the remainder of this section, unless otherwise noted, it is assumed that patients arrive sequentially and that each patient can be assigned to exactly one of two treatments.
Many of these designs, however, can be modified to handle more than two treatments.

Complete randomization is a fair coin-tossing assignment: patients have equal probability of being assigned to either of the two treatments.

Sometimes it is desired that the final allocation be exactly equal between the treatments. When the total sample size, $n$, is known, a truncated binomial design [Blackwell and Hodges (1957)] will achieve the goal of equal allocation. The design is simply to use complete randomization until one treatment has been assigned $n/2$ times; all subsequent patients will receive the opposite treatment.

An alternative is the random allocation rule [Lachin (1988)]. Let $(t_k)_{k=1,\dots,n}$ be the sequence of random assignments (where $t_k = 1$ if the $k$th patient receives treatment A and $t_k = 0$ otherwise), and denote by $N_k^A$ the number of patients randomized to treatment A after $k$ patients have been allocated. The design is defined by

\[ P(t_k = 1 \mid t_1, \dots, t_{k-1}) = \frac{n/2 - N_{k-1}^A}{n - k + 1}. \]

The difference between the last two designs is that the treatment sequences will not be equiprobable for the truncated binomial design.

These last two designs can be embedded in a larger class of designs called restricted randomization designs. In this case the treatment assignments $(t_k)_{k=1,\dots,n}$ are dependent, with a variance-covariance matrix $\Sigma_t \neq I/4$. They are used when it is desired to have equal numbers of patients assigned to each treatment group.

The biased coin design proposed by Efron (1971) is a modification of complete randomization in the sense that it is a biased coin-tossing. Let $D_k = N_k^A - N_k^B$ be the difference between the number of patients who received treatment A and those who received treatment B after $k$ patients have been randomized. Fix a constant $p$ in the interval $[0.5, 1)$. Then the design is described by

\[ P(t_k = 1) = \begin{cases} 1 - p & \text{if } D_{k-1} > 0, \\ 1/2 & \text{if } D_{k-1} = 0, \\ p & \text{if } D_{k-1} < 0. \end{cases} \]
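As an illustration only, Efron's biased coin rule described above can be sketched in a few lines of Python. The function names `allocation_probability` and `efron_biased_coin`, the default value of `p`, and the use of a seeded generator are assumptions made for this sketch and do not come from the dissertation.

```python
import random


def allocation_probability(d_prev, p):
    """Probability of assigning treatment A given the imbalance D_{k-1}."""
    if d_prev > 0:
        return 1 - p   # A is ahead, so A is made less likely
    if d_prev < 0:
        return p       # B is ahead, so A is made more likely
    return 0.5         # balanced so far: fair coin


def efron_biased_coin(n, p=2 / 3, seed=None):
    """Simulate n assignments; 1 stands for treatment A and 0 for treatment B."""
    if not 0.5 <= p < 1:
        raise ValueError("p must lie in [0.5, 1)")
    rng = random.Random(seed)
    assignments = []
    d = 0  # running difference N^A - N^B
    for _ in range(n):
        t = 1 if rng.random() < allocation_probability(d, p) else 0
        assignments.append(t)
        d += 1 if t == 1 else -1
    return assignments
```

Calling `efron_biased_coin(20, p=0.75, seed=1)` returns a list of twenty 0/1 assignments; taking `p = 0.5` reduces the rule to complete randomization.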
Wei (1978) noted that a disadvantage of Efron's procedure is that, in assigning the next patient to a treatment, the allocation policy neither takes into consideration the number of patients treated thus far, nor discriminates between small and large absolute values of $D_k$. He proposed an adaptive biased coin design as follows: let $h : [-1, 1] \to [0, 1]$ be a non-increasing function such that $h(x) = 1 - h(-x)$ for all $x \in [-1, 1]$. Then

\[ P(t_k = 1) = h\left(\frac{D_{k-1}}{k-1}\right). \]

This allocation policy forces an extremely imbalanced experiment to be balanced but tends to complete randomization as the difference $D_k$ tends to zero.

All previous designs have in common that they are not response-adaptive, i.e. they do not depend on the observed responses from the trial. Their goal is to randomize the treatment while maintaining some balance. The next designs that we present are response-adaptive.

Response-adaptive randomization can be used when various considerations make it desirable to have unequal numbers of patients assigned to treatments. For example, it is very common in clinical trials to want more patients assigned to the superior treatment. Other examples are driven by optimality criteria, described below, which result in imbalanced allocation.

A well-known design of this kind is the randomized play-the-winner rule (RPW) introduced by Wei and Durham (1978). The design can be described with an urn model as follows. The urn initially contains $u$ balls of each of two types, corresponding to the two treatments. When a patient enters the study, a ball is randomly drawn from the urn and the treatment is assigned according to the ball type. The ball is returned to the urn and the response is observed. If a success has occurred, add $\beta$ balls of the ball type selected and $\alpha$ balls of the other type. If a failure has occurred, add $\alpha$ balls of the ball type selected and $\beta$ balls of the other type.
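The urn scheme just described can be sketched as a short simulation. This is a minimal illustration under stated assumptions: responses are taken to be Bernoulli with fixed success probabilities supplied by the caller, and the function name `randomized_play_the_winner` and its argument names are hypothetical.

```python
import random


def randomized_play_the_winner(success_prob, n, u=1, alpha=0, beta=1, seed=None):
    """Simulate the urn for n patients.

    success_prob maps "A" and "B" to assumed Bernoulli success probabilities.
    Returns the assignments, the observed responses and the final urn.
    """
    if u <= 0:
        raise ValueError("the urn must start with at least one ball of each type")
    rng = random.Random(seed)
    urn = {"A": u, "B": u}
    assignments, responses = [], []
    for _ in range(n):
        total = urn["A"] + urn["B"]
        # Draw a ball with probability proportional to the urn composition.
        drawn = "A" if rng.random() < urn["A"] / total else "B"
        other = "B" if drawn == "A" else "A"
        success = rng.random() < success_prob[drawn]
        if success:
            urn[drawn] += beta   # success: beta balls of the drawn type
            urn[other] += alpha  # and alpha balls of the other type
        else:
            urn[drawn] += alpha  # failure: alpha balls of the drawn type
            urn[other] += beta   # and beta balls of the other type
        assignments.append(drawn)
        responses.append(success)
    return assignments, responses, urn
```

With the defaults `u=1, alpha=0, beta=1`, each response adds exactly one ball, so the urn holds `2 + n` balls after `n` patients.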
The rule is denoted by RPW($u$, $\alpha$, $\beta$). Different choices for the pair $(\alpha, \beta)$ give different levels of compromise between balance and allocation to the better treatment. A design used in practice is the RPW(1,0,1). Later, many authors have modified and generalized the RPW to achieve less selection bias, less variability, or some other optimization criterion. We mention here, for example, the birth and death urn by Ivanova, Rosenberger, Durham and Flournoy (2000). A good source for a review of urn models is Rosenberger (2002).

So far the designs presented have intuitive motivation and can be completely non-parametric, but they are not derived from optimality considerations. Another approach for the response-adaptive design is based on optimal allocation targets, where a specific criterion is optimized based on a population response model.

The approach we take here is similar to Melfi, Page and Geraldes (2001). Let $\pi = \pi(\cdot, \cdot)$ be a function taking values in $(0, 1)$, and let $\theta^A$ and $\theta^B$ be two parameters for the distribution of the responses from treatment A, respectively B. Assume that there is an unknown but estimable optimal proportion $\pi = \pi(\theta^A, \theta^B)$ of the observations that should be allocated to treatment A. Let $\hat\pi_{k-1}$ be an estimate of $\pi$ based on the first $k - 1$ observations. Then the rule is simply to allocate according to the estimate of the optimal proportion:

\[ P(t_k = 1) = \hat\pi_{k-1}. \]

The optimality criterion can be defined in various ways. For example, if the responses are normally distributed with standard deviations $\sigma^A$, $\sigma^B$ and we want to minimize the variance of the difference of the sample means when the total sample size is fixed, then the optimal target is the proportion $\pi = \sigma^A / (\sigma^A + \sigma^B)$, and thus the parameters to be estimated at each step are $\theta^A = \sigma^A$ and $\theta^B = \sigma^B$.

Another criterion is given by Jennison and Turnbull (2000). Let $\mu_A, \mu_B$ be the mean responses from treatment A, respectively B, and let $\hat\mu_n^A, \hat\mu_n^B$ be their estimators after $n$ observations.
Consider $\pi = N_n^A / n$, where $N_n^A + N_n^B = n$. The optimal $\pi$ is found by minimizing the function $u(\mu_A, \mu_B) N_n^A + v(\mu_A, \mu_B) N_n^B$, with the variance of $\hat\mu_n^A - \hat\mu_n^B$ fixed. For example, if the responses are dichotomous, so that we can clearly define a success/failure, and we want to minimize the expected number of treatment failures, then we choose $u(\mu_A, \mu_B) = 1 - \mu_A$, $v(\mu_A, \mu_B) = 1 - \mu_B$, and in this case we obtain

\[ \pi = \frac{\sqrt{\mu_A}}{\sqrt{\mu_A} + \sqrt{\mu_B}}. \]

Melfi and Page (2000) give an elegant method for proving consistency and asymptotic normality of estimators for a wide class of response-adaptive designs. We will see in Chapter 2 and Chapter 3 how some results can be generalized to the case where covariates are incorporated in the design.

Most of the work done in the area of covariate adaptive designs involves the case when the covariates are categorical. Obviously continuous covariates can also be considered by grouping, in an appropriate way, the possible values of the covariates into finitely many levels. A general approach is to form strata with combinations of the categorical covariates and then apply the adaptive designs within each stratum. Both Efron (1971) and Wei (1978) mentioned in their papers the procedure to extend their designs to the covariate case.

Pocock and Simon (1975) suggest an allocation rule which can be viewed as a generalization of Efron's biased coin design to more than two treatments and several covariates. The design relies on a function $G$ which measures the total amount of imbalance (in the distribution of the treatment numbers within strata). Treatments are then ranked according to their $G$-values. Their procedure is shown to enable treatments to be balanced across strata more effectively. But as Pocock and Simon (1975) note, a major difficulty with this approach is that the number of strata increases rapidly as the number of covariates and their levels increase.

Geraldes (1999) proposes two new designs.
The first one is the covariate adaptive weighted differences design, which incorporates covariates and can be viewed as a generalization of the adaptive biased coin design of Wei (1978) by crossing-over information from responses of patients from stratum to stratum. The second one is the covariate randomized play-the-winner rule, which corresponds to a multiple urn model, each urn representing a stratum. It allows the responses of patients in one stratum to change the composition of the urns corresponding to the other strata.

Instead of concerns about balancing treatment assignments across strata, one can take an entirely different approach and find an allocation rule that optimizes some criterion, such as minimizing the variance of the estimated treatment effect in the presence of covariates. Such a rule would necessarily require the specification of a model linking the covariates and the treatment effect. As an example, Atkinson (1982) chooses a standard linear regression model:

\[ E(Y_i) = x_i' \beta, \quad i = 1, 2, \dots, n, \]

where the $Y_i$ are independent responses with $\mathrm{Var}(Y) = \sigma^2 I$ and $x_i$ includes a treatment indicator and selected covariates of interest. Then $\mathrm{Var}(\hat\beta) = \sigma^2 (X'X)^{-1}$, where $X'X$ is the dispersion matrix. For the construction of the optimal design, we wish to find the $n$ points of experimentation at which some function is optimized. The $D_A$-optimal design uses $D_A$-optimality, which maximizes $D_A = |A' M^{-1} A|^{-1}$, where $M = X'X/n$ and $A$ is a matrix of contrasts. He then proposes the following sequential design: assuming that $n$ patients have already been allocated, the $(n+1)$th patient is assigned to the treatment that maximizes $D_A(\xi_n)$ evaluated at the $n$-point design $\xi_n$. He also compares the $D_A$ criterion to D-optimality, which maximizes the log determinant of $M$.

Clayton (1989) proposes a covariate model for a problem with the responses of the patients governed by Bernoulli distributions. The model can be formally described as follows. Let $Y_k^A$ and $Y_k^B$
denote the potential dichotomous responses of the $k$th patient to treatment A and B, respectively. Before assigning patient $k$, a covariate $X_k$ can be observed on the patient. It is assumed that the covariate random variables $X_1, X_2, \dots$ are i.i.d. with a known distribution $F$. He then considers a function $H(\cdot)$ for linking the responses to the covariates:

\[ P(Y_k^A = 1 \mid X_k) = H(\alpha + \beta X_k), \qquad P(Y_k^B = 1 \mid X_k) = H(c + d X_k), \]

where $\alpha$ and $\beta$ are unknown constants and $c$ and $d$ are known constants. Following a Bayesian approach, he assumes that prior information regarding $\alpha$ and $\beta$ is available, given by a probability distribution. The worth of an allocation strategy is defined as the expected sum of the first $n$ observations for all possible histories resulting from the allocation policy. A strategy is called optimal if it yields the maximal worth. The research focuses on the determination of the structural characteristics of exactly optimal strategies.

Yang and Zhu (2002) generalize the problem to the case when the responses are continuous, and they propose a design for which they are able to prove that the strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best treatment. Their approach employs nonparametric regression procedures for estimating the dependence of the rewards on the covariates for the treatments. It uses a randomized allocation scheme to control the trade-off between the tendency to use the currently most promising treatment and further exploration to find the treatment that is truly the best.

Lai and Wei (1982) consider the multiple regression model:

\[ y_n = \beta_1 x_{n1} + \dots + \beta_p x_{np} + \epsilon_n, \quad n = 1, 2, \dots \]

where the $\epsilon_n$ are unobservable random errors, $\beta_1, \dots, \beta_p$ are unknown parameters and $y_n$ is the observed response corresponding to the design levels $x_{n1}, \dots, x_{np}$. Let $b_n$ be the least squares estimator of $\beta = (\beta_1, \dots, \beta_p)$.
The statistical properties of the estimators $b_n$ are well understood in the case where the design levels $x_{ij}$ are non-random constants, but there is a much less definitive theory for the case where the covariate vectors $x_n = (x_{n1}, \dots, x_{np})$ are sequentially determined random vectors. They consider an adaptive design in the sense that $x_n$ at each stage $n$ depends on the previous observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$. They give sufficient conditions to prove strong consistency and asymptotic normality for $b_n$. They also give an interesting counterexample to show that their conditions are in some sense the weakest possible. Here is the counterexample:

\[ y_i = \beta_1 + \beta_2 x_i + \epsilon_i, \]

where $\epsilon_1, \epsilon_2, \dots$ are i.i.d. random variables with $E\epsilon_1 = 0$, $E\epsilon_1^2 = 1$. The regressors are defined inductively by:

\[ x_1 = 0, \qquad x_{n+1} = \bar{x}_n + c\,\bar{\epsilon}_n, \]

where $c \neq 0$ is a real constant and $\bar{x}_n$, $\bar{\epsilon}_n$ denote the arithmetic means. Then it can be proved that the assumptions are violated and the least squares estimate $b_{n2}$ of $\beta_2$ converges a.s. to $\beta_2 - c^{-1}$.

Chen, Hu and Ying (1999) consider a similar problem in the context of the generalized linear model. The key difference is obviously the nonlinearity in the link function. They handle this by establishing a local inverse function theorem. In doing so, they show that the minimum conditions of Lai and Wei (1982), in conjunction with an additional assumption, essentially to offset the nonlinearity, ensure strong consistency in their case, too.

We will consider in Chapter 3 a generalization of these last two designs. In their case the adaptiveness refers to the dependence of the covariate vector $x_n$ on the observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$. In our case we will consider the problem when the vector $x_n$ has two components: a treatment indicator $t_n$ and a vector of patient covariates. Then the choice of the treatment, $t_n$, is made based not only on the previous observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$, but also on the values observed from the current patient covariate.
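For readers who wish to experiment with the Lai and Wei (1982) counterexample recalled earlier in this section, the following sketch generates the regressors from the stated recursion and computes the least squares slope. It is only an illustration: the function names are hypothetical, and the choice of standard normal errors is an assumption (the counterexample only requires i.i.d. errors with mean 0 and variance 1). A single simulated run need not be close to the limiting value, since convergence may be slow.

```python
import random


def lai_wei_regressors(errors, c):
    """Regressors with x_1 = 0 and x_{k+1} = mean(x_1..x_k) + c * mean(eps_1..eps_k)."""
    if c == 0:
        raise ValueError("c must be non-zero")
    x = [0.0]
    sum_x, sum_eps = 0.0, 0.0
    for k in range(1, len(errors)):
        sum_x += x[k - 1]
        sum_eps += errors[k - 1]
        x.append(sum_x / k + c * sum_eps / k)
    return x


def least_squares_slope(x, y):
    """Slope of the simple linear regression of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_slope(n, beta1, beta2, c, seed=None):
    """One simulated value of the slope estimate under assumed N(0, 1) errors."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = lai_wei_regressors(errors, c)
    y = [beta1 + beta2 * xi + e for xi, e in zip(x, errors)]
    return least_squares_slope(x, y)
```

Comparing `simulate_slope(n, beta1, beta2, c)` for increasing `n` with both `beta2` and `beta2 - 1 / c` gives an informal view of the inconsistency.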
1.3 Applications

Medical investigators who wish to perform clinical trials currently have a wide variety of adaptive allocation procedures at their disposal. Still, very few clinical trials based on adaptive designs have been reported in the literature. We mention some of them here.

Cornell, Landenberger and Bartlett (1986) reported on an adaptive clinical trial to test the efficacy of extracorporeal membrane oxygenation (ECMO) for the treatment of persistent pulmonary hypertension of newborn infants. The design used was RPW(1,0,1). The trial was stopped, according to a stopping rule, after a total of 12 patients, with 1 patient allocated to the conventional therapy and the rest to the ECMO treatment. The subsequent analysis of the ECMO trial data generated controversy; the foremost question raised was whether two treatments can adequately be compared when only one patient was assigned to one of the treatments. Much of the criticism of adaptive designs has centered on this trial, and this is unfortunate because this is exactly the type of trial where response-adaptive randomization would be particularly advantageous. It is known that the RPW(1,0,1) is highly variable, particularly when $p_A + p_B > 1.5$, which was the case for the ECMO trial, and the variance depends on the initial composition of the urn. In retrospect, starting with more than one ball of each type should have resulted in a more balanced trial.

Another trial was the fluoxetine trial reported by Tamura, Faries, Andersen and Heiligenstein (1994). They used an RPW(1,0,1) design in a clinical trial of fluoxetine versus placebo for depressive disorder. The trial was stratified by normal and shortened rapid eye movement latency (REML), so two urns were used in randomization. In order to avoid the problem encountered in the ECMO trial, the first six patients in each stratum were assigned using a permuted block design.
The trial was stopped after 61 patients had responded in accordance with a surrogate criterion. The trial randomized a total of 89 patients: 21 fluoxetine patients and 20 placebo patients in the shortened REML stratum, 21 fluoxetine and 21 placebo patients in the normal REML stratum. The primary outcome was analyzed using Monte Carlo randomization-based analysis.

Rosenberger (1999) discusses conditions under which the use of response-adaptive randomization is reasonable. We note some of them here:

• The treatments have been evaluated previously for toxicity. This is important to ensure that the response-adaptive randomization does not place more patients on a highly toxic treatment.
• Delay in response is moderate, allowing the adaptation to take place.
• Duration of the trial is limited and recruitment can take place during most or all of the trial.
• Modest gain in terms of treatment successes is desirable from an ethical standpoint.
• The experimental therapy is expected to have significant benefits to the public health.

The choice of a design should be driven by the simplicity of implementation but also by its statistical properties and the nature of the clinical trial. We hope that the new design proposed in this dissertation can be successfully used in some future clinical trials.

Chapter 2

Adaptive Designs with Covariates: Model and Marginal Distributions

2.1 Introduction

In this chapter we introduce the problem of adaptive designs with covariates. We will look only at the response distributions without assuming any parametric model relating the response to the covariates, and in this sense a wide class of examples is covered.

Assume that we are in the case of clinical trials where the patients are randomly assigned, one at a time, to one of the two treatments according to a response-adaptive rule. We assume that we work under the following conditions:

• Patients arrive sequentially.
• Each patient can be assigned to either one of the two treatments and will be assigned to exactly one of them.
• The response from patients is observed immediately, prior to the assignment of the treatment to the next patient.
• The response depends on a vector of covariates. The covariates are observed prior to the treatment assignment.

Our goal is to show how a design for a response-adaptive allocation, where the treatment allocation depends on the previous responses and the current covariates of the patient, influences the distribution of the responses.

In contrast to the case when the adaptive design is without covariates, the new setup introduces a more complex dependence structure in the following sense: the response is dependent on the vector of the covariates and the treatment allocation, and at the same time the treatment allocation is dependent on the previous responses. Of course, when the covariates are ignored in the design, the first dependence that we described is not present.

2.2 The Model and Notation

In this section we introduce the notation for the model we will use through the rest of the chapter. As we described before, suppose that patients arrive sequentially and they are allocated to one of two treatments, say A or B. For each $k \ge 1$, define $t_k$ to be 1 or 0 according to whether the $k$th patient is assigned to treatment A, respectively treatment B:

\[ t_k = \begin{cases} 1 & \text{if the } k\text{th patient is assigned to treatment A}, \\ 0 & \text{if the } k\text{th patient is assigned to treatment B}. \end{cases} \]

Let $Y_k^A$, $Y_k^B$ denote the potential response of the $k$th patient for treatment A, respectively B. Recall from our initial assumptions that, for all $k \ge 1$, exactly one of the pair $(Y_k^A, Y_k^B)$ is actually observed.

Let $X_k = (X_{k1}, \dots, X_{kp})$ denote the $p$-dimensional covariate vector of the $k$th patient. Our goal is to have an adaptive design where the responses can be dependent on covariates. Thus the distribution for $Y_k^A$, $Y_k^B$ can be conditioned on $X_k$.
We consider then the following conditional distributions: 2.1 YAXk=x~FA k 1: Note that we do not assume or require any independence structure between Y,;4 and YkB. In fact there may be a strong correlation between the responses observed from treatment A and B, since they are Observed from the same patient. In most cases the allocation is randomized. It is useful then to let (Uk)k21 denote a sequence Of i.i.d. random variables whose common distribution is uniform on (0, 1) and is independent Of the other variables in the model (11?, K? , X k). Then the allocation is defined as: (2.2) tk = [[Uk < 7%] where the I [] denotes the indicator function and in, is some number. We will see later in chapter 4 how we can define in, in a suitable way. We use the index k for in, to emphasize that in, is computed based on the covariate Xk, but we have to keep in mind that in fact in, is computed based only on the information from the first It — 1 patients and the current covariate X 1,. Assume now the existence of an increasing sequence of a -algebras, {7-7,th such that the triplet (YkA, YkB , X k, tk, Uk) is f), -measurable. We can define them as: (2.3) f]; I: 0(YiA,},,-B,Xi,ti, U1 3 1 S 2 S k) . Think of the o -algebra .73}, as all possible information available after k patients have 16 been allocated and their responses and covariates Observed. Consider the increasing sequence of a -algebras {91.} 1:21 defined as follows: (2.4) 9,, Z=ka0'(Xk+1)V0'(Uk+1) . We have to define a new a -algebra so that the treatment indicator variable becomes measurable. In the view of our goal stated in the beginning of the chapter, think Of the o -algebra 9;, as the information to use for allocating the treatment for the (k+ 1)“ patient. Therefore we will make use of g], as the o -algebra with respect to which tk+1 becomes measurable. To understand better, Observe what happens without adding 0(Xk+1) to the filtration .75}. V 0(Uk+1). 
If we require only that $t_{k+1}$ is measurable with respect to $\mathcal{F}_k \vee \sigma(U_{k+1})$, as in the usual case of adaptive designs without covariates, then the treatment allocation depends only on the previous responses and does not take into consideration the covariate value of the current patient. This was not our stated goal, so the definition of the new $\sigma$-algebra plays a crucial role in the following developments.

It is also useful to define the $\sigma$-algebra $\mathcal{J}_k$:

$$\mathcal{J}_k := \mathcal{F}_k \vee \sigma(X_{k+1}). \tag{2.5}$$

Observe from the definition of the sequence of uniform variables $(U_k)_{k \ge 1}$ that $U_{k+1}$ is independent of the $\sigma$-algebra $\mathcal{J}_k$. Then we have that

$$E(t_{k+1} \mid \mathcal{J}_k) = E(I[U_{k+1} < \hat\pi_{k+1}] \mid \mathcal{J}_k) = \hat\pi_{k+1}. \tag{2.6}$$

Let $\tau_k$, $\nu_k$ be the random variables which give the stage at which the $k$th observation from treatment A, respectively B, is taken. More explicitly, after the allocation is done we observe from treatment A the sequence of observations $Y^A_{\tau_1}, Y^A_{\tau_2}, \ldots$ and from treatment B the sequence $Y^B_{\nu_1}, Y^B_{\nu_2}, \ldots$ Note that specifying the sequence $\tau_1, \nu_1, \tau_2, \nu_2, \ldots$ is equivalent to giving the sequence of treatment indicators $t_1, t_2, \ldots$. Indeed, $\tau_i$, $\nu_i$ can be defined inductively as

$$\tau_0 = 0, \qquad \tau_i = \inf\{k > \tau_{i-1} \mid t_k = 1\}, \quad \forall i \ge 1,$$
$$\nu_0 = 0, \qquad \nu_i = \inf\{k > \nu_{i-1} \mid t_k = 0\}, \quad \forall i \ge 1,$$

and conversely,

$$t_{\tau_i} = 1, \quad \forall i \ge 1, \qquad t_{\nu_i} = 0, \quad \forall i \ge 1.$$

The design condition that the treatment allocation for the $k$th patient is based on the responses from the first $k - 1$ patients and the covariate of the $k$th patient can be expressed as follows:

$$\{\tau_i = k\} \in \mathcal{G}_{k-1}, \qquad \{\nu_i = k\} \in \mathcal{G}_{k-1}, \qquad \forall i, k \ge 1 \tag{2.7}$$

or equivalently,

$$t_k \in \mathcal{G}_{k-1}. \tag{2.8}$$

2.3 The Distribution of the Responses under Adaptive Design

We will show in this section that although the covariates can introduce a complicated dependence structure in the adaptive design, the marginal distribution of the responses is unaffected.

Theorem 2.1 from Melfi and Page (2000) says that in the case when the covariates are not considered in the model and as long as the pair $(Y_k^A, Y_k^B)$
is independent of the $\sigma$-algebra $\mathcal{F}_{k-1}$, the allocated sequence inherits the distribution and independence structure of the original sequence. In our case we do not get the independence, but we can show that the original distribution is inherited by the allocated sequence.

Theorem 2.3.1. Let $(Y_k^A, Y_k^B, X_k)_{k \ge 1}$ be a sequence of i.i.d. random vectors. Assume that the conditional distributions of $Y_k^A$, $Y_k^B$ are given by equation (2.1) and that $X_k$ is a discrete random vector. Assume that there is a filtration $\{\mathcal{F}_k\}_{k \ge 1}$ such that $(Y_k^A, Y_k^B, X_k)$ is $\mathcal{F}_k$-measurable for all $k \ge 1$. Let $\{\mathcal{G}_k\}_{k \ge 1}$ be the filtration introduced by equation (2.4). Assume that $(Y_k^A, Y_k^B)$ and $\mathcal{G}_{k-1}$ are conditionally independent given $X_k$. Let $\{\tau_k\}_{k \ge 1}$ and $\{\nu_k\}_{k \ge 1}$ be two sequences of positive, increasing, integer-valued, a.s. finite random variables which satisfy condition (2.7) and also satisfy

$$P(\tau_i = \nu_j) = 0, \quad \forall i, j \ge 1 \tag{2.9}$$

$$\forall k \ge 1, \ \exists i \text{ such that } \tau_i = k \text{ or } \nu_i = k. \tag{2.10}$$

Then the conditional distributions of the allocated sequence are given by the same original distributions:

$$Y^A_{\tau_i} \mid X_{\tau_i} = x \sim F_x^A, \quad \forall i \ge 1, \qquad Y^B_{\nu_i} \mid X_{\nu_i} = x \sim F_x^B, \quad \forall i \ge 1. \tag{2.11}$$

Before we proceed to the proof of the theorem we remark that condition (2.9) expresses the initial assumption that a patient can receive only one treatment, and condition (2.10) excludes the trivial case when some patients do not receive any treatment at all.

Proof. We prove the result only for $Y^A_{\tau_i}$; the proof for $Y^B_{\nu_i}$ is similar. We want to prove that for any measurable set $D$,

$$P(Y^A_{\tau_i} \in D \mid X_{\tau_i} = x) = F_x^A(D).$$

We have that

$$P(Y^A_{\tau_i} \in D \mid X_{\tau_i} = x) = \sum_{m \ge 1} P(Y^A_{\tau_i} \in D, \tau_i = m \mid X_{\tau_i} = x).$$

Recall from condition (2.7) that $\{\tau_i = m\} \in \mathcal{G}_{m-1}$ for all $i, m \ge 1$. Using the hypothesis that $Y^A_m$ and $\mathcal{G}_{m-1}$ are conditionally independent given $X_m$ we obtain that

$$P(Y^A_m \in D, \tau_i = m \mid X_m = x) = P(Y^A_m \in D \mid X_m = x)\, P(\tau_i = m \mid X_m = x).$$

Then, in the case when $P(X_{\tau_i} = x) > 0$,

$$
\begin{aligned}
P(Y^A_{\tau_i} \in D, \tau_i = m \mid X_{\tau_i} = x)
&= \frac{P(Y^A_{\tau_i} \in D, \tau_i = m, X_{\tau_i} = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D, \tau_i = m, X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D, \tau_i = m \mid X_m = x)\, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D \mid X_m = x)\, P(\tau_i = m \mid X_m = x)\, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D)\, P(\tau_i = m \mid X_m = x)\, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D)\, P(\tau_i = m, X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D)\, P(\tau_i = m, X_{\tau_i} = x)}{P(X_{\tau_i} = x)} \\
&= F_x^A(D)\, P(\tau_i = m \mid X_{\tau_i} = x).
\end{aligned}
$$

Combining these relations, summing over $m \ge 1$ and using $\sum_{m \ge 1} P(\tau_i = m \mid X_{\tau_i} = x) = 1$, we indeed get the required equality. □

2.3.1 A counterexample

We show in this section why the independence structure of the covariates is not generally preserved under an adaptive design. For this we look at a counterexample: an adaptive design satisfying the properties required at the beginning of the chapter, but for which the sequence of the observed covariates is not independent. More precisely, keeping the same notation as before, we will show that the sequence $X_{\tau_1}, X_{\tau_2}, \ldots$ is not independent, and similarly for $X_{\nu_1}, X_{\nu_2}, \ldots$.

Assume the following:

• $X_1, X_2, \ldots$ are i.i.d. Bernoulli with $P(X_1 = 1) = P(X_1 = 0) = 0.5$.

• The responses have the following distributions:

$$P(Y_k^A = 1 \mid X_k = 1) = 1 - P(Y_k^A = 0 \mid X_k = 1) = 0.75$$
$$P(Y_k^A = 1 \mid X_k = 0) = 1 - P(Y_k^A = 0 \mid X_k = 0) = 0.25$$
$$P(Y_k^B = 1 \mid X_k = 1) = 1 - P(Y_k^B = 0 \mid X_k = 1) = 0.25$$
$$P(Y_k^B = 1 \mid X_k = 0) = 1 - P(Y_k^B = 0 \mid X_k = 0) = 0.75$$

We associate the value 1 for the response with a successful treatment.

• Assume the following iterative design:

$$P(t_1 = 1) = P(t_1 = 0) = 0.5$$

(in the first stage we assume the equipoise state, so there is no preferred treatment and the covariate is ignored).

At each stage $k \ge 2$ let $N_k^A = \sum_{i=1}^{k} t_i$ be the number of allocations to treatment A up to stage $k$. Let $k^- = \tau_{N^A_{k-1}} = \max\{i : 1 \le i < k, \ t_i = 1\}$ be the most recent stage before $k$ at which treatment A was allocated. If $N^A_{k-1} = 0$, no patient has yet received treatment A and the allocation is made with probability 0.5, as in the first stage (this is the value used in the computations below). If $N^A_{k-1} > 0$, allocate according to the following rule:

$$
P(t_k = 1 \mid X_k = 1) = 1 - P(t_k = 0 \mid X_k = 1) =
\begin{cases}
0.9 & \text{if } X_{k^-} = 1, \ Y_{k^-} = 1 \\
0.3 & \text{if } X_{k^-} = 1, \ Y_{k^-} = 0 \\
0.7 & \text{if } X_{k^-} = 0, \ Y_{k^-} = 1 \\
0.4 & \text{if } X_{k^-} = 0, \ Y_{k^-} = 0
\end{cases}
$$

$$
P(t_k = 1 \mid X_k = 0) = 1 - P(t_k = 0 \mid X_k = 0) =
\begin{cases}
0.7 & \text{if } X_{k^-} = 1, \ Y_{k^-} = 1 \\
0.4 & \text{if } X_{k^-} = 1, \ Y_{k^-} = 0 \\
0.9 & \text{if } X_{k^-} = 0, \ Y_{k^-} = 1 \\
0.3 & \text{if } X_{k^-} = 0, \ Y_{k^-} = 0
\end{cases}
$$

We now show that $X_{\tau_1}, X_{\tau_2}, \ldots$ are not independent.
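$$
\begin{aligned}
P(X_{\tau_1} = 1) &= \sum_{k \ge 1} P(X_{\tau_1} = 1, \tau_1 = k) \\
&= \sum_{k \ge 1} P(X_k = 1, t_1 = 0, \ldots, t_{k-1} = 0, t_k = 1) \\
&= \sum_{k \ge 1} P(t_k = 1 \mid X_k = 1, t_1 = 0, \ldots, t_{k-1} = 0) \cdot P(X_k = 1) \cdot P(t_1 = 0, \ldots, t_{k-1} = 0) \\
&= \sum_{k \ge 1} \frac{1}{2} \cdot \frac{1}{2} \cdot P(t_1 = 0, \ldots, t_{k-1} = 0)
\end{aligned}
$$

and similarly,

$$
\begin{aligned}
P(X_{\tau_1} = 0) &= \sum_{k \ge 1} P(X_{\tau_1} = 0, \tau_1 = k) \\
&= \sum_{k \ge 1} P(X_k = 0, t_1 = 0, \ldots, t_{k-1} = 0, t_k = 1) \\
&= \sum_{k \ge 1} P(t_k = 1 \mid X_k = 0, t_1 = 0, \ldots, t_{k-1} = 0) \cdot P(X_k = 0) \cdot P(t_1 = 0, \ldots, t_{k-1} = 0) \\
&= \sum_{k \ge 1} \frac{1}{2} \cdot \frac{1}{2} \cdot P(t_1 = 0, \ldots, t_{k-1} = 0).
\end{aligned}
$$

Since $P(X_{\tau_1} = 1) + P(X_{\tau_1} = 0) = 1$ we obtain that

$$P(X_{\tau_1} = 1) = P(X_{\tau_1} = 0) = 0.5.$$

On the other hand,

$$
\begin{aligned}
P(X_{\tau_2} = 1) &= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_{\tau_2} = 1, \tau_2 = k, \tau_1 = j) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\triangleq \sum_{k \ge 1} \sum_{k > j \ge 1} P(A_k)
\end{aligned}
$$

where $A_k$ denotes the event

$$A_k = \{X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0\}.$$

Let $B_{11,j} = \{Y_j = 1, X_j = 1\}$, $B_{10,j} = \{Y_j = 1, X_j = 0\}$, $B_{01,j} = \{Y_j = 0, X_j = 1\}$ and $B_{00,j} = \{Y_j = 0, X_j = 0\}$. Then:

$$P(X_{\tau_2} = 1) = \sum_{k \ge 1} \sum_{k > j \ge 1} \sum_{a = 0,1} \sum_{b = 0,1} P(A_k \cap B_{ab,j}) = I_{11} + I_{10} + I_{01} + I_{00}.$$

We show the computation only for the $(k, j)$ term of the first summand $I_{11}$:

$$
\begin{aligned}
&P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\quad = P(t_k = 1 \mid X_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(X_k = 1) \\
&\qquad \cdot P(t_{k-1} = 0, \ldots, t_{j+1} = 0 \mid t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(Y_j = 1 \mid t_j = 1, X_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(t_j = 1 \mid X_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(X_j = 1) \cdot P(t_{j-1} = 0, \ldots, t_1 = 0) \\
&\quad = 0.9 \cdot 0.5 \cdot (0.1)^{k-j-1} \cdot 0.75 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}.
\end{aligned}
$$

In a similar way the $(k, j)$ terms of the other summands are:

$$
\begin{aligned}
I_{10}&: \quad 0.7 \cdot 0.5 \cdot (0.3)^{k-j-1} \cdot 0.25 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1} \\
I_{01}&: \quad 0.3 \cdot 0.5 \cdot (0.7)^{k-j-1} \cdot 0.25 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1} \\
I_{00}&: \quad 0.4 \cdot 0.5 \cdot (0.6)^{k-j-1} \cdot 0.75 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}
\end{aligned}
$$

We can also observe that:

$$
\begin{aligned}
P(X_{\tau_2} = 1, X_{\tau_1} = 1) &= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_k = 1, \tau_2 = k, X_j = 1, \tau_1 = j) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, X_j = 1, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(A_k \cap \{X_j = 1\}) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} \left[ P(A_k \cap \{Y_j = 1\} \cap \{X_j = 1\}) + P(A_k \cap \{Y_j = 0\} \cap \{X_j = 1\}) \right] \\
&= I_{11} + I_{01}.
\end{aligned}
$$

Since it is clear from the above relations that $I_{11} + I_{01} \ne I_{00} + I_{10}$, it follows that

$$P(X_{\tau_1} = 1)\, P(X_{\tau_2} = 1) = \frac{1}{2}\left(I_{11} + I_{01} + I_{10} + I_{00}\right) \ne \frac{1}{2} \cdot 2\left(I_{11} + I_{01}\right) = P(X_{\tau_2} = 1, X_{\tau_1} = 1),$$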
which proves that the sequence $(X_{\tau_k})_{k \ge 1}$ is not independent.

Chapter 3

Asymptotic Results for Adaptive Designs with Covariates

3.1 Introduction

In this section we introduce a specific parametric model for the problem formulated at the beginning of the previous chapter. We therefore concentrate now on the behavior of the parameters rather than on the distribution of the responses themselves. We need a different approach from Melfi and Page (2000) to derive the asymptotic results because in their case the adaptive design produced an i.i.d. sequence of responses, a result which is not available for the adaptive design with covariates.

We work under the same assumptions as in the second chapter, namely:

• Patients arrive sequentially.
• Each patient can be assigned to either one of the two treatments and will be assigned to exactly one of them.
• The response from each patient is observed immediately, prior to the assignment of the treatment to the next patient.
• The response possibly depends on a vector of covariates. The covariates are observed prior to the treatment assignment.

Our goal is to show how a response-adaptive allocation, in which the treatment allocation depends on the previous responses and the covariate of the current patient, influences the asymptotic results for estimators of the parameters of the model we propose.

As pointed out in the first chapter, much of the work done on adaptive designs with covariates has concentrated on forming strata by considering all possible combinations of levels of the relevant covariates and then independently using different adaptive schemes within each stratum to allocate patients. The approach that we take in this chapter comes naturally whenever we can model a response which is a function of covariates.
Specifically, we consider a generalized linear model linking the response with the covariates and the treatment, and then concentrate on studying the behavior of the estimators of the coefficient vector under a general adaptive allocation which satisfies the above assumptions.

We will prove consistency and asymptotic normality results for the estimators of the coefficient vector under some conditions imposed on the model.

3.2 A Generalized Linear Model Relating the Response, the Adaptive Design and the Covariates

In what follows we introduce the model and notation needed throughout the rest of the chapter. We follow as much as possible the notation introduced in the second chapter and extend the definitions previously used when needed.

As in the previous chapter, for each $k \ge 1$ let $t_k$ be the treatment allocation to the $k$th patient:

$$t_k = \begin{cases} 1 & \text{if the $k$th patient is assigned to treatment A} \\ 0 & \text{if the $k$th patient is assigned to treatment B.} \end{cases}$$

Also recall that $X_k = (X_{k1}, \ldots, X_{kp})$ denotes, as before, the $p$-dimensional covariate vector of the $k$th patient. Let $Y_k$ be the observed response of the $k$th patient. Using the notation $Y_k^A$ and $Y_k^B$ introduced in the previous chapter we can define $Y_k$ as:

$$Y_k = Y_k^A I[t_k = 1] + Y_k^B I[t_k = 0] \tag{3.1}$$

where $I[\cdot]$ denotes the usual indicator function.

As will be seen below, we no longer have to distinguish in the notation between the responses to treatment A and treatment B, $Y_k^A$ and $Y_k^B$. This is because with the new approach we consider the response $Y_k$ in the context of a generalized linear model, with the independent variables being the covariate $X_k$ and the treatment $t_k$, possibly with interactions.

Assume, as before, the existence of an increasing sequence of $\sigma$-algebras $\{\mathcal{F}_k\}_{k \ge 1}$, defined as

$$\mathcal{F}_k := \sigma(Y_i^A, Y_i^B, X_i, t_i, U_i : 1 \le i \le k). \tag{3.2}$$
As discussed in Chapter 2, $(U_k)_{k \ge 1}$ denotes a sequence of i.i.d. random variables uniformly distributed on $(0, 1)$ and independent of the other variables in the model, $(Y_k^A, Y_k^B, X_k)$. They may be used for randomizing the treatment allocation as in the construction given by (2.2).

Consider also the increasing sequence of $\sigma$-algebras $\{\mathcal{G}_k\}_{k \ge 1}$ defined as in equation (2.4):

$$\mathcal{G}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \vee \sigma(U_{k+1}).$$

Recall that our goal is to make the allocation decision for the $k$th patient based on the responses from the first $k - 1$ patients and the covariate of the $k$th patient. Hence in this case the condition can be expressed as

$$t_k \in \mathcal{G}_{k-1}. \tag{3.3}$$

Before we introduce our model, we give a brief overview of the generalized linear model. The generalized linear model is characterized by a random part and a structural part.

The random part consists of the independent observations $(Y_i)_{i \ge 1}$ whose distributions are members of the same exponential family, given by the generic density function

$$f(y_i \mid \theta_i, \phi, w_i) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{\phi} \cdot w_i + c(y_i, \phi, w_i) \right\}$$

where

• $\theta_i$ is the so-called natural parameter
• $\phi$ is the dispersion parameter
• $w_i$ is a known weight
• $c(\cdot, \cdot, \cdot)$ is a function of $y_i$, the dispersion $\phi$ and the weight $w_i$.

The structural part of the model specifies the linear relationship between a vector of covariates $x_i = (x_{i1}, \ldots, x_{iq})$ and the expectation of the response, $E(Y_i) = \mu_i$:

$$g(\mu_i) = \eta_i = x_i \beta = \sum_{j=1}^{q} x_{ij} \beta_j.$$

The invertible function $g(\cdot)$, common to all observations, is called the link function and $\beta$ is the parameter vector of the model. Each exponential family distribution is associated with a canonical link function, although this is not necessarily the best link for modeling.

The more familiar linear regression model and logistic regression model can both be viewed as particular examples of the generalized linear model.
When the random part has a binomial distribution and the canonical link is the logit function, $g(y) = \operatorname{logit}(y) = \log\left(\frac{y}{1 - y}\right)$, the generalized linear model becomes the logistic regression model. When the random part has a normal distribution and the canonical link is the identity function $g(y) = y$, we obtain the linear regression model.

In our case we construct the generalized linear model as follows. The structural part is given by

$$E(Y_k \mid \mathcal{F}_{k-1} \vee \sigma(X_k, t_k)) = g(X_k \beta_1 + t_k \beta_2 + X_k \beta_3 t_k) \tag{3.4}$$

where

• $g$ is the link function associated with the distribution of the response $Y_k$
• $\beta_1$, $\beta_3$ are $p$-dimensional column vectors and $\beta_2$ is a scalar:

$$\beta_1' = (\beta_{11}, \beta_{12}, \ldots, \beta_{1p}), \qquad \beta_3' = (\beta_{31}, \beta_{32}, \ldots, \beta_{3p}).$$

Note that in (3.4) the function $g$ is applied to the linear predictor and returns the mean, so here it plays the role of the inverse of the link function in the overview above; for the identity link the two coincide.

From now on we use the notation $M'$ to denote the transpose of a matrix $M$.

We used in relation (3.4) the $\sigma$-algebra $\mathcal{F}_{k-1} \vee \sigma(X_k, t_k)$ to emphasize the relationship between the response $Y_k$ and $X_k$, $t_k$. But since $t_k \in \mathcal{G}_{k-1}$ by assumption (3.3), and using the definition (2.4) of the $\sigma$-algebra $\mathcal{G}_{k-1}$, we may as well use the following definition for the structural part of our model:

$$E(Y_k \mid \mathcal{G}_{k-1}) = g(X_k \beta_1 + t_k \beta_2 + X_k \beta_3 t_k). \tag{3.5}$$

Consider the errors $\epsilon_k$ of the model:

$$\epsilon_k = Y_k - E(Y_k \mid \mathcal{G}_{k-1}). \tag{3.6}$$

We will assume that the errors $\epsilon_k$ are independent.

In a more explicit form, the systematic part of the model can be rewritten as

$$E(Y_k \mid \mathcal{G}_{k-1}) = g\left( \sum_{j=1}^{p} X_{kj} \beta_{1j} + t_k \beta_2 + \sum_{j=1}^{p} X_{kj} \beta_{3j} t_k \right). \tag{3.7}$$

The coefficient parameter of the model is the $(2p + 1)$-dimensional column vector

$$\beta' = (\beta_1', \beta_2, \beta_3').$$

In some cases it will be useful to refer to the regressor vector of the model (3.4) as a whole. We introduce the following notation:

$$X_k^* = (X_k, \ t_k, \ X_k \cdot t_k). \tag{3.8}$$

The innovation of our model is that we introduce in the structural part of the generalized linear model the dependence on the treatment, where the treatment allocation follows an adaptive design as defined by relation (3.3).
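As a small illustration of the structural part (3.4) and of the regressor vector (3.8), the sketch below builds $(X_k, t_k, X_k \cdot t_k)$ and evaluates the mean for both treatments. It is only a sketch under stated assumptions: the function names (`regressor`, `linear_predictor`, `identity_mean`, `logistic_mean`) and all numerical values are hypothetical and do not come from the text, and the two mean functions correspond to the identity and logit cases mentioned above.

```python
import math

def regressor(x, t):
    """Regressor vector (X_k, t_k, X_k * t_k) of relation (3.8)."""
    return list(x) + [t] + [xi * t for xi in x]

def linear_predictor(x, t, beta1, beta2, beta3):
    """X_k beta_1 + t_k beta_2 + X_k beta_3 t_k, the argument of g in (3.4)."""
    main = sum(xi * b for xi, b in zip(x, beta1))
    interaction = sum(xi * b for xi, b in zip(x, beta3))
    return main + t * beta2 + t * interaction

def identity_mean(eta):
    """Mean function for the linear regression case."""
    return eta

def logistic_mean(eta):
    """Mean function for the logistic regression case (inverse of the logit)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical covariate and coefficients, chosen only for illustration (p = 2).
x_k = [1.0, 2.0]
beta1, beta2, beta3 = [0.5, -0.25], 1.0, [0.1, 0.2]

eta_A = linear_predictor(x_k, 1, beta1, beta2, beta3)  # patient on treatment A
eta_B = linear_predictor(x_k, 0, beta1, beta2, beta3)  # patient on treatment B

print("linear model means:  ", identity_mean(eta_A), identity_mean(eta_B))
print("logistic model means:", logistic_mean(eta_A), logistic_mean(eta_B))
```

The linear predictor is the inner product of the regressor vector with $\beta' = (\beta_1', \beta_2, \beta_3')$, so the treatment enters both as a shift ($\beta_2$) and through the interaction with the covariates ($\beta_3$).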
By considering the generalized linear model we allow either continuous or categorical responses $Y_k$. For continuous responses with normal errors and the identity link function $g(y) = y$, we get back the usual linear model.

For now we do not assume any special conditions on the errors, except independence, but as we will see later in the chapter we will need to impose some regularity conditions on them to prove the consistency and the asymptotic properties of the estimators of the model parameters.

Our next goal is to define estimators for the coefficient parameters of the model (3.4) and then to study their behavior. The most natural estimators that arise are the maximum quasi-likelihood estimators.

The maximum quasi-likelihood estimators are defined in connection with the so-called quasi-likelihood functions introduced by Wedderburn (1974) and discussed in McCullagh & Nelder (1989). In general, if we have independent observations $(y_i)_{i=1,\ldots,n}$ with expectations $\mu_i$ and variances $V(\mu_i)$, then the quasi-likelihood function $K(y_i, \mu_i)$ is defined by the relation

$$\frac{\partial K(y_i, \mu_i)}{\partial \mu_i} = \frac{y_i - \mu_i}{V(\mu_i)}.$$

Assume that for each observation, $\mu_i$ is a function of parameters $\beta_1, \ldots, \beta_q$. It can be proved that

$$E\left(\frac{\partial K}{\partial \beta_i}\right) = 0 \quad \text{for all } i = 1, \ldots, q.$$

The maximum quasi-likelihood estimators are then the solutions obtained by equating $\partial K / \partial \beta_i$ with its expectation, zero:

$$\sum_{j=1}^{n} \frac{\partial K(y_j, \mu_j(\beta_1, \ldots, \beta_q))}{\partial \beta_i} = 0 \quad \text{for all } i = 1, \ldots, q.$$

Let $\hat\beta_n' = (\hat\beta_{1,n}', \hat\beta_{2,n}, \hat\beta_{3,n}')$ be the maximum quasi-likelihood estimator for the model (3.4). Note again that the estimators $\hat\beta_{1,n}$, $\hat\beta_{3,n}$ are $p$-dimensional column vectors and $\hat\beta_{2,n}$ is a scalar. Explicitly, in our case $\hat\beta_n$ is the solution of the following equations:

$$
\begin{aligned}
\sum_{i=1}^{n} x_i' \left[ y_i - g(x_i \hat\beta_{1,n} + \hat\beta_{2,n} \cdot t_i + x_i \hat\beta_{3,n} \cdot t_i) \right] &= 0 \\
\sum_{i=1}^{n} t_i \left[ y_i - g(x_i \hat\beta_{1,n} + \hat\beta_{2,n} \cdot t_i + x_i \hat\beta_{3,n} \cdot t_i) \right] &= 0 \\
\sum_{i=1}^{n} t_i x_i' \left[ y_i - g(x_i \hat\beta_{1,n} + \hat\beta_{2,n} \cdot t_i + x_i \hat\beta_{3,n} \cdot t_i) \right] &= 0.
\end{aligned}
\tag{3.9}
$$

We remark that in the case of linear models, when the link function $g(\cdot)$ is the identity function, equations (3.9) become the normal equations and the maximum quasi-likelihood estimator becomes the usual least squares estimator.

We will prove in the next sections that under some assumptions we can derive consistency and asymptotic normality properties for the maximum quasi-likelihood estimators defined as solutions of equations (3.9). In turn, these results allow us to analyze predictors for the response $Y_k$ under treatments A and B and eventually to compare them.

Let $Z_n$ be the design matrix of model (3.4), that is

$$
Z_n = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1p} & X_{11} t_1 & X_{12} t_1 & \cdots & X_{1p} t_1 & t_1 \\
X_{21} & X_{22} & \cdots & X_{2p} & X_{21} t_2 & X_{22} t_2 & \cdots & X_{2p} t_2 & t_2 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np} & X_{n1} t_n & X_{n2} t_n & \cdots & X_{np} t_n & t_n
\end{pmatrix}.
\tag{3.10}
$$

We will pay special attention to the eigenvalues of the matrix $Z_n' Z_n$.

Let $\lambda_{\min}(n)$ and $\lambda_{\max}(n)$ be the minimum and the maximum eigenvalues of $Z_n' Z_n$, respectively. The assumptions for the results we prove next depend on the regularity of the design matrix $Z_n$, and this will be expressed in terms of these eigenvalues.

3.3 A Consistency Result

We want to show in this section that the maximum quasi-likelihood estimator $\hat\beta_n$ is strongly consistent for the model parameter $\beta$. In order to prove the result we need some limit constraints on the errors of the model, $\epsilon_n$, and we also rule out asymptotically ill-conditioned design matrices $Z_n$.

Theorem 2 from Chen, Hu and Ying (1999) and Theorem 1 from Lai and Wei (1982) address the consistency problem for an adaptive design for a generalized linear model and for a linear regression model, respectively. In their case, however, the design does not allow the treatment allocation $t_k$ to depend on the covariate $X_k$.
In our case, by contrast, the main design goal is to make the treatment allocation $t_k$ depend not only on the past information $\mathcal{F}_{k-1}$ but also on the covariate value of the current patient, $X_k$. We will mirror the results mentioned before, but using the new $\sigma$-algebras required by our design.

In what follows we denote by $\| \cdot \|_2$ the usual Euclidean norm, that is, for an $m$-dimensional vector $x = (x_1, x_2, \ldots, x_m)$,

$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_m^2}.$$

Theorem 3.3.1. Assume that we have a generalized linear model as specified by (3.4) and (3.6), with an adaptive design satisfying relations (3.2) and (3.3). Assume the following conditions are satisfied:

$$g \text{ is continuously differentiable with positive derivative function} \tag{3.11}$$

$$\sup_{i \ge 1} \|X_i\|_2 < \infty \ \text{a.s.}, \tag{3.12}$$

$$\lim_{n \to \infty} \lambda_{\min}(n) / \log \lambda_{\max}(n) = \infty \ \text{a.s.}, \tag{3.13}$$

$$\sup_{i \ge 1} E(|\epsilon_i|^\alpha \mid \mathcal{G}_{i-1}) < \infty \ \text{a.s. for some } \alpha > 2. \tag{3.14}$$

Then the estimator $\hat\beta_n$ is strongly consistent, and

$$\|\hat\beta_n - \beta\|_2 = O\left( (\log \lambda_{\max}(n) / \lambda_{\min}(n))^{1/2} \right) \ \text{a.s.} \tag{3.15}$$

Let us first recall the theorem proved by Chen, Hu and Ying (1999) in their paper.

Theorem 3.3.2 (Chen, Hu and Ying (1999)). Consider a generalized linear model for the pairs $(y_k, x_k)_{k \ge 1}$ given by

$$E(y_k \mid \mathcal{F}_{k-1}) = g(x_k' \beta) \tag{3.16}$$

where $(\mathcal{F}_k)_{k \ge 1}$ is a filtration such that $y_k \in \mathcal{F}_k$ and $x_k \in \mathcal{F}_{k-1}$. Let the errors $\epsilon_k$ be defined as

$$\epsilon_k = y_k - E(y_k \mid \mathcal{F}_{k-1}).$$

Let $\lambda_{\min}$, $\lambda_{\max}$ be the minimum and the maximum eigenvalues of the information matrix $\sum_{i=1}^{n} x_i x_i'$. Let $\hat\beta_n$ be the maximum quasi-likelihood estimator of $\beta$ from model (3.16). Assume the following conditions are satisfied:

$$g \text{ is continuously differentiable with positive derivative function} \tag{3.17}$$

$$\sup_{i \ge 1} \|x_i\|_2 < \infty \ \text{a.s.} \tag{3.18}$$

$$\lim_{n \to \infty} \lambda_{\min}(n) / \log \lambda_{\max}(n) = \infty \ \text{a.s.} \tag{3.19}$$

$$\sup_{i \ge 1} E(|\epsilon_i|^\alpha \mid \mathcal{F}_{i-1}) < \infty \ \text{a.s. for some } \alpha > 2. \tag{3.20}$$

Then

$$\|\hat\beta_n - \beta\|_2 = O\left( (\log \lambda_{\max}(n) / \lambda_{\min}(n))^{1/2} \right) \ \text{a.s.} \tag{3.21}$$

As we pointed out before, the introduction of the new $\sigma$-algebras $\mathcal{G}_k$ solves the problem for our design.

Proof of Theorem 3.3.1.
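From the construction of the $\sigma$-algebras $\mathcal{G}_k$ we have that $t_k$ is $\mathcal{G}_{k-1}$-measurable, and from the construction of the filtration itself, $X_k$ is $\mathcal{G}_{k-1}$-measurable and $Y_k$ is $\mathcal{G}_k$-measurable. The model (3.4) becomes:

$$E(Y_k \mid \mathcal{G}_{k-1}) = g(\beta_1 \cdot X_k + \beta_2 \cdot t_k + \beta_3 \cdot X_k \cdot t_k).$$

We can now apply Theorem 3.3.2 with $\mathcal{F}_k$ replaced by $\mathcal{G}_k$ and the covariate matrix $\sum_{i=1}^{n} x_i x_i'$ replaced by $Z_n' Z_n$.

We observe that conditions (3.17), (3.19), (3.20) are satisfied by the equivalent conditions (3.11), (3.13), (3.14) from Theorem 3.3.1. To see that condition (3.18) is also satisfied we can use the inequality

$$\|X_k^*\|_2^2 = \|(X_k, t_k, X_k \cdot t_k)\|_2^2 \le 2 \cdot \|X_k\|_2^2 + 1$$

which holds because $t_k \in \{0, 1\}$ for all $k \ge 1$. Using (3.12) it follows that condition (3.18) is indeed satisfied. Hence the conclusion (3.21) of Theorem 3.3.2 holds, which is exactly what we need for (3.15). □

We now look at the particular case when the covariate is 1-dimensional, that is, $p = 1$. Then we can find explicitly the eigenvalues involved in condition (3.13) of Theorem 3.3.1 and we are able to prove the desired limit under some verifiable conditions.

Proposition 3.3.3. Let the covariate sequence $(x_i)_{i \ge 1}$ and the treatment sequence $(t_i)_{i \ge 1}$ be such that, as $n \to \infty$,

$$\sum_{i=1}^{n} x_i^2 \to \infty \ \text{a.s.}, \qquad \sum_{i=1}^{n} t_i^2 \to \infty \ \text{a.s.}, \qquad \lim_{n \to \infty} \frac{\left( \sum_{i=1}^{n} x_i t_i \right)^2}{\sum_{i=1}^{n} x_i^2 \cdot \sum_{i=1}^{n} t_i^2} < 1 \ \text{a.s.}$$

Let

$$Z_n' = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ t_1 & t_2 & \cdots & t_n \end{pmatrix}.$$

Then

$$\lim_{n \to \infty} \frac{\lambda_{\min}(n)}{\log \lambda_{\max}(n)} = \infty \ \text{a.s.}$$

where $\lambda_{\min}(n)$, $\lambda_{\max}(n)$ are the minimum and the maximum eigenvalue of the matrix $Z_n' Z_n$, respectively.

Proof. Let $m_1 = \sum_{i=1}^{n} x_i^2$, $m_2 = \sum_{i=1}^{n} x_i t_i$ and $m_3 = \sum_{i=1}^{n} t_i^2$. We omit the index $n$ to simplify the notation. It can be readily proved that

$$\lambda_{\min} = \frac{1}{2} \left( m_1 + m_3 - \sqrt{(m_1 - m_3)^2 + 4 m_2^2} \right)$$

$$\lambda_{\max} = \frac{1}{2} \left( m_1 + m_3 + \sqrt{(m_1 - m_3)^2 + 4 m_2^2} \right).$$

By the Cauchy inequality we have that $m_2^2 \le m_1 m_3$. Hence $\lambda_{\min} \ge 0$. Moreover,

$$\lambda_{\min} = \frac{2 (m_1 m_3 - m_2^2)}{m_1 + m_3 + \sqrt{(m_1 - m_3)^2 + 4 m_2^2}} = \frac{2 \left( 1 - \dfrac{m_2^2}{m_1 m_3} \right)}{\dfrac{1}{m_1} + \dfrac{1}{m_3} + \sqrt{\left( \dfrac{1}{m_1} - \dfrac{1}{m_3} \right)^2 + \dfrac{m_2^2}{m_1 m_3} \cdot \dfrac{4}{m_1 m_3}}}.$$

Let

$$l = \frac{m_2^2}{m_1 m_3};$$

by hypothesis, in the limit, $l < 1$ a.s.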
Using also the hypothesis that $m_1, m_3 \to \infty$ a.s., we have that as $n \to \infty$ the numerator of $\lambda_{\min}$ tends to $2(1 - l) > 0$ while its denominator tends to $0 + 0 + \sqrt{(0 - 0)^2 + l \cdot 0} = 0$; therefore, as $n \to \infty$, $\lambda_{\min} \to \infty$ a.s.

Let $q$ be a constant, depending on $l$, chosen large enough that, after some algebra,

$$q \cdot \lambda_{\min} > \lambda_{\max}.$$

Then we have that

$$\frac{\lambda_{\min}}{\log \lambda_{\max}} = \frac{\lambda_{\min}}{\log \lambda_{\min}} \cdot \frac{\log \lambda_{\min}}{\log \lambda_{\max}}$$

and

$$\frac{\log \lambda_{\min}}{\log (q \cdot \lambda_{\min})} < \frac{\log \lambda_{\min}}{\log \lambda_{\max}} \le 1.$$

Since $\lambda_{\min} \to \infty$ a.s. we obtain that $\log \lambda_{\min} / \log \lambda_{\max} \to 1$ a.s. and therefore $\lambda_{\min} / \log \lambda_{\max} \to \infty$ a.s. when $n \to \infty$. □

3.4 An Asymptotic Normality Result

We showed in the previous section that the maximum quasi-likelihood estimator $\hat\beta_n$ is consistent under assumptions (3.11) - (3.14). Therefore, under the adaptive design and the model considered, we know that at least asymptotically we get the correct estimators for the coefficient parameter $\beta$. Our next goal is to study the asymptotic normality of the maximum quasi-likelihood estimators.

We prove the asymptotic normality result in the case of linear regression, when the maximum quasi-likelihood estimator becomes the usual least squares estimator. Unfortunately, no similar result is available in the literature for the generalized linear model. Because we need the variance of the asymptotic normal distribution, we need an extra asymptotic limit condition on the errors of model (3.4).

Recall that Theorem 3 from Lai and Wei (1982) also addresses the asymptotic normality problem, but the design in their case does not permit the choice of the treatment to depend on the covariate of the current patient. We extend the result they obtained for their particular case to the design that we presented in the introduction of the chapter.

Theorem 3.4.1. Suppose that for the model (3.4) with the identity link function, $g(\cdot) = \cdot$, the errors $\epsilon_n$ satisfy condition (3.14) and the following limit condition:

$$\lim_{n \to \infty} E(\epsilon_n^2 \mid \mathcal{G}_{n-1}) = \sigma^2 \ \text{a.s.} \tag{3.22}$$
Moreover, suppose there exists a non-random positive definite symmetric matrix $B_n$ for which, as $n \to \infty$,

$$B_n^{-1} (Z_n' Z_n)^{1/2} \xrightarrow{P} I \tag{3.23}$$

$$\max_{1 \le i \le n} \|B_n^{-1} X_i^*\|_2 \xrightarrow{P} 0. \tag{3.24}$$

Then the estimator $\hat\beta_n$ has an asymptotically normal distribution:

$$(Z_n' Z_n)^{1/2} (\hat\beta_n - \beta) \xrightarrow{D} N(0, \sigma^2 I). \tag{3.25}$$

Theorem 3.4.2 (Lai and Wei (1982)). Suppose that in the linear regression model

$$y_i = x_i' \beta + \epsilon_i \tag{3.26}$$

the errors $(\epsilon_i)_{i \ge 1}$ form a martingale difference sequence with respect to an increasing sequence of $\sigma$-algebras $(\mathcal{F}_i)_{i \ge 1}$ such that

$$\sup_{i \ge 1} E(|\epsilon_i|^\alpha \mid \mathcal{F}_{i-1}) < \infty \ \text{a.s. for some } \alpha > 2 \tag{3.27}$$

$$\lim_{n \to \infty} E(\epsilon_n^2 \mid \mathcal{F}_{n-1}) = \sigma^2 \ \text{a.s. for some constant } \sigma. \tag{3.28}$$

Moreover, assume for each $n$ that the design vector

$$x_i = (x_{i1}, \ldots, x_{ip})' \tag{3.29}$$

is $\mathcal{F}_{i-1}$-measurable and that there exists a non-random positive definite symmetric matrix $B_n$ for which, as $n \to \infty$,

$$B_n^{-1} \left( \sum_{i=1}^{n} x_i x_i' \right)^{1/2} \xrightarrow{P} I \tag{3.30}$$

$$\max_{1 \le i \le n} \|B_n^{-1} x_i\|_2 \xrightarrow{P} 0. \tag{3.31}$$

Then the least squares estimate $\hat\beta_n$ of $\beta$ has an asymptotically normal distribution:

$$\left( \sum_{i=1}^{n} x_i x_i' \right)^{1/2} (\hat\beta_n - \beta) \xrightarrow{D} N(0, \sigma^2 I). \tag{3.32}$$

Proof of Theorem 3.4.1. Again the crucial step is the definition of the $\sigma$-algebra $\mathcal{G}_k$ introduced by equation (2.4). From that definition and from condition (3.3) we have that $X_k, t_k \in \mathcal{G}_{k-1}$, hence $X_k^*$ is $\mathcal{G}_{k-1}$-measurable for all $k$, so condition (3.29) is satisfied. From equation (3.6) the errors already form a martingale difference sequence. Conditions (3.14) and (3.22) give conditions (3.27) and (3.28). Finally, replacing the design vector $x_n$ by $X_n^*$ and the covariate matrix $\sum_{i=1}^{n} x_i x_i'$ by $Z_n' Z_n$, we get that conditions (3.30) and (3.31) are satisfied by conditions (3.23) and (3.24). The conclusion (3.25) follows from (3.32). □

Chapter 4

Examples of Models with an Adaptive Allocation

4.1 Introduction

In the previous two chapters we proved some general theorems regarding, first, the distribution of the response from a general adaptive design and, second, the limit behavior of maximum quasi-likelihood estimators for a generalized linear model with an adaptive design. In this section we describe how an adaptive design can be constructed and then look at some particular cases.

Recall the main features of the design:

• Patients arrive sequentially.
• Each patient can be assigned to either one of the two treatments and will be assigned to exactly one of them.
• The response from each patient is observed immediately, prior to the assignment of the treatment to the next patient.
• The response possibly depends on a vector of covariates. The covariates are observed prior to the treatment assignment.
• We want the adaptive design to be such that the treatment allocation depends on the previous responses and the covariate of the current patient.

We keep the same notation as in the previous chapters. Denote by A and B the two treatments that patients receive. Let $t_k$ be the treatment allocation to the $k$th patient, defined by:

$$t_k = \begin{cases} 1 & \text{if the $k$th patient is assigned to treatment A} \\ 0 & \text{if the $k$th patient is assigned to treatment B.} \end{cases}$$

Recall also that $X_k$ denotes the $p$-dimensional covariate vector of the $k$th patient and $Y_k$ is the response of the $k$th patient. We consider the generalized linear model introduced in Chapter 3 by relations (3.4) and (3.6). In particular $g(\cdot)$ is the link function connecting the mean of the response $Y_k$ with the linear combination of the covariate $X_k$ and the treatment $t_k$.

We use the notation $Y_k^A$ and $Y_k^B$ for the potential responses to treatment A and treatment B, respectively, and $Y_k$ for the observed response:

$$Y_k = Y_k^A I[t_k = 1] + Y_k^B I[t_k = 0].$$

Assume, as before, the existence of an increasing sequence of $\sigma$-algebras $\{\mathcal{F}_k\}_{k \ge 1}$ such that the triplet $(Y_k, X_k, t_k)$ is $\mathcal{F}_k$-measurable:

$$\mathcal{F}_k := \sigma(Y_i^A, Y_i^B, X_i, t_i, U_i : 1 \le i \le k),$$

where $(U_k)_{k \ge 1}$ denotes a sequence of i.i.d. random variables uniformly distributed on $(0, 1)$ and independent of the other variables in the model, $(Y_k^A, Y_k^B, X_k)$. We use the sequence $(U_k)_{k \ge 1}$ for randomizing the treatment allocation as in the construction given by (2.2).
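The randomization device of (2.2) can be illustrated with a few lines of code. The sketch below is a minimal illustration, not part of the design itself: the function name `allocate`, the seed and the value 0.7 for the allocation probability are hypothetical choices. It draws uniform variables and sets $t_k = I[U_k < \hat\pi_k]$, so that treatment A is chosen with probability $\hat\pi_k$, in line with (2.6).

```python
import random

def allocate(pi_hat, u):
    """Return t_k = I[u < pi_hat]: 1 means treatment A, 0 means treatment B."""
    return 1 if u < pi_hat else 0

rng = random.Random(2004)   # arbitrary seed, for reproducibility only
pi_hat = 0.7                # hypothetical allocation probability

# Repeating the draw many times with the same pi_hat shows that the
# frequency of allocations to A is close to pi_hat.
draws = [allocate(pi_hat, rng.random()) for _ in range(20000)]
freq_A = sum(draws) / len(draws)
print("empirical frequency of treatment A:", freq_A)
```

In the actual design $\hat\pi_k$ changes from patient to patient, since it is recomputed from the previous responses and the current covariate; only the mechanism $I[U_k < \hat\pi_k]$ stays the same.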
Recall also the construction of the $\sigma$-algebras $\mathcal{G}_k$, and that the treatment $t_{k+1}$ is $\mathcal{G}_k$-measurable:

$$\mathcal{G}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \vee \sigma(U_{k+1}).$$

In general we say that we are at stage $k$ if we have observed the first $k - 1$ patients and the covariate of the $k$th patient, but have not yet observed the response of the $k$th patient.

Let $\hat Y_k^A$ and $\hat Y_k^B$ be the predictors of the potential response of the $k$th patient to treatment A and treatment B, respectively. We want the allocation to treatment A to be based on a function of these predictors, $f(\hat Y_k^A, \hat Y_k^B)$. We use the notation:

$$\hat\pi_k := f(\hat Y_k^A, \hat Y_k^B). \tag{4.1}$$

It is also useful to think of $\hat\pi_k$ as an estimate of the parameter $\pi_k$, where $\pi_k$ is the same function of the unobserved potential responses:

$$\pi_k := f(Y_k^A, Y_k^B).$$

In summary, we consider the following general framework for our adaptive design:

• First stage

Start with $n_0$ patients allocated to each treatment.

• Second stage

At each stage $k > 2 n_0$ we iterate the following procedure:

- Compute the maximum quasi-likelihood estimators $\hat\beta_{k-1,1}, \hat\beta_{k-1,2}, \hat\beta_{k-1,3}$ based on the information from the first $k - 1$ patients, as defined by relation (3.9). Recall that $\hat\beta_{k-1,1}$, $\hat\beta_{k-1,3}$ are $p$-dimensional column vectors and $\hat\beta_{k-1,2}$ is a scalar.

- Observe the covariate vector $X_k$ for the $k$th patient.

- Compute the predictors $\hat Y_k^A$ and $\hat Y_k^B$ for the responses:

$$\hat Y_k^A = g(X_k \hat\beta_{k-1,1} + \hat\beta_{k-1,2} \cdot 1 + X_k \hat\beta_{k-1,3} \cdot 1) = g(X_k (\hat\beta_{k-1,1} + \hat\beta_{k-1,3}) + \hat\beta_{k-1,2}) \tag{4.2}$$

$$\hat Y_k^B = g(X_k \hat\beta_{k-1,1} + \hat\beta_{k-1,2} \cdot 0 + X_k \hat\beta_{k-1,3} \cdot 0) = g(X_k \hat\beta_{k-1,1}). \tag{4.3}$$

- Evaluate the function $\hat\pi_k := f(\hat Y_k^A, \hat Y_k^B)$.

- Generate the treatment allocation $t_k$ according to a Bernoulli($\hat\pi_k$) distribution. To achieve this, consider the sequence $(U_k)_{k \ge 1}$ of i.i.d. random variables uniformly distributed on the interval $[0, 1]$, independent of $(X_k, Y_k)_{k \ge 1}$. Then let $t_k$ be defined as:

$$t_k := I[U_k < \hat\pi_k] \tag{4.4}$$

where $I[\cdot]$ is the usual indicator function.

Note that based on the proposed model the estimators $\hat Y_k^A$, $\hat Y_k^B$
are computed from the information up to the $(k-1)$th patient and the current covariate $X_k$. Thus the treatment allocation $t_k$, through its dependence on $\hat\pi_k$, involves all the information available at stage $k$, which is exactly the goal stated at the beginning of the chapter.

The first stage of the design is needed for an initial estimation of the parameters of the model. If some estimators are already available, this first stage can be skipped and the design can start directly from the second stage.

From an ethical point of view the choice of the function $f(\cdot, \cdot)$ should satisfy some requirements, according to the basic principle that we should favor the treatment with the superior response in the trial to date.

As an example, in the case of a continuous response where larger values correspond to a better treatment, we propose the following guiding rules for constructing the function $f$:

$$f(x, x) = 0.5 \tag{4.5}$$

$$\lim_{x - y \to \infty} f(x, y) = 1 \tag{4.6}$$

$$\lim_{x - y \to -\infty} f(x, y) = 0 \tag{4.7}$$

Equation (4.5) expresses that when the potential responses to the two treatments are the same, we should allocate to treatment A with probability 0.5. Equation (4.6) reflects that when the potential response to treatment A is far superior, we should allocate to treatment A with a probability close to 1. Likewise, in case (4.7), when the superior potential response comes from treatment B, we should allocate to treatment A with a probability close to 0.

4.2 A Normal Response Example

In this section we investigate an example from the family of designs proposed at the beginning of the chapter. For the model (3.4) introduced in Chapter 3, we concentrate on the case when the response is normally distributed and the link function is the identity (canonical) link. In this particular situation the generalized linear model reduces to the linear model and the maximum quasi-likelihood estimators become the least squares estimators.
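The reduction to least squares can be checked numerically. The following sketch assumes a one-dimensional covariate ($p = 1$), so that the regressor of (3.8) is $(X_k, t_k, X_k t_k)$; the data values and the helper names `solve`, `least_squares` and `score` are hypothetical, introduced only for this illustration. It solves the normal equations and then evaluates the left-hand sides of equations (3.9) with the identity link at the least squares solution, which should be numerically zero.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def least_squares(rows, y):
    """Least squares estimator: solution of the normal equations Z'Z b = Z'y."""
    q = len(rows[0])
    ztz = [[sum(r[i] * r[j] for r in rows) for j in range(q)] for i in range(q)]
    zty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(q)]
    return solve(ztz, zty)

def score(rows, y, beta):
    """Left-hand sides of (3.9) for the identity link: sum_i z_i (y_i - z_i beta)."""
    q = len(rows[0])
    resid = [yi - sum(zj * bj for zj, bj in zip(r, beta)) for r, yi in zip(rows, y)]
    return [sum(r[i] * e for r, e in zip(rows, resid)) for i in range(q)]

# Hypothetical data: covariates, allocations and responses of six patients.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
t = [1, 0, 1, 0, 1, 0]
y = [2.1, 1.9, 4.2, 4.1, 6.3, 5.8]
rows = [[xi, ti, xi * ti] for xi, ti in zip(x, t)]  # regressors (X, t, X t)

beta_hat = least_squares(rows, y)
print("least squares estimate:", beta_hat)
print("quasi-likelihood equations at the estimate:", score(rows, y, beta_hat))
```

The allocations in this toy data set are fixed rather than adaptive; the algebraic identity between the two sets of equations does not depend on how the allocations were generated.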
We keep the notation introduced at the beginning of the chapter and consider the following model for the response:
\[ Y_k = \beta_1 + \beta_2 X_k + \beta_3 t_k + \beta_4 X_k t_k + \epsilon_k \tag{4.8} \]
with the errors $\epsilon_k$ independent and normally distributed:
\[ \epsilon_k \sim N(0, \sigma^2). \tag{4.9} \]
We consider the covariate $X_k$ to be one-dimensional. The random covariates $(X_k)_{k\ge 1}$ are an i.i.d. sequence whose distribution has a known density $p(x)$. The treatment allocation $t_k$ is as described in the general framework of an adaptive design (see Section 4.1), with the specific allocation function given in (4.13).

Let $Z_n$ be the design matrix of model (4.8):
\[ Z_n := \begin{pmatrix} 1 & X_1 & t_1 & X_1 t_1 \\ 1 & X_2 & t_2 & X_2 t_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_n & t_n & X_n t_n \end{pmatrix} \tag{4.10} \]
Consider now the case when larger values of the response $Y_k$ are desirable. Let $\hat{\beta}_{1,k}, \hat{\beta}_{2,k}, \hat{\beta}_{3,k}, \hat{\beta}_{4,k}$ be the least squares estimators of the parameters $\beta_1, \beta_2, \beta_3, \beta_4$, respectively, after $k$ patients have been treated and observed in the trial. Then the predictors for the $k$th patient's response from treatment A, respectively B, are:
\[ \hat{Y}_k^A = \hat{\beta}_{1,k-1} + \hat{\beta}_{3,k-1} + (\hat{\beta}_{2,k-1} + \hat{\beta}_{4,k-1}) X_k \]
\[ \hat{Y}_k^B = \hat{\beta}_{1,k-1} + \hat{\beta}_{2,k-1} X_k. \]
For ethical reasons we want to allocate the patients to the better treatment, so we will 'favor' the treatment with the larger of the two predictors $\hat{Y}_k^A$, $\hat{Y}_k^B$. At the same time we want to keep the randomization in the trial design. Therefore 'favoring' the better treatment will consist of allocating to it with a higher probability.

We will use the following criterion for identifying the better treatment. Let $\Delta_k$ be the difference, for the $k$th patient, between the mean responses to the two treatments:
\[ \Delta_k := E(Y_k^A) - E(Y_k^B) = \beta_3 + \beta_4 X_k \tag{4.11} \]
and let $\hat{\Delta}_k$ be its estimator at stage $k$:
\[ \hat{\Delta}_k := \hat{Y}_k^A - \hat{Y}_k^B = \hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1} X_k. \tag{4.12} \]
A positive $\Delta_k$ shows that treatment A is better and, analogously, if $\Delta_k$ is negative then treatment B is better.
Moreover, a small absolute value of $\Delta_k$ indicates that the two treatments are not very different, while a large absolute value of $\Delta_k$ indicates that one of the treatments is much better than the other.

Using the above interpretations we propose the following function for $\hat{\pi}_k$:
\[ \hat{\pi}_k = \frac{\exp(\hat{\Delta}_k)}{1 + \exp(\hat{\Delta}_k)}. \tag{4.13} \]
To emphasize that the treatment allocation is a function of $\hat{Y}_k^A$, $\hat{Y}_k^B$, we rewrite $\hat{\pi}_k$ as:
\[ \hat{\pi}_k = \frac{\exp(\hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1}X_k)}{1 + \exp(\hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1}X_k)} = \frac{\exp(\hat{Y}_k^A - \hat{Y}_k^B)}{1 + \exp(\hat{Y}_k^A - \hat{Y}_k^B)}. \]
Since in this case we work with the estimator $\hat{\Delta}_k$, it is easier to think of the probability of selection $\hat{\pi}_k$ as a function of it. Therefore we will refer to $\hat{\pi}_k$ as a function of just one variable instead of two, $\hat{\pi}_k = f_1(\hat{\Delta}_k)$. It is easy to see that the three properties of the function $f(\cdot,\cdot)$ defined in (4.5)-(4.7) become the following properties of the function $f_1(\cdot)$:
\[ f_1(0) = 0.5 \]
\[ \lim_{x\to\infty} f_1(x) = \lim_{x\to\infty} \frac{\exp(x)}{1 + \exp(x)} = 1 \]
\[ \lim_{x\to-\infty} f_1(x) = \lim_{x\to-\infty} \frac{\exp(x)}{1 + \exp(x)} = 0 \]
In our case, $f_1(y) = \frac{\exp(y)}{1+\exp(y)}$ and the three properties above are satisfied, as can be seen from the graph shown in Figure 4.1.

The design is now completely specified according to the general framework presented in the introduction of the chapter (see Section 4.1).

[Figure 4.1: Graph of the function $f_1(y) = \frac{\exp(y)}{1+\exp(y)}$]

4.2.1 An Evaluation Function for the Number of Patients on the Inferior Treatment

We now consider an evaluation criterion, from an ethical point of view, for the design proposed in Section 4.1. Recall that one of the main advantages of adaptive designs is that they offer the choice of combining ethical considerations with the randomization needed for an impartial trial.
In our case we want to allocate as many patients as we can to the better treatment, where the better treatment is decided based on the estimators from model (4.8), but we have to keep in mind that the allocation is also subject to the randomization introduced by (4.4).

The evaluation criterion is a natural one, given the model and the design considered. We will count the number of 'mistreatments', i.e. the number of allocations to the inferior treatment. It can be regarded as a loss function. We denote by $C_k$ the number of 'mistreatments' up to stage $k$, defined as follows:
\[ C_k := \sum_{i=2n_0+1}^{k} I[t_i = 1]\cdot I[\Delta_i \le 0] + \sum_{i=2n_0+1}^{k} I[t_i = 0]\cdot I[\Delta_i > 0] \tag{4.14} \]
where $I[\cdot]$ denotes the usual indicator function. The sums begin at $2n_0+1$ because we take into consideration only the allocations made according to the design defined by relations (4.13) and (4.12), and do not include the first $2n_0$ allocations from the first stage.

Observe that the variation in the loss function $C_k$ comes from two sources: one is the randomization of the treatment assignment according to a Bernoulli distribution, and the other is the parameter estimation used in the treatment assignment.

We are interested in evaluating $C_k$. Based on the theorems proved in Section 3.3 we will study the behavior of $C_k$ as $k\to\infty$ and find a limit for the proportion of patients allocated to the inferior treatment.

Theorem 4.2.1. Assume that the density function $p(x)$ is uniformly bounded, the errors $\epsilon_k$ satisfy condition (3.14), the minimum and maximum eigenvalues of the design matrix $Z_n$ satisfy condition (3.13), and that
\[ \sup_{i\ge 1} |X_i| < \infty \quad \text{a.s.} \tag{4.15} \]
Then
\[ \frac{C_k}{k} \xrightarrow[k\to\infty]{} \int_{\beta_3+\beta_4 x\le 0} f_1(\beta_3 + \beta_4 x)\,p(x)\,dx + \int_{\beta_3+\beta_4 x>0} [1 - f_1(\beta_3 + \beta_4 x)]\,p(x)\,dx. \tag{4.16} \]

Proof. We begin by checking the assumptions of theorem (3.3.1) from Chapter 3.
• Since in our case the model considered, (4.8), is the linear model, the link function is the identity function, $g(y) = y$, which is continuously differentiable with positive derivative. Therefore the first condition, (3.11), of theorem (3.3.1) is satisfied.

• In our case we have the intercept and a one-dimensional covariate $X_k$, so condition (3.12) becomes:
\[ \sup_{i\ge 1} \|(1, X_i)\|_2 < \infty. \]
But
\[ \|(1, X_i)\|_2 = \sqrt{1 + X_i^2} \le 1 + |X_i| \]
and, using condition (4.15), the conclusion follows.

• Finally, conditions (3.13) and (3.14) follow from the assumptions made in the theorem for our model.

Therefore we can apply theorem (3.3.1) and conclude that:
\[ (\hat{\beta}_{1,k}, \hat{\beta}_{2,k}, \hat{\beta}_{3,k}, \hat{\beta}_{4,k}) \to (\beta_1, \beta_2, \beta_3, \beta_4) \quad \text{a.s.} \]
Recall the definition of the $\sigma$-algebra $\mathcal{J}_k$ given by (2.5):
\[ \mathcal{J}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \]
and equation (2.6):
\[ E(t_{k+1} \mid \mathcal{J}_k) = \hat{\pi}_{k+1}. \]
From the construction of the $\sigma$-algebra, $X_k$ is $\mathcal{J}_{k-1}$-measurable. Since in our case $\Delta_k = \beta_3 + \beta_4 X_k$, it follows that $\Delta_k$ is also $\mathcal{J}_{k-1}$-measurable. Then for any $i \ge 2n_0+1$:
\[ \begin{aligned} E(I[t_i = 1]\cdot I[\Delta_i \le 0]) &= E\big(E(I[t_i = 1]\cdot I[\Delta_i \le 0] \mid \mathcal{J}_{i-1})\big) \\ &= E\big(I[\Delta_i \le 0]\cdot E(I[t_i = 1] \mid \mathcal{J}_{i-1})\big) \\ &= E\big(I[\Delta_i \le 0]\cdot E(t_i \mid \mathcal{J}_{i-1})\big) \\ &= E(\hat{\pi}_i\cdot I[\Delta_i \le 0]). \end{aligned} \]
Analogously,
\[ E(I[t_i = 0]\cdot I[\Delta_i > 0]) = E((1 - \hat{\pi}_i)\cdot I[\Delta_i > 0]). \]
Then we have that
\[ \begin{aligned} E\Big(\frac{C_k}{k}\Big) &= \frac{1}{k}\sum_{i=2n_0+1}^{k} E(I[t_i = 1]\cdot I[\Delta_i \le 0]) + \frac{1}{k}\sum_{i=2n_0+1}^{k} E(I[t_i = 0]\cdot I[\Delta_i > 0]) \\ &= \frac{1}{k}\sum_{i=2n_0+1}^{k} E(\hat{\pi}_i\cdot I[\beta_3 + \beta_4 X_i \le 0]) + \frac{1}{k}\sum_{i=2n_0+1}^{k} E((1 - \hat{\pi}_i)\cdot I[\beta_3 + \beta_4 X_i > 0]) \\ &= \frac{1}{k}\sum_{i=2n_0+1}^{k} \int_{\beta_3+\beta_4 x\le 0} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x)\,p(x)\,dx + \frac{1}{k}\sum_{i=2n_0+1}^{k} \int_{\beta_3+\beta_4 x>0} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x))\,p(x)\,dx \\ &= \frac{k-2n_0}{k}\int_{\beta_3+\beta_4 x\le 0} \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x)\,p(x)\,dx \\ &\quad + \frac{k-2n_0}{k}\int_{\beta_3+\beta_4 x>0} \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x))\,p(x)\,dx. \end{aligned} \tag{4.17} \]
From our previous statement, $\hat{\beta}_{3,k} \to \beta_3$ and $\hat{\beta}_{4,k} \to \beta_4$ a.s., and because the function $f_1(\cdot)$ is continuous we have that:
\[ f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x) \xrightarrow{\text{a.s.}} f_1(\beta_3 + \beta_4 x) \quad \forall x. \]
Therefore, making use of the Stolz-Cesaro theorem (see A.1.1), the averages converge:
\[ \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x)\,p(x) \xrightarrow{\text{a.s.}} f_1(\beta_3 + \beta_4 x)\,p(x) \quad \forall x. \]
In a similar way we get that:
\[ \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x))\,p(x) \xrightarrow{\text{a.s.}} (1 - f_1(\beta_3 + \beta_4 x))\,p(x) \quad \forall x. \]
But the function $f_1(\cdot)$ is uniformly bounded and, by assumption, the density $p(\cdot)$ is also uniformly bounded, so
\[ \sup_x |f_1(x)| \vee |p(x)| \le B \]
where $B$ is some constant. It follows from the dominated convergence theorem that:
\[ \int_{\beta_3+\beta_4 x\le 0} \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x)\,p(x)\,dx \xrightarrow{\text{a.s.}} \int_{\beta_3+\beta_4 x\le 0} f_1(\beta_3 + \beta_4 x)\,p(x)\,dx, \]
\[ \int_{\beta_3+\beta_4 x>0} \frac{1}{k-2n_0}\sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1}x))\,p(x)\,dx \xrightarrow{\text{a.s.}} \int_{\beta_3+\beta_4 x>0} (1 - f_1(\beta_3 + \beta_4 x))\,p(x)\,dx. \]
Combining the two relations with (4.17) and using that $\frac{k-2n_0}{k} \to 1$ as $k\to\infty$, we get the conclusion of the theorem. $\square$

Let us look more closely at the previous result in the particular case when $X_k$ is uniformly distributed:
\[ X_k \sim U(a, b). \]
Let $L$ be the limit of $\frac{C_k}{k}$ as defined by equation (4.16). Then we have the following corollary:

Corollary 4.2.2. If the covariate $X_k$ from the model introduced by (4.8) and (4.9) is uniformly distributed on the interval $[a, b]$, then the limit $L$ is equal to:

• Case 1: $a <$ […]

Proof. In this particular case the density function is
\[ p(x) = \frac{1}{b-a}\, I[a \le x \le b] \]
and the function $f_1(\cdot)$ was defined as $f_1(y) = \frac{\exp(y)}{1+\exp(y)}$. So the limit $L$ given by relation (4.16) becomes:
\[ L = \frac{1}{b-a}\left[\int_{\{\beta_3+\beta_4 x\le 0\}\cap\{a<x<b\}} \frac{e^{\beta_3+\beta_4 x}}{1 + e^{\beta_3+\beta_4 x}}\,dx + \int_{\{\beta_3+\beta_4 x>0\}\cap\{a<x<b\}} \frac{1}{1 + e^{\beta_3+\beta_4 x}}\,dx\right] \]
[…]

[Figure 4.2: four panels, Case 1 to Case 4; horizontal axis: covariate X, with the endpoints a and b marked]

4.2.2 Simulations of the Design

In this section we present some numerical results obtained from simulations. We compute by Monte Carlo methods the estimators of the coefficient parameters $(\beta_1, \beta_2, \beta_3, \beta_4)$ of model (4.8), as well as the number of allocations to the inferior treatment, $C_n$, and the proportion $C_n/(n - 2n_0)$. Although the theorems presented in the previous sections are all asymptotic, we will see that the asymptotic results hold even for moderate sample sizes.

We consider simulations in the case when the covariate $X_k$ is uniformly distributed, as in corollary (4.2.2). Consider the following numerical application:

• $\beta_1 = 0$, $\beta_2 = 1$, $\beta_3 = 3$, $\beta_4 = -0.5$.

• $\sigma = 0.1$, $0.2$, $0.5$, $1$ or $2$.

• $a = 0$, $b = 10$.

• The fixed sample size is $n = 30, 50, 100$, and $n_0 = 5$, so that the first five patients are allocated to treatment A, the next five to treatment B, and the next $n - 10$ allocations are made according to the design introduced in Section 4.1.

Because of the choice of the parameters $\beta_1, \beta_2, \beta_3, \beta_4$ we are in Case 1 of corollary (4.2.2), and in this particular case $L = 0.242$.

The two regression lines for treatment A, respectively B, are graphed in Figure 4.3. We use a continuous line for the regression line of $Y^A$ and a dashed line for the regression line of $Y^B$:
\[ E(Y_k^A) = 3 + 0.5X_k, \qquad E(Y_k^B) = X_k. \]

[Figure 4.3: The regression lines for treatment A and B; horizontal axis: covariate X, vertical axis: response Y]

For a better understanding of the allocation design it is worth mentioning how the allocation probabilities vary as a function of the covariate X.
Recall that
\[ \Delta_x = \beta_3 + \beta_4 x, \qquad \pi_x = f_1(\Delta_x) = \frac{\exp(\Delta_x)}{1 + \exp(\Delta_x)} \]
and so in our case
\[ \pi_x = \frac{\exp(3 - 0.5x)}{1 + \exp(3 - 0.5x)}. \]
Then on the interval $[0, 10]$ the allocation probabilities vary according to Table 4.1:

Table 4.1: Allocation probabilities to treatment A

x     Δ_x    f1(Δ_x)
0     3      0.95
1     2.5    0.92
2     2      0.88
3     1.5    0.82
4     1      0.73
5     0.5    0.62
6     0      0.50
7     -0.5   0.38
8     -1     0.27
9     -1.5   0.18
10    -2     0.12

A typical simulation of the design allocation is given in Figure 4.4. We use the symbol 'o' to mark the allocations to treatment A and the symbol '+' to mark the allocations to treatment B.

[Figure 4.4: A simulation design; horizontal axis: covariate X, vertical axis: response Y]

In Tables 4.2-4.7 we use Monte Carlo simulations to evaluate the bias and the standard deviation of the estimators $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ and $\hat{\beta}_4$, and also the estimate and the standard deviation of $C_n$ and $C_n/(n - 2n_0)$.

Since there are ten observations in the first stage ($n_0 = 5$), for $n = 100$ there are ninety observations in the second stage, $n - 2n_0 = 100 - 2\cdot 5 = 90$. In the limit $L = 0.242$, so we may expect to observe about $90 \cdot 0.242 = 21.78$ 'mistreatments' for $C_{100}$. Analogously, we may expect $40 \cdot 0.242 = 9.68$ 'mistreatments' for $C_{50}$ and $20 \cdot 0.242 = 4.84$ 'mistreatments' for $C_{30}$. The estimates from the simulations were close to these values, as can be seen from the tables below.

In each of the following cases there are 10000 replications of the simulation of the allocation design. Figures 4.5-4.9 show the histograms, from the 10000 replications, of the bias of the estimators $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ and of the estimates of $C_{100}$. The histograms are shown for $n = 100$ and $\sigma = 0.1, 0.5, 1$ and $2$.
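The quantities described above can be reproduced in outline with a short, standard-library Python sketch. It approximates the limit in (4.16) for the uniform covariate by a midpoint rule and runs the two-stage design for a handful of replications. This is an illustrative sketch, not the code used for the reported tables: the function names (`f1`, `limit_L`, `least_squares`, `simulate_trial`), the midpoint rule, the normal-equations solver and the number of replications are all assumptions made here.

```python
import math
import random

BETA = (0.0, 1.0, 3.0, -0.5)  # (beta1, beta2, beta3, beta4) of the numerical application
A, B = 0.0, 10.0              # covariate X ~ U(a, b)


def f1(d):
    """Allocation probability exp(d) / (1 + exp(d)) from (4.13), evaluated stably."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def limit_L(b3, b4, a, b, m=100_000):
    """Midpoint-rule approximation of the limit in (4.16) for X ~ U(a, b)."""
    h = (b - a) / m
    total = 0.0
    for j in range(m):
        d = b3 + b4 * (a + (j + 0.5) * h)
        total += f1(d) if d <= 0 else 1.0 - f1(d)
    return total * h / (b - a)


def least_squares(rows, ys):
    """Solve the normal equations (Z'Z) beta = Z'y by Gauss-Jordan elimination."""
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                k = m[r][c] / m[c][c]
                m[r] = [x - k * y for x, y in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


def simulate_trial(n, n0, sigma, rng):
    """Run one trial and return C_n, the number of second-stage 'mistreatments'."""
    rows, ys, mistreated = [], [], 0
    for k in range(n):
        x = rng.uniform(A, B)
        if k < 2 * n0:
            t = 1 if k < n0 else 0  # first stage: n0 patients on A, then n0 on B
        else:
            est = least_squares(rows, ys)
            pi_hat = f1(est[2] + est[3] * x)      # (4.12) and (4.13)
            t = 1 if rng.random() < pi_hat else 0  # (4.4)
            delta = BETA[2] + BETA[3] * x          # true difference (4.11)
            mistreated += (t == 1 and delta <= 0) or (t == 0 and delta > 0)
        y = (BETA[0] + BETA[1] * x + BETA[2] * t + BETA[3] * x * t
             + rng.gauss(0.0, sigma))
        rows.append((1.0, x, float(t), x * t))
        ys.append(y)
    return mistreated


print(round(limit_L(BETA[2], BETA[3], A, B), 3))  # 0.242
rng = random.Random(1)
counts = [simulate_trial(100, 5, 0.5, rng) for _ in range(50)]
print(sum(counts) / len(counts))  # average C_100 over 50 replications
```

With only 50 replications the average of $C_{100}$ is noisy; the tables below are based on 10000 replications per setting.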
Table 4.2: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 30

Bias
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$    -0.0007   -0.0025   -0.0239   -0.0798   -0.2787
$\hat{\beta}_2$     0.0001    0.0002    0.0014    0.0047    0.0129
$\hat{\beta}_3$     0.0010    0.0012    0.0253    0.0861    0.2092
$\hat{\beta}_4$    -0.0003   -0.0002   -0.0039   -0.0144   -0.0330

Standard deviation
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$     0.0762    0.1523    0.3870    0.7851    2.3751
$\hat{\beta}_2$     0.0100    0.0224    0.0583    0.1204    0.2590
$\hat{\beta}_3$     0.0900    0.1780    0.4629    0.9302    2.0170
$\hat{\beta}_4$     0.0141    0.0300    0.0755    0.1597    0.3524

Table 4.3: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 50

Bias
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$    -0.0021   -0.0041   -0.0261   -0.0883   -0.3109
$\hat{\beta}_2$     0.0002    0.0004    0.0027    0.0070    0.0212
$\hat{\beta}_3$     0.0021    0.0043    0.0287    0.0997    0.2973
$\hat{\beta}_4$    -0.0003   -0.0008   -0.0048   -0.0160   -0.0478

Standard deviation
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$     0.0632    0.1261    0.3122    0.6534    1.3500
$\hat{\beta}_2$     0.0100    0.0173    0.0447    0.1170    0.2245
$\hat{\beta}_3$     0.0728    0.1428    0.3585    0.8209    1.6500
$\hat{\beta}_4$     0.0100    0.0224    0.0583    0.1453    0.2900

Table 4.4: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 100

Bias
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$    -0.0009   -0.0035   -0.0197   -0.0644   -0.2559
$\hat{\beta}_2$     0.0001    0.0004    0.0021    0.0067    0.0216
$\hat{\beta}_3$     0.0010    0.0044    0.0217    0.0713    0.2711
$\hat{\beta}_4$    -0.0002   -0.0007   -0.0034   -0.0116   -0.0416

Standard deviation
                   σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$\hat{\beta}_1$     0.0469    0.0949    0.2328    0.4824    1.0516
$\hat{\beta}_2$     0.0100    0.0141    0.0332    0.0678    0.1539
$\hat{\beta}_3$     0.0529    0.1068    0.2615    0.5530    1.2147
$\hat{\beta}_4$     0.0100    0.0173    0.0424    0.0877    0.1977

Table 4.5: Simulation results for $C_{30}$ in the case n = 30

                        σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$C_{30}$      estimate  4.8632    4.8698    4.9634    5.2542    5.9658
$C_{30}$      std. dev. 1.8988    1.9196    2.0262    2.2815    3.0394
$C_{30}/20$   estimate  0.2432    0.2435    0.2482    0.2627    0.2983
$C_{30}/20$   std. dev. 0.0949    0.0960    0.1013    0.1141    0.1520

Table 4.6: Simulation results for $C_{50}$ in the case n = 50

                        σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$C_{50}$      estimate  9.7533    9.6947    9.7995    10.1507   11.3508
$C_{50}$      std. dev. 2.7314    2.7323    2.8569    3.3095    4.8413
$C_{50}/40$   estimate  0.2438    0.2424    0.2450    0.2538    0.2838
$C_{50}/40$   std. dev. 0.0683    0.0684    0.0714    0.0827    0.1210

Table 4.7: Simulation results for $C_{100}$ in the case n = 100

                        σ=0.1     σ=0.2     σ=0.5     σ=1       σ=2
$C_{100}$     estimate  21.7849   21.8040   21.9066   22.3827   24.0572
$C_{100}$     std. dev. 4.0953    4.1247    4.3555    5.1867    7.7539
$C_{100}/90$  estimate  0.2421    0.2423    0.2434    0.2487    0.2673
$C_{100}/90$  std. dev. 0.0455    0.0458    0.0484    0.0576    0.0860

[Figure 4.5: Histogram of 10000 replications for the bias of $\hat{\beta}_1$ when n = 100, σ = 0.1, 0.5, 1, 2]

[Figure 4.6: Histogram of 10000 replications for the bias of $\hat{\beta}_2$ when n = 100, σ = 0.1, 0.5, 1, 2]

[Figure 4.7: Histogram of 10000 replications for the bias of $\hat{\beta}_3$ when n = 100, σ = 0.1, 0.5, 1, 2]

[Figure 4.8: Histogram of 10000 replications for the bias of $\hat{\beta}_4$ when n = 100, σ = 0.1, 0.5, 1, 2]

[Figure 4.9: Histogram of 10000 replications for the estimates of $C_{100}$ when n = 100, σ = 0.1, 0.5, 1, 2]

Finally, we want to compare, from an ethical point of view, our design (ADC) with other designs proposed in the literature.
We compare the number of 'mistreatments', $C_n$, and the proportion $C_n/(n - 2n_0)$ under our design with those from two other designs:

• the complete randomization (CR) design, where each treatment has a 50% chance of being assigned (the covariate is ignored);

• the deterministic (D) design, where allocation is made according to the larger predicted mean response of the two treatments, that is $t_k = I[\hat{Y}_k^A > \hat{Y}_k^B]$.

The results from the simulations with $n = 50, 100$ and $\sigma = 0.1, 0.2, 0.5, 1, 2$ are presented in Tables 4.8-4.11. Recall that in the first stage of our design we allocate 5 patients to each treatment, so for comparison purposes we take $100 - 10 = 90$, respectively $50 - 10 = 40$, allocations for the other two designs.

Table 4.8: Simulation results for $C_{50}$ in the case n = 50

Estimate
         ADC     CR      D
σ=0.1    9.75    19.99   17.70
σ=0.2    9.70    20.01   17.71
σ=0.5    9.80    20.01   17.62
σ=1      10.15   19.99   17.87
σ=2      11.35   20.04   18.31

Standard deviation
         ADC     CR      D
σ=0.1    2.73    3.13    3.72
σ=0.2    2.73    3.22    3.73
σ=0.5    2.86    3.13    3.74
σ=1      3.31    3.17    3.93
σ=2      4.84    3.18    4.18

Table 4.9: Simulation results for $C_{50}/40$ in the case n = 50

Estimate
         ADC     CR      D
σ=0.1    0.244   0.500   0.443
σ=0.2    0.243   0.500   0.443
σ=0.5    0.245   0.500   0.441
σ=1      0.254   0.500   0.447
σ=2      0.283   0.501   0.458

Standard deviation
         ADC     CR      D
σ=0.1    0.068   0.078   0.093
σ=0.2    0.068   0.081   0.093
σ=0.5    0.072   0.078   0.094
σ=1      0.083   0.079   0.098
σ=2      0.121   0.080   0.105

Table 4.10: Simulation results for $C_{100}$ in the case n = 100

Estimate
         ADC     CR      D
σ=0.1    21.78   44.98   38.98
σ=0.2    21.80   45.09   39.02
σ=0.5    21.91   44.98   39.27
σ=1      22.38   44.96   39.58
σ=2      24.06   44.96   40.84

Standard deviation
         ADC     CR      D
σ=0.1    4.10    4.73    6.73
σ=0.2    4.12    4.69    6.75
σ=0.5    4.36    4.80    6.98
σ=1      5.19    4.74    7.31
σ=2      7.75    4.74    8.12

Table 4.11: Simulation results for $C_{100}/90$ in the case n = 100

Estimate
         ADC     CR      D
σ=0.1    0.242   0.500   0.433
σ=0.2    0.242   0.501   0.434
σ=0.5    0.243   0.500   0.436
σ=1      0.249   0.499   0.440
σ=2      0.267   0.499   0.454

Standard deviation
         ADC     CR      D
σ=0.1    0.046   0.053   0.075
σ=0.2    0.046   0.052   0.075
σ=0.5    0.048   0.053   0.078
σ=1      0.058   0.053   0.081
σ=2      0.086   0.053   0.090

4.3 Conclusions and Future Work

The design we propose is quite natural and fits well in situations described by the outline below:

• Delay in response is moderate, allowing the adaptation to take place.

• Recruitment can take place during most or all of the trial.

• It is known that a generalized linear model is a good fit for the relationship between the response, the covariates and the treatment.

• A modest gain in terms of treatment successes is desirable from an ethical standpoint.

The design can be especially useful if it is also desirable to have an estimate of the parameters of the model during the clinical trial, without waiting for all the data to be collected.

We have seen that the asymptotic results obtained in theorems 3.3.1 and 4.2.1 are consistent with the simulations even when $n$ is relatively small ($n = 30, 50, 100$). The histograms also show a normal behavior for the estimators, but as the dispersion parameter $\sigma^2$ increases the variance of the estimators increases too, leading in some cases to more biased estimates.

There are several lines of research arising from the current thesis work which should be pursued.

• There is a natural extension to the case when more than two treatments are compared in the clinical trial.

• One of the requirements of the design studied was to have responses available prior to the assignment of the treatment to the next patient. But, as is often the case in practice, delayed responses should be considered in the design, too.

• Of particular interest is the extension of the asymptotic normality result from the normal linear model to the generalized linear model.

• A useful design is one in which the function $\pi_k$ used in randomizing the allocation is obtained from various optimality criteria. Of interest in this case is also to compare the optimal target with its counterpart from the adaptive design.
• A lot of work has been done in the literature for the case where the covariates are categorical, so that a stratified design can be employed. An application of our design in this particular case would be interesting, and the performance of the designs should be compared.

Appendix A

A.1

Theorem A.1.1 (Stolz-Cesaro Theorem (Siretchi (1985))). Let $(a_n)_{n\ge 1}$ and $(b_n)_{n\ge 1}$ be two sequences of real numbers. If $b_n$ is positive, strictly increasing and unbounded and the following limit exists:
\[ \lim_{n\to\infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = l, \]
then the limit
\[ \lim_{n\to\infty} \frac{a_n}{b_n} \]
also exists and is equal to $l$.

Proof. From the definition of convergence, for every $\epsilon > 0$ there is $N(\epsilon) \in \mathbb{N}$ such that for all $n \ge N(\epsilon)$ we have:
\[ l - \epsilon < \frac{a_{n+1} - a_n}{b_{n+1} - b_n} < l + \epsilon. \]
Since $b_n$ is strictly increasing, this gives
\[ (l - \epsilon)(b_{n+1} - b_n) < a_{n+1} - a_n < (l + \epsilon)(b_{n+1} - b_n). \]
Let $k > N(\epsilon)$ be a natural number. Summing the last relation we get:
\[ (l - \epsilon)\sum_{i=N(\epsilon)}^{k} (b_{i+1} - b_i) < \sum_{i=N(\epsilon)}^{k} (a_{i+1} - a_i) < (l + \epsilon)\sum_{i=N(\epsilon)}^{k} (b_{i+1} - b_i) \]
\[ (l - \epsilon)(b_{k+1} - b_{N(\epsilon)}) < a_{k+1} - a_{N(\epsilon)} < (l + \epsilon)(b_{k+1} - b_{N(\epsilon)}). \]
Divide the last relation by $b_{k+1} > 0$ to get:
\[ (l - \epsilon)\Big(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\Big) < \frac{a_{k+1}}{b_{k+1}} - \frac{a_{N(\epsilon)}}{b_{k+1}} < (l + \epsilon)\Big(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\Big) \]
\[ (l - \epsilon)\Big(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\Big) + \frac{a_{N(\epsilon)}}{b_{k+1}} < \frac{a_{k+1}}{b_{k+1}} < (l + \epsilon)\Big(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\Big) + \frac{a_{N(\epsilon)}}{b_{k+1}}. \]
Since $b_{k+1} \to \infty$, the terms $\frac{b_{N(\epsilon)}}{b_{k+1}}$ and $\frac{a_{N(\epsilon)}}{b_{k+1}}$ converge to 0, so there is some $K$ such that for $k \ge K$ we have:
\[ l - 2\epsilon < \frac{a_{k+1}}{b_{k+1}} < l + 2\epsilon. \]
Because $\epsilon > 0$ was arbitrary, this means that:
\[ \lim_{n\to\infty} \frac{a_n}{b_n} = l. \qquad \square \]

Bibliography

[1] Anderson, T.W. and Taylor, J. (1979). Strong consistency of least squares estimators in dynamic models, The Annals of Statistics, 7, 484-489.

[2] Atkinson, A.C. (1982). Optimum biased coin designs for sequential clinical trials with prognostic factors, Biometrika, 69, 61-67.

[3] Blackwell, D. and Hodges, J.L. (1957). Design for the control of selection bias, Annals of Mathematical Statistics, 28, 446-460.

[4] Chen, K., Hu, I. and Ying, Z. (1999). Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs, The Annals of Statistics, 27, 1155-1163.

[5] Clayton, M.K. (1989). Covariate models for Bernoulli bandits, Sequential Analysis, 8, 406-426.
[6] Cornell, R.G., Landenberger, B.D. and Bartlett, R.H. (1986). Randomized play the winner clinical trials, Communications in Statistics - Theory and Methods, 15, 159-178.

[7] Dvoretzky, A. (1972). Asymptotic normality for sums of dependent random variables, Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 2, 513-535.

[8] Efron, B. (1971). Forcing a sequential experiment to be balanced, Biometrika, 58, 403-417.

[9] Geraldes, M.C. (1999). Covariates in adaptive designs for clinical trials, doctoral dissertation, Michigan State University Graduate School.

[10] Ivanova, A., Rosenberger, W., Durham, S. and Flournoy, N. (2000). A birth and death urn for randomized clinical trials: asymptotic methods, Sankhyā: The Indian Journal of Statistics B, 62, 104-118.

[11] Jennison, C. and Turnbull, B.W. (2000). Group Sequential Methods with Applications to Clinical Trials, Chapman and Hall, Boca Raton.

[12] Lachin, J.M. (1988). Properties of simple randomization in clinical trials, Controlled Clinical Trials, 9, 312-326.

[13] Lai, T.L. and Robbins, H. (1979). Adaptive design and stochastic approximation, The Annals of Statistics, 7, 1196-1221.

[14] Lai, T.L., Robbins, H. and Wei, C.Z. (1979). Strong consistency of least squares estimates in multiple regression II, Journal of Multivariate Analysis, 9, 343-361.

[15] Lai, T.L. and Wei, C.Z. (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems, The Annals of Statistics, 10, 154-166.

[16] Lai, T.L. and Robbins, H. (1982). Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 56, 329-360.

[17] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd ed., Chapman and Hall, London.

[18] Melfi, V. and Page, C. (2000). Estimation after adaptive allocation, Journal of Statistical Planning and Inference, 87, 353-363.

[19] Melfi, V., Page, C. and Geraldes, M. (2001). An adaptive randomized design with application to estimation, Canadian Journal of Statistics, 29, 107-116.

[20] Pocock, S.J. and Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial, Biometrics, 31, 103-115.

[21] Robbins, H. and Monro, S. (1951). A stochastic approximation method, Annals of Mathematical Statistics, 22, 400-407.

[22] Rosenberger, W.F. (1999). Randomized play-the-winner clinical trials: review and recommendations, Controlled Clinical Trials, 20, 328-342.

[23] Rosenberger, W. (2002). Randomized urn models and sequential design, Sequential Analysis, 21, 1-41.

[24] Sarkar, J. (1991). One-armed bandit problems with covariates, The Annals of Statistics, 19, 1978-2002.

[25] Siretchi, G. (1985). Differential and Integral Calculus, vol. 1, Editura Stiintifica si Enciclopedica, Bucharest.

[26] Smith, R.L. (1984). Sequential treatment allocation using biased coin designs, Journal of the Royal Statistical Society B, 46, 519-543.

[27] Tamura, R.N., Faries, D.E., Andersen, J.S. and Heiligenstein, J.H. (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder, Journal of the American Statistical Association, 89, 768-776.

[28] Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-Newton method, Biometrika, 61, 439-447.

[29] Wei, L.J. (1978). The adaptive biased coin design for sequential experiments, The Annals of Statistics, 6, 92-100.

[30] Wei, L.J. and Durham, S. (1978). The randomized play-the-winner rule in medical trials, Journal of the American Statistical Association, 73, 840-843.

[31] Wu, C.F.J. (1985). Efficient sequential designs with binary data, Journal of the American Statistical Association, 80, 974-984.

[32] Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates, The Annals of Statistics, 30, 100-121.