Ph.D.

 

This is to certify that the
dissertation entitled

ADAPTIVE DESIGNS WITH COVARIATES

presented by

GEORGE SIRBU

has been accepted towards fulﬁllment
of the requirements for the

degree in Statistics and Probability

 


Major Professor’s Signature


Date

 

MSU is an Affirmative Action/Equal Opportunity Institution

 


ADAPTIVE DESIGNS WITH COVARIATES

By

George Sirbu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2004

ABSTRACT
ADAPTIVE DESIGNS WITH COVARIATES
by
George Sirbu

The potential beneﬁts of adaptive allocation have been recognized as a great con-
tribution especially for clinical trials. Implementation of these allocation schemes
eases the ethical problem involved in trials on human subjects. While response-
adaptive randomization does not eliminate the ethical problem of randomizing pa-
tients to the inferior treatment, it can mitigate it by making the probability of
assignment to the inferior treatment smaller. The relatively new techniques of
response-adaptive randomization are attractive to industry and also to the Food
and Drug Administration. An essential feature is to balance the conﬂict between
information gathering - collective ethics - and immediate payoff - individual ethics.
In this sense response-adaptive randomization represents a middle ground between
community beneﬁt and individual patient beneﬁt.

In this dissertation we concentrate on designs with covariate information and re-
sponse adaptive randomization procedures. In some studies it is desired to conduct
an analysis that is 'adjusted' for other covariates. Stratification is one approach to
adjust for covariates or to increase the efficiency by accounting for a highly influential
covariate. This approach, however, is only applicable to qualitative covariates,
or discretized quantitative covariates, and they must be few in number or else the
number of strata grows exponentially. In many respects it is more natural to per-
form an adjustment using a regression model that allows for both qualitative and
quantitative covariates simultaneously.

The problem consists of choosing in a sequential manner one of two treatments
while we continuously observe the information from the process - the response and
the covariates. We try to choose the best allocation while at the same time we learn
from the process. We are able to prove strong consistency and asymptotic normality
for maximum quasi-likelihood estimators of regression parameters in generalized
linear models with a condition of smoothness for the link function. The key for the
applicability of the results is to have a martingale difference structure for the errors
of the model and to have a matrix for the covariates that is compact enough, where
the compactness condition is expressed in terms of the eigenvalues of the design
matrix.

The results are general enough to allow us to define the design in a large variety of
situations. For obvious ethical reasons, we also investigate how many patients we can
expect the design to allocate to the inferior treatment under some given underlying
distributions for the covariate. Various Monte Carlo simulations are used
to evaluate the performance of the design for suitable choices of the parameters.

To my dear wife Mihaela.


ACKNOWLEDGEMENT

I would like to express my gratitude to all those who made it possible for me to
complete this thesis. I am deeply indebted to my supervisors, Dr. C. Page and Dr. V.
Melfi, whose help, stimulating suggestions and encouragement supported me throughout
the research for and writing of this thesis. I wish to thank the members of my
committee, Dr. H. Salehi and Dr. R. Erickson, who have given me valuable suggestions.

My research career would have been impossible without my parents' love and support,
and without the encouragement of my colleagues and friends. I appreciate all their
friendship and their collective encouragement to finish this dissertation.

I would especially like to thank my wife Mihaela, whose
patient love enabled me to overcome the obstacles on this journey.

Table of contents

List of Tables

List of Figures

1 Chapter 1
  1.1 Introduction
  1.2 A Short Review of Adaptive Designs
  1.3 Applications

2 Chapter 2
  2.1 Introduction
  2.2 The Model and Notation
  2.3 The Distribution of the Responses under Adaptive Design
    2.3.1 A counterexample

3 Chapter 3
  3.1 Introduction
  3.2 A Generalized Linear Model Relating the Response, the Adaptive Design and the Covariates
  3.3 A Consistency Result
  3.4 An Asymptotic Normality Result

4 Chapter 4
  4.1 Introduction
  4.2 A Normal Response Example
    4.2.1 An Evaluation Function for the Number of Patients on the Inferior Treatment
    4.2.2 Simulations of the Design
  4.3 Conclusions and Future Work

Appendix

Bibliography

List of Tables

4.1 Allocation probabilities to treatment A
4.2 Simulation results for $\beta_1, \beta_2, \beta_3, \beta_4$, in the case n = 30
4.3 Simulation results for $\beta_1, \beta_2, \beta_3, \beta_4$, in the case n = 50
4.4 Simulation results for $\beta_1, \beta_2, \beta_3, \beta_4$, in the case n = 100
4.5 Simulation results for $C_{30}$, in the case n = 30
4.6 Simulation results for $C_{50}$, in the case n = 50
4.7 Simulation results for $C_{100}$, in the case n = 100
4.8 Simulation results for $C_{50}$, in the case n = 50
4.9 Simulation results for $C_{50}$ / 40, in the case n = 50
4.10 Simulation results for $C_{100}$, in the case n = 100
4.11 Simulation results for $C_{100}$ / 90, in the case n = 100

List of Figures

4.1 Graph of function $f_1(y)$
4.2 Illustration of Corollary 4.2.2
4.3 The regression lines for treatment A and B
4.4 A simulation design
4.5 Histogram of 10000 replications for bias of $\beta_1$ when n = 100, a = 0.1, 0.5, 1, 2
4.6 Histogram of 10000 replications for bias of $\beta_2$ when n = 100, a = 0.1, 0.5, 1, 2
4.7 Histogram of 10000 replications for bias of $\beta_3$ when n = 100, a = 0.1, 0.5, 1, 2
4.8 Histogram of 10000 replications for bias of $\beta_4$ when n = 100, a = 0.1, 0.5, 1, 2
4.9 Histogram of 10000 replications for estimates of $C_{100}$ when n = 100, a = 0.1, 0.5, 1, 2

Chapter 1

Literature Review

1.1 Introduction

The current dissertation is motivated by the following scenario: in a clinical trial
in which two drugs, A and B, are to be evaluated, a design is wanted such that the
treatment assignment takes into consideration possible covariates and past responses.
A design that incorporates into the allocation rule the information obtained from
past observations is called an adaptive design. In this chapter we review some of the
designs proposed in the literature, and in later chapters we will see how to construct
a design for the problem formulated above.

As usual in a design problem, we want to balance two goals. On the one hand we want
the design to be 'ethical' in the sense that the allocation should favor the treatment
that shows the best performance thus far in the trial (this can be thought of as an
'individualistic goal' as defined by Sarkar (1991)). On the other hand we want the
design to allow us to draw reliable statistical inference from the data to be collected
(this can be thought of as a 'utilitarian goal', also as defined by Sarkar (1991)).
Unfortunately the utilitarian and individualistic goals are usually conflicting.

Adaptive designs may address both of the concerns raised above. The adaptiveness
of the design reflects that the treatment allocation is based on the responses available
thus far in the trial. But the problem is more complicated here because we set
our goal to make the design dependent on covariates too. As one may guess, the
covariate information should make the design more 'ethical', but at the same time
it is likely to complicate the statistical inference.

In the current dissertation we develop and study a new design that satisfies
the goals described above. In the second chapter we look at some
general properties of the distribution of responses in the case of an adaptive design
with covariates. In chapter three we prove some asymptotic results for such
designs in the case of a specific model for the responses. In the final chapter we
look at some particular designs and at how to evaluate them from an 'ethical' point of
view, and simulation results are shown.

The dissertation work concentrates on designs where the response to the treatment
depends on a vector of covariates. Such designs will henceforth be referred to as
covariate designs, as opposed to non-covariate designs, which do not take into
consideration covariate information on the patients involved in the clinical trial. The
other focus of the dissertation is on the adaptiveness of the design as described
above. Following the classification of Rosenberger (2002), the designs where the
allocation is based on the previous treatment allocations, used primarily when a
balanced allocation is desired, are called allocation-adaptive designs. The designs
where the allocation is based on the previous responses to the treatments, used
primarily when ethical considerations are preferred, are called response-adaptive designs.

Before presenting and studying the new design we begin by reviewing some of
the procedures that have already been proposed in the literature. This review can
be regarded as an introduction and motivation for the new procedure proposed in
this dissertation; it will be particularly useful in showing which aspects of the new
design are novel. It also introduces the designs that are to be used as basic points
of reference for the new design.

1.2 A Short Review of Adaptive Designs

The main focus of this section is to review some of the adaptive designs available in
the literature. Before we start to review some of these, it is important to acknowledge
the role of randomization in a clinical trial.

The randomized clinical trial guards against systematic bias. Another prop-
erty of randomization is that it promotes comparability among the study groups.
Such comparability can only be attempted in observational studies by adjusting for
or matching on known covariates. Moreover, the act of randomization provides a
probabilistic basis for an inference from the observed results.

In the remainder of this section, unless otherwise noted, it is assumed that
patients arrive sequentially and that each patient can be assigned to exactly one of
two treatments, although many of these designs can be modified to handle more
than two treatments.

Complete randomization is a fair coin-tossing assignment: each patient has equal
probability of being assigned to either of the two treatments. Sometimes it is desired
that the final allocation be exactly equal between the treatments. When the total
sample size, n, is known, a truncated binomial design [Blackwell and Hodges (1957)]
will achieve the goal of equal allocation. The design is simply to use complete
randomization until one treatment has been assigned n/2 times; all subsequent
patients receive the opposite treatment. An alternative is the random allocation
rule [Lachin (1988)]. Let $(t_k)_{k=1,\dots,n}$ be the sequence of random assignments
(where $t_k = 1$ if the kth patient receives treatment A and $t_k = 0$ otherwise), and
denote by $N^A_k$ the number of patients randomized to treatment A after k patients
have been allocated. The design is defined by the following:

$$P(t_k = 1) = \frac{n/2 - N^A_{k-1}}{n - k + 1}.$$

The difference between the last two designs is that the treatment sequences are not
equiprobable for the truncated binomial design. These last two designs can be
embedded in a larger class of designs called restricted randomization designs. In this
case the treatment assignments $(t_k)_{k=1,\dots,n}$ are dependent, with a
variance-covariance matrix $\Sigma_t \neq I/4$. They are used when it is desired to
have equal numbers of patients assigned to each treatment group.
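
The two equal-allocation designs above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: n is taken to be even, treatment A is coded as 1 and B as 0, and the function names are hypothetical, not taken from the cited papers.

```python
import random


def truncated_binomial(n, rng):
    """Fair coin until one arm has n/2 patients, then fill the other arm."""
    half = n // 2
    assignments = []
    n_a = 0
    for k in range(n):
        n_b = k - n_a
        if n_a == half:
            t = 0  # arm A is full, so the patient goes to B
        elif n_b == half:
            t = 1  # arm B is full, so the patient goes to A
        else:
            t = 1 if rng.random() < 0.5 else 0
        assignments.append(t)
        n_a += t
    return assignments


def random_allocation_rule(n, rng):
    """Assign A with probability (n/2 - N^A_{k-1}) / (n - k + 1)."""
    half = n // 2
    assignments = []
    n_a = 0
    for k in range(1, n + 1):
        p = (half - n_a) / (n - k + 1)
        t = 1 if rng.random() < p else 0
        assignments.append(t)
        n_a += t
    return assignments


if __name__ == "__main__":
    rng = random.Random(2004)
    print(truncated_binomial(10, rng))
    print(random_allocation_rule(10, rng))
```

Both functions always return exactly n/2 assignments to each arm; they differ only in how probable the individual sequences are.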

The biased coin design proposed by Efron (1971) is a modification of complete
randomization in the sense that it is a biased coin-tossing. Let $D_k = N^A_k - N^B_k$ be
the difference between the number of patients who received treatment A and those
who received treatment B after k patients have been randomized. Fix a constant p
in the interval [0.5, 1). Then the design is described by:

$$P(t_k = 1) = \begin{cases} 1 - p & \text{if } D_{k-1} > 0 \\ 1/2 & \text{if } D_{k-1} = 0 \\ p & \text{if } D_{k-1} < 0. \end{cases}$$

Wei (1978) noted that a disadvantage of this procedure is that, in assigning the
next patient to a treatment, the allocation policy neither takes into consideration
the number of patients treated thus far, nor does it discriminate between small and
large absolute values of $D_k$. He proposed an adaptive biased coin design as follows:
let $h : [-1, 1] \to [0, 1]$ be a non-increasing function such that $h(x) = 1 - h(-x)$ for
all $x \in [-1, 1]$. Then

$$P(t_k = 1) = h\left(\frac{D_{k-1}}{k - 1}\right).$$

This allocation policy forces an extremely imbalanced experiment to be balanced
but tends to complete randomization as the difference $D_k$ tends to zero.
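
To make the two biased coin rules concrete, here is a minimal Python sketch. The value p = 2/3 and the linear choice h(x) = (1 - x)/2 are assumptions made for illustration only (any p in the stated interval and any non-increasing h with h(x) = 1 - h(-x) would do), and the convention of a fair coin for the first patient in the adaptive rule is also an assumption, since the ratio is undefined when k = 1.

```python
import random


def efron_probability(d_prev, k=None, p=2 / 3):
    """P(t_k = 1) for the biased coin design, given the imbalance D_{k-1}."""
    if d_prev > 0:
        return 1 - p
    if d_prev < 0:
        return p
    return 0.5


def wei_probability(d_prev, k, h=lambda x: (1 - x) / 2):
    """P(t_k = 1) = h(D_{k-1} / (k - 1)) for the adaptive biased coin design."""
    if k == 1:
        return 0.5  # assumed convention: fair coin for the first patient
    return h(d_prev / (k - 1))


def final_imbalance(prob_fn, n, rng):
    """Run one trial of n patients and return D_n = N^A_n - N^B_n."""
    d = 0
    for k in range(1, n + 1):
        t = 1 if rng.random() < prob_fn(d, k) else 0
        d += 1 if t == 1 else -1
    return d


if __name__ == "__main__":
    rng = random.Random(1978)
    print("Efron:", final_imbalance(efron_probability, 100, rng))
    print("Wei:  ", final_imbalance(wei_probability, 100, rng))
```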

All previous designs have in common that they are not response-adaptive, i.e.
they do not depend on the observed responses from the trial. Their goal is to
randomize the treatment while maintaining some balance. The next designs that we
present are response-adaptive. Response-adaptive randomization can be
used when various considerations make it desirable to have unequal numbers of
patients assigned to the treatments. For example, it is very common in clinical trials
to want more patients assigned to the superior treatment. Other examples are driven
by optimality criteria, described below, which result in imbalanced
allocation.

A well-known design of this kind is the randomized play-the-winner rule (RPW)
introduced by Wei and Durham (1978). The design can be described with an urn model
as follows. The urn initially contains $u$ balls of each of two types, corresponding to
the two treatments. When a patient enters the study, a ball is drawn at random from
the urn and the treatment is assigned according to the ball type. The ball is returned
to the urn and the response is observed. If a success has occurred, add $\beta$ balls of
the type selected and $\alpha$ balls of the other type. If a failure has occurred, add
$\alpha$ balls of the type selected and $\beta$ balls of the other type. The rule is
denoted by RPW($u, \alpha, \beta$). Different choices for the pair $(\alpha, \beta)$ give
different levels of compromise between balance and allocation to the better treatment.
A design used in practice is the RPW(1,0,1). Later, many authors modified and
generalized the RPW to achieve less selection bias, less variability, or some other
optimization criterion. We mention here for example the birth and death urn by Ivanova,
Rosenberger, Durham and Flournoy (2000). A good source for a review of urn models is
Rosenberger (2002).
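
The urn scheme can be simulated directly. The sketch below is an illustration only: the function name `rpw`, the dictionary of success probabilities `p_success`, and the use of Bernoulli responses generated on the spot are assumptions, not part of the original description.

```python
import random


def rpw(n, p_success, u=1, alpha=0, beta=1, rng=None):
    """Simulate RPW(u, alpha, beta) for n patients.

    p_success maps "A" and "B" to the (assumed) success probabilities.
    Returns the list of assigned arms and the final urn composition.
    """
    rng = rng or random.Random()
    urn = {"A": u, "B": u}
    assignments = []
    for _ in range(n):
        total = urn["A"] + urn["B"]
        # Draw a ball; it is returned to the urn, so counts are unchanged here.
        arm = "A" if rng.random() < urn["A"] / total else "B"
        other = "B" if arm == "A" else "A"
        success = rng.random() < p_success[arm]
        if success:
            urn[arm] += beta
            urn[other] += alpha
        else:
            urn[arm] += alpha
            urn[other] += beta
        assignments.append(arm)
    return assignments, urn


if __name__ == "__main__":
    arms, urn = rpw(50, {"A": 0.8, "B": 0.4}, rng=random.Random(7))
    print(arms.count("A"), "patients on A;", "final urn:", urn)
```

With RPW(1,0,1) every response adds exactly one ball, so after n patients the urn holds n + 2 balls.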

The designs presented so far have intuitive motivation and can be completely
non-parametric, but they are not derived from optimality considerations. Another
approach to response-adaptive design is based on optimal allocation targets, where a
specific criterion is optimized based on a population response model.

The approach we take here is similar to Melfi, Page and Geraldes (2001). Let
$\pi = \pi(\cdot, \cdot)$ be a function taking values in (0,1), and let $\theta^A$ and
$\theta^B$ be two parameters for the distribution of the responses from treatment A
and B, respectively. Assume that there is an unknown but estimable optimal proportion
$\pi = \pi(\theta^A, \theta^B)$ of the observations that should be allocated to
treatment A. Let $\hat\pi_{k-1}$ be an estimate of $\pi$ based on the first k - 1
observations. Then the rule is simply to allocate according to the estimate of the
optimal proportion:

$$P(t_k = 1) = \hat\pi_{k-1}.$$

The optimality criterion can be defined in various ways. For example, if the responses
are normally distributed with standard deviations $\sigma_A, \sigma_B$ and we want to
minimize the variance of the difference of the sample means when the total sample size
is fixed, then the optimal target is the proportion
$\pi = \sigma_A / (\sigma_A + \sigma_B)$, and thus the parameters to be
estimated at each step are $\theta^A = \sigma_A$ and $\theta^B = \sigma_B$.

Another criterion is given by Jennison and Turnbull (2000). Let $\mu_A, \mu_B$ be the
mean responses from treatment A and B, respectively, and let $\hat\mu^A_n, \hat\mu^B_n$
be their estimators after n observations. Consider $\pi = N^A_n / n$, where
$N^A_n + N^B_n = n$. The optimal $\pi$ is found by minimizing the function
$u(\mu_A, \mu_B) N^A_n + v(\mu_A, \mu_B) N^B_n$, with the variance of
$\hat\mu^A_n - \hat\mu^B_n$ fixed. For example, if the responses are dichotomous, so
that we can clearly define a success/failure, and we want to minimize the expected
number of treatment failures, then we choose $u(\mu_A, \mu_B) = 1 - \mu_A$,
$v(\mu_A, \mu_B) = 1 - \mu_B$, and in this case we obtain
$\pi = \sqrt{\mu_A} / (\sqrt{\mu_A} + \sqrt{\mu_B})$.
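
The two allocation targets, and the idea of allocating according to a running estimate of the target, can be sketched as follows. This is a hedged illustration rather than the procedure of any cited paper: the function names are hypothetical, responses are simulated as mean-zero normals, and the short alternating burn-in (used only so that both sample standard deviations exist) is an assumption.

```python
import math
import random
import statistics


def neyman_target(sigma_a, sigma_b):
    """Proportion on A minimizing the variance of the difference of means."""
    return sigma_a / (sigma_a + sigma_b)


def failure_target(mu_a, mu_b):
    """Proportion on A minimizing expected failures for dichotomous responses."""
    return math.sqrt(mu_a) / (math.sqrt(mu_a) + math.sqrt(mu_b))


def sequential_neyman(n, sd_a, sd_b, rng, burn_in=4):
    """Allocate patient k to A with probability equal to the estimated target.

    The first burn_in patients alternate between arms (an assumption made so
    that each arm has at least two responses before estimation starts).
    Returns the number of patients allocated to A.
    """
    responses = {1: [], 0: []}
    for k in range(1, n + 1):
        if k <= burn_in:
            t = k % 2
        else:
            s_a = statistics.stdev(responses[1])
            s_b = statistics.stdev(responses[0])
            pi_hat = s_a / (s_a + s_b) if s_a + s_b > 0 else 0.5
            t = 1 if rng.random() < pi_hat else 0
        sd = sd_a if t == 1 else sd_b
        responses[t].append(rng.gauss(0.0, sd))
    return len(responses[1])


if __name__ == "__main__":
    rng = random.Random(2001)
    n_a = sequential_neyman(500, sd_a=2.0, sd_b=1.0, rng=rng)
    print("target:", neyman_target(2.0, 1.0), "observed proportion:", n_a / 500)
```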

Melfi and Page (2000) give an elegant method for proving consistency and asymptotic
normality of estimators for a wide class of response-adaptive designs. We will
see in Chapter 2 and Chapter 3 how some results can be generalized to the case
where covariates are incorporated in the design.

Most of the work done in the area of covariate adaptive designs involves the
case when the covariates are categorical. Continuous covariates can also
be considered by grouping the possible values of the covariates in an appropriate way
into finitely many levels. A general approach is to form strata from combinations of
the categorical covariates and then apply the adaptive designs within each stratum.
Both Efron (1971) and Wei (1978) mentioned in their papers how to extend
their designs to the covariate case.

Pocock and Simon(1975) suggest an allocation rule which can be viewed as a
generalization of Efron’s biased coin design to more than two treatments and sev-
eral covariates. The design relies on a function G which measures the total amount
of imbalance (in the distribution of the treatment numbers within strata). Treat-
ments are then ranked according to their G-values. Their procedure is shown to
enable treatments to be balanced across strata more effectively. But as Pocock and
Simon (1975) note, a major difficulty with this approach is that the number of strata
increases rapidly as the number of covariates and their levels increase.

Geraldes (1999) proposes two new designs. The first one is the covariate adaptive
weighted differences design, which incorporates covariates and can be viewed as
a generalization of the adaptive biased coin design of Wei (1978) that carries
information from the responses of patients over from stratum to stratum. The second
one is the covariate randomized play-the-winner rule, which corresponds to a multiple
urn model, each urn representing a stratum. It allows the responses of patients in one
stratum to change the composition of the urns corresponding to the other strata.

Instead of concerns about balancing treatment assignments across strata, one
can take an entirely different approach and ﬁnd an allocation rule that optimizes
some criterion, such as minimizing the variance of the estimated treatment effect in
the presence of covariates. Such a rule would necessarily require the speciﬁcation of
a model linking the covariates and the treatment effect.

As an example, Atkinson (1982) chooses a standard linear regression model:

$$E(Y_i) = x_i'\beta, \quad i = 1, 2, \dots, n,$$

where the $Y_i$ are independent responses with $Var(Y) = \sigma^2 I$ and $x_i$ includes a
treatment indicator and selected covariates of interest. Then
$Var(\hat\beta) = \sigma^2 (X'X)^{-1}$, where $X'X$ is the dispersion matrix. For the
construction of the optimal design, we wish to find the n points of experimentation at
which some function is optimized.

The $D_A$-optimal design uses $D_A$-optimality, which maximizes

$$D_A = |A' M^{-1} A|^{-1},$$

where $M = X'X/n$ and $A$ is a matrix of contrasts. He then proposes the following
sequential design: assuming that n patients have already been allocated, the (n + 1)th
patient is assigned to the treatment that maximizes $D_A(\xi_n)$ evaluated at the n-point
design $\xi_n$. He also compares the $D_A$ criterion to D-optimality, which maximizes the
log determinant of M.
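
A small numerical sketch may help to see how such a sequential rule operates. The setup below is an assumption made purely for illustration and is not taken from Atkinson (1982): each design row is (1, t, z) with the treatment coded as t = +1 for A and t = -1 for B, z is a single covariate, and the contrast is the single vector a = (0, 1, 0)' that picks out the treatment effect, so that $D_A$ reduces to a scalar. The candidate treatment for the next patient is the one giving the larger $D_A$ once that patient's row is added to the design.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def d_a(rows, contrast):
    """D_A = (a' M^{-1} a)^{-1} for one contrast vector a, with M = X'X / n."""
    n = len(rows)
    p = len(contrast)
    m = [[sum(r[i] * r[j] for r in rows) / n for j in range(p)] for i in range(p)]
    m_inv_a = solve(m, list(contrast))
    return 1.0 / sum(a * v for a, v in zip(contrast, m_inv_a))


def next_treatment(rows, covariate, contrast=(0.0, 1.0, 0.0)):
    """Return +1.0 (A) or -1.0 (B), whichever maximizes D_A after adding the patient."""
    best_t, best_value = None, None
    for t in (1.0, -1.0):
        value = d_a(rows + [[1.0, t, covariate]], contrast)
        if best_value is None or value > best_value:
            best_t, best_value = t, value
    return best_t


if __name__ == "__main__":
    # Three patients already allocated: rows are (intercept, treatment, covariate).
    design = [[1.0, 1.0, 0.2], [1.0, -1.0, 0.5], [1.0, 1.0, 0.9]]
    print(next_treatment(design, 0.4))
```

In this toy design two of the three earlier patients received A, and the rule sends the next patient to B, which is the balancing behaviour one would expect from a criterion that minimizes the variance of the estimated treatment effect.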

Clayton (1989) proposes a covariate model for a problem in which the responses of the
patients are governed by Bernoulli distributions. The model can be formally described
as follows. Let $Y^A_k$ and $Y^B_k$ denote the potential dichotomous responses of the kth
patient to treatment A and B, respectively. Before assigning patient k, a covariate
$X_k$ can be observed on the patient. It is assumed that the covariate random variables
$X_1, X_2, \dots$ are i.i.d. with a known distribution F. He then considers a function
$H(\cdot)$ linking the responses to the covariates:

$$P(Y^A_k = 1 \mid X_k) = H(\alpha + \beta X_k)$$

$$P(Y^B_k = 1 \mid X_k) = H(c + d X_k)$$

where $\alpha$ and $\beta$ are unknown constants and c and d are known constants.
Following a Bayesian approach, he assumes that prior information regarding $\alpha$ and
$\beta$ is available, given by a probability distribution. The worth of an allocation
(strategy) is defined as the expected sum of the first n observations for all possible
histories resulting from the allocation policy. A strategy is called optimal if it
yields the maximal worth. The research focuses on determining the structural
characteristics of exactly optimal strategies. Yang and Zhu (2002) generalize the
problem to the case when the responses are continuous, and they propose a design for
which they are able to prove that the strategy is strongly consistent, in the sense
that the accumulated reward is asymptotically equivalent to that based on the best
treatment. Their approach employs nonparametric regression procedures for estimating
the dependence of the rewards on the covariates for the treatments. It uses a
randomized allocation scheme to control the trade-off between the tendency to use the
currently most promising treatment and further exploration to find the treatment that
is truly the best.

Lai and Wei (1982) consider the multiple regression model:

$$y_n = \beta_1 x_{n1} + \dots + \beta_p x_{np} + \epsilon_n, \quad n = 1, 2, \dots$$

where the $\epsilon_n$ are unobservable random errors, $\beta_1, \dots, \beta_p$ are
unknown parameters and $y_n$ is the observed response corresponding to the design levels
$x_{n1}, \dots, x_{np}$. Let $b_n$ be the least squares estimator of
$\beta = (\beta_1, \dots, \beta_p)$. The statistical properties of
the estimators $b_n$ are well understood in the case where the design levels $x_{ij}$ are
non-random constants, but there is a much less definitive theory for the case where the
covariate vectors $x_n = (x_{n1}, \dots, x_{np})$ are sequentially determined random
vectors. They consider an adaptive design in the sense that $x_n$ at each stage n depends
on the previous observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$. They give sufficient
conditions to prove strong consistency and asymptotic normality for $b_n$. They also give
an interesting counterexample to show that their conditions are in some sense the
weakest possible.

Here is the counterexample:

$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i$$

where $\epsilon_1, \epsilon_2, \dots$ are i.i.d. random variables with
$E\epsilon_1 = 0$, $E\epsilon_1^2 = 1$. The regressors are defined inductively by:

$$x_1 = 0, \quad x_{n+1} = \bar{x}_n + c\,\bar{\epsilon}_n$$

where $c \neq 0$ is a real constant and $\bar{x}_n$, $\bar{\epsilon}_n$ denote the
arithmetic means. Then it can be proved that the assumptions are violated and the least
squares estimate $b_{n2}$ of $\beta_2$ converges a.s. to $\beta_2 - c^{-1}$.
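
The limit $\beta_2 - c^{-1}$ can be made plausible by an informal calculation. The sketch below is a heuristic only and is not the proof given by Lai and Wei (1982); it uses the standard one-pass update identity for sums of cross-products and treats the martingale term as negligible without justification.

```latex
% Error of the least squares slope after n observations:
b_{n2} - \beta_2
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(\epsilon_i - \bar{\epsilon}_n)}
         {\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}.

% One-pass update identity for centered cross-products:
\sum_{i=1}^{n} (x_i - \bar{x}_n)(\epsilon_i - \bar{\epsilon}_n)
  = \sum_{i=2}^{n} \frac{i-1}{i}\,
    (x_i - \bar{x}_{i-1})(\epsilon_i - \bar{\epsilon}_{i-1}).

% The design gives x_i - \bar{x}_{i-1} = c\,\bar{\epsilon}_{i-1}, hence
\text{numerator}
  = c \sum_{i=2}^{n} \frac{i-1}{i}\,\bar{\epsilon}_{i-1}\,\epsilon_i
    \;-\; c \sum_{i=2}^{n} \frac{i-1}{i}\,\bar{\epsilon}_{i-1}^{\,2},
\qquad
\text{denominator}
  = c^2 \sum_{i=2}^{n} \frac{i-1}{i}\,\bar{\epsilon}_{i-1}^{\,2}.

% Since E\,\bar{\epsilon}_{i-1}^{\,2} = 1/(i-1), the common sum has mean
E \sum_{i=2}^{n} \frac{i-1}{i}\,\bar{\epsilon}_{i-1}^{\,2}
  = \sum_{i=2}^{n} \frac{1}{i} \sim \log n,

% while the first (martingale) term has variance of order \log n, so it is
% of smaller order than \log n. Heuristically,
b_{n2} - \beta_2 \approx \frac{-c \log n}{c^2 \log n} = -\frac{1}{c}.
```

The denominator grows only logarithmically, which is why the design carries too little information about the slope for the estimator to be consistent.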

Chen, Hu and Ying (1999) consider a similar problem in the context of the generalized
linear model. The key difference is the nonlinearity in the link
function. They handle this by establishing a local inverse function theorem. In doing
so, they show that the minimum conditions of Lai and Wei (1982), in conjunction
with an additional assumption that essentially offsets the nonlinearity, ensure strong
consistency in their case, too.

We will consider in Chapter 3 a generalization of these last two designs. In their
case the adaptiveness refers to the dependence of the covariate vector $x_n$ on the
observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$. In our case we consider the problem when
the vector $x_n$ has two components: a treatment indicator $t_n$ and a vector of patient
covariates. The choice of the treatment, $t_n$, is then based not only on the previous
observations $x_1, y_1, \dots, x_{n-1}, y_{n-1}$, but also on the covariate values observed
for the current patient.

1.3 Applications

Medical investigators who wish to perform clinical trials currently have a wide variety
of adaptive allocation procedures at their disposal. Still, very few clinical trials
based on adaptive designs have been reported in the literature. We mention some
of them here.

Cornell, Landenberger and Bartlett (1986) reported on an adaptive clinical trial
to test the efficacy of extracorporeal membrane oxygenation (ECMO) for the treatment
of persistent pulmonary hypertension of newborn infants. The design used
was RPW(1,0,1). The trial was stopped by a stopping rule after 12 patients in total,
with 1 patient allocated to the conventional therapy and the rest to the ECMO
treatment. The subsequent analysis of the ECMO trial data generated controversy;
the foremost question raised was whether two treatments can adequately be compared
when only one patient was assigned to one of the treatments. Much of the
criticism of adaptive designs has centered on this trial, and this is unfortunate
because this is exactly the type of trial where response-adaptive randomization would
be particularly advantageous. It is known that the RPW(1,0,1) is highly variable,
particularly when $p_A + p_B > 1.5$, which was the case for the ECMO trial, and the
variance depends on the initial composition of the urn. In retrospect, starting with
more than one ball of each type should have resulted in a more balanced trial.

Another trial was the fluoxetine trial reported by Tamura, Faries, Andersen and
Heiligenstein (1994). They used an RPW(1,0,1) design in a clinical trial of fluoxetine
versus placebo for depressive disorder. The trial was stratified by normal and
shortened rapid eye movement latency (REML), so two urns were used in the
randomization. In order to avoid the problem encountered in the ECMO trial, the first
six patients in each stratum were assigned using a permuted block design. The trial
was stopped after 61 patients had responded in accordance with a surrogate criterion.
The trial randomized a total of 89 patients: 21 fluoxetine patients and 20 placebo
patients in the shortened REML stratum, 21 fluoxetine and 21 placebo patients in
the normal REML stratum. The primary outcome was analyzed using a Monte Carlo
randomization-based analysis.

Rosenberger (1999) discusses conditions under which the use of response-adaptive
randomization is reasonable. We note some of them here:

• The treatments have been evaluated previously for toxicity. This is important
  to ensure that the response-adaptive randomization does not place more
  patients on a highly toxic treatment.

• Delay in response is moderate, allowing the adaptation to take place.

• Duration of the trial is limited and recruitment can take place during most or
  all of the trial.

• Modest gain in terms of treatment successes is desirable from an ethical
  standpoint.

• The experimental therapy is expected to have significant benefits to the public
  health.

The choice of a design should be driven by the simplicity of implementation but
also by its statistical properties and the nature of the clinical trial. We hope that
the new design proposed in this dissertation can be used successfully in some future
clinical trials.

Chapter 2

Adaptive Designs with Covariates:

Model and Marginal Distributions

2.1 Introduction

In this chapter we introduce the problem of adaptive designs with covariates. We
look only at the response distributions, without assuming any parametric model
relating the response to the covariates, and in this sense a wide class of examples
is covered. Assume that we are in the setting of a clinical trial where the patients
are randomly assigned, one at a time, to one of two treatments according to a
response-adaptive rule. We assume that we work under the following conditions:

• Patients arrive sequentially.

• Each patient can be assigned to either one of the two treatments and will be
  assigned to exactly one of them.

• The response from each patient is observed immediately, prior to the assignment
  of the treatment to the next patient.

• The response depends on a vector of covariates. The covariates are observed
  prior to the treatment assignment.

Our goal is to show how a response-adaptive design, in which the treatment
allocation depends on the previous responses and on the current covariates
of the patient, influences the distribution of the responses.

In contrast to the case of an adaptive design without covariates, the new
setup introduces a more complex dependence structure in the following sense: the
response depends on the vector of covariates and on the treatment allocation,
and at the same time the treatment allocation depends on the previous responses.
Of course, when the covariates are ignored in the design,
the first dependence described above is not present.

2.2 The Model and Notation

In this section we introduce the notation for the model we will use through the rest
of the chapter. As described before, suppose that patients arrive sequentially
and are allocated to one of two treatments, say A or B. For each $k \ge 1$, define
$t_k$ to be 1 or 0 according to whether the kth patient is assigned to treatment A
or treatment B, respectively:

$$t_k = \begin{cases} 1 & \text{if the $k$th patient is assigned to treatment A} \\ 0 & \text{if the $k$th patient is assigned to treatment B.} \end{cases}$$

Let $Y^A_k, Y^B_k$ denote the potential responses of the kth patient for treatment A
and B, respectively. Recall from our initial assumptions that, for all $k \ge 1$, exactly
one of the pair $(Y^A_k, Y^B_k)$ is actually observed.

Let $X_k = (X_{k1}, \dots, X_{kp})$ denote the p-dimensional covariate vector of the kth
patient. Our goal is to have an adaptive design where the responses can be dependent
on covariates. Thus the distribution of $Y^A_k, Y^B_k$ can be conditioned on $X_k$. We
consider then the following conditional distributions:

(2.1)  $Y^A_k \mid X_k = x \sim F^A_x$,  $Y^B_k \mid X_k = x \sim F^B_x$.

Note that we do not assume or require any independence structure between $Y_k^A$
and $Y_k^B$. In fact there may be a strong correlation between the responses to
treatments A and B, since they come from the same patient.

In most cases the allocation is randomized. It is useful then to let $(U_k)_{k \ge 1}$
denote a sequence of i.i.d. random variables whose common distribution is uniform
on $(0, 1)$ and which is independent of the other variables in the model,
$(Y_k^A, Y_k^B, X_k)$. Then the allocation is defined as:

(2.2) $t_k = I[U_k < \hat{\pi}_k]$

where $I[\cdot]$ denotes the indicator function and $\hat{\pi}_k$ is some number between 0
and 1. We will see later in Chapter 4 how $\hat{\pi}_k$ can be defined in a suitable way.
We use the index $k$ for $\hat{\pi}_k$ to emphasize that it is computed using the covariate
$X_k$, but we have to keep in mind that $\hat{\pi}_k$ is in fact computed based only on the
information from the first $k - 1$ patients and the current covariate $X_k$.
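
The construction (2.2) can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `allocate` and its arguments are hypothetical, and the value of $\hat{\pi}_k$ is simply passed in, since its construction is deferred to Chapter 4.

```python
import random

def allocate(pi_hat, u=None, rng=random):
    """Return t_k = I[U_k < pi_hat]: 1 for treatment A, 0 for treatment B.

    pi_hat is assumed to be already computed from the first k-1 patients
    and the current covariate; u is an optional pre-drawn uniform variate.
    """
    if u is None:
        u = rng.random()  # U_k ~ Uniform(0, 1), drawn independently of the data
    return 1 if u < pi_hat else 0

# With pi_hat = 0.7, roughly 70% of simulated patients receive treatment A.
rng = random.Random(0)
share_a = sum(allocate(0.7, rng=rng) for _ in range(10_000)) / 10_000
```

Drawing $U_k$ separately from the data is what makes $t_k$ a Bernoulli($\hat{\pi}_k$) variable conditionally on the past.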

Assume now the existence of an increasing sequence of $\sigma$-algebras $\{\mathcal{F}_k\}_{k \ge 1}$ such
that the vector $(Y_k^A, Y_k^B, X_k, t_k, U_k)$ is $\mathcal{F}_k$-measurable. We can define them as:

(2.3) $\mathcal{F}_k := \sigma(Y_i^A, Y_i^B, X_i, t_i, U_i : 1 \le i \le k)$.

Think of the $\sigma$-algebra $\mathcal{F}_k$ as all the information available after $k$ patients have
been allocated and their responses and covariates observed.

Consider the increasing sequence of $\sigma$-algebras $\{\mathcal{G}_k\}_{k \ge 1}$ defined as follows:

(2.4) $\mathcal{G}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \vee \sigma(U_{k+1})$.

We have to define a new $\sigma$-algebra so that the treatment indicator variable
becomes measurable. In view of our goal stated at the beginning of the chapter,
think of the $\sigma$-algebra $\mathcal{G}_k$ as the information used for allocating the treatment to
the $(k+1)$th patient. Therefore we will use $\mathcal{G}_k$ as the $\sigma$-algebra with respect
to which $t_{k+1}$ is measurable. To understand this better, observe what happens
without adding $\sigma(X_{k+1})$ to $\mathcal{F}_k \vee \sigma(U_{k+1})$. If we require only that $t_{k+1}$
is measurable with respect to $\mathcal{F}_k \vee \sigma(U_{k+1})$, as in the usual case of adaptive designs
without covariates, then the treatment allocation depends only on the previous
responses, without taking into consideration the covariate value of the current patient.
But this was not our proposed goal, so the definition of the new $\sigma$-algebra plays a
crucial role in the following developments.

It is also useful to define the $\sigma$-algebra $\mathcal{J}_k$:

(2.5) $\mathcal{J}_k := \mathcal{F}_k \vee \sigma(X_{k+1})$.

Observe from the definition of the sequence of uniform variables $(U_k)_{k \ge 1}$
that $U_{k+1}$ is independent of the $\sigma$-algebra $\mathcal{J}_k$. Since $\hat{\pi}_{k+1}$ is computed from the
first $k$ patients and the covariate $X_{k+1}$, it is $\mathcal{J}_k$-measurable, and we have that:

(2.6) $E(t_{k+1} \mid \mathcal{J}_k) = E(I[U_{k+1} < \hat{\pi}_{k+1}] \mid \mathcal{J}_k) = \hat{\pi}_{k+1}$.

Let $\tau_k$, $\nu_k$ be the random variables that give the stage at which the
$k$th observation from treatment A, respectively B, is taken. More explicitly, after
the allocation is done we observe from treatment A the sequence of observations
$Y^A_{\tau_1}, Y^A_{\tau_2}, \ldots$ and from treatment B the sequence $Y^B_{\nu_1}, Y^B_{\nu_2}, \ldots$.

Note that specifying the sequence $\tau_1, \nu_1, \tau_2, \nu_2, \ldots$ is equivalent to giving the
sequence of the treatment indicators $t_1, t_2, \ldots$.

Indeed, $\tau_i$, $\nu_i$ can be defined inductively as

$$\tau_0 = 0, \qquad \tau_i = \inf\{k > \tau_{i-1} \mid t_k = 1\}, \quad \forall i \ge 1$$

$$\nu_0 = 0, \qquad \nu_i = \inf\{k > \nu_{i-1} \mid t_k = 0\}, \quad \forall i \ge 1$$

and conversely,

$$t_{\tau_i} = 1, \quad \forall i \ge 1$$

$$t_{\nu_i} = 0, \quad \forall i \ge 1.$$
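
The equivalence between the treatment indicators and the stage times can be made concrete for a finite trial. The following is a small sketch with hypothetical function names (`stage_times`, `indicators`); stages are numbered from 1, as in the text.

```python
def stage_times(t):
    """Given indicators t = [t_1, ..., t_n], return (tau, nu).

    tau[i-1] is the stage of the i-th allocation to treatment A and
    nu[i-1] is the stage of the i-th allocation to treatment B.
    """
    tau = [k for k, tk in enumerate(t, start=1) if tk == 1]
    nu = [k for k, tk in enumerate(t, start=1) if tk == 0]
    return tau, nu

def indicators(tau, nu):
    """Rebuild t_1, ..., t_n from the stage times (the converse direction)."""
    t = [None] * (len(tau) + len(nu))
    for k in tau:
        t[k - 1] = 1  # t_{tau_i} = 1
    for k in nu:
        t[k - 1] = 0  # t_{nu_i} = 0
    return t

t = [0, 1, 1, 0, 1]
tau, nu = stage_times(t)  # tau = [2, 3, 5], nu = [1, 4]
```

Every stage appears in exactly one of the two lists, so the round trip `indicators(*stage_times(t))` returns the original sequence.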

The condition for the design that the treatment allocation for the $k$th patient
is based on the responses from the first $k - 1$ patients and the covariate of the $k$th
patient can be expressed as follows:

(2.7) $\{\tau_i = k\} \in \mathcal{G}_{k-1}, \quad \{\nu_i = k\} \in \mathcal{G}_{k-1}, \quad \forall i, k \ge 1$

or equivalently,

(2.8) $t_k \in \mathcal{G}_{k-1}$, that is, $t_k$ is $\mathcal{G}_{k-1}$-measurable.

2.3 The Distribution of the Responses under Adaptive Design

We will show in this section that although the covariates can introduce a
complicated dependence structure in the adaptive design, the marginal distribution of the
responses will be unaffected. Theorem 2.1 from Melfi and Page (2000) says that in
the case when the covariates are not considered in the model, and as long as the pair
$(Y_k^A, Y_k^B)$ is independent of the $\sigma$-algebra $\mathcal{F}_{k-1}$, the allocated sequence inherits
the distribution and independence structure of the original sequence.

In our case we do not get the independence, but we can show that the original
distribution is inherited by the allocated sequence.

Theorem 2.3.1. Let $(Y_k^A, Y_k^B, X_k)_{k \ge 1}$ be a sequence of i.i.d. random vectors.
Assume that the conditional distributions of $Y_k^A$, $Y_k^B$ are given by equation (2.1) and
that $X_k$ is a discrete random vector.

Assume that there is a filtration $\{\mathcal{F}_k\}_{k \ge 1}$ such that $(Y_k^A, Y_k^B, X_k)$ is
$\mathcal{F}_k$-measurable for all $k \ge 1$. Let $\{\mathcal{G}_k\}_{k \ge 1}$ be the filtration of $\sigma$-algebras introduced
by equation (2.4). Assume that $(Y_k^A, Y_k^B)$ and $\mathcal{G}_{k-1}$ are conditionally independent
given $X_k$.

Let $\{\tau_k\}_{k \ge 1}$ and $\{\nu_k\}_{k \ge 1}$ be two sequences of positive, increasing, integer-valued,
a.s. finite random variables which satisfy condition (2.7) and also satisfy

(2.9) $P(\tau_i = \nu_j) = 0, \quad \forall i, j \ge 1$

(2.10) $\forall k \ge 1, \ \exists i$ such that $\tau_i = k$ or $\nu_i = k$.

Then the conditional distributions of the allocated sequence are given by the same
original distributions:

(2.11) $Y^A_{\tau_i} \mid X_{\tau_i} = x \sim F_x^A, \quad \forall i \ge 1$

$\qquad\ \ Y^B_{\nu_i} \mid X_{\nu_i} = x \sim F_x^B, \quad \forall i \ge 1.$

Before we proceed to the proof of the theorem we remark that condition
(2.9) is the expression of the initial assumption that a patient can receive only one
treatment, and condition (2.10) excludes the trivial case when some patients do
not receive any treatment at all.

Proof. We'll prove the result only for $Y^A_{\tau_i}$; the proof for $Y^B_{\nu_i}$ is similar.

We want to prove that for any measurable set $D$,

$$P(Y^A_{\tau_i} \in D \mid X_{\tau_i} = x) = F_x^A(D).$$

We have that

$$P(Y^A_{\tau_i} \in D \mid X_{\tau_i} = x) = \sum_{m \ge 1} P(Y^A_{\tau_i} \in D, \tau_i = m \mid X_{\tau_i} = x).$$

Recall from condition (2.7) that $\{\tau_i = m\} \in \mathcal{G}_{m-1}$ for all $i, m \ge 1$. Using the
hypothesis that $Y^A_m$ and $\mathcal{G}_{m-1}$ are conditionally independent given $X_m$, we obtain
that:

$$P(Y^A_m \in D, \tau_i = m \mid X_m = x) = P(Y^A_m \in D \mid X_m = x) \, P(\tau_i = m \mid X_m = x).$$

Then in the case when $P(X_{\tau_i} = x) > 0$,

$$\begin{aligned}
P(Y^A_{\tau_i} \in D, \tau_i = m \mid X_{\tau_i} = x)
&= \frac{P(Y^A_{\tau_i} \in D, \tau_i = m, X_{\tau_i} = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D, \tau_i = m, X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D, \tau_i = m \mid X_m = x) \, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{P(Y^A_m \in D \mid X_m = x) \, P(\tau_i = m \mid X_m = x) \, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D) \, P(\tau_i = m \mid X_m = x) \, P(X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D) \, P(\tau_i = m, X_m = x)}{P(X_{\tau_i} = x)} \\
&= \frac{F_x^A(D) \, P(\tau_i = m, X_{\tau_i} = x)}{P(X_{\tau_i} = x)} \\
&= F_x^A(D) \, P(\tau_i = m \mid X_{\tau_i} = x).
\end{aligned}$$

Combining the relations together and summing over $m$, where
$\sum_{m \ge 1} P(\tau_i = m \mid X_{\tau_i} = x) = 1$ because $\tau_i$ is a.s. finite, we get indeed
the equality required. □

2.3.1 A counterexample

We will show in this section why the independence structure of the covariates is not
generally preserved in the case of an adaptive design. For this we will look at a
counterexample, an adaptive design satisfying the properties required in the
beginning of the chapter but for which the sequence of the observed covariates is not
independent.

More precisely, keeping the same notation as before, we will show that the
sequence $X_{\tau_1}, X_{\tau_2}, \ldots$ is not independent, and similarly for $X_{\nu_1}, X_{\nu_2}, \ldots$.

Assume the following:

• $X_1, X_2, \ldots$ are i.i.d. Bernoulli with $P(X_1 = 1) = P(X_1 = 0) = 0.5$;

• For the responses assume the following distributions:

$$P(Y_k^A = 1 \mid X_k = 1) = 1 - P(Y_k^A = 0 \mid X_k = 1) = 0.75$$
$$P(Y_k^A = 1 \mid X_k = 0) = 1 - P(Y_k^A = 0 \mid X_k = 0) = 0.25$$
$$P(Y_k^B = 1 \mid X_k = 1) = 1 - P(Y_k^B = 0 \mid X_k = 1) = 0.25$$
$$P(Y_k^B = 1 \mid X_k = 0) = 1 - P(Y_k^B = 0 \mid X_k = 0) = 0.75$$

We associate the value 1 for the response with a successful treatment.

• Assume the following iterative design:

$$P(t_1 = 1) = P(t_1 = 0) = 0.5$$

(in the first stage we assume the equipoise state, so there is no preferred
treatment and the covariate is ignored).

At each stage $k \ge 2$ let $N_k^A = \sum_{i=1}^{k} t_i$ be the number of allocations to
treatment A up to stage $k$. Let $k^* = \tau_{N_{k-1}^A} = \max_{1 \le i < k}\{i \mid t_i = 1\}$ be the last time
before the $k$th patient when we allocated treatment A (with $k^* = \tau_0 = 0$ if there
is no such $i$).

- If $k^* = 0$, which corresponds to the case that there were no allocations to
treatment A before the $k$th patient, then just allocate with equal
probability, again disregarding the covariate:

$$P(t_k = 1) = P(t_k = 0) = 0.5$$

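- If $k^* > 0$ then allocate according to the following rule:

$$P(t_k = 1 \mid X_k = 1) = 1 - P(t_k = 0 \mid X_k = 1) = \begin{cases}
0.9 & \text{if } X_{k^*} = 1, \ Y_{k^*} = 1 \\
0.3 & \text{if } X_{k^*} = 1, \ Y_{k^*} = 0 \\
0.7 & \text{if } X_{k^*} = 0, \ Y_{k^*} = 1 \\
0.4 & \text{if } X_{k^*} = 0, \ Y_{k^*} = 0
\end{cases}$$

$$P(t_k = 1 \mid X_k = 0) = 1 - P(t_k = 0 \mid X_k = 0) = \begin{cases}
0.7 & \text{if } X_{k^*} = 1, \ Y_{k^*} = 1 \\
0.4 & \text{if } X_{k^*} = 1, \ Y_{k^*} = 0 \\
0.9 & \text{if } X_{k^*} = 0, \ Y_{k^*} = 1 \\
0.3 & \text{if } X_{k^*} = 0, \ Y_{k^*} = 0
\end{cases}$$

We show now that $X_{\tau_1}, X_{\tau_2}, \ldots$ are not independent.

$$\begin{aligned}
P(X_{\tau_1} = 1) &= \sum_{k \ge 1} P(X_{\tau_1} = 1, \tau_1 = k) \\
&= \sum_{k \ge 1} P(X_k = 1, t_1 = 0, \ldots, t_{k-1} = 0, t_k = 1) \\
&= \sum_{k \ge 1} P(t_k = 1 \mid X_k = 1, t_1 = 0, \ldots, t_{k-1} = 0) \cdot P(X_k = 1) \cdot P(t_1 = 0, \ldots, t_{k-1} = 0) \\
&= \sum_{k \ge 1} \frac{1}{2} \cdot \frac{1}{2} \cdot P(t_1 = 0, \ldots, t_{k-1} = 0)
\end{aligned}$$

and similarly,

$$\begin{aligned}
P(X_{\tau_1} = 0) &= \sum_{k \ge 1} P(X_{\tau_1} = 0, \tau_1 = k) \\
&= \sum_{k \ge 1} P(X_k = 0, t_1 = 0, \ldots, t_{k-1} = 0, t_k = 1) \\
&= \sum_{k \ge 1} P(t_k = 1 \mid X_k = 0, t_1 = 0, \ldots, t_{k-1} = 0) \cdot P(X_k = 0) \cdot P(t_1 = 0, \ldots, t_{k-1} = 0) \\
&= \sum_{k \ge 1} \frac{1}{2} \cdot \frac{1}{2} \cdot P(t_1 = 0, \ldots, t_{k-1} = 0).
\end{aligned}$$

And since $P(X_{\tau_1} = 1) + P(X_{\tau_1} = 0) = 1$ we obtain that

$$P(X_{\tau_1} = 1) = P(X_{\tau_1} = 0) = 0.5.$$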
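On the other hand,

$$\begin{aligned}
P(X_{\tau_2} = 1) &= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_{\tau_2} = 1, \tau_2 = k, \tau_1 = j) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(A_k)
\end{aligned}$$

where $A_k$ denotes the event:

$$A_k = \{X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0\}.$$

Let $B_{11,j} = \{Y_j = 1, X_j = 1\}$, $B_{10,j} = \{Y_j = 1, X_j = 0\}$, $B_{01,j} = \{Y_j = 0, X_j = 1\}$
and $B_{00,j} = \{Y_j = 0, X_j = 0\}$. Then:

$$P(X_{\tau_2} = 1) = \sum_{k \ge 1} \sum_{k > j \ge 1} \sum_{\alpha, \beta \in \{0, 1\}} P(A_k \cap B_{\alpha\beta, j}) = I_{11} + I_{10} + I_{01} + I_{00}.$$

We'll show the computations just for the first summand. For fixed $k > j \ge 1$, the
corresponding term of $I_{11}$ is

$$\begin{aligned}
& P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\quad = P(t_k = 1 \mid X_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(X_k = 1) \\
&\qquad \cdot P(t_{k-1} = 0, \ldots, t_{j+1} = 0 \mid t_j = 1, B_{11,j}, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(Y_j = 1 \mid t_j = 1, X_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(t_j = 1 \mid X_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&\qquad \cdot P(X_j = 1) \\
&\qquad \cdot P(t_{j-1} = 0, \ldots, t_1 = 0) \\
&\quad = 0.9 \cdot 0.5 \cdot (0.1)^{k-j-1} \cdot 0.75 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}.
\end{aligned}$$

In a similar way we obtain the corresponding terms of the other summands:

$I_{10}$: $\ 0.7 \cdot 0.5 \cdot (0.3)^{k-j-1} \cdot 0.25 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}$

$I_{01}$: $\ 0.3 \cdot 0.5 \cdot (0.7)^{k-j-1} \cdot 0.25 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}$

$I_{00}$: $\ 0.4 \cdot 0.5 \cdot (0.6)^{k-j-1} \cdot 0.75 \cdot 0.5 \cdot 0.5 \cdot 0.5^{j-1}$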

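We can also observe that:

$$\begin{aligned}
P(X_{\tau_2} = 1, X_{\tau_1} = 1) &= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_{\tau_2} = 1, \tau_2 = k, X_{\tau_1} = 1, \tau_1 = j) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(X_k = 1, t_k = 1, t_{k-1} = 0, \ldots, t_{j+1} = 0, X_j = 1, t_j = 1, t_{j-1} = 0, \ldots, t_1 = 0) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} P(A_k \cap \{X_j = 1\}) \\
&= \sum_{k \ge 1} \sum_{k > j \ge 1} \left[ P(A_k \cap \{Y_j = 1\} \cap \{X_j = 1\}) + P(A_k \cap \{Y_j = 0\} \cap \{X_j = 1\}) \right] \\
&= I_{11} + I_{01}.
\end{aligned}$$

Since from the above relations it is clear that $I_{11} + I_{01} \ne I_{00} + I_{10}$, it follows
that:

$$P(X_{\tau_1} = 1) P(X_{\tau_2} = 1) = \frac{1}{2}(I_{11} + I_{01} + I_{10} + I_{00}) \ne \frac{1}{2} \cdot 2 (I_{11} + I_{01}) = P(X_{\tau_2} = 1, X_{\tau_1} = 1),$$

which proves that the sequence $(X_{\tau_k})_{k \ge 1}$ is not independent.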
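
As a numerical cross-check, the joint and marginal probabilities can also be computed exactly by conditioning on the first patient allocated to A, rather than by summing the series above. The sketch below is only illustrative and rests on assumptions: the dictionary `RULE` encodes one reading of the allocation rule of this example (the case labels are an assumption of this sketch), and the names `P_YA1`, `RULE`, `p_next_covariate_is_one` and `joint` are hypothetical. It uses two facts: before $\tau_1$ every allocation is a fair coin flip, so $X_{\tau_1}$ is Bernoulli(0.5); and between $\tau_1$ and $\tau_2$ the allocation rule is fixed by $(X_{\tau_1}, Y_{\tau_1})$, so $X_{\tau_2} = 1$ has conditional probability $p_1 / (p_1 + p_0)$, where $p_x$ is the probability that a new patient has covariate $x$ and is allocated to A.

```python
from fractions import Fraction as F

# P(Y^A = 1 | X = x), from the response distributions of the example.
P_YA1 = {1: F(3, 4), 0: F(1, 4)}

# ASSUMPTION: one reading of the allocation rule. RULE[x_k][(x_last, y_last)] is
# P(t_k = 1 | X_k = x_k) when the last patient on A had covariate x_last
# and response y_last.
RULE = {
    1: {(1, 1): F(9, 10), (1, 0): F(3, 10), (0, 1): F(7, 10), (0, 0): F(4, 10)},
    0: {(1, 1): F(7, 10), (1, 0): F(4, 10), (0, 1): F(9, 10), (0, 0): F(3, 10)},
}

def p_next_covariate_is_one(x_last, y_last):
    """P(X_{tau_2} = 1 | X_{tau_1} = x_last, Y_{tau_1} = y_last)."""
    p1 = F(1, 2) * RULE[1][(x_last, y_last)]  # next patient has X = 1 and gets A
    p0 = F(1, 2) * RULE[0][(x_last, y_last)]  # next patient has X = 0 and gets A
    return p1 / (p1 + p0)

def joint(x1):
    """P(X_{tau_1} = x1, X_{tau_2} = 1), using P(X_{tau_1} = x1) = 1/2."""
    total = F(0)
    for y in (0, 1):
        p_y = P_YA1[x1] if y == 1 else 1 - P_YA1[x1]
        total += F(1, 2) * p_y * p_next_covariate_is_one(x1, y)
    return total

p_joint_11 = joint(1)                # P(X_{tau_1} = 1, X_{tau_2} = 1)
p_tau2_1 = joint(1) + joint(0)       # P(X_{tau_2} = 1)
p_product = F(1, 2) * p_tau2_1       # P(X_{tau_1} = 1) * P(X_{tau_2} = 1)
print(p_joint_11, p_product, p_joint_11 == p_product)
```

Exact rational arithmetic is used so that the comparison between the joint probability and the product of the marginals is not blurred by rounding.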


Chapter 3

Asymptotic Results for Adaptive Designs with Covariates

3.1 Introduction

In this chapter we introduce a specific parametric model for the problem we
formulated in the beginning of the previous chapter. Thus we will concentrate now
on the study of the behavior of the parameters rather than the study of the distribution
of the responses themselves. We need a different approach from Melfi and Page (2000)
to derive the asymptotic results because in their case the adaptive design produced
an i.i.d. sequence of responses, a result which is not available in the case of the
adaptive design with covariates.

We will work under the same assumptions mentioned in the second chapter,
namely:

• Patients arrive sequentially.

• Each patient can be assigned to either one of the two treatments and will be
assigned to exactly one of them.

• The response from patients is observed immediately, prior to the assignment
of the treatment to the next patient.

• The response is possibly dependent on a vector of covariates. The covariates
are observed prior to the treatment assignment.

Our goal is to show how a response-adaptive allocation, in which the treatment
allocation depends on the previous responses and the current covariate
of the patient, influences the asymptotic results for estimators of the parameters
of the model we propose.

As pointed out in the first chapter, much of the work done in the area of
adaptive designs with covariates has concentrated on forming strata by
considering all possible combinations of levels of the relevant covariates and then
independently using different adaptive schemes within each stratum to allocate patients.
The approach that we take in this chapter arises naturally whenever we can
model the response as a function of the covariates. Specifically, we will consider a
generalized linear model linking the response with the covariates and the
treatment, and then we will concentrate on studying the behavior of the estimators of the
coefficient vector under a general adaptive allocation which satisfies
the above mentioned assumptions.

We will prove consistency and asymptotic normality results for the estimators of
the coefficient vector under some conditions imposed on the model.

3.2 A Generalized Linear Model Relating the Response, the Adaptive Design and the Covariates

In what follows we introduce the model and notation that we'll need through the
rest of the chapter. We will follow as much as possible the notation introduced in
the second chapter and we will extend the definitions previously used when needed.

As in the previous chapter, for each $k \ge 1$ let $t_k$ be the
treatment allocation to the $k$th patient:

$$t_k = \begin{cases} 1 & \text{if the $k$th patient is assigned to treatment A} \\ 0 & \text{if the $k$th patient is assigned to treatment B.} \end{cases}$$

Also recall that $X_k = (X_{k1}, \ldots, X_{kp})$ denotes, as before, the $p$-dimensional
covariate vector of the $k$th patient.

Let $Y_k$ be the observed response of the $k$th patient. Using the notation $Y_k^A$ and
$Y_k^B$ introduced in the previous chapter we can define $Y_k$ as:

(3.1) $Y_k = Y_k^A \, I[t_k = 1] + Y_k^B \, I[t_k = 0]$

where $I[\cdot]$ denotes the usual indicator function.

As will be seen below, we no longer have to distinguish in the notation between the
responses from treatment A and treatment B, $Y_k^A$ and $Y_k^B$. This is because with
the new approach we will consider the response $Y_k$ in the context of a generalized
linear model, with the independent variables being the covariate $X_k$ and the
treatment $t_k$, possibly with interactions.

Assume, as before, the existence of an increasing sequence of $\sigma$-algebras $\{\mathcal{F}_k\}_{k \ge 1}$,
defined as

(3.2) $\mathcal{F}_k := \sigma(Y_i^A, Y_i^B, X_i, t_i, U_i : 1 \le i \le k)$.

As we discussed in Chapter 2, $(U_k)_{k \ge 1}$ denotes a sequence of i.i.d. random
variables uniformly distributed on $(0, 1)$ and independent of the other variables in the
model, $(Y_k^A, Y_k^B, X_k)$. They may be used for randomizing the treatment allocation
as in the construction given by (2.2).

Consider also the increasing sequence of $\sigma$-algebras $\{\mathcal{G}_k\}_{k \ge 1}$ defined as in
equation (2.4):

$$\mathcal{G}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \vee \sigma(U_{k+1}).$$

Recall that our goal is to make the allocation decision for the $k$th patient based
on the responses from the first $k - 1$ patients and the covariate of the $k$th patient.
Hence in this case the condition can be expressed as

(3.3) $t_k \in \mathcal{G}_{k-1}$, that is, $t_k$ is $\mathcal{G}_{k-1}$-measurable.

Before we introduce our model, we recall an overview of the generalized linear model.
The generalized linear model is characterized by a random part and a structural
part. The random part consists of the independent observations $(Y_i)_{i \ge 1}$ whose
distributions are members of the same exponential family, given by the generic density
function:

$$f(y_i \mid \theta_i, \phi, w_i) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{\phi} \cdot w_i + c(y_i, \phi, w_i) \right\}$$

where

• $\theta_i$ is the so-called natural parameter

• $\phi$ is the dispersion parameter

• $w_i$ is a known weight

• $c(\cdot, \cdot, \cdot)$ is a function of $y_i$, the dispersion $\phi$ and the weight $w_i$.

The structural part of the model specifies the linear relationship between a vector
of covariates $x_i = (x_{i1}, \ldots, x_{iq})$ and the expectation of the response, $E(Y_i) = \mu_i$:

$$g(\mu_i) = \eta_i = x_i \beta = \sum_{j=1}^{q} x_{ij} \beta_j.$$

The invertible function $g(\cdot)$, common to all observations, is called the link
function and $\beta$ is the parameter vector of the model. Each exponential family
distribution is associated with a canonical link function, although this is not
necessarily the best link for modeling.

The more familiar linear regression model and logistic regression model can
both be viewed as particular examples of the generalized linear model. When the
random part has a binomial distribution and the link is the canonical logit
function, $g(y) = \operatorname{logit}(y) = \log\frac{y}{1-y}$, we obtain from the generalized linear model
the logistic regression model. When the random part has a normal distribution
and the link is the canonical identity function, $g(y) = y$, we obtain the linear
regression model.
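
The two canonical links just mentioned can be written down directly. This is a minimal sketch with hypothetical function names; it only illustrates the maps between the mean $\mu$ and the linear predictor $\eta$.

```python
import math

def logit(mu):
    """Canonical link for the binomial case: eta = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def identity(mu):
    """Canonical link for the normal case: eta = mu."""
    return mu

eta = logit(0.3)         # linear predictor for a success probability of 0.3
mu_back = inv_logit(eta)  # recovers the mean, up to floating-point rounding
```

With the identity link the linear predictor is the mean itself, which is why the linear regression model appears as a special case.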

In our case we construct the generalized linear model as follows. The structural
part is given by the following formula:

(3.4) $E(Y_k \mid \mathcal{F}_{k-1} \vee \sigma(X_k, t_k)) = g(X_k \beta_1 + t_k \beta_2 + X_k \beta_3 t_k)$

where

• $g$ is the link function associated with the distribution of the response $Y_k$

• $\beta_1$, $\beta_3$ are $p$-dimensional column vectors and $\beta_2$ is scalar:

$$\beta_1' = (\beta_{11}, \beta_{12}, \ldots, \beta_{1p}), \qquad \beta_3' = (\beta_{31}, \beta_{32}, \ldots, \beta_{3p}).$$

We will use from now on the notation $M'$ to denote the transpose of a matrix
$M$.

We used in relation (3.4) the $\sigma$-algebra $\mathcal{F}_{k-1} \vee \sigma(X_k, t_k)$ to emphasize the
relationship between the response $Y_k$ and $X_k$, $t_k$. But since $t_k \in \mathcal{G}_{k-1}$ by assumption (3.3),
and using the definition (2.4) of the $\sigma$-algebra $\mathcal{G}_{k-1}$, we may as well use the following
definition for the structural part of our model:

(3.5) $E(Y_k \mid \mathcal{G}_{k-1}) = g(X_k \beta_1 + t_k \beta_2 + X_k \beta_3 t_k)$

Consider the errors $\epsilon_k$ for the model:

(3.6) $\epsilon_k = Y_k - E(Y_k \mid \mathcal{G}_{k-1})$.

We will assume that the errors $\epsilon_k$ are independent.

In a more explicit form, the systematic part of the model can be rewritten as:

(3.7) $E(Y_k \mid \mathcal{G}_{k-1}) = g\left( \sum_{j=1}^{p} X_{kj} \beta_{1j} + t_k \beta_2 + \sum_{j=1}^{p} X_{kj} \beta_{3j} t_k \right)$.

The coefficient parameter of the model is given by the $(2p + 1)$-dimensional
column vector

$$\beta' = (\beta_1', \beta_2, \beta_3').$$

In some cases it will be useful to refer to the regressor vector of the model (3.4)
as a whole. We introduce the following notation:

(3.8) $X_k^* = (X_k, \ t_k, \ X_k \cdot t_k)$.

The innovation of our model is that we introduce in the structural part of the
generalized linear model the dependence on the treatment, where the treatment allocation
follows an adaptive design as defined by relation (3.3).

By considering the generalized linear model we allow either continuous or
categorical responses $Y_k$. For continuous responses with normal errors, when
the link function is the identity function $g(y) = y$, we get back the usual linear
model.

For now we do not assume any special conditions for the errors, except
independence, but as we will see later in the chapter we will need to impose some
regularity conditions on them to prove the consistency and the asymptotic
properties of the estimators of the parameters of the model.

Our next goal is to define some estimators for the coefficient parameters of the
model (3.4) and then to study their behavior. The most natural estimators that
arise are the maximum quasi-likelihood estimators.

The maximum quasi-likelihood estimators are defined in connection with the
so-called quasi-likelihood functions introduced by Wedderburn (1974) and McCullagh
& Nelder (1989). In general, if we have independent observations $(y_i)_{i=1,\ldots,n}$ with
expectations $\mu_i$ and variances $V(\mu_i)$ then the quasi-likelihood function $K(y_i, \mu_i)$ is
defined by the relation:

$$\frac{\partial K(y_i, \mu_i)}{\partial \mu_i} = \frac{y_i - \mu_i}{V(\mu_i)}.$$

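Assume that for each observation, $\mu_i$ is a function of parameters $\beta_1, \ldots, \beta_q$. It
can be proved that

$$E\left( \frac{\partial K}{\partial \beta_i} \right) = 0 \quad \text{for all } i = 1, \ldots, q.$$

Then the maximum quasi-likelihood estimators are the solutions obtained by equating
$\partial K / \partial \beta_i$ with its expectation, zero:

$$\sum_{j=1}^{n} \frac{\partial K(y_j, \mu_j(\beta_1, \ldots, \beta_q))}{\partial \beta_i} = 0 \quad \text{for all } i = 1, \ldots, q.$$

Let $\hat{\beta}_n' = (\hat{\beta}_{1,n}', \hat{\beta}_{2,n}, \hat{\beta}_{3,n}')$ be the maximum quasi-likelihood estimator for the
model (3.4). Note again that the estimators $\hat{\beta}_{1,n}$, $\hat{\beta}_{3,n}$ are $p$-dimensional column
vectors and $\hat{\beta}_{2,n}$ is scalar. Explicitly, in our case $\hat{\beta}_n$ is the solution of the following
equations:

(3.9)
$$\sum_{i=1}^{n} X_i' \left[ Y_i - g(X_i \hat{\beta}_{1,n} + \hat{\beta}_{2,n} \cdot t_i + X_i \hat{\beta}_{3,n} \cdot t_i) \right] = 0$$
$$\sum_{i=1}^{n} t_i \left[ Y_i - g(X_i \hat{\beta}_{1,n} + \hat{\beta}_{2,n} \cdot t_i + X_i \hat{\beta}_{3,n} \cdot t_i) \right] = 0$$
$$\sum_{i=1}^{n} X_i' t_i \left[ Y_i - g(X_i \hat{\beta}_{1,n} + \hat{\beta}_{2,n} \cdot t_i + X_i \hat{\beta}_{3,n} \cdot t_i) \right] = 0.$$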

We may remark that in the case of linear models, when the link function $g(\cdot)$ is
the identity function, equations (3.9) become the normal equations and the
maximum quasi-likelihood estimator becomes the usual least squares estimator.
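
The remark about the identity link can be checked numerically for a one-dimensional covariate ($p = 1$). The sketch below is illustrative only: the helper names `solve` and `least_squares`, the simulated data and the true coefficients are all assumptions, and the treatment indicators are drawn by a fair coin rather than adaptively. It solves the normal equations $(Z'Z)\beta = Z'y$ for the regressors $(x, t, x t)$ and then evaluates the left-hand sides of the estimating equations (3.9), which should vanish up to rounding error.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def least_squares(rows, y):
    """Solve the normal equations (Z'Z) beta = Z'y for the rows z_i of Z."""
    q = len(rows[0])
    ZtZ = [[sum(z[a] * z[b] for z in rows) for b in range(q)] for a in range(q)]
    Zty = [sum(z[a] * yi for z, yi in zip(rows, y)) for a in range(q)]
    return solve(ZtZ, Zty)

rng = random.Random(0)
beta = [1.0, -0.5, 2.0]  # hypothetical true (beta_1, beta_2, beta_3)
rows, y = [], []
for _ in range(500):
    x, t = rng.gauss(0.0, 1.0), rng.randint(0, 1)
    z = [x, t, x * t]    # regressor vector (X_k, t_k, X_k * t_k)
    rows.append(z)
    y.append(sum(b * zi for b, zi in zip(beta, z)) + rng.gauss(0.0, 0.1))

beta_hat = least_squares(rows, y)
# Left-hand sides of (3.9) with g the identity, evaluated at beta_hat.
score = [
    sum(z[a] * (yi - sum(b * zi for b, zi in zip(beta_hat, z)))
        for z, yi in zip(rows, y))
    for a in range(3)
]
```

Solving the normal equations directly keeps the sketch free of external libraries; for larger $p$ a numerically more stable least squares routine would be preferable.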

We will prove in the next sections that, under some assumptions, we can derive
consistency and asymptotic normality properties for the maximum quasi-likelihood
estimators defined as solutions to equations (3.9). In turn, these results allow us to
analyze predictors for the response $Y_k$ from treatments A and B and eventually to
compare them.

Let $Z_n$ be the design matrix of model (3.4), that is

(3.10)
$$Z_n = \begin{pmatrix}
X_{11} & X_{12} & \cdots & X_{1p} & X_{11} t_1 & X_{12} t_1 & \cdots & X_{1p} t_1 & t_1 \\
X_{21} & X_{22} & \cdots & X_{2p} & X_{21} t_2 & X_{22} t_2 & \cdots & X_{2p} t_2 & t_2 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np} & X_{n1} t_n & X_{n2} t_n & \cdots & X_{np} t_n & t_n
\end{pmatrix}$$

We will pay special attention to the eigenvalues of the matrix $Z_n' Z_n$.
Let $\lambda_{\min}(n)$ and $\lambda_{\max}(n)$ be the minimum and the maximum eigenvalues
of $Z_n' Z_n$, respectively. The assumptions for the results we prove next depend on the
regularity of the design matrix $Z_n$, and this will be expressed in terms of these
eigenvalues.

3.3 A Consistency Result

We want to show in this section that the maximum quasi-likelihood estimator, $\hat{\beta}_n$,
is strongly consistent for the model parameter, $\beta$. In order to prove the result we
need some limit constraints for the errors of the model, $\epsilon_n$, and we will also rule out
asymptotically ill-conditioned design matrices $Z_n$.

Theorem 2 from Chen, Hu and Ying (1999) and Theorem 1 from Lai and Wei
(1982) address the consistency problem for an adaptive design for a generalized
linear model and a linear regression model, respectively. But in their case the design
does not allow the treatment allocation $t_k$ to depend on the covariate
$X_k$. In our case the main design goal is to make the treatment allocation $t_k$
depend not only on the past information $\mathcal{F}_{k-1}$ but also on the covariate value of
the current patient, $X_k$. We will mirror the results mentioned above, using the
new $\sigma$-algebras required by our design.

In what follows we will denote by $\| \cdot \|_2$ the usual Euclidean norm, that
is, for an $m$-dimensional vector $x = (x_1, x_2, \ldots, x_m)$,

$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \ldots + x_m^2}.$$

Theorem 3.3.1. Assume that we have a generalized linear model as specified by
the model (3.4)-(3.6) with an adaptive design satisfying relations (3.2) and (3.3).

Assume the following conditions are satisfied:

(3.11) $g$ is continuously differentiable with positive derivative function

(3.12) $\sup_{i \ge 1} \|X_i\|_2 < \infty$ a.s.,

(3.13) $\lim_{n \to \infty} \lambda_{\min}(n) / \log \lambda_{\max}(n) = \infty$ a.s.,

(3.14) $\sup_{i \ge 1} E(|\epsilon_i|^{\alpha} \mid \mathcal{G}_{i-1}) < \infty$ a.s. for some $\alpha > 2$.

Then the estimator $\hat{\beta}_n$ is strongly consistent, and

(3.15) $\|\hat{\beta}_n - \beta\|_2 = O\left( (\log \lambda_{\max}(n) / \lambda_{\min}(n))^{1/2} \right)$ a.s.

We first recall the theorem proved by Chen, Hu and Ying (1999).

Theorem 3.3.2 (Chen, Hu and Ying (1999)). Consider a generalized linear
model for the pairs $(y_i, x_i)_{i \ge 1}$ given by:

(3.16) $E(y_k \mid \mathcal{F}_{k-1}) = g(x_k' \beta)$

where $(\mathcal{F}_k)_{k \ge 1}$ is a filtration such that $y_k$ is $\mathcal{F}_k$-measurable and $x_k$ is
$\mathcal{F}_{k-1}$-measurable. Let the errors $\epsilon_k$ be defined as:

$$\epsilon_k = y_k - E(y_k \mid \mathcal{F}_{k-1}).$$

Let $\lambda_{\min}(n)$, $\lambda_{\max}(n)$ be the minimum and the maximum eigenvalues of the information
matrix $\sum_{i=1}^{n} x_i x_i'$. Let $\hat{\beta}_n$ be the maximum quasi-likelihood estimator of $\beta$ from
model (3.16). Assume the following conditions are satisfied:

(3.17) $g$ is continuously differentiable with positive derivative function

(3.18) $\sup_{i \ge 1} \|x_i\|_2 < \infty$ a.s.

(3.19) $\lim_{n \to \infty} \lambda_{\min}(n) / \log \lambda_{\max}(n) = \infty$ a.s.

(3.20) $\sup_{i \ge 1} E(|\epsilon_i|^{\alpha} \mid \mathcal{F}_{i-1}) < \infty$ a.s. for some $\alpha > 2$

Then

(3.21) $\|\hat{\beta}_n - \beta\|_2 = O\left( (\log \lambda_{\max}(n) / \lambda_{\min}(n))^{1/2} \right)$ a.s.

As we pointed out before, using the new $\sigma$-algebras $\mathcal{G}_k$ solves the problem
for our design.

Proof of Theorem 3.3.1. From the construction of the $\sigma$-algebras $\mathcal{G}_k$ we have that
$t_k$ is $\mathcal{G}_{k-1}$-measurable, and from the construction of the filtration itself, $X_k$ is
$\mathcal{G}_{k-1}$-measurable and $Y_k$ is $\mathcal{G}_k$-measurable. The model (3.4) becomes:

$$E(Y_k \mid \mathcal{G}_{k-1}) = g(\beta_1 \cdot X_k + \beta_2 \cdot t_k + \beta_3 \cdot X_k \cdot t_k).$$

We can now apply Theorem 3.3.2 with $\mathcal{F}_k$ replaced by $\mathcal{G}_k$ and the matrix
$\sum_{i=1}^{n} x_i x_i'$ replaced by $Z_n' Z_n$.

We observe that conditions (3.17), (3.19) and (3.20) are satisfied by the equivalent
conditions (3.11), (3.13) and (3.14) from Theorem 3.3.1. To see that condition (3.18) is
also satisfied, write $X_k^* = (X_k, t_k, X_k \cdot t_k)$ for the regressor vector and use the inequality

$$\|X_k^*\|_2^2 = \|(X_k, t_k, X_k \cdot t_k)\|_2^2 = \|X_k\|_2^2 + t_k^2 + t_k^2 \|X_k\|_2^2 \le 2 \cdot \|X_k\|_2^2 + 1,$$

which holds because $t_k \in \{0, 1\}$ for all $k \ge 1$. Now using (3.12) it follows that
condition (3.18) is indeed satisfied. Hence the conclusion (3.21) of Theorem 3.3.2 holds,
which is exactly what we need for (3.15). □

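We will look now at the particular case when the covariate is 1-dimensional, that
is, $p = 1$. Then we can find explicitly the eigenvalues involved in condition (3.13)
from Theorem 3.3.1 and we will be able to prove the desired limit under some
verifiable conditions.

Proposition 3.3.3. Let the covariate sequence $(x_i)_{i \ge 1}$ and the treatment sequence
$(t_i)_{i \ge 1}$ be such that, as $n \to \infty$:

$$\sum_{i=1}^{n} x_i^2 \to \infty \ \text{a.s.}$$

$$\sum_{i=1}^{n} t_i^2 \to \infty \ \text{a.s.}$$

$$\lim_{n \to \infty} \frac{\left( \sum_{i=1}^{n} x_i t_i \right)^2}{\sum_{i=1}^{n} x_i^2 \cdot \sum_{i=1}^{n} t_i^2} < 1 \ \text{a.s.}$$

Let

$$Z_n' = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ t_1 & t_2 & \cdots & t_n \end{pmatrix}.$$

Then

$$\lim_{n \to \infty} \frac{\lambda_{\min}(n)}{\log \lambda_{\max}(n)} = \infty \ \text{a.s.}$$

where $\lambda_{\min}(n)$, $\lambda_{\max}(n)$ are the minimum and maximum eigenvalues of the
matrix $Z_n' Z_n$, respectively.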

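Proof. Let $m_1 = \sum_{i=1}^{n} x_i^2$, $m_2 = \sum_{i=1}^{n} x_i t_i$ and $m_3 = \sum_{i=1}^{n} t_i^2$. We omit the index
$n$ to simplify the notation. It can be readily proved that

$$\lambda_{\min} = \frac{1}{2} \left( m_1 + m_3 - \sqrt{(m_1 - m_3)^2 + 4 m_2^2} \right)$$

$$\lambda_{\max} = \frac{1}{2} \left( m_1 + m_3 + \sqrt{(m_1 - m_3)^2 + 4 m_2^2} \right)$$

Because of the Cauchy inequality we have that $m_2^2 \le m_1 m_3$. Hence $\lambda_{\min} \ge 0$. Moreover,

$$\lambda_{\min} = 2 \cdot \frac{1 - \dfrac{m_2^2}{m_1 m_3}}{\dfrac{1}{m_1} + \dfrac{1}{m_3} + \sqrt{\left( \dfrac{1}{m_1} - \dfrac{1}{m_3} \right)^2 + \dfrac{m_2^2}{m_1 m_3} \cdot \dfrac{4}{m_1 m_3}}}.$$

Let

$$l = \lim_{n \to \infty} \frac{m_2^2}{m_1 m_3};$$

from the hypothesis, $l < 1$ a.s. Using also the hypothesis that
$m_1, m_3 \to \infty$ a.s. we have that as $n \to \infty$,

$$\lambda_{\min} \to 2 \cdot \frac{1 - l}{0 + 0 + \sqrt{(0 - 0)^2 + l \cdot 0}} \ \text{a.s.},$$

therefore as $n \to \infty$, $\lambda_{\min} \to \infty$ a.s.

Let $q$ be a constant such that

$$q > \frac{1 + \sqrt{l}}{1 - \sqrt{l}}.$$

Then using some algebra we can prove that

$$q \cdot \lambda_{\min} > \lambda_{\max}.$$

Then we have that

$$\frac{\lambda_{\min}}{\log \lambda_{\max}} = \frac{\lambda_{\min}}{\log \lambda_{\min}} \cdot \frac{\log \lambda_{\min}}{\log \lambda_{\max}}$$

and

$$\frac{\log \lambda_{\min}}{\log(q \cdot \lambda_{\min})} < \frac{\log \lambda_{\min}}{\log \lambda_{\max}} \le 1.$$

Since $\lambda_{\min} \to \infty$ a.s. we obtain that $\log \lambda_{\min} / \log \lambda_{\max} \to 1$ a.s. and therefore
$\lambda_{\min} / \log \lambda_{\max} \to \infty$ a.s. when $n \to \infty$. □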
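
The closed-form eigenvalues used in the proof can be checked on simulated data. The following sketch is only an illustration: the function name `eigenvalues_2x2` and the simulated sequences (standard normal covariates, fair-coin treatment indicators) are assumptions, not part of the design studied here. It verifies that the two closed-form values have the trace and determinant of the $2 \times 2$ matrix $Z_n' Z_n$, and it tabulates the ratio $\lambda_{\min}(n) / \log \lambda_{\max}(n)$ for increasing $n$.

```python
import math
import random

def eigenvalues_2x2(xs, ts):
    """Closed-form eigenvalues of Z'Z for Z with columns (x_i) and (t_i)."""
    m1 = sum(x * x for x in xs)
    m2 = sum(x * t for x, t in zip(xs, ts))
    m3 = sum(t * t for t in ts)
    root = math.sqrt((m1 - m3) ** 2 + 4.0 * m2 ** 2)
    lam_min = 0.5 * (m1 + m3 - root)
    lam_max = 0.5 * (m1 + m3 + root)
    return lam_min, lam_max, (m1, m2, m3)

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ts = [rng.randint(0, 1) for _ in range(2000)]

ratios = []
for n in (50, 500, 2000):
    lam_min, lam_max, (m1, m2, m3) = eigenvalues_2x2(xs[:n], ts[:n])
    # The eigenvalues of [[m1, m2], [m2, m3]] have sum m1 + m3 (the trace)
    # and product m1 * m3 - m2**2 (the determinant).
    assert abs((lam_min + lam_max) - (m1 + m3)) < 1e-9 * (m1 + m3)
    assert abs(lam_min * lam_max - (m1 * m3 - m2 ** 2)) < 1e-9 * (m1 * m3)
    ratios.append(lam_min / math.log(lam_max))
```

In this simulated setting both $m_1$ and $m_3$ grow linearly in $n$, so the ratio grows roughly like $n / \log n$.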

3.4 An Asymptotic Normality Result

We showed in the previous section that the maximum quasi-likelihood estimator,
$\hat{\beta}_n$, is consistent under assumptions (3.11)-(3.14). Therefore, under the
adaptive design and the model considered, we know that at least asymptotically we get
the correct estimators for the coefficient parameter $\beta$. Our next goal is to study
the asymptotic normality of the maximum quasi-likelihood estimators. We will
prove the asymptotic normality result in the case of linear regression, where
the maximum quasi-likelihood estimator becomes the usual least squares estimator.
Unfortunately there is no similar result in the literature for the generalized
linear model. Because we need the variance of the asymptotic normal distribution,
we will need an extra limit condition on the errors of model (3.4).

Recall that Theorem 3 from Lai and Wei (1982) also addresses the asymptotic
normality problem, but the design in their case does not permit the choice of the
treatment to depend on the covariate of the current patient. We will extend
the result they obtained for their particular case to the design that we presented in
the introduction of the chapter.

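Theorem 3.4.1. Suppose that for the model (3.4) with the identity link function,
$g(y) = y$, the errors $\epsilon_n$ satisfy condition (3.14) and the following limit condition:

(3.22) $\lim_{n \to \infty} E(\epsilon_n^2 \mid \mathcal{G}_{n-1}) = \sigma^2$ a.s.

Moreover, assume there exists a non-random positive definite symmetric matrix $B_n$ for which,
as $n \to \infty$,

(3.23) $B_n^{-1} (Z_n' Z_n)^{1/2} \xrightarrow{P} I$

(3.24) $\max_{1 \le i \le n} \|B_n^{-1} X_i^*\|_2 \xrightarrow{P} 0$

where $X_i^* = (X_i, t_i, X_i \cdot t_i)$ is the regressor vector of the model.
Then the estimator $\hat{\beta}_n$ has an asymptotically normal distribution:

(3.25) $(Z_n' Z_n)^{1/2} (\hat{\beta}_n - \beta) \xrightarrow{D} N(0, \sigma^2 I)$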
We first recall the theorem proved by Lai and Wei (1982).

Theorem 3.4.2 (Lai and Wei (1982)). Suppose that for the regression model

(3.26) $y_n = \beta_1 x_{n1} + \cdots + \beta_p x_{np} + \epsilon_n$,

$\epsilon_n$ is a martingale difference sequence with respect to an increasing sequence of
$\sigma$-fields $\mathcal{F}_n$ such that

(3.27) $\sup_{n \ge 1} E(|\epsilon_n|^{\alpha} \mid \mathcal{F}_{n-1}) < \infty$ a.s. for some $\alpha > 2$

(3.28) $\lim_{n \to \infty} E(\epsilon_n^2 \mid \mathcal{F}_{n-1}) = \sigma^2$ a.s. for some constant $\sigma$.

Moreover, assume for each $n$ that the design vector

(3.29) $x_n = (x_{n1}, \ldots, x_{np})'$ is $\mathcal{F}_{n-1}$-measurable

and that there exists a non-random positive definite matrix $B_n$ for which, as $n \to \infty$,

(3.30) $B_n^{-1} \left( \sum_{i=1}^{n} x_i x_i' \right)^{1/2} \xrightarrow{P} I$

(3.31) $\max_{1 \le i \le n} \|B_n^{-1} x_i\|_2 \xrightarrow{P} 0$

Then the least squares estimate $\hat{\beta}_n$ of $\beta$ has an asymptotically normal distribution:

(3.32) $\left( \sum_{i=1}^{n} x_i x_i' \right)^{1/2} (\hat{\beta}_n - \beta) \xrightarrow{D} N(0, \sigma^2 I)$

Proof of Theorem 3.4.1. Again the crucial step is the definition of the $\sigma$-algebra $\mathcal{G}_k$
introduced by equation (2.4).

From that definition and from condition (3.3) we have that $X_k$ and $t_k$ are
$\mathcal{G}_{k-1}$-measurable, hence the regressor vector $X_k^* = (X_k, t_k, X_k \cdot t_k)$ is
$\mathcal{G}_{k-1}$-measurable for all $k$, so condition (3.29) is satisfied. From
equation (3.6), the errors already form a martingale difference sequence.
Conditions (3.14) and (3.22) give conditions (3.27) and (3.28). Finally,
replacing the design vector $x_n$ by $X_n^*$ and the matrix $\sum_{i=1}^{n} x_i x_i'$ by $Z_n' Z_n$, we
get that conditions (3.30) and (3.31) are satisfied by conditions (3.23) and (3.24).
The conclusion (3.25) follows from (3.32). □

Chapter 4

Examples of Models with an
Adaptive Allocation

4.1 Introduction

In the previous two chapters we proved some general theorems regarding, first,
the distribution of the response from a general adaptive design and then the limit
behavior of maximum quasi-likelihood estimators for a generalized linear model with
an adaptive design. We will describe in this chapter how an adaptive design can be
constructed and then we will look at some particular cases.

Recall the main features of the design:

• Patients arrive sequentially.

• Each patient can be assigned to either one of the two treatments and will be
assigned to exactly one of them.

• The response from patients is observed immediately, prior to the assignment
of the treatment to the next patient.

• The response is possibly dependent on a vector of covariates. The covariates
are observed prior to the treatment assignment.

• We want the adaptive design to be such that the treatment allocation depends
on the previous responses and the current covariate of the patient.
We will keep the same notation as in the previous chapters. Denote by A and B
the two treatments that patients receive. Let $t_k$ be the treatment allocation to the
$k$th patient, as defined by:

$$t_k = \begin{cases} 1 & \text{if the $k$th patient is assigned to treatment A} \\ 0 & \text{if the $k$th patient is assigned to treatment B.} \end{cases}$$

Recall also that $X_k$ denotes the $p$-dimensional covariate vector of the $k$th patient
and $Y_k$ is the response of the $k$th patient. We consider the generalized linear
model introduced in Chapter 3 by relations (3.4) and (3.6). In particular, $g(\cdot)$ is the
link function connecting the mean of the response $Y_k$ with the linear combination of
the covariate $X_k$ and treatment $t_k$.

We will use the notation $Y_k^A$ and $Y_k^B$ for the potential responses from treatment
A and treatment B respectively, and $Y_k$ for the observed response:

$$Y_k = Y_k^A \, I[t_k = 1] + Y_k^B \, I[t_k = 0].$$

Assume, as before, the existence of an increasing sequence of $\sigma$-algebras $\{\mathcal{F}_k\}_{k \ge 1}$
such that the triplet $(Y_k, X_k, t_k)$ is $\mathcal{F}_k$-measurable:

$$\mathcal{F}_k := \sigma(Y_i^A, Y_i^B, X_i, t_i, U_i : 1 \le i \le k),$$

where $(U_k)_{k \ge 1}$ denotes a sequence of i.i.d. random variables uniformly distributed on
$(0, 1)$ and independent of the other variables in the model, $(Y_k^A, Y_k^B, X_k)$. We will use
the sequence $(U_k)_{k \ge 1}$ for randomizing the treatment allocation as in the construction
given by (2.2).

Recall also the construction of the $\sigma$-algebras $\mathcal{G}_k$, with respect to which the
treatment $t_{k+1}$ is measurable:

$$\mathcal{G}_k := \mathcal{F}_k \vee \sigma(X_{k+1}) \vee \sigma(U_{k+1}).$$

In general we will say that we are at stage $k$ if we have observed the first $k - 1$
patients and the covariate of the $k$th patient, but we have not yet observed the
response from the $k$th patient.

Let $\hat{Y}_k^A$ and $\hat{Y}_k^B$ be the predictors of the potential responses of the $k$th patient
from treatment A and treatment B, respectively. We want the allocation to treatment
A to be made based on a function of these predictors, $f(\hat{Y}_k^A, \hat{Y}_k^B)$. We'll use the
notation:

(4.1) $\hat{\pi}_k = f(\hat{Y}_k^A, \hat{Y}_k^B)$.

It is also useful to think of $\hat{\pi}_k$ as an estimate of the parameter $\pi_k$, where $\pi_k$ is the
function of the unobserved potential responses:

$$\pi_k = f(Y_k^A, Y_k^B).$$

In summary, we consider the following general framework for our adaptive design:

• First stage

Start with $n_0$ patients allocated to each treatment.

• Second stage

At each stage $k > 2n_0$ we iterate the following procedure:

- Compute the maximum quasi-likelihood estimators $\hat{\beta}_{k-1,1}, \hat{\beta}_{k-1,2}, \hat{\beta}_{k-1,3}$
based on the information from the first k - 1 patients as defined by
relation (3.9). Recall that $\hat{\beta}_{k-1,1}$, $\hat{\beta}_{k-1,3}$ are p-dimensional column vectors
and $\hat{\beta}_{k-1,2}$ is scalar.

- Observe the covariate vector $X_k$ for the kth patient.

- Compute the predictors $\hat{Y}_k^A$ and $\hat{Y}_k^B$ for the responses:

(4.2) $\hat{Y}_k^A = g(X_k \hat{\beta}_{k-1,1} + \hat{\beta}_{k-1,2} \cdot 1 + X_k \hat{\beta}_{k-1,3} \cdot 1) = g(X_k(\hat{\beta}_{k-1,1} + \hat{\beta}_{k-1,3}) + \hat{\beta}_{k-1,2})$

(4.3) $\hat{Y}_k^B = g(X_k \hat{\beta}_{k-1,1} + \hat{\beta}_{k-1,2} \cdot 0 + X_k \hat{\beta}_{k-1,3} \cdot 0) = g(X_k \hat{\beta}_{k-1,1}).$

- Evaluate the function $\hat{\pi}_k := f(\hat{Y}_k^A, \hat{Y}_k^B)$.

- Generate the treatment allocation $t_k$ according to a Bernoulli($\hat{\pi}_k$) distribution.
To achieve this, consider $(U_k)_{k \ge 1}$, the sequence of i.i.d. uniformly distributed
random variables on the interval [0, 1], independent of $(X_k, Y_k)_{k \ge 1}$. Then define $t_k$ as:

(4.4) $t_k := I[U_k < \hat{\pi}_k]$

where $I[\cdot]$ is the usual indicator function.
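The second-stage iteration above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the covariate is taken to be scalar (p = 1), and the names `allocate`, `second_stage_step`, `identity_link` and `logistic_rule` are hypothetical helpers that do not appear in the dissertation. The link g and the allocation function f are passed in as arguments, so the sketch does not commit to a particular choice.

```python
import math
import random


def allocate(pi_hat, u):
    """Relation (4.4): t_k = I[u < pi_hat]; 1 means treatment A, 0 means B."""
    return 1 if u < pi_hat else 0


def second_stage_step(x_k, b1, b2, b3, g, f, rng):
    """One second-stage iteration for a scalar covariate (illustrative only).

    b1, b2, b3 stand for the estimates available after k - 1 patients.
    """
    y_hat_a = g(x_k * (b1 + b3) + b2)      # predictor (4.2)
    y_hat_b = g(x_k * b1)                  # predictor (4.3)
    pi_hat = f(y_hat_a, y_hat_b)           # allocation probability (4.1)
    t_k = allocate(pi_hat, rng.random())   # randomized allocation (4.4)
    return t_k, pi_hat


def identity_link(v):
    """Hypothetical choice of g: the identity link."""
    return v


def logistic_rule(y_a, y_b):
    """Hypothetical choice of f: logistic function of the predictor difference."""
    return 1.0 / (1.0 + math.exp(-(y_a - y_b)))


rng = random.Random(0)
t_k, pi_hat = second_stage_step(2.0, 1.0, 3.0, -0.5, identity_link, logistic_rule, rng)
print(t_k, round(pi_hat, 4))
```

Passing the uniform draw explicitly to `allocate` mirrors the construction with the sequence $(U_k)$: the randomness enters only through one uniform variable per patient.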

Note that based on the proposed model the predictors $\hat{Y}_k^A$, $\hat{Y}_k^B$ are computed
from the information up to the (k - 1)th patient and the current covariate $X_k$. Thus
the treatment allocation $t_k$, through its dependence on $\hat{\pi}_k$, involves all the
information available at stage k - 1, which is exactly our proposed goal stated in the
beginning of the chapter.

The ﬁrst stage of the design is needed for an initial estimation of the parameters
of the model. If some estimators are already available, this ﬁrst stage can be skipped
and the design can start directly from the second stage.

From an ethical point of view the choice of the function $f(\cdot, \cdot)$ should satisfy some
requirements according to the basic principle that we should favor the treatment
with the superior response in the trial to date.

As an example, in the case of a continuous response where larger values correspond
to a better treatment, we propose the following guiding rules for constructing
the function f:

(4.5) $f(x, x) = 0.5$
(4.6) $\lim_{x - y \to \infty} f(x, y) = 1$
(4.7) $\lim_{x - y \to -\infty} f(x, y) = 0$

Equation (4.5) expresses that when the potential responses from the two treatments
are equal, we should allocate with probability 0.5. Equation (4.6) reflects that
when the potential response for treatment A is far superior, we should allocate
to treatment A with a probability close to 1. Likewise, in case (4.7), when
the superior potential response comes from treatment B, we should allocate to
treatment A with a probability close to 0.

47

4.2 A Normal Response Example

In this section we investigate an example from the family of designs proposed in
the beginning of the chapter. For the model (3.4) introduced in Chapter 3, we
will concentrate on the case when the response is normally distributed together
with the identity (canonical) link function for the model. In this particular situation
the generalized linear model reduces to the linear model and the maximum quasi-likelihood
estimators become the least squares estimators. Keep the same notation
introduced in the beginning of the chapter and consider the following model for the
response:

(4.8) $Y_k = \beta_1 + \beta_2 X_k + \beta_3 t_k + \beta_4 t_k X_k + \epsilon_k$

with the errors $\epsilon_k$ independent and normally distributed:

(4.9) $\epsilon_k \sim N(0, \sigma^2).$

We consider the covariate $X_k$ to be unidimensional. The random covariates
$(X_k)_{k \ge 1}$ are an i.i.d. sequence whose distribution has a known density p(x). The
treatment allocation tk is as described in the general framework of an adaptive design
(see page 46). More precisely, we will consider

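Let $Z_n$ be the design matrix of model (4.8):

(4.10) $Z_n = \begin{pmatrix} 1 & X_1 & t_1 & X_1 t_1 \\ 1 & X_2 & t_2 & X_2 t_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_n & t_n & X_n t_n \end{pmatrix}$
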
Consider now the case when larger values for the response $Y_k$ are desirable. Let
$\hat{\beta}_{1,k}, \hat{\beta}_{2,k}, \hat{\beta}_{3,k}, \hat{\beta}_{4,k}$ be the least squares estimators for the parameters
$\beta_1, \beta_2, \beta_3, \beta_4$, respectively, after k patients are treated and observed in the trial.
Then the predictors for the kth patient response from treatment A, respectively B, are:

$\hat{Y}_k^A = \hat{\beta}_{1,k-1} + \hat{\beta}_{3,k-1} + (\hat{\beta}_{2,k-1} + \hat{\beta}_{4,k-1}) X_k$

$\hat{Y}_k^B = \hat{\beta}_{1,k-1} + \hat{\beta}_{2,k-1} X_k.$
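As a small illustration of how the two predictors are read off a row of the design matrix, the following Python sketch evaluates them for given coefficient estimates. The function names `design_row` and `predict` and the numerical values are assumptions made for this example only.

```python
def design_row(x, t):
    """Row (1, x, t, x * t) of the design matrix for model (4.8)."""
    return [1.0, x, float(t), x * t]


def predict(beta_hat, x):
    """Return (y_hat_a, y_hat_b): predicted responses under treatment A (t = 1)
    and treatment B (t = 0) for a patient with covariate x."""
    y_hat_a = sum(b * z for b, z in zip(beta_hat, design_row(x, 1)))
    y_hat_b = sum(b * z for b, z in zip(beta_hat, design_row(x, 0)))
    return y_hat_a, y_hat_b


# Illustrative estimates (beta_1, beta_2, beta_3, beta_4) and covariate value.
beta_hat = (0.0, 1.0, 3.0, -0.5)
y_a, y_b = predict(beta_hat, 4.0)
print(y_a, y_b)
```

With these illustrative values the A predictor is 0 + 3 + (1 - 0.5) * 4 and the B predictor is 0 + 1 * 4, which matches the two displayed formulas term by term.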

For ethical reasons we want to allocate patients to the better treatment, so
we will 'favor' the treatment with the larger of the two predictors $\hat{Y}_k^A$, $\hat{Y}_k^B$.
At the same time we want to keep the randomization in the trial design. Therefore
'favoring' the better treatment will consist of allocating to it with a higher
probability.

We will consider the following criterion for evaluating the better treatment. Let
$\Delta_k$ be the difference for the kth patient between the mean responses to the two
treatments:

(4.11) $\Delta_k = E(Y_k^A) - E(Y_k^B) = \beta_3 + \beta_4 X_k$

and let $\hat{\Delta}_k$ be its estimator at stage k:

(4.12) $\hat{\Delta}_k = \hat{Y}_k^A - \hat{Y}_k^B = \hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1} X_k.$

A positive $\Delta_k$ shows that treatment A is better and analogously, if $\Delta_k$ is negative
then treatment B is better. Moreover a small absolute value of $\Delta_k$ is an
indication that the two treatments are not very different, while a large absolute value
of $\Delta_k$ is an indication that one of the treatments is much better than the other one.

Using the above interpretations we propose the following function for $\hat{\pi}_k$:

(4.13) $\hat{\pi}_k = \frac{\exp(\hat{\Delta}_k)}{1 + \exp(\hat{\Delta}_k)}.$

To emphasize that the treatment allocation is a function of $\hat{Y}_k^A$, $\hat{Y}_k^B$, rewrite $\hat{\pi}_k$ as

$\hat{\pi}_k = \frac{\exp(\hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1} X_k)}{1 + \exp(\hat{\beta}_{3,k-1} + \hat{\beta}_{4,k-1} X_k)} = \frac{\exp(\hat{Y}_k^A - \hat{Y}_k^B)}{1 + \exp(\hat{Y}_k^A - \hat{Y}_k^B)}.$

Since in this case we will work with the estimator $\hat{\Delta}_k$, it is easier to think of the
probability of selection $\hat{\pi}_k$ as a function of it. Therefore we will refer in this case to
$\hat{\pi}_k$ as a function of just one variable instead of two, $\hat{\pi}_k = f_1(\hat{\Delta}_k)$.

It is easy to observe that the three properties of the function $f(\cdot, \cdot)$ as they were
defined in (4.5) - (4.7) become the following properties for the function $f_1(\cdot)$:

$f_1(0) = 0.5$

$\lim_{x \to \infty} f_1(x) = \lim_{x \to \infty} \frac{\exp(x)}{1 + \exp(x)} = 1$

$\lim_{x \to -\infty} f_1(x) = \lim_{x \to -\infty} \frac{\exp(x)}{1 + \exp(x)} = 0$

In our case, $f_1(y) = \frac{\exp(y)}{1 + \exp(y)}$ and the three properties above are satisfied,
as can be seen from the graph shown in figure 4.1.
The design is now completely specified according to the general framework presented
in the introduction of the chapter (see page 47).
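The three properties can also be checked numerically. The sketch below is an illustration, not part of the original design: it evaluates the logistic function in a form that avoids floating-point overflow for large arguments (the branching on the sign of the argument is an implementation choice made here), and then probes the value at zero and the two tails.

```python
import math


def f1(delta):
    """Logistic function exp(delta) / (1 + exp(delta)), evaluated stably."""
    if delta >= 0:
        return 1.0 / (1.0 + math.exp(-delta))
    e = math.exp(delta)
    return e / (1.0 + e)


checks = {
    "f1(0)": f1(0.0),       # equal predictors: allocate with probability 0.5
    "f1(40)": f1(40.0),     # treatment A far better: probability close to 1
    "f1(-40)": f1(-40.0),   # treatment B far better: probability close to 0
}
for name, value in checks.items():
    print(name, value)
```

The symmetry f1(d) + f1(-d) = 1 also holds, so swapping the labels of the two treatments swaps the allocation probabilities.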

50

Figure 4.1: Graph of the function $f_1(y) = \frac{\exp(y)}{1 + \exp(y)}$


4.2.1 An Evaluation Function for the Number of Patients on the Inferior Treatment

We now consider an evaluation criterion, from an ethical point of view, for the design
proposed in Section 4.1. Recall that one of the main advantages of adaptive designs
is that they combine ethical considerations with the randomization needed for an
impartial trial. In our case we want to allocate as many patients as we can to the
better treatment, where the better treatment is decided based on the estimators from
the model (4.8), but we have to keep in mind that the allocation is also subject to
the randomization introduced by (4.4).

The evaluation criterion is a natural one, given the model and the design considered.
We will count the number of 'mistreatments', i.e. the number of allocations
to the inferior treatment. It can be regarded as a loss function. We denote by $C_k$
the number of 'mistreatments' up to stage k, defined as follows:

(4.14) $C_k = \sum_{i=2n_0+1}^{k} I[t_i = 1] \cdot I[\Delta_i \le 0] + \sum_{i=2n_0+1}^{k} I[t_i = 0] \cdot I[\Delta_i > 0]$

where $I[\cdot]$ denotes the usual indicator function.

The sum begins from $2n_0 + 1$ because we take into consideration only the
allocations made according to the design as defined by relations (4.13) and (4.12)
and do not include the first $2n_0$ allocations from the first stage.
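A direct transcription of (4.14) may help fix the indexing. In the Python sketch below, the function name `count_mistreatments` and the toy data are hypothetical; the true parameters $\beta_3$, $\beta_4$ are assumed known, as they are in a simulation study.

```python
def count_mistreatments(xs, ts, beta3, beta4, n0):
    """Evaluate C_k of (4.14) for covariates xs and allocations ts.

    The first 2 * n0 patients (first stage) are skipped. An allocation counts
    as a mistreatment when t = 1 and Delta <= 0, or when t = 0 and Delta > 0.
    """
    count = 0
    for x, t in list(zip(xs, ts))[2 * n0:]:
        delta = beta3 + beta4 * x
        if (t == 1 and delta <= 0) or (t == 0 and delta > 0):
            count += 1
    return count


# Toy data: six patients, the first two belonging to the first stage (n0 = 1).
xs = [1.0, 9.0, 2.0, 8.0, 7.0, 3.0]
ts = [1, 0, 0, 1, 0, 1]
c = count_mistreatments(xs, ts, beta3=3.0, beta4=-0.5, n0=1)
print(c)
```

In this toy example the patients with covariates 2 (given B although A is better) and 8 (given A although B is better) are the two mistreatments.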

Observe that the variation in the loss function $C_k$ comes from two sources: one is
the randomization of the treatment assignment according to a Bernoulli distribution,
and the other is the parameter estimation we use in the treatment assignment.

We are interested in evaluating $C_k$. Based on the theorems we proved in
Section 3.3 we will study the behavior of $C_k$ as $k \to \infty$ and we will be able to
find a limit for the proportion of patients allocated to the inferior treatment.

Theorem 4.2.1. Assume that the density function p(x) is uniformly bounded, the
errors $\epsilon_k$ satisfy condition (3.14), the minimum and maximum eigenvalues of the
design matrix $Z_n$ satisfy condition (3.13) and that

(4.15) $\sup_{i \ge 1} |X_i| < \infty$ a.s.

Then, as $k \to \infty$,

(4.16) $\frac{C_k}{k} \xrightarrow{a.s.} \int_{\beta_3 + \beta_4 x \le 0} f_1(\beta_3 + \beta_4 x) p(x) \, dx + \int_{\beta_3 + \beta_4 x > 0} [1 - f_1(\beta_3 + \beta_4 x)] p(x) \, dx.$

Proof. We begin by checking the assumptions of theorem (3.3.1) from chapter 3.

• Since in our case the model considered (4.8) is the linear model, the link
function is the identity function, g(y) = y, which is continuously differentiable
with positive derivative, and therefore the first condition (3.11) from
theorem (3.3.1) is satisfied.

• In our case we have the intercept and a one-dimensional covariate, $X_k$, so
condition (3.12) becomes:

$\sup_{i \ge 1} \|(1, X_i)\|_2 < \infty.$

But

$\|(1, X_i)\|_2 = \sqrt{1 + X_i^2} \le 1 + |X_i|$

and using condition (4.15) the conclusion follows.

• Finally, conditions (3.13) and (3.14) follow from the assumptions made in the
theorem for our model.

Therefore we can apply theorem (3.3.1) and conclude that:

$(\hat{\beta}_{1,k}, \hat{\beta}_{2,k}, \hat{\beta}_{3,k}, \hat{\beta}_{4,k}) \to (\beta_1, \beta_2, \beta_3, \beta_4)$ a.s.

Recall the definition of the $\sigma$-algebra $\mathcal{J}_k$ as given by (2.5):

$\mathcal{J}_k := \mathcal{F}_k \vee \sigma(X_{k+1})$

and the equation (2.6):

$E(t_{k+1} \mid \mathcal{J}_k) = \hat{\pi}_{k+1}.$

From the construction of the $\sigma$-algebra, $X_k$ is $\mathcal{J}_{k-1}$-measurable. Since in our case
$\Delta_k = \beta_3 + \beta_4 X_k$ it follows that $\Delta_k$ is also $\mathcal{J}_{k-1}$-measurable. Then for any $i \ge 2n_0 + 1$:

$E(I[t_i = 1] \cdot I[\Delta_i \le 0]) = E(E(I[t_i = 1] \cdot I[\Delta_i \le 0] \mid \mathcal{J}_{i-1}))$
$= E(I[\Delta_i \le 0] \cdot E(I[t_i = 1] \mid \mathcal{J}_{i-1}))$
$= E(I[\Delta_i \le 0] \cdot E(t_i \mid \mathcal{J}_{i-1}))$
$= E(\hat{\pi}_i \cdot I[\Delta_i \le 0]).$

Analogously,

$E(I[t_i = 0] \cdot I[\Delta_i > 0]) = E((1 - \hat{\pi}_i) \cdot I[\Delta_i > 0]).$

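Then we have that

$E\left(\frac{C_k}{k}\right) = \frac{1}{k} \sum_{i=2n_0+1}^{k} E(I[t_i = 1] \cdot I[\Delta_i \le 0]) + \frac{1}{k} \sum_{i=2n_0+1}^{k} E(I[t_i = 0] \cdot I[\Delta_i > 0])$

$= \frac{1}{k} \sum_{i=2n_0+1}^{k} E(\hat{\pi}_i \cdot I[\beta_3 + \beta_4 X_i \le 0]) + \frac{1}{k} \sum_{i=2n_0+1}^{k} E((1 - \hat{\pi}_i) \cdot I[\beta_3 + \beta_4 X_i > 0])$

$= \frac{1}{k} \sum_{i=2n_0+1}^{k} \int_{\beta_3 + \beta_4 x \le 0} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x) p(x) \, dx + \frac{1}{k} \sum_{i=2n_0+1}^{k} \int_{\beta_3 + \beta_4 x > 0} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x)) p(x) \, dx$

(4.17) $= \frac{k - 2n_0}{k} \int_{\beta_3 + \beta_4 x \le 0} \frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x) p(x) \, dx + \frac{k - 2n_0}{k} \int_{\beta_3 + \beta_4 x > 0} \frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x)) p(x) \, dx.$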

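But from our previous statement, we have that $\hat{\beta}_{3,k} \to \beta_3$ and $\hat{\beta}_{4,k} \to \beta_4$ a.s., and
because the function $f_1(\cdot)$ is continuous, we have that:

$f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x) \xrightarrow{a.s.} f_1(\beta_3 + \beta_4 x) \quad \forall x.$

Therefore, making use of the Stolz-Cesaro theorem (see A.1.1), the averages converge:

$\frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x) p(x) \xrightarrow{a.s.} f_1(\beta_3 + \beta_4 x) p(x) \quad \forall x.$

In a similar way we get that:

$\frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x)) p(x) \xrightarrow{a.s.} (1 - f_1(\beta_3 + \beta_4 x)) p(x) \quad \forall x.$

But the function $f_1(\cdot)$ is uniformly bounded and by assumption the density $p(\cdot)$
is also uniformly bounded, so

$\sup_x |f_1(x)| \vee |p(x)| \le B$

where B is some constant.

It follows from the dominated convergence theorem that:

$\int_{\beta_3 + \beta_4 x \le 0} \frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x) p(x) \, dx \xrightarrow{a.s.} \int_{\beta_3 + \beta_4 x \le 0} f_1(\beta_3 + \beta_4 x) p(x) \, dx,$

$\int_{\beta_3 + \beta_4 x > 0} \frac{1}{k - 2n_0} \sum_{i=2n_0+1}^{k} (1 - f_1(\hat{\beta}_{3,i-1} + \hat{\beta}_{4,i-1} x)) p(x) \, dx \xrightarrow{a.s.} \int_{\beta_3 + \beta_4 x > 0} (1 - f_1(\beta_3 + \beta_4 x)) p(x) \, dx.$

Combining the two relations together with (4.17) and using that $\frac{k - 2n_0}{k} \to 1$ as
$k \to \infty$ we get the conclusion of the theorem. □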

Let us look more closely at the previous result in the particular case when $X_k$ is
uniformly distributed:

$X_k \sim U(a, b).$

Let L be the limit of $\frac{C_k}{k}$ as defined by equation (4.16). Then we have
the following corollary:

56

 

Corollary 4.2.2. If the covariate $X_k$ from the model introduced by (4.8) and (4.9)
is uniformly distributed on the interval [a, b] then the limit L is equal to:

• Case 1: $a < -\frac{\beta_3}{\beta_4} < b$

(4.18) $L = \frac{1}{|\beta_4| (b - a)} \left[ \ln \frac{4}{(1 + e^{\beta_3 + \beta_4 a})(1 + e^{\beta_3 + \beta_4 b})} + (\beta_3 + \beta_4 a) \vee (\beta_3 + \beta_4 b) \right]$

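• Case 2: $-\frac{\beta_3}{\beta_4} < a$

(4.19) $L = \frac{1}{|\beta_4| (b - a)} \ln \frac{1 + e^{-|\beta_3 + \beta_4 a|}}{1 + e^{-|\beta_3 + \beta_4 b|}}$

• Case 3: $b < -\frac{\beta_3}{\beta_4}$

(4.20) $L = \frac{1}{|\beta_4| (b - a)} \ln \frac{1 + e^{-|\beta_3 + \beta_4 b|}}{1 + e^{-|\beta_3 + \beta_4 a|}}$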

• Case 4: $\beta_4 = 0$

(4.21) $L = \frac{e^{\beta_3}}{1 + e^{\beta_3}} I[\beta_3 \le 0] + \frac{1}{1 + e^{\beta_3}} I[\beta_3 > 0]$

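Proof. In this particular case the density function p(x) is equal to

$p(x) = \frac{1}{b - a} I[a \le x \le b]$

and the function $f_1(\cdot)$ was defined as $f_1(y) = \frac{e^y}{1 + e^y}$. So the limit L given by relation
(4.16) becomes:

$L = \int_{\{\beta_3 + \beta_4 x \le 0\} \cap \{a < x < b\}} \frac{1}{b - a} \cdot \frac{e^{\beta_3 + \beta_4 x}}{1 + e^{\beta_3 + \beta_4 x}} \, dx + \int_{\{\beta_3 + \beta_4 x > 0\} \cap \{a < x < b\}} \frac{1}{b - a} \cdot \left(1 - \frac{e^{\beta_3 + \beta_4 x}}{1 + e^{\beta_3 + \beta_4 x}}\right) dx.$

Calculus computations finish the proof. □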
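The calculus can be cross-checked numerically. The following Python sketch is an illustration under stated assumptions: `limit_numeric` approximates the integral in (4.16) for a uniform covariate with a simple midpoint rule (the rule and the number of steps are choices made here), and `limit_case1` evaluates the Case 1 closed form (4.18). Both function names are hypothetical. The parameter values are those of the numerical application in Section 4.2.2.

```python
import math


def f1(y):
    """Logistic function e^y / (1 + e^y)."""
    return 1.0 / (1.0 + math.exp(-y))


def limit_numeric(beta3, beta4, a, b, steps=100_000):
    """Midpoint-rule approximation of (4.16) with p(x) = 1 / (b - a) on [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for j in range(steps):
        x = a + (j + 0.5) * h
        delta = beta3 + beta4 * x
        # Probability of allocating to the inferior treatment at covariate x.
        total += f1(delta) if delta <= 0 else 1.0 - f1(delta)
    return total * h / (b - a)


def limit_case1(beta3, beta4, a, b):
    """Closed form (4.18), valid when a < -beta3 / beta4 < b."""
    ua = beta3 + beta4 * a
    ub = beta3 + beta4 * b
    log_term = math.log(4.0 / ((1.0 + math.exp(ua)) * (1.0 + math.exp(ub))))
    return (log_term + max(ua, ub)) / (abs(beta4) * (b - a))


L_closed = limit_case1(3.0, -0.5, 0.0, 10.0)
L_num = limit_numeric(3.0, -0.5, 0.0, 10.0)
print(round(L_closed, 3), round(L_num, 3))
```

For these values both computations agree with the limit L = 0.242 quoted in Section 4.2.2.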

 

As can be seen from the above result, we can distinguish four different scenarios
for the example considered. The regression line for the response from treatment A
and the regression line for the response from treatment B may or may not intersect,
and if they do intersect then the point of intersection may or may not lie in the interval
[a, b]. The case when they intersect inside the interval [a, b] corresponds to case 1;
the cases when they intersect to the left or to the right of the interval [a, b] correspond
to case 2 and case 3, respectively; and finally the case when they are parallel
corresponds to case 4. The four situations are illustrated in figure 4.2.

58

 

 

Figure 4.2: Illustration of Corollary 4.2.2. [Four panels, Case 1 to Case 4, each plotting the response Y against the covariate X with the interval endpoints a and b marked.]

59

4.2.2 Simulations of the Design

In this section we present some numerical results obtained from simulations. We
compute by Monte Carlo methods the estimators for the coefficient parameters
$(\beta_1, \beta_2, \beta_3, \beta_4)$ of model (4.8), the number of allocations to
the inferior treatment, $C_n$, and the proportion $C_n/(n - 2n_0)$. Although the theorems
presented in the previous sections are all asymptotic, we will see
that the asymptotic results already describe the simulations well for moderate sample sizes.

We will consider the simulations in the case when the covariate $X_k$ is uniformly
distributed, presented in corollary (4.2.2).

Consider the following numerical application:

• $\beta_1 = 0$, $\beta_2 = 1$, $\beta_3 = 3$, $\beta_4 = -0.5$.

• $\sigma$ = 0.1, 0.2, 0.5, 1 or 2.

• a = 0, b = 10.

• the fixed sample size is n = 30, 50, 100, and $n_0 = 5$, so that the first five
patients are allocated to treatment A, the next five to treatment B and the
next n - 10 allocations are done according to our design introduced on page 46.

Because of the choice of the parameters $\beta_1, \beta_2, \beta_3, \beta_4$ we are in case 1 of the above
result (4.2.2) and we get that in this particular case L = 0.242. The two regression
lines for treatment A, respectively B, are graphed below in figure 4.3. We use a
continuous line to denote the regression line for $Y^A$ and a dashed line to denote the
regression line for $Y^B$:

$E(Y_k^A) = 3 + 0.5 X_k$

$E(Y_k^B) = X_k$

60

Figure 4.3: The regression lines for treatment A and B. [Plot of the response Y against the covariate X.]


For a better understanding of the allocation design it is worth mentioning how
the allocation probabilities vary as a function of the covariate X. Recall that

$\Delta_x = \beta_3 + \beta_4 x$

$\pi_x = f_1(\Delta_x) = \frac{\exp(\Delta_x)}{1 + \exp(\Delta_x)}$

and so in our case

$\pi_x = \frac{\exp(3 - 0.5x)}{1 + \exp(3 - 0.5x)}.$

Then on the interval [0, 10] the allocation probabilities vary according to the
following table 4.1:

Table 4.1: Allocation probabilities to treatment A

 x     $\Delta_x$    $f_1(\Delta_x)$
 0      3       0.95
 1      2.5     0.92
 2      2       0.88
 3      1.5     0.82
 4      1       0.73
 5      0.5     0.62
 6      0       0.5
 7     -0.5     0.38
 8     -1       0.27
 9     -1.5     0.18
10     -2       0.12
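The entries of Table 4.1 can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are illustrative, and the values $\beta_3 = 3$, $\beta_4 = -0.5$ are those of the numerical application above.

```python
import math


def f1(delta):
    """Logistic allocation function of (4.13)."""
    return 1.0 / (1.0 + math.exp(-delta))


beta3, beta4 = 3.0, -0.5

# One row per integer covariate value on [0, 10]: (x, Delta_x, f1(Delta_x)).
table = [(x, beta3 + beta4 * x, round(f1(beta3 + beta4 * x), 2)) for x in range(11)]

for x, delta, prob in table:
    print(f"{x:>2}  {delta:>5.1f}  {prob:.2f}")
```

The probabilities are symmetric around x = 6, the point where the two regression lines cross and the allocation is a fair coin.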

 

A typical simulation of the design allocation is given in the following figure 4.4.
We use the symbol 'o' for marking the allocations to treatment A and the symbol '+' for
marking the allocations to treatment B.

62

Figure 4.4: A simulation design. [Scatter plot of the response Y against the covariate X on [0, 10].]

63

 

In the following tables 4.2 - 4.7 we used Monte Carlo simulations to evaluate the
bias and the standard deviation of the estimators $\hat{\beta}_1$, $\hat{\beta}_2$, $\hat{\beta}_3$ and $\hat{\beta}_4$ and also to evaluate the
estimate and the standard deviation for $C_n$ and $C_n/(n - 2n_0)$. Since there are ten
observations in the first stage ($n_0 = 5$), for n = 100 there are ninety observations
in the second stage, $n - 2n_0 = 100 - 2 \cdot 5 = 90$. The limit is L = 0.242, so we may expect
to observe about 90 · 0.242 = 21.78 'mistreatments' for $C_{100}$. Analogously, we may
expect 40 · 0.242 = 9.68 'mistreatments' for $C_{50}$ and 20 · 0.242 = 4.84 'mistreatments'
for $C_{30}$. The estimates from the simulations were close to these values, as can be
seen from the tables below.

In each of the following cases there are 10000 replications of the simulation of
the allocation designs.
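A reduced version of such a Monte Carlo study can be sketched in pure Python. The sketch is not the code used for the tables; several choices are assumptions made here: the helper names (`ArmFit`, `simulate_trial`), the seed, and the use of 300 replications instead of 10000 to keep the run short. It also uses the fact that, because model (4.8) contains both a treatment effect and a treatment-by-covariate interaction, its least squares fit coincides with two separate simple linear regressions, one per treatment arm, so running sums per arm suffice.

```python
import math
import random


def f1(delta):
    """Numerically stable logistic function exp(d) / (1 + exp(d))."""
    if delta >= 0:
        return 1.0 / (1.0 + math.exp(-delta))
    e = math.exp(delta)
    return e / (1.0 + e)


class ArmFit:
    """Running sums for a simple linear regression of Y on X within one arm."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def line(self):
        """Return (intercept, slope) of the least squares line."""
        slope = (self.n * self.sxy - self.sx * self.sy) / (self.n * self.sxx - self.sx ** 2)
        return (self.sy - slope * self.sx) / self.n, slope


def simulate_trial(n, n0, beta, sigma, a, b, rng):
    """Return C_n, the number of second-stage allocations to the inferior arm."""
    b1, b2, b3, b4 = beta
    fits = {1: ArmFit(), 0: ArmFit()}
    mistreatments = 0
    for k in range(1, n + 1):
        x = rng.uniform(a, b)
        if k <= n0:
            t = 1                                   # first stage: treatment A
        elif k <= 2 * n0:
            t = 0                                   # first stage: treatment B
        else:
            int_a, slope_a = fits[1].line()
            int_b, slope_b = fits[0].line()
            delta_hat = (int_a - int_b) + (slope_a - slope_b) * x   # (4.12)
            t = 1 if rng.random() < f1(delta_hat) else 0            # (4.13), (4.4)
            delta = b3 + b4 * x                                     # (4.11)
            if (t == 1 and delta <= 0) or (t == 0 and delta > 0):
                mistreatments += 1
        y = b1 + b2 * x + b3 * t + b4 * t * x + rng.gauss(0.0, sigma)
        fits[t].add(x, y)
    return mistreatments


rng = random.Random(2004)
reps, n, n0 = 300, 100, 5
counts = [simulate_trial(n, n0, (0.0, 1.0, 3.0, -0.5), 0.1, 0.0, 10.0, rng)
          for _ in range(reps)]
mean_proportion = sum(counts) / (reps * (n - 2 * n0))
print(round(mean_proportion, 3))
```

With these settings the average proportion of mistreatments should land near the limit L = 0.242, up to Monte Carlo error from the smaller number of replications.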

In figures 4.5 - 4.9 we can see the histograms from the 10000 replications of the
simulation for the bias of the estimators $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ and also the histogram for
the estimates of $C_{100}$. The histograms are shown in the case when n = 100 and
$\sigma$ = 0.1, 0.5, 1 and 2.

64

 

Table 4.2: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 30

                          bias       standard deviation
$\hat{\beta}_1$   σ = 0.1   -0.0007    0.0762
        σ = 0.2   -0.0025    0.1523
        σ = 0.5   -0.0239    0.3870
        σ = 1     -0.0798    0.7851
        σ = 2     -0.2787    2.3751

$\hat{\beta}_2$   σ = 0.1    0.0001    0.0100
        σ = 0.2    0.0002    0.0224
        σ = 0.5    0.0014    0.0583
        σ = 1      0.0047    0.1204
        σ = 2      0.0129    0.2590

$\hat{\beta}_3$   σ = 0.1    0.0010    0.0900
        σ = 0.2    0.0012    0.1780
        σ = 0.5    0.0253    0.4629
        σ = 1      0.0861    0.9302
        σ = 2      0.2092    2.0170

$\hat{\beta}_4$   σ = 0.1   -0.0003    0.0141
        σ = 0.2   -0.0002    0.0300
        σ = 0.5   -0.0039    0.0755
        σ = 1     -0.0144    0.1597
        σ = 2     -0.0330    0.3524

 

65

 

Table 4.3: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 50

                          bias       standard deviation
$\hat{\beta}_1$   σ = 0.1   -0.0021    0.0632
        σ = 0.2   -0.0041    0.1261
        σ = 0.5   -0.0261    0.3122
        σ = 1     -0.0883    0.6534
        σ = 2     -0.3109    1.3500

$\hat{\beta}_2$   σ = 0.1    0.0002    0.0100
        σ = 0.2    0.0004    0.0173
        σ = 0.5    0.0027    0.0447
        σ = 1      0.0070    0.1170
        σ = 2      0.0212    0.2245

$\hat{\beta}_3$   σ = 0.1    0.0021    0.0728
        σ = 0.2    0.0043    0.1428
        σ = 0.5    0.0287    0.3585
        σ = 1      0.0997    0.8209
        σ = 2      0.2973    1.6500

$\hat{\beta}_4$   σ = 0.1   -0.0003    0.0100
        σ = 0.2   -0.0008    0.0224
        σ = 0.5   -0.0048    0.0583
        σ = 1     -0.0160    0.1453
        σ = 2     -0.0478    0.2900

 

66

Table 4.4: Simulation results for $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$ in the case n = 100

                          bias       standard deviation
$\hat{\beta}_1$   σ = 0.1   -0.0009    0.0469
        σ = 0.2   -0.0035    0.0949
        σ = 0.5   -0.0197    0.2328
        σ = 1     -0.0644    0.4824
        σ = 2     -0.2559    1.0516

$\hat{\beta}_2$   σ = 0.1    0.0001    0.0100
        σ = 0.2    0.0004    0.0141
        σ = 0.5    0.0021    0.0332
        σ = 1      0.0067    0.0678
        σ = 2      0.0216    0.1539

$\hat{\beta}_3$   σ = 0.1    0.0010    0.0529
        σ = 0.2    0.0044    0.1068
        σ = 0.5    0.0217    0.2615
        σ = 1      0.0713    0.5530
        σ = 2      0.2711    1.2147

$\hat{\beta}_4$   σ = 0.1   -0.0002    0.0100
        σ = 0.2   -0.0007    0.0173
        σ = 0.5   -0.0034    0.0424
        σ = 1     -0.0116    0.0877
        σ = 2     -0.0416    0.1977

 

67

 

Table 4.5: Simulation results for $C_{30}$, in the case n = 30

                         estimate   standard deviation
$C_{30}$      σ = 0.1    4.8632     1.8988
           σ = 0.2    4.8698     1.9196
           σ = 0.5    4.9634     2.0262
           σ = 1      5.2542     2.2815
           σ = 2      5.9658     3.0394

$C_{30}/20$   σ = 0.1    0.2432     0.0949
           σ = 0.2    0.2435     0.0960
           σ = 0.5    0.2482     0.1013
           σ = 1      0.2627     0.1141
           σ = 2      0.2983     0.1520

 

68

 

Table 4.6: Simulation results for $C_{50}$, in the case n = 50

                         estimate   standard deviation
$C_{50}$      σ = 0.1    9.7533     2.7314
           σ = 0.2    9.6947     2.7323
           σ = 0.5    9.7995     2.8569
           σ = 1     10.1507     3.3095
           σ = 2     11.3508     4.8413

$C_{50}/40$   σ = 0.1    0.2438     0.0683
           σ = 0.2    0.2424     0.0684
           σ = 0.5    0.2450     0.0714
           σ = 1      0.2538     0.0827
           σ = 2      0.2838     0.1210

 

69

 

Table 4.7: Simulation results for $C_{100}$, in the case n = 100

                          estimate   standard deviation
$C_{100}$      σ = 0.1   21.7849     4.0953
            σ = 0.2   21.8040     4.1247
            σ = 0.5   21.9066     4.3555
            σ = 1     22.3827     5.1867
            σ = 2     24.0572     7.7539

$C_{100}/90$   σ = 0.1    0.2421     0.0455
            σ = 0.2    0.2423     0.0458
            σ = 0.5    0.2434     0.0484
            σ = 1      0.2487     0.0576
            σ = 2      0.2673     0.0860

 

70

 

Figure 4.5: Histogram of 10000 replications for bias of $\hat{\beta}_1$ when n = 100, σ = 0.1, 0.5, 1, 2. [Four panels, one for each value of σ.]

71

Figure 4.6: Histogram of 10000 replications for bias of $\hat{\beta}_2$ when n = 100, σ = 0.1, 0.5, 1, 2. [Four panels, one for each value of σ.]

72

Figure 4.7: Histogram of 10000 replications for bias of $\hat{\beta}_3$ when n = 100, σ = 0.1, 0.5, 1, 2. [Four panels, one for each value of σ.]

73

 

Figure 4.8: Histogram of 10000 replications for bias of $\hat{\beta}_4$ when n = 100, σ = 0.1, 0.5, 1, 2. [Four panels, one for each value of σ.]

74

Figure 4.9: Histogram of 10000 replications for estimates of $C_{100}$ when n = 100, σ = 0.1, 0.5, 1, 2. [Four panels, one for each value of σ.]

75

Finally, we want to compare, from an ethical point of view, our design (ADC)
with other designs proposed in the literature. So we will compare the number of
'mistreatments', $C_n$, and the proportion $C_n/(n - 2n_0)$ with those from two other designs:

• complete randomization (CR) design, where each treatment has a 50% chance of
being assigned (the covariate is ignored)

• deterministic (D) design, where allocation is done according to the larger mean
response of the two treatments, that is

$t_k = I[\hat{Y}_k^A > \hat{Y}_k^B]$

The results from the simulations with n = 50, 100 and σ = 0.1, 0.2, 0.5, 1, 2 are
presented in tables 4.8 - 4.11. Recall that for our design in the first stage we allocate
5 patients to each treatment, so for comparison purposes we take 100 - 10 = 90,
respectively 50 - 10 = 40, allocations for the other two designs listed above.
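The two comparison rules are simple enough to state as code. The Python sketch below is illustrative only; the function names `allocate_cr` and `allocate_d`, the seed and the number of draws are assumptions made here.

```python
import random


def allocate_cr(rng):
    """Complete randomization (CR): treatment A with probability 0.5."""
    return 1 if rng.random() < 0.5 else 0


def allocate_d(y_hat_a, y_hat_b):
    """Deterministic (D) rule: t_k = I[y_hat_a > y_hat_b]."""
    return 1 if y_hat_a > y_hat_b else 0


rng = random.Random(1)
draws = 20000
cr_share = sum(allocate_cr(rng) for _ in range(draws)) / draws
print(round(cr_share, 3), allocate_d(5.0, 4.0), allocate_d(4.0, 4.0))
```

Under CR every allocation falls on the inferior treatment with probability 0.5 whatever the covariate, so about 0.5 · 40 = 20 and 0.5 · 90 = 45 mistreatments are expected, in line with the CR columns of the tables.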

76

 

Table 4.8: Simulation results for $C_{50}$, in the case n = 50

estimate
            ADC      CR       D
σ = 0.1     9.75    19.99    17.70
σ = 0.2     9.70    20.01    17.71
σ = 0.5     9.80    20.01    17.62
σ = 1      10.15    19.99    17.87
σ = 2      11.35    20.04    18.31

standard deviation
            ADC      CR       D
σ = 0.1     2.73     3.13     3.72
σ = 0.2     2.73     3.22     3.73
σ = 0.5     2.86     3.13     3.74
σ = 1       3.31     3.17     3.93
σ = 2       4.84     3.18     4.18

 

 

77

Table 4.9: Simulation results for $C_{50}/40$, in the case n = 50

estimate
            ADC      CR       D
σ = 0.1     0.244    0.500    0.443
σ = 0.2     0.243    0.500    0.443
σ = 0.5     0.245    0.500    0.441
σ = 1       0.254    0.500    0.447
σ = 2       0.283    0.501    0.458

standard deviation
            ADC      CR       D
σ = 0.1     0.068    0.078    0.093
σ = 0.2     0.068    0.081    0.093
σ = 0.5     0.072    0.078    0.094
σ = 1       0.083    0.079    0.098
σ = 2       0.121    0.080    0.105

 

78

Table 4.10: Simulation results for $C_{100}$, in the case n = 100

estimate
            ADC      CR       D
σ = 0.1    21.78    44.98    38.98
σ = 0.2    21.80    45.09    39.02
σ = 0.5    21.91    44.98    39.27
σ = 1      22.38    44.96    39.58
σ = 2      24.06    44.96    40.84

standard deviation
            ADC      CR       D
σ = 0.1     4.10     4.73     6.73
σ = 0.2     4.12     4.69     6.75
σ = 0.5     4.36     4.80     6.98
σ = 1       5.19     4.74     7.31
σ = 2       7.75     4.74     8.12

 

79

Table 4.11: Simulation results for $C_{100}/90$, in the case n = 100

estimate
            ADC      CR       D
σ = 0.1     0.242    0.500    0.433
σ = 0.2     0.242    0.501    0.434
σ = 0.5     0.243    0.500    0.436
σ = 1       0.249    0.499    0.440
σ = 2       0.267    0.499    0.454

standard deviation
            ADC      CR       D
σ = 0.1     0.046    0.053    0.075
σ = 0.2     0.046    0.052    0.075
σ = 0.5     0.048    0.053    0.078
σ = 1       0.058    0.053    0.081
σ = 2       0.086    0.053    0.090

 

80

4.3 Conclusions and Future Work

The design we propose is quite natural and fits very well in situations described by
the outline below:

• Delay in response is moderate, allowing the adaptation to take place.

• The recruitment can take place during most or all of the trial.

• It is known that a generalized linear model is a good fit for the relationship
between the response, the covariates and the treatment.

• Modest gain in terms of treatment successes is desirable from an ethical standpoint.

The design can be especially useful if it is also desirable to have an estimate of
the parameters of the model during the clinical trial, without waiting
for the entire data to be collected.

We have seen that the asymptotic results obtained in theorems 3.3.1 and 4.2.1
are consistent with the simulations even when n is relatively small
(n = 30, 50, 100). The histograms also show a normal behavior for the estimators,
but as the dispersion parameter $\sigma^2$ increases the variance of the estimators increases too,
leading in some cases to more biased estimates.

There are several lines of research arising from the current thesis work which
should be pursued.

• There is a natural extension to the case when there are more than two
treatments to be compared in the clinical trial.

• One of the requirements of the design studied was to have responses available
prior to the assignment of the treatment to the next patient. But, as is often
the case in practice, delayed responses should be considered in the design, too.

• Of particular interest is the extension of the normal asymptotic result from
the normal linear model to the generalized linear model.

• A useful design is one in which the function $\hat{\pi}_k$ that we use in randomizing the
allocation is obtained from various optimality criteria. Of interest in this case is
also to compare the optimal target with its counterpart from the adaptive
design.

• A lot of work has been done in the literature in the case where the covariates
are categorical, so that a stratified design can be employed. An application of our
design in this particular case would be interesting and the performance of the
designs should be compared.

82

Appendix A

A.1

Theorem A.1.1 (Stolz-Cesaro Theorem (Siretchi (1985))). Let $(a_n)_{n \ge 1}$ and
$(b_n)_{n \ge 1}$ be two sequences of real numbers. If $b_n$ is positive, strictly increasing and
unbounded and the following limit exists:

$\lim_{n \to \infty} \frac{a_{n+1} - a_n}{b_{n+1} - b_n} = l$

then the limit

$\lim_{n \to \infty} \frac{a_n}{b_n}$

also exists and it is equal to l.
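The theorem is used in Chapter 4 with $b_n = n$, that is, for running averages. A small numerical illustration in Python follows; the helper name `cesaro_ratios` and the particular sequence are hypothetical choices for this example only. Here the increments $a_{n+1} - a_n$ converge to l = 0.25, and the ratios $a_n / b_n$ approach the same limit.

```python
def cesaro_ratios(x, n_max):
    """With a_n = x(1) + ... + x(n) and b_n = n, return the list of a_n / b_n."""
    total = 0.0
    ratios = []
    for n in range(1, n_max + 1):
        total += x(n)
        ratios.append(total / n)
    return ratios


def increment(n):
    """Increments a_n - a_{n-1}; they converge to l = 0.25 as n grows."""
    return 0.25 + 1.0 / n


ratios = cesaro_ratios(increment, 200_000)
print(round(ratios[9], 4), round(ratios[-1], 4))
```

The convergence of the averages is slower than that of the increments themselves, which is visible when comparing early and late entries of `ratios`.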

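Proof. From the definition of convergence, for every $\epsilon > 0$ there is $N(\epsilon) \in \mathbb{N}$ such
that for all $n \ge N(\epsilon)$ we have:

$l - \epsilon < \frac{a_{n+1} - a_n}{b_{n+1} - b_n} < l + \epsilon.$

Because $b_n$ is strictly increasing we can multiply the last relation by $b_{n+1} - b_n$ to
get:

$(l - \epsilon)(b_{n+1} - b_n) < a_{n+1} - a_n < (l + \epsilon)(b_{n+1} - b_n).$

Let $k > N(\epsilon)$ be a natural number. Summing the last relation we get:

$(l - \epsilon) \sum_{i=N(\epsilon)}^{k} (b_{i+1} - b_i) < \sum_{i=N(\epsilon)}^{k} (a_{i+1} - a_i) < (l + \epsilon) \sum_{i=N(\epsilon)}^{k} (b_{i+1} - b_i)$

$(l - \epsilon)(b_{k+1} - b_{N(\epsilon)}) < a_{k+1} - a_{N(\epsilon)} < (l + \epsilon)(b_{k+1} - b_{N(\epsilon)}).$

Divide the last relation by $b_{k+1} > 0$ to get:

$(l - \epsilon)\left(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\right) < \frac{a_{k+1}}{b_{k+1}} - \frac{a_{N(\epsilon)}}{b_{k+1}} < (l + \epsilon)\left(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\right)$

$(l - \epsilon)\left(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\right) + \frac{a_{N(\epsilon)}}{b_{k+1}} < \frac{a_{k+1}}{b_{k+1}} < (l + \epsilon)\left(1 - \frac{b_{N(\epsilon)}}{b_{k+1}}\right) + \frac{a_{N(\epsilon)}}{b_{k+1}}.$

This means that there is some K such that for $k \ge K$ we have:

$l - \epsilon < \frac{a_{k+1}}{b_{k+1}} < l + \epsilon$

since the other terms which were left out converge to 0.

This means that:

$\lim_{n \to \infty} \frac{a_n}{b_n} = l.$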

84

Bibliography

[1] Anderson, T.W. and Taylor, J. (1979). Strong consistency of least squares
estimators in dynamic models, The Annals of Statistics, 7, 484-489.

[2] Atkinson, A.C. (1982). Optimum biased coin designs for sequential clinical trials
with prognostic factors, Biometrika, 69, 61-67.

[3] Blackwell, D. and Hodges, J.L. (1957). Design for the control of selection bias,
Annals of Mathematical Statistics, 28, 446-460.

[4] Chen, K., Hu, I. and Ying, Z. (1999). Strong consistency of maximum
quasi-likelihood estimators in generalized linear models with fixed and adaptive
designs, The Annals of Statistics, 27, 1155-1163.

[5] Clayton, M.K. (1989). Covariate models for Bernoulli bandits, Sequential
Analysis, 8, 406-426.

[6] Cornell, R.G., Landenberger, B.D. and Bartlett, R.H. (1986). Randomized play
the winner clinical trials, Communications in Statistics - Theory and Methods,
15, 159-178.

[7] Dvoretzky, A. (1972). Asymptotic normality for sums of dependent random
variables, Proceedings of the Sixth Berkeley Symposium on Mathematical Statis-
tics and Probability , 2, 513-535.

[8] Efron, B. (1971). Forcing a sequential experiment to be balanced, Biometrika,
58, 403-417.

[9] Geraldes, M.C. (1999). Covariates in adaptive designs for Clinical Trials,
Michigan State University Graduate School (doctoral dissertation).

[10] Ivanova, A., Rosenberger, W., Durham, S. and Flournoy, N. (2000). A birth
and death urn for randomized clinical trials: asymptotic methods, Sankhyā:
The Indian Journal of Statistics B, 62, 104-118.

[11] Jennison, C. and Turnbull, B.W. (2000). Group Sequential Methods with
Applications to Clinical Trials, Chapman and Hall, Boca Raton.

85

 

[12] Lachin, J.M. (1988). Properties of simple randomization clinical trials,
Controlled Clinical Trials, 9, 312-326.

[13] Lai, T.L. and Robbins, H. (1979). Adaptive design and stochastic
approximation, The Annals of Statistics, 7, 1196-1221.

[14] Lai, T.L., Robbins, H. and Wei, C.Z. (1979). Strong consistency of least squares
estimates in multiple regression II, Journal of Multivariate Analysis, 9, 343-361.

[15] Lai, T.L. and Wei, C.Z. (1982). Least squares estimates in stochastic regression
models with applications to identification and control of dynamic systems, The
Annals of Statistics, 10, 154-166.

[16] Lai, T.L. and Robbins, H. (1982). Consistency and asymptotic efficiency of slope
estimates in stochastic approximation schemes, Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete, 56, 329-360.

[17] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models 2nd ed.,
Chapman and Hall, London.

[18] Melﬁ, V. and Page, C. (2000). Estimation after adaptive allocation, Journal of
Statistical Planning and Inference, 87, 353-363.

[19] Melﬁ, V., Page, C. and Geraldes, M. (2001). An adaptive randomized design
with application to estimation, Canadian Journal of Statistics, 29, 107-116.

[20] Pocock, S.J. and Simon, R. (1975). Sequential treatment assignment with
balancing for prognostic factors in the controlled clinical trial, Biometrics, 31,
103-115.

[21] Robbins, H. and Monro, S. (1951). A stochastic approximation method, Annals
of Mathematical Statistics, 22, 400-407.

[22] Rosenberger, W.F. (1999). Randomized play-the-winner clinical trials: review
and recommendations, Controlled Clinical Trials, 20, 328-342.

[23] Rosenberger, W. (2002). Randomized urn models and sequential design, Se-
quential Analysis, 21, 1-41.

[24] Sarkar, J. (1991). One-armed bandit problems with covariates, The Annals of
Statistics, 19, 1978-2002.

[25] Siretchi, G. (1985). Differential and Integral Calculus, vol. 1, Editura Stiintifica
si Enciclopedica, Bucharest.

[26] Smith, R.L. (1984). Sequential treatment allocation using biased coin designs,
Journal of the Royal Statistical Society B, 46, 519-543.

86

 

[27] Tamura, R.N., Faires, D.E., Andersen, J.S. and Heiligenstein, J.H. (1994). A
case study of an adaptive clinical trial in the treatment of out-patients with
depressive disorder, Journal of the American Statistical Association, 89, 768-776.

[28] Wedderburn, R.W.M. (1974). Quasi-likelihood functions, generalized linear
models and the Gauss-Newton method, Biometrika, 61, 439-447.

[29] Wei, L.J. (1978). The adaptive biased coin design for sequential experiments,
The Annals of Statistics, 6, 92-100.

[30] Wei, L.J. and Durham, S. (1978). The randomized play-the-winner rule in medical
trials, Journal of the American Statistical Association, 73, 840-843.

[31] Wu, C.F.J. (1985). Efficient sequential designs with binary data, Journal of
the American Statistical Association, 80, 974-984.

[32] Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric es-
timation for a multi-armed bandit problem with covariates, The Annals of
Statistics, 30, 100-121.

87