ICHIGAN STA N

Ilﬂlllllllillllllll iliﬂiii‘lili‘iﬂiiﬁﬁl

3 1293 01051 7872

 

 

 

 

This is to certify that the

dissertation entitled

Estimation in Interval Censorship Models

presented by

Zhiming Wang

has been accepted towards fulﬁllment
of the requirements for

PhoDo degree in Stat‘iSt'iCS

ACWW

V . /
Major podfessor

Date October 28, 1993

MS U is art Affirmative Action/Equal Opportunity Institution 0-12771

 

 

 

 

LIBRARY
Michigan State
Unlverslty

 

 

 

PLACE It RETURN BOX to remove thte checkout from your record.
TO AVOID FINES return on or before date due.

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

MSU le An Afﬂrmetlve ActlorVEqueI Opportunity lnetltulon
Wilma-9.1

 

ESTIMATION IN INTERVAL CENSORSHIP MODELS
By

Zhiming Wang

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1993

ABSTRACT

ESTIMATION IN INTERVAL CENSORSHIP MODELS

By

Zhiming Wang

Interval-censored data arise in the analysis of survival times where there is only
periodic assessment of subjects for outcomes of interest. We address the problem of
estimating the survival distribution of the time of onset of some biologic event when its
time of occurrence is not observed directly, but is known to have taken place in some
time interval determined by the pattern of the examination times. This event time may

also be right-censored if the event has not occurred at the time of the last assessment.

We introduce a statistical model incorporating the examination times and the
event status at these times that indicates whether or not the outcome has already taken
place. Two cases are considered, one in which the number of assessments that are made
over the period of study is ﬁxed, and in the other where this number is random.

Identiﬁability of the distribution of the event time is proved.

A class of estimators of the survival function is proposed in an interval censorship
model in which there is one examination time. These estimators are shown to satisfy
certain self-consistency equations and this provides a means of their computation using

the EM algorithm. The relationship between this class of estimators and the

ii

nonparametric maximum likelihood estimates is investigated. We establish the strong

consistency of our estimators and derive the weak convergence of certain functionals.

A corresponding Bayesian estimator is constructed under the assumption of a
Dirichlet process prior, and is shown that it does not necessarily converge to the

nonparametric maximum likelihood estimator.

Simulation studies are presented under parametric assumption with two
examination times. Comparisons are made between the nonparametric Tumbull estimate
of the survival function and the maximum likelihood estimate under the parametric

model.

iii

To my wife Liqin and my daughter Cynthia

iv

ACKNOWLEDGMENTS

I thank my advisor Professor Joseph C. Gardiner for his patient guidance and
encouragement during the preparation of this dissertation.

I would also like to thank Professor R.V. Ramamoorthi for an introduction to
some problems studied here and for fruitful discussions.

I extend my sincere thanks to Professor Habib Salehi and Shlomo Levental for
having served on my guidance committee and for their comments.

Finally I would also wish to thank Mr. Alexander White for helping me to

improve the language of this thesis.

TABLE OF CONTENTS

Chapter Page

1. INTRODUCTION AND SUIVIMARY .............................................. 1

2. IDENTIFIABILITY ................................................................................. 6
2.1 Interval Censorship Model ................................................................. 6
2.2 Modell ............................................................................................... 8
2.3 Model 11 .............................................................................................. 12

3. STRONG CONSISTENCY AND WEAK CONVERGENCE 16

3.1 Introduction ........................................................................................ 16
3.2 Self-Consistency ................................................................................ 18
3.3 Strong Consistency ............................................................................. 22
3.4 Weak Convergence ............................................................................ 27
4. BAYESIAN ESTIMATION ................................................................. 34
4.1 Introduction ........................................................................................ 34
4.2 Right Censoring .................................................................................. 35
4.3 Interval Censoring .............................................................................. 38
5. SIMULATIONS .......................................................................................... 43
APPENDIX ....................................................................................................... 49
REFERENCES ................................................................................................ 54

vi

Chapter 1
INTRODUCTION AND SUMMARY

The past two decades have witnessed the rapid development of statistical
methodology for the analysis of survival times from censored data. These data arise in
several clinical, biomedical and epidemiological studies where outcomes of interest are
response times, for example, time to tumor appearance, time to relapse, or time to death
or failure. These endpoints, however, may not be observed in all subjects for various
reasons. Subjects in the study may be lost to follow-up due to withdrawal or to the
occurrence of an end point unrelated to the outcome of interest in the study. These data
are turned right censored. The typical situation where right censorship occurs is when
subjects enter a study at different (random) times and are followed until a speciﬁc
endpoint is observed. The time of its occurrence T, measured from entry, will be right
censored if by the time of termination of the study the event of interest has not taken

place.

The situation where the event/outcome of interest may be subject to left or right
censoring is generally referred to as double censoring. Leiderman (1973) describes a
study of infant precocity in a group of Kenyan children in which subjects entered the
study at different times and were tested periodically to ascertain whether they had
acquired certain skills. The time of onset T of the development of these skills was of
interest. We note that T is left censored if a child at the time of entry into the study has
attained the desired level of development, and right censored if, by the last available

assessment time, the child has not reached the goal.

2

Interval censorship arises where continuous monitoring of outcomes of interest is
impractical, and assessments of the study subjects can be conducted only periodically.
The precise time T at which the outcome occurred is not observed, but is known to have
taken place within a speciﬁed time interval determined by the sequence of observation
times {Wk : k z 1}. What is known is that the event occurred in some time interval
(Wk, WM]. Therefore, the event time T is said to be mm. For example,
Peckham (1991) reported for the European Collaborative Study regarding the
development of AIDS in 600 children born to mothers who tested positive for the human
immune deﬁciency virus-1. These children were examined at birth and at 3 months
intervals until 18 months of age, and again every 6 months thereaﬁer until 4 years of age.
Thus changes in their clinical status were not observed directly. They were seen only at
the age at which disease was ﬁrst detected and the age at the previous assessment and

hence the time of onset of infection or disease is interval censored.

In each of the cases of censorship described here, a basic problem is the
estimation of the distribution of T from a sample of observations under nonparametric
circumstances. For right censored data, Kaplan & Meier (1958) introduced the product
limit (PL) estimator, which has become the cornerstone for almost all analysis of right
censored survival data. The properties of the PL estimator, its asymptotic theory and the
statistical inferential procedures based on it, have been extensively investigated by
several researchers. Turnbull (1974) introduced an algorithm for estimating survival
curves from double censored data, and Chang & Yang (1987) described a statistical
model under which identiﬁability of the distribution of T and the strong consistency of an
estimator of it were established. The weak convergence of the estimator was proved by
Chang (1990). Samuelson (1989) describes another approach to the problem of estimation

using a formulation based on counting processes.

3

Tumbull (1976), improving on a method of Peto (1973), developed an algorithm
for computing an estimate of the survival function ﬁ‘om interval censored observations.
His method of estimation is based on maximum likelihood considerations and yields a

system of equations (self-consistent equations) that may be solved using the EM

algorithm.

An investigation of the properties of the Turnbull estimator was made very
recently by Groeneboom and Wellner (1992), who explored a connection in the
construction of the nonparametric maximum likelihood estimator (NPMLE) and isotonic
regression. They established the strong consistency and weak convergence of the NPMLE

in two models of interval censorship.

The focus of this thesis is on three issues in interval censorship: Identiﬁability of
the distribution of T in a general model of interval censorship, Consistency of a sequence

of estimators of the distribution of T and a Bayesian approach to this estimation.

Chapter 2 introduces a general model of interval censorship that incorporates the
entire pattern of potential examination times over the duration of a study. We consider
two designs. The ﬁrst permits only a ﬁxed number of inspections of the units, while the
other allows the number of assessments made during the follow-up period to be random.
Within a nonparametric formulation and under some mild conditions, we prove the
identiﬁability of the distribution of the event time and describe its relationship to other

interval censorship models.

Chapter 3 studies the strong consistency and weak convergence of the estimate of
survival ﬁrnction. The estimator is deﬁned implicitly through two equations ((3.3) and

(3.4)) and we demonstrate that their solution is equivalent to the solution of the self-

4

consistency equation. Turnbull (1976) has suggested that a self-consistent estimator is a
maximum likelihood estimator (MLE). However, we have found self-consistent
estimators that depend on the initial mass assigned in the EM-algorithm and do not lead
to maximum likelihood estimators. An example is given in section 3.2 showing we have
different self-consistent estimates derived from one set of interval censored data. Strong
consistency is proved for the class of self-consistent estimators under certain conditions
which are also satisﬁed by the MLE. Therefore the MLE is also strongly consistent under

these conditions.

In Section 3.4, we study the weak convergence of our estimator. Let S(t) be the
survival function of T and S‘"’(t) be a self-consistent estimator of S(t). For the double

censorship model, Chang (1990) shows that JE(S(")(t)—S(t)) converges weakly to a

Gaussian process. However, in our interval censoring model, we do not achieve such a
property. Instead, we show that J5]: (S‘"’(s) -S(s))ds converges weakly to a normal

distribution.

In Chapter 4 we construct a Bayesian estimator of S(t) under the assumption of a

Dirichlet process prior. For right censored data, Susarla & van Ryzin (1976)
. demonstrated a nonparametric Bayesian solution to estimation under squared error loss.
The resulting Bayesian estimator was shown to reduce to the PL estimator in the cases
where or(R+)—>O, where a( - ) is a ﬁnite measure on R‘“ which serves as a parameter of the
Dirichlet process prior. By following the process of construction of the Bayes estimate of
S(t) for right censored data, we ﬁnd that, for right censored data, the estimator satisﬁes
certain self-consistency equations. Several relationships between the Kaplan-Meier PL

estimator and Bayes analog are obtained and an expression for this estimate under

5

interval censorship is presented in section 4.3. It turns out that this Bayes solution does

not necessarily reduce to the MLE as a(R+) —>O.

Examples simulation studies are presented in Chapter 5. The underlying survival
time T is assumed exponential and the inspection times are Gamma-distributed. We
compare the Turnbull estimate of the survival function with that of the MLE obtained
under the parametric assumption. For the former we employ our APL programs to carry
out the task of solution of the self-consistency equations. The SAS LIFEREG procedure

is used for the parametric model.

Chapter 2
IDENTIFIABILITY

2.1 Interval Censorship Model

Let T denote the time of occurrence of the event or response of interest, and let
{Wk : k 2 1} be the increasingly ordered sequence of potential observation times, with all
variables being measured from a common time origin. At the k-th examination time W,,
the event status is designated by the indicator 5k= [ T s W, ], that is 6,, = 1 if the response

has occurred by time W, and 5,: 0 otherwise. We set W0 = 0 and for each I, deﬁne

N(t)=min{k21: 8k=l,W,St}, (2.1)

if this minimum exists and set M!) = +00 otherwise. Also M(t) = max { k 2 l: W, s t } is
the last examination at or before I, which takes place at time WM (see Figure 2.1). Since
throughout this paper t is held ﬁxed, we shall not exhibit the explicit dependence of the

variables on t.

Under the assmnption that W1 5 t almost surely, M(t) is ﬁnite, also N in (2.1)

when ﬁnite, marks the ﬁrst examination at which a positive diagnosis of the event

6

7

occurring at the unobserved time T is made by time t. Then L = WM1 and R = W N, deﬁne
respectively the time of the last negative assessment and the time of the ﬁrst positive
assessment. In this case we have T e ( L , R]. On the other hand if N = +00, we only have
the knowledge that T > W M, which makes T right censored. Therefore the information
available on T is that Te (WN_,,WN] when N is ﬁnite, and for N = 00, T > WM. Deﬁne
We, = WM. The problem addressed in this chapter is the identiﬁability of the distribution of
T on the basis of the datum (WN_,,WN,N,M). Speciﬁcally, if N' and M are deﬁned
similarly for the nonnegative variable T ' and the positive increasing sequence {W k': kal}
then we wish to show that {T, W, , k =1, 2, - - - } and {T', Wk' , k =1, 2, - - ~ } have the
same distribution on [0, t], when the distributions of (WN_,,WN ,N, M) and

(W,;,._,,W,;,.,N', M') are the same.

Turnbull (1976) described a method of maximum likelihood estimation of the
distribution of T based on data of the type T e (L , R], with R possibly inﬁnite when T is
right censored. Groeneboom and Wellner (1992) have developed estimation methods for
two interval censoring models. In the simplest of these, called Case 1, there is a single
examination time X, and the observable aspect of the model is the pair (X , 6 ) where
8=[TSXJ. This situation is obtained by taking X = W,, for any examination time Wk. In the
second case, called Case 2, there are two examination times X and Y, with X < Y, with
the observable datum being (X, Y,y ,y '), wherey = [TSXJ and y ' = [T e( X, Y]]. In
the context of periodic assessments Case 2 is also too wide, since the times X and Y can
be obtained from the examination history in several ways. For instance, from any two
examination times W,, W]. with W, < W}, we get Case 2 by deﬁning X = W, and Y = W}. Our
model, on the other hand, focuses on the entire examination pattern and deﬁnes the

variables that are relevant.

8

For our identiﬁability problem, we consider two situations. In both the times of
inspection are random. In the ﬁrst, a ﬁxed number of inspections is made in each subject,
that is M is a constant. We refer to this case as 'Model I'. In the second situation, the
follow-up period t is ﬁxed and then the number of inspections M over the period [0, t] is a

random number. We refer to this case as 'Model 11'.

Let S, (x) = P (T > x) denote the survival distribution of T, and V, = Wk- W,,_1 , k 21.
The basic assumptions that we make to establish our results are the following:
(A1) Tis independent of{ W,; k =1, 2, - - - }.
(A2) {Vk : k 2 1} are independent and nonnegative, with the density g, of V, being
continuous and positive on (0, 00).

(A3) 0 < S,(x) < 1 for x e (0, co), and ST is continuous.

The conditions (A1) and (A3) were used by Chang and Yang (1987) to prove
identiﬁability in a model for double censorship. Identiﬁability of the distribution of T
fails when the inspection times are degenerate; that is g, = O a.s.. See Remark (2) at the
end of the proofs for an example. Therefore in this nonparametric framework we will not
have identiﬁability if the inspections are made at ﬁxed times. Only the probabilities
assigned by the distribution of T to the ﬁxed intervals (WM, Wk] can be identiﬁed.

2.2 Model 1:

Random Inspection with Fixed Number of Inspections During Follow-up

Since M is assumed here to be ﬁxed, let M = m (21), where m is an integer. Also

N is the ﬁrst integer k for which T S W, or N = 00 if no such k exists. We ﬁrst deﬁne

9

several sub-distributions that will be used in our proofs. All arguments x, x,, x2 6 R”r =
[0, 00)-

Let
Ql (x)=P(Wl Sx ,N=l),
Qa,(x) =P(Wm Sx , N=oo),
and forkz 2,

Q.(xl ,xz)=P(W.., Sx. , W,.Sx2 ,N=k).

From our assumptions, Wk will have a density f, which is continuous and positive on
[0, 00). In fact f, = g, and f, (x) = [I'm (u) g, (x — u)du, k 2 2. Then Q,, Q, and Q, for 122

can be expressed as follows:

Q. (x) =Jo’{1-Sr(u)}f.(u)du, (2.2)
Q. (x) = Jam/mm (2.3)
on. ,x.) =1," If {5.6) — Sr(v)}f.-1 cog. (v - u)dvdu, k 2 2. (2.4)

Our ﬁrst theorem shows that T is identiﬁed from the datum ( Wm , WN, N ).

Theorem 2.1
Under assumptions (A1)-(A3), ST is uniquely determined by Q,, Q00, and Q, (k =
2, . . ., m).

Proof: For the random variables T ', W,’ , k 21, we deﬁne V,’ = Wk' - WH' ( Wo' = 0 ) and
N ', M ' as in (2.1). Let S,(x) = P (T' > x) and ﬂ, g,’ be the densities of Wk', V,‘
respectively. Assume the conditions similar to (A1) - (A3), and suppose they produce the

same sub-distributions Ql , Q”, and Q, (k = 2, - - -, m) as in (2.2) - (2.4). We get

10

{1 - 510)”? (I) = {1 - 51(x)}f.'(X), (2-5)

5106)]; (x) = 5106)}? (X), (2-6)

{57051) ' Si(x2)}fk-1 (x1) gk(x2 ' x1) = {Sr(x1) ' S1(x2)}fk.r'(xr) gk'(x2 ' x1), (2-7)

where k = 2, 3, ~ - -, m. We will prove S, a S,.

If m = 1, we only have (2.5) and (2.6), and together these yield f, (x) =f,‘ (x). Hence S, =

S, since f, is assumed positive.

For m 2 2, rewrite (2.5) - (2.7) in the form

{1 - 3700} {f1 ()6) -f1'(X)} = {5206) - 51(X)}f1'(x), (2.3)

SKX)%' (x)-f..(X)} = {5706) - S1(x)}f..'(x), (29)

{(ST -S,)(x,) ' (str)(x2)}ﬁ.r (x1) gr(x2 ‘ x,)
= {Sr(x|) ' S1052» {fr-1.051) gr'(x2 ' x1) ’fk-r(x1)gk(x2 ' 351)}, (2-10)

where k= 2, 3, - - -, m.

Let h = ST - S,. Then if h at 0 we must have one of the following two cases:
Case ( I ): h(x) 2 O (or symmetrically h(x) S O), x eR+.
Case ( II ): There exist x,, x2 eR”, such that h(x,) > O and h(xz) < 0.

Now we prove both cases will lead to a contradiction.

11

In Case ( I ), since h is continuous and h(O) = 0, there exists to eR+ such that h(x) 2 O for
xe [0, to], and h(x) < h(to) for x e [0, to). From (2.10) with k = m,
flu-1'05) gm'(t0 " x) (fin-1(x) 8.110 ' x): x E [09 to).

So. fro.) = I,“ f -.'(x)g.'(t. —- x)dx

< I: fin—I (x)gm (t0 — x)dx

=f...(to )- (2.11)
Since S,(to) - S, (to) > 0, we get from (2.9)
123(4)) >fm(to ) (2.12)

Hence (2.11) and (2.12) are contrary to each other. Symmetrically if h S 0, the proof is

the same. Therefore Case ( I ) cannot hold.

In Case ( II ), since h is continuous and h(0) = 0, there exist t0 , t, with 0 S t0 < t,, such
that h(to) = O, h(t,) = O, and h > O on ( t0 , t, ) (or h < O on (to, t,), in which case the proof
is the same). Let us assume to >0, for otherwise we would essentially have Case ( I )
again.
By (2.8) and (A3) we have

h(to) =f1'(to), f1(t.) =f1'(t.), and/'10:) >fl' (X) for 15600 , 10- (2.13)
Taking x, = to and x2 6(1‘0 , 1,) in (2.10) with k = 2, we obtain

fr'(xl) g2'(x2 - xi) <ﬂ (x.) g2 (x2 - x1),
and in view of (2.13), also

g2'(x) < g2 (x), for xe(0, t, - 0). (2.14)
Taking x, e( to, I, )and x2 = t, in (2.10) with k = 2, and in view of (2.13), we obtain

f1'(xr) gz'(x2 - 16,) >f.(x1) g2(x2 - x1)

>f1'(x1) 82062 - x1)-

So, g2'(x) > g2 (x), for xe(0, t, - to). (2.15)
Hence (2.14) and (2.15) are contrary to each other. Therefore Case ( II ) cannot hold.

12

Corollary 2.1
The identiﬁability above is total identiﬁability, that is { f, , g, , k = l, 2, - - - , m }
are also identiﬁed by Q, , Q 2 , - - . , QM, Q00.

Proof: We need to proveﬁs f,’ , g,-=- g,', k=1, 2, - - - , m.
We have already proved S,(x) = S, (x) for x eR+. By (2.5) and (A3) we get
g, =f, =f,’ =g,’ on R+. In (2.7) with k=2, take x, = 0, x2 =x eR“, then
{1 - 510an (0) g; (I) = {1 - S,(X)}f1'(0) 82' (X),
and it follows that g2(x) = g2'(x). From f2 (x) = ff, (u) g2 (x - u)du, we obtain f2 (x)=f2' (x).

By the same arguments and inductively we can establish 1; (x) = ﬁ'(x) and g, (x) = g,'(x)

forlSkSm.

2.3 Model II:

Random Inspection with Fixed Follow-up Period

In this case the number of inspections M made over the observational period [0, t]
is not ﬁxed in advance. Then M becomes a random integer that is the total number of
inspections made in this period. Note that N takes on values 1, 2, ~ - -, M and +00. Let

Q (x, ,x2 , n , m) = P(WN_, Sx,,W,,, S x2,N = n, M: m), (2.16)
where OSx,,x2 S t,1 S n S m andm _>. 1; and

Qw(x,m)=P(Wme,N=oo,M=m), (2.17)
where 0 S x S t and m 2 1. The following theorem shows that T is identiﬁed from the

datum (Wm, WN, N, M).

13

Theorem 2.2
Under assumptions (A1) - (A3), on [0, t], S, is uniquely determined by the sub-
distributions of (2. 16) and (2.17).

Proof: Let

Q: (x)=P(W,Sx,N= 1):

= ZPU‘Vn—r SxraWN Sx,N=l,M=M),

m-l

= ZQ(x,,x,1,M),
nt=l

and

Q2 (x,,x2)=P(W, Sx, , WZSx2,N=2),

= iQ(x,,x2,2,m),

m=1
where x, x, , x2 6 [0, t].

By the same argument leading to (2.2)-(2.4), Q, and Q2 can be expressed as follows:
Q, (x)=],’{1-Sr(u)}ﬁ(u)du. (2.18)

Q2(x1 ,x. > =1," I,” {5.6) — Sr(v)}f1(u)g2(v - u>dvdu. (2.19)
wherex,x, ,x,e[0,t].
For xe [0, t], let (72(x) = P(V2 > x) = fg,(u)du, then
Qw(x, l )=P(W, Sx, T2 W,,M= 1),
=P(W, Sx, T2 W, , W, + V2>t),

= L’P(T>u,V, >t—u)f,(u)du,

= 10’s,(u)f,(u)c_;,(t —u)du. (2.20)

We will prove ST is uniquely determined by Q, , Q2 and Q0, .

14

As in the proof of theorem 2.1, for the random variables T ', W,’ , k 21, we deﬁne
V,'= W,‘ - W,_,'( W,'=O)andN',M'asin(2.1). LetS, (x)=P( T'>x)andf,',g,’ bethe
density of W,', V,’ respectively. Assume the conditions similar to (A1) - (A3), and

suppose they produce the same sub-distributions Q and Q, as in (2.16) and (2.17). From
(2.18)-(2.20) we get

{1 ' 51(x)}f1 (x) = {1 ' S,(x)}f,'(x), (2-21)
{51(x1) ‘ S7(x2)}f1 (x1) 32 (x2 ' x1) = {Sl(xl) ‘ Sr(x2)}fr'(xr) 32.052 " x1), (2-22)
S7001: (xx—ix t - x ) = Slam '(x)C—32'(t - x), (2.23)

where x, x, ,x, e[o, t], and (72'(x) = P(V2'> x) = fg;(u)du.
Since (72 and (72' are survival distributions, taking x = t in (2.23) yields

$10M (t)=S.(t)f.'(t)- (2.24)
Together with (2.21), this gives f, (t) = f,'(t), and since f, and f,' are positive, we also get
S,(t) = S, (t). Hence ST and S, agree at zero and t. Suppose S, :t S, on [0, t]. There exist t0
and t, with 0 S t0 < t, S t such that S,(to) = S,(t,), S,(t,) = S,(t,), and S,. > S, on (to, t,) (or
S, < S, on (to, t,)). The rest of the proof follows the same arguments as in Case ( II ) of

section 2.3 in order to obtain a contradiction. Therefore ST E S, on [0, t].

Corollary 2.2
The densities {j}, g,; k 21} are also identified by the sub-distributions Q and Q00.

Proof: The proof is the same as corollary 2.1, note we have the sub-distributions:

15

Qk(xl ,x2)=P( Wit-13x19 Wkst’Nt=k)
=Q(x,,x2,k,k)+Q(x,,x2,k,k+1)+ ' ' '
=P( W,_,_<_x, , W,,Sx2, W,_,<T<W,)

=1." l.” {SM - 52mm, (20g. (v - u)dvdu,

where x, , x2 6 [0, t], k 2 2.

Remark:

(1). Theorem 2.1 and 2.2 will continue to hold if the assumption of continuity of the
densities g, in (A2) is replaced by either right or left continuity of the g,

(2). If g, = 0 on some interval, then the identiﬁability fails. For example, in model I with
a simple inspection (m = I), suppose f, = g, = 0 on (1, 2) and S, and S, coincide outside
(1, 2), but differ on (1, 2). Then S, and S, will produce the same sub-distributions Q, and
Q0 in (2.2) and (2.3). A similar example for the two-inspection case (m = 2) is given by

Chang & Yang (1987).

Chapter 3
STRONG CONSISTENCY AND WEAK CONVERGENCE
3.1 Introduction

In this chapter, we study the Case 1 interval censorship model, which is a special
case of our Model I discussed in Section 2.2 with m=1. Let T be the failure time on R“ =
[0, 00) with survival function S(t) = P( T > t ), Y be the inspection time on R+ = [0, 00)
with survival function S,(t) = P( Y > t ), and 5=[TS Y]. We observe n independent and
identically distributed copies of (Y, 8), {(Y, ,8, ): i=1, 2, n }.

Assume the following conditions to hold:
(B1) T and Y are independent.
(BZ) S, is continuous, and S,(s) - S,(t) > O, for V O S s < t < 00.

(B3) S is continuous, and S(s) - S(t) > O, for V 0 S s < t < 00.

Deﬁne
W, (t)=P(Y >t, 6 =1),
W. (t)=P(Y», 6 =0). teIO, co),

which can be written in terms of the survival functions as
M(t)=-]i1—S<s))dsy(s), (3.1)
M(t)=-f39(s)dsy(s). (3.2)

Deﬁne the empirical survival distribution of the inspections Y ,, . 1 -, Y

16

17

§§"’(t)=l:[x >t],

hi

the left censoring process

Nimitz St, 8.-=1],

1'21

and the right censoring process

main: St, 6.=01.

i=1
Let WI‘"’<r)=1(NL(oo)—N.<t» and Wz‘"’(t)=1(N,<oo)—N,(t)). Ni and NR are
n n
counting processes. Note that NL(oo) and NR(oo) are ﬁnite, and NL(oo)+NL(oo)=n. We
. co no I 9- _
denote the mtegrals L1,”) by L, LCD) by L , and Jionby I0. Deﬁne 0 — 0. Let it be the

Lebesgue measure on [0, 00). A function on [0, co) is said to be a sub-survival function if
it is nonnegative non-increasing, and right continuous. It is called a survival function if

additionally its values at 0 and 00 are respectively, 1 and 0.

We construct the estimators of S and S,., through the following two equations (3.3)
and (3.4) (see Remarks below).

n no 1_ SM) (t) n w n n
W,‘ )(’)+lr+1——_.9de( )(s)=LS§)(s-)dS( )(s) , (3.3)
WW)=-]fS‘"’<s)dS;"’(s), (3.4)

with 5"" (O) = S,(") (O) = 1, t e [0, 00).
In the next section, we will show that a solution 5"” exists and that it is a self-consistent

estimator of S.

 

18

Remark:
(1) An obvious approach to obtaining estimators of S and S, is to replace all functions
in (3.1) and (3.2) by their estimators, and search for a solution to the equations. For this

approach, we would solve for S1") and S,‘"’ from the equations:

W."°(t> = -],f(1- s‘"’(s))dS;"’(s> ,

W (t) = -J: S“) (new (s).
This approach can be applied to right censorship and double censorship. Chang and Yang
(1987) utilize this method in the latter case. However, in our case of interval censoring, if
we solve for S‘") and S,‘"’ from these equations, then S,.“) will be the empirical of S, and
S‘") will not be a survival function. The modiﬁcation made in (3.3) leads to the

appropriate solution of 8"" as a survival function.
(2) The solution S,"" from (3.3) and (3.4) is not the empirical S)”. Later in Section

3.3 we give its relationship to the empirical Sf”.
3.2 Self-Consistency
Given { (Y, ,6, ): i=1,2,---,n } = 11/", computation of the conditional expectation

F(t/\)u)[ F(t)—F(t/\u)
E(— ;§[T>t] I Mn)gives H{_ [x_u]+ 1—F(u)

 

[x > u]}dP,, (x,u), where

P,,(x, u) = 12m S x, Y, S u], F = l-S. A self-consistent estimator S‘") of S is a solution of

i=1
the equation (self-consistency equation)

50%): E(— 3121754 I 2,),

"i=1

where the right-hand side is evaluated at S"). In this section we show the equivalence

between the self-consistent estimator and the solution to the equations (3.3) and (3 .4).

19

Theorem 3.1

Given the observations { (Y, ,5, ): i =1,2,--°,n }, if S ('0 is a self-consistent estimator
of S , then there exists a survival ﬁtnction SI"), such that S 0') and SI") are the solution of
the equations (3.3) and (3.4). Conversely if S 0" and S,(f') are the solution of the equations

(3.3) and (3.4), then S 0') is a self-consistent estimator of S.
Proof: Suppose S 1") is a self-consistent estimator of S, then
1—=s‘"’(t) ES,,{E,(t) ”1,... , 1;; 5,,---,5,,}, (3.5)
where F,,(t) = 12”; S t] is the empirical of F = 1-S(t), and E W is the expectation
n

i=1

under the assumption that T, have the survival distribution S 0" for any i. So
1" 300“)" — 12E 8"" II]; St” X’si}:

1- S(")(t A 1;) + 50%: A 1;) — S(")(t)
1-50009 " SW01)

 

 

__1_ " _
*n§{ (1 51)},

1— S‘"’ (t)
-sS‘"’( )

,S(n) ( S) _S(n) ( I)
We )

Recall the deﬁnition of W,"° and W2”), and that N L(oo) + N R(°°) = n. We get

=—[NL()+ +I,""1 —dNL(s s)+ +I

 

dN,(s)]. (3.6)

8%) = W."”<r)+ we‘"’(r)+I”——“S(")(’) dm‘"><s)— I—Sm“) new). (3.7)

'+1— S‘")(s) 0 S(")(s)
Deﬁne the estimator of S, by
(n) (n)
s (t)- _ 1+ I0 WW (S), ta [0, oo). (3.8)

The deﬁnition of Sf,” is valid, since S (”)(t) 2 W200 (t) by (3.7) , and %= 0. Also, SI") is a
right continuous non-increasing function on [0, 00) with Sf."’(0) = 1.

Differentiating (3.8) yields

20

I
S 9') (s)

 

dS§")(s) = dW2(")(s), or dW2(")(s) = S ('0 (s)dS§")(s). (3.9)

By integration, this leads to (3.4).
Let Y,,,” > Y,,, be an arbitrary point to put on the remaining mass of 5‘") and SI"). Then
.S‘"’(00)=S§")(00) = O. This does not affect the self-consistency of 5‘"). So (3.7) becomes

an no 1- S‘"’ (t) .
(n) _ (n) (n) (n) (n) (u) (n)
S (t) — W. m— LS (ads, <s>+ Lam—”4W1 <s>-S (oIodS, (s),
with S§"’(O) = 1. Using the fact that
d(UV) = VdU + U_dV (3.10)

for discontinuous functions U and V, this becomes

1—S‘"’(t)
1-S(")(s)

which is (3.3). Hence S1") satisﬁes equations (3.3) and (3.4).

W.‘"’<r)+ I: dW.""(s)=-I:S;"’(s—)ds‘"’<s), (3.11)

Conversely, the entire argument above can be reversed. Suppose S "0 and SI") are
the solutions of (3.3) and (3.4) with S‘"’(O) = S§")(O) = 1 and S‘"’(oo) = SI")(oo) = 0. Then we
will get (3.8) from (3.4), and substituting it in (3.3) will lead to (3.7), which is the self-

consistency equation (3.5).

The estimators S”) and SI") deﬁned through (3.3) and (3.4) is speciﬁed only at the
observed points {Y,z lSiSn}. We extended 5"" and SI") to R+ by making them right
continuous step functions, which jump only at Y,'s. Unlike the right censoring and double
censoring cases, even if we just concern ourselves with its values at the observed points,
the self-consistent estimator from interval censored data is not unique. Both Turnbull
(1976) and Groeneboom (1992) obtain a MLE by maximizing

H{F(I§ )}5" {1—F(I§ )}"5'. They show the estimator to be self-consistent and unique.

i=1

21

The following example shows that there exists a self-consistent estimator which is not the

MLE.

Example 3.1

The following data, (1, 1), (2, 0), (3, 1), (4, 1), and (5, 0), n=5, with corresponding
intervals (0, 1], (2, co), (0, 3], (0, 4] and (5, 00) were analyzed using the APL program to
solve the self-consistency equations. Two different estimators shown below were
obtained (assume Y,,,,= 6 is the point to put on the remaining mass). It can be shown that
both Sf") (Figure 3.1) and S15") (Figure 3.2) are solutions to (3.3) and (3.4) and that Sf") is

the MLE. Hence there exists self-consistent estimator, namely 52‘"), which is not MLE.

S."°(t) = 1, taro, 1); S§"’(t) = 1, r em. 1);
= 1/2, t EU, 3); = 2/5, t e[1, 6);
= 1/3, t GB, 6); = O, t e[6, 00).

= O, t e[6, 00);

1/2- '
1/3 -

 

 

Figure 3.1

22

2/5 _

 

 

Figure 3.2

3.3 Strong Consistency

We already deﬁned S ,S,,W, ,W2 ,S‘") ,S}"’,W,"" ,W2("’ ,and SI") . They are survival
or sub-survival functions on [0, co), and their relationships are given in (3.1) - (3.4).

A self-consistent estimator 5"" is a non-increasing right continuous step function.
Let J, = { t,: k = l, 2, - - -, m} be the collection ofjump points ofS"). Let 0 = t0 < t,<, - - ~,

<tm< oo.Then{t,,: k=1,2,~~,m}c:{Y,: i=l,2,-~,n}.Deﬁne

(n) _ (n) ”I‘SWU) (n)

V, (o-W. (’)+Ir+1—S(">(s)dW‘ (s). (3.12)
Then (3.3) becomes

We)=-I,°fS;"’(s—)ds‘"’(s). (3.13)

Since W,<"’(t) + W,<'°(t) = Em), and using (3.9), we get
Mme) = d§§"’(t) - dWZ‘Wt),
= We) - S‘"’(t)dS§"’(t),
= mm) — St’u» + (1 — S‘"’(r»dS§"’(t). (3.14)
Differentiating both sides of (3.12) and using (3.10) yields

dW,"”(s) _ 1- S(")(t)

dV,(n)(t) ____ (In/‘0!) (l) _dS(")(l)fo l— S(n)(s) 1_ S(n)(t)

dW."°(t).

 

23

 

 

 

= —dS(")(t)_[wl—df/(—g:;)%)-)u
= _ds<">(r)I,°° “3:2,; 295:0» + ds‘"’<r>I,°° ——4: : EL: 8 St") (s),
= —dS‘"’ (t) I °° “5,352,; 2953(3)) + dS‘") (as?) (t—). (3.15)
Now using dV,"” (t) = s;”’(t—)ds("’(t) by (3.13), we get
ds<">(z)I°° d<§f;)£2,;f:;;(‘)) = o , t2 0. (3.16)

Since { 1,: k = 1, 2, - - -, m} are the points ofjump of S‘") , so dS‘"’(t,) at 0 and (3.16)

implies

,wd(§§"’(s)-S§"’(s)) :0 ,=, 2 ,

 

 

. (1—s<">(s)) --,m. (3.17)
Hence
mm) -S§"’(s» _ 0
1..-...) (l-S‘"’(s>> - '

Since 8"" is constant on [t,_,, t,),

 

1 "(n) (n) _ . . .
,_ S(")(tk_1)[,k:[,td)(sy (S) — Sy (S)) — O , whrch implres
I,”d(-§1"’<s>—S§"’<s>) =0. (3,18)
Therefore
SI")(t,-)=S,(,")(t,—), k=1,2,..., m. (3.19)

That is the left limits of SI") and SI") agree at every point of jump of 5"” .

To obtain the strong consistency of a sequence of self-consistent estimators S"),

we need to assign some condition on J, which is the points of jump of 5‘"). Deﬁne

24

A,,= { a): 3N,suchthatforanyn>N,J,,(co)n(t-e,t+e)¢ E}.
Condition C: P(A, , ) = l for any t> 0 and every 8 > 0.

The above condition also means that almost surely for any I > O and a > 0, there exists
N = N(t, e) such that for any n > N, J,(co)n( t - e , t + 8 )¢ E. Namely, almost surely, in

any neighborhood of t, we can always ﬁnd points of jump of 8‘") for n large enough.

Theorem 3.2

Under Condition C, S“) uniformly converges to S almost surely. That is

P( lim 1301p) lS‘")(t)—S(t)l = 0 ) = 1.

Since {S(")(t)}and {SW (t)} are series of survival functions, by Helly's theorem,
there exists a sub-sequence {Sm (t), S?" (t)} and sub-survival functions S°(t) and
S,’ (t) , such that 5"” —> So at continuous points of 5°, and SI”) —) S3 at continuous
points of $3. With probability 1, W,W(t) —> W,(t), W,(")(t) —> W2 (t), and SIM) —> S,(t)
uniformly for t e[0, 00), since W, , W2 and S, are continuous. Hence without loss of
generality, we may assume uniform convergence on the whole space (2. Further since we
will show every sub-sequence has the same limit, we will also assume {n’} = {n}. To

prove the theorem, we need a few lemmas.

Lemma 3.1

S: (t) is continuous on [0, 00).

Proof: From (3.8), S§"’(t) = 1+ J: WMMKSLI E [0, 00)-
s

For continuity points of S3 (t) 0S s, <s2 , we have

25

—1—dW2‘"’(s),

_ (n) _ (n) =_
(Sy (32) SY (51)) J. S(")(S)

(51-52]

 

< -(W;"’ (s2) - W1” (s: D

_ S(n) (82) ’

5 _(W2(") (32 ) _ ”120001 D , (3'20)
"/2000,”

 

where the last inequality follows from S‘"’(s2) 2 W,(")(sz) by (3.7). Letting n —> 00

”(W2 (52) - Wz(s1» .

'(Sr (SD—Sr (31)) 5 ”3(32)

 

(3.21)

Hence S,0 is continuous on [0, 00) by the continuity of W,. So, SI") (t) —-> S,’ (t) uniformly

for ta [0, co).

Lemma 3.2
W,(t) = -I°° s°(s)ds;’(s). (3.22)
Proof: Let t be a continuity point of So, by (3.4),
We) = - I f S‘"’(s)dsi"’(s),
= —I:d[s“’(s)si"’(s)1+Iﬁsitts-Msws) .
= S‘"’ (1)5)” (t) + f 5;") (s-)dS‘") (s) .
Lemma 3.1 implies SI") —> S,’ uniformly on [0, 00). Hence
"5‘"’<t>—"*'i+s°(t)S3(r> + I":S;’(s)ds°(s>,
= —I”s°(s)ds3(s).
Since the continuity points of S0 are dense in [0, co) and W2 is continuous, therefore

W20) = - I °° S°(s>d83(s). new, no) .

26

Lemma 3.3
S3") converges to S, that is S3 2 S, .

Proof: For te[0, 00), let s, e J, such that

 

s, -t| = min{|t, —t

 

;t, 6 J,}. Then under
Condition C, s, —-) t . Remember S3")(s, -) = S3"’(s, -), so

Si") (t) " Sy(t)=[51(r") (l) ' Si") (S,. ')l + [Sim (Sn ") - Sy(S,. ‘)I + [Sr (Sn —) " Sy(t)] ,

55+5+5
,, —) 0 , by S3") —) S3 uniformly and the continuity of S3 .

1,, —-> 0 , by S3") -) S, uniformly on [0, oo)

1,, —> O , by the continuity of S, .
So, S3")(t) —> S,(t) for te [0, so).

Hence S3 5 S, on [0, 00), since both S3 and S, are continuous.

Proof of theorem 3.2:

By (3.2), Lemma 3.2 and Lemma 3.3, we have W,(z)=—I,°fs(s)ds,(s) and

W,(t) = — I °° S°(s)dS,(s), te[0, 00). That is I”(S°(s)—S(s))d S,(s) = o, te[0, 00). So 5"

= S as. [I]. But S ° is right continuous and S is continuous, so S 0 a S on [0, 00). That is

S‘") -—) S , the continuity of S makes the convergence uniform. Therefore

. (n) — = :
P( 11m .ZEEJS (t) S(t)| 0) 1.

"—)w

Groeneboom (1992) studied the maximum likelihood estimator (MLE) of S,
which he called the nonparametric MLE (NPMLE). He shows that the NPMLE is

27

strongly consistent. We prove here that if S 0" is the NPMLE, then S") must satisﬁes

Condition C and therefore strongly consistent. This is stated below.

Theorem 3.3
If S ‘"’ is the MLE of S, then S W satisﬁes Condition C.

Proof: See Appendix.

3.4 Weak Convergence

In this section, we discuss the weak convergence of self-consistent estimators.

Now assume Tand Y are deﬁned on [0, l] (or deﬁned on [0, M] with M > 0). Deﬁne
' ut’m = 43 (We) — S(t)).

u§"’(r> = 42 (SW) — S,(t».

q:"’<r> = Jim/1"”(t) - W1(t)).

q§"’(t) = JRWZWI) - W20»,

We) = J3 (§§”’(t) —S.<t».

It is easy to see that the processes q,("’(t), q§"’(t) and R("’(t) converge weakly to a
Gaussian process, since they are sum of i.i.d. random variables modiﬁed by n“”. We will
now discuss the weak convergence of u,‘"’(t) and u§"’(t). Taking the derivative of (3.2)
and (3.4), we have dW2 (t) = S(t)dS,(t) and dW,("’(t) = S‘"’(t)dS3"’ (t). Subtracting one
from the other, then

dq§"’(t) = S‘"’(t)du§"’(t) + u§"’ (t)dSy(t), or

dq§"’(t) _ uf"’(t)
S‘"’(t) S(")(t)

 

dug") (t) = (18, (t). (3.25)

Assume that the largest observation is right censored to make 1 as the last jump point of

5"". Integrate both sides of (3.25) to get

28

 

,. '- dqé'” (s) u” (S)
”3 )(t_)=.[o S(")(S ) —I0Lds S(")( S) 7(

By (3.19), S,‘”’(t,—)=§§">(t,—) so u,">(1,—=) R‘")(t,—)= I:_dR("’(s).Hence

I," :1::3——:d5; ,(s s)= I "' §§:)——(—(‘,)— I " dR Rs‘"’( ) (3.26)
Applying the same technique we used to obtain (3.18) we may remove S(")(s) in the
denominator,

I;"u,<"’(s)ds,(s) = I:—[dq§"’(s) — S‘”’(s)dR(")(s)], k = 1, 2, - . -, m. (3.27)

The following lemma will be used to ﬁnd the asymptotic variance.

Lemma 3.4
For any bounded measurable functions h, and h2 on [0, 1], the variance of the

following integral is
Var { I; h.(S)dql"’(S) + I modqr’tsn
= —I,; h2(s)th(s) — I; (towns) — {I31 (s)dW.(s> + Igntswwsnz, 0 s t s 1.

Proof: Since{Y,, 5,} are i.i.d., and by the deﬁnition of q,"" , qé") , W3") and W21") (t) , we have
Var { I modqt’ts) + I meagre):
= Var {JEIJIttsWWm + «HI; mansion.

= Va, {_‘/1=:{h,(1;)[5, = 1,1; St]+h,(K)[8, = 0,11 S tl}},

"1.1
= Var {h,(I;)[5, =1,1; st]+h,(1:)[8,. =0,K Sill,
= E {h,();)[8, =1,); st]+h,(r;)[5,=0,11 St]?
- {E MODE). =L1§Stl+hz(1:)[51=0,XSII}}2,

def.

=L-A-

 

29

The cross-product term of I, is 0 when expanding, so

I. = 80120916. = 1,1: 911+ Eihfms. =1,): s t1},
= -I,' momm— Igtétstzo).

By the same principle, 1,: {I0}, (s)dW,(s) + Io'h, (s)dW2 (s)}2.

Theorem 3.4

Let S‘") be the self-consistent estimators of S satisfying Condition C, then
(i) ‘51-]: (S m (s) — S (S))dS, (s) converges weakly to a normal distribution with
mean 0 and variance -I01S(s)(1— S (s))dS, (s).
Further, if Y has positive continuous density g, then

(ii) x/hI;(S("’(s)—S(s))ds converges weakly to a normal distribution with mean 0

and variance I0] S(S)fg1(;;9(3)) ds.

Proof: (i) Taking t,=l in (3.27) yields

 

Jit- I: (S 0') (s) — S(s))dS, (s) = I: [dq§") (s) — S”) (s)dR('" (s)], which converges weakly to a
normal distribution since 5"” is strongly consistent.
The variance is given by the asymptotic variance of I: [dq§") (s) — S(s)dR("’ (s)] =
—I,:S(s)dq,‘"’(s) + I010 — S(s))dq§"’ (s), which is obtained by using Lemma 3.4 as
‘ (n) ' (n)
Var {—I, S(s)dq. (s>+I, (1— S(s))dq. on
I 2 l 2 l l 2
= —I, S (s)dW.(s) — I0 (1— S(s)) dmts>+{-IOS<s)dW.(s) + I,(1- S(s))desn ,

= —I§S(s)(1—S(s))dsy(s>.

30

(ii) Deﬁne the function g, on [0, 1] by g,(s) = g(t,_,) for s e [t,,, , t,) , k = 1, 2, - - ~, m. Then

almost surely —1— — -1— —-) O uniformly on [0, 1] since g is continuous. So
8,.

1 l l
m d =— I") —dS, ,
Iou. (s) s l,“ mg“) (s)

 

=-l'u.‘"’(s) 1 dS,(s)+0,,(1),
0 g..(S)

= - m 31(S)I[r 1 ,uf")(s)dS,(s)+ 010(1)’
k=l n """

 

=-z

k=l gn (S)

 

l. .,ldq§"’(s)-S‘"’(S)dR‘")(S)1+or“),
= _I‘;[dq;">(s) _ S("’(s)dR(")(s)]+ 0,,(1),

° g..(s)
= ‘l ——-1dq;"’(s)- S(s>dR‘"’(s>1+ o (I)

The desired result follows since -IOg—(1—S)[—dq§")(s) S(s)dR(")(s)] converges weakly to a

normal distribution. The asymptotic variance is obtained by the same method as in (i).

Groeneboom (1992) proved the weak convergence of MLE under certain
conditions. The following theorems are proved by using Groeneboom's Lemma 5.4
(restated here as Lemma 3.5). Assume T and Y have bounded densities f and g
respectively, satisfying ﬁt) 2 or > 0, g(t) 2 or > 0, for t e [0, 1] and g has a bounded

derivative on [0, l].

31

Lemma 3.5 ( Groeneboom)

Let or, = M n'm, where M > 0 is a constant. If S") is the MLE, then for any t0 6
[0, 1].
(i) The probability that S‘") does not have a jump in an interval of the form [t,,-or,, to+or,]

can be made arbitrarily small.

(ii) sup Is“) (1) —S(t0)l = 0,02%).

Idle-(1,, .10 441,,

The following theorem presents the relationship among S3"), S3"), and S,.
Theorem 3.5

If 15"" and S3") are solution of (3.3) and (3.4), with 5‘") being the MLE, then
(i) JE<S§"’<r)-§i"’(t»—”+o, r em. 11;
(ii) 11;") (t) = JI—I(S3n) (t) -— S, (t)) converges weakly to a normal distribution with mean
0 and variance S,(t)(1 - S,(t)).
Proof: For any t 6(0, 1), let t,>t such that S‘"’(s) = S"’(t,) for se [t, t,] and
S3") (t,) = S3")(t,). Then t,- t = 0,(n"’3) by (i) of Lemma 3.5. Deﬁne

P,(t,y) = £2.17; S t,)? S y], which is the empirical ofthe pairs (T,, Y,), and

hi

P (try) = P( T S I, Y Sy) = (1-S(t))(1-Sr0’))-

n 1 l n n 1 n l 1
Then s; 1(1) = "WWW; ’(s). W.‘ ’0) =;§1T. > 11.1: >t1= I,I,,1s>y1d12.(s,y),

..n 1 " 1 1
and Si ’(t)=;2111 >tl= I,I,,dP.(s,y>. So
i=1

1
S 3" (s)

 

1 1
(n) _ (I!) _ ('0 —______
s, (t,) S, (t)-I(] atW2 (s)— Swan)IOIW[s>y]dP,(s,y),and

.. ~ I
Si"’<t.) - St’tr) = —I, I, ,dR.(s,y).

Therefore

32

 

n ”n l [S>y]
Si ’(r) -5; ’(t) = —I, 1,...{1- S,., (tn)}d1’..(s,y).

= @1753 1..., {is > y] -S""(t..)}dR.(S,y)r

=IS<»>(t, )Io It”, {[5 > y]— S‘"’(t.. )}dP(Ss y)
1 1 ..
.»§,—,,—,(t—n)I0 Im {[s > y]—S( )(t,)}d(P,, — P)(S.y),

def.

= Iln+12n '

1

I... = “WI...,1{S(S)‘ s‘"’(t.)}dS.(s).

=S.,..( ——,{S‘"’< .)I,, as .s—() I S(s)dS.(s>}

{(5330, )- S(9))(Sy (t)- S, (1.. D} (wherCGEUJ ,.l).

 

=SW )
= 0,01%),

since S‘"’(t,)-S(0) =o,(n‘V3) by (ii) of Lemma 3.5, and S,(t)—S,(t,) =o,(n“%).

= o,( n“ ) by P,- P = o,( 11"” ) and t,-t = o,( n"’3 ). So
s/Ztsi"’(t) - Site»: 0.0).

(ii) follows from (i) since we know Jh(S3”(t) —S,(t)) converges weakly to a Gaussian

Under these conditions, we can use Lemma 3.5 to extend the result of Theorem

3.4 to integrals over [0, t], for any t 6(0, 1).

Theorem 3.6

Let 5"" be the MLE of S, then

33
(i) J; I; (S ("’(s)—S(s))dS, (s) converges weakly to a normal distribution with
mean 0 and variance —I:S(s)(l — S(s))dS, (s).
(ii) JZI; (S"')(s)-S(s))ds converges weakly to a normal distribution with mean 0

and variance I: S(s)(gl(-;;S’(s)) ds.

 

Proof: We will show for any t e(0,1)
I. uf"’(s)dS.(s)-I,1dq§"’(s)- S(s1dR‘"’(s)1—”>o. (328)
since I: [dq§"’ (s) — S (s)dR(") (s)] converges weakly to a Gaussian process.

By (i) of Lemma 3.5, fort e(0,l), there exists a jump point t,>t, such that t,- t = 0,(n"’3).

Then by (3.27) and using the same principle as in the proof of Theorem 3.5,
I, u:"’(s)dS.(s) — I,"u.‘"’(s)dS.(s) = —~/;I(,M(S(")(s) — S(s))dS,(s) = 0,0), and

III-”41") (S) ‘ 5"" @de) (S)) - I0 [dq§"’ (s) — 5‘") (s)dR""(s)] = o, (1).
(3.28) follows from the strong consistency of 5"". Hence JhI: (S(")(S)—S(s))dS',(s)

converges weakly to a normal distribution. Having proved weak convergence, the

variance and (ii) can be obtained in the same way as in Theorem 3.4.

Chapter 4
BAYESIAN ESTIMATION

4.1 Introduction

The problem of nonparametrically estimating a survival ﬁmction from censored
data has been addressed by a number of authors. For right censored data, Susarla & Van
Ryzin (1976 ) gave a nonparametric Bayes solution to the estimator under squared error
loss using Dirichlet process prior. The resulting Bayes estimator was shown to reduce to
the PL estimator in the cases where or(R+) —) 0, where a( - ) is a ﬁnite measure on R+
which serves as a parameter of the Dirichlet process prior.

In this chapter, with right censored data, we ﬁnd another expression for the Bayes
estimator, which is related to the self-consistency equation. This approach sheds more
light on the connection between the Bayes estimator and the Kaplan-Meier's PL estimator
(also the MLE). For interval censored data, we also present a Bayes estimator, but we
ﬁnd that in this case the Bayes estimator may not reduce to the MLE as a(R+) —> 0.

Let (R+, d", i0 ) be the probability space, where R+ = (0, co), and ﬂ is the Borel
o-ﬁeld on (0, 00). P is a random measure. We say Pe Wm) to mean P is a Dirichlet

process on ( R+, d’ ) with parameter or. See Ferguson(1973) for the deﬁnition of Dirichlet

 

process. J’( P(A)) =

MAR) , forAe Ki T, , - - -, T, is called a sample of size n from P, if
a

ﬂu, eC,,~-,T, EC, | P(A,),-~,P(A,,);P(C,),-~,P(C,)} = 1‘]P(C,.), a.s.
i=1

We assume T, T, are the lifetimes with survival S(t), and further that they are a

Sarnple from P, S(t) = P( t ,oo ). Pe .ﬂa). The Bayes estimator is always under the

34

35

squared error loss L(S,S) = I: (S(u)—S(u))2dw(u), where w is a weight function and

S(u) is a estimator of S(u).

Ferguson (1973) proved:

(i) The conditional distribution of P given T, ,-~,T, is a Dirichlet process with
parameter

B=a+:8,

i=1

(ii) Let S(t)= P(t, 00), then, given T, ,- ",,,T the Bayes estimator of S(t) is
Sh(t)= 3(S(t)| 71"",72).

_ 0t(t, co)
ot(R)+n +1ot(R)+nZ[’>]

 

_ a(t,oo) + n
"o(R)+n a(R)+n

 

S, (t). (4.1)

whereS, (t) = 1 2H; > t] is the empirical of S(t), and [T,>t] is the indicator of (T,>t) .
n 1

4.2 Right Censoring

We now consider right censored data, where T ,- « -, T, are the lifetimes and
Y,,- . -,Y, are the censoring times. We observe {(Z,, 8,) ; i= 1,-~-, n }, with Z, = T, A Y,, 8, =
{T, SY,], i= 1, - - -, n. Without loss of generality, assume8, = = 8, = l, 8,,, = = 8,= O
and that Z, ,--°,Z, are all distinct. Let

N , , (t) = Z[Z, 2 t,8, = 1] be the number of uncensored observations in [t, 00), and

i=1

36

N, (t) = Z[Z, 2 t,8, = 0] be the number of censored observations in [t, oo).

(’21
Let N (t) = NU (t) + NC (t), N " (t) = N (t+), then N(t) is right continuous counting

process. Actually N(t) is the number at risk at time t.

To ﬁnd the Bayes estimator under such right censored data, Susarla & Van Ryzin

use the following lemma and theorem.

Lemma 4.1

Given ( Z, ,1), - - - (Z,, 1), the conditional distribution of P is a Dirichlet process

k
with parameter [5, = or + 28,, .
i=l

Theorem 4.1 (Susarla & Van Ryzin)
Consider T,,, , - - -, T, as a sample of size n-kﬁ'om J45, ), which is obtained by

Lemma 4.1. Then under condition (Z,,, , 0), - - -, ( Z,, 0), the Bayes estimator of survival
function S(t) is

S(t) =

 

 

a(t,oo)+N+(t)I'-'I{ “(Ztaw)+N(Zi) (4.2)

or(R)+n ,,, or(Z,,oo)+N+(Z,)

Remark:
From (4.2) , by taking a(R+) —> O, Susarla and Van Ryzin showed that the limit of
S (t) is the Kaplan and Meier's PL estimator

[2, s:.5,=l]

(1 NZ,
S..(r>= II{1-(———N,3,”}

37

The following theorem enable us to see more about the relation between the
Bayes estimator and the Kaplan & Meier's PL estimator. The proof of lemma 4.2 is in

Appendix.

Lemma 4.2
Let S(t) = o"{S(t)| (Z,,8,),---,(Z,,8,)}. Then
”{7} >1 |(Z.,5.),---,(Z..,5..)} =[Z,>tl.if8,-=1;

= S(th,)
$12,)

,if8,=0;j=1,-~,n
Theorem 4.2

Given {(Z,, 8,) ; i= 1,---, n }, S (t), the Bayes estimator of S(t) is the solution of
the following equation

S(t) = who, oo)+ N*(t)+ I %d(—N NC(t))}. (4.3)

Proof: By (4.1) and Lemma 4.2,
S(t) = J’{S(t)| (Z..6.>.-~,(z,..6.)},
= fl J’(S(t)IT..---.T.)1 (Z,,5,),...,(Z,,5,)},

a(t, 00)
0‘(R )+n+a(R*)+n g” >t]|(Zl’51)” (321.95,»

 

_ a(t, co)
a(R )+n 0t(R 1)+n,,

 

2 MT. >t| (2.,6.)-- (2,6.1),

———{a(t, oo)+Z[T >r]+ Z W(T >t| (Z,,8,,-) ,(Z,,8,))},

=a(R+)+ j= -k+l

38

S(th,)

=a(R 1")+ ——{a(t, oo)+N,, U(t+)+j="2k;rl— S(Z) ——},
_ 1 00 S(_(t_d)(
-——a,,.)+ {att )+N..(t+)+N.<t+)+ I —N.(s>)},
0.. . S(t) _
=a,, 1,) ———,.{a(t )+N (t)+I'S —d( Neon}.
Remark:
SinceSPL (t) is the solution of equation
S..<t> = -‘-{N*<t) +I'5—"1’ldt—Ncmll, (4.4)
n oS,.,(s)

ﬁom (4.3), it is obvious that the limit of S(t) as a(R+) —) O is the Kaplan-Meier's PL
estimator, In other words, the Kaplan-Meier's PL estimator can be obtained from the self-
consistent equation S(t) = E g {S, (t)| (Z, ,5,),---,(Z,,8, )} , and the Bayes estimator can be
obtained from the self-consistent equation S(t) = E, {S,(t)| (Z,,8,),---,(Z,,5,)}, where

a(t,oo) + n
a(R) + n a(R) + n

 

S,(t) is the empirical of S(t), and s, (t) = S,(t).

4.3 Interval Censoring

For interval censoring case 1, we observe {(Y 8:) i— — 1,--- , n }. Let

i’i

(O,Y] if 8,=], ,
,-—-(Y{ be the observed intervals. We want to ﬁnd the Bayes

Yoo) if 5, =0.
estimator ofS(t). S(t) = J’{S(t) «mane-.0233}.

39

Let Ya) < - - - < Yo») be the ranked Y,'s, and let 50) and Am be the 8 and A corresponding to

Y

(0 . Then, under squared error loss, using the method of Lemma 1 in Susarla & Van

Ryzin (1976), the Bayes estimator is
S(t) = £(S(t)|(11,6.),~-,(x.,8,.)),
= J’(S(t)|7; eA,,---,7; e A"),

J’(P(t,°°)lll P(AU)» def. 1
: M = “—1—. (4.5)

Jilin/1...,» ’2

 

Now let us calculate I1 and 12.

Let 31:“), YmL 32 =( Y“) , 11(2)] ,. . ., B,=( Y,,”) , Ym], B"+1 = (Yo!) , oo) , and let a: =

l
a(B,), a2= a( B, ), - . ., an: a( 3,), an“: a( BM). Then A, =UB, if 5,. =1, and

)=1

A, =R+—UB, if 5,.=o. B, B", 3",, isa partition ofR+.

I"

P e do), the density of (P(Bl),~-,P(B,,)) is

 

1 . n 11,—]" _ n aMI-l
0(a‘,...,an+l) “x, (1 2x1) ’

igl Isl

F(a,)---F(an,1)
I‘(a,+---+an+,)’

 

where D(otl ,---,a,,+,) = on, +---+ocn,,= oc(R+).

Suppose t = Y“), then

12.D(ap. . .’an+l)

= J‘ ﬁfim,...,xi,5(i)).f1xfr‘.(1_:x,)°‘~+i“dxnmdx,, (4.6)

xl +"'+an' '3' i'l (=1

and

4O

11'D(av' ' 'aam)

=J Hf(x1’°' ' xi’6(i))(1_ _zlxj )' 1:135? .(1_:xi)“m"dxn...dxb

x,+-- -+x 5] '=1 i=1
’ 1'
2x1 iff’m =1;
'=l
where f,(x,,---,x,.,8(,.))=< ’ ,.
t 1='

 

Hence we have the following theorem.

Theorem 4.3
Given (11,8, ),-",(X, ,5"), the Bayes estimator of S(t) is

so): J’(S(t)|(}';,5,),...’(y 5 ”ﬁg

where I1 and 12 are decided by (4.7) and (4.6).

The calculation of II and I2 is done by using

IOCxY‘I(C_x)ﬂ-ldx=CY+ﬂ-IB(y,n) forOSCS 1 and Y, 11> 0 ’

where B(y, n)- — M ,and F(a+1)= aF(a) for a > O.

 

 

(4.7)

IW11)
Even in the simplest cases, the expression for I1 and 12 can be quite complicated. For
example, if all 8's are 0, then
I"(CLOTH r‘(Ot 1+1)1"(a,.)”1"(0tn+1+"‘a2 +n)F(ai)
2 _r(a1)”r(an+l) r(an+1+an+1) l-‘(ari1~1"""'+’a'l4'")
I"(0L(R+))

 

= r(a(R+)+n)a((n),°0)(<1(}f,._1),°°)+1)“(0“ (1)’°°)+"_1)’

41

_ P(a(R*))
1“(a(R*)+n)

 

0(Xn,a°°)(a(}fn_n ,oo) + N+ (Kn—1))) ' ' (“(111),”) + N+ (Yb) )) -
If there exists at least one 8 equal to 1, then I2 is a summation of the forms above. Such
as, suppose 8,“) = 1, then

= P(a(R*))
2 r(a(R+)+n—1)

 

a(Kn)’w)(a(Kn-2) 9°°) + N+()(n—2)) "1) ”'(aaimw) + N+(Y(1))"1)

_ l“(OIUVD
I“(a(R*) +n)

 

“(Kn)a°°)(a(Kn—1) ,oo) + N+(}(n—l))) ' ”((1021) ’00) + N+(Y(1)))°

In each case, I, will have a similarly complicated expression.

Theorem 4.3 tells us a way to calculate the Bayes estimator S(t), but it is too
tedious to ﬁnd an explicit expression. The following example show that given a Dirichlet
process prior (1, with nonnull probability, there exist Bayes estimator S(t), which does

not converge to the MLE as a(R+) —) 0.

Example 4.1

Let y,, y,, y3 and y4 be ﬁxed real numbers with O < y, < y2 < y3 < y,. Consider a
Dirichlet prior such that
oc[0, y,) = a[y,, y2) = (lb/2, y3) = only,” y4) =aDI4, co) = x. a(R+) = 5x, where O S x S 00.
Let Y be a discrete random variable taking values on y,, y,, y3 and y,,. Let T be the lifetime
with Dirichlet prior or. Suppose the intervals observed are (y,, 00), (y,, 00), (0, y,,], and
(y,, 00), n = 4. The probability of getting such observations is P( Yl = y,, T, > Y,; Y2 = y,,
T2 > Y2; Y3 = y,, T3 S y,; Y4 = y,, T4 > Y,, ), which is positive. Then the Bayes estimator S,
when t = y,, is

Sg(t): 5? S(t) IXIEO’v 0°): X2605, ‘30), X3€(0,y3], X46041, ‘30)},

42

£(P(t,oo)P(0,y3]l'I P(ynw) w. 1

= iat3 _

i<P(o,y3]1’[ Ponce» ' 12'

i323

 

Rewrite I, and 12,

I. = J’(P<y3,oo>1'IP(y,-,oo»- f<P<yeoo>H P(ypoo», and

#3 i

12= J’(1‘[P(y.,oo))- J’(1’[P(y.,oo)).

Then, each terms in I, and I2 can be calculated by integration as in Susarla & Van Ryzin

( 1976). Denote a = a(R+).

= Na) _ Na)
1, P(a+4)x(2x+l)(3x+2)(4x+3) ma”)

 

x(2x+1)(2x+2)(3x+3)(4x+4),

= Na) _ Na)
2 F(a + 3) x(3x+1)(4x+2) F(a + 4)x(2.x+l)(3x+2)(4x+3),

 

I = (x +1)(3x + 2)(4x + 3)(5x + 4) — (x + 1)(2x + 2)(3x + 3)(4x + 4)

S°’ 5") = I (3x+1)(4x+2>(5x+3)(5x+4)—(2x+1)(3x+2)<4x+3><5x+4>’

 

lingSU) = 13/22.

On the other hand, the MLE is S"’(y,) = S"’( y,) = 1, Smog) = S")(y,) = 1/2. As x —) 0, so
that a( R+ ) —-) O, S(t) does not converge to 1/2. Hence, S does not converge to the MLE

asa(R+)—>O.

Chapter 5
SIMULATIONS

In this chapter, we investigate the performance of our estimation procedures

through simulation studies from known distributions.

a. Parametric assumptions

We just consider the 2-inspection interval censorship model described in Chapter
2. The two inspection times are W,, W2 ( W, < W2 ) are generated through independent
exponential rv's V,, V2 with mean 1; That is, W,=V,, W,=V,+V2. The survival time T of
interest is also assumed exponential (mean 1/9) and independent of (V,, V2 ).

The stopping number N(t) = min{ k >0: TSW,_<J )(N(t)= +00, if there is no such k)
of (2.1), has values 1, 2, and +00. Its distribution is obtained from (2.2) - (2.4):

P[N(t)= 1]: —9——e"(1— 1

e-Ol)’
(9+1) (9+1)

1 - , , 9 _ . , 1 -. _ + 1
PW!) = 2]= (7971){'em+(—9T1)‘(1'em)}‘6(e -e ‘° " ),
P[N(t) = 00] = 1- P[N(t) s 2] .
Allowing t —-) 00, leads to consideration of an unrestricted follow up period. This is the
case (Case2) considered by Groeneboom and Wellner (1992), With t inﬁnite in (2.1) the
random variable N (5 N(oo)), has three values 1, 2 and +00 corresponding to (1) Te (0, W ,],
or (2) T e(W,, W,], or (3) T>W,. The distribution of N is then

P[N=1] = 29—3—1, P[N=2] = (Ti—:11?" P[N= 00] = (9:1),

 

43

44

Thus 69—1 is the proportion of left censoring, 799—; the proportion of interval
+

+1)

censoring and (Tl? the proportion of right censoring. The objective of our simulations
+

is to compare the nonparametric estimator of the survival distribution S(t) = e“, t >0 of T,
based on the Turnbull scheme, with that obtained through maximum likelihood
estimation of the parameter 6. The data are generated from n independent triples ( V, ,., V2,,

T,),lSiSn.

b. Maximum likelihood estimation of 9

The information on T may be written in the form T e (L, R], where L=O, R= V,
when N=1; and L= V,, R= V, +V2 when N=2; and L= V, +V2, R= +00 when N= 00; The SAS
LIFEREG procedure is used to estimate 9 form the data { T, e(L,,., R,): ISiSn}.

c. Turnbull method for estimation of S(t)

From the data { T, e(L,, R,]: ISiSn}, Tumle (1976) provided an algorithm for
the (nonparametric) estimation of S(t). He constructed a set of disjoint intervals {(q,, p,]:
ISjSm}, where qj's and pj's lie in the sets {L,.: ISiSn} and {R,: 1_<_iSn}. The likelihood is

proportional to

Lm= flown-Sm»). (5.1)

Deﬁne the vector of probability 3 = (s,, - - ., s”) by s, = S(qj) - S(pj) . The problem of
maximizing (5.1) reduces to one of maximizing
£6) = N(Zaysj), (5.2)
i=1 j=1

where a”. = 1 if (qj, pj] g (L,, R,], 0 otherwise.

45
For ISiSn, 15j.<_m, let 1,, = 1 if T,e(q,, p,] and 0 otherwise. Because of the

censoring the value of 1,, may not be known, however its expectation is given by

1511.1 = a. s,-/ 2,1,5, = 11., (s). (5.3)
k=1

If we treated (5.3) as observed rather than expected frequencies, the proportion of
observations in intervals is (:u,(s))/ n = 11,. We say that the vector of probability s is
i=1

self-consistent if

s, = 1t,(S,, . . ., s") (ISjSm). (5.4)
A self-consistent estimator of s is deﬁned to be any solution of the simultaneous
equations (5.4). The form of (5.4) immediately suggests a iterative procedure for ﬁnding
the solution.
(i) Obtain initial estimates s3 (13me). This can be any set of positive numbers summing
to unit.
(ii) Evaluated 11,, ( s°) for lSj_<_m.
(iii) Obtain improved estimates s} by setting

s}. = n,( s°) for IgSm.
(iv) Return to step (ii) with s1 replacing s°, etc.

(v) Stop when the required accuracy has been achieved.

d. Results
1. Our ﬁrst simulation study involves 10 replications of size n = 100 from each of the

exponential distributions with 9= .5, 1, 2, 3. The distribution of N is denoted by p =

e 9 1
9+1’ (9+1)2’ (9+1)

 

(

2 ) and [3 denotes the observed proportion of left, interval, and

right censorship, averaged over the 10 replications. Table 5.1 summarizes these results

46

for the four distribution that we generated. Even though the number of replication is

small, the degree of closeness of p to ﬂ is very satisfactory.

2. For each distribution, maximum likelihood estimate 6 is obtained from the SAS
LIFEREG procedure for each replication. In Table 5.2 we input the average of these

estimates over the 10 replications.

3. For the nonparametric maximum likelihood estimation of S(t) we use the Turnbull self-
consistency algorithm. APL program were written to efficiently carry out the task of
solving the self-consistency equations.

For each replication, the Turnbull estimator S (t) is close to a step function, with
gaps, in the sense that it is undefined on some intervals. We extended the deﬁnition of S
to these intervals by extending consecutively the steps from the leﬁ of the curve to the
next jump in S. These step functions were then averaged over the 10 replications to
obtain an estimator of S(t).

The ﬁgures illustrate the Turnbull estimator based on a single replication, and that

obtained by averaging across the 10 replications. The true survival curve is S(t) = e'z’; t>0.

 

 

 

 

 

 

 

 

 

 

 

 

 

0:0.5 0=1 0: =3
Censoring p i5 p i5 p 16 p 13
Left(N=1) .333 .317 .500 .510 .667 .655 .750 .744
Interval (N=2) .222 .220 .250 .230 .222 .224 .188 .191
Right (N=oo) .444 .463 .250 .260 .111 .121 .062 .065

 

 

Table 5.1

 

47

 

 

 

 

 

 

 

 

 

9 (3
.5 .4969
1 .9978
2 1.9678
3 2.9299
Table 5.2
TU RN BU LL ESTIMATION

Based on one sample of size 100

 

 

 

 

 

1.’L
1.1 .‘——

.8‘. \\\‘_

.6"

4:"-
41- \\.-
21- \_._,~\ _
\ ° Turnbull
0.0- -.----.-..—
° True
- I I 1 U I I survival
-.5 0.0 .5 1.0 1.5 2.0 2.5 3.0

Figure 5.3

48

TURNBULL ESTIMATION

Based on 10 samples of size 100

 

 

 

 

 

 

1.0-
.8--
.6"
.4"
.20 __
° Turnbull
0.0“- W ' ' ' '
° True
- , , , 1 , Survival
5 0.0 5 1.0 1.5 2.0 2.5 3.0

Figure 5.4

APPENDIX

Proof of theorem 3.3:
Let F”) = 1 - S"), then F") is the MLE of F. Let Y,,) < - - . < Y,,, be the ranked Y,'s,
and let 8,,,be the 8 corresponding to Y,,). Groeneboom (1992) showed that the value of F")

at Y,,, is the leﬁ derivative of H“ at i, where H“ is the convex minorant of the points

0,280,) on [0, n]. Call apointte{ Y, : i= 1, 2, - - -, n } avertex ofthe convex minorant
jSi

if it is a point such that F"’(t) < F‘"’(Y,) for any Y, > t, that is H“ changes its slope at k if
r = Y,,).

We know F") -) F0 on the continuous points of F’. If we could show F 0 is strictly
increasing, then Condition C will be satisﬁed. Suppose F0 is not strictly increasing, there
exist 3, and s,, s, < s,, such that 1306,) = F°(s2). For our convenience, assume F0 is
continuous at s, and s2, F°(s)< F°(s,) for any 8 < s,, and F°(s) > F°(s,) for any 3 > S,. If P”
is not continuous at s, and/or s,, we will consider ,S,+8 and/or S2-8 for small a, and the
proofis similar. Lets,,,=max { 15s, :1 isavertex ofF‘") } ands2n=min { r 282:1isa
vertex of F“), then s,,, —> s, and s,,, —) s2 , since 1“") —-) F0. Therefore, since H" is

convex minorant, we have

the number of 8,'s being 1 in (SM’SZ"]

F0081) S '
the number of 8,'s 1n (Slnisznl

s F<">(s,). (3.23)

 

As it —>00, 1“") (3,) —) F0 (S1), F"’(s2) -—) F0 (S) = F” (3,). The numerator of (3.23) becomes

—l—(the number of 8,'s being 1 in (s,,,s2,]) = “1'2”; 3 KY e(s,,,,s,,]],
n n

i=1

= I: 61W") U).
n—no ‘1
———> I, F(y)dFy(y),

49

50

since WI") —) W, and W, (t) = fFKQdF} (s) by (3.1). The denominator of (3.23) becomes

1(the number of6.'s in cams...» = 121): 60.88.11,
n

i=1

 

$4 I: dF,( y).
So, as n —900, we have
I: F (y)dFy(y) 0
1 32 = F (30° (3.24)
I; dF Y 0’)

By Mean Value Theorem of integral, there exists 0 e (s, , 32) such that F (0) = F0 (s, ).

But H" is convex minorant, so

the number of 8,'s being 1 in (s,,, ,0)

. 2 F(n)(sl)-
the number of 8,'s 1n (SING)

 

Following the same argument as obtaining (3,24) as n —->00,

I6 F (y)dFy (y)
I: dFy (y)

By the Mean Value Theorem of integral again, there exists 0, e (s,, 0) such that F(0,) _>_

 

2 F°(s,) = 17(0).

P(s,) = F (0), which is contrary to the fact that F is strictly increasing. Therefore F0 has

to be strictly increasing.

Proof of Lemma 4.2:
The lemma is trivially true when 8,. =1. We now assume all 8's are 0, otherwise we
could apply lemma 4.1 so that the new prior would become a Dirichlet process with

parameter [3,. We want to prove

51

S(thj)

MT,->t|(ZnO)a""(Z"’O))= 8(2) ’

or equivalently

o"(P(th,-.°°)H P(Z..°o)) 6”(P(th,,oo)H P(Z,,00))

J’(P(Z,,00)HP(Z,,00)) ’ £(P(z,,oe)n P(Z,,00))

iatj

(4.0)

Denote the LHS of (4.0) by 1’}, denote the RHS of (4.0) by %.

 

k+l
lall a2, 0, 1 bl rsz ak+2 aln “n+1
I
t
0 Z 22 HZ]. Z, ,,, Z"
Figure4.1

Without loss of generality, we assume Z, < . - - < Z,, . (4.0) is obviously true if t 5 Z,, so let
I e (Z, , Z,,,], and let B, = (0, Z, ], B, = (Z, , Z, ] ,- ~ -, Bn= (Z,,, Z,,], BM= (Z,, , 00), and 01,
= 01( B, ), 01, = 01( B, ), - - ., 01,, = 01( B"), 01",, = a(B,,,,).

Let b, = 01( Z,, t] and b, = 010, Z,,, ] , then b, + b, =01,” (see Figure 4.1). Use

‘I’Jﬂ'I (c—x)""'dx = CWHBWm) for 0 S c $1 and 7,11 >0,

where B(y,n) = W, and F(a+ 1) = 01F(a) for 01 > 0. Then, we have
Y 11

= mam»
r(a(R+) +n)

 

aml( am] T art-+1). ' ' ( “MT ' ' ' + ak+2+ "+1'(k+2))

(a,.1+ - - - + 01m+ b,+n+1-(k+2)+1 ) ( 01M+ . - - + (1141+ n+1-(k+1)+1 )...

(01",,+ - - - + a,,,+n+1-(i+2)+1)(01,,,,+---+ 011+ n+1-j)-- -(01,,,,+- - -+ 01,+n-1),

52

= 1"(Ot(R*))
I‘(01(R+ ) +n+ l)

 

an+l( (1",, + 01,,+1) ' ° ' (am1+ ' ' ' + 0942+ n+1'(k+2))

(01",,+ - - - + 01,,,+ b, +n+1-(k+2)+1 )( 01",,+ - - - + am+ b, +n+1-(k+1)+1 ). . .
(01",,+ . - - + aﬁ,+n+l-(i+2)+l )( 61M+ - - - + 0941+ n+1-(,'+1) +1)
(amr+'°'+0lj+rrt1-j+l) ---(01n+,+o--+ or,+n),

z 1"(<1(R*))
r(a(R+) +n)

 

an+l( “n+1 + anTI) ' ' ' (am1+ ' ' ' + ak+l+ n+1'(k+1) ) ' ‘ ’

(“n+1+ ' ° ' + aj+2+n+l-(i+2) ) ( am1+ ° ' ° + aj+l+ ”TI'UTI))
(am1+"'+aj+”+1'j) '°'(an+|+"'+ “24774):
and

= P(a(R*))
F(01(R*) + n + 1)

 

01",,( 01",, + an+1) . -- (01",,+ - . - + 0101+ n+1-(k+1) ) . ..

(a...+ - - - + a,..+n+1-o'+2) ) ( a...+ - ~ - + a...+ n+1-<i+1) +1)
( an+1+ ' ° ' + (1,-+n+ n+1'(i+1))(an+l+ ' ° ' + (11+ "+1"j+1 ) ’ ’ ' (am1+ ° ' ° + (12+n).
So,

£.I‘(01(R+)+n+l)
C 1"(01(R+)+n)

 

_ (01",,+---+a,+n+1—j)---(an,,+~-+01,+n—1)
(01",, +-~-+oc,+, +n+1—(j+1)+1)(01,,, +~~+01, +n+1—j+l)---(a,,, +---+01, +n)’

 

1T<9<an>+1v+<za>

 

0

J

[1(a(zi ,oo) + N(Zr))

1:1

and

B. F(a(R+)+n+1)
D l"(a(R+)+n)

 

53

(01",,+---+01, +n+l—j)---(01,,,, +---+01, +n—1)

_(a

 

+---+01,,, +n+1—(j+1)+1)(01 +---+01, +n+1—j+1)---(01,,, +---+a, +n)’

n+1 n+1

ﬁ<a(Z.,w)+N*(Z.))

 

l—I(G(Z.-a°°) + N (2.))

i=1

Hence, we get A = 2, that isﬁ = —C— .
C D B D

REFERENCES

CHANG M.N. & YANG G.L. Strong consistency of a nonparametric estimator of the
survival function with doubly censored data. Ann. Statist. 1987, 15, 1536-1547.

CHANG M.N. Weak Convergence of a Self-consistent Estimator of the Survival
Function with Doubly Censored Data. Ann. Statist., 1987, 5, 1536-1547.

FERGUSON T.S. A Bayesian analysis of some nonparametric problems. Ann. Statist.
1973, 1, 209-230.

GARDINER J .C. & SUSARLA V. Weak convergence of a Bayesian nonparametric
estimator of the survival function under progressive censoring. Statistics and
Decisions, 1983, C, 257-268.

GROENEBOOM P. & WELLNER J .A. Information bounds and nonparametric
maximum likelihood estimation. Birkhauser Verlag, Basel. 1992.

KAPLAN E.L. & MEIER P. Nonpararnetic Estimation ﬁ'om incomplete observations. J.
Amer. Statist. Assoc, 1958, 53, 457-481.

LAWLESS J.E. Statistical models and Methods for Lifetime Data. Wiley, New York,
1982.

LEIDERMAN P.H., Babu D., et. al. Aﬁican infant precocity and some social inﬂuences
during the ﬁrst year. Nature, 1973, 242, 247-249.

PECKHAM C.S. Children born to women with HIV-1 infection: Natural history and risk
of transmission. The European Collaborative Study. Lancet, 1991, 337, 253-260.
PETO R. Experimental survival curves for interval-censored data. Appl. Statist. , 1973,

22, 86-91.

SAMUELSON S.O. Asymptotic theory for nonparametric estimators from doubly

censored data. J. Statist., 1989, 16, 1-21.

54

55

SUSARLA V. & VAN RYZIN J. Nonparametric Bayesian Estimation of Survival Curves
from Incomplete Observations. J. Amer. Statist. Assoc, 1976, 61, 897-902.

TURNBULL B.W. The empirical distribution function from arbitrarily grouped,
censored and truncated data. J. Royal Statist. Soc. Ser. B., 1976, 38, 290-295.

MICHIGAN STRTE UNIV. LIBRARIES
I l I III I II I II III I II II I ll III III I l I
31293010517872