
This is to certify that the
dissertation entitled

REGRESSION MODELS WITH (CASE 2) INTERVAL CENSORING

presented by

Vasilis Katsikiotis

has been accepted towards fulfillment
of the requirements for

the Doctor of Philosophy degree in Statistics and Probability

 

 

Date August 2, 1995

 


 

 

 


 

REGRESSION MODELS WITH (CASE 2) INTERVAL CENSORING

By

Vasilis Katsikiotis

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1995

ABSTRACT

REGRESSION MODELS WITH (CASE 2) INTERVAL CENSORING

By

Vasilis Katsikiotis

Interval censoring occurs frequently in longitudinal studies with periodic follow-up.
The outcome of interest is not directly observed but its occurrence can be ascertained
within an interval of successive inspection times.

The Accelerated Failure Time (AFT) and the Proportional Hazards (PH) models are two of
the regression models most widely used in survival analysis and reliability theory. Maximum
likelihood estimation is pursued in both models in a semiparametric framework.
Existence of the estimators is established along the lines of Groeneboom and Wellner
(1992). Strong consistency is proved, and necessary conditions are given under which the
information for the finite dimensional parameter is positive. The importance of the
information calculation is illustrated in two ways. First, a lower bound for the asymptotic
variance of regular estimators is derived. Moreover, the benefit of scheduling two
inspections instead of a single one is measured explicitly by the anticipated gain in the
information measure. Estimates of this measure are also provided.

Lack of smoothness in the function $\theta \mapsto F_n(\cdot, \theta)$ motivates the search for alternative
estimators in the AFT model. Asymptotically Generalized M-Estimators (AGME) are
considered, and a few of the conditions of a general theorem due to Bickel, Klaassen,
Ritov and Wellner (1993) are established. A simulation study evaluates the performance
of the MLE in the AFT model.

 

 

 


ACKNOWLEDGMENTS

I would like to thank all members of my guidance committee for their encouragement and
support during my years at M.S.U. Special thanks to Professor R.V. Ramamoorthi for
some very valuable suggestions; his comments have greatly improved early versions of
this manuscript. My deep appreciation to Professor D. Gilliland for his critical support
during my first difficult year in the US. To Professor P. Groeneboom, special thanks for
making his computer programs available to me. Last but not least, my sincere
appreciation to two people who mean so much to me: to my advisor, Professor Joseph
Gardiner, whose guidance and continuous support were vital for the completion of this
work, and to Professor Nigel Paneth, whose research, analytic thinking and approach to
problems will be so influential to me now and in the years to come.

TABLE OF CONTENTS

List of Tables

List of Figures

Chapter 1: Introduction
1.1 Regression models in survival analysis
1.2 Random censoring
1.3 Interval censoring - A general scheme

Chapter 2: Maximum Likelihood Estimation in Two Semiparametric Models
2.1 The Cox PHM
2.2 The Accelerated Failure Time Model
2.3 Maximum likelihood in semiparametric models
2.4 Interval censoring with two inspections
2.5 Profile likelihood estimation
2.6 Characterization of the NPMLE

Chapter 3: Strong Consistency of MLE

Chapter 4: Information Theory
4.1 Efficient scores and information bounds
4.2 Preliminaries
4.3 Information lower bound
4.4 Estimation of $I(\theta_0)$
4.5 Comparison of information measures

Chapter 5: Generalized M-estimation in the AFT model
5.1 The master theorem
5.2 Generalized M-estimation under interval censoring

Chapter 6: Simulations

Appendix

Bibliography

LIST OF TABLES

5.1 Information measure in the Cox model (y = .2)
5.2 Information measure in the Cox model (y = .5)
6.1 Profile Likelihood
6.2 Profile Likelihood

Chapter 1

INTRODUCTION

1. Regression Models in Survival Analysis: One of the principal goals in survival
analysis is to make inference about the time to a specific response or event, in relation to
the risk factors that influence its occurrence. In most applications the identification of
important risk factors is a challenging problem in itself, sometimes with significant
statistical input. Regression models establish a relationship between the outcome of
interest and a vector of covariates. Although such models merely approximate the true
relationship between two groups of variables, they become important analytic tools,
especially when they build upon the characteristics of the variables involved.

Exploratory statistical methods often provide very useful insight into the relationship
between the lifetime of interest and the covariates involved. Models based on the
monotonicity of failure rates, on time-invariant relative risks of failure, or on the
proportionality of odds have been used in a variety of situations with considerable
success in terms of goodness of fit and interpretability of results. On the other hand, general
regression models often demonstrate a trade-off between adaptivity to specific
applications and mathematical complexity.

The Cox Proportional Hazards Model (PHM, Cox (1972)) is one of the most
frequently used models to express the relationship between a lifetime and a vector of
covariates. There is an enormous literature on this model, and statistical
methods associated with it have been examined in a variety of situations. Multiplicative
Intensity Models (MIM, Aalen (1978)) make up another important category of regression
models used in survival analysis. They are based on a product factorization of the rate of
failure (intensity process) into a component that describes the risk set at a given time $t$
and a component that describes the risk of failing at $t$, given the covariates $z$. Such
models have been used in a variety of scientific fields such as medical research,
econometrics, reliability and engineering. Models with increasing (decreasing) failure
rate are often used to incorporate prior knowledge on the distribution of the failure
time, while others that assume proportionality of odds ratios are applicable to situations
more general than those described by the classical PHM.

A general regression model establishes the relationship between a failure time and
covariates through a regression function $g$. When $g$ is linear (nonlinear) and known up
to a finite dimensional parameter $\theta$, we obtain familiar regression
models, such as the linear regression and the accelerated failure time models. When $g$
remains largely unspecified, the problem of nonparametric regression, one
of the most challenging problems in the area, provides a wide open field for further research, while at
the same time it allows the broadest range of applications.

In this dissertation we consider estimation methods in regression models that
represent the two major categories of models described earlier. In particular, we analyze
the Accelerated Failure Time (AFT) and the PH models under a censoring scheme that
occurs frequently in longitudinal studies and studies with periodic follow-up.

2. Random Censoring: One of the characteristic features in the analysis of survival
data is the partial loss of information on the main variable of interest. Follow-up studies
usually have a finite horizon over which the outcome of interest might occur. In addition,
subjects may withdraw from the study for a number of reasons or simply miss scheduled
examinations. Moreover, there are situations where a continuous flow of information with
respect to the variable of interest is virtually impossible. In such cases, an inspection
scheme is needed to provide the necessary information. If $X$ denotes the time to event,
i.e. the ‘failure time’, then there are situations where $X$ might be: 1) fully observed, 2)
partially observed, 3) not observed at all. Usually there is a competing variable $Y$, called
the censoring variable, and an inspection scheme that provide information about $X$ in
cases 2 and 3 respectively. This variable can be either fixed or random.

Non-random censoring is common in economics-related research, where variables are
observed, or become of interest, whenever they fall above or below a fixed threshold. A
simple example is that of an employee who plans to achieve a certain goal during his/her
tenure with a company. The time at which he/she achieves that goal is observed as long
as it occurs prior to the termination of the employee’s service.

Random censoring occurs naturally in a variety of situations. In toxicological
experiments, animals injected with suspected carcinogens are monitored for tumor
development (Hoel and Walburg (1972)). The presence or absence of a tumor is assessed
at a random time of sacrifice for each animal. Depending on the lethality of the injection,
the time $X$ of tumor onset is randomly censored at the time of sacrifice $Y$.

Other observational studies provide a window that allows information on $X$. To
assess the motor development of pre-school children, a study was planned to test the skills of
participating children (Leiderman et al. (1973)). If a child had the skill prior to the
initiation of the study, then the time to event is left censored at the beginning of the study.
If the child develops the skill during the study period, then the time of the event is
directly observed. If at the end of the study a child lacks the specific skill, then the event
of interest is said to be censored to the right at the study’s termination. The
window over which $X$ is observed consists of the random times of entry and exit for each
subject. This scheme is known in the literature as double censoring (Turnbull (1974)).
Another form of random censorship is the one where $X$ is never observed but is known
to belong to a time interval, consisting of the last negative and the first positive
assessment of the event’s occurrence. The time of mail delivery illustrates this form of
censoring. One is interested in the time the mailman delivers a piece of mail. However,
only a sequence of mailbox inspections provides information about the time of delivery,
which is known up to an interval. This scheme is known as interval censoring (Peto and
Peto (1972), Turnbull (1976)).

3. Interval Censoring - A General Scheme: Interval censoring occurs frequently in
studies in which information about an event of interest is obtained by an inspection
process that assesses, at each inspection time, whether or not the target event has occurred.
The most appealing applications of interval censoring appear in cancer and AIDS related
research. An important measure of the effectiveness of a treatment therapy in cancer research
is the length of time that a patient remains in remission (remission duration); see Rücker and
Messerer (1988). Remission duration is defined as the time period between complete
remission after treatment and tumor relapse. Both the initial and the terminal
events that define remission duration are subject to interval censoring, because they can
only be assessed by a sequence of inspections. A similar problem occurs in situations
where one is interested in the length of the incubation period of AIDS. By incubation
period we mean the time elapsed from infection to the onset of clinical AIDS. Evidently,
at least the initial event (time of infection) is in almost every situation subject to interval
censoring.

The following general model of the interval censoring mechanism has
been proposed by Wang, Gardiner and Ramamoorthi (1994). Let $\{W_k : k \ge 1\}$ be a sequence
of ordered positive random variables that represent the potential examination times for a
subject. At each time $W_k$ an assessment is made of whether or not the event of interest
has occurred. Let $t$ denote current time measured from the beginning of the study, and
define $N(t) = \min\{k \ge 1 : X \le W_k \le t\}$ if such a $k$ exists, with $N(t) = \infty$ otherwise.
Also let $M(t) = \max\{k \ge 1 : W_k \le t\}$, $W_0 = 0$ and $W_M \equiv W_{M(t)}$. Assuming at least one
examination is made, $N$, when finite, marks the first assessment by time $t$ at which a positive
diagnosis is made of the event of interest, which occurs at the unobserved time $X$.
On the other hand, if $N = +\infty$, we only know that $X > W_M$. In their paper
these authors prove the identifiability of the distribution of $X$ from the datum
$(W_{N-1}, W_N, N, M)$. The time of diagnosis of the event of interest is defined as
$Z = W_N$ if $N < \infty$, but we only know that $Z > W_M$ when $N = +\infty$.

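
A minimal Python sketch of the inspection mechanism just described may help. It assumes the examination times are given as a sorted list of positive numbers, and the function names `inspect` and `observed_interval` are hypothetical. Indices follow the text: $W_0 = 0$ and the first examination has $k = 1$.

```python
import math


def inspect(x, exam_times, t):
    """Return (N(t), M(t)) for an event at the unobserved time x.

    exam_times: sorted positive examination times W_1 < W_2 < ...
    N(t) = min{k >= 1 : x <= W_k <= t} (math.inf if there is no such k);
    M(t) = max{k >= 1 : W_k <= t} (0 if no examination has occurred by t).
    """
    n, m = math.inf, 0
    for k, w in enumerate(exam_times, start=1):
        if w > t:
            break
        m = k
        if n == math.inf and x <= w:
            n = k
    return n, m


def observed_interval(x, exam_times, t):
    """Interval (W_{N-1}, W_N] known to contain x, or (W_M, inf) when N is infinite."""
    n, m = inspect(x, exam_times, t)
    w = [0.0] + list(exam_times)  # w[0] plays the role of W_0 = 0
    if n == math.inf:
        return w[m], math.inf
    return w[n - 1], w[n]
```

For example, with examinations at 1, 2, 3.5 and 5 and current time 4, an event at 2.7 is first detected at the third examination and is only known to lie in (2, 3.5]. An event at 4.5 gives $N = \infty$ and is only known to exceed 3.5.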
In what follows we will consider a rather simple interval censoring model consisting
of two inspections. Groeneboom and Wellner (1992) call this scheme case-2 interval
censoring, to distinguish it from the case of a single inspection (case-1), which can be
viewed as a degenerate interval censoring situation.

The manuscript is organized as follows. In Chapter 2 we discuss the existence of
maximum likelihood estimators in two semiparametric models. In Chapter 3 we prove the
strong consistency of the estimators. Chapter 4 examines the optimality of
estimation: we compute efficient scores and information lower bounds for the finite
dimensional parameters and provide estimates of their asymptotic variances. In Chapter 5
we consider generalized M-estimation in the AFT model. We review some fundamental
results from the modern theory of empirical processes and establish a few of the
conditions needed to obtain the asymptotic distribution of our estimators.
Finally, in Chapter 6 we present some simulation studies to illustrate the performance of
our estimators.

Chapter 2

MAXIMUM LIKELIHOOD ESTIMATION IN TWO SEMIPARAMETRIC MODELS

1. The Cox PHM is one of the most popular regression models and is used extensively in
survival analysis. It assumes that the conditional hazard function given a vector of
covariates $Z$ factorizes as

(1.1) $\lambda(t \mid z) = \lambda_0(t)\, g(z)$

with $\lambda_0$ a nonnegative function (baseline hazard) of time $t$ and $g$ a function
independent of time. When time dependent covariates are considered, $g$ depends on $t$
only through the covariates $Z$. The choice of $g$ marks the degree of generalization of
the original model proposed by Cox (1972), in which he considered functions of the
form $g(z) = e^{\theta' z}$. There are two broad categories of papers addressing (1.1). One class
considers functions $g$ known up to a finite dimensional parameter $\theta$, and the second
regards $g$ as a general nonnegative function. The following is a brief historical account
of the subject.

O’Sullivan (1986) considers PH models of the form

(1.2) $\lambda(t \mid z) = \lambda_0(t)\, e^{r(z)}$

with $r : \mathbb{R}^k \to \mathbb{R}$. For $k$ large, “partly linear” forms of the function $r$ were introduced
and studied by Green (1985) and Heckman (1986). An interesting version of Cox’s
model considers hazard functions of the form

(1.3) $\lambda(t \mid z) = \exp(\theta_1' z_1)\, \lambda(t \mid z_2)$ for $z = (z_1, z_2)$.

In case $Z_2$ has finite support, (1.3) is called the stratified Cox model.

A different generalization of (1.1) was studied by Aalen (1978, 1980). In Aalen’s model

(1.4) $\lambda(t \mid z) = \sum_{j=1}^{k} z_j \lambda_j(t)$

where the $\lambda_j$'s are unknown functions. To avoid the nonnegativity constraints in (1.4),
Zucker (1986) and Zucker and Karr (1987) have considered estimation when the
conditional hazard function is given by

(1.5) $\lambda(t \mid z) = \exp\left[\sum_{j=1}^{k} z_j \lambda_j(t)\right]$.
In the present and subsequent chapters, we will restrict our attention to the classical
model (1.1) with $g(z) = e^{\theta' z}$, and we will comment on any generalizations of our results to
other models.
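
To make the proportional hazards structure of (1.1) with $g(z) = e^{\theta' z}$ concrete, the following Python sketch evaluates the conditional hazard and survival functions. It uses a hypothetical Weibull baseline with shape 1.5. The baseline, the function names and the parameter values are illustrative assumptions and are not part of the text. Two properties are visible: the ratio of hazards for two covariate values does not depend on $t$, and the conditional survival function is the baseline survival function raised to the power $e^{\theta' z}$.

```python
import math


def linear_predictor(theta, z):
    """theta'z for finite-dimensional theta and z."""
    return sum(th * zj for th, zj in zip(theta, z))


def cox_hazard(t, z, theta, baseline_hazard):
    """lambda(t | z) = lambda_0(t) * exp(theta'z), model (1.1) with g(z) = e^{theta'z}."""
    return baseline_hazard(t) * math.exp(linear_predictor(theta, z))


def cox_survival(t, z, theta, cum_baseline_hazard):
    """P(X > t | Z = z) = exp(-Lambda_0(t) * exp(theta'z))."""
    return math.exp(-cum_baseline_hazard(t) * math.exp(linear_predictor(theta, z)))


def weibull_hazard(t, shape=1.5):
    """Hypothetical baseline hazard lambda_0(t) = shape * t^(shape - 1)."""
    return shape * t ** (shape - 1.0)


def weibull_cum_hazard(t, shape=1.5):
    """Matching cumulative baseline hazard Lambda_0(t) = t^shape."""
    return t ** shape
```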

2. The Accelerated Failure Time Model is widely used in reliability studies and industrial
life testing. It assumes a log-linear relationship between the failure time $T$ and the
covariates $Z$, namely

(2.1) $T = T_0 \exp(\theta' Z)$, $T > 0$

with $T_0$ a baseline “failure time”. The model’s name reflects the fact that the covariates $Z$
have an accelerating (decelerating) effect on the survival function of $T$, compared to the
corresponding function of $T_0$. The log-transformation reduces (2.1) to the familiar
linear regression model

(2.2) $X = \theta' Z + \varepsilon$

with $X = \log T$ and $\varepsilon = \log T_0$. This model has been studied in a variety of situations,
including random (right or left) censoring. Important work on the subject can be found in
Buckley and James (1979), Koul, Susarla and Van Ryzin (1981), Ritov (1990) and Schick (1993).
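
Assume that $T_0$ is independent of $Z$. Then (2.1) gives $P(T > t \mid Z = z) = S_0(t e^{-\theta' z})$, where $S_0$ is the survival function of $T_0$. In other words, the covariates rescale the time axis. The Python sketch below checks this by simulation with a hypothetical unit exponential baseline. The function names, the seed and the sample size are illustrative assumptions.

```python
import math
import random


def aft_sample(z, theta, draw_baseline, rng):
    """Draw T = T_0 * exp(theta'z) as in (2.1); draw_baseline(rng) returns one T_0."""
    lin = sum(th * zj for th, zj in zip(theta, z))
    return draw_baseline(rng) * math.exp(lin)


def aft_survival(t, z, theta, baseline_survival):
    """P(T > t | Z = z) = S_0(t * exp(-theta'z)), assuming T_0 is independent of Z."""
    lin = sum(th * zj for th, zj in zip(theta, z))
    return baseline_survival(t * math.exp(-lin))


# Usage: hypothetical unit exponential baseline, one covariate equal to 1, theta = 0.5.
rng = random.Random(1)
draws = [aft_sample([1.0], [0.5], lambda r: r.expovariate(1.0), rng) for _ in range(20000)]
empirical = sum(d > 1.0 for d in draws) / len(draws)
theoretical = aft_survival(1.0, [1.0], [0.5], lambda s: math.exp(-s))
```

On the log scale the same draws satisfy $\log T = \theta' z + \log T_0$, which is model (2.2).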

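3. Maximum Likelihood in Semiparametric Models: Consider a family of
distributions $\mathcal{P} = \{P_{\theta,F} : (\theta, F) \in \Theta \times \mathcal{F}\}$ on a measurable space $(\mathcal{X}, \mathcal{B})$. Let $\mu$ be a
measure on $\mathcal{B}$ and $p_{\theta,F}(\cdot)$ a density of $P_{\theta,F}$ with respect to $\mu$. Suppose that $\Theta \subset \mathbb{R}^d$,
while $\mathcal{F}$ is an infinite dimensional space.

Let $(\theta_0, F_0)$ be the true parameter and suppose that $X_1, X_2, \dots, X_n$ is a random
sample from $P_{\theta_0, F_0}$. Define the maximum likelihood estimator $(\hat\theta_n, \hat F_n)$ by
$(\hat\theta_n, \hat F_n) \equiv \arg\max_{\Theta \times \mathcal{F}} \int \log p_{\theta,F}(x)\, d\mathbb{P}_n(x)$, with $\mathbb{P}_n$ the empirical measure based on
$X_1, X_2, \dots, X_n$, under the assumption that such a maximizer exists. Let
$\xi = (\theta, F) \in \Xi \equiv \Theta \times \mathcal{F}$.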

4. Interval Censoring with two inspections: Let $\Delta$ be a random variable having
distribution $F_\xi$, $(T, U)$ random variables with joint distribution $H$, and $Z$ a random vector
with distribution $W$. Denote by $J$ the joint distribution of $(T, U, Z)$. Suppose that $\Delta$ is
independent of $(T, U)$ conditional on $Z$, with $\Delta \mid Z \sim G_\xi$, $\xi \in \Xi$. We will
refer to $(T, U)$ as the censoring variables and will assume throughout that $\Pr\{T < U\} = 1$.

Consider the measure space $(\mathbb{R} \times \mathbb{R}^2 \times \mathbb{R}^d, \mathcal{B}, Q_\xi)$ with $Q_\xi$ a probability measure
on the Borel $\sigma$-algebra $\mathcal{B}$. Denote by $Y^\circ \equiv (\Delta, T, U, Z)$ a typical element of this
space and let $\phi$ be the measurable transformation $Y = \phi(Y^\circ) = (\delta, \gamma, T, U, Z)$, where
$\delta = 1\{\Delta \le T\}$ and $\gamma = 1\{T < \Delta \le U\}$. Let $P_\xi = Q_\xi \circ \phi^{-1}$ be the probability measure induced by
$\phi$. Then $P_\xi$ and $Q_\xi$ are related by

(4.1) $P_\xi\{\delta, \gamma, A \times B, E\} = \int_{\mathbb{R} \times (A \times B) \times E} 1\{a \le t\}^{\delta}\, 1\{t < a \le u\}^{\gamma}\, 1\{a > u\}^{1-\gamma-\delta}\, dQ_\xi(a, t, u, z)$

with $A \times B$ and $E$ Borel sets in $\mathbb{R}^2$ and $\mathbb{R}^d$ respectively.

Our problem is to estimate $\xi_0 = (\theta_0, F_0)$ on the basis of a sample $\{Y_1, Y_2, \dots, Y_n\}$ of
independent and identically distributed observations. In what follows, we consider
maximum likelihood estimation in regression models under the censoring scheme
described above. We will refer to this situation simply as interval censoring.
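
The transformation $\phi$ only records which of the three intervals $(-\infty, T]$, $(T, U]$ and $(U, \infty)$ contains $\Delta$. Here is a minimal Python sketch. Covariates are omitted. The exponential and uniform distributions in the hypothetical generator `simulate_case2` are arbitrary choices made only for illustration.

```python
import random


def observe(event_time, t, u):
    """Map a latent (Delta, T, U) to (delta, gamma) as defined by the transformation phi.

    delta = 1{Delta <= T}, gamma = 1{T < Delta <= U}; delta = gamma = 0 means Delta > U.
    """
    if not t < u:
        raise ValueError("the scheme assumes T < U")
    return int(event_time <= t), int(t < event_time <= u)


def simulate_case2(n, rng):
    """Hypothetical covariate-free generator: Delta ~ Exp(1), T ~ U(0, 1), U = T + U(0.1, 1)."""
    sample = []
    for _ in range(n):
        event_time = rng.expovariate(1.0)
        t = rng.uniform(0.0, 1.0)
        u = t + rng.uniform(0.1, 1.0)
        sample.append(observe(event_time, t, u) + (t, u))  # (delta, gamma, t, u)
    return sample
```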
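4a. The Cox model with interval censoring: Suppose that the hazard function $\lambda$
associated with a nonnegative random variable $X$, conditional on a vector of covariates
$Z$, is given by

(4.2) $\lambda(x \mid z) = \lambda(x)\, e^{\theta' z}$

We maintain all the notation and assumptions introduced earlier in the section, with the
exception that $\Delta \equiv X \ge 0$ w.p. 1. In addition, we will assume that $(T, U)$ and $Z$ have
densities $h$ and $w$ with respect to Lebesgue measure, which do not depend on $\theta \in \Theta$.

Based on the observable $y = (\delta, \gamma, t, u, z)$, we have, for $\xi = (\theta, F) \in \Xi$ and with $\bar F = 1 - F$,
$P_\xi\{\delta = 1 \mid T = t, U = u, Z = z\} = P_\xi(X \le t \mid Z = z) \equiv F_\theta(t \mid z) = 1 - [\bar F(t)]^{\exp(\theta' z)}$.
Using the definition of the conditional cumulative hazard function $\Lambda(t \mid z) \equiv \int_0^t \lambda(s \mid z)\, ds$,
so that by (4.2) $\Lambda(t \mid z) = \Lambda(t)\, e^{\theta' z}$ with $\Lambda(t) = \int_0^t \lambda(s)\, ds$, we can write the density of $Y$ with respect to $\mu = \nu_2 \times \ell_{d+2}$ as

(4.3) $p_\xi(y) = \left(1 - \bar F(t)^{\exp(\theta' z)}\right)^{\delta} \left(\bar F(t)^{\exp(\theta' z)} - \bar F(u)^{\exp(\theta' z)}\right)^{\gamma} \left(\bar F(u)^{\exp(\theta' z)}\right)^{1-\gamma-\delta} h(t, u \mid z)\, w(z)$,

where $\nu_2$ = counting measure on $\{0,1\}^{\otimes 2}$ and $\ell_d$ = Lebesgue measure on $\mathbb{R}^d$.

The transformation $\bar F = e^{-\Lambda}$ allows an equivalent form of the density (4.3), namely

(4.4) $p_\xi(y) = \left(1 - \exp[-\Lambda(t) e^{\theta' z}]\right)^{\delta} \times \left(\exp[-\Lambda(t) e^{\theta' z}] - \exp[-\Lambda(u) e^{\theta' z}]\right)^{\gamma} \times \left(\exp[-\Lambda(u) e^{\theta' z}]\right)^{1-\gamma-\delta} \times h(t, u \mid z) \times w(z)$.

We will often switch between (4.3) and (4.4), depending on the circumstances.

Since we have assumed that $h$ and $w$ do not contain any information about $\theta$, and
since our primary goal is to make inference about $\xi$, we can safely proceed treating $h$
and $w$ as known. The log-likelihood function based on $n$ independent and identically
distributed observations $\{Y_1, Y_2, \dots, Y_n\}$ is (up to an additive constant) given by

$l_n(\theta, F; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - \bar F(t_i)^{\exp(\theta' z_i)}\right) + \gamma_i \log\left(\bar F(t_i)^{\exp(\theta' z_i)} - \bar F(u_i)^{\exp(\theta' z_i)}\right) + (1 - \gamma_i - \delta_i) \exp(\theta' z_i) \log \bar F(u_i)$

and

$l_n(\theta, \Lambda; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - e^{-\Lambda(t_i) e^{\theta' z_i}}\right) + \gamma_i \log\left(e^{-\Lambda(t_i) e^{\theta' z_i}} - e^{-\Lambda(u_i) e^{\theta' z_i}}\right) - (1 - \gamma_i - \delta_i) \Lambda(u_i) e^{\theta' z_i}$.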
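
As an illustration, the log-likelihood $l_n(\theta, \Lambda; y)$ above can be evaluated directly for a given cumulative baseline hazard. The sketch below is hypothetical: the function name and the data layout, tuples `(delta, gamma, t, u, z)`, are assumptions. It returns $-\infty$ when a left or interval censored observation receives zero probability. It adds the right censored term $-\Lambda(u_i) e^{\theta' z_i}$ without exponentiating.

```python
import math


def cox_ic_loglik(theta, cum_hazard, data):
    """l_n(theta, Lambda; y) for the Cox model under case-2 interval censoring.

    cum_hazard: callable t -> Lambda(t), nondecreasing with Lambda(0) = 0.
    data: iterable of (delta, gamma, t, u, z) with z a sequence of covariates.
    """
    total = 0.0
    for delta, gamma, t, u, z in data:
        r = math.exp(sum(th * zj for th, zj in zip(theta, z)))  # e^{theta'z}
        if not delta and not gamma:
            total -= cum_hazard(u) * r  # right censored: log e^{-Lambda(u) e^{theta'z}}
            continue
        s_t = math.exp(-cum_hazard(t) * r)  # P(X > t | z)
        p = 1.0 - s_t if delta else s_t - math.exp(-cum_hazard(u) * r)
        if p <= 0.0:
            return -math.inf  # this observation has zero likelihood
        total += math.log(p)
    return total
```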
Early attempts to estimate the parameters in the Cox model under interval censoring were
confined to purely parametric methods. Finkelstein (1986) considered maximum
likelihood estimation under the assumption that the baseline hazard has finite support.
Recently, Huang (1994) completed a thesis on efficient estimation in the Cox model
with case 1 interval censoring. He proved that the MLE of the finite dimensional
parameter is asymptotically normal and efficient. Although from one point of view our
results can be taken as a natural generalization of Huang’s results, case 2 interval
censoring, still in its infancy, displays difficulties that do not appear in case 1. The biggest
of all is the potential “nearness” of the two inspections. This is not a mere technicality
to be addressed in one way or another, but an integral part of the problem. Although
there are complete results that describe the asymptotic distribution of the NPMLE in case
1 (Groeneboom (1989)), no asymptotic theory for case 2 has been developed as of today,
to the best of our knowledge. In fact, not even the rate of convergence of the NPMLE, a
fundamental tool in efficiency considerations, is known for case 2, as opposed to case 1.
4b. The linear regression model with interval censoring: Consider the model (2.2)
with $\varepsilon$ having distribution $F_0$. In addition to the basic assumptions for the interval
censored models that we considered earlier, we assume here that $\varepsilon$ is independent of the
covariates $Z$. Then for $\Delta \equiv X$, a density of $Y$ with respect to $\mu$ is given (up to a
multiplicative constant) by

(4.5) $p_\xi(y) = \{F(t - \theta' z)\}^{\delta} \{F(u - \theta' z) - F(t - \theta' z)\}^{\gamma} \{1 - F(u - \theta' z)\}^{1-\gamma-\delta}$

and the log-likelihood function by

(4.6) $l_n(\theta, F; y) = \sum_{i=1}^{n} \delta_i \log\{F(t_i - \theta' z_i)\} + \gamma_i \log\{F(u_i - \theta' z_i) - F(t_i - \theta' z_i)\} + (1 - \gamma_i - \delta_i) \log\{1 - F(u_i - \theta' z_i)\}$.
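
A similar sketch evaluates (4.6) for a given error distribution function $F$. Here $t$ and $u$ are assumed to be already on the scale of $X = \log T$. The function name and data layout are hypothetical, and the logistic $F$ in the check is an arbitrary choice.

```python
import math


def aft_ic_loglik(theta, F, data):
    """Log-likelihood (4.6) for X = theta'Z + eps under case-2 interval censoring.

    F: callable giving the distribution function of eps.
    data: iterable of (delta, gamma, t, u, z), with t and u on the scale of X.
    """
    total = 0.0
    for delta, gamma, t, u, z in data:
        lin = sum(th * zj for th, zj in zip(theta, z))  # theta'z
        ft, fu = F(t - lin), F(u - lin)
        if delta:
            p = ft  # eps <= t - theta'z
        elif gamma:
            p = fu - ft  # t - theta'z < eps <= u - theta'z
        else:
            p = 1.0 - fu  # eps > u - theta'z
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total
```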

Finkelstein and Wolfe (1985) were the first to consider this model under interval
censoring. To model the joint distribution of $(X, Z)$, they introduced a parametric
formulation of the conditional distribution of $Z$ given $X$, and they estimated the
distribution of $X$ using the “self-consistent” nonparametric estimator of Turnbull
(1976). Although they argue that their estimators are maximum likelihood in nature, it is
well known today that the self-consistent equations do not always yield maximum
likelihood estimators.

5. Profile Likelihood Estimation: We consider here a three-step procedure that yields
a maximum likelihood estimator for the models we have introduced. This is a standard
approach to M-estimation in semiparametric problems and has been used, among others,
by Andersen and Gill (1982), Whittemore and Keller (1986) and LeBlanc and Crowley (1995).
It can be summarized in the following three steps.

Step 1: For $\theta \in \Theta$ fixed, consider $F_n(\cdot, \theta) = \arg\max_{F \in \mathcal{F}} l_n(\theta, F)$.

Step 2: Replace $F$ by $F_n(\cdot, \theta)$ and consider the profile likelihood function
$\theta \mapsto l_n(\theta, F_n(\cdot, \theta))$.

Step 3: Let $\hat\theta_n = \arg\max_{\theta \in \Theta} l_n(\theta, F_n(\cdot, \theta))$.

Set $\hat F_n(\cdot) = F_n(\cdot, \hat\theta_n)$.
There are a number of issues that need to be clarified before we proceed to the properties
of the maximum likelihood estimator. In Step 1 we need to justify the existence of the
maximizer. Moreover, a practical method for computing the maximizer might be
a priority. In the next section, we provide the arguments for the legitimacy of Step 1 in
the two regression models that we consider. In that regard, the work of Groeneboom and
Wellner (1992) on the existence of the NPMLE is the basis for our arguments. Details on
the computational aspects of the MLE from interval censored data, and algorithms to carry
out Step 1, are given in Groeneboom and Wellner (1992). In relation to Step 3, there
seems to be an ad hoc assumption that $\hat\theta_n \in \Theta$ for all $n$. This is not generally the case. We
will only need $\hat\theta_n \in \Theta$ eventually, with high probability, and this result will be
established in Theorem 3.1. Finally, with all three steps substantiated, we obtain

$l_n(\hat\theta_n, \hat F_n) \ge l_n(\bar\theta_n, F_n(\cdot, \bar\theta_n))$ with $(\bar\theta_n, \bar F_n) = \arg\max_{\Theta \times \mathcal{F}} l_n(\theta, F)$.

Moreover, $l_n(\hat\theta_n, F_n(\cdot, \hat\theta_n)) \ge l_n(\theta, F_n(\cdot, \theta)) \ge l_n(\theta, F)$ for all $(\theta, F) \in \Theta \times \mathcal{F}$, by Steps 3 and 1
respectively. It follows that $l_n(\hat\theta_n, \hat F_n) = l_n(\bar\theta_n, \bar F_n)$. This proves that the three-step
procedure described above yields a maximum likelihood estimator. In the remainder of
this chapter we will fix a $\theta \in \Theta$. We will also use the abbreviation
$q_i = \theta' z_i$ for $i \in \{1, 2, \dots, n\}$.
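
The three steps can be sketched for the linear regression model (4.6) with a scalar covariate. For fixed $\theta$, Step 1 reduces to estimating the distribution of $\varepsilon$ from the shifted intervals $(-\infty, t_i - \theta z_i]$, $(t_i - \theta z_i, u_i - \theta z_i]$ or $(u_i - \theta z_i, \infty)$. The inner maximization below is a swapped-in technique, not the algorithm of Groeneboom and Wellner (1992). It runs Turnbull-type EM (self-consistency) iterations, with candidate mass points at the finite right endpoints plus a point at $+\infty$. As noted in Section 4b, self-consistent solutions need not be maximum likelihood estimators, so the result is only an approximation. Step 3 is a grid search, which avoids derivatives of the nonsmooth map $\theta \mapsto l_n(\theta, F_n(\cdot, \theta))$. All function names are hypothetical.

```python
import math


def npmle_interval(lefts, rights, iters=2000):
    """Approximate NPMLE of a distribution from intervals (L_i, R_i], L_i < R_i.

    EM (self-consistency) iterations with candidate mass points at the finite
    right endpoints plus +inf. Returns (support, probabilities, log-likelihood).
    """
    support = sorted({r for r in rights if r != math.inf}) + [math.inf]
    alpha = [[1.0 if l < s <= r else 0.0 for s in support]
             for l, r in zip(lefts, rights)]
    n, p = len(alpha), [1.0 / len(support)] * len(support)
    for _ in range(iters):
        new = [0.0] * len(support)
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))  # current P(interval i)
            for j, a in enumerate(row):
                if a:
                    new[j] += p[j] / (denom * n)
        p = new
    loglik = sum(math.log(sum(a * pj for a, pj in zip(row, p))) for row in alpha)
    return support, p, loglik


def residual_intervals(theta, data):
    """Shift (t, u) by theta * z and record the interval known to contain eps."""
    lefts, rights = [], []
    for delta, gamma, t, u, z in data:
        tt, uu = t - theta * z, u - theta * z
        if delta:
            lefts.append(-math.inf)
            rights.append(tt)
        elif gamma:
            lefts.append(tt)
            rights.append(uu)
        else:
            lefts.append(uu)
            rights.append(math.inf)
    return lefts, rights


def profile_loglik(theta, data, iters=2000):
    """Steps 1-2: theta -> l_n(theta, F_n(., theta)), with F_n from the EM sketch."""
    return npmle_interval(*residual_intervals(theta, data), iters=iters)[2]


def profile_mle(data, grid, iters=2000):
    """Step 3: maximize the profile log-likelihood over a finite grid of theta values."""
    return max(grid, key=lambda th: profile_loglik(th, data, iters))
```

A grid is used for transparency. In practice a finer grid, or a derivative-free optimizer, would be applied around the best grid point.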

6. Characterization of the NPMLE: In this section we state necessary and sufficient
conditions for an estimate of $F$ to be a maximum likelihood estimate. Our focus is on
the Cox model, for which we provide full details. In Theorem 6.2 we state similar
results for the linear regression model without proofs, since these would largely
duplicate the arguments.

We consider the mapping $\Lambda \mapsto l_n(\Lambda)$ based on (4.4),

(6.1) $l_n(\Lambda; y) = \sum_{i=1}^{n} \delta_i \log\left(1 - e^{-\Lambda(t_i) e^{q_i}}\right) + \gamma_i \log\left(e^{-\Lambda(t_i) e^{q_i}} - e^{-\Lambda(u_i) e^{q_i}}\right) - (1 - \gamma_i - \delta_i) \Lambda(u_i) e^{q_i}$.

Let $J_n^{(1)} = \{T_i : \delta_i = 1 \text{ or } \gamma_i = 1,\ i = 1, 2, \dots, n\}$,
$J_n^{(2)} = \{U_i : \gamma_i = 1 \text{ or } 1 - \delta_i - \gamma_i = 1,\ i = 1, 2, \dots, n\}$ and $J_n = J_n^{(1)} \cup J_n^{(2)}$.

Notice that $J_n \subset \{T_i ; i = 1, \dots, n\} \cup \{U_i ; i = 1, \dots, n\}$ marks the set of relevant
observations which contribute to the likelihood function. Let $0 \le \eta_{(1)} \le \dots \le \eta_{(m)}$ be
the order statistics of the elements of $J_n$. Write $\Lambda_j \equiv \Lambda(\eta_{(j)})$ and notice that
$0 \le \Lambda_1 \le \Lambda_2 \le \dots \le \Lambda_m$, due to the monotonicity of $\Lambda$. We abuse notation when we write
$\delta_{(j)}$, $\gamma_{(j)}$ or $z_{(j)}$ to refer to the $\delta$'s, $\gamma$'s and $z$'s associated with $\eta_{(j)}$. The MLE
$\Lambda_n$ of $\Lambda_0$ can be chosen to be a right continuous, nondecreasing step function with
jumps at $J_n$. Set $\eta_{(0)} = 0$ and $\Lambda_n(0) = 0$. Then $\Lambda_n$ will have the form

$$\Lambda_n(t) = \begin{cases} 0, & 0 \le t < \eta_{(1)}, \\ \Lambda_n(\eta_{(j)}), & \eta_{(j)} \le t < \eta_{(j+1)},\ j = 1, \dots, m-1, \\ \text{unspecified}, & t > \eta_{(m)}. \end{cases}$$

It is worth noting that if $\delta_{(1)} = 0$, or if ($\delta_{(m)} = 1$ or $\gamma_{(m)} = 1$), then $\Lambda_n(\eta_{(1)}) = 0$ or
$\Lambda_n(\eta_{(m)}) = \infty$ respectively. So, without loss of generality, we will assume that
$\delta_{(1)} = 1$ and $\delta_{(m)} = \gamma_{(m)} = 0$. In this way we can restrict ourselves to functions from

$\mathcal{L} = \{\Lambda : \Lambda(\eta_{(1)}) > 0,\ \Lambda(\eta_{(m)}) < \infty,\ \Lambda(\eta_{(j)}) - \Lambda(\eta_{(j-1)}) > 0\ \forall j \in \{1, 2, \dots, m\}\}$,

and thereby avoid pathological maximizers with $l_n(\Lambda_n; y) = -\infty$. Before we state the main theorems in this section, we introduce some
additional notation. Let

 

 

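$$W_{\Lambda,q}(t) = \sum_{T_i \le t} \left( \frac{\delta_i}{1 - e^{-\Lambda(T_i) e^{q_i}}} - \frac{\gamma_i}{e^{-\Lambda(T_i) e^{q_i}} - e^{-\Lambda(U_i) e^{q_i}}} \right) e^{q_i - \Lambda(T_i) e^{q_i}} + \sum_{U_i \le t} \left( \frac{\gamma_i}{e^{-\Lambda(T_i) e^{q_i}} - e^{-\Lambda(U_i) e^{q_i}}} - \frac{1 - \gamma_i - \delta_i}{e^{-\Lambda(U_i) e^{q_i}}} \right) e^{q_i - \Lambda(U_i) e^{q_i}}$$

$$= n \int_{s \le t} \left( \frac{\delta}{1 - e^{-\Lambda(s) e^{q}}} - \frac{\gamma}{e^{-\Lambda(s) e^{q}} - e^{-\Lambda(v) e^{q}}} \right) e^{q - \Lambda(s) e^{q}}\, d\mathbb{P}_n(\delta, \gamma, s, v, z) + n \int_{u \le t} \left( \frac{\gamma}{e^{-\Lambda(s) e^{q}} - e^{-\Lambda(u) e^{q}}} - \frac{1 - \gamma - \delta}{e^{-\Lambda(u) e^{q}}} \right) e^{q - \Lambda(u) e^{q}}\, d\mathbb{P}_n(\delta, \gamma, s, u, z),$$

where $q = \theta' z$ inside the integrals. Note that the process $W_{\Lambda,q}$ has jumps at the points of $J_n$.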
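
The jumps of $W_{\Lambda,q}$ at $\eta_{(j)}$ are the partial derivatives of (6.1) with respect to $\Lambda_j$, so they are easy to compute. The hypothetical helper below does this for a candidate $\Lambda$ stored as a dictionary of values on $J_n$. It then checks the two conditions of Theorem 6.1 up to a tolerance: every tail sum of the jumps must be nonpositive, and $\sum_j \Lambda_j\, \Delta W_{\Lambda,q}(\eta_{(j)})$ must vanish. The first check uses two observations with $\theta' z = 0$: one left censored at 1 and one right censored at 2. For these the log-likelihood is $\log(1 - e^{-\Lambda_1}) - \Lambda_2$. Among nondecreasing $\Lambda$ it is maximized at $\Lambda_1 = \Lambda_2 = \log 2$.

```python
import math


def w_jumps(lam, data, theta):
    """Jumps of W_{Lambda,q} at the ordered points of J_n.

    lam: dict {point of J_n: Lambda(point)}; data: tuples (delta, gamma, t, u, z).
    Each jump is the partial derivative of (6.1) with respect to Lambda at that point.
    """
    jumps = dict.fromkeys(lam, 0.0)
    for delta, gamma, t, u, z in data:
        r = math.exp(sum(th * zj for th, zj in zip(theta, z)))  # e^{q_i}
        if delta:  # left censored: contributes at T_i only
            s_t = math.exp(-lam[t] * r)
            jumps[t] += r * s_t / (1.0 - s_t)
        elif gamma:  # interval censored: contributes at T_i and U_i
            s_t, s_u = math.exp(-lam[t] * r), math.exp(-lam[u] * r)
            jumps[t] -= r * s_t / (s_t - s_u)
            jumps[u] += r * s_u / (s_t - s_u)
        else:  # right censored: contributes -e^{q_i} at U_i
            jumps[u] -= r
    return [jumps[s] for s in sorted(jumps)]


def satisfies_conditions(lam, data, theta, tol=1e-8):
    """Check (i): every tail sum of the jumps is <= 0, and (ii): sum_j Lambda_j dW_j = 0."""
    points = sorted(lam)
    dw = w_jumps(lam, data, theta)
    tails_ok = all(sum(dw[j:]) <= tol for j in range(len(dw)))
    orthogonal = abs(sum(lam[s] * d for s, d in zip(points, dw))) <= tol
    return tails_ok and orthogonal
```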

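THEOREM 6.1. For fixed $\theta \in \Theta$, let $q_i = \theta' z_i$ for $i \in \{1, 2, \dots, n\}$. Suppose that
$\delta_{(1)} = 1$ and $\delta_{(m)} = \gamma_{(m)} = 0$. The following conditions are sufficient and necessary for a
function $\Lambda_n(\cdot, \theta)$ to maximize (6.1) over $\mathcal{L}$:
i) $\int_{[t, \infty)} dW_{\Lambda_n, q}(s) \le 0$ for all $t \ge 0$, and ii) $\int_0^\infty \Lambda_n(s, \theta)\, dW_{\Lambda_n, q}(s) = 0$.

Moreover, $\Lambda_n(\cdot, \theta)$ is uniquely determined by i) and ii).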

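Proof. Define $S = \{\tilde x = (\Lambda_1, \Lambda_2, \dots, \Lambda_m) : \Lambda_j = \Lambda(\eta_{(j)}),\ \Lambda \in \mathcal{L}\}$ and let $\varphi : S \to \mathbb{R}$ be
the function $\varphi(\tilde x) = l_n(\Lambda)$ if $\tilde x = (\Lambda_1, \Lambda_2, \dots, \Lambda_m)$. We suppress the dependence of the
likelihood function on $\theta$. Notice that $\varphi$ is nonpositive on $S$. If $\hat x_n = \arg\max_{\tilde x \in S} \varphi(\tilde x)$,
then we set $(\hat\Lambda_1, \hat\Lambda_2, \dots, \hat\Lambda_m) \equiv \hat x_n$. It is easy to see that

$\frac{\partial \varphi}{\partial x_i}(\tilde x) = W_{\Lambda,q}(\eta_{(i)}) - W_{\Lambda,q}(\eta_{(i-1)}), \quad i \in \{1, \dots, m\}$.

Suppose that (i) and (ii) hold and let $\Lambda \in \mathcal{L}$. Set $\tilde y = (\Lambda_1, \dots, \Lambda_m)$ and write $\tilde x$ for the vector of values of
$\Lambda_n$. In the Appendix we prove that $\varphi$ is concave. Exploiting this property of $\varphi$ and
the structure of $S$, we obtain for arbitrary $\pi \in (0, 1)$

$\varphi(\tilde y) - \varphi(\tilde x) \le \frac{\varphi(\pi \tilde y + (1 - \pi) \tilde x) - \varphi(\tilde x)}{\pi} = \frac{\varphi(\tilde x + \pi(\tilde y - \tilde x)) - \varphi(\tilde x)}{\pi} \longrightarrow (\nabla\varphi(\tilde x), \tilde y - \tilde x)$ as $\pi \downarrow 0$.

Thus

$l_n(\Lambda) - l_n(\Lambda_n) = \varphi(\tilde y) - \varphi(\tilde x) \le (\nabla\varphi(\tilde x), \tilde y - \tilde x) = \sum_{j=1}^{m} \Delta W_{\Lambda_n,q}(\eta_{(j)}) \left(\Lambda_j - \Lambda_n(\eta_{(j)})\right) = (\tilde y, \nabla\varphi(\tilde x))$,

where $\Delta W_{\Lambda_n,q}(\eta_{(j)}) = W_{\Lambda_n,q}(\eta_{(j)}) - W_{\Lambda_n,q}(\eta_{(j-1)})$, and the last equality holds because $(\tilde x, \nabla\varphi(\tilde x)) = \int \Lambda_n(s)\, dW_{\Lambda_n,q}(s) = 0$ by (ii).

Now we can write $\tilde y$ as a telescoping sum

$\tilde y = \sum_{j=1}^{m} a_j 1_j$, $1_j = (0, \dots, 0, 1, \dots, 1)$,

with the last $j$ components of $1_j$ equal to one and $a_j = \Lambda_{m-j+1} - \Lambda_{m-j} \ge 0$, $1 \le j \le m$ (with $\Lambda_0 = 0$). It follows
that

$l_n(\Lambda) - l_n(\Lambda_n) \le \sum_{j=1}^{m} a_j (1_j, \nabla\varphi(\tilde x)) = \sum_{j=1}^{m} a_j \left[W_{\Lambda_n,q}(\eta_{(m)}) - W_{\Lambda_n,q}(\eta_{(m-j)})\right] = \sum_{j=1}^{m} a_j \int_{[\eta_{(m-j+1)}, \infty)} dW_{\Lambda_n,q}(s) \le 0$

by (i). So if $\Lambda_n$ satisfies (i) and (ii), then it maximizes (6.1) over $\mathcal{L}$. Now suppose that
$\Lambda_n = \arg\max_{\Lambda \in \mathcal{L}} l_n(\Lambda)$. For $\tilde x = (\Lambda_1, \dots, \Lambda_m) \in S$, we can express the maximizer of the log-likelihood
function with respect to $\tilde x$ as $\hat x = \arg\max_{\tilde x \in S} \varphi(\tilde x)$.

For $\varepsilon > 0$ we have $\hat x + \varepsilon 1_j \in S$ for all $1 \le j \le m$. Thus

$\lim_{\varepsilon \downarrow 0} \frac{\varphi(\hat x + \varepsilon 1_j) - \varphi(\hat x)}{\varepsilon} = (1_j, \nabla\varphi(\hat x)) = W_{\Lambda_n,q}(\eta_{(m)}) - W_{\Lambda_n,q}(\eta_{(m-j)}) = \int_{[\eta_{(m-j+1)}, \infty)} dW_{\Lambda_n,q}(s) \le 0, \quad 1 \le j \le m,$

since $\hat x$ is a maximizer. Since $\hat x + h \hat x \in S$ for all $h > -1$,

$\lim_{h \to 0} \frac{\varphi(\hat x + h \hat x) - \varphi(\hat x)}{h} = (\hat x, \nabla\varphi(\hat x)) = \int \Lambda_n(s)\, dW_{\Lambda_n,q}(s) = 0,$

because $\hat x$ is the maximizer of $\varphi$. Thus (i) and (ii) hold.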

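To prove uniqueness of the estimator we need to preview Proposition 2.1 in the
Appendix. It provides an alternative characterization of the maximum likelihood
estimator as the left derivative of the convex minorant of a self-induced cumulative sum
diagram. Since two consecutive vertices of the convex minorant always include a point
that corresponds to a left ($\delta = 1$) or interval ($\gamma = 1$) censored observation, it is enough
to prove uniqueness of the M.L.E. on this set of observation points only.

Suppose that, besides $\hat x = (\Lambda_1, \dots, \Lambda_m) \in S$, there exists another maximizer of (6.1),
call it $\tilde x' = (\Lambda_1', \dots, \Lambda_m') \in S$. Then a second order Taylor approximation gives

$0 = \varphi(\tilde x') - \varphi(\hat x) = (\nabla\varphi(\hat x), \tilde x' - \hat x) - \frac{1}{2} (\tilde x' - \hat x)^{T} M(\bar p)\, (\tilde x' - \hat x)$,

where $M(\bar p) = -[\nabla_{ij}\varphi(\bar p)]_{i,j=1}^{m}$ with $\bar p$ a point on the segment between $\hat x$ and $\tilde x'$.
The matrix of second derivatives has diagonal entries of the form

$\nabla_{ii}\varphi(\tilde x) = \frac{\partial^2 \varphi(\tilde x)}{\partial x_i^2} = -\sum_{l=1}^{n} \left[ \delta_l \frac{e^{2q_l - \Lambda(T_l) e^{q_l}}}{(1 - e^{-\Lambda(T_l) e^{q_l}})^2} 1\{T_l = \eta_{(i)}\} + \gamma_l \frac{e^{2q_l - \Delta_l e^{q_l}}}{(1 - e^{-\Delta_l e^{q_l}})^2} 1\{T_l = \eta_{(i)} \text{ or } U_l = \eta_{(i)}\} \right]$,

with $\Delta_l = \Lambda(U_l) - \Lambda(T_l)$; the right censored observations enter (6.1) only through linear terms.

It follows that $\sum_{i,j} (\Lambda_i' - \Lambda_i)(\Lambda_j' - \Lambda_j) \nabla_{ij}\varphi(\bar p) = 0$, from which we obtain $\Lambda_i' = \Lambda_i$ for all $i$ such
that $\delta_{(i)} = 1$ or $\gamma_{(i)} = 1$. This establishes the uniqueness of our estimator. Theorem 6.1
is now proved.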
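
Proposition 2.1 is stated in the Appendix, and its self-induced diagram is not reproduced here. The following Python sketch therefore shows only the generic building block. It computes the left derivatives of the greatest convex minorant of a cumulative sum diagram with points $P_0 = (0, 0)$ and $P_j = (\sum_{i \le j} w_i, \sum_{i \le j} v_i)$ by pooling adjacent violators. In an iterative scheme, such as the iterative convex minorant algorithm of Groeneboom and Wellner (1992), the weights and values would be recomputed from the current estimate. That outer loop is not shown, and the function name is hypothetical.

```python
def gcm_left_derivatives(weights, values):
    """Left derivatives of the greatest convex minorant of a cumulative sum diagram.

    The diagram consists of P_0 = (0, 0) and P_j = (w_1 + ... + w_j, v_1 + ... + v_j),
    with all weights positive. Adjacent blocks are pooled while their slopes fail to
    increase; the result is nondecreasing and has one slope per point P_1, ..., P_m.
    """
    blocks = []  # each block: [total weight, total value, number of points]
    for w, v in zip(weights, values):
        blocks.append([w, v, 1])
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][0] >= blocks[-1][1] / blocks[-1][0]:
            w_last, v_last, c_last = blocks.pop()
            blocks[-1][0] += w_last
            blocks[-1][1] += v_last
            blocks[-1][2] += c_last
    slopes = []
    for w, v, c in blocks:
        slopes.extend([v / w] * c)
    return slopes
```

For instance, the diagram through $(0,0), (1,3), (2,4), (3,6)$ has the segment from $(0,0)$ to $(3,6)$ as its convex minorant, so every left derivative equals 2.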

REMARK 6.2. As a result of Proposition 2.1 in the Appendix, conditions (i) and (ii)
always hold. This, along with the consistency result ($\theta_0 \in \Theta^\circ$, $\hat\theta_n \xrightarrow{P} \theta_0$), confirms that
the M.L.E. $\hat\Lambda_n(\cdot) = \Lambda_n(\cdot, \hat\theta_n)$ always exists for $n$ sufficiently large.

REMARK 6.3. If we set $\hat F_n(\cdot) = 1 - e^{-\hat\Lambda_n(\cdot)}$, then we obtain a nondecreasing, right
continuous step function satisfying $\hat F_n(0) = 0$ and $0 < \hat F_n(\eta_{(i)}) < 1$ for all $1 \le i \le m$. This
function is defined explicitly as a result of Theorem 6.1 and has jumps at the same
points as $\hat\Lambda_n$ does. The monotonicity of the transformation $\Lambda \mapsto 1 - e^{-\Lambda}$ implies that
$\hat F_n = \arg\max_{F \in \mathcal{F}} l_n(F)$, with $\mathcal{F} = \{\text{subdistribution function } F : F(0) = 0,\ 0 < F(\eta_{(m)}) < 1\}$.

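REMARK 6.4. We can treat the problem of maximum likelihood estimation in the
linear regression model (2.2) in the same way. To justify Step 1 in the profile likelihood
approach, we need to consider

$\tilde T_i = T_i - \theta' z_i$, $\tilde U_i = U_i - \theta' z_i$, $\delta_i^\theta = 1\{\varepsilon_i \le \tilde T_i\}$, $\gamma_i^\theta = 1\{\tilde T_i < \varepsilon_i \le \tilde U_i\}$.

The following theorem is the direct analog of Theorem 6.1 for the regression model
(2.2). We state it without proof.

THEOREM 6.2. For a fixed $\theta \in \Theta$, suppose that $\delta^\theta_{(1)} = 1$ and $\delta^\theta_{(m)} = \gamma^\theta_{(m)} = 0$. The
following conditions are sufficient and necessary for a function $F_n(\cdot, \theta)$ to maximize (4.6)
over $\mathcal{F}$:

i) $\int_{[t, \infty)} dW_{F_n}(s) \le 0$ for all $t$, and ii) $\int F_n(s, \theta)\, dW_{F_n}(s) = 0$, with

$W_F(t) = \sum_{\tilde T_i \le t} \left[ \frac{\delta_i^\theta}{F(\tilde T_i)} - \frac{\gamma_i^\theta}{F(\tilde U_i) - F(\tilde T_i)} \right] + \sum_{\tilde U_i \le t} \left[ \frac{\gamma_i^\theta}{F(\tilde U_i) - F(\tilde T_i)} - \frac{1 - \delta_i^\theta - \gamma_i^\theta}{1 - F(\tilde U_i)} \right]$.

Moreover, $F_n(\cdot, \theta)$ is uniquely determined by i) and ii).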

Chapter 3

STRONG CONSISTENCY OF M.L.E.

In Chapter 2 we defined the maximum likelihood estimators $(\hat\theta_n, \hat\Lambda_n)$ and $(\hat\theta_n, \hat F_n)$
in the Cox and the linear regression models respectively. In the next two theorems we
prove their consistency under a suitable topology on the parameter space. Pfanzagl (1988)
proves consistency of the NPMLE under a global condition (compactness of $\Theta$) and a
local condition (continuity of $\theta \mapsto f_\theta$ in a neighborhood of $\theta_0$). Van de Geer (1993)
obtains consistency of the NPMLE with respect to the Hellinger distance using
entropy calculations. Although her results cover a much wider range of applications and
in certain cases lead to rates of convergence, we adopt Pfanzagl’s approach here, since it
is more direct and better suited to the semiparametric nature of our problem.
Theorem 1 presents the consistency result for the Cox model, while Theorem 2 is its
direct counterpart for the linear regression model.

For the Cox model described by (2.4.2), a density of Pg with respect to u = v2 x 1:2“,
is given by

1—1—

_ , a _ , _ , __ ,2 5
Pt“) ___ (1_ F(t)ocp(e 2)) (Forms 1) _ F(u)=xp(ez))1 (F(u)cxp(e )) h(t,u/z)w(z)
with é = (6, F) e E E G) x .7, v2=counting measure on {0,1}®2 and 13d Lebesgue

measure on R" . Deﬁne

22

23

My) a 6(1- main)” (F'(r)“"‘°"’ - Form) +(1 - Y — gamma).

Let SH be the support of H and deﬁne a0 Esup{x:Fo(x)=0} , bO Einf{x:Fo(x)=1}.

THEOREM 1. (Consistency in the Cox model)
Suppose that (i) O is a subset of Rd with bounded closure .
(it) 90 e G, the interior of 6.
(m) V 9 it e, Pr{e Z ¢ 902} > 0
(iv) V (t,u) ES”, 05a0 <t <u<bo.
Then 8,, —‘”—>60 and Ii(y)i>ﬁ(y) Vy 6 En CFO, under
P0 5 Pow-1,), where C F0 denotes the set of continuity points of 11; and E = (do, be) . In

the special case that 143 is continuous on E, the above convergence is uniform with

 

probability one. i. e. sup P: (y) — F}, (y)| —“'—) O.
yeE

REMARK 1. Assumptions (i)-(iii) are essentially the same as in Pfanzagl (1988).
While assumption (iii) safeguards against non-identiﬁability problems, (iv) is naturally
satisﬁed in almost all problems with interval censored observations. Moreover (iv) is

essential here since the MLE F‘

I1

is uniquely deﬁned on S H and it is a step function on
[0, b0). Finally Pfanzagl’s assumption of a concave density with respect to the unknown

parameter, seems unnecessary here. He used it in order to verify condition (2.6), page

24

140, in his Lemma 2.5 -originally due to Wang (1985). We show that a more direct way

is indeed possible.

Proof of Theorem 1. Consider the measurable space (EA?) with it the Borel o—ﬁeld.

Let m = { measures F on (E,d§) : F (E)Sl }. We have deﬁned earlier .7 to be the set
3 = {F= distr. function: 5(0) = 0, o < F(n(,.)) < 1, F(n,) — F(n,._,) > 0 Vi 6 {1,2,..., m}}.
Note that 7 c m. Equip m with the topology .7, of vague convergence, i.e. the smallest
topology that makes ﬁmctions of the form F 6771 1—) I de continuous V f e C‘(E), the
space of continuous functions with compact support. Helly’s theorem asserts that ( m, .7, )
is a vaguely compact topological space. Equip (9 with the usual Euclidean topology .7,.
Then (6 x m, .7 ) is compact in the product topology .7 = .71 x .72. We say that
(9", E)-'"—’>(9, F) if and only if 9,, —) 9 and E, —'> F, with the latter denoting
vague convergence.

Fix an arbitrary ae(0,1) and let F 6.7. For 5,: (6, F)¢(90, 1%) = £0, E E Ego ,

f a j; , f0 5 ft. , Jensen’s inequality applied to the strictly concave function
<p(y) = log[l + a( y — 1)] ya 0, along with (iii) gives

(1.1) ”(28% ¢iE1€((:)l

= <p(j f(y)u(dy)) s logl = o.

 

 

By the concavity of the log function and the fact that an is the MLE of i , we obtain

25

1,,(é )> 1,,(E,o) which implies glog%:%20 and

(1.2) go“0E:;]=§1ogg[1+a(ﬁ:(:3—1J]

(
=§log[a;:[ —(—:;]+ +(1— a)]_>_ aglogﬁgg 20.

 

 

Consider now a collection {71,(g): e > 0} of nested neighborhoods of g. Let 71E E 719(é)

andf

(y): sup fg (y) Notice that §—) f§(y) is continuous under .7 for u—ae y, and

é'eﬂ.
bounded above by 1. Thus Z(y) lo f (y) for u-a.e y.
We want to prove the measurability of y 1—> Z( y) . Notice that for We denoting the

closure of 71 under.7, we can ﬁnd @671 suchthat f()= sup.f§(-). If £6715 then

6571‘
measurability of y 1—> f§( y) holds and there is nothing to prove. However if E 6 ‘71E \71E
then El (5,")"5" in 718 such that §,—’>E. Thus f§(y)—&+f§(y) and by the

completeness of u, fg is dimeasurable. It follows that

Metal) ..

 

If (MkzkeN) with Mk>0 Vk and MkToo then by the Lebesgue dominated

convergence theorem we obtain

26

lsigiE[(p[%]/\ 114,]: E[(p[£((:))]f\ Mk] Vk.

ﬂy)
fo(y)

convergence theorem and (1.1) give

mm 7.0) A = im f(y) A
11~E111101y11 “1 11911111111 M11

 

 

Moreover since (p[ )A M k 2 log(1— a) Vk , an application of the monotone

 

 

 

Thusfor e>0 small 3 k-=-k(e):

Zm A
(1.3) E[<p[fo(Y)] M,]< 0.

Let .8 be an arbitrary neighborhood of £0 and set E. = 5 x 771. Since .19' : E .\.8

 

is com act in .7 , any open cover of .8’ contains a ﬁnite subcover. Let U : 5, e49 be
P 5

one such cover of .8'. Then we can ﬁnd {gpfn ..... am} 619' such that 19’ c UUJ

I81

for U] E Uél' Therefore from (1.2) we obtain,

{E.

c {212:1 11:11:11: —::11 21

forsome M >0, with f: —-.supfg

561/;

v—‘ﬁ
J“)
a

ﬁt

b
\W_l

0
C:3

m
S
v

3

 

:C3
as

 

27

7,111)
fem

 

Set 155,. =(p[ JA Mj. The random variables {153.121} are i.i.d \7’j e{l,2,...m}.

Moreover le’J —El’}'J| S M]. —log(1—a) and as we have shown in (1.3), Eljf, < 0.
Applying Hoeffding’s inequality to the sum of centered, bounded random variables

If, — EY’ we obtain

1.1 1

111%. es}s‘:&{i<m —EY;,.-)2—nEY;,.}

j=l is]

S iexp(—2(—nE l1, )2 /4n( M j — log(l — a))2)
j=l
S iexp(—nB jz /2(M0 —log(1—a))2) S mexp(—nB 02 /2( M0 —log(1—a))2),
j=l
where 13,. = E17, < o Vje{1,...,m}, [so = {<11}, M0 = RIM].
J: J:

Thus 21$," .21} <00 and Borel-Cantelli gives

"-1
HAZE" e .8 eventually} = l .

Althought the above argument seems to depend heavily on the choice of .8 , it holds in a

straightforward manner for any such 8. This is true because the parameter space has a

countable base. It follows that E” = (6n, E)#—)(90, P3) a.s. Po"'3 and the theorem

is now proved.

REMARK 2. One of the advantages of Pfanzagl’s ideas incorporated in the above proof

is the lack of dependence from the exact form of the MLE. This is extremely useﬁrll

1

28

particularlly in cases where a close form solution of the MLE is very difﬁcult or
practically impossible to obtain. The models that we consider here contribute in the
above proof in three aspects. Firstly a topology on the parameter space is chosen that
makes sense from a statistical point of view. Secondly, both the global and local
conditions, necessary for the existence of the MLE and the veriﬁcation of Pfanzagl’s
assumptions, hold true in our models. Finally these conditions are easily veriﬁed in real

life applications with interval censored data.

COROLLARY 1. Under the assumptions of Theorem 1 and for A" E — log(l — 13;) we
obtain
A.(y)L>A.(y) Vy eEn CF,

ix.(y)—A.(y) J—m.

 

and in case that F; is continuous on E, sup
yEE

We now turn to the linear regression model (2.2.2). A density of I; with respect to

u = v2 x rm is given by

pg (y) = (F(t —Bz))6(F(u — Oz) — F(t - 92))7 (IT-(u -— 92))H—8 h(t, u

 

2M2),
with I; = (9, F) , v2=counting measure on {0,1}®2 and rd Lebesgue measure on R”.
Let j;(y) =8F(t - 62) + y[F(u — 62) — F(t — 92)] + (l — y - 5)F(u — 92), with

7;, = T—GoZ , Uo = U —60Z . Denote by V, J the joint distributions of (73, U0) and

29

('1', (1.2) respectively and by S, the support of V.
THEOREM 2. (Consistency in the Linear Regression model)
Suppose that: (i) O is a subset of R” with bounded closure
(ii) 60 60, the interior of 6)
(iii) V 9 at 90 Pr{9Z¢ GOZ} > 0
(iv) V (t,u) eSV, 05a0 <t<u<b0,
with a0 = sup{x:F;,(x) = 0} and b0 = inf{x:F;)(x)=1}.
Then
ﬁnite.) and 13;,(y)—i->Fo(y) Vy 6 En Co,
under Po E P(6..Fo)’ where C F0 denotes the set of continuity points of IS and

E = (a0, be). In the special case that If; is continuous on E, the above convergence is

 

uniform with probability one, i.e sup FA; (y) - F0 (y)‘ —"‘-) 0.

yeE
Proof of Theorem 2. As in Theorem 1, we endow the parameter space with the product
topology .727, x .7,. Continuity of §1—> f§(y) holds for u—a.e y, under .7 and the proof

goes through as in Theorem 1.

REMARK 3. In his thesis, Huang (1994) gives a proof of the consistency of M.L.E.
with case 1 interval censoring, based on a uniform law of large numbers (generalized

Glivenco-Cantelli) for V.C. subgraph, classes of functions. This nice proof is based on

30

the speciﬁc form of the model under consideration - as opposed to the proof we gave
earlier-. Moreover it introduces some useful technics - largely due to the powerﬁrl results
from the theory of empirical processes. This proof can also work in case 2 interval

censoring, with some modiﬁcations. We present it here for our linear regression model.

Second Proof of Theorem 2. Using the same argument as in Theorem 1, we will treat
(5 x 771, 7) as a compact topological space with .7 the product topology. Let
Q={w =<yn>nsNz y]. =(8j,yj,tj,uj,zj)}, 01 the Borel c-algebra on Q, R, a P:o the
true underlined distribution, (0,8,1?) the corresponding probability space. Let P" be

the empirical distribution based on i.i.d. observations Yl , Y2,...,lf,. From the strong law of
large numbers (S.L.L.N) and a separability argument
11;”{oaz P.(y;w)—,> m1} =1 Vy = (5,7.t,u,2)-
Now ﬁx an a) from the above set and denote Q, E Q" (co) , 1:105 [ix-,co).
For every subsequence (n’) c (n), 3 (n") c (n') and a (9.,Fl) O x 771 , such that
(2.0) 6,. —» e. and F —"+ F..
We want to show that 9. a 00 and F2 E Fo. Deﬁne

a" =inf{t—énz : (t,u,z) eSJ}, b" = sup{u—é,,z : (t,u,z) eSJ}

and let a. ,b. denote their counterparts when 9. replaces é" . At ﬁrst we would like to

prove

 

31

(2.1) Va,be(a.,b.) with a <b, 3114,,M,: o <Ml sF,(.:)sF,,(b)sM, <
forlarge n.

The maximum likelihood property and the deﬁnition of F; give
(22) Vn I. (to) s I. é.) s 111a...1(’-é»215 logﬁm —é.z)dt.(y)

5 log 2%)] 1[.,,.,](r 43.216 dP.(y)
while the S.L.L.N asserts that

("(éo) LR” 15] { F.3(To) 1011113031] +[1'3(Uo)-F3(75)] 10g[F6(Uo)-1%(75)] +

2.
3) +F1(Uo)log{F1(U.)] 120.

Now using the convention O-(i co) = 0, assumption (iv) and the relation xlogx 2 —%
for x e(0,l), we obtain

(2.4) 10g13‘,(a)l 1[,M](t —é,,z)5 dI’,,( y) z -2 for large n, as-Po.

Similarly

(2.5) log[1— F,(b)] l 1“ “(u—é "2)(1 —5 —y )dPn (y) 2 -2 for large n, as-Po.

Now notice that the sequences f,,( y) = 1[&..a] (t —énz) , g" (y) = 1[u;,](u —énz) with

y = (t, u, z), belong to a VC-subgraph, class of functions. Apply the Generalized
Glivenko-Cantelli Theorem (G.G.C.T) for half-spaces to obtain

(2.6) (P, — Po)l[a_'a](t —é,z)—“'—> o, (F, — 1;)1lb “(u—é nz)—'“+o and

(P, — 13,)1[b_5.](u-é nz)1W_](r -6 .2) Am.

Bounded convergence theorem implies,

32
(2.7) P, y ilwot — énz)l[a”&'](t — énz) —"—) 1:, y 1”.”(21 — 9.2)1[a”m](t — 9.2)

R) (1— Y —5)1[b.5.](u — énz)—,,) R) (1‘ 'Y _ 8)1[b,b,](u " 9'2) = EJFb(Uo)l[b‘b,](U') > O

and 1:, 511m ](t-é,z) —.> F0 81[a"a.](t—9.z) = EJF5(73)1[0"0_](72) > 0.

This is true by assumption (iv) of our theorem, the deﬁnition of a., b. and the fact that
a. < a < b < b.. Notice the use of some extra notation, 7: E T—9.Z , U. E U -G.Z.
The combination of (2.4) and (2.5) with (2.6) and (2.7) proves (2.1).
Now let’s consider the following empirical processes
B.‘ = I 1...](r 43.218 loam —é.z)dP.<y) ,

B: = Illabla —énz)1[al,](u—énz) y log[13;,(u—énz)— 13",(t —énz)]dPn(y)

B: = 11...](u—é.z)(1—6 —t)log[1-&<u—é.z)]dP.ty).
Denote by B1, B3, B3 the corresponding integrals obtained by substituting
6, «10., F; <-> F. , P, a PO. The combination of G.G.C.T, (2.0), (2.1) and bounded
convergence theorem gives
BﬁLBi j= 1,2,3.
Although this convergence is straightforward for j = 1,3, it deserves a closer look when
j=2. We need to check how the G.G.C.T. applies. Notice that if

1W](t -é,,z)l[aﬁ](u —é,,z) = 0 eventually, then B: —L) B} E 0. However if

1w)“—énz)l[a»](u—énz)=l eventually, then B: as a sum over the “relevant set” of

33

observations J 9'

n 3

involves only terms for which log[F,,(n?" ) — FLU)?" )] > -oo for some
i >j , (REMARK 2.6.3 and 2.6.4). Therefore in this case, F,(b) > F,(a) since [a, b]
contains at least one element from J?" , eventually. It follows that
B: 5 mm. — M.)Il[,,,,](t ~é.z)1[,,](u -é.z) 7 d(P. — Foxy)
+ 11...,0 —é.z)1,.,,,,,(u—é.z) 1’ Iog[ﬁ‘.(u—é.z) —F.(r —é.z)] dIMy),
which implies that B: i) B} .
A 3 . 3 , 3 .
Since I,,(§0) 3 mg") 5 EB; Vn, EB; 49231, I,(§o)i>C, it follows
j=l j=l H

3
that ZBJZC Va.<a<b<b.. Taking aim, bTb. inthelastrelation,weobtain

j=1

Pg. (Y)
pg, (Y)

 

E0 log E-K(P0,P,.)20, with K(P,Q)20 denoting the Kullback-Leibler

distance between two measures. So K(Po, 11.): 0 which in turn implies R, E P, .

Therefore F.(t.)=F,',(to) and F.(u.)=Fo(uo) a.e. J, while assumption (iii) gives

6. E 9,, and F. E E, a.e V. This concludes our proof.

Chapter 4

INFORMATION THEORY

1. Efﬁcient Scores and Information Bounds: In this chapter we calculate the efﬁcient
score and the information lower bound for the ﬁnite dimensional parameter in the Cox
and the linear regression models with interval censoring. This calculation is of
imperative importance since it asseses the degree of difficulty of the estimation problem.
Positive information is a necessary condition for existence of efﬁcient estimators. In
most parametric and semiparametric models including the Cox model with right

censoring , estimators of the ﬁnite dimensional parameter were ‘easily’ obtained at the

usual J; rate - see the excellent treatments by Fleming and Harrington (1991),
Andersen, Borgan, Keiding and Gill (1993) - However it is not obvious that results
obtained under the familiar right censoring scheme, could be extended to interval
censoring models in a straightforward manner. The problem is that the inﬁnite

dimensional parameter in the interval censoring model cannot be estimated at the usual

J; -rate. Although in the current context we consider the inﬁnite dimensional
parameter as ‘nuisance’, its inﬂuence in the overall estimation problem remains strong.
The information calculation sheds some light in the prospects of efficient estimation and
in the following theorems we prove that the information for 9 in the two models under

consideration is indeed positive under reasonable assumptions. For many interesting

34

35

results and more details on the estimation of the ﬁnite dimensional parameter in
Semiparametric models, we found the recent monograph by Bickel, Klaasen, Ritov and
Wellner (1994) to be a valuable source.

Before we proceed to the main results in this chapter, we state a few deﬁnitions and

auxilliary results that will be used in the following sections.

2. Preliminaries: Let 9 c R” be an open set. We call model, a family of probability

measures 0’ : {P9 :9 66)} on a measurable space (Ill) and experiment the triplet
8 : (1221,00). Let p. be a measure on (I, it) and suppose that the family 00 is dominated

with 1.1-densities p6 (~) . The model 0“ is said to be Hellinger- L201) - diﬁkrentiable at

. . 2
9 GO if there is a function 19:1 —) R" , H19 (x)l dPe (x) < 00 such that

Raﬁ/[Teena pe)2dp=0(|h|2), h—>O .
The Fisher information matrix is deﬁned as [(6) 441'ng . We say that 8 is regular
in ('9 , if 0° is: (i) dominated with 1.1-densities pe (~) , (ii) Hellinger differentiable with
derivative i9, (iii) the function e H p9 (x) is differentiable for u-a.e. x.
Let’s ﬁx a measure P0 60’ and let a elg(&)={h 617(1’0): Itho =0} be an

arbitrary function. Without loss of generality we might assume that a is bounded.

36

Otherwise for some M>0 we can work with aM(x) = a(x)l{laI 5M} -J‘a(x)l{laI sM}dR) (x).
For 9 one dimensional, consider the parametric family,
ft) (x) = Po(x) eXp(90(x) - b(9)),

with 12(0) = log” e""("’ciP0 (x)]<oo V9 69 , since a is a bounded function. Notice that

{f9: 9 e O} is an exponential family -thus regular- passing thru p0 with score function at
9 = 0 given by a(x) a % log fe(x)|9=o. In this way we establish an association between

[g(ﬂ) and regular parametric models. This association plays a fundamental role in the

computation of efﬁcient scores and inﬂuence functions for semi or non parametric

models.

3. Information lower bound: In this section we establish the conditions under which
the information for the ﬁnite dimensional parameter in the Cox and the linear regression
models is positive. In addition to the notation introduced in earlier chapters, we add some

more here to accomodate the needs of the current section. We start with the PH model.

E{Ze2°'zo,(2)| T = r, U = u}

 

 

(p(t,u) = E{e29'2q(z)l T= LU = u}
___r_(.._)_ ELL)— .- 0'“)
a(zl—sz) 0u<>-1-r(u|z) R()‘a(z)-o.(z)

37

, )_ E{Ze2°'ZR(Z)0.(2)| T = t,U = u}
w( ,u — E{eze'ZR(Z)0,(Z)| T=t,U= u}

 

A.,.(z)=R(z)(1+0.(z)) Q.(y)=e°"{60.(z)—vA..(z)} a(y)=e°"{iA.,.(z)—<1-a)}.

THEOREM 1. Suppose that
(i) v (t,u) es, o_<_ao <ot <t<u<B<bo forsome oi <5.
(ii) 3 M >0: Pr{|Z|SM}=1.
Then
(a) The eﬂicient score function for e is
i.i(y) = lz—<p(t.u)llA(r)Q.(y)+ A(u)Q.(y)i+l<p(t.u)— w(r.u)llA(u)— A(t)loz(y)-
(b) The information for 9 is
1(9) = E[1'; I” =
E.(Z— w)®2[A(UlZ)- A(712)]2 0.,(Z)R(Z)+ EJ(Z-<p)m/\(TlZ)20r(Z)

which is positive deﬁnite, unless the distribution of Z is degenerate at some point 20.

REMARK 1. With Pr{T=U}=l and the convention 0 x 00 = 0, our Theorem 1.2 gives
Theorem 2.3 of Huang and Wellner (1993). In their paper, these two authors provide a
survey of regression models, under a censoring scheme consisting of a single inspection,

(a situation that Groeneboom and Wellner (1992) call case I interval censoring).

38

Proof of Theorem 1. Without loss of generality we might assume that d=l. Let
y = (5,y,t,u,z) <a{0,l}2 x Rf x R, be ﬁxed throughout. Without causing confusion, we
do not exhibit the argument when using the functions (p, w , 0, ,0", R, A, Q,, Q. The
log-likelihood function up to an additive constant is
l(6,A; y) = 510g(1—e"A("e& )+y log(e"‘(’)e& — em”)? ) — (1 —y — 5)A(u)e92 .
The derivative of l(9,A; y) wrt 9 gives the score function for 9 ,
la (y) = zA(t)Ql + zA(u)Q2 .

Fix a distribution F with Lebesgue density f and for a e L‘,’(F) consider a regular
parametric family it : {fn:|n|<1} c .7, ={fzo:jfdt=1}, passing through f. Then

a(t) = -&a—']logf,1(t))n=0 and £17110» “:0 = FidF .

The score operator for f is obtained as the directional derivative of the function

n—->l(9,F;];-) at n=0. It is givenby

 

, QeeszIdF AeezfadF (A—1)e°z_[?1dF effiieur
(1.1) lfa(y)=—5—F_7(t—)——+y F(t) — F(u) +(I—Y—O)—?(-J—

[2dr EMF

= " 9* ro— ‘ 92 it?)
Moreover the assumptions of the theorem guarantee that both is, lfa are Po-square

integrable.

The efﬁcient score function for 9 is deﬁned by

39

(1-2) ie.(y)=ie(y)—ifa'(y)

with l f a. being the projection of [9 onto the closure of the space spanned by

{lfazaeLg(F)}.

 

 

Let Vl(a.) = zA(t) + _

Then

f”

—UMm-iaode=n)

[aw . M+aan(a M]+L(:ﬂaw 2( )+aa.(M
Notice that
E(Q,’|T,U,z) = e2°zo,R, E(Q,Q,|T,U,Z) = —e2620(,R, E(Q22|T,U,Z) = emOUR.

Thus

2dr“
-E(19(Y)-I,a.(Y))l a(Y): E H{£1T_T)E[e 262R(0T K(a.)—0,,Vz(a,))|7‘,u]}+

I: 262
13,2{F‘1(-—U—)E{eR0u(V2(m-) Wilma]

Wewanttohave E(l9(y)—lfa.(y))lfa(y)=0 VaeLg(F).

a(t,u)=E[e2°ZR(0iK(0o)-0..V2(ar))|T="U=“l

Let
r2(t,u) = E[e2°ZR0,,(V2 (a.)- V,(a.))IT = t,U = u]

Set r,(t,u) = r2(t,u) = 0.

It follows that

 

Ia2dF Ia dF

(1.3) giW-ga ;,( ) -f2 (u)-f.A(t)
ﬁnd}? Rd}?

(1.4) g2 3%) - 'nr) =f2[A(t)—A(u)],

with f,(t,u) = E{Ze2°ZRO,|T = t,U = u} g,(t,u) = E{e2°ZR0,|T = t,U = u}
f2(t,u) = E{Ze2°ZROu|T = t,U = u} g2(t,u) = E{emR0u|T = t,U = 2,},

Solving (1.3) and (1.4) we obtain

(1.5) [2am = —oA(r)F(t) and £21.dF = [(o — tp)A(t) — uA(u)]F(u).

 

Now by assumption (ii) we can easily verify that LEAF! S MA(t)Iv—‘ (t) —) O as t —> 0.

Moreover I "an15“ —) LEAF as r —> o . Thus a. 6 L307). Now we can use (1 . 1) and
(1.5) to obtain from (1.2) the efﬁcient score function for 9,
1.9.0!) = 1'2 (y) - i f “—0)
=(2 -<p)[A(t)Qi + A(u)Q2] + (<P -\v )[A(u) - A(1')]Q2-
Finally the information for e is
151;;2 (1’) = E6 (Z —(p)2 [A(T12)2 OTR + A(U|Z)2 OUR — 2A(T12)A(U|Z)0U R] +
EG((p -w )2[A(U|Z) — A(71Z)]2 OUR +
2Eo(Z -<p)(<P -w )[A(U|Z)-A(712)]2OUR =

Eo(Z-W)2[A(U|Z)-A(YlZ)]20uR+Ea(Z-<p)2[/\(TlZ)]202-

The theorem is now proved.

41

Before we turn to the linear regression model, we add here some more notations.

8 y y _1—8—y

(210’): F(t—92)—F(u-92)-F(”92) ’ Q20): F(u—Gz)-F(t—Oz) F(u—Gz)

 

 

~02

V(-,z) = zf(- — 92) + j a.dF, k(r,s) = E(Z|T— 92 = r, U— 92 = s)

 

 

 

Al(r,s)= l + 1 ,A,2(r,s)=-——i——,Az(r,s)= 1 +_1 ,r<s
F(t“) F(S)-F(r) F(S)-F(r) F(S)-F(r) F(S)
l 1 l 1
Elm—Wm ’ 32‘“)— F(s)—F(r) ‘ m

For a ﬁxed distribution function F and 9 69, let

a0 = sup{x:F(x) = 0}, b0 = inf{x:F(x) = 1}, (T, U, Z) ~ J, (T—BZ,U—GZ) ~ .8.

THEOREM 2. Suppose that

(1) F is a strictly increasing distribution with Lebesgue density f satisﬁing
f (S ) V 0 .

(it) El M>O : Pr{|Z|SM}=1.

(iii)V(t,u)eS, O_<_a0<0t<t<u<[3<b0 forsome 0L<B.

Then
(a) The efﬁcient score ﬁmction for 9 is

igo)= —[z - k(t -922u-92)][Q.(y)f(t -92) + Q2 omu -ez)].
(b) The information for 9 is

1(9)=E{IS(Y>]@ =E2[[Z-£W;2U2)]@[f(7;)a +{/(U2)JI2 «70min,

with T, = T-ez, U, = U-ez, A] aAj(7;,U,) Vj.

which is positive deﬁnite, unless Z = E(Z|7;,U#) a.e. ti.

42

Proof of Theorem 2. We follow the pattern established in Theorem 1. Let #1.

Arguing as in section 2, for a given (bounded) function a e 192(17), we can construct a

regular parametric family it passing through f. Let

1t : {ﬁ‘zln|<l}c3,={f20zIfdt=l}. Weobtain a(t)=%logfn(t)|n=o2

F'n(t = =0J‘IadF and since a e 12(F) we also have %F(’)ln=o = IadF.

6
534.1
The log-likelihood function up to an additive constant is

l(6,F;y) = 510gF(t—Oz)+y log[F(u —Gz)— F(t-92)]+(1—5 —y)logF(u—Oz)
and the associated score function for 9 ,

1'29) = -2f(t - 92)Q.<y) — 2f(u — 92)Q2o).
The score operator for f is obtained as the directional derivative of the ﬁmction
n —) [(911]; ) at n = O.

(2.1) ifa(y) E Ef(a(X)|Y= y)

(~62 11-6: 00

 

 

JadF IadF IadF
= -ao t— 6_z 14-62
8 F(t—92)+y F(u— 9z)- F(t— 9) +0 5—Y)F(u —9)
t—Gz 11-62

= a(y) JadF+Q2(y) JadF = %I(9,E,;y)ln=o~

Then

t-Oz

—(i2tv) - i,a.o))i,a(y> = [(25 (y)V(t22) + Q.(y)Q2o)V(u2z)] j adF

43

14-82

+ [elm/(24,2) + Q. (new) V(r,z)] 1 MR

Easy computations show that
E{Q.’<Y>IT,UJ}= Alma), E{Q§(Y)IT,U,Z}= Axum)

and E{Q.<Y)szlr,U,Z}=—A,2(7;,Ut).

Thus

T-GZ U-OZ
—E(19 - i,a.)i,a = 121 J MI W, Z)A1 - Val, mm] + E, IadFI V(U,Z)A2 - V(T,Z>A.2]

—oo

r T—9Z=r T-BZ-—-r
= E1IadF|:A‘(r’S)E{V(T,Z)|U-GZ = 5}" A12(r,S)E{V(U,Z)'U_GZ = 3}]

s T
+ E1JadF[A2(raS)E{V(U’Z)U_QZ=s U‘OZ:S

 

—BZ=r} { IT—OZ=r}]
_ A12(r,s)E V(TaZ) ,

with l the joint distribution of (T -—- OZ, U — 92). To satisfy the orthogonality condition
5(1}, — i,a.)i,a = 0 Va 6 13m,
set the two brackets in the preceding equation equal to zero and solve the system of

equations to obtain
(2.2) jadF = — f (r)k(r,s) and I'mdF = — f (u)k(r, s).

From (i) and (ii) we obtain

 

I a.dF“ s Mf(r)—:;;—~>O. Since J” a.dF —-) I” a.dF as r —> 00, we conclude that

a. e L3(F). Now from (2.1) and (2.2), the efﬁcient score function for 9 is

44

1;; (y) = iety) — ya. (y)
= —[z — k(t — oz,u — 9a)][Ql (y) f(t - 92) + Q2 (y) f(u — 92)]
and the information at e,
1(6)=£{i;(Y)]2=EJ[[Z—E(217;,zt)]2[f2(m +f2(Ut)Az -2ﬂ7;)f(l/;)An]] ,
with T, = T—ez, U, = 11—92, A] a rel/(mug) Vj.
Notice that strict monotonicity of F and assumption (iii) imply that t# -+ 0(5) is a
strictly decreasing function. Moreover
mm + {my/17 —f(n)JBT}2=f2(I;)At ”We -2f(7;)f(Ua)Aa > 0,
if and only if 0(a) > 0(u,,) for n < tn. This shows that the information for o is

positive unless Z = E (Z / 7;,U,,) a.e. 1;. The theorem is now proved.

REMARK 2. We want to emphasize here that both theorems that we have considered
have immediate generalizations to situations where interval censoring is a result of an
inspection process with ﬁnitely many observations-inspections. It will be a matter of

future research the case of a random number of inspections as it is described in Wang

et.al (1994).

4. Estimation of [(90): In this section we provide an estimator of [(90) in both

models that we have analyzed in this thesis. This estimator provides at least an

approximation of the lower bound for the variance of (3,, in the Neonatal Brain

45

Hemorrhage application that we consider in Chapter 6. Let (Esp/i") , (émFg) denote

MLE’s of (9,A) and (9, F) in the two models respectively. We make use of the
notation introduced in previous sections with obvious modiﬁcations when the MLE

replaces the true parameter. To make sure that there is no confusion about the model to

which we refer, we denote by if and if the estimates of the information in the Cox and
the Linear regression model respectively. The major difficulty that we encounter here is
the estimation of the conditional expectation that appears in the expression for I (60).
The following special cases are particularly useful in the Cox model due to its special
structure and the form of the information measure. No simple expression for [(90) is

available in the linear regression model.

1) WW.

 

 

 

 

Then
E Zezez 0 292
Z-W(t,u)=Z— W{ 202 Rl,u(Z) 14(2)} Z—(p(t’u)=Z—EW{ZEOZ 0!(Z)}.
EW{e Rl,u(Z)0u(Z)} EW {8 0, (2)}
22.8%)?» (mt, <2.) zzWo <2.)
Let GIN. = 1:1" ‘ and (ﬁnJ = Fl" . _
236.2119,“(Zj)0"‘U'(Zj) 2826an ".7,-(2])
j-l i=1

Now we can estimate 1C (00) by

2

(4.1) if = ﬁitz, -\II..,.-)2[RAMBO-11.0320] 0 (2.91%..(20 +

i=1

ﬁe,- —q‘>.,.-)2 [847.7120] 0",. (2,).

i=1

46

WWW

Let r,,.(Z)=e2°ZR,,.(Z)0.(Z) and A,(Z)=e2°20,(2).

 

 

 

 

Then
EW{A,(Z)|T,U}= AT(O)Pr{Z = 0| T,U}+ A,(1)Pr{z = 1| T,U}
AO( O)_a___hs(T U) A (1)(l-a)fi(T,U)
T h(T, U) T h(T,U) ’
_ (l-a)ht(T,U)
EW{ZAT(Z)|T,U}—AT(1) MT’U) .
Similarly
__ ah0(T,U) (l-a)ht(T,U)
Ew{rr.t/(Z)|T,U}- rum—WM Nana) MU)
_ (1_a)h|(T’U)
and EW{ZFT,U(Z)|T9U}_ FTJ/(l) h(T9U)

with h,(t,u)=h(t,u|Z = i) i = 0,1. Now we can estimate (p,\u by

(1— a)A (1)12 .(t u)
aA M) ,..(t u)+(1- M. (012.. (t u)

= (I-a)r.(1)h.,.(r,u)
af.(0)h.,.(r,u>+<1-a>f.(1>h.,.(t,u)

 

<3n(t,u)=
and

 

\l7,.(t,u)

respectively, where hn,,(t,u)=hn (t,ulZ = i) i = O, is a kernel type estimator of
h,(t,u)=h(t,u|Z = i) i = 0,1. By appropriate choice of kernel and bandwidth h can be

’n,i

a consistent estimator of h, - see Silverman (1986). An estimate of [C(G) is given by

(41) 118ng Wu’crﬁn ‘

47

To estimate 1(9) in other cases than the previously noted, one has to estimate the
conditional expectations that appear in the information for the models that we consider.
In addition to that, the linear regression model requires an estimate of the density f .
Kernel density estimation might be a solution here, leaving us the burden to choose the

kernel and the appropriate bandwidth. In this case the estimator will have the form

(4.2) f.(t) = ii K[’,;“]dﬁ.(u).

To estimate the conditional expectations that appear in the deﬁnition of (‘1’,(p) and k in

 

the Cox and the linear regression model, we proceed in two steps. If the conditional
expectation has the form E(g9(T,U,Z,F)|1t9(T,U)) = g(ne(T,U)), for some known
functions genre , then
i) Approximate g9(T,U,Z,F) by gé.(T,U,Z,F:,) and 1136 by “é,'
ii) Employ tools from nonparametric regression to estimate g in
gé"(T,U,Z,Fn)= g(nén(T,U))+e , with E8 = 0.

Call the resulting estimator gn(T,U). Now a meaningful estimator of 1(0) in the Cox
model is given by (4.1), with ﬁlmcﬁn functions of §n(T,U). The corresponding

estimator in the linear regression is

n 2 A A A A A A A A A A A
(4.3) I“: = %Z[Z,. -§.(T,,U,.>] [mam +f,3(U,-.t>A. —2f,.(T,,.)f,.<U,-,.)A.2].

#1

48

5. Comparison of information measures: It is expected that there is signiﬁcant loss of
information in the models that we have considered due to interval censoring. In this
section we try to quantify this loss for a number of distributions that the information

measure is relatively easy to compute. We do it for the Cox model only, since there is no

easy way to compute [(9) in the linear regression. Assume that Z is a binary covariate,
independent of(T,U) and for y > O, Pr{Z= O}=1—y , Pr{Z = 1} = y.

For I“,,u(Z) =ezezR,.u(Z) 0,,(Z) , A,(Z) =e262 0, (Z) as in the previous section, we

 

 

 

obtain
E. {Z F...(Z>} e2°R. .(1>0.(1)t
\p(t,u) = = 29 ’
Ez{r.,.(2)} e R.,.(1)0.(1)Y+R,,..(0)0.(0)(1-y)
and <p(t,u) = EZ{Z A'(Z)} — 7329041)

Ez{A,(Z)} ‘ve2°0,<1>+(1—v)0,(0> '

We will also assume that the censoring distribution is discrete uniform on the lattice
is = {(i, j): i < j, i, j e {1,2,3,4}}. Tables 5.1 and 5.2 contain the information measure
for 0, for a few selected distributions under interval censoring and no censoring. In the

latter case, Ritov and Wellner (1988) prove that I"(9) = E X {var(Z|X )} In the current

setup, the information for 6 without any censoring is given by

te°[F(x)]’°
ye°[F<x>]‘ +(1—t)F’(x)

 

[a(9)=Ex{§(X)—E,2(X)} with €05):

49

Notice that if X ~ F , F a continuous distribution, then F (X ) ~ U (0,1) , thus making
the information independent of F. This is indeed the case as the next two tables show.
For interval censoring, we present the information measure in case that two inspections
are available as well as in case of a single inspection (see Remark 3.1). In the following

tables the numbers in parenthesis correspond to case 1 interval censoring.

Table 5.1: Information measure in the Cox model (Y = .2)

 

 

 

 

 

 

Hazard 6 -2 -1 0 l 2
exp(.5) 1(9) .037 .077 .1 17 .096 .031
(.0235) (.052) (.086) (.076) (.021)
U(0,5) 1(9) .028 .060 .101 .105 .061
(.0135) (.032) (.061) (.079) (.046)
f(x)=.08x 1(9) .017 .039 .072 .091 .065
05x55 (.006) (.014) (.030) (.049) (.049)
no ,.(9) .076 .125 .160 .131 .081
censoring

 

 

 

 

 

 

 

 

50

Table 5.2: Information measure in the Cox model (Y = .5)

 

 

 

 

 

 

 

 

 

 

 

 

Hazard 0 -2 -1 0 l 2
exp(.5) 1(9) .077 .136 .183 .150 .052
(.050) (.095) (.134) (.119) (.036)
U(0,5) 1(9) .059 .110 .158 .152 .082
(.030) (.060) (.096) (.1 10) (.063)
f(x)=.08x 1(9) .037 .073 .113 .126 .080
05x55 (.013) (.027) (.046) (.061) (.059)
no 1‘0)
censoring .165 .223 .250 .187 .097

 

 

We have chosen the three distributions on the basis of severity in censoring. Clearly an
exponential distribution for the variable of interest creates many left censored
observations, while the distribution speciﬁed by the density f puts most of its mass
towards the right endpoint of the interval [0,5], thus causing an excess of right censored

observations.

Chapter 5

GENERALIZED M-ESTIMATION IN THE ACCELERATED

FAILURE TIME MODEL

In Chapter 2 we deﬁned maximum (proﬁle) likelihood estimators for the Cox and the
linear regression models and established sufﬁcient and necessary conditions for step 1 in

the proﬁle likelihood algorithm to be well deﬁned. Although in the Cox model the entire

algorithm seems to be easily implemented, largely due to the smoothness of the function

0 l—> e9: , the same result might not hold in linear regression (in step 2 we might not get a
maximizer) . The problem arises from the unpleasant situation of nonsmoothness in the
function 0 1—> E(o;0), with F,(-;0) the maximum likelihood estimator of F}, for a
ﬁxed 0. Any attempt to develop the asymptotic theory for this class of estimators will
have to confront problems of this nature.

To avoid artificial assumptions, which are often impossible to verify in practice and make the problem unnatural, we turn to a smaller class of M-estimators which, we hope, admits an asymptotic theory under reasonable assumptions on the underlying model. Our efforts follow closely the theory of Asymptotic Generalized M-Estimators (AGME) of Bickel, Klaassen, Ritov and Wellner (1993), hereafter abbreviated to BKRW. In their master theorem 7.3.1, page 312, these authors establish conditions under which an AGM estimator is asymptotically normal. However, their requirement of a consistent AGM estimator is extremely difficult to verify, especially in models where a closed form of such an estimator is not available or is very hard to obtain. We obtain the result of their theorem by relaxing the consistency assumption at the expense of more smoothness in the objective function. To verify the rest of the conditions in their theorem, we exploit the modern arsenal of the theory of empirical processes. Sufficient conditions for convergence of stochastic processes, and results of the form

$$\sup\{|\sqrt{n}(P_n - P)f| : f \in \mathcal{F}_n \subset \mathcal{F}\} = o_p(1),$$

are given in Pollard (1989) for certain classes of functions $\mathcal{F}$. In the appendix of this thesis we put together the machinery needed to verify the assumptions of our modified version of Theorem 7.3.1 of BKRW. Most of this work will be the subject of future research.
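A concrete instance of a supremum of this form: for the class of half-line indicators $f_s(x) = 1\{x \le s\}$ and $P$ the Uniform(0,1) law, $\sup_s|(P_n - P)f_s|$ is the Kolmogorov-Smirnov distance. The Python sketch below is our own illustration (the uniform law, the sample sizes and the seed are arbitrary choices, not taken from Pollard (1989)); it computes that supremum exactly from the order statistics, so one can watch it shrink while its $\sqrt{n}$-multiple stays of order one.

```python
import math
import random


def sup_half_line_deviation(sample):
    """sup over s of |(P_n - P) 1{x <= s}| when P is the Uniform(0, 1) law.

    For half-line indicators this is the Kolmogorov-Smirnov distance
    sup_s |F_n(s) - s|. It is determined by the two levels of F_n on
    either side of each order statistic, so it is computed exactly here.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_n jumps from (i - 1)/n to i/n at x; compare both levels with F(x) = x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


rng = random.Random(1995)
for n in (100, 1000, 10000):
    d_n = sup_half_line_deviation([rng.random() for _ in range(n)])
    # d_n -> 0 (uniform law of large numbers), sqrt(n) * d_n stays bounded in probability.
    print(n, round(d_n, 4), round(math.sqrt(n) * d_n, 3))
```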

1. The master theorem: Let $\mathcal{P}$ be a model and $\mathcal{P} \subset \mathcal{M}_0$, with $\mathcal{M}_0$ containing all measures with finite support. Let $W_n, W : \mathbb{R}^m \times \mathcal{M}_0 \to \mathbb{R}^m$ and $\nu : \mathcal{M}_0 \to \mathbb{R}^m$. Suppose $W(\nu(P), P) = 0$ for all $P \in \mathcal{P}$ and $W_n(\nu, P_0) = W(\nu, P_0) + o(1)$. Introduce the notation $W_n(\nu) \equiv W_n(\nu, P_n)$, with $P_n$ denoting the empirical distribution. We say that $\hat\nu_n$ is a generalized M-estimator (GME) of $\nu(P)$ if $W_n(\hat\nu_n) = 0$, and an asymptotically generalized M-estimator (AGME) if $W_n(\hat\nu_n) = o_p(n^{-1/2})$.
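To make the GME/AGME distinction concrete, here is a minimal Python sketch for a scalar parameter. It assumes, purely for illustration, the Huber estimating function $W_n(\nu) = P_n\psi_c(X - \nu)$ with the conventional tuning constant $c = 1.345$; this is not the estimating function studied in this chapter. Since $\nu \mapsto W_n(\nu)$ is continuous and nonincreasing, bisection returns $\hat\nu_n$ with $|W_n(\hat\nu_n)|$ as small as the tolerance allows, i.e. an AGME whenever the tolerance is $o(n^{-1/2})$.

```python
def huber_psi(r, c=1.345):
    """Huber's psi function: identity on [-c, c], clipped to +/- c outside."""
    return max(-c, min(c, r))


def W_n(nu, data, c=1.345):
    """Empirical estimating function W_n(nu) = P_n psi_c(X - nu)."""
    return sum(huber_psi(x - nu, c) for x in data) / len(data)


def solve_gme(data, c=1.345, tol=1e-12):
    """Bisection for a root of the continuous, nonincreasing map nu -> W_n(nu)."""
    lo, hi = min(data) - 1.0, max(data) + 1.0   # W_n(lo) > 0 > W_n(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if W_n(mid, data, c) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [-3.0, -1.0, 0.0, 1.0, 10.0]
nu_hat = solve_gme(data)
print(nu_hat, W_n(nu_hat, data))
```

In this toy sample the outlying value 10 is clipped at $c$ and the root is exactly 0, whereas the sample mean is 1.4.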

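Let $V_n(\nu) \equiv \sqrt{n}\,(W_n(\nu) - W(\nu, P_0))$ and $\nu_0 \equiv \nu(P_0)$.

(GM0) For every $\delta_n \downarrow 0$,
$$\sup\left\{\frac{|V_n(\nu) - V_n(\nu_0)|}{1 + \sqrt{n}\,|\nu - \nu_0|} : |\nu - \nu_0| \le \delta_n\right\} = o_p(1).$$

(GM0') For every $M < \infty$,
$$\sup\left\{|V_n(\nu) - V_n(\nu_0)| : |\nu - \nu_0| \le M n^{-1/2}\right\} = o_p(1).$$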
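(GM1) There exists $\nu : \mathcal{M}_0 \to \mathbb{R}^m$ such that $W(\nu(P), P) = 0$ for all $P \in \mathcal{M}_0$.

(GM2) There exists $w : \mathcal{X} \times \mathcal{M}_0 \to \mathbb{R}^m$, $w \in \{L_2^0(P_0)\}^m$, such that
$$W_n(\nu_0) = n^{-1}\sum_{i=1}^n w(X_i, P_0) + o_p(n^{-1/2}).$$

(GM3) $W(\cdot, P_0)$ is differentiable at $\nu_0$ and $\dot W(P_0) \equiv \dot W(\nu_0, P_0)$ is nonsingular.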

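THEOREM (BKRW 1993). Suppose $P_0 \in \mathcal{P}$. Let $\hat\nu_n$ be an AGM estimator of $\nu_0$. If $\hat\nu_n$ is consistent and (GM0)-(GM3) hold, or $\hat\nu_n$ is $\sqrt{n}$-consistent and (GM0')-(GM3) hold, then
$$\sqrt{n}(\hat\nu_n - \nu_0) = -\dot W^{-1}(P_0)\,\frac{1}{\sqrt{n}}\sum_{i=1}^n w(X_i, P_0) + o_p(1).$$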

2. Generalized M-estimation under interval censoring: From our computations in Theorem 4.3.2, the efficient score function for $\theta$ is
$$l^*_\theta(y) = -[z - h(t_\#, u_\#)]\,[Q_1(y) f(t_\#) + Q_2(y) f(u_\#)], \tag{2.0}$$
with $a_\# \equiv a - \theta z$ for every $a$. We will derive a generalized estimator of $\theta$ by appropriately modifying (2.0). Notice (although this is not exhibited) that $Q_1, Q_2$ depend on both $\theta$ and $F$. We replace $f$ and $h$ in (2.0) by known functions $\tilde f$, $\tilde h$. In addition, we replace $F(\cdot - \theta z)$ by its maximum likelihood estimator $\hat F_n(\cdot - \theta z;\theta)$. Denote the corresponding generalized score function by $\tilde l_n(y;\theta)$ and let $W_n(\theta) \equiv P_n\tilde l_n(\theta;Y)$, $W_n(\theta, P) \equiv P\tilde l_n(\theta;Y)$, with the functional representation of the integral $Pf = \int f\,dP$ used everywhere in this chapter. Finally, let $X - \theta Z \sim F(\cdot;\theta)$ for all $\theta \in \Theta$. From the consistency property of the NPMLE of $F_0$ (Groeneboom and Wellner (1992)), we obtain $\hat F_n(\cdot - \theta z;\theta) \to F(\cdot - \theta z;\theta)$, with $F(\cdot - \theta_0 z;\theta_0) \equiv F_0(\cdot - \theta_0 z)$. It follows that

 

 

 

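$$\tilde l_n(y;\theta) \to \tilde l(y;\theta) \equiv -[z - \tilde h(t_\#,u_\#)]\left\{\left[\frac{\delta}{F(t_\#;\theta)} - \frac{\gamma}{F(u_\#;\theta) - F(t_\#;\theta)}\right]\tilde f(t_\#) + \left[\frac{\gamma}{F(u_\#;\theta) - F(t_\#;\theta)} - \frac{1-\delta-\gamma}{1 - F(u_\#;\theta)}\right]\tilde f(u_\#)\right\}. \tag{2.1}$$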

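Choosing $\tilde f$ carefully, we will prove that
$$P_0\tilde l_n(\theta_0;Y) \to P_0\tilde l(\theta_0;Y) \tag{2.2}$$
and
$$W_n(\theta_0) \equiv (P_n - P_0)\tilde l_n(\theta_0;Y) + P_0\tilde l_n(\theta_0;Y) \xrightarrow{a.s.} P_0\tilde l(\theta_0;Y) \equiv W(\theta_0, P_0). \tag{2.3}$$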

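Moreover,
$$W(\theta, P_0) = P_0\tilde l(\theta;Y) = -\int [z - \tilde h(t_\#,u_\#)]\left\{\left[\frac{F_0(t - \theta_0 z)}{F(t_\#;\theta)} - \frac{F_0(u - \theta_0 z) - F_0(t - \theta_0 z)}{F(u_\#;\theta) - F(t_\#;\theta)}\right]\tilde f(t_\#) + \left[\frac{F_0(u - \theta_0 z) - F_0(t - \theta_0 z)}{F(u_\#;\theta) - F(t_\#;\theta)} - \frac{1 - F_0(u - \theta_0 z)}{1 - F(u_\#;\theta)}\right]\tilde f(u_\#)\right\} dJ(t,u,z).$$
Since $F(\cdot - \theta_0 z;\theta_0) = F_0(\cdot - \theta_0 z)$, both brackets vanish at $\theta = \theta_0$, from which it follows that
$$W(\theta_0, P_0) = 0. \tag{2.4}$$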
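Display (2.4) can be checked numerically. The Python sketch below assumes, purely for illustration, $F_0 = \Phi$ (standard normal errors), $\theta_0 = 1$, a Bernoulli(.5) covariate, inspection times $T \sim$ Uniform$(-1, 0.5)$ with $U = T + 1$ (so condition (iv) of Lemma 1 below holds with $\xi = 1$), and the hypothetical choices $\tilde f = \phi$ and $\tilde h \equiv 0$. The function `limit_score` evaluates $\tilde l(y;\theta)$ of (2.1) with $F(\cdot;\theta)$ replaced by $\Phi$, which is exact only at $\theta = \theta_0$; its average over a large simulated sample estimates $W(\theta_0, P_0)$ and should be close to 0.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function (assumed F_0)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density, used here as the known function f~."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def limit_score(delta, gamma, t, u, z, theta, F=Phi, f_tilde=phi,
                h_tilde=lambda a, b: 0.0):
    """Limiting generalized score l~(y; theta) of display (2.1), scalar covariate."""
    t_s, u_s = t - theta * z, u - theta * z          # t_#, u_#
    Ft, Fu = F(t_s), F(u_s)
    q1 = delta / Ft - gamma / (Fu - Ft)
    q2 = gamma / (Fu - Ft) - (1 - delta - gamma) / (1 - Fu)
    return -(z - h_tilde(t_s, u_s)) * (q1 * f_tilde(t_s) + q2 * f_tilde(u_s))


def simulate(n, theta0=1.0, seed=0):
    """Hypothetical design: X = theta0*Z + N(0,1), T ~ U(-1, 0.5), U = T + 1."""
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0
        x = theta0 * z + rng.gauss(0.0, 1.0)
        t = -1.0 + 1.5 * rng.random()
        u = t + 1.0
        delta = 1 if x <= t else 0                   # left censored
        gamma = 1 if t < x <= u else 0               # interval censored
        obs.append((delta, gamma, t, u, z))
    return obs


sample = simulate(20000)
mean_score = sum(limit_score(*y, theta=1.0) for y in sample) / len(sample)
print(mean_score)
```

Conditioning on $(T,U,Z)$ shows why the average vanishes: at $\theta = \theta_0$ the conditional probabilities of $\delta = 1$, $\gamma = 1$ and $1 - \delta - \gamma = 1$ are $F_0(t_\#)$, $F_0(u_\#) - F_0(t_\#)$ and $1 - F_0(u_\#)$, which cancel the denominators in (2.1).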

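The following theorem is our modification of the master theorem presented earlier. It shows that we can relax the consistency assumption and still obtain the same asymptotic result, at the expense of more smoothness in the limit of the objective function.

Consider the conditions

(C1) $\sup\{|V_n(\theta) - V_n(\theta_0)| : |\theta - \theta_0| \le n^{-\alpha}\} = o_p(1)$, for some $\alpha > 0$.

(C2) There exists $w : \mathbb{R}_+^2 \times \mathbb{R}^m \times \{0,1\}^2 \times \Theta \to \mathbb{R}^m$, $w \in \{L_2^0(P_0)\}^m$, such that
$$W_n(\theta_0) = P_n w(Y,\theta_0) + o_p(n^{-1/2}).$$

(C3) $\theta \mapsto W(\theta, P_0)$ is differentiable at $\theta_0$ with $\dot W(\theta_0)$ nonsingular. Moreover,
$$W(\theta, P_0) = W(\theta_0, P_0) + \dot W(\theta_0)(\theta - \theta_0) + O(|\theta - \theta_0|^{\gamma}) \quad\text{with } \gamma\alpha > 1/2.$$

(C4) $W(\theta_0, P_0) = 0$.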

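THEOREM 1. Let $\theta_0 \in \Theta$ and $V_n(\theta) \equiv \sqrt{n}\,[W_n(\theta) - W(\theta, P_0)]$, $\theta \in \Theta$. Suppose that (C1)-(C4) hold. Then

(i) For large $n$ there exists $\hat\theta_n$ with $|\hat\theta_n - \theta_0| \le n^{-\alpha}$, $\gamma\alpha > 1/2$, such that $W_n(\hat\theta_n) = o_p(n^{-1/2})$, almost surely $P_0$.

(ii) Moreover, $\sqrt{n}(\hat\theta_n - \theta_0) = -\dot W(\theta_0)^{-1}\sqrt{n}\,P_n w(Y,\theta_0) + o_p(1)$, from which it follows that
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, \Sigma_0), \qquad \Sigma_0 = \dot W(\theta_0)^{-1}\,[E_0 w w']\,[\dot W(\theta_0)^{-1}]'.$$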
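In practice $\Sigma_0$ is estimated by plugging in empirical versions of $\dot W(\theta_0)$ and $E_0 ww'$. A minimal scalar Python sketch, again assuming an illustrative Huber estimating function rather than the interval censoring score of this chapter: then $w(y,\theta) = \psi_c(y - \theta)$ and, for continuous data, $\dot W(\theta) = -P\{|Y - \theta| \le c\}$. The value of $\hat\theta_n$ is taken as given; for the symmetric toy sample below, $\hat\theta_n = 0$ solves the estimating equation exactly.

```python
def sandwich_variance(data, theta_hat, c=1.345):
    """Plug-in estimate of Sigma_0 = W'(theta0)^{-1} E[w^2] W'(theta0)^{-1} (scalar case).

    Illustrative assumptions: w(y, theta) = psi_c(y - theta) (Huber) and
    W'(theta) = -P{|Y - theta| <= c}, both replaced by empirical averages.
    """
    n = len(data)
    resid = [y - theta_hat for y in data]
    psi = [max(-c, min(c, r)) for r in resid]
    meat = sum(p * p for p in psi) / n                    # estimates E[w w']
    bread = -sum(1 for r in resid if abs(r) <= c) / n     # estimates W'(theta0)
    if bread == 0:
        raise ValueError("no residuals inside [-c, c]; derivative estimate is zero")
    return meat / (bread * bread)                         # bread^{-1} meat bread^{-1}


data = [-3.0, -1.0, 0.0, 1.0, 3.0]
sigma_hat = sandwich_variance(data, theta_hat=0.0, c=1.5)
# Sigma_0 is the variance of sqrt(n)(theta_hat - theta0); the standard error divides by n.
print(sigma_hat, (sigma_hat / len(data)) ** 0.5)
```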

REMARK 1. As mentioned earlier, the verification of the conditions of Theorem 1 in our interval censoring problem will be the subject of future work. However, in Theorem 2 we prove a uniform law of large numbers tailored to the needs of our linear regression model; see display (2.3). This result is used in the proof of Theorem 1.
REMARK 2. Notice the connection between the modulus of continuity of the objective function (C1) and the smoothness of the model (C3). If $\theta \mapsto W(\theta, P_0)$ is twice differentiable and (C3) holds with $\gamma = 2$, then (C1) needs to be established for $\alpha > 1/4$. In this case, it might be appropriate to follow the approach of Manski (1975), (1985), seeking "maximum scored" estimators of $\theta_0$. Huang (1993) proved that the "maximum scored estimator" $\tilde\theta_n$ of $\theta_0$ with case 1 interval censored data is $n^{1/3}$-consistent. If we manage to establish such a result with case 2 interval censoring, then all that is left to do in the present context is to solve $W_n(\theta) = 0$ in a neighborhood $|\theta - \tilde\theta_n| = O_p(n^{-1/3})$ of $\tilde\theta_n$.
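The arithmetic behind Remark 2 is the same bound used in the proof of Theorem 1: on $\{|\theta - \theta_0| \le n^{-\alpha}\}$ the remainder in (C3) is negligible exactly when $\gamma\alpha > 1/2$.

```latex
\begin{aligned}
|\theta-\theta_0|^{\gamma} \le n^{-\gamma\alpha}, \qquad
n^{-\gamma\alpha} = o(n^{-1/2}) &\iff \gamma\alpha > \tfrac12, \\
\gamma = 2 &\implies \alpha > \tfrac14, \\
\alpha = \tfrac13 \ (\text{an } n^{-1/3}\text{ neighborhood}) &\implies \gamma\alpha = \tfrac23 > \tfrac12 .
\end{aligned}
```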

 

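Proof of Theorem 1. Let $\theta \in B_n(\alpha) = \{\theta : |\theta - \theta_0| \le n^{-\alpha}\}$. In view of (C4), write (C1) as
$$\begin{aligned}
W_n(\theta) &= W(\theta, P_0) + W_n(\theta_0) + o_p(n^{-1/2}) \\
&\overset{(C3)}{=} W_n(\theta_0) + \dot W(\theta_0)(\theta - \theta_0) + O(|\theta - \theta_0|^{\gamma}) + o_p(n^{-1/2}) \\
&= W_n(\theta_0) + \dot W(\theta_0)(\theta - \theta_0) + o_p(n^{-1/2}) \quad\text{since } \gamma\alpha > 1/2.
\end{aligned}$$
Now from (2.3), (2.4) and the definition of $B_n(\alpha)$, (i) follows. To obtain (ii), write
$$W_n(\hat\theta_n) = W_n(\theta_0) + \dot W(\theta_0)(\hat\theta_n - \theta_0) + o_p(n^{-1/2}),$$
from which it follows that
$$\hat\theta_n - \theta_0 = -[\dot W(\theta_0)]^{-1}W_n(\theta_0) + o_p(n^{-1/2}).$$
In view of (C2), the theorem is now proved.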

The following lemma is used in the proof of Theorem 2. Recall that $H$, $J$, $V$ denote the joint distributions of $(T,U)$, $(T,U,Z)$ and $(T - \theta_0 Z, U - \theta_0 Z)$ respectively, while $[a_0, b_0]$ indicates the support of $F_0$.
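LEMMA 1. Suppose that

(i) $F_0$ is strictly increasing.
(ii) $J$ is a continuous distribution; $(T,U)$ and $Z$ have densities with respect to Lebesgue measure.
(iii) $(T,U)$ and $Z$ are bounded. Moreover, for all $(\tau,\rho) \in \operatorname{support}(V) \equiv S_V$, $0 < F_0(\tau) < F_0(\rho) < 1$.
(iv) There exists $\xi > 0$ such that $H\{(t,u) : u \ge t + \xi\} = 1$.

Then there exist a neighborhood $N(\theta_0)$ of $\theta_0$ and an $M > 0$ such that
$$\sup_{\theta \in N(\theta_0)}\left[\frac{1}{\hat F_n(t - \theta z;\theta;\omega)} + \frac{1}{\hat F_n(u - \theta z;\theta;\omega) - \hat F_n(t - \theta z;\theta;\omega)} + \frac{1}{1 - \hat F_n(u - \theta z;\theta;\omega)}\right] \le M$$
for $n$ large enough and $\omega \in B$ with $P_0^\infty(B) = 1$.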

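Proof of Lemma 1. According to our definition in Chapter 2, and for $\theta \in \Theta$ fixed,
$$\hat F_n(\cdot,\theta) = \arg\max_{F \in \mathcal{F}} l_n(\theta, F).$$
Let $\varepsilon \in (0,1)$. For simplicity we denote $\hat F_n(\cdot) \equiv \hat F_n(\cdot,\theta)$. Then, for every $F \in \mathcal{F}$,
$$l_n[(1-\varepsilon)\hat F_n + \varepsilon F] - l_n(\hat F_n) \le 0 \quad \forall n.$$
It follows that
$$\lim_{\varepsilon \downarrow 0}\frac{1}{\varepsilon}\left\{l_n[(1-\varepsilon)\hat F_n + \varepsilon F] - l_n(\hat F_n)\right\} \le 0 \quad \forall n.$$
If we evaluate this limit explicitly we obtain
$$\int\left[\delta\frac{F(t - \theta z)}{\hat F_n(t - \theta z)} + \gamma\frac{F(u - \theta z) - F(t - \theta z)}{\hat F_n(u - \theta z) - \hat F_n(t - \theta z)} + (1 - \gamma - \delta)\frac{1 - F(u - \theta z)}{1 - \hat F_n(u - \theta z)}\right]dP_n(\delta,\gamma,t,u,z) \le 1. \tag{2.5}$$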

 

In view of (ii) and (iii), we can find $[a,b] \subset [a_0,b_0]$ such that, for a properly chosen neighborhood $N(\theta_0)$ of $\theta_0$, we have for all $\theta \in N(\theta_0)$ and all $(\tau,\rho) \in S_{V_\theta}$, where $S_{V_\theta}$ is the support of $(T - \theta Z, U - \theta Z)$,
$$0 < F_0(a) \le F_0(\tau) < F_0(\rho) \le F_0(b) < 1. \tag{2.6}$$
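From (2.5) and (2.6) we obtain
$$F_0(a)\int\frac{\delta}{\hat F_n(t - \theta z)}\,dP_n \le 1, \tag{2.7}$$
$$\eta\int_{A_\eta}\frac{\gamma}{\hat F_n(u - \theta z) - \hat F_n(t - \theta z)}\,dP_n \le 1, \tag{2.8}$$
with $A_\eta = \{(t,u) \in [a,b]^{\otimes 2} : F_0(u) - F_0(t) > \eta\}$ and some $\eta > 0$, and
$$[1 - F_0(b)]\int\frac{1 - \gamma - \delta}{1 - \hat F_n(u - \theta z)}\,dP_n \le 1. \tag{2.9}$$
From the strong law of large numbers we obtain the weak convergence result
$$P_n(\cdot;\omega) \xrightarrow{w} P_0(\cdot) \quad\text{for } \omega \in B : P_0^\infty(B) = 1.$$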

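We now claim that there exists a constant $M_2 > 0$ such that, for $n$ large enough and $\omega \in B$,
$$\frac{1}{\hat F_n(u - \theta z;\omega) - \hat F_n(t - \theta z;\omega)} \le M_2 \quad \forall (t,u,z) \in S_J : (t - \theta z, u - \theta z) \in [a,b]^{\otimes 2}. \tag{2.10}$$
If this is not true, then for some $(t_*, u_*, z_*) \in S_J$ with $(t_* - \theta z_*, u_* - \theta z_*) \in [a,b]^{\otimes 2}$, and with positive probability,
$$\frac{1}{\hat F_n(u_* - \theta z_*;\omega) - \hat F_n(t_* - \theta z_*;\omega)} > M \quad \forall M > 0. \tag{2.11}$$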

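Let $A = \{(x,y) : a \le t_* - \theta z_* \le x < y \le u_* - \theta z_* \le b\}$. Note that, by the monotonicity of $\hat F_n(\cdot;\omega)$, (2.11) holds for every point in $A$. Moreover, $A$ has positive Lebesgue measure. Therefore
$$\begin{aligned}
\eta\int_{A_\eta}\frac{\gamma}{\hat F_n(u - \theta z;\omega) - \hat F_n(t - \theta z;\omega)}\,dP_n(\cdot;\omega) &> \eta M\int_{A_\eta \cap A}\gamma\,dP_n(\cdot;\omega) \\
&= \eta M\,P_n(A_\eta \cap A \cap \{t < x \le u\};\omega) \\
&\ge \tfrac{1}{2}\eta M\,P_0(A_\eta \cap A \cap \{t < x \le u\}) \\
&> 1 \quad\text{for large } n, M.
\end{aligned}$$
Since this violates (2.8), we conclude that (2.10) holds.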

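Similarly we can prove that
$$\frac{1}{\hat F_n(t - \theta z;\omega)} \le M_1 \quad \forall (t,u,z) \in S_J : t - \theta z \in [a,b] \tag{2.12}$$
and
$$\frac{1}{1 - \hat F_n(u - \theta z;\omega)} \le M_3 \quad \forall (t,u,z) \in S_J : u - \theta z \in [a,b], \tag{2.13}$$
with $\omega \in B : P_0^\infty(B) = 1$. The result now follows from (2.10), (2.12) and (2.13) for $M = M_1 + M_2 + M_3$.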

We conclude this section with the theorem that establishes the uniform strong law of large numbers for our empirical process $W_n(\theta) \equiv P_n\tilde l_n(Y,\theta)$.

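THEOREM 2. In addition to the assumptions of Lemma 1, assume that
(i) $\tilde h$ is a bounded function on $[a_0,b_0]^{\otimes 2}$.
(ii) $\tilde f$ is Lipschitz, i.e.
$$|\tilde f(x) - \tilde f(y)| \le m|x - y| \quad \forall x, y.$$
Then $\lim_{n\to\infty} W_n(\theta_0) = W(\theta_0, P_0)$ a.s. $P_0^\infty$.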

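Proof of Theorem 2. Recall the definitions
$$W_n(\theta) \equiv P_n\tilde l_n(Y,\theta) = (P_n - P_0)\tilde l_n(\theta;Y) + P_0\tilde l_n(\theta;Y), \quad \theta \in \Theta, \tag{2.14}$$
and
$$\tilde l_n(y;\theta) \equiv -[z - \tilde h(t_\#,u_\#)]\left\{\left[\frac{\delta}{\hat F_n(t_\#;\theta)} - \frac{\gamma}{\hat F_n(u_\#;\theta) - \hat F_n(t_\#;\theta)}\right]\tilde f(t_\#) + \left[\frac{\gamma}{\hat F_n(u_\#;\theta) - \hat F_n(t_\#;\theta)} - \frac{1 - \delta - \gamma}{1 - \hat F_n(u_\#;\theta)}\right]\tilde f(u_\#)\right\}.$$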

 

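Let $r_n$ be a sequence of positive numbers such that $r_n \downarrow 0$. Write
$$\sup_{|\theta - \theta_0| \le r_n}|(P_n - P_0)\tilde l(\theta,Y)| \le (P_n - P_0)\sup_{|\theta - \theta_0| \le r_n}|\tilde l(\theta,Y)| \quad \forall n.$$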

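From (i) and (ii) in our hypotheses and Lemma 1 we obtain
$$\begin{aligned}
\sup_{|\theta - \theta_0| \le r_n}|\tilde l(\theta,y)| &\le M_1\sup_{|\theta - \theta_0| \le r_n}\tilde f(t - \theta z) + M_2\sup_{|\theta - \theta_0| \le r_n}\tilde f(u - \theta z) \\
&\le M_1\sup_{|\theta - \theta_0| \le r_n}|\tilde f(t - \theta z) - \tilde f(t - \theta_0 z)| + M_2\sup_{|\theta - \theta_0| \le r_n}|\tilde f(u - \theta z) - \tilde f(u - \theta_0 z)| \\
&\quad + M_1\tilde f(t - \theta_0 z) + M_2\tilde f(u - \theta_0 z) \\
&\le (M_1 + M_2)\,m\,|z|\,r_n + M_1\tilde f(t - \theta_0 z) + M_2\tilde f(u - \theta_0 z), \quad\text{a.s. } P_0^\infty.
\end{aligned}$$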

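Taking the limit as $n$ goes to infinity, and applying the strong law of large numbers to $M_1\tilde f(t - \theta_0 z) + M_2\tilde f(u - \theta_0 z)$, we obtain
$$\limsup_{n\to\infty}\sup_{|\theta - \theta_0| \le r_n}|(P_n - P_0)\tilde l(\theta,Y)| = 0 \quad\text{a.s. } P_0^\infty. \tag{2.15}$$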

To prove that $P_0\tilde l_n(\theta_0,Y) \to P_0\tilde l(\theta_0,Y)$, we note that $\tilde f(t - \theta_0 z)$ and $\tilde f(u - \theta_0 z)$ are bounded on a closed subset of $[a_0,b_0]$. Thus Lemma 1, (2.1) and the Lebesgue dominated convergence theorem establish
$$P_0\tilde l_n(\theta_0,Y) \to P_0\tilde l(\theta_0,Y). \tag{2.16}$$
Now from (2.15) and (2.16) we obtain
$$W_n(\theta_0) \to W(\theta_0, P_0) = 0 \quad\text{a.s. } P_0^\infty.$$
The theorem is now proved.

Chapter 6

SIMULATIONS

We have seen in Chapter 5 that the profile likelihood approach does not always work in the AFT model as nicely as it does in the PH model. The source of most problems with this model is the frequent lack of smoothness of the function $\theta \mapsto \hat F_n(\cdot,\theta)$. In fact, very little can be said about this function and its properties. Nevertheless, smoothness is essential in obtaining the maximizer in step 2 of the profile likelihood algorithm. To shed some light on the behavior of $\theta \mapsto \hat F_n(\cdot,\theta)$, a limited simulation study is conducted to examine the behavior of the profile likelihood as a function of $\theta$.
We work with the model
$$X = \theta Z + \varepsilon,$$
where $Z$ is a single covariate having a Bernoulli(.5) distribution and $\varepsilon$ is a normally distributed random variable. We present the results of the simulation for sample sizes of $n = 100$ and $n = 1000$ observations. Nonnegative, independent random variables $T_1$, $T_2$ are generated from preselected distributions, and the censoring pair (inspection times) is constructed according to
$$T \equiv T_1, \qquad U \equiv T + T_2.$$
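A Python sketch of the data-generating mechanism just described. The thesis does not state how Exp(.5) is parameterized; the example reads it as rate 0.5 (mean 2), which is an assumption, as are the function names and the seed. The helper `censoring_proportions` returns the observed (left, interval, right) proportions, the quantities reported with each table.

```python
import random


def simulate_design(n, theta, draw_eps, draw_t1, draw_t2, seed=None):
    """Generate case 2 interval censored data from X = theta*Z + eps.

    Z ~ Bernoulli(.5); inspection times T = T1, U = T1 + T2 with T1, T2
    independent and nonnegative. Returns (delta, gamma, T, U, Z) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = theta * z + draw_eps(rng)
        t1, t2 = draw_t1(rng), draw_t2(rng)
        t, u = t1, t1 + t2
        delta = int(x <= t)          # event before the first inspection
        gamma = int(t < x <= u)      # event between the two inspections
        data.append((delta, gamma, t, u, z))
    return data


def censoring_proportions(data):
    """Proportions of (left, interval, right) censored observations."""
    n = len(data)
    left = sum(d for d, _, _, _, _ in data) / n
    inter = sum(g for _, g, _, _, _ in data) / n
    return left, inter, 1.0 - left - inter


# Design of Table 6.1 (assumption: Exp(.5) means rate 0.5).
sample = simulate_design(
    1000, theta=2.0,
    draw_eps=lambda r: r.gauss(0.0, 1.0),
    draw_t1=lambda r: r.expovariate(0.5),
    draw_t2=lambda r: r.expovariate(0.5),
    seed=1,
)
print(censoring_proportions(sample))
```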

In an interval centered around the true $\theta$, we take a grid of points $\theta_1, \dots, \theta_k$, with $k$ the cardinality of the grid, and for every such point we compute the MLE $\hat F_n(\cdot,\theta_j)$, $j = 1,2,\dots,k$. We then record the value of the profile likelihood function $\pi(\theta_j) = l(\theta_j, \hat F_n(\cdot,\theta_j))$ and plot the pairs $(\theta_j, \pi(\theta_j))$. For $\theta_* \in \arg\max_{1 \le j \le k}\pi(\theta_j)$, we set $\hat\theta_n \equiv \theta_*$ and $\hat F_n(\cdot) \equiv \hat F_n(\cdot,\theta_*)$.
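The two steps (an NPMLE of $F$ for each grid value, then the grid maximum) can be sketched in Python as follows. For step 1 we substitute Turnbull's (1976) self-consistency (EM) algorithm on a fixed set of candidate support points; this is our choice for illustration and need not be the algorithm used to produce the tables, and the iteration count is an arbitrary assumption. Observations are tuples $(\delta,\gamma,T,U,Z)$ with a scalar covariate, and the residual $X - \theta Z$ is known to lie in the half-open interval returned by `residual_interval`.

```python
import math


def residual_interval(obs, theta):
    """Interval (left, right] known to contain X - theta*Z for one observation."""
    delta, gamma, t, u, z = obs
    t_s, u_s = t - theta * z, u - theta * z
    if delta:                      # left censored: X <= T
        return -math.inf, t_s
    if gamma:                      # interval censored: T < X <= U
        return t_s, u_s
    return u_s, math.inf           # right censored: X > U


def npmle_em(intervals, n_iter=500):
    """Turnbull-type self-consistency (EM) iterations for the NPMLE of F.

    Candidate support points are all finite endpoints plus +inf, so that
    right censored residuals can carry mass beyond the largest endpoint.
    Returns (points, masses, log-likelihood at the final masses).
    """
    points = sorted({e for iv in intervals for e in iv if math.isfinite(e)}) + [math.inf]
    # alpha[i][k] = 1 if candidate point k lies in (left_i, right_i]
    alpha = [[1.0 if lo < s <= hi else 0.0 for s in points] for lo, hi in intervals]
    n, m = len(intervals), len(points)
    p = [1.0 / m] * m
    for _ in range(n_iter):
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * pk for a, pk in zip(row, p))
            for k in range(m):
                new[k] += row[k] * p[k] / denom
        p = [v / n for v in new]
    loglik = sum(math.log(sum(a * pk for a, pk in zip(row, p))) for row in alpha)
    return points, p, loglik


def profile_grid(data, grid, n_iter=500):
    """Step 1 at each grid value, then step 2: return (theta_star, [(theta_j, pi_j)])."""
    values = []
    for theta in grid:
        intervals = [residual_interval(obs, theta) for obs in data]
        _, _, pi_theta = npmle_em(intervals, n_iter)
        values.append((theta, pi_theta))
    theta_star = max(values, key=lambda tv: tv[1])[0]
    return theta_star, values
```

Called as `profile_grid(data, grid)` with `grid` a list of $\theta$ values, it returns the maximizer $\theta_*$ and the pairs $(\theta_j, \pi(\theta_j))$ that are plotted in the figures. For samples of size $n = 1000$ the pure-Python EM loop is slow; a vectorized implementation or the iterative convex minorant algorithm of Groeneboom and Wellner (1992) would be preferable there.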


Table 6.1: Profile Likelihood

X = 2Z + N(0,1);  T1 ~ Exp(.5), T2 ~ Exp(.5)

        n = 100                 n = 1000
    θ        π(θ)           θ        π(θ)
  -1.0     -58.54          1.0     -475.9
  -0.5     -51.66          1.2     -460.5
   0.0     -44.15          1.4     -445.1
   0.5     -33.19          1.6     -433.7
   1.0     -26.7           1.8     -426.8
   1.5     -25.16          2.0     -424.4
   2.0     -25.6           2.2     -428.2
   2.5     -28.7           2.4     -434.7
   3.0     -29.4           2.6     -441.4
   3.5     -29.5           2.8     -448.7
   4.0     -29.5           3.0     -457.6
   4.5     -29.6
   5.0     -29.8

 

 

The proportions of left-, interval- and right-censored observations in the two samples are (.60,.30,.10) and (.60,.25,.15), respectively. The $\pi$ function is maximized at $\theta_* = 1.5$ when $n = 100$ and at $\theta_* = 2$ when $n = 1000$.

Table 6.2: Profile Likelihood

X = .5Z + N(4,1);  T1 ~ Unif[3,5], T2 ~ Unif[0,1]

        n = 100                 n = 1000
    θ        π(θ)           θ        π(θ)
  -1.0     -93.18          0.0     -860.0
  -0.8     -90.88          0.1     -854.0
  -0.6     -88.31          0.2     -850.2
  -0.4     -86.60          0.3     -845.0
  -0.2     -87.0           0.4     -840.2
   0.0     -83.29          0.5     -839.8
   0.2     -79.45          0.6     -837.6
   0.4     -80.61          0.7     -841.5
   0.6     -78.34          0.8     -840.6
   0.8     -77.59          0.9     -845.5
   1.0     -79.71          1.0     -850.4

 

 

 

The proportions of (left, interval, right) censoring are (.40,.18,.42) and (.38,.18,.44), respectively, in the two groups. With nearly half of the data right censored, convergence of the profile estimator is much slower. With $n = 100$ we get $\theta_* = .8$, while increasing the sample size to $n = 1000$ we only obtain $\theta_* = .6$. The $\pi$ function is visibly "less" smooth here. In the next figures we plot the $\pi$ function and the maximum (profile) likelihood estimate of the error distribution.

[Figure 6.1a: Profile Likelihood: X = 2Z + N(0,1); n = 100, T1 ~ Exp(.5), T2 ~ Exp(.5). Axes: theta vs. log-lik.]

[Figure 6.1b: Profile Likelihood: X = 2Z + N(0,1); n = 1000, T1 ~ Exp(.5), T2 ~ Exp(.5). Axes: theta vs. log-lik.]

[Figure 6.2a: Profile Likelihood: X = .5Z + N(4,1); n = 100, T1 ~ U[3,5], T2 ~ U[0,1]. Axes: theta vs. log-lik.]

[Figure 6.2b: Profile Likelihood: X = .5Z + N(4,1); n = 1000, T1 ~ U[3,5], T2 ~ U[0,1]. Axes: theta vs. log-lik.]

[Figure 6.3a: M.L.E. of error distribution: X = 2Z + N(0,1)]

[Figure 6.3b: M.L.E. of error distribution: X = .5Z + N(4,1)]

 

 

APPENDIX


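1. Concavity of the function $\Lambda \mapsto l_n(\theta;\Lambda)$: In Chapter 2 we used without proof the concavity of the log-likelihood function with respect to $\Lambda$ for every fixed $\theta \in \Theta$. In this section we give a short proof of this claim.

Let $\alpha > 0$ and consider, for $0 < t < u$, the function
$$w_\alpha(t,u) = \delta\log(1 - e^{-t\alpha}) + \gamma\log(e^{-t\alpha} - e^{-u\alpha}) - (1 - \gamma - \delta)\alpha u.$$
Then
$$\frac{\partial}{\partial t}w_\alpha(t,u) = \frac{\delta\alpha e^{-t\alpha}}{1 - e^{-t\alpha}} - \frac{\gamma\alpha e^{-t\alpha}}{e^{-t\alpha} - e^{-u\alpha}}, \qquad \frac{\partial}{\partial u}w_\alpha(t,u) = \frac{\gamma\alpha e^{-u\alpha}}{e^{-t\alpha} - e^{-u\alpha}} - (1 - \gamma - \delta)\alpha.$$
Set
$$\Psi_{tt}(\delta,\gamma) \equiv \frac{\delta e^{-t\alpha}}{(1 - e^{-t\alpha})^2} + \frac{\gamma e^{-(t+u)\alpha}}{(e^{-t\alpha} - e^{-u\alpha})^2}, \qquad \Psi_{uu}(\delta,\gamma) \equiv \frac{\gamma e^{-(t+u)\alpha}}{(e^{-t\alpha} - e^{-u\alpha})^2},$$
$$\Psi_{tu}(\delta,\gamma) \equiv -\frac{\gamma e^{-(t+u)\alpha}}{(e^{-t\alpha} - e^{-u\alpha})^2}.$$
Then we can write
$$\frac{\partial^2}{\partial t^2}w_\alpha(t,u) = -\alpha^2\Psi_{tt}(\delta,\gamma), \qquad \frac{\partial^2}{\partial u^2}w_\alpha(t,u) = -\alpha^2\Psi_{uu}(\delta,\gamma), \qquad \frac{\partial^2}{\partial u\,\partial t}w_\alpha(t,u) = -\alpha^2\Psi_{tu}(\delta,\gamma).$$
Now the matrix of second derivatives is
$$\ddot w_\alpha(t,u) = -\alpha^2\Psi(\delta,\gamma), \qquad \Psi(\delta,\gamma) = \begin{pmatrix}\Psi_{tt} & \Psi_{tu}\\ \Psi_{tu} & \Psi_{uu}\end{pmatrix}.$$
It is easy to verify that $\Psi(\delta,\gamma)$ is nonnegative definite: with $A \equiv \delta e^{-t\alpha}/(1 - e^{-t\alpha})^2 \ge 0$ and $B \equiv \Psi_{uu}(\delta,\gamma) \ge 0$,
$$\Psi(\delta,\gamma) = A\,e_1e_1' + B\,(e_1 - e_2)(e_1 - e_2)',$$
where $e_1, e_2$ are the unit vectors of $\mathbb{R}^2$. Thus $w_\alpha(t,u)$ is concave, as claimed.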
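The closed-form Hessian can be checked numerically. The Python sketch below (the evaluation point, the step size and the three censoring patterns are arbitrary choices) compares $-\alpha^2\Psi(\delta,\gamma)$ with central finite differences of $w_\alpha$, and checks that $\Psi$ has nonnegative diagonal entries and determinant, which for a symmetric $2\times 2$ matrix is equivalent to nonnegative definiteness.

```python
import math


def w(t, u, a, delta, gamma):
    """w_alpha(t, u): one log-likelihood contribution, with a = alpha > 0 and 0 < t < u."""
    val = -(1 - gamma - delta) * a * u
    if delta:
        val += math.log(1 - math.exp(-t * a))
    if gamma:
        val += math.log(math.exp(-t * a) - math.exp(-u * a))
    return val


def psi_matrix(t, u, a, delta, gamma):
    """Psi(delta, gamma), so that the Hessian of w_alpha equals -a**2 * Psi."""
    et, eu = math.exp(-t * a), math.exp(-u * a)
    A = delta * et / (1 - et) ** 2
    B = gamma * math.exp(-(t + u) * a) / (et - eu) ** 2
    return [[A + B, -B], [-B, B]]


def fd_hessian(f, t, u, h=1e-4):
    """Central finite-difference Hessian of f(t, u)."""
    f_tt = (f(t + h, u) - 2 * f(t, u) + f(t - h, u)) / h ** 2
    f_uu = (f(t, u + h) - 2 * f(t, u) + f(t, u - h)) / h ** 2
    f_tu = (f(t + h, u + h) - f(t + h, u - h) - f(t - h, u + h) + f(t - h, u - h)) / (4 * h ** 2)
    return [[f_tt, f_tu], [f_tu, f_uu]]


t, u, a = 0.4, 1.1, 1.7
for delta, gamma in [(1, 0), (0, 1), (0, 0)]:
    P = psi_matrix(t, u, a, delta, gamma)
    H = fd_hessian(lambda s, v: w(s, v, a, delta, gamma), t, u)
    max_err = max(abs(H[i][j] + a ** 2 * P[i][j]) for i in range(2) for j in range(2))
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    print((delta, gamma), max_err, P[0][0] >= 0 and P[1][1] >= 0 and det >= 0)
```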

2. Alternative characterization of MLE: We give here an alternative characterization of the maximum likelihood estimator in both models that we considered. It is based on a geometric interpretation of the NPMLE as the left derivative of the greatest convex minorant of a cumulative sum diagram. The idea is given in Robertson, Wright and Dykstra (1988) and was first implemented in interval censoring problems by Groeneboom; see Groeneboom and Wellner (1992).

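The process $W$ associated with the first derivative of the log-likelihood function with respect to $\Lambda$ in the Cox model is
$$W_{\Lambda,q}(t) = \sum_{i:\,T_i \le t}\left(\frac{\delta_i}{1 - e^{-\Lambda(T_i)e^{q_i}}} - \frac{\gamma_i}{e^{-\Lambda(T_i)e^{q_i}} - e^{-\Lambda(U_i)e^{q_i}}}\right)e^{q_i - \Lambda(T_i)e^{q_i}} + \sum_{i:\,U_i \le t}\left(\frac{\gamma_i}{e^{-\Lambda(T_i)e^{q_i}} - e^{-\Lambda(U_i)e^{q_i}}} - (1 - \delta_i - \gamma_i)\,e^{\Lambda(U_i)e^{q_i}}\right)e^{q_i - \Lambda(U_i)e^{q_i}}.$$
Let
$$G_{\Lambda,q}(t) = -\frac{d}{d\Lambda}W_{\Lambda,q}(t) \qquad\text{and}\qquad dV_{\Lambda,q}(t) = dW_{\Lambda,q}(t) + \Lambda(t)\,dG_{\Lambda,q}(t).$$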
The following proposition is adapted from Groeneboom and Wellner (1992). It provides an alternative characterization of the MLE of $\Lambda$, thus giving a statement equivalent to the one presented in our Theorem 2.6.1.

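PROPOSITION 1. For fixed $\theta \in \Theta$, let $q_i = \theta' z_i$, $i \in \{1,2,\dots,n\}$. Suppose that $\delta_{(1)} = 1$ and $\delta_{(m)} = \gamma_{(m)} = 0$. Then $\hat\Lambda_n(\cdot,\theta)$ is the NPMLE of $\Lambda$ if and only if $\hat\Lambda_n(\cdot,\theta)$ is the left derivative of the convex minorant of the "cumulative sum diagram" consisting of the points
$$P_j = \left(G_{\hat\Lambda_n(\cdot,\theta),q}(\tau_{(j)}),\; V_{\hat\Lambda_n(\cdot,\theta),q}(\tau_{(j)})\right),$$
where $P_0 = (0,0)$ and $\tau_{(j)} \in \mathcal{J}_n$, $j = 1,2,\dots,m$.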
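Computationally, the left derivative of the greatest convex minorant of a cumulative sum diagram can be obtained with the pool-adjacent-violators algorithm (see Robertson, Wright and Dykstra (1988)): segment slopes are pooled, with weights equal to their horizontal lengths, until they are nondecreasing. The Python sketch below works on a generic diagram $P_0 = (0,0), P_1, \dots, P_m$ with strictly increasing abscissae; computing the processes $G$ and $V$ themselves is not attempted.

```python
def gcm_left_derivatives(xs, ys):
    """Left derivatives of the greatest convex minorant of the cumulative
    sum diagram (0, 0), (xs[0], ys[0]), ..., (xs[-1], ys[-1]).

    xs must be strictly increasing and positive. Returns one slope per
    point P_j, j = 1..m, via pool-adjacent-violators on segment slopes.
    """
    blocks = []  # each block: [pooled dx, pooled dy, number of segments]
    prev_x, prev_y = 0.0, 0.0
    for x, y in zip(xs, ys):
        blocks.append([x - prev_x, y - prev_y, 1])
        prev_x, prev_y = x, y
        # Pool while the last two blocks violate convexity (slopes must not decrease).
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][0] > blocks[-1][1] / blocks[-1][0]:
            dx, dy, k = blocks.pop()
            blocks[-1][0] += dx
            blocks[-1][1] += dy
            blocks[-1][2] += k
    slopes = []
    for dx, dy, k in blocks:
        slopes.extend([dy / dx] * k)
    return slopes


# Diagram (0,0), (1,2), (2,2), (3,5): the point (1,2) lies above the minorant.
print(gcm_left_derivatives([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # [1.0, 1.0, 3.0]
```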

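A similar proposition can be formulated to characterize the MLE in the linear regression model. It can be used as an equivalent statement to Theorem 2.6.2. The associated $W$ process is given by
$$W_{F,\theta}(t) = \sum_{i:\,T_i^\theta \le t}\left[\frac{\delta_i}{F(T_i^\theta)} - \frac{\gamma_i}{F(U_i^\theta) - F(T_i^\theta)}\right] + \sum_{i:\,U_i^\theta \le t}\left[\frac{\gamma_i}{F(U_i^\theta) - F(T_i^\theta)} - \frac{1 - \delta_i - \gamma_i}{1 - F(U_i^\theta)}\right],$$
where $T_i^\theta \equiv T_i - \theta z_i$ and $U_i^\theta \equiv U_i - \theta z_i$.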

 

3. Some results from empirical process theory: In this section we summarize results that can be used to prove uniform central limit theorems and laws of large numbers. We see the need for such machinery in Chapter 5, in our effort to verify the hypotheses of the master theorem of BKRW (1993). Most of them can be found in Dudley (1984), (1987) and Pollard (1984), (1989). Let $(S,d)$ be a metric space, $B \subset S$, $\varepsilon > 0$ and $\mathcal{F} \subset L_r(P)$ a family of functions for some $r > 0$. Denote by $\ln N(\varepsilon,B,d)$ and $\ln D(\varepsilon,B,d)$ the $\varepsilon$-entropy and $\varepsilon$-capacity of $B$, and by $\ln N_B(\varepsilon,\mathcal{F},d)$ the $\varepsilon$-bracketing entropy of $\mathcal{F}$. The following proposition is a simple consequence of the definitions of $N$ and $D$.

PROPOSITION 1: For every $\varepsilon > 0$ and every set $B$ in a metric space $(S,d)$,
$$D(2\varepsilon,B,d) \le N(\varepsilon,B,d) \le D(\varepsilon,B,d).$$
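For a finite set of reals with $d(x,y) = |x - y|$, both numbers can be computed exactly by greedy scans, which gives a quick sanity check of Proposition 1. The Python sketch uses the conventions (our assumptions; definitions vary slightly in the literature) that $D(\varepsilon,B,d)$ is the largest number of points of $B$ with pairwise distances $> \varepsilon$, and $N(\varepsilon,B,d)$ the smallest number of closed $\varepsilon$-balls, centered anywhere in $\mathbb{R}$, covering $B$.

```python
def packing_number(points, eps):
    """D(eps): max number of points with pairwise distance > eps (greedy is exact in 1-D)."""
    count, last = 0, None
    for x in sorted(points):
        if last is None or x - last > eps:
            count, last = count + 1, x
    return count


def covering_number(points, eps):
    """N(eps): min number of closed balls [c - eps, c + eps] covering the points."""
    count, reach = 0, None
    for x in sorted(points):
        if reach is None or x > reach:
            count, reach = count + 1, x + 2 * eps   # ball centred at x + eps
    return count


B = [0.0, 0.3, 0.5, 1.1, 1.2, 2.0, 3.5]
for eps in (0.1, 0.25, 0.5, 1.0):
    D2, N1, D1 = packing_number(B, 2 * eps), covering_number(B, eps), packing_number(B, eps)
    print(eps, D2, N1, D1, D2 <= N1 <= D1)
```

Because both greedy scans select the same points, in one dimension (with centers allowed anywhere in $\mathbb{R}$) the left inequality is in fact an equality; in higher dimensions it is generally strict.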

We now define the concept of a manageable class of functions. Pollard (1989) introduced these classes and obtained results that go beyond the Vapnik-Cervonenkis (1971) theory of VC classes of sets, thus extending the availability of central limit theorems to estimators that depend on larger classes of functions.

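DEFINITION 1: Let $\mathcal{F}$ be a class of functions with an envelope $F$, that is, $|f| \le F$ for all $f \in \mathcal{F}$, and let $\|\cdot\|_{2,Q}$ denote the $L_2(Q)$ norm. We say that $\mathcal{F}$ is manageable for the envelope $F$ if there exists a decreasing function $\Gamma(\cdot)$ for which

(i) $\int_0^1(\log\Gamma(x))^{1/2}\,dx < \infty$ and (ii) $\sup_Q D(\varepsilon\|F\|_{2,Q}, \mathcal{F}, \|\cdot\|_{2,Q}) \le \Gamma(\varepsilon)$ for all $0 < \varepsilon \le 1$,

where the supremum is taken over all measures $Q$ with finite support.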

From the above definition it follows (see Dudley (1987)) that
1. Every subclass $\mathcal{F}$ of a VC-subgraph class is manageable for $F = \sup_{\mathcal{F}}|f|$.
2. Every subclass $\mathcal{F}$ of a VC-hull class is manageable for $F = \sup_{\mathcal{F}}|f|$.
3. Every subclass $\mathcal{F}$ of a VC-major class is manageable for constant $F = \sup_{\mathcal{F}}|f|$.

Moreover, one can construct more complex manageable classes of functions by starting from simple classes and then using their stability properties to build new ones. For further details see Pollard (1989). We are now ready to give the most appealing result associated with these classes.

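THEOREM 1 (Pollard 1989): Let $\mathcal{F}$ be a manageable class for an envelope $F$ with $PF^2 < \infty$. Let $\nu_n \equiv \sqrt{n}(P_n - P)$ and, for subclasses $\mathcal{F}(n)$, $n = 1,2,\dots$, suppose

(i) $0 \in \mathcal{F}(n)$ for all $n$ and (ii) $\sup_{\mathcal{F}(n)}P|f| \to 0$ as $n \to \infty$.

Then
$$E\sup_{\mathcal{F}(n)}|\nu_n f|^2 \to 0 \quad\text{as } n \to \infty.$$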

The concept of a manageable class comes close to Dudley's definition of functional Donsker classes; see Dudley (1987). Although a manageable class for a constant envelope is a functional Donsker class, not all Donsker classes are manageable. Let $\mathcal{F}$ be a class of uniformly bounded functions on a probability space $(\mathcal{X}, \mathcal{A}, P)$. Set
$$\nu_n(f) \equiv \sqrt{n}(P_n - P)f, \quad f \in \mathcal{F}.$$

DEFINITION 2. The class of functions $\mathcal{F}$ is said to be a functional Donsker class if and only if
(i) $\mathcal{F}$ is totally bounded for the sup-norm.
(ii) There exists $\delta > 0$ such that $\sup_{|f - g| \le \delta}|\nu_n(f) - \nu_n(g)| = o_p(1)$, modulo measurability constraints.

BIBLIOGRAPHY


Aalen, O. O. (1978b). Nonparametric inference for a family of counting processes. Ann. Statist. 6, 701-726.

Andersen, P. K., Borgan, Ø., Gill, R. D., Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York.

Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: a large sample study. Ann. Statist. 10, 1100-1120.

Bickel, P. J., Klaassen, C. A. J., Ritov, Y., Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press, Baltimore.

Buckley, J. and James, I. R. (1979). Linear regression with censored data. Biometrika 66, 429-436.

Cox, D. R. (1972). Regression models and life-tables. J. R. Statist. Soc. B, 34, 187-220.

Dudley, R. M. (1984). A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII-1982. Lecture Notes in Math. 1097, 2-142. Springer, New York.

Dudley, R. M. (1987). Universal Donsker classes and metric entropy. Ann. Probab. 15, 1306-1326.

Finkelstein, D. M. and Wolfe, R. A. (1985). A semiparametric model for regression analysis of interval-censored failure time data. Biometrics 41, 933-945.

Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics 42, 845-854.

Fleming, T. R., Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.

Green, P. and Yandell, B. (1985). Semiparametric generalized linear models. Proceedings 2nd International GLIM Conference, Lecture Notes in Statistics 32, Springer-Verlag, Berlin.

Groeneboom, P. (1989). Brownian motion with a parabolic drift and Airy functions. Probab. Theory Related Fields 81, 79-109.

Groeneboom, P. and Wellner, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. DMV Seminar Band 19, Birkhäuser, Basel.

Heckman, N. E. (1986). Spline smoothing in a partially linear model. J. Roy. Statist. Soc. B, 48, 244-248.

Hoel, D. G. and Walburg, H. E. (1972). Statistical analysis of survival experiments. J. Nat. Cancer Inst. 49, 361-372.

Huang, J. (1993). Maximum scored likelihood estimation of a linear regression model with interval censored data. Tech. Report No. 253, Dept. of Statistics, Univ. of Washington.

Huang, J. and Wellner, J. A. (1993). Regression models with interval censoring. Proceedings of the Kolmogorov Seminar, Euler Mathematics Institute, St. Petersburg, Russia.

Koul, H. L., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with randomly right censored data. Ann. Statist. 9, 1276-1288.

LeBlanc, M. and Crowley, J. (1995). Semiparametric regression functionals. J. Amer. Statist. Assoc. 90 (429), 95-105.

Leiderman, P. H., Babu, D., Kagia, J., Kramer, H. C., Leiderman, G. P. (1973). African infant precocity and some social influence during the first year. Nature 242, 247-249.

Manski, C. F. (1975). Maximum score estimation of the stochastic utility model of choice. J. Econometrics 3, 205-228.

Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. J. Econometrics 27, 313-333.

Paneth, N., Pinto-Martin, J., Gardiner, J., Wallenstein, S., Katsikiotis, V., Hegyi, T., Hiatt, M., Susser, M. (1993). Incidence and timing of germinal matrix/intraventricular hemorrhage in low birth weight infants. Am. J. Epidemiol. 137, 1167-1176.

Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures. J. R. Statist. Soc. A, 135, 185-206.

Pfanzagl, J. (1988). Consistency of maximum likelihood estimators for certain nonparametric families, in particular: Mixtures. J. Statist. Plann. Inference 19, 137-158.

Pinto-Martin, J., Paneth, N., Witomski, T., Stein, I., Schonfeld, S., Rosenfeld, D., Rose, W., Kazam, E., Kairam, R., Katsikiotis, V., Susser, M. (1992). The central New Jersey neonatal brain haemorrhage study: design of the study and reliability of ultrasound diagnosis. Paediatr. Perinat. Epidemiol. 6, 273-284.

Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.

Pollard, D. (1989). Asymptotics via empirical processes. Statist. Sci. 4, 341-366.

Ritov, Y. (1990). Estimation in a linear regression model with censored data. Ann. Statist. 18, 303-328.

Robertson, T., Wright, F. T., Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, New York.

Rücker, G. and Messerer, D. (1988). Remission duration: An example of interval censored observations. Statist. in Medicine 7, 1139-1145.

Schick, A. (1993). On efficient estimation in regression models. Ann. Statist. 21, 1486-1521.

Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.

Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc. 69, 169-173.

Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Statist. Soc. B, 38, 290-295.

Van de Geer, S. (1993). Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist. 21, 14-44.

Wang, J.-L. (1985). Strong consistency of approximate maximum likelihood estimators with applications to nonparametrics. Ann. Statist. 13, 932-946.

Wang, Z., Gardiner, J. C., Ramamoorthi, R. V. (1994). Identifiability in interval censorship models. Statist. Probab. Letters 21, 215-221.

Whittemore, A. S. and Keller, J. B. (1986). Survival estimation using splines. Biometrics 42, 495-506.
