This is to certify that the
dissertation entitled

Semiparametric Estimation For Current Status Data
With Flexible Covariate Effects

presented by

Wenliang Lu

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Statistics

Major professor

Hira L. Koul

Date December 4, 2000

 


 

SEMIPARAMETRIC ESTIMATION FOR CURRENT STATUS DATA
WITH FLEXIBLE COVARIATE EFFECTS

By

Wenliang Lu

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2000

ABSTRACT

SEMIPARAMETRIC ESTIMATION FOR CURRENT STATUS DATA WITH
FLEXIBLE COVARIATE EFFECTS

By

Wenliang Lu

This thesis studies a semiparametric hazard model with a parametric baseline hazard
rate and a nonparametric covariate dependency, based on current status data. Two
estimators are proposed. One is the generalized profile maximum likelihood estimator
(GPMLE) and the other is the sieve maximum likelihood estimator (SMLE). The
GPMLE is obtained by maximizing the profile likelihood function, where the
nonparametric covariate part is estimated using kernel and least squares methods.
Under some regularity conditions, the thesis establishes the root-n consistency and
asymptotic normality of this estimator. The SMLE of the parameter is obtained by
maximizing the log-likelihood function with respect to both the finite dimensional
parameter and the infinite dimensional nuisance parameter, where the infinite
dimensional nuisance parameter is constrained to a subset of the parameter space
that grows with the sample size. This estimator is shown to be consistent and
asymptotically normal. Moreover, its asymptotic variance achieves the semiparametric
lower bound.

ACKNOWLEDGMENTS

I would like to thank my advisor, Professor Hira L. Koul, for his guidance and many
helpful discussions on the subject of this thesis. He was always available when I had
doubts or questions. His general way of thinking about a statistical problem and of
solving it will help my future research and work. I would also like to thank
all the other committee members, Professors Joseph Gardiner, James Hannan and
Habib Salehi, for serving on my guidance committee, especially Professor James Hannan,
who was also my academic advisor. Finally, I would like to thank the Department
of Statistics and Probability for offering me graduate assistantships so that I could
come to the States and finish my thesis and my graduate study at Michigan State
University.


TABLE OF CONTENTS

LIST OF TABLES
    Simulation results for the two estimators

Introduction
    Overview
        Literature review
        Summary description
    The model
        The cumulative hazard function $\Lambda(x, \theta_0) g(z)$

Chapter 1
Generalized Profile Maximum Likelihood Estimation
    1.1 Definition of estimators of $\theta_0$ and $g$
        Kernel estimate, $\hat F(T_j, Z_i)$, of $F(T_j, Z_i, \theta_0)$
        Least squares estimate, $\hat g_\theta(Z_i)$, of $g(Z_i)$ for each fixed $\theta$
        The estimator, $\hat\theta$, of $\theta_0$
    1.2 Asymptotic properties of the estimators
        1.2.1 Consistency
            Assumptions for the consistency and asymptotic normality of $\hat\theta$
            $\hat\theta$ converges to $\theta_0$ in probability: Theorem 1
        1.2.2 Asymptotic normality
            $\sqrt{n}(\hat\theta - \theta_0)$ converges weakly to a normal distribution: Theorem 2
    1.3 Simulation
        Weibull distribution with c.d.f. $1 - e^{-x^{\theta_0} g(z)}$
        Means and standard deviations of $\hat\theta$ for small and moderate samples
    1.4 Proof of the consistency and asymptotic normality
        1.4.1 Lemmas preliminary to the proof
            Lemma 1: Uniform consistency of $\xi_n(t)$, the mean of independent r.v.'s, $t \in D$
            Lemmas 2-5: Uniform consistency of kernel estimators
        1.4.2 Proof of Theorems 1 and 2
            Lemmas 6-9: Uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ and
            of $\hat g_\theta(Z_i)$ and its derivatives for $g_\theta(Z_i)$ and its derivatives
            Proof of Theorem 1 (Consistency)
            Proof of Theorem 2 (Asymptotic normality)

Chapter 2
Sieve Estimation
    2.1 Estimation
        Score equation: $S_n(\theta, \alpha_n) = 0$
        The estimator, $\hat\theta$, of $\theta_0$: $S_n(\hat\theta, \hat\alpha_n) = 0$
    2.2 Consistency
        Assumptions for the asymptotic properties
        Theorem 3: $|\hat\theta - \theta_0| + \|\hat\alpha_n - \alpha_0\|_\infty$ converges to 0 in probability
        Theorem 4: Convergence rates of $|\hat\theta - \theta_0|$ and $\|\hat\alpha_n - \alpha_0\|$
    2.3 Asymptotic normality of $\hat\theta$
        Theorem 5: $\sqrt{n}(\hat\theta - \theta_0)$ converges weakly to a normal distribution
    2.4 Information bound for $\theta_0$
        Efficient score for $\theta_0$
    2.5 Simulation
        Weibull distribution
        Means and standard deviations of $\hat\theta$ for small and moderate samples
    2.6 Proof of the theorems
        2.6.1 Proof of Theorem 3
            Lemma 10: Inverse function theorem with sup-norm
        2.6.2 Proof of Theorem 4
            Lemma 11: The convergence rates of sieve estimators
        2.6.3 Proof of Theorem 5
            Lemma 12: Stochastic equi-continuity for empirical processes

Bibliography

LIST OF TABLES

Table 1
Simulation results for the GPMLE
Table 2
Simulation results for the SMLE

Introduction

0.1 Overview

Current status data arise in clinical settings when the survival time of interest
can only be determined to lie below or above a random examination time. Such data
are commonly encountered in destructive testing, in animal experiments in which the
occurrence of the event is observable only upon sacrifice, and in epidemiologic
studies in which obtaining more than one examination is not cost effective.

The nonparametric estimation of the survival time distribution and some smooth
functionals thereof have been discussed for current status data by a number of authors,
including Groeneboom and Wellner (1992, §2.3), Huang and Wellner (1995), Geskus
and Groeneboom (1996) and Geskus and Groeneboom (1997).

Semiparametric models based on current status data have also been studied in the
literature. Klein and Spady (1993), Rabinowitz, Tsiatis and Aragon (1995), Li and
Zhang (1998), and Murphy, Van der Vaart and Wellner (1999) considered the linear
regression model based on current status data. Klein and Spady used the profile
maximum likelihood method to derive an estimator of the regression parameters,
which was shown to achieve the semiparametric lower bound. Rabinowitz, Tsiatis
and Aragon proposed a class of score statistics that may be used for estimation and
confidence procedures. Li and Zhang minimized a class of U-statistics of
order 3 to obtain estimators of the parameters. Murphy, Van der Vaart and Wellner
considered the penalized maximum likelihood estimator of the regression parameter,
which was shown to be efficient. Koul and Schick (1999) studied estimation and
hypothesis testing for the ratio of scale parameters in the two-sample setting, using a
U-statistic of order 2.

Cox's regression model has also been studied for current status data.
Finkelstein (1986), Diamond and McDonald (1991), and Shiboski and Jewell (1992)
developed several methods to fit the model. Huang (1996) showed that the maximum
likelihood estimator of the regression parameter, obtained by profiling over
the cumulative baseline hazard function, is asymptotically normal with
$n^{1/2}$ convergence rate.

Among the other semiparametric models for current status data, the additive
hazards regression model was studied by Lin, Oakes and Ying (1998) and the
proportional odds regression model was studied by Rossini and Tsiatis (1996). Under certain
conditions on the examination time, Lin, Oakes and Ying found that one can make
inferences about the regression parameters of the additive hazards model by using
the familiar asymptotic theory and software for the proportional hazards model with
right censored data. Rossini and Tsiatis's approach to the proportional odds
regression model is based on approximating the infinite-dimensional nuisance parameter,
the baseline log-odds of failure, with a step function, and carrying out a maximum
likelihood procedure. The resulting estimates of the regression parameters are shown
to be asymptotically normal and semiparametrically efficient.

Although these models, especially Cox's regression model, are popular and
widely used in practice, in many applications the shape of the baseline hazard is
thought to be well understood but the covariate effect is rarely specified precisely. For
example, in insurance problems the Gompertz-Makeham hazard has a long tradition
of successful application [Jordan (1975), page 21]. Meshalkin and Kagan (1972)
claimed that the logarithm of the baseline hazard is approximately linear for a number
of chronic diseases. As an alternative to Cox's regression model, Nielsen, Linton and
Bickel (1998) studied a model where the baseline hazard rate belongs to a parametric
class of hazard functions but the covariate part is of unknown functional form. They
obtained an estimator of the underlying parameter by the profile maximum likelihood
method when the data are randomly right censored.

This dissertation discusses the estimation of the underlying parameter in this
model (Nielsen, Linton and Bickel, 1998) for current status data. Two estimators
are proposed. The ﬁrst one is obtained by maximizing a proﬁle likelihood where the
inﬁnite dimensional nuisance parameter is estimated nonparametrically. This is called
the generalized profile maximum likelihood estimator. A set of sufficient conditions
is provided for its consistency and asymptotic normality.

The second estimator, called sieve maximum likelihood estimator, is obtained by
maximizing the log-likelihood function with respect to both the ﬁnite dimensional and
the inﬁnite dimensional nuisance parameters while the inﬁnite dimensional nuisance
parameter is constrained to a subset of the parameter space which increases with the
increase in the sample size. It is shown to be consistent, asymptotically normal, with
its asymptotic variance achieving the semiparametric lower bound.

Simulations are conducted to study the behavior of these estimators for small
and moderate sample sizes. The generalized profile maximum likelihood estimator
seems to have a slightly lower bias and variance than the sieve maximum likelihood
estimator. Since the latter achieves the semiparametric lower bound, it should
outperform the generalized profile maximum likelihood estimator as the sample
size increases.

0.2 The model

Let $(X, T, Z)$ be a random vector, where $X$ represents the survival time, $T$ the
monitoring time and $Z$ the covariate, which could be a vector. Let
$(X_1, T_1, Z_1), \ldots, (X_n, T_n, Z_n)$ be i.i.d. copies of $(X, T, Z)$.

Assume that, conditioned on $Z$, $X$ and $T$ are independent. The
conditional distribution of $X$, given $Z$, is assumed to depend on some parameter and
the covariate. In Cox's regression model, the cumulative hazard function of $X$
given $Z = z$ has the form

\[ \Lambda_0(x) e^{\beta' z}, \]

where the first part $\Lambda_0$, with unspecified form, is called the baseline cumulative hazard
function, and $\beta$ is a vector of parameters. Nielsen, Linton, and Bickel (1998) proposed
an alternative model with the first part depending only on some parameter $\theta_0$ and
the second part with unspecified form. More specifically, the cumulative hazard
function is of the form

\[ \Lambda(x, \theta_0) g(z), \]

where $\Lambda(x, \theta_0)$ is a known function with unknown parameter $\theta_0$, but $g$ is an unknown
function. Here $\theta_0$ belongs to $\Theta$, a subset of $\mathbb{R}^d$ for some $d \ge 1$. They discussed the
estimation of $\theta_0$ and $g$ under right censoring.

In this dissertation we discuss the estimation of $\theta_0$ and $g(z)$ based on current status
data, also called interval censoring Case I data, where one observes
$(T_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$, with $\delta_i = I(X_i \le T_i)$. It is assumed in the
following sections that $\theta_0$ is a scalar; similar results can be obtained when $\theta_0$
is a vector. Because of the curse of dimensionality, $Z$ is also assumed to be a scalar.

Let $F(x, z, \theta_0)$ be the conditional distribution function of $X$, given $Z = z$. Assume
that the cumulative hazard function is continuous. Then

\[ F(x, z, \theta_0) = 1 - \exp(-\Lambda(x, \theta_0) g(z)). \]

We also assume that the distribution of $(T, Z)$ does not depend on $\theta_0$ or $g$, and
that if $\Lambda(t, \theta_1) g_1(z) = \Lambda(t, \theta_0) g(z)$ for all $(t, z)$ in the support of $(T, Z)$, then
$\theta_1 = \theta_0$ and $g_1(z) = g(z)$ for all $z$. The latter is the identifiability condition.
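To make the observation scheme concrete, the following is a minimal sketch of how current status data $(T_i, \delta_i, Z_i)$ could be generated under this model. It assumes the design used in the simulation section of Chapter 1 ($\Lambda(x, \theta) = x^{\theta}$, $g(z) = z$, $T$ uniform on $[1, 2]$, $Z$ uniform on $[0.2, 1.2]$); drawing $T$ and $Z$ independently, and the function name `simulate_current_status`, are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def simulate_current_status(n, theta0=1.5, seed=0):
    """Draw n current status observations (T_i, delta_i, Z_i).

    Assumed design (illustrative): Lambda(x, theta) = x**theta, g(z) = z,
    T ~ Uniform[1, 2] and Z ~ Uniform[0.2, 1.2], drawn independently.
    """
    rng = np.random.default_rng(seed)
    T = rng.uniform(1.0, 2.0, size=n)      # monitoring times
    Z = rng.uniform(0.2, 1.2, size=n)      # scalar covariate
    # Given Z, Lambda(X, theta0) * g(Z) is a standard exponential variable,
    # so X can be drawn by inverting the cumulative hazard.
    E = rng.exponential(1.0, size=n)
    X = (E / Z) ** (1.0 / theta0)          # latent survival times, never observed
    delta = (X <= T).astype(int)           # current status indicator I(X <= T)
    return T, delta, Z

T, delta, Z = simulate_current_status(200)
print(delta.mean())   # fraction of subjects whose event occurred before monitoring
```

Only `T`, `delta` and `Z` are returned, which mirrors the fact that the survival time itself is never observed.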

Chapter 1

Generalized Proﬁle Maximum
Likelihood Estimation

1.1 Deﬁnition of estimators of 60 and g

In this dissertation we first use a semiparametric profile likelihood method to define
the estimator of the parameter. Both Klein and Spady (1993) and Nielsen, Linton,
and Bickel (1998) used generalized profile likelihood methods to estimate the finite
dimensional parameter, while the infinite dimensional nuisance parameter was
estimated by the kernel method. The ensuing discussion in this section will be a bit
informal. The precise conditions under which all definitions are valid are stated in
the next section.

In this chapter, $\Theta$ is assumed to be a compact subset of $\mathbb{R}^1$, and is denoted by
$N_0$.

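Note that, given $(T_i, Z_i)$, $i = 1, 2, \ldots, n$, the (conditional) log-likelihood for
$\theta$ and $g$ based on $(T_i, \delta_i, Z_i)$, $i = 1, 2, \ldots, n$, is

\[ \sum_{i=1}^{n} \left[ \delta_i \log\bigl(1 - \exp(-\Lambda(T_i, \theta) g(Z_i))\bigr) - (1 - \delta_i) \Lambda(T_i, \theta) g(Z_i) \right]. \]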
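As an illustration, the conditional log-likelihood can be evaluated as in the sketch below. The function name `log_likelihood`, the default choice $\Lambda(t, \theta) = t^{\theta}$ and the use of `expm1` for numerical stability are assumptions made for the example.

```python
import numpy as np

def log_likelihood(theta, g_vals, T, delta, Lam=lambda t, th: t ** th):
    """Conditional log-likelihood for current status data.

    g_vals[i] plays the role of g(Z_i); Lam(t, theta) is the known cumulative
    baseline hazard (t**theta here, purely as an illustrative default).
    """
    cum_haz = Lam(T, theta) * g_vals              # Lambda(T_i, theta) * g(Z_i)
    # log(1 - exp(-x)) evaluated as log(-expm1(-x)) to avoid cancellation
    log_F = np.log(-np.expm1(-cum_haz))
    return np.sum(delta * log_F - (1 - delta) * cum_haz)
```

An observation with $\delta_i = 1$ contributes $\log F$, and one with $\delta_i = 0$ contributes $\log(1 - F) = -\Lambda(T_i, \theta) g(Z_i)$.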

The idea of generalized profile likelihood methods is as follows:

(1) For a fixed $\theta$, obtain the estimates, $\hat g_\theta(Z_i)$, of $g(Z_i)$, $i = 1, \ldots, n$, by using some
method such as the kernel method.

(2) The generalized profile likelihood for $\theta$ arising when $g(Z_i)$ is replaced by $\hat g_\theta(Z_i)$ is

\[ \sum_{i=1}^{n} \left[ \delta_i \log\bigl(1 - \exp(-\Lambda(T_i, \theta) \hat g_\theta(Z_i))\bigr) - (1 - \delta_i) \Lambda(T_i, \theta) \hat g_\theta(Z_i) \right]. \]

Maximize it with respect to $\theta$ to obtain the estimate $\hat\theta$ of $\theta_0$.

(3) If we want to estimate $g(z)$, we treat $\hat\theta$ as the true parameter and use the method
in step (1), or some other method, to estimate it.

When $\theta = \theta_0$, $\hat g_{\theta_0}(Z_i)$ should approach $g(Z_i)$ for all fixed $Z_i$ as the sample size
$n$ tends to infinity. Moreover, the convergence must be faster than some particular
rate. This is hard to achieve for all $Z_i$, $i = 1, \ldots, n$, because of the edge effects in
kernel estimation. Hence we use the following modified likelihood for $\theta$ and $g$:

\[ l_n(\theta, g) = \sum_{i=1}^{n} w_1(T_i, b) w_2(Z_i, b) \left[ \delta_i \log\bigl(1 - \exp(-\Lambda(T_i, \theta) g(Z_i))\bigr) - (1 - \delta_i) \Lambda(T_i, \theta) g(Z_i) \right], \]

where $w_2(Z_i, b) = 1$ if $Z_i$ is at least $b$ away from the boundary of its support and 0 otherwise,
and $w_1(T_i, b) = 1$ if $T_i$ is at least $b$ away from the boundary of its support and 0 otherwise. More
precisely, for example, if the support of $Z$ is an interval $[z_1^*, z_2^*]$, then $w_2(Z_i, b) = 1$ if
$Z_i$ is in the interval $[z_1^* + b, z_2^* - b]$ and 0 otherwise, where $b$ depends on $n$ and $b \to 0$ as
$n \to \infty$. Therefore, the modified likelihood is almost the same as the real likelihood
for $n$ large enough.

In this dissertation, the support of a random variable (or possibly a random vector)
with a density with respect to Lebesgue measure means the closure of the set of all
points at which the density is positive.

To estimate $g$ for any fixed $\theta$, our approach uses a two-dimensional kernel method
to estimate

\[ F(T_j, Z_i, \theta_0), \quad i, j = 1, \ldots, n, \]

and then combines these estimates for each fixed $i$ to obtain $\hat g_\theta(Z_i)$, $i = 1, \ldots, n$. The
least squares method is used in the latter step.

Let $K$ be a kernel and $b$ the bandwidth. Define

\[ \hat F(T_j, Z_i) = \frac{\sum_{l \ne j, i} \delta_l K_b(T_l - T_j) K_b(Z_l - Z_i)}{\sum_{l \ne j, i} K_b(T_l - T_j) K_b(Z_l - Z_i)}, \quad 1 \le i, j \le n, \tag{1.1.1} \]

where $K_b(t) = K(t/b)/b$.

Note that $\hat F(\cdot, \cdot)$ depends on $j, i$, but we do not make this explicit until it is necessary.
Under certain conditions on $K$, $F$ and the density of $(T, Z)$, if $b \to 0$ and
$nb^2 \to \infty$, then, conditioned on $T_j, Z_i$, in probability,

\[ \hat F(T_j, Z_i) \to \lim_{b \to 0} \frac{E_{j,i}[\delta K_b(T - T_j) K_b(Z - Z_i)]}{E_{j,i}[K_b(T - T_j) K_b(Z - Z_i)]} = \frac{F(T_j, Z_i, \theta_0) h(T_j, Z_i)}{h(T_j, Z_i)} = F(T_j, Z_i, \theta_0), \]

where $h(t, z)$ is the joint density of $(T, Z)$ and $E_{j,i}$ denotes conditional expectation,
given $T_j, Z_i$. Therefore $\hat F(T_j, Z_i)$ can be used to estimate $F(T_j, Z_i, \theta_0)$.
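The kernel estimate (1.1.1) is a ratio of two kernel-weighted sums, which the following sketch evaluates at a single point $(t, z)$. The kernel is the one quoted in the simulation section, $K(x) = 9/8 - (15/8)x^2$ on $[-1, 1]$; the function names, the `exclude` argument used for the leave-out indices, and returning `nan` when the denominator vanishes are illustrative assumptions.

```python
import numpy as np

def K(x):
    """Kernel 9/8 - 15/8 x^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(x) <= 1.0, 9.0 / 8.0 - 15.0 / 8.0 * x**2, 0.0)

def F_hat(t, z, T, delta, Z, b, exclude=()):
    """Kernel estimate of F(t, z, theta0), omitting the observations in `exclude`."""
    # Product kernel weights K_b(T_l - t) * K_b(Z_l - z), with K_b(u) = K(u / b) / b
    w = K((T - t) / b) * K((Z - z) / b) / b**2
    if len(exclude):
        w[list(exclude)] = 0.0            # leave out l = i, j as in (1.1.1)
    denom = w.sum()
    if denom == 0:
        return np.nan                     # no observation within the bandwidth
    return (w * delta).sum() / denom
```

Because this kernel takes negative values near $\pm 1$, the ratio is not guaranteed to lie in $[0, 1]$ in finite samples, so a practical implementation would probably need some guard before taking $\log(1 - \hat F)$.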
Now if $\theta$ is the true parameter, then $-\log(1 - F(T_j, Z_i, \theta)) = \Lambda(T_j, \theta) g(Z_i)$, and
$-\log(1 - \hat F(T_j, Z_i))$ should be close to $-\log(1 - F(T_j, Z_i, \theta))$ for all $j$ and $i$ when
the sample size is large enough. For fixed $Z_i$, we shall estimate $g(Z_i)$ such that
$\Lambda(T_j, \theta) g(Z_i)$ is close to $-\log(1 - \hat F(T_j, Z_i))$, $j = 1, \ldots, n$.

 

 

 

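Let

\[ \hat g_\theta(Z_i) = - \frac{\sum_{j \ne i} w_1(T_j, b) \Lambda(T_j, \theta) \log(1 - \hat F(T_j, Z_i))}{\sum_{j \ne i} w_1(T_j, b) \Lambda^2(T_j, \theta)}, \tag{1.1.2} \]

a least squares estimator of $g(Z_i)$, attaining

\[ \min_{g(Z_i)} \sum_{j \ne i} w_1(T_j, b) \left[ \log(1 - \hat F(T_j, Z_i)) + \Lambda(T_j, \theta) g(Z_i) \right]^2. \]

The counterpart of $\hat g_\theta(z)$ in the limit is

\[ g_\theta(z) = - \frac{E[\Lambda(T, \theta) \log(1 - F(T, z, \theta_0))]}{E[\Lambda^2(T, \theta)]} = \frac{E[\Lambda(T, \theta) \Lambda(T, \theta_0)]}{E[\Lambda^2(T, \theta)]} \, g(z), \tag{1.1.3} \]

where $E$ means the expectation w.r.t. the true parameter $\theta_0$ and $g$. Note that, by
(1.1.3), $g_{\theta_0} = g$. Let

\[ F(t, z, \theta) = 1 - e^{-\Lambda(t, \theta) g_\theta(z)}, \quad \bar F = 1 - F, \tag{1.1.4} \]

and

\[ \hat F(t, z, \theta) = 1 - e^{-\Lambda(t, \theta) \hat g_\theta(z)}. \tag{1.1.5} \]

The modified profile log-likelihood that arises when $g$ is replaced by $\hat g_\theta$ is

\[ l_{n1}(\theta) = \sum_{i=1}^{n} w_1(T_i, b) w_2(Z_i, b) \left[ \delta_i \log\bigl(1 - \exp(-\Lambda(T_i, \theta) \hat g_\theta(Z_i))\bigr) - (1 - \delta_i) \Lambda(T_i, \theta) \hat g_\theta(Z_i) \right]. \tag{1.1.6} \]

The estimator, $\hat\theta$, of $\theta_0$ is the maximizer of the above likelihood over $\theta \in N_0$.

Finally, the estimator of $g(z)$ is defined as

\[ \hat g(z) = - \frac{\sum_{j=1}^{n} w_1(T_j, b) \Lambda(T_j, \hat\theta) \log(1 - \hat F(T_j, z))}{\sum_{j=1}^{n} w_1(T_j, b) \Lambda^2(T_j, \hat\theta)}. \]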
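The whole construction can be sketched end to end as follows. This is a rough illustration under several assumptions that are not in the thesis: $\Lambda(t, \theta) = t^{\theta}$ with the simulation design of Section 1.3, a nonnegative Epanechnikov kernel swapped in for simplicity, clipping of $\hat F$ away from 0 and 1 so that the logarithm is finite, dropping pairs whose kernel denominator vanishes, and a grid search in place of a proper optimizer. The function names are hypothetical.

```python
import numpy as np

def kernel_matrix(x, b):
    """K_b(x_l - x_m) for all pairs, Epanechnikov kernel (an assumption)."""
    u = (x[:, None] - x[None, :]) / b
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / b

def leave_out_sums(wt, KT, KZ):
    """S[j, i] = sum over l not in {i, j} of wt_l K_b(T_l - T_j) K_b(Z_l - Z_i)."""
    full = (wt[:, None] * KT).T @ KZ                  # sum over all l
    own_j = (wt * np.diag(KT))[:, None] * KZ          # the l = j term
    own_i = KT.T * (wt * np.diag(KZ))[None, :]        # the l = i term
    out = full - own_j - own_i
    # when i == j the same term was subtracted twice, so add it back once
    out[np.diag_indices_from(out)] += wt * np.diag(KT) * np.diag(KZ)
    return out

def gpmle(T, delta, Z, b, thetas, t_rng=(1.0, 2.0), z_rng=(0.2, 1.2), eps=1e-3):
    n = len(T)
    KT, KZ = kernel_matrix(T, b), kernel_matrix(Z, b)
    num = leave_out_sums(delta.astype(float), KT, KZ)
    den = leave_out_sums(np.ones(n), KT, KZ)
    valid = den > 1e-10
    F = np.full((n, n), 0.5)
    F[valid] = num[valid] / den[valid]                # F_hat(T_j, Z_i), cf. (1.1.1)
    H = -np.log1p(-np.clip(F, eps, 1.0 - eps))        # -log(1 - F_hat), clipped
    w1 = (T >= t_rng[0] + b) & (T <= t_rng[1] - b)    # w_1(T, b)
    w2 = (Z >= z_rng[0] + b) & (Z <= z_rng[1] - b)    # w_2(Z, b)
    M = valid & w1[:, None] & ~np.eye(n, dtype=bool)  # usable pairs (j, i), j != i

    def profile_loglik(theta):
        lam = T ** theta                              # Lambda(T, theta)
        den_g = (M * (lam**2)[:, None]).sum(axis=0)
        num_g = (M * (lam[:, None] * H)).sum(axis=0)
        keep = w1 & w2 & (den_g > 0)
        g_hat = np.maximum(num_g[keep] / den_g[keep], 1e-8)   # cf. (1.1.2)
        ch = lam[keep] * g_hat
        d = delta[keep]
        return np.sum(d * np.log(-np.expm1(-ch)) - (1 - d) * ch)  # cf. (1.1.6)

    values = np.array([profile_loglik(th) for th in thetas])
    return thetas[np.argmax(values)], values

rng = np.random.default_rng(1)
n, theta0, b = 300, 1.5, 0.15
T = rng.uniform(1, 2, n)
Z = rng.uniform(0.2, 1.2, n)
X = (rng.exponential(1.0, n) / Z) ** (1 / theta0)
delta = (X <= T).astype(int)
thetas = np.linspace(0.5, 2.5, 41)
theta_hat, values = gpmle(T, delta, Z, b, thetas)
print(theta_hat)
```

The bandwidth 0.15 is chosen only so that every interior point has neighbours in this small example; it is not meant to satisfy the bandwidth condition used in the theory, and the resulting estimate can be far from $\theta_0$ in any single sample.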

1.2 Asymptotic properties of the estimators

1.2.1 Consistency

In this section, we state the consistency of the generalized profile maximum likelihood
estimator $\hat\theta$. Before doing this, we give various assumptions which will be used to
prove the consistency and asymptotic normality of $\hat\theta$.

We list the following assumptions.

(A1) The respective supports $\mathcal{Z}$ and $\mathcal{T}$ of $Z$ and $T$ are closed intervals of $\mathbb{R}^1$.
$\Lambda(t, \theta)$, $g(z)$ and $h(t, z)$ are positive and continuous on their domains of definition
$\mathcal{T} \times N_0$, $\mathcal{Z}$ and $\mathcal{T} \times \mathcal{Z}$. Moreover, $\Lambda(t, \theta)$ is continuous in $\theta$ uniformly in $t$. The first
and second derivatives of $\Lambda(t, \theta)$ w.r.t. $\theta$, $\dot\Lambda(t, \theta)$ and $\ddot\Lambda(t, \theta)$, exist, and $\dot\Lambda(t, \theta)$, $\ddot\Lambda(t, \theta)$
are continuous in $\theta$ uniformly in $t$, and continuous in $t$ for any fixed $\theta$.

(A2) The functions $g(z)$ and $h(t, z)$ are four times differentiable on their domains
of definition with continuous 4th (partial) derivatives. Assume $\Lambda(t, \theta_0)$ is four times
differentiable in $t$ with a continuous 4th derivative.

(A3) The kernel function $K$ is an $r$-th order kernel supported on $[-1, 1]$, symmetric
about zero and Lipschitz continuous on its support. (An $r$-th order kernel means $K$
satisfies: $\int K(t)\,dt = 1$, $\int t^s K(t)\,dt = 0$ for $s = 1, \ldots, r - 1$ and $\int |t|^r |K(t)|\,dt < \infty$.)

(A3') The kernel function $K$ is Lipschitz continuous, supported on $[-1, 1]$, and
satisfies: $\int K(t)\,dt = 1$.

(A4) b = O(n‘°) with % < a < %.

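(A5) $\theta_0$ is an interior point of $N_0$, which is a compact subset of $\mathbb{R}^1$.

(A6)

\[ E\left[ \bigl( \dot\Lambda(T, \theta_0) g(Z) + \Lambda(T, \theta_0) \dot g_{\theta_0}(Z) \bigr)^2 \, \frac{\bar F(T, Z, \theta_0)}{F(T, Z, \theta_0)} \right] > 0, \]

where $\dot g_\theta(z)$ is the (partial) derivative of $g_\theta(z)$ with respect to $\theta$.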

Assumption (A1) or similar assumptions have appeared in the literature; see, for
example, Huang (1996), Klein and Spady (1993), and Nielsen, Linton and Bickel (1998).
Assumption (A2) is a smoothness condition on the model, which is used mainly for the
asymptotic normality. Assumptions (A3) and (A3') are made for the kernel. Note
that (A3) implies (A3'). Assumption (A4) is the bandwidth condition in kernel
estimation, which is crucial to the asymptotic normality. For the consistency of the
estimator, this bandwidth condition can be weakened. Assumptions (A1), (A3') and
(A5) are imposed for consistency of the estimator. To prove the asymptotic normality,
we use assumptions (A1)-(A6).

Next we state the theorem on the consistency of the estimator. The proof will
be given in Section 1.4.2, following the general preliminary Lemmas 1-5 on kernel
estimation in Section 1.4.1. Before the proof of the theorem in Section 1.4.2, we
first give Lemmas 6-9 on the uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ and of
$\hat g_\theta(Z_i)$ and its derivatives for $g_\theta(Z_i)$ and its derivatives, $1 \le i, j \le n$.

Theorem 1 Suppose that (A1), (A3’) and (A5) hold, b = 0(n’a) with 0 < a < %.
Then the generalized profile likelihood estimator, $\hat\theta$, which is obtained by maximizing
$l_{n1}(\theta)$, converges in probability to the true parameter $\theta_0$.

1.2.2 Asymptotic normality

In this section, we state the theorem on the asymptotic distribution of the estimator.
The proof is given after the proof of Theorem 1 in Section 1.4.2.

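Theorem 2 (Asymptotic distribution of $\hat\theta$) Suppose (A1)-(A6) hold with $r = 4$ for
(A3). Then

\[ \sqrt{n}(\hat\theta - \theta_0) \Rightarrow N(0, \sigma^2), \]

where

\[ \sigma^2 = \frac{E\{[D_1(T, Z, \theta_0) - A(T, Z, \theta_0)]^2 R(T, Z, \theta_0)\}}{[E(D_1^2(T, Z, \theta_0) R(T, Z, \theta_0))]^2}, \]

\[ C_0 = E\Lambda^2(T, \theta_0), \quad D_1(t, z, \theta_0) = \dot\Lambda(t, \theta_0) g(z) + \Lambda(t, \theta_0) \dot g_{\theta_0}(z), \]

\[ A(T, Z, \theta_0) = \frac{\Lambda(T, \theta_0) h_1(T)}{C_0 h(T, Z) R(T, Z, \theta_0)} \int \Lambda(t, \theta_0) D_1(t, Z, \theta_0) R(t, Z, \theta_0) h(t, Z)\,dt, \]

and

\[ h_1(t) = \int_{\mathcal{Z}} h(t, z)\,dz, \quad R(t, z, \theta_0) = \frac{\bar F(t, z, \theta_0)}{F(t, z, \theta_0)}. \]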
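The role of the weight $R$ can be seen from a short calculation. The following is a sketch offered as motivation rather than as part of the proof. It assumes that $R = \bar F / F$, that the per-observation profile log-likelihood is $\delta \log F(T, Z, \theta) + (1 - \delta) \log \bar F(T, Z, \theta)$ with $F$ as in (1.1.4), and it uses $g_{\theta_0} = g$ and $\operatorname{Var}(\delta \mid T, Z) = F \bar F$; all functions are evaluated at $(T, Z, \theta_0)$.

```latex
\begin{align*}
\frac{\partial}{\partial\theta} F(t, z, \theta)\Big|_{\theta = \theta_0}
  &= \bar F(t, z, \theta_0)\,
     \bigl[\dot\Lambda(t, \theta_0) g(z) + \Lambda(t, \theta_0)\dot g_{\theta_0}(z)\bigr]
   = \bar F(t, z, \theta_0)\, D_1(t, z, \theta_0), \\
\frac{\partial}{\partial\theta}
  \bigl[\delta \log F + (1 - \delta)\log \bar F\bigr]\Big|_{\theta = \theta_0}
  &= D_1 \Bigl[\delta\,\frac{\bar F}{F} - (1 - \delta)\Bigr]
   = D_1\,\frac{\delta - F}{F}, \\
\operatorname{Var}\Bigl(D_1\,\frac{\delta - F}{F}\,\Big|\, T, Z\Bigr)
  &= D_1^2\,\frac{F \bar F}{F^2}
   = D_1^2\, R .
\end{align*}
```

Taking expectations over $(T, Z)$ gives $E(D_1^2 R)$, the quantity required to be positive in (A6) and appearing, squared, in the denominator of $\sigma^2$.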


1.3 Simulation

Before we prove the stated asymptotic properties of the estimator, we examine
its behavior for small and moderate samples.

Assume that the conditional distribution of $X$, given $Z$, is a Weibull distribution
with distribution function

\[ 1 - e^{-x^{\theta_0} g(Z)}, \]

where $g(z) = z$. Also assume that $T$ and $Z$ are uniformly distributed on $[1, 2]$ and
$[0.2, 1.2]$ respectively.

For each fixed sample size ($n$ = 30, 60, 100, 200) and appropriate values of $b$,
100 samples are generated with the true parameter $\theta_0 = 1.5$, yielding 100 replications
of the generalized profile maximum likelihood estimator (GPMLE) of $\theta_0$. The means
and standard deviations are shown in the following table.

Table 1. Simulation results for GPMLE

    n      b         mean      s.d.
    30     0.0400    1.3847    1.3299
           0.0420    1.4915    1.4075
           0.0450    1.7596    1.3905
    60     0.0308    1.4720    0.9801
           0.0310    1.4824    0.9947
           0.0312    1.4908    1.0043
    100    0.0238    1.4535    0.7720
           0.0240    1.4876    0.7702
           0.0242    1.5075    0.7943
    200    0.0166    1.4560    0.4795
           0.0168    1.4990    0.4902
           0.0170    1.5421    0.5103

The kernel function used in the simulation is $K(x) = 9/8 - (15/8) x^2$ for $-1 \le x \le 1$, and 0
otherwise.
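As a quick check that this kernel is of fourth order in the sense of (A3), its moments can be computed exactly. The sketch below uses exact rational arithmetic; the helper name `kernel_moment` is an illustrative choice.

```python
from fractions import Fraction

def kernel_moment(s):
    """Exact value of the integral of t^s * (9/8 - 15/8 t^2) over [-1, 1]."""
    def mono(k):
        # integral of t^k over [-1, 1]: zero for odd k, 2 / (k + 1) for even k
        return Fraction(0) if k % 2 else Fraction(2, k + 1)
    return Fraction(9, 8) * mono(s) - Fraction(15, 8) * mono(s + 2)

moments = [kernel_moment(s) for s in range(5)]
print(moments)   # moments of order 0, 1, 2, 3, 4
```

The zeroth moment is 1, the moments of order 1 to 3 vanish, and the fourth moment is $-3/35$, so the kernel satisfies the moment conditions with $r = 4$, as required in Theorem 2.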

From the table we can see that the mean is close to the true value for all sample
sizes, and the standard deviation decreases as the sample size increases. The
choice of $b$ is crucial to reducing the bias of the estimator.

1.4 Proof of the consistency and asymptotic normality

1.4.1 Lemmas preliminary to the proof

To prove the consistency and asymptotic normality of the generalized profile maximum
likelihood estimator, the uniform consistency of $\hat F(T_j, Z_i)$ for $F(T_j, Z_i, \theta_0)$ over all
$1 \le i, j \le n$, and of $\hat g_\theta(Z_i)$ for $g_\theta(Z_i)$ over all $1 \le i \le n$ and $\theta \in N_0$, is proved first.
Since $\hat g_\theta(Z_i)$ is a function of $\hat F(T_j, Z_i)$ which, in view of (1.1.1), is a ratio of two sums
(or means) of independent random variables, we first discuss some uniform convergence
results for sums (or means) of independent random variables in a general
setting.

Lemma 1 Let $Y_1, \ldots, Y_n$ be i.i.d. $d$-dimensional random vectors. Let $D$ be a compact
subset of $\mathbb{R}^d$, and for each $t \in D$, let $W_n(t, \cdot)$, $n \ge 1$, be a sequence of measurable
functions on $\mathbb{R}^d$. Let

\[ \xi_n(t) = \frac{1}{n} \sum_{i=1}^{n} W_n(t, Y_i), \quad t \in D. \tag{1.4.1} \]

Let $0 < h_n = O(n^{-a_0})$ with $a_0 > 0$ and assume that for some $0 \le s, r < \infty$, and finite
real number $C_0$,

\[ h_n^r |W_n(t, y)| \le C_0, \quad h_n^s |W_n(t_1, y) - W_n(t_2, y)| \le C_0 \sum_{j=1}^{d} |t_{1j} - t_{2j}|, \tag{1.4.2} \]

uniformly for $y \in \mathbb{R}^d$ and for all $t, t_1, t_2$ in $D$. Assume also that

\[ E(W_n(t, Y_1)) = 0, \quad t \in D. \]

Then, for all $\alpha > 0$,

\[ \sup_{t \in D} |\xi_n(t)| = o_p\bigl(n^{-\frac{1 - \alpha}{2}} h_n^{-r}\bigr). \tag{1.4.3} \]
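Proof. Let

\[ \Delta_n = \frac{\varepsilon_n h_n^s}{2 C_0 d}, \]

where $0 < \varepsilon_n \to 0$ is to be chosen later. By (1.4.1) and the second part of (1.4.2), for
all $t_1, t_2 \in D$ with $t_i = (t_{i1}, \ldots, t_{id})$, $i = 1, 2$,

\[ |\xi_n(t_1) - \xi_n(t_2)| \le C_0 h_n^{-s} \sum_{j=1}^{d} |t_{1j} - t_{2j}|. \]

If $|t_{1j} - t_{2j}| \le \Delta_n$ for $j = 1, \ldots, d$, then this inequality and the definition of $\Delta_n$ lead to

\[ |\xi_n(t_1) - \xi_n(t_2)| \le C_0 h_n^{-s} d \Delta_n = \frac{\varepsilon_n}{2}. \]

Since $D$ is compact, it is contained in a hypercube. Without loss of generality,
let it be contained in the unit cube. Let $N_n = (1/\Delta_n)^d$ if $1/\Delta_n$ is an integer, and
$([1/\Delta_n] + 1)^d$ otherwise, where $[x]$ means the integer part of $x$. Divide the unit cube into small
cubes $C_{in}$, $i = 1, \ldots, N_n$, each with side length less than or equal to $\Delta_n$. Cover $D$ with the
sets $D \cap C_{in}$, $i = 1, \ldots, N_n$. Discard empty sets and let $D_{in}$, $i = 1, \ldots, M_n$, be the
remaining sets. Then $t_1, t_2 \in D_{in}$ implies that $|t_{1j} - t_{2j}| \le \Delta_n$, $j = 1, \ldots, d$. Note
also that

\[ M_n \le \left( \frac{1}{\Delta_n} + 1 \right)^d. \]

For $i = 1, \ldots, M_n$, let $t_i$ be a point in $D_{in}$. Then, by the triangle inequality,

\[ \sup_{t \in D} |\xi_n(t)| \le \sup_{i = 1, \ldots, M_n} \Bigl[ |\xi_n(t_i)| + \sup_{t \in D_{in}} |\xi_n(t) - \xi_n(t_i)| \Bigr] \le \sup_{i = 1, \ldots, M_n} |\xi_n(t_i)| + \frac{\varepsilon_n}{2}. \]

It follows that

\[ P\Bigl( \sup_{t \in D} |\xi_n(t)| > \varepsilon_n \Bigr) \le P\Bigl( \sup_{i = 1, \ldots, M_n} |\xi_n(t_i)| > \frac{\varepsilon_n}{2} \Bigr) \le \sum_{i=1}^{M_n} P\Bigl( |\xi_n(t_i)| > \frac{\varepsilon_n}{2} \Bigr). \tag{1.4.4} \]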

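Notice that, by (1.4.1) and the first part of (1.4.2), $n h_n^r \xi_n(t)$ is a sum of independent
and bounded random variables. Recall Bernstein's inequality (for example, from
Shorack and Wellner (1986), page 855): for independent random variables $\xi_1, \ldots, \xi_n$
with bounded ranges $[-M, M]$ and zero means,

\[ P(|\xi_1 + \cdots + \xi_n| > x) \le 2 \exp\left( -\frac{1}{2} \, \frac{x^2}{v + Mx/3} \right), \tag{1.4.5} \]

for $v \ge \operatorname{var}(\xi_1 + \cdots + \xi_n)$.

Apply the above inequality with $\xi_i = h_n^r W_n(t, Y_i)$, $x = n h_n^r \varepsilon_n / 2$ and $v = n C_0^2$ to
obtain

\[ P\Bigl( |\xi_n(t)| > \frac{\varepsilon_n}{2} \Bigr) \le 2 \exp\left( -\frac{1}{2} \, \frac{n^2 h_n^{2r} \varepsilon_n^2 / 4}{n C_0^2 + C_0 n h_n^r \varepsilon_n / 6} \right). \]

Since $h_n^r \varepsilon_n \to 0$ as $n \to \infty$, the second term in the denominator of the fraction will
be less than the first term for large enough $n$, and hence the above is less than

\[ 2 \exp(-C n h_n^{2r} \varepsilon_n^2), \]

for some $0 < C < \infty$, not depending on $n$, $h_n$ and $\varepsilon_n$.

It now readily follows from (1.4.4), the upper bound for $M_n$ and the definition of
$\Delta_n$ that

\[ P\Bigl( \sup_{t \in D} |\xi_n(t)| > \varepsilon_n \Bigr) \le 2 \left( \frac{2 C_0 d}{h_n^s \varepsilon_n} + 1 \right)^d \exp(-C n h_n^{2r} \varepsilon_n^2), \tag{1.4.6} \]

which is $o(1)$ if $\varepsilon_n = \varepsilon n^{-\frac{1 - \alpha}{2}} h_n^{-r}$ for all $\varepsilon > 0$ and $\alpha > 0$. The lemma is proved.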
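Bernstein's inequality (1.4.5) can be illustrated numerically. The sketch below compares the bound with Monte Carlo tail frequencies for sums of independent Uniform$(-M, M)$ variables; the distribution, the sample sizes and the thresholds are arbitrary choices made for the example.

```python
import numpy as np

def bernstein_bound(x, v, M):
    """Right-hand side of (1.4.5): 2 exp(-(1/2) x^2 / (v + M x / 3))."""
    return 2.0 * np.exp(-0.5 * x**2 / (v + M * x / 3.0))

rng = np.random.default_rng(0)
n, M, reps = 200, 1.0, 20000
xi = rng.uniform(-M, M, size=(reps, n))   # zero-mean variables bounded by M
abs_sums = np.abs(xi.sum(axis=1))
v = n * M**2 / 3.0                        # exact variance of the sum

results = []
for x in (10.0, 20.0, 30.0):
    empirical = np.mean(abs_sums > x)     # Monte Carlo tail frequency
    results.append((x, empirical, bernstein_bound(x, v, M)))
    print(x, empirical, bernstein_bound(x, v, M))
```

In each row the empirical tail frequency should fall below the bound, which is what the proof of Lemma 1 relies on when it sums the bound over the $M_n$ grid points.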

Next we use Lemma 1 to show the uniform convergence in probability
of means of independent random variables that have the same form as those
in the definition of $\hat F(T_j, Z_i)$, $1 \le i, j \le n$. Moreover, their mean square convergence
is also established, which is crucial to the proof of the asymptotic normality of the
generalized profile maximum likelihood estimator.

Let $\tilde U = (U_1, U_2, \ldots, U_d)$ be a random vector in $\mathbb{R}^d$ and $\gamma$ be a random variable
taking values 0 or 1, and let $(\tilde U_i, \gamma_i)$, with $\tilde U_i = (U_{i1}, U_{i2}, \ldots, U_{id})$, $i = 1, \ldots, n$, be i.i.d. copies of
$(\tilde U, \gamma)$.

Let $g$ be a function on $\mathbb{R}^d$ and $K$ be a function on $\mathbb{R}^1$. Let $K_b(t) = K(t/b)/b$,
$t \in \mathbb{R}^1$, where $b$ depends on $n$ and $b \to 0$ as $n \to \infty$. Also let $\tilde v = (v_1, v_2, \ldots, v_d)$ be a vector in
$\mathbb{R}^d$. If $\tilde u, \tilde v \in \mathbb{R}^d$ and $x, y \in \mathbb{R}^1$, then $x\tilde u + y\tilde v = (x u_1 + y v_1, \ldots, x u_d + y v_d)$. Let also
$d\tilde u = du_1 \cdots du_d$ in the integration.

Define

\[ T_n(\tilde v) = \frac{1}{n} \sum_{i=1}^{n} g(\tilde U_i) K_b(U_{i1} - v_1) \cdots K_b(U_{id} - v_d). \]

The following two lemmas establish the convergence of $T_n(\tilde v)$. Lemma 2 establishes
the rate of convergence of $T_n(\tilde v)$ to its mean, in probability and in mean square, uniformly
in $\tilde v$. Lemma 3 studies the rate of the asymptotic bias of $T_n(\tilde v)$.

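Lemma 2 Assume $\tilde U$ has a bounded (joint) density $f(\tilde u)$ with support $D_f = [s_1^*, t_1^*] \times
\cdots \times [s_d^*, t_d^*]$, where $s_i^*, t_i^* \in \mathbb{R}^1$, $i = 1, \ldots, d$. Also assume that $K(\cdot)$ is a bounded and
Lipschitz continuous function with

\[ \int_{-\infty}^{\infty} K^2(t)\,dt < \infty, \]

and $g(\tilde u)$ is bounded. Then,

\[ n b^d \sup_{\tilde v \in D_f} E|T_n(\tilde v) - E T_n(\tilde v)|^2 = O(1) \tag{1.4.7} \]

and for all $\alpha > 0$,

\[ n^{\frac{1 - \alpha}{2}} b^d \sup_{\tilde v \in D_f} |T_n(\tilde v) - E T_n(\tilde v)| = o_p(1). \tag{1.4.8} \]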

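Proof. Using the fact that $\operatorname{Var}(Y) \le EY^2$ for any random variable $Y$, and the
change of variable formula, we obtain

\begin{align*}
\operatorname{Var}(T_n(\tilde v)) &= \frac{1}{n} \operatorname{Var}\bigl( g(\tilde U) K_b(U_1 - v_1) \cdots K_b(U_d - v_d) \bigr) \\
&\le \frac{1}{n} E\bigl\{ g(\tilde U) K_b(U_1 - v_1) \cdots K_b(U_d - v_d) \bigr\}^2 \\
&= \frac{1}{n} \int g^2(\tilde u) \frac{1}{b^{2d}} K^2\Bigl( \frac{u_1 - v_1}{b} \Bigr) \cdots K^2\Bigl( \frac{u_d - v_d}{b} \Bigr) f(\tilde u)\,d\tilde u \\
&= \frac{1}{n b^d} \int g^2(\tilde v + b\tilde t) K^2(t_1) \cdots K^2(t_d) f(\tilde v + b\tilde t)\,d\tilde t.
\end{align*}

Therefore, by the boundedness of $f$ and $g$, and the square integrability of $K$,

\[ \sup_{\tilde v \in D_f} n b^d \operatorname{Var}(T_n(\tilde v)) = O(1). \]

Hence (1.4.7) is proved.

Apply Lemma 1 with $t = \tilde v$, $D = D_f$, $\xi_n(t) = T_n(\tilde v) - E T_n(\tilde v)$, $h_n = b$, $r = d$ and
$s = d + 1$ to obtain (1.4.8).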

Lemma 3 Assume the conditions of Lemma 2 hold.

(1) If $f$ and $g$ are also Lipschitz continuous and $K$ has support $[-1, 1]$ and satisfies

\[ \int K(t)\,dt = 1, \quad \int |K(t)|\,dt < \infty, \]

then

\[ \sup_{\tilde v \in D_f^0} |E T_n(\tilde v) - g(\tilde v) f(\tilde v)| = O(b), \]

where $D_f^0 = [s_1^* + b, t_1^* - b] \times \cdots \times [s_d^* + b, t_d^* - b]$.

(2) Suppose $f$ and $g$ have up to $r$th bounded and continuous (partial) derivatives, and
$K$ is an $r$th order kernel supported on $[-1, 1]$, and symmetric around zero. Then

\[ \sup_{\tilde v \in D_f^0} |E T_n(\tilde v) - g(\tilde v) f(\tilde v)| = O(b^r). \]

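Proof. We only prove the second assertion since the first one can be proved in a
similar but simpler way. A change of variables and a Taylor expansion yield

\begin{align*}
E[T_n(\tilde v)] &= E\, g(\tilde U) K_b(U_1 - v_1) \cdots K_b(U_d - v_d) \\
&= \int g(\tilde u) \frac{1}{b^d} K\Bigl( \frac{u_1 - v_1}{b} \Bigr) \cdots K\Bigl( \frac{u_d - v_d}{b} \Bigr) f(\tilde u)\,d\tilde u \\
&= \int g(\tilde v + b\tilde t) K(t_1) \cdots K(t_d) f(\tilde v + b\tilde t)\,d\tilde t \\
&= \int \Bigl[ \sum_{|k| < r} \frac{b^{|k|}}{k!} \partial^k (gf)(\tilde v)\, \tilde t^{\,k} + \sum_{|k| = r} \frac{b^r}{k!} \partial^k (gf)(\tilde v^*)\, \tilde t^{\,k} \Bigr] \prod_{j=1}^{d} K(t_j)\,d\tilde t \\
&= g(\tilde v) f(\tilde v) + O(b^r),
\end{align*}

uniformly in $\tilde v \in D_f^0$, where $k = (k_1, \ldots, k_d)$ is a multi-index with $|k| = k_1 + \cdots + k_d$,
$k! = k_1! \cdots k_d!$ and $\tilde t^{\,k} = t_1^{k_1} \cdots t_d^{k_d}$, and where $\tilde v^* = (v_1^*, \ldots, v_d^*)$ with $v_j^*$
between $v_j - b$ and $v_j + b$. In the last two steps, the assumptions
$\int_{-1}^{1} t^s K(t)\,dt = 0$, $s = 1, \ldots, r - 1$, and $\int_{-1}^{1} K(t)\,dt = 1$ were used.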
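The order of the bias can be checked numerically in one dimension. The sketch below is an illustration under assumed choices: $d = 1$, the fourth-order kernel $K(t) = 9/8 - (15/8)t^2$, $f$ the uniform density on $[0, 1]$, $g(u) = e^u$, and $v = 0.5$. After the change of variables, $E T_n(v)$ is an integral over $t \in [-1, 1]$, evaluated here by Gauss-Legendre quadrature.

```python
import numpy as np

def K(t):
    """Fourth-order kernel 9/8 - 15/8 t^2, used only on [-1, 1]."""
    return 9.0 / 8.0 - 15.0 / 8.0 * t**2

# Gauss-Legendre nodes and weights on [-1, 1]
nodes, weights = np.polynomial.legendre.leggauss(40)

def bias(b, v=0.5, gf=np.exp):
    """E T_n(v) - g(v) f(v) for d = 1; gf is the product g * f near v."""
    expectation = np.sum(weights * gf(v + b * nodes) * K(nodes))
    return expectation - gf(v)

ratio = bias(0.2) / bias(0.1)
print(ratio)   # halving b should shrink the bias by a factor near 2**4
```

A ratio close to 16 is what a bias of order $b^r$ with $r = 4$ predicts, since the moments of order 1 to 3 of the kernel vanish and the first surviving term is proportional to $b^4$.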

The following two lemmas discuss the convergence of two other forms of means
of independent random variables based on kernels. They will be used to prove the
theorems in the following section. The mean in Lemma 4 is already centered, and in
Lemma 5 the summands are centered at $g(\tilde v)$.

Lemma 4 Assume that the conditions of Lemma 2 hold. Assume also that $g(\tilde v)$ is
the conditional expectation of $\gamma$ given $\tilde U = \tilde v$. Let

\[ S_n(\tilde v) = \frac{1}{n} \sum_{i=1}^{n} [\gamma_i - g(\tilde U_i)] \prod_{j=1}^{d} K_b(U_{ij} - v_j). \]

Then,

\[ n b^d \sup_{\tilde v \in D_f} E|S_n(\tilde v)|^2 = O(1), \]

and for all $\alpha > 0$,

\[ n^{\frac{1 - \alpha}{2}} b^d \sup_{\tilde v \in D_f} |S_n(\tilde v)| = o_p(1). \]

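Proof. Note that

\[ E[S_n(\tilde v) \mid \tilde U_i, i = 1, \ldots, n] = 0. \]

Hence

\begin{align*}
\operatorname{Var}(S_n(\tilde v)) &= E\bigl[ \operatorname{Var}(S_n(\tilde v) \mid \tilde U_i, i = 1, \ldots, n) \bigr] \\
&= E\Bigl[ \frac{1}{n^2} \sum_{i=1}^{n} g(\tilde U_i)(1 - g(\tilde U_i)) \prod_{j=1}^{d} K_b^2(U_{ij} - v_j) \Bigr].
\end{align*}

The rest of the proof is exactly the same as that of Lemma 2.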

Lemma 5 Assume the conditions of Lemma 2 hold and that $g$ is Lipschitz continuous. Let

\[ T_n'(\tilde v) = \frac{1}{n} \sum_{i=1}^{n} [g(U_i) - g(\tilde v)]\, K_b(U_{i1} - v_1) \cdots K_b(U_{id} - v_d). \]

For the variance part of $T_n'(\tilde v)$, we have

\[ n b^{d-2} \sup_{\tilde v \in \mathcal{D}_f} E|T_n'(\tilde v) - E T_n'(\tilde v)|^2 = O(1) \tag{1.4.9} \]

and for all $a > 0$,

\[ n^{1-a} b^{2d} \sup_{\tilde v \in \mathcal{D}_f} |T_n'(\tilde v) - E T_n'(\tilde v)| = o_p(1). \tag{1.4.10} \]

For the bias part, we have the following.

(1) If $f$ is also Lipschitz continuous and $K$ satisfies $\int K(t)\,dt = 1$, $\int |K(t)|\,dt < \infty$, and has support $[0,1]$, then

\[ \sup_{\tilde v \in \mathcal{D}_f} |E T_n'(\tilde v)| = O(b). \tag{1.4.11} \]

(2) If $f$ and $g$ have bounded and continuous (partial) derivatives up to order $r$, and $K$ is an $r$th order kernel supported on $[-1,1]$ and symmetric around zero, then

\[ \sup_{\tilde v \in \mathcal{D}_f^0} |E T_n'(\tilde v)| = O(b^r). \tag{1.4.12} \]


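Proof. Since the difference term $g(U_i) - g(\tilde v)$ appears in $T_n'(\tilde v)$, we should expect a better convergence rate than that of $T_n(\tilde v)$. The proof is similar to that of Lemmas 2 and 3.

For any $\tilde v \in \mathcal{D}_f$,

\[
\begin{aligned}
\mathrm{Var}(T_n'(\tilde v)) &= \frac{1}{n} \mathrm{Var}\Big([g(U) - g(\tilde v)] \prod_{j=1}^{d} K_b(U_j - v_j)\Big) \\
&\le \frac{1}{n} E\Big([g(U) - g(\tilde v)]^2 \prod_{j=1}^{d} K_b^2(U_j - v_j)\Big) \\
&= \frac{1}{n} \int \cdots \int [g(\tilde u) - g(\tilde v)]^2\, b^{-2d} K^2\Big(\frac{u_1 - v_1}{b}\Big) \cdots K^2\Big(\frac{u_d - v_d}{b}\Big) f(\tilde u)\, d\tilde u \\
&= \frac{1}{n b^{d}} \int \cdots \int [g(\tilde v + b\tilde t) - g(\tilde v)]^2 K^2(t_1) \cdots K^2(t_d) f(\tilde v + b\tilde t)\, d\tilde t \\
&\le \frac{C}{n b^{d-2}},
\end{aligned}
\]

for some finite real number $C$ not depending on $\tilde v$. Thus (1.4.9) is proved.

Apply Lemma 1 with $t = \tilde v$, $D = \mathcal{D}_f$, $\xi(t) = T_n'(\tilde v) - E(T_n'(\tilde v))$, $h_n = b$, $r = d$ and $s = d + 1$ to obtain (1.4.10). Assertions (1.4.11) and (1.4.12) can be proved in the same way as in the proof of Lemma 3.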

1.4.2 Proof of Theorem 1 and 2

Before giving the proofs of Theorems 1 and 2, we use the general results of the previous section to obtain some preliminaries. To begin with, we establish the uniform convergence of $\hat F(T_j, Z_i)$ to $F(T_j, Z_i)$ over all $1 \le i, j \le n$, and of $\hat g_\theta(Z_i)$ to $g_\theta(Z_i)$, $\dot{\hat g}_\theta(Z_i)$ to $\dot g_\theta(Z_i)$ and $\ddot{\hat g}_\theta(Z_i)$ to $\ddot g_\theta(Z_i)$ over all $1 \le i \le n$ and $\theta \in \mathcal{N}_0$. The expected squared differences between $\hat F(T_j, Z_i)$ and $F(T_j, Z_i)$, between $\hat g_\theta(Z_i)$ and $g_\theta(Z_i)$, and between $\dot{\hat g}_\theta(Z_i)$ and $\dot g_\theta(Z_i)$ are established as well.

By assumption (A1), let $\mathcal{Z} = [z_1^*, z_2^*]$ and $\mathcal{T} = [t_1^*, t_2^*]$ be two finite real intervals. Let $\mathcal{Z}^0 = [z_1^* + b, z_2^* - b]$ and $\mathcal{T}^0 = [t_1^* + b, t_2^* - b]$. Then the support of $h$ is $\mathcal{D}_h := [t_1^*, t_2^*] \times [z_1^*, z_2^*]$. Also, let $\mathcal{D}_h^0 = \mathcal{T}^0 \times \mathcal{Z}^0$.


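Recall the definition of $\hat F(T_j, Z_i)$ from (1.1.1). Write

\[ \hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0) = \frac{V_n^{(j,i)}(T_j, Z_i) + B_n^{(j,i)}(T_j, Z_i)}{B_{n0}^{(j,i)}(T_j, Z_i)}, \tag{1.4.13} \]

where

\[ V_n^{(j,i)}(t, z) = \frac{1}{n} \sum_{l \ne i,j} [\delta_l - F(T_l, Z_l, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z), \tag{1.4.14} \]

\[ B_n^{(j,i)}(t, z) = \frac{1}{n} \sum_{l \ne i,j} [F(T_l, Z_l, \theta_0) - F(t, z, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z) \tag{1.4.15} \]

and

\[ B_{n0}^{(j,i)}(t, z) = \frac{1}{n} \sum_{l \ne i,j} K_b(T_l - t) K_b(Z_l - z). \]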
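The decomposition (1.4.13) is an exact algebraic identity once $\hat F(T_j, Z_i)$ is a leave-out kernel-weighted average of the $\delta_l$. The following is a minimal numerical sketch under stated assumptions: the form of $\hat F$ used here (the ratio $\sum_{l \ne i,j} \delta_l K_b K_b / \sum_{l \ne i,j} K_b K_b$) is inferred from (1.4.13) rather than copied from (1.1.1), and the Epanechnikov kernel, the baseline $\Lambda(t, \theta) = \theta t$ and $g(z) = e^z$ are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_b(x, b):
    """Rescaled kernel K_b(x) = K(x / b) / b with the Epanechnikov kernel K."""
    u = x / b
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0) / b

def true_F(t, z, theta0=1.0):
    """Hypothetical model F = 1 - exp(-Lambda(t, theta0) g(z)), Lambda = theta0 * t, g = exp."""
    return 1.0 - np.exp(-theta0 * t * np.exp(z))

n, b = 400, 0.25
T = rng.uniform(0.0, 1.0, n)
Z = rng.uniform(0.0, 1.0, n)
delta = (rng.uniform(size=n) < true_F(T, Z)).astype(float)  # current status indicators

def decomposition(j, i):
    """Return (F_hat - F0, (V + B) / B0) at (T_j, Z_i), leaving out observations i and j."""
    keep = np.ones(n, dtype=bool)
    keep[[i, j]] = False
    w = kernel_b(T[keep] - T[j], b) * kernel_b(Z[keep] - Z[i], b)
    F_l = true_F(T[keep], Z[keep])
    F0 = true_F(T[j], Z[i])
    V = np.sum((delta[keep] - F_l) * w) / n   # analogue of V_n^{(j,i)}
    B = np.sum((F_l - F0) * w) / n            # analogue of B_n^{(j,i)}
    B0 = np.sum(w) / n                        # analogue of B_{n0}^{(j,i)}
    F_hat = np.sum(delta[keep] * w) / np.sum(w)
    return F_hat - F0, (V + B) / B0

lhs, rhs = decomposition(j=3, i=7)
print(lhs, rhs)
```

The two returned numbers agree up to floating-point error, which is why the analysis of $\hat F - F$ reduces to separate bounds on the stochastic term $V_n^{(j,i)}$, the bias term $B_n^{(j,i)}$ and the density estimate $B_{n0}^{(j,i)}$.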

We first show that $V_n^{(j,i)}(T_j, Z_i)$ and $B_n^{(j,i)}(T_j, Z_i)$ converge to 0 in probability, uniformly over $1 \le i, j \le n$, and that the conditional expectations of their squares, given $T_j$ and $Z_i$, converge to 0 at a certain rate, uniformly over $1 \le i, j \le n$. The same convergence results for $B_{n0}^{(j,i)}(T_j, Z_i)$ to $h(T_j, Z_i)$ are obtained as well. The previous lemmas are used to obtain these convergence results. More specifically, we have the following lemma.

In the following, $\sup_{j,i}$ stands for $\sup_{1 \le j, i \le n}$ and $\sup_{(T_j, Z_i) \in \mathcal{D}_h^0}$ stands for $\sup_{1 \le j, i \le n,\ (T_j, Z_i) \in \mathcal{D}_h^0}$.

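Lemma 6 (1) Assume that conditions (A1) and (A3') hold, and $b = O(n^{-a})$ with $0 < a < \frac{1}{4}$. Then,

\[ \sup_{j,i} |V_n^{(j,i)}(T_j, Z_i)| = o_p(1), \tag{1.4.16} \]

\[ \sup_{j,i} E_{j,i} |V_n^{(j,i)}(T_j, Z_i)|^2 = o_p(n^{-1/2}), \tag{1.4.17} \]

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |B_n^{(j,i)}(T_j, Z_i)| = o_p(1), \tag{1.4.18} \]

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |B_{n0}^{(j,i)}(T_j, Z_i) - h(T_j, Z_i)| = o_p(1), \tag{1.4.19} \]

where $E_{j,i}$ stands for conditional expectation, given $T_j$, $Z_i$.

(2) Assume (A1)-(A3) hold with $r = 4$, and $b = O(n^{-a})$ with $\frac{1}{16} < a < \frac{1}{4}$. Then

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} E_{j,i} |B_n^{(j,i)}(T_j, Z_i)|^2 = o_p(n^{-1/2}), \tag{1.4.20} \]

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} E_{j,i} |B_{n0}^{(j,i)}(T_j, Z_i) - h(T_j, Z_i)|^2 = o_p(n^{-1/2}). \tag{1.4.21} \]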
Proof. Define

\[ V_n(t, z) = \frac{1}{n} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z). \]

Apply Lemma 4 with $S_n(\tilde v) = V_n(t, z)$, $d = 2$, $Y_i = \delta_i$ and $g = F(t, z, \theta_0)$ to obtain

\[ \sup_{(t,z) \in \mathcal{D}_h} E|V_n(t, z)|^2 = O\Big(\frac{1}{n b^2}\Big) = o(n^{-1/2}), \tag{1.4.22} \]

and

\[ \sup_{(t,z) \in \mathcal{D}_h} |V_n(t, z)| = o_p(1). \tag{1.4.23} \]

Since $K$ is bounded, by the definitions of $V_n^{(j,i)}(t, z)$ and $V_n(t, z)$, we obtain

\[ \sup_{1 \le j, i \le n}\ \sup_{(t,z) \in \mathcal{D}_h} |V_n^{(j,i)}(t, z) - V_n(t, z)| \le \frac{C}{n b^2} = o(n^{-1/2}), \tag{1.4.24} \]

for some constant $0 < C < \infty$. It follows from (1.4.23), (1.4.24) and the triangle inequality that

\[ \sup_{1 \le j, i \le n}\ \sup_{(t,z) \in \mathcal{D}_h} |V_n^{(j,i)}(t, z)| = o_p(1). \]

Hence (1.4.16) is obtained.

Similarly, by (1.4.22) and (1.4.24), apply the inequality $(x - y)^2 \le 2(x^2 + y^2)$, $x, y \in \mathbb{R}$, to obtain

\[ \sup_{1 \le j, i \le n}\ \sup_{(t,z) \in \mathcal{D}_h} E|V_n^{(j,i)}(t, z)|^2 = O\Big(\frac{1}{n b^2}\Big) = o(n^{-1/2}). \]

Hence (1.4.17) is obtained.

Define

\[ B_n(t, z) = \frac{1}{n} \sum_{l=1}^{n} [F(T_l, Z_l, \theta_0) - F(t, z, \theta_0)]\, K_b(T_l - t) K_b(Z_l - z) \]

and

\[ B_{n0}(t, z) = \frac{1}{n} \sum_{l=1}^{n} K_b(T_l - t) K_b(Z_l - z). \]

Apply (1.4.10) and (1.4.11) of Lemma 5 with $T_n'(\tilde v) = B_n(t, z)$, $g(\tilde v) = F(t, z, \theta_0)$, $U_l = (T_l, Z_l)$, $d = 2$ and $a = a_0$ to obtain that, for each $a_0 > 0$,

\[ \sup_{(t,z) \in \mathcal{D}_h} |B_n(t, z)| = o_p(n^{-1+a_0} b^{-4}) + O(b), \]

which is $o_p(1)$ when $a_0$ is chosen small enough, because of the assumption on the rate at which $b$ converges to 0. An argument similar to the one above leads to (1.4.18).

Similarly, apply (1.4.8) of Lemma 2 and part (1) of Lemma 3 with $T_n(\tilde v) = B_{n0}(t, z)$, $g(\tilde v) = 1$, $d = 2$ and $U_l = (T_l, Z_l)$ to obtain that

\[ \sup_{(t,z) \in \mathcal{D}_h^0} |B_{n0}(t, z) - h(t, z)| = o_p(1), \tag{1.4.25} \]

and (1.4.19) follows from the same discussion as above.

Use the identity $E(Y^2) = \mathrm{Var}(Y) + (EY)^2$ for any random variable $Y$, and then apply (1.4.9) and (1.4.12) of Lemma 5 with $T_n'(\tilde v) = B_n(t, z)$, $r = 4$ and $d = 2$ to obtain that

\[ \sup_{(t,z) \in \mathcal{D}_h^0} E|B_n(t, z)|^2 = O(n^{-1}) + O(b^8), \]

which is $o(n^{-1/2})$ since $b = O(n^{-a})$ with $a > \frac{1}{16}$. Thus (1.4.20) follows from the same discussion as above.

Similarly, apply (1.4.7) of Lemma 2 and part (2) of Lemma 3 with $T_n(\tilde v) = B_{n0}(t, z)$, $r = 4$ and $d = 2$ to obtain

\[ \sup_{(t,z) \in \mathcal{D}_h^0} E|B_{n0}(t, z) - h(t, z)|^2 = o(n^{-1/2}). \]

(1.4.21) follows from the same discussion as above. The lemma is proved.

Since, by assumption (A1), $h(t, z)$ is bounded away from 0 and $\infty$, and $F(t, z, \theta_0)$ is bounded away from 0 and 1, their estimators also have these properties with probability approaching 1 as the sample size tends to infinity. We therefore discuss the convergence of these estimators to their limits only on the set on which these properties are satisfied.

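There exist real numbers $0 < a_1 \le a_2 < \infty$ such that $a_1 < \inf_{(t,z) \in \mathcal{D}_h} h(t, z)$ and $a_2 > \sup_{(t,z) \in \mathcal{D}_h} h(t, z)$, and $0 < d_1 \le d_2 < 1$ such that $d_1 < \inf_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0)$ and $d_2 > \sup_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0)$. In particular, choose

\[ a_1 = \inf_{(t,z) \in \mathcal{D}_h} h(t, z) - \epsilon, \qquad a_2 = \sup_{(t,z) \in \mathcal{D}_h} h(t, z) + \epsilon, \]

and

\[ d_1 = \inf_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0) - \epsilon, \qquad d_2 = \sup_{(t,z) \in \mathcal{D}_h} F(t, z, \theta_0) + \epsilon, \]

for some $\epsilon > 0$.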
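Write $\hat F^{(j,i)}(T_j, Z_i)$ for $\hat F(T_j, Z_i)$, as the latter depends on $(j, i)$, and let $\hat F^{(j,i)}(t, z)$ be obtained from (1.1.1) with $T_j$, $Z_i$ replaced by $t$, $z$ respectively. Let

\[ A_{n1} = \Big\{ a_1 \le \min_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} B_{n0}^{(j,i)}(t, z) \le \max_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} B_{n0}^{(j,i)}(t, z) \le a_2 \Big\}, \]

\[ A_{n2} = \Big\{ d_1 \le \min_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} \hat F^{(j,i)}(t, z) \le \max_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} \hat F^{(j,i)}(t, z) \le d_2 \Big\}. \]

In the definition of $\hat g(Z_i)$, see (1.1.2), the summation is taken over those $j$ such that $T_j \in \mathcal{T}^0$, i.e. $w_1(T_j, b) = 1$, $j = 1, \dots, n$. As we discuss the convergence rate of $\hat g(Z_i)$ to $g(Z_i)$, we want to exclude the case when all the $T_j$ fall into the edge area, more specifically, $\sum_{j=1}^{n} w_1(T_j, b) = 0$. Therefore, define

\[ A_{n3} = \Big\{ \sum_{j=1}^{n} w_1(T_j, b) > 0 \Big\}. \]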


It is easy to see that the probability of the complement of $A_{n3}$ satisfies $P(A_{n3}^c) = O(b^n)$, by assumption (A1).

Let

\[ A_n = A_{n1} \cap A_{n2} \cap A_{n3}. \]

The probability of $A_n$ is expected to go to 1 as $n$ tends to $\infty$. This is proved later. Next, the main results used to prove the consistency and asymptotic normality of the generalized profile maximum likelihood estimator are established in the following two lemmas.

Lemma 7 (1) Assume conditions (A1) and (A3') hold, and $b = O(n^{-a})$ with $0 < a < \frac{1}{4}$. Then

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)| = o_p(1). \tag{1.4.26} \]

(2) Assume conditions (A1), (A2) and (A3) hold, and $b = O(n^{-a})$ with $\frac{1}{16} < a < \frac{1}{4}$. Then

\[ \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} E_{j,i} |\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)|^2 I_{A_n} = o_p(n^{-1/2}). \tag{1.4.27} \]

Proof. Note that $h(t, z)$ is bounded away from 0. Thus (1.4.26) follows from (1.4.13), (1.4.16), (1.4.18), (1.4.19), and (1.4.27) follows from (1.4.13), (1.4.17), (1.4.20), (1.4.19). The lemma is proved.

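Recall that $\dot g_\theta(z)$ is the first (partial) derivative of $g_\theta(z)$ with respect to $\theta$. Let $\ddot g_\theta(z)$ be the second (partial) derivative of $g_\theta(z)$ with respect to $\theta$. Similarly define $\dot{\hat g}_\theta(z)$ and $\ddot{\hat g}_\theta(z)$.

Lemma 8 (1) If (A1)-(A3) hold, and $b = O(n^{-a})$ with $\frac{1}{16} < a < \frac{1}{4}$, then

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} E_i |\hat g_\theta(Z_i) - g_\theta(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}), \tag{1.4.28} \]

and

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} E_i |\dot{\hat g}_\theta(Z_i) - \dot g_\theta(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}), \tag{1.4.29} \]

where $E_i$ stands for the conditional expectation given $Z_i$.

(2) If conditions (A1), (A3') hold, and $b = O(n^{-a})$ with $0 < a < \frac{1}{4}$, then

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} |\hat g_\theta(Z_i) - g_\theta(Z_i)| = o_p(1), \tag{1.4.30} \]

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} |\dot{\hat g}_\theta(Z_i) - \dot g_\theta(Z_i)| = o_p(1), \]

and

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} |\ddot{\hat g}_\theta(Z_i) - \ddot g_\theta(Z_i)| = o_p(1). \]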

Proof. We prove only (1.4.30) and (1.4.28). The proofs of the remaining results are similar.

In view of (1.1.2) and (1.1.3), $\hat g_\theta(Z_i) - g_\theta(Z_i)$ can be decomposed into $R_{n1,\theta}(Z_i) + R_{n2,\theta}(Z_i)$, where

\[ R_{n1,\theta}(Z_i) = - \frac{\sum_{j \ne i} w_1(T_j, b) \Lambda(T_j, \theta) \big[\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))\big]}{\sum_{j \ne i} w_1(T_j, b) \Lambda^2(T_j, \theta)} \]

and

\[ R_{n2,\theta}(Z_i) = \Big[ \frac{\sum_{j \ne i} w_1(T_j, b) \Lambda(T_j, \theta) \Lambda(T_j, \theta_0)}{\sum_{j \ne i} w_1(T_j, b) \Lambda^2(T_j, \theta)} - \frac{E \Lambda(T, \theta) \Lambda(T, \theta_0)}{E \Lambda^2(T, \theta)} \Big] g(Z_i). \]

It is enough to show that

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} E_i |R_{nk,\theta}(Z_i)|^2 I_{A_n} = o_p(n^{-1/2}), \quad k = 1, 2, \tag{1.4.31} \]

under the conditions of part (1), and

\[ \sup_{Z_i \in \mathcal{Z}^0,\ \theta \in \mathcal{N}_0} |R_{nk,\theta}(Z_i)| = o_p(1), \quad k = 1, 2, \tag{1.4.32} \]

under the conditions of part (2). By the mean value theorem,

\[ |\log(x) - \log(y)| \le \frac{|x - y|}{x \wedge y} \tag{1.4.33} \]

for all positive $x$, $y$. Apply this with $x = 1 - \hat F(T_j, Z_i)$ and $y = 1 - F(T_j, Z_i, \theta_0)$ to obtain

\[ |\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))| \le \frac{|\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)|}{(1 - \hat F(T_j, Z_i)) \wedge (1 - F(T_j, Z_i, \theta_0))}. \tag{1.4.34} \]

By the definition of $R_{n1,\theta}(Z_i)$, and the boundedness of $\Lambda(t, \theta)$ away from 0 and $\infty$, we obtain that, on $A_{n3}$,

\[ \sup_{Z_i \in \mathcal{Z}^0}\ \sup_{\theta \in \mathcal{N}_0} |R_{n1,\theta}(Z_i)| \le C \sup_{(T_j, Z_i) \in \mathcal{D}_h^0} |\log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0))|, \]

for some constant $0 < C < \infty$. This, (1.4.34), (1.4.26), (1.4.27) and the boundedness of $F(t, z, \theta_0)$ away from 0 and 1 imply (1.4.31) and (1.4.32) with $k = 1$.

The Lipschitz continuity of $\Lambda(t, \theta)$ with respect to $\theta$ uniformly in $t$, and the uniform SLLN imply (1.4.31) and (1.4.32) with $k = 2$. (They can also be proved by applying Lemma 1.)

Notice that $\hat F(T_j, Z_i)$ does not depend on $\theta$ and $\hat g_\theta(Z_i)$ depends on $\theta$ only through $\Lambda(T_j, \theta)$. By the assumption on $\Lambda(T_j, \theta)$, we can similarly prove the remaining assertions. The lemma is proved.

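We now show that the probability of $A_n$ approaches 1 as $n \to \infty$.

Lemma 9 Assume that (A1) and (A3') hold. Then

\[ \lim_{n \to \infty} P(A_n) = 1. \]

Proof. It suffices to show that

\[ \lim_{n \to \infty} P(A_{nk}) = 1, \quad \text{or equivalently} \quad \lim_{n \to \infty} P(A_{nk}^c) = 0, \quad k = 1, 2, 3. \]

We have seen that $\lim_{n \to \infty} P(A_{n3}^c) = 0$ by its definition. We first prove the above assertion with $k = 1$. By the definition of $A_{n1}$, its complement is contained in

\[ \Big\{ \sup_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} |B_{n0}^{(j,i)}(t, z) - h(t, z)| > \epsilon \Big\}. \]

One also notes that

\[ B_{n0}(t, z) - B_{n0}^{(j,i)}(t, z) = \big[ K_b(T_j - t) K_b(Z_j - z) + K_b(T_i - t) K_b(Z_i - z) \big] / n, \]

the absolute value of which is less than $C/(nb^2)$ for all $1 \le i, j \le n$, $(t, z) \in \mathcal{D}_h$, and for some finite constant $C$, since $K$ is bounded. Hence we have

\[ P\Big( \sup_{\substack{(t,z) \in \mathcal{D}_h^0 \\ 1 \le j, i \le n}} |B_{n0}^{(j,i)}(t, z) - h(t, z)| > \epsilon \Big) \le P\Big( \sup_{(t,z) \in \mathcal{D}_h^0} |B_{n0}(t, z) - h(t, z)| > \epsilon - C/(nb^2) \Big), \]

which is $o(1)$ in view of (1.4.25) and the fact that $nb^2 \to \infty$ as $n \to \infty$. We thus obtain

\[ \lim_{n \to \infty} P(A_{n1}) = 1. \]

Let

\[ \tilde F(t, z) = \frac{\sum_{l=1}^{n} \delta_l K_b(T_l - t) K_b(Z_l - z)}{\sum_{l=1}^{n} K_b(T_l - t) K_b(Z_l - z)}. \]

Similarly, one can obtain

\[ P(A_{n2}^c) \le P\Big( \sup_{(t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - F(t, z, \theta_0)| > \epsilon - \sup_{j, i, (t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - \hat F^{(j,i)}(t, z)| \Big), \]

which is $o(1)$ if

\[ \sup_{j, i, (t,z) \in \mathcal{D}_h^0} |\tilde F(t, z) - \hat F^{(j,i)}(t, z)| = o_p(1). \]

This is easy to show and is omitted here. The lemma is proved.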


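Proof of Theorem 1 (Consistency) It is enough to show that $l_{n1}(\theta)/n$ converges in probability, uniformly in $\mathcal{N}_0$, to a nonrandom function that has a unique maximizer at $\theta_0$.

We are going to prove later that

\[ \sup_{\theta \in \mathcal{N}_0} |l_{n1}(\theta)/n - \bar l_{n1}(\theta)| = o_p(1), \tag{1.4.35} \]

where

\[ \bar l_{n1}(\theta) = \frac{1}{n} \sum_{i=1}^{n} w_1(T_i, b) w_2(Z_i, b) \big[ \delta_i \log F(T_i, Z_i, \theta) - (1 - \delta_i) \Lambda(T_i, \theta) g_\theta(Z_i) \big]. \]

By a uniform law of large numbers, which holds under our conditions, and the fact that

\[ P(w_1(T, b) = 0) = O(b) \quad \text{and} \quad P(w_2(Z, b) = 0) = O(b), \]

we obtain

\[ \sup_{\theta \in \mathcal{N}_0} |\bar l_{n1}(\theta) - l(\theta)| = o_p(1), \tag{1.4.36} \]

where

\[
\begin{aligned}
l(\theta) &= E\big[ \delta \log(1 - e^{-\Lambda(T, \theta) g_\theta(Z)}) - (1 - \delta) \Lambda(T, \theta) g_\theta(Z) \big] \\
&= \iint \big[ (1 - e^{-\Lambda(t, \theta_0) g(z)}) \log(1 - e^{-\Lambda(t, \theta) g_\theta(z)}) - \Lambda(t, \theta) g_\theta(z) e^{-\Lambda(t, \theta_0) g(z)} \big] h(t, z)\, dt\, dz.
\end{aligned}
\]

This can also be obtained by applying Lemma 1 with $t = \theta$, $\xi(t) = \bar l_{n1}(\theta)$, $D = \mathcal{N}_0$, $r = s = 0$.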

Next we prove that $l(\theta)$ has a unique maximizer at $\theta_0$. One notes that the function

\[ f(y) = (1 - e^{-x}) \log(1 - e^{-y}) - y e^{-x} \]

attains its maximum at $y = x$ for any $x > 0$ and $y > 0$, because

\[ f'(y) = \frac{e^{-y} - e^{-x}}{1 - e^{-y}}, \]

which is positive for $y < x$, equals 0 for $y = x$ and is negative for $y > x$. Apply this with $x = \Lambda(t, \theta_0) g(z)$ and $y = \Lambda(t, \theta) g_\theta(z)$ to obtain that $l(\theta) \le l(\theta_0)$, and $l(\theta_1) = l(\theta_0)$ iff $\Lambda(t, \theta_1) g_{\theta_1}(z) = \Lambda(t, \theta_0) g(z)$, $(t, z) \in \mathcal{D}_h$. This and the identifiability condition imply that $l(\theta) < l(\theta_0)$ for any $\theta \ne \theta_0$. Therefore $l(\theta)$ is uniquely maximized at $\theta_0$. This, (1.4.35) and (1.4.36) prove the theorem.
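The pointwise maximization step can be checked numerically. The sketch below is an illustration only, with an arbitrary value $x = 0.8$ chosen as an assumption: it evaluates $f(y) = (1 - e^{-x})\log(1 - e^{-y}) - y e^{-x}$ on a grid, locates its maximizer, and compares the closed-form derivative with a central finite difference.

```python
import numpy as np

def f(y, x):
    """f(y) = (1 - e^{-x}) log(1 - e^{-y}) - y e^{-x}, for x, y > 0."""
    return (1.0 - np.exp(-x)) * np.log(1.0 - np.exp(-y)) - y * np.exp(-x)

def f_prime(y, x):
    """Closed-form derivative (e^{-y} - e^{-x}) / (1 - e^{-y})."""
    return (np.exp(-y) - np.exp(-x)) / (1.0 - np.exp(-y))

x = 0.8  # arbitrary illustrative value
y_grid = np.linspace(0.01, 5.0, 5000)
y_best = y_grid[np.argmax(f(y_grid, x))]  # grid maximizer, expected near y = x

# Central finite difference at y = 0.5, to compare with the closed form.
h = 1e-6
fd = (f(0.5 + h, x) - f(0.5 - h, x)) / (2.0 * h)
print(y_best, fd, f_prime(0.5, x))
```

The grid maximizer sits at $y \approx x$, and the derivative changes sign there, which is the property used to conclude $l(\theta) \le l(\theta_0)$.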

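Now we establish (1.4.35). Write $w_{ii}$ for $w_1(T_i, b) w_2(Z_i, b)$. It is enough to prove that

\[ \sup_{\theta \in \mathcal{N}_0} \Big| \frac{1}{n} \sum_{i=1}^{n} w_{ii} \delta_i \log \hat F(T_i, Z_i, \theta) - \frac{1}{n} \sum_{i=1}^{n} w_{ii} \delta_i \log F(T_i, Z_i, \theta) \Big| = o_p(1) \tag{1.4.37} \]

and

\[ \sup_{\theta \in \mathcal{N}_0} \Big| \frac{1}{n} \sum_{i=1}^{n} w_{ii} (1 - \delta_i) \Lambda(T_i, \theta) \hat g_\theta(Z_i) - \frac{1}{n} \sum_{i=1}^{n} w_{ii} (1 - \delta_i) \Lambda(T_i, \theta) g_\theta(Z_i) \Big| = o_p(1). \tag{1.4.38} \]

Apply (1.4.33) with $x = \hat F(T_i, Z_i, \theta)$ and $y = F(T_i, Z_i, \theta)$ to obtain

\[ |\log \hat F(T_i, Z_i, \theta) - \log F(T_i, Z_i, \theta)| \le \frac{|\hat F(T_i, Z_i, \theta) - F(T_i, Z_i, \theta)|}{\hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}. \tag{1.4.39} \]

By the mean value theorem,

\[ |e^{-x} - e^{-y}| \le |x - y|, \tag{1.4.40} \]

for all positive $x$, $y$. Apply this with $x = \Lambda(T_i, \theta) \hat g_\theta(Z_i)$ and $y = \Lambda(T_i, \theta) g_\theta(Z_i)$ and recall the definitions of $\hat F(T_i, Z_i, \theta)$ and $F(T_i, Z_i, \theta)$ (see (1.1.5) and (1.1.4)) to obtain that the right-hand side of (1.4.39) is no more than

\[ \frac{\Lambda(T_i, \theta) |\hat g_\theta(Z_i) - g_\theta(Z_i)|}{\hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}. \]

Therefore the left-hand side of (1.4.37) is no more than

\[ \frac{\sup_{\theta \in \mathcal{N}_0,\ (T_i, Z_i) \in \mathcal{D}_h^0} \Lambda(T_i, \theta) |\hat g_\theta(Z_i) - g_\theta(Z_i)|}{\inf_{\theta \in \mathcal{N}_0,\ (T_i, Z_i) \in \mathcal{D}_h^0} \hat F(T_i, Z_i, \theta) \wedge F(T_i, Z_i, \theta)}, \]

which is $o_p(1)$ because of the boundedness of $\Lambda$, (1.4.26), (1.4.30) and the boundedness of $F(t, z, \theta_0)$ away from 0. This proves (1.4.37), and (1.4.38) can be proved in a similar way. Hence (1.4.35) is proved.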
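The two elementary inequalities (1.4.33) and (1.4.40) carry the whole argument above, so a quick numerical sanity check may be useful. The sketch below draws random positive pairs (the sampling range is an arbitrary assumption) and confirms both bounds hold on every draw, up to a small floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.01, 5.0, 100_000)
y = rng.uniform(0.01, 5.0, 100_000)

# (1.4.33): |log x - log y| <= |x - y| / min(x, y) for positive x, y.
log_gap = np.abs(np.log(x) - np.log(y))
log_bound = np.abs(x - y) / np.minimum(x, y)

# (1.4.40): |exp(-x) - exp(-y)| <= |x - y| for positive x, y.
exp_gap = np.abs(np.exp(-x) - np.exp(-y))
exp_bound = np.abs(x - y)

log_ok = bool(np.all(log_gap <= log_bound + 1e-12))
exp_ok = bool(np.all(exp_gap <= exp_bound + 1e-12))
print(log_ok, exp_ok)
```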

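Proof of Theorem 2 (Asymptotic normality) We first prove the following.

\[ \sup_{(T_i, Z_i) \in \mathcal{D}_h^0,\ \theta \in \mathcal{N}_0} |\hat F(T_i, Z_i, \theta) - F(T_i, Z_i, \theta)| = o_p(1), \tag{1.4.41} \]

where $\hat F(t, z, \theta)$ is defined in (1.1.5) and $F(t, z, \theta)$ is defined in (1.1.4). Apply (1.4.40) with $x = \Lambda(t, \theta) \hat g_\theta(z)$ and $y = \Lambda(t, \theta) g_\theta(z)$ to obtain

\[ |\hat F(t, z, \theta) - F(t, z, \theta)| \le \Lambda(t, \theta) |\hat g_\theta(z) - g_\theta(z)|. \]

This, the boundedness of $\Lambda$ and (1.4.30) imply (1.4.41).

By the definition of $g_\theta(z)$, see (1.1.3), and the assumption on $\Lambda$ and $g$ (see Assumption (A1)), $g_\theta(z)$, as a function of $\theta$ and $z$ on $\mathcal{N}_0 \times \mathcal{Z}$, is bounded away from 0 and $\infty$. It follows from the definition of $F(t, z, \theta)$ that, as a function of $t$, $z$ and $\theta$, it is bounded away from 0 and 1.

Let

\[ \hat D_1(T_i, Z_i, \theta) = \Lambda(T_i, \theta) \dot{\hat g}_\theta(Z_i) + \dot\Lambda(T_i, \theta) \hat g_\theta(Z_i), \tag{1.4.42} \]

and

\[ D_1(T_i, Z_i, \theta) = \Lambda(T_i, \theta) \dot g_\theta(Z_i) + \dot\Lambda(T_i, \theta) g_\theta(Z_i). \]

It follows from part (2) of Lemma 8 that

\[ \sup_{(T_i, Z_i) \in \mathcal{D}_h^0} |\hat D_1(T_i, Z_i, \theta) - D_1(T_i, Z_i, \theta)| = o_p(1). \tag{1.4.43} \]

Now we begin to prove the theorem. The derivative, with respect to $\theta$, of the modified profile log-likelihood $l_{n1}(\theta)$, defined in (1.1.6), is given by

\[ \frac{\partial}{\partial \theta} l_{n1}(\theta) = \sum_{i=1}^{n} w_{ii} \Big[ \frac{\delta_i}{\hat F(T_i, Z_i, \theta)} - 1 \Big] \hat D_1(T_i, Z_i, \theta). \]

Let

\[ S_n(\theta) = \frac{1}{\sqrt{n}} \frac{\partial}{\partial \theta} l_{n1}(\theta). \]

Then, by the mean value theorem,

\[ 0 = S_n(\hat\theta) = S_n(\theta_0) + \sqrt{n} (\hat\theta - \theta_0) \dot S_n(\theta^*), \tag{1.4.44} \]

where $\theta^*$ is between $\theta_0$ and $\hat\theta$, and

\[
\begin{aligned}
\dot S_n(\theta) = {} & - \frac{1}{n} \sum_{i=1}^{n} w_{ii} \frac{\delta_i [1 - \hat F(T_i, Z_i, \theta)]}{\hat F^2(T_i, Z_i, \theta)} \hat D_1^2(T_i, Z_i, \theta) \\
& + \frac{1}{n} \sum_{i=1}^{n} w_{ii} \Big[ \frac{\delta_i}{\hat F(T_i, Z_i, \theta)} - 1 \Big] \big[ \ddot\Lambda(T_i, \theta) \hat g_\theta(Z_i) + 2 \dot\Lambda(T_i, \theta) \dot{\hat g}_\theta(Z_i) + \Lambda(T_i, \theta) \ddot{\hat g}_\theta(Z_i) \big].
\end{aligned}
\]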

We are going to show that $\dot S_n(\theta^*)$ converges in probability to a negative number. To do this, let $S_n^*(\theta)$ be obtained from $\dot S_n(\theta)$ with $\hat F(T_i, Z_i, \theta)$ replaced by $F(T_i, Z_i, \theta)$, $\hat D_1(T_i, Z_i, \theta)$ replaced by $D_1(T_i, Z_i, \theta)$, $\hat g_\theta(Z_i)$ replaced by $g_\theta(Z_i)$, $\dot{\hat g}_\theta(Z_i)$ replaced by $\dot g_\theta(Z_i)$ and $\ddot{\hat g}_\theta(Z_i)$ replaced by $\ddot g_\theta(Z_i)$. In view of (1.4.41), (1.4.43), part (2) of Lemma 8, the boundedness of $F(t, z, \theta)$ away from 0, and the boundedness of $\Lambda$, $\dot\Lambda$ and $\ddot\Lambda$, we obtain

\[ \sup_{\theta \in \mathcal{N}_0} |\dot S_n(\theta) - S_n^*(\theta)| = o_p(1). \tag{1.4.45} \]

One also notes that, under assumption (A1), $S_n^*(\theta)$ is Lipschitz continuous in $\theta$ on $\mathcal{N}_0$. This, (1.4.45) and the triangle inequality imply

\[ |\dot S_n(\theta^*) - S_n^*(\theta_0)| = o_p(1). \tag{1.4.46} \]

Since $S_n^*(\theta_0)$ is the mean of bounded random variables, it follows from the SLLN (Strong Law of Large Numbers) that $S_n^*(\theta_0)$ converges with probability 1 to

\[ - E\Big[ \frac{1 - F(T, Z, \theta_0)}{F(T, Z, \theta_0)} D_1^2(T, Z, \theta_0) \Big] =: -d(\theta_0). \]

This and (1.4.46) imply that $\dot S_n(\theta^*)$ converges to $-d(\theta_0)$ in probability. Hence it follows from this and (1.4.44) that

\[ \sqrt{n} (\hat\theta - \theta_0) = d^{-1}(\theta_0) S_n(\theta_0) [1 + o_p(1)]. \tag{1.4.47} \]

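Next we find the limiting distribution of $S_n(\theta_0)$. Write $\hat g(z)$ for $\hat g_{\theta_0}(z)$.

Write

\[ S_n(\theta_0) = E_n + Q_n, \tag{1.4.48} \]

where

\[ E_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [\delta_i - F(T_i, Z_i, \theta_0)] \frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)}, \tag{1.4.49} \]

and

\[ Q_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)] \frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)}. \tag{1.4.50} \]

Both $E_n$ and $Q_n$ contribute to the limiting distribution of $S_n(\theta_0)$.

First we deal with $E_n$. Write $w_{j1}$ for $w_1(T_j, b)$. By the definition of $E_n$, it can be rewritten as

\[ E_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [\delta_i - F(T_i, Z_i, \theta_0)] \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + R_n, \tag{1.4.51} \]

where

\[ R_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [\delta_i - F(T_i, Z_i, \theta_0)] \tilde D(T_i, Z_i) \]

and

\[ \tilde D(T_i, Z_i) = \frac{\hat D_1(T_i, Z_i, \theta_0)}{\hat F(T_i, Z_i, \theta_0)} - \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)}. \tag{1.4.52} \]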

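In view of (1.4.41) and (1.4.43), $R_n$ is expected to go to 0 in probability. To prove this, we show that the expectation of $R_n^2$ converges to 0 as $n$ tends to infinity. Note that

\[
\begin{aligned}
R_n^2 = {} & \frac{1}{n} \sum_{i=1}^{n} w_{ii} [\delta_i - F(T_i, Z_i, \theta_0)]^2 \tilde D^2(T_i, Z_i) \\
& + \frac{1}{n} \sum_{i_1 \ne i_2} w_{i_1 i_1} [\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)] \tilde D(T_{i_1}, Z_{i_1}) \\
& \qquad \times w_{i_2 i_2} [\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)] \tilde D(T_{i_2}, Z_{i_2}).
\end{aligned}
\]

That the first term on the right-hand side of the above expression is $o_p(1)$ follows from

\[ \sup_{(T_i, Z_i) \in \mathcal{D}_h^0} |\tilde D(T_i, Z_i)| = o_p(1), \]

which in turn follows from (1.4.41), (1.4.43) and the fact that $F(t, z, \theta)$ is bounded away from 0 and 1.

To prove that the second term of the expression of $R_n^2$ goes to 0 in probability, define

\[ \hat F^{(i_2)}(T_j, Z_{i_1}) = \frac{\sum_{k \ne i_1, i_2, j} \delta_k K_b(T_k - T_j) K_b(Z_k - Z_{i_1})}{\sum_{k \ne i_1, j} K_b(T_k - T_j) K_b(Z_k - Z_{i_1})}, \quad 1 \le j, i_1, i_2 \le n,\ i_1 \ne i_2. \]

For $1 \le i_1, i_2 \le n$ and $i_1 \ne i_2$, let $\tilde D^{(i_2)}(T_{i_1}, Z_{i_1})$ be obtained from $\tilde D(T_{i_1}, Z_{i_1})$ with $\hat F(T_j, Z_{i_1})$, $1 \le j \le n$, replaced by $\hat F^{(i_2)}(T_j, Z_{i_1})$. For any $1 \le i_1 \le n$, by the definitions of $\tilde D(T_{i_1}, Z_{i_1})$ (see (1.4.52)) and $\hat D_1(T_{i_1}, Z_{i_1}, \theta)$ (see (1.4.42)), $\tilde D(T_{i_1}, Z_{i_1})$ depends on $\hat F(T_j, Z_{i_1})$, $1 \le j \le n$, through $\hat g_{\theta_0}(Z_{i_1})$. See (1.1.2) for the dependence of $\hat g_{\theta_0}(Z_{i_1})$ on $\hat F(T_j, Z_{i_1})$, $1 \le j \le n$.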

One can see that, for $1 \le j, i_1, i_2 \le n$ and $i_1 \ne i_2$,

\[ \hat F(T_j, Z_{i_1}) - \hat F^{(i_2)}(T_j, Z_{i_1}) = \frac{\delta_{i_2} K_b(T_{i_2} - T_j) K_b(Z_{i_2} - Z_{i_1})}{\sum_{k \ne i_1, j} K_b(T_k - T_j) K_b(Z_k - Z_{i_1})} \quad \text{if } i_2 \ne j. \tag{1.4.53} \]

In order to study this difference, denote $W(t, z) := \delta K_b(T - t) K_b(Z - z)/n$. Since $nb^2 |W(t, z)| < C_0$ for some finite number $C_0$, by the Bernstein inequality (see (1.4.5)),

\[ P\big( nb^2 |W(t, z)| > nb^2 \epsilon_n \big) \le 2 \exp\Big( - \frac{n^2 b^4 \epsilon_n^2}{2 C_0^2 + C_0 n b^2 \epsilon_n / 3} \Big) \le 2 \exp(-C n b^2 \epsilon_n). \tag{1.4.54} \]

The last inequality holds for some finite and positive number $C$ if $b^2 \epsilon_n = O(n^{-a})$ for some $0 < a < 1$. In view of the proof of Lemma 1, instead of using $\xi_n(t)$, $\epsilon_n$, $h_n$, $d$, $r$ and $s$ as in that proof, here we use $nW(t, z)$, $n\epsilon_n$, $b$, 2, 2 and 3; then (1.4.6) there, with the exponential part replaced by that of (1.4.54), leads to

 

4 2
P( sup |W(t, 2)] > 5") g 2 (aneb +1) eatp (—Cnb2€n)a

(t,z)ev,, n
which is $o(1)$ if $\epsilon_n$ is chosen to be $n^{-(1-a_0)} b^{-2}$ for any $0 < a_0 < 1$. For these values of $a_0$, $b^2 \epsilon_n = O(n^{-(1-a_0)})$, so that (1.4.54) holds.

It follows that

\[ \sup_{(t,z) \in \mathcal{D}_h} |W(t, z)| = o_p(n^{-(1-a_0)} b^{-2}). \]

Since $b = O(n^{-a})$ with $\frac{1}{8} < a < \frac{1}{4}$, the above rate is $o_p(n^{-(1-a_0-2a)})$ and is $o_p(n^{-1/2})$ if $0 < a_0 < \frac{1}{2} - 2a$. Therefore, we obtain

\[ \sup_{(t,z) \in \mathcal{D}_h} |W(t, z)| = o_p(n^{-1/2}). \tag{1.4.55} \]

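By (1.4.19) of Lemma 6, and the fact that $\inf_{(t,z) \in \mathcal{D}_h} h(t, z) > 0$, it follows from (1.4.53) and (1.4.55) that

\[ \sup_{\substack{1 \le j, i_1, i_2 \le n \\ i_1 \ne i_2}} |\hat F(T_j, Z_{i_1}) - \hat F^{(i_2)}(T_j, Z_{i_1})| = o_p(n^{-1/2}). \]

By the definitions of $\tilde D^{(i_2)}(T_{i_1}, Z_{i_1})$ and $\tilde D(T_{i_1}, Z_{i_1})$, $1 \le i_1, i_2 \le n$, $i_1 \ne i_2$, and assumption (A1), we can obtain that, with probability approaching 1,

\[ \sup_{\substack{1 \le i_1, i_2 \le n \\ i_1 \ne i_2}} |\tilde D^{(i_2)}(T_{i_1}, Z_{i_1}) - \tilde D(T_{i_1}, Z_{i_1})| \le C \sup_{\substack{1 \le j, i_1, i_2 \le n \\ i_1 \ne i_2}} |\hat F(T_j, Z_{i_1}) - \hat F^{(i_2)}(T_j, Z_{i_1})|, \]

for some finite and positive $C$. It follows from the above two displays that

\[ \sup_{\substack{1 \le i_1, i_2 \le n \\ i_1 \ne i_2}} |\tilde D^{(i_2)}(T_{i_1}, Z_{i_1}) - \tilde D(T_{i_1}, Z_{i_1})| = o_p(n^{-1/2}). \tag{1.4.56} \]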

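Next we show that, in the second part of $R_n^2$, $\tilde D(T_{i_1}, Z_{i_1})$ can be replaced with $\tilde D(T_{i_1}, Z_{i_1}) - \tilde D^{(i_2)}(T_{i_1}, Z_{i_1})$, $1 \le i_1, i_2 \le n$, $i_1 \ne i_2$, without changing its expectation. The reason is as follows. For any $1 \le i \le n$, given $(T_k, Z_k)$, $1 \le k \le n$, and $\delta_k$, $k \ne i$, the conditional expectation of $\delta_i - F(T_i, Z_i, \theta_0)$ is 0. For $1 \le i_1, i_2 \le n$, $i_1 \ne i_2$, by their definitions, neither $\tilde D^{(i_2)}(T_{i_1}, Z_{i_1})$ nor $\tilde D^{(i_1)}(T_{i_2}, Z_{i_2})$ depends on $\delta_{i_1}$ and $\delta_{i_2}$, and $\tilde D(T_{i_1}, Z_{i_1})$ does not depend on $\delta_{i_1}$. Given $(T_k, Z_k)$, $1 \le k \le n$, and $\delta_k$, $1 \le k \le n$, $k \ne i_1$, the conditional expectations of

\[ [\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)] [\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)]\, \tilde D^{(i_2)}(T_{i_1}, Z_{i_1})\, \tilde D^{(i_1)}(T_{i_2}, Z_{i_2}) \]

and

\[ [\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)] [\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)]\, [\tilde D(T_{i_1}, Z_{i_1}) - \tilde D^{(i_2)}(T_{i_1}, Z_{i_1})]\, \tilde D^{(i_1)}(T_{i_2}, Z_{i_2}) \]

are zero. Thus their expectations are 0 too. Therefore the expectation of the second part of $R_n^2$ is equal to

\[
\begin{aligned}
E\Big( \frac{1}{n} \sum_{i_1 \ne i_2} & w_{i_1 i_1} [\delta_{i_1} - F(T_{i_1}, Z_{i_1}, \theta_0)] [\tilde D(T_{i_1}, Z_{i_1}) - \tilde D^{(i_2)}(T_{i_1}, Z_{i_1})] \\
& \times w_{i_2 i_2} [\delta_{i_2} - F(T_{i_2}, Z_{i_2}, \theta_0)] [\tilde D(T_{i_2}, Z_{i_2}) - \tilde D^{(i_1)}(T_{i_2}, Z_{i_2})] \Big),
\end{aligned}
\]

which is $o(1)$ in view of (1.4.56) and the boundedness of $\delta_i - F(T_i, Z_i, \theta_0)$. Therefore,

\[ R_n = o_p(1), \]

and hence, by (1.4.51),

\[ E_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [\delta_i - F(T_i, Z_i, \theta_0)] \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1). \]

Since $P(w_{ii} = 0) = O(b) = o(1)$, $D_1(T_i, Z_i, \theta_0)$ is bounded, $F(T_i, Z_i, \theta_0)$ is bounded away from 0, and the conditional expectation of $\delta_i - F(T_i, Z_i, \theta_0)$ given $(T_i, Z_i)$ is 0, it is easy to see that

\[ E_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [\delta_i - F(T_i, Z_i, \theta_0)] \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1). \tag{1.4.57} \]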

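Now we deal with $Q_n$. Recall $Q_n$ from (1.4.50). Under the boundedness conditions on $\Lambda(t, \theta_0)$, $\dot\Lambda(t, \theta_0)$ and $F(t, z, \theta_0)$, by the uniform boundedness of $\hat g(Z_i)$ and hence of $\hat F(T_i, Z_i, \theta_0)$ on $A_n$ (defined before), applying (1.4.40) with $x = \Lambda(T_i, \theta_0) \hat g(Z_i)$ and $y = \Lambda(T_i, \theta_0) g(Z_i)$, we can see that

\[
\begin{aligned}
& \Big| Q_n - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)] \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} \Big| I_{A_n} \\
& \quad \le \frac{C}{\sqrt{n}} \sum_{i=1}^{n} |\hat g(Z_i) - g(Z_i)| \big[ |\hat g(Z_i) - g(Z_i)| + |\dot{\hat g}(Z_i) - \dot g(Z_i)| \big] I_{A_n} \\
& \quad = \frac{C}{\sqrt{n}} \sum_{i=1}^{n} \big[ |\hat g(Z_i) - g(Z_i)|^2 + |\hat g(Z_i) - g(Z_i)|\, |\dot{\hat g}(Z_i) - \dot g(Z_i)| \big] I_{A_n}.
\end{aligned}
\]

Here $C$ is a positive and finite number. Taking expectation first conditioned on $Z_i$ for each sub-term, we obtain that, by (1.4.28), the expectation of the first term in the last display is $o(1)$. It follows from the Cauchy-Schwarz inequality, (1.4.28) and (1.4.29) that the expectation of the second term is also $o(1)$. Hence we obtain that, on $A_n$,

\[ Q_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii} [F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)] \frac{D_1(T_i, Z_i, \theta_0)}{F(T_i, Z_i, \theta_0)} + o_p(1). \tag{1.4.58} \]

By Taylor expansion of $1 - e^{-x}$ with respect to $x$ at some point $x_0$, applying this with $x = \Lambda(T_i, \theta_0) \hat g(Z_i)$ and $x_0 = \Lambda(T_i, \theta_0) g(Z_i)$, and noticing the boundedness of $\Lambda$, we obtain

\[ \big| [F(T_i, Z_i, \theta_0) - \hat F(T_i, Z_i, \theta_0)] + \Lambda(T_i, \theta_0) [1 - F(T_i, Z_i, \theta_0)] [\hat g(Z_i) - g(Z_i)] \big| \le C |\hat g(Z_i) - g(Z_i)|^2, \]

for some finite and positive number $C$.

This, the boundedness of $\xi(T_i, Z_i)$ (defined below) and (1.4.28) imply that, on $A_n$,

\[ Q_n = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) [\hat g(Z_i) - g(Z_i)] + o_p(1), \tag{1.4.59} \]

where

\[ \xi(t, z) := \Lambda(t, \theta_0) D_1(t, z, \theta_0) \frac{1 - F(t, z, \theta_0)}{F(t, z, \theta_0)}, \quad (t, z) \in \mathcal{D}_h. \tag{1.4.60} \]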


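Let

\[ C_{n0} = \frac{1}{n} \sum_{j=1}^{n} w_{j1} \Lambda^2(T_j, \theta_0). \]

By the law of large numbers and the fact that $P(w_{j1} = 0) = o(1)$, $C_{n0}$ converges in probability to $E\Lambda^2(T, \theta_0)$, which is $c_0$ according to the notation used before.

By Taylor expansion of $\log(1 - x)$ with respect to $x$ at some point $x_0$, applying this with $x = \hat F(T_j, Z_i)$ and $x_0 = F(T_j, Z_i, \theta_0)$, and noticing (1.4.27) of Lemma 7, we obtain that, on $A_n$,

\[
\begin{aligned}
\hat g(Z_i) - g(Z_i) &= - \frac{1}{C_{n0}} \frac{1}{n} \sum_{j \ne i} w_{j1} \Lambda(T_j, \theta_0) \big[ \log(1 - \hat F(T_j, Z_i)) - \log(1 - F(T_j, Z_i, \theta_0)) \big] \\
&= \frac{1}{C_{n0}} \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0) [\hat F(T_j, Z_i) - F(T_j, Z_i, \theta_0)]}{1 - F(T_j, Z_i, \theta_0)} + o_p(n^{-1/2}).
\end{aligned}
\]

This and (1.4.13) imply that, on $A_n$,

\[ \hat g(Z_i) - g(Z_i) = R_{n1}(Z_i) + R_{n2}(Z_i) + o_p(n^{-1/2}), \tag{1.4.61} \]

where

\[ R_{n1}(Z_i) = \frac{1}{C_{n0}} \frac{1}{n} \sum_{j \ne i} w_{j1} \Lambda(T_j, \theta_0) \frac{V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}, \tag{1.4.62} \]

and

\[ R_{n2}(Z_i) = \frac{1}{C_{n0}} \frac{1}{n} \sum_{j \ne i} w_{j1} \Lambda(T_j, \theta_0) \frac{B_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}. \tag{1.4.63} \]

Substitute (1.4.61) into (1.4.59) to obtain that, on $A_n$,

\[ Q_n = Q_{n1} + Q_{n2} + o_p(1), \tag{1.4.64} \]

where

\[ Q_{n1} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) R_{n1}(Z_i) \]

and

\[ Q_{n2} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) R_{n2}(Z_i). \tag{1.4.65} \]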

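Let $C_{n1} := Q_{n1} C_{n0}$. Then

\[ C_{n1} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0) V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}. \]

By Taylor expansion of $1/x$ with respect to $x$ at some point $x_0$, applying this with $x = B_{n0}^{(j,i)}(T_j, Z_i)$ and $x_0 = h(T_j, Z_i)$, we obtain

\[
\begin{aligned}
C_{n1} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} & w_{j1} \frac{\Lambda(T_j, \theta_0)}{1 - F(T_j, Z_i, \theta_0)} V_n^{(j,i)}(T_j, Z_i) \\
& \times \Big[ \frac{1}{h(T_j, Z_i)} - \frac{1}{h^{*2}(T_j, Z_i)} \big( B_{n0}^{(j,i)}(T_j, Z_i) - h(T_j, Z_i) \big) \Big],
\end{aligned}
\]

where $h^*(T_j, Z_i)$ is between $h(T_j, Z_i)$ and $B_{n0}^{(j,i)}(T_j, Z_i)$. By (1.4.17) and (1.4.21) of Lemma 6, and the boundedness of $h$, $\xi$, $\Lambda$ and $F$, using the Cauchy-Schwarz inequality, we obtain that

\[ C_{n1} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0) V_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} + o_p(1). \]

Let

\[ \zeta_n(T_j, Z_l) = \frac{1}{n} \sum_{i \ne j, l} w_{ii} \frac{\xi(T_i, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} K_b(Z_i - Z_l) \]

and

\[ \zeta(t, z) = \frac{\int h(s, z)\, \xi(s, z)\, ds}{[1 - F(t, z, \theta_0)]\, h(t, z)}, \quad (t, z) \in \mathcal{D}_h, \tag{1.4.66} \]

where $\xi(s, z)$ is defined in (1.4.60). Then, by the definition of $V_n^{(j,i)}(T_j, Z_i)$, $1 \le i, j \le n$ (see (1.4.14)), and a change of the order of summation,

\[ C_{n1} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta_n(T_j, Z_l) + o_p(1). \]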


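Let $w_{ll}(2b) = 1$ if $T_l \in [t_1^* + 2b, t_2^* - 2b]$ and $Z_l \in [z_1^* + 2b, z_2^* - 2b]$, and 0 otherwise, $1 \le l \le n$. Now we write the main part of $C_{n1}$ as the sum of two parts according to whether $w_{ll}(2b) = 1$ or 0. The reason for doing this is the edge effect of the kernel estimation. Write

\[ C_{n1} = \tilde C_{n1} + C_{n1}^0 + o_p(1), \tag{1.4.67} \]

where

\[ \tilde C_{n1} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} w_{ll}(2b) [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta_n(T_j, Z_l) \]

and

\[ C_{n1}^0 = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} (1 - w_{ll}(2b)) [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta_n(T_j, Z_l). \]

Since, conditioned on $(T_i, Z_i)$, $i = 1, \dots, n$, $\delta_l - F(T_l, Z_l, \theta_0)$ and $\delta_k - F(T_k, Z_k, \theta_0)$ for $l \ne k$ are independent with mean zero and variances $F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]$ and $F(T_k, Z_k, \theta_0)[1 - F(T_k, Z_k, \theta_0)]$ respectively, by taking conditional expectation first, we can see that

\[
\begin{aligned}
E(C_{n1}^0)^2 = E\Big\{ \frac{1}{n} \sum_{l=1}^{n} & (1 - w_{ll}(2b)) F(T_l, Z_l, \theta_0) [1 - F(T_l, Z_l, \theta_0)] \\
& \times \Big( \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta_n(T_j, Z_l) \Big)^2 \Big\},
\end{aligned}
\]

which is $o(1)$ because

\[
\begin{aligned}
\frac{1}{n} \sum_{j \ne l} E|\Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta_n(T_j, Z_l)| & \le \frac{1}{n^2} \sum_{j \ne l} \sum_{i \ne j, l} E\Big| \frac{\Lambda(T_j, \theta_0)\, \xi(T_i, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} K_b(Z_i - Z_l) K_b(T_j - T_l) \Big| \\
& = O(1)
\end{aligned}
\]

and $E(1 - w_{ll}(2b)) = o(1)$. Here we use the boundedness of $\Lambda \xi / [(1 - F) h]$ and the fact that $\sup_{1 \le i, j, l \le n} E|K_b(Z_i - Z_l) K_b(T_j - T_l)| < \infty$. Therefore, we obtain

\[ C_{n1}^0 = o_p(1). \tag{1.4.68} \]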


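Write

\[ \tilde C_{n1} = \tilde C_{n11} + \tilde C_{n12}, \tag{1.4.69} \]

where

\[ \tilde C_{n11} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} w_{ll}(2b) [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta(T_j, Z_l) \]

and

\[ \tilde C_{n12} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} w_{ll}(2b) [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l) [\zeta_n(T_j, Z_l) - \zeta(T_j, Z_l)]. \]

Note that $\zeta(t, z)$ is four times differentiable under the assumptions. Apply (1.4.7) of Lemma 2 and part (2) of Lemma 3 to $\zeta_n(T_j, Z_l)$ with $d = 1$ and $r = 4$ to obtain

\[ \sup_{Z_l \in [z_1^* + 2b,\, z_2^* - 2b],\ T_j \in \mathcal{T}^0} E_{j,l} |\zeta_n(T_j, Z_l) - \zeta(T_j, Z_l)|^2 = O_p\Big( \frac{1}{nb} \Big) + O_p(b^8). \tag{1.4.70} \]

Because of the conditional independence of $\delta_l - F(T_l, Z_l, \theta_0)$ and $\delta_k - F(T_k, Z_k, \theta_0)$ for $l \ne k$, with mean zero and variances $F(T_l, Z_l, \theta_0)[1 - F(T_l, Z_l, \theta_0)]$ and $F(T_k, Z_k, \theta_0)[1 - F(T_k, Z_k, \theta_0)]$ respectively when $(T_i, Z_i)$, $i = 1, \dots, n$, are given, as before, we have

\[
\begin{aligned}
E(\tilde C_{n12})^2 & = E\Big\{ \frac{1}{n} \sum_{l=1}^{n} w_{ll}(2b) F(T_l, Z_l, \theta_0) [1 - F(T_l, Z_l, \theta_0)] \\
& \qquad \times \Big[ \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l) \big( \zeta_n(T_j, Z_l) - \zeta(T_j, Z_l) \big) \Big]^2 \Big\} \\
& \le E\Big\{ \frac{1}{n} \sum_{l=1}^{n} w_{ll}(2b) F(T_l, Z_l, \theta_0) [1 - F(T_l, Z_l, \theta_0)] \\
& \qquad \times \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda^2(T_j, \theta_0) K_b^2(T_j - T_l) \cdot \frac{1}{n} \sum_{j \ne l} \big( \zeta_n(T_j, Z_l) - \zeta(T_j, Z_l) \big)^2 \Big\}.
\end{aligned}
\]

The last inequality follows from the Cauchy-Schwarz inequality. Since $K$ is bounded, $K_b^2 \le C/b^2$ for some finite number $C$. This, the boundedness of $F(T_l, Z_l, \theta_0)$ and $\Lambda(T_j, \theta_0)$, and (1.4.70), imply that

\[ E(\tilde C_{n12})^2 = O\Big( \frac{1}{nb^3} \Big) + O(b^6), \]

which is $o(1)$ as $b = O(n^{-a})$ with $\frac{1}{8} < a < \frac{1}{4}$ (see assumption (A4)). Therefore, $\tilde C_{n12} = o_p(1)$ and it follows from (1.4.69) that

\[ \tilde C_{n1} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} w_{ll}(2b) [\delta_l - F(T_l, Z_l, \theta_0)] \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta(T_j, Z_l) + o_p(1). \]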

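Let

\[ \eta_n(T_l, Z_l) = \frac{1}{n} \sum_{j \ne l} w_{j1} \Lambda(T_j, \theta_0) K_b(T_j - T_l)\, \zeta(T_j, Z_l), \quad 1 \le l \le n, \]

and

\[ \eta(t, z) = \Lambda(t, \theta_0)\, \zeta(t, z)\, h_1(t), \quad (t, z) \in \mathcal{D}_h, \tag{1.4.71} \]

where $\zeta(t, z)$ is defined in (1.4.66), and $h_1(t)$ is the marginal density of $T$ as defined before. Similarly, we can obtain

\[ \tilde C_{n1} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} w_{ll}(2b) [\delta_l - F(T_l, Z_l, \theta_0)]\, \eta(T_l, Z_l) + o_p(1). \]

Since $P(w_{ll}(2b) = 0) = O(b)$, $\eta(t, z)$ is bounded, and the conditional expectation of $\delta_l - F(T_l, Z_l, \theta_0)$, given $(T_l, Z_l)$, is 0, it is easy to see that

\[ \tilde C_{n1} = - \frac{1}{\sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)]\, \eta(T_l, Z_l) + o_p(1). \tag{1.4.72} \]

Since $Q_{n1} = C_{n1}/C_{n0}$ and $C_{n0} - c_0 = o_p(1)$, it follows from (1.4.67), (1.4.68) and (1.4.72) that

\[ Q_{n1} = - \frac{1}{c_0 \sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)]\, \eta(T_l, Z_l) + o_p(1). \tag{1.4.73} \]

Now we deal with $Q_{n2}$. Let $C_{n2} := Q_{n2} C_{n0}$. It follows from (1.4.65) and (1.4.63) that

\[ C_{n2} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0) B_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, B_{n0}^{(j,i)}(T_j, Z_i)}. \]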


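As $b = O(n^{-a})$ with $\frac{1}{8} < a < \frac{1}{4}$ (see assumption (A4)), (1.4.20) and (1.4.21) of Lemma 6 and the same arguments as those used for $C_{n1}$ lead to

\[ C_{n2} = - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0) B_n^{(j,i)}(T_j, Z_i)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} + o_p(1). \]

That is, $B_{n0}^{(j,i)}(T_j, Z_i)$ can be replaced by $h(T_j, Z_i)$ at the cost of a difference of order $o_p(1)$. By the definition of $B_n^{(j,i)}(T_j, Z_i)$ (see (1.4.15)),

\[
\begin{aligned}
C_{n2} = {} & - \frac{1}{\sqrt{n}} \sum_{i=1}^{n} w_{ii}\, \xi(T_i, Z_i) \frac{1}{n} \sum_{j \ne i} w_{j1} \frac{\Lambda(T_j, \theta_0)}{[1 - F(T_j, Z_i, \theta_0)]\, h(T_j, Z_i)} \\
& \times \frac{1}{n} \sum_{l \ne i, j} [F(T_l, Z_l, \theta_0) - F(T_j, Z_i, \theta_0)] K_b(T_l - T_j) K_b(Z_l - Z_i) + o_p(1) \\
=: {} & C_{n20} + o_p(1), \quad \text{say}.
\end{aligned}
\]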

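Note that

\[
\begin{aligned}
(C_{n20})^2 = \frac{1}{n^5} & \sum_{i_1=1}^{n} \sum_{j_1 \ne i_1} \sum_{l_1 \ne i_1, j_1} w_{i_1 i_1} w_{j_1 1} \frac{\xi(T_{i_1}, Z_{i_1}) \Lambda(T_{j_1}, \theta_0)}{[1 - F(T_{j_1}, Z_{i_1}, \theta_0)]\, h(T_{j_1}, Z_{i_1})} \\
& \qquad \times [F(T_{l_1}, Z_{l_1}, \theta_0) - F(T_{j_1}, Z_{i_1}, \theta_0)] K_b(T_{l_1} - T_{j_1}) K_b(Z_{l_1} - Z_{i_1}) \\
& \times \sum_{i_2=1}^{n} \sum_{j_2 \ne i_2} \sum_{l_2 \ne i_2, j_2} w_{i_2 i_2} w_{j_2 1} \frac{\xi(T_{i_2}, Z_{i_2}) \Lambda(T_{j_2}, \theta_0)}{[1 - F(T_{j_2}, Z_{i_2}, \theta_0)]\, h(T_{j_2}, Z_{i_2})} \\
& \qquad \times [F(T_{l_2}, Z_{l_2}, \theta_0) - F(T_{j_2}, Z_{i_2}, \theta_0)] K_b(T_{l_2} - T_{j_2}) K_b(Z_{l_2} - Z_{i_2}).
\end{aligned}
\]

Since

\[
\begin{aligned}
& E\{ [F(T, Z, \theta_0) - F(t, z, \theta_0)] K_b(T - t) K_b(Z - z) \} \\
& \quad = \iint [F(t + bu, z + bv, \theta_0) - F(t, z, \theta_0)] K(u) K(v) h(t + bu, z + bv)\, du\, dv \\
& \quad = O(b^4),
\end{aligned}
\]

uniformly in $(t, z) \in \mathcal{D}_h^0$, for the terms with $l_1 \ne l_2$, conditioned on $T_{j_1}, Z_{i_1}, T_{j_2}, Z_{i_2}$, the expectation is of order $O(b^8)$ uniformly in $j_1, j_2, i_1, i_2$, and hence the sum of the expectations of these terms is of order $O(nb^8)$, which is $o(1)$ if $nb^8 \to 0$. Note also that

\[
\begin{aligned}
& E\big| [F(T, Z, \theta_0) - F(t, z, \theta_0)] K_b(t - T) K_b(z - Z) \big| \\
& \quad = \iint |F(t + bu, z + bv, \theta_0) - F(t, z, \theta_0)|\, |K(u) K(v)|\, h(t + bu, z + bv)\, du\, dv.
\end{aligned}
\]

For those terms with $l_1 = l_2$ and $i_1 \ne i_2$, $j_1 \ne j_2$, conditioned on $T_{l_1}, Z_{l_1}$, the expectation is of order $O(b)$ uniformly in $T_{l_1}, Z_{l_1}$; hence the sum of the expectations of these terms is also of order $O(b)$, since there are 5 summations. This order is also $o(1)$.

Similarly, the sum of the expectations of the other terms is of order $o(1)$. Therefore,

\[ E(C_{n20})^2 = o(1) \]

and hence $C_{n20} = o_p(1)$. Therefore, $C_{n2}$ and hence $Q_{n2}$ is $o_p(1)$. This, (1.4.64) and (1.4.73) imply that

\[ Q_n = - \frac{1}{c_0 \sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)]\, \eta(T_l, Z_l) + o_p(1). \tag{1.4.74} \]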

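It follows from (1.4.48), (1.4.57) and (1.4.74) that, on $A_n$,

\[ S_n(\theta_0) = \frac{1}{\sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)] \Big[ \frac{D_1(T_l, Z_l, \theta_0)}{F(T_l, Z_l, \theta_0)} - \frac{\eta(T_l, Z_l)}{c_0} \Big] + o_p(1). \tag{1.4.75} \]

This and (1.4.47) imply that, on $A_n$,

\[ \sqrt{n} (\hat\theta - \theta_0) = d^{-1}(\theta_0) \frac{1}{\sqrt{n}} \sum_{l=1}^{n} [\delta_l - F(T_l, Z_l, \theta_0)] \Big[ \frac{D_1(T_l, Z_l, \theta_0)}{F(T_l, Z_l, \theta_0)} - \frac{\eta(T_l, Z_l)}{c_0} \Big] + o_p(1), \]

where $\eta(t, z)$ is defined in (1.4.71) and

\[ d(\theta_0) = E\Big[ \frac{1 - F(T, Z, \theta_0)}{F(T, Z, \theta_0)} D_1^2(T, Z, \theta_0) \Big]. \]

Since $D_1(t, z, \theta_0)$ and $\eta(t, z)$ are bounded and $F(t, z, \theta_0)$ is bounded away from 0 on $\mathcal{D}_h$, the theorem follows from the central limit theorem and Lemma 9.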


Chapter 2

Sieve Estimation

2.1 Estimation

The second approach uses the idea of sieve and is analogous to that of Rossini and
Tsiatis (1996).

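The goal of this chapter is to estimate $\theta$ efficiently, with $\alpha(z) = \log(g(z))$ as an infinite dimensional nuisance parameter. The rescaled (conditional) log-likelihood of $\theta$ and $\alpha$ based on $(T_i, \delta_i, Z_i)$, $i = 1, 2, \dots, n$, is

\[
\begin{aligned}
L_n(\theta, \alpha) &= \frac{1}{n} \sum_{i=1}^{n} \big[ \delta_i \log F(T_i, Z_i, \theta, \alpha) + (1 - \delta_i) \log \bar F(T_i, Z_i, \theta, \alpha) \big] \\
&= \frac{1}{n} \sum_{i=1}^{n} \big[ \delta_i \log(1 - e^{-\Lambda(T_i, \theta) e^{\alpha(Z_i)}}) - (1 - \delta_i) \Lambda(T_i, \theta) e^{\alpha(Z_i)} \big].
\end{aligned} \tag{2.1.1}
\]

Here

\[ F(t, z, \theta, \alpha) = 1 - e^{-\Lambda(t, \theta) e^{\alpha(z)}}, \qquad \bar F(t, z, \theta, \alpha) = 1 - F(t, z, \theta, \alpha). \tag{2.1.2} \]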

To maximize the log-likelihood over all possible $\theta$ and $\alpha$, we should set $\alpha(Z_i)$ to be positive infinity if $\delta_i = 1$, and negative infinity if $\delta_i = 0$. Hence the maximum likelihood estimator over all possible functions $\alpha$ does not exist. Instead, the log-likelihood function is maximized as $\alpha$ varies over a small set of functions which depends on the sample size. More specifically, we approximate $\alpha$ by a step function with known jump points and maximize the log-likelihood as $\alpha$ varies over the step functions. As the number of steps increases along with the sample size, the bias from the approximation disappears. Assume that the covariate lies in a bounded interval. Without loss of generality, it will be taken to be the interval $[0, 1]$. To construct the step function, define a partition $0 = z_0 < z_1 < \cdots < z_k = 1$, where $k$ depends on $n$ and increases with $n$. The step function is then defined as

\[ \alpha_n(z) = \sum_{j=1}^{k} \alpha_{nj} I_j(z), \tag{2.1.3} \]

where $I_j(z)$ is the indicator function for the $j$th interval, defined by $I_j(z) = 1$ if $z_{j-1} < z \le z_j$ and zero otherwise. For the fixed partition, the step function is completely specified by the parameters $(\alpha_{n1}, \dots, \alpha_{nk})$. Hence, from here on, $\alpha_n$ will denote either the function $\alpha_n$ given by (2.1.3) or, equivalently, the vector $\alpha_n$, depending on the context.

The estimate (9,61,) is obtained by maximizing the approximate likelihood formed
by substituting (2.1.3) for a in (2.1.1). Since I: is an increasing integer-valued function
of n, written as k(n), a, will tend to a. The next two sections show that when
k(n) 2 0(71") with i < 7 < %, (6,611,) is consistent and 9 is also asymptotically
normal.
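The step-function sieve (2.1.3) and the log-likelihood (2.1.1) can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: the function names `step_alpha` and `log_likelihood` are hypothetical, and the Weibull-type form $\Lambda(t,\theta) = t^{\theta}$ is an assumption made only to keep the example concrete.

```python
import numpy as np


def step_alpha(z, knots, coef):
    """Evaluate alpha_n(z) = sum_j coef[j-1] * I_j(z) as in (2.1.3),
    where I_j(z) = 1 when knots[j-1] < z <= knots[j]."""
    z = np.asarray(z, dtype=float)
    # side="left" returns i with knots[i-1] < z <= knots[i]; shift to 0-based.
    idx = np.searchsorted(knots, z, side="left") - 1
    # z equal to the left end point is assigned to the first interval.
    idx = np.clip(idx, 0, len(coef) - 1)
    return np.asarray(coef, dtype=float)[idx]


def log_likelihood(theta, coef, t, delta, z, knots):
    """Rescaled log-likelihood (2.1.1) with alpha replaced by the step function.
    Assumption: Lambda(t, theta) = t**theta."""
    lam = t ** theta * np.exp(step_alpha(z, knots, coef))
    log_f = np.log(-np.expm1(-lam))      # log F = log(1 - exp(-lam))
    return np.mean(delta * log_f - (1.0 - delta) * lam)   # log(1 - F) = -lam
```

With `knots = np.linspace(0, 1, k + 1)` the coefficient vector `coef` has length `k`, so the estimation problem has `k + 1` free parameters.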

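The first and second partial derivatives of the approximate log-likelihood are used to generate the estimates and their variance. In view of (2.1.1), the first derivative with respect to $\theta$ is

$$
S_{n,0}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\dot\Lambda(T_i, \theta)e^{\alpha_n(Z_i)}, \tag{2.1.4}
$$

and that with respect to $\alpha_{nj}$ is

$$
S_{n,j}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\Lambda(T_i, \theta)e^{\alpha_n(Z_i)}I_j(Z_i), \qquad j = 1, \ldots, k, \tag{2.1.5}
$$

where $\dot\Lambda(t, \theta)$ denotes the derivative of $\Lambda(t, \theta)$ with respect to $\theta$.

The score vector is defined as

$$
S_n(\theta, \alpha_n) = \left(S_{n,0}(\theta, \alpha_n), S_{n,1}(\theta, \alpha_n), \ldots, S_{n,k}(\theta, \alpha_n)\right)^{T}. \tag{2.1.6}
$$

The estimates $(\hat\theta, \hat\alpha_n)$ are defined to be a solution to the score equation

$$
S_n(\theta, \alpha_n) = 0. \tag{2.1.7}
$$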
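The score components (2.1.4) and (2.1.5) can be checked numerically against finite differences of the log-likelihood (2.1.1). The sketch below is illustrative only: the helper names are hypothetical, the parameter vector is stored as `beta = (theta, alpha_n1, ..., alpha_nk)`, and $\Lambda(t,\theta) = t^{\theta}$ (so that $\dot\Lambda(t,\theta) = t^{\theta}\log t$) is an assumed baseline, not something fixed by this section.

```python
import numpy as np


def interval_index(z, knots):
    """0-based index j-1 of the interval (knots[j-1], knots[j]] containing z."""
    idx = np.searchsorted(knots, z, side="left") - 1
    return np.clip(idx, 0, len(knots) - 2)


def log_likelihood(beta, t, delta, z, knots):
    """Rescaled log-likelihood (2.1.1) for beta = (theta, alpha_n1, ..., alpha_nk)."""
    theta, coef = beta[0], beta[1:]
    lam = t ** theta * np.exp(coef[interval_index(z, knots)])
    return np.mean(delta * np.log(-np.expm1(-lam)) - (1.0 - delta) * lam)


def score(beta, t, delta, z, knots):
    """Score vector (2.1.6) built from (2.1.4) and (2.1.5)."""
    theta, coef = beta[0], beta[1:]
    j = interval_index(z, knots)
    exp_alpha = np.exp(coef[j])
    big_lambda = t ** theta                   # assumed Lambda(t, theta)
    lambda_dot = big_lambda * np.log(t)       # its derivative in theta
    f = -np.expm1(-big_lambda * exp_alpha)    # F(t, z, theta, alpha_n)
    weight = (delta - f) / f
    s = np.empty(len(beta))
    s[0] = np.mean(weight * lambda_dot * exp_alpha)
    # Each observation contributes only to the interval that contains Z_i.
    s[1:] = np.bincount(j, weights=weight * big_lambda * exp_alpha,
                        minlength=len(coef)) / len(t)
    return s
```

A root of `score` could then be sought with any Newton-type iteration; which solver the author used is not stated here.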

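The derivative of $S_n$ with respect to $(\theta, \alpha_n)$ is called the Hessian matrix and is related to the observed information. It is defined as

$$
H_n(\theta, \alpha_n) = \frac{\partial}{\partial(\theta, \alpha_n)} S_n(\theta, \alpha_n), \tag{2.1.8}
$$

which is the $(k+1)$ by $(k+1)$ matrix of partial derivatives with respect to $\theta$ and $\alpha_n$ of the elements of $S_n(\theta, \alpha_n)$. Let $0$ denote the first element. Then the elements of $H_n$ are defined by

$$
h_{00}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\ddot\Lambda(T_i, \theta)e^{\alpha_n(Z_i)} - \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i D_{00}(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)},
$$

$$
h_{0j}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\dot\Lambda(T_i, \theta)e^{\alpha_n(Z_i)}I_j(Z_i) - \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i D_{01}(T_i, Z_i, \theta, \alpha_n)I_j(Z_i)}{F(T_i, Z_i, \theta, \alpha_n)}, \qquad j = 1, \ldots, k,
$$

$$
h_{j0}(\theta, \alpha_n) = h_{0j}(\theta, \alpha_n), \qquad j = 1, \ldots, k,
$$

$$
h_{jj}(\theta, \alpha_n) = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha_n)}{F(T_i, Z_i, \theta, \alpha_n)}\,\Lambda(T_i, \theta)e^{\alpha_n(Z_i)}I_j(Z_i) - \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i D_{11}(T_i, Z_i, \theta, \alpha_n)I_j(Z_i)}{F(T_i, Z_i, \theta, \alpha_n)}, \qquad j = 1, \ldots, k,
$$

and

$$
h_{ij}(\theta, \alpha_n) = 0, \qquad i \ne j, \quad i, j = 1, \ldots, k,
$$

where

$$
D_{00}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\dot\Lambda^2(t, \theta)e^{2\alpha_n(z)}, \tag{2.1.9}
$$

$$
D_{01}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\Lambda(t, \theta)\dot\Lambda(t, \theta)e^{2\alpha_n(z)}, \tag{2.1.10}
$$

$$
D_{11}(t, z, \theta, \alpha_n) = \frac{\bar F(t, z, \theta, \alpha_n)}{F(t, z, \theta, \alpha_n)}\,\Lambda^2(t, \theta)e^{2\alpha_n(z)}, \tag{2.1.11}
$$

and $\ddot\Lambda(t, \theta)$ is the second derivative of $\Lambda(t, \theta)$ with respect to $\theta$.

Expectation is taken with respect to the true parameters $(\theta_0, \alpha_0)$.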
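As a check on the Hessian entries, the following worked step derives the summand of $h_{00}$ from the summand of (2.1.4), using only (2.1.2). The shorthand $F_i$, $\bar F_i$ for the functions evaluated at $(T_i, Z_i, \theta, \alpha_n)$ is introduced here for brevity and is not the author's notation; the entries $h_{0j}$ and $h_{jj}$ follow in the same way with $\Lambda$ in place of one or both factors $\dot\Lambda$.

```latex
\begin{align*}
\frac{\partial F_i}{\partial\theta}
  &= \bar F_i\,\dot\Lambda(T_i,\theta)\,e^{\alpha_n(Z_i)}, \\
\frac{\partial}{\partial\theta}
  \left[\left(\frac{\delta_i}{F_i}-1\right)\dot\Lambda(T_i,\theta)\,e^{\alpha_n(Z_i)}\right]
  &= \frac{\delta_i-F_i}{F_i}\,\ddot\Lambda(T_i,\theta)\,e^{\alpha_n(Z_i)}
   - \frac{\delta_i}{F_i}\cdot\frac{\bar F_i}{F_i}\,
     \dot\Lambda^2(T_i,\theta)\,e^{2\alpha_n(Z_i)} \\
  &= \frac{\delta_i-F_i}{F_i}\,\ddot\Lambda(T_i,\theta)\,e^{\alpha_n(Z_i)}
   - \frac{\delta_i\,D_{00}(T_i,Z_i,\theta,\alpha_n)}{F_i}.
\end{align*}
```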

2.2 Consistency

To establish the consistency and asymptotic normality of the estimator, we use the following assumptions, which we call Condition A.

(1) The real parameter $\theta_0$ is an interior point of $\Theta$.

(2) Let $\mathcal{T}$ and $\mathcal{Z}$ be the supports of $T$ and $Z$ respectively, where $\mathcal{Z}$ is a closed interval of $\mathbb{R}$. $\Lambda(t, \theta)$ is bounded away from $0$ and $\infty$ over $(t, \theta) \in \mathcal{T} \times N_1$, where $N_1 = \{\theta : |\theta - \theta_0| \le \Delta\}$ for some $0 < \Delta < \infty$. The density of $(T, Z)$, $h(t, z)$, is bounded on $\mathcal{T} \times \mathcal{Z}$ and Lipschitz continuous in $z$ uniformly for $t \in \mathcal{T}$.

(3) The first and second derivatives of $\Lambda(t, \theta)$ with respect to $\theta$, $\dot\Lambda(t, \theta)$ and $\ddot\Lambda(t, \theta)$, exist, are bounded for $t \in \mathcal{T}$ and $\theta \in N_1$, and are continuous in $\theta$ for any fixed $t$.

(4) $\alpha_0(z)$ is Lipschitz continuous on $\mathcal{Z}$.

For any function $b(z)$ defined on the support of $Z$, let $\|b\|_\infty = \sup_{z \in \mathcal{Z}} |b(z)|$ and $\|b\| = (E\,b^2(Z))^{1/2}$ be the sup-norm and $L_2$-norm respectively.

 

 

In the following, Theorem 3 states the existence of a consistent (in sup-norm) estimator, $(\hat\theta, \hat\alpha_n)$, which is a solution to the score equation. Theorem 4 establishes the convergence rate of the estimator (in $L_2$ norm), which will be used to prove the asymptotic normality of the estimator. The proofs will be given later.

Theorem 3 Assume that Condition A holds, and the number of intervals is increasing at a rate $k(n) = n^{\gamma}$, with $0 < \gamma < 1$. Assume also that for all $k$ and $\alpha_{0n}$ with $\|\alpha_{0n} - \alpha_0\|_\infty < \Delta_0$ for some positive and finite number $\Delta_0$,

$$
P(I_j(Z) = 1) = o(1), \qquad kP(I_j(Z) = 1) > c, \qquad j = 1, 2, \ldots, k, \tag{2.2.1}
$$

and

F(T12190 10011

(WDIKT. Z, 90, 00n)1j(z))

 

k .
E [D00(T, Z, 90,90,“ - Z > c, (2.2.2)

1:1

2
9(mﬁﬂwamuzammmo)
E

F(T,Z,9o,ao,,)
for some $0 < c < \infty$, not depending on $n$. Then there is at least one consistent (in sup-norm) solution to (2.1.7), i.e. there exists at least one $(\hat\theta, \hat\alpha_n)$ such that

$$
|\hat\theta - \theta_0| + \|\hat\alpha_n - \alpha_0\|_\infty = o_p(1).
$$

The proof is given in Section 2.6.

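Theorem 4 Assume that the conditions in Theorem 3 hold. Assume also $k(n) = n^{\gamma}$, with $1/4 < \gamma < 1/2$, and

$$
E\left[D_{00}(T, Z, \theta_0, \alpha_0)\right] - \frac{\left(E\,D_{01}(T, Z, \theta_0, \alpha_0)\right)^2}{E\left(D_{11}(T, Z, \theta_0, \alpha_0)\right)} > 0. \tag{2.2.3}
$$

Then the estimator $(\hat\theta, \hat\alpha_n)$ in Theorem 3 has the following convergence rate:

$$
|\hat\theta - \theta_0| = o_p(n^{-1/4}), \qquad \|\hat\alpha_n - \alpha_0\| = o_p(n^{-1/4}).
$$

The proof is given in Section 2.6.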

2.3 Asymptotic normality of $\hat\theta$

In this section, the asymptotic normality of the estimator is stated; the proof will be given later.

Theorem 5 Assume that the conditions in Theorem 4 hold, and $\sigma^2$ defined below is finite. Assume also that the third derivative of $\Lambda(t, \theta)$ with respect to $\theta$ exists for $\theta$ in a neighborhood of $\theta_0$, and is continuous at $\theta_0$. Then

$$
\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \sigma^2),
$$

where the asymptotic variance is given by

$$
\sigma^2 = \left[E\left(D_{00}(T, Z, \theta_0, \alpha_0)\right) - E\left(\frac{\left(E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)\right)^2}{E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)}\right)\right]^{-1}. \tag{2.3.1}
$$

The proof is given in Section 2.6.

2.4 Information bound for $\theta_0$

The true model has two parameters: $\theta$ is finite dimensional, and $\alpha$ is an infinite-dimensional functional parameter. The semiparametric information bound for estimating $\theta$ is based on the maximum of the asymptotic variance bounds of regular estimators for $\theta$ obtained using parametric sub-models of $\alpha$. It was shown in Section 2.3 that the estimator $\hat\theta$ is asymptotically normal with a certain asymptotic variance. It is shown in this section that this asymptotic variance achieves the bound. Projection methods are used to find the efficient score for the semiparametric model and hence the variance bound (Bickel et al. 1993).

The log-likelihood of $\theta$ and $\alpha$ based on $(T, \delta, Z)$ is given by

$$
\delta \log\left(1 - e^{-\Lambda(T, \theta)e^{\alpha(Z)}}\right) - (1 - \delta)\Lambda(T, \theta)e^{\alpha(Z)}. \tag{2.4.1}
$$

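Consider a general parametric submodel with $\alpha = \alpha_\gamma$ specified by $\gamma$ (a real variable), where $\frac{\partial}{\partial\gamma}\alpha_\gamma(z)\big|_{\gamma=0} = a(z)$ for some function $a(z)$ with $E a^2(Z) < \infty$. Take derivatives of (2.4.1) with respect to $\theta$ and $\gamma$ at $(\theta = \theta_0, \gamma = 0)$ to obtain the scores

$$
S_\theta(T, Z, \delta, \theta_0, \alpha_0) = \left[\delta - F(T, Z, \theta_0, \alpha_0)\right]\frac{\dot\Lambda(T, \theta_0)e^{\alpha_0(Z)}}{F(T, Z, \theta_0, \alpha_0)} \tag{2.4.2}
$$

and

$$
S_a(T, Z, \delta, \theta_0, \alpha_0) = \left[\delta - F(T, Z, \theta_0, \alpha_0)\right]\frac{\Lambda(T, \theta_0)e^{\alpha_0(Z)}a(Z)}{F(T, Z, \theta_0, \alpha_0)}. \tag{2.4.3}
$$

To find the information bound, project $S_\theta$ onto the linear span formed from all square integrable $S_a$. This projection is denoted by $S_{a^*}$ and is computed by solving, for all $S_a$,

$$
E(S_\theta S_a) = E(S_{a^*} S_a). \tag{2.4.4}
$$

Note that the conditional expectation and variance of $\delta$ given $(T, Z)$ are $F(T, Z, \theta_0, \alpha_0)$ and $F(T, Z, \theta_0, \alpha_0)\bar F(T, Z, \theta_0, \alpha_0)$ respectively. Substituting (2.4.2) and (2.4.3) for $S_\theta$ and $S_a$ in the above expression, taking the conditional expectation given $(T, Z)$ first, and then taking the expectation with respect to $(T, Z)$, we obtain

$$
E\left(D_{01}(T, Z, \theta_0, \alpha_0)a(Z)\right) = E\left(D_{11}(T, Z, \theta_0, \alpha_0)a^*(Z)a(Z)\right),
$$

where $D_{01}$ and $D_{11}$ were defined in (2.1.10) and (2.1.11) respectively. Take the conditional expectation given $Z$ first, and then the expectation with respect to $Z$, to obtain

$$
E\left[E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)\,a(Z)\right] = E\left[E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)\,a^*(Z)a(Z)\right]. \tag{2.4.5}
$$

It is easy to see that

$$
a^*(Z) = \frac{E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)}{E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)} \tag{2.4.6}
$$

solves (2.4.5) and hence also solves (2.4.4).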

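Therefore, the efficient score is given by

$$
S_\theta(T, Z, \delta, \theta_0, \alpha_0) - S_{a^*}(T, Z, \delta, \theta_0, \alpha_0)
= \frac{\left(\delta - F(T, Z, \theta_0, \alpha_0)\right)e^{\alpha_0(Z)}}{F(T, Z, \theta_0, \alpha_0)}\left[\dot\Lambda(T, \theta_0) - \Lambda(T, \theta_0)\frac{E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)}{E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)}\right].
$$

The semiparametric information bound is equal to

$$
E\left[S_\theta(T, Z, \delta, \theta_0, \alpha_0) - S_{a^*}(T, Z, \delta, \theta_0, \alpha_0)\right]^2
$$

and the asymptotic variance bound is the inverse of the information bound. Take the conditional expectation of the square of the efficient score given $(T, Z)$ first, and then the expectation with respect to $(T, Z)$, to obtain

$$
E\left[S_\theta - S_{a^*}\right]^2
= E\left\{\frac{\bar F(T, Z, \theta_0, \alpha_0)e^{2\alpha_0(Z)}}{F(T, Z, \theta_0, \alpha_0)}\left(\dot\Lambda(T, \theta_0) - \Lambda(T, \theta_0)\frac{E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)}{E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)}\right)^2\right\}.
$$

Expand the square term and take the conditional expectation given $Z$ first to obtain that the right hand side of the previous display is equal to

$$
E\left[D_{00}(T, Z, \theta_0, \alpha_0) - \frac{\left(E(D_{01}(T, Z, \theta_0, \alpha_0)\,|\,Z)\right)^2}{E(D_{11}(T, Z, \theta_0, \alpha_0)\,|\,Z)}\right].
$$

In view of (2.3.1), it follows that the asymptotic variance of $\hat\theta$ achieves the asymptotic variance bound.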
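The "expand the square" step can be written out explicitly. In the sketch below, $r(Z) = E(D_{01}\,|\,Z)/E(D_{11}\,|\,Z)$ is a shorthand introduced here (it is not the author's notation), and all functions are evaluated at $(T, Z, \theta_0, \alpha_0)$; the first line uses only the definitions (2.1.9) to (2.1.11).

```latex
\begin{align*}
\frac{\bar F\,e^{2\alpha_0(Z)}}{F}\bigl(\dot\Lambda - \Lambda\,r(Z)\bigr)^2
  &= D_{00} - 2\,D_{01}\,r(Z) + D_{11}\,r^2(Z), \\
E\bigl[D_{00} - 2\,D_{01}\,r(Z) + D_{11}\,r^2(Z)\,\big|\,Z\bigr]
  &= E(D_{00}\,|\,Z) - 2\,\frac{\bigl(E(D_{01}\,|\,Z)\bigr)^2}{E(D_{11}\,|\,Z)}
     + \frac{\bigl(E(D_{01}\,|\,Z)\bigr)^2}{E(D_{11}\,|\,Z)} \\
  &= E(D_{00}\,|\,Z) - \frac{\bigl(E(D_{01}\,|\,Z)\bigr)^2}{E(D_{11}\,|\,Z)}.
\end{align*}
```

Taking the expectation over $Z$ then gives the information bound, whose inverse is the variance in (2.3.1).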

2.5 Simulation

A simulation study is presented before we turn to the proofs of the stated asymptotic properties of the estimator.

As in Section 1.3, assume that the conditional distribution of $X$ given $Z$ is a Weibull distribution with distribution function

$$
1 - e^{-t^{\theta_0} e^{\alpha_0(Z)}},
$$

where $\alpha_0(z) = \log(z)$. Also assume that $T$ and $Z$ are uniformly distributed on $[1, 2]$ and $[0.2, 1.2]$ respectively.

For each fixed sample size ($n = 30, 60, 100, 200, 500, 1000$ respectively) and appropriate $k$'s, 100 samples are generated with the real parameter $\theta_0 = 1.5$, and 100 replications of the estimate of $\theta_0$ based on the sieve maximum likelihood estimator (SMLE) are obtained. The means and standard deviations of these estimates are shown in the following table.

Table 2. Simulation results for the SMLE

     n    k              mean      s.d.
    30    [illegible]    1.6976    1.8180
          [illegible]    2.1060    2.0269
          [illegible]    2.4360    2.9598
    60    [illegible]    1.7064    1.2145
          [illegible]    1.8189    1.2680
          [illegible]    1.9675    1.4248
   100    [illegible]    1.5954    0.8047
          [illegible]    1.6427    0.8103
          [illegible]    1.6932    0.8502
   200    [illegible]    1.5624    0.5154
          [illegible]    1.5838    0.5330
          [illegible]    1.6240    0.5365
   500    [illegible]    1.5591    0.2946
          [illegible]    1.5671    0.2893
          [illegible]    1.6076    0.2964
  1000    10             1.5432    0.2136
          15             1.5530    0.2125
          20             1.5651    0.2177

 

 

 

The above table shows that when the sample size is not large, the bias and variance are slightly larger than those of the generalized profile maximum likelihood estimator (see Table 1). However, they decrease as the sample size increases, and the variance will eventually be less than that of the generalized profile maximum likelihood estimator, since the SMLE achieves the semiparametric lower bound. Unfortunately, a very large sample size is needed for this to happen. This can be seen by comparing the above table with the simulation results for the generalized profile maximum likelihood estimator in Section 1.3.
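For readers who wish to reproduce a design of this kind, the data-generating step can be sketched as follows. This is a hedged illustration, not the author's simulation code: the function name `simulate_current_status` is hypothetical, and the sketch assumes the design reads as $T \sim U[1,2]$, $Z \sim U[0.2, 1.2]$, $\alpha_0(z) = \log z$ and conditional distribution function $1 - \exp(-x^{\theta_0} z)$ for $X$ given $Z = z$.

```python
import numpy as np


def simulate_current_status(n, theta0=1.5, rng=None):
    """Draw n current status observations (T_i, delta_i, Z_i).

    Assumed design: T ~ U[1, 2], Z ~ U[0.2, 1.2], and
    P(X <= x | Z = z) = 1 - exp(-x**theta0 * z), i.e. alpha_0(z) = log(z).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = rng.uniform(1.0, 2.0, n)          # monitoring time T
    z = rng.uniform(0.2, 1.2, n)          # covariate Z
    # Inversion: if E ~ Exp(1), then X = (E / z)**(1 / theta0) has the cdf above.
    x = (rng.exponential(1.0, n) / z) ** (1.0 / theta0)
    # Only the indicator of failure before the monitoring time is observed.
    delta = (x <= t).astype(float)
    return t, delta, z
```

The unobserved failure time `x` is discarded on purpose: with current status data only `(t, delta, z)` is available to the estimator.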

2.6 Proof of the theorems
2.6.1 Proof of Theorem 3

The definitions of sup-norms for a vector and a matrix are introduced first. If $a$ is a vector with elements $a_j$, $1 \le j \le m$, then

$$
\|a\|_\infty = \max_{1 \le j \le m} |a_j|.
$$

If $A$ is an $m$ by $m$ matrix whose $(i, j)$ element is denoted by $a_{ij}$, then

$$
\|A\|_\infty = \max_{1 \le i \le m}\left(\sum_{j=1}^{m} |a_{ij}|\right).
$$

Now define a step function, $\alpha_{0n}$, of form (2.1.3) as an approximation to $\alpha_0$. Precisely,

$$
\alpha_{0n}(z) = \sum_{j=1}^{k(n)} \alpha_0(z_j) I_j(z).
$$

The Lipschitz continuity of $\alpha_0$ implies that

$$
\|\alpha_{0n} - \alpha_0\|_\infty = O(k(n)^{-1}). \tag{2.6.1}
$$
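The approximation rate in (2.6.1) is easy to check numerically. The sketch below is an illustration under stated assumptions: it uses an equally spaced partition, takes $\alpha_0 = \log$ on $[0.2, 1.2]$ as an example of a Lipschitz function (Lipschitz constant $1/0.2 = 5$ on that interval), and approximates the supremum by a maximum over a fine grid. The function name is hypothetical.

```python
import numpy as np


def sup_error_of_step_approx(alpha0, k, lo=0.2, hi=1.2, grid=20001):
    """Grid approximation of sup_z |alpha_0n(z) - alpha_0(z)|, where
    alpha_0n(z) = sum_j alpha0(z_j) I_j(z) on an equally spaced partition."""
    knots = np.linspace(lo, hi, k + 1)
    z = np.linspace(lo, hi, grid)
    # Index of the interval (z_{j-1}, z_j] containing each grid point.
    idx = np.clip(np.searchsorted(knots, z, side="left") - 1, 0, k - 1)
    approx = alpha0(knots[1:])[idx]       # value at the right end point z_j
    return np.max(np.abs(approx - alpha0(z)))
```

For a Lipschitz function with constant $L$ the error is at most $L(\text{hi} - \text{lo})/k$, so `k * sup_error_of_step_approx(np.log, k)` should stay bounded as `k` grows.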

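Let

$$
\beta_n = (\theta, \alpha_{n1}, \ldots, \alpha_{nk}), \qquad \beta_{0n} = (\theta_0, \alpha_0(z_1), \ldots, \alpha_0(z_k)),
$$

and

$$
\beta_0 = (\theta_0, \alpha_0).
$$

Note that $S_n(\beta_n) = 0$ is equivalent to

$$
\tilde S_n(\beta_n) := \begin{pmatrix} S_{n,0}(\beta_n) \\ kS_{n,1}(\beta_n) \\ \vdots \\ kS_{n,k}(\beta_n) \end{pmatrix} = 0. \tag{2.6.2}
$$

The derivative of $\tilde S_n(\beta_n)$ with respect to $\beta_n$ is

$$
\tilde H_n(\beta_n) := \begin{pmatrix}
h_{00}(\beta_n) & h_{01}(\beta_n) & \cdots & h_{0k}(\beta_n) \\
kh_{01}(\beta_n) & kh_{11}(\beta_n) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
kh_{0k}(\beta_n) & 0 & \cdots & kh_{kk}(\beta_n)
\end{pmatrix}, \tag{2.6.3}
$$

where $h_{ij}$ is defined in Section 2.1. The lower-right $k$ by $k$ sub-matrix is a diagonal matrix.

Let $\tilde S(\beta_n) = E\tilde S_n(\beta_n)$ (expectations for all the elements). Then, by (2.6.2), (2.1.4), (2.1.5) and the fact that the conditional expectation of $\delta_i$ given $(T_i, Z_i)$ is $F(T_i, Z_i, \theta_0, \alpha_0)$, we obtain

$$
\tilde S(\beta_n) = \begin{pmatrix}
E\left(A(T, Z, \beta_0, \beta_n)\dot\Lambda(T, \theta)\right) \\
kE\left(A(T, Z, \beta_0, \beta_n)\Lambda(T, \theta)I_1(Z)\right) \\
\vdots \\
kE\left(A(T, Z, \beta_0, \beta_n)\Lambda(T, \theta)I_k(Z)\right)
\end{pmatrix}, \tag{2.6.4}
$$

where

$$
A(T, Z, \beta_0, \beta_n) = e^{\alpha_n(Z)}\left(\frac{F(T, Z, \theta_0, \alpha_0)}{F(T, Z, \theta, \alpha_n)} - 1\right).
$$

By (2.1.2) and Assumptions (2) and (3) of Condition A, $F(t, z, \theta, \alpha)$ is Lipschitz in $\theta$ and $\alpha$, uniformly for $(t, z) \in \mathcal{T} \times \mathcal{Z}$. It is easy to see that $\|\tilde S(\beta_n)\|_\infty = o(1)$ if $\|\beta_n - \beta_{0n}\|_\infty = O(k^{-1})$ and $P(I_j(Z) = 1) = o(1)$ for $j = 1, \ldots, k$.

Let $\tilde H(\beta_n) = E\tilde H_n(\beta_n)$. Similarly, by (2.6.3) and the definition of $h_{ij}(\beta_n)$ (see Section 2.1), $0 \le i, j \le k$,

$$
\tilde H(\beta_n) = \begin{pmatrix}
b_{00}(\beta_n) & b_{01}(\beta_n) & \cdots & b_{0k}(\beta_n) \\
kb_{01}(\beta_n) & kb_{11}(\beta_n) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
kb_{0k}(\beta_n) & 0 & \cdots & kb_{kk}(\beta_n)
\end{pmatrix},
$$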

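where

$$
b_{00}(\beta_n) = E\left[\left(R(T, Z, \beta_0, \beta_n) - 1\right)\ddot\Lambda(T, \theta)e^{\alpha_n(Z)}\right] - E\left(R(T, Z, \beta_0, \beta_n)D_{00}(T, Z, \theta, \alpha_n)\right),
$$

$$
b_{0j}(\beta_n) = E\left[\left(R(T, Z, \beta_0, \beta_n) - 1\right)\dot\Lambda(T, \theta)e^{\alpha_n(Z)}I_j(Z)\right] - E\left(R(T, Z, \beta_0, \beta_n)D_{01}(T, Z, \theta, \alpha_n)I_j(Z)\right), \qquad j = 1, 2, \ldots, k,
$$

$$
b_{jj}(\beta_n) = E\left[\left(R(T, Z, \beta_0, \beta_n) - 1\right)\Lambda(T, \theta)e^{\alpha_n(Z)}I_j(Z)\right] - E\left(R(T, Z, \beta_0, \beta_n)D_{11}(T, Z, \theta, \alpha_n)I_j(Z)\right), \qquad j = 1, 2, \ldots, k,
$$

and

$$
R(T, Z, \beta_0, \beta_n) = \frac{F(T, Z, \theta_0, \alpha_0)}{F(T, Z, \theta, \alpha_n)}.
$$

Notice that

$$
\tilde H(\beta_n) = \frac{\partial}{\partial\beta_n}\tilde S(\beta_n).
$$

The inverse of $\tilde H(\beta_n)$ is as follows:

$$
\tilde H^{-1}(\beta_n) = \begin{pmatrix} q_{00} & k^{-1}q_{01} \\ q_{01}^{T} & k^{-1}q_{11} \end{pmatrix}, \tag{2.6.5}
$$

where

$$
q_{00} = \left(b_{00} - \sum_{j=1}^{k}\frac{b_{0j}^2}{b_{jj}}\right)^{-1},
$$

$q_{01}$ is a row vector with its $j$th element

$$
-q_{00}\frac{b_{0j}}{b_{jj}}, \qquad j = 1, 2, \ldots, k,
$$

and $q_{11}$ is a $k \times k$ matrix with its $(i, j)$th element

$$
I(i = j)\,b_{jj}^{-1} + q_{00}\frac{b_{0i}b_{0j}}{b_{ii}b_{jj}}, \qquad i, j = 1, 2, \ldots, k.
$$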
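The closed-form inverse (2.6.5) of this bordered diagonal ("arrowhead") matrix can be verified numerically. The sketch below is illustrative: `arrow_inverse` is a hypothetical helper, and the inputs `b00`, `b0` and `bjj` stand for $b_{00}$, $(b_{01}, \ldots, b_{0k})$ and $(b_{11}, \ldots, b_{kk})$, with the scaling factor $k$ taken to be the length of `b0`.

```python
import numpy as np


def arrow_inverse(b00, b0, bjj):
    """Inverse of the (k+1) x (k+1) matrix with first row (b00, b0),
    first column (b00, k*b0), diagonal k*bjj and zeros elsewhere,
    using the block formulas q00, q01, q11 of (2.6.5)."""
    b0 = np.asarray(b0, dtype=float)
    bjj = np.asarray(bjj, dtype=float)
    k = len(b0)
    q00 = 1.0 / (b00 - np.sum(b0 ** 2 / bjj))      # inverse Schur complement
    q01 = -q00 * b0 / bjj
    q11 = np.diag(1.0 / bjj) + q00 * np.outer(b0 / bjj, b0 / bjj)
    inv = np.empty((k + 1, k + 1))
    inv[0, 0] = q00
    inv[0, 1:] = q01 / k        # upper-right block carries the factor 1/k
    inv[1:, 0] = q01            # lower-left block does not
    inv[1:, 1:] = q11 / k
    return inv
```

Because only a scalar Schur complement and a diagonal matrix are inverted, the cost is $O(k^2)$ to fill the result, compared with $O(k^3)$ for a general-purpose inverse.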
Since $\tilde S(\beta_0) = 0$ by (2.6.4), and $\tilde S(\beta_n)$ is continuous in $\beta_n$ by Condition A, it follows from (2.6.1) that

$$
\|\tilde S(\beta_{0n})\|_\infty = o(1). \tag{2.6.6}
$$

Since $\tilde H(\beta_n)$ is continuous in $\beta_n$ by Condition A, and $\tilde H^{-1}(\beta_{0n})$ exists for large $n$ by (2.2.1) and (2.2.2), it follows from (2.6.6) and the inverse function theorem with sup-norm (Lemma 1 of Rossini and Tsiatis (1996), which is stated in the following lemma; for the standard ($L_2$) formulation of the inverse function theorem, see Rudin (1964)) that there exists $\bar\beta_n = (\bar\theta, \bar\alpha_n)$, with $\bar\alpha_n$ of the form (2.1.3), such that

$$
\tilde S(\bar\beta_n) = 0
$$

and

$$
\|\bar\beta_n - \beta_{0n}\|_\infty = o(1). \tag{2.6.7}
$$

If $\|\tilde S_n(\bar\beta_n)\|_\infty = o_p(1)$ and $\|\tilde H_n^{-1}(\bar\beta_n)\|_\infty < c$ with probability approaching 1 for some finite constant $c$, then by the inverse function theorem again, with probability tending to 1, there exists a solution $\hat\beta_n = (\hat\theta, \hat\alpha_n)$ to $\tilde S_n(\beta_n) = 0$ such that

$$
\|\hat\beta_n - \bar\beta_n\|_\infty = o_p(1).
$$

This, (2.6.7), (2.6.1) and the triangle inequality imply that

$$
\|\hat\beta_n - \beta_0\|_\infty = o_p(1).
$$

That $\|\tilde S_n(\bar\beta_n)\|_\infty = o_p(1)$ and $\|\tilde H_n^{-1}(\bar\beta_n)\|_\infty < c$ with probability approaching 1 can be established in the same way as in the proof of Theorem 1 in Rossini and Tsiatis (1996).

The theorem is proved.

Lemma 10 (Inverse Function Theorem with Sup-norm). Let $H(x)$ be a continuously differentiable mapping from $\mathbb{R}^m$ to $\mathbb{R}^m$ in a neighborhood of $x_0$. Define the Jacobian as the $m \times m$ matrix $A(x) = \partial H(x)/\partial x$ (derivatives of the elements of $H$ with respect to the elements of $x$). If there exist constants $C$ and $\delta^*$ such that

$$
\|A^{-1}(x_0)\|_\infty < C
$$

and

$$
\sup_{\{x:\, \|x - x_0\|_\infty < \delta^*\}} \|A(x) - A(x_0)\|_\infty \le (2C)^{-1},
$$

then for $d < \delta^*/(4C)$ and all $y$ such that $\|y - H(x_0)\|_\infty < d$, there exists a unique inverse value $x$ in the $\delta^*$ neighborhood of $x_0$ such that $H(x) = y$ and $\|x - x_0\|_\infty < 4Cd$.

2.6.2 Proof of Theorem 4

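We are going to use some general results on the convergence rate of sieve estimators. The following lemma is a part of Theorem 1 of Shen and Wong (1994). To state the lemma, we introduce some general notation.

Let $Y_1, \ldots, Y_n$ be a sequence of independent random variables (or possibly vectors) distributed according to a density $p_0(y)$ with respect to a $\sigma$-finite measure $\mu$ on a measurable space $(\mathcal{Y}, \mathcal{B})$, and let $\Theta$ be the parameter space of the parameter $\beta$. Let $l : \Theta \times \mathcal{Y} \to \mathbb{R}$ be a suitably chosen function. We are interested in the properties of an estimate $\hat\beta_n$ over a subset $\Theta_n$ of $\Theta$ obtained by maximizing the empirical criterion $C_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} l(\beta, Y_i)$, that is, $C_n(\hat\beta_n) = \max_{\beta \in \Theta_n} C_n(\beta)$. Here $\Theta_n$ is an approximation to $\Theta$ in the sense that for any $\beta \in \Theta$ there exists $\pi_n\beta \in \Theta_n$ such that, for an appropriate pseudo-distance $\rho$, $\rho(\pi_n\beta, \beta) \to 0$ as $n \to \infty$. The following assumptions are needed for the lemma.

C0. $l$ is bounded.

C1. For some constants $A_1 > 0$ and $a > 0$, and for all small $\epsilon > 0$,

$$
\inf_{\rho(\beta, \beta_0) \ge \epsilon,\ \beta \in \Theta_n} E\left(l(\beta_0, Y) - l(\beta, Y)\right) \ge 2A_1\epsilon^{2a}.
$$

C2. For some constants $A_2 > 0$ and $b > 0$, and for all small $\epsilon > 0$,

$$
\sup_{\rho(\beta, \beta_0) \le \epsilon,\ \beta \in \Theta_n} \mathrm{Var}\left(l(\beta_0, Y) - l(\beta, Y)\right) \le 2A_2\epsilon^{2b}.
$$

C3. Let $\mathcal{F}_n = \{l(\beta, \cdot) - l(\pi_n\beta_0, \cdot) : \beta \in \Theta_n\}$. For some constants $r_0 < \frac{1}{2}$ and $A_3 > 0$,

$$
H(\epsilon, \mathcal{F}_n) \le A_3 n^{2r_0}\log\left(\frac{1}{\epsilon}\right) \quad \text{for all small } \epsilon > 0,
$$

where $H(\epsilon, \mathcal{F}_n)$ is the $L_\infty$-metric entropy of the space $\mathcal{F}_n$, that is, $\exp(H(\epsilon, \mathcal{F}_n))$ is the smallest number of $\epsilon$-balls in the $L_\infty$-metric needed to cover the space $\mathcal{F}_n$.

Lemma 11 Suppose Assumptions C0 to C3 hold. Then

$$
\rho(\hat\beta_n, \beta_0) = O_p\left(\max\left(n^{-\tau},\ \rho(\pi_n\beta_0, \beta_0),\ K^{1/(2a)}(\pi_n\beta_0, \beta_0)\right)\right),
$$

where $K(\pi_n\beta_0, \beta_0) = E\left(l(\beta_0, Y) - l(\pi_n\beta_0, Y)\right)$ and

$$
\tau = \begin{cases}
\dfrac{1 - 2r_0}{2a} - \dfrac{\log\log n}{2a\log n}, & \text{if } b \ge a, \\[2ex]
\dfrac{1 - 2r_0}{4a - 2b}, & \text{if } b < a.
\end{cases}
$$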

From the proof of Theorem 1 of Shen and Wong (1994), it is noted that the global maximizer can be replaced by a local maximizer around the real parameter, and the convergence rate still holds for the local maximizer. In this situation, the sieve $\Theta_n$ is a sequence of shrinking neighborhoods of the real parameter $\beta_0$. To apply the above lemma to our case, let $Y = (T, Z, \delta)$, $\beta = (\theta, \alpha)$, and $\pi_n\beta = (\theta, \alpha_n)$, where $\alpha_n$ is of form (2.1.3) with $\alpha_{nj} = \alpha(z_j)$. Also let

$$
\Theta_n = \{(\theta, \alpha_n) : |\theta - \theta_0| \le a_n,\ \|\alpha_n - \alpha_0\|_\infty \le b_n\},
$$

where $a_n$ and $b_n$ are chosen such that, with probability approaching 1, $(\hat\theta, \hat\alpha_n)$ is the maximum point in $\Theta_n$. Define the metric as follows:

$$
\rho(\beta, \beta_0) = |\theta - \theta_0| + \|\alpha - \alpha_0\|, \tag{2.6.8}
$$

and also define

$$
l(\beta, Y) = \delta\log\left(1 - e^{-\Lambda(T, \theta)e^{\alpha(Z)}}\right) - (1 - \delta)\Lambda(T, \theta)e^{\alpha(Z)}.
$$
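Under our assumptions, C0 is true. Note that, since the conditional expectation of $\delta$ given $(T, Z)$ is $F(T, Z, \theta_0, \alpha_0)$,

$$
E\,l(\beta, Y) = E\left[F(T, Z, \theta_0, \alpha_0)\log F(T, Z, \theta, \alpha) + \bar F(T, Z, \theta_0, \alpha_0)\log\bar F(T, Z, \theta, \alpha)\right].
$$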

Taking the Taylor expansion of $l(\beta, Y)$ with respect to $\theta$ and $\alpha$, and noticing that the expectation of the first derivative vanishes at $\beta_0$ and the matrix of the second derivatives is negative definite by (2.2.3), we obtain

$$
E\left(l(\beta_0, Y) - l(\beta, Y)\right) \ge c\,\rho^2(\beta, \beta_0) \tag{2.6.9}
$$

for some finite and positive number $c$. Hence C1 is satisfied with $a = 1$. It is easy to see that, under Condition A,

$$
\mathrm{Var}\left(l(\beta_0, Y) - l(\beta, Y)\right) \le E\left(l(\beta_0, Y) - l(\beta, Y)\right)^2 \le C\rho^2(\beta, \beta_0)
$$

for some $0 < C < \infty$. Thus C2 holds with $b = 1$.

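Since for all $y$,

$$
|l(\beta, y) - l(\beta_0, y)| \le C\left(|\theta - \theta_0| + \|\alpha - \alpha_0\|_\infty\right),
$$

for some $0 < C < \infty$ not depending on $y$, it is easy to see that

$$
H(\epsilon, \mathcal{F}_n) \le H(\epsilon/C, \Theta_n),
$$

where $H(\eta, \Theta_n)$ is the metric entropy of the space $\Theta_n$ with respect to the norm $|\theta - \theta_0| + \|\alpha - \alpha_0\|_\infty$. Since $\Theta_n$ is a sequence of shrinking neighborhoods of $\beta_0 = (\theta_0, \alpha_0)$, there exists a positive and finite number $C_0$ such that $|\theta| \le C_0$ and $\|\alpha_n\|_\infty \le C_0$ for all $(\theta, \alpha_n) \in \Theta_n$ with $\alpha_n$ of form (2.1.3). For any $\eta > 0$, divide the interval $[0, C_0]$ into small intervals, with length $\eta/2$ or less, such that the number of intervals is less than or equal to $\frac{2C_0}{\eta} + 1$. Then, it is easy to see that

$$
H(\eta, \Theta_n) \le \log\left(\left(\frac{2C_0}{\eta} + 1\right)\left(\frac{2C_0}{\eta} + 1\right)^{k(n)}\right) \le Ck(n)\log\left(\frac{1}{\eta}\right),
$$

for some positive and finite constant $C$, when $\eta$ is small enough. Hence, for small $\epsilon > 0$,

$$
H(\epsilon, \mathcal{F}_n) \le Ck(n)\log\left(\frac{1}{\epsilon}\right) = Cn^{\gamma}\log\left(\frac{1}{\epsilon}\right),
$$

for some positive and finite number $C$. Therefore C3 is satisfied with $r_0 = \frac{\gamma}{2}$.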

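Applying the above lemma, we obtain

$$
\rho(\hat\beta_n, \beta_0) = O_p\left(\max\left(n^{-\tau},\ \rho(\pi_n\beta_0, \beta_0),\ K^{1/2}(\pi_n\beta_0, \beta_0)\right)\right), \tag{2.6.10}
$$

where

$$
\tau = \frac{1 - \gamma}{2} - \frac{\log\log n}{2\log n}.
$$

Note that, for large $n$, $\frac{1}{4} < \tau < \frac{3}{8}$ as $\frac{1}{4} < \gamma < \frac{1}{2}$. Since $\beta_0 = (\theta_0, \alpha_0)$ and $\pi_n\beta_0 = (\theta_0, \alpha_{0n})$, where $\alpha_{0n}$ is of form (2.1.3), we obtain, by (2.6.8) and (4) of Condition A, that

$$
\rho^2(\pi_n\beta_0, \beta_0) = \|\alpha_{0n} - \alpha_0\|^2 \le Ck(n)^{-2} = Cn^{-2\gamma}
$$

for some positive and finite number $C$. Thus

$$
\rho(\pi_n\beta_0, \beta_0) \le Cn^{-\gamma}. \tag{2.6.11}
$$

The same argument as that leading to (2.6.9) gives that, for some finite and positive number $C$,

$$
K(\pi_n\beta_0, \beta_0) = E\left(l(\beta_0, Y) - l(\pi_n\beta_0, Y)\right) \le C\|\alpha_{0n} - \alpha_0\|^2 \le Cn^{-2\gamma}, \tag{2.6.12}
$$

which lies between the orders $n^{-1}$ and $n^{-1/2}$, and in particular is $o(n^{-1/2})$, for $\frac{1}{4} < \gamma < \frac{1}{2}$. It follows from (2.6.10), (2.6.11) and (2.6.12) that, for $\frac{1}{4} < \gamma < \frac{1}{2}$,

$$
\rho(\hat\beta_n, \beta_0) = o_p(n^{-1/4}).
$$

The theorem is proved.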
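The rate exponent $\tau$ in (2.6.10) can be evaluated numerically to see how large $n$ must be before $\tau$ exceeds $1/4$. The helper below is a small sketch, with the hypothetical name `tau`; it covers only the case $b \ge a$ of Lemma 11 that is used in this proof (here $a = b = 1$ and $r_0 = \gamma/2$).

```python
import math


def tau(gamma, n, a=1.0, b=1.0):
    """Rate exponent of Lemma 11 for r0 = gamma / 2 in the case b >= a."""
    if b < a:
        raise NotImplementedError("only the case b >= a is sketched here")
    r0 = gamma / 2.0
    log_n = math.log(n)
    return (1.0 - 2.0 * r0) / (2.0 * a) - math.log(log_n) / (2.0 * a * log_n)
```

For example, `tau(0.3, 1e6)` lies between 1/4 and 3/8, whereas `tau(0.3, 1e3)` is still below 1/4, which illustrates that the bounds on $\tau$ quoted in the proof hold only for large $n$.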

2.6.3 Proof of Theorem 5

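$S_{n,0}(\theta, \alpha)$ was defined in (2.1.4). Further denote, for a function $a$ on $\mathcal{Z}$ with $E(a(Z))^2 < \infty$,

$$
S_n(\theta, \alpha)[a] = \frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta, \alpha)}{F(T_i, Z_i, \theta, \alpha)}\,\Lambda(T_i, \theta)e^{\alpha(Z_i)}a(Z_i),
$$

where $F(t, z, \theta, \alpha)$ is defined in (2.1.2).

Denote the expectations of $S_{n,0}$ and $S_n[a]$ by $S_0$ and $S[a]$ respectively. Since the conditional expectation of $\delta_i$ given $(T_i, Z_i)$ is $F(T_i, Z_i, \theta_0, \alpha_0)$, we obtain

$$
S_0(\theta, \alpha) = E\left[\frac{F(T, Z, \theta_0, \alpha_0) - F(T, Z, \theta, \alpha)}{F(T, Z, \theta, \alpha)}\,\dot\Lambda(T, \theta)e^{\alpha(Z)}\right] \tag{2.6.13}
$$

and

$$
S(\theta, \alpha)[a] = E\left[\frac{F(T, Z, \theta_0, \alpha_0) - F(T, Z, \theta, \alpha)}{F(T, Z, \theta, \alpha)}\,\Lambda(T, \theta)e^{\alpha(Z)}a(Z)\right]. \tag{2.6.14}
$$

The method used here is similar to that described in Huang (1996). From Lemma 12 below, we obtain the following stochastic equicontinuity results:

$$
\sup_{|\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4}} \left|\left(S_{n,0}(\theta, \alpha) - S_0(\theta, \alpha)\right) - \left(S_{n,0}(\theta_0, \alpha_0) - S_0(\theta_0, \alpha_0)\right)\right| = o_p^*(n^{-1/2})
$$

and

$$
\sup_{|\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4}} \left|\left(S_n(\theta, \alpha)[a] - S(\theta, \alpha)[a]\right) - \left(S_n(\theta_0, \alpha_0)[a] - S(\theta_0, \alpha_0)[a]\right)\right| = o_p^*(n^{-1/2}),
$$

for all $a$ with $Ea^2(Z) < \infty$, and all positive and finite numbers $C$.

This and Theorem 4 result in

$$
\left(S_{n,0}(\hat\theta, \hat\alpha_n) - S_0(\hat\theta, \hat\alpha_n)\right) - \left(S_{n,0}(\theta_0, \alpha_0) - S_0(\theta_0, \alpha_0)\right) = o_p(n^{-1/2}) \tag{2.6.15}
$$

and

$$
\left(S_n(\hat\theta, \hat\alpha_n)[a^*] - S(\hat\theta, \hat\alpha_n)[a^*]\right) - \left(S_n(\theta_0, \alpha_0)[a^*] - S(\theta_0, \alpha_0)[a^*]\right) = o_p(n^{-1/2}), \tag{2.6.16}
$$

where $a^*$ is defined in (2.4.6).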

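By the definition of $\hat\theta$ and $\hat\alpha_n$, $S_{n,0}(\hat\theta, \hat\alpha_n) = 0$. Also note that $S_0(\theta_0, \alpha_0) = 0$ by (2.6.13). It follows from (2.6.15) that

$$
S_0(\hat\theta, \hat\alpha_n) = -S_{n,0}(\theta_0, \alpha_0) + o_p(n^{-1/2}). \tag{2.6.17}
$$

For the other part, we do not have $S_n(\hat\theta, \hat\alpha_n)[a^*] = 0$, but we will show that

$$
S_n(\hat\theta, \hat\alpha_n)[a^*] = o_p(n^{-1/2}). \tag{2.6.18}
$$

Together with $S(\theta_0, \alpha_0)[a^*] = 0$ by (2.6.14), we obtain from (2.6.16) that

$$
S(\hat\theta, \hat\alpha_n)[a^*] = -S_n(\theta_0, \alpha_0)[a^*] + o_p(n^{-1/2}). \tag{2.6.19}
$$

By Condition A and the assumption that the third derivative of $\Lambda(t, \theta)$ with respect to $\theta$ exists and is continuous, taking the Taylor expansion of $S_0(\hat\theta, \hat\alpha_n)$ to the second order with respect to $\theta$ and $\alpha$, we obtain from (2.6.17) and (2.6.13) that

$$
-E\left[D_{00}(T, Z, \theta_0, \alpha_0)\right](\hat\theta - \theta_0) - E\left[D_{01}(T, Z, \theta_0, \alpha_0)\left(\hat\alpha_n(Z) - \alpha_0(Z)\right)\right]
= -S_{n,0}(\theta_0, \alpha_0) + O\left(|\hat\theta - \theta_0|^2 + \|\hat\alpha_n - \alpha_0\|^2\right) + o_p(n^{-1/2}). \tag{2.6.20}
$$

Similarly, we can obtain from (2.6.19) and (2.6.14) that

$$
-E\left[D_{01}(T, Z, \theta_0, \alpha_0)a^*(Z)\right](\hat\theta - \theta_0) - E\left[D_{11}(T, Z, \theta_0, \alpha_0)a^*(Z)\left(\hat\alpha_n(Z) - \alpha_0(Z)\right)\right]
= -S_n(\theta_0, \alpha_0)[a^*] + O\left(|\hat\theta - \theta_0|^2 + \|\hat\alpha_n - \alpha_0\|^2\right) + o_p(n^{-1/2}). \tag{2.6.21}
$$

By Theorem 4,

$$
|\hat\theta - \theta_0|^2 + \|\hat\alpha_n - \alpha_0\|^2 = o_p(n^{-1/2}).
$$

Subtracting (2.6.20) from (2.6.21) and noticing the definition of $a^*$ (see (2.4.6)), we obtain

$$
E\left[D_{00}(T, Z, \theta_0, \alpha_0) - D_{01}(T, Z, \theta_0, \alpha_0)a^*(Z)\right](\hat\theta - \theta_0)
= S_{n,0}(\theta_0, \alpha_0) - S_n(\theta_0, \alpha_0)[a^*] + o_p(n^{-1/2}).
$$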

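The theorem follows from the central limit theorem, and the calculation of the variance is straightforward. Now we prove (2.6.18). Let

$$
a_n^*(z) = \sum_{j=1}^{k} a^*(z_j) I_j(z).
$$

Condition A implies that

$$
\|a_n^* - a^*\|_\infty = O(1/k(n)). \tag{2.6.22}
$$

By the definition of $(\hat\theta, \hat\alpha_n)$, that is, that it solves (2.1.7), we obtain

$$
S_n(\hat\theta, \hat\alpha_n)[a_n^*] = 0.
$$

We only need to show

$$
S_n(\hat\theta, \hat\alpha_n)[a^* - a_n^*] = o_p(n^{-1/2}).
$$

Write the left hand side of the above display as

$$
\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_i - F(T_i, Z_i, \theta_0, \alpha_0)}{F(T_i, Z_i, \hat\theta, \hat\alpha_n)}\,\Lambda(T_i, \hat\theta)e^{\hat\alpha_n(Z_i)}\left(a^*(Z_i) - a_n^*(Z_i)\right)
+ \frac{1}{n}\sum_{i=1}^{n}\frac{F(T_i, Z_i, \theta_0, \alpha_0) - F(T_i, Z_i, \hat\theta, \hat\alpha_n)}{F(T_i, Z_i, \hat\theta, \hat\alpha_n)}\,\Lambda(T_i, \hat\theta)e^{\hat\alpha_n(Z_i)}\left(a^*(Z_i) - a_n^*(Z_i)\right). \tag{2.6.23}
$$

That the second term is $o_p(n^{-1/2})$ follows from (2.6.22), Theorem 4 and the Lipschitz continuity of $F$ with respect to $\theta$ and $\alpha$ under Condition A. A proof similar to that of Lemma 12 below shows that the first term is also $o_p(n^{-1/2})$.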

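Lemma 12 For any positive and finite number $C$, and any function $a$ on $\mathcal{Z}$ with $Ea^2(Z) < \infty$,

$$
\sup_{|\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4}} \left|\sqrt{n}\left(S_{n,0}(\theta, \alpha) - S_0(\theta, \alpha)\right) - \sqrt{n}\left(S_{n,0}(\theta_0, \alpha_0) - S_0(\theta_0, \alpha_0)\right)\right| = o_p^*(1) \tag{2.6.24}
$$

and

$$
\sup_{|\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4}} \left|\sqrt{n}\left(S_n(\theta, \alpha)[a] - S(\theta, \alpha)[a]\right) - \sqrt{n}\left(S_n(\theta_0, \alpha_0)[a] - S(\theta_0, \alpha_0)[a]\right)\right| = o_p^*(1). \tag{2.6.25}
$$

Here $o_p^*(1)$ means tending to 0 in outer probability.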

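Proof. We prove the first part; the second one can be proved analogously. Note that $\sqrt{n}\left(S_{n,0}(\theta, \alpha) - S_0(\theta, \alpha)\right) - \sqrt{n}\left(S_{n,0}(\theta_0, \alpha_0) - S_0(\theta_0, \alpha_0)\right)$ is an empirical process indexed by the set of functions

$$
\mathcal{C} = \left\{ f(\delta, t, z, \theta, \alpha) = \frac{\delta - F(t, z, \theta, \alpha)}{F(t, z, \theta, \alpha)}\,\dot\Lambda(t, \theta)e^{\alpha(z)} - \frac{\delta - F(t, z, \theta_0, \alpha_0)}{F(t, z, \theta_0, \alpha_0)}\,\dot\Lambda(t, \theta_0)e^{\alpha_0(z)} :\ |\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4} \right\}, \tag{2.6.26}
$$

that is, in the functional notation used in the literature for empirical processes,

$$
\sqrt{n}\left(S_{n,0}(\theta, \alpha) - S_0(\theta, \alpha)\right) - \sqrt{n}\left(S_{n,0}(\theta_0, \alpha_0) - S_0(\theta_0, \alpha_0)\right) = \sqrt{n}(P_n - P)f(\delta, t, z, \theta, \alpha), \tag{2.6.27}
$$

where $P_n$ is the empirical measure based on $(\delta_i, T_i, Z_i)$, $i = 1, \ldots, n$, and $P$ is the probability measure of $(\delta, T, Z)$ with respect to the real parameters $(\theta_0, \alpha_0)$. Note that, under Condition A, functions in $\mathcal{C}$ are uniformly bounded for large $n$, and

$$
|f(\delta, t, z, \theta, \alpha) - f(\delta, t, z, \theta_0, \alpha_0)| \le C_0\left(|\theta - \theta_0| + \|\alpha - \alpha_0\|_\infty\right), \tag{2.6.28}
$$

for some finite and positive number $C_0$. Therefore, $\mathcal{C}$ is a set of functions which are Lipschitz in the parameter $(\theta, \alpha) \in \mathcal{D}$, where

$$
\mathcal{D} = \left\{(\theta - \theta_0, \alpha - \alpha_0) : \alpha \text{ is of form (2.1.3)},\ |\theta - \theta_0| \le Cn^{-1/4},\ \|\alpha - \alpha_0\| \le Cn^{-1/4}\right\}
$$

and the norm in $L_\infty(\mathcal{D})$ is $\|(\theta_1, \alpha_1) - (\theta_2, \alpha_2)\|_\infty = |\theta_1 - \theta_2| + \|\alpha_1 - \alpha_2\|_\infty$. By Theorem 2.7.11 of Van der Vaart and Wellner (1996), the metric entropy of $\mathcal{C}$ with bracketing with respect to the $L_2(P)$ norm satisfies

$$
H_{[\,]}(\epsilon, \mathcal{C}, L_2(P)) \le H(\epsilon/c, \mathcal{D}, L_\infty),
$$

for some finite and positive number $c$. It is easy to see that

$$
H(\epsilon, \mathcal{D}, L_\infty) \le C_1 k(n)\log\left(\frac{1}{\epsilon}\right),
$$

for some finite and positive number $C_1$. Hence

$$
H_{[\,]}(\epsilon, \mathcal{C}, L_2(P)) \le C_2 k(n)\log\left(\frac{1}{\epsilon}\right),
$$

for some finite and positive number $C_2$. It follows that for any $\eta > 0$, there exists $0 < C_3 < \infty$, not depending on $n$, such that

$$
J_{[\,]}(\epsilon, \mathcal{C}, L_2(P)) := \int_0^{\epsilon}\sqrt{1 + H_{[\,]}(t, \mathcal{C}, L_2(P))}\,dt \le C_3 k(n)^{1/2}\epsilon^{1-\eta}, \quad \text{for any small } \epsilon > 0.
$$

It follows from this that, as $k(n) = n^{\gamma}$ with $0 < \gamma < \frac{1}{2}$,

$$
J_{[\,]}(Cn^{-1/4}, \mathcal{C}, L_2(P)) = o(1). \tag{2.6.29}
$$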
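The step leading to (2.6.29) can be made explicit. The calculation below is a sketch that assumes the entropy-integral bound is read as $C_3 k(n)^{1/2}\epsilon^{1-\eta}$ with $\eta > 0$ free to choose; substituting $k(n) = n^{\gamma}$ and $\epsilon = Cn^{-1/4}$ gives

```latex
\begin{align*}
J_{[\,]}\bigl(Cn^{-1/4}, \mathcal{C}, L_2(P)\bigr)
  &\le C_3\, n^{\gamma/2}\,\bigl(Cn^{-1/4}\bigr)^{1-\eta}
   = C_3\, C^{1-\eta}\, n^{\gamma/2 - (1-\eta)/4}, \\
\frac{\gamma}{2} - \frac{1-\eta}{4} < 0
  &\iff \eta < 1 - 2\gamma .
\end{align*}
```

Since $\gamma < 1/2$, the quantity $1 - 2\gamma$ is positive, so a sufficiently small $\eta$ makes the exponent negative and the bound tends to zero.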

Note that $f(\delta, t, z, \theta_0, \alpha_0) = 0$ by (2.6.26). It follows from this and (2.6.28) that, for any $f \in \mathcal{C}$,

$$
P\left(f(\delta, t, z, \theta, \alpha)\right)^2 \le C_4 n^{-1/2}, \tag{2.6.30}
$$

for some finite and positive number $C_4$.

Apply Lemma 3.4.2 (page 324) of Van der Vaart and Wellner (1996), which is stated in the following lemma. Let $Y_i = (\delta_i, T_i, Z_i)$, $i = 1, 2, \ldots, n$, $\mathcal{F} = \mathcal{C}$ and $\epsilon = Cn^{-1/4}$. By (2.6.30) and the boundedness of $f$, $f \in \mathcal{C}$, the conditions of the lemma hold. It follows from the lemma and (2.6.29) that

$$
E^*\left(\sup_{f \in \mathcal{C}}\left|\sqrt{n}(P_n - P)f\right|\right) = o(1).
$$

In view of (2.6.27), we obtain (2.6.24). The lemma is proved.

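Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. random variables (or possibly vectors) with distribution $P$ and let $P_n$ be the empirical measure of these random variables. Denote $G_n = \sqrt{n}(P_n - P)$ and $\|G_n\|_{\mathcal{F}} = \sup_{f \in \mathcal{F}} |G_n f|$ for any measurable class of functions $\mathcal{F}$. Denote

$$
J_{[\,]}(\epsilon, \mathcal{F}, L_2(P)) = \int_0^{\epsilon}\sqrt{1 + H_{[\,]}(t, \mathcal{F}, L_2(P))}\,dt.
$$

Lemma 13 Let $\mathcal{F}$ be a uniformly bounded class of measurable functions. Then, for some finite constant $C$,

$$
E^*\|G_n\|_{\mathcal{F}} \le C\,J_{[\,]}(\epsilon, \mathcal{F}, L_2(P))\left(1 + \frac{J_{[\,]}(\epsilon, \mathcal{F}, L_2(P))}{\epsilon^2\sqrt{n}}M\right),
$$

if every $f$ in $\mathcal{F}$ satisfies $Pf^2 < \epsilon^2$ and $\|f\|_\infty \le M$. Here $E^*$ means outer expectation with respect to $P$.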

Bibliography

[1] BICKEL, P. J., KLAASSEN, C. A. J., RITOV, Y. and WELLNER, J. A. (1993). Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD.

[2] DIAMOND, I. D. and MCDONALD, J. W. (1991). Analysis of current-status data. In Demographic Applications of Event History Analysis, Ed. J. Trussell, R. Hankinson and J. Tilton, pp. 231-52. Oxford: Oxford University Press.

[3] FINKELSTEIN, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics 42, 845-54.

[4] GROENEBOOM, P. and WELLNER, J. A. (1992). Information Bounds and Nonparametric Maximum Likelihood Estimation. Basel: Birkhäuser Verlag.

[5] GESKUS, R. B. and GROENEBOOM, P. (1996). Asymptotically optimal estimation of smooth functionals for interval censoring, part I. Statist. Neerlandica 50, 69-88.

[6] GESKUS, R. B. and GROENEBOOM, P. (1997). Asymptotically optimal estimation of smooth functionals for interval censoring, part II. Statist. Neerlandica 51, 201-219.

[7] HUANG, J. (1996). Efficient estimation for the proportional hazards model with interval censoring. Ann. Statist. 24, 540-68.

[8] HUANG, J. and WELLNER, J. A. (1995). Asymptotic normality of the NPMLE of linear functionals for interval censored data. Statist. Neerlandica 49, 153-63.

[9] JORDAN, C. W. (1975). Life Contingencies. The Society of Actuaries, Chicago.

[10] KLEIN, R. W. and SPADY, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica 61, 387-421.

[11] KOUL, H. L. and SCHICK, A. (1999). Inference about the ratio of scale parameters in a two-sample setting with current status data. Statist. and Probab. Letters 45, 359-369.

[12] LI, G. and ZHANG, C.-H. (1998). Linear regression with interval censored data. Ann. Statist. 26, 1306-1327.

[13] MESHALKIN, L. D. and KAGAN, A. R. (1972). Discussion of "Regression models and life tables," by D. R. Cox. J. Roy. Statist. Soc. Ser. B 34, 213.

[14] MURPHY, S. A., VAN DER VAART, A. W. and WELLNER, J. A. (1999). Current status regression. Mathematical Methods of Statistics 8, 407-25.

[15] NIELSEN, J. P. and LINTON, O. B. (1995). Kernel estimation in a nonparametric marker dependent hazard model. Ann. Statist. 23, 1735-1748.

[16] NIELSEN, J. P., LINTON, O. B. and BICKEL, P. J. (1998). On a semiparametric survival model with flexible covariate effect.

[17] RABINOWITZ, D., TSIATIS, A. and ARAGON, J. (1995). Regression with interval censored data. Biometrika 82, 501-13.

[18] ROSSINI, A. J. and TSIATIS, A. A. (1996). A semiparametric proportional odds regression model for the analysis of current status data. J. Am. Statist. Assoc. 91, 713-21.

[19] RUDIN, W. (1964). Principles of Mathematical Analysis. New York: McGraw-Hill.

[20] SEVERINI, T. A. and WONG, W. H. (1992). Profile likelihood and conditionally parametric models. Ann. Statist. 20, 1768-1802.

[21] SHEN, X. and WONG, W. H. (1994). Convergence rate of sieve estimates. Ann. Statist. 22, 580-615.

[22] SHIBOSKI, S. C. and JEWELL, N. P. (1992). Statistical analysis of the time dependence of HIV infectivity based on partner study data. J. Am. Statist. Assoc. 87, 360-72.

[23] SHORACK, G. R. and WELLNER, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.

[24] VAN DER VAART, A. W. and WELLNER, J. A. (1996). Weak convergence and empirical processes. With applications to statistics. Springer Series in Statistics. Springer-Verlag, New York.