
MINIMUM HELLINGER DISTANCE ESTIMATION OF PARAMETERS

IN THE RANDOM CENSORSHIP MODEL

BY

Song Yang

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1988

ABSTRACT


Let $X_1, \dots, X_n$ be i.i.d. with c.d.f. $F$, and let $Y_1, \dots, Y_n$ be independent of the $X_i$'s and i.i.d. with an unknown censoring c.d.f. $G$. In the random censorship model, the pairs $(\min(X_i, Y_i), [X_i \le Y_i])$, $i = 1, \dots, n$, are observed, where $[A]$ denotes the indicator of the set $A$. Let $F$ have a density $f$ and let $\{f_\theta: \theta \in \Theta\}$ be a parametric family of densities, where $\Theta$ is a subset of the $p$-dimensional Euclidean space. This thesis discusses the minimum Hellinger distance estimation (MHDE) of the parameter that gives the "best fit" of the parametric family to the data.

In studying the MHDE, the tail behavior of the product-limit processes is investigated and the weak convergence of these processes on the real line is established. In addition, an upper bound on the mean square increment of the normalized product-limit process is obtained. Based on the global behavior of the product-limit processes, kernel density estimators are constructed and shown to be consistent under the Hellinger metric. Using these results, it is shown that, when $f$ belongs to the parametric model, the MHD estimators are asymptotically efficient among the class of "regular" estimators; they are also minimax robust in small Hellinger neighborhoods of the given parametric family.

The work extends the results of Beran (1977; Minimum Hellinger distance estimates for parametric models, Ann. Statist. 5, 445-463) for the complete i.i.d. data case to the censored data case. Some of the proofs employ the martingale techniques developed by Gill (1980; Censoring and Stochastic Integrals, Mathematical Centre Tracts 124, Mathematisch Centrum, Amsterdam).

To my parents and my brother Tao


ACKNOWLEDGEMENTS

I would like to thank my thesis adviser Hira L. Koul for his support and encouragement during the preparation of this dissertation. His guidance in my endeavor in statistics is greatly appreciated. I would also like to thank Professor V. Mandrekar, Professor James Hannan and Professor Clifford Weil for serving on my guidance committee and reviewing this manuscript. My special thanks go to Professor Mandrekar and Professor Hannan for their many helpful suggestions and stimulating discussions.

Thanks are also due to Cathy Sparks and especially Loretta Ferguson for their help in typing this manuscript.

Finally, I express my deep appreciation to my parents and my brother Tao for their support and encouragement, which have strengthened me a great deal since my early education.

TABLE OF CONTENTS

Chapter
1. Introduction and Summary
2. Preliminaries
3. Minimum Hellinger Distance Functionals
4. Asymptotic Distributions
5. Robustness Properties
References

1. INTRODUCTION AND SUMMARY

Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) random variables with cumulative distribution function (c.d.f.) $F$ on $[0, \infty)$, and let $Y_1, \dots, Y_n$ be independent of the $X_i$'s and i.i.d. with (sub-)c.d.f. $G$ on $[0, \infty]$ (i.e., $G$ may assign positive mass to $\infty$). In the random censorship model, the pairs $\{\min(X_i, Y_i), [X_i \le Y_i]\}$, $1 \le i \le n$, are observed, where $[A]$ denotes the indicator function of the event $A$. Suppose that $F$ has a density $f$ with respect to Lebesgue measure, and some physical theory suggests that $f$ belongs to a parametric family $\{f_\theta: \theta \in \Theta\}$, where $\Theta$ is a subset of $p$-dimensional Euclidean space. At the same time we recognize that, due to a variety of data contamination, $f$ may differ from every $f_\theta$. The problem is to estimate the parameter that gives the "best fit" of the parametric model to the data.
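The observation scheme is easy to simulate, which may help fix ideas. The Python sketch below is purely illustrative: the exponential lifetime and censoring laws, their rates and the function name `simulate_censored` are our assumptions, not part of the thesis. Only the pairs $(\min(X_i, Y_i), [X_i \le Y_i])$ are returned, mirroring what the statistician actually sees.

```python
import random

def simulate_censored(n, rate_x=1.0, rate_y=0.5, seed=0):
    """Return n observed pairs (min(X_i, Y_i), [X_i <= Y_i]).

    Illustrative assumption: X_i ~ Exp(rate_x) lifetimes and
    Y_i ~ Exp(rate_y) censoring times, all independent.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.expovariate(rate_x)   # lifetime X_i (never observed directly)
        y = rng.expovariate(rate_y)   # censoring time Y_i
        pairs.append((min(x, y), int(x <= y)))
    return pairs

data = simulate_censored(1000)
uncensored = sum(delta for _, delta in data)
print(f"{uncensored} of {len(data)} pairs are uncensored")
```

For these exponential laws $P[X \le Y] = \text{rate\_x}/(\text{rate\_x} + \text{rate\_y}) = 2/3$, so roughly two thirds of the simulated pairs should be uncensored.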

When $G$ is degenerate at $\infty$, i.e., when we are able to observe the complete data $X_1, \dots, X_n$, there are many results in the literature. Millar (1983) illustrated that in many cases when the "best fit" is given via a minimum distance recipe, there usually exists a minimax structure, and the minimum distance estimator usually has the local asymptotic minimaxity property, which is taken there as the definition of robustness. While there is considerable freedom in choosing the distance, one distance, the Hellinger distance, has the merit that the estimation procedure is asymptotically efficient if there is no contamination, as discussed in Beran (b.1977; 1981). It is heuristically illustrated in Beran (b.1977) that the minimum Hellinger distance estimator considered there is closely related to the maximum likelihood estimator, and therefore asymptotic efficiency seems plausible.

In this thesis, minimum Hellinger distance estimation (MHDE) in the random censorship model is considered. It turns out that, as in the i.i.d. complete data case discussed in Beran (b.1977), when there is no contamination this procedure is asymptotically efficient among the class of "regular" estimators; it is also robust in a minimax sense in small Hellinger neighborhoods of the parametric model.

The material is organized as follows. In Chapter 2, some preliminary results are introduced. The tail behavior of the product-limit processes is investigated and the weak convergence of these processes on the entire support set is established. The convergence of the kernel density estimators in the Hellinger metric is obtained. In addition, an upper bound on the mean square increment of the normalized product-limit process is developed. In Chapter 3, the differentiability of the minimum Hellinger distance functionals is studied. In Chapter 4, the asymptotic behavior of the MHDE is investigated and it is shown that this procedure is asymptotically efficient if there is no contamination. In Chapter 5, a minimax robustness property of the MHDE is established.

Notational Remarks. Throughout this thesis, $X_1, \dots, X_n, Y_1, \dots, Y_n$ are independent r.v.'s. Unless mentioned otherwise, for $i = 1, \dots, n$, $X_i$ and $Y_i$ have distributions $F$ and $G$ respectively, $\delta_i = [X_i \le Y_i]$, where $[A]$ denotes the indicator function of the set $A$, and $\tilde X_i = \min(X_i, Y_i)$ with c.d.f. $H$. For any function $\xi$, $\xi_-(x) = \xi(x-)$ and $\xi_+(x) = \xi(x+)$. For any (sub-)c.d.f. $D$, $D^{-1}(t) = \inf\{u: D(u) \ge t\}$, $T_D = D^{-1}(1) \le \infty$, $\bar D = 1 - D$ and $\Delta D = D - D_-$. Note that $\bar H = \bar F\bar G$ and $T_H = \min(T_F, T_G)$. Abbreviate $T_H$ to $T$. Let $\tilde T = \max(\tilde X_1, \dots, \tilde X_n)$ and $J_n(t) = [t \le \tilde T]$. Let $R^p$ denote $p$-dimensional Euclidean space, with $R = R^1$. Any $v$ in $R^p$ is a $p \times 1$ matrix. $o_p(1)$ ($O_p(1)$) denotes any sequence of r.v.'s converging to zero in probability (bounded in probability). $\int_s^t = \int_{(s,t]}$ for $s > 0$ and $\int_0^t = \int_{[0,t]}$. The symbol $\equiv$ means "is defined by".

2. PRELIMINARIES

In this chapter we first cite some frequently used results from analysis. Then we investigate the behavior of some basic processes involved in the random censorship model.

Lemma 2.1. Let $f_n, g_n, h_n, f, g$ and $h$ be measurable functions on a measurable space $\Omega$ with measure $\mu$.

(i) Suppose that $f_n$, $g_n$ and $h_n$ converge in $\mu$ measure to $f$, $g$ and $h$ respectively, and they are all integrable. Also, $\int g_n\,d\mu \to \int g\,d\mu$, $\int h_n\,d\mu \to \int h\,d\mu$ and $g_n \le f_n \le h_n$ a.e. $[\mu]$. Then $\int f_n\,d\mu \to \int f\,d\mu$.

(ii) Suppose that $f_n \to f$ in $\mu$ measure and $\|f_n\|_{L_p(\mu)} \to \|f\|_{L_p(\mu)}$, where $p > 0$. Then $f_n \to f$ in $L_p(\mu)$. □

For a version of the above results in which convergence in measure is replaced by almost sure convergence, see Fabian and Hannan (1985, p. 32) and Rudin (1974, p. 76). Our results then follow from these references and a subsequence argument.

The following integration by parts formula is proved in Hewitt and Stromberg (1965, p. 419).

Lemma 2.2. Let $U$, $V$ be of bounded variation on the real line. Then for $a < b$,
$$U_+(b)V_+(b) - U_-(a)V_-(a) = \int_{[a,b]} U_-\,dV + \int_{[a,b]} V_+\,dU. \qquad \square$$
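For right-continuous step functions, Lemma 2.2 reduces to summation by parts: at each jump point $s$, $U(s)V(s) - U(s-)V(s-) = U(s-)\Delta V(s) + V(s)\Delta U(s)$, and the sum over the jumps in $[a, b]$ telescopes. The sketch below checks the identity numerically on randomly generated jumps; the helper `step_value` and the random data are hypothetical illustrations.

```python
import random

def step_value(jumps, t, left=False):
    """Value at t (or at t- if left=True) of the right-continuous step
    function that starts at 0 and jumps by jumps[s] at each point s."""
    return sum(h for s, h in jumps.items() if (s < t if left else s <= t))

rng = random.Random(1)
points = sorted(rng.sample(range(100), 15))          # all possible jump points
U = {s: rng.uniform(-1, 1) for s in points[::2]}     # jumps of U
V = {s: rng.uniform(-1, 1) for s in points[1::3]}    # jumps of V (some shared with U)
a, b = points[3], points[12]

# Left side: U_+(b)V_+(b) - U_-(a)V_-(a); for right-continuous U, V we have U_+(b) = U(b).
lhs = step_value(U, b) * step_value(V, b) - step_value(U, a, True) * step_value(V, a, True)

# Right side: sum over jump points s in [a, b] of U(s-) dV(s) + V(s) dU(s).
rhs = 0.0
for s in points:
    if a <= s <= b:
        rhs += step_value(U, s, True) * V.get(s, 0.0) + step_value(V, s) * U.get(s, 0.0)

print(abs(lhs - rhs) < 1e-9)   # True
```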

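In this thesis we always assume that $F$ and $G$ do not have common points of discontinuity:
$$\int (\Delta F)\,dG = 0. \tag{2.1}$$
We use the following notation:
$$H^1(t) = P[\delta_1 = 1, \tilde X_1 \le t] = \int_0^t \bar G_-\,dF,$$
$$H^0(t) = P[\delta_1 = 0, \tilde X_1 \le t] = \int_0^t \bar F\,dG = \int_0^t \bar F_-\,dG \quad \text{by (2.1)},$$
$$H_n(t) = n^{-1}\sum_{i=1}^n [\tilde X_i \le t], \qquad H^1_n(t) = n^{-1}\sum_{i=1}^n [\tilde X_i \le t]\,\delta_i, \qquad H^0_n(t) = n^{-1}\sum_{i=1}^n [\tilde X_i \le t](1 - \delta_i),$$
$$A^1(t) = \int_0^t (1/\bar F_-)\,dF = \int_0^t (1/\bar H_-)\,dH^1, \qquad A^0(t) = \int_0^t (1/\bar G_-)\,dG = \int_0^t (1/\bar H_-)\,dH^0, \qquad t < T,$$
$$Q^1_n(t) = n^{1/2}[H^1_n(t) - H^1(t)], \qquad Q^0_n(t) = n^{1/2}[H^0_n(t) - H^0(t)], \qquad Q_n = (Q^1_n, Q^0_n)^t,$$
and, for $i = 0, 1$ and $t < T$,
$$M^i_n(t) = n^{1/2}\Big[H^i_n(t) - \int_0^t \bar H_{n-}\,dA^i\Big] = Q^i_n(t) + \int_0^t (Q^0_{n-} + Q^1_{n-})/\bar H_-\,dH^i$$
(notice that $M^1_n$ is Gill (1980)'s $M(t)$ as in his definition (3.2.4)), and $M_n(t) = (M^1_n(t), M^0_n(t))^t$ for $t < T$.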

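The Kaplan-Meier (1958) product-limit estimators $\hat F_n$ of $F$ and $\hat G_n$ of $G$ are given as follows:
$$1 - \hat F_n(t) \equiv \prod_{s \le t}\,[1 - \Delta H^1_n(s)/\bar H_{n-}(s)],$$
$$1 - \hat G_n(t) \equiv \prod_{s \le t}\,[1 - \Delta H^0_n(s)/\bar H_{n-}(s)].$$
The corresponding processes are
$$P^1_n = n^{1/2}(\hat F_n - F), \qquad P^0_n = n^{1/2}(\hat G_n - G), \qquad \text{on } [0, \infty).$$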
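The product-limit formula translates directly into code: at each distinct observed time $s$, $n\Delta H^1_n(s)$ is the number of uncensored observations equal to $s$ and $n\bar H_{n-}(s)$ is the number of observations $\ge s$, so the factors $n$ cancel in the ratio. The sketch below (the function name `kaplan_meier` is hypothetical, and this is not code from the thesis) computes $\hat F_n$ this way; swapping the roles of $\delta = 1$ and $\delta = 0$ would give $\hat G_n$.

```python
from collections import Counter

def kaplan_meier(pairs):
    """Product-limit estimate of F from observed pairs (x_tilde, delta).

    Returns [(s, F_hat_n(s))] at each distinct observed time s, using
    1 - F_hat_n(t) = prod_{s <= t} [1 - dH1_n(s) / Hbar_{n-}(s)], where
    n*dH1_n(s) = #{uncensored obs equal to s} and n*Hbar_{n-}(s) = #{obs >= s}.
    """
    deaths = Counter(x for x, d in pairs if d == 1)
    counts = Counter(x for x, _ in pairs)
    at_risk = len(pairs)          # n * Hbar_{n-}(s) at the first time point
    surv = 1.0                    # running value of 1 - F_hat_n
    out = []
    for s in sorted(counts):
        surv *= 1.0 - deaths[s] / at_risk
        out.append((s, 1.0 - surv))
        at_risk -= counts[s]      # observations equal to s leave the risk set
    return out

print(kaplan_meier([(1, 1), (2, 0), (3, 1), (4, 1)]))
# [(1, 0.25), (2, 0.25), (3, 0.625), (4, 1.0)]
```

With no censoring the estimate reduces to the empirical c.d.f.; if the largest observation is censored, $\hat F_n$ stays below 1, i.e., it is only a sub-c.d.f.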

Shorack and Wellner (1986) (S-W) discuss $P^1_n$ in their Chapter 7. Notice that since $\int (\Delta F)\,dG = 0$, the roles of $F$ and $G$ are interchangeable; hence we have parallel results for $P^0_n$. In fact, for the σ-field
$$\mathcal F^n_t = \mathcal N \cup \sigma\{[\tilde X_i \le s]\,\delta_i,\ [\tilde X_i \le s]: 1 \le i \le n,\ 0 \le s \le t\},$$
where $\mathcal N$ denotes the collection of all null sets and their complements in the original probability space, one can check via Theorem 3.1.1 of Gill (1980) that $\{(M_n(t), \mathcal F^n_t): 0 \le t < T\}$ is a 2-dimensional square integrable martingale with mean zero and predictable variation processes
$$\langle M^i_n\rangle(t) = \int_0^t \bar H_{n-}(1 - \Delta A^i)\,dA^i, \qquad i = 0, 1,$$
and that the predictable variation process for the martingale $M^0_n + M^1_n$ is
$$\langle M^0_n + M^1_n\rangle(t) = \int_0^t \bar H_{n-}(1 - \Delta A^0 - \Delta A^1)\,d(A^0 + A^1) = \langle M^0_n\rangle(t) + \langle M^1_n\rangle(t),$$
since $\int (\Delta F)\,dG = 0$. So the predictable covariation process is
$$\langle M^0_n, M^1_n\rangle(t) = 0.$$
n n

Thus an argument similar to Theorem 4.2.1 of Gill (1980) shows that $M_n$ converges in $D^2[0,T]$ to a limit process
$$M = (M^1, M^0)^t, \tag{2.5}$$
where $M^1$, $M^0$ are independent zero mean Gaussian processes, each with independent increments, and for $i = 0, 1$, $\langle M^i\rangle(t) = \int_0^t \bar H_-(1 - \Delta A^i)\,dA^i$.

Later on we need to consider the integration operation, and therefore it is more convenient to use the uniform topology. We follow Pollard (1984)'s approach. Define the sup metric $\|(x, y)^t\| = \|x\| + \|y\|$ on $D^2[0,T]$. Equip $D^2[0,T]$ with the σ-field generated by open balls or, equivalently, by the projection maps $\{\pi_t: (x, y)^t \mapsto (x(t), y(t))^t \mid t \in R\}$. Denote this space by $(D^2[0,T], \|\cdot\|)$. Then each random element in this space is a measurable mapping from some probability space. We say $W_n$ converges to $W$ in $(D^2[0,T], \|\cdot\|)$ if $Ef(W_n) \to Ef(W)$ for any bounded, continuous and measurable function $f$. Here measurability refers to the open-ball σ-field on $D^2[0,T]$ and the Borel σ-field on $R$. We have the following.

Theorem 2.1. Suppose that $F$ and $G$ do not have common points of discontinuity: $\int (\Delta F)\,dG = 0$. Then on a common probability space there exist a special construction of a triangular array $\{(\tilde X_{ni}, \delta_{ni}), 1 \le i \le n, n = 1, 2, \dots\}$, each row consisting of i.i.d. pairs with the same distribution as $(\tilde X_1, \delta_1)$, and a 2-dimensional Gaussian process
$$Q = \begin{pmatrix} V^1(H^1) \\ V^0(H^0) \end{pmatrix},$$
where $V^1$ and $V^0$ are Brownian bridges with covariance $\mathrm{Cov}(V^1(H^1), V^0(H^0)) = -H^0H^1$, such that
$$\|Q_n - Q\| \xrightarrow{a.s.} 0.$$

Remark. For the case when there is no censoring, S-W (1986) have the special construction of the $X_{ni}$'s for the convergence of the ordinary empirical processes in their Theorem 3.1.1 and Section 3.2. For the convergence of the $Q_n$ process, without the condition $\int (\Delta F)\,dG = 0$, they claim to have the special construction of the $X_{ni}$'s and $Y_{ni}$'s by a minor variation of their ordinary theory of empirical processes, i.e., by "rebuilding" the random variables from the process. There is some difficulty in defining the $\delta_{ni}$'s if $\int (\Delta F)\,dG \ne 0$, and the construction of the $X_{ni}$'s and $Y_{ni}$'s from $Q_n$ does not seem obvious.

To prove Theorem 2.1, we need the following two lemmas. In Lemma 2.3 we use Billingsley (1968)'s techniques for fluctuations of partial sums to get an upper bound on the tail probability of the empirical process $Q_n$. In Lemma 2.4, we show that the $(\tilde X_i, \delta_i)$'s can be "reconstructed from $Q_n$".

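Lemma 2.3. For any $a < b$ and any $\varepsilon > 0$, there exists a constant $K_\varepsilon$, depending only on $\varepsilon$, such that for $i = 0, 1$ and $n$ exceeding some $n_0 = n_0(a, b, \varepsilon) > 0$,
$$P\Big[\sup_{t \in [a,b)} |Q^i_n(t) - Q^i_n(a)| > \varepsilon\Big] \le K_\varepsilon\,[H^i(b-) - H^i(a)]^2.$$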

Lemma 2.4. Suppose that $F$ and $G$ do not have common points of discontinuity and that $\hat Q_n$ has the same distribution as $Q_n$. Then with probability 1, random variables $\{(\hat X^n_i, \hat\delta^n_i), i = 1, \dots, n\}$ can be constructed from $\hat Q_n$, with the same joint distribution as the ordered $\tilde X_{(i)}$'s and their corresponding $\delta_i$'s.

Proof of Theorem 2.1. First let us prove the convergence of $Q_n$ to $Q$ in $(D^2[0,T], \|\cdot\|)$. We rely on Theorem 5.3 in Pollard (1984); the change from $D[0,1]$ to $D^2[0,T]$ causes no problem.

Just as in $D[0,T]$, from the fact that $D^2[0,T]$ is equipped with the σ-field generated by open balls in the uniform metric, we obtain that every point in $D^2[0,T]$ is "completely regular" as in Definition 4.6 of Pollard (1984). Therefore, if we can show that with probability 1 the limit process $Q$ lies in a separable subset of $D^2[0,T]$, then the necessary and sufficient conditions for the weak convergence of the processes $Q_n$ to $Q$ are their finite dimensional convergence and the "small oscillation" condition (cf. Pollard 5.1.(4)). The finite dimensional convergence part being straightforward, we prove the separability and small oscillation properties. Also, we only look at the first coordinate $Q^1_n$.

Let $S = \{s_0, s_1, \dots\}$ be the countable set of jump points of $H^1$ and $H^0$, and let $A$ be the set of all functions in $D[0,T]$ whose jumps occur only at points in $S$. For any $x \in A$ and any $\delta > 0$, by Lemma 14.1 of Billingsley (1968), there exists a partition $0 = t_0 < t_1 < \dots < t_m = T$ such that, for $I_i = [t_{i-1}, t_i)$,
$$\sup_{t \in I_i} |x(t) - x(t_{i-1})| < \delta, \qquad i = 1, \dots, m.$$
Now we modify the partition as follows. If $x$ is continuous at $t_i$, replace $t_i$ by a rational point close enough to $t_i$ so that the above inequalities still hold. If $x$ is discontinuous at $t_i$, then $t_i$ must be in $S$. Therefore, the points in the partition can be chosen from a countable set. Let us denote this set by $U = \{u_0, u_1, \dots\}$.

Let $B_n = \{x \in D[0,T]$: $x$ takes constant rational values on each interval $[t_{i-1}, t_i)$, where $0 = t_0 < t_1 < \dots < t_m = T$ is a partition of $[0, T]$ from points in $U$, and $x(T)$ is also rational$\}$. Then $B = \bigcup B_n$ is countable and dense in $A$. Thus $A$ is separable, and clearly $P[Q \in A] = 1$.

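As to the small oscillation condition, for each $\varepsilon, \delta > 0$, take a partition $0 = t_0 < \dots < t_m = T$ such that, with $I_i = [t_{i-1}, t_i)$,
$$\sup_{t \in I_i} |H^1(t) - H^1(t_{i-1})| < \delta, \qquad i = 1, \dots, m.$$
By Lemma 2.3, for $n$ exceeding some $n_0(t_{i-1}, t_i, \varepsilon)$,
$$P\Big[\sup_{t \in I_i} |Q^1_n(t) - Q^1_n(t_{i-1})| > \varepsilon\Big] \le K_\varepsilon\,(H^1(t_i-) - H^1(t_{i-1}))^2.$$
Since there are only finitely many partition points, we obtain
$$\limsup_{n\to\infty} P\Big[\max_i \sup_{t \in I_i} |Q^1_n(t) - Q^1_n(t_{i-1})| > \varepsilon\Big] \le \limsup_{n\to\infty} \sum_i K_\varepsilon\,(H^1(t_i-) - H^1(t_{i-1}))^2 \le K_\varepsilon\,\delta.$$
Thus the proof of weak convergence is complete.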

Since with probability 1 the limit process $Q$ sits in a separable set of completely regular points, the representation Theorem 4.13 in Pollard (1984) guarantees that on a common probability space there exist versions $\hat Q_n$, $\hat Q$ of $Q_n$, $Q$, respectively, such that $\|\hat Q_n - \hat Q\| \xrightarrow{a.s.} 0$. Now we can obtain the special construction $\{(\tilde X_{ni}, \delta_{ni}): 1 \le i \le n\}$ from $\{(\hat X^n_i, \hat\delta^n_i), i = 1, \dots, n\}$ in Lemma 2.4 through a random permutation. □

Proof of Lemma 2.3. Let us just look at the case $i = 1$, i.e., $Q^1_n = n^{1/2}(H^1_n - H^1)$. The proof is similar to the one used in the proof of Theorem 13.1 in Billingsley (1968). In place of the condition (13.17) there, one can verify that
$$E\big[|Q^1_n(s+p_1) - Q^1_n(s)|^2\,|Q^1_n(s+p_1+p_2) - Q^1_n(s+p_1)|^2\big] \le \text{Constant}\cdot(H^1(s+p_1) - H^1(s))(H^1(s+p_1+p_2) - H^1(s+p_1))$$
for $s, s+p_1, s+p_1+p_2 \in [a, b)$. Thus for $r < b - a$, by considering the random variables $Q^1_n(a + (b-a-r)i/m) - Q^1_n(a + (b-a-r)(i-1)/m)$, $i = 1, \dots, m$, we have, in place of (13.22) in Billingsley (1968),
$$P\Big[\sup_{t\in[a,b-r]} |Q^1_n(t) - Q^1_n(a)| > \varepsilon\Big] \le B_\varepsilon(H^1(b-r) - H^1(a))^2 + P[|Q^1_n(b-r) - Q^1_n(a)| \ge \varepsilon/2], \tag{2.6}$$
where $B_\varepsilon$ is some constant depending only on $\varepsilon$. By the triangle inequality,
$$P[|Q^1_n(b-r) - Q^1_n(a)| \ge \varepsilon/2] \le P[|Q^1_n(b-) - Q^1_n(a)| \ge \varepsilon/4] + P[|Q^1_n(b-) - Q^1_n(b-r)| \ge \varepsilon/4]. \tag{2.7}$$
Since the fourth central moment of binomial$(n, p)$, namely $np(1-p)[1 + 3(n-2)p(1-p)]$, does not exceed $3np + 3n(n-1)p^2$, and $P[|X| \ge \varepsilon] \le \varepsilon^{-4}E(X^4)$, we obtain
$$P[|Q^1_n(b-) - Q^1_n(b-r)| \ge \varepsilon/4] \le (4/\varepsilon)^4 E\{Q^1_n(b-) - Q^1_n(b-r)\}^4$$
$$\le (4/\varepsilon)^4\{3n^{-1}(H^1(b-) - H^1(b-r)) + 3n^{-1}(n-1)(H^1(b-) - H^1(b-r))^2\}. \tag{2.8}$$
Substituting (2.7) and (2.8) in (2.6) and then taking the limit as $r \downarrow 0$, we have
$$P\Big[\sup_{t\in[a,b)} |Q^1_n(t) - Q^1_n(a)| > \varepsilon\Big] \le B_\varepsilon(H^1(b-) - H^1(a))^2 + P[|Q^1_n(b-) - Q^1_n(a)| \ge \varepsilon/4].$$
Now for $\alpha = H^1(b-) - H^1(a)$, as $n \to \infty$, the Central Limit Theorem gives us
$$P[|Q^1_n(b-) - Q^1_n(a)| \ge \varepsilon/4] \to P[\{\alpha(1-\alpha)\}^{1/2}|N(0,1)| \ge \varepsilon/4] \le (4/\varepsilon)^4\alpha^2 E\{N(0,1)\}^4 \le B_\varepsilon(H^1(b-) - H^1(a))^2$$
if we take the constant $B_\varepsilon$ large enough. Thus for $n$ exceeding some $n_0 = n_0(a, b, \varepsilon)$,
$$P\Big[\sup_{t\in[a,b)} |Q^1_n(t) - Q^1_n(a)| > \varepsilon\Big] \le K_\varepsilon(H^1(b-) - H^1(a))^2$$
for some constant $K_\varepsilon$ depending only on $\varepsilon$. □
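The binomial moment bound behind (2.8) can be checked numerically. The sketch below compares the exact fourth central moment of binomial$(n, p)$, summed over the probability mass function, with the standard closed form $np(1-p)[1 + 3(n-2)p(1-p)]$, and then with the bound $3np + 3n(n-1)p^2$ that this closed form implies. The grid of $(n, p)$ values is an arbitrary choice for illustration.

```python
from math import comb

def fourth_central_moment(n, p):
    """Exact E(B - np)^4 for B ~ binomial(n, p), summed over the pmf."""
    mean = n * p
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) * (k - mean) ** 4
               for k in range(n + 1))

def closed_form(n, p):
    """Standard formula np(1-p)[1 + 3(n-2)p(1-p)]."""
    q = 1 - p
    return n * p * q * (1 + 3 * (n - 2) * p * q)

for n in (1, 2, 5, 20, 200):
    for p in (0.001, 0.05, 0.3, 0.5, 0.9):
        exact = fourth_central_moment(n, p)
        assert abs(exact - closed_form(n, p)) <= 1e-8 * max(1.0, exact)
        assert exact <= 3 * n * p + 3 * n * (n - 1) * p**2
print("3np + 3n(n-1)p^2 bounds the fourth central moment on the grid")
```

Dividing by $n^2$ gives the bound on $E\{Q^1_n(b-) - Q^1_n(b-r)\}^4$ used in (2.8), with $p = H^1(b-) - H^1(b-r)$.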

Proof of Lemma 2.4. First note that since $\int (\Delta F)\,dG = 0$, when there are ties among the $\tilde X_i$'s, the corresponding $\delta_i$'s must, with probability 1, be all 0's or all 1's: for $i \ne j$,
$$P[\tilde X_i = \tilde X_j, \delta_i = 0, \delta_j = 1] \le P[Y_i = X_j] = 0.$$
So for $W_n = Q_n + n^{1/2}(H^1, H^0)^t$, i.e., $W^i_n = Q^i_n + n^{1/2}H^i = n^{1/2}H^i_n$ for $i = 0, 1$, with probability 1 we have
$$\pi_t W_n \in \{(i n^{-1/2}, j n^{-1/2})^t: i, j \ge 0,\ i + j \le n\}, \tag{2.9}$$
$$\Delta_t W_n = \lim_{m\to\infty}(\pi_t - \pi_{t-1/m})W_n \in \{(0, k n^{-1/2})^t, (k n^{-1/2}, 0)^t: 0 \le k \le n\},$$
$$\pi_T W_n \in \{(i n^{-1/2}, j n^{-1/2})^t: i, j \ge 0,\ i + j = n\},$$
i.e., with probability 1, $W^1_n$ and $W^0_n$ are nondecreasing, take constant values on consecutive intervals and jump at different points; also, the total number of jump points is $n$ if we count $k$ jump points when $\Delta W^i_n = k n^{-1/2}$, $1 \le k \le n$, for $i = 0$ or 1. By our assumption the same is true for $\hat W_n = \hat Q_n + n^{1/2}(H^1, H^0)^t$. Denote the ordered jump points of $\hat W_n$ by $\hat X^n_1 \le \dots \le \hat X^n_n$ and define the corresponding $\hat\delta^n_i$'s to be 1 or 0 according to whether $\hat W^1_n$ or $\hat W^0_n$ jumps at $\hat X^n_i$. Then by (2.9) the joint distribution of $(\hat X^n_i, \hat\delta^n_i)$, $i = 1, \dots, n$, is determined by the projections $\pi_t$ acting on $\hat W_n$. In fact, it is the joint distribution of the ordered $\tilde X_i$'s and the corresponding $\delta_i$'s. □
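The reconstruction in Lemma 2.4 can be mimicked on a finite sample. Multiplying $W^i_n = n^{1/2}H^i_n$ by $n^{1/2}$ gives integer-valued counting processes, so each jump point carries an integer multiplicity, and by (2.1) the two processes a.s. never jump at a common point. The sketch below, built around the hypothetical function `reconstruct`, rebuilds the ordered pairs from these jumps alone.

```python
from collections import Counter

def reconstruct(w1_jumps, w0_jumps):
    """Rebuild the ordered pairs (x_(i), delta_(i)) from the jumps of the two
    counting processes n*H1_n and n*H0_n (jump point -> integer multiplicity)."""
    # Under (2.1) the two processes a.s. never jump at a common point.
    assert not set(w1_jumps) & set(w0_jumps), "common jump point"
    pairs = []
    for s in sorted(set(w1_jumps) | set(w0_jumps)):
        delta = 1 if s in w1_jumps else 0
        mult = w1_jumps[s] if delta else w0_jumps[s]
        pairs.extend([(s, delta)] * mult)   # k jump points when the jump size is k
    return pairs

sample = [(2.0, 1), (0.5, 0), (2.0, 1), (1.5, 0)]
w1 = Counter(x for x, d in sample if d == 1)   # jumps of n * H1_n
w0 = Counter(x for x, d in sample if d == 0)   # jumps of n * H0_n
print(reconstruct(w1, w0))   # [(0.5, 0), (1.5, 0), (2.0, 1), (2.0, 1)]
```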

By Theorem 2.1, we can adapt Theorem 7.1.1 in S-W to show
$$\sup_{[0,T]} |M_n - M| \xrightarrow{a.s.} 0 \tag{2.10}$$
for the special construction of $\tilde X_{ni}$, $\delta_{ni}$, $i = 1, \dots, n$, and a 2-dimensional Gaussian process $M$ as in (2.5). Now for $0 \le t < T$, let
$$P^1(t) \equiv \bar F(t)\int_0^t \bar F_-/(\bar F\bar H_-)\,dM^1, \qquad P^0(t) \equiv \bar G(t)\int_0^t \bar G_-/(\bar G\bar H_-)\,dM^0.$$
Then $P^1/\bar F$ and $P^0/\bar G$ are martingales on $[0, T)$ with quadratic variation processes
$$C^1(t) = \int_0^t [\bar F_-/(\bar F\bar H_-)]^2\,\bar H_-(1 - \Delta A^1)\,dA^1 = \int_0^t (\bar F\bar G_-)^{-1}\,dA^1, \qquad C^0(t) = \int_0^t (\bar G\bar F_-)^{-1}\,dA^0,$$
respectively. Now we can obtain the following convergence results for $P^1_n$ and $P^0_n$.

Theorem 2.2. Suppose that $F$ and $G$ do not have common points of discontinuity.

(i) If $G$ is continuous at $T$, then for the special construction in Theorem 2.1 and any $\alpha \in (0, 1/2)$,
$$\sup_{t\in[0,T]} |[P^0_n - P^0](t)\,\bar F^{1-\alpha}(t)| \xrightarrow{p} 0. \tag{2.11}$$

(ii) If $F$ is continuous at $T$ and
$$A(T) < \infty, \qquad \text{where } A(t) \equiv \int_0^t (1/\bar G_-)\,dF, \tag{2.12}$$
then for the special construction we also have
$$\sup_{t\in[0,T)} |[P^1_n - P^1](t)| \xrightarrow{p} 0. \tag{2.13}$$

Proof. Since the roles of $F$ and $G$ are interchangeable, from (2.10) an argument similar to (9) of Theorem 7.7.1 in S-W shows that, if $\Delta G(T) = 0$, we have
$$\sup_{[0,T]} |(P^0_n - P^0)\,\bar K^0[\bar G\,q(K^0)]^{-1}| = o_p(1) \tag{2.14}$$
for $K^0(t) = C^0(t)/(1 + C^0(t))$ and any function $q$ such that, on $(0, 1/2]$, $q$ is nondecreasing and $t^{-1/2}q(t)$ is nonincreasing, $q$ is symmetric about $t = 1/2$, and $\int_0^1 q^{-2}(t)\,dt < \infty$. Notice that in their proof, S-W use their Theorem 7.4.2, which states the uniform convergence of $P^1_n$ to $P^1$ on each compact subinterval $[0, \rho]$ of $[0, T)$, $\rho < T$. One way to avoid some flaws in their argument is to restrict $\rho$ to be a continuity point of $H$. Since $H$ has only countably many discontinuity points, this restriction causes no problem in deriving their Theorem 7.7.1. Now take $q(t) = t^\alpha$ on $(0, 1/2]$, where $\alpha \in (0, 1/2)$. Notice that
$$C(t) \equiv [\bar G(t)]^{-1} - 1 = \int_0^t (\bar G\bar G_-)^{-1}\,dG \le \int_0^t (\bar G\bar G_-\bar F_-)^{-1}\,dG = C^0(t) \le [\bar F(t)]^{-1}\int_0^t (\bar G\bar G_-)^{-1}\,dG = C(t)[\bar F(t)]^{-1}.$$
So $\bar G \ge \bar K^0 = (1 + C^0)^{-1} \ge \bar H$. Hence, since $q(K^0) \le (\bar K^0)^\alpha$,
$$\bar K^0[\bar G\,q(K^0)]^{-1} \ge (\bar G)^{-1}(\bar K^0)^{1-\alpha} \ge (\bar G)^{-1}\bar H^{1-\alpha} \ge \bar F^{1-\alpha}.$$
Thus (2.11) follows from the above inequality and (2.14).
To prove (2.13), first note that if $\Delta F(T) = 0$, we can use the same method as in the proof of (9) in Theorem 7.7.1 of S-W, with the role of $\bar K^1/q(K^1)$ replaced by $\bar F$ and the integrability condition on $q$ replaced by (2.12), to obtain
$$\sup_{[0,\tilde T]} |P^1_n - P^1| = o_p(1). \tag{2.15}$$
Now observe that, by the triangle inequality and since $\hat F_n$ is constant on $[\tilde T, \infty)$,
$$\sup_{[\tilde T, T)} |P^1_n - P^1| \le |P^1_n(\tilde T) - P^1(\tilde T)| + n^{1/2}(F(T-) - F(\tilde T)) + \sup_{t\in[\tilde T, T)} |P^1(\tilde T) - P^1(t)|. \tag{2.16}$$
The first term is $o_p(1)$ by (2.15). As for the last term, by the continuity of $F$ (hence of $C^1$) at $T$ and the fact that $\tilde T \to T$, it is $o_p(1)$ when $C^1(T) < \infty$; it is also $o_p(1)$ when $C^1(T) = \infty$, since by Remark 2.2 in Gill (1980) (the results (2.4), (2.5) in that remark and the inequalities following them), $\sup_{[\rho, T)} |P^1(t)| = o_p(1)$ as $\rho \uparrow T$. Hence it only remains to show that $n^{1/2}(F(T-) - F(\tilde T)) = o_p(1)$. Note that for the function $A$ defined in (2.12), for all $t < T$,
$$F(T-) - F(t) = \int_{(t,T)} \bar G_-\,dA \le \bar G(t)\int_{(t,T)} dA. \tag{2.17}$$
Now substitute $t = \tilde T$, multiply (2.17) through by $n(F(T-) - F(\tilde T)) \le n\bar F(\tilde T)$, and take square roots to get
$$n^{1/2}(F(T-) - F(\tilde T)) \le \Big\{n\bar H(\tilde T)\int_{(\tilde T, T)} dA\Big\}^{1/2}. \tag{2.18}$$
So it suffices to show that $n\bar H(\tilde T)$ is $O_p(1)$, since $A$ is bounded and $\tilde T \to T$ w.p.1. Using $HH^{-1}(x) \le x + \Delta H(H^{-1}(x))$ for $x \in [0, 1]$, we have
$$P[n\bar H(\tilde T) > t] = P[H(\tilde T) < 1 - t/n] = [H_-H^{-1}(1 - t/n)]^n = [HH^{-1}(1 - t/n) - \Delta H(H^{-1}(1 - t/n))]^n \le [1 - t/n]^n \to e^{-t},$$
which implies $n\bar H(\tilde T) = O_p(1)$. Thus $n^{1/2}(F(T-) - F(\tilde T)) = o_p(1)$. So finally we have (2.13). □

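Remark. For later use we make the following observations. For $t$ close to $T$, $\bar F$ and $\bar F^{1-\alpha}\bar G$ are nonnegative, right continuous and nonincreasing. Also,
$$\int_0^T \bar F^2\,dC^1 \le \int_0^T (1/\bar G_-)\,dF < \infty, \qquad \int_0^T (\bar F^{1-\alpha}\bar G)^2\,dC^0 \le \int_0^T \bar F^{1-2\alpha}\,dG \le 1 \quad \text{for } \alpha \in (0, 1/2).$$
Hence, when $C^1(T) = \infty$, as illustrated in Remark 2.2 of Gill (1983),
$$P^1(t) \to 0 \text{ w.p.1 as } t \uparrow T, \qquad P^0(t)\bar F^{1-\alpha}(t) \to 0 \text{ w.p.1 as } t \uparrow T.$$
So from Theorem 2.2 we have
$$\sup_{[0,T)} |P^1| < \infty \text{ w.p.1}, \qquad \sup_{[0,T)} |P^1_n| = O_p(1), \tag{2.19}$$
and for any $\alpha \in (0, 1/2)$,
$$\sup_{[0,T)} |P^0\bar F^{1-\alpha}| < \infty \text{ w.p.1}, \qquad \sup_{[0,T)} |P^0_n\bar F^{1-\alpha}| = O_p(1), \qquad P^0_n\bar F^{1-\alpha}(\tilde T) = o_p(1).$$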

To obtain the main results of Chapter 4 (Theorems 4.2 and 4.3), we need to control the increment of the process $Z_n = n^{1/2}(\hat F_n - F)/\bar F$. For any process $X$, let $X^{\tilde T}(t) = [t \le \tilde T]X(t) + [t > \tilde T]X(\tilde T)$. Also recall the function $A$ defined in (2.12). The following lemma gives an inequality for the mean square increment of the $Z^{\tilde T}_n$ process in terms of the function $A$.

Lemma 2.5. Suppose $F$ is continuous. Then for $0 \le s < t < T_F$,
$$E[Z^{\tilde T}_n(t) - Z^{\tilde T}_n(s)]^2 \le 4[A(t)/\bar F^2(t) - A(s)/\bar F^2(s)] \le 4\bar F^{-2}(t)[A(t) - A(s)] + 4A(s)[\bar F^{-2}(t) - \bar F^{-2}(s)].$$

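Proof. Lemma 2.5 in Gill (1983) states that for continuous $F$, $Z^{\tilde T}_n$ is a square integrable martingale on $[0, \rho]$ for any $\rho < T_F$, with predictable variation process
$$\langle Z^{\tilde T}_n\rangle(x) = \int_0^x [(1 - \hat F_{n-})/\bar F]^2\,J_n/\bar H_{n-}\,dA^1 \tag{2.20}$$
for $x \in [0, \rho]$. Let $L(x) = E\langle Z^{\tilde T}_n\rangle(x)$. Since $F$ is continuous, $(1 - \hat F_{n-})/\bar F = 1 - n^{-1/2}Z^{\tilde T}_{n-}$ where $J_n = 1$, and since $n^{-1}J_n/\bar H_{n-} \le 1$,
$$L(x) = E\int_0^x [1 - n^{-1/2}Z^{\tilde T}_{n-}]^2\,J_n/\bar H_{n-}\,dA^1 \le 2\int_0^x E(J_n/\bar H_{n-})\,dA^1 + 2\int_0^x E\{[Z^{\tilde T}_{n-}]^2 J_n\}\,dA^1.$$
Since $n\bar H_{n-}(x)$ is binomial$(n, \bar H_-(x))$ and $n/k \le 2(n+1)/(k+1)$ for $1 \le k \le n$, we have
$$E(J_n/\bar H_{n-})(x) = \sum_{k=1}^n \frac{n}{k}\binom{n}{k}\bar H_-^k H_-^{n-k}(x) \le 2\sum_{k=1}^n \frac{n+1}{k+1}\binom{n}{k}\bar H_-^k H_-^{n-k}(x) \le 2(\bar H_-(x))^{-1}.$$
Also, by Fatou's Lemma,
$$E\{[Z^{\tilde T}_{n-}(x)]^2 J_n(x)\} \le E[Z^{\tilde T}_{n-}(x)]^2 = E\lim_{x_k\uparrow x}[Z^{\tilde T}_n(x_k)]^2 \le \liminf_{x_k\uparrow x} E[Z^{\tilde T}_n(x_k)]^2 = \lim_{x_k\uparrow x} E\langle Z^{\tilde T}_n\rangle(x_k) \le L(x),$$
since the integrand in (2.20) is nonnegative. Therefore, we obtain
$$L(x) \le a(x) + \int_0^x L\,d\beta, \tag{2.21}$$
where $a(x) = 4\int_0^x (1/\bar H_-)\,dA^1$ and $\beta = 2A^1$.

Now we use the argument of Gronwall's Lemma: iterating $m$ times in (2.21) yields
$$L(x) \le a(x) + \int_0^x \sum_{i=0}^{m-1}\frac{1}{i!}[\beta(x) - \beta(y)]^i a(y)\,d\beta(y) + \int_0^x \frac{1}{m!}[\beta(x) - \beta(y)]^m L(y)\,d\beta(y)$$
$$\le a(x) + \int_0^x e^{\beta(x) - \beta(y)}a(y)\,d\beta(y) + \frac{1}{(m+1)!}\beta^{m+1}(x)L(x).$$
Letting $m \to \infty$ and using $A^1 = -\ln\bar F$ for continuous $F$, we get
$$L(x) \le a(x) + \int_0^x e^{\beta(x) - \beta(y)}a(y)\,d\beta(y) = a(x) + 2\int_0^x e^{2\ln(\bar F(y)/\bar F(x))}a(y)\,dA^1(y)$$
$$= a(x) + 8\int_0^x [\bar F(y)/\bar F(x)]^2\int_0^y (1/\bar H_-(u))\,dA^1(u)\,dA^1(y) = 4\bar F^{-2}(x)A(x) \tag{2.22}$$
by Fubini's theorem. Now
$$E[Z^{\tilde T}_n(t) - Z^{\tilde T}_n(s)]^2 = E[\langle Z^{\tilde T}_n\rangle(t) - \langle Z^{\tilde T}_n\rangle(s)] \le 2\int_s^t E(J_n/\bar H_{n-})\,dA^1 + 2\int_s^t E\{[Z^{\tilde T}_{n-}]^2 J_n\}\,dA^1$$
$$\le 4\int_s^t (1/\bar H_-)\,dA^1 + 2\int_s^t L\,dA^1 \le 4[A(t)/\bar F^2(t) - A(s)/\bar F^2(s)]$$
by (2.22) and Fubini's theorem. □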
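The inequality $E(J_n/\bar H_{n-})(x) \le 2(\bar H_-(x))^{-1}$ used in this proof is a statement about the binomial distribution of $n\bar H_{n-}(x)$, so it can be checked directly. The sketch below evaluates the exact expectation for a few arbitrary $(n, p)$ pairs, with $p$ playing the role of $\bar H_-(x)$, and also checks the elementary step $n/k \le 2(n+1)/(k+1)$.

```python
from math import comb

def inverse_moment(n, p):
    """Exact E[(n / B) * 1{B >= 1}] for B ~ binomial(n, p); with p = Hbar_-(x)
    this is E(J_n / Hbar_{n-})(x)."""
    return sum(n / k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(1, n + 1))

# Elementary step: n/k <= 2(n+1)/(k+1) for 1 <= k <= n.
assert all(n / k <= 2 * (n + 1) / (k + 1) for n in range(1, 200) for k in range(1, n + 1))

for n in (1, 3, 10, 100, 500):
    for p in (0.01, 0.2, 0.5, 0.99):
        assert inverse_moment(n, p) <= 2 / p
print("E(J_n / Hbar_{n-}) <= 2 / Hbar_- on the grid")
```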

Let us now define the kernel density estimators and the smoothed version of the product-limit estimators as follows:
$$\hat f_n(x) = a_n^{-1}\int_R K((x - y)/a_n)\,d\hat F_n(y), \qquad \tilde F_n(x) = \int_{-\infty}^x \hat f_n(s)\,ds, \tag{2.23}$$
where $K$ is some kernel function and $a_n$ is some positive constant. The following theorem establishes the convergence of $\hat f_n$ to $f$ in the Hellinger metric.
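As a rough illustration of (2.23), $\hat f_n$ can be computed by placing a rescaled kernel at each jump of the Kaplan-Meier estimate. Everything in the sketch below is an assumption made for the example: the Gaussian kernel (nonnegative, continuous, of bounded variation, integrating to 1), Gamma(2, 1) lifetimes, Exp(0.3) censoring (for which $\int (1/\bar G_-)\,dF < \infty$), the bandwidth $a_n = n^{-1/5}$ (so $a_n \to 0$ and $n^{1/2}a_n \to \infty$), and all helper names. The squared Hellinger distance is approximated by a midpoint rule on a finite interval.

```python
import math
import random
from collections import Counter

def km_jumps(pairs):
    """Jumps of the Kaplan-Meier estimate F_hat_n, keyed by uncensored time."""
    deaths = Counter(x for x, d in pairs if d == 1)
    counts = Counter(x for x, _ in pairs)
    at_risk, surv, jumps = len(pairs), 1.0, {}
    for s in sorted(counts):
        new_surv = surv * (1.0 - deaths[s] / at_risk)
        if deaths[s]:
            jumps[s] = surv - new_surv            # Delta F_hat_n(s)
        surv, at_risk = new_surv, at_risk - counts[s]
    return jumps

def kde(jumps, a_n):
    """f_hat_n(x) = a_n^{-1} * sum_s K((x - s)/a_n) * Delta F_hat_n(s), Gaussian K."""
    c = 1.0 / math.sqrt(2.0 * math.pi)
    return lambda x: sum(c * math.exp(-0.5 * ((x - s) / a_n) ** 2) * w
                         for s, w in jumps.items()) / a_n

def hellinger_sq(f, g, lo, hi, m=2000):
    """Midpoint-rule approximation of the integral of (f^{1/2} - g^{1/2})^2 on [lo, hi]."""
    h = (hi - lo) / m
    return h * sum((math.sqrt(f(lo + (i + 0.5) * h)) - math.sqrt(g(lo + (i + 0.5) * h))) ** 2
                   for i in range(m))

rng = random.Random(2)
n = 1000
pairs = []
for _ in range(n):
    x, y = rng.gammavariate(2.0, 1.0), rng.expovariate(0.3)   # assumed laws
    pairs.append((min(x, y), int(x <= y)))
jumps = km_jumps(pairs)
f_hat = kde(jumps, a_n=n ** -0.2)
f_true = lambda x: x * math.exp(-x) if x > 0 else 0.0          # Gamma(2, 1) density
d2 = hellinger_sq(f_hat, f_true, -2.0, 14.0)
print(f"KM mass {sum(jumps.values()):.3f}, squared Hellinger distance {d2:.4f}")
```

Note that $\int \hat f_n = \hat F_n(\tilde T)$, which is below 1 whenever the largest observation is censored; this is one source of the remaining distance for finite $n$.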

Theorem 2.3. Suppose $\int_0^T (1/\bar G_-)\,dF < \infty$ and $G$ is continuous at $T$. Also suppose that $F$ has a continuous density $f$, $K$ is nonnegative, continuous and of bounded variation on $R$, $\int_R K(s)\,ds = 1$, $K(s) \to 0$ as $s \to \pm\infty$, $a_n \to 0$ and $n^{1/2}a_n \to \infty$. Then
$$\|\hat f_n^{1/2} - f^{1/2}\|_2 \xrightarrow{p} 0, \tag{2.24}$$
where $\|\cdot\|_2$ denotes the $L_2$-norm with respect to Lebesgue measure.

Proof. Integration by parts gives us
$$\hat f_n(x) - f(x) = (n^{1/2}a_n)^{-1}\int_R P^1_n(x - a_nt)\,dK(t) + \int_R [f(x - a_nt) - f(x)]K(t)\,dt = R_3 + R_4, \text{ say}. \tag{2.25}$$
By (2.19) we have
$$\sup_{x\in R}|R_3| \le (n^{1/2}a_n)^{-1}\sup_{[0,T)}|P^1_n|\int_R |dK| = o_p(1). \tag{2.26}$$
Since $\int_R\int_R f(x - a_nt)K(t)\,dx\,dt = \int_R\int_R f(x)K(t)\,dx\,dt = 1$, by Lemma 2.1(ii) the nonrandom term $R_4$ converges to zero in Lebesgue measure. Also
$$\|\hat f_n^{1/2}\|_2^2 = \hat F_n(\tilde T) = o_p(1) + F(T) \xrightarrow{p} 1 = \|f^{1/2}\|_2^2.$$
Now to obtain (2.24), we use a subsequence argument. For any subsequence $\{n'\} \subset \{n\}$, there exists a further subsequence $\{n''\} \subset \{n'\}$ along which

(a) $R_4 \to 0$ for a.e. $[\lambda]$-$x$,

(b) $\sup_{x\in R}|R_3| \to 0$ w.p.1,

(c) $\|\hat f_{n''}^{1/2}\|_2 \to \|f^{1/2}\|_2$ w.p.1.

By (a) and (b), with probability 1, $\hat f_{n''}(x) \to f(x)$ for a.e. $[\lambda]$-$x$. By Lemma 2.1(ii), this and (c) imply that with probability 1, $\|\hat f_{n''}^{1/2} - f^{1/2}\|_2 \to 0$. Hence (2.24) follows. □

3. MINIMUM HELLINGER DISTANCE FUNCTIONALS

In the censored data case, the pair $(\tilde X, \delta)$ is observed. Assuming $F$ has a density $f$ with respect to the Lebesgue measure $\lambda$, we have
$$P[\delta_1 = 0, \tilde X_1 \le t] = \int_0^t \bar F\,dG, \qquad P[\delta_1 = 1, \tilde X_1 \le t] = \int_0^t \bar G_-\,dF = \int_0^t f\bar G\,dx.$$
Thus $(\tilde X, \delta)$ has a (sub-)density $\bar F^{1-y}(x)f^y(x)$ with respect to the measure $\mu_G$ on $R \times \{0, 1\}$, where $\mu_G$ is defined by the relation
$$\int m\,d\mu_G = \int m(x, 0)\,dG(x) + \int m(x, 1)\,\bar G\,dx$$
for any nonnegative measurable function $m$ on $R \times \{0, 1\}$. For any (sub-)density $d$ on $R$ w.r.t. $\lambda$, define a (sub-)density $L(d)$ on $R \times \{0, 1\}$ w.r.t. $\mu_G$ by
$$L(d)(x, y) = \bar D^{1-y}(x)\,d^y(x),$$

where $D$ is the (sub-)c.d.f. of $d$. Recall the parametric family $\{f_\theta: \theta \in \Theta\}$ mentioned in the introduction. For a (sub-)c.d.f. $G$ and a (sub-)density function $d$, the minimum Hellinger distance functional $W(d; G)$ is defined as a point in $\Theta$, if it exists, that minimizes the Hellinger distance between $L(f_\theta)$ and $L(d)$:
$$\|[L(f_{W(d;G)})]^{1/2} - [L(d)]^{1/2}\|_G = \inf_{\theta\in\Theta}\|[L(f_\theta)]^{1/2} - [L(d)]^{1/2}\|_G, \tag{3.1}$$
where $\|\cdot\|_G$ denotes the $L_2$-norm in $L_2(\mu_G)$. When there is more than one minimizer, a Borel measurable selection is possible (cf. Brown and Purves (1973)). For $0 \le \tau \le \infty$, define $W(\cdot; G; \tau)$ similarly by restricting all integration to $x \in (-\infty, \tau]$. Later we will use $\|(\cdot)(-\infty, \tau]\|_G$ to denote the norm under the restricted integration. Note that $W(\cdot; G; \infty) = W(\cdot; G)$. For (sub-)densities $f$, $f_n$ on $R$ w.r.t. $\lambda$, (sub-)c.d.f.'s $G_n$ and points $\tau_n$ such that
$$\sup_{(-\infty,\tau]}|G_n - G| \to 0 \quad \text{and} \quad \tau_n \uparrow \tau, \tag{3.2}$$
we will use the notations $\theta_0 = W(f; G; \tau)$, $\theta_{00} = W(f; G)$, $\theta_n = W(f; G_n; \tau_n)$, $\theta_{nn} = W(f_n; G_n; \tau_n)$, $\mu = \mu_G$ and $\mu_n = \mu_{G_n}$. Also, we will use $\int_{-\infty}^{\tau}\cdot\,d\mu$ to denote the integral over $(x, y) \in (-\infty, \tau] \times \{0, 1\}$, and $F_\theta$ the c.d.f. corresponding to $f_\theta$. We have the following.

Lemma 3.1. Suppose

(a) $\Theta$ is a compact subset of $R^p$;

(b) $\theta_1 \ne \theta$ implies $f_{\theta_1} \ne f_\theta$ on a set of positive Lebesgue measure, and for almost every $x$, $f_\theta(x)$ is continuous in $\theta$.

Then,

(i) for any (sub-)c.d.f. $G$, (sub-)density function $f$ and $0 \le \tau \le \infty$, $W(f; G; \tau)$ exists;

(ii) $W(f_\theta; G; \tau) = \theta$ uniquely if both $T_G \ge T_{F_\theta}$ and $\tau \ge T_{F_\theta}$;

(iii) for any (sub-)c.d.f. $G$ and any (sub-)density functions $f$, $f_n$ on $R$, $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$ implies $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$;

(iv) $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$ implies $W(f_n; G) \to W(f; G)$ if $W(f; G)$ is unique.

If, in addition,

(c) the family $\{F_\theta(x): \theta \in \Theta\}$ is equicontinuous, then

(v) for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2) and $\Delta G(\tau) = 0$, $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$ implies $W(f_n; G_n; \tau_n) \to W(f; G; \tau)$ if $W(f; G; \tau)$ is unique.

Proof. By (b), $F_\theta(x)$ is continuous in $\theta$ for fixed $x$; thus (i) can be proved as in Theorem 1 of Beran (b.1977). (ii) is obvious. For (iii), first note that for $a, b \ge 0$, $|b - a| \le b + a$, hence $(b - a)^2 \le |b^2 - a^2|$. So we have
$$\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 = \int_R [\bar F_n^{1/2} - \bar F^{1/2}]^2\,dG + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda \le \int_R |F_n - F|\,dG + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda.$$
Since $\int_R |F_n - F|\,dG \le \sup_x |F_n - F| \le \int_R |f_n - f|\,d\lambda = \int_R |f_n^{1/2} - f^{1/2}|(f_n^{1/2} + f^{1/2})\,d\lambda$, an application of the Cauchy-Schwarz inequality to the last integral gives us $\int_R |F_n - F|\,dG \le 2\{\int_R [f_n^{1/2} - f^{1/2}]^2\,d\lambda\}^{1/2}$. Hence we obtain
$$\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 \le 2\Big\{\int_R [f_n^{1/2} - f^{1/2}]^2\,d\lambda\Big\}^{1/2} + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda.$$
This proves (iii). Next we prove only (v), as assertion (iv) follows similarly. Define $N, N_n \ge 0$ by
$$N(\theta, f) = \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau]\|_G, \qquad N_n(\theta, f) = \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau_n]\|_{G_n}.$$
By the triangle inequality, we have
$$|N_n(\theta, f_n) - N_n(\theta, f)|^2 \le \|\{[L(f_n)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau_n]\|_{G_n}^2$$
$$\le \|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 + \Big|\int_{-\infty}^{\tau_n}[f_n^{1/2} - f^{1/2}]^2(\bar G_n - \bar G)\,d\lambda\Big| + \Big|\int_{-\infty}^{\tau_n}[\bar F_n^{1/2} - \bar F^{1/2}]^2\,d(G_n - G)\Big|.$$
The first term converges to zero by (iii) as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$, and so does the second term for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2), since it is dominated by
$$\sup_{(-\infty,\tau_n]}|G_n - G|\int_{-\infty}^{\tau_n}[f_n^{1/2} - f^{1/2}]^2\,d\lambda \le 2\sup_{(-\infty,\tau_n]}|G_n - G|.$$
Now using the integration by parts formula, we can write the third term as
$$\Big|-\int_{-\infty}^{\tau_n}(G_{n-} - G_-)\,d\big(\bar F_n + \bar F - 2(\bar F_n\bar F)^{1/2}\big) + [\bar F_n^{1/2} - \bar F^{1/2}]^2(\tau_n)\,(G_n - G)(\tau_n)\Big|,$$
which is dominated by $5\sup_{(-\infty,\tau]}|G_n - G| \to 0$. None of these bounds depends on $\theta$. Thus we have shown that, for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2),
$$N_n(\theta, f_n) - N_n(\theta, f) \to 0 \tag{3.3}$$
uniformly in $\theta$ as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$.

By the triangle inequality again, we have
$$|N_n^2(\theta, f) - N^2(\theta, f)| \le \Big|\int_{-\infty}^{\tau_n}[f_\theta^{1/2} - f^{1/2}]^2(\bar G_n - \bar G)\,dx\Big| + \Big|\int_{-\infty}^{\tau_n}[\bar F_\theta^{1/2} - \bar F^{1/2}]^2\,d(G_n - G)\Big| + \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(\tau_n, \tau]\|_G^2.$$
Similar to the previous argument, the first term is bounded by $2\sup_{(-\infty,\tau]}|G_n - G|$ and the second term by $5\sup_{(-\infty,\tau]}|G_n - G|$. The third term is bounded by $|(F_\theta + F + G)(\tau) - (F_\theta + F + G)(\tau_n)|$. Hence for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2), (c) and the assumption $\Delta G(\tau) = 0$,
$$N_n^2(\theta, f) - N^2(\theta, f) \to 0 \tag{3.4}$$
uniformly in $\theta$ as $n \to \infty$.

From (3.3), (3.4) and again the inequality $(b - a)^2 \le |b^2 - a^2|$ for $a, b \ge 0$, it follows that $N_n(\theta, f_n) - N(\theta, f) \to 0$ uniformly in $\theta$, which implies
$$N_n(\theta_{nn}, f_n) - N(\theta_0, f) = \min_\theta N_n(\theta, f_n) - \min_\theta N(\theta, f) \to 0,$$
and $N_n(\theta_{nn}, f_n) - N(\theta_{nn}, f) \to 0$. Hence
$$N(\theta_{nn}, f) - N(\theta_0, f) \to 0. \tag{3.5}$$
As in Beran (b.1977), from (3.5), the compactness of $\Theta$, the continuity of $N(\theta, f)$ in $\theta$ and the uniqueness of $W(f; G; \tau)$, one has $\theta_{nn} \to \theta_0$, i.e., $W(f_n; G_n; \tau_n) \to W(f; G; \tau)$. □
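To make the functional in (3.1) concrete, the sketch below evaluates the squared censored Hellinger distance $\int (\bar F_\theta^{1/2} - \bar D^{1/2})^2\,dG + \int (f_\theta^{1/2} - d^{1/2})^2\,\bar G\,dx$ by a midpoint rule and minimizes it over a finite grid. The exponential family $f_\theta(x) = \theta e^{-\theta x}$, the known Exp($\lambda$) censoring law, the truncation point and the grid standing in for a compact $\Theta$ are all illustrative assumptions; the text above also considers the functional with $G_n$ in place of $G$, cf. (3.2).

```python
import math

def censored_hellinger_sq(f_theta, Fbar_theta, d, Dbar, lam, hi=40.0, m=2000):
    """Midpoint-rule approximation of ||L(f_theta)^{1/2} - L(d)^{1/2}||_G^2 on [0, hi],
    assuming censoring law G = Exp(lam): dG = lam*exp(-lam x) dx, Gbar = exp(-lam x)."""
    h = hi / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        gbar = math.exp(-lam * x)
        # y = 0 component: (Fbar_theta^{1/2} - Dbar^{1/2})^2 integrated against dG
        total += (math.sqrt(Fbar_theta(x)) - math.sqrt(Dbar(x))) ** 2 * lam * gbar
        # y = 1 component: (f_theta^{1/2} - d^{1/2})^2 integrated against Gbar dx
        total += (math.sqrt(f_theta(x)) - math.sqrt(d(x))) ** 2 * gbar
    return total * h

def mhd_estimate(d, Dbar, lam, grid):
    """Grid minimiser over the illustrative family f_theta(x) = theta*exp(-theta x)."""
    def dist(theta):
        return censored_hellinger_sq(lambda x: theta * math.exp(-theta * x),
                                     lambda x: math.exp(-theta * x), d, Dbar, lam)
    return min(grid, key=dist)

# Data density inside the model: d = f_{1.3}; censoring rate 0.5.
theta0, lam = 1.3, 0.5
grid = [round(0.5 + 0.02 * k, 2) for k in range(76)]   # stands in for compact Theta = [0.5, 2.0]
est = mhd_estimate(lambda x: theta0 * math.exp(-theta0 * x),
                   lambda x: math.exp(-theta0 * x), lam, grid)
print(est)   # 1.3, the grid point where the distance is exactly zero
```

This mirrors Lemma 3.1(ii): when $d = f_{\theta_0}$ lies in the model, the minimizer is $\theta_0$ itself.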

To study the asymptotic behavior of the minimum Hellinger distance functionals, we need to establish the following expansions for $s_\theta \equiv [L(f_\theta)]^{1/2}$. When the first order partial derivatives of $f_\theta$ w.r.t. $\theta$ exist, we will denote the column vector of the partials by $\dot f_\theta$, with $i$th component $\dot f^{(i)}_\theta$; when the matrix of the second order partials exists, it will be denoted by $\ddot f_\theta$, with $(i, j)$ entry $\ddot f^{(ij)}_\theta$. Also, $A^t$ denotes the transpose of the matrix $A$.

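Lemma 3.2. Let $\rho$ be an interior point of $\Theta$. Suppose that there exists a neighborhood $V$ of $\rho$ such that

(i) on $V$, $f_\theta(x)$ is continuous in $\theta$ for every $x$ and $\dot f_\theta(x)$ is continuous in $\theta$ for $x \notin N$, where $N$ is a $\lambda$-null set;

(ii) for $i = 1, \dots, p$, $U^{(i)}(\theta) \equiv \int [\dot f^{(i)}_\theta]^2/f_\theta\,dx$ is continuous on $V$.

Then for $\rho_n$ in a neighborhood of $\rho$ and $(x, y) \notin N_0 \times \{1\}$, where $N_0$ is a $\lambda$-null set,
$$s_{\rho_n} = s_\rho + (\dot s_\rho + r_n)^t(\rho_n - \rho), \tag{3.6}$$
where
$$\|r^{(i)}_n(-\infty, \tau]\|_G \to 0 \text{ as } \rho_n \to \rho \tag{3.7}$$
for $i = 1, \dots, p$, $\tau \in R$, and any (sub-)c.d.f. $G$.
If, in addition, we assume

(iii) for $i = 1, \dots, p$ and some $\delta > 0$, $V^{(i)}(\theta) \equiv \int |\dot f^{(i)}_\theta|^{2+\delta}/f_\theta^{1+\delta}\,dx$ and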

W(1)(6) E I - Ifginlf9 dﬁélz are bounded in a neighborhood
of p. Then

(3.8) llrgi)(-°°,1n]llG ——+ O as pn-—4 p

n

for each i = 1.0-0. p and Gn’ 1n satisfying (3.2).

Lemma 3.3. Let p be an interior point of 9. Suppose that

there exists a neighborhood V of p such that

(i) on V. {G is continuous in 9 for every x and f9(x)
is contniuous in 9 for x C N. where N is a h-null set.
(11) for 1. 3 e 1,-oo. p, n(‘)(e)_ = [ [1(1)]4/13 dx.

U£122(9) E f [ f9(ij)]2/fe dh are continuous on V.
Then for pn in a neighborhood of p and

(x.y) C No x (1}, where No is a A-null set.

27

(3.9) epne .p + (';p + Rn)(pn - p).
where
(3.10) HRgi’J)(-¢.1]"G -—» o as pn -—» p

for i. j = l.'°°. p. 1 e R and any (sub-)c.d.f. G.
If. in addition. we assume

(iii) for i. j = 1.000. p. and some 6. 6 > 0

11 " 13 2+ 1+
vg )(9) f | 19‘ )I 6/16 2 dx.
1 _ ' 1 4 5 3 5
v; )(9) =1 I19( )I + ne+ dx.
1) = " 13 - 1/2
WE )(e) - I - | 19( )llf6 are and

1 _ ' 1 - 1/4
w; )(e) = j - [19‘ )I/fe are

are bounded in a neighborhood of p. then

(3.11) un£2°3)(-u.w“]uc -—e o as pn -—» p
n

for i. j = l.°--. p and Gn' 1n satisfying (3.2).

Here we Just prove Lemma 3.2. The argument for Lemma 3.3

is similar and more involved

Proof of Lemma 3.2. First let us look at the case when

the parameter is one dimensional.

On [St f9(s) = 0]\N. f9(s) must be zero since otherwise

ft(s) would be negative for some t. Now we can write IR

I191s1I ds = IR {I19(s)l 19‘1’21s11191’21s) ds. hence by (1).

the Cauchy-Schwartz inequality and Lemma 2.1(1).

IR If9(s)l ds is finite and continuous. Now since for s G N

28

and 6 6 V we have f9(s) fp(s) + f2 ft(s) dt . it follows
by Fubini’s theorem that Fe(x) = fp(x) + [g{[: ft(s) ds} dt.

So by Lemma 2.1 (i). for every x. f9(x) exists. is equal to

I: f9(s) ds and is continuous in 6. Next. note that for every

x < TF9. [%9(x)]2/f9(x)

= (1/F91x111I; (£9 19‘1’2)(s) ré’zcs) as]2
(3.12) g [: (192/16)(s) ds
and [R I: [19(1)]2/19 ds dG(x)

(3.13)

I c [19(1)]2/1e dx.

Thus Lemma 2.1 gives us the continuity of "{s9 - ép}(-w.«]uc
in 9 for any (sub-)d.f. G and 7 6 R. Now (3.12). (3.13).
assumptions (1). (ii) and the proof of Lemma A.2 in
Hajek(1972) show that. there is a neighborhood of p. in which
se(x.1) is absolutely continuous in 9 for a.e. x [A] and
89(X.0) is absolutely continuous in 9 for every x. Thus for
(x.y) C No x {1}. where No is a A-null set. we have for pn in

a neighborhood of p.

p .
s = s + f n 3 dt
9“ p p t

s + ( - ' + - ‘lfpn ° - ° ) dt}
on p){sp (9n n) p (St Sp .

(3.14) p

and
_ p . .
n((pn —p) 11p“ (st - sp) dt>(-~.~Jn§

_ p . .
s I(pn -p) lip“ "(st — sp}(-~.~Jn§ dtl

29

(3.15) "(A

' 2
En- sp)(-¢.7]llG

for some En between pn and p. Thus to prove (3.7) and (3.8).

it suffices to prove that as pn -—4 p.

. . m _* . - g -m 1n _#
“(spn- sp)(- .1]"G O. ll{spn p)( . ]Hon 0.

respectively. The first being guaranteed by the continuity of
"(39 - sp)(—°°.1]llG in 9. we only have to prove

"(spn- sp)(-m.w“]an-» 0.

When 9 is multidimensional. we can apply the above

argument to each component of 9. Consequently. we have

"{sgi) - sgi)}(-m.1]ﬂc -—4 0 for pn -+ p. and it only remains
n

to prove u( égi) - é“’}(—o.«“]uc ——» o.

n P n

We have

n(ég‘) - é£‘))(-m.1“]u§

n n

' 1 ' 1 2
= n(sgn) — s: ))(-m.1“]uc

1/2 2

1/2 - -
p )] (an - G) dx

+ 4'1]:: [1£‘)/(1p

'(1)
- f / f
n n) p (

-1/2 2

-1/2 : 1
) — F: )/(Fp )] d(Gn~ C).

(3.16) + 4'11:: [%£1)/(Fp
n n

The first term converges to zero as mentioned above. and by
(3.12) and repeated use of Lemma 2.1 (i). the second term
also converges to zero. Applying the integration by parts
formula and (3.12) to the third term we have

n . .
III. [F§"/(F;’2) - ig"/(F;’2)12 d(cn - G)I
n n

30

_ 1“ _ :(1) -1/2
| 2[_on (cn_ G_)[Fp /(Fp

n n

) - fﬁ"/(F;’2)J

1 1/2
dx —[F( )/(Fp

p11 11

) - %£‘)/(f;’2)] dx

+ (CD - G)[F(1)/(F:/2

pn n

(3.17) s 2 sup Ian-cl{tumwnnl’2 + [U“’(p)11’2}

(-”-7]

) _ F(i)/(F1/2)] 2( n)|

1/2

D
i
.IIQ I TS[F( )/(Fp

pn

_ L(i) -1/2
D ) Fp /(Fp )]| ds
+ sup Icn-cI110‘1’1pn)11’2 + [0‘1’1p111’2}.
(-"-1]
Since sup IGn-GI -—9 0. it suffices to prove that

(-”-7]

n . .
I:m |-%;[ng)/(§:/2) - Fg1)/(§:/2)]I ds remains bounded. We
n n

can bound it by [R [If£1)I/(F$ ) + I lf(1)|/(F1/2) )]
n

-1 4(1) -3/2 -1 1:(1) -3/2
+ 2 [R Fp /(Fp )de + 2 1 Pp /(Fp

) dF . The first two
n n n p

terms in the sum are 2W(i)(pn). 2W(i)(p) respectively and
therefore remain bounded. To deal with the last two terms in

the sum. denote them by B(pn). B(p) respectively. Let

p = 2 + e. and q the conjugate of p: p.1 + q-1 = 1. Then

q'1 - 2"1 > 0. Take a = q‘l. then aq = 1, ap = p — 1,

Holder's inequality gives us

IF“’(x)l/F3’2(x )
pn pn
/2 m '(i) a p l/
s n mux Irpn xfpnl (s) as]
-[I; 132(3) d811,
_ -1/q /2 m '(i) a p s l/p
- Fpn (xmx prn prnl (s) d 1
-1/q

(3.18) 1 F ’2(x)EV“’(pn)]"

31

hence

-l -1

(3 19) B(pn) s 2'IIV‘1’1pn111’P(q“ - 2 )

Similarly.

-1

(3.20) B(p) s 2‘11v‘1’1p111’P1q‘1 - 2 )‘1.

Finally the result follows from (3.12) through (3.16). 0

Now we are ready for the main results of this chapter:
the differentiability of the minimum Hellinger distance
functionals. We state the results separately for the case

when G is known and the case when G is unknown.

Theorem 3.1. Suppose
(i) assumptions (a) and (b) in Lemma 3.1 hold.
(ii) 900 = W(f; G) exists. is unique and lies in the

interior of 9.
" 1/2
(iii) the matrix I 39 [L(f)] du is nonsingular.
00

(iv) assumptions (i) and (ii) of Lemmas 3.2 and 3.3

hold for p = 9 Then. for fn in a Hellinger neighborhood

00'
of f.

(3.21) 1(1n; G) - 1(1; G)
" 1/2 -1
= 1- I seoo[L(r)1 du + “n1

.; $90011L11n111’2 - [L(f)]1/2) du

where all entries of the matrix un converge to zero as

"11/2 - 11/2" ——4 o.
n 2

32

Theorem 3.2. Suppose
(i) assumptions (a). (b) and (c) in Lemma 3.1 hold.
(ii) 90 = W(f: G; 7) exists. is unique and lies in the

interior of 6.
(iii) the matrix [a (seose; + .;90(890 - [L(f)]1/2))du

is nonsingular.

(iv) assumptions (1). (ii) and (iii) of Lemmas 3.2 and
3.3 hold for p = 60. Then. for fn in a Hellinger neighborhood
of f. Gn.1n satisfying (3.2) and AG(1) = O.

2(1n; an; 1“) - 2(1; c; 1)
3.22 = 7 ° ' t + °' - L f 1’2 d '1
( ) {I-m(890890 330(390 [ ( )1 )) u + Vn}

“ ' 1/2 1/2
'12. 890([L(fn)] - [L(feo] ) dun.

where all entries of the matrix vn converge to zero as
"1 1’2 — 1 1’2" ——3 o.
n 2
We only give the proof for Theorem 3.2. since the proof

for Theorem 3.1 is similar and simpler.

Proof of Theorem 3.2. First note that if assumptions

(1). (ii) and (iii) hold for 9 then they hold for all

0.
points in a neighborhood of 60. Let n be sufficiently large

so that enn is in that neighborhood. Since enn minimizes

I?“ 2 1’2 h b L 3 2 1
-w (5t ' 2 8t[L(fn)] ) dun. we ave y emma . . or

suffficiently large n.

n .
(3.23) [:3 se (s9 - [L(f)]1/2) dun = 0.
DD nn

33

Expanding s9 . s around 90. we can rewrite (3.23) as

nn arm
n . ..
o = 1:“ [390 + ( 390 + Rn) (ann - 90)]
~[s90+ (990+ rn)‘ (enn - 90) — [L(fn111’2] dun
n .
= II. s901s90 - [L(rn111’2) dun
+ [7n A (Q + r )t du (e - e )
-° 90 90 n n nn 0
D ..
+ II. ( 390+ Rn)(s90- [L(fn111’2) dun (enn — 90)

n O .
‘7 t
+ [_m ( 590+ Rn)(9nn- 90)(seo+ rn) dun (Gnn- 90).

An argument similar to that used to prove Lemma 3.2

shows that for Gn’ 7n satisfying (3.2). as "f;/2 - f1/2"2
n O O O O
1 t 1/2

-* ”- - (s s + s (8 -[L(f )] )) du -+
m 90 90 90 90 n n

(In (39 Set + 89 (s -[L(f)]1/2)) du. Thus the above equation
0 o o 90

can be written as

n .
0 = I. s90(s90 - [L(fn)]1’2) dun

t
O

+ vn}(enn _ 90)’

. 11:. ($9059 . '.90(.90 - [L(f)]1/2)) d”

where all entries of the matrix vn converge to zero as

llfrlll2 - f1/2ﬂ2 -—9 w. Therefore the result follows. 0

Notice that differentiating twice in the identity

I s: du E 1 yields f (seset + $989) du E O. which results

in the shorter expression (3.21).

4. ASYNPTOTIC DISTRIBUTIONS

When C is known. the MHD estimator of W(f; G) is defined

as

A

(4.1) 9 = W(fn; C):

1n
when G is unknown. the MHD estimator of W(f: G; T) is defined

as

2(1 ; E ; T)

(4.2) e n n

2n=

(recall T = max(Xl.°°'. Xn)). where CD is the product-limit
estimator as defined in (2.3). and fn is the kernel density

estimator as defined in (2.23):

-1 A
fn(x) - an I K((x—y1/an) an1y)

for some kernel function K and constant an ) 0. We now prove

the consistency of G . 92n'

In
Theorem 4.1. Suppose that
(i) assumptions (a) and (b) of Lemma 3.1 hold.

(ii) K is nonnegative. continuous and of bounded

variation on R. f K dA = 1. K(s) -—» o as s --4 - w.

(iii) an ——4 0 and n1/2an -—9 ”.

(iv) [3 (1/§-)dF < m.
Then
81“ -54 W(f; G) if W(f; G) is unique.
If. in addition. assumption (c) in Lemma 3.1 holds and G

is continuous at T. then

92n —5# W(f; G; T) if W(f; G; T) is unique.

34

35

Proof. By Wang(1987). sup IGn - CI = o (1). So Theorem
[0.7] p

2.3 and Lemma 3.1 give the result immediately. 0

To investigate the asymptotic distributions of eln and
92n’ we need to establish some convergence results of the
kernel density estimator fn and the smoothed product-limit

estimator F . Let ﬂ°ﬂ denote the Lm(R)- norm.
n no

Lemma 4.1. Suppopse that

(i) f'
[0. T]. llf’llm < 00 and llf”llw ( 0°.

-%; f exists and is absolutely continuous on

(11) 1 < a and [3 (1/6_) dF < m.
(iii) K is nonnegative. symmetric and absolutely

continuous. ] de =1. support of K C [-H. M] for some H < w.

(iv) a -—4 O. n1/2aﬁ

n
1/2 1+5
n a
n

-» O and for some 6 > 0.

a”

(v) U is bounded. V is right continuous and of bonded
variation on [0. T].
Then
T 1/2 T 1
(o n [Fn- F] U dG ‘F‘* (o P 0 dc.

T 1/2 T 1
)0 n [in - f] v di "F” '10 P_ dV.

Proof. Let

(4.3) ?n<x) = a;‘ I K((x-y)/an) dF<y>.
En(x) = I_: En dh.

Then we have

36

n1/2[Fn(x) - Fn(x)] = I P;(x - ant) K(t) dt.

n"2[§n<x) - F(x)] = I n"2

It follows from (2.13) that

[F(x - ant) - F(x)] K(t)dt.

T 1/2 " T 1
lo n [Fn(x) - Fn(x)] U d0 ‘3'” [0 P U dG.

By the integral form of the mean value theorem.
F(b) - F(a) = (b - a)f(a) + j: (b - u)f'(u) du

for a. b e R. Hence by the symmetry of X.

1/2 2
a
n

(4.4) sup nl/len(x) - F(x)l g uzuf'uon n -—4 o
R

1,2[Fn(x) - F(x)] U dG ——4 o.

as n -4 O. This gives us I; n
Thus we have
1' nl/2[F (x) - F(x)] U dG ———» [7 PIU dG
O n P O '
As for the second assertion of our lemma. we have
T 1/2 "
Io n [in - in] v dx
-1 1
IR V(x) an IR K((X'Y)/an) dpn(Y) dX
1
= IR [R V(y + ant) K(t) dt d Pn(y)
— I P;_(y - ant) dV(y) K(t) dc
‘F‘ — I P1 dV.
Hence it only remains to prove I; n1/2[fn(x) - f(x)] V dA

—4 0. Again. by the integral form of the mean value theorem.
f(b) - f(a) = (b - a)f’(a) + I: (b - u)f"(u) du

for a. b e R. So

1/2 2
a
n

___.)o.

(4.5) sup nl/2Ifn(x) - f(x)] 5 Hzllf"llon n
R

which implies I; n1/2[fn(x) - f(x)] V dk -—4 0. 0

Lemma 4.2. Suppose

(1) (ED) is a family of uniformly bounded functions on

37

[0. T). and for any x e [0.T). xn ——» x implies Bn(xn) .54

B(x) for a bounded function B.

(ii) G is continuous at T.

Then [3 3n d(§n - G) ‘F‘ 0.

Proof. S-W state in their Theorem 7.3.1 the uniform
strong consistency of Fn on [0. T). There is some difficulty
in their argument about the uniform convergence on [0. T).
But their proof shows that for any t 6 [0. T). Fn(t) is
strongly consistent for F(t). Thus similarly we have the

strong consistency of Gn(t) for any t 6 [0. T). This implies

that for w in a set of probability 1. Gn-l ——4 G-1 at

1

continuity points of G- in (O. G(T)). Since G"1 = T only at

t = 1. for a.e.t we have G—1(t) < T. hence for a.e.t.
BnOG (t) ——+ BOG (t). Now let pn — Gn(T). nn— G(T) and

p = G(T). By Wang(1987). pn -P9 p. And continuity of G at T

gives us "n -—» p. Therefore
T A
(0 BD d(Gn- G)

p a _ n _
= [0“ Bno cn 1(t) dt - [on Bnoc 1(t) dt ‘F* o. 0

Lemma 4.3. Suppose that

(i) f’ = —%; f exists and is absolutely continuous on

[0. T]. llf’llco < a. llf”llco < w and inf(f[f ) 0]} > O.
(11) r < a and [3 (1/6_) dF < w,
(iii) K is nonnegative. symmetric and absolutely

continuous. f de = 1. support of K C [-M. M] for some M < w.

(iv) a -—4 O. nllza: -—4 O and for some 6 > 0

n

38

1/2 1+6
n a
n

-—» w. Then

T 1/2 2

lo n (In - f) dx ”F” 0.

Proof. By (4.5). it suffices to show
T 1/2 ~ 2
10 n (fn - 1“) dx "5* 0. Let
1 l

Dn(x.t) — [Pn(x+ant) - Pn(x-ant)]. By symmetry of K and the

Cauchy-Schwartz inequality.

T 1/2 ~ 2
(o n (in - 1n) dA

= (n1/2a:)-1fg [1: Dn(x.t) K’(t)dt]2 dx
5 W(n1/2a:)-l [g [g D:(x.t) IK’(t)I dt dx
= W(n1/2a:)-l I: [g D:(x.t) dx IK’(t)I dt.

Writing Pn(s) = ZI(s)F(s) for s < T and using
(a+b)2 S 2a2 + 2b2. we have E Dn(x.t)2[ant < x < T - ant]
T - T - 2
- E[Zn(x+ant)F(x+ant) - Zn(x-ant)F(x-ant)] [ant< x < T - ant]
-2 T T 2
g 2 F (x+ant) E[Zn(x+ant) - Zn(x-ant)]
- - 2 T 2
+ 2 [F(x+ant) - F(x-ant)] E[Zn (x-ant)] .
By Lemma 2.5.
—2 T T 2
F (x+ant) E[Zn(x+ant) - Zn(x-ant)]

2

S 4[A(x+ant) - A(x-ant)] + 4 F (x+ant) A(x-ant)

-[F -2(x+ant) - F -2(x-ant)].

2) = b'2(a + b)(b — a)

Since for 0 < a < b. a2(a-2 -b-
< 2(b - a)/b. the second term on the RHS of the last
inequality does not exceed
- —l
8A(T)[F(x+ant) - F(x-ant)] F (x-ant).

Lemma 2.5 also gives us

- - 2 T 2

[F(x+ant) - F(x-ant)] E[Zn (x-ant)]

g 4 [F(x+ant) - F(x-ant)]2 F -2(x-ant) A(x—ant)

39

g 4 A(T) [F(x+ant) - F(x-ant)] i ‘1(x-ant).
2
Hence E Dn(x.t) [ant< x < T - ant]
S 8[A(x+ant) - A(x-ant)]
- -1
+ 24 A(T)[F(x+ant) - F(x—ant)] F (x-ant).
This and Holder's inequality gives us

E [T-ant (n (x 1))2‘26 dx
ant n '

2-2e dx

[R E (Dn(x.t)[ant< x < T - ant])

T-ant 2 l-e
S Ia t (E Dn(x.t) [antS x < T - antJ) dx
n
T-a t
S [a tn (8[A(x+ant) - A(x-ant)] + 24 A(T)
n

-[F(x+ant) - F(x-ant)] F -1(x-ant)}lme dx
T-ant 1_
(4.6) g T fa t [A(x+ant) - A(x-ant)]) 5 dx
n

+ w IT-a“t([r(x+a c) - F(x-a 1)] F ”1(x-a t)}l-ed
ant n n n x.

where W is independent of t.

Since
T-ant T-ant x+ant
Ia t [A(x+ant) - A(x-ant)]dx = Ia t Ix-a t dA(u) dx
n n n
T u+ant
g [o [u_ant dx dA(u) = 2 ant A(T) g 2M anA(T).

by Holder’s inequality the first term in (4.6) does not

exceed W T6{2M anA(T))l-e. The second term in (4.6) does not

exceed W{2M anﬂfﬂm}1-e inf(f[f > 0]) [3 Fe_1 dF. Thus the sum

in (4.6) can be written as Bnalllme for some bounded quantity

Bn independent of t. It follows that

T-a t
1/2 2 -1 M 2-2 .
(n an) [o la tn Dn(x.t) 6 dx 'K (t)l dt —F% O.
n

4O

T-a t

Hence by (2.19) (nl/za:)-1f: I n Dn(x.t)2 dx IK’(t)I dt

at
n
—Fa 0. Since we also have
“at 2 T 2
I0 Dn(x.t) dx + IT-ant Dn(x.t) dx
S 4 anH sup sup IPil.
n [0.T)

the result follows. D

The following theorems establish the asymptotic

distributions of our estimators 6 . 9 . Recall that from

2n
the beginning of Chapter 3. when X has a density f w.r.t the

ln

Lebesgue measure and Y has distribution G. (K. 6) has a
density L(f) w.r.t. “G' Since G remains unchanged throughout,
we will simply refer to the weak convergence as under L(f).
The theorems show that for a general density f ﬂ f9. the
asymptotic distributions are slightly different for the two

cases when G is known and when G is unknown. At f they

9
coincide. We will use the differentiability of Q as in

(3.21). (3.22). specifying ”n = uA . T = T. Thus
G

n
-1' -1/2
60 = V(f. G. T). 600 = V(f. G). Denote pl: 2 feofeo .
p = 2—1F F -1/2. p = p f-1/2. w = p F -1/2 and extend them
0 9 6 1 1 0 0
0 0
on R by defining them to be zero outside the support of f9

0
or the support of f.

Theorem 4.2. Assume (1) through (iv) of Theorem 3.1

hold. In addition. Suppose

41

. 1)
u < w. "1‘
9oo " eoo

1n£(£9 [19 > 0]) > o.
oo 00

(ii) ml is of bounded variation on [0. T].

(i) "f < m for i = 1.°°°. n and

(iii) f’ exists and is absolutely continuous on [0. T].
llf'llco ( w. ﬂf"ﬂm( w and inf(f[f > 0]} > 0.
(iv) 1 < m and f; (1/6_) dF < w.
(v) K is nonnegative. symmetric and absolutely
continuous. I de = 1. support of K C [—H. H] for some H < w.

(vi) a -—4 0. 111/2 a2

n n
1/2 1+e
a
n

——4 w, and for some 6 > O.

n ﬂ m.

1/2 A

Then. under L(f). n (91n - W(f: G)) converges weakly

to a normal distribution with mean zero and finite variance.

“2(31n - 1(1; G))

). where 2 is the Fisher

In particular. under L(fe). n
converges weakly to N(0. 2-1

information matrix:

2 = E —%§ (In L(19)(§. 5)) [‘%§ (In L(19)(§. 5))1‘.

Theorem 4.3. Assume (1) through (iv) of Theorem 3.2

hold. In addition. Suppose
(i) llf9 llon < w. ﬂféi)ﬂm ( m for i = 1.-°-. n and
O 0

1n1(19 [19 > 0]) > o.
o 0

(ii) ml is of bounded variation on [0. T].

(iii) f' exists and is absolutely continuous on [0. T].

uf'uco < o. u£"um< o and inf(f[f > 0]} > o,

42

(iv) TF S T < m. ; (1/§_) dF < w and G is continuous
9
0

at T.
(v) K is nonnegative. symmetric and absolutely

continuous. I de = 1. support of K C [-H. H] for some M < w.

(vi) an'-—4 0. 111/2 a: -—9 0 and for some 5 ) 0.

1/2 1+5
n a
n

Then. under L(f). n

ﬂ a.

1[2(9211 - 9(f; G; T)) converges

weakly to a normal distribution with mean zero and finite

variance.
1/2 A
In particular. under L(fe). n (92n - W(f; G; T))
converges weakly to N (0. 2-1).

We just prove Theorem 4.3. The proof for Theorem 4.2 is

similar.

Proof of Theorem 4.3. Throughout the proof. we adopt the

special construction of Xn s and 6n's as in Theorem 2.1. we

 

 

i i
will need to use the algebraic identity(for a. b > O)
(4.7) b1/2 _ 81/2
1/2 1/2
1 1 b - a
= -————— (b - a) - (b - a)
231/2 2a1/2 b1/2+ a1/2
=__1_(b_.,__1_ (b-a12 _
231/2 2a1/2 [bl/2+ a1/2]2

Under our assumptions the expansion (3.22). with Gn' 1n.

1 replaced by Gn' T. T respectively. is valid. where all
entries of the matrix vn converge to zero in probability.

Since the coefficient of the integral on the right hand side

43

of(3.22) converges to a nonrandom limit. we only have to deal

with the integral in (3.22).

Note that IF9(1)(x)I = I]: feé1)(s) dsI
o

_ o°'(1) '(1) '-
— lefeo /feol feodh S {suplf60 (s)/feo|} Feo(x). and since

TF S T the mean value theorem gives
6
0

F9 (x) S supf9 inf(f[f>0])F. Hence wo and F6 /F are
0 O

O

bounded. For the sake of convenience we will use W to denote
a bound for all bounded quantities in our argument. Notice

that as in (3.23). we have

I; $9011L119011’2 — [L(f)]l/Zldu = 0. Thus

nl’zlg $9011L11n)11’2 — [1(190)]1’2) d

= _ IT p f1/2 _ fégzln nn112(6 _ G) dk
+ f3 p0 [g 1/2 _ fo1/2 ] d[n1/2(5n _ G)]
_ n1/2 I; "1[f1/2 _ 0132] c dx
_ n1/2 1; po [1 1/2 _ fag/2] dG
_ 13p ”1 [f1/2 _ f1/2] n112(6n _ G) dk
+ I g pon nl’ztﬁ 1’2-1-‘21 d(cn - G)
+ I 3 Po“ “1,2[Fn 1/2_ - 1/2] dG
+ 1 3p 111/213“2 - 1 ’2] 6 dh
(4.8) = S + S + R + R + R + R + S + S

l 2 1 2 3 4 3 4'

We can write

s1 = [T B(x) Po(x) dx

1/2 _

where B(x) = - pl[f fl/z] is bounded.
90

44

I13 B‘1’(x)[P§(x) - P°<x>1 dxl

3“) | sup ((pg - p0] 1 1"1| 1; 1 “-131.
O.T]

for o < a < 1/2. Since I; F “‘1dh g I; F “’1dF

Ssupl
R

°(inf f(x)[f(x) ) 0]).-1 < w. by Theorem 2.2 and the fact that
R

T -» T w.p.1. we have

0

(4.9) s ‘F‘ I; B P dx.

1
Next. integration by parts gives
1/2 _ F 61/2 0

T O -
(4.10) 32 = — [o Pn_ A dh + (.00 [F ]Pn }(T) .
where
d - 1/2 - 1/2
(4.11) A = a;(po[F - FeO ]}
_ 2‘1[19 + 2 1 F9 19’3/2 f 1’2 19
O O O O
_ F 1/2 f9-1/2 {9 2—1 F(ﬁo F)—1/2 f].
O O 0
When T = T. A is bounded; when T < T. on [0. T ]
F F F
9 9 6
0 o O
- -1/2 — - 0
A S W F9 and F > F(TF ) > 0. Hence by Theorem 2.2 Pn
9o
converges to PO uniformly on [0. TF ]. Therefore in both
0
T 0 T 0
cases we have ]0 P _ A dh -—4 IO P_ A dx. The remainder term
- 1
in (4.10) {potF1’2 - 9’2]P°}(T) 1 W(T)<P° F)(T) = op(1) by
(2.19). Hence
(4.12) $2-——» - I; p? A dx.
P
By (4.7). $3 = 831: S32. where
331 = 2'1 [0 on1/2[Fn - F] d6.
-1 . n1/2
S32= 2 10 Mo [Fn ’ F]
.(Fnl/Z _ V12)(Fn1/2+ F1/2)—1dG.

45

-1 T 1
So $31T-2 Iowa?

w.p.1 . The integrand of $321) is dominated by that of $3gi)

dG by Lemma 4.1 and the fact T -4 T

in absolute value. and for x e [0.T).

(Full2 - F1/2)(Fn1/2+ 1.71/2).1 -9 0. Also AG(T) = 0. Hence
Lemma 2.1 (1) gives 832 -§» 0. Thus
(4.13) $3 ——4 - 2'11 ¢OP1 dG.

By (4.7) again S4 = S41 + S42. where

-1 T 1/2 -
2 lo eln [1n - 1] 6 dx.

_ 2-113 915 (1;/2+ f1/2)-2 n1/2

S41

2
8 [1n - 1] dA.

42 =
T 1/2 -
From (2.25). 1D - 1 1s bounded. So [T n (1n - 1) e16 dh

S Wn1/2(T - T) = op(1). Hence S41 -Fe - 2-lf P: d(¢1G) by

Lemma 4.1. Since IS£;)I S W]; n1/2(fn - f)2 6 d). by Lemma
4.3 sgé) -F* 0. Thus we have
-1 1 _

(4.14) s4 —F—» - 2 I P d(¢1G).

As for the remainder terms R1. R2. R3 and R4. The
results
(4.15) R1 = op(l). R2 = op(l)
follow since n1/2(T - 1

T) S (inf f(X)[f(X) > 0])—
R

~n1’2(F(r-) - F(T))

R3 = I; {p1(x)[(rn1’2(x) + 11’2cx)1’

-(1n - f)(x)Pg(x) dx.

1 . W it
op( ) r e

1}

then by (2.19). (2.25). (2.26) and the fact that the quantity
in ( } is bounded. we have

(4.16) R3 ‘F* 0.

Now look at

T -1/
R4 = 10 ”0(Fn

.n1/2[F

F1/2)-1(x)

- F](x) MEn - G)(x)

2+

n

46

(4.17) =1 an d(cn — G). say.

1/2

By (4. 4) and the fact that |p31)(ii’2+ P )-1

(Kll
(i) -l/2]
S Ipo [F l(x)I S W. the integrand is uniformly bounded

in probability. Also. by the uniform convergence of P; to P1

on each compact subinterval of [0. T). and continuity of F.

F90 and F90. we have for xn-—# x 6 (O.T). Bn(xn) -§4

4—1 F (F F)-1/2(x) P1(x). Hence by Lemma 4.2.
9 9
0 O
(4.18) R4 -F% 0
Therefore. we have proved that for A defined in (4.11).

n1’2lg $9011Lcrn111’2 - [L119111’21 dun

1/2 _ 1/2

(4.19) ———e -]5 p1[f ] Po dh - [T P_ 0A dh

P
T -l l T -1 1 -
— 10 2 To P dG - (o 2 P d(¢1G).
where the limit has a normal distribution with mean zero and

finite variance. Thus n1/2(32n - W(f; G: T)) also converges

weakly to a normal distribution with mean zero and finite
variance. The variance can be computed using (2.5). In

particular when f = f6 for some 9. then the limit becomes

-1 _ 1 -
- I; 2- so P1 dG - [72 P d(¢1G)

= [3 h1 d(P1/P).
where

-l T -1 T — -
hl(x) = 2 [x e0 P9 dG + 2 [x Fe d(¢1G)

-1 T -1 $ T - -1 - -
= 2 [Ix 2 F9 dG + [x elc are] — 2 ¢1F9G(x)
- 4‘1(;T P as + [T 1 a] - 2“. P C(x)
' x 9 x 9 l 9

—1 a - - -1 — -
= 4 -53— (FOG) - 2 W1 FGG

47

-1 a - -

Using the quardratic variation process of the martingale PllF
from Section 2. we have

Cov([5 hl d(Pl/ F))

2‘?

-1 T L _ _
16 [0 (Pa - 2¢1F9) c

o(ﬁ92 é)‘ldP

- t
a ’ 2"’1Fe)

9
_ -1 T t _ t _ t t —
The (i.j) entry of I; wowoé dF6 can be written as

4‘1]T F(i) F(J) F '2 é dF

) a d

'l'il

= 44101: V3 )

4'1[;T a d( l P

”1|

"1|

- 1; 6 F9 d(

4 1(1’ F(‘)

(J
a
(1
e
(1
e

:l
'11!-
"I1l GA GA
L4.
v
V
I._.J

(J )-
6 9
1) .11) . F11) .(1),d.]

Fj
+ [3 EF - M( F3
(i.J)6 entry of {[0 woeo FBdG + I0 mo of dF9

t
+ I; oloo G dFG }.

DI

Thus

Cov (I; hl d(P1/ F))

-l T t - T t -
[[0 *1’1 C “9+ [0 fofo Fe dc]

= 16’1 2.

Consequently the covariance of the limit of

n1/2 “

(92n - ¢(£; c; 1)) is [4‘121‘1(16’12)([4‘12]‘1

)t=2. [I

When the Xi’s are distributed according to the model f9.

the asymptotic covariance matrix of n1/2[92n - W(f; G: T)] is

48

the reciprocal of the Fisher information matrix. This fact
reflects a certain optimality property of the estimator 62n
For a 6 L2(R). let K(d. a. G) denote the collection of all

sequences of densities {dn} such that

(4.20) "n1/2(d:/2 - d1/2) - an2 -——» o as n -—» m.
Note that (4.20) implies a l d1/2. as is easily shown.
It also implies
1/2

(4.21) "n1/2([L(dn)] - [L(d)]l/z) - auc -—» o as n ——» m
where B(x.0) = [f: a2 dkjllz. B(x.1) = a(x). and

p 1 [L(d)]llz. Let K(d. G) denote the union of K(d. a. G) for
all a E L2(R). and let {an} be a sequence of estimators of
the functional W(d; G; 1) based on (i1. 6i). i = 1.‘°°. n. We
say that {an} is regular at d if for {dn} 6 K(d. G) and

X °° Xn independently and identically distributed

1.. O
according to dn’ n1/2[9n - W(dn: G; 7)] converges weakly to a
distribution F(d; 1; G) that does not depend upon the

particular sequence {dn). The following theorem extends

Theorem 5 of Beran(a. 1977) to the censored data case.

Theorem 4.4. Suppose W(°; G; T) is differentiable at d
with derivative w. in the sense that for (1D in a Hellinger
neighborhood of d.

W(dn; G; T) - W(d: G; T)
= I. w{[L(dn)J"2 - [L(d)]

+ "[L(dn)]l/2 - [L(d)]llzuc un.

where each component of un -—» O as Ildrll/2 - d1/2H2 -» 0. Let

1/2
} dnG

(an) be a sequence of estimators of W(°; G; T) which is

49

regular at d. Then F(d; T; G) can be represented as the
convolution of a N(0. 4-lfzw w wt duG) distribution with a

distribution T1(d: T; G).

Proof. Let

n [L(dn)1"2(2.. 6.)
(4.22) Ln = 2 a 1,2 .
i=1 [L(d)] (2.. 6.)

 

then we have for dn’ d in (4.20). as n -—+ m.

(4.23) PL(d)[ILn - 2 n‘l’

2 § 512.. 6.) [L(d)]‘1’212..6.)
i=1
+ 2 1:» 02 dual > e] -—+ 0.
for any 5 > 0. This can be easily deduced from LeCam’s second
lemma and is similar to Lemma 1 of Wellner(1982). Now the
rest is almost the same as in Theorem 6 in Beran(a. 1977).
For any vector v 6 RP. the differentiability of 2(0; G; 1) at
d and (4.20) give
(4.24) v‘[n1/2(w(dn; c; 1) - W(d: c; 4))]
‘---T I1“ (vtvlﬂ duc.
Thus we can proceed almost exactly as in Theorem 6 of

Beran(1977.a): the choice 5 = h vtw. h E R arbitrary. yields

that along a subsequence. the random vectors
1/2 “

{v‘tn (92D) - w(d: 0: 11)].
n
n'l’zizlvtw(21.6.)[L(d)1'1’2(Z..6.)}

converge weakly under L(d) to (vtS. vtN} where
N = N(0. II“ wwt due) and S depend only on d. T. G and not
on {dn}. Let w denotes the characteristic function of the

limit (vtS. vtN). Then at the end we get

50

(4.25) ¢(s.0) = 9(a. -2'1s)
°exp [-8-1([:on vtwwtv duc)s2].
The first factor is the characteristic function of
vt(S - 2-1N). the second factor is the characteristic
function of 4-1vtN. Thus the theorem follows. B
When the conclusions of of Theorem.4.2. 4.3 hold. the

sequences of estimators {91n)' (Ozn) are regular at f9. In

fact. under L(fe).

(4.26) n1’2[82n - 2(19; c; 1)]
- 4"! P9 P9“1 nI/Zn?n - F9] dG
- 4'11 19 {6‘1 a d{n1/2[Fn- P9]}
= op(l).

as in the proof of Theorem.4.3. Since (4.24) gives contiguity

of {L(dn)} to (L(fe)). (4.26) is also true under L(dn). Thus

1 1/2 A
= n

(F - D ) to P1 under
nn n n

convergence in D[0.T] of P
L(dn) and the differentiability of W(°; G; T) will give the

regularity of(02n). Similarly we can obtain the regularity of
{31n}' Since with probability 1 P1 sits in a separable subset

of D2[0. T]. by Theorem 5.3 in Pollard(1984) the necessary

and sufficient condition for the convergence of Pin to P1 are

the finite dimensional convergence and "small oscillation"
condition. Recall the martingale representation of P;/F under

L(f) as in Theorem 7.2.1 and Theorem 7.5.1 of S-W. We have

1

similar representation for Pnn

/ﬁ under L(d ). Thus
n n

convergence of Pin on [0.n] for any n ( TF can be obtained
9

by. say. Theorem 8.13 of Pollard(1984). This gives finite

51

dimensional convergence of Phn' Since small oscillation

property is reserved under contiguity. we obtain the

convergence of Phn' Therefore 02n is a distinguished regular
estimator of W(f: G: T) for having the smallest asymptotic

variance when the parametric model is true.

5. ROBUSTNESS PROPERTIES

Just as in the i.i.d. complete data case(i.e. G is
degenerate at m) discussed in Beran (b.1977). the minimum
Hellinger distance estimation procedure in the random
censorship model posesses certain degree of robustness. In
one way this is reflected in the continuity of W(-; G);
furthermore. W(fn: G) proves to be optimally insensitive to
perturbations of its argument in a minimax sense. Consider
the class of functionals {U} such that for p a p-dimensional
vector with components p(1) in L2(u).

(5.1) U(f9) a 9.

0(1) - e = I p<tL(f)1"2 — [L(f.)1"2) du

+ o(ﬂ[L(f)]1/2- [L(f9)]l/2ﬂc)un.
where each component of un -—» 0 as [L(f)]l/2 -—4 [L(f9)]12
in L2(u). We can assume for each 1. p(1) 1 36 in L2(u). since
otherwise we can replace p by

F = P - {I p 89 du}so.

with the difference caused by the replacement being absorbed
into the remainder term in (5.1). Also for each unit vector

e in Rp. scalar a g 0. we have

i

e = a-l[U(f - U(fe)] -4 [I psgi) dun] as a —4 0. So

i 9+aei)

I p s9 du = I. the identity matrix. When Theorem 3.1 applies.
{¢(‘: 0)} belongs to this class. One may be interested in

seeing which functional in the class just described is least

52

53

affected by infinitesimal perturbations of £9. at least
asymptotically. To this end. let us examine the behavior of
ct[U(f) - 9]. for every constant vector c € Rp. By
projection. [L(f)]ll2 can be represented as

[L(f)]ll2 = cos 1 se + sin 1 6.
where 1 e [0,2'11 ]. "anG = 1. l 5 se du = 0. Then
(5.2) U(f) - 0 = 1 I p 6 du + 0(7) as T -4 w.
Thus for small 1. or equivatently. small

"[L(f )]1/2 - seﬂ . the behavior of Ict[U(f) - 9]|

G
is primarily determined by If p 6 dul = Lc(p. 6). Thus the
problem becomes: which p minimizes for each c 6 RD the
deviation Lc' against all possible direction 6? It turns out

that W(°; G) corresponds to the optimal choice of p. as the

following result shows.

Theorem 5.1. Suppose sa satisfies the conditions in

Lemma 3.2. For each i. p(i) € L2(u). f p(1)se du = 0.

f p setd u = I. nanG = 1, f 5 59 du = 0.

Then for every v 6 RP.

(5.3) min max L (p. 6) = max min L (p. 6)
c c
p 5 6 p
0 0
= Lc( p o 5 )D
where
0 ' ' t -l'
(5.4) p = [I sese du] 89.
60 = Ilctpllc-1 ct po

Proof. The proof is just as in Beran(b.1977). It

54

suffices to show max min Lc 2 min max Lc' since the reverse
6 p p 6

inequality is trivial.
By the Cauchy - Schwartz inequality.

12
(5.5) mzx Lc (p. 6) = "c pllG .

-l

1: ° t
G c p. Since I p 56

at 5 = “ctpﬂ du = I and

[ p(i) s9 du = 0 for each i = 1.°°-. p. we can decompose

p = A 56 + a. where I at 89 dp = 0 and f 6 s6 du = 0. Then
't "t '1
I = I p 59 dp implies A = [f ses9 du] . So
(5-6) p = 90 + a. I at no du = 0.
and therefore
min max L (p. 6) = min "ctp + cta" = "ctpoﬂ .
c G G
p 6 a
0n the other hand.
max min Lc(p. 6) 2 min Lc(p. 60)
6 p p
= llctpollcu1 min Ict [p (po)tdu cl
p
= "ctpoﬂG-l. by (5.6).

Hence the theorem follows. D

10.

11.

12.

13.

14.

15.

Beran. R. J.
Statist. 5 4

Beran. R. J.

REFERENCES

(a.1977) Robust location estimates. Ann.

31-444

(b.1977) Minimum Hellinger distance
Ann. Statist. 5

estimates for parametric models.

445-463

Beran. R. J.
parametric mod

Billingsley. P.

Measures. Wil

(1981) Efficient robust estimation for

els. Z. Wahr.

ey.

Brown. L. D. and Purves. F.

selections of

Fabian. V. and

extrema. Ann

Hannan. J. (1985)

55

(1973)

91-108

Statist. 3

(1968) Weak Convergence of Probability

Measurable

902-912

Introduction to

Probability and Mathematical Statistics.

Gill. R. (198

Wiley.

0) Censoring and Stochastic Integrals.
124 Mathematisch Centrum,

Mathematical Centre Tracts

Amsterdam.

Gill. R. (198

product-limit estimator on the whole line.

Statist. 11

Hajek. J. (19
admissibility

175—194.

Hewitt. E. and
Analysis. Spr

Kaplan. E. and

3) Large sample behavior of the

49-58.

Ann.

72) Locally asymptotic minimax and

Sixth Berkeley Symposium
on Mathematical Statistics and Probability. Vol. 1

in estimation.

Stromberg. K.
inger-Verlag.

Meier. P. (1

(1965) Real and Abstract
New York.
958) Nonparametric

estimation from incomplete observations.
Statist. Assoc. 53 457-481

Millar. P.W.

Asymptotic Statistical Theory.

Mathematics.
Pollard. D. (
Processes. Sp

Rudin. W. (19
edition. McGr

Shorack. G. R.

l. Asst.

(1983) The Minimax Principle in

Lecture Notes in

976 75-265 Springer-Verlag. New York.

1984) Convergence of Stochastic
New York.

ringer-Verlag.

74) Real and Complex Analysis. second

aw-Hill.

and Wellner.

55

J. A.

(1986)

Empirical

16.

17.

56

Processes with Applications to Statistics. Wiley.

Wang. J. G. (1987) A note on the uniform consistency
of the Kaplan-Meier estimator. Ann. Statist. 15
1313-1316

Wellner. J. A. (1982) Asymptotic optimality of the
product limit estimator. Ann. Statist. 10 595-602

”TITIIMHJJIMWﬂﬁjlltfyﬂﬁt'lﬂiﬂjﬂjfﬂﬁm'“