
 

 

 

 

 

 

 

This is to certify that the

dissertation entitled

COMPOUND ESTIMATION OF PARAMETERS OF RIGHT CENSORED
EXPONENTIAL FAMILIES

presented by

Jagadish Purushotham Gogate

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in STATISTICS

 

 


Major professor

Date OI 7/03 71/8?

MSU is an Affirmative Action/Equal Opportunity Institution 0-12771


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

COMPOUND ESTIMATION OF PARAMETERS OF RIGHT CENSORED
EXPONENTIAL FAMILIES

By

Jagadish Purushotham Gogate

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1989


ABSTRACT

COMPOUND ESTIMATION OF PARAMETERS OF RIGHT CENSORED
EXPONENTIAL FAMILIES
By
Jagadish Purushotham Gogate

Consider the usual random censoring problem in which X and Y are two independent random variables with $X \sim F_\theta$ for a $\theta \in \Theta \subset \mathbb{R}$ and $Y \sim G$, and one is required to estimate $\theta$ based not on X and Y but on their identified minimum ($Z = X \wedge Y$, $\Delta$ = the indicator of $[X \le Y]$) under squared error loss. An estimator $\psi$ then incurs a risk $R(\psi,\theta) = E_\theta(\psi-\theta)^2$, where $E_\theta$ is the expectation induced by $(Z,\Delta)$. In this thesis we investigate the set and the sequence compound versions of this problem.

In the set as well as the sequence compound versions, the above mentioned problem is assumed to occur repeatedly and independently, say n times, and it is required to estimate $\underline{\theta}_n = (\theta_1,\theta_2,\ldots,\theta_n)$ based on $\underline{Z}_n = (Z_1,Z_2,\ldots,Z_n)$ and $\underline{\Delta}_n = (\Delta_1,\Delta_2,\ldots,\Delta_n)$. A set compound estimator (SCE) $\underline{\psi} = (\psi_1,\psi_2,\ldots,\psi_n)$ is such that for each i = 1,2,...,n, $\psi_i$ is an estimator of $\theta_i$ and is allowed to depend on $\underline{Z}_n$ and $\underline{\Delta}_n$, while for a sequence compound estimator (SQCE) $\underline{\tilde\psi} = (\tilde\psi_1,\tilde\psi_2,\ldots,\tilde\psi_n)$ each $\tilde\psi_i$ is allowed to depend only on $\underline{Z}_i$ and $\underline{\Delta}_i$. The risk of a compound estimator t is taken to be the average of the component risks, $R_n(t,\underline\theta) = n^{-1}\sum_{i=1}^{n} R(t_i,\theta_i)$. The modified regret $D_n(t,\underline\theta) = R_n(t,\underline\theta) - R(\omega_n)$, with $\omega_n$ the empirical distribution of $\underline\theta_n$ and R the Bayes risk in the component problem, has been a standard for evaluating compound estimators t.

The results obtained in this thesis hold uniformly in $\underline\theta \in \Theta^\infty$. When $F_\theta$ is exponential with density $f_\theta(x) = \theta e^{-\theta x}$ for x > 0 and $\theta \in [\alpha,\beta] \subset (0,\infty)$ and G is known, we show that $D_n(\underline{\hat\psi},\underline\theta) = O(n^{-\gamma/5})$ for $0<\gamma<1$ for SCE's $\underline{\hat\psi}$ based on divided difference estimators of $\bar f$ (the average of densities) and its derivative $\bar f^{(1)}$. Let r be an integer greater than 1. For SCE's $\underline{\hat\psi}$ based on (Singh (1974)) kernel estimators of $\bar f$ and $\bar f^{(1)}$, we show that $D_n(\underline{\hat\psi},\underline\theta) = O(n^{-\gamma(r-1)/(1+2r)})$ for $0<\gamma<1$.

In the more general situation when G is unknown and $f_\theta$ belongs to a standard exponential family with $\theta \in [\alpha,\beta] \subset \mathbb{R}$, we define SCE's $\underline{\hat\psi}$ based on kernel estimators of $\bar f$ and $\bar f^{(1)}$, based in turn on the Product Limit (PL) estimator of the average distribution function $\bar F$, and show that $D_n(\underline{\hat\psi},\underline\theta) = o(1)$ for the SCE $\underline{\hat\psi}$. To show this, we first derive $L_1$ bounds for the uniform maximal deviations of the PL estimator from $\bar F$ on intervals $(-\infty,z]$ for each $z \in \mathbb{R}$ using the exponential bounds in Foldes and Rejto (1981) and Singh (1975). These $L_1$ bounds themselves are of independent interest.

We show that similar results hold for SQCE's $\underline{\tilde\psi}$ as well which, in fact, are obtained as corollaries to those of SCE's.

TO MY PARENTS


ACKNOWLEDGMENTS

I wish to express my deep gratitude to Professor James Hannan for his
guidance during the preparation of this work. Without his patience,
encouragement and many helpful suggestions this dissertation would not have
come into existence.

Thanks are also due to Professor Dennis Gilliland for being on my
guidance committee and also for having introduced me to Statistical Consulting
thereby opening up new opportunities. I would also like to thank the other
members of my guidance committee, Professors Hira Koul, Joseph Gardiner
and Habib Salehi.

Financial support provided by the Department of Statistics and
Probability, Colleges of Nursing and Osteopathic Medicine made my graduate
studies at Michigan State possible.

I am grateful to my teachers at University of Mysore, India for initial
encouragement to pursue my graduate work in Statistics.

I also take this opportunity to express my gratitude to Mrs. Lalitha V.
Tirtha (Ex-Professor of Education, Bangalore University) for providing me
ﬁnancial support during my difﬁcult student days in India. But for her
encouragement and timely support it would have been very difﬁcult for me to
achieve my academic goals.

Finally, I wish to thank my wife Lakshmi. The supportive and
encouraging spirit that she has maintained over the past four years has been a

major ingredient of my success as a graduate student.

TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION
1.0 Notations and Conventions
1.1 The Component Problem
1.2 The Set Compound Problem
1.3 The Sequence Compound Problem
1.4 Summary of the Present Work

CHAPTER 2. SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL DISTRIBUTIONS BASED ON CENSORING DISTRIBUTION
2.0 Introduction
2.1 The Component Problem
2.2 Bayes Estimates versus $\omega_n$ and $\omega_{nj}$
2.3 An Upper Bound for the Modified Regret
2.4 Procedures Based on Divided Difference Estimators
2.5 Asymptotic Optimality of the Procedures Based on Divided Difference Estimators
2.6 Best Possible Rates for the Procedures Based on Divided Difference Estimator in the Identical Component Case
2.7 Procedures Based on Kernel Estimators
2.8 A Lower Bound and Exact Rates of Convergence for the Procedures Based on Kernel Estimators

CHAPTER 3. SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL FAMILIES BASED ON PRODUCT LIMIT ESTIMATOR
3.0 Introduction
3.1 A Brief Review of Density Estimation in the Presence of Censoring
3.2 Compound Estimators of $\underline\theta$ Based on PL Estimator of $\bar F_j$
3.3 Asymptotic Optimality of $\underline{\hat\psi}$
3.4 Examples and Remarks

CHAPTER 4. THE SEQUENCE COMPOUND ESTIMATION
4.0 Introduction
4.1 A Useful Upper Bound for the Modified Regret
4.2 Estimators Based on Divided Difference Estimators
4.3 Estimators Based on Kernel Estimators
4.4 Estimators Based on Product Limit Estimator of $\bar F_j$

APPENDIX
A.1 The Joint Distribution of the Identified Minimum
A.2 Bayes Estimates in the Squared Error Loss Estimation and Stability with Respect to Small Perturbations
A.3 The Product Limit Estimator of Average Distribution Function; an $L_1$ Bound for the Maximal Deviations on Intervals $(-\infty,z]$ for $z \in \mathbb{R}$
A.4 On the Lower Bound for the Modified Regret in Theorem 2.6 of Singh (1974)
A.5 On the Bound for the Expectation of Weighted Empiricals Based on Independent Random Variables
A.6 Properties of the Standard Exponential Family

BIBLIOGRAPHY

CHAPTER 1
INTRODUCTION

1.0 Notations and Conventions.

We begin with some notational conventions used throughout the body of this thesis. For a positive integer n, an n-tuple $(x_1, x_2, \ldots, x_n)$ is denoted by $\underline{x}_n$ (the subscript n will not be exhibited if it is clear from the context). For a sequence of probability measures $P_1, P_2, P_3, \ldots$, we denote their measure theoretic product $\times_{j=1}^{\infty} P_j$ by $\underline{P}$ and $\times_{j=1}^{n} P_j$ by $\underline{P}_n$. For a probability P the corresponding expectation is denoted by E.

To reduce the complexity of notations, we ultimately view random variables as coordinate functions on their joint range while abusing notation by retaining their original (capital letter) names; we will do this throughout without further mention. To help clarify various iterated integrals, dummy variables will frequently be displayed in integrals.

The indicator function of a set A is denoted by [A]. R stands for the set of real numbers. The notation T ~ K means K is the distribution function of T. Often a distribution function and its corresponding expectation are denoted by the same letter.

For real numbers $a_1, a_2, \ldots, a_n$, let $\bar a = n^{-1}\sum_{k=1}^{n} a_k$ and $\bar a_j = n^{-1}\sum_{k \ne j} a_k$. When $a_k = 1 - b_k$, we abbreviate $\bar a_j$ to $1 - \bar b_j$ as a typographical convenience. By $h^{(\nu)}$ we mean the $\nu$-th derivative of a function h. The evaluation of a function at its argument is omitted or exhibited as convenient. The Lebesgue inf of the restriction of a function h to the interval $(x, x+c]$ for some c > 0 is denoted by $h_c(x)$.
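
The delete average keeps the normalizing factor n even though only n-1 terms are summed. The following minimal Python sketch illustrates the two averages; the function names and the zero-based index `j` are illustrative choices, not notation from the thesis.

```python
def average(a):
    """Ordinary average: n^{-1} * sum over all k of a_k."""
    return sum(a) / len(a)


def delete_average(a, j):
    """Delete average: drop a_j (zero-based index here) but still divide by n."""
    n = len(a)
    return (sum(a) - a[j]) / n


a = [1.0, 2.0, 3.0, 6.0]
print(average(a))            # 3.0
print(delete_average(a, 3))  # 1.5, i.e. (1 + 2 + 3) / 4
```

Because the divisor stays n, the delete average of a constant sequence is (n-1)/n times the constant rather than the constant itself.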

1.1. The Component Problem.

The component problem that we consider throughout the body of this thesis is the well-known random censoring problem. Let $\Theta$ be a subset of R indexing a family of probability measures. Let X and Y be two random variables such that, under $\theta$, $X \sim F_\theta$ and independent $Y \sim G$. Let $f_\theta$ denote a density of $F_\theta$ with respect to a measure $\mu$. Let $Z = X \wedge Y$ and $\Delta = [X \le Y]$. Let $P_\theta$ denote the joint distribution of Z and $\Delta$ determined by $F_\theta$ and G.

The decision problem considered is the squared error loss estimation of $\theta$ based on Z and $\Delta$. For this problem, the risk at $\theta$ incurred by an estimator $\psi$ is $R(\psi,\theta) = E_\theta(\psi-\theta)^2$. For a prior $\omega$ on $\Theta$, let $\Psi_\omega$ denote the Bayes estimate versus $\omega$ (see Section A.2 for details) and $R(\omega)$ denote the minimum Bayes risk.
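
To make the observation scheme concrete, here is a minimal simulation sketch of the identified minimum. It assumes, purely for illustration, that both X and Y are exponential (the component problem itself allows a general $F_\theta$ and G); the function name `draw_identified_minimum` and the rates 2.0 and 1.0 are hypothetical.

```python
import random


def draw_identified_minimum(theta, censor_rate, rng):
    """Draw one (Z, Delta) with X ~ Exp(theta) and Y ~ Exp(censor_rate), independent.

    The exponential choice for Y is an assumption of this sketch only.
    """
    x = rng.expovariate(theta)
    y = rng.expovariate(censor_rate)
    z = min(x, y)                # Z = X ^ Y
    delta = 1 if x <= y else 0   # Delta = [X <= Y]; 1 means uncensored
    return z, delta


rng = random.Random(0)
sample = [draw_identified_minimum(2.0, 1.0, rng) for _ in range(20000)]
uncensored_fraction = sum(d for _, d in sample) / len(sample)
# For exponential X and Y, P(X <= Y) = theta / (theta + censor_rate) = 2/3 here.
print(round(uncensored_fraction, 2))
```

Only the pair (Z, Delta) is retained; X and Y themselves are never available to the estimator.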
1.2. The Set Compound Problem.

Suppose the decision problem described in Section 1.1 occurs repeatedly and independently, say n times. In the set compound version one allows the use of observations from all the problems in each of the decisions. Thus, in a set compound estimator $\underline\psi = (\psi_1, \psi_2, \ldots, \psi_n)$ of $\underline\theta = (\theta_1, \theta_2, \ldots, \theta_n) \in \Theta^n$, $\psi_j$ is an estimator of $\theta_j$ based on $\underline{Z}_n$ and $\underline{\Delta}_n$ with joint distribution $\underline{P}_n = \times_{j=1}^{n} P_j$ (the subscript $\theta_j$ is abbreviated to j, here and throughout). The compound risk is taken to be the average of the component risks incurred by the use of each $\psi_j$:

(1) $R_n(\underline\psi,\underline\theta) = n^{-1}\sum_{j=1}^{n} \underline{E}_n(\psi_j - \theta_j)^2$.

Let $\omega_n$ denote the empiric distribution of $\theta_1, \theta_2, \ldots, \theta_n$. For a simple symmetric estimator $\underline\psi$, i.e., $\psi_j(\underline{Z}_n, \underline{\Delta}_n) = \psi(Z_j, \Delta_j)$ $\forall\ 1 \le j \le n$ for some component estimator $\psi$, the compound risk is the component Bayes risk of $\psi$ against the prior $\omega_n$ and hence at least $R(\omega_n)$. The excess

(2) $D_n(\underline\psi,\underline\theta) = R_n(\underline\psi,\underline\theta) - R(\omega_n)$

is called the modified regret of $\underline\psi$ at $\underline\theta$ and has been a standard (see, e.g., Section 0.2 in Singh (1974) and the references mentioned there) in evaluating compound procedures. Compound procedures which attain risks asymptotically no more than $R(\omega_n)$ are of interest and a compound estimator $\underline\psi$ is evaluated on the basis of how fast it achieves $R(\omega_n)$. We say that a compound estimator $\underline\psi$ is asymptotically optimal (a.o.) (with rate $n^c$ with c > 0) if

(3) $\sup_{\underline\theta \in \Theta^\infty} D_n(\underline\psi,\underline\theta) = o(1)\ \ (O(n^{-c}))$.

Often set compound estimators are of delete nature. Typically in a delete compound estimator $\underline\psi = (\psi_1, \psi_2, \ldots, \psi_n)$ each $\psi_j$ is a function of an estimate of the Bayes rule in the j-th component using the other observations and this function is evaluated at the j-th observation. This creates some sort of independence and thereby simplifies some of the mathematical arguments in obtaining the asymptotic optimality of such estimators as we shall see later in this thesis. With this in mind, we next obtain a simple bound for the modified regret (under the squared error loss).

Let $\omega_{nj}$ denote the empiric distribution of $\theta_1, \theta_2, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_n$ (with the "normalizing factor" n instead of n-1). Then from (A.2.5) (with $\varphi$ there, in our case, the identity function on R) it can be shown that $\Psi_{\omega_n}$ and $\Psi_{\omega_{nj}}$ take values in $\Theta$ if $\Theta$ is convex.

Since

$R(\omega_n) = n^{-1}\sum_{j=1}^{n} \underline{E}_n(\Psi_{\omega_n}(Z_j,\Delta_j) - \theta_j)^2$,

it follows from (2.1), (2.2), the identity $b^2 - c^2 = (b-c)(b+c)$ and the triangle inequality with the intermediate term $\Psi_{\omega_{nj}}(Z_j,\Delta_j)$ that

(4) $|D_n(\underline\psi,\underline\theta)| \le 2\,\mathrm{diam}\,\Theta\; n^{-1}\Big\{\sum_{j=1}^{n} \underline{E}_n|\psi_j - \Psi_{\omega_{nj}}(Z_j,\Delta_j)| + \sum_{j=1}^{n} \underline{E}_n|\Psi_{\omega_{nj}}(Z_j,\Delta_j) - \Psi_{\omega_n}(Z_j,\Delta_j)|\Big\}$

for a compound estimator $\underline\psi$ with $\psi_j$ taking values in $\Theta$.

1.3. The Sequence Compound Problem.

The sequence compound problem also considers, say n, independent repetitions of a component problem but allows data only through stage j in estimating $\theta_j$, i.e., in a sequence compound estimator $\underline{\tilde\psi} = (\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_n)$, each $\tilde\psi_j$ is based on $\underline{Z}_j$ and $\underline{\Delta}_j$ for $1 \le j \le n$.

The risk through stage n and the modified regret of a sequence compound estimator $\underline{\tilde\psi}$ are given by (2.1) and (2.2) respectively with the understanding that each $\tilde\psi_j$ depends only on $\underline{Z}_j$ and $\underline{\Delta}_j$. The criterion for the asymptotic optimality (with and without rate) remains the same.

1.4. Summary of the Present Work.

Compound Decision Theory was introduced by Robbins (1951). He considered the problem of deciding between N(-1,1) and N(1,1) and showed that the procedures he considered are asymptotically optimal, a property he called asymptotically sub-minimax. Later this work was generalized to two completely specified distributions by Hannan and Robbins (1955). Since then a huge literature has evolved on this subject. However, the work most relevant to the present one is that of Gilliland (1966) and Singh (1974). Gilliland considered sequence compound estimation of certain discrete exponential families and the normal distribution, and Singh extended it to general exponential families. Singh estimated average $\mu$-densities and their derivatives, obtaining certain asymptotic properties, and used them in obtaining asymptotic optimality of his estimators based on these density estimators. He used the so-called class-r kernels (for additional material on this class, see, e.g., Devroye (1987)) in defining his density estimators.

In this thesis we extend Singh's method of density estimation to the case of right censored exponential families. Under the assumption that the censoring distribution is known, we obtain the asymptotic optimality for our set and sequence compound estimators for the special case of the ordinary exponential distribution with rates near $n^{1/5}$ for the estimators based on divided difference estimators and near $n^{1/2}$ for the estimators based on kernel estimators. To deal with the more general situation when the censoring distribution is unknown, we first develop the product limit estimator of the average distribution function and use it in defining kernel estimators of the average of densities and their derivatives. Based on these kernel estimators we define our set as well as sequence compound estimators and obtain their asymptotic optimality by means of a certain $L_1$ bound result for the PL estimator. Even though nonparametric kernel density estimation in the presence of censoring has been considered and explored by many researchers, it was carried out only in the case of estimating a common density. The present work seems to be the first in estimating the average of densities and their derivatives when the observations are right censored and using them in the set and sequence compound estimation problems.

The material in this thesis is organized as follows:

In Chapter 2 we consider estimation of parameters of exponential distribution under the assumption that the censoring distribution is known. We define two classes of estimators of $\bar f_j$, $\bar f_j^{(1)}$, $(1-\bar F_j)$ and $(1-\bar F_j)^{(1)}$: based on i) divided difference estimators and ii) kernel estimators. In the former, we show that the compound estimators so defined are a.o. with rates near $n^{1/5}$ and in the latter they are a.o. with rates near $n^{1/2}$. In both situations we show that the rates obtained here are the best possible in those classes by obtaining lower bounds in the case of a constant parameter sequence.

In Chapter 3 we consider set compound estimation of parameters of the standard exponential family when the censoring distribution is unknown. We define kernel estimators of $\bar f_j$ and $\bar f_j^{(1)}$ based on the PL estimator of $\bar F_j$ and use these estimators in defining our compound estimators. By using Theorem A.3.1 we then show these estimators are asymptotically optimal.

In Chapter 4 we consider the sequence compound version of the component problems treated in Chapters 2 and 3. We obtain the asymptotic optimality of the sequence compound estimators defined here as corollaries to the results obtained for the set compound estimators.

The Appendix contains some miscellaneous results most of which are used throughout the thesis. Among them the principal one regards uniform $L_1$ bounds for the maximal deviations of $\hat F$ from $\bar F$ on the intervals $(-\infty,z]$ for each $z \in \mathbb{R}$. This by itself is of independent interest.

CHAPTER 2
SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL
DISTRIBUTIONS BASED ON THE CENSORING DISTRIBUTION

2.0 Introduction

In this chapter we consider estimation of parameters of exponential distributions when the observations are right censored. We develop estimators based on divided difference and kernel estimators of the average of certain densities and their derivatives similar to those considered, presumably studied extensively for the first time, in Singh (1974) in the uncensored case. While we defer the case of estimating densities in the presence of censoring without the full knowledge of the censoring distribution until the next chapter, here we base our estimators on the censoring distribution, thereby obtaining rates of convergence. However, these techniques do not seem to yield rates in the case of general exponential families without more restrictive assumptions on the censoring distribution, among other things.

In Section 2.1 we formally introduce our specialized component problem. In Section 2.2 we obtain expressions for the Bayes estimates $\Psi_{\omega_n}$ and $\Psi_{\omega_{nj}}$. In Section 2.3 we obtain a suitable upper bound for the modified regret. In Section 2.4 we define compound estimators by estimating the Bayes estimates versus the delete empiric distribution given in Section 2.2 and we obtain their asymptotic optimality in Section 2.5. In Section 2.6 we show that the rates obtained in Section 2.5, in fact, are the best possible rates for the class of procedures considered there by obtaining a lower bound for the modified regret in the case of identical components. In Section 2.7 we define compound estimators based on kernel estimators and we show that the rates obtained here are close to 1/2 whereas the rates were close to 1/5 for the estimators based on divided difference estimators. Finally, in Section 2.8 we obtain exact rates of convergence for the estimators of Section 2.7 in the case of identical components.
2.1 The Component Problem

In this chapter we consider the component problem of Section 1.1 with $\Theta = [\alpha, \beta] \subset (0, \infty)$ and $F_\theta(x) = 1 - e^{-\theta x}$ for x > 0 and $\theta \in \Theta$. Throughout this chapter we make the following assumption.

Assumption G: G has a positive density g with respect to Lebesgue measure on $R^+$.

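2.2. Bayes Estimates Versus $\omega_n$ and $\omega_{nj}$

Since $f_\theta(x) = \theta e^{-\theta x}$ and $F_\theta(x) = 1 - e^{-\theta x}$ and therefore

(1) $f_\theta^{(1)} = -\theta f_\theta$ and $(1-F_\theta)^{(1)} = -\theta(1-F_\theta)$,

the Bayes estimates of $\theta$ against $\omega_n$ and $\omega_{nj}$ given in (A.2.6) specialize to

(2) $\Psi_{\omega_n}(z,\delta) = -\delta(\bar f^{(1)}/\bar f)(z) - (1-\delta)((1-\bar F)^{(1)}/(1-\bar F))(z)$

and

(3) $\Psi_{\omega_{nj}}(z,\delta) = -\delta(\bar f_j^{(1)}/\bar f_j)(z) - (1-\delta)((1-\bar F_j)^{(1)}/(1-\bar F_j))(z)$

respectively for $(z,\delta) \in (0,\infty)\times\{0,1\}$.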
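
As a hedged illustration of the Bayes estimate versus the empiric distribution, the sketch below computes the posterior mean of $\theta$ given an observation $(z,\delta)$ directly as a weighted average of the $\theta_k$, and compares it with the derivative-ratio form. The function names are illustrative; the censoring factors $1-G(z)$ and $g(z)$ are omitted because they cancel in the ratio.

```python
import math


def bayes_estimate(thetas, z, delta):
    """Posterior mean of theta under the empiric prior on `thetas` given (z, delta)."""
    if delta == 1:
        # uncensored: likelihood proportional to f_theta(z) = theta * exp(-theta * z)
        weights = [t * math.exp(-t * z) for t in thetas]
    else:
        # censored: likelihood proportional to 1 - F_theta(z) = exp(-theta * z)
        weights = [math.exp(-t * z) for t in thetas]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)


def ratio_form(thetas, z, delta):
    """Same estimate written as minus (average derivative) / (average function)."""
    if delta == 1:
        num = sum(-t * t * math.exp(-t * z) for t in thetas)   # sum of f_k^{(1)}(z)
        den = sum(t * math.exp(-t * z) for t in thetas)        # sum of f_k(z)
    else:
        num = sum(-t * math.exp(-t * z) for t in thetas)       # sum of (1 - F_k)^{(1)}(z)
        den = sum(math.exp(-t * z) for t in thetas)            # sum of 1 - F_k(z)
    return -num / den


thetas = [0.5, 1.0, 2.5]
for delta in (0, 1):
    print(delta, bayes_estimate(thetas, 0.8, delta), ratio_form(thetas, 0.8, delta))
```

Both forms agree because differentiating the exponential density or survival function simply multiplies it by $-\theta$.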

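2.3. An Upper Bound For The Modified Regret

Since $\Theta = [\alpha,\beta]$, it follows from (1.2.4) that for a compound rule $\underline\Psi = (\Psi_1, \Psi_2, \ldots, \Psi_n)$ with $\Psi_j$ taking values in $\Theta$,

(1) $|D_n(\underline\Psi,\underline\theta)| \le 2(\beta-\alpha)n^{-1}\Big\{\sum_{j=1}^{n} \underline{E}_n|(\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)| + \sum_{j=1}^{n} \underline{E}_n|(\Psi_{\omega_{nj}} - \Psi_{\omega_n})(Z_j,\Delta_j)|\Big\}$.

With $\lambda$ the Lebesgue measure on $R^+$ and $\xi$ the counting measure on {0,1}, it follows from Assumption G and (A.1.2) that $(Z_j,\Delta_j)$ has a density $p_{\theta_j}$ with respect to $\lambda\times\xi$ given by

(2) $p_\theta(z,\delta) = \delta(1-G(z))f_\theta(z) + (1-\delta)(1-F_\theta(z))g(z)$ for $(z,\delta) \in (0,\infty)\times\{0,1\}$.

Since $\Theta = [\alpha,\beta]$ and

$\sup_{\theta\in\Theta} p_\theta(z,\delta) \le e^{-\alpha z}(\beta\delta + (1-\delta)g(z))$

for $z \in (0,\infty)$, it follows from Remark A.2.1 and the inequality (A.2.4) with $\varphi$ the identity function that the second sum in rhs(1) is O(1). Thus from (1) we have the following upper bound for the modified regret.

(3) $|D_n(\underline\Psi,\underline\theta)| \le 2(\beta-\alpha)n^{-1}\sum_{j=1}^{n} \underline{E}_n|(\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)| + O(n^{-1})$.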
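
The density of the identified minimum splits into an uncensored part ($\delta = 1$) and a censored part ($\delta = 0$) whose masses add to one. The sketch below checks this numerically under the additional assumption, made only for this example, that G is exponential with a hypothetical rate `censor_rate`; the helper `integrate` is a plain midpoint rule.

```python
import math


def p_theta(z, delta, theta, censor_rate):
    """Density of (Z, Delta) at (z, delta), assuming 1 - G(z) = exp(-censor_rate * z)."""
    f = theta * math.exp(-theta * z)               # f_theta(z)
    surv_f = math.exp(-theta * z)                  # 1 - F_theta(z)
    g = censor_rate * math.exp(-censor_rate * z)   # g(z)
    surv_g = math.exp(-censor_rate * z)            # 1 - G(z)
    return delta * surv_g * f + (1 - delta) * surv_f * g


def integrate(func, upper=40.0, steps=200000):
    """Midpoint rule on (0, upper); the tail beyond `upper` is negligible here."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))


theta, censor_rate = 2.0, 1.0
mass_uncensored = integrate(lambda z: p_theta(z, 1, theta, censor_rate))
mass_censored = integrate(lambda z: p_theta(z, 0, theta, censor_rate))
print(mass_uncensored, mass_censored, mass_uncensored + mass_censored)
```

In this exponential special case the uncensored mass equals theta / (theta + censor_rate), which the midpoint rule reproduces to high accuracy.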

2.4 Procedures Based on Divided Difference Estimators.

By the bound for the modified regret in (3.3), a compound rule $\underline\Psi$ will be a.o. if its components $\Psi_j$ approximate $\Psi_{\omega_{nj}}$ in $L_1(\underline{E}_n)$. Since the expression for $\Psi_{\omega_{nj}}$ in (2.3) involves $\bar f_j$, $\bar f_j^{(1)}$, $1-\bar F_j$ and $(1-\bar F_j)^{(1)}$, it suffices to estimate these quantities. Note from (3.2) that $(1-F_\theta)$ is the part of the density of $(Z,\Delta)$ which corresponds to "censored" and $f_\theta$ the part which corresponds to "uncensored". This suggests a natural way of estimating $\bar f_j$, $\bar f_j^{(1)}$, $1-\bar F_j$ and $(1-\bar F_j)^{(1)}$ and therefore $\Psi_{\omega_{nj}}$ for $1 \le j \le n$.

Our method here is the divided difference estimation of the $\bar f_j^{(\nu)}$ and $(1-\bar F_j)^{(\nu)}$ for $\nu = 0$ and 1.

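Let $\langle c_n \rangle$ be a non-increasing sequence of positive numbers such that $c_n \to 0$ as $n \to \infty$. For each k = 1,2,...,n, $z \in (0,\infty)$ and $\delta = 0,1$, define random functions $\hat a_{\delta k}$ and $\hat b_{\delta k}$ by

(1) $\hat a_{\delta k}(z)\{(1-G(Z_k))[\delta=1] + g(Z_k)[\delta=0]\} = c_n^{-2}\{[z+c_n < Z_k \le z+2c_n,\ \Delta_k=\delta] - [z < Z_k \le z+c_n,\ \Delta_k=\delta]\}$,

and

(2) $\hat b_{\delta k}(z)\{(1-G(Z_k))[\delta=1] + g(Z_k)[\delta=0]\} = c_n^{-1}[z < Z_k \le z+c_n,\ \Delta_k=\delta]$.

Define corresponding sure functions $a_{\delta k}$ and $b_{\delta k}$ by

(3) $a_{1k}(z) = -\theta_k f_k(z)$, $a_{0k}(z) = -\theta_k(1-F_k(z))$, $b_{1k}(z) = f_k(z)$, $b_{0k}(z) = 1-F_k(z)$.

Then with the notations in (3), the Bayes estimate (2.3) becomes

(4) $\Psi_{\omega_{nj}}(z,\delta) = -\delta(\bar a_{1j}/\bar b_{1j})(z) - (1-\delta)(\bar a_{0j}/\bar b_{0j})(z)$ for j = 1,...,n.

We now define our compound estimator $\underline{\hat\Psi}$ by

(5) $\underline{\hat\Psi}(\underline Z,\underline\Delta) = (\hat\Psi_1(Z_1,\Delta_1), \hat\Psi_2(Z_2,\Delta_2), \ldots, \hat\Psi_n(Z_n,\Delta_n))$

where for $1 \le j \le n$ and $(z,\delta) \in (0,\infty)\times\{0,1\}$,

$\hat\Psi_j(z,\delta) = [-\delta(\bar{\hat a}_{1j}/\bar{\hat b}_{1j})(z) - (1-\delta)(\bar{\hat a}_{0j}/\bar{\hat b}_{0j})(z)]_{\alpha,\beta}$

with

$(x)_{\alpha,\beta} = \alpha[x<\alpha] + x[\alpha \le x \le \beta] + \beta[x>\beta]$.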
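
The construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the names `clip` and `divided_difference_estimate` are hypothetical, the censoring distribution is passed in as two callables (`surv_g` for 1 - G and `dens_g` for g), and the fallback used when the denominator is zero is an assumption of the sketch, since the text does not say how that case is resolved.

```python
import math
import random


def clip(x, alpha, beta):
    """Retraction (x)_{alpha,beta} onto the interval [alpha, beta]."""
    return min(max(x, alpha), beta)


def divided_difference_estimate(j, data, c, alpha, beta, surv_g, dens_g):
    """Estimate theta_j from the other observations, evaluated at (Z_j, Delta_j).

    data is a list of (z_k, delta_k); the common factor 1/n cancels in the ratio.
    """
    z, delta = data[j]
    a_sum = 0.0
    b_sum = 0.0
    for k, (zk, dk) in enumerate(data):
        if k == j or dk != delta:
            continue
        weight = 1.0 / (surv_g(zk) if delta == 1 else dens_g(zk))
        in_near = z < zk <= z + c            # Z_k in (z, z + c]
        in_far = z + c < zk <= z + 2 * c     # Z_k in (z + c, z + 2c]
        a_sum += weight * (in_far - in_near) / c ** 2
        b_sum += weight * in_near / c
    if b_sum == 0.0:
        # Assumption of this sketch only: return the midpoint when the ratio is undefined.
        return 0.5 * (alpha + beta)
    return clip(-a_sum / b_sum, alpha, beta)


rng = random.Random(1)
alpha, beta, mu = 0.5, 3.0, 1.0   # mu: hypothetical exponential censoring rate
data = []
for _ in range(500):
    theta = rng.uniform(alpha, beta)
    x, y = rng.expovariate(theta), rng.expovariate(mu)
    data.append((min(x, y), 1 if x <= y else 0))
c = len(data) ** (-1 / 5)
estimate = divided_difference_estimate(
    0, data, c, alpha, beta,
    surv_g=lambda t: math.exp(-mu * t),
    dens_g=lambda t: mu * math.exp(-mu * t),
)
print(estimate)
```

The retraction guarantees that the estimate lies in [alpha, beta] no matter how noisy the ratio is, which is what makes the bounded-difference arguments in the next section possible.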

2.5 Asymptotic Optimality of the Procedures Based on Divided Difference Estimators.

Since $(Z_j,\Delta_j)$ has density $p_{\theta_j}$ given by (3.2) and $\hat\Psi_j$ depends only on the other $(Z_k,\Delta_k)$'s, it follows by the independence of $(Z_k,\Delta_k)$ for k = 1,2,...,n, that

(1) $\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)| = \int_0^\infty \underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(z,1)|\,(1-G(z))f_j(z)\,dz + \int_0^\infty \underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(z,0)|\,(1-F_{\theta_j}(z))g(z)\,dz$.

We will obtain suitable upper bounds for $\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(z,\delta)|$ for $\delta = 0,1$ and use them in (1) in order to get an appropriate upper bound for the modified regret via (3.3).

The following lemma (Datta (1988)) is a pointwise improvement of Lemma A.2 of Singh (1974) and is useful in bounding the modified regret.

Lemma 5.1 (Singh-Datta): For $\langle y, z, Y, Z, L\rangle \in R^5$ and $z \ne 0 \le L$,

(2) $|z|\,\{|\frac{y}{z} - \frac{Y}{Z}| \wedge L\} \le |y-Y| + (|\frac{y}{z}| + L)|z-Z|$.
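
A quick randomized sanity check of the inequality in Lemma 5.1 can be sketched as below. It is only a numerical illustration, not a proof; the helper name `lemma_gap`, the sampling ranges, and the tolerance are arbitrary choices, and draws with Z = 0 are skipped because the ratio Y/Z is evaluated directly here.

```python
import random


def lemma_gap(y, z, Y, Z, L):
    """rhs - lhs of the Singh-Datta inequality; nonnegative up to rounding error."""
    lhs = abs(z) * min(abs(y / z - Y / Z), L)
    rhs = abs(y - Y) + (abs(y / z) + L) * abs(z - Z)
    return rhs - lhs


rng = random.Random(0)
worst = float("inf")
for _ in range(100000):
    y, Y = rng.uniform(-5, 5), rng.uniform(-5, 5)
    z, Z = rng.uniform(-5, 5), rng.uniform(-5, 5)
    L = rng.uniform(0, 5)
    if z == 0 or Z == 0:
        continue  # ratios are undefined in this sketch when a denominator is 0
    worst = min(worst, lemma_gap(y, z, Y, Z, L))
print(worst >= -1e-9)
```

The truncation at L is what keeps the left side bounded even when the estimated denominator Z is close to zero, which is exactly the situation that arises with density estimates in the tails.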

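Let $c_n = n^{-1/5}$ for the remainder of this section. Lemma 5.2 below will be used in the proof of our main result of this section, Theorem 5.1.

Lemma 5.2: For each $\gamma \in (0,2]$, $\exists$ numbers $M_1$ and $M_0$ such that for $1 \le j \le n$ the following two inequalities hold:

(3) $\bar f_j^{-\gamma}(z)\,\underline{E}_n(|\bar{\hat a}_{1j} - \bar a_{1j}|^\gamma + |\bar{\hat b}_{1j} - \bar b_{1j}|^\gamma) \le M_1(1 + (\bar f_j(z)(1-G(z+2c_n)))^{-\gamma/2})\,c_n^\gamma$,

(4) $(1-\bar F_j(z))^{-\gamma}\,\underline{E}_n(|\bar{\hat a}_{0j} - \bar a_{0j}|^\gamma + |\bar{\hat b}_{0j} - \bar b_{0j}|^\gamma) \le M_0(1 + ((1-\bar F_j(z))\,g_{2c_n}(z))^{-\gamma/2})\,c_n^\gamma$.

Proof. We shall prove (3) in detail; (4) follows by similar arguments. By the moment inequality and sub-additivity of the $\gamma/2$ power,

(5) $\underline{E}_n|\bar{\hat a}_{1j} - \bar a_{1j}|^\gamma \le (\mathrm{Var}(\bar{\hat a}_{1j}))^{\gamma/2} + |\underline{E}_n\bar{\hat a}_{1j} - \bar a_{1j}|^\gamma$.

By the independence of the $\hat a_{1k}$ for k = 1,2,...,n and the second moment bound for the variances of the summands,

(6) $\mathrm{Var}(\bar{\hat a}_{1j}) \le n^{-2}\sum_{k\ne j} E_k\hat a_{1k}^2$.

But

(7) $E_k\hat a_{1k}^2 = c_n^{-4}\int_z^{z+2c_n}\frac{f_k(x)}{1-G(x)}\,dx \le \frac{c_n^{-4}}{1-G(z+2c_n)}\,(e^{-\theta_k z} - e^{-\theta_k(z+2c_n)})$

by the monotonicity of G and then the exact evaluation of the resulting integral. From the mean value theorem,

$e^{-2\theta_k c_n} - 1 = -2\theta_k c_n e^{-\xi}$ with $0 < \xi < 2\theta_k c_n$.

Since $0 < c_n \le 1$, $f_k(z) = \theta_k e^{-\theta_k z}$ and $\theta_k \in [\alpha,\beta]$, it then follows from (7) that

$E_k\hat a_{1k}^2 \le 2c_n^{-3}f_k(z)/(1-G(z+2c_n))$.

From this, (6) and the fact that $(nc_n^3)^{-1/2} = n^{-1/5} = c_n$, it follows that

(8) $(\mathrm{Var}(\bar{\hat a}_{1j}))^{\gamma/2} \le c_n^{\gamma}\,(2\bar f_j(z)/(1-G(z+2c_n)))^{\gamma/2}$.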

From the definitions of $\hat a_{1k}$ and $a_{1k}$ in (4.1) and (4.3), it follows by the definition of $p_\theta$ in (3.2) and the exact evaluation of the integrals involved that

(9) $E_k\hat a_{1k} - a_{1k} = \theta_k f_k(z)\Big[1 - \Big(\frac{1-e^{-\theta_k c_n}}{\theta_k c_n}\Big)^2\Big]$.

By two applications of the mean value theorem for the exponential function, it can be shown that

$\Big|1 - \Big(\frac{1-e^{-\theta_k c_n}}{\theta_k c_n}\Big)^2\Big| \le 2\theta_k c_n \le 2\beta c_n$

since $\theta_k \in [\alpha,\beta]\ \forall k$. By substituting this bound in (9) we get that

(10) $|\underline{E}_n\bar{\hat a}_{1j} - \bar a_{1j}|^\gamma \le (2\beta^2)^\gamma\,\bar f_j^{\gamma}(z)\,c_n^\gamma \quad \forall\ 1 \le j \le n$.

Now (5), (8) and (10) complete the proof of (3). □
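
The bias computation in this proof can be checked numerically. The sketch below evaluates the mean of the divided difference for a single exponential component in closed form, compares the resulting bias with the bracketed expression in (9), and confirms the rate identity $(nc_n^3)^{-1/2} = n^{-1/5}$ for $c_n = n^{-1/5}$. The function names and the particular values of theta and z are illustrative assumptions.

```python
import math


def mean_a_hat(theta, z, c):
    """E[a-hat_{1k}(z)]: difference of two exponential cdf increments, divided by c^2."""
    far = math.exp(-theta * (z + c)) - math.exp(-theta * (z + 2 * c))
    near = math.exp(-theta * z) - math.exp(-theta * (z + c))
    return (far - near) / c ** 2


def bias_closed_form(theta, z, c):
    """theta * f(z) * [1 - ((1 - exp(-theta c)) / (theta c))^2]."""
    f = theta * math.exp(-theta * z)
    ratio = (1 - math.exp(-theta * c)) / (theta * c)
    return theta * f * (1 - ratio ** 2)


theta, z = 2.0, 0.7
results = []
for n in (10, 1000, 100000):
    c = n ** (-1 / 5)
    a = -theta ** 2 * math.exp(-theta * z)   # a_{1k}(z) = -theta * f(z)
    bias = mean_a_hat(theta, z, c) - a
    results.append((n, c, bias, bias_closed_form(theta, z, c)))
    print(n, bias, bias_closed_form(theta, z, c))
```

The bias is nonnegative and of order $c_n$, so with $c_n = n^{-1/5}$ it matches the order of the standard deviation bound in (8); this balance is what produces the $n^{-\gamma/5}$ rate.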

We now state and prove the main result of this section.

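Theorem 5.1:
(i) If $\gamma \in (0,2]$ and for some $n = n_0$

(11) $\int_0^\infty \Big\{\frac{1-G(z)}{(1-G(z+2c_n))^{\gamma/2}} + \frac{g(z)}{(g_{2c_n}(z))^{\gamma/2}}\Big\}\,e^{-(\alpha - \beta\gamma/2)z}\,dz < \infty$,

then for $\underline{\hat\Psi}$ defined by (4.5), $\exists$ a number B such that

(12) $\sup_{1\le j\le n}\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)|^\gamma \le B\,n^{-\gamma/5} \quad \forall\ n$.

(ii) If $\gamma \in (0,1]$ and (11) holds, then

(13) $\sup_{\underline\theta\in\Theta^\infty}|D_n(\underline{\hat\Psi},\underline\theta)| = O(n^{-\gamma/5})$.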

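Proof: We will first prove (12). Since $\alpha \le \hat\Psi_j,\ \Psi_{\omega_{nj}} \le \beta$ and $|\bar a_{\delta j}/\bar b_{\delta j}| \le \beta$ for $\forall j$ and $\delta = 0$ and 1, (4.4) and (4.5) and two applications of Lemma 5.1 give

(14) $|(\hat\Psi_j - \Psi_{\omega_{nj}})(z,\delta)| \le \delta\Big\{\Big|\frac{\bar{\hat a}_{1j}}{\bar{\hat b}_{1j}} - \frac{\bar a_{1j}}{\bar b_{1j}}\Big| \wedge (\beta-\alpha)\Big\} + (1-\delta)\Big\{\Big|\frac{\bar{\hat a}_{0j}}{\bar{\hat b}_{0j}} - \frac{\bar a_{0j}}{\bar b_{0j}}\Big| \wedge (\beta-\alpha)\Big\}$

$\le \delta|\bar b_{1j}|^{-1}(|\bar{\hat a}_{1j} - \bar a_{1j}| + (2\beta-\alpha)|\bar{\hat b}_{1j} - \bar b_{1j}|) + (1-\delta)|\bar b_{0j}|^{-1}(|\bar{\hat a}_{0j} - \bar a_{0j}| + (2\beta-\alpha)|\bar{\hat b}_{0j} - \bar b_{0j}|)$.

Now from (14) and a consequence of the triangle inequality for the norm or metric distance in $L_\gamma(\underline{E}_n)$ according as $\gamma \in [1,2]$ or $\gamma \in (0,1)$, we obtain the following inequality.

(15) $2^{-(\gamma-1)^+}\,\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(z,\delta)|^\gamma \le \delta(\bar b_{1j})^{-\gamma}\{\underline{E}_n|\bar{\hat a}_{1j} - \bar a_{1j}|^\gamma + (2\beta-\alpha)^\gamma\,\underline{E}_n|\bar{\hat b}_{1j} - \bar b_{1j}|^\gamma\} + (1-\delta)(\bar b_{0j})^{-\gamma}\{\underline{E}_n|\bar{\hat a}_{0j} - \bar a_{0j}|^\gamma + (2\beta-\alpha)^\gamma\,\underline{E}_n|\bar{\hat b}_{0j} - \bar b_{0j}|^\gamma\}$.

From (3) of Lemma 5.2, the first term in rhs(15) is bounded by $\delta(1 \vee (2\beta-\alpha)^\gamma)\,$rhs(3) since $\bar b_{1j} = \bar f_j(z)$. From (4) of Lemma 5.2, the second term in rhs(15) is bounded by $(1-\delta)(1 \vee (2\beta-\alpha)^\gamma)\,$rhs(4) since $\bar b_{0j} = 1-\bar F_j(z)$. From this observation and the $\gamma$-analog of (1), it follows from (15) that

(16) $\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)|^\gamma \le M c_n^\gamma\Big\{1 + \int_0^\infty\Big[\frac{f_j(z)}{(\bar f_j(z))^{\gamma/2}}\,\frac{1-G(z)}{(1-G(z+2c_n))^{\gamma/2}} + \frac{1-F_j(z)}{(1-\bar F_j(z))^{\gamma/2}}\,\frac{g(z)}{(g_{2c_n}(z))^{\gamma/2}}\Big]dz\Big\}$

where $M = 2^{(\gamma-1)^+}(1 \vee (2\beta-\alpha)^\gamma)(M_1 \vee M_0)$ with $M_0$ and $M_1$ as in Lemma 5.2.

Since $\theta_j \in [\alpha,\beta]$ and

$f_j(z)/|\bar f_j(z)|^{\gamma/2} \vee (1-F_j(z))/(1-\bar F_j(z))^{\gamma/2} \le (1 \vee \beta\alpha^{-\gamma/2})\,e^{-(\alpha-\beta\gamma/2)z}$,

the integrals in (16) are bounded by $(1 \vee \beta\alpha^{-\gamma/2})$ times the integrals in (11). But lhs(11) is non-increasing with respect to n by the monotonicity of G and the definition of $g_{2c_n}$. Thus (12) follows from (16) weakened by the above bound with a B sufficiently greater than

$M(1 + (1 \vee \beta\alpha^{-\gamma/2})\,\mathrm{lhs}(11)$ with $n = n_0)$

to make it also hold for $n \le n_0$.

Next we will prove (13). If $\gamma \in (0,1]$ and (11) holds then from (12)

(17) $\sup_{1\le j\le n}\underline{E}_n|(\hat\Psi_j - \Psi_{\omega_{nj}})(Z_j,\Delta_j)| \le (\beta-\alpha)^{1-\gamma}\{\mathrm{rhs}(12)\}$ for all n

since $\alpha \le \hat\Psi_j,\ \Psi_{\omega_{nj}} \le \beta\ \forall j$. Consequently, the first term in rhs(3.3) is $O(n^{-\gamma/5})$. Thus (13) follows from (3.3). □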

2.6 Best Possible Rates for the Procedures Based on Divided Difference Estimators in the Identical Component Case.

For the procedures $\underline{\hat\Psi}$ defined by (4.5) we have obtained in the previous section (see part (ii) of Theorem 5.1) the asymptotic optimality with rate near $n^{1/5}$. In this section we show that for a specific sequence of parameters the rate could be as high as $n^{\gamma/5}$ with $\gamma \in (0,2)$ but not equal to $n^{2/5}$. In fact we show that it is the best possible rate by obtaining a lower bound for the modified regret (see part (i) of the theorem below).

Let $\underline\theta^\infty = (\theta,\theta,\ldots)$ with a fixed $\theta$ in $[\alpha,\beta]$ throughout this section.

Theorem 6.1: Let $\underline{\hat\Psi}$ be defined by (4.5).
(i) Then

(1) $n^{2/5}D_n(\underline{\hat\Psi},\underline\theta^\infty) \not\to 0$ as $n \to \infty$.

(ii) On the other hand, if $\gamma \in (0,2)$ and G satisfies (5.11) with $\beta$ replaced by $\alpha$, then $0 \le D_n(\underline{\hat\Psi},\underline\theta^\infty)$ for each $\theta \in \Theta$ and $\exists\ U < \infty$ such that

(2) $\sup_{\theta\in\Theta} D_n(\underline{\hat\Psi},\underline\theta^\infty) \le U n^{-\gamma/5}$ for all n.
Proof: Since $\omega_n$ is degenerate at $\theta$, $\Psi_{\omega_n} \equiv \theta$ and consequently $R(\omega_n) = 0$. Therefore from (1.2.2) and the fact that the $(Z_j,\Delta_j)$ are i.i.d. here,

(3) $D_n(\underline{\hat\Psi},\underline\theta^\infty) = \underline{E}_n(\hat\Psi_n(Z_n,\Delta_n) - \theta)^2$.

For the purposes of obtaining a lower bound for the modified regret we shall use only the part of the modified regret that corresponds to $\Delta_n = 1$. In view of this and the fact that the $(Z_j,\Delta_j)$'s are i.i.d. with density $p_\theta$ given by (3.2), it follows from (3) that

(4) $D_n(\underline{\hat\Psi},\underline\theta^\infty) \ge \int_0^\infty \underline{E}_n(\hat\Psi_n(z,1) - \theta)^2\,(1-G(z))f_\theta(z)\,dz$.

Let z be a fixed number in $(0,\infty)$. The ratio $\bar{\hat a}_{1n}/\bar{\hat b}_{1n}$, in the definition of $\hat\Psi_n(z,1)$ given by (4.5), will be denoted by $\bar{\hat a}_1/\bar{\hat b}_1$ with the understanding that the latter averages are the usual averages (of n-1 elements). In this ratio, this understanding is consistent with our previous notational conventions. To prove (1) we first obtain the asymptotic distribution of $n^{1/5}(\hat\Psi_n(z,1) - \theta)$ and then justify its asymptotic moments as a lower bound for the actual moments by using a well known Fatou Theorem for convergence in distribution and finally apply Fatou's Lemma to rhs(4). To begin we obtain the asymptotic distribution of $\langle\bar{\hat a}_1, \bar{\hat b}_1\rangle$.

Note that, from the definitions (4.1) and (4.2) of $\hat a_{1k}$ and $\hat b_{1k}$ for k = 1,2,...,n-1, the $\langle\hat a_{1k}, \hat b_{1k}\rangle$ are a row i.i.d. array of random vectors since the $(Z_k,\Delta_k)$'s are i.i.d. Let $\sigma_a^2$ and $\sigma_b^2$ denote the variances of $\hat a_{11}$ and $\hat b_{11}$ respectively.

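From the definition of $\hat a_{1k}$, as $n \to \infty$,

(5) $E_1\hat a_{1k} = c_n^{-2}\Big\{\int_{z+c_n}^{z+2c_n} f_\theta(t)\,dt - \int_z^{z+c_n} f_\theta(t)\,dt\Big\} \to f_\theta^{(1)}(z)$

and

$c_n^3 E_1\hat a_{1k}^2 = c_n^{-1}\int_z^{z+2c_n}\frac{f_\theta(t)}{1-G(t)}\,dt \to \frac{2f_\theta(z)}{1-G(z)}$

and therefore

(6) $c_n^3\sigma_a^2 \to \frac{2f_\theta(z)}{1-G(z)}$.

From the definition of $\hat b_{1k}$, as $n \to \infty$,

(7) $E_1\hat b_{1k} = c_n^{-1}\int_z^{z+c_n} f_\theta(t)\,dt \to f_\theta(z)$

and

$c_n E_1\hat b_{1k}^2 = c_n^{-1}\int_z^{z+c_n}\frac{f_\theta(t)}{1-G(t)}\,dt \to \frac{f_\theta(z)}{1-G(z)}$

and consequently,

(8) $c_n\sigma_b^2 \to \frac{f_\theta(z)}{1-G(z)}$.

Since

$c_n^2 E_1\hat a_{1k}\hat b_{1k} = -c_n^{-1}\int_z^{z+c_n}\frac{f_\theta(t)}{1-G(t)}\,dt \to -\frac{f_\theta(z)}{1-G(z)}$,

it follows from (5) and (7) that

(9) $c_n^2\,\mathrm{Cov}(\hat a_{1k},\hat b_{1k}) \to -\frac{f_\theta(z)}{1-G(z)}$.

Let

$T_{nk} = \langle T_{nk}^{(1)}, T_{nk}^{(2)}\rangle = \langle c_n^{3/2}(\hat a_{1k} - E_1\hat a_{1k}),\ c_n^{1/2}(\hat b_{1k} - E_1\hat b_{1k})\rangle$.

It follows from (6), (8) and (9) that the covariance matrix of $T_{nk}$ converges to $\Gamma$ as $n \to \infty$ where

(10) $\Gamma = \frac{f_\theta(z)}{1-G(z)}\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$.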
Let $S_{n-1} = \sum_{k=1}^{n-1}(n-1)^{-1/2}\,T_{nk}$. Since $\langle T_{nk}\rangle$ is a row i.i.d. array, the covariance matrix of $S_{n-1}$ is equal to that of $T_{n1}$ and hence converges to $\Gamma$. Therefore, since $\{[1,0]^{T}, [0,1]^{T}\}$ generates the column space of $\Gamma$, it will follow by the Fabian and Hannan ((1985); Theorem 4.3.2) CLT that

(11) $S_{n-1} \xrightarrow{D} N(0,\Gamma)$

provided the arrays $\langle T_{n1}^{(l)},\ldots,T_{n,n-1}^{(l)}\rangle$ satisfy the Lindeberg condition for l = 1 and 2. But, since $\langle T_{nk}^{(l)}\rangle$ is a row i.i.d. array for each l, the Lindeberg conditions then reduce to

(12) $E_1\{(T_{n1}^{(l)})^2/\mathrm{Var}(T_{n1}^{(l)})\}\,[(T_{n1}^{(l)})^2 > (n-1)\mathrm{Var}(T_{n1}^{(l)})\,\epsilon^2] \to 0$

for each $\epsilon > 0$. It follows by the definitions of $\hat a_{1k}$ and $\hat b_{1k}$ that $\|\hat a_{1k}\|_\infty = O(c_n^{-2})$ and $\|\hat b_{1k}\|_\infty = O(c_n^{-1})$ and consequently, from (6) and (8) and the fact that $nc_n \to \infty$, the events in (12) are eventually empty. Thus from (11)

(13) $\langle (nc_n^3)^{1/2}(\bar{\hat a}_1 - f_\theta^{(1)}(z)),\ (nc_n)^{1/2}(\bar{\hat b}_1 - f_\theta(z))\rangle \xrightarrow{D} N(0,\Gamma)$.

From this and the delta method theorem (see, e.g., Theorem 4.4.2 in Fabian and Hannan (1985)) applied to the quotient function with the differential at $(f_\theta^{(1)}(z), f_\theta(z))$ equal to

$[\,1/f_\theta(z),\ -f_\theta^{(1)}(z)/(f_\theta(z))^2\,]$

here, it follows (using the fact $(nc_n^3)^{1/2} = n^{1/5}$) that

(14) $n^{1/5}(-\bar{\hat a}_1/\bar{\hat b}_1 - \theta) \xrightarrow{D} N(0,\kappa)$

where $\kappa = 2/(f_\theta(z)(1-G(z)))$. Recalling that $\hat\Psi_n(z,1)$ is the retraction of the ratio $-\bar{\hat a}_1/\bar{\hat b}_1$ to the interval $[\alpha,\beta]$, it now follows from (14) and the Fatou Theorem for convergence in distribution (see, e.g., Loève (1963), Theorem 11.4.A(i)) that

(15) $\liminf\ n^{2/5}\underline{E}_n(\hat\Psi_n(z,1) - \theta)^2 \ge \kappa$ or $\kappa/2$

according as $\theta \in (\alpha,\beta)$ or $\theta \in \{\alpha,\beta\}$. Now by an application of Fatou's Lemma to rhs(4) we obtain (1) from (15).

Next, we shall prove (2). From (3), the inequality concerning the non-negativity in (ii) is immediate. Since $\hat\Psi_n$ takes values in $[\alpha,\beta]$, it follows from (3) that for $\gamma \in (0,2)$

(16) $D_n(\underline{\hat\Psi},\underline\theta^\infty) \le (\beta-\alpha)^{2-\gamma}\,\underline{E}_n|\hat\Psi_n(Z_n,\Delta_n) - \theta|^\gamma$.

Since $\Psi_{\omega_{nj}} \equiv \theta$ (because $\omega_{nj}$ is degenerate at $\theta$) and (5.11) holds with $\beta$ replaced by $\alpha$ (by assumption), it follows from (5.12) that

(17) rhs(16) $\le (\beta-\alpha)^{2-\gamma}B\,n^{-\gamma/5}$ eventually.

Now by choosing a U sufficiently larger than $(\beta-\alpha)^{2-\gamma}B$ (to compensate for the initial terms) we obtain from (17) the second inequality in (2). □

Remark 6.1: The assertion regarding the lower bound for the modified regret in the above theorem can also be proved by techniques similar to those in Singh (1974) in connection with his result on the lower bounds, but those make use of the Berry-Esseen inequality (however, his proof there is incomplete; see Section A.4 of this thesis for a completion of his proof) and Theorem 2 of Hoeffding (1963).

2.7 Procedures Based on Kernel Estimators.

Let $r$ be an integer greater than 1. For $\nu = 0,1,2,\dots,r-1$, set

(1) $\mathcal{K}_\nu = \{ K : K$ bounded, Borel measurable, vanishing off $(0,1)$, and $(j!)^{-1}\int y^j K(y)\,dy = [j=\nu]$ for $j = 0,1,\dots,r-1 \}$.
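
To make the moment conditions in (1) concrete, the sketch below checks them exactly for two polynomial kernels in the case $r = 2$. The kernels `K0` and `K1` and the helper `moment` are illustrative assumptions, not taken from the text, and the index range $j = 0,\dots,r-1$ is assumed.

```python
from fractions import Fraction
from math import factorial


def moment(coeffs, j):
    """Exact value of (1/j!) * integral over (0,1) of y**j * K(y) dy,
    where K(y) = sum_i coeffs[i] * y**i on (0,1) and K = 0 elsewhere."""
    integral = sum(Fraction(c) / (i + j + 1) for i, c in enumerate(coeffs))
    return integral / factorial(j)


# Hypothetical kernels for r = 2 (coefficients of 1 and y).
K0 = [4, -6]    # K_0(y) = 4 - 6y   : moments (1, 0)
K1 = [-6, 12]   # K_1(y) = 12y - 6  : moments (0, 1)

r = 2
for nu, kernel in enumerate([K0, K1]):
    # Each row should be the indicator [j = nu] for j = 0, ..., r-1.
    print(nu, [moment(kernel, j) for j in range(r)])
```

Both kernels are bounded and vanish off $(0,1)$, so under the stated assumptions they would belong to the classes with $\nu = 0$ and $\nu = 1$ respectively.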

In the interest of typographical simplicity, estimates of $h^{(\nu)}$ will be denoted by
$\hat h^{(\nu)}$ and, when $h = 1-H$, $\hat h$ will be denoted by $1-\hat H$ in what follows.

In this section we develop compound procedures based on kernel
estimators of $\bar f_j^{(\nu)}$ and $(1-\bar F_j)^{(\nu)}$ for $\nu = 0, 1$ (for how the problem
reduces to estimating $\bar f_j^{(\nu)}$ and $(1-\bar F_j)^{(\nu)}$ for $\nu = 0, 1$ see the introductory
paragraph of Section 2.4).

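In this and the next sections we let $c_n = n^{-1/(1+2r)}$. For $\nu$ = 0 and
1 and for $K_\nu \in \mathcal{K}_\nu$ define on $(0,\infty)$
(2) $\hat f_j^{(\nu)}(\cdot) = (nc_n^{\nu+1})^{-1} \sum_{k\ne j} \{ K_\nu((Z_k - \cdot)/c_n)\,[\Delta_k=1]/(1-G(Z_k)) \}$
and
(3) $(1-\hat F_j)^{(\nu)}(\cdot) = (nc_n^{\nu+1})^{-1} \sum_{k\ne j} \{ K_\nu((Z_k - \cdot)/c_n)\,[\Delta_k=0]/g(Z_k) \}$.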
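We define our (set) compound estimator $\underline{\hat\psi}$ by

$\underline{\hat\psi}(\underline Z,\underline\Delta) = (\hat\psi_1(Z_1,\Delta_1), \hat\psi_2(Z_2,\Delta_2), \dots, \hat\psi_n(Z_n,\Delta_n))$
where for $1 \le j \le n$ and $(z,\delta) \in (0,\infty)\times\{0,1\}$,

(4) $\hat\psi_j(z,\delta) = [ -\delta(\hat f_j^{(1)}/\hat f_j)(z) - (1-\delta)((1-\hat F_j)^{(1)}/(1-\hat F_j))(z) ]_{\alpha,\beta}$

with $(x)_{\alpha,\beta} = \alpha[x<\alpha] + x[\alpha \le x \le \beta] + \beta[x>\beta]$.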
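
The uncensored branch ($\delta = 1$) of the estimator can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's implementation: the kernels `K0` and `K1` (for $r = 2$), the simulation settings in `demo`, and the convention used when the density estimate is exactly zero are all hypothetical choices.

```python
import math
import random


def retract(x, lo, hi):
    """Retraction (x)_{lo,hi}: clamp x to the interval [lo, hi]."""
    return lo if x < lo else (hi if x > hi else x)


def K0(y):
    # Hypothetical kernel on (0,1) with integral 1 and first moment 0.
    return 4.0 - 6.0 * y if 0.0 < y < 1.0 else 0.0


def K1(y):
    # Hypothetical kernel on (0,1) with integral 0 and first moment 1.
    return 12.0 * y - 6.0 if 0.0 < y < 1.0 else 0.0


def f_hat(nu, z, data, j, c, surv_G):
    """Kernel estimate in the spirit of (2): sum over k != j of
    K_nu((Z_k - z)/c) [Delta_k = 1] / (1 - G(Z_k)), scaled by 1/(n c^(nu+1))."""
    kernel = K0 if nu == 0 else K1
    total = 0.0
    for k, (zk, dk) in enumerate(data):
        if k != j and dk == 1:
            total += kernel((zk - z) / c) / surv_G(zk)
    return total / (len(data) * c ** (nu + 1))


def psi_hat_uncensored(z, data, j, c, surv_G, alpha, beta):
    """delta = 1 branch of (4): retraction of -f_hat^(1)/f_hat to [alpha, beta]."""
    den = f_hat(0, z, data, j, c, surv_G)
    num = f_hat(1, z, data, j, c, surv_G)
    if den == 0.0:
        return alpha  # arbitrary convention for 0/0; not specified in the text
    return retract(-num / den, alpha, beta)


def demo(n=4000, theta=1.0, cens_rate=0.5, alpha=0.2, beta=5.0, seed=1):
    """Simulate exponential lifetimes with known exponential censoring."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.expovariate(theta), rng.expovariate(cens_rate)
        data.append((min(x, y), 1 if x <= y else 0))
    c = n ** (-1.0 / 5.0)  # c_n = n^(-1/(1+2r)) with r = 2
    return psi_hat_uncensored(0.5, data, 0, c,
                              lambda t: math.exp(-cens_rate * t), alpha, beta)
```

Because the kernels take negative values, the raw ratio can fall outside the parameter interval or even change sign; the retraction is what keeps the estimate in $[\alpha,\beta]$.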


Lemma 7.1 and Theorem 7.1 to follow are analogous to Lemma 5.2 and

Theorem 5.1 respectively.

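Lemma 7.1: For each $\gamma \in (0,2]$ $\exists$ numbers $M_1^*$ and $M_2^*$ such that
(5) $\bar f_j^{-\gamma} E_n|\hat f_j^{(\nu)} - \bar f_j^{(\nu)}|^\gamma \le (c_n^{r-1}M_1^*)^\gamma (\beta^{r\gamma} + (\bar f_j(1-G(\cdot + c_n)))^{-\gamma/2})$
and
(6) $(1-\bar F_j)^{-\gamma} E_n|(1-\hat F_j)^{(\nu)} - (1-\bar F_j)^{(\nu)}|^\gamma \le (c_n^{r-1}M_2^*)^\gamma (\beta^{r\gamma} + ((1-\bar F_j)g_{c_n})^{-\gamma/2})$
for $\nu = 0, 1$.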

Proof: We shall prove (5) in detail and (6) then follows by similar
arguments. We omit or exhibit evaluation at $z$ as convenient. By the moment
inequality and sub-additivity of the $\gamma/2$ power, for every $\gamma \in (0,2]$,

(7) $E_n|\hat f_j^{(\nu)} - \bar f_j^{(\nu)}|^\gamma \le \sigma^\gamma + B^\gamma$

where
$\sigma^2 = \mathrm{Var}(\hat f_j^{(\nu)})$ and $B = |E_n\hat f_j^{(\nu)} - \bar f_j^{(\nu)}|$.
The definition of $\hat f_j^{(\nu)}$ and (3.2) give

(8) $E_n\hat f_j^{(\nu)}(z) = c_n^{-\nu}\int_0^1 K_\nu(t)\,\bar f_j(z+c_nt)\,dt$

$= c_n^{-\nu}\int_0^1 K_\nu(t)\Big\{ \sum_{k=0}^{r-1} \frac{\bar f_j^{(k)}(z)}{k!}(c_nt)^k + \frac{\bar f_j^{(r)}(z+c_nt\eta)}{r!}(c_nt)^r \Big\}\,dt$

where the expression in braces is the r-th order Taylor expansion of $\bar f_j$ at $z$.
Distributing the integral in rhs(8), the first term in rhs(8) is $\bar f_j^{(\nu)}(z)$ by the
defining property of $K_\nu$ in (1). Thus, from (8) it follows that

(9) $B \le \frac{c_n^{r-\nu}M}{r!}\int_0^1 |\bar f_j^{(r)}(z + c_nt\eta)|\,t^r\,dt$

where $M$ is the common bound for $K_0$ and $K_1$. Since $f_k^{(r)} = (-1)^r\theta_k^r f_k$
and $\theta_k \in [\alpha,\beta]$ for $k = 1,\dots,n$ and $0 < c_n, \eta$, it follows that

$\sup_{0\le t\le 1} |\bar f_j^{(r)}(z + c_nt\eta)| \le \beta^r \bar f_j(z)$.

Thus, from (9) we obtain that
(10) $B^\gamma \le \big(\frac{M}{(r+1)!}\beta^r c_n^{r-\nu}\big)^\gamma \bar f_j^\gamma(z)$.
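By the independence of $\langle Z_k,\Delta_k \rangle$ for $k = 1,2,\dots,n$ and the second
moment bound for the variances of summands and the definition of $\hat f_j^{(\nu)}$,

$\sigma^2 \le (nc_n^{\nu+1})^{-2} \sum_{k\ne j} E\big[ K_\nu^2((Z_k - z)/c_n)\,[\Delta_k=1]/(1-G(Z_k))^2 \big]$
$= (nc_n^{2(\nu+1)})^{-1} \int K_\nu^2((y-z)/c_n)\,\bar f_j(y)/(1-G(y))\,dy$
(11) $= (nc_n^{2\nu+1})^{-1} \int K_\nu^2(y)\,\bar f_j(z+c_ny)/(1-G(z+c_ny))\,dy$
$\le (nc_n^{2\nu+1})^{-1}(1-G(z+c_n))^{-1} \int K_\nu^2(y)\,\bar f_j(z+c_ny)\,dy$.

Since $\sup_{0\le y\le 1} \bar f_j(z+c_ny) = \bar f_j(z)$, and $K_\nu^2 \le M^2$ and $(nc_n^{2\nu+1})^{-1} = c_n^{2(r-\nu)}$
due to the fact $c_n = n^{-1/(1+2r)}$, it therefore follows from (11) that
(12) $\sigma^\gamma \le (Mc_n^{r-\nu})^\gamma (\bar f_j/(1-G(z+c_n)))^{\gamma/2}$.
Now (5) follows from (7), (10) and (12) with $M_1^* = (M/r!) \vee M$. □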
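
The exponent identity behind the variance bound can be spelled out. The lines below are a worked check only, assuming $c_n = n^{-1/(1+2r)}$ as fixed at the start of this section and writing $\bar f_j$ for the average density (notation assumed here).

```latex
\[
n = c_n^{-(1+2r)}
\;\Longrightarrow\;
(nc_n^{2\nu+1})^{-1} = c_n^{(1+2r)-(2\nu+1)} = c_n^{2(r-\nu)},
\]
\[
\sigma^2 \le M^2 c_n^{2(r-\nu)}\,\frac{\bar f_j(z)}{1-G(z+c_n)}
\;\Longrightarrow\;
\sigma^\gamma \le (Mc_n^{r-\nu})^\gamma \Bigl(\frac{\bar f_j(z)}{1-G(z+c_n)}\Bigr)^{\gamma/2}.
\]
```

With this bandwidth the standard deviation and the bias are both of order $c_n^{r-\nu}$, which is why a single power $c_n^{r-1}$ appears in the lemma for $\nu = 0, 1$.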

Next, we state and prove the main result of this section.
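Theorem 7.1:

(i) If $\gamma \in (0,2]$ and for some $n = n_0$
(13) $\int \Big\{ \frac{1-G(z)}{(1-G(z+c_n))^{\gamma/2}} + \frac{g(z)}{(g_{c_n}(z))^{\gamma/2}} \Big\} e^{-(\alpha-\beta\gamma/2)z}\,dz < \infty$
then for $\underline{\hat\psi}$ defined by (4), $\exists$ a number $B^*$ such that
(14) $\max_{1\le j\le n} E_n|(\hat\psi_j - \psi_{\omega_{nj}})(Z_j,\Delta_j)|^\gamma \le B^* n^{-\gamma(r-1)/(1+2r)}$, $\forall\, n$.

(ii) If $\gamma \in (0,1]$ and (13) holds, then
(15) $\sup_{\underline\theta \in [\alpha,\beta]^n} |D_n(\underline{\hat\psi},\underline\theta)| = O(n^{-\gamma(r-1)/(1+2r)})$.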

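Proof: We shall first prove (14). Since

$\alpha \le \hat\psi_j, \psi_{\omega_{nj}} \le \beta$ and $(|\bar f_j^{(1)}|/\bar f_j) \vee (|(1-\bar F_j)^{(1)}|/(1-\bar F_j)) \le \beta$ $\forall\, j$,

(2.3), (4) and two applications of Lemma 6.1 yield the following inequalities:

$|(\hat\psi_j - \psi_{\omega_{nj}})(\cdot,\delta)|$

$\le \delta\big\{ |\hat f_j^{(1)}/\hat f_j - \bar f_j^{(1)}/\bar f_j| \wedge (\beta-\alpha) \big\} + (1-\delta)\big\{ |(1-\hat F_j)^{(1)}/(1-\hat F_j) - (1-\bar F_j)^{(1)}/(1-\bar F_j)| \wedge (\beta-\alpha) \big\}$

(16) $\le \delta\,\bar f_j^{-1}\big\{ |\hat f_j^{(1)} - \bar f_j^{(1)}| + (2\beta-\alpha)|\hat f_j - \bar f_j| \big\} + (1-\delta)(1-\bar F_j)^{-1}\big\{ |(1-\hat F_j)^{(1)} - (1-\bar F_j)^{(1)}| + (2\beta-\alpha)|(1-\hat F_j) - (1-\bar F_j)| \big\}$

for $(\cdot,\delta) \in (0,\infty)\times\{0,1\}$.
By the same reasons given in obtaining (5.15) from (5.14), we obtain,
here, from (16) that for $(\cdot,\delta) \in (0,\infty)\times\{0,1\}$,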

(17) 2-2(7_1)+E lei-11' )(-,6)I"s
—n j wnj

-7 “(1)- (1) 7 7 ‘ 7

of]. {gnnj i]. | + (25-h) Enlfj—fjl }+

_ _--7 _1 (1) _- (1)7 7 _1 _ 7

(1 0(1 F,-) (sum F,-) -(1 F,-) | + (23-0) E,,|(l F,)-(l F,)| }

by Lemma 8.1, the ﬁrst and the second terms in rhs(l7) are bounded by
Ii(1v(2s—o)7)rhs(5) and (1-3)(1v(2s—o)7)rhs(d) respectively. From this and the
7—analog of (6.1), we obtain from (17) that
(13) snug-1,, )(z,,A,-))7 5 112,71“)

11)
f.(z) l-F.(z)
1 1 l-G(ZL 1 5(2) d
( + ”(71(2))”?(1—G(z+2c..))7/2 + (1-F,-(z))7/2 (g. (2))”2 } Z)
II
2 1

with M* = 2 (7— )+(1vs'7)(M’f7v M37) where M3 and M‘f are as in
Lemma 7.1.

Since the $\theta_j \in [\alpha,\beta]$ and

$(\bar f_j/(\bar f_j)^{\gamma/2}) \vee ((1-\bar F_j)/(1-\bar F_j)^{\gamma/2}) \le (1 \vee \beta\alpha^{-\gamma/2})\,e^{-(\alpha-\beta\gamma/2)\cdot}$,

the integrals in (18) are bounded by $(1 \vee \beta\alpha^{-\gamma/2})$ times the integrals in (13).
But lhs(13) is non-increasing with respect to $n$ by the monotonicity of $G$
and the definition of $g_c(\cdot)$. Thus (14) follows from (18) weakened by the
above bound with $B^*$ sufficiently greater (to hold also for the terms $n \le n_0$)
than

$M^*(1 + (1 \vee \beta\alpha^{-\gamma/2})\,\mathrm{lhs}(13)$ for $n = n_0)$.

We next prove (15). If $\gamma \in (0,1]$ and (13) holds, then from (14),

(19) $\max_{1\le j\le n} E_n|(\hat\psi_j - \psi_{\omega_{nj}})(Z_j,\Delta_j)| \le (\beta-\alpha)^{1-\gamma}\{\mathrm{lhs}(14)\}$ $\forall\, n$

since $\alpha \le \hat\psi_j, \psi_{\omega_{nj}} \le \beta$ $\forall\, j$. Therefore from (19), the first term in rhs(3.3) is
$O(n^{-\gamma(r-1)/(1+2r)})$. Consequently, (15) follows from (3.3). □

2.8 Best Possible Rates for the Procedures Based on Kernel Estimators
in the Identical Component Case.

For the class of procedures defined via (7.2), (7.3) and (7.4) we obtain in
this section rates of convergence of the order $n^{-\gamma(r-1)/(1+2r)}$ for $0<\gamma<2$ when
$\underline\theta$ is a vector of identical components. By obtaining a lower bound (see part
(i) of the theorem below) we show that the rates are best possible for this
class of procedures. The proof of the theorem below is quite similar to that
of Theorem 2.6.1 and therefore most of the arguments in obtaining similar
conclusions are repetitive. However, the class of procedures is quite
different and consequently there is an improvement over the rates of
convergence provided the degree $r$ of the class of kernels defined by (7.1) is
sufficiently large.
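
How large $r$ must be for an improvement can be read off by comparing exponents. The sketch below assumes the benchmark rate of Section 2.6 is $n^{-\gamma/5}$ (as in the bound (17) there) and compares it with the kernel rate $n^{-\gamma(r-1)/(1+2r)}$; the function name `kernel_exponent` is introduced only for illustration.

```python
from fractions import Fraction


def kernel_exponent(r):
    """Exponent (r-1)/(1+2r) multiplying gamma in the kernel-based rate."""
    return Fraction(r - 1, 1 + 2 * r)


# Assumed benchmark: exponent 1/5 from the divided-difference procedures.
DIVIDED_DIFFERENCE_EXPONENT = Fraction(1, 5)

for r in range(2, 8):
    e = kernel_exponent(r)
    # Third column is True when the kernel rate is strictly faster.
    print(r, e, e > DIVIDED_DIFFERENCE_EXPONENT)
```

Under these assumptions $r = 2$ merely matches the exponent $1/5$, any $r \ge 3$ improves on it, and the exponent increases towards (but never reaches) $1/2$ as $r$ grows.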

For the remainder of this section let $\underline\theta = (\theta,\theta,\dots,\theta)$ with a fixed $\theta \in [\alpha,\beta]$.

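Theorem 8.1: Let $c_n = n^{-1/(1+2r)}$ and $\underline{\hat\psi}$ be defined by (7.2), (7.3) and
(7.4).

(i) Then

(1) $n^{2(r-1)/(1+2r)} D_n(\underline{\hat\psi},\underline\theta) \to \infty$ as $n \to \infty$.

(ii) On the other hand, if $\gamma \in (0,2)$ and (7.13) holds with $\beta$ replaced by $\alpha$,
then $0 \le D_n(\underline{\hat\psi},\underline\theta)$ for each $\theta \in \Theta$ and $\exists\, U_r < \infty$ such that

(2) $0 \le D_n(\underline{\hat\psi},\underline\theta) \le U_r\, n^{-\gamma(r-1)/(1+2r)}$ for all $n$.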

Proof: By following exactly the same analyses leading to (6.4), we have the
following lower bound for the modified regret:

(3) $D_n(\underline{\hat\psi},\underline\theta) \ge \int_0^\infty E_n(\hat\psi_n(z,1)-\theta)^2\,(1-G(z))\,f_\theta(z)\,dz$.
Let $z$ be a fixed number in $(0,\infty)$. We denote the ratio $\hat f_n^{(1)}/\hat f_n$ simply
by $\bar f^{(1)}/\bar f$ where the latter averages are usual averages corresponding to $n-1$
elements. We next obtain the asymptotic distribution of $\langle \bar f^{(1)}, \bar f \rangle$.
For $\nu$ = 0 and 1 and $k = 1,2,\dots,n-1$, let

(4) $A_{\nu k}(z) = c_n^{-(\nu+1)} K_\nu((Z_k - z)/c_n)\,[\Delta_k=1]/(1-G(Z_k))$

and note that the $\langle A_{1k},A_{0k} \rangle$ are a row i.i.d. array of random vectors since
the $(Z_k,\Delta_k)$ are i.i.d. It follows from the definition of $p_\theta$ in (3.2) that

1
_ —V
EIAVk — C11 0 KV(t)f0(z+tcn)dt

 

 

1 r——1i,,(k)(z)(tcn)k t,,(')(z+tcnry)(tcn)‘}dt

(5) = «1:7, 5.111,; n + .1

r-V
to“) + 31:1— 1;)lf0(r)(z+tcnn)trdt

where the second equality follows by the r-th order Taylor's expansion of f 0
at z and the last equality follows by the orthogonality prOperty of the class
.132. But the integrals in the second term of the extreme right hand side of

(5) are absolutely bounded by ff0(z)/(r+l) and consequently it follows from
(5) that


 

(6) ElAVk '2 f0(u) as n '2 00.
From (4) and (4.2), it follows that
2
1K (t)f (z+tc) (z)
(7) C121V+1E1Auk = 1;, V l-G(za+tcn)n d‘ 0 EU}. 192“)“

where the convergence above is a consequence of the facts that f 9 and G are

continuous and the integrands converge dominatedly. It also follows that

(Z)
(s) cn3Cov(A0k,A1k) 4 ”0 FTP-(Elf K1(t)K0(t)dt.

Let

Ta. = (3.1173111?) = (amen-314111.13 ”2310 1-E An.»-

Then it follows from (6), (7) and (8) that, as n -) do , the covariance matrix

of Tnk converges to
1
r

j; K12(t ) dt K1(t)K0(t)dtq

:c"._

(9) r =

 

 

3(1) K)l(t K0(t)dt f12K0(t)dt 1
n—l

Let $S_{n-1} = \sum_{k=1}^{n-1} (n-1)^{-1/2}\,T_{nk}$. Since $\langle T_{nk} \rangle$ is a row i.i.d. array the
covariance matrix of $S_{n-1}$ is equal to that of $T_{n1}$ and hence converges to $\Gamma$.
Therefore, it will follow, as in Section 2.6, that

(10) $S_{n-1} \xrightarrow{\mathcal{D}} N(0,\Gamma)$

if the arrays $\langle (T_{n1}^{(\ell)},\dots,T_{n,n-1}^{(\ell)}) \rangle$ satisfy the Lindeberg condition for $\ell$ =
1 and 2. But, since $\langle T_{nk}^{(\ell)} \rangle$ is a row i.i.d. array for each $\ell$, the Lindeberg
conditions then reduce to

(11) $E\{ (T_{n1}^{(\ell)})^2/\mathrm{Var}(T_{n1}^{(\ell)})\,[(T_{n1}^{(\ell)})^2 > (n-1)\mathrm{Var}(T_{n1}^{(\ell)})\epsilon^2] \} \to 0$

for each $\epsilon > 0$. From (4), $\|A_{\nu k}\|_\infty = O(c_n^{-(\nu+1)})$. Therefore, from (6) and
(7) and the fact that $(n-1)c_n \to \infty$, the events in (11) are eventually empty.

Thus, from (10) and the definitions of $T_{nk}$ and $A_{\nu k}$ and $\bar f^{(\nu)}$, it follows that


(12) $\langle (nc_n^3)^{1/2}(\bar f^{(1)} - f_\theta^{(1)}),\ (nc_n)^{1/2}(\bar f - f_\theta) \rangle \xrightarrow{\mathcal{D}} N(0,\Gamma)$.
Therefore, from (12) and the delta method theorem as applied in Section 2.6,
(13) $n^{(r-1)/(1+2r)}(-\bar f^{(1)}/\bar f - \theta) \xrightarrow{\mathcal{D}} N(0,c)$

where $c = \int_0^1 K_1^2(t)\,dt/(f_\theta(z)(1-G(z)))$. Since $\hat\psi_n(z,1) = (-\bar f^{(1)}/\bar f)_{\alpha,\beta}$, it
follows from (13) and the Fatou Theorem for convergence in distribution that
(14) $\liminf_n n^{2(r-1)/(1+2r)} E_n(\hat\psi_n(z,1)-\theta)^2 \ge c$ or $c/2$
according as $\theta \in (\alpha,\beta)$ or $\theta \in \{\alpha,\beta\}$. Now Fatou's Lemma applied to rhs(3)
and (14) together prove (1).

If $\gamma \in (0,2)$, then

(15) $D_n(\underline{\hat\psi},\underline\theta) \le (\beta-\alpha)^{2-\gamma} E_n|\hat\psi_n(Z_n,\Delta_n)-\theta|^\gamma$.
Since $\psi_{\omega_{nj}} \equiv \theta$ and (7.13) holds with $\beta$ replaced by $\alpha$, from (7.14) we obtain
that
(16) rhs(15) $\le (\beta-\alpha)^{2-\gamma} B^* n^{-\gamma(r-1)/(1+2r)}$ eventually.

Now by taking a $U_r$ sufficiently large to compensate for the initial terms in (16)
and noting that lhs(15) is the modified regret in our situation, the second
inequality in (2) follows from (15) and (16). The non-negativity in (ii) is
immediate. □

CHAPTER 3
SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL
FAMILIES BASED ON PRODUCT LIMIT ESTIMATOR

3.0 Introduction.

In this chapter we consider the set compound version of the component
problem described in Section 1.1. We assume that $F_\theta$ has density $f_\theta$ with
respect to a measure $\mu$ on $R$ given by $f_\theta(x) = d(\theta)e^{\theta x}$ where
$d(\theta) = (\int e^{\theta x}\,d\mu(x))^{-1}$ and that $\mu$ has a positive density $u$ with respect to
Lebesgue measure restricted to $(a,\infty)$ for an $a \ge -\infty$. We take $\Theta = [\alpha,\beta]$, a
subset of the natural parameter space $\{\theta : d(\theta) > 0\}$. Let
(1) $m = \sup_{\theta\in\Theta} f_\theta$ and $L = |\alpha| \vee |\beta|$.

We shall denote the retraction of a function $h$ to the interval $[a,b]$ by
(2) $(h)_{a,b} = a[h<a] + h[a \le h \le b] + b[h>b]$.
3.1 A Brief Review of Density Estimation in the Presence of
Censoring.

Kernel density estimation based on the PL estimator has been considered
by many authors. See, e.g., Földes, Rejtő and Winter (1981), Blum & Susarla
(1980), Padgett & McNichols (1984) and the references there, Mielniczuk
(1986) and Susarla and Van Ryzin (1986). In most of the papers appearing in
this area various asymptotic properties of the kernel estimators based on the
PL estimator have been studied. Many of the asymptotic results (especially
with rates) have been obtained via rates of convergence of the PL estimator to
the corresponding distribution function, e.g., in Földes, Rejtő and Winter
(1980) strong consistency of the PL estimator is obtained and the results are
used in Földes, Rejtő and Winter (1981).

Susarla and Van Ryzin (1986) consider empirical Bayes squared error
loss estimation of the natural parameter of an exponential family and reduce
the problem to estimating the density and its derivative. They make use of a
result in Gill (1983) regarding the convergence of the PL estimator on the
whole real line to prove the asymptotic optimality of their empirical Bayes
procedures.

Recently kernel density and hazard rate function estimation in the
presence of censoring via strong representation of the Kaplan-Meier (PL)
estimator has been considered in Lo, Mack & Wang (1989). In Diehl & Stute
(1988) the kernel density estimator is represented in terms of a sum of
independent random variables plus a negligible remainder, from which they
determine the exact rate of pointwise and uniform convergence, among other
things. Padgett & Thombs (1989) consider a non-parametric estimator of the
quantile function of the lifetime distribution, again through the kernel density
estimation method.

However, all papers mentioned above consider the i.i.d. situation.
Moreover, the results are based on almost sure convergence. Consequently,
neither these results nor their obvious extensions are adequate for our purpose.
By applying the L1 bounds for the maximal deviations of PL estimator of
average distribution function obtained in Section A.3 we are able to overcome
this problem and it turns out that these types of bounds are just what we
want.

Campbell and Földes (1984) define a generalized product limit estimator
for weighted distribution functions based on censored data and prove its
consistency for the weighted average of the distributions, a result comparable
to Singh's (1975) result. For the reason mentioned in the previous paragraph
this result is also not applicable to our situation. However, by simple
application of exponential bounds in Földes and Rejtő (1981) and in Singh
(1975), in Section A.3 we obtain rates for the $L_1$ convergence of the maximal
deviations of the PL estimator of the average distribution function (see Theorem
A.3.1). It turns out that even a modified version of the PL estimator (i.e.,
the delete case) inherits these asymptotic properties, which makes its
application to the compound problems straightforward.

In the next section we define the kernel estimator of the $\nu$-th derivative
of the average of densities for $\nu = 0, 1$ based on the estimator defined in
(A.3.15).
3.2 Compound estimators of Q based on PL estimator of Fj.

In this section we define kernel estimators of $\bar f_j^{(\nu)}$ for $\nu = 0, 1$ based
on the PL estimator of $\bar F_j$ and use them in exhibiting the compound procedures.

Let $r > 1$ be an integer. For $\nu = 0,1,2,\dots,r-1$, let

(2) $\mathcal{K}_\nu = \{ K : K$ Borel measurable, vanishing off $[-1,1]$, and $(j!)^{-1}\int y^j K(y)\,dy = [j=\nu]$, $j = 0,1,\dots,r-1 \}$.

The class $\mathcal{K}_\nu$ of kernels defined in (2) includes the class of kernels defined
in (2.7.1). Let $c = c_n$ be a non-increasing sequence of numbers such that
$0 < c \le 1$ and $c \to 0$ as $n \to \infty$. For a $K_\nu \in \mathcal{K}_\nu$, define the kernel estimator of
$\bar f_j^{(\nu)}$ on $(a,\infty)$ by

(3) $\hat f_j^{(\nu)}(z) = c^{-(\nu+1)} \int K_\nu((t-z)/c)\,(1/u(t))\,d\hat F_j(t)$

where $\hat F_j$ is the PL estimator of $\bar F_j$ given by (A.3.15).
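Since $f_\theta^{(1)} = \theta f_\theta$ and $1 - F_\theta(z) = \int_z^\infty f_\theta\,d\mu$, by a simple application of the
Fubini theorem the Bayes estimates versus $\omega_{nj}$ given in (A.2.6) specialize to

(4) $\psi_{\omega_{nj}}(z,\delta) = \delta\,(\bar f_j^{(1)}(z)/\bar f_j(z)) + (1-\delta)\,(\int_z^\infty \bar f_j^{(1)}\,d\mu / \int_z^\infty \bar f_j\,d\mu)$, $(z,\delta) \in (a,\infty)\times\{0,1\}$.

From the bound we have for the modified regret in (2.4.5), defining an
asymptotically optimal compound estimator of $\underline\theta$ reduces to estimating the
Bayes estimate versus $\omega_{nj}$ in each of the $n$ components such that it
approximates the Bayes estimate at least in $L_1$. Because of the form of the
Bayes estimate in (4) we therefore define our compound procedure $\underline{\hat\psi}$ of $\underline\theta$ as
follows:

(5) $\underline{\hat\psi}(\underline Z,\underline\Delta) = (\hat\psi_1(Z_1,\Delta_1), \hat\psi_2(Z_2,\Delta_2), \dots, \hat\psi_n(Z_n,\Delta_n))$
where for $1 \le j \le n$ and $(z,\delta) \in (a,\infty)\times\{0,1\}$,

$\hat\psi_j(z,\delta) = \big[ \delta\,\hat f_j^{(1)}(z)/\hat f_j(z) + (1-\delta)\int_z^\infty (\hat f_j^{(1)})_{-mL,mL}\,d\mu \big/ \int_z^\infty (\hat f_j)_{0,m}\,d\mu \big]_{\alpha,\beta}$.

Note that since $\hat F_j$ (and hence $\hat f_j^{(\nu)}$ for $\nu$ = 0 and 1) is independent of
$\langle Z_j,\Delta_j \rangle$, so is $\hat\psi_j$. This fact makes a considerable contribution in obtaining
the asymptotic optimality of the procedures $\underline{\hat\psi}$ as we shall see in the next
section.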
3.3 Asymptotic Optimality of g.

In this section we will state and prove our main result Theorem 3.1.

Lemma 3.1 below obtains the $L_1$ consistency of the estimator $\hat f_j^{(\nu)}$ of $\bar f_j^{(\nu)}$
for $\nu = 0, 1$ and it will be used in the proof of Theorem 3.1 along with
Lemma 3.2 which itself is based on Lemma 3.1.
In proving Lemma 3.2 and Theorem 3.1 we will make use of some of the
properties of exponential families as stated in Section A.6 without further
mention. Lemma 3.1 is obtained mainly by an application of Theorem A.3.1.

Lemma 3.1: Let $G$ be continuous and $z$ be a number such that $G(z) < 1$. Let
$\langle c \rangle$ be a non-increasing sequence of numbers such that $0 < c \le 1$, $c \to 0$, and
$\log n/(nc^4) = o(1)$. Let $u^{-1}$ be of bounded variation on compacts of $(a,\infty)$ and
$K_\nu$ be of bounded variation on $[-1,1]$. Then for $\nu$ = 0 and 1
(1) $\sup_{\underline\theta} E \sup_{1\le j\le n} |\hat f_j^{(\nu)}(z) - \bar f_j^{(\nu)}(z)| = o(1)$.

Proof: Deﬁne on (a, do), for V=0 and l,
" V) _ l . -t 1
(2) 5] (~) - gm f 3.1-7,351,, 13,0).
We will prove (I) by triangulation about the non-stochastic term IE”). First
we will Show that E sup |‘.V)(z) — ﬂu)(z)| = 0(1) uniformly in Q.
" 151$n J J ’

Since Fi is absolutely continuous for each i, it follows that if the

exponents in (A315) are replaced by [Zk<z, Ak=1], then 1—Fj is equal
to rhs (A.3.15) with this modiﬁcation a.e. (E). Also since [Z(n)=z] is E

null, 1—Fj(z-) = rhs(A.3.15) with the exponents replaced by [Zk<z, Ak=1]
a.e. (E). Consequently under its hypothesis Theorem A.3.1 also obtains L1

bounds for the sup distance between F: and Fj on (-oo,z] for each zeR with

21"”; = Fj- + Fj.


Since 1—Hj= (1—Fj)(l-G) and G(z)<1 and for each ZE(a.,00)
inf 110(2) > inf ed(0)fz {e ”Aem}dp(x) > 0
and
‘12:; (150(2)) 2 126 1(1)]z {ew‘Aefxldm > o,

it follows that inf(1-Hj(z)) > 0 and sup Hj(2) < l.
J I
Let tn denote Lnlog n and n’ denote n—l for the remainder of this

4 -1 0 by our assumption there exists a sequence of

section. Since log n / no
numbers <Ln> diverging to do such that tn/nc4 -1 0. Since Ln -1 do and

inf(1-Hj(z)) > 0 and sup Hj(z) < 1 for each 26R, the conditions in (A.3.6),
J J
(A.3.7) and (A.3.8) hold eventually for each 26R. Thus it follows from

Theorem A.3.1 that, eventually, for each 26R

—M1(z)n’
(3) (14(2)) 1211!) E( 21192 IF*(1)—F(1)I) S ﬁn ,7n‘ + 69 +
l]_ <11 -00
"M Z I "" Z I
i, n, 2( )Ln +1/2+ 2e2 n, n 11’ M3( )Ln +1+ 03 n n + 327

where M1’ M2, and M3 are ﬁnite and positive functions of 2 independent of j
and Q and, consequently, rhs(3) is independent of j and Q.

 

Let Yz denote KV[' ; z)/u(-). Then by the deﬁnitions of I?) and
I?) in (2.3) and (2) respectively,
(4) 5’“ (i?) (z) - 3%) = f Y,(1)d(ﬁ",? - 5,)(0

Since Fj and F: induce the same measure. Since K” and u.1 are of

bounded variation on compacts, each Yz is of bounded variation on

[z-c, 2+0]. Therefore Yz is continuous there except possibly on a countable

subset, say Dz. But F} assigns mass only to those observations which


come from Fk’ k=1,2,...,n, katj. Since each Fk is absolutely continuous, it

therefore follows that [F';{x} > 0] is E null for each x in (a, co).

Consequently,

I Dzd(F'3')=0 a.e.(§).

Since Fj(Dz)=0, it now follows that the integral in (4) is equal to

fv;(t)d(r}—Fj)(t) a.e.(§)
where 2Y; = Yz+ + Yz-. Since Fj is absolutely continuous and (Fj*)*=
(Fj*), it follows that (‘3’ij = FI—Fj' From this and the facts that
K V(1+)=KV(—1-)=0, it follows (by the integration by parts formula in
Theorem 21.67 part (v) and Remark 21.68 extending it to functions of
bounded variations in Hewitt and Stromberg(1965) ) that the integral in (4) is

equal to

-f(F‘,f(1)-F,(t))dvz(1).

Thus from (4) we obtain for V = 0 and 1 that
.. .. . 2+1

(5) sIfi")(z)-ii")(z)l s 52.3. sup IFT(1)—F-(1)If Ide(t)l-
J J z-15t$z+l J J 2—1

Therefore, since the total variation of Yz is ﬁnite, by our assumptions on
< c > it follows from (3) that rhs(5) is 0(1) for each 2 uniformly in j and Q.

By the orthogonality of K V and r—th order Taylor expansion of fj as
used in obtaining (2.7.10), it can be Shown here that

’ V) __ V) Cr.” r Lc _
(6) IT] (z)1§ (2)) 57,1. ||K,,||,,e 1,(z). u- 0.1
where L is as deﬁned in (1.1) and llKullm is the sup-norm of K”. Since
f,(z) s 111(2) s 328 d(o)(e°“ v e305» and "KO", v )IK,)),,< m.

it follows from (6) that


V)z_ V)z _
(7) 323 1:11.111!) Ii] () I] ()l 0(1)

Since lhs(7) is non—stochastic, (1) now follows from the triangle inequality,

the asserted behavior of (5) and (7). 1:)

Lemma 3.2. Let the hypothesis of Lemma 3.1 hold for each $z < \infty$. Then

(8) $\sup_{\underline\theta} \max_{1\le j\le n} E\int |(\hat f_j^{(1)})_{-mL,mL} - \bar f_j^{(1)}|\,d\mu = o(1)$
and
(9) $\sup_{\underline\theta} \max_{1\le j\le n} E\int |(\hat f_j)_{0,m} - \bar f_j|\,d\mu = o(1)$.

Proof: Lemma 3.1 shows that the $E$ values in (8) and (9) converge pointwise
to 0 uniformly in $\underline\theta$ and $j$. Therefore (8) and (9) follow from the D.C.T. with
the dominating $L_1$-functions $2mL$ and $m$ respectively. □

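Theorem 3.1: Let the hypothesis of Lemma 3.1 hold for each $z < \infty$. Then
for $\underline{\hat\psi}$ defined by (2.5)

(10) $\sup_{\underline\theta} \max_{1\le j\le n} E|(\hat\psi_j - \psi_{\omega_{nj}})(Z_j,\Delta_j)| = o(1)$
and
(11) $\sup_{\underline\theta} |D_n(\underline{\hat\psi},\underline\theta)| = o(1)$.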

Proof: Since (Zk,Ak) are independent and for each j \Ilj depends only on
(Zk’Ak) for katj and since (Zj’Aj) has density pj given by

(12) 1,23) = 3(141(z))1,(z)u(z)+(1—afz°°u(t)1,(1)dt


with respect to (01"; (see Section A. 1), it follows that

(13) EK‘i’j-‘Ilwnszr AjH = fEK‘i'jw-‘I’m’)(21)|(1'G(Z)fj(Z)dﬂ(Z)

+fEl(‘i' .1. (z 0)|( f tandem
Since

J

by the definition of \ilj, \Ilw . given in (2.5) and (2.3) respectively and two
11.I

. l) on 1) co
111.111111 wwnjeIa,11] and Hg ﬂjl v IfzIJC 1111/]z depl 3 L,

applications of Lemma 2.5.1 and obvious weakening of the resulting bound by

B=1V(L+ﬂ-a) we obtain the following two inequalities:
* —1 ‘ l l “
(14) I‘I'j(z,l)-‘1'wnj(z,1)| s B IjIz) III§ m )(z)I+IIj(z)-Ij(z)l}.

( 15) |1i1.(z,0)—wwnj(z,0) | g

B( f Id11I'1If |(T(1)).mLmL-f(l)ldu + f III)0 —f--ldu}
Since f0 > fa A fﬂ , so is (1-1/n)fj; thus (14) and Lemma 3.1 give

(16) E sup |\ilj(z, 1)-\Ilw (z,1)|= o(1) uniformly in Q for each 2.
1< _i- <11 ”111'

Similarly foo fjdp > (1—1/n) foo (f aAfﬂ)dp; thus (15) and Lemma 3.2 give

(17) E sup lilj (z, 0)—\Ilw (z, 0)] = 0(1) uniformly in Q for each 2.
1315 <11 111'

Since v fj< m, it follows from (16) that the p—integral integrand in
j=l

rhs(l3) is 0(1) uniformly in j and Q. Therefore, since this integrand is
bounded uniformly in j and Q by (ﬂ—a)m 6 L101), the D.C.T. shows that the
ﬁrst term in rhs(l3) is 0(1) uniformly in j and Q. Since the integrand of the
G—integral in (13) is dominated uniformly in j and Q by (ﬂ-a) and this


integrand converges to 0 uniformly in j and Q by (17), one more application of
the D.C.T. shows that the second term in rhs (13) is 0(1) uniformly in j and
Q.

Now (10) follows from (13) and consequently the ﬁrst term in rhs(1.2.4)
(with ij there ,in our case, the \in ) is 0(1) uniformly in Q. Thus, the proof
of (11) is complete once we show the second sum in rhs(l.2.4) is 0(1)

uniformly in Q and j. To show this, ﬁrst note that from (12)

sup p 2,6) 5 6 m(z)u(z) + (1-6)

069 A
and consequently the {01/6 integral of the lbs of this inequality is ﬁnite since
In 6 L101). From this, Remark A.2.l and the inequality (A.2.4) with (,0 there

the identity function on ll, we get that the second sum in rhs(1.2.4) is 0(1)

uniformly in Q and j since 9 = [0,5]. In
3.4: Some Examples and Remarks.

The hypotheses of Theorem 3.1 hold for many well-known exponential
family distributions such as the Normal and Gamma. Note that the assumptions
on $G$ are rather mild and consequently the theorem has wide applications. As
indicated in Section 3.1, Susarla and Van Ryzin (1986) have considered the
empirical Bayes version of the problem treated here and have obtained the
asymptotic optimality of their estimators (which are different from ours) under
more restrictive hypotheses on $G$. Our method of estimation can easily be
specialized to the empirical Bayes situation and the asymptotic optimality of
the resulting procedures can be obtained by techniques analogous to those used
in the proof of Theorem 3.1.

Our proof of Theorem 3.1 depends heavily on the $L_1$ consistency of the
Product Limit estimator with rates. However, this approach will not yield
rates of convergence for the asymptotic optimality of the compound estimators
unless we impose more restrictive hypotheses on the censoring distribution,
which could be vacuous. It seems that obtaining rates of convergence even in
the empirical Bayes estimation is not possible. If we can obtain the $L_1$
consistency of the PL estimator on the whole real line, then the techniques of
this chapter give rates of convergence for the asymptotic optimality.
Apparently, there are no results available in the literature to date concerning
the mean consistency of the PL estimator on the entire real line even in the
i.i.d. case. Thus, it seems that, in order to obtain rates of convergence
in situations like ours, a different approach is necessary, as pointed out in
Susarla and Van Ryzin (1986).

CHAPTER 4
THE SEQUENCE COMPOUND ESTIMATION

4.0 Introduction.

In this chapter we consider the sequence compound version of the
component problem treated in Chapters 2 and 3. In the sequence compound
setting, at each stage $j$, we estimate $\theta_j$ based on the available
observations $\underline Z_j = (Z_1,Z_2,\dots,Z_j)$ and $\underline\Delta_j = (\Delta_1,\Delta_2,\dots,\Delta_j)$. Thus for a sequence
compound estimator (SQCE) $\underline{\hat\psi} = (\hat\psi_1,\hat\psi_2,\dots,\hat\psi_n)$ of $\underline\theta = (\theta_1,\dots,\theta_n)$, each $\hat\psi_j$
is allowed to depend only on $\underline Z_j$ and $\underline\Delta_j$. As we pointed out in Section
1.3, the optimality criterion for SQCE is the same as for the set compound
estimator. The results of this chapter are obtained as corollaries to the main
results of Chapters 2 and 3. In Chapter 2 we assumed that $f_\theta(x) = \theta e^{-\theta x}$,
$\theta \in \Theta = [\alpha,\beta]$, a subset of $(0,\infty)$, and the censoring distribution $G$ is known, and
in Chapter 3 that $f_\theta$ belongs to a general exponential family, $\theta \in \Theta = [\alpha,\beta]$,
a subset of the natural parameter space, and $G$ is unknown.
4.1. A Useful Upper Bound for the Modiﬁed Regret.

Throughout this chapter we will denote the empirical distribution of
$\theta_1,\theta_2,\dots,\theta_{j-1}$ by $\omega_{jj}$. Let $m$ and $L$ be as defined in (3.0.1). By particularizing
Lemma 2 of Chapter 2 in Singh (1974) to the case $\Theta_i = [\alpha,\beta]$ for all $i$ (which
is a consequence of inequalities (8.7) and (8.8) of Hannan (1957)), the modified
regret of a SQCE $\underline{\hat\psi} = (\hat\psi_1,\hat\psi_2,\dots,\hat\psi_n)$ of $\underline\theta = (\theta_1,\dots,\theta_n)$ has the following
upper bound:

40

41

.. n ..
(1) IDDIMI - 31.2 ‘25—“me s 11,—: gstIIj-wwjszjAjII.

Remark 4.1: We will be using the above bound for the modified regret in the
next three sections to obtain the asymptotic optimality (a.o.) of the respective
SQCE's to be proposed in those sections. Since $m \in L_1(\mu)$, the second term in
lhs(1) is $O(\log n/n)$. Thus to obtain the asymptotic optimality of our SQCE's,
it is enough to consider rhs(1).
4.2 Estimators based on Divided Difference Estimators.

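Here we assume that the censoring distribution $G$ is known and
Assumption A (that $G$ has a positive density $g$) of Section 2.1 holds. We
now define our SQCE $\underline{\hat\psi}$ of $\underline\theta$ based on divided difference estimators of
$\bar f_j = j^{-1}\sum_{k=1}^{j-1} f_k$, $\bar f_j^{(1)} = j^{-1}\sum_{k=1}^{j-1} f_k^{(1)}$, $(1-\bar F_j) = j^{-1}\sum_{k=1}^{j-1} (1-F_k)$ and
$(1-\bar F_j)^{(1)} = j^{-1}\sum_{k=1}^{j-1} (1-F_k)^{(1)}$ by

$\underline{\hat\psi}(\underline Z_n,\underline\Delta_n) = (\hat\psi_1(Z_1,\Delta_1),\dots,\hat\psi_n(Z_n,\Delta_n))$
where for $1 \le j \le n$,

(1) $\hat\psi_j(Z_j,\Delta_j)$ = rhs(2.4.5) with $n = j$ there.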

Theorem 2.1: For the SQCE defined by (1), if $\gamma \in (0,1]$ and satisfies
(2.5.11), then
$\sup_{\underline\theta \in [\alpha,\beta]^\infty} |D_n(\underline{\hat\psi},\underline\theta)| = O(n^{-\gamma/5})$.
Proof. By Theorem 2.5.1
‘.— . . 7 < "7/5

JJ


uniformly in Q. Consequently,
rhs(1.l) 5 53L (Ii—d)1""n"’/5
uniformly in Q. n

4.3 Estimators based on Kernel Estimators.

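In this section also we assume that $G$ is known and Assumption A
holds. The SQCE we propose in this section based on kernel estimators of $\bar f_j$,
$\bar f_j^{(1)}$, $(1-\bar F_j)$ and $(1-\bar F_j)^{(1)}$ is defined by

$\underline{\hat\psi}(\underline Z_n,\underline\Delta_n) = (\hat\psi_1(Z_1,\Delta_1),\dots,\hat\psi_n(Z_n,\Delta_n))$
where for $1 \le j \le n$,

(1) $\hat\psi_j(Z_j,\Delta_j)$ = rhs(2.7.4) with $n = j$ there.

Theorem 3.1: For the SQCE defined by (1), if $\gamma \in (0,1]$ and satisfies
(2.7.13), then

$\sup_{\underline\theta \in [\alpha,\beta]^\infty} |D_n(\underline{\hat\psi},\underline\theta)| = O(n^{-\gamma(r-1)/(1+2r)})$.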

Proof. By Theorom 2.7.1,

ndISI': - ﬂwjj)(ZJ-,Aj)l7 s 3*(1‘Wr‘ll/(“2‘h

uniformly in Q. Consequently
rhs (1.1) g 5B*L (ﬂ-a)1-7n—7(r_l)/(1+2r). D

4.4. Estimators Based on Product Limit Estimators of Pi.

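In this section we do not assume that $G$ is known. The SQCE's to
be introduced here are based on kernel estimators of $\bar f_j$ and $\bar f_j^{(1)}$ which
themselves are based on the PL estimator of $\bar F_j = j^{-1}\sum_{k=1}^{j-1} F_k$. We define our
SQCE $\underline{\hat\psi}$ of $\underline\theta$ by

$\underline{\hat\psi}(\underline Z_n,\underline\Delta_n) = (\hat\psi_1(Z_1,\Delta_1),\dots,\hat\psi_n(Z_n,\Delta_n))$
where for $1 \le j \le n$,
(1) $\hat\psi_j(z,\delta)$ = rhs(3.2.5) with $n = j$ there.
Thus at each stage $j$, the estimator $\hat\psi_j$ of $\theta_j$ depends only on $\underline Z_{j-1}$
and $\underline\Delta_{j-1}$ (see the definition of the PL estimator in (A.3.15)).

Theorem 4.1: Let the hypotheses of Theorem 3.3.1 hold. Then for the SQCE
defined by (1)

$\sup_{\underline\theta \in [\alpha,\beta]^\infty} |D_n(\underline{\hat\psi},\underline\theta)| = o(1)$.

Proof: Since
$E_j|(\hat\psi_j - \psi_{\omega_{jj}})(Z_j,\Delta_j)| = o(1)$

uniformly in $\underline\theta$ from (10) of Theorem 3.3.1, so is rhs(1.1). □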

APPENDIX
A.1 The Joint Distribution of the Identified Minimum

Let $X$ and $Y$ be two random variables such that $X \sim F$ and $Y \sim G$ where $F$
and $G$ are distribution functions. Let $Z = X \wedge Y$ and $\Delta = [X \le Y]$. Since
$F\times G[Z \le z, \Delta = 1] = F([X \le z]\,G[Y \ge X])$ and $F\times G[Z \le z, \Delta = 0] = G([Y \le z]\,F[Y < X])$
with rhs's denoting iterated integrals, for $(z,\delta) \in R\times\{0,1\}$
(1) $P[Z \le z, \Delta = \delta] = \delta\int_{[x\le z]} (1-G(x-))\,dF(x) + (1-\delta)\int_{[y\le z]} (1-F(y))\,dG(y)$.

Let $f$ be a density of $F$ with respect to a measure $\lambda$. For $\delta \in \{0,1\}$, let
$\nu_\delta = \delta\lambda + (1-\delta)G$. Let $\xi$ denote the counting measure on $\{0,1\}$. Then $(Z,\Delta)$
has a joint density $p$ with respect to $\xi\nu_\delta$ given by

(2) $p(z,\delta) = \delta(1-G(z-))f(z) + (1-\delta)(1-F(z))$, $(z,\delta) \in R\times\{0,1\}$.
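
A quick simulation makes the identified minimum concrete. The sketch below assumes, purely for illustration, that $X$ and $Y$ are independent exponentials with rates `theta` and `lam`; in that case integrating the $\delta = 1$ part of the joint density over $z$ gives $P[\Delta = 1] = \theta/(\theta+\lambda)$, which the simulated uncensored fraction should approximate. The function names are hypothetical.

```python
import random


def identified_minimum(x, y):
    """Return (Z, Delta) = (min(x, y), 1 if x <= y else 0)."""
    return (x, 1) if x <= y else (y, 0)


def simulate(n, theta, lam, seed=0):
    """Draw n pairs (Z, Delta) with X ~ Exp(theta) and Y ~ Exp(lam), independent."""
    rng = random.Random(seed)
    return [identified_minimum(rng.expovariate(theta), rng.expovariate(lam))
            for _ in range(n)]


def uncensored_fraction(sample):
    """Empirical estimate of P[Delta = 1]."""
    return sum(delta for _, delta in sample) / len(sample)


sample = simulate(20000, theta=2.0, lam=1.0)
# Compare the empirical fraction with theta / (theta + lam) = 2/3.
print(uncensored_fraction(sample), 2.0 / 3.0)
```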

Remark 1.1: With the above $\lambda$ the Lebesgue measure on $R$, a
formulation of the joint density similar to the one given in (2) has also
appeared in (2.1.8) of Wang (1983), but slightly differently in that the factor
$1-G(\cdot-)$ in rhs(2) is absorbed in the definition of $\nu_\delta$ by replacing $\lambda$ by
$\lambda(1-G(\cdot-))$. However, the form in (2) is of interest to us since we are
concerned with a family of $F_\theta$'s (where $\theta \in \Theta \subset R$) dominated by $\lambda$.

A.2 Bayes Estimates In The Squared Error Loss Estimation And Stability
With Respect To Small Perturbations

Existence: Let $\mathcal{K}$ be a family of probability measures on a measurable space
$(T,\mathcal{T})$. Let $\lambda$ be a measure such that $K \ll \lambda$ for every $K \in \mathcal{K}$. Let $k$ denote
a density corresponding to $K \in \mathcal{K}$. Let $\omega$ be a prior on $\mathcal{K}$ and $\varphi$ be a real
valued function on $\mathcal{K}$. In this section we consider the squared error loss
estimation (SELE) of $\varphi(K)$ based on $t \in T$. Since $\omega$ is a prior, it follows that
$k$ is a density of $\underline P = \omega\circ K$ with respect to $\omega\times\lambda$. Therefore (see, e.g., Problem
2.6.3 of Fabian & Hannan (1985)), $k(t)/\int k(t)\,d\omega(k)$ is a conditional density of $K$
given $t$. Consequently, by Remark 10.4.16 ibid., in the SELE problem the
Bayes estimate of $\varphi(K)$, say $\psi_\omega$, is a version of $E(\varphi(K)\,|\,t)$ and it is given by

(1) $\psi_\omega(t) = \int \varphi(K)\,k(t)\,d\omega(k) \big/ \int k(t)\,d\omega(k)$.
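
For a prior with finite support, such as the empirical distributions used in the compound problem, the Bayes estimate is a ratio of two finite sums. The sketch below is an illustration under assumptions: the function names, the choice $\varphi(\theta) = \theta$ as default, and the exponential density used in the example are not taken from the text.

```python
import math


def bayes_estimate(t, thetas, weights, density, phi=lambda th: th):
    """Posterior mean of phi(theta) given t for a finitely supported prior:
    sum_i phi(theta_i) k_i(t) w_i / sum_i k_i(t) w_i."""
    num = sum(phi(th) * density(th, t) * w for th, w in zip(thetas, weights))
    den = sum(density(th, t) * w for th, w in zip(thetas, weights))
    return num / den


def exp_density(theta, t):
    """Density theta * exp(-theta * t) on t >= 0 (illustrative family)."""
    return theta * math.exp(-theta * t) if t >= 0 else 0.0


# Two-point prior on {1, 3}; at t = 0 the densities are 1 and 3,
# so the estimate is (1*1 + 3*3) / (1 + 3) = 2.5.
print(bayes_estimate(0.0, [1.0, 3.0], [0.5, 0.5], exp_density))
```

Since the estimate is a weighted average of the support points, it always lies between the smallest and largest of them, which is why retracting estimators to the parameter interval costs nothing.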

Stability: Let .75 be parameterized by a set 9, i.e., Jo” = {K0 : 066}. Let
k 0 denote the density corresponding to K 0. Then for priors w and w’ on 9,

it follows by the deﬁnition of \I! (tj) in (1) that

f k,(t,-)d( w—w') f IpIIIJ-IIIWI _ fwk_(tj)dw' ;
fk.(tj)dw fk.(tj)d(w—w’) fk.(tj)dw’

 

 

 

(2) swap-we) =

this can be seen by addition and subtraction of ftpk.(tj)dw’/fk.(tj)dw on

lhs(2) and then rearrangement of the terms.

Remark 2.1: The simple identity in (2) turns out to be quite useful in
bounding the modiﬁed regret in compound problems. Let t1,t2,..., be a

sequence of independent random variables with K j = K 0. the distribution of
J

tj for each j , j=1,2, ..... . Let wn and wnj respectively denote the empiric
distribution of 01,02, ..... ,0n and 01’02""’0j—1’0j+1""’

can — wn j is degenerate at Qj. Since k 0 2 0, it therefore follows from this case

on for lgjgn. Note that


of (2) that
sup k0(t.)
. see 1
(3) \II t. -\II t. 5 diam 9 _—
| wn( JI ,nd JII vi 12,1, 1‘1“?
Let 5 = 99 . Then it follows from (3) that
i=1
(4) 52" \II (t.-\II t. 5 diam 9 f sup k A.

The integral in rhs(4) is ﬁnite in several interesting cases, e.g., when {K 0 :
Q E 9} is an exponential family with respect to a measure p and 9 is a

compact subset of the natural parameter space.

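Remark 2.2: Let $\mathcal{F} = \{F_\theta : \theta\in\Theta\}$ be a family of distribution functions on $R$. Let
$X$ and $Y$ be two independent random variables such that $X \sim F_\theta$ and $Y \sim G$. Let
$(Z,\Delta)$ be as defined in Section A.1. Then the joint distribution of $(Z,\Delta)$
belongs to $\{P_\theta : \theta\in\Theta\}$ where $P_\theta$ is given by rhs(1.1) with $F$ replaced by $F_\theta$.
The corresponding density $p_\theta$ of $P_\theta$ is given by rhs(1.2) with $f$ and $F$ replaced
by $f_\theta$ and $F_\theta$ respectively. Thus, from (1), the Bayes estimate of $\varphi(\theta)$ is
given by

(5) $\Psi_\omega(z,\delta) = \dfrac{\int \varphi\,p_\cdot(z,\delta)\,d\omega}{\int p_\cdot(z,\delta)\,d\omega}.$

If $1-G(z-) = 0$ and $\delta = 0$ then this reduces to $\int \varphi(\cdot)(1-F_\cdot(z))\,d\omega / \int (1-F_\cdot(z))\,d\omega$.
Since the set $\{(z,\delta) : 1-G(z-) = 0 \text{ and } \delta = 1\} \subset \{(z,\delta) : \int p_\cdot(z,\delta)\,d\omega = 0\}$ is
$\omega\circ P_\theta$ null and $\delta\in\{0,1\}$, it therefore follows that

(6) $\Psi_\omega(z,\delta) = \delta\,\dfrac{\int \varphi\,f_\cdot(z)\,d\omega}{\int f_\cdot(z)\,d\omega} + (1-\delta)\,\dfrac{\int \varphi\,(1-F_\cdot(z))\,d\omega}{\int (1-F_\cdot(z))\,d\omega}.$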
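
Formula (6) is easy to evaluate for a discrete prior. The following sketch assumes, purely for illustration, that $F_\theta$ is the exponential distribution with rate $\theta$ (so $f_\theta(z)=\theta e^{-\theta z}$ and $1-F_\theta(z)=e^{-\theta z}$) and that $\varphi(\theta)=\theta$. Neither the family nor the function name comes from the text.

```python
import math

def bayes_censored(z, delta, thetas, weights):
    """Formula (6) with phi(theta) = theta and F_theta exponential with rate theta."""
    if delta == 1:
        like = [th * math.exp(-th * z) for th in thetas]   # f_theta(z): uncensored case
    else:
        like = [math.exp(-th * z) for th in thetas]        # 1 - F_theta(z): censored case
    num = sum(w * th * l for w, th, l in zip(weights, thetas, like))
    den = sum(w * l for w, l in zip(weights, like))
    return num / den

thetas = [0.5, 1.0, 2.0]
weights = [0.3, 0.4, 0.3]
est_uncensored = bayes_censored(1.0, 1, thetas, weights)
est_censored_early = bayes_censored(0.5, 0, thetas, weights)
est_censored_late = bayes_censored(2.0, 0, thetas, weights)
print(est_uncensored, est_censored_early, est_censored_late)
```

In this example a later censoring time pulls the estimate toward smaller rates, and censoring at $z=0$ returns the prior mean, since a censored observation at zero carries no information about $\theta$.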

A.3 The Product Limit Estimator of Average Distribution Function; an L1
Bound for the Maximal Deviations on Intervals $(-\infty, z]$ for $z \in R$.

Let $\langle X_i, Y_i \rangle$, $i = 1, 2, \ldots, n$, be independent random vectors such that
$X_i$ and $Y_i$ are independent for each $i$. Let $F_i$ denote the distribution function
of $X_i$ for each $i$ and $G$ denote the common distribution function of $Y_i$,
$i = 1, 2, \ldots, n$. For notational simplicity, in this section only, distribution
function means right tail distribution function. Let $Z_i = X_i \wedge Y_i$ and
$\Delta_i = [X_i \le Y_i]$ and $\bar F = n^{-1}\sum_{k=1}^{n} F_k$. Denote the distribution function of $Z_i$ by $H_i$,
$n^{-1}\sum_{k=1}^{n} H_k$ by $\bar H$ and note that $H_k = F_k G$ (since $X_k$ and $Y_k$ are independent)
and, consequently,

(1) $\bar H = \bar F G.$

Let $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, Z_2, \ldots, Z_n$.
Let $\Delta_{(i)}$ be the concomitant of $Z_{(i)}$; the ties are partially resolved by ranking
the uncensored Z's ahead of the censored Z's. Based on $\langle Z_k, \Delta_k \rangle$ for
$k = 1, 2, \ldots, n$, we now define a Product Limit (PL hereafter) estimator $\hat F$ of $\bar F$
by

(2) $\hat F(z) = [z < Z_{(n)}] \prod_{1 \le i \le n-1} \left[\dfrac{n-i}{n-i+1}\right]^{[Z_{(i)} \le z,\ \Delta_{(i)} = 1]}.$

Note that $\hat F$ is a right continuous distribution function on $R$ constant except
for jumps at the $Z_{(i)}$ with $\Delta_{(i)} = 1$ or $i = n$.
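
A minimal sketch of the PL estimator in (2), under the right tail convention of this section, may help fix ideas. The function name and the toy data are hypothetical. Ties are broken by placing uncensored observations first, as described above.

```python
def pl_estimator(zs, deltas):
    """Return z -> F_hat(z) as in (2): a right-tail product limit estimate."""
    n = len(zs)
    # Order by Z, with uncensored (delta = 1) ranked ahead of censored on ties.
    order = sorted(range(n), key=lambda i: (zs[i], -deltas[i]))
    z_sorted = [zs[i] for i in order]
    d_sorted = [deltas[i] for i in order]

    def f_hat(z):
        if not z < z_sorted[-1]:          # indicator [z < Z_(n)]
            return 0.0
        prod = 1.0
        for i in range(1, n):             # i = 1, ..., n-1
            if z_sorted[i - 1] <= z and d_sorted[i - 1] == 1:
                prod *= (n - i) / (n - i + 1)
        return prod

    return f_hat

# Toy data: sorted, the pairs are (1, uncensored), (2, censored), (3, uncensored), (4, censored).
f_hat = pl_estimator([3.0, 1.0, 4.0, 2.0], [1, 1, 0, 0])
print([f_hat(z) for z in (0.5, 1.5, 2.5, 3.5, 4.0)])
```

The estimate stays at 1 before the first observation, drops only at uncensored observations, and is set to 0 from the largest observation onward.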

Theorem 3.1 below obtains L1 bounds for the sup distance between $\hat F$
and $\bar F$ on $(-\infty, z]$ $\forall\, z \in R$. The proof is based on exponential bounds in Földes
and Rejtő (1981) and in Singh (1975) (see Remark A.5.2).

Theorem (Földes and Rejtő (1981)): Suppose $F_i$ for each $i = 1, 2, \ldots, n$ and $G$
are continuous. Then for every

 

(3) >max .1. . 4_(z),
c {n H32 n1/2HH2]
(4) $P\left(\sup_{-\infty < t \le z} |\hat G(t) - G(t)| > \varepsilon\right) \le B_n(\varepsilon, \bar H(z), z)$

where
_2 11:22 _2 22 _126
Bn(c,a,z) = 6e 9n“ H X D + 1%; gm 3 + 2e2nce ﬁne a .

Remark 3.1: The above theorem is a special case of Theorem 3.1 of Földes
and Rejtő (1981) as given in their Remark 3.1.1. In that remark the authors
claim that the condition (iv) of the theorem there implies that $1/(\bar H^4\sqrt{n}) < \varepsilon$; it
is not clear to us why that is so. However, since Theorem 3.1 there is based
on several corollaries, it suffices to assume the condition (3) above to make
use of those corollaries. We use the bound in (4) in Theorem 3.1 below. The
assumption in (7) below makes the second term in (3) the maximum. We
apply the bound (4) for $\varepsilon$ greater than $\varepsilon_n \log n$ where

(5) $\varepsilon_n = \sqrt{L_n/(n \log n)}$

with $L_n \in R$. Condition (6) below means that $\varepsilon_n \log n$ is greater than or equal to the
second term of the maximum in (3) and, together with (7), lets us use the bound
in (4) when $\varepsilon > \varepsilon_n \log n$.

Theorem 3.1: Suppose $F_i$ for each $i = 1, 2, \ldots, n$ and $G$ are continuous. Let
$L_n$ and $z$ in $R$ be such that

(6) H(Z)EZ(Z)JE;W 2 4.
(7) 4,5 32(2) 2 11(2) and

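(8) $\bar H^3(z)\sqrt{L_n \log n} > 1.$

Then

(9) $G(z)\,E \sup_{-\infty < t \le z} |\hat F(t) - \bar F(t)| \le \varepsilon_n \log n + B_n(\varepsilon_n \log n, \bar H(z), z) + \dfrac{1}{\sqrt{n}}\left(\tfrac{1}{2}\sqrt{\log n} + e^2\right).$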

Proof: Let $\hat H(z) = n^{-1}\sum_{k=1}^{n} [Z_k > z]$ for $z \in R$. Let $\hat G$ denote the PL estimator
of $G$, i.e., $\hat G(z)$ = rhs(2) with the 1's in the exponents replaced by 0's. By
direct verification (Exercise 7.2.2, Shorack and Wellner (1986)), we verify the
PL representation of $\hat H$:

(10) $\hat H = \hat F \hat G.$

By subtracting (10) from (1) we obtain the following identity:

(11) $G(\bar F - \hat F) = (\bar H - \hat H) + \hat F(\hat G - G).$

Thus, since $G$ is non-increasing, it follows from here that

(12) $G(z) \sup_{-\infty < t \le z} |\hat F(t) - \bar F(t)| \le \sup_{-\infty < t \le z} |\hat H(t) - \bar H(t)| + \sup_{-\infty < t \le z} |\hat G(t) - G(t)|.$

By the Fubini Theorem, it follows that

(13) $E \sup_{-\infty < t \le z} |\hat G(t) - G(t)| \le \varepsilon_n \log n + \int_{\varepsilon_n \log n}^{1} P\left(\sup_{-\infty < t \le z} |\hat G(t) - G(t)| > \varepsilon\right) d\varepsilon.$

$B_n(c,a,z)$ is non-increasing with respect to $c$ if $c > 1/(a^3\sqrt{n})$ since its
third term is uniquely maximized at $1/(a^3\sqrt{n})$. The inequality in (8) means
that $\varepsilon_n \log n > 1/(\bar H^3(z)\sqrt{n})$. From (7) the maximum in (3) is the second
term. Consequently, if $\varepsilon > \varepsilon_n \log n$ then (3) follows since (6) makes $\varepsilon_n \log n$
no less than that maximum. Thus from (4) the integrand (and hence the
integral) in rhs(13) is bounded by $B_n(\varepsilon_n \log n, \bar H(z), z)$. Now (9) follows from
this weakening of (13), expectation in (11) and the L1 bound obtained for the
sup distance between $\hat H$ and $\bar H$ in Remark A.5.2. $\Box$
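
The PL representation (10), on which the proof rests, can be checked on simulated data. The sketch below is an illustration only: the exponential lifetimes and censoring times, the sample size, and the helper names are assumptions made here, and ties are ignored because the simulated values are continuous.

```python
import random

def pl(zs, deltas, event):
    """Right-tail PL estimate as in (2); event=1 gives F_hat, event=0 gives G_hat."""
    n = len(zs)
    pairs = sorted(zip(zs, deltas))       # no ties expected for continuous data

    def est(z):
        if not z < pairs[-1][0]:
            return 0.0
        prod = 1.0
        for i, (zi, di) in enumerate(pairs[:-1], start=1):
            if zi <= z and di == event:
                prod *= (n - i) / (n - i + 1)
        return prod

    return est

def empirical_tail(zs):
    """H_hat(z): proportion of the Z's exceeding z."""
    n = len(zs)
    return lambda z: sum(zk > z for zk in zs) / n

random.seed(0)
n = 25
xs = [random.expovariate(1.0) for _ in range(n)]   # lifetimes (assumed law)
ys = [random.expovariate(0.5) for _ in range(n)]   # censoring times (assumed law)
zs = [min(x, y) for x, y in zip(xs, ys)]
deltas = [int(x <= y) for x, y in zip(xs, ys)]

f_hat, g_hat, h_hat = pl(zs, deltas, 1), pl(zs, deltas, 0), empirical_tail(zs)
grid = [0.05 * m for m in range(120)]
max_gap = max(abs(h_hat(t) - f_hat(t) * g_hat(t)) for t in grid)
print(max_gap)
```

The gap is zero up to rounding because the product of the two estimates telescopes: below the largest observation every order statistic contributes exactly one factor, either to $\hat F$ or to $\hat G$.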


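Remark 3.2: When the $F_k$'s are continuous, w.p.1 the product in (2) is
unchanged if $(i)$ is replaced by $i$ and $n-i$ is replaced by $\sum_{l=1}^{n} [Z_l > Z_i]$.
Consequently $\hat F$ can be written as

(14) $\hat F(z) = [z < Z_{(n)}] \prod_{1 \le i \le n} \left[\dfrac{\sum_{l=1}^{n} [Z_l > Z_i]}{1 + \sum_{l=1}^{n} [Z_l > Z_i]}\right]^{[Z_i \le z,\ \Delta_i = 1]}.$

The above representation of $\hat F$ turns out to be useful for technical reasons in
Chapter 3. In view of applications to the compound problems of Chapters 3
and 4, we need a PL estimator of $\bar F_j = (n-1)^{-1}\sum_{k \ne j} F_k$ for $j = 1, 2, \ldots, n$. Denote
the $\max\{Z_1, Z_2, \ldots, Z_{j-1}, Z_{j+1}, \ldots, Z_n\}$ by $Z_{(nj)}$
and define $\hat F_j$ (analogous to $\hat F$) by

(15) $\hat F_j(z) = [z < Z_{(nj)}] \prod_{1 \le k \ne j \le n} \left[\dfrac{\sum_{l \ne j} [Z_l > Z_k]}{1 + \sum_{l \ne j} [Z_l > Z_k]}\right]^{[Z_k \le z,\ \Delta_k = 1]}.$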
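
The agreement between the rank form (2) and the counting form (14), and the leave-one-out estimator of (15), can be sketched as follows. The function names and the toy data are hypothetical, and the data are chosen without ties, which is the situation the remark describes.

```python
def pl_order(zs, deltas):
    """Rank form (2)."""
    n = len(zs)
    pairs = sorted(zip(zs, deltas))

    def est(z):
        if not z < pairs[-1][0]:
            return 0.0
        prod = 1.0
        for i, (zi, di) in enumerate(pairs[:-1], start=1):
            if zi <= z and di == 1:
                prod *= (n - i) / (n - i + 1)
        return prod

    return est

def pl_counts(zs, deltas):
    """Counting form (14): n - i is replaced by the number of Z's exceeding Z_i."""
    zmax = max(zs)

    def est(z):
        if not z < zmax:
            return 0.0
        prod = 1.0
        for zi, di in zip(zs, deltas):
            if zi <= z and di == 1:
                c = sum(zl > zi for zl in zs)
                prod *= c / (1 + c)
        return prod

    return est

def pl_leave_one_out(zs, deltas, j):
    """Estimator of (15): the counting form applied with observation j deleted."""
    return pl_counts(zs[:j] + zs[j + 1:], deltas[:j] + deltas[j + 1:])

zs = [3.0, 1.0, 4.0, 2.0, 5.0]
deltas = [1, 1, 0, 0, 1]
grid = [0.25 * m for m in range(24)]
f2, f14 = pl_order(zs, deltas), pl_counts(zs, deltas)
max_diff = max(abs(f2(t) - f14(t)) for t in grid)
f_loo = pl_leave_one_out(zs, deltas, 0)   # delete the observation Z = 3.0
print(max_diff, f_loo(1.5), f_loo(4.5))
```

The counting form needs no sorting, which is convenient when one observation is deleted at a time.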

 

A.4 On The Lower Bound For The Modiﬁed Regret in Theorem 2.6 of
Singh (1974)

Remark 4.1: Theorem 2.6 in Singh (1974) obtains a correct lower bound (5.16)
but the proof of it is incomplete. We mention this fact here because we have
similar theorems in Sections 2.7 and 2.9. The second inequality in (5.17) is
incorrect but can be corrected by reducing the right hand side by
$(\beta - \omega)\,P_{i+1}\{[\ell < X < \ell + \varepsilon/2]\,P_i[\hat f \le 0]\}$. We will next show how the proof can be
completed with this change.

We obtain an upper bound for $P_i[\hat f \le 0]$ and use this in inequality (5.17).

Let
X.—X
__ j 1 [u(X.)>0] . .

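and note that, conditional on $X$, $V_1, V_2, \ldots, V_i$ are i.i.d. since the $X$'s are. Let
$\sigma^2 = \mathrm{Var}(V_1)$. Then by Theorem V.4.14 of Petrov (1975),

(1) $P_i[\hat f \le 0] \le \Phi(-i^{1/2} P_1V_1/\sigma) + \dfrac{A\,i^{-1/2}\,P_1|V_1 - P_1V_1|^3}{\sigma^3\left(1 + |i^{1/2} P_1V_1/\sigma|^3\right)}$

where $A$ is the Berry-Esseen constant and $\Phi$ is the standard normal
distribution function. Since

$0 < \inf_{\alpha \le \omega \le \beta} C(\omega) \le \sup_{\alpha \le \omega \le \beta} C(\omega) < \infty,$

it follows from (5.2) that

(2) $0 < \inf_{\alpha \le \omega \le \beta}\ \inf_{\ell < t < \ell + \varepsilon/2} f_\omega(t) \le \sup_{\alpha \le \omega \le \beta}\ \sup_{\ell < t < \ell + \varepsilon/2} f_\omega(t) < \infty.$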

Let $X \in (\ell, \ell + \varepsilon/2)$ until the inequality (8). Since $h = h_i \downarrow 0$ as $i \uparrow \infty$,
$h < \varepsilon/2$ eventually. Since $\int_0^1 K_0(u)\,du = 1$ and $\omega \in [\alpha, \beta]$, it follows from (2)
that

(3) $c_1 h \le P_1V_1 = \int_X^{X+h} K_0\!\left[\dfrac{t-X}{h}\right] f_\omega(t)\,dt \le c_2 h$ eventually

for numbers $c_1 > 0$ and $c_2$ which depend on $\ell$. Since $K_0$ is bounded (which
is tacitly assumed in Theorem 2.6) and $\int_0^1 K_0^2(u)\,du > 0$, it follows from (2)
and (5.0) that

(4) $c_3 h \le P_1V_1^2 = \int_X^{X+h} K_0^2\!\left[\dfrac{t-X}{h}\right] \dfrac{f_\omega(t)}{u(t)}\,dt \le c_4 h$ eventually

for numbers $c_3 > 0$ and $c_4$. Since $\sigma^2 = P_1V_1^2 - (P_1V_1)^2$, it follows from (3)
and (4) that $\exists$ numbers $c_5 > 0$ and $c_6$ such that

(5) $c_5 h^{1/2} \le \sigma \le c_6 h^{1/2}$ eventually.

Thus from (3) and (5) it follows that $\exists$ a positive number $c_7$ such that

(6) $c_7 h^{1/2} \le (P_1V_1/\sigma)$ eventually.

Since $K_0$ is bounded, it follows from (5.0) and (3) and (5) that there is a
number $c_8 > 0$ such that

(7) $P_1|(V_1 - P_1V_1)/\sigma|^3 \le c_8 h^{-1/2}$ eventually.

By substituting the bounds in (6) and (7) in (1) we obtain that for a
number $c_9 > 0$

(8) $P_{i+1}\{[\ell < X < \ell + \varepsilon/2]\,P_i[\hat f \le 0]\} \le \Phi(-c_7 (ih)^{1/2}) + c_9 (ih)^{-2}.$

But rhs(8) is $O((ih)^{-2})$; this can be seen using the fact that $\Phi(-x) \sim \varphi(x)/x$
for large $x$, where $\varphi$ here denotes the standard normal density. Singh's lower
bound for the rhs(5.17) is $O((ih^3)^{-1/2})$; this can be
seen from the analyses (5.19) through (5.25) and a part of the note following
(5.25). Since the bound obtained here for $(\beta - \omega)$(lhs(8)) is of smaller order,
the analysis following (5.25) is not affected and thus (5.16) holds.
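
The tail equivalence $\Phi(-x) \sim \varphi(x)/x$ used above, and the resulting fact that the normal tail term is negligible next to a power such as $m^{-2}$, can be illustrated numerically. The sketch uses only the standard library. The value $c = 1$ and the test points are arbitrary choices made for this illustration.

```python
import math

def normal_tail(x):
    """Phi(-x) for the standard normal distribution, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Ratio of Phi(-x) to phi(x)/x; it increases to 1 as x grows.
ratios = [normal_tail(x) / (normal_pdf(x) / x) for x in (2.0, 5.0, 10.0, 20.0)]

# With m playing the role of ih and c = 1: Phi(-c * sqrt(m)) * m^2 becomes negligible.
m = 100.0
scaled_tail = normal_tail(math.sqrt(m)) * m ** 2
print(ratios, scaled_tail)
```

Since the normal tail decays exponentially in $m$, the polynomial term is the one that determines the order of the right hand side of a bound of this shape.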

A.5 On the Bound for the Expectation of Weighted Empiricals Based
on Independent Random Variables.

In proving Theorem 3.1 we have used Remark A.5.2 which obtains L1
bounds for the maximal deviations uniform on $R$ of the empirical distribution
from the average distribution function. In this section we show how we can
obtain that by simple application of exponential bounds in Singh (1975).
Singh's result concerned weighted empiricals.

Let $X_1, X_2, \ldots, X_n$ be independent real random variables. For $a \in [0,1]$,
let $F_j(x) = aP[X_j < x] + (1-a)P[X_j \le x]$ and $Y_j(x) = a[X_j < x] + (1-a)[X_j \le x]$.
Let $w_1, w_2, \ldots, w_n$ be non-negative numbers such that $\sum_{1}^{n} w_j^2 = 1$. Let $W =
\sum_{1}^{n} w_j$ and note that $1 \le W \le \sqrt{n}$. Let

$D_n = \sup_x \max_{N \le n} \left|\sum_{j=1}^{N} w_j\,(Y_j(x) - F_j(x))\right|.$

Inequality (7) of Singh (1975) asserts that

(1) $P[D_n > t] \le 4Wt\,e^{-2(t^2-1)}$ for every $t > 1$.

Remark 5.1: Since $W \le \sqrt{n}$, it is easy to see that rhs(1) is summable for
any sequence $t_n \ge K\sqrt{\log n}$ if $2K > \sqrt{3}$.

Lemma 5.1: For every $T > 0$,

$E\,D_n \le T + W e^{-2(T^2-1)}.$

Proof: Since $D_n \le W$ and $W \ge 1$, the above inequality is immediate if $T \le
1$. Now suppose $T > 1$. By the Fubini Theorem

$E\,D_n = \int_0^\infty P[D_n > t]\,dt \le T + \int_T^\infty P[D_n > t]\,dt.$

From (1) the integrand on the right hand side is bounded by $4Wt\,e^{-2(t^2-1)}$.
By evaluating the corresponding integral of this bound we obtain the assertion
of this lemma. $\Box$

Remark 5.2: Let $F^*$ be the empirical distribution of the $X_j$'s and let $\bar F$ be
the average of the $F_j$'s. For the case $a = 0$ and $w_1 = w_2 = \cdots = w_n = n^{-1/2}$ we
obtain from Lemma 5.1 that

(2) $\sqrt{n}\,E \sup_x |F^*(x) - \bar F(x)| \le T + \sqrt{n}\,e^{-2(T^2-1)}$ for every $T > 0$.

It is immediate with $2T = \sqrt{\log n}$ that the rhs(2) reduces to $\tfrac{1}{2}\sqrt{\log n} + e^2$.
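
The arithmetic in Remark 5.2 and the size of the resulting bound can be checked with a short simulation. This is a sketch under assumptions made here only: the $X_j$ are taken to be i.i.d. Uniform(0,1), so that $\bar F(x) = x$ on $[0,1]$, and the sample size, seed and number of replications are arbitrary.

```python
import math
import random

def rhs2(n, T):
    """Right-hand side of (2): T + sqrt(n) * exp(-2 (T^2 - 1))."""
    return T + math.sqrt(n) * math.exp(-2.0 * (T * T - 1.0))

def sup_deviation(sample):
    """sup_x |F*(x) - x| for a sample assumed to come from Uniform(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

random.seed(1)
n = 200
T = 0.5 * math.sqrt(math.log(n))                       # the choice 2T = sqrt(log n)
closed_form = 0.5 * math.sqrt(math.log(n)) + math.e ** 2

reps = 300
mean_dev = sum(
    math.sqrt(n) * sup_deviation([random.random() for _ in range(n)])
    for _ in range(reps)
) / reps
print(rhs2(n, T), closed_form, mean_dev)
```

With this choice of $T$ the exponential term contributes exactly $e^2$, so the bound is far from tight for moderate $n$. Its value lies in holding for independent, not necessarily identically distributed, observations.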

A.6: Some Properties of Compact Subfamilies of the Standard Exponential
Family.

Let $\mu$ be a measure on $R$. Let $\mathcal{I} = \{\theta : \int e^{\theta x}\,d\mu < \infty\}$. Let $d(\theta) =
(\int e^{\theta x}\,d\mu)^{-1}$ and set $f_\theta(x) = d(\theta)e^{\theta x}$ for $\theta \in \mathcal{I}$. The family of probability
densities $\{f_\theta : \theta \in \mathcal{I}\}$ is called a standard exponential family. The associated
family of distributions $F_\theta$ is also called a standard exponential family. The
set $\mathcal{I}$ is called the natural parameter space. The Hölder inequality shows that
$\mathcal{I}$ is convex. General exponential families are extensively studied in a
monograph by Brown (1986).

In the interests of Chapter 3 we below state and prove some of the
properties of this exponential family when the parameter space $\Theta$ is a closed
interval $[\alpha, \beta]$.

1. $d$ is continuous on $\Theta$.

Let $\theta_n$ be a sequence such that $\theta_n \to \theta$. Since $\sup_{\theta\in\Theta} e^{\theta x} \le e^{\alpha x} \vee e^{\beta x}$,
$d(\theta_n) \to d(\theta)$ follows by the D.C.T.

2. $d$ is positive on $\Theta$ and consequently $0 < \inf_{\theta\in\Theta} d(\theta) \le \sup_{\theta\in\Theta} d(\theta) < \infty$.

By Hölder's inequality, $\log d$ is concave. Therefore, $\inf_{\theta\in\Theta} d(\theta) =
d(\alpha) \wedge d(\beta)$. That $\sup_{\theta\in\Theta} d(\theta) < \infty$ follows from the continuity of $d$.

3. $\log f_\theta(x) = \theta x + \log d(\theta)$ is concave as a function of $\theta$ since $\log d$ is,
from the proof of 2.

4. $m = \sup_{\theta\in\Theta} f_\theta \in L_1(\mu)$.

Follows immediately from the inequality $m(x) \le \sup_{\theta\in\Theta} d(\theta)\{e^{\alpha x} \vee e^{\beta x}\}$.
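
Properties 1 and 2 can be illustrated on a concrete case. The sketch below assumes (this choice is not from the text) that $\mu$ is Lebesgue measure on $[0,1]$, for which $\int e^{\theta x}\,d\mu = (e^\theta - 1)/\theta$ and hence $d(\theta) = \theta/(e^\theta - 1)$, with $d(0) = 1$. The interval $[\alpha,\beta] = [-2, 3]$ and the grid are arbitrary.

```python
import math

def d(theta):
    """d(theta) = 1 / integral_0^1 exp(theta * x) dx for Lebesgue measure on [0, 1]."""
    return 1.0 if theta == 0.0 else theta / math.expm1(theta)

alpha, beta = -2.0, 3.0
grid = [alpha + (beta - alpha) * m / 200 for m in range(201)]

# Concavity of log d: second differences on an equally spaced grid should be negative.
log_d = [math.log(d(th)) for th in grid]
second_diffs = [log_d[m - 1] - 2.0 * log_d[m] + log_d[m + 1] for m in range(1, 200)]

# Property 2: the infimum over the interval is attained at an endpoint.
inf_d = min(d(th) for th in grid)
print(max(second_diffs), inf_d, min(d(alpha), d(beta)))
```

Here $d$ happens to be decreasing, so the infimum sits at $\beta$. In general concavity of $\log d$ only guarantees that it sits at one of the two endpoints.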

BIBLIOGRAPHY

Brown, Lawrence D. (1986). Fundamentals of Statistical Exponential
Families with Applications in Statistical Decision Theory. IMS
Lecture Notes - Monograph Series 9.

Campbell, G. and Földes, A. (1984). A generalized product limit
estimator for weighted distribution functions based on censored
data. Statistics & Decisions, Supplement Issue No. 1, 87-100.

Datta, Somnath (1988). Asymptotically optimal Bayes compound and
empirical Bayes estimators in exponential families with compact
parameter space. Ph.D. Thesis, Department of Statistics and
Probability, Michigan State University.

Devroye, Luc (1987). A Course in Density Estimation. Birkhäuser,
Boston.

Diehl, Sabine and Stute, Winfried (1988). Kernel density and hazard
function estimation in the presence of censoring. J. Multivariate
Anal. 25, 299-310.

Földes, Antonia and Rejtő, Lidia (1981). Asymptotic properties of the
nonparametric survival curve estimators under variable censoring.
Proc. of First Pannon Conf. in Statist. Springer Verlag.

Földes, A., Rejtő, L. and Winter, B.B. (1980). Strong consistency
properties of nonparametric estimators for randomly censored data
I: The product limit estimator. Period. Math. Hungar. 11, No. 3,
223-250.

Földes, A., Rejtő, L. and Winter, B.B. (1981). Strong consistency
properties of nonparametric estimators for randomly censored data
II: Estimation of density and failure rate. Period. Math. Hungar.
12, No. 1, 15-29.

Fabian, Vaclav and Hannan, James (1985). Introduction to Probability
and Mathematical Statistics. John Wiley & Sons.

Gill, R.D. (1983). Large sample behavior of the product limit estimator
on the whole line. Ann. Statist. 11, No. 1, 49-58.

Gilliland, Dennis Crippen (1966). Approximations to Bayes risk in
sequences of non-finite decisions problems. Ph.D. Thesis,
Department of Statistics and Probability, Michigan State University.

Hannan, James (1957). Approximations to Bayes risk in repeated play.
Contributions to the Theory of Games. Ann. of Math. Studies, 3,
No. 39, Princeton University Press, 97-139.


Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of
the compound decision problem for two completely specified
distributions. Ann. Math. Stat. 26, No. 1, 37-51.

Hewitt, Edwin and Stromberg, Karl (1969). Real and Abstract Analysis.
Springer-Verlag.

Hoeffding, Wassily (1963). Probability inequalities for sums of bounded
random variables. J. Amer. Statist. Assoc. 58, 13-30.

Kaplan, E.L. and Meier, P. (1958). Non-parametric estimation from
incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.

Koul, H., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with
randomly right-censored data. Ann. Statist. 9, No. 6, 1276-1288.

Lo, S.H., Mack, Y.P. and Wang, J.L. (1989). Density and hazard rate
estimation for censored data via strong representations of the
Kaplan-Meier estimator. Prob. Th. Rel. 80, 461-473.

Loève, M. (1963). Probability Theory. Third Edition. Van Nostrand.

Mielniczuk, Jan (1986). Some asymptotic properties of kernel estimators
of a density function in case of censored data. Ann. Statist. 14,
No. 2, 766-773.

Padgett, W.J. and McNichols, Diane T. (1984). Nonparametric density
estimation from censored data. Comm. Statist. Theory Methods 13, No. 13,
1581-1611.

Padgett, W.J. and Thombs, L.A. (1989). A smooth nonparametric
quantile estimator from right-censored data. Stat. Prob. L. 7,
113-121. North Holland.

Petrov, V.V. (1975). Sums of Independent Random Variables (English
translation of the original book (1972), by A. A. Brown). Springer
Verlag.

Robbins, Herbert (1951). Asymptotically sub-minimax solutions of
compound decision problems. Proc. Second Berkeley Symp. Math.
Statist. Prob. 131-148, Univ. of California Press.

Singh, Radhey Shyam (1974). Estimation of average of $\mu$ densities and
sequence-compound estimation in exponential families. Ph.D.
Thesis, Department of Statistics and Probability, Michigan State
University.

Singh, Radhey S. (1975). On the Glivenko-Cantelli Theorem for weighted
empiricals based on independent random variables. Ann. Probab. 3,
No. 2, 371-374.

Shorack, Galen R. and Wellner, Jon A. (1986). Empirical Processes with
Applications to Statistics. John Wiley & Sons.


Susarla, V. and Van Ryzin, John (1986). Empirical Bayes procedures
with censored Data. Adaptive Statistical Procedures and Related
Topics - IMS Lecture Notes- Monograph Series 8, 219—234.

Wang, W. (1983). Statistical inference for randomly censored linear
regression model. Ph.D. Thesis, Department of Statistics and
Probability, Michigan State University.
