ESTIMATION OF DERIVATIVES OF AVERAGE OF μ-DENSITIES AND SEQUENCE-COMPOUND ESTIMATION IN EXPONENTIAL FAMILIES

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
RADHEY SHYAM SINGH
1974

This is to certify that the thesis entitled ESTIMATION OF DERIVATIVES OF AVERAGE OF μ-DENSITIES AND SEQUENCE-COMPOUND ESTIMATION IN EXPONENTIAL FAMILIES presented by Radhey Shyam Singh has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor

ABSTRACT

ESTIMATION OF DERIVATIVES OF AVERAGE OF μ-DENSITIES AND SEQUENCE-COMPOUND ESTIMATION IN EXPONENTIAL FAMILIES

By Radhey Shyam Singh

Let X_1,...,X_n be independent random variables with μ-densities f_1,...,f_n, where μ is a σ-finite measure dominated by Lebesgue measure on the real line R. With a fixed integer v ≥ 0, we exhibit kernel estimators of f̄^(v) = n^{-1} Σ_1^n f_j^(v). For any subset D of R, we give sufficient and (somewhat) necessary conditions for asymptotic unbiasedness (asy. u.), almost sure (a.s.) and mean square (m.s.) consistencies, each uniform on D. We also prove integrated mean square (i.m.s.) consistency, and obtain convergence rates and exact rates for the asy. u., m.s. and i.m.s. consistencies. When f̄^(r), for an integer r > v, exists on D, we show that the error term is O((n^{-1} log n)^{(r-v)/2(1+r)}) with probability one, while the m.s. and i.m.s. errors are O(n^{-2(r-v)/(1+2r)}), each uniform on D. The vector (f̂^(v)(x_1),...,f̂^(v)(x_m)) is shown to be asymptotically m-variate normal. We extend this estimation to the multivariate case. Specifically, estimation of mixed partial derivatives of the average of p-variate μ-densities has been considered.

We make applications of f̂^(v) to sequence-compound squared error loss estimation (SELE). With an observation on X distributed according to P_ω in an exponential family wrt μ, and ω ∈ Ω, the natural parameter space, we take SELE of θ(ω) = ω, e^ω or ω^{-1} as our component problem. With (X_1,...,X_n) ~ P_n = P_{ω_1} ×...× P_{ω_n}, ω ∈ Ω^n, a (sequence-compound) estimator of θ = (θ(ω_1),...,θ(ω_n)) is φ = (φ_1,...,φ_n) with φ_i (X_1,...,X_i)-measurable. With a δ > 0, G_n the empiric distribution function of ω_1,...,ω_n, and R(G_n) the Bayes risk at G_n in the component problem, we say φ has a rate δ at θ if the modified regret n^{-1} Σ_1^n E(φ_i - θ(ω_i))^2 - R(G_n) is O(n^{-δ}). With α_i < β_i in Ω such that -α_i and β_i are increasing in i, we exhibit estimators (of θ) having certain rates uniformly in ω ∈ ×_1^n [α_i, β_i]. These rates depend on the speed at which |α_n| ∨ |β_n| ↑ ∞ as n ↑ ∞. When α_i, β_i are constants wrt i and satisfy certain conditions, we exhibit a divided difference estimator of ω with a rate 1/5, and kernel estimators (for each integer r > 0) of θ with rates (r-1)/(1+2r), r/(1+2r) or (r-ε)/(1+2r) (the last for every ε > 0) according as θ(ω) = ω, e^ω or ω^{-1}, where, for the case θ(ω) = ω, r > 1. When θ(ω) = ω and ω has identical components, we show that rates with the divided difference and the kernel estimators of ω are near, but cannot be more than, 2/5 and 2(r-1)/(1+2r), respectively.
ESTIMATION OF DERIVATIVES OF AVERAGE OF μ-DENSITIES AND SEQUENCE-COMPOUND ESTIMATION IN EXPONENTIAL FAMILIES

By
Radhey Shyam Singh

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1974

TO MY PARENTS

ACKNOWLEDGEMENTS

I wish to express my deep gratitude to Professor James Hannan for the patience he accorded me in the preparation of this thesis. His careful criticism and invaluable suggestions aided greatly in improving and simplifying virtually all of the results in the thesis. In addition, among many others who helped, I wish to record my thanks to Professor Dennis Gilliland for reading and commenting on a difficult rough draft; to Mrs. Noralee Barnes for her excellent typing; and to my wife for her great understanding and endurance. Finally I would like to thank the Department of Statistics and Probability at Michigan State University for the generous support, financial and otherwise, during my stay at Michigan State University.

TABLE OF CONTENTS

Chapter 0   INTRODUCTION
    0.1  Estimation of Derivatives of the Average of Densities
    0.2  Sequence-Compound SELE with Applications of f̂^(v)
    0.3  Some Notational Conventions

Chapter 1   NON-PARAMETRIC ESTIMATION OF DERIVATIVES OF THE AVERAGE OF n μ-DENSITIES, AND CONVERGENCE RATES IN n
    1.0  Introduction
    1.1  Estimation of f̄^(v) and the Main Assumption
    1.2  Asymptotic Unbiasedness and the Exact Rate
    1.3  Strong Consistency with Rates
    1.4  Variance, Covariance and Asymptotic Normality
    1.5  Mean Square and Integrated Mean Square Consistencies with the Exact Rates
    1.6  Estimation of Mixed Partial Derivatives of the Average of Multivariate μ-Densities

Chapter 2   CONVERGENCE RATES IN SEQUENCE-COMPOUND SQUARED ERROR LOSS ESTIMATION OF CERTAIN UNBOUNDED FUNCTIONALS IN EXPONENTIAL FAMILIES
    2.0  Introduction
    2.1  A Bound for the Modified Regret
    2.2  Some Assumptions and Notations
    2.3  A Divided Difference Estimator of ω with a Rate 1/5
    2.4  Kernel Estimators with Rates Near 1/2 when θ(ω) = ω, e^ω or ω^{-1}
    2.5  Rates Near the Best Possible Rates with the Divided Difference and the Kernel Estimators of ω with Identical Components
    2.6  The Divided Difference Versus the Kernel Estimators

APPENDIX
    A.1  ON GLIVENKO-CANTELLI THEOREM FOR THE WEIGHTED EMPIRICALS BASED ON INDEPENDENT RANDOM VARIABLES
    A.2  A BOUND FOR THE v-th MEAN OF THE BOUNDED DIFFERENCE OF TWO RANDOM RATIOS

BIBLIOGRAPHY

0. INTRODUCTION

In this thesis we consider estimation of derivatives of the average of densities with applications to sequence-compound squared error loss estimation (SELE).

0.1 Estimation of Derivatives of the Average of Densities.
Estimation of a Lebesgue-density, hereafter L-density, has been studied by various authors, and a variety of methods have been used: for example, Watson and Leadbetter (1963) and Nadarya (1965) used the kernel method first introduced by Rosenblatt (1956) and studied in detail by Parzen (1962); Cencov (1962), Schwartz (1967), Kronmal and Tarter (1968), and Watson (1969) used the orthogonal series method; Weiss and Wolfowitz (1967), Rao (1969), and Wegman (1969) used maximum likelihood methods; Van Ryzin (1970) and Wahba (1971) used, respectively, histogram and polynomial (Lagrange) interpolation methods. Estimation of derivatives of an L-density has also been considered by Bhattacharya (1967) and Schuster (1969).

Estimation of non-Lebesgue densities and their derivatives arises in empirical Bayes problems, while that of the averages of non-Lebesgue densities and their derivatives arises in compound decision problems. Yu (1970), (Section 2 of the appendix), exhibits kernel estimators of a μ-density and its derivative, where dμ = u(x)dx and, for some a ≥ -∞, u(x) > 0 iff x > a. He gives rates for mean square errors (m.s.e.) at each point on the real line R. Samuel (1965), (Section 6), exhibits kernel estimators of the average of L-densities and, under uniform equicontinuity (hence, necessarily uniform equiboundedness) of the densities on a subset D of R, she proves asymptotic unbiasedness (asy. u.) and weak consistency, both uniform on D. Susarla (1970), (Section 1.3), exhibits kernel estimators of the average, and of its first partial derivatives, of m-variate normal densities with known covariance and uniformly bounded unknown means, and obtains rates for m.s.e. uniform on R^m. Samuel uses Parzen-type kernels, while Yu and Susarla use those of Johns and Van Ryzin (1972).

We consider here non-parametric estimation of derivatives of the average of non-Lebesgue densities. Let μ be a σ-finite measure with density u wrt Lebesgue measure on R. Let X_1,...,X_n be independent random variables with X_j having a μ-density f_j. With r ≥ v ≥ 0 fixed integers, we exhibit kernel estimators f̂^(v)(x), depending on X_1,...,X_n, u and r, of f̄^(v) = n^{-1} Σ_1^n f_j^(v). (If u were known to be at least as smooth as the f_j were, we would estimate derivatives of the average of the L-densities u f_j directly.)

In the remainder of this section we describe the main results contained in Chapter 1. Bounds obtained here are quite explicit. We make almost no assumption on u for some of the results on asy. u., a.s., m.s. and integrated mean square (i.m.s.) consistencies. For any subset D of R and any h = h_n ↓ 0 as n ↑ ∞, if sup_{x∈D} h^{-1} ∫_x^{x+h} |f̄^(v)(t) - f̄^(v)(x)| dt → 0 as n → ∞, and if, in case v > 0, the v-th order Taylor expansion of f̄(x + hy) about x with integral form of the remainder exists for all 0 < y < 1 and for each x in D, then, under certain boundedness conditions on 1/u, asy. u., a.s. and m.s. consistencies, uniform on D, are proved (in Sections 2, 3 and 5, respectively). (Thus, contrary to the assumption made for similar results in most of the papers on the subject, continuity of f̄^(v) at the estimation point is not needed.) In Section 4, we obtain rates and the exact rate for var(f̂^(v)) and prove the asymptotic normality of the vector (f̂^(v)(x_1),...,f̂^(v)(x_m)). In Section 5, we prove i.m.s. consistency. Under certain boundedness conditions on 1/u, the difference of f̂^(v) and its expectation converges to zero a.s. and in second mean; hence the three properties, asy. u., a.s. and m.s. consistency of the estimator, become equivalent.
Sufficient and (somewhat) necessary conditions for asy. u. (and hence for a.s. or m.s. consistency) uniform on D are also given in Section 2. These, specialized to f_j ≡ f and D = (a,∞) for an a ≥ -∞, become: if ∫_a^∞ f < ∞, then f̂^(v) is asy. unbiased uniformly on (a,∞) iff f^(v) is uniformly continuous there.

When r > v, and, for all 0 < y < 1 and for each x in D, f̄(x + hy), with h as indicated above, has an r-th order Taylor expansion about x with integral form of the remainder, then, under certain boundedness conditions on h^{-1} ∫_x^{x+h} |f̄^(r)(t)| dt and on 1/u on D, we obtain rates for the various convergences. In Section 2, we obtain rates and the exact rate for the bias term uniform on D. The result giving rates, specialized to the i.i.d. case with r = v + 1, u ≡ 1 and D = R, improves the corresponding one of Bhattacharya (1967), (see Remark 2.5). In Section 3, we show that the error term is O((n^{-1} log n)^{(r-v)/2(1+r)}) a.s. as n ↑ ∞, uniformly on D. This result, specialized to the i.i.d. case with r = v + 1, u ≡ 1 and D = R, improves the corresponding one obtained by Schuster (1969), (see Remark 3.3). Rates and exact rates for the m.s. error uniform on D and for the i.m.s. error are obtained in Section 5. These rates are shown to be O(n^{-2(r-v)/(1+2r)}) as n ↑ ∞. The results concerning bounds on the m.s. and i.m.s. errors, specialized to f_j ≡ f, u ≡ 1 and v = 0, improve the corresponding ones of Parzen (1962), Schwartz (1967) and Wahba (1971), (see Remarks (5.1), (5.2) and (5.3)), though only Schwartz considered i.m.s. consistency.

In Section 6, we estimate mixed partial derivatives of the average of multivariate μ-densities. Specifically, we exhibit kernel estimators of f̄^(v_1,...,v_m)(x) = ∂^{v_1+...+v_m} f̄(x)/(∏_1^m ∂x_i^{v_i}), where x ∈ R^m, f̄ = n^{-1} Σ_1^n f_j and the f_j are m-variate μ-densities. These estimators have asymptotic properties analogous to those possessed by the estimators prescribed in the univariate case. We verify some of these related to asy. u., m.s. and a.s. consistencies, each with and without rates.

0.2 Sequence-Compound SELE with Applications of f̂^(v).

In Chapter 2, we deal with sequence-compound SELE of certain unbounded functionals in exponential families. We use the estimators f̂^(v) in order to exhibit certain sequence-compound estimators whose modified regret converges to zero with certain rates.

Suppose 𝒫 = {P_ω | ω ∈ Ω} is a family of probability measures on R, and the component problem is SELE of real θ(ω). The sequence-compound problem consists of n repetitions of the component problem with the loss taken to be the average of the component losses. Thus one has ω = (ω_1,...,ω_n) ∈ Ω^n and (X_1,...,X_n) ~ P_{ω_1} ×...× P_{ω_n}. The i-th component of a (sequence-compound) estimator φ = (φ_1,...,φ_n) of θ = (θ_1,...,θ_n), where θ_j abbreviates θ(ω_j), is allowed to depend on (X_1,...,X_i).

With G_n, the empiric distribution function of ω_1,...,ω_n, and R(G_n), the Bayes risk versus G_n in the component problem, let

    D_n(ω,φ) = n^{-1} Σ_1^n E(θ_j - φ_j)^2 - R(G_n).

D_n(ω,φ) is called the modified regret of φ, and is often taken as a standard for evaluating compound procedures, (e.g., Hannan (1956), (1957), Samuel (1963), (1965), Gilliland (1966), (1968), Johns (1967), and Susarla (1970); of course with varying component problems). If δ > 0 and D_n(ω,φ) = O(n^{-δ}) as n → ∞, we will say φ has a rate δ (at θ).
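To fix ideas, the modified regret can be computed numerically in a toy version of the component problem. The sketch below (the routine names, the Monte Carlo approximation of R(G_n) and the naive estimator are ours and are only illustrative, not part of the development) uses the normal-means component problem that appears in the next paragraph in connection with Gilliland (1966): X ~ N(ω,1), θ the identity, squared error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_response(x, atoms):
    # E[omega | X = x] when omega ~ G_n (uniform on the atoms omega_1,...,omega_n)
    # and X | omega ~ N(omega, 1): the Bayes estimate versus G_n under squared error.
    w = np.exp(-0.5 * (x - atoms) ** 2)
    return float(np.sum(atoms * w) / np.sum(w))

def bayes_risk(atoms, reps=20000):
    # Monte Carlo approximation of R(G_n), the Bayes envelope evaluated at G_n.
    om = rng.choice(atoms, size=reps)
    x = om + rng.standard_normal(reps)
    est = np.array([bayes_response(xi, atoms) for xi in x])
    return float(np.mean((est - om) ** 2))

def modified_regret(omega, estimator, reps=2000):
    # n^{-1} sum_j E(theta_j - phi_j)^2 - R(G_n), the expectation being
    # approximated over Monte Carlo replications of (X_1,...,X_n).
    n = len(omega)
    total = np.zeros(n)
    for _ in range(reps):
        x = omega + rng.standard_normal(n)
        total += (estimator(x) - omega) ** 2   # phi_j may use X_1,...,X_j only
    return float(total.mean() / reps - bayes_risk(omega))

omega = rng.uniform(-2.0, 2.0, size=50)        # an arbitrary parameter sequence
print(modified_regret(omega, lambda x: x))     # naive phi_j = X_j: about 1 - R(G_n)
```

For the naive choice φ_j = X_j the modified regret is approximately 1 - R(G_n), which does not tend to zero; the compound estimators constructed in Chapter 2 are designed to drive it to zero at the rates described below.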
In the references cited in the next paragraph, Ω is a bounded interval and, except in the case of Samuel, the rates are uniform in ω ∈ Ω^n. When θ(ω) = e^ω and 𝒫 is an exponential family satisfying certain conditions, Samuel (1965) exhibits estimators φ and shows that D_n(ω,φ) → 0 for each ω as n → ∞. When the components of ω are means of normal densities with variances unity, and θ is the identity, Gilliland (1966), (Chapter 3), obtains an estimator with a rate 1/5. Extending Gilliland's work to the m-variate case, Susarla (1970), (Section 1.4), exhibits, for each integer r > 1, estimators with rates (r-1)/2(m+r+1), and thus improves Gilliland's result. When 𝒫 is a certain family of discrete distributions and the component problem is linear loss two-action, Johns (1967) prescribes an estimator with a rate 1/2. The same rate, 1/2, is achieved by estimators prescribed by Gilliland (1968) in sequence-compound SELE of θ in certain discrete exponential families.

For our main results, 𝒫 is an exponential family wrt μ, where μ is a σ-finite measure with density u wrt Lebesgue measure on R such that, for an a ≥ -∞, u(x) > 0 iff x > a. The assumption that u(x) > 0 iff x > a is imposed in various papers either on empirical Bayes or on compound problems in exponential families, (e.g., Samuel (1965), (Section 6), Yu (1970), (Chapters 1 and 2), and Johns and Van Ryzin (1972)). In the case of Gilliland (1966), (Chapter 3), and in the univariate version of Susarla (1970), (Chapter 1), u is the standard normal density function. In each of the papers cited in the preceding paragraph, and in the paper of Hannan and Macky (1971), u is at least continuous on {u > 0}. We, instead, make certain assumptions on (local) boundedness of 1/u. In all the papers on compound decision problems so far available in the literature, Ω is assumed to be bounded. We relax this by taking Ω to be the natural parameter space. However, our assumptions restrict the speed at which max_{1≤j≤n} |ω_j| grows as n ↑ ∞.

We will now describe the main results of Chapter 2. We have treated only the cases θ(ω) = ω, e^ω or ω^{-1}. (The cases of ω^k, e^{Lω} or ω^{-m}, where k and m are positive integers and 0 < L < ∞, can be treated analogously.) For α_i < β_i in Ω for all i ≥ 1, with -α_i ↑ and β_i ↑, we exhibit estimators with certain rates. These rates are uniform in ω ∈ ×_1^n [α_i, β_i], and depend on how max_{1≤j≤n}(|α_j| ∨ |β_j|) grows as n ↑ ∞. The rates below are, for the sake of convenience, indicated only for the cases when α_i and β_i are constants wrt i and satisfy certain conditions.

We use the ideas of Gilliland (1966), (Chapter III), and exhibit an estimator of ω based on a divided difference estimator of (log f̄)^(1), where f̄ = n^{-1} Σ_1^n f_j and f_j is a μ-density of P_{ω_j}. This estimator is shown, in Theorem 1, to achieve a rate 1/5.

We use the estimators (introduced in the preceding section) of f̄^(v) to obtain certain kernel estimators of θ when θ(ω) = ω, e^ω or ω^{-1}. For each integer r > 1, we exhibit kernel estimators of ω which are shown, in Theorem 2, to achieve a rate (r-1)/(1+2r). When the ω_j's are means of normal densities, our estimators are preferable, for various reasons (see Remark 4.3), to the corresponding ones of Susarla (1970), (Section 1.4). For the case θ(ω) = e^ω, we obtain, for each integer r > 0, kernel estimators which are shown, in Theorem 3, to have a rate r/(1+2r), and thus improve (rate-wise) Theorem 6 of Samuel (1965), (also, see Remark 4.5).
When θ(ω) = ω^{-1}, we exhibit, for each integer r > 0, kernel estimators which are shown, in Theorem 4, to achieve a rate (r-ε)/(1+2r) for any ε > 0. The result here with u(x) = (Γ(τ))^{-1} x^{τ-1}[x > 0], τ > 0, generalizes and improves the main result of Section 2.1 of Susarla (1970), (see Remark 4.6).

In Theorems 5 and 6, we show that, when θ is the identity and ω has identical components, rates with the divided difference and the kernel estimators are near, but cannot be more than, 2/5 and 2(r-1)/(1+2r), respectively.

Finally, when θ is the identity, a comparison between the divided difference estimator and the kernel estimator is made in Section 6. The kernel estimator with r > 6 is preferable to the divided difference estimator in the sense that sup_ω |D_n(ω,·)| for the former converges to 0, as n → ∞, faster than it does for the latter.

0.3 Some Notational Conventions

We suppress the arguments of functions whenever it is convenient not to exhibit them. We denote elementary functions by their values and, except for emphasis, do not display the dummy variables of integrations. The indicator function of a set A is denoted by A itself, or by [A]. For any measure ξ, the ξ-integral of y is denoted by ξy, ξ(y) or ξ[y]. We abbreviate the space L_p(R) to L_p, with 1 ≤ p ≤ ∞, the L_p-norm to ‖·‖_p, g(t) - g(x) to g]_x^t, and, occasionally, sup_{t∈A} |g(t)| to ‖g‖_A. The symbol ≜ indicates that the equation holds by definition, or is a defining one. The symbol ∎ is used throughout to signal the end of a proof.

CHAPTER 1

NON-PARAMETRIC ESTIMATION OF DERIVATIVES OF THE AVERAGE OF n μ-DENSITIES, AND CONVERGENCE RATES IN n

1.0 Introduction.

Let μ be a σ-finite measure, dominated by Lebesgue measure on the real line R. Let X_1,...,X_n be independent real valued random variables with X_j distributed according to P_j << μ. With u, a fixed determination of dμ/dt, let f_j(t) = (u(t))^{-1} lim_{ε↓0} ε^{-1} ∫_t^{t+ε} dP_j if the limit exists for all j and u(t) > 0, and 0 otherwise. (From the properties of Lebesgue points of a function, see pp. 255-256 of Natanson (1955), all of the above limits exist a.e. Moreover, if f_j^* is a determination of dP_j/dμ, then almost every point is a Lebesgue point of u f_j^*, and hence f_j = f_j^* a.e.) Let f_j^(i) be the i-th order derivative of f_j. For a fixed v ≥ 0, we want to estimate f̄^(v) = n^{-1} Σ_1^n f_j^(v).

In Section 1, we exhibit a class of kernel estimators f̂^(v) of f̄^(v), and discuss the main assumption to be made in later sections. We obtain results on the bias in Section 2, on the error of the estimate in Section 3, and on the mean square and integrated mean square errors in Section 5. In Section 4, we prove the asymptotic normality of (f̂^(v)(x_1),...,f̂^(v)(x_m)). In Section 6, we treat the multivariate version of the problem; specifically, for x = (x_1,...,x_m) in R^m, we estimate f̄^(v_1,...,v_m)(x) = ∂^{v_1+...+v_m} f̄(x)/(∏_1^m ∂x_i^{v_i}), where f̄ is the average of n m-variate μ-densities. Unless stated otherwise, results are obtained at a fixed point x.

1.1 Estimation of f̄^(v) and the Main Assumption.

Let 𝒦 be the class of all real valued Borel-measurable functions on R vanishing off (0,1). For an integer r > v, let 𝒦_r^v ⊂ 𝒦 be such that if K ∈ 𝒦_r^v, then

(1.0)    k_i ≜ (i!)^{-1} ∫ y^i K(y) dy = [i = v],    i = 0,1,...,r-1.

Denote ∫ |y^i K(y)| dy / i! by |k|_i. The set 𝒦_r^v is non-empty, since it contains the v-th element (the one dual to y^v/v!) of the dual basis for the subspace of L_1(0,1) with basis {1, y/1!,...,y^{r-1}/(r-1)!}. Define 𝒦_v ≜ 𝒦_{v+1}^v. Let 0 < h ≜ h_n ≤ 1 be such that h_n ↓ 0 as n ↑ ∞.
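Before the estimator is written down formally in (1.1)-(1.2) below, a small numerical sketch may help; it is only an illustration (the function names, the toy data and the bandwidth constants are ours, and u ≡ 1 is taken for simplicity). The first routine solves the linear system behind the dual-basis remark above to produce a polynomial kernel in 𝒦_r^v, and the second evaluates a kernel estimate of f̄^(v)(x) of the form (nh^{v+1})^{-1} Σ_j K((X_j - x)/h)[u(X_j) > 0]/u(X_j).

```python
import numpy as np
from math import factorial

def dual_basis_kernel(r, v):
    # Coefficients c_0,...,c_{r-1} of a polynomial K(y) = sum_j c_j y^j on (0,1)
    # satisfying (i!)^{-1} int_0^1 y^i K(y) dy = [i == v] for i = 0,...,r-1,
    # i.e. a member of the class described by (1.0).
    H = np.array([[1.0 / (i + j + 1) for j in range(r)] for i in range(r)])
    b = np.array([float(factorial(i)) * (i == v) for i in range(r)])
    return np.linalg.solve(H, b)

def f_bar_deriv_hat(x, X, u, v, r, h):
    # Kernel estimate of the v-th derivative of the average mu-density at x,
    # of the form (n h^{v+1})^{-1} sum_j K((X_j - x)/h) [u(X_j) > 0] / u(X_j).
    c = dual_basis_kernel(r, v)
    y = (X - x) / h
    K = np.where((y > 0) & (y < 1), np.polynomial.polynomial.polyval(y, c), 0.0)
    uX = u(X)
    w = np.where(uX > 0, K / np.where(uX > 0, uX, 1.0), 0.0)
    return float(w.sum() / (len(X) * h ** (v + 1)))

# Toy check with u = 1 (Lebesgue case) and X_j i.i.d. standard exponential,
# so that f_bar(x) = exp(-x) and f_bar'(x) = -exp(-x) for x > 0.
rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=20000)
for v in (0, 1):
    r = v + 2
    est = f_bar_deriv_hat(1.0, X, lambda t: np.ones_like(t), v, r,
                          h=len(X) ** (-1.0 / (1 + 2 * r)))
    print(v, round(est, 3), "target", round((-1.0) ** v * np.exp(-1.0), 3))
```

Polynomial kernels of this kind are also convenient later because they are bounded and of bounded variation on (0,1) (cf. Remark 3.1).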
For a fixed r 2 e and a fixed K 6 xi, let X.-- (1.1) Yj(-) = {Kr-fi——9/u(xj)}[u(xj) > 0] The proposed estimator of f(v) is (1.2) E(V) = (nhv+1)-1Zn Y, 1 J . -(v) :(v) Hereafter gg frequently_abbrev1ate f and f by gn and A gn, respectively. For estimating a Lebesgue density and its derivative (in Empirical Bayes linear loss two action in eXponential families) Johns and Van Ryzin (1972) introduce and use Lz-kernel functions satisfying the xt-conditions for r > 1 and v = 0 or 1, with the exception that, for the case v = 1, their kernels vanish off (0,2) instead of (0,1). With fn E f, Yu (1970), (Section 2 of 11 his appendix), considers estimation of f and f(1), and uses Johns and Van Ryzin type kernels. The orthogonality properties of K E K: and the assumption (Aér)), which is introduced below only for integers r > 0, are used in curtailing the bias of gm, (see (1.4) - (1.7) below). (Aér )): For each 0 < y < 1, there exists the r-th order Taylor expansion of f(t + by) about t with integral form of the re- mainder: fa: + by) = so 1 £11.51)“, + 2:137,-js+hy - E(r'1) = s t + e, and by repeated integrations of (see Van Vleck (1973),p pp 286- -7),f t 2 -(r) It f (t1)dt, t 5 t2 both sides, we get t t f(t + e) - zr-l {is-f f(j)(t)= :+6 P r 2 f(r)(t )dt dt 0 J. J: t 1 1 r -1 (t + e - t )r _ t+e l -(I‘) ” it (r-l)! f (t1)dt1 where the second equality follows by Fubini theorem. We introduce the notation f(r) ]:\dt (1 3) A (x) =-111fm 12 where the dependence of Ar on n and h is abbreviated by omission. For some of our results, we will assume Ar(x) = 0(1). f(r) is asymptotically g3: Note that Ar(x) = 0(1) whenever g(r) x+t equicontinuous at x (i.e., 1x a O as t 1 O and n 1 a0. If only finitely many Pj are distinct and x is a rt-Lebesgue f€r> J point of each of the , then again Ar(x) = 0(1). Let Bn denote the bias of gm, i.e., (1.4) B = P g - g where En = P1 X...X Pn' Since Xj has Lebesgue den31ty ufj and the fj's vanish wherever u vanishes, by (1.1) and (1.2) = h- (1.5) (Bn + gn> é gnsn(t) jx< 0, (Aév)) holds and K 6 X3, then the substitution in the rhs of (1.5) of the expansion of f(t + hy) given by (Aév)), and use of the orthogonality properties of K give - t+h - - (1.6) (Bn + gn)(t) = h VjK(y)jt y(c + by - z)v 1f(v)(z)dz/(v-l)!. If (Aér)) for r > v holds and K 6 xi, then, since RV = 1, by arguments similar to those giving (1.6), r-1E(r) (1.7) Bn(t) = h'vjx(y)j:+hy(c + hy - z) (z)dz/(r-l)!. For r = v + l kernels giving (1.6) and (1.7) belong to the same Class Xi: but, since (AéV+1)) é (Aév)), the two expressions are not eQuivalent. We will use (1.6) and (1.7) to prove the asymptotic unbiasedness of our estimators. 13 In what follows is a sequence of positive numbers, and D is a subset of R. Unless stated otherwise, all the limits, convergences and asymptotic equivalent relations (for functions depending on n) are wrt n a m. 1.2 Asymptotic Unbiasedness and the Exact Rate For the results of this section, X1"°"Xn need not be independent. Recall that gn and gm stand for f(v) and %(v) reSpectively. We will give sufficient and (somewhat) necessary conditions for “BnHD = 0(1). Under different conditions we will obtain two upper bounds for Ufl]\,andan asymptotic expression for Bn. We first prove the following, where by (t,n) a [0+,m) we mean t in a non-deleted rt—nbd of 0 converges to 0 and n 4 m. Theorem 1(a), Let K E Kt, and, for the case v = 0, be bounded. 
If (Aév)), whenever v > 0, holds at each point in a rt-nbd of x, and if (2.0) Av(x + t) = 0(1) as (t,n) ~+ [O+,oo) then (2.1) Bn(x +’t) = 0(1) as (t,n) a [0+,m). On the other hand, if K 6 X’ (K need not be in X3) is + bounded, (gn - gn')]: t = 0(1) as (t,n),n' a [O+,m),m, and for a , x+t _ - subsequence {m}, 11mmtm,t10 Bm]x - 0 and fm E L1[x,x+Tm] for x+t\ some Tm > hm, then, as (t,n) ~[0+,a0, ign1x = 0(1) (and hence, (2.0) holds). 14 Remark 2.1. The second part of the theorem essentially says that, in the presence of certain assumptions (which are always satisfied in the case fj E f and, for some T > O, Lebesgue- inf of the restriction to [t E [x,x+T)\f(t) > 0} of u is positive) (2.0) is necessary even for a weaker form of (2.1). Prggf, (Sufficiency of (2.0)). First consider the case v = 0. Since K E X3, vanishes off (0,1) and IK(y)dy = l, by the first equation in (1.5), we have (2.2) \Bnml = h’1\f:+hK(Zfi-t-)(E]Z)dy\ s \lKilm 1.00:) Thus, Since K is bounded, (2.1) for the case v = 0 follows from (2.2) and (2.0). Next consider the case v 2 1. Since K being in K: gives (2.3) (h'v/(v~1)!)fK(y)j:+hy(t + by - z)“'1dzdy = k 5 1, and, since, by our hypothesis, (1.6) holds at every t in N+(x), a rt-nbd of x, (2.4) (v-l)!Bn(t) = h'va(y)j:+hy(t + hy - z)V‘1(gn]:)dzdy V t e N+(x). Note that the integrand in (2.4) is bounded above by - z \K(y)\(hy)v llgnlt‘ which vanishes for y 4 (0,1). Thus, by (2.4), .‘ -v.. \Bn(t)\ S \k|v_1Av(t) V t € N+(x), and hence, Since K t X§ implies \k\v-1 < m, (2.1) for v > 0 follows from (2.0). Necessity ofg(2.0). Let m be a subsequence and Tm > h é m - x+t E 3 fm 6 L1[x, x + Tm] and 11mm1m,t10(Bm]x ) — 0. By (1.5), (Bm + gm)(') = E-ij(y)fm(- + gy)dy. Therefore, since K vanishes 15 off (0,1), by use of the transformation theorem X+§ (2.5) \(Bm + gm)]:+t\ s g(V+1)HKHm Ix \f ]:+t\dv = 0(1) as t t 0, m where, since K is bounded, f E L [x, x + T ] and T > g, the m l m m convergence in (2.5) follows by a theorem on continuity of transla- tion of Ll-functions, (e.g., see Hewitt and Stromberg (1965), p. 199). Since, by our hypotheses, for all Sufficiently large m, x+t x+t (gn ‘ gm)]x « O as (t,n) a [O+,a0, and Bm]x a O as t l O, the identity gn = (gn - gm) +(gm + Bm) - Bm and (2.5) yield x+t gnJX ~0 as (1:.m)-+[0+.<==~)-I Remark 2.2. The proof of the first part of Theorem 1(a) (v) 3130 proves that: If (A0 ), whenever v > 0, holds on D and if K 6 x3, then (2.6) “BnHDHAVle s “Rum or \k‘v-l according as v = 0 or >-0. Thus, if (2.6) holds and rhs of this is finite, then “AvHD = 0(1) implies (2.7) HBHHD = 0(1). In fact, “AvHD = 0(1) is somewhat a necessary condition for (2.7): If K e (K d t b ' V ' b d d ( - ) x+t = X’ nee no e in KQ) is oun e , SUngpi gn gn. 1x \ 0(1) as (t,n),n' 4 [0+,m),w, and for a subsequence {m}, lim sup \B ]x+t\ = 0 and f 6 L (U > [x x + T )) for m1m,t10 xED m x m l xED ’ m l = 0(1) x+t xED‘gnJX (and hence, “AvHD = 0(1)). Proof of this assertion follows from some Tm > hm’ then, as (t,n) « [0+,a9, Sup arguments identical to those given for that of the second part of Theorem 1(a). As an immediate corollary to this last result, we have l6 Corollary 1. Suppose K E K: is bounded and only finitely many P are distinct. For an a 2 -m, if each jmfj < m, then a x+t x J supx>aan(x)\ = 0(1) iff limtflosupx>algnj \ = 0(1). Remark 2.3. If only finitely many Pj are distinct, then AV(X) = 0(1) whenever x is a rt-Lebesgue point of each of the f our fgv). 
Thus, (from the first part of Remark 2.2) with fj f(v) estimator of is asymptotically unbiased at x under the f(v) weaker assumption than that of the continuity of at x imposed for Similar results in almost all papers on the subject. Sufficiency and necessity parts of Corollary 1 Specialized to the i.i.d. case with u E l, v = 0 and a = -m have been proved, reSpectively, by Nadarya (1964) and Schuster (1969) for their kernel estimators. Remark 2.4. For the case 0 = 0 and u E l, (2.7), (with different kernels), has been noted by Samuel (1965), (Section 6), under the uniform equicontinuity (and necessarily uniform equi- boundedness) of f1,f2,... on D. Theorem 1(b), If, for r > v, (Aér)) holds and K E Xi, then ‘HV’l‘l nX'H’l “(1') (2.8) h \Bni S Akir-l Jx \f l, and -r+v -(r) (2.9) \h Bn - krf \ s \k‘r-l Ar Proof. Inequality (2.8) follows immediately from (1.7), Since the absolute value of the rhs there at t = x is no more than ((r-1)1)'1jyr‘1lx(y)\dy(é \k\r_1) times hr‘V’lf:+h\t(r)\. 17 Also, since (2.10) ((r-i)!hr)'1fx(y)j:+hy(t + by - z)r'1dz dy = (r!)'1jer(y)dy é kr, from (1.7), the lbs in (2.9) at t is exceeded by (2.11) <(r-1)!hr>'1j \K(y>\j:+hy 0, then teal (2.13) h'(r'V)Bn(t) ~ krf(r)(t) uniformly on D. Thus, under certain conditions, the exact rate of convergence r-v for the bias of the estimator g“ is h Theorem 1(b) describes the situations where such rate is indeed achieved by gm. 18 Some global properties of Bn will be obtained in Section 5. Under varying conditions, we will show that, for a fixed a 2 -m, j:2;3: dt 0(1), j: 3“ 2dt 5 \k|r Ja\E \2 dt and 22 2 I" B Zdt ~ (r V)I: \£(r)\d a n khr 1.3 Strong,Consistency with Rates Let En denote the error of the estimator gm, that is, g(v) - gn, where gn and g“ denote reSpectively, and . Unless stated otherwise, all convergences in this section will be meant with probability one. We will give sufficient and (somewhat) necessary conditions for HE E = 0(1), and prove, for n‘D r>v, (3.0) uh'r+VEn - krf(r)“D = 0(1) Under conditions weaker than those used for (3.0), we will show that “En“D = 0((n-110g n)a) for 20 = (r-v)/(1+r). Hereafter denote gm - P g by On. In view of fin (3.1) En = cn + B“, if “CUHD = o(1),tflunlsufficient and (somewhat) necessary conditions for HEnHD = 0(1) can be obtained from Section 2 (Remark 2.2 and Corollary 1). Similarly, regarding rates of convergence, if _ .. .. I‘— anHCnuD - 0(1), then sufficient conditions for anhEndD — 0(1) and for (3.0) (with h-r+v = an) can be obtained from (2.8) and (2.9), reSpectively. Thus our objective in this section will be to obtain sufficient conditions under which anHCnHD = 0(1). 19 For the remainder of this chapter, let n (3.2) uh(.) = Leb-inf restriction to [t6[x,x+h)\v fj(t)>0} of u. 1 For the results in Theorems 2(a) and 2(b) below, K need not be in xi (but K E X). First consider the case when D = {x}. Theorem 213). Let “Knm < m.V n > O (3.3) Bauer.‘ 2 11] s 2exp{- g(hWIuhn/HKHOD)2} . Proof. By (1.2), the event on the lbs of (3.3) is -1 n v+1 [‘n 21(Yj- Pij)\ > Th ], and by (1.1) and (3.2), W“ s “K“ In a.s. for l s j s n. Hence, since Y ,...,Y are in- m h 1 n dependent, Theorem 2 of Hoeffding (1963), applied to random vari- ables Y1 and 4Yj here, completes the proof.II Clearly, when D contains finite, m, points, EHLHCnuD > n] s m times the rhs of (3.3) with uh there replaced by mintEDuh(t). We now consider the case when D is not finite. Theorem 2(b), Let K on (0,1), and, for each t in D, l/u on [t, t + h) be of bounded variations. 
Then, with Y (t) = K((- - t)/h)[u(°) > O]/u(°) (we may understand that by Yj we are abbreviating YX ), V n > 0, l (3.4) EDEHCnHD 2 n] S 4ngM exp(-2(M2 - l)+), v+l t+h where M - nah R/(suptED It ‘dY.(t)\). Remark 3.1. Kernel functions K 6 xi, which are of bounded variations always exist, e.g., take those K's in X: which are polynomials on (0,1). 20 Remark 3.2. Since Y (t), as a function of t, is of bounded variation, Y (t+) and Y (t-) exist for all t. There- fore, for any countable set S dense in D, SUPtGDY (t) = (t) is a random variable. supt SY (t). Consequently, Sup Y E tED X. v+1 -l n Similarly, HEHHD (= u(nh ) 21(ij - PjY' 1 len by (1.1) and (1.2» is a random variable, and the lhs of (3.4) is meaningful. Proof. Fix t in D until stated otherwise. Let F be the average of distribution functions of X .,Xn, and let 1,.. * - 2F (-) = n 122([Xj < .] +[Xj s .]). Since Lebesgue-Steiltjes integral I-dG does not depend on how G (monotone) is defined at points of discontinuity, from (1.1) and (1.2), t+h t h _ * '— (3.5) Cn(t) —j Y.d(F - F)(-). Since Y is of bounded variation on [t, t+h), it is con- tinuous there except on a countable subset C. But by the absolute . , — dF* - . . . "‘d’k continuity of F, En fC ~ deF — 0 which implies 00 F — 0 a.s. Consequently, (3.5) can be written as (3.5)' 2h("”)cn(t) = ft+h(y +Y._)d(F* - F)(-) a.s. t '+ (17*- F>(o+> + (11* - F>(--), K = 0 Since 2(F\ - E)(-) 'V y 4 (0,1), and Y is of bounded variation (and hence is the dif- ference of two increasing functions) on [t, t + h), by (3.5)' here and (V) of Theorem 21.67 of Hewitt and Stromberg (1965), the rhs , , t+h * -' . . of (3.5) 13 2ft (F - F)(-)d(Y ). Hence, s1nce our foreg01ng analysis in the proof holds good for each t E D, t+h (3.6) hMHCnHD S “If ‘ flimsupcen c \dY.(t)\ . 21 Now (3.4) follows from (3.6) here combined with Lemma A.l and Remark A.l with c1 =...= cn = n-25 of the appendix.'. Let vh(t) be the total variation of l/u on [t,t+h), and V(K) be that of K on (0,1). Then V y in [t,t+h), (u(Y))-1 5 (uh(t))-1 + Vh(t) and \K((y-t)/h)\ S \K(0)\ +'V(K) = V(K); and the total variation of Y.(t) on [t,t+h) is no more than Vn(‘)“K“m'+ vSuPts- n]'< m. Thus (3.8) follows by Borel-Cantelli Lemma. Similarly, (3.9) follows from (3.4) and (3.7).. Remark 3.3. Suppose for r > v, (Aér)) holds at each , r -l t+h -(r) _ p01nt in D, K E X§ and h suptED It ‘f \dt — 0(1), then by (2.8) of Theorem 1(b), (3.10) HBDHD = 0(hr'v). The choice of h that balances rhs's of (3.8)-(3.10) is proportional 22 -1 1/2(r+1) , , to {n (1 + log n)} - Thus with th1s h, if, for some n, . . ‘1 . uh > 0 for each p01nt 1n D (and “(uh) + 2vhHD < m, 1n case D is not finite), then (3.1) combined with (3-8)-(3.10) gives, with 2d = (r-v)/(1+r), (3.11) “En: = 0((n-llog n)a). \n The result in (3.11) specialized to the case u E l, f E f, r = v+1 and D = R, is proved by Schuster (1969) (for his estimators) under stronger assumptions that f and its first v+1 derivatives are bounded. If only finitely many P are distinct, then (3.11) can be j strengthened slightly by replacing log n there by log log :3 (This follows from (3.1), (3.6) and (3.10), Since HF* ‘ EHm 0((n-1 log log n)%), see Kiefer (1961)). 1.4 Variance, Covariance and Asymptotic Normality, In this section we prove the asymptotic normality of A on.A d A a (gn(x1), ,gn(xm)) where, as before, gn an gn abbrev1ate - t 2 . f(v) f(v). We first obtain an upper bound for oh = var Sn hZV+1)o§ ~ “KH§(f/u), and for x' # x, oh(x,x') é V+1)-1). 
Throughout this section, we and and show that (n A A 2 cov O] and gm = (nhv+1)-12'Yj. Since 1(1....,xn are independent, so are Y ”Yn’ it follows that 1,.. v+1 2 2 _ 2 (4.0) (nh ) on — 2 var(Yj) S 2 Pij 23 Lemma 1. V g 2 l, -1 t+h - -1 (4.1) n sz\Yj(t) - Pij(t)\§ s (2\\K[|m)§jt (f/ug ). nggf. By cr-equality (LoeVe (1963), p. 155), the lhs of (4.1) is exceeded by 2gn-lz Pj\Yj(t)\§ = 2§I\K(y-t)/h)\g(f/ug-1)dy which is bounded above by the rhs of (4.1), since K vanishes off (0,1).III Inequality (4.0) and the latter part of the arguments used in the preceding proof with g = 2 yield 2 2 2v+2 -l +h - (4.2) ch(x) s HKHm(nh ) f: (f/u). Remark 4.1. If u E l a.e. on {t‘fj(t) > 0 for some j 2 1}, then (4.2) is strengthened to nth+2Hg§Hm s “K”:, since then IEfiEsIVteR. Lemma 2. If _ f E (A1): n 1 :‘H‘l-Jo) - :(Xde = 0(1) then - 2 - (4.3) (nh) 1; Pij = “K“:(f/u) + 0(1). Remark 4.2. (A1) is implied if x is a rt-Lebesgue point of (u)-1, f(x) is bounded in n, A0(x) = 0(1), and either or sup is bounded in n. Obviously, Sprstci-i-hmun-l xst z{(j:+hij)(fx. fj)) 2 )- Now consider the case i 2 1. By the transformation 0(1) by (A theorem, (Ag1 ))+ and the orthogonality prOperties of K, the lhs of (4.5) is (4.6) hn'1z{fj 0 and if (4.8) holds, then (4.9) Wh2V+1 2 ~quzf/u , Remark 4.3. If (A1) holds (which is assumed indirectly for (4.9)), then the simple result in (4.2) gives a rate for oi equal to the exact rate obtained in (4.9). Theorem 3. With x1,...,x[n in R, suppose (4.7) for pairs (x1,xj), 1 ¥ j, i,j = l,...,m, and the hypotheses for (4.9) for each x., i = l,...,m, hold. If for each t = x I...,x 1 l m (4.10) (f(t)/u) 3/2h1j:+h o and (4.8) holds at x1,...,xm, then (4.11) implies 2v+1 (4.16) 11m Pn[(nh )%Cn(xi) S ti’ 1 s i s m] t i HKH2(ai/U(Xi)) m = U Q( ) 1 % 28 For the i.i.d. case with v = 0 and u E l, (4.16) is quite similar ~to the univariate version of the result obtained in Theorem 3.5 of Cacoullos (1966). In view of the identity En = Bn + Cn and (4.16), certain interesting results about the asymptotic distribution of (En(x1),...,En(xm)) can easily be obtained from (2.8) and (2.9) both at x1,...,xm. 1.5 Mean Square and Integrated Mean Square Consistencies with the Exact Rates. Define the mean square error (MSEn) and the integrated mean square error (IMSEn) of the estimators gn by 2 a. (5-0) MSEn = Pn(gn - gn) and IMSEn - fa MSEndt reSpectively, where a 2 -m is fixed. Obviously, 2 2 (5.1) MSEn - BU + on This section is divided into two parts. The first one deals with properties of MSEn and the other deals with those of IMSEn. We obtain, among other results, rates and the exact rates for MSEn and IMSEn. In view of (5.1), various results concerning MSEn can be obtained from those of Bn and on contained in Sections 2 and 4 respectively. We describe some of them as follows. By (4.2), if (nth+2)-13uptEDI:+h(f/u)dy = 0(1), then Sufficient and (somewhat) necessary conditions for HMSEUHD = 0(1) can be obtained from Remark 2.2 and Corollary 1. Regarding rates of convergence, we have for r > v, 29 Theorem 4. If (2.8) holds, then (5.2) MSEn s (\k\r_1hr’v’1j:+h\f(r)\)2 + “Ku:(nh2V+2)-1j:+h(E/u); and if (2.13) with D = {x}, and (4.9) hold , then -(r) (5.3) MSEn ~ (krhr'v f 2W1)-1 )2 + (nh HK“§(E/u). Proof. Inequalities (2.8) and (4.2) combined with (5.1) yield (5.2). Since an ~ bn > 0 and cn ~ dn > 0 imply an +'cn ~ bn + dn’ (5.3) is an immediate consequence of its hypothesis. I Remark 5.1. Suppose K is bounded. 
If for some 0 < p, {(I:+h‘f(r)\1/p) V (I:+h(f/u)1/q} is bounded in n, q S 1, suptED then (5.2) followed by use of Holder inequality gives 2v+1+q)-1 n-Zs(r-v-p)) 2-- (5.4) “MSEHHD = 0(h (r v p) + (nh ) = 0( where 3-1 = 2r + l - 2p + q, and the second equation follows by taking h proportional to n-8, a choice of h balancing the two terms in the middle of (5.4). The result in (5.4) specialized to v = 0, fn E f, u E l and D = R improves the corresponding result obtained in Theorem 2 of Schwartz (1967). Assuming f is continuous, of bounded vari- ation and xjf(r-J) 6 L2 for each j = 0,1,...,r, he exhibits an estimator of f by orthogonal series method, and shows that MSEn -(r-2)/r ) uniformly on R. This rate is ~(2r-1)/(2r+1) of his estimator is 0(n ) much slower (especially when r is not large) than 0(n obtained in (5.4) with p = %, q = 1 and v = 0, which is guaranteed t+e\f(r)‘ 2 tERJ't < °° in this case simply by the assumption that sup for some 6 > 0. Moreover, he requires r > 2 instead of r > O. 30 -1 t+h -(r) t+h - Remark 5.2. If suptEDh {(It \f \) V (It (f/U))} is bounded in n, then taking h proportional to n-1/(1+2r) (5.2), we get n-2(r-v)/(l+2r) 64v umgt=o< > improving the rate in (5.4) (with the excess in the rate of the order n.‘2 where c = 23{(r-v)q + (4v+1)p}/(1+2r)). For the case D = {x}, fn E f and v = 0,1, Yu (1970), (Section 2 of his appendix); and for the case D = {x}, fn E f, u E 1 and v = O, Parzen (1962), (Section 4), and Wahba (1971), (Theorem 2), obtain the rate in (5.4)' for their estimators. Yu makes a little stronger assumption f(r) that and flu are bounded on [x,x+h]; while Parzen and Wahba make still stronger assumption (adding others) that, reSpectively, (r) . f is continuous and is in L2. An optimal h, in the sense of minimizing the asymptotic expression for MSEn in (5.3) is given by (5.5) h1+2r = n-1(2v+1)HKH:f/(2(r-v)k:(f(r))2u) Thus approximations of the optimal h could be based on suitable - - 2 guesses or estimates of the magnitude of f/(f(r)) . Using h given by (5.5), (5.3) becomes '(r) 1+2v -1 2- r-v 2/(1+2r) (5.3)' MSEn ~ Cr,v{(krf ) (n HKHzf/u) } . where -l 2(r-v)/(l+2r) -1 (2v+1)/(1+2r) cr v = ((2v+1)(2r-2v) ) ’ + (2(r-v)(2v+1) ) Relations (5.3), (5.5) and (5.3)' specialized to the case fn E f, 31 u E 1 and v.= 0 coincide (up to the factors kr and “KH2) with (4.12), (4.15) and (4.16), respectively, of Parzen (1962). In the remainder of this section we derive certain pro- ' f 4 ° d parties 0 IMSEn - Ia MSEn(t) t. Lemma 4. For each n 2 1, m 2 - 2 - (5.6) I, a, s (nh2V+1) luxuz f:(f/u). Proof. Integrating both sides of the inequality in (4.0) and then making use of the Tonelli theorem at the second step below we get ‘ 2 - a: 2 .- (nhZV+1)I: ohdt S h 1I8I(K (y-t)/h)/u(y))f(y)dydt = f/h>dtdy 2 m - s “Kn, Jaw/u). l Lemma 5. Suppose r = v. If Bn = 0(1) a.e. on (a,m), V ‘gn‘ E L2(a,a0 and, for the case v > 0, (Aév)) holds on n (a,a9, then (5.7) J”: B: = 0(1) Proof. Consider first the case v = 0. Let s(t) = V I\K(y)‘f(t+hy)dy. Using Tonelli theorem and Schwarz inequality n at the second step below, we get 2 a - _ I:S (t)dt S faff\K(y)K(w)\(z f(t+hy))(: f(t+hw))dydwdt s £I‘K(y)K(w)‘(j: v f2(t+hy)dt)%(j: v f2(t+hm)dt)%dydm n n s II‘K(y)K(m)\f: V f2(t)dtdydw < m n 32 since K 6 L1 by its definition, and v f(t) E L2(a,m) by hypothesis. Since by (1.5) and by cr-izequality ‘Bn(t)\2 = (IK(y)f(t+hy)dt - f(t))2 s 2(sz(t) +'V f2(t)) and since by hypothesis Bn = 0(1) a.e. on (a,a0,nthe desired conclusion for the case v = 0 follows by dominated convergence theorem. 
Now consider the case 0 2 1. Since K vanishes off (0,1), (2.4) followed by (2.3) gives, V x E (a,m0, (5.8) \Bn(x)\ s b-1va-1\K(y)\j:+hy\gn(t)\dtdy +-\gn(x)‘ s jy"\1<(y>\j(1,\gn v, (1.7) holds a.e. on (a,m), then 2 h2(r-v) m g(r) (5'9) I: B: S \k‘r Ia< )2° Proof. By (1.7) we have, after use of the transformation h - - theorem, (r-l)!han(x) = IK(y) 0y zr 1f(r)(x+hy-z)dzdy for almost all x E (a,m). Therefore,using Tonelli theorem, and Schwarz in- equality at the second step below, we have 33 by hy 2 a) 2 '2co -1- h V a Bndx s ((r-l)!> ja1<(y2>\r \f‘r) v, suppose (2.8) holds on (a,m), and both (2.9) and Ar = 0(1) hold a.e. on (a,m). If v ‘§(r)\2 E L1(a,a9, n then oo--2 2- (5.10) [a\h 2(r V>Bn - kr\f(r)\2| = 0(1). Proof. By Schwarz inequality, the square of the lhs in (5.10) is bounded above by 11-12, where = @ -(r-V) _ -(r) 2 _ m ~(r-v) '(r) 2 I1 Ia‘h Bn krf \ and Iz—‘fa\h Bn+krf \. Since (2.8) holds on (a,a0, by transformation theorem and the Schwarz inequality, we get on (a,a9, (5.11) (\k\r_1hr’“)‘23:(t) s (jg\é(r)(t+hw)\dw)2 S V I3\f(r)(t+hw)\2dw . n Since V \f(r)‘2 E L1(a,m), so is the extreme rhs of (5.11)- By n cr-inequality the integrands in I1 and 12 are bounded in n by an L1(a,m)-function. Hence, since (2.9) and Ar - o(l) both hold a.e. on (a,oo) , by dominated convergence theorem, I112 = 0(1). I 34 Lemma 8. If (4.8) holds a.e. on (a,m) and (V f/u) E n L1(a,a0, then (5.12) f:\nhzv+1 a: - HKH:(f/u)‘ = 0(1). Proof. By (4.0), V t 6 (a,m), 2v+ (5.13) nh 1°§(t) s (nh)-1£: PjY§(t) = h‘lf:+hxz((y-t)/h)(f/u)dy s IK2(w)V (f(t+hw)/u(t+hw))dw . n Since (V f/u) E L1(a,a9, by an application of Tonelli theorem, we see that the extreme rhs of (5.13) is in L1(a,m). Hence, the integrand in (5.12) is bounded in n by a L1(a,a9—function, and, since (4.8) holds a.e. on (a,m), (5.12) follows by dominated con- vergence theorem. I. As an immediate corollary to Lemmas 7 and 8, we have - 2 Corollary 5. If kr # 0, lim inf f:\f(r)\ > 0 and (5.10) holds, then -2 - a 2 2 - 2 (5.14) h (r v)fa 3n ~ krj\t(r)\ ; and if lim inf f:(f/u) > 0, and (5.12) holds, then 2 2 2 - (5.15) nh V+1 f: “a . “KHZ j:(f/u) We will use (5.14) and (5.15) to prove (5.17) below. In view of (5.1), various results on IMSEn can be obtained 2 Q2 (.02 from those on I:On and IaBn’ e.g., if Jach - 0(1) (by (5.6) 2v+1 -1 2 m - _ it is sufficient that (nh ) “Ku2f8(f/u) - 0(1)), then (5.7) implies IMSEn = 0(1). Regarding rates of convergence, we have for r > v, 35 Theorem 5. If (1.7) holds a.e. on (a,a0, then 2 2 2 - m - 2 2v+l -1 - (5.16) IMSEn s \k‘r h (r V>fa\£(r)\ + uKu2(nh ) f:(f/u); and if (5.14) and (5.15) hold, then (5.17) IMSEn ~ rhs of (5.16) with ‘k‘r replaced by \kr\. nggf, Equation (5.1) followed by (5.6) and (5.9) yields (5.16). By (5.1),(5.17) is an immediate consequence of its hypotheses . I It may be recalled that a sufficient condition for (1.7) at a point is that (Aér)) holds at that point. Thus a simple assumption gives (via (5.16)) a rate for IMSEn quite similar to the exact rate obtained in (5.17). Since ‘kr‘ s “KHZ by Schwarz inequality, (5.16) with h = n-1/(1+2r) yields r-v 2 w -(r) 2 - (5.18) IMSEn s (“Kuzh ) ja{\f \ + (f/u)} . Remark 5.3. The result in (5.18) Specialized to the case u E l, f E f, v = 0 and a = -m improves the result in (3.6) of n Schwartz (1967) who exhibits an estimator of f by orthogonal series method. Assuming tjf(r-j)(t), j = 0,1,...,r, are in L he shows 2’ -(r-1)/r that IMSEn of his estimator is 0(n ). 
This rate is significantly weaker (eSpecially when r is not large) than our rate 0(n-2r/(1+2r)), which is guaranteed in this case if we only f(r) assume that 6 L2. Moreover, he restricts r > 1, while we assume r > O. 36 An optimal choice of h as a function of n and independent of the point at which gn is to be estimated can be obtained by considering a global measure of how good gm is as an estimator of g“. The integrated mean square error is a standard measure of this type. The global Optimal h, as the minimizer of the asymptotic eXpression in (5.17) for IMSEn is given by 1+2r (5.5)' h = (n1(1+2V)HKfo (f/u)/(2(r- -v)k :fa 00(EM) 2 ) and, hence, could be approximated by some suitable guess or estimate - - 2 of the magnitude of the ratio j:(f/u)/f:(f(r)) . Using h given by (5.5)', the asymptotically minimum possible value of IMSEn can be obtained by (5.17). 1.6 Estimation of Mixed Partial Derivatives of the Average of Multivariate u—Densities. Let X1,...,Xn be independent m-variate random variables with Xj " Pj << u, where Pj's and u are over Rm, and u is absolutely continuous wrt Lebesgue measure. Unless stated other- wise, throughout this section, the product H is over l,...,m. With t in Rm, and u , a fixed determination of du/dt, let -1 t1+€1 tm+€m fj(t) = (u(t)) llmeilo,1=l,ooo,mt)jt ... jtm de 1f lhnit exists V j 2 l and u(t) > 0, and 10 otherwise. For v and t in RIn with elements of v non-negative integers, and for V . \v‘ = 2? VJ, let f§V)(t) = a\v\fj(t)/(H3til). For a fixed vector v = (v1,...,vm ), vj 2 O integers, we consider estimation of g(V)= -121 f(\)) . 37 Let h = (h1,...,h ) be Such that 0 < h. E h, S 1 m 1 1,n and h t 0 as n t m. With r = (r ,...,r ), r, 2 v, integers, % 1 m 1 1 r let in be defined as in Section 1. For fixed Ki in K§1, let i i ,(v) -1 n Vi” -1 Xi-“xi (6.0) f (x) = n 2._ {{n(h. ) K.(-l--)}[u X.) > 01/U(Xo)] J'l 1 1 hi J J where le,...,ij are coordinates of X,. f(v) is our proposed estimator of g(v). Taking expectation of f(v)(x) wrt Pn = P1 X...X Pn’ and then making use of the transformation theorem, we get -v (6.1) an(V)(x) = j(nhi iKi(yi))§(x+h-y)dy ~ where h-y = (hlyl, . . . ,hmym) . . m _ For 2 and t 1n R , let (2)1(t) - (t1,...,ti,zr+1,...,zm). With 1 S L s m and with the first L-elements of r non-negative integers, we introduce (Aér))L: For V y E (0,1)m and for each i S L, f(x+h-y) has ri-th order Taylor eXpansion in hiyi about xi with integral form of the remainder, while other components Of (X + h‘Y) are held fixed. (Such expansion in the univariate case is given in (Aér)) in Section 1.) Suppose L (0 S L s m) elements of v are positive. With- out loss of generality, let these be v1....,v . Suppose (Aév)) 6 L liolds. Using Taylor formula, we eXpand f(x+h'y) (appearing in (6u1)) in hly1 about x1 with integral form of the remainder a}: the v ~th term, we perform the integration on the rhs of (6.1) l ‘vrt y1 and use the orthogonality properties of K1. Then using 38 Taylor formula, we expand f ((x+h-y)1(t)) (appearing in the resultant) in hzy2 about x2 with integral form of the remainder at the vz-th term, we perform the integration wrt y2 and use the orthogonality properties of K ; then we do the similar -(v1,v2,8,...,0) operations wrt (x3,v3,K3) with f ((X+h'Y)2(t)) (appearing in the resultant), and so on until such operations wrt (XL’YL’KL) are completed. We finally get A 'V. - (6.2) gnfmm = jam, min/1)) jJ;f(")( ,(tnstdy V '1 v = _ i _ , —l where Jy(t) “{[xi 5 ti < xi +hiyi](xi+hiyi t1) ((vi 1).) 
[vi>0]+ [vi = 0]}, st = Hidti and the second integral on the rhs of (6.2) v -v i i v = is L tuple. Since Iz Ki(z)dz vii, Imhi Ki(y1))ij(t)stdy 1. Consequently, using the transformation x v -1 +hiyi = ti’ L+l S i s‘m, i V L 1 = and the facts that Jy(t) s n1((hiyi) “vi-1)!) and K1 .. 0 off (0,1), all at the second step below, we get, with Bn = 2(v) '(v) Enf ' f ’ -Vi v -(v) . -(v) td (6.3) \Bn(x)\ Sfmhi \Ki(yi)\)j‘Jy(t)\f ((x+h y)L(t))-f (x)ldL y . (a), where Av(x) =‘I(flh;1[xi 5 ti < xi + hi])‘f(v)]:\dt, and ‘ik‘j = IZJ\K1(z)le/j!° Thus, if “Kium‘< m for 1+1 s i s m (bounded kernels Ki 6 x i, r vi i exhibited, see Section 1 and Remark 3.1),and (A8”)L holds and 2 vi, can always be Av(x) ’ 0(1), then (6.4) Bn(x) = 0(1). 39 (r) Now suppose for ri > 0.1, (A0 )In holds. An analysis similar to that given for (6.2) (this time use rl-th,r2-th,..., rm-th order coordinatewise Taylor expansion with integral form of the remainder) gives A(V) '(V) -v' r -(r) (6.5) gnf (x) = f (x) + fcnhi 1Ki(yi>)ijE(‘)\ s (n\.k\ )A (x) 1 n 1 r1 1 ri-l r Results obtained in (6.3), (6.6) and (6.7) for m = l coincide with (2.6), (2.8) and (2.9), respectively. Since X1,...,Xn are independent, the inequality var X S EX2 followed by the transformation theorem gives 2 . 2(v) 2(Vi+1) -1 - (6.8) oh(x) = var f (x) S M (nIIhi ) I(H[Xistiéxi+hi])(f/u)dt where M E HHKiHm. Since MSEn é En‘f(v) - “(v)\2 B: + 02, rates of convergence for MSEn can be obtained from (6.4), (6.6) and (6.8). If, corresponding to u and h here, uh is defined analogous to (3.2), then by Theorem 2 of Hoeffding (1963), V n > O P [ f(V) f(v) 9 hVi+1 /M)2 "n \ - En \ > n] S 29XP{' 2((H 1 )“ uh }' v,+l Thus by Borel-Cantelli lemma (fl hi1 2(V) PnE(V)\ = >\f O((n-llog n)%(M/uh)) a.s., and 40 ‘(v) rates for strong consistency of f (x) can be obtained from (6.4) and (6.6), since ‘%(v) - f(v)\ s \Bn\ + \%(v) - Pnf(v)‘. Though we have verified some of the asymptotic properties of g(v)’ it is not our intent to encounter and verify all the pro- (v) perties of E that we have already studied in the univariate case. However, regarding some of these, it can be verified that under the assumptions analogous to those given for (4.7), (4.8), (4.11), (5.2), (5.3), (5.16) and (5.17), results analogous to these also hold good in the multivariate case here (analogoue of (4.8) 2v£+1 2 2 _ 1 )Oh ' (HHK1H2)(f/u) = 0(1) and that of (5.16) is 2(ri-v.) 2 a an - IMSEnS (n‘iklr hi 1 )J‘a fa \f(r)\2dt 1 i l m is n(nh 2 -2vi-1 -1 m a .. (UHK1“2 hi )n I ... Ia (f/u)dt, etc.). Conditions (somewhat) m a 1 necessary for asymptotic unbiasedness, (and also for strong or mean m square consistency), uniform on any subset of R , are analogous to those given for the same in the univariate case (cf. Remark 2.2 and Corollary 1). CHAPTER 2 CONVERGENCE RATES IN SEQUENCE-COMPOUND SQUARED ERROR LOSS ESTIMATION OF CERTAIN UNBOUNDED FUNCTIONALS IN EXPONENTIAL FAMILIES 2.0 Introduction. Let n be a parameter Space indexing a family of proba- bility measures .0 = {Pw\w 6 Q} on a sample space I, With an observation on a random variable X ~ Pm’ let the component prob- lem be squared error loss estimation (SELE) of Egg; 9(w). Suppose this component problem occurs repeatedly and in- dependently. Then, after n such occurrences, we have an unknown vector 9 = (w1,...,wn) 6 0n and a correSponding vector of in- dependent random variables, X = (X ,...,X ) with X. ~ P E P . 
N l n J j wj With 0 abbreviating 9(wj), we consider estimation of each J component of e = (91,...,en) with loss taken to be the average of the squared-error losses in the individual components. We call m = ($1,...,qh) a sequence compound estimator (henceforth, compound estimator of simply estimator) if “3 is (X1,...,X )-measurable. Let G be the empiric distribution j 1 function of the first 1 components of w, and R(-) be the Bayes envelope for the component problem. With a 6 > O, we say m achieves a rate 6 (at 0) if the modified regret of W! defined by _ -l n _ 2 (0.1) D (93.33) —n 21§j(cpj ej) R(Gn) 41 42 is 0(n-5) as n a m, where 21 = P1 X...X Pj' We now describe the main results briefly as follows. In Section 1, we use the method of Gilliland (1968), (Section 2), to obtain an explicit bound for \Dn(9’gb\' In Section 2, we introduce some further assumptions and notations. For the results in Sections 3-6, I,= R,.9 is an exponential family wrt p, a c-finite measure dominated by Lebesgue measure on R and n is the natural parameter space. Using the technique developed by Gilliland (1966), (Chapter III), we exhibit, in Section 3, a divided difference estimator for 9 with a rate 1/5. Based on estimators (introduced in Chapter 1) of derivatives of the average of u-densities, we exhibit, in Section 4, kernel estimators of 9 for (integer) r > 1 when 9(w) = w, and for (integer) r > 0 when 9(w) = em or w-l. These estimators are shown to have rates (r-1)/(1+2r), r/(1+2r) or (r-)/(1+2r) in their reSpective cases of w, em or w-l. In Section 5, we show that, when 9 is an identity map and w has identical components, rates with the divided difference and the kernel estimators are near, but cannot be more than, 2/5 and 2(r-1)/(1+2r), respectively. A comparison between the divided difference and the kernel estimators, when 6 is identity, is made in Section 6. Because of the reason stated there, the latter one is preferable to the former one. 43 2.1 A fiound for the Modified Regret. In this section we will prove two simple but useful lemmas. Special forms of both have been studied, among others, by Gilliland (1968) and Susarla (1970). Lemma 1 is essentially due to Gilliland (1968), and Lemma 2 is a consequence of inequalities (8.8) and (8.11) of Hannan (1957), and of Lemma 1. With u some a-finite measure dominating P V j = l,...,n, J . f d let fj be a determination of de/du Let mi 2 maxlsjsi j an N1 2 maxlsjsi‘ej‘ be such that mi and Ni are non-decreasing. Recall that e abbreviates 6(w ). As the Bayes response against J 1 G1 in the component problem, we take the version of conditional expectation zief (1.0) , .._l__l_i [:1 f > 0] . 1 21 f 1 j 1 3 Thus ‘fii‘ s.N1. For the purpose of this section only, take *0 arbitrary real valued function on R, and for j 2 1, define “j E '1 ' *1-1' Lemma 1. With $0 taking values in [-Nn, Nn], n 21 Pi‘Ai(Xi)| s 2Nn(1 + log n)u(mh). Eroof. Abbreviate, throughout this proof, Nn by N. From (1.0) it follows that, for l s i s n, (91 ' w1-i)fi . a.e. P 1 21 fj A1 = 1 . Consequently, since \9 - Wi-l‘ s 2N for V 1 s i s n, i 44 2 n (filmn) n (1.1) 21 PilAi(Xi)\ s 2N Manual 1“ [m ) 21 j n Since by Lemma 2.1 of Gilliland (1968), Z: a:(2i a1).1 5 2: 1-1 for all 0 5 si s l, 1 s i s n and n 2 l, the rhs of (1.1) is bounded above by 21mg: 1-1)“.(mn) s 2N(1 + log mum“). I Lemma 2. For any estimator 3’: (¢1’°'°’¢h) with qh and, for i - 2,...,n, mi taking values in [-Nn,Nn] and [-N1,N1] reSpectively, ‘1) (“mm s An'lz“ N P \cp (X) -¢ (x )\ 9 ~ ~ 2 J~J j j 1'1 J -1N2 + 8n n(1 + log n)u(mn). 
Proof. Unless stated otherwise, sums in this proof are taken from 1 to n. Let the argument Xj in various summands below in this proof be abbreviated by omission. Inequalities (8.8) and (8.11) of Hannan (1957) specialized to the SELE problem here yield 2 2 The identity b2 - c2 = (b-c)(b+c) followed by (0.1) and (1.2) gives 2 33((qa - $3-1)(q3 +'¢j-1 - Zej)) S nDn(g,9) . - -2 (1 3) s z £1“ch ¢j)(cpj+1l!j 63)) = - - -2 . z §j((¢3 ¢j_1 Aj)(q3+vj 61)) Since *0 is arbitrary, we can (and do) take $0 = m1. Then, since, for j 2 2, ¢3’ $1, ¢ and ej are in [-Nj,Nj], and 3-1 45 'maxlsdsn‘qB +'¢j - 291‘ s 4N“, from (1.3), -422Njfj|¢3 ' *3-1‘ s nDn(g,go s 4(22Nj§j\¢3 - tj_1\ + NHZPJ\Aj\). The last inequalities and Lemma 1 now complete the proof. . 2.2 Some Assumptions and Notations. For the remainder of this chapter, we take I = R, the real line, and assume 9 << ,1, where p. is a a-finite measure dominated by Lebesgue measure on R. With u , a fixed deter- mination of du/dx, we assume the existence of an a 2 -m such that (2.0) u(x) > 0 iff x > a. Furthermore, we take (2.1) n = {w e R\(c(w))‘1 é feu’xdmx) < m}; and, for w E n, (2.2) fw(x) = C(u))emx for x >ra, (and zero otherwise), as a fixed density of Pw wrt u. (Thus, with fj abbreviating fw , ufj is a Lebesgue density of j X ). Let a1 5 min and Bi 2 max be in a for 1 each 1 S i s‘n, and ail and Bit. We also take lstiwj 1sjsiwj (2.3) m1 = sup{fw‘w E [ai’ei]} and Ni = Sup{\e(w)\\w E [01:51]}° For T > O and x > a, define (2.4) u (x) = Lebesgue-inf of the restriction to [X,X+T) of u. ° T 46 The conclusion of Lemma 2 will be used in obtaining certain rates for various estimators to be introduced in later sections. Since the upper bound in the lemma does not depend on the first component (with values in [-Nn,Nn]) of the estimator 9 there, without any_further indication, the such component of each of the estimators (yet to be introduced), is taken to be arbitrary with gagges in [-N1,N1]. Our work in each of the next two sections is comprised of mainly two steps: First to exhibit an appropriate estimator ¢i+l of $1 and then to obtain a suitable bound for N1+1£i+1\mi+1(xi+1) - ¢i(xi+l)‘ for each i = l,...,n-l. Using this and Lemma 2, we will obtain a bound for ‘Dn(m,qp\ uniformly in e e stabs. Let 0 < hn s hn-l 3...: h1 S 1. Unless stated otherwise, we, hereagier, fix 1 with, 1 s i s n-1 and drop the subscripts in mi, a1, Bi’ Ni’ hi and Xi+1. For aj E R, let a = i-lziaj. Note that log C(w) é -1og jew'du(-) is concave on [0:9] and, hence, so is log fw(x) = mx + log C(w) for each x. Thus, inf = f A f8. Hence, V y 2 0 f aSwSB w a (2.5) q é (mi+1/(fa A f )Y/Z) 2 £i+1/(f)Y/2. Y B For a real valued function g on R and for numbers b < c, abbreviate the retraction of g to [b,c] by (g)b C. Unless stated otherwise, all the limits (of functions depending on i) are taken as i 1 m (hence, necessarily as n 1 m). 47 2.3 A Divided Difference Estimator of m with a Rate 1/5. In this section we consider the case when 6 is the identity - 1 - map. Since fj(x) = C(wj)exp(wjx), by (1.0), ti = (log f)( )[f > O]. A Motivated by this expression, the compound estimator ! to be introduced here will be based on a divided difference estimator of log f. The main idea behind the construction of this kind of estimator is developed by Gilliland (1966), (Chapter 3), in sequence- compound SELE of means in the family of normal densities. 
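As a rough numerical illustration of the construction that follows (the functional Q of (3.0) and the retraction (3.1) below), consider the sketch here; the function names, the degenerate-case guard, the toy exponential family with u the standard exponential density (so a = 0), and the bandwidth are ours and are only indicative, not the construction analyzed below.

```python
import numpy as np

def divided_difference_estimate(x, X, u, h, lo, hi):
    # Sketch of a component estimate: a divided difference of log f_bar, retracted
    # to [lo, hi].  delta_bar(y) = i^{-1} sum_j [y <= X_j < y+h] / u(X_j) is an
    # unbiased estimate of the Lebesgue integral of f_bar over [y, y+h), and the
    # log-ratio Q(t)(x) = h^{-1} log(t(x+h)/t(x)) is in the spirit of (3.0) below.
    def delta_bar(y):
        inside = (X >= y) & (X < y + h)
        return float(np.mean(inside / u(X)))
    num, den = delta_bar(x + h), delta_bar(x)
    if num <= 0.0 or den <= 0.0:      # degenerate window: fall back to an endpoint
        return lo
    return float(np.clip(np.log(num / den) / h, lo, hi))

# Toy use: mu has density u(x) = exp(-x) on (0, infinity) (so a = 0), and
# X_j ~ C(omega_j) e^{omega_j x} u(x) dx, i.e. exponential with rate 1 - omega_j.
rng = np.random.default_rng(1)
omegas = rng.uniform(-0.5, 0.5, size=400)      # natural parameters in [alpha, beta]
X = rng.exponential(1.0 / (1.0 - omegas))
print(divided_difference_estimate(1.0, X, lambda t: np.exp(-t),
                                  h=400.0 ** (-0.2), lo=-0.5, hi=0.5))
```

The bandwidth i^{-1/5} here anticipates the choice h_i = c_0 i^{-1/5} made before Theorem 1.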
Our technique to be introduced here in defining i is, however, a little different than those of Gilliland (1966), (Chapter 3), Susarla (1970), (Section 1.2), and Hannan and Macky (1971); and does not require the continuity of u for g to have a rate. The method used here to get rid of the continuity requirement of u is partly due to Yu (1970), (Section 2 of the appendix), where he exhibits kernel estimators of a density function and its derivative. Define a real valued functional Q on the space of all real valued non-negative functions t on R by — + (3.0) Q(t)(x) = h 1(10g Effiyhl)[t(x+h) + t(x) > 0] . ZhN y+h Let = e and, for ° = l,...,i, let 6.( ) = f, and n J J y fy J Bj(y) = [y s xj < y+h]/u(Xj). Note that bj is well defined with probability one, and is an unbiased estimator of éj The compound estimator *9 which we pr0pose for w, has (i+1)st component (3.1) 11,100 = (Q(é)(X))a,B- Abbreviate Q(5)(x) and Q(6)(x) by Q(x) and 6(x) respectively. 48 For x > a, define * u (x) = Lebesgue-sup of the restriction to [x,x+2h) of u. by u . In Lemma 3 below and in its proof, Q, Q, Denote u * 2h * - u*, u ,m. and f all are evaluated at a fixed point x > a. Lemma 3. V v > 0 (3.2) 31(\Q - 6\ A 2N)Y s ko(y)(ih3f ui/u*)’Y/2 where ko(y) ' vP(v/2)(l6n3(1 ‘l--T\2)/31<‘+)'Y/2 with k = l - hnu*m. Proof. The lhs of (3.2) is A 2 (3.3) If," gum - Ql > v]d + p2(v)>d. where p1(v) = P1[(6 - Q) >'v] and p2(V) = P1[(Q - Q) > v]. Our method of the proof here involves obtaining an appropriate upper bound for p1(v) + p2(v) with 0 < v < 2N. Fix v in (0,2N) until stated otherwise. For 1 - 1,...,1, let Y1 = bj(x+b) - R ehvb Let vj - Pij and 02 = i var(§). We will first obtain (3.7) below by obtaining suitable upper bounds for 5 and 02. Notice j(x), where R = 5(x+h)/5(x). that Vj = 6j(x+h) - R ehv6j(x). Hence 5 = (l - ehv)5(x+h), and we get (3.4) —n5(x+b) s G s -bv5(x+b) By independence of Yl’°'°’Yi and by cr-inequality (Loave (1963), p. 155) we have 2 2hv 2 2 (x+h) +'R e P13§(x)). (3.5) 16 s 2: Pij s 22i(ijj 49 Since v < 2N, R - 8(x+h)/5(x) and, for y = x, x+h, ijio') s 61(y)/u*, by (3.5) we get a2 s 20 + Rn2)5(xH1)/u* = 2(<5)'1>52>/u,. Now, since, for l s j s 1, mi 6 ['N.N]. _ 6 (y) w (t-x) (3.6) nnlsfla-i-fyfiej dtsnn for y=x,x+h. 1 Therefore, weakening the final upper bound obtained above for 02 by the first inequality in (3.5) we get u*foz s 2(1+n2)m'152(x+n). This last inequality and (3.4) give - 2 h3v2fu* (3.7) fl;— 2 —-—-§—- . a 20+“ )11 Next we will obtain (3.10) below by obtaining apprOpriate lower bounds for oz, 6. vj and -Yj. By independence of ‘11,...31 and by the facts that v > O, Pj('51(°)) > 0, and 31(x'l'h)8 (x) I O with probability one, we get .1 (3.8) 02 2 i-lzi(var('6 (x+h))+ 1?.2var('8j (x))) . .1 * Now the definition of u and the second inequality in (3.6) yield, for y Ix, x+h, . t+h - 2 Var(3j(Y)) jy (fj/uo 610’) 2 (51m (1 - u*6j+/u*) 2 (6j(y)(l - hnp*fj)+7u*) 2 k+6j(y)/u*, 50 where k is as given in the lemma, and the last inequality follows from the definition of m given in (2.3). Consequently, from (3.8) we get (3.9) u*02 2 (5(x+h) + R25(x))k+ = (1+R)5(x+h)k+. h Next observe that -R e vBj(x) S'Yj S Bj(x+h). Therefore, since for y = x, x+h, Bj(y) s l/u* with probability one, Y S 1/u* J and -vj S RTVu*, These upper bounds for Yj and -vj together with (3.4) and (3.9) yield (Y j-vj)(-5/oz) s {(1+m)nu*/(k+(1+R)u*)} 2 * + - S n u (k u*) 1. Hence 2 * 2 (3.10) Y. - v. s 11—”— (- 9—) J J k+u v * We will use (3.7) and (3.10) to obtain a suitable upper bound for p1(v). Note that the event in p1(v) is [§.>’0]. 
Therefore,(3.10) and the Bernstein inequality stated in (2.13) of Hoeffding (1963) give , - 2 (3.11) p1(v) = Pi[Y - C) > -v] S eXp{- 1L v) 7 * } ~ 2 n o 2(1 +'IL§?'9 3k u* 3ik+h3vzfui 5 exp{- 3 2 * 161'] (l + T1)u where the last inequality follows by (3.7) and by the fact that 2 * - * (1 +‘“ u /(3kfh*)) S 4(3k+u*) lfizu , since n 2 1, k+ s 1 and * u 2L1 *. By interchanging x, x+h in the definition of Yj's and by applying the techniques used for bounding p1(V), we see that p2(v) is also bounded above by the extreme rhs in (3.11). 51 Now bounding above the integrand on the rhs of (3.3) by the upper bound just obtained for p1(v) + p2(v) and then per- forming the integration there after extending the range of integra- tion from (0,2N) to (0,oo) we get the desired conclusion. I Lemma 4. g(l) supt>a\(Q(5) - )(t)\ s 4(Nm2h . Proof. Since, for 1 s j s i, mj E [-N,N], for each integer v 2 0 and V t E [°, - + 2h] we have \£(V)(t)\ v wj(t-.) v (3-12) fj(') = \wj\e s N T], and f (t) w.(t-°) (3.13) -—i——- = e J 2 “'1 f.(') J For the purpose of this proof only, let gj = wglfj° Since g(t) = g(t+h) - g(t), by Cauchyvmean value theorem, see Graves (1956), p. 81, for some a in (0,1) (3.14) m =‘é((::£t + h + eh) ___ f(t + h + 5h) 5(t) g (t +-eh) f(t + eh) Therefore, by (3.14) and by mean value theorem, Q(5)(t) = 1 (1) t'=t+yh h' log(f(t + h + eh)/f(t + €h)) = (log E(t')) for some Y 6 (0,2). Making another use of mean value theorem at the third step below, we thus have, for some y', y” E (O,yh) 52 -<1) Eu) Eu) \Q<5)(t> -( >1= \(f )(t + Yh) - +\(-=-—>! f(t+\(h) - mm (3.15) “(2) l ”321%?“ “+3 >\ -<1> f + \(T)(t)"f(1)(t+¥")\) s 4h(N’n)2 where the last inequality follows by applying (3.12) for v = 2,1, (3.13) and the fact that lf(1)/f| SLN and y < 2. Since the rhs of (3.15) is independent of t, the proof of the lemma is complete. . Observe that edh s (6j(x+h)/5j(x)) s eBh for each 1 s j S 1. Therefore, Q is in [a,B]. Since Wi = f(1)/f and N = ‘a‘ V ‘5‘, by (3.1) and Lemma 4 we get (3.16) m- ind s \Q ,B\+ \Q Hi \ s (\Q - (N A 2N) + 4h(NTDZ. Therefore, (3.16) followed by cr-inequality (see Loéve (1963), p. 155), Lemma 3 and (2.5) leads to Lemma 5. V y 2 O 3 - 2 * 2 2 2 Emma» - imam s k6(v){(ih > Y/ Wu lug“ q!) + (W )Y) 4. where k6(y) = 2(Y-1) {(4112)Y v ko(v) with k in k0(y) replaced by mama - hnu*>}. 53 For the remainder of this section, let c0.Cl,... denote absolute constants, and let . -l/5 h - hi - coi . We will now state and prove our main result of this section. Numbers k1,k2,... below are finite and independent of n. Theorem 1. If V i = l,...,n, (111.0) h1‘p*m s 1 - c1, and if for a 5 6 [0.1] a a v 6 [5,1] and k and k2 a with 1 {1 I=2-l-y, V i =l,...,n, (A1.1) u((u*/ui)Y/2qY) s kiNY-1h(6-Y)(1-g), and (Al.2) N s k2h§(6'Y), then 3 a R3 3 \Dn(m,i)\ S R3112, uniformly in (B E Xrllllaiiei]. Remark 3.1. Assumptions (A1.0), (Al.1) and (Al.2) together imply the existence of a R4 3 V i = l,...,n, (3.17) u(m) s k4N-2h(85-11Y)/6 . To prove (3.17) we proceed as follows. Since T} 2 l, (A1.0) implies “umum < 11.1. Consequently, £01 A fB < (hu) -1. Therefore, since * 2 - m 2 m and since (u /u*) 2 (u) 1, from (2.5) the u-integrand 1+1 in (ALI) is no less than th/Z. Hence, by (A1.l) 54 (3.18) N2u 0, then the rhs integral in (3.19) is not finite, unless u(x)'~ O (as x aim) at least as fast as e-Lx for some L > (25 - ya)/(2-y), which holds, since 0 s y s 1, a4< B, a“ and '5 1 in i, only if N (= ‘3‘ V \a‘) = 0(1). Thus, in such situations (Al.l) holds only if N = 0(1). 
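For computational concreteness, the estimator of (3.0)-(3.1) with the bandwidth h_i = c_0 i^{-1/5} appearing in Theorem 1 can be sketched as follows. The Python code below is an illustration under simplifying assumptions, not the construction analyzed above: μ is taken to be Lebesgue measure on (0, ∞), so that u ≡ 1 there, the family is f_ω(x) = (-ω)e^{ωx} with ω in [α, β] ⊂ (-∞, 0), c_0 = 1, and the handling of empty windows is a choice made only for the sketch.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative assumptions: mu is Lebesgue measure on (0, infinity), so u = 1 there,
    # and f_omega(x) = (-omega) exp(omega x) for x > 0, with omega in [alpha, beta], beta < 0.
    alpha, beta = -2.0, -0.5

    def u(x):
        return np.ones_like(x)

    def delta_bar(y, X, h):
        # delta-bar(y) = i^{-1} sum_j [y <= X_j < y + h] / u(X_j), as in Section 3
        inside = (X >= y) & (X < y + h)
        return np.mean(np.where(inside, 1.0 / u(X), 0.0))

    def divided_difference_estimate(x, X, c0=1.0):
        # h^{-1} log( delta-bar(x + h) / delta-bar(x) ), retracted to [alpha, beta]:
        # an informal rendering of (3.0)-(3.1), with the bandwidth h_i = c0 * i**(-1/5).
        h = c0 * len(X) ** (-1.0 / 5.0)
        num, den = delta_bar(x + h, X, h), delta_bar(x, X, h)
        if num <= 0.0 or den <= 0.0:      # guard playing the role of the indicator in (3.0)
            return alpha                  # any value in [alpha, beta]; a choice for the sketch
        return float(np.clip(np.log(num / den) / h, alpha, beta))

    # Toy check with identical components omega_j = -1 (Exp(1) data): the estimate at
    # x = 1 should be near -1, though noticeably noisy at the i**(-1/5) bandwidth.
    X = rng.exponential(scale=1.0, size=50000)
    print(divided_difference_estimate(1.0, X))

The visible noise in the toy check at moderate i is in keeping with the slow bandwidth h_i = c_0 i^{-1/5} used in this section.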
It will be shown in the example following the proof of the theorem that, for the case u(x) = (2n)-%e-x2/2[-m < x < w], (Al.1) holds only if N 1 m not faster than (log i)%. From these examples we con- jecture here that for (Al.l) it is perhaps necessary that, for some T1.T2 non-negative, N = 0(1) + T1(log i)T2, whatever be the form of u. If this is true, then of course (A1.l) also implies (Al.2) . Proof of Theorem 1. By (Al.2), n (= eZhN) 1 l, and hence by (Al.0), k6 in Lemma 5 is bounded in i. Now fix v and 5 satisfying the hypothesis of the theorem. The trivial bound 55 yd s 2N yields - ¢1(X)\ S ZNI-Ygi+1wi+l(x) .. Ei+lwi+1(x) i-l/S ”1+1 ' §1(X)‘Y. Therefore, since h = , Lemma 5 gives k5 and k6 3 V’i = 1,...,n _ - 2 2 2 (3.20) N1+1£i+lwi+1(X) - “(ml s k5N1+1N1 Yi Y/5(LL((U*/U*)Y/ qY)+N Y) i-o/S SR6 , where the second inequality follows from (A1.l) and (Al.2). Since X abbreviates Xi+l and (3.20) holds for each 1 s i s n-l, {12; NjEjHj-1(xj) - $j_1(xj)\ s k6n'lzri'1j'Y/5. Thus, the first term on the rhs of the inequality in Lemma 2 with 53 there replaced by i is bounded above by (k3/2)h: uniformly in 9 E x:[ai,51], and so is the second term there by (3.17), since k4 is independent of i and n. I Now we will show how the conditions of Theorem 1 reduce to a single condition on N when the family of densities involved is normal- 2 Example N(m,1), Let u(x) = (2n)-%e-x /2[-m‘< x < m]. -— 2/2 Then a = -mq C(w) = e w and n R. LEt -cy=B=N>0. ‘We will show that all the assumptions of Theorem 1, with y = l and any fixed 6 6 [0,1], are satisfied iff 3 21) N é N = 0(1) + (LZQ l ')% ( . i 15 og l . We will first prove the 'if' part. Clearly (Al.2) holds. (Sonsidering the upper and lower bounds for the ratio u(t)/u(x) * 2h‘x‘ for x s t < x + 2b, we get u (x) S u(x)e and u*(x) 2 e-2h(\x\+h). Therefore, u(x) 56 (3.22) u*fw(x> s e2h\x‘u2-4h\X\)} S exp{2h(h + w sgn x)}. 4...). 2 By (3.22) u m = u SUP‘w‘Swa S eXp(2h + 2hN). Therefore, since h = coiull5 and hence hN is bounded uniformly wrt co in a neighborhood of zero, by a suitable choice of CO, (A1.0) holds. For this paragraph only, let N abbreviate N1+1 (instead of Ni). Now observe that mi+1(x) é sup‘w“wa(x) S exp(x2/2)[‘x\ S N] + exp(N\x\-N2/2)[\x\>N]. Therefore, since fa(x) A fB(X) 2 exp(-N\x\ - N2/2), by (2.5) Suez-erupt 2 1 2 (3.23) q1(x) s e \(egx [m SN] + e”2N ”MEN >511). * Moreover, using the bounds obtained above for u and u we get * * (3.24) (E'§'(x))% s (211)%8XP(1£X2 + 3h\x\ + 2h2) u* Thus by (3.23) and (3.24), u{((u*/u:)%ql)[\x\ S N]} S 2N exp(N2 +-3hN + 2h2) and 2 ui<%qi>[\x\ > NJ} s <2n>'%exp(- §"+ 2h2>fcxp<-tx2 + 3(h + §)\xk)dx 2 S czexp(2N + 9hN) Consequently, since hN is bounded, we get 2 1. (3.25) u((u*/u:)2q1) S c3e2N 57 Thus, by (3.21) and (3.25), (Al.l) holds with y = 1. Conversely, by Remark 2.2 we note that the lhs of (Al.1) is bounded below by the rhs of (3.19). Therefore, since B = -a B N, with v = 1, 5‘21‘12 2 (3. 25)‘ “((u */u*) ql) 2 (2”) exp(- -—)Iexp(- %—+ 2 = (8n)%e2N 3Nx 2 )dx Thus (A1.l) with y = l and any 5C[0, 1] holds only if 2 eZN s (8n)%k1h2(6-1) ’3 , or only if (3.21) holds. II The following corollary, which is a consequence of Theorem 1, asserts that for certain families of densities, Dn(m,§) = 0(hn) uniformly in m 6 an. It also shows how the condition (Al.l) of the theorem is simplified in fixed N case. Recall from (2.4) and the definition following (3.1), that u*(é UZh) and u* de- pend on i, l S i S n-l. Corollary 1. Let a and B be constantswrt i. 
If H is such that (Al.0) holds, and for a 6 6 [0,1], With W(a,8) = a ' (55/2): (3 26) “{(exp(xw(a,6))[x S 0] + exp(xw(B’a))[x > 0])(— u:)6/2< u* ]< m for i = l, (e.g., take any 6 in [0,1] with 6 < ZB/a, and u(x) = xT-l[x > 0], 'r 2 1 or 2:(j+1)[j S x < j+1]) , ~ 5 . . then Dn(9’!) = 0(hn) uniformly in m E [a,e]n Proof. Since N E ‘a‘ V ‘8‘ is constant wrt i, (Al.2) holds with y = 5. C(w) is clearly bounded away from 0 and m _._ 5/2 ‘ — (min/(fa A f8) ) s 2 *5/2 . . . * (u*/u ) times the u-lntegrand in (3.26). Thus, Since u i on [0231- Hence, since m 5 sup‘w‘ngw, q6 58 and u*1 in i, (Al.l) with y = 6 holds by (3.26).. As a final comment to this section, we state here that the rate 1/5, that is shown to be achieved, under certain conditions, by i, is perhaps much slower than that i could actually achieve. This will be supplemented in Section 5, by showing that Dn(w,§) = -(2/5)+- 0(n ) V w 6 an with identical components. 2.4 Kernel Estimators with Rates Near % when 9(w) = w, ew 9£_ m-l. In this section we consider the situations where 9(m) is w, em or m-l. In each of these cases we will exhibit, for each s >'0, a class of compound estimators with rates (l-e)/2 uniformly in 9 e xlEQi’Bil' The classes of compound estimators to be exhibited in this section are based on types of kernel functions introduced by Johns and Van Ryzin (1972) in Empirical Bayes Linear loss two-action problems in exponential families. Thus in this section we will have two sets of assumptions; the one, which was not needed in Section 2.3, involves the kernel functions defining the classes of estimators, and the other involves the family of densities. Recall from the latter part of Section 2 that the dependence of h,a,B,N,m and qY on i is abbreviated by omission, where i is fixed with l S i S n-l. For 0 = 0,1 and integer r > v, let x: be defined as in Section 1 of Chapter 1. As in Chapter 1, with a fixed Kv 6 xi, define v+l)-l i 3(v) , _. . - (4.0) f ( ) — (ih 21{(Kv((Xj o)/h)/u(Xj))[u(Xj) > 0]} . Estimation of Vi (and hence exhibition of compound estimators) in this section, involves estimation of one or both of 59 the functions f and f(1). It has been seen in Chapter 1 that g(v) (v) , as an estimator of f , has various asymptotic properties. we will make , according to our need, applications of one or both of the functions f(0) and f<1> in defining our compound estimators here. The reason we have taken here r > v (instead of r 2 v, as is taken in Chapter 1) is that x:, for any integer r > v, is non-empty and f here is (infinitely) differentiable. Moreover, we assume here that Kv 6 L2(O,l), (e.g., Kv could be the v-th element of the dual basis for the subspace of L2(0,l) with basis ”'11). Denote ‘yj‘Kv(y)‘dy/j! by Mini“ Let s = ‘a‘ V ‘5‘. (In case 9 is identity map, 8 = N-) {1.y.---,y ‘dd'h "(1‘) Since maxlSjSi‘wj‘ S s, by mean value theorem, x ‘f ‘dy S hsrehsf. Hence, (2.8) of Chapter 1 gives (4.1) ‘Pif(v) - E(V)‘ s ‘k‘r_1 ”hr-vsrehsf . - hs - Moreover, since (Lebesgue) ess-supo‘tsl(f/u)(- + ht) S e (f/uh)(-), the inequality in (4.0) of Chapter 1 followed by the equation in the proof of Lemma 1 there, gives (4.2) var(f(v)) S fehs(ih2v+luh)-1HKvH: . As in Remark 5.2, and in Inequality (5.18), both of Chapter 1, a choice of b, that balances the two terms hr-v 2 - v+l) k and (1h appearing in the bounds for the bias in (4.1) 2(v), i and the standard deviation in (4.2) of the estimator s (4.3) h = i‘1/(1+2r). 
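In computational form, (4.0) with the bandwidth (4.3) reads as follows. The Python sketch below is illustrative only: the kernels are polynomial solutions of a small Hilbert-matrix system on (0,1) (one admissible choice, in the spirit of the dual-basis example mentioned above), u is assumed known, and the toy check takes all ω_j = -1 with μ Lebesgue measure on (0, ∞); these concrete choices are assumptions of the sketch.

    import math
    import numpy as np

    def dual_kernel(v, r):
        # Polynomial K_v on [0, 1) with integral_0^1 y**j K_v(y) dy = v! * delta_{jv},
        # j = 0, ..., r, obtained by solving a Hilbert-matrix system; one admissible
        # choice of kernel (an assumption of this sketch).
        H = np.array([[1.0 / (j + k + 1) for k in range(r + 1)] for j in range(r + 1)])
        coef = np.linalg.solve(H, math.factorial(v) * np.eye(r + 1)[v])
        return lambda y: np.polyval(coef[::-1], y) * ((y >= 0.0) & (y < 1.0))

    def f_hat(x, v, r, X, u):
        # (4.0): f-hat^(v)(x) = (i h**(v+1))**(-1) sum_j K_v((X_j - x)/h)/u(X_j) [u(X_j) > 0],
        # with the bandwidth h = i**(-1/(1+2r)) of (4.3).
        i = len(X)
        h = i ** (-1.0 / (1.0 + 2.0 * r))
        K = dual_kernel(v, r)
        uX = u(X)
        terms = np.where(uX > 0.0, K((X - x) / h) / np.where(uX > 0.0, uX, 1.0), 0.0)
        return terms.sum() / (i * h ** (v + 1))

    # Toy check (assumptions: mu = Lebesgue on (0, inf), u = 1, all omega_j = -1, so that
    # f-bar is the Exp(1) mu-density): the estimate below should be close to exp(-1) ~ 0.37;
    # f_hat(1.0, 1, 2, X, u) analogously targets f-bar'(1) = -exp(-1), with larger variance.
    rng = np.random.default_rng(2)
    X = rng.exponential(size=20000)
    print(f_hat(1.0, 0, 2, X, lambda x: np.ones_like(x)))

With v = 0 this is a one-sided kernel estimate of f-bar itself; v = 1 targets its first derivative at the cost of a larger variance factor.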
60 This choice of h has been adopted by various authors, (e.g., Susarla (1970, Theorem.2; Yu (1970), Theorems 1.1 and 2.1; Johns and Van Ryzin (1972), Theorems 3 and 4), working on certain problems utilizing kernel estimators of a density or of its derivative. We too agggg (4.3) throughout this section and in Theorem 6 of the next section. For 0 < y S 2, let M v be the y-th mean error of f(v), 3 = 3(v) -(V) Y - = 2 i.e., My’v Bi‘f - f ‘ . Then, slnce M2,v (lhs of (4-1)) + lhs of (4.2), Liapounov's inequality followed by cr—inequality and (4.3) yields y(r-v) r- y - y/2 (4-4) MY’V 5 °v(Y)h {(8 f) + (f/uh) } where - hS y hs 2 y/Z Cv(Y) - (‘k‘r-1,ve ) V (e HKVHZ) . Hereafter, wg abbreviate f(v) by f and MY 0 by My' Recall that X abbreviates X . Inequality (4.4) will be used r+1 in obtaining an upper bound for Ei+11¢i+l(x) - ¢i(X)‘ where qa's are yet to be defined. Let c1,c2,... denote absolute constants. We now discuss the three cases separately. Case m. We consider here the case when 9 isthe identity map. Since fj(-) = C(wj)exp(mj-), (1.0) Specialized to the case 9(w ) = wj yields J _ -(1) - (4.5) ‘i - (f /f). Since x: is non-empty for any r > v, and f here is (infinitely) differentiable, we restrict, throughout our discussion of this case, r in (4.0) to be at least 2. Define a compound estimator ..I‘Il. 61 ; with its (i+l)st component, $1+1(X), given by (4'6) %1+1 = (%(l)/%)a.6’ where f”) is given by (4.0) with h = 1'1/(H2r). Define l-I - 111 by (a 7) H = hr-l é i-(r-l)/(1+2r). Recall that N = s = ‘a‘ V ‘B‘ and each of 0:9 and N hides subscript i. The following lemma which plays the central role in proving Theorem 2 below is a consequence of (4.4) and Lemma A.2 or the appendix. Lemma 6. V p > 0 and 0 < v S p A 2, - P P'V V rv v/Z (4-8) 21+1‘Wi(x) Wi+1(x)‘ 5 B(P)N H (N +‘u(qY/Uh )) + where B(p) = 2P+(y-1) (1 +-(hN)Y(l+2Y))max (Y)- v=0,lcv Proof. Fix 0 < y S p A 2. Since *1 and Wr+1 are in [u.B] and N = ‘a‘ v ‘3‘, by (4.5) and (4.6), ‘1i - yi+1‘ s 2N. Consequently, Lemma A.2 of the Appendix and the definitions of i d M v y el + _ p P+(Y‘1) p-Y - - Y Y (4.9) 31‘“ (1‘ s 2 N (f) V(MY 1 + (1+2 )N MY 0). , 3 Since 8 = N, by (4.3) and (4.4), h(v-1)YM is bounded above by Y2V - - - 2 Cv(Y)(Hf)Y(NrY'+ (fuh) Y/ ). Consequently, the rhs of (4.9) is - - - 2 bounded above by B(p)Np YHY(NrY+ (fuh) Y/ ), where B(p) is as given in the lemma. Since X abbrev1ates Xi+l’ taklng expectation wrt Pi+l on both sides of the inequality just obtained we get the desired 62 conclusion from the definition of qY given in (2.5). I Lemma 6 with p = 1 will be used to prove our main result below. The numbers b0,b1,... below are finite and independent of n. Theorem 2. Recall from (4.7) that H = hr"1 = i-(r-l)/(l+2r). If for a 5 e [0.1] a a b0 3 v i = l,...,n, (A2.0) u(m) S b iHé/(N2(l + log i)), 0 and if 3 s v 6 [6,1] and b1 and b2 3 with §-1 = 2 + y(r-1), (A23) u(qY/uzlz) s bINY'1n(5'Y)(1‘§) v i = l,...,n-l, and (A2o2) N s b2H§(5'Y) v i = l,...,n, 6 . . n then 3 3 b3 3 ‘Dn((£,i)‘ 5 b3Hn uniformly 1n (3 E XIEHi’Bi]. Remark 4.1. For r = 2, (A2.2) is equivalent to (Al.2), while (A2.l) is implied by (Al.l). Moreover, since the rhs in 1-5/2 (A2.0) is no less than b i /(N2(l + log i)), by Remark 2.1 0 via (3.17) there, for each r 2 2 (A2.0) is implied by (Al.0), (Al.l) and (Al.2) together. Thus assumptions of Theorem 1 are stronger than those of Theorem 2,at least for r = 2. Remark 4.2. By (2.5) via the definition of m, the lhs of (A2.l) is no less than (4.10) ”(EL—Y7?) = C_(%__ .‘m ul'Y/Ze(B'Yo/2 )xdx . 
(Ufa) CY (01) 8 Equation (4.10) is the same as (3.19). Hence the comments in Re- mark 2.2, regarding possible necessary conditions for the finiteness 63 of the integral on the rhs of (4.10) (and hence for (A2.l)),remain valid here too. Proof of Theorem 2. Since hr.1 5 H and (y-6)§ < (r-l)-1, by (A2.2) hN i 0. Therefore, since 3 = N, Cv(Y) in (4.4) is bounded in i, and so is B(l) in Lemma 6. Consequently, Lemma 6 with p = 1 gives a b such that 4 1- 2 (4.11) Ni+121+1‘¢i(X) - yi+1(X)‘ s b4N1+1N Yip/(NW + u(qY/ux/ )) S b51-6(r-1)/(1+2r) where, remembering g'1 5 2 + y(r-l), the last inequality follows from (A2.l) and (A2.2). Since (4.11) holds for each 1 S i S n-1, and X there -lzt11-li-6(r-l) /(l+2r). Thus, the first term on the rhs of the inequality in Lemma 2, with -l abbreviates Xt+1, n ngiEi‘wi-l(xi) - 61(Xi)‘ S bsn m there replaced by 3, is no more than (b3/2)H: uniformly in w 6 x:[ai’ai]’ and so is the second term there by (A2.0), since b0 is independent of i and n.- The hypotheses of Theorem 2 are satisfied for many exponential families. In Example N(m,l), introduced in Section 2, we will show that all the assumptions of Theorem 2, with -a = B = N>0, y = l and any fixed 6 6 [0,1] are satisfied iff , _ rLr-l)(l 4;) . s (4.12) N -~N1 — 0(1) + (2(l+r)(1+2r) log 1) . Note that for the case r = 2, (4.12) is the same as (3.21). Since the lhs of (3.25) is bounded below by u(ql/ug), we h have from there 64 2 g 2N (4-13) u(ql/uh) S cle Thus, if (4.12) holds, then, with Y = l, (A2.2) holds; from (4.13), (A2.l) holds; and,2from the fact that m(x) é sup‘m‘Sme(x) S e implies u(m) S eN l2, (A2.0) holds. 0n the other hand, we have noted in Remark 3.2 that the lbs of (A2.l) is no less than the rhs in (4.10). Therefore, with Y = 1 2 2 2 -% -N /4Ie-x /4 +.§E§-dx = (8n)%e2N . (4-14) u(ql/UE) 2 (2n) 8 2 Hence, (A2.l) with y = 1 holds only if (4.12) holds. II Remark 4.3. Theorem 2, specialized to the above Example N(w,l), improves the univariate version of the result in Theorem 3 of Susarla (1970). We have shown, through a simpler and shorter proof, the existence of less restrictive kernel estimators with rates (r-l)6/(l+2r) where 6 is given by (4.12). Our rates are strictly higher than the rates (r-l)/(4+2r) shown to be achieved by his kernel estimators in the bounded N-case, provided, in our unbounded N-case, N satisfies (4.12) with some 1 > 6 > (1+2r)/(4+2r). Note that the number of restrictions on the kernel functions increases as r increases. The following corollary shows how the conditions of Theorem 2 are simplified greatly in fixed N case. From (2.4) and (4.3), remember that depends on i with l S i S n-l. uh Corollary 2. Let a and B be constants wrt i. If for a 6 6110.1]. and with w(oz.e) = 01 - (53/2). 0/2 h (4-15) u{(eXP(m(o:,8))[x S 0] + eXP(XW(B,oz))[x > 0])/u } < co 65 for i = l, (e.g., take any 6 in [0,1] with 5.< ZB/a, and u(x) = xT-1[x > 0],'T 2 1 or E;(f+1)[i 5 x < 1+11), then Dn(m,i) = 0(H2) uniformly in 9 e [a,e]“. M. Since N 2 ‘cy‘ v ‘5‘ is constant wrt i, (A2.2) holds for y = 6. C(m) is clearly bounded away from 0 and m on [are]- Therefore. m(x) e sup f (x) s c2(exp(6x)[x >.0] + ‘w‘SN m exp(ax)[x S 0]), hence (A2.0) holds since a and B are in 0. Moreover, since (fB A fa)(x) = C(m)(exp(ax)[x >.0] + exp(Bx)[x S 0]), q5 é mi+ll 0 and 0 < v S p A 2, -y/2 - 2 - /2 , afllqyioo - ii+1(X)‘p) s B(p)H¥Np{er + N Y/ ”{qy((uh) Y + (uh) )1}. + where B(p) - ZPHY‘D (2 + 2Y)co(v)- Proof. Since wj E [a,5] for all 1 S j S i, by (1.0) e" si 5 e3. Thus (4.16) and (4.17) followed by the fact that N . 
e3 give ‘ S 2N. Therefore, since (EVE) 5N. ”i " i1+1 it follows from'Lemma A.2 of the appendix, that p P+(Y'1)+ ' “Y I")! Y Y (4,13) Ei‘wi - ¢i+1\ s 2 (f) N {MY + (1+2 )N My} . Since f' s Nf, from (4.4) with v 0, M‘ s c0(y)HY{(Nsr53')Y + (Nf/ufi)Y/2}. Consequently, by (4.4) with v = 0, rhs of (4.18) s B(p)NpHV{er + (Nf)-Y/2((uh)-Y/2 + (111;) ’Y/Zn, where B(p) is as given in the lemma. Now (4.18) followed by the preceding inequality and the definition of qY in (2.5) yield the des ired conclus ion . . Let the numbers bo,b1,... of n. Now we will obtain the main result for the case under study. below be finite and independent 67 - /' 1+2 Theorem 3. Recall from (4.7)' that H = hr = i r ( r). If for a 6 6 [0,1] 3 a b0 3 V i = l,...,n, . 6 2 (113.0) u(n) s boiH /N (l + log i), and if 3 a y 6 [6,1] and b1 and b2 9 V i = l,...,n-l, ‘Y/2 . "le -2+¥/2 5w (113.1) ”{qy((”h) + (uh) )} s blNifl H and ry -2 6-y (A3.2) s s b2N1+1H , 6 . n then 3 a b3 9 ‘Dn(e,i)‘ S b3Hn uniformly 1n 9 E X1[ai,ei]. Remark 4.4. If n S (-m,0], then taking 3 E 0 (implying N 5 eB E l) (A3.2) becomes ‘q‘rv S b Hé-Y. In general, (A3.2) 2 holds if ‘o‘ V ‘B‘ 1 m at rates not faster than b4log i for some th < r/(1+2r). Keeping the difference in H's in two Cases w and ew in mind, we see that (A3.0) implies (A2.0). Finally, since the lhs of (A3.l) is bounded below by the lhs of (A2.l), comments, quite similar to those given in Remark 3.2 regarding possible necessary conditions on o and B for (A3.l),can be stated here too. Proof of Theorem 3. Since h = 1-1/1+2r, H = hr and N1+1 2 N1, (A3.2) implies hs # 0. Hence C(V) in (4.4) is bounded in i, and so is B(l) in Lemma 7. Consequently, since N (.2 Ni) S N Lemma 7 with p = 1, and the hypotheses (A3.l) i+1 ’ and (A3.2) yield a b5 such that V i = l,...,n-l, .-6 / 1+2 (4.19) Niflgifl‘yim - imm‘ s bsl r ( r). 68 In view of (4.19) and (A3.0), the remainder of the proof follows by arguments identical to those given in the second para- graph of the proof of Theorem 2.]. Remark 4.5. Theorem 3 here improves Theorem 6 of Samuel (1965). Restricting m's to a bounded interval of a, She exhibits estimators a? and shows that, under certain conditions, V e > 0 Dn(m,gr) < e V n 2 some n0(m,e) < m. We do not require her continuity assumption on u; and her other hypotheses always imply ours, (this may readily be seen through Corollary 3 below). By analyses analogous to those made earlier in Example N(m,l), it can be verified that the hypotheses of Theorem 3, with y=l,6€[0,1] and -a=a>0 (sothat s=B and N = exp(e) 2 1) are satisfied iff = 0(1) + (932$)— log i) The hypotheses of Theorem 3 reduce to a rather simple one in the fixed N case, as we can see in the following corollary. Corollary 3. Let a and B be constants wrt i. If for a 6 6 [0,1], (4.15) with u:/2 there replaced by uglz + (ufi)6/2 holds, (e.g., take examples mentioned in Corollary 2), then Dn(9’i) = 0(Hg) uniformly in m E [q,B]n 2322;. The proof is identical to the one given for Corollary 3. I In the next section we will point out that, in certain cases, Dn(9’1) are 0(H:-). Thus one may eXpect ] to achieve rates much higher than those indicated in Theorem 3. 69 Case m-l. We now consider the situation where the com- -1 ponent problem is SELE of B(w) = w . One of the important examples, where such estimation arises, is sequence compound SELE of scale A in F(A.T)-family: (T(T))-1xT-1k-Te-XIx[x,T > 0]. 
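To see that this family fits the form (2.1)-(2.2) of Section 2.2, one may take (a routine verification; the identification ω = -1/λ is spelled out here only for convenience)

\[
u(x) \;=\; \frac{x^{\tau-1}}{\Gamma(\tau)}\,[x>0], \qquad
\omega \;=\; -\frac{1}{\lambda} \;<\; 0, \qquad
C(\omega) \;=\; \Bigl(\int e^{\omega x}\,d\mu(x)\Bigr)^{-1} \;=\; (-\omega)^{\tau},
\]

so that \(f_\omega(x)\,u(x) = (\Gamma(\tau))^{-1}x^{\tau-1}\lambda^{-\tau}e^{-x/\lambda}\,[x>0]\) is the \(\Gamma(\lambda,\tau)\) Lebesgue density, and SELE of the scale \(\lambda = -\omega^{-1}\) is, up to the sign of the estimate, SELE of \(\theta(\omega) = \omega^{-1}\).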
This of course includes the case of sequence-compound SELE of \(\sigma^2\) in the \(N(0,\sigma^2)\)-family, \((2\pi\sigma^2)^{-1/2}\exp(-x^2/(2\sigma^2))\,[-\infty < x < \infty]\), since \(x^2\) is sufficient for \(\sigma^2\). Throughout the study of this case, we assume that \(\beta_n < 0\) (thus \(\beta_i < 0\), and hence every \(\omega_j < 0\), for \(i = 1,\dots,n\)). Since \(f_j(x) = C(\omega_j)e^{\omega_j x}\) and, for \(j = 1,\dots,i\), \(\omega_j < 0\), we have \(\omega_j^{-1} f_j(x) = -\int_x^{\infty} f_j\). Thus specialization of (1.0) to \(\theta_j \equiv \theta(\omega_j) = \omega_j^{-1}\) gives

(4.20)  \(\psi_i(x) = -\bigl(\textstyle\int_x^{\infty} \bar f\bigr)\big/\bar f(x)\).

Since \(\alpha \le \omega_j \le \beta < 0\) for all \(j = 1,\dots,i\), by (1.0) \(\beta^{-1} \le \psi_i \le \alpha^{-1}\). For the remainder of this section, let \(\hat f\) (\(= \hat f^{(0)}\)) be given by (4.0) with an integer \(r > 0\), and let \(L\) denote \(\beta^{-1}\log H\). Motivated by (4.20), our proposed compound estimator has \((i+1)\)st component

(4.21)  \(\varphi_{i+1}(X) = \bigl(-\bigl(\textstyle\int_X^{X+L} \hat f\bigr)\big/\hat f(X)\bigr)_{\beta^{-1},\,\alpha^{-1}}\).

From (4.3) and (4.7)′ recall that \(H \equiv h^r = i^{-r/(1+2r)}\). Also, note that \(s \equiv |\alpha| \vee |\beta| = |\alpha|\) and \(N = \sup\{|\omega^{-1}| : \alpha \le \omega \le \beta\} = |\beta|^{-1}\).

Lemma 8. For every \(p > 0\) and \(0 < \gamma \le p \wedge 1\),

(4.22)  \(E_{i+1}|\psi_i(X) - \varphi_{i+1}(X)|^p \le B(p)\,\)
HY\a""(‘a‘rY + 1 + 31/2 (‘log 11“"2 + l)u(qY/uh+L)} 70 where B(p) = 29.3(c0(y) v 1) with co(y) given by (4.4). Proof. Fix 0 < v s p A 1. For j = l,...,i, ° f = _1 w __ in. , '03:] e B < 0. Therefore, ij(x) S ‘B‘-1Hfj(x), since L5 = log H and a S wj S (4.23) 1:41? s ‘e"1H f(x) . As in Case em, abbreviate M1 0, introduced preceding (4.4), to M Now (4.4), the inequality (x) S uh(t) V x S t < x+L, 1 ' uh+L and Schwarz inequality give (4.24) Bijfi+L‘f = f‘ = ‘:+LM1 S co(l)H{‘:+L(‘a‘rf + (f/uh)%)} s c0(1)H{‘o‘r‘:+Lf + (L(“:+Lf)/uh+L(X)))%] 2 s c0(l)H{‘ore'1‘f(x) + (Lf(X)/(‘B‘u (x))) } h+L since ‘:f S ‘B‘lf(x). Liapunov's inequality, (4.23), (4.24) and cr-inequality (Loéve (1963), p. 155) give oo- +L“-y co - x+L - “- y “15> BiUxf-J‘: 0 share], ‘f-fn s (com v 1>(‘e'1‘H>V(<‘a“Y + was)" W2 + (‘log H‘f(x)/u (x) ] h+L since cg(l) = c0(y) and L8 = log H. Since 3-1 S 61 S a-1 < 0, by (4.21) ‘wi - Wi+l\ S ‘B‘-1. Therefore, by (4.20), (4.21) and Lemma A.2 of the appendix, (4.26) 31‘)i(x) - yiflor)‘p s zp‘e‘Y’P(f(x))'Y{lhs of (4.25) + Z‘B‘-YMY(X)} . 71 But, since (4.4) followed by the inequality u gives h ‘ uh+L - - 2 MY 5 c0(y)HY{(‘oI‘rf )Y + (f/uh+L)Y/ ), by (4.25), (4.27) rhs of (4.26) s B(p)HY‘e"p{‘a‘rY + 1 + (f(x)u (x))‘V/2(‘log 11“”2 + 1)] h+L where B(p) is as given in the lemma. Since X ~ Pi+l’ (4.26) followed by (4.27) and the definition of qY in (2.5) leads to (4.22).- We will use Lemma 8 with p = l in order to prove our main result below. Numbers b ,b below are finite and O 1,.. independent of n. Theorem 4. Recall that H é hr = i 6 6 [0,1] and g > 0 a a b 9 v i = l,...,n, 0 (A4.0) u(m) S bOiHé‘log H‘Q‘B‘2/(l + log i), and if 3 y 6 [6,1] and b1 and b2 3 2 y/2 6- - /2 . (A4.1) ‘8‘ ”(qy/”h+1) s blH Y‘1og H‘g Y v l = l,...,n-l, and (A4.2) ‘5‘“2‘07‘rY s bZHb-Y‘log H‘Q v i = 1, .,n 6 . . then 3 8 b3 9 ‘Dn(m,g)‘ S bBHn‘log Hn‘Q uniformly in n Proof. Since 0 >‘B 1,by (A4.2), h‘a‘ é Hllr‘a‘ a 0. Therefore, since ‘3‘ = ‘a‘, by (4.4), B(l) in Lemma 8 is bounded in i. Consequently, since 3 abbreviates Bi’ Ni here is 72 ‘Bi‘-1’ H = i-r/(l+2r) and (A4.2) holds V i = l,...,n, Lemma 8 with p = 1 followed by (A4.2) and (A4.1) give a b4 3 V i = l,...,n-l, (4.28) Ni+1§i+1‘yi(X) - Wi+1(X)‘ s baHé‘log H‘C S béi-r6/(l+2r)‘log “n1;- In view of (4.28) and (A4.0), the remainder of the proof follows by arguments identical to those used in the second para- graph of the proof of Theorem 2. I Assumption (A4.1) of the theorem is the most stringent one. Comments, regarding a possible necessary condition for this, are the same as those contained in Remark 4.2. Corollary 4. If a and a are constants wrt i, and for 6 6 [0,1] and g > 0 3 a y 6 [6,1] and a b5 3 V i = l,...,n-l (4.29) (lhs of (4.15) with 6 and uh replaced by y and uh+L) S 5- - bSH Y‘log H‘g Y/Z, 6 . . then Dn(9,l) = 0(Hn‘log Hn‘g) unlformly 1n 9 6 [a,e]n. Proof. The proof is analogous to that of Corollary 2. . Example. For T > 0 fixed, let u(x) = (F(T)-1xT-1[x > 0]. Moreover, let a and B be constants wrt i. Then, since by cr- inequality, y/2 (4.30) (qyluzii) S {P(T)(XT-1[T 2 l] +-(x1_T + (h+L)1-T)[O < T < 1])] exP((B - Yo/2)X)[X > 0], (4.29), with any 0 < 6 s 1 3 o < Zs/d, g = (0/2) + (1-¢)[0 < T < 1] and y = 6, is satisfied uniformly in w 6 [0,8]“. 73 Remark 4.6. Section 2.1 of Susarla (1970) deals with sequence compound SELE in the example just mentioned. His condition on the parameter Space implies 23 < a < a, and his assumption (0.8) to- gether with his hypothesis T > 2 restricts T to be in {3,4,...,r+l} U {t‘t 2 r+2]. 
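For completeness, (4.21) too is easy to render computationally. The Python sketch below is an illustration only, not the estimator analyzed above: it uses the one-sided kernel K_0(y) = 4 - 6y on [0,1) with r = 1 for f-hat = f-hat^(0) in (4.0), replaces the integral from X to X+L of f-hat by a Riemann sum on a grid, and is checked on the toy model with μ Lebesgue measure on (0, ∞), u ≡ 1 and all ω_j = -1; the kernel, the grid size, the zero guard and the toy model are assumptions of the sketch.

    import numpy as np

    # Illustrative assumptions: mu = Lebesgue on (0, inf) with u = 1, r = 1, and the
    # one-sided kernel K_0(y) = 4 - 6y on [0, 1), which has integral 1 and first moment 0.
    def K0(y):
        return (4.0 - 6.0 * y) * ((y >= 0.0) & (y < 1.0))

    def f_hat(x, X, h):
        # f-hat(x) = (i h)**(-1) sum_j K_0((X_j - x)/h) / u(X_j); cf. (4.0) with v = 0, u = 1
        return np.mean(K0((X - x) / h)) / h

    def phi_reciprocal(x, X, alpha, beta, grid=200):
        # (4.21): phi_{i+1}(x) = ( -(integral_x^{x+L} f-hat) / f-hat(x) ) retracted to
        # [1/beta, 1/alpha], with h = i**(-1/(1+2r)), H = h**r and L = log(H)/beta.
        i, r = len(X), 1
        h = i ** (-1.0 / (1.0 + 2.0 * r))
        L = np.log(h ** r) / beta                # L > 0 since beta < 0 and H < 1
        t = np.linspace(x, x + L, grid)          # Riemann grid standing in for the integral
        integral = np.mean([f_hat(s, X, h) for s in t]) * L
        den = f_hat(x, X, h)
        if den <= 0.0:                           # zero guard; an assumption of the sketch
            return 1.0 / beta
        return float(np.clip(-integral / den, 1.0 / beta, 1.0 / alpha))

    # Toy check: all omega_j = -1 (Exp(1) data) and [alpha, beta] = [-2, -0.5], so the
    # target theta(omega) = 1/omega equals -1; the estimate at x = 1 should be near -1.
    rng = np.random.default_rng(3)
    X = rng.exponential(size=20000)
    print(phi_reciprocal(1.0, X, alpha=-2.0, beta=-0.5))

The clipping mirrors the retraction to [β^{-1}, α^{-1}] in (4.21); in the toy model the target ψ_i(1) equals -1 exactly.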
Moreover, his presentations are rather complicated and proofs of lemmas are lengthy. His estimator, which depends also on certain other auxiliary random variables independent of X1,...,Xn, achieves (a rather weaker) rate r/2(l+r) uniform in m E [a,a]n. Note that this example with T > 1/2 does not cover the case of sequence compound problems where the component is SELE 2 2 2 (based on X ) of c in N(0,o )-family. 2.5 Rates Near the Best Possible Rates with the Divided Difference and the Kernel Estimators of m with Identical Components. This section deals with only the case when w has identical components. We will show that, when 9 is identity, rates with the divided difference and the kernel estimators are arbitrarily close to, but cannot be more than, 2/5 and 1, respectively. We will also indicate that, for Case em, kernel estimators achieve rates near 1. Throughout this section, let w = (w,..-,w) 6 DP, and let a and 3 be constants wrt i such that a S w < B. Let f abbreviate fw. It may be noted at the outset that the conclusions of Lemmas 5, 6 and 7 remain valid if qY there is replaced by - 2 " e /f 1” Theorems 5 and 6 below are proved for the case 9 is identity, and for the lower bounds there we assume: 74 (5.0) 3 an e > 0 and a finite L > a 3 Lebesgue-sup of the restriction to (L,L+g) of u is finite, and Lebesgue-inf of the restriction to (L,L+g) of u is positive. With 6 and L in (5.0), we have (5.1) Oau*(x)f(x) < a and for a y 6 [0,2) and i = l (5.5) u((u*/u:)Y/2f1-Y/2) < a, then 3 a c1 3 Y (5.6) Dn(9,i) S Clhn . On the other hand, if (5.0) holds, then 2 . . (5.7) Dn(m,i) 2 CZhh V suff1c1ently large n. Proof. Throughout this proof, fix 1 with 1 S i S n-1, and abbreviate X to X. i+l Let y be given by (5.5). By (5.4), k6(y) in Lemma 5 is bounded in i (by choosing cO suitably, if necessary). Since the conclusion of that lemma holds even if q there is ~ fl-y/Z q , which here becomes replaced by , by (5.5), iY/S§i+1‘$i+1(x) - (1)"Y is bounded in i. This conclusion and the inequality in (5.3) give (5.6). To prove (5.7) we proceed as follows. Recall that a > w . Since by (3.1). 1i+i(x) = (Q(6)(X))Q 3’ (5.8) Pi+1‘1i+1(X) - w‘ 2 g-wEi+1[¢i+l(x) - w > v]dv 2 ~ Piflflt < X < L + e/2]“"8‘w§i[Q(6-)(x) > v + m']dv}, where (here and throughout this proof) L and g are given by (5.0). 76 Fix X E (L, L +'s/2) and v 6 (0, B-w) until stated otherwise. From Section 3, (following (3.0», for j = l,...,i, 6j(') = [- S X < - + h]/u(Xj). As in the second paragraph in the J proof of Lemma 3, for .j = l,...,i, let Yj = 6j(X +-h) - eh (W)6 Y j(X). Slnce X1,...,Xi are 1.1.d. so are Y1,..., i' The definition of Q in (3.0) and the Berry-Esseen theorem 15:1] (Loéve (1963), p. 288) lead to To .3!" L'lfir‘f (5.9) gi[Q(6)(X) > v + 0.)] =Ei[zin >0] Y - p Y .-e 3 ll£9135 1 291‘ 1 1 1‘ 2 Q( C ) - C3 3 S 1 01 2 where o = var of Y1 and 6 is the distribution function of N(0,l). Inequalities in the remainder of this proof are valid only V sufficiently large i. Since h i 0, take h S g/4, where e is as in (5.0). Let 01(X) = ‘:+hf(t)dt. Then, since _ hw (61(°+h)/61(')) - e , _ eh(v+u)) (5.10) P Y = 61(X+h) 01(X) = 61(X)ehm(1-ehv) 1 1 hp 2 2 hv61(X)e c4 v, where the last inequality follows from (5.2). Moreover, since 61(X+h)61(X) = 0 with probability one and P161(-) > 0, Cl 2 ezhwvar 61(X). Thus, -2hw 2 e 01 x+h 2 X (f/u) - 61(X) 2 var(61(X)) = * 3t 2 61(X)(l - u (X)61(X))/u (X) 2 c5h 77 -1 * since, by (5.0) and (5.2), su u (t)61(t) < m and pL 0 Consequently, y ( 10) P Y (5.11) -1—1 2 -c vh3/2. 
01 6 3 and, since by (5.0), Pl‘Y1 - P Y ‘ s(tonstant)oi l l P1W1 " P1Y1‘3 *5 (5.12) 3 S c7h . °l Now weakening the integrand on the extreme rhs of (5.8) by (5.9), (5.11) and (5.12) and then making the transformation ’5 c6v(ih3) = t we get, after recognizing that X has u-density f satisfying (5.2), (5.13) Pr+1‘$i+1(X) - w‘ 2 u(t <'x < L +.€/2) 3 3 42 (B-m)c6(ih )52 -% {°g(ih ) ‘0 i(-t)dt - C9(1h) }. .-l/S , . Since h = C01 , the integral 1n (5.13) converges to 135('t)dt as 1‘“ m, and hence by (5-1), 11/5 times the lhs of (5.13) is bounded below by a positive quantity for all large i. 2 2 Therefore, since Pt+1‘Wi+1(X) - w‘ 2 Pi+l‘mi+l(x) - w‘, (5-7) follows from the equality in (5.3). II Theorem 6. Let 2 be the kernel estimator introduced ~ under Case w in Section 4. (See (4.6). Also recall that W is ~ defined for each integer r > 1.) As in (4.3) and (4.7), take -l/(l+2r) r-l h = h1 5 i and H = Hi = h . If for a Y E [0:27 and i = l, (5.14) u(fl'Y/zlug/Z) < a 78 then 3 c 9 10 Y. (5.15) Dn(e,i) s CIOHn’ and if the kernel functions K0 and K1 defining W are ~ bounded, and (5.0) holds, then 2 o o (5.16) Dn(9’i) 2 CllHn V suff1c1ently large n. Proof. Fix i with l s i s n-1 and abbreviate X1+1 by X. In view of Lemma 6, (5.15) follows by arguments identical to those given for the correSponding part of Theorem 5. Now we prove (5.16). Recall that B > w. Since W = 2(1) 2 (f If)a:6’ B-w (5.17) Pi+l‘%i+1m - w‘ 2 O Ei+l[wi+l(x) - w > v]dv 2 Piflm, < x < L + e/ij‘o w§.[[%(1) - w? > v\f\]dv}, ) and f(1 ) is abbreviated by omission, H'H where the argument X in and L and e are given by (5.0). Fix X E (L, L + 3/2) and v E (O, B-w) until stated otherwise. For 1 S j s i, let {<-h 1K + wK + v\x Op (-—]—-——)}[u(X) > 0] (5.18) u(Xj)T=1 0 J where K.o and K1 are the kernels used in the definition of l. Since X1,...,Xi are i.i.d., so are T . The T1,..., 1 definitions of f(J), j = 0,1, given in (4.0) and Berry-Esseen 79 theorem give A t 1 - 2 1 (5.19) 21“” - wf > vm] 2 31021 Tj < 0] - 3 i p T i %P T - P T \ 2 §(_ ] l) _ C l] l l 1 cl 13 O3 1 whe 2 = T re 01 var 1. Inequalities in the remainder of this proof are obtained only V sufficiently large i. Since h 1 0, we take h 5 3/2, where e is as in (5.0). Let 21,22 and 23 denote, respectively, the first, second and the third term in the expression for T Then 1. the transformation theorem followed by r-th order Taylor expansion with integral form of the remainder and the orthogonality properties of K.j 6 x; for j = 0,1 gives P121 + hf(1)(X) = f3K1(y)I:+hy(X + hy-t)r-1f(r)(t)dtdy/(r-l)!. Thus, since K1 is bounded, by (5.2), P121 5 -hf(1)(X) +-const.hr. By similar arguments, P122 s hwf(X) + const.hr+1. Therefore, since by (5.2), P23 = V x+h\K0(Efiz§\f(t)dt s hv const. and since f(1) = wf, r r-l (5.20) PlTl S c14h(h + h + v) . Next observe that 2 2 (5.21) 01 2 g (21) + ii cov(Zj,Zj,). J j' Since h i O and K0 and K are bounded, writing the exact 1 expression for cov(Zl,Zz), we see, after making use of the trans- formation theorem and (5.0) and (5.2), that \cov(Zl,ZZ)\ is bounded in i. The same conclusion holds for cov(Zl,23), 2 cov(Zz,23) and (P121) . Therefore, there exists a finite constant 80 § (could be negative) such that 2 2 -l 2 (5.22) o 2 P z + g = h j‘K (t)((f/u) (X + ht))dt + g . l 1 l 1 Consequently, by (5.0) and (5.2), (5 23) h 2 ° 01 2 C15) and hence, by (5.20) .‘HI. 5‘ (5.24) 1 1 s c 113/203 + h“1 + v) 01 16 M ' b 5 O hP T P T 3 2 b 5 23 oreover, Since y ( . ), 1‘ 1 1 1‘ 5 const. 01’ y ( . ) 3 P \T - P T \ J 1 1 1 -% (5.25) 3 S c17h . 
“1 Now weakening the integrand on the extreme rhs of (5.17) by (5.19), (5.24) and (5.25), and then, doing the analysis exactly similar to that given (following (5.12)) in the proof of Theorem 5, we get the desired conclusion. II For Case em in Section 4 we have taken H, = h? with 1 1 h = i-l/(1+2r). i In this case, if for a y 6 [0,2) and i = l .. 2 _ - (5.26) u{f1 Y/ ((uh) y/Z + (ufi) Y/2)} < m, where ufi(-) in (5.26) stands for uh(-+l), then using a proof similar to that used for (5.6) and making an application of Lemma 7 with p = y, it follows that 1 given by (4.17) satisfies 3 Y 13110312) Cain) as n t on. * Remark 5.1. In view of the definitions of u*, u and uh, each of (5.5) and (5.26) implies (5.14). Densities satisfying 81 (5.5) and2(5.26) V y 6 [0,2) exist, e.g., take u(x) = - - 2 - - (2n) *8 x / [-w‘< x < m], (F(T)) le 1[x > O], T 2 1 or g;(j+l)[jl< x 5 1+1]. Thus situations exist where the lower and upper bounds in each of Theorems 5 and 6 are considerably tight. 2.6 The Divided Difference Versus the Kernel Estimators. The divided difference estimator introduced in Section 3 and the kernel estimator introduced under Case w, in Section 4 n are compound estimators of the same vector w = (w1,...,wn) 6 O - g; N Therefore, it is rather natural to make a comparison between them. Denote them here by i and ER reSpectively. Recall that ER is defined for each integer r > 1. under certain conditions, Theorems 2 and 5 show that ER with r > 6 is better than 1 in the sense that, V large n, ~ SUPw\Dn(‘£’iK)\ s con-(r-1)/(1+2r) ‘ elm-Us S supw\Dn(‘£’1)‘ where c0 and c1 are some finite positive constants. By Theorems 2 and 6, WK. with r > 6 is better than WK with r = 2 in the same sense. Results obtained in Theorems 2 and 6 for WK with r = 2 coincide, respectively, with those obtained in Theorems 1 and 5 for w. However, as we have noted in Remarks 4.1 and 5.1, condi- tions forlatter ones are stronger than those for former ones. ~ ‘Hence, ‘K’ even with r = 2, could be preferable to 1. Neverthe- less, $ is a more natural estimator compared to 6K. ~ ~ 82 Estimators i and 3K are somewhat (but not completely) similar to !f* and oi. respectively, prescribed by Susarla (1970), (Chapter 1), for the case u(x) = (2n)-%exp(-x2/2)[-O < x < m] and -01 - Bi 5 c2, a finite positive number. Results of Theorems 1 and 5 for i, specialized to the above case and (in Theorem 5) w = 0, coincide with those obtained by Susarla for 1f*. However, in order to make oi_ (which is, in comparison of EK’ rather ** complicated to exhibit) better than 1. in the sense described above he requires r > 12. APPENDIX APPENDIX - Here we prove two useful lemmas; one concerning the weighted empiricals based on independent random variables and the other concerning the difference of two random ratios. A.l. On Glivenko-Cantelli Theorem for the Weighted Empiricals figsed on Independent Random Variables. Let X1,...,Xn be independent real valued random variables, and, for w 6 [0,1], let Fj(x) = wP[Xj < x] + (l-w)P[Xj S x] and Yj(x) = w[Xj < x] +-(l—w)[Xj S x]. Furthermore, with c1,...,cn non-negative numbers such that 22C: = 1, let n * n H = F = n 21% J" Hn zch'YJ and + 1': Dn = supx’wmaxNSn(HN(x) - HN(X)). A Special case of the result in Remark A.l (following the proof of Lemma A.l below) is used in the proof of Theorem 2(b) of Chapter 1. Lemma A.l. With c = zch, V M 2 1, (1) P[D: 2 M] < 2c M exp(-2(M2 - 1)) * Proof. Let A = mastnGIN - HN). The remark following (2.17) of Hoeffding (1963), p. 
17, and Theorem 2 therein, applied to random variables chj with w = 1 yield 83 1r...» .... ._ 84 (2) P[A(x-) 2 n] S exp(-2n2) V x 6 R and V H > 0. Fix (temporarily) O < y‘< M and partition R into k intervals with endpoints -m = x < x1 <...< xk = m such that Hn(x ) S y for j = l,...,k. Since 0 S Hn(-) S c, we can J-l’xj -1 . (and do) take ki< cy +11. Since HN(xj_1,xj) S Hn(xj_1,xj) S y * for N S n, using the monotonicity of RN and HN’ we get * (3) sup j-[)\ = 0((n log rob with probability one. A.2. A Bougd_for the Yrth Mean of the Bounded Difference of Two Random_Ratios. We apply Lemma A.2 below in the proof of Lemmas 6, 7 and 8 of Chapter 2, in order to obtain certain suitable bound for the p-th mean distance between the compound and Bayes estimators there. Lemma A.2. Let y,z and L be in R with 2 fi 0 and L > 0. If Y and Z are two real valued random variables, then V y > 0 Y (-1)+ - (1) qu - g A L)Y 5 2% V ‘2‘ Y{E\y-Y\Y 1 + + (\E‘Y + 2‘“' ) LY)E\z-Z\Y}. Proof. Since [2\z-Z\ S ‘2‘] S [2‘2‘ 2 ‘2‘], the lhs of (l) is exceeded by (2) mi - §\V[2\2\ 2 \z\]) +LYE[2\z-Z\ 2 \z‘] . Now by Markov-inequality, the second term in (2) is no more than (2L)Y‘z|-YE\z-Z‘Y. By triangle inequality with intermediate term y/Z, and by c r-inequality (Loeve (1963), p.155), the first term in (2) is bounded above by 2Y+ O with probability one, then V y >-0 Y ( -1)+ - (3) Edi-2‘ /\L)Y52YJ'Y EH4 Y(\y.Y\Y+ (‘SY + + Z'W’l) LY)\z - 2‘3}. Thus (1) becomes a special case of (3). BIBLIOGRAPHY BIBLIOGRAPHY BHATTACHARYA, P.K. (1967). Estimation of a probability density function and its derivatives. Sankhya Ser. A 29 373-382. BILLINGSLEY, PATRICK (1968). Convergence of Probability Measures. John Wiley & Sons, Inc., New York. CACOULLOS, THEOPHILOS (1966). Estimation of a multivariate density. Ann. Inst. Statist. Math. 18 179-189. CENCOV, N.N. (1962). Evaluation~of an unknown distribution density from observations. Soviet Math. 3 1559-1562. GILLILAND, DENNIS C. (1966). Approximation to Bayes risk in sequences of non-finite decision problems. RM-162, Department of Statistics and Probability, Michigan State University. GILLILAND, DENNIS C. (1968). Sequential compound estimation. Ann, Math. Statist. 39 1890-1905. HANNAN, JAMES F. and ROBBINS, HERBERT (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. .Ann. Math. Statist. 26 37-51. HANNAN, JAMES F. (1956). The dynamic stafistical decision problem when the component problem involves a finite number, m, of distributions (Abstract). Ann. Math. Statist. 21 212. HANNAN, JAMES (1957). Approximation to Bayes risk in repeated play. Contributions t2_the Theory g§_Games 3 97-139. Ann. Math. Studies No. 39, Princeton Univ. Press. HANNAN, J.F. and VAN RYZIN, J.R. (1965). Rate of convergence in the compound decision problem for two completely Specified distributions. Ann. Math. Statist. 36 1743-1752. HANNAN, JAMES and MACKY, DAVID W. (1971). Empirical Bayes squared error loss estimation of unbounded functionals in exponential families. RM-290, Department of Statistics and Probability, Michigan State University. 87 53 88 HEWITT, EDWIN and STROMBERG, KARL (1965). Real and Abstract Analysis. Springer-Verlag'New York, Inc. HOEFFDING, WASSILY (1963). Probability inequalities for sums of bounded random variables. J, Amer. Stat. Assoc. 58 13-30. JOHNS, M;V., Jr. (1967). Two-action compound decision~problems. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, University of California Press. ~ JOHNS, M{V., Jr. and VAN RYZIN, J. (1972). 
Convergence rates for empirical Bayes two-action problems II. Continuous case. Ann. Math. Statist. 43 934-947.
KIEFER, J. (1961). On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm. Pacific J. Math. 11 649-660.
KRONMAL, R. and TARTER, M. (1968). The estimation of probability densities and cumulatives by Fourier series methods. J. Amer. Statist. Assoc. 38 482-493.
LOEVE, MICHEL (1963). Probability Theory (3rd ed.). Van Nostrand, Princeton.
NADARAYA, E.A. (1965). On non-parametric estimates of density function and regression curves. Theor. Prob. Appl. 10 186-190.
NATANSON, I.P. (1955). Theory of Functions of a Real Variable. Frederick Ungar Publishing Co., New York.
OATEN, ALLAN (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 43 1164-1184.
PARZEN, EMANUEL (1962). On the estimation of probability density and mode. Ann. Math. Statist. 33 1065-1076.
RAO, B.L.S. PRAKASA (1969). Estimation of a unimodal density. Sankhya Ser. A 31 23-36.
ROSENBLATT, MURRAY (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27 832-837.
SAMUEL, ESTER (1963). Asymptotic solutions of the sequential compound decision problem. Ann. Math. Statist. 34 1079-1094.
SAMUEL, ESTER (1965). Sequential compound estimators. Ann. Math. Statist. 36 879-889.
SCHWARTZ, STUART C. (1967). Estimation of a probability density by an orthogonal series. Ann. Math. Statist. 38 1261-1265.
SCHUSTER, EUGENE F. (1969). Estimation of a probability density function and its derivatives. Ann. Math. Statist. 40 1187-1195.
SUSARLA, V. (1970). Rates of convergence in sequence-compound squared-distance loss estimation and two-action problems. RM-262, Department of Statistics and Probability, Michigan State University.
VAN RYZIN, J. (1966). The compound decision problem with m × n finite loss matrix. Ann. Math. Statist. 37 412-424.
VAN RYZIN, J. (1970). On a histogram method of density estimation. Univ. of Wisconsin, Department of Statistics T.R. No. 226.
VAN VLECK, F.S. (1973). A remark concerning absolutely continuous functions. Amer. Math. Monthly 80 286-287.
WAHBA, GRACE (1971). A polynomial algorithm for density estimation. Ann. Math. Statist. 42 1870-1886.
WATSON, G.S. and LEADBETTER, M.R. (1963). On the estimation of the probability density I. Ann. Math. Statist. 34 480-491.
WATSON, G.S. (1969). Density estimation by orthogonal series. Ann. Math. Statist. 40 1496-1498.
WEGMAN, E.J. (1969). Maximum likelihood estimation of a unimodal density function. Inst. Statist. Mimeo Ser. No. 608, University of North Carolina at Chapel Hill.
WEISS, L. and WOLFOWITZ, J. (1967). Estimation of a density at a point. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 7 327-335.
YU, BENITO (1970). Rates of convergence in empirical Bayes two-action and estimation problems and in extended sequence-compound estimation problems. RM-279, Department of Statistics and Probability, Michigan State University.