
 

RATES OF CONVERGENCE IN EMPIRICAL BAYES
TWO-ACTION AND ESTIMATION PROBLEMS AND IN EXTENDED
SEQUENCE-COMPOUND ESTIMATION PROBLEMS

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
BENITO ONG YU
1970

 


This is to certify that the

thesis entitled

RATES OF CONVERGENCE
IN EMPIRICAL BAYES TWO-ACTION AND ESTIMATION PROBLEMS
AND IN EXTENDED SEQUENCE-COMPOUND ESTIMATION PROBLEMS

presented by

Benito Ong Yu

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Statistics and Probability

Major professor

Date 5/30/70

ABSTRACT
RATES OF CONVERGENCE

IN EMPIRICAL BAYES TWO-ACTION AND ESTIMATION PROBLEMS
AND IN EXTENDED SEQUENCE-COMPOUND ESTIMATION PROBLEMS

BY

Benito Ong Yu

Throughout, our component problems concern exponential
families of distributions of $x$ conditional on the parameter $\theta$.

In Part I we consider exponential families determined by
a measure with Lebesgue density $h$, where $h(x) > 0$ if and only
if $x > a$, and assume the parameter $\theta$ has a distribution $G$.
Based on a sequence of observations $x_1, x_2, \dots, x_n$, i.i.d. according
to the marginal distribution of $x$, estimates of the posterior
mean are used to define estimates for the Bayes test in the linear
loss two-action problem. Rates of convergence of the excess risk
are obtained under certain integrability conditions. The scale
parameter exponential and the location parameter Normal densities
are given as examples where the finiteness of certain moments of
$G$ is sufficient for these integrability conditions.

These results, proved under weaker hypotheses than those
of Johns and Van Ryzin (1967), are obtained under the assumption
that $h^{(r)}$ exists for some $r \ge 2$. Analogous results are also obtained
without any differentiability assumption on $h$.

In the squared error loss estimation problem, a truncation
of the previous estimates of the posterior mean is used to estimate
$\theta$. By a different method of proof, rates of convergence of the
excess risk are established.

It is shown that the excess risk of the linear loss two-action
problem is bounded by the square root of that of the
estimation problem. Consequently, certain improved rates in
the location parameter Normal two-action problem can be obtained
as a corollary to those obtained in the estimation problem.

In Part II we consider certain discrete exponential and
the location parameter Normal families, and assume that the parameter
$\theta$ is bounded. Based on all past observations $x_1, x_2, \dots, x_n$, with
the $x_i$ conditional on $\theta_i$ being independently distributed
according to $P_{\theta_i}$, squared error loss estimation of $\theta_n$ is
considered. The aim is that the average risk across the first $n$
problems approach the extended Bayes envelope $R_k$ evaluated
at $G_n^k$, the empirical distribution function of the $k$-vectors
$(\theta_1, \dots, \theta_k), (\theta_2, \dots, \theta_{k+1}), \dots, (\theta_{n-k+1}, \dots, \theta_n)$.
Swain (1965) obtained rates of $O(n^{-1/2}\log^k n)$ and $o(1)$
for the discrete exponential and the Normal families, respectively.
Gilliland (1966 and 1968) considered the unextended ($k = 1$)
versions of these problems and obtained improved rates of $O(n^{-1/2})$
and $O(n^{-1/5})$, respectively. In Chapters 3 and 4, improved rates
of the same order, namely $O(n^{-1/2})$ and $O(n^{-1/(k+4)})$, are obtained
in these families, respectively.

RATES OF CONVERGENCE
IN EMPIRICAL BAYES TWO-ACTION AND ESTIMATION PROBLEMS
AND IN EXTENDED SEQUENCE-COMPOUND ESTIMATION PROBLEMS

BY

Benito Ong Yu

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1970

ACKNOWLEDGEMENTS

I wish to express my sincere gratitude to Professor James
F. Hannan for the patience he accorded me in the preparation of
this thesis. His guidance and comments greatly improved and
simplified the results in this work. To Professor Dennis C.
Gilliland, I owe my thanks for his encouraging and helpful
suggestions in reviewing the second part of this thesis.

This work was made possible through the financial support

provided by the Department of Statistics and Probability and the

National Science Foundation.


TABLE OF CONTENTS

PART I EMPIRICAL BAYES IN EXPONENTIAL FAMILIES
Introduction
Chapter
1 LINEAR LOSS TWO-ACTION PROBLEM
1.1 Introduction
1.2 The Empirical Bayes Problem
1.3 Exponential Families
1.4 Summary and Some Useful Results
1.5 Main Result and Examples
1.6 Result Without Differentiability of h
2 SQUARED ERROR LOSS ESTIMATION PROBLEM
2.1 Introduction
2.2 Estimation of $\psi_G = P_x\theta$
2.3 Summary
2.4 Main Results and Examples
PART II EXTENDED SEQUENCE-COMPOUND ESTIMATION
Introduction
3 ESTIMATION IN DISCRETE EXPONENTIAL FAMILIES UNDER SQUARED ERROR LOSS
3.1 Introduction
3.2 A Bound for the Modified Regret $D_n$
3.3 Estimation in Discrete Exponential Families under Squared Error Loss
4 SQUARED ERROR LOSS ESTIMATION IN THE NORMAL FAMILY
4.1 Introduction
4.2 Bounding 311+in - (X+t)\
REFERENCES
APPENDIX

PART I

EMPIRICAL BAYES IN EXPONENTIAL FAMILIES

INTRODUCTION

Johns and Van Ryzin (1967) studied the empirical Bayes two-
action problem in the exponential family. They used kernel estimates
for the marginal density $f$ and its derivative $g$ to define tests
$\phi_n$, and showed, in their Theorem 3, that under certain conditions,
including (C) and (D) of Theorem 1.1, the risk $R_n(\phi_n, G)$ converges
to the Bayes risk R*. Furthermore, a rate was obtained. They
gave the scale exponential and the Normal densities as examples
where the existence of certain moments of the prior G is
sufficient for the conditions (C) and (D).

Lin (1968) considered the multivariate estimation problem
with squared error loss. A multivariate version of Theorem 2.1
was considered.

Chapter 1 considers the same empirical Bayes two-action
problem that Johns and Van Ryzin studied. Theorem 1.1 improves
upon their Theorem 3 by deleting assumption (B) in §1.4 and by
relaxing (A). The scale exponential and the Normal densities are
given to show that in each case their moment assumptions on G
can be relaxed.

Chapter 2 considers the squared error loss estimation
problem. Using a truncation different from that of Lin, Theorem
2.1 establishes a certain rate of convergence. Lemma 2.4 shows
that for certain natural tests derivable from estimates the excess
risk in the two-action problem is bounded by the square root of
the corresponding excess risk in the estimation problem. Corollary
2.3 utilizes this fact to obtain better rates for the Normal two-action
problem (Corollary 1.2) from those obtained in the Normal
estimation problem (Corollary 2.2). The improved rates are exactly
those corresponding to priors not having finite (3 +n/§§)/10 - Ch
absolute moment.

Notational Conventions.

Sets and their corresponding indicator functions will be
used interchangeably. The same symbols will be used to denote
distribution functions and their induced Lebesgue-Stieltjes measures.
For any measure $\mu$, the $\mu$-integral of $Y$ will be denoted by $\mu Y$,
$\mu[Y]$ or $\mu\{Y\}$. Dependence on arguments will be suppressed for
simplicity, and dummy variables of integration will not be displayed
except for emphasis.

CHAPTER 1

LINEAR LOSS TWO-ACTION PROBLEM

1.1. Introduction.

 

Let us consider the following hypothesis testing problem.
Let $\theta \sim G$. We test
$$H_1 : \theta \le c \quad \text{against} \quad H_2 : \theta > c$$
based on an observation $X$, with $X \mid \theta$ distributed according
to some $F_\theta$ with Lebesgue density $f_\theta$. Let $A_1$ and $A_2$ respectively
denote the actions of deciding on $H_1$ and $H_2$, and let
$$L_1(\theta) \ge 0, \qquad L_2(\theta) \ge 0$$
denote the losses of $A_1$ and $A_2$ when $\theta$ is the true parameter.
Let $P$ denote the p-measure on $(X, \theta)$. A randomized
test $\phi$ in the Bayes problem above incurs a risk given by
$$R(\phi, G) = P\{\phi L_1 + (1 - \phi) L_2\}. \tag{1.1}$$

Let $R^*$ or $R^*(G)$ denote the Bayes risk versus $G$. (We tacitly
assume that $P_x(L_1 - L_2)$ is well defined. This will be the case
for the application of the theory to the two-action problem in
exponential families with linear losses.)

Since a test is Bayes if and only if it minimizes the
expected loss given $x$,
$$\phi_G(x) = [P_x(L_1 - L_2) \le 0] \tag{1.2}$$
is Bayes versus $G$. Johns (1957) considered the linear losses
$$L_1(\theta) = (\theta - c)^+, \qquad L_2(\theta) = (\theta - c)^- \tag{1.3}$$
and noted that, as a consequence, $P_x(L_1 - L_2)$ is expressible
in terms of the posterior mean; that is,
$$P_x(L_1 - L_2) = P_x(\theta - c). \tag{1.4}$$

Hereafter, unless stated otherwise, we will assume that $L_1$ and
$L_2$ are as defined in (1.3).
We remark that, although the losses in (1.3) are unbounded,
the Bayes risk $R^*(G)$ may be uniformly bounded on the class of
all priors. For example, let $x \sim N(\theta, 1)$ and consider the natural
test $\phi'(x) = [x \le c]$. Taking conditional expectation given $\theta$,
$$P_\theta\{\phi' L_1 + (1 - \phi') L_2\} = |\theta - c|\,\Phi(-|\theta - c|),$$
where $\Phi$ is the standard Normal distribution function. This is less than $(2\pi)^{-1/2}$
by the Normal tail bound (Feller (1962), p. 166). Therefore, the
Bayes risk in the Normal two-action problem is less than $(2\pi)^{-1/2}$
whatever be $G$.
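The following Python sketch is a quick numerical sanity check of the bound just used. It uses only the standard library, and the grid of $\theta - c$ values is an arbitrary choice that does not come from the source. It evaluates the per-$\theta$ risk $|\theta - c|\Phi(-|\theta - c|)$ of the natural test and compares it with $(2\pi)^{-1/2}$. It also checks the Mills-ratio form $t\Phi(-t) \le \varphi(t)$ of the tail bound, where $\varphi$ is the standard Normal density.

```python
import math

def Phi(z: float) -> float:
    """Standard Normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(z: float) -> float:
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def natural_test_risk(theta: float, c: float) -> float:
    """Risk at theta of phi'(x) = [x <= c] under the linear losses (1.3), with x ~ N(theta, 1)."""
    t = abs(theta - c)
    return t * Phi(-t)

BOUND = (2.0 * math.pi) ** -0.5

c = 0.0
grid = [c + k / 100.0 for k in range(-1000, 1001)]   # theta - c ranges over [-10, 10]
worst = max(natural_test_risk(theta, c) for theta in grid)
print(f"largest risk on the grid: {worst:.4f}; bound: {BOUND:.4f}")
```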

1.2. The Empirical Bayes Problem.

In this chapter we shall consider the case when a sequence
of past observations $X_1, X_2, \dots, X_n$ is available, with each of the
$X$'s i.i.d. according to the marginal distribution of $x$.

At the $(n+1)$st problem, the decision rule $\phi_n$ is allowed to
depend on all the past observations as well as the $(n+1)$st. Hence,
$\phi_n$ is a measurable function of $X_1, X_2, \dots, X_n$ and $X = X_{n+1}$.
With $P$ extended to denote the product measure on $(X, \theta), X_1, X_2, \dots, X_n$,
we can express the risk of $\phi_n$ by
$$R_n(\phi_n, G) = P\{\phi_n L_1 + (1 - \phi_n) L_2\}. \tag{1.5}$$
We note that since $P_{X_1, \dots, X_n, X}\{g(\theta)\} = P_X\{g(\theta)\}$ for any function $g(\theta)$,
it follows that $\phi_G$ continues to be Bayes in the empirical Bayes
problem. This motivates the use of the excess risk (regret)
$$R_n - R^* = R_n(\phi_n, G) - R^* \tag{1.6}$$
as a measure of goodness of a test $\phi_n$. Restricting $G$ to those
with finite Bayes risk, the excess risk satisfies
$$0 \le R_n - R^* = P\{(\phi_n - \phi_G)(P_x\theta - c)\}. \tag{1.7}$$
Note that the integrand $(\phi_n - \phi_G)(P_x\theta - c)$ is non-negative, since
$\phi_G = 1$ exactly when $P_x\theta - c \le 0$.
1.3. Exponential Families.

Let $h$ be a non-negative measurable function defined on
the real line, and
$$\Theta = \Big\{-\infty < \theta < \infty : \int e^{-\theta x} h\,dx < \infty\Big\}.$$
For each $\theta$ in the natural parameter space $\Theta$, let
$$f_\theta(x) = \beta(\theta)\, h(x)\, e^{-\theta x}, \quad \text{where} \quad \frac{1}{\beta(\theta)} = \int e^{-\theta x} h(x)\,dx. \tag{1.8}$$
The following lemma, due to Professor J. Hannan, yields
a choice $h_{\mathcal G}$ of $h$ with the following property. On the set of $x$ for which $h_{\mathcal G}$
is positive, the function
$$J(x) = \int \beta(\theta)\, e^{-\theta x}\, dG(\theta) \tag{1.9}$$
is infinitely differentiable, and its derivatives can be computed
by repeated differentiation under the integral sign.

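Lemma 1.1. Let $\mathcal G = \{G : G \text{ is a distribution on } \Theta\}$ and
$C_G = \{x : J(x) < \infty\}$, for each $G \in \mathcal G$. Then there exists a determination
$h_{\mathcal G}$ within the Lebesgue equivalence class of $h$
(independent of $G \in \mathcal G$) for which $[h_{\mathcal G} > 0] \subset \operatorname{int}(C_G)$, whatever be $G$.

Proof. The fact that $hJ$ is a density implies that $[h > 0] \subset C_G$ a.e.
for each $G \in \mathcal G$. The closed convex set $C_{\mathcal G} = \bigcap\{C_G : G \in \mathcal G\}$ is
also the countable intersection $\bigcap\{C_{G_r} : r \text{ rational},\ r \notin C_{\mathcal G}\}$, where
$C_{G_r}$ is any one of the $C_G$ that excludes $r$. These considerations, together with the fact that a countable union of null
sets is null, imply that $[h > 0] \subset C_{\mathcal G}$ a.e. and, therefore, also
$[h > 0] \subset \operatorname{int}(C_{\mathcal G})$ a.e. Hence, by defining $h_{\mathcal G} = 0$ off $\operatorname{int}(C_{\mathcal G})$
and $h_{\mathcal G} = h$ on $\operatorname{int}(C_{\mathcal G})$, it follows that
$[h_{\mathcal G} > 0] \subset \operatorname{int}(C_{\mathcal G}) \subset \operatorname{int}(C_G)$, whatever be $G$.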

It is well known that $J$ is infinitely differentiable on
$\operatorname{int}(C_G)$ and that its derivatives can be computed by repeated differentiation
under the integral sign. Hence the same holds true
on the subset $[h_{\mathcal G} > 0]$. Therefore, with
$$f = \int f_\theta\, dG(\theta) \tag{1.10}$$
denoting the marginal density, the existence of $h_{\mathcal G}^{(r)}$ on $[h_{\mathcal G} > 0]$
will imply the existence of $f^{(r)}$ via Leibniz's rule of
differentiation for the product $f = J h_{\mathcal G}$. We shall make use of this
fact immediately after the following summary.

1.4. Summary and Some Useful Results.

Johns and Van Ryzin (1967) considered the two-action
empirical Bayes problem in exponential families with densities
(1.8) under the additional assumption that there is an $a \ge -\infty$
such that
$$h(x) > 0 \text{ if and only if } x > a. \tag{1.11}$$
For each integer $r \ge 2$, they exhibited procedures $\phi_n$ such that,
under the assumptions

(A) $h^{(r)}$ exists and is continuous for $x > a$,

and

(B) $G|\theta|^r < \infty$,

together with the conditions (C) and (D) of Theorem 1.1, the regret
can be shown to converge to zero at a rate no worse than $n^{-\gamma}$,
where $\gamma = (r-1)\delta/(2r+1)$ and $0 \le \delta \le 2$. Moreover, they gave
the Normal $(-\theta, 1)$ and the scale exponential families as examples
where conditions (C) and (D) hold for some $0 \le \delta \le 1$ whenever
the prior $G$ has certain moments finite.

We shall show in Theorem 1.1 that only the existence of
$h^{(r)}$, together with (C) and (D), is required for the regret convergence
rate of $O(n^{-\gamma})$. The scale exponential and the Normal examples
will be discussed in Corollaries 1.1 and 1.2, respectively, and we will show that
in each case their moment assumptions can be relaxed.

We will further show in Theorem 1.2 that analysis similar
to that in Theorem 1.1 can be carried out in exponential families
(1.8) where $h$ is not assumed to have any derivatives.

In the remainder of Part I, $\mathcal G$ is assumed to be the class
of priors $G$ for which the Bayes risk is finite, and only exponential
families as defined in (1.8) and (1.11) will be considered. Moreover,
since $[x \le a]$ is a $P$-null set, all statements are assumed to be
quantified by $x > a$ unless stated otherwise.

We note that since $[x > a]$ is an open set, the $h$ in
(1.11) is already its own $h_{\mathcal G}$ determination. By the remark following
Lemma 1.1, the existence of $h^{(r)}$ implies the existence of $f^{(r)}$.
This improves upon Lemmas 2, 3 and 4 of Johns and Van Ryzin, in that
their respective moment assumptions $G|\theta| < \infty$, $G|\theta|^r < \infty$ and
$G|\log\theta|^r < \infty$ are deleted.

For the exponential family in (1.8) and (1.11),
$$P_x(\theta) = -\frac{J^{(1)}}{J} \quad (\text{for } x > a). \tag{1.12}$$
Hence, the quantity $P_x(L_1 - L_2) = P_x(\theta - c)$ and, therefore, also
the Bayes test $\phi_G$ in (1.2) are well defined without any assumption
on $G$. In addition, if $h^{(1)}$ exists then, with
$$v = \frac{h^{(1)}}{h}, \qquad g = f^{(1)} \qquad \text{and} \qquad \alpha = f\,P_x(\theta - c), \tag{1.13}$$
we have
$$P_x(\theta) = v - \frac{g}{f} \qquad \text{and} \qquad \alpha = (v - c)f - g. \tag{1.14}$$
We note that the Bayes test in (1.2) becomes
$$\phi_G(x) = [\alpha(x) \le 0]. \tag{1.15}$$
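The identity $P_x(\theta) = v - g/f$ in (1.14) can be checked numerically. The sketch below assumes the Normal $(-\theta, 1)$ family of Example 2 in §1.5, so $h(x) = e^{-x^2/2}$ and $v(x) = -x$. It also assumes a hypothetical three-point prior chosen only for illustration. Because $g$ is approximated by a central difference, the two sides agree only up to discretization error.

```python
import math

def phi(z: float) -> float:
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical discrete prior G (illustration only): support points and weights.
THETAS = [-1.0, 0.5, 2.0]
WEIGHTS = [0.2, 0.5, 0.3]

def f(x: float) -> float:
    """Marginal density (1.10) for the Normal(-theta, 1) family, where f_theta(x) = phi(x + theta)."""
    return sum(w * phi(x + t) for t, w in zip(THETAS, WEIGHTS))

def g(x: float, step: float = 1e-5) -> float:
    """g = f^(1), approximated by a central difference."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

def v(x: float) -> float:
    """v = h^(1)/h for h(x) = exp(-x^2/2)."""
    return -x

def posterior_mean(x: float) -> float:
    """P_x(theta) computed directly from Bayes' rule."""
    return sum(w * t * phi(x + t) for t, w in zip(THETAS, WEIGHTS)) / f(x)

for x in (-2.0, 0.0, 1.5):
    print(x, posterior_mean(x), v(x) - g(x) / f(x))
```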

When a sequence of i.i.d. observations $X_1, \dots, X_n$ and
$X$ is available, we will exploit the special form of $\phi_G$ in (1.15)
in defining reasonable estimates $\phi_n$. To do so, we estimate
the density $f$ and its derivative $g$ by the kernel method so
successfully employed by Johns and Van Ryzin.

To conclude this section, we state and prove Lemma 1 of
Johns and Van Ryzin (1967) as a consequence of (1.7).

Lemma 1.2. Let $\alpha_n$ be any measurable function of $X_1, \dots, X_n$
and $X$. Then the excess risk of
$$\phi_n = [\alpha_n \le 0] \tag{1.16}$$
satisfies
$$0 \le R_n - R^* \le \int |\alpha|\, P_x[|\alpha_n - \alpha| \ge |\alpha|]\,dx. \tag{1.17}$$

Proof. From (1.7) and (1.13),
$$0 \le R_n - R^* = \int_a^\infty |\alpha|\, P_x|\phi_n - \phi_G|\,dx. \tag{1.18}$$
The result follows from (1.18) since $|\phi_n - \phi_G| \le [|\alpha_n - \alpha| \ge |\alpha|]$.
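Two pointwise facts are used here: the non-negativity of the integrand in (1.7), and the domination $|\phi_n - \phi_G| \le [|\alpha_n - \alpha| \ge |\alpha|]$ from the proof of Lemma 1.2. Both can be spot-checked on random inputs. In this sketch, `alpha` stands for $\alpha(x)$ and `alpha_n` for $\alpha_n$ at a fixed $x$. Because $f > 0$, $\alpha$ has the sign of $P_x\theta - c$. The random inputs are arbitrary test values, not part of the source.

```python
import random

def lemma12_holds(alpha_n: float, alpha: float) -> bool:
    """Check the pointwise facts behind (1.7) and (1.17) for tests of the form [. <= 0]."""
    phi_n = float(alpha_n <= 0)       # the test (1.16)
    phi_G = float(alpha <= 0)         # the Bayes test (1.15)
    # alpha has the sign of P_x(theta) - c, so this is the sign of the integrand in (1.7).
    nonnegative = (phi_n - phi_G) * alpha >= 0
    dominated = abs(phi_n - phi_G) <= float(abs(alpha_n - alpha) >= abs(alpha))
    return nonnegative and dominated

rng = random.Random(1)
pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(10_000)]
pairs += [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (-1.0, 0.0)]   # boundary cases
print(all(lemma12_holds(a_n, a) for a_n, a in pairs))
```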

1.5. Main Result and Examples.

In view of (1.14) and (1.17), the excess risk $R_n - R^*$
can be made small if $f$ and $g$ can be adequately estimated. The
appendix provides kernel estimates $f_n$ and $g_n$ for which the bias
terms $P_x f_n - f$ and $P_x g_n - g$ are small. These estimates will be
used in the obvious way to define $\alpha_n$ and $\phi_n$ in (1.19).

Theorem 1.1 below is an improvement of Theorem 3 of Johns
and Van Ryzin (1967), in that their assumptions $G|\theta|^r < \infty$ and
"$h^{(r)}$ is continuous" are deleted. Their proof is reproduced below
for completeness.

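For each integer $r \ge 2$, let
$$\phi_n = [\alpha_n \le 0], \quad \text{where} \quad \alpha_n = (v - c) f_n - g_n, \tag{1.19}$$
with
$$f_n(x) = n^{-1} \sum_{j=1}^n W_j^0(\lambda), \qquad W_j^0(\lambda) = \lambda^{-1} K_0((X_j - x)/\lambda)$$
and
$$g_n(x) = (n\lambda)^{-1} \sum_{j=1}^n \{W_j^1(2\lambda) - W_j^1(\lambda)\}, \qquad W_j^1(\lambda) = \lambda^{-1} K_1((X_j - x)/\lambda).$$
These are the type of kernel estimates of $f$ and $g$ given in (A.8)
of the appendix. We note that $r \ge 2$ is required in (A.1).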
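A Python sketch of the estimates in (1.19) follows. The kernels $K_0$ and $K_1$ of (A.8) are defined in the appendix and are not reproduced here, so the code uses hypothetical stand-ins: $K_0(u) = [0 < u < 1]$ and $K_1(u) = 2[0 < u < 1]$. The factor 2 in $K_1$ is chosen so that $\int uK_1(u)\,du = 1$, which makes $g_n$ a difference-quotient estimate of $f'$. With these stand-ins the bias of $f_n$ and $g_n$ is only $O(\lambda)$, not the $O(\lambda^r)$ and $O(\lambda^{r-1})$ bias used in the proof of Theorem 1.1. The sketch therefore illustrates the structure of the estimators, not their rates.

```python
import math
import random

# Hypothetical stand-in kernels; these are NOT the appendix kernels K_0, K_1 of (A.8).
def K0(u: float) -> float:
    """Uniform kernel on (0, 1); it integrates to 1."""
    return 1.0 if 0.0 < u < 1.0 else 0.0

def K1(u: float) -> float:
    """Twice the uniform kernel on (0, 1), so that the integral of u * K1(u) equals 1."""
    return 2.0 if 0.0 < u < 1.0 else 0.0

def W(kernel, xj: float, x: float, lam: float) -> float:
    """W_j(lambda) = lambda^{-1} K((X_j - x) / lambda)."""
    return kernel((xj - x) / lam) / lam

def f_n(sample, x: float, lam: float) -> float:
    """f_n(x) = n^{-1} sum_j W_j^0(lambda)."""
    return sum(W(K0, xj, x, lam) for xj in sample) / len(sample)

def g_n(sample, x: float, lam: float) -> float:
    """g_n(x) = (n lambda)^{-1} sum_j {W_j^1(2 lambda) - W_j^1(lambda)}."""
    total = sum(W(K1, xj, x, 2.0 * lam) - W(K1, xj, x, lam) for xj in sample)
    return total / (len(sample) * lam)

# Usage: G degenerate at theta = 1, so f(x) = e^{-x} and g(x) = -e^{-x} for x > 0.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(20_000)]
print(f_n(sample, 1.0, 0.1), math.exp(-1.0))
print(g_n(sample, 1.0, 0.1), -math.exp(-1.0))
```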

 

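Theorem 1.1. Let $\phi_n$ be as in (1.19) with $\lambda = n^{-1/(2r+1)}$, and let $0 \le \delta \le 2$. Suppose $h^{(r)}$
exists (for $x > a$) and there is some $\varepsilon > 0$ such that
$$\text{(C)} \quad \int_a^\infty |\alpha|^{1-\delta} (1 + |v|)^\delta \big(q_\varepsilon^{(0)}\big)^{\delta/2}\,dx < \infty, \qquad q_\varepsilon^{(0)}(x) = \sup_{0<u<1} f(x + \varepsilon u),$$
$$\text{(D)} \quad \int_a^\infty |\alpha|^{1-\delta} (1 + |v|)^\delta \big(q_\varepsilon^{(r)}\big)^{\delta}\,dx < \infty, \qquad q_\varepsilon^{(r)}(x) = \sup_{0<u<1} |f^{(r)}(x + \varepsilon u)|.$$
Then
$$0 \le R_n(\phi_n, G) - R^* = O(n^{-\gamma}), \quad \text{where} \quad \gamma = \frac{(r-1)\delta}{2r+1}.$$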

Proof. Lemma 1.2, followed by the Markov inequality, yields
$$0 \le R_n(\phi_n, G) - R^* \le \int |\alpha|^{1-\delta} P_x|\alpha_n - \alpha|^\delta\,dx. \tag{1.20}$$
Since (1.14) and (1.19), together with the $C_r$-inequality (Loève, p. 155), imply
$$|\alpha_n - \alpha|^\delta \le C_\delta\{|v - c|^\delta |f_n - f|^\delta + |g_n - g|^\delta\},$$
we have, by (1.20),
$$0 \le R_n(\phi_n, G) - R^* \le C_\delta\{A + B\},$$
where
$$A = \int |\alpha|^{1-\delta} |v - c|^\delta P_x|f_n - f|^\delta\,dx$$
and
$$B = \int |\alpha|^{1-\delta} P_x|g_n - g|^\delta\,dx.$$

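Thus, the rate at which the regret converges to zero is no worse
than that of $\max(A, B)$. Let us first consider $A$. For $\delta > 0$,
the $C_r$-inequality yields
$$|f_n - f|^\delta \le C_\delta\{|f_n - P_x f_n|^\delta + |P_x f_n - f|^\delta\}, \tag{1.21}$$
and for $0 < \delta < 2$, Hölder's inequality yields
$$P_x|f_n - P_x f_n|^\delta \le (\operatorname{Var}_x f_n)^{\delta/2}.$$
Since the above inequality trivially holds for $\delta = 0$ and $2$, it
follows from (1.21) that
$$P_x|f_n - f|^\delta \le C_\delta\{(\operatorname{Var}_x f_n)^{\delta/2} + |P_x f_n - f|^\delta\}. \tag{1.22}$$
Thus, by (A.9) and (A.10) of the appendix,
$$P_x|f_n - f|^\delta \le \text{const} \times \{[(n\lambda)^{-1} q_\varepsilon^{(0)}]^{\delta/2} + [\lambda^r q_\varepsilon^{(r)}]^\delta\},$$
so that by (C), (D), and the choice $\lambda = n^{-1/(2r+1)}$, one has
$$A = O((n\lambda)^{-\delta/2}) + O(\lambda^{r\delta}) = O(n^{-r\delta/(2r+1)}).$$
Similarly, for $0 \le \delta \le 2$,
$$P_x|g_n - g|^\delta \le C_\delta\{(\operatorname{Var}_x g_n)^{\delta/2} + |P_x g_n - g|^\delta\} \le \text{const} \times \{[(n\lambda^3)^{-1} q_\varepsilon^{(0)}]^{\delta/2} + [\lambda^{r-1} q_\varepsilon^{(r)}]^\delta\},$$
so that by (C) and (D),
$$B = O((n\lambda^3)^{-\delta/2}) + O(\lambda^{(r-1)\delta}) = O(n^{-\gamma}).$$
The proof is completed by this weaker rate of $B$.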
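The exponent bookkeeping in the proof can be verified with exact rational arithmetic. The Python sketch below uses the `fractions` module. For $\lambda = n^{-1/(2r+1)}$, it computes the power of $n$ in each of the four terms bounding $A$ and $B$. It then confirms that the variance and bias terms balance, with $B$ giving the slower rate $n^{-\gamma}$.

```python
from fractions import Fraction

def gamma_rate(r: int, delta: Fraction) -> Fraction:
    """gamma = (r - 1) delta / (2r + 1), the rate exponent of Theorem 1.1."""
    return (r - 1) * delta / (2 * r + 1)

def term_exponents(r: int, delta: Fraction):
    """Exponents e such that each term in the proof is n^e when lambda = n^(-1/(2r+1))."""
    lam = Fraction(-1, 2 * r + 1)            # lambda = n^lam
    var_A = -(1 + lam) * delta / 2           # (n lambda)^(-delta/2)
    bias_A = r * lam * delta                 # lambda^(r delta)
    var_B = -(1 + 3 * lam) * delta / 2       # (n lambda^3)^(-delta/2)
    bias_B = (r - 1) * lam * delta           # lambda^((r-1) delta)
    return var_A, bias_A, var_B, bias_B

for r in (2, 3, 10, 50):
    print(r, gamma_rate(r, Fraction(1)), term_exponents(r, Fraction(1)))
```

With $\delta = 1$ the exponent $\gamma = (r-1)/(2r+1)$ increases toward $1/2$ as $r$ grows, in line with the remark following Corollary 1.1.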
For the remainder of this section, the scale exponential and the
location Normal families will be given as examples to illustrate

how conditions (C) and (D) relate to the moments of G.

Example 1. (Scale Exponential)
Consider the exponential density in (1.8) with $h = [x > 0]$
and $\beta(\theta) = \theta$; i.e., for each $\theta > 0$,
$$f_\theta(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{1.23}$$
The density $f$ satisfies the following facts:

(1.24a) $f_\theta$ is monotonically decreasing, and so is $f$.

(1.24b) Since $h^{(r)} = 0$ for $x > 0$, $f^{(r)}$ exists (for
$x > 0$) by Lemma 1.1; moreover, $v = 0$, so that conditions (C) and
(D) simplify.

(1.24c) $|f^{(r)}| = \int \theta^r f_\theta\, dG(\theta)$ is monotonically decreasing
and, therefore,

(1.24d) $q_\varepsilon^{(r)} = |f^{(r)}|$.

Corollary 1.1 is an improvement over Corollary 3.1 of
Johns and Van Ryzin (1967). They proved the same result under the
assumptions $G[\theta^{r+1}] < \infty$ and (1.26) below.

Corollary 1.1. For the scale exponential in (1.23), the hypothesis
of Theorem 1.1 holds for each $0 \le \delta \le 1$ if
$$G[\theta^r] < \infty, \tag{1.25}$$
$$G[\theta^{-\eta}] < \infty, \quad \text{where } \eta = (1 + t)\delta/(2 - \delta) \text{ for some } t > 0. \tag{1.26}$$
Proof. Since $v = 0$, condition (D) simplifies and is implied by
the integrability of $\alpha$ and $q_\varepsilon^{(r)}$, which we now show. By
Tonelli's theorem (Royden (1965), p. 234),
$$\int |\alpha|\,dx \le \iint |\theta - c| f_\theta\, dG\,dx = G|\theta - c|.$$
By (1.24c) and (1.24d),
$$\int q_\varepsilon^{(r)}\,dx = \iint \theta^r f_\theta\, dG\,dx = G[\theta^r].$$
Hence, we have shown that $G[\theta^r] < \infty$ is sufficient for condition (D).

Let us next verify condition (C). Since $\alpha$ is bounded
by $G|\theta(\theta - c)|$, $v = 0$, and $q_\varepsilon^{(0)} = f \le G[\theta]$, it follows that,
under (1.25), condition (C) is implied by
$$\int_1^\infty |\alpha|^{1-\delta} f^{\delta/2}\,dx < \infty. \tag{1.27}$$
Since $\theta \le e^\theta$, $|f^{(1)}(x)| \le f(x - 1)$ for $x > 1$. Consequently,
$|\alpha(x)| = |cf + f^{(1)}| \le (|c| + 1) f(x - 1)$ for $x > 1$. Thus, by the
Hölder inequality,
$$\int_1^\infty |\alpha|^{1-\delta} f^{\delta/2}\,dx \le (|c| + 1)^{1-\delta} (1/t)^{\delta/2} \{P[(1 + x)^\eta]\}^{1-\delta/2}. \tag{1.28}$$
The proof is completed by the equality $P[x^\eta] = G\{\theta^{-\eta}\}\Gamma(1 + \eta)$.
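The identity $P[x^\eta] = G\{\theta^{-\eta}\}\Gamma(1 + \eta)$ follows from $P_\theta[x^\eta] = \theta^{-\eta}\Gamma(1 + \eta)$ for the scale exponential (1.23). The Python sketch below checks it by midpoint-rule integration for a hypothetical two-point prior. The prior, the value of $\eta$ and the grid size are all illustrative choices.

```python
import math

def exp_moment_numeric(theta: float, eta: float, steps: int = 200_000) -> float:
    """Midpoint-rule value of P_theta[x^eta] = int_0^inf x^eta * theta * e^{-theta x} dx."""
    upper = 60.0 / theta          # the neglected tail is of order e^{-60}
    h = upper / steps
    total = math.fsum(((k + 0.5) * h) ** eta * math.exp(-theta * (k + 0.5) * h) for k in range(steps))
    return theta * h * total

# Hypothetical two-point prior G, used only for illustration.
THETAS = [0.5, 2.0]
WEIGHTS = [0.4, 0.6]
ETA = 0.75

lhs = sum(w * exp_moment_numeric(t, ETA) for t, w in zip(THETAS, WEIGHTS))          # P[x^eta]
rhs = sum(w * t ** -ETA for t, w in zip(THETAS, WEIGHTS)) * math.gamma(1.0 + ETA)   # G{theta^-eta} Gamma(1+eta)
print(lhs, rhs)
```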

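Remark. Corollary 1.1 shows that there exist procedures $\phi_n$ for
which the regret convergence rate can be arbitrarily close to
$n^{-1/2}$, provided $\delta = 1$ and $r$ is sufficiently large. This holds provided $G$ has a finite
moment of order slightly below $-1$ (that is, $G[\theta^{-1-t}] < \infty$ for some $t > 0$)
as well as arbitrarily high moments.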

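Example 2. (Normal $(-\theta, 1)$).

Consider the exponential family in (1.8) with $h(x) = e^{-x^2/2}$
and $\beta(\theta) = (2\pi)^{-1/2} e^{-\theta^2/2}$; that is, for each $-\infty < \theta < \infty$,
$$f_\theta(x) = (2\pi)^{-1/2} e^{-(\theta + x)^2/2}, \quad -\infty < x < \infty.$$
We have shown earlier (§1.1) that for this family the Bayes risk
$R^*(G) < (2\pi)^{-1/2}$ whatever be $G$.

The function $e^{-y^2/2} + e^{-(y+\varepsilon)^2/2}$ is symmetric with
respect to $y = -\varepsilon/2$, which is its only possible interior local minimum on $-\varepsilon \le y \le 0$.
Hence its minimum over that interval is either $2e^{-\varepsilon^2/8}$ or the endpoint value
$1 + e^{-\varepsilon^2/2}$, and both are at least 1 when $\varepsilon \le \sqrt{8\log 2}$. Take $y = x + \theta$, and use
the monotonicity of $e^{-y^2/2}$ when $y \ge 0$ or $y \le -\varepsilon$. It follows that
$$f_\theta(x + t) \le f_\theta(x) + f_\theta(x + \varepsilon), \qquad q_\varepsilon^{(0)}(x) \le f(x) + f(x + \varepsilon), \qquad \text{for } 0 \le t \le \varepsilon \le \sqrt{8\log 2}. \tag{1.29}$$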
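Inequality (1.29) can be spot-checked on a grid. Writing $z = x + \theta$, it says $\varphi(z + t) \le \varphi(z) + \varphi(z + \varepsilon)$, where $\varphi$ is the standard Normal density and $0 \le t \le \varepsilon \le \sqrt{8\log 2}$. The Python sketch below evaluates the largest violation over a grid of $z$ and $t$, where the grid is an arbitrary choice. At $\varepsilon = \sqrt{8\log 2}$ this violation should be non-positive. For a larger value such as $\varepsilon = 3$ it is positive, for instance at $z = -1.5$, $t = 1.5$.

```python
import math

EPS_MAX = math.sqrt(8.0 * math.log(2.0))

def phi(z: float) -> float:
    """Standard Normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def worst_violation(eps: float, n_t: int = 50) -> float:
    """Max over a grid of phi(z + t) - phi(z) - phi(z + eps), with z in [-6, 6] and 0 <= t <= eps."""
    worst = -math.inf
    for k in range(-600, 601):
        z = k / 100.0
        rhs = phi(z) + phi(z + eps)
        for m in range(n_t + 1):
            worst = max(worst, phi(z + eps * m / n_t) - rhs)
    return worst

print(worst_violation(EPS_MAX))   # <= 0: (1.29) holds on the grid
print(worst_violation(3.0))       # > 0 for eps = 3.0, which exceeds sqrt(8 log 2)
```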

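By repeated differentiation under the integral sign,
$$f^{(r)}(x) = (-1)^r \int H_r(x + \theta) f_\theta(x)\,dG(\theta),$$
where $H_r$ is the $r$-th Hermite polynomial. Thus, writing $H_r(z) = \sum_{j=0}^r a_j z^j$, for $\varepsilon \le \sqrt{8\log 2}$,
$$|f^{(r)}(x)| \le \sum_{j=0}^r |a_j| \int |x + \theta|^j f_\theta(x)\,dG(\theta),$$
$$q_\varepsilon^{(r)}(x) \le \sum_{j=0}^r |a_j| C_j \int (|x + \theta|^j + \varepsilon^j)(f_\theta(x) + f_\theta(x + \varepsilon))\,dG(\theta), \tag{1.30}$$
where the second inequality follows from the first via (1.29) and
the $C_r$-inequality. Lastly,
$$f_\theta \le (2\pi)^{-1/2}, \qquad f \le (2\pi)^{-1/2},$$
$$|\alpha| \le \int |\theta - c| f_\theta\,dG(\theta) \le (2\pi)^{-1/2} G|\theta - c|, \tag{1.31}$$
$$q_\varepsilon^{(0)} \le (2\pi)^{-1/2}, \qquad q_\varepsilon^{(r)} \text{ is bounded}.$$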

Remark. Corollary 1.2 below is an improvement of Corollary 3.2 of
Johns and Van Ryzin. They proved the corollary under the stronger
assumptions $G|\theta|^{1 + (3+t)\delta/(2-\delta)} < \infty$ and $G|\theta|^r < \infty$.

Corollary 1.2. Consider the Normal $(-\theta, 1)$ family. For each
$0 \le \delta \le 1$, if
$$G|\theta|^{1 + (2+t)\delta/(2-\delta)} < \infty \quad \text{for some } t > 0, \tag{1.32}$$
then the hypothesis of Theorem 1.1 holds for each $r \ge 2$.

Proof. Condition (D) is implied by the integrability of $\alpha$ and
$|x| q_\varepsilon^{(r)}$, since $1 + |v| = 1 + |x|$ is bounded by $2|x|$ for
$|x| > 1$. By (1.31), if $G|\theta| < \infty$ then
$$\int |\alpha|\,dx \le G|\theta - c| < \infty. \tag{1.33}$$
Denote by $b_j$ the constant $(2\pi)^{-1/2} \int |z|^j e^{-z^2/2}\,dz$. Since
$P_\theta|x + \theta|^j = b_j$, it follows, by the triangle inequality, that
$$P_\theta[|x|\,|x + \theta|^j] \le P_\theta[(|x + \theta| + |\theta|)\,|x + \theta|^j] = b_{j+1} + |\theta| b_j.$$
Hence, $G|\theta| < \infty$ implies $P(|x|\,|x + \theta|^j) < \infty$ for each $j$, and
therefore $|x| q_\varepsilon^{(r)}$ is integrable by (1.30). This completes
the verification of (D) under $G|\theta| < \infty$.

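Let us next consider condition (C) for $\delta = 0$, $\delta = 1$, and
$0 < \delta < 1$.

Case 1 ($\delta = 0$). (1.33) proves this case.

Case 2 ($\delta = 1$). Since $|v| = |x|$ and $q_\varepsilon^{(0)} \le (2\pi)^{-1/2}$, we need
only verify the integrability of $[|x| > 1]\,|x|\,(q_\varepsilon^{(0)})^{1/2}$. By
Hölder's inequality,
$$\int_{|x|>1} |x| \big(q_\varepsilon^{(0)}(x)\big)^{1/2} dx \le \Big(\frac{2}{t}\Big)^{1/2} \Big\{\int |x|^{3+t} q_\varepsilon^{(0)}\,dx\Big\}^{1/2},$$
where the last integral is bounded by
$$\int |x|^{3+t}(f(x) + f(x + \varepsilon))\,dx \le P[|x + \theta| + |\theta|]^{3+t} + P[|x + \theta| + |\theta + \varepsilon|]^{3+t}$$
via (1.29) and the triangle inequality. Again, since
$(x + \theta)$ given $\theta$ is standard Normal, $G|\theta|^{3+t} < \infty$ implies Case 2.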

Case 3 ($0 < \delta < 1$). Let $0 \le \xi \le 1$ and $0 < t$. Apply the Hölder inequality with
$0 < 1/p = \delta/2 < 1$, $0 < 1/q = (2 - \delta)/2 < 1$, $X = |x|^{(1-\xi)\delta}(q_\varepsilon^{(0)})^{\delta/2}$ and $Y = |x|^{\xi\delta}|\alpha|^{1-\delta}$.
It follows that (C) is implied by the
integrability of $X^p$ and $Y^q$. By (1.29),
$$\int X^p\,dx \le P|x|^{2(1-\xi)} + P|x - \varepsilon|^{2(1-\xi)},$$
so that $G|\theta|^{2(1-\xi)} < \infty$ implies the integrability of $X^p$.
If $G|\theta| < \infty$, then $\alpha$ is bounded by (1.31), and $Y$ is
bounded on $|x| \le 1$. Therefore, the integrability of $Y^q$ is
implied by that of $[|x| > 1]Y^q$. By Hölder's inequality,
$$\int_{|x|>1} Y^q\,dx \le \Big(\frac{2}{t}\Big)^{\delta/(2-\delta)} \times \Big\{\int |x|^u |\alpha|\,dx\Big\}^{2(1-\delta)/(2-\delta)},$$
where $u = \tfrac12(1 + t + 2\xi)\delta/(1 - \delta)$. By Tonelli's theorem,
$\int |x|^u |\alpha|\,dx \le \int |\theta - c|\, P_\theta|x|^u\,dG(\theta)$. Since $(x + \theta)$ given $\theta$ is
standard Normal, the $C_r$-inequality gives
$$P_\theta|x|^u \le C_u \times \{P_\theta|x + \theta|^u + |\theta|^u\}.$$
Thus, $G|\theta|^{1+u} < \infty$ implies that $Y^q$ is
integrable. Balancing $1 + u$ against $2(1 - \xi)$, we find that
$\max(1 + u, 2(1 - \xi))$ is minimized when $1 - 2\xi = \delta(2 + t)/(2 - \delta)$, so that
$2(1 - \xi) = 1 + \delta(2 + t)/(2 - \delta)$. Therefore, (1.32) implies Case 3.
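The balancing step is linear in $\xi$, so it can be confirmed with exact rational arithmetic. The Python sketch below solves $1 + u = 2(1 - \xi)$ and checks that the resulting $2(1 - \xi)$ equals $1 + \delta(2+t)/(2-\delta)$. It confirms only the algebra. The requirement $0 \le \xi \le 1$ still has to be checked for the particular $\delta$ and $t$ in use.

```python
from fractions import Fraction

def balance(delta: Fraction, t: Fraction):
    """Solve 1 + u = 2(1 - xi) for xi, where u = (1 + t + 2 xi) delta / (2 (1 - delta))."""
    # The equation is linear in xi; clearing denominators gives this closed form.
    xi = (2 - (3 + t) * delta) / (2 * (2 - delta))
    u = (1 + t + 2 * xi) * delta / (2 * (1 - delta))
    return xi, u

delta, t = Fraction(1, 2), Fraction(1, 2)
xi, u = balance(delta, t)
print(xi, u, 1 + u, 2 * (1 - xi), 1 + delta * (2 + t) / (2 - delta))
```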

Remark. Corollary 1.2 shows that for the Normal $(-\theta, 1)$ family
there exist procedures for which the regret converges to zero
at a rate no worse than $n^{-\gamma}$. This holds provided the prior has a finite
absolute moment of order $1 + \delta(2+t)/(2-\delta)$, where $0 \le \delta \le 1$. In the case
where $\delta = 1$ and $r$ is sufficiently large, a rate close to
$n^{-1/2}$ can be achieved provided the prior $G$ has a finite absolute
moment of order $3 + t$ for some $t > 0$. However, for $\delta = 0$, the finiteness of the first moment
of $G$ guarantees only the boundedness of the excess risk. This
lack of rate will be removed in Corollary 2.4.
1.6. Result Without Differentiability of h.

In Section 1.5 we discussed the exponential family in (1.11)
and (1.8). We took advantage of the existence of $h^{(r)}$ and obtained
the result in Theorem 1.1. In this section we shall not assume $h$
to have any derivative. We recall from (1.9) the definition
$$J(x) = \int e^{-\theta x} \beta(\theta)\,dG(\theta). \tag{1.34}$$

It was shown in Lemma 1.1 that $J$ is infinitely differentiable on
$[h_{\mathcal G} > 0]$ and, therefore, also on $[x > a]$. Since $f = Jh$, it
follows from (1.12) and (1.13) that
$$\alpha = -(J^{(1)} + cJ)h. \tag{1.35}$$
In view of the method of attack exhibited in Sections 1.4 and
1.5, we shall estimate $\phi_G$ through $J$ and $J^{(1)}$.

For each $r \ge 2$, let
$$J_n(x) = n^{-1} \sum_{j=1}^n W_j^0(\lambda)/h(X_j), \qquad J_n'(x) = (n\lambda)^{-1} \sum_{j=1}^n \{W_j^1(2\lambda) - W_j^1(\lambda)\}/h(X_j), \tag{1.36}$$
where $W_j^0$ and $W_j^1$ are as defined in (A.8) of the appendix. Let
$$\phi_n = [\alpha_n \le 0], \quad \text{where} \quad \alpha_n = -(J_n' + cJ_n)h. \tag{1.37}$$

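Theorem 1.2. Let $\phi_n$ be as in (1.37). Consider the exponential
family in (1.11) and (1.8). For each $0 \le \delta \le 2$, suppose there exists
some $\varepsilon > 0$ such that
$$\text{(C}'\text{)} \quad \int_a^\infty |\alpha|^{1-\delta} (T_\varepsilon^{1/2} h)^\delta\,dx < \infty, \qquad T_\varepsilon(x) = \sup_{0<u<1} \frac{J(x + \varepsilon u)}{h(x + \varepsilon u)},$$
$$\text{(D}'\text{)} \quad \int_a^\infty |\alpha|^{1-\delta} (S_\varepsilon^{(r)} h)^\delta\,dx < \infty, \qquad S_\varepsilon^{(r)}(x) = \sup_{0<u<1} |J^{(r)}(x + \varepsilon u)|.$$
Then, with $\lambda = n^{-1/(2r+1)}$, we have
$$0 \le R_n(\phi_n, G) - R^* = O(n^{-\gamma}), \quad \text{where } \gamma = (r-1)\delta/(2r+1). \tag{1.38}$$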

Proof. From (1.20), it follows by the $C_r$-inequality that the
excess risk is bounded above by $C_\delta \times (A + B)$, where
$$A = \int |\alpha|^{1-\delta} P_x|(J_n - J)h|^\delta\,dx$$
and
$$B = \int |\alpha|^{1-\delta} P_x|(J_n' - J^{(1)})h|^\delta\,dx.$$
With $J_n$ and $J$ replacing $f_n$ and $f$ in (1.22), we have
$$P_x|J_n - J|^\delta \le C_\delta\{(\operatorname{Var}_x J_n)^{\delta/2} + |P_x J_n - J|^\delta\}, \quad \text{for } 0 \le \delta \le 2.$$
Under (C') and (D'), Lemmas A.5 and A.6 of the appendix, and the
choice $\lambda = n^{-1/(2r+1)}$,
$$A = O((n\lambda)^{-\delta/2}) + O(\lambda^{r\delta}) = O(n^{-r\delta/(2r+1)}).$$
Similarly, for $0 \le \delta \le 2$,
$$P_x|J_n' - J^{(1)}|^\delta \le C_\delta\{(\operatorname{Var}_x J_n')^{\delta/2} + |P_x J_n' - J^{(1)}|^\delta\}.$$
Invoking (C'), (D') and Lemmas A.5 and A.6 of the appendix,
$$B = O((n\lambda^3)^{-\delta/2}) + O(\lambda^{(r-1)\delta}) = O(n^{-\gamma}).$$
The proof is completed by the weaker rate of $B$.

Example 3. Consider the exponential family with
$$h = [0 < x \le 1] + 2[1 < x < \infty]. \tag{1.39}$$
Then $\Theta = (0, \infty)$ and $\beta(\theta) = \theta/(1 + e^{-\theta})$. We note that

(1.40a) $h$ is non-decreasing while $J$ is strictly decreasing.

(1.40b) $|J^{(r)}| = \int \theta^r \beta(\theta) e^{-\theta x}\,dG(\theta)$.

(1.40c) $S_\varepsilon^{(r)} = |J^{(r)}|$ and $T_\varepsilon = J/h$.
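The Python sketch below gives a numerical check of $\beta(\theta) = \theta/(1 + e^{-\theta})$ for the $h$ in (1.39). It uses $1/\beta(\theta) = \int h(x)e^{-\theta x}\,dx$ from (1.8). The closed form comes from $\int_0^1 e^{-\theta x}dx + 2\int_1^\infty e^{-\theta x}dx = (1 + e^{-\theta})/\theta$. The values of $\theta$ and the integration grid are arbitrary.

```python
import math

def h(x: float) -> float:
    """h in (1.39): 1 on (0, 1], 2 on (1, infinity), 0 elsewhere."""
    if 0.0 < x <= 1.0:
        return 1.0
    return 2.0 if x > 1.0 else 0.0

def inv_beta_numeric(theta: float, steps: int = 200_000) -> float:
    """Midpoint-rule value of 1/beta(theta) = int h(x) e^{-theta x} dx, truncated at 60/theta."""
    upper = 60.0 / theta
    dx = upper / steps
    return dx * math.fsum(h((k + 0.5) * dx) * math.exp(-theta * (k + 0.5) * dx) for k in range(steps))

def beta_closed_form(theta: float) -> float:
    """beta(theta) = theta / (1 + e^{-theta}), as stated in Example 3."""
    return theta / (1.0 + math.exp(-theta))

for theta in (0.5, 1.0, 3.0):
    print(theta, 1.0 / inv_beta_numeric(theta), beta_closed_form(theta))
```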

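Corollary 1.3. Consider the exponential family with $h$ in (1.39).
The hypothesis of Theorem 1.2 holds provided (1.25) and (1.26) hold.

Proof. The proof of Corollary 1.1 works with
$$q_\varepsilon^{(r)}, \quad q_\varepsilon^{(0)}, \quad f, \quad |\alpha| \le G[\theta|\theta - c|] \quad \text{and} \quad f \le G[\theta] \tag{1.41}$$
respectively replaced by
$$S_\varepsilon^{(r)}, \quad T_\varepsilon, \quad J, \quad |\alpha| \le 2G[\theta|\theta - c|] \quad \text{and} \quad f \le 2G[\theta]. \tag{1.42}$$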

CHAPTER 2

SQUARED ERROR LOSS ESTIMATION PROBLEM

2.1. Introduction.

 

Suppose $\theta$ is distributed according to some prior $G$, and
one is to estimate $\theta$ based on an observation $X$, with $X \mid \theta$
distributed according to the exponential family given in (1.8) and
(1.11); that is, for some $a \ge -\infty$,
$$f_\theta(x) = \beta(\theta)\,h(x)\,e^{-\theta x}, \tag{2.1}$$
where
$$h > 0 \text{ if and only if } x > a. \tag{2.2}$$
Let $P$ denote the joint p-measure on $(X, \theta)$ as in Chapter 1.
Let the loss function be squared error loss. The risk of an
estimate $\psi$ is then given by $R(\psi, G) = P(\psi - \theta)^2$, with Bayes risk
$$R^*(G) = \inf_\psi P(\psi - \theta)^2. \tag{2.3}$$
We note that $R$ and $R^*$ denote different quantities in Chapter 1.

In order that the problem not be totally uninteresting, we
restrict $G$ to those with finite Bayes risk. We note that the
Bayes risk $R^*(G)$ may be uniformly bounded in $G$. For
example, let $X \sim N(\theta, 1)$. Then the natural estimate $\psi'(X) = X$
has risk $P(\psi' - \theta)^2 = 1$. Therefore, $R^*(G) \le 1$ whatever be $G$.
Extend $P$ to denote the product p-measure on $(X, \theta)$,
$X_1, X_2, \dots,$ and $X_n$. Let $\psi_n$ be any measurable function of
$X_1, \dots, X_n$ and $X$. The risk of $\psi_n$ is then given by
$$R_n(\psi_n, G) = P(\psi_n - \theta)^2.$$
Let $\psi_G$ be a Bayes estimate versus $G$. If $\psi_n - \psi_G \in L_2(P)$, then
$P(\psi_n - \psi_G)(\psi_G - \theta) = 0$, and the excess risk satisfies
$$0 \le R_n(\psi_n, G) - R^*(G) = P(\psi_n - \psi_G)^2. \tag{2.4}$$

We recall the following definitions from Chapter 1:
$$v = \frac{h^{(1)}}{h}, \qquad f = \int f_\theta\,dG(\theta), \qquad g = f^{(1)}, \qquad \text{and} \qquad J(x) = \int e^{-\theta x}\beta(\theta)\,dG(\theta). \tag{2.5}$$
It is well known that a Bayes estimate under squared error loss
is the posterior mean $P_x\theta$. Hence, by (1.12), the Bayes estimate
$\psi_G$ is well defined without any assumption on the prior $G$. Furthermore,
(1.14) remains valid with $P_x\theta$ replaced by $\psi_G$, i.e.,
$$\psi_G = v - \frac{g}{f}. \tag{2.6}$$
In view of (2.4), it is now a matter of estimating $\psi_G$ by estimating
the density $f$ and its derivative $g$.

2.2. Estimation of $\psi_G = P_x\theta$.

We shall exploit the expression in (2.6) in estimating $\psi_G$
when a sequence of observations $X_1, \dots, X_n$, i.i.d. according to
the common density $f$, is available.

Let $f_n$ and $g_n$ respectively be any estimates of $f$ and
$g$. Let $\eta > 0$. Truncate $f_n$ away from 0 by
$$f_n' = f_n \vee \eta, \tag{2.7}$$
and define
$$\psi_n = v - \frac{g_n}{f_n'}. \tag{2.8}$$

Lemma 2.1. For each $\eta > 0$, the estimate $\psi_n$ in (2.8) satisfies
$$P(\psi_n - \psi_G)^2 \le 3(\eta^{-2}A + \eta^{-2}B + C), \tag{2.9}$$
where $A = P(g_n - g)^2$, $B = P(g/f)^2(f_n - f)^2$, and
$C = P(g/f)^2[f < \eta]$.

Proof. From (2.7) and (2.8), simple algebraic manipulation followed
by the triangle inequality yields
$$\eta|\psi_n - \psi_G| \le f_n'|\psi_n - \psi_G| = \Big|g_n - \frac{g}{f}f_n'\Big| \le |g_n - g| + \Big|\frac{g}{f}\Big|\,|f - f_n'|. \tag{2.10}$$
Since $|f - f_n'| \le \eta[f < \eta] + |f - f_n|$, the proof follows from
(2.10) and the inequality $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$.
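Lemma 2.1 rests on a pointwise inequality, so it can be spot-checked on random numbers. These stand for $f(x)$, $g(x)$, $f_n(x)$, $g_n(x)$ and $\eta$ at a single $x$, and the expectation $P$ is then applied to both sides. The sampling distributions in the Python sketch below are arbitrary choices for illustration. The value of $v$ is set to 0 because it cancels in $\psi_n - \psi_G$.

```python
import random

def psi_n(v: float, f_n: float, g_n: float, eta: float) -> float:
    """Truncated estimate (2.7)-(2.8): v - g_n / max(f_n, eta)."""
    return v - g_n / max(f_n, eta)

def lemma21_rhs(f: float, g: float, f_n: float, g_n: float, eta: float) -> float:
    """Pointwise version of the right side of (2.9), before taking expectations."""
    ratio_sq = (g / f) ** 2
    return 3.0 * ((g_n - g) ** 2 / eta ** 2
                  + ratio_sq * (f_n - f) ** 2 / eta ** 2
                  + ratio_sq * (1.0 if f < eta else 0.0))

rng = random.Random(2)
cases = []
for _ in range(20_000):
    f = rng.uniform(1e-3, 2.0)                    # true density value (> 0)
    g = rng.uniform(-3.0, 3.0)                    # true derivative value
    f_hat = f + rng.gauss(0.0, 0.5)               # estimate of f (may be negative)
    g_hat = g + rng.gauss(0.0, 0.5)               # estimate of g
    eta = rng.uniform(1e-3, 1.0)
    lhs = (psi_n(0.0, f_hat, g_hat, eta) - (0.0 - g / f)) ** 2   # v cancels, so take v = 0
    cases.append((lhs, lemma21_rhs(f, g, f_hat, g_hat, eta)))
print(all(lhs <= rhs * (1 + 1e-9) for lhs, rhs in cases))
```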

Lemma 2.1 shows that for any estimate $\psi_n$ of the form
in (2.8), the regret can be bounded in terms of $A$, $B$ and $C$ in
(2.9). The first two terms, namely $A$ and $B$, involve $P_x(f_n - f)^2$
and $P_x(g_n - g)^2$. The appendix gives kernel estimates $f_n$ and
$g_n$ for which these quantities are small. Therefore, hereafter,
we shall take $f_n$ and $g_n$ to be the kernel estimates given
in (A.8), with $\psi_n$ in (2.8) defined in terms of these
estimates.

2.3. Summary.

Theorem 2.1 below is a 1-dimensional specialization of a
result considered by Lin (1968). The scale exponential and the
Normal densities again will serve as examples to show that the
existence of certain moments of G is sufficient for the hypothesis
of Theorem 2.1. In Corollary 2.4, better rates are obtained for
the Normal two-action problem from those obtained in the Normal

estimation problem.

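2.4. Main Results and Examples.

Theorem 2.1. Let $\psi_n$ be of the form in (2.8), with $f_n$ and $g_n$
being kernel estimates of $f$ and $g$ as given in (A.8) of the
appendix. Suppose $h^{(r)}$ exists and, for some $\varepsilon > 0$,
$$P\{(1 + (g/f)^2)\, q_\varepsilon^{(0)}\} < \infty, \tag{2.13}$$
$$P\{(1 + (g/f)^2)\, (q_\varepsilon^{(r)})^2\} < \infty, \tag{2.14}$$
and suppose there is a $\delta \ge 0$ such that
$$P\{(g/f)^2[f < \eta]\} \le c_1 \eta^\delta. \tag{2.15}$$
Then, with $\lambda = n^{-1/(2r+1)}$ and $\eta = n^{-\frac{1}{2+\delta}\cdot\frac{2(r-1)}{2r+1}}$,
$$0 \le R_n(\psi_n, G) - R^* = O(n^{-\gamma'}), \tag{2.16}$$
where $\gamma' = \frac{\delta}{2+\delta}\cdot\frac{2(r-1)}{2r+1}$.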

Proof. Let $A$, $B$ and $C$ be as in (2.9). With $\lambda = n^{-1/(2r+1)}$,
Lemmas A.3 and A.4 of the appendix, followed by (2.13) and (2.14),
yield
$$A \le \text{const} \times (n\lambda^3)^{-1} + \text{const} \times \lambda^{2(r-1)} \le \text{const} \times n^{-2(r-1)/(2r+1)}$$
and
$$B \le \text{const} \times (n\lambda)^{-1} + \text{const} \times \lambda^{2r} \le \text{const} \times n^{-2r/(2r+1)},$$
with the rate on $A$ being the slower of the two. The choice
$$\eta = n^{-\frac{1}{2+\delta}\cdot\frac{2(r-1)}{2r+1}}$$
balances the rates of $C$ and $\eta^{-2}A$ to $n^{-\gamma'}$.

The proof is completed by Lemma 2.1.
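The choice of $\eta$ in Theorem 2.1 can be checked by exponent arithmetic. Write $A = O(n^{-k})$ with $k = 2(r-1)/(2r+1)$, and $\eta = n^{e}$. Then the terms $C \le c_1\eta^\delta$ and $\eta^{-2}A$ have exponents $\delta e$ and $-2e - k$. The Python sketch below confirms, with exact fractions, that both equal $-\gamma'$ and that $\eta^{-2}B$ decays faster.

```python
from fractions import Fraction

def theorem21_exponents(r: int, delta: Fraction):
    """Exponents of n in the bounds of Theorem 2.1 with lambda = n^(-1/(2r+1))."""
    k = Fraction(2 * (r - 1), 2 * r + 1)                    # A = O(n^-k)
    eta_exp = -k / (2 + delta)                              # eta = n^eta_exp
    gamma_prime = delta * k / (2 + delta)                   # rate exponent in (2.16)
    c_term = delta * eta_exp                                # C <= c_1 eta^delta
    a_term = -2 * eta_exp - k                               # eta^-2 A
    b_term = -2 * eta_exp - Fraction(2 * r, 2 * r + 1)      # eta^-2 B
    return eta_exp, gamma_prime, c_term, a_term, b_term

print(theorem21_exponents(2, Fraction(1, 2)))
```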

Example 1 (Scale exponential).
Consider the scale exponential with Lebesgue densities
given by (1.23), i.e.,
$$f_\theta(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{2.17}$$
Consider the extreme case where $G$ is degenerate at
$\theta = 1$, with all moments finite. The quantity
$C = P(g/f)^2[f < \eta]$ in (2.9) can be computed to be exactly $\eta$ (for $\eta \le 1$).

This motivates the bound in the following lemma.

Lemma 2.2. For the scale exponential in (2.17), if $0 < \eta \le f(1)$,
then for each $p > 1$ and $1/p + 1/q = 1$,
$$P(g/f)^2[f < \eta] \le (\Gamma(1 + 2q))^{1/q}\,(\eta/(2p - 1))^{1/p}. \tag{2.18}$$

Proof. The inequality $(g/f)^2 = (-P_x\theta)^2 \le P_x(\theta^2)$, followed by
Hölder's inequality, yields
$$\begin{aligned} P(g/f)^2[f < \eta] &\le P(\theta x)^2 x^{-2}[f < \eta] \\ &\le (P(\theta x)^{2q})^{1/q}\,(P x^{-2p}[f < \eta])^{1/p} \\ &= (\Gamma(1 + 2q))^{1/q}\,(P x^{-2p}[f < \eta])^{1/p}, \end{aligned} \tag{2.19}$$
where the last equality follows from the fact that, conditioned on
$\theta$, $\theta x$ is standard exponential. For $0 < \eta \le f(1)$,
$[f < \eta] \subset [x > 1]$, so that
$$P x^{-2p}[f < \eta] \le \eta \int_1^\infty x^{-2p}\,dx = \eta/(2p - 1).$$
This completes the proof.

Lemma 2.2 shows that (2.15) holds with $\delta = 1/p$ and
$c_1 = \Gamma^{1/q}(1+2q)/(2p-1)^{1/p}$, without any assumption on the prior
G. (For priors with densities $a\theta^{a-1}[0 < \theta < 1]$, $a > 0$, it can be
shown that $f(x) = a x^{-(1+a)}\int_0^x z^a e^{-z}\,dz \sim a x^{-(1+a)}\Gamma(1+a)$ and
$|g(x)| \sim a x^{-(2+a)}\Gamma(2+a)$ as $x \to \infty$. Hence, $(g/f)^2 \sim (1+a)^2 x^{-2}$ as
$x \to \infty$, and $C \le c_1\eta^{(2+a)/(1+a)}$. Here we see that the bound on
C deteriorates as a, the number of finite moments of $\theta^{-1}$,
increases.)

Corollary 2.1. For the scale exponential family in (2.17), the
hypothesis of Theorem 2.1 holds for each $r \ge 2$ and $\delta < 1$, provided
+
(2.20) c er a < c. .
2 2 -
Proof. Since (g/f) s §{(9 ) and qér) = C(er+1 e ex), it

suffices to note that with ei ~ G1 = G, i = 1,2,


2 -elx
G1 G Fe[(1+e )91 e 3

2 -
P[(1+2x<e >)c(e e 9x)3

2
c1 c[(1+e )e1 e/(e+el>]

c<e)<1+c<ez>) .

IA

and furthermore, by the Arithmetic-Mean-Geometric-Mean inequality
(Beckenbach and Bellman (1961), p. 54),
2 -
P(1+Px(e ))G2(er+1 e 9")

'8 x '9 x
2 r+l 1 r+l 2
c1 62 CF9[(1+9 >91 e 92 e 1

r+l r+l 2
- G1 e2 e1 92 G[(1+e )e/(e+el+92>1
r+k r
s a G1 62 61 92

r+k

“5 Ci (1+ez) 935]

= a cz<e )G[9,(1+Gz)] -

The proof is complete.

Example 2 (Normal).
Let us consider the Normal $(-\theta, 1)$ family with Lebesgue
densities
(2.21)  $f_\theta(x) = (2\pi)^{-1/2} e^{-(x+\theta)^2/2}$.
For each $0 \le u$ and $1 \le v$, we note that
(2.22)  $b_u = P_\theta|x + \theta|^u$
is the finite constant $\int |z|^u e^{-z^2/2}\,dz/(2\pi)^{1/2}$ and, by Jensen's
inequality,
(2.23)  $|g/f|^v = |P_x(x+\theta)|^v \le P_x|x+\theta|^v$.

Remark. Consider again the extreme case when G is degenerate at
$\theta = 0$. Then the quantity $C = P(g/f)^2[f < \eta] \sim 2L\eta$ as $\eta \to 0$,
with $L^2 = -\log(2\pi\eta^2) = o(\eta^{-t})$ for any $t > 0$.
Proof. Since $f(L) = \eta$ and $[f < \eta] = [|x| > L]$, we have
$C = 2\int_L^\infty x^2 f\,dx$ which, upon integration by parts, yields
$C = 2L\eta + 2P[x > L]$. By the Normal tail bound (Feller (1962),
p. 166), it follows that $2Lf(L) + 2f(L)(1/L - 1/L^3) < C < 2Lf(L) + 2f(L)/L$. Consequently, $C \sim 2L\eta$. The proof is completed by the
fact that $L = o(\eta^{-t})$ for any $t > 0$.
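The asymptotic $C \sim 2L\eta$ and the two Feller bounds can be evaluated directly. The following sketch, with the hypothetical helper `normal_tail_quantity`, assumes $\eta < (2\pi)^{-1/2}$ so that L is real, and uses `math.erfc` for $P[x > L]$.

```python
import math


def normal_tail_quantity(eta: float):
    """For G degenerate at 0 (standard Normal f): return (L, C, lower, upper).

    L solves f(L) = eta, i.e. L^2 = -log(2*pi*eta^2);
    C = 2 * integral_L^inf x^2 f(x) dx = 2*L*eta + 2*P[x > L];
    lower/upper are the bounds obtained from Feller's Normal tail inequality.
    """
    L = math.sqrt(-math.log(2.0 * math.pi * eta * eta))
    tail = 0.5 * math.erfc(L / math.sqrt(2.0))   # P[x > L]
    C = 2.0 * L * eta + 2.0 * tail
    lower = 2.0 * L * eta + 2.0 * eta * (1.0 / L - 1.0 / L**3)
    upper = 2.0 * L * eta + 2.0 * eta / L
    return L, C, lower, upper
```

As η decreases the ratio $C/(2L\eta)$ drifts down toward 1, roughly like $1 + 1/L^2$.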

The above remark motivates the bound in the next lemma.

Lemma 2.3. Consider the Normal $(-\theta, 1)$ family in (2.21). For each
$0 \le \delta < 1$, (2.15) holds if
(2.24)  $G|\theta|^{(1+t)\delta/(1-\delta)} < \infty$ for some $t > 0$.
Proof. By the Hölder inequality,
$$P(g/f)^2[f < \eta] \le I^{1/p}\, II^{1/q},$$
where
$$I = P|g/f|^{2p} \le P P_x|x + \theta|^{2p} = P P_\theta|x + \theta|^{2p} = b_{2p}$$
by (2.23) and (2.22), and
$$II = P[f < \eta] \le \eta^{s}\int f^{1-s}\,dx, \quad\text{for any } 0 \le s < 1.$$
Since the density f is bounded by $(2\pi)^{-1/2}$, the integrability
of $f^{1-s}$ is implied by that of $[|x| > 1]f^{1-s}$. Temporarily, let
$v(s) = (1+a)s/(1-s)$ for each $a > 0$. The Hölder inequality followed
by the $C_r$-inequality yields
$$\int [|x| > 1] f^{1-s}\,dx \le (2/a)^s P^{1-s}(|x|^{v}) \le (2/a)^s\{c\,(b_v + P|\theta|^{v})\}^{1-s},$$
where c is the $C_r$ constant. Hence, $G|\theta|^{v(s)} < \infty$ implies that $f^{1-s}$ is integrable and, therefore,
(2.15) holds with the rate $s/q$. Since (2.24) implies that there
exists some $0 < a < t$ for which $G|\theta|^{v(\delta^+)} < \infty$ with
$\delta^+ > \delta$, the proof above shows that (2.15) holds with rate $\delta^+/q$. The
proof is completed by the choice $\delta^+/q = \delta$. Such a choice is possible
since $1 < q$ is a free parameter.

Corollary 2.2. Consider the Normal $(-\theta, 1)$ family. For each
$0 \le \delta < 1$, if (2.24) holds, then the hypothesis of Theorem 2.1
holds for any $r \ge 2$.
Proof. (2.13) and (2.14) are satisfied because $q_\varepsilon^{(0)}$
and $q_\varepsilon^{(r)}$ are bounded functions, and $(g/f)^2$ is P-integrable by
(2.23). The proof is completed by Lemma 2.3.

Remark. For $\delta$ close to 1, Corollary 2.2 shows that a rate of
$O(n^{-\gamma'})$, with $\gamma'$ arbitrarily close to 1/3, can be attained, provided
$G|\theta|^u < \infty$ for sufficiently large u. On the other hand,
for $\delta$ close to zero, slower convergence rates are attained. This
last result is completely absent in the two-action problem (cf.
the remark following Corollary 1.2). We shall presently remedy
the situation by obtaining better rates in the Normal two-action
problem as a corollary of the estimation problem.

Let $I_n$ be the estimate prescribed in Theorem 2.1. Consider the test $\varphi_n' = [I_n - c \le 0]$ in the two-action problem in
Theorem 1.1.

Lemma 2.4. $P\{(P_x\theta - c)(\varphi_n' - \varphi_G)\} \le P^{1/2}(P_x\theta - I_n)^2$. Consequently,
the excess risk of $\varphi_n'$ in the two-action problem is bounded by
the square root of the excess risk of $I_n$ in the estimation problem.

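Proof. Since
$$(P_x\theta - c)(\varphi_n' - \varphi_G) = \begin{cases} P_x\theta - c & \text{if } I_n \le c < P_x\theta,\\ c - P_x\theta & \text{if } P_x\theta \le c < I_n,\\ 0 & \text{otherwise},\end{cases}$$
it follows that
(2.25)  $P\{(P_x\theta - c)(\varphi_n' - \varphi_G)\} \le P|P_x\theta - I_n| \le P^{1/2}(P_x\theta - I_n)^2$,
where the second inequality follows by the Liapounov inequality.
The proof is completed by (1.7) and (2.3).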
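The pointwise inequality behind (2.25) is easy to probe numerically. In this sketch the posterior mean $P_x\theta$, the estimate $I_n$ and the threshold c are treated as plain numbers (an assumption for illustration), indicators are coded as 1.0/0.0, and the check confirms $0 \le (P_x\theta - c)(\varphi_n' - \varphi_G) \le |P_x\theta - I_n|$ on random inputs. The function names are hypothetical.

```python
import random


def regret_term(post_mean: float, estimate: float, c: float) -> float:
    """(P_x theta - c)(phi_n' - phi_G), with phi_n' = [I_n - c <= 0], phi_G = [P_x theta - c <= 0]."""
    phi_n = 1.0 if estimate - c <= 0 else 0.0
    phi_g = 1.0 if post_mean - c <= 0 else 0.0
    return (post_mean - c) * (phi_n - phi_g)


def check_pointwise_bound(trials: int = 100_000, seed: int = 0) -> bool:
    """Randomized check of 0 <= regret_term <= |P_x theta - I_n|."""
    rng = random.Random(seed)
    for _ in range(trials):
        m, i_n, c = (rng.uniform(-3, 3) for _ in range(3))
        term = regret_term(m, i_n, c)
        if not (0.0 <= term <= abs(m - i_n)):
            return False
    return True
```

Taking P-expectations of the pointwise bound gives the first inequality in (2.25).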

We note that (2.25) is a statement about the excess risk
in the two-action problem being bounded by the $L_1$-norm of $I - I_n$,
which in turn is bounded by the $L_2$-norm of $I - I_n$.

Applying Lemma 2.4 to the Normal two-action problem, a
rate of $O(n^{-\gamma'/2})$ is possible provided G has finite
$((1+t)\delta/(1-\delta))$-th moment. If we let m denote the number of
finite moments of G and v the obtained rate, we have the parametric equation in $\delta$
(2.26)  $m = (1+t)\dfrac{\delta}{1-\delta}$, $v = q\dfrac{\delta}{2+\delta}$, where $q = \dfrac{r-1}{2r+1}$.
Similarly, we obtain the parametric equation

(2.27) m=1+(2+t)'2-§_z,v=q6,

from Corollary 1.2. The two parametric equations have a solution
at $m = .3(1-t) + [.09(1-t)^2 + .80(1+t)]^{1/2}$. For $t = 0$, $m = m_0 = .3 + (.89)^{1/2}$.
Therefore, for priors not having finite $m_0$-th moment, $v < \gamma'/2$.
We have thus proved the following corollary.

Corollary 2.3. Consider the Normal two-action problem. Let $I_n$
be as in Theorem 2.1, and $\varphi_n' = [I_n - c \le 0]$. Then the excess
risk in the two-action problem satisfies
$$0 \le R_n(\varphi_n', G) - R^* = O(n^{-\gamma'/2}),$$
provided (2.24) is satisfied.

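PART II

EXTENDED SEQUENCE-COMPOUND ESTIMATION
INTRODUCTION

Let $\boldsymbol\theta = (\theta_1, \ldots, \theta_n, \ldots)$ be a sequence of parameters.
Let $G_n$ denote the empirical distribution of $\theta_1, \theta_2, \ldots, \theta_n$. The
usual standard in compound decision problems is $R(G_n)$, the Bayes
envelope of the component problem evaluated at $G_n$.
Let $k \ge 1$. Let $G_n^k$ denote the empirical distribution
of the k-vectors $\boldsymbol\theta_k^k = (\theta_1, \ldots, \theta_k)$, $\boldsymbol\theta_{k+1}^k = (\theta_2, \ldots, \theta_{k+1})$, ...,
$\boldsymbol\theta_n^k = (\theta_{n-k+1}, \ldots, \theta_n)$. Gilliland and Hannan (1969) considered
the following extended game. Player I picks $\boldsymbol\omega^k = (\omega_1, \ldots, \omega_k) \in \Omega^k$
and Player II, after observing $\mathbf{x}^k \sim P_{\omega_1} \times \cdots \times P_{\omega_k}$, picks an action
$a \in \mathcal{A}$ according to some randomized decision rule $\varphi(\mathbf{x}^k)$. With
$L(\omega, a)$ denoting the loss, the risk Player II incurs is given by
$$R^k(\boldsymbol\omega^k, \varphi) = \iint L(\omega_k, a)\,\varphi(\mathbf{x}^k, da)\,d(P_{\omega_1} \times \cdots \times P_{\omega_k}).$$
The Bayes risk versus a p-measure G on $\Omega^k$ is
$$R^k(G, \varphi) = \int R^k(\cdot, \varphi)\,dG.$$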

Swain (1965) used $R^k(G_n^k)$ as standards for compound problems, and called the resulting versions the extended compound
decision problems. He considered squared error loss estimation
problems in the discrete exponential and the Normal families and
obtained rates of $O(n^{-1/2}\log n)$ and $O(1)$, respectively, uniformly
in $\boldsymbol\theta$. Samuel (1965) and Gilliland (1966 and 1968) considered the
unextended (k = 1) versions of these same problems, with Gilliland
obtaining the improved rates of $O(n^{-1/2})$ and $O(n^{-1/5})$, respectively.

It is the purpose of this work to reinstate the k in
Gilliland's results.

Chapter 3 considers the discrete exponential families.
Lemma 3.2, a corollary of a theorem of Bikelis (1966), is used
in (3.31) to bound certain probabilities involving k-dependent
random variables. Without Lemma 3.2, knowledge of a lower
bound for the variances $\tau_L^2$ in (3.28) seems to be necessary.
Theorem 3.2, an improvement of Theorem 3.5 of Gilliland (1968),
gives a rate of $O(n^{-1/2})$, uniformly in $\boldsymbol\theta$, for the estimates $\boldsymbol\varphi^*$
that subsume those of Gilliland.

Chapter 4 considers the Normal family. Here there is
much in common with the estimation problem in the k-variate
Normal considered by Susarla (1970). Most of the results in his
§1.2 are applicable to our extended problem. Theorem 4.1 gives a
rate of $O(n^{-1/(k+4)})$, uniformly in $\boldsymbol\theta$. We note that the rate
deteriorates as k increases.

CHAPTER 3

ESTIMATION IN DISCRETE EXPONENTIAL FAMILIES
UNDER SQUARED ERROR LOSS

3.1 Introduction.

We shall consider a sequence of statistical decision problems, each of which is structurally identical to the component problem described below.

A component problem consists of a family of probability
measures $\{P_\theta : \theta \in \Omega\}$ on a measurable space $(\mathcal{X}, \mathcal{B})$, a measurable
space $(\mathcal{A}, \mathcal{C})$, and a loss function $0 \le L$ defined on $\Omega \times \mathcal{A}$. A
randomized decision rule $\varphi \in \Phi$ is a function defined on $\mathcal{X} \times \mathcal{C}$
such that for each $x \in \mathcal{X}$, $\varphi(x, \cdot)$ is a probability measure on
$\mathcal{C}$, and for each $C \in \mathcal{C}$, $\varphi(\cdot, C)$ is $\mathcal{B}$-measurable. The risk of a
procedure $\varphi$ is defined by
(3.1)  $R(\theta, \varphi) = \iint L(\theta, A)\,\varphi(x, dA)\,P_\theta(dx)$.

A sequence (non-Bayes) compound problem is one in which
the decision rule $\varphi_n$ for the n-th problem is allowed to depend
on all past observations $\mathbf{x}_n = (x_1, x_2, \ldots, x_n)$ and the loss is
taken to be the average of the component losses. We require that
$\varphi_n(\mathbf{x}_n, \cdot)$ be a probability measure on $\mathcal{C}$ for each $\mathbf{x}_n$, and that
$\varphi_n(\cdot, C)$ be $\mathcal{B}^n$-measurable for each $C \in \mathcal{C}$.

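Let $\boldsymbol\varphi = (\varphi_1, \varphi_2, \ldots)$ be a procedure in a sequence-compound
problem. The average risk of using $\boldsymbol\varphi$ against $\boldsymbol\theta$ in the first n
problems is given by
(3.2)  $R_n(\boldsymbol\theta, \boldsymbol\varphi) = n^{-1}\sum_{i=1}^n \iint L(\theta_i, A)\,\varphi_i(\mathbf{x}_i, dA)\,\mathbf{P}_i(d\mathbf{x}_i)$,
where $\mathbf{P}_i$ denotes the product measure $P_{\theta_1} \times P_{\theta_2} \times \cdots \times P_{\theta_i}$.
A compound procedure $\boldsymbol\varphi$ is simple if $\varphi_i(\cdot, C)$ is $x_i$-measurable for each $C \in \mathcal{C}$. If, in addition, all $\varphi_i$ are identical,
say $\varphi_i = \varphi$, it is simple symmetric. For every simple symmetric
procedure $\boldsymbol\varphi$ and any $\boldsymbol\theta$,
$$R_n(\boldsymbol\theta, \boldsymbol\varphi) = n^{-1}\sum_{i=1}^n R(\theta_i, \varphi) = \int R(\cdot, \varphi)\,dG_n,$$
where $G_n$ denotes the empirical distribution of the first n
$\theta$'s; i.e.,
(3.3)  $G_n$ puts mass $1/n$ on each of $\theta_1, \theta_2, \ldots, \theta_n$.
With $R(G, \varphi)$ denoting $\int R(\cdot, \varphi)\,dG$ and
(3.4)  $R(G) = \inf\{R(G, \varphi) : \varphi \in \Phi\}$
denoting the Bayes risk versus the distribution G, it is obvious
that for any simple symmetric procedure $\boldsymbol\varphi = (\varphi, \varphi, \ldots)$
(3.5)  $R_n(\boldsymbol\theta, \boldsymbol\varphi) = R(G_n, \varphi) \ge R(G_n)$.
This motivates the use of the modified regret
(3.6)  $D_n(\boldsymbol\theta, \boldsymbol\varphi) = R_n(\boldsymbol\theta, \boldsymbol\varphi) - R(G_n)$
as a measure of goodness for compound procedures.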

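Swain (1965) considered the following extended version of
$R(G_n)$.
Let $k \ge 1$ be an integer. Let $\boldsymbol\theta \in \Omega^\infty$ and $G_n^k$ be the
k-th order empirical distribution of the first n $\theta$'s, which puts
equal mass $1/(n-k+1)$ on each of the k-vectors:
$$\boldsymbol\theta_k^k = (\theta_1, \ldots, \theta_k),\quad \boldsymbol\theta_{k+1}^k = (\theta_2, \ldots, \theta_{k+1}),\quad \ldots,\quad \boldsymbol\theta_i^k = (\theta_{i-k+1}, \ldots, \theta_i),\quad \ldots,\quad \boldsymbol\theta_n^k = (\theta_{n-k+1}, \ldots, \theta_n).$$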

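Correspondingly, an extension of a simple symmetric procedure is
a k-simple symmetric procedure $\boldsymbol\varphi^k$ for which $\varphi_i^k(\cdot, C)$ is $\mathbf{x}_i^k$-measurable for each $C \in \mathcal{C}$, $\varphi_i^k(\mathbf{x}_i^k, \cdot)$ is a p-measure on $\mathcal{C}$, and
all $\varphi_i^k$ are identical to some $\varphi^k$. The risk of any k-simple
symmetric procedure against $\boldsymbol\theta \in \Omega^\infty$ in the first n problems,
not counting the first k-1, is given by
(3.7)  $R_n^k(\boldsymbol\theta, \boldsymbol\varphi^k) = (n-k+1)^{-1}\sum_{i=k}^n R^k(\boldsymbol\theta_i^k, \varphi^k) = R^k(G_n^k, \varphi^k)$,
where
(3.8)  $R^k(\boldsymbol\theta_i^k, \varphi^k) = \iint L(\theta_i, A)\,\varphi^k(\mathbf{x}_i^k, dA)\,\mathbf{P}_i^k(d\mathbf{x}_i^k)$,
$\mathbf{P}_i^k = \prod_{j=i-k+1}^i P_{\theta_j}$ and $R^k(G_n^k, \varphi^k) = \int R^k(\boldsymbol\theta^k, \varphi^k)\,dG_n^k(\boldsymbol\theta^k)$.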

It follows from (3.7) that for any k-simple symmetric procedure
$\boldsymbol\varphi^k = (\varphi^k, \varphi^k, \ldots)$,
(3.9)  $R_n^k(\boldsymbol\theta, \boldsymbol\varphi^k) = R^k(G_n^k, \varphi^k) \ge R^k(G_n^k)$,
where
(3.10)  $R^k(G_n^k) = \inf_{\varphi^k} R^k(G_n^k, \varphi^k)$.
Swain (1965) used the k-th order Bayes envelopes $R^k(\cdot)$
in (3.10), or effectively
(3.11)  $D_n^k(\boldsymbol\theta, \boldsymbol\varphi) = R_n^k(\boldsymbol\theta, \boldsymbol\varphi) - R^k(G_n^k)$,
as standards in defining goodness of compound procedures $\boldsymbol\varphi$, and
called the resulting problem the extended compound decision problem.
Gilliland and Hannan (1969), in an improvement of a result
of Swain, showed that for each $1 \le k \le n$ and $\boldsymbol\theta$,
(3.12)  $(n-k)\,R^{k+1}(G_n^{k+1}) \le (n-k+1)\,R^k(G_n^k)$.
In special cases, $\lim_n\{R^{k+1}(G_n^{k+1}) - R^k(G_n^k)\} < 0$, so that $R^{k+1}$
is truly asymptotically more stringent than $R^k$.
Swain exhibited procedures, for the discrete exponential
and the Normal families, that attained regret convergence rates
no worse than $O(n^{-1/2}\log n)$ and $O(1)$, respectively. Gilliland
(1968) considered the unextended (k = 1) versions of these problems
and was able to exhibit procedures that possessed regret convergence
rates no worse than $O(n^{-1/2})$ and $O(n^{-1/5})$ for the discrete
exponential and the Normal families, respectively.

It is the purpose of the remainder of this thesis to reinstate the k in Gilliland's results and to show that the same
improved rates of $O(n^{-1/2})$ and $O(n^{-1/(k+4)})$ hold. In the course
of doing so, several of Gilliland's lemmas and theorems will be
extended and, in some cases, strengthened.

3.2 A Bound for the Modified Regret $D_n^k$.

It is well known that under squared error loss, the posterior
mean is Bayes. With respect to $G_n^k$, a version of the posterior mean
of the k-th component of $\boldsymbol\theta^k$ is given by
(3.13)  $\psi_n^k(\mathbf{y}) = [\rho_n > 0]\sum_{j=k}^n \theta_j\pi_j/\rho_n$,
where $p_j = p_{\theta_j}$, $\pi_j = \prod_{c=1}^k p_{j-k+c}(y_c)$ and $\rho_n = \sum_{j=k}^n \pi_j$.

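Under squared error loss, a non-randomized estimate $\boldsymbol\varphi$
has a modified regret
$$D_n^k(\boldsymbol\theta, \boldsymbol\varphi) = (n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i(\varphi_i - \theta_i)^2 - R^k(G_n^k),$$
where $\mathbf{P}_i = \prod_{j=1}^i P_{\theta_j}$. Thus, by Theorem 2 of Gilliland and Hannan
(1969) (i.e.,
$$\sum_{i=k}^n R^k(\boldsymbol\theta_i^k, \psi_i^k) \le (n-k+1)\,R^k(G_n^k) \le \sum_{i=k}^n R^k(\boldsymbol\theta_i^k, \psi_{i-1}^k),$$
where $\psi_{k-1}^k$ is arbitrary), one can show that $D_n^k$ is bounded
above and below by
(3.14)  $(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i\big((\varphi_i - \psi_i^k)(\varphi_i + \psi_i^k - 2\theta_i)\big)$
and
(3.15)  $(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i\big((\varphi_i - \psi_i^k)(\varphi_i + \psi_i^k - 2\theta_i)\big) + (n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i\big((\psi_i^k - \psi_{i-1}^k)(\psi_i^k + \psi_{i-1}^k - 2\theta_i)\big)$,
respectively, where the argument of $\psi_i^k$, $\psi_{i-1}^k$ is $\mathbf{x}_i^k$.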

If we assume $\Omega = \mathcal{A} = [-a, a]$, then the bounds (3.14) and
(3.15) yield the following bound on the modified regret:
(3.16)  $|D_n^k(\boldsymbol\theta, \boldsymbol\varphi)| \le 4a(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i\{|\varphi_i - \psi_i^k| + |\Delta_i|\}$,
where $\Delta_i = \psi_i^k - \psi_{i-1}^k$ for $i \ge k$. Let us show the following
extended version of Theorem 2.1 of Gilliland (1968).

Theorem 3.1. Let $\Omega = [-a, a]$. For each $P_\theta$, let $p_\theta$ be its
Radon-Nikodym derivative with respect to some $\sigma$-finite measure
$\mu$. If $M = \sup\{p_\theta : \theta \in \Omega\}$ is $\mu$-integrable, then
(3.17)  $(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i|\Delta_i| = O(n^{-1}\log n)$ uniformly in $\boldsymbol\theta$.
k

Proof. From the form of $\psi_i^k$ in (3.13), it is easily verified
by simple algebra that
$$|\Delta_i| \le 2a[\rho_{i-1} > 0]\,\pi_i/\rho_i + a[\rho_{i-1} = 0, \rho_i > 0],$$
from which
(3.18)  $\sum_{i=k}^n \mathbf{P}_i|\Delta_i| \le 2a\int\sum_{i=k}^n \frac{(\pi_i/M)^2}{\rho_i/M}\,M\,d\mu^k + a\int\sum_{i=k}^n [\rho_{i-1} = 0, \rho_i > 0](\pi_i/M)\,M\,d\mu^k$,
where $M = \prod_{c=1}^k M(y_c)$. The first term on the rhs of (3.18), according
to Lemma 3.1 below, is bounded by
$$2a\int\Big(\sum_{i=1}^{n-k+1} 1/i\Big)M\,d\mu^k = O(\log n)\int M\,d\mu^k,$$
and the second term is bounded by $a\int M\,d\mu^k$. But since
$\int M\,d\mu^k = (\int M\,d\mu)^k < \infty$, the result follows.

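We state without proof Lemma 2.1 of Gilliland (1968), with the indices shifted to start at k.

Lemma 3.1. For all $0 \le a_i \le 1$, $k \le i \le n$ (terms with a vanishing denominator taken to be 0),
$$S = \sum_{i=k}^n \Big(a_i^2 \Big/ \sum_{j=k}^i a_j\Big) \le \sum_{i=1}^{n-k+1} 1/i.$$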
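Lemma 3.1 is quoted without proof; a random spot check of its re-indexed form $\sum_{i=k}^n a_i^2/\sum_{j=k}^i a_j \le \sum_{m=1}^{n-k+1} 1/m$ is sketched below. Treating terms with a zero partial sum as 0 is an assumption of the sketch, the helper names are hypothetical, and a passing check is evidence rather than proof. The all-ones sequence attains equality.

```python
import random


def lemma_3_1_sides(a):
    """Return (lhs, rhs) of sum_i a_i^2 / (a_k + ... + a_i) <= sum_{m=1}^{N} 1/m.

    `a` lists a_k, ..., a_n (so N = n - k + 1 = len(a)); entries must lie in [0, 1].
    Terms with a zero partial sum are taken to be 0.
    """
    lhs, partial = 0.0, 0.0
    for x in a:
        partial += x
        if partial > 0:
            lhs += x * x / partial
    rhs = sum(1.0 / m for m in range(1, len(a) + 1))
    return lhs, rhs


def check_lemma_3_1(trials: int = 2000, seed: int = 3) -> bool:
    """Random spot check on sequences with entries drawn uniformly from [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.random() for _ in range(rng.randint(1, 50))]
        lhs, rhs = lemma_3_1_sides(a)
        if lhs > rhs + 1e-12:
            return False
    return True
```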
Combining (3.16) and (3.17), we have

Corollary 3.1. If $\Omega = \mathcal{A} = [-a, a]$ and the hypothesis of Theorem
3.1 is satisfied, then
(3.19)  $|D_n^k(\boldsymbol\theta, \boldsymbol\varphi)| \le 4a(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i|\varphi_i - \psi_i^k| + O(n^{-1}\log n)$
uniformly in $\boldsymbol\theta$, for any compound procedure $\boldsymbol\varphi$.

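3.3 Estimation in Discrete Exponential Families Under Squared
Error Loss.

Consider the family of probability measures on the nonnegative integers having densities
(3.20)  $p_\theta(x) = \theta^x h(\theta) g(x)$, $x = 0, 1, 2, \ldots$,
with respect to counting measure $\mu$, where $g > 0$, and let
(A1)  $\Omega = \mathcal{A} = [0, a]$, $0 < a < \infty$.
For this family, since $\theta p_\theta(y) = (g(y)/g(1+y))\,p_\theta(1+y)$, the Bayes estimate in (3.13) takes the form
(3.21)  $\psi_i^k(\mathbf{y}) = [\rho_i > 0](g/\bar g)(\bar\rho_i/\rho_i)$,
where, for each $\mathbf{y} = (y_1, \ldots, y_k)$,
$$g = g(y_k),\quad \bar g = g(1 + y_k),\quad \pi_j = \prod_{c=1}^k p_{\theta_{j-k+c}}(y_c),\quad \rho_i = \sum_{j=k}^i \pi_j.$$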

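In view of (3.21), when a sequence of past observations
is available, a natural estimate for $\psi_i^k(\mathbf{x}_i^k)$ is
(3.22)  $\varphi_i^*(\mathbf{x}_i^k) = \{[S > 0]\,(g/\bar g)(\bar S + v_1)/(S + v_2)\} \wedge a$,  $2k \le i$,
where $\bar f = f(y_1, \ldots, y_{k-1}, 1 + y_k)$ for any f,
$$S = \sum_{j=k}^{i-k}\delta_j,\qquad \delta_j = \delta_j(\mathbf{x}_i^k) = \begin{cases}1 & \text{if } \mathbf{x}_j^k = \mathbf{x}_i^k,\\ 0 & \text{otherwise},\end{cases}\qquad \bar S = \sum_{j=k}^{i-k}\bar\delta_j,$$
and $0 \le v_1, v_2 \le k$. We note that $\varphi_i^*$ depends on the last k
observations $\mathbf{x}_i^k$ taken as a k-vector, and is essentially a ratio
between the number of times the k-vectors $\mathbf{x}_j^k$ equal
$(x_{i-k+1}, \ldots, x_{i-1}, 1 + x_i)$ and the number of times $\mathbf{x}_j^k$ equals $\mathbf{x}_i^k$,
except for the perturbations $v_1$ and $v_2$ in the numerator and
denominator. It will be shown that these perturbations are
negligible by comparing $\varphi_i^*$ to the unattainable procedure
(3.23)  $\varphi_i'(\mathbf{x}_i^k) = \{[S > 0]\,(g/\bar g)(\bar S + \bar S')/(S + S')\} \wedge a$,
where ratios 0/0 are taken to be 0, $S' = \sum_{j=i-k+1}^{i}\delta_j'$, $\delta_j'(\mathbf{x}_i^k) = \delta((X_{j-k+1}, \ldots, X_{i-k}, x_{i-k+1}, \ldots, x_j), \mathbf{x}_i^k)$, with the $X_j$ independently
distributed according to $P_{\theta_j}$ and independent of $\mathbf{x}_j$.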
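To make (3.22) concrete, the sketch below computes $\varphi_i^*$ for the Poisson family, an assumed special case of (3.20) with $g(x) = 1/x!$ and $h(\theta) = e^{-\theta}$, so that $g/\bar g = 1 + x_i$. The function name and argument layout are hypothetical. S and $\bar S$ count matches among the k-vectors $\mathbf{x}_j^k$, $k \le j \le i-k$, as in the definition above.

```python
from typing import Sequence


def phi_star(xs: Sequence[int], i: int, k: int, a: float,
             v1: float = 0.0, v2: float = 0.0) -> float:
    """Sketch of estimate (3.22) for an assumed Poisson family (g(x) = 1/x!).

    xs holds x_1, ..., x_i at positions 0..i-1 (xs[j-1] is x_j).
    For the Poisson family g/g_bar = x_i + 1.
    """
    if i < 2 * k:
        raise ValueError("(3.22) is defined for 2k <= i")
    if not (0 <= v1 <= k and 0 <= v2 <= k):
        raise ValueError("perturbations must satisfy 0 <= v1, v2 <= k")

    def kvec(j: int) -> tuple:
        return tuple(xs[j - k:j])                 # x_j^k = (x_{j-k+1}, ..., x_j)

    target = kvec(i)                              # x_i^k
    shifted = target[:-1] + (target[-1] + 1,)     # (x_{i-k+1}, ..., x_{i-1}, 1 + x_i)
    S = sum(1 for j in range(k, i - k + 1) if kvec(j) == target)
    S_bar = sum(1 for j in range(k, i - k + 1) if kvec(j) == shifted)
    if S == 0:
        return 0.0
    ratio = (target[-1] + 1) * (S_bar + v1) / (S + v2)
    return min(ratio, a)                          # truncation "/\ a"
```

With k = 1 and $v_1 = v_2 = 0$ the sketch reduces to a truncated ratio of the form $(x+1)\bar S/S$, reminiscent of Robbins' empirical Bayes estimator for the Poisson mean.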

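It will also be shown that $\varphi_i'$ possesses a certain rate
of regret convergence. To be more specific, we will show that,
under suitable conditions, with $\tilde E_i$ denoting expectation under the product measure
on $(\mathbf{x}_i, \mathbf{X}_{i-k})$,
$$(n-k+1)^{-1}\sum_{i=k}^n \tilde E_i|\varphi_i' - \psi_i^k| = O(n^{-1/2})\ \text{uniformly in } \boldsymbol\theta$$
(Proposition 3.1) and
$$(n-k+1)^{-1}\sum_{i=k}^n \tilde E_i|\varphi_i^* - \varphi_i'| = O(n^{-1/2})\ \text{uniformly in } \boldsymbol\theta$$
(Proposition 3.2), so that, by the triangle inequality,
$$(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i|\varphi_i^* - \psi_i^k| = O(n^{-1/2})\ \text{uniformly in } \boldsymbol\theta$$
(Theorem 3.2).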

A Useful Result of Bikelis (1966).

Let $Y_i$, $i = 1, 2, \ldots, n$, be a sequence of independent random
variables that possess finite $(2+\delta)$-th moments ($0 < \delta \le 1$). Let
$F_n$ denote the distribution function of the normalized sum
$\sum_{i=1}^n (Y_i - EY_i)/s_n$, where $s_n^2 = \sum_{i=1}^n \operatorname{Var} Y_i$. There exists a
universal constant c such that
$$|F_n(x) - \Phi(x)| \le c\,L_{2+\delta,n}/(1 + |x|^{2+\delta}),$$
where $L_{2+\delta,n}$ is the Liapounov quotient $\sum_{i=1}^n E|Y_i - EY_i|^{2+\delta}/s_n^{2+\delta}$
and $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^x e^{-t^2/2}\,dt$.

The lemma below is an immediate corollary of the Bikelis
theorem. We will use the lemma in bounding the error term in the
Normal approximation.

Lemma 3.2. Let $Y_i$, $i = 1, \ldots, n$, be a sequence of independent
bounded random variables with $|Y_i - EY_i| \le B < \infty$ for each i.
Then
(3.24)  $|F_n(x/s_n) - \Phi(x/s_n)| \le c\,2^{1+\delta}B^{\delta}/(s_n + |x|)^{\delta}$.
Proof. By the Bikelis theorem, we have
$$|F_n(x) - \Phi(x)| \le c\,L_{2+\delta,n}/(1 + |x|^{2+\delta}),$$
where
$$L_{2+\delta,n} \le B^{\delta}/s_n^{\delta}\quad\text{and}\quad 1 + |x|^{2+\delta} \ge (1 + |x|)^{2+\delta}/2^{1+\delta}$$
by the $C_r$-inequality. Hence,
$$|F_n(x/s_n) - \Phi(x/s_n)| \le c\,2^{1+\delta}B^{\delta}s_n^2/(s_n + |x|)^{2+\delta} \le c\,2^{1+\delta}B^{\delta}/(s_n + |x|)^{\delta}.$$
The proof is completed.
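The only step in the proof of Lemma 3.2 beyond the Bikelis bound is the $C_r$-inequality $1 + |x|^{2+\delta} \ge (1+|x|)^{2+\delta}/2^{1+\delta}$. The sketch below spot-checks it on random inputs; the tiny relative tolerance guards against rounding at the equality point $|x| = 1$ and, like the function names, is an assumption of the sketch.

```python
import random


def cr_inequality_holds(x: float, delta: float) -> bool:
    """Check 1 + |x|^(2+delta) >= (1 + |x|)^(2+delta) / 2^(1+delta), up to rounding."""
    r = 2.0 + delta
    rhs = (1.0 + abs(x)) ** r / 2.0 ** (1.0 + delta)
    return 1.0 + abs(x) ** r >= rhs * (1.0 - 1e-12)


def check_cr(trials: int = 10_000, seed: int = 11) -> bool:
    """Random spot check over x in [-50, 50] and delta in (0, 1]."""
    rng = random.Random(seed)
    return all(cr_inequality_holds(rng.uniform(-50, 50), rng.uniform(1e-6, 1.0))
               for _ in range(trials))
```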

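Henceforth until (3.34), we will let $\mathbf{x}_i^k = \mathbf{x}^k$ be fixed
and abbreviate $\psi_i^k(\mathbf{x}_i^k)$ and $\varphi_i'(\mathbf{x}_i^k)$ to $\psi$ and $\varphi'$, respectively.
Let E abbreviate $\tilde E_i$. Since $0 \le \varphi', \psi \le a$ by (A1),
it follows that
(3.25)  $E|\varphi' - \psi| = \int_0^a E[\varphi' - \psi \ge u]\,du + \int_{-a}^0 E[\varphi' - \psi \le u]\,du$.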

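We shall next place bounds on the two integrands by the
use of Lemma 3.2.

For each i and $|u| \le a$, put
$$q = (\bar g/g)(\psi + u),\qquad w = (\bar g/g)\,\rho_i/k,$$
(3.26)  $Y_j = \bar\delta_j - q\,\delta_j$ for $k \le j \le i-k$, and $Y_j = \bar\delta_j' - q\,\delta_j'$ for $i-k < j \le i$.
Since $\sum_{j=k}^i EY_j = \sum_{j=k}^i(\bar\pi_j - q\pi_j) = -kwu$,
(3.27)  $[\varphi' - \psi \ge u] \le [\sum_{j=k}^i Y_j \ge 0] = [\sum_{j=k}^i(Y_j - EY_j) \ge kwu] \le \sum{}'[\sum{}''(Y_{L+dk} - EY_{L+dk}) \ge wu]$,
where $\sum'$ denotes summation over L from 1 to k and $\sum''$ denotes
summation over d for which $k \le L + dk \le i$; the summands of each inner sum $\sum''$ involve disjoint blocks of observations and are therefore independent. For each $1 \le L \le k$
and $i \ge k$, we let
(3.28)  $\tau_L^2 = \sum{}''\operatorname{Var} Y_{L+dk}$,  $\tau^2 = \sum_{j=k}^i \operatorname{Var} Y_j$.
Then, with $c' = 2^{1+\delta}c$, it follows from (3.27), (3.24) and the
fact that $|Y_j - EY_j| \le 1 + q$ that
(3.29)  $E[\varphi' - \psi \ge u] \le \sum{}'\{\Phi(-wu/\tau_L) + c'(1+q)^{\delta}/(wu)^{\delta}\}$,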

where $1 + q$ plays the role of B in Lemma 3.2. We shall next
bound the terms on the rhs of (3.29) by a quantity not involving
the index L. Let $Q = 2a\bar g/g$ and $T^2 = Q(1+Q)$; then, for each
$|u| \le a$, (A1) yields
(3.30a)  $q \le Q$,
(3.30b)  $\bar\pi_j \le Q\pi_j$,
(3.30c)  $\tau_L^2 \le \tau^2 \le \sum_{j=k}^i(\bar\pi_j + q^2\pi_j) \le T^2\rho_i$.
Thus, (3.29) yields
(3.31)  $E[\varphi' - \psi \ge u] \le k\{\Phi(-wu/(T\rho_i^{1/2})) + c'(1+Q)^{\delta}/(wu)^{\delta}\}$.
Upon setting $\delta = 1/2$ and integrating u between 0 and a, we
obtain, via the inequality $\int_0^\infty\Phi(-bt)\,dt \le b^{-1}(2\pi)^{-1/2}$,
(3.32)  $\int_0^a E[\varphi' - \psi \ge u]\,du \le B_1\{T\rho_i^{1/2}/w + (1+Q)^{1/2}/w^{1/2}\}$,
where $B_1$ is a constant independent of i, $\mathbf{x}_i^k$ and $\boldsymbol\theta$. We shall
next bound the second term in the rhs of (3.25).

Since $[\varphi' - \psi \le u] \le [\sum_{j=k}^i -Y_j \ge 0]$, the arguments in (3.27)
through (3.29) hold with $Y_j$ replaced by $-Y_j$; consequently, by
(3.30) and the arguments leading to (3.32), it follows that
(3.33)  $\int_{-a}^0 E[\varphi' - \psi \le u]\,du \le B_2\{T\rho_i^{1/2}/w + (1+Q)^{1/2}/w^{1/2}\}$,
where $B_2$ is a constant independent of i, $\mathbf{x}_i^k$ and $\boldsymbol\theta$.

Combining (3.25), (3.32), (3.33) and (3.30), we have
(3.34)  $E|\varphi' - \psi| \le B_3\{T\rho_i^{1/2}/w + (1+Q)^{1/2}/w^{1/2}\} \le B_4[(1 + g/\bar g)/\rho_i]^{1/2}$.

Before we prove the next proposition, we quote Lemma 3.1
of Gilliland (1968), i.e.,
(3.35)  $\sum_{i=k}^n a_i\Big(\sum_{j=k}^i a_j\Big)^{-1/2} \le 2\Big(\sum_{j=k}^n a_j\Big)^{1/2}$, for all $a_i \ge 0$, $k \le i \le n$.
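Inequality (3.35) follows by telescoping, since each term satisfies $a_i(\sum_{j\le i}a_j)^{-1/2} \le 2\big((\sum_{j\le i}a_j)^{1/2} - (\sum_{j<i}a_j)^{1/2}\big)$. The sketch below checks the telescoped bound numerically; letting zero partial sums contribute 0 is an assumed convention, and the helper names are hypothetical.

```python
import math
import random


def sqrt_partial_sum_sides(a):
    """Sides of (3.35): sum_i a_i (a_k+...+a_i)^{-1/2} <= 2 (a_k+...+a_n)^{1/2}."""
    lhs, partial = 0.0, 0.0
    for x in a:
        partial += x
        if partial > 0:          # terms with a zero partial sum contribute 0
            lhs += x / math.sqrt(partial)
    return lhs, 2.0 * math.sqrt(partial)


def check_3_35(trials: int = 2000, seed: int = 5) -> bool:
    """Random nonnegative sequences, including some exact zeros."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.expovariate(1.0) * rng.choice([0, 1, 1]) for _ in range(rng.randint(1, 60))]
        lhs, rhs = sqrt_partial_sum_sides(a)
        if lhs > rhs * (1 + 1e-12) + 1e-12:
            return False
    return True
```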

Let $p_a$ denote $p_{\theta = a}$.

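Proposition 3.1. Under the assumptions
(A1)  $\mathcal{A} = \Omega = [0, a]$, $0 < a < \infty$,
(A2)  $\sum_x p_a^{1/2} < \infty$,
and
(A3)  $\sum_x[(g/\bar g)\,p_a]^{1/2} < \infty$,
(3.36)  $(n-k+1)^{-1}\sum_{i=k}^n \tilde E_i|\varphi_i' - \psi_i^k| = O(n^{-1/2})$ uniformly in $\boldsymbol\theta$.
Proof. Let $M = \sup\{p_\theta : \theta \in \Omega\}$. Since $h(\theta)$ is a decreasing
function, it follows from (A1) that
(3.37)  $M \le p_a(x)\,h(0)/h(a)$, and M is $\mu$-integrable.
From (3.34) and (3.35), we have
$$\sum_{i=k}^n \tilde E_i|\varphi_i' - \psi_i^k| \le 2B_4\sum_{\mathbf{y}}[(1 + g/\bar g)\rho_n]^{1/2}$$
(3.38)  $\le 2B_4\sum_{\mathbf{y}}[(1 + g/\bar g)\rho_n/M]^{1/2}M^{1/2}$,
where $M = \prod_{L=1}^k M(y_L)$ is bounded by $\prod_{L=1}^k p_a(y_L)\,[h(0)/h(a)]^k$ via
(3.37), and $\rho_n/M$ is bounded trivially by $n-k+1$. Hence,
$$\sum_{i=k}^n \tilde E_i|\varphi_i' - \psi_i^k| \le B_5(n-k+1)^{1/2}\sum_{\mathbf{y}}(1 + g/\bar g)^{1/2}\Big(\prod_{L=1}^k p_a(y_L)\Big)^{1/2} = B_5(n-k+1)^{1/2}\Big[\sum_x p_a^{1/2}(x)\Big]^{k-1}\sum_x[(1 + g/\bar g)\,p_a]^{1/2}.$$
The result follows from (A2) and (A3).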

Remark. Proposition 3.2 of Gilliland (1968) proved (3.36) under
the stronger assumption (A1+), together with (A2) and (A3), with k = 1. The Bikelis bound
on the error term in the Normal approximation enabled us to weaken
the assumption (A1+) to (A1). Gilliland used the Berry-Esseen
bound to prove his Proposition 3.2.

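Bounds for $D_n^k(\boldsymbol\theta, \boldsymbol\varphi^*)$.

Proposition 3.1 shows that $\boldsymbol\varphi'$ and $\boldsymbol\psi$ are not more than
$O(n^{-1/2})$ apart in a Cesàro sense. In view of (3.19), it remains
to be shown that $\boldsymbol\varphi^*$ and $\boldsymbol\varphi'$ are close in order to show that
$\boldsymbol\varphi^*$ and $\boldsymbol\psi$ are not far apart.

Henceforth until (3.41), let $\mathbf{x}_i^k = \mathbf{x}^k$ be fixed and abbreviate
$\varphi_i^*(\mathbf{x}_i^k)$ and $\varphi_i'(\mathbf{x}_i^k)$ by $\varphi^*$ and $\varphi'$, respectively.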

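Lemma 3.3. $|\varphi^* - \varphi'| \le k(a + g/\bar g)[S > 0]/S$.

Proof. Let $I = [(g/\bar g)(\bar S/(k + S)) < a]$. Both $\varphi^*$ and $\varphi'$ are truncations at a of numbers lying between $(g/\bar g)\bar S/(k+S)$ and $(g/\bar g)(\bar S + k)/S$. Hence, on $[S > 0]I$,
$|\varphi^* - \varphi'| \le (g/\bar g)|(\bar S + k)/S - \bar S/(k + S)| \le k(a + g/\bar g)/S$. Since
$|\varphi^* - \varphi'| = 0$ on $\{[S > 0]I\}^c$, the result follows.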
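Lemma 3.3 only uses the fact that $\varphi^*$ and $\varphi'$ are both truncations at a of ratios $(g/\bar g)(\bar S + x)/(S + y)$ with $0 \le x, y \le k$. The randomized check below confirms the stated bound on such pairs; the helper names are hypothetical, and `r` stands in for $g/\bar g$.

```python
import random


def truncated_ratio(r: float, S: int, S_bar: int, x: float, y: float, a: float) -> float:
    """{[S > 0] r (S_bar + x)/(S + y)} /\\ a, the common form of (3.22) and (3.23)."""
    return 0.0 if S == 0 else min(r * (S_bar + x) / (S + y), a)


def check_lemma_3_3(trials: int = 100_000, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        k = rng.randint(1, 5)
        a = rng.uniform(0.1, 5.0)
        r = rng.uniform(0.01, 10.0)               # plays the role of g/g_bar
        S, S_bar = rng.randint(0, 30), rng.randint(0, 30)
        # attainable: arbitrary perturbations v1, v2 in [0, k]
        phi_s = truncated_ratio(r, S, S_bar, rng.uniform(0, k), rng.uniform(0, k), a)
        # unattainable: integer counts S_bar', S' in {0, ..., k}
        phi_p = truncated_ratio(r, S, S_bar, rng.randint(0, k), rng.randint(0, k), a)
        bound = 0.0 if S == 0 else k * (a + r) / S
        if abs(phi_s - phi_p) > bound + 1e-12:
            return False
    return True
```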

Remark. Lemma 3.3 is an analogue of (3.28) of Gilliland (1968).
The truncation of $\varphi'$ in (3.23) results in the better bound in
Lemma 3.3.

Lemma 3.4.
(3.39)  $\tilde E_{i-k}([S > 0]/S) < (k+2)/\rho_i$.

Proof. If $S > 0$, then the inequality $S + k + 1 \le S(k+2)$ implies
$[S > 0]/S \le (k+2)/(S + k + 1) \le (k+2)/(S + \sum_{j=i-k+1}^i\delta_j' + 1)$. By the
convexity of $1/(1+z)$, Hoeffding's Theorem 3 (1956) applies to
yield, with $N = i - k + 1$ summands and $p = \rho_i/N$,
$$\tilde E_{i-k}\Big(1\Big/\Big(S + \sum_{j=i-k+1}^i\delta_j' + 1\Big)\Big) \le \sum_{j=0}^N\binom{N}{j}p^j(1-p)^{N-j}/(1+j).$$
The rhs of the last inequality equals
$$\sum_{j=0}^N\binom{N+1}{1+j}p^{1+j}(1-p)^{N-j}\Big/((N+1)p) = (1 - (1-p)^{N+1})/((N+1)p) \le 1/((N+1)p).$$
Since $(N+1)p = (N+1)\rho_i/N > \rho_i$, the result follows.

Lemma 3.4 with k specialized to 1 improves upon Lemma
3.3 of Gilliland (1968).
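The closed form in the proof of Lemma 3.4 is the reciprocal binomial moment identity $\sum_{j=0}^N\binom{N}{j}p^j(1-p)^{N-j}/(1+j) = (1-(1-p)^{N+1})/((N+1)p)$. The sketch below verifies it in exact rational arithmetic for small N, together with the bound $1/((N+1)p)$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_reciprocal_mean(N: int, p: Fraction) -> Fraction:
    """E[1/(1+B)] for B ~ Binomial(N, p), computed term by term."""
    return sum(Fraction(comb(N, j)) * p**j * (1 - p)**(N - j) / (1 + j)
               for j in range(N + 1))


def closed_form(N: int, p: Fraction) -> Fraction:
    """(1 - (1-p)^{N+1}) / ((N+1) p), valid for 0 < p <= 1."""
    return (1 - (1 - p)**(N + 1)) / ((N + 1) * p)
```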

The next lemma is suggested by the proof of Lemma 3.5 of
Gilliland (1968).

Lemma 3.5. Under (A1),
(3.40)  $\sum_{i=k}^n \pi_i\,\tilde E_{i-k}([S > 0]/S) < b(n-k+1)^{1/2}\Big(\prod_{L=1}^k p_a(x_{i-k+L})\Big)^{1/2}$,
where $b = 2(k+2)^{1/2}(h(0)/h(a))^{k/2}$.

Proof. Since $[S > 0]/S \le 1$, $[S > 0]/S \le ([S > 0]/S)^{1/2}$. Consequently, Jensen's inequality applies to give
$$\tilde E_{i-k}([S > 0]/S) \le (\tilde E_{i-k}[S > 0]/S)^{1/2} < ((k+2)/\rho_i)^{1/2},$$
where the last inequality follows from (3.39). Thus, by (3.35),
(3.41)  $\sum_{i=k}^n \pi_i\,\tilde E_{i-k}([S > 0]/S) < 2(k+2)^{1/2}\rho_n^{1/2}$.
Under (A1), (3.37) holds. Hence it follows from (3.41) that
$$\sum_{i=k}^n \pi_i\,\tilde E_{i-k}([S > 0]/S) < 2(k+2)^{1/2}(h(0)/h(a))^{k/2}(\rho_n/M)^{1/2}\Big(\prod_{L=1}^k p_a(x_{i-k+L})\Big)^{1/2} \le b(n-k+1)^{1/2}\Big(\prod_{L=1}^k p_a(x_{i-k+L})\Big)^{1/2}.$$
The proof is completed.

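Proposition 3.2. If the family of distributions satisfies the
assumptions
(A1)  $\Omega = \mathcal{A} = [0, a]$, $0 < a < \infty$,
(A2)  $\sum_x p_a^{1/2} < \infty$,
and
(A3')  $\sum_x (g/\bar g)\,p_a^{1/2} < \infty$,
then
(3.42)  $(n-k+1)^{-1}\sum_{i=k}^n \tilde E_i|\varphi_i^* - \varphi_i'| = O(n^{-1/2})$ uniformly in $\boldsymbol\theta$.
Proof. From Lemma 3.3,
(3.43)  $\sum_{i=k}^n \tilde E_i|\varphi_i^* - \varphi_i'| \le k\sum_{i=k}^n \tilde E_i((a + g/\bar g)[S > 0]/S)$.
Via the equality $\tilde E_i((a + g/\bar g)[S > 0]/S) = \sum_{\mathbf{y}}(a + g/\bar g)\,\pi_i\,\tilde E_{i-k}([S > 0]/S)$,
(3.43) and (3.40) yield
(3.44)  $\sum_{i=k}^n \tilde E_i|\varphi_i^* - \varphi_i'| < bk(n-k+1)^{1/2}\sum_{\mathbf{y}}(a + g/\bar g)\Big(\prod_{L=1}^k p_a(y_L)\Big)^{1/2}$.
Since
$$\sum_{\mathbf{y}}(a + g/\bar g)\Big(\prod_{L=1}^k p_a(y_L)\Big)^{1/2} = \Big(\sum_x p_a^{1/2}\Big)^{k-1}\sum_x(a + g/\bar g)\,p_a^{1/2},$$
the proof is completed by (A2) and (A3').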

Theorem 3.2. Under (A1), (A2) and (A3'),
(3.45)  $|D_n^k(\boldsymbol\theta, \boldsymbol\varphi^*)| = O(n^{-1/2})$ uniformly in $\boldsymbol\theta$.
Proof. Under (A1), (A2), (A3') and (A3), Corollary 3.1 together
with (3.36) and (3.42) implies (3.45). Since (A2) and (A3') imply
(A3) via the Cauchy-Schwarz inequality
$$\sum_x((g/\bar g)\,p_a)^{1/2} \le \Big(\sum_x(g/\bar g)\,p_a^{1/2}\Big)^{1/2}\Big(\sum_x p_a^{1/2}\Big)^{1/2},$$
the result follows.

Remark. Theorem 3.5 of Gilliland (1968) proved (3.45) under the
stronger assumption (A2+) together with (A1) and (A3'). The procedure $\boldsymbol\varphi^*$ in (3.45) extends and includes the procedures $\boldsymbol\varphi^*$ and $\boldsymbol\varphi^{**}$
in Gilliland. For examples of distributions satisfying these
assumptions, see Gilliland (1968).

CHAPTER 4

SQUARED ERROR LOSS ESTIMATION IN THE NORMAL FAMILY

4.1 Introduction.
Consider the Normal $(\theta, 1)$ family
(4.1)  $p_\theta(x) = (2\pi)^{-1/2}e^{-(x-\theta)^2/2}$, $-\infty < x < \infty$,
with $|\theta| \le a$. Since $\partial\pi_j/\partial y_k = (\theta_j - y_k)\pi_j$, the Bayes estimate in (3.13) takes the form
(4.2)  $\psi_n^k(\mathbf{y}) = y + u_n$, for each $\mathbf{y} = (y_1, \ldots, y_k) \in R^k$,
where $y = y_k$ and $u_n = \frac{\partial}{\partial y}\log\big(\sum_{j=k}^n \pi_j\big)$. In view of (3.19), let us
consider estimating $\psi_n^k$. The method of estimation is contained
in §1.2 of Susarla (1970).
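Formula (4.2) can be checked against the defining ratio (3.13): the sketch below compares $\sum_j\theta_j\pi_j/\sum_j\pi_j$ with $y_k + \partial_{y_k}\log\sum_j\pi_j$, the derivative taken by a central difference. The parameter values, step size h and function names are assumptions made for the illustration.

```python
import math


def normal_pi(theta_window, y):
    """pi_j(y) = prod_c p_{theta_{j-k+c}}(y_c) for the N(theta, 1) density (4.1)."""
    return math.prod(math.exp(-(yc - t) ** 2 / 2) / math.sqrt(2 * math.pi)
                     for yc, t in zip(y, theta_window))


def psi_direct(thetas, k, y):
    """Posterior mean (3.13) of the k-th coordinate: sum theta_j pi_j / sum pi_j."""
    windows = [thetas[j - k:j] for j in range(k, len(thetas) + 1)]
    weights = [normal_pi(w, y) for w in windows]
    return sum(w[-1] * p for w, p in zip(windows, weights)) / sum(weights)


def psi_via_log_derivative(thetas, k, y, h=1e-5):
    """y_k + d/dy_k log(sum_j pi_j), by central difference: the form (4.2)."""
    windows = [thetas[j - k:j] for j in range(k, len(thetas) + 1)]

    def log_rho(yk):
        yy = list(y[:-1]) + [yk]
        return math.log(sum(normal_pi(w, yy) for w in windows))

    return y[-1] + (log_rho(y[-1] + h) - log_rho(y[-1] - h)) / (2 * h)
```

Here `thetas[j - k:j]` holds $(\theta_{j-k+1}, \ldots, \theta_j)$, so the last entry of each window is $\theta_j$.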

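Let $\mathbf{P}^k$ denote the product measure on $R^k$ and
$\bar{\mathbf{P}} = (n-k+1)^{-1}\sum_{j=k}^n \mathbf{P}_j^k$. For each $\mathbf{x}$ in $R^k$, let
$\square = \times_{L=1}^k I_L$, where $I_L = [x_L, x_L + \varepsilon]$ for $L = 1, \ldots, k$,
and
$\square_k = \times_{L=1}^k I_L'$, where $I_L' = I_L$ for $L \ne k$ and $I_k' = [x_k + \eta, x_k + \eta + \varepsilon]$,
and
$\square_k' = \times_{L=1}^k I_L''$, where $I_L'' = I_L$ for $L \ne k$ and $I_k'' = [x_k, x_k + \eta + \varepsilon]$.

For any distribution F on $R^k$ let $t(F)(\mathbf{x})$ denote the
function $\eta^{-1}\log(F\square_k/F\square)$, where $F\square$ and $F\square_k$ represent the
measures of $\square$ and $\square_k$ under F, and undefined ratios are taken
to be 1. We abbreviate $t(\bar{\mathbf{P}})(\mathbf{x}) = \eta^{-1}\log(\bar{\mathbf{P}}\square_k/\bar{\mathbf{P}}\square)$ by $t(\mathbf{x})$.

Let $Q^*$ be the k-th order empiric distribution of $X_1, \ldots, X_n$
and abbreviate $t(Q^*)(\mathbf{x}) = \eta^{-1}\log(Q^*\square_k/Q^*\square)$ by $t^*(\mathbf{x})$.

Let X abbreviate $X_{n+k}$, $\mathbf{X}$ abbreviate $\mathbf{X}_{n+k}^k$, and
(4.3)  $\varphi_{n+k}^* = \operatorname{tr}(X + t^*(\mathbf{X}))$,  $\varphi_{n+k}^{**} = \operatorname{tr}'(X + t^*(\mathbf{X}))$,
where tr and tr' stand for retraction to the intervals
$[-(a + \eta + \varepsilon), a + \eta + \varepsilon]$ and $[-a, a]$, respectively.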
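A minimal sketch of the estimates in (4.3) follows. It counts the k-vectors $(X_{j-k+1}, \ldots, X_j)$ falling in $\square$ and $\square_k$ (the normalization of $Q^*$ cancels in the ratio), forms $t^*$, and retracts. Two conventions are assumptions of the sketch rather than statements from the text: a zero count in $\square$ gives ratio 1 (so $t^* = 0$), while a zero count in $\square_k$ with a nonzero count in $\square$ gives $t^* = -\infty$, which the retraction pins to the lower endpoint. Function names are hypothetical.

```python
import math


def empirical_box_count(xs, k, corner, eps, shift=0.0):
    """Number of k-vectors (X_{j-k+1}, ..., X_j), j = k..n, lying in the box
    prod_L [corner_L, corner_L + eps], with the last side moved right by `shift`."""
    lows = list(corner[:-1]) + [corner[-1] + shift]
    count = 0
    for j in range(k, len(xs) + 1):
        v = xs[j - k:j]
        if all(lo <= vl <= lo + eps for vl, lo in zip(v, lows)):
            count += 1
    return count


def phi_star_normal(xs, x_new, a, eta, eps):
    """Return (phi*, phi**) of (4.3) for past data xs = (X_1..X_n) and the k-vector x_new."""
    k = len(x_new)
    base = empirical_box_count(xs, k, x_new, eps)               # n-k+1 times Q* of the box
    shifted = empirical_box_count(xs, k, x_new, eps, shift=eta)  # same for the shifted box
    if base == 0:
        t_star = 0.0            # undefined ratio taken to be 1 (assumed convention)
    elif shifted == 0:
        t_star = -math.inf      # log 0; the retractions pin the value to the lower end
    else:
        t_star = math.log(shifted / base) / eta
    raw = x_new[-1] + t_star
    bound = a + eta + eps
    return min(max(raw, -bound), bound), min(max(raw, -a), a)
```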

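With $\psi$ abbreviating $\psi_n^k$ and suppressing the subscripts
in $\varphi_{n+k}^*$ and $\varphi_{n+k}^{**}$, we have $|\psi| \le a$, $\psi = \operatorname{tr}'\psi$, and therefore, since $\operatorname{tr}'$ is a contraction with $\operatorname{tr}'\circ\operatorname{tr} = \operatorname{tr}'$,
$|\varphi^{**} - \psi| \le |\varphi^* - \psi|$. Consequently, by the triangle
inequality,
(4.4)  $\mathbf{P}_{n+k}|\varphi^{**} - \psi| \le \mathbf{P}_{n+k}|\varphi^* - (X + t)| + \mathbf{P}_{n+k}|X + t - \psi|$.

We state without proof Lemma 3 of Susarla with $\sigma^2 = 1$
and $F = \bar{\mathbf{P}}$.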

Lemma 4.1 (Susarla). For each $\mathbf{x}$ in $R^k$,

(1)  $x + t(\mathbf{x}) \in [-a - \eta - \varepsilon, a]$,
(2) 6 Bk 2 F J exp(- manna“ + W)} .

<3) Eogsc'iokﬂfﬁexpmnxlx \ +a+n+e>3

where $x = x_k$, $\bar\pi = \sum_j \pi_j/(n-k+1)$ and $\|\mathbf{x}\| = \sum_{L=1}^k |x_L|$.

4.2 Bounding $\mathbf{P}_{n+k}|\varphi^* - (X + t)|$.
Fix $\mathbf{X} = \mathbf{x}$ until (4.10). Since $x + t$, by (1) of Lemma
4.1, is in $[-a - \eta - \varepsilon, a]$, it follows from the definition of
$\varphi^*$ that $|\varphi^* - (x + t)|$ is bounded by the quantity
$a' = 2a + 2\eta + 2\varepsilon$, and at the same time bounded by $|t^* - t|$.
Therefore, for each $\mathbf{x}$ in $R^k$,
(4.5)  $\mathbf{P}_n|\varphi^* - (x + t)| \le \int_0^{a'} A\,du + \int_{-a'}^0 B\,du$,
where $A = \mathbf{P}_n[t^* - t > u]$ and $B = \mathbf{P}_n[t^* - t < u]$. We shall
first bound A and B by the Bikelis theorem.

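Put
(4.6)  $\bar\delta_i = [\mathbf{X}_i^k \in \square_k]$, $\delta_i = [\mathbf{X}_i^k \in \square]$, $Y_i(u) = \bar\delta_i - \delta_i e^{\eta(t+u)}$ for $|u| \le a'$, $k \le i$, and $\tau^2 = \sum_{i=k}^n \operatorname{Var} Y_i$.

Let $w = (n-k+1)\,\bar{\mathbf{P}}\square_k\,\eta/k$, $R = e^{\eta(|x| + a + a')}$, $\sum$ denote
summation over i from k to n, $\sum'$ denote summation over L
from 1 to k, and $\sum''$ denote summation over d for which
$k \le L + dk \le n$, for each L.

Lemma 4.2. For some constant $c_1$,
$$A \le k\,\Phi(-wu/\tau) + c_1R^{1/2}/|wu|^{1/2},\quad 0 \le u,$$
and, for $\eta a' \le 1$,
$$B \le k\,\Phi(wu/2\tau) + c_1R^{1/2}/|wu/2|^{1/2},\quad -a' \le u \le 0.$$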

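Proof. Note that $A = \mathbf{P}_n[\sum Y_i > 0] \le \sum'\mathbf{P}_n[\sum''(Y_{L+dk} - \mathbf{P}_nY_{L+dk}) \ge -\sum\mathbf{P}_nY_i/k]$ and, similarly, $B = \mathbf{P}_n[\sum(-Y_i) > 0] \le \sum'\mathbf{P}_n[\sum''(-Y_{L+dk} + \mathbf{P}_nY_{L+dk}) \ge \sum\mathbf{P}_nY_i/k]$. By (4.6),
(4.7)  $\sum\mathbf{P}_nY_i = (n-k+1)\,\bar{\mathbf{P}}\square_k\,(1 - e^{\eta u})$.
For $0 \le u$, $1 - e^{\eta u} \le -\eta u$. For $-a' \le u \le 0$, $\eta a' \le 1$ implies
$1 - e^{\eta u} \ge -\eta u/2$. Thus, by (4.7),
(4.8)  $A \le \sum'\mathbf{P}_n[\sum''(Y_{L+dk} - \mathbf{P}_nY_{L+dk}) \ge wu]$, $0 \le u$,
$B \le \sum'\mathbf{P}_n[\sum''(-Y_{L+dk} + \mathbf{P}_nY_{L+dk}) \ge -wu/2]$, $-a' \le u \le 0$.
Since $|Y_i - \mathbf{P}_nY_i| \le 2R$, we have, by Lemma 3.2 with $\delta = 1/2$,
(4.9)  $A \le \sum'\{\Phi(-wu/\tau_L) + cR^{1/2}/|wu|^{1/2}\}$, $0 \le u$,
$B \le \sum'\{\Phi(wu/2\tau_L) + cR^{1/2}/|wu/2|^{1/2}\}$, $-a' \le u \le 0$,
where $\tau_L^2 = \operatorname{Var}\sum''Y_{L+dk}$. The proof is completed by the bound
$\tau_L^2 = \sum''\operatorname{Var}Y_{L+dk} \le \sum\operatorname{Var}Y_i = \tau^2$.

We note that
(4.10)  $\tau^2 \le \sum\mathbf{P}_nY_i^2 \le (n-k+1)R^2\,\bar{\mathbf{P}}\square_k$.

With (4.10), we prove an analogue of Lemma 4 of Susarla.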

Lemma 4.3. For $0 < \varepsilon \le \eta \le 1/(6 + 2a)$,

(4.11) gnﬁclf - (x + t)| s 31(n- k+1)'35{(3711;h)% + 9-1—1933 ,
T) 6 Us

where $B_1$ is independent of n and $\boldsymbol\theta$.

Proof. Since $\int_0^\infty\Phi(-bt)\,dt \le (2\pi)^{-1/2}/b$ for $b > 0$, it follows
from (4.5) and Lemma 4.2 that, for $\eta a' \le 1$,
(4.12)  $\mathbf{P}_n|\varphi^* - (x + t)| \le c_2\,\tau/w + c_3\,R^{1/2}/w^{1/2}$.
w

By (4.10) and the definitions of w and R, the above inequality

yields

(4.13) En‘¢* - (X+t)‘ S 32(n'k+1)-%{(2LJ_ k+1) )CHDkR + (“a _]T)%D!5R%}a
“26

J—Q Bk k
where and D = 5—— . By (2) and (3) of Lemma 4.1,

"n+3 Q Elk Q Elk
C 3 exp{ (n+e)(‘x ‘ + a + T3 + (5)} and D s (n)1exp('n+e) (“12“ + 551.3)

Hence, it follows from (4.13), the definition of R and
$0 \le \varepsilon \le \eta \le 1/(6 + 2a)$ that

(4.14) £1333” - (x + t)\ s B3(h-k+1)'%{(32ik+9—1)25 + (—1—k)353 x
n e “e

X exp{(2‘x‘ +-\\§_u)'n}(;)-35 .

To complete the proof, we shall show that the $\mathbf{P}_{n+k}^k$-integral of
the function $g = \exp\{(2|x| + \|\mathbf{x}\|)\eta\}(\bar\pi)^{-1/2}$ is uniformly bounded
in n. Let $c = (2\pi)^{-1/2}$. Since $c^{-1}p_\theta(y) \le \exp\{-[(|y| - a)^+]^2/2\}$
and $c^{-1}p_\theta(y) \ge \exp\{-[|y| + a]^2/2\}$, we have $(\bar\pi)^{-1/2} \le c^{-k/2}\exp[\sum'(|x_L| + a)^2/4]$, and $\pi_{n+k} \le c^k\exp[-\sum'[(|x_L| - a)^+]^2/2]$.
Consequently, recalling that $\eta \le 1$, the $\mathbf{P}_{n+k}^k$-integral of g is exceeded by the constant
$$\int c^{k/2}\exp\Big\{(2|x| + \|\mathbf{x}\|) + \sum{}'(|x_L| + a)^2/4 - \sum{}'[(|x_L| - a)^+]^2/2\Big\}\,d\mathbf{x}.$$
The proof is completed.

We state without proof a special case of Lemma 6 of Susarla.

Lemma 4.4. $|x + t - \psi| \le \eta(1+a)^2 + \varepsilon(1+ka)^2$.

The next lemma, suggested by Professor Gilliland, is an
analogue of Theorem 3.1.

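Lemma 4.5. Consider the Normal $(\theta, 1)$ family in (4.1). For any
$1 \le b$, $b + k \le n$,
(4.15)  $\mathbf{P}_n^k|\psi_n^k - \psi_{n-b}^k| = O(n^{-1})$
uniformly in $\boldsymbol\theta$.

Proof. Let $1 \le k$, $k \le n - b$. Since for each fixed $\mathbf{x}^k$
$$|\psi_n^k - \psi_{n-b}^k| \le 2a\sum_{j=n-b+1}^n \pi_j\Big/\sum_{j=k}^n \pi_j$$
and, by Jensen's inequality, $1/\sum_{j=k}^n \pi_j \le (n-k+1)^{-2}\sum_{j=k}^n \pi_j^{-1}$,
$$|\psi_n^k - \psi_{n-b}^k| \le 2a(n-k+1)^{-2}\sum_{j=n-b+1}^n \pi_j\sum_{i=k}^n \pi_i^{-1}.$$
But for any $\mathbf{x} \in R^k$, we have $\pi_j\pi_i^{-1}(\mathbf{x}) \le e^{ka^2/2}e^{2a\|\mathbf{x}\|}$; therefore,
$$\mathbf{P}_n^k|\psi_n^k - \psi_{n-b}^k| \le 2ab\,e^{ka^2/2}(n-k+1)^{-1}\int e^{2a\|\mathbf{x}\|}\pi_n\,d\mathbf{x}.$$
By the monotone likelihood ratio property of the Normals,
$P_\theta e^{2a|x|} \le 2\int_{-\infty}^{\infty} e^{2ax}p_a(x)\,dx = c(a)$ is a finite constant.
Consequently, $\int e^{2a\|\mathbf{x}\|}\pi_n\,d\mathbf{x} \le c^k(a)$, uniformly in n; therefore,
the result follows.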

With Lemma 4.5, it follows from (3.19), via the triangle
inequality, that for the Normal family in (4.1)
(4.16)  $|D_n^k(\boldsymbol\theta, \boldsymbol\varphi)| \le 4a(n-k+1)^{-1}\sum_{i=k}^n \mathbf{P}_i|\varphi_i - \psi_{i-k}^k| + O(n^{-1}\log n)$,
uniformly in $\boldsymbol\theta$.

Theorem 4.1. With $\varepsilon = n^{-1/(k+4)}$ and $\eta = b\varepsilon$ for $1 < b$,
(4.17)  $\mathbf{P}_{n+k}|\varphi_{n+k}^{**} - \psi_n^k| = O(n^{-1/(k+4)})$
and
(4.18)  $D_n^k(\boldsymbol\theta, \boldsymbol\varphi^{**}) = O(n^{-1/(k+4)})$.

Proof. Lemmas 4.3 and 4.4 imply (4.17), via (4.4). The result
follows from (4.16) and (4.17).

REFERENCES

REFERENCES

Beckenbach, E. and Bellman, R. (1961). An Introduction to
Inequalities. Random House.

Bikelis, A. (1966). Estimates of remainder term in the central
limit theorem. Litovsk. Mat. Sb, 6, 323-346.

 

Feller, William (1962). An Introduction to Probability Theory
and Its Applications, Vol. 1, 2nd ed. John Wiley & Sons.

Ferguson, T.S. (1967). Mathematical Statistics. Academic Press.

Fox, Richard (1968). Contributions to compound decision theory
and empirical squared error loss estimation. RM-214,
Department of Statistics and Probability, Michigan State
University.

Gilliland, Dennis (1966). Approximation to Bayes risk in sequences
of non-finite decision problems. RM-l62, Department of
Statistics and Probability, Michigan State University.

Gilliland, Dennis (1968). Sequential compound estimation. Ann.
Math. Statist. 39, 1890-1904.

Gilliland, D.C. and Hannan, J.F. (1969). On an extended compound
decision problem. Ann. Math. Statist. 40, 1536-1541.

 

Hannan, James F. (1957). Approximation to Bayes risk in repeated
play. Contributions to the Theory of Games, 3, 97- 139.
Ann. Math. Studies No. —39, Princeton University Press.

 

 

Hewitt, E. and Stromberg, K. (1965). Real and Abstract Analysis.
Springer-Verlag New York.

Hoeffding, Wassily (1956). On the distribution of the number of
successes in independent trials. Ann. Math. Statist. 27,
713-7310 ~~

 

Johns, M.V., Jr. (1967). Two-action compound decision problems.
Proceedings 9£_the Fifth Berkeley Symposium on Mathematical
Statistics and Probability, 463-478. University of
California Press.

 

57

58

Johns, M.V., Jr. and Van Ryzin, J. (1967). Convergence rates for
empirical Bayes two-action problems II. Continuous case.
Technical Report No. 132, Department of Statistics, Stanford
University.

Lin, Pi-Erh (1968). Estimation of a multivariate density and its
partial derivatives, with empirical Bayes applications.
Ph.D. Thesis, Columbia University.

Loève, Michel (1963). Probability Theory, 3rd ed. Van Nostrand.

Royden, H.L. (1963). Real Analysis. Macmillan.

Samuel, E. (1965). Sequential compound estimators. Ann. Math.
Statist. 36, 879-889.

Susarla, V. (1970). Rates of convergence in sequence-compound
squared-distance loss estimation and two-action problems.
RM-262, Department of Statistics and Probability, Michigan
State University.

Swain, Donald D. (1965). Bounds and rates of convergence for the
extended compound estimation problem in the sequence case.
Tech. Report No. 81, Department of Statistics, Stanford University.

APPENDIX

SOME KERNEL ESTIMATES OF DENSITIES AND THEIR DERIVATIVES

Estimation of a Lebesgue density $f$ and its derivative $g = f^{(1)}$ will be discussed in Section 1. Estimation of a density $J$ with respect to $d\mu = h\,dx$ and its derivative $J^{(1)}$ will be discussed in Section 2. Estimates for the above quantities are based on the kernel method used by Johns and Van Ryzin (1967).

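We shall first discuss briefly the existence of some of the kernels. Let $r$ be an integer $\ge 2$ and let $K_0$ and $K_1$ be $L^2(0,1)$ functions vanishing off $(0,1)$ with $\int |u^r K_j(u)|\,du = r!\,c_{jr}$, $j = 0, 1$, such that

(A.1)  $\int u^t K_0(u)\,du = \begin{cases} 1 & \text{if } t = 0, \\ 0 & \text{if } 0 < t \le r-1, \end{cases}$

and $uK_1(u)$ satisfies (A.1) with $r$ replaced by $r-1$ (that is, $\int uK_1(u)\,du = 1$ and $\int u^t K_1(u)\,du = 0$ for $2 \le t \le r-1$). For example, $K_0$ and $K_1$ can be the first two elements of the dual basis for the subspace of $L^2(0,1)$ with basis $\{1, u, \ldots, u^{r-1}\}$.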
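The dual-basis example can be made concrete. The sketch below is an illustration, not part of the original appendix, and it assumes $K_0$ and $K_1$ are taken to be polynomials of degree $r-1$ on $(0,1)$: the Gram matrix of $\{1, u, \ldots, u^{r-1}\}$ in $L^2(0,1)$ is the Hilbert matrix $H_{ij} = 1/(i+j+1)$, so the coefficient vectors of the dual basis are the rows of $H^{-1}$. The function names `dual_basis_kernels`, `moment` and `c_const` are hypothetical, and $c_{jr}$ is only approximated by a midpoint rule.

```python
import numpy as np
from math import factorial
from numpy.polynomial import polynomial as P


def dual_basis_kernels(r):
    """Return coefficient vectors (constant term first) of K_0 and K_1.

    Row i of the inverse Hilbert matrix gives the polynomial K_i with
    int_0^1 u^t K_i(u) du = 1 if t == i else 0, for t = 0, ..., r-1.
    """
    idx = np.arange(r)
    gram = 1.0 / (idx[:, None] + idx[None, :] + 1)  # Hilbert matrix
    coef = np.linalg.inv(gram)
    return coef[0], coef[1]


def moment(coef, t):
    """Exact int_0^1 u^t K(u) du for the polynomial K with these coefficients."""
    j = np.arange(len(coef))
    return float(np.sum(coef / (t + j + 1)))


def c_const(coef, r, n_grid=200_000):
    """Midpoint-rule approximation of c_{jr} = (1/r!) int_0^1 |u^r K_j(u)| du."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(np.abs(u**r * P.polyval(u, coef)))) / factorial(r)


r = 3
k0, k1 = dual_basis_kernels(r)
print("K_0 coefficients:", k0)
print("K_1 coefficients:", k1)
print("moments of K_0, t = 0..r-1:", [moment(k0, t) for t in range(r)])
print("moments of u K_1, t = 0..r-2:", [moment(k1, t + 1) for t in range(r - 1)])
print("c_0r, c_1r:", c_const(k0, r), c_const(k1, r))
```

For $r = 2$ this construction gives $K_0(u) = 4 - 6u$, $K_1(u) = 12u - 6$ and $c_{02} = 59/324$. Polynomial kernels are only one choice; any $L^2(0,1)$ functions with the stated moments would do.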

As the intended result of these conditions on $K_0$ and $K_1$, if $S$ has its $r$th derivative bounded by $M$ on $(0,1)$, then substitution of the $r$th order Taylor expansion with Lagrange's remainder shows

(A.2)  $\left|\int S K_0\,du - S(0)\right| \le M c_{0r}$

and, if in addition $S(0) = 0$,

(A.3)  $\left|\int S K_1\,du - S^{(1)}(0)\right| \le M c_{1r}$.
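The Taylor step behind (A.2) and (A.3) can be written out as follows. This is a sketch filling in the substitution described in the text, under its assumption that $|S^{(r)}| \le M$ on $(0,1)$.

```latex
For $0 < u < 1$, Taylor's formula with Lagrange's remainder gives some $\xi_u \in (0, u)$ with
\[
S(u) = \sum_{t=0}^{r-1} \frac{S^{(t)}(0)}{t!}\,u^t + \frac{S^{(r)}(\xi_u)}{r!}\,u^r .
\]
Multiplying by $K_0$ and integrating, (A.1) removes every term with $0 < t \le r-1$, so
\[
\Bigl|\int S K_0\,du - S(0)\Bigr|
  = \frac{1}{r!}\Bigl|\int S^{(r)}(\xi_u)\,u^r K_0(u)\,du\Bigr|
  \le \frac{M}{r!}\int |u^r K_0(u)|\,du = M c_{0r}.
\]
For $K_1$, the $t = 0$ term is $S(0)\int K_1\,du = 0$ because $S(0) = 0$, while
$\int u K_1\,du = 1$ and $\int u^t K_1\,du = 0$ for $2 \le t \le r-1$; hence
\[
\Bigl|\int S K_1\,du - S^{(1)}(0)\Bigr| \le \frac{M}{r!}\int |u^r K_1(u)|\,du = M c_{1r}.
\]
```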

Let $X_1, X_2, \ldots$ be a sequence of random variables i.i.d. according to some Lebesgue density $f$. Let $E$ denote the product measure on $X_1, X_2, \ldots, X_n$.

1. Lebesgue Density

In this section kernel estimates $f_n$ and $g_n$ for $f$ and $g = f^{(1)}$, respectively, will be discussed. Johns and Van Ryzin (1967) proposed these estimates, and it appears that they showed (A.9) below under the extra assumption that $f^{(r)}$ is continuous for $x > a$. The bounds on the bias terms in (A.9) improve as the number of derivatives of $f$ increases, since they are of order $A^r$ and $A^{r-1}$, respectively.

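Lemma A.1. (Approximation of $f$ and $g$.) For each $x$ and each $A > 0$, let

(A.4)  $\bar f(x) = \int K_0(u)\, f(x + Au)\,du, \qquad \bar g(x) = \int A^{-1}\{f(x + 2Au) - f(x + Au)\}\, K_1(u)\,du.$

If $f^{(r)}$ exists on $[x, x + 2A]$, then

(A.5)  $|\bar f - f| \le A^r q_A^{(r)} c_{0r}$,

(A.6)  $|\bar g - g| \le A^{r-1}\left(q_A^{(r)} + 2^r q_{2A}^{(r)}\right) c_{1r}$,

where

(A.7)  $q_A^{(r)}(x) = \sup\{|f^{(r)}(x + Au)| : 0 < u < 1\}$.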

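Proof. Since the $r$th derivative of $u \mapsto f(x + Au)$ is bounded by $A^r q_A^{(r)}(x)$, (A.5) follows from (A.2). With $S(u) = f(x + 2Au) - f(x + Au)$ in (A.3), the fact $S(0) = 0$, together with $S^{(1)}(0) = A\,g(x)$ and $|S^{(r)}| \le A^r(q_A^{(r)} + 2^r q_{2A}^{(r)})$, implies (A.6).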
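Lemma A.2. (Unbiased estimation of $\bar f$ and $\bar g$.) For each $x$ and $A > 0$, let

(A.8)  $f_n(x) = n^{-1}\sum_{j=1}^n W_j^0(A)$  and  $g_n(x) = n^{-1}\sum_{j=1}^n A^{-1}\{W_j^1(2A) - W_j^1(A)\}$,

where $W_j^0(A) = A^{-1}K_0((X_j - x)/A)$ and $W_j^1(A) = A^{-1}K_1((X_j - x)/A)$. Then $f_n(x)$ and $g_n(x)$ are unbiased for $\bar f(x)$ and $\bar g(x)$, respectively.

Proof. Since the $X_j$ are i.i.d., the proof follows readily from (A.8) and the transformation theorem; for instance, the substitution $y = x + Au$ gives $E\,W_1^0(A) = \int A^{-1}K_0((y - x)/A)\, f(y)\,dy = \bar f(x)$.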
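As an illustration of (A.8), the sketch below evaluates $f_n$ and $g_n$ at a point. It is not from the thesis: the helper names are hypothetical, the kernels are the dual-basis polynomials mentioned after (A.1), and the test density $f(x) = 2x$ on $(0,1)$ is chosen only because, with $r = 2$, $f^{(2)} = 0$, so Lemma A.1 gives $\bar f = f$ and $\bar g = g$ on $[x, x + 2A] \subset (0,1)$. The printed values should therefore be close to $f(0.1) = 0.2$ and $g(0.1) = 2$ up to sampling error.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def dual_basis_kernels(r):
    """Coefficients (constant term first) of dual-basis kernels K_0, K_1 on (0, 1)."""
    idx = np.arange(r)
    coef = np.linalg.inv(1.0 / (idx[:, None] + idx[None, :] + 1))
    return coef[0], coef[1]


def kernel(coef, u):
    """Polynomial kernel evaluated at u, set to 0 off the open interval (0, 1)."""
    u = np.asarray(u, dtype=float)
    return np.where((u > 0) & (u < 1), P.polyval(u, coef), 0.0)


def f_n(x, sample, A, k0):
    """(A.8): average of W_j^0(A) = A^{-1} K_0((X_j - x)/A)."""
    w0 = kernel(k0, (sample - x) / A) / A
    return float(np.mean(w0))


def g_n(x, sample, A, k1):
    """(A.8): average of A^{-1}{W_j^1(2A) - W_j^1(A)}, W_j^1(A) = A^{-1} K_1((X_j - x)/A)."""
    w1_2A = kernel(k1, (sample - x) / (2 * A)) / (2 * A)
    w1_A = kernel(k1, (sample - x) / A) / A
    return float(np.mean(w1_2A - w1_A)) / A


# Hypothetical example: X has density f(x) = 2x on (0, 1), sampled by inversion as sqrt(U).
rng = np.random.default_rng(0)
sample = np.sqrt(rng.random(1_000_000))
k0, k1 = dual_basis_kernels(2)
print(f_n(0.1, sample, 0.2, k0))  # close to f(0.1) = 0.2
print(g_n(0.1, sample, 0.2, k1))  # close to g(0.1) = 2
```

Because the kernels vanish off $(0,1)$, only observations in $(x, x + 2A)$ contribute, which is why Lemma A.1 needs $f^{(r)}$ only on $[x, x + 2A]$.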

Combining Lemmas A.1 and A.2, we have

Lemma A.3. (Johns and Van Ryzin.) Let $A > 0$. If $f^{(r)}$ exists on $[x, x + 2A]$, then

(A.9)  $|E f_n(x) - f(x)| \le A^r q_A^{(r)}(x)\, c_{0r}$,
       $|E g_n(x) - g(x)| \le A^{r-1}\left(q_A^{(r)}(x) + 2^r q_{2A}^{(r)}(x)\right) c_{1r}$.
Lemma A.4. (Johns and Van Ryzin.) Under the hypothesis of Lemma A.3,

(A.10)  $\operatorname{Var} f_n(x) \le (nA)^{-1} q_A^{(0)}(x)\, \|K_0\|_2^2$,
        $\operatorname{Var} g_n(x) \le 3(nA^3)^{-1} q_{2A}^{(0)}(x)\, \|K_1\|_2^2$,

where Var denotes the variance taken with respect to the measure $E$, and $\|\cdot\|_2$ denotes the $L^2$-norm with respect to Lebesgue measure.

Proof. Since the $X_j$ are i.i.d., the inequality $\operatorname{Var} X \le E(X^2)$, followed by the transformation theorem, and with the $C_r$-inequality applied at the proper place, yields (A.10).
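For the second inequality in (A.10), the proper place for the $C_r$-inequality is the squared difference, $E(X - Y)^2 \le 2\{EX^2 + EY^2\}$. The sketch below writes out the computation; it uses $q_A^{(0)} \le q_{2A}^{(0)}$, which holds because the supremum defining $q_{2A}^{(0)}$ is taken over the larger interval $(x, x + 2A)$.

```latex
\begin{align*}
\operatorname{Var} g_n(x)
  &\le n^{-1}A^{-2}\,E\{W_1^1(2A) - W_1^1(A)\}^2
   \le 2n^{-1}A^{-2}\bigl\{E[W_1^1(2A)]^2 + E[W_1^1(A)]^2\bigr\},\\
E[W_1^1(A)]^2 &= A^{-1}\int K_1^2(u)\,f(x+Au)\,du \le A^{-1} q_A^{(0)}(x)\,\|K_1\|_2^2,\\
E[W_1^1(2A)]^2 &= (2A)^{-1}\int K_1^2(u)\,f(x+2Au)\,du \le (2A)^{-1} q_{2A}^{(0)}(x)\,\|K_1\|_2^2,
\end{align*}
so that
\[
\operatorname{Var} g_n(x) \le 2n^{-1}A^{-3}\bigl(q_A^{(0)} + \tfrac12 q_{2A}^{(0)}\bigr)\|K_1\|_2^2
  \le 3(nA^3)^{-1} q_{2A}^{(0)}(x)\,\|K_1\|_2^2 .
\]
The first inequality follows in the same way from
$E[W_1^0(A)]^2 = A^{-1}\int K_0^2(u)\,f(x+Au)\,du \le A^{-1} q_A^{(0)}(x)\,\|K_0\|_2^2$.
```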

2. Density with Respect to $d\mu = h\,dx$

 

Let $f$ be a Lebesgue density of the form $f = hJ$, where $h(x) > 0$ if and only if $x > a$. Then $J$ is a density with respect to $d\mu = h\,dx$. The estimation of $J$ and its derivative $J^{(1)}$ will be discussed next.

Let $A > 0$. For each $x$, let

(A.11)  $J_n(x) = n^{-1}\sum_{j=1}^n W_j^0(A)/h(X_j)$,
        $J_n'(x) = n^{-1}\sum_{j=1}^n A^{-1}\{W_j^1(2A) - W_j^1(A)\}/h(X_j)$.
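A sketch of (A.11) in the same style follows; it is hypothetical and not from the thesis. It assumes $a = 0$, $h(x) = e^{-x}$ for $x > 0$, and a sample from the gamma density $f(x) = xe^{-x}$, so that $f = hJ$ with $J(x) = x$ and $J^{(1)}(x) = 1$. With $r = 2$ one has $J^{(2)} = 0$, so both estimates at $x = 1$ should be near 1 up to sampling error. The names `J_n`, `J_prime_n` and `h` are illustrative only.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def dual_basis_kernels(r):
    """Coefficients (constant term first) of dual-basis kernels K_0, K_1 on (0, 1)."""
    idx = np.arange(r)
    coef = np.linalg.inv(1.0 / (idx[:, None] + idx[None, :] + 1))
    return coef[0], coef[1]


def kernel(coef, u):
    """Polynomial kernel evaluated at u, set to 0 off the open interval (0, 1)."""
    return np.where((u > 0) & (u < 1), P.polyval(u, coef), 0.0)


def J_n(x, sample, A, k0, h):
    """(A.11): average of W_j^0(A) / h(X_j)."""
    w0 = kernel(k0, (sample - x) / A) / A
    return float(np.mean(w0 / h(sample)))


def J_prime_n(x, sample, A, k1, h):
    """(A.11): average of A^{-1}{W_j^1(2A) - W_j^1(A)} / h(X_j)."""
    w1_2A = kernel(k1, (sample - x) / (2 * A)) / (2 * A)
    w1_A = kernel(k1, (sample - x) / A) / A
    return float(np.mean((w1_2A - w1_A) / h(sample))) / A


def h(x):
    # Hypothetical base function h(x) = exp(-x); it is evaluated only at the
    # sample points, all of which are positive, so a = 0 here.
    return np.exp(-x)


rng = np.random.default_rng(1)
sample = rng.gamma(shape=2.0, scale=1.0, size=2_000_000)  # density f(x) = x e^{-x}
k0, k1 = dual_basis_kernels(2)
print(J_n(1.0, sample, 0.3, k0, h))        # close to J(1) = 1
print(J_prime_n(1.0, sample, 0.3, k1, h))  # close to J'(1) = 1
```

Dividing each kernel weight by $h(X_j)$ turns the Lebesgue-density estimate into an estimate of the $\mu$-density $J = f/h$; the price is the larger variance factor appearing in Lemma A.6.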
Lemma A.5. If $J^{(r)}$ exists on $[x, x + 2A]$, then

(A.12)  $|E J_n - J| \le A^r s_A^{(r)}\, c_{0r}$,
        $|E J_n' - J^{(1)}| \le A^{r-1}\left(s_A^{(r)} + 2^r s_{2A}^{(r)}\right) c_{1r}$,

where $s_A^{(r)}(x) = \sup\{|J^{(r)}(x + Au)| : 0 < u < 1\}$.

Proof. The proof is the same as that of Lemma A.3, with $W_j^0$, $W_j^1$ and $q_A^{(r)}$ replaced by $W_j^0/h(X_j)$, $W_j^1/h(X_j)$ and $s_A^{(r)}$, respectively.

Lemma A.6. Under the hypothesis of Lemma A.5,

(A.13)  $\operatorname{Var} J_n \le (nA)^{-1} T_A\, \|K_0\|_2^2$,
        $\operatorname{Var} J_n' \le 3(nA^3)^{-1} T_{2A}\, \|K_1\|_2^2$,

where $T_A(x) = \sup\{f(x + Au)/h^2(x + Au) : 0 < u < 1\}$.

Proof. The proof is the same as that of Lemma A.4, with $W_j^0$, $W_j^1$ and $q_A^{(0)}$ replaced by $W_j^0/h(X_j)$, $W_j^1/h(X_j)$ and $T_A$, respectively.
