ABSTRACT

CONTRIBUTIONS T0 COMPOUND DECISION THEORY
AND EMPIRICAL BAYES SQUARED ERROR LOSS ESTIMATION

By

Richard John Fox

Let = (x1,x2,...,xn) be a set of n independent

x

—n

random variables where for i = l,2,...,n, xi ~ P9 and 6i E O,
i

a real interval. Consider the n testing problems corresponding

to 9=(6

, ..,9 ), each having the structure of the following
—n l n

component problem, H: 9 S a, K: 9 > a, a in the interior of O
and letting L denote loss, L(H,9) = (6-a)+, L(K,9) = (S-a)". A
compound procedure inQEn) IS a sequence (¢1(§n),... ,Wn(§n))
of x -measurable test functions where 1li,(x ) is to be used

-n 1 -'n
for testing 91' The compound risk of in is the average of

1n

the individual risks, Rn(§,_‘{“) = n 21:1 R(_e_ ,qti),where

_e_ = (91,92,...) 6 0“. Define the modified regret of in’ denoted

by Dn(_Q,}IF_n), by Dn(-e-’ln) = Rn(_9_,y_n) - Mon), where R(Gn) is the Bayes
risk verSus Gn’ the empiric distribution of ﬁn, in the component
problem. For both discrete and continuous exponential families,
compound testing procedures are presented whose modified regrets
converge to zero as n -’ °°.

Consider the problem of estimating Gn’ based on §n°

Let the class of distributions be the uniform on the interval

(0,9) , 9 > 0, family. An estimator is presented whose Levy

 

 

 

 

.—r- 1": -

 

 

 

 

Richard John Fox

diStance from Gn’ for a certain class of _9_'s, almost surely
converges to zero as n -+ 0°. Estimators are presented possessing
this property for all _9_'s, when the family is the uniform on the
interval [9,9+l), 9 E (-°°,°°), distributions. For these same two
families, it is shown that if the 91's are i.i.d. ~ G, the same
estimators converge in Levy metric to G.

Let x have distribution function F 9 E O, a subset of

6’
the reals, where 9 is a random variable possessing distribution
function G. Let xl,x2,... be a sequence of random variables
i.i.d. according to the marginal distribution on x. Based on
x1,x2,. .. ,xn , we estimate the conditional mean of 6 given x
and show that the risk, assuming squared error loss, of using
this estimate of 9 converges to Bayes risk for three different
families of distributions, namely the two uniform families pre-
vious 1y discussed and a certain family of Gamma distributions.

No assumptions are made concerning G in the uniform [9,9+1)
case and in the other two cases we assume I 92dG(e) < 6°.

Consider the estimation problem discussed immediately above
when 9 indexes an exponential family on the non-negative integers.
Assuming a bounded parameter space, sufficient conditions are pre-
sented for obtaining a rate of n45 of convergence to Bayes risk.
Finally, this same problem is considered in the context of a
bivariate exponential family where one component of the two-
dimensional parameter indexing the family is to be estimated. An

estimator is displayed whose risk, under a set of assumptions, con-

verges to Bayes risk.

 

 

 

 

 

 

 

 

CONTRIBUTIONS TO COMPOUND DECISION
THEORY AND EMPIRICAL BAYES

SQUARED ERROR LOSS ESTIMATION

By

Richard John Fox

 

A THESIS
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1968

 

 

 

4541/6, 2:, ’2’ ‘w ---. -

I. ’3’ ’4’ x
" v t ‘
2. / 1’ ' ‘1'” /
- J

W. ﬂt‘ '

 

I" f. u'.

To my parents

ii

 

 

_. -_. .\p .

 

 

 

 

 

 

_

ACKNOWLEDGMENTS

I wish to express my sincere appreciation to Professor
J. Hannan for his patience and guidance throughout the pre-
paration of this manuscript. His suggestions and comments were
of great value in obtaining the results of this thesis. I also
wish to thank Professor D.C. Gilliland for his help in the final
review.

Finally, I wish to express my gratitude to the National
Science Foundation which provided the financial support for this

investigation through Grant GP 7362.

iii

 

 

 

TABLE OF CONTENTS

Chapter Page
I. INTRODUCTION .................................... 1
II. COMPOUND TESTING IN EXPONENTIAL FAMILIES ........ 4
2.1 General Remarks ............................ 4

2.2 Discrete Case .............................. 8

2.3 Continuous Case ............................ 20

III. ESTIMATING THE EMPIRICAL DISTRIBUTION OF A
PARAMETER SEQUENCE IN COMPOUND DECISION

 

 

PROBLEMS ........................................ 27
3.1 Introduction ............................... 27
3.2 Uniform (0,9) Case ....... . ............... 29
3.3 Uniform [9,9+1) Case ..................... 33
3.4 Estimating the Prior Distribution .......... 37
IV. SOME EMPIRICAL BAYES SOLUTIONS .................. 42
4.1 Introduction ..... . ......................... 42
4.2 Uniform (0,9) Case ....................... 43
4.3 Uniform [9,9+1) Case ......... . ........... 52
4.4 Estimation of a Location Parameter in
Certain Gamma Distributions ................ 54
j V. EMPIRICAL BAYES ESTIMATION IN EXPONENTIAL
{1 EAMILIES .. ........................... . .......... 62
j‘ 62
Q‘ 5.1 A Rate for the Discrete Case ...............
if 5.2 Estimation in the Presence of a Nuisance
‘ Parameter .................................. 7O
BIBLIOGRAPHY .................................... 79

 

iv

 

 

 

 

 

 

 

 

 

CHAPTER I

INTRODUCTION

The problems considered in this thesis fall into the following
three categories: Compound Decision Theory, Estimation of Distribution
Functions and Empirical Bayes Estimation. In Chapter II, we consider
a compound decision problem, where the component problem is a test on
the parameter of an exponential family. For both discrete and con-
tinuous cases, compound testing procedures which possess a certain
desirable asymptotic property are displayed.

Consider a compound decision problem where the underlying family
of distributions is either uniform on the interval (0,0), 9 E (0,”)
or uniform on the interval [9,6+l), 9 E (-°,m). Let Gn denote the
empiric distribution of the n parameters corresponding to n
observations. In Chapter III, for both families, estimators are pre-
sented whose Lévy distancesfrom Gn almost Surely converge to zero
as n ﬂ w. For these same two families, we obtain as corollaries
that if the 6's are i.i.d. random variables with common distribution
function G, these same estimators converge in Lévy metric to G.
Robbins (1964) presents a minimum distance technique in a general
context for obtaining Levy-convergent estimators of the prior dis-
tribution function G.

In Chapter IV we deal with the Empirical Bayes Quadratic Loss
Estimation Problem, which is treated for certain exponential families
by Robbins (1964). Suppose x has a distribution depending on a

random variable 6 possessing distribution function G and that

 

 

 

 

the value of 9 is to be estimated. Further suppose that this

PTOblem has occurred n times in the past. For three families,
namely'the two families considered in Chapter III and a certain family
of Gamma distributions, estimators of the conditional mean of 6
given x, based on the past n observations, are presented whose risk
converges to the Bayes risk versus G, i.e. the estimators are
asymptotically optimal. The class of prior distributions G, for
which this result holds varies with the family.

In Chapter V,two more empirical Bayes quadratic loss estimation
results are presented. Macky (1966) displays an asymptotically
optimal procedure for a family of exponential distributions on the
non-negative integers:under the assumption that the prior dis-
tribution possesses a second moment. Under the assumption that the
parameter Space is bounded, we present sufficient conditions for
obtaining a rate of convergence to Bayes risk for this procedure.
Finally, asymptotically optimal estimators are presented for one
component of a two-dimensional parameter indexing a bivariate
exponential family.

We now make some remarks concerning notation. If A is an
event, [A] will sometimes be used to denote the indicator function
of A. For any distribution function, say F, the letter F also
represents the corresponding Lebesgue-Stieltjes measure and
F]: = F(b) - F(a). We adopt the convention that distribution
functions are right continuous.

For any measure u and a function f, u(f) will occasionally

 

 

 

' In H

 

 

 

 

 

 

 

 

3

be used to denote If du. The abbreviation i.o. stands for infinitely

ofterL. For any function of a real variable, say g, g' and g"
denote its first and second derivatives. If A is aset, A' denotes
its complement. Finally, Q stands for the standard normal dis-
tribution function.

We now make some remarks concerning a certain type of three-
point distribution which occurs frequently throughout this paper.
Let the random variable x take on the three values -v, 0, w,
where v and w are positive, with corresponding probabilities

q, 1-q-p,p. By direct calculation, letting V denote variance,
2 2
(1.1) V(x) = v q(l-q) + 2v w q p + w p(l-p).

The following lemma is due to Gilliland and Hannan (1968).

Lemma 1.1. If x is a random variable assuming the three values
-v,(Lw, where both v and w are positive, with corresponding
probabilities q,1-p-q,p, then letting the range of x be denoted

by r = v + w and 02 = V(x),

NIHN
IA

#3 Ir—a
+

'O [H

Q

 

 

 

 

CHAPTER II

COMPOUND TESTING IN EXPONENTIAL FAMILIES
1. General Remarks.

Let 65;?) be the measurable space consisting of the Borel
field on the real line and let «9 = {PQIS E O}, 0 being a real
interval, be a family of probability measures on (3,50. Suppose

that 9<< p. and that

dP
Jgp
du 9'
Consider the following test of hypothesis problem which we
call the component problem. Based on an observation of a random

variable x, distributed according to P 6 G O, we test:

9’
H: 9 S a

versus

K: 9 > a,

where a is in the interior of 0. The loss function L is defined

as follows:

+
L(H)e) = (e ' a) :
L(K,9) = (9 - a)’.
= ... ' d
Let ‘En (x1,x2, ,xn) be a sequence of n independent ran om

variables with x1 distributed according to P6 and Bi 6 0 for

i

 

"ll\\m a

V .

 

 

(~ ~‘ ‘ _-‘
“.M—

 

 

 

 

 

 

   

 

 

   

 

 

i ‘ 1,2,...,n. Abbreviate P6 to Pi’ p9 to pi and “2:1P1
i
to P

i
_11. Let 2n = (91,62,...,0n). We consider the compound dec1sion

problem consisting of the n testing problems corresponding to 9

3

where the hypotheses are as in the component problem.
We adopt the convention that the value of a test function is

the probability of accepting H. By a compound testing procedure

is meant a sequence, 1n(§n) = (¢1(§n),¢2(xn),...,¢n(xn)) of x“-

measurable test functions, where W is the test function for

testing the hypothesis concerning Si. Also, for any sequence 2,
0%

let 2 denote the corresponding product measure on (I ,5 ).

For any sequence 2, we define the compound risk of in,

denoted by Rn(§,1n), as follows:
-1 n + -
(2.1) Rn(9-’1n) = n ifltznﬂixei-a) + (1-gn(wi))(ei-a) 3,

i.e., Rn(§,1n) is the average of the risks for the n problems.
Let Gn be the empiric distribution function of g“. If we
restrict in to simple symmetric compound procedures, i.e.,
ti(§n) = V(xi) where W is some test function for the component
problem, then the minimum of Rn(§,1n) over such procedures is
R(Gn)’ the Bayes risk against Gn in the component problem. De-

fine the modified regret of a procedure in, denoted by Dn(§’in)’

as follows:

(2.2) 13,52,111) = 1352.1“) - R(Gn).

Hence, a procedure whose modified regret tends to zero, asymptotically

 

 

 

 

 

 

 

 

6

is as good in terms of average risk as the best simple symmetric
procedure.

Let ¢G be the Bayes test versus Gn in the component
problem whichnchooses H if and only if Px(9) < a, where Px(9)
is the conditional expectation of 9 given x when the joint dis-
tribution on the pair (9,x) results from Gn on 9 and P6 on

x. Thus,

(2.3) «:6 (x) = [13(6) < a].
n

 

Let QC be the simple symmetric compound procedure which uses
n

$6 at each stage.
n

We now derive a useful expression for the modified regret of

a compound procedure in. Since R(Gn) = Rn(§,QGn), by (2.1) and
(2-2),

-1 n
(2.4) ”SM-n) = n .2 «fauna, - e6 (xi)).

1=1 n

For each i in the right hand side of (2.4), interchange the order
of integration so that 2“ becomes Pi(n;#i Pj). Then, converting
the Pi-integrals to u-integrals with the variable of integration
changed from xi to x, replac1ng U?#i P. by En and interchang-

J
ing summation and integration, we obtain from (2.4)

Dn@,ln) = “(1:1) ,
(2.5)

_ -1 “
In(x) - n i§1(ei-a)pi(X)[£n(wi(xl’..-’xi-1,x,xi+1’...,xn))—¢Gn(X)].

 

In this chapter u will either be Lebesgue measure or counting

measure on the non-negative integers. ‘9 ‘will be an exponential

family specified by the following density with respect to u:

(2.6) pecx) = exc<e>m<x>,

where 6 6 ﬂ, 0 being an interval subset of the non-negative reals
and m is a positive function. In both the discrete and continuous

cases we shall exhibit compound testing procedures whose modified

 

regrets tend to zero as n increases. .L;
For any real-valued function, say g, on the‘Q-support of I,

define the linear functional T by

= aorta sis).
(2'7) T(g(x)) m(x+1) ’ a m(x) °
Let
-1 -1
(2.8) |Tl(x) = (m(x+1)) + a(m(x)) .

Throughout the remainder of this chapter we will occasionally

abbreviate expressions involving functions of x by omitting the

—- -1
display of the argument x. Define p = n 22:1 i

Since by (2.7)
and the definition of '5,
_. -1 n x
(2.9) T(p(x)) = n E (9. - a)9. C(G.)
i=1 1 1 i

'and since pe(x) being defined by (2.6) implies that Px(6) < a

iff 13660) < 0, by (2-3)’

 

 

 

(2.10) ¢G = [T(E) < o].
n
*
Let F be the empiric distribution function of x = (x ,x ,...,x )
‘11 l 2 n
*
d l b h ' ' d’ ' ' ’ ... ,...,
an et Fi e t e empiric istribution function of (XI, ,xi_1,x1+1 xn)

* -
multiplied by (n-l)/n, i.e. Fi(x) = n 1 2?#i [xj S x].

2. Discrete Case,

In this case p is counting measure on the non-negative integers
and the family considered is specified by density (2.6). Our results

will be obtained under the following two assumptions:
0A1) n = [0,5], 5 < m.
(A2) 0=[d,B],0<OI<B<°°-

Johns (1966) considers a sequential compound testing problem,
i.e., only the first i observations can be used at the ith problem.
He considers '9 to be a class of distributions having some common
discrete support. In particular, the family dealt with in this
section is considered by Johns.

Under (Al) and the further assumption that B is in the interior

of the natural parameter space, {9'9 > 0,u(9xm(x)) < a}, Johns dis-

52

Plays a procedure whose modified regret is of order n- uniformly

in ‘9. The statistic used in the testing procedure involves
artificial randomization. Under (A2) and the same assumption on B,
~Johns points out that randomization is not necessary.

In the following, a non-randomized compound testing procedure

 

 

 

 

 

 

Also

 

 

is given whose modified regret under (Al) converges to zero for
each 9' and under (A2) converges to zero uniformly in .9. The
method of proof differs very much from the technique of Johns.
Also, an example is given which shows that unless further as-
sumptions are made, no rate of convergence can be found for this

procedure under (Al) or (A2).

 

Define
dF*
* . -1 n
f.(X)= =n z[x.=XJ,
1 d“ #1
* n
* dF -l
f (X) ="_‘ = n 2 [x. = x].
du j==1 J

* * *
Equation (2.10) motivates the testing procedure ‘1“ = ($1,...,¢n)

defined by
* *
(2.11) wi(§n) = [T(fi(xi)) < o], 1 = 1,2,...,n.

We now proceed to show that this testing procedure has
modified regret converging to zero as n r w. It is convenient

to introduce the random variables Yj(x) defined by
(2.12) Yj(x) = T([xj = x]).

By (1.1),

azpi<x> (1410.)) zap J. (x) p j <x+1> p, (x+1> <1-p j <x+1>>

 

 

 

 

 

 

' (2.13) V(Y.( )) = +
J x m2(x) m(x)m(x+1) m2(x+l)

A180,

 

 

 

10

x
(2.14) Pj(Y (X)) (GJ. - 6093. C(ej)-

.1

We note that by the linearity of T, for any x,

* -1 n
(2.15) T(fi(x)) = n 2 Y.(x)
j#i J
and
* -1 n _____
(2.16) T(f (x)) = n 2 Yj(x) = Y(x).
i=1

* _
Lemma 2.1. T(fi) - T(p) r 0 in ‘P-measure uniformly in i and

g for each x.

Proof: Let x be fixed. By (2.15), (2.16) and the definition

of |T| in (2.8),

hat) - mm -n‘1|yi| s n‘lm.

By (2 9) and (2.14), T(E) = gn(§). Thus, by (2.16),

* _. _. ._
|T(f ) - T(p)| = |Y - (Y)|. which converges to zero in P-

P

aﬂl

measure uniformly in ‘9 by the Tchebichev Inequality. Hence,
* ._

by the triangle inequality T(fi) - T(p) r 0 in meeasure

uniformly in .3 and i.

Qgrollary 2.1. For each x, there exist two sequences {0n} and

{an}, both positive and decreasing to zero, such that for all ‘9
_. *

and i = 1,2,...,n, if |T(p)| > 5n, then §K[T(fi) < 0] =

[T(E) < 0]) 2 1 - en.

EEEQEE: Fix X and let {5;} and {6;} be two sequences of

Positive reals decreasing to zero. Define n3, j = 1,2,... such

 

 

 

 

 

 

 

 

11

* _
that n 2 n3 implies g(|r(fi) - T(p)| 2 53) s 63 for all “g and
i = 1,2,...,n. The existence of n3 for each j is guaranteed by

Lemma 2.1. Define n1 = l and for j 2 2 let nj = (“j-1" n3) + 1.

Define the sequence {an} by en = l for n1 5 n < n2 and for

. = I
Define {6“} by 5n 6j for

for j = 1,2,... . The sequences {6n} and {en}

= ' S
j 2 2, en 51 for nj n < nj+1.

nj S n < nj+1,
are positive, decreasing to zero and satisfy the condition:
g(|r(f:) - T(E)| 2 an) s an for all .g and i = 1,2,...,n which
completes the proof.

We now state and prove a lemma which follows immediately from

a well knowu theorem of probability theory. Applicationsof this

lemma to the variables Y (x) yield results which are useful in

J

this development.

Lemma 2.2. If z , j = 1,2,... are independent random variables

J

possessing finite ranges rj such that rj S r < a for j = 1,2,...

and if s: = 2?=1 V(zj) a Q as n H m, then for any pair of real
numbers b < d, P{b S zg‘lzj S d} e 0 as n a W. Further, if
for j = 1,2, .., V(zj) 2 52 > 0, P[b s zg=1zj s d} = 0(n'5).

Proof: By the Berry-Esseen Theorem, page 288 of LdEve (1963),

n
P{b s 2 zj s d} s §(d*) - @(b*) + 2c r s;1,
j=1

where

9: *
snd = d — 2P=ls(zj), snb = b - z§=1E(zj)

 

 

uh

in:

the

 

12

* *
and c is the Berry-Esseen constant. Since §(d ) - Q(b ) s
-1 _
sn (2n) l5(d-b), the proof of the first result of the lemma is
complete. If Var (23) 2 62 > 0 for all j, then sn 2 nab

and the proof is complete.

Lemma 2.3. If 9 f 0 as j ~ D, for each x,

J

be S z§=l Yj(x) S d} d 0 as n ~ Q where b < d are two real
numbers. Further, under (A2), be S z§=l Yj(x) S d} = 0(n-k)

uniformly in g for each x.

Proof: If 9j ﬂ 0 there exists n > 0 such that 9j 2 n i.o.
If 9j 2 n. by (2.13)

a ,(x)d (x+l)

V(Yj (x)) 2 m(x)m(x+1)

where for x = 0,1,2,...
dn(x) = inf pe(x) > O.
sews]
Hence, 3: = z§=l V(Yj(x)) H w. Also, the range of Yj(x) is
|T|(x) for all j. Thus, the first result of the lemma follows
from the first part of Lemma 2.2.
Under (A2), for all j

a qi(x)da(x+l)

V930") 2 W

and the right hand side of this last inequality is independent of
53. Hence, the second result of the lemma follows directly from

the second part of Lemma 2.2.

 

 

m

 

 

 

Cr

an

 

 

13

We now define

* *
(2.17) v (x) = sup P {T(f,(x)) < o} - inf P {T(f.(x)) < 0}.
n lSiSn-n 1 lSiSn-n 1

Lemma 2.4. If 9j {'0 as j r w, Vn(x) d 0 as n d)” for each
x. Also, under (A2), Vn(x) = 0(n-k) uniformly in g. for each x.

* —- -
Proof: Let x be fixed. By (2.15) T(fi) = Y - n lYi' Hence,

by (2.12), for i = 1,2,...,n
(2-13) Eng < -a(nn0-1} 5 En{T(f:) < 0} 5 Eng < (nm(x+1))-1}-

By the definition of Vn(x) in (2.17) and (2.18),

n -1
s 2 Y. s (m(x+l)) }.
i=1 3

(2.19) Vn(x) s gn{-am'1

By Lemma 2.3, Vn(x) r 0 for each x. Under (A2), by the second
a

part of Lemma 2.3, Vn(x) = 0(n- ) uniformly in .9 for each x.

Lemma 2.5. If 9j r 0 as j a a, then

*
inf P (T(f,(0)) < 0) r l.
lSiSn_n 1

Proof: By the lower bound of (2.18),

(2.20) inf P (T(f’,‘(0)) < 0) 2 P {Y(O) < -a(nm(0))'1}.
lSiSn 1 ”1‘

Since SJ * 0, by (2.14) §n(Y(O)) ~ -‘a(m(0))-1 < 0. By Kolmogorov's

Criterion, page 238 of Lbeve (1963), Y(O) ‘.§n(Y(0)) ~ 0 a.s. ‘g

and the result is immediate from (2.20).

0 Q x -1
We note that Since C(9) = (2&30 9 m(x)) 9

 

I ‘ l’.‘ Inn-“Inna

9 \1__r'=-.£’_~

 

 

 

 

 

 

 

   

 

14

(2.21) C(O) = mm)"1 = sup {C(e)le 6 [0,3]}.

It follows from (2.6) and (2.21), with In defined in (2.5),

that for x = 0,1,2,...

(2.22) |1n(x>l s C(O)Bx+1m(x).

*
Proposition 2.1. Under (A1), for each ‘g, Dn(§,1n) r 0 as n a w

*
where -1n is defined by (2.11).

Proof: Case 1: 9j " 0 as j r a. Let x 2 l. Recalling the

definition of In in (2.5), we have

n
|In(x)| s a n'1.2 Vpi(x)

i=1.

and since 9.1 ﬂ 0, the right hand side converges to zero. Let
x = 0. Since 9j r 0, by (2.9) and (2.10) ¢G (0) = l for n
sufficiently large. By Lemma 2.5, n

inf P {T(f’XO) < 0} _. 1

lSiSn _“ 1
so that |1n(0)| s n‘le ELIIZJTGBOD < o} - ¢G (0)| —» 0. Thus,
since the right hand side of (2.22) is u-integrab12, by the Dominated
Convergence Theorem, 0(In) ~ 0 which by (2.5) completes the proof
of Case 1.

Case 2: 9.1 f 0 as j a m. Let x = 0,1,... be fixed.

By Corollary 2.1, there exist two sequences of positive reals

<iepending on x, say {0“} and {an}, both decreasing to zero

8lxch that |T(p)| > 6n implies that for i = 1,2,...,n

 

 

 

 

 

 

 

 

15

* _
£n{[T(fi) < 0] = [T(p) < 0]} 2 1 - en,
Hence, if |T(E)| > (an, since pe(x) s 1 for all x and e,
-1 n a .—
(2.23) lInl s n a 2 |£n{T(fi) <0} - [T(p) < OJI s a an.

i 1

If |T(E)| s an, by (2.9) and the triangle inequality

(2.24) lIn' S m 5n(l + qn) + B Vn

where
l * *
q (x) = — (lsup P {I(£,(x)) < o} + inf P {T(f (x)) < 0})
n 2 S.S ‘-n i . -n i
i n 1S1$n
and Vn is defined by (2.17). Adding the bounds in (2.23) and

(2.24) and replacing 1 + qn by 2 we have
s .
(2.25) lIn| 2m an + B<en + V“)

The right hand side of (2.25) converges to zero by Lemma 2.4 and
the construction of {6“} and {en}. Hence, since the bound of
(2.22) is u-integrable, by the Dominated Convergence Theorem
0(In) * 0 which by (2.5) completes the proof.

In order to prove uniform in .9 convergence under (A2),
we will need the following lemma which appears as Lemma 5c of
Parzen (1959). For a proof of this lemma, Parzen refers the
reader to another text where the result appears in a much more

general context .

Lemma 2.6. Let (1,0,0) beameasure space. Let fn A,
’

 

 

I.--I':.,'l

‘1

 

 

 

 

 

 

16

ﬂ ‘ 1,2,..., A 6 A, A being some set of possible values of A,
be real-valued Borel measurable functions defined on (3,0)

such that for each n and 1, |im1| s g with u(g) < no. If
fn,x(x) r 0 uniformly in A E A for each x, as n r a, then

”(fn,k) r 0 uniformly in A as n H w.
3322;: Without loss of generality, asSume 0 S fn,A for all
n and A. For a > O, “(fn,k) = An + Bn + Cn where
An = Mfg S elfnﬂ‘),
n =u<£g> cltf , s czar 1)
n n, n,
and
C =u([g>e][f >c2,]f ).
n n,k n,k
Let v be the finite measure such that dv/du = g and note that
An S H([g S e]g) = V([g S 3]), which can be made arbitrarily small
by choice of e, since v([g S 0]) = 0. Also Bn S ezu([g > e]) S cu(g)
which can be made arbitrarily small by choice of c. Note that

by the uniform in A convergence of fn A to zero and the finite-
’

ness of v, for any 6 > 0,
> 6 ~ 0.
skip v([fn,k ])
Hence since C S V([f > 62]) as n ~ @
’ n n,A ’
32p Cn a 0,

which completes the proof.

 

 

 

17

, *

Pr02081tion 2.2. Under 0A2), Dn(§,1 ) ~ 0 uniformly in g as
*

n e m where 1 is defined by 2.11.

2322:: The proof is the same as that of Case 2 of Proposition 2.1
except for the following modifications. Since the sequences {6“}
and {an} are independent of 9 for each x and by Lemma 2.4,
Vn(X) ' 0(n-7) uniformly in 9 for each x, the bound on lIn|

of (2.25) converges to zero uniformly in g for each x. Apply-
ing Lemma 2.6 instead of the Dominated Convergence Theorem completes

the proof.

 

Consider the compound testing procedure defined by
2 26 - [ f* < o
<. > vim“) - T< (x9) 1,

*
i = 1,2,...,n. With 1“ defined by (2.26) and in by (2.11),

since T(f*(x1)) = T(f:(xi)) - a/n m(xi) we obtain by (2.1),
(2 27) IR (8 ) - R (e *)| s ln'1 2 (e -a)P {0 s T(f*(x ))
' n -’1n n —’1n i=1 i —n i i

< a/n m(xi)}l.

Dealing with the right hand side of (2.27) as we did to obtain

(2.5), since le-al s B,
* -1 n *
(2.28) |Rn(§’1n) - Rn(§,1n)| S Bu(n 2 pi(X)En{0 S T(fi(X))
i=1
< a/n m(x)}).

By the definitions of Yj(x), |T|(x) ((2.12) and (2.8)) and

(2-15), for each x

 

 

 

a

“Hill

by the

     

 

18

n
(2.29) n“1 )3

i pi<x>2n{o s I<f§<xn< a/n mm} s tuba/ms)

l

n
s 2 Y.(x) < |P|(x)}.
i=1 3

Theorem 2.1. Under (A1), with in defined by (2.26), Dn(§,[n) ﬂ 0

for each .9. f3:

*
Proof: By Proposition 2.1, Dn(§,[n) r 0 for each .2- Hence, by
the triangle inequality, it suffices to show that for each .9,

*
IRnQJm) - Rn(_e_,yn)| - o.

 

Case 1. 9j {’0 as j d w.
By Lemma 2.3, the right hand side of (2.29) converges to zero for
each x. Hence, since the integrand of the right hand side of
(2.28) is bounded by the p-integrable function C(0)me(x), by
the Dominated Convergence Theorem the right hand side of (2.28)

converges to zero which completes the proof of Case 1.

_case 2. Gj-'0 as j-'°°.
For x,2 1, the integrand of the right hand side of (2.28) converges
to zero, since pj(x) * 0'as j r m. For x = 0, this integrand
converges to zero by Lemma 2.5. Again an application of the

Dominated Convergence Theorem completes the proof.

Theorem 2.2. Under (A2), with in defined by (2.26), Dn(9’1n) r 0

uniformly in IQ.

*
Proof: By Proposition 2.2 Dn(§,1n) ~ 0 uniformly in '9. Hence,

by the triangle inequality it suffices to show that"

 

 

 

 

 

 

 

 

 

19

\Rn(§J$TR - Rn(§,i:)| ~ 0 uniformly in 9. Under (A2) the right
hand side of (2.29) converges to zero uniformly in g by Lemma
2.3. Hence, since the integrand of the right hand side of (2.28)
is bounded by the u-integrable function C(O)me(x), the right
hand side of (2.28) converges to zero uniformly in g by Lemma

2.6 and the proof is complete.

Example 2.1. The following is a slight modification of Example

3.1 of Gilliland (1966) which shows that no rate of convergence

 

to zero can be found for the modified regret of the compound
testing proceudre defined by (2.26) under (Al) or (A2). Let
m(O) = 1, m(x) = x‘3 for x = 1,3,5,... and m(x) = r(x) for
x = 2,4,5,... with r(x) S x_3 and r(x) strictly decreasing.
The parameter space for this family is [0,1]. Let 0 < a < 1
and g = (1,1,...). Since R(Gn) = 0, with in defined by
(2.26)

Duel“) = n‘lz;1(1-a)_r:n{tr(f*(xi))< 0}.

It then follows that the right hand side of this last equality
has a lower bound of (l-a)2:=0p1(x)(1-p1(x+1))n-1. Summing
over x = 1,3,5,..., since p1(x) = C(1)m(x), we have

°° -3 n-l

Dn(g,1n) z (l-a)c(1) z (2x+1) (1-C(1)r(2x+2)) .

x=0
Proceeding exactly as Gilliland, we see that the modified regret
dominates a positive null sequence which decreases arbitrarily

slowly by choice of r(x).

 

 

 

 

 

 

Upon
Hill i

real v

(2.30)

£011.

 

 

20

3.x.Cont1nuous Case.

 

In this section 9 is characterized by density (2.6) with

respect to Lebesgue measure u. We recall that m is positive and

further assume that:

0A1) m is continuous.

We also aSSume:

(A2) Q=[a,B],0<a<B<°°-

 

Let F6 be the distribution function corresponding to p9.
' =
By (Al), Fe pe.

Abbreviate F to Fi. Define E = n-lzf F Let

9. i=1 i.
1

0 < h(n) S l have sequence of positive reals decreasing to zero.
Henceforth, we omit the display of the dependence of h on n.
We estimate 3' for the ith problem by a divided differencetafF:.
For convenience we use central differences, but it is obvious
upon analyzing the development that other differencing procedures
will work just as well. Let g be a real-valued function of a

real variable, x. Define the linear functional A by
-1 x+h
A g<x> = (2h) 3]“,-
*
Consider the procedure in defined by
* *
2. = <
( 30) wiqn) [T(A Fi<xi)) 0].

for i = 1,2,...,n. For g a E-square integrable function of

 

   

,-
f
I
l‘ ‘ ‘
I _. ‘
. . 2
/ ,'
/’ r 4‘Illl'r.
I " V

   

 

21

3; let “g“ denote the L2(P) norm of g. We note that by (A2)

for all 9 E 0,
(2.31) c<e> s'5 = {p((Ex > 01ax + [x s OJex>m<x)>}'1.

(2.32) C(9) 2.9 = {p((Ex > 015* + [x s 03ax>m<x>>}‘1.

Lemma 2.7. If nhz ~ a, then for each x, “T(l F*(x)) - T(l'E(x))H ~ 0

uniformly in '8.

 

’Proof: By the linearity of a and T and the triangle inequality,

Hr<b F*<x>) - T(l Ekx>>H s <m<x+1>)'1HA(F*JE><x+1>H
+ a(m(x))’lnl(r*4f)(x)u.

*._ -
For any y, “p(F -F)yH2 is the product of (2h) 2 and the variance

of the average of n independent Bernoulli random variables.

Hence, ”$(F*;P)yuz S (16nh2)-1 which completes the proof.

Lemma 2.8. Under (Al) and 0A2), for each fixed x,

|T(b‘§(x)) - T(p(x))| * 0 uniformly in ‘Q.

Proof: By the linearity of T and the triangle inequality, for

each x
II<$'F<x)> - T(S(x>>| s <m<x+1)>'1|l'i(x+1> -'S<x+1)|
+ams»*u§s)9RmL

For any y, since by the linearity of L, b F(y) = n'lzqglb Fi(Y):

 

 

 

unit

(A1)

ineql

la)
n

is cc

(2.33

Let

 

 

22
n
HFo>-Mw|si12HF5w-png
i=1

We now bound the summands in the immediately preceding expression
uniformly in 9 and show thatthisbound converges to zero. By

(Al) and the Mean-Value Theorem, A Fe(y) = pe(y + 5), '6‘ s h. Also
y 6
hgy+®-p¢w|seumumo+é)-MwL

Under (A2), for all 0 6 Q, the right hand side of this last

inequality is bounded by (eyv 913’) E (l86m(y+6)-m('y)|v

 

I06m(y+6)-m(y)|), where E is defined by (2.31). Since h ~ 0,
6 ~ 0 and hence, by the continuity of m, this bound converges

to zero which completes the proof.

Lemma 2.9. Under (Al) and (A2), if nh2 ~ °, for each x,
* _

“T(L Fi(x)) - T(p(x))“ * 0 uniformly in g and i.

Proof: Let x be fixed. By Lemma32.7 and 2.8 and the triangle

inequality for L -norm, “T(l F*(x)) - T(p(x)n d O uniformly in

2
ﬂ. Since “T(L F:(x)) - T(b F*(x))n S (2hn)-1IT|(x), the proof

is complete by another application of the triangle inequality.

We now define Mh(x), the continuous analog of Vn(x), by
* ' *
(2.33) M (x) = sup P {T(l F.(x)) < o} - inf {r (T(l F.(x)) < 0}.
n . *n 1 . -n 1
IS1Sn 1S1$n
Let
r(x) = inf {m(y)| Iy-x| s 1}.

and note that r > 0 by (Al). Then, by (2.32) for all y such

 

 

 

ind

de‘

hand

m

L

Ill.‘ 1

 

23
that \y-x| s 1 and all e e o,

(2.34) pe(y) 2 A(x)

where
A<x> = 9 r(x) exp {-<|1og e|v|log a|)(lx|+1)}-

Lemma 2.10. Under (A1) and 0A2), if nh2 d a then for each x
-1 * -1
and any two real numbers b < d, Pfh b‘S nT(A F (x)) S h d] w 0

uniformly in g.

 

*
Proof: Let x be fixed and note that nT(b F (x)) = zg=l Wj

where wj = (2h)-1T([x-h < xj s x+h]). By (1.1),
- - 1+h .
V(Wj) 2 28(2h) 2(m(x)m(x+1)) 1Fj1:f: Fj]::l-h' Thus, w1th
2
sn - 22:1 V(wj), by (2.34),
(2.35) 52 2 2 anA(x)A(x+l)

n m(x)m(x+1)

By the Berry-Esseen Theorem, page 288 of Leave (1963) and

the fact that the range of w = (2h)-1|T|(x) for j = 1,2,...,n,

j

n
(2.36) gn{h‘1b s zle s h‘ld} s §(d*) - §(b*) + c(snhf1|T|(x).
j: '

* -1 * -1
h s b = h b - if P, w, and s d = h d - 2? P w
w ere n J=1 J( J) n J=1 j( J)
and c is the Berry-Esseen constant. Since
* * -
§(d ) - §(b ) S (snh)-1(d-b) and nh2 * m, by (2.35) the right

hand side of (2.36) converges to zero which completes the proof.

Lemma 2.11. Under (Al) and (A2), if nh2 ﬂ m, for each x,

 

 

24
MnCX)‘” 0 uniformly in g.

Proof: Let x be fixed. For i = 1,2,...,n,
P {am P*(x)) < -a(2hm(x))'1} s P {T(A {(x)) < 0} S P {anx F*(x))
—n --n 1 '_n

< (2hm(x+1))-1}.

Hence, by the definition of Mn(x) in (2.33), Mn(x) is bounded
by the difference between the upper and lower bounds of the above
inequality. This difference can be expressed as

-1 * -l .
gnf-a(2hm(x)) S nT(A F (x)) < (2hm(x+1)) } wh1ch converges to
zero uniformly in g by Lemma 2.10, which completes the proof.

We now note that by the bound on C(G) of (2.31) and (A2)
(2.37) peso S V(X)

where V(x) = E(sx[x > o] + QXEx s 0])m(x).

*

Proposition 2.3. With in defined by (2.30), under (A1) and (A2),
. 2 * .

1f nh d G, DnQQ‘in) O uniformly in g.

Proof: Let x be fixed and recall the definition of In in (2.5).

By the bound on p6 of (2.37), bounding |B-a| by B, we have

—n

n
(2.38) lIn(x)| s sv(x)n'1 2 IP {T(A F:(x)) < o} - [T(P(x)) < o]|.
i=1

, 'k .—
By Lemma 2.9 and the Tchebichev Inequality, T(A Fi(x)) - T(p(x)) * 0
in P-measure uniformly in g and i. Using the same construction
as in Corollary 2.1 following Lemma 2.1, there exist two sequences

{on} and {en} of positive reals both decreasing to zero such that

 

 

in

th(

 

 

25

\T(§(x))| > 65 implies §{[T(A F:(x))< 0] = [T(E(x)) < 0]} 2 1-en

for i = 1,2,...,n and all 2. Hence, if |T(p(x))| > 5n, the

right hand side of (2.38) is bounded by Bv(x)en. If [T(p(x))] 5 6n,

since by (2.9), T(p(x)) = n'12§=1(ei-a)e:c(ei), adding and sub-

tracting qn(x) to the term [§n(W:(x1,...,xi_1,x,xi+1,..-,xn))-¢G (X)]
n

of the expression defining In in (2.5), where

1 * _ *
qn(x) = 41:31:; gum) Fi(x)) < o} + 1:; znird F100) < 0}) .

we have IIn(x)| s m(x)(5n + qn(x)én) + Bv(x)Mh(x).

Since en,6n and Mn converge to zero uniformly in g
for each x and IInl S Bv which is u-integrable, it follows
from Lemma 2.6 that p(llnl) a o uniformly in g, which completes
the proof.

Consider the compound testing procedure defined by

(2.39) 11m“) = [rd F*(xi)) < 0]-

Theorem 2.3. Under (A1) and (A2), with 1n(xn) defined by (2.39),
if nh2 w a, then Dn(§,ln) * O uniformly in g,
* * -1
Proof: Since T(A F (xi)) = T(A Fi(xi>) - a(2hnm(xi)) ,by (2.1),
<2 40) In (M ) - R (9 PM s Bn-IIZIIP {0 srd P*(x ))
' n JLn rl-Ln i=fﬂ i i

< a(2hnm(xi))-1}.

Proceeding as we did to obtain (2.5) and applying the bound on

pe(x) in (2.37) we get that the right hand side of this inequality

 

 

 

26

is bounded by:
-1 n * -1
(2-41) BH(V(X)n Z P {0 S T(F (X) S a(Zhnm(X)) 1)-
i=1 -n i

Since for i = 1,2,...,n,

gn{o s T(l F:(x» s a(2hnm(x))‘1} s gn{-a(2hm(x))'1

s um F*(x>) s <2h>'1|T| so},

the integrand of (2.41) converges to zero uniformly in g for
each x by Lemma 2.10. Since this integrand is bounded above
by V(x) which is u-integrable, the integral of (2.41) converges
to zero uniformly in g by Lemma 2.6. It follows that the left
hand side of (2.40) converges to zero uniformly in g. Hence,

by Proposition 2.3 and the triangle inequality, Dn(§,in) d 0

uniformly in g.

Remark. Suppose we consider I to be an interval neighborhood
of +m. Redefine i to be a right divided difference, i.e.,

4 g(x) = hml g]:+h and let r(x) = inf {r(y)|l 2 y-x 2 0}. If
one modifies the techniques of this section accordingly, Theorem
2.3 can be obtained in this more general context. Taking I to

be of this form includes the Gamma and Negative Exponential

distributions as special cases.

 

 

 

CHAPTER III

ESTIMATING THE EMPIRICAL DISTRIBUTION FUNCTION

OF A PARAMETER SEQUENCE IN COMPOUND DECISION PROBLEMS
1. Introduction.

In a statistical compound decision problem one is faced
with a set of n problems all having the structure of a certain
component problem. For example,in Chapter II the component pro-
blem is a test on,the,parameter of an exponential family. In such
problems,procedures are desired whose compound risk (average risk
over the n problems) converges to R(Gn)’ the Bayes Risk versus
Gn’ the empirical distribution of the n parameters, in the com-
ponent problem. It is evident that knowledge of Gn is useful
in these problems.

Let F

9

Subset of the real line and f be a corresponding density with

6

respect to Lebesgue measure u. Let x = (x1,x

be a distribution function for each 9 E 0, some

2,...) be a

sequence of independent random variables where xi has dis-

tribution function Fe , henceforth abbreviated to Fi’ and
i

91 E 0 for i = 1,2,... . Also abbreviate fe to fi' Let
ﬁ 1
E - Hi-lri and Gn be the empiric distribution of the n

parameters corresponding to x1,x2,...,xn.

Define the following functions
_ -1 n
(3-1) F = G (F ) = n E F ,
n 9 i=1 i

27

 

 

disc
met:

foll

L0\w

dist:

intez
WhOSe
cm.
of ”DJ
In thl
601'!qu-l
C0115 ide

“(or (11

 

28

32 - -1n
(-) f-Gn(fe)=n.2 fi
1=l
and note that f = dP/du. Let
* _ n
03) Fm)=n12hisﬂ.
i=1

For any real-valued function g of a real vaiable, say x, define

(3.4) A g<x) = h'ls]:+h. h > 0.

We now make some remarks about the Lévy metric which is

 

discussed on page 215 of LOEVe (1963). The Lévy metric is the
metric on the space of all distribution functions defined by the
following distance formula. For any two distribution functions

F1 and F2, letting d denote distance,

d(F1,F2) = inf [c > o|for all x, F1(x-e)-e s F2(x) s F1(x+c)+e}.

Lobve mentions that convergence in Lévy metric of a sequence of
distribution functions is equivalent to complete convergence.

In section 2, we consider the family of uniform on the
interval (0,9) distributions, 9 E (0,“) and exhibit an estimator
whose Lévy distance from Gn converges to zero a.s. E for a
certain class of E's. In section 3, we deal with the family
of uniform on the interval [9,6+l) distributions, 6 E (d@,°).

In this case, we find an estimator whose Levy distance from Gn
converges to zero a.s. E for every g. In section 4, we again
consider these two families and assume that the 6's are i.i.d.

according to some distribution G. We then apply the results of

 

fut
155
the
for
cha‘

SEN

cont

 

 

find
the q

1 Same .

It then

 

 

29

sections 2 and 3 to the problem of estimating this prior dis-
tribution function G.

Robbins (1964) treats the general problem of estimating a
prior distribution function G. Under certain conditions, he
shows that if the estimate of G is chosen so that the resulting
mixed distribution function is within en(en$v 0 as n w no)
of minimizing over the class of possible mixed distribution
functions, the Sup norm distance from the empiric distribution
function of the observations (x1,x2,...,xn), then, under certain
assumptions, almost surely the estimator will converge to G on
the continuity set of G. However, no explicit method is given
for obtaining this estimator. The family of section 2 of this
chapter is discussed in Robbins' Example 3 and the family of ~
section 3 is a special case of his Theorem 2.

Deely and Kruse (1968) further asSume that Fe(x) is
continuous in x for each 6. They then exhibit a method of
finding an estimator satisfying Robbins' condition. Calculating
the estimate involves finding an optimal strategy in a certain

game.

2. Uniform (0,9) Case.
We consider the following family of distributions. For

6603(01Q): let
-1
fe(x) = e Eo<x< a].

It then follows that

 

 

[ram

defin

Lem

 

 

30

0 x S 0,
Fe(x) = x9‘1 0 < x < e,
l x 2 6.

Thus, for any x, by the definitions of P and f in (3.1)

and (3.2) respectively,
(3.5) F(x) = Gn(x) + xf(x).
We estimate Gn at any point x 2 0 by

* * *
(3.6) Gn(x) = F (x) - x 1 F (x)

*
where F and A are defined by (3.3) and (3.4) respectively.

For each n form the following grid:

= < <...< <
0 xno xn1 an an+1

and let 8 = x For any c > 0, define for all x 2 0.

nN+l'

An(x) = {xlx A F*(x) > (x-e)f(x-e) + E]:-¢ + e/Z},

Bn(X) = {ilx b F*(x> < <x+c)?<x+e> - Elf" - e/2}.

Note that F(x) can be written as either F(x-c) + Ej:-e

or P(x+€) - Ej:+'. The following lemma then follows immediately
*

from equation (3.5), definition (3.6) which defines Gn and the

definitions of An(x) and Bn(x).

Lemma 3.1. For any e > O, for each x 2 0

 

 

it fol

W in

     

 

31

{§|G:(x) < Gn(x-€) - a} C {§IF*(X) < F(x) - 9/2} U An(x),

{36:00 > Gn(x*+e) + 63} C {5|P*(x) > F(x) + c/z} U Bn(x).

Lemma 3.2. If h e 0 and 2:; N exp {-nhzeZ/ZBZ} < a, then

1

'J

N
EIjEOCAn(xnj) U Bn(xnj)) i.o.} = 0.

Proof: Let 0 < x < B be fixed and note that

 

* -1 n
xAF(x)=n beExin]. -
i=1 '

It follows from the definition of f1 that the variables

x AExi S x] have expectations x A Fi(x) S xfi(x). It then

follows that x b F*(x) has expectation bounded above by xf(x).

Since If is decreasing on x >'0, we bound 7P]:_€ = x-c f du

below by 3 f(x) if x 2 e and xf(x) if x < c. Thus

subtracting the upper bound, xf(x), on.the expectation of

x A F*(x) from the right hand side of the inequality defining

An(x) yields a quantity which is bounded below by 3/2. Hence,

by Theorem 2 of Hoeffding (1963), since x < B,

§(An(x)) S exp(-nh232/282). Then, since .ECAn(O)) = 0 for all n,
N

(3.7) y U A (x )) S N exp(-nh2c2/ZBZ)-
1-0 n nj
If h.S e, fi(x+c) > 0 implies A Fi(x) = fi(x+c) and
it follows x»b Fi(x) 2 xfi(x+t). Hence, the expectation of

* ._
x A F (x) is bounded below by xf(x+e). Hence, since

 

 

 

32

F1131: 2 ef(x+c), subtracting xf(x+|:) from the right hand side
of the inequality defining Bn(x) yields a quantity less than or
equal to -¢/2. Again applying Hoeffding's Theorem 2, since

x < B and §(Bn(0)) = 0 for all n we obtain, for h S e,

the bound of the right hand side of (3.7) for EGJ§=OBé(xnj)).
Since the infinite series formed by summing this bound over n
converges by assumption, by the Borel-Cantelli Lemma, the proof

is complete.

Define the distribution function G“ by Gn(O-) = 0,

 

Gn(B) = l and for 0 S x < B

G (x) = max {G*(x )IO S x S x}
n n nj n '

j

*
Note that Gn(0) = 0.

____—_._‘ = - S S —+ _.
Theorem 3.1 If 6 max {xnj+l xnj|0 j N} 0’ h 0,
Gn(3) “ 1 and for all e > 0

a
2 N exp(-nh262/2B?)< 0°,
n=l

then d(Gn’Gn) * 0 a.s. 2.

Proof: Let a > 0 be arbitrary. By the extension of the
Glivenko-Cantelli Theorem to non-identically distributed in-
dependent random variables, see Theorem 4.1 of Wolfowitz (1953),

_ *
a.s. §,F - F w 0 uniformly in x. It then follows from Lemmas

3.1 and 3.2 that

N
(3.8) _P;{ U (one:n

j - c) - e S G:(xnj) S Gn(xnj + e) + ¢)'i.o.} = 0.
J=0 . .

 

 

 

 

 

 

 

311

(in

Win:

[Film]

 

 

 

33

Since for 0 S x < B, én(x) = Gn(x') where x' is the largest

x“j which is not larger than x,
U {l‘ N *
x > + C > + +
cage—Gum snore) z} jgomlcnsnj) and“ e) c}
and if 5 S e,

)< Gn(xn - e) - C}-

J

-2 -
0;J B{alcnm < Gn(x c) e } Cijgo{glcn(xn j

Since 6 ~ 0 and Gn(8) ~ 1, by (3.8) and the fact that an - Gn

 

for x < 0, a.s. E ,for n sufficiently large, d(Gn,Gn) S 2:,

which completes the proof.

3. Uniform [9,9+l) Case.
We now consider the following family of distributions. For

9 60 = (-°°.°°).
fe(x) = [e s x < e + 11.

It then follows that for all x

(3.9) f(x) = Gn(x) - Gn(x-l).

By (3-9),

(3.10) c (x) = 2 f(x-r).
n r=0

* *
Since F (x) S Gn(x) S F (x+1), we estimate Gn at a
*
point x by Gn(x) which is the truncation to the interval

[F*(x), P*(x+1)] of z:=o i F*(x-r), i.e.

 

 

34

(3-11) G:(x) ‘- m :01 F*(x-r))v F*(x>)/\ F*(x+1)}.
r:

For convenience we assume that h S 1.

Lemma 3.3. For any 3 > 0, if h S c, then for all x,

* I 2 2
£(ExIGn(x—e) - c s Gn(x) s Gn(X+c) + e}‘) s 2 exp(-2nh e ).

*
Proof: Since the truncation involved in the definition of Gn
can only improve the estimator, it suffices to prove the lemma

for the estimator Tn’ defined by

Q * 1 n a
T (x) = Z A F (x-r) = n- 2 2 A Ex. S x-r].
n . 1
r=0 i=1 r=0

By the definition of F in (3.1),

_1 °‘_ x+h-r
E(Tn(X))= h 2 FJx-r
r=0
By (3-9).
” — +h ” h
2 P1“ ‘r = z (fx+ "(c (t) - c (t-l))dt).
r=0 x-r r=0 x-r n n
. x+h-r x+h-r;1 .
Writing x-r Gn(t-l)dt as x-r-l' Cn(t)dt,.we see that the

right hand side is a telescopic series and we obtain

Q
- +h- +h
(3.12) {BP]:_r r = I: Gn(t)dt.
r:

By Hoeffding's (1963) Theorem 2, since by (3.12),:(Tn(x)) 2 Gn(x),
2
£(Tn(x) < Gn(x-e) - e) S exp(-2nh2e ). Similarly,if h S e, by

(3.12), F(Tn(x)) S Gn(x+e); applying Hoeffding's bounds again

 

 

 

 

35

2
EiTnoc) > Gn(x+c) + e) S exp (-2nh c2), and the lemma is proved.
Let 6 = N-1, N a positive integer depending on n and

consider the following grid on the real line:
...< -26 <-6 <0< 6< 26 <...
Define the following distribution function.

(3.13) Gn(x) = sup {6:(j6)lj6 s x, j = o, i 1, i 2,...}.

 

Theorem 3.2. If Z:=1N exp(-2nh252) < m for any a > 0, N r w
and h a 0, then for any g, a.s. F, d(Gn,Gn) ~ 0, where Cu is

defined by (3.13).

Proof: Let c > 0 be arbitrary. Let h S e and 6 S e. Let

*
J be the largest integer such that F (J6 + l) < c. Define the

(j+l)6+l 2 c

*
following Subset of the real line, letting .J= {le Jj5

j 2 J, j = o, i 1,...},
An = u [i6.<j+1>6>
jE,J
Note that there are at most L = (N+l)M grid points in An where
M is the smallest integer greater than or equal to 5.1. Also
*

note that An may be empty. If x < J6, since F (J6 + l) < e,
Gn(x) < e and Gn(x) < e and it follows trivially that

- *(mHN+1
Gn(X‘€) - e S Gn(x) S Gn(x+€) + c. For m 2 J, let F ]m6 < 6.
Then for x E [m6,(m+1)6), since both Gn(x) and Gn(x) are in

* * a

the interval [F (m6),F ((m+1)6 + 1)], Gn(x-c)-e S Gn(x) S Gn(x+e) + e.

Let [m6,(m+1)6) C-An. For all x in this interval

 

 

 

 

 

I’d

Loo

be f1

 

 

Gn(x) = Gn(m6). Thus,

 

u {xIGnOO > Gn(x+e) + e} c U {alc*(36> > a (jam + e}
. EA n n

x€An J5

and since 6 s e,

A *
xléA fgglcn(x) < Gabi-2:) - 6} C jéLéA [Elcn(j6) < Gn(j6-e) - a}. #3“

n n E
The ‘2 measure of the union of the two right hand sides of the

above inclusions, by Lemma 3.3 is less than or equal to

 

2 2 Q 2 2
2L exp {-2nh e ). Since 2£=1N exp (-2nh c ) < a, by the Borel-
Cantelli Lemma

E( U {5|Gn(x-26) - e 5 6mm s Gn(x+c) + e}'i.o.) = o.

x
n

It follows that a.s. ‘E, d(Gn,Gn) S 2: for n sufficiently

large, which completes the proof.

Remark: We have tacitly been assuming, in both sections 2 and 3,
that d(Gn,C ) is for each n a Borel-measurable function of

n
in ,where the o-field on the space of xn's is the n-dimensional

Borel sets. It will be shown that these measurability assumptions

are satisfied in the proofs of Corollaries 3.1 and 3.2.

Lemma 3.4. For a > O, a > O and all c 3

a
0’
2 nc exp {-an }4< 0.
n=l

2322:: It suffices to prove the lemma for c > 0. Let c > 0

be fixed and let m, a positive integer, be such that x 2 m

 

 

37

implies (x+1)c < x2e. Then
a c a 2
2 n exp (-an ) < I“ x c exp (-axa)dx,
n=m+l m
and the integral on the right hand side can be shown to be finite

by the change of variable: y = xa, which completes the proof.

. . C . . . 41
Remark: Lettlng N = n , c a p051t1ve integer, h = n and

B = n.Y with a,Y > 0, and a + Y < k, by Lemma 3.4 the series
of the hypothesis of Theorem 3.1 converges. Letting N = nc,

c a positive integer and h = nqa, 0 < a < a, again by Lemma

3.4, the series of the hypothesis of Theorem 3.2 converges.

(,4, Estimating the Prior Distribution.

.Let (8,50 be the measurable space consisting of the

. . m
real l1ne and the Borel f1e1d. Let (R ,EF5, where m is a
positive integer or infinity, be the usual product space. We
now drop the condition that F9 is absolutely continuous with
respect to u for each 6 E Q.

We refer the reader to page 137 of Ldbve (1963)for a

brief discussion of regular conditional probability.

Lemma 3.5. If F 9 6 Q, is a regular conditional probability

a,
so
measure on (RqﬁD, then F, g E Q , is a regular condition

probability measure on (R?,Bw).

Q) Q
Proof: Since F_ is a probability measure on (R ,B ) for-each

Q ,we only need show that for each set B 6 SP, F(B) is a

 

 

38

meas\xral>le function of g. If B is a meaSurable cylinder set,

i e B — m
‘ ° ‘ IIi=1 Bi’

Bi 6 B for i = 1,2,..., where only a finite
number of Bi's are not equal to R, F(B) is a finite product
of terms of the form Fi(Bi) and hence is a measurable function
of 9, Thus, since it is easily seen that the class of all
subsets whose F- measure is a measurable function of 9 is a
o-field and since the measurable cylinders are the generators
of 59, the proof is complete.

Suppose 9i, i = 1,2,..., are i.i.d. according to some
distribution G. Let Gm(§) denote the marginal distribution
on E of the joint distribution on pairs (9,5) resulting from

a:
G on 9 and F on x.

Theorem 3.3. If 9i are i.i.d. according to G and Cu is
an estimator of G based on (x ,x ,...,x ) such that
n l 2 n
d(C ,G ) is jointly measurable in (9 ,x ) for each n and
n n -n -n

if Gm {§|d(Gn,Gn) w 0 a.s. E} = l, and F 9 E 0, is a regular

9’
conditional probability measure,then d(Gn,G) * 0 a.s. 69(F).
M: By the triangle inequality d(&n,c) s d(Gn,Gn) + d(Gn,G).
By the Glivenko-Cantelli Theorem, page 20 of Loeve (1963),
d(Gn,G) * 0 a.s. Gm. Let C be the set of pairs (9,5) such
that d(6n,cn) -' 0. c is jointly measurable in ($5) and
since by Lemma 3.5 F is regular, the measure of C is
G°(§(C)). Since F(C) = O a.s. G”, the proof is complete.

Let the U-field on 0 be the restriction of the Borel

Field to 0.

 

39

Corollary 3.1. If Fe corresponds to the uniform distribution

on the interval (0,9), 6 E 0 = (0,w), if 9,, i = 1,2,..., are
1

i.i.d. according to G, if Gn is defined as in Section 2 of

this chapter and if the hypotheses of Theorem 3.1 are satisfied

with B .. on replacing Gn(B) ... 1, then d(<‘;n,G) ~ 0 a.s. G°°(§).

 

Proof: Let B be a Borel set. Fe(B) = 9-1u(B(0,6)) which is .t]
a continuous function of 6 > 0 and hence Fe, 6 6 0, is a

regular condition probability measure.

 

For each n, Cn(§n) assumes one of a finite set of values,
each of which is a step function with discontinuity points
restricted to the selected grid. The set of ‘gn's for which
an assumes a particular value is a finite union of sets each
of which corresponds to a particular set of values for the

* *
restriction of Gn to the selected grid, where Gn is defined
by (3.6). Each set of this union is a finite intersection of
sets which correspond to a specific value of G: at each point

*
of the grid. Hence, since it is easily seen that Gn 18 a

Borel-measurable function of ‘5“ for each x, the set of gn's

A 0
where G assumes a specific value is measurable. Canalder
n

the partition of an 1?.n into sets J1,J2,...,JM where M
is the number of possible values of Gn and each Ji’
1 = 1,2,...,M, is the product of On and a set of gn's‘ on

which C aSSumes one of the M possible values and hence
n

' A = A . S. C3
18 measurable. Then d(Gn’Gn) 2?;1 Jid(Gn’Gn) in

Q ' ' ‘ A s a
d(Gn’Gn) 18 continuous in ﬁn on each Ji’ Jid(Gn,Gn) 1

 

 

 

 

 

 

 

 

 

 

 

 

"hem

and

40

measurable function of pairs for i = 1,2,...,M.

($1,351,)
Hence, d(Gn’Gn) is jointly measurable for each n.
Since the ei's are i.i.d. according to G, by the

Glivenko-Cantelli Theorem, 8 ~ a implies G (B) r l a.s. Gco
n .

Thus, by Theorem 3.1, Gm£§Id(Gn,Gn) * 0 a.s. E} = 1 and it

 

follows by Theorem 3.3 that d(Gn,G) H 0 a.s. dm(z), 5‘

Corollary 3.2. If Fe corresponds to the uniform distribution<n1 %
F

[e,e+l),96(-m,a) and if Gdis defined by(3.13) and if the con-

 

ditions of Theorem 3.2 are satisfied, then d(Gn,G) ~ 0 a.s.

Gag) .

Proof: Let B be a Borel set. F6(B) = u(B[9,9+l)) which is
a continuous function of 9 and hence Fe is regular. By

Theorems 3.2 and 3.3 it remains to be shown that d(Gn,Gn) is

a measurable function of pairs for each n. For any

(9 ,x )
-n -n
c > 0,
3. A p = ' A l _
( 14) Ed<Gn,cn) < c} 9L3 e(Ed<Gn,Gn> < e} n {d(cn,cn) < c e}),
_n)
where g; is an n-dimensional vector with rational components

and G' is its corresponding empiric distribution function and
n

e is a positive rational. For any fixed -§n and any c' 2 0

~ A S'= -'-'s“ sG +'+c',
(3 15) {d(Gn,Gn) c } LJfGn(r c ) c Gn(r) n(r c ) }
r
where r ranges over the set of rationals. Since it is easily
seen that 6*(x), defined in (3.11% is a measurable function of
n

5n for each x, it follows by the definition of Cu that Gn(x)

 

 

41

is measurable for each x. Hence, it follows from (3.15) that

d(G ,G ) is a measurable function of x for each fixed 6 .
n n -n -n

Also, for each 6', d(G',G ) is continuous in 6 and hence

-n n n -n

measurable in .9“. Thus, each of the sets of the countable

union of the right hand side of (3.14) is an intersection of

two measurable cylinders in the space of pairs (gn’En) and

consequently is measurable and the proof is complete.

 

.
"In my .. _

 

 

 

 

 

Let

Hhic

“nth
these
599”!

the j,

 

CHAPTER IV

SOME EMPIRICAL BAYES SOLUTIONS

1. Introduction.

As previously mentioned a symbol, say F, representing a
distribution function, also represents the corresponding Lebesgue-
Stieltjes measure. For 0 a subset of the real line, let

{Fele E 0} be a class of distributions for a random variable
x possessing densities, £9, with respect to Lebesgue measure, u.
Let G be a distribution on 0.

Define

K(X) = G(Fe(X))

which is the marginal distribution function of x of the pair
(9,x), where (9,x) possesses the joint distribution resulting
from G on 6 and F9 on x.

Let

k(X) = G(fe(X))

which is a determination of the density of K with respect to

u. Let (x1,x2,...) be a sequence of i.i.d. according to K
random variables and K? be the product measure on the space of
these sequences. Let P be the product measure on the space of
sequences (x1,x2,...,(9,x)), i.e. P is the product of ﬁn and
the joint distribution of (9,x).

42

 

 

I ll ['11

 

43

The Bayes Estimator verSus G in the problem of estimating
9 based on observing x, under quadratic loss, is the conditional
expectation of 6 given x, Px(6). Denoting the Bayes risk
verSus G by R ,we have R = P(¢(x) - 6)2, where ¢ is a Bayes
response.

Our objective is to find an estimator, say ¢n’ based on
(x1,x2,...,xn), of ¢, for which Rn = P(¢n - 0)2 r R as
n H w. If P(¢n - 9)2 < m and P(¢ - 9)2 < m, then

P((¢n - ¢)(¢ - 9)) = 0 and it follows that
2
(4.1) Rn - R = P(¢n - ¢)

We shall also consider the technique of first estimating G
based on (x1,x2,...,xn), and then using the Bayes Estimator
verSus this estimate of G to estimate 9.

Throughout this chapter, + or - appearing as an affix on
the lower limit of an integral means respectively to exclude and

include the lower limit in the range of integration.

2. Uniform (0,9) Case.
We consider the same family of distributions discussed

in section 2 of Chapter III. For 9 E O = (0,6),

fe(x) = e‘1[o < x < e]

and

0 x S 0,
Fe(x) = x 9-1 0 < x < 9,

1 x 2 9.

 

 

 

De

 

and

"here

 

 

44

In this case we have the following:
-1
(4.2) k(x) = carom) = [x > OJJ:+9 (16,

(4.3) K(x) = G(Fe(x)) = C(x) + x k(x).

Henceforth, we only consider x > 0. From the definition of f

c<e£e<x>> 1_G x

(4'4) Pit“) = G(fe(x)) = k(x)

Note that k(x) = 0 implies G(x) = l and K(x) =

k(X) > 0, by (4-3),

l-K x
pxae) =x+—k—G%)-.

Define

 

and consider the following Bayes response,
(4-5) ¢(x) = V(x) + x.
We now make the following definitions:

-1 n
Kn(x) = n .2 [xi S x],

i=1
i.e. Rh is the empiric distribution of (x1,x2,..
I -l x
kn(x) - h KnJX-h

where h depends on n and is positive. Define

1. If

° axn) :

e_and k,

 

 

 

 

45

(“-6) line) = (anon/x (1 - Kn(x))/kn(x))[" 2 h],

where undefined ratios are taken to be zero and an is a bounded

non-negative function of x > 0 for each n and
(4-7) ¢n(X) = x + ”(X)-
We now assume

(A1) C(62) < ...

 

By (Al) and Jensen's Inequality P(¢2) < ¢. Hence, since an
is bounded for each n the conditions implying (4.1) are satisfied.

Thus, we are interested in choosing an so that P(¢n - ¢)2 d 0.

Remark. If the failure rate of the marginal distribution of x
is bounded away from zero, W is bounded, say by C. If we
estimate W by (l-Kn)/kn truncated at C, under the proper
conditions on h, at every x where k(x) is poSitive and is
the derivative of K, hence a.s. K, the estimator converges in
Q
R measure to ¢(x). Then, since P0?n - ¢)2 = P Pxﬂn - V)2,
by twice applying the Bounded Convergence Theorem,
P 2 2 O

(in-l) =P(¢n'¢) ~.

The following example illustrates a parametric class of

prior distributions for which a rate of convergence to Bayes
risk can be obtained for each member of the class with a non-

truncated estimator.

Example 4.1. Suppose G << u and

 

 

 

 

sucl

 

 

46

d6 2 -h9

——= >

i A 9 e [6 03,
where X E (0,“). Then for x > 0,

k(x) = rﬁe-ldG = xzf; e-Aede = x e-Ax.

Hence, #(x) = e-xx/X e->‘x = h-1. We estimate ¢(x) = k-1 + x

by ¢n(x) = Q + x where ; = “-12? Since C(92) < w

i=1 xi'
‘2 . - -1 2
and P(x ) < ”e (4-1) holds. Since P(x) = k , P(¢n — ¢) =

V(x) = n-lk-z. Thus, for each G in this class, Rn - R is

 

of order n-l.
We now proceed in the general problem of finding an

such that P(q)n - ¢)2 m 0.

Lemma 4.1. Under (A1), if nh2 4 w, h d O and an(x) a G for

each x, p((yn - ¢)')2 ~ 0.

Proof: Let x e A = {x|k(x) > o, x 4519(6)}, where 19(0) is
the discontinuity set of G. Since k is continuous at x,
2
k(x) = K'(x). By the Tchebichev Inequality, since nh ~ m,
-1 x , m _ -l x “
kn(x) - h KJx-h 0 in K -measure. Since h KJx-h k(x)

as n ~ a,

(4 . 8) kn(x) -. k(x)

Q
in K -measure. By the Glivenko-Cantelli Theorem, page 20 of

\
Loeve (1963) and (4-8),

I'Kn(x) # l-Kgx!

k111(x) k(x) ‘ V(x)

 

 

 

47

C3
in K -measure. Since an(x) r m,

(as wgw-wmn*~m
Since
mam os(wn-w32sl§

by (4.9) and the Bounded Convergence Theorem

- 2 m - 2
(4-11) Px((¢n - V) ) = K ((¢n - V) ) r 0.

 

Since P(A) = 1 and under (A1), P012) < an, by (4.10), (4.11)
and the Dominated Convergence Theorem, P Px((*n - *)')Z =

P((Vn - H')2 m 0 which completes the proof.

Lemma 4.2. If x 2 h and k(x) > 0,

c(a: + han) (1+) + 2(h-1a: + an)
(n h k)!5 nkk

 

+ 2
wan - l) > s

+
where c is the Berry-Esseen constant and 1 denotes

(1-K(h))-% which decreases to one as h ~ 0.

Proof: Let x 2 h be such that k(x) > 0. Since
+
P(¢n - ¢ > v) = 0 for v 2 * where * = (an - W) ,
+ 2 * 2
gun-w) =hr¢%-¢>wa. ut0<v<t
Vn - V > v iff G < 0 where for i = 1,2,...,n,
Wi = h-1[x-h < xi S x](¢+v) - [xi > x]. Since the xi's are

i.i.d., the wi's are i.i.d. Since k is a decreasing function

 

 

II} I

 

 

48

(4.12) Px(w1) = l(¢ + v) - (1-x) 2 vk,

where we define A = h-IKJ:_h. By the Berry-Esseen Theorem,
page 288 of Loeve (1963), with V(wl) = 62 and r denoting

the range of wl,

(4.13) px(E < 0) s e(z) + c n-krU-l,

%

- ~1
where z = -n Px(w1)o and c is the Berry-Esseen constant.

By (1.1), hko 2 {(¢+y)zl(1-hl)}%. Also, hr = (¢+y) + h.

Hence,

fgro'ldvz s (hl(1-hl))'5f;(y + v + h)(¢ + v)'1dv2.

Note that (1-hl) 2 1 - K(h) for all x > 0. Thus, bounding
the integral in the right hand side of the above inequality,

we obtain

(4 14) fgro'ldyz s {hl(1-K(h))}"’(*2 + 2h*).

By a weakening of the tail bounds of the standard normal

distribution function, page 166 of Feller (1957), the lower bound
on Px(w1) of (4.12) and the fact that O S r, @(2) S (nﬁfk)-1r.
Hence, since hr S an + h for 0 S v S *9

* 2 2(h-la2 + a )

(4.15) f §(z)dv s ———--£L—-—lL-
0 n!5 k

Since A 2 k for x 2 h and * s an, by (4.13), (4.14) and

k

(4.15), replacing (l-K(h))- by (1+):

 

 

m
the
Seco
by in

Lemma
(4.16)

Now 9X

 

 

49

2 -1 2
c(an + Zhan) (1+) 2(h an + an)

 

* A _ 2
P (w < O)dv S ,
I0 x (n h k)% nkk

which completes the proof.

Let

nan“, = sup

Hanll2 = 44(a§))*,

llanll1 = Man).
Recall that an is a bounded non-negative function on x > 0

for each n.

Lemma 4.3. With 1+ as defined in Lemma 4.2,

P((Vn - l>+)2 s I3 12d? + n"‘<<1+)uh'l<|la,,l|,.,llanH2

+ 2h|lanll2> + zs'luanu: + Hamil,»-
iro_of: P((‘ln - *)+)2 = $3 lzdr + J: wan - w>+>2dr. Since
the P-measure of the set where k > 0 is one, converting the
second integral of the right hand side above to a u-integral

by introducing k in the integrand and applying the bound of

Lemma 4.2, we obtain
(4.16) I: pxmyn - ¢)+)2dp s n-kj:((1+)ck35h-15(a:+2han)+2(h-1ar21+an))du..

liow extend the range of integration of this bound to (0,”).

 

 

 

 

50

. 2
Then, Since an S Hanna’s.n and p(k) = 1, by the Schwarz In-

equality the right hand side of (4.16) is bounded by

n'i‘E<1+>ch'*<llén\l,\lanll2 + 2hnannz> + 2(h'1HanH§+llan||1>1.
which completes the proof.
Theorem 4.1. Under 0A1), if an(x) * w for each x, h m 0 and

Mann, = 005‘),

Hamil, =o<(nh2)3‘),
Han“, = cam"),
2
then P(q)n - ¢) d 0 as n ﬂ “.

jggggfz Under (Al), P(wz) < w and it follows by the conditions
of this theorem and Lemma 4.3 that P(Gn - ¢)+)2 * 0. By Lemma
4.1, since nh2 ~ 0°,P((¢n - H')2 r 0. Hence, P(q)n - ¢)2 =
P(Wn - ¢)2 H 0 as n ~ w and the theorem is proved.

We now consider the procedure of first estimating the
prior G and then using the Bayes Estimator versus the estimate
of G. As before let d denote the Livy metric. Noting that
K” is the same measure on the space of §~sequences as G°(§),
which is discussed immediately before Theorem 3.3, by Corollary
3.1 we can construct an estimator, Gn, of G such that
d(Gn,G) r 0 a.s. Rf.

The following example shows that the risk of the Bayes

EStimator versus a distribution function, converging in Levy

 

 

 

 

 

 

 

51
metric to G, may not converge to Bayes risk.

Example 4.2. Let 0 < B < 09 be a continuity point of G with

G(B) = 1 and C(9) < l for 6<B. Let

G(6) e<b,

n
A = S
Gn(9) G(bn) bn e < Mn,
1 9 2 M i
n
where M-°°,b =6 -c with O<c S1 and cVO and
n n n n n n

9“ satisfies the following conditions:

0 s emit a.

G(9n-) S 1 - cn.

Since bn’f B, d(Gn,G) -° 0. For x > 0, let

l-G (x)
A n -1
¢n(x) = ————- [lx+e dGn > 03,

-1 .
J:+6 dGn

which is a Bayes response versus G“. Let n be sufficiently
large so that bn > 0 and Mn > B. It follows that

A2 A2
K(¢n> 2 ftbn,e>on

-l
Fe([bn.a>> 2 1 - bnen ,

2 .
dK - Mn K([bn,8)). Since for en S 9 S B,

-l
K<Ebn.e)> = IF9(Ebn.e>)dc 2 <1-bnen )(1'G(°n‘))-

Thus, K(&:) 2 M:(1-bn9;1)cn 2 B'1(Mncn)2. Hence if Mncn —. co,

since C(92) < 6°, by the triangle inequality for L2- norm the

 

 

 

 

52

risk of using & converges to infinity. However, since
n

G(92) < m, the Bayes risk is finite.

3. Uniform [9,6+l) Case.
We now consider the class of distributions considered in

section 3 of Chapter III. For 9 E 0 = (49,69),
fe(x) = [e s x < 6+1].
In this case, we have
(4.17) k(x) = G(fe(x)) = G(x) - G(x-l),
(4.18) K(x) = C(Fe(x)) = C(x-l) + xk(x) -j’é:r1)+edc,

where the affix + on the upper limit of the integral means to
include the limit in the range of integration.

By the right continuity of G, k is right continuous and
hence, for all x, k is the right hand derivative of K. Thus,
we estimate k(x) by
knot) = h'lxnlfh.
where 0 < h S l and h is allowed to dependc on n and Kn
is the empiric distribution function of x1,x2,...,xn as in
Bac:tion 2 of this chapter. Note that central or left differences
0f K. are good estimators of k(x) when K'(x) = k(x).

n
By equation (4.17), for all x,
Q

(4.19) G(x) = 2 k(x-j).
i=0

 

 

 

53

(4.20) K<x> - If, k(y)dy = I:_lc(y)dy.
For all x, define

(4.21) G:(x) = .2 kn(x-j).
J=0

*
Lemma 4.4. If h H 0, for each x, Gn(x) H G(x) in Km-measure.
Proof: Let x be fixed. By (4.17),

” X+h-j ” x-J+h x-j-1+h
jfngx-J = j:g(fx_j c<y)dy - x_j_1 G(y>dy>.

Since the series on the right hand side is telescopic and

Ix+h-JG(y)dy H O as j H G, we have,

X-j
” +h +h
(4.22) 2 KJX '1 2 IX G(y)dy.
x-j x
1‘0
*
Since Gn (x) is the average of n i.i.d. random variables,

each distributed as h-12:=0 [x-j < x1 S x-j+h] whose expectation

is h-lif KJx—j+h and since nh2 H w, by the Tchebichev In-

J=0 x-j
* - -

equality, G (x) — h 1:? KJx J+h H 0 in Kﬂ-measure. Hence,
n j=0 x-J

by (4.22) and the fact that the right continuity of G plus

h converging to zero imply that h-lf:+hG(y)dy H G(x),

*-

Grle) ” G(x) in KD-measure, which completes the proof.

Since
G(9f6(x))

Px(°) = G(fe(x)) ’

by tile equations describing the functions k and K, (4.17) and

(4 . 18) respectively ,

 

 

 

54

= con + tx-Dkoc) - K<x>
Px(e) k x

Thus, we take as a Bayes response
(4.23) ¢(X) - (x-l) + V(X),

where the function V is defined as follows:

Since the conditional distribution of 9 given x
on (x-l,x], 0 S V S 1. Define the function ¢n by

*

G - K
(4.24) w = (L—J-‘Vom 1.

n k
n

0/0 is defined to be an arbitrary value in the interval

where

[0,1], We estimate ¢(x) by

(4.25) ¢n(x) = (x-l) + ¢n(x).

Theorem 4.2. With ¢n defined by (4.25), if nh2 H G and

h H O, Rn H R as n H w.

fﬂgagf: Since the conditional distribution of 6 given x is
ccnncentrated on (x-1,x], P(¢ - 9) = P(Px(¢ - 6)2) S 1 and
Hon - e) = 1>(1>x(¢n - e)2) s 1. It follows that (4.1) holds and
it is sufficient to show P(¢ - Wn)2 H 0. Let x be fixed.
Sinxze nh2 H G, using the Tchebichev and triangle inequalities

m
as iti the method used to obtain (4.8), in K -measure,

is concentrated

 

 

 

i.

     

55

kn(X) _. k(X) .
By the Glivenko-Cantelli Theorem, page 20 of Loeve (1963), a.s. KI,
Kn(x) H K(x)

Hence, by Lemma 4.4 and the Slutsky Theorem, page 174 of Lobve

(1963), if k(x) >10,
ln<x> ~ ﬁx)

in Ké-measure. We note that P({xlk(x) > 0}) = 1. Thus, by the
Bounded Convergence Theorem, a.s. P, Px(¢n - ¢)2 H 0. Again by
the Bounded Convergence Theorem P(PxOl'n - ¢)2) = P(‘Vn ' ¢)2 * 0:
which completes the proof.

Note that the assumption C(92) < o is not made in this
case. This assumption is sufficient for R.< w. However, in
this case, R S l for any G as was shown in the proof of
Theorem 4.2. The question arises as to whether or not the Bayes
.Estimator m can have an infinite second moment if R.< m. The
following example shows that this is possible. Obviously, in
ttris example, C(92) = ﬁ since by Jensen's Inequality,

G(ez) < co =2 p(¢2) < co.
kample 4.3. Let G be concentrated on I = {1,2,3,...} such
that: the mass at m E I is C mﬂa/2 where C is a normalizing

constant. Since ¢(x) = x-l + #(x) and it is bounded above by

1 and below by 0, P(¢2) = °° iff P(XZ) '3 °°- BY (4'17)

 

 

 

 

56
2 ” 2 m 2
P(x ) = f x k(x)dx = f x (G(x) - G(x-1))dx.
1 1
Thus, by the definition of C,

Q
P(xz) 2 C E mZm-B/2 =

m=l
. 2
and it follows that P(¢ ) = w.
We now consider the technique of using the Bayes Estimator
verSus an estimate of the prior, C. By Corollary 3.2, we can

A A Q
construct an estimator, Gn’ of G such that d(Gn,G) * 0 a.s. K .

 

A
Let Gn be such an estimator and redefine

x+ .
edG
(4,25) ¢n(x) = £1§Llli....§

A x
n x-l

where 0/0 is defined to be some value in the interval [x-1,x].

It then follows that for all x, x-l S ¢n(x) S x.
Theorem 4.3. With ¢n(x) defined by (4.26), Rn - R a 0.

Proof: Let D be the discontinuity set of G and let
D + 1 = (x+1lx e D}. Let x E A = {x|x e D U (D + 1), k(x) > o}
be fixed. Then, by the Helly-Bray Lemma, page 80 of LOEVe (1963),

since d(Gn,G) a 0 a.s. f”, we have that a.s. Ké,
x+ * x
d d d
I(x-l)+e Gn Ix-l e G’
" X X
Gn]x-1 G:lx-l '

Hence, ¢n(x) " Px(6) a.s. KG. Since x-l S Px(9) S x, by the Bounded

C<>nvergence Theorem, Px(¢n(x) - Px(9))2 = Kr(¢n(x) - Px(9))2 H 0.

 

57

Hence, since PQA) = 1, again by the Bounded Convergence Theorem,

P(¢n(x) - Px(9))2 ~ 0, which completes the proof.

4. Estimation of a Location Parameter in Certain Gamma Distributions.

Consider a family of distributions characterized by the
following density with respect to Lebesgue measure, u:

_ 01-1 -(x-e)
fe(x) = EFL“;— [x 2 9]

with a 2 1, where 6 E Q = (-m,+w) and T represents the Gamma

Function. Suppose that G is a distribution on O and assume
2
(A1) G(e ) < 400.

In this case

(4.27) k(x) = G(fe(x)) = f%&; Ifm(x-9)a-1e-(x-e)dG.

We adopt the convention that the upper limit of an integral is in-
cluded in the range of integration.
The Bayes Estimator in the problem of estimating 9 based

on an observation x, with quadratic loss, is

I:6(x-9)a-1e-(x-e)dc
(4.28) ¢(x) = “00km

 

Remark. By part (ii) of the proposition of section 4 of Teicher
(15961), a sufficient condition for identifiability of a class of
trainslation parameter mixtures is that the characteristic function
0f the generating distribution function (take the location parameter

to be zero) not be identically zero on a non-degenerate interval. In

 

 

 

this

tril
hen

M

m

Inv

 

58

this case, the characteristic function of the generating dis-

.. . .-01
tribution is (l - 1t) . Hence, the family of mixtures considered

here is identifiable.
Lemma 4.5. With ¢ defined in (4.28), if k(x) > 0,
x e-(x-t)

¢<x> = x - ”(x))-1]“ am).

Proof: By the definition of k in (4.27),

ajfne'(x't)dx(c) = (T(cv))deifie-(x-t)j‘fa(t-9)a'1e-(t-e)dG(9)dt.

 

Inverting the order of integration in the expression of the right

hand side and performing the inner integration yields

-(x-9)

(T(oz))-1J‘:°(x-O)ae dc(e) = Ifm(x-9)fe(x)dG(9).

This last expression is xk(x) - k(x)¢(x), which completes the

proof.
Define
Ifae-(x-t)dl((t)
(4. 29) V(x) = “—k—(XT—

We estimate ¢(x) by
:me-(x-t)th(t)
(4.30) inc.) = T Aan(x) ,
n

where Kn and kn are defined as in section 3 of this chapter

arui h is positive and allowed to depend on n and an is a

bourided non-negative function of x for each n. Also, define

the estimator to be zero in the case of an undefined ratio.

 

 

(4.3

Note

lt1

 

(4.31) {F(x) = ——————— .
Note that by the definition of k, for any x and e > O,
(4.32) k(x+c) 2 e"k(x).

It then follows from (4.32) that

(4.33) h'lxj:+h 2 e‘hk(x).

. 2 2
Since Fe(x ) = or(a+l) + 290! + 6 . by (A1),
2 2
(4.34) P(x ) = G(Fe(x )) < 0°.

Define

A = {xlk(x)> 0, k(x) = K'(x)}.

Lemma 4.6. Under (A1), if h _. 0, pa? - ¢)2 .. 0.

Proof: Let x 6 A. Then, since h -* 0, @(x) -* $(x). By (4.33),

”'(x) - ¢(x)| S (eh + 1)V(x). By (4.34) and the fact that (Al)

implies that P((pz) < on, since 01¢(x) = x - ¢(x), P(WZ) < m.

Since P(A) = 1, by the Dominated Convergence Theorem, P($ - ¢)2 -' 0

which completes the proof.

Lemma 4.7. Under (A1), if h -° 0, nh2 -° 0° and for each x
- 2

an<x> To». P((wn - in ~ 0.

Proof: Let x E A be fixed. By the Strong Law of Large Numbers,

\

x - - _ -
$-53 (x t)dKn(t) "' “rice (x t)dK(t) a.s. K“. Since nh2 -°°° and

h —. O, kn(x) -* k(x) in Koo-measure by the method used to obtain

 

 

 

 

 

60

(4.8) . Hence, since an(x) T 0°, Wn(x) -. ’Hx) in Ken-measure.

Since $(x) ~ V(x), ¢n(x) - $(x) v 0 in Ké-measure. By the

bound on G(x) of (4.33), (Hum - ion->2 s (ehwxnz.

Hence, by the Bounded Convergence Theorem, Px(f¢n(x) - $(x)}')2 v 0.
Since P(A) = 1 and under (Al), P(WZ) < a, by the Dominated
Convergence Theorem P((Vn - (if)2 ~ 0 and the proof is complete.
Lemma 4.8. P((Wn - @)+)2 S Zehn-!5(c+1)u(an + h-lai), where c is
the Berry-Esseen constant.

2m: Let * = (an - WWW“ - W32 {Egan - i > odtz.
since the integrand of the right hand side is zero for t 2 *.

Fix 0 < t < * and suppose x is Such that k(x) > 0. Then

[¢n - ﬁ > t] = [w > 0] where wi = [xi 5 xje

- ($ + t)h-1[x < xi $ x+h] for i = 1,2,...,n. By (4.33),
— -1 +h -h
(4.35) Px(w) = Px(w1) = -th x]: s -te k(x).

Consider the following bounds on 02 = V(wl),

(4.36) (ch'1)21<.]:+h s ((1 + c)h'1)2x3:+h s a2 s (1 + h'lan)2.

By the Berry-Esseen Theorem, page 288 of LoEVe (1963) and (4.35),

Sillce the range of w1 is bounded by 1 + h-lan,

Px(5 > 0) s @(z) + c n'ko'lu + h'lan),

where dz = -'n;§t e-hk(x) and c is the Berry-Esseen constant.
APPlying a weakening of the bounds on the tails of the standard

notTnal distribution of Feller (1957) and the bounds of (4.36),

 

 

 

 

61
we bound the right hand side of the above inequality by:

(1 + anh-l)eh c(1 + anh'l)
+ .
n1": k(x) nkt h'1(K]:-+h)!5

X

Hence, since (K]:+h)’5 2 x]:+h and by (4.33), h-lK]x+h 2 e'hk(x),

h -l
Zane (c+1) (1 + anh )

 

f3? (w>0)dt2s a,
x n k(x)

Noting that P({xlk(x) > 0}) = l, converting the P-integral of the

right hand side of the above inequality to a u-integral we obtain,

P((Wn - §)+)2 S 2n-keh(c+1)u.(an + h-lai), which completes the proof.
Define

¢n(X) = x - (ﬁnk)-
Theorem 4.4. If p(an) = o(n;5), p(arzl) = 0((nh2)%), h = 0(1) and

an(x) ¢ °° for each x, then under (Al) , Rn - R -* 0.

Proof: By (Al) and the fact that an is bounded for each n,

the conditions implying (4.1) hold and thus it suffices to show
’5

that P((bn - ¢)2 -* 0. Since an(x) 1‘ °° for each x, u.(a§)= o((nh2) )

implies that nh2 -' 0°. Hence, by Lemmas 4.7 and 4.8, POI!n - ‘3')?" —° 0.
By Lemma 4.6 and the triangle inequality for L2-norm, P(¢n - W)2 —~ 0.

Since for x E A, ¢n - ¢ = G(tn - 4t) and P(A) = 1, the proof is

Comp lete.

 

 

 

 

 

CHAPTER V

EMPIRICAL BAYES ESTIMATION IN EXPONENTIAL FAMILIES
1. _A Rate for the Discrete Case.

Macky (1966) dealt with the Empirical Bayes Estimation
Problem for the class of distributions considered in Section 2
of Chapter II. The family is characterized by the following
density with respect to u, where u is counting measure on the
non-negative integers, pe(x) = GXC(9)m(x), m(x) > o for
x = 0,1,2,... and 6 E O<Z (0,“). Let G be a distribution
on Q. As in Chapter III, let K be the marginal distribution
of x of the pair (9,x). Let x1,x2,... be a sequence of
i.i.d. according to K random variables and let P be the
product measure on the space of sequences (x1,x2,...,(9,x)),
i.e. P is the product of Km and the joint measure of (9,x).

Define k(x) = G(pe(x)) which is the density of K with

Irespect to p. The Bayes estimator in the problem of estimating

9 based on the observation x, assuming quadratic loss is:

¢<x> = (31:33)) = T(x) 515%11,
G(G C(9))
where T(x) = m(x)/m(x+1).
Macky (1966) took 0 to be the natural parameter space
for the family, i.e. {9|u(0xm(x)) < co, 9 > O}, and assumed

2
G(9 ) < 0°. He then exhibited a procedure for estimating 9,

62

 

 

 

 

 

 

 

 

 

 

 

 

 

based

risk

(A 1)

vhe

CC!

 

63

based on (x1,x2,. . . ,xn,x), whose risk converges to the Bayes
risk versus G as n -°°.

In this section, we assume
(A1) 0 = (026]: B < a:

where (0,6] is a subset of the natural parameter space. Under

 

certain other assumptions, we show that P((bn — ¢)2 = 0(n-AE),
where
kn(x+1)
¢n(x) = T(x) WA B
and

-1 n
kn(x) = n ifllzxi = x].

2
Again undefined ratios are taken to be zero. Let R = P(q) - 9)
and Rn = P(q)n - 9)2 as in Chapter III. Since B< on, ¢ 5 B

and C(62) S 82 and it follows that

_ 2
Rn-R-P(¢n“¢)

Lemma 5.1. ¢ is an increasing function of x.

* *
Proof: Define the measure G on O by dG /dG = C(9). Note
*
that G is a finite measure possessing all moments and that

for x = 0,1,2,...,

. *
Since G (at), r 2 0 is a log convex function of r, see 9.3b,

 

 

 

 

 

 

 

 

 

 

 

 

I'v

 

64

page 156 of Lo‘eve (1963),

<c*<e"+1>>2 s c*<ex)c*(ex+2>.

for x = 0,1,2,..., which completes the proof.

We now make the following assumptions which are part of
the group of assumptions made by Gilliland (1966) to obtain a
rate of convergence to zero of the modified regret, discussed
in Chapter II, in the sequential compognd estimation problem

for this family of distributions.

 

an 2p§n<e.
x=0
(A3) 2: (T(x)pe(x))’5 < ...
x=0

Lemma 5.2. Under (A1), (A2) and (A3),

2 ﬁo><m
x=0

zaewonk<m
x:

Proof: The proof follows directly from the fact that under (A1)

for 6 E 0, pe(x) S(m(0)\)-,18xm(x) which is a constant multiple

0f PB(X) -

Remark. Gilliland (1966) mentions that B in the interior of

the natural parameter space is sufficient for (A2) and (A2) and

(A3) hold in the Poisson case.

 

 

 

whet

(5.4

Sine

equi

 

 

 

65

Theorem 5.1. Under (A1), (A2) and (A3), uniformly in G,

 

P015“ - «352 = 005*).

Proof: p(en - ¢)2 s e P(decbn - ¢|)) and

1>x(|¢sn - ¢|) = J“; men - ,3 > v)dv +j‘: Px(¢n - ,3 < -v)dv.

Noting that the integrand of the first integral on the right
hand side immediately above is zero for v2 * where * = (B - ¢)+

and that for 0<v<*,[¢n-¢>v]=[ﬁ>0] where

 

wi = T(x)[xi = x+l] - (¢ + v)[xi = x] for i = 1,2,...,n, we have
(5.1) ﬁpan - ¢>v)dv =_[‘; Px(w>0)dv.

Abbreviate T(x) and k(x) by omission of x and let

E = k(x+1). Let 02 = V(w and let 0 < v < *. By the Berry-

1)
Esseen Theorem, page 288 of Leave (1963), since K(w1) = -vk,

(5.2) pxé > 0) s 6(21) + enJ‘r 0'1
where
(5 3) 02 = m" k
. 1 v ,
C is the Berry-Esseen constant and r is the range of w, i.e.

r = T+¢+v.- By Lemma (1.1),

(5.4) r0 1 502-1 + 161)}?
Since 2 2 2 2
GSK(w1)=Tk+(¢+v)kandby(A1),¢SB or

e‘ll-liXIallently E S T-lﬂk, for 0 < v < *,

 

 

 

 

 

 

 

 

66

(5.5) oz s are + 82)k-

Recalling the definition of 21 in (5.3), replacing O by the
upper bound of (5.5) and then extending the range of integration

from * to a, we obtain
(5.6) f; §(zl)dv s (mnkfl‘ias + 52)?
Hence by (5.1), (5.2), (5.4) and (5.6), since * S B,
45 2 k
(5.7) I; 1>x(¢n - q) > v)dv s (2an) (TB + a)
+ Ben-k(k-l + k-1)%.

Letting u1 = T[xi = x+l] - (¢ - v)[:xi = x] for
i = 1,2,...,n, noting that wan - a) < -v) = 0 for v.2 ¢
and that for 0<v<¢,[¢n-¢<-v]S[ESO],we have for

O <v<¢,
(5.8) wan - e < -v) s PXG s 0).

Let 0 < v < q) and s2 = V(ul). By (1.1), s2 S T2E(1 - k)

+ 2T(¢ — v)k k + (¢ - v)2k(l - k). Since k S k ¢ T-1, letting
V = 0, we bound the right hand side of this last inequality by
k(¢'I‘(1 - k) + 2¢2k + (2)2(1 - k)). It then follows that for all

0 < v < Q5,
(5.9) s2 s k(2qs2 + ¢T).

By the Berry-Esseen Theorem again, since K(u1) = v k and by

 

 

 

 

 

67

Lemma 1.1, rs-1 S (k-1 + k-1)% where r is the range of ul,
(5.10) pr s 0) s N22) + c {$5024 + (1)5,

where 822 = -n%vk. By the same method used to obtain (5.6),

using the upper bound of (5.9) for s, we obtain
(5.11) If; §(zz)dv s (2rrnk)';‘(2¢2 + or)".
By (5.8), (5.10) and (5.11)
(5.12) I: Px(¢n - d < -v)dv s (2rrnk)-%(2¢2 + “)6

+ @anEﬂE-l + 161)?

Combining (5.7) and (5.12), replacing ¢ by B, since
the square root of a sum of positive quantities is no greater

than the sum of the individual square roots we have,
(5.13) Px(l¢n - ol) s n-;5(B;5(2rr)-15A + es D)

A = 16*(215 + al‘u +/2)),

D = 12.5“ + 15;“.
By Lemma 5.1, k E'1 s T¢(O). Thus,
(5. 14) P(D) s p((T¢(0));5k% + k3).

By Lemma 5.2, the right hand side of (5.14) is finite. Also,

by Lemma 5.2 P(A) = 1415035 + 53(1 +./2))} < no, so the P-

 

 

 

 

 

68

integral of the right hand side of (5.13) is 0(n-Lé) and the

proof is complete.
The following example indicates that merely assuming that

the parameter space is bounded, i.e. (Al), is not sufficient for

obtaining a rate of convergence for the estimator ¢n.

Example 5.1. We again consider the family due to Gilliland (1966)

that appeared in Example 2.1.

Let n = (0,11, m(O) = 1, m(x) = x'3

for x = 1,3,5,... and m(x) = a(x) S x-3 for x = 2,4,...

Let G concentrate all mass at 9 = 1. Then making use of the

fact that undefined ratios are zero,
2 a
n
R -R=P(¢ -0) 2 2P(X)(1-p(x+1)).
n n x=0 l 1

Since p1(x) = C(1)m(x),
° 3
Rn - R 2 2 C(l)(2x + 1)” (1 - C(l)a(2x + 2))“.
x:
Proceeding exactly as Gilliland we see that Rn - R dominates
a positive null sequence which decreases arbitrarily slowly by

choice of a(x). This example also indicates that bounding

the parameter Space away from zero as well as above will not be

Sufficient for obtaining a rate. Also, if undefined ratios

Were defined otherwise, a slight modification of this example

Would yield the same result.

Tucker (1963) demonstrates a method of estimating G in

he ' = x -9 _
,
t Poisson Case i.e. pe(x) 0 e /x!. He exhibits a dis

t"-‘ibut ion function concentrated on (0,09) , based on observations

 

 

 

 

 

 

69

xl’x2’” .,xn which are i.i.d. according to K, which converges

&
a.s. K to G, on the continuity set of G, as n -' 0°.

In order to apply the exact same estimation procedure to
the general exponential family on the non—negative integers, the

following condition, say (A), would have to be satisfied:
(A) there exists r > 0 such that for '2] < r,

z: z"(x:)'-1(j‘; exdq) < co

x=0
where

gggce
dc ﬂag-mm).

Let us assume, as in the previous theorem, that G is

concentrated on (0,B], B < 09. It then follows that condition

(A) is satisfied with r = 8.1. Let G“ be the Tucker estimate

of G, modified to meet this more general context. Since

C(O) 0x < m-1(x) , by the Helly-Bray Theorem, page 182 of Lobve

(1963), én(e"C(e)) ~ G(GxC(9)) a.s. x” for each x. Thus,

for each fixed x ,a.s. Km

 

 

x+1
dgm~69xcw”=deh
G (9 C (9))
Whe re we define
. &n(exﬂc (6))
gs>=A x Ad
Gn(9 C(9))

It then follows that (in d ¢ a.s. P and by the Bounded Con-

vergence Theorem, P(Ei)n - ¢)2 -' 0 ,so that the risk of using the

Bayes Estimator versus Gn converges to the Bayes Risk versus

G.

 

 

 

 

 

 

 

70

2. Estimation in the Presence of a Nuisance Parameter.
Consider a bivariate random variable with a discrete and
a continuous component. Let the distribution depend on a two-
dimensional parameter with one component pertaining to the
discrete variable and the other to the continuous variable. :5
In this section an empirical Bayes estimation procedure is given '
for the problem of estimating the discrete cqmponent of the
parameter, under the assumption of quadratic loss.

Let 2 = (x,y) 6 {0,1,2,...} X (-¢,+¢). Let
(5.15) Pn(z) = c1<e>c2<§)exe§’m<x>r(y),

where -n = (9,§) 6 Q, 0 being the natural parameter space,

i.e. o = {(e,§)|e > o, 2:: 9xm(x) < co, fegyr(y)dy < co} and

0
m and r are positive, be the density with respect to
u = “1 X ”2’ where B1 is counting measure on the non-negative

integers and ”2 is Lebesgue measure. Let C(U) = 01(9)C2(§).

Let G be a distribution on 0 such that

(Al) C(52) < °°,

(A2) c(ez) < co.

Iset P be the usual product measure on the space of sequences

(:zl,zz,...,(z,6)) where 21, i = 1,2,..., are i.i.d. with density

(15.16) k(z) = G(Pn<z>> = r<y>m<x)c(c<n>exe§y>

 

 

 

 

   

71

with respect to u and (2,9) has the usual joint distribution.
The Bayes Estimator in the problem of estimating 9 based

on the observation 2, under quadratic loss, is

c<ep (2)) k
5. = = “+1
( 17) W) G(pn<z)) T(x) k(x.y)’ .
where F1
_ edsl__ i
T(x) - m(x+l) ° 1

We now assume that r is twice differentiable and that

(A3) r,|r'|,|r"|

are bounded functions. By (A3) and Theorem 9, page 52 of Lehmann
(1959), for each fixed x, k(x,y) is continuous and possesses at
least its first two derivatives with reSpect to y. Thus for each

fixed x, for all y, letting Fx(y) = If” k(x,t)dt,

de(y)
dy = F;(y) = k(x,y)-
Thus, by (5.17),
F' (5')
(5.18) M2) = T(x) —’IE$%;)— .
X

Let the operator ‘ be defined as in Chapter II by

1

bed) = <2h)‘ g y”

1,,h ,

‘vhere 1 2 h.> 0 and g is any real-valued function of a real

1Jariable y. Define

   

72

417110)
(5.19) t (z) = T(x) —-j‘,——/\a ,
n A F ( ) n
x y

where an is a non-negative constant and

* -1 n
F (y)=n 2[x.=x,y.Sy].
x i=1 1 1

‘
. A
h

We consider undefined ratios to be zero. Let Nn be a sequence
of non-negative integers increasing to 9.

Define

A ={zI0SxSNn, -NnSySNn},

>
II

S S - - S S
{2'0 x Nn-l-l, an y Nn+1}s

= - S S - S S
D ETHNn loge Nn, Nn 5 Nu},

'1
ll

- - S S
inf [r(y)| Nn 1 3’ Nn + 1}:

inf {m(x)|o s x 5 Nn + 1}.

Note that mn > 0 and that rn > 0 since r is positive and

c ont inuous .

Macky (1966) shows that with

B(y) = sup Cz(§)e§y,

“s

where (lg = {5” egyr(y)dy < 0°}, B(y) < on for all y and that

ZX+B(Y) = max (B(-Nn -1), B(Nn + 1))-
n

 

73

It then follows that for z E A:

(5.20) k(z) S Bn’

where Bn is the product of the max (B(-Nn-1), B(Nn+1)) and
the bound on r guaranteed by (A3). rSince

k(z) 2 r(y)m(x)IDnC(n)9xegde, for z €.A:,
(5.21) k(Z) 2 en:

. 2 .
where cn rnmndn exp (--2(Nn +11) ) and dn IDnC(n)dG which
converges to IC(n)dG >‘0 as n ~>@.

Define

‘3 Fri-10’)

(5.22) ¢(z) =ﬁ-—(y—)— .
x

Lemma 5.3. For 2 6 A , ¢(z) S B c-l.
----——- n n n

Proof: The proof follows immediately from the bounds, (5.20)

and (5.21), on k(z) for z 6 A:' and the definition of t in

(5.22).
Define
(5.24) ¢n(z) = tn(z)[z 6 Ah],
$1?" (v)
(5.25) tn(z) = 3:“ ,
5 Fx(y)
‘where tn is defined by (5.19). As usual let R = P(¢ - 9)2,
the Bayes risk and Rn = P(¢n - 6)2. By (A2) and the fact that

‘ﬁn, is bounded for each n, we have that Rn - R = P(c)n - ¢)2.

 

 

 

 

 

 

 

 

74

Lemma 5.4. For 2 E An’ b> 0,
* 8nh2b2c 4
A n
emu-1) who“.-- 2
((1+b)cn +IBn)

where * denotes + or -.

Proof: [En - w > b] - E? > o] where for 1 = 1,2,...,n,

Vi a [xi = 3+1:Y'h < yi S Y'Hl] - (*ﬂ)[x~i = x’y-h < yi S Y'H‘J-

_ - - VH1 .
Hence, since Pz(v) b ijy-h and the range of v1 18

1 +-(¢+b), applying Theorem 2 of Hoeffding (1963) to 92(3 > 0),

 

, -2n<b r 1““)2 "'
(5.26) P (t - v > b) 5 exp 'x 3t“ 2
z n ‘(1 +¢ + b)

Also, [En - ¢ < -b] S [6'2 0] where for i.= 1,2,...,n,
wi = (FWD:1L = x.y-h < 3’1 5 M] {[31 = x+1.y-h < ’1 5 M]-
Since 6' has the same expectation as '3' and the range of w

1

is smaller than the range of v , the bound of the right hand

1
side of (5.26) applies also to Pz(tn --# < -b). In the right
hand side of (5.26), we replace Pi];T: by the lower bound,

2h cn, given in (5.21) and ¢ by the upper bound, c;an, given

in Lemma 5.3 and obtain the bound of this lemma.

Letting k"(x,y) denote the second partial with respect

to y,
(5.27) k"<x.y) = m(x){r(y)c*<§2e§-")

+ 2.9 (”Joe”) + r"<y)c*(e§y)}.

 

 

 

 

 

75

where

*
.dG a x
36’ C(n)9 -

Converting the G*-integrals in the right hand side of (5.27)

back to G-integrals by introducing C(Tl)6x into the integrands
as a multiplicative factor, noting that C1(9)9xm(x) < l for all X 12
and 9 and recalling the definition of Bn , we see that for

z 6 AI, by (Al), (A3) and the triangle inequality the absolute

 

value of the right hand side of (5.27) is bounded by L Bn’
where L is a constant depending on G(Igl), G(§2) and the

bounds on r, Ir" and Ir"|. Hence for z 6 A:,
(5.28) lk"(x,y)| s L Bn.

Lemma 5.5. For 2 E An,

 

2 2
‘w _ k(x+l,y)\ S h L 3n

k(x:Y) 3 C2
[1

Proof: For fixed x, since L Fx(y) = (2h)-¥I;f:k(x,t)dt,

expanding k(x,t) by a Taylor Series about y yields

1: Iggy) -- <2h)'.1f§i?l(k<x.y) + <t-y)k'(x.y) + 2'1(t-y)2k"(x,s))dt,

where 3 depends on t and y-h S s S y+h. Distributing the
integration we see that the right hand side of the above equation
equals k(x,y) + (2h)-lfrf: 2-1(t-y)2k'.‘(x,s)dt. By (5.28), for

z E {(x,y)|x = 0,1,2,...,Nn+l, a»:n s y s Nu}, the second term of

 

 

 

76

this sum is bounded in absolute value by hZL Bn/6' Hence for

z E A ,
n

(5.29) lb Fx+1<y)k(x.y) - k(x+1.y)1 Fx(y)| s <k<x.y)+k<x+1.y))hZBnL/e.

Since It - (k(x,y))-1k(x+l,y)l equals the product of the left
hand side of (5.29) and .(‘1.]?'x(y)k(x,y))-1 and i Fx(y) = k(x,a)
where y - h S a S y + h, replacing k by the upper bound of
(5.20) in the right hand side of (5.29) and then using the bound
of (5.21) to obtain a lower bound of c: for (k(x,a)k(x,y))-1,
we see that for z E An’

2 2

|‘¢ _‘kgx+l,y) S h BnL
k(x:Y) 3 c:

which completes the proof.
We choose the sequences an, h and Nn so that the

following conditions are satisfied:

 

 

Bn
(1) a Q ~ 0,
n n
a
(ii) ,“ ~0,
n hc
n
th
(111) -z-‘v 0.
n

lﬂote that since dn converges to a positive constant, these
conditions are implied by 'the set of conditions formed by
rcplacing cn by cndgl, which is a quantity independent of G.

Hence,it is possible to choose an, h and Nn independent of G.

 

 

 

77

We also assume
2
(A4) P(T ) < co.

Theorem 5.2. If conditions (i), (ii) and (iii) are satisfied,

 

then under (A1)-(A4), P(¢n - ¢)2 * 0 as n v)“, where ¢ and

¢n are defined by (5.17) and (5.24) respectively. .3

Proof. f(¢n - o) 2dP= ngn (on - ¢) 2dP +-IA,¢2 dP. (A2) implies

P(¢ 2) < a, so IA'¢2 dP ~ 0, since Nn increases to infinity.

Thus, by (5.24), it remains to be shown that IA (tn - ¢)2dP m 0.
n

Let 2 E A .
n

P (t - 1102 = 121‘"? {(2 An) - ¢)2>b}db

z n 0 z n n '

By Lemma 5.3 and (i), for n sufficiently large for z E An’

¢(z) S an, so that the range of integration of the above right

hand side can be reduced to (0,a:). It also follows that for
large n we can remove the truncation of En at an and apply
Lemma 5.4 to obtain the following asymptotic bound on the integrand

of the right hand side of the above equation:

-8nh2b c4
n

 

2 exp 2
((14%) cu + 3n)

Replacing (li/b) by (1+an) and integrating the resulting

expression over the range (0,0),'we obtain

2 T2((1+an )cn + Bn)2
(5.30) P (t - 11) S
2 n 4 nhzc:

 

 

 

 

78

By Lemma 5.5, since 2 E An,

1121 82 2
___L
3 c2

n

(5.31) sz - ¢)2 s 1:2

By the Minkowski Inequality,
2 2 3 2 k 2
(5.32) Pz(tn - d) s {(Pzan - Ti) ) + (sz - ¢) ) l .

By (1), (ii), (iii) and (A4) the P-integral over the set An

of the right hand sides of (5.30) and (5.31) converge to zero

as n v 6. Thus, by the Schwarz; Inequality the P-integral over
An of the right hand side of (5.32) converges to zero and it
follows that IA (tn - ¢)2dP H 0 which completes the proof.

n

Example 5.2. This example points out that Theorem 5.2 applies
to the Poisson-Normal (§,l) case. In this case r(y) = e-%y2
so (A3) is satisfied. Let G be such that (A2) is satisfied.
T(x) = x+l and P(x+l)2 = G(Pe(x+l)2) where Pe denotes the

Poisson distribution with parameter 9. Since Pe(x+l)2 = 92+39+1

and (A2) holds, (A4) holds. Also in this case,
r = exp (-(N +1)2/2)
n n ’

on = (01n + 1)!)'1

2
(N +1)
-.(JT).

:1
it

 

1’“ “‘1

 

 

 

 

 

BIBLIOGRAPHY

 

 

 

 

 

BIBLIOGRAPHY

Deely, J.J. and Kruse, R.L. (1968). Construction of Sequences Estimating
the Mixing Distribution. «Ann. Math. Statist.,_§2,.286-288.

Feller, W. (1957). An Introductioqjto Probability Theory and its Appli-, '11
cations Volume I (Zgg ed.). Wiley, New York. a

 

Gilliland, D. (1966). Approximation to Bayes Risk in Sequences of Non-
Finite Decision Problems. Tech. Report No. 10, Department of Sta-
tistics and Probability, Michigan State University.

 

Gilliland, D. and Hannan, J. (1968). The Role of Normal Approximation
in Compound Decision Theory Problems. Research Memorandum RM-218,
Department of Statistics and Probability, Michigan State University.

Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded
Random Variables. J. Amer. Statist. Assog;,.§§, 13-30.

Johns, M.V. (1966). Two-Action Compound Decision Problems. Tech.
Report No. 87, Department of Statistics, Stanford.

Lehmann,E.L. (1959). .Testigg §tatistical Hypotheses. Wiley, New York.

Loeve, Michel (1963). Probability Theory (3gd_ed.). Van Nostrand,
Princeton.

Macky, D. (1966). Empirical Bayes Estimation in an Exponential Family.
Research Memorandum RM-176, Department of Statistics and Probability,

Michigan State University.

Parzen, E. (1959). Convergence of Families of Sequences of Random
Variables. California University‘Publications'in'Statisticg,
' 2, 23-53.

Robbins, H. (1964). The Empirical Bayes Approach to Statistical Decision
Problems. Ann. Math. Statist., 32, 1-20.

Teicher, H. (1961). Identifiability of Mixtures. éAnn.'Mat‘;‘Statist.,
32, 244-248.

Tucker, H. (1963). An Estimate of the Compounding Distribution of a
Compound Poisson Distribution. Theory of PrObability and its
ApplicatiOns,.§, 195-200.

Wolfowitz, J. (1953). Estimation by the Minimum Distance Method. Ann.
Inst. Math. Statist., Z, 9-23. 1 '

79

 

 

 

 

 

 

 

 

...ubo.o¢.tov._.mw
«...-.... ....
.

‘

 

 

 

   

"'TITx'l‘i'rgﬁﬁtLﬁjﬂﬂﬂﬁtﬂiﬂﬂﬂfﬂ'r‘ﬂnﬂﬂﬂiﬂmm‘s