4‘- .. ...

5-»...

..-A,.

....._..p~—.n.-.....

 

""‘-Cn0-4(~l< «a:

r:.—.-—
M..-W,

W

-\
,v“.
" n

I'

E?

EMPIRICAL BAYES RESULTS INITHE CA8

 

OF NON-IDENT

 

 

ICAL COMPONENTS . -

 

 

,.:

Thesis for the Degree .of Ph. D.

MICHIGAN STATE UNIVERSITY

THOMAS EUGENE O'BRYAN

1972

 

 

 

 

 

 

#7
LIB R A it ’1"
Michigan State
University

 

This is to certify that the

thesis entitled

EMPIRICAL BAYES RESULTS IN THE CASE OF
NON-IDENTICAL COMPONENTS

presented by

Thomas Eugene O'Bryan

has been accepted towards fulﬁllment
of the requirements for

 

 

 

Ph.D. degree in Statistics and
Probability
v1 . a / L at,“
hlqknrprcdiusor

Date August 11, 1972

0-7639

      
   

Y ammue av ‘ir'
NOAH a SONS'
IIQMIEEIUNB.

ABSTRACT

EMPIRICAL BASES RESULTS IN THE CASE OF
NON-IDENTICAL COMPONENTS

By

Thomas Eugene O'Bryan

A Bayes rule with respect to a distribution G will
minimize the risk of a decision concerning a parameter 9 which
is distributed according to G. The infimum Bayes risk is
denoted by R(G). Herbert Robbins ((1956), An empirical Bayes
approach to statistics. Proc. Third Berkgigy,§ygp, £235, Statist.
2522., 157-163, University of California Press) demonstrated that,
even if G is unknown, in certain cases one can construct
statistical procedures based on data gathered from n independent
repetitions of the decision problem for which the risk converges
to R(G) as n a m for all C. Such an empirical Bayes procedure
is asymptotically optimal. Rudimentary forms of this problem had
appeared prior to Robbins' unifying treatment and a huge empirical
Bayes literature has evolved since.

Only sequences of identical component problems have been
treated in the literature. However, it is clear that when the only
difference from problem to problem is sample size, empirical Bayes
methods should still be useful. In this case there is not a single

m
Bayes envelope R(G), but rather a sequence of envelopes R n(G)

Thomas Eugene O'Bryan

where mn denotes the sample size in the nth problem. Let

ﬁ_- (el,ez,...) be a sequence of iid G variables and let the
conditional distrib:tion of the observations g“ ' (xn,1"°"xn,mh)
given 3. be (Pen) n, n = 1,2,... . For a decision concerning
en’ we will investigate procedures tn which will utilize all

the data £1’°'°’§n and which, under certain conditions, arem
asymptotically optimal in the sense that lim I'_Rm'“(tn ,G) - R n(6)]
- O for all G. In particular this paper Ereats squared error
loss estimation and linear loss testing in certain discrete

exponential families where the construction of asymptotically

Optimal procedures is tractable.

EMPIRICAL BAYES RESULTS IN THE CASE OF
NON-IDENTICAL COMPONENTS

By

Thomas Eugene O'Bryan

A THESIS
Submitted to
Michigan State University
in partial fulfilhment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1972

TO MY PARENTS

AND

MARY

ii

ACKNOWLEDGMENTS

I wish to express my appreciation to Professor
Dennis G. Gilliland for his guidance throughout the preparation
of this manuscript and his concern for my work. I am indebted
to both he and Professor James Hannan for the introduction to
empirical Bayes decision theory as well as many other aspects
of mmthematical statistics and for their many helpful comments
and suggestions which led the way to stronger theorems and
simplified proofs.

The financial support provided by the Department of
Statistics and Probability and the National Science Foundation
made my graduate studies possible. I thank them and also wish
to thankMrs. Noralee Barnes for her patience and skill in typing

this dissertation.

iii

Chapter

I

II

III

IV

TABLE OF CONTENTS

INTRODUCTION

1 l The Statistical Decision Problem
1 2 The Empirical Bayes Decision Problem
1.3 History
1 4 The Non-Identical Case

DECISION PROBLEMS INVOLVING SOME DISCRETE
EXPONENTIAL FAMILIES (PRELIMINARIES)

2.1 Introduction

2.2 Assumptions

2.3 Lemmas

ESTIMATION

3.1 Estimation Under (A1“) and (A3)
3.2 Estimation Under (Al) and (A2)
3.3 Estimation Under (Al)

TESTING AND FINAL REMARKS

4.1 Testing Under (A1-) and (A3)
4.2 Testing Under (A1)

4.3 Final Remarks

BIBLIOGRAPHY

iv

Page

\OVU‘H

ll
11
18
19
22
22
28
33
37
37
39
43

46

CHAPTER I

INTRODUCTION

§l.l The Statistical Decision Problem

Consider the following statistical decision problem. Let
{P9: 6 E o} be a family of probability measures over a a-field B
of subsets of I. 9 will be called the parameter space and 9 will
denote a generic element of Q. I will be called the observation
space. Let a be an action space with generic element a. Let
L 2 0 be a loss function defined on (9 x a. Let G be a probability
measure over a a-field 3 of subsets of @. G will be called the
"a priori" distribution. With 0 a a-field of subsets of a, a
randomized decision rule (decision function) t has domain I X<3
and is such that t(x,°) is a probability measure on c. for each
fixed x 6 I, and t(-,C) is B measurable for each fixed C 6 C.
When 9 is the parameter, the decision rule t results in expected
loss

(1.1) R(t.e) '- { j L(e.a)t(x.da)P (dx) .
a e

We require that c. contain all the singleton subsets of ¢7
so that the class of randomized decision rules contains the class of
nonrandomized decision rules. With a nonrandomized decision rule t,
t(x,-) puts all its probability on a singleton set, say {t(x)},

for each x. For a nonrandomized decision rule t

(1.1') R(t.e) =1 L(e.t(x))Pe(dx)

R(t,G) is called the risk of t with respect to 6. When G is

the "a priori" distribution on @, the overall expected loss is given

by
(1.2) R(t,G) = {g R(t.9)c(de)
R(t,G) is called the Bayes risk of t with respect to G. Let

(1.3) R(G) = inf R(t,G) .
t

R(G) is called the Bayes envelope evaluated at G. If there exists

a *
a decision rule t such that R(t ,G) = R(G), then we write

*

t 8 tG and call tG

Note that tG need not exist and, if it does exist, it need not be

unique. However if it does exist, then there exists a nonrandomized

a Bayes decision rule with respect to G.

decision rule which is also a Bayes decision rule with respect to G.
We give two examples of the preceeding which will illustrate
the two types of decision problems, i.e., estimation and testing,
which we will be concerned with in the following chapters. In both
examples, let G - (0,m). Let (PB: 9 E G} be the Poisson family of
admits a density f with

9 9
respect to counting measure u on I I {0,1,...} given by fe(x) =

distributions with means 9, i.e., P

e-eex(xl)-1. Let G be the Gamma distribution on (0,m) which has
density with respect to Lebesgue measure X given by g(e) =

[r(8)]‘1a893'le‘aﬂ for a,B >~O.

Bxgple 1.1. (Poisson) Estimation. Let as (9. Let L(e,a) -= (e-a)2.

Then for any nonrandomized decision rule, (1.2) becomes

R(t,G) - 2 (t(x)-e)2fe(x)g(e)de .
xIO

Here the nonrandomized Bayes decision rule with respect to G is
given by a version of the conditional expectation of 9 given
X I x,

3 Biz
tc(x) Gel

Exmle 1.2. (Poisson) Testing. Let 0 < c < on. Let a- [a1,a2}
correspond to the actions "decide 9 s c" and "decide 6 > c"
+ -

respectively. Let L(e,a1) I b(e-c) and let L(e,a2) I b(9-c) .
Here we use the symbol 6 rather than t for decision rules. Let
6(x) be the probability of choosing action a1 given x. Here
we have (1.2) as

Q

R(6.G) . 2 [5(x)L(e.a ) + (1-5(x))L(e.a )]f (x)s(e)de
l 2 e

xIO

and a nonrandomized Bayes decision rule with respect to G is

given by
66(x) I ESE? s c]

Example 1.3. Consider the problem of Example 1.1 with X = (X1,...,Xm),

a sequence of iid random variables each distributed £9. The
m

statistic Y I 2 X1 is sufficient for this family. Here Y ~ fme.
iIl

The nonrandomized Bayes decision rule with respect to G is given

by a version of the conditional expectation of a given Y = y,
tG(y) I . Also we calculate R(G) a( I ) .
The important thing to notice here is that both the Bayes

decision rule with respect to G and the Bayes envelOpe evaluated

at G depend upon the sample size m.

Throughout this paper, we will let square brackets denote
indicator functions, so that for an event A, [A] denotes the

indicator function of A.

§1.2 Theggmpirical Bayes Decision Problem

In the case where G is known and a Bayes decision rule with
reapect to G exists, one merely employs tG and thereby incurs the
minimum possible Bayes risk R(G). But suppose G is unknown. Robbins
(1956, 1963, 1964) showed that if a given statistical decision prob-
lem occurred repeatedly and independently with the same unknown G
throughout, then, under certain conditions, one could exhibit a
sequence of rules {tn} which had Bayes risk with respect to G
converging to the Bayes envelope evaluated at G. As the problem
repeats itself, it presents a sequence of pairs of random variables
{(91' Xi)} with each pair being independent of all other pairs.

The 9 are unobservable and iid with distribution G. The condi-

i

tional distribution of X1 given that 9i = e is given by P

8°
Robbins suggested that one use a decision rule tn in the n+lst
repetition of the problem with tn depending on X1,...,Xn. Robbins'
rationale was that one could use the knowledge about G gained through
the variables X1,...,Xn in such a way that for large n the Bayes
risk with respect to G of tn would be close to the Bayes risk

with respect to G of tG. With tn used as the decision rule in

the n+1st problem, the risk conditional on X ..,X is

1" n

(1.4) R(tn,G) = {g i J‘ L(e,a)tn(x,da)P (dx)G(d9)
a e

which satisfies R(tn,G) 2 R(G) in view of (1.3). Hence with the

overall expected loss for the decision concerning en+ denoted by

l

(1.5) Rn(tn,G) E E R(tn,G)

we see that

(1.6) Rn(tn,G) 2 R(G)

Definition 1.1: If lim Rn(tn,G) = R(G), then {tn} is said to be
H
asymptotically optimal relative to G and we will write {tn} a.o.

relative to G.

§l.3 Histogy

The search for rules {tn} which are a.o. relative to G
for every distribution G or at least for every G within a certain
class has taken basically two tracks. The first track is to use the
values of X1,...,Xn to form an estimate of G, call it On, and
then let tn be a Bayes decision rule with respect to G“, i.e., let
tn I tG . The second track is to estimate the form of the Bayes
decision rule directly without estimating G first.

In 1955 at the Third Berkeley Symposium on Mathematical
Statistics, Robbins introduced empirical Bayes procedures and dis-
cussed both tracks mentioned above. In 1963 and 1964, two more
papers by Robbins appeared which discussed the empirical Bayes
problem further. Rudimentary forms of the problem had appeared prior
to Robbins' unifying treatment and a huge empirical Bayes literature
has evolved since. We will concern ourselves here with that segment
of the literature which involves situations similar to what we will
discuss in this paper.

Along the second track, Johns (1957) discussed estimation
in the case where the class of probability distributions {P9: 9 E O}
was not restricted to a particular parametric family. Macky (1966)
and more recently Hannan and Macky (1971) have dealt with certain
exponential families in the case of estimation and have demonstrated
a.o. rules with weak restrictions on the prior distribution and the
parameter space. Samuel (1963) discussed the testing problem under
various loss structures and in part dealt specifically with the

type of discrete exponential families which we will discuss in

Chapter 11. Johns and Van Ryzin (1971) treated the testing problem
with linear loss and developed rates of convergence on
Rn(6n,G) - R(G).

Concerning the estimation of G which is part of the first
track, Tucker (1963) examined the case where {P9: 9 6 G} was the
family of Poisson distributions. Rolph (1968) used Bayesian
estimation of G in the case where the parameter space 8 was
limited to [0,1]. Recently Meeden (1972) has looked at Bayesian

esthmation of G in the case where G may be [0,m).

 

 

iiill Illa...

§l.4 The Non-Identical Case

The history of the empirical Bayes decision problem is such
that the only problem that seems to have been considered thus far is
the problem where the stages are identical repetitions of a given
component problem. One could ask whether it is meaningful and useful
to apply empirical Bayes procedures to sequences of independent but
not identical decision problems all having the same unknown G. We
will attempt to answer this question in part in the remainder of
this paper. Specifically, we will address the case where the statis-
tical decision problems in the sequence are identical except for
sample size. When we observe a random vector of observations
E_ = (X1,...,Xm), where m may vary from stage to stage, it will
become necessary to consider the dependence of the Bayes decision
rules and the Bayes envelopes evaluated at G upon the values of m
(cf. Example 1.3). This was not necessary before when one considered
problems where the sample sizes were identical at each stage.

In the situation that we are considering, where the problems
occur independently with the same unknown G throughout, there is a
sequence of independent random vectors {(ei,§i)}, i = 1,2,... where

X I (X11,...,X1 m ) is the sample of size m from the ith problem.
’

‘1 1 i

The random variables 91 are unobservable and are iid with distribu-

tion G. Conditional on e = e, X ,...,X are iid P . We
1 il i,mi 9

consider a decision rule tn for use in the n+lst problem which de-

pends on g1....,§n. Letting m = m, the risk conditional on

n+1
$1,...,§n is given by

10

m

(1.7) Rm(tn.G) = £3} £L(e.a>tn(§.da)r‘3(d§)c(de>
1

which satisfies Rm(tn,G) 2 Rm(G) where Rm(G) is the Bayes
envelope for a sample size m component problem. Hence, with the

overall expected loss for the decision concerning 9 denoted by

n+1

m
(1.8) Rn(tn,c) E R (tn,G) ,

we see that

N

m
(1.9) Rn(tn,G) R (a)

This motivates the following definition which parallels Robbins'
definition.

Definition 1.2. A sequence of decision rules {tn} is said to

be asymptotically Optimal (a.o.) relative to G if lim Dn(G) = 0

mn+1 n—ooo

where Dn(G) I Rn(tn,G) - R (G).

The remainder of this paper treats squared error loss estima-
tion and linear loss testing involving certain discrete exponential
families and exhibits sequences of rules {tn} which are

asymptotically optimal in the situation described in this section.

We will approach the problem along the second track discussed before.

CHAPTER II

DECISION PROBLEMS INVOLVING SOME DISCRETE
EXPONENTIAL FAMILIES (PRELIMINARIES)

§2.l Introduction

We impose the following special structure on the component
problem to be treated in the empirical Bayes framework. Suppose
that conditional on 9, X1,...,Xm are iid Pe where Pe has a
density with respect to counting measure u on I = {O,l,...}

given by

(2.1) fe(x) = 9xz(e)g(x) where g(x) > O, x 6 I

and e 6 @c: H where
naiezoz zexs(x)<oo}
XIO

and z is the function defined by

( z exg(x))'1, e 2 o .

x=0

2(9)

The function 2 is continuous on [0,m), is decreasing to O, and

m

is positive on the interval 3. We note that Y = 2 X1 is suf-
iIl

ficient for this family. The random variable Y, whose distribu-

tion we will denote by P9 m, takes values in I and has density
9

with respect to u given by

11

12

y m
(2.2) fame!) = e z (e)gm(y). y 6 I
where
m
(y)=2 Hs(x.)>0.ye1
gm Am(y)i=1 1
(2.3)

m
Am(y) “{(x1.-o-.xm):_21xi = y} -
1:

With 9 ~ G, the marginal density of Y with respect to

u is given by

(2 A) hm(y) = qm(y) gm(y)

where

(2.5) qm(y) = j eyzm<e>ctde>
O

is the marginal density for Y with respect to the measure on I
defined by the mass density gm' We note that for all m,
qm(y) > O for all y E I and hence hm(y) > O for all y E 1,
unless G is degenerate at O in which case qm(0) I zm(O) I
(g’lmnm > o. hm<°> = 1 and qm(y) = hm(y) - o for y 2 1.

Of course f9,1 = f6, g1 = g and for convenience we

let h1 I h and q1 I q.
Two common families of the type discussed above are the

Poisson family with

£e<x> - exe'eeo'1

(2.6) and

= ~me y -l
fe,m(y) Bye m (Y!)

13

and the Negative Binomial family (r > 0 known) with

fe(x) = ex(1-e)r(r+x'1)
X

(2.7) and

= y _ mr mr-I—y-l
fe’m(y) 9(19) ( y ).

In this paper we will consider two loss structures. In
Chapter III we will consider

Estimation with @C ac [0,00) and L(e,a) = (9-8)2 and
in Chapter IV we will consider

Testing with d = {81’82} and for b > O and c E 64),
+ -
L(9.81) = 13(9-0) : 149,82) = b(e-C) o

For Estimation (hereafter understood with squared error

 

loss), the Bayes risk of any nonrandomized rule t based on Y
in the sample size m problem is given by
m 2
(2-8) R(t,G) = 2‘. (MY) - e) f (y)G(d9)
e.m
y=O
and the nonrandomized rule which is Bayes with respect to G is

given by a version of the conditional expectation of 9 given

Y = y .
qm(y+1)
(2-9) tG(y) '3 W

where throughout this paper ratios 0/0 are to be interpreted as O.
For Testing (hereafter understood with linear loss function as
given above), the Bayes risk of any randomized rule 6 with respect

to G, where 6(y) is the probability of taking action a1 given

14
Y = y, is given by

(2-10) R(6.G) =J' I: [6(y)L(9.81) + (1-6(y))L(9.82)]f (Y)G(d6)
® y=0 9,m

and a nonrandomized rule which is Bayes with respect to G is given

by

(2.11) aG<y> = EaG<y> s 0]

where

(2.12) ace) = qm(y+1>- c qmo) .

We will add superscripts m to R,t,tG,5,5G,a and 8G whenever it

is necessary to emphasize the dependence on m.

The empirical Bayes problem that we consider involves repeti-

tionsof the component problem (eitherEstimation or Egggigg) with sample
size m varying from problem to problem. We now turn our attention

to the method that we will use in constructing rules which are a.o.
relative to any G. Notice that the Bayes decision rules (2.9) and
(2.11) depend upon y through qm' With G unknown, qm. is un-

known but can be estimated. One method is to express qm as a
function of q, a common marginal density with respect to g of

all xij’ j = l,...,mi, i I 1,2,... and to use the xij data to
estimate q. With the discrete exponential family (2.1) this

method can sometimes be implemented. We note that

(2.13) qmo) -- g zm‘1<e>eyz(e>c<de>. y e 1 .

15

If 2(9) is a polynomial then zm-1(9) is also a poly-

nomial and we can write for each m

-l k
(2.14) z‘“ (e) = 2 vke

where the Yk are constants. Substituting (2.14) into (2.13) and

interchanging the order of integration and (finite) summation yields
(2.15) qm(y) = Z vkq(y+k)

so that qm is expressed in terms of the estimable function q.
Since 2(9) is continuous, given an interval [0,6],

8 < m, the Weierstrass' approximation theorem allows that for each

m and every 6 > 0 there exists a polynomial z ykek' which

approximates zm-1(9) to within 6 uniformly on [0,9], i.e.,

(2.16) Izm-1(9) - 2 vkekI < e for all 9 E [0,5] .
Defining
(2.17) qm,€(y) = 2 vkq(y+k). y e x.

we see that
(2-18) Iqm(y) - qm 800‘ S

IIzm‘lm - z vkekmyz<e>g<de> s e q(y). y e r.

O
i.e., qm,e(y) approximates qm(y) to within 6 q(y) for each
Y E I and each m.

For the Negative Binomial fe of (2.7) with r an integer,

2(9) I (1-9)r, a polynomial of degree r. For the Poisson f

9
of (2.6), zm-1(9) = e-e(m-1) has a power series expansion about

16

O and the series is uniformly convergent on any bounded set of 9
values for each m and hence, for each s > O, the polynomial
approximating 2m. to within 6 uniformly on the bounded set

of 9 values can be found simply by truncating the power series

expansion.

We will find it sufficient but not necessary for our

purposes to require that the estimator of q(y), y 6 I, be

:1

-— 1
(2.19) q(y) ’ g z q(y). y E 1.
1:11

where for each i, iq(y) is an unbiased estimator of q(y) based
on Xi, bounded by l/g(y), and note that iq(y) = O for all
y 2 1 if Xi = Q. As an average of unbiased and bounded estimators,
3(y) is unbiased and pointwise consistent for q(y), y E I.

We now develop an example of such an estimator. Let
X1,...,Xm be a sample where, conditional on 9, X1,...,Xm are

iid f9 and where 9 ~ C. An unbiased estimator of q(k) is

provided by [X1 = k]/g(k) since
(2.20) E[X1 = k] = h(k) = q(k)g(k) .

An improved estimator based on the sufficient statistic

Y = X1 +...+ Xm is given by the conditional expectation

_ P[x1 = k, x

+...+ X = y - k]
E [x1 = k]-— 2 m

PIY = Y]

 

The probabilities can be computed first conditional on 9 and

then unconditionally to obtain the result

l7

g(k)gm_1(y-k)
sm(y)

 

(2.21) 1-:y[x1 = k] =

This relation (2.21) holds for all m 2 l, k,y E I with the
definitions g0(u) = [u = 0] and gm(u) I 0 if ‘m 2 l and
u < 0. In view of (2.20) and (2.21) we see that the estimator

gm_1(Y - k)
gm(Y) ’

 

(2.22)

is unbiased for q(k), k E I. Furthermore, 0 s g(k)gm_1(Y - k) s
gm(Y) for each m to ensure that the estimator (2.22) is bounded

by g-1(k),k€I. WhenX =...=xm=o then Y=o andwe see

1
that the estimator (2.22) is equal to O for k 2 1. Therefore,
with Yi denoting the sufficient statistic in the ith component
problem,

gm '1(YI - Y)

i
2.23 a = .0.
( ) iq(}') 8 (Y ) 9 Y 6 In 1 192:
mi i

 

provides an example of an estimator to be used in (2.19).

18

§2.2 Assumptions

The discussion in Section 2.1 helps motivate the imposi-
tion of the following assumptions. In order to exhibit rules
{tn} which are a.o. relative to all G we will impose these

assumptions from time to time in what follows.

(111') o t: [0,5] where B < co .
(Al) ®<: [0,9] where 9 E n .
(A2) 2(9) is a polynomial in 9 E ® .

(A3) The sample sizes mn form a bounded sequence, mn$M<m, n=1,2,...

In Estimation, the action space a'is always taken to be [0,9].

Remark 2.1. The assumption (A2) implies the assumption (Al-).
Proof: Recall from the definition of 2(9) in Section 2.1 that
(i) 2(9) 2 0 for all 9 2 0 and (ii) 2(8) I 0 as 9 t m.

If (A2) obtains, then
R k

for constants v0,...,yK where YR f 0. If VK > O, 25 ykek « m
as 9 a a so that 9 must be bounded in view of (ii). If
YK‘< 0, 25 ykek a -m as 9 a m so that 9 must be bounded in
view of (1).

Of course, assumption (Al) also implies 9 is bounded.
In the presence of (A2), (A1) only adds the requirement that
sup{9‘9 E G} E n. Assumption (Al) need not be satisfied when (A2)
is satisfied; for example, with G = [0,1) I n, the negative

binomial family with r a known integer satisfies (A2); however,

9 does not satisfy (Al).

l9

§2 .3 Lemmas

We will need the following lemmas in Chapters III and IV.

Lemma 2.1. For the discrete exponential family (2.1)
(2.24) P9,m[Y>y]SPB,M[Y>y] for all yEI,
O$9$BEH,ISmSM.

Proof. Let y E I, 0 S 9 s 9 E [1 and 1 Sm 5M be fixed. Since
{f9 In(y)I9 E II} has an increasing monotone likelihood ratio in y
D

it follows from Lehmann (1959, Lemma 3.2) that

P9,m[Y > y] s P9,m[Y > y] .

Since qu1 = ET Xi'I'EM

m+1 Xi where the X1 2 0, PEEP; X1 2 y] 2

PH)? Xi 2 y] so that

PBJFY > y] s PB,M[Y > y] .

In our constructions in Chapters III and IV it is necessary
in the absence of (A3) to use the existence of decision rules Lm
such that R(Lm,G) a 0 as m a m for every G. The two lemmas

to follow will establish the existence of such rules. We notice

that
f (1)
mi __9._ e.

has a "natural" bounded estimator

 

m
o slut - 1]
m 1:
(2.26) T (x1,...,xm) - ELI-8(1) m A B ,
ZIP(1 " 0]
i:

20

under (Al-).

Lemma 2.2. (Estimation) Under (Al-), R(Tm,G) a O as m a m

for any G.
Proof: Let G be an arbitrary but fixed a priori distribution on
‘m
1 m P
9. Since -' 2 [X. I x] 49 f (x), x E I and f (0) E O and
mi=1 1 9 9

9 E [0,9], we see that

m

P f (1)
m _,9 EMA—e
T (x1,...,xm) g(l) few) 9 .

2
Since (Tm - 9)2 s 9 for all m, the dominated convergence yields
m m 2
R(T ,9) = E9(T - 9) a 0 for each 9

where Ee denotes expectation with respect to the distribution
m m 2 2
P9. Since E6(T - 9) s B for all 9 and m, another applica-

tion of the dominated convergence theorem yields
m m

(2.27) R(T ,c) a £110: ,9)G(d9) -. o .

For the testing problem we define

(2.28) Am(x1,...,xm) = [Tm s c] .

Lemma 2.3. (Testing) Under (Ai‘), R(Am,G) -. o as m -. do for
all C.
Proof: Let G be an arbitrary but fixed a priori distribution

on Q and let Ee be as in Lemma 2.3. We can write

21
R(Am,8) -= b{E9([T‘“ s We - c)+)
+-Ee([Tm > c](9 - e)‘)} s b EeITm - 9|

Since Tm - e e o in L (cf. (2.27)), Tm - e a o in L

2 1

which completes the proof.

CHAPTER III

ESTIMATION

§3.1 Estimation Under (A1-) 33g (A3)

In this chapter we will exhibit sequences of decision rules
which are a.o. relative to every G in the case of squared error
loss estimation in the Special discrete exponential families des-
cribed in Section 2.1. As has been noted by many authors (e.g.
Macky (1966, pp. 6-7)) and quite apart from distributional assump-
tions if T(X) and 9 are L random variables then

2

T(X) - E[9IX] and E[e\x] - e are orthogonal in L2 so that
E(T(X) - 9)2 - E(E[9IX] - 9)2 I E(T(X) - E[9IX])2. This implies

that for estimating 9 in our sample size m component decision

problem with an estimator t,
(3.1) Rm(t,G) - Rm(C) = E(t - tG)2

In dealing with risks concerning the estimation of 9n+1 in the
empirical Bayes problem we will find it convenient to drOp sub-
scripts within the n+1st problem. We will let P and E denote
probability and expectation taken over all random variables on
which they operate and Py and Ey denote their conditional

on Y I y counterparts.

The following lemma is motivated by the approach of Robbins

(1964) and proves useful in establishing the asymptotic opthmality

22

23

of specified sequences of decision rules when assumptions (A1-)
and (A3) obtain.

Lemma 3.1. Suppose (A1-) and (A3) obtain. Then with t: defined
for each m to be a y-measurable decision rule in the sample size

m problem depending on 3' X and with t: Bayes with respect

1""’-n
to G in the sample size m problem (t: is given in (2-9)):

P
(3.2) t:(y) - t2(y) 4y 0 for all y E I, 1 s m s M

m
implies that the sequence tn = tum.1 is a.o. relative to all C.
Proof: Let G be arbitrary but fixed. By applying (3.1) con-
ditional on X1,...,Xn and then completing the expectation, we

m m m 2
see that 0 s Rn(tn,G) - R (G) = E(tn(Y) - tG(Y)) . Condition

(3.2) implies that
(3.3) t:(Y) - t2(Y) 3.0 for each m, 1 s m s M

Under (Al-), the sequence in (3.3) is bounded and for a bounded

sequence, convergence in probability implies convergence in L2

so we see that
(3.4) Rn(t:,G) - Rm(G) « O for each m, 1 S m S‘M .

Since 1 s mn+1 $1M < m for all n under (A3), (3.4) implies

that Dn(G) a 0 (where Dn(G) is defined in Definition 1.2) as
was to be proved.

This lemma shows that in order to find a.o. rules under

(A1”) and (A3) it suffices to be able to approximate tm as

G
n a a for each m. Now under (A2), (2.15) obtains, i.e., for

each m

24

(3.5) qm(y) = z vkq(y+k). y E L .

and from (2.9) the Bayes rule is provided by

m qm(y+1)
(3.6) tG(y) = ———qm(y) . y E I .
With 3' defined in (2.19),
_ __ P
(3.7) q (y) -=- 2 v, q<y+k) J q (y)
m m

since q(y+k) is consistent for q(y+k) for each k. 80 in view

of Lemma 3.1 the following empirical Bayes procedure is a candidate
m

for an a.o. rule. Let tn = tnn+1 where
-+
m qm 9+1)
(3-8) t(y)=:T—-Ae.yer,n.m21.
n (y)

m

Remark 3.1. If G is degenerate at 0, then each X1, i 2 l, is

 

degenerate at 9, Then aIy) I 0 if y 2 l as required in (2.19)
and we see that ;¥(y) I 0 if y 2 l, m 2 1. Hence, t: defined
by (3.8) satisfies t:(y) = o for all y e r, m 2 1. Since the
Bayes rule (3.6) satisfies t:(y) = O for all y E I, m 2 l, we
see that Dn(G) I 0 for all n if G is degenerate at 0, i.e.,
tn is trivially asymptotically Optimal relative to G degenerate
at 0. All of the empirical Bayes rules defined in this chapter
can be easily shown to be a.o. relative to G degenerate at 0 and
we will exclude treatment of this case in the proofs of asymptotic
optimality relative to all G. When G is not degenerate at 0,
qm(y) > 0 for all y and m.

Theorem 3.1. Under (A2) and (A3) and with tn = tn where t:

is given by (3.8), {tn} is a.o. relative to every G.

25

Proof: Let G and m be fixed, 1 s m s‘M. By (3.7) and the fact
qm(y) > 0, y E‘I, we see that

Em+(y+1) Py qmo+1>

5-;(3') qmw)

(3.9)

 

.YEI-

Since (A2) implies (A1-) (cf. Remark 2.1) we have 0 s tE(y) s 9,

y 6 I, so that (3.9) implies
m P m
(3.10) tn(y) «V tG(y) . y 6 I .

and, hence, (3.2) is satisfied. An application of Lemma 3.1 com-
pletes the proof.

In the absence of (A2) the choice of a sequence {tn}
which is a.o. relative to every G is more complicated. However,
in Section 2.1 we saw that under (Al-), for each m and each
6 > 0 there exists a polynomial E vkek which approximates
zm-1(9) to within 3 uniformly in 9 E 9. For a determination

of such a polynomial, we define

(3.11) 6' 6O) = 1: y, E<y+k> . y e 1

ms

where 3’ is given in (2.19). Reversing the order of summation

in (3.11) we have

_ B l n
(3.12) qmm n 121 2 wk iq(y+k). y 6 I .
where for each i,

M Ika

3.13 a
(A ) I: y, ic1<y+k>I s p(e.y) mil 2: 86%)

in view of our requirement 1q(y) s g-1(y) in (2.19). Hence,

26

2
(3.14) Vary(qm e(y)) S 9.45.112 , y E I, 1 s m SM .

n

Si e E'- I E , where is defined b
M: qu’¢(y) qm,¢(Y)’ y - I qmw y

(2.17), we have by (2.13) and (3.14) that

2
- 2 2 (my) 2 2
(3.15) Ey(q “(1') - qm(y)) s n + 6 q (3'). y e I .

m

The bound p(e,y) defined in (3.13) is independent of C. With
a -o 0 there exist n I n(y,e) such that n-lpz(s,y) -° 0. By
inverting the function for each fixed y we Obtain a choice

e I g(y,n) I 0 with n-1p2(e,y) I 0. For such choices

 

 

P
'_ my
(3.16) qm,e(y) qm(y), y E I, l s m s M .
- mn+1
Theorem 3.2. Under (Al ) and (A3) and with tn = tn where
t: is given by
—- +
m qm (y+1)
(3.17) tn(y)= :1 + AB,yEI,1SmSM,
m.e (y)

with a choice a I e(y,n) such that (3.16) Obtains, [tn] is a.o.
relative to every G.
Proof: Let G and m be fixed, 1 Sim s‘M. By (3.16) and the fact

that qm(y) > O, y E I, we see that

- 4-

<1 (Y‘I‘l) P q (Y‘I'l)

(3.13) 4321—_ _.Y J1.—
+ q (7)

<1,“ (y) m

.yEI.

Under (Al-), 0 s t:(y) s 9 so (3.18) implies

P
(3.19) t:(y) —-y 930') . y 6 I

and, hence, (3.2) is satisfied. An application of Lemma 3.1 com-

pletes the proof.

27

Theorem 3.2 subsumes Theorem 3.1 for in case 2(9), 9 E 9,
is a polynomial a choice corresponding to e I 0 exists. Theorem
3.1 was presented because of its simple proof and because of its

significance in the motivation of Theorem 3.2.

28

§3.2 Estimation Under (Al) and (A2)

Let these two conditions hold throughout this section.
The candidate for an a.o. empirical Bayes rule will be based on
t: defined by (3.8). In the absence of (A3) we have found it

necessary to examine in greater depth the conditional mean square

error of estimation
(3 20) E (tm( ) - t”( >)2
. y n y C y .

Define iqm(y) g z Vk iq(y+-k) for each 1,m.y so that qm'=

%- 2 19m. Let f' denote the function defined by f'(y) I f(y+1)
iIl

and fix m,n, and y. Temporarily suppressing the display of the

dependence on m,n, and y (e.g. qm(y) I q) we can write (3.20) as

8'1: 2 t
(3.21) I0 G P[t > tG + c]dc + I06 P[t < tG - c]dc2 .

Since t I [(aw)+7(a5+] A 9, we see that

(3.22) [t>t+c]$[;$0]+[;'-(tc+c);>0].

G

We can write

(3.23) P[q' - (tG + C)q > 0] = P[u) > 0]
where
_ 1 n
(3.24) (0 I - E (1)
n i=1 1
and

II". II
(3.25) 1w iq (tG + c)iq, i 1,2,...

29

In preparation for bounding the tail probability (3.23) note that,

since tG I q'lq and E iﬁ I q, we have

(3 .26) 11(5)

-cq .

From (3.25) and the bound for iq required in (2.19), we have for

c E [0, tG - 9] and for each i that

(3°27) ‘iw‘ ‘ E‘Yk\(s(y+1+k) + s(y+k)) " p '

By (2.3) of Theorem 1 of Hoeffding (1963) we have

(3.28) P[E 2 0] s epr-Zrig'g'lzl
and hence
B-t:
G - 2 2113
(3.29) g P[m20]dc s 2 .
nq

A similar treatment of PE; 5 0] yields
— 2
(3.30) P[q s 0] s exp{-2r{§%] }

where use is made of the fact that 1Q 5 9/9. Combining (3.22),

(3.29), and (3.30) we obtain

B-t 2 _ 2 2
(3.31) J‘ G P[t > t + c]dc2 s Ze.+ 32 arm—ELL} 5' in.
0 c n£12 2p2

The same bound holds for the second term of (3.21) so that by

(3.20) and (3.21)
(3.32) Eye?» - 1:29))2 s Bn(m.y)

where, with p(m,y) defined in (3.27), Bn(m,y) is defined in

(3.31) with the dependence on m,n and y now displayed.

30

Lemma 3.2. Let N be any function from I to I such that

N(M) « m as M a m. There exists a sequence [Mn] independent
of G such that

(3.33) s (N) 2 v v Bn(m,y) « o as n a m .
“ main yQJ (Mn)

Proof: Let N be fixed such that N(M) « m as ‘M I m. For each

M, let n I n(M) be any increasing sequence of integers independent

of G such that
(3.34) V V B (m,y) I O as M I a .

mar yQHM) “
Inverting n(M) to obtain M(n) will allow a choice of a sequence
Mn I M(n) independent of G such that (3.33) obtains. To see
that such a sequence n(M) independent of G exists, note that

my) 2 z‘“(e>tgeyc<de> 2 2mm»? where u - £9 awe). Then

(3.35) A A qm(y) 2 [2(6) A 1]“[9 A DINO”
n5M.y£N(M)
With
(3.36) E V V p(m,y).
msMIy‘N(M)

we see by the definition of B in (3.31) and by (3.35) and (3.36)

that any choice n I n(M) such that

(3.37) 0%Z(B) A 1]M[p /\ llum) pg]. —0 m 88 M —o a:

will ensure that (3.34) obtains. Any choice

(3.38) n I n(M) I [2(9) A 1].2M p: exp{aNb(M)]

31

where a and b are constants, a > 0, b > 1, is independent of
G and guarantees (3.37) regardless of the value of 9. Hence the
proof of the lemma is complete.

We are ready to define a candidate for an a.o. rule. Let
Lm be decision rules such that R(Lm,G) I 0 as m «to for every
G. Such a choice is possible as was seen in Section 2.3. Let

{Mn} be any sequence of positive integers I m as n I m. For

each n define tn by

mn+ Inn-*-
(3.39) tn = L 1]:me > Mn] + tn 1[mn+1 s Mn]

where for each m, t: is defined in (3.8).

Theorem 3.3. Under (Al) and (A2), with N any function satisfying
the hypothesis of Lemma 3.2 such that PB’k[Y >'N(k)] I 0 as

k1~ m, and with tn defined by (3.39) with {Mn} chosen independent
of G so that (3.33) obtains, the rule {tn} is a.o. relative

to every G.

Proof: Let G be fixed. For mn+1 > Mn’

”+1
(3.40) o s Dn(G) s R(L “ ,c) .. o as n —. m
since Mn 4 a. For m I mn+1 5 Mn’ by applying (3.1) conditional

on 31,...,§n and then completing the expectation,

(3.41) 0 s Dn(G) = E(t:(Y) - 63(2))2 .

2
Since (t:(y) - t2(y)) s 92 for each m,y, and n, the right
hand side of (3.41) is bounded (with Nn I N(Mn)) by
N

n 2 2
(3.42) yfo zyeﬁm - ego» time) + e PEY > N“)

32

which in turn is bounded by
3 43 B (N) +' 2 P Y N
( ° ) n B 8 ’Mn[ > n]

from (3.32), the definition of Bn(N) in (3.33), and Lemma 2.1.
Since (3.33) Obtains for {Mn} and N, the first term in (3.43)
I 0, while the second term in (3.43) I 0 by the definition of
the function N and the proof is complete.

This section is subsumed by the succeeding section in the
same manner as Theorem 3.1 was subsumed by Theorem 3.2. However
we have included this section for its significance in motivating
the deveIOpment of a.o. rules in the succeeding section.

An earlier construction used weaker bounds on the condi-
tional mean square error of estimation and required the imposition

of an assumption
+
(A1) OCEO.B]=d.o>0.BEH

in order to determine the choice [Mn] independent of G.
Professor James Hannan observed that an application of Hoeffding's
Theorem 1 yielded a bound from which a construction could be

accomplished under (Al).

33

§3.3 Estimation Under (A1)

Let (A1) hold throughout this section. The candidate for
an a.o. empirical Bayes rule will be based on t: defined in

(3.17). For m 2 l and g 2 0, define

(3.44) tm =.Jﬂaﬁ

G96 qm C

where qm is defined by (2.17) and f'(y) I f(y+1) for any

function f. For each m and g,

1 “9'

m m -
(3.45) It - tGI qm,e m.e

(1,. - 9,;I + BIqM - qu} -

Fixing m and taking 23 < zm-1(9) fixed,

(3.46) q 2 zm'1<e>q

m

and from (3.46), (2.18), and the choice of e

(3.47) 9m”; >- qm - eq 2 i zm'1(6)q

Hence from (2.18), (3.47), and the fact that q' 5 9q, the right

hand side of (3.45) is bounded by
(3.48) 43 e{z““'1(e)]'1 .
For the choice of e, we see

111
(3.49) 0 < tG.e s 39

where use is also made of (3.47), (3.48), and the fact that

m

0 s tG

s 9 under (A1). Now define

-, + - +
(3.50) Tie I [(qm’e) /(qm,e) 1 A 3B

34

with Em 8 defined in (3.11) and note that
9

m m 2 m m 2
( ) (Tnse G’s ( n G96)
where t: is defined by (3.17). Following the same procedure

leading to (3.32), we have that for each m,y,n, and e < zm-1(9)/2,

111 I'll
(3.52) Eyomo) - t

2 *
Gwho) s Bn(m.y.e) 2

 

4 2 2 -982“ (I: (Y)
3.91.211... 185 ex“ 1L }

nq‘i’eb') 2 92(msYsC)
where
3
(3-53) p(n.y.e) =-.: 24ka (W + 87331?)

From (3.51) and (3.52), we have

2 'k
(3.54) Ey(t:(y) - tg’e(y)) S Bn(m.y.e)

for each m,n,y, and e < zm-1(9)/2.
Lemma 3.3. Let N be any function from I to I such that
N(M) arm as M I m. There exist sequences [Mn] and {en}

independent of G such that

*

(3.55) Bn(N) 2 v v B*(m,y,en) -. 0 as n _. ..
mQ‘ln yQHMn)

and

(3.56) ¢n( A zm-1(B))-1 I 0 as n I a .
ms'Mn

Proof: Let N be fixed such that N(M) I m as M.~ m. For

each M, let can) ‘be any null sequence such that g(M)( A zm-l(9))“1
‘mSM
I 0 as M a a. Let n I n(M) be any increasing sequence of

35

integers independent of G such that
*

(3.57) V V Bn(m,y,e(M)) I O as M I m .

m9! yam)
Inverting n(M) to obtain M(n) will allow a choice of sequences
M.“ I M(n) and g“ I g(Mn) independent of G such that (3.55)
and (3.56) Obtain. Again such a choice of n(M) independent of
G is possible. Without loss of generality, g(M) s k( A 2m-1(9))

mSM

so that from (3.47), qm 2 % Zm-1(B)q 2 3 zm(9)p.y where

.804)
p. - {£9 G(d9). Then

(3.58) A A qm (M) (y) 2 {2(9) A 1]”[9 A 1]N(M) .
m4! yam) "

With
(3059) DM 5 V V 901101600):
mSMIy£N(M)

we see as in the proof of Lemma 3.2 that any choice
-2M * 2 b
(3-60) In I 1900 I [2(8) A 1] (9M) exp{aN 04)}

where a and b are constants, a > 0, b > 1, is independent of
G and guarantees (3.57). Hence the proof is complete.

Define a candidate for an a.o. rule by letting Lm be
decision rules such that R(Lm,G) I O as m.» o for every G.
Let {Mn} be any sequence of positive integers I m as n I m

and let {en} be any null sequence. For each n, define tn by

m m
= n+1 n+
(3.61) tn L [mn+1 2 Mn] + tn l[mn+1 5 Mn]

where for each m and e, t: is defined by (3.17).

36

Theorem 3.4. Under (A1), with N any function satisfying the
hypotheses of Lemma 3.3 such that PB REY >'N(k)] I O as k I m,
and with tn defined by (3.61) with {Mn} and {an} chosen
independent of G so that (3.55) and (3.56) obtain, the rule
{tn} is a.o. relative to every G.

Proof: Let G be fixed. For mm”1 >’Mn, (3.40) holds since

M I»m. For m = m 5 Mn’ (3.41) holds and its right hand side

n n+1
is bounded by

N“ m m
(3.62) 2 z Ey(tn(y) - tG

(y))2h (y) +-
y=0 3 6“ m

2 2
e (y) - t:(y)) hm(y) + B PB,Mn[Y >Nn]

with Nn = N(Mn) by use of the cr-inequality (cf. Loéve (1963,
2
p. 155)), the fact that (t: - t3) 5 a , and Lemma 2.1. The third

term of (3.62) I O as n I m by the choice of N. Without loss

of generality, an s k A zm-1(B) so that from (3.54) and (3.48)
mSM

the first two terms of (3362) are bounded by

(3.63) 23:<N) + 86 en( A z“"'1(e))'1

mSM
n

which I O as n I m since (3.55) and (3.56) obtain. Hence the

proof is complete.

CHAPTER IV

TESTING AND FINAL REMARKS

§4.l Testing Under (A1-) gEd_(A3)

In this chapter we will exhibit sequences of decision rules
which are a.o. relative to every G in the case of linear loss
testing in the special discrete exponential families described in
Section 2.1. Recall from that section that a Bayes rule with
respect ot G in the sample size m component problem is provided
by (2.11) and (2.12). Lemma 1 of Johns and Van Ryzin (1971, p.

1524) establishes the useful inequality for each m
m m a m m
(4.1) o s Rn‘bn’c) - R (a) s b 2 ‘ac(y)|gm(y)ryc|an(y) -
y=0
m m
aG(y)l 2 \aG(y)\]
for any empirical Bayes procedure 6: defined by
(4.2) 6%) = (also) s 01. y e I .

where a:(y) is a function of g1,...,§n.

The following lemma proves useful in establishing the
asymptotic Optimality of Specified sequences of decision rules when
assumption (A3) obtains.

Lemma 4.1. Suppose 6: is defined for each m to be a y-measurable

decision rule in the sample size m problem of the form (4.2)

37

38

where a:(y) is a function of §_,...,§n for each y E I. Under

1
(A3) and with G any prior such that g e G(de) < m,

P
(4.3) a:(y) Iy a2(y) for all y E I, l s m S'M,
. m‘+1
implies that the sequence 5n = 6n“ is a.o. relative to G.

Proof: Let G be fixed such that IS G(d9) < m. Fix m,
m 8 'm m m
1 s m s M. Since 1 2 [aG(y) # O]Py[|an(y) - aG(y)‘ 2 ”66)” I O
for each y 6‘1 by (4.3) and since 2 ‘a2(y)\g (y) s
Y‘0 m

j(e+c)G(de) < m, the dominated convergence theorem applies to show

8
that the right hand side of (4.1) I O and hence

m m
(4.4) Rn(6n,G) - R (G) I o .

Since 1 S mn+1 s M for all n under (A3), (4.4) implies that
Dn(G) I O as was to be proved.

Theorem 4.1. Under (AI') and (A3) and with

(4.5) aim = Eamon) - c Ems). y e I. m 2 1.

where Eg" is defined in (3.11) with a choic; of e = e(y,n)
such that (3.16) obtains, the sequence 6n ' 6nn+1 with 6:
defined by (4.2) is a.o. relative to all C.

2392;; Let G be fixed. Under (A1'), a is bounded so that
I 9 G(de) < m. From (3.16) it follows that (4.3) is satisfied
8

and so an application of Lemma 4.1 completes the proof.

39

§4.2 Testing Under (Al)

Let (A1) obtain throughout this section. With f'(y) =

f(y+l) for any function f, define

m = v - c
(4.6) 86,; gm,e qm,e ’
for each m and c where qIn e is defined by (2.17). With
a: defined by (2.12),
m _ m I _ I
(4'7) ‘aG,e G‘ ‘ ‘qm.e qm|

from (2.18). In light of the fact that q' s ﬁq, the right hand

side of (4.7) is bounded by

(4-8) e(B‘+ C)q -

For each m and 6. define

m ='-, _ '-
(4.9) an,e qm’e c qm’e, n 2 l,

where E; e is defined by (3.11). Reversing the order of summa-
9

tion in (4.9) yields

n
(4.10) “2.5” = l g): vk(iq(y+1+k) - c 1q(y+k)). y e x. n 2 1.

“ 1 1

where for each i
(4.11) \z yk(iq(y+1+k) - c iq(y-Hcm s p(e.m.y) a

C

1
“M (W + g<r+k>

 

)

40

in view of our requirement that 1q 5 3-1 in (2.19). Noting

m -01
that Ey(an,e(y)) aG,e(y), y E I, n 2 l, we have from (4.10)
and (4.11)
2
, 2 m - m 2 .e_i1d2stl
(4 1 ) Ey(an e(y) aG,e(y)) s n . y E I. n 2 1 .
Since

m m

“an,e - 86‘ 2 ‘a2|] s [‘a:,e - a: 3| 2 k‘ag‘] +

m m m
“an,e - ac‘ 2 k‘ac‘], n 2 l,
the summand on the right hand side of (4.1) with a: B a: e is

bounded for each n,m,g, and y by

(4.13) |a§<y)\gm(y)Pytia: e(y) - ag’e<y>) 2 r\a§(y>\]
+2%WH%3®)-%UH.

In view of (4.7), (4.8) and the fact that gmq s.[zm-1(s)]-1hm,

the second term of (4.13) is bounded by

(4.14) 2(e+c>e[zm'1<a>]'1hm<y> .

Using a Markov bound on the first term of (4.13), this term is

bounded by

(4.15) zgm<y)[2y(e: e(y) - am

2 %
mm) ] .

which, in light of (4.12),is itself bounded by

(4.16) 2gm(y)n-%p(e.m.y)

41

Combining (4.13), (4.14) and (4.16), the first N terms of the

right hand side of (4.1) with a: 8 a: e is bounded by
-3 N m-l -1
(4.17) 2n b 2 &m(y)p(e,m:Y) + 2b<a+c>e[z (8)] .
y=0

The bound (4.17) motivates the following lemma.
Lemma 4.1. With N any function from I to I, there exist

sequences Mn and an independent of G such that

mm;

(mm) %m>sf5v 2 %owuwmw~o a new
tmﬁﬂ y=0
n
and
‘k - ..
(mm) %EeJA rmhm>1~oesn-..
mil“!n

Proof: For each M, let g(M) be any null sequence such that

6(M)( A zm’1(B))-1.I O as M I m. Since p(M,N) a

(M)
V Z gm(y)p(e(M),m,y) is independent of G, n = nOM) can be
“1‘“ Y'0 J!
chosen independent of G such that n

p(M,N) I 0 as M Ion.
Inverting n(M) to obtain M(n) allows the choices of Mn ' M(n)
and en ' g(Mn) independent of G such that (4.18) and (4.19)
obtain.

Now let Lm be any decision rules such that R(Lm,G) _. 0

as m I m for every G. Such a choice is possible as was seen

in Section 2.3. Then for any sequences {Mn} and {an}, define
m m
a n+ n+
(4.20) 6n L ltmn+1 > Mn] + 6n ltmn+l s Mn]
where, for each m and n, 6:. is defined by (4.2) with

m tn
(4.21) an(y) an e (y), y 6 1,, m,n 2 l ,

’ n

42

where a: e is defined by (4.9).
9

Theorem 4.2. Under (Al) and with N a function from I to I

defined such that
(4.22) Pa,k[Y > N(k)] I 0 as k I do

and with 6n defined by (4.20) with {Mn} and {en} chosen
independent of G such that (4.18) and (4.19) obtain, the rule
{5“} is a.o. relative to every G.

Proof: Let G be fixed. For mn+1>Mn,

m
(4.23) o s Dn(G) s R(L “1.6) .. o as n -. on

since Mn I»@. For m I mh+1 s Mn’ 0 s Dn(G) is bounded by the

right hand side of (4.1). In light of (4.17), the fact that

gm(y)|ag(y)\ s (B+c)hm(y) for each ‘m and y, and Lemma 2.1,

the right hand side of (4.1) is bounded by

(4.24) 21, pnm) + 2b(a+c)p: + (3+c)P [Y > N(Mn)]

6 .Mn

which I 0 as n I a from (4.18), (4.19), and (4.22) by the

choice of N, {Mn}. and {an}. Hence the proof is complete.

43

§4.3 Final Remarks

The rules presented in Chapters III and IV have several
competitors. The first competitor which we shall discuss arises
in the following manner. Suppose for each m that, in an
empirical Bayes problem involving repetitions of a sample size m
component problem, there is a sequence {Y2} of rules which is
a.o. relative to every G. One might then consider for use in the
corresponding varying sample size empirical Bayes problem, a pro-
cedure {whl which partitions the problem into those involving
a common sample size and uses the apprOpriate {Y2} within each
class, i.e.,

n

2 m =m .
i=1[i ]

(4.25) = m] where k(n,m)

m
Th Yk(n,m)[mn+l

Such a rule was suggested by Professor Hannan as a first thought
as to what could be done in the varying sample size empirical
Bayes problem. Under (A3), i¢h} will be a.o. relative to every
G. However such a rule does not use all of the past data at each
stage and so we could say that it does not use all of the available
"information about G". So intuitively at least, when dealing
with the particular component problem of Chapter II, our method
seems better in the sense that it uses all of the past data
available. In the absence of (A3) the rule (4.25) would not be
a.o. since a new sample size can then appear infinitely often.
Professor Hannan has also suggested more sophisticated estimators

of qm based on averaging across mrtuples within X

"i’ miZm,

which use more of the available data.

44

One can use {mh} to obtain ¢-asymptotic optimality
relative to every G by employing the device used in Sections
3.2, 3.3, and 4.2. For if Lm is a decision rule in the sample

size m problem such that R(Lm,G) s e if m 2 M. for every G,

then

* __ mn+1

(P11 - (pn[mn+1 s M] + L [mu-+1 > M]

will result in 11;; Dn(G) s e for every G. It is possible that
we might obtain asymptotic Optimality by replacing M by a proper
choice of Mn I>w but this idea has not been fully explored.

The second competitor that we look at is one which would
arise from the first track discussed in Section 1.3. We will
discuss this competitor in the context of the special component
problem of Chapter II. For an estimator Gn based on
31,...,§' and taking values on the set of probability distributions

n
on [0,8], let

(4.26) and) -=- j eyzm<e)én<de>

Q
Since eyzm(9) is bounded and continuous in e E Q, the Helly Bray
Lemma (cf. Loeve (1963, p. 180)) implies that 6mm “23' qm(y)
for each y E I if an I G in distribution a.s. Tucker (1963),
Rolph (1968), and Meeden (1972) have demonstrated the existence of
such estimators G“ in the case of identical sample sizes. We
saw in the proofs of Theorems 3.1, 3.2, and 4.1 that rules based
on consistent estimators of qm can easily be shown to be a.o

relative to every G under (A3). Again in the absence of (A3)

45

the rules based on am might possibly be extended to a.o. rules
by choice of rules Lm and a sequence {Mn}.

The assumption that the parameter Space ® is bounded is
a stronger assumption than what is needed to prove asymptotic

Optimality in the particular case of the component problem of

Chapter II where the sample sizes are identical in the empirical
Bayes problem. Macky (1966) and Hannan and Macky (1971) have
demonstrated a.o. procedures for estimation when ® = [0,m)

while Robbins (1963) and Samuel (1963) have demonstrated a.o.
procedures for testing when @C [0,») relative to every G for
which I9 G(de) < m. So one would hOpe that in the future methods
for estgblishing a.o. procedures in the varying sample size empirical
Bayes problem could be found without restricting ® to a bounded

set.

BIBLIOGRAPHY

BIBLIOGRAPHY

Ferguson, Thomas S. (1967). Mathematical Statistics. é_Decision
Theoretical Approach. Academic Press, New York and London.

Hannan, James and Macky, David W. (1971). Empirical Bayes squared
error loss estimation of unbounded functionals in
exponential families. Unpublished.

Hoeffding, Wassily (1963). Probability inequalities for sums of

bounded random variables. J. Amer. Statist. Assoc. 58
13-30 0

Lehmann, E.L. (1959). Testing Statistical Hypotheses. John Wiley
& Sons, New York.

Loéve, Michel (1963). Probability Theory. Third Edition, Van Nostrand,
Princeton.

Johns, M.V., Jr. (1957). Non-parametric empirical Bayes procedures.
Ann. Math. Statist. 28 649-669.

Johns, M.V., Jr. and Van Ryzin, J. (1971). Convergence rates for
empirical Bayes two-action problems I. Discrete case.
Ann. Math. Statist. 42 1521-1539.

Macky, David W. (1966). Empirical Bayes estimation in an exponential
family. RM-l76, Department of Statistics and Probability,
Michigan State University.

Maritz, J.S. (1970). Empirical Bayes Methods, Methuen and Co. Ltd.,
London.

Meeden, Glen (1972). Bayes estimation of the mixing distribution,
the discrete case. Presented at l33rd Meeting of the
Institute of Mathematical Statistics, Ames, Iowa, April
1972.

Neyman, J. (1962). Two breakthroughs in the theory of statistical
decision making. Rev. Inst. Internat. Statist. 30 11-27.

Robbins, Herbert (1956). An empirical Bayes approach to statistics.

Proc. Third Berkeley Symp. Math. Statist. Prob. I, University
of California Press, 157-163.

46

47

Robbins, H. (1963). The empirical Bayes approach to testing

statistical hypotheses. Rev. Inst. Internat. Statist. 31
195-208.

Robbins, Herbert (1964). The empirical Bayes approach to statistical
decision problems. Ann. Math. Statist. 35 1-20.

Rolph, John E. (1968). Bayesian estimation of mixing distributions.
Ann. Math. Statist. 39 1289-1302.

Samuel, Ester (1963). An empirical Bayes approach to the testing
of certain parametric hypotheses. Ann. Math. Statist. 34
1370-1385.

Tucker, Howard G. (1963). An estimate of the compounding distribu-
tion of a compound Poisson distribution. Theor. Prob.
Appl. 8 195-200.

‘1» in!
Ni

_' ‘g'L‘ ..- _._.. ﬂimwa

 

  

MTIWITWIMIIEﬁiﬂijiﬁiﬂililiuiﬁWES

1029