
SOME ADMISSIBILITY CONSIDERATIONS IN THE FINITE
STATE COMPONENT COMPOUND AND EMPIRICAL BAYES
DECISION PROBLEMS

Dissertation for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
JOHN ELVIN BOYER, JR.

1976

This is to certify that the

thesis entitled

SOME ADMISSIBILITY CONSIDERATIONS IN
THE FINITE STATE COMPONENT COMPOUND AND
EMPIRICAL BAYES DECISION PROBLEMS

presented by

John Elvin Boyer, Jr.

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Statistics and Probability

Major professor

Date

ABSTRACT
SOME ADMISSIBILITY CONSIDERATIONS IN
THE FINITE STATE COMPONENT COMPOUND AND
EMPIRICAL BAYES DECISION PROBLEMS
BY

John Elvin Boyer, Jr.

We consider the compound and empirical Bayes decision problems with finite state component. Relationships between the admissibility of a compound rule and the admissibility of the component decision rules it selects are established. Analogous results are obtained in the empirical Bayes decision problem. The main result is the demonstration of an admissible Bayes ($\Lambda$) empirical Bayes decision rule which is asymptotically optimal for a large class of $\Lambda$.

SOME ADMISSIBILITY CONSIDERATIONS IN
THE FINITE STATE COMPONENT COMPOUND AND
EMPIRICAL BAYES DECISION PROBLEMS

BY

John Elvin Boyer, Jr.

A DISSERTATION
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1976

ACKNOWLEDGEMENTS

I wish to express my sincerest thanks to Professor Dennis C. Gilliland for his guidance and encouragement in my studies and in the preparation of this thesis. His patience and willingness to discuss any problem at any time are greatly appreciated.

I also wish to thank Professor James Hannan for his critical reading of this thesis and his encouragement as it progressed. Special thanks are due to Mrs. Noraiee Burkhardt for her typing of the manuscript, and the patience with which she did it.

TABLE OF CONTENTS

Chapter
1 INTRODUCTION
   1. Component Decision Problem
   2. Compound Decision Problem
   3. Empirical Bayes Decision Problem
2 RELATIONSHIPS BETWEEN COMPOUND AND COMPONENT ADMISSIBILITY
   1. Introduction
   2. Relationships between Compound and Component Admissibility
3 ADMISSIBLE (BAYES) SOLUTIONS TO THE EMPIRICAL BAYES PROBLEM
   1. Admissibility and Component Admissibility in the Empirical Bayes Problem
   2. Bayes Procedures in the Finite State Component Empirical Bayes Problem
   3. Bayes Risk Consistency of the Bayes Procedures
APPENDIX A
APPENDIX B
REFERENCES

CHAPTER 1

INTRODUCTION

1. Component Decision Problem

Consider the following statistical decision problem, called the component decision problem. Let $\{P_\theta : \theta \in \Theta\}$ be a family of probability measures over a $\sigma$-field $\mathcal{B}$ of subsets of $\mathcal{X}$; $\Theta$ is called the parameter or state space. $X$ denotes an $\mathcal{X}$-valued, $P_\theta$-distributed random variable. Let $A$ denote the action space and $L \ge 0$ the loss function defined on $\Theta \times A$. Let $\mathcal{C}$ be a $\sigma$-field of subsets of $A$ with respect to which the $\theta$-sections of $L$ are measurable. A (randomized) decision rule $t$ is a function having domain $\mathcal{X} \times \mathcal{C}$ such that each $x$-section of $t$ is a probability measure on $\mathcal{C}$ and each $\mathcal{C}$-section of $t$ is $\mathcal{B}$-measurable. The risk of $t$ at state $\theta$ is

(1) $R(\theta,t) = \iint L(\theta,a)\,t(x,da)\,P_\theta(dx)$.

Let $\mathcal{G}$ denote a class of probability measures (priors) on $\mathcal{F}$, a $\sigma$-field of subsets of $\Theta$ with respect to which the $t$-sections of $R(\theta,t)$ are measurable. The Bayes risk of $t$ at $G \in \mathcal{G}$ is

(2) $R(G,t) = \int R(\theta,t)\,G(d\theta)$.

Let $T$ be a specified class of component decision rules. The infimum Bayes risk at $G$ is

(3) $R(G) = \inf_{t \in T} R(G,t)$,

and $R(\cdot)$ defined by (3) is called the Bayes envelope. For $t'$ such that $R(G,t') = R(G)$ we write $t' = t_G$ and call $t_G$ a Bayes rule with respect to $G$. Note that $t_G$ need not exist and, if it does exist, it need not be unique. However, if it does exist at each $G$ and a minimizer is specified at each $G$, then the decision-rule-valued function $t_{(\cdot)}$ defined on $\mathcal{G}$ is called a Bayes response.

This thesis considers the finite $\Theta$ case almost exclusively. Here we write $\Theta = \{0,1,\ldots,m\}$ and assume that $P_0, P_1, \ldots, P_m$ are distinct probability measures on $\mathcal{B}$. We write $P_G = \sum_{i=0}^m G_i P_i$ for $G = (G_0, G_1, \ldots, G_m) \in \mathcal{G}$, here the $m$-dimensional simplex in $R^{m+1}$, Euclidean $(m+1)$-space. We let $f_i$ denote a (bounded) density of $P_i$ with respect to $\mu = \sum_{i=0}^m P_i$, $i = 0,1,\ldots,m$, whenever densities are used. In the finite $\Theta$ case, the risk function $R(\cdot,t)$ defined by (1) is a vector $s = (s^0, s^1, \ldots, s^m)$ in $[0,\infty]^{m+1}$. The collection of all such $s$ is termed the risk set and is denoted by $S$. The Bayes risk of $s$ at $G$ is $\sum_{i=0}^m G_i s^i$. If $S$ is closed from below, then $t_G$ exists for every $G \in \mathcal{G}$.
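In the finite $\Theta$ case the Bayes risk is an inner product, so a Bayes rule can be found by minimizing $\sum_i G_i s^i$ over the risk set. The Python sketch below assumes, for illustration only, that $S$ is the convex hull of finitely many listed risk points (for instance, those of finitely many nonrandomized rules). A linear function attains its minimum over such a hull at one of the listed points, so searching those points suffices. The function names are hypothetical.

```python
from typing import Sequence, Tuple

RiskPoint = Sequence[float]


def bayes_risk(G: Sequence[float], s: RiskPoint) -> float:
    """Bayes risk sum_i G_i s^i of the risk point s at the prior G on {0,...,m}."""
    return sum(g * si for g, si in zip(G, s))


def bayes_response(G: Sequence[float], points: Sequence[RiskPoint]) -> Tuple[int, float]:
    """Index of a risk point minimizing the Bayes risk at G, and the envelope value R(G).

    Ties go to the first minimizer, i.e. one minimizer is *specified* at each G,
    as required for a Bayes response.
    """
    best = min(range(len(points)), key=lambda k: bayes_risk(G, points[k]))
    return best, bayes_risk(G, points[best])


# Two-state example (m = 1): risk points (s^0, s^1) of four candidate rules.
points = [(0.0, 1.0), (1.0, 0.0), (0.3, 0.3), (0.5, 0.1)]
k, envelope = bayes_response((0.2, 0.8), points)
print(k, round(envelope, 4))  # 3 0.18
```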

2. Compound Decision Problem.

 

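In the compound decision problem the component problem occurs repeatedly and independently $n$ times, presenting a sequence $(\theta_\alpha, X_\alpha)$, $\alpha = 1,2,\ldots,n$. The $\theta_\alpha$ are unobservable. We write $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_n)$ and $\mathbf{X} = (X_1, X_2, \ldots, X_n) \sim P_{\boldsymbol{\theta}} = P_{\theta_1} \times P_{\theta_2} \times \cdots \times P_{\theta_n}$. $\check{\boldsymbol{\theta}}_\alpha$ denotes $\boldsymbol{\theta}$ with the $\alpha$-component deleted and, similarly, for $\check{\mathbf{X}}_\alpha$, $\alpha = 1,2,\ldots,n$. For the class of compound decision rules $\mathbf{t} = (t_1, \ldots, t_n)$ we take all functions $\mathbf{t}$ such that, for each $\alpha$ in $\{1,2,\ldots,n\}$, $t_\alpha$ is a function on $\mathcal{X}^n \times \mathcal{C}$ with the property that every $\check{\mathbf{x}}_\alpha$-section of $t_\alpha$ is a component decision rule (as in Section 1) and such that

$s_\alpha^\theta(\check{\mathbf{x}}_\alpha) = \iint L(\theta,a)\,t_\alpha(\mathbf{x},da)\,P_\theta(dx_\alpha)$

is a measurable function of $\check{\mathbf{x}}_\alpha$. Of course, $s_\alpha(\check{\mathbf{x}}_\alpha) = (s_\alpha^0(\check{\mathbf{x}}_\alpha), \ldots, s_\alpha^m(\check{\mathbf{x}}_\alpha))$ is then a measurable function of $\check{\mathbf{x}}_\alpha$ into $S$. The unconditional component $\alpha$ risk is denoted by

(4) $R_\alpha(\boldsymbol{\theta},\mathbf{t}) = \int R(\theta_\alpha, t_\alpha(\check{\mathbf{x}}_\alpha))\,P_{\check{\boldsymbol{\theta}}_\alpha}(d\check{\mathbf{x}}_\alpha)$, $\alpha = 1,2,\ldots,n$,

and the compound (average risk across components) risk is denoted by

(5) $R(\boldsymbol{\theta},\mathbf{t}) = n^{-1} \sum_{\alpha=1}^n R_\alpha(\boldsymbol{\theta},\mathbf{t})$.

In the $\Theta = \{0,1,\ldots,m\}$ case, $s_\alpha^i(\check{\mathbf{x}}_\alpha) = R(i, t_\alpha(\check{\mathbf{x}}_\alpha))$, $i = 0,1,\ldots,m$, and the unconditional component $\alpha$ risk is a $P_{\check{\boldsymbol{\theta}}_\alpha}$-average of the $S$-valued function $s_\alpha(\check{\mathbf{x}}_\alpha)$.

The compound decision problem is invariant under the group of permutations of coordinates. $\mathbf{t}$ is equivariant if $t_\alpha(\mathbf{x}) = t(x_\alpha; \check{\mathbf{x}}_\alpha)$, $\alpha = 1,2,\ldots,n$, where $t$ as written is a symmetric function of its second argument. For such $\mathbf{t}$, $s_\alpha(\check{\mathbf{x}}_\alpha) = s(\check{\mathbf{x}}_\alpha)$, $\alpha = 1,2,\ldots,n$, where $s$ is a symmetric $S$-valued function on $\mathcal{X}^{n-1}$. Equivariant $\mathbf{t}$ have compound risk depending on $\boldsymbol{\theta}$ only through its empirical distribution, which is denoted by $\mathbf{n} = (n_0, n_1, \ldots, n_m)$ in the $\Theta = \{0,1,\ldots,m\}$ case.

There has been increased concern over the component risk behavior of compound decision rules. In Chapter 2 we explore the relationships between the compound admissibility of $\mathbf{t}$, the admissibility of the $\check{\mathbf{x}}_\alpha$-sections of $t_\alpha$, $\alpha = 1,2,\ldots,n$, and the admissibility of the $\check{\boldsymbol{\theta}}_\alpha$-sections of the risk functions $R_\alpha(\boldsymbol{\theta},\mathbf{t})$, $\alpha = 1,2,\ldots,n$. Examples 1-3 and Theorems 1 and 2 serve to delineate the implications in the finite $\Theta$ case.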

3. Empirical Bayes Decision Problem.

 

In the empirical Bayes decision problem $\theta_1, \theta_2, \ldots$ are iid $G$ (unknown) distributed random variables, and the conditional distribution of $\mathbf{X} = (X_1, X_2, \ldots)$ given $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots)$ is $P_{\theta_1} \times P_{\theta_2} \times \cdots$. The marginal distribution of $\mathbf{X}$ is $P_G \times P_G \times \cdots$, hereafter denoted simply by $P_G$.

In the empirical Bayes problem in the finite $\Theta$ case we let $f_G = \sum_{i=0}^m G_i f_i$, which is a density of $P_G$ with respect to $\mu$. We also assume the linear independence of the densities $f_0, f_1, \ldots, f_m$ in $L_2(\mu)$, ensuring, among other things, the identifiability of the mixtures $P_G$ and the existence of unbiased bounded estimators of $G$.
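The last claim can be made concrete when $\mathcal{X}$ is finite. The Python sketch below rests on assumptions that are not in the text: the $P_i$ are given as probability vectors on a finite $\mathcal{X} = \{0, \ldots, K-1\}$, and the Gram matrix is formed with counting measure rather than $\mu$ (either Gram matrix is invertible exactly when the densities are linearly independent). It builds functions $h_0, \ldots, h_m$ in the span of the densities with $E_j h_i = \delta_{ij}$. Then, for any $G$, $E_G h_i(X) = \sum_j G_j E_j h_i = G_i$, so the sample average of $h_i(X_\alpha)$ is a bounded unbiased estimator of $G_i$. The helper names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def invert(A: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(A)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("Gram matrix is singular: densities are linearly dependent")
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_basis(pmfs: Matrix) -> Matrix:
    """Rows h_i = sum_l A[i][l] p_l with A = (P P^T)^{-1}, so sum_x h_i(x) p_j(x) = delta_ij."""
    k, size = len(pmfs), len(pmfs[0])
    gram = [[sum(a * b for a, b in zip(pmfs[i], pmfs[j])) for j in range(k)] for i in range(k)]
    A = invert(gram)
    return [[sum(A[i][l] * pmfs[l][x] for l in range(k)) for x in range(size)] for i in range(k)]


def unbiased_estimate(sample: Sequence[int], h: Matrix) -> List[float]:
    """(1/N) sum_alpha h_i(X_alpha), i = 0..m; unbiased for G_i because E_G h_i(X) = G_i."""
    return [sum(h_i[x] for x in sample) / len(sample) for h_i in h]


# Two states on X = {0, 1, 2}; the pmfs play the role of the densities f_0, f_1.
pmfs = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]
h = dual_basis(pmfs)
print(unbiased_estimate([0, 2, 2, 1, 2], h))
```

On a finite $\mathcal{X}$ the $h_i$ are automatically bounded. For general $\mathcal{X}$ the same construction in $L_2(\mu)$ uses the bounded densities assumed in Section 1.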

Let $\mathbf{t} = (t_1, t_2, \ldots)$, where for each $n$ the action selected for component $n$ is by $t_n$, the $n$th component of a compound rule; that is, $t_n$ is a function on $\mathcal{X}^n \times \mathcal{C}$ such that every $\check{\mathbf{x}}_n$-section is a component rule, where $\check{\mathbf{x}}_n = (x_1, \ldots, x_{n-1})$, and

$R(\theta, t_n(\check{\mathbf{x}}_n)) = \iint L(\theta,a)\,t_n(\mathbf{x},da)\,P_\theta(dx_n)$

is a measurable function of $\check{\mathbf{x}}_n$. Robbins (1956) introduced the empirical Bayes problem, giving examples of constructions of empirical Bayes rules whose component $n$ Bayes risks converge to the component minimum Bayes risk $R(G)$, whatever be $G$.

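The conditional on $\check{\mathbf{X}}_n$ component $n$ Bayes risk of $\mathbf{t}$ is

(6) $R(G, t_n(\check{\mathbf{X}}_n)) = \sum_{i=0}^m G_i R(i, t_n(\check{\mathbf{X}}_n))$, $n = 1,2,\ldots$,

and the unconditional component $n$ Bayes risk of $\mathbf{t}$ is

(7) $R_n(G, t_n) = \int R(G, t_n(\check{\mathbf{x}}_n))\,P_G^{n-1}(d\check{\mathbf{x}}_n)$, $n = 1,2,\ldots$.

Of course, for each $n$ and $G$,

(8) $R(G) \le R(G, t_n(\check{\mathbf{X}}_n))$,

from which $R(G) \le R_n(G,t_n)$ for all $\mathbf{t}$, $n$ and $G$.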

An empirical Bayes rule $\mathbf{t}$ is said to be strongly Bayes risk consistent if

$R(G, t_n(\check{\mathbf{X}}_n)) \to R(G)$ a.s. $[P_G]$ for all $G \in \mathcal{G}$,

and simply Bayes risk consistent (usually termed asymptotically optimal) if

$R_n(G, t_n) \to R(G)$ for all $G \in \mathcal{G}$.

Chapter 3 concerns the empirical Bayes case. In Section 1, Theorem 3 and Example 4 delineate relationships between the admissibility of $t_n$ as a decision rule for decisions concerning $G$ and the admissibility of the $\check{\mathbf{x}}_n$-sections in the component decision problem. Theorem 4 shows that, for equivariant compound $\mathbf{t}$, the empirical Bayes admissibility of $t_n$ implies the compound admissibility of $\mathbf{t}$. Theorem 5 shows that admissible empirical Bayes rules result from playing Bayes versus a second level prior $\Lambda$ on $\mathcal{G}$.

In Section 2 of Chapter 3, Lemma 1 exposes the structure of Bayes ($\Lambda$) empirical Bayes procedures. This lemma, together with a series of lemmas in Section 3, culminates in a proof of the main result (Theorem 7), namely, the strong Bayes risk consistency of the Bayes ($\Lambda$) empirical Bayes procedure, provided each point $G \in \mathcal{G}$ is a point of support of $\Lambda$.

CHAPTER 2

RELATIONSHIPS BETWEEN
COMPOUND AND COMPONENT ADMISSIBILITY

1. Introduction.

In this chapter we consider some admissibility questions in the compound decision problem. In particular, we explore the relationship between the admissibility of a compound decision procedure and the admissibility of the risk functions it selects for component decisions. We define three criteria which we are interested in comparing.

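Definition 1. (A) A compound rule $\mathbf{t}$ is admissible if there does not exist a compound rule $\mathbf{t}^*$ such that $R(\boldsymbol{\theta}, \mathbf{t}^*) \le R(\boldsymbol{\theta}, \mathbf{t})$ for all $\boldsymbol{\theta}$ in $\Theta^n$ and $R(\boldsymbol{\theta}, \mathbf{t}^*) < R(\boldsymbol{\theta}, \mathbf{t})$ for some $\boldsymbol{\theta}$ in $\Theta^n$.

Definition 2. (CA) A compound rule $\mathbf{t}$ is component admissible if, for every $\alpha$ in $\{1,2,\ldots,n\}$, every $\check{\boldsymbol{\theta}}_\alpha$-section of $R_\alpha(\boldsymbol{\theta},\mathbf{t})$ is an admissible risk function in the component.

Definition 3. (CCA) A compound rule $\mathbf{t}$ is conditionally component admissible if, for each $\alpha = 1,2,\ldots,n$ and $\check{\boldsymbol{\theta}}_\alpha$ in $\Theta^{n-1}$, almost all $[P_{\check{\boldsymbol{\theta}}_\alpha}]$ $\check{\mathbf{x}}_\alpha$-sections of $t_\alpha$ are admissible decision rules in the component.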
Of course, (A) is the standard admissibility definition applied to compound decision rules $\mathbf{t}$. Gilliland and Hannan (1974) discuss the restricted compound decision problem wherein only compound rules taking values in a specified restricted component risk set are considered. For example, if the component risk set is restricted to the admissible risk functions, then all $\check{\mathbf{x}}_\alpha$-sections of $t_\alpha$ are admissible decision rules in the component, $\alpha = 1,2,\ldots,n$. In the risk function ($S$-game) notation of Gilliland and Hannan, the risk function $s_\alpha(\check{\mathbf{x}}_\alpha)$ corresponding to the $\check{\mathbf{x}}_\alpha$-section of $t_\alpha$ is an admissible risk function for all $\check{\mathbf{x}}_\alpha$, $\alpha = 1,2,\ldots,n$. Our definition of (CCA) imposes this condition only for almost every $[P_{\check{\boldsymbol{\theta}}_\alpha}]$ $\check{\mathbf{x}}_\alpha$. The unconditional component $\alpha$ risk is the average $R_\alpha(\boldsymbol{\theta},\mathbf{t}) = \int s_\alpha(\check{\mathbf{x}}_\alpha)\,P_{\check{\boldsymbol{\theta}}_\alpha}(d\check{\mathbf{x}}_\alpha)$, and (CA) requires that its $\check{\boldsymbol{\theta}}_\alpha$-sections be admissible risk functions.

Any simple rule $\mathbf{t}$ where $t_1, t_2, \ldots, t_n$ are admissible component decision functions is both (CA) and (CCA). The Stein example of the inadmissibility of the usual estimator of the mean of a multivariate normal under squared error loss, $n \ge 3$, shows that, in general, (A) need not follow from (CA) and (CCA).

Copas (1974) has established necessary conditions for the admissibility of an equivariant compound decision rule when the component is a 2-state, 2-act decision problem. One such necessary condition is that the rule be cut-point in nature; that is, that there exist a symmetric function $\lambda^*$ so that component decisions 0 and 1 are made according to whether $f_1(x_\alpha)/f_0(x_\alpha) < \lambda^*(\mathbf{x})$ or $f_1(x_\alpha)/f_0(x_\alpha) > \lambda^*(\mathbf{x})$, where $f_1/f_0$ is the likelihood ratio in the component. In the next section we establish a necessary condition and a sufficient condition for (A) for some finite state components.

Our definitions (A), (CA) and (CCA) refer to fixed $n$. Usually the term compound decision rule refers to a specification of decision procedures for each $n = 1,2,\ldots$, so that (A), (CA) and (CCA) could then be required for every $n$.

2. Relationships Between Compound and Component Admissibility.

We now summarize the results concerning (A), (CA) and (CCA) to be established in this section for finite state components:

(CA) $\Rightarrow$ (A)   (Theorem 2)
(A) $\Rightarrow$ (CCA)   (Theorem 1)
(CA) $\Rightarrow$ (CCA)   (Remark 1)

The implication (CA) $\Rightarrow$ (CCA) will be seen to be trivial. The implication (CA) $\Rightarrow$ (A) will be proved for two-state components and equivariant $\mathbf{t}$ only. None of the other pairwise implications holds, as shown by Examples 1-3.

Example 1. (CCA) $\not\Rightarrow$ (A). Consider the $2 \times 2$ component testing problem used by Robbins (1951) to introduce compound decision theory. Here $P_\theta$ is $N(2\theta - 1, 1)$, $\theta \in \{0,1\}$. The compound decision rule exhibited for this problem by Robbins (1951, (37)) is $\mathbf{t}$, where

$$t_\alpha(\mathbf{x}) = \begin{cases} 1, & x_\alpha \ge c(\mathbf{x}), \\ 0, & x_\alpha < c(\mathbf{x}), \end{cases} \qquad \alpha = 1,2,\ldots,n,$$

and $c$ is the symmetric function

$$c(\mathbf{x}) = \begin{cases} \infty, & \bar{x} \le -1, \\ \tfrac{1}{2}\ln\dfrac{1-\bar{x}}{1+\bar{x}}, & -1 < \bar{x} < 1, \\ -\infty, & \bar{x} \ge 1, \end{cases}$$

with $\bar{x} = n^{-1}\sum_{\alpha=1}^n x_\alpha$. Each $\check{\mathbf{x}}_\alpha$-section of $c$ is a decreasing function of $x_\alpha$ when finite valued. It follows that each $\check{\mathbf{x}}_\alpha$-section of $t_\alpha$ is a monotone rule and, therefore, Bayes versus some component prior putting positive mass on each element of $\Theta$. Hence, $\mathbf{t}$ is (CCA).

Several authors, including Robbins (1951, p. 138), Hannan and Robbins (1955, §8) and Huang (1972, p. 350), have indicated that $\mathbf{t}$ is not admissible if $n \ge 2$. Robbins states that inadmissibility follows since $\mathbf{t}$ is not of the form of a Bayes compound rule, but a rigorous demonstration that $\mathbf{t}$ is inadmissible has not appeared in the literature. One such demonstration does exist, based on the fact that there does not exist a Bayes compound rule equal to $\mathbf{t}$ a.e. [Lebesgue measure on $R^n$]. This demonstration is found in Appendix A for the case $n = 2$.
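For the Robbins component the likelihood ratio is $f_1(x)/f_0(x) = e^{2x}$, so a component Bayes rule versus the prior $(1-p, p)$ decides 1 when $x \ge \tfrac{1}{2}\ln((1-p)/p)$. Since $E X_\alpha = 2\theta_\alpha - 1$, the quantity $\hat{p} = (1+\bar{x})/2$ estimates the proportion of $\theta_\alpha = 1$, and substituting $\hat{p}$ for $p$ gives exactly the cut-point $c(\mathbf{x})$ above. The short Python sketch below (function names hypothetical) evaluates $\mathbf{t}$ on a data vector. Ties $x_\alpha = c(\mathbf{x})$, which have probability zero, are sent to action 1 as in the display.

```python
import math
from typing import List, Sequence


def robbins_cutpoint(x: Sequence[float]) -> float:
    """c(x) = (1/2) ln((1 - xbar)/(1 + xbar)), extended by +/- infinity outside (-1, 1)."""
    xbar = sum(x) / len(x)
    if xbar <= -1.0:
        return math.inf   # estimated proportion of theta = 1 is <= 0: always act 0
    if xbar >= 1.0:
        return -math.inf  # estimated proportion of theta = 1 is >= 1: always act 1
    return 0.5 * math.log((1.0 - xbar) / (1.0 + xbar))


def robbins_rule(x: Sequence[float]) -> List[int]:
    """Component actions t_alpha(x) = 1 if x_alpha >= c(x), else 0."""
    c = robbins_cutpoint(x)  # symmetric in x, so the rule is equivariant
    return [1 if xa >= c else 0 for xa in x]


print(robbins_rule([0.5, -0.5]))   # xbar = 0, c = 0: [1, 0]
print(robbins_rule([-3.0, -2.0]))  # xbar <= -1, c = inf: [0, 0]
```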

Example 2. (CCA) $\not\Rightarrow$ (CA). Consider the compound decision problem and rule $\mathbf{t}$ of Example 1. As indicated there, every $\check{\mathbf{x}}_\alpha$-section of $t_\alpha$ has component risk $s_\alpha(\check{\mathbf{x}}_\alpha)$, which is an admissible risk point of the component risk set $S$. Hence, $s_\alpha$ takes values on the lower boundary $B$ of $S$. Here the likelihood ratio $f_1/f_0$ has a continuous distribution under both $P_0$ and $P_1$, so by Appendix B, $B$ is strictly convex. Also, $s_\alpha$ is not equal to a constant almost everywhere $[P_{\check{\boldsymbol{\theta}}_\alpha}]$, so that the $\check{\boldsymbol{\theta}}_\alpha$-section of $R_\alpha(\boldsymbol{\theta},\mathbf{t})$, being a $P_{\check{\boldsymbol{\theta}}_\alpha}$-average of $s_\alpha$, belongs to the interior of $S$ and, therefore, is not admissible. Hence $\mathbf{t}$ is not (CA).
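The strict convexity argument can be checked numerically. Assume, for illustration only, 0-1 loss in the Robbins component. Then the test that acts 1 when $x \ge c$ has risk point $(s^0, s^1) = (P_0(X \ge c), P_1(X < c)) = (1-\Phi(c+1), \Phi(c-1))$, and by the Neyman-Pearson lemma these cut-point risk points trace out $B$. The Python sketch below averages two such points and finds the boundary point with the same first coordinate. That point's second coordinate is strictly smaller, so the average is dominated, as claimed in Example 2. Function names are hypothetical.

```python
import math
from typing import Tuple


def Phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_point(c: float) -> Tuple[float, float]:
    """(P_0(X >= c), P_1(X < c)) with P_0 = N(-1, 1), P_1 = N(1, 1), under 0-1 loss."""
    return 1.0 - Phi(c + 1.0), Phi(c - 1.0)


def cut_with_s0(target: float, lo: float = -20.0, hi: float = 20.0) -> float:
    """Bisection for the cut-point c whose s^0 equals target (s^0 decreases in c)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if risk_point(mid)[0] > target:
            lo = mid  # s^0 still too large: move the cut-point right
        else:
            hi = mid
    return 0.5 * (lo + hi)


a, b = risk_point(-0.5), risk_point(0.5)
average = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
boundary = risk_point(cut_with_s0(average[0]))
print(average, boundary)  # same s^0, but boundary[1] < average[1]
```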

Example 3. (A) $\not\Rightarrow$ (CA). Again we consider the Robbins component of Example 1, but here we take $\mathbf{t}$ to be an equivariant Bayes rule with respect to the diffuse prior (see Robbins (1951, §3)). Gilliland and Hannan (1974, p. 11) show, in the setting of a general finite state component, that a version of $\mathbf{t}$ is delete bootstrap in form; that is, each $\check{\mathbf{x}}_\alpha$-section of $t_\alpha$ is Bayes versus some $w(\check{\mathbf{x}}_\alpha)$ on $\Theta = \{0,1,\ldots,m\}$, $\alpha = 1,2,\ldots,n$. For the Robbins component such a $\mathbf{t}$ is (CCA) and, by the strict convexity of the lower boundary $B$ as indicated in Example 2, such a $\mathbf{t}$ is not (CA).

Remark 1. Suppose $\Theta = \{0,1,\ldots,m\}$ and that the risk set $S$ is convex. That (CA) $\Rightarrow$ (CCA) is trivial, because the points selected as $\check{\boldsymbol{\theta}}_\alpha$-sections for (CA) rules must be on the lower boundary of the risk set $S$. These points are probability-weighted averages of the $s_\alpha(\check{\mathbf{x}}_\alpha)$, and the only way for such an average to be on the lower boundary of the convex set $S$ is for the $s_\alpha(\check{\mathbf{x}}_\alpha)$ to be on the lower boundary a.e. $[P_{\check{\boldsymbol{\theta}}_\alpha}]$, i.e., for $\mathbf{t}$ to be (CCA).
-o
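Theorem 1. Suppose that $\Theta = \{0,1,\ldots,m\}$ and that the risk set $S$ is a compact convex subset of $[0,\infty)^{m+1}$. Then (A) $\Rightarrow$ (CCA).

Proof. Suppose $\mathbf{t}$ is not (CCA). Then there exist an $\alpha$, a $\check{\boldsymbol{\theta}}_\alpha^0$ and a set $C \subset \mathcal{X}^{n-1}$ such that $P_{\check{\boldsymbol{\theta}}_\alpha^0}(C) > 0$ and, for all $\check{\mathbf{x}}_\alpha \in C$, $s_\alpha(\check{\mathbf{x}}_\alpha)$ is not in $B$, the lower boundary of $S$. Therefore, for each $\check{\mathbf{x}}_\alpha \in C$ there exists $s_\alpha^*(\check{\mathbf{x}}_\alpha) \in S$ such that $s_\alpha^*(\check{\mathbf{x}}_\alpha) < s_\alpha(\check{\mathbf{x}}_\alpha)$, where for vectors $a$ and $b$, $a < b$ means that $a_i \le b_i$ for all $i$ with $a_i < b_i$ for some $i$. Extend the domain of definition of $s_\alpha^*$ to all of $\mathcal{X}^{n-1}$ by defining $s_\alpha^*(\check{\mathbf{x}}_\alpha) = s_\alpha(\check{\mathbf{x}}_\alpha)$ for $\check{\mathbf{x}}_\alpha \notin C$. Now let $C_i = \{\check{\mathbf{x}}_\alpha : s_\alpha^{*i}(\check{\mathbf{x}}_\alpha) < s_\alpha^i(\check{\mathbf{x}}_\alpha)\}$, $i = 0,1,\ldots,m$, and note that $C = \bigcup_{i=0}^m C_i$. Hence, at least one of the $C_i$ has positive probability, which we suppose to be $C_0$ without loss of generality. Note that

(i) $s_\alpha^*(\check{\mathbf{x}}_\alpha) \le s_\alpha(\check{\mathbf{x}}_\alpha)$ coordinatewise for all $\check{\mathbf{x}}_\alpha \in \mathcal{X}^{n-1}$,
(ii) $s_\alpha^{*0}(\check{\mathbf{x}}_\alpha) < s_\alpha^0(\check{\mathbf{x}}_\alpha)$ for $\check{\mathbf{x}}_\alpha \in C_0$,
(iii) $P_{\check{\boldsymbol{\theta}}_\alpha^0}(C_0) > 0$.

Now consider $\mathbf{t}^* = (t_1, \ldots, t_{\alpha-1}, t_\alpha^*, t_{\alpha+1}, \ldots, t_n)$, where $t_\alpha^*$ is a decision rule whose $\check{\mathbf{x}}_\alpha$-section has risk point $s_\alpha^*(\check{\mathbf{x}}_\alpha)$ for all $\check{\mathbf{x}}_\alpha$. Then $R(\boldsymbol{\theta},\mathbf{t}^*) \le R(\boldsymbol{\theta},\mathbf{t})$ for all $\boldsymbol{\theta} \in \Theta^n$ and $R(\boldsymbol{\theta}^0,\mathbf{t}^*) < R(\boldsymbol{\theta}^0,\mathbf{t})$, where $\boldsymbol{\theta}^0 = (\theta_1^0, \ldots, \theta_{\alpha-1}^0, 0, \theta_{\alpha+1}^0, \ldots, \theta_n^0)$. Hence $\mathbf{t}$ is not (A).

That a measurable $s_\alpha^*$ can be chosen, ensuring the existence of $t_\alpha^*$, is demonstrated by the following argument. For a point $s$ in $S$ that is not in $B$, let $c(s)$ be the real number such that $s - c(s)\mathbf{1}$ is in $B$, where $\mathbf{1}$ is the vector $(1,1,\ldots,1)$. Such a $c(s)$ exists because of the compactness and convexity of $S$. Then let $s_\alpha^*(\check{\mathbf{x}}_\alpha) = s_\alpha(\check{\mathbf{x}}_\alpha)$ for $\check{\mathbf{x}}_\alpha$ not in $C$ and $s_\alpha^*(\check{\mathbf{x}}_\alpha) = s_\alpha(\check{\mathbf{x}}_\alpha) - c(s_\alpha(\check{\mathbf{x}}_\alpha))\mathbf{1}$ for $\check{\mathbf{x}}_\alpha$ in $C$. Then $s_\alpha^*(\check{\mathbf{x}}_\alpha)$ is a measurable function of $\check{\mathbf{x}}_\alpha$, since $s_\alpha(\check{\mathbf{x}}_\alpha)$ is measurable. □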
Whereas (CCA) is a necessary condition for (A), our next result shows that (CA) is a sufficient condition for (A) for equivariant $\mathbf{t}$ in the two-state component case. A proof or counterexample for the general finite state component has not been found. The (CA) condition is very strong and, as can be seen in Examples 2 and 3, when the lower boundary of $S$ is strictly convex, only simple rules are (CA).

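Theorem 2. Suppose that $\Theta = \{0,1\}$. If $\mathbf{t}$ is an equivariant rule which is (CA), then $\mathbf{t}$ is (A).

Proof. The compound risk of equivariant $\mathbf{t}$ depends on $\boldsymbol{\theta}$ only through the number $n_1$ of $\theta_\alpha = 1$, $\alpha = 1,2,\ldots,n$. Displaying this through the notation $R(\boldsymbol{\theta},\mathbf{t}) = R((n_0,n_1),\mathbf{t})$, where $n_0 = n - n_1$, we have

$$\begin{aligned}
nR((n_0,n_1),\mathbf{t}) &= \sum_{\alpha=1}^n R_\alpha((n_0,n_1),\mathbf{t}) \\
&= \sum_{\alpha=1}^n [\theta_\alpha = 0]\,R_\alpha((n_0,n_1),\mathbf{t}) + \sum_{\alpha=1}^n [\theta_\alpha = 1]\,R_\alpha((n_0,n_1),\mathbf{t}) \\
&= n_0 \int s^0(\check{\mathbf{x}}_\alpha)\,(P_0^{n_0-1} \times P_1^{n_1})(d\check{\mathbf{x}}_\alpha) + n_1 \int s^1(\check{\mathbf{x}}_\alpha)\,(P_0^{n_0} \times P_1^{n_1-1})(d\check{\mathbf{x}}_\alpha) \\
&= n_0 R^0((n_0-1,n_1),\mathbf{t}) + n_1 R^1((n_0,n_1-1),\mathbf{t}),
\end{aligned}$$

where $R^0$ and $R^1$ are defined by position in the last line. Then

(*) $R((0,n),\mathbf{t}) = R^1((0,n-1),\mathbf{t})$,
    $R((j,n-j),\mathbf{t}) = \frac{j}{n} R^0((j-1,n-j),\mathbf{t}) + \frac{n-j}{n} R^1((j,n-1-j),\mathbf{t})$, $j = 1,\ldots,n-1$,
    $R((n,0),\mathbf{t}) = R^0((n-1,0),\mathbf{t})$.

Suppose that $\mathbf{t}$ is not (A), but $\mathbf{t}$ is (CA). Since $\mathbf{t}$ is not (A), there is an equivariant $\mathbf{t}^*$ such that $R((j,n-j),\mathbf{t}^*) \le R((j,n-j),\mathbf{t})$ for all $j = 0,1,\ldots,n$, with $R((j,n-j),\mathbf{t}^*) < R((j,n-j),\mathbf{t})$ for some such $j$, say $j_0$. On the other hand, since $\mathbf{t}$ is (CA), if, for any $j$, $R^0((j,n-1-j),\mathbf{t}^*) < R^0((j,n-1-j),\mathbf{t})$, then $R^1((j,n-1-j),\mathbf{t}^*) > R^1((j,n-1-j),\mathbf{t})$ and, similarly, if for any $j$, $R^1((j,n-1-j),\mathbf{t}^*) < R^1((j,n-1-j),\mathbf{t})$, then $R^0((j,n-1-j),\mathbf{t}^*) > R^0((j,n-1-j),\mathbf{t})$.

Since $R((j_0,n-j_0),\mathbf{t}^*) < R((j_0,n-j_0),\mathbf{t})$, either $R^0((j_0-1,n-j_0),\mathbf{t}^*) < R^0((j_0-1,n-j_0),\mathbf{t})$ or $R^1((j_0,n-1-j_0),\mathbf{t}^*) < R^1((j_0,n-1-j_0),\mathbf{t})$. Suppose that $R^0((j_0-1,n-j_0),\mathbf{t}^*) < R^0((j_0-1,n-j_0),\mathbf{t})$. (The proof under the other assumption is exactly analogous.) Then, since $\mathbf{t}$ is (CA), $R^1((j_0-1,n-j_0),\mathbf{t}^*) > R^1((j_0-1,n-j_0),\mathbf{t})$. Because $R((j_0-1,n-j_0+1),\mathbf{t}^*) \le R((j_0-1,n-j_0+1),\mathbf{t})$, we also get $R^0((j_0-2,n-j_0+1),\mathbf{t}^*) < R^0((j_0-2,n-j_0+1),\mathbf{t})$. Then, again because $\mathbf{t}$ is (CA), $R^1((j_0-2,n-j_0+1),\mathbf{t}^*) > R^1((j_0-2,n-j_0+1),\mathbf{t})$. Proceeding inductively until the first argument is zero, we get $R^1((0,n-1),\mathbf{t}^*) > R^1((0,n-1),\mathbf{t})$, that is, $R((0,n),\mathbf{t}^*) > R((0,n),\mathbf{t})$, which contradicts the assumption that $R((j,n-j),\mathbf{t}^*) \le R((j,n-j),\mathbf{t})$ for all $j$. Thus if $\mathbf{t}$ is not (A), $\mathbf{t}$ is not (CA). □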

Theorem 2 concerns $m = 1$, the two-state problem. The following numerical example shows that the method of proof used for $m = 1$ will not work for $m = 2$ and $n = 2$; thus a new proof must be devised if Theorem 2 is to be generalized to arbitrary finite $m$ and $n$. Using notation exactly analogous to that in the proof of Theorem 2, suppose

$(R^0((1,0,0),\mathbf{t}), R^1((1,0,0),\mathbf{t}), R^2((1,0,0),\mathbf{t})) = (6,2,2)$,
$(R^0((0,1,0),\mathbf{t}), R^1((0,1,0),\mathbf{t}), R^2((0,1,0),\mathbf{t})) = (2,6,2)$,
$(R^0((0,0,1),\mathbf{t}), R^1((0,0,1),\mathbf{t}), R^2((0,0,1),\mathbf{t})) = (2,2,6)$.

Further suppose

$(R^0((1,0,0),\mathbf{t}^*), R^1((1,0,0),\mathbf{t}^*), R^2((1,0,0),\mathbf{t}^*)) = (5,4,0)$,
$(R^0((0,1,0),\mathbf{t}^*), R^1((0,1,0),\mathbf{t}^*), R^2((0,1,0),\mathbf{t}^*)) = (0,6,4)$,
$(R^0((0,0,1),\mathbf{t}^*), R^1((0,0,1),\mathbf{t}^*), R^2((0,0,1),\mathbf{t}^*)) = (4,0,6)$.

Using the analog of (*), $R((2,0,0),\mathbf{t}^*) = 5 < 6 = R((2,0,0),\mathbf{t})$ and, for all other possible $\mathbf{n}$, $R(\mathbf{n},\mathbf{t}^*) = R(\mathbf{n},\mathbf{t})$. Thus, even though $\mathbf{t}$ may well be (CA) (it is not dominated by $\mathbf{t}^*$), it is not (A).
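The arithmetic of this example can be checked mechanically. The Python sketch below implements the analog of (*) for general $m$, namely $nR(\mathbf{n},\mathbf{t}) = \sum_i n_i R^i(\mathbf{n} - \mathbf{e}_i, \mathbf{t})$ with $\mathbf{e}_i$ the $i$th unit vector, using exact rational arithmetic. It also confirms that no component risk vector of $\mathbf{t}^*$ dominates the corresponding vector of $\mathbf{t}$. The dictionary layout and function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Sequence, Tuple

Config = Tuple[int, ...]


def compound_risk(config: Config, comp: Dict[Config, Sequence[int]]) -> Fraction:
    """Analog of (*): n R(n, t) = sum_i n_i R^i(n - e_i, t).

    comp[c][i] is R^i(c, t): the component risk at state i when the other
    n - 1 components have empirical distribution c.
    """
    n = sum(config)
    total = Fraction(0)
    for i, n_i in enumerate(config):
        if n_i:
            others = tuple(c - (1 if k == i else 0) for k, c in enumerate(config))
            total += n_i * Fraction(comp[others][i])
    return total / n


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """a < b in the sense of the text: a_i <= b_i for all i, strict for some i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Numerical example (m = 2, n = 2): keys are configurations of the one other component.
t = {(1, 0, 0): (6, 2, 2), (0, 1, 0): (2, 6, 2), (0, 0, 1): (2, 2, 6)}
t_star = {(1, 0, 0): (5, 4, 0), (0, 1, 0): (0, 6, 4), (0, 0, 1): (4, 0, 6)}

configs = [c for c in product(range(3), repeat=3) if sum(c) == 2]
for c in configs:
    print(c, compound_risk(c, t), compound_risk(c, t_star))
print(any(dominates(t_star[c], t[c]) for c in t))  # False
```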

CHAPTER 3

ADMISSIBLE (BAYES) SOLUTIONS
TO THE
EMPIRICAL BAYES PROBLEM

1. Admissibility and Component Admissibility in the Empirical
Bayes Problem.

The empirical Bayes decision problem was introduced in Section 3 of Chapter 1. An empirical Bayes decision rule $\mathbf{t}$ involves specification of a decision procedure $t_n$ for use in component $n$, $n \ge 1$.

 

 

Definition 4. (A) $t_n$ is admissible if there does not exist a $t_n^*$ such that $R_n(G,t_n^*) \le R_n(G,t_n)$ for all $G \in \mathcal{G}$ with $R_n(G,t_n^*) < R_n(G,t_n)$ for some $G \in \mathcal{G}$.

Definition 5. (CCA) $t_n$ is conditionally component admissible if, for each $G \in \mathcal{G}$, almost all $[P_G^{n-1}]$ $\check{\mathbf{x}}_n$-sections of $t_n$ are admissible decision rules in the component.

We refer to an empirical Bayes rule $\mathbf{t}$ as (A) or (CCA) if $t_n$ is (A) or (CCA) for every $n \ge 1$.

Meeden (1972, p. 97) defines admissibility (A) of $\mathbf{t}$ as above. In his Section 3 he demonstrates the inadmissibility of some classical empirical Bayes procedures for certain linear loss testing and squared error loss estimation components having discrete exponential distributions. The demonstrations exploit the non-(CCA) property of these classical empirical Bayes procedures.

Recently, Van Houwelingen (1973, 1976) has demonstrated (CCA) rules for the linear loss testing component which have improved rates of Bayes risk consistency. Gilliland and Hannan (1976) have extended some of these results to the general multiple decision problem component.

Our next theorem shows that (CCA) is implied by (A) in the finite state, compact convex risk set component empirical Bayes problem.

Theorem 3. Suppose that $\Theta = \{0,1,\ldots,m\}$ and that the risk set $S$ is a compact convex subset of $[0,\infty)^{m+1}$. If $\mathbf{t}$ is an (A) empirical Bayes procedure, then it is (CCA).

Proof. Suppose $\mathbf{t}$ is not (CCA). Then there exist an $n$, an $H \in \mathcal{G}$ and a set $C \subset \mathcal{X}^{n-1}$ such that $P_H^{n-1}(C) > 0$ and, for all $\check{\mathbf{x}}_n \in C$, $s_n(\check{\mathbf{x}}_n)$ does not belong to $B$, the lower boundary of $S$. Since the uniform prior $U = (\frac{1}{m+1}, \ldots, \frac{1}{m+1})$ dominates every $G$, so that $P_U^{n-1}$ dominates every $P_G^{n-1}$, we have $P_U^{n-1}(C) > 0$. Let $t_n^*$ be $t_\alpha^*$ as constructed in the proof of Theorem 1 (Section 2 of Chapter 2). Since $s_n^* \le s_n$ everywhere and $s_n^* < s_n$ on $C$, it follows that $E_G^{n-1} s_n^*(\check{\mathbf{X}}_n) \le E_G^{n-1} s_n(\check{\mathbf{X}}_n)$ for all $G \in \mathcal{G}$ and $E_U^{n-1} s_n^*(\check{\mathbf{X}}_n) < E_U^{n-1} s_n(\check{\mathbf{X}}_n)$. Since $R_n(G,t_n) = \sum_{i=0}^m G_i E_G^{n-1} s_n^i(\check{\mathbf{X}}_n)$ for $G \in \mathcal{G}$, and similarly for $t_n^*$, and since $U$ puts positive mass on every coordinate, it follows that $R_n(G,t_n^*) \le R_n(G,t_n)$ for all $G \in \mathcal{G}$ and $R_n(U,t_n^*) < R_n(U,t_n)$. Hence $\mathbf{t}$ is not (A). □
Our next result shows that, for finite $\Theta$ and equivariant $\mathbf{t}$, the empirical Bayes admissibility of $t_n$ (Definition 4) implies the compound admissibility of $\mathbf{t}$ (Definition 1).
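Theorem 4. Suppose that $\Theta = \{0,1,\ldots,m\}$. Let $\mathbf{t}$ be an equivariant compound decision rule. If $t_n$ is empirical Bayes admissible, then $\mathbf{t}$ is compound admissible.

Proof. The compound risk of $\mathbf{t}$ at $\boldsymbol{\theta}$ is a function of $\boldsymbol{\theta}$ through $\mathbf{n} = (n_0, n_1, \ldots, n_m)$, where for $i = 0,1,\ldots,m$, $n_i$ is the number of $\theta_\alpha = i$, $\alpha = 1,2,\ldots,n$. Let the compound risk of $\mathbf{t}$ at $\boldsymbol{\theta}$ be denoted by $R(\mathbf{n},\mathbf{t})$. If $\mathbf{t}$ is inadmissible, there exists a $\mathbf{t}^*$ such that $R(\boldsymbol{\theta},\mathbf{t}^*) \le R(\boldsymbol{\theta},\mathbf{t})$ for all $\boldsymbol{\theta} \in \Theta^n$ with $R(\boldsymbol{\theta}^0,\mathbf{t}^*) < R(\boldsymbol{\theta}^0,\mathbf{t})$ for some $\boldsymbol{\theta}^0 \in \Theta^n$. By the finiteness of the group of $n!$ permutations and Theorem 4.3.2 of Ferguson (1967), $\mathbf{t}^*$ can be taken to be equivariant. Therefore, $R(\mathbf{n},\mathbf{t}^*) \le R(\mathbf{n},\mathbf{t})$ for all $\mathbf{n}$, with $R(\mathbf{n}^0,\mathbf{t}^*) < R(\mathbf{n}^0,\mathbf{t})$, where $\mathbf{n}^0$ is the empirical distribution of $\boldsymbol{\theta}^0$. Let $G$ be any distribution on $\Theta$. The $G^n$-average of $R_\alpha(\boldsymbol{\theta},\mathbf{t})$ is constant with respect to $\alpha$ for equivariant $\mathbf{t}$, so that $R_n(G,t_n)$ equals the $G^n$-average of the compound risk $R(\boldsymbol{\theta},\mathbf{t})$, and similarly for $\mathbf{t}^*$; hence $R_n(G,t_n^*) \le R_n(G,t_n)$. Furthermore, if $H^n$ puts positive mass on $\boldsymbol{\theta}^0$, then $R_n(H,t_n^*) < R_n(H,t_n)$. Therefore, $t_n$ is inadmissible according to Definition 4. □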
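For equivariant rules the identity used in this proof reads $R_n(G,t_n) = \sum_{\mathbf{n}} \binom{n}{\mathbf{n}} \prod_j G_j^{n_j} R(\mathbf{n},\mathbf{t})$, a multinomial average of compound risks. As an illustration only, the Python sketch below applies it to the compound risks implied, through the analog of (*), by the numerical example following Theorem 2. There $\mathbf{t}^*$ improves on $\mathbf{t}$ only at $\mathbf{n} = (2,0,0)$, so the empirical Bayes risks differ by exactly $G_0^2$ and $t_2^*$ is strictly better whenever $G_0 > 0$. Names are hypothetical.

```python
from math import factorial, prod
from typing import Dict, Sequence, Tuple

Config = Tuple[int, ...]


def multinomial_weight(config: Config, G: Sequence[float]) -> float:
    """G^n-probability that theta has empirical distribution `config`."""
    coef = factorial(sum(config))
    for n_i in config:
        coef //= factorial(n_i)  # exact: every partial quotient is an integer
    return coef * prod(g ** n_i for g, n_i in zip(G, config))


def eb_risk(G: Sequence[float], compound: Dict[Config, float]) -> float:
    """R_n(G, t_n) as the multinomial average of the compound risks R(n, t)."""
    return sum(multinomial_weight(c, G) * r for c, r in compound.items())


# Compound risks R(n, t) and R(n, t*) for m = 2, n = 2 from the numerical example.
R_t = {(2, 0, 0): 6, (0, 2, 0): 6, (0, 0, 2): 6, (1, 1, 0): 2, (1, 0, 1): 2, (0, 1, 1): 2}
R_t_star = {**R_t, (2, 0, 0): 5}

G = (0.2, 0.3, 0.5)
print(eb_risk(G, R_t) - eb_risk(G, R_t_star))  # G_0^2 = 0.04, up to rounding
```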

Example 4. Empirical Bayes (CCA) $\not\Rightarrow$ (A). Consider the Robbins component and the bootstrap rule $\mathbf{t}$ of Example 1. As demonstrated there, each $\check{\mathbf{x}}_n$-section of $t_n$ is an admissible decision rule in the component. Hence, $t_n$ is (CCA) according to Definition 5. However, $\mathbf{t}$ is equivariant and, as shown in Appendix A, $\mathbf{t}$ is inadmissible in the compound problem if $n \ge 2$. Therefore, by Theorem 4, $t_n$ is not (A).

As seen by example, conditional component admissibility is not sufficient for the admissibility of an empirical Bayes procedure. The following observation leads to a characterization of empirical Bayes admissibility with which it is easy to show that certain Bayes empirical Bayes procedures are admissible. Let $n$ be given and consider the statistical decision problem with states $G \in \mathcal{G}$, observation $\mathbf{X} = (X_1, \ldots, X_{n-1}) \sim P_G^{n-1}$, actions $d \in D$, where $D$ is the class of all component decision rules, decision rules $t$ which are $\mathcal{B}^{n-1}$-measurable mappings into $D$, and loss function $R(G,d)$. The risk of $t$ is $E_G^{n-1} R(G,t(\mathbf{X})) = R_n(G,t_n)$, where $t_n$ is the empirical Bayes rule $t_n(x_1, \ldots, x_n) = (t(\check{\mathbf{x}}_n))(x_n)$. Thus, $t_n$ is empirical Bayes admissible (Definition 4) if and only if $t$ is admissible in the usual sense in the decision problem $(\mathcal{G}, D, R, R_n)$ defined above.
Theorem 5. Let $\Theta = \{0,1,\ldots,m\}$, $S \subset [0,\infty)^{m+1}$, and suppose that $n \ge 2$ is given. Let $\Lambda$ be a prior distribution on $\mathcal{G}$, the $m$-dimensional simplex of distributions on $\Theta$, and suppose that the support of $\Lambda$ is all of $\mathcal{G}$. If $t$ is a decision rule in the decision problem $(\mathcal{G}, D, R, R_n)$ which is Bayes with respect to $\Lambda$ and $\int R_n(G,t)\,\Lambda(dG)$ is finite, then $t_n = t$ is an admissible empirical Bayes decision rule (Definition 4).

Proof. The proof used by Ferguson (1967, Theorem 2.3.3) covers the present situation. Here $\mathcal{G}$ is the $m$-dimensional simplex in $R^{m+1}$, whereas the parameter set is the real line in Theorem 2.3.3. Here the decision problem $(\mathcal{G}, D, R, R_n)$ has risk functions $E_G^{n-1} R(G,t(\mathbf{X})) = E_G^{n-1}\big(\sum_{i=0}^m G_i R(i,t(\mathbf{X}))\big)$, which are continuous functions of $G$. □

2. Bayes Procedures in the Finite State Component Empirical Bayes
Problem.

One can regard the empirical Bayes decision problem as a classical decision problem with parameter $G \in \mathcal{G}$. Therefore, it is not surprising that Bayesian decision rules have been suggested for use in the empirical Bayes problem; see, for example, Lindley (1971, §12.1), Tucker (1963), Meeden (1972) and Shapiro (1974). For certain infinite state component decision problems, Tucker (1963) and Meeden (1972) have demonstrated the (empirical) Bayes risk consistency of Bayes procedures. Since asymptotic optimality in empirical Bayes problems is Bayes risk consistency (at every $G$), the Tucker and Meeden Bayes rules "solve" certain empirical Bayes problems.

Robbins (1951, §3) conjectured that Bayes procedures might be solutions to the compound decision problem. For equivariant $\mathbf{t}$, convergence of the compound risk to the simple envelope for every $\boldsymbol{\theta}$ in the compound problem implies that $t_n$ is (empirical) Bayes risk consistent (for every $G$), as observed by Gilliland and Hannan (1974); hence Robbins' conjecture has implications for the Bayesian approach to empirical Bayes. In fact, the two-state results of Gilliland, Hannan and Huang (1974) demonstrate the (empirical) Bayes risk consistency for a large class of priors and two-state components. Shapiro (1974) considered a testing component and investigated the asymptotic properties of the Bayes procedures and of the average loss across component decisions.

Rolph (1968) and Ferguson (1974) have investigated the problem of placing prior distributions on classes $\mathcal{G}$ of distributions $G$. This problem is trivial in the case $\Theta = \{0,1,\ldots,m\}$, where $\mathcal{G}$ is a subset of Euclidean $(m+1)$-space. It is difficult for non-finite $\Theta$, particularly when tractable Bayes procedures are sought in order to make the consistency question tractable. Ferguson (1974) claims there are at least two desirable characteristics of such prior distributions: (1) the support of the prior with respect to some suitable topology on the space of probability measures should be large, and (2) the posterior distribution given a sample from the true probability measure should be manageable analytically. The resulting Bayes rules are then desirable, since they are generally admissible and have nice large sample properties.

Throughout this chapter $\Theta = \{0,1,\ldots,m\}$, $S$ is a compact subset of $[0,\infty)^{m+1}$, $\mathcal{G}$ is the $m$-dimensional simplex of probability distributions $G = (G_0, G_1, \ldots, G_m)$ on $\Theta$, and $\Lambda$ is a probability distribution on $\mathcal{G}$. The conditional on $\check{\mathbf{X}}_n$ risk of the empirical Bayes rule $t = t_n$ at $G$ is

(10) $R(G,t) = \sum_{i=0}^m G_i R(i,t)$,

and the (empirical) Bayes risk at $G$ is

(11) $R_n(G,t) = E_G^{n-1} R(G,t)$.

The "second level Bayes risk" of $t$ with respect to $\Lambda$ is

(12) $B(\Lambda,t) = \int R_n(G,t)\,\Lambda(dG)$.

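Lemma 1. $B(\Lambda,t)$ is minimized by $t = t_{G^*}$, where $G^* = E[G \mid \check{\mathbf{X}}_n]$ and $t_{(\cdot)}$ denotes a Bayes response in the component problem.

Proof. Let $P$ denote the marginal distribution of $\check{\mathbf{X}}_n$, hereafter denoted by $\mathbf{x}$, and let $Q$ denote the conditional distribution of $G$ given $\mathbf{x}$. By (12) and (11),

(13) $B(\Lambda,t) = \iint R(G,t(\mathbf{x}))\,Q(dG)\,P(d\mathbf{x})$.

Substituting (10) and interchanging integral and finite sum, we obtain

$B(\Lambda,t) = \int \sum_{i=0}^m \Big(\int G_i\,Q(dG)\Big) R(i,t(\mathbf{x}))\,P(d\mathbf{x})$, that is,

(14) $B(\Lambda,t) = \int R(G^*,t(\mathbf{x}))\,P(d\mathbf{x})$,

which is minimized, pointwise in $\mathbf{x}$, by the choice $t(\mathbf{x}) = t_{G^*}$. □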
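Lemma 1 says that the Bayes ($\Lambda$) empirical Bayes rule needs only the posterior mean $G^*$. The Python sketch below rests on assumptions chosen only for illustration. It takes $m = 1$ with the Robbins component $f_0 = N(-1,1)$, $f_1 = N(1,1)$ and 0-1 loss, writes $G = (1-p, p)$, and lets $\Lambda$ be uniform on $p \in [0,1]$, so the support of $\Lambda$ is all of $\mathcal{G}$. The posterior of $p$ given $\check{\mathbf{x}}_n$ is then proportional to $\prod_{\alpha<n}((1-p)f_0(x_\alpha) + p f_1(x_\alpha))$. The sketch approximates its mean with a midpoint rule, working on the log scale to avoid underflow. The component Bayes response versus $(1-g, g)$ acts 1 when $g f_1(x_n) \ge (1-g) f_0(x_n)$. Function names and the grid size are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float) -> float:
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_mean_p(past: Sequence[float], grid: int = 2000) -> float:
    """Approximate G*_1 = E[p | x_1, ..., x_{n-1}] under a uniform prior on p."""
    ps = [(k + 0.5) / grid for k in range(grid)]   # midpoint rule on [0, 1]
    f0 = [normal_pdf(x, -1.0) for x in past]
    f1 = [normal_pdf(x, 1.0) for x in past]
    logs = [sum(math.log((1 - p) * a + p * b) for a, b in zip(f0, f1)) for p in ps]
    top = max(logs)                                # log-sum-exp stabilization
    w = [math.exp(v - top) for v in logs]
    return sum(wi * p for wi, p in zip(w, ps)) / sum(w)


def bayes_eb_action(past: Sequence[float], x_n: float) -> int:
    """t_{G*}(x_n): component Bayes response versus G* = (1 - g, g)."""
    g = posterior_mean_p(past)
    return 1 if g * normal_pdf(x_n, 1.0) >= (1 - g) * normal_pdf(x_n, -1.0) else 0


print(round(posterior_mean_p([]), 6))    # prior mean when n = 1
print(bayes_eb_action([3.0] * 50, 0.0))  # strong evidence for theta = 1: acts 1
```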

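It is interesting to note that the result of Lemma 1 can also be obtained as a corollary to results of Gilliland and Hannan (1974). They show on pp. 10-11 that a Bayes compound rule versus a symmetric prior $\omega$ on $\Theta^n$ in the compound decision problem has $t_\alpha(\check{\mathbf{x}}_\alpha) = t_{w(\check{\mathbf{x}}_\alpha)}$, $\alpha = 1,2,\ldots,n$, where $w = (w_0, w_1, \ldots, w_m)$ is given by their (40), namely

(15) $w_i(\check{\mathbf{x}}_\alpha) = \sum_{\mathbf{n}} \omega(\mathbf{n})\,\frac{n_i}{n}\Big(\prod_{j=0}^m f_j^{\,n_{ji}}\Big)^{*}(\check{\mathbf{x}}_\alpha)$, $i = 0,1,\ldots,m$.

Here the sum ranges over all $\binom{n+m}{m}$ empirical distributions $\mathbf{n}$ of $\boldsymbol{\theta}$, $\omega(\mathbf{n})$ is the $\omega$-probability that $\boldsymbol{\theta}$ has empirical distribution $\mathbf{n}$, $f_j = dP_j/d\mu$, $*$ denotes symmetrization, and $n_{ji} = n_j$ if $j \ne i$, $n_{ji} = n_j - 1$ if $j = i$, $i,j = 0,1,\ldots,m$. Consider $\omega = \omega(\Lambda)$ defined by

(16) $\omega(\mathbf{n}) = \binom{n}{\mathbf{n}} \int \prod_{j=0}^m G_j^{\,n_j}\,\Lambda(dG)$.

Gilliland and Hannan (1974, Remark 2) show that for such $\omega$, $t_{w(\check{\mathbf{x}}_n)}$ is a Bayes ($\Lambda$) empirical Bayes procedure. Using (16) in (15), we find that

(17) $w_i = \int G_i \sum_{\mathbf{n}_i} \binom{n-1}{\mathbf{n}_i}\Big(\prod_{j=0}^m G_j^{\,n_{ji}}\Big)\Big(\prod_{j=0}^m f_j^{\,n_{ji}}\Big)^{*}\Lambda(dG)$,

where $\mathbf{n}_i = (n_{0i}, \ldots, n_{mi})$. Since the density of $\check{\mathbf{X}}_n$ given $G$ is $\prod_{\alpha=1}^{n-1}\big(\sum_{i=0}^m G_i f_i(x_\alpha)\big)$, whose multinomial expansion is the sum in (17), the integrand of (17) is proportional to $G_i$ times the conditional density of $G$ given $\check{\mathbf{X}}_n$. Thus, $w$ is proportional to $G^* = E[G \mid \check{\mathbf{X}}_n]$.

The empirical Bayes rule which is second level Bayes ($\Lambda$) has thus been seen to be the rule which is first level (component) Bayes versus the induced estimator $G^*$. We note that $G^*$ depends on $\Lambda$ through the conditional distribution $Q$. By Theorem 5, $t_n = t_{G^*}$ is an admissible empirical Bayes procedure. In the next section we give conditions on $\Lambda$ sufficient for the Bayes risk consistency of $t_{G^*}$ at every $G \in \mathcal{G}$.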

3. Bayes Risk Consistency of the Bayes Procedure.

 

The following lemma is a corollary to Lemma 1 of Oaten (1972).

Lemma 2 (Oaten (1972, p. 1167)). Let $R(i,t) \le M < \infty$ for all $i \in \{0,1,\dots,m\}$ and all component decision rules $t$; that is, suppose $S \subset [0,M]^{m+1}$. Then for all $F, G \in \mathcal{G}$,

$$0 \le R(G, t_F) - R(G) \le M \sum_{i=0}^{m} |F_i - G_i|. \tag{18}$$
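One standard way to see (18) is sketched below. It assumes that a rule $t_G$ which is Bayes versus $G$ exists, and it is not necessarily Oaten's argument. The idea is to add and subtract $R(F, t_F)$ and $R(F, t_G)$:

```latex
\begin{aligned}
R(G,t_F) - R(G) &= R(G,t_F) - R(G,t_G) \\
 &= \big[R(G,t_F) - R(F,t_F)\big] + \big[R(F,t_F) - R(F,t_G)\big] + \big[R(F,t_G) - R(G,t_G)\big] \\
 &\le \sum_{i=0}^{m} (G_i - F_i)\big[R(i,t_F) - R(i,t_G)\big] \\
 &\le M \sum_{i=0}^{m} |F_i - G_i| .
\end{aligned}
```

The middle bracket is nonpositive because $t_F$ is Bayes versus $F$. The outer brackets combine because $R(H,t) = \sum_i H_i R(i,t)$ is linear in $H$. The last step uses $|R(i,t_F) - R(i,t_G)| \le M$ when $S \subset [0,M]^{m+1}$. The lower bound $0$ holds because $R(G)$ is the Bayes envelope.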

It follows from Lemmas 1 and 2 that if $S \subset [0,M]^{m+1}$, then the Bayes $(\Lambda)$ empirical Bayes procedure $t_n = t_{G^*}$ satisfies

$$0 \le R(G, t_n) - R(G) \le (m+1)^{1/2} M \, \|G^* - G\| \quad \text{a.s. } [P_G^\infty], \tag{19}$$

where $\|\cdot\|$ denotes the usual Euclidean norm on $R^{m+1}$. The factor $(m+1)^{1/2}$ comes from the Cauchy-Schwarz inequality $\sum_i |a_i| \le (m+1)^{1/2}\|a\|$. Hence the a.s. $[P_G^\infty]$ consistency of the conditional expectation $G^* = E[G \mid \underline{X}]$ at $G$ implies the strong empirical Bayes risk consistency of $t_n$ at $G$. In turn, by the boundedness of $\|G^* - G\|$, this implies the mean (usual) empirical Bayes consistency of $t_n$ at $G$.
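The sketch below is a quick numerical sanity check of (18) and of the Cauchy-Schwarz step leading to (19), under stated assumptions. The rules form a hypothetical finite collection with randomly drawn risk points in $[0,M]^{m+1}$, so $R(G)$ is taken as the minimum over that collection. This is enough because a linear function on the convex hull attains its minimum at one of the points.

```python
import math
import random


def bayes_risk(prior, s):
    """R(G, t) = sum_i G_i R(i, t) for a rule t with risk point s."""
    return sum(g * r for g, r in zip(prior, s))


def bayes_point(prior, S):
    """Risk point of a rule that is Bayes versus `prior` among the rules in S."""
    return min(S, key=lambda s: bayes_risk(prior, s))


def random_prior(m, rng):
    """A random point of the simplex (m + 1 coordinates)."""
    w = [rng.random() for _ in range(m + 1)]
    total = sum(w)
    return [x / total for x in w]


def check_bounds(m=3, M=2.0, n_rules=40, trials=500, seed=1):
    """Check 0 <= R(G,t_F) - R(G) <= M |F-G|_1 <= sqrt(m+1) M |F-G|_2.

    Returns the largest observed ratio of the regret to M |F - G|_1.
    """
    rng = random.Random(seed)
    # Hypothetical finite collection of rules, risk points in [0, M]^(m+1).
    S = [[rng.uniform(0.0, M) for _ in range(m + 1)] for _ in range(n_rules)]
    worst = 0.0
    for _ in range(trials):
        F, G = random_prior(m, rng), random_prior(m, rng)
        regret = bayes_risk(G, bayes_point(F, S)) - bayes_risk(G, bayes_point(G, S))
        l1 = M * sum(abs(f - g) for f, g in zip(F, G))
        l2 = math.sqrt(m + 1) * M * math.sqrt(sum((f - g) ** 2 for f, g in zip(F, G)))
        assert -1e-12 <= regret <= l1 + 1e-12      # inequality (18)
        assert l1 <= l2 + 1e-12                    # Cauchy-Schwarz step behind (19)
        if l1 > 0:
            worst = max(worst, regret / l1)
    return worst
```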

We will use Theorem 6.1 of Schwartz (1965) to establish the a.s. $[P_G^\infty]$ consistency of $E[G \mid \underline{X}]$ at $G$ for all $G \in \mathcal{G}$. For this purpose we establish some lemmas which serve to verify the hypotheses of the Schwartz theorem for our application. In what follows, superscript $c$ indicates complementation.

Lemma 3. Let $G$ be any point of $\mathcal{G}$ and $V$ any $\mathcal{G}$ neighborhood of $G$. There is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F : F \in V^c\}$.

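Proof. Fix $G \in \mathcal{G}$. It suffices to prove the result for neighborhoods $V$ of the form

$$V = \big( (\underline{G}_0, \overline{G}_0) \times (\underline{G}_1, \overline{G}_1) \times \cdots \times (\underline{G}_m, \overline{G}_m) \big) \cap \mathcal{G}, \qquad \underline{G}_i < G_i < \overline{G}_i, \quad i = 0,1,\dots,m.$$

We identify $[0, \underline{G}_i]$ with a subset of $R^{m+1}$, namely $\{(x_0, x_1, \dots, x_m) : x_i \in [0, \underline{G}_i]\}$, and similarly for $[\overline{G}_i, 1]$. Then

$$V^c = \bigcup_{i=0}^{m} \big( [0, \underline{G}_i] \cup [\overline{G}_i, 1] \big) \cap \mathcal{G}.$$

Let $[0, \underline{G}_i] \cap \mathcal{G} = \underline{U}_i$ and $[\overline{G}_i, 1] \cap \mathcal{G} = \overline{U}_i$ for $i = 0,1,\dots,m$. Note that if $G$ is on the boundary of $\mathcal{G}$, then one or more of the corresponding $\underline{U}_i$ or $\overline{U}_i$ is empty. Let $\mathcal{U} = \{\underline{U}_i : i = 0,1,\dots,m\} \cup \{\overline{U}_i : i = 0,1,\dots,m\}$. Since $\mathcal{U}$ is a finite collection of sets and $\bigcup \mathcal{U} = V^c$, it suffices by Kraft (1955, Theorem 7) to do the following: for each nonempty $U \in \mathcal{U}$, exhibit a uniformly consistent test of $P = P_G$ against the alternative $P \in \{P_F : F \in U\}$.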
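Take $U = \overline{U}_i$ nonempty. (The case $U = \underline{U}_i$ is exactly analogous.) Let $h_i$ be a function such that

$$E_j h_i = \int h_i f_j \, d\mu = \delta_{ij} = \begin{cases} 1 & \text{if } j = i \\ 0 & \text{if } j \ne i \end{cases}, \qquad j = 0,1,\dots,m. \tag{20}$$

Suppose that $h_i$ is bounded by the constant $M_i$. (For example, one can take $h_0, h_1, \dots, h_m$ to be a basis dual to the densities $f_0, f_1, \dots, f_m$ in $L_2(\mu)$, as observed, e.g., by Robbins (1964).) Now for each $n$ and $\underline{x} \in \mathcal{X}^\infty$ define

$$h_{in}(\underline{x}) = \frac{1}{n} \sum_{\alpha=1}^{n} h_i(x_\alpha) \tag{21}$$

and the test function

$$\varphi_n = \begin{cases} 1 & \text{if } h_{in}(\underline{x}) \ge c \\ 0 & \text{if } h_{in}(\underline{x}) < c \end{cases} \tag{22}$$

where $c = \tfrac{1}{2}(G_i + \overline{G}_i)$. We will show that $\varphi_n$ is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F : F \in \overline{U}_i\}$. Since $E_G h_i = G_i < c$, clearly $E_G \varphi_n \to 0$ (e.g., by the weak law of large numbers). Now let $F \in \overline{U}_i$. Then $E_F h_i = F_i \ge \overline{G}_i$, so that for each $n = 1,2,\dots$

$$E_F(1 - \varphi_n) \le P_F\big[ -(h_{in}(\underline{x}) - F_i) \ge \overline{G}_i - c \big]. \tag{23}$$

By the Hoeffding bound (1963, Theorem 2), RHS (23) is bounded by $\exp\{-B_i n\}$, where $B_i$ is a constant depending only on $M_i$ and $\overline{G}_i - c$. Hence $\varphi_n$ is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F : F \in \overline{U}_i\}$. □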
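The test (22) can be illustrated in a case where $h_i$ is explicit. The sketch makes a hypothetical choice: $m = 1$, a binary observation, and $f_j$ the Bernoulli($p_j$) mass function with $p_0 = 0.2$, $p_1 = 0.7$. Then $h_1(x) = (x - p_0)/(p_1 - p_0)$ satisfies (20) for $i = 1$ and takes values in an interval of length $1/(p_1 - p_0)$. The code computes the exact error probabilities of $\varphi_n$ from the binomial distribution of $\sum_\alpha x_\alpha$. It compares them with the Hoeffding bound $\exp\{-2n\,\mathrm{gap}^2/\mathrm{range}^2\}$, whose exponent plays the role of $B_i n$.

```python
import math

P = (0.2, 0.7)      # hypothetical Bernoulli parameters: f_j(1) = P[j], f_j(0) = 1 - P[j]


def h1(x):
    """Satisfies (20) for i = 1: E_0 h1 = 0 and E_1 h1 = 1."""
    return (x - P[0]) / (P[1] - P[0])


RANGE = h1(1) - h1(0)   # h1 takes values in an interval of this length


def binom_pmf(n, s, q):
    """Binomial(n, q) probability of s, computed in log space."""
    logp = (math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
            + s * math.log(q) + (n - s) * math.log(1 - q))
    return math.exp(logp)


def reject_prob(n, F1, c):
    """Exact P_F[phi_n = 1] = P_F[mean of h1(X_a) >= c] for X_a i.i.d. from f_F."""
    q = (1 - F1) * P[0] + F1 * P[1]          # P_F(X = 1) under the mixture f_F
    # The mean of h1(X_a) equals h1(s / n) when s of the n observations equal 1.
    return sum(binom_pmf(n, s, q) for s in range(n + 1) if h1(s / n) >= c)


def hoeffding(n, gap):
    """Hoeffding (1963, Theorem 2) one-sided bound for a deviation of size gap."""
    return math.exp(-2 * n * gap ** 2 / RANGE ** 2)
```

With $G_1 = 0.3$, $\overline{G}_1 = 0.6$ and $c = 0.45$, the type II error is largest at $F_1 = \overline{G}_1$, and both errors fall at least as fast as $\exp\{-n/89\}$.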
For $F, G \in \mathcal{G}$ we define

$$KL(G,F) = E_G\Big( \log \frac{f_G}{f_F} \Big), \tag{24}$$

the Kullback-Leibler information number between the mixture densities $f_G$ and $f_F$.
Lemma 4. Suppose that the support of $\Lambda$ is all of $\mathcal{G}$. Let $G$ be any point of $\mathcal{G}$ and $V$ any $\mathcal{G}$ neighborhood of $G$. Given $\varepsilon > 0$, there exists a subset $W \subset V$ such that $\Lambda(W) > 0$ and $KL(G,F) < \varepsilon$ for all $F \in W$.
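Proof. Fix $G \in \mathcal{G}$ and let $V$ be a $\mathcal{G}$ neighborhood of $G$. Let $\varepsilon > 0$ be given. For each $\delta > 0$ we define

$$\begin{aligned}
U_\delta &= \{F : G_i e^{-\delta} \le F_i \le G_i e^{-\delta} + 1 - e^{-\delta},\ i = 1,2,\dots,m\}, \\
V_\delta &= \Big\{F : G_i e^{-\delta} \le F_i,\ i = 1,2,\dots,m, \text{ and } \sum_{i=1}^{m} F_i \le e^{-\delta} \sum_{i=1}^{m} G_i + 1 - e^{-\delta}\Big\}, \\
W_\delta &= \{F : G_i e^{-\delta} < F_i < G_i e^{-\delta} + m^{-1}(1 - e^{-\delta}),\ i = 1,2,\dots,m\},
\end{aligned}$$

and note that $W_\delta \subset V_\delta \subset U_\delta$. We see that $U_\delta \subset V$ for sufficiently small $\delta$. Let $\delta_0$ be such that $U_{\delta_0} \subset V$ and $\delta_0 < \varepsilon$. On $V_{\delta_0}$ we have $F_0 = 1 - \sum_{i=1}^{m} F_i \ge e^{-\delta_0} G_0$, as well as $F_i \ge e^{-\delta_0} G_i$ for $i = 1,\dots,m$. Hence $f_F \ge e^{-\delta_0} f_G$ on $V_{\delta_0}$, so that $KL(G,F) \le \delta_0 < \varepsilon$ for $F \in V_{\delta_0}$. Since $W_{\delta_0}$ contains a nonempty open subset of $\mathcal{G}$ and every point of $\mathcal{G}$ is a point of support of $\Lambda$, we have $\Lambda(V_{\delta_0}) \ge \Lambda(W_{\delta_0}) > 0$. Taking $W = V_{\delta_0}$ completes the proof. □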
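The key inequality in the proof of Lemma 4 can be spot-checked on a finite sample space. That inequality says that $F_i \ge e^{-\delta} G_i$ for every $i$ forces $f_F \ge e^{-\delta} f_G$, and hence $KL(G,F) \le \delta$. In this sketch the component mass functions are hypothetical random vectors and $KL$ is computed as in (24). A point $F$ of $V_\delta$ is drawn by taking $F = e^{-\delta}G + (1 - e^{-\delta})H$ for an arbitrary $H \in \mathcal{G}$.

```python
import math
import random


def random_pmf(k, rng):
    """A random strictly positive probability vector of length k (hypothetical input)."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def mixture(G, dens):
    """f_G(x) = sum_i G_i f_i(x) on a finite sample space; dens[i][x] = f_i(x)."""
    return [sum(G[i] * dens[i][x] for i in range(len(G))) for x in range(len(dens[0]))]


def kl(G, F, dens):
    """KL(G, F) = E_G log(f_G / f_F), as in (24)."""
    fG, fF = mixture(G, dens), mixture(F, dens)
    return sum(p * math.log(p / r) for p, r in zip(fG, fF) if p > 0)


def point_in_V_delta(G, H, delta):
    """F = e^{-delta} G + (1 - e^{-delta}) H satisfies F_i >= e^{-delta} G_i for all i."""
    a = math.exp(-delta)
    return [a * g + (1 - a) * h for g, h in zip(G, H)]
```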

Lemmas 3 and 4 verify hypotheses (ii) and (iii), respectively, of the following theorem, which has been converted to our notation. Recall that $Q$ denotes the conditional distribution of $G$ given $\underline{X} = (X_1, \dots, X_{n-1})$ and $G^* = E[G \mid \underline{X}]$.

Theorem 6 (Schwartz (1965, Theorem 6.1)). Suppose that

(i) the densities $f_G(x)$ may be chosen to be jointly measurable in $G$ and $x$;

(ii) $V$ is a neighborhood of $G$ and there is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F : F \in V^c\}$; and

(iii) for every $\varepsilon > 0$, $V$ contains a subset $W$ such that $\Lambda(W) > 0$ and $KL(G,F) < \varepsilon$ on $W$.

Then $Q(V^c) \to 0$ a.s. $[P_G^\infty]$.
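The conclusion of Theorem 6 can be watched in a simulation. The setup below is hypothetical: two normal components $N(0,1)$ and $N(3,1)$, true $G = (0.7, 0.3)$, and $\Lambda$ uniform on a grid of 101 values of $G_1$. The grid is a discretization, so its support is the grid rather than all of $\mathcal{G}$. The sketch returns the posterior mean $G^*_1$ and the posterior mass $Q(V^c)$ for $V = \{F : |F_1 - G_1| < 0.1\}$.

```python
import math
import random

MEANS = (0.0, 3.0)                      # hypothetical f_0 = N(0,1), f_1 = N(3,1)
GRID = [k / 100 for k in range(101)]    # hypothetical grid support of Lambda, G = (1 - g, g)


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def posterior(xs):
    """Q(. | X_1, ..., X_{n-1}) on GRID for Lambda uniform on GRID."""
    logw = [sum(math.log((1 - g) * normal_pdf(x, MEANS[0]) + g * normal_pdf(x, MEANS[1]))
                for x in xs)
            for g in GRID]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


def simulate(g_true, n, radius=0.1, seed=0):
    """Draw n observations from f_G; return (G*_1, Q(V^c)) with V = {|F_1 - g_true| < radius}."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        theta = 1 if rng.random() < g_true else 0      # latent state, P(theta = 1) = G_1
        xs.append(rng.gauss(MEANS[theta], 1.0))
    q = posterior(xs)
    g_star = sum(qk * g for qk, g in zip(q, GRID))
    outside = sum(qk for qk, g in zip(q, GRID) if abs(g - g_true) >= radius)
    return g_star, outside
```

Calling `simulate(0.3, n)` for increasing `n` (say 50, 500, 2000) shows $Q(V^c)$ shrinking toward 0.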

Our final lemma will be used to complete the last step of the proof of the Bayes risk consistency of the Bayes $(\Lambda)$ empirical Bayes rule $t_n = t_{G^*}$.

Lemma 5. Suppose that $\Lambda$ is a distribution on $\mathcal{G}$. Let $G \in \mathcal{G}$ be such that for every neighborhood $V$ of $G$, $Q(V^c) \to 0$ a.s. $[P_G^\infty]$. Then $\|G^* - G\| \to 0$ a.s. $[P_G^\infty]$.

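Proof. Let $G \in \mathcal{G}$. Note that

$$\|G^* - G\| = \Big\| \int (F - G)\, Q(dF) \Big\| \le \int \|F - G\|\, Q(dF), \tag{25}$$

where the inequality is the Minkowski integral inequality (e.g., see Fabian and Hannan (1973, §1.5)). For each $\varepsilon > 0$ let $V_\varepsilon = \{F : \|F - G\| < \varepsilon\}$. We partition the integration into integration over $V_\varepsilon$ and over $V_\varepsilon^c$. Using the bound $\|F - G\| \le \sqrt{2}$, valid for all $F, G \in \mathcal{G}$, (25) yields

$$\|G^* - G\| \le \varepsilon + \sqrt{2}\, Q(V_\varepsilon^c). \tag{26}$$

By hypothesis $Q(V_\varepsilon^c) \to 0$ a.s. $[P_G^\infty]$, so $\limsup_n \|G^* - G\| \le \varepsilon$ a.s. $[P_G^\infty]$. Since $\varepsilon > 0$ is arbitrary, the proof is complete. □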
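Inequalities (25) and (26) can be checked for a discrete $Q$. The sketch uses randomly generated, hypothetical atoms $F_k \in \mathcal{G}$ and weights $q_k$. The constant $\sqrt{2}$ is the largest Euclidean distance between two points of $\mathcal{G}$, attained at two distinct vertices. It is therefore the bound used for $\|F - G\|$ here.

```python
import math
import random


def norm(v):
    """Euclidean norm on R^{m+1}."""
    return math.sqrt(sum(x * x for x in v))


def random_simplex_point(m, rng):
    """A random point of the simplex with m + 1 coordinates."""
    w = [rng.expovariate(1.0) for _ in range(m + 1)]
    total = sum(w)
    return [x / total for x in w]


def lemma5_terms(m=2, K=30, eps=0.2, seed=5):
    """For a discrete Q = sum_k q_k delta_{F_k}, return the three sides of (25)-(26)."""
    rng = random.Random(seed)
    G = random_simplex_point(m, rng)
    Fs = [random_simplex_point(m, rng) for _ in range(K)]      # hypothetical atoms of Q
    q = random_simplex_point(K - 1, rng)                        # K weights summing to 1
    G_star = [sum(qk * F[i] for qk, F in zip(q, Fs)) for i in range(m + 1)]
    dist = [norm([a - b for a, b in zip(F, G)]) for F in Fs]
    lhs = norm([a - b for a, b in zip(G_star, G)])              # ||G* - G||
    middle = sum(qk * d for qk, d in zip(q, dist))              # integral of ||F - G|| dQ
    q_out = sum(qk for qk, d in zip(q, dist) if d >= eps)       # Q(V_eps^c)
    return lhs, middle, eps + math.sqrt(2) * q_out
```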
Lemmas 1-5 and the Schwartz theorem are used to prove the main result of this section, namely

Theorem 7. Suppose $\Theta = \{0,1,\dots,m\}$ and $S$ is a compact set. Suppose $\Lambda$ is a probability distribution with support $\mathcal{G}$. Then a Bayes $(\Lambda)$ empirical Bayes procedure $t_n$ is Bayes risk consistent at every $G \in \mathcal{G}$, i.e.,

$$R_n(G, t_n) \to R(G) \quad \text{for all } G \in \mathcal{G}. \tag{27}$$

Proof. By Lemma 1, $t_n = t_{G^*}$ is a Bayes $(\Lambda)$ empirical Bayes procedure. By Lemma 2, this $t_n$ satisfies (19) for each $n$. By Lemmas 3-5 and the Schwartz theorem (Theorem 6), $\|G^* - G\| \to 0$ a.s. $[P_G^\infty]$ for all $G \in \mathcal{G}$, from which (27) follows by the bounded convergence theorem. □
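The content of (19) and (27) can be seen in a concrete case. The sketch assumes a hypothetical two-state test of $N(0,1)$ against $N(2,1)$ under 0-1 loss, so $M = 1$ and $m = 1$. The Bayes rule versus $G = (1-g, g)$ is then a threshold rule with risk available in closed form. Using $G^* = (1 - g^*, g^*)$ in place of $G$ produces an excess risk. That excess can be compared with the bound $(m+1)^{1/2} M \|G^* - G\| = 2|g^* - g|$ as $g^* \to g$.

```python
import math
from statistics import NormalDist

MU0, MU1 = 0.0, 2.0          # hypothetical f_0 = N(0,1), f_1 = N(2,1); 0-1 loss, so M = 1
PHI = NormalDist().cdf


def threshold(g):
    """t_G for G = (1 - g, g), 0 < g < 1, decides 1 iff x >= threshold(g)."""
    return (MU0 + MU1) / 2 + math.log((1 - g) / g) / (MU1 - MU0)


def risk(g, tau):
    """R(G, t) = G_0 P_0(X >= tau) + G_1 P_1(X < tau) for the rule deciding 1 iff x >= tau."""
    return (1 - g) * (1 - PHI(tau - MU0)) + g * PHI(tau - MU1)


def excess_risk(g, g_star):
    """R(G, t_{G*}) - R(G) when G* = (1 - g_star, g_star) is used in place of G."""
    return risk(g, threshold(g_star)) - risk(g, threshold(g))
```

The excess risk is of second order in $g^* - g$ here, so it falls well below the linear bound in (19).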

APPENDICES

APPENDIX A

In this Appendix we show that the Robbins (1951) bootstrap rule $\underline{\hat{t}}$ defined in Example 1 is an inadmissible compound rule if $n = 2$. The demonstration for general $n \ge 2$ is similar but notationally complicated.

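Since $\Theta^2$ is finite, if the equivariant rule $\underline{\hat{t}}$ is admissible, it is Bayes versus some invariant prior $\beta$ on the four states $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. Invariance of $\beta$ requires $\beta_{01} = \beta_{10}$. In Huang (1972) only $\beta$ with $\beta_{00} = \beta_{11}$ are considered. It can be shown that, for general invariant $\beta$, a Bayes compound rule with respect to $\beta$ must be equal a.e. [Lebesgue on $R^2$] to the equivariant compound rule $\underline{t}^\beta$ whose decision on $\theta_2$ is

$$t^\beta_2(x_1, x_2) = \begin{cases} 1 & \text{if } \tfrac{1}{2}(1 - \beta_{00} - \beta_{11})(A_2 - A_1) + \beta_{11} A_1 A_2 - \beta_{00} \ge 0 \\ 0 & \text{if } \tfrac{1}{2}(1 - \beta_{00} - \beta_{11})(A_2 - A_1) + \beta_{11} A_1 A_2 - \beta_{00} < 0 \end{cases}$$

(the decision on $\theta_1$ is obtained by interchanging $x_1$ and $x_2$), where $A_i \equiv \exp\{2x_i\}$, $i = 1,2$. Note that if $\beta_{11} > \beta_{00}$, then $(x_1, x_2) = (0,0)$ is in the interior of the region deciding $(\delta_1 = 1, \delta_2 = 1)$.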
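The displayed form of the Bayes rule can be checked against a direct posterior computation. The sketch is an illustration only, since Example 1 is not reproduced here. It assumes component densities $N(-1,1)$ and $N(1,1)$, for which $f_1(x)/f_0(x) = \exp\{2x\}$ matches $A_i$, and 0-1 loss on each component. It compares the sign of the displayed expression with a comparison of the posterior probabilities of $\theta_2 = 1$ and $\theta_2 = 0$.

```python
import math


def bayes_theta2(x1, x2, b00, b11):
    """Displayed rule: decide theta_2 = 1 iff the Appendix A expression is >= 0."""
    A1, A2 = math.exp(2 * x1), math.exp(2 * x2)
    expr = 0.5 * (1 - b00 - b11) * (A2 - A1) + b11 * A1 * A2 - b00
    return 1 if expr >= 0 else 0


def posterior_theta2(x1, x2, b00, b11):
    """Direct posterior comparison, assuming f_0 = N(-1,1) and f_1 = N(1,1)."""
    b01 = b10 = 0.5 * (1 - b00 - b11)                 # invariance: beta_01 = beta_10
    prior = {(0, 0): b00, (0, 1): b01, (1, 0): b10, (1, 1): b11}

    def f(x, th):
        # Unnormalized N(2*th - 1, 1) density; the common constant cancels.
        return math.exp(-0.5 * (x - (2 * th - 1)) ** 2)

    post = {th: w * f(x1, th[0]) * f(x2, th[1]) for th, w in prior.items()}
    p1 = post[(0, 1)] + post[(1, 1)]
    p0 = post[(0, 0)] + post[(1, 0)]
    return 1 if p1 >= p0 else 0
```

At $(0,0)$ the expression reduces to $\beta_{11} - \beta_{00}$, which is why its sign decides that point.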

The Robbins bootstrap rule $\underline{\hat{t}}$ of Example 1 has $(0,0)$ on the boundary of the partition it induces in $R^2$. Hence $\underline{\hat{t}}$ is not equal a.e. [Lebesgue on $R^2$] to a rule $\underline{t}^\beta$ Bayes with respect to $\beta$ when $\beta_{11} > \beta_{00}$. The same argument applies to $\beta$ with $\beta_{11} < \beta_{00}$. The partitions induced by $\underline{\hat{t}}$ and by $\underline{t}^\beta$ for $\beta_{11} = \beta_{00} = \tfrac{1}{3}$ are shown in Figure 1 of Huang (1972). The separating curves for the $\underline{t}^\beta$ partition have vertical and horizontal asymptotes for every $\beta$. The $\underline{\hat{t}}$ partition has separating curves with asymptotes $x_1 + x_2 = 2$ and $x_1 + x_2 = -2$, so $\underline{\hat{t}}$ is not equal a.e. [Lebesgue on $R^2$] to any $\underline{t}^\beta$. Thus, $\underline{\hat{t}}$ is inadmissible.

APPENDIX B

Theorem. In the $2 \times 2$ component testing problem, suppose that

$$Z = \frac{f_1(X)}{f_0(X)}$$

has a continuous cumulative distribution function on $(0,\infty)$ under both $P_0$ and $P_1$. Then the lower boundary $\mathcal{B}$ of the risk set $S$ is strictly convex.

Proof. Let $s_1 = (s_1^0, s_1^1)$ and $s_2 = (s_2^0, s_2^1)$ be distinct points on $\mathcal{B}$, corresponding to rules $t_1$ and $t_2$ respectively. Assume without loss of generality that $s_1^0 > s_2^0$. Thus, $s_2^1 > s_1^1$.

From the Neyman-Pearson lemma, we know that $t_1$ is of the form

$$t_1 = \begin{cases} 1 & \text{if } Z > k_1 \\ \gamma_1 & \text{if } Z = k_1 \\ 0 & \text{if } Z < k_1 \end{cases}$$

where $k_1 \ge 0$. Similarly, there is a $k_2 > k_1$ associated with $t_2$.

We assume for the moment that $k_1 \ne 0$ and $k_2 \ne \infty$. Then, by the continuity of the distribution of $Z$, the probability that $Z = k_1$ or $Z = k_2$ is zero under either $P_0$ or $P_1$. We arbitrarily assign $\gamma_i = 1$, $i = 1,2$, and write

$$s_i = (P_0[Z \ge k_i],\ P_1[Z < k_i]), \qquad i = 1,2.$$

Let $\beta$ be given, $0 < \beta < 1$, and define

$$s_\beta = \beta s_1 + (1 - \beta) s_2.$$

Note that $s_\beta$ is the risk point associated with the rule $t_\beta = \beta t_1 + (1 - \beta) t_2$, namely,

$$t_\beta = \begin{cases} 1 & \text{if } Z \ge k_2 \\ \beta & \text{if } k_1 \le Z < k_2 \\ 0 & \text{if } Z < k_1. \end{cases}$$

Since $t_\beta$ does not possess Neyman-Pearson structure, $s_\beta$ is not on the lower boundary $\mathcal{B}$. ($s_\beta$ is dominated by any Neyman-Pearson test of size $s_\beta^0$.)

The cases $k_1 = 0$ and $k_2 = \infty$ can be handled in analogous ways, the only difference being the choice of $\gamma_1, \gamma_2$. (The points of $\mathcal{B}$ on the two axes are $(P_0[Z \ge \infty], P_1[Z < \infty])$ and $(P_0[Z > 0], P_1[Z \le 0])$.)
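The strict convexity can be seen numerically in a case satisfying the hypothesis. The sketch assumes $P_0 = N(0,1)$ and $P_1 = N(1,1)$, a hypothetical choice. Then $Z = \exp\{X - 1/2\}$ is continuous and increasing in $X$, so the Neyman-Pearson tests reject when $X \ge c$. For two such tests and $0 < \beta < 1$, the code computes how far the chord point $s_\beta$ lies above the lower boundary. This height is the excess of $s_\beta^1$ over the type II error of the Neyman-Pearson test of size $s_\beta^0$.

```python
from statistics import NormalDist

N = NormalDist()      # standard normal; hypothetical P_0 = N(0,1), P_1 = N(1,1)


def np_point(c):
    """Risk point (P_0[X >= c], P_1[X < c]) of the Neyman-Pearson test rejecting when X >= c."""
    return (1 - N.cdf(c), N.cdf(c - 1))


def boundary_type2(size):
    """Type II error of the Neyman-Pearson test of the given size, 0 < size < 1."""
    c = N.inv_cdf(1 - size)
    return N.cdf(c - 1)


def chord_gap(c1, c2, beta):
    """Height of s_beta = beta s_1 + (1 - beta) s_2 above the lower boundary B."""
    s1, s2 = np_point(c1), np_point(c2)
    size = beta * s1[0] + (1 - beta) * s2[0]
    type2 = beta * s1[1] + (1 - beta) * s2[1]
    return type2 - boundary_type2(size)
```

A positive `chord_gap` for every pair of distinct thresholds and every $0 < \beta < 1$ is what strict convexity of $\mathcal{B}$ predicts.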

REFERENCES


Brown, L.D. and Purves, R. (1973). Measurable selections of extrema. Ann. Statist. 1 902-912.

Copas, J.B. (1974). On symmetric compound decision rules for dichotomies. Ann. Statist. 2 199-204.

Fabian, Vaclav and Hannan, James (1973). Introduction to Probability and Mathematical Statistics. Lecture notes, Statistics and Probability, MSU.

Ferguson, Thomas (1967). Mathematical Statistics, A Decision Theoretic Approach. Academic Press, New York and London.

Ferguson, Thomas S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2 615-629.

Gilliland, Dennis C. and Hannan, James (1974). The finite state compound decision problem, equivariance and restricted risk components. RM-317, Statistics and Probability, MSU.

Gilliland, Dennis C., Hannan, James and Huang, J.S. (1974). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. RM-320, Statistics and Probability, MSU.

Gilliland, Dennis C. and Hannan, James (1976). Improved rates in the empirical Bayes monotone multiple decision problem with MLR family. RM-352, Statistics and Probability, MSU.

Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem with two completely specified distributions. Ann. Math. Statist. 26 37-51.

Hoeffding, Wassily (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.

Huang, J.S. (1972). A note on Robbins' compound decision procedure. Ann. Math. Statist. 43 348-350.

Kraft, Charles (1955). Some conditions for consistency and uniform consistency of statistical procedures. Univ. of Calif. Publications in Statist.

Lindley, D.V. (1971). Bayesian statistics, a review. Regional Conference Series in Applied Mathematics No. 2, SIAM, Philadelphia.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann. Math. Statist. 43 96-101.

Oaten, Allan (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 43 1164-1184.

Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob., 131-148.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, University of California Press, 157-163.

Robbins, Herbert (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1-20.

Rolph, John E. (1968). Bayesian estimation of mixing distributions. Ann. Math. Statist. 39 1289-1302.

Schwartz, L. (1965). On Bayes procedures. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 4 10-26.

Shapiro, C.P. (1974). Bayesian classification: asymptotic results. Ann. Statist. 2 763-774.

Tucker, Howard G. (1963). An estimate of the compounding distribution of a compound Poisson distribution. Theor. Prob. Appl. 8 195-200.

Van Houwelingen, J.C. (1973). On empirical Bayes rules for the continuous one-parameter exponential family. Doctoral thesis, Rijksuniversiteit te Utrecht, Netherlands (to appear in Ann. Statist. (1976)).

 
