This is to certify that the thesis entitled "On Asymptotic Optimality of Bayes Empirical Bayes Estimators" presented by Tze Fen Li has been accepted towards fulfillment of the requirements for the Doctoral degree in Statistics.

Major professor

ON ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES ESTIMATORS

By

Tze Fen Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1981

ABSTRACT

ON ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES ESTIMATORS

By Tze Fen Li

In an empirical Bayes decision problem, a prior distribution $\Lambda$ is placed on a one-dimensional family $\mathcal{G}$ of priors $G_\omega$, $\omega \in \Omega$, to produce a Bayes empirical Bayes estimator. The asymptotic optimality of the Bayes estimator is established when the support of $\Lambda$ is $\Omega$ and the marginal distributions $H_\omega$ have monotone likelihood ratio and continuous Kullback-Leibler information number. For the normal case, a simple class of empirical Bayes estimators is constructed that dominate the James-Stein estimator. Here the Bayes estimator is smooth, admissible and asymptotically optimal on $\mathcal{G}$. The rate of convergence to minimum risk is $O(n^{-1})$ uniformly on $\mathcal{G}$. The results of a Monte Carlo study are presented to demonstrate the favorable risk behavior of the Bayes estimator in comparison with other competitors, including the James-Stein estimator.

ACKNOWLEDGMENTS

I would like to take this opportunity to express my appreciation to my advisor, Professor Dennis C. Gilliland, and my guidance committee for invaluable guidance, constructive assistance and suggestions during the entire course of this study. The financial support provided by the Department of Statistics and Probability and the National Science Foundation made my graduate studies possible. I wish to thank Clara Hanna, who accurately typed the thesis with great patience and skill.

TABLE OF CONTENTS

Chapter
I   INTRODUCTION TO BAYES EMPIRICAL BAYES ESTIMATION
    1.1 Introduction
    1.2 Example - Normal Case
    1.3 Literature Review
II  ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES RULES
BIBLIOGRAPHY

CHAPTER I

INTRODUCTION TO BAYES EMPIRICAL BAYES ESTIMATION

1.1 Introduction

Consider the component decision problem consisting of estimation of $\theta$ based on $X$, which has distribution $F_\theta$. Let $L(\theta,\cdot)$ denote a loss function and let $R(G,d)$ denote the risk of an estimator $d$ when $G$ is a prior distribution on $\theta$, i.e.,

$$R(G,d) = \int L(\theta, d(x))\, dF_\theta(x)\, dG(\theta). \tag{1.1}$$

Let $D$ denote the class of all component estimators $d$. The infimum risk

$$R(G) = \inf_{d \in D} R(G,d) \tag{1.2}$$

defines the Bayes envelope at $G$. An estimator $d_G \in D$ such that $R(G, d_G) = R(G)$ is said to be a Bayes component rule versus $G$.
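To fix ideas, the following minimal sketch (our construction, not from the thesis; all names are illustrative) evaluates the component risk (1.1) over the linear rules $d(x) = cx$ for the normal component of Section 1.2, where $X \mid \theta \sim N(\theta,1)$ and, under the assumption $G = N(0,\tau^2)$, the risk has the closed form $c^2(\tau^2+1) - 2c\tau^2 + \tau^2$; the minimizer recovers the Bayes component rule and the minimum recovers the Bayes envelope (1.2).

```python
# A minimal sketch of (1.1)-(1.2): component risk of linear rules d(x) = c*x
# when X | theta ~ N(theta, 1) and G = N(0, tau2) (assumed for illustration).
import numpy as np

def risk_linear(c, tau2):
    # R(G, d) = E (c*X - theta)^2 with X ~ N(0, tau2 + 1) marginally:
    # expanding the square gives c^2 (tau2 + 1) - 2 c tau2 + tau2.
    return c**2 * (tau2 + 1) - 2 * c * tau2 + tau2

tau2 = 2.0
cs = np.linspace(0.0, 1.0, 1001)
risks = risk_linear(cs, tau2)
# The Bayes component rule is d_G(x) = E[theta | x] = (tau2/(tau2+1)) x, so the
# minimum occurs at c = tau2/(tau2+1) with value R(G) = tau2/(tau2+1).
print(cs[np.argmin(risks)], tau2 / (tau2 + 1))   # both approximately 0.667
print(risks.min(), tau2 / (tau2 + 1))            # Bayes envelope R(G)
```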
In the empirical Bayes (EB) decision problem with this component, $(\theta_i, X_i)$, $i = 1, 2, \ldots$, are i.i.d. with $\theta_i \sim G$ and, conditional on $\theta_i$, $X_i \sim F_{\theta_i}$. The EB problem is to estimate $\theta_n$ based on observing $X_1, \ldots, X_n$. This can be construed as using $\underline{X}_{n-1} = (X_1, \ldots, X_{n-1})$ to select a component decision rule $t_n(\underline{X}_{n-1}) \in D$ and evaluating it at $X_n$ to estimate $\theta_n$. (In what follows, $t_n$ will sometimes be used to abbreviate the evaluation $t_n(\underline{X}_{n-1})(X_n)$.) The risk of $t_n$ at $G$ conditional on $\underline{X}_{n-1}$ is $R(G, t_n(\underline{X}_{n-1})) \ge R(G)$, and the overall risk is

$$R_n(G, t_n) = \int R(G, t_n(\underline{x}_{n-1}))\, dH_G^{n-1}(\underline{x}_{n-1}), \tag{1.3}$$

where $H_G^{n-1}$ denotes the $(n-1)$-fold product of the $G$-mixture of the $F_\theta$.

Let $\mathcal{G}$ be a specified family of distributions on $\theta$.

Definition 1.1. (Robbins (1956)) A sequence of EB rules $t_n$ is said to be asymptotically optimal (a.o.) on $\mathcal{G}$ if for each $G \in \mathcal{G}$, $\lim_n R_n(G, t_n) = R(G)$.

When the component loss is squared error, a Bayes component rule is $d_G(X) = E[\theta \mid X]$. Furthermore, the EB risk (1.3) has the representation

$$R_n(G, t_n) = R(G) + E\,(t_n - d_G(X_n))^2 \tag{1.4}$$

provided $E\theta^2 < \infty$. This representation was noted by Johns (1956) and used as a starting point to prove the a.o. property of certain EB estimators. It follows from the $L_2$-orthogonality of $E[\theta_n \mid \underline{X}_n] - \theta_n$ and $t_n - E[\theta_n \mid \underline{X}_n]$.

In this thesis, $\mathcal{G}$ is assumed to be a parametric family of distributions of $\theta$. Each $G \in \mathcal{G}$ is identified by an element $\omega$ of an indexing set $\Omega$ which is a subset of the reals, i.e.,

$$\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}. \tag{1.5}$$

Let $\Lambda$ be a prior distribution on $\Omega$. An EB estimator $t_n$ is said to be Bayes, and is called Bayes EB with respect to $\Lambda$, if it minimizes

$$R_n(\Lambda, t_n) = \int R_n(G_\omega, t_n)\, d\Lambda(\omega). \tag{1.6}$$

Good (1965) refers to such priors on priors as Type III probabilities. Meeden (1972) illustrates the Bayes approach to empirical Bayes squared error loss estimation problems with several examples. Other literature discussing Bayes empirical Bayes includes Lindley (1971), Gilliland and Hannan (1974), Gilliland, Hannan and Huang (1976), Deely and Lindley (1979) and Gilliland and Boyer (1979).

In the next section we develop the EB and Bayes EB methods for an example. In Section 1.3 we give a brief review of the related literature. In Chapter II we consider the Bayes EB method and show that it produces a.o. procedures for a variety of EB decision problems.

1.2 Example - Normal Case

In the present section we consider the component consisting of squared error loss estimation of $\theta$ based on $X \sim F_\theta = N(\theta, 1)$. First consider the compound decision problem with this component. This consists of estimation of $\underline{\theta} = (\theta_1, \theta_2, \ldots, \theta_n)$ based on $\underline{X} = (X_1, X_2, \ldots, X_n) \sim F_{\theta_1} \times F_{\theta_2} \times \cdots \times F_{\theta_n}$, with the compound risk being the average risk across the $n$ components. James and Stein (1961) show that the estimator $t^1 = (t^1_1, t^1_2, \ldots, t^1_n)$, where for $i = 1, 2, \ldots, n$,

$$t^1_i(\underline{X}) = \Big(1 - \frac{n-2}{S}\Big) X_i \quad \text{where } S = \sum_{i=1}^n X_i^2, \tag{1.7}$$

has compound risk satisfying

$$R(\underline{\theta}, t^1) < 1 \quad \text{for all } \underline{\theta},\ n \ge 3. \tag{1.8}$$

This demonstrates the inadmissibility of the compound estimator $\underline{X}$ if $n \ge 3$. Efron and Morris (1972) point out that $t^1_n$ is a natural EB estimator for the EB problem with this component and a class of normal prior distributions, here parameterized as

$$\mathcal{G} = \{N(0, (1-\omega)/\omega) \mid \omega \in \Omega = (0,1]\}. \tag{1.9}$$

Under the prior $G_\omega = N(0, (1-\omega)/\omega)$, the marginal distribution of $X$ is $H_\omega = N(0, \omega^{-1})$, the component Bayes rule is

$$d_\omega(X) = (1-\omega)X \tag{1.10}$$

and the Bayes envelope is $R(\omega) = 1 - \omega$. By (1.4), the excess EB risk has the representation

$$R_n(\omega, t_n) - R(\omega) = E\,\big(t_n - d_\omega(X_n)\big)^2, \tag{1.13}$$

which for rules of the form $t_n = (1 - \varphi(S))X_n$ equals $E[(\varphi(S) - \omega)^2 X_n^2]$. Since $\omega S \sim \chi^2(n)$ under $H^n_\omega$, $E[(n-2)/S] = \omega$, so the James-Stein estimator $t^1_n$ estimates $\omega$ unbiasedly with the shrinking factor $\varphi_1(S) = (n-2)/S$; its excess risk is exactly $2\omega/n$. Adjusting the shrinking factor produces EB estimators which dominate $t^1_n$. The first is

$$t^2_n = t^1_n + \frac{2(n-6)}{S^2}\, X_n, \tag{1.15}$$

which satisfies

$$R_n(\omega, t^2_n) < R_n(\omega, t^1_n), \quad n > 6. \tag{1.17}$$

The coefficient $2(n-6)$ of $S^{-2}$ in the definition (1.15) of $t^2_n$ was chosen from among all constants to produce an adjustment to the James-Stein estimator $t^1_n$ which results in the domination (1.17).
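As a concrete illustration of the James-Stein EB estimator (1.7), here is a small sketch (our code, not the thesis's); the retraction of the shrinking factor to $[0,1]$, written $t^{1+}_n$ below, is introduced formally in (1.22) of the construction that follows.

```python
# A sketch of the James-Stein EB estimate of theta_n from (1.7), together with
# the retracted version t_n^{1+} in which phi_1(S) = (n-2)/S is clipped to [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def js_estimates(x):
    # x = (X_1, ..., X_n); returns (t_n^1, t_n^{1+}).
    n = len(x)
    S = float(np.sum(np.square(x)))
    phi = (n - 2) / S
    phi_star = min(max(phi, 0.0), 1.0)   # retraction a* = max{min{a, 1}, 0}
    return (1 - phi) * x[-1], (1 - phi_star) * x[-1]

# One draw from the EB model (1.9) with omega = 0.5:
# theta_i ~ N(0, (1 - omega)/omega) and X_i | theta_i ~ N(theta_i, 1).
omega, n = 0.5, 20
theta = rng.normal(0.0, np.sqrt((1 - omega) / omega), size=n)
x = theta + rng.normal(size=n)
print(js_estimates(x))
```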
Continuing this method of construction leads to EB estimators with nested risk functions. The next two estimators in the construction are

$$t^3_n = t^2_n + \frac{4(n-10)}{S^3}\, X_n \tag{1.18}$$

and

$$t^4_n = t^3_n + \frac{2(n-14)(n^2 - 28n + 188)}{S^4}\, X_n, \tag{1.19}$$

with

$$R_n(\omega, t^3_n) < R_n(\omega, t^2_n) - \frac{4\omega^3 (n-10)^3}{n(n-2)(n-4)(n-6)(n-8)}, \quad n > 10, \tag{1.20}$$

and

$$R_n(\omega, t^4_n) < R_n(\omega, t^3_n), \quad n > 14, \tag{1.21}$$

where the risk gap in (1.21) is a positive multiple of $\omega^4 (n-14)(n^2 - 28n + 188)$.

From (1.13) and the fact $\omega \in (0,1]$, each estimator $t^j_n$ can be improved by retracting its shrinking factor $\varphi_j(S)$ to the interval $[0,1]$. Each $t^j_n$ is dominated on $\mathcal{G}$ by

$$t^{j+}_n = (1 - \varphi^*_j(S))\, X_n, \tag{1.22}$$

where $a^* = \max\{\min\{a, 1\}, 0\}$.

We now turn to the study of Bayes empirical Bayes rules with respect to a prior $\Lambda$ on $\omega \in (0,1]$. The estimators are of the same general form as the $t^j_n$ but with more complicated shrinking factors. Moreover, they are monotone in $X_n$ and, for suitable $\Lambda$, admissible on $(0,1]$ and a.o. on $(0,1]$ with a rate $O(n^{-1})$. The results of a Monte Carlo study will demonstrate the favorable risk behavior of the Bayes EB estimator based on uniform $\Lambda$.

Let $\Lambda$ be a prior distribution on $\omega \in (0,1]$. By (1.4) and (1.10),

$$R_n(\Lambda, t_n) = \int R(\omega)\, d\Lambda(\omega) + \iint \big(t_n(\underline{x}_{n-1})(x_n) - (1-\omega)x_n\big)^2\, dH^n_\omega(\underline{x}_n)\, d\Lambda(\omega). \tag{1.23}$$

A minimizer of (1.23) is

$$t^\Lambda_n = (1 - \hat\omega)\, X_n, \tag{1.24}$$

where

$$\hat\omega = \frac{\int_0^1 \omega^{\frac{n}{2}+1}\, e^{-\frac{\omega S}{2}}\, d\Lambda(\omega)}{\int_0^1 \omega^{\frac{n}{2}}\, e^{-\frac{\omega S}{2}}\, d\Lambda(\omega)}. \tag{1.25}$$

Here $\hat\omega$ is the conditional expectation of $\omega$ given $\underline{X}_n$ in the model $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, \ldots, X_{n-1}, X_n$ i.i.d. $H_\omega = N(0, \omega^{-1})$. Note that $t^\Lambda_n$ is unique a.e. Lebesgue on $R^n$-space. Also note that $t^\Lambda_n$ is of the same form as the $t^j_n$ introduced earlier but with a more complicated shrinking factor $\varphi(S) = \hat\omega$. Whereas the EB estimators $t^{j+}_n$ are not monotone in $X_n$ for fixed $\underline{X}_{n-1}$, we have

Remark 1.1. For any prior $\Lambda$ on $(0,1]$, the Bayes EB rule $t^\Lambda_n$ is monotonically increasing in $X_n$ for fixed $\underline{X}_{n-1}$.

Proof. From (1.24) and the fact $S = \sum_1^n X_i^2$, it suffices to prove $\hat\omega$ ↓ wrt $S$. But $\hat\omega$ is the mean of an exponential family with parameter $-S/2$ and is therefore ↑ wrt $-S/2$. □

Remark 1.2. For every prior $\Lambda$ on $(0,1]$, the Bayes EB rule $t^\Lambda_n$ is admissible on $(0,1]$.

Proof. The uniqueness a.e. Lebesgue of a Bayes rule implies its admissibility. □

The Bayes EB estimator $t^\Lambda_n$ is easily evaluated, and the rate of a.o. determined, for the prior $\Lambda = \text{Beta}(\alpha, 1)$, $\alpha > 0$. Let $t^{(\alpha)}_n$ be Bayes EB versus Beta$(\alpha, 1)$. (This class of Beta priors was used by Strawderman (1971) in constructing a prior for the compound decision problem. He shows that the compound version $\tilde t^{(\alpha)}_n$ of $t^{(\alpha)}_n$ satisfies (1.8) provided $n \ge 4 + 2\alpha$.) With the Beta$(\alpha, 1)$ prior, (1.25) simplifies using integration by parts to

$$\hat\omega = \frac{n + 2\alpha}{S} - 2\Big(S \int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)S/2}\, du\Big)^{-1}. \tag{1.26}$$

In case $\frac{n}{2} + \alpha - 1$ is an integer, say $m$, (1.26) further simplifies through repeated integration by parts to the closed form

$$\hat\omega = \frac{n + 2\alpha}{S} - \Big(\tfrac{1}{2}S\Big)^m \Big[m!\Big(e^{S/2} - \sum_{k=0}^m \frac{(S/2)^k}{k!}\Big)\Big]^{-1}. \tag{1.27}$$

Using (1.27), $\hat\omega$ can be easily calculated to any degree of precision. Since the EB risk $R_n(\omega, t^{(\alpha)}_n)$ is the expectation of the compound risk $R(\underline\theta, \tilde t^{(\alpha)}_n)$, the aforementioned Strawderman (1971) result implies

Remark 1.3. If $n \ge 4 + 2\alpha$,

$$R_n(\omega, t^{(\alpha)}_n) < 1, \quad 0 < \omega \le 1. \tag{1.28}$$

The conditional expectation $\hat\omega$ of (1.26) satisfies

$$\lim_{S \downarrow 0} \hat\omega = \frac{n + 2\alpha}{n + 2\alpha + 2}, \qquad \lim_{S \to \infty} S\hat\omega = n + 2\alpha. \tag{1.29}$$

Figure 1.1 displays, for $n = 20$, the graphs of the $\varphi_j(S)$ that are part of the EB estimators $t^{j+}_n$, $j = 1, 2, 3$, as well as $\hat\omega$ of (1.27) with $\alpha = 1$ (uniform prior).

[Figure 1.1]
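Since (1.27) is a finite closed form, $\hat\omega$ is simple to compute. The sketch below (illustrative code, not from the thesis; it assumes $\alpha = 1$ and $n$ even so that $m = \frac{n}{2} + \alpha - 1$ is an integer) evaluates (1.27) and compares it with the retracted James-Stein factor $\varphi^*_1(S) = \min\{(n-2)/S,\, 1\}$, echoing the comparison in Figure 1.1.

```python
# Evaluate the closed form (1.27) for the Beta(alpha, 1) prior and compare
# with the retracted James-Stein shrinking factor min{(n-2)/S, 1}.
import math

def w_hat(S, n, alpha=1):
    m = n // 2 + alpha - 1               # assumes n even, so m is an integer
    lam = S / 2.0
    # tail of the exponential series: e^{S/2} - sum_{k=0}^m (S/2)^k / k!
    tail = math.exp(lam) - sum(lam**k / math.factorial(k) for k in range(m + 1))
    return (n + 2 * alpha) / S - lam**m / (math.factorial(m) * tail)

n = 20
for S in (5.0, float(n - 2), 60.0):
    print(S, w_hat(S, n), min((n - 2) / S, 1.0))
```

For large $S$ the two factors nearly coincide, consistent with (1.29); for very small $S$ the series tail involves cancellation, so higher precision arithmetic may be preferred there.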
The conditional expectation $\hat\omega$ of (1.26) is close to the corresponding factor $\varphi^*_1(S)$ in the modified James-Stein EB rule $t^{1+}_n$. Also, $t^{1+}_n$ and $t^{(\alpha)}_n$ have the same rate of a.o., namely $O(n^{-1})$ uniformly in $\omega$. To establish these results we begin with

Lemma 1.1. Let $\omega_n = \min\{1, (n + 2\alpha - 2)/S\}$ and let $\hat\omega$ be given by (1.26). If $n + 2\alpha - 2 > 0$, then

$$(\hat\omega - \omega_n)^2 \le \frac{4}{n + 2\alpha - 2}. \tag{1.30}$$

Proof. Let $\tilde\Lambda$ denote the conditional distribution of $\omega$ in the model $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, X_2, \ldots, X_n$ i.i.d. $H_\omega = N(0, \omega^{-1})$. Then $\tilde\Lambda$ has a density with respect to Lebesgue measure which is proportional to

$$\tilde\lambda(\omega) = \omega^{\frac{n}{2}+\alpha-1}\, e^{-\frac{\omega S}{2}}, \quad 0 < \omega \le 1. \tag{1.31}$$

Assume that $n + 2\alpha - 2 > 0$. Examination of $g(\omega) = \ln \tilde\lambda(\omega)$ shows that $\omega_n$ is the maximizer of $\tilde\lambda(\omega)$. Note that

$$|\hat\omega - \omega_n| = \Big|\int (\omega - \omega_n)\, d\tilde\Lambda(\omega)\Big| \le \int |\omega - \omega_n|\, d\tilde\Lambda(\omega) = \int_0^\infty \tilde\Lambda[\omega > \omega_n + t]\, dt + \int_0^\infty \tilde\Lambda[\omega < \omega_n - t]\, dt. \tag{1.32}$$

The density $\tilde\lambda$ is $\ln$-concave, from which it follows that

$$\frac{\tilde\lambda(\omega)}{\tilde\Lambda[\omega, 1]} \uparrow \text{ in } \omega \in (0,1] \tag{1.33}$$

(cf. Gilliland, Hannan and Huang (1976), Lemma 8). Also, $\tilde\lambda(1-\omega)$ is $\ln$-concave, from which it follows that

$$\frac{\tilde\lambda(\omega)}{\tilde\Lambda(0, \omega]} \downarrow \text{ in } \omega \in (0,1]. \tag{1.34}$$

Thus,

$$\tilde\Lambda[\omega > \omega_n + t] \le \tilde\lambda(\omega_n + t)/\tilde\lambda(\omega_n), \qquad \tilde\Lambda[\omega < \omega_n - t] \le \tilde\lambda(\omega_n - t)/\tilde\lambda(\omega_n). \tag{1.35}$$

The Taylor series expansion of $g = \ln \tilde\lambda$ about $\omega = \omega_n$ shows that the ratios in (1.35) decay rapidly in $t$; integrating over $t$ in (1.32) then yields (1.30). □

Lemma 1.2. If $n + 2\alpha - 2 > 0$, then for $0 < \omega \le 1$,

$$R_n(\omega, t^{(\alpha)}_n) - R(\omega) \le \frac{8\,\omega^{-1}}{n + 2\alpha - 2} + \frac{4n + 8(\alpha^2 - 1)}{n(n-2)}\,\omega. \tag{1.38}$$

Proof. Using (1.13), and triangulation about $\omega_n$ of Lemma 1.1,

$$R_n(\omega, t^{(\alpha)}_n) - R(\omega) \le 2\,E[(\hat\omega - \omega_n)^2 X_n^2] + 2\,E[(\omega_n - \omega)^2 X_n^2]. \tag{1.39}$$

By Lemma 1.1 and the fact $E X_n^2 = \omega^{-1}$,

$$E[(\hat\omega - \omega_n)^2 X_n^2] \le \frac{4}{n + 2\alpha - 2}\,\omega^{-1}. \tag{1.40}$$

Also,

$$E[(\omega_n - \omega)^2 X_n^2] \le E\Big[\Big(\frac{n + 2\alpha - 2}{S} - \omega\Big)^2 X_n^2\Big]. \tag{1.41}$$

Expanding the square in (1.41) and using the fact that $\omega S \sim \chi^2(n)$ independent of $X_n^2/S$, it follows that

$$E\Big[\Big(\frac{n + 2\alpha - 2}{S} - \omega\Big)^2 X_n^2\Big] = \frac{2n + 4(\alpha^2 - 1)}{n(n-2)}\,\omega, \tag{1.42}$$

which together with (1.39)-(1.41) completes the proof. □

The bound in (1.38) fails to demonstrate the uniform $O(n^{-1})$ rate on $(0,1]$. The following theorem does establish this uniform rate by combining (1.38) with a bound designed for risk in a neighborhood of $\omega = 0$.

Theorem 1.1. The Bayes EB estimator $t^{(\alpha)}_n$ is a.o. with a rate $O(n^{-1})$ uniform in $\omega \in (0,1]$.

Proof. By (1.13), (1.26), and the fact $n\,E[(\hat\omega - \omega)^2 X_n^2] = E[(\hat\omega - \omega)^2 S]$ (by the symmetry of $\hat\omega$ in $X_1, X_2, \ldots, X_n$),

$$n\,[R_n(\omega, t^{(\alpha)}_n) - R(\omega)] = E\Big[\frac{1}{S}\,\{n - 2 - \omega S + 2(1 + \alpha) - 2f(S)\}^2\Big], \tag{1.43}$$

where

$$f(S) = \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)S/2}\, du\Big)^{-1}. \tag{1.44}$$

Let $n > 2$. Note that $E[S^{-1} g(S)] = \frac{\omega}{n-2}\, E\, g(Y)$, where $\omega S \sim \chi^2(n)$ and $\omega Y \sim \chi^2(n-2)$, by a change of variable argument. Now the variance of $\omega Y$ is

$$E\{(n-2) - \omega Y\}^2 = 2(n-2) \tag{1.45}$$

and

$$\mathrm{Cov}(Y, f(Y)) < 0 \tag{1.46}$$

since $f(Y)$ ↓ with respect to $Y$, so that (1.43) shows that

$$\mathrm{LHS}(1.43) \le 2\omega + \frac{4\omega\, E[\{(1+\alpha) - f(Y)\}^2]}{n-2}. \tag{1.47}$$

The proof is completed by showing that $E f^2(Y)$ is $O(n)$ uniformly in $\omega \in (0, \frac{1}{3}]$ and using (1.38) to establish the uniform rate on $[\frac{1}{3}, 1]$. From (1.44), for any $b > 0$,

$$E f^2(Y) \le \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, du\Big)^{-2}\, P[\omega Y \le b] + \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)b/2\omega}\, du\Big)^{-2}, \tag{1.48}$$

where use is made of the fact that the integrand in (1.44) is increasing in $S$. The choice $b = \frac{1}{2}n + \alpha - 1$ ensures that $P[\omega Y \le b] \le 2(n-2)\,(\frac{1}{2}n - \alpha - 1)^{-2}$ (use the Chebyshev inequality and the fact $\omega Y \sim \chi^2(n-2)$). With this choice of $b$,

$$u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)b/2\omega} > 1, \quad 2\omega < u < 1, \tag{1.49}$$

so that the last term on RHS(1.48) is bounded by $(1 - 2\omega)^{-2}$. Hence $E f^2(Y) = O(n)$ uniformly in $\omega \in (0, \frac{1}{3}]$. □

Maritz (1970, Chapter 3) proposes several methods of obtaining "smooth" EB estimators. He illustrates two of these methods for the normal case EB example of this section. An estimator $\delta_G$ is described in Maritz Examples 3.4.4 and 2.14.2, which is

$$\delta_G = (1 - \hat\omega^*_M)\, X_n, \tag{1.50}$$

where $\hat\omega^*_M$ is the retraction of the shrinking factor

$$\hat\omega_M = \frac{n-3}{S_M}, \qquad S_M = \sum_{i=1}^{n-1} X_i^2. \tag{1.51}$$

The estimator $\delta_G$ is seen to be a delete version of $t^{1+}_n$, where delete refers to the fact that only the initial observations $\underline{X}_{n-1}$ are used to estimate $\omega$. By (1.13),

$$R_n(\omega, \delta_G) - R(\omega) = \omega^{-1}\, E\,(\omega - \hat\omega^*_M)^2 \le \omega^{-1}\, E\,(\omega - \hat\omega_M)^2 = \frac{2\omega}{n-5}, \tag{1.52}$$

so $\delta_G$ is seen to be a.o.
with a rate $O(n^{-1})$ uniform in $\omega \in (0,1]$. For comparison, the James-Stein estimator, which uses the untruncated $\varphi_1(S) = (n-2)/S$ to estimate $\omega$, has excess risk $2\omega/n$.

The Maritz estimator $\rho_3$ is constructed by finding the 3-point uniform distribution $\hat G$ on $\theta \in (-\infty, \infty)$, among all such distributions, so that the $\hat G$-mixture of $N(\theta, 1)$ minimizes a distance between the mixture and the empirical distribution of $X_1, X_2, \ldots, X_{n-1}$. (See Maritz (1970, pp. 54-55).) The EB estimator of $\theta_n$ is then taken to be

$$\rho_3 = d_{\hat G}(X_n). \tag{1.53}$$

Both Maritz EB estimators are "smooth" in the sense of being monotone in $X_n$ for fixed $\underline{X}_{n-1}$. However, as the following table shows, both estimators perform poorly relative to $t^{(1)}_n$, and even $t^{1+}_n$ and $t^1_n$, for selected values of $n$ and $\omega$.

Table 1.1. $R_n(\omega, t_n) - R(\omega)$ Values

(1-ω)/ω    ω      R(ω)    Estimator    n = 4    n = 10    n = 20
5          .167   .833    t_n^(1)      .13      .05       .02
                          t_n^{1+}     .09      .03       .02
                          t_n^1        .08      .03       .02
2          .333   .667    t_n^(1)      .09      .06       .04
                          t_n^{1+}     .18      .06       .03
                          t_n^1        .17      .07       .03
1          .500   .500    t_n^(1)      .06      .04       .04
                          t_n^{1+}     .25      .09       .05
                          t_n^1        .25      .10       .05
                          δ_G          --       .17       .10
                          ρ_3          --       .32       .19
.5         .667   .333    t_n^(1)      .07      .03       .03
                          t_n^{1+}     .33      .10       .06
                          t_n^1        .33      .13       .07
                          δ_G          --       .16       .09
                          ρ_3          --       .22       .14
.1         .909   .091    t_n^(1)      .16      .07       .04
                          t_n^{1+}     .44      .12       .07
                          t_n^1        .46      .18       .09
                          δ_G          --       .12       .07
                          ρ_3          --       .16       .07
.01        .990   .010    t_n^(1)      .20      .10       .06
                          t_n^{1+}     .48      .13       .07
                          t_n^1        .50      .20       .10

The values for the James-Stein estimator $t^1_n$ are exactly $2\omega/n$. The values for the modified James-Stein estimator $t^{1+}_n$ and the Bayes EB rule $t^{(1)}_n$ are estimates based on 1000 Monte Carlo trials. The estimated standard deviations are generally about 8% of the estimated excess risks. The values for the Maritz estimators $\delta_G$ and $\rho_3$ are taken from his Table 3.14 (1970), where the margins of error of the Monte Carlo estimates are reported to be less than ±.02. Also, the Maritz risks are based on EB decision problems with n = 11 (not 10) and n = 21 (not 20) observations.

Figure 1.1, Lemma 1.1 and Theorem 1.1 suggest that $t^{(1)}_n$ and the modified James-Stein estimator $t^{1+}_n$ should have very similar EB risk behavior for moderate to large $n$. Also, $t^{1+}_n$ and $t^1_n$ should have very similar EB risk when $\omega$ is small, since $\omega S \sim \chi^2(n)$ and $t^{1+}_n = t^1_n$ if $S \ge n-2$. Table 1.1 illustrates these facts. Furthermore, it shows that $\rho_3$ (and $\delta_G$) are poor estimators in the tested combinations of $\omega$ and $n$, contrary to the conclusion Maritz (1970, p. 72) reaches by comparing $\rho_3$ and $\delta_G$ with the simple estimator $X_n$.
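The Monte Carlo estimates reported in Table 1.1 can be reproduced along the following lines (a sketch under our own naming; the thesis's simulation code is not shown). It uses the representation (1.13): for a rule $t_n = (1 - \varphi(S))X_n$ the excess risk is $E[(\varphi(S) - \omega)^2 X_n^2]$, and the exact James-Stein value $2\omega/n$ serves as a check.

```python
# Monte Carlo estimation of the excess risk R_n(omega, t_n) - R(omega) for
# shrinkage rules t_n = (1 - phi(S)) X_n, via (1.13).
import numpy as np

rng = np.random.default_rng(1)

def excess_risk(phi, omega, n, trials=1000):
    vals = np.empty(trials)
    for t in range(trials):
        x = rng.normal(scale=np.sqrt(1.0 / omega), size=n)  # X_i i.i.d. H_omega = N(0, 1/omega)
        S = float(np.sum(np.square(x)))
        vals[t] = (phi(S, n) - omega) ** 2 * x[-1] ** 2
    return vals.mean()

phi1 = lambda S, n: (n - 2) / S                  # James-Stein factor (excess exactly 2*omega/n)
phi1_star = lambda S, n: min((n - 2) / S, 1.0)   # retracted factor of t_n^{1+}
for omega in (0.5, 0.909):
    print(omega,
          excess_risk(phi1, omega, 20),
          excess_risk(phi1_star, omega, 20),
          2 * omega / 20)
```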
1.3 Literature Review

Gilliland and Hannan (1974) and Gilliland, Hannan and Huang (1976) discuss Bayes procedures for the finite state compound decision problem. Bayes compound procedures with respect to mixtures of product distributions are shown to be Bayes EB procedures. Implications for the EB problem are discussed, including asymptotic optimality. Gilliland and Boyer (1979) demonstrate that a.o. in the finite state EB problem is an easy consequence of classical results on the consistency of posterior distributions. Tsao (1980a, 1980b) gives an algorithm to efficiently compute Bayes EB procedures and uses a Monte Carlo simulation to develop small $n$ risk behavior for selected $\Lambda$ and the Robbins two state component.

In the finite state case, the unrestricted family $\mathcal{G}$ of distributions on $\theta$ is finite-dimensional. Otherwise, $\mathcal{G}$ is infinite-dimensional and the process of placing priors $\Lambda$ on $\mathcal{G}$ is itself a technical problem. Meeden (1972), for certain components with $\theta$ restricted to the unit interval, places a prior $\Lambda$ on $\mathcal{G}$ through the moment sequence and demonstrates the a.o. of the resulting Bayes EB estimator. Kuo (1980) proposes a way to compute the Bayes EB estimators when $\Lambda$ is a Dirichlet process. The a.o. property is not established in this general setting. Deely and Lindley (1979) illustrate the use of Bayes procedures in EB decision making but do not consider a.o.

CHAPTER II

ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES RULES

Consider the EB decision problem of Section 1.1 with the one-dimensional family $\mathcal{G}$ of component priors of (1.5). We let $F_\theta$ denote the distribution of $X$ given $\theta$ and, when $\theta \sim G_\omega$, let $H_\omega$ denote the marginal distribution of $X$. Throughout this chapter we assume that the family $\{H_\omega \mid \omega \in \Omega\}$ is identifiable and dominated, and we let $h(x|\omega)$ denote a density for $H_\omega$. It is assumed that $h(x|\omega)$ is jointly measurable in $x$ and $\omega$. This assumption is part of the hypothesis of a Schwartz theorem on the consistency of posterior distributions, which will be applied in the Bayes EB approach where a probability measure $\Lambda$ is placed on $\Omega$.

From (1.3) and (1.6),

$$R_n(\Lambda, t_n) = \iint R(\omega, t_n(\underline{x}_{n-1}))\, d\Lambda_n(\omega)\, dP(\underline{x}_{n-1}), \tag{2.1}$$

where, here and throughout, $\Lambda_n(\cdot) = \Lambda(\cdot \mid \underline{x}_{n-1})$ is the conditional distribution of $\omega$ given $\underline{X}_{n-1}$ in the model where $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, X_2, \ldots, X_{n-1}$ are i.i.d. $H_\omega$. As Gilliland and Boyer (1979) point out, it follows from (2.1) that a Bayes EB rule is provided by

$$t^\Lambda_n(\underline{x}_{n-1}) = d_{\hat G_n}, \tag{2.2}$$

where $d_{\hat G_n}$ is component Bayes versus the mixture

$$\hat G_n(\cdot) = \int G_\omega(\cdot)\, d\Lambda_n(\omega). \tag{2.3}$$

Of course, the random measure $\hat G_n$ need not be $\mathcal{G}$-valued, where $\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}$. In the normal example of Section 1.2, $G_\omega = N(0, (1-\omega)/\omega)$ and nondegenerate $\Lambda$-mixtures of the $G_\omega$ are not normal (cf. Teicher (1960), Corollary, p. 67).

A basic condition critical for our proofs of a.o. for Bayes EB procedures is

$$\Lambda_n(V^c) \to 0 \quad \text{a.s. } H^\infty_{\omega_0} \tag{A}$$

for every $\omega_0 \in \Omega$ and every neighborhood $V$ of $\omega_0$, where $V^c$ denotes the complement of $V$ in $\Omega$. The condition (A) is easily seen to imply

$$\int \psi(\omega)\, d\Lambda_n(\omega) \to \psi(\omega_0) \quad \text{a.s. } H^\infty_{\omega_0} \tag{A$^-$}$$

for every $\omega_0 \in \Omega$ and every bounded continuous function $\psi$.

Theorems 2.1, 2.2 and 2.3 give conditions which together with (A$^-$) imply the a.o. property for the Bayes EB procedure $t^\Lambda_n$. Theorem 2.4 states conditions on the family $\{H_\omega \mid \omega \in \Omega\}$ sufficient for (A). The chapter is completed with four examples of Bayes EB procedures whose a.o. property follows from the theorems.

Theorem 2.1. Suppose that the component risk functions $R(G, d)$ are bounded (by $M < \infty$). Suppose each $G_\omega$ has a density with respect to Lebesgue measure given by $g(\theta|\omega)$ which is continuous in $\omega \in \Omega$ for each fixed $\theta \in \Theta$. Then if (A$^-$) obtains, $t^\Lambda_n$ of (2.2) is a.o. on $\Omega$; that is,

$$R_n(\omega_0, t^\Lambda_n) \to R(\omega_0) \quad \text{for all } \omega_0 \in \Omega. \tag{2.4}$$

Proof. Fix $\omega_0 \in \Omega$. Note that for each prior $G$ on $\theta$,

$$0 \le R(\omega_0, d_G) - R(\omega_0) = R(\omega_0, d_G) - R(G, d_G) + R(G, d_G) - R(\omega_0) \le [R(\omega_0, d_G) - R(G, d_G)] + [R(G, d_{\omega_0}) - R(\omega_0, d_{\omega_0})]. \tag{2.5}$$

Let $\hat G_n$ be given by (2.3) and note that $\hat G_n$ has density

$$\hat g_n(\theta) = \int g(\theta|\omega)\, d\Lambda_n(\omega). \tag{2.6}$$

For any $d \in D$,

$$R(\omega_0, d) - R(\hat G_n, d) = \int R(\theta, d) \int \{g(\theta|\omega_0) - g(\theta|\omega)\}\, d\Lambda_n(\omega)\, d\theta,$$

from which

$$|R(\omega_0, d) - R(\hat G_n, d)| \le M \iint |g(\theta|\omega_0) - g(\theta|\omega)|\, d\theta\, d\Lambda_n(\omega). \tag{2.7}$$

Note that $|g(\theta|\omega_0) - g(\theta|\omega)| \le g(\theta|\omega_0) + g(\theta|\omega)$ and $\int \{g(\theta|\omega_0) + g(\theta|\omega)\}\, d\theta = 2$. It follows from the assumed continuity of $g(\theta|\omega)$ in $\omega$ and the general dominated convergence theorem (DCT) (Royden (1968, p. 89)) that $\int |g(\theta|\omega_0) - g(\theta|\omega)|\, d\theta$ is a continuous as well as bounded function of $\omega$. Therefore, (A$^-$) implies

$$\mathrm{RHS}(2.7) \to 0 \quad \text{a.s. } H^\infty_{\omega_0}. \tag{2.8}$$

Letting $G = \hat G_n$ in (2.5) and using the bound RHS(2.7) for each of the square bracket terms, one obtains

$$R(\omega_0, t^\Lambda_n(\underline{X}_{n-1})) - R(\omega_0) \to 0 \quad \text{a.s. } H^\infty_{\omega_0}. \tag{2.9}$$

Since all risks are bounded, the a.o. of $t^\Lambda_n$ follows from (2.9) and the DCT. □
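The following sketch (ours; it uses a simple grid approximation, and all names are illustrative) shows condition (A) numerically for the normal family of Section 1.2 with $\Lambda$ uniform on $(0,1]$: the posterior mass $\Lambda_n(V^c)$ outside a neighborhood $V$ of the true $\omega_0$ vanishes as $n$ grows.

```python
# Grid illustration of condition (A): posterior concentration of Lambda_n at
# omega_0 when X_i are i.i.d. H_omega0 = N(0, 1/omega0) and Lambda = Uniform(0,1].
import numpy as np

rng = np.random.default_rng(2)
omega0 = 0.4
grid = np.linspace(0.01, 1.0, 400)           # grid over Omega = (0, 1]

for n in (10, 100, 1000):
    x = rng.normal(scale=np.sqrt(1.0 / omega0), size=n)
    S = float(np.sum(np.square(x)))
    # log posterior over the grid; the uniform prior contributes a constant
    logpost = (n / 2.0) * np.log(grid) - grid * S / 2.0
    post = np.exp(logpost - logpost.max())
    post /= post.sum()
    outside = post[np.abs(grid - omega0) > 0.1].sum()   # Lambda_n(V^c), V = omega0 +- .1
    print(n, outside, float(np.sum(grid * post)))       # mass outside V, posterior mean
```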
The Bayes EB rule $t^\Lambda_n$ defined by (2.2) has a useful alternative representation in the special case of squared error loss estimation and when the family $\{F_\theta \mid \theta \in \Theta\}$ is dominated. Let $f(x|\theta)$ denote a density for $F_\theta$. Since $d_G(x)$ is the point estimator

$$d_G(x) = \frac{\int \theta f(x|\theta)\, dG(\theta)}{\int f(x|\theta)\, dG(\theta)} \tag{2.10}$$

and $\hat G_n$ is a $\Lambda_n$-mixture of the $G_\omega$, (2.2) implies that

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) = \frac{\iint \theta f(x_n|\theta)\, dG_\omega(\theta)\, d\Lambda_n(\omega)}{\iint f(x_n|\theta)\, dG_\omega(\theta)\, d\Lambda_n(\omega)} = \frac{\int d_\omega(x_n)\, h(x_n|\omega)\, d\Lambda_n(\omega)}{\int h(x_n|\omega)\, d\Lambda_n(\omega)} = \frac{\int d_\omega(x_n) \prod_{i=1}^n h(x_i|\omega)\, d\Lambda(\omega)}{\int \prod_{i=1}^n h(x_i|\omega)\, d\Lambda(\omega)}.$$

Thus, $t^\Lambda_n$ is the point estimator

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) = \int d_\omega(x_n)\, d\Lambda_{n+1}(\omega). \tag{2.11}$$

Theorem 2.2. Suppose that the component loss function is squared error loss for the estimation of $\theta$. Suppose that for each $\omega \in \Omega$, the component Bayes rule $d_\omega$ with respect to $G_\omega$ is of the form

$$d_\omega(x) = \sum_{i=1}^p \varphi_i(x)\,\psi_i(\omega) \tag{2.12}$$

for some integer $p$, some square integrable ($H_\omega$) functions $\varphi_i$ and some bounded, continuous functions $\psi_i$. Then if (A$^-$) obtains, $t^\Lambda_n$ of (2.11) is a.o. on $\Omega$.

Proof. Fix $\omega_0 \in \Omega$. By (2.11) and (2.12),

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) - d_{\omega_0}(x_n) = \sum_{i=1}^p \varphi_i(x_n)\{\psi^*_i - \psi_i(\omega_0)\}, \tag{2.13}$$

where

$$\psi^*_i = \int \psi_i(\omega)\, d\Lambda_{n+1}(\omega), \quad i = 1, 2, \ldots, p. \tag{2.14}$$

By (2.13),

$$\big(t^\Lambda_n(\underline{x}_{n-1})(x_n) - d_{\omega_0}(x_n)\big)^2 \le p \sum_{i=1}^p \varphi_i^2(x_n)\{\psi^*_i - \psi_i(\omega_0)\}^2. \tag{2.15}$$

Using (1.4) and (2.15), and using the invariance of $\psi^*_i$ and the distribution $H^n_{\omega_0}$ under permutation of $x_1, x_2, \ldots, x_n$ to permute $x_1$ and $x_n$,

$$0 \le R_n(\omega_0, t^\Lambda_n) - R(\omega_0) \le p \sum_{i=1}^p \int \varphi_i^2(x_1)\{\psi^*_i - \psi_i(\omega_0)\}^2\, dH^n_{\omega_0}(\underline{x}_n). \tag{2.16}$$

By (A$^-$), $\psi^*_i \to \psi_i(\omega_0)$ a.s. $H^\infty_{\omega_0}$. Therefore, by the assumed square integrability of the $\varphi_i$ and the DCT, RHS(2.16) → 0. □

Consider the linear loss multiple decision problem of Van Ryzin and Susarla (1977). From (33) of Van Ryzin and Susarla (1977) or (14) of Gilliland and Hannan (1977), it follows that the excess risk of a Bayes EB procedure can be bounded by a multiple of the mean error of estimation of $d_\omega(X_n)$ by RHS(2.11), rather than the mean square error as with the squared error loss estimation component.

Theorem 2.3. Consider the linear loss $k$-action multiple decision problem of Van Ryzin and Susarla (1977). Suppose that for each $\omega \in \Omega$, the component conditional expectation $d_\omega(X)$ is given by (2.12) for some integer $p$, some integrable ($H_\omega$) functions $\varphi_i$ and some bounded, continuous functions $\psi_i$. Then if (A$^-$) obtains, $t^\Lambda_n$ is a.o. on $\Omega$.

Proof. From the $c = 1$ case of (14) of Gilliland and Hannan (1977),

$$0 \le R_n(\omega_0, t^\Lambda_n) - R(\omega_0) \le (k-1) \sum_{i=1}^p \int |\varphi_i(x_1)\{\psi^*_i - \psi_i(\omega_0)\}|\, dH^n_{\omega_0}(\underline{x}_n), \tag{2.17}$$

where use is made of (2.11) and (2.12) and the invariance under permutations used to reach (2.16). By (A$^-$), $\psi^*_i \to \psi_i(\omega_0)$ a.s. $H^\infty_{\omega_0}$. Therefore, by the assumed integrability of the $\varphi_i$ and the DCT, RHS(2.17) → 0. □

The next theorem gives a set of conditions sufficient for (A) and hence (A$^-$). It depends for its proof upon Theorem 6.1 of Schwartz (1965), which we state in a notation appropriate to the application at hand.

Theorem 6.1 (Schwartz). Suppose that (i) the densities $h(x|\omega)$ are jointly measurable, (ii) $V$ is a neighborhood of $\omega_0$ and there is a uniformly consistent test of the hypothesis $\omega = \omega_0$ against the alternative $\omega \in V^c$, and (iii) for every $\varepsilon > 0$, $V$ contains a subset $W$ such that $\Lambda(W) > 0$ and the Kullback-Leibler information number

$$K(\omega, \omega_0) = \int \log h(x|\omega)\, dH_{\omega_0}(x) \tag{2.18}$$

satisfies $K(\omega, \omega_0) > K(\omega_0, \omega_0) - \varepsilon$ on $W$. Then $\Lambda_n(V^c) \to 0$ a.s. $H^\infty_{\omega_0}$.

Theorem 2.4. Suppose that for each $n$, the joint density $h(\underline{x}_n|\omega) = \prod_{i=1}^n h(x_i|\omega)$ has a monotone likelihood ratio (MLR) in $T_n(\underline{x}_n)$. Suppose that the Kullback-Leibler information number $K(\omega, \omega_0)$ is finite and is continuous in $\omega$ for each $\omega_0 \in \Omega$. Then if $\Lambda$ has support equal to $\Omega$, (A) obtains.

Proof. Let $\omega_0, \omega_1 \in \Omega$, $\omega_1 < \omega_0$.
There exists a consistent test $\varphi_n$ of $H_{\omega_0}$ versus $H_{\omega_1}$ since $H_{\omega_1} \ne H_{\omega_0}$ by identifiability. Let $\varphi^*_n$ be the Neyman-Pearson test based on $T_n$ of the same size as $\varphi_n$. Then $\varphi^*_n$ is uniformly consistent for $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \le \omega_1\}$ by the MLR property. Similarly, there exists a uniformly consistent test of $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \ge \omega_2\}$, where $\omega_2 \in \Omega$, $\omega_2 > \omega_0$. It follows that there exists a uniformly consistent test of $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \notin (\omega_1, \omega_2)\}$. Moreover, the continuity of $K$ implies that for every $\Omega$-neighborhood $V$ of $\omega_0$ and every $\varepsilon > 0$ there exists a subset $W$ of $V$ such that $\Lambda(W) > 0$ and $K(\omega, \omega_0) - K(\omega_0, \omega_0) > -\varepsilon$ on $W$. Theorem 6.1 establishes that (A) holds. □

The family of densities

$$h(x|\omega) = c(\omega)\, e^{Q(\omega)T(x)}\, h(x), \quad \omega \in \Omega, \tag{2.19}$$

where $Q$ is a monotone function, results in the family $h(\underline{x}_n|\omega)$ being MLR in $T_n(\underline{x}_n) = \sum T(x_i)$. In each of the examples to follow, the mixtures $\{H_\omega \mid \omega \in \Omega\}$ form a one-parameter exponential family.

Example 2.1 (Normal case of §1.2). Let $F_\theta = N(\theta, 1)$, $\theta \in \Theta = (-\infty, \infty)$, and $G_\omega = N(0, (1-\omega)/\omega)$, $\omega \in \Omega = (0,1]$. Then

$$h(x|\omega) = \sqrt{\frac{\omega}{2\pi}}\, e^{-\frac{1}{2}\omega x^2}, \quad -\infty < x < \infty.$$

Moreover,

$$K(\omega, \omega_0) = \frac{1}{2}\log\frac{\omega}{2\pi} - \frac{\omega}{2\omega_0},$$

which is finite and continuous in $\omega$. Here the component conditional expectation is $(1-\omega)X$; note that $\psi(\omega) = 1-\omega$ is bounded and continuous and $X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for each fixed $\theta$.

Example 2.2. Let $F_\theta$ = Poisson($\theta$), $\theta \in (0, \infty)$, and let $G_\omega$ = Gamma with shape $\alpha_0$ and scale $\omega/(1-\omega)$, $\omega \in (0,1)$, where $\alpha_0 > 0$ is fixed. Then

$$h(x|\omega) = \frac{\Gamma(\alpha_0 + x)}{\Gamma(\alpha_0)\, x!}\, \omega^x (1-\omega)^{\alpha_0}, \quad x = 0, 1, \ldots.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $(\alpha_0 + X)\omega$. Also, $\omega$ is bounded and continuous and $\alpha_0 + X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Example 2.3. Let $F_\theta$ be Binomial($n = 1, \theta$), $\theta \in [0,1]$, and let $G_\omega$ = Beta($\alpha_0, \alpha_0(1-\omega)/\omega$), $\omega \in (0,1)$, where $\alpha_0 > 0$ is fixed. Then

$$h(x|\omega) = \omega^x (1-\omega)^{1-x}, \quad x = 0, 1.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $(\alpha_0 + X)\omega/(\alpha_0 + \omega)$. Also, $\omega/(\alpha_0 + \omega)$ is bounded and continuous and $\alpha_0 + X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Example 2.4. Let $F_\theta$ = Uniform($0, \theta$), $\theta \in (0, \infty)$, and $G_\omega$ = Gamma($2, \omega^{-1}$), $\omega \in [a, \infty)$ where $a > 0$. Then

$$h(x|\omega) = \omega\, e^{-\omega x}, \quad x > 0.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $X + \omega^{-1} = X \cdot 1 + 1 \cdot \omega^{-1}$; note that $X$ and $1$ are square integrable ($H_\omega$) and $1$ and $\omega^{-1}$ are bounded and continuous on $[a, \infty)$. The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Consider any EB decision problem with component probability structure that of any of the Examples. If the loss structure is either bounded, or is squared error loss estimation, or is linear loss multiple decision, then the Theorems show that a Bayes EB procedure $t^\Lambda_n$ will be a.o. on the support of $\Lambda$ and, therefore, on $\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}$ if the support of $\Lambda$ is equal to $\Omega$.

The results of this thesis pertain to EB problems with one-dimensional families $\mathcal{G}$. In generalizing to higher (finite) dimensional families, there is no problem invoking the Schwartz consistency theorem except in the demonstration of a uniformly consistent test. For infinite dimensional families, the technical problems are much greater and only a few examples of a.o. Bayes EB procedures have been given (cf. Meeden (1972)).
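To make the representation (2.11) concrete for Example 2.2, the following sketch (our construction; the uniform prior, the grid, and the names are assumptions, not the thesis's) computes the Bayes EB estimate of $\theta_n$ by numerically averaging $d_\omega(x_n) = (\alpha_0 + x_n)\omega$ against the posterior on $\omega$ determined by the negative binomial marginals $h(x|\omega)$.

```python
# Numerical Bayes EB rule (2.11) for the Poisson component of Example 2.2,
# with Lambda taken uniform on (0, 1) and a grid approximation to Lambda_{n+1}.
import numpy as np
from scipy.special import gammaln

def bayes_eb_poisson(x, alpha0=1.0):
    grid = np.linspace(0.005, 0.995, 199)    # grid over Omega = (0, 1)
    x = np.asarray(x, dtype=float)
    # log h(x_i | w) = log Gamma(alpha0 + x_i) - log Gamma(alpha0) - log x_i!
    #                  + x_i log w + alpha0 log(1 - w), summed over i
    loglik = sum(gammaln(alpha0 + xi) - gammaln(alpha0) - gammaln(xi + 1)
                 + xi * np.log(grid) + alpha0 * np.log(1.0 - grid) for xi in x)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    d = (alpha0 + x[-1]) * grid              # component Bayes rules d_w(x_n)
    return float(np.sum(d * post))           # (2.11): average d_w(x_n) over Lambda_{n+1}

print(bayes_eb_poisson([3, 1, 0, 2, 4, 1, 2]))
```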
BIBLIOGRAPHY

Deely, J.J. and Lindley, D.V. (1979). Bayes empirical Bayes. University of Canterbury, Christchurch, and University College, London. (Private communication.)

Efron, Bradley and Morris, Carl (1972). Limiting the risk of Bayes and empirical Bayes estimators, Part II: The empirical Bayes case. J. Amer. Statist. Assoc. 67, 130-139.

Efron, Bradley and Morris, Carl (1973). Stein's estimation rule and its competitors -- An empirical Bayes approach. J. Amer. Statist. Assoc. 68, 117-130.

Ferguson, Thomas S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.

Gilliland, Dennis C. and Boyer, John E., Jr. (1979). Bayes empirical Bayes. Dept. of Statistics and Probability, MSU. (Submitted for publication.)

Gilliland, Dennis C. and Hannan, James (1974). The finite state compound decision problem, equivariance and restricted risk components. RM-317, Statistics and Probability, MSU.

Gilliland, Dennis C. and Hannan, James (1977). Improved rates in the empirical Bayes monotone multiple decision problem with MLR family. Ann. Statist. 5, 516-521.

Gilliland, Dennis C., Hannan, James and Huang, J.S. (1976). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. Ann. Statist. 4, 1101-1112.

Good, I.J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Research Monograph No. 30, M.I.T. Press.

James, W. and Stein, C. (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361-379. University of California Press.

Johns, M.V., Jr. (1956). Contributions to the theory of non-parametric empirical Bayes procedures in statistics. Ph.D. Dissertation, Columbia.

Kuo, Lynn (1980). Computations of mixtures of Dirichlet processes. Technical Report No. 96, Dept. Stat., University of Michigan.

Lindley, D.V. (1971). Bayesian Statistics, A Review. Regional Conference Series in Applied Mathematics No. 2, SIAM, Philadelphia.

Maritz, J.S. (1970). Empirical Bayes Methods. Methuen and Co. Ltd., London.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann. Math. Statist. 43, 96-101.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 157-163. University of California Press.

Royden, H.L. (1968). Real Analysis, 2nd Edition. Macmillan, New York.

Schwartz, Lorraine (1965). On Bayes procedures. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 4, 10-26.

Strawderman, William E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42, 385-388.

Susarla, V. (1976). A property of Stein's estimator for the mean of a multivariate normal distribution. Statistica Neerlandica 30, 1-5.

Tsao, How Jan (1980a). On the risk performance of Bayes empirical Bayes procedures for classification between N(-1,1) and N(1,1). Statistica Neerlandica 35.

Tsao, How Jan (1980b). On the risk performance of Bayes empirical Bayes procedures in the finite state component case. Ph.D. Dissertation, Dept. Stat. Prob., Michigan State University.

Van Ryzin, J. and Susarla, V. (1977). On the empirical Bayes approach to multiple decision problems. Ann. Statist. 5, 172-181.