ON THE RISK PERFORMANCE OF BAYES EMPIRICAL BAYES PROCEDURES IN THE FINITE STATE COMPONENT CASE

By

How Jan Tsao

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1980

ABSTRACT

ON THE RISK PERFORMANCE OF BAYES EMPIRICAL BAYES PROCEDURES IN THE FINITE STATE COMPONENT CASE

By How Jan Tsao

Since Robbins' introduction of the empirical Bayes approach to a sequence of decision problems, a large literature has evolved treating a variety of component problems. Most of the papers advance empirical Bayes procedures which are asymptotically optimal, and some establish rates of convergence. Within empirical Bayes decision making, the Bayes empirical Bayes approach is discussed by Gilliland and Boyer (1979). In the finite state component case, the Bayes empirical Bayes procedures are shown to have optimal properties in a fairly general setting and are believed to have a small sample advantage over the classical rules. The flexibility of making desirable adjustments to these decision procedures through the choice of prior enables one to set a proper strategy when dealing with actual problems.

In this thesis, a complete class theorem is proved to show that, at each sample stage, the class of Bayes empirical Bayes rules is complete and, under some regularity conditions, that it is minimal complete. In the two state component case the posterior mean which generates the Bayes empirical Bayes rules is shown to be asymptotically normal under certain assumptions.

The use of Bayes empirical Bayes procedures creates some interesting theoretical and computational problems, as the Bayes procedures are fairly complicated in structure. The thesis also develops methods of computing Bayes empirical Bayes rules and determining their small sample risk behavior. In some cases risk functions are evaluated by numerical methods, and, in other cases, Monte Carlo simulation is used to estimate risk.

To my parents and Grace

ACKNOWLEDGMENTS

I would like to take this opportunity to thank my advisor, Professor Dennis C. Gilliland, for his invaluable guidance, assistance, encouragement and caretaking during the entire course of this study. The financial support provided by the Department of Statistics and Probability and the National Science Foundation made my graduate studies possible. I wish to thank them and Clara, who accurately typed the whole manuscript with great patience and skill. My deep gratitude is extended to my parents and my wife, Grace, for their understanding, encouragement and support.

TABLE OF CONTENTS

Chapter 1. FINITE STATE BAYES EMPIRICAL BAYES PROCEDURES ... 1
  1.1 The Component and Empirical Bayes Decision Problems ... 1
  1.2 Bayes Empirical Bayes ... 6
  1.3 A Complete Class Theorem ... 13
  1.4 The Classification Problem ... 23

Chapter 2. TWO STATE BAYES EMPIRICAL BAYES PROCEDURES ... 25
  2.1 Testing Simple Hypothesis Against Simple Alternative ... 26
  2.2 Asymptotic Property of $p_\Lambda(\underline{X}_n)$ ... 30
  2.3 Optimal Properties and Risk Performance of Bayes Empirical Bayes Procedures for Classification Between N(-1,1) and N(1,1) ... 32
  2.4 Other Empirical Bayes Procedures ... 38
  2.5 Monte Carlo Comparisons of T, T1 and T2 ... 41

APPENDIX A ... 46
APPENDIX B ... 57
BIBLIOGRAPHY ... 63

LIST OF TABLES

1. $R_n(p,T_{B(1)})$ ... 36
2. Risk behavior of $R_n(p,T_{B(1)})$ ... 37
3. The flexibility of $R_1(p,T_\Lambda)$ with a prior $\Lambda$ in $\{B(\gamma)\,|\,\gamma > 0\}$ ... 39
4. $R_{50}(p,T_2)$ ... 42
5. Comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$; $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, when n = 1, 2, 5 ... 44
6. Comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$; $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, when n = 10, 25, 50 ... 45
A.1 Risk behavior of $R_n(p,T_1)$ ... 46
A.2 Risk behavior of $R_n(p,T_2)$ ... 47
A.3 Risk behavior of $R_n(p,T_{B(2)})$ ... 48
A.4 Evaluation of the Bayes envelope R(p) ... 49
A.5 Monte Carlo simulation of $R_n(p,T_\Lambda)$, $\Lambda \in \mathcal{B}$ ... 50
A.6 A numerical computation program ... 53
A.7 Monte Carlo simulation of $R_n(p,T_a)$, a = 1, 2 ... 55

CHAPTER 1

FINITE STATE BAYES EMPIRICAL BAYES PROCEDURES

Section 1.1. The component and empirical Bayes decision problems.

Consider the following component statistical decision problem with which we shall be concerned. It comprises:

(i) A sample space $(X,\mathcal{X})$ and a parameter space $(\Omega,\mathcal{F})$, where $\mathcal{X}$, $\mathcal{F}$ are $\sigma$-algebras on $X$, $\Omega$ respectively. $\{P_\theta : \theta \in \Omega\}$ is a family of probability measures on $(X,\mathcal{X})$ dominated by some $\sigma$-finite measure $\mu$; $f_\theta$ is a density for $P_\theta$ with respect to $\mu$, $\theta \in \Omega$. $X$ denotes an $X$-valued random variable distributed $P_\theta$, conditional on $\theta$.

(ii) An action space $(A,\mathcal{A})$ where $\mathcal{A}$ is a $\sigma$-algebra on $A$ containing the singleton sets.

(iii) A loss function $L: \Omega \times A \to [0,\infty)$ representing the loss of taking action $a$ in $A$ when $\theta \in \Omega$ obtains. $L(\theta,\cdot)$ is measurable for each $\theta \in \Omega$.

(iv) The (behavioral) decision rules $t(\cdot,\cdot)$, each a function of the pair $(x,B)$ where $x \in X$ and $B \in \mathcal{A}$, having the measurability properties: (a) for each $x$, $t(x,\cdot)$ is a probability measure on $A$; (b) for each $B$, $t(\cdot,B)$ is $\mathcal{X}$-measurable.

A nonrandomized decision rule is one for which each $t(x,\cdot)$ is degenerate. The set of (behavioral) decision rules is denoted by $\Delta$. For any $t$, the expected loss when $\theta$ is the true parameter is
$$R(\theta,t) = \int\!\!\int L(\theta,a)\, t(x,da)\, P_\theta(dx).$$
Let $\mathcal{G}$ denote the class of all probability measures (priors) on $\Omega$ with respect to which the $t$-sections of $R(\theta,t)$ are measurable. The Bayes risk of $t$ versus $G$ is $R(G,t) = \int R(\theta,t)\, G(d\theta)$. $t_G$ is called a Bayes rule with respect to $G$ if its Bayes risk attains the infimum Bayes risk
$$R(G) = \inf_{t \in \Delta} R(G,t).$$
We will assume that $R(G)$ is attained for each $G \in \mathcal{G}$.

Throughout our discussions we will consider $\Omega = \{0,1,\dots,m\}$, $\mathcal{F} = 2^{\Omega}$, and assume that $P_G = \sum_{\theta=0}^m g_\theta P_\theta$ is identified by $G = (g_0,\dots,g_m)$. $\mathcal{G}$ is the $m$-dimensional simplex in $E^{m+1}$, the $(m+1)$-dimensional Euclidean space. We will call $R(\cdot,t)$ the risk function of $t$ and $R(\cdot)$ the Bayes envelope defined on $\mathcal{G}$.
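As a concrete instance (this is the special case derived in Section 1.4 and used throughout Chapter 2), consider the two state case $m = 1$ with classification loss. The posterior expected losses of the two actions given $X = x$ order as $g_1 f_1(x)$ versus $g_0 f_0(x)$, so a nonrandomized Bayes rule versus $G = (g_0, g_1)$ is
$$t_G(x) = \begin{cases} 1 & \text{if } g_1 f_1(x) \ge g_0 f_0(x)\\ 0 & \text{otherwise.}\end{cases}$$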
Consider the empirical Bayes decision problem. In it the component decision problem just described occurs repeatedly and independently. Thus, let
$$(\theta_1,X_1),\ (\theta_2,X_2),\dots,(\theta_n,X_n),\ (\theta_{n+1},X_{n+1}),\dots$$
be iid with $\theta_i$ having distribution $G$ and, conditional on $\theta_i$, $X_i$ having distribution $P_{\theta_i}$. The marginal distribution of $X_i$ is the mixture $P_G$. Based on the initial observations $\underline{x}_n = (x_1,\dots,x_n)$, a component decision rule $T_n(\underline{x}_n)$ is selected and evaluated at $x_{n+1}$ to reach a decision about $\theta_{n+1}$, $n \ge 1$. Thus, an empirical Bayes decision rule for reaching a decision about $\theta_{n+1}$ is $T_n(\underline{x}_n)(x_{n+1},\cdot)$, $n \ge 1$. The goal is to use the information about $G$ from the initial observations to construct a rule $T_n$ whose risk behavior is close to that of the Bayes rule $t_G(x_{n+1},\cdot)$. In general, more information about $G$ will be available with an increasing number of observations.

We will consider an empirical Bayes procedure as a sequence $T = (T_1,T_2,\dots)$ of empirical Bayes decision rules where, for each $n$, $T_n$ is a function on $X^{n+1} \times \mathcal{A}$ such that every $\underline{x}_n = (x_1,\dots,x_n)$-section is an element of $\Delta$, the class of component decision rules, and such that for each $\theta \in \Omega$, $R(\theta, T_n(\underline{x}_n))$ is a measurable function of $\underline{x}_n$. For each $n$, we let $\mathcal{T}_n$ denote the collection of all possible $T_n$ so defined. The use of $T_n$ against prior $G$ incurs the unconditional component Bayes risk
$$R_n(G,T_n) = \int R(G, T_n(\underline{x}_n))\, P_G^n(d\underline{x}_n),\qquad n \ge 1,$$
where here and throughout a symbol for a measure with a superscript indicates a product measure. Since $T_n(\underline{x}_n) \in \Delta$ for each $\underline{x}_n \in X^n$, we see that for all $n$, $R_n(G,T_n) \ge R(G)$, the minimum component Bayes risk. Observe that
$$R_n(G,T_n) = E\,R(G,T_n(\underline{X}_n)) = \sum_{\theta=0}^m g_\theta \int R(\theta, T_n(\underline{x}_n))\, P_G^n(d\underline{x}_n)$$
$$= \sum_{\theta=0}^m g_\theta \int R(\theta, T_n(\underline{x}_n)) \prod_{i=1}^n \Big\{\sum_{j=0}^m f_j(x_i)\, g_j\Big\}\, \mu^n(d\underline{x}_n)$$
$$= \sum_{\theta=0}^m g_\theta \sum_{\ell_0+\cdots+\ell_m = n} g_0^{\ell_0}\cdots g_m^{\ell_m}\; H_n(\theta,\ell_0,\dots,\ell_m) \tag{1.1}$$
where
$$H_n(\theta,\ell_0,\dots,\ell_m) = \sum_{B_0,\dots,B_m} \int R(\theta, T_n(\underline{x}_n)) \prod_{j=0}^m \prod_{i\in B_j} f_j(x_i)\; \mu^n(d\underline{x}_n),\qquad |B_j| = \ell_j,\ j = 0,\dots,m.$$
The summation above is over partitions $\{B_0,B_1,\dots,B_m\}$ of $\{1,2,\dots,n\}$, and the second summation in (1.1) is over all partitions $\ell_0,\ell_1,\dots,\ell_m$ of the integer $n$, i.e., integers $\ell_i \ge 0$ with $\sum_{i=0}^m \ell_i = n$. From (1.1) we see that the risk function $R_n(\cdot,T_n)$ is determined by the collection of coefficients
$$\Big\{H_n(\theta,\ell_0,\dots,\ell_m)\;\Big|\;\theta = 0,\dots,m;\ \sum_{i=0}^m \ell_i = n,\ \ell_i \ge 0,\ i = 0,\dots,m\Big\} \tag{1.2}$$
which, in turn, can be identified with an element of the space $E^N$ where, by Feller (1957, (II.5.2)), $N = (m+1)\binom{n+m}{m}$. This remark will prove useful in Section 1.4.
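To make the bookkeeping in (1.1)-(1.2) concrete, here is the two state instance ($m = 1$), writing $g_1 = p$ and $g_0 = 1-p$:
$$R_n(G,T_n) = \sum_{\theta=0}^{1} g_\theta \sum_{\ell=0}^{n} (1-p)^{n-\ell}\, p^{\ell}\; H_n(\theta,\, n-\ell,\, \ell),$$
a polynomial in $p$ determined by $N = 2(n+1)$ coefficients; this polynomial structure is exploited in Section 2.3.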
Definition 1.1. If $\lim_n R_n(G,T_n) = R(G)$ we say that $T$ is asymptotically optimal relative to $G$ (a.o. $[G]$). If $T$ is a.o. $[G]$ for all $G \in \mathcal{G}$, we say that $T$ is asymptotically optimal (a.o.).

Definition 1.2. For $T_n, T_n^* \in \mathcal{T}_n$, $T_n^*$ is as good as $T_n$ if $R_n(G,T_n^*) \le R_n(G,T_n)$ for all $G \in \mathcal{G}$. $T_n^*$ is better than $T_n$ if $R_n(G,T_n^*) \le R_n(G,T_n)$ for all $G \in \mathcal{G}$ and $R_n(G,T_n^*) < R_n(G,T_n)$ for at least one $G \in \mathcal{G}$. $T_n$ is equivalent to $T_n^*$ if $R_n(G,T_n) = R_n(G,T_n^*)$ for all $G \in \mathcal{G}$.

Definition 1.3. $T_n$ is said to be admissible if there does not exist an empirical Bayes decision rule in $\mathcal{T}_n$ that is better than $T_n$. $T$ is called an admissible empirical Bayes procedure if $T_n$ is admissible for every $n \ge 1$.

Listed below are some desirable properties of an empirical Bayes decision procedure $T = (T_1,T_2,\dots)$.

(i) $T$ is a.o.
(ii) $R_n(G,T_n)$ converges to $R(G)$ rapidly for all $G \in \mathcal{G}$.
(iii) $T$ is admissible.
(iv) $T_n$ has good risk behavior for small to moderate values of $n$; that is, $T_n$ is suitable for use even when large numbers of observations are not available.
(v) An algorithm for computing the decision rules is available and can be executed economically.
(vi) $T$ can be adjusted systematically to improve its performance on many specified subsets of $\mathcal{G}$.

We will judge the performance of an empirical Bayes procedure on the basis of properties (i)-(vi) mentioned above.

Section 1.2. Bayes empirical Bayes

Let $\underline{\mathcal{G}}$ be the Borel $\sigma$-algebra of subsets of $\mathcal{G}$. The Bayes approach to the empirical Bayes decision problem considers possible priors on $(\mathcal{G},\underline{\mathcal{G}})$. First we give the following definitions.

Definition 1.4. An empirical Bayes rule $T_n \in \mathcal{T}_n$ is Bayes with respect to a prior $\Lambda$ on $\mathcal{G}$ if it is an infimizer (across $\mathcal{T}_n$) of
$$R_n(\Lambda,T_n) = \int R_n(G,T_n)\,\Lambda(dG).$$

Definition 1.5. $T$ is said to be a Bayes empirical Bayes procedure if $T_n$ is Bayes for every $n \ge 1$. $T$ is said to be a Bayes procedure with respect to a prior $\Lambda$ if $T_n$ is Bayes with respect to $\Lambda$, $n \ge 1$.

To construct a Bayes empirical Bayes rule at stage $n$, it is convenient to introduce the component risk set
$$S = \{\underline{s} = (s_0,\dots,s_m)\;|\; \text{for some } t \in \Delta,\ s_i = R(i,t),\ i = 0,\dots,m\}.$$
$S$ is a convex subset of $E^{m+1}$ which we will assume is compact throughout this thesis. We will use the following theorem (LeCam (1956, Theorem 3.3.2)).

Theorem 1.1. Let $(X,\mathcal{X})$ be a measurable space and let $\Theta$ be a compact metric space. Let $f(x,\theta)$ be a function from $X \times \Theta$ to the real line. Assume that $f$ is measurable in $x$ for each $\theta$ and continuous in $\theta$ for each $x$. Then it is possible to find a function $\hat\theta(x)$ which is measurable in $x$ and such that $f(x,\hat\theta(x)) = \inf_{t\in\Theta} f(x,t)$. □

For a given $\Lambda$ and $T_n \in \mathcal{T}_n$,
$$R_n(\Lambda,T_n) = E_{(\Lambda)} \sum_{\theta=0}^m R(\theta, T_n(\underline{X}_n))\; E_\Lambda(g_\theta\,|\,\underline{X}_n).$$
Here $E_\Lambda(G|\underline{X}_n)$ is the $\mathcal{G}$-valued conditional expectation corresponding to the conditional distribution of $G$ given $\underline{X}_n$, and $E_{(\Lambda)}$ corresponds to the mixture $P_{(\Lambda)}^n(\cdot) = \int P_G^n(\cdot)\,\Lambda(dG)$. Since $(R(0,T_n(\underline{x}_n)),\dots,R(m,T_n(\underline{x}_n))) \in S$ for all $\underline{x}_n$ and $T_n$, to minimize $R_n(\Lambda,T_n)$ we seek a measurable function $\delta: X^n \to S$ such that
$$\sum_{\theta=0}^m \delta_\theta(\underline{X}_n)\, E_\Lambda(g_\theta|\underline{X}_n) = \inf_{\underline{s}\in S}\ \sum_{\theta=0}^m s_\theta\, E_\Lambda(g_\theta|\underline{X}_n)$$
where $\delta(\underline{X}_n) = (\delta_0(\underline{X}_n),\dots,\delta_m(\underline{X}_n))$. By Theorem 1.1, such a $\delta$ exists. Suppose $\delta$ is a measurable version, and for each $\underline{x}_n$, $T_{n,\Lambda}(\underline{x}_n) \in \Delta$ is such that
$$\delta_\theta(\underline{x}_n) = R(\theta,\, T_{n,\Lambda}(\underline{x}_n)),\qquad \theta = 0,\dots,m. \tag{1.3}$$
Then $T_{n,\Lambda} \in \mathcal{T}_n$ and $R_n(\Lambda, T_{n,\Lambda}) = \inf_{\mathcal{T}_n} R_n(\Lambda,T_n)$. Also note that the Bayes empirical Bayes rule $T_{n,\Lambda}$ is pointwise component Bayes with respect to $E_\Lambda(G|\underline{x}_n) = (E_\Lambda(g_0|\underline{x}_n),\dots,E_\Lambda(g_m|\underline{x}_n))$. In what follows we sometimes use the notation $G_\Lambda(\underline{x}_n)$ instead of $E_\Lambda(G|\underline{x}_n)$.

For a given prior $\Lambda$, let $T_{n,\Lambda} \in \mathcal{T}_n$ denote a Bayes empirical Bayes rule with respect to $\Lambda$ which has the above form. We first discuss conditions that assure that $T_{n,\Lambda}$ is a.o. Oaten (1972, (1.6)) shows that
$$0 \le R(G,t_F) - R(G) \le M\,\|G - F\| \tag{1.4}$$
for all $F,G \in \mathcal{G}$. Here $M$ is a bound on the component risk and $\|\cdot\|$ is the $\ell_1$ (total variation) norm on $E^{m+1}$. Under the assumption that $\Lambda$ has support all of $\mathcal{G}$, Gilliland and Boyer (1979) prove that
$$\lim_n \|G_\Lambda(\underline{X}_n) - G\| = 0\quad \text{a.s. } P_G^\infty \text{ for all } G \in \mathcal{G}, \tag{1.5}$$
where $P_G^\infty$ is the probability distribution of $X_1,X_2,\dots$. Thus, in the bounded risk case, (1.4) with $F = G_\Lambda(\underline{x}_n)$ and (1.5) establish that $T_{n,\Lambda}$ is a.o., that is,
$$\lim_n R_n(G, T_{n,\Lambda}) = R(G)\quad \text{for all } G \in \mathcal{G}.$$
The above shows how the question of asymptotic optimality in the finite $\Omega$ Bayes empirical Bayes problem can often be reduced to a question of the consistency of the estimator $G_\Lambda(\underline{X}_n)$ for $G$.

To obtain the form of $G_\Lambda(\underline{x}_n)$, note that the conditional density of $\underline{X}_n$ given $G$ is
$$f^n(\underline{x}_n|G) = \prod_{i=1}^n \sum_{j=0}^m f_j(x_i)\, g_j.$$
Hence the conditional distribution of $G$ given $\underline{X}_n$ has density
$$f(G|\underline{x}_n) = f^n(\underline{x}_n|G)\Big/\int f^n(\underline{x}_n|G)\,\Lambda(dG)$$
with respect to $\Lambda$, and
$$G_\Lambda(\underline{x}_n) = \int G\, f(G|\underline{x}_n)\,\Lambda(dG). \tag{1.6}$$
We denote the components of $G_\Lambda(\underline{x}_n)$ by $g^\theta_\Lambda(\underline{x}_n)$, $\theta = 0,1,\dots,m$.
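For small n, (1.6) can be evaluated by direct numerical integration. The following is a minimal sketch for the two state case m = 1, where G is identified with $p = g_1$; it is written in Python/SciPy rather than the dissertation's FORTRAN (cf. Table A.6), the function names are ours, and the N(-1,1)/N(1,1) densities and uniform Beta prior are merely the illustration of Section 2.3.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import beta, norm

    def posterior_mean_quad(x, f0, f1, prior_pdf):
        # E_Lambda[p | x_1,...,x_n] for m = 1, by direct numerical
        # integration of (1.6); the mixture likelihood of the sample is
        # prod_i [ p*f1(x_i) + (1-p)*f0(x_i) ]
        def weighted_lik(p):
            return np.prod(p * f1(x) + (1.0 - p) * f0(x)) * prior_pdf(p)
        num, _ = quad(lambda p: p * weighted_lik(p), 0.0, 1.0)
        den, _ = quad(weighted_lik, 0.0, 1.0)
        return num / den

    # illustration: N(-1,1) vs N(1,1) components, uniform prior B(1)
    f0 = lambda t: norm.pdf(t, loc=-1.0)
    f1 = lambda t: norm.pdf(t, loc=1.0)
    print(posterior_mean_quad(np.array([0.3, -1.2, 0.8]), f0, f1, beta(1, 1).pdf))

Direct quadrature of this kind becomes expensive and inaccurate as n grows, which is what motivates the coefficient algorithm developed next.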
We now develop an algorithm for computing $G_\Lambda(\underline{x}_n)$. Here we represent $\mathcal{G}$ by the $m$-dimensional simplex in $E^m$,
$$S_m = \Big\{\underline{s}_m = (s_1,\dots,s_m)\;\Big|\; s_i \ge 0,\ i = 1,\dots,m,\ \sum_{j=1}^m s_j \le 1\Big\},$$
and for $\underline{s}_m \in S_m$ let $s_0 = 1 - \sum_{j=1}^m s_j$. By (1.6),
$$g^\theta_\Lambda(\underline{x}_n) = \frac{\displaystyle\int_{S_m} s_\theta \prod_{i=1}^n \Big\{\sum_{j=0}^m s_j f_j(x_i)\Big\}\,\Lambda(d\underline{s}_m)}{\displaystyle\int_{S_m} \prod_{i=1}^n \Big\{\sum_{j=0}^m s_j f_j(x_i)\Big\}\,\Lambda(d\underline{s}_m)} \tag{1.7}$$
$$= \frac{\displaystyle\sum_{\ell_0+\ell_1+\cdots+\ell_m = n} S_{\ell_0,\dots,\ell_m}(\underline{x}_n)\;\mu^\theta_{\ell_0,\dots,\ell_m}}{\displaystyle\sum_{\ell_0+\ell_1+\cdots+\ell_m = n} S_{\ell_0,\dots,\ell_m}(\underline{x}_n)\;\mu_{\ell_0,\dots,\ell_m}},\qquad \theta = 1,\dots,m, \tag{1.8}$$
where for each nonnegative integer partition $\ell_0,\ell_1,\dots,\ell_m$ of $n$,
$$S_{\ell_0,\dots,\ell_m}(\underline{x}_n) = \sum_{\substack{B_0,\dots,B_m\\ |B_0|=\ell_0,\dots,|B_m|=\ell_m}} \Big\{\prod_{j=1}^m \prod_{i\in B_j} [f_j(x_i) - f_0(x_i)]\Big\}\ \prod_{i\in B_0} f_0(x_i). \tag{1.9}$$
Here $B_0\cup\cdots\cup B_m = \{1,\dots,n\}$, $B_i\cap B_j = \emptyset$ for $i \ne j$,
$$\mu_{\ell_0,\dots,\ell_m} = \int_{S_m} s_1^{\ell_1}\cdots s_m^{\ell_m}\,\Lambda(d\underline{s}_m)$$
and
$$\mu^\theta_{\ell_0,\dots,\ell_m} = \int_{S_m} s_1^{\ell_1}\cdots s_m^{\ell_m}\, s_\theta\,\Lambda(d\underline{s}_m),\qquad \theta = 1,\dots,m.$$
The following theorem leads to a convenient way of computing (1.9).

Theorem 1.2. For each $n \ge 1$ and set of real numbers $\{a_{ij}\,|\,i = 1,\dots,n;\ j = 0,\dots,m\}$ define the function $Q_n$ on $S_m$ by
$$Q_n(s_1,\dots,s_m) = \prod_{i=1}^n (a_{i0} + a_{i1}s_1 + \cdots + a_{im}s_m).$$
For each nonnegative integer partition $\ell_0,\ell_1,\dots,\ell_m$ of $n$, let $C^n_{\ell_0,\dots,\ell_m}$ denote the coefficient of the term $s_1^{\ell_1}s_2^{\ell_2}\cdots s_m^{\ell_m}$ in the polynomial expansion of $Q_n$. Then
$$C^n_{\ell_0,\dots,\ell_m} = \sum_{j=0}^m a_{nj}\; C^{n-1}_{\ell_0,\dots,\ell_j-1,\dots,\ell_m} \tag{1.10}$$
with the convention $C^{n-1}_{k_0,\dots,k_m} = 0$ if some $k_j = -1$.

Proof. The proof follows from the uniqueness of the coefficients $C^n_{\ell_0,\dots,\ell_m}$ in the polynomial $Q_n$. □

To find all coefficients of $Q_n(s_1,\dots,s_m)$, $n \ge 2$, we go through equation (1.10)
$$\sum_{k=2}^n \#\Big\{(\ell_0,\dots,\ell_m)\;\Big|\;\sum_{i=0}^m \ell_i = k,\ \ell_i \ge 0\Big\} = \binom{n+m+1}{m+1} - (m+2) \sim \frac{n^{m+1}}{(m+1)!}$$
times (see Feller (1957, (II.5.2) and (II.12.8))), where the sign $\sim$ indicates that the ratio of the two sides tends to unity as $n \to \infty$. The limiting form is obtained by applying Stirling's formula (Feller (1957)) and l'Hôpital's rule.

To apply Theorem 1.2 in computing (1.9), for each $\underline{x}_n \in X^n$ we let $a_{ij} = f_j(x_i) - f_0(x_i)$ and $a_{i0} = f_0(x_i)$, $i = 1,\dots,n$, $j = 1,\dots,m$. Then for each nonnegative integer partition $\ell_0,\dots,\ell_m$ of $n$ we have $S_{\ell_0,\dots,\ell_m}(\underline{x}_n) = C^n_{\ell_0,\dots,\ell_m}$. Hence, by (1.8),
$$g^\theta_\Lambda(\underline{x}_n) = \frac{\displaystyle\sum_{\ell_0+\cdots+\ell_m = n} C^n_{\ell_0,\dots,\ell_m}\;\mu^\theta_{\ell_0,\dots,\ell_m}}{\displaystyle\sum_{\ell_0+\cdots+\ell_m = n} C^n_{\ell_0,\dots,\ell_m}\;\mu_{\ell_0,\dots,\ell_m}},\qquad \theta = 1,\dots,m. \tag{1.11}$$
Note that $g^\theta_\Lambda(\underline{x}_n)$ depends on $\Lambda$ only through a finite number of general moments of $\Lambda$. The computation of (1.11) is, in most cases, more efficient and accurate than a direct numerical integration in (1.7) or a direct evaluation of (1.8). Even in the case m = 1, a direct evaluation of $S_{\ell_0,\ell_1}(\underline{x}_n)$ through (1.9) is not feasible; in most cases, however, the application of (1.11) results in an efficient and accurate evaluation. Chapter 2 provides a detailed example. Of course, computation with (1.11) is simplified when the general moments of $\Lambda$ can be evaluated easily. Here we consider one such example.

EXAMPLE. (Bayes empirical Bayes with Dirichlet priors.) Let $D(\alpha_1,\dots,\alpha_m,\alpha_0)$ denote the $m$-variate Dirichlet distribution on the simplex $S_m$ which has probability density function
$$f(\underline{s}_m) = \frac{\Gamma(\alpha_0+\cdots+\alpha_m)}{\Gamma(\alpha_0)\cdots\Gamma(\alpha_m)}\ s_1^{\alpha_1-1}\cdots s_m^{\alpha_m-1}\,(1-s_1-\cdots-s_m)^{\alpha_0-1},\qquad \underline{s}_m \in S_m,$$
where the $\alpha_i$ are all real and positive. If we let $\Lambda = D(\alpha_1,\dots,\alpha_m,\alpha_0)$, then it can be verified (Wilks (1962), (7.7.6)) that the general moment $\mu_{\ell_0,\dots,\ell_m}$ of the $m$-variate Dirichlet prior $\Lambda$ has the value
$$\mu_{\ell_0,\dots,\ell_m} = \frac{\Gamma(\alpha_1+\ell_1)\cdots\Gamma(\alpha_m+\ell_m)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_m)}\ \cdot\ \frac{\Gamma(\alpha_0+\cdots+\alpha_m)}{\Gamma(\alpha_0+\cdots+\alpha_m+\ell_1+\cdots+\ell_m)}.$$
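For the two state case (m = 1) treated in Chapter 2, the recursion (1.10) is simply repeated polynomial multiplication, and with a symmetric Beta prior the moments in (1.11) have a closed product form; this is exactly what the subroutines CDEFICT and BETA of Table A.5 compute. The sketch below combines the two steps in Python rather than the original FORTRAN; the function names are ours, and the N(-1,1)/N(1,1) densities are the example of Section 2.3.

    import numpy as np
    from scipy.stats import norm

    def poly_coeffs(a0, a1):
        # coefficients C^n_l, l = 0..n, of prod_i (a0[i] + a1[i]*s),
        # built up by the recursion (1.10) specialized to m = 1
        c = np.array([1.0])
        for b0, b1 in zip(a0, a1):
            c = b0 * np.append(c, 0.0) + b1 * np.append(0.0, c)
        return c

    def beta_moments(gamma, kmax):
        # m_k = E p^k, k = 0..kmax, for the symmetric Beta(gamma, gamma)
        # prior: m_k = prod_{i<k} (gamma + i) / (2*gamma + i)
        m = [1.0]
        for k in range(kmax):
            m.append(m[-1] * (gamma + k) / (2.0 * gamma + k))
        return np.array(m)

    def posterior_mean(x, gamma):
        # (1.11) for m = 1 with f0 = N(-1,1), f1 = N(1,1)
        f0, f1 = norm.pdf(x, loc=-1.0), norm.pdf(x, loc=1.0)
        c = poly_coeffs(f0, f1 - f0)          # a_i0 = f0(x_i), a_i1 = f1 - f0
        m = beta_moments(gamma, len(x) + 1)   # moments m_0, ..., m_{n+1}
        return np.dot(c, m[1:]) / np.dot(c, m[:-1])

Only the n + 2 prior moments and the n + 1 coefficients are needed, so the cost is quadratic in n rather than the exponential cost of a direct evaluation of (1.9).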
Section 1.3. A complete class theorem

Gilliland and Boyer (1979) have suggested that, for each $n$, the study of empirical Bayes rules in $\mathcal{T}_n$ can be viewed as a study of the class of nonrandomized decision rules in a decision problem $(\mathcal{G},D,R_n)$, so that the class $\mathcal{B}_n$ of Bayes empirical Bayes rules is the class of Bayes rules in $(\mathcal{G},D,R_n)$. In this section we will prove that, in a large number of empirical Bayes problems, $\mathcal{B}_n$ is a complete class for $(\mathcal{G},D,R_n)$. The results apply at each stage $n \ge 1$.

Definition 1.6. A class $C$ of decision rules, $C \subset D$, is said to be complete if, given any rule $t$ in $D$ not in $C$, there exists a rule $t^*$ in $C$ that is better than $t$. A class $C$ of decision rules is said to be essentially complete if, given any rule $t$ not in $C$, there exists a rule $t^*$ in $C$ that is as good as $t$.

Consider the decision problem $(\mathcal{G},D,R_n)$ with sample space $(X^n,\mathcal{X}^n)$, parameter space $(\mathcal{G},\underline{\mathcal{G}})$, $\{P_G^n : G \in \mathcal{G}\}$ a family of probability measures on $(X^n,\mathcal{X}^n)$ dominated by $\mu^n$, $\underline{X}_n$ distributed $P_G^n$ conditional on $G$, action space $(S,\mathcal{S})$ where $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$, and loss $R: \mathcal{G}\times S \to [0,\infty)$ with $R(G,\underline{s}) = \sum_\theta g_\theta s_\theta$. The class of nonrandomized rules $D$ is represented by the class of measurable transformations from $(X^n,\mathcal{X}^n)$ to $(S,\mathcal{S})$. Using a rule $d = (d_0,d_1,\dots,d_m) \in D$, the expected loss when $G$ is the true parameter is
$$R_n(G,d) = \int R(G, d(\underline{x}_n))\, P_G^n(d\underline{x}_n) = \sum_{\theta=0}^m g_\theta \int d_\theta(\underline{x}_n)\, P_G^n(d\underline{x}_n). \tag{1.12}$$
Note here (1.12) and the fact $d(\underline{x}_n) \in S$ imply that each $T_n \in \mathcal{T}_n$ determines a $d \in D$ such that $T_n$ and $d$ have the same risk function; conversely, for each $d \in D$ there exists a $T_n \in \mathcal{T}_n$ with the same risk function.

Let $\Lambda$ be a prior on $\underline{\mathcal{G}}$; the Bayes risk of $d \in D$ is $R_n(\Lambda,d) = \int R_n(G,d)\,\Lambda(dG)$. A Bayes rule with respect to $\Lambda$ is a rule $d_\Lambda \in D$ such that $R_n(\Lambda,d_\Lambda) = \inf_{d\in D} R_n(\Lambda,d)$.

Our discussion is restricted to nonrandomized rules because for $t \in \mathcal{D}$, $\mathcal{D}$ denoting the class of behavioral rules, the risk function of $t$ is matched by that of a corresponding $d \in D$.

[Gap in the source: the remainder of this reduction, Lemma 1.1, and the statement and first part of the proof of Theorem 1.3, which establishes the compactness of the set $Q \subset E^N$ of risk coefficient points, are not legible. The legible tail of the proof of Theorem 1.3 follows; here $\bar S$ denotes a countable collection of half-spaces $H = \{\underline{s}: b\cdot\underline{s} \le c\}$ containing $S$.]

... where the last inequality follows from the fact that $b\cdot d_k(\underline{x}_n) \le c$ for all $i$. Therefore we have $P_G^n[b\cdot d_0 > c] = 0$, i.e., $P_G^n[b\cdot d_0 \le c] = 1$. This proves the claim. From the fact that $\bar S$ is countable and the above claim, we obtain
$$1 = P_G^n\Big\{\bigcap_{H\in\bar S} [d_0 \in H]\Big\} = P_G^n\big[d_0 \in \cap\bar S\big]\quad \text{for all } G \in \mathcal{G}.$$
But Lemma 1.1 shows that $S = \cap\bar S$. Therefore $P_G^n[d_0 \in S] = 1$ for all $G \in \mathcal{G}$. This completes our proof. □

Corollary 1.1. There exists a topology on $D$ such that (a) $D$ is compact and (b) $R_n(G,d)$ is continuous in $d \in D$ for all $G \in \mathcal{G}$.

Proof: Define $\varphi$ to be the function on $D$ taking each $d$ to its point $\underline{q}$ of risk coefficients (1.2). Then the collection of sets $F = \{\varphi^{-1}(A)\,|\,A \text{ open in } Q\}$ is a topology on $D$ such that $\varphi: D \to Q$ is continuous. Since $\varphi$ is onto, if $\mathcal{Q}$ is a covering of $Q = \varphi(D)$ then $\{\varphi^{-1}(A)\,|\,A \in \mathcal{Q}\}$ is a covering of $D$. Hence the compactness of $Q$ from Theorem 1.3 implies that $(D,F)$ is compact. From the polynomial form of $R_n(G,d)$, $R_n(G,d)$ is a linear combination of $\Pi_i\circ\varphi(d)$, $i = 1,\dots,N$, where $\Pi_i: E^N \to E^1$, $i = 1,\dots,N$, is the projection map. Therefore the continuity of $\Pi_i\circ\varphi$, $i = 1,\dots,N$, implies that $R_n(G,d)$ is continuous in $d \in D$ for all $G \in \mathcal{G}$. □

Definition 1.7. A rule $d \in D$ is extended Bayes if for every $\varepsilon > 0$ there is a prior distribution $\Lambda$ such that
$$R_n(\Lambda,d) \le \inf_{d'\in D} R_n(\Lambda,d') + \varepsilon.$$

The following theorem follows immediately from Corollary 1.1 and Theorem 2.10.3 of Ferguson (1967).

Theorem 1.4. The class of extended Bayes rules in $D$ is essentially complete.

Theorem 1.5. Any extended Bayes rule in $D$ is a Bayes rule.

Proof: For $d \in D$, $R_n(\cdot,d)$ is continuous in $G$. Let $d \in D$ be an extended Bayes procedure. Then for each positive integer $N$ there exists a prior distribution $\Lambda_N$ such that
$$\int R_n(G,d_{\Lambda_N})\,\Lambda_N(dG) \le \int R_n(G,d)\,\Lambda_N(dG) \le \int R_n(G,d_{\Lambda_N})\,\Lambda_N(dG) + 1/N. \tag{1.14}$$
Since $\mathcal{G}$, a closed subset of $[0,1]^{m+1}$, is compact, the class $\{\Lambda_N\}_{N=1}^\infty$ is tight.
By the Prohorov theorem (Billingsley (1968)), $\{\Lambda_N\}_{N=1}^\infty$ is relatively compact, which means that there exist a prior $\Lambda$ and a subsequence $\{\Lambda_{N'}\}_{N'=1}^\infty \subset \{\Lambda_N\}_{N=1}^\infty$ such that $\Lambda_{N'}$ converges weakly to $\Lambda$ as $N' \to \infty$. Consequently,
$$\int R_n(G,d_\Lambda)\,\Lambda(dG) \le \int R_n(G,d)\,\Lambda(dG) = \lim_{N'} \int R_n(G,d)\,\Lambda_{N'}(dG)$$
$$\le \varliminf_{N'} \Big[\int R_n(G,d_{\Lambda_{N'}})\,\Lambda_{N'}(dG) + 1/N'\Big]\quad \text{by (1.14)}$$
$$\le \varlimsup_{N'} \int R_n(G,d_\Lambda)\,\Lambda_{N'}(dG) = \int R_n(G,d_\Lambda)\,\Lambda(dG).$$
The above shows that $d$ is Bayes with respect to $\Lambda$. □

Our complete class theorem follows directly from Theorem 1.4 and Theorem 1.5.

Theorem 1.6. The class of Bayes empirical Bayes rules is complete.

Proof: From Theorems 1.4 and 1.5 we know that the class of extended Bayes rules is equal to $\mathcal{B}_n$ and is essentially complete. Therefore, for $d \notin \mathcal{B}_n$, there exists a Bayes rule $d_\Lambda$ such that $R_n(G,d_\Lambda) \le R_n(G,d)$ for all $G \in \mathcal{G}$. If equality held for all $G$ in $\mathcal{G}$ then $d$ would be Bayes with respect to $\Lambda$, a contradiction; so $d_\Lambda$ is better than $d$. This implies that $\mathcal{B}_n$ is complete. □

Definition 1.8. A class $C$ of decision rules is said to be minimal complete if $C$ is complete and if no proper subclass of $C$ is complete.

It is also of interest to know when the class of Bayes empirical Bayes rules constitutes a minimal complete class. The minimal complete class, when it exists, is exactly the class of admissible rules. Since $\mathcal{B}_n$ has been proved to be a complete class, any admissible rule will be in $\mathcal{B}_n$. It is then sufficient to find conditions under which the Bayes empirical Bayes rules are admissible. The following remark is needed in the proof of Theorem 1.7.

Remark 1.1. If the members of $\{P_\theta : \theta \in \Omega\}$ are mutually absolutely continuous then so are the $\{P_G : G \in \mathcal{G}\}$, which implies that the product measures $\{P_G^n : G \in \mathcal{G}\}$ are mutually absolutely continuous and equivalent to any mixture $P_{(\Lambda)}^n$.

Theorem 1.7. Suppose that $\{P_\theta : \theta \in \Omega\}$ are mutually absolutely continuous and that the Bayes component decision rules are unique up to risk equivalence. Then the class of Bayes empirical Bayes rules is minimal complete.

Proof: Since the class of Bayes empirical Bayes rules is complete, if we show that the Bayes empirical Bayes rules are admissible, then $\mathcal{B}_n$ is minimal complete. For a given $\Lambda$, let $T_{n,\Lambda}$ be the Bayes empirical Bayes rule with respect to $\Lambda$ as defined in (1.3). Then for $T_n \in \mathcal{T}_n$,
$$R(G_\Lambda(\underline{x}_n),\, T_{n,\Lambda}(\underline{x}_n)) \le R(G_\Lambda(\underline{x}_n),\, T_n(\underline{x}_n)). \tag{1.15}$$
Suppose $T_n \in \mathcal{T}_n$ is Bayes with respect to $\Lambda$. Then
$$R_n(\Lambda,T_n) = R_n(\Lambda,T_{n,\Lambda}) \tag{1.16}$$
and (1.15) and (1.16) imply
$$R(G_\Lambda(\underline{x}_n),\, T_n(\underline{x}_n)) = R(G_\Lambda(\underline{x}_n),\, T_{n,\Lambda}(\underline{x}_n))\quad \text{a.s. } P_{(\Lambda)}^n. \tag{1.17}$$
By our hypothesis, the Bayes component rules are unique up to risk equivalence, which means that if $t_1,t_2 \in \Delta$ are Bayes with respect to $G$, then $R(\theta,t_1) = R(\theta,t_2)$, $\theta = 0,\dots,m$. This and (1.17) imply that
$$R(\theta,T_n(\underline{x}_n)) = R(\theta,T_{n,\Lambda}(\underline{x}_n)),\qquad \theta = 0,\dots,m,\quad \text{a.s. } P_{(\Lambda)}^n.$$
By Remark 1.1, the above equalities hold a.s. $P_G^n$ for all $G \in \mathcal{G}$, so that $R_n(G,T_n) = R_n(G,T_{n,\Lambda})$ for all $G \in \mathcal{G}$; i.e., $T_n$ is equivalent to $T_{n,\Lambda}$. Thus, the Bayes rule with respect to $\Lambda$ is unique up to risk equivalence. It is well known that if a Bayes rule is unique up to risk equivalence then it is admissible. □

Empirical Bayes classification between N(-1,1) and N(1,1) is a decision problem satisfying the hypothesis of Theorem 1.7. This example is the subject of computation and study in Section 2.3. Boyer and Gilliland (1980, Theorem 4) point out how the continuity of the risk functions $R_n(G,T_n)$ in $G$ ensures that $T_{n,\Lambda}$ is admissible if $\Lambda$ has support all of $\mathcal{G}$.
Section 1.4. The classification problem

In this section we will derive the form of the Bayes empirical Bayes rules for classification problems. A classification problem will provide an example for the application of the algorithm developed in (1.11) for computing Bayes empirical Bayes rules. In a classification problem, an observation is to be classified as coming from one of m + 1 distributions. Specifically, we let $A = \{0,1,\dots,m\} = \Omega$ and the loss be $\alpha$ if an incorrect classification is made and $\beta$ if a correct classification is made, $\alpha > \beta \ge 0$. Recall, $G = (g_0,\dots,g_m)$ represents a probability measure on $\Omega$. Conditional on $X = x$, the distribution of $\theta$ has density
$$f(\theta|x) = f_\theta(x)\,g_\theta\Big/\sum_{j=0}^m g_j f_j(x),\qquad \theta = 0,\dots,m.$$
For each $a \in \{0,\dots,m\}$ and $x \in X$,
$$E(L(\theta,a)|x) = \sum_{\theta=0}^m L(\theta,a)\, f(\theta|x) = \alpha - (\alpha-\beta)\,f(a|x) \ge \alpha - (\alpha-\beta)\max_{i\in\Omega} f(i|x).$$
Define
$$d_G(x) = \max\{\theta\,|\,f(\theta|x) = \max_{i\in\Omega} f(i|x),\ \theta\in\Omega\} = \max\{\theta\,|\,f_\theta(x)g_\theta = \max_{i\in\Omega} f_i(x)g_i,\ \theta\in\Omega\}. \tag{1.18}$$
Then $d_G$ is a nonrandomized component decision rule which is Bayes with respect to $G$.

From the discussion in the last section we know that $T_{n,\Lambda}(\underline{x}_n)$ chooses a Bayes component rule with respect to $G_\Lambda(\underline{x}_n)$. Therefore, to implement the Bayes empirical Bayes rule with respect to $\Lambda$, first evaluate $G_\Lambda(\underline{x}_n)$ and then replace $f_\theta(x)g_\theta$ in (1.18) by $f_\theta(x_{n+1})\, g^\theta_\Lambda(\underline{x}_n)$.

CHAPTER 2

TWO STATE BAYES EMPIRICAL BAYES PROCEDURES

[Gap in the source: the opening of Chapter 2, Sections 2.1 (testing simple hypothesis against simple alternative, including the component Bayes rule (2.1) and the computing formulas (2.4)-(2.6) for the posterior mean $p_\Lambda(\underline{x}_n)$ and its risk) and 2.2, and the statement of Theorem 2.1 are not legible. The surviving statement of Theorem 2.2 follows.]

Theorem 2.2. Suppose $I(p_0) > 0$ and $\Lambda(\cdot)$ has three continuous derivatives in a neighborhood of the true parameter $p_0 \in (0,1)$. If $P_0$, $P_1$ are mutually absolutely continuous and if $\int|\log f_i(x)|\,P_j(dx) < \infty$ for $i,j \in \{0,1\}$, then
$$\sqrt{n}\,\big(p_\Lambda(\underline{X}_n) - \hat p(\underline{X}_n)\big) \to 0\quad \text{a.s. } P_{p_0}^\infty.$$

Proof: (In Appendix B.) □

As a consequence of Theorem 2.1 and Theorem 2.2, under the hypothesis of Theorem 2.2, $p_\Lambda(\underline{X}_n) \to p_0$ a.s. $P_{p_0}^\infty$ and $\sqrt{n}\,(p_\Lambda(\underline{X}_n) - p_0) \to N(0, I(p_0)^{-1})$ in distribution.

Section 2.3. Optimal properties and risk performance of Bayes empirical Bayes procedures for classification between N(-1,1) and N(1,1).

To illustrate the risk performance of Bayes empirical Bayes procedures we will study the following example.

EXAMPLE: Testing N(-1,1) against N(1,1). In this example we have $X = E^1$, $f_0(x) = (2\pi)^{-1/2}\exp\{-(x+1)^2/2\}$ and $f_1(x) = (2\pi)^{-1/2}\exp\{-(x-1)^2/2\}$. By (2.1) a nonrandomized version of a component Bayes rule is
$$t_p(x) = \begin{cases} 1 & \text{if } x \ge C_p\\ 0 & \text{if } x < C_p \end{cases} \tag{2.7}$$
where $C_p = \frac{1}{2}\ln\big(\frac{1-p}{p}\big)$. The Bayes empirical Bayes rule $T_\Lambda(\underline{x}_n)$ simply replaces $p$ in (2.7) with $p_\Lambda(\underline{x}_n)$. By (2.4) and (2.7) an algorithm for computing the Bayes empirical Bayes rule is already available.
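A minimal sketch of this plug-in rule, assuming the posterior_mean helper from the Section 1.2 sketch (the function names are ours, not the dissertation's):

    import numpy as np

    def beb_classify(x_past, x_new, gamma=1.0):
        # Bayes empirical Bayes classification between N(-1,1) and N(1,1):
        # replace p in (2.7) by the posterior mean p_Lambda(x_1,...,x_n)
        p = posterior_mean(np.asarray(x_past, dtype=float), gamma)
        c = 0.5 * np.log((1.0 - p) / p)   # the cutoff C_p of (2.7)
        return 1 if x_new >= c else 0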
If $\Lambda$ is chosen as the probability measure placing mass 1 at $p$, then $T_\Lambda \equiv t_p$ is the Bayes empirical Bayes procedure with respect to $\Lambda$. In particular, with $\Lambda(\{\frac12\}) = 1$, $T_\Lambda \equiv t_{1/2}$ is the minimax procedure with constant risk $R_n(p,T_\Lambda) = P_0(X \ge 0) = 0.1587$ for all $p \in [0,1]$ and $n \ge 1$. Observe that the risk set of $t_p$, $p \in [0,1]$, is
$$\{(s_0,s_1)\;|\;s_0 = P_0[X \ge a],\ s_1 = P_1[X < a]\ \text{for some } a \in [-\infty,\infty]\};$$
this together with the form of (2.7) implies that the component Bayes rules are unique up to risk equivalence. By Theorem 1.7 we see that at each stage $n$, the class of Bayes empirical Bayes rules is minimal complete in this example.

In our applications we will deal with those $\Lambda$ that belong to a given parametric family $\mathcal{B} = \{B(\gamma)\,|\,\gamma > 0\}$, where $B(\gamma)$ denotes the symmetric beta distribution on (0,1) with density
$$g_{B(\gamma)}(p) = \frac{\Gamma(2\gamma)}{[\Gamma(\gamma)]^2}\, p^{\gamma-1}(1-p)^{\gamma-1}\qquad \text{for } 0 < p < 1.$$
From previous discussions we note that $\{T_\Lambda : \Lambda \in \mathcal{B}\}$ are asymptotically optimal procedures and are admissible at each stage $n$. Also note that the assumptions of Theorem 2.2 are satisfied, so that $p_\Lambda(\underline{X}_n)$ is asymptotically normally distributed. The variance of the limiting distribution of $\sqrt{n}\,(p_\Lambda(\underline{X}_n) - p_0)$ is $I(p_0)^{-1}$. (Behboodian (1972) discussed the conditional moments of $p$ for Beta priors.)

Remark 2.1. If $\Lambda$ has a density $g_\Lambda(p)$ which is symmetric about 1/2, then $R_n(p,T_\Lambda) = R_n(1-p,T_\Lambda)$ for $p \in [0,1]$. To see this, observe

(i) $f_p(-x) = f_{1-p}(x)$;
(ii) $p_\Lambda(-\underline{x}_n) = 1 - p_\Lambda(\underline{x}_n)$ (by elementary calculus);
(iii) $C_\Lambda(-\underline{x}_n) = -C_\Lambda(\underline{x}_n)$ (a direct result of (ii)), where $C_\Lambda(\underline{x}_n) = \frac12\ln\big[(1-p_\Lambda(\underline{x}_n))/p_\Lambda(\underline{x}_n)\big]$.

Since (iii) implies $R(p, T_\Lambda(-\underline{x}_n)) = R(1-p, T_\Lambda(\underline{x}_n))$, the remark is verified by appealing to (2.6), (i) and (iii).

Since $R_n(p,T_\Lambda)$ is a polynomial in $p$ (see (2.6)), Remark 2.1 implies that for $\Lambda \in \mathcal{B}$, $R_n(p,T_\Lambda)$ is a function of $(p - \frac12)^2$; hence it has an even degree less than or equal to $n + 1$. With n = 1 or 2 one can readily see that $R_n(p,T_\Lambda)$ will be a horizontal line or a parabola with extremum at 1/2. We will compute the values $R_n(p,T_\Lambda)$ for $p \in [0,1]$ when n = 1 or 2. Using (2.5), (2.6) and results (i), (ii), (iii) of the Remark, elementary calculus shows
$$R_1(p,T_\Lambda) = 2(a-b)p^2 + 2(b-a)p + a$$
with
$$a = \int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1)} f_1(x)\,dx\; f_1(x_1)\,dx_1,\qquad b = \int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1)} f_1(x)\,dx\; f_0(x_1)\,dx_1.$$
Also, a tedious calculation shows that
$$R_2(p,T_\Lambda) = [3c - (2d+e)]\,p^2 + [(2d+e) - 3c]\,p + c$$
with
$$c = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_1(x_1)f_1(x_2)\,dx_1dx_2,$$
$$d = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_1(x_1)f_0(x_2)\,dx_1dx_2,$$
$$e = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_0(x_1)f_0(x_2)\,dx_1dx_2.$$
For the case $\Lambda = B(1)$, the uniform distribution on (0,1), numerical computations supported by IMSL (1979) subroutines (Table A.6) were used to compute a = 0.12071, b = 0.21212, c = 0.09576, d = 0.16486, e = 0.25720. Using the MSU CDC 6500 computer, the accuracy of computing a, b, c, d, e was controlled at 3 to 4 significant decimal digits. Therefore
$$R_1(p,T_{B(1)}) = -0.1828p^2 + 0.1828p + 0.1207 \tag{2.8}$$
$$R_2(p,T_{B(1)}) = -0.2997p^2 + 0.2997p + 0.0958 \tag{2.9}$$
are parabolas concave downward with extremum at $p = \frac12$.

Direct numerical computation for n > 2 and $\Lambda \in \mathcal{B}$ is in general not feasible; to overcome this difficulty, the Monte Carlo integration method was used to evaluate $R_n(p,T_\Lambda)$. For $\Lambda \in \mathcal{B}$, we generate independently L sample sequences of independent random variables $X_1,\dots,X_n$ from a population having $f_p(x)$ as density. For each of the L sequences generated, we then compute $R(p,T_\Lambda(\underline{x}_n))$ based on (2.4) and (2.5). An estimate of $R_n(p,T_\Lambda)$ is obtained by averaging the L computed values of $R(p,T_\Lambda(\underline{x}_n))$. An estimate of two standard deviations of the average is also obtained based on these L samples. L is made large enough to make the two standard deviations width acceptable in each experiment. Within each constructed table in this paper, the numbers following the ± signs are estimates of two standard deviations of the Monte Carlo estimates. (See Table A.5 for the computing program.)
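The following compact sketch of that Monte Carlo scheme is in Python rather than the FORTRAN of Table A.5; posterior_mean is the helper sketched in Section 1.2, the default of 200 replications matches the counts typically used in the tables, and the seed handling is ours.

    import numpy as np
    from scipy.stats import norm

    def mc_risk(p, n, gamma, reps=200, seed=0):
        # Monte Carlo estimate of R_n(p, T_Lambda): average the conditional
        # Bayes risk R(p, T_Lambda(x_n)) over `reps` simulated samples
        rng = np.random.default_rng(seed)
        risks = np.empty(reps)
        for r in range(reps):
            theta = rng.random(n) < p                    # latent states
            x = rng.normal(np.where(theta, 1.0, -1.0))   # X_i | theta_i ~ N(+-1,1)
            ph = posterior_mean(x, gamma)
            c = 0.5 * np.log((1.0 - ph) / ph)            # plug-in cutoff
            # risk of the cutoff rule: p*P_1(X < c) + (1-p)*P_0(X >= c)
            risks[r] = p * norm.cdf(c - 1.0) + (1.0 - p) * (1.0 - norm.cdf(c + 1.0))
        return risks.mean(), 2.0 * risks.std(ddof=1) / np.sqrt(reps)

The second returned value is the two standard deviation width reported after the ± signs in the tables.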
To examine the accuracy of our Monte Carlo estimates, Table 1 compares the values of $R_n(p,T_{B(1)})$ for p = 0.0(0.05)0.5 and n = 1, 2 obtained by (2.8), (2.9) and by Monte Carlo integration. Table 2 explores the risk behavior of $R_n(p,T_{B(1)})$ for n = 1, 2, 5, 10, 25, 50 and for p = 0.0(0.05)0.5 (see also Table A.3 for $R_n(p,T_{B(2)})$). It can be seen that $R_n(p,T_{B(1)})$ converges to R(p) quite rapidly and has steady small sample size risk behavior. Values of $R_n(p,T_{B(1)})$ for p > 0.5 need not be computed because of the symmetry about 0.5.

Table 1. $R_n(p,T_{B(1)})$

                 n = 1                       n = 2
   p     Monte Carlo*   Actual      Monte Carlo*   Actual      R(p)
  0.00   0.122±0.006    0.121       0.093±0.005    0.096       0
  0.05   0.128±0.006    0.129       0.107±0.006    0.110       0.0405
  0.10   0.137±0.005    0.137       0.125±0.006    0.123       0.0701
  0.15   0.146±0.005    0.144       0.135±0.006    0.134       0.0934
  0.20   0.151±0.005    0.150       0.147±0.005    0.144       0.1121
  0.25   0.155±0.004    0.155       0.152±0.005    0.152       0.1270
  0.30   0.160±0.003    0.159       0.159±0.004    0.159       0.1387
  0.35   0.162±0.003    0.162       0.164±0.003    0.164       0.1476
  0.40   0.165±0.002    0.165       0.168±0.002    0.168       0.1538
  0.45   0.166±0.001    0.166       0.170±0.002    0.170       0.1574
  0.50   0.166±0.001    0.166       0.170±0.002    0.171       0.1587

  *200 replications for each estimate

[Table 2, risk behavior of $R_n(p,T_{B(1)})$ for n = 1, 2, 5, 10, 25, 50 (n = 1, 2 by numerical computation; the rest by Monte Carlo), is not legible in the source.]

At this point it is important to note that in the Bayes empirical Bayes approach the presence of $\Lambda$ does not restrict the construction of Bayes empirical Bayes procedures but adds the flexibility which enables one to access a family of decision procedures with predictable risk behavior. In particular, consider procedures $T_\Lambda$ for $\Lambda \in \mathcal{B}$. While the mass of B(1) is evenly distributed over [0,1], B(γ) puts more weight on those p values close to 0.5 as γ increases and, conversely, puts more weight on those p values close to 0 and 1 as γ decreases. From the fact that $T_\Lambda$ is admissible and $T_\Lambda$ is Bayes with respect to $\Lambda$, we expect that for a < b, $R_n(p,T_{B(a)}) > R_n(p,T_{B(b)})$ for p close to 0.5 and $R_n(p,T_{B(a)}) < R_n(p,T_{B(b)})$ for p close to 0 or 1. Table 3 shows the flexibility available with choices among B(γ), γ = 0.25, 1, 2, 3, 10, and gives values of $R_1(p,T_{B(\gamma)})$ for p = 0.0(0.05)0.5. The fact that B(γ) has mean $\frac12$ and variance $1/[4(2\gamma+1)]$ implies that as γ → ∞, B(γ) converges weakly to the distribution degenerate at $p = \frac12$, and hence $T_{B(\gamma)}$ converges to the minimax rule with constant risk 0.1587. This is also reflected in Table 3.

Section 2.4. Other empirical Bayes procedures

Robbins (1951), in his original example of the related compound decision problem, uses the estimator
$$p_1(\underline{X}_n) = \max\Big\{0,\ \min\Big\{1,\ 0.5 + \Big(\sum_{i=1}^n 0.5\,X_i\Big)\Big/n\Big\}\Big\}. \tag{2.10}$$
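In code, Robbins' estimator (2.10) and the hyperbolic-tangent estimator coded for the second competing procedure in the program of Table A.7 (attributed there to van Houwelingen; the constant 0.90342942 is taken from that listing) read roughly as follows. This is a sketch; the clipping to [0,1] follows (2.10), while the Table A.7 program handles out-of-range estimates by branching directly to the corresponding constant risk.

    import numpy as np

    def p1_robbins(x):
        # Robbins' (1951) estimator (2.10)
        return min(1.0, max(0.0, 0.5 + 0.5 * np.mean(x)))

    def p2_van_houwelingen(x):
        # the estimator coded for procedure T_2 in Table A.7
        return min(1.0, max(0.0, 0.5 + 0.90342942 * np.mean(np.tanh(x))))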
[Gap in the source: the remainder of Section 2.4, Section 2.5 (Monte Carlo comparisons of T, T1 and T2), and Tables 3 through 6 are not legible. Per the List of Tables: Table 3, the flexibility of $R_1(p,T_\Lambda)$ with a prior $\Lambda$ in $\{B(\gamma)\,|\,\gamma > 0\}$; Table 4, $R_{50}(p,T_2)$; Tables 5 and 6, comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$, $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, for n = 1, 2, 5 and n = 10, 25, 50 respectively.]

APPENDICES

APPENDIX A
MIC NI: —IC Q .Amm.evem co Le_>memm xm_m .H.< a_nme 47 Ao_cmu oucozv mums.uno comm cow m:o_amo__amc mo cones: A.v .oo.cwmmm_.o ~om.owmmm_.c moo.owmcm_.c hoo.cwo~m~.c soo.mw:m-.c uoo.mw:o_n.c :cm.cwm-s.o cm.o .oo.cwm.m_.c Nam.owm~m_.o mco.mw-m_.o moo.ow_mm~.c ~co.ow~mm~.c soc.cwmmom.o :co.ow-.e.o ma.o .co.cw-m_.o noo.ow_mm_.o moo.mwmmm_.o som.cwo-~.m ~oo.owm_m~.c Ncc.cws~om.o moo.owmmo:.c c:.o .oo.owm~m_.c Noc.cw:~m_.o moc.owm~m_.o soc.ownm_~.c mc°.ow:~m~.o soo.owm~mu.c moo.owm~mm.c mm.o .oo.cnmma..c mac.owmm:_.c eca.cwomm..o moo.omo~o~.m mco.ow_mm~.o ~co.owmo-.o mco.ow~_~m.o mm.o _oc.owm~m_.o aom.owm.a_.c moo.owm~m_.c mam.cwm_m_.c moc.ow~a_~.o sce.owcm:~.c smo.owm~am.o m~.o ~om.ow~m__.c mcc.cwcm~_.o acm.mwmma_.o moc.ow:~m_.c moc.owmhm_.o soc.cwma_~.o sco.owm~om.o o~.c «co.owa.o_.c mom.ow_mo_.m moo.owm°~_.o moc.cwoam_.c mco.owmmm_.o soo.cw~:m_.c moo.owm~m~.c m_.o .oo.owm-o.o ~oo.owm~mo.c ~o¢.ow~mmo.m moc.mwc_c_.c acm.owmm_..o moo.owm_s_.o moo.owmmp~.c o_.o .oo.cwmm:o.m Ncc.ow_mam.o «cc.ow.mmo.c mec.owoamo.c moc.cw_m~c.o moo.ow_mmo.o moo.ow:mm_.o mo.o .oc.cwmnco.c ~oa.owmmom.o mom.cwma_o.m nmm.cw~o~m.o coo.ow.cmc.o moc.owmcmo.o moo.owmo__.c oo.c Aoo~y .ocuv Acme. Accmv Acme... Amoc.~v Acco.av omlc mun: c-u: ml: ml: NI: plc a .ANe.ave¢ co Le_>meee xm_m .~.< 8.882 48 Ao_cmu oucoxv mums_omo comm co» m:o_omo__mmc om «m “0.509 oucozv oume_umo comm cam «co—umo__moc com « —co.OHN—o_.o —oo.o«mmm—.o NO0.0Hmem—.c —°c.OHNmm—.o —oc.cHN:o—.c —O0.0Hon—.c oco.OHm—o—.o om.o No0.0Hc—w—.c —O¢.°Homo—.G NO0.0Hmm@—.o —cc.OHmmm—.c —oc.owmmw—.o —oc.OHMNc—.o —oo.cHM—w—.o ma.o Noo.OH:~m—.o poc.owhmm—.O NO0.0H:No—.o ~O0.0N—ao—.O —oc.OHN—m—.o —Oo.o«@—m—.O poo.owuoo—.c 0:.0 —oo.onom—.o .oc.OHNmm—.O Nco.owmwm—.o NO0.0Hth—.c NO0.0Homm—.O Noc.omcmm—.o Noo.owmmm—.O mm.o NO0.0HmMa—.o —oo.ou—m:—.o Ncc.owm~:—.o NC0.0H:Nm—.o moc.cwm:m—.O Ncc.owmmm—.o Noc.owmmm—.o Om.c poo.owmmN—.o —oc.cuman—.O m¢¢.o«oo:—.O moc.owc::—.c moc.OHmm:—.o moo.owm—m—.O moo.onJm—.c mN.o Noo.owmm——.o —OG.OHNw——.O mO0.0HN@N—.c £00.6Hmom—.c MO0.0Hm:3—.o ncc.owcaa—.c Nco.owmom—.o o~.o _oc.onmsmc.o —O0.0Hm—O—.o NOG.OH:———.O moc.OHNMN—.o {Oo.owomm—.O MO0.0Hmmn—.O MO0.0H—w:—.c m—.O Nco.owmomc.o Noo.0Hm—mc.o mco.oummmc.o mo¢.oanm——.O moc.owoo——.o JO0.0H-m—.o moc.owm::—.O O—.o Noo.owuhéo.c Nco.cflmmmo.o MO0.0H¢QN0.0 €06.0Hmhmo.o :O0.0H—:——.O acc.OH—mN—.o moo.OHNO¢—.O mo.o NO0.0H~m—c.c NOO.¢H:@N0.0 MO0.0H@emO.o MO0.0Hm—wo.c :O0.0flOmo—.O 36°.OHoQ——.O moo.OHN:M—.O oo.o «mom-c «mun: «o—Ic «ml: «nu: «Nu: «pa: a .Acmvmh.evem co Le_>e;em ¥m_m .m.< 0.882 Table A.4. LIGTrF. 100= 105= 110= P=O. 120= D0 2 I=1r50 130= P=P+0.01 140= 150= U=CP+1. 160: U=CP—1. 170=2 1803 190= 200= PRINT 79P7RP 210=7 220=2 CONTINUE 230= END Evaluation of the Bayes envelope R(p) PROGRAM ENULOPCOUTPUT) REAL PrCPvUvUrAIBrRP CP=0.5*ALOG((1.-P)/P) CALL MDNOR(07A) CALL HDNDR(UvB) RP=P$A+(1.-P)*(1.-B) 49 F0RMAT(3X:F5.273X1F10.6) .01 .02 .03 .04 .05 .06 .07 .08 .09 .10 .11 .12 .13 .14 .15 .16 .17 .18 .19 .20 .21 .22 .23 .24 .25 .26 .27 .28 .29 .30 .31 .32 .33 .34 .35 .36 .38 .39 .40 .41 .42 a? \‘9‘0 .44 .45 .46 .47 an .4? I": ~ 5 -_J'.,’ . 111': l(\.' EXEC BEGUN.09.?4.2 .009311 .026090 .033503 0 040459 .047018 .053226 .059118 .064722 .070061 .073155 .080019 .084670 .089117 .093373 .097446 .101345 .105077 .108649 .112067 .115336 .118461 .121447 .124298 .127017 .129608 .132074 .134417 .136642 .138749 .140741 .142620 .144388 .146047 .147598 .149042 .150382 .151618 .152751 .153783 .154714 .155545 .156276 .156909 .157443 .157380 .158219 .158462 .153507 f." “\ ’ III? 0 m 5;} IDLJM. Knutor 50 Table A.5. 
Table A.5. Monte Carlo simulation of $R_n(p,T_\Lambda)$, $\Lambda \in \mathcal{B}$

[The main program BERISK and the surrounding terminal session are only partially legible in the source. The program reads the Beta prior parameters T, S and then, for each input line (P, NEXP, N, seed DSEED), generates NEXP samples of size N from the mixture density $f_p$, computes $p_\Lambda(\underline{x}_n)$ from the coefficients and moments supplied by the subroutines below, evaluates the conditional Bayes risk of the resulting cutoff rule via the IMSL routine MDNOR, and prints the average risk with its standard deviation and two standard errors. A sample of the printed output:

    P= .20  N= 5  NEXP=  10  RISK= .13307  SD= .03410
    P= .20  N= 5  NEXP= 400  RISK= .13464  SD= .03973

The two subroutines are legible:]

      SUBROUTINE CDEFICT(N)
C     COEFFICIENTS C(1),...,C(N+1) OF THE POLYNOMIAL PRODUCT
C     OF (B(I)+A(I)*S), I=1,...,N -- THE RECURSION (1.10)
C     SPECIALIZED TO M=1
      DOUBLE A(100),B(100),C(100),D(100)
      COMMON A,B,C
      C(1)=B(1)
      C(2)=A(1)
      IF(N.EQ.1)GO TO 5
      DO 10 I=2,N
      D(1)=B(I)*C(1)
      DO 20 J=2,I
      D(J)=A(I)*C(J-1)+B(I)*C(J)
      C(J-1)=D(J-1)
   20 CONTINUE
      D(I+1)=A(I)*C(I)
      C(I)=D(I)
      C(I+1)=D(I+1)
   10 CONTINUE
    5 RETURN
      END

      SUBROUTINE BETA
C     THIS SUBROUTINE GENERATES THE 1 THRU K TH MOMENTS
C     M(1),...,M(K) OF THE BETA(T,S) PRIOR
      REAL T,S
      DOUBLE M(100),PROD1,PROD2
      COMMON /MOMENT/M
      PROD1=1.D0
      PROD2=1.D0
      DO 10 I=1,K
      PROD1=PROD1*(T+(I-1.))
      PROD2=PROD2*(T+S+(I-1.))
      M(I)=PROD1/PROD2
   10 CONTINUE
      RETURN
      END

Table A.6. A numerical computation program

This program evaluates an iterated integral of the form
$$\int_{-3.11}^{3.11}\Big[\int_{-3.11}^{3.11}\Big(\int_{-\infty}^{C_\Lambda(s,t)} f_1(x)\,dx\Big)\,f_0(s)\,ds\Big]\,f_1(t)\,dt,$$
arising in the computation of the constants c, d, e of Section 2.3 for $\Lambda = B(1)$; here $C_\Lambda(s,t)$ is the cutoff generated by the posterior mean $p_\Lambda(s,t)$. IMSL subroutines used: MDNOR and DCADRE; DCADRF is a binary copy of DCADRE.

      PROGRAM ROMB3(OUTPUT)
      INTEGER IER
      REAL DCADRF,H,F0,F1,A,B,AERR,RERR,ERROR,INTEG
      EXTERNAL H
      A=-3.11
      B=3.11
      RERR=0.
      AERR=1.E-5
      INTEG=DCADRF(H,A,B,AERR,RERR,ERROR,IER)
      PRINT 7,INTEG,ERROR,IER
    7 FORMAT(1X,F17.15,3X,F10.8,3X,I3)
      END

      REAL FUNCTION H(T)
      INTEGER IER
      REAL DCADRE,T,F0,F1,C,D,AERR,RERR,ERROR,Z
      EXTERNAL F
      COMMON /JOINT/Z
      Z=T
      C=-3.11
      D=3.11
      RERR=0.
      AERR=1.E-5
      H=DCADRE(F,C,D,AERR,RERR,ERROR,IER)*F1(T)
      RETURN
      END

      REAL FUNCTION F(S)
      DOUBLE C
      REAL S,P,Y,Z,T,F0
      COMMON /JOINT/Z
      T=Z
      Y=C(S,T)-1.
      CALL MDNOR(Y,P)
      F=P*F0(S)
      RETURN
      END
      DOUBLE FUNCTION C(S,T)
C     THE CUTOFF C_LAMBDA(X1,X2) FOR THE UNIFORM PRIOR B(1)
      DOUBLE X1,X2,D9,D8,G
      REAL S,T
      X1=S
      X2=T
      D9=1.+DEXP(2.*X2)+DEXP(2.*X1)+3.*DEXP(2.*(X1+X2))
      D8=4.+2.*DEXP(2.*X2)+2.*DEXP(2.*X1)+4.*DEXP(2.*(X1+X2))
      G=D9/D8
      C=0.5*DLOG((1.-G)/G)
      RETURN
      END

      REAL FUNCTION F0(X)
      DOUBLE Y,PI
      REAL X
      PI=3.14159265358979323846264338D0
      Y=X
      F0=DEXP(-0.5*(Y+1.)*(Y+1.))/DSQRT(2.*PI)
      RETURN
      END

      REAL FUNCTION F1(X)
      DOUBLE Y,PI
      REAL X
      PI=3.14159265358979323846264338D0
      Y=X
      F1=DEXP(-0.5*(Y-1.)*(Y-1.))/DSQRT(2.*PI)
      RETURN
      END

(Execution output, digits partially illegible: .3351?0353?439?3  .00000514  0.)

Table A.7. Monte Carlo simulation of $R_n(p,T_a)$, a = 1, 2

      PROGRAM RISK(INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
      DOUBLE DSEED
      REAL R(6000),X(100)
C     MONTE CARLO SIMULATION OF MIXED NORMAL RANDOM VARIABLES
C     FOR TESTING N(1,1) VS N(-1,1).  NEXP REPLICATIONS OF
C     SAMPLES WITH SIZE N ARE GENERATED TO ESTIMATE RISK
C     BEHAVIORS OF (1) ROBBINS' DECISION PROCEDURE AND
C     (2) VAN HOUWELINGEN'S DECISION PROCEDURE
      WRITE(6,50)
   50 FORMAT(*0*,*DATA-*)
      READ(5,100)P,NEXP,N,DSEED
  100 FORMAT(F5.2,I4,I4,D25.18)
      WRITE(6,200)P,NEXP,N,DSEED
  200 FORMAT(*0*,F5.2,3X,I4,3X,I4,3X,D25.18)
      RB1=0.
      RB2=0.
      RVH1=0.
      RVH2=0.
      DO 1 L=1,NEXP
C     [SAMPLE GENERATION VIA THE IMSL UNIFORM GENERATOR GGUBS;
C     SEVERAL LINES ILLEGIBLE IN THE SOURCE]
      SUM1=0.
      SUM2=0.
      DO 10 I=1,N
      SUM1=X(I)+SUM1
      SUM2=TANH(X(I))+SUM2
   10 CONTINUE
      PRB=0.5+SUM1/(2.*N)
      PVH=0.5+0.90342942*SUM2/N
      IF(PRB.GE.1.)GO TO 350
      IF(PRB.LE.0.)GO TO 352
      CRB=0.5*ALOG((1.-PRB)/PRB)
      YRB1=CRB-1.
      YRB2=CRB+1.
      CALL MDNOR(YRB1,PRB1)
      CALL MDNOR(YRB2,PRB2)
      RB=P*PRB1+(1.-P)*(1.-PRB2)
      GO TO 356
  350 RB=1.-P
      GO TO 356
  352 RB=P
  356 IF(PVH.GE.1.)GO TO 450
      IF(PVH.LE.0.)GO TO 452
      CVH=0.5*ALOG((1.-PVH)/PVH)
      YVH1=CVH-1.
      YVH2=CVH+1.
      CALL MDNOR(YVH1,PVH1)
      CALL MDNOR(YVH2,PVH2)
      PV=P*PVH1+(1.-P)*(1.-PVH2)
      GO TO 460
  450 PV=1.-P
      GO TO 460
  452 PV=P
  460 RB1=RB1+RB
      RB2=RB2+RB*RB
      RVH1=RVH1+PV
      RVH2=RVH2+PV*PV
    1 CONTINUE
      RNEXP=NEXP
      SMRB=RB1/RNEXP
      SDRB=SQRT((RB2-RNEXP*SMRB*SMRB)/(RNEXP-1.))
      SSDRB=2.*SDRB/SQRT(RNEXP)
      SMVH=RVH1/RNEXP
      SDVH=SQRT((RVH2-RNEXP*SMVH*SMVH)/(RNEXP-1.))
      SSDVH=2.*SDVH/SQRT(RNEXP)
      WRITE(6,400)SMRB,SDRB,SSDRB,SMVH,SDVH,SSDVH
  400 FORMAT(*0*,*ROBBIN *,F10.5,F10.5,F5.3,* V HOU *,2F10.5,F5.3)
      END

(Sample run, digits partially illegible: for p = .30, NEXP = 200, n = 5, ROBBIN .1902? .08330 .012, V HOU .1???? .09017 .013.)

APPENDIX B

In this appendix we prove Theorems 2.1 and 2.2. The notation and the following assumptions are from Johnson (1970). The model assumes $X_1, X_2,\dots$ i.i.d. $P_\theta$, where $P_\theta$ has density $f(x,\theta)$ with respect to a given $\sigma$-finite measure $\mu$.

B.1 The parameter space $\Theta$ is a compact subset of $E^1$. Let $\Theta^\circ$ denote its interior and $\underline{\Theta}$ denote the Borel $\sigma$-algebra on $\Theta$.

B.2 $\theta$ is identified by $P_\theta$.

B.3 $f(x,\theta)$ is jointly measurable in $(x,\theta)$.

B.4 For each $x$, $f(x,\theta)$ admits continuous first and second partial derivatives with respect to $\theta$.

B.5 The measures $P_\theta$ are mutually absolutely continuous.

B.6 If $\lim |\theta_i| = \infty$, then $\lim f(x,\theta_i) = 0$ for all $x$ except for perhaps a null set depending on the sequence.
B.7 For all $\theta \in \Theta$, $E_\theta|\log f(X,\theta)| < \infty$ and
$$0 < I(\theta) = -E_\theta\Big[\frac{\partial^2}{\partial\theta^2}\log f(X,\theta)\Big].$$

B.8 For each $\theta_0 \in \Theta^\circ$, there exist functions $G_1(x)$ and $G_2(x)$ satisfying
$$\Big|\frac{\partial}{\partial\theta}\log f(x,\theta)\Big| \le G_1(x),\qquad \Big|\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\Big| \le G_2(x)$$
for $\theta$ in a neighborhood of $\theta_0$, and also $E_{\theta_0}[G_1(X)] < \infty$ and $E_{\theta_0}[G_2(X)] < \infty$. The functions $G_1$ and $G_2$ may depend on $\theta_0$.

B.9 Let
$$\bar f(x,\theta,\rho) = \sup_{|\theta-\theta'|\le\rho} f(x,\theta'),\quad \rho > 0,\qquad\text{and}\qquad Q(x,y) = \sup_{|\theta|>y} f(x,\theta),\quad y > 0.$$

B.10 For every $\theta \in \Theta$ and $\rho, y > 0$, $\bar f(x,\theta,\rho)$ and $Q(x,y)$ are measurable functions of $x$. Moreover, for sufficiently small $\rho$ and sufficiently large $y$,
$$E_{\theta_0}[\log \bar f(X,\theta,\rho)]^+ < \infty,\qquad E_{\theta_0}[\log Q(X,y)]^+ < \infty\qquad \text{for each } \theta_0 \in \Theta^\circ.$$

B.11 For each $x$, $\log f(x,\theta)$ has 5 continuous partial derivatives with respect to $\theta \in \Theta$.

B.12 There exist functions $G_k(x)$ with $E_{\theta_0}[G_k(X)] < \infty$ and
$$\Big|\frac{\partial^k}{\partial\theta^k}\log f(x,\theta)\Big| \le G_k(x)$$
for $\theta$ in a neighborhood of $\theta_0 \in \Theta^\circ$, $k = 3, 4, 5$.

B.13 $\Lambda$ is a probability measure on $(\Theta,\underline{\Theta})$; $\Lambda$ has density $\lambda$ with respect to the Lebesgue measure. For $\theta_0 \in \Theta^\circ$, $I(\theta_0) > 0$ and $\lambda(\cdot)$ has 3 continuous derivatives in a neighborhood of $\theta_0$. Also $\int |\theta|\,\lambda(\theta)\,d\theta < \infty$.

Conditions B.1-B.9 are basically those assumed by Wald (1949) to establish the strong consistency of the M.L.E. and those of LeCam (1956) to show that the M.L.E. is asymptotically normal. A weakened one-dimensional version of LeCam's (1956) Theorem 3.4.1 is

Theorem B.II.1. Let B.1-B.4, B.7, B.8 be satisfied. Then the maximum likelihood estimator $\hat\theta_n$ is strongly consistent and asymptotically normally distributed. The variance of the limiting distribution of $\sqrt{n}\,(\hat\theta_n - \theta_0)$ is $1/I(\theta_0)$.
61 Observe that f(x,e,p) = f(x,(e-p)v0)I[f0 ; f1] + f(x,(e+p)A1)I[f0< f1] é max{f0(x),f1(xfl and Q(x,y) = f(x,0)I[fO ; f1] + f(x,l)I[fO < f1] g max{f0(x),f1(x)} (4) both are log-integrable by (2). We see that (3) implies 8.10, (4) implies 8.9. Also, (3) implies that 8.11 is satisfied as well. Next we show that the result of Theorem 8.11.2 leads to /H (EA(e|§n) - en) + O a.s. P60 (5) By Theorem 8.11.1, én + 30 a.s. P: , 80 is the true parameter, 0 -0 80 E b . Let ( ) I 32 ( > ( ) 8 6'6 = - ———-log f x,e P dx 1 0 362 90 and 83 C(eleo) = ]-—3- log f(x,e)P (dx). as 60 Together 8.8, 8.11 and the uniform strong law (Rubin (1956)) implies that 2. . “ 1.1. b (an)-+I(eo) > 0, a3n(en) 6 C(eoleo) a.s. P 62 Also, 8.12 implies that i A + I " + l °° 1(0n) 1(80) > 0 and A (on) A (00) a.s. Peo' Therefore, for some M > O and for almost all .x, 1 lb‘1(6a3n(én) + x'(én)/1(én))l + Cb' g M for large n so that (1) implies (5). BIBLOGRAPHY BIBLIOGRAPHY Ballard, Robert J. and Gilliland, Dennis, C. (1978). On the risk performance of extended sequence compound rules for classi- fication between N(-1,1) and N(1,1). g, Statist. Comput. Simul. 6, 265-280. Behboodian, Javad (1972). Bayesian estimation for the proportions in a mixture of distributions. Sankhya, Series B. 34, 15-22. Billingsley, Patrick (1968). Convergence of probability measures. John Wiley and Sons Inc. Boyer, John E., Jr. and Gilliland, Dennis, C. (1980). Admissibility consideration in the finite state compound and empirical decision problems. Statistica Neerlandica, 34. Copas, J.B. (1969). Compound decisions and empirical Bayes. JRSS Series B, 31, 397-425. Feller, William (1957). An introduction to probability theory and its applications, Vol. 1., 2nd edition, John Wiley & Sons, Inc., New York. Ferguson, Thomas, S. (1967). Mathematical statistics a decision theoretic approach. Academic Press. Gilliland, Dennis, C., Hannan, James and Huang, J.S. (1974). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. RM-320, Statistics and Probability, MSU. (1976). Ann, Statist. 4, 1101-1112. Gilliland, Dennis, C. and Boyer, John E., Jr. (1979). Bayes empirical Bayes. Submitted for publication. Hannan, James, F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely spec- ified distributions. Ann. Math. Statist. gg, 37-51. Hannah, James, F. and Van Ryzin, J.R. (1965). Rate of convergence in the compound decision problem for two completely spec- ified distributions. Ann. Math. Statist. 36, 1743-1752. 63 64 Huang, J.S. (1970). A note on Robbins' compound decision problem. RM-266, Statistics and Probability, MSU. (1972). Ann. Math. Statist. 33, 348-350 IMSL Library, 3 and 3, 7th ed. Int. Math. Stat. Libraries, Houston, TX, January 1979. Johnson, R.A. (1970). Asymptotic expensions associated with posterior distributions. Ann. Math. Statist. 53, 851-864. LeCam, Lucien (1956). Lecture Notes. Department of Statistics, University of California at Berkeley. Lehmann, E.L. (1959). Testing Statistical Hypothesis. Wiley, New York. Munkres, James, R. (1975). Topology. Prentice-Hall, Inc. Oaten, Allen (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 33, 1164-1184. Robbins, Herbert (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkele Symp. Math. Statist. Prob., 157-163. UniVersity of Ca|i¥ornia Press. Robbins, Herbert (1964). 
The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.

Rockafellar, R. Tyrrell (1972). Convex Analysis. Princeton University Press.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics. Ann. Math. Statist. 27, 200-203.

Shapiro, Connie P. (1972). Bayesian classification. Ph.D. dissertation, Department of Statistics, University of Michigan.

Shapiro, C.P. (1974). Bayesian classification: Asymptotic results. Ann. Statist. 2, 763-774.

Snijders, Tom (1977). Complete class theorems for the simplest empirical Bayes decision problems. Ann. Statist. 5, 164-171.

Van Houwelingen, J.C. (1974). An empirical Bayes rule for testing simple hypothesis versus simple alternative. Statistica Neerlandica 28, 209-221.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595-601.

Wilks, S.S. (1962). Mathematical Statistics. John Wiley & Sons, Inc., New York.