This is to certify that the

dissertation entitled

Asymptotically Optimal Bayes Compound and
Empirical Bayes Estimators in Exponential
Families with Compact Parameter Space

presented by

Somnath Datta

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Statistics

 

 

 

Date July 25, 1988

MSU is an Affirmative Action/Equal Opportunity Institution

 

 

ASYMPTOTICALLY OPTIMAL BAYES COMPOUND AND
EMPIRICAL BAYES ESTIMATORS IN EXPONENTIAL
FAMILIES WITH COMPACT PARAMETER SPACE

By

Somnath Datta

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Statistics and Probability

1988


ABSTRACT

ASYMPTOTICALLY OPTIMAL BAYES COMPOUND AND
EMPIRICAL BAYES ESTIMATORS IN EXPONENTIAL
FAMILIES WITH COMPACT PARAMETER SPACE

By

Somnath Datta

The problem of ﬁnding admissible, asymptotically optimal compound
and empirical Bayes rules is pursued in the inﬁnite state case.

The component distributions considered in this work form a real
exponential family of quite general nature with component parameter in a
compact interval of the natural parameter space. The component problem
estimates an arbitrary continuous transform of the natural parameter under
squared error loss.

We consider the set, sequence compound and the empirical Bayes
formulations of the above component and show that all Bayes estimators in
the various formulations are admissible. Our main result is that any Bayes
compound estimator versus a mixture of i.i.d. priors on the compound
parameter is asymptotically optimal if the mixing hyperprior has full support.
Analogously any Bayes empirical Bayes estimator is asymptotically optimal if
the empirical Bayes prior has full support.

The exponential family structure has been used to treat the difference
in risks of the Bayes estimators and the component Bayes versus the empiric
for some special cases of the continuous transforms. The key to the proof of
asymptotic optimality is an $L_1$ consistency of posterior mixtures, itself a
major finding of the thesis and extendible far beyond the exponential context.
The thesis also derives an interesting uniform $L_1$ LLN for random continuous
functions on a compact metric space which is applied in the proof of the last
result.

The asymptotic optimality results are generalized to weighted squared
error loss with continuous weight function and applications to some
non-exponential situations are also considered.

Several examples of such hyperpriors/empirical Bayes priors are given
and for some of them practically useful forms of the corresponding Bayes
estimators are obtained.

To my parents and my wife


ACKNOWLEDGMENTS

I wish to express my deepest appreciation to my thesis adviser
Professor James F. Hannan for everything he has done for me. His invaluable
suggestions led the way to some results and continuous careful criticism
greatly improved the overall presentation of this thesis.

I would like to express my thanks to my other committee members
Professors Hira L. Koul, Dennis C. Gilliland and Clifford Weil for their
suggestions on an earlier draft.

Special thanks go to my wife Susmita for her total support and
encouragement during the preparation of this thesis.

Finally, I would like to thank Ms. Loretta Ferguson for her help on
numerous occasions during my typing of the manuscript.

TABLE OF CONTENTS

Chapter
0. INTRODUCTION
   0.1. The component problem
   0.2. The set compound problem
   0.3. The sequence compound problem
   0.4. The empirical Bayes problem
   0.5. Literature review and a summary of the present work
1. THE SET COMPOUND ESTIMATION
   1.1. Introduction
      1.1.1. Notations and conventions
      1.1.2. Exponential family component
   1.2. Estimators induced by priors on $\Omega$
      1.2.1. Bayes versus mixture of i.i.d. priors
      1.2.2. Admissibility
      1.2.3. A useful inequality on the modified regret
   1.3. A bound on the $L_1(E_\theta)$ distance between two component
        Bayes rules when $\phi(\theta) = e^{\theta k}$, $k \in \mathbb{N}$
   1.4. Consistency of the posterior mixtures
   1.5. Asymptotic optimality
2. THE SEQUENCE COMPOUND ESTIMATION
   2.1. Introduction
   2.2. Bayes versus $\xi_\Lambda$
      2.2.1. The Bayes sequence compound estimator
      2.2.2. Admissibility
   2.3. Asymptotic optimality
3. THE EMPIRICAL BAYES ESTIMATION
   3.1. Introduction
   3.2. Bayes versus $\Lambda$
   3.3. Asymptotic optimality
4. EXAMPLES OF $\Lambda$ AND CONCLUDING REMARKS
   4.1. Introduction
   4.2. Examples of $\Lambda$
   4.3. The Bayes estimators
   4.4. Remarks
APPENDIX
   A.1. On bounding the difference of two ratios
   A.2. On the uniform convergence of convex functions
   A.3. A uniform $L_1$-LLN for independent random continuous functions
   A.4. Admissibility of Bayes estimators in the compound
        problem under squared error loss
BIBLIOGRAPHY

CHAPTER 0
INTRODUCTION

We start with some notational conventions used throughout the body
of the thesis. Let $n$ be a positive integer. An $n$-vector $(x_1, \ldots, x_n)$ is
denoted by $\underline{x}$ and, for $1 \le \alpha \le n$, $(x_1, \ldots, x_\alpha)$ is denoted by
$\underline{x}_\alpha$. For probabilities $P_1, \ldots, P_n$, $\times_{\alpha=1}^{n} P_\alpha$ denotes their measure
theoretic product. Typically the letter $P$ is used for probabilities and $E$ for
the corresponding expectations. For the sake of clarification, dummy variables
are often displayed in integrals. Also mixed mode integral expressions like
$\int X(\omega)\,dP$ are used. For a bounded function $f$, $f_*$ and $f^*$ denote its infimum
and supremum respectively over its entire domain. For a measure $m$ on the
Borel $\sigma$-field of a topological space $\mathcal{T}$ the support of $m$ is defined to be
the set $\bigcap\{F \subset \mathcal{T} : F \text{ is closed and } m(F^c) = 0\}$. Note that the support of $m$
$= \mathcal{T}$ iff $m(O) > 0$ for every open $O$ with $\emptyset \ne O \subset \mathcal{T}$. $\mathbb{R}$, $\mathbb{I}$, $\mathbb{N}$ stand for the
set of reals, integers and non-negative integers respectively.

1. The component problem.

The component problem has the structure of the usual decision theory
problem, i.e. we have a parameter space $\Theta$, a family of probability measures
$\{P_\theta : \theta \in \Theta\}$ on some common measurable space $\mathcal{X}$, an observable
$\mathcal{X}$-valued random variable $X \sim P_\theta$ under $\theta$, an action space $\mathcal{A}$, a loss
function $L: \mathcal{A} \times \Theta \to [0,\infty)$, decision rules $t$, $t: \mathcal{X} \to \mathcal{A}$ such that
$L(t,\theta)$ is measurable for each $\theta$, with risk $R(t,\theta) = E_\theta L(t,\theta)$.

2. The set compound problem.
The set compound problem simultaneously considers a number, say
$n$, of independent decision problems each of which is structurally identical to
the above component problem, and allows the use of observations from all the
problems in each of the decisions. The compound loss is taken to be the
average of all the component losses.
Thus for each $n \ge 1$, the set compound problem can be formulated as a
decision problem as follows. We have the parameter space $\Theta^n$, the action
space $\mathcal{A}^n$, observations $\underline{X} = (X_1, \ldots, X_n) \sim P_{\underline{\theta}} = \times_{\alpha=1}^{n} P_{\theta_\alpha}$,
$\underline{\theta} = (\theta_1, \ldots, \theta_n) \in \Theta^n$, compound rules $\underline{t} = (t_1, \ldots, t_n)$, where for each
$1 \le \alpha \le n$, $t_\alpha: \mathcal{X}^n \to \mathcal{A}$ is such that $L(t_\alpha,\theta)$ is measurable for each $\theta$,
with loss $L_n(\underline{t},\underline{\theta}) = n^{-1} \sum_{\alpha=1}^{n} L(t_\alpha,\theta_\alpha)$ and risk

(2.1)   $R_n(\underline{t},\underline{\theta}) = E_{\underline{\theta}}\, L_n(\underline{t},\underline{\theta})$.

Let $\Omega = \{\omega : \omega \text{ is a probability on } \Theta\}$. For $\omega \in \Omega$, let $R(\omega)$ stand
for the minimum Bayes risk versus $\omega$ in the component problem, i.e.

$R(\omega) = \bigwedge_t \int R(t,\theta)\,d\omega(\theta)$.

For a traditional simple symmetric rule (i.e. $t_\alpha(\underline{x}) = t(x_\alpha)$ $\forall\ 1 \le \alpha \le n$ for some
component rule $t$) the compound risk is easily seen to be at least $R(G_n)$,
$G_n$ being the empirical distribution of $\theta_1, \ldots, \theta_n$. In all non-trivial situations
a component Bayes rule versus $G_n$ is unavailable to the statistician because
$G_n$ is unknown and hence $R(G_n)$ cannot be achieved (for any $n$) via the
use of a simple symmetric rule. Thus compound rules which attain risks
asymptotically no more than $R(G_n)$ are of interest. Hannan (1957) used
the term 'approximation to Bayes risk' to describe such effects.
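
As a rough illustration of the envelope $R(G_n)$, the sketch below evaluates it for the two-state example of Robbins (1951) reviewed in Section 5, decision between $N(-1,1)$ and $N(1,1)$. The 0-1 loss is an assumption made here for the illustration, and the function names are hypothetical. It shows that $R(G_n)$ falls below the risk of the simple symmetric minimax rule whenever the proportion of $\theta_\alpha = 1$ differs from 1/2.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bayes_envelope(p):
    """R(G_n) for deciding between N(-1,1) and N(1,1) under 0-1 loss (assumed),
    when G_n puts mass p on theta = 1 and 1 - p on theta = -1, with 0 < p < 1."""
    # The component Bayes rule versus G_n decides theta = 1 iff x > cut.
    cut = 0.5 * math.log((1.0 - p) / p)
    err_plus = std_normal_cdf(cut - 1.0)          # P(X <= cut | theta = 1)
    err_minus = 1.0 - std_normal_cdf(cut + 1.0)   # P(X > cut | theta = -1)
    return p * err_plus + (1.0 - p) * err_minus

def minimax_risk():
    """Constant risk of the simple symmetric rule 'decide theta = 1 iff x > 0'."""
    return std_normal_cdf(-1.0)

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(bayes_envelope(p), 4), round(minimax_risk(), 4))
```

The two quantities coincide at p = 1/2 and the envelope is symmetric in p around 1/2.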

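For a compound rule $\underline{t}$, the difference $D_n(\underline{t},\underline{\theta}) = R_n(\underline{t},\underline{\theta}) - R(G_n)$
is called the modified regret of $\underline{t}$ at $\underline{\theta}$. We say that a rule $\underline{t}$ is
asymptotically optimal (a.o.) if

(2.2)   $\bigvee_{\underline{\theta}} D_n(\underline{t},\underline{\theta})^+ \to 0$ as $n \to \infty$.

For the relation of this notion of optimality to that with more stringent
envelopes in the finite $\Theta$ case, see Gilliland and Hannan (1986).

A set compound rule $\underline{t}$ is said to be admissible if for each $n \ge 1$,
$R_n(\underline{t},\underline{\theta})$ is admissible in the usual decision theoretic sense as a function of $\underline{\theta}$
in the class of set compound rules.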

3. The sequence compound problem.

The sequence compound problem also considers a number, say $n$, of
independent repetitions of a component problem but allows only data up to
stage $\alpha$ in making the $\alpha$-th decision for $1 \le \alpha \le n$. Thus a sequence compound
rule is $\underline{t} = (t_1, \ldots, t_n)$, $t_\alpha: \mathcal{X}^\alpha \to \mathcal{A}$ such that $L(t_\alpha,\theta)$ is measurable
for each $\theta$, with the interpretation that the $\alpha$-th decision made with the use
of $\underline{t}$ is $t_\alpha(\underline{X}_\alpha)$ for each $1 \le \alpha \le n$.

In the weak sequence compound (w.s.c.) version $n$ is known to the
statistician a priori so that each $t_\alpha$, $1 \le \alpha \le n$, may use $n$. We will use $\underline{t}^n$ for
a w.s.c. rule to show its possible dependency on $n$. The other version is
called the strong sequence compound (s.s.c.) problem and is more interesting.
In both versions we are interested in the asymptotic risk behavior of
compound rules as $n$ tends to infinity. Hence a w.s.c. rule can be viewed
as a triangular array $\underline{t}^n$, $n \ge 1$, and an s.s.c. rule as a sequence
$\underline{t} = (t_1, t_2, \ldots)$. It should be noted that an s.s.c. rule is automatically a
w.s.c. rule.

The risk (up to stage $n$) and the modified regret of a sequence
compound rule $\underline{t}$ (or $\underline{t}^n$) are as given by (2.1) and (2.2) (with the
understanding that $\underline{t}$ is viewed as a function on $\mathcal{X}^n$ as $t_\alpha(\underline{x}) = t_\alpha(\underline{x}_\alpha)$
for $1 \le \alpha \le n$). The notion of asymptotic optimality remains the same.

A sequence compound rule $\underline{t}$ (or $\underline{t}^n$) is said to be admissible if, for
each $n \ge 1$, $R_n(\underline{t},\underline{\theta})$ (or $R_n(\underline{t}^n,\underline{\theta})$) is admissible in the usual decision theoretic
sense as a function of $\underline{\theta}$ in the class of sequence compound rules. [This is the
natural definition for w.s.c. rules and therefore more demanding of s.s.c.
rules.]

4. The empirical Bayes problem.

The empirical Bayes problem considers a sequence of independent,
identical Bayes decision problems.

In this case the component problem is the same as that in Section 1
with the additional notion that the component parameter $\theta$ is a random
element having a (prior) distribution $\omega$ on $\Theta$. Thus the risk of a component
rule $t$ is

$R(t,\omega) = \int R(t,\theta)\,d\omega$.

The prior $\omega$ is unknown to the statistician even though it is believed to
exist.

The empirical Bayes problem has generic point $\underline{\theta} = (\theta_1, \theta_2, \ldots)$
representing the true states and data $\underline{X} = (X_1, X_2, \ldots)$ from the problems
and the assumption is that $(\theta_1,X_1), (\theta_2,X_2), \ldots$ are i.i.d. copies of $(\theta,X)$
having distribution $\omega$ on $\theta$ and, conditional on $\theta$, $P_\theta$ on $X$. Let $E$ denote
the overall expectation. At stage $n$, a decision $t_n(\underline{X}_n)$ about $\theta_n$ is taken by
$t_n: \mathcal{X}^n \to \mathcal{A}$ with loss $L(t_n,\theta_n)$, which is jointly measurable, and the risk
incurred is $R_n(t_n,\omega) = E\,L(t_n,\theta_n) = \int\!\!\int L(t_n,\theta_n)\,dP_{\underline{\theta}_n}\,d\omega^n$. We call
$\langle t_n : n \ge 1 \rangle$ an empirical Bayes (e.B.) rule.

An e.B. rule $\langle t_n \rangle$ is called asymptotically optimal if

$\lim_{n \to \infty} R_n(t_n,\omega) = R(\omega)$, $\forall\ \omega \in \Omega$.

An e.B. rule $\langle t_n \rangle$ is said to be admissible if for each $n$, $R_n(t_n,\omega)$ is
admissible in the usual decision theoretic sense as a function of $\omega$ in the class
of possible $t_n$.

5. Literature review and a summary of the present work.

The pioneering paper of compound decision theory is by Robbins
(1951). His featured example was decision between $N(-1,1)$ and $N(1,1)$. He
exhibited an a.o. compound procedure and called it asymptotically
subminimax by comparison with the simple symmetric minimax rule.

A.o. compound and empirical Bayes rules have been worked out for
many choices of component problem. Typically they are bootstrap (or delete
bootstrap) in nature: rules whose components are Bayes versus some
estimate of the unknown $G_n$ (or $\omega$ in the e.B. case) or direct estimates of
the Bayes rule versus $G_n$ (or $\omega$). In particular, when the component problem
is an estimation problem under squared error loss, Gilliland (1968) and Singh
(1974) obtained a.o. sequence compound rules with rates (we say $\underline{t}$ is a.o.
with rate $a_n$ if $\bigvee_{\underline{\theta}} D_n(\underline{t},\underline{\theta})^+ = O(a_n)$) for discrete and Lebesgue exponential
components respectively.

Though the above mentioned rules satisfy the criterion of asymptotic
optimality, they are not very satisfactory as far as their finite $n$ behaviors
are concerned. In fact they turn out to be inadmissible where admissible is as
defined in the previous sections.

Thus the problem of exhibiting compound and e.B. rules which are a.o.
as well as admissible has been an interesting and challenging question ever
since it was put forward by Robbins (1951) in the sense that he proposed the
Bayes compound rule versus the symmetric prior uniform on proportions for
his featured example and conjectured that it might have better risk behavior
than his asymptotically subminimax rule. Inglis (1973) studied, i.a., the
asymptotic optimality of a class of admissible Bayes rules for two state
components under the finiteness of the expected log-densities and tacit (cf.
Inglis (1977)) non-atomicity conditions for his "generalization" of the
Hannan-Robbins theorem; cf. the addenda of the next two works. Gilliland
and Hannan (1974, The finite state compound decision problem, equivariance
and restricted risk components, RM 317, Department of Statistics and
Probability, Michigan State University), which was later published in 1986,
treated the more general problem of restricted risk components in the finite
$\Theta$ case. They worked with a more stringent envelope and reduced the
problem of asymptotic optimality to the problem of establishing the $L_1$
consistency of certain induced estimators. Gilliland, Hannan and Huang (1976)
established that consistency, in two state components, for Bayes compound
estimators versus certain symmetric priors including Robbins' prior. This
approach yielded for them admissible rules which are a.o. with rates as good
as $O(n^{-1/2})$ in the general two state component case. Vardeman (1978)
successfully exploited a result by the last authors to obtain admissible, a.o.
sequence compound rules in the two state component case.

None of the results mentioned in the previous paragraph go beyond the
finite $\Theta$ case. Meeden (1972) obtained admissible, a.o. empirical Bayes rules in
two special infinite state examples, where the component problems are (i)
squared error loss estimation of the Geometric parameter and (ii) linear loss
testing of the Poisson mean. Inglis (1973) attempted to prove the admissibility
and asymptotic optimality of a class of Bayes compound estimators versus
mixtures of i.i.d. priors in example (i) above with compact parameter space.
Unfortunately his proof of asymptotic optimality appears to contain certain
serious gaps. For a discussion on this see the addendum of Gilliland and
Hannan (1986).

The present work, which subsumes Inglis's example, seems to be the
first successful attempt in the literature to accomplish compound admissibility
and asymptotic optimality simultaneously in the non-finite state case. Our
component distributions form a one dimensional exponential family of quite
general nature, whose examples include well known exponential families such
as Normal, Exponential, Geometric, Poisson and Negative Binomial, where the
parameter space is any compact interval of the natural parameter space on
which the first moment is finite. The component problem is to estimate an
arbitrary continuous transform of the natural parameter under squared error
loss. We note that all compound Bayes estimators in our set and sequence
compound problems are admissible. Our main result is that those Bayes
versus a mixture of i.i.d. priors on the compound parameter are a.o. if the
mixing hyperprior has full support. In the empirical Bayes version of our
problem, our conclusion is that all Bayes e.B. estimators are admissible and
those versus a prior with full support are a.o.

In the set compound situation, for a dense class of continuous
functions, the question of asymptotic optimality is reduced to the question of
the $L_1$ consistency of a posterior mixture which itself is of independent
interest. We make use of an inequality suggested in the addendum of
Gilliland, Hannan and Huang (1976), develop and use a uniform $L_1$-LLN
and the full support property of $\Lambda$ to treat the resulting terms and prove the
required consistency. The results in the sequence compound and empirical
Bayes problems, to a large extent, follow from the compound results.

The thesis is organized as follows.

Chapter 1 treats the set compound problem described above. Section 2
obtains the admissibility of all Bayes estimators from Lemma A.4, describes
the Bayes estimator versus the above mentioned prior and establishes a bound
on its modified regret. Section 3 establishes a bound, in terms of the $L_1$
distance between the corresponding mixtures, on the $L_1$ distance between two
component Bayes estimators of $\phi(\theta) = e^{\theta k}$ for $k \in \mathbb{N}$ and $\theta$ the natural
parameter. Section 4 proves the consistency result as detailed in the previous
paragraph and adapts it to the delete versions. Section 5 combines the results
of Sections 2-4 and establishes the desired asymptotic optimality.

Chapters 2 and 3 consider the sequence compound and empirical Bayes
formulations respectively and obtain similar admissibility and optimality
results.

Chapter 4 contains various examples of hyperpriors having full support.
In some cases practically useful forms of the Bayes estimators are obtained.
To this end some possible generalizations are recorded.

Finally some results of possible independent interest in more general
contexts, which include the aforementioned uniform $L_1$-LLN for random
continuous functions on a compact metric space, are derived in the appendix.
They are used in the body of the thesis.

CHAPTER 1
THE SET COMPOUND ESTIMATION

1. Introduction.
1.1. Notations and conventions.

In this chapter we consider the set compound problem as described in
Chapter 0 corresponding to the component problem to be introduced in the
next section. We assume that the reader is familiar with the notations and
definitions of the general set compound problem from Chapter 0. The
following additional notations and conventions will be used throughout this
chapter.

We will use $P$ for $P_{\underline{\theta}} = \times_{\alpha=1}^{n} P_{\theta_\alpha}$ and $E$ for the corresponding expectation.
Given any vector $\underline{u} = (u_1, \ldots, u_n)$, for each $1 \le \alpha \le n$, $\underline{u}_{\check{\alpha}}$ will mean the
vector $(v_1, \ldots, v_{n-1})$ with $v_j = u_j$ for $j < \alpha$ and $= u_{j+1}$ for $j \ge \alpha$. Since $\mathcal{X} =
\mathbb{R}$ in our problem (see the next section), we will view $X_1, \ldots, X_n$ to be the
co-ordinate functions on $\mathbb{R}^n$. On the other hand, any measurable function $H$
on $\mathbb{R}^n$ will be viewed as the random variable $H(\underline{X})$ whenever convenient.
The next section describes our component problem and records a few
elementary but useful results related to the component distributions. A
summary of the rest of the chapter was given in Section 5 of Chapter 0.

1.2. Exponential family component.

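Our component problem is the following. $\Theta = [c,d]$, $-\infty < c < d < \infty$,
$\mathcal{A} = \mathbb{R}$, $L(a,\theta) = (a - \phi(\theta))^2$ where $\phi$ is any real continuous function on
$\Theta$ and $\forall\ \theta \in \Theta$, $P_\theta$ admits a density $p_\theta$ wrt a common $\sigma$-finite $\mu$ on $\mathbb{R}$
given by

(1.1)   $p_\theta(x) = e^{\theta x} h(\theta)$, $x \in \mathbb{R}$.

In addition it is assumed to satisfy $\mu_1 \ll \mu$, where $\mu_k = \mu s_k^{-1}$, $s_k(x) =
x + k$, $x \in \mathbb{R}$; $k \in \mathbb{I}$.
Clearly $c, d \in \{\theta : \int e^{\theta x}\,d\mu(x) < \infty\}$, the natural parameter space.
Throughout we will assume

(1.2)   $\int |x|\,e^{cx}\,d\mu(x) < \infty$,   $\int |x|\,e^{dx}\,d\mu(x) < \infty$.

It is well known (e.g. Lehmann (1959, 1986), Theorem 2.9) that (1.2) holds if
$c, d \in$ interior of the natural parameter space.

For $\omega \in \Omega$, let $P_\omega$ denote the $\omega$-mix of $P_\theta$'s and $p_\omega$ denote its $\mu$
density, $p_\omega(x) = \int p_\theta(x)\,d\omega$, $x \in \mathbb{R}$.

The following consequences of our assumptions are worth noting and
some will be used in the later sections.

C1. $h(\theta) = (\int e^{\theta x}\,d\mu(x))^{-1}$ and $h$ is continuous and positive on
compact $\Theta$. Consequently, $0 < h_* \le h^* < \infty$.

[Indeed $h_* = h(c) \wedge h(d)$, since $\log h$ is concave by the Hölder inequality.]

C2. $\theta \rightsquigarrow \int x\,e^{\theta x}\,d\mu(x)$ is continuous on $\Theta$.

[Since $e^{cx} \wedge e^{dx} \le e^{\theta x} \le e^{cx} \vee e^{dx}$ $\forall\ \theta \in \Theta$, the continuity of $h$ and
$\theta \rightsquigarrow \int x\,e^{\theta x}\,d\mu(x)$ including one sided continuity at the end points follow
readily by the Dominated Convergence Theorem (D.C.T. hereafter).]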

C3. For any $\theta \in \Theta$,

(1.3)   $p_\theta \le (h^*/h_*)(p_c + p_d)$,

and hence any $f \in L_1(P_c) \cap L_1(P_d)$ is uniformly integrable wrt the family
of probability measures $\{P_\theta,\ \theta \in \Theta\}$. In view of (1.2) the identity function
is such.

C4. Since, for any $\theta \in \Theta$,

(1.4)   $h_*(e^{cx} \wedge e^{dx}) \le p_\theta(x) \le h^*(e^{cx} \vee e^{dx})$,

and for any $\omega \in \Omega$, $p_\omega$ inherits the above bound (1.4) on $p_\theta$, we have

(1.5)   $\left|\log \dfrac{p_\omega(x)}{p_{\omega'}(x)}\right| \le |x|(d - c) + \log \dfrac{h^*}{h_*}$, $x \in \mathbb{R}$,

for any $\omega, \omega' \in \Omega$.
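
The bounds (1.3) and (1.5) can be checked numerically in a concrete family. The following is a minimal sketch, assuming the Poisson family written in natural form ($\mu$ puts mass $1/x!$ on $x = 0, 1, 2, \ldots$ and $h(\theta) = \exp(-e^\theta)$) and finitely supported priors; the function names and the grid are illustrative choices, not part of the thesis.

```python
import math

def h(theta):
    """Normalizer of the Poisson family in natural form: h(theta) = exp(-e^theta)."""
    return math.exp(-math.exp(theta))

def p(theta, x):
    """Density e^{theta x} h(theta) with respect to mu (mass 1/x! at x = 0, 1, 2, ...)."""
    return math.exp(theta * x) * h(theta)

def p_mix(prior, x):
    """Mixture density p_omega(x) for a prior given as [(theta, weight), ...]."""
    return sum(w * p(t, x) for t, w in prior)

def check_bounds(c, d, prior1, prior2, xs, grid=50):
    """Return (does (1.3) hold on the grid, does (1.5) hold on xs)."""
    # For this family h is monotone, so h_* and h^* are attained at the endpoints.
    h_lo, h_hi = min(h(c), h(d)), max(h(c), h(d))
    thetas = [c + (d - c) * i / grid for i in range(grid + 1)]
    ok_13 = all(
        p(t, x) <= (h_hi / h_lo) * (p(c, x) + p(d, x)) * (1 + 1e-12)
        for t in thetas for x in xs
    )
    ok_15 = all(
        abs(math.log(p_mix(prior1, x) / p_mix(prior2, x)))
        <= abs(x) * (d - c) + math.log(h_hi / h_lo) + 1e-12
        for x in xs
    )
    return ok_13, ok_15
```

For example, `check_bounds(-1.0, 1.0, [(-1.0, 0.5), (1.0, 0.5)], [(0.0, 1.0)], range(31))` checks both inequalities on $x = 0, \ldots, 30$.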

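2. Estimators induced by priors on $\Omega$.
2.1. Bayes versus mixture of i.i.d. priors.
Consider $\Omega$ with the topology of weak convergence. Let $\mathcal{B}(\Omega)$ denote
the Borel $\sigma$-field of $\Omega$. Let $\Lambda$ be a probability on $(\Omega, \mathcal{B}(\Omega))$. Define the prior
(for each $n$) $\xi_\Lambda^n$ on $\Theta^n$ as follows

(2.1)   $\xi_\Lambda^n(B_1 \times \cdots \times B_n) = \int \prod_{i=1}^{n} \omega(B_i)\,d\Lambda$,

for $B_1, \ldots, B_n$ Borels of $\Theta$. [Note that the above integral makes sense
because the integrand is non-negative and measurable. For a proof of
measurability it suffices to take $n = 1$ and $B_1$ open. But then it follows
from a defining property of weak convergence (Billingsley (1968), Theorem
2.1.iv) that $\{\omega : \omega(B_1) > c\}$ is open and hence measurable $\forall\ c > 0$.] Hereafter
we will drop the superscript $n$ in $\xi_\Lambda^n$, as it will be clear from the context.

By the Fubini Theorem, the $\alpha$-component Bayes risk against $\xi_\Lambda$ of an
estimator $\underline{t} = (t_1, \ldots, t_n)$ in our compound problem is

(2.2)   $R_\alpha(\underline{t}, \xi_\Lambda) = \int_{\mathbb{R}^n} \Big( \int_{\Theta^n} (t_\alpha - \phi(\theta_\alpha))^2 \prod_{i=1}^{n} p_{\theta_i}(x_i)\,d\xi_\Lambda(\underline{\theta}) \Big)\,d\mu^n(\underline{x})$.

Let $e^{g_\alpha(\omega)} = \prod_{i \ne \alpha} p_\omega(x_i)$, $\Lambda_\alpha$ denote the probability measure on $\Omega$ with
density proportional to $e^{g_\alpha}$ wrt $\Lambda$ and $\omega_\alpha = \Lambda_\alpha \circ \omega$ (the $\Lambda_\alpha$-mixture of the
$\omega$'s). By the Fubini Theorem (on the space $\Omega \times \Theta^n$), the inner integral in (2.2) is

$\int \Big( \int (t_\alpha - \phi(\theta_\alpha))^2\,p_{\theta_\alpha}(x_\alpha)\,d\omega \Big)\,e^{g_\alpha(\omega)}\,d\Lambda$,

which then, by definitions of $g_\alpha$ and $\omega_\alpha$, equals

(2.3)   $\Big( \int e^{g_\alpha(\omega)}\,d\Lambda \Big) \int (t_\alpha - \phi(\theta_\alpha))^2\,p_{\theta_\alpha}(x_\alpha)\,d\omega_\alpha(\theta_\alpha)$.

The compound risk being the average of the risks across the
components, it follows from (2.2) and (2.3) that the estimator which plays
component Bayes versus $\omega_\alpha$ in the $\alpha$-th estimation $\forall\ \alpha = 1, \ldots, n$ is
Bayes versus $\xi_\Lambda$ in the compound problem. Let $\tau_\omega$ denote the Bayes
estimator of $\phi(\theta)$ versus a prior $\omega$ on $\Theta$ in the component problem,

$\tau_\omega(x) = \int \phi(\theta)\,p_\theta(x)\,d\omega \,/\, p_\omega(x)$.

Thus the Bayes estimator $\hat{\underline{t}}$ versus $\xi_\Lambda$ is then given by

(2.4)   $\hat{t}_\alpha(\underline{x}) = \tau_{\omega_\alpha}(x_\alpha)$, $\alpha = 1, \ldots, n$.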
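
The component Bayes estimator $\tau_\omega$ is straightforward to evaluate when the prior has finite support. The sketch below is only an illustration under assumptions not made in the thesis: the Poisson family in natural form, a prior given as a list of (theta, weight) pairs, and hypothetical function names.

```python
import math

def p(theta, x):
    """Poisson density in natural form: e^{theta x} h(theta), h(theta) = exp(-e^theta)."""
    return math.exp(theta * x - math.exp(theta))

def tau(prior, x, phi):
    """Component Bayes estimate of phi(theta) at x under squared error loss,
    against a finitely supported prior [(theta, weight), ...]."""
    numerator = sum(w * phi(t) * p(t, x) for t, w in prior)
    denominator = sum(w * p(t, x) for t, w in prior)  # mixture density p_omega(x)
    return numerator / denominator

if __name__ == "__main__":
    prior = [(-1.0, 0.5), (1.0, 0.5)]
    # Estimate the Poisson mean e^theta, a continuous transform of theta.
    for x in (0, 2, 5):
        print(x, tau(prior, x, math.exp))
```

The estimate always lies between the smallest and largest values of $\phi$ on the support of the prior, consistent with the remark in Section 2.3 that the Bayes estimates take values in $\phi[\Theta]$.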
2.2. Admissibility.

Since, for each $\underline{\theta} \in \Theta^n$, $P$ and $\mu^n$ are mutually absolutely
continuous, we have $P \ll \int P\,d\zeta$ for any prior $\zeta$ on $\Theta^n$. Hence, by an
immediate application of Lemma A.4, we get that for each $n \ge 1$, all Bayes
estimators in our compound problem are admissible. In particular, $\hat{\underline{t}}$ is so.

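2.3. A useful inequality on the absolute modified regret.
Recall that $G_n$ stands for the empirical distribution of $\theta_1, \ldots, \theta_n$. Let
$\tilde{\underline{t}}$ be the Bayes estimator versus $G_n$. Then $\tilde{t}_\alpha(\underline{x}) = \tau_{G_n}(x_\alpha)$, $1 \le \alpha \le n$,
and the modified regret of $\hat{\underline{t}}$ at $\underline{\theta}$ is

(2.5)   $D_n(\hat{\underline{t}}, \underline{\theta}) = n^{-1} \sum_{\alpha=1}^{n} \{E(\hat{t}_\alpha - \phi(\theta_\alpha))^2 - E(\tilde{t}_\alpha - \phi(\theta_\alpha))^2\}$.

Since $\hat{t}_\alpha$, $\tilde{t}_\alpha$, $\phi(\theta_\alpha) \in \phi[\Theta]$,

(2.6)   $|D_n(\hat{\underline{t}}, \underline{\theta})| \le 2\,\mathrm{diam}\,\phi[\Theta]\; n^{-1} \sum_{\alpha=1}^{n} E|\hat{t}_\alpha - \tilde{t}_\alpha|$.

Note that it follows from the definitions of $\hat{t}_\alpha$ and $\tilde{t}_\alpha$ that
$E|\hat{t}_\alpha - \tilde{t}_\alpha| = E_{\underline{\theta}_{\check{\alpha}}} E_{\theta_\alpha} |\tau_{\omega_\alpha} - \tau_{G_n}|$. Hence in order to investigate the bound
on the modified regret given by (2.6), it is useful to consider $E_\theta|\tau_\omega - \tau_{\omega'}|$,
for $\theta \in \Theta$ and $\omega, \omega' \in \Omega$.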

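3. A bound on the $L_1(E_\theta)$ distance between two component Bayes rules when
$\phi(\theta) = e^{\theta k}$, $k \in \mathbb{N}$.

Throughout this section, let $\phi(\theta) = e^{\theta k}$ for some $k \in \mathbb{N}$. For this
special case, we establish a bound, uniform in $\theta$, essentially in terms of the
$L_1$ distance between the corresponding mixtures.

For any two $\omega, \omega' \in \Omega$, let

$\|P_\omega - P_{\omega'}\| = \int |p_\omega - p_{\omega'}|\,d\mu$,

and for a function $f$ on $\mathbb{R}$ and any $k \in \mathbb{N}$, let $f^{(k)} = f \circ s_k$.
Note that $\mu_k \ll \mu$ follows from the assumption $\mu_1 \ll \mu$.

Lemma 3.1. For any $\omega$ and $\omega' \in \Omega$ and $m, m' \in (0,\infty)$,

(3.1)   $\dfrac{h_*}{h^*} \bigvee_\theta E_\theta|\tau_\omega - \tau_{\omega'}| \le (e^{dk} - e^{ck})(P_c + P_d)[\,|\cdot| > m \text{ or } f^{(k)} > m'\,]$
        $\qquad +\; e^{(d-c)m}(2e^{dk} - e^{ck} + m')\,\|P_\omega - P_{\omega'}\|$,

with $f = d\mu_k/d\mu$.

Proof. Since $\tau_\omega$ and $\tau_{\omega'} \in (e^{ck}, e^{dk})$, by C3 it follows that

$\dfrac{h_*}{h^*}\, E_\theta|\tau_\omega - \tau_{\omega'}|\,[\,|\cdot| > m \text{ or } f^{(k)} > m'\,]$

is bounded by the first term in the RHS of (3.1). To bound the expectation
over the other region first observe that

$\tau_\omega(x) = \int e^{\theta k} p_\theta(x)\,d\omega \,/\, p_\omega(x) = p_\omega^{(k)}(x)/p_\omega(x)$

for any $\omega \in \Omega$ since $e^{\theta k} p_\theta(x) = p_\theta(x+k) = p_\theta^{(k)}(x)$. Lemma A.1 then
applies to yield

(3.2)   $p_\omega|\tau_\omega - \tau_{\omega'}| \le (2e^{dk} - e^{ck})\,|p_\omega - p_{\omega'}| + |p_\omega^{(k)} - p_{\omega'}^{(k)}|$.

Since $h_*\,p_\theta \le h^*\,e^{(d-c)|\cdot|}\,p_\omega$ follows from (1.5), (3.2) shows that
$(h_*/h^*)\,E_\theta|\tau_\omega - \tau_{\omega'}|\,[\,|\cdot| \le m,\ f^{(k)} \le m'\,]$ is bounded by

$e^{(d-c)m}\Big\{ (2e^{dk} - e^{ck})\,\|P_\omega - P_{\omega'}\| + \int_{f^{(k)} \le m'} |p_\omega^{(k)} - p_{\omega'}^{(k)}|\,d\mu \Big\}$.

This completes the proof because, by the transformation theorem, the above
integral wrt $\mu$ is

$\int_{f \le m'} |p_\omega - p_{\omega'}|\,d\mu_k = \int_{f \le m'} |p_\omega - p_{\omega'}|\,f\,d\mu \le m'\,\|P_\omega - P_{\omega'}\|$.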
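
The shift identity $\tau_\omega(x) = p_\omega^{(k)}(x)/p_\omega(x)$ used in the proof above lends itself to a quick numerical check. The sketch below again assumes the Poisson family in natural form and a finitely supported prior; the names are hypothetical and serve only this illustration.

```python
import math

def p_mix(prior, x):
    """Mixture density p_omega(x) = sum_j w_j e^{theta_j x} exp(-e^{theta_j})."""
    return sum(w * math.exp(t * x - math.exp(t)) for t, w in prior)

def tau_exp_direct(prior, x, k):
    """Bayes estimate of e^{theta k}, computed from the definition of tau_omega."""
    numerator = sum(w * math.exp(t * k) * math.exp(t * x - math.exp(t)) for t, w in prior)
    return numerator / p_mix(prior, x)

def tau_exp_by_shift(prior, x, k):
    """The same estimate via the identity tau_omega(x) = p_omega(x + k) / p_omega(x)."""
    return p_mix(prior, x + k) / p_mix(prior, x)

if __name__ == "__main__":
    prior = [(-0.5, 0.3), (0.2, 0.5), (1.0, 0.2)]
    for x in (0, 1, 4):
        for k in (0, 1, 3):
            print(x, k, tau_exp_direct(prior, x, k), tau_exp_by_shift(prior, x, k))
```

With $k = 1$ this is the familiar ratio form of the Bayes estimate of a Poisson mean.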

4. Consistency of the posterior mixtures.
In view of (3.1) and the paragraph following (2.6), the question of null
convergence of the modified regret reduces to the question, loosely speaking,
whether $P_{\omega_\alpha}$ is $L_1$ consistent for $P_{G_n}$. In Theorem 4.1 we establish such a
consistency result for the non-delete version. The result for the delete
versions will follow as a corollary (i.e. Corollary 4.1). Note that the proof of
this theorem can be extended far beyond the exponential families of Section
1.

Replace $n$ by $n+1$ in Section 2.1. Denote $g_{n+1}$, $\Lambda_{n+1}$, $\omega_{n+1}$ (of
the second paragraph of 2.1) by $g$, $\hat\Lambda$, $\hat\omega$ respectively. [$\hat\omega$ can be shown to be
the posterior distribution of $\theta_{n+1}$ given the data $\underline{X} = (X_1, \ldots, X_n)$ under
the prior $\xi_\Lambda^{n+1}$ on $(\theta_1, \ldots, \theta_n, \theta_{n+1})$.] We will use $\hat\omega_{(n)}$, if necessary, to
exhibit the number of arguments.
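
To make the objects $g$, $\hat\Lambda$ and $\hat\omega$ concrete, the following sketch computes them when $\Lambda$ is replaced by a hyperprior with finitely many atoms. This is a simplification for illustration only: such a hyperprior does not have full support, so the consistency results of this section do not apply to it. The Poisson component family and all names are assumptions.

```python
import math

def p(theta, x):
    """Poisson density in natural form: e^{theta x} h(theta), h(theta) = exp(-e^theta)."""
    return math.exp(theta * x - math.exp(theta))

def p_mix(prior, x):
    """Mixture density p_omega(x) for a finitely supported prior [(theta, weight), ...]."""
    return sum(w * p(t, x) for t, w in prior)

def posterior_mixture(hyperprior, data):
    """Return (weights of Lambda-hat, omega-hat as a dict {theta: weight}).

    hyperprior: list of (prior, weight) pairs, a finitely supported stand-in for Lambda.
    """
    # g(omega) = sum over the data of log p_omega(x_alpha)
    g = [sum(math.log(p_mix(prior, x)) for x in data) for prior, _ in hyperprior]
    g_max = max(g)  # subtract the maximum before exponentiating, for numerical stability
    unnormalized = [lam * math.exp(gi - g_max) for (_, lam), gi in zip(hyperprior, g)]
    total = sum(unnormalized)
    lam_hat = [u / total for u in unnormalized]  # density of Lambda-hat is proportional to e^g
    omega_hat = {}
    for (prior, _), weight in zip(hyperprior, lam_hat):
        for theta, w in prior:
            omega_hat[theta] = omega_hat.get(theta, 0.0) + weight * w  # Lambda-hat mixture of omegas
    return lam_hat, omega_hat
```

For instance, with two degenerate priors at $\theta = -1$ and $\theta = 1$ and data consisting mostly of zeros, $\hat\Lambda$ concentrates on the prior at $\theta = -1$.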

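We will first prove three lemmas which will be used for the proof of
Theorem 4.1.

Let

$\gamma = \bigvee_\omega \Big| n^{-1} \sum_{\alpha=1}^{n} \log p_\omega(x_\alpha) - \int \log p_\omega\,dP_{G_n} \Big|$

and for any $\omega, \omega' \in \Omega$,

$\Delta_{\omega'}(\omega) = \int \log(p_{\omega'}/p_\omega)\,dP_{\omega'}$.

Lemma 4.1. For each $\delta > 0$,

(4.1)   $\tfrac{1}{2}\,E\|P_{\hat\omega} - P_{G_n}\| \le \sqrt{2\delta} + P(\gamma > \delta/4) + \dfrac{e^{-n\delta/2}}{\Lambda(\mathcal{N}_\delta)}$,

where $\mathcal{N}_\delta = \{\omega : \Delta_{G_n}(\omega) < \delta\}$.

Proof. By definition of $\hat\omega$,

(4.2)   $\|P_{\hat\omega} - P_{G_n}\| = \int |p_{\hat\Lambda \circ \omega} - p_{G_n}|\,d\mu$
        $= \int \Big| \int \Big( \int p_\theta\,d\omega - p_{G_n} \Big)\,d\hat\Lambda(\omega) \Big|\,d\mu$   by Fubini on $\Omega \times \Theta$,
        $\le \int\!\!\int |p_\omega - p_{G_n}|\,d\hat\Lambda(\omega)\,d\mu = \int \|P_\omega - P_{G_n}\|\,d\hat\Lambda(\omega)$.

The last two steps follow by taking absolute value under the integral and
Fubini on $\Omega \times \mathbb{R}$ respectively.

For any $\omega$, by (3.6) of Hannan (1960),

$\tfrac{1}{2}\,\|P_\omega - P_{G_n}\| \le \sqrt{\Delta_{G_n}(\omega)}$.

Clearly LHS above $\le 1$ everywhere and, by the above, $< \sqrt{2\delta}$ on $\mathcal{N}_{2\delta}$.
Combining this with (4.2), we get

(4.3)   $\tfrac{1}{2}\,\|P_{\hat\omega} - P_{G_n}\| \le \sqrt{2\delta} + \hat\Lambda(\mathcal{N}_{2\delta}^c)$,

where the superscript $c$ denotes complement.
Since $\hat\Lambda$ has density wrt $\Lambda$ proportional to $e^g$ and $\gamma$ is the sup norm
of $n^{-1}g + \Delta_{G_n} - \int \log p_{G_n}\,dP_{G_n}$, one easily gets (equation (iii)' of the
addendum of Gilliland, Hannan and Huang (1976))

(4.4)   $\dfrac{\hat\Lambda(\mathcal{N}_{2\delta}^c)}{\Lambda(\mathcal{N}_{2\delta}^c)} \le \dfrac{e^{-2n\delta + n\gamma}}{\Lambda(\mathcal{N}_\delta)\,e^{-n\delta - n\gamma}}$

by bounding $g$ above on $\mathcal{N}_{2\delta}^c$ and below on $\mathcal{N}_\delta$. Since $\Lambda$ is a probability
the LHS bounds $\hat\Lambda(\mathcal{N}_{2\delta}^c)$; while on the set $[\gamma \le \delta/4]$, the RHS is bounded
by $e^{-n\delta/2}/\Lambda(\mathcal{N}_\delta)$. Using these in (4.3) and taking expectation we get the
asserted bound.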

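Lemma 4.2. $E\gamma \to 0$ uniformly in $\underline{\theta}$ as $n \to \infty$.

Proof. The conclusion readily follows by an application of Theorem
A.3 with $S = \Omega$, $d$ = the Lévy metric, $I = \Theta^\infty$, $P_{\underline{\theta}} = \times_\alpha P_{\theta_\alpha} = P$ for
$\underline{\theta} \in \Theta^\infty$, $H_\alpha(\omega) = \log p_\omega(X_\alpha)$, $\omega \in \Omega$, $\alpha \ge 1$.

$(S,d)$ is a compact metric space by Helly's theorem.

Continuity of $\omega \rightsquigarrow p_\omega(x)$ follows from the continuity and the
boundedness of $\theta \rightsquigarrow p_\theta(x)$. Thus the $H_\alpha$'s satisfy (i). We verify (ii+) and
(iii+) of Remark A.3 in the present situation.

For (ii+), use (1.4) to observe that

$\|H_\alpha\| \le |X_\alpha|(-c \vee d) + \log(h_*^{-1} \vee h^*)$,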
and

Il
*(|X|-M) Slimsup v VP([X|—M) =v1> |.|-M) 10

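as $M \uparrow \infty$ in view of (1.3) and (1.2).

Next observe that $p_\omega$ is convex (because it is log-convex by Hölder)
and continuous (by continuity of $p_\theta$ and D.C.T.). Using also the previously
noted continuity of $\omega \rightsquigarrow p_\omega(x)$ $\forall\ x$, it follows by Lemma A.2 that for
any $m < \infty$,

(4.5)   $\bigvee_{|x| \le m} |p_\omega(x) - p_{\omega_0}(x)| \to 0$ as $\omega \to \omega_0$.

Let $V_{\omega_0,\rho,\alpha} = \bigvee\{|H_\alpha(\omega) - H_\alpha(\omega_0)| : d(\omega,\omega_0) < \rho\}$. Since $(a \wedge b)\,|\log(b/a)| \le |b - a|$ for
any $a, b \in (0,\infty)$, (4.5) implies that for each $\epsilon > 0$ and $m < \infty$, $\exists\ \rho_0 > 0$
such that

(4.6)   $[\,V_{\omega_0,\rho,\alpha} > \epsilon,\ |X_\alpha| \le m\,] = \emptyset$, $\forall\ \alpha$ if $\rho < \rho_0$.

Hence

(4.7)   $\limsup_{\rho \downarrow 0} \bigvee_{\alpha,\underline{\theta}} P[V_{\omega_0,\rho,\alpha} > \epsilon] \le \bigvee_{\alpha,\underline{\theta}} P[\,|X_\alpha| > m\,] = \bigvee_\theta P_\theta[\,|\cdot| > m\,]$.

Now let $m \uparrow \infty$ to conclude, LHS of (4.7) $= 0$. This establishes (iii+).

Also $\bigvee_\omega \big| n^{-1} \sum_\alpha \log p_\omega(X_\alpha) - n^{-1} \sum_\alpha P \log p_\omega(X_\alpha) \big| = \gamma$.
Hence by Theorem A.3, $\limsup_n \bigvee_{\underline{\theta}} E\gamma = 0$.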

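Lemma 4.3. If support of $\Lambda = \Omega$ then for any $\delta > 0$,

$\bigwedge_{\omega_0 \in \Omega} \Lambda(\{\Delta_{\omega_0} < \delta\}) > 0$.

Proof. Fix $\delta > 0$. For an $\omega_0 \in \Omega$, the continuity of the function
$\omega \rightsquigarrow \int \log(p_{\omega_0}/p_\omega)\,dP_{\omega_0}$ follows by D.C.T. since $\log p_{\omega_n} \to \log p_\omega$ as
$\omega_n \to \omega$ and (1.5). So the set $\{\Delta_{\omega_0} < \delta\}$ is open, non-empty (because it
contains $\omega_0$) and hence

(4.8)   $\Lambda(\{\Delta_{\omega_0} < \delta\}) > 0$,

since $\Lambda$ has full support.

Next observe that, since the functions $\Delta_{\omega_n}$ converge to $\Delta_{\omega_0}$ pointwise
and hence in $\Lambda$-distribution if $\omega_n \to \omega_0$, by the defining property of the
latter convergence (cf. our Section 2.1 usage)

$\liminf_n \Lambda(\{\Delta_{\omega_n} < \delta\}) \ge \Lambda(\{\Delta_{\omega_0} < \delta\})$ if $\omega_n \to \omega_0$.

This shows that the function $\omega_0 \rightsquigarrow \Lambda(\{\Delta_{\omega_0} < \delta\})$ is lower semi-continuous.
Hence it attains its infimum because $\Omega$ is compact. The proof
now ends by (4.8).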
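Theorem 4.1. Let the support of $\Lambda$ be $\Omega$. Then

$E\|P_{\hat\omega} - P_{G_n}\| \to 0$ uniformly in $\underline{\theta}$ as $n \to \infty$.

Proof. Fix a $\delta > 0$. Consider the bound in Lemma 4.1. Now, as
$n \to \infty$, the second term of this bound goes to zero by Lemma 4.2 and so
does the third term by Lemma 4.3, the convergences being uniform in $\underline{\theta}$.
Thus

$\limsup_n \bigvee_{\underline{\theta}} E\|P_{\hat\omega} - P_{G_n}\| \le 2\sqrt{2\delta}$.

The proof ends, $\delta > 0$ being arbitrary.

Corollary 4.1. If the support of $\Lambda = \Omega$ then

$\bigvee_{\alpha \le n} E\|P_{\omega_\alpha} - P_{G_n}\| \to 0$ uniformly in $\underline{\theta}$ as $n \to \infty$.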
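Proof. Fix $\underline{\theta} \in \Theta^n$ and $1 \le \alpha \le n$. Let $G_{n\alpha}$ denote the empirical based
on $\underline{\theta}_{\check{\alpha}}$. Then we have the following.

(i) $G_{n\alpha}$ is $G_{n-1}$ corresponding to $\underline{\theta}_{\check{\alpha}} \in \Theta^{n-1}$.
(ii) Letting $F: \mathbb{R}^{n-1} \to \Omega$ be the function such that $F(\underline{x}_{n-1}) =
\hat\omega_{(n-1)}$, it follows from the definition of $\omega_\alpha$ that $\omega_\alpha = F(\underline{X}_{\check{\alpha}})$.
(iii) Clearly $P_{\underline{\theta}}\,\underline{X}_{\check{\alpha}}^{-1} = P_{\underline{\theta}_{\check{\alpha}}}\,\underline{X}_{n-1}^{-1}$.

From (i), (ii) and (iii), we get that

$P_{\underline{\theta}}\,\|P_{\omega_\alpha} - P_{G_{n\alpha}}\|^{-1} = P_{\underline{\theta}_{\check{\alpha}}}\,\|P_{\hat\omega_{(n-1)}} - P_{G_{n-1}}\|^{-1}$.

Since $\underline{\theta}$ and $\alpha$ are arbitrary

$\bigvee_{\underline{\theta} \in \Theta^n} \bigvee_{\alpha \le n} E\|P_{\omega_\alpha} - P_{G_{n\alpha}}\| \le \bigvee_{\underline{\theta} \in \Theta^{n-1}} E\|P_{\hat\omega_{(n-1)}} - P_{G_{n-1}}\| \to 0$

as $n \to \infty$, by Theorem 4.1.
Next, because $G_n - G_{n\alpha} = n^{-1}(\delta_{\theta_\alpha} - G_{n\alpha})$ with $\delta_{\theta_\alpha}$ the distribution
degenerate at $\theta_\alpha$, the variation norm of $G_n - G_{n\alpha}$ is no more than $2n^{-1}$.
Thus, by definition of $p_\omega$ and by (1.3),

$|p_{G_n} - p_{G_{n\alpha}}| \le 2n^{-1} \bigvee_\theta p_\theta \le 2n^{-1}\,\dfrac{h^*}{h_*}\,(p_c + p_d)$.

Consequently

$\|P_{G_n} - P_{G_{n\alpha}}\| \le 2n^{-1} \int \Big( \bigvee_\theta p_\theta(x) \Big)\,d\mu \le (4h^*/h_*)\,n^{-1}$.

The proof now ends by the triangle inequality.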
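
The bound $2n^{-1}$ on the variation norm of $G_n - G_{n\alpha}$ used in the proof above can be illustrated directly. The following minimal sketch, with hypothetical function names, represents empirical distributions as dictionaries of point masses and computes the largest delete-one distance for a given parameter sequence of length at least 2.

```python
from collections import Counter

def empirical(thetas):
    """Empirical distribution of a list of parameter values, as {value: mass}."""
    n = len(thetas)
    return {t: count / n for t, count in Counter(thetas).items()}

def variation_norm(g1, g2):
    """Total variation norm of the signed measure g1 - g2 (sum of absolute differences)."""
    support = set(g1) | set(g2)
    return sum(abs(g1.get(t, 0.0) - g2.get(t, 0.0)) for t in support)

def max_delete_one_distance(thetas):
    """Largest variation norm of G_n - G_{n,alpha} over alpha = 1, ..., n."""
    g_n = empirical(thetas)
    return max(
        variation_norm(g_n, empirical(thetas[:a] + thetas[a + 1:]))
        for a in range(len(thetas))
    )

if __name__ == "__main__":
    thetas = [0.1, 0.1, 0.5, -0.3, 0.5, 0.9]
    print(max_delete_one_distance(thetas), 2 / len(thetas))
```

The bound is attained when the deleted value occurs only once in the sequence.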

(Empirical Bayes, Estimation of mixtures.)
Consider the situation where $X_1, \ldots, X_n$ are i.i.d. observations from
the mixed distribution $P_\omega$, $\omega \in \Omega$ being unknown to the statistician. The
problem is to estimate $P_\omega$. This model obtains in the usual empirical Bayes
context where $E$ is an expectation under which $\theta_1, \ldots, \theta_n$ are i.i.d. $\sim \omega$ and,
given $\underline{\theta}$, $\underline{X} \sim \times_{\alpha=1}^{n} P_{\theta_\alpha}$. It turns out that $P_{\hat\omega}$ is $L_1$ consistent for $P_\omega$
whenever $\Lambda$ has full support.

Corollary 4.2. If the support of $\Lambda = \Omega$ then, for any $\omega$,

$E\|P_{\hat\omega} - P_\omega\| \to 0$ as $n \to \infty$.

Proof. It follows from the continuity, noted in Lemma 4.2's proof, of
$\omega \rightsquigarrow p_\omega(x)$ $\forall\ x$ that $\omega \rightsquigarrow P_\omega$ is continuous in $\|\cdot\|$ by the Scheffé
theorem. Thus as $n \to \infty$, $P_{G_n} \to P_\omega$ a.s. (since $G_n \to \omega$ a.s. by
Glivenko-Cantelli) and hence $E\|P_{G_n} - P_\omega\| \to 0$ by D.C.T. The conclusion
now follows by Theorem 4.1 and the triangle inequality.

23

5. Asymptotic optimality. Now we are in a position to prove our main
result.

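Theorem 5.1 (Main result). If the support of $\Lambda = \Omega$ and $\hat t$ is the Bayes estimator given by (2.4), then

(5.1) $\bigvee_{\alpha \le n} E|\hat t_\alpha - \tilde t_\alpha| \to 0$ uniformly in $\underline\theta$ as $n \to \infty$.

Consequently $\hat t$ is a.o.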

Proof. The second part of the assertion follows from (2.6).
For the first part, first consider the case $\phi(\theta) = e^{k\theta}$, $k \in N$. By the

representation noted after (2.6),
El E. — '1‘: I = E v E | T — T I ,
a a Q 0 0a wa G11

5 [1;‘11‘ {(e —eCk)(P +P d)“. |>m or f(k)>m 1

+e‘d‘°)m(2edk- e°“ +m') En 12w - PG 11}
a n

by Lemma 3.1, where $m, m' < \infty$ are arbitrary. For each $m$ and $m'$,
the second term of the above bound is $o(1)$ uniformly in $\alpha$ and $\underline\theta$ as $n \to \infty$
by Corollary 4.1. The first term is independent of $\alpha$ and $\underline\theta$ and can be made
arbitrarily small by choosing $m$ and $m'$ large enough. This concludes the

proof in the present case.


Next let $\phi(\theta) = \sum_k a_k e^{k\theta}$ be a polynomial in $e^\theta$. By definition and the linearity property of conditional expectation (or integral) it follows that

$\hat t_\alpha = \sum_k a_k \hat t_\alpha^{[k]}$,  $\tilde t_\alpha = \sum_k a_k \tilde t_\alpha^{[k]}$,

where $\hat t_\alpha^{[k]}$ and $\tilde t_\alpha^{[k]}$ are the corresponding Bayes estimators of $e^{k\theta_\alpha}$ for each $k$. Hence (5.1) holds since it holds with $\hat t^{[k]}$ and $\tilde t^{[k]}$ for each $k$ by the previous case.
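Finally for general continuous $\phi$, given $\epsilon > 0$, choose a polynomial $p$ such that $\bigvee_\theta |\phi(\theta) - p(e^\theta)| < \epsilon$. Then, using definitions and taking absolute values under integrals,

$|\hat t_\alpha - \hat t_\alpha^{[p]}| < \epsilon$,  $|\tilde t_\alpha - \tilde t_\alpha^{[p]}| < \epsilon$,

where $\hat t^{[p]}$ and $\tilde t^{[p]}$ are the corresponding Bayes estimators of $p(e^{\theta_\alpha})$, and so

$|\hat t_\alpha - \tilde t_\alpha| \le |\hat t_\alpha^{[p]} - \tilde t_\alpha^{[p]}| + 2\epsilon$.

The proof is now complete by the previous case, $\epsilon$ being arbitrary.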

CHAPTER 2
THE SEQUENCE COMPOUND ESTIMATION

1. Introduction.

Here we consider the sequence compound version of the problem treated in Chapter 1. In this formulation, at each stage $\alpha$ we estimate $\phi(\theta_\alpha)$ by estimators based on the data $\underline X_\alpha = (X_1, ..., X_\alpha)$ then available. The sequence compound estimator, which for each $n$ plays Bayes versus $\bar\omega^n$ with the compound loss $L_n(\underline t, \underline\theta) = n^{-1}\sum_{\alpha=1}^n (t_\alpha - \phi(\theta_\alpha))^2$, turns out to be of s.s.c. type described in Chapter 0 and is a.o. if $\Lambda$ has full support. The proof reduces to a corollary to the set compound result via an inequality due to Hannan (1957).

2. Bayes versus $\bar\omega^n$.
2.1. The Bayes sequence compound estimator.

Fix an n 2 1. For ISGSII, a stage a sequence compound rule t a
has ROGER) = R(ta,EX) which is its a component Bayes risk in the set
problem with 0 components. But t a which minimizes R0032) is the
a—th component of the Bayes rule versus 52' in the set problem with or

components. Hence we get from Section 2 of Chapter 1 that in the sequence
11

problem the estimator which minimizes Rn(t,EX) = n_IE R0055?) is given
0:1

by

(2.1) lama) = rw (x0). 15am

where 1120 is as in (1.2.4) with n=a. Note that the components do not


depend on $n$ and let $\hat t = \langle \hat t_\alpha : \alpha \ge 1 \rangle$.
2.2. Admissibility.

It has been noted in Chapter 1 that the condition of Lemma A.4 holds in our case. Hence all Bayes sequence compound estimators are admissible. In particular $\hat t$ is so.

Remark 2.1. Note that for each $n$, $\hat t_n$ is Bayes versus $\bar\omega^n$ in the class of all stage $n$ sequence compound estimators $t_n$ wrt the n-th estimation loss $L(t_n, \underline\theta_n) = (t_n - \phi(\theta_n))^2$ and, as recorded in the proof of Lemma A.4, has unique risk.

3. Asymptotic Optimality.

Theorem 3.1. If $\Lambda$ has full support then $\hat t$ is a.o.

N

Proof. Let t (mun) = an(xa) and in = (I ..., I

ln’ nn)
1$a$n<oo and 'f = <fm>. Now Rn('f, .) 5 RDGD, .), since its
generalization (cf. inequality (8.8) of Hannan (1957)) holds without restriction
and hence Dn(£,Q)+ is bounded by RHS (1.2.6) with to as in this chapter
and i0 replaced by faa‘ But Since A has full support this bound is o(1)

uniformly in $\underline\theta$, because $\bigvee_{\underline\theta} E|\hat t_n - \tilde t_{nn}|$ is $o(1)$ by Theorem 1.5.1 and convergence implies Cesaro convergence.

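Remark 3.1. In fact, in the above, $\bigvee_{\underline\theta} |D_n(\hat t, \underline\theta)| = o(1)$. To prove this note that a slight extension of (2.5) of Gilliland (1968) gives

$|D_n(\hat t, \underline\theta)| \le 2\,\mathrm{diam}\,\phi[\Theta]\; n^{-1} \sum_{\alpha=1}^n E|\hat t_\alpha - \tilde t_{\alpha\alpha}| + O(n^{-1} \log n)$,

where $O(n^{-1} \log n)$ is uniform in $\underline\theta$. But the above proof has shown that the first term is $o(1)$ uniformly in $\underline\theta$.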

CHAPTER 3
THE EMPIRICAL BAYES ESTIMATION

1. Introduction.
In this chapter we look at the empirical Bayes approach to our estimation problem. An empirical Bayes estimator $t = \langle t_n : n \ge 1 \rangle$ is such that $t_n$ is a function of $\underline X_n$ with n-th estimation risk function

(1.1) $R_n(t_n, \omega) = \int\!\!\int (t_n - \phi(\theta_n))^2\, dP_{\underline\theta_n}\, d\omega^n$,  $\omega \in \Omega$.

See Chapter 0 for further details.

We will prove that the empirical Bayes estimator $\hat t = \langle \hat t_n \rangle$, where $\hat t_n$ is the Bayes estimator versus a prior $\Lambda$ on $\Omega$, is a.o. if $\Lambda$ has full support.

2. Bayes versus $\Lambda$.

For any given $n$, any estimator $t_n$ based on $\underline X_n$ has stage $n$ Bayes risk wrt $\Lambda$ in the empirical Bayes problem equal to its stage $n$ Bayes risk versus $\bar\omega^n$ in the compound problem, since the iterated integral $d\omega^n\, d\Lambda$ and the integral $d\bar\omega^n$ have the same meaning. Hence the Bayes empirical Bayes estimator of $\phi(\theta_n)$ versus $\Lambda$ is $\hat t_n$ of (2.2.1).

Admissibility. For any $n$, the stage $n$ Bayes e.B. estimators wrt $\Lambda$ are the stage $n$ Bayes sequence compound estimators wrt $\bar\omega^n$ and, since the integral $d\omega^n$ maps the risks of the latter to those of the former, inherit the uniqueness of risk from that of the latter (cf. Remark 2.2.1) and hence are admissible.

3. Asymptotic optimality.

Theorem 3.1. If $\Lambda$ has full support then the Bayes empirical Bayes estimator $\langle \hat t_n \rangle$ is a.o.,

i.e. $\forall\ \omega \in \Omega$, $R_n(\hat t_n, \omega) \to R(\omega)$ as $n \to \infty$.

Proof. First method: Since for each $n \ge 1$, $\hat t_n$ is the n-th component of the compound rule $\hat t$ of (1.2.4) whose equivariance follows from the definition of $\hat t$ and asymptotic optimality in the compound problem follows from Theorem 1.5.1, the asymptotic optimality of $\hat t_n$ in the empirical Bayes problem follows by Remark 1 of Gilliland and Hannan (1986).

Second method: A direct proof of the asymptotic optimality of $\hat t_n$ can be given in the present case along the same lines as that of Theorem 1.5.1. Interpret $E$ as $P_\omega^n$ and $\tilde t_n$ as the component Bayes estimator versus $\omega$ based on $X_n$. Then as before, it is sufficient to prove that $E|\hat t_n - \tilde t_n| \to 0$ as $n \to \infty$. The proof goes through by the same steps with the use of Corollary 1.4.2 instead of 1.4.1.

Remark 3.1. For the asymptotic optimality of the Bayes rules in the general finite state empirical Bayes problem, Gilliland, Boyer and Tsao (1982) came up with the same sufficient condition on $\Lambda$ as the present work.

CHAPTER 4
EXAMPLES OF A AND CONCLUDING REMARKS

1. Introduction.

In Chapters 1-3 we have shown that the Bayes estimators under
consideration in the set, sequence compound and the empirical Bayes versions
of our problem are all related and are asymptotically optimal if $\Lambda$ (of 1.2.1)
has full support. In Bayesian contexts, we can say that such a prior is
nonparametric in nature, a desirable property as indicated by Ferguson (1973)
and others.

Priors on the set of probability distributions, or random probabilities
or random distribution functions, have been considered by many authors in
the contexts of nonparametric Bayes and empirical Bayes estimation,
estimation of mixing distributions, etc. In Section 2 we present brief
descriptions of four examples of full support $\Lambda$ from their works (with some
modifications in the Rolph case).

In Section 3, some practically useful forms of the Bayes estimators
corresponding to some of them are obtained. It is pointed out in the
beginning of the section that the Bayes estimators can be expressed as a
ratio of two multidimensional integrals involving the posterior means of $\omega$.
This form is useful if the posteriors of $\omega$ are analytically calculable (e.g. Examples A
and B).

Section 4 contains the concluding remarks, which include some possible
generalizations of the present work and its application to some
non-exponential families. Also some related open problems are indicated.


2. Examples of $\Lambda$.
We list below five examples of $\Lambda$ with support $\Omega$.

A. (Dirichlet process) An important class of priors on the probabilities on $R$
with manageable posteriors has been introduced by Ferguson (1973). Among
many equivalent definitions, here we state Definition 1 of Ferguson
(1973, 1974). Other equivalent representations can be found in Blackwell and
MacQueen (1973), Ferguson (1974) and Sethuraman and Tiwari (1982).

Definition: Let $\gamma$ be a non-null, finite Borel measure on $R$. Then $\Lambda$ is
called the Dirichlet process prior with parameter $\gamma$ (hereafter we write $\Lambda = \mathcal{D}(\gamma)$)
if for every finite measurable partition $\{B_1, ..., B_m\}$ of $R$ the
distribution of $(\omega(B_1), ..., \omega(B_m))$ under $\Lambda$ ($\omega$ is the identity function on the
space of probabilities on $R$) is Dirichlet with parameters $(\gamma(B_1), ..., \gamma(B_m))$.

It is well known (e.g. Ferguson (1974)) that the support of $\mathcal{D}(\gamma)$ is
the set of probability distributions on $R$ whose support is contained in the
support of $\gamma$. So if we choose $\gamma$ with support of $\gamma = \Theta = [c,d]$ then $\Lambda = \mathcal{D}(\gamma)$ has support $\Omega$.

B. (Processes neutral to the right) A more general class of priors than
the Dirichlet process has been introduced by Doksum (1974).
Definition: A random distribution function $F(t)$ on the real line is said
to be neutral to the right if, for every $m$ and $t_1 < t_2 < ... < t_m$, $\exists$
independent random variables $V_1, V_2, ..., V_m$ such that $(\bar F(t_1), \bar F(t_2), ..., \bar F(t_m))$
has the same distribution as $(V_1, V_1 V_2, ..., \prod_{i=1}^m V_i)$, where $\bar F = 1 - F$.


Doksum (1974, Theorem 3.1) gave a nice characterization of such
processes in terms of independent increment processes. In his Example 3.1, he
showed how such a process can be constructed starting from any non-negative,
infinitely divisible random variable $Y$ and any distribution function $\beta_0$ on
$R$. If $\beta_0$ is absolutely continuous wrt the Lebesgue measure on $[c,d]$ with
positive density and the distribution function of $Y$ is strictly increasing,
then the resulting process has support $\Omega$.

C. (Distributions on the moment space) Let
$D = \{(\mu_1, \mu_2, ...) : \mu_i = \int \theta^i\, d\omega,\ \forall\ i \ge 1, \text{ for some } \omega \in \Omega\} \subset R^\infty$ be the
space of moment sequences of probabilities on $\Theta$.

Since any $\omega \in \Omega$ is determined by its moment sequence $\{\mu_i\} \in D$, a
prior on $D$ induces a prior on $\Omega$ in the obvious way. To make the ideas
precise consider $\Omega$ with the weak convergence topology and $D$ with the
product topology. Let $\mu$ be the mapping $\omega \mapsto (\mu_1(\omega), \mu_2(\omega), ...)$, $\mu_i(\omega) = \int \theta^i\, d\omega$, $i \ge 1$.
Then $\mu$ is 1-1, continuous, onto $D$ and hence is a
homeomorphism since $\Omega$ is compact and $D$ is Hausdorff. Thus a prior $\Lambda$ on
$(D, \mathcal{B}(D))$ induces the prior $\Lambda_\Omega = \Lambda\mu$ on $(\Omega, \mathcal{B}(\Omega))$. Since $\mu$ is a
homeomorphism, support of $\Lambda_\Omega = \Omega$ iff support of $\Lambda = D$. Hereafter we
will write $\Lambda$ for $\Lambda_\Omega$ too.

The structure of $D$, for the case $\Theta = [0,1]$, has been studied by many
authors. Rolph (1968) exploited this structure to define his prior sequentially
on the co-ordinates. His priors can be adapted to the case $\Theta = [c,d]$ by the
reparametrization $\theta \mapsto (\theta - c)/(d - c)$.

Another way of putting priors on $D$ would be to follow Rolph's
approach directly for $D = D[c,d]$. It is easy to see that $D[c,d]$ has the
same structure as that of $D[0,1]$. Let us elaborate this more extensively. We
use Rolph's notation of lower and upper bars to denote corresponding bounds
on the range of moments given their predecessors, despite conflict with our
n-tuple lower bar notation.

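Let $\pi_n$ be the projection of $R^\infty$ onto its first $n$ co-ordinates, $n \ge 1$,
and $D_n = \pi_n(D)$. For $(\mu_1, ..., \mu_n) \in D_n[c,d]$ let

(2.1)
$\underline\mu_{n+1}(\mu_1, ..., \mu_n) = (d-c)^{n+1}\, \underline m_{n+1}(m_1, ..., m_n) - \sum_{r=0}^{n} \binom{n+1}{r} (-c)^{n+1-r} \mu_r$,

$\bar\mu_{n+1}(\mu_1, ..., \mu_n) = (d-c)^{n+1}\, \bar m_{n+1}(m_1, ..., m_n) - \sum_{r=0}^{n} \binom{n+1}{r} (-c)^{n+1-r} \mu_r$,

where $m_0 = 1$ (and $\mu_0 = 1$), $m_r = (d-c)^{-r} \sum_{i=0}^{r} \binom{r}{i} (-c)^{r-i} \mu_i$, $1 \le r \le n$, and $\underline m_{n+1}$, $\bar m_{n+1}$ are
as in (5) of Rolph. Then for $(\mu_1, ..., \mu_n) \in D_n[c,d]$, $(\mu_1, ..., \mu_n, \mu_{n+1}) \in D_{n+1}[c,d]$ iff

(2.2) $\underline\mu_{n+1}(\mu_1, ..., \mu_n) \le \mu_{n+1} \le \bar\mu_{n+1}(\mu_1, ..., \mu_n)$.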
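
The change of variable $\theta \mapsto (\theta - c)/(d - c)$ acts on moment sequences through a binomial expansion, which is what links moments on $[c,d]$ to Rolph's moments on $[0,1]$. The sketch below illustrates that conversion only; the function name `to_unit_interval_moments` and the list layout (`mu[0] == 1`) are assumptions of this example, not notation from the text or from Rolph (1968).

```python
from math import comb

def to_unit_interval_moments(mu, c, d):
    """Convert moments on [c, d] into moments on [0, 1].

    mu[i] is the i-th moment of a probability on [c, d], with mu[0] == 1.
    Returns m with m[r] = E[((theta - c) / (d - c)) ** r], obtained from
    m_r = (d - c)^(-r) * sum_i C(r, i) * (-c)^(r - i) * mu_i.
    """
    return [
        sum(comb(r, i) * (-c) ** (r - i) * mu[i] for i in range(r + 1)) / (d - c) ** r
        for r in range(len(mu))
    ]

# Point mass at theta = 3 on [2, 4]: the rescaled point is 0.5.
print(to_unit_interval_moments([1, 3, 9, 27], 2, 4))  # [1.0, 0.5, 0.25, 0.125]
```

Solving the same relation for the highest moment gives the bounds in (2.1): a bound on $m_{n+1}$ translates into a bound on $\mu_{n+1}$ once the lower moments are fixed.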

Thus starting from a sequence of measurable functions $\{h_n\}$ positive on
$[c_n, d_n]$ with $\int_{c_n}^{d_n} h_n\, dx < \infty$ (the integral is wrt Lebesgue), $c_n$ and $d_n$ being
the minimum and the maximum of $\theta \mapsto \theta^n$ on $[c,d]$, we construct the
prior on $D[c,d]$ exactly the same way given in Rolph with only changes of
$m_n$, $\underline m_n$, $\bar m_n$ to $\mu_n$, $\underline\mu_n$, $\bar\mu_n$ respectively.

Under this $\Lambda$, the distribution of any finite moment sequence has full
support, the $h_n$'s being positive. It then follows from the definition of the
product topology that $\Lambda$ has full support. We will refer to this prior as
Rolph's prior for $[c,d]$.


D. (Random distribution functions of Dubins and Freedman) Starting from a
probability (base probability in their terms) on the Borels of the unit square
assigning measure 0 to the corners (0,0) and (1,1), Dubins and Freedman
(1966) constructed a random distribution on [0,1]. They gave (3.6, Theorem)
precise conditions on the base probability so that the resulting prior has full
support. Then an obvious transformation carries it over to a prior on $\Omega$
with full support. However for these priors we do not know any form of the
Bayes estimator which can be computed in practice.

E. (Discrete priors) Since $\Omega$ is compact and metrizable, it is separable. Let
$\{\omega_n\}$ be a dense sequence in $\Omega$ and $0 < c_n < \infty$, $\sum c_n = 1$. Then $\Lambda = \sum c_n \delta_{\omega_n}$,
$\delta_{\omega_n}$ being the probability degenerate at $\omega_n$, has support $\Omega$.
This example shows that, contrary to what is assumed in Inglis (1973),
non-atomicity of $\Lambda$ is not necessary for the purpose of asymptotic optimality.
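
For a discrete $\Lambda$ the Bayes compound estimator versus the $\Lambda$ mix of i.i.d. priors reduces to finite sums, which makes it easy to compute. The following is a minimal sketch under explicit assumptions: only finitely many atoms of $\Lambda$ are used (a truncation of the dense sequence), each atom is itself finitely supported, the components are Poisson with natural parameter $\theta$, and the names `poisson_pmf` and `bayes_compound_estimate` and the 0-based index `alpha` are hypothetical choices for this example.

```python
import math

def poisson_pmf(x, theta):
    """Poisson pmf in the natural parametrization: mean = exp(theta)."""
    lam = math.exp(theta)
    return math.exp(-lam) * lam ** x / math.factorial(x)

def bayes_compound_estimate(xs, alpha, atoms, phi=lambda th: th, pmf=poisson_pmf):
    """Posterior mean of phi(theta_alpha) given xs under a discrete Lambda.

    atoms is a list of (c_j, omega_j) where c_j is the Lambda-weight of
    the atom and omega_j is a list of (theta, weight) pairs describing a
    finitely supported prior on the component parameter.
    """
    num = den = 0.0
    for c_j, omega in atoms:
        # Mixture density of each observation under omega_j.
        marg = [sum(w * pmf(x, th) for th, w in omega) for x in xs]
        others = math.prod(m for i, m in enumerate(marg) if i != alpha)
        # Integral of phi(theta) * p_theta(x_alpha) with respect to omega_j.
        num_alpha = sum(w * phi(th) * pmf(xs[alpha], th) for th, w in omega)
        num += c_j * num_alpha * others
        den += c_j * marg[alpha] * others
    return num / den

atoms = [(0.5, [(0.0, 1.0)]), (0.5, [(0.0, 0.5), (1.0, 0.5)])]
print(bayes_compound_estimate([2, 0, 3], 0, atoms))
```

With a single atom the other observations cancel and the value is the component Bayes estimate versus that atom; with several atoms the other observations reweight the atoms, which is how the compound estimator borrows strength across components.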

3. The Bayes estimators.

It has been pointed out in the previous chapters that the Bayes
estimators in the different formulations considered are all related. In this
section we find out some practically useful expressions for $\hat t_\alpha$, the $\alpha$-th
component of the Bayes estimator in the set problem (with $n$ components),
$1 \le \alpha \le n$, for some of the examples considered above.

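From (1.2.4), its preceding equation and the definition of $\hat\omega_\alpha$ it follows
that

(3.1) $\hat t_\alpha(\underline x) = \dfrac{\int \phi(\theta_\alpha) \prod_{i=1}^n p_{\theta_i}(x_i)\, d\bar\omega^n(\underline\theta)}{\int \prod_{i=1}^n p_{\theta_i}(x_i)\, d\bar\omega^n(\underline\theta)}$.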

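Let $\omega$ denote a random probability having distribution $\Lambda$ and, given $\omega$,
let $\underline\theta_n = (\theta_1, ..., \theta_n)$ be a random sample from $\omega$; equivalently consider $P$
such that

(3.2) $P(\omega \in A_0, \theta_1 \in B_1, ..., \theta_n \in B_n) = \int 1_{A_0}(\omega) \prod_{i=1}^n \omega(B_i)\, d\Lambda(\omega)$

$\forall\ A_0 \in \mathcal{B}(\Omega)$, $B_1, ..., B_n \in \mathcal{B}(\Theta)$.
Note that the marginal distribution of $\underline\theta_n$ is $\bar\omega^n$, i.e.

(3.3) $P(\theta_1 \in B_1, ..., \theta_n \in B_n) = \int \prod_{i=1}^n \omega(B_i)\, d\Lambda(\omega)$.

Let $\Lambda^{\underline\theta_n}$ be a regular posterior distribution of $\omega$ given $\underline\theta_n$, so that

(3.4) $\int_{B_1 \times ... \times B_n} \Lambda^{\underline\theta_n}(A_0)\, d\bar\omega^n(\underline\theta_n)$ = LHS of (3.2),

and let $\omega^{\underline\theta_n}$ be the posterior mean of $\omega$ given $\underline\theta_n$, i.e.

$\omega^{\underline\theta_n} = \int \omega\, d\Lambda^{\underline\theta_n}$.

Then by repeated conditioning it follows that

(3.5) LHS of (3.3) $= \int_{B_1} ... \int_{B_{n-1}} \int_{B_n} d\omega^{\underline\theta_{n-1}}(\theta_n)\, d\omega^{\underline\theta_{n-2}}(\theta_{n-1}) ... d\omega^{\underline\theta_0}(\theta_1)$,

where $\omega^{\underline\theta_0} = \int \omega\, d\Lambda$.
Using this via (3.3) in (3.1) we get

(3.6) $\hat t_\alpha(\underline x) = \dfrac{\int ... \int \phi(\theta_\alpha) \prod_{i=1}^n p_{\theta_i}(x_i) \prod_{i=1}^n d\omega^{\underline\theta_{i-1}}(\theta_i)}{\int ... \int \prod_{i=1}^n p_{\theta_i}(x_i) \prod_{i=1}^n d\omega^{\underline\theta_{i-1}}(\theta_i)}$.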

For examples A and B (3.6) can be used to calculate $\hat t_\alpha$ from the data.


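A. Let $\Lambda = \mathcal{D}(\gamma)$, with support $\gamma = \Theta$. Then by Proposition 1
and Theorem 1 of Ferguson (1973),

(3.7) $\omega^{\underline\theta_n} = \dfrac{\gamma + \sum_{i=1}^n \delta_{\theta_i}}{\gamma(\Theta) + n}$,  $n \ge 0$.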

 

B. In this case the expression of the posterior means is known but
complicated. It is given in Doksum (1974, Example 4.1).
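
To make the Dirichlet case concrete, the posterior mean in (3.7) assigns to a set $B$ the probability $(\gamma(B) + \#\{i : \theta_i \in B\})/(\gamma(\Theta) + n)$. The sketch below evaluates this for intervals. It assumes, purely for illustration, that $\gamma$ is `mass` times the uniform distribution on $[c,d]$; that choice, the function name `dp_posterior_mean_prob` and its parameters are not taken from the text.

```python
def dp_posterior_mean_prob(a, b, thetas, c=0.0, d=1.0, mass=1.0):
    """Posterior-mean probability of the interval (a, b] under (3.7).

    Assumes gamma = mass * Uniform[c, d] (an illustrative choice), so that
    gamma(Theta) = mass and gamma((a, b]) is proportional to the overlap
    of (a, b] with [c, d].
    """
    lo, hi = max(a, c), min(b, d)
    gamma_B = mass * max(hi - lo, 0.0) / (d - c)
    count_B = sum(1 for t in thetas if a < t <= b)
    return (gamma_B + count_B) / (mass + len(thetas))

# With no parameters observed the posterior mean is the normalized gamma.
print(dp_posterior_mean_prob(0.0, 0.5, []))               # 0.5
print(dp_posterior_mean_prob(0.0, 0.5, [0.2, 0.3, 0.9]))  # 0.625
```

Applying this with $\underline\theta_{i-1}$ in place of $\underline\theta_n$ gives the successive conditional distributions that appear in (3.6).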

Remark 3.1. A trivial generalization of the (3.7) specialization of (3.6)
is given in Kuo (1986). Also Kuo has described and exempliﬁed a Monte

Carlo method for its calculation.

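When $\mu(N^c) = 0$ consider the reparametrization $\eta = e^\theta$. Then we can
use the geometric form of our $\tilde p_\eta(x) = \eta^x \tilde h(\eta)$ to obtain a manageable
expression for $\hat t_\alpha$ in Example C when $\tilde\phi(\eta) = \eta^k$, $k \in N$. For a general $\phi$ one
can then use a suitable polynomial approximation.

C. Let $\sum_j a_j \eta^j$ be a polynomial approximating $\tilde h$ on $[e^c, e^d]$. Then for
$\Lambda$ = Rolph's prior for $[e^c, e^d]$ and $\tilde\phi(\eta) = \eta^k$, $\hat t_\alpha$ will be approximated
by

(3.8) $\dfrac{\int \big(\sum_j a_j \mu_{x_\alpha + j + k}\big) \prod_{i \ne \alpha} \big(\sum_j a_j \mu_{x_i + j}\big)\, d\Lambda}{\int \prod_{i=1}^n \big(\sum_j a_j \mu_{x_i + j}\big)\, d\Lambda}$.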

It is important to note that $\hat t_\alpha$ above corresponds to the $\Lambda$ mix of i.i.d.
priors on $\eta$ or, in terms of the original parametrization, the $\Lambda$ mix of i.i.d.
priors on $e^\theta$. The integrals in (3.8) reduce to Lebesgue integrals on some
Euclidean spaces ($R^{X^* + d + k}$ and $R^{X^* + d}$ respectively to be precise, where
$X^* = \bigvee \langle X_1, ..., X_n \rangle$ and $d$ is the degree of the polynomial) because
any finite $\mu$ sequence has Lebesgue density under $\Lambda$.
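To see (3.8), first use Fubini to rewrite (3.1) (with $\underline\theta$ replaced by $\underline\eta$,
$p_\theta$ by $\tilde p_\eta$ and $\tilde\phi(\eta) = \eta^k$) as

(3.9) $\hat t_\alpha(\underline x) = \dfrac{\int \big(\int \eta^k \tilde p_\eta(x_\alpha)\, d\omega(\eta)\big) \prod_{i \ne \alpha} \big(\int \tilde p_\eta(x_i)\, d\omega(\eta)\big)\, d\Lambda}{\int \prod_{i=1}^n \big(\int \tilde p_\eta(x_i)\, d\omega(\eta)\big)\, d\Lambda}$.

Now use $\tilde p_\eta(x) = \eta^x \tilde h(\eta)$, $\tilde h(\eta) \approx \sum_j a_j \eta^j$ in (3.9) to get (3.8).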

Remark 3.2. For conditionally uniform prior, i.e. $h_i \equiv 1$ $\forall\ i$, (3.8)
takes a simpler form. In cases like Geometric and Negative Binomial $\tilde h$
itself is a polynomial. For the Poisson case we can choose the polynomials
$\sum_j \frac{(-1)^j}{j!} \eta^j$ for approximation. In general, since $\tilde h$ is continuous, a sequence of
approximating polynomials always exists. Moreover such a sequence can be
found numerically.
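
The Poisson case of Remark 3.2 can be illustrated numerically: replacing $e^{-\eta}$ by its Taylor polynomial turns $\int \eta^x e^{-\eta}\, d\omega(\eta)$ into a finite linear combination of moments of $\omega$. The sketch below assumes a finitely supported $\omega$ given as (point, weight) pairs and takes $\tilde h(\eta) = e^{-\eta}$; the function names are hypothetical and chosen for this example only.

```python
import math

def moment(omega, i):
    """i-th moment of a finitely supported omega = [(eta, weight), ...]."""
    return sum(w * eta ** i for eta, w in omega)

def approx_integral(omega, x, degree):
    """sum_j a_j * mu_{x+j} with a_j = (-1)^j / j!, the Taylor
    coefficients of exp(-eta) truncated at the given degree."""
    return sum(
        (-1) ** j / math.factorial(j) * moment(omega, x + j)
        for j in range(degree + 1)
    )

def exact_integral(omega, x):
    """Integral of eta^x * exp(-eta) with respect to omega."""
    return sum(w * eta ** x * math.exp(-eta) for eta, w in omega)

omega = [(0.5, 0.3), (1.5, 0.7)]
for degree in (1, 5, 20):
    print(degree, approx_integral(omega, 2, degree), exact_integral(omega, 2))
```

The degree needed for a given accuracy depends on the largest value of $\eta$ in the parameter interval, since the Taylor remainder grows with $\eta$.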

4. Remarks.

1. It should be obvious that we can also treat the cases where the
components are 1-1 transforms of some exponential families we have been
considering. Suppose that the component distributions $P_\theta$, $\theta \in \Theta$ are such
that $\{Q_\eta : \eta \in H\}$ form one such exponential family where $Q_\eta = P_{\psi^{-1}(\eta)} T^{-1}$,
$T$ and $\psi$ are 1-1 transformations on $R$ and $\Theta$ respectively
and $\psi^{-1}$ is continuous. Let $\underline X \sim P_{\underline\theta}$. Then $\underline Y \sim Q_{\underline\eta}$ where $Y_\alpha = T(X_\alpha)$,
$\eta_\alpha = \psi(\theta_\alpha)$, $1 \le \alpha \le n$. Since $T$ is 1-1, estimators (based on $\underline Y$) in the
transformed problem are related in a 1-1 fashion to the estimators (based
on $\underline X$) in the original problem. Any such two estimators have identical risk
function under a common parametrization. Moreover since $\psi^{-1}$ is continuous,
$\phi$ remains continuous in the reparametrization $\eta$ of the transformed problem.
Hence the conclusion of Chapter 1, Section 2 and Theorem 1.5.1 for the
transformed problem implies that the set compound estimator
—1
. 14w (n))q,,(T(x,,)) aw,
130(3) = T x d
lq,( ( a» w,

is admissible and a.o. for estimating Q in the original problem, where wa is

n
as in Section 1.2.1 with g a(w) = 2 log q w(T(Xi))’ Analogous conclusions
iata

 

hold for the sequence compound and the empirical Bayes versions.

2. We can generalize the component loss to weighted squared error
loss, where the weight function is positive and continuous. If $L(a,\theta) = w(\theta)(a - \phi(\theta))^2$
then a component Bayes estimator of $\phi(\theta)$ is the ratio of the
corresponding Bayes estimators of $w(\theta)\phi(\theta)$ and $w(\theta)$ wrt the squared
error loss. Since $w\phi$ is continuous and $w_* > 0$, the $L_1(E)$ case of
Lemma A.1 with $L = 2w^*|\phi|^*/w_*$ and two applications of Theorem 1.5.1
imply that $E|\hat t_\alpha - \tilde t_\alpha| \to 0$ uniformly in $\alpha$ and $\underline\theta$ where $\hat t_\alpha$ and $\tilde t_\alpha$ refer
here to the weighted loss. As before this is sufficient to conclude the
asymptotic optimality of $\hat t$ since (1.2.6) holds with $w^*$ multiple of the
RHS. The same conclusion holds in the sequence compound and the empirical
Bayes versions.

3. An interesting question seems to be how far we can relax the
compactness assumption on the component parameter space. It is known that
we cannot always go up to the natural parameter space. An example where
no a.o. compound estimator exists is the Poisson family with unbounded
parameter set. See Gilliland (1968, Section 3.3) for a proof.

4. Under the assumption $\mu_{-1} \ll \mu$ instead of $\mu_1 \ll \mu$ we can
use the transformation T(x) = -x in Remark 1 to obtain admissible, a.o.
rules. An example where none of these holds is provided by the Binomial
family and it is well known that in this case even the empirical Bayes

problem has no a.o. solution.

5. A possible open question is whether the condition that $\Lambda$ has full support
is necessary. Another interesting problem is to find examples of $\Lambda$ for which
a good lower bound (as a function of 6) to the quantity in Lemma 1.4.3. can
be obtained so that a rate of convergence in the asymptotic optimality of $\hat t$
can be established.

APPENDIX


Here we present a few results of possible independent interest. They

are used as technical tools in the body of the thesis.
1. On bounding the difference of two ratios.

Lemma A.1. For $\langle y,z,Y,Z,L \rangle \in R^5$ and $z \ne 0 \le L$,

(1.1) $|z| \left( \left|\dfrac{y}{z} - \dfrac{Y}{Z}\right| \wedge L \right) \le |z|^{-1}|yZ - zY| + L(|z| - |Z|)^+ \le |y - Y| + \left( \left|\dfrac{y}{z}\right| + L \right)|z - Z|$.

Proof. The first inequality holds because the RHS, less $|Y|$ if
$Z = 0$, is the $|Z/z|$, $(1 - |Z/z|)^+$ weighted average of quantities whose
minimum is the LHS. The second inequality follows by triangle inequality
weakenings,
$|yZ - zY| \le |z||y - Y| + |y||z - Z|$
and
$(|z| - |Z|)^+ \le |z - Z|$,

in the two LHS terms.
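
Inequality (1.1) lends itself to a quick numerical sanity check. The sketch below evaluates its three members at random points with $Z \ne 0$, so that $Y/Z$ is defined; it is an illustration only, the helper names `lemma_a1_terms` and `check` are assumptions of this example, and the check is not a substitute for the proof.

```python
import random

def lemma_a1_terms(y, z, Y, Z, L):
    """Return (lhs, mid, rhs) of inequality (1.1) for z != 0 <= L and Z != 0."""
    lhs = abs(z) * min(abs(y / z - Y / Z), L)
    mid = abs(y * Z - z * Y) / abs(z) + L * max(abs(z) - abs(Z), 0.0)
    rhs = abs(y - Y) + (abs(y / z) + L) * abs(z - Z)
    return lhs, mid, rhs

def check(trials=10000, seed=0, tol=1e-9):
    """True if lhs <= mid <= rhs (up to rounding) at every sampled point."""
    rng = random.Random(seed)
    for _ in range(trials):
        y, Y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        z = rng.choice([-1, 1]) * rng.uniform(0.1, 5)
        Z = rng.choice([-1, 1]) * rng.uniform(0.1, 5)
        L = rng.uniform(0, 5)
        lhs, mid, rhs = lemma_a1_terms(y, z, Y, Z, L)
        if not (lhs <= mid + tol and mid <= rhs + tol):
            return False
    return True

print(check())
```

The truncation at $L$ matters when $|Z|$ is much smaller than $|z|$: the untruncated difference of ratios can then be arbitrarily large, while the middle and right members stay bounded in terms of $|y - Y|$ and $|z - Z|$.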

Remark A.1. Division by $|z|$ in (1.1) yields a pointwise
improvement on a lemma of Singh (1974, Lemma A.2). When $\langle y,z,Y,Z \rangle$ are
measurable functions on a space with an integral $J$, his lemma itself (and its
extension, his Remark A.2) is further improved by the corollary to ours
resulting from the subadditivity of the norm or metric distance from 0 in
$L_\gamma(J)$ according as $\gamma \in [1,\infty]$ or $(0,1)$.


2. On uniform convergence of convex functions.
Lemma A.2. If $\{f_n\}$ is a sequence of convex functions on an interval
$I \subset R$ converging to a continuous real function $f$ pointwise on $I$, then the
convergence is uniform on compact subsets of $I$.

Proof. Let $K$ be a compact subset of $I$, an interval w.l.o.g.
Partition $K$ into intervals of equal length.

Let $\epsilon$ denote the maximum of the oscillations of $f$ within the
subintervals. Let $\eta_n$ denote the maximum of $|f_n - f|$ on the endpoints of
the subintervals.

On a given subinterval with endpoints $a$ and $b$, bound $f_n$ above
by its chord to obtain
$f_n \le f_n(a) \vee f_n(b)$;
bound $f_n$ below by the line extending the chord from an adjacent interval
to obtain
$f_n \ge f_n(a) - |f_n(2a-b) - f_n(a)|$.
Thus

$\bigvee_K |f_n - f| \le 3\eta_n + 2\epsilon \to 2\epsilon$ as $n \to \infty$.

The proof is now over since the uniform continuity of $f$ over $K$ permits
arbitrarily small $\epsilon > 0$ based on the number of subintervals.


3. A uniform $L_1$-LLN for independent random continuous functions.

Let $(S,d)$ be a compact metric space. Let $\|\ \|$ denote the sup
norm on $C(S)$.

Let $\{P_\nu : \nu \in I\}$ be an arbitrary family of probability measures.
(We use the measure to denote the corresponding expectation too and use
the superscript $(\nu)$ to denote deviations of random elements from the values
of their $P_\nu$ expectations.)

Let $A_n$ denote the uniform expectation on $\{1,...,n\}$. (If $\{f_k\}$ is a
sequence of elements of a linear space, $A_n f_\cdot$ will be denoted by $\bar f$ when
convenient.) Note that $A_n$ commutes with $(\nu)$. Let $*$ denote the
(iterated) operation $\limsup_n \bigvee_\nu A_n P_\nu$ and note that it is subadditive.

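Theorem A.3. If
(i) under each $P_\nu$, $H_1, H_2, ...$ are independent $C(S)$ valued random
elements with expectations $\in R^S$ ($(P_\nu H_k)(s) = P_\nu H_k(s)$ $\forall\ k$ and $s$),

(ii) $*(\|H_\cdot^{(\nu)}\| - M)^+ \downarrow 0$ as $M \uparrow \infty$,
(iii) $\forall\ \epsilon > 0$ and $s \in S$, with $V_{s\rho\nu k} = \bigvee\{|H_k^{(\nu)}(t) - H_k^{(\nu)}(s)| : d(s,t) < \rho\}$,

$*[V_{s\rho\nu\cdot} > \epsilon] \downarrow 0$ as $\rho \downarrow 0$,
then

$*\|\bar H^{(\nu)}\| = 0$.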


Proof. If card($S$) = 1, $\|\ \|$ reduces to $|\ |$ and (iii) is vacuously
satisfied. The $H_k$ are then real valued random variables and will be denoted
by $X_k$.

For M E (0,oo) represent (as in the proof of Theorem 2.3.9 of Fabian

and Hannan (1985), which the present real case greatly strengthens)
(V) _ (V)
(3.1) X k — U k + PVUk + Wk

with Uk the projection of LHS into [-M,M], so that

—(u) +4) -— —
PVIX | s PVIU | + IPVUl + PV|W|

and
lPuUl = IPVWL
Thus
(32) 19,1821 s M/ni + 211,,1WI

(since (PleMl) 2 g Pym”)2 5 M2/n ). Since kaI = (|X£”)l — M) + ,
*IWI 5 *|W.| 10 as M T 00 by (ii) and thus

(3.3) * |Y(")| = 0
follows from (3.2).
Let C > 0, S E S. Since

(3.4) v,,,,k 5 211119911,

 


(3.5) VWk < C + M [vspuk> C] + (2 "118'?” — M )+

and by subadditivity

* VW. s c + M * [vain]? r] + * (2 “118’?” — M )+
(3.6)
5 2t + M(c) * WSW.) c] by choice of M = M(c) in
(ii). And so
(3.7) * Vsp(s,c)u. 5 36 by choice Of p = p(S,€) in (iii).

By compactness of $S$, $\exists$ a finite cover of $S$ by spheres indexed by

$\sigma_i = (s_i, \rho_i)$ with $\rho_i = \rho(s_i, \epsilon)$, for $i = 1, ..., g$. Then

(3.8) "HM" s [51 Ismail + IV‘QVI + W l,

V 0.11
I

g _ __ 8 _
s 2121 |H(")(si)| + wig/3,11 + (Pa/tit

From (3.4) it follows that |v(gguk| s 2 ( "118'?" + PV||H([:)|| ) and

hence

(3.9) P,( ly‘ﬁiml - M )+ s 2 P,,( 2 1111‘?" - 14/21,,

Thus the “311k , as well as the H(i’)(si), inherit (ii). Since (i) obviously
1

holds for both sequences, so does (3.3). So it follows from (3.8) and


subadditivity of * that
* _(V) S _ , 8 ..,
(3.10) ||H || 3 * 11/ PuVoiu _ hm supn 11/ y Puvoiu

8
Since $\limsup_n$ commutes with $\bigvee_{i=1}^{g}$ for finite $g$, we get

RHS (3.10) $\le 3\epsilon$. The proof ends since $\epsilon$ is arbitrary.

Remark A.3. Let (ii+) and (iii+) denote (ii) and (iii) respectively
without the centerings (V).
Let (ii+) hold. Then, Since IIPVHRII S PullHkll and hence

([lPqull - M) + s PV(||Hk|| - M) + , (ii+) holds with HR replaced by PVHk.
This along with (ii+) then gives (ii) (since (a + b — 2M) + 5 (a — M) + +

(b - M) + ).
Let (ii+) and (iii+) hold. Then, since v{ milk]; I : d(t,s) < p}
-l . ..
S PVVSpk , Anlpuvsp.> c] S c AnPVVsp. . So we (3.6+) and (11+)
0 = Iii:lo * Vs p. . Thus (iii+) holds with Hk replaced by Funk This along

with (iii+) then gives (iii) (Since [ V|H+G| > c] C [ V|H| > 13/2] +
[VIGI > 15/21)-

 


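4. Admissibility of Bayes estimators in the compound problem under squared
error loss.
Let $\{P_\theta\}$, $\theta \in \Theta$ be the component distributions. Consider the compound
problem of estimating $\underline\theta$ under squared error loss $L(\underline t, \underline\theta) = n^{-1} \sum_{\alpha=1}^n (t_\alpha - \phi(\theta_\alpha))^2$
for any function $\phi$ on $\Theta$. Let $P_{\underline\theta} = \times_{\alpha=1}^n P_{\theta_\alpha}$ for $\underline\theta \in \Theta^n$.

Let $\zeta$ be a prior on $\underline\theta$. Denote the joint distribution $\zeta \circ P_{\underline\theta}$ on $\langle \underline x, \underline\theta \rangle$ by
$Q$. Then the marginal of $\underline x$ is $Q\underline x^{-1} = \int P_{\underline\theta}\, d\zeta$. For a function $f$ on $\Theta^n$
let $Q_{\underline x} f(\underline\theta)$ denote the class of conditional expectations of $f(\underline\theta)$ given $\underline x$.

Lemma A.4. If $\zeta$ is such that $P_{\underline\theta} \ll Q\underline x^{-1}$ $\forall\ \underline\theta \in \Theta^n$ then every
Bayes estimator versus $\zeta$ is admissible.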

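Proof. First consider the set compound case.
Fix an $\alpha \in \{1,...,n\}$. Then $Q(t_\alpha - \phi(\theta_\alpha))^2$ is minimal iff $t_\alpha(\underline x) \in Q_{\underline x}\phi(\theta_\alpha)$.
Hence $t_\alpha$ is determined up to $Q\underline x^{-1}$ null sets and so by the
assumption of the lemma has unique risk $\underline\theta \mapsto \int (Q_{\underline x}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2\, dP_{\underline\theta}$.
Thus, since $\alpha \in \{1,...,n\}$ is arbitrary, the compound Bayes estimators have the
unique compound risk $\underline\theta \mapsto n^{-1} \sum_{\alpha=1}^n \int (Q_{\underline x}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2\, dP_{\underline\theta}$ and hence are
admissible.

For the sequence compound case, the given condition implies that for
each $\alpha \in \{1, ..., n\}$, $P_{\underline\theta_\alpha} \ll Q\underline x_\alpha^{-1}$ $\forall\ \underline\theta_\alpha \in \Theta^\alpha$. Hence, by combining the
intermediate results in the set case with $n = \alpha$ for each $\alpha$, we get that the
sequence compound Bayes estimators have the unique compound risk
$\underline\theta \mapsto n^{-1} \sum_{\alpha=1}^n \int (Q_{\underline x_\alpha}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2\, dP_{\underline\theta_\alpha}$ and hence are admissible.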

 

BIBLIOGRAPHY


Basu, D. and Tiwari, R. C. (1982). A note on the Dirichlet process.
Statist. Prob: Essays in Honor of C. R. Rao. North-Holland
Publishing Comp., 89-103.

Billingsley, Patrick (1968). Convergence of Probability Measures. John Wiley
& Sons.

Blackwell, David and MacQueen, James B. (1973). Ferguson distributions via
Polya urn schemes. Ann. Statist. 1, 353-355.

Doksum, Kjell (1974). Tailfree and neutral random probabilities and their
posterior distributions. Ann. Statist. 2, 183-201.

Dubins, L. E. and Freedman, D. A. (1966). Random distribution functions.
Proc. Fifth Berkeley Symp. Math. Statist. Prob. II.1, Univ. of
California Press, 183-214.

Fabian, Vaclav and Hannan, James (1985). Introduction to Probability and
Mathematical Statistics. John Wiley & Sons.

Ferguson, Thomas S. (1973). A Bayesian analysis of some nonparametric
problems. Ann. Statist. 1, 209-230.

Ferguson, Thomas S. (1974). Prior distributions on spaces of probability
measures. Ann. Statist. 2, 615-629.

Gilliland, Dennis C. (1968). Sequential compound estimation. Ann. Math.
Statist. 39, 1890-1904.

Gilliland, Dennis C. and Hannan, James (1986). The finite state compound
decision problem, equivariance and restricted risk components.
Adaptive Statistical Procedures and Related Topics, IMS Lecture Notes
- Monograph Series 8, 129-145.

Gilliland, Dennis C., Hannan, James and Huang, J. S. (1976). Asymptotic
solutions to the two state compound decision problem, Bayes versus
diffuse priors on proportions. Ann. Statist. 4, 1101-1112.

Gilliland, Dennis C., Boyer, John E. and Tsao, How Jan (1982). Bayes
empirical Bayes : finite parameter case. Ann. Statist. 10, 1277-1282.


Hannan, James F. (1957). Approximation to Bayes risk in repeated play.
Contributions to the Theory of Games 3, Ann. Math. Studies, No. 39,
Princeton University Press, 97-139.

Hannan, J. (1960). Consistency of maximum likelihood estimation of
discrete distributions. Contributions to Prob. Statist., Stanford Univ.
Press, 249-257.

Inglis, James (1973). Admissible decision rules for the compound problem.
Ph.D. Thesis, Dept. of Statistics, Stanford University.

Inglis, James (1977). Admissible decision rules for the compound decision
problem : the two action two state case. Ann. Statist. 7, 1127-1135.

Kuo, Lynn (1986). A note on Bayes empirical Bayes estimation by means of
Dirichlet process. Stat. Prob. Let. 4, 145-150.

Lehmann, E. L. (1959, 1986). Testing Statistical Hypotheses. John Wiley &
Sons.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann.
Math. Statist. 43, 96-101.

Robbins, Herbert (1951). Asymptotically sub-minimax solutions of compound
problems. Proc. Second Berkeley Symp. Math. Statist. Prob., Univ. of
California Press, 131-148.

Robbins, Herbert (1955). An empirical Bayes approach to statistics. Proc.
Third Berkeley Symp. Math. Statist. Prob. 1, Univ. of California Press,
157-163.

Rolph, John E. (1968). Bayesian estimation of mixing distributions. Ann.
Math. Statist. 39, 1289-1302.

Sethuraman, Jayaram and Tiwari, Ram C. (1982). Convergence of Dirichlet
measures and the interpretation of their parameter. Statistical Decision
Theory and Related Topics, III.2, Academic Press, 305-315.

Singh, Radhey Shyam (1974). Estimation of derivatives of average $\mu$-densities
and sequence-compound estimation in exponential families.
Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State
University.

Vardeman, Stephen B. (1978). Admissible solutions of finite state sequence
compound decision problems. Ann. Statist. 6, 673-679.

 
