This is to certify that the

dissertation entitled

Asymptotically optimal and admissible estimators in
compound compact Gaussian shift experiments

presented by

Suman Majumdar

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Statistics

 


Major professor

Date August 5, 1992

MSU is an Affirmative Action/Equal Opportunity Institution


ASYMPTOTICALLY OPTIMAL AND ADMISSIBLE ESTIMATORS IN
COMPOUND COMPACT GAUSSIAN SHIFT EXPERIMENTS

By

Suman Majumdar

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1992


ABSTRACT

ASYMPTOTICALLY OPTIMAL AND ADMISSIBLE ESTIMATORS IN
COMPOUND COMPACT GAUSSIAN SHIFT EXPERIMENTS

By

Suman Majumdar

The problem of finding admissible and asymptotically optimal
compound and empirical Bayes rules is investigated in the context of

decision about an infinite dimensional parameter.

The component experiment considered is a homogeneous
experiment $\{P_\theta : \theta \in H\}$ on some measurable space $(\mathcal{X}, \mathcal{F})$, where $H$ is a
real separable Hilbert space, such that the map

$\theta \mapsto \langle\theta,\cdot\rangle_0 := \ln p_\theta(\cdot) + \|\theta\|^2/2$

is linear from $H$ into the real-valued measurable functions on $(\mathcal{X}, \mathcal{F})$,
where $p_\theta$ is a density of $P_\theta$ wrt $\mu = P_0$. This experiment is a Gaussian
shift experiment in the sense of LeCam (1986) and $\{\langle\theta,\cdot\rangle_0 : \theta \in H\}$
is the isonormal process on $(\mathcal{X}, \mathcal{F}, \mu)$ in the sense of Dudley (1967).
The component problem estimates the shift parameter $\theta$ restricted to a
compact subset of $H$ under squared error loss.

We consider the compound and empirical Bayes formulations of
the above component problem and show that all Bayes estimators in
the various formulations are admissible. Our main result : Any Bayes
compound estimator versus a mixture of iid priors on the compound
parameter is asymptotically optimal if the mixing hyperprior has full
support. Analogously any Bayes empirical Bayes estimator is

asymptotically optimal if the empirical Bayes prior has full support.

Using the (weak) conditional expectation representation of the
Bayes estimator in the component problem and weak compactness of the
unit ball, along with the fact that $\{\langle\theta,\cdot\rangle_0 : \theta \in H\}$ is the
isonormal process and consequences thereof, we reduce the question
of asymptotic optimality to that of an $L_1$ consistency of posterior
mixtures. We prove the consistency result, which complements Datta
(1991a), by assembling some previously known results and repeatedly

using the Gaussian shift structure.

The dissertation also characterizes the support of a Dirichlet
hyperprior on the set of all probability measures on a separable
metric space to be those probability measures whose supports are
contained in that of the parameter measure (of the Dirichlet
hyperprior), proving a result stated in Ferguson (1973) for the line

and providing examples of a full support hyperprior.

To my father and to the memory of my late mother


ACKNOWLEDGEMENT

I would like to express my sincere gratitude to my advisor
Professor James F. Hannan whose erudition, vision of and dedication
to Statistics always guided and inspired me. His caring personality

provided strong support in some rather trying circumstances.

I would also like to thank Professors Dennis Gilliland, Hira
L. Koul, V. Mandrekar and Habib Salehi for serving on my guidance
committee, Professor Gilliland for some very useful comments on an
earlier draft, Professor Mandrekar for alerting me to the (now
realized) possibility of generalizations of the results initially

presented and Professor R.V. Ramamoorthi for an important reference.

The opportunity provided by Professors Habib Salehi and James
Stapleton on behalf of the Department of Statistics during the last
two years to enhance my professional experience by teaching a wide
variety of independent courses and working in the Statistical
Consulting Service is gratefully acknowledged. Professors Gilliland
and Stapleton very kindly taught me different aspects of the role of

a Statistical Consultant.

The support provided by my sisters during my difficult student
days in India and their encouragement to pursue graduate studies, as
well as the support and encouragement provided by my wife during the

preparation of this dissertation deserve special mention.

TABLE OF CONTENTS

Chapter

0. INTRODUCTION
   0.1. The component and the compound problem
   0.2. Literature review and a summary
   0.3. Notational conventions
1. THE COMPOUND ESTIMATION
   1.1. The Gaussian shift component
   1.2. Estimators induced by hyperpriors
        1.2.1. Bayes versus mixture of iid priors
        1.2.2. A useful inequality on the modified regret
   1.3. A bound on the $L_1(P_\theta)$ distance between two component Bayes estimators
   1.4. Asymptotic optimality
   1.5. Admissibility
2. CONSISTENCY OF THE POSTERIOR MIXTURES
3. THE EMPIRICAL BAYES ESTIMATION
   3.1. Bayes empirical Bayes
   3.2. Asymptotic optimality
APPENDIX
   A.1. On measurability
   A.2. On topological support of Dirichlet prior
BIBLIOGRAPHY

CHAPTER 0
INTRODUCTION

In Section 1 we describe the idea of compounding a decision
problem (called the component problem) first espoused by Robbins
(1951). In Section 2 we review that part of the existing literature
on compound decision theory which can be considered to be a
forerunner to our work and present a summary of it. In Section 3 we
state the notational conventions to be followed throughout the
dissertation (some of these conventions will be used informally in

Sections 1 and 2).

1. The component and the compound problem.

The component problem is a usual decision theory problem,
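consisting of a parameter set $\Theta$, a family of probability measures
$\{P_\theta : \theta \in \Theta\}$ on some measurable space $(\mathcal{X}, \mathcal{F})$, an observable $\mathcal{X}$-valued
random element $X \sim P_\theta$ under $\theta$, an action space $\mathcal{A}$, a loss function
$L : \mathcal{A} \times \Theta \to [0,\infty)$ and decision rules $t$, $t : \mathcal{X} \to \mathcal{A}$, such that $L(t,\theta)$ is
measurable $\forall\,\theta$, with risk $R(t,\theta) := P_\theta L(t,\theta)$.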

For consideration of Bayes solutions, we fix a $\sigma$-algebra of
subsets of $\Theta$ such that each of the maps $(x,\theta) \mapsto L(t(x),\theta)$ is jointly
measurable. Let $\Omega = \{\omega : \omega$ is a probability on $\Theta\}$. For $\omega \in \Omega$, let
$r(\omega)$ and $\tau_\omega$ respectively denote the minimum Bayes risk and a Bayes
rule versus $\omega$ in the component problem (we assume existence of $\tau_\omega$
for every $\omega$). That is,

$r(\omega) = \inf_t \int_\Theta R(t,\theta)\,d\omega(\theta) = \int_\Theta R(\tau_\omega,\theta)\,d\omega(\theta).$

The compound problem simultaneously considers a number, say n,
of independent decision problems, each of which is structurally
identical to the above component problem. The compound loss is taken
to be the average of the component losses. In the set compound
version a decision about each component parameter is reached by
using data from all the component problems, while in the sequence
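compound version only $\underline{X}_\alpha$, data up to stage $\alpha$, is used in making the
$\alpha$-th decision. Thus for each $n \ge 1$, the compound problem is also a
decision problem, with parameter set $\Theta^n$, family of probability
measures $\{P_{\underline\theta} := \times_{\alpha=1}^{n} P_{\theta_\alpha} : \underline\theta := (\theta_1,\dots,\theta_n) \in \Theta^n\}$ on the measurable
space $\mathcal{X}^n$, observations $\underline{X} = (X_1,\dots,X_n) \sim P_{\underline\theta}$ under $\underline\theta$, action space $\mathcal{A}^n$,
decision rules $\underline{t} : \mathcal{X}^n \to \mathcal{A}^n$ such that each $L(t_\alpha,\theta_\alpha)$ is measurable, loss

$L_n(\underline{t},\underline\theta) := n^{-1} \sum_{\alpha=1}^{n} L(t_\alpha,\theta_\alpha)$

and corresponding risk

(1.1) $R_n(\underline{t},\underline\theta) := P_{\underline\theta} L_n(\underline{t},\underline\theta)$.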

If we were going to use only the data from a particular
component problem to decide about that component parameter, then the
component problems being structurally identical, there is an
intuitive reason to use the same procedure (with different data) in
the different problems. Formally, that amounts to using a compound
procedure $\underline{t}$ for which $t_\alpha(\underline{x}) = t(x_\alpha)$ $\forall\,\alpha = 1,\dots,n$, where $t$ is a
component procedure; such a compound procedure is called simple
symmetric.

Let $G_n$ denote the empirical distribution of $(\theta_1,\dots,\theta_n)$. The
compound risk at $\underline\theta$ of a simple symmetric $\underline{t}$ reduces to the component
Bayes risk of $t$ versus $G_n$, where $t_\alpha(\underline{x}) = t(x_\alpha)$ $\forall\,\alpha = 1,\dots,n$; as such
it is at least $r(G_n)$, the minimum Bayes risk versus $G_n$, which is
referred to as the simple envelope at $\underline\theta$.
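The reduction of the compound risk of a simple symmetric rule to a component Bayes risk versus $G_n$ can be checked in a toy case. The sketch below is only an illustration under assumptions that are not part of the dissertation: the component is the one-dimensional normal location family $N(\theta,1)$ with squared error loss, and the component rule is a hypothetical linear rule $t(x) = cx$, whose risk $(c-1)^2\theta^2 + c^2$ is available in closed form. All function names are illustrative.

```python
from collections import Counter


def component_risk(c, theta):
    """Risk of the linear rule t(x) = c*x when X ~ N(theta, 1), squared error loss."""
    return (c - 1.0) ** 2 * theta ** 2 + c ** 2


def compound_risk_simple_symmetric(c, thetas):
    """Compound risk: the average of the n component risks."""
    return sum(component_risk(c, th) for th in thetas) / len(thetas)


def bayes_risk_vs_empirical(c, thetas):
    """Component Bayes risk of the same rule versus G_n, the empirical distribution."""
    n = len(thetas)
    g_n = {value: count / n for value, count in Counter(thetas).items()}
    return sum(mass * component_risk(c, value) for value, mass in g_n.items())


# Hypothetical compound parameter with repeated component values.
thetas = [0.5, -1.0, 0.5, 2.0, -1.0, 0.5]
print(compound_risk_simple_symmetric(0.6, thetas))
print(bayes_risk_vs_empirical(0.6, thetas))
```

The two printed numbers agree up to rounding, which is the identity behind the simple envelope: minimizing over all component rules (not only linear ones) would give $r(G_n)$.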

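For a compound rule $\underline{t}$, the
difference $D_n(\underline{t},\underline\theta) := R_n(\underline{t},\underline\theta) - r(G_n)$ is called the modified regret of $\underline{t}$
at $\underline\theta$ and a sequence of compound rules $\{\underline{t} : n \ge 1\}$ is said to be
asymptotically optimal (a.o.) if

(1.2) $\sup_{\underline\theta} D_n(\underline{t},\underline\theta) \to 0$ as $n \to \infty$.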

However, it has long been recognized (Hannan and Robbins
(1955)) that the compound problem is invariant under the group of n!
permutations of coordinates; also, almost all the compound rules in
the literature are equivariant under the permutation group. Hence a
more appropriate yardstick to judge the performance of a compound
rule should be the equivariant envelope, the minimum compound risk of
equivariant rules (see Gilliland and Hannan (1986) for a discussion

of equivariance in compound decision problems).

Mashayekhi (1990) has shown that if the component problem
involves a compact (in total variation norm) class of mutually
absolutely continuous probability measures, then the excess of the
simple envelope over the equivariant envelope goes to zero uniformly
in the measures. We shall use that result to extend our optimality
result against the simple envelope to that against the equivariant

envelope.

A sequence of compound rules $\{\underline{t} : n \ge 1\}$ is said to be
admissible if for every $n$ it is admissible in the usual sense.

2. Literature review and a summary.

The problem of exhibiting compound rules which are a.o. as
well as admissible has been an interesting and challenging question
ever since it was put forward by Robbins (1951) in his pioneering
paper of compound decision theory. He considered the problem of
decision between $N(-1,1)$ and $N(1,1)$, exhibited an a.o. compound
procedure and conjectured that the Bayes compound rule versus the
symmetric prior uniform on proportions might have better risk
behavior, exactly or asymptotically, than his a.o. rule. [That it

will not be exactly superior to the bootstrap rule of Robbins was

shown by Huang (1972).]

A.o. compound rules whose components are typically Bayes
versus some estimates of the unknown $G_n$ or direct estimates of the
Bayes rule versus $G_n$ have been worked out for many different
component problems. In particular, when the component problem is an
estimation problem under squared error loss, Gilliland (1968) and
Singh (1974) obtained a.o. sequence compound rules with rates (we
say $\underline{t}$ is a.o. with rate $a_n$ if $\sup_{\underline\theta} D_n(\underline{t},\underline\theta) = O(a_n)$) for discrete and
Lebesgue exponential components respectively. But these rules are

inadmissible in the sense of the previous section.

Making use of results from Gilliland and Hannan (1974), which
was later published in 1986, Gilliland, Hannan and Huang (1976)
obtained admissible and a.o. rules with rate $n^{-1/2}$ where the
component problem was a two-state restricted risk problem. [They did
not specify admissibility of their rules. But they considered full
support hyperprior mixing of independent identically distributed
(iid hereafter) priors on the compound state space to generate full
support priors on it and looked at the resulting Bayes rules. Since
the risk in their problem is trivially continuous (the state space
is discrete), the resulting Bayes rules are admissible.]

The first solution to the problem of exhibiting compound rules
which are a.o. as well as admissible when the component problem
involves decision among infinitely many probability measures, has
been provided by Datta (1988/91b). The component problem there is
the squared error loss estimation of an arbitrary continuous
transform of the natural parameter of a large compact subclass of a

one parameter exponential family.

Since then, Mashayekhi (1990) proposed a class of admissible
and a.o. procedures in the restricted risk compact component
compound decision problem. This was extended by Zhu (1992), who
successfully exploited Datta’s (1991a) result about consistency of
the posterior mixtures to obtain admissible and a.o. rules when the
component problem involves equi(in actions) continuous loss
functions in a multiparameter exponential family with parameter set

restricted to a polytope inside the natural parameter set.

The present work seems to be the first to accomplish
asymptotic optimality when the component problem is the estimation
of an infinite dimensional parameter. In fact, it accomplishes
admissibility and asymptotic optimality simultaneously. Our

component distributions, indexed by a real separable Hilbert space,
form a Gaussian shift experiment. We consider the component problem of
squared error loss estimation of the Hilbert-valued shift parameter
restricted to a compact subset of the Hilbert space. We note that
all Bayes estimators in our compound problem are admissible. Our
main result is that a Bayes compound estimator versus a mixture of
iid priors on the compound parameter is a.o. if the mixing

hyperprior has full support.
The dissertation is organized as follows.

Chapter 1 treats the compound estimation problem. Section 1
formally introduces the component distributions as satisfying an
assumption (A). That assumption immediately identifies the
experiment to be a Gaussian shift experiment. Section 2 describes
the Bayes estimator versus the above mentioned mixture of iid priors
and establishes a bound on the modified regret of such an estimator.
Section 3 establishes an upper bound on the distance between two
component Bayes estimators in terms of the L1 distance between the
corresponding mixtures. Section 4 combines the results in Sections 2
and 3 to establish asymptotic optimality, first against the simple
envelope and then against the equivariant envelope, assuming
posterior mixtures are L1 consistent for the empirical mixture. In
this section we provide a closed form expression of our estimator,
and examples of a full support hyperprior. In Section 5, we show
that every Bayes estimator in our compound problem, in particular a

Bayes estimator versus a mixture of iid priors, is admissible.

In Chapter 2 we establish the consistency of the posterior

mixtures assumed in proving asymptotic optimality in Chapter 1. In
the process we get that the very general sufficient conditions given
by Datta (1991a) for this kind of consistency of the posterior

mixtures are by no means necessary.

Chapter 3 looks at the empirical Bayes problem of Robbins
(1951, 1956) with the component problem described above.
Admissibility and asymptotic optimality (defined in that chapter)

follow from the compound results.

Finally, in Section 1 of the Appendix we prove two
measurability lemmas that are used in the main body of the
dissertation; in Section 2, we characterize the topological support
of a Dirichlet prior on a separable metric space, which is used in

Section 4, Chapter 1 to give examples of a full support hyperprior.

3. Notational conventions.

Given any $n$-tuple $\underline{x} = (x_1,\dots,x_n)$ of elements from a set, for
each $1 \le \alpha \le n$, $\underline{x}_\alpha$ denotes the $\alpha$-tuple $(x_1,\dots,x_\alpha)$. For probabilities
$P_1,\dots,P_n$, $\times_{i=1}^{n} P_i$ denotes their measure theoretic product; when $P_i = P$ $\forall\,i$,
$\times_{i=1}^{n} P_i$ is denoted by $P^n$. For sets $\{A_i : 1 \le i \le n\}$, $\times_{i=1}^{n} A_i$ denotes their
set theoretic product; when $A_i = A$ $\forall\,i$, $\times_{i=1}^{n} A_i$ is denoted by $A^n$. To
denote the integral of a function $f$ with respect to (wrt hereafter)
a measure $\mu$, we will interchangeably use the standard integral
notation $\int f\,d\mu$ and the left operator notation $\mu(f)$, or even $\mu f$;
depending on typographical convenience and the emphasis to be
conveyed, the dummy variable of integration in the integral notation
will be sometimes displayed, sometimes only partially displayed and
sometimes hidden altogether. Sets are always identified with their
indicator functions. The same is true for probabilities and their
induced expectations. $\mathbb{R}$ stands for the real line. If $X$ is a random
element on a probability space $(\cdot,\cdot,P)$, then $PX^{-1}$ denotes the $P$-induced
distribution of $X$ on the range space. The notation a := b
will mean that a is defined to be b. The set theoretic complement of
a set A will be denoted by A, except in Section 2 of the Appendix,
where the more traditional Ac will be used. The following numbering
convention will be used throughout : All numberings of displays and
statements are local within a chapter. For chapters with multiple
sections, (2.1) will refer to the first display in the second
section; for chapters with a single section, (3) will refer to the
third display. On occasions when we have to refer to numberings in

other chapters, the reference will be explicit, e.g. Theorem 1 of

Chapter 2 or Lemma 1.1 of Chapter 1.

CHAPTER 1
THE COMPOUND ESTIMATION

In this chapter we consider the compound problem as described
in Chapter 0 corresponding to the Gaussian shift component problem to
be introduced below. We prove asymptotic optimality of Bayes rules
versus (full support hyperprior) mixture of iid priors [Theorem
4.1], which is the main result of the dissertation, using the
consistency of the posterior (distribution of $\theta_n$ under the mixed
compound prior given $\underline{x}_{n-1}$) mixtures, a result of independent
interest stated and proved in Chapter 2. Section 1 describes the
component problem to be investigated and assembles some pertinent
facts about it. In Section 2 we calculate a Bayes estimator in the
compound problem versus a mixture of iid priors on the compound
parameter and obtain a useful upper bound on its absolute modified
regret. In Section 3, we obtain an upper bound on the distance
between two component Bayes rules in terms of the L1 distance between
the corresponding mixtures, which is used in Section 4 in
conjunction with the bound on the absolute modified regret obtained
in Section 2 to prove the main result. In Section 5 we show that
every Bayes estimator in the compound problem, in particular a Bayes

estimator versus a mixture of iid priors, is admissible.

1. The Gaussian shift component.

We consider the squared error loss estimation problem in a

Hilbert indexed Gaussian shift experiment. Let $H$ be a real separable
Hilbert space (with $\|f\|$ denoting the norm of an element $f$ in $H$ and
$\langle\cdot,\cdot\rangle$ the inner product) and $\{P_\theta : \theta \in H\}$ be a family of
probabilities on a measurable space $(\mathcal{X}, \mathcal{F})$ specified by (strictly
positive) densities $\{p_\theta : \theta \in H\}$ wrt $\mu = P_0$, such that

(A) the map $\theta \mapsto \langle\theta,\cdot\rangle_0 := \ln p_\theta(\cdot) + \|\theta\|^2/2$ is linear from $H$
into the linear space of all real-valued measurable
functions on $(\mathcal{X}, \mathcal{F})$.

We consider the component problem with $\Theta$ a compact subset of
$H$, $H \supset \mathcal{A} \supset \Theta$ and $L(a,\theta) = \|a-\theta\|^2$.

The contents of the remainder of this section are as described
below : We show that $\{\langle\theta,\cdot\rangle_0 : \theta \in H\}$ is the isonormal process on
$(\mathcal{X}, \mathcal{F}, \mu)$ in the sense of Dudley (1967) [Remark 1.1], which in turn
identifies the experiment under investigation to be a Gaussian shift
experiment in the sense of LeCam (1986) [Remark 1.2]. We show that
$(\theta,x) \mapsto p_\theta(x)$ and $(\omega,x) \mapsto p_\omega(x) := \int p_\theta(x)\,d\omega(\theta)$ are jointly measurable
when $\Omega$ (the set of all probabilities on $\Theta$) is endowed with the
topology of weak convergence and the corresponding Borel $\sigma$-field
[Remark 1.3]. We then show that a Bayes estimator in the component
problem must be the (weak) posterior expectation [Lemma 1.1]. We
close the section by proving two lemmas [Lemma 1.2 and Lemma 1.3]
describing certain features of the component problem that are used

in the sequel.

Remark 1.1 (The isonormal process). Since by (A)
$p_\theta = \exp(-\|\theta\|^2/2 + \langle\theta,\cdot\rangle_0)$ $\forall\,\theta \in H$, by representing the lhs below
as a $\mu$ integral, using the linearity of the map in (A) to treat the
integrand and representing the resulting integral as a $P_{\eta+t\theta}$
integral, we get $\forall\,t \in \mathbb{R}$ and $\theta,\eta \in H$,

$P_\eta[\exp\{t\langle\theta,\cdot\rangle_0\}] = \exp\{t\langle\theta,\eta\rangle + t^2\|\theta\|^2/2\}$,

which by uniqueness of moment generating function proves

(1.1a) $P_\eta\langle\theta,\cdot\rangle_0^{-1} = N(\langle\theta,\eta\rangle, \|\theta\|^2)$ $\forall\,\theta,\eta \in H$.

The linearity assumption in (A) then shows

(1.1b) $\{\langle\theta,\cdot\rangle_0 : \theta \in H\}$ is a centered Gaussian process on
$(\mathcal{X}, \mathcal{F}, \mu)$.

By the (polar) representation of the product of two numbers in terms
of the square of their sum and the individual squares, using the
linearity of the map in (A) and (1.1a) with $\eta = 0$, we get

(1.2) $\mu(\langle\theta,\cdot\rangle_0\langle\eta,\cdot\rangle_0) = \langle\theta,\eta\rangle$ $\forall\,\theta,\eta \in H$.

Now, the assertions in (1.1b) and (1.2) show that the process
$\{\langle\theta,\cdot\rangle_0 : \theta \in H\}$ is isonormal in the sense of Dudley (1967). //
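The moment generating function identity and the covariance formula (1.2) can be checked numerically in the simplest special case. The sketch below assumes, purely for illustration, that $H = \mathbb{R}$, that $\mu$ is the standard normal law and that $\langle\theta,x\rangle_0 = \theta x$, so that $p_\theta(x) = \exp(\theta x - \theta^2/2)$ is the density of $N(\theta,1)$ wrt $\mu$; the helper names and the trapezoid quadrature are not from the dissertation.

```python
import math


def integrate(f, lo=-12.0, hi=12.0, steps=24000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


def std_normal_pdf(x):
    """Lebesgue density of mu = N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density(theta, x):
    """p_theta(x) = exp(theta*x - theta^2/2), the density of N(theta, 1) wrt mu."""
    return math.exp(theta * x - 0.5 * theta * theta)


def mgf_under_shift(t, theta, eta):
    """P_eta[exp(t * <theta, .>_0)], computed as a mu-integral."""
    return integrate(
        lambda x: math.exp(t * theta * x) * density(eta, x) * std_normal_pdf(x)
    )


def mgf_closed_form(t, theta, eta):
    """exp(t*<theta, eta> + t^2*||theta||^2 / 2), the mgf of N(<theta, eta>, ||theta||^2)."""
    return math.exp(t * theta * eta + 0.5 * t * t * theta * theta)


def covariance_under_mu(theta, eta):
    """mu(<theta, .>_0 * <eta, .>_0), which (1.2) says equals <theta, eta>."""
    return integrate(lambda x: (theta * x) * (eta * x) * std_normal_pdf(x))


print(mgf_under_shift(0.7, 1.3, -0.4), mgf_closed_form(0.7, 1.3, -0.4))
print(covariance_under_mu(1.3, -0.4), 1.3 * -0.4)
```

In this one-dimensional case the integrals are ordinary Gaussian integrals, so agreement here only illustrates the identities; it says nothing about the infinite-dimensional setting treated in the text.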

Remark 1.2 (Gaussian shift experiment). Note that by (1.1b), the
experiment under investigation is a Gaussian shift experiment in the
sense of Definition 2 of Chapter 9 of LeCam (1986). Even though the
definition in LeCam does not require the indexing set to be a
Hilbert space, discussions following it show that it suffices to

restrict attention to that case. //

Remark 1.3 (Joint measurability of densities). Let $\{e_j : j \ge 1\}$
be an orthonormal basis of $H$. By (1.1b) and (1.2), we get that
$\{\langle e_j,\cdot\rangle_0 : j \ge 1\}$ are independent random variables on $(\mathcal{X}, \mathcal{F}, \mu)$. Let
$\theta_n := \sum_{j=1}^{n} \langle\theta,e_j\rangle e_j$. By linearity of the map in (A),
$\langle\theta_n,\cdot\rangle_0 = \sum_{j=1}^{n} \langle\theta,e_j\rangle\langle e_j,\cdot\rangle_0$. Since $\theta_n \to \theta$ in $H$, $\langle\theta_n,\cdot\rangle_0$
converges to $\langle\theta,\cdot\rangle_0$ in $L_2(\mu)$ by (1.2); since the $\langle\theta_n,\cdot\rangle_0$ are
the partial sums of a sequence of independent random variables, by
Lévy's Theorem [Theorem 3.3.1, Chow and Teicher (1988)], the
convergence is $\mu$-a.s. as well. Since $\langle\theta_n,\cdot\rangle_0$ is continuous in $\theta$
and a measurable function on $\mathcal{X}$, it is jointly (in $\theta$ and $x$)
measurable by Doob's Theorem. That implies the joint measurability
of its $\mu$-a.s. limit and hence that of $p_\theta$.

Let $\Omega$, the set of all probabilities on the Borel $\sigma$-field of $\Theta$,
be endowed with the topology of weak convergence and the
corresponding Borel $\sigma$-field. For $\omega \in \Omega$, let $p_\omega(x) := \int p_\theta(x)\,d\omega$; this
is clearly a density of the mixture $P_\omega := \int P_\theta\,d\omega$ wrt $\mu$. The map
$(\omega,x) \mapsto p_\omega(x)$ is jointly measurable by the joint measurability of
$(\theta,x) \mapsto p_\theta(x)$ and Lemma 1.2 of the Appendix.

The next lemma characterizes a Bayes estimator in our
component problem. Specializing the notation introduced in Chapter 0
we shall denote a Bayes estimator (versus w) in the component

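problem by $\tau_\omega$.

Throughout the remainder of the dissertation, let

(1.3) $M = \sup\{\|\theta\| : \theta \in \Theta\}$.

Lemma 1.1. On the common support of $\{P_\nu : \nu \in \Theta\}$, $\tau_\omega$ is the
unique mapping into $H$ satisfying

$\langle\tau_\omega,h\rangle = \int_\Theta \langle\eta,h\rangle (p_\eta/p_\omega)\,d\omega(\eta)$ $\forall\,h \in H$.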

Proof. We first show that for any probability measure $\pi$ on $\Theta$,
$\exists$ a unique element $v(\pi)$ in $H$ satisfying

(1.4) $\langle v(\pi),h\rangle = \int_\Theta \langle\eta,h\rangle\,d\pi(\eta)$ $\forall\,h \in H$.

Since the map $h \mapsto \int_\Theta \langle\eta,h\rangle\,d\pi(\eta)$ is a linear functional on $H$ whose
norm is bounded by $M$, the assertion of (1.4) follows from the
Riesz-Fréchet Theorem [Theorem 5.5.1, Dudley (1989)].

Note that if $p_\omega(x)$ is positive, the map $\theta \mapsto p_\theta(x)/p_\omega(x)$ is a
density (wrt $\omega$) of a probability measure $\hat\omega_x$ on $\Theta$. By (1.4), it is
enough to show that $\tau_\omega = v(\hat\omega)$ on the common support of $\{P_\nu : \nu \in \Theta\}$.
Now, by Fubini's Theorem, the Bayes risk (versus $\omega$) of an estimator
$t$ is equal to

(1.5) $\int_{\mathcal{X}} \Big[ \int_\Theta \|t(x)-\theta\|^2\,d\hat\omega_x(\theta) \Big] p_\omega(x)\,d\mu(x)$.

Triangulating around $v(\hat\omega_x)$ and expanding the norm square of the sum,
the inner integral in (1.5) is

$\|t(x)-v(\hat\omega_x)\|^2 + \int_\Theta \|v(\hat\omega_x)-\theta\|^2\,d\hat\omega_x(\theta)$,

which is minimized iff $t(x) = v(\hat\omega_x)$, completing the proof. //
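For a finitely supported prior the posterior expectation of Lemma 1.1 is a finite weighted average of the atoms, which makes it easy to compute. The following sketch assumes the finite-dimensional special case $H = \mathbb{R}^d$ with $\langle\theta,x\rangle_0$ the Euclidean inner product, so that $\ln p_\theta(x) = \langle\theta,x\rangle - \|\theta\|^2/2$; the function names and the example prior are hypothetical and only illustrate the formula.

```python
import math


def log_density(theta, x):
    """ln p_theta(x) = <theta, x> - ||theta||^2 / 2, assuming H = R^d and mu = N(0, I)."""
    inner = sum(t * xi for t, xi in zip(theta, x))
    return inner - 0.5 * sum(t * t for t in theta)


def bayes_estimate(x, atoms, weights):
    """Posterior mean of theta given x for the prior placing weights[j] on atoms[j]."""
    logs = [math.log(w) + log_density(a, x) for a, w in zip(atoms, weights)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    post = [math.exp(value - top) for value in logs]
    total = sum(post)
    post = [p / total for p in post]
    return [sum(p * a[i] for p, a in zip(post, atoms)) for i in range(len(x))]


# Symmetric two-point prior on {-1, +1} in dimension one: the posterior mean is tanh(x).
print(bayes_estimate([0.3], [[1.0], [-1.0]], [0.5, 0.5]), math.tanh(0.3))
```

Since the estimate is a convex combination of the atoms, its norm never exceeds $M$ of (1.3), which is the fact used later when bounding $\|\tau_\omega - \tau_\pi\|$ by $2M$.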

Lemma 1.2. For every finite sequence $\{\theta_i : 1 \le i \le k\} \subset H$ and $\{a_i : 1 \le i \le k\} \subset \mathbb{R}$,

$2\log\Big(\mu\Big(\prod_{i=1}^{k} p_{\theta_i}^{a_i}\Big)\Big) = \Big\|\sum_{i=1}^{k} a_i\theta_i\Big\|^2 - \sum_{i=1}^{k} a_i\|\theta_i\|^2$.

Proof. Starting with the functional form of $p_{\theta_i}$ implicit in
(A), the assertion follows by using the linearity of the map in (A)
and the functional form of $p_\delta$, where $\delta = \sum_{i=1}^{k} a_i\theta_i$. //

Throughout the remainder of the dissertation, let $\|f\|_q$ denote
the $L_q(\mu)$ norm of a function $f$ in $L_q(\mu)$.

Lemma 1.3. For every $\omega \in \Omega$ and every integer $q \ge 1$,

$p_\omega \in L_q(\mu)$ and $\|p_\omega\|_q \le e^{(q-1)M^2/2}$.

Proof. Writing $p_\omega^q$ as a $q$-fold iterated integral, interchanging
the order of integration on $\mathcal{X}$ and $\Theta^q$, applying Lemma 1.2 (with $k=q$
and $a_i = 1$ $\forall\,i$), and using (1.3), we get

(1.6) $\mu(p_\omega^q) \le \exp\{q(q-1)M^2/2\}$,

completing the proof. //
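Both lemmas can be sanity-checked by quadrature in a one-dimensional toy case. The sketch below assumes $H = \mathbb{R}$, $\mu = N(0,1)$ and $p_\theta(x) = \exp(\theta x - \theta^2/2)$, reads Lemma 1.2 as $2\log\mu(\prod p_{\theta_i}^{a_i}) = (\sum a_i\theta_i)^2 - \sum a_i\theta_i^2$, and uses a hypothetical three-point mixing measure for Lemma 1.3; none of the names below come from the dissertation.

```python
import math


def integrate(f, lo=-30.0, hi=30.0, steps=30000):
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, steps):
        total += f(lo + k * h)
    return total * h


def log_mu_product_numeric(thetas, exponents):
    """log of mu(prod_i p_{theta_i}^{a_i}) by quadrature, with mu = N(0, 1)."""
    def integrand(x):
        log_prod = sum(a * (th * x - 0.5 * th * th) for th, a in zip(thetas, exponents))
        return math.exp(log_prod - 0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return math.log(integrate(integrand))


def log_mu_product_closed(thetas, exponents):
    """Closed form of Lemma 1.2: ((sum a_i theta_i)^2 - sum a_i theta_i^2) / 2."""
    s = sum(a * th for th, a in zip(thetas, exponents))
    return 0.5 * (s * s - sum(a * th * th for th, a in zip(thetas, exponents)))


def lq_norm_of_mixture(atoms, weights, q):
    """L_q(mu) norm of p_omega for the discrete mixing measure (atoms, weights)."""
    def integrand(x):
        p_omega = sum(w * math.exp(th * x - 0.5 * th * th) for th, w in zip(atoms, weights))
        return p_omega ** q * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return integrate(integrand) ** (1.0 / q)


M = 1.5  # plays the role of sup{|theta|} over the support in (1.3)
atoms, weights = [-1.5, 0.3, 1.5], [0.2, 0.3, 0.5]
for q in range(1, 5):
    print(q, lq_norm_of_mixture(atoms, weights, q), math.exp((q - 1) * M * M / 2.0))
```

Each printed norm stays below the bound $e^{(q-1)M^2/2}$ in the last column, with equality at $q = 1$.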

2. Estimators induced by hyperpriors.

In Subsection 2.1 we show that the $\alpha$-th component of a Bayes
estimator in the compound problem versus a mixture of iid priors on
the compound parameter is the Bayes estimator in the component
problem versus the posterior mean under the mixing hyperprior given
the data from the other problems; in Subsection 2.2 we obtain an
upper bound on the absolute modified regret of such an estimator in
terms of the distance between its $\alpha$-th component and a component
Bayes rule versus the empirical state distribution.

2.1. Bayes versus mixture of iid priors.

Since $\Theta$ is a compact metric space, by Theorem II.6.4 of
Parthasarathy (1967), $\Omega$ with the topology of weak convergence is
also a compact metric space; let $\mathcal{B}(\Omega)$ denote its Borel $\sigma$-field. Let
$\Lambda$ be a probability measure on $(\Omega, \mathcal{B}(\Omega))$. We take the $\Lambda$-mixture of iid
priors on $\Theta^n$ (for each $n$) and denote that prior by $\bar\omega_{\Lambda,n}$. [The
measure $\bar\omega_{\Lambda,n}$ is defined on the class of measurable rectangles by

(2.1) $\bar\omega_{\Lambda,n}(B_1 \times B_2 \times \cdots \times B_n) = \int_\Omega \prod_{i=1}^{n} \omega(B_i)\,d\Lambda$,

and then extended to the product $\sigma$-field. Note that by Lemma 1.1 of
the Appendix the above integrand is measurable.]

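Let $\underline{t} = (t_1,\dots,t_n)$, where $t_\alpha : \mathcal{X}^n \to \mathcal{A}$ is a measurable function,
be an estimator in the set compound problem. The $\alpha$-th component
Bayes risk of $\underline{t}$ versus $\bar\omega_{\Lambda,n}$ is

(2.2) $R(t_\alpha,\bar\omega_{\Lambda,n}) = \int_\Omega \int_{\mathcal{X}^{n-1}} \Big[ \int_\Theta \int_{\mathcal{X}} \|t_\alpha-\theta_\alpha\|^2\,dP_{\theta_\alpha}\,d\omega(\theta_\alpha) \Big]\,dP_\omega^{n-1}\,d\Lambda$.

Disintegrating the joint probability on $\mathcal{X}^{n-1} \times \Omega$ determined by
$(dP_\omega^{n-1}\,d\Lambda)$ as $(d\Lambda_{\alpha,n}\,dP_{\bar\omega_{\Lambda,n-1}})$, where $\Lambda_{\alpha,n}$ is the posterior
distribution of $\omega$ (under $\Lambda$) given $(x_1,\dots,x_{\alpha-1},x_{\alpha+1},\dots,x_n)$ [since
$\Omega$ is a Polish (in fact compact metric) space, by Theorem 10.2.2 of
Dudley (1989), such a disintegration exists], we get

(2.3) lhs(2.2) $= \int_{\mathcal{X}^{n-1}} \Big[ \int_\Theta \int_{\mathcal{X}} \|t_\alpha-\theta_\alpha\|^2\,dP_{\theta_\alpha}\,d\omega_{\alpha,n} \Big]\,dP_{\bar\omega_{\Lambda,n-1}}$,

where $\omega_{\alpha,n}$ denotes the $\Lambda_{\alpha,n}$ mix of $\omega$'s. Clearly, rhs(2.3) is
minimized by choosing $t_\alpha(\underline{x}) = \tau_{\omega_{\alpha,n}}(x_\alpha)$. Since the compound risk is
the average of the component risks, the Bayes estimator in the set
compound problem versus the prior $\bar\omega_{\Lambda,n}$ is given by $\hat{\underline{t}}$, where

(2.4a) $\hat t_\alpha(\underline{x}_n) = \tau_{\omega_{\alpha,n}}(x_\alpha)$.

A similar argument shows that the Bayes estimator in the
sequence compound problem versus the prior $\bar\omega_{\Lambda,n}$ is given by $\underline{t}'$, where

(2.4b) $t'_\alpha(\underline{x}_n) = \tau_{\omega_{\alpha,\alpha}}(x_\alpha)$.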

2.2. A useful inequality on the modified regret.

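Recall from Chapter 0 that $G_n$ stands for the empirical
distribution of $\theta_1,\dots,\theta_n$. For every $\underline\theta \in \Theta^n$, by definition,

$D_n(\hat{\underline{t}},\underline\theta) = n^{-1} \sum_{\alpha=1}^{n} P_{\underline\theta}\big[\|\hat t_\alpha-\theta_\alpha\|^2 - \|\tilde t_\alpha-\theta_\alpha\|^2\big]$,

where $\tilde t_\alpha(\underline{x}) = \tau_{G_n}(x_\alpha)$. Using Cauchy-Schwarz inequality to bound the
absolute difference between $\|d\|^2$ and $\|b\|^2$ by $\|d+b\|$ times $\|d-b\|$,
triangle inequality in $H$ and (1.3), we get

(2.5) $|D_n(\hat{\underline{t}},\underline\theta)| \le 4Mn^{-1} \sum_{\alpha=1}^{n} P_{\underline\theta}\|\hat t_\alpha-\tilde t_\alpha\|$.

Since $\hat t_\alpha(\underline{x}) = \tau_{\omega_{\alpha,n}}(x_\alpha)$,

(2.6) $P_{\underline\theta}\|\hat t_\alpha-\tilde t_\alpha\| = P_{\underline\theta}\|\tau_{\omega_{\alpha,n}}-\tau_{G_n}\|$;

to investigate the bound on the absolute modified regret given by
(2.5), we therefore consider $P_\theta\|\tau_\omega-\tau_\pi\|$, where $\theta \in \Theta$ and $\omega,\pi \in \Omega$.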

3. A bound on the $L_1(P_\theta)$ distance between two component Bayes rules.

In Proposition 3.1 we derive a bound on $P_\theta\|\tau_\omega-\tau_\pi\|$ essentially
in terms of the total variation distance between the corresponding
mixtures. Abusing notation we shall use $\|\sigma\|$ to denote the total
variation norm of a signed measure $\sigma$ on $(\mathcal{X}, \mathcal{F})$ as well.
The next three lemmas are used to prove Proposition 3.1.

Lemma 3.1. Let $\langle\omega,x\rangle_0 := \int \langle\theta,x\rangle_0\,d\omega(\theta)$. Then

$\mu\langle\omega,\cdot\rangle_0^{-1} = N\big(0, \int\!\!\int \langle\eta,\xi\rangle\,d\omega(\eta)\,d\omega(\xi)\big)$.

Proof. By (1.1b), $\langle\omega,\cdot\rangle_0$ is normally distributed if $\omega$ is
finitely supported. Since $(\theta,\eta) \mapsto \langle\theta,\eta\rangle$ is continuous and bounded
(on compact $\Theta^2$), the map taking $(\omega,\pi)$ to the $L_2(\mu)$ inner product of
$\langle\omega,\cdot\rangle_0$ and $\langle\pi,\cdot\rangle_0$ (which by interchanging the order of
integration and using (1.2) is seen to be $(\omega \times \pi)\langle\cdot,\cdot\rangle$) is
continuous. Continuity of $\omega \mapsto \langle\omega,\cdot\rangle_0$ in $L_2(\mu)$ follows. Since $\Omega$
has a dense subset consisting of finitely supported measures
[Theorem II.6.3, Parthasarathy (1967)], and a family of normally
distributed random variables is closed under $L_2$ convergence, we get
that $\langle\omega,\cdot\rangle_0$ is normally distributed. The expression for the mean
and the variance follows by using Fubini's Theorem. //

The following lemma is Lemma A.1 of Datta (1988).

Lemma (Datta-Singh). For $(y,z,Y,Z,L) \in \mathbb{R}^5$ such that $z \ne 0$ and
$L \ge 0$,

$|z|\{|y/z - Y/Z| \wedge L\} \le |y-Y| + (|y/z| + L)|z-Z|$.

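Lemma 3.2. Given $\epsilon > 0$, $\exists$ $\{h_1,\dots,h_I\} \subset \mathcal{W} := \{h \in H : \|h\| \le 1\}$
such that, for all real numbers $a$ and $b$,

(3.1) $\exp(-M^2/2-a+b)\,\mu\big(p_\theta\|\tau_\omega-\tau_\pi\|[\langle\theta,\cdot\rangle_0 \le a][\langle\omega,\cdot\rangle_0 > b]\big)$
$\le 2\epsilon + \sum_{i=1}^{I} \mu\big|\int \langle\eta,h_i\rangle p_\eta\,d(\omega-\pi)(\eta)\big| + 3M\|P_\omega-P_\pi\|$.

Proof. Starting with the definition of $p_\omega$ ($= \int p_\theta\,d\omega$), recalling
the functional form of $p_\theta$ implicit in (A), using (1.3) to bound $p_\theta$
below, applying Jensen's inequality to the exponential function, and
noting that $p_\theta[\langle\theta,\cdot\rangle_0 \le a]e^{-a} \le 1$ and $e^{\langle\omega,\cdot\rangle_0} \ge e^{b}[\langle\omega,\cdot\rangle_0 > b]$,
we get

$\exp(-M^2/2-a+b)\,p_\theta[\langle\theta,\cdot\rangle_0 \le a][\langle\omega,\cdot\rangle_0 > b] \le p_\omega$.

In view of the above it suffices to show that $\mu(p_\omega\|\tau_\omega-\tau_\pi\|)$
can be bounded by rhs(3.1).

By Lemma 1.1,

(3.2) $\|\tau_\omega-\tau_\pi\| = \sup_{h \in \mathcal{W}} \big\{\big|\int_\Theta \langle\cdot,h\rangle\,d\hat\omega - \int_\Theta \langle\cdot,h\rangle\,d\hat\pi\big|\big\}$,

where $\hat\omega$ and $\hat\pi$ are as in the proof of Lemma 1.1.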

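Applying Datta-Singh Lemma with $z = p_\omega$, $y = \int_\Theta \langle\eta,h\rangle p_\eta\,d\omega(\eta)$,
$Z = p_\pi$, $Y = \int_\Theta \langle\eta,h\rangle p_\eta\,d\pi(\eta)$ and $L = 2M$,

(3.3) $p_\omega\big|\big(\int_\Theta \langle\eta,h\rangle p_\eta\,d\omega(\eta)/p_\omega\big) - \big(\int_\Theta \langle\eta,h\rangle p_\eta\,d\pi(\eta)/p_\pi\big)\big|$
$\le \big|\int_\Theta \langle\eta,h\rangle p_\eta\,d(\omega-\pi)(\eta)\big| + 3M|p_\omega-p_\pi|$.

Since $\Theta$ is compact by assumption and $\mathcal{W}$ is weakly compact by the
Banach-Alaoglu Theorem, $\Theta \times \mathcal{W}_w$ is compact. Since $H$ is separable, $\mathcal{W}_w$
and hence $\Theta \times \mathcal{W}_w$ is metrizable. Since $(\theta,h) \mapsto \langle\theta,h\rangle$ is a continuous
function on $\Theta \times \mathcal{W}_w$, it is uniformly continuous. That implies
$\{h \mapsto \langle\theta,h\rangle : \theta \in \Theta\}$ is an equi(in $\theta$) uniformly continuous family of
functions on $\mathcal{W}_w$, so that for every $\omega \in \Omega$

(3.4) $\mu\big|\int (\langle\theta,h\rangle - \langle\theta,h'\rangle)p_\theta\,d\omega\big| \le \sup_\Theta |\langle\theta,h\rangle - \langle\theta,h'\rangle| \le \epsilon$,

if the distance between $h$ and $h'$, in a metric metrizing $\mathcal{W}_w$, is less
than $\delta = \delta(\epsilon)$. If weak-balls of radius $\delta$ around $\{h_1,\dots,h_I\}$ cover $\mathcal{W}$,
then triangulating around appropriate $h_i$, using (3.4) and dominating
the maximum of $I$ non-negative terms by their sum, we get

(3.5) $\mu\big(\sup_{h \in \mathcal{W}} \big|\int \langle\eta,h\rangle p_\eta\,d(\omega-\pi)(\eta)\big|\big) \le 2\epsilon + \sum_{i=1}^{I} \mu\big|\int \langle\eta,h_i\rangle p_\eta\,d(\omega-\pi)(\eta)\big|$.

The lemma follows from (3.2),(3.3) and (3.5). //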

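Proposition 3.1. Let $\gamma > 0$ be fixed arbitrarily. Then, $\exists$ a
number $S < \infty$ such that

$P_\theta\|\tau_\omega-\tau_\pi\| \le 5\gamma + S\|P_\omega-P_\pi\|$.

Proof. For arbitrary real numbers $a$ and $b$, partitioning $\mathcal{X}$ into
the sets $[\langle\theta,\cdot\rangle_0 \le a][\langle\omega,\cdot\rangle_0 > b]$, $[\langle\theta,\cdot\rangle_0 \le a][\langle\omega,\cdot\rangle_0 \le b]$
and $[\langle\theta,\cdot\rangle_0 > a]$, using the bound $\|\tau_\omega-\tau_\pi\| \le 2M$ on the last two
sets and Cauchy-Schwarz inequality in $L_2(\mu)$ on the remaining
factors, and bounding $\|p_\theta\|_2$ by $e^{M^2/2}$ (see Lemma 1.3), we get

(3.6) $P_\theta\|\tau_\omega-\tau_\pi\| \le 2Me^{M^2/2}\{(\mu[\langle\theta,\cdot\rangle_0 > a])^{1/2} + (\mu[\langle\omega,\cdot\rangle_0 \le b])^{1/2}\}$
$+\ \mu(p_\theta\|\tau_\omega-\tau_\pi\|[\langle\theta,\cdot\rangle_0 \le a][\langle\omega,\cdot\rangle_0 > b])$.

By (1.1a), using the familiar bound on the upper tail of a
normal distribution and (1.3), we get, for $a > 0$,

(3.7) $\mu[\langle\theta,\cdot\rangle_0 > a] \le (2\pi)^{-1/2}Ma^{-1}\exp(-a^2/2M^2)$.

Similarly, using Lemma 3.1, for $b < 0$,

(3.8) $\mu[\langle\omega,\cdot\rangle_0 \le b] \le (2\pi)^{-1/2}M(-b)^{-1}\exp(-b^2/2M^2)$.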
In view of (3.7) and (3.8), the first term in rhs(3.6) can be
made arbitrarily small by appropriate choice (to be made later) of a
and b. To treat the second term, we shall use Lemma 3.2 and
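concentrate on the term $\mu|\int \langle\theta,h_i\rangle p_\theta\,d(\omega-\pi)|$ in the bound (3.1).

Expanding the function $\lambda \mapsto e^{\lambda\langle\theta,h\rangle}$ in a Taylor series around
$\lambda = 0$ up to 2nd order, collecting the terms in lhs(3.9) on one side
of the equality, and using Cauchy-Schwarz inequality in $H$ and (1.3)
to bound the other side, we get for $\lambda > 0$ and $h \in \mathcal{W}$,

(3.9) $|\langle\theta,h\rangle - \tfrac{1}{\lambda}(e^{\lambda\langle\theta,h\rangle}-1)| \le \lambda M^2e^{\lambda M}/2$.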
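Inequality (3.9) only involves the real number $u = \langle\theta,h\rangle$, which satisfies $|u| \le M$ because $\|h\| \le 1$. The sketch below checks the bound on a grid, assuming a hypothetical $M$ and a few values of $\lambda$; the function names are illustrative.

```python
import math


def taylor_gap(u, lam):
    """|u - (exp(lam*u) - 1)/lam|, where u stands for <theta, h>."""
    return abs(u - math.expm1(lam * u) / lam)


def taylor_bound(lam, M):
    """Right-hand side of (3.9): lam * M^2 * exp(lam*M) / 2."""
    return lam * M * M * math.exp(lam * M) / 2.0


M = 2.0  # hypothetical bound on |<theta, h>| coming from (1.3) and ||h|| <= 1
for lam in (0.01, 0.1, 1.0, 3.0):
    worst = max(taylor_gap(M * k / 20.0, lam) for k in range(-20, 21))
    print(lam, worst, taylor_bound(lam, M))
```

The gap shrinks linearly in $\lambda$, which is what allows the first term of the subsequent bound to be made small by choosing $\lambda$ small.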
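By (3.9) and the triangle inequality, with $\sigma$ abbreviating $\omega-\pi$,

(3.10) $\mu|\int \langle\theta,h\rangle p_\theta\,d\sigma| \le \lambda M^2e^{\lambda M} + \tfrac{1}{\lambda}\|P_\omega-P_\pi\| + \tfrac{1}{\lambda}\mu|\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\sigma|$.

We now show

(3.11) $\mu|\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d(\omega-\pi)| = P_{\lambda h}|p_\omega-p_\pi|$

as a consequence of

(3.11a) $\mu\big(\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\omega, \int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\pi\big)^{-1} = P_{\lambda h}(p_\omega,p_\pi)^{-1}$.

By (1.1a), linearity of inner product and the map in (A), we get, $\forall\,m \ge 1$ and $\forall\,(\theta_1,\dots,\theta_m) \in \Theta^m$,

$\mu\big(\{\lambda\langle\theta_i,h\rangle + \langle\theta_i,\cdot\rangle_0\}_{i=1}^{m}\big)^{-1} = P_{\lambda h}\big(\{\langle\theta_i,\cdot\rangle_0\}_{i=1}^{m}\big)^{-1}$,

or equivalently,

$\mu\big(\{e^{\lambda\langle\theta_i,h\rangle}p_{\theta_i}\}_{i=1}^{m}\big)^{-1} = P_{\lambda h}\big(\{p_{\theta_i}\}_{i=1}^{m}\big)^{-1}$.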

Hence, if $\omega$ and $\pi$ are finitely supported, (3.11a) holds. Since by Theorem II.6.3 of Parthasarathy (1967) $\Omega$ has a dense subset consisting of finitely supported measures, to prove (3.11a) for general $\omega$ and $\pi$ it will suffice to show that for every $\nu$ in $\Omega$, as $\nu_k\to\nu$, $\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\nu_k(\theta)$ [$p_{\nu_k}$] goes to $\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\nu(\theta)$ [$p_\nu$] along a subsequence $\mu$ [$P_{\lambda h}$] a.s. Actually we shall show the continuity of the map taking $(\nu,\nu')$ to the $L_2(\mu)$ [$L_2(P_{\lambda h})$] inner product of $\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\nu(\theta)$ and $\int e^{\lambda\langle\theta,h\rangle}p_\theta\,d\nu'(\theta)$ [$p_\nu$ and $p_{\nu'}$]. We do that by interchanging the order of integration on $\mathcal{X}$ and $\Theta^2$, using Lemma 1.2 (with $k=2$, $a_1=a_2=1$) to evaluate the $\mu$ integral (which is continuous on $\Theta^2$, by continuities of vector addition and inner product and the exponential function, and bounded on $\Theta^2$ by (1.3)) and Lemma III.1.1 of Parthasarathy (1967). The bracket alternative is shown by representing the $L_2(P_{\lambda h})$ inner product as a $\mu$ integral, again interchanging the order of integration on $\mathcal{X}$ and $\Theta^2$, using Lemma 1.2 (this time with $k=3$, $a_i=1\ \forall\ i$) to evaluate the integral (which is bounded continuous on $\Theta^2$ by the same reasons as above) and Lemma III.1.1 of Parthasarathy (1967) again.

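Combining (3.10) and (3.11), we get

(3.12)  lhs(3.10) $\le\lambda M^2e^{\lambda M}+\lambda^{-1}\|P_\omega-P_\pi\|+\lambda^{-1}\mu(|p_\omega-p_\pi|p_{\lambda h})$.

By partitioning $\mathcal{X}$ into $[p_{\lambda h}>c]$ and $[p_{\lambda h}\le c]$, and applying the Cauchy-Schwarz inequality in $L_2(\mu)$, we get

(3.13)  $\mu(|p_\omega-p_\pi|p_{\lambda h})\le c\|P_\omega-P_\pi\|+\|p_\omega-p_\pi\|_2\{\mu p_{\lambda h}^2[p_{\lambda h}>c]\}^{1/2}$.

Since the family $\{p_{\lambda h}^2:\lambda\in[0,K],\ h\in\mathcal{W}\}$ is uniformly $\mu$-integrable (it has uniformly bounded higher moments) for every $K>0$, $\{\mu p_{\lambda h}^2[p_{\lambda h}>c]\}^{1/2}$ can be made arbitrarily small, uniformly in $\lambda$ and $h$, by choosing $c$ large enough.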

Now choose $a$ in (3.7) and $b$ in (3.8) so that, uniformly in $\omega$ and $\theta$, $2Me^{M^2/2}\{(\mu[\langle\theta,\cdot\rangle_0>a])^{1/2}+(\mu[\langle\omega,\cdot\rangle_0\le b])^{1/2}\}<\gamma$. Then
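choose $\varepsilon$ small enough so that $\varepsilon\exp(M^2/2+a-b)<\gamma/6$. Let $I$ correspond to this $\varepsilon$ as in Lemma 3.2. Now choose $\lambda$ small enough so that $\lambda M^2e^{\lambda M}<\varepsilon/I$. Then choose $c$ large enough so that, uniformly in $\omega$ and $\pi$ as well as in $h\in\mathcal{W}$, $(1/\lambda)\|p_\omega-p_\pi\|_2\{\mu p_{\lambda h}^2[p_{\lambda h}>c]\}^{1/2}\le\varepsilon/I$ (possible since by Lemma 1.3 and the triangle inequality in $L_2(\mu)$, $\|p_\omega-p_\pi\|_2\le2e^{M^2/2}$).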

With these choices, by (3.12) and (3.13),

(3.14)  lhs(3.10) $\le2\varepsilon/I+(c+1)\|P_\omega-P_\pi\|/\lambda$.

The proof of the proposition is now completed [with $c_\gamma=\{3M+\lambda^{-1}I(c+1)\}\exp(M^2/2+a-b)$] by (3.6), choice of $a$ and $b$, use of Lemma 3.2 with the above mentioned choice of $\varepsilon$ and substitution of the bound from (3.14) in Lemma 3.2. //

4. Asymptotic optimality.

In view of the bound obtained in Proposition 3.1, (2.5) and (2.6), the question of convergence of the modified regret to 0 reduces to the question, loosely speaking, whether $P_{\hat\omega_{\alpha n}}$ is $L_1$ consistent for $P_{G_n}$. More specifically, it suffices to show

(4.1)  $\bigvee_{\alpha=1}^{n}P_{\underline\theta}\|P_{\hat\omega_{\alpha n}}-P_{G_n}\|\to0$, uniformly in $\underline\theta$, as $n\to\infty$.

In Theorem 1 in Chapter 2 we establish such a consistency result for the non-delete version for sufficiently diffuse $\Lambda$. The result involving the delete versions will follow as a corollary (i.e. Corollary 1 in Chapter 2).

Now we are in a position to prove our main result. For a finite measure $m$ on the Borel $\sigma$-field of a second countable topological space $\mathcal{Y}$, let $S_m$ denote the topological support of $m$. [For the definition of the topological support of a finite measure on a second countable topological space see Section 2 of the Appendix.]

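Theorem 4.1 (Main Result). If $S_\Lambda=\Omega$ and $\underline{\hat t}$ is the Bayes estimator in the set compound problem given in (2.4a), then

(4.2)  $\bigvee_{\alpha=1}^{n}P_{\underline\theta}\|\hat t_\alpha-\tilde t_\alpha\|\to0$, uniformly in $\underline\theta$, as $n\to\infty$.

Consequently, $\underline{\hat t}$ is a.o.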

Proof. The second part of the assertion follows from the first

part and the bound (2.5).

For the first part recall from (2.6) the representation $P_{\underline\theta}\|\hat t_\alpha-\tilde t_\alpha\|=P_{\underline\theta}P_{\theta_\alpha}\|\tau_{\hat\omega_{\alpha n}}-\tau_{G_n}\|$; since $\gamma$ in the statement of Proposition 3.1 is arbitrary, the assertion follows from that proposition and the $L_1$ consistency (4.1). //

Remark 4.1 (Asymptotic optimality against the equivariant envelope). As indicated in the introduction we now extend our optimality result against the simple envelope to that against the equivariant envelope. If the component problem involves a compact (in total variation norm) class of mutually absolutely continuous probability measures, then the excess of the simple envelope over the equivariant envelope goes to zero uniformly in the measures (Remark 4 in Mashayekhi (1990)). Recall that by assumption the measures $\{P_\theta:\theta\in\Theta\}$ are mutually absolutely continuous. Since $\Theta$ is topologically embedded in $\Omega$, by Lemma 2 of Chapter 2 the map $\theta\mapsto p_\theta$ is continuous in $L_4(\mu)$. That implies continuity of $\theta\mapsto P_\theta$ in total variation norm by the moment inequality. Since $\Theta$ is compact, $\{P_\theta:\theta\in\Theta\}$ is compact in the total variation norm. By triangulation around the simple envelope, the asymptotic optimality against the equivariant envelope follows from Theorem 4.1. //

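Remark 4.2 (Asymptotic optimality of Bayes sequence compound estimators). We now prove the asymptotic optimality of the Bayes sequence compound estimator $\underline t'$ given in (2.4b). For $1\le\alpha\le n<\infty$, let $\tilde t_{\alpha n}(\underline x_n)=\tau_{G_n}(x_\alpha)$, $\underline{\tilde t}=(\tilde t_{1n},\dots,\tilde t_{nn})$ and $\underline{\tilde t}'=(\tilde t_{11},\dots,\tilde t_{nn})$. Now note that (with $P_j$ and $\tau_j$ abbreviating $P_{\theta_j}$ and $\tau_{G_j}$ respectively), by the definition of $G_{k-1}$,

$\sum_{j=1}^{k-1}P_j\|\tau_k-\theta_j\|^2\ge\sum_{j=1}^{k-1}P_j\|\tau_{k-1}-\theta_j\|^2$  $\forall\ k=n,n-1,\dots,2$.

Applying the above iteratively with $k=n,n-1,\dots,2$,

$\sum_{j=1}^{n}P_j\|\tau_j-\theta_j\|^2\le\sum_{j=1}^{n}P_j\|\tau_n-\theta_j\|^2$.

That is,

$R_n(\underline{\tilde t}',\underline\theta)\le R_n(\underline{\tilde t},\underline\theta)$  $\forall\ n\ge1$,

which implies

(4.3)  $D_n(\underline t',\underline\theta)\le R_n(\underline t',\underline\theta)-R_n(\underline{\tilde t}',\underline\theta)$.

It should be noted that the display immediately preceding (4.3) is essentially inequality (8.8) of Hannan (1957).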

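From the definition of $R_n$, $\underline t'$ and $\underline{\tilde t}'$, following the steps involved in showing (2.5),

(4.4)  $|\mathrm{rhs}(4.3)|\le4Mn^{-1}\sum_{\alpha=1}^{n}P_{\underline\theta}\|t'_\alpha-\tilde t_{\alpha\alpha}\|$.

From (2.4b) and the definition of $\tilde t_{\alpha n}$, using an analog of (2.6), Proposition 3.1 and (4.1), it follows that

$\bigvee_{\underline\theta}P_{\underline\theta}\|t'_n-\tilde t_{nn}\|\to0$ as $n\to\infty$.

Using subadditivity of supremum and the fact that the limit of a convergent sequence equals its Cesàro limit, we get that rhs(4.4) $\to0$ uniformly in $\underline\theta$. If we can show that $\bigvee_{\underline\theta}D_n(\underline t',\underline\theta)$ is positive, the asymptotic optimality of $\underline t'$ will follow by (4.3) and convergence (uniform in $\underline\theta$) of rhs(4.4) to 0.

We shall show that $\bigvee_{\underline\theta}D_n(\underline t,\underline\theta)$ is positive for every compound procedure $\underline t$. Since $\int R_n(\underline t,\underline\theta)\,d\omega^n\ge r(\omega)$ for every $\omega$, in particular for $G_n$, we get that $\bigvee_{\underline\theta}R_n(\underline t,\underline\theta)\ge\bigvee_{\underline\theta}r(G_n)$. That, by definition of $\bigvee_{\underline\theta}D_n(\underline t,\underline\theta)$ and subadditivity of supremum, implies the positivity of $\bigvee_{\underline\theta}D_n(\underline t,\underline\theta)$. //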
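The step from termwise convergence to convergence of rhs(4.4) rests on the fact that Cesàro means inherit the limit of a convergent sequence. A minimal Python sketch of that fact follows; the function name is illustrative and the sequence used in the usage note is an arbitrary example, not a quantity from the dissertation.

```python
def cesaro_means(seq):
    """Running averages (a_1 + ... + a_n) / n of a finite sequence of numbers."""
    means, total = [], 0.0
    for n, a in enumerate(seq, start=1):
        total += a
        means.append(total / n)
    return means
```

For instance, applied to the sequence $1/k$, $k=1,\dots,10000$, the last running average is already below $0.002$, although the convergence of the averages is slower than that of the terms themselves.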

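Remark 4.3 (Calculation of the a.o. Bayes compound estimator). From (2.4a), Lemma 1.1 and the definition of $\hat\omega_{\alpha n}$, it follows by a successive deconditioning argument that

(4.5)  $\langle\hat t_\alpha(\underline x),h\rangle=\dfrac{\int\cdots\int\langle\theta_\alpha,h\rangle\prod_{i=1}^{n}p_{\theta_i}(x_i)\prod_{i=1}^{n}d\omega^{\underline\theta_{i-1}}(\theta_i)}{\int\cdots\int\prod_{i=1}^{n}p_{\theta_i}(x_i)\prod_{i=1}^{n}d\omega^{\underline\theta_{i-1}}(\theta_i)}$,

where $\omega^{\underline\theta_i}$ is the posterior mean of $\omega$ given $\underline\theta_i$ and $\omega^{\underline\theta_0}=\int\omega\,d\Lambda$; for details see Section 3 in Chapter 4 of Datta (1988).

To use (4.5) to calculate our Bayes compound estimator, we need to choose a hyperprior $\Lambda$ such that the posterior mean $\omega^{\underline\theta_i}$ has a nice form for all $i$. With that end in mind, we settle for the Dirichlet priors described below.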

Let $\alpha$ be a non-null finite Borel measure on $\Theta$, where $\Theta$ is an arbitrary separable metric space. In Section 2 of the Appendix we show (compiling some results from Section 4 in Ferguson (1973)) that there exists a probability measure $\mathcal{D}(\alpha)$ on $(\Omega,\mathcal{B}(\Omega))$ with the following property: for every finite measurable partition $\{B_1,\dots,B_m\}$ of $\Theta$, the distribution of $(\omega(B_1),\dots,\omega(B_m))$ under $\mathcal{D}(\alpha)$ is Dirichlet with parameters $(\alpha(B_1),\dots,\alpha(B_m))$. We call $\mathcal{D}(\alpha)$ the Dirichlet prior with parameter $\alpha$. By Theorem 2.1 of the Appendix, the topological support of $\mathcal{D}(\alpha)$ is $\Omega$ if that of $\alpha$ is $\Theta$. An example of a finite Borel measure $\alpha$ on $\Theta$ with full support is obtained by choosing a countable dense subset $\{\theta_n:n\ge1\}$ of $\Theta$ and selecting $\alpha=\sum_{n=1}^{\infty}c_n\delta_{\theta_n}$, where $c_n>0\ \forall\ n$ and $\sum_{n=1}^{\infty}c_n<\infty$. By Theorem 1 in Ferguson (1973),

$\omega^{\underline\theta_n}=(\alpha(\Theta)+n)^{-1}(\alpha+\sum_{i=1}^{n}\delta_{\theta_i})$,  $n\ge0$.

When $\Theta$ is a subset of the line, a Monte Carlo method for calculation of rhs(4.5) has been given by Kuo (1986). The problem of numerical evaluation of our estimator remains and is worth investigating. //
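The posterior-mean formula from Theorem 1 of Ferguson (1973) quoted above is easy to evaluate when the parameter measure is finitely supported. The Python sketch below does exactly that with exact rational arithmetic. The restriction to a finitely supported parameter measure, the dictionary representation and the function name are assumptions made for illustration; the sketch does not address the numerical evaluation of (4.5) itself.

```python
from collections import Counter
from fractions import Fraction


def dirichlet_posterior_mean(alpha, thetas):
    """Masses of (alpha(Theta) + n)^(-1) * (alpha + sum_i delta_{theta_i}).

    alpha  : dict mapping support points to non-negative integer or Fraction masses
             (a finitely supported parameter measure; an assumption of this sketch).
    thetas : list of the conditioning values theta_1, ..., theta_n.
    """
    counts = Counter(thetas)
    total = Fraction(sum(alpha.values())) + len(thetas)
    support = set(alpha) | set(counts)
    # Each point receives its prior mass plus the number of times it was observed.
    return {pt: (Fraction(alpha.get(pt, 0)) + counts[pt]) / total for pt in support}
```

With `alpha = {0: 1, 1: 1}` and `thetas = [1, 1, 2]` the result assigns masses 1/5, 3/5 and 1/5 to the points 0, 1 and 2; with an empty list it returns the normalized parameter measure, matching the case $n=0$.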

5. Admissibility.

The argument we use to prove admissibility of Bayes compound estimators is fairly standard in decision theory: a unique Bayes rule is admissible (see Theorem 1 in Section 2.3 of Ferguson (1967) for a precise statement).

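Let $\xi$ be a prior on the compound parameter $\underline\theta$. $Q$ will denote the joint distribution $\xi\circ P_{\underline\theta}$ on $(\underline x,\underline\theta)$. Note that $n^{-1}\sum_{\alpha=1}^{n}Q\|t_\alpha-\theta_\alpha\|^2$, the Bayes (versus $\xi$) compound risk of an estimator $\underline t$, is minimal iff $Q\|t_\alpha-\theta_\alpha\|^2$ is minimal for every $\alpha$. Now $Q\|t_\alpha-\theta_\alpha\|^2$ can be represented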
as [PQU'fllta—9a|[2dP0ad£a)d£, where {0:69a_1. Since the
expression inside parenthesis in the previous line has, by Lemma
1.1, a unique minimizer, there exists a unique Bayes compound estimator versus every prior $\xi$. That implies the admissibility of every Bayes compound estimator.

CHAPTER 2
CONSISTENCY OF THE POSTERIOR MIXTURES

In this chapter we show [Theorem 1] that $P_{\hat\omega_{nn}}$ (the non-delete version of the discussion at the beginning of Section 4 in Chapter 1) is $L_1$ consistent for $P_{G_{n-1}}$ in the sense of (4.1) of Chapter 1. We actually prove the result with $n$ replaced by $(n+1)$ and obtain (4.1) of Chapter 1 as a corollary [Corollary 1]. For the rest of the chapter let $\hat\omega$ and $\hat\Lambda$ abbreviate $\hat\omega_{n+1,n+1}$ and $\hat\Lambda_{n+1,n+1}$ respectively. Before proceeding further we note that $\hat\omega$ can be interpreted as the posterior distribution of $\theta_{n+1}$ given $(X_1,\dots,X_n)=(x_1,\dots,x_n)$ in the Bayes compound model with $(n+1)$ components.

Consider the following Bayes model on $\Omega\times\Theta^n\times\mathcal{X}^n$:

(i) Bayes model: $\omega$ is distributed as $\Lambda$ and given $\omega$, $\underline\theta$ is distributed as $\omega^n=\times_{\alpha=1}^{n}\omega$ and given $\underline\theta$ and $\omega$, $\underline X$ is distributed as $P_{\underline\theta}=\times_{\alpha=1}^{n}P_{\theta_\alpha}$.

The above model gives rise to the following marginal model:

(ii) Bayes compound model: $\underline\theta=(\theta_1,\dots,\theta_n)$ is distributed as $\bar\omega_{\Lambda,n}$ and given $\underline\theta$, $\underline X=(X_1,\dots,X_n)$ is distributed as $P_{\underline\theta}$, where $\bar\omega_{\Lambda,n}$ is the $\Lambda$ mixture of $\omega^n$.

Since $\Theta$ and hence $\Omega$ (with the weak convergence topology) is a Polish (in fact compact metric) space, all conditional distributions are regular by Theorem 10.2.2 of Dudley (1989). Datta (1991a) shows [see his Proposition 2.1] that under model (ii), with $n$ replaced by $n+1$, $\hat\omega$ is the posterior distribution of $\theta_{n+1}$ given $(X_1,\dots,X_n)=(x_1,\dots,x_n)$.

We now develop the machinery needed to prove Theorem 1. There
are four propositions leading to the proof of Theorem 1. Four

auxiliary lemmas are needed to prove the propositions.

The key to the proof of Theorem 1 is the inequality (17)
proved in Proposition 3. The force of Proposition 1 is used in part
in the proof of Proposition 3 and later in full in the proof of
Theorem 1 to treat the denominator of the second term in the bound
(17); it is the only link in the proof where the assumption $S_\Lambda=\Omega$
is used. Proposition 2 is used to treat the numerator of the second
term in the bound (17). Proposition 4 disposes of the third term in
the bound (17).

Lemma 1. For every $\{\omega,\pi\}\subset\Omega$, $\log(p_\omega/p_\pi)\in L_2(\mu)$ and

$\|\log(p_\omega/p_\pi)\|_2\le e^{5M^2/2}\|p_\omega-p_\pi\|_4$.

Proof. Since the reciprocal function is convex on $(0,\infty)$, the area under the reciprocal curve between $a$ and $b$, where $0<a\le b<\infty$, is smaller than the area under the straight line joining the points $(a,a^{-1})$ to $(b,b^{-1})$, which is equal to $(b-a)(a^{-1}+b^{-1})/2$. That gives

$|\log(p_\omega/p_\pi)|\le|p_\omega-p_\pi|(p_\omega^{-1}+p_\pi^{-1})/2$  a.s. ($\mu$),

which implies, via the Cauchy-Schwarz inequality,

(1)  $2\|\log(p_\omega/p_\pi)\|_2\le\|p_\omega-p_\pi\|_4[\mu(p_\omega^{-1}+p_\pi^{-1})^4]^{1/4}$.

Applying Jensen's inequality to the function $x\mapsto x^{-j}$, which is convex on $(0,\infty)$ $\forall\ j\ge1$ and (trivially) for $j=0$,

(2)  $p_\omega^{-j}\le\int p_\theta^{-j}\,d\omega$  $\forall\ \omega\in\Omega$ and $\forall\ j$ described above.

Applying (2) with $j=i$ on $p_\omega$ and $j=4-i$ on $p_\pi$, interchanging the order of integration on $\Theta^2$ and $\mathcal{X}$, and using Lemma 1.2 of Chapter 1 (with $k=2$, $a_1=-i$, $a_2=i-4$), we get

(3)  $\mu(p_\omega^{-i}p_\pi^{i-4})\le\int\int\exp\{2^{-1}[\|-i\theta+(i-4)\eta\|^2+i\|\theta\|^2+(4-i)\|\eta\|^2]\}\,d\omega(\theta)\,d\pi(\eta)$.

The exponent in rhs(3) simplifies to

$2^{-1}[(i^2+i)\|\theta\|^2+(i^2-9i+20)\|\eta\|^2+2i\langle\theta,\eta\rangle(4-i)]$.

For all $i=0,1,\dots,4$, the coefficients of $\|\theta\|^2$, $\|\eta\|^2$ and $\langle\theta,\eta\rangle$ are all non-negative and hence, by (1.3) of Chapter 1, the exponent in rhs(3) is bounded by $10M^2$. Since $\sum_{i=0}^{4}\binom{4}{i}=2^4$, using (3) with $10M^2$ bounding the exponent, we get

(4)  second factor in rhs(1) $\le2e^{5M^2/2}$.

The lemma follows from (1) and (4). //
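The elementary inequality that opens the proof of Lemma 1, $|\log(a/b)|\le|a-b|(a^{-1}+b^{-1})/2$ for $a,b>0$, is the statement that the trapezoid rule overestimates the integral of the convex reciprocal function. The Python sketch below evaluates the slack in that inequality; the function name is hypothetical.

```python
import math


def log_ratio_gap(a, b):
    """Trapezoid bound |a-b|(1/a + 1/b)/2 minus |log(a/b)|; non-negative for a, b > 0."""
    return abs(a - b) * (1.0 / a + 1.0 / b) / 2.0 - abs(math.log(a / b))
```

The gap vanishes when $a=b$ and grows quickly when the ratio $a/b$ is far from 1, which is why the proof needs control of inverse moments of the mixed densities, as supplied by (2) and (3).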

In what follows, any reference to a topology of $\Omega$ will be to the topology of weak convergence.

Lemma 2. $\omega\mapsto p_\omega$ is uniformly continuous from $\Omega$ into $L_4(\mu)$.

Proof. For $j=0,1,2,3,4$, writing $p_\omega^{j}$ (and $p_\pi^{4-j}$) as a $j$ (and $4-j$) fold iterated integral, interchanging the order of integration on $\mathcal{X}$ and $\Theta^4$ and using Lemma 1.2 of Chapter 1 (with $k=4$, $a_1=\dots=a_4=1$),

(5)  $\int p_\omega^{j}p_\pi^{4-j}\,d\mu=(\omega^{j}\times\pi^{4-j})(\exp\{2^{-1}[\|\sum_{i=1}^{4}\theta_i\|^2-\sum_{i=1}^{4}\|\theta_i\|^2]\})$.

By repeated application of Lemma III.1.1 of Parthasarathy (1967), if $\omega_n\to\omega$ then $\omega_n^{j}\times\omega^{4-j}\to\omega^4$ weakly on $\Theta^4$. Since the integrand in rhs(5) is a bounded continuous function on $\Theta^4$, using (5) twice we get

$\int p_{\omega_n}^{j}p_\omega^{4-j}\,d\mu\to\int p_\omega^4\,d\mu$ as $\omega_n\to\omega$.

Expanding $(p_{\omega_n}-p_\omega)^4$ and applying the above to the integral of each term,

$\int(p_{\omega_n}-p_\omega)^4\,d\mu\to0$ as $\omega_n\to\omega$;

that establishes the asserted uniform continuity because the weak topology of $\Omega$ is metrizable as a compact metric space. //

Let

(I)  $\Delta_\pi(\omega):=\int\log(p_\pi/p_\omega)\,dP_\pi$.

Lemma 3. $\pi\mapsto\Delta_\pi(\omega)$ is equi (in $\omega$) continuous.

Proof. For $\pi$ and $\nu$ in $\Omega$, triangulating around $\int\log(p_\pi/p_\omega)\,dP_\nu$,

(6)  $|\Delta_\pi(\omega)-\Delta_\nu(\omega)|\le|\int\log(p_\pi/p_\omega)\,d(P_\pi-P_\nu)|+|\Delta_\nu(\pi)|$.

By the Cauchy-Schwarz inequality in $L_2(\mu)$,

(7)  1st term in rhs(6) $\le\|\log(p_\pi/p_\omega)\|_2\|p_\pi-p_\nu\|_2$;

by Lemma 1, the triangle inequality in $L_4(\mu)$ and Lemma 1.3 of Chapter 1 with $q=4$,

(8)  rhs(7) $\le2e^{4M^2}\|p_\pi-p_\nu\|_2$.

By the Cauchy-Schwarz inequality in $L_2(\mu)$ again,

(9)  2nd term in rhs(6) $\le\|\log(p_\nu/p_\pi)\|_2\|p_\nu\|_2$;

by Lemma 1.3 of Chapter 1 with $q=2$ and Lemma 1,

(10)  rhs(9) $\le e^{3M^2}\|p_\pi-p_\nu\|_4$.

Since $\|p_\pi-p_\nu\|_2\le\|p_\pi-p_\nu\|_4$, the proof is completed by combining (6)-(10) and applying Lemma 2. //

Lemma 4. $\omega\mapsto\Delta_\pi(\omega)$ is equi (in $\pi$) continuous.

Proof. For $\omega$ and $\nu$ in $\Omega$, by the Cauchy-Schwarz inequality in $L_2(\mu)$,

$|\Delta_\pi(\omega)-\Delta_\pi(\nu)|\le\|\log(p_\nu/p_\omega)\|_2\|p_\pi\|_2\le e^{3M^2}\|p_\omega-p_\nu\|_4$, by Lemma 1.3 of Chapter 1 with $q=2$ and Lemma 1;

the lemma follows by Lemma 2. //

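Proposition 1. If $S_\Lambda=\Omega$, then

$\bigwedge_{\pi}\Lambda\{\Delta_\pi<\delta\}>0$  $\forall\ \delta>0$.

Proof. [Taken from Lemma 6.6 of Datta (1991a)] By Lemma 4 $\{\Delta_\pi<\delta\}$ is open; since it is non-empty (it contains $\pi$) and $S_\Lambda=\Omega$, $\Lambda\{\Delta_\pi<\delta\}>0$. By Lemma 3, if $\pi_n\to\pi$ then $\Delta_{\pi_n}\to\Delta_\pi$ pointwise on $\Omega$, hence in $\Lambda$-distribution. Therefore, by Theorem II.6.1(d) of Parthasarathy (1967),

$\liminf\Lambda\{\Delta_{\pi_n}<\delta\}\ge\Lambda\{\Delta_\pi<\delta\}$ as $\pi_n\to\pi$;

in other words, $\pi\mapsto\Lambda\{\Delta_\pi<\delta\}$ is lower semi-continuous. Hence the infimum is attained over compact $\Omega$ and is positive. //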

Let

(II)  $\mathcal{V}_n(\omega):=n^{-1}\sum_{\alpha=1}^{n}\log p_\omega(x_\alpha)-\int\log p_\omega\,dP_{G_n}$.

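Proposition 2. Let $\rho$ be a metric on $\Omega$ for the topology of weak convergence. For every $\delta>0$, there exists an $\varepsilon>0$ such that $\rho(\omega,\pi)<\varepsilon$ implies

(11)  $P_{\underline\theta}\exp\{2n[\mathcal{V}_n(\omega)-\mathcal{V}_n(\pi)]\}\le e^{n\delta}$.

Proof. Using the Cauchy-Schwarz inequality in $L_2(\mu)$, Lemma 1.3 of Chapter 1 with $q=2$ to bound $\|p_{\theta_\alpha}\|_2$ and Lemma 1,

(12)  $|\int\log(p_\omega/p_\pi)\,dP_{\theta_\alpha}|\le e^{3M^2}\|p_\omega-p_\pi\|_4$.

Since

$2n[\mathcal{V}_n(\omega)-\mathcal{V}_n(\pi)]=2\sum_{\alpha=1}^{n}\log(p_\omega/p_\pi)(x_\alpha)-2\sum_{\alpha=1}^{n}\int\log(p_\omega/p_\pi)\,dP_{\theta_\alpha}$,

by isotonicity of the exponential function and the bound in (12),

(13)  lhs(11) $\le[P_{\underline\theta}(\prod_{\alpha=1}^{n}(p_\omega/p_\pi)^2(x_\alpha))]\exp\{2n\|p_\omega-p_\pi\|_4e^{3M^2}\}$.

We shall now show that

(14)  $P_{\underline\theta}(\prod_{\alpha=1}^{n}(p_\omega/p_\pi)^2(x_\alpha))\le\exp\{2n\|p_\omega-p_\pi\|_2e^{16M^2}\}$.

Since the $L_2$ norm of a random variable is less than or equal to its $L_4$ norm, (13) and (14), in view of Lemma 2, will complete the proof of the proposition.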

Using independence of the factors in the integrand under $P_{\underline\theta}$,

lhs(14) $=\prod_{\alpha=1}^{n}P_{\theta_\alpha}(p_\omega/p_\pi)^2$,

which, by bounding each of the factors using the inequality $v\le e^{v-1}$, is bounded by

(15)  $\exp\{\sum_{\alpha=1}^{n}P_{\theta_\alpha}[(p_\omega^2/p_\pi^2)-1]\}$.

Converting the integrand into an expression with common denominator, factoring the numerator and applying the Cauchy-Schwarz inequality in $L_2(\mu)$,

(16)  $P_{\theta_\alpha}[(p_\omega^2/p_\pi^2)-1]\le\|p_\omega-p_\pi\|_2\,\|p_{\theta_\alpha}(p_\omega+p_\pi)/p_\pi^2\|_2$.

It now suffices to show that the square of the second factor in rhs(16) can be bounded by $4e^{32M^2}$. We do that by considering it as a $\mu$ integral, using the bound on the inverse fourth power of a mixed density obtained in (2), writing $(p_\omega+p_\pi)^2$ as an iterated integral, interchanging the order of integration on $\mathcal{X}$ and $\Theta^3$, applying Lemma 1.2 of Chapter 1 (with $k=4$, $a_1=2$, $a_2=a_3=1$, $a_4=-4$), simplifying the resulting exponent by expanding the squared norm of the sum of four terms and making possible cancellations, and using the Cauchy-Schwarz inequality in $H$ and (1.3) of Chapter 1 to bound the remaining terms in the exponent, in that order. //

The basic idea in the following proposition can be traced back
to (iii)’ of the Addendum of Gilliland, Hannan and Huang (1976). For
a similar exploitation of that basic idea see Lemma 6.1 of Datta

(1991a).

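Let

(III)  $\mathcal{U}_\delta:=\{\omega:\Delta_{G_n}(\omega)<\delta\}$.

Proposition 3. Let $\rho$ be as in Proposition 2. Fix a $\delta>0$. Let

$A_\delta=\bigcap_{i=1}^{r}[|\mathcal{V}_n(\omega_i)|<\delta/2]$,

where $\rho$-balls of radius $\varepsilon$ (corresponding to $\delta$ as in Proposition 2) around $\{\omega_1,\dots,\omega_r\}$ cover $\Omega$. Then, with $\tilde A_\delta$ denoting the complement of $A_\delta$,

(17)  $2^{-1}\|P_{\hat\omega}-P_{G_n}\|\le2\sqrt\delta+\dfrac{e^{-3n\delta}}{\Lambda^2(\mathcal{U}_\delta)}[\int e^{-n\mathcal{V}_n}\,d\Lambda\int e^{n\mathcal{V}_n}\,d\Lambda]A_\delta+\tilde A_\delta$.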

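Proof. Since $2^{-1}\|P_\omega-P_\pi\|\le1$ for all $\{\omega,\pi\}\subset\Omega$,

(18)  lhs(17) $\le2^{-1}\|P_{\hat\omega}-P_{G_n}\|A_\delta+\tilde A_\delta$.

By definition of $\hat\omega$, using the inequality $|\int f|\le\int|f|$ with $f=p_\omega-p_{G_n}$ and interchanging the order of $\mu$ and $\hat\Lambda$ integration, we get

(19)  $\|P_{\hat\omega}-P_{G_n}\|\le\int\|P_\omega-P_{G_n}\|\,d\hat\Lambda$.

Since by inequality (3.6) of Hannan (1960)

$\|P_\omega-P_\pi\|\le2\sqrt{\Delta_\pi(\omega)}$,

bounding the integrand in rhs(19) by $4\sqrt\delta$ on $\mathcal{U}_{4\delta}$ and by 2 on the complement $\tilde{\mathcal{U}}_{4\delta}$, and $\hat\Lambda(\mathcal{U}_{4\delta})A_\delta$ by 1,

(20)  $2^{-1}\|P_{\hat\omega}-P_{G_n}\|A_\delta\le2\sqrt\delta+\hat\Lambda(\tilde{\mathcal{U}}_{4\delta})A_\delta$.

In view of (18) and (20), it remains to show that the second term in rhs(20) can be bounded by the second term in rhs(17).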

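By definition of $\hat\Lambda$ [it is the posterior distribution of $\omega$ given $(X_1,\dots,X_n)=(x_1,\dots,x_n)$, when given $\omega$, $X_1,\dots,X_n$ are iid $\sim P_\omega$ and $\omega\sim\Lambda$],

(21)  $\hat\Lambda(\tilde{\mathcal{U}}_{4\delta})=\dfrac{\int_{\tilde{\mathcal{U}}_{4\delta}}e^{g}\,d\Lambda}{\int e^{g}\,d\Lambda}$,

where $g(\omega)=\sum_{\alpha=1}^{n}\log p_\omega(x_\alpha)$.

Using the identity

$g(\omega)=n\mathcal{V}_n(\omega)-n\Delta_{G_n}(\omega)+n\int\log p_{G_n}\,dP_{G_n}$,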

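bounding $\Delta_{G_n}$ below by $4\delta$ on $\tilde{\mathcal{U}}_{4\delta}$ in the numerator of rhs(21) and above by $\delta$ on $\mathcal{U}_\delta$ in the denominator of rhs(21), we get

(22)  rhs(21) $\le e^{-3n\delta}\dfrac{\int_{\tilde{\mathcal{U}}_{4\delta}}e^{n\mathcal{V}_n}\,d\Lambda}{\int_{\mathcal{U}_\delta}e^{n\mathcal{V}_n}\,d\Lambda}$.

Normalizing $\Lambda$ on $\mathcal{U}_\delta$ ($\Lambda(\mathcal{U}_\delta)>0$ by Proposition 1) and applying Jensen's inequality to the reciprocal function, which is convex on $(0,\infty)$,

(23)  $\dfrac{1}{\int_{\mathcal{U}_\delta}e^{n\mathcal{V}_n}\,d\Lambda}\le\dfrac{1}{\Lambda^2(\mathcal{U}_\delta)}\int_{\mathcal{U}_\delta}e^{-n\mathcal{V}_n}\,d\Lambda$.

Substituting (23) in rhs(22) and weakening the resulting bound by enlarging the ranges of integration, the proposition follows from the remark following (20). //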
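The Jensen step (23) can be illustrated on a discrete measure. In the Python sketch below, the weights stand for the $\Lambda$-masses of finitely many points of $\mathcal{U}_\delta$ and the values for the exponents $n\mathcal{V}_n(\omega_i)$; this discretization and the function name are assumptions made purely for illustration.

```python
import math


def reciprocal_jensen_sides(weights, values):
    """Return (lhs, rhs) of (23) for a finitely supported restriction of Lambda.

    weights : positive masses Lambda({omega_i}) of the points of U_delta (hypothetical).
    values  : the corresponding exponents n * V_n(omega_i).
    """
    mass = sum(weights)  # plays the role of Lambda(U_delta)
    lhs = 1.0 / sum(w * math.exp(v) for w, v in zip(weights, values))
    rhs = sum(w * math.exp(-v) for w, v in zip(weights, values)) / (mass * mass)
    return lhs, rhs
```

Equality holds when the exponents are constant on the set, so the inequality loses little when $\mathcal{V}_n$ varies slowly over $\mathcal{U}_\delta$.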

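Proposition 4. Fix a $\delta>0$. Let $A_\delta$ be as in Proposition 3. Then

$\bigvee_{\underline\theta}P_{\underline\theta}\tilde A_\delta=O(n^{-1})$ as $n\to\infty$.

Proof. By the definition of $A_\delta$ and subadditivity of measures,

$P_{\underline\theta}\tilde A_\delta\le\sum_{i=1}^{r}P_{\underline\theta}[|\mathcal{V}_n(\omega_i)|\ge\delta/2]$,

which, by applying Chebychev's inequality to each of the terms and bounding the sum of $r$ non-negative terms by the maximum of the terms times $r$, is bounded by

$(4r/\delta^2)\bigvee_{\omega}P_{\underline\theta}(\mathcal{V}_n(\omega))^2$.

Since $\mathcal{V}_n(\omega)$ is the centered average of $n$ independent random variables under $P_{\underline\theta}$,

$P_{\underline\theta}(\mathcal{V}_n(\omega))^2\le n^{-1}\bigvee_{\theta}\mathrm{var}_\theta(\log p_\omega)$.

Since the variance is smaller than the second moment, it suffices to show that $\bigvee_{\theta}\bigvee_{\omega}P_\theta(\log p_\omega)^2<\infty$.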

By the elementary log inequality used in the proof of Lemma 1,

(24)  $4P_\theta(\log p_\omega)^2\le P_\theta(p_\omega-1)^2(1+p_\omega^{-1})^2$.

By the Cauchy-Schwarz inequality in $L_2(\mu)$,

(25)  rhs(24) $\le\|p_\theta(1+p_\omega^{-1})^2\|_2\,\|p_\omega-1\|_4^2$.

By the triangle inequality in $L_4(\mu)$ and Lemma 1.3 of Chapter 1 with $q=4$,

(26)  2nd factor in rhs(25) $\le(e^{3M^2/2}+1)^2$.

By the Cauchy-Schwarz inequality in $L_2(\mu)$ and Lemma 1.3 of Chapter 1 with $q=4$,

(27)  1st factor in rhs(25) $\le e^{3M^2/2}\|1+p_\omega^{-1}\|_8^2$.

By the triangle inequality in $L_8(\mu)$,

(28)  2nd factor in rhs(27) $\le2(1+\|p_\omega^{-1}\|_8^2)$.

Applying the bound on the inverse eighth power of a mixed density obtained in (2), interchanging the order of $\mu$ and $\omega$ integration, using Lemma 1.2 of Chapter 1 (with $k=1$, $a_1=-8$), and using (1.3) of Chapter 1 to bound the resulting exponent, we get that $\|p_\omega^{-1}\|_8^2$ is bounded by $e^{18M^2}$; substituting that bound in rhs(28) and combining the result with (27) and (26), we complete the proof using (25) and the remark preceding (24). //

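Theorem 1 [Consistency of the posterior mixtures]. If $S_\Lambda=\Omega$, then

$P_{\underline\theta}\|P_{\hat\omega}-P_{G_n}\|\to0$, uniformly in $\underline\theta$, as $n\to\infty$.

Proof. We shall show that for every $\delta>0$,

(29)  $P_{\underline\theta}(\int e^{-n\mathcal{V}_n}\,d\Lambda\int e^{n\mathcal{V}_n}\,d\Lambda\,A_\delta)\le e^{2n\delta}$,

where $A_\delta$ is as in Proposition 3.

Taking $P_{\underline\theta}$ expectation of both sides of (17) and using (29),

(30)  $P_{\underline\theta}\|P_{\hat\omega}-P_{G_n}\|\le4\sqrt\delta+\dfrac{2e^{-n\delta}}{\Lambda^2(\mathcal{U}_\delta)}+2P_{\underline\theta}\tilde A_\delta$.

Since the above inequality holds for every $\delta>0$, we complete the proof of Theorem 1 by taking supremum over $\underline\theta$ and lim sup as $n\to\infty$ (in that order) of both sides of (30), using subadditivity of these operations, and applying Propositions 1 and 4.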

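Applying the Cauchy-Schwarz inequality in $L_2(P_{\underline\theta})$ and then the moment inequality to the $\Lambda$ integrals in both the factors in the Cauchy-Schwarz bound,

(31)  lhs(29) $\le[P_{\underline\theta}(\int e^{-2n\mathcal{V}_n}\,d\Lambda)A_\delta]^{1/2}[P_{\underline\theta}(\int e^{2n\mathcal{V}_n}\,d\Lambda)A_\delta]^{1/2}$.

Consider the finite cover of $\Omega$ described in Proposition 3. Clearly, for every $\omega\in\Omega$, choosing an $\omega_i$ such that $\rho(\omega,\omega_i)<\varepsilon$ and using Proposition 2 with $\omega_i$ and $\omega$, by the definition of $A_\delta$,

(32)  $P_{\underline\theta}(e^{-2n\mathcal{V}_n(\omega)}A_\delta)\le e^{2n\delta}$ and $P_{\underline\theta}(e^{2n\mathcal{V}_n(\omega)}A_\delta)\le e^{2n\delta}$.

Interchanging the order of $\Lambda$ and $P_{\underline\theta}$ integration in rhs(31) and using (32), we establish (29). //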

Remark 1 (Comparison of Theorem 1 and Theorem 3.1 of Datta (1991a)). In his Theorem 3.1 Datta (1991a) proves the assertion of Theorem 1 for compact metric $\Theta$ and a class of probability densities $\{p_\theta:\theta\in\Theta\}$ on an arbitrary measurable space, under two regularity assumptions including the one (A1) that $p_\theta(x)$ is continuous in $\theta$ for every $x$. Unless $\Theta$ is what Dudley (1967) called a GC set (GC is an abbreviation for Gaussian Continuity; a subset of a Hilbert space is defined to be a GC set if the isonormal process indexed by that subset has a sample continuous version) (A1) is not satisfied for the Gaussian shift experiment. For an example (among the ellipsoids) of a compact $\Theta$ which is not GC, see the introduction to Section 6 and Proposition 6.3 of Dudley (1967).

Datta (1991a) obtains a bound on $2^{-1}\|P_{\hat\omega}-P_{G_n}\|$ in his Lemma 6.1 similar to our bound in Proposition 3. His bound consists of an arbitrarily small term, a term involving a measure of diffuseness of $\Lambda$ (which we dispose of by Proposition 1, which is essentially his Lemma 6.6) and another term involving the probability of the tail of $\bigvee_{\omega}|\mathcal{V}_n(\omega)|$. By his assumption (A1), the quantity $\bigvee_{\omega}|\mathcal{V}_n(\omega)|$ is the Banach norm of a $C(\Omega)$ valued random element, where $C(S)$ is the Banach space (with sup norm) of all real-valued continuous functions on compact metric $S$. He develops a uniform $L_1$ law of large numbers for $C(S)$-valued random elements to dispose of that term. In our context $\mathcal{V}_n$ is not a $C(\Omega)$ valued random element, making Datta's method of proof inapplicable. //

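Corollary 1. If $S_\Lambda=\Omega$, then (4.1) of Chapter 1 holds; i.e.,

$\bigvee_{\alpha=1}^{n}P_{\underline\theta}\|P_{\hat\omega_{\alpha n}}-P_{G_n}\|\to0$, uniformly in $\underline\theta$, as $n\to\infty$.

Proof. As in the proof of Lemma 4.3 of Datta (1991b), we observe that

(33)  $\bigvee_{\alpha}\bigvee_{\underline\theta}P_{\underline\theta}\|P_{\hat\omega_{\alpha n}}-P_{G_{n\alpha}}\|\le\bigvee_{\underline\theta}P_{\underline\theta}\|P_{\hat\omega_{nn}}-P_{G_{n-1}}\|$,

where $G_{n\alpha}$ is the empirical distribution based on $(\theta_1,\dots,\theta_{\alpha-1},\theta_{\alpha+1},\dots,\theta_n)$. Since $G_n-G_{n\alpha}=n^{-1}(\delta_{\theta_\alpha}-G_{n\alpha})$ with $\delta_{\theta_\alpha}$ the unit mass at $\theta_\alpha$, the definition of $p_\omega$ gives

(34)  $\|P_{G_n}-P_{G_{n\alpha}}\|=\mu|p_{G_n}-p_{G_{n\alpha}}|=n^{-1}\mu(|p_{\theta_\alpha}-p_{G_{n\alpha}}|)\le2n^{-1}$.

By the triangle inequality, (33) and (34), the corollary follows if rhs(33) goes to 0. But that, with $n$ replacing $n-1$, is the assertion (with some notational changes) of Theorem 1. //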
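The identity $G_n-G_{n\alpha}=n^{-1}(\delta_{\theta_\alpha}-G_{n\alpha})$ behind (34) can be verified with exact arithmetic on the mixing measures. The Python sketch below works with empirical distributions on a finite set of labels and checks the identity together with the bound $2/n$ on the total variation norm of the difference; since $p_\omega$ is linear in $\omega$ and each $p_\theta$ integrates to 1 under $\mu$, the same bound carries over to the mixtures. The helper names are illustrative only.

```python
from fractions import Fraction


def empirical(points):
    """Empirical distribution of a list of points, as a dict of exact masses."""
    n = len(points)
    dist = {}
    for pt in points:
        dist[pt] = dist.get(pt, Fraction(0)) + Fraction(1, n)
    return dist


def signed_difference(p, q):
    """Signed measure p - q on the union of the two supports."""
    return {pt: p.get(pt, Fraction(0)) - q.get(pt, Fraction(0)) for pt in set(p) | set(q)}


def total_variation_norm(signed):
    """Sum of absolute masses of a finitely supported signed measure."""
    return sum(abs(m) for m in signed.values())
```

Deleting one of $n$ parameters therefore moves the empirical distribution by at most $2/n$, which is what lets the delete versions be handled as a corollary of Theorem 1.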

CHAPTER 3
THE EMPIRICAL BAYES ESTIMATION

In this chapter we look at the empirical Bayes [Robbins (1951, 1956)] formulation of our component problem. Consider a Bayes decision problem involving $\{P_\theta:\theta\in\Theta\}$ and a Bayes prior $\omega$, where $\omega$ is unknown. Suppose we have iid pairs $(\theta_1,X_1),\dots,(\theta_n,X_n),\dots$, where $\theta_1$ is distributed as $\omega$ and given $\theta_1$, $X_1$ is distributed as $P_{\theta_1}$. At stage $n$, a decision $t_n=t_n(\underline X_n)$ about $\theta_n$ is taken incurring loss $\|t_n-\theta_n\|^2$ and risk $\int\int\|t_n-\theta_n\|^2\,dP_{\underline\theta}\,d\omega^n$. The sequence $\{t_n:n\ge1\}$ is called an empirical Bayes rule. An empirical Bayes rule $\{t_n\}$ is called asymptotically optimal (a.o.) if

$\int\int\|t_n-\theta_n\|^2\,dP_{\underline\theta}\,d\omega^n\to r(\omega)$, for each $\omega\in\Omega$, as $n\to\infty$.

The notion of admissibility in the class of empirical Bayes rules is the same as the corresponding notion in the case of compound rules, with the understanding that the risk now is a function of $\omega$.

Let $\Lambda$ be a hyperprior on $\Omega$. We will prove that any sequence of Bayes (versus $\Lambda$ at each stage) empirical Bayes rules is admissible; if $S_\Lambda=\Omega$, a sequence of Bayes empirical Bayes rules versus $\Lambda$ is asymptotically optimal as well.

1. Bayes empirical Bayes.

For any given $n$, the stage $n$ Bayes risk versus $\Lambda$ in the empirical Bayes problem is

$\int\int\int\|t_n-\theta_n\|^2\,dP_{\underline\theta}\,d\omega^n\,d\Lambda(\omega)=\int\int\|t_n-\theta_n\|^2\,dP_{\underline\theta}\,d\bar\omega_{\Lambda,n}$,

which is the $n$-th component Bayes risk versus the prior $\bar\omega_{\Lambda,n}$ on the compound parameter $\underline\theta$ in the set compound problem with $n$ components. Hence a Bayes empirical Bayes estimator is $\hat t_n$ given in (2.4a), with $\alpha$ replaced by $n$.

Admissibility. Since a Bayes empirical Bayes estimator is $\hat t_n$ given in (2.4a), and as observed in Section 2.5 of Chapter 1 every Bayes compound estimator is unique up to $\mu^n$ equivalence, the admissibility again follows from the uniqueness of Bayes rule argument.

2. Asymptotic optimality.

Theorem 2.1. If $S_\Lambda=\Omega$ then the Bayes empirical Bayes estimator $\{\hat t_n:n\ge1\}$ is asymptotically optimal.

Proof. Let $\tau_{\omega,n}$ be a component Bayes estimator versus $\omega$ based on $X_n$. Then, as in (2.5) of Chapter 1,

(2.1)  $|\int\int\|\hat t_n-\theta_n\|^2\,dP_{\underline\theta}\,d\omega^n-r(\omega)|\le4MP_{\omega^n}\|\tau_{\hat\omega_{nn},n}-\tau_{\omega,n}\|$;

by Proposition 3.1 of Chapter 1 it is enough to show that

(2.2)  $P_{\omega^{n-1}}\|P_{\hat\omega_{nn}}-P_\omega\|\to0$ as $n\to\infty$.

The uniform (in $\omega$) version of (2.2), with $n$ replacing $n-1$, is the assertion (with some changes in notation) of the following corollary [Corollary 2.1] to Theorem 1 in Chapter 2; applying that corollary we complete the proof. //

Corollary 2.1. Let $\hat\omega$ be as in Chapter 2. Under the assumption $S_\Lambda=\Omega$,

$P_{\omega^n}\|P_{\hat\omega}-P_\omega\|\to0$, uniformly in $\omega$, as $n\to\infty$.

Proof. Noting that $P_{\omega^n}$ is the marginal on $\mathcal{X}^n$ of the joint distribution on $\mathcal{X}^n\times\Theta^n$ obtained from $P_{\underline\theta}$ and $\omega^n$, and triangulating around $P_{G_n}$,

(2.3)  $P_{\omega^n}\|P_{\hat\omega}-P_\omega\|\le\int P_{\underline\theta}\|P_{\hat\omega}-P_{G_n}\|\,d\omega^n+\omega^n(\|P_{G_n}-P_\omega\|)$.

Now the first term in rhs(2.3) goes to zero uniformly in $\omega$ by Theorem 1 in Chapter 2.

By the moment inequality, applied first to the $\omega^n$ integral and then to the $\mu$ integral,

(2.4)  $(\omega^n\|P_{G_n}-P_\omega\|)^2\le\omega^n(\|p_{G_n}-p_\omega\|_2^2)$.

Now by interchanging the order of $\mu$ and $\omega^n$ integration, noting that $\omega^n(p_{G_n}-p_\omega)^2$ is the variance of the average of $n$ iid random variables and bounding the variance of a random variable by its second moment, we get

(2.5)  rhs(2.4) $\le n^{-1}\int\int p_\theta^2\,d\omega(\theta)\,d\mu$.

Interchanging the order of $\mu$ and $\omega$ integration, and applying Lemma 1.3 of Chapter 1 with $q=2$ to bound $\mu(p_\theta^2)$,

(2.6)  rhs(2.5) $\le n^{-1}e^{M^2}$.

The corollary follows by (2.4)-(2.6). //
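The step from (2.4) to (2.5) uses only that the variance of an average of $n$ iid random variables is the single-term variance divided by $n$, which is at most the second moment divided by $n$. The Python sketch below confirms this by exact enumeration for a small finite distribution; the distribution in the usage note is an arbitrary example and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def variance_of_average(values, probs, n):
    """Exact variance of the average of n iid draws from a finite distribution.

    values : list of integer (or Fraction) outcomes.
    probs  : list of Fraction probabilities summing to 1.
    """
    mean = sum(v * p for v, p in zip(values, probs))
    var = Fraction(0)
    # Enumerate every outcome of the n draws together with its probability.
    for combo in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for i in combo:
            weight *= probs[i]
        avg = sum(values[i] for i in combo) / Fraction(n)
        var += weight * (avg - mean) ** 2
    return var
```

For outcomes 0, 1, 3 with probabilities 1/2, 1/4, 1/4 and $n=3$ the variance of the average is 1/2, while the second moment divided by $n$ is 5/6.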

Remark 2.1. By (2.1), Proposition 3.1 of Chapter 1 and Corollary 2.1, we get that lhs(2.1) goes to 0 uniformly in $\omega$, which is a stronger form of asymptotic optimality.

APPENDIX

1. On measurability.

In this section, we prove two lemmas concerning measurability
of two maps which have been used in the main body of the

dissertation.

Lemma 1.1. Let $\Theta$ be a separable metric space endowed with its Borel $\sigma$-field. Let $\Omega$ be the set of all probabilities on $\Theta$, endowed with the topology of weak convergence and the corresponding Borel $\sigma$-field. Then, for every bounded real-valued measurable function $h$ on $\Theta$, the map $\omega\mapsto\omega(h)$ is measurable.

Proof. We shall use the following theorem (TI.20) from Meyer (1966):

Let $\mathcal{H}$ be a vector space of bounded real-valued functions defined on $\Gamma$, which contains the constant 1, is closed under uniform convergence, and is such that for every increasing, uniformly bounded sequence of non-negative functions $g_n\in\mathcal{H}$, the function $g=\lim_{n\to\infty}g_n$ belongs to $\mathcal{H}$. Let $\mathcal{C}$ be a subset of $\mathcal{H}$, closed under multiplication. Then the space $\mathcal{H}$ contains all the bounded functions measurable with respect to the $\sigma$-field $\mathcal{T}$ generated by the elements of $\mathcal{C}$.

Let $\Gamma=\Theta$, $\mathcal{H}=\{h:h$ is a bounded real-valued function on $\Theta$ and $\omega\mapsto\omega(h)$ is measurable$\}$; clearly $\mathcal{H}$ satisfies all the conditions of TI.20, Meyer (1966). Let $\mathcal{C}=\{B\subseteq\Theta:B$ is closed$\}$. Clearly $\mathcal{C}$ is closed under multiplication. Since, by the portmanteau Theorem [Theorem II.6.1(c), Parthasarathy (1967)], for every $B\in\mathcal{C}$ and every $k\in\mathbb{R}$ the set $\{\omega:\omega(B)\ge k\}$ is closed in the topology of weak convergence, we get that $\mathcal{C}$ is a subset of $\mathcal{H}$. Therefore $\mathcal{H}$ contains all the bounded real-valued measurable functions on $\Theta$. //

Lemma 1.2. Let $(\mathcal{X},\mathcal{F})$ be a measurable space. Let $\Theta$ be a separable metric space endowed with its Borel $\sigma$-field. Let $f:\Theta\times\mathcal{X}\to[0,\infty)$ be a measurable function. Let $\Omega$ be the set of all probabilities on $\Theta$, endowed with the topology of weak convergence and its Borel $\sigma$-field. For $\omega\in\Omega$, let $\bar f(\omega,x):=\int f(\cdot,x)\,d\omega$; then $\bar f:\Omega\times\mathcal{X}\to[0,\infty)$ is measurable.

Proof. We shall again use TI.20, Meyer (1966). Let $\Gamma=\Theta\times\mathcal{X}$, $\mathcal{H}=\{h:h$ is a bounded real-valued function on $\Theta\times\mathcal{X}$ and $(\omega,x)\mapsto\int h(\cdot,x)\,d\omega$ is measurable$\}$; clearly $\mathcal{H}$ satisfies all the conditions of TI.20, Meyer (1966). Let $\mathcal{C}=\{A\times B:A$ is a measurable subset of $\Theta$, $B$ is a measurable subset of $\mathcal{X}\}$. $\mathcal{C}$ is clearly closed under multiplication. That $\mathcal{C}$ is a subset of $\mathcal{H}$ follows from Lemma 1.1. Therefore $\mathcal{H}$ contains all the bounded real-valued measurable functions on $\Theta\times\mathcal{X}$, in particular $f\wedge M$ for every positive integer $M$. Since $\{f\wedge M\}$ is an increasing sequence of non-negative functions with pointwise limit $f$, the monotone convergence theorem shows that $\bar f$ is the pointwise limit of the measurable maps $(\omega,x)\mapsto\int(f\wedge M)(\cdot,x)\,d\omega$ and hence is measurable. That completes the proof of the lemma. //

2. On topological support of Dirichlet prior.

In this section we present a result of independent interest
characterizing the topological support of a Dirichlet prior on an
arbitrary separable metric space, which is used in Remark 4.3 of

Chapter 1 to give examples of $\Lambda$ with full support.

Ferguson (1973) states that the topological support of a Dirichlet prior on the Borel $\sigma$-field (corresponding to the weak convergence topology) of the set of all probability measures on the line is the set of all probability measures with their topological supports contained in that of the parameter measure of the Dirichlet prior. We prove that statement with the line replaced by an arbitrary separable metric space.

Let $\mathcal{X}$ be a separable metric space and $\mathcal{A}$ be the Borel $\sigma$-field of $\mathcal{X}$. Let $\Omega$ be the set of all probability measures on $(\mathcal{X},\mathcal{A})$. The topology of weak convergence on $\Omega$ is metrizable as a separable metric space [Theorem II.6.2, Parthasarathy (1967)]; let $\mathcal{B}(\Omega)$ denote its Borel $\sigma$-field.

We consider the random probability measure $P$ defined in (4.7) of Ferguson (1973). Let $\{V_n:n\ge1\}$ be a sequence of iid random elements taking values in $(\mathcal{X},\mathcal{A})$ with common distribution $Q$, where $Q(A)=\alpha(A)/\alpha(\mathcal{X})$ and $\alpha$ is a finite non-null measure on $(\mathcal{X},\mathcal{A})$. Let $\{J_n:n\ge1\}$ be a sequence of non-negative random variables independent of $\{V_n:n\ge1\}$. For $j\ge2$, let the conditional distribution of $J_j$ given $J_{j-1},\dots,J_1$ be equal to the distribution of $J_1$ truncated above at $J_{j-1}$; let the distribution function of $J_1$ be $\exp(N(\cdot))$, where $N(x)=-\alpha(\mathcal{X})\int_x^\infty e^{-y}y^{-1}\,dy$ for $x>0$. In Theorem 4.1 of Ferguson (1973) it is proved that $\sum_{n=1}^{\infty}J_n$ converges w.p. 1. For $A\in\mathcal{A}$, define

$P(A)=\sum_{j=1}^{\infty}P_j\chi_{V_j}(A)$,

where $P_n=J_n/\sum_{j=1}^{\infty}J_j$ and $\chi_v(A)=1$ if $v\in A$, $=0$ otherwise.

Clearly, for every point in the set (in the probability space
underlying the random sequences $\{P_n\}$ and $\{V_n\}$) on which
$\sum_{n=1}^\infty J_n$ converges, $P$ is a probability measure on
$\mathcal{A}$. Therefore without loss of generality we can assume $P$ to
be $\Omega$-valued. Let $\phi_A$ be the map on $\Omega$ taking $\omega$
to $\omega(A)$. Since the real-valued map $P(A)$ is Borel measurable,
$P$ is measurable with respect to $\sigma\{\phi\}$, the $\sigma$-field
generated by the family $\{\phi_A : A \in \mathcal{A}\}$. Note that by
Lemma 1.1, $\sigma\{\phi\}$ is a sub $\sigma$-field of
$\mathcal{B}(\Omega)$. We shall show that $\mathcal{B}(\Omega)$ is a sub
$\sigma$-field of $\sigma\{\phi\}$. We shall denote the induced
distribution of $P$ on $(\Omega,\mathcal{B}(\Omega))$ by $\mathcal{P}$
and refer to it as the Dirichlet prior on $\mathcal{B}(\Omega)$ with
parameter $\alpha$. By Theorem 4.2 of Ferguson (1973), for every
$k = 1,2,\dots$ and measurable partition $(B_1,\dots,B_k)$ of
$\mathcal{X}$, the distribution of $(P(B_1),\dots,P(B_k))$ is Dirichlet
with parameters $(\alpha(B_1),\dots,\alpha(B_k))$. Note that in the sense
of Ferguson (1973) if the $j$-th parameter of a Dirichlet distribution
is equal to 0, then the $j$-th coordinate is degenerate at 0.
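
As a numerical illustration of the finite-dimensional statement above,
the following Python sketch draws a Dirichlet vector by normalizing
independent gamma variables and sets every coordinate whose parameter is
0 to exactly 0, in line with the convention of Ferguson (1973). The
function name `sample_dirichlet` and the example parameters are
assumptions made for illustration only; they do not come from the text.

```python
import random

def sample_dirichlet(params, rng=random):
    """Draw one Dirichlet vector with the given non-negative parameters.

    Coordinates whose parameter is 0 are degenerate at 0; the remaining
    coordinates are independent Gamma(a, 1) draws divided by their sum.
    At least one parameter must be positive.
    """
    if any(a < 0 for a in params) or not any(a > 0 for a in params):
        raise ValueError("parameters must be >= 0 with at least one > 0")
    gammas = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]

# Hypothetical parameters (alpha(B_1), alpha(B_2), alpha(B_3)) with alpha(B_2) = 0.
rng = random.Random(0)
draw = sample_dirichlet([2.0, 0.0, 0.5], rng)
print(draw)
```

In any such draw the second coordinate is 0 and the coordinates sum to 1
up to floating-point rounding.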
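To prove $\mathcal{B}(\Omega) \subset \sigma\{\phi\}$, recall [Theorem 3,
Appendix III, Billingsley (1968)] that

$\mathcal{U} := \{N(\mu; A_1,\dots,A_k; \epsilon_1,\dots,\epsilon_k) : \mu \in \Omega,\ \epsilon_i > 0,\ A_i \text{ a } \mu\text{-continuity subset of } \mathcal{X},\ i = 1,2,\dots,k,\ k = 1,2,\dots\}$

is an open base for the topology of weak convergence on $\Omega$, where

$N(\mu; A_1,\dots,A_k; \epsilon_1,\dots,\epsilon_k) := \bigcap_{i=1}^k \{\nu \in \Omega : |\nu(A_i) - \mu(A_i)| < \epsilon_i\}$.

Obviously, every set in $\mathcal{U}$ is in $\sigma\{\phi\}$. Using
separability and metrizability of $\Omega$, which together imply second
countability, we conclude by Lindelöf's Theorem that every open subset
of $\Omega$ is in $\sigma\{\phi\}$; hence
$\mathcal{B}(\Omega) \subset \sigma\{\phi\}$.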

For a finite Borel measure $m$ on a second countable topological
space $\mathcal{Y}$, the topological support of $m$ is defined to be the
set

$S_m = \bigcap \{F : F \text{ is closed and } m(F^c) = 0\}$.

Note that $s \in S_m$ iff for any open set $O$ containing $s$, we have
$m(O) > 0$. Since $\mathcal{Y}$ is second countable, by Lindelöf's
Theorem $S_m^c$ can be expressed as a countable union of $F^c$ sets.
Therefore

(2.1) $m(S_m^c) = 0$;

hence

(2.2) if $B$ is closed and $m(B) = m(S_m)$, then $S_m \subset B$.
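
The step from (2.1) to (2.2) can be spelled out as follows. This is a
sketch of the implicit reasoning, writing $\mathcal{Y}$ for the
underlying space (a symbol chosen here for illustration) and using only
the finiteness of $m$:

```latex
m(B^c) = m(\mathcal{Y}) - m(B)
       = m(\mathcal{Y}) - m(S_m)
       = m(S_m^c)
       = 0 \quad \text{by (2.1)},
```

so the closed set $B$ is one of the sets $F$ in the intersection
defining $S_m$, and therefore $S_m \subset B$.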

In the sequel, the set

$\{(x_1,\dots,x_m) \in \mathbb{R}^m : 0 \le x_i\ \forall\, i = 1,2,\dots,m \text{ and } \sum_{i=1}^m x_i \le 1\}$

will be referred to as the sub-simplex in m-dimension.

We now state the main result of this section characterizing
the topological support of a Dirichlet prior $\mathcal{P}$ with
parameter $\alpha$.

Theorem 2.1. $S_{\mathcal{P}} = \{\mu \in \Omega : S_\mu \subset S_\alpha\}$.

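Proof. We first show
$S_{\mathcal{P}} \supset \{\mu \in \Omega : S_\mu \subset S_\alpha\}$. Let
$S_\mu \subset S_\alpha$; we shall show that every basic open set in
$\mathcal{U}$ containing $\mu$ has positive $\mathcal{P}$-probability to
conclude $\mu \in S_{\mathcal{P}}$. Now for arbitrary positive integer
$k$, $\mu$-continuity subsets $A_1,\dots,A_k$ of $\mathcal{X}$ and
positive numbers $\epsilon_1,\dots,\epsilon_k$,

(2.3) $\mathcal{P}\bigl(\bigcap_{i=1}^k \{\nu \in \Omega : |\nu(A_i) - \mu(A_i)| < \epsilon_i\}\bigr) = \Pr\bigl(\bigcap_{i=1}^k \{|P(A_i) - \mu(A_i)| < \epsilon_i\}\bigr)$,

where $\Pr$ is the probability measure on the domain of $P$.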

Let $\{F_{\nu_1\dots\nu_k} : \nu_i = 0 \text{ or } 1\ \forall\, i = 1,\dots,k\}$
denote the measurable partition generated by $A_1,\dots,A_k$; i.e.

$F_{\nu_1\dots\nu_k} = \bigcap_{j=1}^k A_j^{\nu_j}$, where $A^{\nu_j} = A$ if $\nu_j = 1$,
$= A^c$ otherwise.
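
To make the generated partition concrete, here is a small Python sketch
on a finite set. The function name `generated_partition` and the example
sets are hypothetical and serve only as an illustration; the code builds
the $2^k$ cells indexed by 0/1 tuples and checks that each $A_i$ is the
union of the cells whose $i$-th flag equals 1.

```python
from itertools import product

def generated_partition(space, sets):
    """Return {nu: F_nu} for the partition of `space` generated by `sets`.

    nu is a tuple of 0/1 flags; F_nu is the intersection over j of A_j
    (if nu[j] == 1) or its complement (if nu[j] == 0). Empty cells are kept.
    """
    space = frozenset(space)
    cells = {}
    for nu in product((0, 1), repeat=len(sets)):
        cell = set(space)
        for flag, a in zip(nu, sets):
            a = frozenset(a) & space
            cell &= a if flag == 1 else space - a
        cells[nu] = frozenset(cell)
    return cells

# Hypothetical finite example: X = {0,...,5}, A_1 = {0,1,2}, A_2 = {2,3}.
X = range(6)
A = [{0, 1, 2}, {2, 3}]
F = generated_partition(X, A)

# Each A_i is the union of the cells F_nu with nu_i = 1.
for i, a in enumerate(A):
    union = frozenset().union(*(c for nu, c in F.items() if nu[i] == 1))
    assert union == frozenset(a)
```

Keeping empty cells mirrors the indexing in the text, where all $2^k$
index tuples are allowed and some cells may carry no mass.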

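Then, noting that $A_i = \bigcup_{\nu_i = 1} F_{\nu_1\dots\nu_k}$ and using
subadditivity of distance, we get

(2.4) $\bigcap_{i=1}^k \{|P(A_i) - \mu(A_i)| < \epsilon_i\} \supset \bigcap_{i=1}^k \bigl\{\sum_{\nu_i = 1} |(P-\mu)(F_{\nu_1\dots\nu_k})| < \epsilon_i\bigr\}$.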

Since the class of $\mu$-continuity sets forms a field [Lemma
II.6.4, Parthasarathy (1967)], $F_{\nu_1\dots\nu_k}$ is a
$\mu$-continuity set; i.e.,

(2.5) $\mu(\partial F_{\nu_1\dots\nu_k}) = 0$.

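Note that,

(2.6) if $\alpha(F_{\nu_1\dots\nu_k}) = 0$, then $\mu(\mathrm{int}\,F_{\nu_1\dots\nu_k}) = 0$.

Because $\alpha(F_{\nu_1\dots\nu_k}) = 0$ implies
$\alpha((\mathrm{int}\,F_{\nu_1\dots\nu_k})^c \cap S_\alpha) = \alpha(S_\alpha)$,
and since $(\mathrm{int}\,F_{\nu_1\dots\nu_k})^c \cap S_\alpha$ is closed,
by the observation in (2.2),
$(\mathrm{int}\,F_{\nu_1\dots\nu_k})^c \supset S_\alpha$. Since
$S_\mu \subset S_\alpha$, the claim follows by (2.1).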

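Since $(P(F_{\nu_1\dots\nu_k}), P((F_{\nu_1\dots\nu_k})^c))$ has a
Dirichlet distribution with parameters
$(\alpha(F_{\nu_1\dots\nu_k}), \alpha((F_{\nu_1\dots\nu_k})^c))$,
$P(F_{\nu_1\dots\nu_k})$ is degenerate at 0 if
$\alpha(F_{\nu_1\dots\nu_k}) = 0$; hence, by (2.5) and (2.6),

(2.7) rhs(2.4) $\supset \bigcap \{[\,|(P-\mu)(F_{\nu_1\dots\nu_k})| < 2^{-k}\epsilon\,] : \alpha(F_{\nu_1\dots\nu_k}) > 0\}$,

where $\epsilon = \bigwedge_{i=1}^k \epsilon_i$.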
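
The factor $2^{-k}$ in (2.7) can be checked by a short count. The
following is a sketch of that step, under the reading that the sum in
(2.4) runs over the $2^{k-1}$ cells with $\nu_i = 1$. Cells with
$\alpha(F_{\nu_1\dots\nu_k}) = 0$ contribute nothing almost surely,
since $P$ of such a cell is degenerate at 0 and $\mu$ of such a cell
vanishes by (2.5) and (2.6). On the event on the right of (2.7), for
each $i$,

```latex
\sum_{\nu_i = 1} \bigl|(P-\mu)(F_{\nu_1\dots\nu_k})\bigr|
  = \sum_{\nu_i = 1,\ \alpha(F_{\nu_1\dots\nu_k}) > 0}
      \bigl|(P-\mu)(F_{\nu_1\dots\nu_k})\bigr|
  \le 2^{k-1} \cdot 2^{-k}\,\epsilon
  = \epsilon/2
  < \epsilon_i
  \quad \text{a.s.},
```

where the last inequality uses that $\epsilon$ is the minimum of
$\epsilon_1,\dots,\epsilon_k$.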
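Now $\{P(F_{\nu_1\dots\nu_k}) : \alpha(F_{\nu_1\dots\nu_k}) > 0\}$ has a
Dirichlet distribution with all parameters positive; since, by (2.5)
and (2.6),

$\sum \{\mu(F_{\nu_1\dots\nu_k}) : \alpha(F_{\nu_1\dots\nu_k}) > 0\} = 1$,

temporarily abbreviating $(\nu_1,\dots,\nu_k)$ to $\underline{\nu}$ and
fixing an $\underline{\eta}$ for which $\alpha(F_{\underline{\eta}}) > 0$,
we get

(2.8) rhs(2.7) $\supset \bigcap \{[\,|P(F_{\underline{\nu}}) - \mu(F_{\underline{\nu}})| < 2^{-2k}\epsilon\,] : \underline{\nu} \ne \underline{\eta} \text{ and } \alpha(F_{\underline{\nu}}) > 0\}$.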
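
The inclusion (2.8) drops the constraint on the one fixed cell, and the
smaller tolerance $2^{-2k}\epsilon$ compensates for that. A sketch of
the reasoning, writing $\underline{\eta}$ for the fixed index (an
assumed reading of the notation) and using that both $P$ and $\mu$ put
total mass 1 on the cells with positive $\alpha$-measure almost surely:

```latex
\bigl|(P-\mu)(F_{\underline{\eta}})\bigr|
  = \Bigl|\sum_{\underline{\nu} \ne \underline{\eta},\ \alpha(F_{\underline{\nu}}) > 0}
      (P-\mu)(F_{\underline{\nu}})\Bigr|
  \le (2^{k}-1) \cdot 2^{-2k}\,\epsilon
  < 2^{-k}\,\epsilon
  \quad \text{a.s.}
```

For the remaining cells the bound $2^{-2k}\epsilon \le 2^{-k}\epsilon$ is
immediate, so every constraint on the right of (2.7) is met.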

Since $\{P(F_{\nu_1\dots\nu_k}) : \alpha(F_{\nu_1\dots\nu_k}) > 0\}$ has a
Dirichlet distribution with all parameters positive, the induced
distribution (over the sub-simplex in appropriate dimension) of a
one-component-deleted subvector of
$\{P(F_{\nu_1\dots\nu_k}) : \alpha(F_{\nu_1\dots\nu_k}) > 0\}$ puts
positive mass on every subset of the sub-simplex with non-empty
interior. By (2.4), (2.7) and (2.8), lhs(2.3) is positive. That
completes the proof of $\mu \in S_{\mathcal{P}}$.

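Conversely, suppose $\mu \in S_{\mathcal{P}}$; to show
$S_\mu \subset S_\alpha$ it is enough (by the observation in (2.2)) to
show $\mu(S_\alpha^c) = 0$. Since by Theorem II.6.1(d) of Parthasarathy
(1967) $\liminf \omega_n(A) \ge \omega(A)$ whenever
$\omega_n \to \omega$ and $A \subset \mathcal{X}$ is open, the set
$\{\nu \in \Omega : \mu(A) \ge \nu(A) + \epsilon\}$ is closed (in the
topology of weak convergence) for every open set
$A \subset \mathcal{X}$ and every $\epsilon > 0$. Since $S_\alpha^c$ is
open, $\{\nu \in \Omega : \mu(S_\alpha^c) < \nu(S_\alpha^c) + \epsilon\}$
is an open set containing $\mu$ for every $\epsilon > 0$; since
$\mu \in S_{\mathcal{P}}$,

$\mathcal{P}\{\nu \in \Omega : \mu(S_\alpha^c) < \nu(S_\alpha^c) + \epsilon\} > 0$ for every $\epsilon > 0$.

Now $\nu(S_\alpha^c) = 0$ a.s. $(\mathcal{P})$, because
$(P(S_\alpha^c), P(S_\alpha))$ has a Dirichlet distribution with
parameters $(\alpha(S_\alpha^c), \alpha(S_\alpha))$ and
$\alpha(S_\alpha^c) = 0$ by (2.1). Therefore,
$\mu(S_\alpha^c) < \epsilon$ for every $\epsilon > 0$; that is,
$\mu(S_\alpha^c) = 0$. That completes the proof. //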

BIBLIOGRAPHY


BILLINGSLEY, P. (1968). Convergence of Probability Measures. Wiley, New
York.

CHOW, Y.S. and TEICHER, H. (1988). Probability Theory, Independence,
Interchangeability, Martingales. Springer Verlag, New York.

DATTA, S. (1988). Asymptotically optimal Bayes compound and
empirical Bayes estimators in exponential families with compact
parameter space. Ph.D. dissertation, Dept. Statist. and Probab.,
Michigan State Univ.

DATTA, S. (1991a). On the consistency of posterior mixtures and its
applications. Ann. Statist. 19 338-353.

DATTA, S. (1991b). Asymptotic optimality of Bayes compound
estimators in compact exponential families. Ann. Statist. 19 354-365.

DUDLEY, R.M. (1967). The sizes of compact subsets of Hilbert Space
and continuity of Gaussian processes. J. Functional Analysis 1 290-330.

DUDLEY, R.M. (1989). Real Analysis and Probability. Wadsworth &
Brooks/Cole, Pacific Grove, California.

FERGUSON, T.S. (1967). Mathematical Statistics, A Decision Theoretic
Approach. Academic Press, New York.

FERGUSON, T.S. (1973). A Bayesian analysis of some nonparametric
problems. Ann. Statist. 1 209-230.

GILLILAND, D.C. (1968). Sequential compound estimation. Ann. Math.
Statist. 39 1890-1904.

GILLILAND, D.C. and HANNAN, J. (1974/86). The finite state compound
decision problem, equivariance and restricted risk components. In
Adaptive Statistical Procedures and Related Topics (J. Van Ryzin, ed.) IMS
Lecture Notes-Monograph Series 8 129-145. IMS, Hayward, Calif.

GILLILAND, D.C., HANNAN, J. and HUANG, J.S. (1976). Asymptotic
solutions to the two state component compound decision problem,
Bayes versus diffuse priors on proportions. Ann. Statist. 4 1101-1112.


HANNAN, J.F. and ROBBINS, H. (1955). Asymptotic solutions of the
compound decision problem for two completely specified
distributions. Ann. Math. Statist. 26 37-51.

HANNAN, J.F. (1957). Approximation to Bayes risk in repeated play.
Contributions to the Theory of Games 3. Ann. Math. Studies 39 97-139.
Princeton Univ. Press.

HANNAN, J. (1960). Consistency of maximum likelihood estimation of
discrete distributions. In Contributions to Probability and Statistics:
Essays in Honor of Harold Hotelling (I. Olkin et al., eds.) 244-257.
Stanford University Press.

HUANG, J.S. (1972). A note on Robbins’ compound decision procedure.
Ann. Math. Statist. 43 348-350.

KUO, L. (1986). A note on Bayes empirical Bayes estimation by means
of Dirichlet processes. Statist. Probab. Lett. 4 145-150.

LECAM, L.M. (1986). Asymptotic Methods in Statistical Decision Theory.
Springer, New York.

MASHAYEKHI, M. (1990). Stability of symmetrized probabilities and
compact compound equivariant decisions. Ph.D. dissertation, Dept.
Statist. and Probab., Michigan State Univ.

MEYER, P.A. (1966). Probability and Potential. Blaisdell Publishing
Company, Waltham, Massachusetts.

PARTHASARATHY, K.R. (1967). Probability Measures on Metric Spaces.
Academic, New York.

ROBBINS, H. (1951). Asymptotically subminimax solutions of compound
statistical decision problems. Proc. Second Berkeley Symp. Math.
Statist. Probab. 131-148. Univ. California Press, Berkeley.

ROBBINS, H. (1956). An empirical Bayes approach to statistics. Proc.
Third Berkeley Symp. Math. Statist. Probab. 1 157-164. Univ. California
Press, Berkeley.

SINGH, R.S. (1974). Estimation of derivatives of average μ-densities
and sequence-compound estimation in exponential families. Ph.D.
dissertation, Dept. Statist. and Probab., Michigan State Univ.

ZHU, J. (1992). Asymptotic behavior of compound rules in compact
regular and nonregular families. Ph.D. dissertation, Dept. Statist.
and Probab., Michigan State Univ.
