=
—
—
=
—
—
—
_
—.
—
—
_
—
—
—_
=

 

usm\es

um

\\\\\\\\\\\\\\\\\\\\\\\\\\ 43

it \
3 12.93 0

ﬁt)“

ti
\\

“\CHM \3

This is to certify that the
dissertation entitled
Asymptotic Behavior of Compound Rules in
Compact Reguiar and Nonregular Families

presented by

Jin Zhu

has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Stat'iStiCS

%W chuk
Prof

. James Hannan

 

Major professor

[ﬁne July 2, 1992

 

MSU is an Affirmative Action/Equal Opportunity lnslirulion 0- 12771

 

 

LIERARY
Michigan State
University

 

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before due due.

DATE DUE DATE DUE DATE DUE

J ll—
ISLJ;
—7

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

_J
”—T—il h

MSU Io An Afﬁrmative ActiorVEqueI Opportu'lIIy Institution
_ _ _ cMMpmS-pI

 

 

ASYMPTOTIC BEHAVIOR OF COMPOUND RULES IN
COMPACT REGULAR AND NONREGULAR FAMILIES
By

Jin Zhu

A DISSERTATION

Submitted to
Michi an State University
in partial ful ent of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability
1992

'Lu-l/

7.)

(7?-

ABSTRACT

ASYMPTOTIC BEHAVIOR OF COMPOUND RULES IN
COMPACT REGULAR AND N ONREGULAR FAMILIES

By
Jin Zhu

Asymptotic behavior Of compound rules is considered for three different
component problems.

For a restricted component problem with a compact parameter space,
admissibility and asymptotic Optimality are proved for compound Bayes rules based
on full support hyperpriors. A special case Of the above general structure is a
multi—dimensional exponential family with a polytOpe parameter space and equi(in
action) continuous loss function. An example is given to show that this class of loss
functions is the largest for which asymptotically optimal compound rules exist. A
multi—dimensional Miintz—Szész theorem is also developed to prove identiﬁability
Of our exponential family.

For a two sided truncation family, an equation satisﬁed by component
Bayes rules and an inversion formula for probability measures on parameter space

are derived. Based on these, asymptotically Optimal 0(n-1/4

) compound
estimators of truncation parameters and the empirical probability measure
determined by n arbitrary parameters are obtained.

For a linear compound problem with quadratic variance and bounded

”2 is proved for compound

parameter space, asymptotic Optimality with rate n—
estimators Of means. The estimators are Obtained by approximating functions Of
Bayes estimators versus the empiric Of 11 parameters. An example is given to show
that the boundedness of the parameter set is relative necessary for the existence of

asymptotically Optimal compound estimators.

To my parents and wife

iii

ACKNOWLEDGMENTS

I would like to express my gratitude to my thesis advisor, Professor James
Hannan, for his patience, encouragement and guidance in the preparation of this
thesis. His careful criticism and invaluable suggestions were of great value in
simplifying proofs and improving presentations.

What he has been consistently doing, taught me much about the attitude tO
academic research and even more being a mathematical statistician.

I would like to thank other committee members, Professors Dennis Gilliland,
V. Mandrekar, R.V. Ramamoorthi, and Habib Salehi, for their suggestions on an
earlier draft. I beneﬁted greatly from Professor Gilliland for his careful reading of
the thesis and invaluable suggestions on Chapter 3.

Warm thanks and appreciation also go to Ms. Lora Kemler for her expert
typing of my thesis.

Finally, I wish to thank the Department Of Statistics and Probability at
Michigan State University for providing the ﬁnancial support during my stay.

iv

TABLE OF CONTENTS

CHAPTER 0: INTRODUCTION
1 The Component Problem

2 The Set Compound Problem and the Empirical Bayes Problem

3 Literature Review and Summary of the Present Work
4 Some Notations and Conventions

CHAPTER 1: BAYES COMPOUND RULES FOR COMPACT FAMILIES

1 Introduction
2 Admissible and Asymptotically Optimal
Compound Bayes Rules

Theorem 1 (Equi uniform continuity of R n(t ,_) and R(t, w))

Theorem 2 (g A inherits full support Of A)

Remark 1 (Admissibility of compound Bayes rules)

Theorem 3 (A suﬂicient condition on A. O. of 9

Remark 2 (EA. 0. of compound Bayes rules)
3 Multivariate xponential Families

Remark 3 (On the assumptions of L)

Lemma 1 ((Continuity of pa)

Lemma 2 (Continuous differentiability of p 0)
Lemma 3 (Norm Lipschitz of P 0)

Lemma 4 (R(t, 0) inherits equi
continuit of L( (a ,0))
Remark 4 (Rt Lginherits equi Li schitz of L(a, 9))
Corollary 1 ( quicontinuity of Rt t,0))
Remark 5 (On the existence
assumption Of Bayes rules)
Corollary 2 Application Of Theorem 2
Corollary 3 Application of Theorem 3

APPENDIX FOR CHAPTER 1

1 Equicontinuity Of Loss Functions
Lemma 1 Equicontinuity of L)
Lemma 2 Equi Lipschitz Of L)
Remark 1 Convex loss)
Example ( .0. rule does not en'st)

2 A k—dimensional Miintz—Szész Theorem with Applications
Theorem 1 EA k—dimensional Miintz—Szész theorem)

Remark 2 ( xamples of A( k) ))
Theorem 2 (Identiﬁability( with respect to
an exponential family)

10
11

12
13

14

15

16
17

19
20

21
22
24
25

CHAPTER 2: COMPOUND ESTIMATION OF TRUNCATION PARAMETERS

1 Introduction 26
2 The Component Problem

Lemma 1 (An equation satisﬁed by t w) 27

Lemma 2 (Inversion formula for w) 28
3 Compound Estimators 29
4 Estimation of pG 31

n a

Lemma 3 (A bound on conditional L1 error for p)

Corollary 1 (Same for pi) 32

Lemma 4 (Bounds on various L1 errors for p)

Corollary 2 (Same for pi) 34
5 Asymptotic Optimality of i 35

Theorem 1 (L1 consistency Of On)

Lemma 5 (Singh—Datta Lemma) 36

Theorem 2 (A. O. oft)

An Example 37

CHAPTER 3: THE LINEAR COMPOUND PROBLEM 38

1 Introduction
2 The Component Problem and a Compound rule

The linear simple rule? 40
The compound rule 1

3 Asymptotic Optimality of i
Theorem 1 (A. O. of t) 41

Lemma 1 (Identities for 7i — 7 and H(Ti)—H(7))

Lemma 2 (Bounds on E 0| 32 — 02| and E 9| R(RHICJ)” 42

Lemma 3 (Bounds on average L1 error of Xi

and L1 error of 111) 43

Proof Of Theorem 1 44

4 An Example 45
BIBLIOGRAPHY 47

CHAPTER 0
INTRODUCTION

1. The Component Problem

Let X be a random clement taking values in a measurable space (.5 .2)
and let the parameter space 9 be a metric space. For each 0 E 9, P 0 is a
probability measure on (.13 .3). We are to take an action a in A based on the
Observed values of X to P 0 . The non-negative loss L(a,0) is incurred when a is
taken and 0 is the true parameter. A non-randomized decision rule t maps .3
into A such that L(t, 0) is measurable. The P g—average of L(t,0) is called the
risk of t at 0 and is denoted by R(t,0). For a restricted problem, the loss and
decision rules are bypassed in favor Of consideration of an arbitrary class of risks.

Let (I be the set Of all probability measures on the Borel a—ﬁeld Of O. For
each u E n, the w—average of R(t,-) is called the Bayes risk of t at w and is
denoted by R(t,w). The inﬁmum Bayes risk is denoted by r(w) and a decision rule
t0 such that R(t0,w) = iltlf R(t,u) 5 r(w) is called a Bayes decision rule versus «1.

If the inﬁmum is taken among the afﬁne functions of x, the inﬁmum is

denoted by rL(w) and any minimizer is called a linear Bayes decision rule versus an.

2. The Set Compound Problem and the Empirical Bayes Problem

Suppose that the component decision problem occurs repeatedly and
independently n times and we are to make decisions about Q based on _X_ = (XI,
..., Xn). A set compound rule 1 = (t1, . . . , tn) is a sequence such that each ti(§)
is a decision at ith stage. The parameter space and the action space for the

compound problem are e11 and A“. The loss function Ln: All x on ~~> [0, e) is

n
taken to be the average loss across the components, Ln(a,§) = % )3 L(ai,0i). For E
i=1

= (x1, ..., xn), let _x_§ denote x with i—th component deleted and for a compound
rule 1, let R(ti,0) denote a conditional risk given 5i . With E 0 stands for the

n
expectation determined by P 0 = a P0 , the compound risk is
- i=1 i

(1) Russ = EgLnUJD =§i§1Egatt,.o,);
the rhs(l) deﬁnes the compound risk with restricted components.
A compound rule 1 is said to be simple if t1= - - - = tn and each ti is a
function of Xi only. As a function of g
(2) is f Ree!)
3 simple

is called the simple envelOpe. Since Rn(t,9 = R(t,Gn) for a simple rule with
component t, (2) coincides with r(Gn). For a compound rule 5 , the modiﬁed

regret is deﬁned by

(3) Dn(.t.I!) = Rue?!) — r(G11)

We say a sequence of compound rules t n is asymptotically Optimal (a.o.) if
(4) sup DnQn,g) -t 0 as n -t m,

0

and asymptotically optimal with rate on if
(5) 8159 D,,(£n. l) = 0(an)-

For the relation Of this notion of Optimality to that with more stringent envelopes,
see Theorems 1 and 2 Of Gilliland and Hannan (1974—) for ﬁnite 9 and Remark 1
of Mashayekhi (1993) for compact 9.

For the linear compound problem, the modiﬁed regret for a decision rule t
is deﬁned by

(6) DnL = RAE!) " rL(Gn)I

and a sequence Of decision rules 1 n is said to be asymptotically Optimal (with
respect to rL(Gn)) if
(7) map DnLQn’D -t 0 as n -t m.

Asymptotic Optimality with rate is deﬁned similarly as in (5).

A sequence of set compound rules _t_n is said to be admissible if for each
n _>_ 1, 5n is admissible with respect to the compound risk Rn.

In the empirical Bayes problem, it is assumed that there exists an w E (I

such that Q ~ at“. At stage n, the risk of a decision rule tn is

t
(8) R (tutu) = I EQLOnron) d‘JII
and the modiﬁed regret is deﬁned by
t t
(9) D (twat) = R (tn,w) — r(w).
A sequence of decision rules tn is said to be asymptotically Optimal (with rate on)
if as n 4 III .
Q
(10) D (ins!) = 0(1) (001,,»-

A sequence of empirical Bayes rules tn is said to be admissible if, for each n 2 1,
*
tn is admissible with respect to the empirical Bayes risk R .

3. Literature Review and a Summary of the Present Work

Following his featured example on decisions between N (-1, l) and N(l, 1),
Robbins (1951), in Section 6 Of his Berkeley Symposium paper, introduced the
compound and empirical Bayes problems, pointing out their relations and possible
solutions.

Since then a great amount of work has been done in these two ﬁelds. Hannan
and Robbins (1955) generalized the example Of Robbins (1951) to two arbitrary
speciﬁed distributions and proved the asymptotic Optimality Of their proposed
compound rules. Asymptotic Optimality with rate 0(1NE) was proved later by

Hannan and Van Ryzin (1965). This result was further generalized by Van Ryzin
(1966) to ﬁnite 9 and ﬁnite A.

For non—ﬁnite O, Gilliland (1968) and Singh (1974) proved the asymptotic
Optimality with rate Of their compound procedures (estimates of simple rules Bayes
with respect to Gn) under the squared error loss for some compact (discrete and
continuous) exponential families.

Gilliland, Hannan and Huang (1976) ﬁrst proved admissibility and

1/ 2) of the compound rules Bayes with

asymptotic Optimality with rate 0(a-
respect to appr0priate diffuse hyperpriors (Robbins conjecture) for two arbitrary
speciﬁed distributions. The proof was based on Theorem 3 and Theorem 4 of
Gilliland and Harman (1974—); the latter reduces the problem of asymptotic
Optimality to that of posterior consistency for the ﬁnite state compound problem
with the restricted risk structure. This method has become standard in proving
asymptotic Optimality and admissibility in compound problems.

For squared error loss estimation in one—dimensional exponential families,
Datta (1991b) Obtained admissible asymptotically optimal compound rules which
are Bayes versus mixtures derived from hyper—priors on n ; this was the ﬁrst such
result for non-ﬁnite 9, implementing the outline in Addendum III Of Gilliland,
Hannan and Huang (1976). Mashayekhi (1993) strengthened Datta’s result to the
equivariant enveIOpe and his Theorem 1 gave a sufﬁcient condition for the
asymptotic Optimality of "delete bootstrap" rules.

In this thesis we consider three different problems in three chapters.

The component problem in Chapter 1 has a restricted risk structure. Section
2 ﬁrst proves (Theorem 1) that both compound and component Bayes risks inherit
the equicontinuity of the component risk. Based on this result, Theorem 1 of

Mashayekhi ( 1993) is reduced to a easily veriﬁed form (Theorem 2) and the proof of
admissibility (Remark 1) Of prOposed compound rules becomes straightforward.

Section 3 of Chapter 1 applies the results Of Section 2 to a multi-dimensional
exponential family. Asymptotic Optimality and admissibility are proved for a large
class of loss functions. This extends the results 3.2 and 3.3 Of Datta (1991b) from
one—dimension to multi—dimension and from squared error loss to equi(in a)
continuous loss.

The Appendix for Chapter 1 consists Of two parts. The ﬁrst part gives some
suﬁcient conditions for a loss function to be equi(in a) continuous and proves in
particular that most convex losses are equi(in a) continuous. The example given at
the end of the ﬁrst part shows that in some cases equicontinuity Of a loss function is
necessary for the existence of asymptotically Optimal compound rules. In the second
part, Theorem 1 is a multi—dimensional version of the Mﬂntz—Szész theorem.
Based on Theorem 1, Theorem 2 gives a sufﬁcient condition for the identiﬁability
of multi—dimensional exponential families.

In Chapter 2, we consider compound estimation Of truncation parameters of
certain nonregular families. (Ferguson (1967) Section 3.5) A one—dimensional
version was considered earlier by Nogami (1978).

Empirical Bayes estimation of truncation parameters has long been
considered. Fox (1978) considered two nonregular families U(0, 0) and U(9, 0+1),
and proved the asymptotic Optimality of his estimators without rate under the
assumption that the prior G has second moment. Nogami (1988) treated U(0, 0)
on a compact parameter space Of R and proved that her empirical Bayes

estimator is asymptotically Optimal with rate 11-1/2.

Nogami’s result was
generalized to a wider class Of nonregular distributions by Datta (1991). See more
references in this paper.

Wei (1989) considered a two—sided truncation family in the empirical Bayes
problem. He proved the asymptotic Optimality without rate under the assumptions

that the parameter space is compact and the prior G has a density. In this thesis,

under the assumption that parameter space is compact, our prOposed compound

1/4

estimators are proved to be asymptotic Optimal with rate 11" . Also, estimators

of the SE empirical distribution Cu are given and are shown to be Ll—consistent
with rate {4/4.

Chapter 3 considers the linear compound problem introduced by Robbins
(1983). Asymptotic Optimality of the linear empirical Bayes problem was studied
by Yu (1986), (1988). Here asymptotic Optimality of compound estimators is

proved under some mild conditions.

4. Some Notations and Conventions
For any positive integer 11, let 2 denote an n—tuple (a1, ..., ah). |-| is

used to denote the Euclidean norm and xy the inner product of x and y on

n
Euclidean spaces. If P1, ..., Pn are probability measures, we use P = x Pi to
i=1

denote the product probability measure. If the index is not exhibited for sum 2‘. or
product If, it runs from 1 to an integer and the integer is clear from the context.

We sometimes use p(f) to denote the integral of a function f with respect
to a signed measure 1;. Dummy variables are Often at least partially displayed in
integrals such as ]f(x) dp. E is used for the expectation corresponding to a
probability measure P.

A function on a set .3 to a set ] is denoted by x e £~~> y E ﬂ
Sometimes we abuse notation and denote fimctions by their values. Indicator
functions are often not distinguished from the sets they indicate. The symbol [ ]
is Often used for the indicator function.

In this thesis, 9 is always a parameter set and 0 is a generic element of O.

For _Q E 6“, P 0 is the product measure xP 0. and C11 is the empirical probability
_ I

measure determined by _0_. V and It represent supremum and inﬁmum respectively.

CHAPTER 1 .
BAYES COMPOUND RULES FOR COMPACT FAMILIES

1. Introduction

Admissibility and asymptotic Optimality are studied for a restricted risk
compound problem. As an application Of the Obtained results, a multivariate
exponential family is considered. .

Let .9 be a family of probability measures on a metric space (.3 .29,
compact in the total variation norm. Let M < m and let 9 be a class of decision
rulessuchthat R(t,P)$M forany t6 9 and PE .2 Let (I bethesetofall
probability measures on .9 with the weak convergence tOpology. We assume that

for any I.) 6 (I there exists a Bayes decision rule t w E a that is

(l) R(tw. 02) =:I€1ng(t.w)o

Consider a (restricted risk) compound problem with n independent
repetitions of the above restricted risk component structure. For _x_ = (x1, ..., xn),
x P.
ita ‘
denote P with at—th factor deleted. The class Of decision rules is g such that

let 55: denote _x with a—th component deleted and, for P 6 ﬂ, let P 5: =

every component ta of _t_= (t1, ..., tn) 6 g is a Q—valued map of £5 and
R(ta,P) is $1.1 measurable for any P e .2 The compound risk (0.1) Of _t_ E g

at _P_e .9” canbewrittenas

(2) Rn(t,P_) = %§PR(ta.P,,)-

For a more detailed description Of the structure Of the above restricted risk
problem, see Section 1 of Mashayekhi (1993) and, for the relation between the
usual decision problem structure and restricted risk problem structure, see Section

5 of Gilliland and Hannan (1974-).

Section 2 of this chapter considers admissibility and asymptotic Optimality Of
compound Bayes rules for the above restricted risk compound problem. Theorem 1
proves that the compound and component Bayes risks inherit the equicontinuity
and boundedness of the component risk. Theorem 2 and 3 concern the admissibility
and the asymptotic Optimality Of compound Bayes rules.

In Section 3, we apply the results of Section 2 to a multivariate exponential
family. Lemmas 1 through 3 concern some basic prOperties of the component
distribution. Lemma 4 is a general result on relations between risks and loss
functions. An application Of Lemma 3 and Lemma 4 gives us the equicontinuity of
the component risk (Corollary 1). Corollaries 2 and 3 prove admissibility and
asymptotic Optimality Of compound Bayes rules based on full support hyper-priors.

2. Admissible and Asymptotically Optimal Compound Bayes Rules

In this section, we consider the asymptotic behavior of a compound Bayes
rule for a restricted risk problem. In Theorem 1, we prove that the compound risk
Rn(_t_,-) and component Bayes risk R(t,-) inherit the equicontinuity and
boundedness Of the component risk. With A a hyper—prior on the set of all
probability measures 0, we deﬁne the prior Q A to be the A—mixture of n it'd w.
Theorem 2 proves that Q A has full support if A has; consequently proves that a
Bayes rule 5 A against E A is admissible. Based on Theorem 1 of Mashayekhi
(1993), Theorem 3 gives a sufﬁcient condition for t A to be asymptotically
Optimal.

Theorem 1. Suppose the component risk R(t,P) is equi(in t) uniformly continuous
and equi(in t) bounded by M. Then
(i) the compound risk Rn(t,P) is equi(in t) uniformly continuous on .951

and

(ii) the component Bayes risk R(t,w) is equi(in t) uniformly continuous
on D.
Both obviously inherit the bound M.
Proof: Let _P, _P’ e 91. By triangulation about PR(t,P') (with a’s unexhibited)
and 0 g R 5 M, the difference of summands of Rn (in (2)) at P and P’ is
bounded by
(a) guitar) -R(t.P')| + émur—P'u.

By triangulations about n—l points changing _P’ to 2 one coordinate at a time
and applications of the subadditivity and multiplicativity of the norm,

n n
(4) IIP-P’IIS 3 II x P (P--P’-) * P’||= 3 IIP--P’-ll
j.,=1 k<j 1‘ J Jk>j 1‘ j=1 J J

(Mashayekhi (1990) display (2.7)). For any a > 0, the equi(in t) uniform

continuity of R and (4) yield a 6 > 0 such that

(5) V||Pj-P:i|| < 6 implies (3) < 5.

Hence, each summand of R11 (in (2)) is equicontinuous and so is Rn .
TO prove the component Bayes case, we use the following inequality of

Oaten(1972) (Result (a) on page 1179): If E, F are probability measures on the

Borel a—ﬁeld of a metric space (.2 p), 7 is their Prohorov distance and h is a

bounded measurable function on .9: then

(6) |(E-F)h| S 7Diameter(h($) + ah.
where
(7) 01, = BuP{|111(P)--h(1”)|; p(P.P’) < 7}-

The Oaten—inequality (6) here bounds |(w1—w2)R(t,-)| by 7M +
sup{|R(t,P) —R(t,P')| ; ||P -P’|| < 7}, which, by the uniform equicontinuity Of

R, converges to 0 uniformly in t as 7 converges to 0. n

10

The compound decision rules derived below are similar to the ones in Datta
(1991b) Section 3.1. They are Bayes rules against the priors (8) induced by
hyper-priors on 0.

Let B(fl) denote the Borel a—ﬁeld on 0 corresponding to the weak
convergence tOpology and let A be a probability measure on (D, B(fl)). Let Q A
denote the compound prior on 9‘, the A-mixture Of a!“ determined by the values

(a) 72,031,: --- 43,) = 131413,) dA(w)
Of these A integrals of continuous functions.

Let .9“ denote the (Ir—mixture of P and Aﬁ—l the A—mixture of .954.
Since the joint probability QA°P51 induces the A—mixture of 52—1,“ on
.sn‘lx .9 we assume the latter has the disintegration (A 55-1)“: at . (When, as in
Section 3, .9” has density p w and A a is the probability on {I with A—density

prOportional to II p “(xi), then the A a—mixture of w is such an “a .) It then
itta

follows from the deﬁnition of the compound risk Rn that t is Bayes with respect
to Q if V a, t = t with
A a A a

(9) tAaQ) = twa(§&)(xa) V x .

Theorem 2. If A has full support I), QA has full support .9‘. Hence ifin
addition Rn(t_,-) is continuous for each 3 e g, then the Bayes rules with respect
to g A are admissible.

Proof: Let Bi i= 1, ...,n beOpen sets in .9 Since

(10) {an «(13) > 0}

is Open for any Open set B in 9(See Billingsley (1968), Theorem 2.1.iv.) and A
has full support, rhs(8) is strictly positive. Hence [1 A also has full support. a

11

Remark 1. Under the assumption that the component risks R(t, -) are
equicontinuous and bounded, by Theorem 1 (i) the compound risks Rn(_t_,-) are
equicontinuous. Hence Theorem 2 yields that the compound Bayes rule t_ A is
admissible. n

Let _t_ denote a simple compound decision rule with components Bayes
versus Gn’ the empirical measure determined by 9. Let «I: be a symmetric
mapping from sin—1 into (I and let 3 be a compound rule with

(11) {C(15) = migraine).

Theorem 3. Let .9 be a compact metric space. Suppose that w ~~> 9w is 1—1
and that the component risks are equi(in t) uniformly continuous and bounded.
Then i deﬁned by (11) is asymptotically optimal if

(12) sup{P, "5:,— agnll; re 5”} = 0(1).

Proof: We will use the following theorem Of Mashayekhi (Theorem 1 Of Mashayekhi
(1993)): i is an. if
(13) for any 6 > 0, 3 6) 0 such that V n

n 5%,— $9.." < 5 implies (cn — .L)(R(£n,.) — R(tn,-)) < e,
and (12) holds.

Much more than (13) holds here: By the one—to—oneness of the map as ~~>
.9“ , Remark 2 of Mashayekhi (1993) shows the distance between I.) and («1’
deﬁned by H 9w — .9“, || determines the same tOpOlogy as does the Prohorov
distance d. By the assumptions on the component risks R(t,-) and Theorem 1 (ii),
V r) 0,3 6>0suchthat

d(wl,w2) < 6 implies |(wl-w2)R(t,-)| < e

12

for any decision rule t e .9! n

Remark 2. The conditional probability w in (9) can be replaced by its average

a
over permutations; without loss of generality it can therefore be taken symmetric.
Also because ”a is constant with respect to a, 1A deﬁned by (9) is a special case of

(11). u

3. Multivariate Exponential Families

In this section, we consider a multivariate exponential family example of
Section 2. The component distribution assumptions are polytOpe parameter space
6 and ﬁnite ﬁrst moments on the vertices of O. Lemmas 1, 2, 3 prove some basic
prOperties of our exponential family under these assumptions. The assumptions on
the loss function L(a,0) are boundedness and equi(in a) continuity. Lemma 4
proves more generally that the component risk R(t,0) inherits the equicontinuity
of L under the hypothesis that 0 am) P 0 is uniformly continuous. As an
immediate consequence Of Lemmas 3 and 4, our component risk is equicontinuous
and bounded (Corollary 1). Using Theorem 2, Corollary 2 proves the admissibility
of our compound Bayes rule and, using Theorem 3 together with a posterior
consistency result of Datta and Theorem A.2, Corollary 3 proves the asymptotic
Optimality.

Let 6 beapolytOpe in Rk with vertices b1, ..., hm. Let p beameasure
on the Borel o—ﬁeld of Rk and for each a e 9, let

(14) pix) = eoxdﬁo)
be a density of P0 with respect to u such that
(15) g IIXIexp(b,X)du(X) < ..

Let M be a positive number and let the loss function L: (a, 0) e A x O ~~> [0, M]

13

be equi(in a) uniformly continuous, i.e. V e > 0, 3 6 > 0 such that
(16) |01 — 02| < 6 implies |L(a, 01) -L(a, 02)| < c
for 01, 0269 and aEA.

Remark 3. Assumption (16) is satisﬁed if A is a compact subset of RI and L
satisﬁes conditions (ii) and (iii) of the Appendix. The example given there shows
that in some cases there may not exist an asymptotically optimal rule if (16) is not

satisﬁed. I:

Some consequences Of assumptions (14) and (15) are stated here for later use.
Lemma 1 proves the continuity of the function 0 ~~> p 0(x) for each x and the
continuity of E 0X Lemma 2 proves the continuous differentiability Of 0 ~~> p 0(x)
for each x. Lemma 3 shows that 0 ~~> P 0 is Lipschitz in the total variation norm.

Lemmal. 0~~> [ehdubq and 0~~> leleodex) are continuous on 9.

Proof: The continuity of the above two functions is a direct consequence of
ﬁniteness of jexp(bix)dp(x) for each i, ( 15) and the dominated convergence
theorem in view of the domination

(17) exp(0X) s exptvbix) s Eexp(b,X). o e 9,

resulting from the fact 9 = convex hull Of {bl, ..., bm}. a

Lemms 2. Let M0) = jeoxdp(x). Then, for each 0 E 9 and for any direction a

in the unit ball of Rk and into 9, the directional derivative Aa Of A exists on

Band

(18) Rate) = I (am)e”" duo).

14

Furthermore A is continuous in (01,9). Since 1]: = lnA, the directional derivative

Ill: A/ A inherits the continuity of A and A.

Proof: For any 0 and unit a, the a directional derivative of exp( 0x) at 9 is
(aar)exp( 0x). Since, for 0 e 9, the derivative is dominated by |x|2exp(bixi)
which is integrable by (15), a standard theorem Of interchanging the order of
integration and differentiation (e.g. Theorem 1.5.4 Of Fabian and Harman (1985))

yields (18) for 0 e O and a into 9. The continuity Of A follows from the

dominated convergence theorem with the same dominating function. o

For compact 9 in the interior Of the natural parameter space, there always
exists a polytOpe containing 9 such that (15) is satisﬁed. (Cf. e.g. Theorem 2.2 of
Brown (1986).)

Let || || denote the total variation norm of signed measures on Rk. Since
rhs(18) is deﬁned on 9 x the unit ball in Rk, we abuse notation and use A and 7;

to denote their corresponding extensions also. Let K1 = (“15” a (0)| and K2 =

313p! |x|p0dp+ K1.

Lemma 3. The map 0 ~~> P 0 is Lipschitz on 9 in the total variation norm.
Proof: Since the unit ball in Rk and 9 are compact, Lemma 2 implies that K1
is ﬁnite. Hence, by Lemma 1, K2 is ﬁnite.

By applying the mean value theorem to the exponential function exp and
noticing exp is monotone,
(19) ng-Pgtl S I91!-0’x-(1/(0)-¢(0’))I(p0Vp0,).
By the Euclidean norm inequality and by the mean value theorem applied to 10,
(20) mm) s Iii-9’ l(lxl + lep, + p0,).

15

Hence the conclusion follows from the integration of (19) with respect to p. 0

Lemma 4 below shows that component risks inherit equi(in a) uniform

continuity of loss functions more generally.

Lemma 4. If only 0 mo) P 0 is uniformly continuous and if 0 ~~> L(a,0) is such
equi(in a) and bounded equi in a, then the component risk R(t,0) is equi(in t)
uniformly continuous and bounded on 9.
Proof: The risk is bounded because an expectation inherits the bounds of a
integrand.

By triangulating about P ”L(tﬂ’),
(21) |R(t.0) - R(t,?“ S X|(L(a.0) - L(MOI + {(:?%)L(a.0) }||Pp- P91";

this proves the equicontinuity of R. 0

Remark 4. If 0 ~~> P 0 is Lipschitz and 0 am) L(a,0) is equi—Lipschitz and
uniformly bounded, from (21) 0 ~~> R(t,0) is equi—Lipschitz and uniformly
bounded. Remark A.1 gives examples of equi—Lipschitz loss functions. a

Using Lemma 3, the following is an immediate special case of Lemma 4 and
Remark 4:
Corollary 1. For our multivariate exponential family, R(t,0) is equi(in t) uniformly
continuous and bounded on 9. If we further assume that L is equi(in a) Lipschitz,

then R(t,0) is equi(in t) Lipschitz and bounded. o

For the remainder of this chapter, we make the assumption that component

Bayes decision rules exist. Let n be the set Of all probability measures on O. For

16

each u e (I let 9w and p w denote the tel—mixtures of P 0 and p 0 . Deﬁne the
posterior expected loss

(22) Lx(a.w) = l L(a.0)pp(X) dam/pulls)-

We assume that for any a: 6 0, there exists a measurable t w: Rk ~~> A such that

t w(x) minimizes Lx(" w) on A. Such a t w is a (component) Bayes decision rule.

Remark 5. In the context Of the compound problem (which is the subject of this
thesis), the assumption on the existence of a measurable minimizer is without
essential loss Of generality since there always exists an e—Bayes decision rule. (See
Gilliland and Hannan (1974—), last paragraph on page 135 for further explanation
for ﬁnite 9.)

Under the hypothesis of Remark 3, Bayes decision rules always exist. In
this case, Lx( - , w) inherits the continuity of L(- ,0) by the bounded convergence
theorem, hence attains the minimum at some point t ”(x) in the compact set A.
Because L is then also joint measurable on A x 9 and A is complete, it is
possible to choose a measurable t w . (See Brown and Purves(1973), Theorem 3.)

Under the assumption (16), closure of J: {L(a,-); a e A} in C(O) is
without loss Of generality by the Ascoli theorem. In this case, the previous criteria

apply with A taken to be .2 a

For our multivariate exponential family, {P0 ; 0 E 9} is dominated by p.
Thus given a full support hyper—prior A on 0, if A a is the probability on 0

with A—density proportional to II p “(xi), then the A a—mixture of w is a
i# a

conditional probability ”a in the disintegration of Mfg—1w). Therefore the
compound Bayes rule 1 A makes sense in this situation and we will prove its

admissibility and asymptotic Optimality in the next two corollaries.

17

Corollary 2. In the context of the preceding paragraph, the compound Bayes rule
1 A is admissible.

Proof: Corollary 1 proves that R(t, -) is equi uniformly continuous and bounded
on 9. Since A has full support, the admissibility follows from Remark 1. n

The following notation will be needed to prove the identiﬁability of 0, is.
(23) w M») .9” is 1-1,
for our multivariate exponential family which in turn will be used to prove the
asymptotic Optimality Of our compound rules.

Let A(1) be a set in R which contains a sequence of distinct numbers {y1,
y2, - - -} such that

(24) yu, u; 1, have the same sign

and

( ) i I I 1

25 y A =0.
u=1 u yu

For 2 g i g k, let A(i) be a set in Ri such that for each y(i‘1) e A(i—l), there

exists a sequence of distinct numbers {yil’ yi2’ - - -} with properties (24) and (25)
(H) . .. -

and {(y Iyiu)l ll - 1: 2i } C A(1)

Corollary 3. In the context of the paragraph preceding Corollary 2, suppose that
the support of p is a set A(k) deﬁned above. Then the compound Bayes rule 1A
is asymptotically Optimal.
Proof: Since t A = i by Remark 2, we need only verify the hypothesis of Theorem 3
and condition (12).

Since 9 is compact and the map 0 ~~> P 0 is Lipschitz by Lemma 3, {P 0 ;
0 e O} is compact for the total variation norm. The assumption that the support

18

Of [l is a set A(k) and Theorem A.2 imply the identiﬁability of (I. Corollary 1
asserts that R(t,0) is equi(in t) uniformly continuous and bounded on 9.

To prove the posterior consistency condition (12), we will use Theorem 3.1
of Datta (1991a). Since 9 is a compact metric space, by that theorem we only
need to verify the following two conditions:

A1 p ”(x) is continuous in 0 for each x,
and
at
A2 with he = :Iloswplpp).
*
(26) )3] (ha—M)+p0dp-00 as M-Hn.
A1 holds by Lemma 1.

A2 will follow since (26) holds with h; and p 0 increased to their respective
supremum with respect to 0.

Since log(p0, /p0) = (0'- 0)x + 7(9)— ¢(0), the mean value theorem gives
h; g 7(le + K1) with 7 the diameter of O and K1 the bound Of it. The
domination (17) and the boundedness of exp(—1p) gives an uniform bound in L1(p)
for p0 . In view of the assumed integrability (15), the strengthened (26) follows
from the continuity from above of the lit—integral. o

APPENDIX FOR CHAPTER 1

Some results which are used in Chapter 1 and might have independent
interest are presented here. Section 1 discusses some sufﬁcient conditions for the
equi(in a) continuity of the loss functions L(a,0) and gives an example to show that
asymptotically Optimal decision rules may not exist if this condition is violated.
Section 2 deveIOps a k—dimensional version of the Muntz-Szasz theorem and gives a
suﬁcient condition for the identiﬁability with respect to k—dimensional exponential
families.

1. Equicontinuity of Loss Functions

Two lemmas, a remark and an example are given in this section. Lemma 1
shows that conditions (i), (ii) and (iii) below imply the assumptions about the loss
functions in Chapter 1. Lemma 2 gives a sufﬁcient condition for (iii). Based on
Lemma 2, Remark 1 shows that, when restricted to compact action and parameter
spaces, almost all the loss functions in common use fall under our consideration.
An example is given to show that the Chapter 1 assumptions on loss functions are
relatively necessary to ensure the existence of asymptotically optimal compound
rules.

Consider the following conditions on a loss function L: (a, 0) E A x 9 am) [0,

m):

(i) A x 9 is a compact subset of R1 x Rk;

(ii) every a—section Of L is continuous on 9;

(iii) L is equi(in 0) uniformly continuous on A, i.e. V e > 0, 3 6 > 0
such that
(1) |al —a2| < 6 implies |L(al,0) -L(a2,0)| < e

foral,a26 Aand 0e 6.

19

20

Lemma 1. Let L satisfy conditions (i), (ii) and (iii). Then L is continuous and
bounded on A x 6. Moreover, L satisﬁes the equi(in a) uniform continuity
condition (1.16).
Proof: Let (a0,00) E A x O and e > 0. By triangulation about L(a0,0), (iii) and
(ii) yield a 6 such that when |a—a0|v| 0— 00] < 6,
(2) |L(a,0) — L(a ,0O)| < 6.
SO L is continuous on compact A x 9 and therefore L is bounded thereon.

By (iii) V e > 0, there exists a 60 > 0 such that
(3) Ia-a’l < 60 implies |L(a,0)—L(a’,0)| < 6/3 for 0G 9.
Since A is compact, there exist a1, . . .,aIn such that UB(ai, 60) J A, where
B(a,7) is the Open ball with center a and radius 7. By the uniform continuity Of
each L(ai,-), there exists a 6i such that
(4) I01—02| < 6i, implies |L(ai,0l)—L(ai,02)| < 6/3, i= 1,- - -,m.

m
Let 6 = It. 6i and a e B(aj,6j). By triangulations about the points (aj,0l) and
0

(aj,02), when I01—02I < 6, |L(a,01) — L(a,02)| < c. This is the equi(in a)

continuity. 0

Lemma 2. Assume that (i) and (ii) hold. Also assume that A is contained in an
Open set C such that

(iv) for each 0 E O, L(~,0) has a convex extension to G.
Then L is equi(in 0) Lipschitz on A.
Proof: Let B be the unit ball in RI and c be a positive number such that A +
cB c G. Deﬁne

(5) 3(a) = 6:3 L(a.0)-

Since the supremum is subadditive and positive homogeneous, S inherits the

21

convexity of L(-, 0) on G and therefore is continuous on G. So L(-,0) is

bounded on compacta in G. Let M: sup S(a).
aeA+cB

For any a1, a2 6 A, let

(6) 3 = 32 + W541).

Then aeA+cB and
'32 - all

(7) a2=(1—A)al+Aa, A=c +1112 -all'

 

Hence, by the convexity of L(- ,0),

(8) L(a2,0) g AL(a,0) + (1—A)L(a1,0).

By the nonnegativity and boundedness of S on A + cB, and the deﬁnition of A,
it follows from (8) that

(9) L(a,0) —L(a,.o) 3 AM 3 ¥la2 -a1|-

Interchanging al and a2 in (9), the desired result is proved with Lipschitz

constant K = M/c. u

A more general version of Lemma 2 can be found in Rockafellar (1972)
Theorem 10.6.

Remark 1. If L(a,0) = q( | a—0l) for some nonnegative nondecreasing convex
function on [0, III) (which the most commonly used loss functions are), interchanging
0 with a in Lemma 2, we have that L is equi(in a) Lipschitz, that is
(10) |L(a,01)—L(a,02)| sKIol-02|, VaeA and 01, ozee.
This strengthens the second result of Lemma 1 from equicontinuous to
equi—Lipschitz. Examples Of q are xp for p 2 1.

Not all convex functions can be extended (ﬁnite) convexly. For example,
f(x) = —l i + 1 is a nonnegative convex function on [0, 1], but can not be

extended convexly beyond 0. n

22

The following example shows that no asymptotically Optimal rule exists if

the equi(in a) continuity condition for L fails.

Example. Let A=e=[o, 1] andlet

(11) L(a, o) = “-0 2 (a, a) t o
a +

= 1 (a, 0) = 0.
L is continuous in each variable and is bounded by 1. But L is not equi(in a)
continuous at 0 = 0, because no matter how small 0 is, L( 0, 0) = 0.

Let n bea positive integer and g= (0, . . . , 0) be a constant sequencein
R11 with 0 t! 0. By (11) a simple compound rule Bayes versus G: is i = _Q. SO
for any compound decision rule 3, the modiﬁed regret is
1 n (ti—0) n n

(12) D11 = Rn(t, g) = E .2 Ilpo(xi)du (E)-

I=l ti+ 1
Let 0 it 0m be a sequence converging to 0. By Fatou’s Lemma

n
sup D 2 liminf Up11 (x.) dpn(x .
0 n I m 1 0m l J
If the right hand side is positive, as is true for the multivariate exponential case of

(1.14) and (1.15), no asymptotically Optimal compound rule exists. I:

2. A k—dimensional Mﬁnta—Saasa Theorem with Applications

In this section, we use the 1—dimensional Mﬁntz—Szftsz theorem to prove a
k—dimensional version Of the Milntz—Szész theorem. As an application of this
theorem, we consider the identiﬁability problem for k—dimensional exponential
families.

Theorem 1. Suppose A(1) is a set in R which contains a sequence of distinct
numbers {y1, y2, - - -} such that

23

(13) yu, uz 1, have the same sign

and

( ) i l I 1

14 y A =In.
u=1 u yu

For 2 g i g k, let A(i) be a set in Ri such that for each y(i—l) E A(i—l), there
exists a sequence of distinct numbers {yi1, yi2’ - - -} with the prOperties (13) and
(14) and {(y(i-1),yiu); u = 1, 2, - - -} c A(i). Then the subspace

13(9) = sp{0 e e am» e”? ; y e A(k)}
is dense in the uniform norm in the space 0(9) of all continuous functions on 9 .
Proof: We will use the following simple fact in our proof: if T g is the map from
C[a, b] to C[c, d] deﬁned by a homeomorphism g from the closed intervals [a, b]

_ —l
to [c,d], Tgf—fg , then T8

0
Let {y1, y2, - - -} be a sequence of distinct positive numbers such that 113

yuA%-— = m. The Miintz—Szasz theorem (Sata I, Szész (1916)) implies that, for any
u

a > 0, every continuous function is in the C[a, 1]—closure of the subspace spanned

is a norm preserving isomorphism.

Y
bythepowers g“, u=1,2, Let [c,d] beanintervalwith c>0. Leta>0
and g be a scale change on [a, 1] such that its image includes [c, d]. Noticing that

T 8 preserves subspaces spanned by the single powers, the Mi'mtz—Szész theorem

holds on the g-image of [a, 1], consequently on [c, d].

Let [ai, bi] be the convex hull of the i—th coordinate projection of 9. Let
”W yi2’ ---} be a sequence with the properties (13) and (14) and let si =
sign(yi1). Because of the similarity of the proofs, we only prove the case si = 1.

By the above consequence of the Miintz—Szasz theorem, each positive integer power
m.
0i ‘, as a continuous function of 5i = exp(0isi), is in the C[exp(siai), exp(sibi)]

—closure of sp {6i , u = 1, 2,... }. With g: 0i 6 [ai, bi] ~~> exp(si0i) E

[exp(siai), exp(sibi)], it follows from the properties Of T g that

24

mi . . 0iyiu
(15) 0i Is In the C[al, bil—closure of sp {e ; u = 1, 2,... }.

We now use induction to prove

(16) rim? 2 sp,{exp<o“)y(‘)); ,(i) e A(i)} for 1 s i s k,

i
here and hereafter, spi denotes the C(x[aj, bj])-closure of sp.
1

By (15), (16) is true for i = 1.

Suppose that 1 < i 5 k and (16) is true for i—l. For each y(i_l) E A(i—l),
by the deﬁnition of A(i), there exists a sequence {yil’ yi2, - - -} such that
(ya—l), yin) e A(i) for u = 1,2, and (15) holds. Therefore, by boundedness
of the additional factor,

. . m. . . -
emu“ 1’s“ ”)0,‘ e setups“ ”y“ 1))exp(0,yiu); u= 1, 2. ---l
which, since (yo-Dam) e A(i), is contained in
(17) sp,texp(o(‘)y(‘)); y“) e tall.
1!].
By the induction assumption and boundedness of 0i 1,

{101:1 e spilexp(0(i'l)y(i'l))01:i; y(i-1)€ Ali-1)} .

hence in the subspace in (17). This proves (16) for i.
By the restriction Of (16) to i = k and the Weierstrass theorem, D(x[aj, bjl)
is dense in C(x[aj, bjl) and, a fortiori, D(O) is dense in C(O). 0

Remark 2. Special examples of A(k) are product sets with A(1) factors and any

set in Rk with non-empty interior. a

Let 9 be a metric space with its Borel a—ﬁeld and let {P 0 ; 0 E 9} be an
appropriate family Of probability measures with densities p 0 with respect tO a
measure It. Let I.) E (I, the set Of all probability measures on O, and let 9w and

25

pm be the tat—mixture of P0 and p0 ,
.9u(-) = I P0(') do: and pw(-) = I p0(-) dw.
II is said to be identiﬁable if the map
«I ~~> .9“I is 1—1.

We here only consider the mixtures of a k-dimensional exponential family: It
is a measure on the Borel a—ﬁeld Of Rh, 9 is a compact set in Rk and P 0 has a
continuous density
(18) p4.) = cox-W)
with respect to It. With the aid Of Theorem 1 we now give a sufﬁcient condition for
identiﬁability of II.

Theorem 2. Let S“ be the support Of It,

s“ = n{F; F closed and p(F°) = 0}.
If x0 6 Rk and S” — 10 is a set A(k) for which the hypothesis Of Theorem 1
holds, then the (I is identiﬁable.

Proof: Suppose that 9w = 9w for probability measures all and (112 on 9. Let
1 2

S be the set where their densities are equal, S = {x; p w (x) = p w (x)}. S is closed
1 2

because p w inherits the continuity Of p 0 by the bounded convergence theorem.
Since also It(S°) = 0, S p c S by the deﬁnition of support.

Since p ”(x) = I exp(0(x—xo)p ”(1:0) dw, S It C S implies that
(19) ”(013060) ch», = I f(wxol «w,
for all f e D(9) a sp{exp(0y); y e S” — x0}. By Theorem 1 D(O) is dense in
C(O): a g in C(O) is the uniform limit of asequence fn e D(O). With |g| + 1
as a dominating function, the dominated convergence theorem yields that (19)
holds for g. Therefore, the measures with (vi-densities p 4x0) are equal and, by
the positivity of p ”(xo), all = w2 . u

CHAPTER 2
COMPOUND ESTIMATION OF TRUNCATION PARAMETERS

1. Introduction

Our component distribution in this chapter is a two dimensional truncation
family and the component problem is to estimate the truncation parameters in a
compact set with squared error loss.

Section 2 introduces a general and a special component problem. We will
restrict to the special problem from Section 3 on. Two lemmas are proved for the
general problem. Lemma 1 proves that a Bayes estimator satisﬁes a linear equation
and Lemma 2 gives a relation between a probability distribution and the marginal
distribution Of the random variable X (inversion formula). This relation is a two
dimensional analog of (2.1) of Fox (1978) which was ﬁrst noted by Robbins.

In Section 3, we construct compound estimators of Q by directly estimating
the solution of the equation of Lemma 1 and construct estimators of On by using
the inversion formula of Lemma 2. In Section 4, we consider estimations of average
densities and their integrals. Upper bounds on various Ll—errors are derived for
non—delete (Lemmas 3 and 4) and delete (Corollaries 1 and 2) kernel estimators.

The main results are in Section 5. Using results from Section 4, Theorem 1
gives the Ll-consistency of the estimators Of On and Theorem 2 the asymptotic
optimality of our compound estimators.

Section 6 gives an example Of the two sided truncation families to illustrate

our results.

2. The Component Problem
In this section, we ﬁrst introduce a general component problem. Lemma 1

gives a linear equation which is satisﬁed by a Bayes estimator and Lemma 2 gives

26

27

an inversion formula for a probability measure. At the end of the section, we
specify the component problem we will consider in this chapter.

For any 0 E R, let qozR R2 am) {0, 1} be the indicator function of the SE
quadrant of 0, q 4x) = [01 5 x1 , x2 5 02] and, let It be a measure on R2 such
that 70(0) 5 l/I q0(x) du E (0, In). For each 0 E R2, let X be a random variable
with p—density
(1) p00!) = «$941)-

We will estimate the truncation parameter 0 under the squared Euclidean distance

loss.
Given a probability measure w on R2, let p also denote its own
extension,
(2) pub!) = I pix) cm.
A component Bayes estimator versus w is
(a) twth- = «31’th tifltx» Jig—‘1’
Here 0/0 is interpreted as 0.

The following lemma shows that t w satisﬁes the linear equation (5). Our
compound estimators in the next section are approximations of the solution of the

equation.

Lemma 1. With p «I deﬁned in (2) and
(4) V£I)(x) =—fx1 pw(s,x2)ds and v£2)(x) = fx: p ”(xl,s)ds,

we have

(5) (t ”(x) — x)pw(x) = v w(x).

Proof: By the deﬁnition (2) of p w and the representation (3) of t w

(6) lhs(5) = I (0 —x)p0(x) dw.

28

By the Fubini theorem and the deﬁnitions of v32) and p w ,
2
(7) vw( )(x) = I I p0(xl,s)[x2$ s]ds dw.
Since the integrand of rhs(7) is actually p Ax)[x2 g g 02], integration of s gives
us the second component of rhs(6). Similarly _ v ”(1) is the ﬁrst component Of

rhs(6). This proves Lemma 1. o
The following lemma gives an inversion formula for w.
Lmnma 2. Let p0 and pw be deﬁned by (1) and (2). Then, for any y 6 R2,

(8) «(t,) = f {pw(y) 4- pu x1.y2)- pw(y1.x2) + PJXDQyOI) dub!)-
Proof: Since P ”(qy) = 1 if qy(0) = 1,

(9) qyto) = qytm q,n,ds
Integrating both sides of (9) with respect to w, by the Fubini theorem, we have
(10) day) = I w{¢(0)qy(0)qp(X)}qy(X)du(1)-

The lemma follows from representing the 0—rectangle indicator qy( 0)q 0(x) in
terms of 0—NW quadrant indicators and using the deﬁnition of p w . I:

From now on we will consider the following special case of the above
component problem:

Let b be a positive number and D be the upper triangle in the square
[0, b]2, D = {x =(x1,x2); 0 5 x1 5 x2 3 b}. Let m be a positive number. Let f
be a measurable function from D to [0, m] such that f is nonincreasing to SE
(nonincreasing with respect to the ﬁrst component and nondecreasing with respect
to the second component) and let e be a function from D to [1 /m, m] such that
e is Lipschitz with Lipschitz constant a. Let A be the Lebesgue measure
restricted to D and let p be a measure with a A-density fe such that ¢I(0) .=.

29

1/Iq0dpe (0, m) for 06 D and 02 > 01 . Let 5 be apositive number and let 6
= {0 = (01,02); 0 g 01 , 02 _<_ b, 76(0) 5 6}. We will only consider Bayes estimators
versus the probability measures u on O. The loss function is still the squared
Euclidean distance and the action space A is taken to be D.

For this component problem, because p ”(x) > 0 a.e. P w , Lemma 1 yields

(11) tw(x)=x+;—:((g a.e. Pw’

where
x b
(12) v£1)(x)=-_I;) lpw(s,x2)ds and v£2)(x) = [X2 pw(x1,s)ds a.e. A,

and ifwe deﬁne Fw and III by

(13) wa') = I P0013.) dw and 6’6') = day).
then Lemma 2 yields

(14) 516') = I {pw(y) - pw(x1.y2) - pw(y1.x2)}qy(1) dub!) + Fm)-

3. Compound Estimators
In this section, we ﬁrst construct non—delete and delete estimators of pG ,
n

from which the corresponding estimators of VG are Obtained by the composition
It

as in (12). With such Obtained estimators Of pG and VG , compound estimators
n n

Of g and estimators of Cu are constructed by the composition as in (11) and (14).

Let Xl ,..., Xn be independent random variables with distributions P 0 ,...,
1
P0 . Let Gn be the empirical measure determined by _0. We use i to denote the
n
simple estimator with components Bayes versus Gn’ ti(1_r) = t6 (xi). From the
n

representations (11) of t w and (14) Of 6) with w = G , we might have good

It
one—stage compound estimators of 0 and Of Cu if we used good estimators of

pGn and an.

30

Let K be a bounded density on R2 vanishing off the SE unit square,

(15) K g M for some positive number M,
(16) IK(x) dx= 1.

and

(17) K(x) = 0 for x t [0,1] x [—1,0].

Let h be a positive number. (Dependence of h on n will be determined later.)
Based on the relation dPu/dA = fepw , we consider the following non-delete
estimator p of pG ,

n

(18) ntx)= It? images,

1
and the Xi—delete estimator pi of pGn ,

(19) p,(x)= _—1—2)h i3 Ken-0%

where
r(x) = A{e(x—hs); s E the SE unit square}/e(x)
is the ratio Of the inﬁmum Of e on the h—square SE to x to the value of e at x.
By the Lipschitz prOperty of e,
(20) r(x) e [1 — I2ah/e(x), 1].
With the Xi—delete estimator ii of an deﬁned by the composition as in

(12) with pG11 replaced by pi , the i—th component of our prOposed compound

estimator t is Obtained by ﬁrst replacing pGn, an Of the th in (11) by pi,v i

and then projecting the resultant estimator to A.
For any y e R2, let F n(y) be the empirical distribution deﬁned by

F,(y)=,1,j2 q (x,.)

The estimator CH of an is deﬁned by substitutions of FGn and pGn in C11

(see (14)) by Ru and p.

31

4.Estimationopr
n

In this section, we will derive bounds on various Ll errors Of estimators Of

pG . Lemma 3 derives a bound on Eolp — pG | for every x and Lemma 4
n - n

bounds integrals of the derived bound. Corollary 1 derives a bound on E 0'13i —

pc I for every x and i and Corollary 2 bounds averages of integrals Of the
n

derived bound.

Lemma 3. With p given by (18), M and 0 the bounds for K and ab respectively
and a the Lipschitz constant for e,

. M = 211
(21) Vuypnﬁgf—ewH
and
(22) lags—pap 3 eff Iq,,,,,-q,IdG,, K(s)ds + 15—39.

Hence from the inequality E|W| 5 |EW| + VUl/2(W) with w = p — pG ,
n
(23) E,Ir3 - as I 3 Wu) + rhs(22).
- 11

Proof: We prove the evaluations Of (21) — (22) at an arbitrary x.
Because the variance of the sum equals the sum of variances for independent

summands and the variance is less than the second moment,

(24) lhs(21)°(x) s 3.31,: f Mom's? : ’,F dy.
By the change of variables s = (x—y)/h, and the boundedness of K, 0 5 K 5 M,
and r, 0 5 r 5 1, the j-—th integral in rhs(24) is less than
(25) h2Mf K(s)(p0j/(fe))(x—hs)r(x) ds.

Since supp(K) C SE unit square, when 3 E supp(K) f(x—hs) 2 f(x) by the
SE nonincreasing of f and r(x)/e(x—hs) 5 1/e(x) by the deﬁnition of r. Hence it
follows from 0 5 pa 5 ﬂ and the property (16) of K that the integral in (25) is

32

bounded by ﬁ/(f(x)e(x)). This, combined with (24) and (25), proves the ﬁrst
conclusion (21).

By the change of variables s = (x — y)/h for p and the property (16) of
K, it follows from triangulation about r(x)pGn(x) that

(26) lhs(22)°(x) s f r(x)|pGn(x-hs) — pGn(x)lK(s) as + (1 - r(x))pGnIx).
Since 0 5 It 5 )0, by the deﬁnition of pc and q, the absolute value in the ﬁrst
It

term of rhs (26) is bounded by

(27) 0! lemon) —q,(x)I «16,,

By the range (20) of r and 0 5 pc 5 6, the second result (22) follows from (26)
n

and (27). 0

Corollary]. With pi given by (19) and M, s and a asin Lemma 3,

(23) E glf’i — pGnI 5 V(n—1) + is, + rhs(22).

Proof: Because i—delete average of n a’s exceeds non—delete by (E—aQ/(n—l) and
because 0 5 I r(x)p oj(x—-hs)K(s)ds(=aj) 5 6, the corollary follows from

triangulations about E 003,) and E ”(13), and (21) and (22) of Lemma 3. 0

Lemma 4 (b) and (c) summarize bounds on integrals of rhs(23) which will be
used to prove Ll-consistency (of the ﬁrst three terms) of Cu . Lemma 4 (a) will be
used only in the proof of Corollary 2. Corollary 2 gives average Ll-bounds on lpi
PGnl/PGn and Ivi— anIIPGn which will be used to prove the asymptotic

Optimality of i.

Lanma4. Let M, 0 and a beasinLemma3andlet m betheupperboundof

fand e.

33

(a)
(29) I rhs(23)o(x) dp(x),
is bounded by
(30) Bl(n) E 1%:2 + 2ﬁbm2h + bzmaﬁh/I 2.
(b) With m also the uppm bound for 1/e, both
(31) f flrhs(23)e(t,r2)dt dp(x)
and
(32) f fb rhs(23)o(x1,t)dt d;t(x)
"2
are bounded by
(33) Beta) Eb{m%+ 2min + minimums/Ia.

(c) With m as in (b),
x b
(34) f I; 1f rhs(23)o(y)dy d].l(x)
x2

is bounded by bB2(n).
Proof: For the p—integral of the ﬁrst term in rhs(22), we apply the Fubini
theorem to interchange the integration order with respect to It and ans. Since
the difference Of q’s indicates an x set with A—area less than (81 — 32) bh for s
E supp(K) and dIt/dA = fe 5 m2, the integral is bounded by
(35) ﬂmzthI (s1 - 82)K(s) dGnds.
Because 0 5 s1 — s2 5 2 for s E supp(K) and K is a density, (35) is further
bounded by
(36) 2sm2bh.

By using the common upper bound m of f and e and the fact that the
A—area Of supp(p) is less than A(D) = b2/2, the p—integral Of V(n) and the
second term in rhs(22) are bounded by the ﬁrst and third terms of (30) respectively.

34

Result (a) follows from this and (36).

Similar to the derivation of (36), the turn integral of the ﬁrst term in
rhs(22) is bounded by b(36). By the SE nonincreasing, f(t,x2) 2 f(x) for t 5 x1
and f(x1,t) 2 f(x) for t 2 x2 . Then, using the the common upper bound m of f,
e and 1/e for V(n) and the second term in rhs(22), we have the desired bound (33)
for (31) and (32).

With some minor changes for the domain of integration, the procedure used

in the proof of (b) will also prove (c). n

Corollary2. Let M, ,6, a and m beasinLemma4.
(a)
11 .
(37) %,ElEflpi-pgnl°(X,)/pGn(Xi)}

l
. 2m2
Is bounded by 2%” + Bl(n-1).
(b) With m as in Lemma 4 (b), both

(as) ﬁ-iglEglirIl)-vé:)|°(X,)/pcn(xi)
and

n .
(39) §iglEple2)-vé:)lo(X,)/pgn(xi)

3m2
are bounded by 2%)“ + B2(n-1).
Proof: By Corollary 1, (37) is bounded by

n
(40) f islpotiéf + V(n—1)+ rhs(22)}/pG dp.
I: l II
By cancelling pG in the numerator and the denominator of (40) on [pG > 0] and
n 11
using Lemma 4 (a), (a) of this corollary is proved.

By the deﬁnition of vi and VG , and Corollary 1 and Lemma 4 (b), (b) of
n

this corollary can be proved similarly. o

35

5. Asymptotic Optimality at;
Based on the bounds derived in Lemma 4 and Corollary 2, we here prove the

—1/4

O(n—1’4) Ll-consistency of Cu in estimating Cu and the O(n ) so of 1

in estimating 0 .

A

First the Ll-consistency of Sn

Theorem 1. Let On be given by (14) and On be deﬁned in the last paragraph of
Section 3. Then under the assumptions of our special component problem described

in the second last paragraph of Section 2,
(41) WI lén(y)-G,,(y)l titty)

is bounded by

3bB,(n) + b2/(4l 5).
hence is O(n-1,4) by taking h = n—l/4 in B2(n).
Proof: By using the Fubini theorem to reverse the order of integration to prxE 0
and then using rhs(23) in Lemma 3 to bound the inner integrals for the ﬁrst three
terms of (41), it follows from the deﬁnition of qy and the integration Of y2 in the
second and y1 in the third terms that
(42) 1118(41) S (34) + b-{(i’tl)+(3‘4-’)} + Epl lf‘nm - F3116“ MI)-
Since X j are independent and qy(Xj) are unbiased estimators of F0j(y) with

variances less than 1/4, by the Fubini theorem and Schwarz inequality, the last
term in rhs(42) is bounded by b2/(4I F). The theorem follows from (42) and
Lemma 4 (b) and (c). n

36

The next lemma is the Singh—Datta inequality (Datta (1991) Lemma 4.1)
which is useful in bounding the difference of two quotients. We state it here for

convenience.

Lemma5. For <a,ﬂ,o,6,b>ER5 and )03t05b,

(43) mug-ﬁlth: Ié—al +(I%I+b)li‘9-I9I.

The following is the asymptotic Optimality result.

Theorem 2. Let ti and i. be as deﬁned in the second and fourth paragraph of

. 1 " 2 ' 2 .
Section3andlet Dn=ﬁ.2 E0{|ti-0i| - [ti-0i] } bethe modiﬁed regret.

Then under the assumptions of Theorem 1, IgDn is bounded by

(44) 9%; + 8szl(n—l) + 8bB2(n—1)

and _t_ is asymptotically Optimal with rate ad“.
Proof: Since 6 C [0,b]2 and ti Ii E [0,b]2, applying the identity a2 - b2 =

(a—b)(a+b) to each component, the i—th integrand of D11 is bounded by
2 , . ,, .
(45) 2b 2: ItIJLtIJh.
i=1 ,
By applying Lemma 5 to the above absolute value with véJ)/pG as old and
n n

then weakening the resultant bounds by lvéj)/pG | 5 b, (45) is bounded by
n n

(46) 2b§ IIGW-véjh/p )+2b(4bIn--pG l/pG)
j=1 ‘ n Gn l n n
as. PG . The bound (44) follows from Corollary 2, and the a.o. result follows from
n

the bound (44) with h = n’1/4. n

37

6.AnExample

We now give an example of our component problem. A similar example was
considered by Wei (1989) for the empirical Bayes problem. (See his Theorem 1 and
display (3).)

Let b be a positive number and let h be a Lipschitz function from [0, b]

2 ”2
to (o, e). For any a = (01, 02) e R with 02 > 91, let c(0) = 1U; h(y) dy.

1
Let s beapositivenumberandlet e={o=(ol,02); 0501,0251), 02—012

0}. For each 06 O, Y is a r.v. on [01, 02] with density
306') = c(0)h(y)[01 s y s 92]-
If k22 and Yl""’Yk aret't'd Y, then
X = (X1, X2) = (min Yi’ max Yi)

k
is a sufﬁcient statistic for {l'Ig0(yi); 0 E 9} and if D is the upper triangle in the
1

square [0, b]2, D = {x; 0 5 x1 5 x2 5 b}, then a density of X given 0 can be

written in the form:

(47) p00!) = f(x)e(x)ll(0)qp(x)D(x).
where
(48) f(x) = h(k-1)c2’k(x). e(x) = name,»

Ito) = c170). sax) = Io, 5x1 ,x,s 021.

By the deﬁnition Of c, f is nonincreasing to SE. As a product Of Lipschitz
functions from [0, b] to (0, an), e is such. The boundedness Of f follows from
that of h. Since h is bounded away from 0 and 0 from the diagonal, c is
bounded and so is III. Hence, all the assumptions in the second last paragraph of
Section 2 are satisﬁed by this example.

CHAPTER 3
THE LINEAR COMPOUND PROBLEM

I. Introduction

Robbins (1983) introduced an empirical Bayes decision problem where the
component problem is squared error loss estimation of means and the class Of
component decision rules is restricted to the class of linear estimates t(x) = A +
Bx. In this chapter, we consider the compound decision problem with this
component. We construct a compound decision rule _t_ and show that its modiﬁed

regret DnL is of order O(n-'I/2

) uniform in parameter sequences under certain
assumptions on the component parameter space 6 and the family of distributions
P 0 , 0 E 9.

In Section 2, we specify our component problem explicitly, and derive a
linear component Bayes rule and a compound rule. In Section 3, we prove
asymptotic Optimality of the compound rule. Section 4 gives an example to show
that there exists such a case that there is no asymptotically Optimal rule when 9 is

unbounded.

2. The Component Problem and a Compound Rule
Let M be a positive number and let 9 be a bounded subset of R,

(1) 9 c [—M/2, M/2].
Let a, b and c with c it 1 be constants such that the function H, deﬁned by
(2) H(0) = a + b0 + c9,

is nonnegative on 9. Let K be aﬁnite number. For each 0E 9, let x be a r.v.
with distribution P0 such that
(3) Eax = 0 and Varox = H(0),

38

39

and
(4) E ,(r — o)4 5 K.
An example of the above distributions is the family of the Poisson distributions with

a parameter set bounded away from a.

Let g = (o , on) e an and on be the empirical distribution determined

by 0. Let 7 and 7 bethemeanandvarianceof Cu,

1 n 1 n 2

(5) 7=E0=i 2 0. and 7=Var0=i 2 (IL-7).

j=1 J j=1 3
Let 02 be the variance Of the mixture GnP 0 and let H be the average of H( 0i),

1 ii H o

(6) - 5H ( jI.
By assumption (3),
(7) 02 = EVar(x| a) + VarE(x| a) = E“ + 7,
where E denotes the joint expectation corresponding to GnoP 0 Hence, by the
identity

n n
(s) I: 02. = 2 (II-7)2 + as?
and the deﬁnition (2) of H, we have
(9) 02 = 11(7) + (c+1)7.

Let r: R2 ~~> R; (x,y) ~~> (y — H(x))/((c+l)y) and a = r(-0,02). By (9) we
have
a = 7/ 02.
It follows from (7) and H2 0 that
(10) 0 5 a 5 1.

By assumption (3), a minimizer Of

(11) ii

40

within the class of linear functionals t(x) = A + Bx is

(12) f(x) = 0 + O(x - 0).
(See Robbins (1983) for a derivation with a general mixing probability measure.)

We will take 11 > 2 in the remainder Of this chapter and let xi be

independent random variables with distributions P0. , i = 1, ..., n. The simple
1

compound procedure i is deﬁned by its components fig) = f(xi), i = 1, ..., n.

2
i
(which will be deﬁned in (15)), we see, from (12) and (10), that i is estimated

With 7 and 02 replaced by their asymptotic unbiased estimators iii and s

by i with components

(13) £i(£) = ii + ﬁio‘i " Ei):
where
— 2
(14) 1% = 0Vr(xi , si)A1,
and
- 1 2 1 - 2
(15) xi = mﬁi xj and si = n—_1-j§i(xj—xi) .

3. Asymptotic Optimality of i
In this section, we will show that the compound rule i has the same

asymptotic behavior as i (Theorem 1), which is unavailable because 01, ..., on

are unknown. Lemma 1 shows that [H(Wi) - H(0)| and la? — 02| are Of order

O(n—1). Lemma 2 shows that H(ﬂ and s2 are L1 consistent estimators Of

n
H(T) and 02 with rate 11—1/2. Lemma 3 gives bounds on % 2 Edii — '0)2 and
i=1 —

n

1 n

the difference of the compound risks for _t_ and _t'_.

Theorem 1. Under the assumptions of (1), (3) and (4), 815]) DnL = O(n—”2).

Before the proof of the theorem, we prove three lemmas.

Let G; be the empirical measure determined by 0; , the vector _0 with 0i
deleted. Let 7i and 7i be the mean and variance of G; ,

(17) Hi: =—_1-2ij0 and 7i: —_1-j2ij—(0 -—i27).
#i"

Let a? bethe variance Of GiP0' By (9),

(18) of = H(Tii) + (c+l)7i .

The ﬁrst lemma gives two identities about the differences between delete and
non—delete 7’s and H’s. From these and the boundedness Of 9, (21) and (22)
follow directly.

Lemma 1. With notations introduced in (5) and (17), and a2 and a? the

variances of ano and GiP0’ we have

 

(19> t-v=—<a%r>2.—1r«t-I>2-n+57”
and
(20)(n°1i-—)2(tl——.7I)2

                         

Hence, under the assumption that 9 is bounded,

I21) IH(7,)-H(7)l = O(n-1).
and by (9) and (18),

 

42

(22) Is? — 02] = O(n-1).

Proof: The following identity will be used repeatedly in our proof:

(23) lEa—lza- l a—E) VaER
ni=ll Hwy-Edi r '

By adding and subtracting 7, and using identity (23) with ai = 0i ,

11
Hence, the result (19) follows from adding and subtracting (n—EI)27, and applying

identity (23) with ai = (IIi — '6)?

Since a? — 72 = 27(7i — 7) + (7, — if, by the deﬁnition of n and
identity (23) with ai = 0i , the second result (20) follows. a

The next lemma gives rates, and its proof gives bounds, of the L1 errors of

82 and H(i) respectively.

n
Lemma 2. Let s2 = il; .2 (xj — 32. Then, under the assumption of Theorem 1,

(25) 1 East” - 02| = curl/2)
and -

I26) EoIHGcl—HUN = O(n—1’2).
Proof: Since -

(27) Ix,- -22 - II,- J)” = [Ixj — 0,) — (i-7)][(xj - 0,) - Iii-7) + 2(0, -7)].
by the expression (7) of a2
2 2_ln _ 2_ _—_ 22n _ _
(23) s a _ njglej oj) H(0j)] (x 7) + njglorj 09(0). 7).
By applications of the Schwarz inequality to the ﬁrst and third terms, and the

independence Of x j , the expectation Of |rhs(28)| is bounded by

 

43

(29) (HE‘S-(xi — If)"2 + 51’ + 2(i22H(0j)(0j — Tiff/2.

From the boundedness of O, (0j — 0) and H, hence H are bounded. Therefore,
by the boundedness of the fourth central moment Of x, (29) is of order O(n—1,2),
which proves the ﬁrst result.

Similar to (27),

(30) mi) 41(7) = (b + 2c7)(i-7) + c(i-7)2.
By applying the Schwarz inequality to the ﬁrst term, the expectation of |rhs(30)|
is bounded by

(31) lb + 2c0'|(%H)1/2 + ﬁn.

The second result follows from the boundedness of H and '0. n

11
Lemma 3 below considers the asymptotic behavior of % 2 E 001i - 7)2 and
i=1 -

Eolﬁi —a|. These two terms will appear in a bound of DnL .

Lemma. 3. Let H be deﬁned as in (6). Then, under the assumption of Theorem 1,

n
(32) grips-7)” = ,éyﬁ + (ll—11921175 F32,
and
(33) 02Eplﬁi-al = O(n-1,2).

Proof: The inequality in (32) follows from the expression (7) of 02.
By adding and subtracting 7i , and using identity (23) with ai = 0i
- 2 1 1 2

The equality (32) follows from taking average of (34) with respect to i.

44

With (c + 1)a2 as 6 and l as b in the Singh—Datta lemma (Lemma 5 of
Chapter 2), after some algebra,

(35) (C+1)02|ﬁi-al561159-110)!+(2|<=+1|+ nsi—ail}.

Triangulating about H(Vi) for the ﬁrst term and a? for the second term,

Lemmas 1 and 2 complete the proof of the second result (33). n

We are now ready to prove the theorem. Although some computation is
needed, the idea of the proof is simple. First, the summands of DnL are
simpliﬁed to (40) by using a2 - b2 = (a—b)(a+b) and the fact that xi is
independent of (ii , ﬁi). Then the bound (41) is derived by the boundedness Of a

and Bi . Lemma 3 and the Schwarz inequality ﬁnally ﬁnish the proof.

Proofof Theorem 1:
From the expressions (12) for {i and (13) for ti , we have

(36) ii - 0i = (1 — a)('6 — 0,) + oni - 0,),

and

(37) ii — 0, = (1 — six? - 0,) + slot, — 0,) + (1 — six;i — 7).
Hence

(38) ii - E, = (37) - (36) = (ﬂ, - a)(xi - 7) + (1 - I995, - 7).
and

(39) (ii — 0,) + (ii — 0,) = (37) + (36)
= "(2 " ﬁi " a)”, - 7) + (ﬂi + a)(xi - 0i) + (1 - 1395i ‘ 7):
Since xi is independent Of ii and ﬁi, and E 0x = 0i, the i—th summand
I

of DnL , namely E drhs(38) rhs(39)], is equal to

45

(40) 49(6), — 702(2 - I9, — axis, - a) + E962 — now, + a)Ixi — 0,)2

+ E30 —Iii)(ili -7)(ﬂi — a- 2 + B, + a)(§i -7) + E41 —I3i)2(ii —T)2.
Because [al 5 1 and Iﬁil 5 1, DnL is bounded by

(41) ﬁlms, II)” + manuals, — al + ”9' 9, —7| Iii —'6| + Eéii :02}.

i=
Distributing the average across the three terms in the summands, (41) is further
bounded by
O(n—1,2) + 2[7 rhs(32)]1/2 + rhs(32),
by (7) and Lemma 3, and the Schwarz inequality applied to the second term. This

ﬁnishes the proof of Theorem 1. o

4. An Example
The example below shows that, when 9 is unbounded and the fourth
central moment is a continuous function of 0, the conclusion Of the theorem may

not be true.

Example: Let 9 be the interval [1, m). For each 0E 9, the distribution of x is:
P0(x=0)=%- and P0(x=20)=%.
Then on = 0 and Varox = 02 2 1. The boundedness conditions of 9 and E0(x
0)4 = 04 are violated.
For any 11, let tni be the compound decision rule for 0i . When
_0 = (0, ..., 0), the simple linear Bayes estimator of Q is 0. SO the modiﬁed regret

DnLQ’ 0) is

1 n
(42) a 2

46

From the distribution of x, (42) equals
1 n
(43) — 2

2
11211 i=12(tni(x1)mlxn) - 0) l
where the inner summation is for all possible 211 points of x. Choose 00 such
that
2
(tn1(0,. . .,0) — so) 2 n2“.
For this 00 , (43) 2 1. Hence DnLQ’ I) does not converge to 0 and consequently

the conclusion Of the theorem is not true. a

BIBLIOGRAPHY

Billingsley, Patrick ( 1968). Convergence of Probability Measures. Wiley.

Brown, L.D. and R. Purves (1973). Measurable selections of extrema. Ann. Statist.
1 902 — 912.

Brown, Lawrence D. (1986). Fundamentals of Statistical Exponential Families with
Applications In Statistical Decision Theory. IMS Lecture Notes —
Monograph Series 9.

Datta, Somnath (1991a). On the consistency of posterior mixtures and its
applications. Ann. Statist. 19 338 - 353.

Datta, Somnath ( 1991b). Asymptotic Optimality of Bayes compound estimators in
compact exponential families. Ann. Statist. 19 354 — 365.

Datta, Somnath (1991). Nonparametric empirical Bayes estimation with O(n-1,2)
rate of a truncation parameter. Statistics 8: Decisions 9 45 - 61

Fabian, VAclav and James Hannan (1985). Introduction to Probability and
Mathematical Statistics. Wiley.

Ferguson, Thomas S. (1967). Mathematical Statistics: A Decision Theoretic
Approach. Academic.

Fox, Richard J. (1978). Solutions to empirical Bayes squared error loss estimation
problems. Ann. Statist. 6 846 — 853.

Gilliland, Dennis C. (1968). Sequential compound estimation. Ann. Math. Statist.
6 1890 - 1904.

Gilliland, Dennis G. and James Hannan (1974). The ﬁnite state compound decision
problem, equivariance and restricted rlsk components. RM — 317 Statistics
and Probability, MSU; (1986) Adaptive Statistical Procedures and Related
Topics. IMS Lecture Notes — Monograph Series 8 129 — 145.

Gilliland, Dennis C., James Hannan and J. S. Huang £1976). Asymptotic solutions
to the two state component decision problem, ayes versus diffuse priors on
prOportions. Ann. Statist. 4 1101 -- 1112.

Hannan, James F. and Herbert Robbins (1955). Asymgtotic solutions of the
compound decision problem for two completely speci ed distributions. Ann.
Math. Statist. 26 37 — 51.

Hannan, J. F. and J. R. Van Ryzin (1965). Rate of conver ence in the compound

decision problem for two completely speciﬁed distri utions. Ann. Math.
Statist. 36 1743 - 1752.

47

48

Mashayekhi, Mostafa (1990). Stability of symmetrized probabilities and compact
equivariant compound decisions. Ph.D. Thesis, Dept. of Statistics and
Probability, Michigan State University.

Mashayekhi, Mostafa (1993). On equivariance and the compound decision problem.
Ann. Statist. 21.

Nogami, Yoshiko (1978). The set—compound one—eta e estimation in the
nonregular family Of distributions over the interv (0, 0). Ann. Inst.
Statist. Math. 30 Part A 35 - 43.

Nogami, Yoshiko 1988). Convergence rates for empirical Bayes estimation in the
uniform U 0, 0) distribution. Ann. Statist. 16 1335 — 1341.

Oaten, Allan 1972). Approximation to Bayes risk in compound decision problems.
Ann. ath. Statist. 43 1164 — 1184.

Robbins, Herbert (1951). Asymptotically subminimax solutions of compound
statistical decision problems. Proc. Second Berkeley Symp. Math. Statist.
Prob. 131 — 148. University of California Press.

Robbins, Herbert (1983). Some thoughts on empirical Bayes estimation. Ann.
Statist. 11 713 — 23.

Rockafellar, R. Tyrrell (1972). Convex Analysis. Princeton University Press.

Szasz, Otto (1915 — 1916). Sur die approximation stetiger funktionen durch lineare
aggregate von potenzen. Math. Ann. 77 482 - 496.

Singh, Radhey Shyam (1974). Estimation of derivatives Of average Of p—densities
and sequence—compound estimation in exponential families. Ph.D. Thesis,
Dept. of Statistics and Probability, Michigan State University.

Van Ryzin, J. R. (1966). The compound decision problem with mxn ﬁnite loss
matrix. Ann. Math. Statist. 37 412 — 424.

Wei, Laisheng ( 1989). Asymptotically Optimal empirical Bayes estimation for
parameters of two—sided truncation distribution families. Ann. of Math.
10(B) 94 — 104, China.

Yu, Kai F. (1986). On the bounded regret Of empirical Bayes estimators.
Commun. Statist. — Theory Meth. 15 2391 — 2403.

Yu, Kai F. (1988). A linear reﬁession with unobserved dependent variables.
Commun. Statist. — Theory eth. 17 3075 — 3087.

nrcHIonN STATE UNIV. LIBRARIES
lll[Willi[Will[llllllllWillllllll[Illlllllllllllllll
31293010554362