This is to certify that the dissertation entitled "Parametric Empirical Bayes Problems with Cost for Component Observations" presented by Inna Jung has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics. Major Professor. Date: November 11, 1988.

PARAMETRIC EMPIRICAL BAYES PROBLEMS WITH COST FOR COMPONENT OBSERVATIONS

By

Inna Jung

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1988

ABSTRACT

PARAMETRIC EMPIRICAL BAYES PROBLEMS WITH COST FOR COMPONENT OBSERVATIONS

By

Inna Jung

We consider the empirical Bayes decision problem where the component problem includes a constant cost per observation and the option to choose in advance the total number of observations. The usual empirical Bayes decision problem involves identical components with a given fixed sample size for all repetitions of the component. The empirical Bayes decision approach with our component permits data accumulated over past component problems to be used in selecting both the sample size and the decision rule to be used in the current component problem. The generality introduced by allowing sample sizes that are determined stochastically makes the result more useful in applications where, typically, the choice of sample size is an option based on past data.

The empirical Bayes version involves "independent" repetitions (a sequence) of the component decision problem. With the varying sample size possible, these are not identical components. However, we impose the usual assumption that the parameter sequence θ = (θ_1, θ_2,...) consists of independent G-distributed parameters where G is unknown. We assume that G ∈ 𝒢, a known family of distributions. The sample size N_i and the decision rule d_i for component i of the sequence are determined in an evolutionary way. The sample size N_1 and the decision rule d_1 ∈ D_{N_1} used in the first component are fixed and chosen in advance. The sample size N_2 and the decision rule d_2 are functions of X^1 = (X_{11},...,X_{1N_1}), the observations in the first component. The sample size N_3 and the decision rule d_3 are functions of (X^1, X^2). In general, N_i is an integer-valued function of (X^1, X^2,...,X^{i-1}) and, given N_i, d_i is a D_{N_i}-valued function of (X^1, X^2,...,X^{i-1}). (The action chosen in the i-th component is d_i(X^i), which hides the display of the dependence on (X^1, X^2,...,X^{i-1}).)

For a variety of models, we will construct empirical Bayes rules that are asymptotically optimal. We consider both parametric models involving squared error loss estimation and linear loss testing and show how more general cost functions are covered by the work. We will simulate one model to assess the small-to-moderate i risk plus cost behavior of one of the suggested asymptotically optimal empirical Bayes procedures.

To my wife Chairan and sons Sehyun, Sunggon

ACKNOWLEDGEMENTS

I wish to express my deepest appreciation to Professor Dennis C. Gilliland for his guidance throughout the preparation of this dissertation and his concern for my work.
I would like to thank Professor R. V. Erickson, Professor R. V. Ramamoorthi and Professor H. Salehi for serving on my committee. I am also thankful to Ms. JoAnn Peterson and Ms. Cathy Sparks who have helped me in many ways. My special thanks go to Ms. Loretta Ferguson for her patience and great care in typing my thesis.

TABLE OF CONTENTS

CHAPTER
1. INTRODUCTION ... 1
   1.1. A Statistical Decision Problem With Cost for Observations ... 1
   1.2. An Empirical Bayes Decision Problem with Random Sample Size Components ... 3
   1.3. Literature Review ... 8
2. ESTIMATION OF THE BINOMIAL PARAMETER ... 11
   2.1. The Component Problem ... 11
   2.2. An Empirical Bayes Decision Procedure ... 16
   2.3. Some Empirical Bayes Risk Calculations ... 19
3. TESTING THE BINOMIAL PARAMETER ... 22
   3.1. The Component Problem ... 22
   3.2. An Empirical Bayes Decision Procedure ... 26
4. ESTIMATION OF THE NORMAL MEAN ... 28
   4.1. The Component Problem ... 28
   4.2. An Empirical Bayes Decision Procedure ... 31
5. TESTING THE NORMAL MEAN ... 35
   5.1. The Component Problem ... 35
   5.2. An Empirical Bayes Decision Procedure ... 38
REFERENCES ... 41

LIST OF TABLES

1.1. Empirical Bayes Procedure with Stochastically Determined Sample Sizes ... 5
2.1. n*(α, β) and r(α, β) ... 20
2.2. Estimated Empirical Bayes Risks (m = 2, c = .001) ... 21

LIST OF FIGURES

2.1. A Risk Envelope ... 19

CHAPTER 1

INTRODUCTION

§ 1.1. A Statistical Decision Problem With Cost for Observations

Consider a statistical decision problem with parameter space Θ, action space 𝒜, nonnegative loss function L(·,·) on Θ × 𝒜, unknown prior distribution G on Θ and a cost c > 0 per observation. Let X_1, X_2,... be observations which are independently and identically distributed with a distribution P_θ given θ, taking values in a set 𝒳, the observation space. Let D_n be the set of all decision functions d: 𝒳^n → 𝒜, where 𝒳^n is the observation space for the vector X = (X_1,...,X_n). When θ is the parameter and a decision rule d ∈ D_n is used, the decision loss plus cost for observing X = (X_1,...,X_n) is L(θ, d(X)) + cn, where we assume that L is integrable for all θ, n and d ∈ D_n. Let R_n denote the risk and Bayes risk of the decision rule d ∈ D_n, i.e.,

(1.1)  R_n(θ, d) = ∫ L(θ, d(x)) dP_θ^n(x),

(1.2)  R_n(G, d) = ∫ R_n(θ, d) dG(θ),

and let r_n denote the risk and Bayes risk of the decision rule d ∈ D_n including cost for observations. Then

(1.3)  r_n(θ, d) = R_n(θ, d) + cn,

(1.4)  r_n(G, d) = R_n(G, d) + cn.

We define minimum Bayes risk and minimum Bayes risk plus cost in the usual way. We assume for each prior G and each n = 1, 2,... that a Bayes rule d_G^n ∈ D_n exists. Thus,

(1.5)  min_{d ∈ D_n} R_n(G, d) = R_n(G, d_G^n).

Let R_n(G) = R_n(G, d_G^n) and

(1.6)  r_n(G) = R_n(G) + cn.

Since R_n(G) is nonincreasing in n, a minimizer of r_n(G) exists among the integers 1, 2,.... We will denote a specified minimizer as n* = n*(G) and refer to it as an optimal fixed sample size.
Therefore, r(G) = r_{n*}(G) is the minimum Bayes risk in the component across all possible sample sizes and the corresponding classes of decision rules, i.e.,

(1.7)  r(G) = r_{n*}(G) = min { min {r_n(G, d) | d ∈ D_n} | n = 1, 2,... }.

Moreover, note that R_{n*}(G) + cn* ≤ R_1(G) + c < ∞, so that n* ≤ (R_1(G) + c)/c.

Example 1.1. Let X_1, X_2,... be i.i.d. N(θ, A) given θ, where A > 0 is known, let the loss be squared error, L(θ, a) = (θ - a)^2, and let the prior be G = N(μ, V). Let c > 0 denote the constant cost per observation. Then a Bayes decision function for estimating θ based on the observation X = (X_1,...,X_n) is

(1.8)  d_G^n(X) = [A/(A + nV)] μ + [1 - A/(A + nV)] X̄_n

and

(1.9)  r_n(G) = AV/(A + nV) + cn.

The function AV/(A + nV) + cn is a convex function of n ∈ (-A/V, ∞) with a minimum at η = (A/c)^{1/2} - A/V. Therefore, we can define an optimal fixed sample size n* as the smallest positive integer minimizer of (1.9), which is related to η by

(1.10)  n* = n*(A, V) =  1,                 if η < 1
                         η,                 if η ∈ {1, 2, 3,...}
                         [η] or [η] + 1 (whichever minimizes r_n(G)),  otherwise,

where [·] denotes the greatest integer function.
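For concreteness, the following short Python sketch (added here for illustration only; it is not part of the original computations, and the function names are illustrative) evaluates the optimal fixed sample size (1.10) and the envelope risk (1.7) for the component of Example 1.1.

```python
import math

def n_star_normal(A, V, c):
    """Optimal fixed sample size n*(A, V) of (1.10) for Example 1.1:
    squared error loss, X_i ~ N(theta, A) given theta, prior G = N(mu, V),
    cost c per observation.  Returns (n*, r(G))."""
    def r(n):
        # r_n(G) = AV/(A + nV) + cn, see (1.9)
        return A * V / (A + n * V) + c * n
    eta = math.sqrt(A / c) - A / V        # unconstrained minimizer, cf. the display before (1.10)
    if eta < 1:
        return 1, r(1)
    lo = math.floor(eta)
    n = lo if r(lo) <= r(lo + 1) else lo + 1   # [eta] or [eta]+1, whichever minimizes r_n(G)
    return n, r(n)

# Example: A = 1, V = 1, c = 0.01 gives eta = sqrt(100) - 1 = 9, so n* = 9.
print(n_star_normal(1.0, 1.0, 0.01))
```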
§ 1.2. An Empirical Bayes Decision Problem with Random Sample Size Components

When a statistical decision problem occurs repeatedly and independently with the same unknown prior G, one can apply an empirical Bayes approach where G is estimated using data collected from previous repetitions and a Bayes rule with respect to the estimated G is used in the current component problem. The empirical Bayes decision approach with our component permits data accumulated over past component problems to be used in selecting both the sample size and the decision rule to be used in the current component problem. The generality introduced by allowing sample sizes that are determined stochastically makes the result more useful in applications where, typically, the choice of a sample size is an option and based on past data.

We impose the usual assumption that the parameter sequence (θ_1, θ_2,...) consists of independent G-distributed parameters, where G is an unknown element of the known class of distributions 𝒢. The sample size N_i and the decision rule d_i for the components are determined in an evolutionary way. The sample size N_1 and the decision rule d_1 ∈ D_{N_1} used in the first component are given nonrandom choices. The sample size N_2 and the decision rule d_2 are functions of X^1 = (X_{11},...,X_{1N_1}), the observations in the first component. The sample size N_3 and the decision rule d_3 are functions of (X^1, X^2). In general, N_i is an integer-valued function of (X^1, X^2,...,X^{i-1}) and, given N_i, d_i is a D_{N_i}-valued function of (X^1, X^2,...,X^{i-1}).

Let N = (N_1, N_2,...) and d = (d_1, d_2,...). We will be concerned with the risk behavior of empirical Bayes procedures (N, d). (Here and henceforth, the term risk will refer to the expected loss plus cost for observations.) The risk for the decision about θ_i is

(1.11)  E r_{N_i}(G, d_i) = E R_{N_i}(G, d_i) + c E N_i,

where E denotes the expectation over the earlier observations X^1, X^2,...,X^{i-1}.

Definition 1.1. If the empirical Bayes procedure (N, d) possesses the property

(1.12)  lim_{i→∞} E r_{N_i}(G, d_i) = r(G)  for all G ∈ 𝒢,

we say it is asymptotically optimal (a.o.). This means that in the limit the empirical Bayes procedure has the best possible risk behavior, i.e., achieves minimum Bayes risk.

For a variety of models, we will construct empirical Bayes rules that are asymptotically optimal. All of our results concern parametric families of priors, 𝒢 = {G_ω : ω ∈ Ω}, where Ω is a specified subset of a finite-dimensional Euclidean space R^p. Families of conjugate priors will be used as the parametric families of priors. We will identify G_ω by ω and replace G accordingly in formulas for risk, etc. Also, we will use the empirical Bayes approach wherein the prior ω is estimated, say by ω̂, and n̂ = n*(ω̂) and d_ω̂ are used in defining the empirical Bayes procedure. (Note that we have dropped the superscript on d_ω̂^n̂.) The following table shows how the empirical Bayes procedure evolves using estimates ω̂_0 arbitrary, ω̂_1 = ω̂_1(X^1), ω̂_2 = ω̂_2(X^1, X^2), ω̂_3 = ω̂_3(X^1, X^2, X^3),.... The θ_1, θ_2,... are i.i.d. G_ω.

Table 1.1. Empirical Bayes Procedure with Stochastically Determined Sample Sizes

Stage  Parameter  Sample Size     Decision Rule   Observation  Estimated Prior     Risk
1      θ_1        N_1 = n*(ω̂_0)   d_1 = d_{ω̂_0}   X^1          ω̂_1(X^1)            E{L(θ_1, d_1(X^1)) + cN_1}
2      θ_2        N_2 = n*(ω̂_1)   d_2 = d_{ω̂_1}   X^2          ω̂_2(X^1, X^2)       E{L(θ_2, d_2(X^2)) + cN_2} = E r_{N_2}(ω, d_2)
3      θ_3        N_3 = n*(ω̂_2)   d_3 = d_{ω̂_2}   X^3          ω̂_3(X^1, X^2, X^3)  E{L(θ_3, d_3(X^3)) + cN_3} = E r_{N_3}(ω, d_3)

The convergence of the sequence of risks in the last column to the smallest possible risk r(ω) = r_{n*(ω)}(ω) is the asymptotic optimality property. The following remark shows how asymptotic optimality implies the convergence of the sample sizes N_i to the set of optimal fixed sample sizes.

Remark 1.1. Let s(ω) denote the set of integer minimizers of r_n(ω).
(a) If (N, d) is asymptotically optimal at ω, then

(1.13)  P(N_i ∈ s(ω)) → 1 as i → ∞.

(b) If r_{N_i}(ω, d_i) → r(ω) a.s., then

(1.14)  P(N_i ∈ s(ω) eventually) = 1.

Proof. For given ω, there exists an ε > 0 such that for all n' ∉ s(ω), r_{n'}(ω, d) - r(ω) ≥ ε for all d ∈ D_{n'}. On the event N_i ∉ s(ω), r_{N_i}(ω, d_i) - r(ω) ≥ ε, so that

E[r_{N_i}(ω, d_i)] - r(ω) ≥ ε P(N_i ∉ s(ω)),

which yields (1.13) by letting i → ∞. Since {N_i ∉ s(ω) i.o.} implies r_{N_i}(ω, d_i) - r(ω) ≥ ε i.o., (1.14) is proved. □

The following lemma will be used in subsequent chapters in establishing the asymptotic optimality property.

Lemma 1.1. For priors ω and ν, let n = n*(ω), m = n*(ν) be optimal fixed sample sizes and let d_ω^k, d_ν^k ∈ D_k denote Bayes decision rules with respect to ω, ν for k = 1, 2,.... Then

(1.15)  0 ≤ r_m(ω, d_ν^m) - r(ω) ≤ sup_k |R_k(ω, d_ν^k) - R_k(ν, d_ν^k)| + sup_k |R_k(ω, d_ω^k) - R_k(ν, d_ω^k)|.

Proof. The left inequality follows from the fact that r(ω) is the minimum Bayes risk over choices d ∈ D_k and sample sizes k. Adding and subtracting r_m(ν, d_ν^m) and noting that r_m(ν, d_ν^m) ≤ r_n(ν, d_ω^n) yields

(1.16)  r_m(ω, d_ν^m) - r_n(ω, d_ω^n) ≤ r_m(ω, d_ν^m) - r_m(ν, d_ν^m) + r_n(ν, d_ω^n) - r_n(ω, d_ω^n),

which together with (1.4) implies the right inequality of (1.15). □

In Chapters 2 and 3 we develop a.o. empirical Bayes procedures for squared error loss estimation and linear loss testing with a binomial component. Here the family of priors is the beta family. In Chapter 2 we give the results of computer simulations that provide estimates of risk behavior for small to moderate i. In Chapters 4 and 5 we treat the two loss functions in a normal component with normal priors.

The quadratic loss function L(θ, a) = b(θ - a)^2, where b > 0, is covered by our results by factoring b out and replacing c by c/b. Similarly, the linear loss function for testing with slopes -b and b for its arms is covered by our work. Our methods cover more general cost functions as well. If the cost function is c(n) and lim inf c(n) > R_1(G), then for any given G, inf {r_n(G) | n = 1, 2,...} is attained, and we can define n*(G) as the smallest minimizer. Moreover, the proof of Lemma 1.1 applies to give the same conclusion, that is, a bound for excess risk in terms of the supremum of differences in decision risks over varying sample size problems.
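The evolution displayed in Table 1.1 can be summarized as a loop: estimate the prior from the components observed so far, choose the next sample size by n*, and apply the corresponding Bayes rule. The following schematic Python sketch (ours; every function passed in is a hypothetical placeholder whose details depend on the component model) is included only to make that loop explicit.

```python
def run_empirical_bayes(omega0, n_star, bayes_rule, update_estimate,
                        draw_theta, draw_component, loss, cost, stages):
    """Schematic driver for the procedure of Table 1.1 (a sketch, not the
    author's code).  All callables are user-supplied placeholders:
      n_star(omega_hat)            -> sample size for the next component
      bayes_rule(omega_hat, x)     -> action for the current component
      update_estimate(components)  -> omega_hat based on X^1,...,X^i
      draw_theta()                 -> theta_i ~ G (unknown in practice)
      draw_component(theta, n)     -> observations X_{i1},...,X_{in}
    Returns the realized loss plus cost for each component."""
    components, omega_hat, totals = [], omega0, []
    for i in range(stages):
        N = n_star(omega_hat)                      # N_{i+1} = n*(omega_hat_i)
        theta = draw_theta()
        x = draw_component(theta, N)
        action = bayes_rule(omega_hat, x)          # d_{omega_hat_i}(X^{i+1})
        totals.append(loss(theta, action) + cost * N)
        components.append(x)
        omega_hat = update_estimate(components)    # omega_hat_{i+1}(X^1,...,X^{i+1})
    return totals
```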
§ 1.3. Literature Review

In the usual empirical Bayes decision problem we are given a stochastic process (θ_1, X_1), (θ_2, X_2),... of independent and identically distributed random vectors with the interpretation that, at the i-th component problem, the observation X_i has distribution P_θ given the parameter θ_i = θ, and θ_1, θ_2,... are i.i.d. with a fixed but unknown prior distribution G in a family of distributions 𝒢. The datum X_i may be a vector of summary statistics for the observations taken at the i-th component, e.g., the sample mean or other sufficient statistic based on a sample of specified size taken at that stage. The family of priors 𝒢 can be an unspecified subfamily of all priors on Θ or a certain parametric family, like conjugate priors. Morris (1983) uses the terminology nonparametric empirical Bayes (NPEB) for the former case and parametric empirical Bayes (PEB) for the latter case. Morris (1983) indicates that PEB is needed to deal with those cases in which the number of component problems is too small for the NPEB theory to approximate well.

Robbins (1951, 1956, 1964) introduced the empirical Bayes problem. Most of his work, and that which followed Robbins, is NPEB. It has mainly concerned constructing procedures in a variety of situations that are asymptotically optimal, i.e., such that lim_i R_k(G, d_i) = R_k(G) for all G ∈ 𝒢. Here k indicates the common sample size taken at each component and on which both the Bayes and empirical Bayes procedures d_G and d_i are based.

Two different approaches have been used in constructing empirical Bayes procedures. The first one is to estimate G from data accumulated from previous component problems and then to construct a Bayes procedure with respect to the estimated G. The second approach is to estimate the Bayes procedure d_G with respect to G directly using data from previous component problems, without estimating G itself. The first approach gives smoother procedures since the decision rules will be conditionally component Bayes.

O'Bryan (1972, 1976) introduced the nonparametric empirical Bayes decision problem with non-i.i.d. components by allowing unequal nonrandom sample sizes in the component problems. He followed the second approach in the situation that P_θ is in the discrete exponential family. O'Bryan (1976) defined asymptotic optimality for his case, which is necessarily more general than that of Robbins (1951), and showed the asymptotic optimality of his procedure. O'Bryan and Susarla (1975) studied the empirical Bayes decision problem with nonidentical components in which P_θ is normal with mean θ and known variance which is changing from component to component.

Laippala (1985), whose work is motivated by O'Bryan (1976), introduced an empirical Bayes problem with nonidentical components with cost for observations and random "floating" sample sizes for the components. Laippala (1985) defines the "optimal" sample size as

i_G = [inf {n | r_n(G) ≤ r_{n+1}(G)}] ∧ ī,

where ī is a given fixed integer. This is not optimal among the set of all fixed sample sizes since for all G ∈ 𝒢, r_{i_G}(G) ≥ r_{n*}(G), and for some G ∈ 𝒢 it is possible to have r_{i_G}(G) > r_{n*}(G). Laippala (1979) defines a floating optimal sample size î_{n+1} for use at the (n+1)-th component problem which is a function of the observations from the previous n components as well as the current observations.
It is pointed out in Gilliland and Karunamuni (1988) that this rule is not necessarily optimal when ī ≥ 3 and that the first line of the proof of Theorem 1 in Laippala (1985), claiming that the estimated sample size converges in probability to i_G, neglects the boundary set on which the convergence may fail. Laippala's results as claimed are nonparametric in the sense of Morris (1983).

The component problems that we will consider involve squared error loss estimation and linear loss testing. Many authors have considered the empirical Bayes problem with independent and identical repetitions of these components following Robbins (1956, 1964). Morris (1983) and Susarla (1982) give general discussions. Singh (1979) provides results on squared error loss estimation problems. Van Ryzin and Susarla (1977) and Gilliland and Hannan (1977) develop the theory for monotone multiple decision problems extending the results for linear loss testing of Johns and Van Ryzin (1971, 1972).

All empirical Bayes work cited above involves identical components with the exception of the varying sample size work of O'Bryan and Laippala. The variant of O'Bryan and Susarla (1975) has a linear loss component with a translated exponential distribution, with the scale parameter known and changing from component to component. Karunamuni (1985, 1988) and Gilliland and Karunamuni (1988) consider the possibility of varying stochastic sample sizes. Gilliland and Karunamuni (1988) develop the theory for finite state problems. Karunamuni (1985, 1988) studies an empirical Bayes problem with a sequential component with linear loss and multiple decision loss structures. He does not treat the optimal fixed sample size problem. Rather, assuming a consistent estimator for G, he shows that the risk of an empirical Bayes one-step sequential decision procedure converges to the Bayes risk attained by the one-step look ahead sequential decision procedure. This is not the asymptotic optimality defined by Robbins (1956).

CHAPTER 2

ESTIMATION OF THE BINOMIAL PARAMETER

§ 2.1. The Component Problem

Suppose that the rate θ at which defectives are produced by a given production process varies from day to day. On each day a random sample of at least two parts is taken at a cost of $.50 per part and an estimate θ̂ is made with loss $1000(θ̂ - θ)^2. If the sequence θ_1, θ_2,... is modeled as a stochastic sequence of independent and identically G-distributed variables with G unknown, then the empirical Bayes method is appropriate. For the case where G is restricted to the Beta(α, β) family and the sampling is two-at-a-time, we show how to construct a decision procedure with risk plus cost for observations converging to the lowest possible risk, whatever be α and β. In Section 2.3 we find that in this case the envelope risk plus cost is no greater than $18.00 per day, the minimax risk plus cost. Against the least favorable α = β = 2, the empirical Bayes risk is estimated to be below $20.00 after 15 days. The empirical Bayes sample size converges to the optimal 8 × 2 = 16 parts here. Other α, β values are tested in the computational work of Section 2.3. In this section and the next we develop the empirical Bayes procedure and prove its asymptotic optimality.

Let X_1, X_2,... be i.i.d. B(m, θ), where m is a given positive integer and the parameter θ has prior distribution G in the beta family 𝒢 = {Beta(α, β) : α > 0, β > 0}. Estimation of θ is considered for squared error loss. Here Θ = 𝒜 = [0, 1]. Let c > 0 be a constant cost per observation.
Let d ∈ D_n be a decision rule based on the observation X_n = (X_1,...,X_n). The decision loss plus cost for observation is given by [θ - d(X_n)]^2 + cn. The marginal distribution of X_i is Beta-Binomial. We let ξ and η denote the first two moments of G = Beta(α, β), that is,

(2.1)  ξ = E_G θ = α/(α + β),    η = E_G θ^2 = α(α + 1)/[(α + β)(α + β + 1)],

and note that 0 < ξ^2 < η < ξ < 1 for α > 0, β > 0. Also,

(2.2)  E(X_i) = mξ,    E(X_i^2) = m(ξ - η) + m^2 η,

and from (2.1) it follows that

(2.3)  α = ξ(ξ - η)/(η - ξ^2),    β = (1 - ξ)(ξ - η)/(η - ξ^2).

In the empirical Bayes application, (2.2) and (2.3) will be useful in the construction of consistent estimators for α and β. We will use the method of moments to obtain estimates of ξ and η and will use (2.3) to obtain estimates for the parameters α and β.

A Bayes rule exists and is given by the posterior mean of θ, given X_n. The posterior distribution of θ, given X_n, is Beta(α + nX̄_n, β + mn - nX̄_n), where X̄_n denotes the average of X_1,...,X_n. Hence, a Bayes rule d_G ∈ D_n is

(2.4)  d_G(X_n) = (α + nX̄_n)/(α + β + mn)

if G = Beta(α, β).

Remark 2.1. For G = Beta(α, β) and G' = Beta(α', β'),

(2.5)  R_n(G, d_{G'}) = [1/(α' + β' + mn)^2] {[(α' + β')^2 - mn]η - [2α'(α' + β') - mn]ξ + (α')^2},

(2.6)  |R_n(G, d_{G'}) - R_n(G', d_{G'})| ≤ 2|ξ - ξ'| + |η - η'|,

and

(2.7)  R_n(G) = αβ/[(α + β)(α + β + 1)(α + β + mn)].

Proof. Using (2.4) for G', we see that

R_n(G, d_{G'}) = E_G E_θ [θ - (α' + nX̄_n)/(α' + β' + mn)]^2
             = E_G E_θ [ ((α' + β')θ - α')/(α' + β' + mn) - (n/(α' + β' + mn))(X̄_n - mθ) ]^2
             = [1/(α' + β' + mn)^2] { E_G[(α' + β')θ - α']^2 + n^2 E_G E_θ (X̄_n - mθ)^2 }.

Using (2.1),

E_G[(α' + β')θ - α']^2 = (α' + β')^2 η - 2α'(α' + β')ξ + (α')^2

and

n^2 E_G E_θ (X̄_n - mθ)^2 = n^2 E_G[mθ(1 - θ)/n] = mn(ξ - η).

Hence

R_n(G, d_{G'}) = [1/(α' + β' + mn)^2] {(α' + β')^2 η - 2α'(α' + β')ξ + (α')^2 + mn(ξ - η)}
             = [1/(α' + β' + mn)^2] {[(α' + β')^2 - mn]η - [2α'(α' + β') - mn]ξ + (α')^2},

which proves (2.5). Letting G' = G in (2.5) and using (2.1) leads to (2.7). Finally, from (2.5) we obtain (2.6) since

|R_n(G, d_{G'}) - R_n(G', d_{G'})| = [1/(α' + β' + mn)^2] |[(α' + β')^2 - mn](η - η') - [2α'(α' + β') - mn](ξ - ξ')|

and

|(α' + β')^2 - mn| / (α' + β' + mn)^2 ≤ [(α' + β')^2 + mn] / (α' + β' + mn)^2 ≤ 1,
|2α'(α' + β') - mn| / (α' + β' + mn)^2 ≤ [2α'(α' + β') + mn] / (α' + β' + mn)^2 ≤ 2.  □

From (2.7), the minimum Bayes risk including cost for observations is

(2.8)  r_n(G) = αβ/[(α + β)(α + β + 1)(α + β + mn)] + cn.

We seek the optimal sample size n*. r_n(G) is a continuous and convex function of real n > -(α + β)/m. Consider the equation

0 = (d/dn) r_n(G) = -[mαβ/((α + β)(α + β + 1))] (α + β + mn)^{-2} + c.

Its larger solution is

(2.9)  ν = { [mαβ/(c(α + β)(α + β + 1))]^{1/2} - (α + β) } / m,

and an optimal fixed sample size n* = n*(α, β) is given by

(2.10)  n* =  1,                 if ν < 1
              ν,                 if ν ∈ {1, 2, 3,...}
              [ν] or [ν] + 1, depending on which integer minimizes r_n(G),  otherwise.

Here [·] denotes the greatest integer function and we take n* = [ν] if both [ν] and [ν] + 1 minimize r_n(G). By the comment preceding Example 1.1 and the fact that R_1(G) ≤ .25 for all G, it follows that n* ≤ (.25 + c)/c for all G.

If α and β were known constants, we could use d_G ∈ D_{n*} to achieve the minimum Bayes risk, i.e., r(G) = min {r_n(G) | n = 1, 2,...}. In the next section we show how (α, β) is estimated in the empirical Bayes problem with this component and establish the asymptotic optimality of the resulting procedure.
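The optimal fixed sample size (2.10) and the envelope risk are easily evaluated. The following Python sketch (added for illustration; it is not the program that produced the tables of Section 2.3, and the names are illustrative) does so directly from (2.8)-(2.10).

```python
import math

def envelope_binomial_estimation(alpha, beta, m, c):
    """Optimal fixed sample size n*(alpha, beta) of (2.10) and the envelope
    risk r(alpha, beta) for the squared error loss component of Section 2.1.
    Returns (n*, r)."""
    def r(n):
        # r_n(G) = alpha*beta / [(alpha+beta)(alpha+beta+1)(alpha+beta+mn)] + cn, see (2.8)
        s = alpha + beta
        return alpha * beta / (s * (s + 1) * (s + m * n)) + c * n
    s = alpha + beta
    nu = (math.sqrt(m * alpha * beta / (c * s * (s + 1))) - s) / m     # (2.9)
    if nu < 1:
        return 1, r(1)
    lo = math.floor(nu)
    n = lo if r(lo) <= r(lo + 1) else lo + 1                           # (2.10)
    return n, r(n)

# alpha = beta = 2, m = 2, c = 0.001 gives n* = 8 and 1000*r = 18.00,
# the least favorable case quoted in Section 2.1 and tabulated in Table 2.1.
n, risk = envelope_binomial_estimation(2.0, 2.0, 2, 0.001)
print(n, 1000 * risk)
```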
§ 2.2. An Empirical Bayes Decision Procedure

Consider the binomial component problem of the last section. Let α̂_0, β̂_0 be initial nonrandom estimates of α, β and let N_1 = n*(α̂_0, β̂_0) be the sample size chosen for the first component. (See (2.10) for the definition of the optimal fixed sample size function n*.) Recall that X^1 = (X_{11}, X_{12},...,X_{1N_1}) denotes the vector of observations from the first component. We will define a sequence of estimates α̂_i, β̂_i based on (X^1, X^2,...,X^i). Then for component i + 1, the empirical Bayes sample size is N_{i+1} = n*(α̂_i, β̂_i) and the empirical Bayes estimator of θ_{i+1} is

(2.11)  d_{i+1}(X^{i+1}) = (α̂_i + N_{i+1} Y_{i+1}) / (α̂_i + β̂_i + m N_{i+1}),   i = 0, 1,...

(see (2.4)), where

(2.12)  Y_i = (1/N_i) Σ_{j=1}^{N_i} X_{ij},   i = 1, 2,....

We will give estimates based on the method of moments and will find it useful to consider

(2.13)  Z_i = (1/N_i) Σ_{j=1}^{N_i} X_{ij}^2,   i = 1, 2,...,

and to denote the averages of Y_j, Z_j, j = 1, 2,...,i, by Ȳ_i, Z̄_i, i = 1, 2,....

Let ℱ_0 be the trivial σ-field and let ℱ_j = σ(X^1, X^2,...,X^j), j = 1, 2,.... The sample size N_j is ℱ_{j-1} measurable, j = 1, 2,..., and we see that

(2.14)  E(Y_j | ℱ_{j-1}) = mξ,    E(Z_j | ℱ_{j-1}) = m(ξ - η) + m^2 η,   j = 1, 2,...,

follow from (2.2). Since Y_j ≤ m and Z_j ≤ m^2, j = 1, 2,..., the strong law for centerings at conditional expectations (see Hall and Heyde (1980, Theorem 2.19)) implies

(2.15)  (1/i) Σ_{j=1}^{i} [Y_j - E(Y_j | ℱ_{j-1})] → 0 a.s.,    (1/i) Σ_{j=1}^{i} [Z_j - E(Z_j | ℱ_{j-1})] → 0 a.s.

From (2.14) and (2.15) we have

(2.16)  Ȳ_i → mξ a.s.,    Z̄_i → m(ξ - η) + m^2 η a.s.

Lemma 2.1. Let m ≥ 2. The estimators defined for i = 1, 2,... by

(2.17)  ξ̂_i = Ȳ_i / m,    η̂_i = (Z̄_i - Ȳ_i)/(m^2 - m)

and

(2.18)  α̂_i = [ξ̂_i (ξ̂_i - η̂_i)/(η̂_i - ξ̂_i^2)]^+,    β̂_i = [(1 - ξ̂_i)(ξ̂_i - η̂_i)/(η̂_i - ξ̂_i^2)]^+

are a.s. consistent. (In (2.18) take ratios 0/0 to be 0.)

Proof. The a.s. convergence of the estimates (2.17) follows from (2.16). The a.s. convergence of the estimates (2.18) then follows from (2.3). □

Refer to Table 1.1. Let ω = (α, β), let ω̂_0 be arbitrary and let ω̂_i = (α̂_i, β̂_i) be defined by (2.18). Let the sample size sequence N be defined by N_{i+1} = n*(α̂_i, β̂_i), i = 0, 1,..., where n* is defined by (2.10). Let the empirical Bayes decision rules d be defined by (2.11).

Theorem 2.1. Let m ≥ 2. The empirical Bayes procedure (N, d) defined above is asymptotically optimal at each G = (α, β).

Proof. By Lemma 1.1 and (2.6),

(2.19)  0 ≤ E r_{N_{i+1}}(G, d_{i+1}) - r(G) ≤ 4E|ξ̂_i - ξ| + 2E|η̂_i - η|.

Since |ξ̂_i - ξ| ≤ 1 and |η̂_i - η| ≤ 2 for all i, the DCT, Lemma 2.1 and (2.19) imply that E r_{N_{i+1}}(G, d_{i+1}) → r(G). □

Remark 2.2. In the component problem under consideration in this chapter and the next, the marginal distribution of a single observation is Beta-Binomial with parameters m, α, β. If m = 1, this is Binomial(1, α/(α + β)) and the pair (α, β) is not identified. Our method of estimation in the empirical Bayes version requires that m ≥ 2. This assumption can be removed if we require that the N_i ≥ 2 and use estimators based on pooled data. Requiring N_i ≥ 2 i.o. would suffice, but details of these variations will not be presented. In Chapters 4 and 5 we optimize sample size over n ≥ 2 for the purpose of simplifying the problem of estimating the prior.
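The estimators (2.17)-(2.18) and the empirical Bayes estimate (2.11) are simple to implement. A short Python sketch (ours, for illustration only; the function names are hypothetical) follows.

```python
def beta_parameter_estimates(components, m):
    """Method-of-moments estimates (2.17)-(2.18) of (alpha, beta) from the
    components X^1,...,X^i; `components` is a list of lists of observations,
    each value in {0,...,m}.  A sketch, not the author's program."""
    i = len(components)
    Ybar = sum(sum(x) / len(x) for x in components) / i                    # average of the Y_j of (2.12)
    Zbar = sum(sum(v * v for v in x) / len(x) for x in components) / i     # average of the Z_j of (2.13)
    xi = Ybar / m                                                          # (2.17)
    eta = (Zbar - Ybar) / (m * m - m)
    denom = eta - xi * xi
    if denom == 0.0:
        return 0.0, 0.0                             # the convention 0/0 := 0 of (2.18)
    alpha = max(xi * (xi - eta) / denom, 0.0)       # positive parts as in (2.18)
    beta = max((1.0 - xi) * (xi - eta) / denom, 0.0)
    return alpha, beta

def eb_estimate(alpha_hat, beta_hat, x, m):
    """Empirical Bayes estimate (2.11) of theta for the current component
    with observations x."""
    n = len(x)
    return (alpha_hat + sum(x)) / (alpha_hat + beta_hat + m * n)
```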
§ 2.3. Some Empirical Bayes Risk Calculations

In this section we treat the empirical Bayes problem of the last section. All risks are multiplied by 1000, which corresponds to a component with loss function 1000(a - θ)^2 and cost 1000c per observation. We have calculated the envelope risk r(α, β) and the optimal sample size(s) for various m, c, α, and β and present some of the results in Table 2.1. We have included the mean and standard deviation of the Beta(α, β) prior in each case. Figure 2.1 below is a graph of the envelope risk function r(α, α) plotted against α on a log scale. For this we have chosen m = 2 and c = .001.

Figure 2.1. A Risk Envelope. [Graph of the envelope risk 1000 r(α, α) against α on a log scale from .001 to 1000; the risk scale runs from 1.0 to 21.0. Here m = 2 and c = .001.]

Table 2.1. n*(α, β) and r(α, β)

  Prior                        c = .001        c = .002        c = .001        c = .002
  α     β     mean   s.d.      n*      r       n*      r       n*      r       n*      r
  0.1   0.1   0.50   0.456      4   9.081       3  12.720       4   7.415       3  10.529
  0.1   0.3   0.25   0.366      5  10.151       3  14.371       4   8.320       3  11.699
  0.1   0.9   0.10   0.212      4   9.000       3  12.429       4   7.462       2  10.429
  0.1   1.9   0.05   0.126      3   6.958       2   9.278       3   5.879       2   7.958
  0.2   0.2   0.50   0.423      6  11.760       4  16.503       5   9.638       3  10.529
  0.2   0.6   0.25   0.323      6  12.510       4  17.470       3  10.274       3  11.699
  0.2   1.2   0.14   0.226      5  11.266       4  15.599       4   9.330       2  10.429
  0.2   1.8   0.10   0.173      4  10.000       3  13.500       4   8.286       3  11.455
  0.3   0.3   0.50   0.395      7  13.421       5  18.844       5  11.010       4  15.440
  0.3   0.6   0.33   0.342      7  14.065       5  19.657       6  11.569       4  16.160
  0.3   1.2   0.20   0.253      6  13.111       4  18.105       5  10.818       4  15.111
  0.3   1.8   0.14   0.199      5  11.855       4  16.213       5   9.851       3  13.473
  0.5   0.5   0.50   0.354      7  15.333       5  21.364       6  12.579       4  17.615
  0.5   1.0   0.33   0.298      7  15.602       5  21.594       6  12.838       4  17.877
  0.5   1.5   0.25   0.250      7  14.812       5  20.417       6  12.250       4  16.929
  1.0   1.0   0.50   0.289      8  17.259       5  23.889       7  14.246       5  19.804
  1.0   1.5   0.40   0.262      8  17.266       5  23.714       7  14.295       5  19.796
  1.0   2.0   0.33   0.236      8  16.772       5  22.821       6  13.937       4  19.111
  1.5   1.5   0.50   0.250      8  17.868       5  24.423       7  14.813       5  20.417
  1.5   2.0   0.43   0.233      8  17.768       5  24.109       7  14.775       4  20.289
  2.0   2.0   0.50   0.224      8  18.000       5  24.286       7  15.000       4  20.500
  3.0   3.0   0.50   0.189      7  17.714       4  23.306       6  14.929       4  19.905
  4.0   4.0   0.50   0.167      7  17.101       3  21.873       6  14.547       3  19.072
  5.0   5.0   0.50   0.151      6  16.331       3  20.205       5  14.091       3  17.962
  10.0  10.0  0.50   0.109      1  11.823       1  12.823       2  11.158       1  12.352

For m = 2, c = .001 and selected α, β values, we have made Monte Carlo estimates of the empirical Bayes risk of our procedure with initial starting estimates α̂_0 = β̂_0 = 1. This is done for stages i = 10, 15, 20, 25, 50 and 100, and the results are presented in Table 2.2 along with the standard errors of the estimates.

Table 2.2. Estimated Empirical Bayes Risks (m = 2, c = .001)
(standard errors in parentheses)

  α     β       i = 10     i = 15     i = 20     i = 25     i = 50     i = 100    Envelope Risk
  0.1   0.1     10.22       9.83      10.13      10.00       9.28       9.13        9.081
               (0.18)     (0.07)     (0.14)     (0.14)     (0.05)     (0.01)
  0.5   0.5     17.31      15.97      15.68      15.56      15.40      15.37       15.333
               (0.67)     (0.10)     (0.05)     (0.03)     (0.01)     (0.00)
  1.0   1.0     21.27      19.05      18.26      18.05      17.41      17.32       17.259
               (0.73)     (0.43)     (0.25)     (0.28)     (0.02)     (0.00)
  2.0   2.0     21.26      19.67      19.89      19.44      19.09      18.27       18.000
               (0.43)     (0.25)     (0.30)     (0.25)     (0.20)     (0.04)
  3.0   3.0     20.43      19.73      19.36      19.75      18.73      18.47       17.714
               (0.28)     (0.24)     (0.21)     (0.25)     (0.14)     (0.17)
  4.0   4.0     19.98      19.34      19.05      18.95      18.66      18.10       17.101
               (0.29)     (0.19)     (0.16)     (0.16)     (0.15)     (0.12)
  0.1   0.9     12.25      12.58      13.12      13.05      10.69       9.41        9.000
               (0.27)     (0.34)     (0.42)     (0.44)     (0.31)     (0.31)
  0.2   1.8     12.79      13.34      13.24      13.28      12.38      10.86       10.000
               (0.19)     (0.24)     (0.29)     (0.29)     (0.28)     (0.17)
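A Monte Carlo estimate of the kind reported in Table 2.2 can be organized as below. This Python sketch is ours, reuses the hypothetical helpers sketched in Sections 2.1 and 2.2, and is not the program that produced Table 2.2.

```python
import random

def simulate_eb_risk(alpha, beta, m, c, stages, reps, seed=0):
    """Monte Carlo estimate of the empirical Bayes risk (decision loss plus
    cost), scaled by 1000 as in Section 2.3, at stages 1,...,stages.
    Uses envelope_binomial_estimation, beta_parameter_estimates and
    eb_estimate from the earlier sketches; initial estimates
    alpha_0 = beta_0 = 1 as for Table 2.2.  A sketch only."""
    rng = random.Random(seed)
    totals = [0.0] * stages
    for _ in range(reps):
        components, a_hat, b_hat = [], 1.0, 1.0
        for i in range(stages):
            n, _ = envelope_binomial_estimation(a_hat, b_hat, m, c)   # N_{i+1} = n*(a_hat, b_hat)
            theta = rng.betavariate(alpha, beta)                      # theta_{i+1} ~ Beta(alpha, beta)
            x = [sum(rng.random() < theta for _ in range(m)) for _ in range(n)]
            d = eb_estimate(a_hat, b_hat, x, m)                       # (2.11)
            totals[i] += (d - theta) ** 2 + c * n
            components.append(x)
            a_hat, b_hat = beta_parameter_estimates(components, m)    # (2.18)
            if a_hat <= 0.0 or b_hat <= 0.0:
                # our choice: fall back to the arbitrary initial estimates when
                # (2.18) degenerates to zero in the early stages
                a_hat, b_hat = 1.0, 1.0
    return [1000.0 * t / reps for t in totals]

# e.g. simulate_eb_risk(2.0, 2.0, m=2, c=0.001, stages=100, reps=200)
```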
CHAPTER 3

TESTING THE BINOMIAL PARAMETER

§ 3.1. The Component Problem

In connection with the estimation problem for the binomial parameter θ presented in Chapter 2, we consider a testing problem concerning the value of θ in B(m, θ), where m ≥ 2 is a given integer. As in Chapter 2, we assume the conjugate prior G = Beta(α, β) for the binomial parameter θ and a constant cost c > 0 per observation. The hypothesis to be tested is H_0: θ ≤ θ_0 against H_1: θ > θ_0 for a given θ_0 ∈ Θ = [0, 1]. Thus the action space 𝒜 consists of two actions a_0 and a_1, where a_0 = "accept H_0" and a_1 = "reject H_0". We assume the linear loss function L(·,·) on Θ × 𝒜:

(3.1)  L(θ, a_0) = (θ - θ_0)^+,    L(θ, a_1) = (θ_0 - θ)^+.

Conveniently, L(θ, a_0) - L(θ, a_1) = θ - θ_0.

Let X_1,...,X_n be i.i.d. P_θ, the distribution B(m, θ), with support 𝒳 = {0, 1,...,m}. Then P_θ^n, the joint distribution of X = (X_1,...,X_n), has support 𝒳^n. Let Δ_n denote the set of all nonrandomized decision functions

(3.2)  δ: 𝒳^n → {0, 1}.

When x ∈ 𝒳^n is observed, we take action a_{δ(x)} and thereby incur the loss

(3.3)  L(θ, a_{δ(x)}) = L(θ, a_0) - δ(x)[L(θ, a_0) - L(θ, a_1)] = L(θ, a_0) - δ(x)(θ - θ_0).

The Bayes risk of δ ∈ Δ_n at G is

(3.4)  R_n(G, δ) = E L(θ, a_{δ(X)}),

where E denotes the expectation with respect to the joint distribution of (θ, X). Using (3.3), we can write

(3.5)  R_n(G, δ) = ∫_{θ_0}^{1} (θ - θ_0) dG(θ) - Σ_{x ∈ 𝒳^n} δ(x) ∫_{0}^{1} (θ - θ_0) p_θ(x) dG(θ).

We see that in (3.5),

(3.6)  ∫_{0}^{1} (θ - θ_0) p_θ(x) dG(θ) = [E_G(θ | x) - θ_0] p(x),

where p_θ is the conditional mass function for X and E_G(θ | X) is the Bayes estimate of θ based on X defined by (2.4):

d_G(X) = E_G(θ | X) = (α + (X_1 + ... + X_n)) / (α + β + mn).

Thus (3.5) can be written as

(3.7)  R_n(G, δ) = ∫_{θ_0}^{1} (θ - θ_0) dG(θ) - Σ_{x ∈ 𝒳^n} δ(x)[d_G(x) - θ_0] p(x),

where p denotes the marginal mass function for X. Since δ(X) takes values 0 or 1, it is clear from (3.7) that R_n(G, δ) is minimized by taking

(3.8)  δ_G(x) = 1 if d_G(x) ≥ θ_0, and δ_G(x) = 0 otherwise,

which is a Bayes decision function with respect to G. From (3.8), we observe that a Bayes test δ_G ∈ Δ_n is determined by comparing a Bayes estimate d_G ∈ D_n with θ_0 for each n = 1, 2,.... This observation is useful in that an empirical Bayes test δ_n can be obtained from the empirical Bayes estimate d_n defined in Chapter 2.

Remark 3.1. Let g, g' be densities of G = Beta(α, β) and G' = Beta(α', β'). Then we have

(3.9)  |R_n(G, δ_{G'}) - R_n(G', δ_{G'})| ≤ 2 ∫_{0}^{1} |g'(θ) - g(θ)| dθ

and

(3.10)  |R_n(G, δ_{G'}) - R_n(G)| ≤ 2 ∫_{0}^{1} |g'(θ) - g(θ)| dθ

for all n = 1, 2,....

Proof. For δ = δ_{G'} in (3.7),

(3.11)  R_n(G, δ_{G'}) = ∫_{θ_0}^{1} (θ - θ_0) g(θ) dθ - Σ_{x ∈ 𝒳^n} δ_{G'}(x)[d_G(x) - θ_0] p(x).

Letting G = G' in (3.11) leads to

(3.12)  R_n(G', δ_{G'}) = ∫_{θ_0}^{1} (θ - θ_0) g'(θ) dθ - Σ_{x ∈ 𝒳^n} δ_{G'}(x)[d_{G'}(x) - θ_0] p'(x).

By subtraction,

|R_n(G, δ_{G'}) - R_n(G', δ_{G'})| ≤ ∫_{0}^{1} |θ - θ_0| |g(θ) - g'(θ)| dθ + Σ_{x ∈ 𝒳^n} | ∫_{0}^{1} (θ - θ_0) p_θ(x)[g(θ) - g'(θ)] dθ |
                                  ≤ 2 ∫_{0}^{1} |θ - θ_0| |g'(θ) - g(θ)| dθ ≤ 2 ∫_{0}^{1} |g'(θ) - g(θ)| dθ.

The second statement (3.10) follows immediately from (3.12) by changing the roles of G and G'. □

From (3.12), the Bayes decision risk of δ_G ∈ Δ_n at G = Beta(α, β) can be written as

(3.13)  R_n(G) = R_n(G, δ_G) = ∫_{θ_0}^{1} (θ - θ_0) g(θ) dθ - Σ_{x ∈ 𝒳^n} [d_G(x) - θ_0]^+ p(x),   n = 1, 2,....

We seek a minimizer of r_n(G) = R_n(G) + cn among the integers n = 1, 2,.... By the comment preceding Example 1.1 and the fact that R_1(G) ≤ 1 for all G, it follows that a minimizer n** satisfies n** ≤ (1 + c)/c for all G. We have chosen to denote the optimal sample size function for the test as n** = n**(α, β) to distinguish it from the optimal sample size n* for estimation. We do not have an explicit formula for n**, although it is easily computed for any given α, β. Thus, using the sample size n** and the test δ_G ∈ Δ_{n**}, we achieve the minimum Bayes risk r(G).
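One way to compute n**(α, β) is sketched below in Python (ours, for illustration; not the author's program). It uses the facts that the Bayes estimate (2.4) depends on x only through the total s = x_1 + ... + x_n, so the sum over 𝒳^n in (3.13) collapses to a sum over s, that marginally s has a Beta-Binomial(mn, α, β) distribution, and that the term of (3.13) not depending on n can be ignored when comparing sample sizes.

```python
import math

def betabinom_pmf(s, n, a, b):
    """P(S = s) for S ~ Beta-Binomial(n, a, b), computed via log-gamma."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
                    + math.lgamma(a + s) + math.lgamma(b + n - s) - math.lgamma(a + b + n)
                    + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def n_star_star(alpha, beta, m, c, theta0):
    """Smallest minimizer n**(alpha, beta) of r_n(G) = R_n(G) + cn for the
    linear loss test of Section 3.1; a sketch, not the author's program."""
    n_max = int((1.0 + c) / c)        # n** <= (1 + c)/c, see the comment after (3.13)
    best_n, best_val = 1, float("inf")
    for n in range(1, n_max + 1):
        gain = sum(max((alpha + s) / (alpha + beta + m * n) - theta0, 0.0)
                   * betabinom_pmf(s, m * n, alpha, beta)
                   for s in range(m * n + 1))
        val = -gain + c * n           # r_n(G) up to the n-free integral term of (3.13)
        if val < best_val - 1e-15:
            best_n, best_val = n, val
        if c * n - 1.0 > best_val:    # the gain term is at most 1, so larger n cannot improve
            break
    return best_n
```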
§ 3.2. An Empirical Bayes Decision Procedure

In this section we consider the empirical Bayes decision problem with the linear loss testing component problem described in the last section. The prior G is assumed to be in the parametric family 𝒢 of beta priors on Θ = [0, 1]. Let G = Beta(α, β), where α, β > 0 are unknown constants. In the sequence of component problems resulting from the repetition of the component, we are given a sequence of parameters θ_1, θ_2,... which are assumed to be i.i.d. G = Beta(α, β).

Suppose that we have experienced i component problems by observing X^1 = (X_{11},...,X_{1N_1}),..., X^i = (X_{i1},...,X_{iN_i}). At the (i+1)-th component problem we will test H_0: θ_{i+1} ≤ θ_0 against H_1: θ_{i+1} > θ_0 with the linear loss function given by (3.1). Since θ_{i+1} ~ Beta(α, β) and α > 0, β > 0 are unknown, the optimal sample size n**(α, β) and the Bayes decision rule δ_G ∈ Δ_{n**(α, β)} are not directly available, so that the minimum Bayes risk r(G) cannot be achieved. However, if an estimate Ĝ_i of G is available at this stage, we estimate the optimal sample size n**(G) and the Bayes rule δ_G ∈ Δ_{n**(G)} at G by N_{i+1} = n**(Ĝ_i) and δ_{i+1} = δ_{Ĝ_i} ∈ Δ_{n**(Ĝ_i)} and, thus, define an empirical Bayes procedure (N, δ), δ = (δ_1, δ_2,...), as in Table 1.1.

For the estimates of α, β, assume m ≥ 2 and let α̂_i, β̂_i be given by (2.18). Let α̂_0, β̂_0 be arbitrary initial estimates. Then

(3.14)  N_{i+1} = n**(α̂_i, β̂_i),   i = 0, 1,...,

and

(3.15)  δ_{i+1}(x^{i+1}) = 1 if d_{i+1}(x^{i+1}) ≥ θ_0, and δ_{i+1}(x^{i+1}) = 0 otherwise,

where d_{i+1} is defined by (2.11).

Lemma 3.1. Let m ≥ 2 and let α̂_i, β̂_i be a.s. consistent estimators, e.g., as in (2.18). Let ĝ_i denote the beta density with parameters α̂_i, β̂_i and let g be the beta density with the governing parameter values α, β. Then

(3.16)  ĝ_i(θ) → g(θ), 0 < θ < 1, a.s.

Proof. At each θ, g(θ) is a continuous function of (α, β). □

Theorem 3.1. Let m ≥ 2. The empirical Bayes testing procedure (N, δ) defined by (3.14) and (3.15) is asymptotically optimal at each G = Beta(α, β).

Proof. From Lemma 1.1 and (3.10), it follows that

(3.17)  0 ≤ E r_{N_{i+1}}(G, δ_{i+1}) - r(G) ≤ 4 E ∫_{0}^{1} |ĝ_i(θ) - g(θ)| dθ.

Note that ĝ_i - g → 0 a.s. on the probability space of the empirical Bayes problem cross Lebesgue measure on (0, 1). The sequence ĝ_i + g dominates |ĝ_i - g| and converges to 2g, so by the generalized dominated convergence theorem the RHS of (3.17) converges to zero. □
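The empirical Bayes test (3.15) simply thresholds the Chapter 2 estimate at θ_0. A two-line Python sketch in the same hypothetical vein as the earlier ones:

```python
def eb_test(alpha_hat, beta_hat, x, m, theta0):
    """Empirical Bayes test (3.15): returns 1 (reject H_0) when the empirical
    Bayes estimate (2.11) is at least theta0, and 0 otherwise.  A sketch
    reusing eb_estimate from the Section 2.2 sketch."""
    return 1 if eb_estimate(alpha_hat, beta_hat, x, m) >= theta0 else 0

# The sample size for the next component would be chosen as in (3.14),
# N_{i+1} = n_star_star(alpha_hat, beta_hat, m, c, theta0), with
# (alpha_hat, beta_hat) from beta_parameter_estimates.
```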
CHAPTER 4

ESTIMATION OF THE NORMAL MEAN

§ 4.1. The Component Problem

The component problem considered in this chapter is the one introduced in Example 1.1. Here G = N(μ, V) and, letting

(4.1)  ρ = A/(A + nV),

the posterior distribution of θ given X = (X_1, X_2,...,X_n) is

(4.2)  N(ρμ + (1 - ρ)X̄_n, ρV).

With this notation, the Bayes estimator (1.8) can be written

(4.3)  d_G(X) = ρμ + (1 - ρ)X̄_n.

The following remark parallels Remark 2.1.

Remark 4.1. For G = N(μ, V) and G' = N(μ', V'), with ρ' = A/(A + nV'),

(4.4)  R_n(G, d_{G'}) = (1 - ρ')^2 (A/n) + ρ'^2 [(μ' - μ)^2 + V],

(4.5)  |R_n(G, d_{G'}) - R_n(G', d_{G'})| ≤ (μ' - μ)^2 + |V' - V|,

and

(4.6)  R_n(G) = AV/(A + nV).

Proof. By (4.3), d_{G'}(X) = ρ'μ' + (1 - ρ')X̄_n. Since expected squared deviation is variance plus bias squared,

R_n(G, d_{G'}) = E_G E_θ [ρ'μ' + (1 - ρ')X̄_n - θ]^2 = E_G { (1 - ρ')^2 (A/n) + ρ'^2 (μ' - θ)^2 } = (1 - ρ')^2 (A/n) + ρ'^2 [V + (μ' - μ)^2].

Then (4.6) follows by replacing G' by G above and using (4.1). Since

R_n(G', d_{G'}) = (1 - ρ')^2 (A/n) + ρ'^2 V',

it follows that

R_n(G, d_{G'}) - R_n(G', d_{G'}) = ρ'^2 [(μ' - μ)^2 + (V - V')],

which yields (4.5). □

We seek the optimal sample size n* which minimizes

r_n(G) = R_n(G) + cn = AV/(A + nV) + cn

among the integers n = 1, 2,.... We consider r_n(G) as a function of real n and the equation

0 = (d/dn) r_n(G) = -AV^2/(A + nV)^2 + c.

Its larger solution is

(4.7)  η = (A/c)^{1/2} - A/V.

We see that r_n(G) is convex in n ∈ (-A/V, ∞) and that the optimal sample size n* = n*(A, V) is given by (1.10).

In our empirical Bayes application the variance A of the conditional distribution N(θ, A) is assumed to be unknown but is assumed to be in a given bounded interval (0, a]. Thus we are taking A to be a nuisance parameter. It is convenient, though not necessary, to require that at least two observations be taken in each component of the empirical Bayes problem so that the estimation of A is simple. Therefore, we will optimize the sample size over the choices n = 2, 3,... in defining the envelope risk. It follows that

(4.8)  n* = n*(A, V) =  2,                 if η < 2
                        η,                 if η ∈ {2, 3,...}
                        [η] or [η] + 1 (whichever minimizes r_n(G)),  otherwise,

where η is given in (4.7). Since R_2(G) = AV/(A + 2V) ≤ A/2, it follows as in the comment preceding Example 1.1 that n* ≤ (A/2 + 2c)/c. Letting M be the integer [a/2c + 2] + 1, it follows that

(4.9)  2 ≤ n*(A, V) ≤ M < ∞

for all A ∈ (0, a] and priors G = N(μ, V). Notice that in the component problem

(4.10)  E X̄_n = μ,

(4.11)  E[(1/n) Σ_{k=1}^{n} (X_k - μ)^2] = V + A,

and, provided n ≥ 2,

(4.12)  E[(1/(n-1)) Σ_{k=1}^{n} (X_k - X̄_n)^2] = A.

§ 4.2. An Empirical Bayes Decision Procedure

In this section we construct a decision procedure for the empirical Bayes problem with the component of the last section. The unknown prior G is assumed to be from the family of normal distributions 𝒢, the family of conjugate priors. Let G = N(μ, V), where μ ∈ (-∞, ∞) and V ∈ (0, ∞).

Let Â_0, μ̂_0 and V̂_0 be initial nonrandom estimates of the component nuisance parameter A and the parameters μ, V of the prior. Let N_1 = n*(Â_0, V̂_0). Then X^1 = (X_{11},...,X_{1N_1}) is observed in the first component. The empirical Bayes procedure that we will study is defined through sequences of estimators Â_i, μ̂_i and V̂_i that are (X^1,...,X^i) measurable with

(4.13)  N_{i+1} = n*(Â_i, V̂_i),   i = 0, 1,...,

and

(4.14)  d_{i+1}(X^{i+1}) = ρ̂_{i+1} μ̂_i + (1 - ρ̂_{i+1}) Y_{i+1},   i = 0, 1,...,

where

(4.15)  ρ̂_{i+1} = Â_i/(Â_i + N_{i+1} V̂_i),   i = 0, 1,...,

and

(4.16)  Y_i = (1/N_i) Σ_{j=1}^{N_i} X_{ij},   i = 1, 2,....

We now define the estimators μ̂_i, Â_i and V̂_i, i = 1, 2,.... Motivated by (4.10) we define

(4.17)  μ̂_i = Ȳ_i = (1/i) Σ_{j=1}^{i} Y_j,   i = 1, 2,...,

the average of the sample means for the first i components. Motivated by (4.12) we define

(4.18)  Â_i = S̄_i ∧ a,   i = 1, 2,...,

where

(4.19)  S̄_i = (1/i) Σ_{j=1}^{i} S_j

is the average of the sample variances

(4.20)  S_j = (1/(N_j - 1)) Σ_{k=1}^{N_j} (X_{jk} - Y_j)^2,   j = 1, 2,...,

for the first i components. Finally, motivated by (4.11) we define

(4.21)  V̂_i = [T̄_i - Â_i]^+,   i = 1, 2,...,

where

(4.22)  T̄_i = (1/i) Σ_{j=1}^{i} T_{ji}

is the average of the average squared deviations from μ̂_i = Ȳ_i,

(4.23)  T_{ji} = (1/N_j) Σ_{k=1}^{N_j} (X_{jk} - μ̂_i)^2,   j = 1, 2,...,i.

In (4.23) the centerings change with i, which creates a more complicated stochastic structure than exists in (4.20). For purposes of triangulation, we introduce

(4.24)  T̃_i = (1/i) Σ_{j=1}^{i} T_j,

where

(4.25)  T_j = (1/N_j) Σ_{k=1}^{N_j} (X_{jk} - μ)^2,   j = 1, 2,....

Let ℱ_0 be the trivial σ-field and let ℱ_j = σ(X^1, X^2,...,X^j), j = 1, 2,.... The sample size N_j is ℱ_{j-1}-measurable, and we see that

(4.26)  E(Y_j | ℱ_{j-1}) = μ,    E(S_j | ℱ_{j-1}) = A,    E(T_j | ℱ_{j-1}) = V + A,   j = 1, 2,....
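The estimators (4.17), (4.18) and (4.21), and the empirical Bayes estimate (4.14)-(4.15), are also straightforward to compute. A short Python sketch (ours, with illustrative names only; each component is assumed to contain at least two observations):

```python
def normal_prior_estimates(components, a):
    """Estimates (4.17), (4.18), (4.21) of (mu, A, V) from components
    X^1,...,X^i of the normal model.  A sketch, not the author's program."""
    i = len(components)
    Y = [sum(x) / len(x) for x in components]                       # sample means, (4.16)
    mu_hat = sum(Y) / i                                             # (4.17)
    S = [sum((v - Y[j]) ** 2 for v in x) / (len(x) - 1)             # sample variances, (4.20)
         for j, x in enumerate(components)]
    A_hat = min(sum(S) / i, a)                                      # (4.18)
    T = [sum((v - mu_hat) ** 2 for v in x) / len(x) for x in components]   # (4.23)
    V_hat = max(sum(T) / i - A_hat, 0.0)                            # (4.21)
    return mu_hat, A_hat, V_hat

def eb_normal_estimate(mu_hat, A_hat, V_hat, x):
    """Empirical Bayes estimate (4.14)-(4.15) for the current component."""
    n = len(x)
    rho = A_hat / (A_hat + n * V_hat)
    return rho * mu_hat + (1.0 - rho) * (sum(x) / n)
```

The next component's sample size would then be taken as N_{i+1} = n*(Â_i, V̂_i) as in (4.13), with n* given by (4.8).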
Lemma 4.1. The sequences μ̂_i = Ȳ_i, S̄_i and T̃_i are a.s. consistent for μ, A and V + A, respectively.

Proof. We will use (4.26) and the theorem on stability about conditional expectations used earlier, i.e., Hall and Heyde (1980, Theorem 2.19). The sequences Y_i, S_i and T_i are not bounded. However, we will find random variables Y, S and T that are square integrable and stochastically larger than their absolute values. This implies the hypothesis of Theorem 2.19 that is sufficient for the a.s. convergence. Recall that 2 ≤ N_i ≤ M, i = 1, 2,.... Consider the component problem with sample size M and observations X_1, X_2,...,X_M. Let Y = Σ_{j=1}^{M} |X_j|, S = Σ_{j=1}^{M} X_j^2 and T = Σ_{j=1}^{M} (X_j - μ)^2. From the definitions (4.16), (4.20) and (4.25) we see that Y, S and T are stochastically larger than |Y_i|, |S_i| and |T_i|, i = 1, 2,.... Also Y^2 ≤ MS and, conditional on θ, the distributions of S and T are noncentral chi-square distributions with second moments that are integrable with respect to G = N(μ, V). Thus Y, S and T are square integrable. □

Lemma 4.2. The estimators T̄_i and V̂_i are a.s. consistent for V + A and V.

Proof. We have from (4.23) and (4.25) that

(4.27)  T_j - T_{ji} = (Ȳ_i - μ)(2Y_j - μ - Ȳ_i).

Since Ȳ_i = Σ_{j=1}^{i} Y_j / i, we have from (4.22) and (4.24) that

(4.28)  T̃_i - T̄_i = (Ȳ_i - μ)^2.

It follows from Lemma 4.1 that T̄_i is a.s. consistent for V + A. Using (4.21) and Lemma 4.1, it follows that V̂_i is consistent for V. □

Theorem 4.1. Let A ≤ a. Then the empirical Bayes procedure (N, d) defined by (4.13)-(4.23) is asymptotically optimal at each G = N(μ, V).

Proof. From Lemma 1.1 and (4.5),

(4.29)  0 ≤ E r_{N_{i+1}}(G, d_{i+1}) - r(G) ≤ 2E(μ̂_i - μ)^2 + 2E|V̂_i - V|.

Let Y, T be the random variables defined in the proof of Lemma 4.1. Then for p > 0, E|Y_j|^{2+p} ≤ E(Y + 1)^{2+p} < ∞ and E|T_j|^{1+p} ≤ E(T + 1)^{1+p} < ∞ for j = 1, 2,.... Hence, the {Y_j^2} and the {T_j} are uniformly integrable. Thus, {μ̂_i^2} and {T̃_i} are uniformly integrable, and the a.s. convergence (Lemma 4.1) implies that

(4.30)  E(μ̂_i - μ)^2 → 0

and

(4.31)  E|T̃_i - (V + A)| → 0.

It follows from the triangle inequality and (4.28) that

(4.32)  |V̂_i - V| ≤ |T̄_i - T̃_i| + |T̃_i - (V + A)| + |(V + A) - (V + Â_i)| = (Ȳ_i - μ)^2 + |T̃_i - (V + A)| + |Â_i - A|.

The dominated convergence theorem and Lemma 4.1 imply

(4.33)  E|Â_i - A| → 0,

which together with (4.29)-(4.32) establishes the result. □

CHAPTER 5

TESTING THE NORMAL MEAN

§ 5.1. The Component Problem

In this section we consider linear loss testing of the normal mean θ in N(θ, A). Specifically, we consider the problem of testing

(5.1)  H_0: θ ≤ θ_0 against H_1: θ > θ_0

with

L(θ, a_0) = (θ - θ_0)^+,    L(θ, a_1) = (θ_0 - θ)^+.

Using the analysis developed in Section 3.1 for the component of this section, we find that for any test δ,

(5.2)  R_n(G, δ) = ∫_{θ_0}^{∞} (θ - θ_0) dG(θ) - ∫_{𝒳^n} δ(x)[d_G(x) - θ_0] f(x) dx,

where 𝒳 = (-∞, ∞), f is the marginal density of X = (X_1,...,X_n) and d_G is given by (4.3). A Bayes test versus the prior G is

(5.3)  δ_G(X) = 1 if d_G(X) ≥ θ_0, and δ_G(X) = 0 if d_G(X) < θ_0.

Throughout this chapter we will take 𝒢 to be the family of normal distributions N(μ, V) with

(5.4)  E_G|θ| = ∫ |θ| dG(θ) ≤ K < ∞,

where K > 0 is a known constant.

Remark 5.1. Let g, g' be densities for G = N(μ, V), G' = N(μ', V') in 𝒢, and let Φ denote the standard normal distribution function. Then

(5.5)  |R_n(G, δ_{G'}) - R_n(G', δ_{G'})| ≤ 2 ∫_{-∞}^{∞} |θ - θ_0| |g'(θ) - g(θ)| dθ,

(5.6)  R_n(G) ≤ ∫_{-∞}^{∞} |θ - θ_0| g(θ) dθ ≤ K + |θ_0|,

(5.7)  ∫_{-∞}^{∞} |θ| g(θ) dθ = (2V/π)^{1/2} exp(-μ^2/(2V)) + μ(1 - 2Φ(-μ/√V)),

and

(5.8)  |∫ |θ| g'(θ) dθ - ∫ |θ| g(θ) dθ| ≤ 3|μ' - μ| + |√V' - √V| + 2|μ| |Φ(-μ'/√V') - Φ(-μ/√V)| + √V |exp(-μ'^2/(2V')) - exp(-μ^2/(2V))|.

Proof. The bounds (5.5) and (5.6) follow as in the proof of Remark 3.1 together with (5.4), and a direct calculation gives (5.7). Using (5.7) for G' and subtraction, the LHS of (5.8) is less than or equal to

|(2V'/π)^{1/2} exp(-μ'^2/(2V')) + μ'(1 - 2Φ(-μ'/√V')) - (2V/π)^{1/2} exp(-μ^2/(2V)) - μ(1 - 2Φ(-μ/√V))|
≤ |(2V'/π)^{1/2} - (2V/π)^{1/2}| exp(-μ'^2/(2V')) + (2V/π)^{1/2} |exp(-μ'^2/(2V')) - exp(-μ^2/(2V))| + |μ' - μ| + 2|μ' - μ| Φ(-μ'/√V') + 2|μ| |Φ(-μ'/√V') - Φ(-μ/√V)|
≤ 3|μ' - μ| + |√V' - √V| + 2|μ| |Φ(-μ'/√V') - Φ(-μ/√V)| + √V |exp(-μ'^2/(2V')) - exp(-μ^2/(2V))|,

the RHS of (5.8). □

We seek the smallest minimizer n** of r_n(G) = R_n(G) + cn, n = 2, 3,.... (As in Chapter 4 we are optimizing over n ≥ 2.) It follows as in the comment preceding Example 1.1 that n** ≤ (R_2(G) + 2c)/c, so using (5.6) and letting M denote the integer [{2(K + |θ_0|) + 2c}/c] + 1,

2 ≤ n** ≤ M < ∞

for all A ∈ (0, a] and all G ∈ 𝒢. Here n** = n**(A, μ, V) depends on the component variance A as well as on the prior parameters.
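As in Chapter 3, n** has no explicit formula but is easy to compute numerically. One way (our own reduction, not spelled out in the dissertation) uses (5.2) with δ = δ_G, which gives R_n(G) = E_G(θ - θ_0)^+ - E[(d_G(X) - θ_0)^+]; marginally d_G(X) is N(μ, V(1 - ρ)) with ρ = A/(A + nV), so both expectations are of the form E(W - θ_0)^+ for a normal W. The following Python sketch rests on that reduction and is illustrative only.

```python
import math

def _normal_plus_mean(mean, var, t):
    """E (W - t)^+ for W ~ N(mean, var)."""
    if var <= 0.0:
        return max(mean - t, 0.0)
    s = math.sqrt(var)
    z = (mean - t) / s
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - t) * Phi + s * phi

def n_star_star_normal(A, mu, V, c, theta0, K):
    """Smallest minimizer n**(A, mu, V) of r_n(G) = R_n(G) + cn over n >= 2
    for the linear loss test of Section 5.1.  A sketch based on the reduction
    described above; not the author's program."""
    def R(n):
        rho = A / (A + n * V)
        return _normal_plus_mean(mu, V, theta0) - _normal_plus_mean(mu, V * (1.0 - rho), theta0)
    n_max = int((2.0 * (K + abs(theta0)) + 2.0 * c) / c) + 1   # the bound M of Section 5.1
    best_n, best_r = 2, R(2) + 2 * c
    for n in range(3, n_max + 1):
        r = R(n) + c * n
        if r < best_r - 1e-15:
            best_n, best_r = n, r
        if c * n > best_r:          # R(n) >= 0, so larger n cannot improve
            break
    return best_n, best_r
```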
§ 5.2. An Empirical Bayes Decision Procedure

In this section we construct an empirical Bayes procedure (N, δ) for the testing component of the last section with risk converging to r(G) as i → ∞. This will be carried out by determining the sample size N_{i+1} and the decision rule δ_{i+1} ∈ Δ_{N_{i+1}} for i = 0, 1,....

Let Â_0, μ̂_0 and V̂_0 be nonrandom initial estimates of A, μ and V and let N_1 = n**(Â_0, μ̂_0, V̂_0). Then X^1 = (X_{11},...,X_{1N_1}) is observed in the first component. The empirical Bayes procedure that we will study is defined through Â_i, μ̂_i, V̂_i that are (X^1,...,X^i) measurable with

(5.9)  N_{i+1} = n**(Â_i, μ̂_i, V̂_i)

and

(5.10)  δ_{i+1}(x^{i+1}) = 1 if d_{i+1}(x^{i+1}) ≥ θ_0, and δ_{i+1}(x^{i+1}) = 0 otherwise,

where d_{i+1}(X^{i+1}) is defined by (4.14), for i = 0, 1,....

If we use the Â_i, μ̂_i and V̂_i defined by (4.18), (4.17) and (4.21) in constructing (N, δ) = ((N_1, N_2,...), (δ_1, δ_2,...)) given by (5.9), (5.10), then it is easy to see that they satisfy all the consistency properties proved in Lemmas 4.1 and 4.2 and in (4.30), (4.31), (4.33) of Theorem 4.1. The following lemma is useful in proving the asymptotic optimality of (N, δ) constructed above through Â_i, μ̂_i and V̂_i, i = 0, 1,....

Lemma 5.1. Let ĝ_i, g be densities of Ĝ_i = N(μ̂_i, V̂_i), G = N(μ, V) for i = 0, 1,.... Then

(5.11)  lim_i E ∫_{-∞}^{∞} |ĝ_i(θ) - g(θ)| dθ = 0

and

(5.12)  lim_i E ∫_{-∞}^{∞} |θ - θ_0| |ĝ_i(θ) - g(θ)| dθ = 0.

Proof. Note that ĝ_i - g → 0 a.e. on the measure space of the empirical Bayes problem cross Lebesgue measure on (-∞, ∞). Using the same argument as in the proof of Theorem 3.1, we obtain (5.11). For (5.12), it suffices to show that

(5.13)  lim_i E ∫_{-∞}^{∞} |θ| |ĝ_i(θ) - g(θ)| dθ = 0.

Since |θ| |ĝ_i(θ) - g(θ)| → 0 a.e. on the measure space of the empirical Bayes problem cross Lebesgue measure on (-∞, ∞), |θ|(ĝ_i(θ) + g(θ)) dominates the integrand |θ| |ĝ_i(θ) - g(θ)|, and |θ|(ĝ_i(θ) + g(θ)) → 2|θ| g(θ) a.e. on that product space, (5.13) will follow by the generalized dominated convergence theorem by showing that

(5.14)  E ∫_{-∞}^{∞} |θ| (ĝ_i(θ) + g(θ)) dθ → 2 ∫_{-∞}^{∞} |θ| g(θ) dθ.

Using (5.8) applied to g' = ĝ_i and taking expectations,

(5.15)  E | ∫ |θ| ĝ_i(θ) dθ - ∫ |θ| g(θ) dθ | ≤ 3E|μ̂_i - μ| + E|√V̂_i - √V| + 2|μ| E|Φ(-μ̂_i/√V̂_i) - Φ(-μ/√V)| + √V E|exp(-μ̂_i^2/(2V̂_i)) - exp(-μ^2/(2V))|,

which converges to 0 by the a.s. consistency and the mean consistency of μ̂_i and V̂_i. The proof is completed since

| E ∫ |θ| ĝ_i(θ) dθ - ∫ |θ| g(θ) dθ | ≤ E | ∫ |θ| ĝ_i(θ) dθ - ∫ |θ| g(θ) dθ |. □

Theorem 5.1. Let A ≤ a. Then the empirical Bayes decision procedure (N, δ) defined by (5.9), (5.10) through the estimates Â_i, μ̂_i and V̂_i given by (4.18), (4.17) and (4.21) is asymptotically optimal for all G with E_G|θ| ≤ K.

Proof. From Lemma 1.1, (5.5), and Lemma 5.1,

0 ≤ E r_{N_{i+1}}(G, δ_{i+1}) - r(G) ≤ 4 E ∫_{-∞}^{∞} |θ - θ_0| |ĝ_i(θ) - g(θ)| dθ → 0. □

REFERENCES

Gilliland, Dennis and Hannan, James (1977). Improved rates in the empirical Bayes monotone multiple decision problem with MLR family. Ann. Statist. 5 516-521.

Gilliland, Dennis and Karunamuni, Rohana (1988). On empirical Bayes with sequential component. Ann. Inst. Statist. Math. 40 187-193.

Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York.

Johns, M. V. and Van Ryzin, J. (1971). Convergence rates for empirical Bayes two-action problems I. Discrete case. Ann. Math. Statist. 42 1521-1539.

Johns, M. V. and Van Ryzin, J. (1972). Convergence rates for empirical Bayes two-action problems II. Continuous case. Ann. Math. Statist. 43 934-947.

Karunamuni, Rohana (1985). Empirical Bayes with sequential components. Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State University.

Karunamuni, Rohana (1988). On empirical Bayes testing with sequential components. Ann. Statist. 16 1270-1282.

Laippala, P. (1979). The empirical Bayes approach with floating optimal sample size in binomial experimentation. Scand. J. Statist. 6 113-118; correction note 7 105.

Laippala, P. (1985). The empirical Bayes rules with floating optimal sample size for exponential conditional distributions. Ann. Inst. Statist. Math. 37 315-327.
Morris, Carl (1983). Parametric empirical Bayes inference: Theory and applications. J. Amer. Statist. Assoc. 78 47-65.

O'Bryan, T. (1972). Empirical Bayes results in the case of non-identical components. Ph.D. Thesis, RM-306, Statistics and Probability, Michigan State University.

O'Bryan, T. (1976). Some empirical Bayes results in the case of component problems with varying sample sizes for discrete exponential families. Ann. Statist. 4 1290-1293.

O'Bryan, T. and Susarla, V. (1975). An empirical Bayes two-action problem with non-identical components for a translated exponential distribution. Comm. Statist. 4(8) 767-775.

Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. 2nd Berkeley Symp. Math. Statist. Prob. 131-148, University of California Press.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. 3rd Berkeley Symp. Math. Statist. Prob. 1 157-163, University of California Press.

Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1-20.

Singh, R. S. (1979). Empirical Bayes estimation in Lebesgue-exponential families with rates near the best possible rate. Ann. Statist. 7 890-902.

Susarla, V. (1982). Empirical Bayes theory. Encyclopedia of Statistical Sciences (eds. Kotz and Johnson) 2 490-503, Wiley, New York.

Van Ryzin, J. and Susarla, V. (1977). On the empirical Bayes approach to multiple decision problems. Ann. Statist. 5 172-181.