SOME ADMISSIBILITY CONSIDERATIONS IN THE FINITE STATE COMPONENT COMPOUND AND EMPIRICAL BAYES DECISION PROBLEMS

Dissertation for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
JOHN ELVIN BOYER, JR.
1976

This is to certify that the thesis entitled SOME ADMISSIBILITY CONSIDERATIONS IN THE FINITE STATE COMPONENT COMPOUND AND EMPIRICAL BAYES DECISION PROBLEMS presented by John Elvin Boyer, Jr. has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

ABSTRACT

We consider the compound and empirical Bayes decision problems with finite state component. Relationships between admissibility of a compound rule and the admissibility of the component decision rules it selects are established. Analogous results are obtained in the empirical Bayes decision problem. The main result is the demonstration of an admissible Bayes ($\Lambda$) empirical Bayes decision rule which is asymptotically optimal for a large class of $\Lambda$.

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1976

ACKNOWLEDGEMENTS

I wish to express my sincerest thanks to Professor Dennis C. Gilliland for his guidance and encouragement in my studies and in the preparation of this thesis. His patience and willingness to discuss any problem at any time are greatly appreciated. I also wish to thank Professor James Hannan for his critical reading of this thesis and his encouragement as it progressed. Special thanks are due to Mrs.
Noralee Burkhardt for her typing of the manuscript, and the patience with which she did it.

TABLE OF CONTENTS

Chapter
1 INTRODUCTION
  1. Component Decision Problem
  2. Compound Decision Problem
  3. Empirical Bayes Decision Problem
2 RELATIONSHIPS BETWEEN COMPOUND AND COMPONENT ADMISSIBILITY
  1. Introduction
  2. Relationships between Compound and Component Admissibility
3 ADMISSIBLE (BAYES) SOLUTIONS TO THE EMPIRICAL BAYES PROBLEM
  1. Admissibility and Component Admissibility in the Empirical Bayes Problem
  2. Bayes Procedures in the Finite State Component Empirical Bayes Problem
  3. Bayes Risk Consistency of the Bayes Procedures
APPENDIX A
APPENDIX B
REFERENCES

CHAPTER 1

INTRODUCTION

1. Component Decision Problem

Consider the following statistical decision problem called the component decision problem. Let $\{P_\theta: \theta \in \Theta\}$ be a family of probability measures over a $\sigma$-field $\mathcal{B}$ of subsets of $\mathcal{X}$. $\Theta$ is called the parameter or state space. $X$ denotes an $\mathcal{X}$-valued, $P_\theta$-distributed random variable. Let $\mathcal{A}$ denote the action space and $L \ge 0$ the loss function defined on $\Theta \times \mathcal{A}$. Let $\mathcal{C}$ be a $\sigma$-field of subsets of $\mathcal{A}$, with respect to which the $\theta$-sections of $L$ are measurable. A (randomized) decision rule $t$ is a function having domain $\mathcal{X} \times \mathcal{C}$ such that each $x$-section of $t$ is a probability measure on $\mathcal{C}$ and each $C$-section of $t$ is $\mathcal{B}$-measurable. The risk of $t$ at state $\theta$ is

(1)  $R(\theta,t) = \iint L(\theta,a)\,t(x,da)\,P_\theta(dx)$.

Let $\mathcal{G}$ denote a class of probability measures (priors) on $\mathcal{F}$, a $\sigma$-field of subsets of $\Theta$ with respect to which the $t$-sections of $R(\theta,t)$ are measurable.
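When the sample space and the action space are both finite, the double integral in (1) reduces to a double sum. The following is a minimal Python sketch under that assumption; the three-point sample space, the 0-1 loss and the randomized rule are hypothetical illustrations and are not taken from the thesis.

```python
def risk(loss, rule, p_theta):
    """R(theta, t) = sum_x P_theta(x) * sum_a L(theta, a) * t(x, a).

    loss[a]     : L(theta, a) for the fixed state theta
    rule[x][a]  : probability that the randomized rule t takes action a at x
    p_theta[x]  : P_theta({x})
    """
    return sum(
        p_theta[x] * sum(rule[x][a] * loss[a] for a in range(len(loss)))
        for x in range(len(p_theta))
    )


# Hypothetical two-state, two-action testing problem on X = {0, 1, 2}.
P = [[0.6, 0.3, 0.1],   # P_0
     [0.1, 0.3, 0.6]]   # P_1
L = [[0.0, 1.0],        # L(0, a): 0-1 loss
     [1.0, 0.0]]        # L(1, a)
# Rule: action 0 at x = 0, randomize evenly at x = 1, action 1 at x = 2.
t = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# One risk value per state; together they form the risk vector of t.
risk_vector = [risk(L[theta], t, P[theta]) for theta in range(2)]
```

Each state contributes one coordinate, so a rule in a finite state problem is summarized by a single risk vector.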
The Bayes risk of $t$ at $G \in \mathcal{G}$ is

(2)  $R(G,t) = \int R(\theta,t)\,G(d\theta)$.

Let $\mathcal{T}$ be a specified class of component decision rules. The infimum Bayes risk at $G$ is

(3)  $R(G) = \inf_{t \in \mathcal{T}} R(G,t)$

and $R(\cdot)$ defined by (3) is called the Bayes envelope. For $t^*$ such that $R(G,t^*) = R(G)$ we write $t^* = t_G$ and call $t_G$ a Bayes rule with respect to $G$. Note that $t_G$ need not exist and, if it does exist, it need not be unique. However, if it does exist at each $G$ and a minimizer is specified at each $G$, then the decision rule valued function $t_{(\cdot)}$ defined on $\mathcal{G}$ is called a Bayes response.

This thesis considers the finite $\Theta$ case almost exclusively. Here we write $\Theta = \{0,1,\dots,m\}$ and assume that $P_0,P_1,\dots,P_m$ are distinct probability measures on $\mathcal{B}$. We write $P_G = \sum_{i=0}^m G_i P_i$ for $G = (G_0,G_1,\dots,G_m) \in \mathcal{G}$, here the $m$-dimensional simplex in $R^{m+1}$, Euclidean $(m+1)$-space. We let $f_i$ denote a (bounded) density of $P_i$ with respect to $\mu = \sum_{i=0}^m P_i$, $i = 0,1,\dots,m$, whenever densities are used. In the finite $\Theta$ case, the risk function $R(\cdot,t)$ defined by (1) is a vector $s = (s^0,s^1,\dots,s^m)$ in $[0,\infty]^{m+1}$. The collection of all such $s$ is termed the risk set and is denoted by $S$. The Bayes risk of $s$ at $G$ is $\sum_{i=0}^m G_i s^i$. If $S$ is closed from below, then $t_G$ exists for every $G \in \mathcal{G}$.

2. Compound Decision Problem

In the compound decision problem the component problem occurs repeatedly and independently $n$ times presenting a sequence $(\theta_i, X_i)$, $i = 1,2,\dots,n$. The $\theta_i$ are unobservable. We write $\underline{\theta} = (\theta_1,\theta_2,\dots,\theta_n)$ and $\underline{X} = (X_1,X_2,\dots,X_n) \sim P_{\underline{\theta}} = P_{\theta_1} \times P_{\theta_2} \times \dots \times P_{\theta_n}$. $\check{\theta}_\alpha$ denotes $\underline{\theta}$ with the $\alpha$-component deleted and, similarly, for $\check{X}_\alpha$, $\alpha = 1,2,\dots,n$. For the class of compound decision rules $\underline{t} = (t_1,\dots,t_n)$ we take all functions $\underline{t}$ such that for each $\alpha$ in $\{1,2,\dots,n\}$, $t_\alpha$ is a function on $\mathcal{X}^n \times \mathcal{C}$ with the property that every $\check{x}_\alpha$-section of $t_\alpha$ is a component decision rule (as in Section 1) and such that $s^\theta_\alpha(\check{x}_\alpha) = \iint L(\theta,a)\,t_\alpha(\underline{x},da)\,P_\theta(dx_\alpha)$ is a measurable function of $\check{x}_\alpha$. Of course
$s_\alpha(\check{x}_\alpha) = (s^0_\alpha(\check{x}_\alpha),\dots,s^m_\alpha(\check{x}_\alpha))$ is then a measurable function of $\check{x}_\alpha$ into $S$. The unconditional component $\alpha$ risk is denoted by

(4)  $R_\alpha(\underline{\theta},\underline{t}) = \int R(\theta_\alpha, t_\alpha(\check{x}_\alpha))\,P_{\check{\theta}_\alpha}(d\check{x}_\alpha)$,  $\alpha = 1,2,\dots,n$,

and the compound (average risk across components) risk is denoted by

(5)  $R(\underline{\theta},\underline{t}) = \frac{1}{n}\sum_{\alpha=1}^n R_\alpha(\underline{\theta},\underline{t})$.

In the $\Theta = \{0,1,\dots,m\}$ case, $s^i_\alpha(\check{x}_\alpha) = R(i,t_\alpha(\check{x}_\alpha))$, $i = 0,1,\dots,m$, and the unconditional component $\alpha$ risk is a $P_{\check{\theta}_\alpha}$-average of the $S$-valued function $s_\alpha(\check{x}_\alpha)$.

The compound decision problem is invariant under the group of permutations of coordinates. $\underline{t}$ is equivariant if $t_\alpha(\underline{x}) = t(x_\alpha;\check{x}_\alpha)$, $\alpha = 1,2,\dots,n$, where $t$ as written is a symmetric function of its second argument. For such $\underline{t}$, $s_\alpha(\check{x}_\alpha) = s(\check{x}_\alpha)$, $\alpha = 1,2,\dots,n$, where $s$ is a symmetric $S$-valued function on $\mathcal{X}^{n-1}$. Equivariant $\underline{t}$ have compound risk depending on $\underline{\theta}$ only through its empirical distribution, which is denoted by $\underline{n} = (n_0,n_1,\dots,n_m)$ in the $\Theta = \{0,1,\dots,m\}$ case.

There has been increased concern over the component risk behavior of compound decision rules. In Chapter 2 we explore the relationships between compound admissibility of $\underline{t}$, the admissibility of the $\check{x}_\alpha$-sections of $t_\alpha$, $\alpha = 1,2,\dots,n$, and the admissibility of the $\check{\theta}_\alpha$-sections of the risk functions $R_\alpha(\underline{\theta},\underline{t})$, $\alpha = 1,2,\dots,n$. Examples 1-3 and Theorems 1 and 2 serve to delineate the implications in the finite $\Theta$ case.

3. Empirical Bayes Decision Problem

In the empirical Bayes decision problem $\theta_1,\theta_2,\dots$ are iid $G$ (unknown) distributed random variables and the conditional distribution of $\underline{X} = (X_1,X_2,\dots)$ given $\underline{\theta} = (\theta_1,\theta_2,\dots)$ is $P_{\theta_1} \times P_{\theta_2} \times \dots$. The marginal distribution of $\underline{X}$ is $P_G \times P_G \times \dots$, hereafter simply $P_G^\infty$.

In the empirical Bayes problem in the finite $\Theta$ case we let $f_G = \sum_{i=0}^m G_i f_i$, which is a density of $P_G$ with respect to $\mu$. We also assume the linear independence of the densities $f_0,f_1,\dots,f_m$ in $L_2(\mu)$, ensuring, among other things, the identifiability of the mixtures $P_G$ and the existence of unbiased bounded estimators of $G$.

Let $\underline{t} = (t_1,t_2,\dots)$ where for each $n$, the action selected
for component $n$ is by $t_n$, where $t_n$ is the $n$th component of a compound rule, that is, $t_n$ is a function on $\mathcal{X}^n \times \mathcal{C}$ such that every $\underline{x}_{n-1}$-section is a component rule and $R(\theta,t_n(\underline{x}_{n-1})) = \iint L(\theta,a)\,t_n(\underline{x}_n,da)\,P_\theta(dx_n)$ is a measurable function of $\underline{x}_{n-1}$.

Robbins (1956) introduced the empirical Bayes problem, showing examples of constructions of empirical Bayes rules whose component $n$ Bayes risks converge to the component minimum Bayes risk $R(G)$, whatever be $G$.

The conditional on $\underline{X}_{n-1}$ component $n$ Bayes risk of $\underline{t}$ is

(6)  $R(G,t_n(\underline{X}_{n-1})) = \sum_{i=0}^m G_i\,R(i,t_n(\underline{X}_{n-1}))$,  $n = 1,2,\dots$

and the unconditional component $n$ Bayes risk of $\underline{t}$ is

(7)  $R_n(G,t_n) = \int R(G,t_n(\underline{x}_{n-1}))\,P_G^{n-1}(d\underline{x}_{n-1})$,  $n = 1,2,\dots$

Of course, for each $n$ and $G$

(8)  $R(G) \le R(G,t_n(\underline{x}_{n-1}))$

from which $R(G) \le R_n(G,t_n)$ for all $\underline{t}$, $n$ and $G$.

An empirical Bayes rule $\underline{t}$ is said to be strongly Bayes risk consistent if $R(G,t_n(\underline{X}_{n-1})) \to R(G)$ a.s. $[P_G^\infty]$ for all $G \in \mathcal{G}$ and simply Bayes risk consistent (usually termed asymptotically optimal) if $R_n(G,t_n) \to R(G)$ for all $G \in \mathcal{G}$.

Chapter 3 concerns the empirical Bayes case. In Section 1, Theorem 3 and Example 4 delineate relationships between the admissibility of $t_n$ as a decision rule for decisions concerning $G$ and the admissibility of the $\underline{x}_{n-1}$-sections in the component decision problem. Theorem 4 shows that for equivariant compound $\underline{t}$, the empirical Bayes admissibility of $t_n$ implies the compound admissibility of $\underline{t}$. Theorem 5 shows that admissible empirical Bayes rules result from playing Bayes versus a second level prior $\Lambda$ on $\mathcal{G}$.

In Section 2 of Chapter 3, Lemma 1 exposes the structure of Bayes ($\Lambda$) empirical Bayes procedures. This lemma together with a series of lemmas in Section 3 culminate in a proof of the main result (Theorem 7), namely, the strong Bayes risk consistency of the Bayes ($\Lambda$) empirical Bayes procedure, provided each point $G \in \mathcal{G}$ is a point of support of $\Lambda$.

CHAPTER 2

RELATIONSHIPS BETWEEN COMPOUND AND COMPONENT ADMISSIBILITY

1. Introduction
In this chapter we consider some admissibility questions in the compound decision problem. In particular, we explore the relationship between admissibility of a compound decision procedure and the admissibility of the risk functions it selects for component decisions. We define three criteria which we are interested in comparing.

Definition 1. (A) A compound rule $\underline{t}$ is admissible if there does not exist a compound rule $\underline{t}^*$ such that $R(\underline{\theta},\underline{t}^*) \le R(\underline{\theta},\underline{t})$ for all $\underline{\theta}$ in $\Theta^n$ and $R(\underline{\theta},\underline{t}^*) < R(\underline{\theta},\underline{t})$ for some $\underline{\theta}$ in $\Theta^n$.

Definition 2. (CA) A compound rule $\underline{t}$ is component admissible if, for every $\alpha$ in $\{1,2,\dots,n\}$, every $\check{\theta}_\alpha$-section of $R_\alpha(\underline{\theta},\underline{t})$ is an admissible risk function in the component.

Definition 3. (CCA) A compound rule $\underline{t}$ is conditionally component admissible if, for each $\alpha = 1,2,\dots,n$ and $\check{\theta}_\alpha$ in $\Theta^{n-1}$, almost all $[P_{\check{\theta}_\alpha}]$ $\check{x}_\alpha$-sections of $t_\alpha$ are admissible decision rules in the component.

Of course, (A) is the standard admissibility definition applied to compound decision rules $\underline{t}$. Gilliland and Hannan (1974) discuss the restricted compound decision problem wherein only compound rules taking values in a specified restricted component risk set are considered. For example, if the component risk set is restricted to the admissible risk functions, then all $\check{x}_\alpha$-sections of $t_\alpha$ are admissible decision rules in the component, $\alpha = 1,2,\dots,n$. In the risk function ($S$-game) notation of Gilliland and Hannan, the risk function $s_\alpha(\check{x}_\alpha)$ corresponding to the $\check{x}_\alpha$-section of $t_\alpha$ is an admissible risk function for all $\check{x}_\alpha$, $\alpha = 1,2,\dots,n$. Our definition of (CCA) imposes this condition for almost every $[P_{\check{\theta}_\alpha}]$ $\check{x}_\alpha$.

The unconditional component $\alpha$ risk is the average $R_\alpha(\underline{\theta},\underline{t}) = \int s^{\theta_\alpha}_\alpha(\check{x}_\alpha)\,P_{\check{\theta}_\alpha}(d\check{x}_\alpha)$ and (CA) requires that $\check{\theta}_\alpha$-sections be admissible risk functions. Any simple rule $\underline{t}$ where $t_1,t_2,\dots,t_n$ are admissible component decision functions is both (CA) and (CCA).
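The notion of an admissible risk function used in Definitions 1-3 can be checked mechanically when only finitely many risk vectors are under consideration: a vector is admissible within the collection if no other vector is at least as good in every state and strictly better in some state. The Python sketch below assumes such a finite collection and ignores randomization over it, whereas the thesis works with a general risk set $S$; the function names and the numbers are illustrative only.

```python
def dominates(a, b):
    """True if risk vector a is <= b in every state and < b in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def admissible(points):
    """Return the risk vectors in `points` that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-state risk vectors (s^0, s^1).
candidates = [(0.1, 0.9), (0.3, 0.4), (0.5, 0.5), (0.9, 0.1), (0.3, 0.6)]
lower_boundary = admissible(candidates)
```

Here (0.5, 0.5) and (0.3, 0.6) are both dominated by (0.3, 0.4), so only the remaining three vectors survive.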
The Stein example of the inadmissibility of the usual estimator of the mean of a multivariate normal, squared error loss, $n \ge 3$, shows that, in general, (A) need not follow from (CA) and (CCA).

Copas (1974) has established necessary conditions for the admissibility of an equivariant compound decision rule when the component is a 2-state, 2-act decision problem. One such necessary condition is that the rule be cut-point in nature, that is, that there exist a symmetric function $A^*$ so that component decisions 0 and 1 are made according to whether $f_1(x_\alpha)/f_0(x_\alpha) < A^*(\underline{x})$ or $f_1(x_\alpha)/f_0(x_\alpha) > A^*(\underline{x})$, where $f_1/f_0$ is the likelihood ratio in the component. In the next section we establish a necessary condition and a sufficient condition for (A) for some finite state components.

Our definitions (A), (CA), (CCA) refer to fixed $n$. Usually the term compound decision rule refers to a specification of decision procedures for each $n = 1,2,\dots$ so that (A), (CA), (CCA) could then be required for every $n$.

2. Relationships Between Compound and Component Admissibility

We now summarize the results concerning (A), (CA) and (CCA) to be established in this section for finite state components.

(CA) $\Rightarrow$ (A)      (Theorem 2)
(A) $\Rightarrow$ (CCA)     (Theorem 1)
(CA) $\Rightarrow$ (CCA)    (Remark 1)

The implication (CA) $\Rightarrow$ (CCA) will be seen to be trivial. The implication (CA) $\Rightarrow$ (A) will be proved for two-state components and equivariant $\underline{t}$ only. All other pairwise implications do not hold, as shown by Examples 1-3.

Example 1. (CCA) $\not\Rightarrow$ (A). Consider the $2 \times 2$ component testing problem used by Robbins (1951) to introduce compound decision theory. Here $P_\theta$ is $N(2\theta - 1, 1)$, $\theta \in \{0,1\}$. The compound decision rule exhibited for this problem by Robbins (1951, (37)) is $\underline{t}$ where $t_\alpha(\underline{x}) = 1$ if $x_\alpha > c(\underline{x})$ and $t_\alpha(\underline{x}) = 0$ if $x_\alpha < c(\underline{x})$, $\alpha = 1,2,\dots,n$, and $c$ is the symmetric function with $c(\underline{x}) = \infty$ for $\bar{x} \le -1$ and $c(\underline{x}) = \tfrac{1}{2}\ln\frac{1-\bar{x}}{1+\bar{x}}$ for $-1 < \bar{x}$ [...]

[...] (CA) $\Rightarrow$ (CCA) is trivial because the points selected as $\check{\theta}_\alpha$-sections for the (CA) rules must be on the lower boundary of the risk set $S$.
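These points are probability-weighted averages of the $s_\alpha(\check{x}_\alpha)$, and the only way for that average to be on the lower boundary of the convex set $S$ is for the $s_\alpha(\check{x}_\alpha)$ to be on the lower boundary a.e. $[P_{\check{\theta}_\alpha}]$, i.e. to be (CCA).

Theorem 1. Suppose that $\Theta = \{0,1,\dots,m\}$ and that the risk set $S$ is a compact convex subset of $[0,\infty)^{m+1}$. Then (A) $\Rightarrow$ (CCA).

Proof. Suppose $\underline{t}$ is not (CCA). Then there exists an $\alpha$, $\check{\theta}_\alpha$ and a set $C \subset \mathcal{X}^{n-1}$ such that $P_{\check{\theta}_\alpha}(C) > 0$ and, for all $\check{x}_\alpha \in C$, $s_\alpha(\check{x}_\alpha)$ is not in $B$, the lower boundary of $S$. Therefore, for each $\check{x}_\alpha \in C$ there exists $s^*_\alpha(\check{x}_\alpha) \in S$ such that $s^*_\alpha(\check{x}_\alpha) < s_\alpha(\check{x}_\alpha)$, where for vectors $a$ and $b$, $a < b$ means that $a_i \le b_i$ for all $i$ with $a_i < b_i$ for some $i$. Extend the domain of definition of $s^*_\alpha$ to all of $\mathcal{X}^{n-1}$ by defining $s^*_\alpha(\check{x}_\alpha) = s_\alpha(\check{x}_\alpha)$ for $\check{x}_\alpha \notin C$. Now let $C_i = \{\check{x}_\alpha : s^{*i}_\alpha(\check{x}_\alpha) < s^i_\alpha(\check{x}_\alpha)\}$, $i = 0,1,\dots,m$, and note that $C = \bigcup_i C_i$. Hence, at least one of the $C_i$ has positive probability, which we suppose to be $C_0$ without loss of generality. Note that (i) $s^*_\alpha(\check{x}_\alpha)$ [...] $> 0$.

Now consider $\underline{t}^* = (t_1,\dots,t_{\alpha-1},t^*_\alpha,t_{\alpha+1},\dots,t_n)$ where $t^*_\alpha$ is a decision rule whose $\check{x}_\alpha$-section has risk point $s^*_\alpha(\check{x}_\alpha)$ for all $\check{x}_\alpha$. Then $R(\underline{\theta},\underline{t}^*) \le R(\underline{\theta},\underline{t})$ for all $\underline{\theta} \in \Theta^n$ and $R(\underline{\theta}^0,\underline{t}^*) < R(\underline{\theta}^0,\underline{t})$ where $\underline{\theta}^0 = (\theta_1,\dots,\theta_{\alpha-1},0,\theta_{\alpha+1},\dots,\theta_n)$.

That a measurable $s^*_\alpha$ can be chosen, ensuring the existence of $t^*_\alpha$, is demonstrated by the following argument. For a point $s$ in $S$ that is not in $B$, let $c(s)$ be the real number such that $s - c(s)\cdot\underline{1}$ is in $B$, where $\underline{1}$ is the vector $(1,1,\dots,1)$. Such a $c(s)$ exists because of the compactness and convexity of $S$. Then let $s^*_\alpha(\check{x}_\alpha) = s_\alpha(\check{x}_\alpha)$ for $\check{x}_\alpha$ not in $C$ and $s^*_\alpha(\check{x}_\alpha) = s_\alpha(\check{x}_\alpha) - c(s_\alpha(\check{x}_\alpha))\cdot\underline{1}$ for $\check{x}_\alpha$ in $C$. The $s^*_\alpha(\check{x}_\alpha)$ is then a measurable function of $\check{x}_\alpha$ since $s_\alpha(\check{x}_\alpha)$ is measurable. □

Whereas (CCA) is a necessary condition for (A), our next result shows that (CA) is a sufficient condition for (A) for equivariant $\underline{t}$ in the two-state component case.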
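A proof or counterexample for the general finite state component has not been found. The (CA) condition is a very strong condition and, as can be seen in Examples 2 and 3, when the lower boundary of $S$ is strictly convex, only simple rules are (CA).

Theorem 2. Suppose that $\Theta = \{0,1\}$. If $\underline{t}$ is an equivariant rule which is (CA), then $\underline{t}$ is (A).

Proof. The compound risk of equivariant $\underline{t}$ depends on $\underline{\theta}$ only through the number $n_1$ of $\theta_\alpha = 1$, $\alpha = 1,2,\dots,n$. Displaying this through the notation $R(\underline{\theta},\underline{t}) = R((n_0,n_1),\underline{t})$ where $n_0 = n - n_1$, we have

$n R((n_0,n_1),\underline{t}) = \sum_{\alpha=1}^n R_\alpha((n_0,n_1),\underline{t})$
$\quad = \sum_{\alpha=1}^n [\theta_\alpha = 0]\,R_\alpha((n_0,n_1),\underline{t}) + \sum_{\alpha=1}^n [\theta_\alpha = 1]\,R_\alpha((n_0,n_1),\underline{t})$
$\quad = n_0 \int s^0(\check{x})\,(P_0^{n_0-1} \times P_1^{n_1})(d\check{x}) + n_1 \int s^1(\check{x})\,(P_0^{n_0} \times P_1^{n_1-1})(d\check{x})$
$\quad = n_0 R^0((n_0-1,n_1),\underline{t}) + n_1 R^1((n_0,n_1-1),\underline{t})$

where $R^0$ and $R^1$ are defined by position in the last line. Then

(*)  $R((0,n),\underline{t}) = R^1((0,n-1),\underline{t})$,
     $R((1,n-1),\underline{t}) = \frac{1}{n} R^0((0,n-1),\underline{t}) + \frac{n-1}{n} R^1((1,n-2),\underline{t})$,
     $\vdots$
     $R((n,0),\underline{t}) = R^0((n-1,0),\underline{t})$.

Suppose that $\underline{t}$ is not (A), but $\underline{t}$ is (CA). Since $\underline{t}$ is not (A), there is an equivariant $\underline{t}^*$ such that $R((j,n-j),\underline{t}^*) \le R((j,n-j),\underline{t})$ for all $j = 0,1,\dots,n$, with $R((j,n-j),\underline{t}^*) < R((j,n-j),\underline{t})$ for some such $j$, say $j_0$. On the other hand, since $\underline{t}$ is (CA), if, for any $j$, $R^0((j,n-1-j),\underline{t}^*) < R^0((j,n-1-j),\underline{t})$, then $R^1((j,n-1-j),\underline{t}^*) > R^1((j,n-1-j),\underline{t})$ and similarly, if for any $j$, $R^1((j,n-1-j),\underline{t}^*) < R^1((j,n-1-j),\underline{t})$, then $R^0((j,n-1-j),\underline{t}^*) > R^0((j,n-1-j),\underline{t})$.

Since $R((j_0,n-j_0),\underline{t}^*) < R((j_0,n-j_0),\underline{t})$, either $R^0((j_0-1,n-j_0),\underline{t}^*) < R^0((j_0-1,n-j_0),\underline{t})$ or $R^1((j_0,n-1-j_0),\underline{t}^*) < R^1((j_0,n-1-j_0),\underline{t})$. Suppose that $R^0((j_0-1,n-j_0),\underline{t}^*) < R^0((j_0-1,n-j_0),\underline{t})$. (The proof with the other assumption is exactly analogous.) Then, since $\underline{t}$ is (CA), $R^1((j_0-1,n-j_0),\underline{t}^*) > R^1((j_0-1,n-j_0),\underline{t})$. Because $R((j_0-1,n-j_0+1),\underline{t}^*) \le R((j_0-1,n-j_0+1),\underline{t})$ we also get $R^0((j_0-2,n-j_0+1),\underline{t}^*) < R^0((j_0-2,n-j_0+1),\underline{t})$. Then, again because $\underline{t}$ is (CA), $R^1((j_0-2,n-j_0+1),\underline{t}^*) > R^1((j_0-2,n-j_0+1),\underline{t})$.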
Proceeding inductively until the first argument is zero, we get $R^1((0,n-1),\underline{t}^*) > R^1((0,n-1),\underline{t})$, or that $R((0,n),\underline{t}^*) > R((0,n),\underline{t})$, which contradicts the assumption that $R((j,n-j),\underline{t}^*) \le R((j,n-j),\underline{t})$ for all $j$. Thus if $\underline{t}$ is not (A), $\underline{t}$ is not (CA). □

Theorem 2 concerns $m = 1$, the two-state problem. The following numerical example shows that the method of proof used for $m = 1$ will not work for $m = 2$ and $n = 2$; thus a new proof must be devised if Theorem 2 is to be generalized to arbitrary finite $m$ and $n$. Using notation exactly analogous to that in the proof of Theorem 2, suppose

$(R^0((1,0,0),\underline{t}),\ R^1((1,0,0),\underline{t}),\ R^2((1,0,0),\underline{t})) = (6,2,2)$
$(R^0((0,1,0),\underline{t}),\ R^1((0,1,0),\underline{t}),\ R^2((0,1,0),\underline{t})) = (2,6,2)$
$(R^0((0,0,1),\underline{t}),\ R^1((0,0,1),\underline{t}),\ R^2$ [...]

[...] $> 0$, and, for all $\underline{x}_{n-1} \in C$, $s_n(\underline{x}_{n-1})$ does not belong to $B$, the lower boundary of $S$. Since the uniform prior $U = (\frac{1}{m+1},\dots,\frac{1}{m+1})$ dominates every $G$, so that $P_U^{n-1}$ dominates every $P_G^{n-1}$, we have $P_U^{n-1}(C) > 0$. Let $t^*_n$ be as constructed in the proof of Theorem 1 (Section 2 of Chapter 2). Since $s^*_n \le s_n$ and $s^*_n < s_n$ on $C$ it follows that $E_G^{n-1} s^*_n(\underline{X}_{n-1}) \le E_G^{n-1} s_n(\underline{X}_{n-1})$ for all $G \in \mathcal{G}$ and $E_U^{n-1} s^*_n(\underline{X}_{n-1}) < E_U^{n-1} s_n(\underline{X}_{n-1})$. Since $R_n(G,t_n) = \sum_{i=0}^m G_i\,E_G^{n-1} s^i_n(\underline{X}_{n-1})$ for $G \in \mathcal{G}$, and similarly for $t^*_n$, and since $U$ puts positive mass on every coordinate, it follows that $R_n(G,t^*_n) \le R_n(G,t_n)$ for all $G \in \mathcal{G}$ and $R_n(U,t^*_n) < R_n(U,t_n)$. □

Our next result shows that for finite $\Theta$ and equivariant $\underline{t}$, the empirical Bayes admissibility of $t_n$ (Definition 4) implies the compound admissibility of $\underline{t}$ (Definition 1).

Theorem 4. Suppose that $\Theta = \{0,1,\dots,m\}$. Let $\underline{t}$ be an equivariant compound decision rule. If $t_n$ is empirical Bayes admissible, then $\underline{t}$ is compound admissible.

Proof. The compound risk of $\underline{t}$ at $\underline{\theta}$ is a function of $\underline{\theta}$ through $\underline{n} = (n_0,n_1,\dots,n_m)$ where for $i = 0,1,\dots,m$, $n_i$ is the number of $\theta_\alpha = i$, $\alpha = 1,2,\dots,n$.
Let the compound risk of $\underline{t}$ at $\underline{\theta}$ be denoted by $R(\underline{n},\underline{t})$. If $\underline{t}$ is inadmissible, there exists a $\underline{t}^*$ such that $R(\underline{\theta},\underline{t}^*) \le R(\underline{\theta},\underline{t})$ for all $\underline{\theta} \in \Theta^n$ with $R(\underline{\theta}^0,\underline{t}^*) < R(\underline{\theta}^0,\underline{t})$ for some $\underline{\theta}^0 \in \Theta^n$. By the finiteness of the group of $n!$ permutations and Theorem 4.3.2 of Ferguson (1967), $\underline{t}^*$ can be taken to be equivariant. Therefore, it follows that $R(\underline{n},\underline{t}^*) \le R(\underline{n},\underline{t})$ for all $\underline{n}$ with $R(\underline{n}^0,\underline{t}^*) < R(\underline{n}^0,\underline{t})$. Let $G$ be any distribution on $\Theta$. The $G^n$-average of $R_\alpha(\underline{\theta},\underline{t})$ is constant with respect to $\alpha$ for equivariant $\underline{t}$ so that $R_n(G,t^*_n) \le R_n(G,t_n)$. Furthermore, if $H^n$ puts positive mass on $\underline{\theta}^0$, then $R_n(H,t^*_n) < R_n(H,t_n)$. Therefore, $t_n$ is inadmissible according to Definition 4. □

Example 4. Empirical Bayes (CCA) $\not\Rightarrow$ (A). Consider the Robbins component and bootstrap rule $\underline{t}$ of Example 1. As demonstrated there, each $\underline{x}_{n-1}$-section of $t_n$ is an admissible decision rule in the component. Hence, $t_n$ is (CCA) according to Definition 5. However, $\underline{t}$ is equivariant and, as shown in Appendix A, $\underline{t}$ is inadmissible in the compound problem if $n \ge 2$. Therefore, by Theorem 4, $t_n$ is not (A).

As seen by example, conditional component admissibility is not sufficient for the admissibility of an empirical Bayes procedure. The following observation leads to a characterization of empirical Bayes admissibility with which it is easy to show that certain Bayes empirical Bayes procedures are admissible.

Let $n$ be given and consider the statistical decision problem with states $G \in \mathcal{G}$, observation $\underline{X} = (X_1,\dots,X_{n-1}) \sim P_G^{n-1}$, actions $d \in \mathcal{D}$, where $\mathcal{D}$ is the class of all component decision rules, decision rules $t$ which are $\mathcal{B}^{n-1}$-measurable mappings into $\mathcal{D}$, and loss function $R(G,d)$. The risk of $t$ is $E_G^{n-1} R(G,t(\underline{X})) = R_n(G,t_n)$ where $t_n$ is the empirical Bayes rule $t_n(X_1,\dots,X_n) = (t(\underline{X}))(X_n)$. Thus, $t_n$ is empirical Bayes admissible (Definition 4) if and only if $t$ is admissible in the usual sense in the decision problem $(\mathcal{G},\mathcal{D},R,R_n)$ defined above.

Theorem 5. Let $\Theta = \{0,1,\dots,m\}$, $S \subset [0,\infty)^{m+1}$, and suppose that $n \ge 2$ is given.
Let $\Lambda$ be a prior distribution on $\mathcal{G}$, the $m$-dimensional simplex of distributions on $\Theta$, and suppose that the support of $\Lambda$ is all of $\mathcal{G}$. If $t$ is a decision rule in the decision problem $(\mathcal{G},\mathcal{D},R,R_n)$ which is Bayes with respect to $\Lambda$ and $\int R_n(G,t)\,\Lambda(dG)$ is finite, then $t_n = t$ is an admissible empirical Bayes decision rule (Definition 4).

Proof. The proof used by Ferguson (1967, Theorem 2.3.3) covers the present situation. Here $\mathcal{G}$ is the $m$-dimensional simplex in $R^{m+1}$, whereas the parameter set is the real line in Theorem 2.3.3. Here the decision problem $(\mathcal{G},\mathcal{D},R,R_n)$ has risk functions $E_G^{n-1} R(G,t(\underline{X})) = E_G^{n-1}\big(\sum_{i=0}^m G_i\,R(i,t(\underline{X}))\big)$ which are continuous functions of $G$. □

2. Bayes Procedures in the Finite State Component Empirical Bayes Problem

One can regard the empirical Bayes decision problem as a classical decision problem with parameter $G \in \mathcal{G}$. Therefore, it is not surprising that Bayesian decision rules have been suggested for use in the empirical Bayes problem, for example, Lindley (1971, §12.1), Tucker (1963), Meeden (1972) and Shapiro (1974). Tucker (1963) and Meeden (1972), for certain infinite state component decision problems, have demonstrated the (empirical) Bayes risk consistency of Bayes procedures. Since asymptotic optimality in empirical Bayes problems is Bayes risk consistency (at every $G$), the Tucker and Meeden Bayes rules "solve" certain empirical Bayes problems.

Robbins (1951, §3) conjectured that Bayes procedures might be solutions to the compound decision problem. Since for equivariant $\underline{t}$, compound risk convergence to the simple envelope for every $\underline{\theta}$ in the compound problem implies that $t_n$ is (empirical) Bayes risk consistent (for every $G$), as observed by Gilliland and Hannan (1974), Robbins' conjecture has implications for the Bayesian approach to empirical Bayes. In fact the two state results of Gilliland, Hannan and Huang (1974) demonstrate the (empirical) Bayes risk consistency for a large class of priors and two state components.
Shapiro (1974) considered a testing component and investigated the asymptotic properties of the Bayes procedures and the average loss across component decisions.

Rolph (1968) and Ferguson (1974) have investigated the problems of placing prior distributions on classes of distributions $\mathcal{G}$. This problem is trivial in the case $\Theta = \{0,1,\dots,m\}$ where $\mathcal{G}$ is a subset of Euclidean $(m+1)$-space. This problem is difficult for non-finite $\Theta$, particularly when tractable Bayes procedures are sought in order to make the consistency question tractable. Ferguson (1974) claims there are at least two desirable characteristics of such prior distributions: (1) the support of the prior with respect to some suitable topology on the space of probability measures should be large, and (2) the posterior distribution given a sample from the true probability measure should be manageable analytically. The resulting Bayes rules then are desirable since they are generally admissible and have nice large sample properties.

Throughout this chapter $\Theta = \{0,1,\dots,m\}$, $S$ is a compact subset of $[0,\infty)^{m+1}$, $\mathcal{G}$ is the $m$-dimensional simplex of probability distributions $G = (G_0,G_1,\dots,G_m)$ on $\Theta$ and $\Lambda$ is a probability distribution on $\mathcal{G}$. The conditional on $\underline{X}_{n-1}$ risk of the empirical Bayes rule $t = t_n$ at $G$ is

(10)  $R(G,t) = \sum_{i=0}^m G_i\,R(i,t)$

and the (empirical) Bayes risk at $G$ is

(11)  $R_n(G,t) = E_G^{n-1} R(G,t)$.

The "second level Bayes risk" of $t$ with respect to $\Lambda$ is

(12)  $B(\Lambda,t) = \int R_n(G,t)\,\Lambda(dG)$.

Lemma 1. $B(\Lambda,t)$ is minimized by $t = t_{G^*}$ where $G^* = E[G \mid \underline{X}_{n-1}]$ and $t_{(\cdot)}$ denotes a Bayes response in the component problem.

Proof. Let $P$ denote the marginal distribution of $\underline{X}_{n-1}$, hereafter denoted by $\underline{X}$, and let $Q$ denote the conditional distribution of $G$ given $\underline{X}$. By (12) and (11),

(13)  $B(\Lambda,t) = \iint R(G,t(\underline{x}))\,Q(dG)\,P(d\underline{x})$.

Substituting (10) and interchanging integral and finite sum we obtain

(14)  $B(\Lambda,t) = \int \sum_{i=0}^m \Big(\int G_i\,Q(dG)\Big) R(i,t(\underline{x}))\,P(d\underline{x}) = \int R(G^*,t(\underline{x}))\,P(d\underline{x})$

which is minimized by the choice $t(\underline{x}) = t_{G^*}$. □
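Lemma 1 can be illustrated numerically for a two-state component. The Python sketch below assumes the Robbins testing component of Example 1, $P_\theta = N(2\theta - 1, 1)$, with 0-1 loss, and approximates $\Lambda$ by a uniform distribution on a midpoint grid over $G_1 \in (0,1)$. The grid approximation, the function names and the sample values are hypothetical choices made for illustration and are not part of the thesis. The sketch first computes $G^* = E[G \mid \underline{X}_{n-1}]$ and then applies the component Bayes rule $t_{G^*}$ to $x_n$.

```python
import math


def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def posterior_mean_g1(past_x, grid_size=1000):
    """Approximate G*_1 = E[G_1 | x_1, ..., x_{n-1}] under a uniform prior on G_1."""
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]
    log_w = []
    for g in grid:
        # Log-likelihood of the mixture density f_G = (1 - g) f_0 + g f_1.
        log_w.append(sum(
            math.log((1.0 - g) * normal_pdf(x, -1.0) + g * normal_pdf(x, 1.0))
            for x in past_x))
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # rescaled posterior weights
    return sum(g * wi for g, wi in zip(grid, w)) / sum(w)


def bayes_response(g1, x):
    """Component Bayes rule t_G under 0-1 loss: decide 1 iff G_1 f_1(x) >= G_0 f_0(x)."""
    return 1 if g1 * normal_pdf(x, 1.0) >= (1.0 - g1) * normal_pdf(x, -1.0) else 0


def bayes_empirical_bayes(past_x, x_n):
    """Second level Bayes decision for component n, namely t_{G*}(x_n)."""
    return bayes_response(posterior_mean_g1(past_x), x_n)
```

For instance, `bayes_empirical_bayes([1.0] * 20, 0.0)` decides 1, because twenty past observations at 1 pull $G^*_1$ above 1/2, while the same call with past observations at -1 decides 0.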
It is interesting to note that the result of Lemma 1 can also be obtained as a corollary to results of Gilliland and Hannan (1974). They show on pp. 10-11 that a Bayes compound rule versus a symmetric prior $\beta$ on $\Theta^n$ in the compound decision problem has $t_\alpha(\check{x}_\alpha) = t_{\underline{w}}$, $\alpha = 1,2,\dots,n$, where $\underline{w} = (w_0,w_1,\dots,w_m)$ is given by their (40), namely

(15)  $w_i = \sum_{\underline{n}} \beta_{\underline{n}}\,\frac{n_i}{n}\,\Big(\times_{j=0}^m f_j^{\,n_{ji}}\Big)^*$.

Here the sum ranges over all $\binom{n+m}{m}$ empirical distributions $\underline{n}$ of $\underline{\theta}$, $f_j = dP_j/d\mu$, $*$ denotes symmetrization, and $n_{ji} = n_j$ if $j \ne i$, $n_{ji} = n_i - 1$ if $j = i$, $i,j = 0,1,\dots,m$. Consider $\beta = \beta(\Lambda)$ defined by

(16)  $\beta_{\underline{n}}(\Lambda) = \int \binom{n}{\underline{n}} \Big(\prod_{j=0}^m G_j^{\,n_j}\Big)\,\Lambda(dG)$;

Gilliland and Hannan (1974, Remark 2) show that for such $\beta$, $t_{\underline{w}(\underline{x}_{n-1})}$ is a Bayes ($\Lambda$) empirical Bayes procedure. Using (16) in (15) we find that

(17)  $w_i = \int G_i \sum_{\underline{n}} \binom{n-1}{\underline{n}_i} \Big(\prod_{j=0}^m G_j^{\,n_{ji}}\Big)\Big(\times_{j=0}^m f_j^{\,n_{ji}}\Big)^*\,\Lambda(dG)$

where $\underline{n}_i = (n_{0i},\dots,n_{mi})$. Since the marginal density of $\underline{X}_{n-1}$ is $\prod_{a=1}^{n-1}\big(\sum_{i=0}^m G_i f_i(x_a)\big)$, the integrand of (17) is proportional to $G_i$ times the conditional density of $G$ given $\underline{X}_{n-1}$. Thus, $\underline{w}$ is proportional to $G^* = E[G \mid \underline{X}_{n-1}]$.

The empirical Bayes rule which is second level Bayes ($\Lambda$) has been seen to be the rule which is first level (component) Bayes versus the induced estimator $G^*$. We note that $G^*$ depends on $\Lambda$ through the conditional distribution $Q$. By Theorem 5, $t_n = t_{G^*}$ is an admissible empirical Bayes procedure. In the next section we give conditions on $\Lambda$ sufficient for the Bayes risk consistency of $t_{G^*}$ at every $G \in \mathcal{G}$.

3. Bayes Risk Consistency of the Bayes Procedure

The following lemma is a corollary to Lemma 1 of Oaten (1972).

Lemma 2 (Oaten (1972, p. 1167)). Let $R(i,t) \le M < \infty$ for all $i \in \{0,1,\dots,m\}$ and component decision rules $t$, that is, suppose $S \subset [0,M]^{m+1}$. Then for all $F,G \in \mathcal{G}$,

(18)  $0 \le R(G,t_F) - R(G) \le M \sum_{i=0}^m |F_i - G_i|$.

It follows from Lemmas 1 and 2 that if $S \subset [0,M]^{m+1}$ then the Bayes ($\Lambda$) empirical Bayes procedure $t_n = t_{G^*}$ satisfies

(19)  $0 \le R(G,t_n) - R(G) \le (m+1)^{1/2} M\,\|G^* - G\|$ a.s. $[P_G^{n-1}]$,
where $\|\cdot\|$ denotes the usual Euclidean norm on $R^{m+1}$. Hence, the a.s. $[P_G^\infty]$ consistency of the conditional expectation $G^* = E[G \mid \underline{X}_{n-1}]$ at $G$ implies the strong empirical Bayes risk consistency of $t_n$ at $G$. In turn, by the boundedness of $\|G^* - G\|$, this implies the mean or usual empirical Bayes consistency of $t_n$ at $G$.

We will use Theorem 6.1 of Schwartz (1965) to establish the a.s. $[P_G^\infty]$ consistency of $E[G \mid \underline{X}_{n-1}]$ at $G$ for all $G \in \mathcal{G}$. For this purpose we establish some lemmas which serve to verify the hypotheses of the Schwartz theorem for our application. In what follows superscript $c$ indicates complementation.

Lemma 3. Let $G$ be any point of $\mathcal{G}$ and $V$ any $\mathcal{G}$ neighborhood of $G$. There is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F: F \in V^c\}$.

Proof. Fix $G \in \mathcal{G}$. It suffices to prove the result for neighborhoods $V$ of the form

$V = \{(\underline{G}_0,\overline{G}_0) \times (\underline{G}_1,\overline{G}_1) \times \dots \times (\underline{G}_m,\overline{G}_m)\} \cap \mathcal{G}$

where $\underline{G}_i < G_i < \overline{G}_i$ for $i = 0,1,\dots,m$. If we identify $[0,\underline{G}_i]$ with a subset of $R^{m+1}$, namely, $\{(x_0,x_1,\dots,x_m): x_i \in [0,\underline{G}_i]\}$, and similarly for $[\overline{G}_i,1]$, then $V^c = \bigcup_{i=0}^m \{([0,\underline{G}_i] \cup [\overline{G}_i,1]) \cap \mathcal{G}\}$. Let $[0,\underline{G}_i] \cap \mathcal{G} = \underline{U}_i$ and $[\overline{G}_i,1] \cap \mathcal{G} = \overline{U}_i$ for $i = 0,1,\dots,m$. Note then that if $G$ is on the boundary of $\mathcal{G}$, then one or more of the corresponding $\underline{U}_i$ or $\overline{U}_i$ is empty. Let $\mathcal{U} = \{\underline{U}_i: i = 0,1,\dots,m\} \cup \{\overline{U}_i: i = 0,1,\dots,m\}$. Then, since $\mathcal{U}$ is a finite collection of sets and $\bigcup \mathcal{U} = V^c$, by Kraft (1955, Theorem 7) it suffices to show that for each nonempty $U \in \mathcal{U}$, there is a uniformly consistent test of $P = P_G$ against the alternative $P \in \{P_F: F \in U\}$.

Take $U = \overline{U}_i$ nonempty. (The case $U = \underline{U}_i$ is exactly analogous.) Let $h_i$ be a function such that

(20)  $E_j h_i = \int h_i\,dP_j = 1$ if $j = i$, and $= 0$ if $j \ne i$,  $j = 0,1,\dots,m$.

Suppose that $h_i$ is bounded by the constant $M_i$. (For example, one can take $h_0,h_1,\dots,h_m$ to be a basis dual to the densities $f_0,f_1,\dots,f_m$ in $L_2(\mu)$ as observed, e.g., by Robbins (1964).) Now for each $n$ and $\underline{x} \in \mathcal{X}^\infty$ define

(21)  $\bar{h}_{i,n}(\underline{x}) = \frac{1}{n}\sum_{\alpha=1}^n h_i(x_\alpha)$

and the test function

(22)  $\varphi_n = [\bar{h}_{i,n} > c]$

where $c = \tfrac{1}{2}(G_i + \overline{G}_i)$.
We will show that $\varphi_n$ is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F: F \in \overline{U}_i\}$. Clearly $E_G \varphi_n \to 0$ since $E_G h_i = G_i < c$. Now let $F \in \overline{U}_i$. Then $F_i \ge \overline{G}_i$ so that for each $n = 1,2,\dots$

(23)  $E_F(1 - \varphi_n) \le P_F\big[-(\bar{h}_{i,n}(\underline{x}) - F_i) \ge \overline{G}_i - c\big]$.

By the Hoeffding bound (1963, Theorem 2), RHS (23) is bounded by $\exp\{-B_i n\}$ where $B_i$ is a constant depending only on $M_i$ and $\overline{G}_i - c$. Hence $\varphi_n$ is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F: F \in \overline{U}_i\}$. □

For $F,G \in \mathcal{G}$ we define

(24)  $KL(G,F) = E_G \ln\frac{f_G}{f_F}$,

the Kullback-Leibler information number between the mixture densities $f_G$ and $f_F$.

Lemma 4. Suppose that the support of $\Lambda$ is all of $\mathcal{G}$. Let $G$ be any point of $\mathcal{G}$ and $V$ any $\mathcal{G}$ neighborhood of $G$. Given $\varepsilon > 0$ there exists a subset $W \subset V$ such that $\Lambda(W) > 0$ and $KL(G,F) < \varepsilon$ for all $F \in W$.

Proof. Fix $G \in \mathcal{G}$ and let $V$ be a $\mathcal{G}$ neighborhood of $G$. Let $\varepsilon > 0$ be given. For each $\delta > 0$ we define

$U_\delta = \{F: G_i e^{-\delta} \le F_i \le G_i e^{-\delta} + 1 - e^{-\delta},\ i = 1,2,\dots,m\}$,
$V_\delta = \{F: G_i e^{-\delta} \le F_i,\ i = 1,2,\dots,m,\ \text{and}\ \sum_{i=1}^m F_i \le e^{-\delta}\sum_{i=1}^m G_i + 1 - e^{-\delta}\}$,
$W_\delta = \{F: G_i e^{-\delta} \le F_i \le G_i e^{-\delta} + \tfrac{1}{m}(1 - e^{-\delta}),\ i = 1,2,\dots,m\}$

and note that $W_\delta \subset V_\delta \subset U_\delta$. We see that $U_\delta \subset V$ for sufficiently small $\delta$. Let $\delta_0$ be such that $U_{\delta_0} \subset V$ and $\delta_0 < \varepsilon$. Since on $V_{\delta_0}$, $F_0 \ge e^{-\delta_0} G_0$, we see that $f_F \ge e^{-\delta_0} f_G$ on $V_{\delta_0}$ so that $KL(G,F) < \varepsilon$ for $F \in V_{\delta_0}$. Since each $W_\delta$ contains a nonempty open subset of $\mathcal{G}$ and every point of $\mathcal{G}$ is a point of support of $\Lambda$, $\Lambda(V_{\delta_0}) > 0$. □

Lemmas 3 and 4 verify hypotheses (ii) and (iii), respectively, of the following theorem, which has been converted to our notation. Recall that $Q$ denotes the conditional distribution of $G$ given $\underline{X}_{n-1} = (X_1,\dots,X_{n-1})$ and $G^* = E[G \mid \underline{X}_{n-1}]$.

Theorem 6. (Schwartz (1965, Theorem 6.1)). Suppose that (i) the densities $f_G(x)$ may be chosen to be jointly measurable in $G$ and $x$, (ii) $V$ is a neighborhood of $G$ and there is a uniformly consistent test of $P = P_G$ versus $P \in \{P_F: F \in V^c\}$, and (iii) for every $\varepsilon > 0$, $V$ contains a subset $W$ such that $\Lambda(W) > 0$ and $KL(G,F) < \varepsilon$ on $W$.
Then $Q(V^c) \to 0$ a.s. $[P_G^\infty]$.

Our final lemma will be used to complete the last step of the proof of the Bayes risk consistency of the Bayes ($\Lambda$) empirical Bayes rule $t_n = t_{G^*}$.

Lemma 5. Suppose that $\Lambda$ is a distribution on $\mathcal{G}$. Let $G \in \mathcal{G}$ be such that for every neighborhood $V$ of $G$, $Q(V^c) \to 0$ a.s. $[P_G^\infty]$. Then $\|G^* - G\| \to 0$ a.s. $[P_G^\infty]$.

Proof. Let $G \in \mathcal{G}$. Note that

(25)  $\|G^* - G\| = \Big\|\int (F - G)\,Q(dF)\Big\| \le \int \|F - G\|\,Q(dF)$

where the inequality is the Minkowski integral inequality (e.g., see Fabian and Hannan (1973, §1.5)). For each $\varepsilon > 0$ let $V_\varepsilon = \{F: \|F - G\| < \varepsilon\}$. Partitioning the integration into integration over $V_\varepsilon$ and over $V_\varepsilon^c$ and using the bound $\|F - G\| \le \sqrt{2}$, valid for any two points of the simplex, (25) yields

(26)  $\|G^* - G\| \le \varepsilon + \sqrt{2}\,Q(V_\varepsilon^c)$ a.s. $[P_G^\infty]$.

Since $\varepsilon > 0$ is arbitrary the proof is complete. □

Lemmas 1-5 and the Schwartz theorem are used to prove the main result of this section, namely

Theorem 7. Suppose $\Theta = \{0,1,\dots,m\}$ and $S$ is a compact set. Suppose $\Lambda$ is a probability distribution with support $\mathcal{G}$. Then a Bayes ($\Lambda$) empirical Bayes procedure $t_n$ is Bayes risk consistent at every $G \in \mathcal{G}$, i.e.

(27)  $R_n(G,t_n) \to R(G)$ for all $G \in \mathcal{G}$.

Proof. By Lemma 1, $t_n = t_{G^*}$ is a Bayes ($\Lambda$) empirical Bayes procedure. By Lemma 2, this $t_n$ satisfies (19) for each $n$. By Lemmas 3-5 and the Schwartz theorem (Theorem 6), $\|G^* - G\| \to 0$ a.s. $[P_G^\infty]$ for all $G \in \mathcal{G}$, from which (27) follows by the bounded convergence theorem. □

APPENDICES

APPENDIX A

In this Appendix we show that the Robbins (1951) bootstrap rule $\underline{t}$ defined in Example 1 is an inadmissible compound rule if $n = 2$. The demonstration for general $n \ge 2$ is similar but notationally complicated.

Since $\Theta^2$ is finite, if the equivariant rule $\underline{t}$ is admissible it is Bayes versus some invariant prior $\beta$ on the four states $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. Invariance of $\beta$ requires $\beta_{01} = \beta_{10}$. In Huang (1972) only $\beta$ with $\beta_{00} = \beta_{11}$ are considered. It can be shown for general invariant $\beta$, a Bayes compound rule with respect to $\beta$ must be equal a.e.
[Lebesgue on $R^2$] to the equivariant compound rule with

$$\begin{cases} 1 & \text{if } \tfrac{1}{2}(1 - \beta_{00} - \beta_{11})(A_2 - A_1) + \beta_{11} A_1 A_2 - \beta_{00} \ge 0 \\ 0 & \text{if } \tfrac{1}{2}(1 - \beta_{00} - \beta_{11})(A_2 - A_1) + \beta_{11} A_1 A_2 - \beta_{00} < 0 \end{cases}$$

where $A_i \equiv \exp\{2x_i\}$, $i = 1, 2$. Note that if $\beta_{11} > \beta_{00}$ then $(x_1, x_2) = (0,0)$ is in the interior of the region deciding $(\theta_1 = 1, \theta_2 = 1)$. The Robbins bootstrap rule of Example 1 has (0,0) on the boundary of the partition it induces in $R^2$, so it is not equal a.e. [Lebesgue on $R^2$] to a rule Bayes with respect to $\beta$ with $\beta_{11} > \beta_{00}$. The same holds for $\beta$ with $\beta_{11} < \beta_{00}$. The partitions induced by the Robbins rule and by the Bayes rule for $\beta_{11} = \beta_{00} = \tfrac{1}{3}$ are shown in Figure 1 of Huang (1972). The separating curves for the Bayes rule partition have vertical and horizontal asymptotes for every $\beta$. The Robbins rule partition has separating curves with asymptotes $x_1 + x_2 = 2$, $x_1 + x_2 = -2$, so that the Robbins rule is not equal a.e. [Lebesgue on $R^2$] to any such Bayes rule. Thus, the Robbins bootstrap rule is inadmissible.

APPENDIX B

Theorem. In the 2 x 2 component testing problem, if $Z = f_1(X)/f_0(X)$ has continuous cumulative distribution function on $(0, \infty)$ under both $P_0$ and $P_1$, then the lower boundary $B$ of the risk set $S$ is strictly convex.

Proof. Let $s_1 = (s_1^0, s_1^1)$ and $s_2 = (s_2^0, s_2^1)$ be distinct points on $B$, corresponding to rules $t_1$ and $t_2$ respectively, and assume without loss of generality that $s_1^0 > s_2^0$. Thus, $s_2^1 > s_1^1$. From the Neyman-Pearson lemma, we know that $t_1$ is of the form

$$t_1 = \begin{cases} 1 & \text{if } Z > k_1 \\ \gamma_1 & \text{if } Z = k_1 \\ 0 & \text{if } Z < k_1 \end{cases}$$

where $k_1 \ge 0$, and similarly, there is a $k_2 > k_1$ associated with $t_2$. We assume for the moment that $k_1 \ne 0$, $k_2 \ne \infty$. Then, by the continuity of the distribution of $Z$, we know that the probability that $Z = k_1$ or $Z = k_2$ is zero under either $P_0$ or $P_1$. We arbitrarily assign $\gamma_i = 1$, $i = 1, 2$ and write

$s_i = (P_0[Z \ge k_i],\ P_1[Z < k_i])$, $i = 1, 2$.

Let $\beta$ be given, $0 < \beta < 1$, and define $s_\beta = \beta s_1 + (1 - \beta) s_2$ and note that $s_\beta$ is the risk point associated with the rule
$t_\beta = \beta t_1 + (1 - \beta) t_2$, namely,

$$t_\beta = \begin{cases} 1 & \text{if } Z \ge k_2 \\ \beta & \text{if } k_1 \le Z < k_2 \\ 0 & \text{if } Z < k_1 \end{cases}$$

Since $t_\beta$ does not possess Neyman-Pearson structure, $s_\beta$ is not on the lower boundary $B$. ($s_\beta$ is dominated by any Neyman-Pearson test of size $s_\beta^0$.) The cases $k_1 = 0$ and $k_2 = \infty$ can be handled in analogous ways, the only difference being the choice of $\gamma_1, \gamma_2$. (The points of $B$ on the two axes are $(P_0[Z \ge \infty],\ P_1[Z < \infty])$ and $(P_0[Z > 0],\ P_1[Z \le 0])$.)

REFERENCES

Brown, L.D. and Purves, R. (1973). Measurable selections of extrema. Ann. Statist. 1 902-912.

Copas, J.B. (1974). On symmetric compound decision rules for dichotomies. Ann. Statist. 2 199-204.

Fabian, Vaclav and Hannan, James (1973). Introduction to Probability and Mathematical Statistics. Lecture notes, Statistics and Probability, MSU.

Ferguson, Thomas (1967). Mathematical Statistics, A Decision Theoretic Approach. Academic Press, New York and London.

Ferguson, Thomas S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2 615-629.

Gilliland, Dennis C. and Hannan, James (1974). The finite state compound decision problem, equivariance and restricted risk components. RM-317, Statistics and Probability, MSU.

Gilliland, Dennis C., Hannan, James and Huang, J.S. (1974). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. RM-320, Statistics and Probability, MSU.

Gilliland, Dennis C. and Hannan, James (1976). Improved rates in the empirical Bayes monotone multiple decision problem with MLR family. RM-352, Statistics and Probability, MSU.

Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem with two completely specified distributions. Ann. Math. Statist. 26 37-51.

Hoeffding, Wassily (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.

Huang, J.S. (1972). A note on Robbins' compound decision procedure. Ann. Math. Statist. 43 348-350.
Kraft, Charles (1955). Some conditions for consistency and uniform consistency of statistical procedures. Univ. of Calif. Publications in Statist.

Lindley, D.V. (1971). Bayesian statistics, a review. Regional Conference Series in Applied Mathematics No. 2, SIAM, Philadelphia.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann. Math. Statist. 43 96-101.

Oaten, Allan (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 43 1164-1184.

Robbins, H. (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob., 131-148.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, University of California Press, 157-163.

Robbins, Herbert (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35 1-20.

Rolph, John E. (1968). Bayesian estimation of mixing distributions. Ann. Math. Statist. 39 1289-1302.

Schwartz, L. (1965). On Bayes' procedures. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete. 4 10-26.

Shapiro, C.P. (1974). Bayesian classification: asymptotic results. Ann. Statist. 2 763-774.

Tucker, Howard G. (1963). An estimate of the compounding distribution of a compound Poisson distribution. Theor. Prob. Appl. 8 195-200.

Van Houwelingen, J.C. (1973). On empirical Bayes rules for the continuous one-parameter exponential family. Doctoral thesis. Rijksuniversiteit te Utrecht, Netherlands (to appear in Ann. Statist. (1976)).