Michigan State University

This is to certify that the dissertation entitled "Asymptotically Optimal Bayes Compound and Empirical Bayes Estimators in Exponential Families with Compact Parameter Space" presented by Somnath Datta has been accepted towards fulfillment of the requirements for the Ph.D. degree.

Date: July 25, 1988

ASYMPTOTICALLY OPTIMAL BAYES COMPOUND AND EMPIRICAL BAYES ESTIMATORS IN EXPONENTIAL FAMILIES WITH COMPACT PARAMETER SPACE

By Somnath Datta

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1988

ABSTRACT

The problem of finding admissible, asymptotically optimal compound and empirical Bayes rules is pursued in the infinite state case. The component distributions considered in this work form a real exponential family of quite general nature with component parameter in a compact interval of the natural parameter space. The component problem estimates an arbitrary continuous transform of the natural parameter under squared error loss. We consider the set compound, sequence compound and empirical Bayes formulations of the above component problem and show that all Bayes estimators in the various formulations are admissible. Our main result is that any Bayes compound estimator versus a mixture of i.i.d. priors on the compound parameter is asymptotically optimal if the mixing hyperprior has full support. Analogously, any Bayes empirical Bayes estimator is asymptotically optimal if the empirical Bayes prior has full support.

The exponential family structure has been used to treat the difference in risks of the Bayes estimators and the component Bayes estimators versus the empirical distribution of the parameters for some special cases of the continuous transforms. The key to the proof of asymptotic optimality is an L1 consistency of posterior mixtures, itself a major finding of the thesis and extendible far beyond the exponential context. The thesis also derives an interesting uniform L1 LLN for random continuous functions on a compact metric space, which is applied in the proof of the last result.

The asymptotic optimality results are generalized to weighted squared error loss with continuous weight function, and applications to some non-exponential situations are also considered. Several examples of such hyperpriors/empirical Bayes priors are given and for some of them practically useful forms of the corresponding Bayes estimators are obtained.

To my parents and my wife

ACKNOWLEDGMENTS

I wish to express my deepest appreciation to my thesis adviser Professor James F. Hannan for everything he has done for me. His invaluable suggestions led the way to some results and his continuous careful criticism greatly improved the overall presentation of this thesis.

I would like to express my thanks to my other committee members, Professors Hira L. Koul, Dennis C. Gilliland and Clifford Weil, for their suggestions on an earlier draft.

Special thanks go to my wife Susmita for her total support and encouragement during the preparation of this thesis.

Finally, I would like to thank Ms. Loretta Ferguson for her help on numerous occasions during my typing of the manuscript.

TABLE OF CONTENTS

Chapter
0. INTRODUCTION
   0.1. The component problem
   0.2. The set compound problem
   0.3. The sequence compound problem
   0.4. The empirical Bayes problem
   0.5. Literature review and a summary of the present work
1. THE SET COMPOUND ESTIMATION
   1.1. Introduction
      1.1.1. Notations and conventions
      1.1.2. Exponential family component
   1.2. Estimators induced by priors on $\Omega$
      1.2.1. Bayes versus mixture of i.i.d. priors
      1.2.2. Admissibility
      1.2.3. A useful inequality on the modified regret
   1.3. A bound on the $L_1(P_\theta)$ distance between two component Bayes rules when $\phi(\theta) = e^{k\theta}$, $k \in N$
   1.4. Consistency of the posterior mixtures
   1.5. Asymptotic optimality
2. THE SEQUENCE COMPOUND ESTIMATION
   2.1. Introduction
   2.2. Bayes versus the mixture of i.i.d. priors
      2.2.1. The Bayes sequence compound estimator
      2.2.2. Admissibility
   2.3. Asymptotic optimality
3. THE EMPIRICAL BAYES ESTIMATION
   3.1. Introduction
   3.2. Bayes versus $\Lambda$
   3.3. Asymptotic optimality
4. EXAMPLES OF $\Lambda$ AND CONCLUDING REMARKS
   4.1. Introduction
   4.2. Examples of $\Lambda$
   4.3. The Bayes estimators
   4.4. Remarks
APPENDIX
   A.1. On bounding the difference of two ratios
   A.2. On the uniform convergence of convex functions
   A.3. A uniform L1-LLN for independent random continuous functions
   A.4. Admissibility of Bayes estimators in the compound problem under squared error loss
BIBLIOGRAPHY

CHAPTER 0

INTRODUCTION

We start with some notational conventions used throughout the body of the thesis. Let n be a positive integer.
An n-vector $(x_1, \ldots, x_n)$ is denoted by $\underline{x}$ and, for $1 \le a \le n$, $(x_1, \ldots, x_a)$ is denoted by $\underline{x}_a$. For n probabilities $P_1, \ldots, P_n$, $\times_{a=1}^{n} P_a$ denotes their measure theoretic product. Typically the letter P is used for probabilities and E for the corresponding expectations. For the sake of clarification, dummy variables are often displayed in integrals. Also mixed mode integral expressions like $\int X(w)\,dP$ are used. For a bounded function f, $f_*$ and $f^*$ denote its infimum and supremum respectively over its entire domain. For a measure m on the Borel $\sigma$-field of a topological space $\mathcal{T}$, the support of m is defined to be the set $\bigcap\{F \subset \mathcal{T} : F \text{ is closed and } m(F^c) = 0\}$. Note that the support of m is $\mathcal{T}$ iff $m(O) > 0$ for every non-empty open $O \subset \mathcal{T}$. R, I, N stand for the sets of reals, integers and non-negative integers respectively.

1. The component problem. The component problem has the structure of the usual decision theory problem, i.e. we have a parameter space $\Theta$, a family of probability measures $\{P_\theta : \theta \in \Theta\}$ on some common measurable space $\mathcal{X}$, an observable $\mathcal{X}$-valued random variable $X \sim P_\theta$ under $\theta$, an action space $\mathcal{A}$, a loss function $L: \mathcal{A} \times \Theta \to [0,\infty)$, and decision rules $t: \mathcal{X} \to \mathcal{A}$ such that $L(t,\theta)$ is measurable for each $\theta$, with risk $R(t,\theta) = E_\theta L(t,\theta)$.

2. The set compound problem. The set compound problem simultaneously considers a number, say n, of independent decision problems, each of which is structurally identical to the above component problem, and allows the use of observations from all the problems in each of the decisions. The compound loss is taken to be the average of all the component losses. Thus for each $n \ge 1$, the set compound problem can be formulated as a decision problem as follows. We have the parameter space $\Theta^n$, the action space $\mathcal{A}^n$, observations $\underline{X} = (X_1, \ldots, X_n) \sim P_{\underline{\theta}} = \times_{a=1}^{n} P_{\theta_a}$, $\underline{\theta} = (\theta_1, \ldots, \theta_n) \in \Theta^n$, and compound rules $\underline{t} = (t_1, \ldots, t_n)$, where for each $1 \le a \le n$, $t_a: \mathcal{X}^n \to \mathcal{A}$ is such that $L(t_a,\theta)$ is measurable for each $\theta$, with loss $L_n(\underline{t},\underline{\theta}) = n^{-1}\sum_{a=1}^{n} L(t_a,\theta_a)$ and risk

(2.1) $R_n(\underline{t},\underline{\theta}) = E_{\underline{\theta}}\, L_n(\underline{t},\underline{\theta})$.

Let $\Omega = \{\omega : \omega \text{ is a probability on } \Theta\}$. For $\omega \in \Omega$, let $R(\omega)$ stand for the minimum Bayes risk versus $\omega$ in the component problem, i.e. $R(\omega) = \bigwedge_t \int R(t,\theta)\,d\omega(\theta)$. For a traditional simple symmetric rule (i.e. $t_a(\underline{x}) = t(x_a)$ for all $1 \le a \le n$, for some component rule t) the compound risk is easily seen to be at least $R(G_n)$, $G_n$ being the empirical distribution of $\theta_1, \ldots, \theta_n$. In all non-trivial situations a component Bayes rule versus $G_n$ is unavailable to the statistician because $G_n$ is unknown, and hence $R(G_n)$ cannot be achieved (for any n) via the use of a simple symmetric rule. Thus compound rules which attain risks asymptotically no more than $R(G_n)$ are of interest. Hannan (1957) used the term 'approximation to Bayes risk' to describe such effects. For a compound rule $\underline{t}$, the difference $D_n(\underline{t},\underline{\theta}) = R_n(\underline{t},\underline{\theta}) - R(G_n)$ is called the modified regret of $\underline{t}$ at $\underline{\theta}$. We say that a rule $\underline{t}$ is asymptotically optimal (a.o.) if

(2.2) $\bigvee_{\underline{\theta}} D_n(\underline{t},\underline{\theta})_+ \to 0$ as $n \to \infty$.

For the relation of this notion of optimality to that with more stringent envelopes in the finite $\Theta$ case, see Gilliland and Hannan (1986). A set compound rule $\underline{t}$ is said to be admissible if for each $n \ge 1$, $R_n(\underline{t},\underline{\theta})$ is admissible in the usual decision theoretic sense as a function of $\underline{\theta}$ in the class of set compound rules.

3. The sequence compound problem. The sequence compound problem also considers a number, say n, of independent repetitions of a component problem but allows only data up to stage a in making the a-th decision for $1 \le a \le n$. Thus a sequence compound rule is $\underline{t} = (t_1, \ldots, t_n)$, $t_a: \mathcal{X}^a \to \mathcal{A}$ such that $L(t_a,\theta)$ is measurable for each $\theta$, with the interpretation that the a-th decision made with the use of $\underline{t}$ is $t_a(\underline{x}_a)$ for each $1 \le a \le n$. In the weak sequence compound (w.s.c.) version n is known to the statistician a priori, so that each $t_a$, $1 \le a \le n$, may use n. We will use $\underline{t}_n$ for a w.s.c. rule to show its possible dependency on n.
The other version, in which n is not available to the statistician, is called the strong sequence compound (s.s.c.) problem and is more interesting. In both versions we are interested in the asymptotic risk behavior of compound rules as n tends to infinity. Hence a w.s.c. rule can be viewed as a triangular array $\underline{t}_n$, $n \ge 1$, and an s.s.c. rule as a sequence $\underline{t} = (t_1, t_2, \ldots)$. It should be noted that an s.s.c. rule is automatically a w.s.c. rule. The risk (up to stage n) and the modified regret of a sequence compound rule $\underline{t}$ (or $\underline{t}_n$) are as given by (2.1) and (2.2) (with the understanding that $\underline{t}$ is viewed as a function on $\mathcal{X}^n$ via $t_a(\underline{x}) = t_a(\underline{x}_a)$ for $1 \le a \le n$). The notion of asymptotic optimality remains the same. A sequence compound rule $\underline{t}$ (or $\underline{t}_n$) is said to be admissible if, for each $n \ge 1$, $R_n(\underline{t},\underline{\theta})$ (or $R_n(\underline{t}_n,\underline{\theta})$) is admissible in the usual decision theoretic sense as a function of $\underline{\theta}$ in the class of sequence compound rules. [This is the natural definition for w.s.c. rules and therefore more demanding of s.s.c. rules.]

4. The empirical Bayes problem. The empirical Bayes problem considers a sequence of independent, identical Bayes decision problems. In this case the component problem is the same as that in Section 1 with the additional notion that the component parameter $\theta$ is a random element having a (prior) distribution $\omega$ on $\Theta$. Thus the risk of a component rule t is $R(t,\omega) = \int R(t,\theta)\,d\omega$. The prior $\omega$ is unknown to the statistician even though it is believed to exist. The empirical Bayes problem has generic point $\underline{\theta} = (\theta_1, \theta_2, \ldots)$ representing the true states and data $\underline{X} = (X_1, X_2, \ldots)$ from the problems, and the assumption is that $(\theta_1,X_1), (\theta_2,X_2), \ldots$ are i.i.d. copies of $(\theta,X)$ having distribution $\omega$ on $\theta$ and, conditional on $\theta$, $P_\theta$ on X. Let E denote the overall expectation. At stage n, a decision $t_n(\underline{X}_n)$ about $\theta_n$ is taken by $t_n: \mathcal{X}^n \to \mathcal{A}$ with loss $L(t_n,\theta_n)$, which is jointly measurable, and the risk incurred is $R_n(t_n,\omega) = E\,L(t_n,\theta_n) = \int\!\!\int L(t_n,\theta_n)\,dP_{\underline{\theta}_n}\,d\omega^n$. We call $\underline{t} = \langle t_n \rangle$ an empirical Bayes (e.B.) rule. An e.B. rule is called asymptotically optimal if $\lim_{n\to\infty} R_n(t_n,\omega) = R(\omega)$ for all $\omega \in \Omega$. An e.B. rule is said to be admissible if for each n, $R_n(t_n,\omega)$ is admissible in the usual decision theoretic sense as a function of $\omega$ in the class of possible $t_n$.

5. Literature review and a summary of the present work. The pioneering paper of compound decision theory is by Robbins (1951). His featured example was decision between N(-1,1) and N(1,1). He exhibited an a.o. compound procedure and called it asymptotically subminimax by comparison with the simple symmetric minimax rule.

A.o. compound and empirical Bayes rules have been worked out for many choices of component problem. Typically they are bootstrap (or delete bootstrap) in nature, that is, rules whose components are Bayes versus some estimate of the unknown $G_n$ (or $\omega$ in the e.B. case) or direct estimates of the Bayes rule versus $G_n$ (or $\omega$). In particular, when the component problem is an estimation problem under squared error loss, Gilliland (1968) and Singh (1974) obtained a.o. sequence compound rules with rates (we say $\underline{t}$ is a.o. with rate $\alpha_n$ if $\bigvee_{\underline{\theta}} D_n(\underline{t},\underline{\theta})_+ = O(\alpha_n)$) for discrete and Lebesgue exponential components respectively.

Though the above mentioned rules satisfy the criterion of asymptotic optimality, they are not very satisfactory as far as their finite n behavior is concerned. In fact they turn out to be inadmissible, where admissible is as defined in the previous sections. Thus the problem of exhibiting compound and e.B. rules which are a.o. as well as admissible has been an interesting and challenging question ever since it was put forward by Robbins (1951), in the sense that he proposed the Bayes compound rule versus the symmetric prior uniform on proportions for his featured example and conjectured that it might have better risk behavior than his asymptotically subminimax rule.
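As a rough illustration of the bootstrap-type rules described above, the following sketch simulates Robbins's featured two-state example under 0-1 loss and compares a plug-in rule with the simple symmetric minimax (sign) rule. It is a minimal sketch under stated assumptions and is not claimed to reproduce the procedure of Robbins (1951) exactly: the function names, the moment estimate of the proportion and its truncation level are all choices made here for illustration.

```python
import math
import random


def plug_in_rule(xs):
    """Decide theta_a in {-1, +1} for every x_a, using all of xs to estimate
    the unknown proportion p of states equal to +1 (illustrative plug-in rule)."""
    n = len(xs)
    # E[X] = 2p - 1 when a fraction p of the states equals +1.
    p_hat = (1.0 + sum(xs) / n) / 2.0
    # Truncate away from 0 and 1 so the logarithm below is finite
    # (the level 1/(n+1) is an assumption of this sketch).
    eps = 1.0 / (n + 1)
    p_hat = min(max(p_hat, eps), 1.0 - eps)
    # Component Bayes rule versus the prior (1-p, p): the N(1,1)/N(-1,1)
    # likelihood ratio is exp(2x), so decide +1 iff x > 0.5*log((1-p)/p).
    cut = 0.5 * math.log((1.0 - p_hat) / p_hat)
    return [1 if x > cut else -1 for x in xs]


def sign_rule(xs):
    """Simple symmetric minimax rule: decide +1 iff x > 0."""
    return [1 if x > 0.0 else -1 for x in xs]


def average_loss(decisions, thetas):
    """Compound 0-1 loss: the average of the component losses."""
    return sum(d != th for d, th in zip(decisions, thetas)) / len(thetas)


def simulate(n, p, seed):
    """Return (plug-in loss, sign-rule loss) for n problems with a
    fraction p of the states equal to +1."""
    rng = random.Random(seed)
    k = round(n * p)
    thetas = [1] * k + [-1] * (n - k)
    xs = [th + rng.gauss(0.0, 1.0) for th in thetas]
    return (average_loss(plug_in_rule(xs), thetas),
            average_loss(sign_rule(xs), thetas))


if __name__ == "__main__":
    print(simulate(20000, 0.9, seed=1))
```

When the proportion of +1 states is far from 1/2, the plug-in rule should incur a visibly smaller compound loss than the sign rule, which is the effect that motivates comparing compound risks with $R(G_n)$.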
Inglis (1973) studied, i.a., the asymptotic optimality of a class of admissible Bayes rules for two state components under the finiteness of the expected log-densities and tacit (cf. Inglis (1977)) non-atomicity conditions for his "generalization" of the Hannan-Robbins theorem; cf. the addenda of the next two works. Gilliland and Hannan (1974, The finite state compound decision problem, equivariance and restricted risk components, RM 317, Department of Statistics and Probability, Michigan State University), which was later published in 1986, treated the more general problem of restricted risk components in the finite $\Theta$ case. They worked with a more stringent envelope and reduced the problem of asymptotic optimality to the problem of establishing the L1 consistency of certain induced estimators. Gilliland, Hannan and Huang (1976) established that consistency, in two state components, for Bayes compound estimators versus certain symmetric priors including Robbins's prior. This approach yielded for them admissible rules which are a.o. with rates as good as $O(n^{-1/2})$ in the general two state component case. Vardeman (1978) successfully exploited a result by the last authors to obtain admissible, a.o. sequence compound rules in the two state component case.

None of the results mentioned in the previous paragraph go beyond the finite $\Theta$ case. Meeden (1972) obtained admissible, a.o. empirical Bayes rules in two special infinite state examples, where the component problems are (i) squared error loss estimation of the Geometric parameter and (ii) linear loss testing of the Poisson mean. Inglis (1973) attempted to prove the admissibility and asymptotic optimality of a class of Bayes compound estimators versus mixtures of i.i.d. priors in example (i) above with compact parameter space. Unfortunately his proof of asymptotic optimality appears to contain certain serious gaps. For a discussion on this see the addendum of Gilliland and Hannan (1986).
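To make concrete what a Bayes compound rule versus a mixture of i.i.d. priors looks like, the following sketch treats the two state normal example with states -1 and +1, taking the mixing distribution on the proportion p of +1 states to be uniform on (0,1). It is a minimal sketch under stated assumptions: the integral over p is replaced by a midpoint rule, and the function names and grid size are hypothetical choices made here, not taken from any of the cited works.

```python
import math


def posterior_prob_plus(xs, grid_size=400):
    """P(theta_a = +1 | x_1, ..., x_n) for each a, when, given p, the theta_a
    are i.i.d. with P(theta_a = +1) = p, X_a ~ N(theta_a, 1), and p is
    uniform on (0, 1). The p-integral is approximated by a midpoint rule."""
    ps = [(j + 0.5) / grid_size for j in range(grid_size)]
    # Likelihood ratio of N(1,1) to N(-1,1) at x is exp(2x); the common
    # factor (the N(-1,1) density) cancels in every posterior ratio.
    ratios = [math.exp(2.0 * x) for x in xs]
    log_liks = [sum(math.log(p * r + (1.0 - p)) for r in ratios) for p in ps]
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]  # posterior of p on the grid
    norm = sum(weights)
    return [
        sum(w * p * r / (p * r + (1.0 - p)) for w, p in zip(weights, ps)) / norm
        for r in ratios
    ]


def bayes_compound_estimates(xs, grid_size=400):
    """Posterior means of theta_a, i.e. the Bayes compound estimates of
    theta_a under squared error loss versus this mixture prior."""
    return [2.0 * q - 1.0 for q in posterior_prob_plus(xs, grid_size)]


if __name__ == "__main__":
    data = [2.0] * 50 + [0.0]
    print(bayes_compound_estimates(data)[-1])
```

The a-th estimate depends on all of the observations through the posterior of p, so an ambiguous observation such as x = 0 is pulled towards the state suggested by the remaining data; a simple symmetric rule cannot do this.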
The present work, which subsumes Inglis's example, seems to be the first successful attempt in the literature to accomplish compound admissibility and asymptotic optimality simultaneously in the non-finite state case. Our component distributions form a one dimensional exponential family of quite general nature, whose examples include well known exponential families such as Normal, Exponential, Geometric, Poisson and Negative Binomial, where the parameter space is any compact interval of the natural parameter space on which the first moment is finite. The component problem is to estimate an arbitrary continuous transform of the natural parameter under squared error loss. We note that all compound Bayes estimators in our set and sequence compound problems are admissible. Our main result is that those Bayes versus a mixture of i.i.d. priors on the compound parameter are a.o. if the mixing hyperprior has full support. In the empirical Bayes version of our problem, our conclusion is that all Bayes e.B. estimators are admissible and those versus a prior with full support are a.o.

In the set compound situation, for a dense class of continuous functions, the question of asymptotic optimality is reduced to the question of the L1 consistency of a posterior mixture, which itself is of independent interest. We make use of an inequality suggested in the addendum of Gilliland, Hannan and Huang (1976), and develop and use a uniform L1-LLN and the full support property of $\Lambda$ to treat the resulting terms and prove the required consistency. The results in the sequence compound and empirical Bayes problems, to a large extent, follow from the compound results.

The thesis is organized as follows. Chapter 1 treats the set compound problem described above. Section 2 obtains the admissibility of all Bayes estimators from Lemma A.4, describes the Bayes estimator versus the above mentioned prior and establishes a bound on its modified regret.
Section 3 establishes a bound, in terms of the L1 distance between the corresponding mixtures, on the L1 distance between two component Bayes estimators of ¢(0) = e0k for k 6 l and 0 the natural parameter. Section 4 proves the consistency result as detailed in the previous paragraph and adapts it to the delete versions. Section 5 combines the results of Sections 2-4 and establishes the desired asymptotic optimality. Chapter 2 and 3 considers the sequence compound and empirical Bayes formulations respectively and obtains similar admissibility and Optimality results. Chapter 4 contains various examples of hyperpriors having full support. In some cases practically useful forms of the Bayes estimators are obtained. To this end some possible generalizations are recorded. Finally some results of possible independent interest in more general contexts, which include the aforementioned uniform Ll-LLN for random continuous functions on a compact metric Space, are derived in the appendix. They are used in the body of the thesis. CHAPTER 1 THE SET COMPOUND ESTIMATION 1. Introduction. 1.1. Notations and conventions. In this chapter we consider the set compound problem as described in Chapter 0 corresponding to the component problem to be introduced in the next section. We assume that the reader is familiar with the notations and definitions of the general set compound problem from Chapter 0. The following additional notations and conventions will be used throughout this chapter. 11 x We will use P for PQ = P 0 and E for the corresponding expectation. 1 a 0: Given any vector u = (111, ..., un), for each lSaSn, u; will mean the vector (v1, ..., vn—l) with vj = uj for j Ixeodex) is continuous on 9. dx 0x cx dx < e S e V e on O, the continuity of h and [Since ecx A e 0 ~~> [xeoxdp(x) including one sided continuity at the end points follow readily by the Dominated Convergence Theorem (D.C.T. hereafter).] 11 C3. 
For any 0 E O, (1.3) p, s (h"‘/h...)(1>c + pd), and hence any f e L1(Pc) n L1(Pd) is uniformly integrable wrt the family of probability measures {P 0 , Q E 9}. In view of (1.2) the identity function is such. C4. Since, for any 0 E O, * (1.4) h,..(ecx A ed") 5 p0(x) 5 1. (ecx v ed"), and for any (.12 E 0, p w inherits the above bound (1.4) on p0 , we have 1) (X) * (1.5) [log p—w—ml s |x|(d — c) + log £1- , x e a w, for any 11), w’ E Q. 2. Estimators induced by priors on n. 2.1. Bayes versus mixture of i.i.d. priors. Consider D with the topology of weak convergence. Let B(fl) denote the Borel a—field of 0. Let A be a probability on (0, 8(0)). Define the prior (for each n) ER on On as follows 11 (2.1) 52(le...x an) = 1 Onlqsi) dA, 1: for B ,B Borels of 9. [Note that the above integral makes sense 1,... because the integrand is non—negative and measurable. For a proof of n measurability it suffices to take n=1 and B1 Open. But then it follows from a defining property of weak convergence (Billingsley (1968), Theorem 2.1.iv) that {w : w(Bl) > c} is open and hence measurable V c>0.] Hereafter 12 we will drOp the superscript n in ER, as it will be clear from the context. By the Fubini Theorem, the a—oomponent Bayes risk against E A of an estimator t = (t1,...,tn) in our compound problem is (2.2) an, 1.2,) = IR,(19,(ta- 45(00))2i21p9i(xi)dw,()du 30(0)) n .. . Let c = II p (x.), A denote the probab111ty measure on Q wrth i 1‘ a w 1 a g density prOportional to e a wrt A and ”a = Aaow. By the Fubini Theorem (on the Space QxOn), the inner integral in (2.2) is ( (ma- ¢(o,))2p,a(xa)dw eg“ ”at, which then, by definitions of g a and tau, equals sa(w) (2.3) (I e M) m, -- ¢m or d )>m] + rid—‘3)m (Zedk — eCk + m’) ||Pw- PM”, dpk w1th f = p . ck,edk Proof. Since Ta) and r e (e ), by C3 it follows that w! h»: k FEQITw-Tw’u'l >morf()>m’] is bounded by the first term in the RHS of (3.1). 
To bound the expectation over the other region first observe that ryx) = Ie“r,(x)dw /pw(x) = rims/m 15 for any w E 0 since efl‘pdx) = p0(x+k) = pgk)(x). Lemma A.l then applies to yield k k (3.2) pwlrw- rwl s (2edk-e°k)lpw- pwl + 1le - 115,91. * h Since h*p0 5 e(d_c)l’l p (1) follows from (1.5), (3.2) shows that (h. /h*)FsIrw- T... Ill-ISm, dk’sm'i is bounded by e “‘9’“ (2edk—eCk)|[Pw— P an }. | 118‘) - pg) m! w," + {(fik This completes the proof because, by the transformation theorem, the above integral wrt p is llpw- pw. ldpk = I | pw- pw,|fdu S 111’ le- Pm,"- fgm’ me’ 4. Consistency of the posterior mixtures. In view of (3.1) and the paragraph following (2.6), the question of null convergence of the modified regret reduces to the question, loosely speaking, whether “a is Ll consistent for PG . In Theorem 4.1 we establish such a n consistency result for the non-delete version. The result for the delete versions will follow as a corollary (i.e. Corollary 4.1). Note that the proof of this theorem can be extended far beyond the exponential families of Section 1. Replace n by n+1 in Section 2.1. Denote gn +1, An +1, ”n +1 (of the second paragraph Of 2.1) by g, A, a: respectively. [112 can shown to be the posterior distribution of 011 +1 given the data X = (X1, ..., Xn) under 16 the prior 52‘” on (0 , ..., 0n, 0n+1).] We will use 52w), if necessary, to exhibit the number of arguments. We will first prove three lemmas which will be used for the proof of Theorem 4.1. Let 7’: In—1 HM: V 10s pw(xa) - flog pdeGnl w and for any 112, w’ e Q, Aw,(w) = [log (pw,/pU)de,. Lemma 4.1. For each 6 > 0, - 1 n6 lsup —P u < m + P(7>5) + —e——§— 2 (3) G11 I A(il6) where Proof. By definition of a, HP. - P II = flip (40w) - p ldu, w an ad Gn (4.2) = ]|f( [padw - pG )dA(w)|dp by Fubini on flxO, n .<. Ulpw - pG |d4(w)du = 1|le - PG ||d4(w)- n n The last two steps follow by taking absolute under integral and Fubini on QXR respectively. 
17 For any 112, by (3.6) of Harman (1960), 1 W 11 Clearly LHS above < l everywhere and, by the above, < J25 on $12 25‘ Combining this with (4.2), we get (4.3) % ”Pa. - Pen" < m + M12429“), where the superscript c denotes complement. Since A has density wrt A proportional to eg and 7’ is the sup norm of n-lg + AG - [log pG dPG , one easily gets (equation (iii)' of the n n n addendum of Gilliland, Hannan and Huang (1976)) A((?t __2__6)C) < e-2n6+n7’ (4.4) _ _n _n 11(216) A(?t5)e 5 7 by bounding g above on (it ”)0 and below on ”6‘ Since A is a probability the LHS bounds A((?t26)c ); while on the set [7’ 5 6/4], the RHS is bounded asserted bound. )/A(?t§). Using these in (4.3) and taking expectation we get the Lemma4.2. LIV—+0 uniformlyinQ as n—ioo. 18 Proof. The conclusion readily follows by an application of Theorem A3 with S = n, d = the Levy metric, I: 9‘”, P0 = x P0 = P for — a a Q E 9‘”, Ha(w) = log pw(Xa), w 6 Q, a 21. (S,d) is a compact metric Space by Helly's theorem. Continuity of w ~~> p w(x) follows from the continuity and the boundedness of 0 ~~> p0(x). Thus H a's satisfy (1). We verify (ii+) and (iii+) of Remark A3 in the present situation. For (ii+), use (1.4) to observe that _ :1: “H,” s IXa|(-c v d) + log (11.1 v n 1, and Il *(|X|-M) Slimsup v VP([X|—M) =v1> |.|-M) 10 as MTao in view of (1.3) and (1.2). Next observe that p w is convex (because it is log—convex by HOlder) and continuous (by continuity of pa and D.C.T.). Using also the previously noted continuity of w ~~> p w(x) V x, it follows by Lemma A.2 that for any m 0 and m < oo, 3 p0 > 0 19 such that (4.6) [Vwopa>e,|Xa|$m]=¢,Vaifp6 5 VPX >m = VP m,oo). p10 0,2 “’09“ l at!“ al 1 0% Now let mToo to conclude, LHS of (4.7) = 0. This establishes (iii+). 7Q) _ -1 _ -1 _ Also "H II - x] n 210g pw(Xa) n EPlog pw(Xa)| — 7/. Hence by Theorem A.3, * 7’ = lim supn \é E 7 = 0. Lemma 4.3. If support of A = (2 then for any 6 > 0, A A(A <6) >0. 10069 {”0 } Proof. Fix 6 > 0. 
For an “’0 E Q, the continuity of the function w ~~> [log (pr/dePwo follows by D.C.T since log pwn—1 log p w as can —-1 w and (1.5). So the set {Aw < 6} is Open, non—empty (because it 0 contains 160) and hence (4.8) A({Aw0 < 6}) > 0, Since A has full support. 20 Next observe that, since the functions A w converge to A w pointwise n 0 and hence in A—distribution if ”n —-1 “’0’ by the defining property of the latter convergence (cf. our Section 2.1 usage) lim infn A (ma): 5}) 2 A({Aw0 < 5}) 11 can —. wo . This shows that the function “’0 ~~> A({Aw < 6}) is lower semi- 0 continuous. Hence it attains its infimum because 0 is compact. The proof now ends by (4.8). Theorem 4.1. Let the support of A be n. Then E||PA-P [I ——10 uniformlyinQ as 11—900. 11) Gn Proof. Fix a 6 > 0. Consider the bound in Lemma 4.1. Now, as n —1 no, the second term of this bound goes to zero by Lermna 4.2 and so does the third term by Lemma 4.3, the convergences being uniform in Q. Thus 1im supn 161's" P,- PG || 5 W- a: 11 The proof ends, 6 > 0 being arbitrary. Corollary 4.1. If the support of A = Q then V EIIPw—PGII —+0 uniformlyinQ as n—ioo. OSn a n 21 Proof. Fix Q 6 9n and 1$a_<_n. Let Gna denote the empirical based on Q;. Then we have the following. (i) Gna is Gn—l corresponding to Q; E On—l. (ii) Letting F: 11““1 —+ 0 be the function such that F(_xn_1) = &’(n—1)’ it follows from the definition of ”a that ”a = F(;(_('I). -_1 _1 (111) Clearly P X = .. _ . Q a Q0 2in 1 From (i), (ii) and (iii), we get that PQII Pu, —PG ll'1= Pf II P. -PG ll ‘1. a na 01 ”(n-1) n—1 Since Q and a are arbitrary v VEIIP -P IIS V _EIIP. —P ”—10 Qeenchn wa Gna Q6911 1 “’(n—l) n—l as n —1 00, by Theorem 4.1. -1 . . . . Next, because Gn — Gna = n (600 - Gnu) w1th 600 the d13tr1but1on degenerate at 00, the variation norm of Gn - Gun is no more than 2n_1. Thus, by definition of p a) and by (1.3), * —1 -1 h ID -p I $2n Vp 52n 11—11) +13)- Gn Gun 0 Q ... c d 22 Consequently _1 * -—1 HP -P H S 2n I(Vp X))dfl S. 
(4111/11 )1!1 . Gn Gun 0 A * The proof now ends by the triangle inequality. (Empirical Bayes, Estimation of mixtures.) Consider the situation where X1, ..., Xn are i.i.d. observations from the mixed distribution P w , w E (2 being unknown to the statistician. The problem is to estimate P w . This model obtains in the usual empirical Bayes context where _E is an expectation under which 0 , ..., 0n are i.i.d. ~ w and, n given Q, 1; ~ x P0 . It turns out that P, is L1 consistent for Pw a=1 a w whenever A has full support. Corollary 4.2. If the support of A = Q then, for any h), EHP.‘P" —10 as n—boo. w w Proof. It follows from the continuity, noted in Lemma 4.2 's proof, of w ~~> p 11100 V x that w ~~> P w is continuous in H.“ by the Scheffe’ theorem. Thus as n -+ 00, PG -+ P as. (since G ——1 w as. by n w n Glivenko—Cantelli) and hence Ell PG — P w" —-» 0 by D.C.T. The conclusion 11 now follows by Theorem 4.1 and the triangle inequality. 23 5. Asymptotic optimality. Now we are in a position to prove our main result. Theorem 5.1 (Main result). If the support of A = fl and t is the Bayes estimator given by (2.4), then (5.1) VEIta-tal —10uniformlyinQasn—1oo. am or f(k)>m 1 +e‘d‘°)m(2edk- e°“ +m') En 12w - PG 11} a n by Lemma 3.1, where m, m’ < on are arbitrary. For each m and m’, the second term of the above bound is 0(1) uniformly in a and Q as n ——+ on by Corollary 4.1. The first term is independent of a and Q and can be made arbitrarily small by choosing m and m’ large enough. This concludes the proof in the present case. 24 Next let ¢(0) = 213 eke"k be a polynomial in e”. By definition and the linearity prOperty of conditional expectation (or integral) it follows that ‘ - “[kl _ ‘lkl ta—Eakta’ tar" fakta . ~ 0 k where tug] and tug] are the corresponding Bayes estimators of e a for each 1:. Hence (5.1) holds since it holds with ilk] and ilk] for each It by the previous case. 
Finally for general continuous d, given t > 0, choose a polynomial p such that \é |¢(0) - p(e0)| < t. Then, using definitions and taking absolute values under integrals, lta— tlgll < c, Ita— t[g]|< c “I ] ~I ] 00k where t p amd t p are the correSponding Bayes estimators of p(e ) and so t | 5 It t I + 2c. Ito, The proof is now complete by the previous case, 6 being arbitrary. CHAPTER 2 THE SEQUENCE COMPOUND ESTIMATION 1. Introduction. Here we consider the sequence compound version of the problem treated in Chapter 1. In this formulation, at each stage a we estimate (6(00) by estimators based on the data £0 = (XI, .., X a) then available. The sequence compound estimator, which for each 11 plays Bayes versus ER with n the compound loss L (t,Q) = n-IE (t — ¢(Q ))2, turns out to be of s.s.c. 11 0:1 or a type described in Chapter 0 and is ac. if A has full support. The proof reduces to a corollary to the set compound result via an inequality due to Hannan (1957). 2. Bayes versus BR. 2.1. The Bayes sequence compound estimator. Fix an n 2 1. For ISGSII, a stage a sequence compound rule t a has ROGER) = R(ta,EX) which is its a component Bayes risk in the set problem with 0 components. But t a which minimizes R0032) is the a—th component of the Bayes rule versus 52' in the set problem with or components. Hence we get from Section 2 of Chapter 1 that in the sequence 11 problem the estimator which minimizes Rn(t,EX) = n_IE R0055?) is given 0:1 by (2.1) lama) = rw (x0). 15am where 1120 is as in (1.2.4) with n=a. Note that the components do not 25 26 depend on n and let 1, = . a 2.2. Admissibility. It has been noted in Chapter 1 that the condition of Lemma A.4 holds in our case. Hence all Bayes sequence compound estimators are admissible. In A particular 1, is so. Remark 2.1. 
Note that for each 11, fin is Bayes versus ER in the class of all stage 11 sequence compound estimators tn wrt the n-th estimation loss L(tn,Qn) = (tn- 11>(Qn))2 and, as recorded in the proof of Lemma A.4, has unique risk. 3. Asymptotic Optimality. Theorem 3.1. If A has full support then 1, is a.o. N Proof. Let t (mun) = an(xa) and in = (I ..., I ln’ nn) 1$a$n. Now Rn('f, .) 5 RDGD, .), since its generalization (cf. inequality (8.8) of Harman (1957)) holds without restriction and hence Dn(£,Q)+ is bounded by RHS (1.2.6) with to as in this chapter and i0 replaced by faa‘ But Since A has full support this bound is o(1) uniformly in Q, because )é Eltn- innl is 0(1) by Theorem 1.5.1 and convergence implies Cesaro convergence. Remark 3.1. In fact, in the above, 2 |Dn(t,Q)| = 0(1). To prove this note that a slight extension of (2.5) of Gilliland (1968) gives 27 ~ . -111 ~ ~ —1 |Dn(t,Q)[ 5 2dlam ¢[O]n ElElta— taa' + O(n log n), where O(n’llog n) is uniform in Q. But the above proof has Shown that first term is 0(1) uniformly in Q. CHAPTER 3 THE EMPIRICAL BAYES ESTIMATION 1. Introduction. In this chapter we look at the empirical Bayes approach to our estimation problem. An empirical Bayes estimator t = _1> is such that tn is a function of -X-n with n—th estimation risk function (1.1) Rn(tn,w) = ”(tn— ¢(Qn))2dP£dwn , w e 11. See Chapter 0 for further details. We will prove that the empirical Bayes estimator t = , where in is the Bayes estimator versus a prior A on n, is so. if A has full support. 2. Bayes versus A. For any given 11, any estimator tn based on Ln has stage n Bayes risk wrt A in the empirical Bayes problem equal to its stage n Bayes risk versus ER in the compound problem Since iterated integral dwndA and integral d8: have same meaning. Hence the Bayes empirical Bayes estimator of ¢(Qn) versus A is tn of (2.2.1). Admissibility. For any n, the stage n Bayes e.B. 
estimators wrt $\Lambda$ are the stage $n$ Bayes sequence compound estimators wrt $\bar\omega^{n}$ and, since the integral $d\omega^{n}$ maps the risks of the latter to those of the former, inherit the uniqueness of risk from that of the latter (cf. Remark 2.2.1) and hence are admissible.

3. Asymptotic optimality.

Theorem 3.1. If $\Lambda$ has full support then the Bayes empirical Bayes estimator is a.o., i.e. $\forall\, \omega \in \Omega$, $R_n(\hat t_n, \omega) \to R(\omega)$ as $n \to \infty$.

Proof. First method: For each $n \ge 1$, $\hat t_n$ is the $n$-th component of the compound rule $\hat{\underline t}$ of (1.2.4). The equivariance of this rule follows from its definition, and its asymptotic optimality in the compound problem follows from Theorem 1.5.1. Hence the asymptotic optimality of $\hat t_n$ in the empirical Bayes problem follows by Remark 1 of Gilliland and Hannan (1986).

Second method: A direct proof of the asymptotic optimality of $\hat t_n$ can be given in the present case along the same lines as that of Theorem 1.5.1. Interpret $E$ as $P_{\omega}^{n}$ and $\tilde t_n$ as the component Bayes estimator versus $\omega$ based on $X_n$. Then as before, it is sufficient to prove that $E|\hat t_n - \tilde t_n| \to 0$ as $n \to \infty$. The proof goes through by the same steps with the use of Corollary 1.4.2 instead of 1.4.1.

Remark 3.1. For the asymptotic optimality of the Bayes rules in the general finite state empirical Bayes problem, Gilliland, Boyer and Tsao (1982) came up with the same sufficient condition on $\Lambda$ as the present work.

CHAPTER 4 EXAMPLES OF $\Lambda$ AND CONCLUDING REMARKS

1. Introduction. In Chapters 1-3 we have shown that the Bayes estimators under consideration in the set, sequence compound and the empirical Bayes versions of our problem are all related and are asymptotically optimal if $\Lambda$ (of 1.2.1) has full support. In Bayesian contexts, we can say that such a prior is nonparametric in nature, a desirable property as indicated by Ferguson (1973) and others.
Priors on the set of probability distributions, or random probabilities or random distribution functions, have been considered by many authors in the contexts of nonparametric Bayes and empirical Bayes estimation, estimation of mixing distributions, etc. In Section 2 we present brief descriptions of four examples of full support $\Lambda$ from their works (with some modifications in the Rolph case). In Section 3, some practically useful forms of the Bayes estimators corresponding to some of them are obtained. It is pointed out in the beginning of the section that the Bayes estimators can be expressed as a ratio of two multidimensional integrals involving the posterior means of $\omega$. This form is useful if the posteriors of $\omega$ are analytically calculable (e.g. A and B). Section 4 contains the concluding remarks, which include some possible generalizations of the present work and its application to some non-exponential families. Also some related open problems are indicated.

2. Examples of $\Lambda$. We list below five examples of $\Lambda$ with support $\Omega$.

A. (Dirichlet process) An important class of priors on the probabilities on $\mathbb{R}$ with manageable posteriors has been introduced by Ferguson (1973). Among many equivalent definitions, here we state Definition 1 of Ferguson (1973, 1974). Other equivalent representations can be found in Blackwell and MacQueen (1973), Ferguson (1974) and Sethuraman and Tiwari (1982).

Definition: Let $\gamma$ be a non-null, finite Borel measure on $\mathbb{R}$. Then $\Lambda$ is called the Dirichlet process prior with parameter $\gamma$ (hereafter we write $\Lambda = \mathcal{D}(\gamma)$) if for every finite measurable partition $\{B_1, \ldots, B_m\}$ of $\mathbb{R}$ the distribution of $(\omega(B_1), \ldots, \omega(B_m))$ under $\Lambda$ ($\omega$ is the identity function on the space of probabilities on $\mathbb{R}$) is Dirichlet with parameters $(\gamma(B_1), \ldots, \gamma(B_m))$.

It is well known (e.g. Ferguson (1974)) that the support of $\mathcal{D}(\gamma)$ is the set of probability distributions on $\mathbb{R}$ whose support is contained in the support of $\gamma$.
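As a rough numerical illustration of the definition above, the following sketch draws the vector $(\omega(B_1), \ldots, \omega(B_m))$ for one finite partition, using the standard fact that a Dirichlet vector can be obtained by normalizing independent Gamma variables. The choice of $\gamma$ as a multiple of the uniform distribution on $[c,d]$, the equal-width cells, and all function names are assumptions made for the example and are not taken from the thesis.

```python
import random

def cell_masses(c, d, m, total_mass):
    """gamma(B_i) for m equal-width cells of [c, d], ASSUMING gamma = total_mass * Uniform[c, d]."""
    width = (d - c) / m
    return [total_mass * width / (d - c) for _ in range(m)]

def dirichlet_draw(alphas, rng):
    """One Dirichlet(alphas) draw via normalised independent Gamma(alpha_i, 1) variables."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
alphas = cell_masses(c=0.0, d=1.0, m=4, total_mass=2.0)  # hypothetical parameter choice
omega_cells = dirichlet_draw(alphas, rng)  # (omega(B_1), ..., omega(B_4)) under the assumed gamma
```

Only the finite-dimensional marginals are simulated here; the sketch does not construct the full random probability $\omega$.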
So if we choose $\gamma$ with support of $\gamma = \Theta = [c,d]$ then $\Lambda = \mathcal{D}(\gamma)$ has support $\Omega$.

B. (Processes neutral to the right) A more general class of priors than the Dirichlet process has been introduced by Doksum (1974). Definition: A random distribution function $F(t)$ on the real line is said to be neutral to the right if, for every $m$ and $t_1$ [...] $(\mu_1(\omega), \mu_2(\omega), \ldots)$, [...] $D = \mu[\Omega]$. Then $\mu$ is 1-1, continuous, onto $D$ and hence is a homeomorphism since $\Omega$ is compact and $D$ is Hausdorff. Thus a prior $\Lambda$ on $(D, \mathcal{B}(D))$ induces the prior $\Lambda_\Omega = \Lambda\mu$ on $(\Omega, \mathcal{B}(\Omega))$. Since $\mu$ is a homeomorphism, support of $\Lambda_\Omega = \Omega$ iff support of $\Lambda = D$. Hereafter we will write $\Lambda$ for $\Lambda_\Omega$ too.

The structure of $D$, for the case $\Theta = [0,1]$, has been studied by many authors. Rolph (1968) exploited this structure to define his prior sequentially on the co-ordinates. His priors can be adapted to the case $\Theta = [c,d]$ by the reparametrization $\theta \mapsto (\theta - c)/(d - c)$. Another way of putting priors on $D$ would be to follow Rolph's approach directly for $D = D[c,d]$. It is easy to see that $D[c,d]$ has the same structure as that of $D[0,1]$. Let us elaborate on this. We use Rolph's notation of lower and upper bars to denote corresponding bounds on the range of moments given their predecessors, despite the conflict with our $n$-tuple lower bar notation. Let $\pi_n$ be the projection of $\mathbb{R}^\infty$ onto its first $n$ co-ordinates, $n \ge 1$, and $D_n = \pi_n(D)$. For $(\mu_1, \ldots, \mu_n) \in D_n[c,d]$ let

(2.1)
$$\underline\mu_{n+1}(\mu_1, \ldots, \mu_n) = (d-c)^{n+1}\, \underline m_{n+1}(m_1, \ldots, m_n) - \sum_{r=0}^{n} \binom{n+1}{r} (-c)^{n+1-r} \mu_r,$$
$$\bar\mu_{n+1}(\mu_1, \ldots, \mu_n) = (d-c)^{n+1}\, \bar m_{n+1}(m_1, \ldots, m_n) - \sum_{r=0}^{n} \binom{n+1}{r} (-c)^{n+1-r} \mu_r,$$

where $m_0 = 1$, $m_r = (d-c)^{-r} \sum_{i=0}^{r} \binom{r}{i} (-c)^{r-i} \mu_i$, $1 \le r \le n$, and $\underline m_{n+1}$, $\bar m_{n+1}$ are as in (5) of Rolph.
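The relation $m_r = (d-c)^{-r} \sum_{i=0}^{r} \binom{r}{i} (-c)^{r-i} \mu_i$ used above simply expresses the moments of $(\theta - c)/(d - c)$ through the moments $\mu_i$ of $\theta$ (with $\mu_0 = 1$). A minimal sketch, assuming this reading of the formula, checks it against the uniform distribution on $[c,d]$, whose rescaled moments should be those of the uniform distribution on $[0,1]$. The function names and the test distribution are illustrative choices, not part of the thesis.

```python
from math import comb

def rescaled_moments(mu, c, d):
    """m_r = (d-c)^(-r) * sum_{i<=r} C(r,i) (-c)^(r-i) mu_i for r = 0..n, where mu[0] = 1."""
    return [
        sum(comb(r, i) * (-c) ** (r - i) * mu[i] for i in range(r + 1)) / (d - c) ** r
        for r in range(len(mu))
    ]

def uniform_moments(c, d, n):
    """Raw moments mu_0..mu_n of the uniform distribution on [c, d] (illustrative test case)."""
    return [(d ** (i + 1) - c ** (i + 1)) / ((i + 1) * (d - c)) for i in range(n + 1)]

c, d = 1.0, 3.0
mu = uniform_moments(c, d, 4)
m = rescaled_moments(mu, c, d)  # should agree with 1/(r+1), the moments of Uniform[0, 1]
```

Solving the same relation at index $n+1$ for $\mu_{n+1}$ is what yields the two bounds in (2.1).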
Then for (111, ..., pn) E Dn[c,d], (p1, ..., ”n’ ”n+1) E Dn+1[c,d] iff (22) En+1(”11 "-1 ”11) S ”n+1 S ”n+l(pli "-1 #11) Thus starting from a sequence of measurable functions {hn} positive on [c,d] with f hndx < co (the integral is wrt Lebesgue), cn and tin being cn’ n the minimum and the maximum of 0 ~~> 0“ on [c,d], we construct the prior on D[c,d] exactly the same way given in Rolph with only changes of mu, mm, in to an, En’ fin respectively. Under this A, the distribution of any finite moment sequence has full support, hn's being positive. It then follows from the definition of the product tOpology that A has full support. We will refer to this prior as Rolph's prior for [c,d]. 34 D. (Random distribution functions of Dubins and Freedman) Starting from a probability (base probability in their terms) on the Borels of unit square assigning measure 0 to the corners (0,0) and (1,1), Dubin and Freedman (1966) constructed a random distribution on [0,1]. They gave (3.6,Theorem) precise conditions on the base probability so that the resulting prior has full support. Then an obvious transformation carries it over to a prior on (I with full support. However for these priors we do not know any form of the Bayes estimator which can be computed in practice. E. (Discrete priors) Since (I is compact and metrizable, it is separable. Let {tun} be a dense sequence in f) and 0 and d is the degree of the polynomial) because any finite 11 sequence has Lebesgue density under A. To see (3.8), first use Fubini to rewrite (3.1) (with Q replaced by 11, Do by 5,, and ¢(n) = 111‘) as n ltlnfiptxaldwt n)i;1alp,,(Xi)dw(n)) «174 . (3.9) t x) = a( II I(,H Iii (Xi)dw(n)) d4 1=l ’7 Now use 1")"(x) = ”x I1(r)), h(r)) a 2 ajnj in (3.9) to get (3.8). Remark 3.2. For conditionally uniform prior, i.e. hi 5 1 V i, (3.8) takes a simpler form. In cases like Geometric and Negative Binomial h itself is a polynomial. For the Poisson case we can choose the polynomials _ j . .. 2 {—1.} 17) for approximation. 
In general, since $h$ is continuous, a sequence of approximating polynomials always exists. Moreover such a sequence can be found numerically.

4. Remarks.

1. It should be obvious that we can also treat the cases where the components are 1-1 transforms of some exponential families we have been considering. Suppose that the component distributions $P_\theta$, $\theta \in \Theta$, are such that $\{Q_\eta : \eta \in H\}$ form one such exponential family, where $Q_\eta = P_{\psi^{-1}(\eta)} T^{-1}$, $T$ and $\psi$ are 1-1 transformations on $\mathbb{R}$ and $\Theta$ respectively and $\psi^{-1}$ is continuous. Let $\underline X \sim P_{\underline\theta}$. Then $\underline Y \sim Q_{\underline\eta}$ where $Y_\alpha = T(X_\alpha)$, $\eta_\alpha = \psi(\theta_\alpha)$, $1 \le \alpha \le n$. Since $T$ is 1-1, estimators (based on $\underline Y$) in the transformed problem are related in a 1-1 fashion to the estimators (based on $\underline X$) in the original problem. Any such two estimators have identical risk function under a common parametrization. Moreover since $\psi^{-1}$ is continuous, $\phi$ remains continuous in the reparametrization $\eta$ of the transformed problem. Hence the conclusion of Chapter 1, Section 2 and Theorem 1.5.1 for the transformed problem implies that the set compound estimator

$$\hat t_\alpha(\underline x) = \frac{\int \phi(\psi^{-1}(\eta))\, q_\eta(T(x_\alpha)) \, d\omega_\alpha}{\int q_\eta(T(x_\alpha)) \, d\omega_\alpha}$$

is admissible and a.o. for estimating $\phi(\underline\theta)$ in the original problem, where $\omega_\alpha$ is as in Section 1.2.1 with $g_\alpha(\omega) = \sum_{i \ne \alpha} \log q_\omega(T(X_i))$. Analogous conclusions hold for the sequence compound and the empirical Bayes versions.

2. We can generalize the component loss to weighted squared error loss, where the weight function is positive and continuous. If $L(a, \theta) = w(\theta)(a - \phi(\theta))^2$ then a component Bayes estimator of $\phi(\theta)$ is the ratio of the corresponding Bayes estimators of $w(\theta)\phi(\theta)$ and $w(\theta)$ wrt the squared error loss. Since $w\phi$ is continuous and $w_* > 0$, the $L_1(E)$ case of Lemma A.1 with $L = 2w^*|\phi|^*/w_*$ and two applications of Theorem 1.5.1 imply that $E|\hat t_\alpha - \tilde t_\alpha| \to 0$ uniformly in $\alpha$ and $\underline\theta$, where $\hat t_\alpha$ and $\tilde t_\alpha$ refer here to the weighted loss.
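The ratio form of the weighted-loss Bayes estimator in Remark 2 can be checked on a toy case. The sketch below assumes a discrete posterior on three parameter values, with $\phi$ the identity and $w(\theta) = 1 + \theta$; these choices and the function names are hypothetical and serve only to illustrate that the ratio of the posterior means of $w\phi$ and $w$ minimizes the posterior expected weighted loss.

```python
def weighted_bayes_estimate(thetas, posterior, phi, w):
    """Ratio of the posterior means of w*phi and w (the squared-error Bayes estimates of each)."""
    num = sum(p * w(t) * phi(t) for t, p in zip(thetas, posterior))
    den = sum(p * w(t) for t, p in zip(thetas, posterior))
    return num / den

def posterior_weighted_risk(a, thetas, posterior, phi, w):
    """Posterior expectation of the weighted loss w(theta) * (a - phi(theta))^2."""
    return sum(p * w(t) * (a - phi(t)) ** 2 for t, p in zip(thetas, posterior))

# Hypothetical toy posterior, transform and weight function.
thetas = [0.0, 1.0, 2.0]
posterior = [0.2, 0.5, 0.3]
phi = lambda t: t
w = lambda t: 1.0 + t

estimate = weighted_bayes_estimate(thetas, posterior, phi, w)
risk_at_estimate = posterior_weighted_risk(estimate, thetas, posterior, phi, w)
```

Since the posterior risk is a quadratic in $a$ with leading coefficient equal to the posterior mean of $w$, positivity of the weight function is what makes the ratio a minimizer.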
As before this is sufficient to conclude the asymptotic optimality of $\hat{\underline t}$ since (1.2.6) holds with a $w^*$ multiple of the RHS. The same conclusion holds in the sequence compound and the empirical Bayes versions.

3. An interesting question seems to be how far we can relax the compactness assumption on the component parameter space. It is known that we cannot always go up to the natural parameter space. An example where no a.o. compound estimator exists is the Poisson family with unbounded parameter set. See Gilliland (1968, Section 3.3) for a proof.

4. Under the assumption $\mu_{-1} \ll \mu$ instead of $\mu_{1} \ll \mu$ we can use the transformation $T(x) = -x$ in Remark 1 to obtain admissible, a.o. rules. An example where none of these holds is provided by the Binomial family and it is well known that in this case even the empirical Bayes problem has no a.o. solution.

5. A possible open question is whether the condition that $\Lambda$ has full support is necessary. Another interesting problem is to find examples of $\Lambda$ for which a good lower bound (as a function of $\epsilon$) to the quantity in Lemma 1.4.3 can be obtained so that a rate of convergence in the asymptotic optimality of $\hat{\underline t}$ can be established.

APPENDIX

Here we present a few results of possible independent interest. They are used as technical tools in the body of the thesis.

1. On bounding the difference of two ratios.

Lemma A.1. For $y, z, Y, Z \in \mathbb{R}$ and $z \ne 0 \le L$,

(1.1) $|z| \left( \left| \frac{Y}{Z} - \frac{y}{z} \right| \wedge L \right) \le |z|^{-1} |yZ - zY| + L(|z| - |Z|)^+ \le |y - Y| + \left( \left| \frac{y}{z} \right| + L \right) |z - Z|$.

Proof. The first inequality holds because the RHS, less $|Y|$ if $Z = 0$, is the $|Z/z|$, $(1 - |Z/z|)^+$ weighted average of quantities whose minimum is the LHS. The second inequality follows by triangle inequality weakenings, $|yZ - zY| \le |z||y - Y| + |y||z - Z|$ and $(|z| - |Z|)^+ \le |z - Z|$, in the two LHS terms.

Remark A.1. Division by $|z|$ in (1.1) yields a pointwise improvement on a lemma of Singh (1974, Lemma A.2).
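A quick numerical sanity check of the chain (1.1), read here as $|z|(|Y/Z - y/z| \wedge L) \le |z|^{-1}|yZ - zY| + L(|z| - |Z|)^+ \le |y - Y| + (|y/z| + L)|z - Z|$, can be sketched as follows. The random test ranges, the tolerance, and the convention of treating $Y/Z$ as infinite when $Z = 0$ are assumptions of this sketch rather than statements from the thesis.

```python
import random

def lemma_a1_terms(y, z, Y, Z, L):
    """Return (lhs, middle, rhs) of the chain; the gap |Y/Z - y/z| is taken as infinite if Z == 0."""
    ratio_gap = abs(Y / Z - y / z) if Z != 0 else float("inf")
    lhs = abs(z) * min(ratio_gap, L)
    middle = abs(y * Z - z * Y) / abs(z) + L * max(abs(z) - abs(Z), 0.0)
    rhs = abs(y - Y) + (abs(y / z) + L) * abs(z - Z)
    return lhs, middle, rhs

def chain_holds(y, z, Y, Z, L, tol=1e-9):
    """Check lhs <= middle <= rhs up to a small relative floating-point tolerance."""
    lhs, middle, rhs = lemma_a1_terms(y, z, Y, Z, L)
    return lhs <= middle + tol * (1.0 + middle) and middle <= rhs + tol * (1.0 + rhs)

rng = random.Random(1)
cases = [(0.3, 2.0, 1.5, 0.0, 4.0)]  # one explicit Z = 0 case
while len(cases) < 2000:
    y, z, Y, Z = (rng.uniform(-5, 5) for _ in range(4))
    if abs(z) > 1e-3:  # the lemma requires z != 0; tiny z is skipped for numerical stability
        cases.append((y, z, Y, Z, rng.uniform(0, 10)))
all_ok = all(chain_holds(*case) for case in cases)
```

A passing check is of course no substitute for the short proof given above; it only guards against a misreading of the displayed inequality.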
When $y, z, Y, Z$ are measurable functions on a space with an integral $J$, his lemma itself (and its extension, his Remark A.2) is further improved by the corollary to ours resulting from the subadditivity of the norm or metric distance from 0 in $L_\gamma(J)$ according as $\gamma \in [1, \infty]$ or $(0,1)$.

2. On uniform convergence of convex functions.

Lemma A.2. If $\{f_n\}$ is a sequence of convex functions on an interval $I \subset \mathbb{R}$ converging to a continuous real function $f$ pointwise on $I$, then the convergence is uniform on compact subsets of $I$.

Proof. Let $K$ be a compact subset of $I$, an interval w.l.o.g. Partition $K$ into intervals of equal length. Let $\delta$ denote the maximum of the oscillations of $f$ within the subintervals. Let $\eta_n$ denote the maximum of $|f_n - f|$ on the endpoints of the subintervals. On a given subinterval with endpoints $a$ and $b$, bound $f_n$ above by its chord to obtain $f_n \le f_n(a) \vee f_n(b)$; bound $f_n$ below by the line extending the chord from an adjacent interval to obtain $f_n \ge f_n(a) - |f_n(a) - f_n(2a - b)|$. Thus

$\bigvee_K |f_n - f| \le 3\eta_n + 2\delta \to 2\delta$ as $n \to \infty$.

The proof is now over since the uniform continuity of $f$ over $K$ permits arbitrarily small $\delta > 0$ based on the number of subintervals.

3. A uniform $L_1$-LLN for independent random continuous functions.

Let $(S, d)$ be a compact metric space. Let $\|\cdot\|$ denote the sup norm on $C(S)$. Let $\{P_\nu : \nu \in I\}$ be an arbitrary family of probability measures. (We use the measure to denote the corresponding expectation too and use the superscript $(\nu)$ to denote deviations of random elements from the values of their $P_\nu$ expectations.) Let $A_n$ denote the uniform expectation on $\{1, \ldots, n\}$. (If $\{f_k\}$ is a sequence of elements of a linear space, $A_n f_\cdot$ will be denoted by $\bar f$ when convenient.) Note that $A_n$ commutes with $(\nu)$. Let $*$ denote the (iterated) operation $\limsup_n \bigvee_\nu A_n P_\nu$ and note that it is subadditive.

Theorem A.3.
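If (i) under each $P_\nu$, $H_1, H_2, \ldots$ are independent $C(S)$ valued random elements with expectations in $\mathbb{R}^S$ ($(P_\nu H_k)(s) = P_\nu H_k(s)$ $\forall$ $k$ and $s$), (ii) $*(\|H_\cdot^{(\nu)}\| - M)^+ \downarrow 0$ as $M \uparrow \infty$, (iii) $\forall$ $\epsilon > 0$ and $s \in S$, with $V_{s\rho k} = \bigvee \{ |H_k^{(\nu)}(t) - H_k^{(\nu)}(s)| : d(s,t) < \rho \}$, $*[V_{s\rho\cdot} > \epsilon] \downarrow 0$ as $\rho \downarrow 0$, then $*\|\bar H^{(\nu)}\| = 0$.

Proof. If card$(S) = 1$, $\|\cdot\|$ reduces to $|\cdot|$ and (iii) is vacuously satisfied. The $H_k$ are then real valued random variables and will be denoted by $X_k$. For $M \in (0, \infty)$ represent (as in the proof of Theorem 2.3.9 of Fabian and Hannan (1985), which the present real case greatly strengthens)

(3.1) $X_k^{(\nu)} = U_k^{(\nu)} + P_\nu U_k + W_k$

with $U_k$ the projection of LHS into $[-M, M]$, so that $P_\nu|\bar X^{(\nu)}| \le P_\nu|\bar U^{(\nu)}| + |P_\nu \bar U| + P_\nu|\bar W|$ and $|P_\nu \bar U| = |P_\nu \bar W|$. Thus

(3.2) $P_\nu|\bar X^{(\nu)}| \le M/\sqrt{n} + 2 P_\nu|\bar W|$

(since $(P_\nu|\bar U^{(\nu)}|)^2 \le P_\nu(\bar U^{(\nu)})^2 \le M^2/n$). Since $|W_k| = (|X_k^{(\nu)}| - M)^+$, $*|\bar W| \le *|W_\cdot| \downarrow 0$ as $M \uparrow \infty$ by (ii) and thus

(3.3) $*|\bar X^{(\nu)}| = 0$

follows from (3.2).

Let $\epsilon > 0$, $s \in S$. Since

(3.4) $V_{s\rho k} \le 2\|H_k^{(\nu)}\|$,

(3.5) $V_{s\rho k} \le \epsilon + M[V_{s\rho k} > \epsilon] + (2\|H_k^{(\nu)}\| - M)^+$

and by subadditivity

$*V_{s\rho\cdot} \le \epsilon + M * [V_{s\rho\cdot} > \epsilon] + *(2\|H_\cdot^{(\nu)}\| - M)^+$

(3.6) $\le 2\epsilon + M(\epsilon) * [V_{s\rho\cdot} > \epsilon]$

by choice of $M = M(\epsilon)$ in (ii). And so

(3.7) $*V_{s\rho(s,\epsilon)\cdot} \le 3\epsilon$

by choice of $\rho = \rho(s, \epsilon)$ in (iii). By compactness of $S$, $\exists$ a finite cover of $S$ by spheres indexed by $\sigma_i = (s_i, \rho_i)$ with $\rho_i = \rho(s_i, \epsilon)$, for $i = 1, \ldots, g$. Then

(3.8) $\|\bar H^{(\nu)}\| \le \bigvee_{i=1}^{g} \{ |\bar H^{(\nu)}(s_i)| + |\bar V_{\sigma_i}^{(\nu)}| + P_\nu \bar V_{\sigma_i} \} \le \sum_{i=1}^{g} |\bar H^{(\nu)}(s_i)| + \sum_{i=1}^{g} |\bar V_{\sigma_i}^{(\nu)}| + \bigvee_{i=1}^{g} P_\nu \bar V_{\sigma_i}$.

From (3.4) it follows that $|V_{\sigma_i k}^{(\nu)}| \le 2(\|H_k^{(\nu)}\| + P_\nu\|H_k^{(\nu)}\|)$ and hence

(3.9) $P_\nu(|V_{\sigma_i k}^{(\nu)}| - M)^+ \le 2 P_\nu(2\|H_k^{(\nu)}\| - M/2)^+$.

Thus the $V_{\sigma_i k}$, as well as the $H_k(s_i)$, inherit (ii). Since (i) obviously holds for both sequences, so does (3.3). So it follows from (3.8) and subadditivity of $*$ that

(3.10) $*\|\bar H^{(\nu)}\| \le * \bigvee_{i=1}^{g} P_\nu \bar V_{\sigma_i} = \limsup_n \bigvee_\nu \bigvee_{i=1}^{g} A_n P_\nu V_{\sigma_i \cdot}$.

Since $\limsup_n$ commutes with $\bigvee$ for finite $g$, we get RHS (3.10) $\le 3\epsilon$. The proof ends since $\epsilon$ is arbitrary.

Remark A.3. Let (ii+) and (iii+) denote (ii) and (iii) respectively without the centerings $(\nu)$.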
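Let (ii+) hold. Then, since $\|P_\nu H_k\| \le P_\nu\|H_k\|$ and hence $(\|P_\nu H_k\| - M)^+ \le P_\nu(\|H_k\| - M)^+$, (ii+) holds with $H_k$ replaced by $P_\nu H_k$. This along with (ii+) then gives (ii) (since $(a + b - 2M)^+ \le (a - M)^+ + (b - M)^+$). Let (ii+) and (iii+) hold. Then, since $\bigvee\{ |P_\nu H_k(t) - P_\nu H_k(s)| : d(t,s) < \rho \} \le P_\nu V_{s\rho k}$, $A_n[P_\nu V_{s\rho\cdot} > \epsilon] \le \epsilon^{-1} A_n P_\nu V_{s\rho\cdot}$. So by (3.6+) and (ii+), $0 = \lim_{\rho \downarrow 0} * V_{s\rho\cdot}$. Thus (iii+) holds with $H_k$ replaced by $P_\nu H_k$. This along with (iii+) then gives (iii) (since $[\bigvee|H + G| > \epsilon] \le [\bigvee|H| > \epsilon/2] + [\bigvee|G| > \epsilon/2]$).

4. Admissibility of Bayes estimators in the compound problem under squared error loss.

Let $\{P_\theta\}$, $\theta \in \Theta$, be the component distributions. Consider the compound problem of estimating $\phi(\underline\theta)$ under squared error loss $L(\underline t, \underline\theta) = n^{-1} \sum_{\alpha=1}^{n} (t_\alpha - \phi(\theta_\alpha))^2$ for any function $\phi$ on $\Theta$. Let $P_{\underline\theta} = \times_{\alpha=1}^{n} P_{\theta_\alpha}$ for $\underline\theta \in \Theta^n$. Let $\zeta$ be a prior on $\underline\theta$. Denote the joint distribution $\zeta P_{\underline\theta}$ on $\langle \underline x, \underline\theta \rangle$ by $Q$. Then the marginal of $\underline x$ is $Q\underline x^{-1} = \int P_{\underline\theta} \, d\zeta$. For a function $f$ on $\Theta^n$ let $Q_{\underline x} f(\underline\theta)$ denote the class of conditional expectations of $f(\underline\theta)$ given $\underline x$.

Lemma A.4. If $\zeta$ is such that $P_{\underline\theta} \ll Q\underline x^{-1}$ $\forall$ $\underline\theta \in \Theta^n$ then every Bayes estimator versus $\zeta$ is admissible.

Proof. First consider the set compound case. Fix an $\alpha \in \{1, \ldots, n\}$. Then $Q(t_\alpha - \phi(\theta_\alpha))^2$ is minimal iff $t_\alpha(\underline x) \in Q_{\underline x}\phi(\theta_\alpha)$. Hence $t_\alpha$ is determined up to $Q\underline x^{-1}$ null sets and so by the assumption of the lemma has unique risk $\underline\theta \mapsto \int (Q_{\underline x}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2 \, dP_{\underline\theta}$. Thus, since $\alpha \in \{1, \ldots, n\}$ is arbitrary, the compound Bayes estimators have the unique compound risk $\underline\theta \mapsto n^{-1} \sum_{\alpha=1}^{n} \int (Q_{\underline x}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2 \, dP_{\underline\theta}$ and hence are admissible.

For the sequence compound case, the given condition implies that for each $\alpha \in \{1, \ldots, n\}$, $P_{\underline\theta_\alpha} \ll Q\underline x_\alpha^{-1}$ $\forall$ $\underline\theta_\alpha \in \Theta^\alpha$. Hence, by combining the intermediate results in the set case with $n = \alpha$ for each $\alpha$, we get that the sequence compound Bayes estimators have the unique compound risk $\underline\theta \mapsto n^{-1} \sum_{\alpha=1}^{n} \int (Q_{\underline x_\alpha}\phi(\theta_\alpha) - \phi(\theta_\alpha))^2 \, dP_{\underline\theta_\alpha}$ and hence are admissible.

BIBLIOGRAPHY

Basu, D. and Tiwari, R. C. (1982). A note on the Dirichlet process. Statist.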
Prob.: Essays in Honor of C. R. Rao. North-Holland Publishing Comp., 89-103.

Billingsley, Patrick (1968). Convergence of Probability Measures. John Wiley & Sons.

Blackwell, David and MacQueen, James B. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist. 1, 353-355.

Doksum, Kjell (1974). Tailfree and neutral random probabilities and their posterior distributions. Ann. Statist. 2, 183-201.

Dubins, L. E. and Freedman, D. A. (1966). Random distribution functions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. II.1, Univ. of California Press, 183-214.

Fabian, Vaclav and Hannan, James (1985). Introduction to Probability and Mathematical Statistics. John Wiley & Sons.

Ferguson, Thomas S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.

Ferguson, Thomas S. (1974). Prior distributions on spaces of probability measures. Ann. Statist. 2, 615-629.

Gilliland, Dennis C. (1968). Sequential compound estimation. Ann. Math. Statist. 39, 1890-1904.

Gilliland, Dennis C. and Hannan, James (1986). The finite state compound decision problem, equivariance and restricted risk components. Adaptive Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series 8, 129-145.

Gilliland, Dennis C., Hannan, James and Huang, J. S. (1976). Asymptotic solutions to the two state compound decision problem, Bayes versus diffuse priors on proportions. Ann. Statist. 4, 1101-1112.

Gilliland, Dennis C., Boyer, John E. and Tsao, How Jan (1982). Bayes empirical Bayes: finite parameter case. Ann. Statist. 10, 1277-1282.

Hannan, James F. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games 3, Ann. Math. Studies, No. 39, Princeton University Press, 97-139.

Hannan, J. (1960). Consistency of maximum likelihood estimation of discrete distributions. Contributions to Prob. Statist., Stanford Univ. Press, 249-257.

Inglis, James (1973). Admissible decision rules for the compound problem. Ph.D. Thesis, Dept.
of Statistics, Stanford University.

Inglis, James (1977). Admissible decision rules for the compound decision problem: the two action two state case. Ann. Statist. 7, 1127-1135.

Kuo, Lynn (1986). A note on Bayes empirical Bayes estimation by means of Dirichlet process. Statist. Prob. Lett. 4, 145-150.

Lehmann, E. L. (1959, 1986). Testing Statistical Hypotheses. John Wiley & Sons.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann. Math. Statist. 43, 96-101.

Robbins, Herbert (1951). Asymptotically sub-minimax solutions of compound problems. Proc. Second Berkeley Symp. Math. Statist. Prob., Univ. of California Press, 131-148.

Robbins, Herbert (1955). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, Univ. of California Press, 157-163.

Rolph, John E. (1968). Bayesian estimation of mixing distributions. Ann. Math. Statist. 39, 1289-1302.

Sethuraman, Jayaram and Tiwari, Ram C. (1982). Convergence of Dirichlet measures and the interpretation of their parameter. Statistical Decision Theory and Related Topics III.2, Academic Press, 305-315.

Singh, Radhey Shyam (1974). Estimation of derivatives of average μ-densities and sequence-compound estimation in exponential families. Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State University.

Vardeman, Stephen B. (1978). Admissible solutions of finite state sequence compound decision problems. Ann. Statist. 6, 673-679.