This is to certify that the thesis entitled "On Asymptotic Optimality of Bayes Empirical Bayes Estimators" presented by Tze Fen Li has been accepted towards fulfillment of the requirements for the Doctoral degree in Statistics.

Major professor

ON ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES ESTIMATORS

By

Tze Fen Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1981

ABSTRACT

ON ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES ESTIMATORS

By Tze Fen Li

In an empirical Bayes decision problem, a prior distribution $\Lambda$ is placed on a one-dimensional family $\mathcal{G}$ of priors $G_\omega$, $\omega \in \Omega$, to produce a Bayes empirical Bayes estimator. The asymptotic optimality of the Bayes estimator is established when the support of $\Lambda$ is $\Omega$ and the marginal distributions $H_\omega$ have monotone likelihood ratio and continuous Kullback-Leibler information number. For the normal case, a simple class of empirical Bayes estimators is constructed that dominate the James-Stein estimator. Here the Bayes estimator is smooth, admissible and asymptotically optimal on $\mathcal{G}$. The rate of convergence to minimum risk is $O(n^{-1})$ uniformly on $\mathcal{G}$. The results of a Monte Carlo study are presented to demonstrate the favorable risk behavior of the Bayes estimator in comparison with other competitors, including the James-Stein estimator.

ACKNOWLEDGMENTS

I would like to take this opportunity to express my appreciation to my advisor, Professor Dennis C. Gilliland, and my guidance committee for invaluable guidance, constructive assistance and suggestions during the entire course of this study. The financial support provided by the Department of Statistics and Probability and the National Science Foundation made my graduate studies possible. I wish to thank Clara Hanna, who accurately typed the thesis with great patience and skill.

TABLE OF CONTENTS

Chapter
I   INTRODUCTION TO BAYES EMPIRICAL BAYES ESTIMATION
    1.1 Introduction
    1.2 Example - Normal Case
    1.3 Literature Review
II  ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES RULES
BIBLIOGRAPHY

CHAPTER I

INTRODUCTION TO BAYES EMPIRICAL BAYES ESTIMATION

1.1 Introduction

Consider the component decision problem consisting of estimation of $\theta$ based on $X$, which has distribution $F_\theta$. Let $L(\theta,\cdot)$ denote a loss function and let $R(G,d)$ denote the risk of an estimator $d$ when $G$ is a prior distribution on $\theta$, i.e.,

$$R(G,d) = \int L(\theta, d(x))\, dF_\theta(x)\, dG(\theta). \tag{1.1}$$

Let $D$ denote the class of all component estimators $d$. The infimum risk

$$R(G) = \inf_{d \in D} R(G,d) \tag{1.2}$$

defines the Bayes envelope at $G$. An estimator $d_G \in D$ such that $R(G, d_G) = R(G)$ is said to be a Bayes component rule versus $G$.
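To fix ideas, the following minimal sketch (our construction, not from the thesis; all names are illustrative) evaluates the component risk (1.1) over the linear rules $d(x) = cx$ for the normal component of Section 1.2, where $X \mid \theta \sim N(\theta,1)$ and, under the assumption $G = N(0,\tau^2)$, the risk has the closed form $c^2(\tau^2+1) - 2c\tau^2 + \tau^2$; the minimizer recovers the Bayes component rule and the minimum recovers the Bayes envelope (1.2).

```python
# A minimal sketch of (1.1)-(1.2): component risk of linear rules d(x) = c*x
# when X | theta ~ N(theta, 1) and G = N(0, tau2) (assumed for illustration).
import numpy as np

def risk_linear(c, tau2):
    # R(G, d) = E (c*X - theta)^2 with X ~ N(0, tau2 + 1) marginally:
    # expanding the square gives c^2 (tau2 + 1) - 2 c tau2 + tau2.
    return c**2 * (tau2 + 1) - 2 * c * tau2 + tau2

tau2 = 2.0
cs = np.linspace(0.0, 1.0, 1001)
risks = risk_linear(cs, tau2)
# The Bayes component rule is d_G(x) = E[theta | x] = (tau2/(tau2+1)) x, so the
# minimum occurs at c = tau2/(tau2+1) with value R(G) = tau2/(tau2+1).
print(cs[np.argmin(risks)], tau2 / (tau2 + 1))   # both approximately 0.667
print(risks.min(), tau2 / (tau2 + 1))            # Bayes envelope R(G)
```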
In the empirical Bayes (EB) decision problem with this component, $(\theta_i, X_i)$, $i = 1, 2, \ldots$, are i.i.d. with $\theta_i \sim G$ and, conditional on $\theta_i$, $X_i \sim F_{\theta_i}$. The EB problem is to estimate $\theta_n$ based on observing $X_1, \ldots, X_n$. This can be construed as using $\underline{X}_{n-1} = (X_1, \ldots, X_{n-1})$ to select a component decision rule $t_n(\underline{X}_{n-1}) \in D$ and evaluating it at $X_n$ to estimate $\theta_n$. (In what follows, $t_n$ will sometimes be used to abbreviate the evaluation $t_n(\underline{X}_{n-1})(X_n)$.) The risk of $t_n$ at $G$ conditional on $\underline{X}_{n-1}$ is $R(G, t_n(\underline{X}_{n-1})) \ge R(G)$, and the overall risk is

$$R_n(G, t_n) = \int R(G, t_n(\underline{x}_{n-1}))\, dH_G^{n-1}(\underline{x}_{n-1}), \tag{1.3}$$

where $H_G^{n-1}$ denotes the $(n-1)$-fold product of the $G$-mixture of the $F_\theta$.

Let $\mathcal{G}$ be a specified family of distributions on $\theta$.

Definition 1.1. (Robbins (1956)) A sequence of EB rules $t_n$ is said to be asymptotically optimal (a.o.) on $\mathcal{G}$ if for each $G \in \mathcal{G}$, $\lim_n R_n(G, t_n) = R(G)$.

When the component loss is squared error, a Bayes component rule is $d_G(X) = E[\theta \mid X]$. Furthermore, the EB risk (1.3) has the representation

$$R_n(G, t_n) = R(G) + E\,(t_n - d_G(X_n))^2 \tag{1.4}$$

provided $E\theta^2 < \infty$. This representation was noted by Johns (1956) and used as a starting point to prove the a.o. property of certain EB estimators. It follows from the $L_2$-orthogonality of $E[\theta_n \mid \underline{X}_n] - \theta_n$ and $t_n - E[\theta_n \mid \underline{X}_n]$.

In this thesis, $\mathcal{G}$ is assumed to be a parametric family of distributions of $\theta$. Each $G \in \mathcal{G}$ is identified by an element $\omega$ of an indexing set $\Omega$ which is a subset of the reals, i.e.,

$$\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}. \tag{1.5}$$

Let $\Lambda$ be a prior distribution on $\Omega$. An EB estimator $t_n$ is said to be Bayes, and is called Bayes EB with respect to $\Lambda$, if it minimizes

$$R_n(\Lambda, t_n) = \int R_n(G_\omega, t_n)\, d\Lambda(\omega). \tag{1.6}$$

Good (1965) refers to such priors on priors as Type III probabilities. Meeden (1972) illustrates the Bayes approach to empirical Bayes squared error loss estimation problems with several examples. Other literature discussing Bayes empirical Bayes includes Lindley (1971), Gilliland and Hannan (1974), Gilliland, Hannan and Huang (1976), Deely and Lindley (1979) and Gilliland and Boyer (1979).

In the next section we develop the EB and Bayes EB methods for an example. In Section 1.3 we give a brief review of the related literature. In Chapter II we consider the Bayes EB method and show that it produces a.o. procedures for a variety of EB decision problems.

1.2 Example - Normal Case

In the present section we consider the component consisting of squared error loss estimation of $\theta$ based on $X \sim F_\theta = N(\theta, 1)$. First consider the compound decision problem with this component. This consists of estimation of $\underline{\theta} = (\theta_1, \theta_2, \ldots, \theta_n)$ based on $\underline{X} = (X_1, X_2, \ldots, X_n) \sim F_{\theta_1} \times F_{\theta_2} \times \cdots \times F_{\theta_n}$, with the compound risk being the average risk across the $n$ components. James and Stein (1961) show that the estimator $t^1 = (t^1_1, t^1_2, \ldots, t^1_n)$, where for $i = 1, 2, \ldots, n$,

$$t^1_i(\underline{X}) = \Big(1 - \frac{n-2}{S}\Big) X_i \quad \text{where } S = \sum_{i=1}^n X_i^2, \tag{1.7}$$

has compound risk satisfying

$$R(\underline{\theta}, t^1) < 1 \quad \text{for all } \underline{\theta},\ n \ge 3. \tag{1.8}$$

This demonstrates the inadmissibility of the compound estimator $\underline{X}$ if $n \ge 3$. Efron and Morris (1972) point out that $t^1_n$ is a natural EB estimator for the EB problem with this component and a class of normal prior distributions, here parameterized as

$$\mathcal{G} = \{N(0, (1-\omega)/\omega) \mid \omega \in \Omega = (0,1]\}. \tag{1.9}$$

Under the prior $G_\omega = N(0, (1-\omega)/\omega)$, the marginal distribution of $X$ is $H_\omega = N(0, \omega^{-1})$, the component Bayes rule is

$$d_\omega(X) = (1-\omega)X \tag{1.10}$$

and the Bayes envelope is $R(\omega) = 1 - \omega$. By (1.4), the excess EB risk has the representation

$$R_n(\omega, t_n) - R(\omega) = E\,\big(t_n - d_\omega(X_n)\big)^2, \tag{1.13}$$

which for rules of the form $t_n = (1 - \varphi(S))X_n$ equals $E[(\varphi(S) - \omega)^2 X_n^2]$. Since $\omega S \sim \chi^2(n)$ under $H^n_\omega$, $E[(n-2)/S] = \omega$, so the James-Stein estimator $t^1_n$ estimates $\omega$ unbiasedly with the shrinking factor $\varphi_1(S) = (n-2)/S$; its excess risk is exactly $2\omega/n$. Adjusting the shrinking factor produces EB estimators which dominate $t^1_n$. The first is

$$t^2_n = t^1_n + \frac{2(n-6)}{S^2}\, X_n, \tag{1.15}$$

which satisfies

$$R_n(\omega, t^2_n) < R_n(\omega, t^1_n), \quad n > 6. \tag{1.17}$$

The coefficient $2(n-6)$ of $S^{-2}$ in the definition (1.15) of $t^2_n$ was chosen from among all constants to produce an adjustment to the James-Stein estimator $t^1_n$ which results in the domination (1.17).
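As a concrete illustration of the James-Stein EB estimator (1.7), here is a small sketch (our code, not the thesis's); the retraction of the shrinking factor to $[0,1]$, written $t^{1+}_n$ below, is introduced formally in (1.22) of the construction that follows.

```python
# A sketch of the James-Stein EB estimate of theta_n from (1.7), together with
# the retracted version t_n^{1+} in which phi_1(S) = (n-2)/S is clipped to [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def js_estimates(x):
    # x = (X_1, ..., X_n); returns (t_n^1, t_n^{1+}).
    n = len(x)
    S = float(np.sum(np.square(x)))
    phi = (n - 2) / S
    phi_star = min(max(phi, 0.0), 1.0)   # retraction a* = max{min{a, 1}, 0}
    return (1 - phi) * x[-1], (1 - phi_star) * x[-1]

# One draw from the EB model (1.9) with omega = 0.5:
# theta_i ~ N(0, (1 - omega)/omega) and X_i | theta_i ~ N(theta_i, 1).
omega, n = 0.5, 20
theta = rng.normal(0.0, np.sqrt((1 - omega) / omega), size=n)
x = theta + rng.normal(size=n)
print(js_estimates(x))
```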
Continuing this method of construction leads to EB estimators with nested risk functions. The next two estimators in the construction are

$$t^3_n = t^2_n + \frac{4(n-10)}{S^3}\, X_n \tag{1.18}$$

and

$$t^4_n = t^3_n + \frac{2(n-14)(n^2 - 28n + 188)}{S^4}\, X_n, \tag{1.19}$$

with

$$R_n(\omega, t^3_n) < R_n(\omega, t^2_n) - \frac{4\omega^3 (n-10)^3}{n(n-2)(n-4)(n-6)(n-8)}, \quad n > 10, \tag{1.20}$$

and

$$R_n(\omega, t^4_n) < R_n(\omega, t^3_n), \quad n > 14, \tag{1.21}$$

where the risk gap in (1.21) is a positive multiple of $\omega^4 (n-14)(n^2 - 28n + 188)$.

From (1.13) and the fact $\omega \in (0,1]$, each estimator $t^j_n$ can be improved by retracting its shrinking factor $\varphi_j(S)$ to the interval $[0,1]$. Each $t^j_n$ is dominated on $\mathcal{G}$ by

$$t^{j+}_n = (1 - \varphi^*_j(S))\, X_n, \tag{1.22}$$

where $a^* = \max\{\min\{a, 1\}, 0\}$.

We now turn to the study of Bayes empirical Bayes rules with respect to a prior $\Lambda$ on $\omega \in (0,1]$. The estimators are of the same general form as the $t^j_n$ but with more complicated shrinking factors. Moreover, they are monotone in $X_n$ and, for suitable $\Lambda$, admissible on $(0,1]$ and a.o. on $(0,1]$ with a rate $O(n^{-1})$. The results of a Monte Carlo study will demonstrate the favorable risk behavior of the Bayes EB estimator based on uniform $\Lambda$.

Let $\Lambda$ be a prior distribution on $\omega \in (0,1]$. By (1.4) and (1.10),

$$R_n(\Lambda, t_n) = \int R(\omega)\, d\Lambda(\omega) + \iint \big(t_n(\underline{x}_{n-1})(x_n) - (1-\omega)x_n\big)^2\, dH^n_\omega(\underline{x}_n)\, d\Lambda(\omega). \tag{1.23}$$

A minimizer of (1.23) is

$$t^\Lambda_n = (1 - \hat\omega)\, X_n, \tag{1.24}$$

where

$$\hat\omega = \frac{\int_0^1 \omega^{\frac{n}{2}+1}\, e^{-\frac{\omega S}{2}}\, d\Lambda(\omega)}{\int_0^1 \omega^{\frac{n}{2}}\, e^{-\frac{\omega S}{2}}\, d\Lambda(\omega)}. \tag{1.25}$$

Here $\hat\omega$ is the conditional expectation of $\omega$ given $\underline{X}_n$ in the model $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, \ldots, X_{n-1}, X_n$ i.i.d. $H_\omega = N(0, \omega^{-1})$. Note that $t^\Lambda_n$ is unique a.e. Lebesgue on $R^n$-space. Also note that $t^\Lambda_n$ is of the same form as the $t^j_n$ introduced earlier but with a more complicated shrinking factor $\varphi(S) = \hat\omega$. Whereas the EB estimators $t^{j+}_n$ are not monotone in $X_n$ for fixed $\underline{X}_{n-1}$, we have

Remark 1.1. For any prior $\Lambda$ on $(0,1]$, the Bayes EB rule $t^\Lambda_n$ is monotonically increasing in $X_n$ for fixed $\underline{X}_{n-1}$.

Proof. From (1.24) and the fact $S = \sum_1^n X_i^2$, it suffices to prove $\hat\omega$ ↓ wrt $S$. But $\hat\omega$ is the mean of an exponential family with parameter $-S/2$ and is therefore ↑ wrt $-S/2$. □

Remark 1.2. For every prior $\Lambda$ on $(0,1]$, the Bayes EB rule $t^\Lambda_n$ is admissible on $(0,1]$.

Proof. The uniqueness a.e. Lebesgue of a Bayes rule implies its admissibility. □

The Bayes EB estimator $t^\Lambda_n$ is easily evaluated, and the rate of a.o. determined, for the prior $\Lambda = \text{Beta}(\alpha, 1)$, $\alpha > 0$. Let $t^{(\alpha)}_n$ be Bayes EB versus Beta$(\alpha, 1)$. (This class of Beta priors was used by Strawderman (1971) in constructing a prior for the compound decision problem. He shows that the compound version $\tilde t^{(\alpha)}_n$ of $t^{(\alpha)}_n$ satisfies (1.8) provided $n \ge 4 + 2\alpha$.) With the Beta$(\alpha, 1)$ prior, (1.25) simplifies using integration by parts to

$$\hat\omega = \frac{n + 2\alpha}{S} - 2\Big(S \int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)S/2}\, du\Big)^{-1}. \tag{1.26}$$

In case $\frac{n}{2} + \alpha - 1$ is an integer, say $m$, (1.26) further simplifies through repeated integration by parts to the closed form

$$\hat\omega = \frac{n + 2\alpha}{S} - \Big(\tfrac{1}{2}S\Big)^m \Big[m!\Big(e^{S/2} - \sum_{k=0}^m \frac{(S/2)^k}{k!}\Big)\Big]^{-1}. \tag{1.27}$$

Using (1.27), $\hat\omega$ can be easily calculated to any degree of precision. Since the EB risk $R_n(\omega, t^{(\alpha)}_n)$ is the expectation of the compound risk $R(\underline\theta, \tilde t^{(\alpha)}_n)$, the aforementioned Strawderman (1971) result implies

Remark 1.3. If $n \ge 4 + 2\alpha$,

$$R_n(\omega, t^{(\alpha)}_n) < 1, \quad 0 < \omega \le 1. \tag{1.28}$$

The conditional expectation $\hat\omega$ of (1.26) satisfies

$$\lim_{S \downarrow 0} \hat\omega = \frac{n + 2\alpha}{n + 2\alpha + 2}, \qquad \lim_{S \to \infty} S\hat\omega = n + 2\alpha. \tag{1.29}$$

Figure 1.1 displays, for $n = 20$, the graphs of the $\varphi_j(S)$ that are part of the EB estimators $t^{j+}_n$, $j = 1, 2, 3$, as well as $\hat\omega$ of (1.27) with $\alpha = 1$ (uniform prior).

[Figure 1.1]
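Since (1.27) is a finite closed form, $\hat\omega$ is simple to compute. The sketch below (illustrative code, not from the thesis; it assumes $\alpha = 1$ and $n$ even so that $m = \frac{n}{2} + \alpha - 1$ is an integer) evaluates (1.27) and compares it with the retracted James-Stein factor $\varphi^*_1(S) = \min\{(n-2)/S,\, 1\}$, echoing the comparison in Figure 1.1.

```python
# Evaluate the closed form (1.27) for the Beta(alpha, 1) prior and compare
# with the retracted James-Stein shrinking factor min{(n-2)/S, 1}.
import math

def w_hat(S, n, alpha=1):
    m = n // 2 + alpha - 1               # assumes n even, so m is an integer
    lam = S / 2.0
    # tail of the exponential series: e^{S/2} - sum_{k=0}^m (S/2)^k / k!
    tail = math.exp(lam) - sum(lam**k / math.factorial(k) for k in range(m + 1))
    return (n + 2 * alpha) / S - lam**m / (math.factorial(m) * tail)

n = 20
for S in (5.0, float(n - 2), 60.0):
    print(S, w_hat(S, n), min((n - 2) / S, 1.0))
```

For large $S$ the two factors nearly coincide, consistent with (1.29); for very small $S$ the series tail involves cancellation, so higher precision arithmetic may be preferred there.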
The conditional expectation $\hat\omega$ of (1.26) is close to the corresponding factor $\varphi^*_1(S)$ in the modified James-Stein EB rule $t^{1+}_n$. Also, $t^{1+}_n$ and $t^{(\alpha)}_n$ have the same rate of a.o., namely $O(n^{-1})$ uniformly in $\omega$. To establish these results we begin with

Lemma 1.1. Let $\omega_n = \min\{1, (n + 2\alpha - 2)/S\}$ and let $\hat\omega$ be given by (1.26). If $n + 2\alpha - 2 > 0$, then

$$(\hat\omega - \omega_n)^2 \le \frac{4}{n + 2\alpha - 2}. \tag{1.30}$$

Proof. Let $\tilde\Lambda$ denote the conditional distribution of $\omega$ in the model $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, X_2, \ldots, X_n$ i.i.d. $H_\omega = N(0, \omega^{-1})$. Then $\tilde\Lambda$ has a density with respect to Lebesgue measure which is proportional to

$$\tilde\lambda(\omega) = \omega^{\frac{n}{2}+\alpha-1}\, e^{-\frac{\omega S}{2}}, \quad 0 < \omega \le 1. \tag{1.31}$$

Assume that $n + 2\alpha - 2 > 0$. Examination of $g(\omega) = \ln \tilde\lambda(\omega)$ shows that $\omega_n$ is the maximizer of $\tilde\lambda(\omega)$. Note that

$$|\hat\omega - \omega_n| = \Big|\int (\omega - \omega_n)\, d\tilde\Lambda(\omega)\Big| \le \int |\omega - \omega_n|\, d\tilde\Lambda(\omega) = \int_0^\infty \tilde\Lambda[\omega > \omega_n + t]\, dt + \int_0^\infty \tilde\Lambda[\omega < \omega_n - t]\, dt. \tag{1.32}$$

The density $\tilde\lambda$ is $\ln$-concave, from which it follows that

$$\frac{\tilde\lambda(\omega)}{\tilde\Lambda[\omega, 1]} \uparrow \text{ in } \omega \in (0,1] \tag{1.33}$$

(cf. Gilliland, Hannan and Huang (1976), Lemma 8). Also, $\tilde\lambda(1-\omega)$ is $\ln$-concave, from which it follows that

$$\frac{\tilde\lambda(\omega)}{\tilde\Lambda(0, \omega]} \downarrow \text{ in } \omega \in (0,1]. \tag{1.34}$$

Thus,

$$\tilde\Lambda[\omega > \omega_n + t] \le \tilde\lambda(\omega_n + t)/\tilde\lambda(\omega_n), \qquad \tilde\Lambda[\omega < \omega_n - t] \le \tilde\lambda(\omega_n - t)/\tilde\lambda(\omega_n). \tag{1.35}$$

The Taylor series expansion of $g = \ln \tilde\lambda$ about $\omega = \omega_n$ shows that the ratios in (1.35) decay rapidly in $t$; integrating over $t$ in (1.32) then yields (1.30). □

Lemma 1.2. If $n + 2\alpha - 2 > 0$, then for $0 < \omega \le 1$,

$$R_n(\omega, t^{(\alpha)}_n) - R(\omega) \le \frac{8\,\omega^{-1}}{n + 2\alpha - 2} + \frac{4n + 8(\alpha^2 - 1)}{n(n-2)}\,\omega. \tag{1.38}$$

Proof. Using (1.13), and triangulation about $\omega_n$ of Lemma 1.1,

$$R_n(\omega, t^{(\alpha)}_n) - R(\omega) \le 2\,E[(\hat\omega - \omega_n)^2 X_n^2] + 2\,E[(\omega_n - \omega)^2 X_n^2]. \tag{1.39}$$

By Lemma 1.1 and the fact $E X_n^2 = \omega^{-1}$,

$$E[(\hat\omega - \omega_n)^2 X_n^2] \le \frac{4}{n + 2\alpha - 2}\,\omega^{-1}. \tag{1.40}$$

Also,

$$E[(\omega_n - \omega)^2 X_n^2] \le E\Big[\Big(\frac{n + 2\alpha - 2}{S} - \omega\Big)^2 X_n^2\Big]. \tag{1.41}$$

Expanding the square in (1.41) and using the fact that $\omega S \sim \chi^2(n)$ independent of $X_n^2/S$, it follows that

$$E\Big[\Big(\frac{n + 2\alpha - 2}{S} - \omega\Big)^2 X_n^2\Big] = \frac{2n + 4(\alpha^2 - 1)}{n(n-2)}\,\omega, \tag{1.42}$$

which together with (1.39)-(1.41) completes the proof. □

The bound in (1.38) fails to demonstrate the uniform $O(n^{-1})$ rate on $(0,1]$. The following theorem does establish this uniform rate by combining (1.38) with a bound designed for risk in a neighborhood of $\omega = 0$.

Theorem 1.1. The Bayes EB estimator $t^{(\alpha)}_n$ is a.o. with a rate $O(n^{-1})$ uniform in $\omega \in (0,1]$.

Proof. By (1.13), (1.26), and the fact $n\,E[(\hat\omega - \omega)^2 X_n^2] = E[(\hat\omega - \omega)^2 S]$ (by the symmetry of $\hat\omega$ in $X_1, X_2, \ldots, X_n$),

$$n\,[R_n(\omega, t^{(\alpha)}_n) - R(\omega)] = E\Big[\frac{1}{S}\,\{n - 2 - \omega S + 2(1 + \alpha) - 2f(S)\}^2\Big], \tag{1.43}$$

where

$$f(S) = \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)S/2}\, du\Big)^{-1}. \tag{1.44}$$

Let $n > 2$. Note that $E[S^{-1} g(S)] = \frac{\omega}{n-2}\, E\, g(Y)$, where $\omega S \sim \chi^2(n)$ and $\omega Y \sim \chi^2(n-2)$, by a change of variable argument. Now the variance of $\omega Y$ is

$$E\{(n-2) - \omega Y\}^2 = 2(n-2) \tag{1.45}$$

and

$$\mathrm{Cov}(Y, f(Y)) < 0 \tag{1.46}$$

since $f(Y)$ ↓ with respect to $Y$, so that (1.43) shows that

$$\mathrm{LHS}(1.43) \le 2\omega + \frac{4\omega\, E[\{(1+\alpha) - f(Y)\}^2]}{n-2}. \tag{1.47}$$

The proof is completed by showing that $E f^2(Y)$ is $O(n)$ uniformly in $\omega \in (0, \frac{1}{3}]$ and using (1.38) to establish the uniform rate on $[\frac{1}{3}, 1]$. From (1.44), for any $b > 0$,

$$E f^2(Y) \le \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, du\Big)^{-2}\, P[\omega Y \le b] + \Big(\int_0^1 u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)b/2\omega}\, du\Big)^{-2}, \tag{1.48}$$

where use is made of the fact that the integrand in (1.44) is increasing in $S$. The choice $b = \frac{1}{2}n + \alpha - 1$ ensures that $P[\omega Y \le b] \le 2(n-2)\,(\frac{1}{2}n - \alpha - 1)^{-2}$ (use the Chebyshev inequality and the fact $\omega Y \sim \chi^2(n-2)$). With this choice of $b$,

$$u^{\frac{n}{2}+\alpha-1}\, e^{(1-u)b/2\omega} > 1, \quad 2\omega < u < 1, \tag{1.49}$$

so that the last term on RHS(1.48) is bounded by $(1 - 2\omega)^{-2}$. Hence $E f^2(Y) = O(n)$ uniformly in $\omega \in (0, \frac{1}{3}]$. □

Maritz (1970, Chapter 3) proposes several methods of obtaining "smooth" EB estimators. He illustrates two of these methods for the normal case EB example of this section. An estimator $\delta_G$ is described in Maritz Examples 3.4.4 and 2.14.2, which is

$$\delta_G = (1 - \hat\omega^*_M)\, X_n, \tag{1.50}$$

where $\hat\omega^*_M$ is the retraction of the shrinking factor

$$\hat\omega_M = \frac{n-3}{S_M}, \qquad S_M = \sum_{i=1}^{n-1} X_i^2. \tag{1.51}$$

The estimator $\delta_G$ is seen to be a delete version of $t^{1+}_n$, where delete refers to the fact that only the initial observations $\underline{X}_{n-1}$ are used to estimate $\omega$. By (1.13),

$$R_n(\omega, \delta_G) - R(\omega) = \omega^{-1}\, E\,(\omega - \hat\omega^*_M)^2 \le \omega^{-1}\, E\,(\omega - \hat\omega_M)^2 = \frac{2\omega}{n-5}, \tag{1.52}$$

so $\delta_G$ is seen to be a.o.
with a rate $O(n^{-1})$ uniform in $\omega \in (0,1]$. For comparison, the James-Stein estimator, which uses the untruncated $\varphi_1(S) = (n-2)/S$ to estimate $\omega$, has excess risk $2\omega/n$.

The Maritz estimator $\rho_3$ is constructed by finding the 3-point uniform distribution $\hat G$ on $\theta \in (-\infty, \infty)$, among all such distributions, so that the $\hat G$-mixture of $N(\theta, 1)$ minimizes a distance between the mixture and the empirical distribution of $X_1, X_2, \ldots, X_{n-1}$. (See Maritz (1970, pp. 54-55).) The EB estimator of $\theta_n$ is then taken to be

$$\rho_3 = d_{\hat G}(X_n). \tag{1.53}$$

Both Maritz EB estimators are "smooth" in the sense of being monotone in $X_n$ for fixed $\underline{X}_{n-1}$. However, as the following table shows, both estimators perform poorly relative to $t^{(1)}_n$, and even $t^{1+}_n$ and $t^1_n$, for selected values of $n$ and $\omega$.

Table 1.1. $R_n(\omega, t_n) - R(\omega)$ Values

(1-ω)/ω    ω      R(ω)    Estimator    n = 4    n = 10    n = 20
5          .167   .833    t_n^(1)      .13      .05       .02
                          t_n^{1+}     .09      .03       .02
                          t_n^1        .08      .03       .02
2          .333   .667    t_n^(1)      .09      .06       .04
                          t_n^{1+}     .18      .06       .03
                          t_n^1        .17      .07       .03
1          .500   .500    t_n^(1)      .06      .04       .04
                          t_n^{1+}     .25      .09       .05
                          t_n^1        .25      .10       .05
                          δ_G          --       .17       .10
                          ρ_3          --       .32       .19
.5         .667   .333    t_n^(1)      .07      .03       .03
                          t_n^{1+}     .33      .10       .06
                          t_n^1        .33      .13       .07
                          δ_G          --       .16       .09
                          ρ_3          --       .22       .14
.1         .909   .091    t_n^(1)      .16      .07       .04
                          t_n^{1+}     .44      .12       .07
                          t_n^1        .46      .18       .09
                          δ_G          --       .12       .07
                          ρ_3          --       .16       .07
.01        .990   .010    t_n^(1)      .20      .10       .06
                          t_n^{1+}     .48      .13       .07
                          t_n^1        .50      .20       .10

The values for the James-Stein estimator $t^1_n$ are exactly $2\omega/n$. The values for the modified James-Stein estimator $t^{1+}_n$ and the Bayes EB rule $t^{(1)}_n$ are estimates based on 1000 Monte Carlo trials. The estimated standard deviations are generally about 8% of the estimated excess risks. The values for the Maritz estimators $\delta_G$ and $\rho_3$ are taken from his Table 3.14 (1970), where the margins of error of the Monte Carlo estimates are reported to be less than ±.02. Also, the Maritz risks are based on EB decision problems with n = 11 (not 10) and n = 21 (not 20) observations.

Figure 1.1, Lemma 1.1 and Theorem 1.1 suggest that $t^{(1)}_n$ and the modified James-Stein estimator $t^{1+}_n$ should have very similar EB risk behavior for moderate to large $n$. Also, $t^{1+}_n$ and $t^1_n$ should have very similar EB risk when $\omega$ is small, since $\omega S \sim \chi^2(n)$ and $t^{1+}_n = t^1_n$ if $S \ge n-2$. Table 1.1 illustrates these facts. Furthermore, it shows that $\rho_3$ (and $\delta_G$) are poor estimators in the tested combinations of $\omega$ and $n$, contrary to the conclusion Maritz (1970, p. 72) reaches by comparing $\rho_3$ and $\delta_G$ with the simple estimator $X_n$.
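The Monte Carlo estimates reported in Table 1.1 can be reproduced along the following lines (a sketch under our own naming; the thesis's simulation code is not shown). It uses the representation (1.13): for a rule $t_n = (1 - \varphi(S))X_n$ the excess risk is $E[(\varphi(S) - \omega)^2 X_n^2]$, and the exact James-Stein value $2\omega/n$ serves as a check.

```python
# Monte Carlo estimation of the excess risk R_n(omega, t_n) - R(omega) for
# shrinkage rules t_n = (1 - phi(S)) X_n, via (1.13).
import numpy as np

rng = np.random.default_rng(1)

def excess_risk(phi, omega, n, trials=1000):
    vals = np.empty(trials)
    for t in range(trials):
        x = rng.normal(scale=np.sqrt(1.0 / omega), size=n)  # X_i i.i.d. H_omega = N(0, 1/omega)
        S = float(np.sum(np.square(x)))
        vals[t] = (phi(S, n) - omega) ** 2 * x[-1] ** 2
    return vals.mean()

phi1 = lambda S, n: (n - 2) / S                  # James-Stein factor (excess exactly 2*omega/n)
phi1_star = lambda S, n: min((n - 2) / S, 1.0)   # retracted factor of t_n^{1+}
for omega in (0.5, 0.909):
    print(omega,
          excess_risk(phi1, omega, 20),
          excess_risk(phi1_star, omega, 20),
          2 * omega / 20)
```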
1.3 Literature Review

Gilliland and Hannan (1974) and Gilliland, Hannan and Huang (1976) discuss Bayes procedures for the finite state compound decision problem. Bayes compound procedures with respect to mixtures of product distributions are shown to be Bayes EB procedures. Implications for the EB problem are discussed, including asymptotic optimality. Gilliland and Boyer (1979) demonstrate that a.o. in the finite state EB problem is an easy consequence of classical results on the consistency of posterior distributions. Tsao (1980a, 1980b) gives an algorithm to efficiently compute Bayes EB procedures and uses a Monte Carlo simulation to develop small $n$ risk behavior for selected $\Lambda$ and the Robbins two state component.

In the finite state case, the unrestricted family $\mathcal{G}$ of distributions on $\theta$ is finite-dimensional. Otherwise, $\mathcal{G}$ is infinite-dimensional and the process of placing priors $\Lambda$ on $\mathcal{G}$ is itself a technical problem. Meeden (1972), for certain components with $\theta$ restricted to the unit interval, places a prior $\Lambda$ on $\mathcal{G}$ through the moment sequence and demonstrates the a.o. of the resulting Bayes EB estimator. Kuo (1980) proposes a way to compute the Bayes EB estimators when $\Lambda$ is a Dirichlet process. The a.o. property is not established in this general setting. Deely and Lindley (1979) illustrate the use of Bayes procedures in EB decision making but do not consider a.o.

CHAPTER II

ASYMPTOTIC OPTIMALITY OF BAYES EMPIRICAL BAYES RULES

Consider the EB decision problem of Section 1.1 with the one-dimensional family $\mathcal{G}$ of component priors of (1.5). We let $F_\theta$ denote the distribution of $X$ given $\theta$ and, when $\theta \sim G_\omega$, let $H_\omega$ denote the marginal distribution of $X$. Throughout this chapter we assume that the family $\{H_\omega \mid \omega \in \Omega\}$ is identifiable and dominated, and we let $h(x|\omega)$ denote a density for $H_\omega$. It is assumed that $h(x|\omega)$ is jointly measurable in $x$ and $\omega$. This assumption is part of the hypothesis of a Schwartz theorem on the consistency of posterior distributions, which will be applied in the Bayes EB approach where a probability measure $\Lambda$ is placed on $\Omega$.

From (1.3) and (1.6),

$$R_n(\Lambda, t_n) = \iint R(\omega, t_n(\underline{x}_{n-1}))\, d\Lambda_n(\omega)\, dP(\underline{x}_{n-1}), \tag{2.1}$$

where, here and throughout, $\Lambda_n(\cdot) = \Lambda(\cdot \mid \underline{x}_{n-1})$ is the conditional distribution of $\omega$ given $\underline{X}_{n-1}$ in the model where $\omega \sim \Lambda$ and, conditional on $\omega$, $X_1, X_2, \ldots, X_{n-1}$ are i.i.d. $H_\omega$. As Gilliland and Boyer (1979) point out, it follows from (2.1) that a Bayes EB rule is provided by

$$t^\Lambda_n(\underline{x}_{n-1}) = d_{\hat G_n}, \tag{2.2}$$

where $d_{\hat G_n}$ is component Bayes versus the mixture

$$\hat G_n(\cdot) = \int G_\omega(\cdot)\, d\Lambda_n(\omega). \tag{2.3}$$

Of course, the random measure $\hat G_n$ need not be $\mathcal{G}$-valued, where $\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}$. In the normal example of Section 1.2, $G_\omega = N(0, (1-\omega)/\omega)$ and nondegenerate $\Lambda$-mixtures of the $G_\omega$ are not normal (cf. Teicher (1960), Corollary, p. 67).

A basic condition critical for our proofs of a.o. for Bayes EB procedures is

$$\Lambda_n(V^c) \to 0 \quad \text{a.s. } H^\infty_{\omega_0} \tag{A}$$

for every $\omega_0 \in \Omega$ and every neighborhood $V$ of $\omega_0$, where $V^c$ denotes the complement of $V$ in $\Omega$. The condition (A) is easily seen to imply

$$\int \psi(\omega)\, d\Lambda_n(\omega) \to \psi(\omega_0) \quad \text{a.s. } H^\infty_{\omega_0} \tag{A$^-$}$$

for every $\omega_0 \in \Omega$ and every bounded continuous function $\psi$.

Theorems 2.1, 2.2 and 2.3 give conditions which together with (A$^-$) imply the a.o. property for the Bayes EB procedure $t^\Lambda_n$. Theorem 2.4 states conditions on the family $\{H_\omega \mid \omega \in \Omega\}$ sufficient for (A). The chapter is completed with four examples of Bayes EB procedures whose a.o. property follows from the theorems.

Theorem 2.1. Suppose that the component risk functions $R(G, d)$ are bounded (by $M < \infty$). Suppose each $G_\omega$ has a density with respect to Lebesgue measure given by $g(\theta|\omega)$ which is continuous in $\omega \in \Omega$ for each fixed $\theta \in \Theta$. Then if (A$^-$) obtains, $t^\Lambda_n$ of (2.2) is a.o. on $\Omega$; that is,

$$R_n(\omega_0, t^\Lambda_n) \to R(\omega_0) \quad \text{for all } \omega_0 \in \Omega. \tag{2.4}$$

Proof. Fix $\omega_0 \in \Omega$. Note that for each prior $G$ on $\theta$,

$$0 \le R(\omega_0, d_G) - R(\omega_0) = R(\omega_0, d_G) - R(G, d_G) + R(G, d_G) - R(\omega_0) \le [R(\omega_0, d_G) - R(G, d_G)] + [R(G, d_{\omega_0}) - R(\omega_0, d_{\omega_0})]. \tag{2.5}$$

Let $\hat G_n$ be given by (2.3) and note that $\hat G_n$ has density

$$\hat g_n(\theta) = \int g(\theta|\omega)\, d\Lambda_n(\omega). \tag{2.6}$$

For any $d \in D$,

$$R(\omega_0, d) - R(\hat G_n, d) = \int R(\theta, d) \int \{g(\theta|\omega_0) - g(\theta|\omega)\}\, d\Lambda_n(\omega)\, d\theta,$$

from which

$$|R(\omega_0, d) - R(\hat G_n, d)| \le M \iint |g(\theta|\omega_0) - g(\theta|\omega)|\, d\theta\, d\Lambda_n(\omega). \tag{2.7}$$

Note that $|g(\theta|\omega_0) - g(\theta|\omega)| \le g(\theta|\omega_0) + g(\theta|\omega)$ and $\int \{g(\theta|\omega_0) + g(\theta|\omega)\}\, d\theta = 2$. It follows from the assumed continuity of $g(\theta|\omega)$ in $\omega$ and the general dominated convergence theorem (DCT) (Royden (1968, p. 89)) that $\int |g(\theta|\omega_0) - g(\theta|\omega)|\, d\theta$ is a continuous as well as bounded function of $\omega$. Therefore, (A$^-$) implies

$$\mathrm{RHS}(2.7) \to 0 \quad \text{a.s. } H^\infty_{\omega_0}. \tag{2.8}$$

Letting $G = \hat G_n$ in (2.5) and using the bound RHS(2.7) for each of the square bracket terms, one obtains

$$R(\omega_0, t^\Lambda_n(\underline{X}_{n-1})) - R(\omega_0) \to 0 \quad \text{a.s. } H^\infty_{\omega_0}. \tag{2.9}$$

Since all risks are bounded, the a.o. of $t^\Lambda_n$ follows from (2.9) and the DCT. □
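The following sketch (ours; it uses a simple grid approximation, and all names are illustrative) shows condition (A) numerically for the normal family of Section 1.2 with $\Lambda$ uniform on $(0,1]$: the posterior mass $\Lambda_n(V^c)$ outside a neighborhood $V$ of the true $\omega_0$ vanishes as $n$ grows.

```python
# Grid illustration of condition (A): posterior concentration of Lambda_n at
# omega_0 when X_i are i.i.d. H_omega0 = N(0, 1/omega0) and Lambda = Uniform(0,1].
import numpy as np

rng = np.random.default_rng(2)
omega0 = 0.4
grid = np.linspace(0.01, 1.0, 400)           # grid over Omega = (0, 1]

for n in (10, 100, 1000):
    x = rng.normal(scale=np.sqrt(1.0 / omega0), size=n)
    S = float(np.sum(np.square(x)))
    # log posterior over the grid; the uniform prior contributes a constant
    logpost = (n / 2.0) * np.log(grid) - grid * S / 2.0
    post = np.exp(logpost - logpost.max())
    post /= post.sum()
    outside = post[np.abs(grid - omega0) > 0.1].sum()   # Lambda_n(V^c), V = omega0 +- .1
    print(n, outside, float(np.sum(grid * post)))       # mass outside V, posterior mean
```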
The Bayes EB rule $t^\Lambda_n$ defined by (2.2) has a useful alternative representation in the special case of squared error loss estimation and when the family $\{F_\theta \mid \theta \in \Theta\}$ is dominated. Let $f(x|\theta)$ denote a density for $F_\theta$. Since $d_G(x)$ is the point estimator

$$d_G(x) = \frac{\int \theta f(x|\theta)\, dG(\theta)}{\int f(x|\theta)\, dG(\theta)} \tag{2.10}$$

and $\hat G_n$ is a $\Lambda_n$-mixture of the $G_\omega$, (2.2) implies that

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) = \frac{\iint \theta f(x_n|\theta)\, dG_\omega(\theta)\, d\Lambda_n(\omega)}{\iint f(x_n|\theta)\, dG_\omega(\theta)\, d\Lambda_n(\omega)} = \frac{\int d_\omega(x_n)\, h(x_n|\omega)\, d\Lambda_n(\omega)}{\int h(x_n|\omega)\, d\Lambda_n(\omega)} = \frac{\int d_\omega(x_n) \prod_{i=1}^n h(x_i|\omega)\, d\Lambda(\omega)}{\int \prod_{i=1}^n h(x_i|\omega)\, d\Lambda(\omega)}.$$

Thus, $t^\Lambda_n$ is the point estimator

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) = \int d_\omega(x_n)\, d\Lambda_{n+1}(\omega). \tag{2.11}$$

Theorem 2.2. Suppose that the component loss function is squared error loss for the estimation of $\theta$. Suppose that for each $\omega \in \Omega$, the component Bayes rule $d_\omega$ with respect to $G_\omega$ is of the form

$$d_\omega(x) = \sum_{i=1}^p \varphi_i(x)\,\psi_i(\omega) \tag{2.12}$$

for some integer $p$, some square integrable ($H_\omega$) functions $\varphi_i$ and some bounded, continuous functions $\psi_i$. Then if (A$^-$) obtains, $t^\Lambda_n$ of (2.11) is a.o. on $\Omega$.

Proof. Fix $\omega_0 \in \Omega$. By (2.11) and (2.12),

$$t^\Lambda_n(\underline{x}_{n-1})(x_n) - d_{\omega_0}(x_n) = \sum_{i=1}^p \varphi_i(x_n)\{\psi^*_i - \psi_i(\omega_0)\}, \tag{2.13}$$

where

$$\psi^*_i = \int \psi_i(\omega)\, d\Lambda_{n+1}(\omega), \quad i = 1, 2, \ldots, p. \tag{2.14}$$

By (2.13),

$$\big(t^\Lambda_n(\underline{x}_{n-1})(x_n) - d_{\omega_0}(x_n)\big)^2 \le p \sum_{i=1}^p \varphi_i^2(x_n)\{\psi^*_i - \psi_i(\omega_0)\}^2. \tag{2.15}$$

Using (1.4) and (2.15), and using the invariance of $\psi^*_i$ and the distribution $H^n_{\omega_0}$ under permutation of $x_1, x_2, \ldots, x_n$ to permute $x_1$ and $x_n$,

$$0 \le R_n(\omega_0, t^\Lambda_n) - R(\omega_0) \le p \sum_{i=1}^p \int \varphi_i^2(x_1)\{\psi^*_i - \psi_i(\omega_0)\}^2\, dH^n_{\omega_0}(\underline{x}_n). \tag{2.16}$$

By (A$^-$), $\psi^*_i \to \psi_i(\omega_0)$ a.s. $H^\infty_{\omega_0}$. Therefore, by the assumed square integrability of the $\varphi_i$ and the DCT, RHS(2.16) → 0. □

Consider the linear loss multiple decision problem of Van Ryzin and Susarla (1977). From (33) of Van Ryzin and Susarla (1977) or (14) of Gilliland and Hannan (1977), it follows that the excess risk of a Bayes EB procedure can be bounded by a multiple of the mean error of estimation of $d_\omega(X_n)$ by RHS(2.11), rather than the mean square error as with the squared error loss estimation component.

Theorem 2.3. Consider the linear loss $k$-action multiple decision problem of Van Ryzin and Susarla (1977). Suppose that for each $\omega \in \Omega$, the component conditional expectation $d_\omega(X)$ is given by (2.12) for some integer $p$, some integrable ($H_\omega$) functions $\varphi_i$ and some bounded, continuous functions $\psi_i$. Then if (A$^-$) obtains, $t^\Lambda_n$ is a.o. on $\Omega$.

Proof. From the $c = 1$ case of (14) of Gilliland and Hannan (1977),

$$0 \le R_n(\omega_0, t^\Lambda_n) - R(\omega_0) \le (k-1) \sum_{i=1}^p \int |\varphi_i(x_1)\{\psi^*_i - \psi_i(\omega_0)\}|\, dH^n_{\omega_0}(\underline{x}_n), \tag{2.17}$$

where use is made of (2.11) and (2.12) and the invariance under permutations used to reach (2.16). By (A$^-$), $\psi^*_i \to \psi_i(\omega_0)$ a.s. $H^\infty_{\omega_0}$. Therefore, by the assumed integrability of the $\varphi_i$ and the DCT, RHS(2.17) → 0. □

The next theorem gives a set of conditions sufficient for (A) and hence (A$^-$). It depends for its proof upon Theorem 6.1 of Schwartz (1965), which we state in a notation appropriate to the application at hand.

Theorem 6.1 (Schwartz). Suppose that (i) the densities $h(x|\omega)$ are jointly measurable, (ii) $V$ is a neighborhood of $\omega_0$ and there is a uniformly consistent test of the hypothesis $\omega = \omega_0$ against the alternative $\omega \in V^c$, and (iii) for every $\varepsilon > 0$, $V$ contains a subset $W$ such that $\Lambda(W) > 0$ and the Kullback-Leibler information number

$$K(\omega, \omega_0) = \int \log h(x|\omega)\, dH_{\omega_0}(x) \tag{2.18}$$

satisfies $K(\omega, \omega_0) > K(\omega_0, \omega_0) - \varepsilon$ on $W$. Then $\Lambda_n(V^c) \to 0$ a.s. $H^\infty_{\omega_0}$.

Theorem 2.4. Suppose that for each $n$, the joint density $h(\underline{x}_n|\omega) = \prod_{i=1}^n h(x_i|\omega)$ has a monotone likelihood ratio (MLR) in $T_n(\underline{x}_n)$. Suppose that the Kullback-Leibler information number $K(\omega, \omega_0)$ is finite and is continuous in $\omega$ for each $\omega_0 \in \Omega$. Then if $\Lambda$ has support equal to $\Omega$, (A) obtains.

Proof. Let $\omega_0, \omega_1 \in \Omega$, $\omega_1 < \omega_0$.
There exists a consistent test $\varphi_n$ of $H_{\omega_0}$ versus $H_{\omega_1}$ since $H_{\omega_1} \ne H_{\omega_0}$ by identifiability. Let $\varphi^*_n$ be the Neyman-Pearson test based on $T_n$ of the same size as $\varphi_n$. Then $\varphi^*_n$ is uniformly consistent for $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \le \omega_1\}$ by the MLR property. Similarly, there exists a uniformly consistent test of $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \ge \omega_2\}$, where $\omega_2 \in \Omega$, $\omega_2 > \omega_0$. It follows that there exists a uniformly consistent test of $H_{\omega_0}$ versus $\{H_\omega \mid \omega \in \Omega \text{ and } \omega \notin (\omega_1, \omega_2)\}$. Moreover, the continuity of $K$ implies that for every $\Omega$-neighborhood $V$ of $\omega_0$ and every $\varepsilon > 0$ there exists a subset $W$ of $V$ such that $\Lambda(W) > 0$ and $K(\omega, \omega_0) - K(\omega_0, \omega_0) > -\varepsilon$ on $W$. Theorem 6.1 establishes that (A) holds. □

The family of densities

$$h(x|\omega) = c(\omega)\, e^{Q(\omega)T(x)}\, h(x), \quad \omega \in \Omega, \tag{2.19}$$

where $Q$ is a monotone function, results in the family $h(\underline{x}_n|\omega)$ being MLR in $T_n(\underline{x}_n) = \sum T(x_i)$. In each of the examples to follow, the mixtures $\{H_\omega \mid \omega \in \Omega\}$ form a one-parameter exponential family.

Example 2.1 (Normal case of §1.2). Let $F_\theta = N(\theta, 1)$, $\theta \in \Theta = (-\infty, \infty)$, and $G_\omega = N(0, (1-\omega)/\omega)$, $\omega \in \Omega = (0,1]$. Then

$$h(x|\omega) = \sqrt{\frac{\omega}{2\pi}}\, e^{-\frac{1}{2}\omega x^2}, \quad -\infty < x < \infty.$$

Moreover,

$$K(\omega, \omega_0) = \frac{1}{2}\log\frac{\omega}{2\pi} - \frac{\omega}{2\omega_0},$$

which is finite and continuous in $\omega$. Here the component conditional expectation is $(1-\omega)X$; note that $\psi(\omega) = 1-\omega$ is bounded and continuous and $X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for each fixed $\theta$.

Example 2.2. Let $F_\theta$ = Poisson($\theta$), $\theta \in (0, \infty)$, and let $G_\omega$ = Gamma with shape $\alpha_0$ and scale $\omega/(1-\omega)$, $\omega \in (0,1)$, where $\alpha_0 > 0$ is fixed. Then

$$h(x|\omega) = \frac{\Gamma(\alpha_0 + x)}{\Gamma(\alpha_0)\, x!}\, \omega^x (1-\omega)^{\alpha_0}, \quad x = 0, 1, \ldots.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $(\alpha_0 + X)\omega$. Also, $\omega$ is bounded and continuous and $\alpha_0 + X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Example 2.3. Let $F_\theta$ be Binomial($n = 1, \theta$), $\theta \in [0,1]$, and let $G_\omega$ = Beta($\alpha_0, \alpha_0(1-\omega)/\omega$), $\omega \in (0,1)$, where $\alpha_0 > 0$ is fixed. Then

$$h(x|\omega) = \omega^x (1-\omega)^{1-x}, \quad x = 0, 1.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $(\alpha_0 + X)\omega/(\alpha_0 + \omega)$. Also, $\omega/(\alpha_0 + \omega)$ is bounded and continuous and $\alpha_0 + X$ is square integrable ($H_\omega$). The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Example 2.4. Let $F_\theta$ = Uniform($0, \theta$), $\theta \in (0, \infty)$, and $G_\omega$ = Gamma($2, \omega^{-1}$), $\omega \in [a, \infty)$ where $a > 0$. Then

$$h(x|\omega) = \omega\, e^{-\omega x}, \quad x > 0.$$

Here $K(\omega, \omega_0)$ is continuous in $\omega$ and the component conditional expectation is $X + \omega^{-1} = X \cdot 1 + 1 \cdot \omega^{-1}$; note that $X$ and $1$ are square integrable ($H_\omega$) and $1$ and $\omega^{-1}$ are bounded and continuous on $[a, \infty)$. The density $g(\theta|\omega)$ is continuous in $\omega$ for fixed $\theta$.

Consider any EB decision problem with component probability structure that of any of the Examples. If the loss structure is either bounded, or is squared error loss estimation, or is linear loss multiple decision, then the Theorems show that a Bayes EB procedure $t^\Lambda_n$ will be a.o. on the support of $\Lambda$ and, therefore, on $\mathcal{G} = \{G_\omega \mid \omega \in \Omega\}$ if the support of $\Lambda$ is equal to $\Omega$.

The results of this thesis pertain to EB problems with one-dimensional families $\mathcal{G}$. In generalizing to higher (finite) dimensional families, there is no problem invoking the Schwartz consistency theorem except in the demonstration of a uniformly consistent test. For infinite dimensional families, the technical problems are much greater and only a few examples of a.o. Bayes EB procedures have been given (cf. Meeden (1972)).
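To make the representation (2.11) concrete for Example 2.2, the following sketch (our construction; the uniform prior, the grid, and the names are assumptions, not the thesis's) computes the Bayes EB estimate of $\theta_n$ by numerically averaging $d_\omega(x_n) = (\alpha_0 + x_n)\omega$ against the posterior on $\omega$ determined by the negative binomial marginals $h(x|\omega)$.

```python
# Numerical Bayes EB rule (2.11) for the Poisson component of Example 2.2,
# with Lambda taken uniform on (0, 1) and a grid approximation to Lambda_{n+1}.
import numpy as np
from scipy.special import gammaln

def bayes_eb_poisson(x, alpha0=1.0):
    grid = np.linspace(0.005, 0.995, 199)    # grid over Omega = (0, 1)
    x = np.asarray(x, dtype=float)
    # log h(x_i | w) = log Gamma(alpha0 + x_i) - log Gamma(alpha0) - log x_i!
    #                  + x_i log w + alpha0 log(1 - w), summed over i
    loglik = sum(gammaln(alpha0 + xi) - gammaln(alpha0) - gammaln(xi + 1)
                 + xi * np.log(grid) + alpha0 * np.log(1.0 - grid) for xi in x)
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    d = (alpha0 + x[-1]) * grid              # component Bayes rules d_w(x_n)
    return float(np.sum(d * post))           # (2.11): average d_w(x_n) over Lambda_{n+1}

print(bayes_eb_poisson([3, 1, 0, 2, 4, 1, 2]))
```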
BIBLIOGRAPHY

Deely, J.J. and Lindley, D.V. (1979). Bayes empirical Bayes. University of Canterbury, Christchurch, and University College, London. (Private communication.)

Efron, Bradley and Morris, Carl (1972). Limiting the risk of Bayes and empirical Bayes estimators, Part II: The empirical Bayes case. J. Amer. Statist. Assoc. 67, 130-139.

Efron, Bradley and Morris, Carl (1973). Stein's estimation rule and its competitors -- An empirical Bayes approach. J. Amer. Statist. Assoc. 68, 117-130.

Ferguson, Thomas S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.

Gilliland, Dennis C. and Boyer, John E., Jr. (1979). Bayes empirical Bayes. Dept. of Statistics and Probability, MSU. (Submitted for publication.)

Gilliland, Dennis C. and Hannan, James (1974). The finite state compound decision problem, equivariance and restricted risk components. RM-317, Statistics and Probability, MSU.

Gilliland, Dennis C. and Hannan, James (1977). Improved rates in the empirical Bayes monotone multiple decision problem with MLR family. Ann. Statist. 5, 516-521.

Gilliland, Dennis C., Hannan, James and Huang, J.S. (1976). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. Ann. Statist. 4, 1101-1112.

Good, I.J. (1965). The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Research Monograph No. 30, M.I.T. Press.

James, W. and Stein, C. (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. Math. Statist. Prob. 1, 361-379. University of California Press.

Johns, M.V., Jr. (1956). Contributions to the theory of non-parametric empirical Bayes procedures in statistics. Ph.D. Dissertation, Columbia.

Kuo, Lynn (1980). Computations of mixtures of Dirichlet processes. Technical Report No. 96, Dept. Stat., University of Michigan.

Lindley, D.V. (1971). Bayesian Statistics, A Review. Regional Conference Series in Applied Mathematics No. 2, SIAM, Philadelphia.

Maritz, J.S. (1970). Empirical Bayes Methods. Methuen and Co. Ltd., London.

Meeden, Glen (1972). Some admissible empirical Bayes procedures. Ann. Math. Statist. 43, 96-101.

Robbins, H. (1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 157-163. University of California Press.

Royden, H.L. (1968). Real Analysis, 2nd Edition. Macmillan, New York.

Schwartz, Lorraine (1965). On Bayes procedures. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 4, 10-26.

Strawderman, William E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42, 385-388.

Susarla, V. (1976). A property of Stein's estimator for the mean of a multivariate normal distribution. Statistica Neerlandica 30, 1-5.

Tsao, How Jan (1980a). On the risk performance of Bayes empirical Bayes procedures for classification between N(-1,1) and N(1,1). Statistica Neerlandica 35.

Tsao, How Jan (1980b). On the risk performance of Bayes empirical Bayes procedures in the finite state component case. Ph.D. Dissertation, Dept. Stat. Prob., Michigan State University.

Van Ryzin, J. and Susarla, V. (1977). On the empirical Bayes approach to multiple decision problems. Ann. Statist. 5, 172-181.