ON THE RISK PERFORMANCE OF BAYES EMPIRICAL BAYES PROCEDURES IN THE FINITE STATE COMPONENT CASE

By

How Jan Tsao

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1980

ABSTRACT

ON THE RISK PERFORMANCE OF BAYES EMPIRICAL BAYES PROCEDURES IN THE FINITE STATE COMPONENT CASE

By How Jan Tsao

Since Robbins' introduction of the empirical Bayes approach to a sequence of decision problems, a large literature has evolved treating a variety of component problems. Most of the papers advance empirical Bayes procedures which are asymptotically optimal, and some establish rates of convergence. Within empirical Bayes decision making, the Bayes empirical Bayes approach is discussed by Gilliland and Boyer (1979). In the finite state component case, the Bayes empirical Bayes procedures are shown to have optimal properties in a fairly general setting and are believed to have a small sample advantage over the classical rules. The flexibility of making desirable adjustments to these decision procedures through the choice of prior enables one to set a proper strategy when dealing with actual problems.

In this thesis, a complete class theorem is proved to show that, at each sample stage, the class of Bayes empirical Bayes rules is complete and, under some regularity conditions, that it is minimal complete. In the two state component case the posterior mean which generates the Bayes empirical Bayes rules is shown to be asymptotically normal under certain assumptions.

The use of Bayes empirical Bayes procedures creates some interesting theoretical and computational problems, as the Bayes procedures are fairly complicated in structure. The thesis also develops methods of computing Bayes empirical Bayes rules and determining their small sample risk behavior. In some cases risk functions are evaluated by numerical methods, and, in other cases, Monte Carlo simulation is used to estimate risk.

To my parents and Grace

ACKNOWLEDGMENTS

I would like to take this opportunity to thank my advisor, Professor Dennis C. Gilliland, for his invaluable guidance, assistance, encouragement and caretaking during the entire course of this study. The financial support provided by the Department of Statistics and Probability and the National Science Foundation made my graduate studies possible. I wish to thank them and Clara, who accurately typed the whole manuscript with great patience and skill. My deep gratitude is extended to my parents and my wife, Grace, for their understanding, encouragement and support.

TABLE OF CONTENTS

Chapter 1. FINITE STATE BAYES EMPIRICAL BAYES PROCEDURES ... 1
  1.1 The Component and Empirical Bayes Decision Problems ... 1
  1.2 Bayes Empirical Bayes ... 6
  1.3 A Complete Class Theorem ... 13
  1.4 The Classification Problem ... 23

Chapter 2. TWO STATE BAYES EMPIRICAL BAYES PROCEDURES ... 25
  2.1 Testing Simple Hypothesis Against Simple Alternative ... 26
  2.2 Asymptotic Property of $p_\Lambda(\underline{X}_n)$ ... 30
  2.3 Optimal Properties and Risk Performance of Bayes Empirical Bayes Procedures for Classification Between N(-1,1) and N(1,1) ... 32
  2.4 Other Empirical Bayes Procedures ... 38
  2.5 Monte Carlo Comparisons of T, T1 and T2 ... 41

APPENDIX A ... 46
APPENDIX B ... 57
BIBLIOGRAPHY ... 63

LIST OF TABLES

1. $R_n(p,T_{B(1)})$ ... 36
2. Risk behavior of $R_n(p,T_{B(1)})$ ... 37
3. The flexibility of $R_1(p,T_\Lambda)$ with a prior $\Lambda$ in $\{B(\gamma)\,|\,\gamma > 0\}$ ... 39
4. $R_{50}(p,T_2)$ ... 42
5. Comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$; $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, when n = 1, 2, 5 ... 44
6. Comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$; $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, when n = 10, 25, 50 ... 45
A.1 Risk behavior of $R_n(p,T_1)$ ... 46
A.2 Risk behavior of $R_n(p,T_2)$ ... 47
A.3 Risk behavior of $R_n(p,T_{B(2)})$ ... 48
A.4 Evaluation of the Bayes envelope R(p) ... 49
A.5 Monte Carlo simulation of $R_n(p,T_\Lambda)$, $\Lambda \in \mathcal{B}$ ... 50
A.6 A numerical computation program ... 53
A.7 Monte Carlo simulation of $R_n(p,T_a)$, a = 1, 2 ... 55

CHAPTER 1

FINITE STATE BAYES EMPIRICAL BAYES PROCEDURES

Section 1.1. The component and empirical Bayes decision problems.

Consider the following component statistical decision problem with which we shall be concerned. It comprises:

(i) A sample space $(X,\mathcal{X})$ and a parameter space $(\Omega,\mathcal{F})$, where $\mathcal{X}$, $\mathcal{F}$ are $\sigma$-algebras on $X$, $\Omega$ respectively. $\{P_\theta : \theta \in \Omega\}$ is a family of probability measures on $(X,\mathcal{X})$ dominated by some $\sigma$-finite measure $\mu$; $f_\theta$ is a density for $P_\theta$ with respect to $\mu$, $\theta \in \Omega$. $X$ denotes an $X$-valued random variable distributed $P_\theta$, conditional on $\theta$.

(ii) An action space $(A,\mathcal{A})$ where $\mathcal{A}$ is a $\sigma$-algebra on $A$ containing the singleton sets.

(iii) A loss function $L: \Omega \times A \to [0,\infty)$ representing the loss of taking action $a$ in $A$ when $\theta \in \Omega$ obtains. $L(\theta,\cdot)$ is measurable for each $\theta \in \Omega$.

(iv) The (behavioral) decision rules $t(\cdot,\cdot)$, each a function of the pair $(x,B)$ where $x \in X$ and $B \in \mathcal{A}$, having the measurability properties: (a) for each $x$, $t(x,\cdot)$ is a probability measure on $A$; (b) for each $B$, $t(\cdot,B)$ is $\mathcal{X}$-measurable.

A nonrandomized decision rule is one for which each $t(x,\cdot)$ is degenerate. The set of (behavioral) decision rules is denoted by $\Delta$. For any $t$, the expected loss when $\theta$ is the true parameter is
$$R(\theta,t) = \int\!\!\int L(\theta,a)\, t(x,da)\, P_\theta(dx).$$
Let $\mathcal{G}$ denote the class of all probability measures (priors) on $\Omega$ with respect to which the $t$-sections of $R(\theta,t)$ are measurable. The Bayes risk of $t$ versus $G$ is $R(G,t) = \int R(\theta,t)\, G(d\theta)$. $t_G$ is called a Bayes rule with respect to $G$ if its Bayes risk attains the infimum Bayes risk
$$R(G) = \inf_{t \in \Delta} R(G,t).$$
We will assume that $R(G)$ is attained for each $G \in \mathcal{G}$.

Throughout our discussions we will consider $\Omega = \{0,1,\dots,m\}$, $\mathcal{F} = 2^{\Omega}$, and assume that $P_G = \sum_{\theta=0}^m g_\theta P_\theta$ is identified by $G = (g_0,\dots,g_m)$. $\mathcal{G}$ is the $m$-dimensional simplex in $E^{m+1}$, the $(m+1)$-dimensional Euclidean space. We will call $R(\cdot,t)$ the risk function of $t$ and $R(\cdot)$ the Bayes envelope defined on $\mathcal{G}$.
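As a concrete instance (this is the special case derived in Section 1.4 and used throughout Chapter 2), consider the two state case $m = 1$ with classification loss. The posterior expected losses of the two actions given $X = x$ order as $g_1 f_1(x)$ versus $g_0 f_0(x)$, so a nonrandomized Bayes rule versus $G = (g_0, g_1)$ is
$$t_G(x) = \begin{cases} 1 & \text{if } g_1 f_1(x) \ge g_0 f_0(x)\\ 0 & \text{otherwise.}\end{cases}$$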
Consider the empirical Bayes decision problem. In it the component decision problem just described occurs repeatedly and independently. Thus, let
$$(\theta_1,X_1),\ (\theta_2,X_2),\dots,(\theta_n,X_n),\ (\theta_{n+1},X_{n+1}),\dots$$
be iid with $\theta_i$ having distribution $G$ and, conditional on $\theta_i$, $X_i$ having distribution $P_{\theta_i}$. The marginal distribution of $X_i$ is the mixture $P_G$. Based on the initial observations $\underline{x}_n = (x_1,\dots,x_n)$, a component decision rule $T_n(\underline{x}_n)$ is selected and evaluated at $x_{n+1}$ to reach a decision about $\theta_{n+1}$, $n \ge 1$. Thus, an empirical Bayes decision rule for reaching a decision about $\theta_{n+1}$ is $T_n(\underline{x}_n)(x_{n+1},\cdot)$, $n \ge 1$. The goal is to use the information about $G$ from the initial observations to construct a rule $T_n$ whose risk behavior is close to that of the Bayes rule $t_G(x_{n+1},\cdot)$. In general, more information about $G$ will be available with an increasing number of observations.

We will consider an empirical Bayes procedure as a sequence $T = (T_1,T_2,\dots)$ of empirical Bayes decision rules where, for each $n$, $T_n$ is a function on $X^{n+1} \times \mathcal{A}$ such that every $\underline{x}_n = (x_1,\dots,x_n)$-section is an element of $\Delta$, the class of component decision rules, and such that for each $\theta \in \Omega$, $R(\theta, T_n(\underline{x}_n))$ is a measurable function of $\underline{x}_n$. For each $n$, we let $\mathcal{T}_n$ denote the collection of all possible $T_n$ so defined. The use of $T_n$ against prior $G$ incurs the unconditional component Bayes risk
$$R_n(G,T_n) = \int R(G, T_n(\underline{x}_n))\, P_G^n(d\underline{x}_n),\qquad n \ge 1,$$
where here and throughout a symbol for a measure with a superscript indicates a product measure. Since $T_n(\underline{x}_n) \in \Delta$ for each $\underline{x}_n \in X^n$, we see that for all $n$, $R_n(G,T_n) \ge R(G)$, the minimum component Bayes risk. Observe that
$$R_n(G,T_n) = E\,R(G,T_n(\underline{X}_n)) = \sum_{\theta=0}^m g_\theta \int R(\theta, T_n(\underline{x}_n))\, P_G^n(d\underline{x}_n)$$
$$= \sum_{\theta=0}^m g_\theta \int R(\theta, T_n(\underline{x}_n)) \prod_{i=1}^n \Big\{\sum_{j=0}^m f_j(x_i)\, g_j\Big\}\, \mu^n(d\underline{x}_n)$$
$$= \sum_{\theta=0}^m g_\theta \sum_{\ell_0+\cdots+\ell_m = n} g_0^{\ell_0}\cdots g_m^{\ell_m}\; H_n(\theta,\ell_0,\dots,\ell_m) \tag{1.1}$$
where
$$H_n(\theta,\ell_0,\dots,\ell_m) = \sum_{B_0,\dots,B_m} \int R(\theta, T_n(\underline{x}_n)) \prod_{j=0}^m \prod_{i\in B_j} f_j(x_i)\; \mu^n(d\underline{x}_n),\qquad |B_j| = \ell_j,\ j = 0,\dots,m.$$
The summation above is over partitions $\{B_0,B_1,\dots,B_m\}$ of $\{1,2,\dots,n\}$, and the second summation in (1.1) is over all partitions $\ell_0,\ell_1,\dots,\ell_m$ of the integer $n$, i.e., integers $\ell_i \ge 0$ with $\sum_{i=0}^m \ell_i = n$. From (1.1) we see that the risk function $R_n(\cdot,T_n)$ is determined by the collection of coefficients
$$\Big\{H_n(\theta,\ell_0,\dots,\ell_m)\;\Big|\;\theta = 0,\dots,m;\ \sum_{i=0}^m \ell_i = n,\ \ell_i \ge 0,\ i = 0,\dots,m\Big\} \tag{1.2}$$
which, in turn, can be identified with an element of the space $E^N$ where, by Feller (1957, (II.5.2)), $N = (m+1)\binom{n+m}{m}$. This remark will prove useful in Section 1.4.
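To make the bookkeeping in (1.1)-(1.2) concrete, here is the two state instance ($m = 1$), writing $g_1 = p$ and $g_0 = 1-p$:
$$R_n(G,T_n) = \sum_{\theta=0}^{1} g_\theta \sum_{\ell=0}^{n} (1-p)^{n-\ell}\, p^{\ell}\; H_n(\theta,\, n-\ell,\, \ell),$$
a polynomial in $p$ determined by $N = 2(n+1)$ coefficients; this polynomial structure is exploited in Section 2.3.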
Definition 1.1. If $\lim_n R_n(G,T_n) = R(G)$ we say that $T$ is asymptotically optimal relative to $G$ (a.o. $[G]$). If $T$ is a.o. $[G]$ for all $G \in \mathcal{G}$, we say that $T$ is asymptotically optimal (a.o.).

Definition 1.2. For $T_n, T_n^* \in \mathcal{T}_n$, $T_n^*$ is as good as $T_n$ if $R_n(G,T_n^*) \le R_n(G,T_n)$ for all $G \in \mathcal{G}$. $T_n^*$ is better than $T_n$ if $R_n(G,T_n^*) \le R_n(G,T_n)$ for all $G \in \mathcal{G}$ and $R_n(G,T_n^*) < R_n(G,T_n)$ for at least one $G \in \mathcal{G}$. $T_n$ is equivalent to $T_n^*$ if $R_n(G,T_n) = R_n(G,T_n^*)$ for all $G \in \mathcal{G}$.

Definition 1.3. $T_n$ is said to be admissible if there does not exist an empirical Bayes decision rule in $\mathcal{T}_n$ that is better than $T_n$. $T$ is called an admissible empirical Bayes procedure if $T_n$ is admissible for every $n \ge 1$.

Listed below are some desirable properties of an empirical Bayes decision procedure $T = (T_1,T_2,\dots)$.

(i) $T$ is a.o.
(ii) $R_n(G,T_n)$ converges to $R(G)$ rapidly for all $G \in \mathcal{G}$.
(iii) $T$ is admissible.
(iv) $T_n$ has good risk behavior for small to moderate values of $n$; that is, $T_n$ is suitable for use even when large numbers of observations are not available.
(v) An algorithm for computing the decision rules is available and can be executed economically.
(vi) $T$ can be adjusted systematically to improve its performance on many specified subsets of $\mathcal{G}$.

We will judge the performance of an empirical Bayes procedure on the basis of properties (i)-(vi) mentioned above.

Section 1.2. Bayes empirical Bayes

Let $\underline{\mathcal{G}}$ be the Borel $\sigma$-algebra of subsets of $\mathcal{G}$. The Bayes approach to the empirical Bayes decision problem considers possible priors on $(\mathcal{G},\underline{\mathcal{G}})$. First we give the following definitions.

Definition 1.4. An empirical Bayes rule $T_n \in \mathcal{T}_n$ is Bayes with respect to a prior $\Lambda$ on $\mathcal{G}$ if it is an infimizer (across $\mathcal{T}_n$) of
$$R_n(\Lambda,T_n) = \int R_n(G,T_n)\,\Lambda(dG).$$

Definition 1.5. $T$ is said to be a Bayes empirical Bayes procedure if $T_n$ is Bayes for every $n \ge 1$. $T$ is said to be a Bayes procedure with respect to a prior $\Lambda$ if $T_n$ is Bayes with respect to $\Lambda$, $n \ge 1$.

To construct a Bayes empirical Bayes rule at stage $n$, it is convenient to introduce the component risk set
$$S = \{\underline{s} = (s_0,\dots,s_m)\;|\; \text{for some } t \in \Delta,\ s_i = R(i,t),\ i = 0,\dots,m\}.$$
$S$ is a convex subset of $E^{m+1}$ which we will assume is compact throughout this thesis. We will use the following theorem (LeCam (1956, Theorem 3.3.2)).

Theorem 1.1. Let $(X,\mathcal{X})$ be a measurable space and let $\Theta$ be a compact metric space. Let $f(x,\theta)$ be a function from $X \times \Theta$ to the real line. Assume that $f$ is measurable in $x$ for each $\theta$ and continuous in $\theta$ for each $x$. Then it is possible to find a function $\hat\theta(x)$ which is measurable in $x$ and such that $f(x,\hat\theta(x)) = \inf_{t\in\Theta} f(x,t)$. □

For a given $\Lambda$ and $T_n \in \mathcal{T}_n$,
$$R_n(\Lambda,T_n) = E_{(\Lambda)} \sum_{\theta=0}^m R(\theta, T_n(\underline{X}_n))\; E_\Lambda(g_\theta\,|\,\underline{X}_n).$$
Here $E_\Lambda(G|\underline{X}_n)$ is the $\mathcal{G}$-valued conditional expectation corresponding to the conditional distribution of $G$ given $\underline{X}_n$, and $E_{(\Lambda)}$ corresponds to the mixture $P_{(\Lambda)}^n(\cdot) = \int P_G^n(\cdot)\,\Lambda(dG)$. Since $(R(0,T_n(\underline{x}_n)),\dots,R(m,T_n(\underline{x}_n))) \in S$ for all $\underline{x}_n$ and $T_n$, to minimize $R_n(\Lambda,T_n)$ we seek a measurable function $\delta: X^n \to S$ such that
$$\sum_{\theta=0}^m \delta_\theta(\underline{X}_n)\, E_\Lambda(g_\theta|\underline{X}_n) = \inf_{\underline{s}\in S}\ \sum_{\theta=0}^m s_\theta\, E_\Lambda(g_\theta|\underline{X}_n)$$
where $\delta(\underline{X}_n) = (\delta_0(\underline{X}_n),\dots,\delta_m(\underline{X}_n))$. By Theorem 1.1, such a $\delta$ exists. Suppose $\delta$ is a measurable version, and for each $\underline{x}_n$, $T_{n,\Lambda}(\underline{x}_n) \in \Delta$ is such that
$$\delta_\theta(\underline{x}_n) = R(\theta,\, T_{n,\Lambda}(\underline{x}_n)),\qquad \theta = 0,\dots,m. \tag{1.3}$$
Then $T_{n,\Lambda} \in \mathcal{T}_n$ and $R_n(\Lambda, T_{n,\Lambda}) = \inf_{\mathcal{T}_n} R_n(\Lambda,T_n)$. Also note that the Bayes empirical Bayes rule $T_{n,\Lambda}$ is pointwise component Bayes with respect to $E_\Lambda(G|\underline{x}_n) = (E_\Lambda(g_0|\underline{x}_n),\dots,E_\Lambda(g_m|\underline{x}_n))$. In what follows we sometimes use the notation $G_\Lambda(\underline{x}_n)$ instead of $E_\Lambda(G|\underline{x}_n)$.

For a given prior $\Lambda$, let $T_{n,\Lambda} \in \mathcal{T}_n$ denote a Bayes empirical Bayes rule with respect to $\Lambda$ which has the above form. We first discuss conditions that assure that $T_{n,\Lambda}$ is a.o. Oaten (1972, (1.6)) shows that
$$0 \le R(G,t_F) - R(G) \le M\,\|G - F\| \tag{1.4}$$
for all $F,G \in \mathcal{G}$. Here $M$ is a bound on the component risk and $\|\cdot\|$ is the $\ell_1$ (total variation) norm on $E^{m+1}$. Under the assumption that $\Lambda$ has support all of $\mathcal{G}$, Gilliland and Boyer (1979) prove that
$$\lim_n \|G_\Lambda(\underline{X}_n) - G\| = 0\quad \text{a.s. } P_G^\infty \text{ for all } G \in \mathcal{G}, \tag{1.5}$$
where $P_G^\infty$ is the probability distribution of $X_1,X_2,\dots$. Thus, in the bounded risk case, (1.4) with $F = G_\Lambda(\underline{x}_n)$ and (1.5) establish that $T_{n,\Lambda}$ is a.o., that is,
$$\lim_n R_n(G, T_{n,\Lambda}) = R(G)\quad \text{for all } G \in \mathcal{G}.$$
The above shows how the question of asymptotic optimality in the finite $\Omega$ Bayes empirical Bayes problem can often be reduced to a question of the consistency of the estimator $G_\Lambda(\underline{X}_n)$ for $G$.

To obtain the form of $G_\Lambda(\underline{x}_n)$, note that the conditional density of $\underline{X}_n$ given $G$ is
$$f^n(\underline{x}_n|G) = \prod_{i=1}^n \sum_{j=0}^m f_j(x_i)\, g_j.$$
Hence the conditional distribution of $G$ given $\underline{X}_n$ has density
$$f(G|\underline{x}_n) = f^n(\underline{x}_n|G)\Big/\int f^n(\underline{x}_n|G)\,\Lambda(dG)$$
with respect to $\Lambda$, and
$$G_\Lambda(\underline{x}_n) = \int G\, f(G|\underline{x}_n)\,\Lambda(dG). \tag{1.6}$$
We denote the components of $G_\Lambda(\underline{x}_n)$ by $g^\theta_\Lambda(\underline{x}_n)$, $\theta = 0,1,\dots,m$.
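For small n, (1.6) can be evaluated by direct numerical integration. The following is a minimal sketch for the two state case m = 1, where G is identified with $p = g_1$; it is written in Python/SciPy rather than the dissertation's FORTRAN (cf. Table A.6), the function names are ours, and the N(-1,1)/N(1,1) densities and uniform Beta prior are merely the illustration of Section 2.3.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import beta, norm

    def posterior_mean_quad(x, f0, f1, prior_pdf):
        # E_Lambda[p | x_1,...,x_n] for m = 1, by direct numerical
        # integration of (1.6); the mixture likelihood of the sample is
        # prod_i [ p*f1(x_i) + (1-p)*f0(x_i) ]
        def weighted_lik(p):
            return np.prod(p * f1(x) + (1.0 - p) * f0(x)) * prior_pdf(p)
        num, _ = quad(lambda p: p * weighted_lik(p), 0.0, 1.0)
        den, _ = quad(weighted_lik, 0.0, 1.0)
        return num / den

    # illustration: N(-1,1) vs N(1,1) components, uniform prior B(1)
    f0 = lambda t: norm.pdf(t, loc=-1.0)
    f1 = lambda t: norm.pdf(t, loc=1.0)
    print(posterior_mean_quad(np.array([0.3, -1.2, 0.8]), f0, f1, beta(1, 1).pdf))

Direct quadrature of this kind becomes expensive and inaccurate as n grows, which is what motivates the coefficient algorithm developed next.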
We now develop an algorithm for computing $G_\Lambda(\underline{x}_n)$. Here we represent $\mathcal{G}$ by the $m$-dimensional simplex in $E^m$,
$$S_m = \Big\{\underline{s}_m = (s_1,\dots,s_m)\;\Big|\; s_i \ge 0,\ i = 1,\dots,m,\ \sum_{j=1}^m s_j \le 1\Big\},$$
and for $\underline{s}_m \in S_m$ let $s_0 = 1 - \sum_{j=1}^m s_j$. By (1.6),
$$g^\theta_\Lambda(\underline{x}_n) = \frac{\displaystyle\int_{S_m} s_\theta \prod_{i=1}^n \Big\{\sum_{j=0}^m s_j f_j(x_i)\Big\}\,\Lambda(d\underline{s}_m)}{\displaystyle\int_{S_m} \prod_{i=1}^n \Big\{\sum_{j=0}^m s_j f_j(x_i)\Big\}\,\Lambda(d\underline{s}_m)} \tag{1.7}$$
$$= \frac{\displaystyle\sum_{\ell_0+\ell_1+\cdots+\ell_m = n} S_{\ell_0,\dots,\ell_m}(\underline{x}_n)\;\mu^\theta_{\ell_0,\dots,\ell_m}}{\displaystyle\sum_{\ell_0+\ell_1+\cdots+\ell_m = n} S_{\ell_0,\dots,\ell_m}(\underline{x}_n)\;\mu_{\ell_0,\dots,\ell_m}},\qquad \theta = 1,\dots,m, \tag{1.8}$$
where for each nonnegative integer partition $\ell_0,\ell_1,\dots,\ell_m$ of $n$,
$$S_{\ell_0,\dots,\ell_m}(\underline{x}_n) = \sum_{\substack{B_0,\dots,B_m\\ |B_0|=\ell_0,\dots,|B_m|=\ell_m}} \Big\{\prod_{j=1}^m \prod_{i\in B_j} [f_j(x_i) - f_0(x_i)]\Big\}\ \prod_{i\in B_0} f_0(x_i). \tag{1.9}$$
Here $B_0\cup\cdots\cup B_m = \{1,\dots,n\}$, $B_i\cap B_j = \emptyset$ for $i \ne j$,
$$\mu_{\ell_0,\dots,\ell_m} = \int_{S_m} s_1^{\ell_1}\cdots s_m^{\ell_m}\,\Lambda(d\underline{s}_m)$$
and
$$\mu^\theta_{\ell_0,\dots,\ell_m} = \int_{S_m} s_1^{\ell_1}\cdots s_m^{\ell_m}\, s_\theta\,\Lambda(d\underline{s}_m),\qquad \theta = 1,\dots,m.$$
The following theorem leads to a convenient way of computing (1.9).

Theorem 1.2. For each $n \ge 1$ and set of real numbers $\{a_{ij}\,|\,i = 1,\dots,n;\ j = 0,\dots,m\}$ define the function $Q_n$ on $S_m$ by
$$Q_n(s_1,\dots,s_m) = \prod_{i=1}^n (a_{i0} + a_{i1}s_1 + \cdots + a_{im}s_m).$$
For each nonnegative integer partition $\ell_0,\ell_1,\dots,\ell_m$ of $n$, let $C^n_{\ell_0,\dots,\ell_m}$ denote the coefficient of the term $s_1^{\ell_1}s_2^{\ell_2}\cdots s_m^{\ell_m}$ in the polynomial expansion of $Q_n$. Then
$$C^n_{\ell_0,\dots,\ell_m} = \sum_{j=0}^m a_{nj}\; C^{n-1}_{\ell_0,\dots,\ell_j-1,\dots,\ell_m} \tag{1.10}$$
with the convention $C^{n-1}_{k_0,\dots,k_m} = 0$ if some $k_j = -1$.

Proof. The proof follows from the uniqueness of the coefficients $C^n_{\ell_0,\dots,\ell_m}$ in the polynomial $Q_n$. □

To find all coefficients of $Q_n(s_1,\dots,s_m)$, $n \ge 2$, we go through equation (1.10)
$$\sum_{k=2}^n \#\Big\{(\ell_0,\dots,\ell_m)\;\Big|\;\sum_{i=0}^m \ell_i = k,\ \ell_i \ge 0\Big\} = \binom{n+m+1}{m+1} - (m+2) \sim \frac{n^{m+1}}{(m+1)!}$$
times (see Feller (1957, (II.5.2) and (II.12.8))), where the sign $\sim$ indicates that the ratio of the two sides tends to unity as $n \to \infty$. The limiting form is obtained by applying Stirling's formula (Feller (1957)) and l'Hôpital's rule.

To apply Theorem 1.2 in computing (1.9), for each $\underline{x}_n \in X^n$ we let $a_{ij} = f_j(x_i) - f_0(x_i)$ and $a_{i0} = f_0(x_i)$, $i = 1,\dots,n$, $j = 1,\dots,m$. Then for each nonnegative integer partition $\ell_0,\dots,\ell_m$ of $n$ we have $S_{\ell_0,\dots,\ell_m}(\underline{x}_n) = C^n_{\ell_0,\dots,\ell_m}$. Hence, by (1.8),
$$g^\theta_\Lambda(\underline{x}_n) = \frac{\displaystyle\sum_{\ell_0+\cdots+\ell_m = n} C^n_{\ell_0,\dots,\ell_m}\;\mu^\theta_{\ell_0,\dots,\ell_m}}{\displaystyle\sum_{\ell_0+\cdots+\ell_m = n} C^n_{\ell_0,\dots,\ell_m}\;\mu_{\ell_0,\dots,\ell_m}},\qquad \theta = 1,\dots,m. \tag{1.11}$$
Note that $g^\theta_\Lambda(\underline{x}_n)$ depends on $\Lambda$ only through a finite number of general moments of $\Lambda$. The computation of (1.11) is, in most cases, more efficient and accurate than a direct numerical integration in (1.7) or a direct evaluation of (1.8). Even in the case m = 1, a direct evaluation of $S_{\ell_0,\ell_1}(\underline{x}_n)$ through (1.9) is not feasible; in most cases, however, the application of (1.11) results in an efficient and accurate evaluation. Chapter 2 provides a detailed example. Of course, computation with (1.11) is simplified when the general moments of $\Lambda$ can be evaluated easily. Here we consider one such example.

EXAMPLE. (Bayes empirical Bayes with Dirichlet priors.) Let $D(\alpha_1,\dots,\alpha_m,\alpha_0)$ denote the $m$-variate Dirichlet distribution on the simplex $S_m$ which has probability density function
$$f(\underline{s}_m) = \frac{\Gamma(\alpha_0+\cdots+\alpha_m)}{\Gamma(\alpha_0)\cdots\Gamma(\alpha_m)}\ s_1^{\alpha_1-1}\cdots s_m^{\alpha_m-1}\,(1-s_1-\cdots-s_m)^{\alpha_0-1},\qquad \underline{s}_m \in S_m,$$
where the $\alpha_i$ are all real and positive. If we let $\Lambda = D(\alpha_1,\dots,\alpha_m,\alpha_0)$, then it can be verified (Wilks (1962), (7.7.6)) that the general moment $\mu_{\ell_0,\dots,\ell_m}$ of the $m$-variate Dirichlet prior $\Lambda$ has the value
$$\mu_{\ell_0,\dots,\ell_m} = \frac{\Gamma(\alpha_1+\ell_1)\cdots\Gamma(\alpha_m+\ell_m)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_m)}\ \cdot\ \frac{\Gamma(\alpha_0+\cdots+\alpha_m)}{\Gamma(\alpha_0+\cdots+\alpha_m+\ell_1+\cdots+\ell_m)}.$$
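For the two state case (m = 1) treated in Chapter 2, the recursion (1.10) is simply repeated polynomial multiplication, and with a symmetric Beta prior the moments in (1.11) have a closed product form; this is exactly what the subroutines CDEFICT and BETA of Table A.5 compute. The sketch below combines the two steps in Python rather than the original FORTRAN; the function names are ours, and the N(-1,1)/N(1,1) densities are the example of Section 2.3.

    import numpy as np
    from scipy.stats import norm

    def poly_coeffs(a0, a1):
        # coefficients C^n_l, l = 0..n, of prod_i (a0[i] + a1[i]*s),
        # built up by the recursion (1.10) specialized to m = 1
        c = np.array([1.0])
        for b0, b1 in zip(a0, a1):
            c = b0 * np.append(c, 0.0) + b1 * np.append(0.0, c)
        return c

    def beta_moments(gamma, kmax):
        # m_k = E p^k, k = 0..kmax, for the symmetric Beta(gamma, gamma)
        # prior: m_k = prod_{i<k} (gamma + i) / (2*gamma + i)
        m = [1.0]
        for k in range(kmax):
            m.append(m[-1] * (gamma + k) / (2.0 * gamma + k))
        return np.array(m)

    def posterior_mean(x, gamma):
        # (1.11) for m = 1 with f0 = N(-1,1), f1 = N(1,1)
        f0, f1 = norm.pdf(x, loc=-1.0), norm.pdf(x, loc=1.0)
        c = poly_coeffs(f0, f1 - f0)          # a_i0 = f0(x_i), a_i1 = f1 - f0
        m = beta_moments(gamma, len(x) + 1)   # moments m_0, ..., m_{n+1}
        return np.dot(c, m[1:]) / np.dot(c, m[:-1])

Only the n + 2 prior moments and the n + 1 coefficients are needed, so the cost is quadratic in n rather than the exponential cost of a direct evaluation of (1.9).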
Section 1.3. A complete class theorem

Gilliland and Boyer (1979) have suggested that, for each $n$, the study of empirical Bayes rules in $\mathcal{T}_n$ can be viewed as a study of the class of nonrandomized decision rules in a decision problem $(\mathcal{G},D,R_n)$, so that the class $\mathcal{B}_n$ of Bayes empirical Bayes rules is the class of Bayes rules in $(\mathcal{G},D,R_n)$. In this section we will prove that, in a large number of empirical Bayes problems, $\mathcal{B}_n$ is a complete class for $(\mathcal{G},D,R_n)$. The results apply at each stage $n \ge 1$.

Definition 1.6. A class $C$ of decision rules, $C \subset D$, is said to be complete if, given any rule $t$ in $D$ not in $C$, there exists a rule $t^*$ in $C$ that is better than $t$. A class $C$ of decision rules is said to be essentially complete if, given any rule $t$ not in $C$, there exists a rule $t^*$ in $C$ that is as good as $t$.

Consider the decision problem $(\mathcal{G},D,R_n)$ with sample space $(X^n,\mathcal{X}^n)$, parameter space $(\mathcal{G},\underline{\mathcal{G}})$, $\{P_G^n : G \in \mathcal{G}\}$ a family of probability measures on $(X^n,\mathcal{X}^n)$ dominated by $\mu^n$, $\underline{X}_n$ distributed $P_G^n$ conditional on $G$, action space $(S,\mathcal{S})$ where $\mathcal{S}$ is the Borel $\sigma$-algebra on $S$, and loss $R: \mathcal{G}\times S \to [0,\infty)$ with $R(G,\underline{s}) = \sum_\theta g_\theta s_\theta$. The class of nonrandomized rules $D$ is represented by the class of measurable transformations from $(X^n,\mathcal{X}^n)$ to $(S,\mathcal{S})$. Using a rule $d = (d_0,d_1,\dots,d_m) \in D$, the expected loss when $G$ is the true parameter is
$$R_n(G,d) = \int R(G, d(\underline{x}_n))\, P_G^n(d\underline{x}_n) = \sum_{\theta=0}^m g_\theta \int d_\theta(\underline{x}_n)\, P_G^n(d\underline{x}_n). \tag{1.12}$$
Note here (1.12) and the fact $d(\underline{x}_n) \in S$ imply that each $T_n \in \mathcal{T}_n$ determines a $d \in D$ such that $T_n$ and $d$ have the same risk function; conversely, for each $d \in D$ there exists a $T_n \in \mathcal{T}_n$ with the same risk function.

Let $\Lambda$ be a prior on $\underline{\mathcal{G}}$; the Bayes risk of $d \in D$ is $R_n(\Lambda,d) = \int R_n(G,d)\,\Lambda(dG)$. A Bayes rule with respect to $\Lambda$ is a rule $d_\Lambda \in D$ such that $R_n(\Lambda,d_\Lambda) = \inf_{d\in D} R_n(\Lambda,d)$.

Our discussion is restricted to nonrandomized rules because for $t \in \mathcal{D}$, $\mathcal{D}$ denoting the class of behavioral rules, the risk function of $t$ is matched by that of a corresponding $d \in D$.

[Gap in the source: the remainder of this reduction, Lemma 1.1, and the statement and first part of the proof of Theorem 1.3, which establishes the compactness of the set $Q \subset E^N$ of risk coefficient points, are not legible. The legible tail of the proof of Theorem 1.3 follows; here $\bar S$ denotes a countable collection of half-spaces $H = \{\underline{s}: b\cdot\underline{s} \le c\}$ containing $S$.]

... where the last inequality follows from the fact that $b\cdot d_k(\underline{x}_n) \le c$ for all $i$. Therefore we have $P_G^n[b\cdot d_0 > c] = 0$, i.e., $P_G^n[b\cdot d_0 \le c] = 1$. This proves the claim. From the fact that $\bar S$ is countable and the above claim, we obtain
$$1 = P_G^n\Big\{\bigcap_{H\in\bar S} [d_0 \in H]\Big\} = P_G^n\big[d_0 \in \cap\bar S\big]\quad \text{for all } G \in \mathcal{G}.$$
But Lemma 1.1 shows that $S = \cap\bar S$. Therefore $P_G^n[d_0 \in S] = 1$ for all $G \in \mathcal{G}$. This completes our proof. □

Corollary 1.1. There exists a topology on $D$ such that (a) $D$ is compact and (b) $R_n(G,d)$ is continuous in $d \in D$ for all $G \in \mathcal{G}$.

Proof: Define $\varphi$ to be the function on $D$ taking each $d$ to its point $\underline{q}$ of risk coefficients (1.2). Then the collection of sets $F = \{\varphi^{-1}(A)\,|\,A \text{ open in } Q\}$ is a topology on $D$ such that $\varphi: D \to Q$ is continuous. Since $\varphi$ is onto, if $\mathcal{Q}$ is a covering of $Q = \varphi(D)$ then $\{\varphi^{-1}(A)\,|\,A \in \mathcal{Q}\}$ is a covering of $D$. Hence the compactness of $Q$ from Theorem 1.3 implies that $(D,F)$ is compact. From the polynomial form of $R_n(G,d)$, $R_n(G,d)$ is a linear combination of $\Pi_i\circ\varphi(d)$, $i = 1,\dots,N$, where $\Pi_i: E^N \to E^1$, $i = 1,\dots,N$, is the projection map. Therefore the continuity of $\Pi_i\circ\varphi$, $i = 1,\dots,N$, implies that $R_n(G,d)$ is continuous in $d \in D$ for all $G \in \mathcal{G}$. □

Definition 1.7. A rule $d \in D$ is extended Bayes if for every $\varepsilon > 0$ there is a prior distribution $\Lambda$ such that
$$R_n(\Lambda,d) \le \inf_{d'\in D} R_n(\Lambda,d') + \varepsilon.$$

The following theorem follows immediately from Corollary 1.1 and Theorem 2.10.3 of Ferguson (1967).

Theorem 1.4. The class of extended Bayes rules in $D$ is essentially complete.

Theorem 1.5. Any extended Bayes rule in $D$ is a Bayes rule.

Proof: For $d \in D$, $R_n(\cdot,d)$ is continuous in $G$. Let $d \in D$ be an extended Bayes procedure. Then for each positive integer $N$ there exists a prior distribution $\Lambda_N$ such that
$$\int R_n(G,d_{\Lambda_N})\,\Lambda_N(dG) \le \int R_n(G,d)\,\Lambda_N(dG) \le \int R_n(G,d_{\Lambda_N})\,\Lambda_N(dG) + 1/N. \tag{1.14}$$
Since $\mathcal{G}$, a closed subset of $[0,1]^{m+1}$, is compact, the class $\{\Lambda_N\}_{N=1}^\infty$ is tight.
By the Prohorov theorem (Billingsley (1968)), $\{\Lambda_N\}_{N=1}^\infty$ is relatively compact, which means that there exist a prior $\Lambda$ and a subsequence $\{\Lambda_{N'}\}_{N'=1}^\infty \subset \{\Lambda_N\}_{N=1}^\infty$ such that $\Lambda_{N'}$ converges weakly to $\Lambda$ as $N' \to \infty$. Consequently,
$$\int R_n(G,d_\Lambda)\,\Lambda(dG) \le \int R_n(G,d)\,\Lambda(dG) = \lim_{N'} \int R_n(G,d)\,\Lambda_{N'}(dG)$$
$$\le \varliminf_{N'} \Big[\int R_n(G,d_{\Lambda_{N'}})\,\Lambda_{N'}(dG) + 1/N'\Big]\quad \text{by (1.14)}$$
$$\le \varlimsup_{N'} \int R_n(G,d_\Lambda)\,\Lambda_{N'}(dG) = \int R_n(G,d_\Lambda)\,\Lambda(dG).$$
The above shows that $d$ is Bayes with respect to $\Lambda$. □

Our complete class theorem follows directly from Theorem 1.4 and Theorem 1.5.

Theorem 1.6. The class of Bayes empirical Bayes rules is complete.

Proof: From Theorems 1.4 and 1.5 we know that the class of extended Bayes rules is equal to $\mathcal{B}_n$ and is essentially complete. Therefore, for $d \notin \mathcal{B}_n$, there exists a Bayes rule $d_\Lambda$ such that $R_n(G,d_\Lambda) \le R_n(G,d)$ for all $G \in \mathcal{G}$. If equality held for all $G$ in $\mathcal{G}$ then $d$ would be Bayes with respect to $\Lambda$, a contradiction; so $d_\Lambda$ is better than $d$. This implies that $\mathcal{B}_n$ is complete. □

Definition 1.8. A class $C$ of decision rules is said to be minimal complete if $C$ is complete and if no proper subclass of $C$ is complete.

It is also of interest to know when the class of Bayes empirical Bayes rules constitutes a minimal complete class. The minimal complete class, when it exists, is exactly the class of admissible rules. Since $\mathcal{B}_n$ has been proved to be a complete class, any admissible rule will be in $\mathcal{B}_n$. It is then sufficient to find conditions under which the Bayes empirical Bayes rules are admissible. The following remark is needed in the proof of Theorem 1.7.

Remark 1.1. If the members of $\{P_\theta : \theta \in \Omega\}$ are mutually absolutely continuous then so are the $\{P_G : G \in \mathcal{G}\}$, which implies that the product measures $\{P_G^n : G \in \mathcal{G}\}$ are mutually absolutely continuous and equivalent to any mixture $P_{(\Lambda)}^n$.

Theorem 1.7. Suppose that $\{P_\theta : \theta \in \Omega\}$ are mutually absolutely continuous and that the Bayes component decision rules are unique up to risk equivalence. Then the class of Bayes empirical Bayes rules is minimal complete.

Proof: Since the class of Bayes empirical Bayes rules is complete, if we show that the Bayes empirical Bayes rules are admissible, then $\mathcal{B}_n$ is minimal complete. For a given $\Lambda$, let $T_{n,\Lambda}$ be the Bayes empirical Bayes rule with respect to $\Lambda$ as defined in (1.3). Then for $T_n \in \mathcal{T}_n$,
$$R(G_\Lambda(\underline{x}_n),\, T_{n,\Lambda}(\underline{x}_n)) \le R(G_\Lambda(\underline{x}_n),\, T_n(\underline{x}_n)). \tag{1.15}$$
Suppose $T_n \in \mathcal{T}_n$ is Bayes with respect to $\Lambda$. Then
$$R_n(\Lambda,T_n) = R_n(\Lambda,T_{n,\Lambda}) \tag{1.16}$$
and (1.15) and (1.16) imply
$$R(G_\Lambda(\underline{x}_n),\, T_n(\underline{x}_n)) = R(G_\Lambda(\underline{x}_n),\, T_{n,\Lambda}(\underline{x}_n))\quad \text{a.s. } P_{(\Lambda)}^n. \tag{1.17}$$
By our hypothesis, the Bayes component rules are unique up to risk equivalence, which means that if $t_1,t_2 \in \Delta$ are Bayes with respect to $G$, then $R(\theta,t_1) = R(\theta,t_2)$, $\theta = 0,\dots,m$. This and (1.17) imply that
$$R(\theta,T_n(\underline{x}_n)) = R(\theta,T_{n,\Lambda}(\underline{x}_n)),\qquad \theta = 0,\dots,m,\quad \text{a.s. } P_{(\Lambda)}^n.$$
By Remark 1.1, the above equalities hold a.s. $P_G^n$ for all $G \in \mathcal{G}$, so that $R_n(G,T_n) = R_n(G,T_{n,\Lambda})$ for all $G \in \mathcal{G}$; i.e., $T_n$ is equivalent to $T_{n,\Lambda}$. Thus, the Bayes rule with respect to $\Lambda$ is unique up to risk equivalence. It is well known that if a Bayes rule is unique up to risk equivalence then it is admissible. □

Empirical Bayes classification between N(-1,1) and N(1,1) is a decision problem satisfying the hypothesis of Theorem 1.7. This example is the subject of computation and study in Section 2.3. Boyer and Gilliland (1980, Theorem 4) point out how the continuity of the risk functions $R_n(G,T_n)$ in $G$ ensures that $T_{n,\Lambda}$ is admissible if $\Lambda$ has support all of $\mathcal{G}$.
Section 1.4. The classification problem

In this section we will derive the form of the Bayes empirical Bayes rules for classification problems. A classification problem will provide an example for the application of the algorithm developed in (1.11) for computing Bayes empirical Bayes rules. In a classification problem, an observation is to be classified as coming from one of m + 1 distributions. Specifically, we let $A = \{0,1,\dots,m\} = \Omega$ and the loss be $\alpha$ if an incorrect classification is made and $\beta$ if a correct classification is made, $\alpha > \beta \ge 0$. Recall, $G = (g_0,\dots,g_m)$ represents a probability measure on $\Omega$. Conditional on $X = x$, the distribution of $\theta$ has density
$$f(\theta|x) = f_\theta(x)\,g_\theta\Big/\sum_{j=0}^m g_j f_j(x),\qquad \theta = 0,\dots,m.$$
For each $a \in \{0,\dots,m\}$ and $x \in X$,
$$E(L(\theta,a)|x) = \sum_{\theta=0}^m L(\theta,a)\, f(\theta|x) = \alpha - (\alpha-\beta)\,f(a|x) \ge \alpha - (\alpha-\beta)\max_{i\in\Omega} f(i|x).$$
Define
$$d_G(x) = \max\{\theta\,|\,f(\theta|x) = \max_{i\in\Omega} f(i|x),\ \theta\in\Omega\} = \max\{\theta\,|\,f_\theta(x)g_\theta = \max_{i\in\Omega} f_i(x)g_i,\ \theta\in\Omega\}. \tag{1.18}$$
Then $d_G$ is a nonrandomized component decision rule which is Bayes with respect to $G$.

From the discussion in the last section we know that $T_{n,\Lambda}(\underline{x}_n)$ chooses a Bayes component rule with respect to $G_\Lambda(\underline{x}_n)$. Therefore, to implement the Bayes empirical Bayes rule with respect to $\Lambda$, first evaluate $G_\Lambda(\underline{x}_n)$ and then replace $f_\theta(x)g_\theta$ in (1.18) by $f_\theta(x_{n+1})\, g^\theta_\Lambda(\underline{x}_n)$.

CHAPTER 2

TWO STATE BAYES EMPIRICAL BAYES PROCEDURES

[Gap in the source: the opening of Chapter 2, Sections 2.1 (testing simple hypothesis against simple alternative, including the component Bayes rule (2.1) and the computing formulas (2.4)-(2.6) for the posterior mean $p_\Lambda(\underline{x}_n)$ and its risk) and 2.2, and the statement of Theorem 2.1 are not legible. The surviving statement of Theorem 2.2 follows.]

Theorem 2.2. Suppose $I(p_0) > 0$ and $\Lambda(\cdot)$ has three continuous derivatives in a neighborhood of the true parameter $p_0 \in (0,1)$. If $P_0$, $P_1$ are mutually absolutely continuous and if $\int|\log f_i(x)|\,P_j(dx) < \infty$ for $i,j \in \{0,1\}$, then
$$\sqrt{n}\,\big(p_\Lambda(\underline{X}_n) - \hat p(\underline{X}_n)\big) \to 0\quad \text{a.s. } P_{p_0}^\infty.$$

Proof: (In Appendix B.) □

As a consequence of Theorem 2.1 and Theorem 2.2, under the hypothesis of Theorem 2.2, $p_\Lambda(\underline{X}_n) \to p_0$ a.s. $P_{p_0}^\infty$ and $\sqrt{n}\,(p_\Lambda(\underline{X}_n) - p_0) \to N(0, I(p_0)^{-1})$ in distribution.

Section 2.3. Optimal properties and risk performance of Bayes empirical Bayes procedures for classification between N(-1,1) and N(1,1).

To illustrate the risk performance of Bayes empirical Bayes procedures we will study the following example.

EXAMPLE: Testing N(-1,1) against N(1,1). In this example we have $X = E^1$, $f_0(x) = (2\pi)^{-1/2}\exp\{-(x+1)^2/2\}$ and $f_1(x) = (2\pi)^{-1/2}\exp\{-(x-1)^2/2\}$. By (2.1) a nonrandomized version of a component Bayes rule is
$$t_p(x) = \begin{cases} 1 & \text{if } x \ge C_p\\ 0 & \text{if } x < C_p \end{cases} \tag{2.7}$$
where $C_p = \frac{1}{2}\ln\big(\frac{1-p}{p}\big)$. The Bayes empirical Bayes rule $T_\Lambda(\underline{x}_n)$ simply replaces $p$ in (2.7) with $p_\Lambda(\underline{x}_n)$. By (2.4) and (2.7) an algorithm for computing the Bayes empirical Bayes rule is already available.
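A minimal sketch of this plug-in rule, assuming the posterior_mean helper from the Section 1.2 sketch (the function names are ours, not the dissertation's):

    import numpy as np

    def beb_classify(x_past, x_new, gamma=1.0):
        # Bayes empirical Bayes classification between N(-1,1) and N(1,1):
        # replace p in (2.7) by the posterior mean p_Lambda(x_1,...,x_n)
        p = posterior_mean(np.asarray(x_past, dtype=float), gamma)
        c = 0.5 * np.log((1.0 - p) / p)   # the cutoff C_p of (2.7)
        return 1 if x_new >= c else 0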
If $\Lambda$ is chosen as the probability measure placing mass 1 at $p$, then $T_\Lambda \equiv t_p$ is the Bayes empirical Bayes procedure with respect to $\Lambda$. In particular, with $\Lambda(\{\frac12\}) = 1$, $T_\Lambda \equiv t_{1/2}$ is the minimax procedure with constant risk $R_n(p,T_\Lambda) = P_0(X \ge 0) = 0.1587$ for all $p \in [0,1]$ and $n \ge 1$. Observe that the risk set of $t_p$, $p \in [0,1]$, is
$$\{(s_0,s_1)\;|\;s_0 = P_0[X \ge a],\ s_1 = P_1[X < a]\ \text{for some } a \in [-\infty,\infty]\};$$
this together with the form of (2.7) implies that the component Bayes rules are unique up to risk equivalence. By Theorem 1.7 we see that at each stage $n$, the class of Bayes empirical Bayes rules is minimal complete in this example.

In our applications we will deal with those $\Lambda$ that belong to a given parametric family $\mathcal{B} = \{B(\gamma)\,|\,\gamma > 0\}$, where $B(\gamma)$ denotes the symmetric beta distribution on (0,1) with density
$$g_{B(\gamma)}(p) = \frac{\Gamma(2\gamma)}{[\Gamma(\gamma)]^2}\, p^{\gamma-1}(1-p)^{\gamma-1}\qquad \text{for } 0 < p < 1.$$
From previous discussions we note that $\{T_\Lambda : \Lambda \in \mathcal{B}\}$ are asymptotically optimal procedures and are admissible at each stage $n$. Also note that the assumptions of Theorem 2.2 are satisfied, so that $p_\Lambda(\underline{X}_n)$ is asymptotically normally distributed. The variance of the limiting distribution of $\sqrt{n}\,(p_\Lambda(\underline{X}_n) - p_0)$ is $I(p_0)^{-1}$. (Behboodian (1972) discussed the conditional moments of $p$ for Beta priors.)

Remark 2.1. If $\Lambda$ has a density $g_\Lambda(p)$ which is symmetric about 1/2, then $R_n(p,T_\Lambda) = R_n(1-p,T_\Lambda)$ for $p \in [0,1]$. To see this, observe

(i) $f_p(-x) = f_{1-p}(x)$;
(ii) $p_\Lambda(-\underline{x}_n) = 1 - p_\Lambda(\underline{x}_n)$ (by elementary calculus);
(iii) $C_\Lambda(-\underline{x}_n) = -C_\Lambda(\underline{x}_n)$ (a direct result of (ii)), where $C_\Lambda(\underline{x}_n) = \frac12\ln\big[(1-p_\Lambda(\underline{x}_n))/p_\Lambda(\underline{x}_n)\big]$.

Since (iii) implies $R(p, T_\Lambda(-\underline{x}_n)) = R(1-p, T_\Lambda(\underline{x}_n))$, the remark is verified by appealing to (2.6), (i) and (iii).

Since $R_n(p,T_\Lambda)$ is a polynomial in $p$ (see (2.6)), Remark 2.1 implies that for $\Lambda \in \mathcal{B}$, $R_n(p,T_\Lambda)$ is a function of $(p - \frac12)^2$; hence it has an even degree less than or equal to $n + 1$. With n = 1 or 2 one can readily see that $R_n(p,T_\Lambda)$ will be a horizontal line or a parabola with extremum at 1/2. We will compute the values $R_n(p,T_\Lambda)$ for $p \in [0,1]$ when n = 1 or 2. Using (2.5), (2.6) and results (i), (ii), (iii) of the Remark, elementary calculus shows
$$R_1(p,T_\Lambda) = 2(a-b)p^2 + 2(b-a)p + a$$
with
$$a = \int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1)} f_1(x)\,dx\; f_1(x_1)\,dx_1,\qquad b = \int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1)} f_1(x)\,dx\; f_0(x_1)\,dx_1.$$
Also, a tedious calculation shows that
$$R_2(p,T_\Lambda) = [3c - (2d+e)]\,p^2 + [(2d+e) - 3c]\,p + c$$
with
$$c = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_1(x_1)f_1(x_2)\,dx_1dx_2,$$
$$d = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_1(x_1)f_0(x_2)\,dx_1dx_2,$$
$$e = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\!\int_{-\infty}^{C_\Lambda(x_1,x_2)} f_1(x)\,dx\; f_0(x_1)f_0(x_2)\,dx_1dx_2.$$
For the case $\Lambda = B(1)$, the uniform distribution on (0,1), numerical computations supported by IMSL (1979) subroutines (Table A.6) were used to compute a = 0.12071, b = 0.21212, c = 0.09576, d = 0.16486, e = 0.25720. Using the MSU CDC 6500 computer, the accuracy of computing a, b, c, d, e was controlled at 3 to 4 significant decimal digits. Therefore
$$R_1(p,T_{B(1)}) = -0.1828p^2 + 0.1828p + 0.1207 \tag{2.8}$$
$$R_2(p,T_{B(1)}) = -0.2997p^2 + 0.2997p + 0.0958 \tag{2.9}$$
are parabolas concave downward with extremum at $p = \frac12$.

Direct numerical computation for n > 2 and $\Lambda \in \mathcal{B}$ is in general not feasible; to overcome this difficulty, the Monte Carlo integration method was used to evaluate $R_n(p,T_\Lambda)$. For $\Lambda \in \mathcal{B}$, we generate independently L sample sequences of independent random variables $X_1,\dots,X_n$ from a population having $f_p(x)$ as density. For each of the L sequences generated, we then compute $R(p,T_\Lambda(\underline{x}_n))$ based on (2.4) and (2.5). An estimate of $R_n(p,T_\Lambda)$ is obtained by averaging the L computed values of $R(p,T_\Lambda(\underline{x}_n))$. An estimate of two standard deviations of the average is also obtained based on these L samples. L is made large enough to make the two standard deviations width acceptable in each experiment. Within each constructed table in this paper, the numbers following the ± signs are estimates of two standard deviations of the Monte Carlo estimates. (See Table A.5 for the computing program.)
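The following compact sketch of that Monte Carlo scheme is in Python rather than the FORTRAN of Table A.5; posterior_mean is the helper sketched in Section 1.2, the default of 200 replications matches the counts typically used in the tables, and the seed handling is ours.

    import numpy as np
    from scipy.stats import norm

    def mc_risk(p, n, gamma, reps=200, seed=0):
        # Monte Carlo estimate of R_n(p, T_Lambda): average the conditional
        # Bayes risk R(p, T_Lambda(x_n)) over `reps` simulated samples
        rng = np.random.default_rng(seed)
        risks = np.empty(reps)
        for r in range(reps):
            theta = rng.random(n) < p                    # latent states
            x = rng.normal(np.where(theta, 1.0, -1.0))   # X_i | theta_i ~ N(+-1,1)
            ph = posterior_mean(x, gamma)
            c = 0.5 * np.log((1.0 - ph) / ph)            # plug-in cutoff
            # risk of the cutoff rule: p*P_1(X < c) + (1-p)*P_0(X >= c)
            risks[r] = p * norm.cdf(c - 1.0) + (1.0 - p) * (1.0 - norm.cdf(c + 1.0))
        return risks.mean(), 2.0 * risks.std(ddof=1) / np.sqrt(reps)

The second returned value is the two standard deviation width reported after the ± signs in the tables.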
To examine the accuracy of our Monte Carlo estimates, Table 1 compares the values of $R_n(p,T_{B(1)})$ for p = 0.0(0.05)0.5 and n = 1, 2 obtained by (2.8), (2.9) and by Monte Carlo integration. Table 2 explores the risk behavior of $R_n(p,T_{B(1)})$ for n = 1, 2, 5, 10, 25, 50 and for p = 0.0(0.05)0.5 (see also Table A.3 for $R_n(p,T_{B(2)})$). It can be seen that $R_n(p,T_{B(1)})$ converges to R(p) quite rapidly and has steady small sample size risk behavior. Values of $R_n(p,T_{B(1)})$ for p > 0.5 need not be computed because of the symmetry about 0.5.

Table 1. $R_n(p,T_{B(1)})$

                 n = 1                       n = 2
   p     Monte Carlo*   Actual      Monte Carlo*   Actual      R(p)
  0.00   0.122±0.006    0.121       0.093±0.005    0.096       0
  0.05   0.128±0.006    0.129       0.107±0.006    0.110       0.0405
  0.10   0.137±0.005    0.137       0.125±0.006    0.123       0.0701
  0.15   0.146±0.005    0.144       0.135±0.006    0.134       0.0934
  0.20   0.151±0.005    0.150       0.147±0.005    0.144       0.1121
  0.25   0.155±0.004    0.155       0.152±0.005    0.152       0.1270
  0.30   0.160±0.003    0.159       0.159±0.004    0.159       0.1387
  0.35   0.162±0.003    0.162       0.164±0.003    0.164       0.1476
  0.40   0.165±0.002    0.165       0.168±0.002    0.168       0.1538
  0.45   0.166±0.001    0.166       0.170±0.002    0.170       0.1574
  0.50   0.166±0.001    0.166       0.170±0.002    0.171       0.1587

  *200 replications for each estimate

[Table 2, risk behavior of $R_n(p,T_{B(1)})$ for n = 1, 2, 5, 10, 25, 50 (n = 1, 2 by numerical computation; the rest by Monte Carlo), is not legible in the source.]

At this point it is important to note that in the Bayes empirical Bayes approach the presence of $\Lambda$ does not restrict the construction of Bayes empirical Bayes procedures but adds the flexibility which enables one to access a family of decision procedures with predictable risk behavior. In particular, consider procedures $T_\Lambda$ for $\Lambda \in \mathcal{B}$. While the mass of B(1) is evenly distributed over [0,1], B(γ) puts more weight on those p values close to 0.5 as γ increases and, conversely, puts more weight on those p values close to 0 and 1 as γ decreases. From the fact that $T_\Lambda$ is admissible and $T_\Lambda$ is Bayes with respect to $\Lambda$, we expect that for a < b, $R_n(p,T_{B(a)}) > R_n(p,T_{B(b)})$ for p close to 0.5 and $R_n(p,T_{B(a)}) < R_n(p,T_{B(b)})$ for p close to 0 or 1. Table 3 shows the flexibility available with choices among B(γ), γ = 0.25, 1, 2, 3, 10, and gives values of $R_1(p,T_{B(\gamma)})$ for p = 0.0(0.05)0.5. The fact that B(γ) has mean $\frac12$ and variance $1/[4(2\gamma+1)]$ implies that as γ → ∞, B(γ) converges weakly to the distribution degenerate at $p = \frac12$, and hence $T_{B(\gamma)}$ converges to the minimax rule with constant risk 0.1587. This is also reflected in Table 3.

Section 2.4. Other empirical Bayes procedures

Robbins (1951), in his original example of the related compound decision problem, uses the estimator
$$p_1(\underline{X}_n) = \max\Big\{0,\ \min\Big\{1,\ 0.5 + \Big(\sum_{i=1}^n 0.5\,X_i\Big)\Big/n\Big\}\Big\}. \tag{2.10}$$
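In code, Robbins' estimator (2.10) and the hyperbolic-tangent estimator coded for the second competing procedure in the program of Table A.7 (attributed there to van Houwelingen; the constant 0.90342942 is taken from that listing) read roughly as follows. This is a sketch; the clipping to [0,1] follows (2.10), while the Table A.7 program handles out-of-range estimates by branching directly to the corresponding constant risk.

    import numpy as np

    def p1_robbins(x):
        # Robbins' (1951) estimator (2.10)
        return min(1.0, max(0.0, 0.5 + 0.5 * np.mean(x)))

    def p2_van_houwelingen(x):
        # the estimator coded for procedure T_2 in Table A.7
        return min(1.0, max(0.0, 0.5 + 0.90342942 * np.mean(np.tanh(x))))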
[Gap in the source: the remainder of Section 2.4, Section 2.5 (Monte Carlo comparisons of T, T1 and T2), and Tables 3 through 6 are not legible. Per the List of Tables: Table 3, the flexibility of $R_1(p,T_\Lambda)$ with a prior $\Lambda$ in $\{B(\gamma)\,|\,\gamma > 0\}$; Table 4, $R_{50}(p,T_2)$; Tables 5 and 6, comparisons of risk behaviors for decision procedures $T_1$, $T_2$, $T_\Lambda$, $\Lambda \in \{B(\gamma)\,|\,\gamma > 0\}$, for n = 1, 2, 5 and n = 10, 25, 50 respectively.]

APPENDICES

APPENDIX A
MIC NI: —IC Q .Amm.evem co Le_>memm xm_m .H.< a_nme 47 Ao_cmu oucozv mums.uno comm cow m:o_amo__amc mo cones: A.v .oo.cwmmm_.o ~om.owmmm_.c moo.owmcm_.c hoo.cwo~m~.c soo.mw:m-.c uoo.mw:o_n.c :cm.cwm-s.o cm.o .oo.cwm.m_.c Nam.owm~m_.o mco.mw-m_.o moo.ow_mm~.c ~co.ow~mm~.c soc.cwmmom.o :co.ow-.e.o ma.o .co.cw-m_.o noo.ow_mm_.o moo.mwmmm_.o som.cwo-~.m ~oo.owm_m~.c Ncc.cws~om.o moo.owmmo:.c c:.o .oo.owm~m_.c Noc.cw:~m_.o moc.owm~m_.o soc.ownm_~.c mc°.ow:~m~.o soo.owm~mu.c moo.owm~mm.c mm.o .oo.cnmma..c mac.owmm:_.c eca.cwomm..o moo.omo~o~.m mco.ow_mm~.o ~co.owmo-.o mco.ow~_~m.o mm.o _oc.owm~m_.o aom.owm.a_.c moo.owm~m_.c mam.cwm_m_.c moc.ow~a_~.o sce.owcm:~.c smo.owm~am.o m~.o ~om.ow~m__.c mcc.cwcm~_.o acm.mwmma_.o moc.ow:~m_.c moc.owmhm_.o soc.cwma_~.o sco.owm~om.o o~.c «co.owa.o_.c mom.ow_mo_.m moo.owm°~_.o moc.cwoam_.c mco.owmmm_.o soo.cw~:m_.c moo.owm~m~.c m_.o .oo.owm-o.o ~oo.owm~mo.c ~o¢.ow~mmo.m moc.mwc_c_.c acm.owmm_..o moo.owm_s_.o moo.owmmp~.c o_.o .oo.cwmm:o.m Ncc.ow_mam.o «cc.ow.mmo.c mec.owoamo.c moc.cw_m~c.o moo.ow_mmo.o moo.ow:mm_.o mo.o .oc.cwmnco.c ~oa.owmmom.o mom.cwma_o.m nmm.cw~o~m.o coo.ow.cmc.o moc.owmcmo.o moo.owmo__.c oo.c Aoo~y .ocuv Acme. Accmv Acme... Amoc.~v Acco.av omlc mun: c-u: ml: ml: NI: plc a .ANe.ave¢ co Le_>meee xm_m .~.< 8.882 48 Ao_cmu oucoxv mums_omo comm co» m:o_omo__mmc om «m “0.509 oucozv oume_umo comm cam «co—umo__moc com « —co.OHN—o_.o —oo.o«mmm—.o NO0.0Hmem—.c —°c.OHNmm—.o —oc.cHN:o—.c —O0.0Hon—.c oco.OHm—o—.o om.o No0.0Hc—w—.c —O¢.°Homo—.G NO0.0Hmm@—.o —cc.OHmmm—.c —oc.owmmw—.o —oc.OHMNc—.o —oo.cHM—w—.o ma.o Noo.OH:~m—.o poc.owhmm—.O NO0.0H:No—.o ~O0.0N—ao—.O —oc.OHN—m—.o —Oo.o«@—m—.O poo.owuoo—.c 0:.0 —oo.onom—.o .oc.OHNmm—.O Nco.owmwm—.o NO0.0Hth—.c NO0.0Homm—.O Noc.omcmm—.o Noo.owmmm—.O mm.o NO0.0HmMa—.o —oo.ou—m:—.o Ncc.owm~:—.o NC0.0H:Nm—.o moc.cwm:m—.O Ncc.owmmm—.o Noc.owmmm—.o Om.c poo.owmmN—.o —oc.cuman—.O m¢¢.o«oo:—.O moc.owc::—.c moc.OHmm:—.o moo.owm—m—.O moo.onJm—.c mN.o Noo.owmm——.o —OG.OHNw——.O mO0.0HN@N—.c £00.6Hmom—.c MO0.0Hm:3—.o ncc.owcaa—.c Nco.owmom—.o o~.o _oc.onmsmc.o —O0.0Hm—O—.o NOG.OH:———.O moc.OHNMN—.o {Oo.owomm—.O MO0.0Hmmn—.O MO0.0H—w:—.c m—.O Nco.owmomc.o Noo.0Hm—mc.o mco.oummmc.o mo¢.oanm——.O moc.owoo——.o JO0.0H-m—.o moc.owm::—.O O—.o Noo.owuhéo.c Nco.cflmmmo.o MO0.0H¢QN0.0 €06.0Hmhmo.o :O0.0H—:——.O acc.OH—mN—.o moo.OHNO¢—.O mo.o NO0.0H~m—c.c NOO.¢H:@N0.0 MO0.0H@emO.o MO0.0Hm—wo.c :O0.0flOmo—.O 36°.OHoQ——.O moo.OHN:M—.O oo.o «mom-c «mun: «o—Ic «ml: «nu: «Nu: «pa: a .Acmvmh.evem co Le_>e;em ¥m_m .m.< 0.882 Table A.4. LIGTrF. 100= 105= 110= P=O. 120= D0 2 I=1r50 130= P=P+0.01 140= 150= U=CP+1. 160: U=CP—1. 170=2 1803 190= 200= PRINT 79P7RP 210=7 220=2 CONTINUE 230= END Evaluation of the Bayes envelope R(p) PROGRAM ENULOPCOUTPUT) REAL PrCPvUvUrAIBrRP CP=0.5*ALOG((1.-P)/P) CALL MDNOR(07A) CALL HDNDR(UvB) RP=P$A+(1.-P)*(1.-B) 49 F0RMAT(3X:F5.273X1F10.6) .01 .02 .03 .04 .05 .06 .07 .08 .09 .10 .11 .12 .13 .14 .15 .16 .17 .18 .19 .20 .21 .22 .23 .24 .25 .26 .27 .28 .29 .30 .31 .32 .33 .34 .35 .36 .38 .39 .40 .41 .42 a? \‘9‘0 .44 .45 .46 .47 an .4? I": ~ 5 -_J'.,’ . 111': l(\.' EXEC BEGUN.09.?4.2 .009311 .026090 .033503 0 040459 .047018 .053226 .059118 .064722 .070061 .073155 .080019 .084670 .089117 .093373 .097446 .101345 .105077 .108649 .112067 .115336 .118461 .121447 .124298 .127017 .129608 .132074 .134417 .136642 .138749 .140741 .142620 .144388 .146047 .147598 .149042 .150382 .151618 .152751 .153783 .154714 .155545 .156276 .156909 .157443 .157380 .158219 .158462 .153507 f." “\ ’ III? 0 m 5;} IDLJM. Knutor 50 Table A.5. 
Table A.5. Monte Carlo simulation of $R_n(p,T_\Lambda)$, $\Lambda \in \mathcal{B}$

[The main program BERISK and the surrounding terminal session are only partially legible in the source. The program reads the Beta prior parameters T, S and then, for each input line (P, NEXP, N, seed DSEED), generates NEXP samples of size N from the mixture density $f_p$, computes $p_\Lambda(\underline{x}_n)$ from the coefficients and moments supplied by the subroutines below, evaluates the conditional Bayes risk of the resulting cutoff rule via the IMSL routine MDNOR, and prints the average risk with its standard deviation and two standard errors. A sample of the printed output:

    P= .20  N= 5  NEXP=  10  RISK= .13307  SD= .03410
    P= .20  N= 5  NEXP= 400  RISK= .13464  SD= .03973

The two subroutines are legible:]

      SUBROUTINE CDEFICT(N)
C     COEFFICIENTS C(1),...,C(N+1) OF THE POLYNOMIAL PRODUCT
C     OF (B(I)+A(I)*S), I=1,...,N -- THE RECURSION (1.10)
C     SPECIALIZED TO M=1
      DOUBLE A(100),B(100),C(100),D(100)
      COMMON A,B,C
      C(1)=B(1)
      C(2)=A(1)
      IF(N.EQ.1)GO TO 5
      DO 10 I=2,N
      D(1)=B(I)*C(1)
      DO 20 J=2,I
      D(J)=A(I)*C(J-1)+B(I)*C(J)
      C(J-1)=D(J-1)
   20 CONTINUE
      D(I+1)=A(I)*C(I)
      C(I)=D(I)
      C(I+1)=D(I+1)
   10 CONTINUE
    5 RETURN
      END

      SUBROUTINE BETA
C     THIS SUBROUTINE GENERATES THE 1 THRU K TH MOMENTS
C     M(1),...,M(K) OF THE BETA(T,S) PRIOR
      REAL T,S
      DOUBLE M(100),PROD1,PROD2
      COMMON /MOMENT/M
      PROD1=1.D0
      PROD2=1.D0
      DO 10 I=1,K
      PROD1=PROD1*(T+(I-1.))
      PROD2=PROD2*(T+S+(I-1.))
      M(I)=PROD1/PROD2
   10 CONTINUE
      RETURN
      END

Table A.6. A numerical computation program

This program evaluates an iterated integral of the form
$$\int_{-3.11}^{3.11}\Big[\int_{-3.11}^{3.11}\Big(\int_{-\infty}^{C_\Lambda(s,t)} f_1(x)\,dx\Big)\,f_0(s)\,ds\Big]\,f_1(t)\,dt,$$
arising in the computation of the constants c, d, e of Section 2.3 for $\Lambda = B(1)$; here $C_\Lambda(s,t)$ is the cutoff generated by the posterior mean $p_\Lambda(s,t)$. IMSL subroutines used: MDNOR and DCADRE; DCADRF is a binary copy of DCADRE.

      PROGRAM ROMB3(OUTPUT)
      INTEGER IER
      REAL DCADRF,H,F0,F1,A,B,AERR,RERR,ERROR,INTEG
      EXTERNAL H
      A=-3.11
      B=3.11
      RERR=0.
      AERR=1.E-5
      INTEG=DCADRF(H,A,B,AERR,RERR,ERROR,IER)
      PRINT 7,INTEG,ERROR,IER
    7 FORMAT(1X,F17.15,3X,F10.8,3X,I3)
      END

      REAL FUNCTION H(T)
      INTEGER IER
      REAL DCADRE,T,F0,F1,C,D,AERR,RERR,ERROR,Z
      EXTERNAL F
      COMMON /JOINT/Z
      Z=T
      C=-3.11
      D=3.11
      RERR=0.
      AERR=1.E-5
      H=DCADRE(F,C,D,AERR,RERR,ERROR,IER)*F1(T)
      RETURN
      END

      REAL FUNCTION F(S)
      DOUBLE C
      REAL S,P,Y,Z,T,F0
      COMMON /JOINT/Z
      T=Z
      Y=C(S,T)-1.
      CALL MDNOR(Y,P)
      F=P*F0(S)
      RETURN
      END
      DOUBLE FUNCTION C(S,T)
C     THE CUTOFF C_LAMBDA(X1,X2) FOR THE UNIFORM PRIOR B(1)
      DOUBLE X1,X2,D9,D8,G
      REAL S,T
      X1=S
      X2=T
      D9=1.+DEXP(2.*X2)+DEXP(2.*X1)+3.*DEXP(2.*(X1+X2))
      D8=4.+2.*DEXP(2.*X2)+2.*DEXP(2.*X1)+4.*DEXP(2.*(X1+X2))
      G=D9/D8
      C=0.5*DLOG((1.-G)/G)
      RETURN
      END

      REAL FUNCTION F0(X)
      DOUBLE Y,PI
      REAL X
      PI=3.14159265358979323846264338D0
      Y=X
      F0=DEXP(-0.5*(Y+1.)*(Y+1.))/DSQRT(2.*PI)
      RETURN
      END

      REAL FUNCTION F1(X)
      DOUBLE Y,PI
      REAL X
      PI=3.14159265358979323846264338D0
      Y=X
      F1=DEXP(-0.5*(Y-1.)*(Y-1.))/DSQRT(2.*PI)
      RETURN
      END

(Execution output, digits partially illegible: .3351?0353?439?3  .00000514  0.)

Table A.7. Monte Carlo simulation of $R_n(p,T_a)$, a = 1, 2

      PROGRAM RISK(INPUT,OUTPUT,TAPE5=INPUT,TAPE6=OUTPUT)
      DOUBLE DSEED
      REAL R(6000),X(100)
C     MONTE CARLO SIMULATION OF MIXED NORMAL RANDOM VARIABLES
C     FOR TESTING N(1,1) VS N(-1,1).  NEXP REPLICATIONS OF
C     SAMPLES WITH SIZE N ARE GENERATED TO ESTIMATE RISK
C     BEHAVIORS OF (1) ROBBINS' DECISION PROCEDURE AND
C     (2) VAN HOUWELINGEN'S DECISION PROCEDURE
      WRITE(6,50)
   50 FORMAT(*0*,*DATA-*)
      READ(5,100)P,NEXP,N,DSEED
  100 FORMAT(F5.2,I4,I4,D25.18)
      WRITE(6,200)P,NEXP,N,DSEED
  200 FORMAT(*0*,F5.2,3X,I4,3X,I4,3X,D25.18)
      RB1=0.
      RB2=0.
      RVH1=0.
      RVH2=0.
      DO 1 L=1,NEXP
C     [SAMPLE GENERATION VIA THE IMSL UNIFORM GENERATOR GGUBS;
C     SEVERAL LINES ILLEGIBLE IN THE SOURCE]
      SUM1=0.
      SUM2=0.
      DO 10 I=1,N
      SUM1=X(I)+SUM1
      SUM2=TANH(X(I))+SUM2
   10 CONTINUE
      PRB=0.5+SUM1/(2.*N)
      PVH=0.5+0.90342942*SUM2/N
      IF(PRB.GE.1.)GO TO 350
      IF(PRB.LE.0.)GO TO 352
      CRB=0.5*ALOG((1.-PRB)/PRB)
      YRB1=CRB-1.
      YRB2=CRB+1.
      CALL MDNOR(YRB1,PRB1)
      CALL MDNOR(YRB2,PRB2)
      RB=P*PRB1+(1.-P)*(1.-PRB2)
      GO TO 356
  350 RB=1.-P
      GO TO 356
  352 RB=P
  356 IF(PVH.GE.1.)GO TO 450
      IF(PVH.LE.0.)GO TO 452
      CVH=0.5*ALOG((1.-PVH)/PVH)
      YVH1=CVH-1.
      YVH2=CVH+1.
      CALL MDNOR(YVH1,PVH1)
      CALL MDNOR(YVH2,PVH2)
      PV=P*PVH1+(1.-P)*(1.-PVH2)
      GO TO 460
  450 PV=1.-P
      GO TO 460
  452 PV=P
  460 RB1=RB1+RB
      RB2=RB2+RB*RB
      RVH1=RVH1+PV
      RVH2=RVH2+PV*PV
    1 CONTINUE
      RNEXP=NEXP
      SMRB=RB1/RNEXP
      SDRB=SQRT((RB2-RNEXP*SMRB*SMRB)/(RNEXP-1.))
      SSDRB=2.*SDRB/SQRT(RNEXP)
      SMVH=RVH1/RNEXP
      SDVH=SQRT((RVH2-RNEXP*SMVH*SMVH)/(RNEXP-1.))
      SSDVH=2.*SDVH/SQRT(RNEXP)
      WRITE(6,400)SMRB,SDRB,SSDRB,SMVH,SDVH,SSDVH
  400 FORMAT(*0*,*ROBBIN *,F10.5,F10.5,F5.3,* V HOU *,2F10.5,F5.3)
      END

(Sample run, digits partially illegible: for p = .30, NEXP = 200, n = 5, ROBBIN .1902? .08330 .012, V HOU .1???? .09017 .013.)

APPENDIX B

In this appendix we prove Theorems 2.1 and 2.2. The notation and the following assumptions are from Johnson (1970). The model assumes $X_1, X_2,\dots$ i.i.d. $P_\theta$, where $P_\theta$ has density $f(x,\theta)$ with respect to a given $\sigma$-finite measure $\mu$.

B.1 The parameter space $\Theta$ is a compact subset of $E^1$. Let $\Theta^\circ$ denote its interior and $\underline{\Theta}$ denote the Borel $\sigma$-algebra on $\Theta$.

B.2 $\theta$ is identified by $P_\theta$.

B.3 $f(x,\theta)$ is jointly measurable in $(x,\theta)$.

B.4 For each $x$, $f(x,\theta)$ admits continuous first and second partial derivatives with respect to $\theta$.

B.5 The measures $P_\theta$ are mutually absolutely continuous.

B.6 If $\lim |\theta_i| = \infty$, then $\lim f(x,\theta_i) = 0$ for all $x$ except for perhaps a null set depending on the sequence.
B.7 For all $\theta \in \Theta$, $E_\theta|\log f(X,\theta)| < \infty$ and
$$0 < I(\theta) = -E_\theta\Big[\frac{\partial^2}{\partial\theta^2}\log f(X,\theta)\Big].$$

B.8 For each $\theta_0 \in \Theta^\circ$, there exist functions $G_1(x)$ and $G_2(x)$ satisfying
$$\Big|\frac{\partial}{\partial\theta}\log f(x,\theta)\Big| \le G_1(x),\qquad \Big|\frac{\partial^2}{\partial\theta^2}\log f(x,\theta)\Big| \le G_2(x)$$
for $\theta$ in a neighborhood of $\theta_0$, and also $E_{\theta_0}[G_1(X)] < \infty$ and $E_{\theta_0}[G_2(X)] < \infty$. The functions $G_1$ and $G_2$ may depend on $\theta_0$.

B.9 Let
$$\bar f(x,\theta,\rho) = \sup_{|\theta-\theta'|\le\rho} f(x,\theta'),\quad \rho > 0,\qquad\text{and}\qquad Q(x,y) = \sup_{|\theta|>y} f(x,\theta),\quad y > 0.$$

B.10 For every $\theta \in \Theta$ and $\rho, y > 0$, $\bar f(x,\theta,\rho)$ and $Q(x,y)$ are measurable functions of $x$. Moreover, for sufficiently small $\rho$ and sufficiently large $y$,
$$E_{\theta_0}[\log \bar f(X,\theta,\rho)]^+ < \infty,\qquad E_{\theta_0}[\log Q(X,y)]^+ < \infty\qquad \text{for each } \theta_0 \in \Theta^\circ.$$

B.11 For each $x$, $\log f(x,\theta)$ has 5 continuous partial derivatives with respect to $\theta \in \Theta$.

B.12 There exist functions $G_k(x)$ with $E_{\theta_0}[G_k(X)] < \infty$ and
$$\Big|\frac{\partial^k}{\partial\theta^k}\log f(x,\theta)\Big| \le G_k(x)$$
for $\theta$ in a neighborhood of $\theta_0 \in \Theta^\circ$, $k = 3, 4, 5$.

B.13 $\Lambda$ is a probability measure on $(\Theta,\underline{\Theta})$; $\Lambda$ has density $\lambda$ with respect to the Lebesgue measure. For $\theta_0 \in \Theta^\circ$, $I(\theta_0) > 0$ and $\lambda(\cdot)$ has 3 continuous derivatives in a neighborhood of $\theta_0$. Also $\int |\theta|\,\lambda(\theta)\,d\theta < \infty$.

Conditions B.1-B.9 are basically those assumed by Wald (1949) to establish the strong consistency of the M.L.E. and those of LeCam (1956) to show that the M.L.E. is asymptotically normal. A weakened one-dimensional version of LeCam's (1956) Theorem 3.4.1 is

Theorem B.II.1. Let B.1-B.4, B.7, B.8 be satisfied. Then the maximum likelihood estimator $\hat\theta_n$ is strongly consistent and asymptotically normally distributed. The variance of the limiting distribution of $\sqrt{n}\,(\hat\theta_n - \theta_0)$ is $1/I(\theta_0)$.
61 Observe that f(x,e,p) = f(x,(e-p)v0)I[f0 ; f1] + f(x,(e+p)A1)I[f0< f1] é max{f0(x),f1(xfl and Q(x,y) = f(x,0)I[fO ; f1] + f(x,l)I[fO < f1] g max{f0(x),f1(x)} (4) both are log-integrable by (2). We see that (3) implies 8.10, (4) implies 8.9. Also, (3) implies that 8.11 is satisfied as well. Next we show that the result of Theorem 8.11.2 leads to /H (EA(e|§n) - en) + O a.s. P60 (5) By Theorem 8.11.1, én + 30 a.s. P: , 80 is the true parameter, 0 -0 80 E b . Let ( ) I 32 ( > ( ) 8 6'6 = - ———-log f x,e P dx 1 0 362 90 and 83 C(eleo) = ]-—3- log f(x,e)P (dx). as 60 Together 8.8, 8.11 and the uniform strong law (Rubin (1956)) implies that 2. . “ 1.1. b (an)-+I(eo) > 0, a3n(en) 6 C(eoleo) a.s. P 62 Also, 8.12 implies that i A + I " + l °° 1(0n) 1(80) > 0 and A (on) A (00) a.s. Peo' Therefore, for some M > O and for almost all .x, 1 lb‘1(6a3n(én) + x'(én)/1(én))l + Cb' g M for large n so that (1) implies (5). BIBLOGRAPHY BIBLIOGRAPHY Ballard, Robert J. and Gilliland, Dennis, C. (1978). On the risk performance of extended sequence compound rules for classi- fication between N(-1,1) and N(1,1). g, Statist. Comput. Simul. 6, 265-280. Behboodian, Javad (1972). Bayesian estimation for the proportions in a mixture of distributions. Sankhya, Series B. 34, 15-22. Billingsley, Patrick (1968). Convergence of probability measures. John Wiley and Sons Inc. Boyer, John E., Jr. and Gilliland, Dennis, C. (1980). Admissibility consideration in the finite state compound and empirical decision problems. Statistica Neerlandica, 34. Copas, J.B. (1969). Compound decisions and empirical Bayes. JRSS Series B, 31, 397-425. Feller, William (1957). An introduction to probability theory and its applications, Vol. 1., 2nd edition, John Wiley & Sons, Inc., New York. Ferguson, Thomas, S. (1967). Mathematical statistics a decision theoretic approach. Academic Press. Gilliland, Dennis, C., Hannan, James and Huang, J.S. (1974). Asymptotic solutions to the two state component compound decision problem, Bayes versus diffuse priors on proportions. RM-320, Statistics and Probability, MSU. (1976). Ann, Statist. 4, 1101-1112. Gilliland, Dennis, C. and Boyer, John E., Jr. (1979). Bayes empirical Bayes. Submitted for publication. Hannan, James, F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely spec- ified distributions. Ann. Math. Statist. gg, 37-51. Hannah, James, F. and Van Ryzin, J.R. (1965). Rate of convergence in the compound decision problem for two completely spec- ified distributions. Ann. Math. Statist. 36, 1743-1752. 63 64 Huang, J.S. (1970). A note on Robbins' compound decision problem. RM-266, Statistics and Probability, MSU. (1972). Ann. Math. Statist. 33, 348-350 IMSL Library, 3 and 3, 7th ed. Int. Math. Stat. Libraries, Houston, TX, January 1979. Johnson, R.A. (1970). Asymptotic expensions associated with posterior distributions. Ann. Math. Statist. 53, 851-864. LeCam, Lucien (1956). Lecture Notes. Department of Statistics, University of California at Berkeley. Lehmann, E.L. (1959). Testing Statistical Hypothesis. Wiley, New York. Munkres, James, R. (1975). Topology. Prentice-Hall, Inc. Oaten, Allen (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 33, 1164-1184. Robbins, Herbert (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkele Symp. Math. Statist. Prob., 157-163. UniVersity of Ca|i¥ornia Press. Robbins, Herbert (1964). 
The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.

Rockafellar, R. Tyrrell (1972). Convex Analysis. Princeton University Press.

Rubin, H. (1956). Uniform convergence of random functions with applications to statistics. Ann. Math. Statist. 27, 200-203.

Shapiro, Connie P. (1972). Bayesian classification. Ph.D. dissertation, Department of Statistics, University of Michigan.

Shapiro, C.P. (1974). Bayesian classification: Asymptotic results. Ann. Statist. 2, 763-774.

Snijders, Tom (1977). Complete class theorems for the simplest empirical Bayes decision problems. Ann. Statist. 5, 164-171.

Van Houwelingen, J.C. (1974). An empirical Bayes rule for testing simple hypothesis versus simple alternative. Statistica Neerlandica 28, 209-221.

Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20, 595-601.

Wilks, S.S. (1962). Mathematical Statistics. John Wiley & Sons, Inc., New York.