EMPIRICAL BAYES RESULTS IN THE CASE OF NON-IDENTICAL COMPONENTS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
THOMAS EUGENE O'BRYAN
1972

This is to certify that the thesis entitled
EMPIRICAL BAYES RESULTS IN THE CASE OF NON-IDENTICAL COMPONENTS
presented by Thomas Eugene O'Bryan
has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

Major professor

Date: August 11, 1972

ABSTRACT

EMPIRICAL BAYES RESULTS IN THE CASE OF NON-IDENTICAL COMPONENTS

By Thomas Eugene O'Bryan

A Bayes rule with respect to a distribution $G$ will minimize the risk of a decision concerning a parameter $\theta$ which is distributed according to $G$. The infimum Bayes risk is denoted by $R(G)$. Herbert Robbins ((1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 157-163, University of California Press) demonstrated that, even if $G$ is unknown, in certain cases one can construct statistical procedures based on data gathered from $n$ independent repetitions of the decision problem for which the risk converges to $R(G)$ as $n \to \infty$ for all $G$. Such an empirical Bayes procedure is asymptotically optimal. Rudimentary forms of this problem had appeared prior to Robbins' unifying treatment and a huge empirical Bayes literature has evolved since.

Only sequences of identical component problems have been treated in the literature. However, it is clear that when the only difference from problem to problem is sample size, empirical Bayes methods should still be useful. In this case there is not a single Bayes envelope $R(G)$, but rather a sequence of envelopes $R^{m_n}(G)$, where $m_n$ denotes the sample size in the $n$th problem. Let $\underline{\theta} = (\theta_1, \theta_2, \dots)$ be a sequence of iid $G$ variables and let the conditional distribution of the observations $\underline{X}_n = (X_{n,1},\dots,X_{n,m_n})$ given $\underline{\theta}$ be $(P_{\theta_n})^{m_n}$, $n = 1,2,\dots$. For a decision concerning $\theta_n$, we will investigate procedures $t_n$ which will utilize all the data $\underline{X}_1,\dots,\underline{X}_n$ and which, under certain conditions, are asymptotically optimal in the sense that $\lim_{n\to\infty} [R^{m_n}(t_n,G) - R^{m_n}(G)] = 0$ for all $G$. In particular this paper treats squared error loss estimation and linear loss testing in certain discrete exponential families where the construction of asymptotically optimal procedures is tractable.

EMPIRICAL BAYES RESULTS IN THE CASE OF NON-IDENTICAL COMPONENTS

By Thomas Eugene O'Bryan

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1972

TO MY PARENTS AND MARY

ACKNOWLEDGMENTS

I wish to express my appreciation to Professor Dennis C. Gilliland for his guidance throughout the preparation of this manuscript and his concern for my work. I am indebted to both him and Professor James Hannan for the introduction to empirical Bayes decision theory as well as many other aspects of mathematical statistics, and for their many helpful comments and suggestions which led the way to stronger theorems and simplified proofs.

The financial support provided by the Department of Statistics and Probability and the National Science Foundation made my graduate studies possible. I thank them and also wish to thank Mrs. Noralee Barnes for her patience and skill in typing this dissertation.
TABLE OF CONTENTS

Chapter I. INTRODUCTION
  1.1 The Statistical Decision Problem
  1.2 The Empirical Bayes Decision Problem
  1.3 History
  1.4 The Non-Identical Case

Chapter II. DECISION PROBLEMS INVOLVING SOME DISCRETE EXPONENTIAL FAMILIES (PRELIMINARIES)
  2.1 Introduction
  2.2 Assumptions
  2.3 Lemmas

Chapter III. ESTIMATION
  3.1 Estimation Under (A1⁻) and (A3)
  3.2 Estimation Under (A1) and (A2)
  3.3 Estimation Under (A1)

Chapter IV. TESTING AND FINAL REMARKS
  4.1 Testing Under (A1⁻) and (A3)
  4.2 Testing Under (A1)
  4.3 Final Remarks

BIBLIOGRAPHY

CHAPTER I
INTRODUCTION

§1.1 The Statistical Decision Problem

Consider the following statistical decision problem. Let $\{P_\theta: \theta \in \Theta\}$ be a family of probability measures over a $\sigma$-field $\mathcal{B}$ of subsets of $I$. $\Theta$ will be called the parameter space and $\theta$ will denote a generic element of $\Theta$. $I$ will be called the observation space. Let $\mathcal{A}$ be an action space with generic element $a$. Let $L \ge 0$ be a loss function defined on $\Theta \times \mathcal{A}$. Let $G$ be a probability measure over a $\sigma$-field $\mathcal{F}$ of subsets of $\Theta$. $G$ will be called the "a priori" distribution.

With $\mathcal{C}$ a $\sigma$-field of subsets of $\mathcal{A}$, a randomized decision rule (decision function) $t$ has domain $I \times \mathcal{C}$ and is such that $t(x,\cdot)$ is a probability measure on $\mathcal{C}$ for each fixed $x \in I$, and $t(\cdot,C)$ is $\mathcal{B}$-measurable for each fixed $C \in \mathcal{C}$. When $\theta$ is the parameter, the decision rule $t$ results in expected loss

(1.1)  $R(t,\theta) = \int_I \int_{\mathcal{A}} L(\theta,a)\, t(x,da)\, P_\theta(dx)$.

We require that $\mathcal{C}$ contain all the singleton subsets of $\mathcal{A}$, so that the class of randomized decision rules contains the class of nonrandomized decision rules. With a nonrandomized decision rule $t$, $t(x,\cdot)$ puts all its probability on a singleton set, say $\{t(x)\}$, for each $x$. For a nonrandomized decision rule $t$,

(1.1')  $R(t,\theta) = \int_I L(\theta, t(x))\, P_\theta(dx)$.

$R(t,\theta)$ is called the risk of $t$ with respect to $\theta$. When $G$ is the "a priori" distribution on $\Theta$, the overall expected loss is given by

(1.2)  $R(t,G) = \int_\Theta R(t,\theta)\, G(d\theta)$.

$R(t,G)$ is called the Bayes risk of $t$ with respect to $G$. Let

(1.3)  $R(G) = \inf_t R(t,G)$.

$R(G)$ is called the Bayes envelope evaluated at $G$. If there exists a decision rule $t^*$ such that $R(t^*,G) = R(G)$, then we write $t^* = t_G$ and call $t_G$ a Bayes decision rule with respect to $G$. Note that $t_G$ need not exist and, if it does exist, it need not be unique. However, if it does exist, then there exists a nonrandomized decision rule which is also a Bayes decision rule with respect to $G$.

We give two examples of the preceding which will illustrate the two types of decision problems, i.e., estimation and testing, with which we will be concerned in the following chapters. In both examples, let $\Theta = (0,\infty)$. Let $\{P_\theta: \theta \in \Theta\}$ be the Poisson family of distributions with means $\theta$, i.e., $P_\theta$ admits a density $f_\theta$ with respect to counting measure $\mu$ on $I = \{0,1,\dots\}$ given by $f_\theta(x) = e^{-\theta}\theta^x (x!)^{-1}$. Let $G$ be the Gamma distribution on $(0,\infty)$ which has density with respect to Lebesgue measure $\lambda$ given by $g(\theta) = [\Gamma(\beta)]^{-1} \alpha^\beta \theta^{\beta-1} e^{-\alpha\theta}$ for $\alpha, \beta > 0$.

Example 1.1. (Poisson) Estimation. Let $\mathcal{A} = \Theta$. Let $L(\theta,a) = (\theta-a)^2$. Then for any nonrandomized decision rule, (1.2) becomes

$R(t,G) = \int_0^\infty \sum_{x=0}^\infty (t(x)-\theta)^2 f_\theta(x)\, g(\theta)\, d\theta$.

Here the nonrandomized Bayes decision rule with respect to $G$ is given by a version of the conditional expectation of $\theta$ given $X = x$,

$t_G(x) = E[\theta \mid X = x] = \dfrac{x+\beta}{\alpha+1}$.
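For the reader's convenience, here is the conjugate computation behind Example 1.1 (a standard Gamma-Poisson fact, displayed here as a check rather than as part of the original argument):

$t_G(x) = \dfrac{\int_0^\infty \theta\, f_\theta(x)\, g(\theta)\, d\theta}{\int_0^\infty f_\theta(x)\, g(\theta)\, d\theta}
        = \dfrac{\int_0^\infty \theta^{x+\beta} e^{-(\alpha+1)\theta}\, d\theta}{\int_0^\infty \theta^{x+\beta-1} e^{-(\alpha+1)\theta}\, d\theta}
        = \dfrac{\Gamma(x+\beta+1)/(\alpha+1)^{x+\beta+1}}{\Gamma(x+\beta)/(\alpha+1)^{x+\beta}}
        = \dfrac{x+\beta}{\alpha+1}.$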
Example 1.2. (Poisson) Testing. Let $0 < c < \infty$. Let $\mathcal{A} = \{a_1,a_2\}$ correspond to the actions "decide $\theta \le c$" and "decide $\theta > c$" respectively. Let $L(\theta,a_1) = b(\theta-c)^+$ and let $L(\theta,a_2) = b(\theta-c)^-$. Here we use the symbol $\delta$ rather than $t$ for decision rules. Let $\delta(x)$ be the probability of choosing action $a_1$ given $x$. Here we have (1.2) as

$R(\delta,G) = \int_0^\infty \sum_{x=0}^\infty [\delta(x)L(\theta,a_1) + (1-\delta(x))L(\theta,a_2)]\, f_\theta(x)\, g(\theta)\, d\theta$

and a nonrandomized Bayes decision rule with respect to $G$ is given by

$\delta_G(x) = [E(\theta \mid X = x) \le c]$.

Example 1.3. Consider the problem of Example 1.1 with $X = (X_1,\dots,X_m)$, a sequence of iid random variables each distributed $f_\theta$. The statistic $Y = \sum_{i=1}^m X_i$ is sufficient for this family. Here $Y \sim f_{m\theta}$. The nonrandomized Bayes decision rule with respect to $G$ is given by a version of the conditional expectation of $\theta$ given $Y = y$,

$t_G(y) = \dfrac{y+\beta}{m+\alpha}$.

Also we calculate

$R(G) = \dfrac{\beta}{\alpha(m+\alpha)}$.

The important thing to notice here is that both the Bayes decision rule with respect to $G$ and the Bayes envelope evaluated at $G$ depend upon the sample size $m$.

Throughout this paper, we will let square brackets denote indicator functions, so that for an event $A$, $[A]$ denotes the indicator function of $A$.

§1.2 The Empirical Bayes Decision Problem

In the case where $G$ is known and a Bayes decision rule with respect to $G$ exists, one merely employs $t_G$ and thereby incurs the minimum possible Bayes risk $R(G)$. But suppose $G$ is unknown. Robbins (1956, 1963, 1964) showed that if a given statistical decision problem occurred repeatedly and independently with the same unknown $G$ throughout, then, under certain conditions, one could exhibit a sequence of rules $\{t_n\}$ which had Bayes risk with respect to $G$ converging to the Bayes envelope evaluated at $G$.

As the problem repeats itself, it presents a sequence of pairs of random variables $\{(\theta_i, X_i)\}$ with each pair being independent of all other pairs. The $\theta_i$ are unobservable and iid with distribution $G$. The conditional distribution of $X_i$ given that $\theta_i = \theta$ is $P_\theta$. Robbins suggested that one use a decision rule $t_n$ in the $(n+1)$st repetition of the problem with $t_n$ depending on $X_1,\dots,X_n$. Robbins' rationale was that one could use the knowledge about $G$ gained through the variables $X_1,\dots,X_n$ in such a way that for large $n$ the Bayes risk with respect to $G$ of $t_n$ would be close to the Bayes risk with respect to $G$ of $t_G$. With $t_n$ used as the decision rule in the $(n+1)$st problem, the risk conditional on $X_1,\dots,X_n$ is

(1.4)  $R(t_n,G) = \int_\Theta \int_I \int_{\mathcal{A}} L(\theta,a)\, t_n(x,da)\, P_\theta(dx)\, G(d\theta)$

which satisfies $R(t_n,G) \ge R(G)$ in view of (1.3). Hence, with the overall expected loss for the decision concerning $\theta_{n+1}$ denoted by

(1.5)  $R_n(t_n,G) \equiv E\, R(t_n,G)$,

we see that

(1.6)  $R_n(t_n,G) \ge R(G)$.

Definition 1.1. If $\lim_{n\to\infty} R_n(t_n,G) = R(G)$, then $\{t_n\}$ is said to be asymptotically optimal relative to $G$ and we will write $\{t_n\}$ a.o. relative to $G$.
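Definition 1.1 can be seen at work concretely in the Poisson case of Example 1.1, where $t_G(x) = (x+1)h(x+1)/h(x)$ with $h$ the marginal density of $X$, and $h$ can be estimated by empirical frequencies of $X_1,\dots,X_n$ (Robbins' "second track" of §1.3 below). The following Python sketch is illustrative only and is not part of the thesis; the prior parameters and the number of past problems are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 2.0, 3.0          # Gamma prior G (rate alpha, shape beta)
n = 200_000                     # number of past problems

theta = rng.gamma(shape=beta, scale=1.0 / alpha, size=n)  # theta_i iid G
x = rng.poisson(theta)                                    # X_i | theta_i ~ P_theta

# Robbins' estimator of the Bayes rule t_G(x) = (x+1) h(x+1) / h(x),
# with the marginal h replaced by empirical frequencies.
counts = np.bincount(x, minlength=int(x.max()) + 2)

def t_n(k):
    return (k + 1) * counts[k + 1] / counts[k] if counts[k] > 0 else 0.0

def t_G(k):
    # exact Bayes rule from Example 1.1
    return (k + beta) / (alpha + 1)

for k in range(5):
    print(k, round(t_n(k), 3), round(t_G(k), 3))
```

For large $n$ the printed pairs agree closely, which is the empirical manifestation of $R_n(t_n,G) \to R(G)$.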
§1.3 History

The search for rules $\{t_n\}$ which are a.o. relative to $G$ for every distribution $G$, or at least for every $G$ within a certain class, has taken basically two tracks. The first track is to use the values of $X_1,\dots,X_n$ to form an estimate of $G$, call it $\hat G_n$, and then let $t_n$ be a Bayes decision rule with respect to $\hat G_n$, i.e., let $t_n = t_{\hat G_n}$. The second track is to estimate the form of the Bayes decision rule directly without estimating $G$ first.

In 1955 at the Third Berkeley Symposium on Mathematical Statistics, Robbins introduced empirical Bayes procedures and discussed both tracks mentioned above. In 1963 and 1964, two more papers by Robbins appeared which discussed the empirical Bayes problem further. Rudimentary forms of the problem had appeared prior to Robbins' unifying treatment and a huge empirical Bayes literature has evolved since.

We will concern ourselves here with that segment of the literature which involves situations similar to what we will discuss in this paper. Along the second track, Johns (1957) discussed estimation in the case where the class of probability distributions $\{P_\theta: \theta \in \Theta\}$ was not restricted to a particular parametric family. Macky (1966) and more recently Hannan and Macky (1971) have dealt with certain exponential families in the case of estimation and have demonstrated a.o. rules with weak restrictions on the prior distribution and the parameter space. Samuel (1963) discussed the testing problem under various loss structures and in part dealt specifically with the type of discrete exponential families which we will discuss in Chapter II. Johns and Van Ryzin (1971) treated the testing problem with linear loss and developed rates of convergence on $R_n(\delta_n,G) - R(G)$. Concerning the estimation of $G$ which is part of the first track, Tucker (1963) examined the case where $\{P_\theta: \theta \in \Theta\}$ was the family of Poisson distributions. Rolph (1968) used Bayesian estimation of $G$ in the case where the parameter space $\Theta$ was limited to $[0,1]$. Recently Meeden (1972) has looked at Bayesian estimation of $G$ in the case where $\Theta$ may be $[0,\infty)$.

§1.4 The Non-Identical Case

The history of the empirical Bayes decision problem is such that the only problem that seems to have been considered thus far is the one where the stages are identical repetitions of a given component problem. One could ask whether it is meaningful and useful to apply empirical Bayes procedures to sequences of independent but not identical decision problems all having the same unknown $G$. We will attempt to answer this question in part in the remainder of this paper. Specifically, we will address the case where the statistical decision problems in the sequence are identical except for sample size. When we observe a random vector of observations $\underline{X} = (X_1,\dots,X_m)$, where $m$ may vary from stage to stage, it becomes necessary to consider the dependence of the Bayes decision rules and the Bayes envelopes evaluated at $G$ upon the value of $m$ (cf. Example 1.3). This was not necessary before, when one considered problems where the sample sizes were identical at each stage.

In the situation that we are considering, where the problems occur independently with the same unknown $G$ throughout, there is a sequence of independent random vectors $\{(\theta_i, \underline{X}_i)\}$, $i = 1,2,\dots$, where $\underline{X}_i = (X_{i,1},\dots,X_{i,m_i})$ is the sample of size $m_i$ from the $i$th problem. The random variables $\theta_i$ are unobservable and are iid with distribution $G$. Conditional on $\theta_i = \theta$, $X_{i,1},\dots,X_{i,m_i}$ are iid $P_\theta$. We consider a decision rule $t_n$ for use in the $(n+1)$st problem which depends on $\underline{X}_1,\dots,\underline{X}_n$. Letting $m = m_{n+1}$, the risk conditional on $\underline{X}_1,\dots,\underline{X}_n$ is given by

(1.7)  $R^m(t_n,G) = \int_\Theta \int \int_{\mathcal{A}} L(\theta,a)\, t_n(\underline{x},da)\, P_\theta^m(d\underline{x})\, G(d\theta)$

which satisfies $R^m(t_n,G) \ge R^m(G)$, where $R^m(G)$ is the Bayes envelope for a sample size $m$ component problem. Hence, with the overall expected loss for the decision concerning $\theta_{n+1}$ denoted by

(1.8)  $R_n(t_n,G) \equiv E\, R^m(t_n,G)$,

we see that

(1.9)  $R_n(t_n,G) \ge R^m(G)$.

This motivates the following definition, which parallels Robbins' definition.

Definition 1.2. A sequence of decision rules $\{t_n\}$ is said to be asymptotically optimal (a.o.) relative to $G$ if $\lim_{n\to\infty} D_n(G) = 0$, where $D_n(G) \equiv R_n(t_n,G) - R^{m_{n+1}}(G)$.
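In the Poisson-Gamma setting of Example 1.3 the moving target in Definition 1.2 is explicit; using the values computed there,

$R^m(G) = \dfrac{\beta}{\alpha(m+\alpha)}, \qquad D_n(G) = R_n(t_n,G) - \dfrac{\beta}{\alpha(m_{n+1}+\alpha)},$

so the envelope being chased shrinks as the incoming sample size grows, and asymptotic optimality requires the excess risk over this $n$-dependent benchmark to vanish.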
The remainder of this paper treats squared error loss estimation and linear loss testing involving certain discrete exponential families and exhibits sequences of rules $\{t_n\}$ which are asymptotically optimal in the situation described in this section. We will approach the problem along the second track discussed before.

CHAPTER II
DECISION PROBLEMS INVOLVING SOME DISCRETE EXPONENTIAL FAMILIES (PRELIMINARIES)

§2.1 Introduction

We impose the following special structure on the component problem to be treated in the empirical Bayes framework. Suppose that conditional on $\theta$, $X_1,\dots,X_m$ are iid $P_\theta$, where $P_\theta$ has a density with respect to counting measure $\mu$ on $I = \{0,1,\dots\}$ given by

(2.1)  $f_\theta(x) = \theta^x z(\theta) g(x)$

where $g(x) > 0$, $x \in I$, and $\theta \in \Theta \subset \Omega$, with $\Omega = \{\theta \ge 0: \sum_{x \in I} \theta^x g(x) < \infty\}$ and $z(\theta) = [\sum_{x \in I} \theta^x g(x)]^{-1}$. The statistic $Y = \sum_{i=1}^m X_i$ is sufficient for this family, with density with respect to $\mu$ given by

(2.2)  $f_{\theta,m}(y) = \theta^y z^m(\theta) g_m(y)$, $y \in I$,

where

(2.3)  $g_m(y) = \sum_{A_m(y)} \prod_{i=1}^m g(x_i)$, with $A_m(y) = \{(x_1,\dots,x_m): \sum_{i=1}^m x_i = y\}$.

With $\theta \sim G$, the marginal density of $Y$ with respect to $\mu$ is given by

(2.4)  $h_m(y) = q_m(y)\, g_m(y)$

where

(2.5)  $q_m(y) = \int_\Theta \theta^y z^m(\theta)\, G(d\theta)$

is the marginal density for $Y$ with respect to the measure on $I$ defined by the mass density $g_m$. We note that for all $m$, $q_m(y) > 0$ for all $y \in I$ and hence $h_m(y) > 0$ for all $y \in I$, unless $G$ is degenerate at $0$, in which case $q_m(0) = z^m(0) = (g(0))^{-m} > 0$, $h_m(0) = 1$, and $q_m(y) = h_m(y) = 0$ for $y \ge 1$. Of course $f_{\theta,1} = f_\theta$, $g_1 = g$, and for convenience we let $h_1 \equiv h$ and $q_1 \equiv q$.

Two common families of the type discussed above are the Poisson family with

(2.6)  $f_\theta(x) = \theta^x e^{-\theta} (x!)^{-1}$  and  $f_{\theta,m}(y) = \theta^y e^{-m\theta} m^y (y!)^{-1}$

and the Negative Binomial family ($r > 0$ known) with

(2.7)  $f_\theta(x) = \theta^x (1-\theta)^r \binom{r+x-1}{x}$  and  $f_{\theta,m}(y) = \theta^y (1-\theta)^{mr} \binom{mr+y-1}{y}$.

In this paper we will consider two loss structures. In Chapter III we will consider Estimation with $\Theta \subset \mathcal{A} \subset [0,\infty)$ and $L(\theta,a) = (\theta-a)^2$, and in Chapter IV we will consider Testing with $\mathcal{A} = \{a_1,a_2\}$ and, for $b > 0$ and $0 < c < \infty$,

$L(\theta,a_1) = b(\theta-c)^+, \qquad L(\theta,a_2) = b(\theta-c)^-$.

For Estimation (hereafter understood with squared error loss), the Bayes risk of any nonrandomized rule $t$ based on $Y$ in the sample size $m$ problem is given by

(2.8)  $R(t,G) = \int_\Theta \sum_{y=0}^\infty (t(y) - \theta)^2 f_{\theta,m}(y)\, G(d\theta)$

and the nonrandomized rule which is Bayes with respect to $G$ is given by a version of the conditional expectation of $\theta$ given $Y = y$,

(2.9)  $t_G(y) = \dfrac{q_m(y+1)}{q_m(y)}$,

where throughout this paper ratios $0/0$ are to be interpreted as $0$. For Testing (hereafter understood with the linear loss function as given above), the Bayes risk of any randomized rule $\delta$ with respect to $G$, where $\delta(y)$ is the probability of taking action $a_1$ given $Y = y$, is given by

(2.10)  $R(\delta,G) = \int_\Theta \sum_{y=0}^\infty [\delta(y)L(\theta,a_1) + (1-\delta(y))L(\theta,a_2)]\, f_{\theta,m}(y)\, G(d\theta)$

and a nonrandomized rule which is Bayes with respect to $G$ is given by

(2.11)  $\delta_G = [\alpha_G \le 0]$

where

(2.12)  $\alpha_G(y) = q_m(y+1) - c\, q_m(y)$.

We will add superscripts $m$ to $R$, $t$, $t_G$, $\delta$, $\delta_G$, $a$, and $\alpha_G$ whenever it is necessary to emphasize the dependence on $m$.
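To see why (2.9) and (2.11) are the Bayes rules, note from (2.2) and (2.5) that $g_m(y)$ cancels in the posterior expectation:

$E[\theta \mid Y = y] = \dfrac{\int_\Theta \theta\, \theta^y z^m(\theta)\, G(d\theta)}{\int_\Theta \theta^y z^m(\theta)\, G(d\theta)} = \dfrac{q_m(y+1)}{q_m(y)} = t_G(y),$

and for Testing the Bayes rule takes action $a_1$ exactly when $E[\theta \mid Y = y] \le c$, i.e., when $\alpha_G(y) = q_m(y+1) - c\, q_m(y) \le 0$.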
The empirical Bayes problem that we consider involves repetitions of the component problem (either Estimation or Testing) with sample size $m$ varying from problem to problem. We now turn our attention to the method that we will use in constructing rules which are a.o. relative to any $G$. Notice that the Bayes decision rules (2.9) and (2.11) depend upon $y$ through $q_m$. With $G$ unknown, $q_m$ is unknown but can be estimated. One method is to express $q_m$ as a function of $q$, a common marginal density with respect to $g$ of all $X_{ij}$, $j = 1,\dots,m_i$, $i = 1,2,\dots$, and to use the $X_{ij}$ data to estimate $q$. With the discrete exponential family (2.1) this method can sometimes be implemented. We note that

(2.13)  $q_m(y) = \int_\Theta z^{m-1}(\theta)\, \theta^y z(\theta)\, G(d\theta)$, $y \in I$.

If $z(\theta)$ is a polynomial then $z^{m-1}(\theta)$ is also a polynomial and we can write for each $m$

(2.14)  $z^{m-1}(\theta) = \sum_k \gamma_k \theta^k$

where the $\gamma_k$ are constants. Substituting (2.14) into (2.13) and interchanging the order of integration and (finite) summation yields

(2.15)  $q_m(y) = \sum_k \gamma_k\, q(y+k)$

so that $q_m$ is expressed in terms of the estimable function $q$. Since $z(\theta)$ is continuous, given an interval $[0,B]$, $B < \infty$, the Weierstrass approximation theorem allows that for each $m$ and every $\varepsilon > 0$ there exists a polynomial $\sum_k \gamma_k \theta^k$ which approximates $z^{m-1}(\theta)$ to within $\varepsilon$ uniformly on $[0,B]$, i.e.,

(2.16)  $|z^{m-1}(\theta) - \sum_k \gamma_k \theta^k| < \varepsilon$ for all $\theta \in [0,B]$.

Defining

(2.17)  $q_{m,\varepsilon}(y) = \sum_k \gamma_k\, q(y+k)$, $y \in I$,

we see that

(2.18)  $|q_m(y) - q_{m,\varepsilon}(y)| \le \int_\Theta |z^{m-1}(\theta) - \sum_k \gamma_k \theta^k|\, \theta^y z(\theta)\, G(d\theta) \le \varepsilon\, q(y)$, $y \in I$,

i.e., $q_{m,\varepsilon}(y)$ approximates $q_m(y)$ to within $\varepsilon\, q(y)$ for each $y \in I$ and each $m$.

For the Negative Binomial $f_\theta$ of (2.7) with $r$ an integer, $z(\theta) = (1-\theta)^r$, a polynomial of degree $r$. For the Poisson $f_\theta$ of (2.6), $z^{m-1}(\theta) = e^{-\theta(m-1)}$ has a power series expansion about $0$ and the series is uniformly convergent on any bounded set of $\theta$ values for each $m$; hence, for each $\varepsilon > 0$, the polynomial approximating $z^{m-1}$ to within $\varepsilon$ uniformly on the bounded set of $\theta$ values can be found simply by truncating the power series expansion.

We will find it sufficient but not necessary for our purposes to require that the estimator of $q(y)$, $y \in I$, be

(2.19)  $\bar q(y) = \dfrac{1}{n} \sum_{i=1}^n {}_i q(y)$, $y \in I$,

where for each $i$, ${}_i q(y)$ is an unbiased estimator of $q(y)$ based on $\underline{X}_i$, bounded by $1/g(y)$, and ${}_i q(y) = 0$ for all $y \ge 1$ if $\underline{X}_i = \underline{0}$. As an average of unbiased and bounded estimators, $\bar q(y)$ is unbiased and pointwise consistent for $q(y)$, $y \in I$. We now develop an example of such an estimator.

Let $X_1,\dots,X_m$ be a sample where, conditional on $\theta$, $X_1,\dots,X_m$ are iid $f_\theta$ and where $\theta \sim G$. An unbiased estimator of $q(k)$ is provided by $[X_1 = k]/g(k)$ since

(2.20)  $E[X_1 = k] = h(k) = q(k)\, g(k)$.

An improved estimator based on the sufficient statistic $Y = X_1 + \dots + X_m$ is given by the conditional expectation

$E_y[X_1 = k] = \dfrac{P[X_1 = k,\ X_2 + \dots + X_m = y - k]}{P[Y = y]}$.

The probabilities can be computed first conditional on $\theta$ and then unconditionally to obtain the result

(2.21)  $E_y[X_1 = k] = \dfrac{g(k)\, g_{m-1}(y-k)}{g_m(y)}$.

This relation (2.21) holds for all $m \ge 1$, $k,y \in I$, with the definitions $g_0(u) = [u = 0]$ and $g_m(u) = 0$ if $m \ge 1$ and $u < 0$. In view of (2.20) and (2.21) we see that the estimator

(2.22)  $\dfrac{g_{m-1}(Y - k)}{g_m(Y)}$

is unbiased for $q(k)$, $k \in I$. Furthermore, $0 \le g(k)\, g_{m-1}(Y - k) \le g_m(Y)$ for each $m$, which ensures that the estimator (2.22) is bounded by $g^{-1}(k)$, $k \in I$. When $X_1 = \dots = X_m = 0$, then $Y = 0$ and we see that the estimator (2.22) is equal to $0$ for $k \ge 1$. Therefore, with $Y_i$ denoting the sufficient statistic in the $i$th component problem,

(2.23)  ${}_i q(y) = \dfrac{g_{m_i - 1}(Y_i - y)}{g_{m_i}(Y_i)}$, $y \in I$,

provides an example of an estimator to be used in (2.19).

§2.2 Assumptions

The discussion in Section 2.1 helps motivate the imposition of the following assumptions. In order to exhibit rules $\{t_n\}$ which are a.o. relative to all $G$ we will impose these assumptions from time to time in what follows.

(A1⁻)  $\Theta \subset [0,B]$ where $B < \infty$.
(A1)   $\Theta \subset [0,B]$ where $B \in \Omega$.
(A2)   $z(\theta)$ is a polynomial in $\theta \in \Theta$.
(A3)   The sample sizes $m_n$ form a bounded sequence, $m_n \le M < \infty$.

Remark 2.1. (A2) implies (A1⁻). To see this, note that $z(\theta) = [\sum_x \theta^x g(x)]^{-1}$ satisfies (i) $z(\theta) > 0$ and (ii) $z(\theta) \le g^{-1}(0)$ for $\theta \in \Omega$. If $z(\theta) = \sum_0^K \gamma_k \theta^k$ with $\gamma_K > 0$, then $\sum_0^K \gamma_k \theta^k \to \infty$ as $\theta \to \infty$, so that $\Theta$ must be bounded in view of (ii). If $\gamma_K < 0$, $\sum_0^K \gamma_k \theta^k \to -\infty$ as $\theta \to \infty$, so that $\Theta$ must be bounded in view of (i). Of course, assumption (A1) also implies $\Theta$ is bounded. In the presence of (A2), (A1) only adds the requirement that $\sup\{\theta: \theta \in \Theta\} \in \Omega$. Assumption (A1) need not be satisfied when (A2) is satisfied; for example, with $\Theta = [0,1) = \Omega$, the negative binomial family with $r$ a known integer satisfies (A2); however, $\Theta$ does not satisfy (A1).
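For the Poisson family (2.6), $g(x) = 1/x!$ and $g_m(y) = m^y/y!$, so (2.23) and (2.19) take a simple closed form. The following Python sketch is mine, for illustration; the function names are ad hoc:

```python
from math import factorial

def iq(y, Y_i, m_i):
    """Unbiased estimate of q(y) from the i-th (Poisson) component, eq. (2.23):
    g_{m-1}(Y_i - y) / g_m(Y_i) with g_m(u) = m**u / u!.
    It is bounded by 1/g(y) = y!, as required in (2.19)."""
    u = Y_i - y
    if u < 0:
        return 0.0          # g_m(u) = 0 for u < 0
    # ((m-1)**u / u!) / (m**Y / Y!)  =  (Y!/u!) * (m-1)**u / m**Y;
    # the convention 0**0 = 1 gives g_0(u) = [u = 0] when m_i = 1.
    return factorial(Y_i) // factorial(u) * (m_i - 1) ** u / m_i ** Y_i

def q_bar(y, data):
    """(2.19): average the per-component estimates over the past problems.
    `data` is a list of (Y_i, m_i) pairs."""
    return sum(iq(y, Y, m) for Y, m in data) / len(data)
```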
§2.3 Lemmas

We will need the following lemmas in Chapters III and IV.

Lemma 2.1. For the discrete exponential family (2.1),

(2.24)  $P_{\theta,m}[Y > y] \le P_{B,M}[Y > y]$ for all $y \in I$, $0 \le \theta \le B \in \Omega$, $1 \le m \le M$.

Proof. Let $y \in I$, $0 \le \theta \le B \in \Omega$, and $1 \le m \le M$ be fixed. Since $\{f_{\theta,m}(y): \theta \in \Omega\}$ has an increasing monotone likelihood ratio in $y$, it follows from Lehmann (1959, Lemma 3.2) that $P_{\theta,m}[Y > y] \le P_{B,m}[Y > y]$. Since $\sum_1^M X_i = \sum_1^m X_i + \sum_{m+1}^M X_i$ where the $X_i \ge 0$, $P_B[\sum_1^M X_i > y] \ge P_B[\sum_1^m X_i > y]$, so that $P_{B,m}[Y > y] \le P_{B,M}[Y > y]$.

In our constructions in Chapters III and IV it is necessary, in the absence of (A3), to use the existence of decision rules $L^m$ such that $R(L^m,G) \to 0$ as $m \to \infty$ for every $G$. The two lemmas to follow will establish the existence of such rules. We notice that, by (2.1),

(2.25)  $\theta = \dfrac{g(0)\, f_\theta(1)}{g(1)\, f_\theta(0)}$,

which under (A1⁻) has a "natural" bounded estimator

(2.26)  $T^m(X_1,\dots,X_m) = \dfrac{g(0) \sum_{i=1}^m [X_i = 1]}{g(1) \sum_{i=1}^m [X_i = 0]} \wedge B$.

Lemma 2.2. (Estimation) Under (A1⁻), $R(T^m,G) \to 0$ as $m \to \infty$ for any $G$.

Proof: Let $G$ be an arbitrary but fixed a priori distribution on $\Theta$. Since $\frac{1}{m}\sum_{i=1}^m [X_i = x] \to f_\theta(x)$ in $P_\theta$-probability, $x \in I$, and $f_\theta(0) > 0$ and $\theta \in [0,B]$, we see that

$T^m(X_1,\dots,X_m) \ \xrightarrow{P_\theta}\ \dfrac{g(0)\, f_\theta(1)}{g(1)\, f_\theta(0)} \wedge B = \theta$.

Since $(T^m - \theta)^2 \le B^2$ for all $m$, the dominated convergence theorem yields

$R(T^m,\theta) = E_\theta(T^m - \theta)^2 \to 0$ for each $\theta$,

where $E_\theta$ denotes expectation with respect to the distribution $P_\theta$. Since $E_\theta(T^m - \theta)^2 \le B^2$ for all $\theta$ and $m$, another application of the dominated convergence theorem yields

(2.27)  $R(T^m,G) = \int_\Theta R(T^m,\theta)\, G(d\theta) \to 0$.

For the testing problem we define

(2.28)  $\Delta^m(X_1,\dots,X_m) = [T^m \le c]$.

Lemma 2.3. (Testing) Under (A1⁻), $R(\Delta^m,G) \to 0$ as $m \to \infty$ for all $G$.

Proof: Let $G$ be an arbitrary but fixed a priori distribution on $\Theta$ and let $E_\theta$ be as in Lemma 2.2. We can write

$R(\Delta^m,\theta) = b\{E_\theta([T^m \le c](\theta - c)^+) + E_\theta([T^m > c](\theta - c)^-)\} \le b\, E_\theta|T^m - \theta|$.

Since $T^m - \theta \to 0$ in $L_2$ (cf. (2.27)), $T^m - \theta \to 0$ in $L_1$, which completes the proof.

CHAPTER III
ESTIMATION

§3.1 Estimation Under (A1⁻) and (A3)

In this chapter we will exhibit sequences of decision rules which are a.o. relative to every $G$ in the case of squared error loss estimation in the special discrete exponential families described in Section 2.1. As has been noted by many authors (e.g., Macky (1966, pp. 6-7)), and quite apart from distributional assumptions, if $T(X)$ and $\theta$ are $L_2$ random variables then $T(X) - E[\theta|X]$ and $E[\theta|X] - \theta$ are orthogonal in $L_2$, so that $E(T(X) - \theta)^2 - E(E[\theta|X] - \theta)^2 = E(T(X) - E[\theta|X])^2$. This implies that for estimating $\theta$ in our sample size $m$ component decision problem with an estimator $t$,

(3.1)  $R^m(t,G) - R^m(G) = E(t - t_G)^2$.
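The orthogonality cited above is a one-line conditioning argument, recorded here for completeness:

$E\{(T(X)-E[\theta|X])(E[\theta|X]-\theta)\} = E\{(T(X)-E[\theta|X])\, E[\,E[\theta|X]-\theta \mid X\,]\} = 0,$

so $E(T-\theta)^2 = E(T-E[\theta|X])^2 + E(E[\theta|X]-\theta)^2$, and (3.1) follows on taking $T = t$ and recalling that $t_G(Y)$ is a version of $E[\theta|Y]$.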
In dealing with risks concerning the estimation of $\theta_{n+1}$ in the empirical Bayes problem we will find it convenient to drop subscripts within the $(n+1)$st problem. We will let $P$ and $E$ denote probability and expectation taken over all random variables on which they operate, and $P_y$ and $E_y$ denote their conditional on $Y = y$ counterparts.

The following lemma is motivated by the approach of Robbins (1964) and proves useful in establishing the asymptotic optimality of specified sequences of decision rules when assumptions (A1⁻) and (A3) obtain.

Lemma 3.1. Suppose (A1⁻) and (A3) obtain. Then with $t_n^m$ defined for each $m$ to be a $y$-measurable decision rule in the sample size $m$ problem depending on $\underline{X}_1,\dots,\underline{X}_n$, and with $t_G^m$ Bayes with respect to $G$ in the sample size $m$ problem ($t_G^m$ is given in (2.9)),

(3.2)  $t_n^m(y) - t_G^m(y) \xrightarrow{P_y} 0$ for all $y \in I$, $1 \le m \le M$,

implies that the sequence $t_n = t_n^{m_{n+1}}$ is a.o. relative to all $G$.

Proof: Let $G$ be arbitrary but fixed. By applying (3.1) conditional on $\underline{X}_1,\dots,\underline{X}_n$ and then completing the expectation, we see that $0 \le R_n(t_n^m,G) - R^m(G) = E(t_n^m(Y) - t_G^m(Y))^2$. Condition (3.2) implies that

(3.3)  $t_n^m(Y) - t_G^m(Y) \xrightarrow{P} 0$ for each $m$, $1 \le m \le M$.

Under (A1⁻), the sequence in (3.3) is bounded, and for a bounded sequence convergence in probability implies convergence in $L_2$, so we see that

(3.4)  $R_n(t_n^m,G) - R^m(G) \to 0$ for each $m$, $1 \le m \le M$.

Since $1 \le m_{n+1} \le M < \infty$ for all $n$ under (A3), (3.4) implies that $D_n(G) \to 0$ (where $D_n(G)$ is defined in Definition 1.2), as was to be proved.

This lemma shows that in order to find a.o. rules under (A1⁻) and (A3) it suffices to be able to approximate $t_G^m$ as $n \to \infty$ for each $m$. Now under (A2), (2.15) obtains, i.e., for each $m$

(3.5)  $q_m(y) = \sum_k \gamma_k\, q(y+k)$, $y \in I$,

and from (2.9) the Bayes rule is provided by

(3.6)  $t_G^m(y) = \dfrac{q_m(y+1)}{q_m(y)}$, $y \in I$.

With $\bar q$ defined in (2.19),

(3.7)  $\bar q_m(y) \equiv \sum_k \gamma_k\, \bar q(y+k) \xrightarrow{P_y} q_m(y)$ for all $y$ and $m$.

Define for each $m$ and $n$

(3.8)  $t_n^m(y) = \dfrac{\bar q_m^+(y+1)}{\bar q_m^+(y)} \wedge B$, $y \in I$,

where $x^+$ denotes $\max(x,0)$.

Theorem 3.1. Under (A2) and (A3) and with $t_n = t_n^{m_{n+1}}$ where $t_n^m$ is given by (3.8), $\{t_n\}$ is a.o. relative to every $G$.

Proof: Let $G$ and $m$ be fixed, $1 \le m \le M$. By (3.7) and the fact that $q_m(y) > 0$, $y \in I$, we see that

(3.9)  $\dfrac{\bar q_m^+(y+1)}{\bar q_m^+(y)} \xrightarrow{P_y} \dfrac{q_m(y+1)}{q_m(y)}$, $y \in I$.

Since (A2) implies (A1⁻) (cf. Remark 2.1), we have $0 \le t_G^m(y) \le B$, $y \in I$, so that (3.9) implies

(3.10)  $t_n^m(y) \xrightarrow{P_y} t_G^m(y)$, $y \in I$,

and, hence, (3.2) is satisfied. An application of Lemma 3.1 completes the proof.

In the absence of (A2) the choice of a sequence $\{t_n\}$ which is a.o. relative to every $G$ is more complicated. However, in Section 2.1 we saw that under (A1⁻), for each $m$ and each $\varepsilon > 0$ there exists a polynomial $\sum_k \gamma_k \theta^k$ which approximates $z^{m-1}(\theta)$ to within $\varepsilon$ uniformly in $\theta \in \Theta$. For a determination of such a polynomial, we define

(3.11)  $\bar q_{m,\varepsilon}(y) = \sum_k \gamma_k\, \bar q(y+k)$, $y \in I$,

where $\bar q$ is given in (2.19). Reversing the order of summation in (3.11), we have

(3.12)  $\bar q_{m,\varepsilon}(y) = \dfrac{1}{n} \sum_{i=1}^n \sum_k \gamma_k\, {}_i q(y+k)$, $y \in I$,

where for each $i$,

(3.13)  $|\sum_k \gamma_k\, {}_i q(y+k)| \le \sum_k |\gamma_k|\, g^{-1}(y+k) \equiv \rho(\varepsilon,y)$

in view of our requirement ${}_i q(y) \le g^{-1}(y)$ in (2.19). Hence,

(3.14)  $\mathrm{Var}_y(\bar q_{m,\varepsilon}(y)) \le \dfrac{\rho^2(\varepsilon,y)}{n}$, $y \in I$, $1 \le m \le M$.

Since $E_y\, \bar q_{m,\varepsilon}(y) = q_{m,\varepsilon}(y)$, $y \in I$, where $q_{m,\varepsilon}$ is defined by (2.17), we have by (2.18) and (3.14) that

(3.15)  $E_y(\bar q_{m,\varepsilon}(y) - q_m(y))^2 \le \dfrac{\rho^2(\varepsilon,y)}{n} + \varepsilon^2 q^2(y)$, $y \in I$.

The bound $\rho(\varepsilon,y)$ defined in (3.13) is independent of $G$. With $\varepsilon \to 0$ there exist $n = n(y,\varepsilon)$ such that $n^{-1}\rho^2(\varepsilon,y) \to 0$. By inverting the function for each fixed $y$ we obtain a choice $\varepsilon = \varepsilon(y,n) \to 0$ with $n^{-1}\rho^2(\varepsilon,y) \to 0$. For such choices

(3.16)  $\bar q_{m,\varepsilon}(y) \xrightarrow{P_y} q_m(y)$, $y \in I$, $1 \le m \le M$.

Theorem 3.2. Under (A1⁻) and (A3) and with $t_n = t_n^{m_{n+1}}$, where

(3.17)  $t_n^m(y) = \dfrac{\bar q_{m,\varepsilon}^+(y+1)}{\bar q_{m,\varepsilon}^+(y)} \wedge B$, $y \in I$, $1 \le m \le M$,

with a choice $\varepsilon = \varepsilon(y,n)$ such that (3.16) obtains, $\{t_n\}$ is a.o. relative to every $G$.

Proof: Let $G$ and $m$ be fixed, $1 \le m \le M$. By (3.16) and the fact that $q_m(y) > 0$, $y \in I$, we see that

(3.18)  $\dfrac{\bar q_{m,\varepsilon}^+(y+1)}{\bar q_{m,\varepsilon}^+(y)} \xrightarrow{P_y} \dfrac{q_m(y+1)}{q_m(y)}$, $y \in I$.

Under (A1⁻), $0 \le t_G^m(y) \le B$, so (3.18) implies

(3.19)  $t_n^m(y) \xrightarrow{P_y} t_G^m(y)$, $y \in I$,

and, hence, (3.2) is satisfied. An application of Lemma 3.1 completes the proof.
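Under (A2) the construction of Theorem 3.1 is fully computable. Here is a sketch for the negative binomial family (2.7) with integer $r$, where $z(\theta) = (1-\theta)^r$ gives $\gamma_k = (-1)^k \binom{r(m-1)}{k}$ in (2.14); the code is illustrative, not from the thesis, and the helper names are ad hoc:

```python
from math import comb

def iq_nb(y, Y_i, m_i, r):
    """(2.23) for the negative binomial family (2.7): g_m(u) = C(m*r + u - 1, u)."""
    u = Y_i - y
    if u < 0:
        return 0.0
    g_m1 = comb((m_i - 1) * r + u - 1, u) if m_i > 1 else (u == 0)  # g_0(u) = [u = 0]
    return g_m1 / comb(m_i * r + Y_i - 1, Y_i)

def t_nm(y, data, m, r, B):
    """The rule (3.8): ratio of positive parts of q_bar_m, truncated at B,
    with q_bar_m built from (2.15) and gamma_k = (-1)**k * C(r(m-1), k).
    `data` is a list of (Y_i, m_i) pairs from the past problems."""
    q_bar = lambda v: sum(iq_nb(v, Y, mi, r) for Y, mi in data) / len(data)
    K = r * (m - 1)
    q_bar_m = lambda v: sum((-1) ** k * comb(K, k) * q_bar(v + k) for k in range(K + 1))
    num, den = max(q_bar_m(y + 1), 0.0), max(q_bar_m(y), 0.0)
    return min(num / den, B) if den > 0.0 else 0.0   # 0/0 convention of (2.9)
```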
Theorem 3.2 subsumes Theorem 3.1, for in case $z(\theta)$, $\theta \in \Theta$, is a polynomial a choice corresponding to $\varepsilon = 0$ exists. Theorem 3.1 was presented because of its simple proof and because of its significance in the motivation of Theorem 3.2.

§3.2 Estimation Under (A1) and (A2)

Let these two conditions hold throughout this section. The candidate for an a.o. empirical Bayes rule will be based on $t_n^m$ defined by (3.8). In the absence of (A3) we have found it necessary to examine in greater depth the conditional mean square error of estimation

(3.20)  $E_y(t_n^m(y) - t_G^m(y))^2$.

Define ${}_i q_m(y) \equiv \sum_k \gamma_k\, {}_i q(y+k)$ for each $i$, $m$, $y$, so that $\bar q_m = \frac{1}{n}\sum_{i=1}^n {}_i q_m$. Let $f'$ denote the function defined by $f'(y) = f(y+1)$, and fix $m$, $n$, and $y$. Temporarily suppressing the display of the dependence on $m$, $n$, and $y$ (e.g., $q_m(y) \equiv q$), we can write (3.20) as

(3.21)  $\int_0^{B - t_G} P[t > t_G + c]\, dc^2 + \int_0^{t_G} P[t < t_G - c]\, dc^2$.

Since $t = [(\bar q')^+/(\bar q)^+] \wedge B$, we see that

(3.22)  $[t > t_G + c] \le [\bar q \le 0] + [\bar q' - (t_G + c)\bar q > 0]$.

We can write

(3.23)  $P[\bar q' - (t_G + c)\bar q > 0] = P[\bar\omega > 0]$

where

(3.24)  $\bar\omega = \dfrac{1}{n}\sum_{i=1}^n {}_i\omega$

and

(3.25)  ${}_i\omega = {}_i q' - (t_G + c)\, {}_i q$, $i = 1,2,\dots$.

In preparation for bounding the tail probability (3.23), note that, since $t_G = q'/q$ and $E\, {}_i q = q$, we have

(3.26)  $E(\bar\omega) = -cq$.

From (3.25) and the bound for ${}_i q$ required in (2.19), we have for $c \in [0, B - t_G]$ and for each $i$ that

(3.27)  $|{}_i\omega| \le \sum_k |\gamma_k| \left(g^{-1}(y+1+k) + B\, g^{-1}(y+k)\right) \equiv \rho$.

By (2.3) of Theorem 1 of Hoeffding (1963) we have

(3.28)  $P[\bar\omega \ge 0] = P[\bar\omega - E(\bar\omega) \ge cq] \le \exp\{-2n[cq/(2\rho)]^2\}$

and hence (substituting $u = c^2$ in the integral)

(3.29)  $\int_0^{B - t_G} P[\bar\omega \ge 0]\, dc^2 \le \dfrac{2\rho^2}{nq^2}$.

A similar treatment of $P[\bar q \le 0]$ yields

(3.30)  $P[\bar q \le 0] \le \exp\{-2n[qB/(2\rho)]^2\}$

where use is made of the fact that $|{}_i q_m| \le \rho/B$. Combining (3.22), (3.29), and (3.30), we obtain

(3.31)  $\int_0^{B - t_G} P[t > t_G + c]\, dc^2 \le \dfrac{2\rho^2}{nq^2} + B^2 \exp\{-2n[qB/(2\rho)]^2\}$.

The same bound holds for the second term of (3.21), so that by (3.20) and (3.21)

(3.32)  $E_y(t_n^m(y) - t_G^m(y))^2 \le B_n(m,y)$

where, with $\rho(m,y)$ defined in (3.27), $B_n(m,y)$ denotes twice the bound in (3.31) with the dependence on $m$, $n$, and $y$ now displayed.

Lemma 3.2. Let $N$ be any function from $I$ to $I$ such that $N(M) \to \infty$ as $M \to \infty$. There exists a sequence $\{M_n\}$ independent of $G$ such that

(3.33)  $B_n(N) \equiv \bigvee_{m \le M_n} \bigvee_{y \le N(M_n)} B_n(m,y) \to 0$ as $n \to \infty$.

Proof: Let $N$ be fixed such that $N(M) \to \infty$ as $M \to \infty$. For each $M$, let $n = n(M)$ be any increasing sequence of integers independent of $G$ such that

(3.34)  $\bigvee_{m \le M} \bigvee_{y \le N(M)} B_n(m,y) \to 0$ as $M \to \infty$.

Inverting $n(M)$ to obtain $M(n)$ will allow a choice of a sequence $M_n = M(n)$ independent of $G$ such that (3.33) obtains. To see that such a sequence $n(M)$ independent of $G$ exists, note that $q_m(y) \ge z^m(B)\, \tilde\theta^y \mu$, where $\tilde\theta \in (0,B]$ is chosen with $\mu = \int_{\tilde\theta}^B G(d\theta) > 0$. Then

(3.35)  $\bigwedge_{m \le M} \bigwedge_{y \le N(M)} q_m(y) \ge [z(B) \wedge 1]^M\, [\tilde\theta \wedge 1]^{N(M)}\, \mu$.

With

(3.36)  $\rho_M \equiv \bigvee_{m \le M} \bigvee_{y \le N(M)} \rho(m,y)$,

we see by the definition of $B_n$ in (3.31)-(3.32) and by (3.35) and (3.36) that any choice $n = n(M)$ such that

(3.37)  $n^{-1}(M)\, [z(B) \wedge 1]^{-2M}\, [\tilde\theta \wedge 1]^{-2N(M)}\, \rho_M^2 \to 0$ as $M \to \infty$

will ensure that (3.34) obtains. Any choice

(3.38)  $n = n(M) = [z(B) \wedge 1]^{-2M}\, \rho_M^2\, \exp\{a N^b(M)\}$,

where $a$ and $b$ are constants, $a > 0$, $b > 1$, is independent of $G$ and guarantees (3.37) regardless of the value of $\tilde\theta$. Hence the proof of the lemma is complete.
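The role of the exponent $b > 1$ in (3.38) is worth isolating (this observation is mine, spelled out from the proof above): the only $G$-dependent factor in (3.37) that grows with $M$ is

$[\tilde\theta \wedge 1]^{-2N(M)} = \exp\{2N(M)\log(1/(\tilde\theta \wedge 1))\},$

which grows at most like $e^{cN(M)}$ for a constant $c$ depending on $G$, while $\exp\{a N^b(M)\}$ with $b > 1$ eventually dominates $e^{cN(M)}$ for every $c$. This is what lets a single sequence $n(M)$, chosen without knowledge of $G$, serve all $G$ simultaneously.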
We are ready to define a candidate for an a.o. rule. Let $L^m$ be decision rules such that $R(L^m,G) \to 0$ as $m \to \infty$ for every $G$. Such a choice is possible, as was seen in Section 2.3. Let $\{M_n\}$ be any sequence of positive integers $\to \infty$ as $n \to \infty$. For each $n$ define $t_n$ by

(3.39)  $t_n = L^{m_{n+1}}\, [m_{n+1} > M_n] + t_n^{m_{n+1}}\, [m_{n+1} \le M_n]$

where for each $m$, $t_n^m$ is defined in (3.8).

Theorem 3.3. Under (A1) and (A2), with $N$ any function satisfying the hypothesis of Lemma 3.2 such that $P_{B,k}[Y > N(k)] \to 0$ as $k \to \infty$, and with $t_n$ defined by (3.39) with $\{M_n\}$ chosen independent of $G$ so that (3.33) obtains, the rule $\{t_n\}$ is a.o. relative to every $G$.

Proof: Let $G$ be fixed. For $m_{n+1} > M_n$,

(3.40)  $0 \le D_n(G) \le R(L^{m_{n+1}},G) \to 0$ as $n \to \infty$

since $M_n \to \infty$. For $m = m_{n+1} \le M_n$, by applying (3.1) conditional on $\underline{X}_1,\dots,\underline{X}_n$ and then completing the expectation,

(3.41)  $0 \le D_n(G) = E(t_n^m(Y) - t_G^m(Y))^2$.

Since $(t_n^m(y) - t_G^m(y))^2 \le B^2$ for each $m$, $y$, and $n$, the right hand side of (3.41) is bounded (with $N_n \equiv N(M_n)$) by

(3.42)  $\sum_{y=0}^{N_n} E_y(t_n^m(y) - t_G^m(y))^2\, h_m(y) + B^2\, P[Y > N_n]$

which in turn is bounded by

(3.43)  $B_n(N) + B^2\, P_{B,M_n}[Y > N_n]$

from (3.32), the definition of $B_n(N)$ in (3.33), and Lemma 2.1. Since (3.33) obtains for $\{M_n\}$ and $N$, the first term in (3.43) $\to 0$, while the second term in (3.43) $\to 0$ by the definition of the function $N$, and the proof is complete.

This section is subsumed by the succeeding section in the same manner as Theorem 3.1 was subsumed by Theorem 3.2. However, we have included this section for its significance in motivating the development of a.o. rules in the succeeding section. An earlier construction used weaker bounds on the conditional mean square error of estimation and required the imposition of an assumption

(A1⁺)  $\Theta \subset [\alpha, B]$ where $\alpha > 0$, $B \in \Omega$,

in order to determine the choice $\{M_n\}$ independent of $G$. Professor James Hannan observed that an application of Hoeffding's Theorem 1 yielded a bound from which a construction could be accomplished under (A1).

§3.3 Estimation Under (A1)

Let (A1) hold throughout this section. The candidate for an a.o. empirical Bayes rule will be based on $t_n^m$ defined in (3.17). For $m \ge 1$ and $\varepsilon \ge 0$, define

(3.44)  $t_{G,\varepsilon}^m = \dfrac{q'_{m,\varepsilon}}{q_{m,\varepsilon}}$

where $q_{m,\varepsilon}$ is defined by (2.17) and $f'(y) = f(y+1)$ for any function $f$. For each $m$ and $\varepsilon$,

(3.45)  $|t_{G,\varepsilon}^m - t_G^m| \le \dfrac{1}{q_{m,\varepsilon}} \{|q'_{m,\varepsilon} - q'_m| + t_G^m\, |q_m - q_{m,\varepsilon}|\}$.

Fixing $m$ and taking $2\varepsilon < z^{m-1}(B)$ fixed,

(3.46)  $q_m \ge z^{m-1}(B)\, q$

and from (3.46), (2.18), and the choice of $\varepsilon$,

(3.47)  $q_{m,\varepsilon} \ge q_m - \varepsilon q \ge \tfrac12 z^{m-1}(B)\, q$.

Hence from (2.18), (3.47), and the fact that $q' \le Bq$, the right hand side of (3.45) is bounded by

(3.48)  $4B\varepsilon\, [z^{m-1}(B)]^{-1}$.

For the choice of $\varepsilon$, we see

(3.49)  $0 < t_{G,\varepsilon}^m \le 3B$

where use is also made of (3.47), (3.48), and the fact that $0 \le t_G^m \le B$ under (A1). Now define

(3.50)  $T_{n,\varepsilon}^m = \dfrac{(\bar q'_{m,\varepsilon})^+}{(\bar q_{m,\varepsilon})^+} \wedge 3B$

with $\bar q_{m,\varepsilon}$ defined in (3.11), and note that

(3.51)  $(t_n^m - t_{G,\varepsilon}^m)^2 \le (T_{n,\varepsilon}^m - t_{G,\varepsilon}^m)^2$

where $t_n^m$ is defined by (3.17). Following the same procedure leading to (3.32), we have that for each $m$, $y$, $n$, and $\varepsilon < z^{m-1}(B)/2$,

(3.52)  $E_y(T_{n,\varepsilon}^m(y) - t_{G,\varepsilon}^m(y))^2 \le B_n^*(m,y,\varepsilon) \equiv \dfrac{4\rho^2(m,y,\varepsilon)}{n\, q^2_{m,\varepsilon}(y)} + 18B^2 \exp\left\{-\dfrac{9nB^2 q^2_{m,\varepsilon}(y)}{2\rho^2(m,y,\varepsilon)}\right\}$

where

(3.53)  $\rho(m,y,\varepsilon) \equiv \sum_k |\gamma_k| \left(g^{-1}(y+1+k) + 3B\, g^{-1}(y+k)\right)$.

From (3.51) and (3.52), we have

(3.54)  $E_y(t_n^m(y) - t_{G,\varepsilon}^m(y))^2 \le B_n^*(m,y,\varepsilon)$

for each $m$, $n$, $y$, and $\varepsilon < z^{m-1}(B)/2$.
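For the Poisson family the smallness condition on $\varepsilon$ is explicit: $z(\theta) = e^{-\theta}$, so

$z^{m-1}(B) = e^{-(m-1)B}, \qquad \varepsilon < \tfrac12 e^{-(m-1)B},$

and condition (3.56) of Lemma 3.3 below reads $\varepsilon_n\, e^{(M_n-1)B} \to 0$, i.e., $\varepsilon_n$ must shrink exponentially fast in $M_n$.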
Again such a choice of n(M) independent of G is possible. Without loss of generality, g(M) s k( A 2m-1(9)) mSM so that from (3.47), qm 2 % Zm-1(B)q 2 3 zm(9)p.y where .804) p. - {£9 G(d9). Then (3.58) A A qm (M) (y) 2 {2(9) A 1]”[9 A 1]N(M) . m4! yam) " With (3059) DM 5 V V 901101600): mSMIy£N(M) we see as in the proof of Lemma 3.2 that any choice -2M * 2 b (3-60) In I 1900 I [2(8) A 1] (9M) exp{aN 04)} where a and b are constants, a > 0, b > 1, is independent of G and guarantees (3.57). Hence the proof is complete. Define a candidate for an a.o. rule by letting Lm be decision rules such that R(Lm,G) I O as m.» o for every G. Let {Mn} be any sequence of positive integers I m as n I m and let {en} be any null sequence. For each n, define tn by m m = n+1 n+ (3.61) tn L [mn+1 2 Mn] + tn l[mn+1 5 Mn] where for each m and e, t: is defined by (3.17). 36 Theorem 3.4. Under (A1), with N any function satisfying the hypotheses of Lemma 3.3 such that PB REY >'N(k)] I O as k I m, and with tn defined by (3.61) with {Mn} and {an} chosen independent of G so that (3.55) and (3.56) obtain, the rule {tn} is a.o. relative to every G. Proof: Let G be fixed. For mm”1 >’Mn, (3.40) holds since M I»m. For m = m 5 Mn’ (3.41) holds and its right hand side n n+1 is bounded by N“ m m (3.62) 2 z Ey(tn(y) - tG (y))2h (y) +- y=0 3 6“ m 2 2 e (y) - t:(y)) hm(y) + B PB,Mn[Y >Nn] with Nn = N(Mn) by use of the cr-inequality (cf. Loéve (1963, 2 p. 155)), the fact that (t: - t3) 5 a , and Lemma 2.1. The third term of (3.62) I O as n I m by the choice of N. Without loss of generality, an s k A zm-1(B) so that from (3.54) and (3.48) mSM the first two terms of (3362) are bounded by (3.63) 23: ) 40 in view of our requirement that 1q 5 3-1 in (2.19). Noting m -01 that Ey(an,e(y)) aG,e(y), y E I, n 2 l, we have from (4.10) and (4.11) 2 , 2 m - m 2 .e_i1d2stl (4 1 ) Ey(an e(y) aG,e(y)) s n . y E I. n 2 1 . Since m m “an,e - 86‘ 2 ‘a2|] s [‘a:,e - a: 3| 2 k‘ag‘] + m m m “an,e - ac‘ 2 k‘ac‘], n 2 l, the summand on the right hand side of (4.1) with a: B a: e is bounded for each n,m,g, and y by (4.13) |a§) 2 r\a§(y>\] +2%WH%3®)-%UH. In view of (4.7), (4.8) and the fact that gmq s.[zm-1(s)]-1hm, the second term of (4.13) is bounded by (4.14) 2(e+c>e[zm'1]'1hm . Using a Markov bound on the first term of (4.13), this term is bounded by (4.15) zgme[z (8)] . y=0 The bound (4.17) motivates the following lemma. Lemma 4.1. With N any function from I to I, there exist sequences Mn and an independent of G such that mm; (mm) %m>sf5v 2 %owuwmw~o a new tmfifl y=0 n and ‘k - .. (mm) %EeJA rmhm>1~oesn-.. mil“!n Proof: For each M, let g(M) be any null sequence such that 6(M)( A zm’1(B))-1.I O as M I m. Since p(M,N) a (M) V Z gm(y)p(e(M),m,y) is independent of G, n = nOM) can be “1‘“ Y'0 J! chosen independent of G such that n p(M,N) I 0 as M Ion. Inverting n(M) to obtain M(n) allows the choices of Mn ' M(n) and en ' g(Mn) independent of G such that (4.18) and (4.19) obtain. Now let Lm be any decision rules such that R(Lm,G) _. 0 as m I m for every G. Such a choice is possible as was seen in Section 2.3. Then for any sequences {Mn} and {an}, define m m a n+ n+ (4.20) 6n L ltmn+1 > Mn] + 6n ltmn+l s Mn] where, for each m and n, 6:. is defined by (4.2) with m tn (4.21) an(y) an e (y), y 6 1,, m,n 2 l , ’ n 42 where a: e is defined by (4.9). 9 Theorem 4.2. 
CHAPTER IV
TESTING AND FINAL REMARKS

§4.2 Testing Under (A1)

Here $\alpha_{G,\varepsilon}^m(y) = q_{m,\varepsilon}(y+1) - c\, q_{m,\varepsilon}(y)$ (cf. (2.12) and (2.17)) and $a_{n,\varepsilon}^m$ is the corresponding estimate (4.9) built from $\bar q_{m,\varepsilon}$ of (3.11). The variance bound (4.11) holds in view of our requirement that ${}_i q(y) \le g^{-1}(y)$ in (2.19). Noting that $E_y(a_{n,\varepsilon}^m(y)) = \alpha_{G,\varepsilon}^m(y)$, $y \in I$, $n \ge 1$, we have from (4.10) and (4.11)

(4.12)  $E_y(a_{n,\varepsilon}^m(y) - \alpha_{G,\varepsilon}^m(y))^2 \le \dfrac{\rho^2(m,y,\varepsilon)}{n}$, $y \in I$, $n \ge 1$.

Since

$[|a_{n,\varepsilon}^m - \alpha_G^m| \ge |\alpha_G^m|] \le [|a_{n,\varepsilon}^m - \alpha_{G,\varepsilon}^m| \ge \tfrac12 |\alpha_G^m|] + [|\alpha_{G,\varepsilon}^m - \alpha_G^m| \ge \tfrac12 |\alpha_G^m|]$, $n \ge 1$,

the summand on the right hand side of (4.1) with $a_n^m = a_{n,\varepsilon}^m$ is bounded for each $n$, $m$, $\varepsilon$, and $y$ by

(4.13)  $g_m(y)\, |\alpha_G^m(y)|\, P_y[|a_{n,\varepsilon}^m(y) - \alpha_{G,\varepsilon}^m(y)| \ge \tfrac12 |\alpha_G^m(y)|] + 2\, g_m(y)\, |\alpha_{G,\varepsilon}^m(y) - \alpha_G^m(y)|$.

In view of (4.7), (4.8), and the fact that $g_m q \le [z^{m-1}(B)]^{-1} h_m$, the second term of (4.13) is bounded by

(4.14)  $2(B+c)\, \varepsilon\, [z^{m-1}(B)]^{-1}\, h_m(y)$.

Using a Markov bound on the first term of (4.13), this term is bounded by

(4.15)  $2\, g_m(y)\, \rho(m,y,\varepsilon)\, n^{-1/2}$.

Summing over $y \le N$ and bounding the terms with $y > N$ via $g_m(y)\, |\alpha_G^m(y)| \le (B+c)\, h_m(y)$, we have for any $N \in I$

(4.17)  $0 \le D_n(G) \le 2b\, n^{-1/2} \sum_{y=0}^{N} g_m(y)\, \rho(m,y,\varepsilon) + 2b(B+c)\, \varepsilon\, [z^{m-1}(B)]^{-1} + b(B+c)\, P[Y > N]$.

The bound (4.17) motivates the following lemma.

Lemma 4.1. With $N$ any function from $I$ to $I$, there exist sequences $\{M_n\}$ and $\{\varepsilon_n\}$ independent of $G$ such that

(4.18)  $\rho_n(N) \equiv n^{-1/2} \bigvee_{m \le M_n} \sum_{y=0}^{N(M_n)} g_m(y)\, \rho(m,y,\varepsilon_n) \to 0$ as $n \to \infty$

and

(4.19)  $\rho_n^* \equiv \varepsilon_n \left(\bigwedge_{m \le M_n} z^{m-1}(B)\right)^{-1} \to 0$ as $n \to \infty$.

Proof: For each $M$, let $\varepsilon(M)$ be any null sequence such that $\varepsilon(M)(\bigwedge_{m \le M} z^{m-1}(B))^{-1} \to 0$ as $M \to \infty$. Since $\rho(M,N) \equiv \bigvee_{m \le M} \sum_{y=0}^{N(M)} g_m(y)\, \rho(m,y,\varepsilon(M))$ is independent of $G$, $n = n(M)$ can be chosen independent of $G$ such that $n^{-1/2}\rho(M,N) \to 0$ as $M \to \infty$. Inverting $n(M)$ to obtain $M(n)$ allows the choices of $M_n = M(n)$ and $\varepsilon_n = \varepsilon(M_n)$ independent of $G$ such that (4.18) and (4.19) obtain.

Now let $L^m$ be any decision rules such that $R(L^m,G) \to 0$ as $m \to \infty$ for every $G$. Such a choice is possible, as was seen in Section 2.3. Then for any sequences $\{M_n\}$ and $\{\varepsilon_n\}$, define

(4.20)  $\delta_n = L^{m_{n+1}}\, [m_{n+1} > M_n] + \delta_n^{m_{n+1}}\, [m_{n+1} \le M_n]$

where, for each $m$ and $n$, $\delta_n^m$ is defined by (4.2) with

(4.21)  $a_n^m(y) = a_{n,\varepsilon_n}^m(y)$, $y \in I$, $m,n \ge 1$,

where $a_{n,\varepsilon}^m$ is defined by (4.9).

Theorem 4.2. Under (A1) and with $N$ a function from $I$ to $I$ defined such that

(4.22)  $P_{B,k}[Y > N(k)] \to 0$ as $k \to \infty$

and with $\delta_n$ defined by (4.20) with $\{M_n\}$ and $\{\varepsilon_n\}$ chosen independent of $G$ such that (4.18) and (4.19) obtain, the rule $\{\delta_n\}$ is a.o. relative to every $G$.

Proof: Let $G$ be fixed. For $m_{n+1} > M_n$,

(4.23)  $0 \le D_n(G) \le R(L^{m_{n+1}},G) \to 0$ as $n \to \infty$

since $M_n \to \infty$. For $m = m_{n+1} \le M_n$, $0 \le D_n(G)$ is bounded by the right hand side of (4.1). In light of (4.17), the fact that $g_m(y)\, |\alpha_G^m(y)| \le (B+c)\, h_m(y)$ for each $m$ and $y$, and Lemma 2.1, the right hand side of (4.1) is bounded by

(4.24)  $2b\, \rho_n(N) + 2b(B+c)\, \rho_n^* + b(B+c)\, P_{B,M_n}[Y > N(M_n)]$

which $\to 0$ as $n \to \infty$ from (4.18), (4.19), and (4.22) by the choice of $N$, $\{M_n\}$, and $\{\varepsilon_n\}$. Hence the proof is complete.
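The testing rule of Theorem 4.2 reuses the same estimated $\bar q_{m,\varepsilon}$: on $[m_{n+1} \le M_n]$ it takes action $a_1$ exactly when the estimated version of $\alpha_G^m$ in (2.12) is nonpositive. A Python sketch (mine), assuming the `q_bar_m_eps` helper from the previous code block and assuming $a_{n,\varepsilon}^m(y) = \bar q_{m,\varepsilon}(y+1) - c\,\bar q_{m,\varepsilon}(y)$, the natural plug-in form of (2.12):

```python
def delta_n(y, data, m, c, K):
    """Test of Theorem 4.2 on [m_{n+1} <= M_n]: choose a_1 ("decide theta <= c")
    iff the plug-in version of alpha_G in (2.12) is <= 0."""
    a = q_bar_m_eps(y + 1, data, m, K) - c * q_bar_m_eps(y, data, m, K)
    return "a1" if a <= 0 else "a2"
```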
§4.3 Final Remarks

The rules presented in Chapters III and IV have several competitors. The first competitor which we shall discuss arises in the following manner. Suppose for each $m$ that, in an empirical Bayes problem involving repetitions of a sample size $m$ component problem, there is a sequence $\{\Psi_k^m\}$ of rules which is a.o. relative to every $G$. One might then consider for use in the corresponding varying sample size empirical Bayes problem a procedure $\{\varphi_n\}$ which partitions the problems into those involving a common sample size and uses the appropriate $\{\Psi_k^m\}$ within each class, i.e.,

(4.25)  $\varphi_n = \Psi_{k(n,m)}^m$ on $[m_{n+1} = m]$, where $k(n,m) = \sum_{i=1}^n [m_i = m]$.

Such a rule was suggested by Professor Hannan as a first thought as to what could be done in the varying sample size empirical Bayes problem. Under (A3), $\{\varphi_n\}$ will be a.o. relative to every $G$. However, such a rule does not use all of the past data at each stage, and so we could say that it does not use all of the available "information about $G$". So, intuitively at least, when dealing with the particular component problem of Chapter II, our method seems better in the sense that it uses all of the past data available. In the absence of (A3) the rule (4.25) would not be a.o., since a new sample size can then appear infinitely often. Professor Hannan has also suggested more sophisticated estimators of $q_m$ based on averaging across $m$-tuples within $\underline{X}_i$, $m_i \ge m$, which use more of the available data.

One can use $\{\varphi_n\}$ to obtain $\varepsilon$-asymptotic optimality relative to every $G$ by employing the device used in Sections 3.2, 3.3, and 4.2. For if $L^m$ is a decision rule in the sample size $m$ problem such that $R(L^m,G) \le \varepsilon$ if $m \ge M$ for every $G$, then

$\varphi_n^* = \varphi_n\, [m_{n+1} \le M] + L^{m_{n+1}}\, [m_{n+1} > M]$

will result in $\limsup_n D_n(G) \le \varepsilon$ for every $G$. It is possible that we might obtain asymptotic optimality by replacing $M$ by a proper choice of $M_n \to \infty$, but this idea has not been fully explored.

The second competitor that we look at is one which would arise from the first track discussed in Section 1.3. We will discuss this competitor in the context of the special component problem of Chapter II. For an estimator $\hat G_n$ based on $\underline{X}_1,\dots,\underline{X}_n$ and taking values in the set of probability distributions on $[0,B]$, let

(4.26)  $\hat q_m(y) \equiv \int \theta^y z^m(\theta)\, \hat G_n(d\theta)$.

Since $\theta^y z^m(\theta)$ is bounded and continuous in $\theta \in \Theta$, the Helly-Bray Lemma (cf. Loève (1963, p. 180)) implies that $\hat q_m(y) \to q_m(y)$ a.s. for each $y \in I$ if $\hat G_n \to G$ in distribution a.s. Tucker (1963), Rolph (1968), and Meeden (1972) have demonstrated the existence of such estimators $\hat G_n$ in the case of identical sample sizes. We saw in the proofs of Theorems 3.1, 3.2, and 4.1 that rules based on consistent estimators of $q_m$ can easily be shown to be a.o. relative to every $G$ under (A3). Again, in the absence of (A3), the rules based on $\hat G_n$ might possibly be extended to a.o. rules by choice of rules $L^m$ and a sequence $\{M_n\}$.

The assumption that the parameter space $\Theta$ is bounded is a stronger assumption than what is needed to prove asymptotic optimality in the particular case of the component problem of Chapter II where the sample sizes are identical in the empirical Bayes problem. Macky (1966) and Hannan and Macky (1971) have demonstrated a.o. procedures for estimation when $\Theta = [0,\infty)$, while Robbins (1963) and Samuel (1963) have demonstrated a.o. procedures for testing when $\Theta \subset [0,\infty)$ relative to every $G$ for which $\int \theta\, G(d\theta) < \infty$. So one would hope that in the future, methods for establishing a.o. procedures in the varying sample size empirical Bayes problem could be found without restricting $\Theta$ to a bounded set.

BIBLIOGRAPHY

Ferguson, Thomas S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York and London.

Hannan, James and Macky, David W. (1971). Empirical Bayes squared error loss estimation of unbounded functionals in exponential families. Unpublished.

Hoeffding, Wassily (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley & Sons, New York.

Loève, Michel (1963). Probability Theory. Third Edition. Van Nostrand, Princeton.

Johns, M. V., Jr. (1957). Non-parametric empirical Bayes procedures. Ann. Math. Statist. 28, 649-669.

Johns, M. V., Jr. and Van Ryzin, J. (1971). Convergence rates for empirical Bayes two-action problems I. Discrete case. Ann. Math. Statist. 42, 1521-1539.

Macky, David W. (1966). Empirical Bayes estimation in an exponential family. RM-176, Department of Statistics and Probability, Michigan State University.

Maritz, J. S. (1970). Empirical Bayes Methods. Methuen and Co. Ltd., London.

Meeden, Glen (1972). Bayes estimation of the mixing distribution, the discrete case. Presented at the 133rd Meeting of the Institute of Mathematical Statistics, Ames, Iowa, April 1972.

Neyman, J. (1962). Two breakthroughs in the theory of statistical decision making. Rev. Inst. Internat. Statist. 30, 11-27.

Robbins, Herbert (1956). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, University of California Press, 157-163.

Robbins, Herbert (1963). The empirical Bayes approach to testing statistical hypotheses. Rev. Inst. Internat. Statist. 31, 195-208.

Robbins, Herbert (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.

Rolph, John E. (1968). Bayesian estimation of mixing distributions. Ann. Math. Statist. 39, 1289-1302.

Samuel, Ester (1963). An empirical Bayes approach to the testing of certain parametric hypotheses. Ann. Math. Statist. 34, 1370-1385.

Tucker, Howard G. (1963). An estimate of the compounding distribution of a compound Poisson distribution. Theor. Prob. Appl. 8, 195-200.