ABSTRACT CONTRIBUTIONS T0 COMPOUND DECISION THEORY AND EMPIRICAL BAYES SQUARED ERROR LOSS ESTIMATION By Richard John Fox Let = (x1,x2,...,xn) be a set of n independent x —n random variables where for i = l,2,...,n, xi ~ P9 and 6i E O, i a real interval. Consider the n testing problems corresponding to 9=(6 , ..,9 ), each having the structure of the following —n l n component problem, H: 9 S a, K: 9 > a, a in the interior of O and letting L denote loss, L(H,9) = (6-a)+, L(K,9) = (S-a)". A compound procedure inQEn) IS a sequence (¢1(§n),... ,Wn(§n)) of x -measurable test functions where 1li,(x ) is to be used -n 1 -'n for testing 91' The compound risk of in is the average of 1n the individual risks, Rn(§,_‘{“) = n 21:1 R(_e_ ,qti),where _e_ = (91,92,...) 6 0“. Define the modified regret of in’ denoted by Dn(_Q,}IF_n), by Dn(-e-’ln) = Rn(_9_,y_n) - Mon), where R(Gn) is the Bayes risk verSus Gn’ the empiric distribution of fin, in the component problem. For both discrete and continuous exponential families, compound testing procedures are presented whose modified regrets converge to zero as n -’ °°. Consider the problem of estimating Gn’ based on §n° Let the class of distributions be the uniform on the interval (0,9) , 9 > 0, family. An estimator is presented whose Levy .—r- 1": - Richard John Fox diStance from Gn’ for a certain class of _9_'s, almost surely converges to zero as n -+ 0°. Estimators are presented possessing this property for all _9_'s, when the family is the uniform on the interval [9,9+l), 9 E (-°°,°°), distributions. For these same two families, it is shown that if the 91's are i.i.d. ~ G, the same estimators converge in Levy metric to G. Let x have distribution function F 9 E O, a subset of 6’ the reals, where 9 is a random variable possessing distribution function G. Let xl,x2,... be a sequence of random variables i.i.d. according to the marginal distribution on x. Based on x1,x2,. .. ,xn , we estimate the conditional mean of 6 given x and show that the risk, assuming squared error loss, of using this estimate of 9 converges to Bayes risk for three different families of distributions, namely the two uniform families pre- vious 1y discussed and a certain family of Gamma distributions. No assumptions are made concerning G in the uniform [9,9+1) case and in the other two cases we assume I 92dG(e) < 6°. Consider the estimation problem discussed immediately above when 9 indexes an exponential family on the non-negative integers. Assuming a bounded parameter space, sufficient conditions are pre- sented for obtaining a rate of n45 of convergence to Bayes risk. Finally, this same problem is considered in the context of a bivariate exponential family where one component of the two- dimensional parameter indexing the family is to be estimated. An estimator is displayed whose risk, under a set of assumptions, con- verges to Bayes risk. CONTRIBUTIONS TO COMPOUND DECISION THEORY AND EMPIRICAL BAYES SQUARED ERROR LOSS ESTIMATION By Richard John Fox A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1968 4541/6, 2:, ’2’ ‘w ---. - I. ’3’ ’4’ x " v t ‘ 2. / 1’ ' ‘1'” / - J W. flt‘ ' I" f. u'. To my parents ii _. -_. .\p . _ ACKNOWLEDGMENTS I wish to express my sincere appreciation to Professor J. Hannan for his patience and guidance throughout the pre- paration of this manuscript. His suggestions and comments were of great value in obtaining the results of this thesis. 
I also wish to thank Professor D.C. Gilliland for his help in the final review. Finally, I wish to express my gratitude to the National Science Foundation which provided the financial support for this investigation through Grant GP 7362. iii TABLE OF CONTENTS Chapter Page I. INTRODUCTION .................................... 1 II. COMPOUND TESTING IN EXPONENTIAL FAMILIES ........ 4 2.1 General Remarks ............................ 4 2.2 Discrete Case .............................. 8 2.3 Continuous Case ............................ 20 III. ESTIMATING THE EMPIRICAL DISTRIBUTION OF A PARAMETER SEQUENCE IN COMPOUND DECISION PROBLEMS ........................................ 27 3.1 Introduction ............................... 27 3.2 Uniform (0,9) Case ....... . ............... 29 3.3 Uniform [9,9+1) Case ..................... 33 3.4 Estimating the Prior Distribution .......... 37 IV. SOME EMPIRICAL BAYES SOLUTIONS .................. 42 4.1 Introduction ..... . ......................... 42 4.2 Uniform (0,9) Case ....................... 43 4.3 Uniform [9,9+1) Case ......... . ........... 52 4.4 Estimation of a Location Parameter in Certain Gamma Distributions ................ 54 j V. EMPIRICAL BAYES ESTIMATION IN EXPONENTIAL {1 EAMILIES .. ........................... . .......... 62 j‘ 62 Q‘ 5.1 A Rate for the Discrete Case ............... if 5.2 Estimation in the Presence of a Nuisance ‘ Parameter .................................. 7O BIBLIOGRAPHY .................................... 79 iv CHAPTER I INTRODUCTION The problems considered in this thesis fall into the following three categories: Compound Decision Theory, Estimation of Distribution Functions and Empirical Bayes Estimation. In Chapter II, we consider a compound decision problem, where the component problem is a test on the parameter of an exponential family. For both discrete and con- tinuous cases, compound testing procedures which possess a certain desirable asymptotic property are displayed. Consider a compound decision problem where the underlying family of distributions is either uniform on the interval (0,0), 9 E (0,”) or uniform on the interval [9,6+l), 9 E (-°,m). Let Gn denote the empiric distribution of the n parameters corresponding to n observations. In Chapter III, for both families, estimators are pre- sented whose Lévy distancesfrom Gn almost Surely converge to zero as n fl w. For these same two families, we obtain as corollaries that if the 6's are i.i.d. random variables with common distribution function G, these same estimators converge in Lévy metric to G. Robbins (1964) presents a minimum distance technique in a general context for obtaining Levy-convergent estimators of the prior dis- tribution function G. In Chapter IV we deal with the Empirical Bayes Quadratic Loss Estimation Problem, which is treated for certain exponential families by Robbins (1964). Suppose x has a distribution depending on a random variable 6 possessing distribution function G and that the value of 9 is to be estimated. Further suppose that this PTOblem has occurred n times in the past. For three families, namely'the two families considered in Chapter III and a certain family of Gamma distributions, estimators of the conditional mean of 6 given x, based on the past n observations, are presented whose risk converges to the Bayes risk versus G, i.e. the estimators are asymptotically optimal. The class of prior distributions G, for which this result holds varies with the family. 
In Chapter V,two more empirical Bayes quadratic loss estimation results are presented. Macky (1966) displays an asymptotically optimal procedure for a family of exponential distributions on the non-negative integers:under the assumption that the prior dis- tribution possesses a second moment. Under the assumption that the parameter Space is bounded, we present sufficient conditions for obtaining a rate of convergence to Bayes risk for this procedure. Finally, asymptotically optimal estimators are presented for one component of a two-dimensional parameter indexing a bivariate exponential family. We now make some remarks concerning notation. If A is an event, [A] will sometimes be used to denote the indicator function of A. For any distribution function, say F, the letter F also represents the corresponding Lebesgue-Stieltjes measure and F]: = F(b) - F(a). We adopt the convention that distribution functions are right continuous. For any measure u and a function f, u(f) will occasionally ' In H 3 be used to denote If du. The abbreviation i.o. stands for infinitely ofterL. For any function of a real variable, say g, g' and g" denote its first and second derivatives. If A is aset, A' denotes its complement. Finally, Q stands for the standard normal dis- tribution function. We now make some remarks concerning a certain type of three- point distribution which occurs frequently throughout this paper. Let the random variable x take on the three values -v, 0, w, where v and w are positive, with corresponding probabilities q, 1-q-p,p. By direct calculation, letting V denote variance, 2 2 (1.1) V(x) = v q(l-q) + 2v w q p + w p(l-p). The following lemma is due to Gilliland and Hannan (1968). Lemma 1.1. If x is a random variable assuming the three values -v,(Lw, where both v and w are positive, with corresponding probabilities q,1-p-q,p, then letting the range of x be denoted by r = v + w and 02 = V(x), NIHN IA #3 Ir—a + 'O [H Q CHAPTER II COMPOUND TESTING IN EXPONENTIAL FAMILIES 1. General Remarks. Let 65;?) be the measurable space consisting of the Borel field on the real line and let «9 = {PQIS E O}, 0 being a real interval, be a family of probability measures on (3,50. Suppose that 9<< p. and that dP Jgp du 9' Consider the following test of hypothesis problem which we call the component problem. Based on an observation of a random variable x, distributed according to P 6 G O, we test: 9’ H: 9 S a versus K: 9 > a, where a is in the interior of 0. The loss function L is defined as follows: + L(H)e) = (e ' a) : L(K,9) = (9 - a)’. = ... ' d Let ‘En (x1,x2, ,xn) be a sequence of n independent ran om variables with x1 distributed according to P6 and Bi 6 0 for i "ll\\m a V . (~ ~‘ ‘ _-‘ “.M— i ‘ 1,2,...,n. Abbreviate P6 to Pi’ p9 to pi and “2:1P1 i to P i _11. Let 2n = (91,62,...,0n). We consider the compound dec1sion problem consisting of the n testing problems corresponding to 9 3 where the hypotheses are as in the component problem. We adopt the convention that the value of a test function is the probability of accepting H. By a compound testing procedure is meant a sequence, 1n(§n) = (¢1(§n),¢2(xn),...,¢n(xn)) of x“- measurable test functions, where W is the test function for testing the hypothesis concerning Si. Also, for any sequence 2, 0% let 2 denote the corresponding product measure on (I ,5 ). 
For any sequence 2, we define the compound risk of in, denoted by Rn(§,1n), as follows: -1 n + - (2.1) Rn(9-’1n) = n ifltznflixei-a) + (1-gn(wi))(ei-a) 3, i.e., Rn(§,1n) is the average of the risks for the n problems. Let Gn be the empiric distribution function of g“. If we restrict in to simple symmetric compound procedures, i.e., ti(§n) = V(xi) where W is some test function for the component problem, then the minimum of Rn(§,1n) over such procedures is R(Gn)’ the Bayes risk against Gn in the component problem. De- fine the modified regret of a procedure in, denoted by Dn(§’in)’ as follows: (2.2) 13,52,111) = 1352.1“) - R(Gn). Hence, a procedure whose modified regret tends to zero, asymptotically 6 is as good in terms of average risk as the best simple symmetric procedure. Let ¢G be the Bayes test versus Gn in the component problem whichnchooses H if and only if Px(9) < a, where Px(9) is the conditional expectation of 9 given x when the joint dis- tribution on the pair (9,x) results from Gn on 9 and P6 on x. Thus, (2.3) «:6 (x) = [13(6) < a]. n Let QC be the simple symmetric compound procedure which uses n $6 at each stage. n We now derive a useful expression for the modified regret of a compound procedure in. Since R(Gn) = Rn(§,QGn), by (2.1) and (2-2), -1 n (2.4) ”SM-n) = n .2 «fauna, - e6 (xi)). 1=1 n For each i in the right hand side of (2.4), interchange the order of integration so that 2“ becomes Pi(n;#i Pj). Then, converting the Pi-integrals to u-integrals with the variable of integration changed from xi to x, replac1ng U?#i P. by En and interchang- J ing summation and integration, we obtain from (2.4) Dn@,ln) = “(1:1) , (2.5) _ -1 “ In(x) - n i§1(ei-a)pi(X)[£n(wi(xl’..-’xi-1,x,xi+1’...,xn))—¢Gn(X)]. In this chapter u will either be Lebesgue measure or counting measure on the non-negative integers. ‘9 ‘will be an exponential family specified by the following density with respect to u: (2.6) pecx) = excm, where 6 6 fl, 0 being an interval subset of the non-negative reals and m is a positive function. In both the discrete and continuous cases we shall exhibit compound testing procedures whose modified regrets tend to zero as n increases. .L; For any real-valued function, say g, on the‘Q-support of I, define the linear functional T by = aorta sis). (2'7) T(g(x)) m(x+1) ’ a m(x) ° Let -1 -1 (2.8) |Tl(x) = (m(x+1)) + a(m(x)) . Throughout the remainder of this chapter we will occasionally abbreviate expressions involving functions of x by omitting the —- -1 display of the argument x. Define p = n 22:1 i Since by (2.7) and the definition of '5, _. -1 n x (2.9) T(p(x)) = n E (9. - a)9. C(G.) i=1 1 1 i 'and since pe(x) being defined by (2.6) implies that Px(6) < a iff 13660) < 0, by (2-3)’ (2.10) ¢G = [T(E) < o]. n * Let F be the empiric distribution function of x = (x ,x ,...,x ) ‘11 l 2 n * d l b h ' ' d’ ' ' ’ ... ,..., an et Fi e t e empiric istribution function of (XI, ,xi_1,x1+1 xn) * - multiplied by (n-l)/n, i.e. Fi(x) = n 1 2?#i [xj S x]. 2. Discrete Case, In this case p is counting measure on the non-negative integers and the family considered is specified by density (2.6). Our results will be obtained under the following two assumptions: 0A1) n = [0,5], 5 < m. (A2) 0=[d,B],0 0,u(9xm(x)) < a}, Johns dis- 52 Plays a procedure whose modified regret is of order n- uniformly in ‘9. The statistic used in the testing procedure involves artificial randomization. Under (A2) and the same assumption on B, ~Johns points out that randomization is not necessary. 
In the following, a non-randomized compound testing procedure Also is given whose modified regret under (Al) converges to zero for each 9' and under (A2) converges to zero uniformly in .9. The method of proof differs very much from the technique of Johns. Also, an example is given which shows that unless further as- sumptions are made, no rate of convergence can be found for this procedure under (Al) or (A2). Define dF* * . -1 n f.(X)= =n z[x.=XJ, 1 d“ #1 * n * dF -l f (X) ="_‘ = n 2 [x. = x]. du j==1 J * * * Equation (2.10) motivates the testing procedure ‘1“ = ($1,...,¢n) defined by * * (2.11) wi(§n) = [T(fi(xi)) < o], 1 = 1,2,...,n. We now proceed to show that this testing procedure has modified regret converging to zero as n r w. It is convenient to introduce the random variables Yj(x) defined by (2.12) Yj(x) = T([xj = x]). By (1.1), azpi (1410.)) zap J. (x) p j p, (x+1> <1-p j > ' (2.13) V(Y.( )) = + J x m2(x) m(x)m(x+1) m2(x+l) A180, 10 x (2.14) Pj(Y (X)) (GJ. - 6093. C(ej)- .1 We note that by the linearity of T, for any x, * -1 n (2.15) T(fi(x)) = n 2 Y.(x) j#i J and * -1 n _____ (2.16) T(f (x)) = n 2 Yj(x) = Y(x). i=1 * _ Lemma 2.1. T(fi) - T(p) r 0 in ‘P-measure uniformly in i and g for each x. Proof: Let x be fixed. By (2.15), (2.16) and the definition of |T| in (2.8), hat) - mm -n‘1|yi| s n‘lm. By (2 9) and (2.14), T(E) = gn(§). Thus, by (2.16), * _. _. ._ |T(f ) - T(p)| = |Y - (Y)|. which converges to zero in P- P afll measure uniformly in ‘9 by the Tchebichev Inequality. Hence, * ._ by the triangle inequality T(fi) - T(p) r 0 in meeasure uniformly in .3 and i. Qgrollary 2.1. For each x, there exist two sequences {0n} and {an}, both positive and decreasing to zero, such that for all ‘9 _. * and i = 1,2,...,n, if |T(p)| > 5n, then §K[T(fi) < 0] = [T(E) < 0]) 2 1 - en. EEEQEE: Fix X and let {5;} and {6;} be two sequences of Positive reals decreasing to zero. Define n3, j = 1,2,... such 11 * _ that n 2 n3 implies g(|r(fi) - T(p)| 2 53) s 63 for all “g and i = 1,2,...,n. The existence of n3 for each j is guaranteed by Lemma 2.1. Define n1 = l and for j 2 2 let nj = (“j-1" n3) + 1. Define the sequence {an} by en = l for n1 5 n < n2 and for . = I Define {6“} by 5n 6j for for j = 1,2,... . The sequences {6n} and {en} = ' S j 2 2, en 51 for nj n < nj+1. nj S n < nj+1, are positive, decreasing to zero and satisfy the condition: g(|r(f:) - T(E)| 2 an) s an for all .g and i = 1,2,...,n which completes the proof. We now state and prove a lemma which follows immediately from a well knowu theorem of probability theory. Applicationsof this lemma to the variables Y (x) yield results which are useful in J this development. Lemma 2.2. If z , j = 1,2,... are independent random variables J possessing finite ranges rj such that rj S r < a for j = 1,2,... and if s: = 2?=1 V(zj) a Q as n H m, then for any pair of real numbers b < d, P{b S zg‘lzj S d} e 0 as n a W. Further, if for j = 1,2, .., V(zj) 2 52 > 0, P[b s zg=1zj s d} = 0(n'5). Proof: By the Berry-Esseen Theorem, page 288 of LdEve (1963), n P{b s 2 zj s d} s §(d*) - @(b*) + 2c r s;1, j=1 where 9: * snd = d — 2P=ls(zj), snb = b - z§=1E(zj) uh in: the 12 * * and c is the Berry-Esseen constant. Since §(d ) - Q(b ) s -1 _ sn (2n) l5(d-b), the proof of the first result of the lemma is complete. If Var (23) 2 62 > 0 for all j, then sn 2 nab and the proof is complete. Lemma 2.3. If 9 f 0 as j ~ D, for each x, J be S z§=l Yj(x) S d} d 0 as n ~ Q where b < d are two real numbers. 
Further, under (A2), be S z§=l Yj(x) S d} = 0(n-k) uniformly in g for each x. Proof: If 9j fl 0 there exists n > 0 such that 9j 2 n i.o. If 9j 2 n. by (2.13) a ,(x)d (x+l) V(Yj (x)) 2 m(x)m(x+1) where for x = 0,1,2,... dn(x) = inf pe(x) > O. sews] Hence, 3: = z§=l V(Yj(x)) H w. Also, the range of Yj(x) is |T|(x) for all j. Thus, the first result of the lemma follows from the first part of Lemma 2.2. Under (A2), for all j a qi(x)da(x+l) V930") 2 W and the right hand side of this last inequality is independent of 53. Hence, the second result of the lemma follows directly from the second part of Lemma 2.2. m Cr an 13 We now define * * (2.17) v (x) = sup P {T(f,(x)) < o} - inf P {T(f.(x)) < 0}. n lSiSn-n 1 lSiSn-n 1 Lemma 2.4. If 9j {'0 as j r w, Vn(x) d 0 as n d)” for each x. Also, under (A2), Vn(x) = 0(n-k) uniformly in g. for each x. * —- - Proof: Let x be fixed. By (2.15) T(fi) = Y - n lYi' Hence, by (2.12), for i = 1,2,...,n (2-13) Eng < -a(nn0-1} 5 En{T(f:) < 0} 5 Eng < (nm(x+1))-1}- By the definition of Vn(x) in (2.17) and (2.18), n -1 s 2 Y. s (m(x+l)) }. i=1 3 (2.19) Vn(x) s gn{-am'1 By Lemma 2.3, Vn(x) r 0 for each x. Under (A2), by the second a part of Lemma 2.3, Vn(x) = 0(n- ) uniformly in .9 for each x. Lemma 2.5. If 9j r 0 as j a a, then * inf P (T(f,(0)) < 0) r l. lSiSn_n 1 Proof: By the lower bound of (2.18), (2.20) inf P (T(f’,‘(0)) < 0) 2 P {Y(O) < -a(nm(0))'1}. lSiSn 1 ”1‘ Since SJ * 0, by (2.14) §n(Y(O)) ~ -‘a(m(0))-1 < 0. By Kolmogorov's Criterion, page 238 of Lbeve (1963), Y(O) ‘.§n(Y(0)) ~ 0 a.s. ‘g and the result is immediate from (2.20). 0 Q x -1 We note that Since C(9) = (2&30 9 m(x)) 9 I ‘ l’.‘ Inn-“Inna 9 \1__r'=-.£’_~ 14 (2.21) C(O) = mm)"1 = sup {C(e)le 6 [0,3]}. It follows from (2.6) and (2.21), with In defined in (2.5), that for x = 0,1,2,... (2.22) |1n(x>l s C(O)Bx+1m(x). * Proposition 2.1. Under (A1), for each ‘g, Dn(§,1n) r 0 as n a w * where -1n is defined by (2.11). Proof: Case 1: 9j " 0 as j r a. Let x 2 l. Recalling the definition of In in (2.5), we have n |In(x)| s a n'1.2 Vpi(x) i=1. and since 9.1 fl 0, the right hand side converges to zero. Let x = 0. Since 9j r 0, by (2.9) and (2.10) ¢G (0) = l for n sufficiently large. By Lemma 2.5, n inf P {T(f’XO) < 0} _. 1 lSiSn _“ 1 so that |1n(0)| s n‘le ELIIZJTGBOD < o} - ¢G (0)| —» 0. Thus, since the right hand side of (2.22) is u-integrab12, by the Dominated Convergence Theorem, 0(In) ~ 0 which by (2.5) completes the proof of Case 1. Case 2: 9.1 f 0 as j a m. Let x = 0,1,... be fixed. By Corollary 2.1, there exist two sequences of positive reals 6n implies that for i = 1,2,...,n 15 * _ £n{[T(fi) < 0] = [T(p) < 0]} 2 1 - en, Hence, if |T(E)| > (an, since pe(x) s 1 for all x and e, -1 n a .— (2.23) lInl s n a 2 |£n{T(fi) <0} - [T(p) < OJI s a an. i 1 If |T(E)| s an, by (2.9) and the triangle inequality (2.24) lIn' S m 5n(l + qn) + B Vn where l * * q (x) = — (lsup P {I(£,(x)) < o} + inf P {T(f (x)) < 0}) n 2 S.S ‘-n i . -n i i n 1S1$n and Vn is defined by (2.17). Adding the bounds in (2.23) and (2.24) and replacing 1 + qn by 2 we have s . (2.25) lIn| 2m an + B O, “(fn,k) = An + Bn + Cn where An = Mfg S elfnfl‘), n =u<£g> cltf , s czar 1) n n, n, and C =u([g>e][f >c2,]f ). n n,k n,k Let v be the finite measure such that dv/du = g and note that An S H([g S e]g) = V([g S 3]), which can be made arbitrarily small by choice of e, since v([g S 0]) = 0. Also Bn S ezu([g > e]) S cu(g) which can be made arbitrarily small by choice of c. 
Note that by the uniform in A convergence of fn A to zero and the finite- ’ ness of v, for any 6 > 0, > 6 ~ 0. skip v([fn,k ]) Hence since C S V([f > 62]) as n ~ @ ’ n n,A ’ 32p Cn a 0, which completes the proof. 17 , * Pr02081tion 2.2. Under 0A2), Dn(§,1 ) ~ 0 uniformly in g as * n e m where 1 is defined by 2.11. 2322:: The proof is the same as that of Case 2 of Proposition 2.1 except for the following modifications. Since the sequences {6“} and {an} are independent of 9 for each x and by Lemma 2.4, Vn(X) ' 0(n-7) uniformly in 9 for each x, the bound on lIn| of (2.25) converges to zero uniformly in g for each x. Apply- ing Lemma 2.6 instead of the Dominated Convergence Theorem completes the proof. Consider the compound testing procedure defined by 2 26 - [ f* < o <. > vim“) - T< (x9) 1, * i = 1,2,...,n. With 1“ defined by (2.26) and in by (2.11), since T(f*(x1)) = T(f:(xi)) - a/n m(xi) we obtain by (2.1), (2 27) IR (8 ) - R (e *)| s ln'1 2 (e -a)P {0 s T(f*(x )) ' n -’1n n —’1n i=1 i —n i i < a/n m(xi)}l. Dealing with the right hand side of (2.27) as we did to obtain (2.5), since le-al s B, * -1 n * (2.28) |Rn(§’1n) - Rn(§,1n)| S Bu(n 2 pi(X)En{0 S T(fi(X)) i=1 < a/n m(x)}). By the definitions of Yj(x), |T|(x) ((2.12) and (2.8)) and (2-15), for each x a “Hill by the 18 n (2.29) n“1 )3 i pi2n{o s I = (2h) 3]“,- * Consider the procedure in defined by * * 2. = < ( 30) wiqn) [T(A Fi s'5 = {p((Ex > 01ax + [x s OJex>m}'1. (2.32) C(9) 2.9 = {p((Ex > 015* + [x s 03ax>m>}‘1. Lemma 2.7. If nhz ~ a, then for each x, “T(l F*(x)) - T(l'E(x))H ~ 0 uniformly in '8. ’Proof: By the linearity of a and T and the triangle inequality, Hr) - T(l Ekx>>H s )'1HA(F*JE>H + a(m(x))’lnl(r*4f)(x)u. *._ - For any y, “p(F -F)yH2 is the product of (2h) 2 and the variance of the average of n independent Bernoulli random variables. Hence, ”$(F*;P)yuz S (16nh2)-1 which completes the proof. Lemma 2.8. Under (Al) and 0A2), for each fixed x, |T(b‘§(x)) - T(p(x))| * 0 uniformly in ‘Q. Proof: By the linearity of T and the triangle inequality, for each x II<$'F - T(S(x>>| s '1|l'i(x+1> -'S-Mw|si12HF5w-png i=1 We now bound the summands in the immediately preceding expression uniformly in 9 and show thatthisbound converges to zero. By (Al) and the Mean-Value Theorem, A Fe(y) = pe(y + 5), '6‘ s h. Also y 6 hgy+®-p¢w|seumumo+é)-MwL Under (A2), for all 0 6 Q, the right hand side of this last inequality is bounded by (eyv 913’) E (l86m(y+6)-m('y)|v I06m(y+6)-m(y)|), where E is defined by (2.31). Since h ~ 0, 6 ~ 0 and hence, by the continuity of m, this bound converges to zero which completes the proof. Lemma 2.9. Under (Al) and (A2), if nh2 ~ °, for each x, * _ “T(L Fi(x)) - T(p(x))“ * 0 uniformly in g and i. Proof: Let x be fixed. By Lemma32.7 and 2.8 and the triangle inequality for L -norm, “T(l F*(x)) - T(p(x)n d O uniformly in 2 fl. Since “T(L F:(x)) - T(b F*(x))n S (2hn)-1IT|(x), the proof is complete by another application of the triangle inequality. We now define Mh(x), the continuous analog of Vn(x), by * ' * (2.33) M (x) = sup P {T(l F.(x)) < o} - inf {r (T(l F.(x)) < 0}. n . *n 1 . -n 1 IS1Sn 1S1$n Let r(x) = inf {m(y)| Iy-x| s 1}. and note that r > 0 by (Al). Then, by (2.32) for all y such ind de‘ hand m L Ill.‘ 1 23 that \y-x| s 1 and all e e o, (2.34) pe(y) 2 A(x) where A = 9 r(x) exp {-<|1og e|v|log a|)(lx|+1)}- Lemma 2.10. Under (A1) and 0A2), if nh2 d a then for each x -1 * -1 and any two real numbers b < d, Pfh b‘S nT(A F (x)) S h d] w 0 uniformly in g. 
* Proof: Let x be fixed and note that nT(b F (x)) = zg=l Wj where wj = (2h)-1T([x-h < xj s x+h]). By (1.1), - - 1+h . V(Wj) 2 28(2h) 2(m(x)m(x+1)) 1Fj1:f: Fj]::l-h' Thus, w1th 2 sn - 22:1 V(wj), by (2.34), (2.35) 52 2 2 anA(x)A(x+l) n m(x)m(x+1) By the Berry-Esseen Theorem, page 288 of Leave (1963) and the fact that the range of w = (2h)-1|T|(x) for j = 1,2,...,n, j n (2.36) gn{h‘1b s zle s h‘ld} s §(d*) - §(b*) + c(snhf1|T|(x). j: ' * -1 * -1 h s b = h b - if P, w, and s d = h d - 2? P w w ere n J=1 J( J) n J=1 j( J) and c is the Berry-Esseen constant. Since * * - §(d ) - §(b ) S (snh)-1(d-b) and nh2 * m, by (2.35) the right hand side of (2.36) converges to zero which completes the proof. Lemma 2.11. Under (Al) and (A2), if nh2 fl m, for each x, 24 MnCX)‘” 0 uniformly in g. Proof: Let x be fixed. For i = 1,2,...,n, P {am P*(x)) < -a(2hm(x))'1} s P {T(A {(x)) < 0} S P {anx F*(x)) —n --n 1 '_n < (2hm(x+1))-1}. Hence, by the definition of Mn(x) in (2.33), Mn(x) is bounded by the difference between the upper and lower bounds of the above inequality. This difference can be expressed as -1 * -l . gnf-a(2hm(x)) S nT(A F (x)) < (2hm(x+1)) } wh1ch converges to zero uniformly in g by Lemma 2.10, which completes the proof. We now note that by the bound on C(G) of (2.31) and (A2) (2.37) peso S V(X) where V(x) = E(sx[x > o] + QXEx s 0])m(x). * Proposition 2.3. With in defined by (2.30), under (A1) and (A2), . 2 * . 1f nh d G, DnQQ‘in) O uniformly in g. Proof: Let x be fixed and recall the definition of In in (2.5). By the bound on p6 of (2.37), bounding |B-a| by B, we have —n n (2.38) lIn(x)| s sv(x)n'1 2 IP {T(A F:(x)) < o} - [T(P(x)) < o]|. i=1 , 'k .— By Lemma 2.9 and the Tchebichev Inequality, T(A Fi(x)) - T(p(x)) * 0 in P-measure uniformly in g and i. Using the same construction as in Corollary 2.1 following Lemma 2.1, there exist two sequences {on} and {en} of positive reals both decreasing to zero such that in th( 25 \T(§(x))| > 65 implies §{[T(A F:(x))< 0] = [T(E(x)) < 0]} 2 1-en for i = 1,2,...,n and all 2. Hence, if |T(p(x))| > 5n, the right hand side of (2.38) is bounded by Bv(x)en. If [T(p(x))] 5 6n, since by (2.9), T(p(x)) = n'12§=1(ei-a)e:c(ei), adding and sub- tracting qn(x) to the term [§n(W:(x1,...,xi_1,x,xi+1,..-,xn))-¢G (X)] n of the expression defining In in (2.5), where 1 * _ * qn(x) = 41:31:; gum) Fi(x)) < o} + 1:; znird F100) < 0}) . we have IIn(x)| s m(x)(5n + qn(x)én) + Bv(x)Mh(x). Since en,6n and Mn converge to zero uniformly in g for each x and IInl S Bv which is u-integrable, it follows from Lemma 2.6 that p(llnl) a o uniformly in g, which completes the proof. Consider the compound testing procedure defined by (2.39) 11m“) = [rd F*(xi)) < 0]- Theorem 2.3. Under (A1) and (A2), with 1n(xn) defined by (2.39), if nh2 w a, then Dn(§,ln) * O uniformly in g, * * -1 Proof: Since T(A F (xi)) = T(A Fi(xi>) - a(2hnm(xi)) ,by (2.1), <2 40) In (M ) - R (9 PM s Bn-IIZIIP {0 srd P*(x )) ' n JLn rl-Ln i=ffl i i < a(2hnm(xi))-1}. Proceeding as we did to obtain (2.5) and applying the bound on pe(x) in (2.37) we get that the right hand side of this inequality 26 is bounded by: -1 n * -1 (2-41) BH(V(X)n Z P {0 S T(F (X) S a(Zhnm(X)) 1)- i=1 -n i Since for i = 1,2,...,n, gn{o s T(l F:(x» s a(2hnm(x))‘1} s gn{-a(2hm(x))'1 s um F*(x>) s <2h>'1|T| so}, the integrand of (2.41) converges to zero uniformly in g for each x by Lemma 2.10. Since this integrand is bounded above by V(x) which is u-integrable, the integral of (2.41) converges to zero uniformly in g by Lemma 2.6. 
It follows that the left hand side of (2.40) converges to zero uniformly in g. Hence, by Proposition 2.3 and the triangle inequality, Dn(§,in) d 0 uniformly in g. Remark. Suppose we consider I to be an interval neighborhood of +m. Redefine i to be a right divided difference, i.e., 4 g(x) = hml g]:+h and let r(x) = inf {r(y)|l 2 y-x 2 0}. If one modifies the techniques of this section accordingly, Theorem 2.3 can be obtained in this more general context. Taking I to be of this form includes the Gamma and Negative Exponential distributions as special cases. CHAPTER III ESTIMATING THE EMPIRICAL DISTRIBUTION FUNCTION OF A PARAMETER SEQUENCE IN COMPOUND DECISION PROBLEMS 1. Introduction. In a statistical compound decision problem one is faced with a set of n problems all having the structure of a certain component problem. For example,in Chapter II the component pro- blem is a test on,the,parameter of an exponential family. In such problems,procedures are desired whose compound risk (average risk over the n problems) converges to R(Gn)’ the Bayes Risk versus Gn’ the empirical distribution of the n parameters, in the com- ponent problem. It is evident that knowledge of Gn is useful in these problems. Let F 9 Subset of the real line and f be a corresponding density with 6 respect to Lebesgue measure u. Let x = (x1,x be a distribution function for each 9 E 0, some 2,...) be a sequence of independent random variables where xi has dis- tribution function Fe , henceforth abbreviated to Fi’ and i 91 E 0 for i = 1,2,... . Also abbreviate fe to fi' Let fi 1 E - Hi-lri and Gn be the empiric distribution of the n parameters corresponding to x1,x2,...,xn. Define the following functions _ -1 n (3-1) F = G (F ) = n E F , n 9 i=1 i 27 disc met: foll L0\w dist: intez WhOSe cm. of ”DJ In thl 601'!qu-l C0115 ide “(or (11 28 32 - -1n (-) f-Gn(fe)=n.2 fi 1=l and note that f = dP/du. Let * _ n 03) Fm)=n12hisfl. i=1 For any real-valued function g of a real vaiable, say x, define (3.4) A g 0. We now make some remarks about the Lévy metric which is discussed on page 215 of LOEVe (1963). The Lévy metric is the metric on the space of all distribution functions defined by the following distance formula. For any two distribution functions F1 and F2, letting d denote distance, d(F1,F2) = inf [c > o|for all x, F1(x-e)-e s F2(x) s F1(x+c)+e}. Lobve mentions that convergence in Lévy metric of a sequence of distribution functions is equivalent to complete convergence. In section 2, we consider the family of uniform on the interval (0,9) distributions, 9 E (0,“) and exhibit an estimator whose Lévy distance from Gn converges to zero a.s. E for a certain class of E's. In section 3, we deal with the family of uniform on the interval [9,6+l) distributions, 6 E (d@,°). In this case, we find an estimator whose Levy distance from Gn converges to zero a.s. E for every g. In section 4, we again consider these two families and assume that the 6's are i.i.d. according to some distribution G. We then apply the results of fut 155 the for cha‘ SEN cont find the q 1 Same . It then 29 sections 2 and 3 to the problem of estimating this prior dis- tribution function G. Robbins (1964) treats the general problem of estimating a prior distribution function G. 
Under certain conditions, he shows that if the estimate of G is chosen so that the resulting mixed distribution function is within en(en$v 0 as n w no) of minimizing over the class of possible mixed distribution functions, the Sup norm distance from the empiric distribution function of the observations (x1,x2,...,xn), then, under certain assumptions, almost surely the estimator will converge to G on the continuity set of G. However, no explicit method is given for obtaining this estimator. The family of section 2 of this chapter is discussed in Robbins' Example 3 and the family of ~ section 3 is a special case of his Theorem 2. Deely and Kruse (1968) further asSume that Fe(x) is continuous in x for each 6. They then exhibit a method of finding an estimator satisfying Robbins' condition. Calculating the estimate involves finding an optimal strategy in a certain game. 2. Uniform (0,9) Case. We consider the following family of distributions. For 6603(01Q): let -1 fe(x) = e Eo 0, define for all x 2 0. nN+l' An(x) = {xlx A F*(x) > (x-e)f(x-e) + E]:-¢ + e/Z}, Bn(X) = {ilx b F*(x> < - Elf" - e/2}. Note that F(x) can be written as either F(x-c) + Ej:-e or P(x+€) - Ej:+'. The following lemma then follows immediately * from equation (3.5), definition (3.6) which defines Gn and the definitions of An(x) and Bn(x). Lemma 3.1. For any e > O, for each x 2 0 it fol W in 31 {§|G:(x) < Gn(x-€) - a} C {§IF*(X) < F(x) - 9/2} U An(x), {36:00 > Gn(x*+e) + 63} C {5|P*(x) > F(x) + c/z} U Bn(x). Lemma 3.2. If h e 0 and 2:; N exp {-nhzeZ/ZBZ} < a, then 1 'J N EIjEOCAn(xnj) U Bn(xnj)) i.o.} = 0. Proof: Let 0 < x < B be fixed and note that * -1 n xAF(x)=n beExin]. - i=1 ' It follows from the definition of f1 that the variables x AExi S x] have expectations x A Fi(x) S xfi(x). It then follows that x b F*(x) has expectation bounded above by xf(x). Since If is decreasing on x >'0, we bound 7P]:_€ = x-c f du below by 3 f(x) if x 2 e and xf(x) if x < c. Thus subtracting the upper bound, xf(x), on.the expectation of x A F*(x) from the right hand side of the inequality defining An(x) yields a quantity which is bounded below by 3/2. Hence, by Theorem 2 of Hoeffding (1963), since x < B, §(An(x)) S exp(-nh232/282). Then, since .ECAn(O)) = 0 for all n, N (3.7) y U A (x )) S N exp(-nh2c2/ZBZ)- 1-0 n nj If h.S e, fi(x+c) > 0 implies A Fi(x) = fi(x+c) and it follows x»b Fi(x) 2 xfi(x+t). Hence, the expectation of * ._ x A F (x) is bounded below by xf(x+e). Hence, since 32 F1131: 2 ef(x+c), subtracting xf(x+|:) from the right hand side of the inequality defining Bn(x) yields a quantity less than or equal to -¢/2. Again applying Hoeffding's Theorem 2, since x < B and §(Bn(0)) = 0 for all n we obtain, for h S e, the bound of the right hand side of (3.7) for EGJ§=OBé(xnj)). Since the infinite series formed by summing this bound over n converges by assumption, by the Borel-Cantelli Lemma, the proof is complete. Define the distribution function G“ by Gn(O-) = 0, Gn(B) = l and for 0 S x < B G (x) = max {G*(x )IO S x S x} n n nj n ' j * Note that Gn(0) = 0. ____—_._‘ = - S S —+ _. Theorem 3.1 If 6 max {xnj+l xnj|0 j N} 0’ h 0, Gn(3) “ 1 and for all e > 0 a 2 N exp(-nh262/2B?)< 0°, n=l then d(Gn’Gn) * 0 a.s. 2. Proof: Let a > 0 be arbitrary. By the extension of the Glivenko-Cantelli Theorem to non-identically distributed in- dependent random variables, see Theorem 4.1 of Wolfowitz (1953), _ * a.s. §,F - F w 0 uniformly in x. 
It then follows from Lemmas 3.1 and 3.2 that N (3.8) _P;{ U (one:n j - c) - e S G:(xnj) S Gn(xnj + e) + ¢)'i.o.} = 0. J=0 . . 311 (in Win: [Film] 33 Since for 0 S x < B, én(x) = Gn(x') where x' is the largest x“j which is not larger than x, U {l‘ N * x > + C > + + cage—Gum snore) z} jgomlcnsnj) and“ e) c} and if 5 S e, )< Gn(xn - e) - C}- J -2 - 0;J B{alcnm < Gn(x c) e } Cijgo{glcn(xn j Since 6 ~ 0 and Gn(8) ~ 1, by (3.8) and the fact that an - Gn for x < 0, a.s. E ,for n sufficiently large, d(Gn,Gn) S 2:, which completes the proof. 3. Uniform [9,9+l) Case. We now consider the following family of distributions. For 9 60 = (-°°.°°). fe(x) = [e s x < e + 11. It then follows that for all x (3.9) f(x) = Gn(x) - Gn(x-l). By (3-9), (3.10) c (x) = 2 f(x-r). n r=0 * * Since F (x) S Gn(x) S F (x+1), we estimate Gn at a * point x by Gn(x) which is the truncation to the interval [F*(x), P*(x+1)] of z:=o i F*(x-r), i.e. 34 (3-11) G:(x) ‘- m :01 F*(x-r))v F*(x>)/\ F*(x+1)}. r: For convenience we assume that h S 1. Lemma 3.3. For any 3 > 0, if h S c, then for all x, * I 2 2 £(ExIGn(x—e) - c s Gn(x) s Gn(X+c) + e}‘) s 2 exp(-2nh e ). * Proof: Since the truncation involved in the definition of Gn can only improve the estimator, it suffices to prove the lemma for the estimator Tn’ defined by Q * 1 n a T (x) = Z A F (x-r) = n- 2 2 A Ex. S x-r]. n . 1 r=0 i=1 r=0 By the definition of F in (3.1), _1 °‘_ x+h-r E(Tn(X))= h 2 FJx-r r=0 By (3-9). ” — +h ” h 2 P1“ ‘r = z (fx+ "(c (t) - c (t-l))dt). r=0 x-r r=0 x-r n n . x+h-r x+h-r;1 . Writing x-r Gn(t-l)dt as x-r-l' Cn(t)dt,.we see that the right hand side is a telescopic series and we obtain Q - +h- +h (3.12) {BP]:_r r = I: Gn(t)dt. r: By Hoeffding's (1963) Theorem 2, since by (3.12),:(Tn(x)) 2 Gn(x), 2 £(Tn(x) < Gn(x-e) - e) S exp(-2nh2e ). Similarly,if h S e, by (3.12), F(Tn(x)) S Gn(x+e); applying Hoeffding's bounds again 35 2 EiTnoc) > Gn(x+c) + e) S exp (-2nh c2), and the lemma is proved. Let 6 = N-1, N a positive integer depending on n and consider the following grid on the real line: ...< -26 <-6 <0< 6< 26 <... Define the following distribution function. (3.13) Gn(x) = sup {6:(j6)lj6 s x, j = o, i 1, i 2,...}. Theorem 3.2. If Z:=1N exp(-2nh252) < m for any a > 0, N r w and h a 0, then for any g, a.s. F, d(Gn,Gn) ~ 0, where Cu is defined by (3.13). Proof: Let c > 0 be arbitrary. Let h S e and 6 S e. Let * J be the largest integer such that F (J6 + l) < c. Define the (j+l)6+l 2 c * following Subset of the real line, letting .J= {le Jj5 j 2 J, j = o, i 1,...}, An = u [i6.6> jE,J Note that there are at most L = (N+l)M grid points in An where M is the smallest integer greater than or equal to 5.1. Also * note that An may be empty. If x < J6, since F (J6 + l) < e, Gn(x) < e and Gn(x) < e and it follows trivially that - *(mHN+1 Gn(X‘€) - e S Gn(x) S Gn(x+€) + c. For m 2 J, let F ]m6 < 6. Then for x E [m6,(m+1)6), since both Gn(x) and Gn(x) are in * * a the interval [F (m6),F ((m+1)6 + 1)], Gn(x-c)-e S Gn(x) S Gn(x+e) + e. Let [m6,(m+1)6) C-An. For all x in this interval I’d Loo be f1 Gn(x) = Gn(m6). Thus, u {xIGnOO > Gn(x+e) + e} c U {alc*(36> > a (jam + e} . EA n n x€An J5 and since 6 s e, A * xléA fgglcn(x) < Gabi-2:) - 6} C jéLéA [Elcn(j6) < Gn(j6-e) - a}. #3“ n n E The ‘2 measure of the union of the two right hand sides of the above inclusions, by Lemma 3.3 is less than or equal to 2 2 Q 2 2 2L exp {-2nh e ). Since 2£=1N exp (-2nh c ) < a, by the Borel- Cantelli Lemma E( U {5|Gn(x-26) - e 5 6mm s Gn(x+c) + e}'i.o.) = o. x n It follows that a.s. 
‘E, d(Gn,Gn) S 2: for n sufficiently large, which completes the proof. Remark: We have tacitly been assuming, in both sections 2 and 3, that d(Gn,C ) is for each n a Borel-measurable function of n in ,where the o-field on the space of xn's is the n-dimensional Borel sets. It will be shown that these measurability assumptions are satisfied in the proofs of Corollaries 3.1 and 3.2. Lemma 3.4. For a > O, a > O and all c 3 a 0’ 2 nc exp {-an }4< 0. n=l 2322:: It suffices to prove the lemma for c > 0. Let c > 0 be fixed and let m, a positive integer, be such that x 2 m 37 implies (x+1)c < x2e. Then a c a 2 2 n exp (-an ) < I“ x c exp (-axa)dx, n=m+l m and the integral on the right hand side can be shown to be finite by the change of variable: y = xa, which completes the proof. . . C . . . 41 Remark: Lettlng N = n , c a p051t1ve integer, h = n and B = n.Y with a,Y > 0, and a + Y < k, by Lemma 3.4 the series of the hypothesis of Theorem 3.1 converges. Letting N = nc, c a positive integer and h = nqa, 0 < a < a, again by Lemma 3.4, the series of the hypothesis of Theorem 3.2 converges. (,4, Estimating the Prior Distribution. .Let (8,50 be the measurable space consisting of the . . m real l1ne and the Borel f1e1d. Let (R ,EF5, where m is a positive integer or infinity, be the usual product space. We now drop the condition that F9 is absolutely continuous with respect to u for each 6 E Q. We refer the reader to page 137 of Ldbve (1963)for a brief discussion of regular conditional probability. Lemma 3.5. If F 9 6 Q, is a regular conditional probability a, so measure on (RqfiD, then F, g E Q , is a regular condition probability measure on (R?,Bw). Q) Q Proof: Since F_ is a probability measure on (R ,B ) for-each Q ,we only need show that for each set B 6 SP, F(B) is a 38 meas\xral>le function of g. If B is a meaSurable cylinder set, i e B — m ‘ ° ‘ IIi=1 Bi’ Bi 6 B for i = 1,2,..., where only a finite number of Bi's are not equal to R, F(B) is a finite product of terms of the form Fi(Bi) and hence is a measurable function of 9, Thus, since it is easily seen that the class of all subsets whose F- measure is a measurable function of 9 is a o-field and since the measurable cylinders are the generators of 59, the proof is complete. Suppose 9i, i = 1,2,..., are i.i.d. according to some distribution G. Let Gm(§) denote the marginal distribution on E of the joint distribution on pairs (9,5) resulting from a: G on 9 and F on x. Theorem 3.3. If 9i are i.i.d. according to G and Cu is an estimator of G based on (x ,x ,...,x ) such that n l 2 n d(C ,G ) is jointly measurable in (9 ,x ) for each n and n n -n -n if Gm {§|d(Gn,Gn) w 0 a.s. E} = l, and F 9 E 0, is a regular 9’ conditional probability measure,then d(Gn,G) * 0 a.s. 69(F). M: By the triangle inequality d(&n,c) s d(Gn,Gn) + d(Gn,G). By the Glivenko-Cantelli Theorem, page 20 of Loeve (1963), d(Gn,G) * 0 a.s. Gm. Let C be the set of pairs (9,5) such that d(6n,cn) -' 0. c is jointly measurable in ($5) and since by Lemma 3.5 F is regular, the measure of C is G°(§(C)). Since F(C) = O a.s. G”, the proof is complete. Let the U-field on 0 be the restriction of the Borel Field to 0. 39 Corollary 3.1. If Fe corresponds to the uniform distribution on the interval (0,9), 6 E 0 = (0,w), if 9,, i = 1,2,..., are 1 i.i.d. according to G, if Gn is defined as in Section 2 of this chapter and if the hypotheses of Theorem 3.1 are satisfied with B .. on replacing Gn(B) ... 1, then d(<‘;n,G) ~ 0 a.s. G°°(§). Proof: Let B be a Borel set. 
Fe(B) = 9-1u(B(0,6)) which is .t] a continuous function of 6 > 0 and hence Fe, 6 6 0, is a regular condition probability measure. For each n, Cn(§n) assumes one of a finite set of values, each of which is a step function with discontinuity points restricted to the selected grid. The set of ‘gn's for which an assumes a particular value is a finite union of sets each of which corresponds to a particular set of values for the * * restriction of Gn to the selected grid, where Gn is defined by (3.6). Each set of this union is a finite intersection of sets which correspond to a specific value of G: at each point * of the grid. Hence, since it is easily seen that Gn 18 a Borel-measurable function of ‘5“ for each x, the set of gn's A 0 where G assumes a specific value is measurable. Canalder n the partition of an 1?.n into sets J1,J2,...,JM where M is the number of possible values of Gn and each Ji’ 1 = 1,2,...,M, is the product of On and a set of gn's‘ on which C aSSumes one of the M possible values and hence n ' A = A . S. C3 18 measurable. Then d(Gn’Gn) 2?;1 Jid(Gn’Gn) in Q ' ' ‘ A s a d(Gn’Gn) 18 continuous in fin on each Ji’ Jid(Gn,Gn) 1 "hem and 40 measurable function of pairs for i = 1,2,...,M. ($1,351,) Hence, d(Gn’Gn) is jointly measurable for each n. Since the ei's are i.i.d. according to G, by the Glivenko-Cantelli Theorem, 8 ~ a implies G (B) r l a.s. Gco n . Thus, by Theorem 3.1, Gm£§Id(Gn,Gn) * 0 a.s. E} = 1 and it follows by Theorem 3.3 that d(Gn,G) H 0 a.s. dm(z), 5‘ Corollary 3.2. If Fe corresponds to the uniform distribution 0, 3. A p = ' A l _ ( 14) Ed < e} n {d(cn,cn) < c e}), _n) where g; is an n-dimensional vector with rational components and G' is its corresponding empiric distribution function and n e is a positive rational. For any fixed -§n and any c' 2 0 ~ A S'= -'-'s“ sG +'+c', (3 15) {d(Gn,Gn) c } LJfGn(r c ) c Gn(r) n(r c ) } r where r ranges over the set of rationals. Since it is easily seen that 6*(x), defined in (3.11% is a measurable function of n 5n for each x, it follows by the definition of Cu that Gn(x) 41 is measurable for each x. Hence, it follows from (3.15) that d(G ,G ) is a measurable function of x for each fixed 6 . n n -n -n Also, for each 6', d(G',G ) is continuous in 6 and hence -n n n -n measurable in .9“. Thus, each of the sets of the countable union of the right hand side of (3.14) is an intersection of two measurable cylinders in the space of pairs (gn’En) and consequently is measurable and the proof is complete. . "In my .. _ Let Hhic “nth these 599”! the j, CHAPTER IV SOME EMPIRICAL BAYES SOLUTIONS 1. Introduction. As previously mentioned a symbol, say F, representing a distribution function, also represents the corresponding Lebesgue- Stieltjes measure. For 0 a subset of the real line, let {Fele E 0} be a class of distributions for a random variable x possessing densities, £9, with respect to Lebesgue measure, u. Let G be a distribution on 0. Define K(X) = G(Fe(X)) which is the marginal distribution function of x of the pair (9,x), where (9,x) possesses the joint distribution resulting from G on 6 and F9 on x. Let k(X) = G(fe(X)) which is a determination of the density of K with respect to u. Let (x1,x2,...) be a sequence of i.i.d. according to K random variables and K? be the product measure on the space of these sequences. Let P be the product measure on the space of sequences (x1,x2,...,(9,x)), i.e. P is the product of fin and the joint distribution of (9,x). 
42 I ll ['11 43 The Bayes Estimator verSus G in the problem of estimating 9 based on observing x, under quadratic loss, is the conditional expectation of 6 given x, Px(6). Denoting the Bayes risk verSus G by R ,we have R = P(¢(x) - 6)2, where ¢ is a Bayes response. Our objective is to find an estimator, say ¢n’ based on (x1,x2,...,xn), of ¢, for which Rn = P(¢n - 0)2 r R as n H w. If P(¢n - 9)2 < m and P(¢ - 9)2 < m, then P((¢n - ¢)(¢ - 9)) = 0 and it follows that 2 (4.1) Rn - R = P(¢n - ¢) We shall also consider the technique of first estimating G based on (x1,x2,...,xn), and then using the Bayes Estimator verSus this estimate of G to estimate 9. Throughout this chapter, + or - appearing as an affix on the lower limit of an integral means respectively to exclude and include the lower limit in the range of integration. 2. Uniform (0,9) Case. We consider the same family of distributions discussed in section 2 of Chapter III. For 9 E O = (0,6), fe(x) = e‘1[o < x < e] and 0 x S 0, Fe(x) = x 9-1 0 < x < 9, 1 x 2 9. De and "here 44 In this case we have the following: -1 (4.2) k(x) = carom) = [x > OJJ:+9 (16, (4.3) K(x) = G(Fe(x)) = C(x) + x k(x). Henceforth, we only consider x > 0. From the definition of f c> 1_G x (4'4) Pit“) = G(fe(x)) = k(x) Note that k(x) = 0 implies G(x) = l and K(x) = k(X) > 0, by (4-3), l-K x pxae) =x+—k—G%)-. Define and consider the following Bayes response, (4-5) ¢(x) = V(x) + x. We now make the following definitions: -1 n Kn(x) = n .2 [xi S x], i=1 i.e. Rh is the empiric distribution of (x1,x2,.. I -l x kn(x) - h KnJX-h where h depends on n and is positive. Define 1. If ° axn) : e_and k, 45 (“-6) line) = (anon/x (1 - Kn(x))/kn(x))[" 2 h], where undefined ratios are taken to be zero and an is a bounded non-negative function of x > 0 for each n and (4-7) ¢n(X) = x + ”(X)- We now assume (A1) C(62) < ... By (Al) and Jensen's Inequality P(¢2) < ¢. Hence, since an is bounded for each n the conditions implying (4.1) are satisfied. Thus, we are interested in choosing an so that P(¢n - ¢)2 d 0. Remark. If the failure rate of the marginal distribution of x is bounded away from zero, W is bounded, say by C. If we estimate W by (l-Kn)/kn truncated at C, under the proper conditions on h, at every x where k(x) is poSitive and is the derivative of K, hence a.s. K, the estimator converges in Q R measure to ¢(x). Then, since P0?n - ¢)2 = P Pxfln - V)2, by twice applying the Bounded Convergence Theorem, P 2 2 O (in-l) =P(¢n'¢) ~. The following example illustrates a parametric class of prior distributions for which a rate of convergence to Bayes risk can be obtained for each member of the class with a non- truncated estimator. Example 4.1. Suppose G << u and sucl 46 d6 2 -h9 ——= > i A 9 e [6 03, where X E (0,“). Then for x > 0, k(x) = rfie-ldG = xzf; e-Aede = x e-Ax. Hence, #(x) = e-xx/X e->‘x = h-1. We estimate ¢(x) = k-1 + x by ¢n(x) = Q + x where ; = “-12? Since C(92) < w i=1 xi' ‘2 . - -1 2 and P(x ) < ”e (4-1) holds. Since P(x) = k , P(¢n — ¢) = V(x) = n-lk-z. Thus, for each G in this class, Rn - R is of order n-l. We now proceed in the general problem of finding an such that P(q)n - ¢)2 m 0. Lemma 4.1. Under (A1), if nh2 4 w, h d O and an(x) a G for each x, p((yn - ¢)')2 ~ 0. Proof: Let x e A = {x|k(x) > o, x 4519(6)}, where 19(0) is the discontinuity set of G. Since k is continuous at x, 2 k(x) = K'(x). By the Tchebichev Inequality, since nh ~ m, -1 x , m _ -l x “ kn(x) - h KJx-h 0 in K -measure. Since h KJx-h k(x) as n ~ a, (4 . 8) kn(x) -. k(x) Q in K -measure. 
By the Glivenko-Cantelli Theorem, page 20 of \ Loeve (1963) and (4-8), I'Kn(x) # l-Kgx! k111(x) k(x) ‘ V(x) 47 C3 in K -measure. Since an(x) r m, (as wgw-wmn*~m Since mam os(wn-w32sl§ by (4.9) and the Bounded Convergence Theorem - 2 m - 2 (4-11) Px((¢n - V) ) = K ((¢n - V) ) r 0. Since P(A) = 1 and under (A1), P012) < an, by (4.10), (4.11) and the Dominated Convergence Theorem, P Px((*n - *)')Z = P((Vn - H')2 m 0 which completes the proof. Lemma 4.2. If x 2 h and k(x) > 0, c(a: + han) (1+) + 2(h-1a: + an) (n h k)!5 nkk + 2 wan - l) > s + where c is the Berry-Esseen constant and 1 denotes (1-K(h))-% which decreases to one as h ~ 0. Proof: Let x 2 h be such that k(x) > 0. Since + P(¢n - ¢ > v) = 0 for v 2 * where * = (an - W) , + 2 * 2 gun-w) =hr¢%-¢>wa. ut0 v iff G < 0 where for i = 1,2,...,n, Wi = h-1[x-h < xi S x](¢+v) - [xi > x]. Since the xi's are i.i.d., the wi's are i.i.d. Since k is a decreasing function II} I 48 (4.12) Px(w1) = l(¢ + v) - (1-x) 2 vk, where we define A = h-IKJ:_h. By the Berry-Esseen Theorem, page 288 of Loeve (1963), with V(wl) = 62 and r denoting the range of wl, (4.13) px(E < 0) s e(z) + c n-krU-l, % - ~1 where z = -n Px(w1)o and c is the Berry-Esseen constant. By (1.1), hko 2 {(¢+y)zl(1-hl)}%. Also, hr = (¢+y) + h. Hence, fgro'ldvz s (hl(1-hl))'5f;(y + v + h)(¢ + v)'1dv2. Note that (1-hl) 2 1 - K(h) for all x > 0. Thus, bounding the integral in the right hand side of the above inequality, we obtain (4 14) fgro'ldyz s {hl(1-K(h))}"’(*2 + 2h*). By a weakening of the tail bounds of the standard normal distribution function, page 166 of Feller (1957), the lower bound on Px(w1) of (4.12) and the fact that O S r, @(2) S (nfifk)-1r. Hence, since hr S an + h for 0 S v S *9 * 2 2(h-la2 + a ) (4.15) f §(z)dv s ———--£L—-—lL- 0 n!5 k Since A 2 k for x 2 h and * s an, by (4.13), (4.14) and k (4.15), replacing (l-K(h))- by (1+): m the Seco by in Lemma (4.16) Now 9X 49 2 -1 2 c(an + Zhan) (1+) 2(h an + an) * A _ 2 P (w < O)dv S , I0 x (n h k)% nkk which completes the proof. Let nan“, = sup Hanll2 = 44(a§))*, llanll1 = Man). Recall that an is a bounded non-negative function on x > 0 for each n. Lemma 4.3. With 1+ as defined in Lemma 4.2, P((Vn - l>+)2 s I3 12d? + n"‘<<1+)uh'l<|la,,l|,.,llanH2 + 2h|lanll2> + zs'luanu: + Hamil,»- iro_of: P((‘ln - *)+)2 = $3 lzdr + J: wan - w>+>2dr. Since the P-measure of the set where k > 0 is one, converting the second integral of the right hand side above to a u-integral by introducing k in the integrand and applying the bound of Lemma 4.2, we obtain (4.16) I: pxmyn - ¢)+)2dp s n-kj:((1+)ck35h-15(a:+2han)+2(h-1ar21+an))du.. liow extend the range of integration of this bound to (0,”). 50 . 2 Then, Since an S Hanna’s.n and p(k) = 1, by the Schwarz In- equality the right hand side of (4.16) is bounded by n'i‘E<1+>ch'* + 2(h'1HanH§+llan||1>1. which completes the proof. Theorem 4.1. Under 0A1), if an(x) * w for each x, h m 0 and Mann, = 005‘), Hamil, =o<(nh2)3‘), Han“, = cam"), 2 then P(q)n - ¢) d 0 as n fl “. jggggfz Under (Al), P(wz) < w and it follows by the conditions of this theorem and Lemma 4.3 that P(Gn - ¢)+)2 * 0. By Lemma 4.1, since nh2 ~ 0°,P((¢n - H')2 r 0. Hence, P(q)n - ¢)2 = P(Wn - ¢)2 H 0 as n ~ w and the theorem is proved. We now consider the procedure of first estimating the prior G and then using the Bayes Estimator versus the estimate of G. As before let d denote the Livy metric. 
Noting that K” is the same measure on the space of §~sequences as G°(§), which is discussed immediately before Theorem 3.3, by Corollary 3.1 we can construct an estimator, Gn, of G such that d(Gn,G) r 0 a.s. Rf. The following example shows that the risk of the Bayes EStimator versus a distribution function, converging in Levy 51 metric to G, may not converge to Bayes risk. Example 4.2. Let 0 < B < 09 be a continuity point of G with G(B) = 1 and C(9) < l for 6 0, let l-G (x) A n -1 ¢n(x) = ————- [lx+e dGn > 03, -1 . J:+6 dGn which is a Bayes response versus G“. Let n be sufficiently large so that bn > 0 and Mn > B. It follows that A2 A2 K(¢n> 2 ftbn,e>on -l Fe([bn.a>> 2 1 - bnen , 2 . dK - Mn K([bn,8)). Since for en S 9 S B, -l K = IF9(Ebn.e>)dc 2 <1-bnen )(1'G(°n‘))- Thus, K(&:) 2 M:(1-bn9;1)cn 2 B'1(Mncn)2. Hence if Mncn —. co, since C(92) < 6°, by the triangle inequality for L2- norm the 52 risk of using & converges to infinity. However, since n G(92) < m, the Bayes risk is finite. 3. Uniform [9,6+l) Case. We now consider the class of distributions considered in section 3 of Chapter III. For 9 E 0 = (49,69), fe(x) = [e s x < 6+1]. In this case, we have (4.17) k(x) = G(fe(x)) = G(x) - G(x-l), (4.18) K(x) = C(Fe(x)) = C(x-l) + xk(x) -j’é:r1)+edc, where the affix + on the upper limit of the integral means to include the limit in the range of integration. By the right continuity of G, k is right continuous and hence, for all x, k is the right hand derivative of K. Thus, we estimate k(x) by knot) = h'lxnlfh. where 0 < h S l and h is allowed to dependc on n and Kn is the empiric distribution function of x1,x2,...,xn as in Bac:tion 2 of this chapter. Note that central or left differences 0f K. are good estimators of k(x) when K'(x) = k(x). n By equation (4.17), for all x, Q (4.19) G(x) = 2 k(x-j). i=0 53 (4.20) K - If, k(y)dy = I:_lc(y)dy. For all x, define (4.21) G:(x) = .2 kn(x-j). J=0 * Lemma 4.4. If h H 0, for each x, Gn(x) H G(x) in Km-measure. Proof: Let x be fixed. By (4.17), ” X+h-j ” x-J+h x-j-1+h jfngx-J = j:g(fx_j cdy>. Since the series on the right hand side is telescopic and Ix+h-JG(y)dy H O as j H G, we have, X-j ” +h +h (4.22) 2 KJX '1 2 IX G(y)dy. x-j x 1‘0 * Since Gn (x) is the average of n i.i.d. random variables, each distributed as h-12:=0 [x-j < x1 S x-j+h] whose expectation is h-lif KJx—j+h and since nh2 H w, by the Tchebichev In- J=0 x-j * - - equality, G (x) — h 1:? KJx J+h H 0 in Kfl-measure. Hence, n j=0 x-J by (4.22) and the fact that the right continuity of G plus h converging to zero imply that h-lf:+hG(y)dy H G(x), *- Grle) ” G(x) in KD-measure, which completes the proof. Since G(9f6(x)) Px(°) = G(fe(x)) ’ by tile equations describing the functions k and K, (4.17) and (4 . 18) respectively , 54 = con + tx-Dkoc) - K Px(e) k x Thus, we take as a Bayes response (4.23) ¢(X) - (x-l) + V(X), where the function V is defined as follows: Since the conditional distribution of 9 given x on (x-l,x], 0 S V S 1. Define the function ¢n by * G - K (4.24) w = (L—J-‘Vom 1. n k n 0/0 is defined to be an arbitrary value in the interval where [0,1], We estimate ¢(x) by (4.25) ¢n(x) = (x-l) + ¢n(x). Theorem 4.2. With ¢n defined by (4.25), if nh2 H G and h H O, Rn H R as n H w. fflgagf: Since the conditional distribution of 6 given x is ccnncentrated on (x-1,x], P(¢ - 9) = P(Px(¢ - 6)2) S 1 and Hon - e) = 1>(1>x(¢n - e)2) s 1. It follows that (4.1) holds and it is sufficient to show P(¢ - Wn)2 H 0. Let x be fixed. 
Sinxze nh2 H G, using the Tchebichev and triangle inequalities m as iti the method used to obtain (4.8), in K -measure, is concentrated i. 55 kn(X) _. k(X) . By the Glivenko-Cantelli Theorem, page 20 of Loeve (1963), a.s. KI, Kn(x) H K(x) Hence, by Lemma 4.4 and the Slutsky Theorem, page 174 of Lobve (1963), if k(x) >10, ln ~ fix) in Ké-measure. We note that P({xlk(x) > 0}) = 1. Thus, by the Bounded Convergence Theorem, a.s. P, Px(¢n - ¢)2 H 0. Again by the Bounded Convergence Theorem P(PxOl'n - ¢)2) = P(‘Vn ' ¢)2 * 0: which completes the proof. Note that the assumption C(92) < o is not made in this case. This assumption is sufficient for R.< w. However, in this case, R S l for any G as was shown in the proof of Theorem 4.2. The question arises as to whether or not the Bayes .Estimator m can have an infinite second moment if R.< m. The following example shows that this is possible. Obviously, in ttris example, C(92) = fi since by Jensen's Inequality, G(ez) < co =2 p(¢2) < co. kample 4.3. Let G be concentrated on I = {1,2,3,...} such that: the mass at m E I is C mfla/2 where C is a normalizing constant. Since ¢(x) = x-l + #(x) and it is bounded above by 1 and below by 0, P(¢2) = °° iff P(XZ) '3 °°- BY (4'17) 56 2 ” 2 m 2 P(x ) = f x k(x)dx = f x (G(x) - G(x-1))dx. 1 1 Thus, by the definition of C, Q P(xz) 2 C E mZm-B/2 = m=l . 2 and it follows that P(¢ ) = w. We now consider the technique of using the Bayes Estimator verSus an estimate of the prior, C. By Corollary 3.2, we can A A Q construct an estimator, Gn’ of G such that d(Gn,G) * 0 a.s. K . A Let Gn be such an estimator and redefine x+ . edG (4,25) ¢n(x) = £1§Llli....§ A x n x-l where 0/0 is defined to be some value in the interval [x-1,x]. It then follows that for all x, x-l S ¢n(x) S x. Theorem 4.3. With ¢n(x) defined by (4.26), Rn - R a 0. Proof: Let D be the discontinuity set of G and let D + 1 = (x+1lx e D}. Let x E A = {x|x e D U (D + 1), k(x) > o} be fixed. Then, by the Helly-Bray Lemma, page 80 of LOEVe (1963), since d(Gn,G) a 0 a.s. f”, we have that a.s. Ké, x+ * x d d d I(x-l)+e Gn Ix-l e G’ " X X Gn]x-1 G:lx-l ' Hence, ¢n(x) " Px(6) a.s. KG. Since x-l S Px(9) S x, by the Bounded C<>nvergence Theorem, Px(¢n(x) - Px(9))2 = Kr(¢n(x) - Px(9))2 H 0. 57 Hence, since PQA) = 1, again by the Bounded Convergence Theorem, P(¢n(x) - Px(9))2 ~ 0, which completes the proof. 4. Estimation of a Location Parameter in Certain Gamma Distributions. Consider a family of distributions characterized by the following density with respect to Lebesgue measure, u: _ 01-1 -(x-e) fe(x) = EFL“;— [x 2 9] with a 2 1, where 6 E Q = (-m,+w) and T represents the Gamma Function. Suppose that G is a distribution on O and assume 2 (A1) G(e ) < 400. In this case (4.27) k(x) = G(fe(x)) = f%&; Ifm(x-9)a-1e-(x-e)dG. We adopt the convention that the upper limit of an integral is in- cluded in the range of integration. The Bayes Estimator in the problem of estimating 9 based on an observation x, with quadratic loss, is I:6(x-9)a-1e-(x-e)dc (4.28) ¢(x) = “00km Remark. By part (ii) of the proposition of section 4 of Teicher (15961), a sufficient condition for identifiability of a class of trainslation parameter mixtures is that the characteristic function 0f the generating distribution function (take the location parameter to be zero) not be identically zero on a non-degenerate interval. In this tril hen M m Inv 58 this case, the characteristic function of the generating dis- .. . .-01 tribution is (l - 1t) . 
Lemma 4.5. With $\varphi$ defined in (4.28), if $k(x) > 0$,
$$\varphi(x) = x - \alpha\,(k(x))^{-1}\int_{-\infty}^{x}e^{-(x-t)}\,dK(t).$$

Proof: By the definition of $k$ in (4.27),
$$\alpha\int_{-\infty}^{x}e^{-(x-t)}\,dK(t) = (\Gamma(\alpha))^{-1}\,\alpha\int_{-\infty}^{x}e^{-(x-t)}\int_{-\infty}^{t}(t-\theta)^{\alpha-1}e^{-(t-\theta)}\,dG(\theta)\,dt.$$
Inverting the order of integration in the expression on the right-hand side and performing the inner integration yields
$$(\Gamma(\alpha))^{-1}\int_{-\infty}^{x}(x-\theta)^{\alpha}e^{-(x-\theta)}\,dG(\theta) = \int_{-\infty}^{x}(x-\theta)f_\theta(x)\,dG(\theta).$$
This last expression is $xk(x) - k(x)\varphi(x)$, which completes the proof.

Define

(4.29) $\psi(x) = \dfrac{\int_{-\infty}^{x}e^{-(x-t)}\,dK(t)}{k(x)}$.

We estimate $\psi(x)$ by

(4.30) $\psi_n(x) = \dfrac{\int_{-\infty}^{x}e^{-(x-t)}\,dK_n(t)}{k_n(x)}\wedge a_n(x)$,

where $K_n$ and $k_n$ are defined as in Section 3 of this chapter, $h$ is positive and allowed to depend on $n$, and $a_n$ is a bounded non-negative function of $x$ for each $n$. Also, the estimator is defined to be zero in the case of an undefined ratio. Define

(4.31) $\tilde\psi(x) = \dfrac{\int_{-\infty}^{x}e^{-(x-t)}\,dK(t)}{h^{-1}K]_x^{x+h}}$.

Note that by the definition of $k$, for any $x$ and $\epsilon > 0$,

(4.32) $k(x+\epsilon) \ge e^{-\epsilon}k(x)$.

It then follows from (4.32) that

(4.33) $h^{-1}K]_x^{x+h} \ge e^{-h}k(x)$.

Since $F_\theta(x^2) = \alpha(\alpha+1) + 2\theta\alpha + \theta^2$, by (A1),

(4.34) $P(x^2) = G(F_\theta(x^2)) < \infty$.

Define $A = \{x \mid k(x) > 0,\ k(x) = K'(x)\}$.

Lemma 4.6. Under (A1), if $h \to 0$, $P(\tilde\psi - \psi)^2 \to 0$.

Proof: Let $x \in A$. Then, since $h \to 0$, $\tilde\psi(x) \to \psi(x)$. By (4.33), $|\tilde\psi(x) - \psi(x)| \le (e^h + 1)\psi(x)$. By (4.34) and the fact that (A1) implies $P(\varphi^2) < \infty$, since $\alpha\psi(x) = x - \varphi(x)$, $P(\psi^2) < \infty$. Since $P(A) = 1$, by the Dominated Convergence Theorem, $P(\tilde\psi - \psi)^2 \to 0$, which completes the proof.

Lemma 4.7. Under (A1), if $h \to 0$, $nh^2 \to \infty$ and, for each $x$, $a_n(x) \uparrow \infty$, then $P\big(((\psi_n - \tilde\psi)^-)^2\big) \to 0$.

Proof: Let $x \in A$ be fixed. By the Strong Law of Large Numbers, $\int_{-\infty}^{x}e^{-(x-t)}\,dK_n(t) \to \int_{-\infty}^{x}e^{-(x-t)}\,dK(t)$ a.s. $K^\infty$. Since $nh^2 \to \infty$ and $h \to 0$, $k_n(x) \to k(x)$ in $K^\infty$-measure by the method used to obtain (4.8). Hence, since $a_n(x) \uparrow \infty$, $\psi_n(x) \to \psi(x)$ in $K^\infty$-measure. Since $\tilde\psi(x) \to \psi(x)$, $\psi_n(x) - \tilde\psi(x) \to 0$ in $K^\infty$-measure. By the bound on $\tilde\psi(x)$ from (4.33),
$$\big((\psi_n(x) - \tilde\psi(x))^-\big)^2 \le (e^h\psi(x))^2.$$
Hence, by the Bounded Convergence Theorem, $P_x\big(((\psi_n(x) - \tilde\psi(x))^-)^2\big) \to 0$. Since $P(A) = 1$ and, under (A1), $P(\psi^2) < \infty$, by the Dominated Convergence Theorem $P\big(((\psi_n - \tilde\psi)^-)^2\big) \to 0$ and the proof is complete.

Lemma 4.8. $P\big(((\psi_n - \tilde\psi)^+)^2\big) \le 2e^hn^{-1/2}(c+1)\,\mu(a_n + h^{-1}a_n^2)$, where $c$ is the Berry-Esseen constant.

Proof: Let $* = (a_n - \tilde\psi)^+$. Then
$$P_x\big(((\psi_n - \tilde\psi)^+)^2\big) = \int_0^{*}P_x(\psi_n - \tilde\psi > t)\,dt^2,$$
since the integrand of the right-hand side is zero for $t \ge *$. Fix $0 < t < *$ and suppose $x$ is such that $k(x) > 0$. Then $[\psi_n - \tilde\psi > t] = [\bar w > 0]$ where
$$w_i = e^{-(x-x_i)}[x_i \le x] - (\tilde\psi + t)h^{-1}[x < x_i \le x+h], \qquad i = 1,2,\dots,n.$$
By (4.33),

(4.35) $P_x(\bar w) = P_x(w_1) = -t\,h^{-1}K]_x^{x+h} \le -t\,e^{-h}k(x)$.

Consider the following bounds on $\sigma^2 = V(w_1)$:

(4.36) $(th^{-1})^2K]_x^{x+h} \le \sigma^2 \le (1 + h^{-1}a_n)^2$.

By the Berry-Esseen Theorem, page 288 of Loève (1963), and (4.35), since the range of $w_1$ is bounded by $1 + h^{-1}a_n$,
$$P_x(\bar w > 0) \le \Phi(z) + c\,n^{-1/2}\sigma^{-1}(1 + h^{-1}a_n),$$
where $\sigma z = -n^{1/2}te^{-h}k(x)$ and $c$ is the Berry-Esseen constant. Applying a weakening of the bounds on the tails of the standard normal distribution of Feller (1957) and the bounds of (4.36), we bound the right-hand side of the above inequality by
$$\frac{(1 + a_nh^{-1})e^h}{n^{1/2}\,t\,k(x)} + \frac{c(1 + a_nh^{-1})}{n^{1/2}\,t\,h^{-1}(K]_x^{x+h})^{1/2}}.$$
Hence, since $(K]_x^{x+h})^{1/2} \ge K]_x^{x+h}$ and, by (4.33), $h^{-1}K]_x^{x+h} \ge e^{-h}k(x)$,
$$\int_0^{*}P_x(\bar w > 0)\,dt^2 \le \frac{2a_ne^h(c+1)(1 + a_nh^{-1})}{n^{1/2}\,k(x)}.$$
Noting that $P(\{x \mid k(x) > 0\}) = 1$ and converting the $P$-integral of the right-hand side of the above inequality to a $\mu$-integral, we obtain
$$P\big(((\psi_n - \tilde\psi)^+)^2\big) \le 2n^{-1/2}e^h(c+1)\,\mu(a_n + h^{-1}a_n^2),$$
which completes the proof.

Define $\varphi_n(x) = x - \alpha\psi_n(x)$.
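The estimator (4.30) is again a plug-in functional of the empiric distribution. A minimal sketch, under the assumed choices $G = N(0,1)$, $h = n^{-1/4}$ and $a_n(x) = \log n/(1+x^2)$, the last chosen so that $a_n(x) \uparrow \infty$ for each $x$ while $\mu(a_n) = o(n^{1/2})$ and $\mu(a_n^2) = o((nh^2)^{1/2})$, as Theorem 4.4 below requires; all of these choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 2.0, 20000

# Simulated past observations: theta_i ~ G (assumed N(0,1), so G(theta^2) < inf),
# x_i = theta_i + a Gamma(alpha, 1) deviate.
theta = rng.normal(0.0, 1.0, size=n)
x = theta + rng.gamma(alpha, 1.0, size=n)

h = n ** -0.25  # h -> 0 with n h^2 -> infinity

def a_n(t):
    """Truncation level: increases to infinity for each t, yet is mu-integrable."""
    return np.log(n) / (1.0 + t * t)

def psi_n(t):
    """(4.30): exponentially weighted empirical integral over k_n(t), capped at a_n(t)."""
    num = np.mean(np.exp(-(t - x)) * (x <= t))   # int e^{-(t-s)} dK_n(s)
    den = np.mean((t < x) & (x <= t + h)) / h    # k_n(t) = h^{-1} K_n]_t^{t+h}
    if den == 0.0:
        return 0.0                               # undefined ratio taken to be zero
    return min(num / den, a_n(t))

def phi_n(t):
    """Empirical Bayes estimate of theta at x = t: t - alpha * psi_n(t)."""
    return t - alpha * psi_n(t)

print(phi_n(2.5))
```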
Theorem 4.4. If $\mu(a_n) = o(n^{1/2})$, $\mu(a_n^2) = o((nh^2)^{1/2})$, $h = o(1)$ and $a_n(x) \uparrow \infty$ for each $x$, then under (A1), $R_n - R \to 0$.

Proof: By (A1) and the fact that $a_n$ is bounded for each $n$, the conditions implying (4.1) hold and thus it suffices to show that $P(\varphi_n - \varphi)^2 \to 0$. Since $a_n(x) \uparrow \infty$ for each $x$, $\mu(a_n^2) = o((nh^2)^{1/2})$ implies that $nh^2 \to \infty$. Hence, by Lemmas 4.7 and 4.8, $P(\psi_n - \tilde\psi)^2 \to 0$. By Lemma 4.6 and the triangle inequality for the $L_2$-norm, $P(\psi_n - \psi)^2 \to 0$. Since for $x \in A$, $\varphi_n - \varphi = \alpha(\psi - \psi_n)$ and $P(A) = 1$, the proof is complete.

CHAPTER V

EMPIRICAL BAYES ESTIMATION IN EXPONENTIAL FAMILIES

1. A Rate for the Discrete Case.

Macky (1966) dealt with the Empirical Bayes Estimation Problem for the class of distributions considered in Section 2 of Chapter II. The family is characterized by the following density with respect to $\mu$, where $\mu$ is counting measure on the non-negative integers:
$$p_\theta(x) = \theta^xC(\theta)m(x), \qquad m(x) > 0 \text{ for } x = 0,1,2,\dots \text{ and } \theta \in \Omega.$$
The Bayes estimator is
$$\varphi(x) = \frac{G(\theta p_\theta(x))}{G(p_\theta(x))} = T(x)\,\frac{k(x+1)}{k(x)},$$
where $T(x) = m(x)/m(x+1)$. Macky (1966) took $\Omega$ to be the natural parameter space for the family, i.e. $\{\theta \mid \theta > 0,\ \mu(\theta^xm(x)) < \infty\}$, and assumed $G(\theta^2) < \infty$. He then exhibited a procedure for estimating $\theta$, based on $(x_1,x_2,\dots,x_n,x)$, whose risk converges to the Bayes risk versus $G$ as $n \to \infty$. In this section, we assume

(A1) $\Omega = (0,B]$, $B < \infty$,

where $(0,B]$ is a subset of the natural parameter space. Under certain other assumptions, we show that $P(\varphi_n - \varphi)^2 = O(n^{-1/2})$, where
$$\varphi_n(x) = T(x)\,\frac{k_n(x+1)}{k_n(x)}\wedge B$$
and
$$k_n(x) = n^{-1}\sum_{i=1}^{n}[x_i = x].$$
Again, undefined ratios are taken to be zero. Let $R = P(\varphi - \theta)^2$ and $R_n = P(\varphi_n - \theta)^2$ as in Chapter IV. Since $B < \infty$, $\varphi \le B$ and $G(\theta^2) \le B^2$, and it follows that
$$R_n - R = P(\varphi_n - \varphi)^2.$$

Lemma 5.1. $\varphi$ is an increasing function of $x$.

Proof: Define the measure $G^*$ on $\Omega$ by $dG^*/dG = C(\theta)$. Note that $G^*$ is a finite measure possessing all moments and that, for $x = 0,1,2,\dots$,
$$\varphi(x) = \frac{G^*(\theta^{x+1})}{G^*(\theta^{x})}.$$
Since $G^*(\theta^r)$, $r \ge 0$, is a log convex function of $r$ (see 9.3b, page 156 of Loève (1963)),
$$\big(G^*(\theta^{x+1})\big)^2 \le G^*(\theta^{x})\,G^*(\theta^{x+2})$$
for $x = 0,1,2,\dots$, which completes the proof.
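In the Poisson case $m(x) = 1/x!$, so that $T(x) = x+1$, and $\varphi_n$ reduces to a truncated ratio of observed frequencies. A minimal sketch under an assumed uniform prior on $(0,B]$; the simulation choices are illustrative, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
B, n = 4.0, 50000

# Illustrative Poisson case: m(x) = 1/x!, so T(x) = m(x)/m(x+1) = x + 1.
# Prior G = uniform on (0, B], consistent with (A1); the choice is an assumption.
theta = rng.uniform(0.0, B, size=n)
x = rng.poisson(theta)

def k_n(j):
    """Empiric frequency of the value j."""
    return np.mean(x == j)

def phi_n(j):
    """T(j) * k_n(j+1) / k_n(j), truncated at B; 0/0 taken to be zero."""
    kj = k_n(j)
    if kj == 0.0:
        return 0.0
    return min((j + 1) * k_n(j + 1) / kj, B)

print([round(phi_n(j), 3) for j in range(6)])  # increasing in j, as Lemma 5.1 suggests
```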
We now make the following assumptions, which are part of the group of assumptions made by Gilliland (1966) to obtain a rate of convergence to zero of the modified regret, discussed in Chapter II, in the sequential compound estimation problem for this family of distributions. With $\bar k(x) = k(x+1)$,

(A2) $\mu\big(((T+B)k)^{1/2}\big) < \infty$,

(A3) $\mu\big(k(\bar k^{-1} + k^{-1})^{1/2}\big) < \infty$.

Theorem 5.1. Under (A1), (A2) and (A3), $R_n - R = O(n^{-1/2})$.

Proof: Since $0 \le \varphi \le B$ and $0 \le \varphi_n \le B$, $(\varphi_n - \varphi)^2 \le 2B|\varphi_n - \varphi|$ and
$$P_x(|\varphi_n - \varphi|) = \int_0^{\infty}P_x(\varphi_n - \varphi > v)\,dv + \int_0^{\infty}P_x(\varphi_n - \varphi < -v)\,dv.$$
Noting that the integrand of the first integral on the right-hand side is zero for $v \ge *$, where $* = (B - \varphi)^+$, and that for $0 < v < *$, $[\varphi_n - \varphi > v] = [\bar w > 0]$ where
$$w_i = T(x)[x_i = x+1] - (\varphi + v)[x_i = x], \qquad i = 1,2,\dots,n,$$
we have

(5.1) $\displaystyle\int_0^{\infty}P_x(\varphi_n - \varphi > v)\,dv = \int_0^{*}P_x(\bar w > 0)\,dv$.

Abbreviate $T(x)$ and $k(x)$ by omission of $x$ and let $\bar k = k(x+1)$. Let $\sigma^2 = V(w_1)$ and let $0 < v < *$. By the Berry-Esseen Theorem, page 288 of Loève (1963), since $K(w_1) = -vk$,

(5.2) $P_x(\bar w > 0) \le \Phi(z_1) + cn^{-1/2}r\sigma^{-1}$,

where

(5.3) $\sigma z_1 = -n^{1/2}vk$,

$c$ is the Berry-Esseen constant and $r$ is the range of $w_1$, i.e. $r = T + \varphi + v$. By Lemma 1.1,

(5.4) $r\sigma^{-1} \le (\bar k^{-1} + k^{-1})^{1/2}$.

Since $\sigma^2 \le K(w_1^2) = T^2\bar k + (\varphi + v)^2k$ and, by (A1), $\varphi \le B$, or equivalently $\bar k \le T^{-1}Bk$, for $0 < v < *$,

(5.5) $\sigma^2 \le (TB + B^2)k$.

Recalling the definition of $z_1$ in (5.3), replacing $\sigma$ by the upper bound of (5.5) and then extending the range of integration from $*$ to $\infty$, we obtain

(5.6) $\displaystyle\int_0^{\infty}\Phi(z_1)\,dv \le (2\pi nk)^{-1/2}(TB + B^2)^{1/2}$.

Hence, by (5.1), (5.2), (5.4) and (5.6), since $* \le B$,

(5.7) $\displaystyle\int_0^{\infty}P_x(\varphi_n - \varphi > v)\,dv \le (2\pi nk)^{-1/2}(TB + B^2)^{1/2} + Bcn^{-1/2}(\bar k^{-1} + k^{-1})^{1/2}$.

Letting $u_i = T[x_i = x+1] - (\varphi - v)[x_i = x]$ for $i = 1,2,\dots,n$, noting that $P_x(\varphi_n - \varphi < -v) = 0$ for $v \ge \varphi$ and that for $0 < v < \varphi$, $[\varphi_n - \varphi < -v] \subset [\bar u \le 0]$, the same argument gives

(5.8) $\displaystyle\int_0^{\infty}P_x(\varphi_n - \varphi < -v)\,dv \le (2\pi nk)^{-1/2}(TB + B^2)^{1/2} + Bcn^{-1/2}(\bar k^{-1} + k^{-1})^{1/2}$.

By (5.7) and (5.8), $P_x(|\varphi_n - \varphi|)$ is bounded by twice the right-hand side of (5.7). Multiplying by $k$ and integrating with respect to $\mu$, by (A2) and (A3),
$$R_n - R = P(\varphi_n - \varphi)^2 \le 2B\,\mu\big(k\,P_x(|\varphi_n - \varphi|)\big) = O(n^{-1/2}),$$
which completes the proof.

We now consider again the technique of using the Bayes estimator versus an estimate of the prior. Tucker (1963) constructs an estimate of the compounding distribution of a compound Poisson distribution under his condition (A): there exists $r > 0$ such that for $|z| < r$,
$$\sum_{x=0}^{\infty}z^x(x!)^{-1}\Big(\int_0^{\infty}\theta^x\,d\tilde G\Big) < \infty,$$
where $d\tilde G/dG = C(\theta)e^{\theta}$. Let us assume, as in the previous theorem, that $G$ is concentrated on $(0,B]$, $B < \infty$. It then follows that condition (A) is satisfied with $r = B^{-1}$. Let $\hat G_n$ be the Tucker estimate of $G$, modified to meet this more general context. Since $C(\theta)\theta^x \le m^{-1}(x)$, by the Helly-Bray Theorem, page 182 of Loève (1963), $\hat G_n(\theta^xC(\theta)) \to G(\theta^xC(\theta))$ a.s. $K^\infty$ for each $x$. Thus, for each fixed $x$, a.s. $K^\infty$,
$$\hat\varphi_n(x) \to T(x)\,\frac{G(\theta^{x+1}C(\theta))}{G(\theta^{x}C(\theta))} = \varphi(x),$$
where we define
$$\hat\varphi_n(x) = T(x)\,\frac{\hat G_n(\theta^{x+1}C(\theta))}{\hat G_n(\theta^{x}C(\theta))}\wedge B.$$
It then follows that $\hat\varphi_n \to \varphi$ a.s. $P$ and, by the Bounded Convergence Theorem, $P(\hat\varphi_n - \varphi)^2 \to 0$, so that the risk of using the Bayes estimator versus $\hat G_n$ converges to the Bayes risk versus $G$.

2. Estimation in the Presence of a Nuisance Parameter.

Consider a bivariate random variable with a discrete and a continuous component. Let the distribution depend on a two-dimensional parameter with one component pertaining to the discrete variable and the other to the continuous variable. In this section an empirical Bayes estimation procedure is given for the problem of estimating the discrete component of the parameter, under the assumption of quadratic loss.

Let $z = (x,y) \in \{0,1,2,\dots\}\times(-\infty,+\infty)$. Let

(5.15) $p_\eta(z) = C_1(\theta)C_2(\xi)\,\theta^{x}e^{\xi y}m(x)r(y)$,

where $\eta = (\theta,\xi) \in \Omega$, $\Omega$ being the natural parameter space, i.e.
$$\Omega = \Big\{(\theta,\xi)\ \Big|\ \theta > 0,\ \sum_{x=0}^{\infty}\theta^xm(x) < \infty,\ \int e^{\xi y}r(y)\,dy < \infty\Big\},$$
and $m$ and $r$ are positive, be the density with respect to $\mu = \mu_1\times\mu_2$, where $\mu_1$ is counting measure on the non-negative integers and $\mu_2$ is Lebesgue measure. Let $C(\eta) = C_1(\theta)C_2(\xi)$. Let $G$ be a distribution on $\Omega$ such that

(A1) $G(\xi^2) < \infty$,

(A2) $G(\theta^2) < \infty$,

(A3) $r$ is bounded and continuous.

Let $P$ be the usual product measure on the space of sequences $(z_1,z_2,\dots,(z,\eta))$, where the $z_i$, $i = 1,2,\dots$, are i.i.d. with density

(5.16) $k(z) = G(p_\eta(z)) = r(y)m(x)\,G(C(\eta)\theta^xe^{\xi y})$

with respect to $\mu$, and $(z,\eta)$ has the usual joint distribution. The Bayes estimator in the problem of estimating $\theta$ based on the observation $z$, under quadratic loss, is

(5.17) $\varphi(z) = \dfrac{G(\theta p_\eta(z))}{G(p_\eta(z))} = T(x)\,\dfrac{G(C(\eta)\theta^{x+1}e^{\xi y})}{G(C(\eta)\theta^{x}e^{\xi y})}$,

where $T(x) = m(x)/m(x+1)$. Let

(5.18) $g]_{y-h}^{y+h} = g(y+h) - g(y-h)$,

where $h > 0$ and $g$ is any real-valued function of a real variable $y$. Define

(5.19) $t_n(z) = T(x)\Big(\dfrac{F^*_{x+1}]_{y-h}^{y+h}}{F^*_{x}]_{y-h}^{y+h}}\wedge a_n\Big)$,

where $a_n$ is a non-negative constant, $h$ is allowed to depend on $n$, and
$$F^*_x(y) = n^{-1}\sum_{i=1}^{n}[x_i = x,\ y_i \le y].$$
We consider undefined ratios to be zero. Let $N_n$ be a sequence of non-negative integers increasing to $\infty$. Define
$$A_n = \{z \mid 0 \le x \le N_n,\ -N_n \le y \le N_n\},$$
$$A_n^+ = \{z \mid 0 \le x \le N_n+1,\ -N_n-1 \le y \le N_n+1\},$$
$$D_n = \{\eta \mid N_n^{-1} \le \theta \le \log_eN_n,\ -N_n \le \xi \le N_n\},$$
$$r_n = \inf\{r(y) \mid -N_n-1 \le y \le N_n+1\},$$
$$m_n = \inf\{m(x) \mid 0 \le x \le N_n+1\}.$$
Note that $m_n > 0$ and that $r_n > 0$ since $r$ is positive and continuous. Macky (1966) shows that, with
$$B(y) = \sup_{\xi\in\Omega_\xi}C_2(\xi)e^{\xi y},$$
where $\Omega_\xi = \{\xi \mid \int e^{\xi y}r(y)\,dy < \infty\}$, $B(y) < \infty$ for all $y$ and that, for $z \in A_n^+$,
$$B(y) \le \max\big(B(-N_n-1),\ B(N_n+1)\big).$$
It then follows that for $z \in A_n^+$,

(5.20) $k(z) \le B_n$,

where $B_n$ is the product of $\max(B(-N_n-1),\ B(N_n+1))$ and the bound on $r$ guaranteed by (A3). Since
$$k(z) \ge r(y)m(x)\int_{D_n}C(\eta)\theta^xe^{\xi y}\,dG,$$
for $z \in A_n^+$,

(5.21) $k(z) \ge c_n$,

where $c_n = r_nm_nd_n\exp(-2(N_n+1)^2)$ and $d_n = \int_{D_n}C(\eta)\,dG$, which converges to $\int C(\eta)\,dG > 0$ as $n \to \infty$. Define

(5.22) $\psi(z) = \dfrac{K_{x+1}]_{y-h}^{y+h}}{K_{x}]_{y-h}^{y+h}}$,

where $K_x(y) = \int_{-\infty}^{y}k(x,u)\,du$ is the sub-distribution function estimated by $F^*_x$.

Lemma 5.3. For $z \in A_n$, $\psi(z) \le B_nc_n^{-1}$.

Proof: For $h \le 1$ the windows in (5.22) lie in $A_n^+$ when $z \in A_n$, so the proof follows immediately from the bounds (5.20) and (5.21) on $k$ over $A_n^+$ and the definition of $\psi$ in (5.22).

Define

(5.24) $\varphi_n(z) = t_n(z)[z \in A_n]$,

(5.25) $\bar t_n(z) = \dfrac{F^*_{x+1}]_{y-h}^{y+h}}{F^*_{x}]_{y-h}^{y+h}}$,

where $t_n$ is defined by (5.19), so that $t_n = T(x)(\bar t_n \wedge a_n)$. As usual, let $R = P(\varphi - \theta)^2$, the Bayes risk, and $R_n = P(\varphi_n - \theta)^2$.
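All ingredients of (5.19) and (5.24) are window counts, so the procedure is straightforward to compute. A minimal sketch for the Poisson-Normal$(\xi,1)$ instance treated in Example 5.2 below; the priors and tuning sequences are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100000

# Poisson-Normal(xi, 1) instance of (5.15): x_i ~ Poisson(theta_i),
# y_i ~ N(xi_i, 1). The priors on (theta, xi) are assumptions for illustration.
theta = rng.uniform(0.5, 3.0, size=n)
xi = rng.normal(0.0, 0.5, size=n)
x = rng.poisson(theta)
y = xi + rng.normal(0.0, 1.0, size=n)

h = n ** -0.2                 # window half-width
a_n = np.log(n)               # truncation constant
N_n = np.log(n)               # A_n = {0 <= x <= N_n, |y| <= N_n}

def F_star_window(j, t):
    """F*_j]_{t-h}^{t+h} = n^{-1} #{i : x_i = j, t-h < y_i <= t+h}."""
    return np.mean((x == j) & (t - h < y) & (y <= t + h))

def phi_n(j, t):
    """(5.24): t_n(z)[z in A_n], with t_n of (5.19); T(j) = j+1 for the Poisson case."""
    if not (0 <= j <= N_n and -N_n <= t <= N_n):
        return 0.0
    den = F_star_window(j, t)
    if den == 0.0:
        return 0.0            # undefined ratios are taken to be zero
    return (j + 1) * min(F_star_window(j + 1, t) / den, a_n)

print(phi_n(2, 0.3))          # estimate of E(theta | x = 2, y = 0.3)
```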
By (A2) and the fact that $\varphi_n$ is bounded for each $n$, we have that
$$R_n - R = P(\varphi_n - \varphi)^2.$$

Lemma 5.4. For $z \in A_n$ and $b > 0$,
$$P_z\big(\pm(\bar t_n - \psi) > b\big) \le \exp\Big(\frac{-8nh^2b^2c_n^4}{((1+b)c_n + B_n)^2}\Big),$$
where $\pm$ denotes $+$ or $-$.

Proof: $[\bar t_n - \psi > b] = [\bar v > 0]$ where, for $i = 1,2,\dots,n$,
$$v_i = [x_i = x+1,\ y-h < y_i \le y+h] - (\psi + b)[x_i = x,\ y-h < y_i \le y+h].$$
Hence, since $P_z(\bar v) = -b\,K_x]_{y-h}^{y+h}$ and the range of $v_1$ is $1 + (\psi + b)$, applying Theorem 2 of Hoeffding (1963) to $P_z(\bar v > 0)$,

(5.26) $P_z(\bar t_n - \psi > b) \le \exp\Big(\dfrac{-2nb^2\big(K_x]_{y-h}^{y+h}\big)^2}{(1 + \psi + b)^2}\Big)$.

Also, $[\bar t_n - \psi < -b] \subset [\bar w \ge 0]$ where, for $i = 1,2,\dots,n$,
$$w_i = (\psi - b)[x_i = x,\ y-h < y_i \le y+h] - [x_i = x+1,\ y-h < y_i \le y+h].$$
Since $\bar w$ has the same expectation as $\bar v$ and the range of $w_1$ is smaller than the range of $v_1$, the bound of the right-hand side of (5.26) applies also to $P_z(\bar t_n - \psi < -b)$. In the right-hand side of (5.26), we replace $K_x]_{y-h}^{y+h}$ by the lower bound $2hc_n$ given by (5.21) and $\psi$ by the upper bound $c_n^{-1}B_n$ given by Lemma 5.3, and obtain the bound of this lemma.

Letting $k''(x,y)$ denote the second partial derivative of $k$ with respect to $y$ and $\beta_n$ a bound on $|k''|$ over $A_n^+$, a Taylor expansion gives, for $z \in A_n$ and $h \le 1$,

(5.27) $\big|K_x]_{y-h}^{y+h} - 2h\,k(z)\big| \le \tfrac{1}{3}h^3\beta_n$,

from which, together with (5.20), (5.21) and Lemma 5.3, the following lemma is obtained.

Lemma 5.5. For $z \in A_n$,
$$|T(x)\psi(z) - \varphi(z)| \le \frac{h^2\beta_nB_nT(x)}{3c_n^2}.$$

We also assume

(A4) $P(T^2(x)) < \infty$.

Theorem 5.2. With $\varphi_n$ defined by (5.24), suppose $h \to 0$, $a_n \to \infty$ and $N_n \to \infty$ are chosen so that

(i) $B_nc_n^{-1} = o(a_n)$,

(ii) $(nh^2)^{-1}\,P\big(T^2((1+a_n)c_n + B_n)^2c_n^{-4}\,[z \in A_n]\big) \to 0$,

(iii) $h^4\beta_n^2B_n^2c_n^{-4}\,P\big(T^2[z \in A_n]\big) \to 0$.

Then, under (A1)-(A4), $R_n - R \to 0$.

Proof: Since $\varphi_n$ vanishes off $A_n$ and, by (A2) and Jensen's inequality, $P(\varphi^2) \le G(\theta^2) < \infty$, the Dominated Convergence Theorem gives $\int_{A_n^c}(\varphi_n - \varphi)^2\,dP \to 0$, and it suffices to show that $\int_{A_n}(t_n - \varphi)^2\,dP \to 0$. For $z \in A_n$,
$$P_z(t_n - T\psi)^2 = \int_0^{\infty}P_z\big(|t_n - T\psi| > b\big)\,db^2.$$
By Lemma 5.3 and (i), for $n$ sufficiently large, $\psi(z) \le a_n$ for $z \in A_n$, so that the range of integration of the above right-hand side can be reduced to $(0, Ta_n)$. It also follows that for large $n$ we can remove the truncation of $\bar t_n$ at $a_n$ and apply Lemma 5.4 to obtain the following asymptotic bound on the integrand:
$$2\exp\Big(\frac{-8nh^2b^2c_n^4}{((1+b)c_n + B_n)^2}\Big).$$
Replacing $(1+b)$ by $(1+a_n)$ and integrating the resulting expression over the range $(0,\infty)$, we obtain

(5.30) $P_z(t_n - T\psi)^2 \le \dfrac{T^2((1+a_n)c_n + B_n)^2}{4nh^2c_n^4}$.

By Lemma 5.5, since $z \in A_n$,

(5.31) $P_z(T\psi - \varphi)^2 \le \dfrac{h^4\beta_n^2B_n^2T^2}{9c_n^4}$.

By the Minkowski inequality,

(5.32) $P_z(t_n - \varphi)^2 \le \Big\{\big(P_z(t_n - T\psi)^2\big)^{1/2} + \big(P_z(T\psi - \varphi)^2\big)^{1/2}\Big\}^2$.

By (i), (ii), (iii) and (A4), the $P$-integrals over the set $A_n$ of the right-hand sides of (5.30) and (5.31) converge to zero as $n \to \infty$. Thus, by the Schwarz inequality, the $P$-integral over $A_n$ of the right-hand side of (5.32) converges to zero, and it follows that $\int_{A_n}(t_n - \varphi)^2\,dP \to 0$, which completes the proof.

Example 5.2. This example points out that Theorem 5.2 applies to the Poisson-Normal$(\xi,1)$ case. In this case $r(y) = e^{-y^2/2}$, so (A3) is satisfied. Let $G$ be such that (A2) is satisfied. $T(x) = x+1$ and $P(x+1)^2 = G(P_\theta(x+1)^2)$, where $P_\theta$ denotes the Poisson distribution with parameter $\theta$. Since $P_\theta(x+1)^2 = \theta^2 + 3\theta + 1$ and (A2) holds, (A4) holds. Also, in this case,
$$r_n = \exp(-(N_n+1)^2/2), \qquad m_n = ((N_n+1)!)^{-1} \ge (N_n+1)^{-(N_n+1)}.$$
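For completeness, the Poisson moment computation quoted in Example 5.2 is the standard one:
$$P_\theta(x+1)^2 = P_\theta(x^2) + 2P_\theta(x) + 1 = (\theta + \theta^2) + 2\theta + 1 = \theta^2 + 3\theta + 1,$$
using $P_\theta(x) = \theta$ and $P_\theta(x^2) = \theta + \theta^2$ for the Poisson distribution. Hence $P(T^2) = P(x+1)^2 = G(\theta^2 + 3\theta + 1)$, which is finite under (A2), so that (A4) holds.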
BIBLIOGRAPHY

Deely, J.J. and Kruse, R.L. (1968). Construction of Sequences Estimating the Mixing Distribution. Ann. Math. Statist., 39, 286-288.

Feller, W. (1957). An Introduction to Probability Theory and Its Applications, Volume I (2nd ed.). Wiley, New York.

Gilliland, D. (1966). Approximation to Bayes Risk in Sequences of Non-Finite Decision Problems. Tech. Report No. 10, Department of Statistics and Probability, Michigan State University.

Gilliland, D. and Hannan, J. (1968). The Role of Normal Approximation in Compound Decision Theory Problems. Research Memorandum RM-218, Department of Statistics and Probability, Michigan State University.

Hoeffding, W. (1963). Probability Inequalities for Sums of Bounded Random Variables. J. Amer. Statist. Assoc., 58, 13-30.

Johns, M.V. (1966). Two-Action Compound Decision Problems. Tech. Report No. 87, Department of Statistics, Stanford University.

Lehmann, E.L. (1959). Testing Statistical Hypotheses. Wiley, New York.

Loève, M. (1963). Probability Theory (3rd ed.). Van Nostrand, Princeton.

Macky, D. (1966). Empirical Bayes Estimation in an Exponential Family. Research Memorandum RM-176, Department of Statistics and Probability, Michigan State University.

Parzen, E. (1959). Convergence of Families of Sequences of Random Variables. University of California Publications in Statistics, 2, 23-53.

Robbins, H. (1964). The Empirical Bayes Approach to Statistical Decision Problems. Ann. Math. Statist., 35, 1-20.

Teicher, H. (1961). Identifiability of Mixtures. Ann. Math. Statist., 32, 244-248.

Tucker, H. (1963). An Estimate of the Compounding Distribution of a Compound Poisson Distribution. Theory of Probability and Its Applications, 8, 195-200.

Wolfowitz, J. (1953). Estimation by the Minimum Distance Method. Ann. Inst. Statist. Math., 5, 9-23.