RATES OF CONVERGENCE IN EMPIRICAL BAYES TWO-ACTION AND ESTIMATION PROBLEMS AND IN EXTENDED SEQUENCE-COMPOUND ESTIMATION PROBLEMS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
BENITO ONG YU
1970

This is to certify that the thesis entitled "Rates of Convergence in Empirical Bayes Two-Action and Estimation Problems and in Extended Sequence-Compound Estimation Problems", presented by Benito Ong Yu, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability. Date: 5/30/70.

ABSTRACT

By Benito Ong Yu

Throughout, our component problems concern exponential families of distributions of $X$ conditional on the parameter $\theta$.

In Part I we consider exponential families determined by a measure with Lebesgue density $h$, where $h(x) > 0$ if and only if $x > a$, and assume the parameter $\theta$ has a distribution $G$. Based on a sequence of observations $X_1, X_2, \ldots, X_n$, i.i.d. according to the marginal distribution of $X$, estimates of the posterior mean are used to define estimates for the Bayes test in the linear loss two-action problem. Rates of convergence of the excess risk are obtained under certain integrability conditions. The scale parameter exponential and the location parameter Normal densities are given as examples where the finiteness of certain moments of $G$ is sufficient for these integrability conditions. These results, proved under weaker hypotheses than those of Johns and Van Ryzin (1967), are obtained under the assumption that $h^{(r)}$ exists for some $r \ge 2$. Analogous results are also obtained without any differentiability assumption on $h$.
In the squared error loss estimation problem, a truncation of the previous estimates for the posterior mean is used to estimate $\theta$. By a different method of proof, rates of convergence of the excess risk are established. It is shown that the excess risk of the linear loss two-action problem is bounded by the square root of that of the estimation problem and, consequently, certain improved rates in the location parameter Normal two-action problem can be obtained as a corollary to those obtained in the estimation problem.

In Part II we consider certain discrete exponential and the location parameter Normal families, and assume that the parameter $\theta$ is bounded. Based on all past observations $X_1, X_2, \ldots, X_n$, with the $X_i$ conditional on $\theta_i$ being independently distributed according to $P_{\theta_i}$, squared error loss estimation of $\theta_n$ is considered with the aim that the average risk across the first $n$ problems approach the extended Bayes envelope $R^k(G_n^k)$ evaluated at $G_n^k$, the empirical distribution function of the $k$-vectors $(\theta_1, \ldots, \theta_k), (\theta_2, \ldots, \theta_{k+1}), \ldots, (\theta_{n-k+1}, \ldots, \theta_n)$. Swain (1965) obtained rates of $O(n^{-1/2} \log^k n)$ and $o(1)$ for the discrete exponential and the Normal families, respectively. Gilliland (1966 and 1968) considered the unextended ($k = 1$) versions of these problems and obtained improved rates of $O(n^{-1/2})$ and $O(n^{-1/5})$, respectively. In Chapters 3 and 4, the same order of improved rates, namely $O(n^{-1/2})$ and $O(n^{-1/(k+4)})$, are obtained in these families, respectively.

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1970

ACKNOWLEDGEMENTS

I wish to express my sincere gratitude to Professor James F. Hannan for the patience he accorded me in the preparation of this thesis.
His guidance and comments greatly improved and simplified the results in this work. To Professor Dennis C. Gilliland, I owe my thanks for his encouraging and helpful suggestions in reviewing the second part of this thesis. This work was made possible through the financial support provided by the Department of Statistics and Probability and the National Science Foundation.

TABLE OF CONTENTS

PART I  EMPIRICAL BAYES IN EXPONENTIAL FAMILIES
  Introduction
  Chapter 1  LINEAR LOSS TWO-ACTION PROBLEM
    1.1 Introduction
    1.2 The Empirical Bayes Problem
    1.3 Exponential Families
    1.4 Summary and Some Useful Results
    1.5 Main Result and Examples
    1.6 Result Without Differentiability of $h$
  Chapter 2  SQUARED ERROR LOSS ESTIMATION PROBLEM
    2.1 Introduction
    2.2 Estimation of $\psi_G = P_x\theta$
    2.3 Summary
    2.4 Main Results and Examples
PART II  EXTENDED SEQUENCE-COMPOUND ESTIMATION
  Introduction
  Chapter 3  ESTIMATION IN DISCRETE EXPONENTIAL FAMILIES UNDER SQUARED ERROR LOSS
    3.1 Introduction
    3.2 A Bound for the Modified Regret $D_n$
    3.3 Estimation in Discrete Exponential Families under Squared Error Loss
  Chapter 4  SQUARED ERROR LOSS ESTIMATION IN THE NORMAL FAMILY
    4.1 Introduction
    4.2 Bounding $|\ldots - (x+t)|$
REFERENCES
APPENDIX

PART I
EMPIRICAL BAYES IN EXPONENTIAL FAMILIES

INTRODUCTION

Johns and Van Ryzin (1967) studied the empirical Bayes two-action problem in the exponential family.
They used kernel estimates for the marginal density $f$ and its derivative $g$ to define tests $\phi_n$, and showed, in their Theorem 3, that under certain conditions, including (C) and (D) of Theorem 1.1, the risk $R_n(\phi_n, G)$ converges to the Bayes risk $R^*$. Furthermore, a rate was obtained. They gave the scale exponential and the Normal densities as examples where the existence of certain moments of the prior $G$ is sufficient for the conditions (C) and (D). Lin (1968) considered the multivariate estimation problem with squared error loss. A multivariate version of Theorem 2.1 was considered.

Chapter 1 considers the same empirical Bayes two-action problem that Johns and Van Ryzin studied. Theorem 1.1 improves upon their Theorem 3 by deleting assumption (B) in §1.4 and by relaxing (A). The scale exponential and the Normal densities are given to show that in each case their moment assumptions on $G$ can be relaxed.

Chapter 2 considers the squared error loss estimation problem. Using a truncation different from that of Lin, Theorem 2.1 establishes a certain rate of convergence. Lemma 2.4 shows that for certain natural tests derivable from estimates the excess risk in the two-action problem is bounded by the square root of the corresponding excess risk in the estimation problem. Corollary 2.3 utilizes this fact to obtain better rates for the Normal two-action problem (Corollary 1.2) from those obtained in the Normal estimation problem (Corollary 2.2). The improved rates are exactly those corresponding to priors not having a finite $(3 + \sqrt{89})/10$-th absolute moment.

Notational Conventions. Sets and their corresponding indicator functions will be used interchangeably. The same symbols will be used to denote distribution functions and their induced Lebesgue-Stieltjes measures. For any measure $\mu$, the $\mu$-integral of $Y$ will be denoted by $\mu Y$, $\mu[Y]$ or $\mu\{Y\}$. Dependence on arguments will be suppressed for simplicity, and dummy variables of integration will not be displayed except for emphasis.
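The comparison of rates described in the introduction can be illustrated numerically. The sketch below is not part of the thesis: it assumes the limiting exponents obtained by letting $r \to \infty$ and $t \to 0$ in the moment conditions and rates stated later for the Normal family (Corollary 1.2 directly; Lemma 2.3, Theorem 2.1 and Lemma 2.4 via the estimation problem). The function names are hypothetical.

```python
import math

def two_action_rate_direct(m):
    """Limiting regret exponent from Corollary 1.2 (assumed r -> infinity, t -> 0).

    A finite m-th absolute moment, 1 <= m <= 3, permits delta with
    m = 1 + 2*delta/(2 - delta), and gamma = (r-1)*delta/(2r+1) tends to delta/2.
    For m < 1 the corollary gives no rate, which is encoded here as 0.
    """
    if m < 1.0:
        return 0.0
    delta = min(1.0, 2.0 * (m - 1.0) / (m + 1.0))
    return delta / 2.0

def two_action_rate_via_estimation(m):
    """Limiting exponent obtained through the estimation problem.

    Lemma 2.3 needs m = delta/(1 - delta); Theorem 2.1 then gives
    gamma' -> delta/(2 + delta), and Lemma 2.4 halves that exponent.
    """
    delta = m / (1.0 + m)
    return 0.5 * delta / (2.0 + delta)

# Moment order at which the two limiting exponents coincide.
crossover = (3.0 + math.sqrt(89.0)) / 10.0

for m in (1.0, 1.1, crossover, 2.0):
    print(round(m, 4), two_action_rate_direct(m), two_action_rate_via_estimation(m))
```

Under these assumptions the route through the estimation problem gives the larger exponent exactly for moment orders below $(3 + \sqrt{89})/10 \approx 1.243$, which is the positive root of $5m^2 - 3m - 4 = 0$.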
CHAPTER 1
LINEAR LOSS TWO-ACTION PROBLEM

1.1. Introduction.

Let us consider the following hypothesis testing problem. Let $\theta \sim G$. We test $H_1: \theta \le c$ against $H_2: \theta > c$ based on an observation $X$, with $X \mid \theta$ being distributed according to some $F_\theta$ with Lebesgue density $f_\theta$. Let $A_1$ and $A_2$ respectively denote the actions of deciding on $H_1$ and $H_2$, and let $L_1(\theta) \ge 0$, $L_2(\theta) \ge 0$ denote the losses of $A_1$ and $A_2$ when $\theta$ is the true parameter. Let $P$ denote the p-measure on $(X, \theta)$. A randomized test $\phi$ in the Bayes problem above incurs a risk given by

(1.1)  $R(\phi, G) = P\{\phi L_1 + (1 - \phi) L_2\}$.

Let $R^*$ or $R^*(G)$ denote the Bayes risk versus $G$. (We tacitly assume that $P_x(L_1 - L_2)$ is well-defined. This will be the case for the application of the theory to the two-action problem in exponential families with linear losses.) Since a test is Bayes if and only if it minimizes the expected loss given $x$,

(1.2)  $\phi_G(x) = [P_x(L_1 - L_2) \le 0]$

is Bayes versus $G$. Johns (1957) considered the linear losses

(1.3)  $L_1(\theta) = (\theta - c)^+$,  $L_2(\theta) = (\theta - c)^-$,

for which, as a consequence, $P_x(L_1 - L_2)$ is expressible in terms of the posterior mean; that is,

(1.4)  $P_x(L_1 - L_2) = P_x(\theta - c)$.

Hereafter, unless stated otherwise, we will assume that $L_1$ and $L_2$ are as defined in (1.3).

We remark that, although the losses in (1.3) are unbounded, the Bayes risk $R^*(G)$ may be uniformly bounded on the class of all priors; for example, let $X \sim N(\theta, 1)$ and consider the natural test $\phi'(X) = [X \le c]$. Taking conditional expectation given $\theta$, $P_\theta\{\phi' L_1 + (1 - \phi') L_2\} = |\theta - c|\,\Phi(-|\theta - c|)$ is less than $(2\pi)^{-1/2}$ by the Normal tail bound (Feller (1962), p. 166). Therefore, the Bayes risk in the Normal two-action problem is less than $(2\pi)^{-1/2}$ whatever be $G$.

1.2. The Empirical Bayes Problem.

In this chapter we shall consider the case when a sequence of past observations $X_1, X_2, \ldots, X_n$ is available, with each of the $X$'s i.i.d. according to the marginal distribution of $X$.
At the $(n+1)$st problem, the decision rule $\phi_n$ is allowed to depend on all the past observations as well as the $(n+1)$st. Hence, $\phi_n$ is a measurable function of $X_1, X_2, \ldots, X_n$ and $X = X_{n+1}$. With $P$ extended to denote the product measure on $(X, \theta), X_1, X_2, \ldots, X_n$, we can express the risk of $\phi_n$ by

(1.5)  $R_n(\phi_n, G) = P\{\phi_n L_1 + (1 - \phi_n) L_2\}$.

We note that since $P_{X_1, \ldots, X_n, X}\{g(\theta)\} = P_X\{g(\theta)\}$ for any function $g(\theta)$, it follows that $\phi_G$ continues to be Bayes in the empirical Bayes problem. This motivates the use of the excess risk (regret)

(1.6)  $R_n - R^* = R_n(\phi_n, G) - R^*$

as a measure of goodness of a test $\phi_n$. Restricting $G$ to those with finite Bayes risk, the excess risk satisfies

(1.7)  $0 \le R_n - R^* = P\{(\phi_n - \phi_G)(P_x\theta - c)\}$.

Note that the integrand $(\phi_n - \phi_G)(P_x\theta - c)$ is non-negative since $\phi_G$ continues to be Bayes.

1.3. Exponential Families.

Let $h$ be a non-negative measurable function defined on the real line, and

$\Omega = \{-\infty < \theta < \infty : \int e^{-\theta x} h\,dx < \infty\}$.

For each $\theta$ in the natural parameter space $\Omega$, let

(1.8)  $f_\theta(x) = \beta(\theta)\, h(x)\, e^{-\theta x}$,  where $1/\beta(\theta) = \int e^{-\theta x} h(x)\,dx$.

The following lemma, due to Professor J. Hannan, yields a choice $h_{\mathcal{G}}$ of $h$ such that on the set of $x$ for which $h_{\mathcal{G}}$ is positive, the function

(1.9)  $J(x) = \int \beta(\theta)\, e^{-\theta x}\, dG(\theta)$

is infinitely differentiable and its derivatives can be computed by repeated differentiation under the integral sign.

Lemma 1.1. Let $\mathcal{G} = \{G : G$ is a distribution on $\Omega\}$ and $C_G = \{x : J(x) < \infty\}$ for each $G \in \mathcal{G}$. Then there exists a determination $h_{\mathcal{G}}$ within the Lebesgue equivalence class of $h$ (independent of $G \in \mathcal{G}$), for which $[h_{\mathcal{G}} > 0] \subset \mathrm{int}(C_G)$, whatever be $G$.

Proof. The fact that $hJ$ is a density implies that $[h > 0] \subset C_G$ a.e. for each $G \in \mathcal{G}$. The closed convex set $\bar{C}_{\mathcal{G}} = \bigcap\{\bar{C}_G : G \in \mathcal{G}\}$ is also the countable intersection $\bigcap\{\bar{C}_{G_r} : \text{rational } r \notin \bar{C}_{\mathcal{G}}\}$, where $\bar{C}_{G_r}$ is any one of the $\bar{C}_G$ that excludes $r$. The above considerations, together with the fact that a countable union of null sets is null, imply that $[h > 0] \subset \bar{C}_{\mathcal{G}}$ a.e. and, therefore, also $[h > 0] \subset \mathrm{int}(\bar{C}_{\mathcal{G}})$ a.e.
Hence, by defining $h_{\mathcal{G}} = 0$ off $\mathrm{int}(\bar{C}_{\mathcal{G}})$ and $h_{\mathcal{G}} = h$ on $\mathrm{int}(\bar{C}_{\mathcal{G}})$, it follows that $[h_{\mathcal{G}} > 0] \subset \mathrm{int}(\bar{C}_{\mathcal{G}}) \subset \mathrm{int}(\bar{C}_G) \subset \mathrm{int}(C_G)$, whatever be $G$.

Since $J$ is well known to be infinitely differentiable on $\mathrm{int}(C_G)$ and its derivatives can be computed by repeated differentiation under the integral sign, it follows that the same holds true on the subset $[h_{\mathcal{G}} > 0]$. Therefore, with

(1.10)  $f = \int f_\theta\, dG(\theta)$

denoting the marginal density, the existence of $h_{\mathcal{G}}^{(r)}$ on $[h_{\mathcal{G}} > 0]$ will imply the existence of $f^{(r)}$ via Leibniz's rule of differentiation for the product $f = J h_{\mathcal{G}}$. We shall make use of this fact immediately after the following summary.

1.4. Summary and Some Useful Results.

Johns and Van Ryzin (1967) considered the two-action empirical Bayes problem in exponential families with densities (1.8) under the additional assumption that there is an $a \ge -\infty$ such that

(1.11)  $h(x) > 0$ if and only if $x > a$.

For each integer $r \ge 2$, they exhibited procedures $\phi_n$ such that under the assumptions

(A)  $h^{(r)}$ exists and is continuous for $x > a$, and
(B)  $G|\theta|^r < \infty$,

together with the conditions (C) and (D) of Theorem 1.1, the regret can be shown to converge to zero at a rate no worse than $n^{-\gamma}$, where $\gamma = (r-1)\delta/(2r+1)$ and $0 \le \delta \le 2$. Moreover, they gave the Normal $(-\theta, 1)$ and the scale exponential families as examples where conditions (C) and (D) hold for some $0 \le \delta \le 1$ whenever the prior $G$ has certain moments finite.

We shall show in Theorem 1.1 that only the existence of $h^{(r)}$ together with (C) and (D) are required for the regret convergence of $O(n^{-\gamma})$. The Normal and the scale exponential examples will be discussed in Corollaries 1.1 and 1.2, and we will show that in each case their moment assumptions can be relaxed. We will further show in Theorem 1.2 that analysis similar to that in Theorem 1.1 can be carried out in exponential families (1.8) where $h$ is not assumed to have any derivatives.
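To make the exponent $\gamma = (r-1)\delta/(2r+1)$ of the summary concrete, the following minimal sketch tabulates it for a few values of $r$. The helper name `regret_exponent` is hypothetical and only evaluates the formula stated above; it shows that for fixed $\delta$ the exponent increases with $r$ but stays below $\delta/2$.

```python
from fractions import Fraction

def regret_exponent(r, delta):
    """Exponent gamma = (r - 1) * delta / (2r + 1) of the O(n^-gamma) regret bound."""
    if r < 2:
        raise ValueError("the summary requires an integer r >= 2")
    return Fraction(r - 1) * Fraction(delta) / Fraction(2 * r + 1)

# With delta = 1 the exponent approaches 1/2 from below as r grows.
for r in (2, 5, 50):
    print(r, regret_exponent(r, 1))
```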
In the remainder of Part I, $\mathcal{G}$ is assumed to be the class of priors $G$ for which the Bayes risk is finite, and only exponential families as defined in (1.8) and (1.11) will be considered; moreover, since $[x \le a]$ is a $P$-null set, all statements are assumed to be quantified by $x > a$ unless stated otherwise.

We note that since $[x > a]$ is an open set, the $h$ in (1.11) is already its own $h_{\mathcal{G}}$ determination. By the remark following Lemma 1.1, the existence of $h^{(r)}$ implies the existence of $f^{(r)}$. This improves upon Lemmas 2, 3 and 4 of Johns and Van Ryzin in that their respective moment assumptions $G|\theta| < \infty$, $G|\theta|^r < \infty$ and $G|\log\theta|^r < \infty$ are deleted.

For the exponential family in (1.8) and (1.11),

(1.12)  $P_x(\theta) = -J^{(1)}/J$  (for $x > a$).

Hence, the quantity $P_x(L_1 - L_2) = P_x(\theta - c)$ and, therefore, also the Bayes test $\phi_G$ in (1.2), are well defined without any assumption on $G$. In addition, if $h^{(1)}$ exists then, with

(1.13)  $v = h^{(1)}/h$,  $g = f^{(1)}$  and  $\alpha = f\,P_x(\theta - c)$,

we have

(1.14)  $P_x(\theta) = v - g/f$  and  $\alpha = (v - c)f - g$.

We note that the Bayes test in (1.2) becomes

(1.15)  $\phi_G(x) = [\alpha(x) \le 0]$.

When a sequence of i.i.d. observations $X_1, \ldots, X_n$ and $X$ is available, it is the special form of $\phi_G$ in (1.15) that we will exploit in defining reasonable estimates $\phi_n$ by estimating the density $f$ and its derivative $g$ by the kernel method so successfully employed by Johns and Van Ryzin.

To conclude this section, we state and prove Lemma 1 of Johns and Van Ryzin (1967) as a consequence of (1.7).

Lemma 1.2. Let $\alpha_n$ be any measurable function of $X_1, \ldots, X_n$ and $X$. Then the excess risk of

(1.16)  $\phi_n = [\alpha_n \le 0]$

satisfies

(1.17)  $0 \le R_n - R^* \le \int |\alpha|\, P_x[\,|\alpha_n - \alpha| \ge |\alpha|\,]\,dx$.

Proof. From (1.7) and (1.13),

(1.18)  $0 \le R_n - R^* = \int |\alpha|\, P_x|\phi_n - \phi_G|\,dx$.

The result follows from (1.18) since $|\phi_n - \phi_G| \le [\,|\alpha_n - \alpha| \ge |\alpha|\,]$.

1.5. Main Result and Examples.

In view of (1.14) and (1.17), the excess risk $R_n - R^*$ can be made small if $f$ and $g$ can be adequately estimated.
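As an illustration of how (1.14) to (1.16) turn density estimates into a test, here is a minimal sketch for the Normal $(-\theta, 1)$ family, where $h(x) = e^{-x^2/2}$ and hence $v = -x$. It assumes a Gaussian kernel for both $f$ and $f^{(1)}$, which is a swapped-in choice and not the kernels $K_0$, $K_1$ of the thesis appendix; the function names and the degenerate prior in the usage lines are likewise hypothetical.

```python
import math
import random

def gaussian_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def plug_in_test(x, past, c, bandwidth):
    """Return (phi_n, alpha_n) at the current observation x.

    Normal(-theta, 1) family: h(x) = exp(-x^2/2), so v = h'/h = -x.
    f_n and g_n are Gaussian-kernel estimates of f and f' (an assumption;
    the thesis uses the kernels of its appendix instead).
    """
    n = len(past)
    f_n = 0.0
    g_n = 0.0
    for xj in past:
        z = (xj - x) / bandwidth
        k = gaussian_pdf(z)
        f_n += k
        g_n += z * k  # d/dx of K((xj - x)/b) equals (z / b) * K(z)
    f_n /= n * bandwidth
    g_n /= n * bandwidth ** 2
    alpha_n = (-x - c) * f_n - g_n  # alpha_n = (v - c) f_n - g_n
    return (1 if alpha_n <= 0 else 0), alpha_n

# Usage: prior degenerate at theta0 = 1, so the posterior mean is 1 and
# the Bayes test takes action A_1 (value 1) exactly when 1 <= c.
rng = random.Random(0)
theta0 = 1.0
past = [rng.gauss(-theta0, 1.0) for _ in range(2000)]
b = len(past) ** (-1.0 / 5.0)  # n^(-1/(2r+1)) with r = 2
decision_low_c, _ = plug_in_test(-1.0, past, c=0.0, bandwidth=b)
decision_high_c, _ = plug_in_test(-1.0, past, c=2.0, bandwidth=b)
```

A return value of 1 corresponds to deciding $H_1: \theta \le c$. The sketch only illustrates the plug-in construction; it makes no claim about the rates proved for the appendix kernels.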
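The appendix provides kernel estimates $f_n$ and $g_n$ for which the bias terms $P_x f_n - f$ and $P_x g_n - g$ are small. These estimates will be used in the obvious way to define $\alpha_n$ and $\phi_n$ in (1.19). Theorem 1.1 below is an improvement of Theorem 3 of Johns and Van Ryzin (1967) in that their assumptions that $G|\theta|^r < \infty$ and that $h^{(r)}$ is continuous are deleted. Their proof is reproduced below for completeness.

For each integer $r \ge 2$, let

(1.19)  $\phi_n = [\alpha_n \le 0]$,  where $\alpha_n = (v - c) f_n - g_n$,

with

$f_n(x) = n^{-1} \sum_{j=1}^{n} W_j^0(\Delta)$,  $W_j^0(\Delta) = \Delta^{-1} K_0((X_j - x)/\Delta)$,

and

$g_n(x) = (n\Delta)^{-1} \sum_{j=1}^{n} \{W_j^1(2\Delta) - W_j^1(\Delta)\}$,  $W_j^1(\Delta) = \Delta^{-1} K_1((X_j - x)/\Delta)$,

being the type of kernel estimates of $f$ and $g$ given in (A.8) of the appendix. We note that $r \ge 2$ is required in (A.1).

Theorem 1.1. Let $\phi_n$ be as in (1.19) with $\Delta = n^{-1/(2r+1)}$. If $h^{(r)}$ exists (for $x > a$), and if there is some $\varepsilon > 0$ such that

(C)  $\int |\alpha|^{1-\delta} (1 + |v|)^{\delta}\, (q_\varepsilon^{(0)})^{\delta/2}\, dx < \infty$,
(D)  $\int |\alpha|^{1-\delta} (1 + |v|)^{\delta}\, (q_\varepsilon^{(r)})^{\delta}\, dx < \infty$,

where $q_\varepsilon^{(0)}(x) = \sup_{0 \le u \le 1} f(x + \varepsilon u)$ and $q_\varepsilon^{(r)}(x) = \sup_{0 \le u \le 1} |f^{(r)}(x + \varepsilon u)|$, [...]

For $\delta > 0$, the $C_r$-inequality yields

(1.21)  $|f_n - f|^{\delta} \le c_\delta \{ |f_n - P_x f_n|^{\delta} + |P_x f_n - f|^{\delta} \}$,

and for $0 < \delta < 2$, Hölder's inequality yields

$P_x|f_n - P_x f_n|^{\delta} \le (\mathrm{Var}_x f_n)^{\delta/2}$.

Since the above inequality trivially holds for $\delta = 0$ and $2$, it follows from (1.21) that

(1.22)  $P_x|f_n - f|^{\delta} \le c_\delta \{ (\mathrm{Var}_x f_n)^{\delta/2} + |P_x f_n - f|^{\delta} \}$.

Thus by (A.9) and (A.10) of the appendix,

$P_x|f_n - f|^{\delta} \le \mathrm{const} \times \{ [(n\Delta)^{-1} q_\varepsilon^{(0)}]^{\delta/2} + [\Delta^{r} q_\varepsilon^{(r)}]^{\delta} \}$,

so that by (C), (D), and the choice $\Delta = n^{-1/(2r+1)}$, one has

$A = O((n\Delta)^{-\delta/2}) + O(\Delta^{r\delta}) = O(n^{-r\delta/(2r+1)})$.

Similarly, for $0 \le \delta \le 2$,

$P_x|g_n - g|^{\delta} \le c_\delta \{ (\mathrm{Var}_x g_n)^{\delta/2} + |P_x g_n - g|^{\delta} \} \le \mathrm{const} \times \{ [(n\Delta^3)^{-1} q_\varepsilon^{(0)}]^{\delta/2} + [\Delta^{r-1} q_\varepsilon^{(r)}]^{\delta} \}$,

so that by (C) and (D),

$B = O((n\Delta^3)^{-\delta/2}) + O(\Delta^{(r-1)\delta}) = O(n^{-\gamma})$.

The proof is completed by this weaker rate of $B$.

For the remainder of this section, the scale exponential and the location Normal families will be given as examples to illustrate how conditions (C) and (D) relate to the moments of $G$.

Example 1. (Scale Exponential)

Consider the exponential density in (1.8) with $h = [x > 0]$ and $\beta(\theta) = \theta$; i.e., for each $\theta > 0$,

(1.23)  $f_\theta(x) = \theta e^{-\theta x}$ for $x > 0$, and $f_\theta(x) = 0$ otherwise.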
The density $f$ satisfies the following facts:

(1.24a)  $f_\theta$ is monotonically decreasing, and so is $f$.
(1.24b)  Since $h^{(r)} = 0$ for $x > 0$, $f^{(r)}$ exists (for $x > 0$) by Lemma 1.1; moreover, $v = 0$ so that conditions (C) and (D) simplify.
(1.24c)  $|f^{(r)}| = \int \theta^{r} f_\theta\, dG(\theta)$ is monotonically decreasing and, therefore,
(1.24d)  $q_\varepsilon^{(r)} = |f^{(r)}|$.

Corollary 1.1 is an improvement over Corollary 3.1 of Johns and Van Ryzin (1967). They proved the same result under the assumptions $G\theta^{r+1} < \infty$ and (1.26) below.

Corollary 1.1. For the scale exponential in (1.23), the hypothesis of Theorem 1.1 holds for each $0 \le \delta \le 1$ if

(1.25)  $G[\theta^{r}] < \infty$,
(1.26)  $G[\theta^{-\eta}] < \infty$, where $\eta = (1+t)\delta/(2-\delta)$ for some $t > 0$.

Proof. Since $v = 0$, condition (D) simplifies and is implied by the integrability of $\alpha$ and $q_\varepsilon^{(r)}$, subsequently illustrated. By Tonelli's theorem (Royden (1965), p. 234),

$\int |\alpha|\,dx \le \int\!\!\int |\theta - c|\, f_\theta\, dG\, dx = G|\theta - c|$.

By (1.24c) and (1.24d),

$\int q_\varepsilon^{(r)}\, dx = \int\!\!\int \theta^{r} f_\theta\, dG\, dx = G[\theta^{r}]$.

Hence, we have shown that $G[\theta^{r}] < \infty$ is sufficient for condition (D).

Let us next verify condition (C). Since $\alpha$ is bounded by $G|\theta(\theta - c)|$, $v = 0$, and $q_\varepsilon^{(0)} = |f| \le G[\theta]$, it follows that, under (1.25), condition (C) is implied by

(1.27)  $\int_1^{\infty} |\alpha|^{1-\delta} f^{\delta/2}\, dx < \infty$.

Since $\theta \le e^{\theta}$, $|f^{(1)}(x)| \le f(x - 1)$ for $x > 1$; consequently, $|\alpha(x)| = |cf + f^{(1)}| \le (c+1) f(x-1)$ for $x > 1$. Thus, by the Hölder inequality,

(1.28)  $\int_1^{\infty} |\alpha|^{1-\delta} f^{\delta/2}\, dx \le (c+1)^{1-\delta} (1/t)^{\delta/2} \{P[1 + X]^{\eta}\}^{1-\delta/2}$.

The proof is completed by the equality $P[X^{\eta}] = G[\theta^{-\eta}]\,\Gamma(1 + \eta)$.

Remark. Corollary 1.1 shows that procedures $\phi_n$ exist for which the regret convergence rate can be arbitrarily close to $n^{-1/2}$ provided $\delta = 1$ and $r$ is sufficiently large, i.e., $G$ has finite $(-1)^{-}$-th as well as arbitrarily high moments.

Example 2. (Normal $(-\theta, 1)$)

Consider the exponential family in (1.8) with $h(x) = e^{-x^2/2}$ and $\beta(\theta) = (2\pi)^{-1/2} e^{-\theta^2/2}$; that is, for each $-\infty < \theta < \infty$,

$f_\theta(x) = (2\pi)^{-1/2} e^{-(\theta + x)^2/2}$,  where $-\infty < x < \infty$.
We have shown earlier (§1.1) that for this family the Bayes risk $R^*(G) < (2\pi)^{-1/2}$ whatever be $G$.

Since the function $e^{-y^2/2} + e^{-(y+\varepsilon)^2/2}$ is symmetric with respect to $y = -\varepsilon/2$, and has a unique minimum there with value $2e^{-\varepsilon^2/8}$, it follows that

(1.29)  $f_\theta(x + t) \le f_\theta(x) + f_\theta(x + \varepsilon)$ for $0 \le t \le \varepsilon \le \sqrt{8 \log 2}$, and hence $q_\varepsilon^{(0)}(x) \le f(x) + f(x + \varepsilon)$.

By repeated differentiation under the integral sign,

$f^{(r)}(x) = (-1)^{r} \int H_r(x + \theta)\, f_\theta(x)\, dG(\theta)$,

where $H_r$ is the $r$-th Hermite polynomial. Thus, for $\varepsilon \le \sqrt{8 \log 2}$,

(1.30)  $|f^{(r)}(x)| \le \sum_{j=0}^{r} |a_j| \int |x + \theta|^{j} f_\theta(x)\, dG(\theta)$,  $q_\varepsilon^{(r)}(x) \le \sum_{j=0}^{r} |a_j|\, c_j \int \{\ldots\}\, dG(\theta)$,

where the second inequality follows from the first via (1.29) and the $C_r$-inequality. Lastly,

(1.31)  $f_\theta \le (2\pi)^{-1/2}$,  $f \le (2\pi)^{-1/2}$,  $|\alpha| \le \int |\theta - c|\, f_\theta\, dG(\theta) \le (2\pi)^{-1/2}\, G|\theta - c|$,  $q_\varepsilon^{(0)} \le (2\pi)^{-1/2}$,  and $q_\varepsilon^{(r)}$ is bounded.

Remark. Corollary 1.2 below is an improvement of Corollary 3.2 of Johns and Van Ryzin. They proved the corollary under the stronger assumptions $G|\theta|^{1 + (3+t)\delta/(2-\delta)} < \infty$ and $G|\theta|^{r} < \infty$.

Corollary 1.2. Consider the Normal $(-\theta, 1)$ family. For each $0 \le \delta \le 1$, if

(1.32)  $G|\theta|^{1 + (2+t)\delta/(2-\delta)} < \infty$ for some $t > 0$,

then the hypothesis of Theorem 1.1 holds for each $r \ge 2$.

Proof. Condition (D) is implied by the integrability of $\alpha$ and $|x|\, q_\varepsilon^{(r)}$, since $1 + |v| = 1 + |x|$ is bounded by $2|x|$ for $|x| > 1$. By (1.31), if $G|\theta| < \infty$ then

(1.33)  $\int |\alpha|\,dx \le G|\theta - c| < \infty$.

Denote by $b_j$ the constant $(2\pi)^{-1/2} \int |z|^{j} e^{-z^2/2}\, dz$. Since $P_\theta|x + \theta|^{j} = b_j$, it follows, by the triangle inequality, that

$P_\theta[\,|x|\,|x + \theta|^{j}\,] \le P_\theta[(|x + \theta| + |\theta|)\,|x + \theta|^{j}] = b_{j+1} + |\theta|\, b_j$.

Hence, $G|\theta| < \infty$ implies $P(|x|\,|x + \theta|^{j}) < \infty$ for each $j$ and therefore $|x|\, q_\varepsilon^{(r)}$ is integrable by (1.30). This completes the verification of (D) under $G|\theta| < \infty$.

Let us next consider condition (C) for $\delta = 0$, $\delta = 1$, and $0 < \delta < 1$.

Case 1 ($\delta = 0$). (1.33) proves this case.

Case 2 ($\delta = 1$). Since $|v| = |x|$ and $q_\varepsilon^{(0)} \le (2\pi)^{-1/2}$, we need only to verify the integrability of $[\,|x| > 1\,]\, |x|\, (q_\varepsilon^{(0)})^{1/2}$.
By Hölder's inequality,

$\int_{|x| > 1} |x|\, (q_\varepsilon^{(0)})^{1/2}\, dx \le (2/t)^{1/2} \{ \int |x|^{3+t}\, q_\varepsilon^{(0)}\, dx \}^{1/2}$,

where the last integral is bounded by

$\int |x|^{3+t} (f(x) + f(x + \varepsilon))\, dx \le P[\,|x + \theta| + |\theta|\,]^{3+t} + P[\,|x + \theta| + |\theta + \varepsilon|\,]^{3+t}$

via (1.29) and the triangle inequality. Again by the fact that $(x + \theta)$ given $\theta$ is standard Normal, $G|\theta|^{3+t} < \infty$ implies Case 2.

Case 3 ($0 < \delta < 1$). Let $0 \le \xi \le 1$, $0 < t$. With $0 < 1/p = \delta/2 < 1$, $0 < 1/q = (2 - \delta)/2 < 1$, $X = |x|^{(1-\xi)\delta} (q_\varepsilon^{(0)})^{\delta/2}$ and $Y = |x|^{\xi\delta} |\alpha|^{1-\delta}$ in the Hölder inequality, it follows that (C) is implied by the integrability of $X^{p}$ and $Y^{q}$. By (1.29),

$\int X^{p}\, dx \le P|x|^{2(1-\xi)} + P|x - \varepsilon|^{2(1-\xi)}$,

so that $G|\theta|^{2(1-\xi)} < \infty$ implies the integrability of $X^{p}$. If $G|\theta| < \infty$, then $\alpha$ is bounded by (1.31), and $Y$ is bounded on $|x| \le 1$. Therefore, the integrability of $Y^{q}$ is implied by that of $[\,|x| > 1\,]\, Y^{q}$. By Hölder's inequality,

$\int [\,|x| > 1\,]\, Y^{q}\, dx \le (2/t)^{\delta/(2-\delta)} \times \{ \int |x|^{u} |\alpha|\, dx \}^{2(1-\delta)/(2-\delta)}$,

where $u = \tfrac{1}{2}(1 + t + 2\xi)\delta/(1 - \delta)$. By Tonelli's theorem,

$\int |x|^{u} |\alpha|\, dx \le \int |\theta - c|\, P_\theta|x|^{u}\, dG(\theta)$.

Since $(x + \theta)$ given $\theta$ is standard Normal, $P_\theta|x|^{u}$ is bounded by

$P_\theta(|x + \theta| + |\theta|)^{u} \le c_u \times \{ P_\theta|x + \theta|^{u} + |\theta|^{u} \}$

by the $C_r$-inequality. Thus, $G|\theta|^{1+u} < \infty$ implies that $Y^{q}$ is integrable. Balancing between $1 + u$ and $2(1 - \xi)$, we get that $\max(1 + u,\, 2(1 - \xi))$ is minimized when $1 - 2\xi = \delta(2 + t)/(2 - \delta)$, so that $2(1 - \xi) = 1 + \delta(2 + t)/(2 - \delta)$. Therefore, (1.32) implies Case 3.

Remark. Corollary 1.2 shows that for the Normal $(-\theta, 1)$ family there exist procedures for which the regret convergence to zero is of a rate no worse than $n^{-\gamma}$, provided that the prior has a finite $[1 + \delta(2 + t)/(2 - \delta)]$-th absolute moment, where $0 \le \delta \le 1$. In the case where $\delta = 1$ and $r$ is sufficiently large, a rate close to $n^{-1/2}$ can be achieved provided the prior $G$ has $3^{+}$ absolute moments. However, for $\delta = 0$, the finiteness of the first moment of $G$ guarantees only the boundedness of the excess risk. This lack of rate will be removed in Corollary 2.4.

1.6. Result Without Differentiability of $h$.

In Section 1.5 we discussed the exponential family in (1.11) and (1.8).
We took advantage of the existence of $h^{(r)}$ and obtained the result in Theorem 1.1. In this section we shall not assume $h$ to have any derivative. We recall from (1.9) the definition

(1.34)  $J(x) = \int e^{-\theta x} \beta(\theta)\, dG(\theta)$.

It was shown in Lemma 1.1 that $J$ is infinitely differentiable on $[h_{\mathcal{G}} > 0]$ and, therefore, also on $[x > a]$. Since $f = Jh$, it follows from (1.12), (1.13) that

(1.35)  $\alpha = -(J^{(1)} + cJ)h$.

In view of the method of attack exhibited in Sections 1.4 and 1.5, we shall estimate $\phi_G$ through $J$ and $J^{(1)}$. For each $r \ge 2$, let

(1.36)  $J_n(x) = n^{-1} \sum_{j=1}^{n} W_j^0(\Delta)/h(X_j)$,  $J'_n(x) = (n\Delta)^{-1} \sum_{j=1}^{n} \{W_j^1(2\Delta) - W_j^1(\Delta)\}/h(X_j)$,

where $W_j^0$ and $W_j^1$ are as defined in (A.8) of the appendix. Let

(1.37)  $\phi_n = [\alpha_n \le 0]$,  where $\alpha_n = -(J'_n + cJ_n)h$.

Theorem 1.2. Let $\phi_n$ be as in (1.37). Consider the exponential family in (1.11) and (1.8). For each $0 \le \delta \le 2$, if there exists some $\varepsilon > 0$ such that

(C')  $\int |\alpha|^{1-\delta} (T_\varepsilon^{1/2} h)^{\delta}\, dx < \infty$,  $T_\varepsilon(x) = \sup_{0 \le u \le 1} J(x + \varepsilon u)/h(x + \varepsilon u)$,
(D')  $\int |\alpha|^{1-\delta} (S_\varepsilon^{(r)} h)^{\delta}\, dx < \infty$,  $S_\varepsilon^{(r)}(x) = \sup_{0 \le u \le 1} |J^{(r)}(x + \varepsilon u)|$,

[...]

The proof is completed by the weaker rate of $B$.

Example 3. Consider the exponential family with

(1.39)  $h = [0 < x \le 1] + 2[1 < x < \infty]$.

Then $\Omega = (0, \infty)$ and $\beta(\theta) = \theta/(1 + e^{-\theta})$. We note that

(1.40a)  $h$ is non-decreasing while $J$ is strictly decreasing,
(1.40b)  $|J^{(r)}| = \int \theta^{r} \beta(\theta)\, e^{-\theta x}\, dG(\theta)$,
(1.40c)  $S_\varepsilon^{(r)} = |J^{(r)}|$ and $T_\varepsilon = J/h$.

Corollary 1.3. Consider the exponential family with $h$ in (1.39). The hypothesis of Theorem 1.2 holds provided (1.25) and (1.26) hold.

Proof. The proof of Corollary 1.1 works with

(1.41)  $q_\varepsilon^{(r)}$, $q_\varepsilon^{(0)}$, $f$, $|\alpha| \le G[\theta|\theta - c|]$ and $f \le G[\theta]$

respectively replaced by

(1.42)  $S_\varepsilon^{(r)}$, $T_\varepsilon$, $J$, $|\alpha| \le 2G[\theta|\theta - c|]$ and $f \le 2G[\theta]$.

CHAPTER 2
SQUARED ERROR LOSS ESTIMATION PROBLEM

2.1. Introduction.

Suppose $\theta$ is distributed according to some prior $G$, and one is to estimate $\theta$ based on an observation $X$ with $X \mid \theta$ distributed according to the exponential family given in (1.8) and (1.11); that is, for some $a \ge -\infty$,

(2.1)  $f_\theta(x) = \beta(\theta)\, h\, e^{-\theta x}$,

where

(2.2)  $h > 0$ if and only if $x > a$.
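For a concrete instance of the family (2.1) and (2.2), take the scale exponential case $h = [x > 0]$, $\beta(\theta) = \theta$ of Example 1 in Chapter 1. The sketch below checks identity (1.12), $P_x\theta = -J^{(1)}/J$, numerically. The two-point prior, the function names and the finite-difference step are illustrative assumptions and do not come from the thesis.

```python
import math

def J(x, atoms):
    """J(x) = sum_i w_i * beta(theta_i) * exp(-theta_i * x), with beta(theta) = theta."""
    return sum(w * th * math.exp(-th * x) for th, w in atoms)

def posterior_mean(x, atoms):
    """Direct posterior mean of theta given X = x under the discrete prior."""
    num = sum(w * th * th * math.exp(-th * x) for th, w in atoms)
    return num / J(x, atoms)

def posterior_mean_via_J(x, atoms, step=1e-5):
    """-J'(x) / J(x), with J' approximated by a central difference."""
    dJ = (J(x + step, atoms) - J(x - step, atoms)) / (2.0 * step)
    return -dJ / J(x, atoms)

prior = [(0.5, 0.3), (2.0, 0.7)]  # hypothetical two-point prior: (theta, weight)
x0 = 1.3
print(posterior_mean(x0, prior), posterior_mean_via_J(x0, prior))
```

The two printed values agree up to the finite-difference error. Since $v = 0$ in this family, the same quantity also equals $-g/f$.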
Let $P$ denote the joint p-measure on $(X, \theta)$ as in Chapter 1. Let the loss function be the squared error loss. The risk of an estimate $\psi$ is then given by $R(\psi, G) = P(\psi - \theta)^2$ with Bayes risk

(2.3)  $R^*(G) = \inf_{\psi} P(\psi - \theta)^2$.

We note that $R$ and $R^*$ denote different quantities in Chapter 1.

In order that the problem not be totally uninteresting, we restrict $G$ to those with finite Bayes risk. We note that the Bayes risk $R^*(G)$ can be uniformly bounded in $G$. For example, let $X \sim N(\theta, 1)$. Then the natural estimate $\psi'(X) = X$ has risk $P(\psi' - \theta)^2 = 1$. Therefore, $R^*(G) \le 1$ whatever be $G$.

Extend $P$ to denote the product p-measure on $(X, \theta), X_1, X_2, \ldots$, and $X_n$. Let $\psi_n$ be any measurable function of $X_1, \ldots, X_n$ and $X$. The risk of $\psi_n$ is then given by

$R_n(\psi_n, G) = P(\psi_n - \theta)^2$.

Let $\psi_G$ be a Bayes estimate versus $G$. If $\psi_n - \psi_G \in L_2(P)$ then $P(\psi_n - \psi_G)(\psi_G - \theta) = 0$, and the excess risk satisfies

(2.4)  $0 \le R_n(\psi_n, G) - R^*(G) = P(\psi_n - \psi_G)^2$.

We recall the following definitions from Chapter 1:

(2.5)  $v = h^{(1)}/h$,  $f = \int f_\theta\, dG(\theta)$,  $g = f^{(1)}$,  and  $J = \int e^{-\theta x} \beta(\theta)\, dG(\theta)$.

It is well known that a Bayes estimate under squared error loss is the posterior mean $P_x\theta$. Hence, by (1.12), the Bayes estimate $\psi_G$ is well defined without any assumption on the prior $G$. Furthermore, (1.14) remains valid with $P_x\theta$ replaced by $\psi_G$, i.e.,

(2.6)  $\psi_G = v - g/f$.

In view of (2.4), it is now a matter of estimating $\psi_G$ by estimating the density $f$ and its derivative $g$.

2.2. Estimation of $\psi_G = P_x\theta$.

We shall exploit the expression in (2.6) in estimating $\psi_G$ when a sequence of observations $X_1, \ldots, X_n$, i.i.d. according to the common density $f$, is available.

Let $f_n$ and $g_n$ respectively be any estimates of $f$ and $g$. Let $\eta > 0$. Truncate $f_n$ away from 0 by

(2.7)  $f'_n = f_n \vee \eta$,

and define

(2.8)  $\psi_n = v - g_n/f'_n$.

Lemma 2.1. For each $\eta > 0$, the estimate $\psi_n$ in (2.8) satisfies

(2.9)  $P(\psi_n - \psi_G)^2 \le 3(\eta^{-2} A + \eta^{-2} B + C)$,

where $A = P(g_n - g)^2$, $B = P(g/f)^2 (f_n - f)^2$, and $C = P(g/f)^2 [f < \eta]$.

Proof.
From (2.7) and (2.8), simple algebraic manipulation followed by the triangle inequality will yield

(2.10)  $\eta\,|\psi_n - \psi_G| \le f'_n\, |g_n/f'_n - g/f| \le |g_n - g| + |g/f|\, |f - f'_n|$.

Since $|f - f'_n| \le \eta[f < \eta] + |f - f_n|$, the proof follows from (2.10) and the inequality $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$.

Lemma 2.1 shows that for any estimate $\psi_n$ of the form in (2.8), the regret can be bounded in terms of $A$, $B$ and $C$ in (2.9). The first two terms, namely $A$ and $B$, involve $P_x(f_n - f)^2$ and $P_x(g_n - g)^2$. The appendix gives kernel estimates $f_n$ and $g_n$ for which these quantities are small. Therefore, hereafter, we shall consider $f_n$ and $g_n$ to be the kernel estimates given in (A.8), and $\psi_n$ in (2.8) is to be defined in terms of these estimates.

2.3. Summary.

Theorem 2.1 below is a 1-dimensional specialization of a result considered by Lin (1968). The scale exponential and the Normal densities again will serve as examples to show that the existence of certain moments of $G$ is sufficient for the hypothesis of Theorem 2.1. In Corollary 2.4, better rates are obtained for the Normal two-action problem from those obtained in the Normal estimation problem.

2.4. Main Results and Examples.

Theorem 2.1. Let $\psi_n$ be of the form in (2.8) with $f_n$ and $g_n$ being kernel estimates of $f$ and $g$ as given in (A.8) of the appendix. If $h^{(r)}$ exists and if for some $\varepsilon > 0$

(2.13)  $P\{(1 + (g/f)^2)\, q_\varepsilon^{(0)}\} < \infty$,
(2.14)  $P\{(1 + (g/f)^2)\, (q_\varepsilon^{(r)})^2\} < \infty$,

and if $\delta \ge 0$ is such that

(2.15)  $P\{(g/f)^2 [f < \eta]\} \le c_1 \eta^{\delta}$,

then, with $\Delta = n^{-1/(2r+1)}$ and $\eta = n^{-\frac{1}{2+\delta} \cdot \frac{2(r-1)}{2r+1}}$,

(2.16)  $0 \le R_n(\psi_n, G) - R^* = O(n^{-\gamma'})$,  where $\gamma' = \frac{\delta}{2+\delta} \cdot \frac{2(r-1)}{2r+1}$.

Proof. Let $A$, $B$ and $C$ be as in (2.9). With $\Delta = n^{-1/(2r+1)}$, Lemmas A.3 and A.4 of the appendix followed by (2.13) and (2.14) will yield

$A \le c'_2 \times (n\Delta^3)^{-1} + c''_2 \times \Delta^{2(r-1)} \le c_2 \times n^{-2(r-1)/(2r+1)}$

and

$B \le c'_3 \times (n\Delta)^{-1} + c''_3 \times \Delta^{2r} \le c_3 \times n^{-2r/(2r+1)}$,

with the rate on $A$ being the smaller of the two. The choice $\eta = n^{-\frac{1}{2+\delta} \cdot \frac{2(r-1)}{2r+1}}$ balances the rates of $C$ and $\eta^{-2} A$ to $n^{-\gamma'}$.
The proof is completed by Lemma 2.1.

Example 1 (Scale exponential).

Consider the scale exponential with Lebesgue densities given by (1.23), i.e.,

(2.17)  $f_\theta(x) = \theta e^{-\theta x}$ for $x > 0$, and $f_\theta(x) = 0$ otherwise.

Consider the extreme case where $G$ is degenerate at $\theta = 1$ with all moments finite. The quantity $C = P(g/f)^2 [f < \eta]$ in (2.9) can be computed to be exactly $\eta$. This motivates the bound in the following lemma.

Lemma 2.2. For the scale exponential in (2.17), if $0 < \eta \le f(1)$, then for each $p > 1$ and $1/p + 1/q = 1$,

(2.18)  $P(g/f)^2 [f < \eta] \le \Gamma^{1/q}(1 + 2q)\, (\eta/(2p - 1))^{1/p}$.

Proof. The inequality $(g/f)^2 = (-P_x\theta)^2 \le P_x(\theta^2)$ followed by Hölder's inequality yields

(2.19)  $P(g/f)^2 [f < \eta] \le P(\theta x)^2 x^{-2} [f < \eta] \le (P(\theta x)^{2q})^{1/q}\, (P x^{-2p} [f < \eta])^{1/p} = \Gamma^{1/q}(1 + 2q)\, (P x^{-2p} [f < \eta])^{1/p}$,

where the last equality follows from the fact that, conditioned on $\theta$, $\theta x$ is standard scale exponential. For $0 < \eta \le f(1)$, $[f < \eta] \subset [x > 1]$ so that

$P x^{-2p} [f < \eta] \le \eta \int_1^{\infty} x^{-2p}\, dx = \eta/(2p - 1)$.

This completes the proof.

Lemma 2.2 shows that (2.15) holds with $\delta = 1/p$ and $c_1 = \Gamma^{1/q}(1 + 2q)/(2p - 1)^{1/p}$ without any assumption on the prior $G$. (For priors with densities $a\theta^{a-1} [0 < \theta < 1]$, $a > 0$, it can be shown that $f(x) = a\, x^{-(1+a)} \int_0^{x} z^{a} e^{-z}\, dz \sim a\, x^{-(1+a)}\, \Gamma(1 + a)$ and $|g(x)| \sim a\, x^{-(2+a)}\, \Gamma(2 + a)$ as $x \to \infty$. Hence, $(g/f)^2 \sim (1 + a)^2 x^{-2}$ as $x \to \infty$, and $C \le c_1 \eta^{(2+a)/(1+a)}$. Here we see that the bound on $C$ deteriorates as $a$, the number of finite moments of $\theta^{-1}$, increases.)

Corollary 2.1. For the scale exponential family in (2.17), the hypothesis of Theorem 2.1 holds for each $r \ge 2$ and $\delta < 1$, provided

(2.20)  $G\,\theta^{r + 2/3} < \infty$.

Proof. Since $(g/f)^2 \le P_x(\theta^2)$ and $q_\varepsilon^{(r)} = G(\theta^{r+1} e^{-\theta x})$, it suffices to note that with $\theta_i \sim G_i = G$, $i = 1, 2$,

$P[(1 + P_x(\theta^2))\, G(\theta e^{-\theta x})] = G_1 G\, P_\theta[(1 + \theta^2)\, \theta_1 e^{-\theta_1 x}] = G_1 G[(1 + \theta^2)\, \theta_1 \theta/(\theta + \theta_1)] \le \ldots$,

and furthermore, by the Arithmetic-Mean-Geometric-Mean inequality (Beckenbach-Bellman (1961), p.
54),

$P(1 + P_x(\theta^2))\, G^2(\theta^{r+1} e^{-\theta x}) = G_1 G_2 G\, P_\theta[(1 + \theta^2)\, \theta_1^{r+1} e^{-\theta_1 x}\, \theta_2^{r+1} e^{-\theta_2 x}]$
$= G_1 G_2 [\theta_1^{r+1} \theta_2^{r+1}\, G[(1 + \theta^2)\, \theta/(\theta + \theta_1 + \theta_2)]]$
$\le \tfrac{1}{3}\, G_1 G_2 [\theta_1^{r+2/3} \theta_2^{r+2/3}\, G[(1 + \theta^2)\, \theta^{2/3}]] = \tfrac{1}{3}\, (G[\theta^{r+2/3}])^2\, G[(1 + \theta^2)\, \theta^{2/3}] < \infty$.

[...]

Proof. Since $f(L) = \eta$ and $[f < \eta] = [\,|x| > L\,]$, we have $C = 2 \int_L^{\infty} x^2 f\, dx$ which, upon integration by parts, yields $C = 2L\eta + 2P[x > L]$. By the Normal tail bound (Feller (1962), p. 166), it follows that

$2L f(L) + 2 f(L)\left(\frac{1}{L} - \frac{1}{L^3}\right) < C < 2L f(L) + 2 f(L)/L$.

Consequently, $C \sim 2L\eta$. The proof is completed by the fact that $L = o(\eta^{-t})$ for any $t > 0$.

The above remark motivates the bound in the next lemma.

Lemma 2.3. Consider the Normal $(-\theta, 1)$ in (2.21). For each $0 \le \delta < 1$, (2.15) holds if

(2.24)  $G|\theta|^{(1+t)\delta/(1-\delta)} < \infty$ for some $t > 0$.

Proof. By the Hölder inequality,

$P(g/f)^2 [f < \eta] \le \mathrm{I}^{1/p}\, \mathrm{II}^{1/q}$,

where $\mathrm{I} = P|g/f|^{2p} \le P P_x|x + \theta|^{2p} = P P_\theta|x + \theta|^{2p} = b_{2p}$ by (2.23) and (2.22), and

$\mathrm{II} = P[f < \eta] \le \eta^{s} \int f^{1-s}\, dx$, for any $0 \le s < 1$.

Since the density $f$ is bounded by $(2\pi)^{-1/2}$, the integrability of $f^{1-s}$ is implied by that of $[\,|x| > 1\,]\, f^{1-s}$. Temporarily, let $v(s) = (1 + a)s/(1 - s)$ for each $a > 0$. The Hölder inequality followed by the $C_r$-inequality yields

$\int [\,|x| > 1\,]\, f^{1-s}\, dx \le (2/a)^{s}\, P^{1-s}(|x|^{v}) \le (2/a)^{s}\, \{c_v (b_v + G|\theta|^{v})\}^{1-s}$.

Hence, $G|\theta|^{v(s)} < \infty$ implies that $f^{1-s}$ is integrable and, therefore, (2.15) holds with the rate $s/q$. Since (2.24) implies that there exists some $0 < a < t$ for which $G|\theta|^{v(\delta^{+})} < \infty$ with $\delta^{+} > \delta$, the proof above shows that (2.15) holds with rate $\delta^{+}/q$. The proof is completed by the choice $\delta^{+}/q = \delta$. Such a choice is possible since $1 < q$ is a free parameter.

Corollary 2.2. Consider the Normal $(-\theta, 1)$ family. For each $0 \le \delta < 1$, if (2.24) holds, then the hypothesis of Theorem 2.1 holds for any $r \ge 2$.

Proof. (2.13) and (2.14) are satisfied because $q_\varepsilon^{(0)}$ and $q_\varepsilon^{(r)}$ are bounded functions, and $(g/f)^2$ is $P$-integrable by (2.23). The proof is completed by Lemma 2.3.

Remark.
For $\delta$ close to 1, Corollary 2.2 shows that a rate of $O(n^{-\gamma'})$, with $\gamma'$ arbitrarily close to 1/3, can be attained, provided $G|\theta|^u < \infty$ for sufficiently large $u$. On the other hand, for $\delta$ close to zero, lower convergence rates are attained. This last result is completely absent in the two-action problem (cf. the remark following Corollary 1.2). We shall presently remedy the situation by obtaining better rates in the Normal two-action problem as a corollary of the estimation problem.

Let $\psi_n$ be the estimate prescribed in Theorem 2.1. Consider the test $\phi_n' = [\psi_n - c \le 0]$ in the two-action problem in Theorem 1.1.

Lemma 2.4. $P\{(P_x\theta - c)(\phi_n' - \phi_G)\} \le P^{1/2}(P_x\theta - \psi_n)^2$. Consequently, the excess risk of $\phi_n'$ in the two-action problem is bounded by the square root of the excess risk of $\psi_n$ in the estimation problem.

Proof. Since $(P_x\theta - c)(\phi_n' - \phi_G)$ equals $P_x\theta - c$ if $\psi_n \le c < P_x\theta$, equals $c - P_x\theta$ if $P_x\theta \le c < \psi_n$, and is 0 otherwise, cPe(dx> .

A sequence (non-Bayes) compound problem is one in which the decision rule $\phi_n$ for the $n$-th problem is allowed to depend on all past observations $\underline{x}_n = (x_1,x_2,\dots,x_n)$ and the loss is taken to be the average of the component losses. We require that $\phi_n(\underline{x}_n,\cdot)$ be a probability measure on $\mathcal{C}$ for each $\underline{x}_n$, and that $\phi_n(\cdot,C)$ be measurable for each $C \in \mathcal{C}$.

Let $\underline{\phi} = (\phi_1,\phi_2,\dots)$ be a procedure in a sequence-compound problem. The average risk of using $\underline{\phi}$ against $\underline{\theta}$ in the first $n$ problems is given by

(3.2)  $R_n(\underline{\theta},\underline{\phi}) = n^{-1}\sum_{i=1}^n \int\!\!\int L(\theta_i,A)\,\phi_i(\underline{x}_i,dA)\,\underline{P}_i(d\underline{x}_i)$,

where $\underline{P}_i$ denotes the product measure $P_{\theta_1} \times P_{\theta_2} \times \dots \times P_{\theta_i}$.

A compound procedure $\underline{\phi}$ is simple if $\phi_i(\cdot,C)$ is $x_i$-measurable for each $C \in \mathcal{C}$. If, in addition, all $\phi_i$ are identical, say $\phi_i = \phi$, it is simple symmetric. For every simple symmetric procedure $\underline{\phi}$ and any $\underline{\theta}$,

$R_n(\underline{\theta},\underline{\phi}) = n^{-1}\sum_{i=1}^n R(\theta_i,\phi) = \int R(\cdot,\phi)\,dG_n$,

where $G_n$ denotes the empirical distribution of the first $n$ $\theta$'s; i.e.,

(3.3)  $G_n$ puts mass $1/n$ on each of $\theta_1,\theta_2,\dots,\theta_n$.
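The identity between the average risk of a simple symmetric procedure and its risk against $G_n$ can be checked on a small case. The sketch below is only an illustration under stated assumptions: the component problem is taken to be Normal $(\theta,1)$ with squared error loss, and the rule is the linear estimate $\phi(x) = cx$, whose component risk is $c^2 + (1-c)^2\theta^2$. The function names are hypothetical and do not come from the text.

```python
from collections import Counter
from fractions import Fraction


def component_risk(theta, c):
    """Risk of phi(x) = c*x under N(theta, 1) and squared error loss.

    Assumed illustrative model: variance term c^2 plus squared bias ((1-c)*theta)^2.
    """
    return c * c + (1 - c) ** 2 * theta ** 2


def average_risk(thetas, c):
    """n^{-1} * sum_i R(theta_i, phi): average risk over the first n problems."""
    return sum(component_risk(t, c) for t in thetas) / len(thetas)


def empirical_distribution(thetas):
    """G_n as in (3.3): mass 1/n on each theta_i, with repeated values accumulated."""
    n = len(thetas)
    return {t: Fraction(count, n) for t, count in Counter(thetas).items()}


def risk_against(G, c):
    """R(G, phi): integral of R(., phi) with respect to a finitely supported G."""
    return sum(mass * component_risk(t, c) for t, mass in G.items())


if __name__ == "__main__":
    thetas = [1, 1, 2, 0]
    c = Fraction(1, 2)
    # The two quantities agree exactly because exact rational arithmetic is used.
    print(average_risk(thetas, c), risk_against(empirical_distribution(thetas), c))
```

Exact fractions are used so that the two sides can be compared with equality rather than a floating-point tolerance.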
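With $R(G,\phi)$ denoting $\int R(\cdot,\phi)\,dG$ and

(3.4)  $R(G) = \inf\{R(G,\phi) : \phi \in \Phi\}$

denoting the Bayes risk versus the distribution $G$, it is obvious that for any simple symmetric procedure $\underline{\phi} = (\phi,\phi,\dots)$

(3.5)  $R_n(\underline{\theta},\underline{\phi}) = R(G_n,\phi) \ge R(G_n)$.

This motivates the use of the modified regret

(3.6)  $D_n(\underline{\theta},\underline{\phi}) = R_n(\underline{\theta},\underline{\phi}) - R(G_n)$

as a measure of goodness for compound procedures.

Swain (1965) considered the following extended version of $R(G_n)$. Let $k \ge 1$ be an integer. Let $\underline{\theta} \in \Omega^\infty$ and $G_n^k$ be the $k$-th order empirical distribution of the first $n$ $\theta$'s, which puts equal mass $1/(n-k+1)$ on each of the $k$-vectors:

$\underline{\theta}_k^k = (\theta_1,\theta_2,\dots,\theta_k)$,
$\underline{\theta}_{k+1}^k = (\theta_2,\theta_3,\dots,\theta_{k+1})$,
...
$\underline{\theta}_i^k = (\theta_{i-k+1},\dots,\theta_i)$,
...
$\underline{\theta}_n^k = (\theta_{n-k+1},\dots,\theta_n)$.

Correspondingly, an extension of a simple symmetric procedure is a $k$-simple symmetric procedure $\underline{\phi}^k$ for which $\phi_i^k(\cdot,C)$ is $\underline{x}_i^k$-measurable for each $C \in \mathcal{C}$, $\phi_i^k(\underline{x}_i^k,\cdot)$ is a p-measure on $\mathcal{C}$, and all $\phi_i^k$ are identical to some $\phi^k$. The risk of any $k$-simple symmetric procedure against $\underline{\theta} \in \Omega^\infty$ in the first $n$ problems, not counting the first $k-1$, is given by

(3.7)  $R_n^k(\underline{\theta},\underline{\phi}^k) = (n-k+1)^{-1}\sum_{i=k}^n R^k(\underline{\theta}_i^k,\phi^k) = R^k(G_n^k,\phi^k)$,

where

(3.8)  $R^k(\underline{\theta}_i^k,\phi^k) = \int\!\!\int L(\theta_i,A)\,\phi^k(\underline{x}_i^k,dA)\,\underline{P}_i^k(d\underline{x}_i^k)$,

$\underline{P}_i^k = \prod_{j=i-k+1}^{i} P_{\theta_j}$ and $R^k(G_n^k,\phi^k) = \int R^k(\underline{\theta}^k,\phi^k)\,dG_n^k(\underline{\theta}^k)$.

It follows from (3.7) that for any $k$-simple symmetric procedure $\underline{\phi}^k = (\phi^k,\phi^k,\dots)$,

(3.9)  $R_n^k(\underline{\theta},\underline{\phi}^k) = R^k(G_n^k,\phi^k) \ge R^k(G_n^k)$,

where

(3.10)  $R^k(G_n^k) = \inf_{\phi^k} R^k(G_n^k,\phi^k)$.

Swain (1965) used the $k$-th order Bayes envelopes $R^k(\cdot)$ in (3.10), or effectively

(3.11)  $D_n^k(\underline{\theta},\underline{\phi}) = R_n^k(\underline{\theta},\underline{\phi}) - R^k(G_n^k)$,

as standards in defining goodness of compound procedures $\underline{\phi}$, and called the resulting problem the extended compound decision problem.

Gilliland and Hannan (1969), in an improvement of a result of Swain, showed that for each $1 \le k \le n$ and $\underline{\theta}$,

(3.12)  $(n-k)\,R^{k+1}(G_n^{k+1}) \le (n-k+1)\,R^k(G_n^k)$.

In special cases, $\lim_{n\to\infty}\{R^{k+1}(G_n^{k+1}) - R^k(G_n^k)\} < 0$, so that $R^{k+1}$ is truly asymptotically more stringent than $R^k$.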
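As a concrete illustration of the $k$-th order empirical distribution $G_n^k$ described above, the following minimal sketch lists the overlapping $k$-vectors $(\theta_{i-k+1},\dots,\theta_i)$, $i = k,\dots,n$, and gives each the mass $1/(n-k+1)$. The function name is hypothetical, and a finite list stands in for the first $n$ coordinates of the parameter sequence.

```python
from collections import Counter
from fractions import Fraction


def kth_order_empirical(thetas, k):
    """Return G_n^k as a dict mapping each distinct k-vector to its total mass.

    Each of the n-k+1 overlapping windows (theta_{i-k+1}, ..., theta_i)
    receives mass 1/(n-k+1); identical windows accumulate mass.
    """
    n = len(thetas)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    windows = [tuple(thetas[i - k:i]) for i in range(k, n + 1)]
    unit_mass = Fraction(1, n - k + 1)
    return {w: count * unit_mass for w, count in Counter(windows).items()}


if __name__ == "__main__":
    thetas = [0, 1, 0, 1, 1]
    print(kth_order_empirical(thetas, 1))  # ordinary empirical distribution G_n
    print(kth_order_empirical(thetas, 2))  # second-order empirical distribution
```

With $k = 1$ the windows are single parameters, so the function reduces to the empirical distribution $G_n$ of (3.3).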
Swain exhibited procedures, for the discrete exponential and the Normal families, that attained regret convergence of rates no worse than $O(n^{-1/2}\log^k n)$ and $o(1)$, respectively. Gilliland (1968) considered the ($k = 1$) unextended versions of these problems and was able to exhibit procedures that possessed regret convergence of rates no worse than $O(n^{-1/2})$ and $O(n^{-1/5})$ for the discrete exponential and the Normal families, respectively.

It is the purpose of the remainder of this thesis to reinstate the $k$ in Gilliland's results and to show that the same improved rates of $O(n^{-1/2})$ and $O(n^{-1/(k+4)})$ hold. In the course of doing so, several of Gilliland's lemmas and theorems will be extended and, in some cases, strengthened.

3.2. A Bound for the Modified Regret $D_n^k$.

It is well known that under squared error loss, the posterior mean is Bayes. With respect to $G_n^k$, a version of the posterior mean of the $k$-th component of $\underline{\theta}^k$ is given by

(3.13)  $\psi_n^k(\underline{y}^k) = [p_n > 0]\sum_{j=k}^n \theta_j \pi_j / p_n$,

where $\pi_j = \prod_{\ell=1}^k p_{j-k+\ell}(y_\ell)$ and $p_n = \sum_{j=k}^n \pi_j$, with $p_j = p_{\theta_j}$.

Under squared error loss, a non-randomized estimate $\underline{\phi}$ has a modified regret

$D_n^k(\underline{\theta},\underline{\phi}) = (n-k+1)^{-1}\sum_{i=k}^n \underline{P}_i(\phi_i - \theta_i)^2 - R^k(G_n^k)$,

where $\underline{P}_i = \prod_{j \le i} P_{\theta_j}$. Thus, by Theorem 2 of Gilliland and Hannan (1969) (i.e.,

$\sum_{i=k}^n R^k(\underline{\theta}_i^k,\psi_i^k) \le (n-k+1)\,R^k(G_n^k) \le \sum_{i=k}^n R^k(\underline{\theta}_i^k,\psi_{i-1}^k)$,

where $\psi_{k-1}^k$ is arbitrary), one can show that $D_n^k$ is bounded above and below by

(3.14)  $(n-k+1)^{-1}\sum_{i=k}^n \underline{P}_i((\phi_i - \psi_i^k)(\phi_i + \psi_i^k - 2\theta_i))$

and

(3.15)  $(n-k+1)^{-1}\sum_{i=k}^n \underline{P}_i((\phi_i - \psi_{i-1}^k)(\phi_i + \psi_{i-1}^k - 2\theta_i))$

l - n k k + 2 213011: - w,_1> O'Irri/pi + a[pi_1 = 0, pi > O] from which

(3.18)  $\sum_{i=k}^n |\Delta_i| \le 2a\int \sum_{i=k}^n [(\pi_i/M)^2/(p_i/M)]\,M\,d\mu^k + a\int \sum_{i=k}^n [p_{i-1} = 0,\ p_i > 0]\,(\pi_i/M)\,M\,d\mu^k$,

where $M = \prod_{\ell=1}^k M(y_\ell)$. The first term on the rhs of (3.18), according to Lemma 3.1 below, is bounded by

$2a\int\big(\sum_{i=k}^n 1/i\big)\,M\,d\mu^k = O(\log n)\int M\,d\mu^k$,

and the second term is bounded by $a\int M\,d\mu^k$. But since $\int M\,d\mu^k = (\int M\,d\mu)^k < \infty$, the result follows.
We state without proof Lemma 2.1 of Gilliland (1968).

Lemma 3.1. For all $0 \le a_i \le 1$, $k \le i \le n$,

$\sum_{i=k}^n a_i^2 \big/ \sum_{j=k}^i a_j \le \sum_{i=k}^n 1/i$.

Combining (3.16) and (3.17), we have

Corollary 3.1. If $\Omega = \mathcal{A} = [-a,a]$ and the hypothesis of Theorem 3.1 is satisfied, then

(3.19)  $|D_n^k(\underline{\theta},\underline{\phi})| \le 4a\,(n-k+1)^{-1}\sum_{i=k}^n \underline{P}_i|\phi_i - \psi_i^k| + O(n^{-1}\log n)$

uniformly in $\underline{\theta}$, for any compound procedure $\underline{\phi}$.

3.3 Estimation in Discrete Exponential Families Under Squared Error Loss.

Consider the family of probability measures on the non-negative integers having densities

(3.20)  $p_\theta(x) = \theta^x h(\theta)\,g(x)$, $x = 0,1,2,\dots$,

with respect to counting measure $\mu$, where $g > 0$, and let

(A1)  $\Omega = \mathcal{A} = [0,a]$, 0 01(g/g> ,

where for each $\underline{y} = (y_1,\dots,y_k)$, $g = g(y_k)$, $\hat{g} = g(1+y_k)$, k-l fi.. 3 c=13 j k J IIMH-

In view of (3.21), when a sequence of past observations is available, a natural estimate for $\psi_i^k(\underline{x}_i^k)$ is

(3.22)  $\phi_i^*(\underline{x}_i) = \{[S > 0]((g/\hat{g})(\hat{S} + v_1)/(S + v_2))\} \wedge a$, $2k \le i$,

where $\hat{f}(\underline{y}) = f(y_1,\dots,y_{k-1},1+y_k)$ for any $f$,

$S = \sum_{j=k}^{i-k} \delta_j$, with $\delta_j = \delta(\underline{x}_j^k) = 1$ if $\underline{x}_j^k = \underline{x}_i^k$ and $0$ otherwise, $\hat{S} = \sum_{j=k}^{i-k} \hat{\delta}_j$,

and $0 \le v_1, v_2 \le k$. We note that $\phi_i^*$ depends on the last $k$ observations $\underline{x}_i^k$ taken as a $k$-vector, and is essentially a ratio between the number of times the $k$-vectors $\underline{x}_j^k$ equal $(x_{i-k+1},\dots,x_{i-1},1+x_i)$ and the number of times $\underline{x}_j^k$ equals $\underline{x}_i^k$, except for the perturbations $v_1$ and $v_2$ in the numerator and denominator. It will be shown that these perturbations are negligible by comparing $\phi_i^*$ to the unattainable procedure

(3.23)  $\phi_i'(\underline{x}_i) = \{[S > 0](g/\hat{g})(\hat{S} + \hat{S}')/(S + S')\} \wedge a$,

where ratios 0/0 are taken to be 0, $S' = \sum_{j=i-k+1}^{i} \delta_j'$, $\delta'(\underline{x}_j^k) = \delta((X_{j-k+1},\dots,X_{i-k},x_{i-k+1},\dots,x_j))$, with the $X_j$ independently distributed according to $P_{\theta_j}$ and independent of $\underline{x}_j$. It will also be shown that $\phi_i'$ possesses a certain rate of the regret convergence.
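The ratio-of-counts idea behind (3.22) can be sketched in the simplest setting. The code below is an illustration under explicit assumptions, not the thesis procedure in full generality: it takes $k = 1$, takes the Poisson case of (3.20) with $g(x) = 1/x!$ so that $g/\hat{g} = x + 1$, counts over all observations strictly before the current one, and defaults the perturbations $v_1$, $v_2$ to zero. The function and parameter names are hypothetical.

```python
def ratio_of_counts_estimates(xs, a, v1=0.0, v2=0.0):
    """Sequential estimates for a Poisson sequence, k = 1 (illustrative sketch).

    For problem i with observation x, S counts past observations equal to x
    and S_hat counts past observations equal to x + 1.  The estimate is
    [S > 0] * (x + 1) * (S_hat + v1) / (S + v2), truncated above at a.
    """
    estimates = []
    for i, x in enumerate(xs):
        past = xs[:i]                # observations strictly before problem i
        S = past.count(x)            # matches of the current value
        S_hat = past.count(x + 1)    # matches of the shifted value
        if S > 0:
            raw = (x + 1) * (S_hat + v1) / (S + v2)
        else:
            raw = 0.0                # indicator [S > 0] vanishes
        estimates.append(min(raw, a))  # truncation at the upper end of [0, a]
    return estimates


if __name__ == "__main__":
    print(ratio_of_counts_estimates([0, 1, 0, 1, 2], a=5))
    print(ratio_of_counts_estimates([3, 4, 4, 3], a=2))
```

The truncation at $a$ matters in the second call: the raw ratio for the last observation is $4 \cdot 2/1 = 8$, which is cut back to the known upper bound of the parameter space.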
To be more Specific, we will show that under suitable conditions, with E1 denoting the product measure on (K; 3 Kl-k) 3 42 O(n ) uniformly in ‘6 -1n k ii'fiEiIas; - IiI (Proposition 3.1) and 0(n-%) uniformly in Q (n- k+1) 2:13:11 Ii - ¢ I (Pr0position 3. 2), so that, by the triangle inequality, (n-k+1)-1 2‘, £1“): - IEI = 001-35) uniformly in g :3 “1 (Theorem 3.2). A Useful Result of Bikelis (1966). Let Yi’ i = l,2,...,n be a sequence of independent random variables that possess finite 2 + 6 (O < 6 s l) moments. Let Fn denote the distribution function of the normalized sum n 2 Sn ‘ 2 (Y1 - EYi)/Sn’ where sn - Z Var Yi' There exists a 181 1 2+6) universal constant c such that IFn (x) - n6(x)I s c L2+6n/(1+Ix I where is the Liapounov quotient z EIYi -EY WIZ+6I 2+6 L2+6,n X and Q(X) a (2n) kl” e ‘t 2/2d The lemma below is an immediate corollary of the Bikelis theorem. ‘We will use the lemma in bounding the error term in the Normal approximation. Lemma 3.2. Let Y1, i = 1,...,n be a sequence of independent bounded random variables with IYi - EYiI s B < m for each 1. Then — 1+6B 6 (3.24) IFn(x/sn) - o(x/sn)I s c 2 /(sn +-Ix I)6 Proof. By the Bikelis theorem, we have 2+6) IF-n(x) - @(x)I s c L /(1 +I I 2+ ”6 43 where , 2+ 1 L2+6,n 5 36/8: and 1 + IxI 6 2 (1 + IxI)2+6/2 +6 by the Cr-inequality. Hence, 1+ 2 6 B6 +6 IE;(x/sn) - 6(x/sn)I s c 2 srzl/(sn + IxI) 1 S c 2 +6 36 /(sn +-IxI)6. The proof is completed. Henceforth until (3.34), we will let 3: = x: be fixed k k k . and abbreviate ¢i(§1) and ¢£(§i) to I rand ¢', respectively. Let E abbreviate Ei' Since 0 S o'. I 5 a by (A1): it follows that a 0 (3.25) EI¢' - II eg E[¢' - I 2 63cm +I‘ EI_¢' - I s u]du . -I We shall next place bounds on the two integrands by the use of Lemma 3.2. For each i and IuI s a, put q = (g/g)(II +11) 3 (3.26) {6' -q 6., for ksj s i-k, Y = j J 3 S-q6I,for i-k 0]/S . Proof. Let I = [(g/§)(§/(k+5)) < a]. On [3 > 011, I0* - o'I S (s/§)I(§ + k)/3 - §/(k +-S)I s k(a + g/§)/s. 
Since I¢* ' O'I = 0 on {[S > O]I}C, the result follows. Remark. Lemma 3.3 is an analogue of (3.28) of Gilliland (1968). The truncation of mf in (3.23) results in the better bound in lemma 3.3. Lemma 3.4. (3.39) Ei-k([s > 01/3) < (k+2)/pi Proof. If S > 0, then the inequality S+k+1 s S(k+2) implies i [s-> 03/3 s (k+2)/(S+k+1) s (k+2)/(S +- z 65 + 1). By the i-k+l convexity of l/(1+z), Hoeffding's Theorem 3 (1956) applies to yield 1 _1 1 131.158 + z 5! + 1) s 2 0p <1 - p) j/<1+j> , i-k+l 3 3:0 where p = pi/(i-k+l). The rhs of the last inequality is bounded by i 1+i '- 2 0(1+j>91+j(1- p)1 j/<(1+i)p) 1% s (1 - (1-p)1+i)/((1+i)p) s 1/((1+i)p) Since (l+i)p = (1+i)pi/(i-k+l) > pi, the result follows. Lemma 3.4 with k Specialized to l improves upon Lemma 3.3 of Gilliland (1968). The next lemma is suggested by the proof of Lemma 3.5 of Gilliland (1968). 48 Lemma 3.5. Under (Al), n k - k k (3.40) E'"i'£i-k([s > 03/3) < b(n k+1) (Lglpa(xi_k+t)) % where b = 2(k+2) (h(O)/h(a))k/2. Proof. Since [S > 03/8 5 l, [S > 0]/S s ([S > Oj/S)%. Con- sequently, Jensen's inequality applies to give P ([3 > 03/3) s (P [3 > 03/3)!5 < ((k+2)/p )15 ‘i-k ‘i-k i where the last inequality follows from (3.39). Thus, by (3.35), % % (3.41) Ei-RU‘S > 03/8) < 2(k+2) pn . WM'J :I 1 Under (Al), (3.37) holds. Hence it follows from (3.41) that n 5 k/2 a k k E nrgi_k([3>03/3) < 2(k+2) (h(0)/h(a)) (pm/fl) LE pa(Xi-k+L)) s b(n-k+l)%( g p (x ))15 L=1 a i-k+c ° The proof if completed. Proposition 3.2. If the family of distributions satisfies the assumptions (A1) n=d=[09a]s 0 03/3) . k 1 k Via the equality 'gi((a + g/§)[s > 03/3) = z (a + g/§)n__13i k([3 > 03/3), 1 - 1k (3.43) and (3.40) yield n k (3.44) 23%; - 53 < box-k+1);5 z (a + g/§>( r1 pa>i . k =1 11. L Since 1‘ a 2 (a + g/§)(H pa)!5 = (2 pa)(k-l)/2 z (a + 3/§)p 1k l x x the proof is completed by (A2) and (A3'). a , Theorem 3.2. Under (Al), (A2) and (A3'), (3.45) \D:(Q, mf)‘ = 0(n-a) uniformly in g . Proof. 
Under (A1), (A2), (A3') and (A3), Corollary 3.1 together with (3.36) and (3.42) implies (3.45). Since (A2) and (A3') imply (A3) via the Cauchy-Schwarz inequality 2 ((g/§)pa)$5 s (2(3/§)p:)%(z p3)35 . the result follows. Remark. Theorem 3.5 of Gilliland (1968) proved (3.45) under the stronger assumption (A2+) together with (Al) and (A3'). The pro- cedure mf in (3.45) extends and includes that of if and Qf* in Gilliland. For examples of distribution satisfying these assumptions see Gilliland (1968). CHAPTER 4 SQUARED ERROR IDSS ESTIMATION IN THE NORMAL FAMILY 4.1 Introduction. Consider the Normal (9,1) family 2 % -(x-e /2 e ,-oo9) where tr and tr' stand for retraction to the intervals ['(a+fl+€): a + R + e] and [-a,a] reSpectively. k With W abbreviating wn and suppressing the subscripts , * ** ** ' * in ¢n+k and ¢n+k , we have ‘3‘ s a , W = tr V and there- ** * fore ‘3 - 3‘ s ‘w - W‘. Consequently, by the triangle in- equality, ** * k (4.4) guikw - M s 3.41.11 - (x + m +1134.” + t - M We state without proof Lemma 3 of Susarla with 02 = l, and F =Q-. k Lemma 4.1 (Susarla). For each x. in R (1) x + mini) e [-a - £1 - e. a] (2) 6 Bk 2 F J exp(- manna“ + W)} . <3) Eogsc'iokflffiexpmnxlx \ +a+n+e>3 where x = xk , n = E nj/(n-k+1) and HE“ = Lil‘xé‘ . 52 * 4.2 Bounding 32n+klw - (X + t)‘ Fix §_= 5' until (4.10). Since x + t, by (1) of Lemma 4.1, is in [-a -'9 - e, a] it follows from the definition of 3* that ‘w* - (X + t)‘ is bounded by the quantity a' = 2a4-gfl +'23, and at the same time bounded by \t* - t‘. Therefore, for each x. in Rk, a 0 (4.5) p M" - (x+t)| s3 A du +3 B du -n 0 -a' * * where A = P [t - t > u] and B = P [t - t < u]. We shall -n -n first bound A and B by the Bikelis theorem. Put k k Si=[§iéflk]. 6i=D£iEEUb (4.6) rim) a Si - bi amt“) , for |u\ s a' , k s i . 
2 n r = 2 Var Yi k -— +a+' Let w = (n-kfil) QC]k n/k, R = en(‘x\ a ), 2 denote summation over i from k to n, 2' denote summation over L from 1 to k and, 2" denote summation over d for which R s L + dk s n, for each L. lemma 4.2. For some constant c1, A s k §(-wu/r) + Cl 155/3qu , o s u , and, for n a' s l , B S k §(wu/2r) +c1 R%/\wu/2‘% , -a' s u s O . 53 = ' " .. Proof. Note that A gnu Y1 2 03 s 2 3:13; (YL+dk 3n YL-l-dk) 2 -2, gm Yi/k] and, similarly, B = §n[2(-Yi) 2 03 I s }::_1_>,a\'_2"-(it‘dk - 3n YLMR) 2 2 33“ Yi/k] . By (4.6), (4.7) 2 P Yi = (n-k+1) Egka ' ell“) . '11 For OSU,1-enus-T3u. For -a's.us0,'na'sl implies l - e'nu > J; 'nu. Thus, by (4.7), I II _ ASE-En[2(YL-+dk PY£+dk)2wu], OSU -n (4.8) I H _ - p .. - ' B5): an (YL-i-dk 11393411192 wu/23, a guso . Since ‘Yi - EYi‘ s 2R, we have, by Lemma 3.2 I ’5 35 A s: {QC-Wu/rL) + c R /\wu| 3 , 0 Su (4.9) B S 2'{{>(wu/2rL) + c Rkl‘wu/les} , -a' S u s O 2 where rL = Var 2" YL-i-dk° The proof is completed by the bound 25 Va Y = 2 rL 2 r i r . We note that 2 2 _ 2" I (4.10) r SEEnYiS(nk+1)RQDk° With (4.10), we prove an analogue of lemma 4 of Susarla. Lemma 4.3. For O< e s T] s l/(6 +2a), (4.11) gnficlf - (x + t)| s 31(n- k+1)'35{(3711;h)% + 9-1—1933 , T) 6 Us where B1 is independent of n and Q . 54 an Proof. Since £ §(-bt)dt s (2n)-%/b, for b > 0, it follows from (4.5) and Lemma 4.2 that, for na' 5 1 *5 (4.12) LIN" - (x-i-t)‘ 5 c2 5+ C3 R—g . w By (4.10) and the definitions of w and R, the above inequality yields (4.13) En‘¢* - (X+t)‘ S 32(n'k+1)-%{(2LJ_ k+1) )CHDkR + (“a _]T)%D!5R%}a “26 J—Q Bk k where and D = 5—— . By (2) and (3) of Lemma 4.1, "n+3 Q Elk Q Elk C 3 exp{ (n+e)(‘x ‘ + a + T3 + (5)} and D s (n)1exp('n+e) (“12“ + 551.3) Hence, it follows from (4.13), the definition of R and OsesnsI/(6+2a) that (4.14) £1333” - (x + t)\ s B3(h-k+1)'%{(32ik+9—1)25 + (—1—k)353 x n e “e X exp{(2‘x‘ +-\\§_u)'n}(;)-35 . 
k To complete the proof, we shall show that the P -integral of -n+k the function g = exp{(2|x\ +'“§“)}(;)-% is uniformly bounded $5 in n. Let c = (2n). Since c pe(y) s exp{-33y‘ - a)+32/2} and cgpe(y) 2 exp{-[\y| + a]2/2}, we have (F)-% s ckfih exp[z'(‘xL‘ +-a)2/4], and "n+k s c-k/zexp[-£'[(flx£3 - a)+j2/2}. Consequently, the Pfi+k-integral of g is exceeded by the constant -k 2 + 2 I c /4exp{(2‘x‘ +-H§M) +'Z'(‘XL\ +‘a) /4 ' 2'[(‘XL‘ - a) ] /2}d§ . The proof is completed. We state without proof 8 special case of Lemma 6 of Susarla. 55 2 2 lemma4.4. ‘x-i-t-H s'fl(l+a)+e(l+ka) The next lemma, suggested by Professor Gilliland, is an analogue of Theorem 3.1. Lemma 4.5. Consider the Normal (6,1) family in (4.1). For any 1 s b, b + k s n (4.15) Pkwk - ¢:_b| = 0(n'1) -n n uniformly in ‘Q . k Proof. Let 1 s k, k s n-b. Since for each fixed x“ n 2 “j k k n-b+l Hn - wn-b‘ S 2a n 3% n _2 n _ and, by Jensen's inequality, 1/2 "3 S (n-k+l) z “j k k n n 33k - Wk bl s 2a(n-k-I-1).2 2 fl 2 “:1 . But for any x_€ Rk, n n- n-b+l j k 1 , we have "j n;1(x) s eZaHxfl’ therefore, 31133335) - WIS-Mi” s2.—ib(n-k+1)'1 3 eZaHaH "n dx By the monotone likelihood ratio property of the Normals, 3 2 m Peeza‘x‘ s 2e‘ /2 e2ax pa(x)dx = c(a) is a finite constant. Consequently, I eZaHEM fin d£.S ck(a), uniformly in n; therefore, the result follows. With lemma 4.5, it follows from (3.19), via the triangle inequality, that for the Normal family in (4.1) 56 k -1 n k -1 (4.16) ‘Dn(_Q,m)‘ _<. 4a(h-k+1) Egi‘d’i - ¢i_k| + O(n log n), uniformly in 3 . Theorem 4.1. With = n-I/(k+4) and n = be for l < b, then ** k -1 kfl4 (4.17) Emit” - ¢n\ = 0(n /( )) and k ** -1 k (4.18) Dn(§,y_ ) = 0(n /( +4)). Proof. Lemmas 4.3 and 4.4 imply (4.17), via (4.4). The result follows from (4.16) and (4.17). REFERENCES REFERENCES Beckenbach, E. and Bellman, R. (1961). An Introduction to Inequalities. Random House. Bikelis, A. (1966). Estimates of remainder term in the central limit theorem. Litovsk. Mat. Sb, 6, 323-346. 
Feller, William (1962). An Introduction to Probability Theory and Its Applications, Vol. 1, 2nd ed. John Wiley & Sons.

Ferguson, T.S. (1967). Mathematical Statistics. Academic Press.

Fox, Richard (1968). Contributions to compound decision theory and empirical squared error loss estimation. RM-214, Department of Statistics and Probability, Michigan State University.

Gilliland, Dennis (1966). Approximation to Bayes risk in sequences of non-finite decision problems. RM-162, Department of Statistics and Probability, Michigan State University.

Gilliland, Dennis (1968). Sequential compound estimation. Ann. Math. Statist. 39, 1890-1904.

Gilliland, D.C. and Hannan, J.F. (1969). On an extended compound decision problem. Ann. Math. Statist. 40, 1536-1541.

Hannan, James F. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3, 97-139. Ann. Math. Studies No. 39, Princeton University Press.

Hewitt, E. and Stromberg, K. (1965). Real and Abstract Analysis. Springer-Verlag New York.

Hoeffding, Wassily (1956). On the distribution of the number of successes in independent trials. Ann. Math. Statist. 27, 713-721.

Johns, M.V., Jr. (1967). Two-action compound decision problems. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 463-478. University of California Press.

Johns, M.V., Jr. and Van Ryzin, J. (1967). Convergence rates for empirical Bayes two-action problems II. Continuous case. Technical Report No. 132, Department of Statistics, Stanford University.

Lin, Pi-Erh (1968). Estimation of a multivariate density and its partial derivatives, with empirical Bayes applications. Ph.D. Thesis, Columbia University.

Loève, Michel (1963). Probability Theory, 3rd Edition. Van Nostrand.

Royden, H.L. (1963). Real Analysis. Macmillan.

Samuel, E. (1965). Sequential compound estimators. Ann. Math. Statist. 36, 879-889.

Susarla, V. (1970).
Rates of convergence in sequence-compound squared-distance loss estimation and two-action problems. RM-262, Department of Statistics and Probability, Michigan State University.

Swain, Donald D. (1965). Bounds and rates of convergence for the extended compound estimation problem in the sequence case. Tech. Report No. 81, Department of Statistics, Stanford.

APPENDIX

SOME KERNEL ESTIMATES OF DENSITIES AND THEIR DERIVATIVES

Estimation of a Lebesgue density $f$ and its derivative $g = f^{(1)}$ will be discussed in Section 1. Estimation of a density $J$ with respect to $d\mu = h\,dx$ and its derivative $J^{(1)}$ will be discussed in Section 2. Estimates for the above quantities are based on the kernel method that Johns and Van Ryzin (1967) used. We shall first discuss briefly the existence of some of the kernels.

Let $r$ be an integer $\ge 2$ and let $K_0$ and $K_1$ be $L_2(0,1)$ functions vanishing off $(0,1)$ with $\int |u^r K_j|\,du = r!\,c_{jr}$, $j = 0,1$, such that

(A.1)  $\int u^t K_0(u)\,du = 1$ if $t = 0$, and $= 0$ if $0 < t \le r-1$,

and $uK_1$ satisfies (A.1) with $r$ replaced by $r-1$. For example, $K_0$ and $K_1$ can be the first two elements of the dual basis for the subspace of $L_2(0,1)$ with basis $\{1,u,\dots,u^{r-1}\}$.

As the intended result of these conditions on $K_0$ and $K_1$, if $S$ has its $r$th derivative bounded by $M$ on $(0,1)$, then substitution of the $r$th order Taylor expansion with Lagrange's remainder shows

(A.2)  $|\int S K_0\,du - S(0)| \le M c_{0r}$

and, if in addition $S(0) = 0$,

(A.3)  $|\int S K_1\,du - S^{(1)}(0)| \le M c_{1r}$.

Let $X_1,X_2,\dots$ be a sequence of random variables i.i.d. according to some Lebesgue density $f$. Let $E$ denote the product measure on $X_1,X_2,\dots,X_n$.

1. Lebesgue Density

In this section kernel estimates $f_n$ and $g_n$ for $f$ and $g = f^{(1)}$, respectively, will be discussed. Johns and Van Ryzin (1967) proposed these estimates and it appears that they showed (A.9) below under the extra assumption that $f^{(r)}$ is continuous for $x > a$. The bounds on the bias terms in (A.9) improve as the number of derivatives of $f$ increases.

Lemma A.1.
(Approximation of $f$ and $g$). For each $x$ and each $\Delta > 0$, let

(A.4)  $\bar{f}(x) = \int K_0(u)\,f(x+\Delta u)\,du$,  $\bar{g}(x) = \int \Delta^{-1}\,f\,]_{x+\Delta u}^{x+2\Delta u}\,K_1(u)\,du$.

If $f^{(r)}$ exists on $[x, x+2\Delta]$, then

(A.5)  $|\bar{f} - f| \le \Delta^r q_\Delta^{(r)} c_{0r}$,

(A.6)  $|\bar{g} - g| \le \Delta^{r-1}(q_\Delta^{(r)} + 2^r q_{2\Delta}^{(r)})\,c_{1r}$,

where

(A.7)  $q_\Delta^{(r)}(x) = \sup\{|f^{(r)}(x+\Delta u)| : 0 < u < 1\}$.

Proof. Since the $r$th derivative of $f(x+\Delta\,\cdot)$ is bounded by $\Delta^r q_\Delta^{(r)}(x)$, (A.5) follows from (A.2). With $S(u) = f\,]_{x+\Delta u}^{x+2\Delta u}$ in (A.3), the fact $S(0) = 0$ together with $|S^{(r)}| \le \Delta^r(q_\Delta^{(r)} + 2^r q_{2\Delta}^{(r)})$ implies (A.6).

Lemma A.2. (Unbiased estimation of $\bar{f}$ and $\bar{g}$). For each $x$ and $\Delta > 0$, let

(A.8)  $f_n(x) = n^{-1}\sum_{j=1}^n W_j^0(\Delta)$ and $g_n(x) = n^{-1}\sum_{j=1}^n \Delta^{-1}(W_j^1(2\Delta) - W_j^1(\Delta))$,

where $W_j^0(\Delta) = \Delta^{-1}K_0((X_j - x)/\Delta)$ and $W_j^1(\Delta) = \Delta^{-1}K_1((X_j - x)/\Delta)$. Then $f_n(x)$ and $g_n(x)$ are unbiased for $\bar{f}(x)$ and $\bar{g}(x)$, respectively.

Proof. Since the $X_j$ are i.i.d., the proof follows readily from (A.8) and the transformation theorem.

Combining Lemmas A.1 and A.2, we have

Lemma A.3. (Johns and Van Ryzin). Let $\Delta > 0$. If $f^{(r)}$ exists on $[x, x+2\Delta]$, then

(A.9)  $|E f_n(x) - f(x)| \le \Delta^r q_\Delta^{(r)}(x)\,c_{0r}$,
       $|E g_n(x) - g(x)| \le \Delta^{r-1}(q_\Delta^{(r)}(x) + 2^r q_{2\Delta}^{(r)}(x))\,c_{1r}$.

Lemma A.4. (Johns and Van Ryzin). Under the hypothesis of Lemma A.3,

(A.10)  $\mathrm{Var}\,f_n(x) \le (n\Delta)^{-1} q_\Delta^{(0)}(x)\,\|K_0\|_2^2$,
        $\mathrm{Var}\,g_n(x) \le 3(n\Delta^3)^{-1} q_{2\Delta}^{(0)}(x)\,\|K_1\|_2^2$,

where Var denotes the variance taken with respect to the measure $E$, and $\|\cdot\|_2$ denotes the $L_2$-norm with respect to Lebesgue measure.

Proof. Since the $X_j$ are i.i.d., the inequality $\mathrm{Var}\,X \le E(X^2)$ followed by the transformation theorem, and with the $c_r$-inequality applied at the proper place, yields (A.10).

2. Density with Respect to $d\mu = h\,dx$.

Let $f$ be a Lebesgue density of the form $f = hJ$, where $h > 0$ if and only if $x > a$. Then $J$ is a density with respect to $d\mu = h\,dx$. The estimation of $J$ and its derivative $J^{(1)}$ will be discussed next.

Let $\Delta > 0$. For each $x$, let

(A.11)  $J_n(x) = n^{-1}\sum_{j=1}^n W_j^0(\Delta)/h(X_j)$, J'