This is to certify that the dissertation entitled "Stability of Symmetrized Probabilities and Compact Equivariant Compound Decisions" presented by Mostafa Mashayekhi has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor
Date: August 10, 1990

MSU is an Affirmative Action/Equal Opportunity Institution

STABILITY OF SYMMETRIZED PROBABILITIES AND COMPACT EQUIVARIANT COMPOUND DECISIONS

By
Mostafa Mashayekhi

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability
1990

ABSTRACT

Extensions of the Hannan and Huang (1972) results on the stability of symmetrization of product probability measures to the compact case are obtained, together with their applications to extensions of some of the results of Gilliland and Hannan (1974) on equivariance in a compound decision problem. Let $\mathcal{P}$ be a class of pairwise mutually absolutely continuous probability distributions that is compact in the total variation norm. We show that the total variation norm of the symmetrization of the difference of two products of probabilities in $\mathcal{P}$, with differences in one factor, converges to zero uniformly as the number of factors approaches $\infty$. Rates of convergence are obtained for the case where $\mathcal{P}$ is an exponential family with its parameter space in the interior of the natural parameter space.
The above convergences translate into the convergence to zero of the excess of the simple envelope over the equivariant envelope, for a restricted component risk compound decision problem, as the number of problems approaches $\infty$. For compound estimation of continuous functions under squared error loss, and for finite action problems with continuous loss functions, the problem of treating the asymptotic excess compound risk of equivariant "delete bootstrap" rules is reduced, under an identifiability condition, to the question of $L_1$-consistency of certain mixtures. Examples of estimates satisfying the above consistency condition are included.

To my mother

ACKNOWLEDGEMENTS

I wish to express my deepest gratitude to my advisor, Professor James Hannan, for suggesting the problem and for the patience he accorded me in the preparation of this thesis. His careful criticism and invaluable suggestions aided greatly in simplifying proofs and improving virtually all of the results in the thesis.

I would like to express my thanks to my other committee members, Professors Dennis C. Gilliland, Habib Salehi and Clifford Weil, and to Professor James Stapleton, for their suggestions on an earlier draft.

Finally, I would like to thank the Department of Statistics and Probability at Michigan State University for their generous support, financial and otherwise, during my stay at Michigan State University.

TABLE OF CONTENTS

Chapter 1. Introduction
   1. The set compound problem
   2. History review and a summary of the present work
   3. Notations and conventions
Chapter 2. On symmetrization of product measures
   1. Introduction
   2. Preliminaries
   3. Contraction effect of probability factors
      Lemma 1
   4. Two product probability measures with differences in one factor
      Lemma 2 (Convergence to zero of $\|(\tau P^{n-1})^*\|$)
      Remark 1 (On the monotonicity of $\|\bar{X}_n\|$)
      Remark 2 (Necessity of mutual absolute continuity)
      Theorem 1 (Main result)
      Example 1 (Location family)
   5.
Exponential families
      Example 2 ($\Theta$ polytope)
      Remark 3 ($\Theta \subset \mathcal{N}^\circ$)
      Theorem 2 (Main result)
Chapter 3. Equivariance and the compound decision problem
   1. Introduction
   2. Asymptotically optimal "delete bootstrap" rules
      Remark 4 (Excess of the simple envelope over the equivariant envelope)
      Theorem 3 (Sufficient conditions for asymptotic optimality)
      Lemma 3 (Continuity of $\omega \in (\Omega,d) \mapsto \nu_\omega$)
      Remark 5 (Uniform continuity of $\omega \in (\Omega,\rho) \mapsto \nu_\omega$)
      Example 3 (Squared error loss estimation of $\phi(P)$)
      Example 4 (Finite $\mathcal{A}$ and each $L_a$ continuous)
   3. Examples of $L_1$-consistent mixtures
      A. Consistent mixtures based on a hyperprior (Datta)
      B. Consistent mixtures based on a minimum distance (Edelman)
      Lemma 4 (Bicontinuity of $\omega \in (\Omega,\rho) \mapsto \omega \in (\Omega,r)$)
      Theorem 4 (Minimum distance estimation)
Bibliography

CHAPTER 1
INTRODUCTION

1. The set compound problem.

In the set version of the compound decision problem, pioneered by Robbins (1951), simultaneous decisions are to be made in $n$ problems of the same generic structure, with this structure being possessed by what is called the component problem. Ordinarily in the component problem there are: a family of probability distributions $\mathcal{P}$ on some common measurable space $(\mathcal{X}, \mathcal{B})$; an observable $\mathcal{X}$-valued random element $X$ with distribution $P$, where $P \in \mathcal{P}$; an action space $\mathcal{A}$; a loss function $L : \mathcal{A} \times \mathcal{P} \to [0,\infty)$; and a class $\mathcal{D}$ of (randomized) decision rules $t$ on $\mathcal{X} \times \mathfrak{A}$, where $\mathfrak{A}$ is a $\sigma$-field of subsets of $\mathcal{A}$, such that for each $x \in \mathcal{X}$, $t(x,\cdot)$ is a probability measure on $\mathfrak{A}$ and for each $A \in \mathfrak{A}$, $t(\cdot,A)$ is $\mathcal{B}$-measurable.
The decision procedure $t$ has risk

(1) $R(t,P) = \int\!\!\int L(a,P)\, t(x,da)\, dP(x).$

In the compound problem, we have the state space $\mathcal{P}^n$, the action space $\mathcal{A}^n$, observations $\underline{X} = (X_1, \ldots, X_n)$ with distribution $P$, where $P = \times_{\alpha=1}^{n} P_\alpha$ and $\underline{P} = (P_1, \ldots, P_n) \in \mathcal{P}^n$, and compound rules $\underline{t} = (t_1, \ldots, t_n)$, where for each $1 \le \alpha \le n$, $t_\alpha$ has domain $\mathcal{X}^n \times \mathfrak{A}$ with $t_\alpha(\cdot,A)$ $\mathcal{B}^n$-measurable for each $A$ and $t_\alpha(\underline{x},\cdot)$ a probability measure on $\mathfrak{A}$ for each $\underline{x}$. The compound risk is given by

(2) $R(\underline{t},\underline{P}) = n^{-1} \sum_{\alpha=1}^{n} \int\!\!\int L(a,P_\alpha)\, t_\alpha(\underline{x},da)\, dP(\underline{x}).$

Sometimes it is preferable (see Hannan and Huang (1972a)) to consider loss functions that may depend on $x$ itself. A more general setting (cf. Gilliland and Hannan (1974)) is to bypass the consideration of a loss function and identify each decision rule in $\mathcal{D}$ by its risk point in $[0,\infty)^{\mathcal{P}}$. Then a compound rule $\underline{t}$ is identified with $\underline{s} = (s_1, \ldots, s_n)$ where for each $i$, $s_i$ is a $\mathcal{B}^{n-1}$-measurable mapping into $[0,\infty)^{\mathcal{P}}$. The $\alpha$-th component risk at $P_\alpha$ is then $s_\alpha(P_\alpha)$ and the compound risk is given by

$R(\underline{s},\underline{P}) = n^{-1} \sum s_\alpha(P_\alpha).$

Let $\mathcal{S}$ be the class of all simple procedures (i.e. $\mathcal{S} = \{\underline{t} : t_\alpha(\underline{x}) = t(x_\alpha)\ \forall\ 1 \le \alpha \le n$, for some component rule $t\}$), and let $\mathcal{E}$ be the class of compound rules that are equivariant under the permutation group. As functions of $\underline{P}$, $\inf_{\mathcal{S}} R(\underline{t},\underline{P})$ and $\inf_{\mathcal{E}} R(\underline{t},\underline{P})$ are called the simple envelope and the equivariant envelope respectively. It is clear from the definition that the latter is the infimum over a larger class, and it is well known that the former coincides with $R(G_n)$, where $R(\omega)$ is the component Bayes risk at $\omega$ and $G_n$ denotes the empirical distribution of $P_1, \ldots, P_n$. Traditionally, a compound rule is called asymptotically optimal if, with the modified regret at $\underline{P}$ defined by

(3) $D_n(\underline{t},\underline{P}) = R(\underline{t},\underline{P}) - R(G_n),$

$\sup_{\underline{P}} D_n(\underline{t},\underline{P}) \to 0$ as $n \to \infty$. However, since almost all of the compound rules in the literature are equivalent to equivariant procedures, the equivariant envelope (cf. Hannan and Huang (1972a), p. 104) is considered a more appropriate yardstick of performance than the simple one.
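As a concrete illustration of the statement that the simple envelope coincides with the component Bayes risk at the empirical distribution, $R(G_n)$, the following minimal sketch computes that quantity exactly in a toy two-state testing problem. The setting is an assumption made only for illustration and is not taken from the thesis: the observations take values in {0, 1}, the two component states `p0` and `p1` are given as probability vectors, the actions are "decide state 0" and "decide state 1", and the loss is 0-1. The function names are hypothetical.

```python
from fractions import Fraction as F

def bayes_risk(w, p0, p1):
    """Component Bayes risk under 0-1 loss when state 1 has prior mass w.

    p0 and p1 list the probabilities that states 0 and 1 assign to each
    sample point. At each point the Bayes rule picks the state with the
    larger prior-weighted likelihood, so the risk there is the smaller one.
    """
    return sum(min((1 - w) * a, w * b) for a, b in zip(p0, p1))

def simple_envelope(states, p0, p1):
    """R(G_n): Bayes risk versus the empirical distribution of the states."""
    w = F(sum(states), len(states))  # fraction of the n problems in state 1
    return bayes_risk(w, p0, p1)

# Illustrative component distributions on {0, 1} (assumed, not from the text).
p0 = [F(3, 4), F(1, 4)]
p1 = [F(1, 4), F(3, 4)]

print(simple_envelope([0, 1, 1, 0], p0, p1))  # half of the states equal 1
print(simple_envelope([0, 0, 0, 0], p0, p1))  # all states equal 0
```

When every problem is in the same state the envelope is zero, since a simple rule that always names that state never errs; this is the sense in which the envelope adapts to the unknown empirical distribution.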
2. History review and a summary of the present work.

Hannan and Robbins (1955) introduced the class of equivariant procedures for the $2 \times 2$ ($\mathcal{P} \times \mathcal{A}$) compound problem and showed (Theorem 5) that the difference between the simple and equivariant envelopes converges to zero uniformly in $\underline{P}$. Hannan and Huang (1972a) considered the compound problem for finite $\mathcal{P}$ under a certain class of loss functions and provided an upper bound on the difference of the simple and equivariant envelopes which is $O(n^{-1/2})$. Gilliland and Hannan (1974) (Theorems 1 and 2) extended those results to arbitrary bounded risk components for finite $\mathcal{P}$. They also showed (Theorems 3 and 4) that for equivariant "delete bootstrap" procedures, the excess compound risk over the simple envelope is bounded in terms of the $L_1$ error of estimation, and thus established a large class of asymptotic solutions to the compound decision problem with restricted risk and finite state component. Their proof depended heavily on the Hannan and Huang (1972b) results on the stability of symmetrization of product measures (Theorem 3), which was a strengthened generalization of Theorem 11.1 of Hannan (1953).

In this thesis we consider the compound decision problem in which the set of component distributions $\mathcal{P}$ is compact in the topology induced by the total variation norm and has pairwise mutually absolutely continuous elements. The risk set of the component problem is assumed to be a bounded subset of $[0,\infty)^{\mathcal{P}}$.

In Chapter 2 we consider some extensions of the Hannan and Huang (1972b) results on symmetrization of product measures to the compact case and prove two measure theoretic theorems analogous to Theorem 1 of Hannan and Huang (1972b). Theorem 1 shows convergence to zero, as the number of factors approaches $\infty$, of the total variation norm of the symmetrization of the difference of two product probability measures with differences in one factor.
Theorem 2 specializes to compact $k$-dimensional exponential families and obtains rates of convergence for the case where the parameter space is a compact subset of the interior of the natural parameter space.

Chapter 3 considers some extensions of the Gilliland and Hannan (1974) results on equivariance and the compound decision problem. In Remark 4 we observe that the method of proof of their Theorem 1 bounds the difference of the simple and equivariant envelopes by a constant multiple of the norm of the symmetrized difference of the two product probability measures considered in our Theorem 1. Our envelope results strengthen, inter alia, the results of Datta (1988), who obtained admissible asymptotically optimal solutions to the compound estimation problem for a large subclass of the real one parameter exponential family under squared error loss. Theorem 3 provides sufficient conditions for asymptotic optimality of "delete bootstrap" rules. Examples 3 and 4 show that for squared error loss estimation of continuous functions and for finite $\mathcal{A}$ problems with continuous loss functions, Theorem 3 reduces the problem of treating the asymptotic excess compound risk of Bayes compound rules to the question of $L_1$-consistency of certain mixtures. The reduction is analogous to Theorem 3 of Gilliland and Hannan (1974), and immediately extends the results of Datta (1990) for the empirical Bayes decision problem to the corresponding compound decision problem, under appropriate loss functions. Examples of estimates satisfying the above consistency condition are provided in Section 3 of Chapter 3.

3. Notations and conventions.

Let $n$ be a positive integer. If $P_1, \ldots, P_n$ are probability measures, $P$ denotes the product probability measure $P_1 \times \cdots \times P_n$. We use $P_1^n$ to denote $\times_{i=1}^{n} P_1$ and $P_1P_2$ to denote $P_1 \times P_2$. An $n$-tuple $(x_1, \ldots, x_n)$ is denoted by $\underline{x}_n$, and $\bar{x}_n$ denotes the average of the components of $\underline{x}_n$ (the subscript $n$ will not be exhibited if it is clear from the context).
The empirical distribution of $\underline{P}$, where $P_\alpha$ is a probability measure for each $\alpha$, is denoted by $G_n$. We use $\mu(f)$ or $\mu f$ to denote the integral of a function $f$ with respect to (wrt hereafter) a signed measure $\mu$. We sometimes use expressions such as $\int f(x)\,d\mu(x)$ to exhibit dummy variables. The same notation is used for a set and its indicator function when the distinction is clear from the context. A function $f$ defined on a set $\mathcal{X}$ into a set $\mathcal{Y}$ is sometimes denoted by $x \in \mathcal{X} \mapsto f(x)$ or $x \mapsto f(x)$. Sometimes we abuse notation and denote functions by their values. If $\mathcal{X}$ and $\mathcal{Y}$ are metric spaces with metrics $r$ and $\rho$ respectively, we sometimes denote $f$ by $x \in (\mathcal{X},r) \mapsto f(x) \in (\mathcal{Y},\rho)$. If $f$ is a function of two arguments, $f(\cdot,y)$ denotes the function (section) that is obtained by fixing the second argument at the point $y$. If $\tau$ is a signed measure, then $|\tau|$ will denote the total variation measure corresponding to $\tau$ (i.e. $|\tau| = \tau^+ + \tau^-$) and $\|\tau\|$ will denote the total variation norm of $\tau$. We denote the Euclidean norm and inner product by $|\ |$ and juxtaposition respectively. The supremum of a real function $f$ is denoted by $\|f\|_\infty$ whatever be its domain. All incompletely described limits are as $n \to \infty$ through positive integers. All sums will be on $i$ from 1 to $n$ unless otherwise indicated. The symbol $\Box$ denotes end of proof.

CHAPTER 2
ON SYMMETRIZATION OF PRODUCT MEASURES

1. Introduction

In this chapter we consider some extensions of the Hannan and Huang (1972b) results on the stability of symmetrization of product measures to the compact case. Our main results (Theorems 1 and 2) are analogous to their Theorem 1. In Section 2 we reproduce some of the general properties of signed measures and their symmetrization with respect to general groups from their Section 2, with the minor improvement that we consider the total, instead of their maximum, variation norm. The substitution of this equivalent norm simplifies some relations and proofs.
Section 3 considers a contraction effect of probability factors in product signed measures that was noted in their Section 3, and presents an extension of their Lemma 1 with a simpler proof. In Sections 4 and 5, specializing to permutation groups, we consider product probability measures with factors in a set which is compact under the topology induced by the total variation norm and has pairwise mutually absolutely continuous elements. Theorem 1 and Theorem 2 deal with the effect of symmetrization on the difference of two product probability measures with differences in one factor. Theorem 1 shows uniform convergence to zero of the total variation norms, and Theorem 2 specializes to $k$-dimensional exponential families and obtains rates of convergence.

2. Preliminaries

Let $\mathcal{G}$ be a finite group of measurable transformations $g$ on $(\mathcal{Y}, \mathcal{C})$. For a signed measure $\tau$ on $(\mathcal{Y}, \mathcal{C})$, the symmetrization $\tau^*$ of $\tau$ is defined by

(1) $\tau^* = N^{-1} \sum_{g \in \mathcal{G}} \tau \circ g,$

where $N$ is the number of elements in $\mathcal{G}$. Thus symmetrization ($*$, hereafter) is an expectation operator. We will abbreviate affixes on $*$ by omission. For any real valued function $f$ on $\mathcal{Y}$, its symmetrization $f^*$ is defined by

(2) $f^* = N^{-1} \sum_{g \in \mathcal{G}} f \circ g.$

$\tau$ and $f$ are said to be symmetric if $\tau = \tau^*$ and $f = f^*$, respectively. The properties $(\tau \circ g)^* = \tau^*$ and $(f \circ g)^* = f^*$ $\forall\ g \in \mathcal{G}$ will be used later without comment.

The following two facts are taken from Section 2 (Relations (6) and (8)) of Hannan and Huang (1972b). Let $\tau$ be a signed measure, and let $\mu$ be a measure such that $d\tau/d\mu$ exists. If $\mu = \mu^*$, then

(3) $(d\tau/d\mu)^* = d\tau^*/d\mu.$

Let $\mu$ be a measure. If $f$ is $\mu^*$-integrable, then

(4) $\mu^*(f) = \mu(f^*) = \mu^*(f^*).$

The following two relations ((5) and (6)) are simpler analogs of their relations (9), (11) and (12). If (3) holds, $|d\tau^*/d\mu| \le |d\tau/d\mu|^*$ by subadditivity of $|\ |$. Applying this with $\mu = \mu^*$, by (4), integration wrt $\mu$ and the isotonicity of the $\mu$-integral give

(5) $\|\tau^*\| \le \|\tau\|.$
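The contraction in relation (5), that symmetrization does not increase the total variation norm, can be checked by direct enumeration on a small finite space. The sketch below is only an illustration under stated assumptions, not a construction from the thesis: the space is $\{0,1\}^n$, the group is the group of coordinate permutations, and a signed measure is stored as a dictionary from points to masses. The helper names `symmetrize` and `tv_norm` are hypothetical.

```python
from fractions import Fraction as F
from itertools import permutations

def symmetrize(tau, n):
    """Average a signed measure on {0,1}^n over all coordinate permutations.

    tau maps points (tuples of length n) to signed masses. Each permutation
    g moves the mass at x to the permuted point, and the N = n! images are
    averaged, as in the definition of the symmetrization.
    """
    perms = list(permutations(range(n)))
    out = {}
    for x, mass in tau.items():
        for g in perms:
            gx = tuple(x[i] for i in g)
            out[gx] = out.get(gx, F(0)) + F(mass) / len(perms)
    return out

def tv_norm(tau):
    """Total variation norm of a signed measure on a finite space."""
    return sum(abs(F(m)) for m in tau.values())

# An antisymmetric signed measure: its symmetrization vanishes.
tau = {(0, 1): F(1), (1, 0): F(-1)}
print(tv_norm(tau), tv_norm(symmetrize(tau, 2)))

# A signed measure that is already symmetric is left unchanged.
sigma = {(0, 0): F(1), (1, 1): F(-1)}
print(symmetrize(sigma, 2) == sigma)
```

The first example shows that the inequality can be strict: positive and negative mass sitting on points in the same orbit cancel after averaging.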
If $\tau\sigma$ is a product signed measure, then $|\tau\sigma| = (\tau^+\sigma^+ + \tau^-\sigma^-) + (\tau^+\sigma^- + \tau^-\sigma^+) = |\tau|\,|\sigma|$ and therefore, by the Fubini Theorem,

(6) $\|\tau\sigma\| = \|\tau\|\,\|\sigma\|.$

If $P$ and $Q$ are product probability measures, subadditivity of the norm and applications of (6) give

(7) $\|P - Q\| = \left\| \sum_i \left(\times_{j<i} P_j\right)(P_i - Q_i)\left(\times_{j>i} Q_j\right) \right\| \le \sum_i \|P_i - Q_i\|.$

3. Contraction effect of probability factors.

Let $(\mathcal{X}, \mathcal{B})$ be a measurable space. For each $n$ let $\mathcal{G}_n$ be a measurable group of transformations on $(\mathcal{X}, \mathcal{B})^n$ such that $\mathcal{G}_n$ is a subgroup of $\mathcal{G}_{n+1}$. Consider the symmetrization of a measure on $(\mathcal{X}, \mathcal{B})^n$ relative to $\mathcal{G}_n$. The following lemma is a strengthened generalization of Lemma 1 of Hannan and Huang (1972b), with a simple proof eliminating the need for developing their (13) and (14). It also serves for the extension of Remark 2 of their Addendum. Their Lemma 1 was already sufficient for the proof of our theorems in this chapter.

Lemma 1. If $\tau$ is a signed measure and $P$ is a probability measure, then

(8) $\|(\tau P)^*\| \le \|\tau^*\|.$

Proof. Observe that $(\tau P)^* = (\tau^* P)^*$, since they agree on symmetric functions. Therefore (8) follows from the application of (5) and the probability case of (6) to $\tau^* P$. $\Box$

Henceforth we specialize $\mathcal{G}_n$ to be the group of transformations on $(\mathcal{X}, \mathcal{B})^n$ induced by the group of permutations on $n$ objects. We also let $\mathcal{G}_n$ denote the permutation group itself. Thus a generic element $g \in \mathcal{G}_n$ will be used both as a permutation and as the transformation $g(\underline{x}) = (x_{g1}, \ldots, x_{gn})$.

4. Two product probability measures with differences in one factor.

Throughout the rest of the thesis $\mathcal{P}$ is a non-empty norm-compact class of pairwise mutually absolutely continuous probability measures. Part (i) of Lemma 2, to follow, proves uniform convergence to zero of $\|(\tau Q^n)^*\|$ where $\tau = R - S$ with $(Q,R,S) \in \mathcal{P}^3$. The result is used to prove a stronger assertion in Theorem 1, where $Q^n$ is replaced by a product of $n$ elements of $\mathcal{P}$.
Part (ii) of Lemma 2 obtains rates of convergence under an additional assumption and is used in the proof of Theorem 2 in Section 5.

Lemma 2. For $(Q,R,S) \in \mathcal{P}^3$ let $t$ be a density of $\tau$ wrt $Q$ and let $T_n = \|(\tau Q^{n-1})^*\|$. Then, with $t_i$ denoting $\underline{x} \mapsto t(x_i)$,

(9) $T_n = Q^n|\bar{t}|.$

(i) $\|T_n\|_\infty = o(1)$, and for each $r \in (1,2]$,

(ii) $n^{r-1}\|T_n^r\|_\infty \le a_r \|Q|t|^r\|_\infty$, with $a_r = 2^{2-r}$.

Proof. Since $t$ is a density of $\tau$ wrt $Q$, $t_1$ is a density of $\tau Q^{n-1}$ wrt $Q^n$ and $\bar{t}$ is a density of $(\tau Q^{n-1})^*$ wrt $Q^n$. The latter implies (9).

(i) We show that $(T_n)$ is a monotone sequence of continuous functions decreasing to zero on compact $\mathcal{P}^3$, and therefore by Dini's Theorem (cf. e.g. Proposition 9.11 of Royden (1968)) it converges to zero uniformly. By Lemma 1, $T_n \ge T_{n+1}$. Since sum and product are, by the norm properties and (6), continuous operations on finite signed measures and $*$ is linear, the composition $T_n$ inherits continuity on $\mathcal{P}^3$. By the $L_1$ Law of Large Numbers the rhs(9) converges to zero.

(ii) By the independent case of a von Bahr and Esseen inequality (cf. von Bahr and Esseen (1965)),

(10) $Q^n\left|\sum t_i\right|^r \le a_r \sum Q^n|t_i|^r = n\,a_r\,Q|t|^r.$

From (9) and the moment inequality, $(nT_n)^r \le$ lhs(10). Weakening this by (10), dividing by $n$ and taking the supremum over $(Q,R,S)$ gives the inequality in (ii). $\Box$

Remark 1. In the proof of Lemma 2 we used Lemma 1 to obtain the monotonicity of $T_n$. Only later did we note that Lemma 1 can be used to show the monotonicity of $E|\bar{X}_n|$ when $X_1, \ldots, X_n$ are i.i.d. with $EX_1$ finite, a fact seldom noted in texts but presumably well known to the authors who discuss the associated reverse martingale. Any hope that this application of Lemma 1 would be a worthy simpler proof was short lived. My colleague Liu, Zhihui noted non-increasing monotonicity for $\|\bar{X}_n\|_r$ with $r \in [1,\infty]$ and $X_1, \ldots, X_n$ only exchangeable,

(11) $(n-1)\left\|\sum_{i=1}^{n} X_i\right\|_r \le \sum_{j=1}^{n}\left\|\sum_{i \ne j} X_i\right\|_r = n\left\|\sum_{i=1}^{n-1} X_i\right\|_r,$

as this immediate consequence of the homogeneity and subadditivity properties of the $L_r(P)$ norm appropriately applied.
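Relation (9) makes $T_n$ directly computable when the sample space is finite. As a rough numerical check of the monotonicity $T_n \ge T_{n+1}$ used in part (i), the sketch below evaluates $T_n = Q^n|\bar{t}|$ exactly in a toy case that is assumed purely for illustration: $Q$, $R$, $S$ are distributions on {0, 1} described by the masses `q`, `r`, `s` they put on the point 1. The function name `T` and the chosen values are hypothetical.

```python
from fractions import Fraction as F
from math import comb

def T(n, q, r, s):
    """T_n = Q^n |mean of t(x_i)| for distributions Q, R, S on {0, 1}.

    t is the density of R - S with respect to Q. Under Q^n the number of
    ones k is Binomial(n, q), and the average of the t(x_i) depends on the
    sample only through k.
    """
    t1 = (r - s) / q                    # value of t at the point 1
    t0 = ((1 - r) - (1 - s)) / (1 - q)  # value of t at the point 0
    total = F(0)
    for k in range(n + 1):
        prob = comb(n, k) * q**k * (1 - q)**(n - k)
        total += prob * abs(k * t1 + (n - k) * t0) / n
    return total

q, r, s = F(1, 2), F(3, 4), F(1, 4)     # illustrative values only
values = [T(n, q, r, s) for n in range(1, 7)]
print(values)
```

For $n = 1$ the value is the total variation norm of $R - S$, and the later terms never increase. The decrease need not be strict at every step, which is consistent with the lemma asserting only monotonicity and convergence to zero.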
Remark 2. Pairwise mutual absolute continuity of the elements of $\mathcal{P}$ is a necessary condition for the conclusion of Lemma 2 to hold. For, if $B$ is such that $P(B) = 0$, then

(12) $\|(P^n - QP^{n-1})^*\| \ge (P^n - QP^{n-1})^*\big((B^c)^n\big) = Q(B),$

so that lhs(12) $= o(1)$ only if $Q(B) = 0$. $\Box$

Theorem 1. Let $P = \times_{i=1}^{n} P_i$, where for each $i$, $P_i \in \mathcal{P}$, and let $\tau = R - S$ with $R$ and $S \in \mathcal{P}$. Then

(13) $\sup\{\|(\tau P)^*\| : (\underline{P},R,S) \in \mathcal{P}^{n+2}\} = o(1).$

Proof. Let $\underline{Q}$ be a product of $N$ factors of $P$. By Lemma 1,

(14) $\|(\tau P)^*\| \le \|(\tau \underline{Q})^*\|.$

If $Q$ is a probability measure, by the triangle inequality, (5), (6), and (7),

(15) rhs(14) $\le \|\tau\| \sum_{i=1}^{N} \|Q_i - Q\| + \|(\tau Q^N)^*\|.$

Let $\epsilon > 0$, and use Lemma 2 to choose $N$ such that $\|T_{N+1}\|_\infty < \epsilon/3$. By total boundedness there is a finite covering of $\mathcal{P}$ by $m$ balls of radius $\epsilon/(3N)$. If $n > (N-1)m$, then some ball contains at least $N$ factors of $P$. Let $Q$ be the center of such a ball and let the $Q_i$ be factors of $P$ therein. Then by (14) and (15) and the choice of $N$, $\|(\tau P)^*\| < \epsilon$. $\Box$

Example 1: Location family. Let $P$ be a probability measure equivalent to Lebesgue measure on $R^k$. Let $\Theta$ be a compact subset of $R^k$ and, $\forall\ \theta \in \Theta$, let $P_\theta$ be the translate of $P$ by $\theta$. Observe that $\|P_{\theta+\delta} - P_\theta\| = \|P_\delta - P\| = P|dP_\delta/dP - 1| \to 0$ as $\delta \to 0$ by Lebesgue's theorem (cf. Royden, pp. 90-91, for the proof in the one dimensional case). Compactness of $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ then follows by compactness of $\Theta$, and pairwise mutual absolute continuity of its elements follows by the equivalence of $P$ and the Lebesgue measure together with the translation invariance of the latter.

5. Exponential families.

Let $\mu$ be a measure on $\mathcal{X} = R^k$ such that

(16) $\mathcal{N} = \{\theta \in R^k : \psi(\theta) = \ln \int e^{\theta x}\,d\mu(x) < \infty\} \ne \emptyset.$

For $\theta \in \mathcal{N}$, let $P_\theta$ be the probability measure with $\mu$-density

(17) $p_\theta(x) = e^{\theta x - \psi(\theta)}.$

It has long been known (cf. e.g. Theorem 1.13 of Brown (1986)) that, on $\mathcal{N}$, $\psi$ is convex by the Hölder inequality and lower semicontinuous by the Fatou Lemma. If $\theta \in \mathcal{N}^\circ$, then (cf. e.g.
Theorem 2.2 of Brown (1986)) all derivatives of $\psi$ exist at $\theta$ and can be obtained by differentiating under the integral sign. In Example 2 and Remark 3, to follow, $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ for various $\Theta \subset \mathcal{N}$.

Example 2: $\Theta$ polytope. By a Gale-Klee-Rockafellar theorem (Theorem 10.2 of Rockafellar (1970)), convex $\psi$ is upper semicontinuous on locally simplicial $\Theta$. Since a polytope is a finite union of simplices, it then follows that each $x$-section of $p$ is continuous. Therefore, by the Scheffé Theorem, $P_\theta$ is norm continuous, so that $\mathcal{P}$ inherits the compactness of $\Theta$.

Remark 3: Compact $\Theta \subset \mathcal{N}^\circ$. Then each $x$-section of $p$ is continuous, so that $\mathcal{P}$ inherits the compactness of $\Theta$ as in Example 2. Consider a finite covering of $\Theta$ by open cubes with their closures in $\mathcal{N}^\circ$. Then the convex hull of the vertices $v$ of these cubes is a polytope in $\mathcal{N}^\circ$ with $\Theta$ in its interior. Lemma 2.1 of Brown (1986) then applies and gives a number $K_1$ such that

(18) $|e^{\theta x} - e^{\theta' x}| \le |\theta - \theta'|\,K_1 \sum_v e^{vx} \quad \forall\ (\theta,\theta') \in \Theta^2.$

By triangulation about $e^{\theta' x - \psi(\theta)}$ it follows that

$|p_\theta(x) - p_{\theta'}(x)| \le e^{-\psi(\theta)}|e^{\theta x} - e^{\theta' x}| + e^{\theta' x}|e^{-\psi(\theta)} - e^{-\psi(\theta')}|,$

whence properties of integration wrt $\mu$ and (18) bound $\|P_\theta - P_{\theta'}\|$ by $B|\theta - \theta'|$ with $B = \|e^{-\psi}\|_\infty\, 2K_1 \sum_v e^{\psi(v)}$.

Theorem 2. Let $\mathcal{P}$ be an exponential family with compact $\Theta \subset \mathcal{N}^\circ$. If $r \in (1,2]$ and, for every $\theta_0$ and $\theta_1$ in $\Theta$, the external convex combination $\theta_r = r\theta_1 + (1-r)\theta_0 \in \mathcal{N}^\circ$, then with $\tau = R - S$

(19) $\sup\{\|(\tau P)^*\| : R, S \in \mathcal{P}$ and $\underline{P} \in \mathcal{P}^n\} = O(n^{-\beta}),$

where $\beta = (r-1)/(r + (2r-1)k)$.

Proof. Let $n \in Z^+$, $(R,S) \in \mathcal{P}^2$ and $\underline{P} \in \mathcal{P}^n$. Let $(\theta_1, \ldots, \theta_n) \in \Theta^n$ be such that $P_i = P_{\theta_i}$. Choose $N \in Z^+$ such that, with $m = ([N^{(2-\frac{1}{r})}] + 1)^k$ and $g(N) = (N-1)m + 1$, $g(N) \le n < g(N+1)$. Consider a cube containing $\Theta$ and divide it into $m$ equal size subcubes. Since $n \ge g(N)$ there exist $N$ factors of $P$, say $Q_1, \ldots, Q_N$, with their indices in the intersection of one of the cubes with $\Theta$.
By (15),

(20) $\|(\tau P)^*\| \le 2\sum_{i=1}^{N} \|Q_i - Q_1\| + \|(\tau Q_1^N)^*\|.$

Since the diameter of each cube is less than or equal to a constant multiple of $N^{(\frac{1}{r}-2)}$, where the constant depends on $k$ and the size of the cube containing $\Theta$, by the uniform bound on $\|P_\theta - P_{\theta'}\|/|\theta - \theta'|$ considered in Remark 3 there is a constant $B_1$ such that the first term on the rhs(20) is less than or equal to $B_1 N^{(\frac{1}{r}-1)}$. Since

$P_{\theta_0}(p_{\theta_1}/p_{\theta_0})^r = \exp\{-r\psi(\theta_1) - (1-r)\psi(\theta_0) + \psi(\theta_r)\},$

it is continuous on compact $\Theta^2$ and therefore is bounded. Thus if $t$ is a density of $\tau$ wrt $Q$, $\|Q|t|^r\|_\infty < 2^r$(above bound) by Minkowski's inequality in $L_r(Q)$. Therefore (20) and Lemma 2(ii) imply that

(21) $\|(\tau P)^*\| \le B N^{(\frac{1}{r}-1)},$

where $B$ is a constant independent of $R$, $S$, $\underline{P}$. Observe that by definition of $g$ and $\beta$,

(22) $g^\beta(N+1)\,$rhs(21) $= O(1).$

By choice of $N$ and (21),

(23) $n^\beta\,$lhs(19) $\le$ lhs(22).

The conclusion of the theorem follows by (22) and (23). $\Box$

CHAPTER 3
EQUIVARIANCE AND THE COMPOUND DECISION PROBLEM

1. Introduction

Consider a compound problem involving $n$ independent repetitions of a component problem with states $P \in \mathcal{P}$. Let $\mathcal{D}$ be a bounded risk class of decision rules for the component problem and let $M < \infty$ be such that $\forall\ t \in \mathcal{D}$ and $\forall\ P \in \mathcal{P}$, $R(t,P) \le M$. For an $n$-tuple $\underline{x} = (x_1, \ldots, x_n)$ let $\check{x}_\alpha$ denote $\underline{x}$ with the $\alpha$-th component deleted, and let $\check{P}_\alpha$ denote $P$ with the $\alpha$-th factor deleted. Consider the class $\underline{\mathcal{D}}$ of compound rules $\underline{t} = (t_1, \ldots, t_n)$ where each $\check{x}_\alpha$-section of $t_\alpha \in \mathcal{D}$. When $\mathcal{D}$ is the largest class of decision rules for the component problem, the above compound problem is the usual compound problem with $\underline{\mathcal{D}}$ the largest class of compound decision rules. The compound problem with restricted component risk was considered by Gilliland and Hannan (1974) for finite $\mathcal{P}$, because of the generality it provided for their envelope results and the fact that it is the natural setting in which to study "delete bootstrap" procedures.
Moreover, as they noted, it allows for choice of $\mathcal{D}$ to control component risk behavior and the construction of asymptotically best equivariant procedures in $\underline{\mathcal{D}}$.

Let $s$ be the function on $\underline{\mathcal{D}} \times \{1, \ldots, n\} \times \mathcal{P} \times \mathcal{X}^{n-1}$ such that $s(\underline{t}, \alpha, P, \check{x}_\alpha)$ is the conditional on $\check{x}_\alpha$ risk incurred by $\underline{t}$ in the component $\alpha$ when the distribution of $X_\alpha$ is $P$. It is well known (see Section 2 of Hannan and Huang (1972a) or Section 1 of Gilliland and Hannan (1974)) that the compound problem is invariant under the group of $n!$ permutations of coordinates, and that a compound rule $\underline{t}$ is equivariant if and only if there exists a function $\gamma$ on $\mathcal{X} \times \mathcal{X}^{n-1}$ to $\mathcal{A}$, symmetric on $\mathcal{X}^{n-1}$, such that $t_\alpha(\underline{x}) = \gamma(x_\alpha, \check{x}_\alpha)$ for all $\alpha$. The latter implies that if $\underline{t}$ is equivariant then $s$ is constant in its second argument and symmetric in its fourth argument. The implied property for $s$ will be used as a definition of equivariance when we bypass the consideration of a loss function. For equivariant procedures we will abbreviate $s(\underline{t}, \alpha, P, \cdot)$ by using the affixes on $\underline{t}$ and $P$. For example $s(\underline{t}, 1, P_\alpha, \cdot)$ will be abbreviated to $s_\alpha$.

Let $\mathcal{E}$ and $\mathcal{S}$ denote the class of all equivariant rules in $\underline{\mathcal{D}}$ and the class of all simple symmetric rules in $\underline{\mathcal{D}}$ respectively. The equivariant envelope corresponding to $\underline{\mathcal{D}}$ is defined by

(1) $\tilde{\psi}(\underline{P}) = \inf_{\underline{t} \in \mathcal{E}} R(\underline{t},\underline{P}),$

and the simple envelope corresponding to $\underline{\mathcal{D}}$ is defined by

(2) $\psi(\underline{P}) = \inf_{\underline{t} \in \mathcal{S}} R(\underline{t},\underline{P}).$

In this chapter we use the results of Chapter 2 and properties of equivariant rules to show the asymptotic equivalence of the simple and equivariant envelopes and establish asymptotic optimality of certain equivariant "delete bootstrap" rules.

Section 2 deals with the difference of the two envelopes and asymptotic optimality. In Remark 4 we observe that the method of proof of Theorem 1 of Gilliland and Hannan (1974) can be applied to translate the results of our Theorems 1 and 2 into convergence to zero of the excess of the simple envelope over the equivariant envelope.
Theorem 3 introduces sufficient conditions for asymptotic optimality of equivariant "delete bootstrap" rules. Examples 3 and 4 consider important cases in which, by assuming an identifiability condition, Theorem 3 reduces the problem of treating the asymptotic excess compound risk of "delete bootstrap" rules to the question of $L_1$-consistency of certain mixtures.

Section 3 provides, as examples, two classes of mixtures that satisfy the required consistency condition. The first example is the class of mixtures based on hyperpriors obtained in Datta (1990), thus showing that his results for empirical Bayes problems are extended to the corresponding compound problems. A mixture in the second class is obtained by minimizing an $L_2$-distance.

2. Asymptotically optimal "delete bootstrap" rules.

The following remark shows that the excess of the simple envelope over the equivariant envelope has a uniform upper bound for which we have shown convergence to zero in Theorem 1 and obtained rates of convergence, in a case of exponential families, in Theorem 2. The result, which is the first proof in the non-finite case, strengthens all the previous results in compound estimation under squared error loss.

Remark 4.

(3) $(\psi - \tilde{\psi})(\underline{P}) \le M \sup_\alpha \sup_\beta \|(\check{P}_\alpha - \check{P}_\beta)^*\|.$

This follows by the method of proof of Theorem 1 of Gilliland and Hannan (1974): Let $\underline{t} \in \mathcal{E}$. By non-negativity and symmetry of $s_\alpha$,

(4) $|(\check{P}_1 - \check{P}_\alpha)s_\alpha| \le M\|(\check{P}_1 - \check{P}_\alpha)^*\|,$

which implies

(5) $\check{P}_1\, n^{-1} \sum s_\alpha - R(\underline{t},\underline{P}) \le$ rhs(3).

Since $\check{P}_1(n^{-1}\sum s_\alpha) - \psi(\underline{P}) = \check{P}_1 G_n(s - \tilde{s})$, where $\tilde{t}$ is Bayes versus $G_n$ in the component problem, by isotonicity of $\check{P}_1$, $\check{P}_1(n^{-1}\sum s_\alpha) \ge \psi(\underline{P})$. Therefore (5) implies

(6) $\psi(\underline{P}) - R(\underline{t},\underline{P}) \le$ rhs(3).

Since $\underline{t}$ is arbitrary $\in \mathcal{E}$, we obtain (3).

Consider $\mathcal{P}$ with the topology induced by the total variation norm and let $\Omega$ be the set of all probability measures on the Borel sets of $\mathcal{P}$. For each $\omega \in \Omega$ the mixture $P_\omega$ is the measure on $\mathcal{B}$ defined by $P_\omega(B) = \omega(P(B))$, $B \in \mathcal{B}$.
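For each $\omega \in \Omega$, let $t_\omega$ be a Bayes solution versus $\omega$ in the component problem. Considered as a function on $\Omega$ into $\mathcal{D}$ (cf. Hannan 1957, p. 101), $t$ is called a Bayes response.

Let $t$ be a Bayes response and $\hat{\omega}$ a symmetric mapping on $\mathcal{X}^{n-1}$ into $\Omega$. Let $\hat{\underline{t}}$ be the compound rule with

$\hat{t}_\alpha(\underline{x}) = t_{\hat{\omega}(\check{x}_\alpha)}(x_\alpha).$

Then by symmetry of $\hat{\omega}$, $\hat{\underline{t}}$ is equivariant. The next theorem gives sufficient conditions for asymptotic optimality of $\hat{\underline{t}}$.

Theorem 3. $\hat{\underline{t}}$ is asymptotically optimal if

(i) For each $\epsilon > 0$, $\exists\ \delta_\epsilon > 0$ such that $\forall\ n$, $(\|P_{\hat{\omega}} - P_{G_n}\| < \delta_\epsilon) \Rightarrow ((G_n - \hat{\omega})(\hat{s} - \tilde{s}) < \epsilon)$, where $\tilde{\underline{t}}$ is an equivariant rule with its components Bayes versus $G_n$.

(ii) $\sup\{\check{P}_1\|P_{\hat{\omega}} - P_{G_n}\| : \underline{P} \in \mathcal{P}^n\} = o(1)$.

Proof. By (4),

(7) $R(\hat{\underline{t}},\underline{P}) - \psi(\underline{P}) \le \check{P}_1 G_n(\hat{s} - \tilde{s}) +$ rhs(3).

Weakening (7) by subtracting the non-positive function $\hat{\omega}(\hat{s} - \tilde{s})$ from its right hand side integrand, and triangulation about $\psi(\underline{P})$ together with (3), give

(8) $R(\hat{\underline{t}},\underline{P}) - \tilde{\psi}(\underline{P}) \le \check{P}_1 (G_n - \hat{\omega})(\hat{s} - \tilde{s}) + 2\,$rhs(3).

The rhs(3) converges to zero by Theorem 1. In order to show uniform convergence to zero of the first term on the rhs(8), let $\epsilon > 0$. Choose $\delta_\epsilon$ with the property assumed in (i). The first term on the rhs(8) is less than or equal to $\epsilon + M\check{P}_1[\|P_{\hat{\omega}} - P_{G_n}\| \ge \delta_\epsilon]$, which is less than or equal to

(9) $\epsilon + M\delta_\epsilon^{-1}\,\check{P}_1\|P_{\hat{\omega}} - P_{G_n}\|$

by the Markov inequality. Since $\epsilon$ is arbitrary, the conclusion follows by (ii). $\Box$

Observe that since $\mathcal{P}$ is a compact metric space, by Theorem II 6.4 of Parthasarathy (1967), $\Omega$ with the topology of weak convergence is a compact metric space.

Lemma 3. Let $\phi$ be a continuous function on $\mathcal{P}$. For each $\omega$ let $\nu_\omega$ be the signed measure defined by $\nu_\omega(B) = \int \phi(P)P(B)\,d\omega(P)$, $B \in \mathcal{B}$. Let $d$ be a metric of weak convergence. Then $\omega \in (\Omega,d) \mapsto \nu_\omega$, with the norm-topology on the range, is uniformly continuous.

Proof. Let $\omega_n$ be a sequence in $\Omega$ converging to $\omega$. Since $\mathcal{P}$ is a compact metric space, it is complete and separable.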
By the Skorohod representation theorem (Theorem 3.3 of Billingsley (1971)) there exist $\mathcal{P}$-valued random elements $\eta_n$ and $\eta$ on the Lebesgue unit interval with respective distributions $\omega_n$ and $\omega$ such that $\eta_n$ converges to $\eta$ pointwise. Since $\nu_\omega$ is the $\omega$-mixture of the $\nu_P = \phi(P)P$,

(10) $\nu_{\omega_n} - \nu_\omega = (\omega_n - \omega)\nu_\cdot = \int_0^1 (\nu_{\eta_n} - \nu_\eta).$

Triangulation about $\phi(\eta_n)\eta$ and simple norm properties give

(11) $\|\nu_{\eta_n} - \nu_\eta\| \le \|\phi\|_\infty \|\eta_n - \eta\| + |\phi(\eta_n) - \phi(\eta)|.$

Since variations of a positive mixture are bounded by the mixture of the variations, continuity of $\nu$ at $\omega$ follows from (10) and (11) by two applications of the Bounded Convergence Theorem for the $\int_0^1$. Uniform continuity follows by compactness of $\Omega$. $\Box$

In the rest of this chapter we assume

(12) $\Omega$ is identifiable: $\omega \mapsto P_\omega$ is 1-1,

and let $\rho$ denote the metric on $\Omega$ thereby induced by $\|\ \|$ on the range,

(13) $\rho(\omega,\omega') = \|P_\omega - P_{\omega'}\|.$

Remark 5. If $d$ metrizes weak convergence in $\Omega$, then $d$ is equivalent to $\rho$: By choosing $\phi \equiv 1$ in Lemma 3, it follows that $\omega \in (\Omega,d) \mapsto P_\omega$ is continuous on $\Omega$. The same conclusion follows directly with $d$ replaced by $\rho$. By the identifiability assumption and compactness of $\Omega$ and metric range (cf. Proposition 9.5 of Royden (1968)), both are homeomorphisms. Thus $d$ and $\rho$ are equivalent.

Example 3. Let $\phi$ be a continuous function on $\mathcal{P}$, and consider the compound decision problem whose component problem is estimation of $\phi(P)$ under squared error loss. Let $\hat{\omega}$ be a symmetric mapping on $\mathcal{X}^{n-1}$ into $\Omega$, and let $\hat{\underline{t}}$ be an equivariant rule which is Bayes versus $\hat{\omega}(\check{x}_\alpha)$ in the $\alpha$-th component. Then $\hat{\underline{t}}$ satisfies assumption (i) of Theorem 3.

Proof. Let $B = \|\phi(P)\|_\infty$. Since $(\hat{s} - \tilde{s})(P) = P(\hat{t}^2 - \tilde{t}^2) - 2\phi(P)P(\hat{t} - \tilde{t})$,

(14) $(G_n - \hat{\omega})(\hat{s} - \tilde{s}) = (P_{G_n} - P_{\hat{\omega}})(\hat{t}^2 - \tilde{t}^2) - 2(\nu_{G_n} - \nu_{\hat{\omega}})(\hat{t} - \tilde{t}) \le B^2\|P_{G_n} - P_{\hat{\omega}}\| + 4B\|\nu_{G_n} - \nu_{\hat{\omega}}\|,$

since $\hat{t}$ and $\tilde{t}$ inherit the bound on $\phi$. The conclusion now follows by the uniform $\rho$ continuity of Lemma 3 with the choice $d = \rho$ justified by Remark 5.
Example 4: finite $\mathcal{A}$ and continuous loss functions. Such decision problems satisfy a much stronger property than (i). Note that, for arbitrary $s : P \mapsto P \sum_a t_a L_a(P)$,

(15)  $(G_n - \omega)s = \sum_a (G_n - \omega)(P t_a) L_a(P) = \sum_a (\nu^a_{G_n} - \nu^a_\omega) t_a \le \sum_a \|\nu^a_{G_n} - \nu^a_\omega\|$,

where $\nu^a$ denotes the $\nu$ of Lemma 3 with $\zeta = L_a$. But by Lemma 3 and Remark 5, $\forall\, \varepsilon > 0$ $\exists\, \delta > 0$ such that $\rho(G_n, \omega) \ge \delta$ or $\|\nu^a_{G_n} - \nu^a_\omega\| \le \varepsilon$ for every $a$.

Theorem 3 of Gilliland and Hannan (1974) reduced the problem of treating the asymptotic excess compound risk of equivariant "delete bootstrap" rules to the question of $L_1$-consistency of $\|\hat G_n - G_n\|$ for finite $\mathcal{P}$. Datta (1988) considered the compound estimation problem under squared error loss for real one parameter exponential families with compact parameter space and reduced the problem to the question of $L_1$-consistency of $\|P_{\hat G_n} - P_{G_n}\|$, under a domination assumption on translates of $\mu$ that implies our identifiability assumption. His proof, however, depended heavily on the particular shape of the densities for that family and the functional form of the Bayes estimator under squared error loss.

3. Examples of $L_1$-consistent posterior mixtures

In Theorem 3 we listed two conditions under which "delete bootstrap" rules are asymptotically optimal. In Examples 3 and 4 we considered situations where one of the conditions was satisfied and the problem of finding asymptotically optimal solutions was reduced to the problem of obtaining estimates of $G_n$ that satisfy the $L_1$-consistency requirement of Theorem 3. Below we consider two classes of estimates of $G_n$ that satisfy that requirement.

A. Consistent posterior mixtures based on a hyperprior.

Consistent mixtures based on a hyperprior were introduced in Section 1.4 of Datta (1988) for a subclass of one dimensional real exponential families and were extended to a much larger class of probability distributions in Theorem 3.1 of Datta (1990). More specifically, let $\mu$ be a measure and let $\mathcal{P}$ be the class of probability distributions with densities $\{p_\theta : \theta \in \Theta\}$ wrt $\mu$, where $\Theta$ is a compact metric space.
Suppose $p_\theta(x)$ is continuous in $\theta$ for each $x$ and, with $h = \sup_{\theta, \theta' \in \Theta} |\log(p_\theta / p_{\theta'})|$,

$\sup_{\theta \in \Theta} \mu\big((h - M)^+ p_\theta\big) \to 0$ as $M \to \infty$.

Observe that, as pointed out in Remark 3.2 of Datta (1990), the second part of the above assumption forces the $P_\theta$'s to be pairwise mutually absolutely continuous. By the Scheffé theorem, continuity of $p_\theta(x)$ for each $x$ implies the norm-continuity of $P_\theta$. The latter implies that $\mathcal{P}$ inherits the compactness of $\Theta$.

Consider $\Omega$ with the topology of weak convergence and let $\Lambda$ be a probability measure on the Borel subsets of $\Omega$. Let $\hat\Lambda$ be the posterior distribution of $\omega$ given $\underline{X} = \underline{x}$. Then $\hat\Lambda$ is the probability measure on $\Omega$ with density proportional to $\prod_{i=1}^n p_\omega(x_i)$ with respect to $\Lambda$. Let $\hat G_n$ denote the $\hat\Lambda$-mix of $\omega$'s. Then Theorem 3.1 of Datta (1990) asserts that if $\Lambda$ has full support,

(16)  $\sup\{\mathbf{P}\,\|P_{\hat G_n} - P_{G_n}\| : \underline{P} \in \mathcal{P}^n\} = o(1)$.

Since $(n+1)(P_{G_n} - P_{G_{n+1}}) = P_{G_n} - P_{n+1}$, its norm does not exceed 2. Thus, by triangulation about $P_{G_{n+1}}$, (16) is equivalent to (16) with $G_n$ replaced by $G_{n+1}$ or, equivalently, with $\hat G_n$ replaced by $\hat G_{n-1}$.

Observe that $\hat G_{n-1}$ is symmetric on $\mathcal{X}^{n-1}$. Therefore $\hat G_{n-1}$ provides an example of estimates that satisfy assumption (ii) of Theorem 3.

The importance of Datta's estimates is due to the fact that compound Bayes rules against a prior not depending on $\underline{x}$ turn out to be Bayes versus $\hat G_{n-1}(\check{\underline{x}}_\alpha)$ in the $\alpha$-th component (cf. Datta (1988), Section 1.2.1). Therefore, if the Bayes rules versus a given prior have unique risk, the compound rule that is obtained by playing Bayes versus $\hat G_{n-1}(\check{\underline{x}}_\alpha)$ in the $\alpha$-th component will be admissible for each $n$. The uniqueness of the compound risk of Bayes rules versus a prior $\xi$ in an estimation problem under squared error loss was shown in Section 4 of the appendix in Datta (1988), under the condition that $P_\theta$ is dominated by $P_\xi$ for every $\theta$.

B.
$L_1$-consistent mixtures based on a minimum distance

$L_1$-consistent estimators of the mixing distribution for a normal mean were obtained in Edelman (1988) by minimizing an $L_2(\lambda)$-distance, where $\lambda$ denotes Lebesgue measure on $R$. His proof depended heavily on the properties of the normal distribution, especially the functional form of the normal characteristic function. Instead of $L_2(\lambda)$ we consider minimum distance in $L_2(\eta)$ with $\eta$ a probability with support $R^k$ and obtain estimators for the case where $\mathcal{P}$ is a class of distributions on $R^k$. Theorem 4, to follow, proves $L_1$-consistency of minimum $L_2(\eta)$-distance estimators of $P_{G_n}$.

In what follows we will use $F$, with or without affixes, to denote the distribution function of a probability distribution $P$, and $\|\cdot\|_\eta$ to denote the norm on $L_2(\eta)$.

Observe that if $\eta$ is a probability measure on $R^k$, any distribution function $H$ is in $L_2(\eta)$ and $d : \underline{P} \mapsto \|F_{G_n} - H\|_\eta$ satisfies

(17)  $|d(\underline{P}) - d(\underline{P}')| \le \|F_{G_n} - F_{G_n'}\|_\eta \le \|P_{G_n} - P_{G_n'}\|$,

so that $d$ is continuous on compact $\mathcal{P}^n$ and therefore attains a minimum.

Lemma 4. Let $\mathcal{X}$ be $R^k$ and let $\eta$ be a probability measure with support $R^k$. Let $r$ be the pseudo-metric on $\Omega$ induced by the $L_2(\eta)$ norm on the range of $\omega \mapsto F_\omega$. Then $r$ is a metric equivalent to $\rho$, and $\omega \in (\Omega, \rho) \mapsto \omega \in (\Omega, r)$ is uniformly bicontinuous.

Proof. If $r(\omega, \omega') = 0$, then $F_\omega = F_{\omega'}$ a.e. ($\eta$) and therefore, by continuity from above, everywhere. Thus $P_\omega = P_{\omega'}$ and, by identifiability, $\omega = \omega'$. Since $r \le \rho$, the identity function on $(\Omega, \rho)$ to $(\Omega, r)$ is continuous. Therefore by compactness, as in Remark 5, it is uniformly bicontinuous. □

Theorem 4. Let $\mathcal{X}$ be $R^k$ and let $\eta$ be a probability measure with support $R^k$. Let $\hat G_n$ be the empirical distribution of $\hat{\underline{P}}$, a measurable minimizer of $d_n : \underline{P} \mapsto \|F_{G_n} - H_n\|_\eta$, with $H_n$ the empirical distribution of $\underline{X}$. Then

(18)  $\sup\{\mathbf{P}\,\|P_{\hat G_{n-1}} - P_{G_n}\| : \underline{P} \in \mathcal{P}^n\} = o(1)$.

Proof.
Since $F_{G_n} = \mathbf{P}(H_n)$, $H_n$, as an average of $\mathbf{P}$-independent Bernoulli processes, has variance

(19)  $\mathbf{P}(F_{G_n} - H_n)^2 = \frac{1}{n} G_n(F(1 - F)) \le \frac{1}{4n}$.

By the Fubini Theorem, $\mathbf{P}\,\|F_{G_n} - H_n\|_\eta^2$ has the same bound. By triangulation about $H_n$ and use of the minimizing property of $\hat G_n$,

(20)  $r^2(\hat G_n, G_n) = \|F_{\hat G_n} - F_{G_n}\|_\eta^2 \le 2^2 \|F_{G_n} - H_n\|_\eta^2$.

Let $\varepsilon > 0$. By Lemma 4, take $\delta > 0$ such that $\rho \le \varepsilon$ or $r \ge \delta$. Then

(21)  $\mathbf{P}\,\|P_{\hat G_n} - P_{G_n}\| \le \varepsilon + \mathbf{P}\,\|P_{\hat G_n} - P_{G_n}\|\,[r(\hat G_n, G_n) \ge \delta]$.

By the Markov inequality, the last term in (21) is bounded by

(22)  $2\delta^{-2}\,\mathbf{P}(\mathrm{lhs}(20)) \le 2\delta^{-2} n^{-1}$,

by (20) and the bound (19) for its $\mathbf{P}$ expectation. The resulting bound for the lhs(21) proves (18) for the equivalent (as for (16) in A.) form with $G_n$ replaced by $G_{n-1}$. □

Observe that $\hat G_n$ can be taken to depend on $\underline{x}$ only through $H_n$ and therefore is an example of $\hat\omega$ of Theorem 3.

BIBLIOGRAPHY

Billingsley, Patrick (1968). Convergence of Probability Measures. John Wiley & Sons.

Billingsley, Patrick (1971). Weak Convergence of Measures: Applications in Probability. SIAM.

Brown, Lawrence D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Inst. of Mathematical Statistics, Lecture Notes-Monograph Series, vol. 9.

Datta, Somnath (1988). Asymptotically optimal Bayes compound and empirical Bayes estimators in exponential families with compact parameter space. Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State University.

Datta, Somnath (1990). On the consistency of posterior mixtures and its applications. To appear in Ann. Statist.

Edelman, David (1988). Estimation of the mixing distribution for a Normal mean with applications to the compound decision problem. Ann. Statist. 16 1609-1622.

Fabian, Vaclav and Hannan, James (1985). Introduction to Probability and Mathematical Statistics. John Wiley & Sons.

Gilliland, Dennis C. and Hannan, James (1974). The finite state compound decision problem, equivariance and restricted risk components.
RM-317, Statistics and Probability, MSU; (1986) Adaptive Statistical Procedures and Related Topics. IMS Lecture Notes-Monograph Series 8 129-145.

Hannan, James F. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games 3, Ann. Math. Studies, No. 39, Princeton University Press, 97-139.

Hannan, James and Huang, J.S. (1972a). Equivariant procedures in the compound decision problem with finite state component problem. Ann. Math. Statist. 43 102-112.

Hannan, James and Huang, J.S. (1972b). A stability of symmetrization of product measures with few distinct factors. Ann. Math. Statist. 43 308-319.

Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Statist. 36 1743-1752.

Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces. Academic Press.

Robbins, Herbert (1951). Asymptotically sub-minimax solutions of compound decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob., Univ. of California Press, 131-148.

Rockafellar, R. Tyrrell (1970). Convex Analysis. No. 28, Princeton Mathematical Series.

von Bahr, Bengt and Esseen, Carl-Gustav (1965). Inequalities for the rth absolute moment of a sum of random variables, $1 \le r \le 2$. Ann. Math. Statist. 36 299-303.