This is to certify that the dissertation entitled

Asymptotic Behavior of Compound Rules in Compact Regular and Nonregular Families

presented by Jin Zhu has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Prof. James Hannan, Major professor
July 2, 1992

Michigan State University


ASYMPTOTIC BEHAVIOR OF COMPOUND RULES IN COMPACT REGULAR AND NONREGULAR FAMILIES

By
Jin Zhu

A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Department of Statistics and Probability
1992


ABSTRACT

ASYMPTOTIC BEHAVIOR OF COMPOUND RULES IN COMPACT REGULAR AND NONREGULAR FAMILIES

By Jin Zhu

Asymptotic behavior of compound rules is considered for three different component problems. For a restricted component problem with a compact parameter space, admissibility and asymptotic optimality are proved for compound Bayes rules based on full-support hyperpriors. A special case of the above general structure is a multi-dimensional exponential family with a polytope parameter space and an equi(in action) continuous loss function. An example is given to show that this class of loss functions is the largest for which asymptotically optimal compound rules exist. A multi-dimensional Müntz-Szász theorem is also developed to prove identifiability of our exponential family. For a two-sided truncation family, an equation satisfied by component Bayes rules and an inversion formula for probability measures on the parameter space are derived.
Based on these, asymptotically optimal O(n^{-1/4}) compound estimators of the truncation parameters and of the empirical probability measure determined by n arbitrary parameters are obtained. For a linear compound problem with quadratic variance and bounded parameter space, asymptotic optimality with rate n^{-1/2} is proved for compound estimators of means. The estimators are obtained by approximating functions of Bayes estimators versus the empiric of n parameters. An example is given to show that the boundedness of the parameter set is relatively necessary for the existence of asymptotically optimal compound estimators.


To my parents and wife


ACKNOWLEDGMENTS

I would like to express my gratitude to my thesis advisor, Professor James Hannan, for his patience, encouragement and guidance in the preparation of this thesis. His careful criticism and invaluable suggestions were of great value in simplifying proofs and improving presentations. What he has been consistently doing taught me much about the attitude to academic research, and even more about being a mathematical statistician. I would like to thank the other committee members, Professors Dennis Gilliland, V. Mandrekar, R.V. Ramamoorthi, and Habib Salehi, for their suggestions on an earlier draft. I benefited greatly from Professor Gilliland's careful reading of the thesis and his invaluable suggestions on Chapter 3. Warm thanks and appreciation also go to Ms. Lora Kemler for her expert typing of my thesis. Finally, I wish to thank the Department of Statistics and Probability at Michigan State University for providing financial support during my stay.
TABLE OF CONTENTS

CHAPTER 0: INTRODUCTION
1. The Component Problem
2. The Set Compound Problem and the Empirical Bayes Problem
3. Literature Review and Summary of the Present Work
4. Some Notations and Conventions

CHAPTER 1: BAYES COMPOUND RULES FOR COMPACT FAMILIES
1. Introduction
2. Admissible and Asymptotically Optimal Compound Bayes Rules
   Theorem 1 (Equi uniform continuity of R_n(t, ·) and R(t, ω))
   Theorem 2 (Q_Λ inherits full support of Λ)
   Remark 1 (Admissibility of compound Bayes rules)
   Theorem 3 (A sufficient condition on A.O. of t̂)
   Remark 2 (A.O. of compound Bayes rules)
3. Multivariate Exponential Families
   Remark 3 (On the assumptions of L)
   Lemma 1 (Continuity of p_θ)
   Lemma 2 (Continuous differentiability of p_θ)
   Lemma 3 (Norm Lipschitz of P_θ)
   Lemma 4 (R(t,θ) inherits equicontinuity of L(a,θ))
   Remark 4 (R(t,θ) inherits equi Lipschitz of L(a,θ))
   Corollary 1 (Equicontinuity of R(t,θ))
   Remark 5 (On the existence assumption of Bayes rules)
   Corollary 2 (Application of Theorem 2)
   Corollary 3 (Application of Theorem 3)

APPENDIX FOR CHAPTER 1
1. Equicontinuity of Loss Functions
   Lemma 1 (Equicontinuity of L)
   Lemma 2 (Equi Lipschitz of L)
   Remark 1 (Convex loss)
   Example (A.O. rule does not exist)
2. A k-dimensional Müntz-Szász Theorem with Applications
   Theorem 1 (A k-dimensional Müntz-Szász theorem)
   Remark 2 (Examples of A(k))
   Theorem 2 (Identifiability with respect to an exponential family)

CHAPTER 2: COMPOUND ESTIMATION OF TRUNCATION PARAMETERS
1. Introduction
2. The Component Problem
   Lemma 1 (An equation satisfied by t_ω)
   Lemma 2 (Inversion formula for ω)
3. Compound Estimators
4. Estimation of p_{G_n}
   Lemma 3 (A bound on conditional L1 error for p̂)
   Corollary 1 (Same for p̂_i)
   Lemma 4 (Bounds on various L1 errors for p̂)
   Corollary 2 (Same for p̂_i)
5. Asymptotic Optimality of t̂
   Theorem 1 (L1 consistency of Ĝ_n)
   Lemma 5 (Singh-Datta Lemma)
   Theorem 2 (A.O. of t̂)
   An Example

CHAPTER 3: THE LINEAR COMPOUND PROBLEM
1. Introduction
2. The Component Problem and a Compound Rule
   The linear simple rule
   The compound rule t̂
3. Asymptotic Optimality of t̂
   Theorem 1 (A.O. of t̂)
   Lemma 1 (Identities for γ̂ − γ and H(γ̂) − H(γ))
   Lemma 2 (Bounds on E_θ|γ̂ − γ| and E_θ|H(γ̂) − H(γ)|)
   Lemma 3 (Bounds on average L1 error of X_i and L1 error of m)
   Proof of Theorem 1
4. An Example

BIBLIOGRAPHY


CHAPTER 0
INTRODUCTION

1. The Component Problem

Let X be a random element taking values in a measurable space (𝒳, 𝒜) and let the parameter space Θ be a metric space. For each θ ∈ Θ, P_θ is a probability measure on (𝒳, 𝒜). We are to take an action a in A based on the observed value of X from P_θ. The non-negative loss L(a,θ) is incurred when a is taken and θ is the true parameter. A non-randomized decision rule t maps 𝒳 into A such that L(t,θ) is measurable. The P_θ-average of L(t,θ) is called the risk of t at θ and is denoted by R(t,θ). For a restricted problem, the loss and decision rules are bypassed in favor of consideration of an arbitrary class of risks.

Let Ω be the set of all probability measures on the Borel σ-field of Θ. For each ω ∈ Ω, the ω-average of R(t,·) is called the Bayes risk of t at ω and is denoted by R(t,ω). The infimum Bayes risk is denoted by r(ω), and a decision rule t_ω such that R(t_ω,ω) = inf_t R(t,ω) = r(ω) is called a Bayes decision rule versus ω. If the infimum is taken among the affine functions of x, the infimum is denoted by r_L(ω) and any minimizer is called a linear Bayes decision rule versus ω.

2. The Set Compound Problem and the Empirical Bayes Problem

Suppose that the component decision problem occurs repeatedly and independently n times and we are to make decisions about θ = (θ_1, ..., θ_n) based on X = (X_1, ..., X_n). A set compound rule t = (t_1, ..., t_n) is a sequence such that each t_i(X) is a decision at the i-th stage. The parameter space and the action space for the compound problem are Θ^n and A^n.
The loss function L_n: A^n × Θ^n → [0, ∞) is taken to be the average loss across the components, L_n(a, θ) = (1/n) Σ_{i=1}^n L(a_i, θ_i). For x = (x_1, ..., x_n), let x_{(i)} denote x with i-th component deleted and, for a compound rule t, let R(t_i, θ) denote the conditional risk given x_{(i)}. With E_θ standing for the expectation determined by P_θ = ×_{i=1}^n P_{θ_i}, the compound risk is

(1)  R_n(t, θ) = E_θ L_n(t, θ) = (1/n) Σ_{i=1}^n E_θ L(t_i, θ_i);

the rhs(1) defines the compound risk with restricted components. A compound rule t is said to be simple if t_1 = ··· = t_n and each t_i is a function of x_i only. As a function of θ,

(2)  inf { R_n(t, θ) ; t simple }

is called the simple envelope. Since R_n(t, θ) = R(t, G_n) for a simple rule with component t, (2) coincides with r(G_n). For a compound rule t, the modified regret is defined by

(3)  D_n(t, θ) = R_n(t, θ) − r(G_n).

We say a sequence of compound rules t_n is asymptotically optimal (a.o.) if

(4)  sup_θ D_n(t_n, θ) → 0 as n → ∞,

and asymptotically optimal with rate α_n if

(5)  sup_θ D_n(t_n, θ) = O(α_n).

For the relation of this notion of optimality to that with more stringent envelopes, see Theorems 1 and 2 of Gilliland and Hannan (1974-) for finite Θ and Remark 1 of Mashayekhi (1993) for compact Θ. For the linear compound problem, the modified regret for a decision rule t is defined by

(6)  D_{nL}(t, θ) = R_n(t, θ) − r_L(G_n),

and a sequence of decision rules t_n is said to be asymptotically optimal (with respect to r_L(G_n)) if

(7)  sup_θ D_{nL}(t_n, θ) → 0 as n → ∞.

Asymptotic optimality with rate is defined similarly as in (5). A sequence of set compound rules t_n is said to be admissible if, for each n ≥ 1, t_n is admissible with respect to the compound risk R_n.

In the empirical Bayes problem, it is assumed that there exists an ω ∈ Ω such that the θ_i are iid according to ω. At stage n, the risk of a decision rule t_n is

(8)  R*(t_n, ω) = ∫ E_θ L(t_n, θ_n) dω^n,

and the modified regret is defined by

(9)  D*(t_n, ω) = R*(t_n, ω) − r(ω).

A sequence of decision rules t_n is said to be asymptotically optimal (with rate α_n) if, as n → ∞,

(10)  D*(t_n, ω) = o(1)  (O(α_n)).

A sequence of empirical Bayes rules t_n is said to be admissible if, for each n ≥ 1, t_n is admissible with respect to the empirical Bayes risk R*.

3. Literature Review and a Summary of the Present Work

Following his featured example on decisions between N(−1,1) and N(1,1), Robbins (1951), in Section 6 of his Berkeley Symposium paper, introduced the compound and empirical Bayes problems, pointing out their relations and possible solutions. Since then a great amount of work has been done in these two fields. Hannan and Robbins (1955) generalized the example of Robbins (1951) to two arbitrary specified distributions and proved the asymptotic optimality of their proposed compound rules. Asymptotic optimality with rate O(1/√n) was proved later by Hannan and Van Ryzin (1965). This result was further generalized by Van Ryzin (1966) to finite Θ and finite A. For non-finite Θ, Gilliland (1968) and Singh (1974) proved the asymptotic optimality with rate of their compound procedures (estimates of simple rules Bayes with respect to G_n) under squared error loss for some compact (discrete and continuous) exponential families. Gilliland, Hannan and Huang (1976) first proved admissibility and asymptotic optimality with rate O(n^{-1/2}) of the compound rules Bayes with respect to appropriate diffuse hyperpriors (Robbins' conjecture) for two arbitrary specified distributions. The proof was based on Theorem 3 and Theorem 4 of Gilliland and Hannan (1974-); the latter reduces the problem of asymptotic optimality to that of posterior consistency for the finite state compound problem with the restricted risk structure. This method has become standard in proving asymptotic optimality and admissibility in compound problems.
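Robbins' two-point example lends itself to a quick numerical illustration of the modified regret (3): estimate the unknown fraction of +1's from the data, plug it into the component Bayes rule, and compare with the simple envelope r(G_n). The sketch below is our own illustration (the moment estimator and all names are ad hoc), not one of the rules studied in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_cutoff(p):
    # Bayes rule versus a prior putting mass p on theta = +1:
    # decide +1 iff x >= (1/2) log((1 - p) / p)
    return 0.5 * np.log((1 - p) / p)

def avg_loss(decisions, theta):
    # L_n of display (1) with 0-1 component loss
    return np.mean(decisions != theta)

n = 10_000
theta = rng.choice([-1.0, 1.0], size=n, p=[0.3, 0.7])   # arbitrary parameter sequence
x = theta + rng.standard_normal(n)                      # X_i ~ N(theta_i, 1)

# simple envelope r(G_n): Bayes rule versus the (unknown) empirical fraction
p_n = np.mean(theta == 1.0)
env = avg_loss(np.where(x >= bayes_cutoff(p_n), 1.0, -1.0), theta)

# compound rule: plug a moment estimate of p_n into the same cutoff (E X = 2p - 1)
p_hat = np.clip((x.mean() + 1) / 2, 1e-3, 1 - 1e-3)
comp = avg_loss(np.where(x >= bayes_cutoff(p_hat), 1.0, -1.0), theta)

print(f"envelope {env:.4f}  compound {comp:.4f}  modified regret {comp - env:.4f}")
```

With n this large the plug-in cutoff is nearly indistinguishable from the envelope cutoff, consistent with the O(n^{-1/2}) rates cited above.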
For squared error loss estimation in one-dimensional exponential families, Datta (1991b) obtained admissible asymptotically optimal compound rules which are Bayes versus mixtures derived from hyper-priors on Ω; this was the first such result for non-finite Θ, implementing the outline in Addendum III of Gilliland, Hannan and Huang (1976). Mashayekhi (1993) strengthened Datta's result to the equivariant envelope, and his Theorem 1 gave a sufficient condition for the asymptotic optimality of "delete bootstrap" rules.

In this thesis we consider three different problems in three chapters. The component problem in Chapter 1 has a restricted risk structure. Section 2 first proves (Theorem 1) that both compound and component Bayes risks inherit the equicontinuity of the component risk. Based on this result, Theorem 1 of Mashayekhi (1993) is reduced to an easily verified form (Theorem 3) and the proof of admissibility (Remark 1) of the proposed compound rules becomes straightforward. Section 3 of Chapter 1 applies the results of Section 2 to a multi-dimensional exponential family. Asymptotic optimality and admissibility are proved for a large class of loss functions. This extends results 3.2 and 3.3 of Datta (1991b) from one dimension to multi-dimension and from squared error loss to equi(in a) continuous loss. The Appendix for Chapter 1 consists of two parts. The first part gives some sufficient conditions for a loss function to be equi(in a) continuous and proves in particular that most convex losses are equi(in a) continuous. The example given at the end of the first part shows that in some cases equicontinuity of a loss function is necessary for the existence of asymptotically optimal compound rules. In the second part, Theorem 1 is a multi-dimensional version of the Müntz-Szász theorem. Based on Theorem 1, Theorem 2 gives a sufficient condition for the identifiability of multi-dimensional exponential families.
In Chapter 2, we consider compound estimation of truncation parameters of certain nonregular families (Ferguson (1967), Section 3.5). A one-dimensional version was considered earlier by Nogami (1978). Empirical Bayes estimation of truncation parameters has long been considered. Fox (1978) considered two nonregular families, U(0, θ) and U(θ, θ+1), and proved the asymptotic optimality of his estimators without rate under the assumption that the prior G has second moment. Nogami (1988) treated U(0, θ) on a compact parameter space of R and proved that her empirical Bayes estimator is asymptotically optimal with rate n^{-1/2}. Nogami's result was generalized to a wider class of nonregular distributions by Datta (1991); see more references in this paper. Wei (1989) considered a two-sided truncation family in the empirical Bayes problem. He proved asymptotic optimality without rate under the assumptions that the parameter space is compact and the prior G has a density. In this thesis, under the assumption that the parameter space is compact, our proposed compound estimators are proved to be asymptotically optimal with rate n^{-1/4}. Also, estimators of the empirical distribution G_n are given and are shown to be L1-consistent with rate n^{-1/4}.

Chapter 3 considers the linear compound problem introduced by Robbins (1983). Asymptotic optimality in the linear empirical Bayes problem was studied by Yu (1986), (1988). Here asymptotic optimality of compound estimators is proved under some mild conditions.

4. Some Notations and Conventions

For any positive integer n, let a denote an n-tuple (a_1, ..., a_n). |·| is used to denote the Euclidean norm and xy the inner product of x and y on Euclidean spaces. If P_1, ..., P_n are probability measures, we use P = ×_{i=1}^n P_i to denote the product probability measure. If the index is not exhibited for a sum Σ or product Π, it runs from 1 to an integer and the integer is clear from the context.
We sometimes use μ(f) to denote the integral of a function f with respect to a signed measure μ. Dummy variables are often at least partially displayed in integrals such as ∫f(x) dμ. E is used for the expectation corresponding to a probability measure P. A function on a set 𝒳 to a set 𝒴 is denoted by x ∈ 𝒳 → y ∈ 𝒴. Sometimes we abuse notation and denote functions by their values. Indicator functions are often not distinguished from the sets they indicate. The symbol [ ] is often used for the indicator function. In this thesis, Θ is always a parameter set and θ is a generic element of Θ. For θ ∈ Θ^n, P_θ is the product measure ×P_{θ_i} and G_n is the empirical probability measure determined by θ. ∨ and ∧ represent supremum and infimum respectively.

CHAPTER 1
BAYES COMPOUND RULES FOR COMPACT FAMILIES

1. Introduction

Admissibility and asymptotic optimality are studied for a restricted risk compound problem. As an application of the obtained results, a multivariate exponential family is considered.

Let 𝒫 be a family of probability measures on a metric space (𝒳, 𝒜), compact in the total variation norm. Let M < ∞ and let 𝒟 be a class of decision rules such that R(t,P) ≤ M for any t ∈ 𝒟 and P ∈ 𝒫. Let Ω be the set of all probability measures on 𝒫 with the weak convergence topology. We assume that for any ω ∈ Ω there exists a Bayes decision rule t_ω ∈ 𝒟, that is,

(1)  R(t_ω, ω) = inf_{t ∈ 𝒟} R(t, ω).

Consider a (restricted risk) compound problem with n independent repetitions of the above restricted risk component structure. For x = (x_1, ..., x_n), let x_{(α)} denote x with α-th component deleted and, for P ∈ 𝒫^n, let P_{(α)} = ×_{i≠α} P_i denote P with α-th factor deleted. The class of decision rules is 𝒟_n such that every component t_α of t = (t_1, ..., t_n) ∈ 𝒟_n is a 𝒟-valued map of x_{(α)} and R(t_α, P) is measurable for any P ∈ 𝒫. The compound risk (0.1) of t ∈ 𝒟_n at P ∈ 𝒫^n can be written as

(2)  R_n(t, P) = (1/n) Σ_α P_{(α)} R(t_α, P_α).

For a more detailed description of the structure of the above restricted risk problem, see Section 1 of Mashayekhi (1993) and, for the relation between the usual decision problem structure and the restricted risk problem structure, see Section 5 of Gilliland and Hannan (1974-).

Section 2 of this chapter considers admissibility and asymptotic optimality of compound Bayes rules for the above restricted risk compound problem. Theorem 1 proves that the compound and component Bayes risks inherit the equicontinuity and boundedness of the component risk. Theorems 2 and 3 concern the admissibility and the asymptotic optimality of compound Bayes rules. In Section 3, we apply the results of Section 2 to a multivariate exponential family. Lemmas 1 through 3 concern some basic properties of the component distribution. Lemma 4 is a general result on relations between risks and loss functions. An application of Lemma 3 and Lemma 4 gives us the equicontinuity of the component risk (Corollary 1). Corollaries 2 and 3 prove admissibility and asymptotic optimality of compound Bayes rules based on full support hyper-priors.

2. Admissible and Asymptotically Optimal Compound Bayes Rules

In this section, we consider the asymptotic behavior of a compound Bayes rule for a restricted risk problem. In Theorem 1, we prove that the compound risk R_n(t,·) and the component Bayes risk R(t,·) inherit the equicontinuity and boundedness of the component risk. With Λ a hyper-prior on the set of all probability measures Ω, we define the prior Q_Λ to be the Λ-mixture of n iid ω.
Theorem 2 proves that Q_Λ has full support if Λ has; consequently, a Bayes rule t_Λ against Q_Λ is admissible. Based on Theorem 1 of Mashayekhi (1993), Theorem 3 gives a sufficient condition for t_Λ to be asymptotically optimal.

Theorem 1. Suppose the component risk R(t,P) is equi(in t) uniformly continuous and equi(in t) bounded by M. Then (i) the compound risk R_n(t,P) is equi(in t) uniformly continuous on 𝒫^n and (ii) the component Bayes risk R(t,ω) is equi(in t) uniformly continuous on Ω. Both obviously inherit the bound M.

Proof: Let P, P' ∈ 𝒫^n. By triangulation about P_{(α)}R(t_α, P'_α) (with α's unexhibited) and 0 ≤ R ≤ M, the difference of summands of R_n (in (2)) at P and P' is bounded by

(3)  sup_t |R(t, P_α) − R(t, P'_α)| + (M/2) ||P_{(α)} − P'_{(α)}||.

By triangulations about n−1 points changing P' to P one coordinate at a time and applications of the subadditivity and multiplicativity of the norm,

(4)  ||P − P'|| ≤ Σ_{j=1}^n || (×_{k<j} P_k) × (P_j − P'_j) × (×_{k>j} P'_k) || = Σ_{j=1}^n ||P_j − P'_j||

(Mashayekhi (1990), display (2.7)). For any ε > 0, the equi(in t) uniform continuity of R and (4) yield a δ > 0 such that

(5)  ∨_j ||P_j − P'_j|| < δ implies (3) < ε.

Hence each summand of R_n (in (2)) is equicontinuous and so is R_n.

To prove the component Bayes case, we use the following inequality of Oaten (1972) (Result (a) on page 1179): if E, F are probability measures on the Borel σ-field of a metric space (𝒳, ρ), γ is their Prohorov distance and h is a bounded measurable function on 𝒳, then

(6)  |(E − F)h| ≤ γ Diameter(h(𝒳)) + α_h,

where

(7)  α_h = sup{ |h(P) − h(P')| ; ρ(P, P') < γ }.

The Oaten inequality (6) here bounds |(ω_1 − ω_2)R(t,·)| by γM + sup{ |R(t,P) − R(t,P')| ; ||P − P'|| < γ }, which, by the uniform equicontinuity of R, converges to 0 uniformly in t as γ converges to 0. □

The compound decision rules derived below are similar to the ones in Datta (1991b), Section 3.1. They are Bayes rules against the priors (8) induced by hyper-priors on Ω.
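The norm bound (4), total variation subadditivity across product factors, is easy to confirm numerically on finite sample spaces. A small self-contained check (all names ours):

```python
import numpy as np

def tv_norm(p, q):
    # total variation norm of the signed measure p - q on a finite space
    return np.abs(p - q).sum()

def product(factors):
    out = factors[0]
    for p in factors[1:]:
        out = np.multiply.outer(out, p)   # build the product measure one factor at a time
    return out

rng = np.random.default_rng(1)
for _ in range(100):
    P = [rng.dirichlet(np.ones(4)) for _ in range(3)]
    Q = [rng.dirichlet(np.ones(4)) for _ in range(3)]
    lhs = tv_norm(product(P), product(Q))
    rhs = sum(tv_norm(p, q) for p, q in zip(P, Q))
    assert lhs <= rhs + 1e-12             # display (4)
print("||xP_j - xQ_j|| <= sum_j ||P_j - Q_j|| held in all trials")
```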
Let B(Ω) denote the Borel σ-field on Ω corresponding to the weak convergence topology and let Λ be a probability measure on (Ω, B(Ω)). Let Q_Λ denote the compound prior on 𝒫^n, the Λ-mixture of ω^n determined by the values

(8)  Q_Λ(B_1 × ··· × B_n) = ∫ Π_{i=1}^n ω(B_i) dΛ(ω)

of these Λ integrals of continuous functions. Let 𝒫_ω denote the ω-mixture of P and Λ_{n-1} the Λ-mixture of 𝒫_ω^{n-1}. Since the joint probability Q_Λ induces the Λ-mixture of 𝒫_ω^{n-1} × 𝒫_ω on 𝒳^{n-1} × 𝒳, we assume the latter has a disintegration with a conditional probability ω_α on Ω given x_{(α)}. (When, as in Section 3, 𝒫_ω has density p_ω and Λ_α is the probability on Ω with Λ-density proportional to Π_{i≠α} p_ω(x_i), then the Λ_α-mixture of ω is such an ω_α.) It then follows from the definition of the compound risk R_n that t_Λ is Bayes with respect to Q_Λ if, for every α, t_{Λα} = t_{ω_α} with

(9)  t_{Λα}(x) = t_{ω_α(x_{(α)})}(x_α)  for all x.

Theorem 2. If Λ has full support Ω, then Q_Λ has full support 𝒫^n. Hence if in addition R_n(t,·) is continuous for each t ∈ 𝒟_n, then the Bayes rules with respect to Q_Λ are admissible.

Proof: Let B_i, i = 1, ..., n, be open sets in 𝒫. Since

(10)  {ω ; ω(B) > 0}

is open for any open set B in 𝒫 (see Billingsley (1968), Theorem 2.1.iv) and Λ has full support, rhs(8) is strictly positive. Hence Q_Λ also has full support. □

Remark 1. Under the assumption that the component risks R(t,·) are equicontinuous and bounded, by Theorem 1(i) the compound risks R_n(t,·) are equicontinuous. Hence Theorem 2 yields that the compound Bayes rule t_Λ is admissible. □

Let t_{G_n} denote a simple compound decision rule with components Bayes versus G_n, the empirical measure determined by P. Let φ be a symmetric mapping from 𝒳^{n-1} into Ω and let t̂ be a compound rule with

(11)  t̂_α(x) = t_{φ(x_{(α)})}(x_α).
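To make a rule of the form (11) concrete, here is a toy compound estimator under squared-error loss in a N(θ,1) component family with Θ = [−1, 1] (ignoring this chapter's bounded-loss framework): the component Bayes rule t_ω is the posterior mean, and for each α the prior plugged in is the empirical measure of the remaining observations clipped to Θ. That plug-in is a crude stand-in for the mapping φ, chosen only for illustration and not one of the mappings analyzed here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
theta = rng.uniform(-1, 1, n)            # parameters in the compact set Theta = [-1, 1]
x = theta + rng.standard_normal(n)       # X_i ~ N(theta_i, 1)

def t_omega(atoms, xi):
    # component Bayes rule versus the discrete prior omega with equal mass on
    # `atoms`: under squared-error loss it is the posterior mean
    w = np.exp(-0.5 * (xi - atoms) ** 2)
    return np.sum(w * atoms) / np.sum(w)

# a rule of the form (11): for component alpha, plug the empirical measure of
# the clipped remaining observations into the component Bayes rule
est = np.array([t_omega(np.clip(np.delete(x, a), -1, 1), x[a]) for a in range(n)])

print(f"compound average squared error {np.mean((est - theta) ** 2):.3f} "
      f"vs naive {np.mean((x - theta) ** 2):.3f}")
```

Even this naive φ already improves substantially on the unshrunk estimates x.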
Proof: We will use the following theorem Of Mashayekhi (Theorem 1 Of Mashayekhi (1993)): i is an. if (13) for any 6 > 0, 3 6) 0 such that V n n 5%,— $9.." < 5 implies (cn — .L)(R(£n,.) — R(tn,-)) < e, and (12) holds. Much more than (13) holds here: By the one—to—oneness of the map as ~~> .9“ , Remark 2 of Mashayekhi (1993) shows the distance between I.) and («1’ defined by H 9w — .9“, || determines the same tOpOlogy as does the Prohorov distance d. By the assumptions on the component risks R(t,-) and Theorem 1 (ii), V r) 0,3 6>0suchthat d(wl,w2) < 6 implies |(wl-w2)R(t,-)| < e 12 for any decision rule t e .9! n Remark 2. The conditional probability w in (9) can be replaced by its average a over permutations; without loss of generality it can therefore be taken symmetric. Also because ”a is constant with respect to a, 1A defined by (9) is a special case of (11). u 3. Multivariate Exponential Families In this section, we consider a multivariate exponential family example of Section 2. The component distribution assumptions are polytOpe parameter space 6 and finite first moments on the vertices of O. Lemmas 1, 2, 3 prove some basic prOperties of our exponential family under these assumptions. The assumptions on the loss function L(a,0) are boundedness and equi(in a) continuity. Lemma 4 proves more generally that the component risk R(t,0) inherits the equicontinuity of L under the hypothesis that 0 am) P 0 is uniformly continuous. As an immediate consequence Of Lemmas 3 and 4, our component risk is equicontinuous and bounded (Corollary 1). Using Theorem 2, Corollary 2 proves the admissibility of our compound Bayes rule and, using Theorem 3 together with a posterior consistency result of Datta and Theorem A.2, Corollary 3 proves the asymptotic Optimality. Let 6 beapolytOpe in Rk with vertices b1, ..., hm. Let p beameasure on the Borel o—field of Rk and for each a e 9, let (14) pix) = eoxdfio) be a density of P0 with respect to u such that (15) g IIXIexp(b,X)du(X) < .. 
Let M be a positive number and let the loss function L: (a, 0) e A x O ~~> [0, M] 13 be equi(in a) uniformly continuous, i.e. V e > 0, 3 6 > 0 such that (16) |01 — 02| < 6 implies |L(a, 01) -L(a, 02)| < c for 01, 0269 and aEA. Remark 3. Assumption (16) is satisfied if A is a compact subset of RI and L satisfies conditions (ii) and (iii) of the Appendix. The example given there shows that in some cases there may not exist an asymptotically optimal rule if (16) is not satisfied. I: Some consequences Of assumptions (14) and (15) are stated here for later use. Lemma 1 proves the continuity of the function 0 ~~> p 0(x) for each x and the continuity of E 0X Lemma 2 proves the continuous differentiability Of 0 ~~> p 0(x) for each x. Lemma 3 shows that 0 ~~> P 0 is Lipschitz in the total variation norm. Lemmal. 0~~> [ehdubq and 0~~> leleodex) are continuous on 9. Proof: The continuity of the above two functions is a direct consequence of finiteness of jexp(bix)dp(x) for each i, ( 15) and the dominated convergence theorem in view of the domination (17) exp(0X) s exptvbix) s Eexp(b,X). o e 9, resulting from the fact 9 = convex hull Of {bl, ..., bm}. a Lemms 2. Let M0) = jeoxdp(x). Then, for each 0 E 9 and for any direction a in the unit ball of Rk and into 9, the directional derivative Aa Of A exists on Band (18) Rate) = I (am)e”" duo). 14 Furthermore A is continuous in (01,9). Since 1]: = lnA, the directional derivative Ill: A/ A inherits the continuity of A and A. Proof: For any 0 and unit a, the a directional derivative of exp( 0x) at 9 is (aar)exp( 0x). Since, for 0 e 9, the derivative is dominated by |x|2exp(bixi) which is integrable by (15), a standard theorem Of interchanging the order of integration and differentiation (e.g. Theorem 1.5.4 Of Fabian and Harman (1985)) yields (18) for 0 e O and a into 9. The continuity Of A follows from the dominated convergence theorem with the same dominating function. 
For compact Θ in the interior of the natural parameter space, there always exists a polytope containing Θ such that (15) is satisfied (cf. e.g. Theorem 2.2 of Brown (1986)).

Let || || denote the total variation norm of signed measures on R^k. Since rhs(18) is defined on Θ × the unit ball in R^k, we abuse notation and use λ_α and ψ_α to denote their corresponding extensions also. Let K_1 = ∨_α sup_θ |ψ_α(θ)| and K_2 = sup_θ ∫ |x| p_θ dμ + K_1.

Lemma 3. The map θ → P_θ is Lipschitz on Θ in the total variation norm.

Proof: Since the unit ball in R^k and Θ are compact, Lemma 2 implies that K_1 is finite. Hence, by Lemma 1, K_2 is finite. By applying the mean value theorem to the exponential function exp and noticing that exp is monotone,

(19)  |p_θ − p_{θ'}| ≤ |θx − θ'x − (ψ(θ) − ψ(θ'))| (p_θ ∨ p_{θ'}).

By the Euclidean norm inequality and by the mean value theorem applied to ψ,

(20)  rhs(19) ≤ |θ − θ'| (|x| + K_1)(p_θ + p_{θ'}).

Hence the conclusion follows from the integration of (19) with respect to μ. □

Lemma 4 below shows, more generally, that component risks inherit equi(in a) uniform continuity of loss functions.

Lemma 4. If only θ → P_θ is uniformly continuous and if θ → L(a,θ) is such equi(in a) and bounded equi(in a), then the component risk R(t,θ) is equi(in t) uniformly continuous and bounded on Θ.

Proof: The risk is bounded because an expectation inherits the bounds of its integrand. By triangulating about P_{θ'} L(t, θ'),

(21)  |R(t,θ) − R(t,θ')| ≤ ∨_a |L(a,θ) − L(a,θ')| + (½ ∨_{a,θ} L(a,θ)) ||P_θ − P_{θ'}||;

this proves the equicontinuity of R. □

Remark 4. If θ → P_θ is Lipschitz and θ → L(a,θ) is equi-Lipschitz and uniformly bounded, from (21) θ → R(t,θ) is equi-Lipschitz and uniformly bounded. Remark A.1 gives examples of equi-Lipschitz loss functions. □

Using Lemma 3, the following is an immediate special case of Lemma 4 and Remark 4:

Corollary 1. For our multivariate exponential family, R(t,θ) is equi(in t) uniformly continuous and bounded on Θ.
If we further assume that L is equi(in a) Lipschitz, then R(t,θ) is equi(in t) Lipschitz and bounded. □

For the remainder of this chapter, we make the assumption that component Bayes decision rules exist. Let Ω be the set of all probability measures on Θ. For each ω ∈ Ω let 𝒫_ω and p_ω denote the ω-mixtures of P_θ and p_θ. Define the posterior expected loss

(22)  L_x(a, ω) = ∫ L(a,θ) p_θ(x) dω(θ) / p_ω(x).

We assume that for any ω ∈ Ω, there exists a measurable t_ω: R^k → A such that t_ω(x) minimizes L_x(·, ω) on A. Such a t_ω is a (component) Bayes decision rule.

Remark 5. In the context of the compound problem (which is the subject of this thesis), the assumption on the existence of a measurable minimizer is without essential loss of generality since there always exists an ε-Bayes decision rule. (See Gilliland and Hannan (1974-), last paragraph on page 135, for further explanation for finite Θ.) Under the hypothesis of Remark 3, Bayes decision rules always exist. In this case, L_x(·, ω) inherits the continuity of L(·, θ) by the bounded convergence theorem, hence attains the minimum at some point t_ω(x) in the compact set A. Because L is then also jointly measurable on A × Θ and A is complete, it is possible to choose a measurable t_ω. (See Brown and Purves (1973), Theorem 3.) Under the assumption (16), closure of 𝓛 = {L(a,·) ; a ∈ A} in C(Θ) is without loss of generality by the Ascoli theorem. In this case, the previous criteria apply with A taken to be 𝓛. □

For our multivariate exponential family, {P_θ ; θ ∈ Θ} is dominated by μ. Thus, given a full support hyper-prior Λ on Ω, if Λ_α is the probability on Ω with Λ-density proportional to Π_{i≠α} p_ω(x_i), then the Λ_α-mixture of ω is a conditional probability ω_α in the disintegration of the Λ-mixture of 𝒫_ω^{n-1} × 𝒫_ω. Therefore the compound Bayes rule t_Λ makes sense in this situation and we will prove its admissibility and asymptotic optimality in the next two corollaries.

Corollary 2.
In the context of the preceding paragraph, the compound Bayes rule t_Λ is admissible.

Proof: Corollary 1 proves that R(t,·) is equi uniformly continuous and bounded on Θ. Since Λ has full support, the admissibility follows from Remark 1. □

The following notation will be needed to prove the identifiability of Ω, i.e.

(23)  ω → 𝒫_ω is 1-1,

for our multivariate exponential family, which in turn will be used to prove the asymptotic optimality of our compound rules. Let A(1) be a set in R which contains a sequence of distinct numbers {y_1, y_2, ...} such that

(24)  y_u, u ≥ 1, have the same sign and

(25)  Σ_{u=1}^∞ 1/|y_u| = ∞.

For 2 ≤ i ≤ k, let A(i) be a set in R^i such that for each y^{(i-1)} ∈ A(i-1), there exists a sequence of distinct numbers {y_{i1}, y_{i2}, ...} with properties (24) and (25) and {(y^{(i-1)}, y_{iu}) ; u = 1, 2, ...} ⊂ A(i).

Corollary 3. In the context of the paragraph preceding Corollary 2, suppose that the support of μ is a set A(k) defined above. Then the compound Bayes rule t_Λ is asymptotically optimal.

Proof: Since t_Λ = t̂ by Remark 2, we need only verify the hypothesis of Theorem 3 and condition (12). Since Θ is compact and the map θ → P_θ is Lipschitz by Lemma 3, {P_θ ; θ ∈ Θ} is compact for the total variation norm. The assumption that the support of μ is a set A(k) and Theorem A.2 imply the identifiability of Ω. Corollary 1 asserts that R(t,θ) is equi(in t) uniformly continuous and bounded on Θ.

To prove the posterior consistency condition (12), we will use Theorem 3.1 of Datta (1991a). Since Θ is a compact metric space, by that theorem we only need to verify the following two conditions:

A1. p_θ(x) is continuous in θ for each x, and

A2. with h*_θ = ∨_{θ'} log(p_{θ'}/p_θ),

(26)  ∨_θ ∫ (h*_θ − M)^+ p_θ dμ → 0 as M → ∞.

A1 holds by Lemma 1. A2 will follow since (26) holds with h*_θ and p_θ increased to their respective suprema with respect to θ. Since log(p_{θ'}/p_θ) = (θ' − θ)x + ψ(θ) − ψ(θ'), the mean value theorem gives h*_θ ≤ γ(|x| + K_1) with γ the diameter of Θ and K_1 the bound of the directional derivatives of ψ. The domination (17) and the boundedness of exp(−ψ) give a μ-integrable uniform bound for p_θ. In view of the assumed integrability (15), the strengthened (26) follows from the continuity from above of the μ-integral. □

APPENDIX FOR CHAPTER 1

Some results which are used in Chapter 1 and might have independent interest are presented here. Section 1 discusses some sufficient conditions for the equi(in a) continuity of the loss functions L(a,θ) and gives an example to show that asymptotically optimal decision rules may not exist if this condition is violated. Section 2 develops a k-dimensional version of the Müntz-Szász theorem and gives a sufficient condition for identifiability with respect to k-dimensional exponential families.

1. Equicontinuity of Loss Functions

Two lemmas, a remark and an example are given in this section. Lemma 1 shows that conditions (i), (ii) and (iii) below imply the assumptions about the loss functions in Chapter 1. Lemma 2 gives a sufficient condition for (iii). Based on Lemma 2, Remark 1 shows that, when restricted to compact action and parameter spaces, almost all the loss functions in common use fall under our consideration. An example is given to show that the Chapter 1 assumptions on loss functions are relatively necessary to ensure the existence of asymptotically optimal compound rules.

Consider the following conditions on a loss function L: (a,θ) ∈ A × Θ → [0, ∞):

(i) A × Θ is a compact subset of R^l × R^k;

(ii) every a-section of L is continuous on Θ;

(iii) L is equi(in θ) uniformly continuous on A, i.e. for every ε > 0 there is a δ > 0 such that

(1)  |a_1 − a_2| < δ implies |L(a_1, θ) − L(a_2, θ)| < ε for a_1, a_2 ∈ A and θ ∈ Θ.

Lemma 1. Let L satisfy conditions (i), (ii) and (iii). Then L is continuous and bounded on A × Θ. Moreover, L satisfies the equi(in a) uniform continuity condition (1.16).
Since log(p_{θ'}/p_θ) = (θ' − θ)'x + ψ(θ) − ψ(θ'), the mean value theorem gives h_θ* ≤ γ(|x| + K_1) with γ the diameter of Θ and K_1 the bound of the gradient of ψ. The domination (17) and the boundedness of exp(−ψ) give a uniform bound in L_1(μ) for p_θ. In view of the assumed integrability (15), the strengthened (26) follows from the continuity from above of the μ-integral. □

APPENDIX FOR CHAPTER 1

Some results which are used in Chapter 1 and might have independent interest are presented here. Section 1 discusses some sufficient conditions for the equi(in a) continuity of the loss functions L(a,θ) and gives an example to show that asymptotically optimal decision rules may not exist if this condition is violated. Section 2 develops a k-dimensional version of the Müntz–Szász theorem and gives a sufficient condition for identifiability with respect to k-dimensional exponential families.

1. Equicontinuity of Loss Functions

Two lemmas, a remark and an example are given in this section. Lemma 1 shows that conditions (i), (ii) and (iii) below imply the assumptions about the loss functions in Chapter 1. Lemma 2 gives a sufficient condition for (iii). Based on Lemma 2, Remark 1 shows that, when restricted to compact action and parameter spaces, almost all the loss functions in common use fall under our consideration. An example is given to show that the Chapter 1 assumptions on loss functions are relatively necessary to ensure the existence of asymptotically optimal compound rules.

Consider the following conditions on a loss function L: (a,θ) ∈ A × Θ → [0,∞):

(i) A × Θ is a compact subset of R^l × R^k;

(ii) every a-section of L is continuous on Θ;

(iii) L is equi(in θ) uniformly continuous on A, i.e. ∀ ε > 0, ∃ δ > 0 such that

(1)  |a_1 − a_2| < δ implies |L(a_1,θ) − L(a_2,θ)| < ε for a_1, a_2 ∈ A and θ ∈ Θ.

Lemma 1. Let L satisfy conditions (i), (ii) and (iii). Then L is continuous and bounded on A × Θ. Moreover, L satisfies the equi(in a) uniform continuity condition (1.16).
Proof: Let (a_0,θ_0) ∈ A × Θ and ε > 0. By triangulation about L(a_0,θ), (iii) and (ii) yield a δ such that when |a − a_0| ∨ |θ − θ_0| < δ,

(2)  |L(a,θ) − L(a_0,θ_0)| < ε.

So L is continuous on compact A × Θ and therefore L is bounded thereon.

By (iii), ∀ ε > 0 there exists a δ_0 > 0 such that

(3)  |a − a'| < δ_0 implies |L(a,θ) − L(a',θ)| < ε/3 for θ ∈ Θ.

Since A is compact, there exist a_1, ..., a_m such that ∪ B(a_i, δ_0) ⊃ A, where B(a,γ) is the open ball with center a and radius γ. By the uniform continuity of each L(a_i,·), there exists a δ_i such that

(4)  |θ_1 − θ_2| < δ_i implies |L(a_i,θ_1) − L(a_i,θ_2)| < ε/3,  i = 1, ..., m.

Let δ = ∧_1^m δ_i and a ∈ B(a_j, δ_0). By triangulations about the points (a_j,θ_1) and (a_j,θ_2), when |θ_1 − θ_2| < δ, |L(a,θ_1) − L(a,θ_2)| < ε. This is the equi(in a) continuity. □

Lemma 2. Assume that (i) and (ii) hold. Also assume that A is contained in an open set G such that

(iv) for each θ ∈ Θ, L(·,θ) has a convex extension to G.

Then L is equi(in θ) Lipschitz on A.

Proof: Let B be the unit ball in R^l and c be a positive number such that A + cB ⊂ G. Define

(5)  S(a) = sup_{θ∈Θ} L(a,θ).

Since the supremum is subadditive and positive homogeneous, S inherits the convexity of L(·,θ) on G and therefore is continuous on G. So L(·,θ) is bounded on compacta in G. Let M = sup_{a∈A+cB} S(a). For any a_1, a_2 ∈ A, let

(6)  a = a_2 + c(a_2 − a_1)/|a_2 − a_1|.

Then a ∈ A + cB and

(7)  a_2 = (1 − λ)a_1 + λa,  λ = |a_2 − a_1|/(c + |a_2 − a_1|).

Hence, by the convexity of L(·,θ),

(8)  L(a_2,θ) ≤ λL(a,θ) + (1 − λ)L(a_1,θ).

By the nonnegativity and boundedness of S on A + cB, and the definition of λ, it follows from (8) that

(9)  L(a_2,θ) − L(a_1,θ) ≤ λM ≤ (M/c)|a_2 − a_1|.

Interchanging a_1 and a_2 in (9), the desired result is proved with Lipschitz constant K = M/c. □

A more general version of Lemma 2 can be found in Rockafellar (1972), Theorem 10.6.

Remark 1.
If L(a,θ) = q(|a − θ|) for some nonnegative nondecreasing convex function q on [0,∞) (which form the most commonly used loss functions take), then interchanging θ with a in Lemma 2 we have that L is also equi(in a) Lipschitz, that is,

(10)  |L(a,θ_1) − L(a,θ_2)| ≤ K|θ_1 − θ_2|,  ∀ a ∈ A and θ_1, θ_2 ∈ Θ.

This strengthens the second result of Lemma 1 from equicontinuity to equi-Lipschitz. Examples of q are x^p for p ≥ 1.

Not all convex functions can be extended (finitely) convexly. For example, f(x) = 1 − √x is a nonnegative convex function on [0,1], but cannot be extended convexly beyond 0. □

The following example shows that no asymptotically optimal rule exists if the equi(in a) continuity condition for L fails.

Example. Let A = Θ = [0,1] and let

(11)  L(a,θ) = ((a − θ)/(a + θ))² for (a,θ) ≠ (0,0),  L(0,0) = 1.

L is continuous in each variable and is bounded by 1. But L is not equi(in a) continuous at θ = 0, because no matter how small θ > 0 is, L(θ,θ) = 0 while L(θ,0) = 1. Let n be a positive integer and θ = (θ, ..., θ) be a constant sequence in Rⁿ with θ ≠ 0. By (11) a simple compound rule Bayes versus G_n is t̂ ≡ θ. So for any compound decision rule t, the modified regret is

(12)  D_n = R_n(t, θ) = (1/n) Σ_{i=1}^n ∫ ((t_i − θ)/(t_i + θ))² Π_j p_θ(x_j) dμⁿ(x).

Let 0 ≠ θ_m be a sequence converging to 0. By Fatou's lemma,

sup_θ D_n ≥ (1/n) Σ_{i=1}^n liminf_m ∫ ((t_i − θ_m)/(t_i + θ_m))² Π_j p_{θ_m}(x_j) dμⁿ(x).

If the right hand side is positive, as is true for the multivariate exponential case of (1.14) and (1.15), no asymptotically optimal compound rule exists. □

2. A k-dimensional Müntz–Szász Theorem with Applications

In this section, we use the 1-dimensional Müntz–Szász theorem to prove a k-dimensional version of it. As an application of this theorem, we consider the identifiability problem for k-dimensional exponential families.

Theorem 1. Suppose A(1) is a set in R which contains a sequence of distinct numbers {y_1, y_2, ...} such that

(13)  y_u, u ≥ 1, have the same sign

and

(14)  Σ_{u=1}^∞ |y_u| ∧ (1/|y_u|) = ∞.
For 2 ≤ i ≤ k, let A(i) be a set in R^i such that for each y^(i−1) ∈ A(i−1), there exists a sequence of distinct numbers {y_{i1}, y_{i2}, ...} with the properties (13) and (14) and {(y^(i−1), y_{iu}); u = 1, 2, ...} ⊂ A(i). Then the subspace

D(Θ) = sp{θ ∈ Θ → e^{θ'y}; y ∈ A(k)}

is dense in the uniform norm in the space C(Θ) of all continuous functions on Θ.

Proof: We will use the following simple fact in our proof: if T_g is the map from C[a,b] to C[c,d] defined by a homeomorphism g from the closed interval [a,b] onto [c,d], T_g f = f ∘ g^{−1}, then T_g is a norm preserving isomorphism.

Let {y_1, y_2, ...} be a sequence of distinct positive numbers such that Σ_u y_u ∧ (1/y_u) = ∞. The Müntz–Szász theorem (Szász (1916)) implies that, for any a > 0, every continuous function is in the C[a,1]-closure of the subspace spanned by the powers ξ^{y_u}, u = 1, 2, .... Let [c,d] be an interval with c > 0. Let a > 0 and g be a scale change on [a,1] such that its image includes [c,d]. Noticing that T_g preserves subspaces spanned by the single powers, the Müntz–Szász theorem holds on the g-image of [a,1], consequently on [c,d].

Let [a_i, b_i] be the convex hull of the i-th coordinate projection of Θ. Let {y_{i1}, y_{i2}, ...} be a sequence with the properties (13) and (14) and let s_i = sign(y_{i1}). Because of the similarity of the proofs, we only prove the case s_i = 1. By the above consequence of the Müntz–Szász theorem, each positive integer power θ_i^{m_i}, as a continuous function of ξ_i = exp(θ_i s_i), is in the C[exp(s_i a_i), exp(s_i b_i)]-closure of sp{ξ_i^{y_{iu}}; u = 1, 2, ...}. With g: θ_i ∈ [a_i, b_i] → exp(s_i θ_i) ∈ [exp(s_i a_i), exp(s_i b_i)], it follows from the properties of T_g that

(15)  θ_i^{m_i} is in the C[a_i, b_i]-closure of sp{e^{θ_i y_{iu}}; u = 1, 2, ...}.

We now use induction to prove

(16)  θ_1^{m_1}···θ_i^{m_i} is in the uniform closure of sp{exp(θ^(i)·y^(i)); y^(i) ∈ A(i)},  1 ≤ i ≤ k.

The case i = 1 is (15); the inductive step applies (15), for each y^(i−1) ∈ A(i−1), to the sequence {y_{iu}} attached to y^(i−1). Taking i = k, every polynomial in θ is in the uniform closure of D(Θ); since the polynomials are dense in C(Θ) by the Stone–Weierstrass theorem, D(Θ) is dense in C(Θ). □

Recall that identifiability of Ω means that

(17)  ω → P_ω is 1-1.

We here only consider the mixtures of a k-dimensional exponential family: μ is a measure on the Borel σ-field of R^k, Θ is a compact set in R^k and P_θ has a continuous density

(18)  p_θ(x)
= exp(θ'x − ψ(θ)) with respect to μ. With the aid of Theorem 1 we now give a sufficient condition for identifiability of Ω.

Theorem 2. Let S_μ be the support of μ, S_μ = ∩{F; F closed and μ(F^c) = 0}. If x_0 ∈ R^k and S_μ − x_0 is a set A(k) for which the hypothesis of Theorem 1 holds, then Ω is identifiable.

Proof: Suppose that P_{ω_1} = P_{ω_2} for probability measures ω_1 and ω_2 on Θ. Let S be the set where their densities are equal, S = {x; p_{ω_1}(x) = p_{ω_2}(x)}. S is closed because p_ω inherits the continuity of p_θ by the bounded convergence theorem. Since also μ(S^c) = 0, S_μ ⊂ S by the definition of support. Since p_ω(x) = ∫ exp(θ'(x − x_0)) p_θ(x_0) dω, S_μ ⊂ S implies that

(19)  ∫ f(θ) p_θ(x_0) dω_1 = ∫ f(θ) p_θ(x_0) dω_2  for all f ∈ D(Θ) ≡ sp{exp(θ'y); y ∈ S_μ − x_0}.

By Theorem 1, D(Θ) is dense in C(Θ): a g in C(Θ) is the uniform limit of a sequence f_n ∈ D(Θ). With |g| + 1 as a dominating function, the dominated convergence theorem yields that (19) holds for g. Therefore, the measures with ω_i-densities p_θ(x_0) are equal and, by the positivity of p_θ(x_0), ω_1 = ω_2. □

CHAPTER 2

COMPOUND ESTIMATION OF TRUNCATION PARAMETERS

1. Introduction

Our component distribution in this chapter is a two dimensional truncation family and the component problem is to estimate the truncation parameters in a compact set with squared error loss. Section 2 introduces a general and a special component problem. We will restrict to the special problem from Section 3 on. Two lemmas are proved for the general problem. Lemma 1 proves that a Bayes estimator satisfies a linear equation and Lemma 2 gives a relation between a probability distribution and the marginal distribution of the random variable X (an inversion formula). This relation is a two dimensional analog of (2.1) of Fox (1978), which was first noted by Robbins. In Section 3, we construct compound estimators of θ by directly estimating the solution of the equation of Lemma 1 and construct estimators of G_n by using the inversion formula of Lemma 2.
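As a preview of the machinery, the inversion formula of Lemma 2 below lends itself to a direct numerical check. The following sketch is ours, not the thesis's: it uses a hypothetical two-point mixing measure ω and takes μ to be Lebesgue measure on the unit square, then compares ω(q_y) with the right hand side of the inversion formula.

```python
import numpy as np

# Component density p_theta(x) = psi(theta) q_theta(x), with
# q_theta(x) = [theta1 <= x1][x2 <= theta2] and psi normalizing over [0,1]^2.
thetas = [(0.2, 0.7), (0.4, 0.9)]             # hypothetical support of omega
weights = [0.3, 0.7]
psi = [1.0 / ((1 - t1) * t2) for (t1, t2) in thetas]

N = 1000
g = (np.arange(N) + 0.5) / N                  # midpoint grid on [0, 1]
X1, X2 = np.meshgrid(g, g, indexing="ij")

def p_omega(x1, x2):
    """Mixture density p_omega at (x1, x2); scalar or array arguments."""
    total = 0.0
    for w, c, (t1, t2) in zip(weights, psi, thetas):
        total = total + w * c * ((t1 <= x1) & (x2 <= t2))
    return total

y = (0.3, 0.95)
# Left side: omega(q_y), the omega-mass of the NW quadrant indicator q_y(theta)
lhs = sum(w for w, (t1, t2) in zip(weights, thetas)
          if y[0] <= t1 and t2 <= y[1])

# Right side: the inversion-formula integral, by Riemann sum over [0,1]^2
q_y = (y[0] <= X1) & (X2 <= y[1])
integrand = (p_omega(y[0], y[1]) - p_omega(X1, y[1])
             - p_omega(y[0], X2) + p_omega(X1, X2)) * q_y
rhs = integrand.sum() / N ** 2
print(lhs, float(rhs))
```

Since every indicator boundary here sits on a grid line, the Riemann sum reproduces ω(q_y) essentially exactly.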
In Section 4, we consider estimation of average densities and their integrals. Upper bounds on various L₁-errors are derived for non-delete (Lemmas 3 and 4) and delete (Corollaries 1 and 2) kernel estimators. The main results are in Section 5. Using results from Section 4, Theorem 1 gives the L₁-consistency of the estimators of G_n and Theorem 2 the asymptotic optimality of our compound estimators. Section 6 gives an example of the two sided truncation families to illustrate our results.

2. The Component Problem

In this section, we first introduce a general component problem. Lemma 1 gives a linear equation which is satisfied by a Bayes estimator and Lemma 2 gives an inversion formula for a probability measure. At the end of the section, we specify the component problem we will consider in this chapter.

For any θ ∈ R², let q_θ: R² → {0,1} be the indicator function of the SE quadrant of θ, q_θ(x) = [θ_1 ≤ x_1, x_2 ≤ θ_2], and let μ be a measure on R² such that ψ(θ) ≡ 1/∫ q_θ(x) dμ ∈ (0,∞). For each θ ∈ R², let X be a random variable with μ-density

(1)  p_θ(x) = ψ(θ) q_θ(x).

We will estimate the truncation parameter θ under the squared Euclidean distance loss. Given a probability measure ω on R², let p_ω also denote its own extension,

(2)  p_ω(x) = ∫ p_θ(x) dω.

A component Bayes estimator versus ω is

(3)  t_ω(x) = ∫ θ p_θ(x) dω / p_ω(x).

Here 0/0 is interpreted as 0. The following lemma shows that t_ω satisfies the linear equation (5). Our compound estimators in the next section are approximations of the solution of the equation.

Lemma 1. With p_ω defined in (2) and

(4)  v_ω^(1)(x) = −∫_{−∞}^{x_1} p_ω(s,x_2) ds and v_ω^(2)(x) = ∫_{x_2}^{∞} p_ω(x_1,s) ds,

we have

(5)  (t_ω(x) − x) p_ω(x) = v_ω(x).

Proof: By the definition (2) of p_ω and the representation (3) of t_ω,

(6)  lhs(5) = ∫ (θ − x) p_θ(x) dω.

By the Fubini theorem and the definitions of v_ω^(2) and p_ω,

(7)  v_ω^(2)(x) = ∫∫ p_θ(x_1,s)[x_2 ≤ s] ds dω.
Since the integrand of rhs(7) is actually p_θ(x)[x_2 ≤ s ≤ θ_2], integration over s gives the second component of rhs(6). Similarly, v_ω^(1) is the first component of rhs(6). This proves Lemma 1. □

The following lemma gives an inversion formula for ω.

Lemma 2. Let p_θ and p_ω be defined by (1) and (2). Then, for any y ∈ R²,

(8)  ω(q_y) = ∫ {p_ω(y) − p_ω(x_1,y_2) − p_ω(y_1,x_2) + p_ω(x)} q_y(x) dμ(x).

Proof: Since P_θ(q_y) = 1 if q_y(θ) = 1,

(9)  q_y(θ) = q_y(θ) ψ(θ) ∫ q_θ(x) q_y(x) dμ(x).

Integrating both sides of (9) with respect to ω, by the Fubini theorem, we have

(10)  ω(q_y) = ∫ ω{ψ(θ) q_y(θ) q_θ(x)} q_y(x) dμ(x).

The lemma follows from representing the θ-rectangle indicator q_y(θ) q_θ(x) in terms of θ-NW quadrant indicators and using the definition of p_ω. □

From now on we will consider the following special case of the above component problem: Let b be a positive number and D be the upper triangle in the square [0,b]², D = {x = (x_1,x_2); 0 ≤ x_1 ≤ x_2 ≤ b}. Let m be a positive number. Let f be a measurable function from D to [0,m] such that f is nonincreasing to SE (nonincreasing with respect to the first component and nondecreasing with respect to the second component) and let e be a function from D to [1/m, m] such that e is Lipschitz with Lipschitz constant α. Let λ be the Lebesgue measure restricted to D and let μ be a measure with λ-density fe such that ψ(θ) ≡ 1/∫ q_θ dμ ∈ (0,∞) for θ ∈ D with θ_2 > θ_1. Let β be a positive number and let Θ = {θ = (θ_1,θ_2); 0 ≤ θ_1, θ_2 ≤ b, ψ(θ) ≤ β}. We will only consider Bayes estimators versus the probability measures ω on Θ. The loss function is still the squared Euclidean distance and the action space A is taken to be D.

For this component problem, because p_ω(x) > 0 a.e. P_ω, Lemma 1 yields

(11)  t_ω(x) = x + v_ω(x)/p_ω(x)  a.e. P_ω,

where

(12)  v_ω^(1)(x) = −∫_0^{x_1} p_ω(s,x_2) ds and v_ω^(2)(x) = ∫_{x_2}^b p_ω(x_1,s) ds  a.e. λ,

and, if we define F_ω and ω̃ by

(13)  F_ω(y) = ∫ P_θ(q_y) dω and ω̃(y) = ω(q_y),

then Lemma 2 yields

(14)  ω̃(y) = ∫ {p_ω(y) − p_ω(x_1,y_2) − p_ω(y_1,x_2)} q_y(x) dμ(x) + F_ω(y).

3.
Compound Estimators

In this section, we first construct non-delete and delete estimators of p_{G_n}, from which the corresponding estimators of v_{G_n} are obtained by the composition as in (12). With such obtained estimators of p_{G_n} and v_{G_n}, compound estimators of θ and estimators of G̃_n are constructed by the compositions as in (11) and (14).

Let X_1, ..., X_n be independent random variables with distributions P_{θ_1}, ..., P_{θ_n}. Let G_n be the empirical measure determined by θ. We use t̂ to denote the simple estimator with components Bayes versus G_n, t̂_i(x) = t_{G_n}(x_i). From the representations (11) of t_ω and (14) of ω̃ with ω = G_n, we might have good one-stage compound estimators of θ and of G̃_n if we used good estimators of p_{G_n} and v_{G_n}.

Let K be a bounded density on R² vanishing off the SE unit square,

(15)  K ≤ M for some positive number M,

(16)  ∫ K(x) dx = 1, and

(17)  K(x) = 0 for x ∉ [0,1] × [−1,0].

Let h be a positive number. (Dependence of h on n will be determined later.) Based on the relation dP_ω/dλ = fe p_ω, we consider the following non-delete estimator p̂ of p_{G_n},

(18)  p̂(x) = (r(x)/(n h²)) Σ_{j=1}^n K((x − X_j)/h)/(fe)(X_j),

and the X_i-delete estimator p̂_i of p_{G_n},

(19)  p̂_i(x) = (r(x)/((n−1) h²)) Σ_{j≠i} K((x − X_j)/h)/(fe)(X_j),

where r(x) = ∧{e(x − hs); s ∈ the SE unit square}/e(x) is the ratio of the infimum of e on the h-square SE of x to the value of e at x. By the Lipschitz property of e,

(20)  r(x) ∈ [1 − √2 αh/e(x), 1].

With the X_i-delete estimator v̂_i of v_{G_n} defined by the composition as in (12) with p_{G_n} replaced by p̂_i, the i-th component of our proposed compound estimator t is obtained by first replacing the p_{G_n}, v_{G_n} of the t_{G_n} in (11) by p̂_i, v̂_i and then projecting the resultant estimator to A. For any y ∈ R², let F̂_n(y) be the empirical distribution defined by

F̂_n(y) = (1/n) Σ_j q_y(X_j).

The estimator Ĝ_n of G̃_n is defined by substituting F̂_n and p̂ for F_{G_n} and p_{G_n} in G̃_n (see (14)).

4. Estimation of p_{G_n}

In this section, we will derive bounds on various L₁ errors of estimators of p_{G_n}.
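Before turning to the error bounds, the non-delete kernel estimator p̂ of Section 3 can be sketched numerically. The sketch below is ours, not the thesis's, and deliberately takes the simplest configuration: f ≡ e ≡ 1 on D (so the (fe)-weights and the correction factor r are identically 1), b = 1, a single component parameter θ, and K the uniform density on the SE unit square [0,1] × [−1,0] (a valid choice satisfying (15)–(17), with M = 1).

```python
import numpy as np

rng = np.random.default_rng(0)
theta = (0.2, 0.8)                              # hypothetical component parameter
# psi(theta) = 1 / mu{q_theta} with mu = Lebesgue on the triangle D, b = 1:
psi = 1.0 / (0.5 * (theta[1] - theta[0]) ** 2)

# Draw X_1..X_n uniform on {theta1 <= x1 <= x2 <= theta2} by rejection.
u = rng.uniform(theta[0], theta[1], size=(200000, 2))
X = u[u[:, 0] <= u[:, 1]][:50000]
n, h = len(X), 0.05

def p_hat(x1, x2):
    """(1/(n h^2)) sum_j K((x - X_j)/h) with the uniform SE-square kernel:
    counts sample points in [x1-h, x1] x [x2, x2+h]."""
    inside = ((x1 - h <= X[:, 0]) & (X[:, 0] <= x1)
              & (x2 <= X[:, 1]) & (X[:, 1] <= x2 + h))
    return inside.sum() / (n * h * h)

print(round(p_hat(0.4, 0.6), 2), round(psi, 2))
```

At an interior point of the support, p̂ is close to p_θ = ψ(θ), consistent with the bias and variance bounds of this section.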
Lemma 3 derives a bound on E_θ|p̂ − p_{G_n}| for every x and Lemma 4 bounds integrals of the derived bound. Corollary 1 derives a bound on E_θ|p̂_i − p_{G_n}| for every x and i, and Corollary 2 bounds averages of integrals of the derived bound.

Lemma 3. With p̂ given by (18), M and β the bounds for K and ψ respectively and α the Lipschitz constant for e,

(21)  Var_θ(p̂)(x) ≤ Mβ/(n h² (fe)(x)) ≡ V²(n)(x),

and

(22)  |E_θ p̂ − p_{G_n}|(x) ≤ β ∫∫ |q_θ(x−hs) − q_θ(x)| dG_n(θ) K(s) ds + √2 αhβ/e(x).

Hence, from the inequality E|W| ≤ |EW| + Var^{1/2}(W) with W = p̂ − p_{G_n},

(23)  E_θ|p̂ − p_{G_n}| ≤ V(n) + rhs(22).

Proof: We prove the evaluations of (21)–(22) at an arbitrary x. Because the variance of the sum equals the sum of variances for independent summands and the variance is less than the second moment,

(24)  lhs(21)(x) ≤ (1/(n²h⁴)) Σ_j ∫ K²((x−y)/h) r²(x) ((fe)(y))^{−1} p_{θ_j}(y) dλ(y).

By the change of variables s = (x−y)/h, and the boundedness of K, 0 ≤ K ≤ M, and of r, 0 ≤ r ≤ 1, the j-th integral in rhs(24) is less than

(25)  h²M ∫ K(s) (p_{θ_j}/(fe))(x−hs) r(x) ds.

Since supp(K) ⊂ SE unit square, when s ∈ supp(K), f(x−hs) ≥ f(x) by the SE nonincreasing of f and r(x)/e(x−hs) ≤ 1/e(x) by the definition of r. Hence it follows from 0 ≤ p_θ ≤ β and the property (16) of K that the integral in (25) is bounded by h²Mβ/(f(x)e(x)). This, combined with (24) and (25), proves the first conclusion (21).

By the change of variables s = (x−y)/h for p̂ and the property (16) of K, it follows from triangulation about r(x)p_{G_n}(x) that

(26)  lhs(22)(x) ≤ ∫ r(x)|p_{G_n}(x−hs) − p_{G_n}(x)| K(s) ds + (1 − r(x)) p_{G_n}(x).

Since 0 ≤ ψ ≤ β, by the definition of p_{G_n} and q, the absolute value in the first term of rhs(26) is bounded by

(27)  β ∫ |q_θ(x−hs) − q_θ(x)| dG_n.

By the range (20) of r and 0 ≤ p_{G_n} ≤ β, the second result (22) follows from (26) and (27). □

Corollary 1. With p̂_i given by (19) and M, β and α as in Lemma 3,

(28)  E_θ|p̂_i − p_{G_n}| ≤ V(n−1) + β/(n−1) + rhs(22).
Proof: Because the i-delete average of n a_j's exceeds the non-delete average by (ā − a_i)/(n−1) and because 0 ≤ ∫ r(x) p_{θ_j}(x−hs) K(s) ds (= a_j) ≤ β, the corollary follows from triangulations about E_θ(p̂_i) and E_θ(p̂), and (21) and (22) of Lemma 3. □

Lemma 4 (b) and (c) summarize bounds on integrals of rhs(23) which will be used to prove L₁-consistency (of the first three terms) of Ĝ_n. Lemma 4 (a) will be used only in the proof of Corollary 2. Corollary 2 gives average L₁-bounds on |p̂_i − p_{G_n}|/p_{G_n} and |v̂_i − v_{G_n}|/p_{G_n} which will be used to prove the asymptotic optimality of t.

Lemma 4. Let M, β and α be as in Lemma 3 and let m be the upper bound of f and e.

(a)

(29)  ∫ rhs(23)(x) dμ(x)

is bounded by

(30)  B₁(n) ≡ b²m√(Mβ)/(2h√n) + 2βbm²h + b²mαβh/√2.

(b) With m also the upper bound for 1/e, both

(31)  ∫∫_0^{x_1} rhs(23)(t,x_2) dt dμ(x)

and

(32)  ∫∫_{x_2}^b rhs(23)(x_1,t) dt dμ(x)

are bounded by

(33)  B₂(n) ≡ b{b²m²√(Mβ)/(2h√n) + 2βbm²h + b²m³αβh/√2}.

(c) With m as in (b),

(34)  ∫∫_0^{x_1}∫_{x_2}^b rhs(23)(y_1,y_2) dy_2 dy_1 dμ(x)

is bounded by bB₂(n).

Proof: For the μ-integral of the first term in rhs(22), we apply the Fubini theorem to interchange the integration order with respect to μ and dG_n ds. Since the difference of q's indicates an x-set with λ-area less than (s_1 − s_2)bh for s ∈ supp(K) and dμ/dλ = fe ≤ m², the integral is bounded by

(35)  βm²bh ∫∫ (s_1 − s_2) K(s) dG_n ds.

Because 0 ≤ s_1 − s_2 ≤ 2 for s ∈ supp(K) and K is a density, (35) is further bounded by

(36)  2βm²bh.

By using the common upper bound m of f and e and the fact that the λ-area of supp(μ) is less than λ(D) = b²/2, the μ-integrals of V(n) and of the second term in rhs(22) are bounded by the first and third terms of (30) respectively. Result (a) follows from this and (36).

Similar to the derivation of (36), the inner integral of the first term in rhs(22) is bounded by b(36). By the SE nonincreasing, f(t,x_2) ≥ f(x) for t ≤ x_1 and f(x_1,t) ≥ f(x) for t ≥ x_2.
Then, using the common upper bound m of f, e and 1/e for V(n) and the second term in rhs(22), we have the desired bound (33) for (31) and (32). With some minor changes for the domain of integration, the procedure used in the proof of (b) will also prove (c). □

Corollary 2. Let M, β, α and m be as in Lemma 4.

(a)

(37)  (1/n) Σ_{i=1}^n E_θ{|p̂_i − p_{G_n}|(X_i)/p_{G_n}(X_i)}

is bounded by βb²m²/(2(n−1)) + B₁(n−1).

(b) With m as in Lemma 4 (b), both

(38)  (1/n) Σ_{i=1}^n E_θ{|v̂_i^(1) − v_{G_n}^(1)|(X_i)/p_{G_n}(X_i)}

and

(39)  (1/n) Σ_{i=1}^n E_θ{|v̂_i^(2) − v_{G_n}^(2)|(X_i)/p_{G_n}(X_i)}

are bounded by βb³m²/(2(n−1)) + B₂(n−1).

Proof: By Corollary 1, (37) is bounded by

(40)  ∫ (1/n) Σ_{i=1}^n p_{θ_i}{V(n−1) + β/(n−1) + rhs(22)}/p_{G_n} dμ.

By cancelling p_{G_n} in the numerator and the denominator of (40) on [p_{G_n} > 0] (note (1/n) Σ_i p_{θ_i} = p_{G_n}) and using Lemma 4 (a), (a) of this corollary is proved. By the definition of v̂_i and v_{G_n}, and Corollary 1 and Lemma 4 (b), (b) of this corollary can be proved similarly. □

5. Asymptotic Optimality of t

Based on the bounds derived in Lemma 4 and Corollary 2, we here prove the O(n^{−1/4}) L₁-consistency of Ĝ_n in estimating G̃_n and the O(n^{−1/4}) a.o. of t in estimating θ. First the L₁-consistency of Ĝ_n.

Theorem 1. Let G̃_n be given by (14) and Ĝ_n be defined in the last paragraph of Section 3. Then under the assumptions of our special component problem described in the second last paragraph of Section 2,

(41)  E_θ ∫ |Ĝ_n(y) − G̃_n(y)| dμ(y)

is bounded by 3bB₂(n) + b²/(4√n), hence is O(n^{−1/4}) by taking h = n^{−1/4} in B₂(n).

Proof: By using the Fubini theorem to reverse the order of integration to μ × E_θ and then using rhs(23) in Lemma 3 to bound the inner integrals for the first three terms of (41), it follows from the definition of q_y and the integration of y_2 in the second and y_1 in the third terms that

(42)  lhs(41) ≤ (34) + b{(31) + (32)} + E_θ ∫ |F̂_n(y) − F_{G_n}(y)| dμ(y).

Since the X_j are independent and the q_y(X_j) are unbiased estimators of F_{θ_j}(y) with variances less than 1/4, by the Fubini theorem and the Schwarz inequality, the last term in rhs(42) is bounded by b²/(4√n).
The theorem follows from (42) and Lemma 4 (b) and (c). □

The next lemma is the Singh–Datta inequality (Datta (1991), Lemma 4.1), which is useful in bounding the difference of two quotients. We state it here for convenience.

Lemma 5. For â, a ∈ R, p̂, p > 0 and 0 ≤ a/p ≤ b,

(43)  |0 ∨ (â/p̂) ∧ b − a/p| ≤ {|â − a| + (|a/p| + b)|p̂ − p|}/p.

The following is the asymptotic optimality result.

Theorem 2. Let t̂_i and t be as defined in the second and fourth paragraphs of Section 3 and let D_n = (1/n) Σ_{i=1}^n E_θ{|t_i − θ_i|² − |t̂_i − θ_i|²} be the modified regret. Then under the assumptions of Theorem 1, D_n is bounded by

(44)  8βb⁴m²/(n−1) + 8b²B₁(n−1) + 8bB₂(n−1)

and t is asymptotically optimal with rate n^{−1/4}.

Proof: Since Θ ⊂ [0,b]² and t_i, t̂_i ∈ [0,b]², applying the identity a² − b² = (a−b)(a+b) to each component, the i-th integrand of D_n is bounded by

(45)  2b Σ_{j=1}² |t_i^(j) − t̂_i^(j)|.

By applying Lemma 5 to the above absolute value with v_{G_n}^(j)/p_{G_n} as the old quotient and then weakening the resultant bounds by |v_{G_n}^(j)/p_{G_n}| ≤ b, (45) is bounded by

(46)  2b Σ_{j=1}² |v̂_i^(j) − v_{G_n}^(j)|/p_{G_n} + 2b(4b|p̂_i − p_{G_n}|/p_{G_n})

a.e. P_{G_n}. The bound (44) follows from Corollary 2, and the a.o. result follows from the bound (44) with h = n^{−1/4}. □

6. An Example

We now give an example of our component problem. A similar example was considered by Wei (1989) for the empirical Bayes problem. (See his Theorem 1 and display (3).) Let b be a positive number and let h be a Lipschitz function from [0,b] to (0,∞). For any θ = (θ_1, θ_2) ∈ R² with θ_2 > θ_1, let c(θ) = 1/∫_{θ_1}^{θ_2} h(y) dy. Let ε be a positive number and let Θ = {θ = (θ_1,θ_2); 0 ≤ θ_1, θ_2 ≤ b, θ_2 − θ_1 ≥ ε}. For each θ ∈ Θ, Y is a r.v. on [θ_1, θ_2] with density

g_θ(y) = c(θ) h(y) [θ_1 ≤ y ≤ θ_2].

If k ≥ 2 and Y_1, ..., Y_k are i.i.d. Y, then X = (X_1, X_2) = (min Y_i, max Y_i) is a sufficient statistic for {Π_1^k g_θ(y_i); θ ∈ Θ} and if D is the upper triangle in the square [0,b]², D = {x; 0 ≤ x_1 ≤ x_2 ≤ b}, then a density of X given θ can be written in the form:

(47)  p_θ(x) = f(x) e(x) ψ(θ) q_θ(x) [x ∈ D],

where

(48)  f(x) = k(k−1) c^{2−k}(x),
e(x) = h(x_1)h(x_2), ψ(θ) = c^k(θ), q_θ(x) = [θ_1 ≤ x_1, x_2 ≤ θ_2],

with c(x) in (48) denoting c evaluated at the pair x = (x_1, x_2). By the definition of c, f is nonincreasing to SE. As a product of Lipschitz functions from [0,b] to (0,∞), e is such. The boundedness of f follows from that of h. Since h is bounded away from 0 and Θ from the diagonal, c is bounded and so is ψ. Hence, all the assumptions in the second last paragraph of Section 2 are satisfied by this example.

CHAPTER 3

THE LINEAR COMPOUND PROBLEM

1. Introduction

Robbins (1983) introduced an empirical Bayes decision problem where the component problem is squared error loss estimation of means and the class of component decision rules is restricted to the class of linear estimates t(x) = A + Bx. In this chapter, we consider the compound decision problem with this component. We construct a compound decision rule t and show that its modified regret D_n^L is of order O(n^{−1/2}) uniformly in parameter sequences under certain assumptions on the component parameter space Θ and the family of distributions P_θ, θ ∈ Θ.

In Section 2, we specify our component problem explicitly, and derive a linear component Bayes rule and a compound rule. In Section 3, we prove asymptotic optimality of the compound rule. Section 4 gives an example to show that there exists a case with no asymptotically optimal rule when Θ is unbounded.

2. The Component Problem and a Compound Rule

Let M be a positive number and let Θ be a bounded subset of R,

(1)  Θ ⊂ [−M/2, M/2].

Let a, b and c with c ≠ −1 be constants such that the function H, defined by

(2)  H(θ) = a + bθ + cθ²,

is nonnegative on Θ. Let K be a finite number. For each θ ∈ Θ, let x be a r.v. with distribution P_θ such that

(3)  E_θ x = θ and Var_θ x = H(θ),

and

(4)  E_θ(x − θ)⁴ ≤ K.

An example of the above distributions is the family of the Poisson distributions with a bounded parameter set. Let θ = (θ_1, ..., θ_n) ∈ Θⁿ and let G_n be the empirical distribution determined by θ. Let θ̄ and γ be the mean and variance of G_n,

(5)  θ̄ = Eθ = (1/n) Σ_{j=1}^n θ_j
and γ = Var θ = (1/n) Σ_{j=1}^n (θ_j − θ̄)².

Let σ² be the variance of the mixture G_n∘P_θ and let H̄ be the average of the H(θ_j),

(6)  H̄ = (1/n) Σ_{j=1}^n H(θ_j).

By assumption (3),

(7)  σ² = E Var(x|θ) + Var E(x|θ) = H̄ + γ,

where E denotes the joint expectation corresponding to G_n∘P_θ. Hence, by the identity

(8)  (1/n) Σ_{j=1}^n θ_j² = (1/n) Σ_{j=1}^n (θ_j − θ̄)² + θ̄²

and the definition (2) of H, we have

(9)  σ² = H(θ̄) + (c+1)γ.

Let r: R² → R; (x,y) → (y − H(x))/((c+1)y) and α = r(θ̄, σ²). By (9) we have α = γ/σ². It follows from (7) and H ≥ 0 that

(10)  0 ≤ α ≤ 1.

By assumption (3), a minimizer of the compound risk

(11)  (1/n) Σ_{j=1}^n E_θ(t(x_j) − θ_j)²

within the class of linear functionals t(x) = A + Bx is

(12)  t̃(x) = θ̄ + α(x − θ̄).

(See Robbins (1983) for a derivation with a general mixing probability measure.)

We will take n > 2 in the remainder of this chapter and let the x_i be independent random variables with distributions P_{θ_i}, i = 1, ..., n. The simple compound procedure t̃ is defined by its components t̃_i(x) = t̃(x_i), i = 1, ..., n. With θ̄ and σ² replaced by their asymptotically unbiased estimators x̄_i and s_i² (which will be defined in (15)), we see, from (12) and (10), that t̃ is estimated by t̂ with components

(13)  t̂_i(x) = x̄_i + β̂_i(x_i − x̄_i),

where

(14)  β̂_i = 0 ∨ r(x̄_i, s_i²) ∧ 1,

and

(15)  x̄_i = (1/(n−1)) Σ_{j≠i} x_j and s_i² = (1/(n−1)) Σ_{j≠i} (x_j − x̄_i)².

3. Asymptotic Optimality of t̂

In this section, we will show that the compound rule t̂ has the same asymptotic behavior as t̃ (Theorem 1), which is unavailable because θ_1, ..., θ_n are unknown. Lemma 1 shows that |H(θ̄_i) − H(θ̄)| and |σ_i² − σ²| are of order O(n^{−1}). Lemma 2 shows that H(x̄) and s² are L₁ consistent estimators of H(θ̄) and σ² with rate n^{−1/2}. Lemma 3 gives bounds on (1/n) Σ_{i=1}^n E_θ(x̄_i − θ̄)² and the difference of the compound risks for t̂ and t̃.

Theorem 1. Under the assumptions of (1), (3) and (4), sup_θ D_n^L = O(n^{−1/2}).

Before the proof of the theorem, we prove three lemmas. Let G_n^i be the empirical measure determined by θ^i, the vector θ with θ_i deleted.
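The one-stage estimator defined by (13)–(15) is simple to compute. The following sketch is ours, not the thesis's: it instantiates the Poisson case, where H(θ) = θ, i.e. a = 0, b = 1, c = 0 in (2); the parameter draw and the seed are illustrative assumptions.

```python
import numpy as np

a, b, c = 0.0, 1.0, 0.0                       # Poisson case: H(theta) = theta

def r(x, y):
    # r(x, y) = (y - H(x)) / ((c + 1) y), as defined above (10)
    return (y - (a + b * x + c * x * x)) / ((c + 1.0) * y)

rng = np.random.default_rng(1)
n = 2000
theta = rng.uniform(1.0, 3.0, n)              # unknown parameter sequence
x = rng.poisson(theta).astype(float)

t_hat = np.empty(n)
for i in range(n):
    xi = np.delete(x, i)                      # x_i-deleted observations
    xbar_i = xi.mean()                        # (15): delete mean
    s2_i = ((xi - xbar_i) ** 2).sum() / (n - 1)   # (15): delete variance
    beta_i = min(max(r(xbar_i, s2_i), 0.0), 1.0)  # (14): 0 v r ^ 1
    t_hat[i] = xbar_i + beta_i * (x[i] - xbar_i)  # (13)

mse_t = ((t_hat - theta) ** 2).mean()
mse_x = ((x - theta) ** 2).mean()
print(round(float(mse_t), 3), round(float(mse_x), 3))
```

The shrinkage estimator's average squared error is well below that of the raw observations, reflecting the gain the compound rule extracts from the ensemble.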
Let θ̄_i and γ_i be the mean and variance of G_n^i,

(17)  θ̄_i = (1/(n−1)) Σ_{j≠i} θ_j and γ_i = (1/(n−1)) Σ_{j≠i} (θ_j − θ̄_i)².

Let σ_i² be the variance of G_n^i∘P_θ. By (9),

(18)  σ_i² = H(θ̄_i) + (c+1)γ_i.

The first lemma gives two identities for the differences between delete and non-delete γ's and H's. From these and the boundedness of Θ, (21) and (22) follow directly.

Lemma 1. With the notations introduced in (5) and (17), and σ² and σ_i² the variances of G_n∘P_θ and G_n^i∘P_θ, we have

(19)  γ_i − γ = {γ − n(θ_i − θ̄)²/(n−1)}/(n−1)

and

(20)  H(θ̄_i) − H(θ̄) = (b + 2cθ̄)(θ̄_i − θ̄) + c(θ̄_i − θ̄)².

Hence, under the assumption that Θ is bounded,

(21)  |H(θ̄_i) − H(θ̄)| = O(n^{−1}),

and by (9) and (18),

(22)  |σ_i² − σ²| = O(n^{−1}).

Proof: The following identity will be used repeatedly in our proof: with ā = (1/n) Σ_{j=1}^n a_j and ā_i the i-delete average,

(23)  ā_i − ā = (ā − a_i)/(n−1),  ∀ a = (a_1, ..., a_n) ∈ Rⁿ.

By adding and subtracting θ̄, and using identity (23) with a_i = θ_i,

(24)  γ_i = (1/(n−1)) Σ_{j≠i} (θ_j − θ̄)² − (θ̄_i − θ̄)².

Hence, the result (19) follows from adding and subtracting γ, and applying identity (23) with a_i = (θ_i − θ̄)². Since θ̄_i² − θ̄² = 2θ̄(θ̄_i − θ̄) + (θ̄_i − θ̄)², by the definition of H and identity (23) with a_i = θ_i, the second result (20) follows. □

The next lemma gives rates, and its proof gives bounds, for the L₁ errors of s² and H(x̄) respectively.

Lemma 2. Let s² = (1/n) Σ_{j=1}^n (x_j − x̄)². Then, under the assumption of Theorem 1,

(25)  E_θ|s² − σ²| = O(n^{−1/2})

and

(26)  E_θ|H(x̄) − H(θ̄)| = O(n^{−1/2}).

Proof: Since

(27)  (x_j − x̄)² − (θ_j − θ̄)² = [(x_j − θ_j) − (x̄ − θ̄)][(x_j − θ_j) − (x̄ − θ̄) + 2(θ_j − θ̄)],

by the expression (7) of σ²,

(28)  s² − σ² = (1/n) Σ_{j=1}^n [(x_j − θ_j)² − H(θ_j)] − (x̄ − θ̄)² + (2/n) Σ_{j=1}^n (x_j − θ_j)(θ_j − θ̄).

By applications of the Schwarz inequality to the first and third terms, and the independence of the x_j, the expectation of |rhs(28)| is bounded by

(29)  ((1/n²) Σ_{j=1}^n E_θ(x_j − θ_j)⁴)^{1/2} + H̄/n + 2((1/n²) Σ_{j=1}^n H(θ_j)(θ_j − θ̄)²)^{1/2}.

From the boundedness of Θ, the (θ_j − θ̄) and the H(θ_j), hence H̄, are bounded. Therefore, by the boundedness of the fourth central moment of x, (29) is of order O(n^{−1/2}), which proves the first result. Similar to (27),

(30)  H(x̄) − H(θ̄) = (b + 2cθ̄)(x̄ − θ̄) + c(x̄ − θ̄)².
By applying the Schwarz inequality to the first term, the expectation of |rhs(30)| is bounded by

(31)  |b + 2cθ̄|(H̄/n)^{1/2} + |c|H̄/n.

The second result follows from the boundedness of H̄ and θ̄. □

Lemma 3 below considers the asymptotic behavior of (1/n) Σ_{i=1}^n E_θ(x̄_i − θ̄)² and σ²E_θ|β̂_i − α|. These two terms will appear in a bound of D_n^L.

Lemma 3. Let H̄ be defined as in (6). Then, under the assumption of Theorem 1,

(32)  (1/n) Σ_{i=1}^n E_θ(x̄_i − θ̄)² = H̄/(n−1) + γ/(n−1)² ≤ σ²/(n−1),

and

(33)  σ²E_θ|β̂_i − α| = O(n^{−1/2}).

Proof: The inequality in (32) follows from the expression (7) of σ². By adding and subtracting θ̄_i, and using identity (23) with a_i = θ_i,

(34)  E_θ(x̄_i − θ̄)² = H̄_i/(n−1) + (θ̄ − θ_i)²/(n−1)²,

where H̄_i is the i-delete average of the H(θ_j). The equality in (32) follows from taking the average of (34) with respect to i.

With (c+1)σ² as the p and 1 as the b of the Singh–Datta lemma (Lemma 5 of Chapter 2), after some algebra,

(35)  (c+1)σ²|β̂_i − α| ≤ |H(x̄_i) − H(θ̄)| + (2|c+1| + 1)|s_i² − σ²|.

Triangulating about H(θ̄_i) for the first term and σ_i² for the second term, Lemmas 1 and 2 complete the proof of the second result (33). □

We are now ready to prove the theorem. Although some computation is needed, the idea of the proof is simple. First, the summands of D_n^L are simplified to (40) by using a² − b² = (a−b)(a+b) and the fact that x_i is independent of (x̄_i, β̂_i). Then the bound (41) is derived from the boundedness of α and β̂_i. Lemma 3 and the Schwarz inequality finally finish the proof.

Proof of Theorem 1: From the expressions (12) for t̃_i and (13) for t̂_i, we have

(36)  t̃_i − θ_i = (1 − α)(θ̄ − θ_i) + α(x_i − θ_i),

and

(37)  t̂_i − θ_i = (1 − β̂_i)(θ̄ − θ_i) + β̂_i(x_i − θ_i) + (1 − β̂_i)(x̄_i − θ̄).

Hence

(38)  t̂_i − t̃_i = (37) − (36) = (β̂_i − α)(x_i − θ̄) + (1 − β̂_i)(x̄_i − θ̄),

and

(39)  (t̂_i − θ_i) + (t̃_i − θ_i) = (37) + (36) = (2 − β̂_i − α)(θ̄ − θ_i) + (β̂_i + α)(x_i − θ_i) + (1 − β̂_i)(x̄_i − θ̄).

Since x_i is independent of x̄_i and β̂_i, and E_θ x_i = θ_i, the i-th summand of D_n^L, namely E_θ[rhs(38)·rhs(39)], is equal to
Because [al 5 1 and Ifiil 5 1, DnL is bounded by (41) films, II)” + manuals, — al + ”9' 9, —7| Iii —'6| + Eéii :02}. i= Distributing the average across the three terms in the summands, (41) is further bounded by O(n—1,2) + 2[7 rhs(32)]1/2 + rhs(32), by (7) and Lemma 3, and the Schwarz inequality applied to the second term. This finishes the proof of Theorem 1. o 4. An Example The example below shows that, when 9 is unbounded and the fourth central moment is a continuous function of 0, the conclusion Of the theorem may not be true. Example: Let 9 be the interval [1, m). For each 0E 9, the distribution of x is: P0(x=0)=%- and P0(x=20)=%. Then on = 0 and Varox = 02 2 1. The boundedness conditions of 9 and E0(x 0)4 = 04 are violated. For any 11, let tni be the compound decision rule for 0i . When _0 = (0, ..., 0), the simple linear Bayes estimator of Q is 0. SO the modified regret DnLQ’ 0) is 1 n (42) a 2 46 From the distribution of x, (42) equals 1 n (43) — 2 2 11211 i=12(tni(x1)mlxn) - 0) l where the inner summation is for all possible 211 points of x. Choose 00 such that 2 (tn1(0,. . .,0) — so) 2 n2“. For this 00 , (43) 2 1. Hence DnLQ’ I) does not converge to 0 and consequently the conclusion Of the theorem is not true. a BIBLIOGRAPHY Billingsley, Patrick ( 1968). Convergence of Probability Measures. Wiley. Brown, L.D. and R. Purves (1973). Measurable selections of extrema. Ann. Statist. 1 902 — 912. Brown, Lawrence D. (1986). Fundamentals of Statistical Exponential Families with Applications In Statistical Decision Theory. IMS Lecture Notes — Monograph Series 9. Datta, Somnath (1991a). On the consistency of posterior mixtures and its applications. Ann. Statist. 19 338 - 353. Datta, Somnath ( 1991b). Asymptotic Optimality of Bayes compound estimators in compact exponential families. Ann. Statist. 19 354 — 365. Datta, Somnath (1991). Nonparametric empirical Bayes estimation with O(n-1,2) rate of a truncation parameter. 
Statistics & Decisions 9 45–61.

Fabian, Václav and James Hannan (1985). Introduction to Probability and Mathematical Statistics. Wiley.

Ferguson, Thomas S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic.

Fox, Richard J. (1978). Solutions to empirical Bayes squared error loss estimation problems. Ann. Statist. 6 846–853.

Gilliland, Dennis C. (1968). Sequential compound estimation. Ann. Math. Statist. 39 1890–1904.

Gilliland, Dennis C. and James Hannan (1974). The finite state compound decision problem, equivariance and restricted risk components. RM-317, Statistics and Probability, MSU; (1986) Adaptive Statistical Procedures and Related Topics. IMS Lecture Notes – Monograph Series 8 129–145.

Gilliland, Dennis C., James Hannan and J. S. Huang (1976). Asymptotic solutions to the two state component decision problem, Bayes versus diffuse priors on proportions. Ann. Statist. 4 1101–1112.

Hannan, James F. and Herbert Robbins (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Statist. 26 37–51.

Hannan, J. F. and J. R. Van Ryzin (1965). Rate of convergence in the compound decision problem for two completely specified distributions. Ann. Math. Statist. 36 1743–1752.

Mashayekhi, Mostafa (1990). Stability of symmetrized probabilities and compact equivariant compound decisions. Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State University.

Mashayekhi, Mostafa (1993). On equivariance and the compound decision problem. Ann. Statist. 21.

Nogami, Yoshiko (1978). The set-compound one-stage estimation in the nonregular family of distributions over the interval (0, θ). Ann. Inst. Statist. Math. 30 Part A 35–43.

Nogami, Yoshiko (1988). Convergence rates for empirical Bayes estimation in the uniform U(0, θ) distribution. Ann. Statist. 16 1335–1341.

Oaten, Allan (1972). Approximation to Bayes risk in compound decision problems. Ann. Math. Statist. 43 1164–1184.
Robbins, Herbert (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob. 131–148. University of California Press.

Robbins, Herbert (1983). Some thoughts on empirical Bayes estimation. Ann. Statist. 11 713–723.

Rockafellar, R. Tyrrell (1972). Convex Analysis. Princeton University Press.

Singh, Radhey Shyam (1974). Estimation of derivatives of average of μ-densities and sequence-compound estimation in exponential families. Ph.D. Thesis, Dept. of Statistics and Probability, Michigan State University.

Szász, Otto (1915–1916). Über die Approximation stetiger Funktionen durch lineare Aggregate von Potenzen. Math. Ann. 77 482–496.

Van Ryzin, J. R. (1966). The compound decision problem with m × n finite loss matrix. Ann. Math. Statist. 37 412–424.

Wei, Laisheng (1989). Asymptotically optimal empirical Bayes estimation for parameters of two-sided truncation distribution families. Chinese Ann. Math. 10(B) 94–104.

Yu, Kai F. (1986). On the bounded regret of empirical Bayes estimators. Commun. Statist. – Theory Meth. 15 2391–2403.

Yu, Kai F. (1988). A linear regression with unobserved dependent variables. Commun. Statist. – Theory Meth. 17 3075–3087.