APPROXIMATION T0 BAYES RISK IN COMPOUND DECISION PROBLEMS Thesis for the Degree of Ph. D. MICHIGAN STATE UNIVERSITY ALLAN OATEN 1969 ‘flEEIC This is to certifg that the thesis entitled APPROXIMATION TO BAYES RISK IN COMPOUND DECISION PROBLEMS presented by Allan Oaten has been accepted towards fulfillment of the requirements for PhoDo degree in StatiStiCS and Probability EEiZjQ~M£A_,Iéréhflb~u10£_. U Major professor Date August 5, 1969 0-169 I I ‘ -e .- IINOINO IV ‘ I me a saw I“; mm mm Inc. V I IIIIA-V mint-s g': n 'I ABSTRACT APPROXIMATION TO BAYES RISK IN COMPOUND DECISION PROBLEMS By Allan Oaten The set version of the compound decision problem involves simultaneous consideration of N statistical decision problems, called the component problems, with identical generic structure: state space 0, action Space A, sample space I and non-negative loss function L defined on Q X A x I. With 5" (x1,x2,...,xN) N distributed according to H P = P , a compound procedure is a ._ 9 ‘g 1-1 i N vector, 9 = ($1,...,cpN) such that, for each i, cpi: I, _. A. The conditional risk, given 5, of the procedure $2 is N W@,gg,§) = N-1 Z L(er,cpr(§),xr), the unconditional risk is r=1 R(g,gg) = fW@,gg)dPe, and the modified regret is D(_Q_,gg) = R@,§Q) - R(GN) where G is the empirical distribution of 91,6 and R(o) N 2,000,6N is the Bayes enve10pe in the component problem. For F a distribution on Q, QF is the set of procedures in the component problem which are e-Bayes against F. Let C be an estimator of GN - i.e. for each §_E IF, §(§) is a distribution on Q and let QC be the set of compound procedures 53 such that, for each 5, there is an element, cp°(§), of @6405) such that cpr(§) = cp°(}_()(xr) for each r. Thus, to use a procedure in QC one first estimates GN by C(g) and then plays e-Bayes against C(g) in each component problem. Allan Oaten We consider two subsets of Qa, the "half-Space" procedures and the "equivariant uniformly e-Bayes" procedures. For the m X n problem (i.e. O has m elements, A has n) we establish the uniform almost sure convergence of W@,gg,39 to R(GN) for half- Space procedures if C is "uniformly strongly consistent"; and if C is "uniformly consistent" we establish D@,gg) < 0(1) + e uniformly as N ~ m for both types. For the m X m problem, we again establish D@,gg) < 0(1) + e for the equivariant procedures. We also consider the problem when. 0 is infinite. We consider a class of procedures that differs from QC in that the corresponding component procedures are defined in the elements of a finite partition, V, of I, rather than I itself. Assuming the existence of "good" estimators C, we establish results similar to, but slightly weaker than, those for finite state spaces, pro- vided V is a "good approximation" to 1 in respect of both the distributions Pb and the loss function. We give conditions under which this latter requirement holds, and show that these conditions are satisfied by wide classes of problems. APPROXIMATION TO BAYES RISK IN COMPOUND DECISION PROBLEMS BY Allan Oaten A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1969 TO MY FAMILY ii ACKNOWLEDGEMENTS I wish to express my sincere gratitude to Professor James Hannan, who introduced me to the Compound Decision Problem and guided me through this research. His suggestions have been the source of many of the results presented here and have improved virtually all results, either in substance or in form, almost beyond recognition. His comments helped me avoid many wrong turnings. In addition, among many others who helped, I would like to thank Professor Dennis Gilliland for some useful discussions and for reading and commenting on a difficult rough draft; Mrs. Noralee Barnes, who typed both the rough and the final versions of this thesis, and whose speed, accuracy and cheer- fulness leave me, at times, unnerved; my wife and son for their heroic patience and endurance; and Messrs. S. Guthery and F. Ruiz for several helpful discussions. Finally I would like to thank the Department of Statistics and Probability at Michigan State University for the generous support, financial and otherwise, that I have enjoyed over the period of my graduate studies. iii Chapter 0 I II TABLE OF CONTENTS INTRODUCI‘ION MD NOTATION 0 O C C O O O O O O O O O O O O O O ........ FINITE STATE SPACES l 1 .0 l. 1 .2 Definitions and Preliminaries ................. Finite Action Spaces. Definitions of Half- Spaces and Half-Space Procedures .............. Unions of Intersections of Half-Spaces ........ Convergence of D(§Jg9 for Half-Space Procedures .................................... Remarks on the Restrictions on Half-Space Procedures .................................... Uniformly e-Bayes Procedures. The Sets QPB' Equivariant Procedures ........................ Invariant Estimates. Convergence of D(_9_,gQ) for Equivariant Uniformly e-Bayes Procedures. (Finite Action Spaces) ........................ Convergence of D(§,@) for Equivariant Uniformly e-Bayes Procedures. abtally Bounded Action Spaces) ................................ INFINITE STATE SPACES 2.0 Introduction .................................. 2.1 Finitely Based Decision Procedures ............ 2.2 Convergence Theorems for e-V-Bayes Compound Procedures .................................... 2.3 Approximating the Sample Space by a Finite Partition ..................................... BIBLIOGRAPHY ....... ................................ APPmDIx 0.0.0.000...OOOOOCOOOOOOOOOOOOOOO0.0.000... iv 10 12 13 17 20 21 26 30 31 32 40 44 46 0. INTRODUCTION AND NOTATION In this thesis we consider a set of statistical decision prob- lems, each having the structure of what is called the component prob- lem: a state Space 0 indexing a family of probability distributions {P&, w E 0} over a o-field 46 of a sample space I; a measurable action Space (A,a0; and a loss function L 2 0 defined on Q X A X 1, which is ahmeasurable for each m and x. A (randomized) decision function W has domain I X 6’ and is such that Y(x)(-) is a probability measure on <7 for each fixed x. If the State is w, the conditional risk of Y given x is (0.1) L(w,~r,x> = IALY(x)dP = ILdP and (0.6) R@,g) =fw°°q> = Meow”. As is becoming standard (cf. Gilliland, (1968) p. 1890), we say a compound procedure 9_ is Simple if ¢r(-)(C) is a function of xr for each r and C. If, in addition, the mr are identical, say ”r = m, we say 9. is simple symmetric, with kernel m. We Shall, in general, identify Simple symmetric procedures with their kernels, and write R(§,¢) and ‘W(§,m,x) for, respectively, the risk and the conditional risk given 5. of the simple symmetric pro- cedure whose kernel is m. For §.E Om and any simple symmetric procedure w, -1 N new) = N )3 Merm- r=l From the right side it is clear that R(§,¢) is the risk of the component procedure, m, against G , the empirical distribution of 91,92,...,eN. To emphasize this, we shall frequently write, when m is Simple symmetric, (0. 7) RQM = GN[R(w.cp)] R(GN,cp) 2 R(GN) where R(-) is the Bayes envelope for the component problem and we have used the convention of writing integrals in operator notation, i.e. G[h(w)] = Ih(w)dG(w), a notation we shall continue to use for integrals on the Space 0 only. In (0.7), because of the nature of GN’ theiB-measurability of LQn,m(x),x) for each m suffices to make R(GN,¢) meaningful. Future references to R(G,m), where G is a distribution on 0, will be restricted to the class, Q, of "measurable" component procedures - those for which R(w,m) exists and is a measurable function of m. In particular, the Bayes envelope at G is given by R(G) = inf R(G,m). 9 For any Simple procedure g) (cf. Gilliland (1968), p. 1890), N sup {new - R(GNH 2 sup {N‘1 2 R(e.cpi) - inf j‘L(e,‘1’)dPI e e r=1 Y 2 inf sup {R(e,cg) - inf fL(9,Y)dP}. w 0 Y The right side is positive, and is zero only when the component prob- lem is trivial; hence, with modified regret defined by (0.8) Dam) = Mam) - R(GN) we have D(§,g) 2 0 for all simple g; and if the component problem is non-trivial, there is a 6 > 0 such that sup D(§,g) 2 6 for all 9 simple 9, However, it is possible, in some cases, to find non-Simple pro- cedures, g, for which D(§,g) converges in some sense to zero as N a m. Robbins (1951) gives a heuristic argument for the existence of such procedures, and precedes it by an example in which the component problem is to distinguish between N(1,1) and N(-1,l). In the case where the component problem is to distinguish between two arbitrary distributions, Hannan and Robbins (1955) exhibit non-simple procedures, 9, for which D(§,g) < 0(1) uniformly in .g as N a m (Theorem 4); this result is obtained as a corollary to a theorem (Theorem 3) in which it is shown (in our notation) that to any 6 > 0 corresponds an N(e) such that, for any §_E Ow, P°°[W(g,g,y - R(GN) > e for some N > N(e)] < e. In addition, defining the "equivariant envelope" by R*(cN) = inf {R(GN,Q): 52(gg) = g Egg) for all 5 e 1“, g 65} where S; is the set of all permutations, g, of vectors of N coordinates (i.e. )) where g(r1,r2,...,rN) = (rg(1)’rg(2)";"rg(N g(1,2,...,N) = (g(l),...,g(N))), they show that In (3N) - R(GN)I s 0(1) 0 as N a m. Hannan and Huang (1969) have generalized and strengthened this latter result by showing that, for a finite State Space and under * - mild conditions on the loss function, IR (GN) - R(GN)I s 0(N 5 ) as N a m. Hannan and Van Ryzin (1965), in the case considered by Hannan and Robbins, exhibit a function of the observation in the rth problem which provides an unbiased estimate of the empirical distribution of the rth state; they consider procedures which consist of playing Bayes, in each component problem, against the estimate of GN given by the average of these estimates, and give sets of conditions, each stronger than the last, under which an upper bound for D(§,Q) is 0(N-g ). J; -1 . 0(N ) and 0(N ) reSpectively, each bound being uniform in 'g. Van Ryzin (1966) considers the case in which the component problem consists of making one of n decisions based on an observation from one of m distributions; for procedures analogous to those of Hannan and Van Ryzin, he gives conditions which imply D(§Jg) S 0(N"35 ) and further conditions implying D(§,g) s 0(N-1), both rates being uniform in ‘g. Mention should also be made of the papers by Stein (1955) and James and Stein (1960) Showing inadmissibility of the usual estimator of the mean of an N-variate normal distribution (N > 2), with co- variance matrix identity, under Squared error loss. As is recognized in these papers, this is a compound problem whose rth component prob- lem is to estimate the mean of the rth coordinate of the random vector; however the estimator which makes the usual estimator inadmissible does not seem to have been derived by an explicit compound procedure. Other work on the compound problem includes that of Fox (1968, Chapter 2) who exhibits procedures, 9, for which D(§,g) ~ 0 for each .g in the case where the component problem is a test on the parameter of an exponential family; that of Cogburn (1963 and 1967) who deals in particular with the case in which the component problem is to estimate the probability of success of a binomial distribution and who formulates a general approach from the point of view of the notion of stringent solutions; that of Suzuki (1968) who includes (among other things) some discussion of "e-Bayes" simple symmetric procedures; and some expository work of Samuel (1967). Throughout this thesis, 3 denotes an arbitrary non-negative number, possibly depending on N, and we concern ourselves with the asymptotic prOperties of procedures which consist of playing e-Bayes against an estimate C(x) of GN. The main intuitive justification for these procedures is the following lemma, a slight generalization of earlier results (see, e.g., Hannan and Robbins (1955), Theorem 1), which we will refer to again, and which shows that a simple symmetric procedure which is Bayes against a distribution "close" to GN will yield a risk close to the Bayes envelope at GN' Lemma 0 Let G and F be any distributions on Q and let m be any measurable component procedure. Then R(G,cp) - R(G) s sup (G - F)[R(u>.v) - R(w,‘i’)] + R(F.cp) - R(F) v,YE§ where Q is, as before, the class of measurable component procedures. 2.22213; (1) R(G.w) - R(F. 0 and v > 0 Such that, for each gen”, sup P00[IC(§) - GNI (0) > TI] < Y- N>N1(TI.Y) With the supremum inside the Square brackets, C is uniformly strongly consistent. (An estimator C is really a sequence of functions C1,C2,..., with CN : IF "n3 being BN-measurable for each N. We Shall not need to emphasize this formality, however.) Definition 2. For each F 6.3, let QF be the set of component procedures e-Bayes against F. For C an estimator, let % = Isa N 22.3 «90(5) 6 @233) such that,vr, are) = cp°(§)(xr)} 9 10 Thus to use a procedure in fig one first estimates GN by C(x) and then, using a simple symmetric procedure the choice of whose kernel is permitted to depend on 5, plays e-Bayes against C(x) in each component problem. Henceforth, for each x. and each g_€ 9A, mo(x) will denote the component procedure given by Definition 2. It is with two subsets of QC that we will be concerned in this chapter, the first in sections 1.1, 1.3 and 1.4, and the second in sections 1.5, 1.6 and 1.7. I §l.l Finite Action Spaces. Definitions of Half-Spaces and Half- Space Procedures. Throughout this section (and also sections 1.3 and 1.6) A = {1,2,...,n}. Let Ek be k-dimensional Euclidean space and let 2: I » Emn be given by (1.2) Z(w,a,x) = LQn,a,x)f(w,x) m = 1,2,...,m; a = 1,2,...,n. We shall adopt, for Z, the same conventions as for L; i.e. 20b,x,x) = IZ(m,a,x)dx(a) for any signed measure on <7 for which the right Side exists. Definition 3. A set H<: ER is a half-space if, for some linear functional L and some number p, either H or Hc is {y: L(y) < p}. let k; be the set of all intersections of 8 half- Spaces, Hi the set of all unions of t members of NE, and t_ -1 t Xs-Z avg). For each F €.& we want to restrict attention to those members of Q which take on only finitely many values, and for which the F corresponding induced partition of I is a collection of regions each of which is an element of x: for some t and S. Formally, Y 6 CF is an element of QF,s,t,v if 11 {Y(X) : x E I} C {v1.v2.---.vv} where v1,...,vv are distinct measures on (7 and, for each j, {x: Y(x) = vj} = Qj for some (possibly empty) Qj E x:. Hence v Y(x)(a) = 2 Qj(x)vj(a) for each x 6 I and a E A. 1‘1 . Definition 4. (Half-Space Procedures). Let G be an estimator. Then,with s,t,v all finite, _ . o éE‘;.s.t:.v ' {9’- 6 Q3: ' for web i’ (P (3") 6 Q€3(.>;>.==».t:.v}’ where $0 is as in Definition 2. To use a procedure in QC,S,t,V’ one first estimates GN by C(x) and then, using a Simple symmetric procedure whose kernel, the choice of which may depend on x, is an element of §C(§),s,t,v’ one plays e-Bayes against C(x) in each component problem. Most results obtained so far in the set version of the compound decision problem have been obtained only for Special Subsets of QC 3 t v (e.g. Hannan and Robbins (1955) and Van Ryzin (1966)) - , 2 3 usually the class QF 8 t v has been restricted to those procedures, , 2 8 Y, for which Y(x)(a) = 2 Q0 (x)v (a), where x E Q0 if B is the BZA FB B FB set of Bayes acts against F when x is observed. The proof that QPB G X: for some t and s will be given in a more general context in Lemma 2, in section 1.5. Usually vB(a) is restricted to the values 0 and 1. In addition the form of the estimator, C, is usually restricted, especially when rates have been obtained; the most common form is the "average of unbiased estimators" given by Hannan and Van Ryzin (1965) and mentioned in the Introduction. 12 §l.2 Unions of Intersections of Half-Spaces. t Recalling the definition of N; in the previous section, and t identifying sets and their indicator functions, we see that if H €.flg, then for some elements {Hj} of Hg, 12 _ = c c c c c c . H - Li Hj H1 + H1}!2 + H1H2H3 +...+ 11le Ht-lHt° Since Hj 6 NS S implies that, for some half-Spaces {Hji}, Hj = fl Hji’ so that c S c C c i: t = = H + 0.. 0.. h Hj ingji jl HlejZ + + HlejZ Hjs_1Hj§,1we see t at H ENS implies that H is the disjoint union of at most 2 8 members of i=0 ~St° For future use we note that this implies: t. J (1.4) If Hj 6 kg] for j = 1,2,...,J, then 0 H. is the disjoint k union of C ( Z 3,) members of N', where q = Z sjt,. j=1 k=0 3 q j=l 3 We now prove a lemma which is a slight generalization of the results of Ranga Rao (1962), and which we use in section 1.3. Lemma 1. Let (Iy3,P) be a probability space, and let Pk be the empirical distribution of N i.i.d. random variables ~ P. Let h: I a E’ be P-integrable and let g: I a Ek bale-measurable. Then, for any 3 and t, p°°[suptIJ‘g'1(u) h d(PN - P)‘ —. o as N _. co] = 1. HOV s Proof. It is clearly Sufficient to prove the lemma for h 2 0; also, by the preceding discussion, and Since g preserves unions and intersections, it is sufficient to prove the theorem for ME. If S H = n H , where the Hj are indicators of half-spaces, H can be i=1 written as a linear combination of open members of fig by replacing any closed H by l - H: in the product. Hence it is sufficient J to prove the lemma for Open members of fig. 13 For any Borel set B<: Ek, let N(B) ='I’IIgI-1(B) h dP, and xN(B) = Ig-1(B)h dPfi. Ranga Rao (1962, Lemma 7.3) shows the existence, forfinite x,of mutually orthogonal measures, N(l), r = 0,1,...,k-1, k’l °° (i) (0) i = 0,1,2,... and r = k, i = 0 such that l = 2 2 1 +'xk , r=0 i=0 and,for each r and i, there exists an r-dimensional subSpace or a translate of such a subspace, Ail) say, such that xii)(Ek ~ A:1)) = 0, and x(1)(A) = 0 whenever A is a translate of a subSpace of dimension r (i) r less than r. For each Borel set B C Ek, let 16:)(3) = IN(B n A ). Then Ranga Rao (Lemma 7.5) Shows that if (i) IN converges weakly to (i) r (1) Cr 1, (ii) xN(A:i)) » x(A ) for all r and i, and (iii) Aéi) converges weakly to , then sup IXNCH) - x(H)I a 0 as N 4.x, * RSV: where R' are the Open members of UV . st St Thus our lemma is proved if we can Show that (i), (ii) and (iii) hold almost Surely [Pm]. However (ii) follows immediately from the (i) I r strong law of large numbers and the fact that {A is a countable collection; and (i) and (iii) follow from the Strong law together with the "sufficiency" part of Theorem 3.1 of Varadarajan (1958), which establishes that for any separable metric Space 3 there iS a f sequence of functions f such that, for any finite measure 1, 2,... I on S, RN converges weakly to I if Ifide a Ifidx for each i. Hence the lemma is proved. §1.3 Convergence of D(§,§) for Half-Space Procedures. In this section we prove the main theorem for half-Space pro- cedures, establishing conditions for the uniform almost Sure con- vergence of the conditional risk. This is followed by some remarks which attempt to point up some features of the proof which the State- ment of the theorem tends to obscure, by a Corollary concerning the unconditional risk, and some further remarks. 14 Theorem 1. Let C be a uniformly strongly consistent estimator of GN' Then given S < m, t < m and v < m, there exists a function N(TI,-y), defined for all n > o and y > 0, such that, for all g e n” and all Q 6 QC,s,t,v’ PP[IW(§JQSE) _ R(GN)I > e + n for some N >'N(n,v)] < y. Proof. Let Y E Q a say Y = 2 Qj v where each Q F,s,t,v j: _.]1 j j is an element of K:, and v1,...,vv are distinct measures on an Then _1 N v W@,Y,£) = N Z Qj(xr ) Z V (3)1.(9 rsaaxr) r= -1 j= -1 aEA j m _1 v = EN 2 2: v. (a) )3 Qj(xr)L(er,a.xr). w=l j= -l aEA j (r: er=w} Now 2 (x que ,a, x I:)= waQ L(u),a)dP where P is Qj j N N {r: 9 r=w} N w w the empirical distribution of the Nm = Nw(§) = Z [er=w] i.i.d. r=l random variables [xrz 9r=w}. Thus m N v “(@389 = 2 13m 2 2 v. (£1)qu L(w, a)de 0313-18691ij Subtracting from each side its expectation: 'Z IIMB IIM< - .JR wfls‘y ii) - R(GN :Y) N 2v (a)QJ. L( .a)d(PN - P)- 1 j- I w w w -1 aEA j Nw Finally, taking the supremum of the absolute value of the integrals and bounding out the vj(a) terms, we have, with Smagawsi) = V 2 SUP tI‘I‘Q L(UJaa)d(PN - Pw)‘, 86A QEXE Nw m N (1.5) Iw - R(G ml s z 4113‘“ 9"” x): " N N w=l for any Y E e and any F 6.9. F,s,t,v 15 0 Let w (x) be the member of 6&(5),s,t,v’ guaranteed by Definition 4, for which ¢r(§) = mo(§)(xr) for each r. Since A has n elements, and ‘YLOn,a)dP S‘M for each m and a, IL(w,Y)dP 3 Mn for any component procedure Y. Hence, from (0.9) and (1.5), we have, since W@,g,x_) = W(_9_,cpo(x), x) for each E, m N (1.6) was“) - R(GNH s z #smamgs) +Mn\& O and y > 0, such that, for all w,‘g and a, P°°E sup sup \ Q L(w.a)d(P - P )l > n] < v J(k':w,fi) Q9(t I NO) (1) s where J(k',w,§) = {Ni Nw(§) > k'}- Thus, with h = k(fl,y) = k'(fi(nv)-1,vn-1), we have Pm[ sup S(N,g,w,§) >T1] < y for all _e_ and m. J(h,w,§) -1 -1 Thus, for any N', with k = k(fim , y(2m) ) and H(k,w,§) = {N: N >'N' and Nw(§) s k}, we have m m ‘Nw(g) p [ 2 N SGN,§,w,§) > n for some N > N'] w=1 m w Nw(§) _1 s )3 P [ N SCN,g,w,§) > Tun for some N >N'] w=1 m -L 5 Z P00 SUP SCN,§_, “tn-1 + P 1%?" max SCN,§_,w,§_)>nm- } (0:1 J (Rd-Dag.) R(k,w,§) < y/Z + g(N') 16 where g(N') l 0 as N' 1 m, since max S(N,§,m,x) is finite R(ksw a_e_) valued, because {S(N,§,w,§) : N E H(k,w,§)} has at most k elements. Hence for N' sufficiently large, say N2(n,y), m m N (1.7) P E z fim-S(N,§,w,§) > n for some N > N'] < y. w=1 Using now the condition on G, let Nam) = max {N1(n “n+3 for some N>N(T\,y)] < y. 2. It follows from (1.8) that if E is uniformly consistent (not necessarily strongly) and n > 0 then (1.9) sup sup PPE‘W(§,Q,§) - R(GN)‘ > n + e] a 0 as N atm. QE;,s,t:,v g It is (1.9) which yields the corollary to this theorem. Corollary. If 6 is uniformly consistent then sup sup D(§Jg) < e + 0(1) as N 4 m. Proof. Recalling our convention that §_= (Y1(61),...,YN(eN)), let ab] = {1, WQAMQ > R(GN) + e + 5/2}, so that R(g,§g) s R(GN) + e + 5/2 + j‘cNWQ,g)dP°°. 17 N N Since W(§Jg,§) s N 1 2 max Lfin,a,YrQn)) N.1 2 V(Yr)’ say, r=l w,a r=l N fcflw (g,52)dp°° s u'1 2 chv (Yr)de r=1 < 6/2 for P(CN) sufficiently small, because the V(Yr) are identically distributed and integrable. Hence, from (1.9), R(g,g) < R(GN) +’e + 6, for all g_ and g E ¢§,s,t,v’ for N suffic1ently large. Remarks 1. We repeat the observation in the Introduction that none of our results are affected if 6 depends on N. In particular, if e = 0(1) as N a m, then the conclusion of the corollary becomes sup sup D(§, ) < 0(1) as N a m. é§,s,t,v g 2. The class §&,s,t,v may include procedures, 9, which are not EN-measurable (though, for each 5, the component procedures mo(x) must beIB-measurable). Such procedures are included in Theorem 1 and its corollary in the sense that, whether or not W@,g,x) is BN-measurable, there is a measurable function, W'Qgi) such that W'@,§) 2 W(_e_,gp,_)£) for all g and x, with W' having the prOperties asserted for W. This can be seen from the fact that both S(N,g,m,§) and Rug) - GN‘GI) are HN-measurable, and the comment following (1.6). §l.4 Remarks on the Restrictions on Half-Space Procedures. It is clear that the results of section 1.1 depend heavily on the use of Lemma 1. Indeed this is the reason for restricting the conclusions of Theorem 1 to the class Q. G,s,t,v' The restrictions on the class s. are two, both the G,s,t,v result of restrictions on the classes Q for F €.&: that for F,s,t,v 18 Y 6 § (1) {Y(x): x E I} is finite and (2) (x: Y(x) = vj} E X: F,x,t,v’ for each j. Restriction (2) is clearly not essential: the application of Lemma 1 would not be affected if finitely many of the sets , F E.&} were not elements {fx: Y(x) = vj}, 1 S j s v, Y E §F,x of X'. S stsv The crux of the proof of Theorem 1 is the uniform almost sure convergence of S(N,§,w,§), which is obtained from Lemma 1 because of the structure of the family of functions U {Y(-)(a): a E A, Y E Q }. F69 F,s,t,v However if, for each F 6.3, Q is a subclass of QF and, for each F1 w E O and a E A, Pm[sup”g(a)L(w,a)d(PNm - Pw)‘ -+ O as N -° 0°] = 1 where the sup is taken over g 6 U {Y(')(a): Y E QFI}, then Theorem 1 F69 o — A. o A would hold with QG,s,t,v replaced by Q61( {99 E @G. cp (x) E §G@)1 for each E3). The families U {Y(-)(a): Y E QF 5 t v}, a E A, are 9 3 9 F63 by no means the only ones with this property; however they are of particular interest as a rather natural generalization of the standard situation, mentioned previously in section 1.1, in which, for each F, 9F is restricted to those procedures, Y, for which 0 Y(x> = 2 chowe). KIA In fact, although vB usually depends only on B (the most common case, with A = {1,2,...,n}, is to have VB degenerate at the "minimum" member of B; see, e.g., Hannan and Robbins (1955), Hannan and Van Ryzin (1965), Van Ryzin (1966)), Theorem 1 also applies to the case where v is also permitted to depend on F; and the con- B clusions of Theorem 1 hold if VB is a measurable function of x (since Pw[sup ‘Iv (a)Q L(w,a)d(P - P )|'» O as N d’w] = 1 for QEXI B Nw w w each a E A, Ba: A and w E Q, from Lemma 1). 19 However the methods of Theorem 1 fail if vB depends on both F and x. In this case (1.5) becomes m N 42 , O (1.10) W(§,‘i’,§) - R(GN,‘Y) s z N 2 2: Sup lfQFBvFB(a)L(w,a)d(PN - P)|. w=1 BZA aEA FEQ w It is possible for the right side of (1.10) not to converge to zero; for there may exist, for a fixed B, containing at least 2 points, a set C 618, of non-atomic RD-measure, with card C 3 card g where 8 = {F: C:C:Q;B}' Let J map 5 onto the finite subsets of C and let vFBx(a) = J(F)(x) for some a E B and all F E 8. Then, if . o __ = p01nts are measurable, IQFBvFB(a)L0n,a)dP - IJ(F)LGn,a)dP& 0 for any F, since JCF) is finite. However for any empirical distribution PW , there is an F for which EN [x E J(F)] = Pfi [x E C] and L w o w m C c: QFB' Hence 0 _ m sup IQFBvFB(a)L(w,a)dPN - f0 L(u>,a)dPN -+ IC L(w,a)de a.s. [P ], F68 w m which may not be zero. For example, let P& be the uniform distribution on [0,1] U [m,w+l] for w = 1,2,3 with L0»,l) = w = 4 - L0»,2). Let C = [0,1]. Then since F(w,x) = %{0,l](x) +-[w,w+l](x), we have 0 3 8 = {F2 C C QF{1,2}} = {Ft 2 Fw(L(w,1) - L(w,2))F(u>.X) = 0} = {F: F1 = F3}, w=1 if e = 0, so that card C = card 3. Then with J mapping 8 onto the finite subsets of C, and VF{1,2}X(1) = J(F)Cx), we have 0 sup UQF{1.2}"F{1,2}(1)”1’1’d‘1’n - P1" _ Sup In»N - 1’1) (mm - Pu [0,1]. .& l g 1 1 Since RN [0,1] » P1[0,1] = k, the equivalent of Lemma 1 does not hold 1 in this case. 20 §1.5 Uniformly geBayes Procedures. The sets QFB' Equivariant Procedures. In this section we define the procedures to be discussed in section 1.6 for finite action spaces, and in section 1.7 for infinite action Spaces. We also prove, in a more general context, the claim of 0 section 1.1, that the sets QFB’ F E 3, B C.‘ A, are elements of X: for some t and 8. Definition 5. A measurable component procedure, m, is uniformly e-Bayes against a distribution F 6.9 if m(x)(B(F,x,e)) = l for all x, where (1.11) B(F,x,e) = {8: F[Z(w,a-b,x)] s e/m for all b E A}. If m is uniformly e-Bayes against F, then R(F.cp) = FULdP] = 12sz .x>de(x>3 (1.12) fF[z(w.cp(x>.x>]du s R(F) + e , since F[Z(w,m(x),x)] S min F[Z(w,a,x)] + e/m, and p(I) = m. The aEA change in the order of integration is justified by the finiteness of 0. Hence a uniformly e-Bayes procedure is e-Bayes in the usual sense. Lemma 2. Let QFB = {x: B(F,x,e) = B}. Then for each F 6.9, e t “1 2k 2 B<: A and e 2 0, Q E X’ for t = 2 n , and s = n (1+r) FB 3 k=0 n where r = n . Proof. Let TFba = {x: F[Z(w,b-a,x)] s e/m}. Then TFba 18 the Z-inverse of a half space, so is an element of X1. Since 21 QEB '3 {X: B C B(F9Xa€)} n {X: (AnB) n B(F,X,€) _—_ ¢} =nn'r nnurc, b6B 86A Fba d6A~B e6A Fde and, from (1.4) and the fact that inverses of functions preserve . . . - , c unions and intersections, 0 fl TFba 6 X5 and n J TFde 6 XI, 2 b6B a6A d6A~D e6A q where q = n and r = 11“. Hence, by (1.4), QFB C K: where r-l t = 2 q and s = q(l+r). The proof is complete. k=0 Since 6 is fixed in our discussion, we shall abbreviate B(F,x,e) and QFB to B(F,x) and QFB respectively, in future. Definition 6. For each F 6.8, let QFu be the set of component procedures uniformly e-Bayes against F; and for a an estimator, let can = {gm/5, 3 cp°(§) 6 tag)“ such that,v r, 601.01) = cpo(}_{_)(xr)}. From (1.12), §6u> * Proof. Clearly G is invariant. Suppose g; 6 @a. We shall * show that m 6 GE. ' To show this, we need to show that cpr(§)(B(&(x),xr)) = 1 for all 5. and r. Given 5, r and g, let gk = r. Then,since :9 6 @Cu’ we have cpk(g§) (B(G(gx_),xgk)) = 1. Thus, Since 99 is equivariant, mgk@(‘B(G(g§),xgk)) = 1, so that ¢r(§)(B(6(g§),xr)) = 1 for all g, x. and r, i.e. cpr(§)[ fl B(a(g§),xr)] = 1. Thus Q 6 <1: if n B(E;(g§) ,xr) C B(éQg) ,xr). 865 .. In , G g€6 For any a 6 n B(G(gx),x ), 2 G (g3<_)Z(w,a-b,x ) s e/m for all - r __ w r g66 03-1 1 m A b 6 A and all g 6 6. Hence 1“- 2 2 Gw(g};)Z(w,a-b,xr) S e/m a 866 (1):]. for all h 6 A, so that a 6 B(GQc),xr) as required. Hence {Cir C G IIIIIIIIIIIIIIIIIIIIIIIIIII-r——————————————————r ,s-. 23 Suppose that f; is uniformly consistent, and let a, B, 'Y and 6 be arbitrary positive numbers. We write ‘6‘ for ‘G‘GD. Then for N > N(y,6) we have Pm[\é(1(g)) - GN‘ > y] < 6 for all g, where _Y_@) = (Y1(01),...,YN(3N)). Hence, since ‘C-GN‘ s ‘6‘ + ‘GN‘ = 2, ‘H‘EQQD - GNUdeOQ < y + 26 for all Q. Since the Yi are i.i.d. and GN(g_e_) = GN(_Q) for all g and g (where GNQ) is the empirical distribution of 91,92,...,9N) we have, by the transformation “EQGQD ' GN‘deQ) < 'Y + 26 for theorem, “Maytag” - GNldeQ) all g and g. Hence fl 9051(89) - GN1dP°°(Y) s f};- 2: J‘Iécysa» - GNldem gee ale 2 geS < y + 26. - . °° 1 32% Thus, by the Markov inequality, P “N? Z G(g_¥_(gg)) - GN‘ > a] < 01 . 866 Since gx_ = g_Y_(gg) for each x, g and 3 we have that if N >N(v,6), with y + 26 < (18, PODHCQQ - GN‘ > 01] < B for all g. The lemma is proved. We can now state the main result of this section. Theorem 2. Let C be a uniformly consistent estimator. Then sup sup D(_Q,gg) < 0(1) + e as N -+ 00. mag a Proof. In view of Lemma 3, it suffices to prove the result for A O * - G invariant. Let 9&6 66; and let Wr@,gg,§) -L(er,cpr(x_),xr). Then Wgrfiiamg) = L(egr.cpgr(§).xgr) = L(egr,cpr(g§).xgr) = Wr(s§.gg-gzr_)- Let E denote expectation under the distribution with mass N 1%!- at each element of 6. Then since N.1 2 h(r) = E[h(gN)] we r=1 have 24 N Mam) = fEEWgN(§,gg.§_)] kglfmk’kaBNQ) EEfWN (g§.m,g><) II f(eg kg,x kN)du (x)] k= 1 N N EEwamgam) k21f(9gksxk)du (25)] 0 O N O O O by the transformation theorem, Since M is invariant under per- N mutations of x, Noting that EtegN = w] = §93 we have N _ N Race) ~§E1L k31f(esk’xk)]d” (a) N = IEEEEL‘egN’WE’Xu) k21f(egk’xk)‘egN = mm“ (a) m Nw N-l (Ln) = f 2 _w L(u), cpN(x),N x )f(w, XN )E[ H k: _ N w=1N t(e gkk,x HegN -w]dp. (1g). 1 Denoting the integrand of (1.13) by T<¢N(§)’§) we have, with B(r) = B(GQ:_).xr). (1.14) R(MQ) s j‘ max T(a, x)dp.N U for all g; e @a. aEB(N) Let Q'6 Qau be given by gr(x)(a) = l for a = aB(r)(xr)’ where aB(x) is the first maximizer, among elements of B, of GN[Z(m,a,x)]. One might expect 'g to do about as badly as possible against ‘g since it "plays anti-Bayes" against GN within the restrictions imposed by membership of Qéu' We shall Show that this is, in fact, the case. Since C is invariant, g_ is equivariant (see, e.g., the remarks following Definition 9). Hence fmax TJWM< W x) - Tex>1duN (x) aEB(N) 863 (1.15) s z j‘max T(a,x) - T(aB(xN),x)dp.N(x), BCA aEB 25 A5) since each integrand is positive. We Show the right side is 0(N- as N d-m. For each B, consider the problem obtained from the present problem by truncating the action Space to B and using the loss function LB(w,a,x) = bEBL(w,b,x) - L0»,a,x). Replacing L by LB in (1.13) and interchanging orders of summation, we see that the risk of an equivariant procedure, X3 in this new game is (1.16) RB(_e_.i) = beBumy - wN exam" 6). In particular, the best equivariant procedure has risk (1.17) R;(GN) =j' z T(b,§) - max T(a.§)duN® bEB aEB and the best simple Symmetric procedure has risk _ N (1.18) RB(GN) - f z T(b,}_<_) - T(%(xN).§)du Qg) bEB since simple symmetric procedures are equivariant. We obtain (1.18) from (1.16) by taking X. to be the simple symmetric procedure whose . th 0 o o o kernel, in the r problem, is degenerate at the first minimizer in B of GN[LB(w,a,xr)f(u),xr)] = GN[f(w,xr)b§BL(w,b,xr)] - CN[Z(w,a,xr)], i.e. at aB(xr), the first maximizer of GNEZQn,a,xr)] in B. Substituting into the left side of (1.15) the left of (1.14) and replacing the right of (1.15) by the difference of the left sides * of (1.18) and (1.17), we obtain, for all g) 6 11>“, (1.19) R(_e_,§g) - R(g,;) s 2 {RB(GN) - g(an. BIA Hannan and Huang (1969) have shown that each summand on the right of (1.19) is bounded by 0(N-a) uniformly in g, Hence 26 (1.20) sup* Sup D(g,gg) s sup{R(§_,Q) - R(GN)} + 009-}5 606% Q .Q We now Show that Q_ is a half-space procedure, so that, from ). the corollary to Theorem 1, sup{R(§,£) - R(GN)} < 0(1) +.e. We first 0 note that if go is the function given by Definition 6 then 0 {Q (§)(y): y 6 I}<: {a1,a2,...,an} (where "a" denotes the measure degenerate at a) for each x, It remains only to Show that, for each x, {y: 60(x)(y) = a} 6 K: for some t and 5. But 0 {Vi Q E) (y) = a} = U (Q* 0 Q ) {BCA:aEB} GQ)B GNBa where D II (x: aB(x) = a} GNBa n {x: GN[Z(w,b-a,x)j < 0} H n {x: GN[Z(m,b-a,x)] s 0} bEB b6B ba E Kn-l' G’ (1.4) yields the result. t Since we already have Q"(§n)B 6 X; from Lemma 2, an application of e e A s e d d H nc £_6 QG,s,t,n for om boun ed 3 an t, so we can apply the corollary to Theorem 1 in (1.20) to get Sgp sup D(§,gg) < 0(1) + e as N _. co, as required. Q“ 9 G _ §1.7 Convergence of D(§,Q) for quivariant Uniformly e-Bayes Procedures. (Totally Bounded Action Spaces) In this section we replace the assumption that A is finite by (1.21) A is totally bounded in the metric d(a,a') = sup ‘L(w,a-a',x)‘. x,w (Since, in fact, we deal only with uniformly e-Bayes procedures, it is sufficient that some totally bounded subset of A contain 27 U B(F,x).) We now state and prove the result analogous to Theorem 2. F63 XEI Theorem 3. If (1.21) holds and 6 is uniformly consistent, sup sup D(_9_,<_;Q) < e + 0(1) as N -o 00. a 66 I6 Proof. For each 6 > 0, let D6 = {a1,a2,...,ak}, k = k(6), be such that, for any a 6 A, d(a,aj) < 6 for some aj 6 D5, where d is the metric given in (1.21). * Let 9665 Fix 6 and let {Ajz j =l,2,...,k} be a partition of A such that, for each j, d(a,a ) < 6 for every a 6 Aj. .1 Consider the reduced problem obtained by replacing A by A f - - - u u D Let ché and 666 satisfy Definitions 6 and 8 (for ch and 6. * "66"), with "e" replaced by "e+m6"; and let R6(-) be the Bayes envelope for this reduced game. We observe that, for any G 6.9, we have min G[Z(w,aj,x)] - inf G[Z(w,a,x)] < 6 max f(w,x) < 6, D A w 6 since f(w,x) S 1 for all m and x. Integrating this inequality with reSpect to u, we have, Since B(I) = m, (1.22) R6(G) - R(G) < m6 for all G 6.9. Let ‘Q = QQQ) be the procedure in the reduced game given by cr(a>0 for some 5, y and j, then @005.) (y) (Aj) > 0. Since (pow 6 QCQN’ there exists an a 6 Aj such that §(§)[Z(w,a-b,y)] S e/m for all b 6 A. By definition of A this implies that 6(x)[LQp,aj-b,y)f6p,y) s e/m + 6] for all j b 6 A (and hence for all b 6 D6)' Hence g°(§)(y)(a ) > 0 implies J aj 6 B6(C(x),y,e+m6) where B6(G’x’€) satisfies (1.11) when A is replaced by D5. Hence 60(5) 6 §§(§)u6 for all x, so that * 96 @65' By definition of Q_ we have, for all ‘g and x, N Mata) - Megan s if1 z ‘Wr@,§Q,£) - Wr(§,§p£)‘ r=1 N n -1 s N a E H‘A.L(9r,asxr)Cpr(>_<_sda) - Lcpr(a.Aj>l r-l J-l J _1 N n _<. N )3 E jAj|L(er,a,xr) - L(er,aj,xr)|tpr(g,da) r-l j-l _1 N n SN )3 ZUCOQEJA.)=6° r=1 j=l r J Integrating this inequality we obtain, (1.23) R(g,gg) - R(§,Q) < 6 for all g. Thus, for any 6 > O, sgp sup New s sgp sup{lMass-Mama»|+\R(§»§.(g2)-R5(0N)\ QC 3 9a .Q + ‘R5(GN) - R(GNH} (1.24) s 5 + 83p sule(§_,;)-R6(GN)| + m6, by (1.22) and (1.23). ea 9 ca - From Theorem 2 there is a function N6(y) = N(y,6) such that N > N(y,6) implies sgp sup‘R(§,£_) - R6(GN)‘ < y + a + m6. Q66 6 29 Substituting this in (1.24), with 6(y) = , we have, .__;l___ 2 (2m+1) for N > N'(Y) = N(%3 6(y)), Sgp sup D(§Jgp < y +'e as required. <16 6 Remark. We again note that Theorems 2 and 3 continue to hold if 6 depends on N. CHAPTER II INFINITE STATE SPACES §2.0 Introduction. When 0 is infinite, we face two problems not encountered earlier. Solutions to the problem of estimating the empirical dis- 1,x2,...,xN are not known in general. In what follows we Simply assume the existence of appro- tribution GN from the observations x priate estimators, and we will not discuss this question further except to mention the work of Fox (1968, Chapter III) in the case where the distribution Pb is the uniform distribution on [0,w] (O < w < w) and the case where PE is the uniform distribution on [w,w+1] (-m < w < m). Also, appropriate forms of Lemma 1 are not available because, among other things, of the partial failure of the Glivenko-Cantelli theorem in infinite dimensional Spaces (see, e.g., Sazonov (1963)). The convergence for which Lemma 1 was used, however, could be expected if the sample Space, 1, were finite, since we would then be concerned with a supremum over a finite number of sets. How- ever if I is finite the problem of estimating GN is virtually in- capable of solution, since the distributions {P@: w 6 0} would not be linearly independent if Q has more elements than I. One seems to need, then, an infinite sample space to allow the estimation and a finite sample Space to ensure the convergence needed for the asymptotic optimality of the "Bayes against the estimate" procedures. It is these considerations that motivate this chapter. 30 31 In section 2.1 we define terms to be used in the following sections, and outline the basic approach. The main results of the chapter are in section 2.2, and this is followed, in section 2.3, by an attempt to show that, under reasonable conditions, constant terms appearing in the bounds in section 2.2 can be controlled. §2.l Finitely Based Decision Procedures. Let V be a finite measurable partition of I, and for each x 6 I let x' be the member of V to which x belongs and L(w,a,x') be the value of L(w,a,y) at a fixed, but arbitrary, point y 6 x'. As before, let .9 be the set of distributions on 0. Definition 10. For each m 6 0, let va bl the distribution on V induced by PG on .6. For the component game obtained by replacing I by U, Pb by PUV and Lfin,a,x) by LQn,a,V), let RV(.) be the Bayes envelope, RV(G,m) the risk of a V—measurable component procedure m against G 6.9, Q the set of component V procedures and,for each F 6.9, QFV the set of component procedures e-Bayes against F. Component procedures available in the reduced game are also available in the original game in the sense that if m 6 av for every x, can be identified with cp. Since V CB, any V-measurable , the procedure Y in the original game, given by ‘i’(x) = cp(x') procedure in the reduced game ista-measurable in the original game. In the context of the original problem, a procedure m 6 QE/ will be called "g-V-Bayes" against F. For each F and G 6.9, let (2.1) 1(F.G)= sup 2 (G-F)\'_L(w.v]- wresv vs! Then, from Lemma 0, for each F and G 6.9, and w 6 fi/, 32 2.2 RC, -RGS ,G+RF, -RF ( ) v(cp) v() HF) v(cp) v() where the interchange of the order of integration, for the term I(F,G), is justified by the finiteness of V. The main idea in what follows is that, if V is a "good" approximation to I in the sense that both \L(w,a,x) - L(w,a,x')‘ P (V) and 2 IV(f --J£———)+m1 are small for every w,x,a and V (where vev ' m ”(V) dP fw = aam' for some measure p), then Rv(-) might be close to R(o). Then we might use &(x1,...,xN) to estimate GN, play e-V-Bayes against C(x) in each component problem, and use the finiteness of V to obtain the convergence which, in Theorem 1, came as a result of Lemma 1. §2.2 Converggnce Theorems for e-V-Bayes Compound Procedures. In this section we give conditions under which the risk of an "e-v-Bayes against a" procedure is close to RV(GN)’ and give a bound on the difference RV(GN) - R(GN). These results are drawn together for a general theorem on the convergence to R(GN) of these procedures. Definition 11. For 6 an estimator, let Gav = {SEVEJCPOQQ E 966w such that.Vr. 61,6) = cp°(1<.) (x91- Theorem 4. Let Q be totally bounded in the metric d1(w,w') = sup{\L(tu,a,V) - L(u>',a,V)‘: a e A, v e V}- Let (2'4) 6W) = SUP{‘L((D,3,X) ' L(w:a:Y)‘:w E a: a e A, X. = y'} be finite and let M«< m be the uniform bound on L implied by these conditions. 33 Then with 6 = 6(V), there is a function N(n,v), defined for all T\>0 and y>0 such that, for all 660m and 99665;, P E U {W(§,92.§) - x(§(§),G ) > RV + e + 56 +111] < v- N N N>N(fl.v) Proof. Let E1,...,Ek be a partition of O by sets of dl-diameter < 6, so that for each i, sup‘LGn,a,x) - L(w',a,x')‘ < 26 a whenever w,w' 6 Ei' For each i, let wi be an arbitrary fixed element of Ei- Let Y 6 QVF for some F. Then _1 N _1 N Manny-N 2: L(er.‘r(x;).x;)| s N z r_1 r=1‘ L(er,1 (x11) ,xr)-L(er,‘i’ (x11) ,x;)| s 5, Since the integral of the absolute value bounds the absolute value of the integral, (2.5) buggy) - RVQQJ’)‘ s 6. Also, for er 6 Ei’ we have ‘L(er’Y(X1'-)’xr)-L(wi’Y(xii)’x1'-)‘ s f|L(er,a,xr)-L(wi,a,x;)N(xp(da) s 25. Hence -1 m (2.6) (Mini) - N z z L(w.,‘l’(x'),x')‘ < 25. i=1 {r-e 6E } 1 r r ‘ r i N __ _1 N Let Ni - N(g,1) - 2: [er 6 E1], PN - N].L E19]: 6 Ei'JPe V’ r-1 _1 N i r-l r and for each D<: V, let P (D) = N z [e 6 E,][x' 6 D], so that Ni i r=1 r i r 3’ is the "average" distribution on V arising from the er's in i E1, and PN is the corresponding empirical distribution given by i t. {xr. er 6 E1}. Hence (2.6) becomes 34 m N, ._l (2.7) |W(g,‘i’,§_) - E N IL(wi,‘Y)dPN.| < 25 1-1 1 which implies m N, _1_ _ |R(§_,1’) - 15:1 N j‘L(wi,1)dei| < 25, so that, from (2.5), we have N m . ‘_ (2.8) ‘RVQQD - 1: ffuwixmpNJ < 35. 1 From (2.7) and (2.8) we have m N, A __ (2.9) W(_6_,Y,§) s 55 + 121 N fL(wi,Y)d(PNi PNi) + Rug,“- Let SD 6 96V and let cpo be the function guaranteed by Definition 11. Then since W@,gg,§) = W@,cpo(§_),§) for each 5, we have, from (2.9) and using (2.2) and the definition of QFV to bound Rv(g,cp°@), m Ni (2.10) W(_9_,gg,£) s 55 + £1 fi— S(N,_9_,Ei,)_<_) + pva) + e + )((C(:_<_),GN) where S(N,_9_,Ei,§) = M 2 |PN - EN \(V). V€Vm 1N1 1 m We now show that 2 fi—-S(N,§,i,§) a 0 a.s. (P ) uniformly i=1 in Q_ as N a m, by considering the 4th moment of \Pfi - P; ‘(V). i i We have It @N - 5N m>|4dP°° = N74{ >3 Icvm - P >“cu£>e (x) i i 1 {r=9r6Ei} 6r r + 6 [form-PG (vnzdpe supremo-re >2cuae (xm {r#s:6r,686Ei} r r s s -4 -2 + - 3 N1 {N1 6NiCNi 1)} < 6Ni where we use the independence and zero expectation of the terms V(xr) - Perm), r = 1,2,...,N. 35 Hence, by the Markov inequality, given n > 0, Pm[sup S(N,§Ji,§) > D] S 2 6n-2'T'F2 a 0 as N'(§,i) d'm. N>N' n=N'(§_,i) Hence there exists a function k' = k'(n,y), defined for all n > o and y > 0, such that, for each i and each g e o”, P°°E SUP S 11] < v N6J(k'.i.g) where J(k',i,§) = {N: N(gni) > k'}. Let k = k'(fi/m,v/m). Then, with H(k,i,§) = {N: N>N' and N(g,i)n] s z P°°[sup f 801.5.199 > n/m] 1 N i=1 N>N' 2 (D P [sup N>N' i "MB “1 (2°11) 5 2 (P001:- sup S(N,g,i,x)>'fl/m] + Pmi‘k—r max SCN,_Q:1.>_<.)>11/m]) i=1 N6J (k,1,9_) N N€H(k.i.g) < y/2 +-g(N'), where g(N') 1 O uniformly in g_ as N 1 m since S(N,g,i,x) S 2M< 0°. Theorem 4 now follows from (2.10) and (2.11). We now deal with the term fiV’ Let = = _ + (2.12) am) Sip max) 33p Jaw aw) d». It is clear that au(v) depends on n. In fact, if u is c-finite, then av(V) s au(v) for any finite measure, v, equivalent to u and agreeing with u on U{V 6 v: u(V) < m}. This follows easily from the fact that fwv(x) = 0 if ”(x') - m. Definition 12. For V a measurable partition of I, let d(V) = inf dp(V): where the infimum is taken over the set of o-finite 36 measures {u: Pb < < u for all m 6 0}. Remark. We assume henceforth that Q is separable in the metric d(w,w') = sup‘P (A) - P ,(A)‘; this implies domination of AC—B‘” w {P¢: w 6 O} by a o-finite measure. Definition 13. For each C 6.&, let R1(G) = inf R(G,¢D, where the infimum is taken over all measurable procedures m for which ciJ‘pr) fwdu] = IG1L(w,cp)fw]dt-h Remark. The class of procedures for which this change of order is valid does not depend on u, since (i) u can be taken to be equivalent to {sz w 6 0} because of the separability of 0 under the metric d; and dP .' ——u‘)-=£1—u'-f h (11) if u < < v then dv dv w’ so t at dP J'GEE-jp- L(w,cp)]dv = ”((51% G[wa(w,cp)]dv = j‘GEwa(w,cp)]du. Lemma 4. Let L(m,a,x) s M for all w,a and x, and let 6 = 5(V) be given by (2.4). Then, with a = d(V), (2.13) RV(G) - R1(G) S Ma + 6 for all C 6 .3. Proof. Let u be any measure with PE < < u for each m, and let fw and wa be as in (2.12). Let m be any procedure for which GLfL(w,cp)fwdu] =j’c[L(w,cp)fw]du. Then Map) 2 fc[{L.x'> - 61fw(X)]du(X) >— fc[L(w.cp(x>.x')fw_ 13w (x) - (fwv(x) - fw(x))+, we have (2.14) R(G,cp) + a 2 fancies) .x'>£wv1da(x) - mcmfwv - fw)+du]- 37 The first term on the right of (2.14) is bounded below by (2.15) j‘ inf G[L(u),a,x')f (x)]du.(x) 2 R (c) - M 2: G[Pm(V)]. aEA (W V u(V)=°=> To deal with the second term on the right of (2.14) we note that IE”) - fw)+du _ 1(qu - fw)+du )3 IVf du. + z fvaw - £W)dh u(V)=°° ‘” none» (2.16) E P (V). MVme Combining (2.12) and (2.16) we have, for all m, + (2.17) (v)- 2 P002 (f -f)d. a” uW)=°°w I “’V w )5 Combining (2.14), (2.15) and (2.17) we have R(G,cp) + 5 2 Rv(G) - mum). Since q) and p are arbitrary, the proof is complete. Corollary. R(GN) 2 R (GN) - Ma - 6 for all ‘g 6 Om. Proof, We have only to show that R1(GN) - R(GN). This, however, follows immediately from the fact that GN is discrete, so the change in the order of integration in Definition 13 is valid for any measurable procedure 5;). We are now in a position to obtain a result, analogous to Theorem 1, for infinite state Spaces. Theorem 5. Let Q and A be compact, L jointly continuous in w and a for each fixed x, Pb(V) continuous in m for each V 61/, and 5 = 5([) and a = ad!) as in (2.4) and Definition 12 respectively. Let M be the bound on L, and let either *5 (a) 63,d) be a metric Space and L (G(§),GN) » 0 a.s. [Pm] 38 * as N ... 0° for each g 6 000, where L is the Prohorov metric, or (b) Q be a subset of the real line and L(CQ),GN) —+ 0 a.s. [Pm] as N -° 0° for each _9_ 6 Om, where L is the Levy metric. Then there exists a function Nm,y,g) such that for each P°°[ sup WQ,;Q,§) - R(GN) > a +Ma + 65 + n] < y. N>N(T19Y:§) In addition, if the convergence of fig) to GN is uniform in Q then N(T],v,§) =N(T\,v). Proof. For any v and ‘Y 6 QV and V6v |L(w.vw> - YW).V)Pw(V) - L(w'.v(V) - YCV).V)Pw.(V)| s wanuww) - 1mm - Loam) - Y,V)| +M|Pw.m-Pw(v>| (2.18) S 2 sup ‘L(w,a,v) _ L(wv,a,v)‘ +M‘Pw,(V) - PwWH. a Since 0 is compact, P is uniformly continuous so that w WWW) - PwCV)‘ -+ O uniformly in u) as (n' -o m. We shall show that the same is true for sup \L(u),a,V) - L(u)',a,V)‘. a Given p > 0 and a.) 6 O, for each a 6 A there exist open sets U C Q and W C. A such that (w,a) 6 U X W , and a a a a (w',a') 6 Ua X W8 2 lL(w,a,V) - L(w',a',V)‘ < p/2. Since A is compact, a finite covering {U8 X Wa , i = 1,2,...,n} covers {an} X A. n i i Let U = 0 U8 . Then U is open, contains 0), and for any 10' 6 U i=1 i and a 6 A there is an i for which {(m,a),(u)',a)} c: Ua X W , a so that |L(u>,a,V)-L(w',a,V)‘ s ‘L(w,a,V)-L(u),ai,V)‘ + ‘L(w,ai,V)-L(u)',a,V)‘ 0 such that d(u),u)') < Bw(p) implies sup ‘L(w,a,V) - L(u)',a,V)‘ < p. An elementary argument, a using the compactness of C), shows that inf Bw(p) = B(p) > 0 for u) 39 each p > 0. Hence sup ‘LQn,a,V) - LQn',a,V)\ # 0 uniformly as w' a w, for each V; aid the above~also suffices to show that Q is compact (and hence totally bounded) in the metric 5156,60 = sup |L(w,a,V) - L(u>',a,V)\. a,V Thus both terms on the right of (2.18) tend uniformly to zero as w a w', so that, with ah(p) = sup{h(w)-h(w'): d(m,w') < p} and with Y = {L(w,Y(V)-V(V),V)Pw(V): V 6 V; Y,v 6 QV}, we have sup ah(p) a 0 as p a O. hGV Hence, if G),d) is a metric space, we can apply Lemma 7 in the Appendix, to get h(F,G) « 0 as L*(F,G) ~ 0; and if Q is a subset of the real line, we can apply Lemma 8 (or 8') in the Appendix to get x(F,G) » 0 as L(F,G) a 0. Consequently, under either (a) or (b) we have, for each §_6 Om, 1(C(§),GN) a 0 a.s. [Pm], with this convergence being uniform for §,6 Om if the convergence of C(x) to GN is uniform. In addition, as has been shown, 0 is totally bounded in the metric d1(m,w‘) = :us ‘L(w,a,V) - L0»',a,V)‘; thus the conclusion of Theorem 4 holds. ,This conclusion, with the conclusion of the corollary of Lemma 4 and the convergence of x(C(3),GN), yields the required result. Corollary. Under the conditions of Theorem 5, sup D(9_,gg) < 6(1) + e + Ma + 55, as N —. no, for each g e a”; and the convergence to 0 is uniform in .Q if the convergence of 6(5) to GN is uniform. Proof. Given n > 0, by Theorem 5 we have R(g,92) = fw(g&,:_t_)dp°° s R(GN) + e + n + M: + 65 if N > N(n/z, n/ZM,g), since w s M. 40 We remark, in concluding this section, that none of the results depend on the particular determination of ‘Lcn,a,x'). (The first condition for Theorem 4 does depend on this; however if d1 were computed for a determination different from the one to be used, L0»,a,x") say, the first sentence of the proof would still hold and would imply sup |LGn,a,x) - L6»',a,x")‘ < 46; and the conclusion a of Theorem 4 would still hold with "56" replaced by "96".) Consequently any determination of LQn,a,x') will suffice. §2.3 Approximating the Sample Space byia Finite Partition. The usefulness of Theorem 5 and its corollary will depend on the existence of estimators satisfying conditions (a) and (b) of Theorem 5, and on the availability of partitions,v, for which d(v) and 5(9) are arbitrarily small. As has been said, we do not discuss the first of these problems in this thesis. The next two lemmas and the remarks which follow give a partial answer to the second. Lemma 5. Let Pb‘< < u with §;&-= fw. Then, for each m and a > 0, there is a partition vw such that amu(V) s a whenever V is a sub-partition of Vw -1 _ Proof. Choose a so that PN(fw [0,a1)) < a/3. Let a0 0, 1 3a =_.l;.1._ J ’311 and let aj 3-a (3-a) for l < j s k where k = min{j. Pm (f 001‘;(3_01)j""a1 1,a)) < d/3}. Let ak+1 =5. and let -1 Vw = {V0,V1,...,Vk} where Vj = fw [aj,aj+1). Let V be a sub- -partition of Vw Then for V CZVj, li< j < k, P (V)+ and x 6 V, (f (x) - M) s fw (x )(l - -J—-9 s fw (x)a/3. Hence no MV) aj P (V) + a (V)= r. J‘V(fw - (V))dp.= z: fvtdn+ z: z: fowMy/3du 0 there is a partition,v, of I for which d(V) s a. £59313. Since 0 is totally bounded in d, it is separable so that, for some o-finite u, Pm < < p. for all m 6 0. Let dP foo = ELL—“l. Then d(u),u)') = kflfw- fw,‘dp.. By Lemma 5 we can find, for each u), a partition Vw such that am (V) s (11/2 for any sub-partition of Vw. Let Ul’U2’°°°’Uk be a covering of Q be Spheres of diameter S 01/8, and let mi E Ui’ i - 1,...,k, be arbitrary. Then since, for any w,w' and any V, P (V) P ICV) 2 IVwa - :(v))+ - (fw. - Wridu V P (V) Pw.(V) s ZUVlfw - may. +J‘v‘ :(V) - amid“ V and since, for the second term on the right, P (V) w.(V) .J&___ -.—————- g - f - d M “M MWM» lP (V) PwmI)‘ sIVI w £00.! 1». h - S 2 f - f = 4d , ' . we ave QWW) awmfll) J“ u.) w"du' (w 03) Let V be any finite sub-partition of V ,v ,...,v . Then “’1 UL’2 UL’k + d . for any 0), awa) s awiual) 4 (w,wi) s or for u) 6 Ui Hence a (y) = sup or (V) s 0. Since oral) $0! (V), the lemma is proved. HI 0.) W 14 Remarks 1. If the conditions of Lemma 6 hold and there is a partition,Vr say, for which 64/1) s 6, then any sub-partition, ([2, of both V1 and the v of Lemma 6 will have 01412) s a by Lemma 5 and 5(/2) s 5 trivially. We do not discuss 5(,) further except to note that, obviously, 6(V) = 0 for all V if L is independent of x. 42 2. The condition of Lemma 6, that Q be totally bounded, holds for many families of distributions. Scheffé's theorem, that I‘fw - fw,‘du a 0 if, for every F with h(F) < m and every n‘> 0, ”[F n {x: ‘fw(x) - fm,(x)‘ > n}] a 0, serves to establish total boundedness - often by using compactness - in many cases. We give soue examples. (a) Exponential families. Let T be a mapping from '1 into Ek and let u be a measure on I. Let ® = {w 6 Ek} fewT(x)du < m} where m T(x) is an inner product. The class of densities wT(x): w 6 ®}, where C0») = [IewT(x)dp]'1, is the exponential {C0b)e family on I generated by T and n. It is well known that C is continuous on the interior of @; so,since fw(x) a fw.(x) as w «’w' for all x and all w' in the interior of @, we see, using Scheffé's Theorem, that any subset of a compact subset of the interior of ® will be totally bounded in the metric d(w,w') = sgp |Pw(A) - Pw,(A)‘ = aflfw - fw,\du. For example, in the one dimensional normal family, T(x) = (x2,x) and w = (- -l§, Ea), for the distribution with mean u and variance 02. Hence oi: requirement is satisfied if, for some positive numbers, a and b, 02 2 a and ‘u‘ s boz. (b) Translation parameter families. Let Ip dv = l where, for u Lebesgue measure on Ek, v < < u and g3- is bounded. Let fw(x) = p(x-w), w 6 ER. Then since I‘f¢ - fw,‘dvH 0 as w d w' (cf. Royden (1968), p. 91, Problem 17 (b)) our condition is satisfied if Q is any bounded subset of ER. Remark. In both (a) and (b) above, we also have, for each B 68, ‘Pw(B) - Pw'CB)‘ a 0 uniformly in u) 6 Q as w' aw, a result which, in Theorem 5, was obtained from the compactness of O 43 and was used in showing that x(F,G) ~ 0 as L(F,G) * 0 or * L (F,G) d O. The other requirement,that sup ‘L(w,a,V) - L0»',a,V)‘ a 0 uniformly in w, still needs separate a,V treatment however. BIBLIOGRAPHY BIBLIOGRAPHY Cogburn, Robert (1965). On the estimation of a multivariate location parameter with squared error loss. Bernoulli (1723), Bayes (1763), Laplace (1813), Anniversary Vol., 24-29. (1967). Stringent solutions to statistical decision problems. Ann. Math. Statist. 38, 447-464. Fox, Richard (1968). Contributions to compound decision theory and empirical Bayes squared error loss estimation. RM-214, Department of Statistics and Probability, Michigan State University. Gilliland, Dennis (1968). Sequential compound estimation. Ann. Math. Statist. 39, 1890-1904. Hannan, James F. and Huang, J.S. (1969). .Equivariant procedures in the compound decision problem with finite-state component prob- lem. To be submitted for publication. Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely Specified distributions. Ann. Math. Statist. 26, 37-51. Hannan, James F. and Van Ryzin, J.R. (1965). Rate of convergence in the compound decision problem for two completely Specified distributions. Ann. Math. Statist. 36, 1743-1752. James, W. and Stein, Charles (1961). Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium 92 Mathematical Statistics and Probability, 361-379. University of California Press. Rao, R. Ranga (1962). Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33, 659-680. Robbins, Herbert (1951). Asymptotically sub-minimax solutions of compound decision problems. Proc. Second Berkeley Symp, Math. Statist. Prob., 131-148. University of California Press. Royden, H.L. (1968). Real Analysis, 2nd Edition. Macmillan. Samuel, E. (1967). The compound statistical decision problem. Sankhya 29, 123-140. 44 45 Sazonov, V.V. (1963). On the Glivenko-Cantelli Theorem. Theory of Probability and Its Applications VIII, 282-290. Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings 2£.£EE Third Berkeley Symposium 22_Mathematica1 Statistics and Probability, 197-206. University of California Press. Suzuki, Giitiro (1968). A general class of non-parametric empirical Bayes procedures. Research Memorandum No. 19, The Institute of Statistical Mathematics. Varadarajan, V.S. (1958). Weak convergence of measures on separable metric Spaces. Sankhya 19, 15-22. APPENDIX APPENDIX We prove here lemmas which have been used in Chapter II but which were unsuitable for inclusion there because, except for their application, they have no particular connection with the Compound Decision Problem. These lemmas are concerned with the relations between certain metrics on sets of probability measures; although they were used in the proof of Theorem 5, their use may not have been necessary and they are included at least partly because of their general interest. Before introducing the lemmas, we need some definitions. Definition 1. Let h be any function on a metric Space «l,d). The modulus of continuity of h is the function given by ah(e) = SUP {\h(w) - h(w')\ =d(w.w') < s} for each a > 0. Definition 2. Let .3 be the space of probability distributions on a metric space 0. The Prohorov metric on .9 is given by * . 6 L (F,G) = 1nf {6: F(A ) + 6 2 G(A) for all closed A.C:O} where A6 = {w: for some w' 6 A, d(w,w') < 6}. * * [We note L (F,G) = L (G,F); for A5 is open for each A, so 6c 6 A is closed; and w 6 A6C 46 '-—- A3.“- . a d(w,w') < 6 for some w' 6 A5C‘= w 6 AC, 1.4—" .21 ... ;’ 47 6C6 c . * 6 so A <: A . Hence, if 6 < L (G,F), so that G(A ) +-6 < F(A) for * some A, then F(Abcs) + 6 S F(Ac) +-6 < G(Aéc), i.e. 6 < L (F,G). * * Thus L (G,F) S L (F,G).] Lemma 7. Let a s h(-) s a +-M be a real-valued function on a metric Space 0. Then if P and Q are distributions on O with * L (P5Q) = p: Uh dP - J‘h dQ| s Mp +ah(p). Proof. Without loss of generality we take a = 0. Then fh dP - fh dQ = jg): dPh'1 - ng c‘lQh'1 (1) = fl; Ph'1[x > t] - Qh'1[x > t]dt (_:"__“)9 Since w 6 h 1[t,M] (where the bar denotes closure) implies h(w) 2 t - ah(p), we have - - )9 - Ph 1[x 2 t] < Q(h 1[t,M] + p S Qh 1[t - ah(p),M] + p. Thus M -l (1) Sfth [t-ah(p)SxSt]+pdt = OM +'Qh-1[Ig[x, x+ah(p)](t)dt] by Fubini's Theorem SpM+ah(p). The same argument, with P and Q interchanged, proves the lemma. Definition 3. Let S be the space of probability distribution functions on the real line. The Levy metric on S is given by L(F,G) = inf {6: F(x-e) - e < G(x) < F(x+e) + e for all real x}. 48 Lemma 8. Let a and b be real, and c s h(o) s c +-M be a real valued function on Q C [a,b]. Let F and G be any dis- tributions on O, and I > 2p = 2L(F,G). Then b-a ‘fh dF - jh dG‘ < ME—i— + 119 +'dh(l + P) +'ah(X + 29): where [x] denotes the largest integer not greater than x. Proof. Without loss of generality, we take c = 0, Mi< m and b“& < mo Choose 0 so that 2p < o'< I and [2&2 + l]o - (b-a) = 6 > 0, and let xj = a + jo - 6/2 for j = 0,1,2,...,k = [Pifi,+ l]. . . . . = I t v = By definition of L(F,G), we can find x0 x0 < x1 <...< xk xk such that, for each j, ‘xj - x3‘ 5 p and F(x3-) - p < G(xj) < F(xs) +'p, because F(xj - p) - p < G(x ) < F(x +'p) + p implies the existence 1 J of an x36 [xj - p, xj + p] for which either G(xj) - p < F(xS) < G(xj) + p or F(x3-) < G(xj) - p < G(xj) +-p < F(xs). For each j, let yj = min {xj, x3} and 2]. = max {xj. xj'}. and for x 6 (xj, xj+1] let h1(x) = inf {h(w)= w 6 [yja Zj+1j n 0}: with h1(x) = M if [yj, zj+1] n 0 ¢. I i = For x 6 (x , xj+1), let h2(x) h1(y) for y 6 (xj , x j and for each j let h2(x3) = max {h2(x3-),2 h (x'*9}- j+l] ’ Since ‘xj - x3_1\ < I + p, ‘x ‘ < I + p and j ' x3+1 lxj' - x3+1‘ < I + 29 for each j, \h-hl‘ s ah(I + p) and ‘h-hz‘ S oh(I + 2p)- We note that F(x£,x3) < G(xi,xj] + 2p by the construction of , k {Xj}j=0° Let 0 s r < M. If h 2(xj') s r then (x'_ 1,xj+1)<:12 h 1(43, r] and (xj-l’xj+l]<:1h1(qn’ r]. Conversely (xj_1,x:+43 Cih11(-m,r] implies (x' _1,x') U (xJ, ,xj +1) Cih21(-m, r] and this implies 49 h2(x') S r. Hence h;1(«n,r] is the union of at most k/2 intervals j of the form (x;,x' 3 intervals (xi,xj]. and h11(«n,r] is the union of the corresponding Hence, for all r, -l -1 F 112 (-°°,r] < C h1 (-00,r] + kp. l M -1 M - d - d = - d Thus {E h F {1; h1 (; Io x th2 I0 x Gh1 M -1 -1 = f0 6111 (-°°,x] - th (-oo,x]dx N f: Gh11(-®,x] - [Gh;1(dw,x] +-kp]dx - Mkp. Hence jh dF - jh dG 2 -Mkp - ah(I + p) -ozh(I + 26). The same argument with F and G interchanged proves the lemma. Lemma 8' involves a special case of Lemma 8 in which a simpler proof leads to a somewhat stronger conclusion. It seems, though we have been unable to Show this, that the proof could be used in the context of Lemma 8, with some modifications, to give an improved result; and also that it might be amenable to versions of Lemma 8 or Lemma 8' in higher dimensions. Lemma 8'. Let h be a function on [a,b], F and G any distributions on [a,b] and I > p = L(F,G). Then b- m. as - f5 dcl s oh()\){3 + [7519} Proof. Choose 0 such that p < O < I and [2&2 + 110 > b-a, and let xj = b - (k+l-j)o, for j = 0,1,...,k+l = [b—i‘i + 1]. x 4x Let c, = h -1:l-—i for j = l,2,...,k+l. Then J 2 |h(x) - cj‘ s ah(I) for each x 6 (xj-l’ij’ and \cj - cj-l‘ s ah(I) 50 for each j. Let D = F(xj) - G(xj) for each j. J? J Then k+l hd(F-G) $01 (I) + 23 c (D -D. k = ah(I) - chO + ck+le+1 +jEIDj(cj - cj+1) k So (I) +6 0.) 2 |D.|. h h j=l 3 But ‘DJL = [F(xj)-G(xj)] V [G(xj)-F(xj)] < [G(xj+1)+p-G(xj)] V [G(xj)-G(xj_1)+p] Hence j21|nj1 < jEI{G(xj+1) - G(xj) + p + G(xj) - G(xj_1)} = kp + G(xk+1) + G(xk) - G(xl) - G(xo) s kp + 2. Hence [h d(F-G) < ah(X)(kO + 2) = [fljp a (I) + 3:101)- I h h Again, reversing the roles of F and G proves the lemma.