ON THE CONTINUITY OF THE BAYES RESPONSE

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
MERRILEE KATHRYN HELMERS
1972

This is to certify that the thesis entitled ON THE CONTINUITY OF THE BAYES RESPONSE presented by Merrilee Kathryn Helmers has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

Major professor
Date August 11, 1972

ABSTRACT

ON THE CONTINUITY OF THE BAYES RESPONSE

By Merrilee Kathryn Helmers

Consider a statistical decision problem with parameter set $\Theta$, observations $X \sim P_\theta$, action space $\mathcal{A}$, decision rules $\varphi$, losses $L(\theta,\varphi(X)) \ge 0$ and risk functions $R(\theta,\varphi) = E_\theta L(\theta,\varphi(X))$. When $\theta$ is random with prior distribution $G \in \mathcal{G}$, the class of all prior distributions, $R(G,\varphi) = E_G R(\theta,\varphi)$ is called the Bayes risk of $\varphi$ versus $G$. The Bayes envelope is defined on $\mathcal{G}$ by $R(G) = \inf_\varphi R(G,\varphi)$ and any $\varphi$ for which $R(G,\varphi) = R(G)$ is said to be Bayes with respect to $G$. If a determination of a Bayes rule with respect to $G$, $\varphi_G$, is made for each $G$, then $\varphi_G$ defines a mapping from $\mathcal{G}$ into the set of decision rules.

Hannan ((1957), Contributions to the Theory of Games, 3, 97-139. Ann. Math. Studies No. 39, Princeton University Press) investigated the relationship that certain continuity conditions on $\varphi_G$ have on the average risk stability of the procedure which plays Bayes against the past in a sequence of independent repetitions of a game. With $\theta_1,\theta_2,\ldots$ the sequence of parameters and $G_i$ denoting the empirical distribution of $\theta_1,\ldots,\theta_i$, this thesis continues Hannan's investigation of

$D_n^* = n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) - R(G_n)$

in the framework of a statistical decision problem component.
Chapter I gives the pertinent results of Hannan in the decision theoretic framework and offers some slight variations of conditions sufficient for $D_n^* \to 0$.

Chapter II examines in more detail the $\Theta$ finite case and begins by showing that the continuity of $R(\theta,\varphi_G)$ in $G$ is sufficient for $D_n^* \to 0$. For component decision problems where the action space $\mathcal{A}$ is also finite, Theorem 3 provides a characterization of a Hannan condition which is sufficient for $D_n^* \to 0$. This characterization reduces to a condition on the continuity of distributions of the likelihood ratios for components which are either $2 \times N$ or $M \times M$ classification problems, and the likelihood ratio condition is shown to be necessary for $D_n^* \to 0$. Hence, a complete characterization of $D_n^* \to 0$ is provided by the easily checked likelihood ratio condition.

Chapter III has $\Theta$ finite and investigates the relationship that the differentiability of $R$ has to the continuity of

ON THE CONTINUITY OF THE BAYES RESPONSE

By Merrilee Kathryn Helmers

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1972

TO MY PARENTS

ACKNOWLEDGEMENTS

I wish to express my sincerest gratitude to Professor Dennis C. Gilliland for his guidance and encouragement in the preparation of this thesis. His patience and willingness to discuss any problem at any time are greatly appreciated.

I also wish to thank Professor James Hannan for his critical reading of this thesis. In particular, I would like to thank him for pointing out how the results of his game theory paper could be used to shorten the proof of Theorem 2.3.

Special thanks are due to Mrs. Noralee Barnes for her excellent typing of the manuscript and the cheerful attitude with which she did it.
Finally, I wish to express my gratitude to the National Science Foundation and to the Department of Statistics and Probability, Michigan State University for their financial support during my stay at Michigan State University.

TABLE OF CONTENTS

Chapter

I PRELIMINARIES
  1.1 The Component Statistical Decision Problem
  1.2 The Sequence of Component Problems
  1.3 Monotonicity Results for $R(\theta,\varphi_{G+t\theta})$ and $L(\theta,\varphi_{G+t\theta}(X))$

II SUFFICIENT CONDITIONS AND NECESSARY CONDITIONS FOR ASYMPTOTIC AVERAGE RISK STABILITY
  2.1 Introduction
  2.2 Continuity of $R(\theta,\varphi_G)$ in $G$
  2.3 Condition $(H_0)$; A Characterization for $M \times N$ Decision Problems
  2.4 A Likelihood Ratio Characterization of Condition (I)

III SMOOTHNESS OF THE BAYES ENVELOPE AND ITS RELATION TO (C)
  3.1 Introduction
  3.2 Some Mathematical Preliminaries
  3.3 Differentiability of the Bayes Envelope as a Function of $G$

BIBLIOGRAPHY

CHAPTER I

PRELIMINARIES

1. The Component Statistical Decision Problem

Consider a statistical decision problem with parameter set $\Theta$ and a random variable $X$ taking values in $\mathcal{X}$ with $P_\theta$ denoting the distribution of $X$ given parameter $\theta \in \Theta$. Let $L \ge 0$ be a (real-valued and measurable) loss function defined on $\Theta \times \mathcal{A}$ where $\mathcal{A}$ denotes the set of (random) actions. A (behavioral) decision rule $\varphi$ is a mapping from $\mathcal{X}$ into $\mathcal{A}$. The risk function of $\varphi$ is given by

(1)  $R(\theta,\varphi) \equiv \int_{\mathcal{X}} L(\theta,\varphi(x))\,P_\theta(dx).$

All risk functions are assumed finite valued and measurable with respect to $\theta$. If $G \in \mathcal{G}$, the class of all a priori probability measures on a $\sigma$-field of $\Theta$, the Bayes risk of $\varphi$ with respect to $G$ is given by

(2)  $R(G,\varphi) \equiv \int_{\Theta} R(\theta,\varphi)\,G(d\theta).$

We assume the $\sigma$-field of $\Theta$ contains all singleton sets and throughout this thesis will identify degenerate probability distributions with the singleton sets on which they concentrate all their probability. Then (2) provides a natural extension of the domain of $R(\cdot,\varphi)$ defined in (1). The Bayes envelope is given by

(3)  $R(G) \equiv \inf_{\varphi} R(G,\varphi).$
Any rule $\varphi$ such that $R(G,\varphi) = R(G)$ is said to be Bayes with respect to $G$ and will be denoted by $\varphi_G$. It is also called a Bayes response with respect to $G$. A minimal assumption that we impose is that a Bayes response exists with respect to $G$ for each prior measure $G$. If a Bayes response $\varphi_G$ is specified for each $G \in \mathcal{G}$, then $\varphi_{(\cdot)}$ defines a mapping from $\mathcal{G}$ into the class of decision rules. We will call $\varphi_{(\cdot)}$ a determination of the Bayes response.

It is convenient for our purposes to extend the domain of $R(\cdot,\varphi)$ to $\mathcal{H}$, the class of all finite signed measures on $\Theta$, and the domain of $\varphi_{(\cdot)}$ to $\mathcal{H}^+$, the subclass of all finite non-negative measures on $\Theta$. For $H \in \mathcal{H}$ let

(2')  $R(H,\varphi) \equiv \int_{\Theta} R(\theta,\varphi)\,H(d\theta)$

and note that $R(H,\varphi)$ is linear in $H$. Let $|H| = H(\Theta)$ and for $H \in \mathcal{H}^+$, $H \ne 0$, define

(4)  $\varphi_H \equiv \varphi_{H/|H|}.$

If $\varphi_0$ is specified (any $\varphi$ will do) then this specification together with (4) extends the domain of any determination of the Bayes response to the class $\mathcal{H}^+$. The resulting $\varphi_{(\cdot)}$ is positive homogeneous on $\mathcal{H}^+$, i.e.

$\varphi_{kH} = \varphi_H$ for all $k > 0$, $H \in \mathcal{H}^+$

and

$R(H,\varphi_H) \le R(H,\varphi)$ for all $\varphi$ and $H \in \mathcal{H}^+$.

In this thesis we will always work with positive homogeneous determinations of the Bayes response.

Throughout we will use square brackets to denote indicator functions; for example, $[f > 0]$ denotes the indicator function of the set $\{x \mid f(x) > 0\}$.

2. The Sequence of Component Problems.

Suppose the component problem is repeated with $\underline{\theta} = (\theta_1,\theta_2,\theta_3,\ldots)$ the sequence of parameter values and $\underline{X} = (X_1,X_2,X_3,\ldots)$ the sequence of observations. We assume independent repetitions; that is, the distribution of $\underline{X}$ given $\underline{\theta}$ is the product distribution $\times_{i=1}^{\infty} P_{\theta_i}$.

Hannan (1957) launched an extensive study of the sequence of decision rules

(5)  $\underline{\varphi}^* = (\varphi_{G_0},\varphi_{G_1},\varphi_{G_2},\ldots)$

for use across the sequence of component problems where $G_0 = 0$ and

(6)  $G_i$ = empirical distribution of $\theta_1,\ldots,\theta_i$, $i = 1,2,\ldots$
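As a concrete illustration of playing Bayes against the past, the following Python sketch computes the excess of average risk over the Bayes envelope, $n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) - R(G_n)$, for a small finite problem. The two distributions, the 0-1 loss matrix, the tie-breaking rule and all function names are assumptions made for illustration and do not come from the thesis; breaking ties toward the smallest action index is only one possible determination of the Bayes response.

```python
from fractions import Fraction as F

# Hypothetical 2 x 2 component problem (illustrative, not from the thesis):
# Theta = {0, 1}, X takes values in {0, 1, 2}, actions {0, 1}, 0-1 loss.
f = [[F(6, 10), F(3, 10), F(1, 10)],   # f[theta][x] = P_theta(X = x)
     [F(1, 10), F(3, 10), F(6, 10)]]
L = [[0, 1],                            # L[theta][a]
     [1, 0]]
THETAS, XS, ACTS = range(2), range(3), range(2)


def bayes_response(H):
    """Non-randomized Bayes rule versus a non-negative measure H on Theta.

    For each x, choose the action minimizing
    sum_theta L(theta, a) * f_theta(x) * H(theta);
    ties go to the smallest action index (an assumed determination).
    """
    rule = []
    for x in XS:
        lam = [sum(L[t][a] * f[t][x] * H[t] for t in THETAS) for a in ACTS]
        rule.append(lam.index(min(lam)))
    return rule


def risk(theta, rule):
    """R(theta, phi) for a non-randomized rule given as a list indexed by x."""
    return sum(L[theta][rule[x]] * f[theta][x] for x in XS)


def envelope(G):
    """Bayes envelope R(G) = R(G, phi_G)."""
    rule = bayes_response(G)
    return sum(G[t] * risk(t, rule) for t in THETAS)


def excess(thetas):
    """Average risk of Bayes-versus-the-past minus R(G_n)."""
    n = len(thetas)
    counts = [0, 0]          # i * G_i; the zero measure plays the role of G_0
    total = F(0)
    for th in thetas:
        total += risk(th, bayes_response(counts))
        counts[th] += 1
    G_n = [F(c, n) for c in counts]
    return total / n - envelope(G_n)
```

Because the determination is positive homogeneous, the sketch passes the unnormalized counts $iG_i$ to `bayes_response` instead of $G_i$ itself. Exact rational arithmetic is used so that ties are detected exactly.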
Hannan (1957, §8) shows that

(7)  $n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_i}) \le R(G_n) \le n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}})$

which suggests that under certain continuity conditions on the Bayes response $\varphi_{(\cdot)}$, $n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}})$ should be stable about $R(G_n)$. We let

(8)  $D_n^* = n^{-1}\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) - R(G_n)$

denote the excess over $R(G_n)$ of average risk resulting from $\underline{\varphi}^*$ and define an asymptotic uniform in $\underline{\theta}$ average risk stability resulting from $\underline{\varphi}^*$ as

(A)  $D_n^* \to 0$ as $n \to \infty$ uniformly in $\underline{\theta}$.

The idea of comparing average risk with the envelope $R(\cdot)$ evaluated at $G_n$ is part of Robbins (1951) original formulation of the compound decision problem. Hannan's (1957) game theoretic level investigations into variants of $\varphi_{G_{i-1}}$ were at least in part motivated by the earlier work of Robbins and of Hannan and Robbins (1955) in compound decision theory where statistical information replaces exact knowledge of $G_{i-1}$. The original formulation of the compound decision problem was set rather than sequence in nature. The idea in the sequence case is to estimate $G_{i-1}$ or, more generally, $\varphi_{G_{i-1}}$ using $X_1,\ldots,X_{i-1}$ and consider $n^{-1}\sum_{i=1}^{n} R(\theta_i,\hat{\varphi}_{i-1}) - R(G_n)$. The convergence to zero has been demonstrated and bounds on rates obtained for the $2 \times 2$, $M \times N$, squared error loss estimation, two action, and other decision problems by Robbins (1951), Hannan and Robbins (1955), Hannan and Van Ryzin (1965), Van Ryzin (1966a), Samuel (1963), (1965), Gilliland (1968), Johns (1967), and many other authors too numerous to mention individually.

This thesis concerns the development of necessary conditions and sufficient conditions for (A) and its non-uniform in $\underline{\theta}$ counterpart. In some applications, for example, pattern recognition with supervised learning, $\underline{\varphi}^*$ is a procedure which is available to the decision maker and hence our results will have immediate application. Equally important is the potential use of our results in combination with results on estimation of empirical distributions for use in the important compound decision problem.
A survey of known results concerning the convergence of $D_n^*$ and its loss analog to zero can be found in Gilliland (1972).

In compound decision problems authors seldom impose continuity conditions on the Bayes response because statistical estimation of $\varphi_{G_{i-1}}$ furnishes its own smoothing. Hannan (1956), (1957), Samuel (1963), (1966), Gilliland (1966), (1969) and Jilovec and Subert (1967) have investigated play against random perturbations of $G_{i-1}$ and have established asymptotic average risk stability about $R(G_n)$ without continuity conditions.

By differencing across the extremes of (7) we have

(9)  $0 \le D_n^* \le n^{-1}\sum_{i=1}^{n} [R(\theta_i,\varphi_{G_{i-1}}) - R(\theta_i,\varphi_{G_i})]$, for all $\underline{\theta}$.

Identifying $\theta_i$ with the probability measure degenerate on $\theta_i$ we have $iG_i = (i-1)G_{i-1} + \theta_i$; and therefore, by positive homogeneity,

$\varphi_{G_i} = \varphi_{G_{i-1} + (i-1)^{-1}\theta_i}.$

This together with (9) suggests the investigation of $R(\theta,\varphi_{G+t\theta})$ as $t \downarrow 0$. Hannan (1957, p. 129) defines a condition on the component problem which we shall denote as condition (H):

(H)  $\lim_{t \downarrow 0} R(\theta,\varphi_{G+t\theta}) = R(\theta,\varphi_G)$ uniformly in $\theta \in \Theta$, $G \in \mathcal{G}$.

Proposition 1. (Hannan) (H) $\Rightarrow$ (A).

Proof. Since $\varphi_{G_i} = \varphi_{G_{i-1} + (i-1)^{-1}\theta_i}$ we see that (H) implies

(10)  $[R(\theta_i,\varphi_{G_{i-1}}) - R(\theta_i,\varphi_{G_i})] \to 0$ as $i \to \infty$ uniformly in $\underline{\theta}$.

Since the Cesaro mean preserves the uniform convergence of a sequence, (A) follows immediately from (9) and (10).

Hannan (1957, Th. 5) has observed that if $\varphi_{(\cdot)}$ satisfies a Lipschitz condition in (H) then the same analysis yields a bound on the rate of convergence of $D_n^* \to 0$ in (A).

Gilliland and Hannan (1969, p. 11) give an example involving an infinite $\Theta$ decision problem where (A) obtains for some determinations of $\varphi_{(\cdot)}$ but (H) fails for all determinations of $\varphi_{(\cdot)}$. Thus, (H) is not a necessary condition for (A) to obtain.

The next result concerns

(A-)  $D_n^* \to 0$ as $n \to \infty$ for each $\underline{\theta}$

and

(H-)  $\lim_{t \downarrow 0} R(\theta,\varphi_{G+t\theta}) = R(\theta,\varphi_G)$ uniformly in $G$ such that $G(\theta) > 0$

where $G(\theta)$ denotes the probability that $G$ assigns to the singleton set $\{\theta\}$.

Proposition 2.
If $\Theta$ is finite and (H-) holds for all $\theta \in \Theta$ then (A-) obtains.

Proof. Since $\Theta$ is finite, (H-) implies that for every $\varepsilon > 0$ there exists a $\delta > 0$ depending on $\varepsilon$ and not depending on $\theta$ or on $G$ such that $G(\theta) > 0$ with $|R(\theta,\varphi_{G+t\theta}) - R(\theta,\varphi_G)| \le \varepsilon$ for all $0 \le t \le \delta$, $\theta$ and $G$ such that $G(\theta) > 0$. Thus,

(11)  $|R(\theta_i,\varphi_{G_{i-1}}) - R(\theta_i,\varphi_{G_i})| \le \varepsilon$ if $(i-1)^{-1} \le \delta$ and $G_{i-1}(\theta_i) > 0$.

For any sequence of parameters $\underline{\theta}$, $G_{i-1}(\theta_i) > 0$ except finitely often and since risks $R(\theta_i,\varphi)$ are finite, we see that (A-) follows directly from (11) and (9).

Proposition 3. If $\Theta$ is finite of cardinality $M$ and the loss function $L$ is bounded, say $L(\theta,a) \le B < \infty$ for all $\theta \in \Theta$, $a \in \mathcal{A}$, then (H-) holding for all $\theta \in \Theta$ implies (A) obtains.

Proof. With $\varepsilon > 0$ arbitrary and $\delta$ as in the proof of Proposition 2, we have from (9) that

$0 \le D_n^* \le n^{-1}\sum_{i \le n:\, G_{i-1}(\theta_i) = 0} [R(\theta_i,\varphi_{G_{i-1}}) - R(\theta_i,\varphi_{G_i})] + n^{-1}\sum_{i \le n:\, G_{i-1}(\theta_i) > 0} [R(\theta_i,\varphi_{G_{i-1}}) - R(\theta_i,\varphi_{G_i})]$

$\le n^{-1}(BM) + n^{-1}\sum_{i \le n:\, (i-1)^{-1} > \delta} B + n^{-1}\sum_{i \le n:\, (i-1)^{-1} \le \delta} \varepsilon$

$\le n^{-1}(M + \delta^{-1} + 1)B + \varepsilon.$

Thus, there exists an $n_0$ so large that $0 \le D_n^* \le 2\varepsilon$ for all $\underline{\theta}$ and $n \ge n_0$, i.e., (A) obtains.

In Chapter II we investigate other sufficient conditions for (A-) and (A) and develop a necessary and sufficient condition for (A-) for some special component problems. The next section of this chapter deals with the monotonicity of $R(\theta,\varphi_{G+t\theta})$ and $L(\theta,\varphi_{G+t\theta}(X))$ in $t$ and presents a useful necessary condition and a useful sufficient condition for a Bayes response to satisfy the limit condition in (H).

3. Monotonicity Results for $R(\theta,\varphi_{G+t\theta})$ and $L(\theta,\varphi_{G+t\theta}(X))$.

Hannan (1957, p. 129) has remarked that $R(\theta,\varphi_{G+t\theta})$ is monotone decreasing in $t \ge 0$. In this section we state and prove the Hannan result with an extension on the domain of $t$ (Proposition 4) and then establish an analogous a.s. loss monotonicity result (Proposition 5). For $\theta \in \Theta$ and $G \in \mathcal{G}$ let $G(\theta)$ denote the measure that $G$ assigns to the singleton set $\{\theta\}$.

Proposition 4. (Hannan). For all $\theta \in \Theta$ and $G \in \mathcal{G}$,

(12)  $R(\theta,\varphi_{G+t\theta}) \downarrow$ in $t \ge -G(\theta)$.

Proof.
Let $\theta$ and $G$ be given and consider $t > 0$. The inequality

$R(G + t\theta,\varphi_{G+t\theta}) \le R(G + t\theta,\varphi_G)$

can be weakened to

$R(G + t\theta,\varphi_{G+t\theta}) \le R(G + t\theta,\varphi_G) + R(G,\varphi_{G+t\theta}) - R(G,\varphi_G).$

Using the linearity of $R(H,\varphi)$ in $H$ and the fact that $t > 0$ we obtain

(13)  $R(\theta,\varphi_{G+t\theta}) \le R(\theta,\varphi_G).$

Note that (12) is trivially satisfied if $G$ is degenerate at $\theta$ so we exclude this case in the remainder of the proof. Let $-G(\theta) \le t_1 < t_2$ be fixed and consider $G^* = (G + t_1\theta)/(1 + t_1)$ and $t = (t_2 - t_1)/(1 + t_1)$. $G^*$ is a probability measure since $t_1 \ge -G(\theta)$, and $t_1 = -1$ is excluded since $G$ is not degenerate at $\theta$. Since $\varphi_{G^*} = \varphi_{G+t_1\theta}$ and $\varphi_{G^*+t\theta} = \varphi_{G+t_2\theta}$, replacing $G$ by $G^*$ in (13) yields

$R(\theta,\varphi_{G+t_2\theta}) \le R(\theta,\varphi_{G+t_1\theta}).$

Since $t_1$ and $t_2$ are arbitrary numbers satisfying $-G(\theta) \le t_1 < t_2$, the proof of (12) is complete.

Consider the following assumptions on the component decision problem.

(A1)  There exists a $\sigma$-finite measure $\mu$ such that $P_\theta \ll \mu$ for all $\theta \in \Theta$.

(A2)  For $H \in \mathcal{H}^+$, $R(H,\varphi)$ can be written

$\int_{\mathcal{X}} \int_{\Theta} L(\theta,\varphi(x)) f_\theta(x)\,H(d\theta)\,\mu(dx)$

and a Bayes response with respect to $H$ is given by choosing $\varphi(x)$ to minimize the inner integral a.s. $P_H$.

In (A2) $f_\theta$ is a version of $dP_\theta/d\mu$ and

(14)  $P_H(A) = \int_{\Theta} P_\theta(A)\,H(d\theta)$

defines the measure ($H$ mixture) $P_H$ on the $\sigma$-field of $\mathcal{X}$. Let $f_H$ denote a version of $dP_H/d\mu$. Under (A1) and (A2) $\varphi$ is Bayes with respect to $H$ if and only if

(15)  $\int_{\Theta} L(\theta,\varphi(X)) f_\theta(X)\,H(d\theta) = \inf_a \int_{\Theta} L(\theta,a) f_\theta(X)\,H(d\theta)$ a.s. $P_H$.

Proposition 5. Suppose (A1) and (A2) are satisfied. Then for all $\theta \in \Theta$ and $G \in \mathcal{G}$, and $-G(\theta) \le t_1 < t_2$,

(16)  $L(\theta,\varphi_{G+t_2\theta}(X)) \le L(\theta,\varphi_{G+t_1\theta}(X))$ a.s. $P_\theta$.

Proof. If $G$ is degenerate at $\theta$, (16) is trivially true. We therefore exclude this case in the remainder of the proof. The transformation given in the proof of Proposition 4 will establish the result for general $t_1, t_2$ such that $-G(\theta) \le t_1 < t_2$ once the result is proved for $0 = t_1 < t_2 = t$. Let $\theta$ and $G$ be given and fix $t > 0$.
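For any $\varphi$

(17)  $R(G + t\theta,\varphi) = \int_{[f_G > 0]} \int_{\Theta} \{L(\alpha,\varphi(x)) f_\alpha(x) + t\,L(\theta,\varphi(x)) f_\theta(x)\}\,G(d\alpha)\,\mu(dx) + t \int_{[f_G = 0,\, f_\theta > 0]} L(\theta,\varphi(x)) f_\theta(x)\,\mu(dx).$

Let $P_\theta = \nu_1 + \nu_2$ where $\nu_1 \ll P_G$ and $\nu_2 \perp P_G$ so that $\nu_1$ has density $f_\theta [f_G > 0]$ and $\nu_2$ has density $f_\theta [f_G = 0]$. Since $\varphi_{G+t\theta}$ minimizes $R(G + t\theta,\varphi)$ over choices of $\varphi$ we see from (17) that

$L(\theta,\varphi_{G+t\theta}(x)) = \min_a L(\theta,a)$ a.s. $\nu_2$

so that

(18)  $L(\theta,\varphi_{G+t\theta}(X)) \le L(\theta,\varphi_G(X))$ a.s. $\nu_2$.

From (15) we have

(19)  $\int_{\Theta} \{L(\alpha,\varphi_{G+t\theta}(X)) f_\alpha(X) + t\,L(\theta,\varphi_{G+t\theta}(X)) f_\theta(X)\}\,G(d\alpha) \le \int_{\Theta} \{L(\alpha,\varphi_G(X)) f_\alpha(X) + t\,L(\theta,\varphi_G(X)) f_\theta(X)\}\,G(d\alpha)$ a.s. $P_{G+t\theta}$

and

(20)  $\int_{\Theta} L(\alpha,\varphi_{G+t\theta}(X)) f_\alpha(X)\,G(d\alpha) \ge \int_{\Theta} L(\alpha,\varphi_G(X)) f_\alpha(X)\,G(d\alpha)$ a.s. $P_G$

which together imply

(21)  $t\,L(\theta,\varphi_{G+t\theta}(X)) f_\theta(X) \le t\,L(\theta,\varphi_G(X)) f_\theta(X)$ a.s. $P_G$.

Since $t > 0$ and $f_\theta(X) > 0$ a.s. $\nu_1$ where $\nu_1 \ll P_G$, (21) implies

(22)  $L(\theta,\varphi_{G+t\theta}(X)) \le L(\theta,\varphi_G(X))$ a.s. $\nu_1$.

Combining (18) and (22) we have

(23)  $L(\theta,\varphi_{G+t\theta}(X)) \le L(\theta,\varphi_G(X))$ a.s. $P_\theta$.

We conclude this section with two propositions which will be used in the proof of Theorem 3 of Chapter II. The results are presented here because they apply to more general decision problems than the finite $\Theta$ problems investigated in detail in Chapter II.

The first proposition gives a necessary condition for a Bayes response $\varphi_{(\cdot)}$ to satisfy

(24)  $\lim_{t \downarrow 0} R(\theta,\varphi_{G+t\theta}) = R(\theta,\varphi_G)$

and is presented at the game theoretic level by Hannan (1957, p. 129). We repeat the result here in our notation.

Proposition 6. (Hannan). Let $\varphi_{(\cdot)}$ be a determination of the Bayes response. Then (24) implies that

(25)  $R(\theta,\varphi_G) \le R(\theta,\varphi)$ for all $\varphi$ which are Bayes wrt $G$.

Proof. Let $G$ be given and suppose $\varphi_{(\cdot)}$ is a determination of the Bayes response satisfying (24). Let $\varphi$ be Bayes wrt $G$. Then as in the paragraph following (8.12) of Hannan (1957),

$t\,R(\theta,\varphi_{G+t\theta}) + R(G) \le R(G + t\theta) \le R(G + t\theta,\varphi) = R(G) + t\,R(\theta,\varphi)$

so that

$R(\theta,\varphi_G) - R(\theta,\varphi_{G+t\theta}) \ge R(\theta,\varphi_G) - R(\theta,\varphi)$

if $t > 0$. By (24) we see that $R(\theta,\varphi_G) \le R(\theta,\varphi)$.

The next proposition is a slight variation of the results presented by Hannan (1957, p. 129) concerning sufficient conditions for (24) to obtain. We need the following

Definition.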
A Bayes response $\varphi_G$ is said to be $G$-dominant if

(26)  $R(\alpha,\varphi_G) \le R(\alpha,\varphi)$ a.s. $G$ for all $\varphi$ Bayes wrt $G$.

(Of course, $\varphi_G$ is $G$-dominant if and only if $R(\alpha,\varphi_G) = R(\alpha,\varphi)$ a.s. $G$ for all $\varphi$ Bayes wrt $G$.)

Proposition 7. (Hannan). Suppose the class of risk functions $\{R(\cdot,\varphi) \mid \varphi$ a decision rule$\}$ is sequentially compact under pointwise convergence on $\Theta$ and that $R(\theta,\varphi) \le B < \infty$ for all $\theta$ and $\varphi$. Let $G \in \mathcal{G}$.

(i) If $\varphi_{(\cdot)}$ is such that $\varphi_G$ is $G$-dominant, then (24) obtains for all $\theta$ such that $G(\theta) > 0$.

(ii) If $\varphi_{(\cdot)}$ is such that $\varphi_G$ is everywhere dominant then (24) obtains for all $\theta$.

Proof. (i) Suppose $\theta$ is such that $G(\theta) > 0$. Let $t_k \downarrow 0$ and by the sequential compactness assumption let $t_k$ be a subsequence and $\varphi$ a decision rule such that

(27)  $R(\alpha,\varphi) = \lim R(\alpha,$

0, $\varphi_{G+t\theta}$ we see that $R(G,\varphi) \le R(G)$. Hence, the weak limit $\varphi$ is Bayes wrt $G$. Since $\varphi_G$ is $G$-dominant and $G(\theta) > 0$ we must have

(28)  $R(\theta,\varphi) = R(\theta,\varphi_G).$

Since $\lim_{t \downarrow 0} R(\theta,\varphi_{G+t\theta})$ exists by the monotonicity in $t$ (Proposition 4) we see that (27) and (28) combine to yield (24).

(ii) The proof is carried by the same arguments used in the proof of (i).

CHAPTER II

SUFFICIENT CONDITIONS AND NECESSARY CONDITIONS FOR ASYMPTOTIC AVERAGE RISK STABILITY

1. Introduction.

In this chapter and in Chapter III we consider component statistical problems in which $\Theta$ is of finite cardinality $M \ge 2$. If $\Theta = \{1,\ldots,M\}$, each a priori probability measure $G$ can be identified with the $M$-dimensional vector $(G(1),\ldots,G(M))$, where $G(\theta)$ is the mass put on $\theta$ by $G$. $\mathcal{G}$ may therefore be identified with the $M-1$ dimensional simplex in $E^M$ (Euclidean $M$ space). That is,

$\mathcal{G} = \{y = (y_1,\ldots,y_M) \mid y_i \ge 0,\ 1 \le i \le M,\ \sum_{i=1}^{M} y_i = 1\}.$

We will equip $\mathcal{G}$ with the usual Euclidean distance, given by

$d(y,y') = \left[\sum_{i=1}^{M} (y_i - y_i')^2\right]^{1/2}.$

This chapter concerns itself with sufficient conditions for (A) and (A-) to obtain. The conditions fall into two categories. In Section 2, we discuss sufficient conditions involving the (joint) continuity of $R(\theta,\varphi_G)$ in $G$ as opposed to the continuity of $R(\theta,\varphi_G)$ along lines in $\mathcal{H}^+$ as given by conditions (H) and (H-) in Section 1.2.

In Section 3, we consider component problems in which the action space $\mathcal{A}$ is also finite. We study a pointwise version of (H) called $(H_0)$ and relate it to a condition (I) involving both the family of probability measures $\{P_\theta \mid \theta \in \Theta\}$ and the loss structure. Under certain assumptions these two conditions are equivalent and each is a sufficient condition for (A-) to obtain. The condition (I) is often simpler to verify than $(H_0)$.

In Section 4, we turn to some special component problems.
In two problems, one in which ® consists of two elements and<7 of any arbitrary (but finite) number of elements, and the other, the M X M classification prob- lem with 0-1 loss, we find a condition (Io) equivalent to (H0) but stated in terms of the distributions of the likelihood ratios f (X) fa(X) only. This condition is, in general, quite simple to verify. 6 Moreover, it is a necessary condition for (A-), and hence for (A), to obtain. In the third component problem examined in this section, one in which 9 consists of any arbitrary (but finite) number of elements and (7 consists of two elements, the structure of the problem plays a very important role. A condition on the likelihood ratios is given which is a necessary condition for (A-) to obtain. It is not, in general, equivalent to (H0) although it provides a simple means of verifying that (A-) does not obtain. l7 2. Continuity of R(G’NG) .13 G. The convergence conditions (H) and (H-) defined in Section + 1.2 concern the continuity of R(9,¢h) along lines in N'. The conditions we now study involve the (joint) continuity of R(9,¢b) in G. Consider (C) R(9,¢b) continuous in G E.£ (C-) R(9,¢b) continuous in G 6.3 3 G(9) > O . Theorem 1. If 9 is finite and (C) obtains for all 9, then (H) obtains; and, hence, by Preposition 1.1, (A) obtains. Proof. If R(9,qb) is continuous in G 6.9, then it is uniformly continuous in G E.&. Since qb+t9 = mb* where G: = (G + t9)/(l + t) E.& and t 2 * -1 2 d (Gt,G) = 2 (C(01) - (”0 C(01)) + d€®‘{9} -1 2 [G(9) - (1+t) (G(9)-+0] 2 2 2 = ——t—'2' 2 G2(or) + ___t__2_ [G(9) - 1] (1+t) a€®-{ 9} (1+0 5 2t2, * we see that d(Gt,G) « 0 as t l 0 and hence R(9,¢b+te) a R(9,¢h) uniformly in G as t l 0. Since 9 is finite the convergence is uniform in 9 and G. Theorem 2. If 9 is finite of cardinality M, the loss function L is bounded, say L(9,a) S B < m for all 9 E 9, * - - . a E a , and (C ) obtains for all 9, then (A ) obtains. 18 Proof. Let 0 < 6 S 1 be fixed and define the sets L. 
n {1‘1 5 i s n, 91 = oz. Gi_1(a) < 6). oz 6 ® 7: fl {i‘l S i S n, 91 = a, Gi_1(a) 2 6}, o 6 ® denote their respective cardinalities. (1) o s n: sf 2{ z “(91% 1) - R(ei. O is arbitrarily small, the convergence D: a O as n a m is proved. A study of condition (C) has implications beyond those re- lated to the asymptotic average risk stability (A). For example, in a decision problem with ¢(.) a Bayes response for which (C) is satisfied for each 9, if G is any estimate of the a priori distribution G which is independent of the data gathered in the given decision problem, then R(9,¢» G for each 9. Thus, under a bounded risk assumption, ) a;s. R(9.¢b) as G aLs. G R(G,qa) 845° R(G,¢b) = R(G) by the dominated convergence theorem; i.e., playing Bayes versus a consistent estimate of G is asymptotically Optimal. We now turn to the study of the variation (H0) of (H) and its characterization in terms of the loss structure in a finite action decision problem. 20 3. Condition (Ho); §_Characterization for M X N Decision Problems. Let 8 = {1,2,...,M} and aV= {1,2,...,N} where both M and N are finite so that the loss function L is given by an M X N matrix of non-negative numbers whose (9,a) element is L(9,a). We will show (Theorem 4) that if the following condition (H lim R(9,cp O) th o+ce) = R(9,¢b) for all c 9 6(9) > 0 holds for all 9 6 ® then (C-) also holds for all 9 and hence, via Theorem 2 , (A-) obtains. First we develOp a characterization of (H0) in terms of the structure of the M X N component decision problem (Theorem 3). Throughout this section and the next we assume: (4) The columns of (L(9,a)) are mutually nondominated; that is, no pair a,a' E a’ exists for which L(9,a) 2 L(9,a') for all 9 E 9. Equivalently, for every a,a' E a' there exists 9,9' 6 ® such that (5) [L(9,a) - L(9,a')][L(s'.a) - L(e'.a')] < 0- As in Chapter I let fe denote a version of dPe/du where u is a a-finite measure dominating each P Since 9 is finite, 9' u can be taken to be the finite measure ESPO. 
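The nondomination assumption (4) and its equivalent form (5) are easy to check mechanically for a given loss matrix. The following Python sketch implements both forms; the function names and the example matrices in the usage note are hypothetical and serve only as an illustration.

```python
from itertools import permutations


def nondominated_by_4(L):
    """Form (4): no ordered pair of distinct acts (a, b) has
    L[theta][a] >= L[theta][b] for every theta.

    L is a list of rows, one row per parameter value theta.
    """
    n_acts = len(L[0])
    return not any(
        all(row[a] >= row[b] for row in L)
        for a, b in permutations(range(n_acts), 2)
    )


def nondominated_by_5(L):
    """Form (5): for every pair of distinct acts the loss difference
    takes both a strictly negative and a strictly positive value."""
    n_acts = len(L[0])
    for a in range(n_acts):
        for b in range(a + 1, n_acts):
            diffs = [row[a] - row[b] for row in L]
            if not (min(diffs) < 0 < max(diffs)):
                return False
    return True
```

For instance, the 0-1 loss matrix `[[0, 1], [1, 0]]` passes both checks, while `[[0, 1, 2], [1, 0, 2]]` fails both because its third column is nowhere smaller than its first.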
Definition 1. For each a E d, H 6 21, x E I, we define Mms>=gumsmw$s>. We note that a Bayes reSponse versus G E.& is characterized a.e. u 21 l , if < (6) qb(x)(a) = arbitrary, if xa(G,x) = 2:23 xa,(G,x) O , if > with the proviso E ¢b(x)(a) = 1. We will find it convenient to let mb(a‘x) denote ¢b(x)(a), the probability assigned to act a by the probability measure qh(x) and henceforth we employ this notation. ‘ For the purpose of investigating (H ) for a particular 0 fixed 9 we can find a permutation of the acts a1,az,...,aN such that and define the equivalence relation a ~ a' iff L(9,a) = L(9,a') The permutation and the equivalence relation clearly depend on 9 but we will not use additional notation to display this dependence. With A1,...,Ar denoting the equivalence classes with the order on subscripts denoting the order on losses we have (7) L(9,A1) < L(9.A2) <...< L(9,Ar) and for any decision rule m r (8) R(e.cp) = jEl $(esAj)cp(Aj|x)fe(x)u(dx) where L(9.Aj) L(e.a) for 86A and cp(Aj\X) = 2 min (G,x) P j$i 3 Lemma 1. Let 9 and G such that G(9) > 0 be fixed. Then (11) P [p.(G,X) = min p,(G,X)] = O, i = l,...,r e 1 j#i 3 if and only if R(9,¢b) is unique across all determinations of (PG ' Proof. (Only if) For any decision rule m, R(9,qb is given by (8). Using (8), (10) and (11) we see that r R(Giwb) = L(e,Aj)Pe[pi < min pj(c,x>] j¢i i=1 for any determination of qt. (If). Suppose there exists an i for which P [p.(G,X) = min p.(G,X)] > O . e 1 j#i 3 Then there exists a j # i such that P9(Bij) > 0 where (12) Bij = [91(G’x) = pj(G.X) s :i: j pk(G.X>] - 23 Consider two Bayes reSponsea ¢C and qé such that 0 = ' 0 mb( \x) qh( ‘x), for x é Bij ¢b(Ai‘x) = l, é(Aj‘x) = l, for x E Bij Then R(e,ch) - R(e, 0. We now give a characterization of (H0) in Theorem 3. A determination of the Bayes response m(.) 
satisfies (HO) for all 9 6 ® if and only if (I) Pe[p1(G,X) # 31;: pj(G,X)] = o for all i = l,...,r and G 9 G(9) > 0 holds for all 9 E 9 in which case all determinations ¢(,) satisfy (HO) for all 9 6 9. Proof. (Only if). Supose is a determination which ‘P(-> satisfies (H0) for all 9 E 9. Let 9 and G such that G(9) > 0 be fixed. By PrOposition 1.6 R(a,¢b) S R(o,¢é) for all a such that G(a) > O and ¢é Bayes with respect to G. Hence, R(a,qb) = R(a,qé) for all a such that G(a) > 0. In particular R(9,¢b) = R(9:¢E)° The uniqueness of R(9,¢h) across determina- tions of the Bayes response ensures that (11) obtains (Lemma 1). Since 9 and G such that G(9) > O are arbitrary we see that (I) holds for all 9 E 9. 24 (If). Suppose that (I) obtains for all 9 G 9 and let ¢(.) be any determination of the Bayes response. Fix 9 and G such that G(9) > 0. Lemma 1 shows that for all a such that G(a) > O, R(a,mb) = R(o,qé) for all qé Bayes with reSpect to G. Hence, W(.) is G-dominant. For the M X N decision problem the risk set is compact. Hence, by Pr0position 1.7, lim R(9,m tl0 G+t9 are arbitrary we see that (H0) holds for all 9 6 ®. ) = R(9,qh). Since 9 and G such that G(9) > O The condition (H0) is a one-sided directional continuity condition for R(9,¢h) as a function H €_flfih Our next theorem shows it to be equivalent to the (joint) continuity condition (0-). Theorem 4. (HO) holds for all 9 E 9 if and only if (C-) holds for all 9 E 9. For the proof of Theorem 4 we will use Lemma 2. Let 9 E ® be fixed and let A1,...,A.r be the corresponding equivalence classes (cf. (7)). If Hn is a sequence of probability measures, i.e., [Hn}<:,&, such that Hn a G, then (13) pi(Hn,x) a 91(G’x) as n a m uniformly in x for all i = 1,2,...,r. Proof. By Definitions 2 and l we see that Ipi(Hn.X) - pi(G.X)1 5:2: \xaflinnc) - xa(G.X)I (14) ‘ Smax 2 L(a,a)\Hn(a) - G(a)|£ (x). 
aeAi o6® “ Since max L(a,a) = B < m and sup f (x) S l (by choice 0,8 09" of the dominating measure u = 2 Po) and since 9 and a' are a 25 finite, we see that the right hand side of (14) converges to 0 as n a m uniformly in x so that (13) obtains. Proof of Theorem 4. (If) Suppose (0-) holds for 9 and fix G, such that G(9) > 0. Note that where ‘PG-tte = ‘Pct Gt = (G+t9)/(l+t) and note that, Gt a G as t l 0. Hence, by (C ), R(9,qbt) « R(9,¢b) as t 1 O, i.e., (HO) holds at 9 and G. (Only if) Assume (H0) holds for all 9. Now fix 9 and G such that G(9) > 0. We will show that R(9,¢h ) « R(9,qh) as n H a G. By Theorem 3, (I) obtains and for any determination of fl tp(.), (8) and (10) yield r 1-1 j#1 For sufficiently large n, Hn(9) > 0 and we can write r E (16) R(9.:pG) - 11(9an ) = n i=1 L(e.Ai)[Pe(Bi) - PO(Cn,i)] where Bi = {x‘pi(G,x) < min p (G,x), f9(x) > O} ji‘i J Cm = {x‘pi(Hn,x) < 31;? pjiunsc). fem > 0} for i = l,...,r. For any sets B and c, |P(B) - P(C)| SP(BCC) + P(BCC). By Lemma 2, limn C“,i C {x‘pi(G,x) S min pj(G,x), fe(x) > O} ji‘i and since lim C , :>B,, '——1n n,1 1 c = . c lim Cn,’ (11m Cn,) CB Thus, 26 "‘.'"' C C — 11mn Pe(Bi n Cn,i) S Pe(Bi fl limn Cn i) Pe[pi(G,X) = min p (G,X)] = o jfii j and —'.'_ C "'.—" C 1mn Pe(Bi n on i) s P903i n 11mn on i) Pe(¢) = 0 and the right hand side of (16) converges to O as n a m. Since 9 and G such that G(9) > O are arbitrary we see that (0-) holds for all 9. Corollary 1. If (HO) obtains for all 9 E 9 then (A-) obtains. Egggf. The proof is carried by Theorem 4 together with Theorem 2. 27 4. A Likelihood Ratio Characterigation of Condition (1). As we have seen in the preceeding section, for all 9 (I) is equivalent to (H0) for all 9 and it provides an alternative sufficient condition for (A-) to obtain. In statistical decision problems condition (1) provides a means for verification of (H0). 
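For a component problem with finitely many observation values, condition (I) can be checked at a given prior by direct enumeration. The Python sketch below does this under a simplifying assumption: the losses within each row are distinct, so that every equivalence class is a single act and $\rho_i(G,x)$ reduces to $\lambda_i(G,x) = \sum_\alpha L(\alpha,i)\,G(\alpha) f_\alpha(x)$. The distributions, the loss matrix and the function names are hypothetical.

```python
from fractions import Fraction as F


def lam(a, G, x, f, L):
    """lambda_a(G, x) = sum over alpha of L(alpha, a) * G(alpha) * f_alpha(x)."""
    return sum(L[al][a] * G[al] * f[al][x] for al in range(len(G)))


def tie_probabilities(theta, G, f, L):
    """For each act i, P_theta[lambda_i(G, X) = min over j != i of lambda_j(G, X)].

    Condition (I) at (theta, G) asks that every entry be zero
    (singleton equivalence classes are assumed).
    """
    n_acts, n_x = len(L[0]), len(f[0])
    probs = []
    for i in range(n_acts):
        p = F(0)
        for x in range(n_x):
            others = min(lam(j, G, x, f, L) for j in range(n_acts) if j != i)
            if lam(i, G, x, f, L) == others:
                p += f[theta][x]
        probs.append(p)
    return probs


# Hypothetical 2 x 2 problem with 0-1 loss and X in {0, 1, 2}.
f = [[F(6, 10), F(3, 10), F(1, 10)],
     [F(1, 10), F(3, 10), F(6, 10)]]
L = [[0, 1], [1, 0]]
```

With the uniform prior `[F(1, 2), F(1, 2)]` the two posterior expected losses tie at `x = 1`, an event of $P_0$-probability 3/10, so (I) fails for this discrete example. This agrees with the likelihood ratio viewpoint of the next section, since a discrete likelihood ratio necessarily has atoms wherever the supports overlap.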
In this section we discuss three types of 'M.X N decision prob- lems and in two we develOp a simple characterization of (I) in terms of the continuity of the distributions of the likelihood ratios fa(X)/fe(X). Consider f (X) _Q___.= = u (10) Pe[fe(x) c] O for all c E (0,m) and for all a E 9, a f 9 . The three component problems we consider are (i) M = 2, N arbitrary (ii) M X M classification problem, 0-1 loss (iii) M arbitrary, N = 2. In (i) and (ii) we show that (I) holding for all 9 is equivalent to (IO) holding for all 9 and that (10) holding for all 9 is a necessary condition for (A-) and hence for (A). In (iii) we give a condition involving continuity of distributions of certain likelihood ratios which turns out to be necessary for (I) to hold for certain 9 and also necessary for (A-). Gilliland and Hannan (1969, Theorem 2) have shown that the existence of a determination of T(.) satisfying (H-) is equivalent to (10) holding for all 9 in a 2 X 2 (testing) decision problem. Our results concerning problem (i) subsume the Gilliland and Hannan 28 2 X 2 results. Problem (1). Let M = 2 and N be arbitrary (finite). Here 9 = {1,2} and aV= {1,2,...,N} where the acts are labelled so that L(l,l) < L(l,2) <...< L(1,N) (17) L(2,l) > L(2,2) >...> L(2,N) This is possible since the columns of L are mutually nondominated (cf. (4) ). When M = 2, the condition (IO) holds for a 9 E 9 if and only if it holds for the other parameter in 9 since if c > O, -l Per(X)/fa(X) = C 1 = c Peffa(X)/fe(X) = c]. Definition 1 yields 1a(G.x) = L(1.a)c(1)f1(x) + L(2,a)G(2)f2(x) and for each 9, the equivalence relation induced on acts is equality; in fact, for 9 = l, Aj = {j} and for 9 = 2, A) = {N-j}. We have pj(G,x) = xj(G,x) for 9 = 1 pj(G,x) = xN_j(G,x) for 9 = 2 . Condition (1) states P“ l Pe[pi(G,X) = min p (G,X)] = 0 for all 3*i j - 1,2,...,N and GBG(9)>0 where it is understood that the within the square brackets pi correspond to 9 in the operator P9. 
For both 9 - 1,2, (I) is seen to be equivalent to 0.8) P9[>.1(G,X) = min 1 j(G,X)] = O, for all i = l,...,N and jfii G 3 G(9) > O 29 that is, (19) Pe[L(l,i)G(l)f1(X) + L(2,i)G(2)f2(X) - "in (L(1.j>c<1)£1(x> + L(2.j)G(2)f2(X)}] = 0, j i for all i = l,...,N and G 3 G(9) > 0. Theorem 5. (1) holds for 9 = 1,2 if and only if (10) holds for 9 = 1,2. nggf. (If) Suppose (Io) holds for 9 = 1,2. Let G E.& be such that G(l) > 0. If G(l) = 1 then (1) holds trivially for 9 = 1. Thus, suppose G(l) < 1. For every pair of acts a,a' ’ a f a', C = GQJLLQLaI - LAZSa'll E (0 m) G(1)[L(1,a') - L(1.a)] ’ by (17). By (I0) for 9 = l, we have P1[1a(G.X) = ia.(G,X)] = 0 . Hence, (18) holds for 9 = l. The treatment of 9 = 2 is similar, and we see that (I) holds for 9 = 1,2. (Only if) Suppose (I) holds for 9 = 1,2. If (I0) fails to hold for both 9 = 1,2 then there exists a c E (0,m) such that f (X) l (20) PIE-£7275- = C] > 0 . * Define the function L : a X a... (O,oo) by (21) L*(i,i> - L(Zi‘) ' L(Z’j) — L(l j) _ L(l,i) , for all 1 s i, j s N . 30 Let aO < N be any act for which * * L (a0,N) = min L (i,N) i 0 9 a a , a l k a #a1, ,a k for 9 = 1,2 and some set 2k = {a1,...,ak}, k.< l. Wlog, 1 = a1 < a2 <...< ak. If not, since for all x in the set f 1(x) _§$_l f2 10,) =*G(1)L (a. .aj), for all 1Si O. --n.n 31 We shall consider two distinct cases: i) L*(i,j), as defined in (21), is constant for all 1 S i (j S N; ii) L*(i,j) is not constant for all i,j. We shall consider the 2 X 2 problem as a special subcase of case i). Case i. L*(i,j) constant for all i,j. The set in (22) must be the set (11mm) =...= was} = (“1(6) Define A (G) = {A (G,x) < min h (G,x)}, j = l,...,N. Set 3 j kfj k g = P1{AN+1(G)} and y = P1{(1-¢b(l‘X))AN+1(G)}. Necessarily, OSySQ. With G(2) = n, we construct a parameter sequence 9' as follows: 91 arbitrary, 2 Gi_1(2) < n I O 2 614(2) =‘n. v - 1 G,_1(2) = fl . Y 1 ei_1(2) > n (23) 9 V c> For 9 = 1,2 define 0 .. oe — 2‘: L(e.j)Pe{AJ(G)] " 0 + O 09 = oe + L(e.N)P9{.¢N+1(c>} . 
Then for all $i$ such that

(24)
(i) $0 < G_{i-1}(2) < \eta$: $R(2,\varphi_{G_{i-1}}) \ge \sigma_2^+$
(ii) $\eta < G_{i-1}(2) < 1$: $R(1,\varphi_{G_{i-1}}) \ge \sigma_1^+$
(iii) $G_{i-1}(2) = \eta$: $R(2,\varphi_{G_{i-1}}) \ge \sigma_2^- + \frac{1-\eta}{\eta}[L(1,N) - L(1,1)]\xi$ if $\gamma = 0$, and $R(1,\varphi_{G_{i-1}}) \ge \sigma_1^- + [L(1,N) - L(1,1)]\gamma$ if $\gamma > 0$.

Consider the decomposition

(25) $\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) = \sum_{\theta_i = 2} R(2,\varphi_{G_{i-1}}) + \sum_{\theta_i = 1} R(1,\varphi_{G_{i-1}})$.

By the construction (23) there are $nG_n(2)$ terms in the first sum on the right hand side of (25) and $|G_n(2) - \eta| < n^{-1}$. Then (24) together with (25) yields

(26) $\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) \ge [n\eta - 2]\sigma_2^+ + [n(1-\eta) - 2][\sigma_1^- + \delta_1]$ if $\gamma > 0$

and

(26') $\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) \ge [n\eta - 2][\sigma_2^- + \frac{1-\eta}{\eta}\delta_2] + [n(1-\eta) - 2]\sigma_1^+$ if $\gamma = 0$

where

$\delta_1 = [L(1,N) - L(1,1)]\gamma$ if $\gamma > 0$,
$\delta_2 = [L(1,N) - L(1,1)]\xi$ if $\gamma = 0$.

Since $R(G) = \eta\sigma_2^+ + (1-\eta)\sigma_1^- = \eta\sigma_2^- + (1-\eta)\sigma_1^+$, (26) and (26') yield, respectively,

(27) $\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) \ge nR(G) - 2[\sigma_2^+ + \sigma_1^- + \delta_1] + n(1-\eta)\delta_1$ if $\gamma > 0$

and

(27') $\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) \ge nR(G) - 2[\sigma_2^- + \sigma_1^+ + \frac{1-\eta}{\eta}\delta_2] + n(1-\eta)\delta_2$ if $\gamma = 0$.

By construction $G_n \to G$ so that $R(G_n) \to R(G)$ and

$\liminf_n D_n^* \ge (1-\eta)\delta > 0$, where $\delta = \delta_1$ if $\gamma > 0$ and $\delta = \delta_2$ if $\gamma = 0$,

at the parameter sequence $\underline{\theta}$ defined by (23). This contradicts the assumption that (A-) is true, and therefore (I) must hold at $\theta = 1,2$ and the proof is complete for case i.

Case ii. $L^*(i,j)$ is not constant for every pair $(i,j)$. It is necessary to make a few preliminary remarks concerning $L^*$.

Remark 1. For every triple $(i,j,k) \in \mathcal{A} \times \mathcal{A} \times \mathcal{A}$, $i < j < k$, one and only one of the following conditions is satisfied:

A: $L^*(i,j) = L^*(i,k) = L^*(j,k)$
B: $L^*(i,j) > L^*(i,k) > L^*(j,k)$
C: $L^*(i,j) < L^*(i,k) < L^*(j,k)$

Remark 2. For every triple $(i,j,k)$, $i < j < k$, satisfying (C), $\lambda_j(G,x) > \min_{k \ne j} \lambda_k(G,x)$ for all $G \ni 0 < G(2) < 1$ and for all $x \ni f_1(x) > 0$ or $f_2(x) > 0$.

Set $\mathcal{A}' = \{a \in \mathcal{A} \mid (i,a,k)$ satisfies (A) or (B), for all $i < a < k\}$. Since $\mathcal{A}_k \subset \mathcal{A}'$, by virtue of Remark 2, we may restrict attention to $\mathcal{A}'$; that is, we may suppose that $\mathcal{A} = \mathcal{A}'$ and that therefore, every triple $(i,j,k)$ satisfies (A) or (B).
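The trichotomy of Remark 1 can be read off from (21): for $i < j < k$ the numerator and denominator of $L^*(i,k)$ are the sums of those of $L^*(i,j)$ and $L^*(j,k)$, and all denominators are positive under (17), so $L^*(i,k)$ is a mediant and lies between the other two values. The sketch below checks this classification exactly for two loss matrices. The matrices `L_convex` and `L_kinked` and the helper names are hypothetical illustrations chosen to satisfy (17); they do not come from the thesis.

```python
from fractions import Fraction as F
from itertools import combinations

def l_star(L, i, j):
    """L*(i, j) as in (21) for acts i < j (1-based); L[0] is the row theta=1, L[1] the row theta=2."""
    return F(L[1][i - 1] - L[1][j - 1], L[0][j - 1] - L[0][i - 1])

def classify(L, i, j, k):
    """Label the triple i < j < k by the case of Remark 1 that it satisfies."""
    a, b, c = l_star(L, i, j), l_star(L, i, k), l_star(L, j, k)
    if a == b == c:
        return "A"
    if a > b > c:
        return "B"
    if a < b < c:
        return "C"
    return None  # not expected when the rows satisfy (17)

def triples(L):
    """Classify every triple of acts for the 2 x N loss matrix L."""
    n = len(L[0])
    return {t: classify(L, *t) for t in combinations(range(1, n + 1), 3)}

# Hypothetical loss matrices: first row increasing, second row decreasing, as in (17).
L_convex = [[0, 1, 2, 4], [6, 3, 2, 0]]
L_kinked = [[0, 1, 2], [6, 5, 0]]

print(triples(L_convex))  # {(1, 2, 3): 'B', (1, 2, 4): 'B', (1, 3, 4): 'B', (2, 3, 4): 'A'}
print(triples(L_kinked))  # {(1, 2, 3): 'C'}
```

In the second matrix the middle act falls under case (C), which is the situation that Remark 2 removes from consideration by passing to $\mathcal{A}'$.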
We define a sequence of actions {aj.‘ i = 0’°°°’kO+l} as follows: r a = 1 jO if (1,2,3) satisfies B aj = max{j‘ 2 < j S N and (1,2,j) satisfies A} 1 if (1,2,3) satisfies A f(ji-l’Ji-I+1’ ji-I+2) satisfies B (28)}81 = mx;j\jb11k " 35 where KL(G,X) = la (G,X), L = j09"°9j'k +1" 6 0 We shall continue to use the sets Aj (G) as defined pre- viously, noting that there, )‘j (G,x) means xa(G,x) with a = j E 4. Thus, for any prior G with 0 < G(2) < l, N R(e,sG) = j§1L(e.j>Pe{Aj} ko+1 jk +- 2 z L(e,a )P {o (a \x) (6)} . k=1 j=jk-1 j 9 G j Ah+k » + The sets Aj(G), 1 S] SN and AN+k(G), l S k S k0 l, are pairwise disjoint; moreover, for all 1 S j S N, (X) *. 1 G(2) AJ.(G) ._. {G(1_Q))_L (3,19 < £200 (6(1) L* (i ,3), for :11 i< j < k} and for all 1 S k S k0+l f(X> = Q2). * =9—(—>-L HELL AN+k(G) {6(1) L (ai’it') < f2(X> c(1) M‘ ’1) < 6(1) fl( ae’ 1k 1) for all (,0,y >0 1 ci_1(2) =11 , “>0, y0=0 L1 Gi-l(2) > 'n For 9 = 1,2 define a; as before; ko+l 0 0' + 2‘. L(Gfi )P (G) 9 k=1 jk-l BM‘H‘ } 0 ll k'+ 0 1 ° + P . 9 09 kgl L(6,8jk) e{AN+k(G)} 0 ll Then, for all i such that ’ 1) o < ci_1<2> < n, R(Z’CPGH) 2 a; 11) n < GHQ) < 1, R(1.cpci-1) 2 0: iii) ci_1(2) = n, (30) a Masai-1) 2 01+ 11311171413) - L(1,1)]g1, if “'1 = o R(2,ngi-1) 2 a: + 1%? I'L(1,N) - L(1,N-l)]y0, if yl > 0, Yo > o L R(1,chi-1) 2 a; + [L(1,2) - L(l,l)]Y1, if Y1 > 0, Y0 = O . 37 As in case i), we have the decomposition (25) with the first sum on the right hand side of (25) containing nGn(2) terms and \Gn(2) - ’n‘ < n-l. Then (30) together with (25) yields (31) 2‘; R(Giwpc. 1) 2 [rm - 210; + [nu-m - 21w; + 61] 1- if Y1 > 0, Y0 = O with 31 = [L(l,2) - L(l,l)]y1 and (31') 2: 11%,.pr ) 2 En“ - 210: + £3.71} 82] +0104» " Ho: i-l if y1=0 or y1>0,y0>0 with [L(l,2) - L(l,l)]g1 if y1 = 0 [L(1,N) - L(l,N-l)]yo if Y1 > 0, yo > o . As before, R(G) = “0;: + (l-mo; = Tb: + (1490:, and (31) and (31') yield, reapectively, (32.) 2‘11 R(ei,chi-1) 2 nR(G) - 2K0; + °1-+ 51] + n(1-'n)e1 if Y1 >'O, yo = O and )>_ nR(G) - 2[o:+ o++ 1:? 
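32] , n (32) 21 R(919(PG1- 1 1 if $\gamma_1 = 0$ or $\gamma_1 > 0$, $\gamma_0 > 0$. By construction $G_n \to G$ so that $R(G_n) \to R(G)$ and

$\liminf_n D_n^* \ge (1-\eta)\delta > 0$, where $\delta = \delta_1$ if $\gamma_1 > 0$, $\gamma_0 = 0$ and $\delta = \delta_2$ if $\gamma_1 = 0$ or $\gamma_1 > 0$, $\gamma_0 > 0$,

at the parameter sequence $\underline{\theta}$ defined by (29). This contradicts the hypothesis that (A-) is true and therefore (I) must hold at $\theta = 1,2$ and the proof is complete.

Problem (ii). $M \times M$ classification problem, 0-1 loss. Let $\Theta = \mathcal{A} = \{1,\dots,M\}$, with loss function given by $L(\theta,a) = 1$ if $\theta \ne a$ and $L(\theta,a) = 0$ if $\theta = a$. Condition (I) at $\theta$ becomes

$0 = P_\theta[G(\theta)f_\theta(X) = \max_{\alpha \ne \theta} G(\alpha)f_\alpha(X)]$ for all $G$ s.t. $G(\theta) > 0$.

Theorem 7. (I) holds for all $\theta$ iff ($I_0$) holds for all $\theta$.

Proof. Suppose ($I_0$) holds at $\theta$. Then for any $\alpha \ne \theta$ and any $G$ such that $0 < G(\theta), G(\alpha) < 1$, set $c = G(\theta)/G(\alpha)$. Then $c \in (0,\infty)$ and $P_\theta[f_\alpha(X)/f_\theta(X) = c] = 0$ entails $P_\theta[G(\theta)f_\theta(X) = G(\alpha)f_\alpha(X)] = 0$. Moreover, for any $G$ such that $G(\theta) = 1$, or for any $x$ such that $f_\theta(x) > 0$, $f_\alpha(x) = 0$, $G(\theta)f_\theta(x) > G(\alpha)f_\alpha(x)$. Therefore, since $\alpha$ was arbitrary, (I) holds at $\theta$.

Conversely, fix $\theta$ and suppose there exists $c \in (0,\infty)$ for which $P_\theta[f_\alpha(X)/f_\theta(X) = c] > 0$ for some $\alpha \ne \theta$. Then there exists a (unique) $G$ such that $0 < G(\theta) < 1$, $G(\alpha) = 1 - G(\theta)$, and $c = G(\theta)/G(\alpha)$. Moreover,

$0 < P_\theta[f_\alpha(X)/f_\theta(X) = c] = P_\theta[G(\theta)f_\theta(X) = G(\alpha)f_\alpha(X) = \max_\omega G(\omega)f_\omega(X)]$

and (I) does not hold at $\theta$. Since $\theta$ was arbitrary, the proof is complete.

Theorem 8. If (A-) holds, (I) and therefore ($I_0$) holds for every $\theta$.

Proof. Suppose ($I_0$) does not hold at $\theta$; that is, there exists $\alpha \ne \theta$ and $G$ such that $G(\theta) > 0$ and

(33) $P_\theta[G(\theta)f_\theta(X) = G(\alpha)f_\alpha(X)] > 0$.

Wlog, we may assume $G(\theta) + G(\alpha) = 1$, for if not, we may replace $G$ by $G'$ where $G'(\omega) = G(\omega)/[G(\theta) + G(\alpha)]$ for $\omega = \alpha, \theta$. Then

$\lambda_\theta(G,x) = G(\alpha)f_\alpha(x)$
$\lambda_\alpha(G,x) = G(\theta)f_\theta(x)$
$\lambda_\omega(G,x) = G(\theta)f_\theta(x) + G(\alpha)f_\alpha(x)$ for every $\omega \ne \theta, \alpha$

and (33) becomes

(33') $P_\theta[\lambda_\theta(G,X) = \lambda_\alpha(G,X)] > 0$.

Set g = Pati9(c.x> = xa o

For any $G' \in \mathcal{G}$ with $G'(\alpha) = 1 - G'(\theta)$, $0 < G'(\theta) < 1$,

$R(\theta,\varphi_{G'}) = P_\theta[G'(\alpha)f_\alpha(X) > G'(\theta)f_\theta(X)] + P_\theta\{\varphi_{G'}(\alpha|X)[G'(\alpha)f_\alpha(X) = G'(\theta)f_\theta(X)]\}$,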
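with a similar expression for $R(\alpha,\varphi_{G'})$. Therefore, by the construction (34),

if $\theta_i = \theta$, then $G_{i-1}(\theta) \le \eta$ and $R(\theta,\varphi_{G_{i-1}}) \ge \sigma_\theta^+$;
if $\gamma = 0$, $\theta_i = \alpha$, then $G_{i-1}(\theta) > \eta$ and $R(\alpha,\varphi_{G_{i-1}}) \ge \sigma_\alpha^+ = \sigma_\alpha^- + \xi$;
if $\gamma > 0$, $\theta_i = \alpha$, then $G_{i-1}(\theta) \ge \eta$ and $R(\alpha,\varphi_{G_{i-1}}) \ge R(\alpha,\varphi_G) = \sigma_\alpha^- + \gamma$.

As in the proof of Theorem 6, we have the decomposition given by (25) with $\theta$ replacing 2 and $\alpha$ replacing 1 in the right hand side of (25), with $nG_n(\theta)$ terms appearing in the first sum on the right hand side and $|G_n(\theta) - \eta| < n^{-1}$. We therefore have

$\sum_{i=1}^{n} R(\theta_i,\varphi_{G_{i-1}}) \ge [n\eta - 2]\sigma_\theta^+ + [n(1-\eta) - 2](\sigma_\alpha^- + \delta)$

where $\delta = \xi$ if $\gamma = 0$ and $\delta = \gamma$ if $\gamma > 0$. Since $R(G) = \eta\sigma_\theta^+ + (1-\eta)\sigma_\alpha^-$ and $G_n \to G$ by construction (34),

$\liminf_n D_n^* \ge (1-\eta)\delta > 0$

at the $\underline{\theta}$ given by (34), which violates the hypothesis that (A-) is true.

Problem (iii). $M$ arbitrary, $N = 2$. In the case $N = 2$, condition (I) becomes

$P_\theta[\lambda_1(G,X) = \lambda_2(G,X)] = 0$ for all $G$ s.t. $G(\theta) > 0$.

We shall assume that for all $\theta \in \Theta$, either $L(\theta,1) < L(\theta,2)$ or $L(\theta,1) > L(\theta,2)$. Set $\Theta_1 = \{\theta \in \Theta \mid L(\theta,1) < L(\theta,2)\}$ and $\Theta_2 = \Theta - \Theta_1 = \{\theta \in \Theta \mid L(\theta,1) > L(\theta,2)\}$. The following result is not quite as strong as that given in Theorems 5 or 7 for $M \ge 3$.

Theorem 9. If (I) holds for every $\theta$, then the following condition holds for every $\theta$:

(16) $P_\theta[f_\alpha(X)/f_\theta(X) = c] = 0$ for all $c \in (0,\infty)$, $X \sim P_\theta$, and for all $\alpha$ s.t. $[L(\alpha,1) - L(\alpha,2)][L(\theta,1) - L(\theta,2)] < 0$.

Proof. Suppose (I) holds at $\theta \in \Theta_1$. Then for every $\alpha \in \Theta_2$ and every $c \in (0,\infty)$, there exists a (unique) $G \in \mathcal{G}$, $0 < G(\theta) < 1$, $G(\alpha) = 1 - G(\theta)$ such that

(35) $c = \dfrac{G(\theta)}{G(\alpha)} \cdot \dfrac{L(\theta,2) - L(\theta,1)}{L(\alpha,1) - L(\alpha,2)}$.

Then $P_\theta[f_\alpha(X)/f_\theta(X) = c] = P_\theta[\lambda_1(G,X) = \lambda_2(G,X)] = 0$ by (I). Since $\alpha$ and $c$ were arbitrary, (16) holds at $\theta$. A similar proof yields the same result if $\theta \in \Theta_2$.

Theorem 10. If (A-) holds, then (16) holds for every $\theta$.

Proof. Suppose there exists $\theta \in \Theta_1$ for which (16) does not hold; that is, there exists $\alpha \in \Theta_2$ and $c \in (0,\infty)$ for which $P_\theta[f_\alpha(X)/f_\theta(X) = c] > 0$.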
Fix $\theta, \alpha, c$ and consider $G$ satisfying (35), $0 < G(\theta) < 1$, $G(\alpha) = 1 - G(\theta)$. Define $\xi, \gamma$ as in the proof of Theorem 8. We may then wlog suppose that $M = 2$ with $\Theta = \{\alpha,\theta\}$, and proceed as in the proof of Theorem 6, the case $L^*$ constant, by identifying $\alpha$ with 1, $\theta$ with 2, and $\gamma$ with the $\gamma$ defined there.

We conclude this section with some examples to show that (I) holding for all $\theta$ and ($I_0$) holding for all $\theta$ need not be related.

Example 1. (I) holds for every $\theta$ but ($I_0$) does not, although (16) does. Let $\Theta = \{1,2,3\}$, $\mathcal{A} = \{1,2\}$ with $\Theta_1 = \{1\}$, $\Theta_2 = \{2,3\}$. Let $P_1, P_2, P_3$ be uniform on $(0,1]$, $(1,2]$ and $(1,3]$, respectively, and let $\mu$ be Lebesgue measure. Since $P_1$ and $P_\alpha$, $\alpha \in \Theta_2$, have disjoint supports, (I) and (16) are trivially satisfied for each $\theta$. But

$P_2[f_3(X)/f_2(X) = 1/2] = 1$ and $P_3[f_2(X)/f_3(X) = 2] = 1/2$

and ($I_0$) is violated at $\theta = 2$ and $\theta = 3$.

Example 2. (I) and ($I_0$) both hold for every $\theta$. Take $\Theta$, $\mathcal{A}$, $\Theta_1$ and $\Theta_2$ as in example 1, with $P_1$, $P_2$, $\mu$ also the same as in example 1, but replace $P_3$ with $P_3'$, uniform on $(2,3]$. Then (I) and ($I_0$) are trivially satisfied for every $\theta$.

Example 3. ($I_0$) holds for every $\theta$ but (I) does not. Again take $\Theta$, $\mathcal{A}$, $\Theta_1$ and $\Theta_2$ as in example 1. Let $P_1, P_2, P_3$ be triangular distributions on $(0,6)$, $(1,5)$ and $(2,4)$ respectively, with $\mu$ = Lebesgue measure. Let the loss matrix be

$(L(\theta,a)) = \begin{pmatrix} 0 & 9 \\ 8 & 0 \\ 0 & 1 \end{pmatrix}$.

For every $x \in (2,4)$, each likelihood ratio $f_\alpha(x)/f_\theta(x)$ is the ratio of two first order polynomials and therefore, given any $c \in (0,\infty)$, there exist unique solutions $x_c, x_c' \in (2,4)$ for which

$f_\alpha(x_c)/f_\theta(x_c) = c$, $\quad f_\alpha(x_c')/f_\theta(x_c') = c$

and hence $P_\theta[f_\alpha(X)/f_\theta(X) = c,\ 2 < X < 4] = 0$ for all $c \in (0,\infty)$, and for all $\alpha, \theta \in \Theta$.

For every $x \in (1,2] \cup [4,5)$, $f_3(x) = 0$ and, $f_1(x)/f_2(x)$ being the ratio of two first order polynomials, $f_1(x)/f_2(x) = c$ and $f_2(x)/f_1(x) = c$ have unique solutions for every $c \in (0,\infty)$. Therefore $P_\theta[f_\alpha(X)/f_\theta(X) = c,\ 1 < X \le 2$ or $4 \le X < 5] = 0$ for all $c \in (0,\infty)$, and for all $\alpha, \theta \in \Theta$.
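For every $x \in (0,1] \cup [5,6)$, both $f_2(x)$ and $f_3(x)$ are zero and $P_1[f_\alpha(X)/f_1(X) = c,\ 0 < X \le 1$ or $5 \le X < 6] = 0$ for all $c \in (0,\infty)$, $\alpha = 2,3$. Thus ($I_0$) is satisfied for every $\theta$.

However, for the given loss matrix, (I) does not hold at any $\theta$. For example, consider the uniform probability measure $G = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ and the set

$A = \{x \mid \lambda_1(G,x) = \lambda_2(G,x),\ 2 < x \le 3\}$.

On $(2,3]$ the densities are $f_1(x) = x/9$, $f_2(x) = (x-1)/4$ and $f_3(x) = x - 2$, so that $\lambda_1(G,x) = \lambda_2(G,x) = \frac{2}{3}(x-1)$ for every $x \in (2,3]$. Hence $A = (2,3]$ and $P_\theta(A) > 0$ for every $\theta \in \Theta$. In fact, for any loss matrix in which $L(1,1) = L(2,2) = L(3,1) = 0$, $L(2,1) = \frac{8}{9}L(1,2) = 8L(3,2)$, condition (I) will be violated at every $\theta$.

The results presented in Section 3 and in this section give a complete characterization of (A-) in terms of an easily verified condition on likelihood ratios ($I_0$) for certain component decision problems.

Theorem 11. Suppose the component decision problem is either of Type (i) or (ii). Then (A-) obtains if and only if ($I_0$) holds for all $\theta \in \Theta$.

Proof. For each of these decision problems, Corollary 1 shows that ($H_0$) for all $\theta$ $\Rightarrow$ (A-) and Theorem 3 shows that ($H_0$) for all $\theta$ $\Leftrightarrow$ (I) for all $\theta$. But Theorems 5 and 7 show that (I) for all $\theta$ $\Leftrightarrow$ ($I_0$) for all $\theta$ and Theorems 6 and 8 show that (A-) $\Rightarrow$ ($I_0$) for all $\theta$ for the respective problems. These results carry the proof of Theorem 11.

CHAPTER III

SMOOTHNESS OF THE BAYES ENVELOPE AND ITS RELATION TO (C)

1. Introduction.

Throughout this chapter as in Chapter II, $\Theta = \{1,2,\dots,M\}$ where $2 \le M < \infty$ so that $\mathcal{G}$ is the $(M-1)$-dimensional simplex

$\mathcal{G} = \{G \mid G(\theta) \ge 0,\ \theta = 1,\dots,M,\ \sum_\theta G(\theta) = 1\}$.

With $\underline{G} = (G(1),\dots,G(M-1))$ and $\underline{\mathcal{G}} = \{\underline{G} \mid G \in \mathcal{G}\}$, the mapping $G \leftrightarrow \underline{G}$ is a one-to-one correspondence between $\mathcal{G}$ and $\underline{\mathcal{G}}$. Let $Q = \{1,2,\dots,M-1\}$ and for $\theta \in Q$, $\underline{G} \in \underline{\mathcal{G}}$ and $t$ such that $-G(\theta) \vee (G(M) - 1) \le t \le G(M) \wedge (1 - G(\theta))$ note that

(1) $\underline{G} + t\theta \leftrightarrow G + t\theta - tM$.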
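The correspondence in (1) and the admissible range of $t$ can be made concrete with a small computation. The following is a minimal sketch, assuming a prior is stored as a list of $M$ exact rationals and reading $\theta$ and $M$ in (1) as unit point masses; the function names `t_range` and `perturb` and the numerical prior are hypothetical and serve only as illustration.

```python
from fractions import Fraction as F

def t_range(G, theta):
    """Closed range of t for which G + t*theta - t*M stays in the simplex (theta is 1-based, theta < M)."""
    lo = max(-G[theta - 1], G[-1] - 1)
    hi = min(G[-1], 1 - G[theta - 1])
    return lo, hi

def perturb(G, theta, t):
    """Move mass t from the last coordinate M to coordinate theta, as on the right side of (1)."""
    H = list(G)
    H[theta - 1] += t
    H[-1] -= t
    return H

# Hypothetical prior on {1, 2, 3}; the reduced vector of the first M-1 coordinates is (1/2, 1/4).
G = [F(1, 2), F(1, 4), F(1, 4)]
lo, hi = t_range(G, 1)
print(lo, hi)               # prints: -1/2 1/4
print(perturb(G, 1, hi))    # mass of coordinate 3 is exhausted
print(perturb(G, 1, lo))    # mass of coordinate 1 is exhausted
```

At either endpoint of the range one coordinate of the perturbed prior vanishes, so the endpoints correspond to boundary points of the simplex and the open range to the segment along which one-sided and two-sided partial derivatives are taken.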
In this chapter we define the Bayes envelope $R$ as a function of $\underline{G}$, that is,

(2) $R(\underline{G}) = \inf_\varphi R(G,\varphi)$.

For later purposes it is useful to note the representation

(3) $R(\underline{G}) = \sum_{\theta=1}^{M-1} G(\theta)R(\theta - M,\varphi_G) + R(M,\varphi_G)$.

Use will also be made of the fact that $R$ is a continuous and concave function when defined on the simplex $\mathcal{G}$ in Euclidean $M$-space (for example, see Wijsman (1970, Theorem 2)) and that these properties are inherited by $R$ as defined by (2) as a function of $\underline{G} \in \underline{\mathcal{G}}$.

Gilliland and Hannan (1969, Theorem 1) have proved the equivalence of the existence of partials $\partial R(H)/\partial H(\theta)$ for $H \in \mathcal{H}^+$ ($R(H)$ is defined for $H \in \mathcal{H}^+$ by $R(H) = |H|\,R(H/|H|)$ if $H \ne 0$ and $R(H) = 0$ if $H = 0$) with a variant of continuity condition (H). It is our purpose to relate the smoothness of $R(\cdot)$ as a function of $\underline{G}$ with the continuity of $R(\theta,\varphi_G)$ in $G$. Samuel (1963, Lemma 1) (1966, Lemma 1) has attempted to establish an implication of this type but as we show the attempts fall short. In fact both Lemmas as stated are false as shown in Section 3.

For notational convenience in this chapter if $f$ is a real-valued function defined on $\mathbb{R}^k$, $f_i$ will denote the $i$-th partial derivative, $i = 1,\dots,k$.

2. Some Mathematical Preliminaries.

We will need the following results concerning concave (convex) functions.

Theorem 1. Let $f$ be a real-valued concave (or convex) function defined on an open convex subset $C$ of Euclidean $k$-space.
(i) If the partials $f_i(x)$ exist for all $x \in C$ and $i = 1,\dots,k$ then $f$ is differentiable on $C$.
(ii) If a partial $f_i(x)$ exists for all $x \in C$ then $f_i$ is continuous on $C$, $i = 1,\dots,k$.

Proof. (i) See Rockafellar (1970, Th. 25.2). (ii) See Rockafellar (1970, Th. 25.4).

We will now prove a lemma which will be used in a slight extension of the $k = 1$ case of Theorem 1 (ii). The lemma extends the result of Goffman (1961, Th. 9.7.1) to include a boundary case.

Lemma 1. Suppose $f$ is a real-valued function of a real variable which is defined in $[a,b]$ where $a < b$.
Suppose that $f'$ exists, and is finite valued on $(a,b)$ and that the limits

(4) $\lim_{t \downarrow 0} \dfrac{f(a+t) - f(a)}{t} \equiv f'(a)$, $\quad \lim_{t \uparrow 0} \dfrac{f(b+t) - f(b)}{t} \equiv f'(b)$

exist (possibly infinite). If $[\alpha,\beta] \subset [a,b]$ and $c$ is any extended real number between $f'(\alpha)$ and $f'(\beta)$ inclusive, then there exists a $\xi \in [\alpha,\beta]$ such that $f'(\xi) = c$.

Proof. If $c = f'(\alpha)$ or $f'(\beta)$ take $\xi = \alpha$ or $\beta$. Otherwise, $\alpha < \beta$ and $c$ is real and strictly between $f'(\alpha)$ and $f'(\beta)$. Consider the case $f'(\alpha) > f'(\beta)$ and let $F(x) = f(x) - cx$ so that $F'(\alpha) > 0$ and $F'(\beta) < 0$. Hence, there exist $\alpha < \xi_1 < \xi_2 < \beta$ $\ni$ $F(\xi_1) > F(\alpha)$, $F(\xi_2) > F(\beta)$. Thus, $F$ is maximized on the interior of $[\alpha,\beta]$, say at $\xi$. Then $F'(\xi) = 0$ so that $f'(\xi) = c$. If $f'(\alpha) < f'(\beta)$, then $F'(\alpha) < 0$, $F'(\beta) > 0$, so the same ideas yield a minimizer of $F(x)$ on $(\alpha,\beta)$.

Theorem 2. Let $f$ be a concave (or convex) function of a real variable defined on an interval $[a,b]$ where $a < b$ and suppose $f'$ exists and is finite on $(a,b)$. Then (i) $f'(a)$ and $f'(b)$ exist (possibly infinite) as one-sided derivatives and (ii) $f'$ is continuous from $[a,b]$ into the extended reals.

Proof. (i) That the limits (4) exist (in the extended reals) is a consequence of the fact that for a concave function $t^{-1}[f(x+t) - f(x)]$ is nonincreasing in $t \ne 0$.
(ii) By Lemma 1 we see that $f'$ assumes every value between $f'(a)$ and $f'(b)$ inclusive. Since $f$ is concave, $f'(x)$ is monotone (nonincreasing in $x \in [a,b]$) and, therefore, must be continuous on $[a,b]$.

3. Differentiability of the Bayes Envelope as a Function of $\underline{G}$.

Definition. For $\theta \in Q$ and $G \in \mathcal{G}$ we define the line segment

(5) $L_{\theta,G} = \{\underline{G} + t\theta \mid -G(\theta) \vee (G(M) - 1) < t < G(M) \wedge (1 - G(\theta))\}$.

Note that $L_{\theta,G}$ is empty if and only if $G(\theta) + G(M) = 0$.

Theorem 3. Fix $\theta \in Q$ and $G_0 \in \mathcal{G}$ such that $G_0(\theta) + G_0(M) > 0$. The partial derivative $R_\theta(\underline{G})$ exists and is finite at all $\underline{G} \in L_{\theta,G_0}$ if and only if there exists a determination $\varphi_{(\cdot)}$ satisfying

(6) $\lim_{t \to 0} R(\theta - M,\varphi_{G+t\theta}) = R(\theta - M,\varphi_G)$ for all $\underline{G} \in L_{\theta,G_0}$

in which case

(7) $R_\theta(\underline{G}) = R(\theta - M,\varphi_G)$.
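The left hand side of (11) being $o(t)$ as $t \to 0$ implies that the first term on the right hand side of (11) is $o(t)$ as $t \to 0$. Hence by (8) we see that $R_\theta(\underline{G})$ exists and is equal to $R(\theta - M,\varphi_G)$.

Corollary 1. If the partials $R_\theta(\underline{G})$ exist for all $\theta \in Q$ and $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ then for any determination of the Bayes response $\varphi_{(\cdot)}$,

(12) $R(\underline{G}) = \sum_{\theta=1}^{M-1} G(\theta)R_\theta(\underline{G}) + R(M,\varphi_G)$, for all $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$.

Proof. Let $\varphi_{(\cdot)}$ be any determination of the Bayes response. If $R_\theta(\underline{G})$ exists for each $\theta \in Q$ and $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, Theorem 3 shows that for such $\theta$ and $\underline{G}$, $R_\theta(\underline{G}) = R(\theta - M,\varphi_G)$ and (12) follows from (3).

The next result is motivated by work of Samuel (1966, Lemma 1) and establishes the joint continuity of $R(\theta,\varphi_G)$ in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$.

Theorem 4. (i) Suppose $R_\theta(\underline{G})$ exists for all $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ and $\theta \in Q$ and let $\varphi_{(\cdot)}$ be any determination of the Bayes response. Then for every $\theta \in \Theta$, $R(\theta,\varphi_G)$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$. Moreover, for each $\theta \in \Theta$ and $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, $R(\theta,\varphi_G)$ is unique across determinations $\varphi_{(\cdot)}$.
(ii) If there exists a determination $\varphi_{(\cdot)}$ such that for every $\theta \in \Theta$, $R(\theta,\varphi_G)$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, then $R_\theta(\underline{G})$ exists for all $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ and $\theta \in Q$.

Proof. (i) Let $\varphi_{(\cdot)}$ be any determination of the Bayes response. By Theorem 1(ii), $R_\theta(\underline{G})$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ for each $\theta = 1,\dots,M-1$. Since, in addition, $R(\underline{G})$ is continuous in $\underline{G}$, (12) shows that $R(M,\varphi_G)$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$. Moreover, (12) shows that $R(M,\varphi_G)$ is unique across determinations $\varphi_{(\cdot)}$ since $R_\theta(\underline{G})$, $\theta = 1,\dots,M-1$, and $R(\underline{G})$ are well defined and do not depend on $\varphi_{(\cdot)}$. By Theorem 3 we can write

(13) $R(\theta,\varphi_G) = R_\theta(\underline{G}) + R(M,\varphi_G)$, $\theta = 1,\dots,M-1$, $G \ni G(\theta) + G(M) > 0$.

Therefore, (13) together with the continuity and uniqueness of $R_\theta(\underline{G})$ and $R(M,\varphi_G)$ yields the continuity and uniqueness of $R(\theta,\varphi_G)$, $\theta = 1,\dots,M-1$, and completes the proof of (i).

(ii) Let $\varphi_{(\cdot)}$ be a determination such that for each $\theta$, $R(\theta,\varphi_G)$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$.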
Then (6) obtains for any $\underline{G}_0 \in \mathrm{int}(\underline{\mathcal{G}})$ and all $\theta = 1,\dots,M-1$ and it follows from Theorem 3 that $R_\theta(\underline{G})$ exists for all $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ and $\theta \in Q$.

Since the existence of partials is equivalent to differentiability for concave functions (cf. Theorem 1 (i)), we see that Theorem 4 gives a complete characterization for the differentiability of the Bayes envelope. Also it shows that the differentiability of $R$ on $\mathrm{int}(\underline{\mathcal{G}})$ depends in no way on the choice of the $M-1$ coordinates in the definition of $\underline{G}$.

The continuity condition (C) which was introduced in Section 2.2 is the continuity of $R(\theta,\varphi_G)$ in all $\underline{G}$, not just on the interior. Samuel (1966, Lemma 1) claims that the differentiability of $R$, which relates necessarily to differentiability of $R$ at points in the interior of its domain, is sufficient for (C) for the particular determination of $\varphi_{(\cdot)}$ defined by Samuel (1966, (6)). Example 1 shows that $R$ may be differentiable on $\mathrm{int}(\underline{\mathcal{G}})$ and, in fact, may possess a one-sided partial derivative $R_\theta$ at a point on the boundary of $\underline{\mathcal{G}}$ with $R(\theta,\varphi_G)$ and $R(M,\varphi_G)$ not continuous at that point for any determination of $\varphi_{(\cdot)}$.

Example 1. Let $\Theta = \{1,2,3\}$, $\mathcal{A} = \{1,2\}$ with loss matrix given by

$(L(\theta,a)) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$.

Let $P_1$ and $P_2$ be uniform on $(0,1)$ and $(1/2, 3/2)$, respectively, and let $P_3$ be triangular on $(0,2)$ with $\mu$ = Lebesgue measure. For every $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, $P_\theta[\lambda_1(G,X) = \lambda_2(G,X)] = 0$, $\theta = 1,2,3$, and

$R(1,\varphi_G) = P_1[\lambda_1(G,X) < \lambda_2(G,X)]$
$R(\theta,\varphi_G) = P_\theta[\lambda_2(G,X) < \lambda_1(G,X)]$, $\theta = 2,3$

and it follows that $R(\theta,\varphi_G)$ is continuous in $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$ for each $\theta = 1,2,3$. Hence, by Theorem 3, $R_\theta(\underline{G})$ exists and is equal to $R(\theta - 3,\varphi_G)$ for each $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, $\theta = 1,2$, and by Theorem 1(i), $R$ is seen to be differentiable on $\mathrm{int}(\underline{\mathcal{G}})$.

Consider the boundary point $\underline{G}_0 = (\frac{1}{2},\frac{1}{2})$. We see that for any determination $\varphi_{(\cdot)}$,

$\lim_{t \uparrow 0} R(1,\varphi_{\underline{G}_0 + t1}) = \frac{1}{2}$, $\quad \lim_{t \uparrow 0} R(1,\varphi_{\underline{G}_0 + t2}) = 0$

and

$\lim_{t \uparrow 0} R(3,\varphi_{\underline{G}_0 + t1}) = \frac{1}{8}$, $\quad \lim_{t \uparrow 0} R(3,\varphi_{\underline{G}_0 + t2}) = \frac{1}{2}$.

Thus, there is no determination for which (C) obtains for $\theta = 1$ or $\theta = 3$.
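The four limits in Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it approximates the risks by a midpoint rule on $(0,2)$, evaluates the Bayes rule at priors a small hypothetical distance `s` inside the simplex along the two directions through $\underline{G}_0 = (\frac{1}{2},\frac{1}{2})$, and breaks ties toward act 2 (ties have probability zero at interior priors). The names `densities`, `risks` and `s` are not from the thesis.

```python
def densities(x):
    """Densities of P_1 (uniform on (0,1)), P_2 (uniform on (1/2,3/2)), P_3 (triangular on (0,2))."""
    f1 = 1.0 if 0 < x < 1 else 0.0
    f2 = 1.0 if 0.5 < x < 1.5 else 0.0
    f3 = x if 0 < x <= 1 else (2 - x if 1 < x < 2 else 0.0)
    return (f1, f2, f3)

# Loss matrix of Example 1: rows are theta = 1, 2, 3 and columns are acts 1, 2.
LOSS = ((1, 0), (0, 1), (0, 1))

def risks(G, cells=4000):
    """Midpoint-rule approximation of (R(1,phi_G), R(2,phi_G), R(3,phi_G)) for a prior G."""
    h = 2.0 / cells
    R = [0.0, 0.0, 0.0]
    for m in range(cells):
        x = (m + 0.5) * h
        f = densities(x)
        # lam[a] is lambda_{a+1}(G, x), the unnormalized posterior expected loss of act a+1
        lam = [sum(LOSS[t][a] * G[t] * f[t] for t in range(3)) for a in range(2)]
        act = 0 if lam[0] < lam[1] else 1
        for t in range(3):
            R[t] += LOSS[t][act] * f[t] * h
    return R

s = 1e-3                                   # hypothetical small step into the interior
along_1 = risks((0.5 - s, 0.5, s))         # prior corresponding to G_0 + t1 with t = -s
along_2 = risks((0.5, 0.5 - s, s))         # prior corresponding to G_0 + t2 with t = -s
print([round(r, 6) for r in along_1])      # approximately [0.5, 0.0, 0.125]
print([round(r, 6) for r in along_2])      # approximately [0.0, 0.5, 0.5]
```

The differences $R(1,\varphi_G) - R(3,\varphi_G)$ along the two directions come out near $\frac{3}{8}$ and $-\frac{1}{2}$, which are the two limiting values of $R_1$ at the boundary point.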
However, by Theorem 3 we see that $R_1(\underline{G}_0)$ exists as a left limit and is equal to $\frac{3}{8}$ but that $R_1$ is not continuous at $\underline{G}_0$ since

$\lim_{t \uparrow 0} R_1(\underline{G}_0 + t1) = \frac{3}{8}$ and $\lim_{t \uparrow 0} R_1(\underline{G}_0 + t2) = -\frac{1}{2}$.

The predecessor to Lemma 1 of Samuel (1966) is Lemma 1 of Samuel (1963). This lemma pertains to an $M = 2$ decision problem, in fact a $2 \times 2$ decision problem. Lemma 1 (1963) states that the differentiability of $R$ on $\underline{\mathcal{G}}$ (here $\underline{\mathcal{G}} = [0,1]$ and we presume that the differentiability at 0 or 1 refers to one-sided and finite derivatives) implies that for any determination $\varphi_{(\cdot)}$ and either $\theta$, $R(\theta,\varphi_G)$ is continuous in $\underline{G} \in [0,1]$. Theorem 5 states that for any action space with bounded loss function, the differentiability of $R$ on $(0,1)$ implies the uniform continuity and unicity of $R(\theta,\varphi_g)$ in $(0,1)$ for each $\theta$. Theorem 6 states that, in the $2 \times 2$ problem considered by Samuel (1963), under the assumption of differentiability of $R$ in $(0,1)$, $R(\theta,\varphi_g)$ is uniformly continuous in $[0,1]$ for both $\theta$'s if and only if $\varphi_0$ and $\varphi_1$ are admissible determinations of the Bayes response.

Theorem 5. Suppose $M = 2$ and $L(\theta,a) \le B < \infty$ for all $\theta \in \Theta$, $a \in \mathcal{A}$. (Since $M = 2$, $\underline{\mathcal{G}} = [0,1]$ and we let $g$ be generic for an element of $\underline{\mathcal{G}}$ and $R'$ denote the derivative of $R$.) If $R'$ exists on $(0,1)$ then
(i) for each $\theta \in \Theta$ and $g \in (0,1)$, $R(\theta,\varphi_g)$ is unique across determinations $\varphi_{(\cdot)}$;
(ii) for each $\theta \in \Theta$, $R(\theta,\varphi_g)$ is uniformly continuous in $g \in (0,1)$.

Proof. (i) Part of Theorem 4.
(ii) The continuity of $R(\theta,\varphi_g)$ in $g \in (0,1)$ is part of Theorem 4. We note that when $M = 2$ the probability measure $G_t = (g+t, 1-g)/(1+t)$ is equal to $(g+\Delta, 1-g-\Delta)$ if $t = \Delta/(1-g-\Delta)$. Hence, if $0 < \Delta < 1-g$ and with this choice of $t$, $R(1,\varphi_{g+\Delta}) = R(1,\varphi_{G_t}) \le R(1,\varphi_g)$, by Proposition 1.5, and we see that $R(1,\varphi_g)$ is nonincreasing in $g \in (0,1)$. Likewise, $R(2,\varphi_g)$ is nondecreasing in $g \in (0,1)$. For $\theta \in \Theta$ define the functions $h_\theta$ by

$h_\theta(0) = \lim_{g \downarrow 0} R(\theta,\varphi_g)$, $\quad h_\theta(1) = \lim_{g \uparrow 1} R(\theta,\varphi_g)$, $\quad h_\theta(g) = R(\theta,\varphi_g)$, $g \in (0,1)$.

The limits exist by the monotonicity of $R(\theta,\varphi_g)$ in $g$ and are finite since $0 \le R(\theta,\varphi_g) \le B < \infty$ for all $\theta$ and $g$. Since $h_\theta(g)$ is continuous on $[0,1]$ it is uniformly continuous on $[0,1]$ and hence is uniformly continuous on any subdomain. In particular, $R(\theta,\varphi_g)$ is uniformly continuous on $(0,1)$.

Example 1 shows that Theorem 5 cannot be extended to $M = 3$ even when the action space is finite. For in this decision problem and with $\underline{G} \in \mathrm{int}(\underline{\mathcal{G}})$, R(1,cpc) = 0, R(23%) = *9 R(3:(PG) - if 9. 6 £1 while I ooh-a R(chc) = ’5. R(2. 0 . If $R$ is differentiable on $(0,1)$ then $R(\theta,\varphi_g)$ is uniformly continuous in $g \in [0,1]$ for each $\theta$ if and only if $\varphi_0$ and $\varphi_1$ are admissible.

Proof. For a decision rule $\varphi$, let $\varphi(x) = \varphi(2|x)$. Since $R'$ exists on $(0,1)$, Theorem 5 states that $R(\theta,\varphi_g)$ is continuous in $g$ on $(0,1)$ for each $\theta$. The Bayes rules $\varphi_0$ and $\varphi_1$ are characterized a.s. $\mu$ by

$\varphi_0(x) = 1$ if $f_2(x) > 0$, arbitrary if $f_2(x) = 0$;
$\varphi_1(x) = 0$ if $f_1(x) > 0$, arbitrary if $f_1(x) = 0$.

For any determination $\varphi_{(\cdot)}$ a calculation shows that

$\lim_{g \uparrow 1} R(1,\varphi_g) = R(1,\varphi_1) = 0$, $\quad \lim_{g \downarrow 0} R(2,\varphi_g) = R(2,\varphi_0) = 0$,

that

$\lim_{g \downarrow 0} R(1,\varphi_g) = a\,P_1[f_2(X) > 0]$, $\quad \lim_{g \uparrow 1} R(2,\varphi_g) = b\,P_2[f_1(X) > 0]$,

and that

$R(1,\varphi_0) = a\,P_1[f_2(X) > 0] + a\,P_1[\varphi_0(X)[f_2(X) = 0]]$
$R(2,\varphi_1) = b\,P_2[f_1(X) > 0] + b\,P_2[(1 - \varphi_1(X))[f_1(X) = 0]]$.

We see that $R(1,\varphi_g)$ is continuous on $[0,1]$ if and only if $P_1[\varphi_0(X)[f_2(X) = 0]] = 0$, which is satisfied if and only if $\varphi_0$ is admissible. Likewise, $R(2,\varphi_g)$ is continuous on $[0,1]$ if and only if $P_2[(1 - \varphi_1(X))[f_1(X) = 0]] = 0$, which is satisfied if and only if $\varphi_1$ is admissible.

An equally simple and direct proof of Theorem 6 exists based on Propositions 1.6 and 1.7(ii) and the fact that $\varphi_0$ and $\varphi_1$ are admissible if and only if they are everywhere dominant (cf. Hannan (1965)).

BIBLIOGRAPHY

Ferguson, Thomas S. Mathematical Statistics: A Decision Theoretic Approach. New York: Academic Press, 1967.
Gilliland, Dennis C. (1966).
Approximation to Bayes risk in sequences of nonfinite decision problems. RM-162, Department of Statistics and Probability, Michigan State University.
Gilliland, Dennis C. (1968). Sequential compound estimation. Ann. Math. Statist. 39 1890-1905.
Gilliland, Dennis C. (1969). Approximation to Bayes risk in sequences of non-finite games. Ann. Math. Statist. 40 467-474.
Gilliland, Dennis C. (1972). Asymptotic risk stability resulting from play against the past in a sequence of decision problems. To appear in September IEEE Transactions on Information Theory.
Gilliland, Dennis C. and Hannan, James F. (1969). On continuity of the Bayes response and play against the past in a sequence of decision problems. RM-216, Department of Statistics and Probability, Michigan State University.
Goffman, Casper. Real Functions. New York: Holt, Rinehart and Winston, 1961.
Hannan, James F. (1956). The dynamic statistical decision problem when the component problem involves a finite number, m, of distributions. (Abstract). Ann. Math. Statist. 27 212.
Hannan, James (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games 3 97-139. Princeton University Press.
Hannan, James (1965). MR 30 #3553.
Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Statist. 26 37-51.
Hannan, J.F. and Van Ryzin, J.R. (1965). Rate of convergence in the compound decision problem for two completely specified distributions. Ann. Math. Statist. 36 1743-1752.
Jilovec, S. and Subert, B. (1967). Repetitive play of a game against nature. Apl. Mat. 12 383-396.
Johns, M.V., Jr. (1967). Two-action compound decision problems. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 463-478. University of California Press.
Neyman, J. (1962). Two breakthroughs in the theory of statistical decision making. Review of the International Statistical Institute 30 11-27.
Robbins, Herbert (1951). Asymptotically subminimax solutions of compound statistical decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob. 131-148. University of California Press.
Rockafellar, R.T. Convex Analysis. Princeton: Princeton University Press, 1970.
Samuel, E. (1963). Asymptotic solutions of the sequential compound decision problem. Ann. Math. Statist. 34 1079-1094.
Samuel, E. (1965). Sequential compound estimators. Ann. Math. Statist. 36 879-889.
Samuel, E. (1966). Sequential compound rules for the finite decision problem. Journal Royal Statist. Soc., Series B 28 63-72.
Subert, Bruno (1967). An asymptotically optimal decision procedure. Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes. Czechoslovak Academy, Prague 253-258.
Van Ryzin, J.R. (1966a). The compound decision problem with m X n finite loss matrix. Ann. Math. Statist. 37 412-424.
Van Ryzin, J.R. (1966b). The sequential compound decision problem with m X n finite loss matrix. Ann. Math. Statist. 37 954-975.