RATES OF CONVERGENCE IN SEQUENCE-COMPOUND SQUARED-DISTANCE LOSS ESTIMATION AND TWO-ACTION PROBLEMS

Thesis for the Degree of Ph.D.
MICHIGAN STATE UNIVERSITY
VYAGHRESWARUDU SUSARLA
1970

This is to certify that the thesis entitled "Rates of Convergence in Sequence-Compound Squared-Distance Loss Estimation and Two-Action Problems" presented by Vyaghreswarudu Susarla has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

Date: August 12, 1970

ABSTRACT

We consider a sequence of repetitions of a statistical decision problem which has the structure of one of the statistical decision problems described below. These statistical decision problems will be referred to later on as component problems.
When the family of distributions $\mathcal{P}$ is (1) the family of $m$-variate normal distributions with covariance matrix $I$ and mean $\theta$ in $\Omega = [|\theta| \le \alpha]$, the problem is to estimate $\theta$ with squared-distance loss; (2) the family of $\Gamma(\alpha)$ distributions with scale parameter $\theta$ in $\Omega = [a,b]$ where $0 < a < b < \infty$, the problem is to estimate $\theta$ with squared-distance loss; and (3) same as (2) except that the problem is a linear loss two-action problem. For any distribution $G$ on $\Omega$, let $R(G)$ denote the Bayes risk in the component problem. $\underline{X} = \{X_n\}$ is a sequence of independent random variables with distributions $\{P_{\theta_n}\}$ in $\mathcal{P}$. Let $G_n$ be the empiric distribution of $\theta_1,\ldots,\theta_n$. Let $s$ be a positive integer and $\gamma$ be in $(0,1)$. All the orders stated here are uniform in the parameter sequences $\underline{\theta}$ in $\times_1^{\infty}\Omega$.

When the component problem is described by (1), we exhibit procedures $\underline{\psi}^{**}$, $\underline{\hat\psi}$ and ${}_0\underline{\hat\psi}$, which are functions of $X_1,\ldots,X_n$, such that $D_n(\underline{\theta},\underline{\psi}^{**}) = n^{-1}\sum_{1}^{n} E|\psi_j^{**} - \theta_j|^2 - R(G_n)$, $D_n(\underline{\theta},\underline{\hat\psi})$ and $D_n(\underline{\theta},{}_0\underline{\hat\psi})$ are $O(n^{-1/(m+4)})$, $O(n^{-(s-1)\gamma/((2s+m)(1+\gamma))})$ and $O(n^{-(s-1)/(2(s+m+1))})$ respectively. Whenever $m \ge 5$ and $(s-1)\gamma(m+4) \ge 2(2s+m)(1+\gamma)$, $\underline{\hat\psi}$ is better than $\underline{\psi}^{**}$ in the sense that $\sup\{D_n(\underline{\theta},\underline{\hat\psi}) \mid \underline{\theta}\}$ converges to zero at a faster rate than $\sup\{D_n(\underline{\theta},\underline{\psi}^{**}) \mid \underline{\theta}\}$ does. A similar comparison has been given between $\underline{\psi}^{**}$ and ${}_0\underline{\hat\psi}$. The results stated above for $\underline{\psi}^{**}$ and $\underline{\hat\psi}$ have been extended to the case when the covariance matrix $I$ is replaced by $\sigma^2 I$ ($\sigma^2$ unknown) and the means $\theta_n$ lie in lower dimensional subspaces having the same dimension.

When the component problem is given by (2), we exhibit a procedure $\underline{\psi}^{*}$ such that $D_n(\underline{\theta},\underline{\psi}^{*}) = O(n^{-s/(2(s+1))})$ when $a$, $b$ and $\alpha$ satisfy certain conditions. For the same set of conditions on $a$, $b$ and $\alpha$, when the component problem is described by (3) with loss function $L$, we define a procedure $\underline{\hat\varphi}$ such that $n^{-1}\sum_{1}^{n} E\,L(\theta_j,\hat\varphi_j) - R(G_n) = O(n^{-s/(2(s+1))})$.
A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1970

TO MY PARENTS

ACKNOWLEDGEMENTS

I wish to express my sincere gratitude to Professor J.F. Hannan for introducing me to compound decision theory and for suggesting the problems treated in the thesis. His comments aided greatly in improving and simplifying most of the results of the thesis. I wish to thank Professors D.C. Gilliland and J.S. Huang for going through the thesis and pointing out some misprints. I wish to thank Mr. T. O'Bryan for suggesting changes in the phraseology. Special thanks are due to Mrs. Noralee Barnes for her excellent typing and cheerful attitude in the preparation of the manuscript.

I am grateful to the Department of Statistics and Probability, Michigan State University and the National Science Foundation for the financial support during my stay at Michigan State University.

TABLE OF CONTENTS

Chapter

INTRODUCTION

I. RATES IN THE ESTIMATION PROBLEM FOR A FAMILY OF m-VARIATE NORMAL DISTRIBUTIONS
  1.0 Introduction and Notation
  1.1 A Bound for the Modified Regret $D_n(\underline{\theta},\underline{\varphi})$
  1.2 A Rate of Convergence for $D_n(\underline{\theta},\underline{\psi}^{**})$ with $\underline{\psi}^{**}$ Based on a Divided Difference Estimator for the Derivative of the log of a Density
  1.3 Rates Near $O(n^{-1/4})$ for $D_n(\underline{\theta},\underline{\hat\psi})$ with $\underline{\hat\psi}$ Based on Kernel Estimators for a Density and its Derivative
  1.4 Rates Near $O(n^{-1/2})$ for $D_n(\underline{\theta},{}_0\underline{\hat\psi})$ where ${}_0\underline{\hat\psi}$ is a Particular $\underline{\hat\psi}$
  1.5 A Lower Bound for $D_n(\underline{0},\underline{\psi}^{**})$
  1.6 Extension of Results in Sections §1.2 and §1.3 to Constrained Mean Vectors and Unknown Covariance Matrix
    1.6.1 Definition of $\underline{\psi}^{**}$ and a Rate of Convergence for $D_n(\underline{\theta},\underline{\psi}^{**})$
    1.6.2 Definition of $\underline{\hat\psi}$ and a Rate of Convergence of $D_n(\underline{\theta},\underline{\hat\psi})$

II. RATES IN THE ESTIMATION AND TWO-ACTION PROBLEMS FOR A FAMILY OF SCALE PARAMETER $\Gamma(\alpha)$ DISTRIBUTIONS
  2.0 Introduction and Notation
  2.1 Estimation Problem. Rates of Convergence for $D_n(\underline{\theta},\underline{\psi}^{*})$ with $\underline{\psi}^{*}$ Based on Kernel Estimators for a Density
  2.2 Two-action Problem. Rates of Convergence for $D_n(\underline{\theta},\underline{\hat\varphi})$ with $\underline{\hat\varphi}$ Based on Kernel Estimators for a Density

APPENDIX

BIBLIOGRAPHY

INTRODUCTION

In Chapter I, $\mathcal{P} = \{P_\theta\}$ is the family of $m$-variate normal distributions with covariance matrix $I$ and mean $\theta$ in $\Omega = [|\theta| \le \alpha]$ and the component problem is squared-distance loss estimation of $\theta$. In Chapter II, $\mathcal{P}$ is the family of $\Gamma(\alpha)$ distributions with scale parameter $\theta$ in $\Omega = [a,b]$ where $0 < a < b < \infty$ and the component problem is either squared-distance loss estimation or a linear loss two-action problem. For any distribution $G$ on $\Omega$, let $\psi_G$ and $R(G)$ denote the Bayes estimate and the Bayes risk in the component problem.

The sequence-compound problem consists of a sequence of repetitions of the component problem with the loss taken to be the average of the component losses. $\underline{X} = \{X_n\}$ is a sequence of independent random variables with distributions $\{P_{\theta_n}\}$ in $\mathcal{P}$ and the $n$th component decision $\varphi_n$ depends only on $X_1,\ldots,X_n$. With $G_n$ denoting the empiric distribution of $\theta_1,\ldots,\theta_n$, let

(0.1)  $D_n(\underline{\theta},\underline{\varphi}) = n^{-1}\sum_{j=1}^{n} E[L(\theta_j,\varphi_j)] - R(G_n).$

$D_n(\underline{\theta},\underline{\varphi})$ is known as the modified regret of $\underline{\varphi}$.

Since the work reported here is a continuation of Gilliland (1966, 1968) and Johns (1967), we describe some of the main results contained in these references. All the orders stated below are uniform in the parameter sequences concerned.
For the purpose of this introduction only, abbreviate $O(n^{-a})$ to order $-a$.

When $\mathcal{P}$ is the family of univariate normal distributions with variance unity and mean $\theta$ in $[-\alpha,+\alpha]$ and the component problem is squared-distance loss estimation, Gilliland (1966) exhibited a procedure whose modified regret is order $-1/5$. When $\mathcal{P}$ is a certain family of discrete distributions and the component problem is the linear loss two-action problem, Johns (1967) exhibited a procedure whose modified regret is order $-1/2$. When $\mathcal{P}$ is a certain discrete exponential family and the component problem is squared-distance loss estimation, Gilliland (1968) exhibited two procedures whose modified regrets are order $-1/2$.

Now we briefly describe the main results obtained in this work.

In Chapter I, the Bayes estimate against $G_{n-1}$ is $X_n + q/p$, with $p$ denoting the mixed density $\int p_\theta\,dG_{n-1}$, $q$ denoting the vector of partial derivatives of $p$ and indication of the evaluation of both at $X_n$ abbreviated by omission. In section §1.2, we define $\underline{\psi}^{**}$ based on a divided difference estimate of $q/p$ whose $D_n$ is order $-(m+4)^{-1}$. This generalizes the result of Gilliland (1966) for the $m = 1$ case. In section §1.3, for each positive integer $s$ and $\gamma$ in $(0,1)$, we define $\underline{\hat\psi}$ based on kernel estimators for $p$ and $q$, analogous to the Johns and Van Ryzin (1967) estimates of $\int p_\theta\,dG$ and its derivative in the empirical Bayes two-action problem in exponential families, and show that $D_n(\underline{\theta},\underline{\hat\psi})$ is order $-(s-1)\gamma/((2s+m)(1+\gamma))$. For each integer $s > 1$, we exhibit ${}_0\underline{\hat\psi}$, specializing $\underline{\hat\psi}$ but for the latter's retraction to $[\delta,\infty)$, whose $D_n$ is order $-(s-1)/(2(s+m+1))$.

In section §1.5, we show that $D_n(\underline{0},\underline{\psi}^{**}) \ge c\,n^{-2/(m+4)}$ where $c$ is a constant depending on $\alpha$. Hence, whenever $m \ge 5$ and $s$ and $\gamma$ are such that $(s-1)\gamma(m+4) > 2(2s+m)(1+\gamma)$, $\underline{\hat\psi}$ is better than $\underline{\psi}^{**}$ in the sense that $\sup\{D_n(\underline{\theta},\underline{\hat\psi}) \mid \underline{\theta}\}$ converges to zero at a faster rate than $\sup\{D_n(\underline{\theta},\underline{\psi}^{**}) \mid \underline{\theta}\}$.
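The comparison of rates described above reduces to arithmetic on the exponents. The following is a minimal sketch, not part of the thesis: the function names are hypothetical, and the only inputs assumed are the exponents quoted in this introduction, namely $1/(m+4)$ for the upper bound on $\underline{\psi}^{**}$, $2/(m+4)$ for its lower bound, and $(s-1)\gamma/((2s+m)(1+\gamma))$ for $\underline{\hat\psi}$. Exact rational arithmetic is used so that the strict inequality is not blurred by rounding.

```python
from fractions import Fraction


def exponent_upper_divided_difference(m):
    """Exponent a in the upper bound O(n^-a) quoted for psi**: a = 1/(m+4)."""
    return Fraction(1, m + 4)


def exponent_lower_divided_difference(m):
    """Exponent a in the lower bound c * n^-a quoted for psi**: a = 2/(m+4)."""
    return Fraction(2, m + 4)


def exponent_kernel(m, s, gamma):
    """Exponent quoted for the kernel-based procedure: (s-1)g / ((2s+m)(1+g))."""
    gamma = Fraction(gamma)
    return (s - 1) * gamma / ((2 * s + m) * (1 + gamma))


def kernel_rate_is_faster(m, s, gamma):
    """Condition from the text: (s-1) g (m+4) > 2 (2s+m) (1+g)."""
    gamma = Fraction(gamma)
    return (s - 1) * gamma * (m + 4) > 2 * (2 * s + m) * (1 + gamma)


if __name__ == "__main__":
    # Illustrative parameter choices only; they do not come from the thesis.
    for m, s, gamma in [(4, 1000, Fraction(99, 100)),
                        (5, 55, Fraction(9, 10)),
                        (10, 20, Fraction(9, 10))]:
        print(m, s, gamma,
              exponent_kernel(m, s, gamma),
              exponent_lower_divided_difference(m),
              kernel_rate_is_faster(m, s, gamma))
```

The condition is the same statement as "the kernel exponent exceeds $2/(m+4)$", and since the kernel exponent stays below $1/4$ for every $s$ and every $\gamma$ in $(0,1)$, it can hold only when $m + 4 > 8$, which is where the requirement $m \ge 5$ comes from.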
A similar comparison is made between $\underline{\psi}^{**}$ and ${}_0\underline{\hat\psi}$. Section §1.6 extends the main results of sections §1.2 and §1.3 to the case when the covariance matrix $I$ is replaced by $\sigma^2 I$ ($\sigma^2$ unknown) under the additional assumption that the means lie in lower dimensional subspaces having the same dimension.

In Chapter II, as already indicated earlier, $\mathcal{P}$ is the family of $\Gamma(\alpha)$ distributions with scale parameter $\theta$ in $\Omega = [a,b]$. In section §2.1, the component problem is squared-distance loss estimation. For each positive integer $s$, we define $\underline{\psi}^{*}$ based on kernel estimates for two densities and show that $D_n(\underline{\theta},\underline{\psi}^{*})$ is order $-s/(2(s+1))$ whenever $a$, $b$ and $\alpha$ satisfy certain conditions. In section §2.2, the component problem is linear loss two-action. For each positive integer $s$, we define $\underline{\hat\varphi}$ based on kernel estimates for two densities and show that $D_n(\underline{\theta},\underline{\hat\varphi})$ is order $-s/(2(s+1))$ whenever $a$, $b$ and $\alpha$ satisfy the conditions imposed on them in section §2.1.

Throughout this work, we let $\Phi$ and $\phi$ denote the standard normal distribution and its density respectively. We suppress the arguments of functions whenever it is convenient not to exhibit them. Indulging in the abuse of notation, we let sets denote their own indicator functions and, infrequently, are forced to let the value of a function denote the function. For any measure $\mu$, we let $\mu[f]$ or $\mu f$ denote $\int f\,d\mu$.

CHAPTER I

RATES IN THE ESTIMATION PROBLEM FOR A FAMILY OF m-VARIATE NORMAL DISTRIBUTIONS

§1.0 Introduction and Notation.

For fixed $\alpha < \infty$ and for fixed positive integer $m$, let $\mathcal{P} = \{P_\theta \mid |\theta| \le \alpha\}$ be the family of distributions with $P_\theta$ denoting the $m$-variate normal law with mean $\theta$ and covariance $\sigma^2 I$, where $I$ is the $m \times m$ identity matrix and $\sigma^2 > 0$. We consider the following estimation problem which will be called the component problem hereafter. Based on an observation of a random vector $X$ whose distribution $P_\theta$ belongs to $\mathcal{P}$, the problem is to estimate $\theta$ with squared-distance loss.
For any distribution $G$ on the $m$-sphere of radius $\alpha$, let $\psi_G$ and $R(G)$ denote the Bayes estimate and the Bayes risk versus $G$ in the above estimation problem. Since the problem considered here is the squared-distance loss estimation problem, $\psi_G$ is given by the conditional expectation of $\theta$ given $X$. If $p_\theta$ denotes the usual density of $P_\theta$ wrt Lebesgue measure on $(R^m,\mathcal{B}^m)$, then the conditional expectation of $\theta$ given $X$ is $G[\theta p_\theta]/G[p_\theta]$ which can be expressed as $X + \sigma^2 q_G$ where $q_G$ is the vector of partial derivatives of $\log G[p_\theta]$ wrt the various coordinates of $X$. Hence,

(0.1)  $\psi_G = X + \sigma^2 q_G.$

We consider a sequence of component problems as described above. That is, let $\{X_n\}$ be a sequence of independent random variables with $X_n$ distributed as $P_{\theta_n}$ belonging to $\mathcal{P}$ and the problem is to estimate every component of $\{\theta_n\}$ with loss taken as the average of squared-distance losses in individual components. For each $n$, let the product measure $\times_{i=1}^{n} P_i$, where $P_i$ is an abbreviation for $P_{\theta_i}$, be denoted by $\underline{P}_n$. Let $\underline{\varphi} = \{\varphi_n\}$ be a sequence-compound procedure (abbreviated to procedure hereafter). For any parameter sequence $\underline{\theta} = \{\theta_n\}$ and for any nonrandomized procedure $\underline{\varphi} = \{\varphi_n\}$, define

(0.2)  $D_n(\underline{\theta},\underline{\varphi}) = n^{-1}\sum_{j=1}^{n} \underline{P}_j[|\varphi_j - \theta_j|^2] - R(G_n)$

where $G_n$ is the empiric distribution of $\theta_1,\ldots,\theta_n$. $D_n(\underline{\theta},\underline{\varphi})$ is called the modified regret of the procedure $\underline{\varphi}$.

The orders stated in the results of sections §1.1, §1.2, §1.3 and §1.4 are uniform in all parameter sequences $\underline{\theta}$ in $\times_1^{\infty}[|\theta_n| \le \alpha]$ and the order stated in section §1.6 is uniform in all parameter sequences $\underline{\theta}$ belonging to $\times_1^{\infty}([|\theta_n| \le \alpha] \cap R_n^d)$, where, for each $n$, $R_n^d$ is a $d$ ($d < m$)-dimensional subspace of $R^m$. To reduce the complexity of the statements of various results in this chapter, the range of the parameter sequences will not be exhibited, but is understood to be as in the preceding sentence. Henceforth, we use these conventions.
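The two expressions for the Bayes estimate given above, the posterior mean $G[\theta p_\theta]/G[p_\theta]$ and the form $X + \sigma^2 q_G$ of (0.1), can be compared numerically. The following is a minimal sketch, not part of the thesis: it assumes a hypothetical discrete distribution $G$ (the atoms, weights, step size `eps` and all function names are illustrative), and it approximates $q_G$ by central differences of $\log G[p_\theta]$ rather than by any estimator defined in this work.

```python
import numpy as np


def log_mixture_density(x, thetas, weights, sigma2):
    """log G[p_theta](x) for a discrete G with atoms `thetas` (rows) and masses `weights`."""
    m = x.shape[0]
    expo = -np.sum((x - thetas) ** 2, axis=1) / (2.0 * sigma2)
    top = expo.max()  # shift the exponents for numerical stability
    log_norm = 0.5 * m * np.log(2.0 * np.pi * sigma2)
    return top + np.log(np.dot(weights, np.exp(expo - top))) - log_norm


def bayes_posterior_mean(x, thetas, weights, sigma2):
    """G[theta p_theta] / G[p_theta]: conditional expectation of theta given X = x."""
    expo = -np.sum((x - thetas) ** 2, axis=1) / (2.0 * sigma2)
    w = weights * np.exp(expo - expo.max())
    return w @ thetas / w.sum()


def bayes_via_score(x, thetas, weights, sigma2, eps=1e-5):
    """x + sigma^2 q_G, with q_G approximated by central differences of log G[p_theta]."""
    q = np.zeros_like(x)
    for l in range(x.shape[0]):
        step = np.zeros_like(x)
        step[l] = eps
        q[l] = (log_mixture_density(x + step, thetas, weights, sigma2)
                - log_mixture_density(x - step, thetas, weights, sigma2)) / (2.0 * eps)
    return x + sigma2 * q


if __name__ == "__main__":
    # Hypothetical example: an empiric distribution on 6 points in the ball of radius alpha.
    rng = np.random.default_rng(0)
    alpha, sigma2, m = 1.0, 0.7, 3
    thetas = rng.normal(size=(6, m))
    thetas *= alpha * rng.uniform(size=(6, 1)) / np.linalg.norm(thetas, axis=1, keepdims=True)
    weights = np.full(6, 1.0 / 6.0)
    x = 2.0 * rng.normal(size=m)
    direct = bayes_posterior_mean(x, thetas, weights, sigma2)
    score = bayes_via_score(x, thetas, weights, sigma2)
    print("max abs difference:", np.max(np.abs(direct - score)))
```

Because the posterior mean is a weighted average of the atoms of $G$, it stays inside the $m$-sphere of radius $\alpha$ whatever the observation is. An estimate of $q_G$ built from data carries no such guarantee, which is one way to see why the procedures defined later are retracted to bounded intervals.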
In section §1.1, we get an upper bound for $|D_n(\underline{\theta},\underline{\varphi})|$ under the assumption that $\underline{\varphi}$ is in $\times_n[-\alpha,+\alpha]^m$ and a useful lemma, both results holding for each $\sigma^2$. In section §1.2, we exhibit a procedure $\underline{\psi}^{**}$ for which $D_n(\underline{\theta},\underline{\psi}^{**}) = O(n^{-1/(m+4)})$ when $\sigma^2 = 1$. In section §1.3, for each $\gamma > 0$, we exhibit a procedure $\underline{\hat\psi}$ for which $D_n(\underline{\theta},\underline{\hat\psi}) = O(n^{-(1/4-\gamma)})$ again for $\sigma^2 = 1$. In section §1.4, for each positive integer $s$, we exhibit a procedure ${}_0\underline{\hat\psi}$ for which $D_n(\underline{\theta},{}_0\underline{\hat\psi}) = O(n^{-(s-1)/(2(m+s+1))})$ for $\sigma^2 = 1$. Section §1.5 shows that $D_n(\underline{0},\underline{\psi}^{**}) \ge c\,n^{-2/(m+4)}$ for all $n$, where $\underline{0} = \{0\}$ and $c$ is a positive constant. Section §1.6 has two subsections. These subsections extend respectively the main results of sections §1.2 and §1.3 to the case when $\sigma^2$ is unknown and when, for each $n$, $\theta_n$ lies in $R_n^d$ intersected with the $m$-sphere of radius $\alpha$.

Let $\mu$ denote the Lebesgue measure on $(R^m,\mathcal{B}^m)$. For any two points $u, v$ in $R^m$ with coordinates $u_1,\ldots,u_m$, $v_1,\ldots,v_m$ respectively, let $|u|^2 = \sum_{i=1}^{m} u_i^2$, $\|u\| = \sum_{i=1}^{m} |u_i|$ and $(u,v) = \sum_{i=1}^{m} u_i v_i$. The inequalities $|u| \le \|u\| \le \sqrt{m}\,|u|$ will be used without further comment. Also, a vector in $R^m$ will be denoted by $\langle\ \rangle$ with the general coordinate of the vector exhibited inside the brackets.

Let $p_n$ be an abbreviation for $p_{\theta_n}$, the density of $P_{\theta_n}$. For each $n$, let $\psi_{G_n}$ be abbreviated by $\psi_n$. Then, specializing (0.1),

(0.3)  $\psi_n = X + \sigma^2 q_n$

where $q_n$ is the vector of partial derivatives of the function $\log \sum_{j=1}^{n} p_j$ wrt the coordinates of $X$.

§1.1 A Bound for the Modified Regret $D_n(\underline{\theta},\underline{\varphi})$.

We state and prove two lemmas which are higher dimensional generalizations of proposition 1 and corollary 1 of Chapter I of Gilliland (1966) for the case of the family of normal distributions $\mathcal{P}$.

Lemma 1. $P_n[|\psi_n - \psi_{n-1}|] \le 2\alpha\,e^{4\sigma^{-2}\alpha^2}\,n^{-1}$ for $n > 1$.

Proof. From $\psi_n = G_n[\theta p_\theta]/G_n[p_\theta]$, the triangle inequality and Jensen's inequality, respectively, it follows that

(1.1)  $|\psi_n - \psi_{n-1}| = \Bigl(\sum_{j=1}^{n} p_j\Bigr)^{-1}\Bigl(\sum_{j=1}^{n-1} p_j\Bigr)^{-1}\Bigl|\sum_{j=1}^{n-1}(\theta_j - \theta_n)p_j\Bigr|\,p_n \le 2\alpha\,p_n\Bigl(\sum_{j=1}^{n} p_j\Bigr)^{-1} \le 2\alpha\,n^{-2}\,p_n\sum_{j=1}^{n} p_j^{-1}.$

Since $p_n p_j^{-1} = \exp\,\sigma^{-2}(\theta_n - \theta_j,\ X - (\theta_n + \theta_j)/2)$,

$P_n[p_n p_j^{-1}] = \exp\,\sigma^{-2}|\theta_n - \theta_j|^2 \le \exp\,4\sigma^{-2}\alpha^2$

which, when substituted in (1.1), completes the proof.

Lemma 2. If the procedure $\underline{\varphi}$ is in $\times_n[-\alpha,+\alpha]^m$, then, for each $\sigma^2 > 0$,

$|D_n(\underline{\theta},\underline{\varphi})| \le 4\alpha\,n^{-1}\sum_{j=1}^{n}\underline{P}_j[\|\varphi_j - \psi_{j-1}\|] + O(n^{-1}\log n)$

where $\psi_0$ is an arbitrary decision rule taking values in $[-\alpha,+\alpha]^m$.

Proof. Inequalities (8.8) and (8.11) of Hannan (1957) when specialized to the squared-distance loss estimation problem here give the inequality

(1.2)  $n^{-1}\sum_{j=1}^{n} P_j[|\psi_j - \theta_j|^2] \le R(G_n) \le n^{-1}\sum_{j=1}^{n} P_j[|\psi_{j-1} - \theta_j|^2].$

By bounding the term $R(G_n)$ appearing in the definition (0.2) of $D_n(\underline{\theta},\underline{\varphi})$ above and below by using (1.2), we obtain, by using the equality $|a|^2 - |b|^2 = (a + b,\ a - b)$ for $a, b$ in $R^m$, the double inequality

(1.3)  $n^{-1}\sum_{j=1}^{n}\underline{P}_j[(\varphi_j + \psi_{j-1} - 2\theta_j,\ \varphi_j - \psi_{j-1})] \le D_n(\underline{\theta},\underline{\varphi}) \le n^{-1}\sum_{j=1}^{n}\underline{P}_j[(\varphi_j + \psi_j - 2\theta_j,\ \varphi_j - \psi_j)].$

[...] from $R^m$ to $R^m$ where $F\square$ and $F\square_l^k$ for $l = 1,\ldots,m$ represent the measures of $\square$ and $\square_l^k$ respectively under $F$ and any undefined ratios are taken to be 1. We abbreviate $t(\bar F)$ frequently by $t$ hereafter.

Let the function $t(F^*)$, where $F^*$ is the empiric distribution of $X_1,\ldots,X_{n-1}$, be denoted by $t^*$. Let $X$ abbreviate $X_n$ and $X_1,\ldots,X_m$ denote the coordinates of $X$. Let

(2.1)  $\psi^{**} = \mathrm{tr}'(X + \sigma^2 t^*(X)), \qquad \psi^{*} = \mathrm{tr}(X + \sigma^2 t^*(X))$

where $\mathrm{tr}'$ and $\mathrm{tr}$ stand for the coordinatewise retraction to the intervals $[-\alpha,+\alpha]$ and $[-\alpha - k - h,\ \alpha + k + h]$ respectively.

With $\psi$ abbreviating $\psi_{n-1}$, we have, since $|\psi| \le \alpha$ and $\psi^{**} = \mathrm{tr}'\psi^{*}$, $\|\psi^{**} - \psi\| \le \|\psi^{*} - \psi\|$. Therefore, by the triangle inequality,

(2.2)  $\underline{P}_n[\|\psi^{**} - \psi\|] \le \underline{P}_n[\|\psi^{*} - (X + \sigma^2 t(X))\|] + P_n[\|X + \sigma^2 t(X) - \psi\|].$

Lemma 3. For all $x$ in $R^m$,

(1) $x + \sigma^2 t(\bar F)(x) \in [-\alpha - \tfrac{k}{2} - h,\ \alpha]^m$,

(2) $\bar F\square_l^k \ge (h/\sigma)^m\,\bar p\,\exp\bigl\{-\tfrac{k+h}{\sigma^2}\bigl(\|x\| + \sqrt{m}\,\alpha + \tfrac{mh+k}{2}\bigr)\bigr\}$ where $\bar p$ is the density of $\bar F$ at $x$, and

(3) $\bar F\square'_l / \bar F\square_l^k \le \tfrac{k+h}{h}\exp\bigl\{\tfrac{k+h}{\sigma^2}(|x_l| + \alpha + k + h)\bigr\}$ for $l = 1,\ldots,m$, where $\square'_l = \times_{j=1}^{m} I'_j$ with $I'_j = I_j$ for $j \ne l$ and $I'_l = [x_l,\ x_l + k + h]$.

Proof.
In this proof, let Fj denote the distribution of Xj and ejl’°°"ejm denote the coordinates of ej. Proof of (1). Let L be in {l,...,m}. Since the coordinates of x are independent, we can express Ff] and F as 30!. the products of univariate normal probabilities. Therefore, by cancelling out the common terms in these products, we obtain that m Mo'ls -e +k+h)>-e(a'1(x ~e +k>> j L = L if L 11L F C] -1 -1 j @(o (XL ' 93!. +10) - M0 (XL - 91(1)) Applying Cauchy's mean value theorem (Graves (1946), p. 81) to the rhs of this equality over (0,0-1h) with the function in the denominator to be taken as 6(o-l(xL - ejL)) while that in the numerator to be taken as §(O-1(XL - ej + k)). we L obtain, by using a2 - b2 = (a +-b)(a - b) for a,b in R1 9 12 the existence of m in (0,1) such that Fl] _12L = - E. - E FJD exp 02 (xx. 9J4. + 2 + wh)‘ Hence, since ‘ejL‘ s a, F exp -‘5- (x +-a 4"E +-h) S -i3L s exp k‘(a - x ). 2t 2 FC] 2 L 0 j 0 Since these bounds for FIVE/FIE are independent of j, they also bound fijL/fij. These inequalities are equivalent to (l) in view of the definition of t(F). Since L is arbitrary the proof of (1) is complete. Proof of (2). We temporarily abbreviate ”[S¢] by §(S) for any 3 in 5‘“. Then, F [J = §(o-1(I - e + k))r1 Q(a-1(Ii - eji)). Hence, applying the mean value theorem -1 -l to (o (Ii - eji)) for i 9‘ t and to Mo (IL - en + 10). . . m we obtain the ex1stence of in (0,1) such that . - + h+ hm x' 91 “’1 61k m = _ 1 J L chk, (a) 121¢( o ) where 61L = [i = L]. Hence, since - log(¢(u)/®(v)) = (u-v)(u+v)/2, we obtain that Fij m w.h + 6.,k -02 108(‘fi‘L—) = )3 (w.h + 6 k)(x. - 9.. +4-7—y-‘fi- (“)mpj i=1 1 1L 1 11 0 Hence, since the functions of mi appearing on the rhs of this equality, being convex, attain their maxima at mi = 0 or 1, we obtain that the rhs of the last equality is exceeded by m h + b. k .. _ .21.. iglouks, eji + 5mm” v () 13 m h + 51 k S 12101 + aukx‘xi‘ + ‘eji‘ +___L_2 ) s (k +h)(“x“ +/moz +mh'2H‘). 
Since this bound for -ozlog(Ff:k/(h/o)mpj) is independent of j, it also bounds -azlog(fijk/(h/o)mp). Since this in- equality is equivalent to (2), the proof of (2) is complete. Proof of (3). Using the notation 6(8) for S in 16m introduced in the proof of (2), we have F113 =¢(o'1(1 -e. +k)) nuo‘la -e )) and FCJ' = L L If; l—ML 1. jl j L 9(0-1(I" ‘ 9 )) H 9(O-1(I ‘ 9 )). Hence, applying the mean L 3!. i541, 1 ji - _ '1 n _ value theorem to 6(0 (IL 91L + k)) and Q(o (IL ejL))’ we obtain the existence of w, m' in (0,1) such that p '1 Ff] = k :-h ¢(o (XL Fiji, quads, - 9n + k + ooh» - ' k ejL_+ w ( +h)) Hence, since log(¢(u)/¢(v)) = (v-u)(v+u)/2, we obtain from the above equality that F U 2 _h__£t- . . L11 w_+w_1 a log kll FJJL - ((1 m )k +-(w - w )h)(xL-ejL + 2 k + 2 h). Hence, since 0 < w, m' < l, we obtain from the above equality that 2 h F ' a log Egg 5 (k-l-h)(‘xL‘ +0! + k-i-h). Since this bound is independent of 1, it also bounds 2 _. ._ o log(h H:£/(k+h)HJL). This inequality is equivalent to (3). Hence the proof of (3) is completed. 14 Now we bound the integrals on the rhs of (2.2). The method of bounding the first integral is essentially a gen- eralization of that given in Chapter III of Gilliland (1966). We get a simpler method of bounding this integral because of the definition of ¢f* in (2.1). This definition of ¢** differs from that of a similar function introduced by Gilliland (1966). The method of bounding the second integral of the rhs of (2.2) differs from that of Gilliland (1966). Let c ,c ,... l 2 denote finite functions of 02. Let K = {k‘O < k < (s + g'2(2a + 1))‘1}. Lemma 4. If k is in K, then * 2 k+h e _1_ 25 RIM - (x + o t(X))n] s c1(nk2hm+1) + C2(nh‘“ . nggf. Since the 1hs is the sum of gn-integrals of the moduli of the coordinates of y* - X - azt(X), the lemma will be proved by showing that these integrals are bounded by rhs/m. Let the dependency of t on X be suppressed and X1,...,Xm denote the coordinates of X. 
We abbreviate in this proof the Lth coordinates of ¢* and t by omission. Let a' denote 2(a + k +’h). Since ¢*, by definition (2.1), is the retraction of XL +czt* to [-a - k - h, a + k + h] and Since XL + ozt, by (l) of Lemma 3, is in [-a - §’- h,a], it follows that H" - XL - ozt‘ s a' and ‘f - XL - czt‘ s 02‘t* - t‘. Therefore, 15 7'c 2 0" ‘k 2 Emily - X, - o t‘] -<- ‘Eflm - x, - c t‘ > quu a' 2 * (2.3) s g §n_1[o ‘t - t‘ > ujdu a! = g En-1[02(t* - t) > u]du O 2 * + [a'£h_1[o (t - t) < u]du. 2 * The main part of the proof bounds £n-1[° (t - t) > u] f o < s ' d p [ 2 * f ' 0 or _ u a an -n-1 a (t - t) < u] or -a s u < by using the Berry—Esseen theorem. The rest of the proof shows that the Pn-integral of m times the bound for the rhs of (2.3) is exceeded by the bound in the lemma. Let X be fixed until otherwise stated. Let , = . = , a d s, [35, 6:3,]. as, [35, ea] n 2 = - R(tiulo ) (2.4) Yj(u) Sj bj e for ‘u‘ 561'. Let the dependency of Yj on u be suppressed hereafter. 2 -3 3 Let a = Var(2Yj) and L = a z PJ‘Yj - Pij‘ where 3 stands for summation over 1 from 1 to n-l. Sublemma. For ‘u‘ S a', -2 kg (a+a'+‘xL‘) 2 E ° 2__e C4 (n-l) ‘P 141ij 2 0] - Me 1213an s (RIB) Proof. With 3 denoting the Berry-Esseen constant, the Berry- Esseen theorem (Loeve (1963), p. 288) implies that -1 En-l[ij Z 0] - 9(6 ZPin)‘ is exceeded by BL. Hence, we complete the proof of the sublemma by showing that L is exceeded by B“1 times the bound of the sublemma. In order 16 to get a bound on L, we first get a lower bound on 62. By applying L1.A (see Appendix) to the Yj’ we 2 obtain a lower bound for B . We observe that Yj defined by (2.4) takes three values; namely 2 (2.5) o, 1 and - ek u] s [ZYj 2 0]. Hence, by the sub- * lemma, it follows that £n_1102(t - t) > u] is exceeded by (2.10) Q(B-12 Pij) + bound in the sublemma. Since ekt = Rik/fij by the definition of t, l8 _ _2 _ _ Z Pij = (n-l) H3L(1 - exp kc u) S -(n-l)ko 2 HJLU. 
There- fore, by using the upper bound for B in (2.9), we obtain ’5 that 9(5-1}: Pij) s §(-((n-l)hmk2) fu) where f is the positive solution of the equation -2 _ 2ko (ain'i‘ X ) .. (2.11) 0“ htn p131: e ‘ 9‘ £2 = (FOL 2 * Therefore, since (2.10) is a bound for §h_1ch(t - t) > u], we have, for O s u s a', (2.12) Pn_1[oz(t*-t) > u] s §(-((n-l)hmk2)%f u) + bound in the sublemma. * Now we consider bounding the probability Pn_l[02(t -t) < u] * for -a' s u < O. The definitions of t and Yj imply that 2 * [o (t - t) < u] s [2 Yj S 0] = [2 - Yj 2 0]. Since the sub- lemma continues to hold when 6j and bj in the definition of Yj are replaced by -Ej and -oj respectively, we obtain, by applying the sublemma to .2 1[2 - Yj 2 O], that n- 2 * . §n_1[g (t - t) < u] 13 at most -1 (2.13) §(-B Z Pij) + bound in the Sublemma. Again since 3 Pij = (n-1)th(1 ' exp kO-Zu) 2 -2-1(n-1)ko-2N:ku where the inequality follows since ko-za' < l by the hypothesis on k, we obtain, by using the upper bound 32 in (2.9) and the definition of f in (2.11), that §(-B-12 Pij) is exceeded by @(2-1((n-l)k2hm)$5f u). Therefore §n_1[02(t* - t) < u] S §(%(n-l)k2hm)%f u) + bound in the Sublemma. 19 Integrating this inequality wrt u on [-aJ,O) and the inequality (2.12) wrt u on [0,af], then bounding their a first terms by using the inequality £§(-au)du S (2n) A(581 for any a > 0, we obtain, by using the inequality (2. 3), that 3 l . . 2 ngg +20 (bound in the P [‘w* - X - ozt‘] s “'1 ’9 fzn ((n-1>kh)f sublemma). Hence we complete the proof of the lemma by showing below that the Pn-integrals of m(h(k+h)-1)%f-1 and m(nhm)% (bound in the sublemma) are uniformly bounded in n. By definition of f in (2.11), we have k0 2(a+u W+‘X ‘) -1 _ 2 4% hm %e f - O (FDL _J) (PDL ) By bounding above (fijé/E:k)£ by using (3) of Lemma 3 and by bounding below fi:k/hm by using (2) of Lemma 3, we get an upper bound for (h(k+h)-1)%f-1. 
‘Weakening this upper bound for (h(k-l-’n)ml)s‘;f-1 by using the fact that 0 < h S k.< 1/5, we obtain that e2:2 -—-(3‘x L‘fi‘x“) 5 5% _h_, 1 . C (keh f 3 for some c5. Since (2n02)mp§ S exp(-o-%K‘x‘ - a)+)2) and (2no2)mf>2 2 exp - (o'2(a + ‘x‘)2), we obtain that the above upper bound for (h(k+h)-1)%f- is uniformly bounded hn Iland Pn-inte- grable. Now by using (2) of Lemma 3, and the inequality 0 < h< k < 1/5, we obtain that (nh m)% (bound in the sublemma) is exceeded by 20 °-2(‘Xt‘ + WE‘LL) e C 6 pt 2 -2 + 2 for some c6. Again, since (2n02)mpn S exp - (o ((‘X‘ r a) ) ) and (2noz)m 52 2 exp - (c-2(a +-‘X‘)2), we obtain that the hm/2 (bound in the Pn~integra1 of the above upper bound for sublemma) is uniformly bounded in n. This completes the proof of the lemma. The next lemma is a slight generalization of a particular case of Cauchy's mean value theorem (Graves (1946), p. 81). Lemma 5. For each j = l,...,n-l, i = l,...,m, let the func- tions fji’ gji be real valued, continuous on [ai’bi] and differentiable on (ai’bi) and let the derivative of gji be finite and positive. Then there exist c1 in (a1,b1),...,cm 1n (am’bm) such that b. 1 I 2 n fjiJai 2 n fji(ci) b. - E n g' (c.) 1 ji 1 2 n sjiJai where 3 stands for the summation over j from 1 through. n-l, n stands for product over i from 1 through m and prime over any function denotes its derivative. Proof. Define the functions g1 and n1 on [a1,b1] as follows. m bi 5(X)=Ef (X) Hf] 1 jl i=2jiai and m b n(x)=zg (x) m i l jl i=2 ji]ai 21 for x in [a1,b1]. With these definitions, we obtain that b b ' 1 2 1T £11181 €118. (2.14) 1 = —_L . bi b1 2 n gji-Jai nl]a1 Since fjl and gjl are continuous on [a1,b1] and dif— ferentiable on (a1,bl) for all j, so are g1 and n1. Moreover, since the derivative of gji is finite and positive for all j and i by assumption, so is the derivative of RI. Hence, applying Cauchy's mean value theorem to the rhs of (2.14), we obtain that there exists c1 in (a1,b1) such that b. 
1 2 " £31], §' (2.15) 1 = 1 1 _T___. bi “1(61) 2 " 8111a. 1 Now, we define g2 and n2 on [a2,b2] as follows. m bi _ I i=3 i and m b, .. 1 o 0 9 ' ' for x in [a2,b2]. Then ltbeIIOZS that the ratio §1(c1)/H1(c1) l O a a 2 2 . is identically the ratio §2]32/fl2]az. Again, §2 and n2 are continuous on [a2,b2] and differentiable on (a2,b2) Since szogjz are continuous on [a2,b2] and differentiable on (a2,b2) for all j. Also, since the derivative of gji is finite and positive for all j and i, the derivative of 22 n2 is finite and positive. Therefore, again using Cauchy's mean value theorem, the definitions of §2 and Hz and (2.15), we obtain the existence of c2 in (a2,b2) such that b. 1 ’3 " £1118 we) (2.16) 1 = 2 2 hi 112(C2) E n gji]ai Iterating the above procedure of obtaining (2.16) from (2.15) (m-2) times, we obtain the result of the lemma. We apply this lemma to prove the following lemma. 2 2 Lemma 6. ‘Lth coordinate of X +-a t - t‘ S k(1 +'gé9 2 o + h(l +-m Q7) for 1 = 1,...,m. 0 Proof. Let the dependency of t on X be suppressed and abbreviate the indication of the Lth coordinates of t and t by omission. Let H abbreviate h- E: and eL denote the unit vector in the Lth direction. Since t = k.1 (log H(X + k eL) - log H(X)), by the mean value theorem, there exists 6 in (0,1) such that alog H 8 XL L . 2 - . Since t - X = a a log p/a XL’ the above equality L together with the triangle inequality implies that 2 2 (2.18) ‘xL + o t - t‘ s o (‘11‘ + ‘12‘) where 10 - X+ek e4 (2.19) 11 = g—is—RJX L .m 23 and (2.20) I=M(X+eke)ram(X+eke). 2 3X, 4. ax, L By the mean value theorem, I1 = gk(azlog p/aXi)(X + 3*k eL) for some 3* in (O,e). With ej1,...,9 denoting the jm coordinates of ej, we have 2 2 - 2(x-e)p 2(X-e)p 02(1+02§log2)= t it 1_( t it if. all, 2 P1 2 Pj The rhs of this equality can be recognized as the conditional variance of the Lth coordinate of X - 9 given X when the pair (e,X) has the joint distribution resulting from Gn-l on e and P6 on X for given 9. 
Hence, since the Support of Gn-l is in the m-sphere of radius a, we obtain that 2 21 ' 2 (2.21) O’ ‘chlg._2‘ S 1 +15 . ‘ X G L 0 Hence 2 2 (2.22) o ‘11‘ s k(l +945). 0 We complete the proof of the lemma by showing that 02‘12‘ S h(1 +-maza-2) with the help of Lemma 5. The definition of H gives (2.23) (n-l)hmfl = z Fj D where, since the coordinates of Xj are independent, (xi-e, {HO/c 2.2 = ( 4) Fj[3 n Q](xi'eji)/° 24 Therefore, (X '9. +h)/U (X -6..+h)/o DH (XL jL i 1 Now we apply Lemma 5 to the ratio (aH/aXL)/H obtained by using (2.23), (2.24) and (2.25) with the following iden- " ‘ - = -1 .. tification. For all j l,...,n- -1, fji= gji 9(0 (y 931)) L'BQQJ 10-9%» and (ai’bi) = (c-1Xia 0.1(Xi + h)) for all 1. Then there exists for i # L, ij = ¢(o 1(y-9jL)). gj a o in (0,1)"1 such that a_%25_fl.= g—%95—E (x + he). 5 t L By subtracting 510g p/ax, and then applying the mean value theorem to this function of h, we obtain the existence of h' in (O,h) such that - m 2 - a XL 5 XL i=1 i axiaXL For i # L, we obtain directly that 2 - - - - - 4 a 103 p = 2(9JL XL)€931 xi)EJ, 2(GJL xt)pj , 2(911 Xi)pi X. X . . a13L ij ZPJ EPJ The rhs of this equality can be recognized as the i,Lth element in the covariance matrix of 9 - X conditional on X when the joint distribution of (9,X) results from Gn-l on e and P9 on X for given 9. Hence, since the support of Gn-l lies in m-sphere of radius a, it follows by Schwarz's inequality that S a for i # L. 25 This inequality, together with (2.21) and (2.26), implies that 2 oZP-i—ofi-Ii - LE-E—H s h(l +m9’—2-). a L a L o 2 Thus, by (2.20), ‘12‘ s h(1 + mg 0-2) and the proof of the lemma is complete. u Before stating a theorem as a Corollary to Lemmas 2, 4 and 6, we make a remark on the proof of Lemma 6. Remark 1. The method of proof of the lemma differs much from that of Gilliland (1966) for m = 1 case. He has never used ‘7’ . the fact that the conditional variances and covariances are uniformly bounded by explicit functions of a2. 
Moreover, the constants multiplying $k$ and $h$ in the result of the lemma are specific functions of $\alpha$ while those of Gilliland are complicated integrals. A proof similar to the proof obtained by particularizing our proof to $m = 1$ is simpler than that of Gilliland.

In the rest of the section, we let $h$ and $k$ depend on $n$. We assume in the theorem to be stated below that $\sigma^2 = 1$. The choices of $h$ and $k$ given in the following theorem are optimal for the convergence to 0 of the expression obtained by adding the right hand sides of Lemmas 4 and 6.

Theorem 1. If $h = n^{-1/(m+4)}$, $k = a\,n^{-1/(m+4)}$ for $a$ in $[1,\infty)$ and $\psi^{**}$ is defined by (2.1), then

$\underline{P}_n[\|\psi^{**} - \psi\|] = O(n^{-1/(m+4)})$

and

$D_n(\underline{\theta},\underline{\psi}^{**}) = O(n^{-1/(m+4)}).$

Proof. The first result is a direct consequence of (2.2), Lemmas 4 and 6 and the definitions of $h$ and $k$. Since, by definition, $\underline{\psi}^{**}$ is in $\times_n[-\alpha,+\alpha]^m$, the second result follows from the first result and Lemma 2 with $\sigma^2 = 1$.

§1.3 Rates Near $O(n^{-1/4})$ for $D_n(\underline{\theta},\underline{\hat\psi})$ with $\underline{\hat\psi}$ Based on Kernel Estimators for a Density and its Derivative

In this section, for each positive integer $s$ and $\gamma$ in $(0,1)$, we exhibit a procedure $\underline{\hat\psi}$ belonging to a class of procedures whose modified regret $D_n(\underline{\theta},\underline{\hat\psi})$ is $O(n^{-(s-1)\gamma/((2s+m)(1+\gamma))})$. The definition of $\underline{\hat\psi}$ depends on kernel estimators for a density and its derivative. These kernel estimators are similar to those defined by Johns and Van Ryzin (1967) for estimating the unconditional density and its derivative in the empirical Bayes two-action problem in exponential families.
For $\ell = 0,1,\ldots,m$, let $K_\ell$ be bounded with $\mu[\|u\|^s|K_\ell|] = c_{\ell s} < \infty$ and for all nonnegative integers $t_1,\ldots,t_m$,

(3.1)  $\mu\big[\prod_{j=1}^m u_j^{t_j} K_0\big] = 1$ or $0$ as $\sum_{j=1}^m t_j = 0$ or in $\{1,\ldots,s-1\}$

and, for $1 \le \ell \le m$, $u_\ell K_\ell$ satisfies (3.1) with $s$ replaced by $s-1$.

As a result of these conditions on $K_0,\ldots,K_m$ and their intent, if $f$ is a function on $R^m$ with partials of order $s$ uniformly bounded by $M$, then the substitution of the sth order Taylor expansion with Lagrange's form of the remainder shows

(3.2)  $|\mu[f K_0] - f(0)| \le M c_{0s}$

and if, in addition, all partials of $f$ not involving the $\ell$th variable vanish at 0,

(3.3)  $|\mu[f K_\ell] - f_\ell(0)| \le M c_{\ell s}$

where $f_\ell$ stands for the first partial of $f$ wrt the $\ell$th variable.

The notation to be introduced below is defined for each $n$. We abbreviate by omission the dependency on $n$ of the functions to be defined below. We let $\sum$ denote summation over $j$ from 1 to $n-1$. Let $\varepsilon, \delta$ be positive. As in section §1.2, let $X$ abbreviate $X_n$. Define

(3.4)  $\hat p_j = \varepsilon^{-m}K_0(\varepsilon^{-1}(X_j - X))$, $(n-1)\hat p = \sum \hat p_j$ and $\hat q = (\hat q_\ell)$

where

(3.5)  $(n-1)\hat q_\ell = \sum \hat q_{\ell j}$,

with $\hat q_{\ell j}$ formed, with normalization $\delta^{-(m+1)}$, from the difference of $K_\ell(\delta^{-1}I_\ell^{-1}(X_j - X))$ and $K_\ell(\delta^{-1}(X_j - X))$, where $I_\ell$ is the $m \times m$ identity matrix reduced by 1/2 in the $\ell$th diagonal element.

Now we state and prove some lemmas which will be useful in obtaining a rate of convergence for the modified regret of a certain procedure $\underline{\hat\psi}$ to be defined in the latter part of the section. Let $c_1, c_2, \ldots$ denote finite functions of $\sigma^2$. In the following lemmas, $\bar p$, the average of the densities of $X_1,\ldots,X_{n-1}$, and $\bar q$, the vector of partial derivatives of $\bar p$, are evaluated at $X$. We do not require the condition that $|\theta_n| \le \alpha$ to prove Lemmas 7 and 8.

Lemma 7. $\mathbf{P}_{n-1}[|\hat p - \bar p|] \le c_1(\varepsilon^s + ((n-1)\varepsilon^m)^{-1/2})$.

Proof. Since $\mu[p_j\,\varepsilon^{-m}K_0(\varepsilon^{-1}(\cdot - X))] = \mu[p_j(X + \varepsilon\,\cdot)K_0]$, its absolute difference from $p_j(X)$, by the uniform boundedness of partials of order $s$ of $p_j$ and (3.2), is at most $c_2\varepsilon^s$. Hence

(3.6)  $|\mathbf{P}_{n-1}[\hat p] - \bar p| \le c_2\varepsilon^s$.

Let $V_X(\hat p)$ denote the conditional variance of $\hat p$ given $X$. Since

$\mu[p_j\,\varepsilon^{-2m}K_0^2(\varepsilon^{-1}(\cdot - X))] = \varepsilon^{-m}\mu[p_j(X + \varepsilon\,\cdot)K_0^2] \le \varepsilon^{-m}(2\pi\sigma^2)^{-m/2}\mu[K_0^2]$

and $\mu[K_0^2] < \infty$,

(3.7)  $V_X(\hat p) \le c_3((n-1)\varepsilon^m)^{-1}$.

Since for any random variable $R$, $E|R| \le |ER| + \mathrm{Var}^{1/2}(R)$, (3.6) and (3.7) will yield the bound in the lemma with $c_1 = c_2 \vee c_3$.

Since $\sigma^2\|\bar q\|/\bar p \le \sqrt{m}\,\alpha + \|X\|$ and since $|\theta_n| \le \alpha$ implies that $P_n[\|X\|]$ is uniformly bounded, the following corollary is a direct consequence of Lemma 7.

Corollary 1. $\mathbf{P}_n[\|\bar q\|\,|(\hat p/\bar p) - 1|] \le c_4(\varepsilon^s + ((n-1)\varepsilon^m)^{-1/2})$.

Lemma 8. $\mathbf{P}_n[\|\hat q - \bar q\|] \le c_5(\delta^{s-1} + ((n-1)\delta^{m+2})^{-1/2})$.

Proof. In this proof, we abbreviate by omission the indication of the $\ell$th coordinates of $\hat q$ and $\bar q$. By two usages of the transformation theorem, $\mu[p_j\hat q_j]$ equals $\delta^{-1}$ times the $\mu$-integral of $K_\ell$ against a difference of $p_j(X + I_\ell\delta\,\cdot)$ and $p_j(X + \delta\,\cdot)$. Hence its absolute difference from the partial derivative of $p_j$ wrt the $\ell$th coordinate, by the uniform boundedness of partials of order $s$ of $p_j$ and (3.3), is at most $c_6\delta^{s-1}$. Hence,

(3.8)  $|\mathbf{P}_{n-1}[\hat q] - \bar q| \le c_6\delta^{s-1}$.

Let $V_X(\hat q)$ denote the conditional variance of $\hat q$ given $X$. By the inequality $(a+b)^2 \le 2(a^2 + b^2)$ for $a, b$ in $R^1$ and the transformations as above, we have

$\mu[p_j(\hat q_{\ell j})^2] \le (\delta^{m+2})^{-1}\mu[K_\ell^2(2p_j(X + I_\ell\delta\,\cdot) + 2p_j(X + \delta\,\cdot))]$.

Hence, since $\mu[K_\ell^2] < \infty$,

(3.9)  $V_X(\hat q) \le c_7((n-1)\delta^{m+2})^{-1}$.

Since for any random variable $R$, $E|R| \le |ER| + \mathrm{Var}^{1/2}(R)$, inequalities (3.8) and (3.9) yield the bound in the lemma with $c_5 = c_6 \vee c_7$.

Lemma 9. For any $a$ in (0,1), there exists a finite function of $\sigma^2$, $c_8$, such that $P_n[\bar p < \beta] \le c_8\beta^a$.

Proof. Let $M$ be the minimum value of $|z|$ for which the rhs of (3.10) is at most $\beta$. Since, for all $t$, $P[|z|^2/2 > t] \le e^{-bt}(1-b)^{-m/2}$ for $b$ in (0,1), we get from (3.10) that

$\beta^{-a}P_n[\bar p < \beta] \le \beta^{-a}P_n[|z| > M] \le c\,e^{\frac{a}{2}(M + 2\sigma^{-1}\alpha)^2 - bM^2/2}(1-b)^{-m/2}$

which is bounded in $M$ for $b > a$.

Corollary 2. For any $a$ in (0,1), there exists a function of $\sigma^2$, $c_9$, such that

$P_n[\|\bar q\|\,\bar p^{-1}[\bar p < \beta]] \le c_9\beta^a$.

Proof. Since $\sigma^2\|\bar q\|/\bar p \le \sqrt{m}\,\alpha + \|X\|$ and, therefore, has all moments, Hölder's inequality yields, for any $r > 1$, the bound

$P_n^{(r-1)/r}[(\|\bar q\|/\bar p)^{r/(r-1)}]\;P_n^{1/r}[\bar p < \beta]$

for the lhs of the corollary.
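By Lemma 9, $P_n^{1/r}[\bar p < \beta] \le c_8^{1/r}\beta^{b/r}$ for $b$ in $(a,1)$. Choosing $r$ such that $ar = b$, we get the result of the corollary.

Henceforth, we take $\delta$ to minimize the bound in Lemma 8. That is,

(3.11)  $\delta^{2s+m} = (n-1)^{-1}$.

We also choose $\varepsilon$ to be such that

(3.12)  $\delta^{(m+2)/m} \le \varepsilon \le \delta^{(s-1)/s}$.

Let $\beta$ be defined by $\beta^{1+\gamma} = \delta^{s-1}$ for any $\gamma$ in (0,1). With these choices for $\varepsilon$, $\delta$ and $\beta$, we define $\underline{\hat\psi}$ as follows. Let

(3.13)  $\hat\psi = \mathrm{tr}'(X + \sigma^2\hat q/\check p)$, $\check p = \hat p \vee \beta$,

where $\mathrm{tr}'$ stands for retraction to $[-\alpha,+\alpha]^m$ and, for $y$ in $R^1$, $\check y = y \vee \beta$.

In the following lemma, $\psi$ is evaluated at $X$.

Lemma 10. For each positive integer $s$ and $\gamma$ in (0,1), there exists $c_{10}$ such that if $\delta^{2s+m} = (n-1)^{-1}$, $\delta^{(m+2)/m} \le \varepsilon \le \delta^{(s-1)/s}$ and $\beta^{1+\gamma} = \delta^{s-1}$, then

$\mathbf{P}_n[\|\hat\psi - \psi\|] \le c_{10}(n-1)^{-\frac{s-1}{2s+m}\cdot\frac{\gamma}{1+\gamma}}$ for each $n > 1$.

Proof. Since $\psi$ lies in the m-sphere of radius $\alpha$ and $\hat\psi$ is the retraction of $X + \sigma^2\hat q/\check p$ to $[-\alpha,+\alpha]^m$, we have by using the inequality $\check p \ge \beta$

$\|\hat\psi - \psi\| \le \sigma^2\|\hat q/\check p - \bar q/\bar p\| \le \sigma^2\beta^{-1}\{\|\hat q - \bar q\| + \|\bar q\|\,|1 - \check p/\bar p|\}$.

Since $|\check p - \bar p| \le |\hat p - \bar p| + \beta[\bar p < \beta]$, the result of the lemma follows from the above inequality, Lemma 8, Lemma 7 and Corollary 2 and the hypothesis on $\varepsilon$, $\delta$ and $\beta$.

Now we state the main result of this section.

Theorem 2. If $\sigma^2 = 1$, the hypothesis of Lemma 10 is satisfied and $\underline{\hat\psi}$ is defined by (3.13), then

$D_n(\underline{\theta},\underline{\hat\psi}) = O(n^{-\frac{s-1}{2s+m}\cdot\frac{\gamma}{1+\gamma}})$.

Proof. Since $\underline{\hat\psi}$, by definition (3.13), lies in $\times_n[-\alpha,+\alpha]^m$, the theorem is a consequence of Lemma 2 with $\sigma^2 = 1$ and Lemma 10.

§1.4 Rates Near $O(n^{-1/2})$ for $D_n(\underline{\theta},{}_0\underline{\hat\psi})$ with ${}_0\underline{\hat\psi}$ a Particular $\underline{\hat\psi}$

Let $\sigma = 1$ and let $s > 1$ be a fixed integer throughout this section. Letting ${}_0\underline{\hat\psi}$ denote a specialization, less a retraction to $[\beta,\infty)$, of the $\underline{\hat\psi}$ of section §1.3, with certain additional assumptions on the kernels, we show that $D_n(\underline{\theta},{}_0\underline{\hat\psi}) = O(n^{-(s-1)/2(m+s+1)})$.

We specialize $\hat p$ and $\hat q$ (defined by (3.4) and (3.5) respectively) by setting $\varepsilon = \delta$ and denote their common value by $h$.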
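The bandwidth choices of section §1.3 can be tabulated numerically. The sketch below is an illustration under the reading $\delta^{2s+m} = (n-1)^{-1}$ for (3.11), $\delta^{(m+2)/m} \le \varepsilon \le \delta^{(s-1)/s}$ for (3.12) and $\beta^{1+\gamma} = \delta^{s-1}$; the function name and the layout of the returned tuple are hypothetical. It also returns the exponent $(s-1)\gamma/((2s+m)(1+\gamma))$ of Lemma 10, which satisfies $\beta^{\gamma} = (n-1)^{-\text{exponent}}$.

```python
def bandwidth_choices(n, s, m, gamma):
    """Return (delta, eps_lo, eps_hi, beta, rate_exponent).

    delta solves delta^(2s+m) = 1/(n-1); any eps in [eps_lo, eps_hi] is
    admissible; beta solves beta^(1+gamma) = delta^(s-1).
    """
    if n < 2 or s < 1 or m < 1 or not 0.0 < gamma < 1.0:
        raise ValueError("need n >= 2, s >= 1, m >= 1 and 0 < gamma < 1")
    delta = (n - 1) ** (-1.0 / (2 * s + m))
    eps_lo = delta ** ((m + 2) / m)      # lower end of the admissible range
    eps_hi = delta ** ((s - 1) / s)      # upper end of the admissible range
    beta = delta ** ((s - 1) / (1.0 + gamma))
    rate_exponent = (s - 1) * gamma / ((2 * s + m) * (1.0 + gamma))
    return delta, eps_lo, eps_hi, beta, rate_exponent


if __name__ == "__main__":
    for n in (101, 10001, 1000001):
        print(n, bandwidth_choices(n, s=4, m=2, gamma=0.9))
```

Since $\delta \le 1$ and $(m+2)/m > 1 > (s-1)/s$, the admissible range for $\varepsilon$ is never empty.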
Let (4.1) §=tr'(X+g) O x P where tr' (as in previous sections §1.2 and §l.3) stands for re- traction to the cube [-a,+a]m and any undefined ratios are taken to be zero. Let h2 = X - X, v 3 h(u + 61/5) and Yj(u) = J J with (4.2) Y (u)= 1x 01 -x -vK)oz,=h"‘+1q _hmv’f). LJ 2 L L L 0 J Lj j In the following lemma, y will be evaluated at X. let c1,c2,... denote constants. S - Lemma 11. If K0,...,Km are bounded with u[uuu KL] - CLs < m, KO satisfies (3.1) and 111 with 8 replaced by s-l and are such that for |u\ s 20, K1,...,uml{m satisfy condition (3.1) h S S a, -c X‘ Var (Y ), Va R Z ) X (4.3) CI e 2‘ s m L1 r( O 0 i 3 c3 eca‘ ‘, h ¢<|X|> then gn[now - 1H] s c5(((n-1)h28+m)% + 1 m+2 %)' ((11-1)h ) 34 Proof. Let the indication of the Lth coordinate of 0% and W be abbreviated by omission. Since 0% lies in [-a,+a] and since V lies in [-a,+a], it follows that \OW - 1‘ s 2a and ‘Ol - V‘ s ‘D| where (4.4) D ' '0) 1“». l "U I |<~ | Therefore 20! 2a (4.5) Eo-luol - M] s g gu_1[|n| > u]du = g 3.1-1“) > u]du 0 i-I £h_1[D < -u]du. -20 The main part of the proof bounds the integrands of the rhs of this inequality by using the Berry-Esseen theorem and (4.3). The rest of the proof shows that the Ph-integral of a bound for the rhs of (4.5) is at most the bound in the lemma. With 82 = Var(z YLJ) and L - 3-33 PJ‘YLJ - PJYleB, the standardized range bound for L, together with lhs inequality of (4.3), the inequality (4.6) M s h(3a + |xL|) and the fact that K ,...,K are bounded, implies that O m c7 (1 + h(3a + |xL|)) (4.7) L s c1((n-1)hm)?’¢”(\x‘)e for ‘u‘ S 2a . -c21X‘/2 Let 0 s u s 20. Then the definitions of D in (4.4), 32Lj in (4.2) imply that [D > u] s [2: Y“ > 0] + [‘5 < 0]. The Berry-Esseen theorem (Leave (1963), p. 288) and the triangle inequality imply that 2n_1[IHzJ >'0] is at most 35 +1 -1- (4.8) M-(n-Dhmfle 1pu>+\¢(-h““ a p u) Y,)| + B L. 
-l -§(B 23ij C M Since rhs inequality of (4.3) implies that 52 s c3hm¢(‘x\)e A , the first term in (4.8) can be bounded above by replacing B by this upper bound for 6. Also, by the equality (4.9) (n-l)hm+1p u +-z ijcj = (nbl)hm+1(h-1v( (p - p _[p]) + gmli’c‘h - c3). the lhs inequality in (4.3), the bounds (3.6), (3.8) and the in- equality (4.6) imply that the second term in (4.8) is at most c 8((n-1>hzs*““>}5 :% 2 (1 +-h(3o +-|XL‘)) "czTillz (4.10) WIXI) Hence, with f defined as the positive solution of the equation c4‘x‘ -2 (4.11) c3e ¢(‘X‘)f2 = p , we obtain that (4'12)Eh-1[2Ytj> o] s ¢(-(n-l)hm+2)%f u) + (4.10) + B rhs of (4.7) Now we consider -2a 5 u < O. The definitions of D in (4.4) and YLj in (4.2) imply that [D < u] s [E YLj [b s O]. The Berry-Esseen theorem and the triangle < O] + inequality imply that gn-1EEYLJ < 0] is at most (4.13) o((n-1)hm+15'15 u) + |§((n-1)hm+la'lp u) - o(-e'lszYLj)\ + BL. 36 c \X‘ Since the rhs inequality of (4.3) implies that 82 s c3hm¢(‘X\fe 4 , the first term in (4.13) is bounded by Q((n-l)hm+2)%f u) where f is the positive solution of (4.13). The lhs inequality of (4.3), the equality (4.9) and the bounds (3.6), (3.8) and (4.6) imply that the second term of (4.13) is at most (4.10). Therefore, _fl, 2)%f u) + (4.10) +-B rhs of (4.7). m+ (4.14) P 1[2YLJ<03 s 2(((n-1)h Integrating (4.12) wrt u over [0, 20] and (4.14) wrt u over [-Za,0), then bounding their first terms by using the inequality 20 g Q(-At)dt s A.1 for A >10, we obtain.(since the corresponding Berry- 2a Esseen, followed by normal tail bound, treatment of gn—l[§ S 0]du con- tributes no more than 1+q2/8 times the rest) that §n_l[‘o$-¢|] is at most 2 l 1 m+2)% %’+'4a[(4.10) +'B rhs of (4.7)]}(2 + %—). ((n-l)h Hence we complete the proof of the lemma by showing that the P - c ‘Xl/Z n . -l 2 -g integrals of f and (l + h(3a + |XL\))e ¢ (‘X‘) are uniformly bounded. 
, 2 + 2 m-Z 2 Since (211)mpn S exp -((‘X‘ - a) ) and (Zn) p 2 exp -(a+1x‘) , we obtain from the definition of f in (4.11) that pnf"1 is at most c |X\/2 c8 ¢<+>c”<|xl>e “ 95(le + a) which is u-integrable. Again by using the upper bound pn, we can cz‘Xl/Z % show that the Pn-integral of (1 + h(3a +-‘XL‘))e ¢ (‘X\) is uniformly bounded. This ends the proof of the lemma. Now we state the main result of the section. 37 Tfiieorem 3. If the kernel functions K -1/m+8+l n ,...,K satisfy the conditions m 0 (of Lemma 11, h = a where 0 < a s s.1 and Oi. is de- fined by (4.1), then D (g, i) = 0(n-(s-l)/2(s-lm+l)). n O Egggf. Since 0% lies in X [-a,+a]m, the result of the theorem is a direct consequence of Lgmma 11, the hypothesis on h and Lemma 2. Now we exhibit kernel functions K0,...,Km satisfying the conditions of Lemma 11. We develop these kernels in m = 2 case for the sake of simplicity of the notation. Let [cij] be an a X a matrix whose ijth element is Cij' For each pair of positive integers i,j, let Wi’j be the indicator function of the south-west quadrant of (i,j) intersected with the north-east quadrant of (0,0). We will determine [aij], [bijl] and [bijz] with only finitely many entries different from zero such that (4.15) K = z 1i’1,1<1= )3 hi ni’j and K2= z; b. 114 0 . i l . . 1'2 13.1 j 13.1 j 1’] J satisfy the conditions of Lemma 11. For any two positive integers S, T, let [aijjs T denote the modification of [aij] obtained by replacing aij by zero if i > S or j > T. We note that for any two sets of distinct non- negative integers, k1,...,kS and L1,...,LT, the vectors k L k L (4.16) [1 lj ST 11S T,..., [i 81 T13 T are a basis for R 38 kL kL .r.t_ . rt (For 2 crtil J ] - [0] lff z Crtx y = O has the roots {l,...,S} X {l,...,T}, which by iterative application of Descarte's rule of signs requires the crt to vanish.) We use this fact to show that certain norms are different from zero and to show that certain coefficients are zero. 
The kernel conditions (3.1) on K0 and K specialize to the following requirements on inner 1 products, 1 I = 1 (anol: [1111'sz = o 11 -3 2 3 +1. 3 5+1 1.] S LISLZD S £1 2 and 2 = = (Eb 1 [11.1.1.2 > = L1 2’ L2 1 ijl’ J] 0 4SL1+LZSS+1 We choose [aij] for simplicity to be the ‘1 4’2 projection of [lj]s,s on L {[i j 15,3‘1 g L1,L2, (4.17) 3 g L1 +~L2 3 3+1} divided by its squared norm, and in order to satisfy the variance requirements (4.3), we take b [bijl] to e projection of [izj] on L [[iLlez |(L L ) # (2 l) 3,8 13,3 l’ 2 ’ ’ (4.18) l 3 L1 3 s, 1 s L2 5 3} divided by its squared norm. The squared norms are non-zero by the aforenoted linear independence for (S,T) = (8,3). Mbreover, bSjl # 0 for some j in {l,...,s} for, otherwise [bijll defined in (4.18) will lie (s-1)s (s-l)s in R and is orthogonal to a basis in R , hence is 0. Let M = Max{j‘bSj * 0}. Interchanging i and j, we get a solution for [bijZ] such that K2 satisfies the kernel condgtions cultminating in (3.1). 39 and K 1 With A denoting a bound of K0, K 2, 3 2 (4.19) V3r<¥Lj) s A2(§'+-v) En-ltgj E (X,X + sh) X (X,X +-sh)]. By the mean value theorem, the probability on the rhs of this in- equality is szthj(X + §sh) for some g in the unit square. Hence, factoring out h2¢(‘X\), the restriction h s s-la and the inequality (4.6) show that the rhs of (4.19) is bounded by the rhs of (4.3) for suitable c3 and c4. Now we observe that YLJ defined by (4.2) takes finite number of values including zero and Z-lbsM' The probability that it takes the value zero is En-ID'Sj - x a! (0, sh) x (0, sh)] and that it takes z-lbsM is gn_1[gj - x e (2(s-1)h,2sh) x ((M-l)h,Mh)]. Therefore by L1.A of the Appendix, we obtain that (4.20) Var(Y1j) 2 c9 En-IEEj - X E (2(s-l)h, 23h) X ((M-l)h, Mh)]. By the mean value theorem, the probability on the rhs of this in- 2 equality is h p (X + §h) for some g in (2(s-l),25) X (M-1,M). 
J 2 - Hence, factoring out h ¢(|X|), the restriction h s 3 £1 shows that the rhs of (4.20) is bounded below by (4.3) for Suitable c1 and c2 when L = 1. Similarly that Var(Y2.) is bounded by lhs of J (4.3) can be similarly proved. By following the argument given above, we can show that Var-(K0 o Zj) also satisfies inequality (4.3). 40 ** §l.5 A Lower Bound for Dn(9,¢ ). In this section, we use the notation of section §1.2 specialized to the 02 = 1 case. Let ,c denote c1 2,... absolute constants. With (5.1) 92 = (n-l)k2hm, by using the Berry-Esseen theorem and Lemma 1 of the Appendix, ** 2 -2 we show that Dn(9’i' ) 2 c1 8 under certain conditions on 8. Theorem 4. 1f 36% +ih) a a < m, B a m and yf* is defined by (2.1), then ** 2 -2 Dn(_0_,y_ )2 c1 8 . ** Proof. Let the first coordinate of 1n be abbreviated by ** * W and let the indication of the first coordinate of t be abbreviated by omission. As in section §l.2, let X, with coordinates X ,...,X , abbreviate X . Our method of proof is 1 m ~n ** to show that §h[[X1 > a]‘¢ |] ,exceeds the square-root of the bound of the theorem. This completes the proof of the theorem 1 n ** 2 ** 2 E P and P 2 2 ** emu), | 1 1m, \ 1 2 ** 2.1m, llzznitxlndlv n. ** Since, by definition, \fi**| s a and since [‘W \ > u] = ** - since Dn(9,l_ ) = n * [\Xl +~t | > u] for u < a, we obtain by Fubini's theorem that 01 0' (5.2) §n[\¢**\] = g gn[‘x1 + t*‘ > u]du 2 Ph[[x1 > a] _1[x1+t* > ujduj. J's, O , m-l Let x 1n (a,m) x R and u be in (0.0) fixed until otherwise stated. As in section §l.2, let 41 ".= 611, .= x.<—i and 6J [h 1] 5] [N] m] ~ t(u-xp (5.3 Y. = 6 - 6 e . ) J j 1 With this definition of Yj’ we obtain that * (5.4) [x1 + t > u] = [8Y1 2 0] where~3, as in sections §1.2, §1.3 and §1.4, denotes summation over j from 1 to n-l. Note that X1 > 0 implies that ~ * [ZYj 2 0, Z 5 = 0, 2 5 = 0] C [X1 + t > u] for u < a. J 1 Since §1""’§n-l Y1,...,Yn_1. 
Hence, with B denoting the Berry-Esseen constant, the are i.i.d., so are Berry-Esseen theorem and (5.4) give that 3 * (n-l)%P1Y1_%P1|Y1-P1Y1\ (5.5) P [x + t > u] 2 6H ) B(n- 1) _11-1 1 SodoY 1 (so d.Y1)3 ek(u-X1) The definition of Y1 gives that P‘Y1=Ffjl- Ff]. Hence, since the alternative expression for FIE /Fj[j in the J L proof of (2) of Lemma 3 when Specialized to the case 2 k = ' = = ' = _ +._ 0' 3 L 1 gives that FlDl/FlD exp h(X1 2 +mh) for some w in (0,1), we obtain that k k(u-X1) '.k(u + - + (oh) 2 (5.6) Ply1 = F1131 e (e - 1) 2 -k F1D1(U +-;+ h) for k< (0+4)- where the inequality follows from the inequalities u < a < X1 and e'k - l 2 -x. Applying IJJA.(See Appendix) to the random variable Y1, we obtain, since Y1 takes value 1 with probability Ffjl’ 42 that Var (Y1) is at least (1 - F131 - FPWIDIO - FlCll). Hence, Since (1 - F131 - Ffj) is bounded away from zero for h < (a +-4)-1, we obtain that for some c2 > 0 2 -l (5.7) Var (Y1) 2 c2 Ffjl for h < (a +-4) , Using (5.6), (5.7) and the definition of B in (5.1), we obtain that, for k.< (a +'4)-1: P Y 5 1 1 g_ k . m/2 a o - 2 - + — = o (5 8) (n l) s.d.Y1 % (u 2 +-h)f With h f (Fill) c 2 3 The standardized range bound for Pl‘Yl - PiY1‘3/(s.d.Y1) , (u-X1)k with the help of the inequality range of Y1 s 1 + e s 2 since u < a < X1, (5.7) and the definition of 8, gives that, for h < (a + 4)'1, _ 3 Plhr1 P1Y1| (5.9) 3 .<. 2k (n-l)%(s.d.Y1) a czf Integrating the inequality obtained by weakening (5.5) with the help of (5.8) and (5.9) wrt u over (0,0), then using the transformation B(u +~%'+-h)f = cgv in the first integral, we obtain that 0' 12 C55 «Tl-:34") /'I:2 B 8 En-IEXI +'t* > ujdu 2 f2. I §(-v)dv - Zia, for k < (0+4)-1. (Lg—mus: Cif In view of (5.2), we complete the proof by showing that the Pn-integral of the first term of the rhs of this inequality on [x1 > a] converges to a positive constant while that of the second term on X > a conver as to zero. 
Since $f$, defined in (5.8), converges to $p_1^{1/2}$ and since specialization of (2) of Lemma 3 to the case of $\sigma^2 = \ell = 1$ and $n = 2$ gives that $f^{-1}$ is exceeded by $p_1^{-1/2}$ times an exponential function of $\|X\|$, which is $P_n$-integrable on $[X_1 > \alpha]$, it follows by the dominated convergence theorem and the hypothesis on $\beta$ that the $P_n$-integral on $[X_1 > \alpha]$ of the first term converges to a positive limit, while that of the second term converges to zero. The proof of the theorem is complete.

Now we make a remark concerning the procedures $\underline{\psi}^{**}$, $\underline{\hat\psi}$ and ${}_0\underline{\hat\psi}$ defined in sections §1.2, §1.3 and §1.4 respectively.

Remark 4. For the choice of $h$ and $k$ given in Theorem 1 of section §1.2, we obtain by the theorem proved above that $D_n(\underline{\theta},\underline{\psi}^{**}) \ge c\,n^{-2/(m+4)}$ for some $c > 0$. For any $\gamma > 0$, Theorem 2 of section §1.3 shows that we can define a procedure $\underline{\hat\psi}$ such that $D_n(\underline{\theta},\underline{\hat\psi}) = O(n^{-(1/4-\gamma)})$. Hence, since $\gamma < 1/36$ implies that $\frac14 - \gamma \ge \frac{2}{m+4}$ for $m \ge 5$, it follows that the procedure $\underline{\hat\psi}$ is better than $\underline{\psi}^{**}$ in the sense that

$\sup D_n(\underline{\theta},\underline{\hat\psi}) \le c_1 n^{-(1/4-\gamma)} \le c_2 n^{-2/(m+4)} \le \sup D_n(\underline{\theta},\underline{\psi}^{**})$

where the sup is taken over all parameter sequences.

For any positive integer $s$, Theorem 3 of section §1.4 shows that we can define a procedure ${}_0\underline{\hat\psi}$ such that $D_n(\underline{\theta},{}_0\underline{\hat\psi}) = O(n^{-(s-1)/2(s+m+1)})$. Hence, if $ms \ge 5m + 8$, which is equivalent to $\frac{s-1}{2(s+m+1)} \ge \frac{2}{m+4}$, the procedure ${}_0\underline{\hat\psi}$ is better than $\underline{\psi}^{**}$ in the sense described above.

§1.6 Extension of Results in Sections §1.2 and §1.3 to Constrained Mean Vectors and Unknown Covariance Matrix

Let $Y$ be a d-variate normal with mean $\omega$ and covariance matrix $\sigma^2 I$. If $\omega$ is assumed to lie in a lower dimensional subspace $\mathcal{S}$, say of dimension $m < d$, then the squared length of the projection of $Y$ onto the subspace orthogonal to $\mathcal{S}$ has expectation $\sigma^2(d-m)$ and variance $2\sigma^4(d-m)$. In this section, this fact is used to extend the results of sections §1.2 and §1.3.
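The moment fact just quoted can be checked by simulation. In the sketch below the subspace is taken, as an assumption made for convenience, to be spanned by the first m coordinate axes, so that projecting onto its orthogonal complement simply keeps the last d - m coordinates. The function names, the seed and the parameter values are illustrative only.

```python
import random


def squared_residual(y, m):
    """Squared length of the projection of y onto the last len(y) - m axes."""
    return sum(c * c for c in y[m:])


def simulate_residual(d, m, sigma, omega, n_rep, seed):
    """Sample mean and variance of the squared residual over n_rep draws.

    omega must have zeros in its last d - m coordinates, i.e. it must lie in
    the subspace spanned by the first m coordinate axes.
    """
    if len(omega) != d or any(w != 0.0 for w in omega[m:]):
        raise ValueError("omega must lie in the span of the first m axes")
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        y = [w + sigma * rng.gauss(0.0, 1.0) for w in omega]
        values.append(squared_residual(y, m))
    mean = sum(values) / n_rep
    var = sum((v - mean) ** 2 for v in values) / (n_rep - 1)
    return mean, var


if __name__ == "__main__":
    d, m, sigma = 5, 2, 1.5
    mean, var = simulate_residual(d, m, sigma, [0.7, -0.3, 0.0, 0.0, 0.0], 40000, seed=1)
    print("mean:", mean, "theory:", sigma ** 2 * (d - m))
    print("variance:", var, "theory:", 2 * sigma ** 4 * (d - m))
```

Dividing the squared residual by d - m therefore gives an unbiased estimate of $\sigma^2$ that does not involve the unknown mean.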
Let $\{\underline{Z}_n\}$ be a sequence of independent random variables with $\underline{Z}_n$ distributed as d-variate normal with unknown covariance matrix $\sigma^2 I$ and mean $\omega_n$ belonging to an m-dimensional subspace $\mathcal{S}_n$ of $R^d$ intersected with the d-sphere of radius $\alpha$. While stating the results of the present section in section §1.0, we interchanged $m$ and $d$ in order to make proper references to sections §1.2 and §1.3. Let $B_n$ be an orthogonal matrix whose first $m$ columns generate $\mathcal{S}_n$. Let $X_n$ and $\theta_n$ denote the vectors formed by the first $m$ coordinates of $B_n'\underline{Z}_n$ and $B_n'\omega_n$ respectively, where $B_n'$ is the transpose of $B_n$. Let $(d-m)Z_n$ denote the squared length of the projection of $\underline{Z}_n$ onto the subspace which is orthogonal to $\mathcal{S}_n$. Let $E$ stand for expectation wrt the joint distribution of $X_1,\ldots,X_n$, $Z_1,\ldots,Z_n$.

This section is divided into two subsections. In the first subsection, with the help of the procedure $\underline{\psi}^{**}$ defined in (2.1), we exhibit a procedure $\underline{T}^{**}$ for which $D_n(\underline{\theta},\underline{T}^{**}) = O(n^{-1/(m+4)})$ for each $\sigma^2$. In the second subsection, for each positive integer $s$ and each $\gamma$ in (0,1), with the help of the procedure $\underline{\hat\psi}$ defined by (3.13), we exhibit a $\underline{\hat T}$ for which $D_n(\underline{\theta},\underline{\hat T}) = O(n^{-(s-1)\gamma/((2s+m)(1+\gamma))})$ for each $\sigma^2$. Let $\tilde\sum$ denote summation over $i$ from 1 to $n$.

§1.6.1 Definition of $\underline{T}^{**}$ and a Rate of Convergence for $D_n(\underline{\theta},\underline{T}^{**})$

In this subsection, we use the notation of section §1.2. We require the following notation for each $n$, but, as in earlier sections, we suppress the dependency on $n$ of the functions to be defined below.

Define $\underline{T}^{**} = \{T^{**}_n\}$ as follows,

(6.1)  $T^{**} = \mathrm{tr}'(X + n^{-1}\tilde\sum Z_i\;\mathrm{tr}_\lambda\,t^*)$

where $\mathrm{tr}'$ (as in section §1.2) and $\mathrm{tr}_\lambda$ stand for retractions to $[-\alpha,+\alpha]^m$ and $\times_{\ell=1}^m[-\lambda^{-1}(|X_\ell| + \alpha + k + h),\ \lambda^{-1}(|X_\ell| + \alpha + k + h)]$ respectively.

Let $T^*$ be the modification of $T^{**}$ obtained by replacing $\mathrm{tr}'$ in the definition of $T^{**}$ by retraction to the cube $[-\alpha',+\alpha']^m$ where $\alpha' = \alpha + k + h$. Let $\tilde T$ be the modification of $T^*$ obtained by replacing $n^{-1}\tilde\sum Z_i$ in the definition of $T^*$ by $\sigma^2$. Let $c_1, c_2, \ldots$ denote finite functions of $\sigma^2$.

Lemma 12. $E\|T^* - \tilde T\| \le c_1/(\lambda\sqrt{n})$.

Proof. Since the distance between two points retracted in $R^m$ to the same cube is at most the distance between the points and since $\lambda\|\mathrm{tr}_\lambda t^*\| \le \|X\| + m\alpha'$, we obtain that

$\lambda\|T^* - \tilde T\| \le (\|X\| + m\alpha')\,|n^{-1}\tilde\sum Z_i - \sigma^2|$.

Since $(d-m)Z_1/\sigma^2,\ldots,(d-m)Z_n/\sigma^2$ are i.i.d. $\chi^2$ random variables with $d-m$ degrees of freedom, application of the Schwarz inequality to the rhs of the last inequality and the fact that $E[(\|X\| + m\alpha')^2]$ is bounded by a finite function of $\sigma^2$ completes the proof of the lemma.

Theorem 5. If $h = n^{-1/(m+4)}$, $k = a\,n^{-1/(m+4)}$ for $a$ in $[1,\infty)$, $\lambda^{2(m+4)} = n^{-(m+2)}$ and $\underline{T}^{**}$ is defined by (6.1), then

$D_n(\underline{\theta},\underline{T}^{**}) = O(n^{-1/(m+4)})$ for each $\sigma^2$.

Proof. Let $\sigma^2$ be fixed. In the proof, we consider only those $n$ for which $\lambda < \sigma^2$. Since $|\psi| \le \alpha$ and since $T^{**} = \mathrm{tr}'T^*$, it follows that $\|T^{**} - \psi\| \le \|T^* - \psi\|$ and hence

$\|T^{**} - \psi\| \le \|T^* - \tilde T\| + \|\tilde T - \psi\|$.

If the $\ell$th coordinate of $t^*$ (its negative) exceeds $\lambda^{-1}(|X_\ell| + \alpha')$, then, since $\lambda < \sigma^2$, $\tilde T$ and $\psi^*$ defined by (2.1) turn out to equal $\alpha'$ (its negative). Hence, $\tilde T = \psi^*$. Therefore the last inequality, together with Lemmas 12, 4 and 6 and the definitions of $\lambda$, $h$ and $k$, implies that $E\|T^{**} - \psi\| = O(n^{-1/(m+4)})$. Since $\underline{T}^{**}$, by definition (6.1), takes values in $\times_n[-\alpha,+\alpha]^m$, Lemma 2 and this order relation give the result of the theorem.

§1.6.2 Definition of $\underline{\hat T}$ and a Rate of Convergence of $D_n(\underline{\theta},\underline{\hat T})$

In this subsection, we use the notation of section §1.3. We require the following notation for each $n$, but as in previous sections, we suppress the dependency on $n$ of the functions to be defined below.

Define $\underline{\hat T} = \{\hat T_n\}$ as follows,

$\hat T = \mathrm{tr}'(X + n^{-1}\tilde\sum Z_i\;\mathrm{tr}_\lambda(\hat q/\check p))$

where $\mathrm{tr}'$ (as in section §1.3) and $\mathrm{tr}_\lambda$ stand for retractions to $[-\alpha,+\alpha]^m$ and $\times_{\ell=1}^m[-\lambda^{-1}(|X_\ell| + \alpha),\ \lambda^{-1}(|X_\ell| + \alpha)]$ and $\check p = \hat p \vee \beta$ (see (3.13)). Let $\tilde T$ be the modification of $\hat T$ obtained by replacing $n^{-1}\tilde\sum Z_i$ in the definition of $\hat T$ by $\sigma^2$.
As a consequence of replacing $T^*$ by $\hat T$, $\tilde T$ of subsection §1.6.1 by $\tilde T$ of this subsection and $\alpha'$ by $\alpha$ in the proof of Lemma 12, we obtain the following lemma.

Lemma 13. $E\|\hat T - \tilde T\| \le c_1/(\lambda\sqrt{n})$.

Now we state and prove the main result of the subsection.

Theorem 6. If the hypothesis of Lemma 10 is satisfied, $\underline{\hat T}$ is defined by (1.6.2) and $\lambda = n^{-\frac12 + \frac{s-1}{2s+m}\cdot\frac{\gamma}{1+\gamma}}$, then

$D_n(\underline{\theta},\underline{\hat T}) = O(n^{-\frac{s-1}{2s+m}\cdot\frac{\gamma}{1+\gamma}})$ for each $\sigma^2$.

Proof. Let $\sigma^2$ be fixed. In the proof, we consider only those $n$ for which $\lambda < \sigma^2$. If the $\ell$th coordinate of $\hat q/\check p$ (its negative) exceeds $\lambda^{-1}(|X_\ell| + \alpha)$, then, since $\lambda < \sigma^2$, $\tilde T$ and $\hat\psi$ defined by (3.13) turn out to equal $\alpha$ (its negative). Hence $\tilde T = \hat\psi$. Therefore the inequality $\|\hat T - \psi\| \le \|\hat T - \tilde T\| + \|\hat\psi - \psi\|$, together with Lemmas 13 and 10 and the hypothesis of the theorem, gives that $E\|\hat T - \psi\| = O(n^{-\frac{s-1}{2s+m}\cdot\frac{\gamma}{1+\gamma}})$. Since, by definition, $\hat T$ is in $[-\alpha,+\alpha]^m$, this order relation and Lemma 2 complete the proof of the theorem.

CHAPTER II

RATES IN THE ESTIMATION AND TWO-ACTION PROBLEMS FOR A FAMILY OF SCALE PARAMETER $\Gamma(\alpha)$ DISTRIBUTIONS

§2.0 Introduction and Notation

For $0 < a < b < 2a < \infty$ and $\alpha > 2$, let $\mathcal{P} = \{P_\theta \mid \theta \in [a,b]\}$ be the family of distributions with $P_\theta$ representing the $\Gamma(\alpha)$ distribution with scale parameter $\theta$. Let $s$ be a positive integer. Let $\{X_n\}$ be a sequence of independent random variables with $X_n$ distributed as $P_{\theta_n}$ belonging to $\mathcal{P}$. Let $\underline{X}_n = (X_1,\ldots,X_n)$, $\underline{\theta} = \{\theta_n\}$ and $G_n$ be the empiric distribution of $\theta_1,\ldots,\theta_n$.

In section §2.1, we consider a sequence of estimation problems each having the structure of the following component estimation problem. Based on an observable random variable $X$ whose distribution $P_\theta$ belongs to $\mathcal{P}$, the problem is to estimate $\theta$ with squared-error loss. Let $R(G_n)$ denote the Bayes risk against $G_n$ in the estimation problem just described. Let $\underline{\phi} = \{\phi_n\}$ be a randomized sequence-compound procedure (abbreviated to randomized procedure hereafter). That is, for each $n$, $\phi_n$ is a randomized function of $\underline{X}_n$.
For any such $\underline{\phi}$ and $\underline{\theta}$ in $\times_n[a,b]$, let

(0.1)  $D_n(\underline{\theta},\underline{\phi}) = n^{-1}\sum_{j=1}^n E|\phi_j - \theta_j|^2 - R(G_n)$

where $E$ stands for expectation wrt the joint distribution of all the random variables involved. In section §2.1, we exhibit a randomized procedure $\underline{\psi}^* = \{\psi^*_n\}$ such that $D_n(\underline{\theta},\underline{\psi}^*) = O(n^{-s/2(s+1)})$ uniformly in all parameter sequences $\underline{\theta}$ in $\times_n[a,b]$.

In section §2.2, we consider a sequence of two-action problems each having the structure of the following component two-action problem. Based on an observable random variable $X$ whose distribution $P_\theta$ belongs to $\mathcal{P}$, the problem is to choose one of two possible actions $a_1$ and $a_2$ when the loss functions corresponding to $a_1$ and $a_2$ are $L(a_1,\theta) = (\theta - c)^+$ and $L(a_2,\theta) = (\theta - c)^-$ for some $c$ in $(a,b)$. Let $R(G_n)$ denote the Bayes risk against $G_n$ in the two-action problem described above. Then, in section §2.2, we exhibit a randomized procedure $\underline{\hat\phi} = \{\hat\phi_n\}$ such that the absolute value of $D_n(\underline{\theta},\underline{\hat\phi})$ defined by

(0.2)  $D_n(\underline{\theta},\underline{\hat\phi}) = n^{-1}\sum_{j=1}^n E\,L(\hat\phi_j,\theta_j) - R(G_n)$

is $O(n^{-s/2(s+1)})$ uniformly in all parameter sequences $\underline{\theta}$ in $\times_n[a,b]$.

The orders stated in the results of both sections §2.1 and §2.2 are uniform in all parameter sequences $\underline{\theta}$ in $\times_n[a,b]$. Hence, in order to reduce the complexity of the statements of the results in this chapter, the range of the parameter sequences will not be exhibited, but is understood to be $\times_n[a,b]$.

We introduce some notation which is common to both sections §2.1 and §2.2. Let $\{\lambda_n\}$ be a sequence of i.i.d. random variables with the density of $\lambda_1$ as $(\alpha-1)\lambda_1^{\alpha-2}[0 < \lambda_1 < 1]$ wrt Lebesgue measure $\mu$ on $((0,\infty), \mathcal{B}\cap(0,\infty))$. Furthermore, we assume that $\{\lambda_n\}$ is independent of $\{X_n\}$. Define, for each $n$, $Y_n = \lambda_n X_n$. Then, $Y_n$ has the $\Gamma(\alpha-1)$ distribution with scale parameter $\theta_n$. We let $\sum$ and $\sum'$ denote summations over $j$ from 1 to $n-1$ and over $i$ from 1 to $s$ respectively.

Now we introduce some notation which is similar to that introduced in section §1.4.
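The distributional claim for $Y_n = \lambda_n X_n$ made above can be checked by simulation. The following is a minimal sketch using Python's standard `random` module; the function name and the parameter values are hypothetical. The multiplier $\lambda$ is generated by inverting its distribution function $u^{\alpha-1}$ on (0,1), and the sample mean and variance of $Y$ are compared with the $\Gamma(\alpha-1)$ values $(\alpha-1)\theta$ and $(\alpha-1)\theta^2$.

```python
import random


def randomized_observation(rng, alpha, theta):
    """Return (X, Y) with X ~ Gamma(alpha), scale theta, and Y = lambda * X.

    lambda has density (alpha - 1) * l^(alpha - 2) on (0, 1) and is drawn
    independently of X by inverting its distribution function l^(alpha - 1).
    """
    x = rng.gammavariate(alpha, theta)            # second argument is the scale
    lam = rng.random() ** (1.0 / (alpha - 1.0))   # inverse-CDF draw of lambda
    return x, lam * x


if __name__ == "__main__":
    rng = random.Random(3)
    alpha, theta = 3.5, 1.2
    ys = [randomized_observation(rng, alpha, theta)[1] for _ in range(40000)]
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
    print("mean:", mean, "theory:", (alpha - 1.0) * theta)
    print("variance:", var, "theory:", (alpha - 1.0) * theta ** 2)
```

Matching two moments does not prove the claim, but it is a quick guard against mistakes in the parametrization (scale versus rate) of the gamma family.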
Since the Vandermonde determinant involved does not vanish, there exists a unique vector $d = (d_1,\ldots,d_s)$ in $R^s$ such that $d_s \neq 0$ and

(0.3)  $\sum' d_i\,(i^\ell - (i-1)^\ell) = 1$ for $\ell = 1$, and $= 0$ for $\ell = 2,\ldots,s$.

For any $h > 0$ and any real valued function $g$ on $(0,\infty)$, define $\Delta g(u) = h^{-1}(g(u+h) - g(u))$ for $u > 0$.

With $\bar F$ and $\bar H$ denoting the averages of the distributions of $X_1,\ldots,X_{n-1}$ and $Y_1,\ldots,Y_{n-1}$ respectively and with $X$ abbreviating $X_n$, let

(0.4)  $\bar\pi = \sum' d_i\,\Delta\bar F(X + (i-1)h)$

and

(0.5)  $\bar\xi = \sum' d_i\,\Delta\bar H(X + (i-1)h)$.

With $F^*$ and $H^*$ denoting the empiric distributions of $X_1,\ldots,X_{n-1}$ and $Y_1,\ldots,Y_{n-1}$, let

(0.6)  $\pi^* = \sum' d_i\,\Delta F^*(X + (i-1)h)$

and

(0.7)  $\xi^* = \sum' d_i\,\Delta H^*(X + (i-1)h)$.

Let $p_\theta$ and $q_\theta$ denote the densities of the $\Gamma(\alpha)$ distribution with scale parameter $\theta$ and the $\Gamma(\alpha-1)$ distribution with scale parameter $\theta$ respectively. Let $\bar p$ and $\bar q$ denote the densities of $\bar F$ and $\bar H$ respectively. With $p_\theta^{(s)}$ and $q_\theta^{(s)}$ denoting the sth order derivatives of $p_\theta$ and $q_\theta$ respectively, we assume throughout this chapter that $\alpha$ is such that

(0.8)  $\sup\{|p_\theta^{(s)}|, |q_\theta^{(s)}| : a \le \theta \le b\} < \infty$.

Under this assumption (0.8), it follows from the condition on $d$ in (0.3) and (3.2) of Chapter I that

(0.9)  $|\bar\pi - \bar p| \le k_6 h^s$

and

(0.10)  $|\bar\xi - \bar q| \le k_7 h^s$

where $k_6$ and $k_7$ are constants.

§2.1 Estimation Problem. Rates of Convergence for $D_n(\underline{\theta},\underline{\psi}^*)$ with $\underline{\psi}^*$ Based on Kernel Estimators for a Density

In this section, under certain conditions on $\mathcal{P}$, we show that, for each positive integer $s$, the modified regret of the procedure $\underline{\psi}^*$ (to be defined below by (1.2)) is $O(n^{-s/2(s+1)})$ when the component problem involved is the estimation problem described in section §2.0. The method of proving this rate of convergence is similar to that of Theorem 3 of Chapter I.

Let $\psi$ denote the Bayes estimate against $G_{n-1}$ in the component estimation problem described in section §2.0. Then $\psi$ can be expressed as

(1.1)  $\psi(u) = \dfrac{u\,\bar q(u)}{(\alpha-1)\,\bar p(u)}$ for $u > 0$.

Define the procedure $\underline{\psi}^*$ as follows. Let

(1.2)  $\psi^* = \mathrm{tr}\Big(\dfrac{X\,\xi^*}{(\alpha-1)\,\pi^*}\Big)$

where $\mathrm{tr}$ stands for retraction to $[a,b]$.
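The construction above can be sketched in code. The example below is an illustration under the reading of (0.3), (0.6), (0.7) and (1.2) as a linear system for $d$, forward differences of empirical distribution functions, and a retracted ratio; all function names are hypothetical, and an undefined ratio (zero denominator) is taken to be zero before retraction. The coefficients are solved exactly with rational arithmetic.

```python
from fractions import Fraction


def difference_coefficients(s):
    """Solve sum_i d_i * (i^l - (i-1)^l) = 1 if l == 1 else 0, for l = 1..s."""
    rows = [[Fraction(i ** l - (i - 1) ** l) for i in range(1, s + 1)]
            + [Fraction(1 if l == 1 else 0)] for l in range(1, s + 1)]
    # Gauss-Jordan elimination on the augmented matrix, exact over the rationals.
    for col in range(s):
        pivot = next(r for r in range(col, s) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(s):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][s] for i in range(s)]


def smoothed_difference(sample, x, h, d):
    """sum_i d_i * (fraction of sample in (x + (i-1)h, x + ih]) / h."""
    n = len(sample)
    total = 0.0
    for i, di in enumerate(d, start=1):
        count = sum(1 for v in sample if x + (i - 1) * h < v <= x + i * h)
        total += float(di) * count / (n * h)
    return total


def psi_star(xs, ys, x, h, d, alpha, a, b):
    """Retraction to [a, b] of x * xi_star / ((alpha - 1) * pi_star)."""
    pi_star = smoothed_difference(xs, x, h, d)
    xi_star = smoothed_difference(ys, x, h, d)
    ratio = 0.0 if pi_star == 0.0 else x * xi_star / ((alpha - 1.0) * pi_star)
    return min(max(ratio, a), b)


if __name__ == "__main__":
    for s in (1, 2, 3):
        print(s, difference_coefficients(s))
```

Here `xs` and `ys` play the roles of $X_1,\ldots,X_{n-1}$ and $Y_1,\ldots,Y_{n-1}$, and `x` that of the current observation $X_n$.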
Any undefined ratios are taken to be zero. Let K1,K2,... denote constants in this section. Let E stand for the expectation wrt the joint distribution of the random variables involved unless otherwise Specified. In the following lemma, q, 5 are evaluated at X. Lemma 1. If a > 2, b < 2a, (0.8) is satisfied and h is in N'= {h‘O < h((a-1)a-1 V (a-Z)a-2) < ea-2F(a-l)a r 8-1} for some r in (0,%), then 55 5 + 23+l % E[‘¢* - ((X)\] s K1((nh)- (nh ) ). Proof. We have by the definition of conditional expectation * 'k (1.3) Bill - 1(x>l] = E[E[\l - l|X11 where E["X] stands for the conditional expectation operation given X. * Since w , by definition (1.2), is the retraction of 7': 3': .. .. XE /(a-l)fl to [a,b] and since t(X) = Xq/(a-1)p, the Bayes estimate against G whose support lies in [a,b], is in n-l [a,b], we have ‘¢* - t‘ S b-a and ‘w* - Y‘ S X(a-l)-1\D‘ where * - (1.4) D = S; - 3'- 0 p We then have * b-a * EE‘Y ' W(X)“X] S g PE‘W - Y‘ > ujdu b-a -1 (1-5) SS PUD| > (a-l)x ujdu b-a _1 o -1 = 1 FEB > 01-1)X u] +-f P[D < (o-l)x ujdu o a-b where P stands for the joint probability measure of X1,...,Xn_1 and Y ..,Y 1" The main part of the proof bounds P[D > (a-l)X-1u] n-l' for 0 S u S b-a and P[D < (a-l)X-1u] for a-b S u < O by using the Berry-Esseen theorem. The rest of the proof shows that the expectation of a bound for the rhs of (1.5) is exceeded by the bound in the lemma. 56 Let X > 0 be fixed until otherwise stated. Let (1.6) Zj(u) = z'di([x + (i-1)h < Yj < x + ihj - ((a-1)X-1u + %9[X + (i-l)h < Xj < x + ihj) p for \u\ s b-a. Let the dependency of Zj on u be suppressed. Let 82 = Var(£Zj) and L = B-3ZE‘ZJ - EZj‘B. Now we prove the following Sublemma. Sublemma. For any \u‘ S b-a and any constant k2 such that ‘di‘ S k2 for i = 1,...,S, k2(1 + 2(a-1)x‘1b) L s g _ 1 . k4((n-1)h) (1H(x+(s-1>h>>2 Proof. In order to obtain the result of the sublemma, we need a lower bound for 52. 
This bound will be obtained by applying L1.A (see Appendix) to the Zj' Since Yj = ijj and Since the distribution of xj is supported on (0,1), P[Yj 2 Xj] = 0 and hence Zj defined by (1.6) takes 2-ls(s+5) + 1 values; namely, 0, di-dj((q-l)X-1u + %) for 1 S i S j S s, (1.7) p d1 for i = 1,...,8 and -di((a-1)X-1u + %) for i = 1,... p with nonzero probability. The probability that Zj takes the value zero in (1.7) is given by P[xj é (X,X + sh), Yj 4 (X,X + Sh)]. Since ,S. 57 e-uum S e.mmm for m > 0 and since a > 2 and ej > 2 by assumption, we have U xx+h = 1 x-hgh(“——j)'o’1 ejd P(Xj6(, s)] “Maj e u 8"(0’ 2) (1‘2 Sf73:133'((a 1)a V (0;?) )Sh. (1.8) m__ = 1 X+Sh (u )w _2 ej PLYJ. E (X,X + sh)] —-—1.(a_1)ej 111(8 du -(a-2) , e a-l a-Z S W ((01-1) V (oz-2) )sh. Therefbre, it follows by the hypothesis on h that P[Xj E (X,X + sh)] and PEYj E (X,X + sh)] are exceeded by r. Hence, since P(A n B) 2 P(A) + P(B) - l for any two events A, B, we obtain that P[xj (E (X,X + sh), Yj 4 (X,X + sh)] > 1-2r. Hence, since Zj takes the value dS With probability P(X + (S-l)h < Yj < X + sh S Xj], we obtain by L1.A. that (1.9) Var(ZJ.) 2 k§(P[X + (s-1)h < Yj < X + sh S Xj'j) vahere k; = d:(1-2r) inf{1 - P(u + (s-l)h < Yj < u + sh]\u > 0, h E R3. Vie observe that k3 # 0 since (18 # 0 and Zr < 1. Hence, Since [x + (S-1)h <3!j < x +sh, xj - Yj 2 h] c [x + (s-l)h < ‘Yij < X + shS %] and since Xj - Yj and Yj are independent, Vve: obtain that 2 . Var(YJ.) 2 k3 in£{P[xj - Yj 2 hj\h e %}P[X + (s-l)h < Y], < x + sh]. Ufllerefore, 2 2 '- (1.10) e 2 k4(n-l)h i H(X + (s-l)h) to 2 _ 2 . llere k4 - k3 1nf{P[Xj - Yj 2 h]\h 6 fl}. 58 Since ‘u‘ S b-a and since xq/(a-1)p S b, the maximum of the moduli of the values of'(-Z,)in.(l.7) is at J most (1.11) k2(1 + 2(a-l)X-1b) where k2 is the constant stated in the sublemma. Therefore, the standardized range bound for L, together with the help of (1.10) and (1.11), gives the result of the Sub- lemma. 
Proceeding with the proof of the lemma, we obtain an upper bound for $\beta^2$. The definition of $Z_j$ in (1.6) and (1.11) imply that

$EZ_j^2 \le k_2^2(1 + 2(\alpha-1)X^{-1}b)^2\big(F_j\big]_X^{X+sh} + H_j\big]_X^{X+sh}\big)$

where $F_j$ and $H_j$ are the distributions of $X_j$ and $Y_j$ respectively. Therefore, since $\beta^2 = \sum \mathrm{Var}(Z_j) \le \sum EZ_j^2$, we obtain that

(1.12) $\beta^2 \le k_2^2(1 + 2(\alpha-1)X^{-1}b)^2\,(n-1)\big(\bar F\big]_X^{X+sh} + \bar H\big]_X^{X+sh}\big)$.

Let $0 \le u \le b-a$. Then the definitions of $D$ in (1.4) and $Z_j$ in (1.6) imply that $[D > (\alpha-1)X^{-1}u] \subset [\sum Z_j > 0] \cup [p^* \le 0]$. Hence, with $b(L)$ denoting the bound in the sublemma and $B$ denoting the Berry-Esseen constant, by the Berry-Esseen theorem, the sublemma and the triangle inequality, we obtain that $P[\sum Z_j > 0]$ is exceeded by

(1.13) $\Phi(-\beta^{-1}(n-1)h(\alpha-1)X^{-1}\bar p\,u) + |\Phi(-\beta^{-1}(n-1)h(\alpha-1)X^{-1}\bar p\,u) - \Phi(\beta^{-1}\sum EZ_j)| + B\,b(L)$.

By using the upper bound for $\beta^2$ in (1.12), we obtain that

(1.14) $\beta^{-1}(n-1)h(\alpha-1)X^{-1}\bar p \ge ((n-1)h)^{1/2} f$

where $f$ is the positive solution of the equation

(1.15) $k_2^2 X^2(1 + 2(\alpha-1)X^{-1}b)^2\big(\bar F\big]_X^{X+sh} + \bar H\big]_X^{X+sh}\big) f^2 = h(\alpha-1)^2\bar p^2$.

Since

$\sum EZ_j + (n-1)h(\alpha-1)X^{-1}\bar p\,u = (n-1)h\big\{((\alpha-1)X^{-1}u + \bar q/\bar p)(\bar p - Ep^*) + Eq^* - \bar q\big\}$

for all $|u| \le b-a$ and $X\bar q/((\alpha-1)\bar p) \le b$, it follows from (0.9) and (0.10) that

(1.16) $|\sum EZ_j + (n-1)h(\alpha-1)X^{-1}\bar p\,u| \le (n-1)h^{s+1}(2K_6 b(\alpha-1)X^{-1} + K_7)$

for all $|u| \le b-a$. Therefore, it follows by the mean value theorem and the lower bound for $\beta^2$ in (1.10) that the second term in (1.13) is exceeded by

(1.17) $\dfrac{((n-1)h^{2s+1})^{1/2}(2K_6 b(\alpha-1)X^{-1} + K_7)}{k_4(\Delta\bar H(X + (s-1)h))^{1/2}}$.

Hence it follows from (1.13) and (1.14) that

(1.18) $P[\sum Z_j > 0] \le \Phi(-((n-1)h)^{1/2} f u) + (1.17) + B\,b(L)$

for $0 \le u \le b-a$.

Let $a-b \le u < 0$. Then the definitions of $D$ in (1.4) and $Z_j$ in (1.6) imply that $[D < (\alpha-1)X^{-1}u] \subset [\sum(-Z_j) > 0] \cup [p^* \le 0]$. Hence, since the sublemma continues to hold if $d$ is replaced by $-d$, we have by the Berry-Esseen theorem, the triangle inequality and the sublemma that $P[\sum(-Z_j) > 0]$ is exceeded by

(1.19) $\Phi(\beta^{-1}(n-1)h(\alpha-1)X^{-1}\bar p\,u) + |\Phi(\beta^{-1}(n-1)h(\alpha-1)X^{-1}\bar p\,u) - \Phi(-\beta^{-1}\sum EZ_j)| + B\,b(L)$.
By using the upper bound for $\beta^2$ in (1.12), we obtain that the first term of (1.19) is bounded by $\Phi(((n-1)h)^{1/2} f u)$ where $f$ is the positive solution of (1.15). By using (1.16) and the lower bound for $\beta^2$ in (1.10), we obtain that the second term of (1.19) is exceeded by (1.17). Hence, it follows from (1.19) that

$P[D < (\alpha-1)X^{-1}u] \le \Phi(((n-1)h)^{1/2} f u) + (1.17) + B\,b(L)$

for $a-b \le u < 0$. Integrating this inequality wrt $u$ over $[a-b, 0)$ and the inequality (1.18) wrt $u$ over $[0, b-a]$, then bounding their first terms by using the inequality $\int_0^{b-a}\Phi(-Au)\,du \le (2\pi)^{-1/2}A^{-1}$ for $A > 0$, we obtain from (1.5) that

$E[|\psi^* - \psi| \mid X] \le \dfrac{2}{(2\pi(n-1)h)^{1/2} f} + 2(b-a)\,(1.17) + 2(b-a)\,B\,b(L)$.

In view of this inequality, (1.17) and the bound in the sublemma, we continue the proof of the lemma by showing that $f^{-1}$ and $(1 + X^{-1})(\Delta\bar H(X + (s-1)h))^{-1/2}$ have uniformly bounded $P_n$-expectations.

Since $\bar F\big]_X^{X+sh} / \bar H\big]_X^{X+sh} = \bar p(X + \varepsilon sh)/\bar q(X + \varepsilon sh)$ for some $\varepsilon$ in $(0,1)$ by Cauchy's mean value theorem, $X\bar q/((\alpha-1)\bar p) \ge a$, and since $f$ is defined as the positive solution of (1.15), we obtain that $f^{-1}$ is exceeded by

(1.20) $\dfrac{k_2 X(1 + 2(\alpha-1)X^{-1}b)\big(\bar H\big]_X^{X+sh}\big)^{1/2}\big(1 + ((\alpha-1)a)^{-1}(X + \varepsilon sh)\big)^{1/2}}{h^{1/2}(\alpha-1)\bar p}$.

Since $\bar H\big]_X^{X+sh} = sh\,\bar q(X + \delta sh)$ for some $0 < \delta < 1$, $(\alpha-1)^{-1}\bar q(X + \delta sh)/\bar p(X) \le b(X + s)^{\alpha-2}/X^{\alpha-1}$ and $b^{-\alpha}e^{-X/a} \le \Gamma(\alpha)X^{-(\alpha-1)}p_j \le a^{-\alpha}e^{-X/b}$, the condition $b < 2a$ implies that the expectation of the upper bound (1.20) for $f^{-1}$ is uniformly bounded. Since, by the mean value theorem, $\Delta\bar H(X + (s-1)h) = \bar q(X + \varepsilon(s-1)h)$ for some $\varepsilon$ in $(0,1)$ and since $\Gamma(\alpha-1)b^{\alpha-1}\bar q \ge X^{\alpha-2}e^{-X/a}$ and $\Gamma(\alpha)a^{\alpha}p_n \le X^{\alpha-1}e^{-X/b}$, the conditions $b < 2a$ and $\alpha > 2$ imply that the expectation of $X^{-1}(\Delta\bar H(X + (s-1)h))^{-1/2}$ is uniformly bounded.

The same method of bounding $P[p^* \le 0]$ completes the proof.

Now we state and prove the main result of the section. This result is a consequence of Lemma 1, Theorem 2.1 and (2.5) of Gilliland (1968).

Theorem 1.
If $\alpha > 2$, $b < 2a$, (0.8) is satisfied, $h = \gamma n^{-1/(s+1)}$ with $0 < \gamma((\alpha-1)^{\alpha-1} \vee (\alpha-2)^{\alpha-2}) < e^{\alpha-2}\,\Gamma(\alpha-1)\,a\,r\,s^{-1}$ for some $r$ in $(0,\tfrac12)$ and if $\underline{\psi}^*$ is defined by (1.2), then $D_n(\underline{\theta}, \underline{\psi}^*) = O(n^{-s/(2(s+1))})$.

Proof. Since $p_\theta(u)$ is exceeded by $(\Gamma(\alpha))^{-1}a^{-\alpha}u^{\alpha-1}e^{-u/b}$ uniformly in all $\theta$ belonging to $[a,b]$ and $\mu[u^{\alpha-1}e^{-u/b}] < \infty$, it follows by Theorem 2.1 of Gilliland (1968) that

$n^{-1}\sum_{j=1}^{n} E[|\psi_j(X_j) - \psi_{j-1}(X_j)|] = O(n^{-1}\log n)$

where $O(n^{-1}\log n)$ is uniform in all parameter sequences $\underline{\theta}$ in $\times_{n=1}^{\infty}[a,b]$. Since the inequality (2.5) of Gilliland (1968) continues to hold when the $\varphi_i$ mentioned there are randomized procedures and since $\underline{\psi}^*$, by definition (1.2), takes values in $[a,b]$, it follows by (1.19) that

$|D_n(\underline{\theta}, \underline{\psi}^*)| \le 4b\,n^{-1}\sum_{j=1}^{n} E[|\psi_j^* - \psi_{j-1}(X_j)|] + O(n^{-1}\log n)$.

Hence the result of the theorem follows from Lemma 1 and the definition of $h$ in the statement of the theorem.

§2.2 Two-action Problem. Rates of Convergence for $D_n(\underline{\theta}, \underline{\varphi})$ with $\underline{\varphi}$ Based on Kernel Estimators for a Density

In this section, under certain conditions on $\mathcal{P}$ we show that, for each positive integer $s$, the modified regret of the procedure $\underline{\varphi}$ (defined below by (2.5) and (2.6)) is $O(n^{-s/(2(s+1))})$ when the component problem involved is the two-action problem described in section §2.0. The method of proving this rate of convergence is similar to that of Johns (1967).

In this section, we make it a convention that the value of any decision function is the probability of taking action $a_1$. Define, for each $n$,

(2.1) $\gamma_n = (\theta_n - c)\,p_n$.

If $R(G_n)$ denotes the Bayes risk against $G_n$ in the component two-action problem described in section §2.0, then

$R(G_n) = \inf_{\varphi}\big\{\mu[\varphi\, n^{-1}\sum_{j=1}^{n}\gamma_j] + n^{-1}\sum_{j=1}^{n}(\theta_j - c)^-\big\}$.

Hence, with $m_n$ defined by

(2.2) $m_n = \sum_{j=1}^{n}\gamma_j$,

(2.3) $n R(G_n) = -\mu[m_n^-] + \sum_{j=1}^{n}(\theta_j - c)^-$.

With $E$ denoting the expectation operation, for any randomized procedure $\underline{\varphi} = \{\varphi_n\}$, the risk of using $\varphi_n$ to decide about $\theta_n$ is given by $(\theta_n - c)E[\varphi_n] + (\theta_n - c)^-$ and hence the average risk of using $\varphi_1,\ldots,\varphi_n$ to decide about $\theta_1,\ldots,\theta_n$ respectively is given by

$n^{-1}\sum_{j=1}^{n}\mu\big[\gamma_j(u)E[\varphi_j \mid X_j = u]\big] + n^{-1}\sum_{j=1}^{n}(\theta_j - c)^-$

where $E[\varphi_j \mid X_j = u]$ is a conditional expectation of $\varphi_j$ given $X_j = u$. Hence, it follows from (0.2) and (2.3) that

(2.4) $n D_n(\underline{\theta}, \underline{\varphi}) = \mu[\gamma_1\varphi_1] + \mu\big[\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-(u)\big]$.

Let $h > 0$ be a function of $n$. We define $\underline{\varphi} = \{\varphi_n\}$ as follows. Let

(2.5) $\varphi_1 = 1$

and, for $n > 1$,

(2.6) $\varphi_n = [Xq^*/(\alpha-1) < c\,p^*]$

where $p^*$ and $q^*$ are defined by (0.6) and (0.7) respectively and $X$ is an abbreviation for $X_n$. Let, for $n > 1$,

(2.7) $S_{n-1} = (n-1)\big(u\,q^*/(\alpha-1) - c\,p^*\big)$ for $u > 0$

where $q^*$ and $p^*$ are evaluated at $Y_1,\ldots,Y_{n-1}, u$ and $X_1,\ldots,X_{n-1}, u$ respectively,

(2.8) $m_{n-1}^* = E[S_{n-1}]$ for $u > 0$

and

(2.9) $\beta_{n-1}^2 = \mathrm{Var}(S_{n-1})$ for $u > 0$.

Lemma 2. If $\alpha > 2$, $b < 2a$, (0.8) is satisfied and $h$ is in $\mathcal{H} = \{h \mid 0 < h((\alpha-1)^{\alpha-1} \vee (\alpha-2)^{\alpha-2}) < e^{\alpha-2}\,\Gamma(\alpha-1)\,a\,r\,s^{-1}\}$ for some $0 < r < \tfrac12$, then

$\mu\big[|\gamma_n|\,|\Phi(-m_{n-1}^*/\beta_{n-1}) - \Phi(-m_{n-1}/\beta_{n-1})|\big] \le K_1((n-1)h^{2s+1})^{1/2}$ for $n > 1$.

Proof. By the mean value theorem and the inequality $\sqrt{2\pi}\,\Phi' \le 1$, we obtain that

(2.10) $|\Phi(-m_{n-1}^*/\beta_{n-1}) - \Phi(-m_{n-1}/\beta_{n-1})| \le \dfrac{|m_{n-1}^* - m_{n-1}|}{\sqrt{2\pi}\,\beta_{n-1}}$.

Since (0.8) is satisfied by the hypothesis, it follows from (0.9), (0.10) and the definitions of $m_{n-1}^*$ in (2.8) and $m_{n-1}$ in (2.2) that

(2.11) $|m_{n-1}^* - m_{n-1}| \le (n-1)h^s\big(\tfrac{u}{\alpha-1}K_6 + cK_7\big)$.

Now we get a lower bound for $\beta_{n-1}^2$. Let

(2.12) $hZ_j(u) = \sum_{i=1}^{s} d_i\big(u(\alpha-1)^{-1}[u + (i-1)h < Y_j < u + ih] - c[u + (i-1)h < X_j < u + ih]\big)$.

Then, since $P[Y_j \ge X_j] = 0$, $hZ_j$ defined above takes $2^{-1}s(s+5) + 1$ values; namely,

(2.13) $0$; $d_iu(\alpha-1)^{-1} - d_jc$ for $1 \le i \le j \le s$; $d_iu(\alpha-1)^{-1}$ for $i = 1,\ldots,s$; and $-d_ic$ for $i = 1,\ldots,s$

with nonzero probability. The probability that $hZ_j$ takes the value zero in (2.13) is given by $P[X_j \notin (u, u + sh),\ Y_j \notin (u, u + sh)]$.
Then, it follows by the hypothesis on $h$, (1.8) and the inequality $P(A \cap B) \ge P(A) + P(B) - 1$ for any two events $A$ and $B$ that this probability is at least $1 - 2r > 0$. Hence, since $hZ_j$ takes the value $d_s$ with probability $P[u + (s-1)h < Y_j < u + sh \le X_j]$, we have from L1.A (see Appendix) that

(2.14) $\mathrm{Var}(hZ_j) \ge (1-2r)\,d_s^2\,P[u + (s-1)h < Y_j < u + sh \le X_j]\,\big(1 - P[u + (s-1)h < Y_j < u + sh \le X_j]\big)$.

Hence, since $\inf\{1 - P[u + (s-1)h < Y_j < u + sh \le X_j] \mid u > 0,\ \theta_j \in [a,b],\ h \in \mathcal{H}\} > 0$, by using the argument given to obtain (1.10) from (1.9), we obtain that

(2.15) $h\,\beta_{n-1}^2 \ge k_4^2\,(n-1)\,\Delta\bar H(u + (s-1)h)$.

Therefore, we have from (2.10) and (2.11) that

(2.16) $|\Phi(-m_{n-1}^*/\beta_{n-1}) - \Phi(-m_{n-1}/\beta_{n-1})| \le \dfrac{((n-1)h^{2s+1})^{1/2}\big(K_6\tfrac{u}{\alpha-1} + cK_7\big)}{k_4(\Delta\bar H(u + (s-1)h))^{1/2}}$.

We have

(2.17) $a^{\alpha}\,\Gamma(\alpha)\,|\gamma_n| \le (b + c)\,u^{\alpha-1}e^{-u/b}$.

By the mean value theorem,

(2.18) $\Delta\bar H(u + (s-1)h) = \bar q(u + \varepsilon h)$ for some $\varepsilon$ in $(s-1, s)$.

Hence, since

(2.19) $b^{\alpha-1}\,\Gamma(\alpha-1)\,\bar q(u) \ge u^{\alpha-2}e^{-u/a}$,

it follows by the hypothesis on $h$ that

(2.20) $b^{\alpha-1}\,\Gamma(\alpha-1)\,\Delta\bar H(u + (s-1)h) \ge u^{\alpha-2}e^{-1-u/a}$.

Since $b < 2a$ the result follows from the inequalities (2.16), (2.17) and (2.20).

Let

(2.21) $L_{n-1} = \beta_{n-1}^{-3}\sum E|Z_j - EZ_j|^3$

where $Z_j$ is defined by (2.12).

Lemma 3. If the hypothesis of Lemma 2 is satisfied, then

$\mu[|\gamma_n|L_{n-1}] \le K_3((n-1)h)^{-1/2}$ for $n > 1$.

Proof. The standardized range bound for $L_{n-1}$, together with (2.15) and the fact that the maximum of the moduli of the values of $hZ_j$ defined in (2.12) is at most

(2.22) $\max\{|d_1|,\ldots,|d_s|\}\,(u(\alpha-1)^{-1} + c)$,

implies that

$L_{n-1} \le \dfrac{\max\{|d_1|,\ldots,|d_s|\}\,(u(\alpha-1)^{-1} + c)}{k_4((n-1)h)^{1/2}(\Delta\bar H(u + (s-1)h))^{1/2}}$.

Since $b < 2a$ and $\alpha > 2$ imply that the $u$-integral of the rhs of the inequality obtained by weakening this inequality for $L_{n-1}$ by using (2.20) is uniformly bounded, the proof of the lemma is complete.

Below, we get an upper bound for $\beta_{n-1}^2$.
We have by the definition of $hZ_j$ in (2.12) and (2.22) that

$h^2 EZ_j^2 \le (\max\{|d_1|,\ldots,|d_s|\})^2\,(u(\alpha-1)^{-1} + c)^2\big(F_j\big]_u^{u+sh} + H_j\big]_u^{u+sh}\big)$

where $F_j$ and $H_j$ are the distribution functions of $X_j$ and $Y_j$ respectively. Therefore, since $\beta_{n-1}^2 \le \sum EZ_j^2$, we have

(2.23) $h^2\beta_{n-1}^2 \le (\max\{|d_1|,\ldots,|d_s|\})^2\,(u(\alpha-1)^{-1} + c)^2\,(n-1)\big(\bar F\big]_u^{u+sh} + \bar H\big]_u^{u+sh}\big)$.

Lemma 4. $\mu[\beta_{n-1}] \le K_4((n-1)h^{-1})^{1/2}$ for $n > 1$.

Proof. By (2.23) and the inequality $(a+b)^{1/2} \le a^{1/2} + b^{1/2}$ for $a, b > 0$, we have

$((n-1)h^{-1})^{-1/2}\beta_{n-1} \le \max\{|d_1|,\ldots,|d_s|\}\big((h^{-1}\bar F\big]_u^{u+sh})^{1/2} + (h^{-1}\bar H\big]_u^{u+sh})^{1/2}\big)$.

By the mean value theorem $h^{-1}\bar F\big]_u^{u+sh} = s\,\bar p(u + \varepsilon sh)$ and $h^{-1}\bar H\big]_u^{u+sh} = s\,\bar q(u + \delta sh)$ for some $0 < \varepsilon, \delta < 1$. Hence, since $\Gamma(\alpha)a^{\alpha}\bar p \le u^{\alpha-1}e^{-u/b}$, $\Gamma(\alpha-1)a^{\alpha-1}\bar q \le u^{\alpha-2}e^{-u/b}$ and $u^{\alpha-2}e^{-u/b}$ is $u$-integrable, the result of the lemma follows.

The proof of the following theorem depends on Lemmas 2, 3, 4 and part of the method of proof of Theorem 1 of Johns (1967).

Theorem 2. For each positive integer $s$, if (0.8) is satisfied, $\alpha > 2$, $b < 2a$, $h = \gamma n^{-1/(s+1)}$ where $\gamma((\alpha-1)^{\alpha-1} \vee (\alpha-2)^{\alpha-2}) < e^{\alpha-2}\,\Gamma(\alpha-1)\,a\,r\,s^{-1}$ for some $0 < r < \tfrac12$ and $\underline{\varphi}$ is defined by (2.5) and (2.6), then $D_n(\underline{\theta}, \underline{\varphi}) = O(n^{-s/(2(s+1))})$.

Proof. By (2.4) and the definition of $\underline{\varphi}$ in (2.5) and (2.6), we have

(2.24) $n|D_n(\underline{\theta}, \underline{\varphi})| \le \mu[|\gamma_1|] + \big|\mu\big[\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-(u)\big]\big|$.

To start with, we consider bounding the integrand of the second term of the rhs of (2.24) on the set $[m_n > 0]$. Afterwards we consider the case when $m_n \le 0$. So, let $m_n \ge 0$ until otherwise stated. Since $m_n \ge 0$, we have by the definition of $\varphi_j$ in (2.6) and $S_{j-1}$ in (2.7) for $j > 1$ that

(2.25) $\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-(u) = \sum_{j=2}^{n}\gamma_j(u)P[S_{j-1} < 0]$

where $P$ stands for the joint probability measure of all the random variables involved. By the triangle inequality,

(2.26) $\big|\sum_{j=2}^{n}\gamma_j(u)P[S_{j-1} < 0]\big| \le |D_1| + |D_2| + |D_3|$

where

(2.27) $D_1 = \sum_{j=2}^{n}\gamma_j\big(P[S_{j-1} < 0] - \Phi(-m_{j-1}^*/\beta_{j-1})\big)$,

(2.28) $D_2 = \sum_{j=2}^{n}\gamma_j\big(\Phi(-m_{j-1}^*/\beta_{j-1}) - \Phi(-m_{j-1}/\beta_{j-1})\big)$

and

(2.29) $D_3 = \sum_{j=2}^{n}\gamma_j\,\Phi(-m_{j-1}/\beta_{j-1})$.

With $B$ denoting the Berry-Esseen constant, the Berry-Esseen theorem (Loève (1963), p. 288) gives

$|D_1| \le B\sum_{j=2}^{n}|\gamma_j|L_{j-1}$.

Therefore, by Lemma 3,

(2.30) $\mu\big[[m_n > 0]\,|D_1|\big] \le K_3\sum_{j=2}^{n}((j-1)h)^{-1/2}$.

By Lemma 2, we have

(2.31) $\mu\big[[m_n > 0]\,|D_2|\big] \le K_1\sum_{j=2}^{n}((j-1)h^{2s+1})^{1/2}$.

Replacing $a_j$ and $s_j$ by $\gamma_j$ and $\beta_j$ in (2.6) through (2.13) of Theorem 1 of Johns (1967), we obtain that

(2.32) $|D_3| \le \Phi(0)\sum_{j=2}^{n}\dfrac{\gamma_j^2}{\beta_{j-1}} + \sum_{j=2}^{n}\dfrac{|\gamma_j|}{j^2} + (\beta_n + \beta_1)\big(\lambda_1\Phi(-\lambda_1) + 2\Phi(0)\big)$

where $\max_{\lambda > 0}\lambda\Phi(-\lambda) = \lambda_1\Phi(-\lambda_1)$.

The lower bound for $\beta_{j-1}^2$ in (2.15), the lower bound for $\Delta\bar H(u + (s-1)h)$ in (2.20) and the upper bound for $\gamma_j$ in (2.17), together with the conditions $b < 2a$ and $\alpha > 2$, imply that $\mu[\gamma_j^2/\beta_{j-1}] \le K_5((j-1)h^{-1})^{-1/2}$. Hence, since (2.17) implies that $\mu[|\gamma_j|]$ is uniformly bounded, it follows from (2.32) and Lemma 4 that

$\mu\big[[m_n > 0]\,|D_3|\big] \le K_8\Big(\sum_{j=2}^{n}((j-1)h^{-1})^{-1/2} + \sum_{j=2}^{n}\dfrac{1}{j^2} + (nh^{-1})^{1/2} + 1\Big)$.

Hence (2.25) to (2.32) imply that

(2.33) $\big|\mu\big[[m_n > 0]\big(\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-\big)\big]\big| = \big|\mu\big[[m_n > 0]\sum_{j=2}^{n}\gamma_j(u)P[S_{j-1} < 0]\big]\big|$
$\le K_9\Big(\sum_{j=2}^{n}((j-1)h)^{-1/2} + \sum_{j=2}^{n}((j-1)h^{2s+1})^{1/2} + \sum_{j=2}^{n}((j-1)h^{-1})^{-1/2} + \sum_{j=2}^{n}\dfrac{1}{j^2} + (n/h)^{1/2} + 1\Big)$.

Now we consider bounding the integrand of the second term of the rhs of (2.24) on $[m_n < 0]$. For $u$ in $[m_n \le 0]$, we have by the definition of $\varphi_j$ in (2.6) and $S_{j-1}$ in (2.7) for $j > 1$ that

$\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-(u) = -\gamma_1(u) - \sum_{j=2}^{n}\gamma_j(u)P[S_{j-1} \ge 0]$.

Following the same argument we gave to bound $\sum_{j=2}^{n}\gamma_j(u)P[S_{j-1} < 0]$ by the rhs of (2.33), we obtain that

$\big|\mu\big[[m_n < 0]\big(\sum_{j=2}^{n}\gamma_j(u)E[\varphi_j \mid X_j = u] + m_n^-(u)\big)\big]\big| \le$ rhs of (2.33) $+\ \mu[|\gamma_1|]$.

Since (2.17) implies that $\mu[|\gamma_1|]$ is uniformly bounded, this inequality and (2.33), together with (2.24) and the hypothesis concerning $h$, imply the result of the theorem.
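The last step of the proof is an order-of-magnitude computation: with $h = \gamma n^{-1/(s+1)}$, each of the dominant terms in the rhs of (2.33) is of order $n^{1 - s/(2(s+1))}$, so dividing by $n$ gives the stated rate. The following sketch illustrates this numerically and is not part of the thesis. It assumes that $h$ inside the sums is read as the stage-dependent value $h_j = \gamma j^{-1/(s+1)}$, it sets the constant $K_9$ to 1, and the function names `regret_bound_over_n` and `target_rate` are hypothetical.

```python
def regret_bound_over_n(n, s, gamma=1.0):
    """Terms of the rhs of (2.33) with K_9 = 1, divided by n.
    Assumption: the bandwidth at stage j is h_j = gamma * j**(-1/(s+1))."""
    def h(j):
        return gamma * j ** (-1.0 / (s + 1))

    stages = range(2, n + 1)
    t1 = sum(((j - 1) * h(j)) ** -0.5 for j in stages)               # Berry-Esseen term (2.30)
    t2 = sum(((j - 1) * h(j) ** (2 * s + 1)) ** 0.5 for j in stages)  # bias term (2.31)
    t3 = sum(((j - 1) / h(j)) ** -0.5 for j in stages)               # from (2.32)
    t4 = sum(1.0 / (j * j) for j in stages)                          # from (2.32)
    t5 = (n / h(n)) ** 0.5 + 1.0                                     # from Lemma 4
    return (t1 + t2 + t3 + t4 + t5) / n


def target_rate(n, s):
    """The rate n**(-s / (2(s+1))) claimed in Theorem 2."""
    return n ** (-s / (2.0 * (s + 1)))


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        ratio = regret_bound_over_n(n, s=2) / target_rate(n, s=2)
        print(n, ratio)  # the ratio should stay bounded as n grows
```

A bounded ratio is only consistent with the $O(n^{-s/(2(s+1))})$ claim; it does not replace the argument, which also needs the uniformity in $\underline{\theta}$ supplied by Lemmas 2, 3 and 4.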
APPENDIX

We apply the following lemma for obtaining lower bounds for certain variances in Lemmas 4 and 11 of Chapter I and Lemmas 1, 2 and 3 of Chapter II. The inequality in the following lemma is trivially true when $p_0 = 1$.

Lemma 1.A. Let $p_0 < 1, p_1, \ldots, p_i, \ldots$ be a probability distribution on $\{0, 1, \ldots, i, \ldots\}$ and let $Z$ be the r.v. $Z(i) = z_i$ for specified $z_0 = 0, z_1, \ldots, z_i, \ldots$ with $\sum z_i^2 p_i < \infty$. Let $q_i$ abbreviate $1 - p_i$ and let $I(\lambda) = \sum_{1}^{\infty} p_i(1 - \lambda q_i)^{-1}$. Then $I \uparrow$ from $1 - p_0$ to $\#\{i \ge 1 \mid p_i > 0\}$ as $\lambda \uparrow$ from $0$ to $1$ and

$\mathrm{Var}(Z) \ge \lambda_1 \sum z_i^2 p_i q_i$

with $\lambda_1$ the unique root of $I(\lambda) = 1$. Since $I(p_0) \le 1$, $\lambda_1 \ge p_0$.

Proof. $I \uparrow$ since each summand $p_i(1 - \lambda q_i)^{-1}$ with $p_i > 0$ $\uparrow$. Since equality holds in the inequality when $\lambda_1 = 1$, we consider below the case $\lambda_1 < 1$.

To prove the inequality when $\lambda_1 < 1$, let $\psi(z) = \mathrm{Var}(Z) - \lambda_1 \sum z_i^2 p_i q_i$ for $z = (z_1, z_2, \ldots)$. Denoting the first and second partials wrt $z_j$ by $\psi_j$ and $\psi_{jj}$ respectively,

$\psi_j(z) = 2p_j\{(1 - \lambda_1 q_j)z_j - \sum z_i p_i\}$, $\quad \psi_{jj}(z) = 2(1 - \lambda_1)p_j q_j$.

For $j$ with $p_j > 0$, $\psi$ is, therefore, minimal wrt $z_j$ variation iff $z_j = (1 - \lambda_1 q_j)^{-1}\sum z_i p_i$. These conditions are satisfied iff, for some constant $c$, $z_j = c(1 - \lambda_1 q_j)^{-1}$ for $j$ with $p_j > 0$. For such $z$,

$\psi(z) = c^2\{\sum p_i(1 - \lambda_1 q_i)^{-2} - 1 - \lambda_1 \sum p_i q_i(1 - \lambda_1 q_i)^{-2}\} = 0$

which yields the nonnegativity of $\psi$ asserted by the lemma.

BIBLIOGRAPHY

Clemmer, Bennie A. and Krutchkoff, Richard G. (1968). The use of empirical Bayes estimators in a linear regression model. Biometrika 55, 525-534.

Fox, Richard (1968). Contributions to compound decision theory and empirical squared error loss estimation. RM-214, Department of Statistics and Probability, Michigan State University.

Gilliland, Dennis (1966). Approximation to Bayes risk in sequences of non-finite decision problems. RM-162, Department of Statistics and Probability, Michigan State University.

Gilliland, Dennis (1968). Sequential compound estimation. Ann. Math. Statist. 39, 1890-1904.

Graves, Lawrence M. (1956).
The theory of functions of real variables, 2nd Edition. Macmillan.

Hannan, James F. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3, 97-139. Ann. Math. Studies No. 39, Princeton University Press.

Hannan, J. (1964). Mathematical Reviews 27, 828.

Johns, M.V., Jr. (1967). Two-action compound decision problems. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 463-478. University of California Press.

Johns, M.V., Jr. and Van Ryzin, J. (1967). Convergence rates for empirical Bayes two-action problems II. Continuous case. Technical Report No. 132, Department of Statistics, Stanford University.

Loève, Michel (1963). Probability Theory, 3rd Edition. Van Nostrand.

Martz, Harry F., Jr. and Krutchkoff, Richard G. (1969). Empirical Bayes estimators in a multiple linear regression model. Biometrika 56, 367-374.

Miyasawa, Koichi (1961). An empirical Bayes estimator of the mean of a normal distribution. Bull. Inst. Internat. Statist. 38, 181-188.

Pi-Erh, Lin (1968). Estimation of a multivariate density and its partial derivatives, with empirical Bayes applications. Ph.D. Thesis, Columbia University.

Rutherford, John R. (1965). Some parametric empirical Bayes techniques. Ph.D. Thesis, Virginia Polytechnic Institute.