1 LlNll‘uWWWWNWWWkMl ‘Iritiiiiiiiiiiw‘ dissertation entitled Set Compound Decision Estimation Under Entropy Loss In Exponential Families presented by Zhihui Liu has been accepted towards fulfillment of the requirements for Ph.D. degreein Statistics at; /§%4/ Major professor Date June 17. 1997 %n 5:: M Major Professor MS U i: an Affirmative Action/Eq ual Opportunity Institution 0-12771 LIBRARY Mlchlgen State Unlverelty PLACE ll RETURN BOX to remove thin checkout trom your record. TO AVOID FINES return on or More due due. DATE DUE DATE DUE DATE DUE || | MSU le An Affirmative ActionlEcpel Oppommlty Institution W M‘ SET COMPOUND DECISION ESTIMATION UNDER ENTROPY LOSS IN EXPONENTIAL FAMILIES By Zhihui Liu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1997 Copyright by Zhihui Liu 1997 ABSTRACT SET COMPOUND DECISION ESTIMATION UNDER ENTROPY LOSS IN EXPONENTIAL FAMILIES By Zhihui Liu Set compound decision estimation has been studied for half a century, starting with finite state examples. Set compound estimation under the entropy (Kullback-Leibler, same through the thesis) loss for a k-dimensional standard exponential family with a compact parameter space is discussed here. The entropy loss with the exponential family and related properties are investigated in detail. Asymptotically optimal set compound estimators with rates 0(n’i) under this loss are established for one dimensional discrete exponential families including Poisson and negative binomial by being able to view the Bayes estimators as a ratio of two power series and representing them in form of mixture density and using Singh-Datta Lemma. Some remarks related to squared error loss and noncompact state case are also given. Generalization to a higher dimensional discrete exponential family is illustrated with a two dimensional example. Secondly going from cumulant generating functions and using kernel density estimation, continuous exponential families are studied under the same setting. Normal and Gamma distribution families are examples. To my husband, my son and my daughter. ACKNOWLEDGMENTS First the author would like to take this opportunity to thank Professors Dennis Gilliland and James Harman for suggesting the problems, for valuable recommendations and for their criticism through which the author has gained research abilities. The author has benefited from their knowledge both in statistics and language. The author likes to express her thanks to the department for the financial support and consideration during her Ph. D. study here. She wants to thank Professor Salehi, Professor Stapleton, Professor LePage and other committee members for their help. At the end, the author thanks her parents for their encouragement and her husband for his understanding and support in many ways. TABLE OF CONTENTS CHAPTER 0 INTRODUCTION ............................................................................................................... 1 CHAPTER 1 SET COMPOUND DECISION PROBLEM UNDER ENTROPY LOSS IN EXPONENTIALFAMILIES............ .............................................................................. 4 1.1. The Component Decision Problem4 1.2. The Corresponding Set Compound Problem .................................................. .8 1.3. Bounds for the Regret Dn(Q,t) 10 1.4. Summary ............................................................................... 12 CHAPTER 2 SET COMPOUND DECISION ESTIMATION UNDER ENTROPY LOSS IN DISCRETE EXPONENTIAL FAMILIESIS 2.1 . Introduction ..................................................................................................... 1 5 2.2. An Asymptotically Optimal Set Compound Estimator for a One Dimensional Family .......................................................... 18 2.3. Poisson and Negative Binomial Distributions ............................................. .22 2.4. A Two Dimensional Example ....................................................... 24 CHAPTER 3 SET COMPOUND DECISION ESTIMATION UNDER ENTROPY LOSS IN CONTINUOUS ONE-DIMENSIONAL EXPONENTIAL FAMILIES .................... 30 3.1. Kernel Density Estimation .............................................................................. 31 k 3.2. Case w(9) = Z:aq6q (k 22 and {aq} are constants)... ............................... .34 q=0 k 3.3. Case me) = Zatqe-q + bln(—0) (-00 < a 5.93 p < o , b s o and q=l sign{aq}=(—l)q for 1 quk) ......................................................... 37 BIBLIOGRAPHY .............................................................................................................. 47 vi Chapter 0 INTRODUCTION Compound decision theory was introduced by Robbins (1951) through an example of decision between N(-1,l) and N(1,l). In his paper he proposed a compound procedure that was optimal in the sense of asymptotic subminimaxity and showed the necessity of this kind of procedure. The theory was greatly developed and in much general finite state situation by Hannan (1956) and (1957). He used a randomization technique to overcome the difficulty caused by discontinuity of a Bayes response with respect to priors in his (1957) paper. Since then many works have been done in this area. Gilliland (1968) further demonstrated the necessity of compound procedures by showing that the supremum of regrets over both a parameter space and stages for any simple procedure is positive. Vardeman (1982) applied Hannan (l957)’s randomization technique to extended sequence compound problems for finite state case. Gilliland and Harman (1986) considered set compound problems in a setting of restricted risk components that avoided action space and loss function. In their paper a large class of asymptotic solutions to the set compound decision problems for finite state case were established. Singh (1974) considered sequence compound problems for exponential families with compact parameter spaces for parameter functions in three cases: 9, e9 and 2 G" , and yielded some rates. Datta (1991a) generalized and strengthened Gilliland, Hannan and Huang (1976)’s work from finite state case to compact state case and established an asymptotically optimal Bayes compound estimator based on a hyperprior for a continuous parameter function of an exponential family with a compact parameter space under squared error loss. Mashayekhi (1991 and 1995) strengthened Datta (1991a)’s result to the equivariant envelope. He also successfully extended Hannan and Huang (1972)’s results on the stability of symmetrization of product measures to the compact state case for exponential families. Zhu (1992) developed Datta’s work (1991) to multidimensional case. In addition he considered a nonregular family - two dimensional truncation family, and got some rate for this family. Majumdar (1993) developed Datta (1991a) and (l99lb)’s work to Hilbert spaces. The present work considers the problem of set compound estimation of the natural parameter in an exponential family with entropy loss. The families considered here had been the focus of sequence compound estimation of 9, e6 and 9" with squared error loss [see Samuel (1965), Gilliland (1968), Swain (1965), Yu (1971) and Singh (1974)]. Estimation with entropy loss was also studied extensively. [see M. Ghosh and M. C. Yang (1988), D. K. Dey, M. Ghosh and C. Srinivasan (1987)]. The work draws on the earlier work, but the results are not immediate extensions of that work. Compound estimation in the Hannan’s sense with entropy loss is first attempted apparently. Terms not previously considered in the compound literature have been analyzed. Chapter 1 conducts a general discussion about component and set compound problems for exponential families with compact parameter spaces under entropy loss. 3 Chapter 2 obtains asymptotically optimal set compound estimators with rates O(n”4) for some discrete exponential families with compact parameter spaces by representing the Bayes estimators in terms of mixture density under entropy loss. Chapter 3 discusses set compound estimation with continuous exponential families under entropy loss through using the kernel density estimation considered in Singh (1974). Chapter 1 SET COMPOUND DECISION PROBLEM UNDER ENTROPY LOSS IN EXPONENTIAL FAMILIES 1.1 The Component Decision Problem The component statistical decision problem considered has a standard exponential family {Peze GO} , where P9 has density (1.1) r5.= e°'*“"‘°’ with respect to a measure u on R" and 9' x denotes the inner product of 9 and x in R". (9 is a subset of the natural parameter space (1.2) N = {9: jee'Xdp < 00} and (1.3) w(9) = ln( Icelxdp.) , GEN is the cumulant generating function (see Brown (1986, page 1)). Of course, in this thesis, we consider only the case where G), and therefore, N are nonempty. Define (1.4) N~ = {0 eN: E9|X| _<» and regret (1.14) D.(9.1)=R. 0, x = O,1,2,---. It follows that P9 has the probability mass function (2.1) pe(x) = g(x)e°"'*"‘°’, x = 0,1,2,---,e e e. 16 x=0 Clearly, 111(9) = ln[z e9“ g(x):l is strictly increasing in 9. Here the natural parameter space is an interval (-oo, v}, open or closed on the right with —oo < v 3 +00. (From display (9) of Rainville (1967) on page 111, v = -ln[lim sup Vg(k) ].) Consider k—>+oo (2.2) o=A=[v,,v,] c N0. Remark 2.1 If (2.2) holds, then the hypotheses of Propositions 1.1 - 1.4 are satisfied so that for any probability distribution G on G), the unique Bayes response is given by (1 .12), that is , by (2.3) rem: n”‘[E(n(9)IX)l, x = 0,1,2.---, Moreover, in Proposition 1.4, the Lipschitz constant C0 can be taken to be 1/ r'1(vl ) , that is, (2.4) |n”'(a) — n”‘(b)| Sla — b|/1'1(v,) for all a, b 6A. Proof Since G)=A=[ vl, v2] is a compact subset of N0 g N” and (p is differentiable of all orders on NO, n = \1! is continuous on (9. Therefore, E(n(9)| x) exists and is integrable. Also n[(9] = [11(vl), n(v2)] is a convex set so that (1.11) is satisfied. Also note that @1140): 1/n(n”'(y)). Slnce n(9)=\p(9)=Var9(X)>0 on N0, n 18 d . . strictly increasing. Thus, Enfiy) s 1/ n(n”'(n(vl)))= 1/ n(vl). (2.4) follows from the mean value theorem. 0 We will use the bounds in (2.5) in proofs that follow. 17 Remark 2.2 If (2.2) holds, then (2.5) c”1pvl(x) S p9(x) S cpvz(x), x =0, l, 2, ..., 9 e [vl,v2] where c = e‘V‘V’W‘V". (This is an analog of the inequality on page 1894 of Gilliland (1968).) Proof (2.5) follows directly from (2.1) and (2.2). 0 Suppose that n has the power series representation (2.6) 11(9) = quej", 9 66 = [v,,v,] j=1 and that the series converges uniformly on (9. It follows from (2.1) and (2.6) that for any probability distribution G on (9, ' ,_ _ )6 _w Po(x+j) g(x) _ (2.7) 10(x)... E(n(9)|x)—§j:qu(e IX)-;qj pe(X) g(“DJ-0,1,2, where pG(x)= Ipe(x)dG(9). (2.7) expresses the conditional expectation in terms of ratios. In view of the bound (1.20), we turn our attention to the estimation of 1:3 (Xi):= En(n(9)IXi), i=1, 2, ---,n through estimation of the probabilities in the ratios, namely, the pan (x) , x = O, l, 2,---. For this consider (2'8) I3(X)=n—IZ:[X1=X]sx=0,1, 2;" j=1 and, fori = 1,2, ---,n,n22 (2.9) 6.6) =(n—1)"ZIX,- = x1,x=o,1, 2,-~; jaei i=1 where square brackets denote indicator functions. 18 Lemma 2.1 With c = e“’(v””“’”", (2.10) Eglmx) — p0,, (x)l s 1cp., (x) I n1”, n 2 1 (2.11) E9|P1(X)‘Pon(x)l s 21013., (x)/(n-1>1” , n22. Proof Since E9(p(x)) = pG“(x) , LHS (2.10) is no greater than the standard deviation of p(x) , that is, 16 (2.12) LHS (2.10) s all: p,1(x)(1 — p9j(x))] . (2.10) follows from weakening (2.12) and applying the second inequality of (2.5). To prove (2.11) we triangulate in LHS (2.11) about E9(pi(x)) to get 0-”) LHS (2-11) 5 59113.0() - Eg(l3i(X))|+l Eg(§.(X)) - P6,, 001- The first term in RHS (2.13) is bounded by [cpV2 (x) / (n - ”1% by applying (2.10) to the set with 9i deleted. The second term in RHS (2.13) is equal to IV)? - wil/(n - l) where wj:= p9j(x), j= 1, 2,---,n. Since the wj 2 0, j=1,2,---,n, IW—wjl/(n—1)s mjax wj /(n — 1), so that the second term in RHS (2.13) is bounded by max p9j(x) / (n — 1) for n 2 2. Bounding this by its square root and applying the second inequality of (2.5) to the W]. completes the proof of (2.1 1). 0 2.2 An Asymptotically Optimal Set Compound Estimator for a One Dimensional Family 19 With the results of Chapter 1 and the previous section, we are prepared to demonstrate a compound estimator that satisfies (1.15) with rate O(n”%) uniform in 9. Let (-)[a'b] denote retraction to the interval [a, b], that is (-) [ab] = (av-)Ab. Consider the compound estimator {t} defined by (2-14) t.QS) = n"(S.(X)) where Xi+ gxi . (2.15) sI(X)=(;:qu)iIfE(i +1.1)(FE.+)J'))["”""“V’"’l =1,2,...,n where the q j, p, are defined by (2.6) and (2.9) and 0/0 :=0. SAX) is an estimator of 12;“ (X) defined by (2.7) with G = G". Theorem 2.1 Let {Pez9 66)} be the discrete exponential family defined by (2.1) and suppose (2.2) holds. Also suppose that (2.6) holds with uniform convergence on ®=[vl, v2] and that 2.16 0° .. 21pg(x) ’2 ( > gglqjlkrp (mg—mp Let {t} be defined by (2.14) and (2.15). Then (2.17) supl D.(9,t)l = 0(n‘” ). 13 Proof By Remark 2.1 and Proposition 1.4, 3 (B0 / f](v, ))n"z Eglsi - 1.0,(X1)| , for all 9 and n. (2.18) ID.(9,:) The conditional expectation 1'0" (X) = a/ b , where 20 a3: ZQJPG,,(X1 +j)[g(xi)/g(xi + 1)] and b3: pG"(Xi)a takes values in ln(Vt), 7105)]- i=1 Letting A := 2%.;eri + j)[g(xi )/ g(xi + j)] and 13:==fa,(xi ) we see that j=1 . A a (2.19) ISi —TGn(Xi)lSIE—BIAD where D = 11(v2 ) — 11(vI ). Applying Singh-Datta Lemma (1.21) and weakening the result wehave (2.20) ISi-T.6“(Xi)| 1 °° ,. . . g(x.) S . . X. — X _____.___ pGn(x,){§qu”p'( .+1) pe,( '+J)lg(Xi+j) + (2710/2) — T1(V1 ))l P1(Xi ) ’ Po, (x1 )1}- We first consider the Cesaro mean expectation of the last term in RHS(2.20). Applying (2.11) to each summand shows that for n22, n-IZEQ lPii-(X) PG(X1)l< 20y ii“ WX) 2.21 ( ) pe_(X.) S:/————n-1n_9‘pG(X) 1 “ °° . . . Since “ZEGi = E0" , RHS(2.21) 5 2c” lef?2 (x)/ dn —1 . Since v2 is 1n the 1nterior of x=0 i=1 the natural parameter space, pf: (x) is summable with respect to x as indicated in Gilliland (1966, page 24). (This is easily seen to be the case because there is a number b “11": 1’2 larger than v2 such that Zeb‘g(x) < 00. Thus, sup[eb "g(x)] < oo , so that e pé(x)= x=0 21 e‘v2 b”"2[eb"g(x)]’% < {sup[e""g(x)]}%elvz”b”"2 which is the summable.) Therefore we X have RHS(2.21) =0(n"V2) uniform in 9. We now consider the Cesaro mean expectation of the first term 1n RHS(2 20) Applying (2 11) to each summand, using an argument similar to that used above and using pv (x + j) = evl’pv2 (x)g(x + j)/ g(x) shows that for n 22, _ lpii(X H)" PG..1(X +1) g(X 2.22 ' E _2 ”LHS 2.16 / -1 ( ) n2 eZIq-I paw) g(x, i)j)< c ( )J—n Us1ng (2 16) we see that RHS(2.22) is O(n”%) uniform in 9 completing the proof 0 Remark 2.3 (2 16) does not follow from (2.2). Consider the example with g(2k) — e g(2k+1) =(2k+1)”3, k=0,1,2,--- Then, N=(-oo, -1). Take v2 = —% which is smaller than -1, but __g__(2k)l)] =62 ——(2k)(2k+1)§ezk = 00. k=0 212342101 g(2k Remark 2 4 If \i/() can be represented as a series of powers of exponential functlon e i.e., 41(9) = Zq e39 with q J 2 0 , then the dominating measure p. indicated 1n (1 1) has to J: k—l eq.“"‘°°’ . em = 1812 g(j)q...-.,1, by the be discrete. In fact g(0) = 6‘4””), g(1)= j=0 argument in Theorem 51 of Rainville (1967) (see page 129). But if q ' < 0 for some 2 9 -— 'nN9,1, 21 ( ) 1nteger j0 >0 the conclusion is not true. For example, w(9) 22 ac . 0 _ j 01(9) = 9 = 2(4)J+l E—r—l—L, for -oo<9sln2, p. is not discrete, actually u is absolutely i=1 continuous with respect to the Lebesgue measure. 2.3 Poisson and Negative Binomial Distributions Poisson Case. Consider the Poisson family with the mean 11. in a closed interval [1», , 1.2] g (0, 00). It is transformed to standard form by letting 9 = ln 1. resulting in 611-111(9) (2.23) p9(x)= x' x=0, 1, 2,---; 9 e[ln1t,,ln1tz] where \y(9) = e9. In this case 01(9) = 11(9) = e6 and the loss function (1.7) is L(9,a) = (9 — a)ee - e6 + e“. In the power series representation (2.6), qI = 1 and q j = 0 for j 2 2. Thus, LHS(2.16) = X§Z[(x+l)pvz(x)]% which is easily shown to be finite. Thus, this x=0 Poisson case is covered by Theorem 2.1. Negative Binomial Case. Consider the family with probability mass functions F(x +01) F(x +1)r(a)’ (1-p)“p" xe0,1,2,---;O0 is fixed. It is transformed into standard form by letting 9 = ln p resulting in F ex-wte) (“'00 -O,1,2,---; lnp. 595111132- (2.24) p9(x) = e F(X +1)F(a) , x— where (41(9) = —0t ln(l — e9). In this case 01(9) = n(9) = one9 / (1 — e”) and the loss function (1 .7) is 23 ore“ L(9,a)=(9—a)1_ee +aln(1—e9)—aln(1—e“). In the power series representation (2.6) , q J. = 01 , j = 1,2, . -- . Thus LHS(2.16)= a2 2632, p;2(x)= x=0 j=l P2 x=0 iiZU—p )Zp :J RH“) r(x +1)F(a) I”(x+oc) x°F(x+ 1) which is finite since -> O as x —>oo. Hence this negative binomial case is covered by Theorem 2.1. An interesting observation is that the Bayes estimator with geometric distribution (01=1) has the simple form PG(X2x+1) (2.25) TG(X)=1n( PG(X 2 x) ) by Remark 1.4, Proposition 1.2 and the following fact: ICNNHG = PG(X _>_ x +1). It is not difficult to prove the above fact. First note that pG(x) = je°*+'"“-°">do = Jede — [6’9“le so that by telescoping, Ie°(“*‘)dG =1— 2130(1) = PG(X 2. x +1). i=0 One can estimate the ratio in th(x) using natural estimates of the tail probabilities in constructing asymptotically optimal compound estimators in the geometric case. Remark 2.5 (i) No one has done the set compound problem with discrete exponential families and continuous parameter spaces. The achievement in this chapter is that instead 24 of the usual squared error loss, the entropy loss is used and with this loss and a different proof, set compound asymptotic estimators with rate n”% are established for some discrete exponential families. (ii) For Poisson exponential family (2.23), squared error loss (x—e°)2, G) = (—oo, 1n kl] (which is unbounded) and compound estimator t with _ p (X +1) t. R2 by T(p, 1.) = (ln (1 _ p)?» , ln p). Then T is continuous, P which yields the compactness of (9:: T{[pI , p2]x[1t,, 12]}. In this example let the action space A=(9. Now 9+0 m . el 2 +. 111(9) = Me) = 13,00: 2. =1—ee. = Zea. n, and . 9+1 00 -0+)‘0 )0 n2(9)=w2(9)=Ea(Y)=—_p =lee- 2+e 2]. i=1 ___€_+l n[®]= [(Q,,§2 ):1t1 SCI 5 12,—— ] —£l-pl <§2< _ ——], which is a polytope in R2. Thus n[(9] is convex. By using Remark 1.5 we see that the hypotheses of Propositions 1.1 and 26 1.2 are satisfied. By Proposition 1.2 the unique Bayes estimator of 9 with the entropy loss is 1:(i("V)3'-” (710, 1720) = 714137le W1], where w := (x, y). Interchange the summation and conditional expectation, obtain °° p6(X+l,y+i) g(x,y) E 0 = , (M W) ,2; pe(x,y) g(x +l,y+j) E.tn(w))-n (<1; mm“) g(Xi+l,Yi+j)’ 0° “1(Xi‘l‘laYi‘l'l) g(xlaYi) +Pi(X1,Yi+j) g(XisYi) . P [J A O A e j=l pi(xi’Yi) g(Xi +1’Yi +1) pi(Xi’Yi) g(xiaYi + J) ])[n[@]]} and _t_:=(t,,t,,---,t ). Then {t} is an asymptotically optimal set compound estimator with (2.8) satisfied under the condition that 1‘ \lVPOk (2.29) VKOR < ————. 1+ ,lvpo,‘ The proof follows. From (1.16) an(9, 01s Ban-{Z Enum- ra,(X..Y. )I (2.30) f‘ s Bon“ZE._.{I my!) — rianlxifli )|+| any) — rm, (X.,Y.)l}. i=1 Let C be a constant in the following. Use (2.28), Mean Value Theorem to the functions 11”] and n“ and Singh-Datta Lemma (see Section 1.3), (2.31) 28 LHS(1.16) n - d.) X Y') sC " E ' X-,Y- *. X2+1,Y.+' g( " ' co - g(xi’Yi) A - X LY _ I C 1 Xl’Yl — Xl’Yl §p0.( 1+ 1+1) gi(X +19Yi +J)|+ lp( ) pGn( )l g(xi9Yi) 0° - 2 g(xi’Yi) +1 JP1(X1+1:Yi+J) J (Xi+1aYi+J) .' ; g(X +1, Y +j) g1: p9» g(x,+l,Y,+))' Go A 2 gg(X i, Yi) - gg(xi9Yi) ' X'sY . — X'9Yi . sen'iZleZpgivxxnmfl) g(x’” +Zva.(x.y+ +1)—~———-—— 8‘ —-—-——’—-——”.l y2x j=l k=l g(x+1,y+j) j=1 k: 1 g(x y+j) '_ _ l (y+Jx1).< Furthermoreb (2.27) and , __ , y (m)! Zijipjivk(x+l,y+j) g(X, Y) y2x j=l k=l g(x'l'laY‘l‘j) ][yliJ/ 00 -m 3"” x+2p +'x—-le lkk__o__0__kx+l = J [( )(1- -p k) 1"“ ’ E: El: E “I O (“1)! Wk +1): HI) / . °°. y / (1-p0k)>"0k) [ij (y+J—x—1)!% 21" W 2‘1”)?" 21 x‘ 0+1)! ) j=l x=0 p0k j=l y=0 Z k=l 5C2 Zip OkyZ(1+Y)p0k% [J(1’P0k)7~0k/P0k +lly lt=l C2 2' n’Z O are used in the estimation of its uth derivative. Such K(s) exists since there is a linear functional on L2 [0,1] which satisfies (3.4) (see Rudin (1973), Theorem 3.5). Let 1 “ 1 X- -x 1 “M :=— K J 32 x > c for o = O, 1, ..., r-l, n 21 and 0 < h < 1. We let p§”’(x) denote the (n-l) term average with the ith term deleted, i = l, 2, ..., n. The upper index (1)) will be omitted in case 1) = O. In the following let T=e‘”‘ [K2(s)ds, T,=[(r—1)!]"s{;‘[|K(s)|s'ds and (I) 1 . so=1alv1131. Also let p€“’(x)=;-_-;Zpé',"(x), p"(x)=pe,(x), p.(x)=p§°’‘"’ (X)) S (ii) lng)‘°’(X) — p’(x>| s mes" —1)h"°pe,, (3-6) (iii) Eglf)‘°’(X)-P2’,’(X)IS D001)- Proof Note that ElWlS [Var(W)]%+|EW| by triangulation at E(W) and moment inequality for ElW—EWI . Thus, (iii) follows from (i) and (ii). Singh (1974, page 59) proved ( i ) and (ii). The bound in (ii) is slightly different from the Singh’s bound because of use of the inequality le" —1|s|e' —1||x/ a| if lxls a instead of the Mean Value Theorem. 0 33 Lemma 3.2 For n .>_ 2, (3.7) E91P10)(X)—P(U)(x)I-<- D.(n) + 28.‘.’¢‘".‘”'(P..(X) + p.(X)) / (n -1)- Proof Triangulate LHS(3.7) at p§°)(x). Use the fact |W —wi|/(n - l) s Zmaxle-l/(n — l) J for W]: =p(")(x) and (3. 6) for the following second inequality and (3. 3) for the third inequality below LHS(3 7)— < EelP(°)(X)“ P. (U)(X)I+IP. M00 P2:)(X)| < D .(n) + 2maxp‘”’(x)/(n - l) (3.8) s D.(n)+ 2s::e* " (p.(x)+ p.(x»/(n -1) The proof is finished. 0 Lemma 3.3 Assume for a positive number ho with ho < 1, (3'9) = Cé‘ifidC—j—fl V(x x,xh,) Let Uu(x) be RHS(3.7) with u < r. Then IUu (x)d “(X)<4 S(,__e___“’.1")‘"(n +zcfie%(w'-w.)(n_1)_%h_(u+%) + mes” —1)h"" (3.10) forOh""* 1} + 11 n(a)IVI n(13)l+n(B) — nth)” (n _1) + 2cfiei“*’"‘"~’(n -1)‘ih‘% + mes" — l)h']). 1 For the choice of h = n“5 with 8 = , 2r+1 RHS(3.7) 5 Chi-1 , r—k+1 where Cl is a constant and y = 2r+1 1: Remark 3.2 (i) If w(9) = Zaqe“ (k 22 and {aq} are constants), then N is closed (i=0 9‘! because J‘ee‘dtt=eza“ If 9“ —>a, a finite boundary point, Iliflee'xdps 36 28.93. flme =e}:"“°‘q O) by the same argument before. And also 0 e N if a2 = 0. If 111(6): a9k (k > 1), then a > O for all even integers k >1. Ifk is odd, a9 is nonnegative for all 9 belonging to N if N0 is not empty. (ii) In the above case, i.p.(k/2) (integer part) can not be even. The reason is as follows: assume p. is the corresponding measure to the cumulant generating function 111(9) == a9k . Take 90 6 N3 , the interior of the natural parameter space for u and define du' = e°"‘du. Then 11’ is a finite measure. a(9 + 90 )k will be the corresponding cumulant generating fimction. First let k be even and k/2 be even, for any tl , t2 6 R, 689:: ea[i(t.-tz)+9ttlk ealiit.-t.>+e.l" 2.19:; 2a93-k(k-l)a03"(t.—t,)’+.-.+2a(t.-t,)" =6 '6 , eat}: which is smaller than 0 for large (t| — t2 ). This means came")k is not a Laplace-Stieljes transform, contradicting to the argument of Brown (1986) (see page 42). Now let k be odd and (k-1)/2 be even. In this case, a90 is nonnegative. For any t,,tze R, net a[i(t -t you)" (1 t 0 e I 2 =62”: _eaBu[2e:"_k(k_|)93"((|—:2)1+-~-+2k(t.-t2)b'] ennui-wen" e393 ’ which can be negative for large (tl — t2) if 90 ¢ 0. In case 90 = 0, take t,,t2,t3e R such that a[(tl_t2 )k + “3 — t1)k + “2 " t3)k] ‘7‘ 2m” for any integer m, 37 1 emu-ti)" cam-1.)" cal-(‘z-Il )k 1 eai(I1‘t3)k cams-ti)" 631“)": )IK 1 = -2+2eos{a[(t,—t,)“ +(t, -t.)k +(t2 -t.)"]} <0. a(it+6., )k Therefore e is not positive definite, and is not a Laplace-Stieljes transform. So a(6 + 90)" is not a cumulant generating function, neither is aG" in case that k is odd and (k-1)/2 is even. (iii) If k is odd, the dominating measure for 111(9) = a9k (k > 1) can not be finite from the latest argument above. (iv) In Theorem 2 on page 62 of Singh (1974), we can reduce the condition (A2.1) there to Jfldu 1, j = 1, 2, ..., and 0 < t < 00, (which is (7.6) there), then the proposition will follow. First, let us show that 40 . . 1 (3.23) f‘”= (—1)Jf(w>P.(;v—), where j(k+l) 1 1 . P,(;V-)= 29“? with C1120 for 1 Sisj(k+1). f‘”(w) = f(w)[zk: (—|am|mw‘""l ) + b / w] = (—1)f(w>l§ (IamlmW'"‘" ) + (—b)/ wl = (—1)f(w)P.(-:;), where P,(;1v-) = Z (Iamlmw‘m")+ (—b)/ w. Assume (3.23) is true for j. j(k+1) j+l) _ _ j+1 __1_ i _ j+l i f‘ (w)—( 1) f(w>P.(w)P,(w)+( 1) f(w)(§C..w...) j(k+1) . 1 1 = (-1>’*'f(w)tP.(;)P,(;) + ;C. W := (-1)"" f(W)P,-..(-v1;) (3.23) is verified. Using the argument on page 4 of Boas and Widder (1940), Aj(c,t) = (~1)’cjtj"e°‘ fw“"f‘”(w + c)dw 2 0, foranyc>1,j21and0 1 case. Lemma 3.4 Let xe R , G a prior on [a,B]. Then for any integer k > 0, (3.24) p‘o’“(x) = (—1)“ Es... 1:... [Boom Proof Induction on k 2 1 will achieve the proof. Observe that the integral of pG(s) with respect to the Lebesgue measure ds on [x, 00) is the integral of —-(1;pe(x) with respect to G0 on [01,13] . Assume (3.24) is true for k = n-l. Now consider k = 11 case. RHS(324) = H) I ds.-. f 9"""’p.(s.-.>dG0 = (-1)“ I ask“. 1:. ds,_,--- 1: as, 1: p,(s)ds Lemma 3.5 In the context above, 2 n -1 LkTPi(X) (n -1)hu(x,kL + h)| m" E, | fil‘“ - plink’IS—IBI“" {kh'pa (x) + eW‘W' (p.00 + ppm) (3.25) + T,(e’°L —1)h'pi(x)} +J Proof Triangulate the LHS(3.25) about k iterated integrals of p0_(s) changing each iterated integral’s interval at a time and apply the subadditivity of the absolute value, LHS(325) +L k_|+L 2+L I+L A S dSk-l I»: dSk—2”°£ dslf' Egilpa(S)—po. (S)IdS an k-I*L 2+L .+L + I+LdSk—l [H dSk—2H'J'2 dslfl p0,, (S)dS m m 1+L ”L + Ids,“I £_I+Ldsk_2---£ dslfl pGn(s)ds+-~ + Ids“ £_|dsk_2...[ dsl[+LpGn(s)ds Consider the second one of k+1 iterated integrals above, denoted as [2. Replace the upper limits of k-l inner integrals of I2 by +00 and use Lemma 3.4 in the first inequality below, 1, s (-1)" fe"‘p.(x + L)dG..(e) snark p0,,(x + L) S”SI-k po,(x)eBL =ll3l—k h'pGn (’0 Similarly, we have |B|"‘ h'pGn(x) as the common bound for each iterated integral after the second one. For the first one, denoted as 1,, applying Lemma 3.2 in the first inequality and Lemma 3.4 in the second inequality below, 43 +L |H+L z+L I+1. 2 ._ It=f (isle-ll:l dSk-2”'[2 dsnfl [ii—:16, W'(Pa(5)+PB(S)) +D-(n)]ds k CW 'oW <|l3|' [2r1 1(Pa(X)+Pp(X))+Tr(e-a -1)Pa(X)] ,_ ,+L [+Ld +L Tp (s) +deSk1l, ds" 2 ds'f. \/(n—1)hu(s, h) d8 k 26‘”. "'W dssuf [ p.(s)ds>* sum! was.) Therefore, 2e‘""‘"- LkTp-(x) I 5 .k + + T ‘a -1 hr - + ' . IBI [ n_1 (p.00 p.00) ,(e ) p.(x)] \/(n_1)hu(x,kL+h)lmk Combined with the first part of this proof, we have our lemma. 0 Lemma 3.6 Let {pe(-):9 66)} be the one dimensional exponential family (1.1) mentioned in the beginning of Section 3.1. Assume that for numbers Cl,C2 and I? with c,,c2 20 and E s 0, pv(x) % i m (3.26) ML” J( W, (h I)“ h)> du(X) s h (C. + Czlloghl ) holds for O < h < l and some real number m. Let Vk(x) be the RHS (3.25). Here V0(x) equals to U0(x) in Lemma 3.3. Then 44 (3.27) qu(X)du(x) q T _ - —(n—1)‘ h‘ '*’(C. +C2|loghlm ) e“"“"- + T,(e'°‘ —1)h']+ 2C 'qu .<.|B|“‘ [W + _1 forOSqSk+landOl“‘“"‘I) i=1 +(-b)|13l'”(X.)- 9511(X.)l+[lv(a)IVIw(B)l+\iI(B)- v(a)llf).(X.) — po,(X.)|} 4 < Boco {ZMquGBF‘M [(q +1)hr+ewl —2v + T (e—a— 1)hr] q+l IBIrWHY—hw‘q+C2I‘Oghl'“ >>1+(— beBr (1.2.. 4 —-e“’ -~2 +T(e‘°‘ 1m —1 +2C + zc,/|3‘m—T(n -1)-%1.-%+2(c, + Czlloghl‘“ )) + [I «anvwwwm — woo] ( 4 152“"- + r,(e'°‘ - l)h' + 2cfie2‘W""~‘(n —1)'2h‘2"+‘(cl + c2| log hl'“ ))) , n _— 46 for0