MICHIGAN STATE UNIVERSITY

This is to certify that the dissertation entitled

COMPOUND ESTIMATION OF PARAMETERS OF RIGHT CENSORED EXPONENTIAL FAMILIES

presented by Jagadish Purushotham Gogate has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

MSU is an Affirmative Action/Equal Opportunity Institution

COMPOUND ESTIMATION OF PARAMETERS OF RIGHT CENSORED EXPONENTIAL FAMILIES

By

Jagadish Purushotham Gogate

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1989

ABSTRACT

Consider the usual random censoring problem in which $X$ and $Y$ are two independent random variables with $X \sim F_\theta$ for a $\theta \in \Theta \subset R$ and $Y \sim G$, and one is required to estimate $\theta$ based not on $X$ and $Y$ but on their identified minimum ($Z = X \wedge Y$, $\Delta$ = the indicator of $[X \le Y]$) under the squared error loss. An estimator $\psi$ then incurs a risk $R(\psi,\theta) = E_\theta(\psi-\theta)^2$, where $E_\theta$ is the expectation induced by $(Z,\Delta)$. In this thesis we investigate the set and the sequence compound versions of this problem. In the set as well as the sequence compound versions, the above mentioned problem is assumed to occur repeatedly and independently, say $n$ times, and it is required to estimate $\underline{\theta}_n = (\theta_1,\theta_2,\dots,\theta_n)$ based on $\underline{Z}_n = (Z_1,Z_2,\dots,Z_n)$ and $\underline{\Delta}_n = (\Delta_1,\Delta_2,\dots,\Delta_n)$.
A set compound estimator (SCE) $\underline{\psi} = (\psi_1,\psi_2,\dots,\psi_n)$ is such that for each $i=1,2,\dots,n$, $\psi_i$ is an estimator of $\theta_i$ and is allowed to depend on $\underline{Z}_n$ and $\underline{\Delta}_n$, while for a sequence compound estimator (SQCE) $\tilde{\underline{\psi}} = (\tilde\psi_1,\tilde\psi_2,\dots,\tilde\psi_n)$ each $\tilde\psi_i$ is allowed to depend only on $\underline{Z}_i$ and $\underline{\Delta}_i$. The risk of a compound estimator $\underline{t}$ is taken to be the average of the component risks, $R_n(\underline{t},\underline{\theta}) = n^{-1}\sum_{i=1}^{n} R(t_i,\theta_i)$. The modified regret $D_n(\underline{t},\underline{\theta}) = R_n(\underline{t},\underline{\theta}) - R(\omega_n)$, with $\omega_n$ the empirical distribution of $\underline{\theta}_n$ and $R$ the Bayes risk in the component problem, has been a standard for evaluating compound estimators $\underline{t}$. The results obtained in this thesis hold uniformly in $\underline{\theta} \in \Theta^{\infty}$.

When $F_\theta$ is exponential with density $f_\theta(x) = \theta e^{-\theta x}$ for $x > 0$ and $\theta \in [\alpha,\beta] \subset (0,\infty)$ and $G$ is known, we show that $D_n(\hat{\underline{\psi}},\underline{\theta}) = O(n^{-\gamma/5})$ for $0<\gamma$ … $>0$ is denoted by $h_c(x)$.

1.1. The Component Problem.

The component problem that we consider throughout the body of this thesis is the well-known random censoring problem. Let $\Theta$ be a subset of $R$ indexing a family of probability measures. Let $X$ and $Y$ be two random variables such that, under $\theta$, $X \sim F_\theta$ and, independently, $Y \sim G$. Let $f_\theta$ denote a density of $F_\theta$ with respect to a measure $\mu$. Let $Z = X \wedge Y$ and $\Delta = [X \le Y]$. Let $P_\theta$ denote the joint distribution of $Z$ and $\Delta$ determined by $F_\theta$ and $G$. The decision problem considered is the squared error loss estimation of $\theta$ based on $Z$ and $\Delta$. For this problem, the risk at $\theta$ incurred by an estimator $\psi$ is $R(\psi,\theta) = E_\theta(\psi-\theta)^2$. For a prior $\omega$ on $\Theta$, let $\Psi_\omega$ denote the Bayes estimate versus $\omega$ (see Section A.2 for details) and $R(\omega)$ denote the minimum Bayes risk.

1.2. The Set Compound Problem.

Suppose the decision problem described in Section 1.1 occurs repeatedly and independently, say $n$ times. In the set compound version one allows the use of observations from all the problems in each of the decisions.
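The data structure of Sections 1.1 and 1.2 can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the lifetimes are exponential (the special case treated in Chapter 2), the censoring distribution $G$ is taken to be exponential purely for convenience, and the function names `draw_component`, `draw_compound_sample` and `compound_risk` are hypothetical, not part of the thesis. It draws the identified minima $(Z_j,\Delta_j)$ for a parameter vector and approximates by Monte Carlo the average of component risks of a simple symmetric rule $\varphi(Z_j,\Delta_j)$.

```python
import random


def draw_component(theta, censor_rate, rng):
    """Draw one identified minimum (Z, Delta) = (min(X, Y), [X <= Y])."""
    x = rng.expovariate(theta)        # lifetime X ~ F_theta (exponential, rate theta)
    y = rng.expovariate(censor_rate)  # censoring time Y ~ G (assumed exponential here)
    return min(x, y), int(x <= y)


def draw_compound_sample(thetas, censor_rate, rng):
    """Draw n independent component observations, one per parameter theta_j."""
    return [draw_component(t, censor_rate, rng) for t in thetas]


def compound_risk(phi, thetas, censor_rate, reps=2000, seed=0):
    """Monte Carlo estimate of the compound risk of the simple symmetric rule
    psi_j = phi(Z_j, Delta_j), i.e. the average of the component risks."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = draw_compound_sample(thetas, censor_rate, rng)
        losses = [(phi(z, d) - t) ** 2 for (z, d), t in zip(sample, thetas)]
        total += sum(losses) / len(thetas)
    return total / reps


if __name__ == "__main__":
    thetas = [1.0, 2.0, 1.5, 2.0]
    # A rule that ignores the data and always reports 1.5.
    print(compound_risk(lambda z, d: 1.5, thetas, censor_rate=1.0))
```

A rule that ignores the data has compound risk equal to the average squared distance of the $\theta_j$ from the reported constant, which gives a quick sanity check on the simulation; with exponential censoring at rate $\lambda$ the probability of an uncensored observation is $\theta/(\theta+\lambda)$.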
Thus, in a set compound estimator $\underline{\psi} = (\psi_1,\psi_2,\dots,\psi_n)$ of $\underline{\theta} = (\theta_1,\theta_2,\dots,\theta_n) \in \Theta^n$, $\psi_j$ is an estimator of $\theta_j$ based on $\underline{Z}_n$ and $\underline{\Delta}_n$ with joint distribution $P_n = \times_{j=1}^{n} P_j$ (the subscript $\theta_j$ is abbreviated to $j$, here and throughout). The compound risk is taken to be the average of the component risks incurred by the use of each $\psi_j$:

(1)  $R_n(\underline{\psi},\underline{\theta}) = n^{-1}\sum_{j=1}^{n} E_n(\psi_j-\theta_j)^2.$

Let $\omega_n$ denote the empiric distribution of $\theta_1,\theta_2,\dots,\theta_n$. For a simple symmetric estimator $\underline{\psi}$, i.e., $\psi_j(\underline{Z}_n,\underline{\Delta}_n) = \varphi(Z_j,\Delta_j)$ for all $1 \le j \le n$ for some component estimator $\varphi$, the compound risk is the component Bayes risk of $\varphi$ against the prior $\omega_n$ and hence is at least $R(\omega_n)$. The excess

(2)  $D_n(\underline{\psi},\underline{\theta}) = R_n(\underline{\psi},\underline{\theta}) - R(\omega_n)$

is called the modified regret of $\underline{\psi}$ at $\underline{\theta}$ and has been a standard (see, e.g., Section 0.2 in Singh (1974) and the references mentioned there) in evaluating compound procedures. Compound procedures which attain risks asymptotically no more than $R(\omega_n)$ are of interest, and a compound estimator $\underline{\psi}$ is evaluated on the basis of how fast it achieves $R(\omega_n)$. We say that a compound estimator $\underline{\psi}$ is asymptotically optimal (a.o.) (with rate $n^{c}$, $c > 0$) if

(3)  $\sup_{\underline{\theta}} D_n(\underline{\psi},\underline{\theta}) = o(1)$  ($O(n^{-c})$).

Often set compound estimators are of delete nature. Typically, in a delete compound estimator $\underline{\psi} = (\psi_1,\psi_2,\dots,\psi_n)$ each $\psi_j$ is a function of an estimate of the Bayes rule in the $j$-th component using the other observations, and this function is evaluated at the $j$-th observation. This creates some sort of independence and thereby simplifies some of the mathematical arguments in obtaining the asymptotic optimality of such estimators, as we shall see later in this thesis. With this in mind, we next obtain a simple bound for the modified regret (under the squared error loss).

Let $\omega_{nj}$ denote the empiric distribution of $\theta_1,\theta_2,\dots,\theta_{j-1},\theta_{j+1},\dots,\theta_n$ (with the "normalizing factor" $n$ instead of $n-1$). Then from (A.2.5) on (with $\varphi$ there, in our case, the identity function on $R$) it can be shown that $\Psi_{\omega_n}$ and $\Psi_{\omega_{nj}}$ take values in $\Theta$ if $\Theta$ is convex. Since $R(\omega_n) = n^{-1}\sum_{j=1}^{n} E_n(\Psi_{\omega_n}(Z_j,\Delta_j)-\theta_j)^2$, it follows from (2.1), (2.2), the identity $b^2-c^2 = (b-c)(b+c)$ and the triangle inequality with the intermediate term $\Psi_{\omega_{nj}}(Z_j,\Delta_j)$ that

(4)  $|D_n(\underline{\psi},\underline{\theta})| \le 2\,\mathrm{diam}\,\Theta\; n^{-1}\Big\{\sum_{j=1}^{n} E_n|\psi_j-\Psi_{\omega_{nj}}(Z_j,\Delta_j)| + \sum_{j=1}^{n} E_n|\Psi_{\omega_{nj}}-\Psi_{\omega_n}|(Z_j,\Delta_j)\Big\}$

for a compound estimator $\underline{\psi}$ with $\psi_j$ taking values in $\Theta$.

1.3. The Sequence Compound Problem.

The sequence compound problem also considers, say $n$, independent repetitions of a component problem but allows data only through stage $j$ in estimating $\theta_j$, i.e., in a sequence compound estimator $\tilde{\underline{\psi}} = (\tilde\psi_1,\tilde\psi_2,\dots,\tilde\psi_n)$, each $\tilde\psi_j$ is based on $\underline{Z}_j$ and $\underline{\Delta}_j$ for $1 \le j \le n$. The risk through stage $n$ and the modified regret of a sequence compound estimator $\tilde{\underline{\psi}}$ are given by (2.1) and (2.2) respectively, with the understanding that each $\tilde\psi_j$ depends only on $\underline{Z}_j$ and $\underline{\Delta}_j$. The criterion for the asymptotic optimality (with and without rate) remains the same.

1.4. Summary of the Present Work.

Compound Decision Theory was introduced by Robbins (1951). He considered the problem of deciding between $N(-1,1)$ and $N(1,1)$ and showed that the procedures he considered are asymptotically optimal, a property he called asymptotically subminimax. Later this work was generalized to two completely specified distributions by Hannan and Robbins (1955). Since then a huge literature has evolved on this subject. However, the work most relevant to the present one is that of Gilliland (1966) and Singh (1974). Gilliland considered sequence compound estimation for certain discrete exponential families and the normal distribution, and Singh extended it to general exponential families. Singh estimated average $\mu$-densities and their derivatives, obtaining certain asymptotic properties, and used them in obtaining asymptotic optimality of his estimators based on these density estimators. He used the so called class-$r$ kernels (for additional material on this class, see, e.g., Devroye (1987)) in defining his density estimators.
In this thesis we extend Singh's method of density estimation to the case of right censored exponential families. Under the assumption that the censoring distribution is known, we obtain the asymptotic optimality of our set and sequence compound estimators for the special case of the ordinary exponential distribution, with rates near $n^{1/5}$ for the estimators based on divided difference estimators and near $n^{1/2}$ for the estimators based on kernel estimators. To deal with the more general situation when the censoring distribution is unknown, we first develop the product limit (PL) estimator of the average distribution function and use it in defining kernel estimators of the average of densities and their derivatives. Based on these kernel estimators we define our set as well as sequence compound estimators and obtain their asymptotic optimality by means of a certain $L_1$ bounds result for the PL estimator. Even though nonparametric kernel density estimation in the presence of censoring has been considered and explored by many researchers, it was carried out only in the case of estimating a common density. The present work seems to be the first to estimate the average of densities and their derivatives when the observations are right censored and to use these estimates in the set and sequence compound estimation problems.

The material in this thesis is organized as follows. In Chapter 2 we consider estimation of parameters of the exponential distribution under the assumption that the censoring distribution is known. We define two classes of estimators of $\bar f_j$, $\bar f_j^{(1)}$, $(1-\bar F_j)$ and $(1-\bar F_j)^{(1)}$: those based on (i) divided difference estimators and (ii) kernel estimators. In the former case we show that the compound estimators so defined are a.o. with rates near $n^{1/5}$, and in the latter that they are a.o. with rates near $n^{1/2}$. In both situations we show that the rates obtained here are the best possible in those classes by obtaining lower bounds in the case of a constant parameter sequence.
In Chapter 3 we consider set compound estimation of parameters of the standard exponential family when the censoring distribution is unknown. We define kernel estimators of $\bar f_j$ and $\bar f_j^{(1)}$ based on the PL estimator of $\bar F_j$ and use these estimators in defining our compound estimators. By using Theorem A.3.1 we then show that these estimators are asymptotically optimal.

In Chapter 4 we consider the sequence compound version of the component problems treated in Chapters 2 and 3. We obtain the asymptotic optimality of the sequence compound estimators defined there as corollaries to the results obtained for the set compound estimators.

The Appendix contains some miscellaneous results, most of which are used throughout the thesis. Among them the principal one regards uniform $L_1$ bounds for the maximal deviations of the PL estimator from the average distribution function $\bar F$ on the intervals $(-\infty, z]$ for each $z \in R$. This result is of independent interest.

CHAPTER 2

SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL DISTRIBUTIONS BASED ON THE CENSORING DISTRIBUTION

2.0 Introduction

In this chapter we consider estimation of parameters of exponential distributions when the observations are right censored. We develop estimators based on divided difference and kernel estimators of the average of certain densities and their derivatives, similar to those considered, and presumably studied extensively for the first time, in Singh (1974) in the uncensored case. While we defer the case of estimating densities in the presence of censoring without full knowledge of the censoring distribution until the next chapter, here we base our estimators on the censoring distribution, thereby obtaining rates of convergence. However, these techniques do not seem to yield rates in the case of general exponential families without more restrictive assumptions on the censoring distribution, among other things.

In Section 2.1 we formally introduce our specialized component problem.
In Section 2.2 we obtain expressions for the Bayes estimates $\Psi_{\omega_n}$ and $\Psi_{\omega_{nj}}$. In Section 2.3 we obtain a suitable upper bound for the modified regret. In Section 2.4 we define compound estimators by estimating the Bayes estimates versus the delete empiric distribution given in Section 2.2, and we obtain their asymptotic optimality in Section 2.5. In Section 2.6 we show that the rates obtained in Section 2.5 are, in fact, the best possible rates for the class of procedures considered there by obtaining a lower bound for the modified regret in the case of identical components. In Section 2.7 we define compound estimators based on kernel estimators and we show that the rates obtained there are close to 1/2, whereas the rates were close to 1/5 for the estimators based on divided difference estimators. Finally, in Section 2.8 we obtain exact rates of convergence for the estimators of Section 2.7 in the case of identical components.

2.1 The Component Problem

In this chapter we consider the component problem of Section 1.1 with $\Theta = [\alpha,\beta] \subset (0,\infty)$ and $F_\theta(x) = 1-e^{-\theta x}$ for $x > 0$ and $\theta \in \Theta$. Throughout this chapter we make the following assumption.

Assumption G: $G$ has a positive density $g$ with respect to Lebesgue measure on $R^+$.

2.2. Bayes Estimates Versus $\omega_n$ and $\omega_{nj}$

Since $f_\theta(x) = \theta e^{-\theta x}$ and $F_\theta(x) = 1-e^{-\theta x}$, and therefore

(1)  $f_\theta^{(1)} = -\theta f_\theta$ and $(1-F_\theta)^{(1)} = -\theta(1-F_\theta)$,

the Bayes estimates of $\theta$ against $\omega_n$ and $\omega_{nj}$ given in (A.2.6) specialize to

(2)  $\Psi_{\omega_n}(z,\delta) = -\delta\,(\bar f^{(1)}/\bar f)(z) - (1-\delta)\,((1-\bar F)^{(1)}/(1-\bar F))(z)$

and

(3)  $\Psi_{\omega_{nj}}(z,\delta) = -\delta\,(\bar f_j^{(1)}/\bar f_j)(z) - (1-\delta)\,((1-\bar F_j)^{(1)}/(1-\bar F_j))(z)$

respectively for $(z,\delta) \in (0,\infty)\times\{0,1\}$.

2.3. An Upper Bound For The Modified Regret

Since $\Theta = [\alpha,\beta]$, it follows from (1.2.4) that for a compound rule $\underline{\Psi} = (\Psi_1,\Psi_2,\dots,\Psi_n)$ with $\Psi_j$ taking values in $\Theta$,

(1)  $|D_n(\underline{\Psi},\underline{\theta})| \le 2(\beta-\alpha)\,n^{-1}\Big\{\sum_{j=1}^{n} E_n|\Psi_j-\Psi_{\omega_{nj}}(Z_j,\Delta_j)| + \sum_{j=1}^{n} E_n|(\Psi_{\omega_{nj}}-\Psi_{\omega_n})(Z_j,\Delta_j)|\Big\}.$

With $\lambda$ the Lebesgue measure on $R^+$ and $\xi$ the counting measure on $\{0,1\}$, it follows from Assumption G and (A.1.2) that $(Z_j,\Delta_j)$ has a density $p_{\theta_j}$ with respect to $\lambda\times\xi$ given by

(2)  $p_\theta(z,\delta) = \delta\,(1-G(z))f_\theta(z) + (1-\delta)(1-F_\theta(z))g(z)$ for $(z,\delta)\in(0,\infty)\times\{0,1\}$.

Since $\Theta = [\alpha,\beta]$ and $\sup_\theta p_\theta(z,\delta) \le e^{-\alpha z}(\beta + g(z))$ for $z\in(0,\infty)$, it follows from Remark A.2.1 and the inequality (A.2.4) with $\varphi$ the identity function that the second sum in rhs(1) is $O(1)$. Thus from (1) we have the following upper bound for the modified regret:

(3)  $|D_n(\underline{\Psi},\underline{\theta})| \le 2(\beta-\alpha)\,n^{-1}\sum_{j=1}^{n} E_n|(\Psi_j-\Psi_{\omega_{nj}})(Z_j,\Delta_j)| + O(n^{-1}).$

2.4 Procedures Based on Divided Difference Estimators.

By the bound for the modified regret in (3.3), a compound rule $\hat{\underline{\Psi}}$ will be a.o. if each $\hat\Psi_j$ so defined approximates $\Psi_{\omega_{nj}}$ in $L_1(E_n)$. Since the expression for $\Psi_{\omega_{nj}}$ in (2.3) involves $\bar f_j^{(1)}$, $\bar f_j$, $1-\bar F_j$ and $(1-\bar F_j)^{(1)}$, it suffices to estimate these quantities. Note from (3.2) that $1-F_\theta$ is the part of the density of $(Z,\Delta)$ which corresponds to "censored" and $f_\theta$ the part which corresponds to "uncensored". This suggests a natural way of estimating $\bar f_j$, $\bar f_j^{(1)}$, $1-\bar F_j$ and $(1-\bar F_j)^{(1)}$ and therefore $\Psi_{\omega_{nj}}$ for $1 \le j \le n$.

Our method here is the divided difference estimation of $\bar f_j^{(\nu)}$ and $(1-\bar F_j)^{(\nu)}$ for $\nu = 0$ and 1. Let $\langle c_n\rangle$ be a non-increasing sequence of positive numbers such that $c_n \to 0$ as $n \to \infty$. For each $k=1,2,\dots,n$, $z \in (0,\infty)$ and $\delta = 0,1$, define random functions $a_{\delta k}$ and $b_{\delta k}$ by

(1)  $a_{\delta k}(z)\{(1-G(Z_k))[\delta=1]+g(Z_k)[\delta=0]\} = c_n^{-2}\{[z+c_n$ …

2.5 Asymptotic Optimality of the Procedures Based on Divided Difference Estimators.

Since $(Z_j,\Delta_j)$ has density $p_{\theta_j}$ given by (3.2) and $\hat\Psi_j$ depends only on the other $(Z_k,\Delta_k)$'s, it follows by the independence of the $(Z_k,\Delta_k)$ for $k = 1,2,\dots,n$ that

(1)  $E_n|(\hat\Psi_j-\Psi_{\omega_{nj}})(Z_j,\Delta_j)| = \int E_n|(\hat\Psi_j-\Psi_{\omega_{nj}})(z,1)|\,(1-G(z))f_j(z)\,dz + \int E_n|(\hat\Psi_j-\Psi_{\omega_{nj}})(z,0)|\,(1-F_j(z))g(z)\,dz.$

We will obtain suitable upper bounds for $E_n|(\hat\Psi_j-\Psi_{\omega_{nj}})(z,\delta)|$ for $\delta = 0,1$ and use them in (1) in order to get an appropriate upper bound for the modified regret via (3.3). The following lemma (Datta (1988)) is a pointwise improvement of Lemma A.2 of Singh (1974) and is useful in bounding the modified regret.
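Before turning to the lemma, it may help to see concretely what the target $\Psi_{\omega}$ of these bounds is. The sketch below is a numerical check, under assumptions, of the representation in Section 2.2: for a finite prior on exponential rates, the ratio form $-\delta \bar f^{(1)}/\bar f - (1-\delta)(1-\bar F)^{(1)}/(1-\bar F)$ agrees with the posterior mean of $\theta$ computed directly from the density $p_\theta(z,\delta)$ of Section 2.3. The function names and the particular values used for $g(z)$ and $G(z)$ are hypothetical; those two values cancel in the posterior, which is why they can be arbitrary.

```python
import math


def mixture_terms(thetas, weights, z):
    """Return (f, f', 1-F, (1-F)') at z for the weighted average of exponentials."""
    f = sum(w * t * math.exp(-t * z) for t, w in zip(thetas, weights))
    df = -sum(w * t * t * math.exp(-t * z) for t, w in zip(thetas, weights))
    s = sum(w * math.exp(-t * z) for t, w in zip(thetas, weights))
    ds = -sum(w * t * math.exp(-t * z) for t, w in zip(thetas, weights))
    return f, df, s, ds


def bayes_estimate_ratio(thetas, weights, z, delta):
    """Bayes estimate written as a ratio of averaged densities and derivatives."""
    f, df, s, ds = mixture_terms(thetas, weights, z)
    return -df / f if delta == 1 else -ds / s


def bayes_estimate_posterior(thetas, weights, z, delta, g_at_z=0.7, G_at_z=0.4):
    """Posterior mean of theta computed directly from the density of (Z, Delta).
    g_at_z and G_at_z are arbitrary illustrative values; they cancel in the ratio."""
    like = [
        delta * (1.0 - G_at_z) * t * math.exp(-t * z)
        + (1 - delta) * math.exp(-t * z) * g_at_z
        for t in thetas
    ]
    num = sum(w * t * l for t, w, l in zip(thetas, weights, like))
    den = sum(w * l for w, l in zip(weights, like))
    return num / den


if __name__ == "__main__":
    thetas, weights = [1.0, 2.0, 3.0], [0.2, 0.5, 0.3]
    for delta in (0, 1):
        print(delta,
              bayes_estimate_ratio(thetas, weights, 0.5, delta),
              bayes_estimate_posterior(thetas, weights, 0.5, delta))
```

Because only ratios enter, the weights need not sum to one, so the same check covers the delete empiric distribution with normalizing factor $n$ instead of $n-1$. The estimate always lies between the smallest and largest rate in the prior, in line with the remark that the Bayes estimates take values in $\Theta$ when $\Theta$ is convex.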
Lemma 5.1 (Singh—Datta): For e R5 and 2160 S L, (2) lzl {|§ - i) A L) s ly-YI + ((3,!) + Luz—2). Let en = 11-1/5 for the remainder of this section. Lemma 5.2 below will be used in the proof of our main result of this section, Theorem 5.1. Lemma 5.2: For each 7 E (0,2] , 3 numbers M1 and Mo such that for 1 5 j 5 n the following two inequalities hold: (3) f,‘7(z).5_ ,(I(51- 51,7) +Ib1j-51j)|75) Mlj(l+(f (z)(1—G(z+2cn ))) 7/2)c7, l3 (4) (1—F,(z))‘7s,(l(50,-- 50,I7 + I130,-50,75I) M0 (1+((1- .5 jn(2))52, (2))‘7/2)c7 cn Proof. We shall prove (3) in detail; (4) follows by similar arguments. By the moment inequality and sub—additivity of the 7/2 power, -.- .. -.- 2 1- .— (5) anIal, - 5,,I7 .<. (Var(a,,))7/ + Isnal, — 5,,I7. By the independence of 51k, for k=1,2,...,n and the second moment bound for the variances of the summands, (6) Var ('51,) g n-zki'j skim? But . 0‘4 z+2cn fk( x) (7) Ekalk2 = 0n c -0 (z+2c ) < (1—G(z+2(cnlc;14)) (e 0“ n —e k n) by the monotonicity of G and then the exact evaluation of the resulting integral. From the mean value theorem, -2 c —e 01‘ n = -20kcne—£ with 0<§<20kcn. Since 0. Note that, from the definitions (4.1) and (4.2) of 51k and Elk for k = 1,2,..,n—1, the () are a row i.i.d. array of random vectors since the (Zk,Ak)'s are i.i.d. Let 0&2 and 0,? denote variances of 5.11 and 611 respectively. 18 From the definition of alk’ as n -3 do, Z+ZC +c — 1 (5) E 1111,, = on2 U: ‘1 i,(t)- f2 “ f0(t)} .. is )(z) and z+2cn (t) 2f0(z) ‘33 E15 3112‘ = c 111]; III-101?)“ 71137—7 and therefore 2f (2) 3 2 0 (6) CD 0a 4 m. From the definition of Blk’ as n -+ co , . —l z+cn (7) Eb =c ft)-Ifz) 1 1k 11 1; 6A 0( and _1 z+cn f0(t) f 0(z) anlblk = c11 f dt " Fonz and consequently, f (Z) (8) Cu Ub2 " W0 Z 0 Since 2 A . _l Z+Cn f0(:) {0(2) Cn Elalkblk = ‘cnl; dt " ‘ Tim—)2 1 it follows from (5) and (7) that . . f (Z) (9) cu2 Cov(a1k,blk) -1 - I'Iofli)’ . Let 1 2 2 ~ . l 2 * ‘ = is a row i.i.d. 
array, the covariance matrix of Sn_1 is equal to that of T111 and hence converges to 1‘. Therefore, since {[3] , [9]} generates the column space of I‘, it will follow by the Fabian and Hannan ( (1985); Theorem 4.3.2) CLT that (11) Sn—l .2. N(0,I‘) provided the arrays < (Tngt),...,Tm(12) > satisfy the Lindeberg condition for l = 1 and 2. But, since is a row i.i.d array for each I, the Lindeberg conditions then reduce to (12) slang())2/Var(rn§‘))[('rn{5))2>(n—1)Var(rn§[))cz] .. 0 for each 6 > 0. It follows by the definitions of alk and b1k that ”23.11,“,Jo = O(c;2) and ”Blkllm = O(c;1) and consequently, from (6) and (8) and the fact that ncn -) do, the events in (12) are eventually empty. Thus from (11) ; (1) 2 s (13) < (ncn3)1/2(a,—i, (z)) , (ncn)1/ (bl-ion» > .2. N(0, 1). From this and the delta method theorem (see, e.g., Theorem 4.4.2 in Fabian and Hannan (1985)) applied to the quotient function with the differential at (f0(l)(z),f0(z)) equal to l/f0(z) l -f,(z)/(f,(1)(z))2l here, it follows (using the fact (ncn'3)1/2 = n1/5) that (14) n1/'Ev’(-a:.l/I)1 — 0) 42-) N(0,n) where It = 2/(f0(z)(1—G(z)))). Recalling that 11n(z,1) is the retraction of the ratio mil/I11 to the interval [afl], it now follows from (14) and the Fatou Theorem for convergence in distribution (see, e.g., Loéve (1963), Theorem 11.4.A(i) ) that (15) li_m n2/5En(\Iln(z,1) — 0)2 2 K or Ic/2 20 according as 0 6 (0,0) or 0 6 {01,3}. Now by an application of Fatou's Leanna to rhs(4) we obtain (1) from (15). Next, we shall prove (2). From (3), the inequality concerning the non-negativity in (ii) is immediate. Since \Iln takes values in [0,5], it follows from (3) that for 7E(0,2) . 2__ . (16) D423") 5 (3—a) 7 anlwn(zn,A,,)—0I7. Since It“, .5 0 (because wnj is degenerate at 0) and (5.11) holds with 3 HJ replaced by a (by assumption), it follows from (5.12) that (17) rhs(16) 5 (fl—o)2—7Bn—7/5 eventually. 
Now by choosing an U sufficiently larger than (fl—a)2-7B (to compensate the initial terms) we obtain from (17) the second inequality in (2). 1:) Remark 6.1: The assertion regarding the lower bound for the modified regret in the above theorem can also be proved by techniques similar to those in Singh (1974) in connection with his result on the lower bounds but those make use of the Berry Esseen inequality (however, his proof there is incomplete; see Section A.4 of this thesis for a completion of his proof) and Theorem 2 of Hoeffding (1963). 2.7 Procedures Based on Kernel Estimators. Let r be an integer greater than 1. For u=0,1,2,....,r—l, set (1) Jr = { K: Kbounded Borelmeasurable vanishing off (0,1) and a} V .5ij K(y)dy =[j=1/] forj = 1,2,...,r-l In interest of typographical simplicity estimates of h(V) will be denoted by 21 11(1’ ) and, when h = l—H, I) will be denoted by l-H in what follows. In this section we deve10p compound procedures based on kernel estimator of ng) and (l-F.)(V) for V=0, 1 (for how the problem reduces to estimating fg") and (l—Fj)(”) for V=0,l see the introductory paragraph of Section 2.4). 11—1 / (1+2r). In this and the next sections we let CD = For V = 0 and 1 and for K” E .X I» define on (0, do) ‘ V — 2k“ (2) 1(1) =(ncn"+1) 1k§jtx,[-;;]IAk=1)/(1-G(zk))) and ‘ V V - Z _. (3) (1-F,)()(-)= can“) 1k§j(K,(—‘;T))Ak=01/g(zk)}. We define our (set) compound estimator Q by QLZJA) = (\i’l(Z1’Al)’ @2(Z21A2)1m9‘in(zniAn)) where for 1_<_an and (2,6) 6 (0,ao)x{0,1}, (4) 3,123) = [- 3(?§1)/?,)(z) ..(1-1m1.s,)(1)/(1.1‘:~,))(z)] with 000,6 = a[x,6]. 0) Lemma 7.1 and Theorem 7.1 to follow are analogous to Lemma 5.2 and Theorem 5.1 respectively. Lemma 7.1: For each 7E(0,2] 3 numbers MI and M3 such that (5) fj-‘YE—n'ng)- {EWI‘Y s (c,"1M{)7(3'7+(5,(Hi(- + c,))‘7/2) and (6) (1—F,)‘7s,I((1—F§”)) - (1—F,)("))7 s (cur‘IM;)7(3'7 + ((1-F,)g,n)‘7/2) 22 for v=0, 1. Proof: We shall prove (5) in detail and (6) then follows by similar arguments. 
We omit or exhibit evaluation at 2 as convenient. By moment inequality and sub—additivity of the 7/2 power, for every 72(0, 2], “ V) _ ,V) 7 < 7 7 (7) EDIT] if I - a + B where 2 = ‘ V) = " V)_ .V) o Var(T§ )and B @117] 15). The definition of I?) and (3.2) give . l (8) EDTEVMZ) = 0;” 1;, KV(t)Ij(z+cnt)dt fgr)(z+cnt 17) r l _V 1 r—l ilk)(z) = C11 1;) Ky(t){ REG—LET“ where the expression in braces is the r-th order Taylor expansion of f. at z. k nt) + (ont)’} dt Distributing the integral in rhs(8), the first term in rhs(8) is Igy)(z) by the defining property of K” in (1). Thus, from (8) it follows that cH’ 1 ) n r r (9) B 5 T 11);) Ifg (z + cntn)|t dt where M is the common bound for K0 and K1. Since fl?) = (-1)r0]l;fk and 0k E [cam for k=1,...,n and 0 < cn, r), it follows that 021,131 If§’)(z + ants) = 5912)). Thus, from (9) we Obtain that 7 7 M H 7 7 (10) B S Ar ((17:17! On ) Ij (Z)- By the independence of for k=1,2,..,n and the second moment bound for the variances of summands and the definition of I'g"), 23 2 V+1 --2 2 y-_z 1 . s (nc. ) ,3, 15.1 antiserum) d) = (nc,,2("+1))'1 fK3(12f>i,-(y>u:oim dy (11) = (“CDZHIYI fK12/(Y)fj(z+cnfl(l-G(z+cny)) dy S (ncnm’l'1 (1—G(z+cn)))-l f K3(y)fj(z+cny) dy. Since sup f.(z+c y)=f.(z), and K2 _<_ M2 and (nc OSySl J n J V 11 due to the fact CD = n—l/(1+2r)’ it therefore follows from (11) that (12) «75 (11%“)7(I,-/(1—G(z+c,)))7/2. Now (5) follows from (7), (10) and (12) with M; = (M/rI)VM7). o 2V+1—1_ 2(r—V) ) —Cn Next, we state and prove the main result of this section. Theorem 7.1: (i) If 76(0, 2] and for some n = 110 (13) 1—G(Z) + Z } ‘(0'167/2)de < m f (1-G(z+<‘n)"/2 (scn(z))" 2 6 then for 2 defined by (4), 3 a number B* such that 14 ”.- . . 7 < * -7(r-1)/(1+2r) , ( ) lzggnsnlorj 11: “anZJ’AJH _ B n v 11 (ii) If 76(0, 1] and (13) holds, then (15) flung], Inns, 3)) = 0(5‘7"1)/(1+2‘)). 2 Proof: We shall first prove (13). Since a s 3,, 3,”; I) and (HP/id)v((1—F,)(1)/(1-F,)I) s 3 Vi. 
(2.3), (4) and two applications of Lemma 6.1 yield the following inequalities: “(I‘L- \Il )('16)| J wnj i( 1) i0) _l_ _ _l_ I]. 1']. _fi, (1) _p, (1) (_1_13__ _ (_1_1)__ A(fl—a)] A s 6] (1.15,) (15,) 1(s.o)} + (1—6){ 24 (1o) 5 a ij‘lui‘gD-igll) + (zs—anij—ijl} + (1-6)(1-F,-)'1{|(l-13,)(1)-(1-F‘j)(1)| + (23-5))(1—5,Hl-F,)I). for (,5) E (0,00)X{0,1}. By the same reasons given in obtaining (5.15) from (5.14), we Obtain, here, from (16) that for (4,5) 6 (0,00)X{0,1}. (17) 2-2(7_1)+E lei-11' )(-,6)I"s —n j wnj -7 “(1)- (1) 7 7 ‘ 7 of]. {gnnj i]. | + (25-h) Enlfj—fjl }+ _ _--7 _1 (1) _- (1)7 7 _1 _ 7 (1 0(1 F,-) (sum F,-) -(1 F,-) | + (23-0) E,,|(l F,)-(l F,)| } by Lemma 8.1, the first and the second terms in rhs(l7) are bounded by Ii(1v(2s—o)7)rhs(5) and (1-3)(1v(2s—o)7)rhs(d) respectively. From this and the 7—analog of (6.1), we obtain from (17) that (13) snug-1,, )(z,,A,-))7 5 112,71“) 11) f.(z) l-F.(z) 1 1 l-G(ZL 1 5(2) d ( + ”(71(2))”?(1—G(z+2c..))7/2 + (1-F,-(z))7/2 (g. (2))”2 } Z) II 2 1 with M* = 2 (7— )+(1vs'7)(M’f7v M37) where M3 and M‘f are as in Lemma 7.1. Since the 0j6[o,fi] and (fr/(797(2) v((1-F,-)/(1-F,-)7/2) s (1v35‘7/2)e'(a-fl7/2)°. the integrals in (18) are bounded by (lvfla-7/2) times the integrals in (13). But lhs(13) is non—increasing with respect to n by the monotonicity of G and the definition of gc(-). Thus (14) follows from (18) weakened by the above bound with B*sufficiently greater (to hold also for the term 11 5 n0) than M*(1 +(1vdo’7/2)1hs(13) for n = no) 25 We next prove (15). If 7E(0,1] and (13) holds, then from (14), r: _ 1_.),)( (19) EggnEnM‘I’j anj)(zjiAj)l S (5‘0) { 111304)} V 11 since a 5 \Ij, \Pw .5 ,6 Vj. Therefore from (19), the first term in rhs(3.3) is nJ 0(n—7(r-1)/(1+2r)) . Consequently, (l5) follows from (3.3). n 2.8 Best Possible Rates for the Procedures Based on Kernel Estimators in the Identical Component Case. 
For the class of procedures defined via (7.2), (7.3) and (7.4) we Obtain in this section rates of convergence of the order n7(r—1)/(l+2r) for 0<7<2 when Q is a vector of identical components. By obtaining a lower bound (see part (i) of the theorem below) we show that the rates are best possible for this class of procedures. The proof of the theorem below is quite similar to that of Theorem 2.6.1 and therefore most of the arguments in obtaining similar conclusions are repetitive. However, the class of procedures are quite different and consequently there is an improvement over the rates of convergence provided the degree r of the class of kernels defined by (7 .1) is sufficiently large. For the remainder of this section let 6” = (Q,0,...,Q) with a fixed 0 E [as/il- Theorem 8.1: Let on = n_l/(1+2r) and Q be defined by (7.2), (7.3) and (7.4). (i) Then (1) n2("1)/(1+2‘) Dn(_€_1'1_,r°) .. 00 as n .. a . 26 (ii) On the other hand, if 7 6 (0,2) and (7.13) holds with 6 replaced by a, then 0 5 Dn(\_II_,0‘”) for each 0 E 6 and 3 Ur < do such that (2) 3025 D4219”) s Ur {79'1” (”20 for all n. Proof: By following exactly the same analyses leading to (6.4), we have the following lower bound for the modified regret: as m as (3) 11,,(23‘) 2 [0 §n(2(z,1)-0))2(1-G(Z))f9(2)d2- Let 2 be a fixed number in (0,oo). We denote the ratio In(1)/I'n simply by 7(1)]? where the later averages are usual averages corresponding to n—l elements. We next obtain the asymptotic distribution of . For V = 0 and 1 and k=1,2,...,n—1, let ._ 1 5152 _ (4) Ame) — 31713 K,[ ,n ][Ak—ll/(1-G(Zk)) D and note that the () are a row i.i.d. array of random vectors since the (Zk,Ak) are i.i.d. 
It follows from the definition of pa in (3.2) that 1 _ —V EIAVk — C11 0 KV(t)f0(z+tcn)dt 1 r——1i,,(k)(z)(tcn)k t,,(')(z+tcnry)(tcn)‘}dt (5) = «1:7, 5.111,; n + .1 r-V to“) + 31:1— 1;)lf0(r)(z+tcnn)trdt where the second equality follows by the r-th order Taylor's expansion of f 0 at z and the last equality follows by the orthogonality prOperty of the class .132. But the integrals in the second term of the extreme right hand side of (5) are absolutely bounded by ff0(z)/(r+l) and consequently it follows from (5) that 27 (6) ElAVk '2 f0(u) as n '2 00. From (4) and (4.2), it follows that 2 1K (t)f (z+tc) (z) (7) C121V+1E1Auk = 1;, V l-G(za+tcn)n d‘ 0 EU}. 192“)“ where the convergence above is a consequence of the facts that f 9 and G are continuous and the integrands converge dominatedly. It also follows that (Z) (s) cn3Cov(A0k,A1k) 4 ”0 FTP-(Elf K1(t)K0(t)dt. Let Ta. = (3.1173111?) = (amen-314111.13 ”2310 1-E An.»- Then it follows from (6), (7) and (8) that, as n -) do , the covariance matrix of Tnk converges to 1 r j; K12(t ) dt K1(t)K0(t)dtq :c"._ (9) r = 3(1) K)l(t K0(t)dt f12K0(t)dt 1 n—l Let Sn—l = k31(n—1)-1/2Tnk. Since is a row i.i.d. array the covariance matrix of Sn—l is equal to that of Tnl and hence converges to 1‘. Therefore, it will follow, as in Section 2.6, that (10) s,,_l 39-1 No.1“) if the arrays <(Tnge),. . . ,Tnngf)» satisfy the Lindeberg condition for l? = 1 and 2. But, since is a row i.i.d. array for each I, the Lindeberg conditions then reduce to (11) EI(TD{‘))2/Var(rn{())[(Tn]‘))2>(n—1)Var(rn§(he?) .. 0 . for each 6 > I). From (4), "Al/kn,” = 0(c;(”+1)). Therefore, from (6) and (7) and the fact that (n-1)cn -1 do, the events in (11) are eventually empty. Thus, from (10) and the definitions of Tnk and Ank and 1M"), it follows that 28 (12) (ncn3/2(i'(1)—r§1)), (ncn)1/2(i—t,)) =2. N(0,I‘). Therefore, from ( 12) and the delta method theorem as applied in Section 2.6, (13) 10H” (”31-1971 - 3) 4‘9» No.0 1 . .. where c = 1") K12(t)dt/(f0(2)(l—G(z)). 
Since $\hat\Psi_n(z,1)$ is the retraction of the ratio $-\hat f^{(1)}/\hat f$ to the interval $[\alpha,\beta]$, it follows from (13) and the Fatou Theorem for convergence in distribution that

(14)  $\liminf_n\, n^{2(r-1)/(1+2r)} E_n(\hat\Psi_n(z,1)-\theta)^2 \ge c$ or $c/2$

according as $\theta \in (\alpha,\beta)$ or $\theta \in \{\alpha,\beta\}$. Now Fatou's Lemma applied to rhs(3) and (14) together prove (1).

If $\gamma \in (0,2)$, then

(15)  $D_n(\hat{\underline{\Psi}},\underline{\theta}^{\infty}) \le (\beta-\alpha)^{2-\gamma}\, E_n|\hat\Psi_n(Z_n,\Delta_n)-\theta|^{\gamma}.$

Since $\Psi_{\omega_{nj}} \equiv \theta$ and (7.13) holds with $\beta$ replaced by $\alpha$, from (7.14) we obtain that

(16)  rhs(15) $\le (\beta-\alpha)^{2-\gamma} B^{*} n^{-\gamma(r-1)/(1+2r)}$

eventually. Now by taking a $U_r$ sufficiently large to compensate for the initial terms in (16) and noting that lhs(15) is the modified regret in our situation, the second inequality in (2) follows from (15) and (16). The non-negativity in (ii) is immediate. □

CHAPTER 3

SET COMPOUND ESTIMATION OF PARAMETERS OF EXPONENTIAL FAMILIES BASED ON PRODUCT LIMIT ESTIMATOR

3.0 Introduction.

In this chapter we consider the set compound version of the component problem described in Section 1.1. We assume that $F_\theta$ has density $f_\theta$ with respect to a measure $\mu$ on $R$ given by $f_\theta(x) = d(\theta)e^{\theta x}$, where $d(\theta) = (\int e^{\theta x}\,d\mu(x))^{-1}$, and that $\mu$ has a positive density $u$ with respect to Lebesgue measure restricted to $(a,\infty)$ for an $a \ge -\infty$. We take $\Theta = [\alpha,\beta]$, a subset of the natural parameter space $\{\theta: d(\theta) > 0\}$. Let

(1)  $m = \sup_\theta f_\theta$ and $L = |\alpha| \vee |\beta|$.

We shall denote the retraction of a function $h$ to the interval $[a,b]$ by

(2)  $(h)_{a,b} = a \vee (h \wedge b).$

3.1 A Brief Review of Density Estimation in the Presence of Censoring.

Kernel density estimation based on the PL estimator has been considered by many authors. See, e.g., Földes, Rejtő and Winter (1981), Blum & Susarla (1980), Padgett & McNichols (1984) and the references there, Mielniczuk (1986) and Susarla and Van Ryzin (1986). In most of the papers appearing in this area various asymptotic properties of the kernel estimators based on the PL estimator have been studied.
Many of the asymptotic results (especially those with rates) have been obtained via rates of convergence of the PL estimator to the corresponding distribution function; e.g., in Földes, Rejtő and Winter (1980) strong consistency of the PL estimator is obtained and the results are used in Földes, Rejtő and Winter (1981). Susarla and Van Ryzin (1986) consider empirical Bayes squared error loss estimation of the natural parameter of an exponential family and reduce the problem to estimating the density and its derivative. They make use of a result in Gill (1983) regarding the convergence of the PL estimator on the whole real line to prove the asymptotic optimality of their empirical Bayes procedures. Recently, kernel density and hazard rate function estimation in the presence of censoring via a strong representation of the Kaplan-Meier (PL) estimator has been considered in Lo, Mack & Wang (1989). In Diehl & Stute (1988) the kernel density estimator is represented in terms of a sum of independent random variables plus a negligible remainder, from which they determine the exact rate of pointwise and uniform convergence, among other things. Padgett & Thombs (1989) consider a nonparametric estimator of the quantile function of the lifetime distribution, again through the kernel density estimation method.

However, all the papers mentioned above consider the i.i.d. situation. Moreover, the results are based on almost sure convergence. Consequently, neither these results nor their obvious extensions are adequate for our purpose. By applying the $L_1$ bounds for the maximal deviations of the PL estimator of the average distribution function obtained in Section A.3 we are able to overcome this problem, and it turns out that these types of bounds are just what we want.
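For readers less familiar with the PL estimator referred to throughout this review, the following is a minimal sketch of the classical Kaplan-Meier product limit estimate of a lifetime distribution function from pairs $(Z_k,\Delta_k)$. It is the standard i.i.d. version only; it is not the estimator of the average distribution function, nor its delete modification, developed in Section A.3, and the function name `product_limit` is hypothetical.

```python
def product_limit(sample):
    """Classical Kaplan-Meier (product limit) estimate of the lifetime d.f.

    sample: iterable of (z, delta) with delta = 1 for an uncensored lifetime
            and delta = 0 for a censored one.
    Returns a list of (z, F_hat(z)) at the distinct uncensored times, in
    increasing order of z.
    """
    data = sorted(sample)
    n = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < n:
        z = data[i][0]
        deaths = 0
        ties = 0
        # Group all observations tied at the same time z.
        while i + ties < n and data[i + ties][0] == z:
            deaths += data[i + ties][1]
            ties += 1
        at_risk = n - i  # number of observations with Z >= z
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            steps.append((z, 1.0 - survival))
        i += ties
    return steps


if __name__ == "__main__":
    # Two lifetimes observed, two censored.
    print(product_limit([(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 0)]))
```

With no censoring the estimate reduces to the ordinary empirical distribution function, which is a convenient check. When the largest observation is censored the estimate stays strictly below one, which is one reason results on the whole real line require extra care.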
Campbell and Földes (1984) define a generalized product limit estimator for weighted distribution functions based on censored data and prove its consistency for the weighted average of the distributions, a result comparable to Singh's (1975) result. For the reason mentioned in the previous paragraph this result is also not applicable to our situation. However, by simple application of exponential bounds in Földes and Rejtő (1981) and in Singh (1975), in Section A.3 we obtain rates for the $L_1$ convergence of the maximal deviations of the PL estimator of the average distribution function (see Theorem A.3.1). It turns out that even a modified version of the PL estimator (i.e., the delete case) inherits these asymptotic properties, which makes its application to the compound problems straightforward. In the next section we define the kernel estimator of the $\nu$-th derivative of the average of densities for $\nu = 0, 1$ based on the estimator defined in (A.3.15).

3.2 Compound estimators of $\underline{\theta}$ based on PL estimator of $\bar F_j$.

In this section we define kernel estimators of $\bar f_j^{(\nu)}$ for $\nu = 0, 1$ based on the PL estimator of $\bar F_j$ and use them in exhibiting the compound procedures. Let $r > 1$ be an integer. For $\nu = 0,1,2,\ldots,r-1$, let

$\mathcal{K}_\nu = \{ K : K$ Borel measurable, vanishing off $[-1,1]$, and $(j!)^{-1}\int y^j K(y)\,dy = [j = \nu],\ j = 0,1,\ldots,r-1 \}$.

The class $\mathcal{K}_\nu$ of kernels defined in (2) includes the class of kernels defined in (2.8.1). Let $c = c_n$ be a non-increasing sequence of numbers such that 0 [...] so is $\hat\psi_j$. This fact makes a considerable contribution in obtaining the asymptotic optimality of the procedures $\hat{\underline{\psi}}$, as we shall see in the next section.

3.3 Asymptotic Optimality of $\hat{\underline{\psi}}$.

In this section we will state and prove our main result, Theorem 3.1. Lemma 3.1 below obtains the $L_1$ consistency of the estimator $\hat f_j^{(\nu)}$ of $\bar f_j^{(\nu)}$ for $\nu = 0, 1$ and it will be used in the proof of Theorem 3.1 along with Lemma 3.2, which itself is based on Lemma 3.1.
In proving Lemma 3.2 and Theorem 3.1 we will make use of some of the properties of exponential families as stated in Section A.6 without further mention. Lemma 3.1 is obtained mainly by an application of Theorem A.3.1.

Lemma 3.1: Let G be continuous and z be a number such that $G(z) < 1$. Let $\langle c \rangle$ be a non-increasing sequence of numbers such that 0 [...] $\inf_\theta d(\theta)\int_z \{e^{\alpha x} \wedge e^{\beta x}\}\,d\mu(x) > 0$ and $\inf_\theta (1 - F_\theta(z)) \ge \inf_\theta d(\theta)\int_z \{e^{\alpha x} \wedge e^{\beta x}\}\,d\mu(x) > 0$, it follows that $\inf_j (1 - \bar H_j(z)) > 0$ and $\sup_j \bar H_j(z) < 1$.

Let $t_n$ denote $L_n \log n$ and $n'$ denote $n-1$ for the remainder of this section. Since $\log n/(n c^4) \to 0$ by our assumption, there exists a sequence of numbers $L_n$ diverging to $\infty$ such that $t_n/(n c^4) \to 0$. Since $L_n \to \infty$ and $\inf_j (1 - \bar H_j(z)) > 0$ and $\sup_j \bar H_j(z) < 1$ for each $z \in \mathbb{R}$, the conditions in (A.3.6), (A.3.7) and (A.3.8) hold eventually for each $z \in \mathbb{R}$. Thus it follows from Theorem A.3.1 that, eventually, for each $z \in \mathbb{R}$

(3) -M1(z)n’ (14(2)) 1211!) E( 21192 IF*(1)-F(1)I) S fin ,7n‘ + 69 + l]_ <11 -00 "M Z I "" Z I i, n, 2( )Ln +1/2+ 2e2 n, n 11’ M3( )Ln +1+ 03 n n + 327

where $M_1$, $M_2$, and $M_3$ are finite and positive functions of z independent of j and $\underline{\theta}$ and, consequently, rhs(3) is independent of j and $\underline{\theta}$.

Let $Y_z$ denote $K_\nu((\cdot - z)/c)/u(\cdot)$. Then by the definitions in (2.3) and (2) respectively,

(4) 5’“ (i?) (z) - 3%) = f Y,(1)d(fi",? - 5,)(0

since $\hat F_j$ and $\hat F_j^*$ induce the same measure. Since $K_\nu$ and $u^{-1}$ are of bounded variation on compacts, each $Y_z$ is of bounded variation on $[z-c, z+c]$. Therefore $Y_z$ is continuous there except possibly on a countable subset, say $D_z$. But $\hat F_j$ assigns mass only to those observations which come from $F_k$, $k = 1,2,\ldots,n$, $k \ne j$. Since each $F_k$ is absolutely continuous, it therefore follows that $[\hat F_j\{x\} > 0]$ is $\underline{P}$-null for each x in $(a, \infty)$. Consequently, $\int_{D_z} d\hat F_j = 0$ a.e. ($\underline{P}$). Since $\bar F_j(D_z) = 0$, it now follows that the integral in (4) is equal to $\int Y_z^*(t)\,d(\hat F_j^* - \bar F_j)(t)$ a.e. ($\underline{P}$), where $2Y_z^* = Y_{z+} + Y_{z-}$.
Since $\bar F_j$ is absolutely continuous and $(\hat F_j^*)^* = \hat F_j^*$, it follows that $(\hat F_j^* - \bar F_j)^* = \hat F_j^* - \bar F_j$. From this and the facts that $K_\nu(1+) = K_\nu(-1-) = 0$, it follows (by the integration by parts formula in Theorem 21.67 part (v) and Remark 21.68 extending it to functions of bounded variation in Hewitt and Stromberg (1965)) that the integral in (4) is equal to $-\int (\hat F_j^*(t) - \bar F_j(t))\,dY_z(t)$. Thus from (4) we obtain for $\nu = 0$ and 1 that

(5) sIfi")(z)-ii")(z)l s 52.3. sup IFT(1)-F-(1)If Ide(t)l, with the supremum over $z-1 \le t \le z+1$ and the integral over $[z-1, z+1]$.

Therefore, since the total variation of $Y_z$ is finite, by our assumptions on $\langle c \rangle$ it follows from (3) that rhs(5) is o(1) for each z uniformly in j and $\underline{\theta}$. By the orthogonality of $K_\nu$ and the r-th order Taylor expansion of $\bar f_j$ as used in obtaining (2.7.10), it can be shown here that

(6) IT] (z)1§ (z)) 57,1. ||K,,||,,e 1,(z). $\nu = 0, 1$

where L is as defined in (0.1) and $\|K_\nu\|_\infty$ is the sup-norm of $K_\nu$. Since $\bar f_j(z) \le m(z) \le \sup_\theta d(\theta)(e^{\alpha z} \vee e^{\beta z})$ and $\|K_0\|_\infty \vee \|K_1\|_\infty < \infty$, it follows from (6) that

(7) 323 1:11.111!) Ii] () I] ()l = o(1).

Since lhs(7) is non-stochastic, (1) now follows from the triangle inequality and the asserted behavior of (5) and (7). □

Lemma 3.2. Let the hypothesis of Lemma 3.1 hold for each $z < \infty$. Then

(8) $\sup_{\underline{\theta}} \max_{1\le j\le n} \int E|(\hat f_j^{(1)})_{-mL,mL} - \bar f_j^{(1)}|\,d\mu = o(1)$

and

(9) $\sup_{\underline{\theta}} \max_{1\le j\le n} \int E|(\hat f_j)_{0,m} - \bar f_j|\,d\mu = o(1)$.

Proof: Lemma 3.1 shows that the E values in (8) and (9) converge pointwise to 0 uniformly in $\underline{\theta}$ and j. Therefore (8) and (9) follow from the D.C.T. with the dominating $L_1$-functions $2mL$ and $m$ respectively. □

Theorem 3.1: Let the hypothesis of Lemma 3.1 hold for each $z < \infty$. Then for $\hat{\underline{\psi}}$ defined by (2.5)

(10) $\sup_{\underline{\theta}} \max_{1\le j\le n} E|(\hat\psi_j - \psi_{\omega_{nj}})(Z_j, \Delta_j)| = o(1)$

and

(11) $\sup_{\underline{\theta}} |D_n(\hat{\underline{\psi}}, \underline{\theta})| = o(1)$.

Proof: Since the $(Z_k, \Delta_k)$ are independent and for each j $\hat\psi_j$ depends only on $(Z_k, \Delta_k)$ for $k \ne j$, and since $(Z_j, \Delta_j)$ has density $p_j$ given by

(12) $p_j(z,\delta) = \delta(1 - G(z)) f_j(z) u(z) + (1-\delta)\int_z^\infty u(t) f_j(t)\,dt$

with respect to the dominating measure given in Section A.1, it follows that

(13) $E|(\hat\psi_j - \psi_{\omega_{nj}})(Z_j, \Delta_j)| = \int E|(\hat\psi_j - \psi_{\omega_{nj}})(z,1)|\,(1 - G(z)) f_j(z)\,d\mu(z) + \int E|(\hat\psi_j - \psi_{\omega_{nj}})(z,0)|\,(\int_z^\infty f_j\,d\mu)\,dG(z)$.

Since $\hat\psi_j, \psi_{\omega_{nj}} \in [\alpha, \beta]$ and $|\bar f_j^{(1)}/\bar f_j| \vee |\int_z^\infty \bar f_j^{(1)}\,d\mu / \int_z^\infty \bar f_j\,d\mu| \le L$, by the definitions of $\hat\psi_j$ and $\psi_{\omega_{nj}}$ given in (2.5) and (2.3) respectively, two applications of Lemma 2.5.1 and obvious weakening of the resulting bound by $B = 1 \vee (L + \beta - \alpha)$, we obtain the following two inequalities:

(14) $|\hat\psi_j(z,1) - \psi_{\omega_{nj}}(z,1)| \le B\,(\bar f_j(z))^{-1}\{|\hat f_j^{(1)}(z) - \bar f_j^{(1)}(z)| + |\hat f_j(z) - \bar f_j(z)|\}$,

(15) $|\hat\psi_j(z,0) - \psi_{\omega_{nj}}(z,0)| \le B\,(\int_z^\infty \bar f_j\,d\mu)^{-1}\{\int |(\hat f_j^{(1)})_{-mL,mL} - \bar f_j^{(1)}|\,d\mu + \int |(\hat f_j)_{0,m} - \bar f_j|\,d\mu\}$.

Since $f_\theta \ge f_\alpha \wedge f_\beta$, so is $(1-1/n)^{-1}\bar f_j$; thus (14) and Lemma 3.1 give

(16) $E \sup_{1\le j\le n} |\hat\psi_j(z,1) - \psi_{\omega_{nj}}(z,1)| = o(1)$ uniformly in $\underline{\theta}$ for each z.

Similarly $\int_z^\infty \bar f_j\,d\mu \ge (1-1/n)\int_z^\infty (f_\alpha \wedge f_\beta)\,d\mu$; thus (15) and Lemma 3.2 give

(17) $E \sup_{1\le j\le n} |\hat\psi_j(z,0) - \psi_{\omega_{nj}}(z,0)| = o(1)$ uniformly in $\underline{\theta}$ for each z.

Since $\bigvee_{j=1}^n f_j \le m$, it follows from (16) that the integrand of the $\mu$-integral in rhs(13) is o(1) uniformly in j and $\underline{\theta}$. Therefore, since this integrand is bounded uniformly in j and $\underline{\theta}$ by $(\beta-\alpha)m \in L_1(\mu)$, the D.C.T. shows that the first term in rhs(13) is o(1) uniformly in j and $\underline{\theta}$. Since the integrand of the G-integral in (13) is dominated uniformly in j and $\underline{\theta}$ by $(\beta-\alpha)$ and this integrand converges to 0 uniformly in j and $\underline{\theta}$ by (17), one more application of the D.C.T. shows that the second term in rhs(13) is o(1) uniformly in j and $\underline{\theta}$. Now (10) follows from (13), and consequently the first term in rhs(1.2.4) (with the $\psi_j$ there being, in our case, the $\hat\psi_j$) is o(1) uniformly in $\underline{\theta}$. Thus, the proof of (11) is complete once we show that the second sum in rhs(1.2.4) is o(1) uniformly in $\underline{\theta}$ and j. To show this, first note that from (12)

$\sup_{\theta\in\Theta} p_\theta(z,\delta) \le \delta\, m(z) u(z) + (1-\delta)$

and consequently the integral of the lhs of this inequality with respect to the dominating measure is finite since $m \in L_1(\mu)$.
From this, Remark A.2.1 and the inequality (A.2.4) with $\varphi$ there the identity function on $\mathbb{R}$, we get that the second sum in rhs(1.2.4) is o(1) uniformly in $\underline{\theta}$ and j since $\Theta = [\alpha, \beta]$. □

3.4 Some Examples and Remarks.

The hypotheses of Theorem 3.1 hold for many well-known exponential family distributions such as the Normal and the Gamma. Note that the assumptions on G are rather mild and consequently the theorem has wide applications. As indicated in Section 3.1, Susarla and Van Ryzin (1986) have considered the empirical Bayes version of the problem treated here and have obtained the asymptotic optimality of their estimators (which are different from ours) under more restrictive hypotheses on G. Our method of estimation can easily be specialized to the empirical Bayes situation and the asymptotic optimality of the resulting procedures can be obtained by techniques analogous to those used in the proof of Theorem 3.1.

Our proof of Theorem 3.1 depends heavily on the $L_1$ consistency of the Product Limit estimator with rates. However, this approach will not obtain rates of convergence for the asymptotic optimality of the compound estimators unless we impose more restrictive hypotheses on the censoring distribution, which could be vacuous. It seems that obtaining rates of convergence even in empirical Bayes estimation is not possible. If we can obtain the $L_1$ consistency of the PL estimator on the whole real line, then the techniques of this chapter give rates of convergence for the asymptotic optimality. Apparently, there are no results available in the literature to date concerning the mean consistency of the PL estimator on the entire real line, even in the i.i.d. case. Thus, it seems that in order to obtain rates of convergence in situations like ours a different approach is necessary, as pointed out in Susarla and Van Ryzin (1986).

CHAPTER 4

THE SEQUENCE COMPOUND ESTIMATION

4.0 Introduction.
In this chapter we consider the sequence compound version of the component problem treated in Chapters 2 and 3. In the sequence compound setting, at each stage j, we estimate $\theta_j$ based on the available observations $\underline{Z}_j = (Z_1, Z_2, \ldots, Z_j)$ and $\underline{\Delta}_j = (\Delta_1, \Delta_2, \ldots, \Delta_j)$. Thus for a sequence compound estimator (SQCE) $\hat{\underline{\psi}} = (\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_n)$ of $\underline{\theta} = (\theta_1, \ldots, \theta_n)$, each $\hat\psi_j$ is allowed to depend only on $\underline{Z}_j$ and $\underline{\Delta}_j$. As we pointed out in Section 1.3, the optimality criterion for a SQCE is the same as for the set compound estimator.

The results of this chapter are obtained as corollaries to the main results of Chapters 2 and 3. In Chapter 2 we assumed that $f_\theta(x) = \theta e^{-\theta x}$, $\theta \in \Theta = [\alpha, \beta]$, a subset of $(0, \infty)$, and that the censoring distribution G is known; in Chapter 3 we assumed that $f_\theta$ belongs to a general exponential family, $\theta \in \Theta = [\alpha, \beta]$, a subset of the natural parameter space, and that G is unknown.

4.1 A Useful Upper Bound for the Modified Regret.

Throughout this chapter we will denote the empirical distribution of $\theta_1, \theta_2, \ldots, \theta_{j-1}$ by $\omega_{jj}$. Let m and L be as defined in (3.0.1). By particularizing Lemma 2 of Chapter 2 in Singh (1974) to the case $\Theta_i = [\alpha, \beta]$ for all i (which is a consequence of inequalities (8.7) and (8.8) of Hannan (1957)), the modified regret of a SQCE $\hat{\underline{\psi}} = (\hat\psi_1, \hat\psi_2, \ldots, \hat\psi_n)$ of $\underline{\theta} = (\theta_1, \ldots, \theta_n)$ has the following upper bound:

(1) IDDIMI - 31.2 ‘25-“me s 11,-: gstIIj-wwjszjAjII.

Remark 4.1: We will be using the above bound for the modified regret in the next three sections to obtain the asymptotic optimality (a.o.) of the respective SQCE's to be proposed in those sections. Since $m \in L_1(\mu)$, the second term in lhs(1) is $O(\log n/n)$. Thus to obtain the asymptotic optimality of our SQCE's, it is enough to consider rhs(1).

4.2 Estimators Based on Divided Difference Estimators.

Here we assume that the censoring distribution G is known and that Assumption A (that G has a positive density g) of Section 2.1 holds. We now define our SQCE $\hat{\underline{\psi}}$ of $\underline{\theta}$ based on divided difference estimators of $\bar f_j$
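$= j^{-1}\sum_{k=1}^{j-1} f_k$, $\bar f_j^{(1)} = j^{-1}\sum_{k=1}^{j-1} f_k^{(1)}$, $(1 - \bar F_j) = j^{-1}\sum_{k=1}^{j-1} (1 - F_k)$ and $(1 - \bar F_j)^{(1)} = j^{-1}\sum_{k=1}^{j-1} (1 - F_k)^{(1)}$ by

$\hat{\underline{\psi}}(\underline{Z}_n, \underline{\Delta}_n) = (\hat\psi_1(Z_1, \Delta_1), \ldots, \hat\psi_n(Z_n, \Delta_n))$,

where for $1 \le j \le n$,

(1) $\hat\psi_j(Z_j, \Delta_j)$ = rhs(2.4.5) with n = j there.

Theorem 2.1: For the SQCE defined by (1), if $\gamma \in (0,1]$ and satisfies (2.5.11), then

$\sup_{\underline{\theta} \in [\alpha,\beta]^n} |D_n(\hat{\underline{\psi}}, \underline{\theta})| = O(n^{-\gamma/5})$.

Proof. By Theorem 2.5.1,

$E|(\hat\psi_j - \psi_{\omega_{jj}})(Z_j, \Delta_j)|^\gamma \le B\, j^{-\gamma/5}$

uniformly in $\underline{\theta}$. Consequently, rhs(1.1) $\le 5BL(\beta-\alpha)^{1-\gamma} n^{-\gamma/5}$ uniformly in $\underline{\theta}$. □

4.3 Estimators Based on Kernel Estimators.

In this section also we assume that G is known and that Assumption A holds. The SQCE we propose in this section, based on kernel estimators of $\bar f_j$, $\bar f_j^{(1)}$, $(1 - \bar F_j)$ and $(1 - \bar F_j)^{(1)}$, is defined by

$\hat{\underline{\psi}}(\underline{Z}_n, \underline{\Delta}_n) = (\hat\psi_1(Z_1, \Delta_1), \ldots, \hat\psi_n(Z_n, \Delta_n))$,

where for $1 \le j \le n$,

(1) $\hat\psi_j(Z_j, \Delta_j)$ = rhs(2.7.4) with n = j there.

Theorem 3.1: For the SQCE defined by (1), if $\gamma \in (0,1]$ and satisfies (2.7.13), then

$\sup_{\underline{\theta} \in [\alpha,\beta]^n} |D_n(\hat{\underline{\psi}}, \underline{\theta})| = O(n^{-\gamma(r-1)/(1+2r)})$.

Proof. By Theorem 2.7.1,

$E|(\hat\psi_j - \psi_{\omega_{jj}})(Z_j, \Delta_j)|^\gamma \le B^* j^{-\gamma(r-1)/(1+2r)}$

uniformly in $\underline{\theta}$. Consequently rhs(1.1) $\le 5B^*L(\beta-\alpha)^{1-\gamma} n^{-\gamma(r-1)/(1+2r)}$. □

4.4 Estimators Based on Product Limit Estimators of $\bar F_j$.

In this section we do not assume that G is known. The SQCE's to be introduced here are based on kernel estimators of $\bar f_j$ and $\bar f_j^{(1)}$ which themselves are based on the PL estimator of $\bar F_j = j^{-1}\sum_{k=1}^{j-1} F_k$. We define our SQCE $\hat{\underline{\psi}}$ of $\underline{\theta}$ by

$\hat{\underline{\psi}}(\underline{Z}_n, \underline{\Delta}_n) = (\hat\psi_1(Z_1, \Delta_1), \ldots, \hat\psi_n(Z_n, \Delta_n))$,

where for $1 \le j \le n$,

(1) $\hat\psi_j(z, \delta)$ = rhs(3.2.5) with n = j there.

Thus at each stage j, the estimator $\hat\psi_j$ of $\theta_j$ depends only on $\underline{Z}_{j-1}$ and $\underline{\Delta}_{j-1}$ (see the definition of the PL estimator in (A.3.15)).

Theorem 4.1: Let the hypotheses of Theorem 3.3.1 hold. Then for the SQCE defined by (1)

$\sup_{\underline{\theta} \in [\alpha,\beta]^n} |D_n(\hat{\underline{\psi}}, \underline{\theta})| = o(1)$.

Proof: Since $E|(\hat\psi_j - \psi_{\omega_{jj}})(Z_j, \Delta_j)| = o(1)$ uniformly in $\underline{\theta}$ from (10) of Theorem 3.3.1, so is rhs(1.1). □

APPENDIX

A.1 The Joint Distribution of the Identified Minimum.

Let X and Y be two random variables such that $X \sim F$ and $Y \sim G$, where F and G are distribution functions. Let $Z = X \wedge Y$ and $\Delta = [X \le Y]$.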
Since $F \times G[Z \le z, \Delta = 1] = F([X \le z]\,G[Y \ge X])$ and $F \times G[Z \le z, \Delta = 0] = G([Y \le z]\,F[Y$ [...] , $i = 1,2,\ldots,n$, be independent random vectors such that $X_i$ and $Y_i$ are independent for each i. Let $F_i$ denote the distribution function of $X_i$ for each i and G denote the common distribution function of $Y_i$, $i = 1,2,\ldots,n$. For notational simplicity, in this section only, distribution function means right tail distribution function. Let $Z_i = X_i \wedge Y_i$ and $\Delta_i = [X_i \le Y_i]$ and $\bar F = n^{-1}\sum_{k=1}^n F_k$. Denote the distribution function of $Z_i$ by $H_i$ and $n^{-1}\sum_{k=1}^n H_k$ by $\bar H$, and note that $H_k = F_k G$ (since $X_k$ and $Y_k$ are independent) and, consequently,

(1) $\bar H = \bar F G$.

Let $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, Z_2, \ldots, Z_n$. Let $\Delta_{(i)}$ be the concomitant of $Z_{(i)}$; the ties are partially resolved by ranking the uncensored Z's ahead of the censored Z's. Based on [...] for $k = 1,2,\ldots,n$, we now define a Product Limit (PL hereafter) estimator $\hat F$ of $\bar F$ by

(2) I‘Iz) = Izmax .1. . 4_(z), c {n H32 n1/2HH2]

(4) .131 sup |¢(t)-G(t)l > 1) S Bn(£1H(Z)1Z) -oo enlog 11.

Theorem 3.1: Suppose $F_i$ for each $i = 1,2,\ldots,n$ and G are continuous. Let $L_n$ and z in $\mathbb{R}$ be such that

(6) H(Z)EZ(Z)JE;W 2 4.

(7) 4,5 32(2) 2 11(2)

and

(8) H3(z)‘/Lnlog n > 1.

Then

(9) G(z)_E_ sup |13I(t)-F(t)| s 1 log n + s (1 log n,H(z),z) -coz] for $z \in \mathbb{R}$.

Let $\hat G$ denote the PL estimator of G, i.e., $\hat G(z)$ = rhs(2) with the 1's in the exponents replaced by 0's. By direct verification (Exercise 7.2.2, Shorack and Wellner (1986)), we verify the PL representation of $\hat H$:

(10) $\hat H = \hat F \hat G$.

By subtracting (10) from (1) we obtain the following identity:

(11) $G(\hat F - \bar F) = (\hat H - \bar H) + \hat F(G - \hat G)$.

Thus, since G is non-increasing, it follows from here that

(12) 6(2) sup IFIII-IIIOI s -ooc)dc. -oo l/a3v‘r'1

since its third term is uniquely maximized at l/ann'. The inequality in (8) means that enlog n > 1/H3(z)JE. From (7) the maximum in (3) is the second term. Consequently, if c > enlog n then (3) follows since (6) makes enlog n no less than that maximum.
Thus from (4) the integrand (and hence the integral) in rhs(13) is bounded by Bn(enlog n,H(z),z). Now (9) follows from this weakening of (13), expectation in (11) and the $L_1$ bound obtained for the sup distance between $\hat H$ and $\bar H$ in Remark A.5.2. □

Remark 3.2: When the $F_k$'s are continuous, w.p.1 the product in (2) is unchanged if (i) is replaced by i and $n-i$ is replaced by $\sum_{k=1}^n [Z_k > Z_i]$. Consequently $\hat F$ can be written as

(14) 3(2) = [zz i] [21152115131] 1+EII'[Z(>Zi] Chapter 3.

In view of applications to the compound problems of Chapters 3 and 4, we need a PL estimator of Fj = {11‘2le for $j = 1,2,\ldots,n$. Denote $\max\{Z_1, Z_2, \ldots, Z_{j-1}, Z_{j+1}, \ldots, Z_n\}$ by 201%) and define $\hat F_j$ (analogous to $\hat F$) by

(15) Fj=(z) [ZZ] 15k¢j$n #J l k

A.4 On the Lower Bound for the Modified Regret in Theorem 2.6 of Singh (1974).

Remark 4.1: Theorem 2.6 in Singh (1974) obtains a correct lower bound (5.16) but the proof of it is incomplete. We mention this fact here because we have similar theorems in Sections 2.7 and 2.9. The second inequality in (5.17) is incorrect but can be corrected by reducing the right hand side by (,6-111)Pi +l{[£0] [...] and note that, conditional on X, $V_1, V_2, \ldots, V_i$ are i.i.d. since the X's are. Let $\sigma^2 = \mathrm{Var}(V_1)$. Then by Theorem V.4.14 of Petrov (1975),

(1) 3 A .1/2 1 P. 50 _<_ -1 P V + ()..,U l ( 1 1/0) 03 1+|i1/2 P1V1/0|3

where A is the Berry-Esseen constant and $\Phi$ is the standard normal distribution function. Since $0 < \inf_{\alpha\le\omega\le\beta} C(\omega) \le \sup_{\alpha\le\omega\le\beta} C(\omega) < \infty$, it follows from (5.2) that

(2) 0 < inf inf f(t) 5 sup sup f(t) <00. afiwS/i l 0 and c; which depend on I.

Since $K_0$ is bounded (which KGB-TX] f w(t)dt 5 cgh eventually is tacitly assumed in Theorem 2.6) and 11K;(u)du > 0, it follows from (2) and (5.0) that

(4) c3h g PlVl .. J; K0[-h-] 7711 t dt 5 c4h eventually

for numbers c; > 0 and c2. Since $\sigma^2 = P_1V_1^2 - (P_1V_1)^2$, it follows from (3) and (4) that there exist numbers c; > 0 and c; such that

(5) cghll2 _<_ a 5 cghl/2 eventually.
Thus from (3) and (5) it follows that there exists a positive number $c_7^*$ such that

(6) $c_7^* h^{1/2} \le P_1V_1/\sigma$ eventually.

Since $K_0$ is bounded, it follows from (5.0), (3) and (5) that there is a number $c_8^* > 0$ such that

(7) $P_1|(V_1 - P_1V_1)/\sigma|^3 \le c_8^* h^{-1/2}$ eventually.

By substituting the bounds in (6) and (7) in (1) we obtain that for a number $c_9^* > 0$

(8) Pi +1{[lz,[Iso]} $\le \Phi(-c_7^*(ih)^{1/2}) + c_9^*(ih)^{-2}$.

But rhs(8) is $O((ih)^{-2})$; this can be seen using the fact that $\Phi(-x) \sim \varphi(x)/x$ for large x. Singh's lower bound for the rhs(5.17) is $O((ih^3)^{-1/2})$; this can be seen from the analyses (5.19) through (5.25) and a part of the note following (5.25). Since the bound obtained here for $(\beta-\omega)$(lhs(8)) is of smaller order, the analysis following (5.25) is not affected and thus (5.16) holds.

A.5 On the Bound for the Expectation of Weighted Empiricals Based on Independent Random Variables.

In proving Theorem 3.1 we have used Remark A.5.2, which obtains $L_1$ bounds for the maximal deviations, uniform on $\mathbb{R}$, of the empirical distribution from the average distribution function. In this section we show how we can obtain that by simple application of exponential bounds in Singh (1975). Singh's result concerned weighted empiricals. Let $X_1, X_2, \ldots, X_n$ be independent real random variables. For $a \in [0,1]$, let $F_j(x) = aP[X_j$ [...] $> t] \le 4Wt\,e^{-2(t^2-1)}$ for every $t > 1$.

Remark 5.1: Since $W \le \sqrt{n}$, it is easy to see that rhs(1) is summable for any sequence $t_n \ge K\sqrt{\log n}$ if $2K > \sqrt{3}$.

Lemma 5.1: For every $T > 0$,

$E D_n \le T + W e^{-2(T^2-1)}$.

Proof: Since $D_n \le W$ and $W \ge 1$, the above inequality is immediate if $T \le 1$. Now suppose $T > 1$. By the Fubini Theorem

$E D_n = \int_0^\infty P[D_n > t]\,dt \le T + \int_T^\infty P[D_n > t]\,dt$.

From (1) the integrand on the right hand side is bounded by $4Wt\,e^{-2(t^2-1)}$. By evaluating the corresponding integral of this bound we obtain the assertion of this lemma. □

Remark 5.2: Let $\hat F$ be the empirical distribution of the $X_j$'s and let $\bar F$ be the average of the $F_j$'s.
For the case $a = 0$ and $w_1 = w_2 = \cdots = w_n = (\sqrt{n})^{-1}$ we obtain from Lemma 5.1 that

(2) $\sqrt{n}\, E \sup_x |\hat F(x) - \bar F(x)| \le T + \sqrt{n}\, e^{-2(T^2-1)}$ for every $T > 0$.

It is immediate with $2T = \sqrt{\log n}$ that the rhs(2) reduces to $\tfrac{1}{2}\sqrt{\log n} + e^2$.

A.6 Some Properties of Compact Subfamilies of the Standard Exponential Family.

Let $\mu$ be a measure on $\mathbb{R}$. Let $\mathcal{N} = \{\theta : \int e^{\theta x}\,d\mu < \infty\}$. Let $d(\theta) = (\int e^{\theta x}\,d\mu)^{-1}$ and set $f_\theta(x) = d(\theta)e^{\theta x}$ for $\theta \in \mathcal{N}$. The family of probability densities $\{f_\theta : \theta \in \mathcal{N}\}$ is called a standard exponential family. The associated family of distributions $F_\theta$ is also called a standard exponential family. The set $\mathcal{N}$ is called the natural parameter space. The Hölder inequality shows that $\mathcal{N}$ is convex. General exponential families are extensively studied in a monograph by Brown (1986). In the interests of Chapter 3 we state and prove below some of the properties of this exponential family when the parameter space $\Theta$ is a closed interval $[\alpha, \beta]$.

1. d is continuous on $\Theta$.
Let $\theta_n$ be a sequence such that $\theta_n \to \theta$. Since $\sup_{\theta\in\Theta} e^{\theta x} \le e^{\alpha x} \vee e^{\beta x}$, $d(\theta_n) \to d(\theta)$ follows by the D.C.T.

2. d is positive on $\Theta$ and consequently $0 < \inf_{\theta\in\Theta} d(\theta) \le \sup_{\theta\in\Theta} d(\theta) < \infty$.
By Hölder's inequality, $\log d$ is concave. Therefore, $\inf_{\theta\in\Theta} d(\theta) = d(\alpha) \wedge d(\beta)$. That $\sup_{\theta\in\Theta} d(\theta) < \infty$ follows from the continuity of d.

3. $\log f_\theta(x) = \theta x + \log d(\theta)$ is concave as a function of $\theta$, since $\log d$ is, from the proof of 2.

4. $m = \sup_{\theta\in\Theta} f_\theta \in L_1(\mu)$.
Follows immediately from the inequality $m(x) \le \sup_{\theta\in\Theta} d(\theta)\{e^{\alpha x} \vee e^{\beta x}\}$.

BIBLIOGRAPHY

Brown, Lawrence D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. IMS Lecture Notes - Monograph Series 9.

Campbell, G. and Földes, A. (1984). A generalized product limit estimator for weighted distribution functions based on censored data. Statistics & Decisions, Supplement Issue No. 1, 87-100.

Datta, Somnath (1988). Asymptotically optimal Bayes compound and empirical Bayes estimators in exponential families with compact parameter space. Ph.D.
Thesis, Department of Statistics and Probability, Michigan State University.

Devroye, Luc (1987). A Course in Density Estimation. Birkhäuser, Boston.

Diehl, Sabine and Stute, Winfried (1988). Kernel density and hazard function estimation in the presence of censoring. J. Multivariate Anal. 25, 299-310.

Földes, Antonia and Rejtő, Lidia (1981). Asymptotic properties of the nonparametric survival curve estimators under variable censoring. Proc. of First Pannon Conf. in Statist. Springer-Verlag.

Földes, A., Rejtő, L. and Winter, B.B. (1980). Strong consistency properties of nonparametric estimators for randomly censored data I: The product limit estimator. Period. Math. Hungar. 11, No. 3, 223-250.

Földes, A., Rejtő, L. and Winter, B.B. (1981). Strong consistency properties of nonparametric estimators for randomly censored data II: Estimation of density and failure rate. Period. Math. Hungar. 12, No. 1, 15-29.

Fabian, Vaclav and Hannan, James (1985). Introduction to Probability and Mathematical Statistics. John Wiley & Sons.

Gill, R.D. (1983). Large sample behavior of the product limit estimator on the whole line. Ann. Statist. 11, No. 1, 49-58.

Gilliland, Dennis Crippen (1966). Approximations to Bayes risk in sequences of non-finite decision problems. Ph.D. Thesis, Department of Statistics and Probability, Michigan State University.

Hannan, James (1957). Approximations to Bayes risk in repeated play. Contributions to the Theory of Games. Ann. of Math. Studies, 3, No. 39, Princeton University Press, 97-139.

Hannan, James F. and Robbins, Herbert (1955). Asymptotic solutions of the compound decision problem for two completely specified distributions. Ann. Math. Stat. 26, No. 1, 37-51.

Hewitt, Edwin and Stromberg, Karl (1969). Real and Abstract Analysis. Springer-Verlag.

Hoeffding, Wassily (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58, 13-30.

Kaplan, E.L. and Meier, P. (1958). Non-parametric estimation from incomplete observations. J. Amer. Statist. Assoc. 53, 457-481.

Koul, H., Susarla, V. and Van Ryzin, J. (1981). Regression analysis with randomly right-censored data. Ann. Statist. 9, No. 6, 1276-1288.

Lo, S.H., Mack, Y.P. and Wang, J.L. (1989). Density and hazard rate estimation for censored data via strong representations of the Kaplan-Meier estimator. Prob. Th. Rel. Fields 80, 461-473.

Loève, M. (1963). Probability Theory. Third Edition. Van Nostrand.

Mielniczuk, Jan (1986). Some asymptotic properties of kernel estimators of a density function in case of censored data. Ann. Statist. 14, No. 2, 766-773.

Padgett, W.J. and McNichols, Diane T. (1984). Nonparametric density estimation from censored data. Comm. Statist. Theory Methods 13, No. 13, 1581-1611.

Padgett, W.J. and Thombs, L.A. (1989). A smooth nonparametric quantile estimator from right-censored data. Statist. Probab. Lett. 7, 113-121. North Holland.

Petrov, V.V. (1975). Sums of Independent Random Variables (English translation of the original book (1972), by A.A. Brown). Springer-Verlag.

Robbins, Herbert (1951). Asymptotically sub-minimax solutions of compound decision problems. Proc. Second Berkeley Symp. Math. Statist. Prob. 131-148, Univ. of California Press.

Singh, Radhey Shyam (1974). Estimation of average of μ-densities and sequence-compound estimation in exponential families. Ph.D. Thesis, Department of Statistics and Probability, Michigan State University.

Singh, Radhey S. (1975). On the Glivenko-Cantelli theorem for weighted empiricals based on independent random variables. Ann. Probab. 3, No. 2, 371-374.

Shorack, Galen R. and Wellner, Jon A. (1986). Empirical Processes with Applications to Statistics. John Wiley & Sons.

Susarla, V. and Van Ryzin, John (1986). Empirical Bayes procedures with censored data. Adaptive Statistical Procedures and Related Topics, IMS Lecture Notes - Monograph Series 8, 219-234.

Wang, W. (1983). Statistical inference for randomly censored linear regression model. Ph.D. Thesis, Department of Statistics and Probability, Michigan State University.