MINIMUM HELLINGER DISTANCE ESTIMATION OF PARAMETERS IN THE RANDOM CENSORSHIP MODEL

By

Song Yang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1988

ABSTRACT

MINIMUM HELLINGER DISTANCE ESTIMATION OF PARAMETERS IN THE RANDOM CENSORSHIP MODEL

By

Song Yang

Let $X_1, \dots, X_n$ be i.i.d. with c.d.f. $F$, and let $Y_1, \dots, Y_n$ be independent of the $X_i$'s and i.i.d. with an unknown censoring c.d.f. $G$. In the random censorship model, the pairs $(\min(X_i, Y_i), [X_i \le Y_i])$, $i = 1, \dots, n$, are observed, where $[A]$ denotes the indicator of the set $A$. Let $F$ have a density $f$ and let $\{f_\theta : \theta \in \Theta\}$ be a parametric family of densities, where $\Theta$ is a subset of the $p$-dimensional Euclidean space. This thesis discusses the minimum Hellinger distance estimation (MHDE) of the parameter that gives the "best fit" of the parametric family to the data.

In studying the MHDE, the tail behavior of the product-limit processes is investigated and the weak convergence of these processes on the real line is established. In addition, an upper bound on the mean square increment of the normalized product-limit process is obtained. Based on the global behavior of the product-limit processes, kernel density estimators are constructed and shown to be consistent under the Hellinger metric. Using these results, it is shown that, when $f$ belongs to the parametric model, the MHD estimators are asymptotically efficient among the class of "regular" estimators; they are also minimax robust in small Hellinger neighborhoods of the given parametric family. The work extends the results of Beran (1977; Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445-463) for the complete i.i.d. data case to the censored data case.
Some of the proofs employ the martingale techniques developed by Gill (1980; Censoring and Stochastic Integrals. Mathematical Centre Tracts 124. Mathematisch Centrum, Amsterdam).

To my parents and my brother Tao

ACKNOWLEDGEMENTS

I would like to thank my thesis adviser Hira L. Koul for his support and encouragement during the preparation of this dissertation. His guidance in my endeavor in statistics is greatly appreciated.

I would also like to thank Professor V. Mandrekar, Professor James Hannan and Professor Clifford Weil for serving on my guidance committee and reviewing this manuscript. My special thanks go to Professor Mandrekar and Professor Hannan for their many helpful suggestions and stimulating discussions.

Thanks are also due to Cathy Sparks and especially Loretta Ferguson for their help in typing this manuscript.

Finally I express my deep appreciation to my parents and my brother Tao for their support and encouragement, which have strengthened me a great deal since my earlier education.

TABLE OF CONTENTS

Chapter
1. Introduction and Summary
2. Preliminaries
3. Minimum Hellinger Distance Functionals
4. Asymptotic Distributions
5. Robustness Properties
References

1. INTRODUCTION AND SUMMARY

Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) random variables with cumulative distribution function (c.d.f.) $F$ on $[0, \infty)$, and let $Y_1, \dots, Y_n$ be independent of the $X_i$'s and i.i.d. with (sub-)c.d.f. $G$ on $[0, \infty]$ (i.e., $G$ may assign positive mass to $\infty$). In the random censorship model, the pairs $\{\min(X_i, Y_i), [X_i \le Y_i]\}$, $1 \le i \le n$, are observed, where $[A]$ denotes the indicator function of the event $A$. Suppose that $F$ has a density $f$ with respect to Lebesgue measure, and some physical theory suggests that $f$ belongs to a parametric family $\{f_\theta : \theta \in \Theta\}$,
where $\Theta$ is a subset of $p$-dimensional Euclidean space. At the same time we recognize that, due to a variety of data contamination, $f$ may possibly differ from any of the $f_\theta$'s. The problem is to estimate the parameter that gives the "best fit" of the parametric model to the data.

When $G$ is degenerate at $\infty$, i.e. when we are able to observe the complete data $X_1, \dots, X_n$, there have been many results in the literature. Millar (1983) illustrated that in many cases when the "best fit" is given via a minimum distance recipe, there usually exists a minimax structure, and the minimum distance estimator usually has the local asymptotic minimaxity property, which is defined to be robustness there. While there is quite a bit of freedom in choosing the distance, one distance, the Hellinger distance, has the merit that the estimation procedure is asymptotically efficient if there is no contamination, as discussed in Beran (b, 1977; 1981). It is heuristically illustrated in Beran (b, 1977) that the minimum Hellinger distance estimator considered there is closely related to the maximum likelihood estimator and therefore asymptotic efficiency seems plausible.

In this thesis, the minimum Hellinger distance estimation (MHDE) in the random censorship model is considered. It turns out that, as in the i.i.d. complete data case discussed in Beran (b, 1977), when there is no contamination, this procedure is asymptotically efficient among the class of "regular" estimators; it is also robust in a minimax sense in small Hellinger neighborhoods of the parametric model.

The material is organized as follows. In Chapter 2, some preliminary results are introduced. The tail behavior of the product-limit processes is investigated and the weak convergence of these processes on the entire support set is established. The convergence of the kernel density estimators in the Hellinger metric is obtained. In addition,
an upper bound on the mean square increment of the normalized product-limit process is developed. In Chapter 3, the differentiability of the minimum Hellinger distance functionals is studied. In Chapter 4, the asymptotic behavior of the MHDE is investigated and it is shown that this procedure is asymptotically efficient if there is no contamination. In Chapter 5, a minimax robustness property of the MHDE is established.

Notational Remarks. Throughout this thesis, $X_1, \dots, X_n$, $Y_1, \dots, Y_n$ are independent r.v.'s. Unless mentioned otherwise, for $i = 1, \dots, n$, $X_i$, $Y_i$ have distributions $F$, $G$ respectively, $\delta_i = [X_i \le Y_i]$, where $[A]$ denotes the indicator function of the set $A$, and $\tilde X_i = \min(X_i, Y_i)$ with c.d.f. $H$. For any function $\xi$, $\xi_-(x) = \xi(x-)$, $\xi_+(x) = \xi(x+)$. For any (sub-)c.d.f. $D$, $D^{-1}(t) = \inf\{u : D(u) \ge t\}$, $\tau_D = D^{-1}(1) \le \infty$, $\bar D = 1 - D$ and $\Delta D = D - D_-$. Note that $\bar H = \bar F \bar G$ and $\tau_H = \min(\tau_F, \tau_G)$. Abbreviate $\tau_H$ to $\tau$. Let $T = \max(\tilde X_1, \dots, \tilde X_n)$ and $J_n(t) = [t \le T]$. Let $R^p$ denote $p$-dimensional Euclidean space with $R = R^1$. Any $v$ in $R^p$ is a $p \times 1$ matrix. $o_p(1)$ ($O_p(1)$) denotes any sequence of r.v.'s converging to zero in probability (bounded in probability). $\int_s^t = \int_{(s,t]}$ for $s > 0$ and $\int_0^t = \int_{[0,t]}$. The symbol $\equiv$ means "is defined by".

2. PRELIMINARIES

In this chapter we first cite some frequently used results from analysis. Then we investigate the behavior of some basic processes involved in the random censorship model.

Lemma 2.1. Let $f_n$, $g_n$, $h_n$, $f$, $g$ and $h$ be measurable functions on a measurable space $\Omega$ with measure $\mu$.

(i) Suppose that $f_n$, $g_n$ and $h_n$ converge in $\mu$-measure to $f$, $g$ and $h$ respectively and they are all integrable. Also, $\int_\Omega g_n\,d\mu \to \int_\Omega g\,d\mu$, $\int_\Omega h_n\,d\mu \to \int_\Omega h\,d\mu$ and $g_n \le f_n \le h_n$ a.e. $[\mu]$. Then $\int_\Omega f_n\,d\mu \to \int_\Omega f\,d\mu$.

(ii) Suppose that $f_n \to f$ in $\mu$-measure and $\|f_n\|_{L_p(\mu)} \to \|f\|_{L_p(\mu)}$, where $p > 0$. Then $f_n \to f$ in $L_p(\mu)$. $\Box$

For a version of the above results in which the convergence in measure is replaced by almost sure convergence, see Fabian and Hannan (1985, p.32) and Rudin (1974, p.76).
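From these references and a subsequence argument we can get our results here.

The following integration by parts formula has a proof in Hewitt and Stromberg (1965, p.419).

Lemma 2.2. Let $U$, $V$ be of bounded variation on the real line. Then for $a < b$,
$$U_+(b)V_+(b) - U_-(a)V_-(a) = \int_{[a,b]} U_-\,dV + \int_{[a,b]} V_+\,dU. \qquad \Box$$

In this thesis we always assume that $F$ and $G$ do not have common points of discontinuity:
$$\int_0^\infty (\Delta F)\,dG = 0. \tag{2.1}$$
Let
$$H^1(t) = P[\delta_1 = 1,\ \tilde X_1 \le t] = \int_0^t \bar G_-\,dF, \tag{2.2}$$
and
$$H^0(t) = P[\delta_1 = 0,\ \tilde X_1 \le t] = \int_0^t \bar F\,dG = \int_0^t \bar F_-\,dG, \quad \text{by (2.1)}. \tag{2.3}$$
Define
$$H_n(t) = n^{-1}\sum_i [\tilde X_i \le t], \quad H_n^1(t) = n^{-1}\sum_i [\tilde X_i \le t]\,\delta_i, \quad H_n^0(t) = n^{-1}\sum_i [\tilde X_i \le t](1 - \delta_i),$$
$$\Lambda^1(t) = \int_0^t (1/\bar F_-)\,dF = \int_0^t (1/\bar H_-)\,dH^1, \quad \text{for } t < \tau,$$
$$\Lambda^0(t) = \int_0^t (1/\bar G_-)\,dG = \int_0^t (1/\bar H_-)\,dH^0, \quad \text{for } t < \tau,$$
$$Q_n^1(t) = n^{1/2}[H_n^1(t) - H^1(t)], \quad Q_n^0(t) = n^{1/2}[H_n^0(t) - H^0(t)], \quad Q_n = \begin{pmatrix} Q_n^1 \\ Q_n^0 \end{pmatrix},$$
and, for $i = 0, 1$ and $t < \tau$,
$$M_n^i(t) = n^{1/2}\Big[H_n^i(t) - \int_0^t (1 - H_{n-})\,d\Lambda^i\Big] = Q_n^i(t) + \int_0^t (Q_{n-}^0 + Q_{n-}^1)/\bar H_-\,dH^i$$
(notice that $M_n^1$ is Gill (1980)'s $M(t)$ as in his definition (3.2.4)),
$$M_n(t) = \begin{pmatrix} M_n^1(t) \\ M_n^0(t) \end{pmatrix}, \quad \text{for } t < \tau.$$

The Kaplan-Meier (1958) product-limit estimators $\hat F_n$ and $\hat G_n$ of $F$ and $G$ are given as follows:
$$1 - \hat F_n(t) = \prod_{s \le t} [1 - \Delta H_n^1(s)/\bar H_{n-}(s)], \qquad 1 - \hat G_n(t) = \prod_{s \le t} [1 - \Delta H_n^0(s)/\bar H_{n-}(s)].$$
The corresponding processes are
$$P_n^1 = n^{1/2}(\hat F_n - F), \qquad P_n^0 = n^{1/2}(\hat G_n - G), \quad \text{on } (0, \infty).$$
Shorack and Wellner (1986) (S-W) have a discussion on $P_n^1$ in their Chapter 7. Notice that since $\int_0^\infty (\Delta F)\,dG = 0$, the roles of $F$, $G$ are interchangeable, hence we have parallel results for $P_n^0$. In fact, for the $\sigma$-field
$$\mathcal F_t^n = \mathcal N \vee \sigma\{[\tilde X_i \le s]\,\delta_i,\ [\tilde X_i \le s] : 1 \le i \le n,\ 0 \le s \le t\},$$
where $\mathcal N$ denotes the collection of all null sets and their complements in the original probability space, one can check via Theorem 3.11 of Gill (1980) that $(M_n(t), \mathcal F_t^n : 0 \le t < \tau)$ is a 2-dimensional square integrable martingale with mean zero and the predictable variation processes
$$\langle M_n^i \rangle(t) = \int_0^t \bar H_{n-}(1 - \Delta\Lambda^i)\,d\Lambda^i, \quad i = 0, 1,$$
and that the predictable variation process for the martingale $M_n^0 + M_n^1$ is
$$\langle M_n^0 + M_n^1 \rangle(t) = \int_0^t \bar H_{n-}(1 - \Delta\Lambda^0 - \Delta\Lambda^1)\,d(\Lambda^0 + \Lambda^1) = \langle M_n^0 \rangle(t) + \langle M_n^1 \rangle(t),$$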
since $\int_0^\infty (\Delta F)\,dG = 0$. So the predictable covariation process satisfies $\langle M_n^0, M_n^1 \rangle(t) = 0$. Thus an argument similar to Theorem 4.2.1 of Gill (1980) shows that $M_n$ converges in $D^2[0, \tau]$ to a limit process
$$M = \begin{pmatrix} M^1 \\ M^0 \end{pmatrix}, \tag{2.5}$$
where $M^1$, $M^0$ are independent zero mean Gaussian processes, each with independent increments, and for $i = 0, 1$,
$$\langle M^i \rangle(t) = \int_0^t \bar H_-(1 - \Delta\Lambda^i)\,d\Lambda^i.$$

Later on we need to consider the integration operation and therefore it is more convenient to use the uniform topology. We follow Pollard (1984)'s approach. Define the sup metric $\|(X, Y)^t\| = \|X\| + \|Y\|$ in $D^2[0, \tau]$. Equip $D^2[0, \tau]$ with the $\sigma$-field generated by open balls, or equivalently, by the projection maps $\{\pi_t : (X, Y)^t \to (X(t), Y(t))^t,\ t \in R\}$. Denote this space by $(D^2[0, \tau], \|\cdot\|)$. Then each random element in this space is a measurable mapping from some probability space. We say $W_n$ converges to $W$ in $(D^2[0, \tau], \|\cdot\|)$ if $Ef(W_n) \to Ef(W)$ for any bounded, continuous and measurable function $f$. Here the measurability refers to the open ball $\sigma$-field on $D^2[0, \tau]$ and the Borel $\sigma$-field on $R$. We have the following.

Theorem 2.1. Suppose that $F$ and $G$ do not have common points of discontinuity: $\int_0^\infty (\Delta F)\,dG = 0$. Then on a common probability space there exists a special construction of a triangular array $\{(\tilde X_{ni}, \delta_{ni}),\ 1 \le i \le n,\ n = 1, 2, \dots\}$, each row consisting of i.i.d. pairs with the same distribution as $(\tilde X_1, \delta_1)$, and a 2-dimensional Gaussian process
$$Q = \begin{pmatrix} V^1(H^1) \\ V^0(H^0) \end{pmatrix},$$
where $V^1$ and $V^0$ are Brownian bridges, with covariance $\mathrm{Cov}(V^1(H^1), V^0(H^0)) = -H^0 H^1$, such that
$$\|Q_n - Q\| \to 0 \quad \text{a.s.}$$

Remark. For the case when there is no censoring, S-W (1986) have the special construction of $X_{ni}$'s for the convergence of the ordinary empirical processes in their Theorem 3.1.1 and Section 3.2. For the convergence of the $Q_n$ process, without the condition $\int_0^\infty (\Delta F)\,dG = 0$, they claim to have the special construction of $X_{ni}$'s and $Y_{ni}$'s by a minor variation of their ordinary theory of empirical processes, i.e., by "rebuilding" the random variables from the process.
There is some difficulty in defining the $\delta_{ni}$'s if $\int_0^\infty (\Delta F)\,dG \ne 0$, and the construction of $X_{ni}$'s and $Y_{ni}$'s from $Q_n$ does not seem obvious.

To prove Theorem 2.1, we need the following two lemmas. In Lemma 2.3 we use Billingsley (1968)'s techniques for fluctuation of partial sums to get an upper bound on the tail probability of the empirical process $Q_n$. In Lemma 2.4, we show that the $(\tilde X_i, \delta_i)$'s can be "reconstructed from $Q_n$".

Lemma 2.3. For any $a < b$ and any $\epsilon > 0$, there exists a constant $K_\epsilon$, depending only on $\epsilon$, such that for $i = 0, 1$ and $n$ exceeding some $n_0 = n_0(a, b, \epsilon) > 0$,
$$P\Big[\sup_{t \in [a,b)} |Q_n^i(t) - Q_n^i(a)| > \epsilon\Big] \le K_\epsilon\,[H^i(b-) - H^i(a)]^2.$$

Lemma 2.4. Suppose that $F$ and $G$ do not have common points of discontinuity and that $\tilde Q_n$ has the same distribution as $Q_n$. Then with probability 1 random variables $\{X_i^*, \delta_i^*,\ i = 1, \dots, n\}$ can be constructed from $\tilde Q_n$, with the same joint distribution as the ordered $\tilde X_{(i)}$'s and their corresponding $\delta_i$'s.

Proof of Theorem 2.1. First let us prove the convergence of $Q_n$ to $Q$ in $(D^2[0, \tau], \|\cdot\|)$. We rely on Theorem 5.3 in Pollard (1984). The change from $D[0, 1]$ to $D^2[0, \tau]$ causes no problem. Just as in $D[0, \tau]$, from the fact that $D^2[0, \tau]$ is equipped with the $\sigma$-field generated by open balls in the uniform metric, we obtain that every point in $D^2[0, \tau]$ is "completely regular" as in Definition 4.6 of Pollard (1984). Therefore if we can show that with probability 1 the limit process $Q$ lies in a separable subset of $D^2[0, \tau]$, then the necessary and sufficient conditions for the weak convergence of the processes $Q_n$ to $Q$ are their finite dimensional convergence and the "small oscillation" condition (cf. Pollard 5.1.(4)). The finite dimensional convergence part being straightforward, we prove the separability and small oscillation property. Also, we only look at the first coordinate $Q_n^1$.

Let $S = \{s_0, s_1, \dots\}$ be the countable set of jump points of $H^1$ and $H^0$, and let $A$ be the set of all functions in $D[0, \tau]$ whose jumps occur only possibly at points in $S$.
For any $x \in A$ and any $\epsilon > 0$, by Lemma 14.1 of Billingsley (1968), there exists a partition $0 = t_0 < t_1 < \dots < t_m = \tau$ such that for $I_i = [t_{i-1}, t_i)$,
$$\sup_{t \in I_i} |x(t) - x(t_{i-1})| < \epsilon, \quad i = 1, \dots, m.$$
Now we modify the partition as follows. If $x$ is continuous at $t_i$, replace $t_i$ by a rational point $t_i'$ close enough to $t_i$ so that the above inequalities still hold. If $x$ is discontinuous at $t_i$, then $t_i$ must be in $S$. Therefore, the points in the partition can be chosen from a countable set. Let us denote this set by $U = \{u_0, u_1, \dots\}$. Let $B_n = \{x \in D[0, \tau] : x$ takes constant rational values on each interval $[t_{i-1}, t_i)$, where $0 = t_0 < t_1 < \dots < t_m = \tau$ is a partition of $[0, \tau]$ from points in $U$, and $x(\tau)$ is also rational$\}$. Then $B = \bigcup B_n$ is countable and certainly dense in $A$. Thus $A$ is separable and clearly we have $P[Q \in A] = 1$.

As to the small oscillation condition, for each $\epsilon, \delta > 0$, take a partition $0 = t_0 < \dots < t_m = \tau$ such that
$$\sup_{t \in I_i} |H^1(t) - H^1(t_{i-1})| < \delta, \quad i = 1, \dots, m.$$
By Lemma 2.3, for $n$ exceeding some $n_0(t_{i-1}, t_i, \epsilon)$,
$$P\Big[\sup_{t \in I_i} |Q_n^1(t) - Q_n^1(t_{i-1})| > \epsilon\Big] \le K_\epsilon\,(H_-^1(t_i) - H^1(t_{i-1}))^2.$$
Since there are only finitely many partition points, we obtain
$$\limsup_{n \to \infty} P\Big[\max_i \sup_{t \in I_i} |Q_n^1(t) - Q_n^1(t_{i-1})| > \epsilon\Big] \le \limsup_{n \to \infty} \sum_i K_\epsilon\,(H_-^1(t_i) - H^1(t_{i-1}))^2 \le K_\epsilon\,\delta.$$
Thus the proof for weak convergence is completed.

Since with probability 1 the limit process $Q$ sits in a separable set of completely regular points, the representation Theorem 4.13 in Pollard (1984) guarantees that on a common probability space there exist versions $\tilde Q_n$, $\tilde Q$ of $Q_n$, $Q$, respectively, such that $\|\tilde Q_n - \tilde Q\| \to 0$ a.s. Now we can obtain the special construction $((\tilde X_{ni}, \delta_{ni}) : 1 \le i \le n)$ from $((X_i^*, \delta_i^*),\ i = 1, \dots, n)$ in Lemma 2.4 through a random permutation. $\Box$

Proof of Lemma 2.3. Let us just look at the case $i = 1$, that is, $Q_n^1 = n^{1/2}(H_n^1 - H^1)$. The proof is similar to the one used in the proof of Theorem 13.1 in Billingsley (1968). In place of the condition (13.17) there,
one can verify that
$$E\{|Q_n^1(s+p_1) - Q_n^1(s)|^2\,|Q_n^1(s+p_1+p_2) - Q_n^1(s+p_1)|^2\} \le \text{Constant} \cdot (H^1(s+p_1) - H^1(s)) \cdot (H^1(s+p_1+p_2) - H^1(s+p_1)),$$
for $s$, $s+p_1$, $s+p_1+p_2 \in [a, b)$. Thus for $r < b - a$, by considering the random variables
$$Q_n^1(a + (b-a-r)i/m) - Q_n^1(a + (b-a-r)(i-1)/m), \quad i = 1, \dots, m,$$
we have, in place of (13.22) in Billingsley (1968),
$$P\Big[\sup_{t \in [a, b-r]} |Q_n^1(t) - Q_n^1(a)| > \epsilon\Big] \le B_\epsilon\,(H^1(b-r) - H^1(a))^2 + P[|Q_n^1(b-r) - Q_n^1(a)| \ge \epsilon/2], \tag{2.6}$$
where $B_\epsilon$ is some constant depending only on $\epsilon$. By the triangle inequality,
$$P[|Q_n^1(b-r) - Q_n^1(a)| \ge \epsilon/2] \le P[|Q_n^1(b-) - Q_n^1(a)| \ge \epsilon/4] + P[|Q_n^1(b-) - Q_n^1(b-r)| \ge \epsilon/4]. \tag{2.7}$$
Since the fourth central moment of binomial$(n, p)$ does not exceed $8np + n(n-1)p^2$, and $P[|X| > \epsilon] \le \epsilon^{-4}E(X^4)$, we obtain
$$P[|Q_n^1(b-) - Q_n^1(b-r)| \ge \epsilon/4] \le (4/\epsilon)^4\,E\{Q_n^1(b-) - Q_n^1(b-r)\}^4 \le (4/\epsilon)^4\,\{8n^{-1}(H^1(b-) - H^1(b-r)) + n^{-1}(n-1)(H^1(b-) - H^1(b-r))^2\}. \tag{2.8}$$
Substituting (2.7), (2.8) in (2.6) and then taking the limit as $r \downarrow 0$, we have
$$P\Big[\sup_{t \in [a, b)} |Q_n^1(t) - Q_n^1(a)| > \epsilon\Big] \le B_\epsilon\,(H^1(b-) - H^1(a))^2 + P[|Q_n^1(b-) - Q_n^1(a)| \ge \epsilon/4].$$
Now for $\alpha = H^1(b-) - H^1(a)$, as $n \to \infty$, the Central Limit Theorem gives us
$$P[|Q_n^1(b-) - Q_n^1(a)| \ge \epsilon/4] \to P[\{\alpha(1-\alpha)\}^{1/2}|N(0,1)| \ge \epsilon/4] \le (4/\epsilon)^4\,\alpha^2\,E\{N(0,1)\}^4 \le B_\epsilon\,(H^1(b-) - H^1(a))^2$$
if we take the constant $B_\epsilon$ large enough. Thus for $n$ exceeding some $n_0 = n_0(a, b, \epsilon)$,
$$P\Big[\sup_{t \in [a, b)} |Q_n^1(t) - Q_n^1(a)| > \epsilon\Big] \le K_\epsilon\,(H^1(b-) - H^1(a))^2$$
for some constant $K_\epsilon$ depending only on $\epsilon$. $\Box$

Proof of Lemma 2.4. First note that since $\int_0^\infty (\Delta F)\,dG = 0$, when there are ties among the $\tilde X_i$'s, the corresponding $\delta_i$'s must be all 0's or all 1's: for $i \ne j$,
$$P[\tilde X_i = \tilde X_j,\ \delta_i = 0,\ \delta_j = 1] \le P[Y_i = X_j] = 0.$$
So for $W_n = (W_n^1, W_n^0)^t$, where $W_n^i = Q_n^i + n^{1/2}H^i$ for $i = 0, 1$, with probability 1 we have
$$(\pi_t - \pi_s)W_n \in \{(i n^{-1/2}, j n^{-1/2}) : i, j \ge 0,\ i + j \le n\}, \tag{2.9}$$
$$\Delta_t W_n = \lim_{m \to \infty} (\pi_t - \pi_{t - 1/m})W_n \in \{(0, k n^{-1/2}), (k n^{-1/2}, 0) : 0 \le k \le n\},$$
$$\pi_\tau W_n \in \{(i n^{-1/2}, j n^{-1/2}) : i, j \ge 0,\ i + j = n\},$$
i.e., with probability 1, $W_n^0$, $W_n^1$ are increasing, taking constant values on consecutive intervals, and jumping at different points; also,
the total number of jump points is $n$, if we count $k$ jump points when $\Delta W_n^i = k n^{-1/2}$, $1 \le k \le n$, for $i = 0$ or $1$. By our assumption the same is true for $\tilde W_n = \tilde Q_n + n^{1/2}(H^1, H^0)^t$. Denote the ordered jump points of $\tilde W_n$ by $X_1^*, \dots, X_n^*$ and define the corresponding $\delta_i^*$'s to be 1 or 0 according to whether $\tilde W_n^1$ or $\tilde W_n^0$ jumps at $X_i^*$. Then by (2.9) the joint distribution of $(X_i^*, \delta_i^*)$, $i = 1, \dots, n$, can be determined by the projections $\pi_t$ acting on $\tilde W_n$. In fact, it is the joint distribution of the ordered $\tilde X_i$'s and the corresponding $\delta_i$'s. $\Box$

By Theorem 2.1, we can adapt Theorem 7.1.1 in S-W to show
$$\sup_{[0, T]} |M_n - M| \to 0 \quad \text{a.s.} \tag{2.10}$$
for the special construction of $\tilde X_{ni}$, $\delta_{ni}$, $i = 1, \dots, n$, and a 2-dimensional Gaussian process $M$ as in (2.5). Now for $0 \le t < \tau$, let
$$P^1(t) = \bar F(t)\int_0^t \bar F_-/(\bar F \bar H_-)\,dM^1, \qquad P^0(t) = \bar G(t)\int_0^t \bar G_-/(\bar G \bar H_-)\,dM^0.$$
Then $P^1/\bar F$, $P^0/\bar G$ are martingales on $[0, \tau)$ with quadratic variation processes
$$C^1(t) = \int_0^t [\bar F_-/(\bar F \bar H_-)]^2\,\bar H_-(1 - \Delta\Lambda^1)\,d\Lambda^1 = \int_0^t (\bar F \bar G_-)^{-1}\,d\Lambda^1, \qquad C^0(t) = \int_0^t (\bar G \bar F_-)^{-1}\,d\Lambda^0,$$
respectively. Now we can obtain the following convergence results for $P_n^1$ and $P_n^0$.

Theorem 2.2. Suppose that $F$ and $G$ do not have common points of discontinuity.

(i) If $G$ is continuous at $\tau$, then for the special construction in Theorem 2.1 and any $\alpha \in (0, 1/2)$,
$$\sup_{t \in [0, T]} |[P_n^0 - P^0](t)|\,\bar F^{1-\alpha}(t) \to_p 0. \tag{2.11}$$

(ii) If $F$ is continuous at $\tau$ and
$$A(\tau) < \infty, \quad \text{where } A(t) = \int_0^t (1/\bar G_-)\,dF, \tag{2.12}$$
then for the special construction we also have
$$\sup_{t \in [0, \tau)} |[P_n^1 - P^1](t)| \to_p 0. \tag{2.13}$$

Proof. Since the roles of $F$ and $G$ are interchangeable, from (2.10) an argument similar to (9) of Theorem 7.7.1 in S-W shows that, if $\Delta G(\tau) = 0$, we have
$$\sup_{[0, T]} |(P_n^0 - P^0)\,\bar K^0\,[\bar G\,q(K^0)]^{-1}| = o_p(1), \tag{2.14}$$
for $K^0(t) = C^0(t)/(1 + C^0(t))$ and any function $q$ such that, on $(0, 1/2]$, $q$ is nondecreasing and $t^{-1/2}q(t)$ is nonincreasing, $q$ is symmetric about $t = 1/2$, and $\int_0^1 q^{-2}(t)\,dt < \infty$. Notice that in their proof, S-W use their Theorem 7.4.2, which states the uniform convergence of $P_n^1$ to $P^1$ on each compact subinterval $[0, \rho]$ of $[0, \tau)$,
$\rho < \tau$. One way to avoid some flaws in their argument is to restrict $\rho$ to be a continuity point of $H$. Since $H$ has only countably many discontinuity points, this restriction causes no problem in deriving their Theorem 7.7.1. Now take $q(t) = t^\alpha$ on $(0, 1/2]$, where $\alpha \in (0, 1/2)$. Notice that
$$G(t)[\bar G(t)]^{-1} = \int_0^t (\bar G \bar G_-)^{-1}\,dG \le \int_0^t (\bar G \bar G_- \bar F_-)^{-1}\,dG = C^0(t) \le [\bar F(t)]^{-1}\int_0^t (\bar G \bar G_-)^{-1}\,dG = G(t)[\bar H(t)]^{-1}.$$
So $\bar G \ge \bar K^0 = (1 + C^0)^{-1} \ge \bar H$. Hence we have
$$\bar K^0\,[\bar G\,q(K^0)]^{-1} \ge (\bar G)^{-1}(\bar K^0)^{1-\alpha} \ge (\bar G)^{-1}\bar H^{1-\alpha} \ge \bar F^{1-\alpha}.$$
Thus (2.11) follows from the above inequality and (2.14).

To prove (2.13), first note that if $\Delta F(\tau) = 0$, we can use the same method as in the proof of (9) in Theorem 7.7.1 of S-W, the role of $\bar K^1/q(K^1)$ being replaced by $\bar F$ and the integrability condition on $q$ being replaced by (2.12), to obtain
$$\sup_{[0, T]} |P_n^1 - P^1| = o_p(1). \tag{2.15}$$
Now observe that by the triangle inequality
$$\sup_{[T, \tau)} |P_n^1 - P^1| \le |P_n^1(T) - P^1(T)| + n^{1/2}(F(\tau-) - F(T)) + \sup_{t \in [T, \tau)} |P^1(T) - P^1(t)|. \tag{2.16}$$
The first term is $o_p(1)$ by (2.15). As for the last term, by the continuity of $F$ (hence $C^1$) at $\tau$ and the fact that $T \to \tau$, it is $o_p(1)$ when $C^1(\tau) < \infty$; it is also $o_p(1)$ when $C^1(\tau) = \infty$, since by Remark 2.2 in Gill (1980) (the results (2.4), (2.5) in that remark and the inequalities following them), $\sup_{[\rho, \tau)} |P^1(t)| = o_p(1)$ as $\rho \uparrow \tau$. Hence it only remains to show $n^{1/2}(F(\tau-) - F(T)) = o_p(1)$. Note that for the function $A$ as defined in (2.12), for all $t < \tau$,
$$F(\tau-) - F(t) = \int_{(t, \tau)} \bar G_-\,dA \le \bar G(t)\int_{(t, \tau)} dA. \tag{2.17}$$
Now substitute $t = T$ and multiply through by $n(F(\tau-) - F(T))$, using $F(\tau-) - F(T) \le \bar F(T)$, to get
$$n^{1/2}(F(\tau-) - F(T)) \le \Big\{n\bar H(T)\int_{(T, \tau)} dA\Big\}^{1/2}. \tag{2.18}$$
So it suffices to show that $n\bar H(T)$ is $O_p(1)$, since $A$ is bounded and $T \to \tau$ w.p.1. Using $H H^{-1}(x) \le x + \Delta H(H^{-1}(x))$ for $x \in [0, 1]$, we have
$$P[n\bar H(T) > t] = P[H(T) < 1 - t/n] = [H_- H^{-1}(1 - t/n)]^n = [H H^{-1}(1 - t/n) - \Delta H(H^{-1}(1 - t/n))]^n \le [1 - t/n]^n \to e^{-t},$$
which implies $n\bar H(T) = O_p(1)$. Thus $n^{1/2}(F(\tau-) - F(T)) = o_p(1)$. So finally we have (2.13). $\Box$

Remark. For later use we make the following observations. For $t$ close to $\tau$, $\bar F$,
$\bar F^{1-\alpha}\bar G$ are nonnegative, right continuous and nonincreasing. Also,
$$\int \bar F^2\,dC^1 \le \int_0^\tau (1/\bar G_-)\,dF < \infty, \qquad \int (\bar F^{1-\alpha}\bar G)^2\,dC^0 \le \int_0^\tau \bar F^{1-2\alpha}\,dG \le 1, \quad \text{for } \alpha \in (0, 1/2).$$
Hence, when $C^1(\tau) = \infty$, as illustrated in Remark 2.2 of Gill (1983),
$$P^1(t) \to 0 \ \text{w.p.1 as } t \to \tau, \qquad P^0(t)\bar F^{1-\alpha}(t) \to 0 \ \text{w.p.1 as } t \to \tau.$$
So from Theorem 2.2 we have
$$\sup_{[0, \tau)} |P^1| < \infty \ \text{w.p.1}, \qquad \sup_n \sup_{[0, \tau)} |P_n^1| = O_p(1), \tag{2.19}$$
and for any $\alpha \in (0, 1/2)$,
$$\sup_{[0, \tau)} |P^0 \bar F^{1-\alpha}| < \infty \ \text{w.p.1}, \qquad \sup_n \sup_{[0, \tau)} |P_n^0 \bar F^{1-\alpha}| = O_p(1), \qquad P_n^0 \bar F^{1-\alpha}(T) = o_p(1).$$

To obtain the main results of Chapter 4 (Theorems 4.2, 4.3), we need to control the increment of the process
$$Z_n = n^{1/2}(\hat F_n - F)/\bar F.$$
For any process $X$, let $X^T(t) = [t \le T]X(t) + [t > T]X(T)$. Also recall the function $A$ defined in (2.12). The following lemma gives an inequality for the mean square increment of the $Z_n^T$ process in terms of the function $A$.

Lemma 2.5. Suppose $F$ is continuous. Then for $0 \le s < t < \tau_F$,
$$E[Z_n^T(t) - Z_n^T(s)]^2 \le 4[A(t)/\bar F^2(t) - A(s)/\bar F^2(s)] \le 4\bar F^{-2}(t)[A(t) - A(s)] + 4A(s)[\bar F^{-2}(t) - \bar F^{-2}(s)].$$

Proof. Lemma 2.5 in Gill (1983) states that for continuous $F$, $Z_n^T$ is a square integrable martingale on $[0, \rho]$ for any $\rho < \tau_F$, with predictable variation process
$$\langle Z_n^T \rangle(x) = \int_0^x [(1 - \hat F_{n-})/\bar F]^2\,J_n/\bar H_{n-}\,d\Lambda^1 \quad \text{for } x \in [0, \rho]. \tag{2.20}$$
Let $L(x) = E\langle Z_n^T \rangle(x)$. Then, using $(u - v)^2 \le 2u^2 + 2v^2$ and $n\bar H_{n-} \ge 1$ on $\{J_n = 1\}$,
$$L(x) = E\int_0^x [1 - n^{-1/2}Z_{n-}^T]^2\,J_n/\bar H_{n-}\,d\Lambda^1 \le 2\int_0^x E(J_n/\bar H_{n-})\,d\Lambda^1 + 2\int_0^x E[Z_{n-}^T]^2 J_n\,d\Lambda^1.$$
Since $n\bar H_{n-}(x)$ is binomial and, for $1 \le k \le n$, $n/k \le 2(n+1)/(k+1)$, we have
$$E(J_n/\bar H_{n-})(x) = \sum_{k=1}^n \frac{n}{k}\binom{n}{k}\bar H_-^k H_-^{n-k}(x) \le 2\sum_{k=1}^n \frac{n+1}{k+1}\binom{n}{k}\bar H_-^k H_-^{n-k}(x) \le 2(\bar H_-(x))^{-1}.$$
Also by Fatou's Lemma,
$$E[Z_{n-}^T(x)]^2 J_n(x) \le E[Z_{n-}^T(x)]^2 = E\lim_{x_k \uparrow x}[Z_n^T(x_k)]^2 \le \lim_{x_k \uparrow x} E[Z_n^T(x_k)]^2 = \lim_{x_k \uparrow x} E\langle Z_n^T \rangle(x_k) \le L(x),$$
since the integrand in (2.20) is nonnegative. Therefore, we obtain
$$L(x) \le a(x) + \int_0^x L\,d\beta, \quad \text{where } a(x) = 4\int_0^x (1/\bar H_-)\,d\Lambda^1, \quad \beta = 2\Lambda^1. \tag{2.21}$$
Now we use the argument as in Gronwall's Lemma: iterating $m$ times in (2.21) yields
$$L(x) \le a(x) + \int_0^x \sum_{i=0}^{m-1} \frac{1}{i!}[\beta(x) - \beta(y)]^i\,a(y)\,d\beta(y) + \int_0^x \frac{1}{m!}[\beta(x) - \beta(y)]^m L(y)\,d\beta(y)$$
$$\le a(x) + \int_0^x e^{\beta(x) - \beta(y)}\,a(y)\,d\beta(y) + \frac{1}{(m+1)!}\beta^{m+1}(x)\,L(x).$$
Let $m \to \infty$. Then
$$L(x) \le a(x) + \int_0^x e^{\beta(x) - \beta(y)}\,a(y)\,d\beta(y) = a(x) + 2\int_0^x e^{2\ln(\bar F(y)/\bar F(x))}\,a(y)\,d\Lambda^1(y)$$
$$= a(x) + 8\int_0^x [\bar F(y)/\bar F(x)]^2 \int_0^y (1/\bar H_-(u))\,d\Lambda^1(u)\,d\Lambda^1(y) = 4\bar F^{-2}(x)A(x) \tag{2.22}$$
by Fubini's theorem. Now
$$E[Z_n^T(t) - Z_n^T(s)]^2 = E[\langle Z_n^T \rangle(t) - \langle Z_n^T \rangle(s)] \le 2\int_s^t E(J_n/\bar H_{n-})\,d\Lambda^1 + 2\int_s^t E[Z_{n-}^T]^2\,d\Lambda^1$$
$$\le 4\int_s^t (1/\bar H_-)\,d\Lambda^1 + 2\int_s^t L\,d\Lambda^1 \le 4[A(t)/\bar F^2(t) - A(s)/\bar F^2(s)]$$
by (2.22) and Fubini's theorem. $\Box$

Let us now define the kernel density estimators and the smoothed version of the product-limit estimators as follows:
$$f_n(x) = a_n^{-1}\int_R K((x - y)/a_n)\,d\hat F_n(y), \qquad F_n(x) = \int_{-\infty}^x f_n(s)\,ds, \tag{2.23}$$
where $K$ is some kernel function and $a_n$ is some positive constant. The following theorem establishes the convergence of $f_n$ to $f$ in the Hellinger metric.

Theorem 2.3. Suppose $\int_0^\tau (1/\bar G_-)\,dF < \infty$ and $G$ is continuous at $\tau$. Also suppose that $F$ has a continuous density $f$, $K$ is nonnegative, continuous and of bounded variation on $R$, $\int_R K(s)\,ds = 1$, $K(s) \to 0$ as $s \to \pm\infty$, $a_n \to 0$ and $n^{1/2}a_n \to \infty$. Then
$$\|f_n^{1/2} - f^{1/2}\|_2 \to_p 0, \tag{2.24}$$
where $\|\cdot\|_2$ denotes the $L_2$-norm with respect to the Lebesgue measure.

Proof. Integration by parts gives us
$$f_n(x) - f(x) = (n^{1/2}a_n)^{-1}\int_R P_n^1(x - a_n t)\,dK(t) + \int_R [f(x - a_n t) - f(x)]\,K(t)\,dt = R_3 + R_4, \ \text{say}. \tag{2.25}$$
By (2.19) we have
$$\sup_{x \in R} |R_3| \le (n^{1/2}a_n)^{-1}\sup_{[0, \tau)} |P_n^1| \int_R |dK| = o_p(1). \tag{2.26}$$
Since $\int_R \int_R f(x - a_n t)K(t)\,dx\,dt = \int_R \int_R f(x)K(t)\,dx\,dt = 1$, by Lemma 2.1 (ii) the nonrandom term $R_4$ converges to zero in Lebesgue measure. Also
$$\|f_n^{1/2}\|_2^2 = \hat F_n(T) = o_p(1) + F(T) \to_p 1 = \|f^{1/2}\|_2^2.$$
Now to obtain (2.24), we use a subsequence argument. For any subsequence $\{n'\} \subset \{n\}$, there exists a further subsequence $\{n''\} \subset \{n'\}$ along which

(a) $R_4 \to 0$ for a.e. $[\lambda]$ $x$,

(b) $\sup_{x \in R} |R_3| \to 0$ w.p.1,

(c) $\|f_{n''}^{1/2}\|_2 \to \|f^{1/2}\|_2$ w.p.1.

By (a) and (b), with probability 1, $f_{n''}(x) \to f(x)$ for a.e. $[\lambda]$ $x$. By Lemma 2.1 (ii), this and (c) imply that with probability 1, $\|f_{n''}^{1/2} - f^{1/2}\|_2 \to 0$.
Hence (2.24) follows. $\Box$

3. MINIMUM HELLINGER DISTANCE FUNCTIONALS

In the censored data case, the pair $(\tilde X, \delta)$ is observed. Assuming $F$ has a density $f$ with respect to the Lebesgue measure $\lambda$, we have
$$P[\delta_1 = 0,\ \tilde X_1 \le t] = \int_0^t \bar F\,dG, \qquad P[\delta_1 = 1,\ \tilde X_1 \le t] = \int_0^t \bar G_-\,dF = \int_0^t f\,\bar G\,dx.$$
Thus $(\tilde X, \delta)$ has a (sub-)density $\bar F^{1-y}(x)f^y(x)$ with respect to the measure $\mu_G$ on $R \times \{0, 1\}$, where $\mu_G$ is defined by the relation
$$\int m\,d\mu_G = \int m(x, 0)\,dG(x) + \int m(x, 1)\,\bar G\,dx,$$
for any nonnegative measurable function $m$ on $R \times \{0, 1\}$. For any (sub-)density $d$ on $R$ w.r.t. $\lambda$ define a (sub-)density $L(d)$ on $R \times \{0, 1\}$ w.r.t. $\mu_G$ by
$$L(d)(x, y) = \bar D^{1-y}(x)\,d^y(x),$$
where $D$ is the (sub-)c.d.f. of $d$.

Recall the parametric family $\{f_\theta : \theta \in \Theta\}$ as mentioned in the introduction. For a (sub-)c.d.f. $G$ and a (sub-)density function $d$, the minimum Hellinger distance functional $\Psi(d; G)$ is defined as a point in $\Theta$, if it exists, that minimizes the Hellinger distance between $L(f_\theta)$ and $L(d)$:
$$\|[L(f_{\Psi(d;G)})]^{1/2} - [L(d)]^{1/2}\|_G = \inf_{\theta \in \Theta} \|[L(f_\theta)]^{1/2} - [L(d)]^{1/2}\|_G, \tag{3.1}$$
where $\|\cdot\|_G$ denotes the $L_2$-norm in $L_2(\mu_G)$. In the case when there is more than one minimizer, a Borel measurable selection is possible (cf. Brown and Purves (1973)). For $0 \le \tau \le \infty$, define $\Psi(\cdot; G; \tau)$ similarly by restricting all integration to $x \in (-\infty, \tau]$. Later we will use $\|(\cdot)(-\infty, \tau]\|_G$ to denote the norm under the restricted integration. Note that $\Psi(\cdot; G; \infty) = \Psi(\cdot; G)$.

For (sub-)densities $f$, $f_n$ on $R$ w.r.t. $\lambda$ and c.d.f.'s $G_n$ and points $\tau_n$ such that
$$\sup_{(-\infty, \tau]} |G_n - G| \to 0 \quad \text{and} \quad \tau_n \uparrow \tau, \tag{3.2}$$
we will use the notations $\theta_0 = \Psi(f; G; \tau)$, $\theta_{00} = \Psi(f; G)$, $\theta_n = \Psi(f_n; G_n; \tau)$, $\theta_{nn} = \Psi(f_n; G_n; \tau_n)$ and $\mu = \mu_G$, $\mu_n = \mu_{G_n}$. Also, we will use $\int_{-\infty}^\tau \cdot\,d\mu$ to denote the integral on $(x, y) \in (-\infty, \tau] \times \{0, 1\}$ and $F_\theta$ to denote the c.d.f. corresponding to $f_\theta$. We have the following.

Lemma 3.1. Suppose (a) $\Theta$ is a compact subset of $R^p$, (b) $\beta \ne \theta$ implies $f_\beta \ne f_\theta$ on a set of positive Lebesgue measure, and for almost every $x$, $f_\theta(x)$ is continuous in $\theta$. Then,

(i) for any (sub-)c.d.f. $G$, (sub-)density function $f$ and $0 \le \tau \le \infty$, $\Psi(f; G; \tau)$ exists.
(ii) $\Psi(f_\theta; G; \tau) = \theta$ uniquely if both $\tau_G$ and $\tau \ge \tau_{F_\theta}$.

(iii) for any (sub-)c.d.f. $G$ and any (sub-)density functions $f$, $f_n$ on $R$, $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$ implies $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$.

(iv) $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$ implies $\Psi(f_n; G) \to \Psi(f; G)$ if $\Psi(f; G)$ is unique.

If, in addition, (c) the family $\{F_\theta(x) : \theta \in \Theta\}$ is equicontinuous, then

(v) for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2) and $\Delta G(\tau) = 0$, $\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G \to 0$ implies $\Psi(f_n; G_n; \tau_n) \to \Psi(f; G; \tau)$ if $\Psi(f; G; \tau)$ is unique.

Proof. By (b), $\bar F_\theta(x)$ is continuous in $\theta$ for fixed $x$, thus (i) can be proved as in Theorem 1 of Beran (b, 1977). (ii) is obvious. For (iii), first note that for $a, b \ge 0$, $|b - a| \le b + a$, hence $(b - a)^2 \le |b^2 - a^2|$. So we have
$$\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 = \int_R [\bar F_n^{1/2} - \bar F^{1/2}]^2\,dG + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda \le \int_R |F_n - F|\,dG + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda.$$
Since
$$\int_R |F_n - F|\,dG \le \sup_x |F_n - F| \le \int_R |f_n - f|\,d\lambda = \int_R |f_n^{1/2} - f^{1/2}|\,(f_n^{1/2} + f^{1/2})\,d\lambda,$$
an application of the Cauchy-Schwarz inequality to the last integral gives us
$$\int_R |F_n - F|\,dG \le 2\Big\{\int_R [f_n^{1/2} - f^{1/2}]^2\,d\lambda\Big\}^{1/2}.$$
Hence we obtain
$$\|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 \le 2\Big\{\int_R [f_n^{1/2} - f^{1/2}]^2\,d\lambda\Big\}^{1/2} + \int_R [f_n^{1/2} - f^{1/2}]^2\,\bar G\,d\lambda.$$
This proves (iii).

Next we prove only (v), as the assertion (iv) follows similarly. Define $N, N_n \ge 0$ by
$$N(\theta, f) = \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau]\|_G, \qquad N_n(\theta, f) = \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau_n]\|_{G_n}.$$
By the triangle inequality, we have
$$|N_n(\theta, f_n) - N_n(\theta, f)|^2 \le \|\{[L(f_n)]^{1/2} - [L(f)]^{1/2}\}(-\infty, \tau_n]\|_{G_n}^2$$
$$\le \|[L(f_n)]^{1/2} - [L(f)]^{1/2}\|_G^2 + \Big|\int_{-\infty}^{\tau_n} [f_n^{1/2} - f^{1/2}]^2\,(\bar G_n - \bar G)\,d\lambda\Big| + \Big|\int_{-\infty}^{\tau_n} [\bar F_n^{1/2} - \bar F^{1/2}]^2\,d(G_n - G)\Big|.$$
The first term converges to zero uniformly in $\theta$ as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$, and so does the second term for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2), since it is dominated by
$$\sup_{(-\infty, \tau]} |G_n - G| \int_{-\infty}^{\tau_n} [f_n^{1/2} - f^{1/2}]^2\,d\lambda \le 2\sup_{(-\infty, \tau]} |G_n - G|.$$
Now using the integration by parts formula, we can write the third term as
$$\Big|-\int_{-\infty}^{\tau_n} (G_{n-} - G_-)\,d(\bar F_n + \bar F - 2(\bar F_n \bar F)^{1/2}) + [\bar F_n^{1/2} - \bar F^{1/2}]^2(\tau_n)\,(G_n - G)(\tau_n)\Big|,$$
which can be dominated by $5\sup_{(-\infty, \tau]} |G_n - G| \to 0$.
Thus we have shown that, for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2),
$$N_n(\theta, f_n) - N_n(\theta, f) \to 0 \ \text{uniformly in } \theta \text{ as } \|f_n^{1/2} - f^{1/2}\|_2 \to 0. \tag{3.3}$$
By the triangle inequality again we have
$$|N_n^2(\theta, f) - N^2(\theta, f)| \le \int_{-\infty}^{\tau_n} [f_\theta^{1/2} - f^{1/2}]^2\,|\bar G_n - \bar G|\,d\lambda + \Big|\int_{-\infty}^{\tau_n} [\bar F_\theta^{1/2} - \bar F^{1/2}]^2\,d(G_n - G)\Big| + \|\{[L(f_\theta)]^{1/2} - [L(f)]^{1/2}\}(\tau_n, \tau]\|_G^2.$$
Similar to the previous argument, the first term is bounded by $2\sup_{(-\infty, \tau]} |G_n - G|$ and the second term by $5\sup_{(-\infty, \tau]} |G_n - G|$. The third term is bounded by $|(F_\theta + F + G)(\tau) - (F_\theta + F + G)(\tau_n)|$. Hence for $G_n$, $\tau_n$, $G$ and $\tau$ satisfying (3.2), (c) and the assumption $\Delta G(\tau) = 0$,
$$N_n^2(\theta, f) - N^2(\theta, f) \to 0 \ \text{uniformly in } \theta \text{ as } \|f_n^{1/2} - f^{1/2}\|_2 \to 0. \tag{3.4}$$
From (3.3), (3.4) and again the inequality $(b - a)^2 \le |b^2 - a^2|$ for $a, b \ge 0$, it follows that $N_n(\theta, f_n) - N(\theta, f) \to 0$ uniformly in $\theta$, which implies
$$N_n(\theta_{nn}, f_n) - N(\theta_0, f) = \min_\theta N_n(\theta, f_n) - \min_\theta N(\theta, f) \to 0,$$
and $N_n(\theta_{nn}, f_n) - N(\theta_{nn}, f) \to 0$. Hence
$$N(\theta_{nn}, f) - N(\theta_0, f) \to 0. \tag{3.5}$$
As in Beran (b, 1977), from (3.5), compactness of $\Theta$, continuity of $N(\theta, f)$ in $\theta$ and uniqueness of $\Psi(f; G; \tau)$, one has $\theta_{nn} \to \theta_0$, i.e. $\Psi(f_n; G_n; \tau_n) \to \Psi(f; G; \tau)$. $\Box$

To study the asymptotic behavior of the minimum Hellinger distance functionals, we need to establish the following expansions for $s_\theta \equiv [L(f_\theta)]^{1/2}$. When the first order partial derivatives of $f_\theta$ w.r.t. $\theta$ exist, we will denote the column vector of the partials by $\dot f_\theta$ with $i$th component $\dot f_\theta^{(i)}$; when the matrix of the second order partials exists, it will be denoted by $\ddot f_\theta$ with $(i, j)$ entry $\ddot f_\theta^{(ij)}$. Also, $A^t$ denotes the transpose of the matrix $A$.

Lemma 3.2. Let $\rho$ be an interior point of $\Theta$. Suppose that there exists a neighborhood $V$ of $\rho$ such that

(i) on $V$, $f_\theta$ is continuous in $\theta$ for every $x$ and $\dot f_\theta(x)$ is continuous in $\theta$ for $x \notin N$, where $N$ is a $\lambda$-null set,

(ii) for $i = 1, \dots, p$, $U^{(i)}(\theta) \equiv \int [\dot f_\theta^{(i)}]^2/f_\theta\,d\lambda$ is continuous on $V$.

Then for $\rho_n$ in a neighborhood of $\rho$ and $(x, y) \notin N_0 \times \{1\}$, where $N_0$ is a $\lambda$-null set,
$$s_{\rho_n} = s_\rho + (\dot s_\rho^t + r_n^t)(\rho_n - \rho), \tag{3.6}$$
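where
$$\|r_n^{(i)}(-\infty, \tau]\|_G \to 0 \ \text{as } \rho_n \to \rho \tag{3.7}$$
for $i = 1, \dots, p$, $\tau \in R$, and any (sub-)c.d.f. $G$. If, in addition, we assume

(iii) for $i = 1, \dots, p$ and some $\epsilon > 0$,
$$V^{(i)}(\theta) \equiv \int |\dot f_\theta^{(i)}|^{2+\epsilon}/f_\theta^{1+\epsilon}\,d\lambda \quad \text{and} \quad W^{(i)}(\theta) \equiv \int |\dot f_\theta^{(i)}|/\bar F_\theta^{1/2}\,d\lambda$$
are bounded in a neighborhood of $\rho$,

then
$$\|r_n^{(i)}(-\infty, \tau_n]\|_{G_n} \to 0 \ \text{as } \rho_n \to \rho \tag{3.8}$$
for each $i = 1, \dots, p$ and $G_n$, $\tau_n$ satisfying (3.2).

Lemma 3.3. Let $\rho$ be an interior point of $\Theta$. Suppose that there exists a neighborhood $V$ of $\rho$ such that

(i) on $V$, $\dot f_\theta$ is continuous in $\theta$ for every $x$ and $\ddot f_\theta(x)$ is continuous in $\theta$ for $x \notin N$, where $N$ is a $\lambda$-null set,

(ii) for $i, j = 1, \dots, p$,
$$U_1^{(i)}(\theta) \equiv \int [\dot f_\theta^{(i)}]^4/f_\theta^3\,d\lambda, \qquad U_2^{(ij)}(\theta) \equiv \int [\ddot f_\theta^{(ij)}]^2/f_\theta\,d\lambda$$
are continuous on $V$.

Then for $\rho_n$ in a neighborhood of $\rho$ and $(x, y) \notin N_0 \times \{1\}$, where $N_0$ is a $\lambda$-null set,
$$\dot s_{\rho_n} = \dot s_\rho + (\ddot s_\rho + R_n)(\rho_n - \rho), \tag{3.9}$$
where
$$\|R_n^{(i,j)}(-\infty, \tau]\|_G \to 0 \ \text{as } \rho_n \to \rho \tag{3.10}$$
for $i, j = 1, \dots, p$, $\tau \in R$ and any (sub-)c.d.f. $G$. If, in addition, we assume

(iii) for $i, j = 1, \dots, p$ and some $\epsilon, \delta > 0$,
$$V_1^{(ij)}(\theta) \equiv \int |\ddot f_\theta^{(ij)}|^{2+\epsilon}/f_\theta^{1+\epsilon}\,d\lambda, \qquad V_2^{(i)}(\theta) \equiv \int |\dot f_\theta^{(i)}|^{4+\delta}/f_\theta^{3+\delta}\,d\lambda,$$
$$W_1^{(ij)}(\theta) \equiv \int |\ddot f_\theta^{(ij)}|/\bar F_\theta^{1/2}\,d\lambda \quad \text{and} \quad W_2^{(i)}(\theta) \equiv \int |\dot f_\theta^{(i)}|/\bar F_\theta^{1/4}\,d\lambda$$
are bounded in a neighborhood of $\rho$,

then
$$\|R_n^{(i,j)}(-\infty, \tau_n]\|_{G_n} \to 0 \ \text{as } \rho_n \to \rho \tag{3.11}$$
for $i, j = 1, \dots, p$ and $G_n$, $\tau_n$ satisfying (3.2).

Here we just prove Lemma 3.2. The argument for Lemma 3.3 is similar and more involved.

Proof of Lemma 3.2. First let us look at the case when the parameter is one dimensional. On $[s : f_\theta(s) = 0] \setminus N$, $\dot f_\theta(s)$ must be zero since otherwise $f_t(s)$ would be negative for some $t$. Now we can write
$$\int_R |\dot f_\theta(s)|\,ds = \int_R \{|\dot f_\theta(s)|\,f_\theta^{-1/2}(s)\}\,f_\theta^{1/2}(s)\,ds,$$
hence by (i), the Cauchy-Schwarz inequality and Lemma 2.1 (i), $\int_R |\dot f_\theta(s)|\,ds$ is finite and continuous. Now since for $s \notin N$ and $\theta \in V$ we have $f_\theta(s) = f_\rho(s) + \int_\rho^\theta \dot f_t(s)\,dt$, it follows by Fubini's theorem that
$$\bar F_\theta(x) = \bar F_\rho(x) + \int_\rho^\theta \Big\{\int_x^\infty \dot f_t(s)\,ds\Big\}\,dt.$$
So by Lemma 2.1 (i), for every $x$, $\dot{\bar F}_\theta(x)$ exists, is equal to $\int_x^\infty \dot f_\theta(s)\,ds$ and is continuous in $\theta$. Next, note that for every $x < \tau_{F_\theta}$,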
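(3.12) $[\dot{\bar F}_\theta(x)]^2/\bar F_\theta(x) = (1/\bar F_\theta(x))\left[\int_x^\infty (\dot f_\theta f_\theta^{-1/2})(s)\,f_\theta^{1/2}(s)\,ds\right]^2 \le \int_x^\infty (\dot f_\theta^2/f_\theta)(s)\,ds$

and

(3.13) $\int_R \int_x^\infty [\dot f_\theta^{(i)}]^2/f_\theta\,ds\,dG(x) = \int G\,[\dot f_\theta^{(i)}]^2/f_\theta\,d\lambda$.

Thus Lemma 2.1 gives us the continuity of $\|\{\dot s_\theta - \dot s_\rho\}(-\infty,\tau]\|_G$ in $\theta$ for any (sub-)d.f. $G$ and $\tau \in R$. Now (3.12), (3.13), assumptions (i), (ii) and the proof of Lemma A.2 in Hájek (1972) show that there is a neighborhood of $\rho$ in which $s_\theta(x,1)$ is absolutely continuous in $\theta$ for a.e. $x$ $[\lambda]$ and $s_\theta(x,0)$ is absolutely continuous in $\theta$ for every $x$. Thus for $(x,y) \notin N_0\times\{1\}$, where $N_0$ is a $\lambda$-null set, we have for $\rho_n$ in a neighborhood of $\rho$,

(3.14) $s_{\rho_n} = s_\rho + \int_\rho^{\rho_n} \dot s_t\,dt = s_\rho + (\rho_n - \rho)\left\{\dot s_\rho + (\rho_n - \rho)^{-1}\int_\rho^{\rho_n}(\dot s_t - \dot s_\rho)\,dt\right\}$

and

(3.15) $\left\|\left\{(\rho_n - \rho)^{-1}\int_\rho^{\rho_n}(\dot s_t - \dot s_\rho)\,dt\right\}(-\infty,\tau]\right\|_G^2 \le \left|(\rho_n - \rho)^{-1}\int_\rho^{\rho_n}\|\{\dot s_t - \dot s_\rho\}(-\infty,\tau]\|_G^2\,dt\right| = \|(\dot s_{\xi_n} - \dot s_\rho)(-\infty,\tau]\|_G^2$

for some $\xi_n$ between $\rho_n$ and $\rho$. Thus to prove (3.7) and (3.8), it suffices to prove that as $\rho_n \to \rho$,

$\|(\dot s_{\rho_n} - \dot s_\rho)(-\infty,\tau]\|_G \to 0$, $\|(\dot s_{\rho_n} - \dot s_\rho)(-\infty,\tau_n]\|_{G_n} \to 0$,

respectively. The first being guaranteed by the continuity of $\|(\dot s_\theta - \dot s_\rho)(-\infty,\tau]\|_G$ in $\theta$, we only have to prove $\|(\dot s_{\rho_n} - \dot s_\rho)(-\infty,\tau_n]\|_{G_n} \to 0$.

When $\theta$ is multidimensional, we can apply the above argument to each component of $\theta$. Consequently, we have $\|\{\dot s_{\rho_n}^{(i)} - \dot s_\rho^{(i)}\}(-\infty,\tau]\|_G \to 0$ for $\rho_n \to \rho$, and it only remains to prove $\|\{\dot s_{\rho_n}^{(i)} - \dot s_\rho^{(i)}\}(-\infty,\tau_n]\|_{G_n} \to 0$. We have

(3.16) $\|(\dot s_{\rho_n}^{(i)} - \dot s_\rho^{(i)})(-\infty,\tau_n]\|_{G_n}^2 = \|(\dot s_{\rho_n}^{(i)} - \dot s_\rho^{(i)})(-\infty,\tau_n]\|_G^2 + 4^{-1}\int_{-\infty}^{\tau_n}\left[\dot f_{\rho_n}^{(i)}/f_{\rho_n}^{1/2} - \dot f_\rho^{(i)}/f_\rho^{1/2}\right]^2(\bar G_n - \bar G)\,d\lambda + 4^{-1}\int_{-\infty}^{\tau_n}\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right]^2 d(G_n - G)$.

The first term converges to zero as mentioned above, and by (3.12) and repeated use of Lemma 2.1(i), the second term also converges to zero. Applying the integration by parts formula and (3.12) to the third term we have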
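(3.17) $\left|\int_{-\infty}^{\tau_n}\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right]^2 d(G_n - G)\right|$
$= \left| -2\int_{-\infty}^{\tau_n}(G_n - G)_-\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right] d\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right] + (G_n - G)\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right]^2(\tau_n)\right|$
$\le 2\sup_{(-\infty,\tau]}|G_n - G|\,\{[U^{(i)}(\rho_n)]^{1/2} + [U^{(i)}(\rho)]^{1/2}\}\int_{-\infty}^{\tau_n}\left|\frac{d}{ds}\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right]\right| ds + \sup_{(-\infty,\tau]}|G_n - G|\,\{[U^{(i)}(\rho_n)]^{1/2} + [U^{(i)}(\rho)]^{1/2}\}^2$.

Since $\sup_{(-\infty,\tau]}|G_n - G| \to 0$, it suffices to prove that $\int_{-\infty}^{\tau_n}\left|\frac{d}{ds}\left[\dot{\bar F}_{\rho_n}^{(i)}/\bar F_{\rho_n}^{1/2} - \dot{\bar F}_\rho^{(i)}/\bar F_\rho^{1/2}\right]\right| ds$ remains bounded. We can bound it by

$\int_R\left[|\dot f_{\rho_n}^{(i)}|/\bar F_{\rho_n}^{1/2} + |\dot f_\rho^{(i)}|/\bar F_\rho^{1/2}\right] d\lambda + 2^{-1}\int_R |\dot{\bar F}_{\rho_n}^{(i)}|/\bar F_{\rho_n}^{3/2}\,dF_{\rho_n} + 2^{-1}\int_R |\dot{\bar F}_\rho^{(i)}|/\bar F_\rho^{3/2}\,dF_\rho$.

The first two terms in the sum are bounded by constant multiples of $W^{(i)}(\rho_n)$ and $W^{(i)}(\rho)$ respectively and therefore remain bounded. To deal with the last two terms in the sum, denote them by $B(\rho_n)$, $B(\rho)$ respectively. Let $p = 2 + \delta$, and $q$ the conjugate of $p$: $p^{-1} + q^{-1} = 1$. Then $q^{-1} - 2^{-1} > 0$. Take $a = q^{-1}$; then $aq = 1$, $ap = p - 1$, and Hölder's inequality gives us

(3.18) $|\dot{\bar F}_{\rho_n}^{(i)}(x)|/\bar F_{\rho_n}^{3/2}(x) \le \bar F_{\rho_n}^{-3/2}(x)\left[\int_x^\infty |\dot f_{\rho_n}^{(i)}/f_{\rho_n}^a|^p(s)\,ds\right]^{1/p}\left[\int_x^\infty f_{\rho_n}^{aq}(s)\,ds\right]^{1/q} = \bar F_{\rho_n}^{1/q - 3/2}(x)\left[\int_x^\infty |\dot f_{\rho_n}^{(i)}/f_{\rho_n}^a|^p(s)\,ds\right]^{1/p} \le \bar F_{\rho_n}^{1/q - 3/2}(x)\,[V^{(i)}(\rho_n)]^{1/p}$,

hence

(3.19) $B(\rho_n) \le 2^{-1}[V^{(i)}(\rho_n)]^{1/p}(q^{-1} - 2^{-1})^{-1}$.

Similarly,

(3.20) $B(\rho) \le 2^{-1}[V^{(i)}(\rho)]^{1/p}(q^{-1} - 2^{-1})^{-1}$.

Finally the result follows from (3.12) through (3.16). □

Now we are ready for the main results of this chapter: the differentiability of the minimum Hellinger distance functionals. We state the results separately for the case when $G$ is known and the case when $G$ is unknown.

Theorem 3.1. Suppose
(i) assumptions (a) and (b) in Lemma 3.1 hold,
(ii) $\theta_{00} = \Psi(f; G)$ exists, is unique and lies in the interior of $\Theta$,
(iii) the matrix $\int \ddot s_{\theta_{00}}[L(f)]^{1/2}\,d\mu$ is nonsingular,
(iv) assumptions (i) and (ii) of Lemmas 3.2 and 3.3 hold for $\rho = \theta_{00}$.
Then, for $f_n$ in a Hellinger neighborhood of $f$,

(3.21) $\Psi(f_n; G) - \Psi(f; G) = \left[-\int \ddot s_{\theta_{00}}[L(f)]^{1/2}\,d\mu + u_n\right]^{-1}\int \dot s_{\theta_{00}}\left([L(f_n)]^{1/2} - [L(f)]^{1/2}\right) d\mu$,

where all entries of the matrix $u_n$ converge to zero as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$.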
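Theorem 3.2. Suppose
(i) assumptions (a), (b) and (c) in Lemma 3.1 hold,
(ii) $\theta_0 = \Psi(f; G; \tau)$ exists, is unique and lies in the interior of $\Theta$,
(iii) the matrix $\int_{-\infty}^{\tau}\left(\dot s_{\theta_0}\dot s_{\theta_0}^t + \ddot s_{\theta_0}(s_{\theta_0} - [L(f)]^{1/2})\right) d\mu$ is nonsingular,
(iv) assumptions (i), (ii) and (iii) of Lemmas 3.2 and 3.3 hold for $\rho = \theta_0$.
Then, for $f_n$ in a Hellinger neighborhood of $f$, and $G_n$, $\tau_n$ satisfying (3.2) and $\Delta G(\tau) = 0$,

(3.22) $\Psi(f_n; G_n; \tau_n) - \Psi(f; G; \tau) = \left\{\int_{-\infty}^{\tau}\left(\dot s_{\theta_0}\dot s_{\theta_0}^t + \ddot s_{\theta_0}(s_{\theta_0} - [L(f)]^{1/2})\right) d\mu + v_n\right\}^{-1}\int_{-\infty}^{\tau_n}\dot s_{\theta_0}\left([L(f_n)]^{1/2} - [L(f_{\theta_0})]^{1/2}\right) d\mu_n$,

where all entries of the matrix $v_n$ converge to zero as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$.

We only give the proof for Theorem 3.2, since the proof for Theorem 3.1 is similar and simpler.

Proof of Theorem 3.2. First note that if assumptions (i), (ii) and (iii) hold for $\theta_0$ then they hold for all points in a neighborhood of $\theta_0$. Let $n$ be sufficiently large so that $\theta_{nn}$ is in that neighborhood. Since $\theta_{nn}$ minimizes $\int_{-\infty}^{\tau_n}\left(s_t^2 - 2 s_t[L(f_n)]^{1/2}\right) d\mu_n$ over $t$, we have by Lemma 3.2, for sufficiently large $n$,

(3.23) $\int_{-\infty}^{\tau_n}\dot s_{\theta_{nn}}\left(s_{\theta_{nn}} - [L(f_n)]^{1/2}\right) d\mu_n = 0$.

Expanding $s_{\theta_{nn}}$, $\dot s_{\theta_{nn}}$ around $\theta_0$, we can rewrite (3.23) as

$0 = \int_{-\infty}^{\tau_n}\left[\dot s_{\theta_0} + (\ddot s_{\theta_0} + R_n)(\theta_{nn} - \theta_0)\right]\cdot\left[s_{\theta_0} + (\dot s_{\theta_0} + r_n)^t(\theta_{nn} - \theta_0) - [L(f_n)]^{1/2}\right] d\mu_n$
$= \int_{-\infty}^{\tau_n}\dot s_{\theta_0}\left(s_{\theta_0} - [L(f_n)]^{1/2}\right) d\mu_n + \int_{-\infty}^{\tau_n}\dot s_{\theta_0}(\dot s_{\theta_0} + r_n)^t\,d\mu_n\,(\theta_{nn} - \theta_0)$
$\quad + \int_{-\infty}^{\tau_n}(\ddot s_{\theta_0} + R_n)\left(s_{\theta_0} - [L(f_n)]^{1/2}\right) d\mu_n\,(\theta_{nn} - \theta_0) + \int_{-\infty}^{\tau_n}(\ddot s_{\theta_0} + R_n)(\theta_{nn} - \theta_0)(\dot s_{\theta_0} + r_n)^t\,d\mu_n\,(\theta_{nn} - \theta_0)$.

An argument similar to that used to prove Lemma 3.2 shows that for $G_n$, $\tau_n$ satisfying (3.2), as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$,

$\int_{-\infty}^{\tau_n}\left(\dot s_{\theta_0}\dot s_{\theta_0}^t + \ddot s_{\theta_0}(s_{\theta_0} - [L(f_n)]^{1/2})\right) d\mu_n \to \int_{-\infty}^{\tau}\left(\dot s_{\theta_0}\dot s_{\theta_0}^t + \ddot s_{\theta_0}(s_{\theta_0} - [L(f)]^{1/2})\right) d\mu$.

Thus the above equation can be written as

$0 = \int_{-\infty}^{\tau_n}\dot s_{\theta_0}\left(s_{\theta_0} - [L(f_n)]^{1/2}\right) d\mu_n + \left\{\int_{-\infty}^{\tau}\left(\dot s_{\theta_0}\dot s_{\theta_0}^t + \ddot s_{\theta_0}(s_{\theta_0} - [L(f)]^{1/2})\right) d\mu + v_n\right\}(\theta_{nn} - \theta_0)$,

where all entries of the matrix $v_n$ converge to zero as $\|f_n^{1/2} - f^{1/2}\|_2 \to 0$. Therefore the result follows. □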
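As a purely numerical illustration of the functional $\Psi(f; G; \tau)$ treated in Theorem 3.2, the following minimal sketch minimizes the truncated Hellinger-type distance over a grid. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the exponential model $f_\theta(x) = \theta e^{-\theta x}$, the exponential censoring distribution with rate `rate_c`, the truncation point `tau`, the midpoint-rule integration and the grid search. It also assumes, in line with the way (3.16) splits the norm, that $\mu_G$ weights the uncensored coordinate by $\bar G\,d\lambda$ and the censored coordinate by $dG$.

```python
import math

def hellinger_sq(theta, f, sf, rate_c, tau, steps=2000):
    """Squared distance between s_theta and [L(f)]^{1/2} on (0, tau] w.r.t. mu_G.

    f and sf are the density and survival function of the data distribution.
    The censoring law is assumed exponential with rate rate_c (illustrative).
    """
    h = tau / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                      # midpoint rule
        g_bar = math.exp(-rate_c * x)          # censoring survival function
        g = rate_c * g_bar                     # censoring density
        f_th = theta * math.exp(-theta * x)    # model density (hypothetical model)
        sf_th = math.exp(-theta * x)           # model survival function
        # uncensored coordinate: (f_theta^{1/2} - f^{1/2})^2 weighted by G-bar
        total += (math.sqrt(f_th) - math.sqrt(f(x))) ** 2 * g_bar * h
        # censored coordinate: (Fbar_theta^{1/2} - Fbar^{1/2})^2 integrated dG
        total += (math.sqrt(sf_th) - math.sqrt(sf(x))) ** 2 * g * h
    return total

def mhd_functional(f, sf, rate_c, tau, grid):
    """Grid-search minimizer of the truncated Hellinger-type distance."""
    return min(grid, key=lambda th: hellinger_sq(th, f, sf, rate_c, tau))

theta0 = 1.5

def f(x):
    return theta0 * math.exp(-theta0 * x)

def sf(x):
    return math.exp(-theta0 * x)

grid = [0.5 + 0.05 * k for k in range(41)]     # candidate values 0.5, ..., 2.5
psi = mhd_functional(f, sf, rate_c=0.3, tau=4.0, grid=grid)
```

When the data density lies in the model, as here, the distance vanishes at the true parameter, so the grid minimizer recovers `theta0`. A grid search is used only to keep the sketch dependency-free; any one-dimensional optimizer could replace it.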
Notice that differentiating twice in the identity $\int s_\theta^2\,d\mu \equiv 1$ yields $\int(\dot s_\theta\dot s_\theta^t + \ddot s_\theta s_\theta)\,d\mu \equiv 0$, which results in the shorter expression (3.21).

4. ASYMPTOTIC DISTRIBUTIONS

When $G$ is known, the MHD estimator of $\Psi(f; G)$ is defined as

(4.1) $\hat\theta_{1n} = \Psi(\hat f_n; G)$;

when $G$ is unknown, the MHD estimator of $\Psi(f; G; \tau)$ is defined as

(4.2) $\hat\theta_{2n} = \Psi(\hat f_n; \hat G_n; T)$

(recall $T = \max(\tilde X_1,\dots,\tilde X_n)$), where $\hat G_n$ is the product-limit estimator as defined in (2.3), and $\hat f_n$ is the kernel density estimator as defined in (2.23):

$\hat f_n(x) = a_n^{-1}\int K((x - y)/a_n)\,d\hat F_n(y)$

for some kernel function $K$ and constant $a_n > 0$. We now prove the consistency of $\hat\theta_{1n}$, $\hat\theta_{2n}$.

Theorem 4.1. Suppose that
(i) assumptions (a) and (b) of Lemma 3.1 hold,
(ii) $K$ is nonnegative, continuous and of bounded variation on $R$, $\int K\,d\lambda = 1$, $K(s) \to 0$ as $s \to \pm\infty$,
(iii) $a_n \to 0$ and $n^{1/2}a_n \to \infty$,
(iv) $\int_0^\tau (1/\bar G_-)\,dF < \infty$.
Then $\hat\theta_{1n} \xrightarrow{P} \Psi(f; G)$ if $\Psi(f; G)$ is unique. If, in addition, assumption (c) in Lemma 3.1 holds and $G$ is continuous at $\tau$, then $\hat\theta_{2n} \xrightarrow{P} \Psi(f; G; \tau)$ if $\Psi(f; G; \tau)$ is unique.

Proof. By Wang (1987), $\sup_{[0,\tau]}|\hat G_n - G| = o_p(1)$. So Theorem 2.3 and Lemma 3.1 give the result immediately. □

To investigate the asymptotic distributions of $\hat\theta_{1n}$ and $\hat\theta_{2n}$, we need to establish some convergence results for the kernel density estimator $\hat f_n$ and the smoothed product-limit estimator $F_n$. Let $\|\cdot\|_\infty$ denote the $L_\infty(R)$-norm.

Lemma 4.1. Suppose that
(i) $f'$ exists and is absolutely continuous on $[0, \tau]$, $\|f'\|_\infty < \infty$ and $\|f''\|_\infty < \infty$,
(ii) $\tau < \infty$ and $\int_0^\tau (1/\bar G_-)\,dF < \infty$,
(iii) $K$ is nonnegative, symmetric and absolutely continuous, $\int K\,d\lambda = 1$, support of $K \subset [-M, M]$ for some $M < \infty$,
(iv) $a_n \to 0$, $n^{1/2}a_n^2 \to 0$ and for some $\epsilon > 0$, $n^{1/2}a_n^{1+\epsilon} \to \infty$,
(v) $U$ is bounded, $V$ is right continuous and of bounded variation on $[0, \tau]$.
Then

$\int_0^\tau n^{1/2}[F_n - F]\,U\,dG \Rightarrow \int_0^\tau P^1\,U\,dG$, $\int_0^\tau n^{1/2}[\hat f_n - f]\,V\,d\lambda \Rightarrow -\int_0^\tau P^1_-\,dV$.

Proof. Let (4.3) ?n. En(x) = I_: En dh. Then we have n1/2[Fn(x) - Fn(x)] = I P;(x - ant) K(t) dt. n"2[§n O.
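(ii) $\tau < \infty$ and $\int_0^\tau (1/\bar G_-)\,dF < \infty$,
(iii) $K$ is nonnegative, symmetric and absolutely continuous, $\int K\,d\lambda = 1$, support of $K \subset [-M, M]$ for some $M < \infty$,
(iv) $a_n \to 0$, $n^{1/2}a_n^2 \to 0$ and for some $\epsilon > 0$, $n^{1/2}a_n^{1+\epsilon} \to \infty$.
Then

$\int_0^\tau n^{1/2}(\hat f_n - f)^2\,d\lambda \xrightarrow{P} 0$.

Proof. By (4.5), it suffices to show

$\int_0^\tau n^{1/2}(\hat f_n - \tilde f_n)^2\,d\lambda \xrightarrow{P} 0$.

Let $D_n(x,t) = P_n^1(x + a_n t) - P_n^1(x - a_n t)$. By symmetry of $K$ and the Cauchy-Schwarz inequality,

$\int_0^\tau n^{1/2}(\hat f_n - \tilde f_n)^2\,d\lambda = (n^{1/2}a_n^2)^{-1}\int_0^\tau\left[\int_0^M D_n(x,t)\,K'(t)\,dt\right]^2 dx$
$\le W\,(n^{1/2}a_n^2)^{-1}\int_0^\tau\int_0^M D_n^2(x,t)\,|K'(t)|\,dt\,dx = W\,(n^{1/2}a_n^2)^{-1}\int_0^M\int_0^\tau D_n^2(x,t)\,dx\,|K'(t)|\,dt$.

Writing $P_n^1(s) = Z_n^T(s)\bar F(s)$ for $s < \tau$ and using $(a + b)^2 \le 2a^2 + 2b^2$, we have

$E\,D_n(x,t)^2\,[a_n t < x < \tau - a_n t] = E\left[Z_n^T(x + a_n t)\bar F(x + a_n t) - Z_n^T(x - a_n t)\bar F(x - a_n t)\right]^2 [a_n t < x < \tau - a_n t]$
$\le 2\bar F^2(x + a_n t)\,E\left[Z_n^T(x + a_n t) - Z_n^T(x - a_n t)\right]^2 + 2\left[\bar F(x + a_n t) - \bar F(x - a_n t)\right]^2 E\left[Z_n^T(x - a_n t)\right]^2$.

By Lemma 2.5,

$\bar F^2(x + a_n t)\,E\left[Z_n^T(x + a_n t) - Z_n^T(x - a_n t)\right]^2 \le 4[A(x + a_n t) - A(x - a_n t)] + 4\bar F^2(x + a_n t)\,A(x - a_n t)\cdot\left[\bar F^{-2}(x + a_n t) - \bar F^{-2}(x - a_n t)\right]$.

Since for $0 < a < b$, $a^2(a^{-2} - b^{-2}) = b^{-2}(a + b)(b - a) < 2(b - a)/b$, the second term on the RHS of the last inequality does not exceed

$8A(\tau)\,[F(x + a_n t) - F(x - a_n t)]\,\bar F^{-1}(x - a_n t)$.

Lemma 2.5 also gives us

$\left[\bar F(x + a_n t) - \bar F(x - a_n t)\right]^2 E\left[Z_n^T(x - a_n t)\right]^2 \le 4\left[\bar F(x + a_n t) - \bar F(x - a_n t)\right]^2\bar F^{-2}(x - a_n t)\,A(x - a_n t) \le 4A(\tau)\,[F(x + a_n t) - F(x - a_n t)]\,\bar F^{-1}(x - a_n t)$.

Hence

$E\,D_n(x,t)^2\,[a_n t < x < \tau - a_n t] \le 8[A(x + a_n t) - A(x - a_n t)] + 24A(\tau)\,[F(x + a_n t) - F(x - a_n t)]\,\bar F^{-1}(x - a_n t)$.

This and Hölder's inequality give us

(4.6) $E\int_{a_n t}^{\tau - a_n t}(D_n(x,t))^{2-2\epsilon}\,dx = \int_R E\left(D_n(x,t)\,[a_n t < x < \tau - a_n t]\right)^{2-2\epsilon} dx$
$\le \int_{a_n t}^{\tau - a_n t}\left(E\,D_n(x,t)^2\,[a_n t \le x < \tau - a_n t]\right)^{1-\epsilon} dx$
$\le \int_{a_n t}^{\tau - a_n t}\left\{8[A(x + a_n t) - A(x - a_n t)] + 24A(\tau)\,[F(x + a_n t) - F(x - a_n t)]\,\bar F^{-1}(x - a_n t)\right\}^{1-\epsilon} dx$
$\le W\int_{a_n t}^{\tau - a_n t}[A(x + a_n t) - A(x - a_n t)]^{1-\epsilon}\,dx + W\int_{a_n t}^{\tau - a_n t}\left\{[F(x + a_n t) - F(x - a_n t)]\,\bar F^{-1}(x - a_n t)\right\}^{1-\epsilon} dx$,

where $W$ is independent of $t$. Since

$\int_{a_n t}^{\tau - a_n t}[A(x + a_n t) - A(x - a_n t)]\,dx = \int_{a_n t}^{\tau - a_n t}\int_{x - a_n t}^{x + a_n t} dA(u)\,dx \le \int_0^\tau\int_{u - a_n t}^{u + a_n t} dx\,dA(u) = 2a_n t\,A(\tau) \le 2M a_n A(\tau)$,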
by Hölder's inequality the first term in (4.6) does not exceed $W\tau^{\epsilon}\{2M a_n A(\tau)\}^{1-\epsilon}$. The second term in (4.6) does not exceed $W\{2M a_n\|f\|_\infty\}^{1-\epsilon}\,[\inf(f[f > 0])]^{-1}\int_0^\tau \bar F^{\epsilon - 1}\,dF$. Thus the sum in (4.6) can be written as $B_n a_n^{1-\epsilon}$ for some bounded quantity $B_n$ independent of $t$. It follows that

$(n^{1/2}a_n^2)^{-1}\int_0^M\int_{a_n t}^{\tau - a_n t} D_n(x,t)^{2-2\epsilon}\,dx\,|K'(t)|\,dt \xrightarrow{P} 0$.

Hence by (2.19)

$(n^{1/2}a_n^2)^{-1}\int_0^M\int_{a_n t}^{\tau - a_n t} D_n(x,t)^2\,dx\,|K'(t)|\,dt \xrightarrow{P} 0$.

Since we also have

$\int_0^{a_n t} D_n(x,t)^2\,dx + \int_{\tau - a_n t}^{\tau} D_n(x,t)^2\,dx \le 4a_n M\,\sup_n\sup_{[0,\tau)}|P_n^1|^2$,

the result follows. □

The following theorems establish the asymptotic distributions of our estimators $\hat\theta_{1n}$, $\hat\theta_{2n}$. Recall that from the beginning of Chapter 3, when $X$ has a density $f$ w.r.t. the Lebesgue measure and $Y$ has distribution $G$, $(\tilde X, \delta)$ has a density $L(f)$ w.r.t. $\mu_G$. Since $G$ remains unchanged throughout, we will simply refer to the weak convergence as under $L(f)$. The theorems show that for a general density $f \ne f_\theta$, the asymptotic distributions are slightly different for the two cases when $G$ is known and when $G$ is unknown. At $f_\theta$ they coincide. We will use the differentiability of $\Psi$ as in (3.21), (3.22), specifying $\mu_n = \mu_{\hat G_n}$, $\tau_n = T$. Thus $\theta_0 = \Psi(f; G; \tau)$, $\theta_{00} = \Psi(f; G)$. Denote $\rho_1 = 2^{-1}\dot f_{\theta_0}f_{\theta_0}^{-1/2}$, $\rho_0 = 2^{-1}\dot{\bar F}_{\theta_0}\bar F_{\theta_0}^{-1/2}$, $\varphi_1 = \rho_1 f^{-1/2}$, $\varphi_0 = \rho_0\bar F^{-1/2}$ and extend them on $R$ by defining them to be zero outside the support of $f_{\theta_0}$ or the support of $f$.

Theorem 4.2. Assume (i) through (iv) of Theorem 3.1 hold. In addition, suppose
(i) $\|f_{\theta_{00}}\|_\infty < \infty$, $\|\dot f_{\theta_{00}}^{(i)}\|_\infty < \infty$ for $i = 1,\dots,p$ and $\inf(f_{\theta_{00}}[f_{\theta_{00}} > 0]) > 0$,
(ii) $\varphi_1$ is of bounded variation on $[0, \tau]$,
(iii) $f'$ exists and is absolutely continuous on $[0, \tau]$, $\|f'\|_\infty < \infty$, $\|f''\|_\infty < \infty$ and $\inf\{f[f > 0]\} > 0$,
(iv) $\tau < \infty$ and $\int_0^\tau (1/\bar G_-)\,dF < \infty$,
(v) $K$ is nonnegative, symmetric and absolutely continuous, $\int K\,d\lambda = 1$, support of $K \subset [-M, M]$ for some $M < \infty$,
(vi) $a_n \to 0$, $n^{1/2}a_n^2 \to 0$, and for some $\epsilon > 0$, $n^{1/2}a_n^{1+\epsilon} \to \infty$.
Then, under $L(f)$,
$n^{1/2}(\hat\theta_{1n} - \Psi(f; G))$ converges weakly to a normal distribution with mean zero and finite variance. In particular, under $L(f_\theta)$, $n^{1/2}(\hat\theta_{1n} - \Psi(f; G))$ converges weakly to $N(0, \Sigma^{-1})$, where $\Sigma$ is the Fisher information matrix:

$\Sigma = E\left\{\frac{\partial}{\partial\theta}\ln L(f_\theta)(\tilde X, \delta)\right\}\left\{\frac{\partial}{\partial\theta}\ln L(f_\theta)(\tilde X, \delta)\right\}^t$.

Theorem 4.3. Assume (i) through (iv) of Theorem 3.2 hold. In addition, suppose
(i) $\|f_{\theta_0}\|_\infty < \infty$, $\|\dot f_{\theta_0}^{(i)}\|_\infty < \infty$ for $i = 1,\dots,p$ and $\inf(f_{\theta_0}[f_{\theta_0} > 0]) > 0$,
(ii) $\varphi_1$ is of bounded variation on $[0, \tau]$,
(iii) $f'$ exists and is absolutely continuous on $[0, \tau]$, $\|f'\|_\infty < \infty$, $\|f''\|_\infty < \infty$ and $\inf\{f[f > 0]\} > 0$,
(iv) $\tau_{F_{\theta_0}} \le \tau < \infty$, $\int_0^\tau (1/\bar G_-)\,dF < \infty$ and $G$ is continuous at $\tau$,
(v) $K$ is nonnegative, symmetric and absolutely continuous, $\int K\,d\lambda = 1$, support of $K \subset [-M, M]$ for some $M < \infty$,
(vi) $a_n \to 0$, $n^{1/2}a_n^2 \to 0$ and for some $\epsilon > 0$, $n^{1/2}a_n^{1+\epsilon} \to \infty$.
Then, under $L(f)$, $n^{1/2}(\hat\theta_{2n} - \Psi(f; G; \tau))$ converges weakly to a normal distribution with mean zero and finite variance. In particular, under $L(f_\theta)$, $n^{1/2}(\hat\theta_{2n} - \Psi(f; G; \tau))$ converges weakly to $N(0, \Sigma^{-1})$.

We just prove Theorem 4.3. The proof for Theorem 4.2 is similar.

Proof of Theorem 4.3. Throughout the proof, we adopt the special construction of the $\tilde X_i$'s and $\delta_i$'s as in Theorem 2.1. We will need to use the algebraic identity (for $a, b > 0$)

(4.7) $b^{1/2} - a^{1/2} = \frac{1}{2a^{1/2}}(b - a) - \frac{b^{1/2} - a^{1/2}}{2a^{1/2}(b^{1/2} + a^{1/2})}(b - a) = \frac{1}{2a^{1/2}}(b - a) - \frac{(b - a)^2}{2a^{1/2}[b^{1/2} + a^{1/2}]^2}$.

Under our assumptions the expansion (3.22), with $G_n$, $\tau_n$ replaced by $\hat G_n$, $T$ respectively, is valid, where all entries of the matrix $v_n$ converge to zero in probability. Since the coefficient of the integral on the right hand side of (3.22) converges to a nonrandom limit, we only have to deal with the integral in (3.22). Note that

$|\dot{\bar F}_{\theta_0}^{(i)}(x)| = \left|\int_x^\infty \dot f_{\theta_0}^{(i)}(s)\,ds\right| = \left|\int_x^\infty [\dot f_{\theta_0}^{(i)}/f_{\theta_0}]\,f_{\theta_0}\,d\lambda\right| \le \{\sup|\dot f_{\theta_0}^{(i)}/f_{\theta_0}|\}\,\bar F_{\theta_0}(x)$,

and since $\tau_{F_{\theta_0}} \le \tau$ the mean value theorem gives $\bar F_{\theta_0}(x) \le \sup f_{\theta_0}\,[\inf(f[f > 0])]^{-1}\,\bar F(x)$. Hence $\varphi_0$ and $\bar F_{\theta_0}/\bar F$ are bounded.
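The square-root identity (4.7), $b^{1/2} - a^{1/2} = (b - a)/(2a^{1/2}) - (b - a)^2/(2a^{1/2}[b^{1/2} + a^{1/2}]^2)$, splits a square-root increment into a term linear in $b - a$ and a remainder quadratic in $b - a$. The short check below is only a numerical sanity sketch of that algebra; the function name and the test values are illustrative choices, not part of the thesis.

```python
import math

def sqrt_increment(a, b):
    """Right-hand side of identity (4.7): linear term minus quadratic remainder."""
    linear = (b - a) / (2.0 * math.sqrt(a))
    remainder = (b - a) ** 2 / (
        2.0 * math.sqrt(a) * (math.sqrt(b) + math.sqrt(a)) ** 2
    )
    return linear - remainder

# Compare both sides of (4.7) for a few positive pairs (a, b).
for a, b in [(1.0, 1.21), (0.25, 4.0), (2.0, 0.5)]:
    lhs = math.sqrt(b) - math.sqrt(a)
    assert abs(lhs - sqrt_increment(a, b)) < 1e-12
```

Because the remainder is of second order in $b - a$, applying the identity with $b$ an estimate and $a$ its target isolates a linear leading term from a term that is negligible at the $n^{1/2}$ scale, which is how the identity is typically used in arguments of this kind.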
For the sake of convenience we will use W to denote a bound for all bounded quantities in our argument. Notice that as in (3.23). we have I; $9011L119011’2 — [L(f)]l/Zldu = 0. Thus nl’zlg $9011L11n)11’2 — [1(190)]1’2) d = _ IT p f1/2 _ fégzln nn112(6 _ G) dk + f3 p0 [g 1/2 _ fo1/2 ] d[n1/2(5n _ G)] _ n1/2 I; "1[f1/2 _ 0132] c dx _ n1/2 1; po [1 1/2 _ fag/2] dG _ 13p ”1 [f1/2 _ f1/2] n112(6n _ G) dk + I g pon nl’ztfi 1’2-1-‘21 d(cn - G) + I 3 Po“ “1,2[Fn 1/2_ - 1/2] dG + 1 3p 111/213“2 - 1 ’2] 6 dh (4.8) = S + S + R + R + R + R + S + S l 2 1 2 3 4 3 4' We can write s1 = [T B(x) Po(x) dx 1/2 _ where B(x) = - pl[f fl/z] is bounded. 90 44 I13 B‘1’(x)[P§(x) - P°1 dxl 3“) | sup ((pg - p0] 1 1"1| 1; 1 “-131. O.T] for o < a < 1/2. Since I; F “‘1dh g I; F “’1dF Ssupl R °(inf f(x)[f(x) ) 0]).-1 < w. by Theorem 2.2 and the fact that R T -» T w.p.1. we have 0 (4.9) s ‘F‘ I; B P dx. 1 Next. integration by parts gives 1/2 _ F 61/2 0 T O - (4.10) 32 = — [o Pn_ A dh + (.00 [F ]Pn }(T) . where d - 1/2 - 1/2 (4.11) A = a;(po[F - FeO ]} _ 2‘1[19 + 2 1 F9 19’3/2 f 1’2 19 O O O O _ F 1/2 f9-1/2 {9 2—1 F(fio F)—1/2 f]. O O 0 When T = T. A is bounded; when T < T. on [0. T ] F F F 9 9 6 0 o O - -1/2 — - 0 A S W F9 and F > F(TF ) > 0. Hence by Theorem 2.2 Pn 9o converges to PO uniformly on [0. TF ]. Therefore in both 0 T 0 T 0 cases we have ]0 P _ A dh -—4 IO P_ A dx. The remainder term - 1 in (4.10) {potF1’2 - 9’2]P°}(T) 1 W(T) 0])— R ~n1’2(F(r-) - F(T)) R3 = I; {p1(x)[(rn1’2(x) + 11’2cx)1’ -(1n - f)(x)Pg(x) dx. 1 . W it op( ) r e 1} then by (2.19). (2.25). (2.26) and the fact that the quantity in ( } is bounded. we have (4.16) R3 ‘F* 0. Now look at T -1/ R4 = 10 ”0(Fn .n1/2[F F1/2)-1(x) - F](x) MEn - G)(x) 2+ n 46 (4.17) =1 an d(cn — G). say. 1/2 By (4. 4) and the fact that |p31)(ii’2+ P )-1 (Kll (i) -l/2] S Ipo [F l(x)I S W. the integrand is uniformly bounded in probability. Also. by the uniform convergence of P; to P1 on each compact subinterval of [0. T). and continuity of F. 
F90 and F90. we have for xn-—# x 6 (O.T). Bn(xn) -§4 4—1 F (F F)-1/2(x) P1(x). Hence by Lemma 4.2. 9 9 0 O (4.18) R4 -F% 0 Therefore. we have proved that for A defined in (4.11). n1’2lg $9011Lcrn111’2 - [L119111’21 dun 1/2 _ 1/2 (4.19) ———e -]5 p1[f ] Po dh - [T P_ 0A dh P T -l l T -1 1 - — 10 2 To P dG - (o 2 P d(¢1G). where the limit has a normal distribution with mean zero and finite variance. Thus n1/2(32n - W(f; G: T)) also converges weakly to a normal distribution with mean zero and finite variance. The variance can be computed using (2.5). In particular when f = f6 for some 9. then the limit becomes -1 _ 1 - - I; 2- so P1 dG - [72 P d(¢1G) = [3 h1 d(P1/P). where -l T -1 T — - hl(x) = 2 [x e0 P9 dG + 2 [x Fe d(¢1G) -1 T -1 $ T - -1 - - = 2 [Ix 2 F9 dG + [x elc are] — 2 ¢1F9G(x) - 4‘1(;T P as + [T 1 a] - 2“. P C(x) ' x 9 x 9 l 9 —1 a - - -1 — - = 4 -53— (FOG) - 2 W1 FGG 47 -1 a - - Using the quardratic variation process of the martingale PllF from Section 2. we have Cov([5 hl d(Pl/ F)) 2‘? -1 T L _ _ 16 [0 (Pa - 2¢1F9) c o(fi92 é)‘ldP - t a ’ 2"’1Fe) 9 _ -1 T t _ t _ t t — The (i.j) entry of I; wowoé dF6 can be written as 4‘1]T F(i) F(J) F '2 é dF ) a d 'l'il = 44101: V3 ) 4'1[;T a d( l P ”1| "1| - 1; 6 F9 d( 4 1(1’ F(‘) (J a (1 e (1 e :l '11!- "I1l GA GA L4. v V I._.J (J )- 6 9 1) .11) . F11) .(1),d.] Fj + [3 EF - M( F3 (i.J)6 entry of {[0 woeo FBdG + I0 mo of dF9 t + I; oloo G dFG }. DI Thus Cov (I; hl d(P1/ F)) -l T t - T t - [[0 *1’1 C “9+ [0 fofo Fe dc] = 16’1 2. Consequently the covariance of the limit of n1/2 “ (92n - ¢(£; c; 1)) is [4‘121‘1(16’12)([4‘12]‘1 )t=2. [I When the Xi’s are distributed according to the model f9. the asymptotic covariance matrix of n1/2[92n - W(f; G: T)] is 48 the reciprocal of the Fisher information matrix. This fact reflects a certain optimality property of the estimator 62n For a 6 L2(R). let K(d. a. G) denote the collection of all sequences of densities {dn} such that (4.20) "n1/2(d:/2 - d1/2) - an2 -——» o as n -—» m. 
Note that (4.20) implies $a \perp d^{1/2}$, as is easily shown. It also implies

(4.21) $\left\|n^{1/2}\left([L(d_n)]^{1/2} - [L(d)]^{1/2}\right) - \beta\right\|_G \to 0$ as $n \to \infty$,

where $\beta(x,0) = \left[\int_x^\infty d\,d\lambda\right]^{-1/2}\int_x^\infty a\,d^{1/2}\,d\lambda$, $\beta(x,1) = a(x)$, and $\beta \perp [L(d)]^{1/2}$. Let $K(d, G)$ denote the union of $K(d, a, G)$ for all $a \in L^2(R)$, and let $\{\hat\theta_n\}$ be a sequence of estimators of the functional $\Psi(d; G; \tau)$ based on $(\tilde X_i, \delta_i)$, $i = 1,\dots,n$. We say that $\{\hat\theta_n\}$ is regular at $d$ if for $\{d_n\} \in K(d, G)$ and $X_1,\dots,X_n$ independently and identically distributed according to $d_n$, $n^{1/2}[\hat\theta_n - \Psi(d_n; G; \tau)]$ converges weakly to a distribution $\mathcal{D}(d; \tau; G)$ that does not depend upon the particular sequence $\{d_n\}$.

The following theorem extends Theorem 5 of Beran (1977a) to the censored data case.

Theorem 4.4. Suppose $\Psi(\cdot\,; G; \tau)$ is differentiable at $d$ with derivative $w$, in the sense that for $d_n$ in a Hellinger neighborhood of $d$,

$\Psi(d_n; G; \tau) - \Psi(d; G; \tau) = \int w\left\{[L(d_n)]^{1/2} - [L(d)]^{1/2}\right\} d\mu_G + \left\|[L(d_n)]^{1/2} - [L(d)]^{1/2}\right\|_G\,u_n$,

where each component of $u_n \to 0$ as $\|d_n^{1/2} - d^{1/2}\|_2 \to 0$. Let $\{\hat\theta_n\}$ be a sequence of estimators of $\Psi(\cdot\,; G; \tau)$ which is regular at $d$. Then $\mathcal{D}(d; \tau; G)$ can be represented as the convolution of a $N(0, 4^{-1}\int w w^t\,d\mu_G)$ distribution with a distribution $\mathcal{D}_1(d; \tau; G)$.

Proof. Let

(4.22) $L_n = 2\sum_{i=1}^n \log\frac{[L(d_n)]^{1/2}(\tilde X_i, \delta_i)}{[L(d)]^{1/2}(\tilde X_i, \delta_i)}$,

then we have for $d_n$, $d$ in (4.20), as $n \to \infty$,

(4.23) $P_{L(d)}\left[\left|L_n - 2n^{-1/2}\sum_{i=1}^n \beta(\tilde X_i, \delta_i)\,[L(d)]^{-1/2}(\tilde X_i, \delta_i) + 2\int \beta^2\,d\mu_G\right| > \epsilon\right] \to 0$

for any $\epsilon > 0$. This can be easily deduced from LeCam's second lemma and is similar to Lemma 1 of Wellner (1982). Now the rest is almost the same as in Theorem 6 in Beran (1977a). For any vector $v \in R^p$, the differentiability of $\Psi(\cdot\,; G; \tau)$ at $d$ and (4.20) give

(4.24) $v^t\left[n^{1/2}(\Psi(d_n; G; \tau) - \Psi(d; G; \tau))\right] \to \int (v^t w)\,\beta\,d\mu_G$.

Thus we can proceed almost exactly as in Theorem 6 of Beran (1977a): the choice $\beta = h\,v^t w$, $h \in R$ arbitrary, yields that along a subsequence, the random vectors

$\left\{v^t\left[n^{1/2}(\hat\theta_n - \Psi(d; G; \tau))\right],\ n^{-1/2}\sum_{i=1}^n v^t w(\tilde X_i, \delta_i)\,[L(d)]^{-1/2}(\tilde X_i, \delta_i)\right\}$

converge weakly under $L(d)$ to $\{v^t S$,
$v^t N\}$ where $N = N(0, \int w w^t\,d\mu_G)$ and $S$ depend only on $d$, $\tau$, $G$ and not on $\{d_n\}$. Let $\varphi$ denote the characteristic function of the limit $(v^t S, v^t N)$. Then at the end we get

(4.25) $\varphi(s, 0) = \varphi(s, -2^{-1}s)\cdot\exp\left[-8^{-1}\left(\int v^t w w^t v\,d\mu_G\right)s^2\right]$.

The first factor is the characteristic function of $v^t(S - 2^{-1}N)$, the second factor is the characteristic function of $2^{-1}v^t N$. Thus the theorem follows. □

When the conclusions of Theorems 4.2 and 4.3 hold, the sequences of estimators $\{\hat\theta_{1n}\}$, $\{\hat\theta_{2n}\}$ are regular at $f_\theta$. In fact, under $L(f_\theta)$,

(4.26) $n^{1/2}[\hat\theta_{2n} - \Psi(f_\theta; G; \tau)] - [4^{-1}\Sigma]^{-1}\left\{4^{-1}\int \dot{\bar F}_\theta\,\bar F_\theta^{-1}\,n^{1/2}[\bar F_n - \bar F_\theta]\,dG + 4^{-1}\int \dot f_\theta\,f_\theta^{-1}\,\bar G\,d\{n^{1/2}[\hat F_n - F_\theta]\}\right\} = o_p(1)$,

as in the proof of Theorem 4.3. Since (4.24) gives contiguity of $\{L(d_n)\}$ to $\{L(f_\theta)\}$, (4.26) is also true under $L(d_n)$. Thus convergence in $D[0, \tau]$ of $P^1_{nn} = n^{1/2}(\hat F_n - D_n)$ to $P^1$ under $L(d_n)$ and the differentiability of $\Psi(\cdot\,; G; \tau)$ will give the regularity of $\{\hat\theta_{2n}\}$. Similarly we can obtain the regularity of $\{\hat\theta_{1n}\}$.

Since with probability 1 $P^1$ sits in a separable subset of $D[0, \tau]$, by Theorem 5.3 in Pollard (1984) the necessary and sufficient conditions for the convergence of $P^1_{nn}$ to $P^1$ are the finite dimensional convergence and the "small oscillation" condition. Recall the martingale representation of $P^1_n/\bar F$ under $L(f)$ as in Theorem 7.2.1 and Theorem 7.5.1 of S-W. We have a similar representation for $P^1_{nn}/\bar D_n$ under $L(d_n)$. Thus convergence of $P^1_{nn}$ on $[0, \eta]$ for any $\eta < \tau_{F_\theta}$ can be obtained by, say, Theorem 8.13 of Pollard (1984). This gives finite dimensional convergence of $P^1_{nn}$. Since the small oscillation property is preserved under contiguity, we obtain the convergence of $P^1_{nn}$. Therefore $\hat\theta_{2n}$ is a distinguished regular estimator of $\Psi(f; G; \tau)$ for having the smallest asymptotic variance when the parametric model is true.

5. ROBUSTNESS PROPERTIES

Just as in the i.i.d. complete data case (i.e. $G$ is degenerate at $\infty$) discussed in Beran (1977b), the minimum Hellinger distance estimation procedure in the random censorship model possesses a certain degree of robustness.
In one way this is reflected in the continuity of W(-; G); furthermore. W(fn: G) proves to be optimally insensitive to perturbations of its argument in a minimax sense. Consider the class of functionals {U} such that for p a p-dimensional vector with components p(1) in L2(u). (5.1) U(f9) a 9. 0(1) - e = I p