THESIS

This is to certify that the dissertation entitled

Nonlinear Wavelet-Based Nonparametric Curve Estimation With Censored Data And Inference on Long Memory Processes

presented by

Linyuan Li

has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor: Hira L. Koul
Date: May 20, 2002


NONLINEAR WAVELET-BASED NONPARAMETRIC CURVE ESTIMATION WITH CENSORED DATA AND INFERENCE ON LONG MEMORY PROCESSES

By

Linyuan Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2002


ABSTRACT

NONLINEAR WAVELET-BASED NONPARAMETRIC CURVE ESTIMATION WITH CENSORED DATA AND INFERENCE ON LONG MEMORY PROCESSES

By Linyuan Li

In the first two parts of this thesis, we provide asymptotic formulae for the mean integrated squared error (MISE) of nonlinear wavelet-based density and hazard rate estimators under randomly censored data. We show that this MISE formula, when the underlying survival density and hazard rate functions and the censoring distribution function are only piecewise smooth, has the same expansion as that of the analogous kernel estimators. For kernel estimators, by contrast, this MISE formula holds only under the smoothness assumption. In addition, we establish the asymptotic normality of the nonlinear wavelet estimator of the hazard rate function, which is useful for constructing confidence intervals for the hazard rate function.

In the third part, we discuss the asymptotic behavior of Koul's minimum distance (m.d.) estimators of the regression parameter vector in linear regression models with long memory moving average errors, when the design variables are either known constants or i.i.d. random variables independent of the errors. It is observed that all these estimators are asymptotically equivalent to the least squares estimator in the first order.


ACKNOWLEDGMENTS

I would like to express my deep gratitude to my dissertation advisor, Professor Hira L. Koul, for his constant guidance, generous support and extreme patience shown during the writing of this dissertation. His dedication and contributions to statistics have been my main source of inspiration during my graduate study. I am also very grateful to Professor Winfried Stute for his valuable and constructive suggestions, and to Professor Donatas Surgailis for providing me with a preprint of Lemma 3.3.3 used in this thesis. I would also like to thank Professors LePage, Levental and Salehi for serving on my thesis committee. My special thanks go to Professor Ibragimov, who offered wonderful courses each summer, and to Professor Page for training me as a statistical consultant. The research in this thesis was also partly supported by the NSF Grant DMS 0071619 of the P.I., Professor Hira L. Koul.
TABLE OF CONTENTS

1 Nonlinear Wavelet-based Density Estimator
  1.1 Introduction
  1.2 Notations and Estimators
  1.3 Main results
  1.4 Proofs of the theorems
  1.5 Proofs of the propositions

2 Nonlinear Wavelet-based Hazard Rate Estimator
  2.1 Introduction
  2.2 Notations and Estimators
  2.3 Main results
  2.4 Proofs

3 Minimum Distance Estimators in Regression Models under Long Memory
  3.1 Introduction
  3.2 Main results
    3.2.1 The case of non-random designs
    3.2.2 The case of random designs
  3.3 Proofs of the theorems
  3.4 Appendix

Bibliography


Chapter 1

Nonlinear Wavelet-based Density Estimator

1.1 Introduction

The mathematical theory of wavelets and its applications in statistics have become a well-known tool for nonparametric curve estimation; see, e.g., Meyer (1990), Daubechies (1992), Chui (1992), Mallat (1989), Donoho and Johnstone (1994), Donoho et al. (1995, 1996) and Kerkyacharian and Picard (1992, 1993). For a systematic discussion of wavelets and their applications, see the recent monograph by Härdle et al. (1998). The major advantage of the wavelet method is its adaptation to erratic behavior of the density and its local adaptation to the degree of smoothness of the unknown density. These wavelet estimators typically achieve the optimal convergence rates over exceptionally large function spaces. They do an excellent job of taking care of discontinuities in the target function, and in consequence they enjoy very good convergence rates even if smoothness conditions are imposed only in a piecewise sense. Hall and Patil (1995) first explicitly demonstrated that, in the no-censorship case, the discontinuities of densities have a negligible effect on the performance of nonlinear wavelet density estimators.

The mean integrated squared error (MISE) of the kernel estimator of a density function $f$ has the form
\[ \mathrm{MISE} \sim c_1 (nh)^{-1} + c_2 h^{2r}, \]
where "$\sim$" means that the ratio of the left- and right-hand sides converges to 1 as the sample size $n \to \infty$, $h$ is the bandwidth of the kernel estimator, $r$ is the order of the kernel, and $c_1$ and $c_2$ are constants depending on both the kernel and the unknown density. The first term derives from the variance and the second from the squared bias. This expansion for kernel estimators generally fails if the underlying density function does not have $r$ derivatives (Hall and Patil, 1995, p. 906). However, the MISE expansion of nonlinear wavelet estimators is still valid for a merely piecewise smooth density function, and it even has the same constants $c_1$ and $c_2$. Patil (1997) provided similar results for a nonlinear wavelet-based hazard rate estimator with complete data.
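As an editorial aside (this calculation is not spelled out in the thesis at this point, but it is standard and uses nothing beyond the displayed expansion), the rate implied by a two-term MISE expansion of this kind follows by balancing the variance and squared-bias terms:
\[
\frac{d}{dh}\Big\{ c_1 (nh)^{-1} + c_2 h^{2r} \Big\}
   = -\,c_1 n^{-1} h^{-2} + 2r\,c_2\, h^{2r-1} = 0
\;\Longrightarrow\;
h_{\mathrm{opt}} = \Big(\frac{c_1}{2r\,c_2\, n}\Big)^{1/(2r+1)} \asymp n^{-1/(2r+1)},
\qquad
\mathrm{MISE}(h_{\mathrm{opt}}) \asymp n^{-2r/(2r+1)}.
\]
The same calculation applied to the wavelet expansion (1.1.1) below, with the smoothing parameter $p$ playing the role of $h^{-1}$, gives $p_{\mathrm{opt}} \asymp n^{1/(2r+1)}$ and the same optimal rate $n^{-2r/(2r+1)}$, which is the rate that appears throughout the proofs in this chapter.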
In industrial life-testing, medical follow-up research and other studies, the observation of the occurrence of the failure event may be prevented by the previous occurrence of the censoring event, so that only part of the observations are real failure times. Formally, let $X_1, X_2, \ldots, X_n$ be i.i.d. survival times with a common distribution function $F$ and density function $f$. Also let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. censoring times with a common distribution function $G$. It is assumed that $X_i$ is independent of $Y_i$ for every $i$. Rather than observing $X_1, X_2, \ldots, X_n$, the variables of interest, in the randomly right-censored model one observes
\[ Z_i = \min(X_i, Y_i) = X_i \wedge Y_i \quad \text{and} \quad \delta_i = I(X_i \le Y_i), \qquad i = 1, 2, \ldots, n, \]
where $I(A)$ denotes the indicator function of the set $A$.

Antoniadis et al. (1999) describe a wavelet method for the estimation of density and hazard rate functions from randomly right-censored data. The method is based on dividing the time axis into a dyadic number of intervals and then counting the number of events within each interval. The number of events and the survival function of the observations are then separately smoothed over time via linear wavelet smoothers. They provide the estimator's asymptotic normality and obtain the best possible asymptotic MISE convergence rate under the assumption that the survival time density function $f$ is $r$-times continuously differentiable and the censoring density $g$ is continuous.

The objective of this chapter is to propose a nonlinear wavelet estimator of the density function with censored data and to derive a result similar to the main result, Theorem 2.1, of Hall and Patil (1995). One consequence of this extension is that we can show that the MISE has the analogous expansion
\[ \mathrm{MISE} \sim k_1 n^{-1} p + k_2 p^{-2r}, \tag{1.1.1} \]
where $n$ denotes the sample size, $p$ is the smoothing parameter, a wavelet analogue of the bandwidth $h^{-1}$ for kernel estimators, and $k_1$ and $k_2$ are constants depending on the wavelet, the unknown density and the censoring distribution.

Recently, Wu and Wells (1999) provided hazard rate estimation by nonlinear wavelet methods in the left truncation and right censoring model. They have $n$ observations $(X_i, \delta_i, V_i)$ with $X_i \ge V_i$, where $X_i = \min(T_i, U_i)$ and $\delta_i = I(T_i \le U_i)$. They applied counting process techniques and obtained an analogous MISE expansion, but needed further truncation. They provided a wavelet-based estimator of the hazard rate function over a bounded interval $[L, \tau]$, which is chosen such that the size of the risk population satisfies the following conditions:

(Y1): $P(Y_{\min} \le na)$ is asymptotically negligible for some $a > 0$, where $Y_{\min} = \inf_{t \in [L,\tau]} Y(t)$ and $Y(t) = \sum_{i=1}^{n} I(X_i \ge t \ge V_i)$;

(Y2): $E \sup_{t \in [L,\tau]} |Y(t)/n - C(t)| = O(n^{-1})$, where $C(s) = E[Y(s)/n]$.

Basically, condition (Y1) means that the size of the risk population $Y(t)$ is large, and condition (Y2) means that $Y(t)/n$ is uniformly close to its expectation, for all $t \in [L, \tau]$. In addition, they only obtained the approximation (1.1.1) for the MISE, which is weaker than the result (1.3.1) given below.

In this thesis, we apply the method of Stute (1995), which approximates a Kaplan–Meier integral by an average of i.i.d. random variables with a sufficiently small error rate. We provide a MISE expansion similar to that of Hall and Patil (1995) for the density function over $(-\infty, T]$, for any fixed $T < \tau_H$, where $\tau_H = \inf\{x : H(x) = 1\} \le \infty$ is the least upper bound for the support of $H$, the distribution function of $Z_1$.

In the next section, we give the elements of the wavelet transform and provide nonlinear wavelet-based density estimators. The main results are described in Section 3, while their proofs appear in Sections 4 and 5.
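Before turning to the estimators, a purely illustrative simulation may help fix ideas about the censorship scheme and the Kaplan–Meier device just described. The sketch below is not part of the thesis: the exponential survival and censoring distributions, the sample size, and the use of NumPy are arbitrary choices. It generates $(Z_i, \delta_i)$, computes the product-limit estimators $\hat F_n$ and $\hat G_n$ defined in the next section, and checks numerically the identity, noted there, that the jump of $\hat F_n$ at an uncensored $Z_{(k)}$ equals $\delta_{(k)}/\{n(1 - \hat G_n(Z_{(k)}-))\}$.

import numpy as np

rng = np.random.default_rng(0)
n = 200
# Illustrative choices (not from the thesis): exponential survival and censoring times.
X = rng.exponential(scale=1.0, size=n)        # survival times, distribution F
Y = rng.exponential(scale=1.5, size=n)        # censoring times, distribution G
Z = np.minimum(X, Y)                          # observed times Z_i = X_i ^ Y_i
delta = (X <= Y).astype(float)                # censoring indicators delta_i = I(X_i <= Y_i)

order = np.argsort(Z)                         # order statistics Z_(1) <= ... <= Z_(n) (no ties a.s.)
d = delta[order]                              # concomitant indicators delta_(k)
k = np.arange(1, n + 1)

# Product-limit (Kaplan-Meier) estimators evaluated at the order statistics:
# 1 - F_n(Z_(k)) = prod_{j<=k} [1 - delta_(j)/(n-j+1)]; for G, replace delta_(j) by 1 - delta_(j).
surv_F = np.cumprod(1.0 - d / (n - k + 1))
surv_G = np.cumprod(1.0 - (1.0 - d) / (n - k + 1))
surv_F_left = np.concatenate(([1.0], surv_F[:-1]))    # 1 - F_n(Z_(k)-)
surv_G_left = np.concatenate(([1.0], surv_G[:-1]))    # 1 - G_n(Z_(k)-)

# Jump of F_n at Z_(k) versus the weight delta_(k) / {n (1 - G_n(Z_(k)-))}.
jump_F = surv_F_left - surv_F
weight = d / (n * surv_G_left)
print("max |jump - weight| =", np.max(np.abs(jump_F - weight)))   # ~ 1e-16, i.e. rounding error

A Kaplan–Meier integral such as $\int \varphi \, d\hat F_n$ can then be computed as np.sum(weight * phi(Z[order])) for a user-supplied function phi, which is exactly the average form of the coefficient estimators (1.2.3)–(1.2.4) below, up to the truncation at $T$.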
1.2 Notations and Estimators

This section contains some facts about wavelets that will be used in the sequel. Let $\phi(x)$ and $\psi(x)$ be father and mother wavelets, having the following properties: $\phi$ and $\psi$ are bounded and compactly supported; $\int \phi^2 = \int \psi^2 = 1$; $\mu_k \equiv \int y^k \psi(y)\, dy = 0$ for $0 \le k \le r-1$ and $\mu_r = r!\,\kappa \ne 0$, where $\kappa = (r!)^{-1} \int y^r \psi(y)\, dy$. Let
\[ \phi_j(x) = p^{1/2} \phi(px - j), \qquad \psi_{ij}(x) = p_i^{1/2} \psi(p_i x - j), \qquad x \in \mathbb{R}, \]
for arbitrary $p > 0$, $-\infty < j < \infty$ and $p_i = p\,2^i$, $i \ge 0$. Then
\[ \int \phi_{j_1} \phi_{j_2} = \delta_{j_1 j_2}, \qquad \int \psi_{i_1 j_1} \psi_{i_2 j_2} = \delta_{i_1 i_2} \delta_{j_1 j_2}, \qquad \int \phi_{j_1} \psi_{i j_2} = 0, \]
where $\delta_{ij}$ denotes the Kronecker delta, i.e., $\delta_{ij} = 1$ if $i = j$, and $0$ otherwise. For more on wavelets see Daubechies (1992).

In our random censorship model, we observe $Z_i = \min(X_i, Y_i)$ and $\delta_i = I(X_i \le Y_i)$, $i = 1, 2, \ldots, n$. Let $T < \tau_H$ be fixed and $f_1(x) = f(x) I(x \le T)$. We estimate $f_1(x)$, i.e., the density function $f(x)$ for $x \in (-\infty, T]$. The wavelet expansion of $f_1(x)$, assuming $f_1 \in L^2$, is
\[ f_1(x) = \sum_{j=-\infty}^{\infty} b_j \phi_j(x) + \sum_{i=0}^{\infty} \sum_{j=-\infty}^{\infty} b_{ij} \psi_{ij}(x), \qquad b_j = \int f_1 \phi_j, \quad b_{ij} = \int f_1 \psi_{ij}. \tag{1.2.1} \]
We propose the nonlinear wavelet estimator of $f_1(x)$:
\[ \hat f_1(x) = \sum_{j=-\infty}^{\infty} \hat b_j \phi_j(x) + \sum_{i=0}^{q-1} \sum_{j=-\infty}^{\infty} \hat b_{ij}\, I(|\hat b_{ij}| > \delta)\, \psi_{ij}(x), \tag{1.2.2} \]
where $\delta > 0$ is a "threshold" and $q \ge 1$ is another smoothing parameter, and the wavelet coefficients $\hat b_j$ and $\hat b_{ij}$ are defined as follows:
\[ \hat b_j = \int \phi_j(x) I(x \le T)\, d\hat F_n(x) = \frac{1}{n} \sum_{k=1}^{n} \frac{\delta_k I(Z_k \le T)\, \phi_j(Z_k)}{1 - \hat G_n(Z_k-)}, \tag{1.2.3} \]
\[ \hat b_{ij} = \int \psi_{ij}(x) I(x \le T)\, d\hat F_n(x) = \frac{1}{n} \sum_{k=1}^{n} \frac{\delta_k I(Z_k \le T)\, \psi_{ij}(Z_k)}{1 - \hat G_n(Z_k-)}. \tag{1.2.4} \]
Here $\hat F_n$ and $\hat G_n$ denote the Kaplan–Meier estimators of the distribution functions $F$ and $G$, respectively, i.e.,
\[ \hat F_n(x) = 1 - \prod_{k=1}^{n} \Big[1 - \frac{\delta_{(k)}}{n-k+1}\Big]^{I(Z_{(k)} \le x)}, \qquad \hat G_n(x) = 1 - \prod_{k=1}^{n} \Big[1 - \frac{1-\delta_{(k)}}{n-k+1}\Big]^{I(Z_{(k)} \le x)}, \]
where $Z_{(k)}$ is the $k$-th ordered $Z$-value and $\delta_{(k)}$ is the concomitant of the $k$-th order $Z$ statistic, i.e., $\delta_{(k)} = \delta_j$ if $Z_{(k)} = Z_j$. Note that $\delta_k / \{n(1 - \hat G_n(Z_k-))\}$ is the jump of the Kaplan–Meier estimator $\hat F_n$ at $Z_k$.

Remark 1.2.1 We can define the wavelet estimator of $f(x)$, say $\hat f(x)$, instead of $f_1(x)$, similarly to (1.2.2)–(1.2.4). However, in this case the MISE, i.e., $E \int (\hat f - f)^2 < \infty$, cannot be ensured. Thus we typically consider $E \int_{-\infty}^{T} (\hat f - f)^2$ to eliminate the endpoint effects. Since the wavelet estimator $\hat f$ is the same as $\hat f_1$ whenever $Z_{(n)} \le T$, we have
\[ \int_{-\infty}^{T} (\hat f - f)^2 = \int (\hat f_1 - f_1)^2 - \int_{T}^{\infty} \hat f_1^{\,2} = \int (\hat f_1 - f_1)^2, \]
provided that $T \ge Z_{(n)}$. Thus our analysis for $\hat f_1$ is closely related to that for $\hat f$ restricted to $(-\infty, T]$.

Remark 1.2.2 Although we consider the survival time setting here, the random variables need by no means be restricted to be positive. Suppose there is no censoring, i.e., $G \equiv 0$ on $(-\infty, \infty)$. Then $\delta_k \equiv 1$ for all $k = 1, 2, \ldots, n$, and upon taking $T = \tau_H = \infty$, we see that $f_1 \equiv f$ and the above estimator $\hat f_1 = \hat f$ reduces to that of Hall and Patil (1995).

1.3 Main results

We assume that the smoothing parameters $p$, $q$ and $\delta$ satisfy the following condition:

(A): $p \to \infty$, $q \to \infty$, $p\delta^2 \to 0$, $p^{2r+1}\delta^2 \to \infty$, and $\delta \ge C (n^{-1} \ln n)^{1/2}$, where $C > C_0 \equiv 2\{r(2r+1)^{-1} \sup f_1 (1-G)^{-1}\}^{1/2}$.

Theorem 1.3.1 In addition to the conditions on $\phi$ and $\psi$ stated in Section 1.2, assume that the $r$-th derivative $f^{(r)}$ is continuous on $(-\infty, \infty)$ and is bounded, monotone on $(-\infty, -u)$ and $(u, \infty)$ for a sufficiently large positive $u$, and that the censoring distribution function $G$ is continuous. Also assume that condition (A) holds. Then
\[ E\Big| \int (\hat f_1 - f_1)^2 - \Big\{ n^{-1} p \int \frac{f_1}{1-G} + p^{-2r} \kappa^2 (1 - 2^{-2r})^{-1} \int f_1^{(r)2} \Big\} \Big| = o(n^{-1} p + p^{-2r}). \tag{1.3.1} \]

Remark 1.3.1 This theorem is an analogue of Theorem 1 of Hall and Patil (1995), where the monotonicity of $f^{(r)}$ on $(u, \infty)$ for large positive $u$ is needed. However, it is not needed in the censored data case, because of the effect of truncation at $T$.

Remark 1.3.2 The result (1.3.1) is stronger than the traditional asymptotic formula for the MISE. It implies a wavelet version of the MISE formula: $E \int (\hat f_1 - f_1)^2 \sim n^{-1} p \int$
1_f_IG “PP—2”?“ — 2-2rl-1/fir)2- In the Theorem 1.3.1, we have assumed that survival time density f is r-times continuously differentiable and censoring distribution function G is continuous for simplicity and convenience of exposition. However, if f (’l and G are only piecewise continuous, Theorem 1.3.1 still holds. That is the following: Theorem 1.3.2 In addition to the conditions on d) and w stated in section 1.2, assume that the r-th derivative f”) and G are only piecewise smooth, i.e. there exist points 230 z —oo < 2:1 < 2:2 < < am < oo = ash/+1 such that the first r derivatives off exist and are bounded and continuous on (rung-+1) for 0 g i g N, with left- and right-hand limits; and that f") is monotone on (—00, —u) and (u, 00) for sufiiciently large positive u. In particular, f and G themselves may be only piecewise continuous. Also assume that condition (A) holds and pgr+1n‘2' —-> 00. Then also (1.3.1) holds. 1.4 Proofs of the theorems The proof of the above theorem follows along the lines in Hall and Patil (1995), combined with Stute (1995) which establishes an asymptotic representation for the Kaplan-Meier integral f (,0an as an average of i.i.d. random variables with a 8 sufficiently small error. This allows for a more traditional and direct approach to the density estimation problem for the censored data, compared to the martingale approach as, e.g., in the Wu and Wells (1999). We begin with some lemmas. To state these lemmas, we first need some addtional natation. Let 991(1) = ¢J($)I(‘T S. T): j : Gail-3:27”. a Wij($)=¢ij($)1($ST), i=0,1,~-,q—1,j=0,i1,:l:2,~-, ~ 1 n 5kF _ v < u, v < w)c,o,-,-(w)m(u,~)1-0 v ~1 w u H ///I( ' [1—H(v)]2 H (d )H (d )H..(d )+R....,. where Rn,z'j = nl,ij + 5712.0 + Rn1,ij — Rn2,ij + 23113.0 a (1-4-11) Hn, H2 and H}, are the empirical (sub-) distribution function estimators of H , HO and H1, respectively, 70(Zk) = 1/(1 — C(Zk)), and 5711,41: iifipifl ()Zk )"/0(Zk)5kBkm (1.4.12) nk: 1 Sn2,ij 2% g; (pij(Zk)6keA"{Bkn + Ckn}2, (1.4.13) --— --w w z u' [Hn(z (2)]2 0 *4 ~1 w 12...”- ff m m m < .)[,_H(z;.]'[1_(H wHHanHAd ). (1.4.14) Raw/ff x [H2(dv) — H°) 4 [fl/(1 — G). J and Z]. bf z 0(ff12), then EZ(hj—l) j 2=n p/f1/( 1(—G )+o(n‘1p). .7 As a consequence of Zk g T for all k = 1, 2, - - - ,n, all denominators appearing in (1.4.3) and (1.4.4) are bounded away from below. Thus they may be handled along the same lines as those in Hall and Patil’s paper p.922 to show that Var( 23(13)- - (5)2) = o(n‘2p2). So we obtain 311 = o(n’1p). 13 From (1.4.5), we have 312 g n" Zewfizl) g 2n'1:(EUJ-2(Zl) + E132(Z1)). J 1 In view of (1.4.6), applying the Cauchy-Schwarz inequality and using the compact support of o, we finally can obtain 1 [1—H(T)][1—GT EU2(ZI)<)]p-1/¢2(u)f12((u +j)/p)dU- (1-4-17) So we obtain n-IZEU?()Z,= 0(n“1/¢2(u)§;p‘1f,( (u+j )/p)du)o(= 22-1 p.) Y By applying the same argument, we can obtain 1 —1 2 2 . H(T)]2[1—G(T)]2p /¢(U)f1((u+J)/P)du2 (1.4.18) Ev3?(z2)< [1_ thus, 71-12], EVJ-2(Zl) = o(n'1p) too. Hence 312 = 0(n’1p). By (1.4.9), 313 = o (flax/2,2321}? = o (g) = 0(n-1p). Finally, by applying Cauchy-Schwarz inequality twice to 314, together with s11 and 313, we obtain 1/2 314 S 2 (2 E091 _ (5)2 ' ZEerm) = 0(n'lp). J J By applying the same argument, we can show 315 = 316 = o(n‘1p) too. This completes the proof of the lemma. Lemma 1.4.3 Under the assumptions of Theorem 1.3.1, 22: 224022 2 21(lb22|>6)} o 22;), __0 j 322-. — :ZE{(bij— bij)21(lbiJ-bi1l > B6» i=0j So s2 5 321 + 322. 
By (1.4.10), we have 821 S3iZEU5i1 — b,j)2}1(|b,-j| > 05) i=0 j q-l + BZZEWEqum-jl > at) ._0 J. ”23213123,,“ (lb,j|>a6) i: 0 j =3(321,1+ 521,2 + 821,3), (say). Define f”,- = sup f1((y+j)/p,-)/(1—G((y+j)/p,~)). Since the denominator 116311139“) 1— C((y + j)/p,-) Z 1 — C(T), for all i,j and y, we have f1.ij S sup (1 — yesuppw G'(T))‘1f1((y + j)/p,~). Because fl is bounded and monotone in the extreme tails and 112 has a compact support (—v, v), we have, for suffiently large K, sup—Zf122-< (:r» ‘sup-l-Z sup f1((y+j)/p2) n;i>0pi j miZOpi j yE(—v,v) 30-60))" sup— 2: f1((((u+j)/p2-)+Ksur>f1] _n ,i>0pi I] /P |>K l' g (1 —G’(T))’1 sup/l" Kf1(u/p.-+a:)d.r+Ksupf1], _n;i20 15 where u = v or —v, depending on the monotony of f1. Hence we have supp: 12h” < 00 and supp: 12/99?de < 00. (1.4.19) J n;1>0 n ,1'>O Use this fact and an argument as in Hall and Patil (p.916-917) to obtain 321,1 = o(n"2’/(2’"+1)). As to the 321,2, from (1.4.5), 2 q 3212 _<_ EZZ[EUEJ()Z1 +EV- 3(2.,)] (1.4.20) i=0 J By applying an argument similar to (1.4.17) in Lemma 1.4.2, we have EUé-(zn = 0(2)? / ¢2(U)f12((u +j)/p.)du), so that where the second equality holds from q = 0(ln n), and Z]. pf1f12((y+j)/p,-) —> f ff. By applying an argument similar to (1.4.18) to the second term of (1.4.20), we have 321,2 = o(n‘2’/(2'+1)). Next, from (1.4.10), q—l 321,3<§2ER11JJ=O($)ZpiZP:I/Q?Jdlr i=0 j 1‘0 J 1.4.21 = 0 (52-) by (1.4.19), ( ) n = 0(n—2r/(2r+1)) by n—lpq _> 0 Thus, 321 = o(n‘2'/(2'+1)). 16 As to the 322, by (1.4.10) we have 822 <3QZIZE{(b1J—b1J)2II—b1jl>1‘36)} 1:0 J q-1 __2 , + 3ZZE{WUI(|b,-j — b2Jl > 66)} 1:0 j +3ZZE{R"U bij_b1Jl > (36)} 1: 0 J =3(S22,1 + 322.2 + 322.3)» (say). By applying the similar argument as that in 321,2 and 8213, it is obvious that 3222 : 522,3 : o(n‘2’/(2’+1)). To complete the proof of this lemma, it thus suffices to prove 322,1 : o(n‘2'/(2'+1)). In view of (1.4.10), we have 5221(:ZElwij_bij)21(lb1J-bijl> 0155)} 1:0 J +2259” 0 2211-12.,|>2,52)} +22%!) 22-— b2)21(|R2.-J|>22,132)} 1:0j =822,11 + 322.12 + 822,132 where 011, 02 and 03 are positive numbers such that 01 + (12 + a3 : 1. The term 322,11 is similar to 312 in Hall and Patil, following the argument there, noticing 5,5 and fly-J- play the roles of hij and fij there, we can show that 322,11 = 0 (n’2r/(2r+l)) . (1.422) 17 AS 130 the 82213, 191'. A = {IBJ'J' - bijl S 01/36}, then, 322,13 ziZE{(Bij—bgj)(21(|Rnijl > 03136) )14} 1-0 ,1 +ZZE{(51J—bij)21(.|Rnijl>0336)14c} 1:0 J <22}? 3262PI <.an2~2| > 2222) 1: 0 j +EZE{(b ii ()0) )(21lb1’3 _‘bijl > (11/36)} 1: 0 j 0, and define 33O=qzlzbijl (lb1j|<6)1 331::lzb2Ji I{lbiJl<( (1+€)6} i=0 j l: 0 J q-l 332— _ 022123 1{|b,,-| < (1— a5}, A = XZbEJIIIb.J — b,,| > 26). i=0 j i=0 J 18 Then S32 - A S 830 S 331+ A . (1423) By applying the arguments analogous to those of Hall and Patil (1995, p.918-921) to 331 and 332, we can obtain 22. = 232 = 2‘2'2220 - 2‘2")'1 f If” + 002-2"). (1.4.24) Now, EA 7126)+qZ:Zb§,.1>(|I_I,,-I>7225) '=o j 1:0 J +22%!) (IRMA > 2226) ._0 J. =A1 + A2 + A32 (WW2 where '71, '72 and 73 are positive numbers such that 71 + 72 + 73 : 1 . (22” eXP(--I1— 2) f52f;:,2222]) =(::23)_0I)231 1:0 J The first equality follows from Bernstein’s inequality (see Hardle, et al. 1998, p.244), while the second follows from 1162 —+ 00. 212-0 (2223, 53%;)” (3:223; 7/2322) 1-0 J 1‘ 0 J q-l = ., (2:22.) =22». 1:0 J' The second equality follows from the arguments in (1.4.17) and (1.4.18), while the third follows from 1162 -> 00. Similarly, we can show A3 = 0(331) too. Thus, EA : 0(331). 
Combining (1.4.23) and (1.4.24), the proof of lemma follows. 19 Lemma 1.4.5 Under the assumptions of Theorem 1.3.1, 34:225.}: p)—22 1:9] Proof. The proof follows from the step 3 of Theorem 2.1 of Hall and Patil (1995). We are now in the position to give the proof of the Theorem 1.3.1 and 1.3.2. Proof of the Theorem 1.3.]. Proof follows from the bound E‘/(f1 - f1)2 - {71—119 1110 +P—2r52(1 " 2-2r)_l [f1r)2} £81+32+S3+S4, and Lemmas 1.4.2, 1.4.3, 1.4.4 and 1.4.5. Proof of the Theorem 1.3. 2. We use the same notations as in Hall and Patil (1995). Noticing that, by the orthogonality properties of (b and 1/2, [(1—12)‘2’.,,=I(zz.z..), where Z denotes the set of all intergers and q-l Iq(lp, $0, W1, . . .) = 2a)., — bj)2 + Z 2(81j_b1j)21(|b1jl > 6) jeW 2:0 Jew.- +§Zb3,1(1b,,|<6)+22b3,. 1:0 j6W11=q jE'I’g 20 By (1.4.9) and (1.4.10), 1,011,210, 211,, . . .) =20}, — bj)2 + 2W: + ZREBJ + 22(51 — bj)Wj J'EW J'EW J'ESP J’EWI +2203,- 6,)12..,-+22W—7,-R,,-+:Z(6 .,-— 6,,)(21|6,,|>6) JIEW JEW 1: OJEW +:ZW3, 1( (|6,,-|>6) +QZZR,,,1 ()6,,|>6) 2:0 36!? 2:0 jEW q—l +2ZZ(6,,—6,)IT1(|6,,—|> 6) +2216 ,j— 6., )R,,-,-1(|6,,-| >6) 2:0 3912. 2= 0169?.- +2EZWURM,I( ()6,,-| >6) )+q:::bf,l( ()6,,-| <6) +22%. 2‘: OjeW 1': 0 jeW i=0 jew =11+12+I3+I4+15+15+I7+Ig+19+110 +111+112+113+1142 (say). By the Theorem 1.1 of Stute (1995), when F and G are only piecewise con- tinuous, a quartile transformation may be applied so as to trace everything back to uniformly distributed Z ’3. Thus the above Lemma 1.4.1 still holds in this case (see also Stute and Wang (1993, p.1605). From (1.4.17) and (1.4.18) in the Lemma 1.4.2, we obtain E12 = o(n‘1p). From (1.4.9) in the Lemma 1.4.1, it is easy to see E13 = 0(n‘2p) = 0(n‘1p). From the proof of Lemma 1.4.3, we have E18 = 0(n‘zr/(2'+1)). From (1.4.21), we also have E19 2 o(n“2’/(2'+1)). Applying the Cauchy-Schwarz inequality, we can show 14, I5, Is, 110, In and 112 are all of the order o(n‘2'/(2'+1)). When fl is only piecewise smooth, let lI denote the finite set of points where flm has discontinuities for some 0 g s S r. Suppose supp d) g (—v,v), supp 21 111 g (—v,v) and let K={k:k€ (px—v,p:r.+v) forsome :rEl'I}, K,- = {k : k 6(1),:1: - v,p,-sc+v) for some :1: E H}. Also let Kc, 1K3 denote their complements. Then, unless j 6 HQ, b,, and 13,-,- are con- structed entirely from an integral over or an average of data values from an interval where f1”) exists and is bounded. Also, unless j E K, b, and (3,- are constructed solely from such regions. Thus we may write 1,(1,210,211,...) =[1(1K)+12+I3+I4+15+15+I7(K0,K1,K2,...) +113(K01K11K21°°') + [14(K‘01K17K21'“) + 1,016) + 17(K3,1<:,1K.g, . . .) + 113(K3,KC,IK§, . . .) + 11412312212; . . . ), (1.4.25) where 11(22): 2032- — 2,)2, 120122) = 2032 — 12V, J'EK jeKC q— 1 12(K2,K2,K2,...) =22b ,— b.,))(21(|6,-I >6) i=0 16K. -1 17(Kg,1<§,1<§,...) :qzza') ,,— 6,,) )21( ((6,,|>6) 1': OJ'EKf the rest of the terms are defined similarly. However, for our compactly supported wavelet (b and 1/2, both K and K,- have no more than (211 + 1)(#H) elements for I each 1'. Considering q = 0(lnn), we can show 11(K), 17(K0,K1,IK2,...), and 22 113(K0, K1, K2, . . .) are of the lower order 0(n‘2’/(2'+1)). Thus it is negligible com— / pared to the main terms of MISE. Although b,, is only of the order p,—1 2 when fl is not r-times smooth, based on theorem’s additional assumption pg’fln‘zr —> 00, we readily see that 114(K0,1K1,K2, . . . ) = 0(n‘2r/(2'+1)). 
By tracing the whole proof of Theorem 1.3.1 carefully, we will see the rest of the terms of the right hand side of (1.4.25) have precisely the asymptotic properties claimed for f(f1 — f1 )2 in Theorem 1.3.2. 1.5 Proofs of the propositions Proof of the Proposition 1.4 .1 . In view of (1.4.12), applying the moment inequality to Sn”,- and taking expectation yields 1 n E (5,31,,2) S E(; Z 9922j(Zk)73(Zk)6kBin) 2. 2:1 (1.5.1) 1 = a: E(991?j(zk)73(zk)6kE(Bgn|(Zka6k)))' k=l In view of (1.4.7), the definition of BI," and note that $2 33—3- _ln(1+:r) SI, for 2:20, we have IB |<—1-/Zk- 113(212 < 1 (152) kn _ 2n _oo [1— H,,(z)]2 _ n(1— Hn(Z,,—))' ' ’ Thus, conditionally on {Z)c = z} and {6), = d}, noticing an(z—) = 23:, I(Z,- < z) is a binomial random variable with parameters 71 — 1 and p := H (z—), we have 1 2 E 2 = z = < < —' (Bknlzk 25* d) - E(n2(1-— Haw—D3) ‘ 723(1-19)2 23 Thus, from (1.5.1), notice (Z1,61), (Z2,62), - -- , (Zmdn) are i.i.d., we have E(Si...)< E(222,(22)23(zl)62 211-62112.» ) = 0(513) [2 2,2112 The last equality follows, because {Z1 3 T}, T < r”, imply that 1 — H (Z1—) 2 1—H(T)>0. Remark 1.5.1 If we consider the estimation of f, instead of trunction f1. Then we have 2 _ F(dx) E(522'2)"(n21)1;1(/11—21121141211 The above integral will be infinity for some i and j, such that 1,63, 2 K > O on a neighborhood of r”. Since q —> 00 and j 6 (—oo, 00), there always exist some i and j satisfing above condition. Thus we could not show 23:13:, E(S,2,1,-,)= 0(n'2r/(2’+1)) without the truncation. Proof of the Proposition 1.4.2. In view of (1.4.13), 1 " 1 " IS'222,2-j| S E 122—; l¢2j(Zk)|5k€A"an + 5 §|W1j(zk)|6keAkC£n2 Again applying the moment inequality to the average and taking expectation yields E(Sfi...)<_ 32422212216 22228...) +§2E(22?.(22)62e222cz.). nit: l k=1 Because the proof of the two terms are similar and the second term is more involved and require more details, we here only need to prove, for any k, 1 - - E (W?j(Zk)6ke2AkC;:n) = O ('11—'2 )/§0?j (IF. (1.0.3) 24 Writing LHS of (1.5.3) as E(w?.(z.)6.E(e2-“‘*Cz.l 0, we need to bound the J. The idea is to divide the sum . . . 16(k— —n )2 Into two parts according to the magnitude of n "H“ 12>”. To make it clear, let A = {k; [11: - 111)] _<_ nd,d 6 (1/2,1)}. We write J = 216A+ZkEM =: J1+ J2. It is easy to see J1 S Timn'M—Hlfl").2 = 0(1). As to the J2, when k > 7111) + 71“, we have 27 (see Feller, 1957 p.163) 2 1 1211.12.12) 3 11m; 12.10) «re-2222+“, 161 < '2'. h _ k-—(n+1)p+l . . w ere £1 — ———1(n+l)pq , (n +1)p — 1 < m S (n. +1)p and b(m, 11,19) 18 the central term, which is O((27rnpq)"%) (see Feller, 1957 p.140). Thus, when k — up > n", we have 61. 2 (p11)’1/2[nd"1/2+ +(1/2- P)” 1”l and 2 16(k- up) _1 2d— 1 2 71250-2) b(k;n,p)=0(nn16b(m;n,p)e 2" " )=O(1). k>np+nd By applying the same argument to the k < np — 71", we have .12 = 0(1). Thus we prove (1.5.7) and hence (1.5.6). 5 As to the (1.5.5), in view of (1.4.8), we write C1,, , conditionally on {Z1 = 21} and {61: d1}, as IZ18 -z. war -1 .1 Ram] -/9Pij( )70(w w)”; [l—H ()Zk )Hl2l1“ "(an Hum ). So k=1 n (Z1: ) = E(E(K2(zl)lzl)). (1.5.12) Conditionally on {21 = 21} and {61 = d1}, we rewrite K (21) as sum, we have = $2 fwzk, w) - u(w)][f1,t(dw) — Wm», k=2 30 where 1(Zk < 21A w)901j(w)70(w)(1- 5k) “2’“ w) = 11— H(Zu]? ’ u0=EMMLwfl k=za~.m. 
dam) Again continuing to write K (21) as sum, we have Kt.)=.,:,;{, 1 Z [b.(Zk Z051—U(Z1))61]H[h(zk,W)— H(U/ ')]H1(dw)} k- 2 (:1 =— nz{:l 2 [h (21;, Z1)61— H(Zl)6l] " %u(Zk)6k (#1: _/[h( Z,c,111)))-u('w)]1f11 (61111)} = — 7112214 Zk)61c +- 11:: {711 Z [h(Zk, 2051 - “(2051] nk: 2 (#1: —/[h(Zk,w) — u()w)]f11(dw } 1 n — ;2' éuflflfi-n (n _1) _ZZ[’I( (Z1: Z1) 51 — "U (2051] )kz 2l¢k 1 n n ”l + m ;;{[:(Zk, Z1)(51— u(Z1)61]—/[h(Zk,w)" “(U/”H (dw)} =11+ 12 + 13, (303/)- As to the 11, in view of (1.5.13), we have 1 E112 = o (E) E(u2(Zg)62) = o (”i ) [993, dF (1.5.14) As to the 12, in View of (1.5.13), we have |12| = 0 (5) iiwzzwznazw 0(-1—) 2:11:11 ()21 110(2116. k=2 (#1: Thus E1; = 0 (”i ) [go-U. dF. (1.5.15) 31 AS t0 the 13, 18t H(Zk,Z1) = [h(Zk, Z1)61 — H(Z1)61] — f[h(Zk,'lU) — u(w)]f11(dw), thus 13 = 0(n‘2) 22:2 2;“: H(Zk, Z1). Noticing EH(Z,., 2,) = E(E(H(Zk, 20121)) = 0, k 5e 1. (1.5.16) EH(Zk, Z1) 2 O, EH(Zk, 21) = 0, k #1. Hence E132 =0(-1:—)kEH2(Zk, Z1) =21¢k Z Z Z EH(Z,,, lelH(Zk, Z12) ““1 (#512 115512 ) ) Z Z Z EH(Z,,, Zk)H(Z,,, Zk) ) ) +0 13th +0 :1)... hill ““2 115512 2 Z Z EH(Zk le)H(lev Zk) k¢11 k¢12 115512 +0 Sél 1... +0 2: Z Z EH(Zk11 Z1)H(Z1, Zkg) k1¢l k2¢l k1¢k2 AAAA SAI ._. =I3(1) + 13(2) + 13(3) + [3(4) + 13(5): (say). The first term E(E(H2(ZkaZl)|Zk)) fiél ._. a. 3H [V]: 21!: 1L = o (5) E1h)" a- 21k ‘lL E(h2(Zk, Z1)61) :5.) 1—1 v v V v V a. 3 ll M= \ ‘6 are a. I: no 11' all M: || C II Q AAA/“\A E¢?j(Zt)7§(Zt)5t 3.4 ._. 3. LI) 1'? I: ll 0 .F. 352' .._. As to the 13(2), for k 95 l], k ¢ 12 and II ¢ 12, conditionally on {Z1c = zk} and {61; = 611;}, we have E(H(Zk, Z11)H(Zk, Z12)le, dk) = EH(Zk, Z11) EH(Zk, Z12) = O, which is from (1.5.16). Thus 13(2) = 0. By applying the same argument, we have 13(3) = [3(4) = [3(5) = 0. Thus 1 E13 = 0 (E) [53,1111 (1.5.17) Together with (1.5.14), (1.5.15) and (1.5.17), we deduce EK2(21) = 0(n“2) fgcfj (1F. 1. From (1.5.12), we finally obtain ERfig‘U-(l) = 0(n‘2) f (pf,- (11“. Proof of the Proposition 1.4 .5. The proof is basically the same as the previous proposition. 33 Chapter 2 Nonlinear Wavelet-based Hazard Rate Estimator 2. 1 Introduction In this chapter, we consider the same setting of survival analysis with random censorship as that in Chapter 1, with the extra assumption that random variables X and Y are nonnegative. Our goal is to estimate the hazard rate function A(x) with censored data, _, P(a:$X_':r)_ f(1:) A(x)—€l-1)rgi+ 6 _1—F(x—)’ :1: E (0, 00). There is an extensive literature avaliable on estimating A(.r) from censored data, see e.g., the survey paper Singpurwalla and Wong (1983) and the review paper Padgett and McNichols (1984). Tanner and Wong (1983) and Lo, at al. (1983) - studied a kernel estimation of density and hazard rate under random censorship 34 and provided Mean Square Error (MSE) and asymptotic normality of hazard rate estimators. The objective of this chapter, like that in the previous, is to provide a non- linear wavelet-based hazard rate estimator for randomly censored data, its asymp- totic formula for MISE and its asymptotic normality. We show this MISE formula, when the underlying survival density function and censoring distribution function are only piecewise smooth, has the analogous expansion for the kernel estimators. However, as to the kernel estimators, this MISE formula holds only under the smoothness assumption. In the next section, we give the elements of wavelet transform and provide nonlinear wavelet-based hazard rate estimators. 
The main results are described in Section 3, while their proofs appear in Section 4. 2.2 Notations and Estimators As that in Chapter 1, let T < TH be fixed and A1(:r) = A(1:)I(:r g T). Since, in general, hazard rate function A(:r) is not square integrable, we estimate A1(:r), i.e. hazard rate function /\(:r) for :r E (0, T]. Like in Section 1.2. the wavelet expansion of A1(:r) is M2) = Z tat-m +2 2 bum-xx). j=-oo i=01=—oo (2.2.1) bj =/)\1¢j, sz =/)\1?1’zj- 35 We propose a nonlinear wavelet estimator of A1(x) : oo q-l oo :5) = Z b,¢,(a:) +2 2 b,,-I(|5,-,| > awn-(1:), (2.2.2) j=-oo i=0 jz-oo where now the wavelet coefficients f) and hi]- are defined as follows: an (__:_r__) bj ___/$100“ —Fn (:r-) _ _ "“121: < T)¢j(Zk) Zl1‘—)Gll1-n(Zk—)l an-T( ) b..— -/w.~.( T)——— Fm _) __" M< ia 00. Then (2.3.1) continues to hold. While wavelet estimators allow us to obtain MISE and optimal convergence rates analogous to kernel estimators under weaker assumption, there is a fundamen-. tal instability in the asymptotic variance of wavelet estimator caused by the lack of translation invariance of the wavelet transform. For more details, see Antoniadis, - et al. (1994). Because wavelet estimators are only dyadic translation invariant, we 37 provide an asymptotic expansion of the variance and asymptotic normality result at dyadic point r = U2", 1: and l are integers. Theorem 2.3.3 In addition to the conditions on (b and 1,9 stated in Section 1.2, assume A1(r) is r-times continuously differentiable at :c 2 U2", Also assume that p = 2” = 0(n1/(2’+1)), q —> 00, pq62 -—> 0, 6 Z CVn‘l 1n n, where C > C1§{(8r + 2)(2r + 1)‘1 sup A1(1 - H)“1}1/2. Then \/np‘1():1(:z:) — /\1(:1:) + b(x)) =d> N((l,0'2(:r)), where Mr) = (Tn—um.) / u’ 2 ¢w Proof. The proof follows along the same lines as those in Lemma 1.4.1, use (p,- / (1 — (24.3) EU = b,,- + W5,“ + Rm, E(Riy) F) and 90,-]- / (1 — F) instead of go,- and apij. Because the denominators of b, and bi,- are bounded away from zero, all needed conditions are satisfied. 39 Let = iZth). W“) = hill/WM) k=l k=l 6kI(Zk S t) 1- G(Zk) ’ (2.4.4) Q(stt)= ”Mail: U(st 15—) V(stt) and U(Zk,t) H(SEQ/I(Zk+ Ito), (say). The first term 21,2,(1) = %E[A§(Z1)B2(Z1)P2(Zl)] = 0(1)E[A§(21)E1/2(B4(21) _0(ni)EA§(Z1)=O(£3)/cp§dF. Hence, from (1.4.19), we obtain Z1, 51)] (2.4.9) Z1,61) E1/2(P4(Zl) E21121“): o(n_1p). (2.4.10) The second term E[|A1(Zk)| Isa-(201 E(IB(Zk)l lB(Zz)| IP(Zk)| |P(Zz)l |z., 2., 626)]. 42 Conditionally on (Z,c = 2k}, {6,c = dk}, {Z = 2,} and {6, = d,}, by the Cauchy- Schwarz inequality, we have 4 4 4 4 ”4 1908(4)) (3(2))! (1442.)) |P(z:)l) 3 [EB (2)123 (zaEP (2.)EP (2)] . Through direct calculations as that in (2.4.7) and (2.4.8), we have 513(2 =oni( )2 Z EI4( (Zk) )Il4(Zz)| k==111,¢k = o G) (/ lgojldFY. (2.4.11) Hence 2213(2) = 061-) EU (0,.er = 0%) = 0(n-1p) (24.12) V Z].(f|<,9j|oiF)2 < 00 and p —> so. This, together with (2.4.10), we have E21112]- - o( n 1.p) Apply the previous same lines in 11, to 12,, we can show E2]. 1221- = o(n’1p) too. To complete the proof the lemma, it remains to show that E Z]. 13]- = o(n‘1p). Apply the moment inequality to 13], we have E13,. 3 E[A§(ZI)BZ(ZI)R3,(Z1)]. Conditionally on {Z 1 = 21}, {(51 = (11}, by the Cauchy-Schwarz inequality, we have E[B2( (21) )R3,( (21) )]< [EB4(~ )ER4( 21)]1/2 = 0(1/n2) P E: —1 ,2 F 2.4.13 43 Lemma 2.4.4 Under the assumptions of Theorem 2.3.1, we have - __ ,\ __ 2(1).- — 4))2 — n 12/171) = o(n 1p). Proof. The proof is similar to that in Lemma 1.4.2, but here we need one more step to approximate (3,. 
In view of (2.4.3) and Z(13,—b,)2=§j:(b,—b,)2+Z((3, b+)22Z(5, ”xi—(3,), i we have SISE :03.- — 4.)? —— n-lp / j /\1 —,"2 2 ‘1—'_'—H +EZHJ- +EZRn’j J J + 25215. — Mm...) + 2492113.- — bJIIWJ-I + £214.21?) 1' J" J' +EZ((3,- )2+2EZ|b,- —b,||(3,- —b| J =11+12+13+I4+15+15+I7+13, (say). Notice that 2- 2 /\1($) 2 -b) _/¢j(‘r)md$—b]9 we have E(b 2( _11A1((y+.7)/P _ ()2, Z ’52)] ‘2 . -H((y+j)/)p?1i: since f¢2 =1, Zt)-Why +j)/p)/(1— H((y +jl/P7) —> /,\,/(1- H = 0(f Ag), it follows that E2103]- — (21-)2 = n'lpfAl/(l — H) + o(n‘1p). Because the denominator appearing in hj is bounded away from below. Thus these 44 (3,- may be handled along the the same lines as those of Hall and Patil (1995, p.922) to show that Var{2j(13)—b2} - o(n n‘2 p2). So we obtain 11— - o(n p.) By Lemma 2.4.1, 12 = n—1 Z E1132(21) g 2n-1Z(EU,2(21) + E132(21)). j 1 By direct calculation, notice all denominators are bounded away from below, we have 2 _ z? ___ ,—1 2n 2 “+7 EUj(Zl)—El,(Zl) 0(1) /¢()Al( p )du) Thus, 12 = 0(n’1)f¢2(U) ij'12¥((u +j)/P)du = 0(n’1P) by 2,1942%“?! + 77/19) —>f/\§ {ooandp—+00. By Lemma 2.4.1 or (2.4.3), we have 13 = o(n‘lp). From Lemma 2.4.3, 17 = o(n'lp). Applying the Cauchy—Schwarz inequality to the rest terms, we complete the proof. Lemma 2.4.5 Under the assumptions of Theorem 2.3.1, we have stZE(((3.-s — .. 21(Ib..l>6)} (no-222+”). 1:0 j Proof. Let a and )8 denote positive numbers satisfying a + B = 1, we have .,<2zzg{a,_b 2IIb..)>6}+2ZZE{ ,.,._ 1),.) 21(lbs)|>6)} i=0 j i=0 j q—l s2ZZE{(bs.-— b..- ))+2ZZE((I> 2'1" .. 21(Ib..)>a6)} i=0 qj i=0 J +2ZZE{(b .-.— .. 21).. 542|>B<5)} i=0 j = 2(821 + $22 + 823), (say). Apply the argument analogous to (2.4.9), (2.4.11) and (2.4.13) appearing in the proof of Lemma 2.4.3 to 321, we conclude that 0c >2;/w< >::;;; (_)::/J.JJJ+o(g)ii . (/.JJJJ—)2 i=0 Jew?) = o(n—2r/(2r+l)). The third equality follows from 2]. p, 1 f 99?] dF < 00, 2“ (f ISOJ'J‘IdF)2 < 00 and q = 0(ln n), ‘while the last equality follows from 17.“po —> 0. Apply the same argument as that in the proof of Lemma 1.4.3 to 322, use 517' instead of 13,-]- in there, we conclude that 322 = 0(Tl-2r/(2r+l)). NOW let A = {IBij — bijl > 6}, then 323::IZE{(b iJ'" bij) 21Gb iJ" bij|>66)1 (A )} i=0 J +ZZE{(5 J-J— bJJ>(2IIbJJ-— bJJI>fi6)IA( >} i=0 J SZZEW JJ— JJ 21 bJJ— JJI>6}+ZZ<52P IbJJ—bJJI>z36) i=0 J i=0 .7 SZZE{(bij-1321(IIJbi)">6)}+ZZB—2E(bij-bij)2 i=0 J 1-0 J = 323(1) + 323(2): (SUI/L where 323(1) is analogous to 322 in section 1.4, which is o(n'2r/(2r+1)). While 323(2) is 0(321) , which is o(n‘2'/(2’+1)) too. Together with .921 and 322, we prove the 2 lemma. 46 Lemma 2.4.6 Under the assumptions of Theorem 2.3.], we have q-l JJ, 5 E 22bfj1(|l3,j| g 6) _JJ-wu —2’2')‘1/J\‘{)’ = o(JJ—zr). i=0 j Proof. The proof follows the same lines as that of Lemma 1.4.4. Lemma 2.4.7 Under the assumptions of Theorem 2.3.], we have 34 E i Z biz]- : o(p“2'). i=0 1' Proof. The proof follows from the step 3 of Theorem 2.1 of Hall and Patil (1995). We are now in the position to give the proof of the Theorem 2.3.1 and 2.3.2. Proof of the Theorem 2.3.1. Observe that Elf“ ‘ 2‘)? ‘ {212/ J f‘JJ +p'2'n2a — 22V / AW} _<_sl+32+53+34. Thus Lemma 2.4.4, 2.4.5, 2.4.6 and 2.4.7 together prove the Theorem 2.3.1. Proof of the Theorem 2.3.2. The basic idea of the proof is similar to that of Theorem 1.3.2. We omit the details. In the sequel we prove the Theorem 2.3.3. This will involve the following two lemmas. Lemma 2.4.8 Under the assumptions of Theorem 2.3.3, we have J/np-1(Z<6J — bJ)¢J(z)) =2» M0, 02m). where 02(JJ) = gala / [Z ¢(u + l)¢(l)]2du. 
l 47 Proof. In view of (2.4.1), ”P-1(ZUSJ " bJ')¢J‘(17)) .i l 3 :2 VnJc 3 :1 a- where K( t, 2:) =2 (b( t — J) (J: -— j). For the wavelets in Section 1.2, the kernel K (t,x) satisfies the moment condition (See Theorem 8.3 of Hardle, et a1. 1998, p. 95), i..e f( t(-J: k)K(t, 1) dt— -— 60),, for k— — O, 1,- . ,r— 1. Notice (Zhdk) are i.i.d. for k = 1,2,--- ,n,EV,,Jc = O, and EVfJc = g/£%K2(pt,px) dt — %(/ A1(t)K(pt,p:r) dt) 1 A1(3+u/p) —/1—H(:I:+u/p) [EM u+p1r - J)(p:r - J')]2du - 515(//\1(I + U/p)2¢(u +prc -J')a>(p:c -J')dU)2 j = i f 1-% [Z ¢(u + l)<;5(l)]2 du + 0(n‘1p‘l). The second equality follows by the change of variable, while the third equality by p = 2”, :L‘ = l/2", N —-> 00 and the Taylor expansion. Thus 22:1 E12,: = 02(33) + 0(p’1) —-> a 2:1:( ). In addition, K(t, 2:) being uniformly bounded, we have IV" kl < cfnTl—p —> O, cis a positive constant. So for all 6 > 0,1im,,_,,,JD Z:_1E(|Vn klz; an kl > e) = 0. Thus by Lindeberg-Feller CL’s Theorem, the lemma follows. Let =2? 2: but/ii” I(lbijl > 5) i=0 j 48 Lemma 2.4.9 Under the assumptions of Theorem 2.3.3, EJJ;2 = o(n—1p). Proof. In view of (2.4.3), write (30- as following biJ' = bij + (sz— ()1) + VT 1] + Rn 33‘. (2.4.14) Then q-l J6: :ZbiJViJ(I) I(lbijl > 6)+ ZZU’IJ - bij)¢ij($)1(lbijl > 5) i=0 3' i=0 J q—l +ZZWJ,JJJ,-(x1) (l5,,| > 5)+Z:Rfl,,o,,1(|iJJ,-| > 5) (2-4-10) i=0 j ' = 11+ 12 + 13 + 14, (say). Because of the compact support of w(:r), for each i, there are only finite number of j such that wJJ-(x) are nonzero. So E122— 0(Q):ZE(b iJ_ biJ) 2¢i2j($ 1') i—O j 221/ (Well 2:0 J = 0(4) [-131 + 2] n2 n — o(n‘l p). In the above, the second equality follows from the argument similar to (2.4.9), (2.4.11) and (2.4.13) in Lemma 2.4.3, while the last equality is from pqn‘l —-> O and q = 0(ln n). Similarly, we can show E12 2 E142 2 o(n-1p). As to the first term of J6, 2:1:(b21b2J)w2J($ I(Ib21l> 6) )+ EZbijVij(x) I(lbijl > 5) - [11+ 112 i=0J 1:0] 49 Let a and B are positive numbers such that a + B = 1, so ”11' <21: '13:)" bJ‘jlle-J-(rr )|I(|bJ,~| > (16) i=0 j +ZZIbz-J— bJJIIu'J-J< >11<1bJJ—b.J-|>JJJ). i=0 j Because A1(Jv ) is r- times continuously differentiable at 11:, so |b,-]-| < (jg—(”1(2) or b3 5 c2p:(2'+1), c is a constant (see Hall and Patil, 1995, p.917). Notice 62 = 0(ln n/n), p,- = p22, p = 0(n1/(2’+1)), thus I(IbJJ-l > 6) = 0 for large n, hence the first term in the bound of 11 1 actually is zero for all sufficient large n. In view of (2.4.14), the leading term to approximate hij is 5,5. Apply a similar argument as in (2.4.15) to 111. all the rest of the terms are of smaller order, we have EIJ2J=0(q) )ZZEw J-J- bJ-J) 2wJ-2J(x)I(IbJJ-- bJJI>z361 i=0J' (Q)§ZEl/a(b ij_ b0) )2a 1,1210; )Pl/b(|b,j — szl > 56) 1:0j i _ __ 4 +1 =0(q)ZZ£p,-n d:0(q)p3n d 1, where d> 2:+1 —2r/(2r+1)) : 0(TIJ—1p). = o(n The second equality follows by Holder’s inequality, while the third equality by Rosenthal’s and Bernstein’s inequality and let a —> 00, b —> 1 (see the details in Hall and Patil, 1995, p.917—918). The fifth equality follows by n‘lpq -—> 0. Apply —(2r+1) the same argument to 112, using (if, g c2 p,- ,VVe can show that E1122 2 o(n p) too, which proves the lemma. 50 Proof of Theorem 2.3.3 In view of (2.2.2), by analogous equality of (2.4.14) to h,- and the definition of J6 in Lemma 2.4.9, we have XJ(z)—AJ(x)—b( 1:20» —bJ-) )csz-(x )+[ZbJJsJ-()- (x)— b N(O, 02(x)). By Lemma 2.4.9, we have @7116 —> 0. The terms J2, J3 and J4 are analogous to 12, 13 and 14 in Lemma 2.4.9, so ap- plying the same argument, we can show that E122 2 EJ2= EJ2— — o(n p.) 
Thus W12 l—J 0, same as J3 and J4. Hence, in order to prove the theorem, it suffices to show that J5 = o(p”). Apply the same argument as in Lemma 2.4.8, using the moment condition of K(t,:c), it is easy to see J5: /[A1(t)- (:r)'(lplR thp$)dt- ME) = [1w + u/p) — AJ(x>JK(px + JJJpx) du — bu) /\(k)($ =/Zlk!( K(,ppJ:+up:r)du+o( )—b(:r) A2415) (fif— 2p; o(u + l)¢(l) dup" — b(x) + o(p") 1 : o(p—r), the last equality follows from the moment condition of K (t, 2:), which proves the theorem. Chapter 3 Minimum Distance Estimators in RegreSsion Models under Long Memory 3. 1 Introduction The practice of obtaining estimators of parameters by minimizing a certain distance between some functions of observations and parameters has long been present in statistics. These estimators have many desirable properties, including consistency, asymptotic normality under weak assumptions and robustness against outlier in the errors. Koul and DeWet (1983) and Koul (1985a, b; 1986) pointed out the importance of this methodology in linear regression models, using certain weighted empirical processes that arise naturally in these models. For more details and references on this methodology, see the monograph by Koul (1992b). Koul and Mukherjee (1993) extended the above results to linear regression models with long range dependent errors that are either Gaussian or subordinate to Gasussian. More specifically, they considered the multiple linear regression model )fni :12;;i/3+€i1 5i :G(T]i)1 Z: 1721... in? where {Jami 2 1} are known fixed constants, C is a measurable function from IR to R, {mi 2 1} is a stationary, mean zero, unit variance Gaussion process with correlation p(k) :2 E77117”,C ~ k‘”L(k), k _>_2 1, 0 < 6 < 1, where L is a function of positive integers, slowly varying at infinity, and L(k) is positive for large 1:. Thus 22:, p(k) = 00, implying the errors have long memory. For motivation and arguments in support of this Gaussian and / or Gaussian subordinated long memory error process, see Taqqu (1975), Dehling and Taqqu (1989) and a review paper by Beran (1992). The other class of long memory process is of the moving average type. For more on their importance in economics and other sciences, see Robinson (1994), Beran (1994), and Baillie (1996). These processes include an important class of fractional ARIMA processes. For various theoretical results pertaining to the em- pirical processes of long memory moving averages, see Ho and Hsing (1996, 1997), Giraitis et a1. (1996), Koul and Surgailis (1997, 2001b), Giraitis and Surgailis ( 1999), among others. Because of the importance of multiple linear models with long memory moving average errors, and the desirable properties of the above mentioned minimum dis- 53 tance estimators, it is natural to investigate their properties under the long memory moving average errors. The objective of this paper is to obtain the asymptotic dis- tribution of the m.d. estimators of regression parameter in multiple linear model with long memory moving average symmetric errors when the design variables are either known constants or i.i.d. random variables, independent of the errors. These results thus extend those of Koul (1985a,b) and Koul and Mukherjee (1993) to these models. The rest of this chapter is organized as follows. Section 2 provides the m.d. estimators and their asymptotic normality under both fixed and i.i.d. random design cases, while their proofs appear in Section 3 and Section 4, respectively. 
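Before stating the main results, a small numerical illustration of the moving-average type of long memory just discussed may be helpful; it is an editorial sketch, not part of the thesis. It simulates a truncated version of the error model formalized in (3.2.2) of the next section, with coefficients $b_k \propto k^{-(1+\theta)/2}$, and checks that the covariances decay like $k^{-\theta}$ and are therefore not summable. The value of $\theta$, the truncation point, the sample size, and the Gaussian innovations are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
theta = 0.4        # memory parameter, 0 < theta < 1 (illustrative value)
K = 10000          # truncation point of the infinite moving average (an approximation)
n = 5000           # sample size (illustrative)

b = np.arange(1, K + 1) ** (-(1.0 + theta) / 2.0)   # b_k ~ k^{-(1+theta)/2}, taking L_1 = 1
zeta = rng.standard_normal(n + K)                   # i.i.d. symmetric, unit-variance innovations
eps = np.convolve(zeta, b, mode="valid")[:n]        # eps_t = sum_{k=1}^K b_k zeta_{t-k}

# The sample variance should be roughly the truncated theoretical value sum_k b_k^2.
print("sample var:", round(float(eps.var()), 3), " truncated theory:", round(float(np.sum(b ** 2)), 3))

# Covariance of the truncated process at lag k is gamma(k) = sum_j b_j b_{j+k};
# for k << K this behaves like a constant times k^{-theta}, hence it is not summable.
lags = np.array([50, 100, 200, 400])
gam = np.array([np.dot(b[:-k], b[k:]) for k in lags])
slope, _ = np.polyfit(np.log(lags), np.log(gam), 1)
print("log-log decay slope of gamma(k):", round(float(slope), 2), " (should be close to", -theta, ")")

Since $\theta \in (0,1)$, the fitted slope is shallower than $-1$, so $\sum_k \gamma(k)$ diverges; this non-summability of the covariances is the long-memory property exploited throughout this chapter.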
3.2 Main results

3.2.1 The case of non-random designs

Consider the linear regression model where one observes the response variables $\{Y_{ni}\}$, $1 \le i \le n$, satisfying
\[ Y_{ni} = x_{ni}'\beta + \varepsilon_i, \qquad 1 \le i \le n, \quad \beta \in \mathbb{R}^p. \tag{3.2.1} \]
Let $X$ denote the $n \times p$ design matrix of known constants whose $i$-th row is $x_{ni}'$, $1 \le i \le n$. Here $\mathbb{R}^p$ denotes the $p$-dimensional Euclidean space, $\mathbb{R} = \mathbb{R}^1$. In the sequel, for the sake of convenience, the dependence of various entities on $n$ will not be exhibited. We assume the errors $\{\varepsilon_i, 1 \le i \le n\}$ form a stationary moving average sequence,
\[ \varepsilon_i = \sum_{k=1}^{\infty} b_k \zeta_{i-k}, \qquad b_k \sim L_1(k)\, k^{-(1+\theta)/2}, \quad 0 < \theta < 1, \quad 1 \le i \le n, \tag{3.2.2} \]
with the common distribution function $F$, where $\zeta_s$, $s \in \mathbb{Z}$, are i.i.d. standardized random variables, symmetric around zero, and $L_1$ is a slowly varying function at infinity. This implies that $\rho(k) = \mathrm{Cov}(\varepsilon_1, \varepsilon_{1+k}) = L(k)\, k^{-\theta}$, where $L(k) = C_\theta L_1^2(k)$ and $C_\theta = 2(2-\theta)^{-1}(1-\theta)^{-1} \int_0^\infty (u + u^2)^{-(1+\theta)/2}\, du$, and hence the errors have long memory.

We assume that $\zeta_0$ in (3.2.2) satisfies the following conditions:

A.1 $|E e^{iu\zeta_0}| \le C (1 + |u|)^{-\delta}$, for some $C < \infty$, $\delta > 0$, and all $u \in \mathbb{R}$.

A.2 $E|\zeta_0|^3 < \infty$.

Giraitis et al. (1996, Lemma 1) proved that under Condition A.1 the error distribution function $F$ is infinitely differentiable. Assumption A.2 controls the rate of decrease of the error density in the tails.

Now, let $\tau_n := L^{1/2}(n)\, n^{(1-\theta)/2}$ and define, following Koul and Mukherjee (1993),
\[ M(\Delta) := \tau_n^{-2} \int \Big\| (X'X)^{-1/2} \sum_{i=1}^{n} x_i \big[ I(Y_i - x_i'\Delta \le y) - I(-Y_i + x_i'\Delta < y) \big] \Big\|^2 dH(y), \]
\[ Q(\Delta) := \tau_n^{-2} \int \Big\| (X'X)^{-1/2} \Big\{ \sum_{i=1}^{n} x_i \big[ I(\varepsilon_i \le y) - I(\varepsilon_i > -y) \big] \Big\} + (X'X)^{1/2} (\Delta - \beta) \big[ f(y) + f(-y) \big] \Big\|^2 dH(y), \qquad \Delta \in \mathbb{R}^p, \]
where $I(A)$ is the indicator function of the set $A$, $\|\cdot\|$ denotes the Euclidean norm and $H$ is a nondecreasing right continuous function from $\mathbb{R}$ to $\mathbb{R}$. The m.d. estimator of the regression parameter $\beta$ is defined by
\[ \hat\beta := \operatorname{argmin} \{ M(\Delta),\ \Delta \in \mathbb{R}^p \}. \]
Note that $\hat\beta$ is the estimator $\beta^+$ defined in Koul (1985b) for the independent errors case and the corresponding estimator defined in Koul and Mukherjee (1993) for the Gaussian-subordinated error processes. The motivation for considering these m.d. estimators and their finite sample properties are discussed in Koul (1985b, 1992b). In particular, for $p = 1$, $x_i \equiv 1$ and $H(x) = x$ [$H(x) = I(x \ge 0)$], $\hat\beta$ is the Hodges–Lehmann [median] estimator of the one-sample location parameter.

Before we state the asymptotic normality of $\hat\beta$, we need the following additional assumptions on the model (3.2.1) and (3.2.2):

A.3 $(X'X)^{-1}$ exists for all $n \ge p$.

A.4 $n \max_{1 \le i \le n} x_i'(X'X)^{-1} x_i = O(1)$.

A.5 $\int (1 + y^2)^{-1}\, dH(y) < \infty$.

Conditions A.3 and A.4 are the same as those in Koul and Mukherjee (1993), while A.5 replaces the conditions $\int f^r\, dH < \infty$, $r = 1, 2$, and $\int_0^\infty (1-F)\, dH < \infty$ of that paper.

Let $A = \tau_n^{-1}(X'X)^{1/2}$, $B = \tau_n (X'X)^{1/2}$, $c_i = A^{-1} x_i$, $d_i = B^{-1} x_i$. We now state the main result:

Theorem 3.2.1 In addition to (3.2.1) and (3.2.2), assume that A.1–A.5 hold.
(3.2.4) 1, i=1 Moreover, G;‘/2TJ:2(X'X)2/2(3 — a) => NJ(0, 1...), (3.2.5) where Ipxp is p x p identity matrix, and an = T;2(X"X)_l/2X’RRX(‘XH‘¥)—1/21 Rn : (p(z - j))an1 21.2:1921 ° ' ° in- 3.2.2 The case of random designs In this subsection, we consider the following multiple linear regression model Y,- = 112,23 + 5,, 1 g 2' g n, 5 6 IR”, (3.2.6) under the same assumptions as those in the previous section, except that here {XJ,i 2 1} are i.i.d. random variables, independent of the errors and EX1 75 0. 2 57 Similarly, define Mlm ,;2 =an -1/2{Zx[1(—x;3 s y) _1(_3-;+x;3 < y)]}|l2dH(y) ::T_2/Iln -1/2{ZX_[I( 5i —y)]} + n-1/2(X'X)(A — a) my) + f(—y)] ||2dH(y) The m.d. estimator of the parameter )3 in (3.2.6) is defined by Bl :2 argmin {AII(A),A 6 RP}. Before we present the asymptotic normality of m.d. estimators, we need the fol- lowing assumptions on the model (3.2.6). A.6 E||X1||5 < 00. Let an— - Tn- Til/2, bn — —Tnn1/2, C,- = angi, D,- = bngi, then we have the following analogous result of Theorem 3.2.1 under the i.i.d. random design case. Theorem 3.2.2 In addition to (3.2.6) and (3.2.2), assume that A.1, 24.2, A.5 and A6 hold, then (13(8) — 5) z (4)2313)“ / 212 [1(5. _<_ y) — 1(5.> -y)]f(y)dH(y) + 0,,(1). (3.2.7) Corollary 3.2.2 Under the assumptions of Theorem 3.2.2, T—1n1/2(B _ 3)T—1n—1/2ZX 514.0100 (3.2.8) 58 Moreover, let EXI = p ¢ 0, then rglnl/2(Bl — fl) = —p rn'ln“1/2:5, + 0,,(1), (3.2.9) i=1 and Tilvl'l/QZEI- => N(0.,1). (3.2.10) i=1 3.3 Proofs of the theorems The method of proof is similar to that of Koul (1985a or 1992b; Ch5) which requires that M (A) istuniformly locally asymptotically approximated by quadratic form Q(A) and shows ”.403 — )3)“ = 0(1). This approximation in turn is used to obtain the asymptotic normality of m.d. estimators 5. F or more details, see Koul (1992b; Ch5) and Koul (1985a). In order to provide the details, we need some notations and several lemmas. Let C stand for a generic constant which may change from line to line. As in Ho and Hsing (1996, 1997) and Koul and Surgailis (1997, 2001a, b), put I oo Eu 3: E bkCi—ka 5:: I: E bkCi—k, k=l k=t+1 (3.3.1) H(z) := 13(5) 5 :17), mm) := F,’(x). The following two lemmas are analogous to Lemmas 5.1 and 5.2 of Koul and Sur- gailis (2001b), thus their proofs can be deduced from there. Lemma 3.3.1 Under the assumptions A] and 24.2, there exist lo 2 1 and a con- 59 stant C such that for any I 2 lo, :1: 6 1R, |f"”(x)l + lflp’(r)| S C(1+ I113)", p = 0,1,2, (3.3.2) If)(x) — f,_1(x)| g be(1+)x|3)“. (3.3.3) Lemma 3.3.2 Let g7(x) 2: (1+ MP)"1 and h(x), x E R be a real valued function such that, for some C < oo, |h(x)| S Cg.,(x), 'y = 2,3. (3.3.4) Then there exists a constant C7, depending only on C in (3.3.4), such that for any x, y E R, WI + y)| S 0797($)(1V lyll), (3-3-5) where a V b :2 max{a, b}. Remark 3.3.1 From (3.3.2) in the Lemma 3.3.1, f(x) and f, (x) satisfy conditions of h(x) in Lemma 3.3.2, thus, |f(x + y)) S C(1+ x2)‘1(1 + yz), lf'(x + y)! S C(1+x2)‘1(1+y2). Lemma 3.3.3 (Surgailis). Under the assumptions A] and A.2, there exists a constant C < 00 such that |00v(I(Eo s 2). Ms. S x))| 5 00+ gem-0, foralliEZ, xElR. Proof. The proof is in Appendix. 60 Lemma 3.3.4 Under assumptions of A.I and A2, there exists a constant C such that lCov(I(x<£o SIC-+010), I($10, q=1,2. (3.3.3) 61 Proof of { 3. 3. 7). 
According to the definition, we have 2 E[U.-,)(x,x + 0.)] S 2[EF,2_1($ -I012|- 52,14,513 + lail " 51,1—1) +£mflx—m.—axm+m4—afl Sinx-m¢x+mm lat" = 4 f(x + v) dv -l0:'| S C(1+ x2)‘1(|a,-| V |a,~|3), the last inequality follows from lemma 3.3.2 with h(x) replaced by f (x) and 7 = 2. Proof of (3.3.8). For q =1, (1) I+Og (1,, (z. a: + a.) = / [fz(u — Inc.-. — é...) — fz(u — 5.3)] du. (3.3.9) Follows the argument of Lemma 3.3.3, apply Lemma 3.3.1 and 3.3.2 with 7 = 2, we can obtain the following analogous inequality (U.(,i)($,$ + Gill S C(lszi-IIV lszt—1l2)(1+ 531)“ + $2)_1(lai| V lail3)- From (3.3.9) and (3.3.2), we have |Ui(.})(x,x + a,)| g C(lb,(,-_,| A1), thus we obtain E[Ui(,i)($,$ + 011)]2 S CEIb)C.-_,|2(1+ 55:1)(1 + Ira—100:1 V lail3)v which is (3.3.8) for q = 1. 62 For q = 2, apply Lemma 3.3.1 and 3.3.2, we have |U1-(3)(x,x + a,)| S 1+0.- / “(W-51,14)—f(-1(u-E,-J_1)]du! x+|o,-| _1 S C bf(1+ Iu — 5,,,_1|2) du I-lail I'l'lotl g be/ (1+u2)'1(1+§fi,_1)du ~|01| S Cblz(1+ €i,_1)(1+ x2)'1(|a,| V |a,-|3). Again, as IU’,(§)(x,x + ai)| S 2, we obtain (3.3.8) for q = 2. Hence, we proved the lemma. We are now ready to state and prove the asymptotic uniform quadraticity of AHA). Lemma 3.3.5 Under the assumptions of Theorem 3.2.1, for all b 6 (0, 00), E sup |M()3 + A'ls) — Q(8 + 24—13)]: 0(1), (3.3.10) s€N(b) where N(b) = {s 6 RP : ||s|| S b}. Proof. The proof basically is similar to that of Theorem 2.1 of Koul (1985a). As there, use the symmetry assumption of f (y), it is enough to show that Vb 6 (0,00), sup f|)Zd.[F+czs1(y>]||2dH 0, (3.3.2) and assumption A.5. 63 As to the (3.3.12), here we only give the proof for fixed 3 E N (b). The uniform convergence can be obtained by the compactness of N (b), similar to that of Theorem 2.1 (Koul,1985a). Let dijz=the j-th entry of the vector d,. Thus the integrand of the j-th summand of the left hand side (LHS) of (3.3.12) does not exceed 2: Idijdrjl |Cov(I(y < e,- S y + cfis), I(y < 5, g y + 0:3)“. 1 1" Apply Lemma 3.3.4, notice “cis” S ||c,~||||s|| = 0(n‘9/2)||s|| —> 0, so, for any 0 < h < 0, the above bound does not exceed 02:133..) (1 + yz)“(1+li- rl)‘”n-"/2nsn ! s Cllslln“2‘”)n2'”n"‘/2(1 + 33-1, where the last inequality follows from max,||d,~|| = 0(n’(2‘9)/2). Thus the j-th entry of the LHS of (3.3.12) does not exceed C||s||71_h/2/(1 + y2)“1dH(y) —> O, n —+ 00, which proves the (3.3.12). As to (3.3.13), we need to prove j” Z dicisflylllzdmy) = 0(1), (3.3.14) / E|(Zd.[I(e. s y) — Fm] [l2dH 0, there exists a O < z, < 00 and N15 such that P(|M(,s)| g .2.) 21— e, for all n 2 N15. (b). for any 5 > 0, 0 < z < 00, there exists N25 and a positive b > 0 such that P( inf M()3+.4“s) 2 3) 21—5 for all n. 2 N25. IISIiZb Proof. The proof of part (a) is from finite moment Ell/1(3) < 00, which is from (3.3.15). The part (b) is very similar to that of Lemma 3.1 of Koul (1985a) which we omit here. Finally, we are in the position to provide the proof of main theorem. Proof of Theorem 3. 2.1. The proof follows that of Theorem 5.41 of Koul (1992b) and Theorem 3.1 of Koul (1985a). We only give the sketch here. From Lemma 3.3.6, we have Mu?) — 2(3)) = (Higgins + A-1s) — “ifgbow + A-‘s)| S sup M(fi+A'ls) — Q()3+A’1s)l. IISIISb 65 From above inequality and Lemma 3.3.5, we get 111(3) = Q(A) + 0,,(1). (3.3.18) The last equality (3.3.18) together with M(B) 2 62(3) + 0,,(1), yield 62(5) = 62(5) + 0,,(1), which is precisely ||A(,3 —- A)“ = 0,,(1). Thus A A(,s — s) = A(A — s) + 0,,(1). 
(3.3.19) Now, from the defintion of Q(A) and A, we readily get the (3.2.3) of Theorem 3.2.1 from (3.3.19). In order to prove the Corollary 3.2.1, we need the following lemma. Lemma 3.3.8 Let Sn(x) = 2?:1d,[1(€i S x) — F(x) +f(x)€,-] , under the assump- tions of Theorem 3.2.], then Proof. The proof of the lemma can be deduced from Theorem 3.1 of Koul and Surgailis (2001c), where they proved more general case, i.e. the uniform reduction priciple for weighted residuals empirical processes. Proof of Corollary 3. 2.1. From the Theorem 3.2.1 and notation of Sn(x), we obtain Av? — (a) = (2 f 12cm)" f [5.3) + 3,,(_,,) — 2: d.s.~f(y)]f(y)dH(1/) + 0,.(1) i=1 = - 23,-5.- + 0,,(1), i=1 the last equality , which is (3.2.4), follows from lemma 3.3.8, while (3.2.5) follows from Theorem 2 of Giraitis et a1. (1996). 66 The following lemma is the asymptotic uniform quadraticity of M ((A) under i.i.d. random case. Lemma 3.3.9 Assume the conditions of Theorem 3.2.2 hold. Then, for all b E (0, 00), sup |Ml()3 + a;13)— Q10? + agls)| '2 0,,(1). sEN(b) Proof. The proof is similar to that of Lemma 3.3.5 except here {Xm' 2 1} are i.i.d. r.v’s, instead of fixed known constants. To prove the theorem, it is enough to show that Vb e (0, oo), 2 :22) [HE D. (Po. 3 + as) — 0:313») || dH(y) = o.(1), (3.3.20) E SUP [HER-[Hy <8.- 5 y+CIS) -F(v,y+C£8)] '2 dH(y) = 0(1), (3.3.21) s€N(b) E sup / H: D. (as. s y) — F(y) + 031(3)] ”2 dH(y) = 0(1). s€N(b) (3.3.22) Proof of ( 3 3. 20). Write F( (x, y) =fyf )du, use the differentiability of f, we can obtain LHS of (3320 )< b4/II71}:‘IZX1XIIIX||/::f ,)2|dz|| dH(y (3.3.23) Now, from Remark 3.3.1, we have --1 fan —a;1 my + blle-IIZ) ldz s Ca: (1 + b21321?) (1 + 32)“. 67 Thus, from (3.3.23), we have n“ 2 X,Xfl 1 LHS of (3.3.20) S Ca;2b4| hf/b+firfimn 4x2 (1+ b2||X,- which is 0,,(1) from A.6, A.5 and of —-> 0, hence (3.3.20) is proved. Proof of { 3. 3. 21 ) Similar to the proof of (3.3.12), we here only give the proof for the fixed 3 E N (b). Let D,j:=the j-th entry of the vector D,, which are analogous to dij in the fixed design case and K.(Z) == 1(1/ < 2 S y + 0,3) - F(x/3+ 0:3). Thus the integrand of the j-th summand of (3.3.21) does not exceed 22E [lDuDul lE [K1(€i)1\'r(sr) X.,X.] I]. (3.3.24) Apply Lemma 3.3.4, as |C§s| S of,‘ |X,—||||s||, we have (E [Ki(€i)K,-(Er) Xi, Apr] sca+fl40wawma4wawflma4WQWW? (3.3.25) Combining (3.3.24) and (3.3.25), use A.6, we obtain, for any 0 < h < 0, the j-th entry of LHS of (3.21) g C(llsll v ”smirk/2714+": Z(1+)1— r|)“’/(1 + y2)-1 dH(y) scmwwmmwm/h+rrmmwam 33m, which proves the (3.3.21). 68 Proof of (3. 3. 22) It suffices to prove fEHZDC’sfly) )|| dH(y =0(1), (3.3.26) / EHZD. (as. s y) — Fm] “2414(3) = 0(1). (3.3.27) The first equality (3.3.26) follows from the following inequality and assumptions A.6, (3.3.2) and A.5. 2 / Home) < oo As to the (3.3.27), like that of (3.3.15), we have the integrand of the j-th summand LHS of (3.3.26) 3 ||3||2E ”72-1 Z X,X; of the LHS of (3.3.27) does not exceed ZZEUDU'DU'I IE{[I(51 Syl-F(y)][1 ] Thus, from Lemma 3.3.3 and similar argument as (3.3.24), we obtain the j-th entry X,, x.) of LHS of (3.3.27) 3 Cb;2ZZ(1+ Ii— mfg/(1+ (fl—l dH(y). Thus, LHS of (3.3.27) S Cn’iQ‘Oan‘ofll + y2)'1dH(y) < 00. Hence, lemma is proved. Proof of Theorem 3.2.2. The proof is completely analogous to that of Theorem 3.2.1. Proof of Corollary 3.2.2. Proof of the (3.2.8) is completely analogous to (3.2.4) of Corollary 3.2.1. From (3.2.8), we obtain T—l n1/2(B— B): —r'1n'1/2Z(z\-—/L) "-W'n— 1n U225 +op(1). 
Proof of Corollary 3.2.2. The proof of (3.2.8) is completely analogous to that of (3.2.4) of Corollary 3.2.1. From (3.2.8), we obtain
\[
n^{\theta/2}(\hat\beta - \beta) = -\Gamma^{-1}\, n^{\theta/2 - 1}\sum_{i=1}^n (X_i - \mu)\varepsilon_i \;-\; \Gamma^{-1}\mu\, n^{\theta/2 - 1}\sum_{i=1}^n \varepsilon_i + o_p(1). \tag{3.3.28}
\]
But the variance of the first term on the right hand side of (3.3.28) goes to zero, so the first term is $o_p(1)$, which proves (3.2.9). The last claim (3.2.10) follows from Lemma 5.1 of Surgailis (1982).

3.4 Appendix

Before we give the proof of Lemma 3.3.3, we need the following lemma.

Lemma 3.4.1 Let $g(x) = (1+|x|^3)^{-1}$ and let $h(x)$, $x \in \mathbb{R}$, be a real valued function such that
\[
|h(x)| \le C g(x) \tag{3.4.1}
\]
holds for all $x \in \mathbb{R}$. Then, for any $x \le 0$ and any $v, w \in \mathbb{R}$,
\[
\Big|\int_{-\infty}^x \big[h(u+v+w) - h(u+w)\big]\,du\Big| \le C\,(|v| \vee |v|^3)\,(1 \vee |w|^3)\,(1+x^2)^{-1}. \tag{3.4.2}
\]

Proof. First consider $|v| \le 1$. Then, by (3.4.1) and (3.3.5) with $\gamma = 3$, the LHS of (3.4.2) does not exceed
\[
C|v| \int_{x-1}^{x+1} (1+|u+w|^3)^{-1}\,du \le C|v|\,(1 \vee |w|^3) \int_{x-1}^{x+1} (1+|u|^3)^{-1}\,du \le C|v|\,(1 \vee |w|^3)\,(1+x^2)^{-1}.
\]
Next, consider $|v| > 1$. Then the LHS of (3.4.2) does not exceed
\[
C\int_{-\infty}^x (1+|u+v+w|^3)^{-1}\,du + C\int_{-\infty}^x (1+|u+w|^3)^{-1}\,du. \tag{3.4.3}
\]
By (3.3.5), the first term of (3.4.3) does not exceed
\[
C\,(1 \vee |v+w|^3) \int_{-\infty}^x (1+|u|^3)^{-1}\,du \le C\,|v|^3\,(1 \vee |w|^3)\,(1+x^2)^{-1}.
\]
The second term of (3.4.3) is bounded similarly. This proves the lemma.

Proof of Lemma 3.3.3. Let $\mathcal{F}_i$ be the $\sigma$-field generated by $\zeta_k$, $k \le i$. Write the telescoping identity
\[
I(\varepsilon_i \le x) - F(x) = \sum_{l=1}^{\infty} U_{i,l}(x), \tag{3.4.4}
\]
where
\[
U_{i,l}(x) = F_{l-1}(x - \xi_{i,l-1}) - F_l(x - \xi_{i,l}) = U^{(1)}_{i,l}(x) + U^{(2)}_{i,l}(x), \tag{3.4.5}
\]
with
\[
U^{(1)}_{i,l}(x) = F_l(x - \xi_{i,l-1}) - F_l(x - \xi_{i,l}), \qquad
U^{(2)}_{i,l}(x) = F_{l-1}(x - \xi_{i,l-1}) - F_l(x - \xi_{i,l-1}).
\]
Lemma 3.3.3 follows from the following (3.4.6) and (3.4.7):
\[
E[U_{i,l}(x)]^2 \le C(1+x^2)^{-1}, \qquad l = 1, 2, \dots, l_0, \tag{3.4.6}
\]
\[
E[U^{(q)}_{i,l}(x)]^2 \le C(1+x^2)^{-1}\, l^{-1-\theta}, \qquad l > l_0, \quad q = 1, 2, \tag{3.4.7}
\]
where $l_0$ is chosen sufficiently large so that the bounds of Lemma 3.3.1 hold. Indeed, by the orthogonality of the martingale differences in (3.4.4) and the Cauchy-Schwarz inequality, (3.4.6) and (3.4.7) yield
\[
\big|\mathrm{Cov}\big(I(\varepsilon_0 \le x),\, I(\varepsilon_i \le x)\big)\big|
= \Big|\sum_{l=1}^{\infty} E\big[U_{0,l}(x)\,U_{i,l+i}(x)\big]\Big|
\le \sum_{l=1}^{\infty} \big(E[U_{0,l}(x)]^2\big)^{1/2}\big(E[U_{i,l+i}(x)]^2\big)^{1/2}
\le C(1+x^2)^{-1}(1+i)^{-\theta}.
\]
Now, it suffices to show (3.4.6) and (3.4.7) for $x \le 0$ only. As to (3.4.6),
\[
E[U_{i,l}(x)]^2 \le 2\big[E F_{l-1}^2(x - \xi_{i,l-1}) + E F_l^2(x - \xi_{i,l})\big] \le 2\big[E F_{l-1}(x - \xi_{i,l-1}) + E F_l(x - \xi_{i,l})\big] = 4F(x).
\]
Noticing $F(x) = \int_{-\infty}^x f(u)\,du$ and using Lemma 3.3.1 (3.3.2), we have $F(x) \le C(1+x^2)^{-1}$ for $x \le 0$, which proves (3.4.6).

Consider (3.4.7) for $q = 1$. In view of (3.4.5), since $\xi_{i,l-1} = b_l\zeta_{i-l} + \xi_{i,l}$, we have
\[
U^{(1)}_{i,l}(x) = \int_{-\infty}^x \big[f_l(u - b_l\zeta_{i-l} - \xi_{i,l}) - f_l(u - \xi_{i,l})\big]\,du.
\]
Here $f_l$ satisfies condition (3.4.1) of Lemma 3.4.1 in place of $h$, by Lemma 3.3.1. Thus, from (3.4.2), we obtain
\[
|U^{(1)}_{i,l}(x)| \le C\,(|b_l\zeta_{i-l}| \vee |b_l\zeta_{i-l}|^3)\,(1 \vee |\xi_{i,l}|^3)\,(1+x^2)^{-1}
\le C\,(|b_l\zeta_{i-l}| \vee |b_l\zeta_{i-l}|^3)\,(1 + |\xi_{i,l}|^3)\,(1+x^2)^{-1}. \tag{3.4.8}
\]
Combining (3.4.8) with the estimate $|U^{(1)}_{i,l}(x)| \le C(|b_l\zeta_{i-l}| \wedge 1)$, which is an easy consequence of (3.3.2), we obtain
\[
E[U^{(1)}_{i,l}(x)]^2 \le C\,\big(E|b_l\zeta_{i-l}|^2 + E|b_l\zeta_{i-l}|^3\big)\,\big(1 + E|\xi_{i,l}|^3\big)\,(1+x^2)^{-1}
\le C b_l^2\,(1+x^2)^{-1} \le C(1+x^2)^{-1}\,l^{-1-\theta}, \tag{3.4.9}
\]
where the second inequality follows from $E|\xi_{i,l}|^3 < \infty$, which in turn follows from the Rosenthal inequality
\[
E\Big|\sum_{l=1}^{\infty} b_l\zeta_l\Big|^3 \le C\sum_{l=1}^{\infty} E|b_l\zeta_l|^3 + C\Big(\sum_{l=1}^{\infty} E|b_l\zeta_l|^2\Big)^{3/2}.
\]
This proves (3.4.7) for $q = 1$.

As to (3.4.7) for $q = 2$: from Lemma 3.3.1 (3.3.3) and Lemma 3.3.2 (3.3.5) with $\gamma = 3$, we obtain
\[
|U^{(2)}_{i,l}(x)| = \Big|\int_{-\infty}^x \big[f_{l-1}(u - \xi_{i,l-1}) - f_l(u - \xi_{i,l-1})\big]\,du\Big|
\le C b_l^2 \int_{-\infty}^x (1+|u - \xi_{i,l-1}|^3)^{-1}\,du
\le C b_l^2\,(1 \vee |\xi_{i,l-1}|^3) \int_{-\infty}^x (1+|u|^3)^{-1}\,du
\le C b_l^2\,(1 \vee |\xi_{i,l-1}|^3)\,(1+x^2)^{-1}.
\]
Hence, since $|U^{(2)}_{i,l}(x)| \le 2$, arguing as for $q = 1$ we obtain
\[
E[U^{(2)}_{i,l}(x)]^2 \le C b_l^2\,(1+x^2)^{-1} \le C(1+x^2)^{-1}\,l^{-1-\theta}.
\]
This, together with (3.4.9), proves (3.4.7). Hence the lemma is proved.
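For orientation only, it may help to record the standard asymptotics behind the bound $b_l^2 \le C\,l^{-1-\theta}$ used above and behind the count $\sum_{i}\sum_{r}(1+|i-r|)^{-\theta} = O(n^{2-\theta})$ used repeatedly in Section 3.3. The following is a hedged aside under the commonly used parametrization $b_l \sim c\,l^{-(1+\theta)/2}$ with $0 < \theta < 1$, $E\zeta_0 = 0$ and $E\zeta_0^2 = \sigma^2$; the precise conditions imposed on the $b_l$ in this thesis are those of the assumptions in Section 3.2 and are not repeated here. One then has
\[
\mathrm{Cov}(\varepsilon_0, \varepsilon_k) = \sigma^2\sum_{l\ge 1} b_l\, b_{l+k} \sim \sigma^2 c^2\, B\Big(\tfrac{1-\theta}{2}, \theta\Big)\, k^{-\theta}, \qquad k \to \infty,
\]
so the covariances are not summable, and consequently
\[
\mathrm{Var}\Big(\sum_{i=1}^n \varepsilon_i\Big) \sim \frac{2\,\sigma^2 c^2\, B\big(\tfrac{1-\theta}{2}, \theta\big)}{(1-\theta)(2-\theta)}\; n^{2-\theta}, \qquad n \to \infty,
\]
which is the $n^{2-\theta}$ growth underlying the nonstandard normalization in (3.3.28) and Corollary 3.2.2.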
Bibliography

[1] Antoniadis, A., Gregoire, G. and McKeague, I.W. (1994) Wavelet methods for curve estimation. J. Amer. Statist. Assoc., 89, 1340-1353.

[2] Antoniadis, A., Grégoire, G. and Nason, G. (1999) Density and hazard rate estimation for right-censored data by using wavelet methods. J. R. Statist. Soc. B, 61, 63-84.

[3] Baillie, R.T. (1996) Long memory processes and fractional integration in econometrics. J. Econometrics, 73, 5-59.

[4] Beran, J. (1992) Statistical methods for data with long-range dependence. Statist. Science, 7, 404-427.

[5] Beran, J. (1994) Statistics for Long-Memory Processes. Monographs on Statistics and Applied Probability, 61. Chapman and Hall, New York.

[6] Chui, C.K. (1992) Wavelets: A Tutorial in Theory and Applications. Academic Press, Boston.

[7] Daubechies, I. (1992) Ten Lectures on Wavelets. SIAM, Philadelphia.

[8] Dehling, H. and Taqqu, M.S. (1989) The empirical process of some long-range dependent sequences with an application to U-statistics. Ann. Statist., 17, 1767-1783.

[9] Donoho, D.L. and Johnstone, I.M. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425-455.

[10] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1995) Wavelet shrinkage: asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B, 57, 301-369.

[11] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996) Density estimation by wavelet thresholding. Ann. Statist., 24, 508-539.

[12] Feller, W. (1957) An Introduction to Probability Theory and Its Applications, Volume I, Second edition. John Wiley & Sons, Inc.

[13] Giraitis, L., Koul, H.L. and Surgailis, D. (1996) Asymptotic normality of regression estimators with long memory errors. Statist. Probab. Lett., 29, 317-335.

[14] Giraitis, L. and Surgailis, D. (1999) Central limit theorem for the empirical process of a linear sequence with long memory. J. Statist. Plann. Inference, 80, 81-93.

[15] Hall, P. and Patil, P. (1993) On the choice of smoothing parameter, threshold and truncation in nonparametric regression by nonlinear wavelet methods. Research Report SMS-72-93, Center for Mathematics and Statistics, Australian National University, Canberra.

[16] Hall, P. and Patil, P. (1995) Formulae for mean integrated squared error of non-linear wavelet-based density estimators. Ann. Statist., 23, 905-928.

[17] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998) Wavelets, Approximation and Statistical Applications. Lecture Notes in Statistics, 129. Springer, New York.

[18] Ho, H.C. and Hsing, T. (1996) On the asymptotic expansion of the empirical process of long-memory moving averages. Ann. Statist., 24, 992-1024.

[19] Ho, H.C. and Hsing, T. (1997) Limit theorems for functionals of moving averages. Ann. Probab., 25, 1636-1669.

[20] Kerkyacharian, G. and Picard, D. (1992) Density estimation in Besov spaces. Statist. Probab. Lett., 13, 15-24.

[21] Kerkyacharian, G. and Picard, D. (1993) Density estimation by kernel and wavelet methods: optimality in Besov spaces. Statist. Probab. Lett., 18, 327-336.

[22] Koul, H.L. (1985a) Minimum distance estimation in multiple linear regression. Sankhyā Ser. A, 47, 57-74.

[23] Koul, H.L. (1985b) Minimum distance estimation in multiple linear regression with unknown error distributions. Statist. Probab. Lett., 3, 1-8.

[24] Koul, H.L. (1986) Minimum distance estimation and goodness-of-fit tests in first-order autoregression. Ann. Statist., 14, 1194-1213.

[25] Koul, H.L. (1992a) M-estimators in linear models with long range dependent errors. Statist. Probab. Lett., 14, 153-164.

[26] Koul, H.L. (1992b) Weighted Empiricals and Linear Models. IMS Lecture Notes - Monograph Series, 21.

[27] Koul, H.L. and DeWet, T. (1983) Minimum distance estimation in linear regression models. Ann. Statist., 11, 921-932.

[28] Koul, H.L. and Mukherjee, K. (1993) Asymptotics of R-, MD- and LAD-estimators in linear regression models with long range dependent errors. Probab. Theory Related Fields, 95, 535-553.
[29] Koul, H.L. and Surgailis, D. (1997) Asymptotic expansion of M-estimators with long memory errors. Ann. Statist., 25, 818-850.

[30] Koul, H.L. and Surgailis, D. (2001a) Asymptotics of the empirical process of long memory moving averages with infinite variance. Stochastic Process. Appl., 91, 309-336.

[31] Koul, H.L. and Surgailis, D. (2001b) Asymptotic expansion of the empirical process of long memory moving averages. Preprint.

[32] Koul, H.L. and Surgailis, D. (2001c) Robust estimators in regression models with long memory errors. Preprint.

[33] Lo, S.H., Mack, Y.P. and Wang, J.L. (1989) Density and hazard rate estimation for censored data via strong representation of the Kaplan-Meier estimator. Probab. Theory Related Fields, 80, 461-473.

[34] Mallat, S. (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intell., 11, 674-693.

[35] Marron, J.S. and Padgett, W.J. (1987) Asymptotically optimal bandwidth selection for kernel density estimators from randomly right-censored samples. Ann. Statist., 15, 1520-1535.

[36] Meyer, Y. (1990) Ondelettes et Opérateurs. Hermann, Paris.

[37] Padgett, W.J. and McNichols, D.T. (1984) Nonparametric density estimation from censored data. Comm. Statist. Theory Methods, 13, 1581-1611.

[38] Patil, P. (1997) Nonparametric hazard rate estimation by orthogonal wavelet methods. J. Statist. Plann. Inference, 60, 153-168.

[39] Robinson, P.M. (1994) Semiparametric analysis of long-memory time series. Ann. Statist., 22, 515-539.

[40] Singpurwalla, N.D. and Wong, M.Y. (1983) Estimation of the failure rate: a survey of nonparametric methods. Part I: Non-Bayesian methods. Comm. Statist. Theory Methods, 12, 559-588.

[41] Stute, W. (1995) The central limit theorem under random censorship. Ann. Statist., 23, 422-439.

[42] Stute, W. and Wang, J.-L. (1993) The strong law under random censorship. Ann. Statist., 21, 1591-1607.

[43] Surgailis, D. (1982) Zones of attraction of self-similar multiple integrals. Lithuanian Math. J., 22, 327-340.

[44] Tanner, M.A. and Wong, W.H. (1983) The estimation of the hazard function from randomly censored data by the kernel method. Ann. Statist., 11, 989-993.

[45] Taqqu, M.S. (1975) Weak convergence to fractional Brownian motion and to the Rosenblatt process. Z. Wahrsch. Verw. Gebiete, 31, 287-302.

[46] Wu, S. and Wells, M. (1999) Estimating hazard rate with truncated and censored data by wavelet methods. Preprint.

[47] Zhang, B. (1996) Some asymptotic results for kernel density estimation under random censorship. Bernoulli, 2, 183-198.