This is to certify that the dissertation entitled

Error Density and Distribution Function Estimation in Nonparametric Regression Models

presented by Fuxia Cheng has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor: Hira L. Koul
Date: May 22, 2002

Error Density and Distribution Function Estimation in Nonparametric Regression Models

By

Fuxia Cheng

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2002

ABSTRACT

Error Density and Distribution Function Estimation in Nonparametric Regression Models

By

Fuxia Cheng

This thesis studies asymptotics of some error density and distribution function estimators in nonparametric regression models. First, the histogram type density estimator based on nonparametric regression residuals obtained from the full sample is shown to be uniformly weakly and strongly consistent with some rates. Uniform weak consistency with a rate is also obtained for the empirical distribution function of these residuals. We also show the weak and strong uniform consistency of the kernel type error density estimator.
Furthermore, if one uses a part of the sample to estimate the regression function and the other part to estimate the error density, then the asymptotic distribution of the maximum of a suitably normalized deviation of the density estimator from the true error density function is the same as in the case of the one-sample setup. Similarly, a suitably standardized nonparametric residual empirical process based on the second part of the sample is shown to converge weakly to a time transformed Brownian bridge. These asymptotic distribution results can be used to test the goodness-of-fit hypothesis pertaining to the error density and distribution functions, thereby enhancing the domain of their applications.

ACKNOWLEDGMENTS

This research was partly supported by NSF grant DMS 0071619 with P.I. Professor Hira L. Koul. I wish to express my deep gratitude to my dissertation advisor Professor Koul for his guidance and suggestions on the subject of this thesis. I am also indebted to him for all the time and effort he spent on reading and correcting this thesis. His love of statistics and devotion to research have served and will continue to serve as one of the main sources of inspiration for my research.

I would like to thank all the other committee members, Professors Dennis Gilliland, Vincent Melfi and Habib Salehi, for serving on my guidance committee. I am grateful to Professors James Hannan, V.S. Mandrekar and Roy V. Erickson, who gave me a lot of help besides teaching me statistics and probability. I thank Professors Yimin Xiao, Sarat Dass and Lijian Yang for their helpful discussions whenever I stopped by their offices. I also benefited greatly from many discussions with Professor Ildar I. Ibragimov during his summer visits to this department.

TABLE OF CONTENTS

Introduction
  0.1 Overview
    Literature review
    Summary description
  0.2 The model
Chapter 1  Consistency of the Density and Distribution Function Estimators
  1.1 Definitions of estimators of $G$ and $g$
    Regression function estimate, $m_n(x)$, of $m(x)$
    Error's distribution function estimate, $\hat G_n(t)$, of $G(t)$
    Error's density function estimate, $\hat g_n(t)$, of $g(t)$
  1.2 Basic assumptions
  1.3 Consistency of $\hat G_n$
  1.4 Consistency of $\hat g_n$
    1.4.1 Uniform weak consistency of $\hat g_n$
    1.4.2 Uniform strong consistency of $\hat g_n$
    1.4.3 $L_1$-norm consistency of $\hat g_n$
  1.5 Consistency of the kernel density estimator $\hat g_{n\cdot}$ of $g$
    1.5.1 Pointwise weak consistency of $\hat g_{n\cdot}$
    1.5.2 Uniform weak and strong consistency of $\hat g_{n\cdot}$
Chapter 2  Asymptotic Distributions of some Density and Distribution Function Estimators
  2.1 Definitions of the modified estimators of $G$ and $g$
    Regression function estimate, $m_n^*(x)$, of $m(x)$
    Error's distribution function estimate, $G_n^*(t)$, of $G(t)$
    Error's density function estimate, $g_n^*(t)$, of $g(t)$
  2.2 Basic assumptions
  2.3 Asymptotic distributions of $g_n^*$
    2.3.1 Asymptotic normality of $g_n^*(t)$
    2.3.2 Asymptotic distribution of the global measure of the deviation of $g_n^*$ from the density $g$
  2.4 Asymptotic distributions of $G_n^*$
Chapter 3  Applications on Testing the Goodness-of-Fit of the Error Distribution
    A test statistic for the hypothesis $H: g = g_0$
    A test statistic for the hypothesis $H: G = G_0$
Bibliography

Introduction

0.1 Overview

Regression analysis is a well-known method for studying the relationship between variables. A tremendous amount of attention has been focused on the problem of regression function estimation, but estimation of the unknown error distribution has received far less attention, though it is important in its own right. In order to check whether the model is appropriate, we need to propose a good estimator of the error distribution and to consider a goodness-of-fit test for it. Since error distribution estimation has a direct impact on diagnostics of the model, we consider the error distribution estimation problem in the nonparametric regression model. Few published works, however, are available on error distribution estimation in nonparametric regression.

Over the last five decades, the statistical literature on nonparametric regression models has been replete with papers on the estimation of the regression function. Relatively little is known about the estimation of the error density and distribution functions in these models. It is often of interest and of practical importance to know the nature of the error distribution after estimating a regression function. The focus of this dissertation is to investigate the consistency and the asymptotic distributions of the error density estimators and the empirical distribution function based on nonparametric residuals.
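Although the treatment below is entirely theoretical, the objects involved are easy to simulate. The following sketch is illustrative only: the regression function $m(x) = \sin(2\pi x)$, the error scale $0.3$, the Gaussian smoothing kernel and all tuning constants are assumptions made here, not choices from the text. It computes Nadaraya-Watson residuals and evaluates the residual empirical d.f. and the histogram type density estimator at a point:

```python
import numpy as np

def nadaraya_watson(x, X, Y, h):
    """Nadaraya-Watson estimate m_n(x) = sum_i Y_i L_h(x - X_i) / sum_i L_h(x - X_i)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)  # Gaussian kernel L; the 1/h factor cancels
    return np.sum(w * Y) / np.sum(w)

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.3, n)              # errors, independent of X
Y = np.sin(2.0 * np.pi * X) + eps          # illustrative model m(x) = sin(2*pi*x)

h_n = 0.1 * n ** (-1.0 / 5.0)              # deliberately small illustrative bandwidth
resid = Y - np.array([nadaraya_watson(x, X, Y, h_n) for x in X])

a_n = n ** (-1.0 / 4.0)                    # window width a_n -> 0
G_hat = np.mean(resid <= 0.0)                                   # empirical d.f. at t = 0
g_hat = np.mean((resid > -a_n) & (resid <= a_n)) / (2.0 * a_n)  # histogram density at t = 0
print(G_hat, g_hat)
```

With the error density $g$ normal with scale $0.3$, $G(0) = 0.5$ and $g(0) \approx 1.33$; the printed values should land near these up to smoothing and sampling error, which is the consistency phenomenon studied in Chapter 1.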
In parametric regression and autoregressive models several authors have studied the weak convergence of the empirical processes based on residuals; cf. Koul (1970, 1977, 1992, 1996), Loynes (1980), Portnoy (1986), Mammen (1996), Boldin (1982), Koul (1991), and Koul and Ossiander (1994). The uniform consistency of kernel type error density estimators in these models is discussed in Koul (1992).

In this dissertation, we consider the asymptotic properties of the error density and distribution function (d.f.) estimators based on the nonparametric residuals. Sufficient conditions are given under which the histogram error density estimator based on nonparametric residuals is uniformly weakly and strongly consistent, and $L_1$-consistent. The uniform consistency with a rate of the estimator of the distribution function is established, while the strong uniform consistency with a rate of the histogram error density estimator is also obtained. More generally, we show that the kernel error density estimator is not only weakly consistent but also strongly consistent.

In order to study the asymptotic distributions of the estimators, we split the whole sample into two parts: we use the first part of the sample to estimate the unknown regression function, and define the density and distribution function estimators from the second part of the sample and the regression function estimator. The asymptotic distribution of a suitably standardized density estimator at a fixed point is shown to be normal, while that of the maximum of a suitably normalized deviation of the modified density estimator from the true density function is the same as in the case of the one-sample setup established by Bickel and Rosenblatt (1973). An application of these asymptotic distribution results is to test the goodness-of-fit of a specified error density function. Similarly, a suitably standardized nonparametric residual empirical process is shown to converge weakly to a time transformed Brownian bridge.
This result extends some of the results of Lemma 2.3.2, with $d_{ni} \equiv n^{-1/2}$, of Koul (1970) and Loynes (1980) from parametric regression to the nonparametric regression model. We also obtain the asymptotic distributions of the d.f. estimator.

0.2 The model

Let $X$ and $Y$ be 1-dimensional random variables (r.v.'s) with $X$ taking values in $[0,1]$ and $E|Y| < \infty$, and let $m(x) = E(Y \mid X = x)$ denote the regression function. We consider the nonparametric regression model
$$Y = m(X) + \varepsilon,$$
where $\varepsilon$ is independent of $X$, and the regression function $m$ and the distribution of the error $\varepsilon$ are unknown. Let $\{(X_i, Y_i);\ 1 \le i \le n\}$ denote independent replications of $(X, Y)$. Then
$$\varepsilon_i = Y_i - m(X_i), \qquad i = 1, 2, \dots, n,$$
are independent identically distributed (i.i.d.) copies of $\varepsilon$. Let $g$ and $G$ denote the unknown density and distribution functions of $\varepsilon$. If we could observe $\varepsilon_i$, $1 \le i \le n$, it would be easy to estimate $g$ and $G$; unfortunately, we cannot observe them. In this dissertation we consider the problem of nonparametric estimation of the unknown density $g$ and d.f. $G$ of $\varepsilon$.

Chapter 1

Consistency of the Density and Distribution Function Estimators

1.1 Definitions of estimators of $G$ and $g$

To describe the estimation of $G$ and $g$, we need first to describe an estimator of the unknown regression function $m$. Here we shall use the well-known Nadaraya-Watson (1964) kernel regression estimator
$$m_n(x) = \frac{\sum_{i=1}^n Y_i L_h(x - X_i)}{\sum_{i=1}^n L_h(x - X_i)}, \qquad x \in [0,1],$$
where $h \equiv h_n$ is a bandwidth sequence of positive numbers tending to zero, $L$ is the kernel density function defined on $\mathbb{R}$, and $L_h(x) = L(x/h)/h$. Let
$$\hat\varepsilon_i := Y_i - m_n(X_i), \qquad i = 1, 2, \dots, n,$$
denote the nonparametric residuals, and let $a_n$ be another sequence of positive numbers tending to zero. The empirical d.f. and the histogram type density estimators based on these residuals that are of interest here are, respectively,
$$\hat G_n(t) := \frac{1}{n}\sum_{i=1}^n I(\hat\varepsilon_i \le t), \qquad \hat g_n(t) := \frac{1}{2na_n}\sum_{i=1}^n I(t - a_n < \hat\varepsilon_i \le t + a_n), \qquad t \in \mathbb{R}.$$
We shall also use the standardized empirical process of the true errors,
$$U_n(t) := \frac{1}{\sqrt n}\sum_{i=1}^n \big[I(\varepsilon_i \le t) - G(t)\big], \qquad t \in \mathbb{R}. \qquad (1.1.4)$$

1.2 Basic assumptions

Let $d_n(x) := m_n(x) - m(x)$ denote the deviation of the regression estimator, and set
$$\beta_n := \Big(\frac{\log n}{nh_n}\Big)^{1/2}. \qquad (1.2.1)$$
We assume that
$$h_n \to 0 \ \text{ and } \ \beta_n \to 0, \quad \text{as } n \to \infty, \qquad (1.2.2)$$
$$\beta_n^{-1}\sup_{x\in[0,1]} |d_n(x)| = O_p(1), \quad \text{as } n \to \infty. \qquad (1.2.3)$$
Note: Let $f(x,y)$ denote the joint density of $(X, Y)$, let $f_0(x)$ denote the marginal density of $X$, and let $l(x) = \int y f(x,y)\,dy$. From Mack and Silverman (1982), one can show that (1.2.3) holds under the following assumptions.

On $L$:
(L1) $L$ is uniformly continuous with modulus of continuity $\omega_L$;
(L2) $L$ has bounded variation;
(L3) $L$ is absolutely integrable w.r.t. Lebesgue measure on the line;
(L4) $L(x) = L(-x)$ and $L(x) \to 0$ as $|x| \to \infty$;
(L5) $\int \sqrt{|x \log |x||}\,|dL(x)| < \infty$.

Moreover: $E|Y|^s < \infty$ and $\sup_{x\in[0,1]}\int |y|^s f(x,y)\,dy < \infty$ for some $s \ge 2$, and $n^{2\eta-1}h_n \to \infty$ for some $\eta < 1 - s^{-1}$; $f(x,y)$, $f_0(x)$ and $l(x)$ are continuous functions, $f_0(x)$ is bounded away from zero on $[0,1]$, and $f_0(x)$ and $l(x)$ have bounded second derivatives on $[0,1]$; and $h_n^2 = O\big((\log h_n^{-1})/(nh_n)\big)$.

Under the additional condition that $Y$ is bounded, Härdle, Janssen and Serfling (1988) have shown that there exist constants $A$ and $A'$ such that
$$\sup_{x\in[0,1]} |d_n(x)| \le A\beta_n + A'h_n, \quad \text{for all large } n, \ \text{a.s.} \qquad (1.2.4)$$

At this point we only need to note the following implications. First, by (1.2.3), for any $\epsilon > 0$ there exist numbers $K_\epsilon < \infty$ and $N_\epsilon < \infty$ such that
$$P(A_{n,\epsilon}) > 1 - \epsilon, \quad \forall\, n > N_\epsilon, \qquad (1.2.5)$$
where
$$A_{n,\epsilon} = \Big\{\sup_{x\in[0,1]} |d_n(x)| \le K_\epsilon\beta_n\Big\}.$$
This fact will be used in the proofs of the uniform weak consistency of $\hat G_n$, the $L_1$-norm weak consistency of $\hat g_n$ for $g$, and the pointwise and uniform weak convergence of $\hat g_n$.

Secondly, by (1.2.4) and choosing $h_n$ small enough, there exists a constant $C$ ($0 < C < \infty$) such that
$$\sup_{x\in[0,1]} |d_n(x)| \le C\beta_n, \quad \text{for all large } n, \ \text{a.s.}$$
Define
$$B_n = \Big\{\sup_{x\in[0,1]} |d_n(x)| \le C\beta_n\Big\}. \qquad (1.2.6)$$
Then we have
$$P\Big(\bigcap_{m\ge n} B_m\Big) \to 1, \quad \text{as } n \to \infty. \qquad (1.2.7)$$
This property will be used in the proofs of the pointwise and uniform strong consistency, and of the $L_1$-norm strong consistency, of $\hat g_n$ for $g$.

1.3 Consistency of $\hat G_n$

In this section, we shall state and prove the uniform consistency theorem for $\hat G_n$.

Theorem 1.3.1 Suppose (1.2.2) and (1.2.3) hold, and that for some $0 < a < 1/2$ and any finite constant $K > 0$,
$$(nh_n)^a \sup_{t\in\mathbb{R}} \big[G(t + K\beta_n) - G(t - K\beta_n)\big] \to 0. \qquad (1.3.1)$$
Then
$$(nh_n)^a \sup_{t\in\mathbb{R}} |\hat G_n(t) - G(t)| \to 0, \quad \text{in probability}. \qquad (1.3.2)$$
Proof: Let $G_n(t)$ denote the empirical distribution function of the errors $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n$. Recall the extended Glivenko-Cantelli lemma from Fabian and Hannan (1985, pp. 80-83):
$$\sup_{t\in\mathbb{R}} n^{b}|G_n(t) - G(t)| \to 0 \ \text{a.s., for any } 0 < b < 1/2. \qquad (1.3.3)$$
Thus, if we show
$$(nh_n)^a\sup_{t\in\mathbb{R}} |\hat G_n(t) - G_n(t)| \to 0, \quad \text{in probability}, \qquad (1.3.4)$$
then, by (1.3.3) (with $b = a$, since $(nh_n)^a \le n^a$ for large $n$) and $h_n \to 0$ in (1.2.2), we obtain the claim (1.3.2).

We now proceed to prove (1.3.4). Rewrite $\hat\varepsilon_i = \varepsilon_i - d_n(X_i)$, so that
$$\hat G_n(t) = \frac{1}{n}\sum_{i=1}^n I\big(\varepsilon_i \le t + d_n(X_i)\big), \qquad G_n(t) = \frac{1}{n}\sum_{i=1}^n I(\varepsilon_i \le t).$$
Fix any $\epsilon > 0$. By (1.2.5), it suffices to show (1.3.4) on $A_{n,\epsilon}$. By $nh_n \to \infty$ (which follows from $\beta_n \to 0$ in (1.2.2)), $a > 0$ and (1.3.1), it follows that
$$\sup_{t\in\mathbb{R}}\big[G(t + K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n)\big] \to 0,$$
and thus
$$\sup_{t\in\mathbb{R}}\big[G(t + K_\epsilon\beta_n) - G(t)\big] \to 0. \qquad (1.3.5)$$
Next, rewrite
$$(nh_n)^a\big(\hat G_n(t) - G_n(t)\big) = Z_{1n}(t) + Z_{2n}(t),$$
where
$$Z_{1n}(t) := \frac{(nh_n)^a}{n}\sum_{i=1}^n\big[I(\varepsilon_i \le t + d_n(X_i)) - G(t + d_n(X_i)) - I(\varepsilon_i \le t) + G(t)\big],$$
$$Z_{2n}(t) := \frac{(nh_n)^a}{n}\sum_{i=1}^n\big[G(t + d_n(X_i)) - G(t)\big].$$
Using the monotonicity of $G$, we see that on $A_{n,\epsilon}$,
$$\sup_{t\in\mathbb{R}} |Z_{2n}(t)| \le (nh_n)^a\sup_{t\in\mathbb{R}}\big[G(t + K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n)\big] \to 0, \qquad (1.3.6)$$
by (1.3.1). It remains to show that on $A_{n,\epsilon}$,
$$\sup_{t\in\mathbb{R}} |Z_{1n}(t)| \to 0, \quad \text{in probability}. \qquad (1.3.7)$$
On $A_{n,\epsilon}$, for any $t \in \mathbb{R}$, by the monotonicity of the functions $I$ and $G$ we have
$$Z_{1n}(t) \le \frac{(nh_n)^a}{n}\sum_{i=1}^n\big[I(\varepsilon_i \le t + K_\epsilon\beta_n) - G(t + K_\epsilon\beta_n) - I(\varepsilon_i \le t) + G(t)\big] + (nh_n)^a\big[G(t + K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n)\big],$$
and similarly,
$$Z_{1n}(t) \ge \frac{(nh_n)^a}{n}\sum_{i=1}^n\big[I(\varepsilon_i \le t - K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n) - I(\varepsilon_i \le t) + G(t)\big] - (nh_n)^a\big[G(t + K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n)\big].$$
Thus,
$$\sup_{t\in\mathbb{R}} |Z_{1n}(t)| \le \frac{(nh_n)^a}{\sqrt n}\sup_{t\in\mathbb{R}}\big|U_n(t + K_\epsilon\beta_n) - U_n(t)\big| + (nh_n)^a\sup_{t\in\mathbb{R}}\big[G(t + K_\epsilon\beta_n) - G(t - K_\epsilon\beta_n)\big],$$
where $U_n$ is defined in (1.1.4). For independent uniform $[0,1]$ random variables $\xi_1, \xi_2, \dots, \xi_n$, define
$$V_n(v) = \frac{1}{\sqrt n}\sum_{i=1}^n\big[I(\xi_i \le v) - v\big], \qquad v \in [0,1].$$
Since the process $U_n(t)$ has the same distribution as $V_n(G(t))$, by $\beta_n \to 0$, (1.3.5) and the tightness property of the standardized uniform $[0,1]$ empirical process, we obtain
$$\sup_{t\in\mathbb{R}}\big|U_n(t + K_\epsilon\beta_n) - U_n(t)\big| \to 0, \quad \text{in probability}.$$
Also, $(nh_n)^a/\sqrt n \to 0$ because $h_n \to 0$ and $a < 1/2$. Therefore, the first term in the above bound tends to zero in probability. This completes the proof of Theorem 1.3.1.

Comment: If the density $g$ is bounded, using $\|g\|_\infty$ to denote its $L_\infty$-norm, then
$$(nh_n)^a\sup_{t\in\mathbb{R}}\big[G(t + K\beta_n) - G(t - K\beta_n)\big] \le 2K\|g\|_\infty\beta_n(nh_n)^a = 2K\|g\|_\infty(nh_n)^{a-1/2}\sqrt{\log n}.$$
In this case, $a < 1/2$ and $(nh_n)^{a-1/2}\sqrt{\log n} \to 0$ imply the assumption (1.3.1).

1.4 Consistency of $\hat g_n$

In this section, we state and prove the weak, strong and $L_1$-norm consistency of $\hat g_n$ for $g$.

1.4.1 Uniform weak consistency of $\hat g_n$

In order to show the uniform weak consistency of $\hat g_n$ for $g$, we use the following exponential type probability inequality, established first by Dvoretzky, Kiefer and Wolfowitz (1956); see also Corollary 1 of Massart (1990).

Lemma 1.4.1 For any $\epsilon > 0$, the empirical distribution function $G_n$ and the underlying distribution function $G$ satisfy
$$P\Big(\sup_{x}|G_n(x) - G(x)| > \epsilon\Big) \le 2\exp(-n\epsilon^2/2), \quad \forall\, n \ge 1.$$

Define
$$\mu_n((a,b]) := G_n(b) - G_n(a), \qquad \mu((a,b]) := G(b) - G(a), \quad \text{for any } a \le b.$$
By the triangle inequality and Lemma 1.4.1, we obtain
$$P\Big(\sup\big\{|\mu_n((a,b]) - \mu((a,b])| : a \le b\big\} > \epsilon\Big) \le 4\exp(-n\epsilon^2/2), \quad \forall\, n \ge 1. \qquad (1.4.1)$$

Now we state and prove the uniform weak consistency theorem.

Theorem 1.4.1 Suppose $g$ is uniformly continuous on $\mathbb{R}$. Then, under the assumptions (1.2.2), (1.2.3) and
$$\lim_n a_n = 0, \qquad (1.4.2)$$
$$\lim_n \frac{na_n^2h_n}{\log n} = \infty, \qquad (1.4.3)$$
we obtain
$$\sup_{t\in\mathbb{R}} |\hat g_n(t) - g(t)| \to 0, \quad \text{in probability}.$$
Also we have I 9.0) - Egn(t) = 2a [un((t — a... t + an]) - #((t - a... t + £11.01- Hence, for any 6 > 0, by (1.4.1), we have that for Vn 2 1, P(Sup lgn(t) - Egn(t)| > 6) tell 2 P(sup lu,,((t — a,,,t+ a,,]) — u((t — a,,,t+ a,,])}| 2 Zone) (1.4.6) 1612 _<_ 4exp(—2naie2). But, h,, —> 0 and (1.4.3) imply that no2 —-> 00, and hence exp(—2naf,e2) —+ 0. 71 Combining this with (1.4.6), we obtain that spit; |g,,(t) — E9,,(t)| —) 0, in probability. Thus, to establish Theorem 1.4.1, it suffices to prove that 30:1,}: |9,,(t) — g,,(t)| —> 0, in probability. (1.4.7) For any 6 > 0, by (1.2.3), it suffices to show (1.4.7) only on A",( which is defined below (1.2.5). Define Cn : Kcfin- (1.4.8) 11 By (1.4.3), we see that a,,/c,, —> 00. (1.4.9) So c,, < a,, holds for all sufficient large n. Hence, on AM, it follows that for large enough n, '91:“) _ grz(t)l I 2na" 21a — an + dn(Xi) < 6i S t+ an + dn()(i)) —I(t—a,, 0, in probability, j = 1,2. teR Rewrite fl 1 Jul“) -_— 2na [I(t—a,,—c,, 0, by (1.4.1), we have P( [It— n_n77) = P( sup tER S 4exp(—2naf,n2). (1.4.10) un((t —— an — c,,,t -— a,, + c,,]) - u((t — an — c,,,t — an + c,,])l > 2a,,n) 12 On the other hand, by nag, —-) oo (implied by h,, —-> 0 and (14.3)), we see that exp(—2naf,n2) —> 0. Combining this with (1.4.10) and 1 t-an-I-cn sup |—/ g(s)ds| g goon/an -—> 0 (1.4.11) tER 2011 t ”an‘Cn (where we have used (1.4.9) and the boundedness of 9), we have, on AM, sup J,,1(t) —> 0, in probability. tER On the other hand, we can similarly prove that sup,€R J,,2(t) —+ 0, in probability on AM. Thus the claim holds. Using a similar argument, we can prove the following Theorem 1.4.2 Suppose g is Lipschitz on R. Then, under the assumptions (1.2.2), (1.2.3), an = if”4 and h,, = n‘1/5, we obtain that for any a: 0 < a < 3/20, n‘I sup |9,,(t) — g(t)| ——> 0, in probability. 
Note: $g$ Lipschitz on $\mathbb{R}$ means that there exists a constant $l$ ($0 \le l < \infty$) such that $|g(t_1) - g(t_2)| \le l\,|t_1 - t_2|$ for any $t_1, t_2 \in \mathbb{R}$.

1.4.2 Uniform strong consistency of $\hat g_n$

The first result is about the pointwise strong consistency of $\hat g_n(t)$ for $g(t)$.

Theorem 1.4.3 Suppose $g$ is continuous at $t$. Then, under the assumptions (1.2.2), (1.2.4), (1.4.2) and (1.4.3), we have
$$\hat g_n(t) \to g(t), \ \text{a.s.} \qquad (1.4.12)$$

Before giving the proof of the above theorem, we state the following well-known Hoeffding inequality; see Hoeffding (1963) for its proof.

Lemma 1.4.2 Let $Z_1, Z_2, \dots, Z_n$ be independent zero-mean real-valued random variables with $a_i \le Z_i \le b_i$, $i = 1, 2, \dots, n$, where $a_1, b_1, \dots, a_n, b_n$ are finite constants, and let $S_n = \sum_{i=1}^n Z_i$. Then
$$P(|S_n| \ge t) \le 2\exp\Big(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big), \quad \forall\, t > 0.$$

Proof of Theorem 1.4.3: Use the following decomposition (where $g_n(t)$ is defined in (1.4.4)):
$$\hat g_n(t) - g(t) = [\hat g_n(t) - g_n(t)] + [g_n(t) - Eg_n(t)] + [Eg_n(t) - g(t)]. \qquad (1.4.13)$$
By the continuity of $g$ at $t$ and using (1.4.2), we obtain that
$$Eg_n(t) = \frac{1}{2a_n}\int_{t-a_n}^{t+a_n} g(s)\,ds \to g(t).$$
So it suffices to show that
$$g_n(t) - Eg_n(t) \to 0, \ \text{a.s.}, \qquad (1.4.14)$$
$$\hat g_n(t) - g_n(t) \to 0, \ \text{a.s.} \qquad (1.4.15)$$
We show (1.4.14) first. Fixing any $\epsilon > 0$, apply the Hoeffding inequality to
$$g_n(t) - Eg_n(t) = \frac{1}{2na_n}\sum_{i=1}^n\big[I(t - a_n < \varepsilon_i \le t + a_n) - P(t - a_n < \varepsilon_1 \le t + a_n)\big].$$
But (1.4.3) and $h_n \to 0$ (in (1.2.2)) imply that
$$\frac{na_n^2}{\log n} \to \infty. \qquad (1.4.16)$$
This, in turn, implies that for any constant $\gamma$ ($0 < \gamma < \infty$),
$$\sum_n \exp\{-2na_n^2\gamma\} < \infty. \qquad (1.4.17)$$
So (1.4.14) follows from the Borel-Cantelli lemma.

Now we show (1.4.15). Recall the definition of $B_n$ in (1.2.6). By (1.2.7), for any $\delta > 0$, we obtain
$$P\Big(\bigcup_{m\ge n}\big\{|\hat g_m(t) - g_m(t)|\,I(B_m^c) \ge \delta\big\}\Big) \le P\Big(\bigcup_{m\ge n} B_m^c\Big) = 1 - P\Big(\bigcap_{m\ge n} B_m\Big) \to 0.$$
Thus
$$(\hat g_n(t) - g_n(t))\,I(B_n^c) \to 0, \ \text{a.s.} \qquad (1.4.18)$$
Next, we deal with the difference $\hat g_n(t) - g_n(t)$ on $B_n$. Let
$$b_n = C\beta_n. \qquad (1.4.19)$$
Define
$$L_{n1}(t) := \frac{1}{2na_n}\sum_{i=1}^n I(t - a_n - b_n < \varepsilon_i \le t - a_n + b_n),$$
$$L_{n2}(t) := \frac{1}{2na_n}\sum_{i=1}^n I(t + a_n - b_n < \varepsilon_i \le t + a_n + b_n).$$
By (1.2.1) and (1.4.3), it follows that
$$\frac{b_n}{a_n} = C\Big(\frac{\log n}{na_n^2h_n}\Big)^{1/2} \to 0. \qquad (1.4.20)$$
Thus there exists an integer $N_1$ such that $b_n < a_n$ for all $n \ge N_1$. Combining this with the property $\sup_{x\in[0,1]}|d_n(x)| \le b_n$ on $B_n$, it follows that for all $n > N_1$,
$$|\hat g_n(t) - g_n(t)|\,I(B_n) \le L_{n1}(t)I(B_n) + L_{n2}(t)I(B_n). \qquad (1.4.21)$$
So it remains to show that
$$L_{n1}(t)I(B_n) \to 0, \ \text{a.s.}, \qquad (1.4.22)$$
$$L_{n2}(t)I(B_n) \to 0, \ \text{a.s.} \qquad (1.4.23)$$
Rewrite
$$L_{n1}(t)I(B_n) \le \frac{1}{2a_n}\Big|\frac{1}{n}\sum_{i=1}^n\big[I(t - a_n - b_n < \varepsilon_i \le t - a_n + b_n) - P(t - a_n - b_n < \varepsilon_1 \le t - a_n + b_n)\big]\Big| + \frac{b_ng(t)(1 + o(1))}{a_n}. \qquad (1.4.24)$$
For any $\eta > 0$, by the Hoeffding inequality,
$$P\Big(\frac{1}{2na_n}\Big|\sum_{i=1}^n\big[I(t - a_n - b_n < \varepsilon_i \le t - a_n + b_n) - P(t - a_n - b_n < \varepsilon_1 \le t - a_n + b_n)\big]\Big| \ge \eta\Big) \le 2\exp(-2na_n^2\eta^2).$$
By (1.4.17) with $\gamma = \eta^2$ and the Borel-Cantelli lemma, the first term on the right-hand side of (1.4.24) tends to zero a.s., while the second term tends to zero by (1.4.20) and the continuity of $g$ at $t$. This proves (1.4.22); the proof of (1.4.23) is entirely similar, which completes the proof of Theorem 1.4.3.

The next theorem gives the uniform strong consistency of $\hat g_n$.

Theorem 1.4.4 Suppose $g$ is uniformly continuous on $\mathbb{R}$. Then, under the assumptions (1.2.2), (1.2.4), (1.4.2) and (1.4.3), we have
$$\sup_{t\in\mathbb{R}} |\hat g_n(t) - g(t)| \to 0, \ \text{a.s.} \qquad (1.4.25)$$

Proof: Since all the conditions of Theorems 1.4.1 and 1.4.3 are satisfied, we can use the results obtained in the course of their proofs. For any $\epsilon > 0$, since (1.4.6) holds and (1.4.16) implies that $\sum_n \exp\{-2na_n^2\epsilon^2\} < \infty$, the Borel-Cantelli lemma yields
$$\sup_{t\in\mathbb{R}} |g_n(t) - Eg_n(t)| \to 0, \ \text{a.s.}$$
Hence, to prove (1.4.25), by (1.4.13) and (1.4.5) it suffices to show that
$$\sup_{t\in\mathbb{R}} |\hat g_n(t) - g_n(t)| \to 0, \ \text{a.s.}$$
For any $\delta > 0$, by (1.2.7), we obtain
$$P\Big\{\bigcup_{m\ge n}\Big(\sup_{t\in\mathbb{R}} |\hat g_m(t) - g_m(t)|\,I(B_m^c) \ge \delta\Big)\Big\} \le P\Big\{\bigcup_{m\ge n} B_m^c\Big\} = 1 - P\Big\{\bigcap_{m\ge n} B_m\Big\} \to 0.$$
Therefore,
$$\sup_{t\in\mathbb{R}} |\hat g_n(t) - g_n(t)|\,I(B_n^c) \to 0, \ \text{a.s.}$$
Since (1.4.21) still holds, by the same argument as in the proof of Theorem 1.4.3 it remains only to prove
$$\sup_{t\in\mathbb{R}} L_{n1}(t)I(B_n) \to 0, \ \text{a.s.}$$
By (1.4.20) and the boundedness of $g$, the supremum of the second term in (1.4.24) tends to zero, i.e.,
$$\sup_{t\in\mathbb{R}} \frac{b_ng(t)(1 + o(1))}{a_n} \to 0,$$
so it suffices to prove that
$$\sup_{t\in\mathbb{R}} \frac{1}{2na_n}\Big|\sum_{i=1}^n\big[I(t - a_n - b_n < \varepsilon_i \le t - a_n + b_n) - P(t - a_n - b_n < \varepsilon_1 \le t - a_n + b_n)\big]\Big| \to 0, \ \text{a.s.}$$
For any given $\eta > 0$, by (1.4.1), we obtain that for all $n \ge 1$,
$$P\Big\{\sup_{t\in\mathbb{R}} \frac{1}{2na_n}\Big|\sum_{i=1}^n\big[I(t - a_n - b_n < \varepsilon_i \le t - a_n + b_n) - P(t - a_n - b_n < \varepsilon_1 \le t - a_n + b_n)\big]\Big| \ge \eta\Big\}$$
$$= P\Big\{\sup_{t\in\mathbb{R}} \big|\mu_n((t - a_n - b_n, t - a_n + b_n]) - \mu((t - a_n - b_n, t - a_n + b_n])\big| \ge 2a_n\eta\Big\} \le 4\exp(-2na_n^2\eta^2).$$
This bound, together with the Borel-Cantelli lemma and (1.4.16)-(1.4.17) with $\gamma = \eta^2$, proves the theorem.
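The exponential inequalities driving the Borel-Cantelli arguments above can be checked by simulation. A minimal Monte Carlo sketch of Lemma 1.4.2 for centered Bernoulli variables; the sample size, threshold and replication count are arbitrary illustrative choices, not quantities from the text:

```python
import numpy as np

def hoeffding_bound(n, t, width):
    """Hoeffding bound 2*exp(-2 t^2 / (n * width^2)) for n zero-mean Z_i
    sharing a common range width b_i - a_i = width."""
    return 2.0 * np.exp(-2.0 * t ** 2 / (n * width ** 2))

rng = np.random.default_rng(1)
n, t, reps = 500, 30.0, 2000
# Z_i = X_i - 1/2 with X_i ~ Bernoulli(1/2): zero mean, Z_i in [-1/2, 1/2]
S = rng.binomial(1, 0.5, size=(reps, n)).sum(axis=1) - n / 2.0
exceed = float(np.mean(np.abs(S) >= t))   # Monte Carlo estimate of P(|S_n| >= t)
bound = hoeffding_bound(n, t, 1.0)
print(exceed, bound)   # the estimated probability stays below the bound
```

The bound is far from tight here (the true probability is noticeably smaller), which is precisely why summability conditions such as (1.4.17) are easy to satisfy once $na_n^2/\log n \to \infty$.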
Under some further conditions (choosing $a_n = n^{-1/4}$ and $h_n = n^{-1/5}$), we can similarly show the following uniform strong convergence with a rate.

Theorem 1.4.5 Suppose $g$ is Lipschitz on $\mathbb{R}$. Then, under the assumptions (1.2.2), (1.2.4) and $a_n = n^{-1/4}$, $h_n = n^{-1/5}$, we obtain that for any $a$ with $0 < a < 3/20$,
$$n^{a}\sup_{t\in\mathbb{R}} |\hat g_n(t) - g(t)| \to 0, \ \text{a.s.}$$

1.4.3 $L_1$-norm consistency of $\hat g_n$

In this subsection, we consider the $L_1$-norm of the deviation of $\hat g_n$ from $g$:
$$\|\hat g_n - g\|_1 = \int_{\mathbb{R}} |\hat g_n(x) - g(x)|\,dx.$$
Under some sufficient conditions, we obtain that $\|\hat g_n - g\|_1 \to 0$ almost surely. First, we give the following lemma; for its proof, see Corollary 3.1 of Serfling (1983).

Lemma 1.4.3 Under the assumptions $E|\varepsilon| < \infty$ and
$$\|\hat g_n - g\|_\infty = \sup_{t\in\mathbb{R}} |\hat g_n(t) - g(t)| \to 0 \ \text{a.s.},$$
we have
$$\|\hat g_n - g\|_1 = O\big(\|\hat g_n - g\|_\infty^{1/2}\big),$$
where $O(\cdot)$ depends only on $g$.

Combining Theorem 1.4.4 and Theorem 1.4.5 with Lemma 1.4.3, we have

Theorem 1.4.6 Assume $E|\varepsilon| < \infty$. Then the following hold.
(i) Under the conditions of Theorem 1.4.4, we obtain $\|\hat g_n - g\|_1 \to 0$, a.s.
(ii) Under the conditions of Theorem 1.4.5, we obtain that for any $a$ with $0 < a < 3/40$, $n^{a}\|\hat g_n - g\|_1 \to 0$, a.s.

1.5 Consistency of the kernel density estimator $\hat g_{n\cdot}$ of $g$

In this section, we consider consistency for the more general kernel density estimator of $g$. The kernel density estimator (based on the nonparametric residuals) is given by
$$\hat g_{n\cdot}(t) := \frac{1}{na_n}\sum_{i=1}^n K\Big(\frac{\hat\varepsilon_i - t}{a_n}\Big), \quad t \in \mathbb{R}, \qquad (1.5.1)$$
where $K$ is the kernel density function. The assumptions on the kernel $K$ are as follows:
(K.1) The kernel density $K$ is bounded, and there exists a constant $\rho > 0$ such that $K(u) = 0$ for $|u| > \rho$.
(K.2) The kernel density $K$ is Riemann integrable on $[-\rho, \rho]$.

1.5.1 Pointwise weak consistency of $\hat g_{n\cdot}$

In order to show the weak consistency of $\hat g_{n\cdot}(t)$ (defined in (1.5.1)) at a continuity point $t$ of $g$, we first state the following lemma on the piecewise constant approximation of the kernel function $K$; see Lemma 3 of Devroye and Wagner (1980) for its proof.
Lemma 1.5.1 Suppose the kernel density $K$ satisfies the assumptions (K.1) and (K.2). Then, for any $\epsilon > 0$ and $\delta > 0$, we can find a function
$$K^*(x) = \sum_{i=1}^N \alpha_i I_{U_i}(x),$$
where
(i) $\alpha_1, \alpha_2, \dots, \alpha_N$ are nonnegative real numbers and $N < \infty$ is an integer;
(ii) $U_1, U_2, \dots, U_N$ are disjoint, left-open and right-closed intervals contained in $(-\rho, \rho]$;
(iii) $K^*(x) \le \sup_{u\in\mathbb{R}} K(u)$, for all $x \in \mathbb{R}$;
(iv) $|K^*(x) - K(x)| < \epsilon$ for $-\rho < x \le \rho$, except on a set $D$;
(v) $D \subseteq E = \cup_{i=1}^M E_i$, where $M < \infty$ is an integer and $E_1, E_2, \dots, E_M$ are left-open and right-closed intervals from $(-\rho, \rho]$ whose union has Lebesgue measure less than $\delta$.

Define
$$g_{n\cdot}(t) := \frac{1}{na_n}\sum_{i=1}^n K\Big(\frac{\varepsilon_i - t}{a_n}\Big), \quad t \in \mathbb{R}. \qquad (1.5.2)$$
Now we state and prove the pointwise weak convergence theorem.

Theorem 1.5.1 Suppose $g$ is continuous at $t$. Then, under the assumptions (K.1), (K.2), (1.2.2), (1.2.3), (1.4.2) and (1.4.3), we have
$$\hat g_{n\cdot}(t) \to g(t), \quad \text{in probability}. \qquad (1.5.3)$$

Proof: Rewrite
$$\hat g_{n\cdot}(t) - g(t) = [\hat g_{n\cdot}(t) - g_{n\cdot}(t)] + [g_{n\cdot}(t) - g(t)]. \qquad (1.5.4)$$
By (1.4.2), the continuity of $g$ at $t$ and the fact that $K$ is a density function, we can check that
$$Eg_{n\cdot}(t) = \frac{1}{a_n}EK\Big(\frac{\varepsilon_1 - t}{a_n}\Big) = \int_{|u|\le\rho} K(u)\,g(t + a_nu)\,du \to g(t),$$
and
$$\mathrm{Var}[g_{n\cdot}(t)] = \frac{1}{na_n^2}\mathrm{Var}\Big[K\Big(\frac{\varepsilon_1 - t}{a_n}\Big)\Big] \le \frac{1}{na_n^2}E\Big[K\Big(\frac{\varepsilon_1 - t}{a_n}\Big)\Big]^2 = \frac{\int K^2(z)\,dz\;g(t)(1 + o(1))}{na_n} \to 0,$$
where we use $na_n \to \infty$, which is implied by (1.4.3) and (1.4.2). Thus, by the Chebyshev inequality, $g_{n\cdot}(t) - Eg_{n\cdot}(t) \to 0$ in probability, and so it follows that
$$g_{n\cdot}(t) \to g(t), \quad \text{in probability}.$$
To complete the proof of the claim (1.5.3), it thus suffices to prove that
$$\hat g_{n\cdot}(t) - g_{n\cdot}(t) \to 0, \quad \text{in probability}. \qquad (1.5.5)$$
For any $\epsilon > 0$, by (1.2.3), it suffices to show (1.5.5) only on $A_{n,\epsilon}$ of (1.2.5). By Lemma 1.5.1, for this $\epsilon > 0$ and any $\delta > 0$ we have a piecewise constant function $K^*$ having all the properties in Lemma 1.5.1, with $D = E = \cup_{i=1}^M E_i$ in property (v); we will choose $D = E$ whenever we need to use Lemma 1.5.1. Rewrite
—I\ (a( a. )ldG,(u) +— fK‘(ua—t)dG,, u)-—(u/K‘ ),,dG(u) a,, a,, =J,,(t)+J,2(t)+J,,3(t), say. (1.5.6) Let S,, := {u: |u— t] S pan}, t+a,,D :2 {t+ua,, : u E D}, D, := S,n(t+a,,D), 02:: S,n(t+a,,D)C, where ()C denote the complement of a set. We denote k = supueR K (u) Since 9 is continuous at t, it follows that there exist a constant I > 0 such that g(t + a,,u) S l for V|u| S p and all large enough n. Note: We will choose I = supten g(t) if we know that g is bounded in R. And in this case, I does not depend on t E R. Let [2,, and n, denote the empirical distributions of 5”,, ,2?” and a,,-n ,5, respectively, p be distribution of e. (i) First, we show that J,,1(t) —> 0, as. Actually, Jn1(t) S 3": dGn(u) + if dGn(u) 0,, Di a,, 02 3 311,419,), Wm.) .. “(a)” + cf—nuw.) + law.) — 1402)”- 22 But for large enough n, 35,419,) = 2k] g(t+a,,u)duS2kld, (1.5.7) an 0] i,u(D2) = 6/ g(t+a,,u)du S lpe. a,, 02 Hence, we obtain that 2k 6 J..1(t) S 21:16 +1106 + a—lun(D1)- #(D1)| + a—lun(D2) - #(D2)|- (1-5-8) For any 77 > 0, by Lemma 1.4.2, it follows that P(—1-|un(D1)— 140.): > n) < 2exp{—1nazn2}. a,, — 2 " By (1.4.17) and the Borel-Cantelli lemma in Kallenberg (1997, pp 32), we obtain 1 a—l'lln(D1)— ”(DIM —') 0, (1.3.. (1..59) Similarly, we can show 1 .. a—Iun(D2) — p(D2)| —-> 0, a.s.. (1.5.10) Therefore, combining (1.5.9), (1.5.10) with (1.5.8), we obtain that J,,1(t) —> 0, a.s.. (ii) Secondly, we show J,,2(t) —+ 0, in probability. 1.2(t) s Z—k dam) + ,f— / dam) n D, n 02 g :—§[p(D1)+lun(D1)-' MD.» + law.) - 4.40.)” +(—)E:[/,t(D2) + |l1n(D2) " “(1)2“ + lfin(D2) _ #n(D2)l]‘ (1.5.11) By (1.5.7), (1.5.9) and (1.5.10), it suffices to show that on A,“6 2k :lfln(Dl) — pn(D,)| —+ 0, in probability, (1.5.12) iii—MAD» — pn(D2)| -—+ 0, in probability. 23 Since D1 and D2 are both unions of finite intervals like (t + a,,u), t + a,,ug], where —p < u, < u2 S p, by the triangular inequality, we will obtain (1.5.12) if we can show that on AM, in probability, 1 . 
E—Ipn((t + a,,u1,t+ a,,u2]) — ,u((t + a,,u1,t+ anu2])| —-) 0, (1.5.13) for any u, and u2 satisfying —p < u, < 11.2 S p . Note: By the method for showing (1.5.9), we can similarly show 1 a—lun((t + a,,u1,t+ a,,u2]) — u((t + a,,u1,t+ a,,u2])| —+ 0, (1.5.14) for any u, and 11.2 satisfying -—p < u, < u2 S p . By (1.4.9), for large enough n, we obtain that on AM, 1 . a—|p,,((t + a,,u1,t+ a,,u2]) - p((t + a,,u1,t+ a,,u2])| n 1 na,, Z[I(t + a,,u) + d,,(X,-) S 52‘ S 15+ anu2 + dn(X,)] '=l —I(t + a,,ul S e,- S t+ a,,u2)] fl Z[I(t + a,,ul — c,, < e,- S t + a,,ul + 0,) i=1 nan +I(t+a,,u2 — c,, < e,- S t+a,,u2 +c,,)]. 051m Note that by the continuity of g at t, the expected value of each summand is bounded alone by 2c,,g(t)(1 + 0(1)). Thus, for any given 17 > 0 and large enough n, by (1.4.9), it follows that 1 P ({g—wu + t+ (1.24)) — u(lt + t+ 4.421)) > n} r) 4,) S24M00+00D ma —->0. Thus, we have proved that J,,2(t) —) 0, in probability. 24 (iii) The rest is to show J,,3(t) —> 0, in probability. (1.5.13) and (1.5.14) imply that for any interval U,- C (—p, p], 1 21103} 6 (15+ Uian» — 1(52' E (t + Uianm —-> 0, in probability. nan j_l Combining this with the fact that 1 no, n N 220.416,- 6 (t + U,a,)) — 1(5, 6 (t + U.an))l] j=l i=1 Jn3(t) : N n = . a)”: ng, e (t+U,a,,)) — 1(.,e(t+U,-a,))]], (1.5.16) we obtain that J,,3(t) —> 0, in probability. 1.5.2 Uniform weak and strong consistency of 9,,. We first state the uniform consistency of 9,,. (defined in (1.5.2)) for 9. For its proof, see Theorem 1 of Devroye and Wagner (1980). Lemma 1.5.2 Suppose g is uniformly continuous on R. Then, under the assump- tions (K1), (K2), (1.4.2) and (1.4.3), we have sup |9,,(t) — g(t)| —> 0, a.s.. (1.5.17) tell The first result below is about the weak uniform consistency of 9,,(t) for g(t). Theorem 1.5.2 Suppose g is uniformly continuous on R. Then, under the assump— tions (K1), '(K2), (1.2.2), (1.2.3), (1.4.2) and (1.4.3), we have sup |9,,(t) - g(t)| —-> 0, in probability. (1.5.18) tel? 
Proof: Since $g$ is a uniformly continuous density on $\mathbb{R}$, we continue to write $g_0 = \sup_{t\in\mathbb{R}} g(t)$. By (1.5.17) and the triangle inequality, in order to prove the claim (1.5.18), it suffices to prove that on $A_{n,\epsilon}$,
$$\sup_{t\in\mathbb{R}} |\hat g_{n\cdot}(t) - g_{n\cdot}(t)| \to 0, \quad \text{in probability}.$$
By (1.5.6), (1.5.8), (1.5.11), (1.5.16) and the fact that $D_1$ and $D_2$ are both unions of finitely many intervals of the form $(t + a_nu_1, t + a_nu_2]$, where $-\rho < u_1 < u_2 \le \rho$, it is sufficient to show that on $A_{n,\epsilon}$, in probability,
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_1, t + a_nu_2]) - \mu((t + a_nu_1, t + a_nu_2])\big| \to 0, \qquad (1.5.19)$$
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\hat\mu_n((t + a_nu_1, t + a_nu_2]) - \mu((t + a_nu_1, t + a_nu_2])\big| \to 0. \qquad (1.5.20)$$
For any given $\eta > 0$ and large enough $n$, by $na_n^2 \to \infty$ (implied by (1.4.3) and (1.4.2)) and (1.4.1), we have
$$P\Big(\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_1, t + a_nu_2]) - \mu((t + a_nu_1, t + a_nu_2])\big| > \eta\Big) \le 4\exp\{-2\eta^2na_n^2\} \to 0. \qquad (1.5.21)$$
Thus, we have shown (1.5.19). By (1.5.15), for large enough $n$ and $c_n$ defined in (1.4.8), it follows that on $A_{n,\epsilon}$,
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\hat\mu_n((t + a_nu_1, t + a_nu_2]) - \mu_n((t + a_nu_1, t + a_nu_2])\big|$$
$$\le \frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_1 - c_n, t + a_nu_1 + c_n]) - \mu((t + a_nu_1 - c_n, t + a_nu_1 + c_n])\big|$$
$$\quad + \frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_2 - c_n, t + a_nu_2 + c_n]) - \mu((t + a_nu_2 - c_n, t + a_nu_2 + c_n])\big|$$
$$\quad + \frac{1}{a_n}\sup_{t\in\mathbb{R}} \mu\big((t + a_nu_1 - c_n, t + a_nu_1 + c_n] \cup (t + a_nu_2 - c_n, t + a_nu_2 + c_n]\big),$$
where
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \mu\big((t + a_nu_1 - c_n, t + a_nu_1 + c_n] \cup (t + a_nu_2 - c_n, t + a_nu_2 + c_n]\big) \le 4g_0\frac{c_n}{a_n} \to 0,$$
by (1.4.9). And using an argument similar to the one used in proving (1.5.19), we obtain that, in probability,
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_1 - c_n, t + a_nu_1 + c_n]) - \mu((t + a_nu_1 - c_n, t + a_nu_1 + c_n])\big| \to 0,$$
$$\frac{1}{a_n}\sup_{t\in\mathbb{R}} \big|\mu_n((t + a_nu_2 - c_n, t + a_nu_2 + c_n]) - \mu((t + a_nu_2 - c_n, t + a_nu_2 + c_n])\big| \to 0.$$
Therefore, we have finished the proof of (1.5.20), and hence of (1.5.18).

Now we begin to consider the uniform strong consistency of $\hat g_{n\cdot}$ for $g$.

Theorem 1.5.3 Suppose $g$ is uniformly continuous on $\mathbb{R}$.
Then, under the assumptions (K1), (K2), (1.2.2), (1.2.4), (1.4.2) and (1.4.3), we have
\[
\sup_{t\in\mathbb R}|\hat g_n(t)-g(t)|\to 0,\quad\text{a.s.}
\tag{1.5.22}
\]
Proof: By (1.5.17) and the triangle inequality, in order to prove the claim (1.5.22), it suffices to prove that
\[
\sup_{t\in\mathbb R}|\hat g_n(t)-g_{n0}(t)|\to 0,\quad\text{a.s.}
\]
By (1.5.6), (1.5.8), (1.5.11), (1.5.16) and the fact that $D_1$ and $D_2$ are both unions of finitely many intervals of the form $(x+a_nu_1,\,x+a_nu_2]$, where $-p<u_1<u_2\le p$, it is sufficient to show that
\[
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\mu_n((t+a_nu_1,t+a_nu_2])-\mu((t+a_nu_1,t+a_nu_2])\bigr|\to 0,\quad\text{a.s.,}
\tag{1.5.23}
\]
\[
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\hat\mu_n((t+a_nu_1,t+a_nu_2])-\mu((t+a_nu_1,t+a_nu_2])\bigr|\to 0,\quad\text{a.s.}
\tag{1.5.24}
\]
By (1.5.21), the fact that $\sum_n\exp\{-2\eta^2na_n^2\}<\infty$ (implied by (1.4.16)), and the Borel-Cantelli lemma, we obtain (1.5.23).

Recall the definition of $B_n$ in (1.2.6). By (1.2.7), $I(B_n^c)=0$ for all large $n$, a.s., so that
\[
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\hat\mu_n((t+a_nu_1,t+a_nu_2])-\mu_n((t+a_nu_1,t+a_nu_2])\bigr|\,I(B_n^c)\to 0,\quad\text{a.s.}
\tag{1.5.25}
\]
Next, for $b_n$ defined in (1.4.19), (1.2.1) and (1.4.3) imply (1.4.20). Combining (1.4.20) with the property that $\sup_{x\in[0,1]}|d_n(x)|\le b_n$ on $B_n$, it follows that for large enough $n$,
\[
\begin{aligned}
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\hat\mu_n((t+a_nu_1,t+a_nu_2])-\mu_n((t+a_nu_1,t+a_nu_2])\bigr|\,I(B_n)
&\le\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\mu_n((t+a_nu_1-b_n,t+a_nu_1+b_n])-\mu((t+a_nu_1-b_n,t+a_nu_1+b_n])\bigr|\\
&\quad+\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\mu_n((t+a_nu_2-b_n,t+a_nu_2+b_n])-\mu((t+a_nu_2-b_n,t+a_nu_2+b_n])\bigr|\\
&\quad+\frac1{a_n}\sup_{t\in\mathbb R}\mu\bigl((t+a_nu_1-b_n,t+a_nu_1+b_n]\cup(t+a_nu_2-b_n,t+a_nu_2+b_n]\bigr),
\end{aligned}
\]
where the last term is at most $4g_0b_n/a_n\to 0$ by (1.4.20). Using the same arguments as in the proof of (1.5.23), we can similarly show that
\[
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\mu_n((t+a_nu_1-b_n,t+a_nu_1+b_n])-\mu((t+a_nu_1-b_n,t+a_nu_1+b_n])\bigr|\to 0,\quad\text{a.s.,}
\]
and similarly with $u_2$ in place of $u_1$. Hence,
\[
\frac1{a_n}\sup_{t\in\mathbb R}\bigl|\hat\mu_n((t+a_nu_1,t+a_nu_2])-\mu_n((t+a_nu_1,t+a_nu_2])\bigr|\,I(B_n)\to 0,\quad\text{a.s.}
\]
Combining this with (1.5.25), we have finished the proof of (1.5.24), and hence of the claim (1.5.22).

Chapter 2

Asymptotic Distributions of Some Density and Distribution Function Estimators

In this chapter we split the whole sample into two parts.
We use the first part of the sample to estimate the unknown regression function, and we define the density and distribution function estimators from the second part of the sample together with the regression function estimator. We then consider the asymptotic distributions of the suitably standardized estimators of $g$ and $G$.

2.1 Definitions of the modified estimators of $G$ and $g$

Let $r_n$ be a sequence of positive integers satisfying $\lim r_n=\infty$ and $\lim(n-r_n)=\infty$. We use the first $r_n$ observations $(X_1,Y_1),(X_2,Y_2),\dots,(X_{r_n},Y_{r_n})$ to construct the estimator of the regression function. Let $m_n^*(x)$ denote the Nadaraya-Watson kernel regression estimator based on $(X_1,Y_1),(X_2,Y_2),\dots,(X_{r_n},Y_{r_n})$:
\[
m_n^*(x):=\frac{\sum_{i=1}^{r_n}Y_i\,L_{h_n}(x-X_i)}{\sum_{i=1}^{r_n}L_{h_n}(x-X_i)},
\qquad L_h(x):=\frac1h\,L\Bigl(\frac xh\Bigr),\quad x\in[0,1],
\]
where $h_n$ is the usual bandwidth sequence of positive numbers tending to zero and $L$ is the kernel density function. Write
\[
d_n^*(x):=m_n^*(x)-m(x),\qquad x\in[0,1].
\tag{2.1.1}
\]
We use the remaining observations $(X_{r_n+1},Y_{r_n+1}),(X_{r_n+2},Y_{r_n+2}),\dots,(X_n,Y_n)$ to construct the distribution and density function estimators of $\varepsilon_1$. Let $\varepsilon_i^*:=Y_i-m_n^*(X_i)$, $r_n+1\le i\le n$, denote the nonparametric residuals, and let $a_n$ be another sequence of positive numbers tending to zero. The empirical d.f. and the histogram type density estimators based on these residuals that are of interest here are, respectively,
\[
G_n^*(t):=\frac1{n-r_n}\sum_{i=r_n+1}^nI(\varepsilon_i^*\le t),\qquad
g_n^*(t):=\frac1{2a_n(n-r_n)}\sum_{i=r_n+1}^nI(t-a_n<\varepsilon_i^*\le t+a_n),\qquad t\in\mathbb R.
\]
We shall also need the residual and error empirical processes
\[
U_n^*(t):=\frac1{n-r_n}\sum_{i=r_n+1}^n\bigl[I(\varepsilon_i^*\le t)-G(t+d_n^*(X_i))\bigr],\qquad
U_{n\cdot}(t):=\frac1{n-r_n}\sum_{i=r_n+1}^n\bigl[I(\varepsilon_i\le t)-G(t)\bigr].
\tag{2.1.2}
\]

2.2 Assumptions

Set
\[
\beta_n^*:=\Bigl(\frac{\log r_n}{r_nh_n}\Bigr)^{1/2}.
\tag{2.2.1}
\]
We shall assume that
\[
h_n\to 0\ \text{ and }\ \beta_n^*\to 0,\quad\text{as }n\to\infty,
\tag{2.2.2}
\]
\[
\beta_n^{*\,-1}\sup_{x\in[0,1]}|d_n^*(x)|=O_p(1),\quad\text{as }n\to\infty.
\tag{2.2.3}
\]
Note: The conditions under which (2.2.3) holds were discussed in Section 1.2. At this point we only need the following implication. By (2.2.3), for any $\epsilon>0$ there exist numbers $K_\epsilon^*<\infty$ and $N_\epsilon^*<\infty$ such that
\[
P(A_{n,\epsilon}^*)>1-\epsilon,\qquad\forall\,n>N_\epsilon^*,
\tag{2.2.4}
\]
where
\[
A_{n,\epsilon}^*:=\Bigl\{\sup_{x\in[0,1]}|d_n^*(x)|\le K_\epsilon^*\beta_n^*\Bigr\}.
\]
This fact will be used in the proof of the asymptotic normality of $g_n^*(t)$, and in deriving the asymptotic distribution of the suitably standardized supremum deviation $\sup_t|g_n^*(t)-g(t)|$.
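The estimators just defined are straightforward to compute. Below is a minimal numerical sketch; the Gaussian kernel $L$, the regression function $m(x)=\sin(2\pi x)$, the error law $N(0,0.3^2)$, and all bandwidth and sample-size choices are illustrative assumptions, not values taken from the text.

```python
import numpy as np

def nw_estimator(x, X, Y, h):
    """Nadaraya-Watson estimate m*_n(x) with a Gaussian kernel L."""
    w = np.exp(-0.5 * ((np.asarray(x)[:, None] - X[None, :]) / h) ** 2)
    return (w * Y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
n, r = 2000, 1000                      # total sample size n and split point r_n
X = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.3, n)          # errors with density g
m = lambda x: np.sin(2 * np.pi * x)    # hypothetical regression function
Y = m(X) + eps

h, a = 0.05, n ** (-0.25)              # bandwidths h_n and a_n (illustrative)
# residuals eps*_i = Y_i - m*_n(X_i), i = r_n+1, ..., n,
# with m*_n fitted on the first r_n observations only
resid = Y[r:] - nw_estimator(X[r:], X[:r], Y[:r], h)

def G_star(t):   # empirical d.f. of the residuals
    return np.mean(resid <= t)

def g_star(t):   # histogram-type density estimator
    return np.mean((t - a < resid) & (resid <= t + a)) / (2 * a)
```

With these choices, $g_n^*(0)$ should be close to the true error density value $g(0)=1/(0.3\sqrt{2\pi})\approx 1.33$.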
It will also be used in showing the uniform closeness of $U_n^*(t)$ to the error empirical process $U_{n\cdot}(t)$, i.e., in showing
\[
\sup_{t\in\mathbb R}\sqrt{n-r_n}\,|U_n^*(t)-U_{n\cdot}(t)|=o_p(1),
\]
and in deriving the asymptotic distribution of $G_n^*$.

2.3 Asymptotic distributions

In this section, we shall first show the asymptotic normality of $g_n^*(t)$. After that, we will obtain the asymptotic distribution of the global measure of the deviation of $g_n^*$ from $g$, which will be shown to be the same as the result of Bickel and Rosenblatt (1973) in the one sample setup. Then we shall show the uniform closeness of $U_n^*$ to $U_{n\cdot}$ and the weak convergence of $U_n^*$ to a time transformed Brownian bridge. This result extends some of the results of Lemma 2.3.2 with $d_{ni}\equiv n^{-1/2}$ of Koul (1970) and Loynes (1980), and of Theorem 1.1 with $\gamma_{ni}\equiv n^{-1/2}$ of Koul and Ossiander (1994), from parametric regression to the nonparametric regression model. The section ends with the asymptotic distribution of $G_n^*$.

2.3.1 Asymptotic normality of $g_n^*(t)$

Set
\[
\xi_n:=(X_1,X_2,\dots,X_n,\,Y_1,Y_2,\dots,Y_{r_n}),\qquad
\zeta_n:=(\varepsilon_{r_n+1},\varepsilon_{r_n+2},\dots,\varepsilon_n).
\]
Recall the definition of $d_n^*(x)$ from (2.1.1). For any $t\in\mathbb R$, define
\[
Q_{\zeta_n}(t):=\frac1{n-r_n}\sum_{i=r_n+1}^n\bigl[I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)
-G(t+a_n+d_n^*(X_i))+G(t-a_n+d_n^*(X_i))+G(t+a_n)-G(t-a_n)\bigr].
\tag{2.3.1}
\]

Lemma 2.3.1 Suppose that $g$ satisfies the following local Lipschitz condition at $t$: there exist $c(t)>0$ and $\delta=\delta(t)>0$, depending only on $t$, such that
\[
s\in(t-\delta,t+\delta)\ \Longrightarrow\ |g(s)-g(t)|\le c(t)|s-t|.
\tag{2.3.9}
\]
Then, under the assumptions (2.2.2), (2.2.3) and (2.3.7), we have
\[
\sqrt{(n-r_n)/a_n}\,|Q_{\zeta_n}(t)|=o_p(1).
\tag{2.3.8}
\]
Proof: For any $\epsilon>0$, (2.2.3) implies (2.2.4). In view of (2.2.4), to show (2.3.8) it is sufficient to prove that for any $\eta>0$,
\[
P\Bigl(\Bigl\{|Q_{\zeta_n}(t)|>\eta\sqrt{a_n/(n-r_n)}\Bigr\}\cap A_{n,\epsilon}^*\Bigr)\to 0.
\tag{2.3.10}
\]
Let
\[
c_n^*:=K_\epsilon^*\beta_n^*.
\tag{2.3.11}
\]
Hence, for 6 in (2.3.9), we can conclude that for large enough 17., a,, + c; S 6, 0 < a,, — c; S 6, (2.3.14) Thus using the local Lipschitz property (2.3.9), we obtain that t+an+C; f gt)...- 3 21401) + c(t)61c:.. t+an"C; Combining this with (2.3.12) and (2.3.13) yields that on 24;,“ E.) S 2[(g(t) +c(t)6]c; _) 0. 7720.. P (1624(1)) > 17 “" Integrating out over A‘ and using the Dominated Convergence Theorem, we obtain 11,0 that (2.3.10), thereby completing the proof of this lemma. Now, we begin to consider the asymptotic normality of g,‘,(t). Theorem 2.3.1 Under the condition of Lemma 2.3.1, and the assumptions lim(n — r,,)a: = 0, (2.3.15) lim(n — r,,)a,, = 00, (2.3.16) (n — r,,)a,, log r,, 11m r,,h,, = 0, (2.3.17) 35 we have \/2(n — r,,)a,, Wm ——> N(0,1), in distribution, (2.3.18) 9 for any t such that g(t) > 0. Example: r,, = [n/ 2], a,, = n‘l/4 and h,, = 71‘”5 are an example of sequences r,,, a,, and h,, satisfying the assumptions of Lemma 2.3.1. Proof: Recall the decomposition (2.3.3). The proof consists of proving (2.3.4) - (2.3.6). (i) Proof of (2.3.4). By the triangle inequality, we have \/(n — r,,)a,, 1 2 (n — r,,)a,, (n _ Tn) an gut) — 9,,(t)] s ; Qc.(t)[ + x z": lG(t+a,,+d:,(X,-))—G(t—a,,+d;(X,~))—G(t+a,,)+G(t—a,,) izrn +1 Let L,,(t) denote the second term of the above bound. Lemma 2.3.1 shows that \/(n —' Tnl/aleCn (tll = 012(1)- Therefore, by (2.2.4), it suffices to show that on A“ L,,(t) = 0,,(1). me! With the 0; given in (2.3.11), on A’ we have 12,6’ 1 n sup 2: |G(t+a,,+l,-) —G(t+a,,) \/(n — r,,)a,, r,,+1Si_<_n,|l,-|Sc;, i=r..+1 Ln“) S —G(t — a,, + 1,) + G(t — a,,) 1 " ’1 = sup |Z[/ g(t+s+a,,)ds (n — r,,)a,, rn+ISiSn,[l,|Sc,’, 1.2,”, o , l.- —/ g(t+s — a,,)ds]] o n — r,, c; C; .<_ a [/ 140+ s+a..) — g(t)|...) 190+ s — a.) 
Hence, by (2.3.14), the local Lipschitz property (2.3.9), the fact that $c_n^*<a_n$ for large enough $n$ (implied by (2.3.13)), and (2.3.17), we obtain that for large enough $n$ the above upper bound is at most
\[
4c(t)\,c_n^*\sqrt{(n-r_n)a_n}\longrightarrow 0.
\]
(ii) Proof of (2.3.5). By $a_n\to 0$ and the local Lipschitz property (2.3.9), we obtain that
\[
E\,I(t-a_n<\varepsilon_i\le t+a_n)=2a_ng(t)(1+o(1)).
\]
The condition (2.3.16) implies that the Lindeberg-Feller condition (see Theorem 4.12 in Kallenberg (1997)) is satisfied for the random variables $W_n(t)$, thereby proving the claim (2.3.5).

(iii) Proof of (2.3.6). Since $a_n\to 0$ by (1.4.2), for the $\delta$ in (2.3.9) and large enough $n$ we have $a_n<\delta$. So, by (2.3.3) and (2.3.9), for large enough $n$ we obtain that
\[
\sqrt{(n-r_n)a_n}\,\bigl|Eg_n(t)-g(t)\bigr|\le\frac{c(t)}2\sqrt{(n-r_n)a_n^3}\longrightarrow 0,
\]
by (2.3.15). This completes the proof of (2.3.18).

2.3.2 Asymptotic distribution of the global measure of the deviation of $g_n^*$ from $g$

In this subsection we extend Theorem 3.1 of Bickel and Rosenblatt (1973) to $g_n^*$. We will show that the asymptotic distribution of the global measure of the deviation of $g_n^*$ from $g$ is the same as in the case where the errors $\varepsilon_i$ are observable. Denote
\[
M_n^*:=\sup\Bigl\{\sqrt{\frac{(n-r_n)a_n}{g(t)}}\,|g_n^*(t)-g(t)|:\ 0\le t\le 1\Bigr\}.
\]
(There is no loss in considering $[0,1]$ rather than any other interval on which the density is bounded away from $0$ and $\infty$.) The main result is as follows:

Theorem 2.3.2 Suppose that there exist constants $\gamma_1$ and $\gamma_2$ such that $\gamma_1<0<1<\gamma_2$ and, on $[\gamma_1,\gamma_2]$, $g$ is continuous and positive, $g^{1/2}$ is absolutely continuous, and its derivative $g'/(2g^{1/2})$ is bounded. Then, under the assumptions (2.2.2), (2.2.3) and
\[
a_n=(n-r_n)^{-\delta},\quad\text{for some }\delta:\ 1/3<\delta<1/2,
\tag{2.3.19}
\]
\[
\lim\frac{r_n^\alpha a_n^2h_n}{[\log(n-r_n)]^4\log r_n}=\infty,\quad\text{for some }\alpha:\ 0<\alpha<1,
\tag{2.3.20}
\]
\[
\lim n^2\exp\{-(n-r_n)/r_n^\alpha\}=0,
\tag{2.3.21}
\]
\[
\lim(n-r_n)\,a_n\,\beta_n^{*2}\log(n-r_n)=0,
\tag{2.3.22}
\]
we have
\[
P\bigl([2\delta\log(n-r_n)]^{1/2}\bigl(\sqrt2\,M_n^*-e_n\bigr)<y\bigr)\to e^{-2e^{-y}},\qquad\forall\,y\in\mathbb R,
\]
where
\[
e_n=[2\delta\log(n-r_n)]^{1/2}+[2\delta\log(n-r_n)]^{-1/2}\Bigl\{\log\frac1{2\sqrt\pi}+\frac12\bigl[\log\delta+\log\log(n-r_n)\bigr]\Bigr\}.
\tag{2.3.23}
\]
Example: $\delta=5/12$, $r_n=[n/2]$ and $h_n=n^{-1/24}$ provide an example of $\delta$ and sequences $r_n$ and $h_n$ satisfying the above assumptions.

The proof of this theorem will appear after the following two lemmas. The first lemma follows from Theorem 3.1 of Bickel and Rosenblatt (1973), upon taking $w(t)=\frac12I(|t|\le 1)$. Let
\[
M_n:=\sup\Bigl\{\sqrt{\frac{(n-r_n)a_n}{g(t)}}\,|g_n(t)-Eg_n(t)|:\ 0\le t\le 1\Bigr\}.
\]

Lemma 2.3.2 Suppose $g$ is continuous, positive and bounded on $[0,1]$, $g^{1/2}$ is absolutely continuous with bounded derivative $g'/(2g^{1/2})$, and $a_n=(n-r_n)^{-\delta}$, $0<\delta<1/2$. Then
\[
P\bigl([2\delta\log(n-r_n)]^{1/2}\bigl(\sqrt2\,M_n-e_n\bigr)<y\bigr)\to e^{-2e^{-y}},\qquad\forall\,y\in\mathbb R,
\]
where $e_n$ is defined in (2.3.23).

Next, we state and prove the second lemma.

Lemma 2.3.3 Suppose that (1.4.2), (2.2.2), (2.2.3), (2.3.20) and (2.3.21) hold and that $g$ satisfies the assumptions of Lemma 2.3.2. Then
\[
\sup_{t\in[0,1]}|Q_{\zeta_n}(t)|=o_p\Bigl(\sqrt{\frac{a_n}{(n-r_n)\log(n-r_n)}}\Bigr).
\tag{2.3.24}
\]
The proof of this lemma uses a double symmetrization, as in Pollard (1984, pp. 14-16), based on the principle of comparing the empirical process to a symmetrized empirical process. This approach is also discussed in Van der Vaart and Wellner (1996). To prove the lemma we need some additional notation. Let $\{\varepsilon_i',\ r_n+1\le i\le n\}$ be an independent copy of $\{\varepsilon_i,\ r_n+1\le i\le n\}$, also independent of $(X_1,X_2,\dots,X_n,Y_1,Y_2,\dots,Y_{r_n})$. Set
\[
\zeta_n':=(\varepsilon_{r_n+1}',\varepsilon_{r_n+2}',\dots,\varepsilon_n').
\]
Let $Q_{\zeta_n'}(t)$ denote $Q_{\zeta_n}(t)$ with $\zeta_n$ replaced by $\zeta_n'$. Conditionally on $\xi_n$, the processes $\{Q_{\zeta_n}(t),\,t\in\mathbb R\}$ and $\{Q_{\zeta_n'}(t),\,t\in\mathbb R\}$ are independent copies of each other. Let $\sigma_{r_n+1},\sigma_{r_n+2},\dots,\sigma_n$ be i.i.d. random variables, independent of $\zeta_n$, $\zeta_n'$ and $\xi_n$, with $P(\sigma_i=1)=P(\sigma_i=-1)=1/2$. For $t\in\mathbb R$, define
\[
Q_{\zeta_n}^\sigma(t):=\frac1{n-r_n}\sum_{i=r_n+1}^n\sigma_i\bigl[I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr],
\tag{2.3.25}
\]
and let $Q_{\zeta_n'}^\sigma(t)$ denote $Q_{\zeta_n}^\sigma(t)$ with $\zeta_n$ replaced by $\zeta_n'$. Then, conditionally on $\xi_n$, the processes $\{Q_{\zeta_n}^\sigma(t),\,t\in\mathbb R\}$ and $\{Q_{\zeta_n'}^\sigma(t),\,t\in\mathbb R\}$ are independent copies of each other. Denote
\[
\lambda_n:=\sqrt{\frac{a_n}{(n-r_n)\log(n-r_n)}}.
\]
(n-rn)10g(n—rn)' Now we begin to prove Lemma 2.3.3. Proof: For any 6 > 0, by (2.2.4) ( implied by (2.2.3)), in order to prove (2.3.24), for 40 any n > 0, it suffices to prove that P( sup {|Q(n(t )| > 17A,, )flA; )—> 0. (2.3.26) t6[0,1] By the continuity of g on [71,72], we know that g is bounded on [71, 72]. Denote t 90 = sup g(t) < 00. (2.3.27) 71S‘S72 Since {a,, r,, + 1 S i S n} are i.i.d. and independent of 5,, by the Chebyshev inequality, (2.3.1), (2.3.27) and (2.3.11), it follows that on A“ n ,, for each t 6 [0,1], P 0624(1)] > ’7 5 ) 4(n — rn) log (n - Tn)ElQ<.. [2 11 172 an 4log(n -—r,,) t+aa":c:g( s()ds f” . (2.3.28) 7720:: Since (2.3.11) and B; —> 0 (in (2.2.2)) shows that c; —> 0, combining this with (1.4.2), we obtain that for large enough n, 71 S an — C; S an '1‘ C; S ’72 — 1- (2.3.29) Thus, we see that for large enough 11: 71 S t + a,, — c; S t + a,, + c; S 72 for any t 6 [0,1]. Combining this with (2.3.28), we obtain that for large enough n and t6 [0,1] 41 _ n t t P(1¢2..(t)1>§). a.) 0,0, ’" >906 __ £15: (log (n - r,,))2 log r,, - n2 r,,afihn ' By (2.3.20)),the above bound tends to zero. Hence, we conclude that for all large n ,on A; ,, P (ch.(t)l > g). 4..) 31/2- 41 Therefore, by the symmetrization lemma (see Pollard (1984), p. 14), we have that For given 5,, we can see that the symmetric random process {an(t) — Qt; (t), 0 S for large enough n, on A; ,, n) S P ( SUP |Q<.(t) - Q<1.(t)|> g4... t€[0, 1] $1) ( SUP IQC. (t )I > 77)». t6[0 1] (2.3.30) t S 1) has the same distribution as {an (t) — Q2; (t), 0 S t S 1}. Thus we have that the upper bound in (2.3.30) is n) +P ( sup [Q2.(t )I > 2A,, n) t€[0, 1] .) . (2.3.31) for any giving 5,, it follows that =P(mmrrm— Qum>gt te[01] SP(prHM>ZM t€[0 1] =2P(SUP chn (t )l > 3A" t€[0 1] By (2.3.30) and (2.3.31), on .4“ 71,61 5)s4p(mmuu(n>§t te[01] P ( sup |Q<.(t)| > 174.. t6[0, 1] Integrate out over A; 6, P0133131] ch.(t )l >17)». )flA; ) S4P({ sup |Q2(t ZnAn}flAn ). 
For each $i$ ($r_n+1\le i\le n$), define
\[
Z_i:=\sigma_i\bigl[I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr].
\]
Then, by (2.3.25), $(n-r_n)Q_{\zeta_n}^\sigma(t)=\sum_{i=r_n+1}^nZ_i$ is a sum of $n-r_n$ conditionally independent random variables, given $\xi_n$ and $\zeta_n$, with conditional mean zero, and, for each $i$ ($r_n+1\le i\le n$),
\[
-\bigl|I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr|
\le Z_i\le
\bigl|I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr|.
\]
By Lemma 1.4.2 (the Hoeffding inequality), we conclude that
\[
\begin{aligned}
P\Bigl(|Q_{\zeta_n}^\sigma(t)|>\frac{\eta\lambda_n}4\;\Big|\;\xi_n,\zeta_n\Bigr)
&=P\Bigl(\Bigl|\sum_{i=r_n+1}^nZ_i\Bigr|>\frac{\eta\lambda_n(n-r_n)}4\;\Big|\;\xi_n,\zeta_n\Bigr)\\
&\le 2\exp\Bigl\{-\eta^2(n-r_n)a_n\Big/\Bigl[32\log(n-r_n)\sum_{i=r_n+1}^n\bigl|I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr|\Bigr]\Bigr\}.
\end{aligned}
\]
Next, we handle the sum in the denominator of the exponent. On $A_{n,\epsilon}^*$, for any $i$ ($r_n+1\le i\le n$), with $c_n^*$ defined in (2.3.11) and $n$ large enough that $c_n^*<a_n$ (implied by (2.3.20)), we obtain
\[
\bigl|I(t-a_n+d_n^*(X_i)<\varepsilon_i\le t+a_n+d_n^*(X_i))-I(t-a_n<\varepsilon_i\le t+a_n)\bigr|
\le I(t+a_n-c_n^*<\varepsilon_i\le t+a_n+c_n^*)+I(t-a_n-c_n^*<\varepsilon_i\le t-a_n+c_n^*).
\]
Hence, by the DKW inequality in Lemma 1.4.1, assumption (2.3.21), and (2.3.27), with probability one, for all large $n$ on $A_{n,\epsilon}^*$ the sum in the denominator is at most $4(n-r_n)(1+g_0)c_n^*$, so that
\[
P\Bigl(|Q_{\zeta_n}^\sigma(t)|>\frac{\eta\lambda_n}4\;\Big|\;\xi_n,\zeta_n\Bigr)
\le 2\exp\Bigl\{-\frac{\eta^2a_nr_n^{1/2}h_n^{1/2}}{128K_\epsilon^*(1+g_0)\log(n-r_n)\sqrt{\log r_n}}\Bigr\}.
\]
To deal with the supremum over $t\in[0,1]$, let $T_1<T_2<\dots<T_k$ denote the distinct ordered values among $\varepsilon_i-a_n-d_n^*(X_i)$, $r_n+1\le i\le n$, and $\varepsilon_i+a_n-d_n^*(X_i)$, $r_n+1\le i\le n$, and define $T_0:=-\infty$, $T_{k+1}:=+\infty$. Then $k\le 2(n-r_n)$. Moreover, given $\xi_n$ and $\zeta_n$, $Q_{\zeta_n}^\sigma(t)\equiv Q_{\zeta_n}^\sigma(T_i)$ for any $t\in[T_i,T_{i+1})$, $0\le i\le k$. Therefore, for any given $\xi_n$ and $\zeta_n$, $Q_{\zeta_n}^\sigma(t)$ takes at most $k+1$ distinct values for $t\in[0,1]$. Hence, using $k+1\le 2(n-r_n)+1\le 4(n-r_n)$, we have
\[
\begin{aligned}
P\Bigl(\sup_{t\in[0,1]}|Q_{\zeta_n}^\sigma(t)|>\frac{\eta\lambda_n}4\;\Big|\;\xi_n,\zeta_n\Bigr)I(A_{n,\epsilon}^*)
&\le 4(n-r_n)\exp\Bigl\{-\frac{\eta^2a_nr_n^{1/2}h_n^{1/2}}{128K_\epsilon^*(1+g_0)\log(n-r_n)\sqrt{\log r_n}}\Bigr\}\\
&\le 4\exp\Bigl\{-\Bigl[\frac{\eta^2a_nr_n^{1/2}h_n^{1/2}}{128K_\epsilon^*(1+g_0)[\log(n-r_n)]^2\sqrt{\log r_n}}-1\Bigr]\log(n-r_n)\Bigr\},
\end{aligned}
\]
for all $n$ large enough, with probability one. Since (2.3.20) implies
\[
\Bigl\{\frac{\eta^2a_nr_n^{1/2}h_n^{1/2}}{128K_\epsilon^*(1+g_0)[\log(n-r_n)]^2\sqrt{\log r_n}}-1\Bigr\}\log(n-r_n)\to\infty,
\]
we obtain that the above upper bound tends to $0$.
Thus, taking expectations over $\xi_n$ and $\zeta_n$, and using the Lebesgue Dominated Convergence Theorem together with the independence of $\xi_n$ and $\zeta_n$, we obtain
\[
P\Bigl(\Bigl\{\sup_{t\in[0,1]}|Q_{\zeta_n}^\sigma(t)|>\frac{\eta\lambda_n}4\Bigr\}\cap A_{n,\epsilon}^*\Bigr)\to 0.
\]
Combining this with (2.3.32), we have completed the proof of the claim (2.3.24). This also completes the proof of Lemma 2.3.3.

Now we begin the proof of Theorem 2.3.2. In view of (2.3.3) and Lemma 2.3.2, it suffices to prove
\[
a_n\lambda_n^{-1}\sup_{t\in[0,1]}\Bigl|\frac{I_{1n}(t)}{\sqrt{g(t)}}\Bigr|\to 0,\quad\text{in probability,}
\qquad\text{and}\qquad
a_n\lambda_n^{-1}\sup_{t\in[0,1]}\Bigl|\frac{I_{3n}(t)}{\sqrt{g(t)}}\Bigr|\to 0,
\]
with $a_n\lambda_n^{-1}=\sqrt{(n-r_n)a_n\log(n-r_n)}$. Since $g$ is continuous and positive on $[\gamma_1,\gamma_2]$, $1/g$ is bounded on $[\gamma_1,\gamma_2]$. So, in order to prove the claim, it suffices to show
\[
a_n\lambda_n^{-1}\sup_{t\in[0,1]}|I_{1n}(t)|\to 0,\quad\text{in probability,}
\tag{2.3.34}
\]
and
\[
a_n\lambda_n^{-1}\sup_{t\in[0,1]}|I_{3n}(t)|\to 0.
\tag{2.3.35}
\]
(i) First, we use Lemma 2.3.3 to show (2.3.34). For any $\epsilon>0$, by (2.2.3), it suffices to show (2.3.34) on $A_{n,\epsilon}^*$ only. By the definition of $I_{1n}(t)$ in (2.3.3), it follows that
\[
a_n\lambda_n^{-1}\sup_{t\in[0,1]}|I_{1n}(t)|
\le\frac1{2\lambda_n}\sup_{t\in[0,1]}|Q_{\zeta_n}(t)|
+\frac12\sqrt{\frac{\log(n-r_n)}{(n-r_n)a_n}}\sup_{t\in[0,1]}\sum_{i=r_n+1}^n\bigl|G(t+a_n+d_n^*(X_i))-G(t-a_n+d_n^*(X_i))-G(t+a_n)+G(t-a_n)\bigr|.
\tag{2.3.36}
\]
Since the conditions of Lemma 2.3.3 are satisfied under the assumptions of Theorem 2.3.2, by (2.3.24) the first term on the right side of (2.3.36) tends to zero, i.e.,
\[
\frac1{2\lambda_n}\sup_{t\in[0,1]}|Q_{\zeta_n}(t)|=o_p(1).
\]
As to the second term, arguing as in the proof of (2.3.4), on $A_{n,\epsilon}^*$ it is bounded above by
\[
\frac12\sqrt{\frac{(n-r_n)\log(n-r_n)}{a_n}}\,\sup_{t\in[0,1]}\int_{-c_n^*}^{c_n^*}\bigl|g(t+s+a_n)-g(t+s-a_n)\bigr|\,ds.
\]
Since $a_n\to 0$ follows from (2.3.19) and $\beta_n^*\to 0$ from (2.2.2), relation (2.3.29) continues to hold for all large $n$. Hence, for any $t\in[0,1]$, we have $\gamma_1\le t+c_n^*-a_n<t+c_n^*+a_n\le\gamma_2$ for all large $n$. By the continuity of $g$ and the boundedness of $g'/(2g^{1/2})$ on $[\gamma_1,\gamma_2]$, it is easy to see that $g$ is Lipschitz on $[\gamma_1,\gamma_2]$.
Thus, there exists a constant $c$ ($0<c<\infty$) such that
\[
|g(t_1)-g(t_2)|\le c|t_1-t_2|,\qquad\forall\,t_1,t_2\in[\gamma_1,\gamma_2].
\tag{2.3.37}
\]
Hence, for all large $n$, the second term on the right side of (2.3.36) is bounded above by
\[
\frac12\sqrt{\frac{(n-r_n)\log(n-r_n)}{a_n}}\int_{-c_n^*}^{c_n^*}2ca_n\,ds
=2c\,c_n^*\sqrt{(n-r_n)a_n\log(n-r_n)}\longrightarrow 0,
\]
by (2.3.22). Therefore, by (2.3.36), we have completed the proof of (2.3.34).

(ii) It remains to show (2.3.35). By (2.3.29) and (2.3.37), for large enough $n$ it follows that
\[
\begin{aligned}
a_n\lambda_n^{-1}\sup_{t\in[0,1]}|I_{3n}(t)|
&\le\sqrt{(n-r_n)a_n\log(n-r_n)}\,\frac1{2a_n}\sup_{t\in[0,1]}\int_{t-a_n}^{t+a_n}|g(s)-g(t)|\,ds\\
&\le\sqrt{(n-r_n)a_n\log(n-r_n)}\,\frac c{2a_n}\sup_{t\in[0,1]}\int_{t-a_n}^{t+a_n}|s-t|\,ds
=\frac c2\sqrt{(n-r_n)a_n^3\log(n-r_n)}.
\end{aligned}
\]
And, by (2.3.19), one checks that $\sqrt{(n-r_n)a_n^3\log(n-r_n)}\to 0$. Thus we have shown (2.3.35), and the proof is complete. An application of Theorem 2.3.2 will be discussed briefly in Chapter 3.

2.3.3 Asymptotic distribution of $G_n^*$

For $U_n^*(t)$ and $U_{n\cdot}(t)$ defined in (2.1.2), we first show that
\[
\sqrt{n-r_n}\,|U_n^*(t)-U_{n\cdot}(t)|\to 0,\quad\text{in probability.}
\tag{2.3.38}
\]

Lemma 2.3.4 Suppose that $g$ satisfies the local Lipschitz condition (2.3.9) at $t$, for some $t\in\mathbb R$. Then, under the assumptions (2.2.2) and (2.2.3), (2.3.38) holds.

Proof: By (2.2.3), to show (2.3.38) it suffices to prove that for any $\eta>0$,
\[
P\bigl(\{|U_n^*(t)-U_{n\cdot}(t)|>\eta/\sqrt{n-r_n}\}\cap A_{n,\epsilon}^*\bigr)\to 0.
\tag{2.3.39}
\]
We continue to use $c_n^*$ defined in (2.3.11). Since $\{\varepsilon_i,\ r_n+1\le i\le n\}$ are i.i.d. and independent of $\xi_n$, by the Chebyshev inequality we obtain that on $A_{n,\epsilon}^*$,
\[
P\bigl(|U_n^*(t)-U_{n\cdot}(t)|>\eta/\sqrt{n-r_n}\;\big|\;\xi_n\bigr)
\le\frac{E[|U_n^*(t)-U_{n\cdot}(t)|^2\mid\xi_n]}{\eta^2/(n-r_n)}
\le\frac1{\eta^2}\int_{t-c_n^*}^{t+c_n^*}g(s)\,ds.
\]
Thus, by (2.2.2) and the continuity of $g$ at $t$, we obtain
\[
P\bigl(|U_n^*(t)-U_{n\cdot}(t)|>\eta/\sqrt{n-r_n}\;\big|\;\xi_n\bigr)\to 0.
\]
Integrating out over $A_{n,\epsilon}^*$ and using the Lebesgue Dominated Convergence Theorem, we obtain (2.3.39). This finishes the proof of the lemma.

By the classical Central Limit Theorem, we know that, in distribution,
\[
\sqrt{n-r_n}\,U_{n\cdot}(t)=\frac1{\sqrt{n-r_n}}\sum_{i=r_n+1}^n\bigl[I(\varepsilon_i\le t)-G(t)\bigr]\to N\bigl(0,\,G(t)(1-G(t))\bigr).
\tag{2.3.40}
\]
So, combining this with Lemma 2.3.4 and the fact that on $A_{n,\epsilon}^*$,
\[
\sqrt{n-r_n}\,\Bigl|\frac1{n-r_n}\sum_{i=r_n+1}^nG(t+d_n^*(X_i))-G(t)\Bigr|
\le\sqrt{n-r_n}\;c_n^*\sup_{|s-t|\le c_n^*}g(s),
\tag{2.3.41}
\]
we have the following asymptotic normality.

Theorem 2.3.3 Under the conditions of Lemma 2.3.4 and the assumption
\[
\lim\frac{(n-r_n)\log r_n}{r_nh_n}=0,
\tag{2.3.42}
\]
we have
\[
\sqrt{n-r_n}\,\bigl[G_n^*(t)-G(t)\bigr]\to N\bigl(0,\,G(t)(1-G(t))\bigr),\quad\text{in distribution.}
\]
Example: $r_n=n-n^{2/3}$ and $h_n=n^{-1/4}$ provide an example of sequences $r_n$ and $h_n$ satisfying the assumptions of the above theorem.

Now we consider the asymptotic distribution of $\sqrt{n-r_n}\,[G_n^*(t)-G(t)]$, viewed as a random function. First, using the double symmetrization technique again, we can show the following uniform closeness of $U_n^*$ and $U_{n\cdot}$.

Lemma 2.3.5 Suppose that $g$ is bounded on $\mathbb R$. Then, under the assumptions (2.2.3) and (2.3.21), and the condition that, for some $\alpha$: $0<\alpha<1$,
\[
\lim\frac{r_n^\alpha h_n}{\log r_n\,\log^2(n-r_n)}=\infty,
\tag{2.3.43}
\]
we have
\[
\sqrt{n-r_n}\,\sup_{t\in\mathbb R}|U_n^*(t)-U_{n\cdot}(t)|\to 0,\quad\text{in probability.}
\tag{2.3.44}
\]
To prove this lemma, we continue to use the notation introduced in the proof of Lemma 2.3.3. We also define
\[
P_{\zeta_n}(t):=U_n^*(t)-U_{n\cdot}(t),\qquad
P_{\zeta_n}^\sigma(t):=\frac1{n-r_n}\sum_{i=r_n+1}^n\sigma_i\bigl[I(\varepsilon_i\le t+d_n^*(X_i))-I(\varepsilon_i\le t)\bigr].
\tag{2.3.45}
\]
Let $P_{\zeta_n'}(t)$ denote $P_{\zeta_n}(t)$ with $\zeta_n$ replaced by $\zeta_n'$, and let $P_{\zeta_n'}^\sigma(t)$ denote $P_{\zeta_n}^\sigma(t)$ with $\zeta_n$ replaced by $\zeta_n'$. Then, conditionally on $\xi_n$, the processes $\{P_{\zeta_n}(t),\,t\in\mathbb R\}$ and $\{P_{\zeta_n'}(t),\,t\in\mathbb R\}$ are independent copies of each other, while $\{P_{\zeta_n}^\sigma(t),\,t\in\mathbb R\}$ and $\{P_{\zeta_n'}^\sigma(t),\,t\in\mathbb R\}$ are independent copies of each other.

Now we begin the proof of Lemma 2.3.5.

Proof: By (2.2.3), in order to prove Lemma 2.3.5 it is sufficient to show that for any $\eta>0$,
\[
P\Bigl(\Bigl\{\sup_{t\in\mathbb R}|P_{\zeta_n}(t)|>\eta/\sqrt{n-r_n}\Bigr\}\cap A_{n,\epsilon}^*\Bigr)\to 0.
\tag{2.3.46}
\]
Since $\{\varepsilon_i,\ r_n+1\le i\le n\}$ are i.i.d. and independent of $\xi_n$, by the Chebyshev inequality, the boundedness of $g$, and (2.3.11), it follows that on $A_{n,\epsilon}^*$, for each $t\in\mathbb R$,
\[
P\Bigl(|P_{\zeta_n}(t)|>\frac{\eta}{2\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)
\le\frac{4(n-r_n)E[|P_{\zeta_n}(t)|^2\mid\xi_n]}{\eta^2}
\le\frac4{\eta^2}\int_{t-c_n^*}^{t+c_n^*}g(s)\,ds
\le\frac{8\,c_n^*}{\eta^2}\sup_{s\in\mathbb R}g(s)\to 0,
\]
where we use $c_n^*\to 0$, implied by (2.3.43).
So we obtain that on $A_{n,\epsilon}^*$, for all large $n$,
\[
P\Bigl(|P_{\zeta_n}(t)|>\frac{\eta}{2\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)\le\frac12.
\]
Therefore, by the symmetrization lemma (see Pollard (1984), p. 14), on $A_{n,\epsilon}^*$, for large enough $n$,
\[
P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}(t)|>\frac{\eta}{\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)
\le 2P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}(t)-P_{\zeta_n'}(t)|>\frac{\eta}{2\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr).
\tag{2.3.47}
\]
For any given $\xi_n$, the symmetric random process $\{P_{\zeta_n}(t)-P_{\zeta_n'}(t),\,t\in\mathbb R\}$ has the same distribution as $\{P_{\zeta_n}^\sigma(t)-P_{\zeta_n'}^\sigma(t),\,t\in\mathbb R\}$. Thus,
\[
\begin{aligned}
P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}(t)-P_{\zeta_n'}(t)|>\frac{\eta}{2\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)
&=P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}^\sigma(t)-P_{\zeta_n'}^\sigma(t)|>\frac{\eta}{2\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)\\
&\le 2P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}^\sigma(t)|>\frac{\eta}{4\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr).
\end{aligned}
\tag{2.3.48}
\]
By (2.3.47) and (2.3.48), on $A_{n,\epsilon}^*$, for any given $\xi_n$,
\[
P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}(t)|>\frac{\eta}{\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr)
\le 4P\Bigl(\sup_{t\in\mathbb R}|P_{\zeta_n}^\sigma(t)|>\frac{\eta}{4\sqrt{n-r_n}}\;\Big|\;\xi_n\Bigr).
\]
Integrating out over $A_{n,\epsilon}^*$,
\[
P\Bigl(\Bigl\{\sup_{t\in\mathbb R}|P_{\zeta_n}(t)|>\frac{\eta}{\sqrt{n-r_n}}\Bigr\}\cap A_{n,\epsilon}^*\Bigr)
\le 4P\Bigl(\Bigl\{\sup_{t\in\mathbb R}|P_{\zeta_n}^\sigma(t)|>\frac{\eta}{4\sqrt{n-r_n}}\Bigr\}\cap A_{n,\epsilon}^*\Bigr).
\tag{2.3.49}
\]
For each $i$ ($r_n+1\le i\le n$), define
\[
Z_i:=\sigma_i\bigl[I(\varepsilon_i\le t+d_n^*(X_i))-I(\varepsilon_i\le t)\bigr].
\]
Then $(n-r_n)P_{\zeta_n}^\sigma(t)=\sum_{i=r_n+1}^nZ_i$ is a sum of $n-r_n$ conditionally independent random variables, given $\xi_n$ and $\zeta_n$, with conditional mean zero and the following property:
And define To z —00, Th“ 2 +00. Then we see that k S 2(n — rn). It also follows that given R. and (,,, Pg" (t) E Pgn(T,-) for any t 6 [Th T,+1),O S i S 1:. Therefore, for any given Ru and (,,, Pg" (t) can be at most 2(n —— r,,) + 1 different random variables for t E R. Hence, using the fact that k S 2(n — r,,) + 1 S 4(n — r,,), we have that for all n large enough and with probability 1, P (sup IPgn(t)l > #- €11,471) [(An,c) tER 4 (”_Tn) 2 ‘2' l nrnhn2 <4 — n - — (" T)BXP{ 128K;(1+gg),/Io"g'—r,,} 2 3' l nrnhnz <4 — _1 l — n o ‘ exp] [128K:(1+ga)uog(n—ram—“om. PM" U} (2.3.50) 52 Since (2.3.43) implies that {n2r§h.%/[128K:(1+ gonog (n — MHM] — 1} log (n — r.) —+ co, we obtain that the up bound in (2.3.50) goes to 0. Thus, take expectations over 6,, and (,,, and use the Lebesgue Dominated Convergence Theorem and the independence of {n and (,,, to obtain 77 P su P’t >—————— A:e —->0. ({ tel?” C"( )I 4\/(n— rn)}n ’) Combining this with (2.3.49), we have finished the proof of the claim (2.3.46). There- fore, we have finished the proof of Lemma 2.3.5. By the Donsker theorem in Van der Vaart (1998) (pp 266), we know that the sequence of empirical processes ,/(n - r,,)Un. converges in distribution to a Brownian bridge with respect to G. Combining this with Lemma 2.3.5, we see that under the conditions of Lemma 2.3.5, we have \/(n — r,,)U; => B(G), where B is a Brownian bridge over [0,1]. Using (2.3.41) and (2.3.42), we can obtain that 1 n—rn Z G(t + d;(X,-)) — G(t) —+ o, in probability. i=rn+l sup (n — rn) tER Therefore, we have the following asymptotic distribution of the process V (n - Tn)(G:i - G) Theorem 2.3.4 Under the conditions of Lemma 2.3.5 and assumption (2.3.42), we obtain (n — r,,)(G; — G) => 3(0). 53 Chapter 3 Applications on Testing the Goodness-of-Fit of the Error Distribution Using the asymptotic distributions of M; (or 0;), we can test the goodness-of-fit of a specified error density function g0 (or d.f. Go). 
(i) A test statistic for the hypothesis $H\colon g=g_0$. To test $H\colon g=g_0$, it is natural to compute $M_n^*$ with $g=g_0$ and to reject for large values of the statistic. According to Theorem 2.3.2, in order to obtain asymptotic level $\alpha$ we should use the cutoff point
\[
c(\alpha):=\frac{e_n}{\sqrt2}-\frac{\log|\log(1-\alpha)|-\log 2}{\sqrt2\,[2\delta\log(n-r_n)]^{1/2}}.
\]
(ii) A test statistic for the hypothesis $H\colon G=G_0$. Theorem 2.3.4 shows that $\sqrt{n-r_n}\,(G_n^*-G)\Rightarrow B(G)$. Compute $G_n^*-G_0$ and use the asymptotic distribution of $B(G_0)$ to test $H\colon G=G_0$.

Bibliography

[1] Bickel, P.J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist., 1, 1071-1095.

[2] Billingsley, P. (1968). Convergence of Probability Measures. John Wiley, New York.

[3] Bosq, D. (1996). Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New York.

[4] Boldin, M.V. (1982). Estimation of the distribution of noise in an autoregression scheme. Theory Probab. Appl., 27, 866-871.

[5] Cheng, Fuxia (2001a). Consistency of error density and distribution function estimation in nonparametric regression. Accepted by Statistics and Probability Letters.

[6] Cheng, Fuxia (2001b). Weak and strong uniform consistency of a kernel error density estimate in nonparametric regression. Revised for Journal of Statistical Planning and Inference.

[7] Devroye, L. (1983). The equivalence of weak, strong and complete convergence in L1 for kernel density estimates. Ann. Statist., 11, 896-904.

[8] Devroye, L.P. and Wagner, T.J. (1980). The strong uniform consistency of kernel density estimates. In: Multivariate Analysis V, 59-77.

[9] Dvoretzky, A., Kiefer, J. and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist., 27, 642-669.

[10] Fabian, V. and Hannan, J. (1985). Introduction to Probability and Mathematical Statistics. John Wiley & Sons, New York.

[11] Härdle, W., Janssen, P. and Serfling, R. (1988).
Strong uniform consistency rates for estimators of conditional functionals. Ann. Statist., 16, 1428-1449.

[12] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58, 13-30.

[13] Kallenberg, O. (1997). Foundations of Modern Probability. Springer-Verlag, New York.

[14] Koul, H.L. (1970). Some convergence theorems for ranks and weighted empirical cumulatives. Ann. Math. Statist., 41, 1768-1773.

[15] Koul, H.L. (1977). Behavior of robust estimators in the regression model with dependent errors. Ann. Statist., 5, 681-699.

[16] Koul, Hira L. (1991). A weak convergence result useful in robust autoregression. J. Statist. Planning and Inference, 29, 291-308.

[17] Koul, Hira L. (1992). Weighted Empiricals and Linear Models. Lecture Notes-Monograph Series, 21, Institute of Mathematical Statistics, Hayward, California.

[18] Koul, Hira L. (1996). Asymptotics of some estimators and sequential empiricals in non-linear time series. Ann. Statist., 24, 380-404.

[19] Koul, Hira L. and Ossiander, M. (1994). Weak convergence of randomly weighted residual empiricals with application to autoregression. Ann. Statist., 22, 540-562.

[20] Loynes, R.M. (1980). The empirical d.f. of residuals from generalized regression. Ann. Statist., 8, 285-298.

[21] Mack, Y.P. and Silverman, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 61, 405-415.

[22] Massart, P. (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 18, 1269-1283.

[23] Mammen, E. (1996). Empirical process of residuals for high-dimensional linear models. Ann. Statist., 24, 307-335.

[24] Nadaraya, E.A. (1964). On estimating regression. Theory Probab. Appl., 9, 141-142.

[25] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York.

[26] Portnoy, S.
(1986). Asymptotic behavior of the empirical distribution of M-estimated residuals from a regression model with many parameters. Ann. Statist., 14, 1152-1170.

[27] Serfling, R.J. (1983). Properties and applications of metrics on nonparametric density estimators. In: Proc. International Colloquium on Nonparametric Statistical Inference, Budapest. North-Holland, 859-873.

[28] Van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes. Springer-Verlag, New York.

[29] Van der Vaart, A.W. (1998). Asymptotic Statistics. Cambridge University Press.