This is to certify that the dissertation entitled

Strong Consistency and Bahadur Type Expansions of a Class of Minimum Distance Estimators in Linear Regression

presented by Zhiwei Zhu has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

STRONG CONSISTENCY AND BAHADUR TYPE EXPANSIONS OF A CLASS OF MINIMUM DISTANCE ESTIMATORS IN LINEAR REGRESSION

By Zhiwei Zhu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1993

ABSTRACT

Let $p \ge 1$ be an integer, $F$ be a distribution function (d.f.) on the real line $R$ and $\{\epsilon_i\}$, $1 \le i \le n$, be independent and identically distributed (i.i.d.) $F$ random variables (r.v.'s). Consider the linear regression model

\[ Y_{ni} = x_{ni}'\beta + \epsilon_i, \quad 1 \le i \le n, \]

where $x_{ni}'$ is the $i$th row of the known $n \times p$ design matrix $X_n$, $1 \le i \le n$, and $\beta$ is the regression parameter vector of interest of dimension $p \times 1$. For a nondecreasing right continuous function $H$ from $R$ to $R$, Koul & DeWet (1983) defined a minimum distance estimator $\hat\beta$ of $\beta$ as

\[ \hat\beta = \operatorname{argmin}_b T(b), \]

where, for $b \in R^p$,

\[ T(b) = \int \Big\| \sum_{i=1}^n d_{ni}\{I(Y_{ni} \le y + x_{ni}'b) - F(y)\} \Big\|^2 dH(y). \]

When $F$ is unknown but symmetric around 0, Koul (1985) defined a similar estimator $\beta^+$ of $\beta$ as

\[ \beta^+ = \operatorname{argmin}_b T^+(b), \]

where, for $b \in R^p$,

\[ T^+(b) = \int \Big\| \sum_{i=1}^n d_{ni}\{I(Y_{ni} \le y + x_{ni}'b) - I(-Y_{ni} < y - x_{ni}'b)\} \Big\|^2 dH(y). \]
In both papers, the authors discussed the asymptotic normality of these estimators. The estimator $\hat\beta$ provides the right extension of the one sample minimum distance estimation methodology of Wolfowitz (1957) to the linear regression setup.

This thesis analyzes the strong asymptotic behavior of these estimators. In the first part (Chapter 2), some inequalities about weighted and centered empirical processes are developed. In the second part (Chapter 3), strong consistency of the above mentioned estimators is proved under different sets of conditions. Finally, in the third part (Chapter 4), a Bahadur type expansion of $\beta^+$ is given, using the results from the first part.

To my parents and my wife

ACKNOWLEDGMENTS

I wish to express my sincere thanks to my advisor Professor Hira L. Koul for his patient guidance and continuous encouragement during the preparation of this dissertation. I would also like to thank Professors Habib Salehi, James Hannan, and James Stapleton for serving on my guidance committee. My special thanks are due to Professor James Hannan for his very useful comments, which led to the improvement of this dissertation.

Finally, I express my deep appreciation to my wife for her stupendous encouragement and support of my studies.

Contents

1 Introduction
2 Tail Probability Inequalities
2.1 Introduction
2.2 Results under sup-norm
2.3 Results under $L_2$-norm
3 Strong Consistency
3.1 Introduction
3.2 Main results and proofs
4 Bahadur Expansion
4.1 Main result and proof

Chapter 1

Introduction

The study of minimum distance (MD) estimation of a parameter can be traced back to that of least squares (LS) estimation.
Other examples of classical MD estimation are, for instance, the least absolute deviation (LAD) and the least chi-square (LCS) estimations. In these methods, estimators are obtained by minimizing some type of distance function relating the data to the parameters to be estimated. However, it was in the 1950s that Wolfowitz first explicitly employed the concept of MD estimation, estimating a parameter by minimizing a distance between an empirical distribution function (d.f.) and the modeled parametric family of d.f.'s. As Millar (1981) commented, Wolfowitz took Neyman's idea on minimum chi-square and elevated it to a general principle.

In his work, Wolfowitz (1953, 1954, 1957) demonstrated that the MD method not only could be used in a wide range of problems but also yielded strongly consistent estimators even when classical methods, like the maximum likelihood method, sometimes failed to give a consistent estimator. Wolfowitz's work drew attention to MD estimation. Blackman (1955) and Bolthausen (1977) studied the asymptotic normality of some MD estimators. Pollard (1980) worked on hypothesis testing problems with MD estimators. Beran (1977, 1978, 1982), Parr & Schucany (1979), Millar (1981, 1982, 1984), and Donoho & Liu (1988a, 1988b) investigated various robustness and local asymptotic minimaxity properties of a large class of MD estimators. Most of these authors worked on the one sample or the two sample location models and found that the MD estimators corresponding to $L_2$-distances are generally more robust against certain gross errors than the ones corresponding to the supremum distance. A bibliography of the work on MD estimation up to 1980 can be found in Parr (1981).

The above mentioned MD methodology was extended to estimating parameters in linear regression models by Koul (1979, 1980, 1985a, 1985b), Williamson (1979, 1982), and Koul & DeWet (1983).
These authors successfully established the asymptotic distributions of a class of MD estimators which minimize some Cramér-von Mises type distances. A systematic presentation of the work in this field can be found in Koul (1992).

This thesis is concerned with the strong consistency, the rates of convergence, and the Bahadur type representations of the class of MD estimators defined by Koul & DeWet (1983) and by Koul (1985a, 1985b). A special case of the study can be seen in Koul & Zhu (1991). It is known that the almost sure convergence rate and Bahadur type expansion of an estimator provide a deeper understanding of its large sample behavior. They are also important in using the given estimator in sequential analysis.

We shall now describe the estimators studied in this thesis in more detail. Let $X_n$, $n \ge 1$, be a sequence of r.v.'s and $a_n$, $n \ge 1$, a sequence of real numbers. We write '$X_n = O(a_n)$' if $\limsup_{n\to\infty} |X_n|/|a_n| \le M$ a.s. for some $0 < M < \infty$ and '$X_n = o(a_n)$' if $\limsup_{n\to\infty} |X_n|/|a_n| = 0$ a.s. Further, we write '$X_n < a_n$ wpln' if $\limsup_{n\to\infty} X_n/a_n < 1$ a.s.

Let $p \ge 1$ be an integer, $F$ be a d.f. on the real line $R$, and $\{\epsilon_i, 1 \le i \le n\}$ be independent and identically distributed (i.i.d.) random variables (r.v.'s) with the common d.f. $F$. Consider the $p$-dimensional linear regression model

\[ Y_{ni} = x_{ni}'\beta + \epsilon_i, \quad 1 \le i \le n, \tag{1.1} \]

where $x_{ni}'$, $1 \le i \le n$, is the $i$th row of a known real $n \times p$ design matrix $X_n$ and $\beta$ is the parameter $p$-vector to be estimated.

With respect to (1.1), define a weighted empirical process corresponding to an $n \times p$ real weight matrix $D_n$, of which $d_{ni}'$ is the $i$th row, as

\[ V_D(y,b) = \sum_{i=1}^n d_{ni} I(Y_{ni} \le y + x_{ni}'b), \quad y \in R,\ b \in R^p, \tag{1.2} \]

where $I$ is the standard zero-one valued indicator function. Further, define a Cramér-von Mises type distance $(T(\cdot))^{1/2}$ between $V_D(y,b)$ and the expectation of $V_D(y,\beta)$ as

\[ T(b) = \int \| V_D(y,b) - EV_D(y,\beta) \|^2 dH(y) = \int \Big\| \sum_{i=1}^n d_{ni}\{I(Y_{ni} \le y + x_{ni}'b) - F(y)\} \Big\|^2 dH(y), \tag{1.3} \]

where $E$ is the expectation under (1.1), $\|\cdot\|$ is the Euclidean norm, and $H$ is a given nondecreasing right continuous function.

When $p = 1$, Koul & DeWet (1983) defined an MD estimator $\hat\beta$ of $\beta$ as a minimizer of the function $T$, assuming $F$ is known. Their motivation for this definition is similar to that of the LS estimator: the term $V_D(y,b) - EV_D(y,\beta)$ in the integrand of $T(\cdot)$ has mean 0 when $b$ equals the true parameter $\beta$. Observe that (1.3) actually defines a class of $T$ functions, one corresponding to each $H$ and $D_n$. Therefore, a class of estimators $\hat\beta$ of $\beta$ is obtained upon choosing different $H$'s and $D_n$'s in (1.3).

For the one dimensional case, i.e. when $p = 1$, and when $F$ is known, Koul & DeWet (1983) studied some finite sample properties, the asymptotic distribution, and the asymptotic efficiency of $\hat\beta$. This study was later extended by Koul (1985a, b) to multiple linear regression models in which the errors could be either i.i.d. $F$ with $F$ being a known d.f., or independent with unknown d.f.'s $F_i$'s which are symmetric around a common point. When $F$ is known, the definition of $\hat\beta$ is as above. In the case when the error d.f.'s $F_i$ are unknown but symmetric about a common point, assuming without loss of generality that the common point of symmetry of the $F_i$'s is 0, Koul (1985a, b) defined an MD estimator $\beta^+$ of $\beta$ as a minimizer of $T^+(\cdot)$, where, for $b \in R^p$,

\[ T^+(b) = \int \Big\| \sum_{i=1}^n d_{ni}\{I(Y_{ni} \le y + x_{ni}'b) - I(-Y_{ni} < y - x_{ni}'b)\} \Big\|^2 dH(y). \tag{1.4} \]

The robustness of both $\hat\beta$ and $\beta^+$ was also discussed in Koul's papers.

According to Koul (1985a), among the estimators obtained from choosing certain types of weight matrices $D_n$, the one corresponding to $D_n = X_n(X_n'X_n)^{-1/2}$ is asymptotically most efficient. Therefore, in the sequel, we take $D_n = X_n(X_n'X_n)^{-1/2}$ and consider the cases when $F_i = F$ with $F$ known and unknown. We use $\beta^*$ for either $\hat\beta$ or $\beta^+$.

According to Wolfowitz (1953, 1954, 1957), it is desirable to prove the strong consistency of the MD estimators $\beta^*$.
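As an illustration of definition (1.4), the following is a minimal sketch of how $T^+(b)$ could be evaluated in the one dimensional case $p = 1$. It assumes that $H$ is a step function with finitely many jumps, so that the integral against $dH$ reduces to a weighted sum. The function names `weights` and `t_plus` and the arguments `grid` and `jumps` are hypothetical and are not part of the thesis.

```python
import math


def weights(x):
    """Rows d_i = x_i / sqrt(sum_j x_j^2): the p = 1 case of D = X (X'X)^(-1/2)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def t_plus(b, x, y, grid, jumps):
    """Evaluate T+(b) of (1.4) for p = 1 under an assumed step function H.

    H is taken to jump by jumps[k] at grid[k] (an illustrative assumption),
    so the integral against dH becomes a weighted sum over the grid points.
    """
    d = weights(x)
    total = 0.0
    for yk, wk in zip(grid, jumps):
        s = 0.0
        for di, xi, yi in zip(d, x, y):
            # I(Y_i <= y + x_i b) - I(-Y_i < y - x_i b)
            s += di * ((yi <= yk + xi * b) - (-yi < yk - xi * b))
        total += wk * s * s
    return total


# Toy data: true slope 2 with errors +1 and -1, which are symmetric around 0.
x = [1.0, 1.0]
y = [3.0, 1.0]
grid = [-2.0, -0.5, 0.0, 0.5, 2.0]
jumps = [1.0] * len(grid)

# Crude minimization over a few candidate values of b.
candidates = [1.0, 2.0, 3.0]
b_plus = min(candidates, key=lambda b: t_plus(b, x, y, grid, jumps))
```

In this toy example the residuals at $b = 2$ are exactly symmetric around 0, so the two indicator sums cancel at every grid point and the objective vanishes there. The search over a short candidate list only stands in for a proper minimization.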
The first goal of this thesis is to give appropriate conditions under which $\beta^*$ is strongly consistent. We in fact prove, under certain conditions, that

\[ \| \beta^* - \beta \| = O(\gamma_n), \]

where $\{\gamma_n\}$ is a sequence of real numbers which depends only on the design matrices $X_n$'s and converges to 0 for a wide choice of $X_n$'s.

In 1966, Bahadur obtained linear expansions for sample quantiles as estimators of the population quantiles. It is known that the Bahadur expansion is very useful in sequential analysis. Ghosh (1971) and Ghosh & Sukhatme (1974) weakened Bahadur's conditions and obtained similar expansions for sample quantiles in terms of convergence in probability. Haan & Taconis-Haantjes (1979) further extended Ghosh & Sukhatme's work and also obtained Bahadur's result under slightly weaker conditions than Bahadur's. Others also obtained results similar to Bahadur's for other estimators. An important example is in Babu (1989), where it is shown that the least absolute deviation estimator of the linear regression parameter has a Bahadur expansion.

The second goal of this thesis is to obtain Bahadur type expansions for the MD estimators $\beta^*$ defined above. We shall prove that

\[ \beta^* - \beta - \sum_{i=1}^n \phi_i = O(R_n), \]

where the $\phi_i$'s are independent random vectors and $\{R_n\}$ is a sequence of real numbers which converges to 0 at a rate depending on the choice of $X_n$'s. We call our expansions 'Bahadur type expansions' because the convergence rate $R_n$ is different from what Bahadur obtained in the one sample problem. See Chapter 4 for the details.

To reach our goals, we first study in Chapter 2 some properties of the weighted empiricals defined in (1.2). Some tail probability inequalities related to the sup- and $L_2$-norms of these empirical processes are obtained in Section 2.2. The inequality pertaining to the sup-norm extends an inequality of Ghosh & Sen (1972) from a simple linear regression model to the multiple linear regression model.
These inequalities are the fundamental tools used in the proofs of strong consistency and the Bahadur expansions of $\beta^*$. In Chapter 3, strong consistency and convergence rates of $\beta^*$ are discussed. According to our conclusions, many frequently used designs yield strongly consistent MD estimators $\beta^*$. Finally, in Chapter 4, we present the Bahadur type expansions of $\beta^*$ in detail.

Chapter 2

Tail Probability Inequalities

2.1 Introduction

In this chapter, we develop in Theorem 2.1 an exponential inequality for the tail probabilities of the centered weighted empirical processes (1.2). Then some large sample probability inequalities are obtained.

Ghosh & Sen (1972) also derived a similar inequality involving the sup-norm of certain weighted and centered empirical processes for their study on bounded length confidence intervals. In the following, we give a brief description of Ghosh & Sen's inequality because we are going to show that their result is actually covered by our Corollary 2.1.

For a sequence of real numbers $c_1, c_2, \ldots$, let

\[ c_{ni}^* = (c_i - \bar c_n)/C_n, \quad 1 \le i \le n, \]

where $\bar c_n = n^{-1}\sum_{i=1}^n c_i$ and $C_n^2 = \sum_{i=1}^n (c_i - \bar c_n)^2$. Also, let $\{Y_1, Y_2, \ldots\}$ be a sequence of i.i.d. r.v.'s having uniform distribution over $(0,1)$ and let $F$ be a d.f. on $R$. For $1 \le i \le n$, 0 [...] 0,

3. $F$ is an absolutely continuous d.f. for which the density function $f$ and its first derivative $f'$ are bounded a.e. under Lebesgue measure,

Ghosh & Sen proved that for every $h > 0$, there exist positive constants $K_1$, $K_2$ and $n^*$ (all of which may depend on $h$) such that for $n \ge n^*$, $k \ge 1$ and $0 < \delta < 1/4$,

\[ P\Big( \sup \sup |G_n^*(t,b) - G_n^*(t,0)| \ge K_1 n^{-\delta}(\ln n)^k \Big) \le K_2 n^{-h}, \tag{2.2} \]

[...]

(A) $\{a_n, n \ge 1\}$ and $\{b_n, n \ge 1\}$ are two sequences of real numbers such that $b_n \uparrow \infty$, $a_n \ge n^{-k_0}$ for some $0 < k_0 < \infty$, and $a_n b_n = o(1)$.

(B) The d.f. $F$ satisfies

\[ \sup_{y \in R} |F(y + \delta) - F(y)| \le M_0 |\delta|, \]

for $\delta \in R$ and some $M_0 < \infty$.
(C) {Cmn Z 1} is a sequence of n x p real matrices, with some fixed p 2 1, such that C;C,, = Ipxp, the identity matrix, ICnI Z n'k° for some 0 < lco < co, and ICnI b,, = 0(1) for some b7, 1’ 00. For any y,u E R, s,t E R", and Y = (Y1, - - - ,Yn)’ whose components are i.i.d. r.v’s with d.f. F, let I‘(Y.~Sy) = 'Syl—Ffil), I( , W.(y,us) = inl‘mswuso. (2.3) Note that the process Wt(-,us) reduces to the ordinary centered empirical process when 8 = 0 and t = (l/n, - - - ,1/n)’. We are now ready to state our results. Theorem 2.1 Assume that (B) holds. Given 6 > 1,111 < 00, h; < oo, 7 > 0, P( sup suprWt(y,us) — W¢(y,0)| Z 27 + 2M0 II s I)” t H a) l9l 0 andn 2 1, n 1.2 P X.- Z 1' S 2ex — n . ( g i ) p( 251:1 Var(X,-) + %m'r]) (See Serfling 1980, P95) D Lemma 2.2 . Let ¢1,¢2,¢1 and 1/22 be nondecreasing functions from R to R. Let (I) = ($1 —¢g and“? = $1 —1b2. Thenfor anyx E [a,b] C R, |<1’(x)- ‘I’(0)l S maX(|‘I’(a) - ‘I’(0)I , I‘M?) - ‘I’(0)l) + ¢2(b) - c5201), (16) and |‘1’($)- ‘I’($)I S maX(|‘1>(a) - ‘1’(b)|, I‘W’) - ‘1’(a)l) +¢2(b) + $201) - ¢2(a) — ¢2(a)- (2-7) Cl Proof of Theorem 2.1. For a real number a, use a+ and a" to refer to the positive and negative part of it, respectively. Similarly, for a vector t = (t1, . - . ,tn) 6 R”, let t+ = (t'f,- - - ,t:) and t" = (ti’,- - - ,t;). Then, t = t+ — t". According to (2.3) and by the triangle inequality, it is enough to prove (2.4) with the 32 on the RHS replaced by 16 for the case when t has nonnegative components. N ow, observe that mans) - Wt(y.0)l = item—y — u(—s.-) s —K- < —y) , where I " is defined as in (2.3). Thus, we can and hanceforth further restrict our proof of (2.4) with the 32 on the RHS replaced by 8 for the case that both t and s have nonnegative components. To simplify the notation, in this proof, we write I(31,11)= git-HY.- S y + "35): F(y,U) = item! + us.) 
i=1 i=1 W(y,U) = Wt(y,u8)- Hence, WW") 2 [(31,11) _ F(yau)' First, we prove that for any y E R, 0 S u S b, and r > 0, 2 P(|W(y,u) — W(y,0)l 2 T) s 2exp(—,(Mo M u [“2 H % m 1.)), (2.8) and for any yl < 312, 7.2 2(n t n: 6 + % utm)’ (2'9) P(IW(3/2,0) — W(y1,0)| 2 T) _<_ 2exp(_ where 6 = F(y2) —- F(y1). Let x.- 2 mm.- s y + u...) — NY.- 3 y)1. Then, W(y, u) — W(y,0) = n X; and i=1 EX;=0, IX,|St;S_ItI, ISiSn. Further, by (B), :Varm) S :tilfly + us.) — F(y)] _<_ M0 Isl u t “2 b. Hence, the Bernstein bound of Lemma 2.1 gives (2.8). Similarly, if we let Xi E ti[I.(Yi S yz) — 1‘0”} S 311)], then, W(y2,0) — W(y1,0) = n X,’ and i=1 EXizoi IXiIStiSItl, 131.3", ivaflxt‘) 5 itHFh/z) - F(y1)) = 6 II t H2 . i=1 i=1 10 Again, the Bernstein bound of Lemma 2.1 gives (2.9). Next, for a fixed a a, 0 < a S b'1 A (2Mohi/ 2)"1, construct a partition 0 = 770 < 171 < ° - - < 17,,, = b of the range of u such that r“ S bo'l. (2.10) The assumption (B) and (2.10) imply that, for 1 S r S r,,, iti(F(y + smr) - F(y + Sim-1)) i=1 M0 2 ti3i(’7r — nr-l) i=1 S Mollfilllltlla lF(y:’lr) _ F(y, 77r-1)l = |/\ M, say. (2.11) Therefore, by the nondecreasing property of I (y, u) and F(y,u) in u and (2.6) of Lemma 2.2, it follows from (2.11) that sup |W(y,u) - W(y,0)| S lgngx lW(y, v7.) - W(y,0)l + M. (212) OSqu —'—”" Now, for a fixed 17,, define My) = F (mm) + F(y,0), y e R. Then, g,(-) is nondecreasing and 0 S g, S 2 22;, t,- S 2n1/2 H t H. Choose a partition {-00 = {0 <61 < - - - < 6”,, = 00} of (—oo,oo) such that A03”) :2 97“”) - 9r(€u-1) S M, 1/2 1/2 S 2 II t II n 5 2n . M Mo || 8 || 0 By the nondecreasing property of F (y, 17,.) and I (y, 17,.) in y, (2.7) of Lemma 2.2, and (2.13), when y 6 [£v_1,£,,], we obtain ”n (2.13) IW(y) nr) — W(ya 0)| 11 S maX(IW(£u,nr) - W(£a—1,0)l , |W(€a—1,nr) - W(€a,0)l) + AU, v) < max |W(£.-,nr) - W(£s,0)l + |W(£U—lt 0) — WW, 0)I + M, (2'14) i=0— where the last inequality follows from the triangle ineqality. Hence, sup IW(y,1],.) 
— W(y’0)| |v|<°° ma{|W(£mnr)- W(£..0)I} + max {IW(£.,,0) — mason} + M. (2.15) lSuSun Combining (2.12) with (2.15) obtains sup sap |W(y,U)- W(y.0)| |y|0andM10 sup (F(y + 6) — F(y))2dH(y + a) S M162. lal1, A < oo, 0 < B < 00, no 2 1, and A < 00 such that for all /\ Z A and n 2 no, P((a,,b,,)'1|H(Y) — H_(Y - a..b..)| > A) g Aexp(—BX’), where Y is a r.v. with d.f. F and H. is the left limit of H. Also, / F(l — F)dH < oo. (2.25) The next theorem gives an analog of (2.20) for the Lg-norm || . ”H. This theorem holds for large n only. Theorem 2.2 . Let assumptions (A), (D), and (E) hold and b" 2 (1n n)1/2. Then, for any I: > 0 there exist a constant K < co and No 2 1 such that F(vsupn n W.(, us)— w.(.,0) “Hz [rd/25:”) g n-k, (2.26) holds for all a, t E R" satistying (2.21) and for all n 2 No. 16 To prove Theorem 2.2, the following lemma due to Bychkova (1986) is used. Lemma 2.3 . Let {X1n lc _>_ 1} be a sequence of random elements taking values in a Hilbert space such that EX)c = 0, k _>_ 1, with 0 being the 0-element of the Hilbert space. If P(" XI: ”1.2 x) S Aexp(—Bx‘1/(q—1)), where 1 < q < 2, A > 0, and B > 0, then for any sequence of real numbers {vm n 2 1} satisfying m 2 lvnla < 00 11:1 for an a E (q, 2], we have P(u 2: tax). ".2 a) s exp(—A.a°'/<°-‘>), k=l where Ad > 0 is a constant depending only on a and II - ”h is the norm defined on the Hilbert space. E] Proof of Theorem 2.2. Similar to the proof of Theorem 2.1, it suffices to prove (2.26) for t and 8 having nonnegative components. Let ’H be the Hilbert space defined by I] . "H. For 0 S u S bn, y E R, and 1 g i _<_ n, let X,-(y) = (a.b,)-l/2{P(Y.- 5 y + us.) — my. 3 y)}. (2.27) Then n x.- Ilia: (abs-1U Ia < Y.- s ,, + uss)dH(y) + / (F(y + us.) — an)“ dH(y) -2 / 1e < Y.- s y + usa)(F(y + as.) — F(y))dH(y)}- Clearly, for all 1 S i S n, /I(y < Y.- S y + as.) dH(y) = H_(Y,-) — H_(Y,- — 113;). 
(2.28) 17 Because 0 < u S bu and |s| S an, by (A) and (D) it follows that there exists an 1S N1 < 00 such that (F(y+aa.-)— F(y))’dH(y) s M1(us.-)2 lSiSn S M1(a,,b,,)2, (2.29) foralllSiSnananNl. Further, the Cauchy-Schwarz inequality, (2.28), and (2.29) imply that / I(y < Y._ < y + us.-)(F(y + as.) - F(y» dH(y) < Mlmanb b,.(H.(Y.) — H-(Y,- — us.))1/2, (2.30) foralllSiSnananNl. Theseimplythat X,E’H,1SiSnandn2N1. Now, by (2.29), max (marl / (F(y + as.) — Fm)“ dH(y) ——- ow.) = 0(1). lSiSn and by (2.28), (2.30), and (E), there exist A < co, 0 < B < oo, o > 1, and A < 00 such that for 1 SiSn, A>A,ananNo=noVN1, P(b(a,,,,) [)I(< Y-S y + us.) dH(y) > A2) S AexP('—BA20): p((..b.)-1 / I(y < Y.- s y + u..)(p(y + as.) — F(y» dH(y) > A?) S Aexp(—Bz\2"). These inequalities imply that there exist A1 < co, and 0 < 81 < 00 such that for 1 S i _<_. n) fl 2 N03 P(ll Xe ||H> A) P(ll Xi |Ib> V) S Alexp(—BIA2°). 18 Since a > 1, there is a 1 < q < 2 such that 20 = q/(q — 1). Apply Lemma 2.3 to the {X,-} defined in (2.27) with a = 2, v,- = t,-, 1 S i S n, v,- = 0, i > n, to obtain P((a,.b..)-1/2 u W(-,us,,) — W(-,0) ||H> A) g exp(—A,\2), for some A < co and n 2 No. Taking A = Kb“ gives that for 0 S u S b and n 2 No, P(II W(-,us) — W(-,0) I|H> Kai/253,”) S exp(—AK2b?,). (2.31) Now, take a partition on [0, bn] as in the proof of Theorem 2.1 and take the configuration of b, hl, ’12, and 0' as in Corollary 2.1. Then, a S ai/2bfi/2. The discussion similar to that for (2.12) in the proof of Theorem 2.1 leads to supb || W(-,us..) - W(-,o) IIH OSuS n s max, u W(-,a.s.) — Wen) ua +max n Fm.) — Fm.-.) IIH . (2.32) By (A), (D), and the Cauchy-Schwarz inequality, ll F(‘a’lr) — F(, 777-1) ”I! =|| Zte(F(- + am) - F(- + sax—1)) Ht i=1 Sll t ||2 2 II F (' + 3m.) - F (° + Sena-1) Ilia i=1 S M1(a,1,/2b?,/2)2, n _>_ N1. Therefore, for n 2 N1, (2.32) can be rewritten as sup n W(-,us) - W(',0) llH OSqun < max n Wow) - W(-,0) ua +Mi’2ay’bi/2. 
_ OSrSrn To prove (2.26), it thus suffice to show that there exists a constant K < 00 such that for all n _>_ No, P max (I W(.,r,.s) — W(-,0) ||H> Kai/2123;”) g n-k. (2.33) 5'57» 19 By (2.31), LHS(2.33) s 2: P(IIW(-.nrsn)-W(-.0)IIH> Kai/253”) OSrSrn < r,,exp(—AK2b:), n 2 No. Since b,, 2 (lnn)1/2 and r,, S bud-1 S (a,,b,,)"‘/2 S a;1/2 S n"°/2 by (A), we can select K large enough so that (2.33) holds. This completes our proof. [I] The following corollary is analogous to Corollary 2.2. Corollary 2.3 . Define 8 and ’13,, as in (2.22). In the assumptions of Theorem 2.2, replace (A) by (C). Then there exists a constant K < 00 such that, for any tn 6 R", II tn IIS 1, Ital S ICI, SUP || th(':bd) —W,,,(.,0) Ila< chlllabi”, wpln- (2-34) OSbSbnydED E] The proof of Corollary 2.3 is similar to that of Corollary 2.2 and hence is not given. 20 Chapter 3 Strong Consistency 3.1 Introduction In this chapter, we first recall the MD estimators defined in Chapter 1. Asymptotic distributions of these estimators have been studied by Koul & DeWet (1983) and Koul (1985a, 1985b). We will present in the next section some strong consistency results of these estimators. Consider the linear regression model (1.1). As in Chapter 2, we will not exhibit the dependence of Y,,,- and X ,, on n and use 23;. and 23., for the ith row and jth column of X, respectively. Let S = X ’X and assume that 5’1 exist for all n _>_ p. Let C = xs-l/z. (3.1) Then the model (1.1) is equivalent to Y,- = c,-.A + 6;, 1 S i S n, (3.2) where c,-. is the ith row of the C and A = 31/35. (3.3) Observe that the design matrices C’s of (3.2) satisfies C'C = I pxp. Our study is conducted based on model (3.2). Any conclusions obtained can also be translated to the forms with respect to model (1.1) according to (3.3). 21 Given a nondecreasing right continuous function H from R to R, let M) = / II web) II’ dH(y), b e RP. where U(y,Cb) = fauna/.- s y + ab) — F(y». (3.4) i=1 Note that under (3.2), E U(y, CA) = 0. 
This motivates one to define a MD estimator A of A, in the case F is known, as a minimizer of T(-): A = argminb T (b) Similarly, in the case that F is unknown but symmetric around 0, a MD estimator A+ of A is defined as a minimizer of T+(-): A+ = argminb T+(b), where, for b E R”, T+(b) = j u U+(y,Cb) "2 dH(y), U+(y, Cb) = f: c..{1(Y.- s y + c..b) — I(—Y.- < y — c..b)}. (3.5) i=1 Note that for B and 6+ defined in Chapter 1 with D = C, we have ,3 = 54/221, 3+ = 5-1/24+. (3.6) See Koul & DeWet (1983) and Koul (1985a) for more motivation and other properties of ,6 and )6+. In this chapter, we give the strong consistency results for R and ,3+ along with rates. To this effect, besides the assumptions (B)—(E) in Chapter 1, we shall also use the following assumptions. (F) The d.f. F satisfies (2.25) of (E) and has a density f which satisfies 0 <|| f llir< 00. (3.7) (1919] de > o. (3.8) 22 (G) There exist a > 1, A < 00, B > 0, and A < 00 such that for all A _>_ A, P(HueI) — H-(- Isl) > a) s Aepr—Br), where 6 has distribution F. (H) There exists 0 < a < 2 such that Z ICIza < co and |C|° = o((ln n)‘1). n=1 (I) The function F and H are such that // F(x)(1 — F(y)) dH(x) dH(y) < oo. xSy The following lemma demonstrates some facts related to assumptions (F) and (1). Lemma 3.1 . Let F be a distribution function and H be a nondecreasing right continuous real function. Then, the following hold. (1) fF(l — F)dH < 00 if and only iffH_.dF < oo. (2) fHZdF < 00 if and only if U... F(a)(1 — F(y))dH(x)dH(y) < oo. Proof. By the thini Theorem, / F(l — F)dH = ///Sz(1 — F(y))dH(a)dH(y) z- [ll/535K:dF(s)dF(t)dH(x)dH(y) s ; [faint—(t) — H—(s)l’dF(s)dF(t). and ll... F(a)(1 — F(y))dH(a)dH 0 such that T+(0) < Kolnn wpln. (3.9) 25 Proof. By the definition, Tue) = / II v+(y,o) II” dH(y) = :1/{iq,(1(x s y) — I(—Y.- < y))}2dH(y) i=1 := fry-(0) (3-10) i=1 with 13(0) = [{i a..(I(Y.- _<_ y) — I(—Y.- < y))}’dH(y), 1 s) s p. 
i=1 Since p is fixed, it thus suffices to show that (3.9) holds for each T,-(0), lSan Let X.~(y) = I(K-Sy)—I(—K- )2) P(H(|Y1|) - H(-IY1|) > )2) S Aexp(—BA2"). Since 20 > 2, there is q, 1 < q < 2, such that 20 = q/(q -1). 26 NowforeachfixedlSjSp,takea=2anda,-=c,-jwhenOSiS n, ag=0 when i > n. Then 00 Elm-I2 =1 < oo. i=1 By Lemma 2.3, P(ll Zea-X.- ||H> A) i=1 exp(—A,~A2) < exp(—AA2), P(T,(0) 2 )2) |/\ where A,- > 0, 1 S j S p, are constants independent of c.,- and A = min1555,(A,-). Now, take A = (K Inn)”2 with the K such that AK > 1 in the above inequality. Then the Borel-Cantelli Lemma and (3.10) imply that (3.9) holds with Ko = pK. U Lemma 3.3 . If H is a nondecreasing right continuous real function, then there exists a nonnegative real function 9 such that (a) 0 < g S 1. (b) fgdH < 00. Proof. We just construct such a function g. Let _ 1 IHIa)I 5.1, 9($)-{ xi?) (11(2)) >1. This g satisfies (a). To prove (b), we only need to prove f[H>l] gdH < 00. By Fatou Lemma, " 1 dH = 1' dH < 1' .— . It»)? .22. “.2539 — 32.2.2 < °° t: This completes the proof. D 27 Lemma 3.4 . Assume that (B), (C) and (F) hold. Then there exists a constant 0 < K1 < 00, such that T+ b > K In ’ 1 , 3.11 ||b||2(i(11nn)1/2 () 0 n wp n ( ) with Ko as in Lemma 3.2. Proof. Let h,, = (Kllnn)1/2 with K1 < co to be determined and 8 and D be as in (2.22) with Cu equal to C of (3.1). Let g be as in Lemma 3.3 and define )u(x) = [3 gdH. x E R. (3.12) Then )1 is a bounded nondecreasing function and so is H g ”H. We assume that H g "H: 1, without loss of generality, and denote p0 = f gdH. For any b E R”, b at 0, there is unique 8 E 8 such that b=||b||e=be, where b z” b H. Therefore (3.11) is equivalent to - + bleyEgE£T (be) > Kolnn, wpln. (3.13) By the Cauchy-Schwarz inequality, T+(b) = T+(be) / II e II’ II U+(y.bc:e) Il’ dH(y) / [e'U+(y, 5.1)] ’ dH(y) [/ U;(1.bd)da(y)]’. (314) IV ll IV where d = (d1,d2, - - - ,dn)’ = Ce and U;(y,bd) == e’U+(yabd) = 2.2.-(10’.- s y + bd.) 
— I(—Y.- < y — 5‘10}- i=1 28 Observe that for any fixed (1 E D and y E R, U;(y, b d) is a nondecreasing function of b. Therefore, when b 2 hn, Ui(y.bd) Z U501. had). 31 E R. d E 73- Hence, to prove (3.13), it suffices to show that 3,2; / U511). 1.3) My) > (3.1...)1/2, wpln. 01‘ mp — / 03(1). h.d) dam) < —(K.1nn)1/2. wpln. de‘D Now, divide D into mu pieces, say, D1,- - - .17"... such that 1. The diameter of D)‘ is no larger than "—2, I: = 1, - - . ,mn. 2. m" S (pn2)’. By the Fatou lemma and (3.8) of (F), 11:35.1] %[F(x + 6) — Fund). 2 / fa). 2 f1- de > o. 1_<_H51] Therefore, we can select a 0 < r7 < 00 such that LHS(3.16) > 1). (3.15) (3.16) (3.17) Let K1 be such that rh/Kl — 2m > 0. We first prove that there exists an 1SN. —2(Ka In at”) S P(- / U543], bndk) du(y) + E [U111 (yr bndk) dH(y) Z ('7 K1 - 2 Ko)(lnn)1/2) Sexp(—-;-( K1—2 K0 2Inn). This proves (3.18). Note that the RHS in (3.18) does not depend on 11:. Hence, P( max {‘ / U;.(y,h..d")d,)(y)} 2 —2(Ko1nn)1/2) lSksmn Smnexp{—%(n K1—2 K0 2Inn} S prn-%(m/RI-2JKE)’+22, n 2 N. (3.20) 30 Clearly, there exists a positive constant K1 such that the RHS of (3.20) is summable in n. Then, the Borel-Cantelli lemma gives max {—/U3u(y,hndk) dp(y)} < —2(Kolnn)1/2, wpln. (3.21) lSksmn Next, for any (1 E D, d E D), for some la, and [f [U;(y, had) — U;.(y. h.d")] d#(y)l = l) [U;(y,h..d) — UJ~(1I. 11.00] d11(3) + / [111.11, M) — 113(1). had")] My) s (sup 201? - d.)[I(Y.- s y + hnde) — I(-Y.- < y - hndilll lyl<°° i=1 + sup 2.115 (10/, g y + hndf) — I(Y. S y + huddll lVl<°° i=1 + sIup de[I(—Y.- < y — lad?) — II—K- < y — h.d.-)] )m I” <00 i=1 := (I1 + I; + I3)po, say. (3.22) We have II = Isup EM:c - d,)[I(Y.- S y + hndi) — “‘16 < y ‘ hndi)]| Vl<°° i=1 S H dk-dll "”2 : 0(71-3/2). (3.23) Recall the definition of Wt from (2.3). By Corollary 2.2, assumption (B), and the fact that |C| h,, = 0(1), [2 = sup lv|<°° 23d? [I(Y. s y + 11.4?) - 10/. s y + h.d.-)]‘ i=1 S Iii? {IW6~(1/. hndk) - Wa~(y.0)| + IWd~(y. had) - Wa~(y. 0)| + £3de + ad?) 
— F(y + 6.3)) I} = 0(ICI1/’(1nn)3/4) = o((lnn)1/2). (3.24) 31 Similarly, I3 = o((ln n)1/2). (3.25) Combining (3.22) — (3.25), mp |/ [01(1), 6.4) - 111.11), 1.41)] d#(y)| = own are). (3.26) (161) Finally, by (3.21) and (3.26), 811p - / U101. hnd) dMy) deD = 3‘25“ / U;.(y, 3.x) dp(y) + / [113(1), ad") — U16, 1.3)] da(y)) < max {- / UJ~(1/.had")dfl(y)} +.((1...)1/2) _ lSkSmn < —(Ko ln n)1/2, wpln, thereby proving (3.15) and also the lemma. D Now, we are ready to prove Theorem 3.1. Proof of Theorem 3.1. By the definition of 4+, T+(A+) s T+(0). On the other hand, by Lemmas 3.2 and 3.4, inf T+(b) > K0 lnn > T+(0), wpln. llb||2(K11nn)‘/’ Therefore, || A+ ||< (K1 lnn)l/2, wpln This completes the proof of Theorem 3.1. C] Next, to prove Theorem 3.2, we prove the following three lemmas. Lemma 3.5 Let 11(0) = ]{it-(KY: s y) — F(y))}’dH(y). 1:1 32 Then, under the assumption (1), for every t e R", E1310) 5 6 II t H“ [[39 F160 — F(y» dH(a)dH(y) < ea. (327) Proof. Observe that by the 11161.21 Theorem, ET3(0) = E[/{::lt.-(I(Ye S v) - 15110)}2 6111(3)]2 = E{[/{Zt(I(Y.- g .) _ F(.))}’1H(.)] . I / {z .,. (ms- 5 ,, - F(.))}’am.)I} 2 (syéttEKHY. s y) — F(y))’(I(Y.~ s a) — F(a))’] +22... t?t}E[(I(Y_ < y)— F(y))2(I(Y,- < x)— F(x))2] +222 x tft§E[(I(Y-_ < x) — 11(2)) (I(Y,_ < y) — F(y)) (I(Y < y)— Fm) (I(Y. < a)— F(a))]}dH(a) dH(y) IA = Z/AK, {§t?A+ 22% tftiB +2: 2,513.30} dH(a) dH(y). (328) where A = E[(1(Y.- s y) — F(y))’(1(y,. s .) — 11(3)”), B = E(I(Y. S y) - F(v))2E(I(Y3 S 3) — F(x))2, and D=E[(I(Y.Sw)—F(z))(I(Y' 5-3!) F(yl) (mg- _<_y)- no) (11 s a)— 3(3)]. Further, when x S y, A s E I(I(Y.- s a) — F(a))(I(Y.- s y) — F(y))| F(x)(l — F(y)) [1 + 2(F(y) — F(x))] 33 s 3F(x)(1 — F(y)) (3.29) B = F($)(1- F($))F(3I)(1— F(y» S F($)(1 - F (21)) (330) D = 1"'2(='=)(1 - F(y»2 S F($)(1- F(y» (331) Then (3.27) follows from (3.29) - (3.31). C] Lemma 3.6 . Assume (H) and (I) hold. Then T(O) = /U2(y,0)dH(y) < ICI’“ wpln (3.32) Proof. 
Apply Lemma 3.5 p times, the jth time to t = c.,~, 1 S j S p and use the fact that I] c.,- H: 1 together with the Cauchy-Schwatz inequality to obtain Ema) = 131/ II U(y.0) Il’ dH(y)]? E(Zp: [{f: a..(z(Y. s ,) — F(y))}’ dH(y)]2 (§{E[/{;Qj(l(fi s y) — F(y))}’da |C|-°' wpln. (3.33) IIbII2K|C|‘°” 34 Proof. This proof is similar to that of Lemma. 3.4. Proof of Theorem 3.2. Similar to that of Theorem 3.1. 35 Chapter 4 Bahadur Expansion 4.1 Main result and proof In this chapter, we further give a Bahadur type expansion for A+ so that a similar expansion can also be obtained for 6+. We need further assume that (J) F has density function f whose derivative satisfies “233’" [1m + 3))2 dH(y) < 0.. To describe the theorem, define 3+ = - / U+(y.0)f(y) 4H (11). Mb) = / II U+(y.0) +2bf(y) Il’ dH(y). and A A+ = argminb T+(b). Remark 4.1 . Observe that 2 I] f H}, 21+ = B+ ifO (M f ||H< 00. Because of this fact and (3.9), II 21“ ||< mama)“. wpln, where K; = {-K. n f "g”. a 36 Theorem 4.1 In addition to the linear model (3.2) with true A = 0 and the sym- metry ofF around 0, assume (B) — (C) and (J) hold. Then IIfIIH(A+) =—-/f:[c.{I(K