Michigan State University

This is to certify that the thesis entitled

WEIGHTED EMPIRICAL-TYPE ESTIMATION OF THE REGRESSION PARAMETER

presented by Mark Allen Williamson has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics and Probability.

WEIGHTED EMPIRICAL-TYPE ESTIMATION OF THE REGRESSION PARAMETER

By Mark Allen Williamson

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1979

ABSTRACT

We consider three estimators for the slope parameter $\beta$ in the simple linear regression model, each of which is based on the minimization of a statistic for testing $H_0: \beta = 0$ versus the alternatives $H_1: \beta \neq 0$. The statistics include a Cramér-von Mises statistic and its rank analogue, and the Kolmogorov-Smirnov statistic.

Invariance and symmetry properties of the estimators are studied for finite samples, and their asymptotic distributions are derived. The Cramér-von Mises-type estimators are shown to be asymptotically normal, while the asymptotic distribution of the Kolmogorov-Smirnov-type estimator is expressed in terms of functionals of a Brownian bridge.

The Cramér-von Mises-type estimators are compared with some common estimators of $\beta$ by an examination of asymptotic variances at various underlying distributions. Comparisons for the Kolmogorov-Smirnov-type estimator are made via a Monte Carlo study and by comparing asymptotic upper bounds for the lengths of associated confidence intervals.

ACKNOWLEDGEMENTS

I wish to thank Professor Hira Koul for his guidance in the preparation of this dissertation.
The advice and encouragement he gave are greatly appreciated. I would also like to thank Professors Dennis Gilliland, Roy Erickson, and Joel Shapiro for their review of my work. The excellent typing of the manuscript was done by Mrs. Noralee Burkhardt. Finally, I would like to thank my parents, Mr. and Mrs. A. Rex Williamson, for their constant encouragement during my years as a graduate student.

TABLE OF CONTENTS

INTRODUCTION AND SUMMARY

Chapter 1. CRAMÉR-VON MISES TYPE ESTIMATION OF $\beta$; THE RANK ANALOGUE
  1. Notation and Preliminaries
  2. Finite Sample Properties
  3. Asymptotic Behavior of $M_1(\Delta)$
  4. Asymptotic Distribution of $\sigma_c\hat\beta_1$
  5. Asymptotic Efficiency of $\hat\beta_1$

Chapter 2. CRAMÉR-VON MISES TYPE ESTIMATION OF $\beta$
  1. Notation and Preliminaries
  2. Finite Sample Properties
  3. Asymptotic Behavior of $M_2(\Delta)$
  4. Asymptotic Distribution of $\sigma_c\hat\beta_2$

Chapter 3. KOLMOGOROV-SMIRNOV TYPE ESTIMATION OF $\beta$
  1. Notation and Preliminaries
  2. Finite Sample Properties
  3. Asymptotic Distribution of $\sigma_c\hat\beta_3$
  4. Interval Estimation of $\beta$
  5. Asymptotic Efficiency of the $I_{c,\alpha}$
  6. Monte Carlo Study

BIBLIOGRAPHY

INTRODUCTION AND SUMMARY

1. The Model. Consider the simple linear regression model

$$X_{ni} = \beta_0 + \beta c_{ni} + \varepsilon_{ni}, \qquad 1 \le i \le n,$$

where $c_{n1} \le c_{n2} \le \dots \le c_{nn}$ are known constants, not all equal, $\beta_0$ and $\beta$ are unknown parameters, and the $\varepsilon_{ni}$ are iid $F$ for $F$ an absolutely continuous distribution. We regard $\beta_0$ as a nuisance parameter and consider three methods for estimating $\beta$.

Throughout this paper we will, for the sake of convenience, suppress the dependence of $\{X_{ni}\}$, $\{\varepsilon_{ni}\}$, and $\{c_{ni}\}$ on $n$. The vectors $(c_1, c_2, \dots, c_n)'$ and $(X_1, X_2, \dots, X_n)'$ will be denoted by $\mathbf{c}$ and $\mathbf{X}$, respectively, $d_i$, $1 \le i \le n$, will denote the centered $c_i$'s, and we will take $\sigma_c^2 = \sum_{i=1}^n d_i^2$.
Furthermore, many of the statements made are true w.p. 1, even though it may not be stated explicitly.

2. Cramér-von Mises Type Estimation of $\beta$; the Rank Analogue. As given by Hájek and Šidák (1967, p. 103), the rank analogue of the Cramér-von Mises test for $H_0: \beta = \Delta$, where $\Delta$ is a given constant, is based on the statistic

$$M_1(\Delta) = \int_0^1 \Big[\sum_{i=1}^n d_i I(R_{ni\Delta} \le nt)\Big]^2 dt,$$

where $R_{ni\Delta}$ is the rank of $X_i - \Delta c_i$ among $\{X_j - \Delta c_j,\ 1 \le j \le n\}$. Since $H_0$ is rejected only for large values of $M_1(\Delta)$, it seems reasonable to attempt to define an estimator $\hat\beta_1$ for $\beta$ based on the minimization of $M_1$ in $\Delta$.

In section 1.1 we propose a unique definition for such an estimator and give a numerical example to illustrate its computation. When so defined, $\hat\beta_1$ is translation invariant and has a distribution which is symmetric about the true parameter provided the underlying distribution is symmetric or the centered regression constants are skew-symmetric (section 1.2). Section 1.3 contains intermediate results which are used to derive the asymptotic distribution of the normalized estimator in section 1.4.

Finally, in section 1.5, we consider the asymptotic efficiency of $\hat\beta_1$. With asymptotic variance as the basis for comparison, $\hat\beta_1$ performs remarkably well against some common estimators for $\beta$, particularly when the underlying distribution has heavy tails. Comparisons are made with the Wilcoxon, median, normal scores, and least squares estimates at the normal, double exponential, and logistic distributions. At the double exponential, $\hat\beta_1$ out-performs all of the above but the optimal median-type estimator. Similarly, at the logistic, only the optimal Wilcoxon-type estimator is more efficient. At the normal, $\hat\beta_1$ beats only the median-type estimator, but shows only a slight loss of efficiency against the other estimators.

3. Cramér-von Mises Type Estimation of $\beta$.
We base our second estimator on a statistic which is similar to M1 of the previous section, but which uses the observations themselves rather than their ranks. Here we consider the process 3 n M2(A) = 1me 2 diI(X1. - Ac. __ 0 a.e. on '{x; O < F(x) < l}. (1.2) lim 0" max |d.| = 0. hr» lffign 1 In what follows let the vectors 5 and x be given ahd define the quantities B = If“ f2(x)dx and K = fjm f3(x)dx. For each real A and for each t 6 [0,1] define n S(t,A) := X diI(R . 5_nt) i=l where n RniA := .2 I(X. - Ac. 5 X. - Ac.) . J l 3 We also define, for each real A, W1(A) := I; S(t,A)dt and note that 7 -1 n ”1(0) “-n iEI diRnio 0 Next consider the process {MI(A)’ '°° < A < m} where ,_ l 2 M](A) o- [0 S (t,A)dt o If we let (0 ) denote the vector of anti- nlA’DnZA""’DnnA ranks for (x - Ag)’, it is interesting tO note that; n- -l -2 I o 2 c . ‘i [ d J li=l DniO -2 _ 0c ”1(0) - n This is the Cramér-von Mises statistic (Hajek-Sidak, l967) for test- ing the regression slope parameter B 8 0 against the alternatives 8 f O. For a fixed sample, M1(A) is a step function (in A) whose points of discontinuity are contained in the set I] = {(Xj - X1.)/(cj - c1); i < j and c1 < cj}. Set A0 = miniA; A 6 P1} and A1 = maX{A; A 6 r1}. Then for c1 < cj, A < A0 implies A < (Xj - Xi)/(cj - Ci) and hence R Thus the residuals '{Xi - Aci, l 5_i‘5 n} are niA < anA' naturally ordered and therefore (w.p.l) ( ' . . g d. for t e [13 1:1), l < j < n-l S(t,A) =< o for t e [0,‘10 u {l}. g n Hence, _ n-l j M1(A) = I; 52(t.A)dt = n ' Z T 2 di]2 i=1 i=l and thus M ( ') '1 nil i d 2 A =n [ .] 1 ° i=1 i=l ‘ Similarly, for A > A], the residuals are in a reversed natural . n _ . J 2 _ ordering. U51ng' Zi-l di - O one obtains M](A)=n'12j;][fi=1 d J + As A crosses A0 only one pair of adjacent residuals cross. Let ck < ck+l denote their respective regression constants. Then n-l ' -l g 2 n X E d 3 i=1 ‘ i=1 "1(A5) ' Ml(AO) 1% k“] 2 n' {3&1 [ H: d 32 + Idk+l + jz d1] } jfk k-l n-l 2 2 n{[1: d. J -Tdk+I + ig} di] }. 
Now c1 §_c2 5,..§_c and ck < ck+1 imply k-l + Z d. 5.0 . k . Z d. < d = “i=1 1 l k+l ' . . . ~l- .. Thus M1(Ao) > M](A3). Similarly it follows that M1(A]) > M](A]). As a result, the following quantities are finite: * B1 min{s e r]; M1(S+) = inf M](A)} C AEI‘.l ** - B] = max{s e I]; M1(S ) = inf M1(A)} . c AEI‘1 We now define our estimator B] for B by A - L * ** B] ‘ 2(81 + B] ) ° Numerical example By the preceding remarks we may determine the value of B1 by identifying the set of slopes P1 and computing M](A') for each ‘A 6 F1. Computation of M1(A) is facilitated by using the formula -1 (1.2) M](A) n 1 0 there exist positive real numbers N,a and d such that whenever n 3_N, 15 P0(En1(a,d)) 3,1 - e. 1 Proof. Since a; W1(D) has a limiting distribution, there exists a positive real number b such that ”DIG HIW (0)I_ < b] > 1 - e V n . 2 If we also take d > 2b and choose a so that a > b + (so/2)15 8" we have 2N1(O) §_b2 < d/2 and inf o 22(f0[5(t, O) + Ao f(P"(t))1dt)2 IAI =5 nin{(oj W1(O) - aB)2, (o"w1(0) + a3)2 1 IV (aB - og‘lw1(0)|)2 3_(a8 - b)2 3_3d/2 on {o;|W1(O)I_< b}. Choosing N according to (3.3) completes the proof. U Lemma 3.3. For every 2 > O and d' > 0 there exist positive real numbers a and N such that n 3.N implies -2 2 '1 g IATEa oC W1(Aoc ) 3_d ) 3_1 - e . Po( Proof. In the proof of lemma 3.2, take d > max{d',2b2}. The proof is completed by using the fact that W1 is nonincreasing in A (Hajek, 1969, p. 35). D 16 4. Asymptotic Distribution of OCB1. Throughout this section we retain the notation of sections 1-3 and assume that (1.1) and (1.2) hold. In addition we assume, without loss of generality, that B = 0. Lemma 4.1. For 0 5_t §_l and n 3_l define an(i,t) O i 5_tn i - tn tn §_i 5_tn + 1 = 1 tn + 1 5.i Then the process {Zn(t) := 0;] 1:] dian(R1,t), 0 §_t 5.1} converges in distribution in (B, C[0,l]) to the Brownian Bridge {3(13). 0 _<_ t :1}. Erggf. Héjek and Sidak (1967), Theorem V.3.5. 
D Remark, {Zn(t), O §_t §_1} is a process with continuous sample paths which is related to {S(t,0), O 5.t 5_l} in the following manner: sup |2n(t) + oE‘S(t,O)| 5_o-] max Id. |. city ° 1_<_i_<_n ‘ . For y a bounded integrable function on C[0,1J define -1 W) = K I}, y(t>f(r“N(O,o§) where o? = K'thg 62(t)dt - (I; G(t)dt)2] and _ t -1 G(t) - f0 f(F (s))ds, 0 5_t 5_1 . We now define 2 ._ -1 . ocB1 .- —h(oc S( ,0)) . For 0 < b < a and p > 0 define 6 (a,b) := {lo 3 1 §_b, inf _ M (A) > inf _ M (A)} n1 C 1 IAIZaoc] I |A| Kp /2} n1 IAls-aog" C A I 1 C I — .= _ 3 . c = - An1(a) . supiacls1 81I . 81 e r1. M1(B1) (ATan" M1(A)} . Aer1' C Lemma 4.3. Let s > O and 0 < b < a. Then P0({An1(a) > e} n Gn1(a,b)) + O. 2599:. Suppose there exist y and p positive such that P0({An1(a) > 2p} 0 Gn1(a,b)) > y for infinitely many n. By (3.2) there exists an N > 0 such that n 3.N implies P0(Hn1(asp)) < Y/2 and hence 18 P0[{An1(a) > 2o} n Gn1(a,b) n H:1(a,p)] > 1/2 . Since the above event is contained in {An1(a) > 2p} n Gn1(a,b), we can find for any given 5 e {An1(a) > 2p} n Gn1(a,b) n H:1(a,p), 1 * c - -1 . . B * a A 6 P1 n (-aoc , aoc ) satisfying 1B1 - A1| > p and * (4.1) M1(A1) = inf _1 M1(A) . |A| K02 1 A * . C ' ’1 from 13] - A | > p. Since for any AIE F1 n (-30C 3 3°C )2 -2 2 -2 -2 A 2 A + oc |T1(Aoc) - T1(B1oc)l < 15K6 + IT1(Aoc) - T1(B1oc)|. ** by the continuity Of T1 in A there is a A 6 P1 1, aoE‘) for which n (-aoc -2 ** 2 2 (4.4) cc |M1(A ) - T1(B1oc)| < Kp l2 . But combining (4.2) - (4.4) we see that M1(A*) > T1(ocA*) - 2 2 2 ** . . KO /2 > T1(OCB1) + KO /2 > M1(A ), contradicting (4.1). D 2 Lemma 4.4. L0(ocB1) =»N(O,o1). 19 Proof. loc§1 - h(Zn)l = |h(o;‘S(-.O)) + h(zn)l _<_ K-nmugioysnm + Zn(t)ldt s (16,) Till, max Id I + o °°1N(O,o1). 0 Lemma 4.5. Given a > 0 there exist positive real numbers a,b and N with a > b such that n 3.N implies POIGn1(a,b)J 3_l - e . Prpgf. 
Since OCB1 and o c2M1(O) have limiting distributions there exists a positive real number b such that 2 -2 POEOCIB1I §_b, oc M1(O) 5-b1 3_1 - 6/2 for all n. Taking d > b and noting that M1(A) 3_W§(A) by the Cauchy-Schwarz inequality, it follows from lemma 3.3 that there exist a > b and O < N < m such that inf _1 o “M1(A) > d] > 1 - 5/2 P I O |A|>ao; for n 3_N. But then 022M1(0) _>_ inf _1 022M1 (A) |A|aU - C Theorem 4.1. L0(OCB1) +-N(O,o§). x P Proof. We prove that oclB1 - 811 +0 O. The theorem them follows from lemma 4.4. Let c > O and 6 > D be given. By lemma 4.5 there exist positive real numbers a,b and N1 with a > b such that POIGn1(a,b)] 3_1 - 6/2 V n.3 N1 Now use lemma 4.3 to choose N > N1 such that P0({An1(a) > e} n Gn1(a,b)) < 6/2 V n 3_N . Then for n 3_N we have s 3_sup{o 1B - g | : M (B ) = inf _ M (A)} c l 1 1 1 |A| max P2. Computations similar to l5j c )3 dx . Since c1 5_c2 5,..5 c and the ci's are not all equal, d1 f 0 n and there is a K.S." such that dK # d1. Let K* denote the first such K. For x between c.l and c * we have K n: diI(x _>_ c1)]2 = (K* - l)2d$ > 0. Hence - d d (d. - d.) > 0 lgigjin 1‘1 J 1 This establishes (1.8). We now define 82 = ave(A). Remark. In the two sample location problem 27 82=mEd{xj ' xi311isjins C1- =0, Cj = 1}. A regression example That 82 need not agree with the Wilcoxon estimate 3w in the general regression problem is illustrated by the following example. Here we consider the sample -l, -2, l0, -3, 15, -28 with the weights Ci = i, l §_i §_6. As‘was indicated earlier in this section, we can determine §2 once we have computed n'1M2(A) for each A 6 r2. To compute 3”, it suffices to calculate nw1(A') for each A 6 P2 (Adichie, l967). 
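The computation described for this regression example can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in the thesis. It assumes that, because the centered constants sum to zero, the integral defining $M_2$ reduces to the closed form $M_2(\Delta) = -\sum_{i<j} d_i d_j\,|X_j - X_i - \Delta(c_j - c_i)|$, and it takes the candidate set to be the pairwise slopes $(X_j - X_i)/(c_j - c_i)$. The function names `pairwise_slopes` and `m2` are hypothetical.

```python
from itertools import combinations


def pairwise_slopes(x, c):
    """Sorted distinct slopes (x_j - x_i)/(c_j - c_i) over pairs with c_i != c_j."""
    return sorted({(x[j] - x[i]) / (c[j] - c[i])
                   for i, j in combinations(range(len(x)), 2)
                   if c[i] != c[j]})


def m2(delta, x, c):
    """Assumed closed form: M2(delta) = -sum_{i<j} d_i d_j |x_j - x_i - delta (c_j - c_i)|."""
    n = len(x)
    c_bar = sum(c) / n
    d = [ci - c_bar for ci in c]  # centered regression constants
    return -sum(d[i] * d[j] * abs((x[j] - x[i]) - delta * (c[j] - c[i]))
                for i, j in combinations(range(n), 2))


# Sample and weights from the regression example in the text.
x = [-1, -2, 10, -3, 15, -28]
c = [1, 2, 3, 4, 5, 6]

slopes = pairwise_slopes(x, c)
values = {s: m2(s, x, c) / len(x) for s in slopes}  # n^{-1} M2 at each candidate slope
beta2_hat = min(values, key=values.get)             # minimizing slope

print(round(beta2_hat, 4), round(values[beta2_hat], 4))
```

Under these assumptions the minimizing slope is $-5.4$, with $n^{-1}M_2 \approx 10.8667$ at that point. Evaluating only at the pairwise slopes relies on $M_2$ being continuous and piecewise linear in $\Delta$ with kinks at those slopes, so that a minimum is attained at one of them.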
TABLE I Values of n'1M2(A) and nw1(A') for A’E r2 A 5 r2 n'1M2(A) nw1(A‘) -43 454.3333 17.5 -13 93.0833 16.5 -12.6667 89.0972 15.5 -12.5 87.3125 12.5 - 6.5 18.0625 10.5 - 5.4 10.8667 6.5 - 1.0 27.9167 1.5 - .6667 28.7917 .5 - .5 29.4375 -2.5 2.5 42.5625 -4.5 4.0 49.8750 -6.5 5.5 64.6875 -10.5 5.6667 66.1944 -12.5 12 137.7083 -15.5 18 203.9583 -16.5 28 Here we see that 82 = -5.4; SW = -.6667 . Although §2 and éw may differ for any given finite sample, as the above example illustrates, we prove in section 2.4 that A A oclBZ - Bw' converges in probability to 0. 2. Finite Sample Properties (a) Invariance. A useful property of the estimator §2 is its translation invariance; that is, (2.1) §2(£ + Y9) = §2(g) + v for all real 7. To verify (2.l) we note that M2(A - Y)(X) — Z didjlxj - X, - (A - v)(dJ - d1)! i 0 be given and let ‘b = A1 < A2 <...< Ak(€) = b be a partition of [-b,b] such that + -l max (A. - A.) < E{sup H'(0)] . l§j5k(e)-l 1+] 1 2 n K" By the above we have P -3 + + + 1 +0 Ioc [T (Aj/oc) - T (0)] - KnAjH (0)] 0 for each l 5.j 5_k(e). Thus we may choose 0 < N < a» such that -3 + + + 1-6 < PE max 0 [T (A./o ) - T (0)] - A-H'(0) < e] 19:k(e)l C J C K" J I whenever n 3_N. Now suppose that A0 6 (A1, A ) for some 1 5_j 5_k(e) - l. j+l 34 Then since o;3[T+(A/oc) - T+(0)] and K: AH'(0) are nondecreasing in A we have -3 + + + 1 Cc [T (AD/ac) - T (0)] - Kn AOH (0) -3 + + + 1 -<-°c [T (Aj+l/°c) - T (0)] - K11 AjH (0) + -3 + + . 10c [T (AjH/Oc) - T (0)] - KnA1+1H (0) + I and ;3[T+(Ao/oc) - T+(0)] - K;A0H'(0) 0 -3+ + ‘+, +, 3_oc [T (A1/oc) - T (0)] - KnA1H (0) + KnH (0)[A1 - A111] 3_-e Hence -3+ + +1 |oc [1 (AD/0c) - 1 (0)] - KnAoH (0)] < e and it follows that POE sup Io‘3[1+(A/oc) - T+(O)] - K; AH'(0)| < e] 3_1 - a, IAlsb ° completing the proof of (3.5). In a similar fashion one can show that -3 - - - P0 (3.6) sup lo [T (A/o ) - T (0)] - Kn AH'(0)| + O . lAisb c c Combining (3.5) and (3.6) completes the proof of the lemma. U 35 Lenma 3.2. Suppose X ~ H. 
Then lim t'zEH{X[I(X §_t) - I(X §_0)]} = H'(0)/2. t+0 [£599f. We consider only the right limit; the proof for the left limit is similar. Let s > 0 be given. Since H'(0) exists there is a to > 0 such that [H(y) - H(O) - yH'(0)| < Zye whenever 0 §_y §_t0. Thus 0 §_t §_t0 implies that It“2 131H(y) - H(O) - yH'(O)deI 5_t'2 [3 2y 2 dy §_e and hence lim t'2 It [H(y) - H(O) - yH'(0)]dy = o . t+o 0 Defining X = XI(0 < X §_t) we see that t lim t'zEH{X[I(X §_t) - I(X §_0)]} t+0 . -2 t = 11m t f P (X > y)dy t+o 0 H t -2 t = lim t f0[H(t) - H(y)]dy t+0 -2 t t u = lim t {f H(t)dy - f [H(O) + yH (0)]dy t+o ° 0 - f3[H(y) - H(O) - yH'(0)de} lim {t“[H(t) - H(O)] - H'(0)/2} t+0 H'(0)/2 . D 36 Lemma 3.3. Let 0 < b < m. Then under the assumptions of section 1 p sup lo;2[V(A/oc) - V(0)] - %-A2H'(0)| +0 o . Algb Proof. Fix 0 < A 5_b. The proof for -b 5_A < 0 is similar. Then E0{o;2[V+(A/oc) - V+(0)]} E0{o;2 Z (d1d1)'1(o < x1 - x1 5.A11/oc)(x1 -x1)} io - H'(0)/2} ll 2 -4 g + 2 A oc 121 d11(d11 d1)H (0) . d11>0 Now 1 2 -4 E-A 0c 1;. d11(d1- d1 )H (0): fix H (0) J . and by lemma 3.2 and (1.3), IA A2 0C4 igj dij(d j “d- H)Eo{(A1j/C )- 21(0 < X1 -X1< Aij/OC)(xj- xi) d11>0 - H'(0)/2}I 2+ < A Kn max E { A / ) 21 0 < x. -x1< A / (x. x.) - H'(0)/2} i k. Therefore a+ n + Varo(Sn) =k21VaroEo(Sn|Xk) < o ;4[ sup |F(x + A1k/oc ) - F(xm2 {If Id1dk|(A1k/oc )32 x,i o v A e R. (1'1‘1') N§(A):M2(A) v aeR. Proof of (i). Let A < A' and set A = max(xzn)(A), xzn)(A')). Then . " A A new ) - 112(1)) 1%] diUx;(A') mx dx - In“) #136 dx] n + X%(A) n X'(A' ) 3_0 . Proof of (ii). We consider the case when A 3_0; the proof for A < 0 is similar. o;1[wz(A/oc) - "2(0)] n X' (A/o ) xi 1 c oc [12 d: Ix. .(A/Oc ) Jflx 5 dx + Z d; Ix X1 Jflx 5 dx] '1 E d+ (Ad+/ )"ix1 mu - M‘X'TJAdV o i=l i[ 1 CC X§(A/Oc) x x i i 0c C n + 0'1 Z d ”E(Ad Ioc ) 1 ° i=l 2 " 2 2 ”1' “mi” i=l fii(A/O'c) _ Jflx) dx — JTIXiJJAdiloc Therefore, by assumption (l.l) and the above, 4l Io-][w (A/o ) - w (0)] - Ao'2 E d2 Jflx 5| c 2 c 2 c i=1 i i X. 
§,A max I(Adi/oc) 1 IX}(A/o ) JTIx) dx - JTIXiJ| = 0(1) as n + an 1 c ljjfm We complete the proof of (ii) by showing that _ " 27‘7 mpo leEz l 1 But this follows from the NLLN since VarOUfTX-T] ) g ; f2(x)dx < co. Proof of (iii). By the Cauchy-Schwarz inequality 11 142(1)) = [{f>o}{ci§1 dim. - Adi g x)32/f(x)}f(x)dx Iv a n (Id, 12 diI(Xi - Ad]. 5 x)/"(‘Tf x dx)2 =1 X' (A) n 2 (IXE:;(A) fig] d11(xi - Adi §_x)/f(x5 dx) I A M O. _l. H x A: Dv V v '0. X D. X v N Before stating our final lemma we define, for each 0 < a < m; 0 < d < m and n 3_l, the event En2(a,d) = {0;2M2(0) < a, 11?; ogzwgwoc) 3 d}. Lemma 3.6. For every 6 > 0 there exist positive real numbers N,a and d such that n 3.N implies 42 P0(En2(a,d)) 3_l - 8. Proof. Since 0 §_W2(0) §_M2 (0) for all n, the sequence {a 21H2(0)} is bounded in probability by lemma 3. 4. Hence for fixed M! < “’9 (3.12) (c‘zw 2 NZC(A/o)- [02 112(0) +AE “MYTH IP +0 o by lemma 3.5. Now let b be such that 9010;2M2(0) §_bJ 3_1 - e v n 3_1 . If we take d > 2b and choose a so that a >125+ ammo #1171)" we have 022w§(0) §_o;2M2(0) §_b < d/2 and hence liT:a to; w2(0) + A 50 Jflx‘712> [a £0 J?TX‘7 og‘lwzwm2 3_3d/2 A on {022M2(0) §_b}. The proof is completed by applying (3.l2). D 4. Asymptotic Distribution of océz. Throughout this section we retain the notation of sections l-3 and assume that (l.l) - (l.4) hold. In addition we assume, without loss of generality, that 8 = 0. Lemma 4.1. {ocgz} is bounded in probability. 43 Proof. Let a > 0 be given. By lemma 3.6 there exist positive real numbers a,d and N such that POEEn2(a,d)] 3_l - e V n > N. By parts (i) and (iii) of lemma 3.5 -2 -2 2 inf o M A o > inf w A > d 1A1.>.a ° 2(/‘)‘IA1:a°‘ 2W9“ and -2 2 -2 oc ”2(0) f-Oc M2(0) < d on En2(a,b) whenever n 3_N. Thus ,. -2 {locezl §_a} : {oc M2(0) < d, I2T:a M2(A/oc) 3_d} 2 Enz(a,d) V n 3_N implies Ptlocézl 1a]: PEEn2(a.d)1:1 - e v n :N . o The result of theorem 3.] suggests that an approximating statistic for chz is 023T;(0)/H'(0). 
The next lemma gives the asymptotic distribution of that statistic. Lemma 4.2. Under assumptions (l.l) and (l.3) L0(JT2'o;3T;(O) =>N(0,l). Proof. Since * * the projection of T2(0) into the family of linear rank statistics is 44 *,- -12 " wz .- n 0c .2 diRniO 1-l (Hajek and Sidak (1967), p. 61). Since P -3 * 'k 0 (Sievers, l976), the proof is completed by noting that L JT? ’3 w* N o 1 0( 0c 2) => ( . ) under assumptions (l.l) and (l.3) (Hajek and Sidak (l967), p. 163). D For 0 < a < b < w define -3 * . . G a,b = { T 0 /H' 0 < b, 1nf M A/ > 1nf M A/ } "21 > lac 21) ()|__ |A|>a 21 Cc) Mia 21 0c) and An2(a) = supue“ - of T;(O)/H'(0)l; W1 5 a. M2(A*/oc) = inf M (A/o )}. lAls.a 2 ° Lemma 4.3. Let c > 0 and 0 < b < a be given. Then »P0({An2(a) > e} n Gn2(a,b)) + 0 as "+00. Proof. Suppose there exist 81 and 6 positive such that Po({An2(a) > e} n Gn2(a,b)) 3_6 for infinitely many n. For each such n there exists a lel g_a such that 45 (4.1) 1A0 - og3 T;(0)/H'(O)I z.e( and (4.2) M (A /o ) = inf M (A/o ) 2 0 c IALSP .2 c on Gn2(a,b). Since ._ '2 '3 * 2 l Q(A) .- oc M2(0) - 2Aoc 12(0) + A H (0) is quadratic in A and achieves its minimum at A = 023T;(0)/H'(0), (4.l) implies that (MAO) - (no? 12(01/11'1011 _>_ Hume? . By (4.2) we also have M2(o;4 12(01/11'1011 _>_ "2 . But -2 su o M (A/ ) - 0(4) IAIE§.| c 2 0c I _>_ maxnogznzmoxoc) - omen. (a;2M2(o;41;(o)/H-(o)) - 0(og3 T;(0)/H'(0))I} 3_H'(0)c§/2 on ‘Gn2(a,b) since (022M2(A0/oc) - Q(A0)| < H'(0)e§/2 implies 46 -2 -4 * . -3 * . 0c M2(°c 12(0)/H (0)) - 010C 12(01/H (0)) 3,022 M21A0/oc) - 0(o;3 1:101/H'10)) > 0(A0) - H'(0)e§/2 - 0(o;3 T;(O)/H'(0)) .3 H'(0)e§/2 . Thus lim sup sup [022M2(A/oc) - Q(A)| 3_H'(0)e¥/2 , n+on IALga contradicting theorem 3.1. D . We are now ready to give the asymptotic distribution of o 82. ' Theorem 4.1. L0(oc§2) +'N(0, (12H'(0)2)"). p . Proof. We prove that IOCBZ - 023 T;(0)/H'(0)l +0 0. The theorem then follows from lemma 4.2. Let c > 0 and 6 > 0 be given. 
By lemma 4.2 and the proof of lemma 4.l we may choose 0 < b < a < w and N1 > 0 such that P0[6n2(a,b)3 3_l - 86 for n 3_N1 . Now use lemma 4.3 to choose N > NI such that Po({An2(a) > e} n Gn2(a,b)) < 56 whenever n 3_N. Then for n 3_N we have 47 sup{|A - o C3T2(0)/H (0)): M2(A*/oc) = inf M2(A/oc )} c §.A (a) n2 [Alia sup{|A* - 023T;(0)/H'(0)| : M2(A*/oc) = igf M2(A/oc)} |v Iocéz- og312(o)/H (O)! on {An2(a) 5-5} n Gn2(a,b). Since P({An2(a) 5_e} n Gn2(a.b)) 3_l - a for n large, P Iocé2 — 0312101/H (0)I +0 o o The asymptotic relationship between éz and the Wilcoxon- type estimator fiw is established in the following p Corollary 4.1. Under assumptions (l.l) - (l.5) ocléz "gwl +0 0 Proof, An immediate consequence of the asymptotic uniform linearity (in A) of w* is . P (212'623 w; - o e |.+° CW 0' Since lemma 4.2 and theorem 4.l yield p -3 'k v: 0 and A ‘3 * 1 P0 lacs2 - 0c 12(01/H (0)) +- 0. we have A A P0 0c'82 ' BwI + 0 U CHAPTER 3 KOLMOGOROV-SMIRNOV TYPE ESTIMATION OF B l. Notation and Preliminaries. In chapter 3 we retain the nota- tion of previous sections. To the assumptions of the model intro- duced in section 1 of the introduction we add the following: (l.l) F has a continuous bounded density f satisfying f(x) > 0 a.e. on {x: 0 < F(x) < l}. (1.2) lim 0'1 max Idil = 0 . new c lgjgn ' In what follows let the vectors 3 and x be given. For A real define (l.3) DC(A) = sup |U(x,A)|. -a_o. d1. < o, 1 :1 < .1“ in}. Proof. We consider only 0:; the proof for D; is similar. That D: is a step function follows from the fact that D:(A) is a function of the ranks of {Xi(A), l §_i §_n}. To establish its non-decreasing nature we make use of (1.3). Let A1 5.A2 5,..5_Am denote the ordered members of {(X3 - X1)/(dj - di); 1 5_i < j §_n, di f dj} and set A0 = -w, A +w. For 1 §_j 5.m choose any m+l= 50 A', A" such that A. J-]X£(A") and (1.6) D:(A") = max{ sup U(x,A"),U(x0(A"),A"), sup U(x,A")} . 
xXk(A") Note that as A E (Aj_], Aj+]) crosses A., only the residuals J Xk(A) and X£(A) cross. The other residuals remain distinct and in their same relative order with probability one. Hence the following are valid: (l.7) sup U(x,A') = sup U(x,A") = sup U(x,A.) xX£(A ) x>Xk(A ) x>Xk(Aj)=X£(Aj) (1.9) u1x01A').A') = u1x01Aj).Aj) - a, 51 (1.10) U(x0(A"),A") = U(xo(Aj),Aj) - dk . Ne complete the proof by considering three cases: Case I. If dk < d£ 5_0 then by (1.9) and (l.lO) sup U(x,A') 3_U(Xk(A')',A') xX£(A') follows from (1.5) - (l.8). Case II. If O'idk < d2 then sup U(x,A') x>X2(A') |V U(x,§:?A') U(x.A )} DC(A ) follows from (1.5) - (l.8). Case III. If d §_0 §_d£, d < d£, then by (1.9) k k 52 sup U(x,A') _>_ U(X£(A')+,A') = U(xo(Aj),AJ.) x>X£(A') .3 U(x0(Aj),Aj) - d2 = U(x0(A'),A'). Thus D:(A') = max{ sup U(x,A'), sup U(x,A') xX£(A') §_max{ sup U(x,A'), sup U(x,A'), U(x0(A"),A")} xX£(A') = D:(A") follows from (l.5) - (1.8). To establish left continuity, first note that + - , Dc(Aj) - max{ sup U(x,Aj), U(x0(Aj),Aj), sup U(x,Aj)}. xXk(Aj) Applying (l.7), (l.8) and x>§:?Aj)u1x.Aj)_>_ U(kajifiAj) = U(xoujmj). we have + _ , , = + Dc(Aj) - max{x<§:?A')U(x,A ). x>§:?A')U(x.A )} Dc(A)' Finally, since D:(A') = D:(A") in cases I and 11, the points of discontinuity of D: are seen to be a subset of r3. D Before defining 33 we need one additional + - - + - - n + Lemma 1.2. DC(A1) = 0 = Dc(Am) and Dc(A]) = Z d = + + i=l i Dc(Am)° 53 Proof. Note that for A < A]. l §_i < j §_n and di < dj imply Xi(A) < Xj(A), so that the {Xk(A), l §.k §_n} are naturally ordered with respect to d1 §_d2 53"5-dn' Using the monotonicity of the .d 1's and {'13:} d1. = 0 we have, for 1: k gn-l, k U(x,A) 8 2.] di 3 09 X G [xk(A)9 Xk+1(A))° 1: Since U(x,A) '-' 0 . X e H(O)“), x(n)(A)) we have sup [U(x,A)] = 0 -ao Am. B Lemmas l.l and 1.2 guarantee that the following exist and are finite: 54 . ,+ - 1nf{A E R, Dc(A) Z-Dc(A)}’ m (.0 II sup{A e R; D;(A) 3_D:(A)}. 
ID (A) ll Note that by the monotonic nature of D:(A) and D;(A), B;* 3_B; w.p. l. We are now ready to define the estimator A - 'I * ** B3 ’ 2(83 + B3 ) ° Lemma 1.3. Dc is nondecreasing for A 3_§3 and nonincreasing for A:%. Proof. Note that (1.11) DC(A) = max(D:(A), o;(A)) v A e R . By the definition of 5;, A > 8; implies D:(A) 3_D;(A) and hence + ** ** DC(A) = Dc(A). By the definition of B3 , A < 33 implies D;(A) 3_D:(A) and hence DC(A) = D;(A). Thus, since * A ** + A .. 33 5_B3 5.83 , DC(A) = DC(A) for A > 83 and Dc(A) = DC(A) for A < B3 and it remains to show that A . A- A+ (1.12) DC(B3) 5_m1n(Dc(83). DC(B3)) . But by lemma l.l, + A + A- + e+ DC(B3) - Dc(e3) §_DC(B3) and D’(“ ) - D'(“*) < D'(“') c 83 - c 83 - c 83 ' Therefore, 55 (1.13) DC(§3) = max(-A) and - + Dc(-A)(A) - DC(A)(-A) . 1k *1: Thus the definitions of B3 and 83 yield * * 331-5) = -83(A) and ** ** B3 ('5) = ‘33 (1,) - Therefore §3(l) ~ §3(',X,) = '§3(£)s completing the proof of (i). 58 Proof of (ii). As in the proof of theorem l.2.l, 62(5) ~ 62(§*) where 5* = (Xn,Xn_1,...,X1). Using the definitions of D: and D; and the proof of (i) one obtains D:(A)(A*) = o;<-A)(-A) . Thus 63(5) ~ 6311*) = 63(-A) ~ -§3(A) as in the proof of (i). ' D 3. Asymptotic Distribution of UCB3. In this section we assume, without loss of generality, that B = 0. To aid in the proof of theorem 3.l we first define a class of functionals {T2, z 6 R} on the set V of bounded functions on '[0,l] by 12(h) = sup {1h(t) + zf(F'1(t))]1/0} 0+nu4un1vo-mu)+fiu4un1vm 5. Wt) - oh)! 5 llh - gum v t e [0.13 and 59 l£h(t) + zf(F'](t))] A o - [9(1) + zf(r“(t))1 A 0| :.|h(t) - g(t)| 5.uh - gum v t e [0,l]. Thus 112111) - 1.1911 5.21111 - 911,. establishing that T2 is a continuous functional on C[0,lJ. Remark. Let {B(t), 0 5_t 5.1} denote a Brownian bridge. Then since 3(o+) = S(0+,0) = f(F“(o+)) = o w.p. 1, we have (w p.1) 12(3) = sup {B(t) + zf(F'1(t))} 0 0; the proof for z 5_0 is similar. * ** From the definitions of B3 and B3 9 ink + .. 
83 < z/oc c>Dc(Z/cc) > DC(Z/oc) and 60 + _ * Dc(z/oc) _>_ Dc(z/°c) $83 _<_ z/oc . * A ** . . Thus 83 §_B3 §_B3 implies that + - A + - POEDc(z/oc) > Dc(z/oc)] §_P0[B3 §_z/oc] §_P0[Dc(z/oc) 3_DC(z/oc)]. Applying theorem l.3.l and using the inequalities -1 + 0c Dc(z/oc) 3_0, sup {0215(t,0) + zf(F'](t))} 3_0, 0 D;(z/oc)1 + P0[12(B) > 0]. But Po TZ(B) = 0 = 0 by lemma l of Rao et al. (1975) and the proof is completed. B 4. Interval Estimation of 3., Throughout this section we assume, without loss of generality, that B = 0 and that (l.l) and (l.2) hold. Let Yc,a denote the critical value for which one rejects ’ Ho: 8 = 0 at level 6 whenever Dc(0) > YC,a° Then a l00(l-a)% confidence set for B is given by Ic,a := {A; Dc(A) s-Yc,a} . Lemma l.l and (l.l) imply that IC 6’ when nonempty, must be an interval. Note that 62 Ic,01 = 4’ $DC(D) > Yc,a; hence empty intervals are obtained only on an event where one rejects the true null hypothesis. It has been shown (Hajek and sidak, l967, p. 189) that 10(og‘oc1011 = 10103351 181111). Defining K to be the l-a percentile of L ( sup |B(t)|) we a 0 Dgtfj then have 1. -l - 1m oC y - K "#0 Lemma 4.1. Given 0 < B < m» and e > 0 there exist b and N positive such that n 3_N implies . -T POE 1nf o D (z/oc) > B] > 1 - e. |2lzb c c Proof. Using lemma 1.3, the proof is similar to that of lemma l.3.3. D Define K; = sup 7 n>l c,a on R. We are now ready to prove and let u denote Lebesgue measure Theorem 4.1. Suppose that (l.l) and (1.2) hold. Then for each n > 0, lim sup oc u(Ic a) is bounded in probability by n 9 -l ZKaufum + n. ‘ £3221, Let €»> 0. Hme > 6 > 0. By lemma 4.1 there exist b and N positive such that n 3_N implies * P [ inf D(z/o ) > K J > T - 8/2 . 63 Choose t0 6 (0,l) such that 1114110)) 1 11111.. 
- a and N1 3_N such that -l -l POE sup oc [S(t0,A/oc) - S(t0,0) - Aocf(F (to))| _<_ 6] >1 - 8/2 V n 1 N IAlzb 1' Then n 1N1 implies OcIc,a : (A; |S(t0,A/oc)| Z-Yc,a} n {A; [AI 3_b} -l c {A; |S(t0,0) - Aocf(F (to))l f_Yc,a + oc6} n {A; |A| §_b} ; {m (5(t0,o) - Aocf(F"(to))l 31cm + oc5} + 0C6 + s(tO.O))/ocf(F"(to)) < A = {A; '(Yc,a :_(y + océ - S(t0,0))/0cf(F-](to))} c,a with probability greater than l - 5. Hence “1 + 6)/(Hfflm - a): > 1 - e. POElim sup Ocu(1c,a) §_2(oc c,a 11 Since 6 and s were arbitrary and lim 0'1 n+m c = KG, the proof c,a is completed. . D It is possible, under more restrictive conditions, to show that the bounds of theorem 4.1 hold w.p.l. Such a result is given in the following Theorem 4.2. Assume, in addition to (l.l), that f'(x) exists and is bounded for a.a. x. Regarding g, assume that 64 (4-1) nlfio;1 max |d1.| =0(l) as n+m, lgjfm (4.2) lim inf n"o2 > o . n c Then . -1 l1mnsup “c““c.o) _<_ 2KallflL° w.p. 1. Proof. Let c > D and D < b < m. Define, for t e (0.1) and x,A 6 R, u*(t.A) = f] dimx.) : 111411) + Adm. 1: n U'(t.A) = Z diI(xi 5_x + Adi). . i=1 . From theorem 3.l of Ghosh and Sen (l972), sup |U*(t,A/oc) - U*(t,0) - Aocf(F'1(t))l + 0 w.p. 1 (P0) . 0__N1 . 1 0 _. ’ c _. «i 09 c 9 9 1=l U'(x,A/oc) = o = U'(x;,A/oc) , A = 0. and " - + . [U'(x,A/oc)| §_1£] di = |UI(xO,A/oc)| , A < 0 imply (4.5) [U'(x,A/oc)| _<_ |U'(x3.A/oc)l. Similarly, for n 3_N1 and x 3_x1 = sup S(F). 66 (4.6) |U'(x,A/oc)| 5_|u'(x;.A/oc)| . Combining (4.5) and (4.6) yields (4.4) and hence (4.3). D 5. Asymptotic Efficiency of the Ic,a' Using the bounds of section 4 one can compare the Ic,a to other common confidence intervals for which asymptotic lower bounds can be computed (Rao et al., 1975). Koul (1971) has computed the asymptotic lengths of the normalized confidence intervals based on a wide class of linear rank statistics. Although his bounds were in probability bounds, they can be strengthened to w.p.l bounds by applying the results of Ghosh and Sen (1972). 
As an example of the type of results which can be obtained and to demonstrate the efficiency of the Kolmogorov-Smirnov type intervals, we compute bounds for the asymptotic efficiency of the $I_{c,\alpha}$ with respect to confidence intervals based on the normal scores and Wilcoxon type rank statistics. In what follows let $\Phi$ denote the standard normal c.d.f., let $z_\alpha$ be defined by $\Phi(z_\alpha) = 1 - \alpha$ and define

$$\varphi(t,f) = -f'(F^{-1}(t))/f(F^{-1}(t)), \qquad 0 < t < 1.$$

Assume, without loss of generality, that $\beta = 0$.

(a) Comparison with Wilcoxon-type intervals. Let

$$K_{c,\alpha} := \Big\{\Delta;\ \Big|n^{-1}\sum_{i=1}^n d_i R_{ni\Delta}\Big| \le \delta_{c,\alpha}\Big\}$$

where $\delta_{c,\alpha}$ is such that one rejects $H_0: \beta = 0$ at level $\alpha$ whenever $|n^{-1}\sum_{i=1}^n d_i R_{ni0}| > \delta_{c,\alpha}$ and accepts $H_0$ otherwise. Under the assumptions of theorem 4.2,

$$\lim_{n\to\infty} \sigma_c\,\mu(K_{c,\alpha}) = 2 z_{\alpha/2} \Big/ \sqrt{12}\int f^2(x)\,dx \quad \text{w.p. 1}.$$

Thus,

$$(5.1)\qquad \limsup_{n\to\infty} \frac{\mu(I_{c,\alpha})}{\mu(K_{c,\alpha})} \le W_\alpha(F) := \sqrt{12}\,K_\alpha \int f^2(x)\,dx \Big/ z_{\alpha/2}\|f\|_\infty \quad \text{w.p. 1}.$$

To obtain an upper bound on $\limsup_{\alpha\to 0} W_\alpha(F)$ for fixed $F$, we investigate the behavior of $\limsup_{\alpha\to 0} K_\alpha/z_{\alpha/2}$. From Hájek and Šidák (1967, p. 182) we obtain

$$1 - \alpha = P\Big[\sup_{0\le t\le 1} |B(t)| \le K_\alpha\Big] \ge 1 - 2\exp(-2K_\alpha^2)$$

[...] $> \delta_{c,\alpha}$ and accepts $H_0$ otherwise. Under the assumptions of theorem 4.2,

$$\lim_{n\to\infty} \sigma_c\,\mu(J_{c,\alpha}) = 2 z_{\alpha/2} \Big/ \int_0^1 \Phi^{-1}(u)\varphi(u,f)\,du \quad \text{w.p. 1}.$$

Thus,

$$\limsup_{n\to\infty} \frac{\mu(I_{c,\alpha})}{\mu(J_{c,\alpha})} \le W_\alpha(F) := K_\alpha \int_0^1 \Phi^{-1}(u)\varphi(u,f)\,du \Big/ z_{\alpha/2}\|f\|_\infty \quad \text{w.p. 1}.$$

The following table gives the upper bound $W_\alpha(F)$ for various choices of $F$ and $\alpha$. For $\alpha = 0$, $W_0(F) := \int_0^1 \Phi^{-1}(u)\varphi(u,f)\,du \big/ 4\|f\|_\infty$.

TABLE II
Values of $W_\alpha(F)$ for Comparison to Normal Scores Intervals

  alpha \ F   Std. Normal   Logistic   Dbl. Exp.   Cauchy
  .5             3.076        2.769      1.958      1.8
  .10            1.865        1.679      1.187      1.1
  .05            1.737        1.564      1.106      1.0
  .025           1.655        1.490      1.054      .96
  .01            1.584        1.426      1.008      .92
  .005           1.546        1.392      .984       .90
  0              .627         .564       .399       .36

6. Monte Carlo Study. In order to compare $\hat\beta_3$ with other point estimators of $\beta$, 5000 samples of size 40 were generated from each of the standard normal, double exponential, logistic, and Cauchy (median 0) distributions. Taking $c_i = i$, $1 \le i \le 40$, we then computed $\hat\beta_1$,
$\hat\beta_3$, and the Wilcoxon estimate, $\hat\beta_W$, for each sample. The following table gives $s^2(\sigma_c\hat\beta)$ for each set of 5000 samples.

TABLE III
Values of $s^2(\sigma_c\hat\beta)$

  F                            Std. Normal   Logistic   Dbl. Exp.   Cauchy
  $s^2(\sigma_c\hat\beta_W)$     1.0668       2.9883     1.4541     .3153
  $s^2(\sigma_c\hat\beta_1)$     1.1755       3.1985     1.4532     .3666
  $s^2(\sigma_c\hat\beta_3)$     1.1181       3.1138     1.5497     .3386

Each set of observations was based on a corresponding sample of uniform (0,1) variates generated by the Fortran subroutine RANF on the Michigan State University CDC 6500. The logistic and double exponential variates were generated by computing $F^{-1}(U)$ for each uniform variate $U$, the Cauchy variates were generated by computing $\tan[(U - .5)/\pi]$, and each normal variate was generated by computing $(-2 \ln U_1)^{1/2} \cos(2\pi U_2)$ for independent uniform (0,1) variates $U_1$ and $U_2$.

BIBLIOGRAPHY

Adichie, J. (1967). Estimates of regression parameters based on rank tests. Ann. Math. Statist. 38 894-904.

Fine, T. (1966). On the Hodges and Lehmann shift estimator in the two sample problem. Ann. Math. Statist. 37 1814-1818.

Ghosh, M. and Sen, P. (1972). On bounded length confidence interval for the regression coefficient based on a class of rank statistics. Sankhya Series A. 34 33-52.

Hájek, J. (1969). Nonparametric Statistics. Holden-Day, San Francisco, California.

Hájek, J. and Šidák, Z. (1967). Theory of Rank Tests. Academic Press, New York.

Koul, H. (1971). Asymptotic behavior of a class of confidence regions based on ranks in regression. Ann. Math. Statist. 42 466-476.

Koul, H. (1977). Behavior of robust estimators in the regression model with dependent errors. Ann. Statist. 5 681-699.

Rao, P., Schuster, E. and Littell, R. (1975). Estimation of shift and center of symmetry based on Kolmogorov-Smirnov statistics. Ann. Statist. 3 862-873.

Scholz, F. (1978). Weighted median regression estimates. Ann. Statist. 6 603-609.

Sievers, G. (1976). Weighted rank statistics for simple linear regression. Mathematics report #44, Western Michigan University.
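The variate-generation step of the Monte Carlo study (Chapter 3, section 6) can be sketched as follows. This is a hedged illustration rather than the original Fortran program. Python's `random.Random` stands in for the RANF subroutine, the seed is arbitrary, and the function names are hypothetical. The logistic and double exponential generators use the standard inverse c.d.f.s for unit scale. For the Cauchy case the sketch assumes the standard unit-scale inverse c.d.f. $\tan[\pi(U - .5)]$, which may differ in scaling from the expression given in section 6.

```python
import math
import random


def open_uniform(rng):
    """Draw a uniform variate on the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break log()
        u = rng.random()
    return u


def logistic_inv(u):
    """Inverse c.d.f. of the standard logistic distribution."""
    return math.log(u / (1.0 - u))


def double_exp_inv(u):
    """Inverse c.d.f. of the double exponential with density exp(-|x|)/2."""
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))


def cauchy_inv(u):
    """Assumed form: inverse c.d.f. of the Cauchy with median 0 and unit scale."""
    return math.tan(math.pi * (u - 0.5))


def box_muller(u1, u2):
    """One standard normal variate from two independent uniforms."""
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def draw_sample(dist, n, rng):
    """Generate n variates from the named distribution."""
    if dist == "normal":
        return [box_muller(open_uniform(rng), open_uniform(rng)) for _ in range(n)]
    inverse = {"logistic": logistic_inv,
               "double_exp": double_exp_inv,
               "cauchy": cauchy_inv}[dist]
    return [inverse(open_uniform(rng)) for _ in range(n)]


rng = random.Random(1979)  # arbitrary seed, for reproducibility only
samples = {name: draw_sample(name, 40, rng)
           for name in ("normal", "logistic", "double_exp", "cauchy")}
```

Each entry of `samples` holds one sample of size 40. Repeating the draw 5000 times per distribution and applying an estimator to each sample would give the kind of empirical variances reported in Table III, though the numerical values depend on the generator and on the scaling chosen for each distribution.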