,....... 4. 4... 44v ...44;.4.4..4 (4.... ...-.4 4.. .. 4.... .4344 ...... .....4..........r.v.4 r .1244 4. 44.4.4 44 4 . .47 44.4.4.4...uy......44 .....44 4... ... .4; .4. 44.1 .... . . . , r47r4 ”4.444453. . .4 4 4 I. .4. 44 .. . .444/4 . . 4 y ....4 r! U“ 4r.444.4.44 41%.}: .....4}. .44 . , . ...? )4 42.4."...rr...4 , 14.4.41... 4.: .41 4...... , 74.4.3.9...4. 74.4.4541“ 444......5H14. .r.;v..34.1..::» .,.4r 1:..4p... ... .4Il..4,.:4 4.4..» 344. 4..4.\344.,44.?.4 .4....44_..4.......4.. 44.4 ...4... .44 4 4...... .4tf44.4... .4...) if?! v5.4?l. , ... ...»r . 44 ... 4.2.1., , a. ....444..4..44>..44. .45 5*:- .-- 4N5 LIBRARY Michigan State University : . w This is to certify that the thesis entitled THE PROBABILITIES OF MODERATE DEVIATIONS OF U-STATISTICS AND EXCESSIVE DEVIATIONS 0F KOLMOGOROV-SMIRNOV AND KUIPER STATISTICS presented by Gerald Marlowe Funk has been accepted towards fulfillment of the requirements for Ph.D. Statistics and degree in Probability 34w,” fl/K/VV; Major professor Date May 21, 1971 0-7639 ABSTRACT THE PROBABILITIES OF MODERATE DEVIATIONS OF U-STATISTICS AND EXCESSIVE DEVIATIONS OF KOLMOGOROV-SMIRNOV AND KUIPER STATISTICS BY Gerald Marlowe Funk Let X1, X2, . . . be independently and identically distributed random variables. The U-statistic, Um) generated by a function symmetric in its k arguments based on X1, . . . , Xn was defined by Hoeffding (Ann. Math. Statis (1948)). Under certain moment condi- tions Rubin and Sethuraman (Sankhy'a’ Ser. A (1965) ) obtained order results for probabilities of moderate deviations of these statistics. These results are extended to obtain expressions of the form 2 /2 -1/2n-1/Zc P(U(n) - BUM) > ckcr(logn/n)1 )~ (chzlog n) if c > 0 and o- > O is the standard deviation of the limiting distri- bution of V}? (UM) - EU(n)). Similar results are obtained for Lehmann-generalized U-statistics, and for some functions of U-statistics. A related problem, that of obtaining asymptotic expansions of probabilities of excessive deviations of the Kolmogorov-Smirnov and Kuiper statistics is considered. Let F denote the c.d.f. of Xi and let Fn be the sample c. d. f. based on X1, . . . , Xn . Let D: = Slip (F(x) - Fn(x)) . Suppose F is continuous and let n A: = 0(1) and n A: ... 00. Then Ge rald Marlowe Funk 2 + '2an P[D > x ]~e n 11 This result was obtained by Rubin and Sethuraman (Sankhya’ Ser. A (1965)). In this paper a different proof of this result is presented and the problem is solved for larger deviations, i. e. , when the condition n A: = 0(1) is relaxed. Similar results are obtained for the two-sided Kolmogorov-Smirnov statistic and the Kuiper statistic. THE PROBABILITIES OF MODERATE DEVIATIONS OF U-STATISTICS AND EXCESSIVE DEVIATIONS OF KOLMOGOROV-SMIRNOV AND KUIPER STATISTICS By Gerald Marlowe Funk A THESIS Submitted to Michigan State Unive r s ity in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1971 (5‘ 7Ja277 ACKNOWLEDGMENTS My most sincere gratitude is extended to Professor Herman Rubin whose enthusiasm and guidance made this undertaking possible. I benefited immeasurably from our many discussions of large sample theory and asymptotic expansions. ii TABLE OF CONTENTS PROBABILITIES OF MODERATE DEVIATIONS OF U-STATISTICS. 1. 0 Introduction and Summary 1.1 U-Statistics l. 2 Extensions PROBABILITIES OF EXCESSIVE DEVIATIONS OF THE KOLMOGOROV-SMIRNOV AND KUIPER STATISTICS . Introduction and Summary The Sample Distribution Function Some Asymptotic Expansions NNNN WNt—IO Kolmogorov- Smirnov Statistics N uh Kuiper Statistic . 2. 5 Probabilities of Large Deviations of Kolmogorov- Smirnov Statistics . 2. 6 Probabilities of Large Deviations of the Kuiper Statistic . . . . . . B IBLIOGRAPHY iii Probabilities of Excessive Deviations of the Probabilities of Excessive Deviations of the Page 19 19 22 24 34 51 62 76 83 PROBABILITIES OF MODERATE DEVIATIONS OF U-STATISTICS 1. 0 Introduction and Summary The primary result of this chapter is the development of an asymptotic expansion for probabilities of moderate deviations of U- statistics. Wassily Hoeffding [11] defined a U-statistic based on n indepen- dent random variables, X1’ 2, . . . , Xn’ and a real valued function of m(f_ n) arguments, Y, as -l U' (X ,...,X ) = [m1(n)] zwx. X ) \y 1 n m 1 1 1 m where the summation is extended over all permutations (11, . . . , im) of m different integers, iJ. , such that l S ij _<_n . Without loss of generality, any U-statistic may be written as -1 U¢(X1,...,X)=(n) Z ¢(X.1,...,X. ) (1.0.1) 1 max/J3] = 1-¢(x) n—>CD (’0 . ( )2 prov1ded E¢(X1,...,Xm) = O , E cp(X1,...,Xm) < co and 2 0‘2 = E E[¢(x1,...,xm)lxl]) > 0. Here X 1X2 “ 1 ¢(x) : _1_._ f e 2 dxl. VZn -m Clearly there was no loss of generality in assuming the first moment of q) to be zero. If all moments of (p up to order p exist for some p > x‘2 + 2 then for fixed x > O 1/2 logP[U(X,...,X)>mox(1—O£P—) ] Z 1. 740 1 n n 1m n—>oo -11... _-Z logn The latter result was obtained by Herman Rubin and Jayaram Sethuraman [17]. They define deviations of the U-statistic from its 1/2 mean of the form c<£9—§1—ll) to be moderate deviations. Our main result is that, with no additional assumptions, 1 1/2 P[U (X1,...,X )>mo*x(—952) ] lim : ‘P n n , = 1 (1.0.2) “Tm 1 - ¢(X\/I0gn) Since the sample mean is a U-statistic, this may be viewed as a generalization of the following theorem due to Herman Rubin and Jayaram Sethuraman [l7] . Theorem 1. 0 Let X1, X2, . . . be independently and identically distrib- uted real valued random variables such that EX1 : O, EXZ : 0'2 and E|X|px2+2 and x>0. Then 1 n 10 n 1/2 P[— z X. > ax(—&—) ] n ._1 1 n lim 1‘ = 1 (1.0.3) n->oo 1 - ¢(xm) and 1 10 n 1/2 P[|H Z Xil>ox(—-§—-) I 1. i=1 __ 1m — 1 . n—>oo 2(1-¢(x'\/logn) The theorem remains true if the constant x is replaced by x+e wherec=o( 1). n n logn In section 1.1 the proof of (1. O. 2) is presented. In section 1.2 some extensions of this result are discussed. However, the problem of extending these results to deviations larger than moderate deviations remains open. In Chapter 2 moderate and excessive deviation results are presented for the Kolmogorov-Smirnov statistics. This problem is formally introduced in section 2. 0. 1. 1 U-Statistics Let X1, X2, . . . be a sequence of independently and identically distributed random variables defined on some probability space (0,58, P). Let (p be a Borel measurable, real valued function symmetric in its m arguments. The U-statistic generated by (p and the first n random variables is defined as in (1. 0. 1): —1 n .,xn)=(m) 1. Z 1,)(xi,...,xi )(1.1.1) Given this sequence of U-statistics, {Uém} , we wish to study the rate at which the corresponding sequence of probabilities of moder- ate deviations from the mean, {P[U((pn) - EUfpn)> c 12.5.2] } , tends to zero as n becomes large. Our approach is to exhibit a sequence of real numbers which is asymptotically equivalent to this sequence of probabilities. (Two sequences of real numbers, {an} and {bu}, are defined to be asymptotically equivalent if lim an/bn = l in which n—>oo case we write a m b ). n n Our main result is Theorem 1. l Let {USU} be the sequence of U-statistics defined in (1. 1. l) and suppose that: E|x2+2 and x>O. (1.1.2) E¢(X1,...,Xm) = 0. (1.1.3) 0'2 = E(E[o (114) 1,..., m 1 . Then x2 (n) 10 n ”2 2 -1/2 “2‘ P U¢ > ma'x —§— ~(21rx logn) n (1.1.5) and 2 1/2 _ P[lU;n)l > ma’x(1—O§—Q) ]~ 2(21rleogn)-l/2 n 2 . (l. 1.6) Proof. It will be shown that there is a real valued function (alt) such that the U-statistics generated by (pl and X1, X2, . .. have the following propertie s: 1/2 c ‘ Z P[U(n) > x0‘ (1353) :1- “ 1 2]~(2nx210gn)‘1/2n‘x /2 s01 n (nlo / g“) (1.1. 7) 1/2 c 2. P[|U(n)| > x0‘(—12&-11\) :I: n ]~ 2(21'rleogn)-1/Z n—x /2 (01 n (nlo 1/2 gn) (1.1.8) 2 xmcre: -x /2 P[IU(n> - mU(n)| > 31/2]: 0(2-————) (1.1.9) (0 ((71 (n log n) x/log n where {an} is a positive sequence such that En = 0(1) and logn = o(sn). Now for any two real valued random variables Y1 and Y2 and constants a and 5 > 0 P[Y +Y >a] > P[Y >a+6] - P[Y < -5] 1 2 - 1 2 and P[Y1+Y2 > a] : P[Y1> a-b] + P[Y2 > 6]. (n) 10 n ”2 Inparticular if Y +Y :U , Y =U -mU , a=mx0'———g—— 1 2 (p 2 .p «)1 n 1/2 and 6 = (mx (ran) (n log n)_ then these inequalities together with (1.1. 7) and (1.1. 9) will verify (1. l. 5). Similarly, the inequalities P[|Y1+YZ| >a] _>_ P[|Y1l>a+6]- P[|YZ| > 6] and P[|Y1+Y2| >a] 5 P[|Y1|>a—6]+P[)Y2|> 6] together with (1.1. 8) and (1.1. 9) will verify (1. 1. 6). Define ._ l I I l ._ 0. Finally, (1.1.2) and Jensen's inequality yield Elqollp = E :EE[l¢lplX1] = Elwlp < co for some p > c2+2. Thus the conditions of Theorem 1. l are satisfied for the sequence cp(X1),(p(XZ), Since (n) 1 n U = — E (p (X.) “’1 n i=1 1 1 and since . i l ~x2/2 11m (— e )/(1-¢(x)) :1 X->oo x 211' the asymptotic equivalences (1. 1. 7) and (l. l. 8) have been verified. To show that (l. l. 9) is also true, Ufpn) is decomposed into k U-statistics as follows: Define (pm(x1,...,x )=go(xl,...,xm) andfor lir 1. Evidently, ’ ‘P (P1 p ,Um) (n) ”ma -mU l > le

] r=2 r Iffr m-l (r110gr1)1/Z |/\ | /\ V m V V (fi) (n1ogn)"/2 2 (m) E|U(n)l (1.1.11) mxeen r y r22 r for O [2] : O(gn (logn) n ). P[|U(n) _ mU 1 (p (n log n) Let v be an even integer such that c'2 < v E p then (1. l. 9) is verified and the proof is finished. 11 1.2 Extensions. Let F denote a distribution function and 1y a real valued function m _II integrable with respect to the product measure 1:1 F.1 . Define a para- meter 9(F) as G(F)= /°'-/\y(xl,x2,...,xm)F(dxl)F(dx2) ---F(dxm), (1.2.1) Hoeffding [11] calls O(F) a regular functional of F . Given n (_>_m) independent random variables, X1, X2, . . "Xn , a reasonable method of estimating 6(F) is to replace F by the sample distribution function Fn. This leads to a statistic of the form: 1m n n 6(Fn):(fi) Z Z Y(Xi,Xi,...,Xi ). (1.2.2) 11:1 1m=1 1 2 m This statistic is closely related to the U-statistic discussed in the previous section. (Note that 1)! does not depend on n.) Define cp(x1,...,x )=Z‘i’(x.,...,x.) (1.2.3) where the summation is extended over all permutations of {1, 2, . . . ,m} . For lij < m define ‘l’m_j(xl,...,xm_j) = Z ‘l’(xi1,...,xim) (1.2.4) where the summation is over the distinguishable permutations in which each Xi’ l :1 _<_m-j , occurs at least once. Then m _ n (n) n (n) n (n) n 9(Fn)—()U +(m )U +”'+()U‘Y (1.2.5) m (P 12 Hoeffding showed that if E[9(F )]2 < 00 then the asymptotic distribu- tion of (I; (earn) - 9(F)) is that of J; (Uén) - 6(F)) . The following theorem shows, under additional moment condi- tions, that a similar result is true for moderate deviations. (The notation developed in (1.2. 1) through (1. 2. 4) is used in the statement of the theorem) Theorem 1. 2.1 Let {Xi} be a sequence of independently and identically dis- tributed random variables with common distribution F and let {8(Fn)} be the corresponding sequence of regular functionals of the sample distribution function (1. 2. 2). If [...] [x2+2 (1.2.6) andfor lijEm-l, j” [[‘Y(X1,...XJ.1)]qF(dx ) °--F(dxj) 1, q>x2 (1.2.7) and 0'2 > 0 where 0'2 =/[/” -(/——1-—¢p(xl,...,xm) -9(F)) F(dx1) ---F(dxm_1)]2F(de) (1.2.8) then for x > 0 1/2 2 )- 9(F) > mxo-(lggll) I,” (anzlogn)-l/Z n-x /2 pl“F (1.2.9) n and 1/2 2 p[|e(Fn) _ 9(F)| >mx0'(.1£§l_r_l.) ]~ 2(21rleogn) ”2 n_.x({.22.10) 13 Proof. Using the expansion (1. 2. 5) 9(Fn) may be written as m-l . (n) m-l 1 m-j-l . ( 9(F)=(II (1-1—))U ,+ z: —.—( r1 (1-1—))U n n (p/m. :1 n‘] n i=1 3' 1:1 m-j/(m-j)! Since any weighted sum of finitely many U-statistics is itself a U-statis- tic (provided the weights are constants independent of n), 9(Fn) may be written as m-l 1 (n) 9(F ) 2 23 —.-U (1.2.11) n j=0 nJ Clearly (p0 = go/m.l and the 475 are weighted sums of 90 and the 1y]. . Evidently, 1 1/2 Pszan) -6(F)>mX0‘(—9-§—n—) :] < pEJln) 9 (log n)1/2 En :| _ I - (F) >mx0‘ ‘ 1/2 (p0 n (n logn) m-l e nJ.1/Z + 2: FEUWB n 1/2 (1.2.12) i=1 "’3' (m-1>mx0‘ + er (po 11 (nlogn) m_1 _8 nj.1/2 _ 2: p[U(n,)< “ 172:] (1.2.13) i=1 "’1 (m-1) 5 I < zs“’13:lU(’,1 [ (1.2.14) (pJ. n — n cpj and for v > 1 Minkowski's inequality is repeatedly used to Show 1/v 1/v /\ F: 3 /\ 17¢ (EIU .l") _ max {(Elcplv11/v.l/V} (Pj ‘Pj _ 11 and v>x2. Set n+J-1/Z 6:“ (n log n )1/2 Then the right hand side of (l. 2. 14) is asymptotically negligible com- 2 ..1/2 n-x /2 for 1:] :m-l , and (1.2.9) is verified. pared to (log n) A similar argument verifies (1. 2. 10). The latter part of this proof is a special case of Lemma 1. 2.1 Let {Xn) and {Yn} be sequences of real random variables such that 1/2 PEXn > x(1_0§_) ]~1-¢(X~/logn) for x e (O,x0). n n Then Yn 10 n 1/2 P]:Xn+ 7>x(—g——) :'~1-¢(xlogn) 11 I). if c > 1/2 and, for some fixed M, ElYnlviM< co for some 2 v>-2-—}E—_-l andalln. 15 Proof. Use inequalities analogous to(1. 2. 12)and (1. 2.13)and apply Markov's inequality. We give one further example of how Theorem 1. 1 may be extended. Let {XS-h , 1 :j i r , denote r independent sequences of independently and identically distributed random variables and let Fj denote the common distribution of the random variables in the jth sequence. Let T0 be a Borel measurable, real valued function of r m: _2 mj arguments. Let T0(---,x.,~-,xk,--o):T0(--o,xk,--o,x.,-~ 3:1 1 1 if mj , ,x;r>, ,x;r> ) j=1 j k k k k 1 ml 1 mr where the summation is extended over all subscripts for which _ ...j :c.n;l_<_j:r. k1 km. J J Here the cj's (_>__ 1) are integral valued and represent the proportion of X(J)'s in the ”sample. " U'(n) The sequence of statistics may be approximated by the sequence of U-statistic s -1 . U(n)— (‘1) 8(2. , ,z. ) (1.2.16) ‘P m 1x , x>0, then 1/2 1/2 P[ '(n)>m0'x(-1—O—§—rl) :|~P[:U;n)>mcrx(1—O£-rl) :]. 1'1 Proof. i=1 3' J and may be written as r c.n (n) r c n Note that II (rri )U' consists of Mn 2 II (r191 ) summands =1 3' summands of U'(n) not included in (Nn -Mn)U;n) . Thus U(n) may be written as .(n)_ (n) n (n) U —U¢ +(Tn-m—U ) v Condition (1.1.2) assures us that E|U(n)] < M < co for some 2 v Nn " 0 V v>x +2 and condition(1.2.19)guarantees EITnl :l—M—-] EIT I N n v and EITO] < co . Since 1712- = O(%—) , Lemma 1.2.1 applies and the n theorem is proved. The theorem remains true if the independence conditions are replaced with the assumption that the Z's are independent (and condi- r tion(l. 2. l9)replaced by the obvious moment conditions) and if E c. is .:1 r J replaced by Z) Cj + o(l) . i=1 The final theorem in this sect-ion deals with functions of U-statistics. Let {Xn} denote a sequence of independently and iden- tically distributed random variables and let U(1n), . . . , Ulin) be k (1) (Z) (k) U-statistics generated by X , . . . , Xn and the ke rnals (p ,(p , . . . , (p 1 respectively. For each U-statistic, assume that conditions (1. 1. 3) and (l. l. 4) of Theorem 1. 1 hold and that all moments of (pg) exist and are finite. Define aij = E¢§1)¢(13) ; 1:1 :k, 1_<_j_<_1< where (p?) is defined as in (1. 1. 10a). If the determinant laij] is positive then 18 Theorem 7. 1 of [11] asserts that the asymptotic joint distribution of the Mn UJlm's is non-singular. Thus each nondegenerate linear com- . . k (n) b1nat1on .23 a.U. J-1 J J Theorem 1. 1. In View of Theorem 8 of [17] we have the following is a U-statistic satisfying the conditions of Theorem 1.2.3 Let f(xl, . . . ,Xk) be a function of k arguments and k < Hxll ) f(x1, . ..,Xk) — f(0,...,0) + :31 bixi +0 10g ”X“ 2 k 2 . . . . where ”X“ = 23 xi . Then, for the above-ment1oned U-statistics, if 121 x>0, and O'b>0, 1/2 2 PE(U(1n), . . .,ULM) - {(0, . . .,0) >xeb(1—9§fl) :]~(21rleog n)'1/‘Z fix )2 where 2 k k 0' = Z Z b.m.b.m.oz.. b :1 1:1 1 1 J J 1] j and mj is the number of arguments of (p(']) l :j _<_ n . Similar results are valid for 8(Fn) and for Lehmann-gene ralized U-statistic s. PROBABILITIES OF EXCESSIVE DEVIATIONS OF THE KOLMOGOROV-SMIRNOV AND KUIPER STATISTICS 2. 0 Introduction and Summary Let X1, X2, . . . ’Xn be a sequence of one -dimensional, indepen- dent identically distributed random variables each having the continuous cumulative distribution function F . The empirical cumulative distribution function, Fn’ associated with the sequence X . , Xn is defined by the relation 1... SIH Z n xigx The Kolmogorov-Smirnov statistics are defined as D: = sup (Fn(x)-F(x)) —oo cn-l/Z) : e-ZC n—>oo n 2 lim P(D- > cn_1/2) = e"2C n—>oo n no .2 2 lim P(Dn>en‘1/2) = 2 z (.1)J'le"ZJ C , n—mo j=l and Nicolass Kuiper [14, p. 43] has shown that 00 .2 2 lim P(Vr1 > cn-l/Z) : 2 (4j2c2 - 1) 8.2] C ' n->co j=l Second order terms of these asymptotic expansions are also known. [10, 14]. Following the terminology of Herman Rubin and Jayaram Sethuraman -l/2 n [17], deviations of these statistics from zero of the form c for constant c will be called ordinary deviations. Any deviation of the form -1/2 11 , for {Cu} a sequence unbounded above, will be called exces- sive. If {Cu} is also an increasing sequence it is apparent that 1/2 limP(D+>c n“ ): 0. n n n—>oo Of interest is the rate at which probabilities of excessive devia- tions tend to zero as the sample size becomes large. Much work has been done in this area. 21 Herman Rubin and Jayaram Sethuraman [17] made use of a theorem by N. V. Smirnov [10, p. 154] to obtain the following excessive 3n-l/2 deviation result. Let cn —> co and CH =- O(l) , then 2 -2c P[D+>c n-1/2]~e n n n -2cZ P[D >c n-1/2]~28 n n n In section 2. 3 it is shown that asymptotic expansions similar to n-1/6 these remain valid if the condition Cn = 0(1) is replaced by nl/Z). In section 2. 4 similar results are obtained for the Kuiper statis- c=o( tic;e.g., for Cn>C\/logn and cnn-1/6=O(l), c>1/2 2 -2c P[V >c n-1/2]~8cze n n n n However, for larger deviations the results are fundamentally different, that is, they are not a "natural" extension of the ordinary deviation results. A constant deviation will be called large. Jayaram Sethuraman [18] proved that for any constant c between zero and one 1 Flog P[Dn > c] log(3(c) C l-x-c (3(a) = sup > O C] ~oz(C) (13(0)) ; where a(c) does not depend on n . Asymptotic expansions are also given for D , and D- . n n Order results for large deviations of the Kuiper statistic have also been worked out. Innis G. Abrahamson [1] has proved that 1 1'1 log P[Vn Z c] ~ log (3(c) , In section 2. 6 it is proved that P[Vn _>_ c] is asymptotically equal to a1(c)n((3(c))n. Section 2. 1 contains a brief review of some well known facts about Fn, Dn and Vn and the relationship between the order statistics of uni- form random variables and Poisson processes. Section 2. 2 contains some theorems of a technical nature which are used in the later sections. 2. 1 The Sample Distribution Function In the following paragraphs some well known properties of the sample distribution function are reviewed. No new results are presented in this section. 2. 1. 1. Let X1, X2, . . . , Xr1 be n independent random variables with common continuous cumulative distribution function F . The statistics D+, D-, D and V are independent of F. n n n n For example, set Y1: F(Xi) then the Y.1 are independent uniform random variables on the interval [0, 1]. Since 23 + _ the statistics D , D , D and V generated by X ,.. . ,X are, n n n n l n with probability one, the same as those generated by Y1, . . . , Yn . In the following sections it will be assumed that the X.1 themselves are uniform random variable S . 2.1.2 The equation Fn(x) - x = D: can only be valid if x is one of the observed values of the (uniform) random variables X1, . . . , Xn From continuity argmnents, it is evident that Fn(x) -x takes, with probability one, its maximal value at a unique point, Xmax . The random variable Xmax is uniform on the unit interval. In the following sections it will be assumed that the maximum deviation of Fn(x) - x is unique. (Similar remarks apply to the infimum of Fn(x—) —X . [13, 2]). 2. 1. 3 The probability distribution of the order statistics, X(l) _<_ X(Z) E . . . _<_ X(n) , of n independent uniform random variables on the interval (5, t) and that of the jump points, T _<_ T E . . . _<_ Tr1 1 2 of a Poisson process, X(v) , is the same given that X(s) = m and X(t) = m+n [6 p. 400 and 12, p. 239]. By a Poisson process with parameter A is meant a separable, real process X(t) with stationary independent integral valued increments and 24 2. 2 Some Asymptotic Expansions The idea of obtaining asymptotic results for Kolmogorov-Smirnov statistics by a consideration of n Fn(t) as a Poisson process, X(n t) , is not new (see D. A. Darling [20] and R. Pyke [16]). This idea is employed here to obtain asymptotic expansions for the conditional probabilities of the events {F (t)-t< m n —n -x for all t such that O5+s k212 m] n’m __ k—n 1 9 9 9 sixfit n1 . —1-1‘IE—:—S—)_ 1f0n(t-S), In particular, 25 m n-m P[X 1, A]: < l and Xe- = X, e . Evidently as X tends to one, w tends to zero. Lemma 2. 2.1 If A =l+a ;a >0, a 20(1) and w is the largest real n n n n n root of the equation w = A (l -e-W) then w ~ 2a n n n 26 Proof. Noting the remarks following Theorem 2. 2. 2 , w : )\ _ )3” ..)\ b _x’i‘ n n n where )‘ne n = xle n and since the function g()() : he is continuous and for X > 0 has a unique maximum at X = 1 it is evident «I; that )tn 1 1 implies )‘n 1 l . An application of the mean value theorem yields: a a2 3 e- =l-a+T-O(a);a>0. Thus (1+a)e-EL : l-a2/2+O(a3), a>0 (1 -b)eb :1-b2/2 - O(b3), b > o, >:< ->\T1 >{: -)\: Let A =l+a;)\ =l-b sothat 1e zxe ,thatis -a n nb n n n n n _ n . 2 3 _ 2 3 (1+an)e — (l - bn)e Wthh means that bn + O(bn) — an - O(an) . _ _ 2 _ z and Since an—o(l), bn—o(l) we have bn(l+o(l)) — an(l-o(l)). Thusb~a andwza +b-2a. n n n n n n In fact w 22a (1+O(a )) . n n n . t . . . . G1ven that the m h order statlst1c of n 1ndependent un1form random variables on the unit interval occurs at x , the first m — 1 order statistics, X , . . . , X , have the same distribution as the (1) (In-1) order statistics of m - 1 independent uniform random variables on the interval [0,x]. Also U. = n(x-X .); i=1,2,...,m-l would (1) (In-1) be the order statistics of m -1 uniform random variables on the inter- val [0, n x]. Evidently 27 In _ P[0<5]:1I<)x(Fn(t)- t)< n - xlxhnl —X] k C — — : p[_n - X(k)< 3- -x, k—l,2,...,m-1]X( )—x] P[-(m-1)F1r1n_l(t) +t_<_1; O:t:nx]X(0) = o, X(nx) =m-1] Ln(x,m) , say. (2.2.3) Here Filn-l (t) is the sample distribution function based on the {U1} and X(u) is a Poisson process with jump points {U1} on (O,nx) [See 2.1.3]. The next theorem shows that, asymptotically, the condition X(nx) = m-l in (2. 2. 3) may be dropped for suitable choice of m and A . Theorem 2. 2. 3 Let {X(u) ,u: 0} be a Poisson process and, for each n>0 , let {Xn(u) , u __>__ 0} be a Poisson process with parameter An = l + where O < c < co , O < t < l and dn is a positive sequence such that dn = 0(1) and ndr21> alogn for some a > 0. Let kr1 : [n(t+cdn)] . Define Ln(t,kn) = P[u-X(u):1;0 c and c d = 0(1), 11 n nn Ln(t,kn) ~ PEu - Xn(u) _<_ I] and 2cd n L t, kn) = [1 + a(n,t,c)] (2.2.4) n( where 28 lim sup a(n,t,c) = 0 (2.2.5) n—>co A and the supremum is taken over all t and c such that 0 ~’ tO < t < t1 < 1 and l 0} has parameter A. The probability of the sample functions for which u - X(u) : 1 for all u > 0 is given in Theorem 2. 2. 2. This probability may be expressed as a sum by con- ditioning on the events {X(n t) : k - 1} for k 3 nt . For reasons of notational simplicity, define Q(>.)=p[u-X(u)5_1;0.,k) = P[u—X(u):1;u>nt]X(nt):k-l] (2.2.7) and pn(>.,k) = P[X(nt) : k- 1] . (2. 2. 8) Since X(-) has independent increments, O(X) may be written as QM) = Z L (t.k)P (MMQ (Mk). (2.2.9) k>nt n n n and since X(.) has stationary increments, Qn()\, k) = P[u -X(u) :k-nt; uZO]X(O) = 0] In View of Theorem 2. 2. 2, QM) = 1-6 (2.2.10) and 29 '(k'nm’ if kznt. (2.2.11) Qn(A.k) = 1 -e Apparently Qn(A,k) and Ln(t,k) are nondecreasing functions of k and we are led to the following inequalities: Q(A) > ( Z P (A,j))L (t,k)Q (A,k) if k>nt (2.2.12) — j>k n n n — Q(A) < L (t,k)+( Z? P (AinL (t,k')+ Z P (Ad) - n jZk n n jzk' n if k'>k>nt.(2.2.13) If k is large, if 2 P (A,j) is near one and if E P (A,j) j>k n j>k' n is near zero then L t,k) could be approximated by Q(A) . Of course n( Ln(t, k) is independent of A so the problem reduces to that of selecting A so as to make the bounds in (2. 2.12) and (2. 2.13) reasonably tight when k = kn' An intuitively appealing idea is to "aim" the sample functions of the process {X(u)} at the point (n t, kn)’ i. e. , to select A so that EX(n t) '= kn Proceeding with this idea, let nth;1 = [n(t+cdn)] + [nan] (2.2.14) -1 where {an} is a positive sequence such that 8n = O(dn) , (neg) : 0(1) and me: = 0(1) . Now if {Xi} is a sequence of independently distributed Poisson random variables with parameter 1 then ntA' n I .. _ Pn[An,k] _ P[i_21 Xi_k—1]. 30 Cramér's theorem for probabilities of excessive deviations of a sum of independent random variables from its mean [8, p. 517] appl ie 5 so that 2 P [A'.j]~-—-——l-——- e 2. . n n J(1-e n ) (l+—-——2—-——e 2 ) (2.2.16) V2nns: Ln( for all n > Ne where Ne is a constant determined by {an} alone, and not on c or t (since Ar'1 > n(t0+dn) for all t and c satisfying the restrictions after (2. 2. 5)). Making use of Lemma 2. 2. 1 it is seen that -w 2cdr1 8n (l—e )<( +——)(1+acd) — t to n n for some constant a and that ndZ n -(k -nt)w '1 t1 > (i—e “ < (1+2e for n sufficiently large. Thus 2cdn ( l Ln(t, kn) < t 1+a (n,t, c)) (2.2.17) where lim sup a'(n,t, c) = O . n->00 A 31 In order to obtain a lower bound for Ln(t, kn) return to . 1/2 11 _ _ 1 : (2. 2.13)w1th nt An — [n(t+cdn)] [n en] and kr1 kn+ (n log n) Then, using Lemma 2. 2. 1, line (2. 2. l7) and Crame’r's Theorem it is s een that t0 n 2 ns _ n 2 2 10,103.11 3 e j>k . — n and Z P (A",j):(nlogn)fl1/Z j>k' n n “ n for n sufficiently large and some constant a . Thus 2cdn Ln(t,kn) Z t (1+a (n,t,c)) (2.2.18) where lim sup a"(n, t,c) = O . The Theorem is proved. n—->00 A The use of Cramér's theorem to obtain asymptotic expansions for Z P (A' ,k ) and Z P (A", R) was not essential. An alterna- k>k n n n k>k' n n .... n — n _ tive approach is to note that if p(A, k) = Ake X/kl then repeated integration by parts verifies [7, p. 163] l w -x n 2) p(A, k) = 3—,] e x dx. k=0 ' x 32 Of course 111 can be evaluated using Stirling's formula and asymptotic expansions for the above integral, an incomplete gamma function, have been worked out by W. Fulks [9]. The following lemma presents an asymptotic expansion for binomial probabilities [7, p. 169] . Lemma 2. 2. 2 Let {dn} be a positive sequence such that dn : 0(1) and ndf'l > aologn for some a > O. For each t 6 (0,1) let {cn(t)} be a sequence such that cn(t) :1 , m : mn(c,t) = n(t+cn(t)dn) is integral valued and for some 5 > 0 , cn(t)dn < l-t -e . Define _ n m n-m Pn(t,m) — (m)t (l—t) (2.2.19) and f (t,m) : (21rn(t+c (t)d )(l-t-c (t)d )-1/2 n n n n cn(t)dn ‘m cn(t)dn “(mm (1 + ———E-—-—) (I - ——1—-_-—t—-) (2.2.20) Then Pn(t,m) : fn(t,m) (l :l: O(-nl—t)) (2.2.21) Proof. A refined version of Stirlings formula [7, p. 52] states that l l .. Z .. l/2nn+1/2e n 12n+l . (2n)1/2nn+1/ e ne12n (217) m [A :1 [A Thus f (t ) _.1___ __1_ __.1___ n 'm 9"" 12n+1 ' 12m ‘ 12(n-m) _<_ Pn(t, m) 1 1 1 3 fn(t’m)exp(12n ' 12m+l ' 12(n-m)+l) and the result (2. 2. 21) follows after slight manipulation. Lemma 2. 2.3. Let K have a binomial distribution with parameters n and t . Let m=n(t+cn), 00' Then (l-t-c ) (t+c c n c n]] szgslll-l—fi-l (1+?) Proof [3] . For x > 1, K 2 O, xK is an increasing function so that P[K > m] < min s“m(ts + 1—t)r1 S>l where E SK 2 (ts + l-t)n Differentiate the right hand side with respect to s to find the minimizing value is s = (£)(-1—-£) > 1 . O n-m t 2.3. Probabilities of Excessive Deviations of the Kolmogorov-Smirnov Statistics Let X1,X2,...,X be a sequence of independent uniform random n variables on the unit interval and let Fn(x) denote the sample distribution function. Let p(t, %.— t) denote the probability "density" that the (unique) maximum of Fn(S)-S , for se[0,l] , is attained at t and is equal to E;— t . Thus n l n ‘jp z p(t, 9.- t)dt = 1 . (2.3.1) m=l n 0 If pn m(t) denotes the probability density of the m-th order statistic, X(m) , then m _ = _ m _ . = p(t, a. t) pn,m(t)P[Fn(s) s 5 E. t, 05s51|x(m) t] (2.3.2) and we have the following. Lemma72.3.l. ‘ fi—‘Vfi'i For p(t, %.~ t) defined as above and lgmgn - t) = a(g)tm‘l(1—t)n‘m(1— n-m )Ln(t,m) . (2.3.3) p(t, ETTZE) :15 Proof: The probability density of X(m) is [12] = m-l _ n-m pn m(.t) m(§)t (1 t) (2.3.4) In view of theorem 2.2.1 , 34 35 P[Fn(s)-S 5 g - t;tgs51]x(m)=t]=1 - 53.12.15) (2.3.5) Since Ln(t,m) is defined in (2.2.3) as P[u—X(u)sl;0 dn] = Zp(t,-m- - t)dt. (2.3.7) n ‘ n. 0 m2n(t+dn) An asymptotic expansion for the integral on the right (in 2.3.7) is obtained (for suitable sequences {dn}) by first finding an asymptotic expansion for p(t, %.— t) and then one for p(t,% - t) for mzn(t +dn) fixed t and finally carrying out the required integration using the method of Laplace. Let {dn} and {cn(t)} be sequences as defined (and restricted) in Lemma 2.2.2 and m=mn(c,t)=n(t+cn(t)dn) . The following Lemma presents an asympotic expansion for p(t, % — t) Lemma 2.3.2 Let p(t, %.— t) be the "density" defined prior to (2.3.1). Let m:mn(c,t) and c=cn(t). Define cdn t+cd cd l-tcdn 2 __ n __ n dn gn(c,t)— log(lfi-E—) (l l-t) (2.3.8) 36 and _ 2c2 -1/2 bn(c,t) — t(l_t)[21r(t+cdn)(l--t--cdn)] . (2.3.9) Then for any t and t such that 0 < t < t < t < 1 and sequence 0 1 0 l c such that c > c and c d = 0(1) n n n n 2 m 2 nd g (C,t) p(t, B- - t) =/n dnbn(c,t)e “ “ (1+a(n,c,t))(2.3.10) where 1im supa(n,t,c)=0 and the supremum is over all t ; n—m A 0 < t0 < t < t1.< 1 and over all c ; 1 s c S on. Proof: The proof is achieved by using expressions (2.2.4) and (2.2.19- 2.2.21) in expression (2.3.3). Next on the agenda is the task of evaluating Z p(t, %— t) m3n(t+dn) for fixed t and sequence {dn} For each k=0,l,2,... define the sequence {c:(t)} so that n(t+c:(t)dn) = [n(t+dn)]+k if [n(t+dn)]l and n(t+andn) = [n(t+adn)] , the upper and lower Darboux sums of 2 a ndngn (C s t) n 11an bn(C,t)€ c;(t) obtained by breaking [c'(t),an] into intervals of width l/ndn are dc n(a -l)d —l “ ndzg (c:(t>,t> bn(ck(t),t)e n “ (2.3.13) k=l m and n(an—l)dn k ndign(c§(t>,t> b (c (t),t)e (2.3.14) = n I1 respectively. This leads to Lemma 2.3.3. Let p(t, g.— t) be the ”density” defined prior to (2.3.1) and {dn} be defined as in Lemma 2.2.2. then 2 ndzgn(l,t) p(t, E:- t) = dn(t)e n (1+a(n,t)) (2.3.15) m3n(t+dn) n where 1/2 —1/2 an(t) = 2(ndfi) (2nt(l-t)) , (2.3.16) gn(l,t) is defined in (2.3.8) 38 and lim sup a(n,t) = 0 . n+°° t01 let kn=[(a-l)ndn] and write the left hand side of (2.3.15) as n m:(k,t) :E: mg(k,t) :3: p(t, ------~- — t) +- p(t, -——————- - t) (2.3.17) =1 “ kgkn+1 “ = Q1+Q2 say . It will be shown that Ql is asymptotically equivalent to the right hand side of 2.3.15 and that Q2 is asymptotically negligible when ”a" is a sufficiently large real number. Making use of (2.3.10) and (2.3.12) through (2.3.14), Ql may be expressed as -l a ndzg (c,t) (ndan bn(c,t)e “ “ dc)(1+a*(n,t)) (2.3.18) 1 A 3\ Q. :3 N v D [_J I 2 0 ndngn(Cn(t).t) + 0n(bn(Cg(t),t)e where 2 and lim sup a*(n,t) = 0 n+w t0_ eu du . (2.3.23) 1 2 1 ndngn(a,t) 0 40 8 V _ (Here gn(a,t) --§E— gn(c,t)) In particular if we let a = 1+0 where 0 > 0 , 0 = 0(1) n n n n l = and 0(0n) nd n then 2 nd g (l,t) an ndign(c,t) bn(1,t)e n n ndn bn(c,t)e dc== (l+o(l))_(2.3.24) dnlgg(l,t)] 1 To verify (2.3.24) note that ndzé g'(a t) < ndzd '(1 t) nn n9 .. nng a and 2 dn dn 1 _ _ ——-_ _.___ ndndng (l,t) — ndn0n(log(l+ t ) log(l l-t)) (2.3.25) 2 -ndn0n < ____.__ 2t(1-t) provided d < . to(l-to) tl(l-tl) n min —*——-7{-* , 2 The latter inequality is obtained by noting that f(x) = logC£:£. — log E:£:§;_-__3L__ t t+x 2t(1-t) is such that f(0) = 0 and f is increasing if x is small. Since 1im ndfidn = m we conclude that the integrals on the right hand side ofn(2.3.22) and (2.3.23) converge (uniformly for t0| for some a(n,t) such that lim sup a(n,t) = 0 . n+m t a0 log n for some aO > 0 , the constant a in (2.3.27) may be selected so large as to make 02 asympotically negligible compared to Q1 for all t0"s)i .., given that the supremum is attained at the point t , t0: dn . Odt = o(e n ) (2.3.32) 0 t1 To verify (2.3.31), gn(l,t) is twice differentiated with respect to t 3 2 = 2 = ._£_ _ _ l-t dnfn(t) dngn(1,t) (t+dn)log t+d +(1 t dn)log( ) n 1—t-dn (2.3.33) d t+d a 2 n 1-t n dzf' t = ———-d 1,t =—————— - 1 -——— . 2.3.34 n n( ) at ngn< ) t(1_t) og< t l-t-dn) < > 2 2 2]“ l dzf;(t) =-—§§ d gn(1,t) = —d 2 + 1 ;] < 0 n at n n t (t+dn) (1-t)2(1-t—d (2.3.35) 44 By using Taylor expansions of the log—terms in the expression 2 d d d _. = n _ _E. _ __E. dnfn(t) t(1—t) log(l+ t )+ log(l l-t) it is seen that t: , the (unique) maximum of gn(1,t) is in the interval 6%(l—dn), 1/2) for all n sufficiently large. Using the mean value theorem, gn(1,t) may be expressed as (l t) = (l t*)+ -l—(t—t*)2f"(3 ) (2 3 36) gm ’ gn , n 2 n n n e o for some 6ne[[t:,tn] U [t,tgH In evaluating the integral in (2.3.31) the following notation is used: _ —2 —1 hl(t) — t (t+dn) , _ —2 —1 h (t) — (1—t) (l—t-d ) , 2 n = * t1,n tn + an , = 2': — t0,n tn En , where - —2 _ 2 En — 0(1) and En — o(ndn) , u = (nd:(hl(tl n)+h2(t0’n))l/2 : _ 2 1/2 v — (ndn(h1(t0,n)+h2(tl,n)) , and for some e>0 , t0<1/2—e and tl>1/2+e . In view of (2.33) through (2.3.36) 2 ndnfn(t:) 7 tl,n 2 nd f (t) 01 (t‘)e u (t -t*)':_x' f a(t)e nn dtg n n n l,n ne2dx t n u 0,n n ._ it Un(t0,n tn) 45 2 nd f (t*) V (t -t*) ndzf (t) a (tt)e n n n n lsn n 2 a (t)e “ “ dt 8 n -X n > '— dx 2 _ * Vn(t0,n tn) ' H for some tn 6 [t0,n’t1,n] and tn €[t0,n’t1,n] Because tgrvl/Z it is seen that I dn(tn) dn(tg) -1/2 ___.__./v-—————-/~/(2n) un vn so that t1,n 2 2 ndnfn(t) ndngn(l,t§) dn(t)e dt/Ve ‘ (2.3.37) t 0,n Also, referring to the definition of dn(t) , (2.3.16), it is evident that t t 0’n ndfifp(t) 1/2 2 1/2 0,n “dzfn(t) . — n an(t)e dt 5 2(2nt0(1—t0)) (ndn) dt, t0 to (2.3.38) Here, fn(t) may be expanded as . __ , 1 _ fn(t) — fn(t0’n) + fn(7)(t t0,n) for t0 (l-t-dn-n) 1—t—dn t+dn n(l—t-dn) l—t Note that (1 — %)x < e—1 and (l - %)-X < (1 — %)—le for x > 0 so that 2 -1 l—t—d nd f (t) n 11 Z p(t, g- t) : n(l- 3}) <———-—)e “ mzn(t+dn) 1't where fn(t) is defined in 1ine(2.3.33). Thus there is a constant c > 0 such that t t () :E: 0 ndzfn(t) p(t, E-- t) 5 cu e n dt . (2.3.43) m2n(t+d ) n 0 1'1 0 As in (2.3.39), fn(t) may be bounded above by fn(t) s fn(t0) + (t—t0)fg(t0) for O < t 5 t Thus the right hand Side of (2.3.43) is bounded 0 . above by 48 2 en ndnfn(t0) 2 (2.3.44) Y ndnfn(t0) which is asymptotically negligible when compared to 2 * ndnfn(tn) This may be easily demonstrated by means of the following crude inequalities. For t < t: fg, = (t—tpfmn) : (-21- - t) and _ t2 11 fn(t) — fn(2t)—tft'l(2t) + 2 End) 1 f 2t —-——— s n( ) l6t provided 2t < t3. Thus the expression in(2.3.44)is less than 2 ndn 2 _ IBIb ce endnfn(2t0) ’i_ 2 (203045) 12 t0)dn ' * prov1ded 2tO < tn . . 2 Then Since fn(2t0) < fn(tfi) and ndn > aolog n selecting tO < a0/16 also will assure that expression (2.3.45) and hence expression (2.3.43) is asympotically negligible compared to (2.3.37). A similar argument is used to show 11 1 :E: ndfifn(t*) p(t, E:— t)dt = o(e “ ) (2.3.46) m3n(t+dn) ' t1 49 Recalling Lemma 2.3.1 and Lemma 2.2.3 it is seen that p(t, g — t) .<_ $319104?“ 2' n(t+dn) ndzf (t) m ________ n n . p(t, 'n - t) _<_ e if t < 1-c1r1 t m3n(t+dn) and and is zero if t > l—dn Thus ndgfn(tl) [:2 p(t, 3,? — t>dt 5 en 3 m>n(t+dn ) and (2.3.46) is verified. The results of this section are summarized as: Theorem 2.3. 1 If {Xn} is a sequence of independent, identically distributed random variables with continuous distribution functions, F , and + Dn = sup Fn(x) — F(x) is the Kolmogorov-Smirnov statistic generated x by X1,...,Xn then for any real sequence {dn} such that d > 0 , n (1n = 0(1) and ndi > a0 log n for some aO > 0 2 nd f (t*) + nun P[Dn 3_dn]nve where dn c1n -dr21fn(t)= (t+dn)log(l+ —)+(1—t—dn)log(l— TH—t and t: is the unique value at which fn(t) attains its maximum for O < t < l . 50 Corollary_2.3.l For the sequences {Km} and {dn} defined in the previous theorem 2 _ ndnfn(t;) P[Dn > dn] Ne ndif (til) Pan > dn]r~/2e ‘1 2. 4. Probabilities of Excessive Deviations of the Kuiper Statistic For a sequence of random variables, each with distribution function F , the Kuiper statistic is defined in terms of the sample dis- tribution function, Fn’ as This statistic was originally suggested to test hypotheses about distributions on a circle. It has the property that Vn is the same no matter where on the circle the count to determine Fn(x) begins. To make this statement more precise, define addition on the unit interval by s+t= s+t if s+t<1 s+t-l if s+t>l and define the "interval” [s,s-i—t] by {x:s:x:s+t} if s+t<1 and {x:s:x:l or 0:x:s+t-l} if s+t>1. Let X , X . , Xn be independent uniform random variables 2... H on the unit circle. Define X:(t) as the number of Xi in the interval S (s, S-i-t] and nF:(t) = Xn(t). Then V = sup (F (x) - X) - inf (F (x) - X) n 05x51 n 0:x_<_l n s . s = sup (F (x) - x) - 1nf (F (x) - x) 05x51 n 05x51 n Let Dn(v) : Fn(v) - v. If the infimum of Dn(v) occurs at s and the maximum at s-i—t and the mth order statistic in the interval (s, 8-1-1) 51 52 occurs at s-i—t then V = E- - t. Let p (t, E - t) be the joint n n n n "density" of the above event. Then 1 l n-l m 23 p (t,— -t)ds dtzl. m=l n n 0 0 Using the notation developed in (2. 6. 3), pn(t, It? - t) may be written as L(t, n: m) P(t, n) m) R(t, n: m). Let A be the event that an order statistic occurs at s and the th m order statistic in the interval (5, S-i-l) occurs at S-i-t. The joint density associated with this event is n! m-l n-m-l P(t'”'m) Z (m-l).'(n-m-l).' t ”'9 for O clogn for some c > 0 . Then if m = [n(t+dnll 2 -1/2 3/2 ndn gn“) r) (3 . P(t,n,m)~ [21rt(1-t)] Here, gn(t) is defined by d d 2 _ n _dngn(t)—(t+dn)log(l+—t-)+(l-t-dn)10g(l-——). The proof involves using Stirling’s formula in a manner Similar to that used in Lemma 2. 3. 2. Lemma 2. 4. 2 If {dn} is a sequence of real numbers such that dn = 0(1) and ndrz1 > clogn for some cZ :t/Z then for m = [n(t+dn)] , 0 < t < 1 2d 2 L(t.n.m)~ (——3>. Proof. The proof consists of breaking L(t, n, m) into parts by con- ditioning on Fug-2:1) . First note that P[X:(t/2) = le] follows the binomial distribution with parameters 1/2 and m-l ; P[X:(t/2) = le] = (mj‘1)(1/2)m“l for 0_<_j :m—l. 54 Let Ll(t,n,m,j) : P[Dn(s) 3 Dn(v) :Dn(S-i-t) ; v e [s, s+t/2]]X:(t/2) = j,A] L2(t, n,m,j) = P[Dn(s) :Dn(v) :Dn(s-i-t) ; v e [s-i-t/2, s-i—t]]X:(t/2) = j,A]. Then m l m 1 ( .- )L1(t,n,m,j) L2(t,n,m,j) . 0 J m-l M I L(t, n,m) : (1/2) 1 As in sections 2. 2 and 2. 3 the relationship between uniform order statistics and a Poisson process, X(-) , can be exploited. P[Dn(s) : Dn(v) ; ve [s, s+t/2]|X:(t/2) = j,A] P[u—X(u) : l ; 0 < u < nt/2]X(nt/2) : j] Ln(t/Z,J+l). If j = n/2(t+dn+o(dn)) then Theorem 2. 2. 3 yields L (t/Z.j+1)~ Zd /t. n n Clearly, L1(t,n,m,j) = P[D (s) < D (v) ; vs [s, s—i—t/2]|erl(t/2) = j,A] _ P[Dn(s) _<_Dn(v) and Dn(v') > Dn(s—i-t); for all ye [s, s-i—t/2] and some v' e [s, s-i—t/Z]]XE(t/Z) : j ,A]. It will be shown that for j : n/2(t+dn+o(dn)) that P[Dn(v') > Dn(s-i—t) for some v' e[s, s-i—t]]Xi(t/2) : j,A] = O(dn) so that 55 L1(t, n,m,j) ~ Zdn/t‘ To evaluate P[Dn(v) > Dn(s-i—t) for some ve (s, s-i-t/Z) Ix:(t/2) =j,A] observe that, conditionally, there are j - 1 order statistics in the interval (8, s-i-t/Z) so the conditional probability density that the kth order statistic occurs at s-i-v is the probability density that the kth of j - l uniform order statistics on the interval (0,t/2) occurs at V 3 .. '. _ _ .-l-k (t/Z) lkflklnfig)“ 1W??? 05v5t/2. The density with respect to u = (t/2)-1v is k-l j-l-k k_n(t/Z-v) . The conditional probability that Dn(x) does not exceed k/n - v to the ”left" of v is P[Fn(x)-x:k/n-V 01. Let e n e— (2ce —u)dr1 - usn k = j u+ n (t+d +8) . n n Since knzju+c0jdn/t forsomee, 130:2;c31, we may obtain, using Lemma 2. 3.4 that P[Dn(v') > Dn(s-i-t) for some v' e [s, s-i-t]]erl(t/2) : j,A] nnn e ( 1(d2/t21g (1.u:)) O ~2jdi/t2 o(e ) O(n-cZ/t) Similarly, L2(t, n, m,j) ~ 2dn/t . This may be verified by observing that P[Dn(v) :Dn(s-i-t) ; v e [s-i-t/Z, s t]|x:(t/2) = j,A] p[u_x(u) :1 0 < u < nt/2]X(nt/2) = m-j] Ln(t/2,m-j+l) and then proceding with the proof that L1(t, n, m, j) ~ 2 dn/t . 57 Apparently . . s 2. _ P[Dn(s) _<_Dn(v) :Dn(s+t)ve [s, s+t]]Xn(t/2) j,A] - 0 unless nt/2 . n (b) J=Z(t+(l+0)dn) |e|<1 (c) other j 2d 2 For part (a) the sum is asymptotically ( tn) (1 -o(l)> and parts (b) and (c) are negligible compared to this. I Corollary 2. 4.1 II 0 A ...—1 V If {dn} is a sequence of real numbers such that (1n and rid:1 > clogn for some c > 0 then for m = [n(t+dn], 0 < t <1 d L(t,n,m) = 00—3)}. Lemma 2. 4. 3 H O A l--' v If {dn} is a sequence of real numbers such that dn 58 (l-t) 2 and ndr21> clog n for some c2 3 then for m = [n(t+dn)] , 0n(—12t)‘ Also, using the notation of Theorem 2. 2. l, P[D (v') < Dn(s) for some v' e [s-i—t, s-i—t-i—l—g—t [A and Xs+t(l-£—E) = j] dn+€n P]: sup (F . (u) -u>> ] l-t n’3 2 0 0 , dn = 0(1) and ndi>1/210gn then 2 ndrzign(t:) P[V >d ]~ 8nd e n n n where ng (t) = (t+d )log -—t— + (l-t-d )log -—l—‘£— and t]< is n n n t+dn n l-t—dn n the value at which g(t) attains its maximum 0 < t < 1 . Proof. The proof is similar to that of Lemma 2. 3.4. By Lemma 2.4.1, 2. 4. 2, 2.4. 3 it is clear that for m = n(t+cd) C n 61 m 2 c 4 3/2 ndngn(c,t) pn(t9 n 't) ~ (217) —1/2 (cdn) r1 e (t(1-t))‘5/2 , here drz1 gn(c, t) is determined by replacing dn by cdn in the definition of d: gn(t) and, repeating the reasoning of Lemma 2. 3. 3, z; (t, m_____+k -t) k=0 “ 1-t 3.;- _ _5 nd g (c,t) ~ 4nd I (2 ) 1/Z( (1 t) /Z( d )4 3/Ze dc 1 5/2 2 _1/2 ndn ) 2 ~ (Z1T) 4c]~oz(c)[6(c)]n, (2.5.1) where * t>1<+c * l_t*-c (3(6) = [1 f, ) (—l-f——) 1, (2.5.2) t +c l-t -c and ei-w‘ -1/2 a(c)- 9%)]: 1 t ‘C + *2 :1 (2.5.3) 2(]t +6)2 (l-t ) (t'+o) ’ >1: Also, t is a root of the equation 62 63 log(t .l't‘c) + C :0;0 O and r15: = alogn if 31—252 < c2 and O elsewhere. Let K be a Poisson random variable with parameter p . Then -m K m m- P[K :m] _<_ min S ES = (i) e H (2. 5. 8) s>l In particular, if p : ntxr-l and m is defined as above [18 - - n ntx m-ntk - 2 p (x',j) < < “)9 “rn n 1'1 — m —- —- n8 _ n + . - z Pn(>.n.3) : e 2(t+°+28n+97“ _<_ n 3 (2.5.10) j>m+2n5 — n and 3 1'18 n5 _ n + n 2 Pn(>.+,j) _<_ minsmEs‘K EeZ(t+c+9m 3(t+c+9/n)2 _<_ n-‘a j_<_m n 521 (2.5.11) . . + . . Let W+ denote the root assoc1ated With )‘n as defined in Theorem 2. 2. 2. Similarly let w_ denote the root associated with x; . Then, returning to the inequality (2. 2. 12) it is seen that -W 1. (t,m) 5 (1—e J“) n (HEN—a) (2.5.12) for all n sufficiently large, and using inequality (2. 2. 13) it is seen 65 that -w Ln(t.m) 3 (1- e (2.5.13) ‘)<1-2n'a) for all n suficiently large. Since X = xg'ze')‘ the Finally, let w denote the root associated with x. >1: >:< - w = X - x where K < 1 is the solution to Xe following inequalities obtain 2(5 + e/n) 2(.c.In + e/n) w 1/2 , it is concluded that Ln(t,m) = (1-e'w)(1+3(n’l/Z)) (2.5.16) An asymptotic expansion for the expression p (t,m) = m(n)tm’1(1-t)n‘m n m may be found in a mannersimilar to that used in Lemma (2. 2. 2). It is found that (t+c)n 1/2 n 6 k2 Pn(t, m) 2 ( 2 ) (p(t, c)) (a(t, c)> (1 +0(-——)) Z'rrt (l-t-c) n (2.5.17) where _ t t+c l-t l-t-c B (2.5.18) — n—l/2 and sup an(t, k) = o( ) for x < t+c < x and 0 i k 3 kn: o(nl/Z) 0 1 The probability density that the maximum deviation of D: from 0 occurs at t and is greater than or equal to n(t+c) is k qn_0 k+9 where mk = n(t+c+ n) is integral valued and 0 gen < 1 . To see that m p(t! _ -t) qn(t’ C) 1-a(t, c) (2. 5. 20) write k' Ink rnk q(t,c)= Ep(t,————-t)+ Z) p(t,—~t). n k=O “ k:k'+1 “ The first sum on the right tends to the required limit for large k' ; k' m m k'+1 1330 P“, ‘31: ‘ t) = P“, “29‘ “(I .13. Willa”) (1+32 k<§>tk‘1<1-t)“‘k kzk'H “ t_>_k'+1 c+k' l-t—c-k' c+k' t+c+k' -n :"Ul- 1..) (1+ .) 1 nk' t l-t-k'-C' n 5- n(l-t ' c+t+k‘ ) (find) -l : o(fi(t,c))n for some k'=O((10rgln) )(2.5.21) \ Thus qn(t, c) may be written as 9n qn(t,c) = fn(t,c)(a(t,c)) (1+an) (2.5.22) where Ian] : o(1) independent of s and c for s bounded away from zero and s+c bounded away from 1 . The probability that D: is at least c is l-c f qn(s. c)ds (2.5.23) 0 and for fixed t such that n(t+c) is an integer t t f qn(s,c)ds f1fn(s,c)[a(s,c)]e(l+o(l))ds 1 (1,3) (c-3- t (1 +o(1))f fn(s, c) [a(s, c)]n(t‘s)ds. 1 “H (2.5.24) The integral on the right may be expressed as t-— O n l l uc -u u ~_f (t: C) [em][a(tyc)] [a(t-_:C)] du O 1 uc ~._fn(t,c) f e-——t(1_t)du O , C _ t(1-t) 't(l-t) _ cn fn(t,c)|:1—e ]‘ (2. 5.25) This may be seen by letting s = t - g; 0 i u < 1 then t 1 f1fn(s,c)[a(s,c)]n(t“5)ds = if fn(t-%,c)[a(t-:1—1,c]udu o t-- 1'1 and observing that a(t-g, c) ~ a(t, c) and that t-E n(t+c) 14+}: n(l-t-c) u n n’ n ’ l-t t l-t-c CU. ~fn(t.c)e “1‘” [a(ncn‘.u This final relationship is used to define g (s, c) , a smooth approxima- c . s(l-S) . Def1ne tion to qn(s, c). Define b(s,c) : log a(s,c) + C _ b(s,c) s(l—s) - s(l-s) fn(s, C)[1_e-b(s, C):] l:_c—:_ 1—e ](2. 5.26) gn(s,c) - if b(s,c);1é 0 and 69 C gn(s,c) : fn(s’c)(S(lC-S))(1'e s(l-s)) if b(s,c) = 0. Then (by 2.6.24, 2.6.25) And from the above discussion it is evident that for t bounded away from O, and t+c bounded away from 1, that t t flqn(s,c)ds :(f1gn(s,c)ds)(l+on(t,c)) t-— t—— 11 1’1 where (on(t,c)' < an and an is a sequence convergent to O inde- pendent of t and c . Thus for O < tO (x)en (X) is absolutely integrable for every positive value of n >n . 0 (ii) h(x) has a single maximum in the interval, an interior point of (a, b). (iii) h"(x) is continuous, h'(y) = 0 and h"(y) < 0. Then 8.8 H96.) b nh(x) ~21: ”2 nh( ) 41X) e dX ~ ¢(Y) [m] e y a In our case t+c l-t-c t 1-t h(s) : 10g(t+c) (l-t—c) c )_[ s+c Jl/Z(1—e‘w)[ b Jl: e- s('1_s)] (MS — 2'rr(l-s-c 1-a 1_e‘b - a = a(s, c), b = b(s,c) , w 2 Wk as defined above, and g (S,c)ds : n (13(5) n (S)ds : ¢(y)( -”1T) en (y) h (V) to to Since t+c l-t-c t l—t NS) = ”g(m) (1-1;-..) 11"(8) II I O N (_‘7 U} 3TH + O + T: I U) AH y—l l U) I O l___l 71 it is evident that h'(s) has at most one root in the interval [0, l-c] and it must be in the interior of the interval since 1im h'(s) : co lim h'(s) z -oo s—>O s—>1-c Jo ’\ >:< : Call this root t and select t0 and t1, so that to < t < t1. Evidently conditions (i), (ii), (iii) of the method of Laplace are satisfied. Thus t1 t1 q (s c)ds~f gn(s,c)ds to to ~(1-e-W ) l-t*-c + l -l/Ze-nh(t*) C >}< 2 >1: 2 z}: 2 >' t (t +c) (1-t) (t +c) = a(C)[B(C)]n (a(c) and (3(c) are defined in lines (2) and (3) of the Theorem), To complete the proof it should be noted that t t 1 0 l P[D:>c] zf qn(t,c)dt+f qn(t,c)dt+f qn(t,c)dt 0 t0 t1 t 1 O ~a(c)[(3(c)]n+f qn(t,c)dt +f qn(t,c)dt. 0 t It remains to be shown that the two integrals on the right are small compared with 01(c)[(3(c)]n . 72 The integral to qn(t, c)dc 0 denotes the probability of the event that the maximum deviation of Fn(s) - s is at least c and is obtained for some 5 in the interval (0, t0) . The event described is realized only if at least nc of the order statistics, Yk , are in the interval (0, t0) . The probability of the latter event is P[K 2 nc] where K is a binomial random variable with parameters n and t0. Thus to qn(t’ C)dt < P[K > nc] :min S-nc Es +K O — _ s>l : min s-nc(ts +1 -t)n s>l = [1360.0]n say l-c Clearly 6(t0,6) < (3(6) if 0 < t0 <1/2 and to < (l-c) C c[(3(c)]l/C. For any such tO , to f qn(t.c>dt = o(a(c)[p(c)]“) o A similar argument demonstrates that 73 1 f qn(t,c)dt _<_ P[K > n(t1+c)] _<_ [(3(t1,1c1+c)]r1 t1 (here K is a binomial random variable with parameters n and t1) so ti" < t < l-c for any t1; 1 This concludes the proof. The following two theorems are immediate. Let X1, X2, . . . be a sequence of independent, identically distributed random variables with common, continuous distribution function F . Define the sample distribution function as Define the Kolmogorov-Smirnov statistics: D- = sup (F(x) - F (x)) n -wi< >.< * 1- ta: 1- >5 B ; B :: exp{(t +C)(10g * )+(1-t -C)(10g_ji<—_)} : g; t +c l-t -c l-t -c >:< >1: >1: >§< _ w t [l-t -c t +c :] 1/2 a? 0‘ = T + —“‘::‘2‘ C t (l-t ) . Theorem 2. 5. 2 Let D; be defined as above, then n P[D;>c]~a"(3 . Theorem 2. 5. 3 Let Dn be defined as above, then P[Dn>c]~ 26.67l To verify theorem 2. 5. 2 assume, without loss of generality, that X1,X2,... define Uizl-X., i:1,2,3,... . Let 1 then x - F:(x) .—. F:(l-x) - (l-x). are uniform random variables on the unit interval and 75 Thus P[D;l > c] = P[sup(F:'(u) - u)> c] and the probability on the right u is determined in Theorem 2. 5. 1. Theorem 2. 5. 3. follows from the observation that P[D >c] = P[D+>c]+P[D‘>c]—P[D'>c,D+>c] n n n n. n and the probability on the right is small compared to the others. 2. 6. Probabilities of Large Deviations of the Kuiper Statistic. Let X X be a sequence of independent, identically dis- 1, 2’ . . . tributed random variables with common continuous distribution function F . As before, the sample distribution function generated by the first n random variables is defined as F(X)= Z ;i=1,Z,...,n n x. c] lim =l;Oc] n—‘CD The following theorem, together withTheorem 2. 4. 3 proves that the probabilities of large deviations of Vr1 and Dn are not asymptotically equivalent, in fact, P[D > c] lim n = 0; 0 < c <1. n—>m p[Vn> C] Theorem 2. 6.1 Let Vr1 be defined as above and let c be a real number, O c] ~ arl(c)n[(3(c)]n . (2.6.1) The definitions of [5(c) , w;< and t).< are given in Theorem 2. 5. l and 2 1/2 2 -1/2 “1“) z “1-2.-t) 372 [7—1— + 21 1 (2-6'21 (l-t) (t+c) t (t+c) (l-t) (l-t-c) w=w*, t= t*, Proof. The method of proof is similar to that of Theorem 2. 5. 1. Assume, without loss of generality, that X1, X2, . . . , Xn are n inde— pendent uniform random variables on the unit interval. The notation s 4— t is used to describe addition on the unit circle. That is, for- O_<_s (t+c) n 3f2 (“139(3) . (2.6.5) As was shown in Theorem 2. 5. 1 (3%, c+ E) ~ ak(t, c)(3n(t, c) ..,.) = (1%?) (MILE?) The joint probability density that the infimum of Fn(v) - v is at s and the maximum is at s-i-t and that Vr1 Z {:11 - t , m: n(t+c) , is qn(t’ C) ~ its??? pn(t, c), Apparently qn(t, c) is independent of s so that 1 ,{ qn(t.C)ds = qn(t,C) as in Theorem 2. 5.1 we have t t f quill, c)du ~f gn(u, c)du 1 t-H t- Eli—- for gn(u.C) = fn(u,c) [lbw eb(u, C)c] ENC u:|l:1e eu(1C 11)] 80 fn(u, c) denotes the right hand side of (2. 6. 5)divided by l —a(t, c) , and b(u, c) is defined as in(2. 5.26). Finally the probability that Vn is greater than c is t l 1 P[Vr1 > c] = f qn(t,c)dt ~ I gn(t,c)dt O t O for any t0 and t1 such that O < tO < t* < t1< l-c . The last integral may be integrated using the method of Laplace thus obtaining P[Vn > c] ~ 011(c)n[(3(c)]r1 . It remains to verify the asymptotic expan- sions listed under (2. 6.4). The asymptotic expansion of P(t, n,m) is found by using Sterling's approximation of k.‘ as described following line (2. 2.21). Next, for any Poisson stochastic process {X(v), v30} , define L'(a,b, k) : P[a:v-X(v) _<_ 1 ; 0 < v < b|X(O) = o, X(b) = k]. For L(t, n,m) as defined in(2. 6. 3), identifying the m - 1 order statistics in the interval (s, 8-1-1) with the corresponding order statistics of m - 1 independent random variables on the interval (0, nt) and these, in turn, with the m - 1 jump points of a Poisson process in the interval (0,nt) we observe that L(t,n,m) = L'(nt-m, nt,m-1) -1 _ = ‘JE L'(nt-m, 925-. [5117] +3) - P[X<%5)=[1n—211—]+11X0 (2.6.10) and w is the largest real root of the equation w : (l + E) (l -e-W) . Applying (2. 6. 7) through (2. 6. 10) to (2. 6. 6) it is seen that Z ) L(t, n,m) ~ (1 - e‘W The asymptotic expansion of R(t,n,m) is obtained in a similar manner and we have 82 1w. n, m) ~ (ff-,- This completes the proof. BIBLIOGRAPHY 10. ll. 12. 13. BIBLIOGRAPHY Abrahamson, Innis G. (1967). Exact Bahadur efficiences for the Kolmogorov-Smirnov and Kuiper one-and two-sample statis- tics. Ann. Math. Stat. 38, pp. 1475-1483. Birnbaum, Z. W. and Ronald Pyke. (1958). On some distributions related to the statistic D: . Ann. Math. Stat. 29. PP. 179- 187. Blackwell, David and J. L. Hodges, Jr. (1959). The probability in extreme tail of a convolution. Ann. Math. Stat. 30, pp. 1113-1120. Copson, E. T. (1965). Asymptotic Expansions. Cambridge Univer- sity Press, Cambridge. Darling, D. A. (1960). On the theorems of Kolmogorov-Smirnov. Theory of Probability and _i_t_:§ Applications, 5, pp. 356-60. Doob, J. L. (1953). Stochastic Processes. Wiley, New York. Feller, William. (1957). _A_r_1_ Introduction £2 Probability Theory and I_t§ Applications. Vol. 1, 2nd edition. Wiley, New York. Feller, William (1966). £13 Introduction to Probability Theory and It_s Applications. Vol. 2, Wiley and Sons, New York. Fulks, W. (1951). A Generalization of Laplace's Method. Proceed- in s _qf_the American Mathematical Society, Vol. 2, p. 613- 622. Gnedenko, B. V., V. S. Koroluk and A. V. Skorokhod. (1960). Asymptotic expansions in probability theory. Proc. Fourth Berkeley Symp. Math. Stat. Prob. Vol. 2, pp. 153-170. Univ. of California, Berkeley. Hoeffding, W. (1948). On a class of statistics with asymptotically normal distribution. Ann. Math. Stat. Vol. 19, pp. 293-323. Karlin, Samuel. (1966). _A_First Course _ip_ Stochastic Processes. Academic Press, New York. Kuiper, Nicolaas H. (1960). On the random cumulative frequency function. Proc. of the Royal Neth. Academy o_f Sciences Seriesé Vol. 63, pp. 32-37. 83 14. 15. l6. 17. 18. 19. 20. 84 Kuiper, Nichlass H. (1960). Tests concerning random points on a circle. Proc. _qf_ the Royal Neth. Academy 3f Sciences Series A. Vol. 63, pp. 38-47. Loéve, Michel. (1963). Probability Theory (3rd edition). Van Nostrand, Princeton. Pyke, Ronald, (1959). The supremum and infimum of the Poisson process. Ann. Math. Stat. Vol. 30, pp. 568-576. Rubin, Herman and J. Sethuraman. (1965). Probabilities of moderate deviations. Sankhya’. Series A. Vol. 27, pp. 325- 344. ' Sethuraman, J. (1964). On the probability of large deviations of sample means. Ann. Math. Stat. Vol. 35, pp. 1304-1316. Takacs, Lajos. (1967). Combinatorial Methods in the Theory 9_f_ Stochastic Processes. Wiley and Sons, New York. Widder, D. V. (1941). The Laplace Transform. Princeton Univ- ersity Press, Princeton. .111 18111. 11.1 RSITY L IE IBRARIES 56 7667 11m 30 0 Mlllmlsllhlllllm 3 1293