MICHIGAN STATE UNIVERSITY LIBRARIES

This is to certify that the dissertation entitled "An Invariance Principle Applicable to the Bootstrap," presented by John Kinateder, has been accepted towards fulfillment of the requirements for the Doctoral degree in Statistics. Date: May 17, 1990.

AN INVARIANCE PRINCIPLE APPLICABLE TO THE BOOTSTRAP

By

John Gerald Kinateder

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1990

ABSTRACT

AN INVARIANCE PRINCIPLE APPLICABLE TO THE BOOTSTRAP

By John Gerald Kinateder

It is shown that the bootstrap of the sample mean can be viewed as a stochastic integral which isolates the roles of the resampling empirical distribution function $H^n$ and the normalized partial sum process $W^n$. An invariance principle is then established for the normalized partial sum process and the bootstrap partial sum process, encompassing both the normal and non-normal domains of attraction in the symmetric case. Using the sample representation of the form given by LePage, Woodroofe, and Zinn (1981) for the $\alpha < 2$ case, and Donsker's theorem in the finite variance case, we show that the processes $(W^n, W^n \circ H^n)$ converge jointly to $(W, W \circ H)$, where $W$ is the homogeneous independent increments S$\alpha$S process and $H$ is the limit of the resampling cumulative.

To my father, who worked hard, so I could play.

Acknowledgements

The author would like to thank Professor Raoul LePage for the suggestion of the problem and all of the helpful direction in solving it. He would also like to thank Professors Shlomo Levental and Anil Jain for their time and interest. Professor Hira Koul was particularly helpful with his meticulous reading of the original manuscript and many helpful corrections and suggestions. Finally, the author would like to thank the Office of Naval Research for supporting him for the last year and one half of his doctoral study.

Contents

List of Tables. vii
List of Figures. viii
1 Introduction. 1
2 The Stochastic Integral Representation. 6
3 The Invariance Principle. 11
3.1 The Main Theorem 11
3.2 Proof for the $\alpha < 2$ Case 12
3.3 Proof in the Finite Variance Case 29
4 The Limit Laws. 31
4.1 Infinite Variance ($\alpha < 2$): Symmetric Case 31
4.2 Finite Variance Case 34
5 Knight's result follows in the symmetric ($\alpha < 2$) case. 35
6 Simulation Results. 45
7 Remarks 52
7.1 Other resampling plans. 52
7.2 Only off by a scale. 53
A Appendix. 55
Bibliography. 63

List of Tables

2.1 Conditional distribution of $\sum_{k=1}^{2}(X_k^* - \bar X_2)$ given the data. 8
2.2 Conditional distribution of $\int r\, W^2 \circ H^2(dr)$ given the data. 8
2.3 Conditional distributions given the ordered data. 8
6.1 Analysis of sizes of bootstrap confidence radii. 51

List of Figures

6.1 Coverage of bootstrap method for various $\alpha$. 47
6.2 Distribution of bootstrap confidence radii for $\alpha = 1$, $n = 50$. 48
6.3 Distribution of bootstrap confidence radii for various $\alpha$. 49
6.4 Distribution of bootstrap confidence radii for $\alpha = 1$, $n = 200$. 50

Chapter 1

Introduction.

Suppose $X_1, X_2, \ldots$ are independent random variables distributed according to a distribution function $F$ with location parameter $\theta$. In order to make inferences about $\theta$, we may consider the distribution of the sample mean about $\theta$: $\bar X_n - \theta$. For example, the well-known Lindeberg-Levy Central Limit Theorem [Bil86] tells us that if $EX_1^2 < \infty$, then
$$n^{1/2}(\bar X_n - EX_1) \to_d N(0, \sigma^2),$$
where $\sigma^2$ denotes the variance of $X_1$. In the finite variance case, we can use this to make inferences about $\theta = EX_1$. Of course $EX_1^2 < \infty$ is not necessary for convergence in distribution of the sample mean.

Definition 1.1 $F$ is said to be in the domain of attraction of a distribution $\mu$ (not concentrated at one point) if there exist constants $a_n > 0$, $b_n$, and a random variable $Y$ with distribution $\mu$, such that
$$S_n = a_n^{-1}\sum_{j=1}^{n}(X_j - b_n) \to_d Y. \qquad (1.1)$$
Necessarily $a_n \sim c\,n^{1/\alpha}$ for some $\alpha \in (0,2]$ and $c > 0$; $Y$ and $\mu$ are said to be $\alpha$-stable.

In what follows, we assume that $F$ is in the domain of attraction of an $\alpha$-stable distribution and $X_1, X_2, \ldots$ is a sequence of i.i.d. $F$ random variables. If $0 < \alpha < 2$, then we fix a sequence $a_n > 0$ such that for each $y > 0$, $n(1 - F(a_n y)) \to y^{-\alpha}$ as $n \to \infty$. For such $a_n$, (1.1) holds with $b_n = 0$ for $0 < \alpha < 1$, $b_n = EX_1$ if $1 < \alpha < 2$, and $b_n = E\sin(X_1/a_n)$ if $\alpha = 1$. (For existence of such a sequence, see Feller [Fel71].) If $\alpha = 2$, then we choose $a_n$ such that
$$a_n^{-1}\sum_{j=1}^{n}(X_j - EX_1) \to_d N(0,1).$$
In either case, let $\mu$ denote the limit distribution.

Since the distribution $F$ is generally unknown, so is the distribution of $S_n$. Thus, if we are to use $\bar X_n$ to estimate $\theta$, then we need to have some idea of the variability of $S_n$. As was suggested by Efron [Efr79], in a wide variety of situations we can use resampling of the data $X_1, \ldots, X_n$ to estimate the distribution of an estimator. This is the essence of the bootstrap. Let $F_n$ be the empirical distribution function of $X_1, \ldots, X_n$:
$$F_n(x) = \frac{1}{n}\sum_{k=1}^{n} I(X_k \le x).$$
For each observation of the data $X_1, \ldots, X_n$, we consider the distribution of
$$S_m^* = a_m^{-1}\sum_{j=1}^{m}(X_j^* - \bar X_n),$$
where $X_1^*, \ldots, X_m^*$ are independent and distributed according to $F_n$. This is equivalent to simple random sampling from the original sample $X_1, \ldots, X_n$ with replacement, and applying the same statistic to the resampled data as we would to the original data. The resampled data is often called the bootstrap sample, and the conditional distribution of the statistic applied to the bootstrap sample (given the data $(X_1, \ldots, X_n)$) is referred to as the bootstrap distribution.

Bickel and Freedman [BF81] showed that in the finite variance case, the bootstrap distribution of $S_n^*$, given $(X_1,\ldots,X_n)$, converges weakly to $N(0,1)$. (Recall that the variance is removed here by the choice of $a_n$.) Singh [Sin81] showed that under the added assumptions that $E|X_1|^3 < \infty$ and $F$ is non-lattice, the bootstrap of the pivoted sample mean is actually asymptotically a better approximation to the true distribution than the normal, based on the Edgeworth expansion.
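As a concrete illustration of the resampling scheme just described, here is a minimal Monte Carlo sketch, in Python with NumPy, of the naive bootstrap of the centered sample mean. The function name, the default normalization $a_n = \sqrt n$, and the heavy-tailed example data are illustrative choices made here, not taken from the dissertation.

```python
import numpy as np

def bootstrap_statistics(x, n_boot=2000, a_n=None, rng=None):
    """Monte Carlo draws from the bootstrap distribution of
    S*_n = a_n^{-1} * sum_{j<=n} (X*_j - mean(x)), with X*_j i.i.d. from F_n."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    a_n = np.sqrt(n) if a_n is None else a_n   # sqrt(n) is an illustrative finite-variance scaling
    x_bar = x.mean()
    # Resample with replacement from the observed data (simple random sampling from F_n).
    resamples = rng.choice(x, size=(n_boot, n), replace=True)
    return (resamples - x_bar).sum(axis=1) / a_n

# Example use: data from a heavy-tailed symmetric law (alpha = 1 here),
# mirroring the X = eps * U^{-1/alpha} construction used later in Chapter 6.
rng = np.random.default_rng(0)
eps = rng.choice([-1.0, 1.0], size=50)
x = eps * rng.uniform(size=50) ** (-1.0)
s_star = bootstrap_statistics(x, a_n=50.0, rng=rng)  # a_n ~ c * n^{1/alpha} = n for alpha = 1
```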
Hall [Ha188] showed that when X1 has finite variance but EIXII3 = 00, the general situation is that the normal approximation and bootstrap approx- imation are asymptotically equivalent. Therefore, in this case, it is better to use the normal approximation in lieu of the computational cost of the bootstrap. For the case of 0 < a < 2, when the bootstrap sample size m is taken to be the same as the original sample size n, it was shown by Athreya ([Ath84] and [Ath87]) that the bootstrap distribution of S; does not converge weakly to a constant distribution along almost all sample sequences. He showed that it converges in distribution (with respect to the weak topology on the space of bounded measures) to a random distribution. He gave a representation for this random limit distribution in terms of Poisson random measures. Notice that if (M;,,...,M;,,) is a multinomial vector with parameters (n, (i, . . . ,fi ) independent of the sample sequence, then £(S;IX1,...,X..) = £(:Xk( :1 -1)|X,,...,X,.). (1.2) k=l The ML- can be thought of as counts; X,- is chosen Mg, times in the bootstrap sample. Following Athreya’s work, Knight [Kni89] gave a different representation for this limit law. Using the distributional relationship (1.2) above, and the sample sequence representation provided by LePage, Woodroofe, and Zinn [LWZBI], he gave the following explicit representation of the limit law: As in [LWZ81], define p by . l — F(y) p 3115301 - F(y) + F(-y-)' Let she”... be i.i.d., P(€1=l)= p = l — P(el = -1). Let F = (F1,I‘2,...) represent the arrival times of a Poisson process with unit rate; I". = {35:16,- where P(£,~ Z :r) = e" for all i (5,62,. . . are independent). Let Mf, 2‘, . .. be independent Poisson mean 1 random variables. Finally, assume that {6,}, {I‘,-}, {M;} are mutually independent. Then L(S;|X,,...,X,,) _.d £(chl‘;l/°’(1II;-1)|ckI‘;I/°,k 21). (1.3) k=1 Notice that the above convergence is in distribution. In fact, Giné and Zinn [G289] show that in the infinite variance case, this cannot be strength- ened to almost sure convergence (which does occur in the finite variance case). In the infinite variance case, the bootstrap distribution of S; does not converge in distribution to the limit distribution a obtained in the limit of the original 5,, sequence. That is, the bootstrap distribution is not a consistent estimate of u. Because of this phenomenon, it has been said that the bootstrap does not work in this case. (Although it is understood why this claim was made, Chapter 6 suggests that the method may actually be viable for applications in the a < 2 case.) What happens when the bootstrap sample size m is allowed to differ from the sample size n? Athreya [Ath85] showed that in the 0 < a < 2 case, the bootstrap can still be made to work if the bootstrap sample size is chosen small enough in relation to n. More precisely, if the bootstrap sample size m,, -+ 00 such that mn/n -+ 0, then the bootstrap distribution of 5;," converges weakly to a in probability. Arcones and Giné [AG88] added to this answer by showing that if m,, log log mn/n —» 0, then the bootstrap central limit theorem holds almost surely. That is, the conditional distribution of 5;," converges with respect to the weak topology to )1 almost surely. But if limninf mn log log mn/n > 0, then there is no almost sure convergence — not even to a random measure! In this thesis, we examine the relationship between the distribution of the partial sums and the resampling criteria. 
We give a decomposition of the bootstrap of the sample mean in the form of a stochastic integral. Then we develop an invariance principle explaining the behavior of the processes involved in the decomposition. When these processes are replaced by their limits in the stochastic integral, the integral obtained turns out to have the limiting distribution of the bootstrap for all a. This affords a general rep- resentation of the bootstrap limit law encompassing both the normal and nonnormal domains of attraction in one expression. We give the stochastic integral decomposition in Chapter 2. In Chapter 3 we introduce the invariance principle. The theorem is proved for the finite variance case as well as the a < 2 symmetric case. Chapter 4 gives a de- composition of the limiting distributions of the same form as that given in Chapter 2. Chapter 5 gives an alternate proof of Knight’s representation of the limit law (of course restricted to the symmetric case) using the invari- ance principle. We give some simulation results in Chapter 6 which suggest that in the a < 2 symmetric case the bootstrap of the sample mean actually performs very well. Chapter 7 contains some concluding remarks suggesting some of the value of the research as well as future directions. The appendix contains proofs of some of the more technical lemmas. Chapter 2 The Stochastic Integral Representation. Here we give a new decomposition of the bootstrap of the sample mean. Definition 2.1 For each pair of positive integers m,n, we define the follow- ing: (i) Z, = (Xm, . . . ,X(,,)) are the absolutely ordered observations WM 2 2. |X1n>|, so that X“) is the i‘h largest in absolute value. (ii) W" is the scaled partial sum process associated with (X1, . . . ,Xn): Int] W"(t)= ‘ 2X1, t 6 [0,1]. k=l (iii) (M ,;,,. .,M,‘,'m) is a multinomial (m, (3;, . . . , %)) vector independent of the observations (X1, . . . , X"). (iv) H (”'“l is the empirical distribution function of the centered multinomial vector: H‘m’x"’( $2210” rip—<2) Theorem 2.1 Let n be the number of observations, and let m be the boot- strap sample size. Then c (a;l XXX; — X.) | A?) = c (/tW" o HWth) | 3(2). k=l We will refer to W" 0 H ("“"l as the bootstrap partial sum process. It should be pointed out here that in reference to the bootstrap, since resampling with replacement has no dependence on the order of the data, m aim — X.) | 3?.) = 51:00: — X.) |X1,....X.). k=l k=l But this alternative conditioning is not valid for the stochastic integral rep- resentation; C(frW" o H‘m’”)(dr)|}(v,.) 51$ £(/r W" o H(m'")(dr) | X1,...,X,,). To see this, consider the following example. Example. Suppose the data comes in: X1 = 2, X2 = 1. Here we will let the resample size m = n = 2, so we will denote H (2'2) by H2. Ta- bles 2.1 and 2.2 describe the conditional distribution of 22.1100: — X2) and fr W2 o H2(dr) given (X1,X2) (with a; = 1). Thus, in particular, 2 P(Z(Xi - X2)=1|X1= 2.x. :1) =1/4, k=l but P(/rW2 0 H2011) .-. 1|Xl = 2,X. =1) = 0. But if we condition on (Xm, X (2)), then we must consider the conditional distribution of 2(X; —X2) and fr W" o H"(dr) given (X1,X2) = (1,2) and (X1,X2) = (2,1) separately, each with probability 1/2. In this case, we get the following two columns as possibilities for each of XXX; - X2) and fr W2 o H2(dr), each occuring with probability 1/2; see Table 2.3. Table 2.1: Conditional distribution of ZLAX; -— X") given the data. X; X; ram—X.) 1 l -1 l 2 0 2 1 0 2 2 l Table 2.2: Conditional distribution of fr W2 o f12(dr) given the data. 
Mg, M23 H2(—l) f12(0) H2(1) frW2 o H2(dr) 0 2 1/2 1/2 1 -1 l l 0 l l 0 I 1 0 l l 0 2 0 1/2 1/2 1 -1 Table 2.3: Conditional distributions given the ordered data. XXX; — X2) fr W" o H"(dr) -l l -1 1 0 0 0 0 0 0 0 0 l -l -l l Since the conditional probability of each of these values being achieved in both cases is US, it is clear that in this example, both 2(X; — X2) and fr W2 o H2(dr) have the same conditional distribution given the ordered sample (Xn), X(2)). Proof of Theorem 2.1. Let (M' m1,oo .,M;m) and Hm”) be as in the hypothesis. Notice that c (2.: X.( ;, — 1:) | Z.) = z: (for; — X.) l 27.). k=l k=l For real t let Almlu) = Z X.I( ;,,. — 73 g t). k=l n This process adds mass X,- at the points Mg”- — 12—. Then n m m n 2 x.(M;.. — 1’3) = Br — —)(2 MW... = 2) 1:1 71 r=0 n k=l = frA(m'")(dr). If we show that aglAlm'") has the same joint distribution with If. as the W" o Hlm'“) process, then the proof will be complete. 0 Lemma 2.1 For each m and n, (E; a;l/l('"'")(t), t E 13:405.; W" 0 11(m'")(t), t E R). Proof. Since both HIm'“) and AIm'") are right continuous and constant except for possible jumps at r — 13-, r = 0,1,...,m, it suffices to show that (YnmanoHIm’nlh— 3), r=0,1,...,m) (2.1) n =d (in; A(m'")(r — 31—), r = 0,1,... ,m). n 10 We do this by examining the increments. Condition on (Mgm . - . 141,7;71) and use the exchangeability of (X), . . . , X.) to show that amen-uh?) <32}; 2 X1”) 7‘ =0,l,...,m> (2.2) j=nH('"-")(r—1—§) r... (X); Z X,-I(M;U- = 0),...,: X,1(M,;, = m)). i=1 i=1 To see this, notice that nHlm'")(r — $3) = #{k S n : Mg. 5 r}, so that in particular nHlm'")(—l - %) = 0 and nHIm'")(m — 12-) = m. Thus, (2.1) follows by an application of a Borel-measurable transformation to both sides in (2.2). 0 Corollary 2.1 For any Bord-measurable function f, c (2:: mm... — 2)) Y) = c (/ f(r)W" o H‘“""(dr)| Y) . Proof. Notice that i x.“ .1 — 2) = / f(r)A‘"""’(dr) k=l and apply Lemma 2.1. Chapter 3 The Invariance Principle. Since Y. is a function of the partial sum process W", and conditionally on in, S; has distribution dependent on the bootstrap partial sum process W" o H (mm), it is clear that the behavior of S; jointly with )7" is dependent on the behavior of W“ o H ("h“) jointly with W". The following invariance principle helps explain this behavior for large n. 3.1 The Main Theorem. We define the Skorohod metric as it is defined in Billingsley [Bi168]. Definition 3.1 For each pair of functions a: and y in D[0,1], define the distance d,(.'c, y) as the infimum of all those values of6 for which there exists a strictly increasing and onto transformation A : [0,1] —* [0,1] such that ”A - 1H S <5 and Ill? - 1J0)” S 6, where t denotes the identity function on [0,1]. Theorem 3.1 Suppose H“ is a sequence of stochastic processes on R in- dependent of W" converging uniformly to H almost surely. Let W be the 11 12 homogeneous independent increments symmetric a-stable process with scale determined by 5,, —-»d W(1). (A) If X; has a symmetric distribution in the domain of attraction of an a-stable distribution (a < 2) and H is the distribution function of a discrete random variable which takes values in afinite set or a set which can be written in the form {d1,d2,. . .} such that d, < d2 < ---, then (We H",W")—+d(Wo 11, W) in the product space (D(R),ZJ) x (D[0,1],5) where U denotes the uniform topology on D(R) and 5 denotes the Skorohod topology on D[0,1]. 
(B) IfEX12< 00 then (We H",W")—».(Wo 1W) in the product space (D(R),L(1) x (D[0,1],L(2) where L11 denotes the uniform topology on 0(3) and U2 denotes the uniform topology on D[0,1]. Notice that in part (B), W is a Brownian motion. 3.2 Proof for the a < 2 case. Throughout this section we assume the hypotheses of Theorem 3.1(A). Definition 3.2 Let £1, £2, . . . be 2.2.d, P(€1=l)=P(€1=—l)=l/2. Let F represent the arrival times of a unit rate Poisson process as described in the introduction. 13 Let T1,T2,. .. be i.i.d., uniformly distributed on (0,1). Define W by W(t) = fury/“HT. g t), t 6 [0,1]. k=1 LePage [LeP80] showed that W is a homogeneous independent increments symmetric a-stable process. We start by exploring the way that a particular LePage-like representation W" with the same distribution as W" converges to W. Then we will use this convergence along with the almost sure uniform convergence of H " to [I to finish the proof. Let 1 - G be the distribution function of IXII. Let G“ be the usual inverse: For real y 6 (0,1), G"(y) = inf{:r : C(13) S y}. Define for each n and k = l,2,...,n, Y"). = a;‘G"(I‘k/I‘n+1). Notice that (€1Yn1, . . . , (”Ya") =4 0:1(Xu), . . . , X90). (3.1) As introduced by LePage [LeP80], we define random variables L? in such a way that the processes I (L? S t), j = 1,...,n facilitate scrambling of the ordered random variables, €1Yn1, . . . , 6,. Y..." t L',‘(t) = min{t: 735 L771}; < [nt] - 210' “L? S 1) L'-‘t J” _ n+1-j min{t:T- } j=2,...,n. Lastly, define W" to be the scrambled partial sum process associated with (‘51an, ° - ' téflYfln): Wu“) = z": cpl/"1.1(L2 S t) 16 [0,1]. (3.2) =1 14 W"(t) is constant on each of the intervals [fb 1%), j = 1,...,n, adding a random selection of one of the ckYnk’s at each of the times t = i, %, . . . , 1. Thus W" has the same distribution as W" in D[O,1]. We first examine the behavior of W" truncated to its N ——1 largest jumps. Let N—l N-I 143(1) = : ernJ-HL? g 1); and SN = Z emf/“10,50. j=1 j=l Proposition 3.1 For each N, Vfi —+ 5N as in the Skorohod topology. Proof. By definition of an, for each j, with probability one, Y,”- —> 17"“. Therefore the vector of ordered sizes of the N — l jumps of V; is approaching the vector of ordered sizes of the N — l jumps of SN. But the vector of locations of the N — l jumps of VA’,‘ is also converging to the corresponding vector for SN. To see this note that < -n is an L2(.7"¢)-Cauchy sequence because, for P > 71) Ema»? = 3:2 Z 172’“ k=n Andforp>q_>_n, Bum—nun“ = EEF(. ‘ Z arr?“ (21.30): k=q+l p 2 = E(s;’ Z 1‘; “1). k=q+l As p,q —+ 00, this goes to zero (by the bounded convergence theorem). Thus (f:(t))p>n is an L2(f})-Cauchy sequence which converges almost surely to T,.(t)/s,.. By completeness of L2(f}), T..(t)/s,. E L2(f}), and film-'13 T.(t)/s.. (3-7) Since E" is continuous on L’(f¢), by Lemma A.2 in the Appendix, E" (mo/s.) = lim Ecru ) P“°°° = 3;! Z I‘;"°E"c,.I(T,. g t) k=n = S;IZF;1/aql(Tk sn ZFk [n(l TkAs) —s(k 3)], k=n By (3.7) E(E}‘-(/P( t)— T.(t)/s.) ) —. 0. Therefore Efren) — Tun/8.)” _.,. 0. Thus there is a subsequence (m) such that E"(f.i’*(t) — T.(t)/s,,)2 -» 0 Hence by the Minkowski inequality for conditional expectation, More»? —) E" (T—g‘fl) n Thus by Lemma A.2 (in the Appendix) and its corollary, with probability one, Ef‘(T.(t)/s.)2= lim Ef‘s ’2 (Z cJ-FJ-VOHTJ _<_ t))2 J=n 2 3"2 lim Ef‘(ZeJ-FJ I‘7'Jl/°I(T <3) +:6J~F Iii/ONT 6(1tll)8 k-ooo 1:" J: —n PI: PI: = 8313?;(2 17W?)- s .) + 5: P;”°E"I 0 almost surely, V. is increasing almost surely. 
Concerning (c), the set [2:1,] F;2/° < 00] has probability one. On this set, fort < l ff,’(t) —+ V.(t). Since EV.(1) < oo, f:(1) —» V.(l) almost surely and f,‘,’(t) is ft—measurable for all t, V. is the almost sure limit of adapted continuous paths. By completeness, V. is predictable (see Metivier [Met82]). [3 Proof of Proposition 3.2(continued). Now we check that the conditions of Theorem 3.2 are satisfied. Relation (i) 18 clear. For (ii), by (3.2) and the MCT for conditional expectation, mtg/3.), = E(19:I‘(_.;2 Z r;”°zn(1 - T. A t))) k=n = Es.( -2 Z P-s/aEr(_1n(1— T). A t))). k=n But the processes {FJ} and {TJ} are independent and E(—ln(1 — T). A t)) = fol ln(l -- u A t) du = t. Hence (T /3. =tEZ(I‘ r;2/°/s3.)- k=n Lemma A.4 shows that 02((T./s.),) —) 0 as n —r 00. Thus, Chebyshev’s inequality gives (T./s.), —+,, t, for each t. Finally, Lemma A3 in the Appendix shows that F;2/°’/s,’, -—+ 0 with probability one. By the bounded convergence theorem (iii) follows. 0 Corollary 3.1 sup |T.(t)| -—+,, 0. Proof. Noting that sf, -> 0 almost surely, this follows by Proposition 3.2. D 21 Notice that this corollary is equivalent to the statement: In the uniform metric, Ee,r;"°1(:r, g .) _., W. (3.11) i=1 Now we will work on finding an apprOpriate bound for the tail sums of the processes W" described earlier. Define n (JR/(i) = Z GYMNL? S t); s}... = 2 Y3,» j=~ j=N We will apply a method similar to that which was used in PrOposition 3.2. Notice that here we must control the starting and ending indeces, whereas before we only needed to keep track of the former. Proposition 3.3 Indexed by t, URI/3N. is an Lz-martingale, with condi- tional variance " " ["‘1‘11L'P>5 (UN) “size.- 2 ———(’ ")- t Proof. For each n,t let I. = .(r,,.,1(L; s u)... 31.13113 n}. Notice that the process (U fi/sN.(t)) is adapted to (flu), and that for each n, j, and t, s”. and Y.; are measurable .77... By Lemma A.2 (Appendix) E’MUMt) = 2; E’MY..,-e,-I(L; g t) J: = Z YnjEfmchL? S t) j=N = Z YnJ'CJ'HL? S .3) j=N = UMs). 22 Also, for each t, E(Uhltl/Sanz -- E(Er..2;.(iy.,.,1(1;gt))2) =N _—. E(sN'f, J; Y,3.1Er 1(1; 3 1)) S Ean 2Y3 = lk/n) _ 8N1! 71;] n _ k k 0 j=N 23 [nt]-1 [(Ln> _) : SNnjzzYnz 12 fi. [:1 Proposition 3.4 For each t in [0,1], the following relations hold: E(ghf) = Inn—t] (3.12) Var<£j§i>t g 2E(::’:)+;%:t—E—fi. (3.13) Proof. By Proposition 3.3 n 2[nt]—1 [(1131 > 5)) El”) = when. 2 .-. 8N” ‘ j=N k=0 ’1 "“1" P111: > *- III‘) = an 2Y1? j: J _ n =N n lc k=0 _l___ntlE " X'iE.”i) _ [Iii] — it Also, 2 n n 2 var<”~) Ea) w— an t 3N1: t n Fix N and n and for eachj = N,...,n, let (.11-. I(L'-‘ > 5) P1=Y.,-2/3~.; 01': Z ‘75:?”— 11:0 Then EV"): = Egr(:.,.c.)“ 3N1: t 24 = E: 2:1).ij (CC) 1=N )— Notice that all the Cj’s are independent of P, so EP(C.-C,-) = E(Cng). Ifi = j, then “-1 W1 1 I(L;-‘> I(L'-‘ > K) = 4:; 2., :5) nu.) |/\ EZZ lc=0 u=0 ”(n-1n 11( Ln>5 Ln) >)n)) (n—k)(n—u n-ln-l P( Lu > k_\_/_u =22 k=0u=o(n-k) ("2'") “'1"- n(-—k)kA(n—u) =ZZ( Ic=0u=0n(n'_"c )(n—V) 1 n-1 n-1 = nZZ(n-k)v (n-u) nk=0 v=0 l"'12(n—k)—l = —2 <2. nk=0 (n—lc) " Ifi #j then EC;C‘ equals [ntl- --ij(L >_ kvtL > k) ["']"2["‘11P(L,->—",.-L > u) z 3.2.): +22: 2: k=0 u=k+l (n_ k)(n — V) k=0 [nil-l [nt]- 2 [Ml-l = (n—k)(n-k 2):“ (n-u)(n-—k—1) Z;( n()n—-n—-1( +2gu=zk11n(n-l)(n—lc)(n—V) 1 [nt]- 1 [nt]-— 2 [Ml-1 < 1 2 — nn—( 1)( Icz-O + lg) ugl 1) [nt]2 n(n—1)° 25 Thus E(Svi): g E(éjzfl) 2)+2 $1?“ij [m]: ___») S E(QPN+n(£:1t_]21))- Observing the definition of my and subtracting the squared mean, the propo- sition is proved. 
Ci Proposition 3.5 There exists a sequence (no(N)) such that Yn’(N)N 3Nn’(N) for every sequence (n’(N)) such that n’(N) Z no(N) for each N. -+ 0 a.s. -l/a Proof. Since for each j, Y",- —+ 1‘; a.s., we have for each N, YnzN 11-2/0 2 _i 2N -—2/aa' Z‘N+l Yr!) ZN.“ Fj Hence we can choose no(N) Z 2N large enough that for n 2 no(N), Y3” 1332’“ P | — _ a|>2"”)<2-”, ( 2% Y3, 2%.“; 1‘,” Fix a sequence (n’(N)) such that n’(N) 2 no(N) for each N. By a simple application of a Borel-Cantelli lemma, as N —> oo Ynimgzv P-2/a ENHY mm» 2%?” 1‘ I 2’ ° By the strong law of large numbers, I‘"’° 1‘3”“ ___ (mm-”a 2,91, 17”“ " NI‘;”° N(F2~/N)‘2/° -+ 0 a.s.. (3.14) —-+ 0 a.s.. (3.15) 26 Combining (3.14) and (3.15), we see that 2 Yn’(N)N N —* 0 a.s.. £lV+1 Ynz’(N)j But since n’(N) 2 2N, Yylimw < Y732’(N)N _ N ‘ sivn'm) 2i!“ Ynzluvb' Proposition 3.6 For each N, as n -+ 00, Ill/,3 o H" — SN 0 H” —+ O a.s.. Proof. Fix N. Let BN equal the set on which N VIM-TA —> o, i=1 N VlYnj-F-‘l/al -’ 0, J i=1 T,#H(t)foralli21 and tER, llHn-H” —> 0. By hypotheses, P(B~) = 1. We claim that ||V,{,‘ o H” - SN o H" -—+ 0 on BN. To see this, fix a: 6 EN and let 6 > 0. Suppress the argument w from the following relations. By the assumptions on H, there exists K such that for k2 K, H(dk) Z H(dK) > max{T,- : i g N}. Thus, we can choose 5 > 0 such that e < A IT.- — H(t)|. gyms}? 27 Fix M’ such that for n 2 M’ N V lLy—le 0 arbitrarily, so we are done. 0 With existence guaranteed by Propositions 3.1 and 3.6, choose n1(N) _>_ no(N) (no is defined in Proposition 3.5) such that for n 2 n1(N), P(||V,(,‘oH" —5,.oHu > N“) < N“, (3.19) 28 and P(d,(V§,SN) > N“) < N". (3.20) Define N(n) = max{N 21:n,(N)g n} V 1. Lemma 3.3 Vii/‘01) —>,, W in the Skorohod topology. Proof. Let (in) be a subsequence. Since n1(N) _>_ no(N) _>_ 2N —+ 00, we can choose a subsequence (nk’) such that N(nk,) < N(nk,.+,) for each j. Let 6 > 0. Now m.) 2 n1(N(nk’)) by definition of N(-), and forj large enough, N(nk,) > l/varepsilon, in which case i P(d.(v;.:.,,,,. W) 2 25) g P(d.(v,,";;,j,,5~(.,,1,) > e) + P(d,(SN(,.,j), W) > e) S 1 N(nkj) + P(d,(S~(MJ_), W) > 6), Since N(nk,.) —v 00, the claim is proved. 0 Lemma 3.4 U Run) / shun)“ converges in distribution to Brownian motion with respect to the uniform topology on D[0,1]. Proof. As in the proof of Lemma 3.3, choose subsequences (nk) and (nkj). Again we use Theorem 3.2. By Proposition 3.3, U 3100/ sNWn is an Lz-mar- tingale. Surely U N(n) /sN(,,),,(O) E 0. Also, since "1:, _>_ n1(N(nk,-)) Z no(N(nk,)), Propositions 3.4 and 3.5 give us that the conditional variance condition is sat- isfied. Finally, Proposition 3.5 gives us that the expected squared maximum jump also converges to 0. C1 29 Theorem 3.4 W" —>,, W in the Skorohod topology on D[0,1]. Proof. Notice that W" = V§(n)+U}(‘,(n). By the corollary to Proposition A.1, sN(,,),, —+, 0. Therefore, by Lemma 3.4, ”Ugh," -i,, 0. By definition of d, it is easy to show that if d,(;r,,,;r) —> 0, and ||y,,|| -+ 0, then d,(:c,, +y,,,:r) -> 0. Apply Lemmas 3.3 and 3.4 to complete the proof. Cl Proposition 3.7 W" o H" —i,, W o H in the uniform topology on D(R). Proof. For any n, ”W" o H” — w o I!” s ”W" o H" — v,:;(,,, 0 mu 1:) “ll/’3‘") o H" _ SM“, 0 H|1+1|S~(n)o H — Wo HI). A3 A3 A) S "W" — Vfiwfl = ||U,’{,(,,)|| —+, 0 as was shown in the proof of Theo- rem 3.4. A3 S “VNM — W” = IITN(,,)|| —i,, 0 by (2.2) because N(n) -+ 00. Also, since n1(N(n)) _<_ n, and N(n) —v 00, A2 —i,, 0. C1 Combining Theorem 3.4 and Proposition 3.7, (W" o H". W") ->. (W o H. W) in the prescribed space. 
Since these processes have the same joint distribution as (W" o H“, W"), Theorem 3.1(A) is proved. U 3.3 Proof in the Finite Variance Case. Assume the hypotheses of Theorem 3.1(B). Here EX 2 < 00 so a,, = 121/2 and Donsker’s Theorem applies: 0 30 W" _’d W, under the uniform topology on D[0,1], where W is Brownian motion (see Billingsley [Bi168]). Let W" be a process with the same distribution as W" such that W" converges uniformly almost surely to a continuous Brownian motion W. lwnoH" — W0 HHR S ||W"0H" — WoHnlln+ ”WOH" - WOHIIR 5 “W" — Wll(o.11+||W o H" — W o 11“,.t By construction, the first term on the right converges to 0 almost surely. With probability one W is uniformly continuous on [0,1]. Since H " —i H uniformly almost surely, the last term converges to 0 almost surely. With this representation, we have (W“0H“, W“) —+ (WoH, W) uniformly almost surely. Therefore, (w" o H", W") a. (w 0 1!, W) under the uniform topology in each coordinate. D Chapter 4 The Limit Laws. Under the usual resampling scheme, (i.i.d. resampling from the data), when we replace the processes involved in the stochastic integral decomposition by their limits obtained in the invariance principle, we get the limiting distribu- tion of the bootstrap. Throughout this chapter, let M ' be a Poisson (mean 1) random variable and let H(z) = P(M‘ — 1 S x). 4.1 Infinite Variance (a < 2): Symmetric Case In light of the results of Bickel and Freedman [BF81] and the result (1.3) by Knight [Kni89], and the invariance principle, the following theorem shows why the stochastic integral decomposition may be the natural way to view the bootstrap. Let 6,1‘, and W be as in Definition 3.2. Let M{,M2‘,... be i.i.d. ~ M“ independent of c, F. 31 32 Theorem 4.1 £(ZekP;1/°(M; —l)|ckI‘;1/°,k 21) k=1 = z: (/°° two H(dt)|ck[‘;‘/°',k _>_ 1) . Proposition 4.1 shows that this decomposition of the limit law carries over to the finite variance case. Lemma 4.1 With probability 1, Z c r;‘/°( (M; 4:215: c r; ‘/°I(M;— 1: r). (4.1) k=lr=_1 k=l Proof. Let Y = 2:11 ckF;1/a. Since Y is symmetric a-stable, it has charac- terisic function exp(—a°‘|0|°’) for some a > 0 (see [ST89, Definition 1.1.4]). Consider the partial sums K 00 Z r 2 ckI‘;1/°I(M; — 1 = r). (4.2) r=—l k=l By Theorem 3.3(A) and (B), for each 1' 2 —1, ZerFZ1/°(M;— 1)I(M,:-1 = r) := chkF;1/°I(M; —-‘1 = r) k=l k=1 :4 r(P(M,‘ —- 1 = r))1/°Y. By Theorem 3.3(C), as r varies, the inner sums in (4.1) are independent. Therefore, the characteristic function of the left hand side in (4.2) is K K II exp(-0°IT(P(ME-1 = r))”"’49|") = eXP(-0"|9l°' Z lr‘l"'1"(1‘41"-1 = 7‘)). 73-1 r=-l As K —> 00 this tends to exp(—d°|9|°E|M1‘ — 1|“). This is the characteristic function of (EIMf' — 1|“)1/aY. Thus, by the Continuity Theorem, as K —) 00, K 00 Z r )3 airy/“HM; — 1 = r)—+d(E|Ml'—1|°’)1/°'Y. (4.3) f=-l k=1 33 Since the left hand side is the partial sum of independent random variables, the sum must also converge almost surely (see [Bre68, Prop. 8.36]). By Lemma A.1 applied K + 1 times, the double sum in (4.3) equals 22;, arr/7M; - l)I(M; — 1 S K). But, by the same lemma, the left hand side in (4.1) almost surely equals :3 e r;‘/°'(M; — 1)I(M,; — 1 s K) + 2 arr/WM; — 1)I(M; — 1 > K). k=l By Theorem 3.3(B) the term on the right has the same distribution as (Ele—1|°I(M,'- 1 > K))‘/°'Y, which converges in probability to zero as K —+ co, ZrZekr;"°1(M; — 1 =r),):e,.1‘;‘/°(M;— 1). (4.4) r=-l k=l Coupled with the almost sure convergence of the left hand side of (4.4) al- ready established, the lemma is proved. 
C1 Proof of Theorem 4.1. By definition of W and Lemma A.1 co L:tWO H(dt) = Z riekf‘zl/aHTk E (H(r —1),H(r)]). r=-1 k=l Hence, by Lemma 4.1, it is enough to show that (:e.r;"°1(M; —— 1 = r),r 2 —1;e,.r,,,k _>_1) k=l =d <2 arr/GUT), E (H(r -1),H(r)]),r Z —1; ed}, k 21>. k=l For each k, (1(M; — 1 = r),r _>_ -1) :4 (1(7): 6 (H(7‘ - l),H(r)]),r Z —l). Also, for different k, these processes are independent of eachother. Since both processes are independent of e and F, Theorem 4.1 follows. Cl 34 4.2 Finite Variance Case. The following proposition shows that the above representation for the limit law carries over naturally to the finite variance case. Proposition 4.1 If W is a Brownian motion then ft W o H(dt) has the standard normal distribution. Proof. To see this, look at the characteristic functions again. The stochastic integral can be written as Z” r[W(H(r)) — W(H(r-))]. By independence r=-1 of the increments of Brownian motion, its characteristic function is lim fi exp(_r2t2[ll(r) - H(r—)]) n-ooo r=-1 2 2 n = lim exp(—% Z r2[H(r) — H(r—)]) n—ooo r=—l = exp(—£2-/r2 dH(r)) = exp(-t2/2). This is the characteristic function of the standard normal distribution, so the assertion is proved. D Assume W is a version of Brownian motion with continuous sample paths. Since conditioning on the ordered jumps of such a process provides us with no additional information, £(/tWoH(dt)| orderedjumpsofW) =£(/tWoH(dt)). By applying Bickel and Freedman’s results [BF81], using the proper scal- 1/ 2a', we see that in the symmetric case the result concerning a ing, an = n distribution in the domain of attraction of an infinite variance stable random variable can be viewed as an extension of what was already known in the finite variance case. Chapter 5 Knight’s result follows in the symmetric (a < 2) case. Here we focus on the case that the resampling is the usual simple random sampling from the original data with the resample size in" equal to n. As usual, denote H (am) by H ". The distribution on S; conditional on X1, . . . ,)(fl is random. We wish to show that this random distribution converges in distribution (with respect to the weak topology on the space of bounded measures) to the random distribution of 00 Z “PEI/”(Mi - 1) k=l conditional on c and I‘. This is the result which Knight proved. We will show that it follows using the proof of the invariance principle proved in Chapter 3. Each path a: in D[0, 1] has only countably many jumps. Let Ax denote the sequence of jumps of x ordered by absolute value from largest to smallest. Throughout this section, let W", W be as defined in Chapter 3. We will use CF to denote the process (air/G, k _>_ 1), so that the a-field generated by AW coincides with the a—field generated by 6P. 35 36 When put into our framework, our goal is equivalent to showing £(/rW"oH“(dr)|AW") -+d£(/rWoH(dr)|AW). (5.1) Since the function H on the right hand side above is deterministic, it may seem that when we condition the integral on the right by AW that the dis- tribution becomes degenerate. However, 0(AW) only contains information about the magnitude and directions of the jumps of W — nothing about their locations. We point out here that H ", H satisfy the hypotheses of Theorem 3.1( A). The only part which may not be obvious is stated in the following proposition. Proposition 5.1 H“ —+ H uniformly almost surely. Proof. Here, let AH"(t) and AH(t) denote H"(t) — H"(t—) and H(t) — H (t—) respectively. 
By Scheffe’s Theorem [Bil86, Theorem 16.11] it suffices to show that for each t AH"(t) -+ AH(t). (5.2) But for each t E(AH"(t)) = P(M,:l — 1 = t) —> P(M,' - l = t) = AH(t), and gamma) = 192% 2:“ I(M;, — 1 = 1))2 — (P(M;, — 1 = t))?. (5.3) The first term on the right in (5.3) equals n—l %P(M,:l—l=t)+ P(M;l_1:t,M;2—l=t). The first term here is asymptotically negligible. And for each integer k in (0,1,. . . ,n} by Stirling’s formula, [Bil86, Problem 27.18], 37 P(M;1= k,M,','2 = k) = k!k!(n — 2k)! ' n" nn+%e-n(n _ 2)n-—2k (k1)2(n - 2k)"-2*+%en-2knn — 6-1)2 -2k+2( n )1/2( Tl _ 2 )n—2k _ (I'— e n - 2k n — 21: e" 2 _i (721-) ' If lc is replaced by t+ 1 we see that this limit equals the limit of the last term on the right in (5.3). We have verified the sufficient condition in (5.2). U By Daley and Vere-Jones [DV88, Prop. 9.1.VIl(i)], it suffices to show that for bounded uniformly continuous f on R, an] r w" o Ham)" mm w an] r W o mam)" AW]. Proposition 5.2 If f is a bounded measurable function on R then em] 1' w" o H"(dr))|| AW") :1 E[f(/ r W" o H"(dr))|| AW"). Proof. We show that there is a measurable function T such that the left and right hand sides above equal T(AW") and T(AW") respectively. Then we invoke the fact that AW” =4 AW“. (See (3.1).) Let f be any bounded measurable function. Let 9 denote a measurable function such that g(W", H”) = fr W“ o H"(dr). Let [In be the set of all maps 7r taking (2:1, . . . ,xn) into partial sum processes of the form [Ml <2 1,“) : t6 [0, l]> k=l 38 where 1r is a permutation of n elements. Define h on the set of n‘h-order partial sum processes by h(w) = Ef(g(w, H")). By the Fubini theorem, h is measurable. Also Em] r W" o H"(dr))u AW") = Emma. H“))IIAW"] = E1E11n-l e A 39 n-l (‘J "M =9 Y'U) .1] = PK: e,y,,,-1(L; g H“(2:))> '=l 1:=-l = f(€1Ynlv ' ° 7 CnYnn) where f is just the name we give to the function which evaluates the previous line at the vector-value (elyn1,. . . ,enym). The right hand side in (5.4) equals n-l P[(Z Canj1(L? S H"(I))> 6 AIICIF1,€2F2,. ..] j=l I=-l n 71-! = P[ e A||61F1,...,c,,l‘n,l‘,,+1 i=1 "- ‘J=¢1 1‘71 =[‘, = PK:EiailG—le/‘YMOKL? 5 Hn(x))>,=:1 6 Al = f(e1a;‘G'1(I‘j/Pn+1), - - - . CadilG'I(Fj/Fn+1))- Since Y",- = a;'G“‘(I‘j/I‘n+1) for each j, we are done. 0 Corollary 5.1 If f is a bounded measurable function on R then Em] r W" o H"(dr))n AW) =. an] r W“ o was)" AW]. In light of Proposition 5.2 and Corollary 5.1, our task is reduced to show- ing an] r W" o was)" AW] _., an] r w o H(dr))|l AW]. We will show in fact that the above convergence occurs in probability. The proof of the following proposition will be the main tool in showing this con- vergence. 40 Proposition 5.4 For 6 > 0, P[|/rW" o H"(dr) — ero H(dr)| > 36 HAW] —>,, 0. Before we directly prove this statement, we will first state and prove a few lemmas. Lemma 5.1 ”ll/.1,W "<-°H"dr) [:rWonr) >6IIAW]—»,,0. (5.4) Proof. By Proposition 5.3 a]: .- W" o H"(dr)|AW") = c(/T r W" o H"(dr)|AW). By Proposition 5.1, and the proof of the invariance principle, W" o H" —*, W o H in the uniform topology. Since H", H change only at —1, 0,... , T in (—oo, T], T ~ 1' f r w" o H"(dr) —+,/ r w o H(dr). Using the Markov inequality, it is easy to show that for any collection of random variables, X, X1, X2, . . ., and any sub-a-field 9’, if X" —+, X then Xn —» X in g-conditional probability. That is, for any 6 > 0, P[|X,, — XI > sue] —+p 0, Thus (5.4) holds. 0 Lemma 5.2 For any 6 > 0, as T —* 00, P“ j: r w o H(dr)| 2 an AW] _. 0 a.s.. (5.5) 41 Proof. 
P °° W d >5 W 11/, r oH< r>1_ 11A 1 = p“ Z 5,,1‘;‘/°(M;— 1)I(M,;— 1 2 T)| 2 6 || AW] k=l < 13KB C1.1“.Z""(M,:—1)1(M,:— 1 2 T))2 k=l - 52 ‘Fl f: F;2/°)E[(M,' — 1)21(M; — 1 2 T)||cI‘ . k=1 1 m _<_ —((Z 5.13:1“)2 + 2 62 k=l The expectation term is a sequence of real numbers converging to zero the statement is proved. D For each T, define 91 on R by gT(a:) = (:1: — 1)I[T,°°)(:c — 1). Lemma 5.3 There exists a tight sequence of random variables 5,, such that for all 5 > 0, P“ / rW" o H"(dr)| 2 6||AW] < .1. _ 62 {nEg;~(M;1) a.s.. Proof. By Corollary 2.1, P11 / W?" o Warn 2 6IIAW] = P11 2 e.Y..gT(M;.)I 2 ans-14., .- _<_ n] k=1 1 3'5 |/\ EKZ ekYagflMJszlléanjJ S "l k=l IA 1 E(Cfl +dfl)i where cu = (ZZj¢k€j€kynjynnElngMril)gT(M:i2)ll€jYnj’jSnl’ d. = (:Y:.)ELq%(M;.)ne.-Y..,jSn]. k=l 42 Notice that the expectation term in c" is bounded by Eg§~(M,:,) by the Cauchy—Schwarz inequality. Also 1: 2 n ZZi¢k€j€kYfljYflk = (Z {kl/uh) — z: Yn2k' k=l k=1 The first of the terms on the right has the same distribution as (a; 1 22:, X k)2 which converges in distribution by hypothesis. Thus as a sequence of ran- dom variables it is tight in n. The second term converges in probability to 22;, F?“ by Lemma A.6 in the Appendix. Therefore (2 Zj¢ijEkYannk) must be tight in n. To complete the proof, let 5n = lzzj¢.€i‘kyniynk + Z Ynzk. C1 k=1 Lemma 5.4 For each T, Eg§~(M;,) —> EgflMf). Proof. Since we can recall the well-known weak convergence of the binomial (n, A/ n) to the Poisson (A) and apply the square of the bounded continuous function h;- on R defined by hT(:t) = (:1: - l)I[_.j,1-)(:z: -1)+ TI[T,OO)(I - l) + I(-oo,1)(1: - 1) it suffices to show that E(M;l — l)’ —i E(Ml‘ - 1):. By the weak convergence of (Mg, -1)2 to (M; —1)2, we need only establish that {(M;l - l)2 3;, is uniformly integrable. Apply the computation of the centered fourth moment of a binomial given by Lehmann in [Leh83] to get E(M;l_1)4=3(n—1)2+n—1(1_6(n-—1)). n n n2 It is easily seen that the right hand side is less than 4 for all n. D For intervals A let z..(A) = [A rW"oH"(dr) Z(A) = [A rWoH(dr). 43 Proof of Proposition 5.4. We will show that for any n > 0, “”1an P( PllZn(R) - Z(R)| Z 35 II 6F] 2 377) S 271- (5.6) First notice that Zn(—oo, —l) = Z(—oo, —1) = 0 a.s.. Also, for any n, P( Pllznl-1.°0) - Zl-1,00)| _>_ 35H 61‘] Z 371) S P(Pllan_1le_ Zl-ltTll Z 5|! 6F] 2 n) + P( PllZn(T,00)l 2 5H 6F] _>_ 17) + P( Pl|Z(T,<><>)| _>_. 5H 6F] 2 0)- By Lemma 5.2 we can choose To large enough that the third term on the right is less than 17 for T 2 To. Consider the sequence of random variables (5,.) provided by Lemma 5.3. Since 6,, is tight there exists B such that P( |£,,| 2 B) < 17 for all n. F ix T _>_ To such that 1762 Z 2BEg’(Mf). Then choose nT large enough so that for n 2 n1, Eg§-(M,f,) S 2Egz(Mf). Then for n 2 n7, P( P112.(T.oo)l 2 611d) 2 n) _<_ age/59am.) 2 n) S ’7- By Lemma 5.3, (5.6) holds. 1:] Proposition 5.5 For bounded uniformly continous f E[f(Z..(R))||AW] -*p E[f(Z(R))||AWl- Proof. Let 6 > 0. Since f is uniformly continuous, there exists a 6’ > 0 such that |f(a:) — f(y)| < 6/2 whenever la: - yl < 6’. Define A5: to be the set 44 [12,,(3) — 2(3)) < 6’]. Then P< 1121mm» — menu 4‘11 2 a) s P(|E[(f(Z.(R)) - 1(Z(R)))I...uer11 2 6/2) + P(IE[(f(Z.(R)) - nz>n,.n e111 2 6/2) = P(|E[(f(Z.(R)) — ((2(R)>)1.;,11er11 2 6/2) 3 arms ”61"] 2 6/(4llfll)) = P< P112. — 202112 6'1 er) 2 6/(4llfll) >. This last term converges to zero by Proposition 5.4. 0 Now apply the proposition described from Daley and Vere-Jones [DV88] to complete the proof. 
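Knight's representation also lends itself to direct simulation, which illustrates why the limit law is random: holding one realization of the sequence $(\epsilon_k \Gamma_k^{-1/\alpha})$ fixed and drawing fresh Poisson(1) multipliers yields draws from one conditional limit law, while regenerating the sequence changes that law. The following Python sketch is illustrative only; the truncation level and the function names are choices made here, not part of the dissertation.

```python
import numpy as np

def lepage_jumps(alpha, k_max, rng):
    """Truncated LePage sequence eps_k * Gamma_k^{-1/alpha}: Gamma_k are the
    arrival times of a unit-rate Poisson process, eps_k are symmetric signs."""
    gamma = np.cumsum(rng.exponential(size=k_max))
    eps = rng.choice([-1.0, 1.0], size=k_max)
    return eps * gamma ** (-1.0 / alpha)

def knight_limit_draws(jumps, n_draws, rng):
    """Given the (fixed) jump sequence, draw from the conditional limit law
    sum_k jump_k * (M*_k - 1), with M*_k i.i.d. Poisson(1), as in (1.3)."""
    m = rng.poisson(1.0, size=(n_draws, jumps.size))
    return (jumps * (m - 1)).sum(axis=1)

rng = np.random.default_rng(2)
jumps = lepage_jumps(alpha=1.2, k_max=500, rng=rng)   # truncating at 500 terms is a practical choice
cond_law = knight_limit_draws(jumps, n_draws=10000, rng=rng)
# Re-running with a fresh `jumps` sequence gives a different conditional law:
# the bootstrap limit is random, as in Athreya's and Knight's results.
```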
Chapter 6

Simulation Results.

For random variables which have infinite variance, we found that in the symmetric case the bootstrap of the sample mean does not perform so badly. In fact, in some ways, the method gives better results than it does in the finite variance case.

For various indices of stability $\alpha$ and confidence levels $\gamma$, we simulated observations of random variables $X_i$, symmetric about $\theta$, in the domain of attraction of an $\alpha$-stable random variable, and applied the bootstrap of the sample mean to create symmetric $\gamma$-confidence intervals for $\theta$ in the following manner. For a given sample $\mathbf X_n = (X_1, \ldots, X_n)$, the confidence interval $C_\gamma(\mathbf X_n)$ is given by
$$C_\gamma(\mathbf X_n) = [\,\bar X_n - T_\gamma(\mathbf X_n),\; \bar X_n + T_\gamma(\mathbf X_n)\,],$$
where $T_\gamma(\mathbf X_n)$ is estimated as a quantity which satisfies
$$P[\,|\bar X_n^* - \bar X_n| \le T_\gamma(\mathbf X_n)\;|\;\mathbf X_n\,] = \gamma.$$

A surprising observation was that the empirical coverage of this method was consistently higher than $\gamma$ for $\alpha < 2$. Figure 6.1 shows the observed coverage of the bootstrap method, applied 1000 times for each value of $\alpha$, with $\gamma = .95$ confidence, sample size $n = 50$, bootstrap resample size $m = 50$, and 500 bootstrap observations. We used Monte Carlo simulations with $X_i \sim \epsilon U^{-1/\alpha}$, where $P(\epsilon = 1) = 1/2 = P(\epsilon = -1)$ and $U$ is uniform on (0,1). Notice that for $\alpha > 2$, $F$ has finite variance, and hence it is expected that the coverage should be approximately .95.

Figure 6.1: Coverage of bootstrap method for various $\alpha$.

Consider the confidence radii obtained by the above method, scaled by $n^{1-1/\alpha}$, because $S_n^* = n\,a_n^{-1}(\bar X_n^* - \bar X_n)$ and $a_n \sim c\,n^{1/\alpha}$ (see Feller [Fel71]). In the finite variance case, the bootstrap distribution of the scaled and centered bootstrapped sample mean converges weakly to a fixed (normal) distribution almost surely. Since the limit distribution is continuous, the confidence radii given by the above method and then scaled as indicated converge almost surely to a fixed number as the sample size $n$ tends to infinity.

But by Athreya's early results [Ath84] we should not expect this phenomenon to occur in the infinite variance case. Our simulation results exemplify this. Figure 6.2 shows a frequency histogram of the observed confidence radii (logarithmically scaled) with $n = 50$, $\alpha = 1.0$. The vertical line represents the logarithm of the radius necessary for an unconditional confidence interval with confidence level equal to the coverage observed by applying this method. Figure 6.3 shows more of the same phenomena for various values of $\alpha$.

Figure 6.2: Distribution of bootstrap confidence radii for $\alpha = 1$, $n = 50$.

Figure 6.3: Distribution of bootstrap confidence radii for various $\alpha$ (panels for several values of $\alpha$ ranging from about 0.5 to 3.0, each with $n = 50$).

Figure 6.4: Distribution of bootstrap confidence radii for $\alpha = 1$, $n = 200$.

As expected, even as $n$ gets large, the distribution of the scaled radii obtained by this method is dispersed apparently continuously over the positive real axis. Figure 6.4 shows what happens when $n = 200$.
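For reference, the interval construction used in these simulations can be sketched as follows, with $T_\gamma(\mathbf X_n)$ approximated by the empirical $\gamma$-quantile of $|\bar X^*_n - \bar X_n|$ over bootstrap resamples. The function names and the particular quantile estimator are choices of this illustration, not taken from the dissertation.

```python
import numpy as np

def bootstrap_symmetric_ci(x, gamma=0.95, n_boot=2000, rng=None):
    """Symmetric bootstrap interval [x_bar - T, x_bar + T], where T is the
    gamma-quantile of |mean(X*) - mean(X)| over resamples drawn from F_n."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    resamples = rng.choice(x, size=(n_boot, n), replace=True)
    t_gamma = np.quantile(np.abs(resamples.mean(axis=1) - x_bar), gamma)
    return x_bar - t_gamma, x_bar + t_gamma

# One replication of the Chapter 6 setup: X = eps * U^{-1/alpha}, symmetric about theta = 0.
rng = np.random.default_rng(3)
alpha, n = 1.0, 50
x = rng.choice([-1.0, 1.0], size=n) * rng.uniform(size=n) ** (-1.0 / alpha)
low, high = bootstrap_symmetric_ci(x, gamma=0.95, rng=rng)
covered = low <= 0.0 <= high   # repeat over many samples to estimate coverage
```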
More is observed. Since the confidence widths generated by the method have such a wide distribution, we examined how well the procedure compares with the procedure which simply uses the unconditional distribution of the sample mean about 0. That is, we estimated T, such that P(an_0l S T7)=7' Since the st are in the domain of attraction of the stable random variable Y, we can do this not only by Monte Carlo simulation, but also by using the quantiles of Y, because 5,, —>d Y (see (1.1)). The results were startling. For a very small proportion of the applications, the confidence radii obtained by the method T,(Xn), were larger than T, But for the complement, the T,(Xn) was extremely small compared to T.” 51 Table 6.1: Analysis of sizes of bootstrap confidence radii. c Observed proportion of times T95(Xn) < cT963 1 .940 1/2 .881 1/4 .770 .107 .500 suggesting that the bootstrap performs well. Example. Here is an illustration of how the bootstrap confidence radii compare with confidence radii obtained using the unconditional distribution of the sample mean. We ran a simulation with 1000 observations for a = 1, n = 50, and 2000 bootstrap resamplings for each observation with m = 50. The observed coverage of bootstrap method was 0.968. The radius about sample mean necessary for unconditional confidence of .968 is T968 9- 32.13. The implications of these results could be very far reaching. 94% of the times the method was applied, the radius of the confidence interval for 0 was less than the radius necessary to give confidence equivalent to the empirical coverage obtained by the method. Maybe more substantial is how much smaller the observed radii were. Half of the time the bootstrap confidence interval radius was less than about a tenth of the radius necessary for unconditional confidence. More needs to be studied in this direction. The invariance principle proved in this paper will help to explain the phenomenon. Chapter 7 Remarks 7 .1 Other resampling plans. The stochastic integral representation given in Chapter 2 together with the invariance principle proved in Chapter 3 provide a new way of studying the resampling problem. One of the problems with the usual bootstrap in the infinite variance case is its inconsistency. We have examined resampling plans by means of multipliers different from the usual multinomial multipliers in an effort to produce a consistent “bootstrap” for the long tailed case. One such method is to define multipliers 6" = (6?, . . . ,63) whose distri— bution satisfies the following conditions: (a) For each i, P(6? = —1)=1/2 = P(6?=1); (b) 22:1 61: = 0i (c) Each permutation of the components of 6" has the same distribution. One of the virtues of the usual centered bootstrap is its location invari- 52 53 ance; if X,- = Y,- + 0, then 11 XX“ rile—1): ZYk( .ik'll k=l k=l because 22:1(M;k — 1) = 0. This method shares this property. Other benefits to this approach are that 6" —>d (61, 62, . . .) where 6;, 2d 6),. Also 61,6), :4 6),, so that in fact 11 00 -1 -1/a an E Xkbj: “(if ckl‘k . k=l k=l An analogous statement to the one made here about the conditional limit laws (which are the important ones to consider) can be made for this type of resampling using a similar, but easier, analysis. This method, which we call seamless resampling, is explored more thor- oughly in a forthcoming joint paper with R. LePage which calls on the in- variance principle proved here to prove its value. 7.2 Only off by a scale. Consider the symmetric case. Recall the result given by Knight [Kni89]. 
The representation he gives for the random limit law is the random conditional distribution of Z semi/KM; - 1) k=1 given the sequence elFl'l/a, 62F;1/°', . . .. It is shown by LePage in [LWZ81] that for our choice of an and b,,, 00 s, .4, Z ark-”a. k=l 54 By LePage’s Theorem 3.3(A), the unconditional distribution of the sum given by Knight is only a scale away from the correct limit law p : Z ekl‘;l/°'(M; — 1) =4 (E(M; — 1mm 2 2,1“;1/0. k=1 k=l Appendix Appendix A Appendix. Lemma A.l Suppose A, B Q R are disz'oint Bord-measurable sets. If {Yk} is a sequence of i.i.d. random variables independent ofe and F, and EIYI I“ < co, then with probability one, 5; arr/Wknuswk) = Z e.I‘;"°'mA(Y.) + Z c.P;"°Y.IB(Y.). k=l k=l k=l Proof. By Theorem 3.3(A), we see all three sums converge with probability one. Let a, b, and c, denote their respective almost sure limits. For each n, la -— (b + c)| 3 la — Z ekF;l/°Yk1AU3(Yk)I k=l + |b — Z ckr;‘/"Y,.IA(Y,.)| + Ic — E CkF;l/aYkIB(Yk)la k=1 k=l Since the right hand side converges to zero almost surely as n —+ 00, a = b+c almost surely. 0 Lemma A.2 Let T be an arbitrary random variable taking on values in (0, 1) and e be a random variable independent ofT with P(e 2: 1) = % = P(e = —1). 55 56 For 3 in (0,1) let .7, = o{eI(T S u),u S 3}. Then for B Q (s,1], with probability one, P(T E B) P[T e B||f,] = P(T > S) [(T > s), and E71210 e B) = 0. Proof. Intuitively, if f, represents the information we are provided with at time s, then at time s we know the value of T and e if T S 3. Let A, = {0,[T > s],[e = 77]fl[T S u] : n = :l:l,0 S u S s}. A, is a r-system generating .7, and fl is a finite union of sets in A,. Both terms on the right are f,-measurable. The proof is completed by checking that the integral condition for conditional expectation is satisfied on A, for both relations. (See [Bil86, Thm 34.1].) D Lemma A.3 With probability one, I‘;2/°’/s?, —+ 0. Proof. Let 5 > 0. We need to show that with probability 1 eventually ~2/a R < 5. (A1) m -2 a j=0 Fuel-{7' But (A.1) occurs if and only if (294/... < s[(§1)—2/a + i(r"+j)-2/a]; (A.2) n n i=1 n and (A.2) is true if and only if (><>>:<——> 6 n j=1 Choose m > (1 + €)/e. By the strong law of large numbers, we know that I‘M,- / n —> 1 almost surely for each j. Hence, for each to in a set of probability one, 3no(w) such that for n 2 no(w) 57 Thus for n 2 720(0)), <.°°;(— M) > :(W) 1 i=1 i=1 \/ AA P—J «1+ 0) \__/ A y—a l m v Lemma A.4 For each t, 02(V,,(t)) —+ 0. Proof. oo '2/0 2 5204(1)) .-_- E(X"2 F (ln(1 — :r,c A t) — t)) k=n '1 oo ;2/0 2 = EEI‘(Z ——(1n(1,—- :11. At) — 1)) 15:75 3n 2/a = EZ(P;% )202(-ln(1—Tk/\t))). k=n But 02(—1n(1 — T), /\ t)) is no greater than E(— ln(1 — T1))2 = 2. Thus, 2(133’“) ) 02(Vn(t)) 5 2E( k=n n _2/0, —2/01 3 E (1“,, Zr ) 2 2 3n k=n 3n = ware/33,). This last term converges to zero by the LDCT, since 1 2 P;2/°‘/s,2, -—+ 0, by Lemma A.3. D 58 Lemma A.5 If h is a Borel measurable function, the h(Tk)I(T;c S s) is measurable with respect to .7... Proof. By considering separately the sets {TkI(Tk S 8) Z 7} for 7 Z 1, ’7 6 (0,1), and 7 S 0, we see that T;J(T;c S s) is f,-measurable. Define g on R by g(a:) = h(a:)I(o,,](;r). We are done because 9 is Borel- measurable and g(TkI(Tk S 8)) = h(Tk)I(Tk S S). C] Lemma A.6 ForO3]. 1 — s Proof. By Proposition A.5, ln(1 — Tk A t)[T,c _<_ s] is f,-measurable. Thus, E" ln(1 — Tk At) = ln(1 — Tk At)[T,c _<_ s] + E" ln(1 — Th /\t)[T,c > 3]. (AA) Let A, be defined as in the proof of Lemma A.2. 
Since 0 is the only set in A, properly contained in [T1‘ > s], Ef‘ ln(1 - T]. A t)[T;c > s] is almost surely constant on {T1‘ > 3}. Call this constant c. By the definition of conditional expectation, c(1—s)=cP(Tk>s) = / ln(1—TkAt)dP {T198} 1 = /ln(1-u/\t)du = Atln(1—u)du+£l(l—t)du = (1—3)1n(1—3)+ S—t. 59 Thus, on {T1c > s}, —t Ef‘ln(1—Tk/\t) = ln(1—s)+ :_3 3 _- = ln(1—TkAs)+1_3. Combining this with (AA) the lemma is proved. [3 By Remarks 3 and 4 in [LWZ81] Z3" —+d 2:111:2/0 and our as- sumptions on F and an, but the following statement needs proof. i=1 YnJ' Proposition A.l 2Y2.- pin“. i=1 Proof. For each N, 72, let 1: N-l n 73 - —ZY n), =ZY .j, and R~.n = 2 Y” :1 =1 j=N Let (nk) be a subsequence. It suffices to show that there is a further subsequence (7119-) such that rm) J, 22°: I‘;2/°'. Since for each j, YnJ -+ PJ'I/a a.s., for each N, N SN,” —+ Z 171/0 a.s.. (A.5) i=1 For each j, choose N such that 13(2):”, 172/6” > 2") < 2”. Then for each j, choose n(NJ- ) > n(NJ - ), such that for n 2 n(NJ-), P(|SN n -—ZI"’2/°| > 2") )< 2". For each j, let nkj. = min{n;c : nk 2 n( NJ)}. Write 5N1“) as SJ, and RijJ- as R'. Since 73* —. —J-S + RJ, we only need to show that 51"“) 2f: 1 I“; We almost surely and RJ- Hp 0. The proofs of these facts follow. 60 Lemma A.7 Asj ——> 00, SJ- —* 2 1:2,“ as k=l Proof. Let A,- = {lsj — 22;, r;’/°'| > 2(2-J')}. Notice that N] . Nj m — a . A,- c {15,- - Err/“I > 2"}Uuzrk’” — Zn.“ l> 2’1}. k=l k=l k=l Therefore P(AJ) 5 2(2"). The proof is completed with an application of the first BoreLCantelli lemma. U Lemma A.8 RJ- —>,, 0. Proof. It suffices to show that for each subsequence (ju), there is a further subsequence (jyi), such that RJ-yJ --+,, 0. Let j,, be a subsequence. Since Tn —>d 2:11 Fri/a, (Tnkj) is tight. But then (SJ-,RJ) is tight in R2 because Sj,Rj Z 0 and TM), = Sj + RJ. Since (SJ-y, RJ-u) is tight in R2 there is a weakly convergent subsequence (5'in , RJ-v', ). Say it converges to (S, R). We must show that R = 0 a.s.. For a reduction of cumbersome notation, denote 23:1 I‘J-_2/°t by Z. Since (8,",'. + RJ'W) is a subsequence of (1'3”), SJ-VJ + RJW —’d Z. But 5,}... + RJ-W —»d S' + R by the continuous mapping theorem. Therefore Z :4 S + R. By Lemma A.7 S :4 Z. Since RJ'W _>_ 0, we must have R Z 0. Let (I) be the standard normal distribution function. 0 = E(S'+ R) — E(S) = E((S + R) — <1>(S))I(R > 0). Since (S' + R) — @(S) > 0 on {R > 0}, P(R > 0) = 0. Thus R = 0 almost surely. Cl 61 Notice that Proposition A.l and the fact that YnJ- —+ PJ—l/a almost surely together imply that for each N, RN. _., 2: Ff“ (A.6) k=N Corollary A.l 8N,n(N) —+,, 0 for any nondecreasz'ng sequence (n(N)) such that n(N) > N for each N. Proof. Let 6 > 0. There exists No such that P(ZfiNo Fri/a > e) < 5. By (A.6), we can choose N1 > No such that P(IRNO,n — ZEZNO F;2/al > 5) < e, for n 2 N1. But then for N such that N Z No and n(N) > N1, n(N) P(SNJIUV) > 26) ‘2 P(Z Yn2j > 26) j=N n(N) P(Z ij > 25) j=No "(M —2 2 P(|ZY§-— Z r; /°'|>5) +P( (Z r;/°> j=N0 k=No 1:: No < 25 C] |/\ |/\ Bibliography Bibliography [Aal78] [AG89] [Ath84] [Ath85] [Ath87] [BF81] [Bi168] [Bi186] [Bre68] Odd Aalen. Nonparametric inference for a family of counting pro- cesses. Annals of Statistics, 6:701-726, 1978. Miquel A. Arcones and Evarist Giné. The bootstrap of the mean with arbitrary sample size. Annals of Inst. of H. Poincare, 1989. K. B. Athreya. Bootstrap of the Mean in the Infinite Variance Case. Technical Report, Iowa State University, 1984. K. B. Athreya. Bootstrap of the Mean in the Infinite Variance Case- II. 
Technical Report, Iowa State University, 1985.

[Ath87] K. B. Athreya. Bootstrap of the mean in the infinite variance case. Annals of Statistics, 15:724-731, 1987.

[BF81] Peter J. Bickel and David A. Freedman. Some asymptotic theory for the bootstrap. Annals of Statistics, 9:1196-1217, 1981.

[Bil68] Patrick Billingsley. Convergence of Probability Measures. John Wiley, New York, 1968.

[Bil86] Patrick Billingsley. Probability and Measure. John Wiley, New York, 1986.

[Bre68] Leo Breiman. Probability. Addison-Wesley, New York, 1968.

[DV88] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer-Verlag, 1988.

[Efr79] Bradley Efron. Bootstrap methods: another look at the jackknife. Annals of Statistics, 7:1-26, 1979.

[Fel71] William Feller. An Introduction to Probability Theory and Its Applications. John Wiley, New York, 1971.

[GZ89] Evarist Giné and Joel Zinn. Necessary conditions for the bootstrap of the mean. Annals of Statistics, 17:684-691, 1989.

[Hal88] Peter Hall. Rate of convergence in bootstrap approximations. Annals of Probability, 16:1665-1684, 1988.

[Kni89] Keith Knight. On the bootstrap of the sample mean in the infinite variance case. Annals of Statistics, 17:1168-1175, 1989.

[Leh83] E. L. Lehmann. Theory of Point Estimation. John Wiley, New York, 1983.

[LeP80] Raoul LePage. Multidimensional Infinitely Divisible Variables and Processes. Part I: Stable Case. Technical Report, Stanford University, 1980.

[LWZ81] Raoul LePage, Michael Woodroofe, and Joel Zinn. Convergence to a stable distribution via order statistics. Annals of Probability, 9:624-632, 1981.

[Met82] Michel Metivier. Semimartingales: A Course on Stochastic Processes. Walter de Gruyter, 1982.

[Pol84] David Pollard. Convergence of Stochastic Processes. Springer-Verlag, 1984.

[Sin81] Kesar Singh. On the asymptotic accuracy of Efron's bootstrap. Annals of Statistics, 9:1187-1195, 1981.

[ST89] Gennady Samorodnitsky and Murad Taqqu. Stable Non-Gaussian Random Processes. January 1989. Notes for future text.