£33 1‘ . .4.)qu a.“ ...«(Wufifimtmflfl WP. n5» 4...... . .udfikr. 9‘» n)! . an. 1.» bruins... n.8mmvvmfi ,, u. .9. 54.....eywuwlmauuw... . .wm.m4hm mgunxwm. . M K. .. «fawn. «.... ab: 4.33.. :6] . ... .. I... “flag?“ . 01:17.! . U ‘1" .- I:. A... A! to. . ...-£19 I .v \n? .5. ; . x: c‘ . ..3 x... w .p‘t: .. ‘5an “f." 26?qu; . 1! §.'xv; .....IGIOD I II V. .‘levllls‘nv‘ .llgfl ... 1.3.3.1...1 . a 3 . . 21.11. .I' ...| I. IV lifil‘ slur... ..Hi tr”... .. Liifltsirta‘: 2'05! ..l ... . , . ‘ . H . .. . . . . . .. . . u . . ‘ .n ‘ . . . y . . . . . [Emitttstt‘alQ I .| .....Hl. . . .. ‘ . . . . . ‘ \ . , . . . . ‘ ..A.“lv.t:.l.vfiut ~ ... . . ‘ . . n . ‘ , f .. .. » . . ‘ ‘ ,. . . . ‘ : ...»! . 1.2.... . .. I . H .. ‘ . . . y . v . . . ‘ . . n.5 .l 'o...‘l...\.-v|i\:mw¥l.ylhv H' l 1 ¢ . \.J‘7.iu~ 131‘. .'n "-1 .. m ‘ lil|tt1||h ..§'l)\ ,A .v a I'l .1-II {tilintil'obl 1: Hal! bya‘Yuuuls.3‘.tPI-v5. ...-ti..- ‘ ' V ‘ u- U. ‘19! Illlulna: cl .1 .v‘ 7110 lv 0:“: .. . I. . . .V x .u ‘ . . . .vl) .. I‘ll JC,I¢IF..» .05.;vlul'L int: s.-..1ix..'l.l‘ .0 ill: 1“!!! I! ‘ I9, 5:315?! flaw - I nunll. ‘n :2.I.I'lll.| av . . ‘ A: "- . I|h¢l ‘0 ‘v u v air-v1»..f’lnv9“ ‘ . ’i'l .01.. ,b A 31.1). 5!! “Enlrf‘hflufltl‘s'. ‘ .13t' III-n. : t . l. n . A o. . I... . . ‘n. . ...-o - . -. o v... 0. I-.. ~1'tv vl 1.! ,btlu I . -....nnn I..V- J ‘ V K , . t 7.1. < I...1hh|0l.~.|cl_.fllll..|lt‘ . Solo. ... ‘ fi~v.hh>.n.qo.lt.l‘f ..I‘IIJ.‘ . ‘ .‘ .6». . v. 6.- n v | VHFSi? IlllllllllllllllIlil'llllllllllllllllllillll 3 1293 00885 2224 This is to certify that the dissertation entitled Resampling Methods For Linear Models presented by Krzysztof Podgorski has been accepted towards fulfillment of the requirements for Ph.D. degree in StatiStiCS ajor profey/ ‘ Date July 23, 1993 MSU is an Affirmative Action/Equal Opportunity Institution 0—12771 LIBRARY _ M'Chlgan State University ___ PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. DATE DUE DATE DUE DATE DUE __ ll ___Jl =4 “—l _—l __7 . . MSU Is An Affirmative Action/Equal Opportunity Institution czwmwnpma-pd Resampling methods for linear models By Krzysztof Podgorski A DISSERTATION Submitted to Michigan State University in partial fulfilment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Statistics and Probability 1993 Abstract Resampling methods for linear models By Krzysztof Podgorski In this dissertation we study resampling residuals by random permutation in linear regression with exchangeable errors. For given observations we con- sider the resampling distribution obtained by this method versus the distri- bution of errors conditional on its order statistics. We observe that with high probability both are approximately equal on contrast vectors after suitable normalization. The results remain valid if we consider the reduced regres— sion model. This model comes from the original one by removing those data for which the corresponding errors take extreme values. We obtain general conditions under which confidence regions produced by such methods are ac- curate. No moment / distribution assumptions on the underlying errors other than exchangeability are required. We adopt the term seamless resampling to indicate this robustness. In the absence of finite second moments random- ness of the limiting bootstrap distributions has been considered as a failure of the traditional bootstrap and no practical meaning has been given to the phenomena. For our method this type of randomness is still present in the limit but it has clear probabilistic interpretation as a conditional distribution which can be used to e.g. obtain confidence sets. We study this phenomena in detail for the case of independent errors with distribution from the domain of attraction of stable laws. We have developed computer software in which we implement these methods and provide convenient tools for the analysis of resampled data in general. Acknowledgment I wish to express my gratitude to my advisor Professor Raoul LePage for the introduction to the problem, fruitful discussions and constant encourage- ment. His exceptional support and excellent guidance during my doctoral studies at Michigan State University went far beyond what I could ever ex- pect. iii Contents 1 Theoretical Results 1 1.1 Introduction ............................ 2 1.2 Main Results ........................... 6 1.3 Examples ............................. 16 1.4 Proofs ............................... 27 2 Numerical Analysis 39 2.1 General Description of Software ................. 40 2.2 The Numerical Procedures .................... 41 2.3 The Graphical Interface ..................... 47 2.4 Numerical Experiments ...................... 50 2.5 Conclusion ............................. 54 Appendices 55 Appendix A ............................... 55 Appendix B ............................... 58 iv List of Figures Genome-com 10 11 12 The conditional distribution of errors and its recovery by permut- ing residuals ............................ 19 The distributions obtained by other methods ........... 19 Conditional distributions in the unreduced model ......... 21 Conditional distributions in the reduced model .......... 21 Results of resampling, Gaussian case ................ 24 Results of resampling, Cauchy case ................. 25 Results of resampling, stable case with a = 0.8 .......... 28 The sample of 1000 elements obtained by permuting residuals in a linear regression model versus an autoregression model . . 42 Resampling residuals in an autoregression model with Gaus- sian errors versus Cauchy errors .................. 44 The main window of the LaPlot application for analysis of multivariate data .......................... 48 The distribution of resampled errors and residuals, Gaussian The distribution of resampled errors and residuals, Cauchy case. 53 Part 1 Theoretical Results 1.1 Introduction This paper considers a multiple regression model (1) Y = Xfi + e, where X is an n x d finite matrix of real numbers, fl is a d x 1 vector of real numbers, and 6 is an n x 1 vector of exchangeable random variables. For reasons that will be discussed later we will also consider reduced regression models which come from (1) by deleting some data and the corre- sponding rows of the design matrix. To describe this submodel, let R; be the rank of 6.- among (e.-)?___l and M be a subset of m integers from {1, . . . , n} with cardinality m. For any n x 1: matrix A by A M we will denote an (n — m) x k matrix obtained from A after deleting those rows indexed by i whose R,- is in M. With this notation for any M we define a reduced regression model by the equality (2) YM = XIII/6 + 6M- Notice that for different vectors of errors 6 with a fixed set 11/! we can have different reduced models (2) and thus the design matrix XM becomes ran- dom. A k x 1 vector v of real numbers is called a contrast if the entries of 2) sum to zero, equivalently if the Euclidean scalar product 12- 1 = 0 where 1 denotes the vector of 1’s in IR" (we will not indicate in notation dependence of l on a dimension 1:). Throughout the paper we will assume that X M has rank d and also that the vector 1 is in the column space of X and thus of any X M for a subset M of {1, . . . , n}. Without losing generality we can assume that 1 is the first column of X. Then contrast vectors come to (2) in a natural way. Namely, for i > 1 the least squares estimators d,- of fl,- are of the form 5,- = V,- - YM for a contrast V.- depending also on M. Indeed, V,- is the i-th row of the matrix M = (XEXMYIXL and MXM = I. This implies that any row V,- for i > 1 is orthogonal to the first column which is assumed to be equal to 1. Ordinarily one is interested in approximating the joint distribution of v - 6 for several 2) since it is equal to the estimation error, i.e. v-e = (v-Y)—(v-Xfi). For i.i.d. errors with finite variance, the central limit theorem is popularly used for this purpose. Bootstrap methods apply to this case as well (see: Efron (1979), Freedman (1981), Wu (1986)) but how are we to know if the underlying assumptions have practical application to the data at hand? See LePage (1990). Instead approximating or estimating the unconditional distribution of as, we propose, without moment assumptions, to estimate the sampling distribu- tion of v - 6 conditional on the sigma field .7 generated by the order statistics of the coordinates of 6. For this purpose we will use randomly permuted residuals. To be more precise let us introduce the following notation. We will use 2) - 6']: to denote the above conditional distribution. Denote by csi the vector of residuals Y - I”, where I” = Y/ X is the projection of Y to the column space of X. Let 7r denote a uniformly distributed random permu- tation applied to the coordinates of n-space. Suppose that the distribution of 1r, conditional on 6, is also uniform over all n! permutations, so that 1r is independent of 6. We will observe that, provided d/ n is small, 1) - r6i|Y 3 provides with high probability a close approximation of v - MIT. Notice that the latter distribution is equal to v-e|.7-' since 6 has exchangeable coordinates. Proposal 1 For any finite set of contrast vectors {vb k S I}, estimate the joint f—conditional sampling distributions of contrasts vk - e by the distribu- i tions of v), - 7T6 conditional on Y. This proposal provides strong support for the approach taken by Freed- man and Lane (1983) who develop descriptive tests of linear hypotheses. To quote from them, “our reference sets are derived by permuting residuals, and our significance level is a descriptive statistic rather than a probability.” It will be seen below that v - 7T6'LIY is an estimate of a conditional distribu- tion insensitive to the moment assumptions. Thus it is possible to achieve robust estimation of the conditional sampling distributions of least squares estimators. Previously only M-estimators with bounded score functions were known to have this property, see Lahiri (1992). As we will see Proposal 1 can be applied to a quite general class of exchangeable distributions. However sometimes the recovered distribution v - 6|]: itself will posses certain unpleasant properties when some coordinates of 6 will take relatively very large values. This may happen for example for long tailed distributions of 6. By our result also the distribution v-7rosi IY will inherit this. It can happen, e.g. the example in Section 1.3, that confidence regions will not be in the form of single intervals but sums of widely separated intervals. In such a case to “improve” the distribution of interest it would be reasonable to exclude observations with outsized disturbance by errors and then apply our method of resampling by permutation to this reduced model. Our results extends also to this situation. Let us describe this in a more rigorous way. We shall fix a set It! of those values of ranks (R,-)?=1 for which we have decided to exclude corresponding observations. For example by taking M = {1,n} we will exclude two ob- servations: with the smallest and the largest error. Thus we will consider a reduced model as defined by (2) and by «5%,, we will denote its residuals i.e. ct = YM — YM / X M. We will use the same letter 7r for a random permutation in R""" (again disregarding dependence on a dimension). We will denote RM = {R,- : i E M } In the problem of estimation of coefficients fl we will use contrast vectors being rows of (XLXMY’XL and the matrix X M is now dependent on values of R M- For that reason we will allow a contrast vector V(RM) E R"‘"‘ to be dependent on 6 through its order statistics R 16(- Fur- ther let fM denote the sigma field generated by the order statistics of 6M and by RM while GM is the sigma field generated by YM and R [4- With this no- tation and conventions we will find that if the ratio d/ (n — m) is small then, as before, the distribution V(RM) - “MGM approximates the distribution V(RM) - ele'M. This can be stated as the generalization of our proposal. Proposal 2 For any finite set of contrast vectors {V ;k _<_ l}, which may dependent upon RM, estimate the joint fM —conditional sampling distribution of contrasts Vk - 6M by the joint QM -conditional distribution of Vk - «61%,. Notice that RM provides information concerning which data might be excluded from the model and YM are observations remaining after this ex- clusion. Thus QM represents information which should be given to us to consider a reduced model based on extreme values of errors. 1 .2 Main Results We assume the notation of the introduction. As before, 6 is assumed to has exchangeable coordinates and to be independent of a random uniformly dis- 2 we will denote tributed permutation 7r. If u is an k x 1 vector then by a, u the arithmetic average and the vector of squares of coordinates, respectively. By 3?, we will denote the (k — 1)—divisor sample variance of u i.e. 33 = km/(k — 1). Our goal is to show that for any contrast vector V(RM) dependent on ranks RM the distribution V(RM) - “MGM approximates the distribution V(RM) - eleM. Here as before QM = 0(YM, RM) = 0(6M, RM). In all that follows for a random variable Z or a sigma field H by E Z , B” we will denote conditional expectation with respect to Z and H, respectively. Also using Z [71 we will mean both conditional distribution of a random vari- able with respect to a o-field 'H and an example of a random variable with this distribution allowing context to distinguish between these two concepts. This convention will simplify notation later when we will consider weak con- vergence of conditional distributions. First notice that for o E En_m we have (6M,RM) rd— (06M,RM). Here 2,, denotes the set of all permutations of coordinates of vectors in IR". To see that let consider an event {6M 6 A, RM = RM}. Define a permutation 6 E 2,, such that it permutes e,- for 7',- ¢ M the same way as o and leaves 6, unchanged for r,- E M. Then RM(5'€) 2 RM“) on a set {R,,‘,(e) = RM} thus by exchangeability of e we have P(€M(€) E Ali/(4(6) = RM) = P(€M(56) E A, 3112036) = RM) = P(06M(€) 6 «4.3151(6) = RM)- This implies that V(RM) . emf," é V(R,,—,) - ireMIQM. Indeed, for each bounded and measurable function h and a random permutation of coordi- nates in R””" independent of e we have E’thmm-et) = {1,—5,55“ Z (1(V(RM)'0‘€M) ' UEEn—m = Emmi/(Rn) - ml) = EGMMl/(RM) ‘ 7r6M), where in the last equality we use that fM is a sub-o—field of QM and the fact that the last conditional expectation is fM-measurable. Moreover both V(RA-,) - nehlgM and V(R,;,) ' (MIfM are centered at zero since, by indepen- dence of 1r and e we have vamm - mat) = EQWVW) ' 6i!) — 1 g [r _ .1. — m E M deg-"fax (RM) . 6M) : EQM (E(7IV(RA_I))|RM=RM . 6i!) :0, where in the last equality we use the fact that E(7ru) = 171 = k‘1(u - 1)1 for any k x 1 vector u and thus E(7ru) = 0 if u is a contrast vector. Similarly we have E’M(V(RM).7reM) = EfM(7rV(R,—,).eM) ‘—“ Eif-M (E(WV(RM))IRM=RM ° 6,”) = 0. The main result can be stated as follows. Proposition 1 Let IV _C_ {1, . . . , n} have cardinality m. If 1 is in the column space of XM which has rank (1 and 7r is a random permutation of coordinates in Rn‘m then for each contrast vector V(RM) we have E9M(V(RA-,) - reM - V(R,,-4) - inst)2 _ d—l ECHO/(RM) - 71'6M)2 _ n — m -1 ' Ef For the proofs of this and other results in this section see Section 1.4. As a consequence of the above result we have that the distributions V(R,g) - eMIfM and V(RM) - rehlgM are approximately the same when scaled by EfM(V(RM) - 6M)2 provided that d/(n — m) is small. This follows directly by Markov’s inequality applied to the second moment which stands in the denominator of the left side in Proposition 1 (see also the proof of Proposition 2; Section 1.4). One can use as well the Mallows metric d2 as a measure of this closeness (see Bickel and Freedman (1981)). The result is interesting by itself. For example any picture illustration of a distribution is in general invariant on rescaling. But when no moment assumptions are involved the scaling EfM(V(RM) - 6M)2 may diverge to infinity with n. In such a case we need more arguments to apply our methods of resampling to confidence regions. The next result explains the practical meaning of our result when we do not use the scaling. For the sake of simplicity of notation we will state it for the case M = 9 but it can be restated in the obvious way also in the general case. For a given contrast vector v, let 7(6) = E‘(v . 7re — v - nei)2/E‘(v - 7T6)2, C1=‘/Ef(t’° €)2, where in the latter we do not show explicitly dependence on the order statis- tics of e and let C2(Y) = JE‘(v - rel)? Denoting by F1, F2 the distribution functions of v'€|f, v- «6* |Y, respectively, we then have the following relations. Proposition 2 For any positive 6 we have d -—1 d —1 < -l (3) P(7(€)> "_16)_ n_ 6 . (4) am 6 {Few — Clo — «06-2. F2(417 + 01m woo-21 for all a: 6 IR and f 2 / _ ”_d 2 E 0,0 )_ n—1C" We can use Proposition 2 to examine accuracy of using the distribution given by F2 obtained by resampling residuals instead of the unknown conditional distribution given by F]. For example (4) allows us to assess a value of the confidence regions based on F2 as approximations for corresponding confi- dence regions based on F1. More precisely, we should adjust the first ones by taking their halo of thickness C16 and then their significance will differ at most by 7(e)/62. By (3) the random variable 7(6) is small with high proba- bility. Thus the accuracy of proposed approximation of F1 by F2 depends on how large C16 is. Suppose for example that we would like to use quantiles of F2 for obtaining confidence intervals. Then we want C16 to be small relative to these quantiles and thus to lengths of confidence intervals. In the next paragraph we will explore this problem in more detail. For any p 6 (0,1) let KILKE denote p-quantiles of F1,F2, respectively. By (3) for any 60 there exist (20, a subset of underlying probability space, with P(Qo) > 1 — 60 and a constant M > 0 dependent neither on n nor on d such that on {20 we have 7(6) 3 Add/n. By (4) on (20 for each 6 > O we have Md C1 Md Cl F-z(.‘l‘- ——\/—n:) - 6,F2 (1' + —T)+ 6] for all .1: E R. This implies that for any p,6 > 0 such that p - 6,12 + 6 still F1(.T) E are in (O, l) we have the following relations between corresponding quantiles . M_d C1_ M__c_l C__1 Thus it would be desirable to have CIA/n, at least asymptotically, small relatively to K;. We can say equivalently that we are interested in the cases 10 when quantiles of (ti W,, are convergent to infinity, where = (v-1r6)|e . Jaw-m)? We will examine this question for errors sampled independently from a dis- (5) We tribution belonging to the domain of attraction of a symmetric stable law. Let G(a:) = 17‘°1[1,°°)(:r) and for any n E N let (UM)?=1 and (6i)?=1 be independent sequences of i.i.d. random variables such that Um.- has the uniform distribution on (0, 1) and P(6,- = :lzl) = l / 2. Let us define a sequence 6,, = (6,”)?=1 of errors by the equality (6) 6n,i = 6iG—liUnJ) = (Stud/O- It is well known that the distribution of 6",,- belongs to the domain of attrac- tion of a symmetric stable law with index of stability 0. In fact for 1' > 1 we have P(|6,,,,-| > 1') = 1"“ (see also LePage (1980)). Although we will assume here this special form of errors by application of invariance principle of the type obtained by Kinateder (1990) it is possible to extend our results on an arbitrary law from the domain of attraction of a stable distribution. We will also need a special form for contrast vectors. Let U : [0, 1] —* IR be a continuous function such that [01 v(:r)d.r = 0. Define a sequence v,, = (my, )?:1 of contrast vectors by i 1 " k (7’ - " (a) ' a 2; v (5)- To express the limit distribution of W,, = (1),, - ren)|e,,/(E‘(v,, - 7re)2)1/2 we will need a notion and some basic properties of the stochastic integral 11 of a function v with respect to a Levy motion. For details see for example Schilder (1970) or Kuelbs (1973). By a Levy motion we mean here a stochastic process (A,),E[0,1] continuous in probability with independent and homogenous increments such that for any t 6 [0,1] a distribution of A, is symmetric stable with characteristic function ¢,(u) = exp(—t|u|°‘). Then the integral fol vdA is defined in a usual way by as a limit in topology of convergence in probability of simple functions. A random variable fol vdA has a stable distribution with the characteristic function ¢v(u) = exp(-— fol |v(x)|°'d;z: |u|°). There exists a version of Levy motion such that its trajectories with probability one are right continuous and left limits exist at any point t 6 [0,1] (cadlag) (cf. Doob (1953) p.422). Thus we can define a o-field ,7 generated by values of jumps of trajectories of this process. We can do it, for example, by ordering values of jumps of a given cadlag trajectory in a sequence since the number of jumps exceeding given positive value is finite. Then ,7 is just a o—field generated by such a sequence. By I‘ = (I‘,),-€N we will denote arrival times of a Poisson process with intensity one. In this situation we have the following limit behavior of W... Theorem 1 For each continuous function v : [0, 1] —-> IR with probability one the conditional distributions _ v,, - «6,, - \/E‘~(v,, ~1re,,)2 converge weakly to the distribution [,1 vdA \/ Z [01 v2(:1:)d:r We 6.. 12 where Z is J measurable and Z 4 23:1 I‘TQ/a. Remark Proposition 1 implies that the joint distribution (v,, - net,” - 1r6,,)|6,, scaled by ‘/E‘"(v,, - 7r6,,)2 converge to (f0l vdA, f0l vdA)|j scaled by \/Z fol v2(a:)d:r. As a corollary we obtain desired property for the quantiles of the distri- butions of \/7_i W,,. Corollary 1 For a non-zero continuous function u : [0,1] —-> IR the quantiles of J17 W,,, different from median, converge to plus or minus infinity with probability one. For the proofs we refer again to Section 1.4. The special form of contrast vectors assumed above is in some sense es- sential for obtaining a result of this type. We will see that in some extreme cases i.e. for a very unusual choice of contrast vectors, the quantiles con- verge to zero. In such a case usage of quantiles of F2 as approximations of corresponding ones for F1 becomes worthless. Example 1 Assume that our contrast vectors are of the form v,, = (v,, . . . , v,,, 0, . . . , 0) 6 IR", where k S 72 does not depend on n and (v1, . . . ,vk) is a fixed contrast vector. Further let 6,, = (61,...,6,,), where 6,, i E N are i.i.d. random variables such that Shh/E diverges in probability to infinity. In such a case v,, - 7r6,,|6,, converges in distribution to the same limit as v,, - 6;]6n, where 6; means random variable obtained by traditional bootstrap i.e. by resampling with replacement form (61, . . . , 6,,). It becomes more clear if we 13 notice that because of the special form of v,, permuting 6,, in v,, - «6,, is equivalent to sampling without replacement k-element sample form a pop- ulation of n-elements (we have here k fixed and n going to infinity). Thus the distribution of v,, - 1r6,,|6,, converges weakly to the distribution of v,, - 6],. But assumption that SQ/fi diverges in probability to infinity implies that fi/ JE‘(v,, - 1r6,,)2 will be convergent to zero so do the quantiles of fi W,,. See Section 1.4 for more rigorous arguments. As a specified example of a random variable satisfying the required con- dition one can consider the errors given by (6) for a 6 (0,1). Indeed, for such 6 we have n ‘2 O n —l O i _ i=1 Pi / —2 _ i=1 (Siri / —-3 n I‘n-i/l Fn+/l (8) —2+‘2/o (Pu-H )2/0 Zn: l—v-‘Z/o = n _ i n i=1 1" 1 1/0 n _n-3+1/0 (A) Z‘SJ‘FI/a- i=1 n We are using here the well known fact that (Pl/Fn+la - - - a 1-‘n/I‘ri-+-l) g(U1,na - - - 9Un,n)- By Law of Large Numbers I‘,,/n converges as. to 1. The series 2:, 6,1" ,— 1/ a and 2:21P?” are convergent as. to the continuous distributions. Since n‘2+2/°‘ for a E (0, 1) diverges thus in this case with probability one sfn / n is convergent to infinity. However in some cases our approximations will be justified regardless of the form of contrast vectors. Namely, this will happen if s," /,/77 is convergent 14 in probability to zero. Indeed, by simple application of Proposition 1 we can obtain the following corollary. In its formulation we do not indicate explicitly dependence of sf” as well as F1, F2 on n and also it should be remembered that F1, F2 being conditional distributions are dependent on a random value of 6. Theorem 2 If sf / n converges in probability to zero then for any 6 > 0 and for sufliciently large n on a set of probability not less than 1 — 6 we have F1(.’L‘) 6 [F2(.’II — 6) — 6,F2(.’E + 6) + 6]. for all a: 6 IR. Notice that the assumption required in this result is satisfied for errors in domain of attractions of stable laws when a 6 (1, 2]. In Section 1.4 there are given arguments for errors given (6) but by series representations of stable laws given by LePage (1980) they can be easily extended to the general case. Theorem 1 and Theorem 2 validate approximation of the unknown dis- tribution of errors of contrasts by permuting residuals under special assump— tions on distributions of errors. The failure of this approximation in Ex- ample ] is due to the asymptotically pathological choice of contrast. At this moment we do not know any example with an essentially different type of contrasts for which applicability of Proposition 1 for this approximation would not be valid. We will end this section with the following result, uniform with respect to numbers of contrast vectors. 15 Proposition 3 Under the assumptions of Proposition 1 and for contrast vectors VAR“), k =1,...,r we have E722=1E0M[(vk(RM) ' WGM) _ (Vk(RM) ° “(it )12 = d -1 22=1EGM(VI:(RM)°W6M)2 n - m -1' 1.3 Examples Example 2 We now present a simple example which illustrates resampling by permu- tations and the advantage of using a reduced model. The vector of errors considered here has one value dominating the other ones and typifies what can happen when the distribution of errors has a long tail (in particular when our errors do not satisfy the usual assumptions about existence of moments). Consider an n x 2 matrix X defined as follows 1 ll 1 1 X: 1—1 1.1—1‘ We assume that n is even and the columns X1 = l and X; of X are orthog- 16 onal. Suppose that our actual errors consist of an n-vector 6 given by " 1 a 0 e = In 0 d One can notice at once that 1-6 Xz'f 6/X = - 1+ ——- - X2 lllll2 ||X2H2 P 2a/n . 2a/n : — 1+ — - X2 = , n 0 .. O .l and thus _ q a -— 2a/n -2a/n fi = —2a/n 0 0 )- a: Let us take for our contrast vector v the second column X2 of the matrix X. One can easily observe that the distribution v - elf is concentrated on two 17 points a and --a with the equal weights 1 / 2. We will compare three differ- ent methods of estimating this distribution: the resampling by permutation of residuals, the traditional bootstrap with replacement and the normal ap- proximation. On Figure 1 we present the distribution of interests itself versus first order approximations for n = 625 of distribution obtained by resampling residuals while on Figure 2 we present first order approximation of the dis- tribution obtained by applying the traditional bootstrap with replacements and the normal approximation. See Appendix 2.5 for details. If we consider the reduced model (2) for M = {n} and if the residual a — 2a/ n is large compared to residuals —2a/ n we will delete the first row from our model and then we will obtain 6 M = 6]} = 0 and thus resampling of residuals gives us a perfect estimate of the distribution of errors. 18 0 MI.— NIH O l —a a —a Distribution of v . 6]}. 0.5 «0.215; 0.5 l f a The resampling by permutation Figure l: The conditional distribution of errors and its recovery by permuting residuals 0.42e"l / 0426.“ ll/ fl /\ The traditional bootstrapping Figure 2: The distributions obtained by other methods 19 The normal approximation Example 3 As a second example, assume that errors are made from the errors in the first example by adding a sample vector from Gaussian white noise with a small deviation relative to the value of a. In our numerical simulations we will take it equal to a/\/1_i. Let us take as before M = {n}. Figures 3 and 4 present the conditional distributions of error of the contrast in the reduced and unreduced model. It illustrates the advantage of the reduced model. Although approximations in both cases are accurate, in the case of the unreduced model the confidence region which should be used consists of two separated intervals around —a and a while in the reduced model one can consider the natural confidence interval around the origin. 20 0.255 0.235 0.5 0.5 0.5 0.5 —a a —a a Distribution of v - 6|]: with previous errors The resampling by permutation modified by adding white noise for modified errors Figure 3: Conditional distributions in the unreduced model 0.2:? 0.2:? 1 1 r 7 Y if —-a a -—a a Distribution of v M - EM If after deletion The resampling residuals by permutation based on the maximal error after deletion Figure 4: Conditional distributions in the reduced model 21 Example 4 As a third illustration we have made computer simulations which compare resampling permutations of errors vs resampling permutations of residuals in a linear regression model fitting a cubic polynomial. Consider a 101- dimensional symmetric a—stable error 6 with independent coordinates and a 101 x 4 matrix X obtained by substitution of equally spaced x—values in l 2 1 3 __ J'- ,:1: ,x ), so :rm- — LL,- j = 1,2,3,4, 1:,- = (i — 1)/101, i = 1,...,101. We have used the Chambers the interval [0,1] to the vector of monomials (11:0,:1: et. a1. formula (1983) to generate stable errors by pseudo—random numbers. Since our result does not require any assumption on moments of errors we have observed three notable cases: the normal distribution (a = 2), Cauchy distribution (a = 1) and a = 0.8. We first compute three 101 x 1 row vectors v,, i = 1,2,3 of the matrix (XTX)‘1XT as well as the matrix of projection on the column space of X. Then, by generating pseudo—random permutations 7r we get a sample of two hundred errors v,- - 7r6, residuals v,- - #6i and projections of errors v, - r6, i = 1, 2, 3. Visual comparison of the values of these samples is made on parallel plots. Since scale changes in 6 apply equally to v,- - M, v,- - 71'6‘L ,v, ~ 7r6 it is the relative comparisons between these plots which are important. Four parallel lines represent the coordinate axes of four dimensional space and any point of this space corresponds to a polygonal line which joins values of the coordinates on each axis. For convenience all polygonal lines are started at the origin. For details concerning parallel plots see Chernoff (1973). In these parallel plots we see a very close agreement between the multivari- ate sampling distribution of {'v,--7r6,i = 1, 2, 3} and that of {v,--7r6i,i = 1, 2, 3} 22 conditional on the given 6. As might be expected the agreement is even bet- ter than suggested by the sampling distribution of {v,- - «6“, i = 1,2, 3}. For this example(d — 1)/(n - 1) = .03. In Part 2 we will give more detailed description of numerical methods and procedures we have used here. 23 23.61 :4 i i 2, v,- ~1re. =2 —23.61 a: 23.61 -23.61 2,v,~ - 7I'6'L. a: 23.61 -23.61 2,v,- - 11'6“. a: Figure 5: Results of resampling, Gaussian case 24 =4 i 232 6 —232.6 a=1,v,~-1r6. i -7r6‘L. a: 1,1); 232.6 -232.6 ..] 4 232.6 -232.6 1,115 -7r€. Figure 6: Results of resampling, Cauchy case 25 229093 -229093 229093 -229093 a = 0.8,v, - 7re 229093 -229093 i=2 i=3 i=4 a = 08,11, -1r6 Figure 7: Results of resampling, stable case with a = 0.8 26 1.4 Proofs Variants of the following lemma, in the special case when V is not ran- dom, are apparently well known Chernoff (1973) Section 4.1; Freedman and Lane (1983), pg 296, Lemma 1). Although formulated here in probabilistic language it illustrates rather a geometrical property of the group of permu- tations. We present the simple proof of this result. Lemma 1 Let V be a random contrast vector. Assume that V, a random n x 1—vector 6 and a o-field 9 all are independent of a random permutation it. Then Eg(v-ag)2/n! 062" = EG(Z V252 {em + Z V1:’Ib€a(a)€a(b)) /”' a¢b )a¢b a¢b = E9{uvue+—mvn2— (v 1N]? we}. = EG((: (2211/02) {2+ nnl( ——-1),,(ZV Vb)( )(Z€a(a)£a(b)) )) where 2,, denotes the set of all permutations of coordinates of vectors in IR". Since (V - 1) = 0 we obtain 71 —1 E9(V - M)? = E, (llVll'zn ('52 — o?) = Eanvu'zezn The above lemma and some elementary properties of conditional expec- tation are the only tools in the proof of our main result. PROOF OF PROPOSITION 1. By Lemma 1 we have that EQM [II/(RM) - m.) — (VIRI—II - «emf = WWI/(Re) - wet/XIII? = E9“(||V(R III-s“? s...,x..I (9) = EQMIIV( (.RI‘I) )II2 SEM/XM where the last equality follows from measurability of 33M /XM with respect to GM. By similar arguments we have also (10) EQMII’IRM ) WM)2 = SEMIEQMIIV R(II!) )II2 Using the fact that (6M, RM) ; (06M, RM) obtained in Section 1.2 and since both 33M and E‘Ms2 M" m" are TM measurable we have 2 ,EGMI(V(Rn)-W6M)-(V(RM)' new] = ErsEM/XM EQM(V(RM) ° W6M)2 83M EfEstgu/XM 2 36M (11) = _ .FMS — E stun/X1“ = Ems? SJMM/Xu The proof can be completed using an equality E’s3x=«d—nMn—nef 28 obtained by Box and Watson (1962); Section 4.1 in their study of F—distribut- ion approximations for sf/x/sf with non—normal errors 6 and also mentioned by Freedman and Lane (1983); see (21), p. 267. In our case it may be obtained from Lemma 1 by expanding in an orthonormal basis {ua} of X M with u1 = 1/(n — m) as follows 1 EMS? SnM/xM = —.—E‘" (IIWfM/XMII2 " (MM/XII '1)2/(n — m)) n — m — 1 d 2 = —— “21(110-71'6M)2— (6M -l) /(n — m) n — 171 —IE “I: = ———ZE‘“(u -27r6M) n — m- al__2 = ———1——dZ||ua| 232 n — In - 1 0:2 d—l 2 = ———s n—m—l ‘M' where Lemma 1 was used in the third equality. El Likewise we can obtain the result on the relative second moments of difference of 7I’€M — Mil. For simplicity of the notation let us formulate it in the case M = 0. If we denote by X e l the orthogonal complement of 1 in the column space of X we have then E‘||(7r6 — 7r6l)/X e 1”2 _ d -1 E‘II(7r6)/X 9 1||2 — n -1' Indeed, for any vector u we have by the definition of 3,2, that 33 = sE/xe, and ||u||2= (u 1)2/n+( (n—1)s2. Thus E]: Ef||( (7r6—7r6 )/X91||2= (n—1)Ef 3m”): (n—1)Ejrs2 9e/x 29 and Ef|l(7r6)/X 6 III2 = (n —1)Ef32. C PROOF OF PROPOSITION 2. For any random variables Z1, 22 with marginal distribution functions F 1,172 and for each 6 > 0, C > 0 we have that F2(:c — ca) — P (Z2 E Z‘ — Z 26) s Flor) SF2($+CI5)+P(Z2C ‘26). which applied to Z, = v . W6, 22 = v . rel, P = P‘ and C = C1 (notice that Z1l€ = v - elf, Z2l€ = v - reilY and C1 = l/E‘Z?) together with the Markov inequality I22 — ZII E‘IZ2 - ZII2 _2 P‘ —— > 6 < 6 ( C, — _ Cf gives us the second part of Proposition 2. From the Markov inequality applied to non-negative 7(6) we get P(7(6) > <5) S E 7(6)/<5- Thus the first part of Proposition 2 follows from Proposition 1 since it implies that E 7(6) = (d — 1) / (n — 1). The third part of Proposition 2 follows from Proposition 1 if we notice that it remains valid when we replace FL by 6 — FL and d — 1 by n — d. In fact it follows from (11) that 2 EfECI’U ' ”(1)2 = Ersc/xi Cf S? ’ where Xi denotes orthogonal complement of X. So the expansion in or- thonormal basis in Xl gives us, as before, that Efsf/x, = sf(n — d)/(n - 1) (notice that basis will consist of n — d contrast vectors). Cl 30 We will precede the proof of Theorem 1 by some considerations on con- structions of series representations of stable laws and Lévy processes as in- troduced by LePage (1980). Let u = (u,-)‘,?;l be any sequence of numbers belonging to (0, 1). We will define I?(uat) = 1{s€[0,l]:u15lns]/n}(t) and for 1 < i g n, by recursion, If (u,t) : 1{8€I0»1I=UI- i;IIIWEIMI/(n-I'IIU)’ It is not difficult to verify the following list of properties of a sequence (I?(u,t) 15;, (see also LePage (1980) and Kinateder (1990)). First it is obvi- ous that a sequence (1,-"(u, t))?=1 consists only of zeros and ones. Moreover for fixed t and n there are exactly k = lnt] ones in this sequence. Such a sequence can be interpreted as a combination without replacement of k-elements form n-elements when positions of ones indicate elements of {1, . . . ,n} which are chosen to this combination. On the other hand if we fix n, and u then the number of ones in the sequence will increase from 0 to n as t will increase form 0 to 1. A sequence of positions of subsequent appearance of ones or, equivalently, jumps in the sequence (1,."(u, t))§‘=l is a permutation of elements of {1,...,n}. It is worth notice here that these jumps can occur only at t = k/n and one at the time. The combination and permutation, which can be related in this way to (I?(u,t))?=1, depend in fact only on values of u = (um-”:1. We will refer to this as a combination and a permutation chosen by u. 31 Now if U = (Um-:1 is a sequence of i.i.d. random variables with uniform distribution on (0,1) then for fixed t E [k/n,(k + 1)/n), k E {0,...,n}, a random combination chosen by U has a uniform distribution over all possi- ble (2) combinations. Also the positions of jumps of (I,-"(U,-))?=1 i.e. the permutation chosen by U will be uniformly distributed over all n! possible permutations. Moreover I,-"(u,t) as a function of t, for fixed u, n and i, will change its value form zero to one at one and only one of the points t = k/n, and it is right continuous. It follows from that 1?('u,t)= 1Io,zI(J.-"(U)), where J," denotes the jump of I,”( u, -). So for any sequence of random vari- ables 6,, = (6,”)?=1 independent of U a formula (12) W) = Damn, i=1 defines cadlag process on [0, ll constant on intervals [(k — 1) / n, k / n), where k E {1, . . . ,n} and with n-jumps at points k/n,k = 1,... ,n equal to values of 6,,,,~, i = 1, . . . , n randomly permuted by U. Thus an _ n E _ n k_]' n mew-(L t.) L ( . I).-. is a random permutation of 6,, chosen by U. Consequently, for any vector v 6 1R” we have v - 7r6,, l6,, = Z v,,AL?l(6,I‘,-)§:1 1:1 From now on let I‘ = (I‘ 02:, will be arrival times of a Poisson process with intensity equal to one i.e. l",- = 21 + - - - + Z,- where Z,- are i.i.d. random 32 variables with exponential distribution with a parameter equal to one. Then (F,- / I‘,,+1)?=1 is a vector of i.i.d. random variables uniformly distributed over (0,1). Let 6 = (6,),9;l be random signs defined as in Section 1.2. Moreover let P and 6 be mutually independent. Then F—l/O 7! en = 6nd :1: = 61+.— ( ) ‘ I all") i=1 is a vector of i.i.d. random variables with a distribution form the domain of attraction of a symmetric a-stable law. For a continuous function v on [0,1] let Riemman type sums be defined by 2L1 v(k/n)AA2, where AA}; = A(k/n) - A((k — 1)/n). They converge in probability to ID1 vdA (see Schilder (1970)). Kinateder (1990) proved that a sequences of cadlag processes (L"(t)),€[0,1] is weakly convergent under Sko- rohod metric to (A"(t)),€[0,1] (see also LePage (1980)). The main idea was to apply the series representation of LePage (1980) of a Lévy motion and integrals with respect to Lévy motion. This representation states that a symmetric o—stable Lévy motion with cadlag trajectories can be written as Mt) = Zéiri—UOIILI,,1](I) = :6iPFI/01[0.tl(Ui)I i=1 i=1 where U is independent of (6,1‘). One can easily extend this formula for stochastic integral with respect to Lévy motion to obtain 1 00 / vdA = :5,r;‘/°5(U,). 0 ‘_ 1—1 The next proof exploits this convenient representation once again and applies it in a similar manner to Kinateder (1990) to obtain that n 1 "lingo:v(k/n)AL2l(6,-I‘,-)f:1a‘zs'jo vdAlj. k=l 33 PROOF OF THEOREM 1. By definition of v,, (see (7)) and Lemma 1 we have ”It ' “enlen = “6"” ( ?=1v(i/n)AL?I(6iFi)?:I \fl‘I‘I-(vn - 7r6,,)2 IIvnIISCa IIEnII ?=1v(i/n) ?=1AL?) n llénll ' Notice that Ia = (I-i)‘”(:£j(i)/~-(§év(i)/n)) —1/2 ( 11:1 [72/0 1/2 a. 172/0 - (2;. 6IFF‘/°)2/n) ' and AL? 22;. 6.17”“ II€nII = J2}; p.72”. By almost sure convergence of 23:, 6,I‘,—1/° and 22:, [72/0 and the fact that lim,,_. 2;, v (i) /n = f v(t)dt = 0 we have with probability one (13) lim ”6"” = (/ Mum)“, "Pm IlvnIIS‘n 2?:1v(i/n):?:1AL? : 0 (14) lim ”“°° ’1 “en“ (15) ,Iigg, Ilenll = ,l Err”. i=1 34 Now we will show that 2?_1v(;:-)AL'-‘ converges almost surely to f vdA. Kinateder (1990) has proven that a sequence Ln(t) is convergent with prob- F—l/a ability one to 23311‘ 6 I“) ,])(U ) (in fact convergence of the sequence of entire stochastic processes holds 1n Skorohod metric). Mostly the same ar- guments apply here when instead of 1(0.t) we will take a continuous function v. Namely with probability one "1320: r7 ”“6 v( (R"(U =2 r:”°6( v(U i=1 since lim R?(U) '—S' U.- (see Kinateder (1990); Proof of Proposition 3.1). In view of the series repre- sentation and definition of AL;1 this is equivalent to "11.1ng v:( )AL"_ 8‘3 Eff/“6,- v(U.-) éfolmm. This implies that "1930”" -7re,, 5,, g flli_.n0102n:v(z'/n)AL,’-’|(6,-I‘,-)§’_°=1 i=1 = iP:‘/°é.-v 0} i=1 has positive probability. Then P(§I‘:”°6.vw.) =0) = EP (BF/“mm =0I(r.6.).-EN) (17) > 0. But 2?; F:1/065'U(Ug) = fol mm and since 12 is a non-zero function thus fol vdA has a—stable distribution which is continuous. This contradicts (17). [:1 Let us discuss now the statements included in Example 1 in Section 1.2. For a fixed w belonging to the underlying probability space by 6;;(w) = (6;'m-(u))),’.‘=l we will denote a random variable obtained by traditional bootstrap i.e. by re- sampling with replacement from (6,-(w))?=1. For w from the set of probability one forl= 1,...,k we have . 1 n (18) “In Eellw-f;.l(u’) = lim _ Zeitvfiflw) : Eeitv'q n~oo "#00 n j_.1 . 36 We will assume that for our fixed on (18) holds. Denote 6,,(w) = (5,-(w))?=1. Then lEeitvn-x£n(w) _ eitvrql S IEeitv.-1rc,.(w) __ eitvn-e;(w)| +|Eeitv.-£;(w) _ eith-ékl n! (n - k)! 1 S (n—k)!( n! —3) [C H Eeitvlc;‘k(w) _ eith-Cg.’ . (=1 + Thus by (18) we get that vn-7renlen converge weakly to vk.ek with probability one. PROOF OF THEOREM 2. By Proposition 1 and Lemma 1 we have 2 Ef(v - 7T€i — v - m)2 = (d —1)||v||2—s‘—. n - 1 Thus by our assumption Ef(v ~ ml — v - 7r6)2 converges to zero in probability and thus for a given 6 > O and for sufficient large n we have P(Ef(v . mi — 12-7“)2 > 6) < 6. Now using Markov’s inequality in the same manner as in the proof of Propo- sition 2 we immediately obtain the result. 37 The next proof is just a slight variation Of the proof of Proposition 1. PROOF OF PROPOSITION 3. By (9) and (10) we have 22:1 EGM[(VI:(RM) - W614) - (WARM) 4611)]? 22:1 Eg“(Vk(Rn) - #6102 SfM/xM {71:11599""”(l/1091(4)”2 83., 22:1 E9“ ||(Vk(1't’11)||2 2 saw/X14 ——-,2 . ScM Now conclusion follows immediately from the proof of Proposition 1. 38 Part 2 Numerical Analysis 39 2.1 General Description of Software This section contains the description of our software which was created to ver- ify results on resampling methods for linear models. The software is intended to be developed to a package which, we believe, will become a very useful tool not only for verifying theoretical results and comparing different resam- pling methods but also for tracking new phenomena which can contribute to the development new effective statistical methods. The linear regression and time series are the two main areas in which these programs are assumed to be useful but we believe that there can be more applications for them. From the point Of view of functionality our software has two complementary components. The first one includes all procedures for numerical analysis of data will be written in the C—language. To assure the maximal portability we use the standard C—language as defined in Kernighan and Ritchie (1978). The second component has as its goal a visualization of Obtained sets of data. We would like to assure simplicity and functionality together with high qual- ity of graphics and we found that the NeXTSTEP environment available on NeXT computers and IBM compatibles satisfy both requirements the best. The whole software in the portable C—language serving the numerical part of our project was written by the author while the front—end and graphical ap- plications available in NeXTSTEP computer environment was programmed by Isaac Tsaiyi in cooperation with author and Dr. R.LePage. 4O 2.2 The Numerical Procedures As we have mentioned our software is designated to enable simulations and analysis of the resampling methods in linear regression models and time series. It will serve as an easy to use but powerful research tool for wide range Of problems connected with resampling methods and linear models in statistics. In its current stage the data analysis is made in the two steps. First all of our strictly numerical programs and procedures write their results into files with names usually chosen by a user. Then the graphical interface can be used for analysis of the data contained in these files. Thus the analysis and display of the data are separated processes here. In the future however we would like to control them under one joint application running on NeXTSTEP. In its present form our software has subroutines which compute all linear algebra Objects which are extensively used in studies of linear models such as inverse matrices, projections on given subspaces and more. We have also provided procedures for finding the least square estimates Of unknown pa- rameters of a given model. Many of them can be applied to time series as well as to linear regression without any change but, Of course, the properties of these two classes of linear models are quite different. Figure 8 illustrates the difference in the character of data Obtained when resampling by permutation is applied to two different models (linear regression and autoregression). In both cases the underlying distribution was normal. We also have efficient procedures for generating samples of data which correspond to a given model when its parameters are specified. Once we 41 7.1m 14.475665 11.339018 #831691 44.421016 JIM Dub 2 Data 3 Dan 4 03639] 0.130009 0.273100 W 41.139112 JJIO‘IM Duh 2 Dan 3 Data 4 Figure 8: The sample of 1000 elements obtained by permuting residuals in a linear regression model versus an autoregression model 42 generate an appropriate set of data we can apply to them a chosen resampling method. We have incorporated many programs which compute different characteristics of the resampled data. In our simulations we needed a very efficient generator of resampling schemes therefore we have written quick procedures for generating random sampling based on the generator Of pseudo- random numbers. In our theoretical research we have concentrated on the method of re- sampling by permutation but in the proposed package a user has also access to other resampling methods like the traditional bootstrap or bootstrapping by signs (see LePage (1990)). Generating sets Of data from as many classes of distributions as possible will be also assured. Currently because of our research interests we implemented methods of generating data from the do- main Of attraction of stable laws. In Figure 9 we can Observe how the nature of underlying distributions can influence the Obtained numerical results even for the same autoregression model. We have also made simulations for an autoregression time series. Let us briefly describe this model. Let (Xn)n€Z be an autoregressive process of order (I or AR(d) for shortness i.e. a stochastic process such that for each n 6 2 the following equation is satisfied d JYn = Z [Ban-k + 6n, k=l where (fnlnez is a sequence of i.i.d. random variables and (1301;, is a finite sequence Of real numbers. Let n E Z and m 2 (1 will be fixed integers. Denote 43 03‘3” «.13» 0.273100 W 43.139”: 4314754 Doll 2 Dub 3 Data 4 9.5107] 4.060931 7.664361 Figure 9: Resampling residuals in an autoregression model with Gaussian errors versus Cauchy errors. 44 5 = (51,...,gd,o,...,0) l 6 : (6n1---a€n—m+l) 1673'" X = (Xna-°-1Xn—m+l) Skdx = (XII-k, - ° - a Xn—lc—m-H) J A symmetric random operator M : ’R" —> ’R" will be defined by its matrix representation - Xn-l Xn—2 Xn—m i M = Xn-z Xn—s -°- Xn—m_1 L Xn—m Xn—m—l Xn—2m+1 . Notice that the columns of this matrix constitute the vectors 5 X , . . . , S '"X . Denote by M a vector space spanned by them and for any vectors 1:, y E 72'" by 33-31 we denote their inner product while a: / M will stand for the projection of a: on M. With this notation we have X = M H + 6. Moreover, MfiEM. 45 Indeed, Mfl = (sx-fi,...,smx.fl) d d (2 [Ban—ka - - - a Z fian-k—m-l-l) k=l k=l d 2: 3,.st k=l By ordinary residuals for this model we will mean the coefficients of a random vector 6']- =JY_X/M Since M ,8 E M we have E'L = e - e/M The term adaptive residuals will be used when referring to a random vector (X - X/Mh (Sm—IJY — Sm—lX/M)1 X11 — (X/M)1 X,_,,,+, — (Sm-1.17M )l where (:r);c stands the k—th coordinate of a vector 1: E 72'”. Of course, we have (5 — e/M)l (Sm—1€ _ Sm_1€/Sm_lM)1 46 The estimate 6 of 6 which is defined by the equation X/Sm‘lM = sm-lMfl‘ was considered even in the case when the second moment Of 6,, does not exist see Kanter and Steiger (1974), Harman and Kanter (1977). Then we have X=M6+el. It seems thus to be reasonable to consider the adaptive estimate 8“ of 6 by the equation X = M3“+e°. For such a model we have carried on analogous resampling methods. An autoregression model of the order four was considered with the same coef- ficients in the both cases but in the first one we deal with the Gaussian distribution of underlying stationary random sequence while in the second one a sample from Cauchy distribution was generated. Although we do not have any specefic theoretical results verifying applicability of our method of resampling to this model, results Of simulations strongly sugest that results Of this type can be true also in this case. 2.3 The Graphical Interface In our studies of resampling by permutations we have used the £a’Plot ap- plication developed by R. LePage, K. Podgérski and I. Tsaiyi. It exploits the idea of parallel plots. The main window of this application is presented 47 l -$.‘ll3724 Data 2 Dana 3 Date I Figure 10: The main window of the Euplot appli *ation for analysis Of 111111— tivariate data. 48 on Figure 10. On the upper right side of the main window there is the “Create profile” icon which makes it possible to set a profile on the parallel plot symmetric with respect to the horizontal axis. This profile can be com- pressed (rescaled) by using the “Profile Compression” slide. Those points on the parallel plot which lie outside of the profile are automatically painted in a different color than the others. A user can select the colors used. In Fig— ure 10 a profile was set up on the second and third vertical axes. The profile is indicated by two horizontal lines which show its actual position. Using the “Modify profile” icon one can easily change it. The current percentage Of points which are not pruned by our profile is shown in the “Confidence Percentage” display. Confidence interval half—widths appear in the smaller dark windows at the bottom. A change Of the horizontal gap between axes can be also regulated by a horizontal slide on the right bottom side of the display window. The slide just below the graph enable to track the data in the cases when the number of axes is such big that dimensions of our graph exceed horizontal dimension of the graph window. This can be also regulated by a change Of the horizontal gap between axes by the horizontal slide on the right bottom side of the main window. The rest Of the slides corresponds to the axes showed on the plot and can be used for changing the scale of each of them. Numbers on the top and the bottom of each axis indicates the actual value Of the corresponding coordinate of the first point which is not pruned by a given profile. The data for this graphical interface should be prepared in a file where coordinates of each n dimensional point are written in a one separate row. 49 In other words a file should have the form of a matrix where rows represents points from the n dimensional Euclidian space which we want to view on the parallel plot. Many additional features which come usually with any NeXTSTEP application are also added as for example possibility of saving a picture as a Post Script file. 2.4 Numerical Experiments In this section we will discuss some simulation data Obtained for few different specifications of the model (2). In our simulations we check the accuracy of taking Iy as conditional confidence intervals based on resampling residuals by comparing them with random intervals I. Obtained by resampling real errors. We also compare our method with the methods based on the traditional bootstrap. The conditional distributions of both Z and 2’ will be examined too. Here we will include only a small variety possible numerical studies. Some others requires additional procedures and supporting programs which are not yet available. The descriptions of existing software, as used here, are given in Appendix B. The proposed method of resampling is based on rearrangements Of resid- uals by random permutations which corresponds to 7r in our notation. Al- though hypothetically we can calculate the exact values of quantiles of a distribution obtained by resampling by considering all permutations of n— dimensional vectors, in practice it would not be possible even for not very large n (52! estimates the number of subatomic particles in the visible uni- verse). We can omit this by sampling independently from the distribution 50 of 1r and then using the empirical distribution GN(1:) = £le 1(4be where Z}, = 11,- - 7r" (Y — 17) and 7r", 1: = 1,... , N are independent permutations of vectors in R“. Then by the law of large numbers or, better, by the Glivienko— Cantelli Theorem the a/ 2 and 1 — a/ 2 sample quantiles a” and b” of G N will be approximations of corresponding quantiles for actual errors a’ and b’. In our computer programs we have used this method for finding conditional confidence intervals for a parameter 5 in a linear regression model. Example 5 Here we will consider the model based on fitting of the cubic polynomial as described in Section 1.3. The vectors of errors will be an independent sample from stable distributions with the parameter of stability equal to 2, l or, in other words, with Gaussian and Cauchy distributions. More precisely, we will consider the matrix X with the coefficients given by x”- = r171 '=1,2,3,4, 1:,- = (i — 1)/101, i = 1,. . . ,101 and as the contrast vectors the rows of the matrix of projection 011 the column space of X. Generating independently permutations we have Obtained the empirical distributions of random variables denoted earlier by Z and Z’. We can Ob- serve that the computable distribution of Z’, independently of the underlying distribution, is quite close to the unknown distribution of Z. On Figures 11 and 12 vertical lines 011 the third axis indicate approximately the same .85 confidence interval for 33 whether using resampled errors or residuals. 51 0 'JIT" 1m" I139” D «95ml 49.9)” 451245 0 9.9 M" 20.0507 ”le 0 41.170“ 49.07." 44.3” Figure 11: The distribution of resampled errors and residuals, Gaussian case. 52 0 -I7.9ll3 4|an ~l2li M4 0 91!)" l7IJ|4 111501 0 .Ol 41! -|76A91 425.175 Figure 12: The distribution of resampled errors and residuals, Cauchy case. 53 2.5 Conclusion We propose resampling from the permutations of residuals as a method of es- timating the conditional sampling distributions of contrasts; without moment assumptions and assuming only exchangeable errors. The idea of resampling permutations of residuals occurs in Box and Watson (1962), Freedman and Lane (1983) and possibly elsewhere. Both papers place considerable empha- sis on establishing conditions under which F—distributions associated with standard tests in regression are recovered by resampling permutations. In fact Box and Watson (1962) do not otherwise recommend the method while Freedman and Lane (1983) recommend it, but only as a descriptive method. Neither paper makes the crucial observation that the relations Of Proposi- tions 1, 2 imply that with probability tending to one the distribution v-7rei IY approximates v - (If, when rescaled by W, as (d — 1)/(n — 1) —» 0. In fact we have independently discovered Propositions 1,2 in the course of extending the results Of LePage (1990) which exploit resampling the signs 6 of residuals in much the same way: 22 - 6ei|Y x v - 6| |e| for symmetric errors. We believe that methods which approximately recover conditional distributions by resampling are more widely applicable, e.g. to time series, and we have adopted the term seamless resampling to indicate their robust character. 54 Appendices Appendix A We present some facts about exact distributions and their approximations which were described and illustrated in Section 1.3. We propose v - mile as a good approximation of v - rel]: except in un— usual circumstances. Notice first that this random variable has the same distribution as 7w - {J‘IE and thus takes the values yf = :l:(a—2a/n—2ak/n+2a(l—k—1)/n) = i2a(1— 2(k +1)/n) with corresponding probabilities mu-) 2(8) where k = 0,... ,l —- 1 and l + l = n/2. This follows from the fact that it corresponds to the distribution which samples one’s or minus one’s. The 55 above probability is assigned to the event when the choice consists of k one’s (minus one’s) and l — 1 — k minus one’s (one’s). The weight 1/ 2 comes from the two symmetric events in which the first value of 1m is equal either to one or to minus one. Let '7 denote the hypergeometric distribution with parameters n - 1,1 — 1,1 — 1. Then around points to our conditional distribution corresponds to the distribution of the random variable :té = i2a(1 — 2(7 + 1) / n). Since Em = (l—1)2/(n -1)zI/2 = n/4. and (l — U212 (71 - 1)2(n - 2) for large n we have approximately Va'r('7) = z 71/16, E(:l:{) z :l:(a — 4a/n) z in, and Var“) z E(—4a/n('y—n/4)—4a/n))2= = 16a2/n2E('7 — 11/4)2 +16a2/n2 z a2/n(1 + 16/11) x a2/n. This confirms the fact that our distribution is concentrated quite close to points a and —a and thus is reasonable approximation of the distribution of interest. One can also argue through the normal approximation of 7 for large n. Namely, it follows from Theorem 3 Of Bickel and Freedman (1984) that (47 - n)/fl converges in law to the standard normal distribution N(0,1). 56 SO for large n distribution of 7 is approximately equal to N (n / 4, n/ 16) and consequently the distribution of 5 is equal to N (a, az/n). Now consider the traditional bootstrap where the distribution of interest is approximated by the distribution of v-(e‘L )* Ie, where (61 )* indicates random variable obtained by sampling n-times with replacement from values of the vector €‘L. This random variable is distributed over points of the form (’61 — k2)(a — 20/”) + 20,02 — (1)/Tl with probabilities T1,i.,%_.'l;(kla l1) T1,};‘%_l(k2, 12), where k1,k2 = 0,...,n,11= 0,...,l -k1,12 = 0,...,1— k2. Here and henceforth Thphm stands for the trinomial distribution given by l l — :r x l_£_ Tammy) = (191) (192)”(193) y, :r y where :1: = 0,...,l, y = 0,...,l —:r and p3 = 1 —p1 -—p2. Similarly, Bum will denote the binomial distribution. Now one can notice that the distribution of v - (ei)*|e has asymptotically nonvanishing mass around zero. Indeed, from the following equality 7).P1,P2(x1y) = Bl,P1(x) Bl—x,—£2—(y)a 1_P1 and taking k1 = k2 = 0 we Obtain that the points 2a(lg—ll)/n, 11,12 = 0,. . . ,l are distributed with probabilities (1 — 1/"ln316'52501l BI,L(12)- 271—2 57 and using the Poisson approximation for the binomial distribution and Cen- tral Limit Theorem we will Obtain for large n the approximate distribution equal to e'lN(0,a2/n), which gives asymptotically the mass equal to e". In fact the mass will be even bigger since we should take into account not only k1 = k2 = 0 but also all cases when k1 — k2 = 0. This implies that in this case the traditional bootstrap fails to recover the distribution v - (If. It is possible to explore the distribution of v - (ei)'|e even more carefully a — 2a n since it is equal to the distribution of '7’ - / , where a two dimen- 2a/n sional vector 7 has a trinomial distribution with parameters 1, l /n, 1 / 2 — 1 / n and X" denotes symmetrization of a random variable X. So we can find that the distribution will have the positive mass around all points of the form :l:ka, k =0,1,2,.... As the last method considered here we will consider the normal approxi- mation N (O, 62||v||2) of our original distribution, where (a — 2a/n)2 +(l—1)(2a/'n.)2 n —1 ' We have then 62||v||2 z a2(1—2/n) and this method fails completely in recov- ering the distribution of v - elf, and of course the unconditional distribution may be far from normal. Appendix B N ow we will give short descriptions of the computer programs we have used to obtain the numerical simulations presented in the previous section. In the 58 near future we intend to develop with Dr. LePage this software to a package which will serve as a useful tool for exploring of the resampling phenomena of the linear models in statistics. From the point of view of functionality we have divide our tools into two independent although complementary parts, numerical and graphical. Numerical subroutines and programs. All strictly numerical pro- grams were written in the standard C—language as described in Kernighan and Ritchie (1978) so can be run on any computer with a C—language com- piler. All numerical results of these programs are held in files with names specified by a user. The main routines for resampling linear models are called blr, plr, slr. They provide quick and convinient tool for resampling in these models. Below we put more detailed description of their usage. The programs plr, blr, slr create sets of bootstrap replicate betahat vectors using three different methods of resampling: by permutations, by traditional bootstrapping and by sign change, respectively. For example plr 1000 observ design outputl output2 + will produce a file outputl with the least square fit Of observ to the model design and a file output2 containing 1000 bootstrap betahat vectors Ob- tained by applying ordinary least squares to 1000 independent uniform ran- dom permutations of the residuals (i.e. residuals from the original fit). The programs can also be used with only four parameters if a user wants to have a design matrix and a vector of Observations in one file. Then one should just type 59 plr 1000 input outputl output2 + The observation vector y is assumed to be the first column in the input file. The programs blr and slr work in an analogous way but instead Of per— muting residuals blr uses with-replacement sampling of the residuals while slr uses independent sign changes. Example 6 Consider input files with the following data observations: design: -3 1 1 2 1 0 1 1 -1 Notice then that the vector of residuals has the form -1 -1 Here are examples of output files Obtained by applications of our programs to these data. Thus plr 1000 observ design outputl output2 + will produce a file output2 with data 60 betahat1* 0.000000 .000000 .000000 .000000 .000000 .000000 .000000 .000000 .000000 OOOOOOOOO .000000 betahat2* -1.500000 -1.500000 -1.500000 0.000000 1.500000 1.500000 1.500000 -1.500000 1.500000 1.500000 Similarly for two others programs. blr 1000 observ design outputl betahat1* -1.000000 1.000000 2.000000 0.000000 -1.000000 0.000000 -1.000000 betahat2* 0.000000 -1.500000 0.000000 1.500000 0.000000 -1.500000 0.000000 61 output2 + (ENTER/RETURN) 0.000000 0.000000 -1.000000 0.000000 1.000000 -1.500000 slr 1000 observ design outputl output2 + betahat1* betahat2* 1.333333 0.000000 -0.666667 1.000000 -1.333333 0.000000 1.333333 0.000000 0.666667 1.000000 0.666667 1.000000 0.666667 -1.000000 0.000000 0.000000 0.000000 0.000000 -0.666667 -1.000000 The file output 1 in all three cases will include the fit: 0.000000 -2.000000 Here is the list of other subroutines with their short descriptions. vanderm0.c This programs creates matrices which are used to define a lin- ear regression models when fitting coefficients of polynomial is consid- 62 ered. In linear algebra matrices of this type are named Vandermonde matrices and they are of the form [2:5,] = [If],i = l, ...,n,j = 1, ...,m. stab_err.c We are using the generator of pseudorandom numbers and for- mula given in Chambers et. a1. (1983) to generate a sample from symmetric stable distribution. The values of a sample are written into a file specified by a user. pow_uni.c Here we are generating a sample from a distribution which is power of the uniform distribution on a given interval. lin_regr.c For a given regression model we compute all matrices and vectors which are important for statistical analysis (including the projection matrix, the least square fit of unknown parameters and the vector of residuals). permJes.c This program generates a sample obtained by resampling a given vector by a random permutation and then taking the inner prod- uct with another given vector. bootstrp.c For a defined regression model and a given vector of errors this program produces three samples connected with our technique of re- sampling. Namely, we will get samples from exact and approximated conditional distributions as described above as well as a sample which is difference between them. These program was used to verify the ac- curacy Of our approximation. trad_bts.c The same function as the above program but when the tradi- 63 tional bootstrapping is considered instead of resampling by permuta- tions. res_err.c This program computes the distribution of the ends of random intervals defined in Section 1.2 for a given regression model. conf_res.c A sample from the conditional distribution of the ends of confi- dence intervals proposed in Section 1.2 is computed. clt_appr.c A sample of the ends Of confidence interval obtained when the normal approximation is used to our distributions. This program was used to compare two methods. Graphical support. To visualize Obtained numerical data we have used two systems Turbo-C, Borland implementation of the C—language available on IBM—compatibles and utilities provided by the NeXTSTEP. The last one we have found particularly useful for creation of the high level graphical utilities and therefore finally we would like to have our package available in the compact form on this computer. Turbo-C graphics We have created several programs using Turbo Graphic which deliver some pictures illustrating properties of our numerical data. The one which was used here to draw histograms Of obtained samples enables to produce a histogram of any set of data (not nec- essary bell shaped) and illustrates how heavy are the “tails” of our sample which is important when distributions without finite moments are considered. 64 La’Plot The new application in the NeXTSTEP computer environment was developed to simplify analysis Of our data. It enables to track mul- tidimensional data exploiting the idea of parallel plot (see Chernoff (1973)). Confidence regions can be Observed in great detail. High quality graphics produce pictures. This application will be developed to the full software serving resampling techniques in linear models. See also Section 2.3. 65 Bibliography [1] Bickel, P.J. and Freedman, DA. (1981). Some assymptotic theory for the bootstrap. Ann. Statist. 9, 1196—1217. [2] Bickel, P.J. and Freedman, DA. (1984). Asymptotic normality and the bootstrap in stratified sampling. Ann. Statist. 12, 470-482. [3] Box, C.E.P. and Watson, GS. (1962). Robustness to non—normality of regression tests. Biometrica, 49, 93—106. [4] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, RA. (1983). Graphical methods for data analysis. Belmont:Wadsworth. [5] Chernoff, H. (1973), Using faces to represent points in k—dimensional space. J. Am. Statist. Assoc., 68, 361—368. [6] Doob, J.L. (1953), Stochastic Processes. Wiley: New York J. Am. Statist. Assoc., 68, 361-368. [7] Efron, B. (1979), Bootstrap methods: Another look at the jackknife. Ann. Statist., 7, 1-26. 66 [8] Freedman, D.A. (1981). Bootstrapping regression models. Ann. Statist., 9, 1218-1228. [9] Freedman, D.A. and Lane, D. (1983). A nonstochastic interpretation of reported significance levels. J. Business Econ. Statist. 1, 292-298. [10] Hannan, E.J. and Kanter, M. (1977) Autoregressive processes with infinite variance. J. Appl. Prob. 14 411-415. [11] Kanter, M. and Steiger, W.L. (1974) Regression and autoregression with infinite variance. Adv. Appl. Prob. 6 768-783. [12] Kernighan, B.W. and Ritchie, D.M. (1978) The C Programming Lan- guage. Prentice—Hall Inc., Englewood Cliffs, New Jersey [13] Kinateder, J. (1990), An invariance principle applicable to the boot- strap. PhD Thesis, Michigan State University. [14] Kuelbs, J. (1973), A representation theorem for symmetric stable pro- cesses and stable measures. Z. Wahrsch. Verw. Gebiete 26, 259-271. [15] Lahiri, SN. (1992). Bootstrapping M—estimators of a multiple linear regression parameter. Ann. Statist., 20, 1548-1593. [16] LePage, R. (1980), Multidimensional infinitely divisible variables and processes. Part I: Stable case. Technical Report # 2.92, Statistics De- partment, Stanford. 67 [17] LePage, R. (1990), Bootstrapping signs. in Exploring the limits of bootstrap, R. LePage and L. Billard Eds., John Wiley and Sons Publ., (1992) [18] Schilder, M. (1970), Some structure theorems for the symmetric stable laws. Ann. Math. Statist., 41, 522-536. [19] Wu, C.F.J. (1986), Jackknife, bootstrap and other resampling meth- ods in regression analysis. Ann. Statist., 14, 1261-1295. 68