WIlWlNlHWIINWMNININHIHIIHHWWI meals (m 00 l\\|\\\\\l\\\1\||\WWI\lHWWI Will 31 ms LIBRARY Michigan State University This is to certify that the dissertation entitled Conditional Inference For Incomplete Permutation Bootstraps In Multiple Linear Regression presented by Rudolf Bohumil Blazek has been accepted towards fulfillment of the requirements for Ph.D. «main Statistics till] I ’«r r. 1 'aoul LePage Date December 18, 1998 MS U is an Affirmative Action/Equal Opportunity Institution 0-1277? PLACE iN REIURN BOX to remove this checkout from your record. TO AVOID FINE return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 1M Macs-p.“ CONDITIONAL INFERENCE FOR INCOMPLETE PERMUTATION BOOTSTRAPS IN MULTIPLE LINEAR REGRESSION By Rudolf Bohumil Blaz'ek A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree Of DOCTOR OF PHILOSOPHY Department of Statistics and Probability December 21, 1998 ABSTRACT CONDITIONAL INFERENCE FOR INCOMPLETE PERMUTATION BOOTSTRAPS IN MULTIPLE LINEAR RECRESSION By Rudolf Bohumil Blaz’ek In this dissertation we develop a method for estimating conditional distributions of estimation errors for coefficients in multiple linear regression. These distribu- tions are conditional on how the Observation errors 6 are presented to the model. The estimation of these distributions is achieved via incomplete permutation boot- strap of the observed residuals. We prove results ensuring that the incomplete permutation bootstrap distributions approximate the desired conditional distribu- tions under very relaxed conditions. In particular, the key assumption needed is for the errors 6 to have an exchangeable distribution. For the case of independent block permutations in a model with i.i.d. errors from the domain Of attraction of a stable law with a < 2 we prove an invariance principle assuring correctness of confidence regions based on the incomplete permutation bootstrap. There is an application of these methods to wavelets. Mym rodicfim iii ACKNOWLEDGMENTS I would like to express my deep gratitude to my advisor Dr. LePage for in- troducing me to the subject studied in this dissertation, for his consistent encour- agement, and for all he has taught me during the course of my graduate career. Without his help my studies and research would have been much more difficult. I would also like to thank all other members of the Department of Statistics and Probability, with special thanks to members of my Guidance Committee, Dr. Gilliland, Dr. Levental, Dr. Salehi, and Dr. Nowak of the Department of Electrical and Computer Engineering. Also Dr. Fotiu of the Computer Laboratory Statistical Consulting Service for his support in my research. No less important is my appreciation which I would like to express to my colleagues and friends who enlightened my personal life. I would like to thank my parents for their spiritual support and wisdom they shared with me during all my life. Thanks to my sisters Zuzka and Hanka; and also to Wen-Ying. iv TABLE OF CONTENTS LIST OF FIGURES vi 1 Incomplete Permutation Bootstrap in Multiple Linear Regression 1 1.1 Introduction ................................. 1.2 Random Permutations in Multiple Linear Regression .......... 1.3 Conditional Models ............................. 1.4 Incomplete Permutation Bootstrap .................... 1.5 Independent Block Permutations ..................... 1.6 Examples .................................. 1.7 Asymptotics for Independent Block Permutations ............ 1.8 Exact Confidence Intervals Based on Incomplete Permutation Tests . . 2 Wavelet Expansion Model 2.1 Introduction ................................. 2.2 Modified Discrete Wavelet Transform ................... 2.3 Illustration ................................. 2.4 Conclusions - Incomplete Permutations of Residuals ........... APPENDICES A Matrix formulation of Basic Results BIBLIOGRAPHY 1 9 13 19 22 29 39 44 47 47 48 50 57 58 58 64 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 LIST OF FIGURES The conditional and unconditional distribution of the errors 1) - we in a simple linear regression with i.i.d. errors from the domain of attrac- tion of a symmetric stable law with a = 0.5 (in particular, 6, = (LU,-2 are reciprocals of squared i.i.d. uniform variables with independent symmetrical random signs). The estimates of the distributions were obtained via Monte—Carlo using 2000 replicas of the corresponding random permutations. ......................... Comparison of the exact and incomplete permutation bootstrap ap- proximated conditional distributions of the errors I) - we in a simple linear regression with i.i.d. errors from the domain of attraction of a stable law with a = 0.5 (see Figure 1.1). All estimates of the distributions were obtained via Monte—Carlo using 2000 replicas of the corresponding random permutations. ............... The conditional and unconditional distributions of the errors 2) - 7re in a simple linear regression with a single outlier. The corresponding independent block and full permutation bootstrap approximations were obtained via Monte—Carlo with 5000 replicas of the correspond- ing random permutations. ....................... Comparison of the exact and approximated conditional distributions of the errors 1) - 7re in a simple linear regression with a single outlier. (Data from Figure 1.3.) ......................... Four point linear regression with a single outlier. The conditional and unconditional distributions of the errors 12 - 7n: and their correspond- ing independent block and full permutation bootstrap approxima- tions. The approximations were Obtained via Monte—Carlo with 5000 replicas of the corresponding random permutations. ...... The conditional and unconditional distribution of the errors I) ' rre in a simple linear regression with a single outlier. The corresponding independent block and full permutation bootstrap approximations were Obtained using Monte—Carlo with 5000 replicas of the corre- sponding random permutations. (Same as Figure 1.3) ........ Comparison of the exact and approximated conditional distributions of the errors 1) - ms in a four point linear regression with a single outlier. (Data from Figure 1.5.) ......................... Comparison of the exact and approximated conditional distributions of the errors ’0 - we in a simple linear regression with a Single outlier. (Same as Figure 1.4, see data in Figure 1.6.) ............. vi 5 7 34 35 36 36 1.9 The conditional and unconditional distribution of the errors I) - ms in a simple linear regression with i.i.d. Gaussian errors. The correspond- ing independent block and full permutation bootstrap approxima- tions were Obtained using Monte-Carlo with 2000 replicas of the corresponding random permutations. ................. 1.10 The conditional and unconditional distribution of the errors 11 - me in a simple linear regression with i.i.d. errors from the domain of attrac- tion of a symmetric stable law with a = 0.5 (in particular, 5,- = diUi‘z are reciprocals of squared i.i.d. uniform variables with independent symmetrical random signs). The estimates of the distributions were obtained via Monte—Carlo using 2000 replicas of the corresponding random permutations. (Same as Figure 1.1) ............. 2.1 Mexican hat mother wavelet 1/2(t) = (1 — t2)e"2/2 ............. 2.2 Wavelets $075 and $0.5 based on Mexican hat mother wavelet ...... 2.3 The conditional distribution of the estimation errors for the coefficient 02 of 1120,_5, given the order statistics of the errors e(left), and its es- timate based on full permutation bootstrap of the observed residuals (right) ................................... 2.4 The incomplete permutation bootstrap approximation of the distribu- tion Of the estimation errors for the coefficient 02, conditional on the fact that the errors with large absolute values align with the last 72/ 2 components of e. In the simulation study the components of e have been sorted to satisfy this condition. .............. 2.5 The approximation of the conditional distribution of the estimation errors for the coefficient 62, given that the errors with large absolute values lie among the first 11/ 2 components of e. In the simulation 38 51 51 54 study the components of e have been sorted to satisfy this condition. 55 2.6 For a vector of errors with randomly arranged components, the block- wise conditional distribution of the estimation errors for the coeffi- cient 6’2, given the order statistics of the errors in each of the two blocks are well recovered by the incomplete permutation bootstrap of the observed residuals. In this particular case the true block-wise conditional distribution appears similar to the conditional distri- bution given the order statistics of is (see Figures 2.3 and 2.7 for comparison) ................................ 2.7 Comparison of conditional distributions of the estimation errors for the coefficient 62 and the corresponding incomplete permutation boot- strap recoveries. Fhll permutation bootstrap was used for the dis- tribution conditional on the order statistics of e ............ vii 55 56 Chapter 1 Incomplete Permutation Bootstrap in Multiple Linear Regression 1. 1 Introduction One of the common goals in multivariate linear models y = X - ,8 + e is to find Si— multaneous confidence regiOns for a subset of components of the unknown vector fl. An important consideration is that the relative performance of various estimators and designs should be capable of evaluation before taking data. Introduced as a descriptive method in [8], permutation bootstrap of the Ob- served residuals was shown by the authors of [17] and [18] to provide approximate conditional confidence intervals for the (contmstl) components of 6, given the or- der statistics of the components of the vector Of errors 6. A very important feature of these conditional methods is that they may be applied to all but the constant term without moment assumptions on exchangeable errors. The key idea to be developed in this dissertation is that a linear form 2 view, 1A vector v 6 Rd will be called a contrast if v - 1 = 0. Here 1 represents a vector with all components equal to 1. may for outsized errors {6,} be greatly changed depending upon how the outsized em. line up with I),- under a random permutation 7r uniform over the group 2,, of all permutations of {1, . . . , n}. We shall show that the estimation of particular conditional distributions (as- sociated with how 6 aligns with a contrast '0) can be Obtained via (incomplete) random permutation of the observed residuals over a subgroup A of the permuta- tion group 2,, of indices {1, . . . ,n}. A very special example of this would be to use independent blockwise permutations2 uniformly distributed over the indices of vectors v1 and '02, respectively, for a vector ..=(::) where both v1 and v2 are contrasts, and the ig—norm “'01“ is relatively small in comparison to [[112 II. It is then of interest to study the conditional distribution of v - 7re condi- tional on which block 1:1 or v2 the extreme values of the errors align with. F ig- ure 1.1 illustrates this with long—tailed errors 6,- = (SIM-'2 (where 61, . . . ,6" is a Rademacher sequence and U1, . . . , Un are i.i.d. uniform random variables indepen- dent of the random signs 6,) in a simple linear regression y = a + flu: + us with :1: = (—10, —9.5, . . . , 10)’. Here the centered vector 2: plays the role of the contrast 2The idea of blockwise random permutations appears in many contexts, for example in re- stricted randomizations (e.g. Lehmann [14, Chapter 5, Section 9]). If the variation among the experimental units is excessive, then a conventional test will have small power. As a remedy Lehmann examines restricted randomization analogous to stratified sampling: ...The experimental material is divided into subgroups, which are more homoge- neous than the material as a whole, so that within each group the differences among the u’s (units) are small. In animal experiments, for example, this can frequently be achieved by a division into litters. Randomization is then applied within each group. The focus is thus on “matching designs” required for Fisher’s exact tests. In particular, the contrasts considered by Lehmann are restricted to indicators of treatment group. 3 'v with '01 consisting of the 7 smallest and 7 largest components of a: (see (1.40) on page 32). In the second row the figure shows the distribution of v - 7re conditional on the event that the components of c with large absolute values align themselves with the block 1:1 of extreme values of the vector :13, while the third row illustrates the conditional distribution given that the extreme values of e align instead with the block '02. The full permutation bootstrap approximation and the unconditional distribution of v - we are shown for comparison in the first row. Figure 1.2 com- pares the true conditional distributions to their incomplete permutation bootstrap approximations. For the same model Figure 1.3 provides a second illustration with a “pulse” error i.e. containing a single outlier e = (a, O, . . . , 0)’. The second and third rows show the .' distributions of v - 7re conditional on whether the single large error aligns with the 2, respectively. (In other words, the last two rows show the conditional block 121 or v distributions v - 7T6 |7r(1) 6 A1 and v - Ire [7r(1) 6 A2 where the sets A1 and A2 contain indices of components in 1) corresponding to 121 and '02, respectively.) The first row of Figure 1.3 illustrates the full permutation bootstrap approximation of the unconditional distribution of v-7re. Figure 1.4 offers a more detailed comparison of the incomplete permutation approximations of the corresponding conditional probabilities. Motivated by the above idea we extend results Of [17] to create conditional confidence regions given A7r, where A is a subgroup of Zn and Ar = {mr : V E A}. We achieve this by the means of incomplete permutation bootstrap of the observed residuals, i.e. bootstrap based on random permutations distributed uniformly over A, instead of 2,, as in [8, 17, 18]. These are shown in row 3 of Figure 1.3 and in Figure 1.4 with A permuting the indices in each block A1 and A2 among themselves. Figure 1.1: The'conditional and unconditional distribution of the errors 2) - 7I€ in a simple linear regression with i.i.d. errors from the domain of attraction of a symmetric stable law with a = 0.5 (in particular, 6,- = QUIZ are reciprocals of squared i.i.d. uniform variables with independent symmetrical random signs). The estimates of the distributions were obtained via Monte-Carlo using 2000 replicas Of the corresponding random permutations. 1 . 1(10-6 ‘ -400000 400000 -400000 400000 Unconditional distribution of v - ms Full permutation bootstrap approximation of the distribution of v - 7re 4.xio*~ 2 . x10.6 L —400000 400000 -400000 400000 Conditional distribution [.1 of v — 1re given Incomplete permutation bootstrap that extreme values of ms align with v1 approximation of the distribution £1 ~400000 400000 -400000 400000 Conditional distribution £2 of v . 1re given Incomplete permutation bootstrap that extreme values of we align with v2 approximation of the distribution £2 Figure 1.2: Comparison of the exact and incomplete permutation bootstrap ap— proximated conditional distributions of the errors v-rre in a simple linear regression with i.i.d. errors from the domain of attraction of a stable law with a = 0.5 (see Figure 1.1). All estimates of the distributions were Obtained via Monte—Carlo using 2000 replicas of the corresponding random permutations. 2.x10"> j -400000 400000 -400000 400000 Comparison of the conditional Comparison of the conditional distribution £1 and its approximation distribution £2 and its approximation The fact that the permutation is incomplete generally introduces a conditional bias to the least squares estimator of ,8. This bias is not present in the case A = 2,, studied in [17]. We introduce a conditional bias adjusted estimator for which we obtain in Theorem 1.3.4 below a design formula (1.19) for the relative size of the mean square of its (remaining) conditional bias (RMSB). Aside from bias a key issue is the relative size of the mean squared discrepancy (RMSD) between the exact Arr—conditional distribution of the bias adjusted esti- mator and its approximation based on incomplete permutation bootstrap of the Observed residuals. Theorem 1.4.1 below establishes an exact design formula (1.23) for the relative size of the mean square discrepancy. In Theorem 1.4.3 we also obtain a design formula (1.27) for the relative reduc- tion in the size of our conditional confidence regions (based on the bias adjusted estimator and incomplete permutation bootstrap) versus the full permutation boot- strap in cases where both the RMSB and RMSD are small. Figure 1.3: The conditional and unconditional distributions of the errors 1: - we in a simple linear regression with a single outlier. The corresponding independent block and full permutation bootstrap approximations were Obtained via Monte— Carlo with 5000 replicas Of the corresponding random permutations. 0 . 05' 0.05 0.04 0. 03’ 0. 03’ 0.02 0 01 0 01 -10 -7 o 7 10 -1.0 -‘7 0 f :0 Unconditional distribution of v - Ire Full permutation bootstrap approximation of v - we 0 14* 0 l 0. 12’ 0.1 0. 08 r 0.05 0. 06 t 0.04 0.02 -10 -7 0 7 10 -i0 i7 0 '3 10 Distribution of v - we | 1r(l) 6 A1 Incomplete permutation bootstrap approximation of v - 7re |7r(1) 6 A1 0. 0'7 - 0. 07> 0. 06 - 0 . 05 r 0. 05] 0. 04 > 0 . 03* O. 03‘ 0. 02 * O. 01 > o. 01. -i0 -7 o 7 1‘0 -10 -‘7 0 i 10 Distribution of v - we I 1r(1) 6 A2 Incomplete permutation bootstrap approximation of v - we | 1r(1) 6 A2 Figure 1.4: Comparison of the exact and approximated conditional distributions of the errors '0 - 7re in a simple linear regression with a single outlier. (Data from Figure 1.3.) ""1 0 7.. -10k —7 0 Comparison of v - 7re |7r(1) 6 A1 Comparison of v - 7re | 7r(1) 6 A2 and its approximation and its approximation We refine these results in Section 1.5 where we obtain a formula (1.33) for the relative An—conditional mean square discrepancy for the case of independent block permutations of block—wise contrasts in which the estimator is unbiased. In this section we also obtain simplified versions of the formulas for the case of independent block permutation bootstrap in simple linear regression. The reduced formulas and the performance of the independent block permutation bootstrap in simple linear regression are illustrated by a numerical simulation and comparison in Example 1.6.1. In addition to developing these incomplete permutation bootstrap confidence regions we also extend the idea of Quade (1973, [20]) and the results from [19] and [9] to find an exact one—dimensional confidence interval (1.52) for a single coefficient of the regression model (1.1), conditional on the incomplete random permutations of the errors (Proposition 1.8.1). Unfortunately, the pivot employed in this exact method seems not to generalize to simultaneous confidence regions for several coefficients. At present it would seem that our incomplete permutation method provides the only viable solution for the case of simultaneous confidence regions, short Of a fully exhaustive exact permutations test method. Although our results are not limited to the case of errors attracted to stable laws, we establish limit theorems for the special case of errors {6,} belonging to the domain of attraction of an a—stable law with a < 2. These limit theorems describe the conditional convergence Of conditional confidence intervals based on incomplete block—wise permutations, founded on results in [18]. For such errors the sampling distribution of the size of the conditional confidence intervals tends to be small compared to the size of unconditional confidence intervals. In Chapter 2 we discuss an application of these results to the problem of esti- mating the coefficients of a wavelet expansion of an L2 function which can, from the practical point of view, be considered a special case of model (1.1). The spe- cial form of the matrix X in the wavelet expansion model suggests that instead of the full permutation bootstrap one should consider using incomplete permutation bootstrap with a specifically selected permutation subgroup A. In wavelet expansion of an L2 function f the vector y of the sampled values of the function f tends to be of a large dimension n, therefore the limiting theory of the (incomplete) permutation bootstrap will apply to the discrete approximation of the original wavelet expansion problem. Therefore these modified bootstrap methods can potentially be applied with good results to wavelet shrinkage (see [5]) to construct simultaneous conditional confidence intervals for the wavelet coefficients not only in the case of normal errors as considered by the authors in [5], but also in cases with ill behaved errors for which the central limit theorems fail. An important example Of such a situation is the case of errors attracted to a stable law with index a < 2 mentioned above. 9 1.2 Random Permutations in Multiple Linear Regression Consider a linear regression model y=X~fi+7re (1.1) where 31an is an observed real vector, and is a known matrix with real compo- nents, Bob, 1 is a vector of unknown real parameters, and 7r is a random permutation with distribution uniform over the collection 2,, of all permutations over {1, . . . , n} The vector of errors is introduced into the model via its random n—presentation 7re obtained from 6 by applying 7r to the indices of the components of e. The unknown real vector 6,, x1 itself is considered to be non-random. We will assume that n 2 1, X has full rank d and that either its first column is the vector 1 with all components equal to 1, or alternatively that all the columns of X are orthogonal to 1. Typically, the goal is to estimate the joint unconditional distribution of the estimation errors 3,- — [3,- for several indices i Z 1. Here ,0 represents the least squares estimator A [3 = V-y =((X’X)‘1X’)-y We will be concerned with a modification of the above goal which seeks instead to estimate particular conditional distributions of the estimation errors. Denoting the rows of the matrix V by 12,-, i = 1,. ..,d, allows us to rewrite the estimation errors as 3,- — 6, = v,- - we and for i Z 2 consider a more general problem of estimating the joint distribution of v - ms for several contrast vectors 2) which are generally not assumed to be rows of the matrix V. 10 Note that this is a generalization since if 1 is the first column of X, then for each i 2 2 the vectors 2),- are contrasts. Indeed, for all i _>_ 2 the vectors 1),- are orthogonal to the first column 1 of the matrix X since VX is equal to the identity matrix. In the case when all columns of X are orthogonal to 1 the definition of V clearly guarantees that v, are contrasts for all i _>_ 1. 1.2. 1 Basic Properties Let us first consider a few basic properties of random permutations. Hereafter we will assume that 7r is a random permutation with distribution uniform over the group 2,, of all permutations over {1, . . . , n}. For an n-dimensional vector 6 and a permutation p E 2,, we will by p6 denote the vector obtained from 6 by applying p to the indices of its components. The i-th component of p6 will be naturally denoted by (pm. We will use the usual notation E}- to denote the expectation conditional on a a—algebra f. If Z is a random variable, then Ea(Z) will for simplicity be written as EZ. In this section we will take advantage Of the following two facts. For any measurable function f, any n-dimensional real vector 6, and for all i, j = 1, . . . ,n, where i aé j, it holds that Ef(€1r(i)) = £23 “£0 (1-2) Er“) _ 1 n 1 f(€7r(j)) “ Tl. _1;f(€k)‘ T—l_—1f(€7r(i)) (13) To see that (1.2) holds write Ef(e,,(,-)) = E 22:, 1{,,(,-)=k}f(ek) and Observe that for all i and k the probability P(7r(i) = k) is equal to -11;. Similarly, for every i and 11 l we have that E"(f)f(e,,(j)) = E"(‘)2k¢,1{,,(j)=k}f(ek) as. on the set {7r(i) = l}, and that for all k aé l and i # j P(7r(j) = k | 7r(i) = l) = —1—- Therefore (1.3) n—l' holds as well. Lemma 1.2.1 (Lemma 1 in [17]) Let 6 and v be vectors in IR" and let 7r be a random permutation with distribution uniform over 2,,. Then for every contrast vector u in B" it holds that Em . ne)(v . ms) = n—iim . v) “e — enz, (1.4) where E denotes the vector 1(e - 1) /n with all components equal to the average of the components of 6. Proof: First notice that for any vector 2 6 En the equation (1.2) implies that E (z - ms) 2 2 - 3. Consequently we can write E [u - 7r(e — 8)] [v - 7r(e — é)] E(u~7re)(v-7re) - ('I'1.-'i3)||1‘s||2 = E(u-1re)(v'7re) since u is a contrast. Therefore we can assume, without loss of generality, that e is also a contrast since for an arbitrary vector 6 this modified lemma can be applied to the vector 6 — E which is a contrast. Observe that for a contrast is the right hand side in (1.3), with the identity function in place of f, becomes flew). Then, using equations (1.2) and (1.3) and the fact that u is a contrast, we obtain n E(u - Ire)(v - we) = Z u,v,-Ee:(,-) + Zu,vjEe,,(,)E“(i)e,,(J-) i=1 i¢j 1 1 = ; ”EH2 (u ' v) " EL—l guivjEérer) I J 12 = inelflu - v) — ||e||2[(u-1)(v-1) — (u-vH __1___ n(n —- 1) 1 2 — ”_lllell (w). Corollary 1.2.2 Let u, v, and e be vectors in B" and let 7r be a random permu- tation with distribution uniform over 2,,. Then Emma-we) = ”:1(It—a].v)”._.—snz+(.-...—.)Hau2 (1.5) = ~77},- ([u — a] -[v — 27]) Ile — an? +0147) “an? (1.6) Proof: To prove (1.5) write E(u-7re)(v-7re) = E([u-—fi.]-7re)(v-1re)+E(fi-1re)(v-7re) = E([u—'&]-1re)(v-7re)+(il-e)(iJ-e) (1.7) = nil (In—a]-v)IIe-alr+(a-a>néni by Lemma 1.2.1 with u — 11 playing the role of the contrast vector. To see that (1.6) holds notice that in (1.7) E([u — it] - ne)(v - n6) :2 E(['u. —- fl] - ne)([v — 13] - 7T6) since E([u — 11] - 7T6)(’l—) - re) = (i2 - E)E([u - it] -7re) = 0. Corollary 1.2.3 Let e and 'v be vectors in B" and let 7r be a random permutation with distribution uniform over 2,,. Let W be a subspace in R" with either 1 E W 13 or 1 _L W. Then for any contrast vector u in R" 1 E(u~ (W€)/W)(v' (Tel/w) = ”_— _1(U/w°’v/w)ll€ — éllz- (1.8) Here u/W denotes the projection of the vector u. on the space W. Proof: We can clearly write (u- (Ire)/W)(v - (7T6)/W) = (u/W -7re)(v/W - re) where u/W is a contrast. Hence a direct application of Lemma 1.2.1 implies that (1.8) holds. To see that u/W is a contrast recall that either 1 E W or 1 _L W and observe that because of u being a contrast u/W - 1 = u - 1/W = 0 in both cases. 1.3 Conditional Models To simplify the notation of projections we will use z/x to denote the projection of a vector z E R" on the subspace generated either by a set X of vectors in R", or by the columns of a matrix X. Further, X i will denote the appropriate orthogonal complement of the corresponding space and zi will stand for z/X, whenever the meaning of X is implicitly clear from the context. Similarly, z 1 X will indicate that the vector 2 is orthogonal either to the space X or to the column space of the matrix X. Before we introduce the conditional model, let us briefly mention a few impor- tant properties of the permutations and random permutations being used. Consider a subgroup A of the group 2,, of all permutations over {1, . . . ,n} and recall that for every permutation V E A there is a unique inverse permutation in A, denoted 1 1 by U" , such that VV- results in the identity permutation. In addition, if A is a 14 random permutation with distribution uniform over A then also A” is distributed uniformly over A. We will also take advantage of the fact that for such a random permutation A and for any vector v E B" it holds that AEAv 2 EM). (1.9) The equation (1.9) follows from the fact that the application of any fixed permu- tation 7 E A leaves the components of EM) unaffected. To see this we can write yEAv = IWI ZVEA 71m = EAv, where M] represents the cardinality of A. The last equality holds because A is a group of permutations and hence for each 6 in A there exists a unique V E A such that 6 = 71/. Assumption: For all that follows we require that /\ be uniformly distributed over a subgroup A of 2,, and independent of 11' which is distributed uniformly over 2,,. We will now turn our attention to a conditional version of the model (1.1). Observe that the conditional distribution 7r I Arr of 7r given the o-algebra generated by the random set—valued mapping Ar = {V7l' : V E A} satisfies 7r |A7r = /\7r [7r. (1.10) Under the An—conditional model for the errors re the estimator v - 3; may be conditionally biased with conditional bias EA”(v-y—v-Xfi) = E"(Au-7re) = (EAv)-7re. (1.11) In an attempt to reduce the effect of any Ar—conditional bias we propose using a bias—adjusted estimator (’0 - yladj = v - y — (EM) ' (NV, (M?) where, of course, (ms)i = yi = y/X, is a statistic. 15 The remaining conditional bias of the adjusted estimator (u - y)“,- and its unconditional mean square are described in the following two lemmas. Lemma 1.3.1 The remaining Arr—conditional bias of the bias—adjusted estimator (u - y)ad,- is equal to Badj(A,7r) = (EAv—EA(EAv)J-) -we (1.13) E (A [(EAv)/X]) - we (1.14) Proof: The Arr-conditional bias of ('v - y),,dj can be written as E” (v . y — (EAv) . (we)J- — v . X3) = E” (v - we — (EAv) - (we)i) = E" (v - Awe — (EAv) - (Awe)i) = E" (An . we — A(EAv)l - we) (1.15) = E" (Av — AEAv + A [(EAv)/X]) - we = E (A [(EAv)/x]) . we which proves both statements Of the lemma as (1.13) follows directly from (1.15). In the last equality above we have used (1.9). Note that if in the following lemma 1 _L X then the assumption that v is a contrast is not necessary. This fact can be seen from the proof of the lemma. However, for other results below we will have to require that v be a contrast even if1_LX. Lemma 1.3.2 If v E R" is a contrast, then the mean square of the conditional 16 bias BadJ-(A, 7r) is 1 n—l 19(13adj(1\,7r))2 [[EAv — EA(EAv)i]]2 ||e — .2”2 (1.16) = n:1[[EA[(EAv)/X][[2||e — en? (1.17) Proof: We will first show that the vector EA [(EAv)/X] is a contrast. If u is a contrast then so is a vector obtained from u by permuting its components and hence also EAu is a contrast. In addition, the projection u/X of a contrast it is a contrast since u/X ' 1 = u - 1/x = 0 in both cases when 1 is the first column of X or if 1 _L X. Therefore (EAv)/X and consequently also EA [(EAv)/x] are contrast vectors. Hence the assertion of the lemma is a direct consequence of Lemma 1.2.1 applied to E (E (A [(EAv)/x]) - re)2 with the contrast EA [(EAv)/x] playing the role of the vectors u and '0. Cl The next lemma and theorem provide the means for comparing the mean square Of the remaining Arr—conditional bias Bail-(A, 7r) against the mean squared error of the bias adjusted estimator (v - y)ad,-. Lemma 1.3.3 For a contrast vector v E R" the mean squared error of the bias— adjusted estimator (v - y)“,- equals 1 n—l MSE,dj = [[v — (EAw)i]|2 He — e||2. (1.18) Proof: First write MSEadj = E ((v ' y)adj — 1%sz 17 = E(v-y—(EAv)-(7r€)L—’U'Xfi)2 = EEA’r (v . 7re — (EAv) - (7rt3)i)2 : EE" (1) . Are — (EAv) - (Arre)i)2 = EE" ([Av — A(EAv)i] -7re)2. Similarly as in the proof of Lemma 1.3.2 we will Show that, for every fixed value of the random permutation A, the vector Av — A(EAv)i is a contrast. Observe that it suffices to prove that (EAv)J- is a contrast since then it is clear that Av—A(EAv)i is nothing but a random permutation of a difference of two contrast vectors, and hence itself is a contrast, although random. As noted in the proof of Lemma 1.3.2, if v is a contrast then so are EAv and it’s projection to the subspace X i which, similarly as X itself, either contains the vector 1 or is perpendicular to it. A straightforward application Of Lemma 1.2.1 with u and '0 both equal to the contrast vector Av — A(EA’v)i then yields that for each fixed value of A E"([Av — A(EAv)i] - we)2 = n i 1 [[Av — A(EAw)i||2 He — en2 = n i 1 Ilv — iII2 He — €12 which completes the proof. Theorem 1.3.4 Let 6 be a vector in R" and let 7r be a random permutation with distribution uniform over 2,,. Let and be a matrix of rank d with the first column equal to 1 or such that 1 .1. X. Assume that A is a random permutation distributed uniformly over a subgroup A of 2,, and independent of 7r. Then for any contrast 18 vector v in R" the ratio of the mean square of the remaining Arr—conditional bias Badj(A,7r) to the mean squared error of the bias—adjusted estimator (v - y),,dj, i.e. the relative mean squared bias, is mat-(And)? __ IIEA [(EW/xllf RMSB = _ . . MSEadJ’ llv - (1%”le (1.19) Proof: The assertion of the theorem follows directly from (1.17) and (1.18). [:1 An extension of Theorem 1.3.4 to the multi—dimensional case can be Obtained as follows. Corollary 1.3.5 Let r be a positive integer and let {ak},:=1 and {bk};=1 be two sets of real numbers. Then for r contrast vectors {vk},:=1 it holds that 2 2;, akE(Bad,-(A, 7r, v,))2 _ 22-. a, IIEAKEAvU/xlll - 1.20 22:1 bkMSEadJ-(vk) thl bk ””1: _ (EAvk)i||2 ( ) Proof: Equations (1.17) and (1.18) imply that 212:1 akE(Badj(/\,7rka))2 = 22:1 CREW/”KW ' was - ‘vk ' Xe)? 212:1 bkMSEadj(vk) Zi=1bkE((vk ' y1adj — ”I: ' X512 _ r 2 wi—I Ile — e112 z,=,a,||EA[(EAv.)/x1]| — Ile — @112 22.1% Hm — MEWLIIZ 2 zz=1ak|]EAI(EAvk)/XIH 22-11». va — (EAkalz ' In general it is desirable that (1.19) or (1.20) be small so that the bias com- ponent of the error of the bootstrap approximation developed below is relatively small when compared to the sampling variation of v - y. 19 1.4 Incomplete Permutation Bootstrap The Arr—conditional distribution Of the estimation errors of the bias—adjusted esti- mator (v - y),,d,- satisfies [(22 ~ ytt- — v - X3] IAr = (A122 —(EAv)*1-vre) I71 (1.21) as noted in the proof of Lemma 1.3.1. To approximate the conditional distribution (1.21) of the errors we will propose using A[v — (EAv)i] - (re)i [y (1.22) where the unknown presentation Of the errors 71'6 is replaced by the observed resid- uals (ne)i. The proposal will by justified by the fact that both the mean squared remaining bias of the bias—adjusted estimator (v - y),,dj and the mean squared dif- ference of the random variables in (1.21) and (1.22) will in some cases be small when scaled by the mean squared error of the bias—adjusted estimator. Confidence regions and hypotheses tests based on the distribution (1.22) will under some circumstances and with high probability closely approximate the con- ditional forms based on (1.21). Typically we have to require that (1.19) and (1.23) below both be small which depends only on X and A. These regions and tests may, as a practical matter, be constructed via Monte—Carlo simulations. The following theorem will allow us to investigate how well (1.22) approximates the target distribution (1.21) as it establishes an upper bound on the Mallows metric between these two distributions (see e.g. [3] for details on Mallows metric). For a fixed contrast vector v and a random permutation A uniform over A let TrueA = A[v—(EAv)i]-7re and BSA = A[v—(EAv)i]-(7re)i. These two specifically 20 chosen dependent random variables have conditional distributions (1.21) and (1.22), respectively. Theorem 1.4.1 The mean squared difference of True), and BS A scaled by the mean squared error MSEadj of the bias—adjusted estimator (v-y)adj, in other words, the relative mean squared discrepancy, is “BSA _ TmAy _ E [[[Av -— EAv]/,,]|2 RMSD = _ MSEadI' llv - (EMPII2 (1.23) Proof: TO show (1.23) we will first use Corollary 1.2.3 to observe that the numerator can be written as E(EsA — Truew)2 = El/W , (“)1 __ A(EAv)i . (we)i —- Av - we + A(EAv)i - we]2 = E[Av . (we/x — MEAvr - (“El/x12 = EE"[(Av — A(EAv)i) - (no/x12 = 1 E [[[Av — A(EAv)]/xl[2 ||e — e‘sll2 n—I = n: 1E “(Av _ EAv]/XH2 ue — 212. The role of the vectors u and 'v in Corollary 1.2.3 is, for each fixed value of A, played by Av — A(EAv)i which has been shown to be a contrast in the proof Of Lemma 1.3.3. The form Of the denominator in (1.23) can be obtained from the assertion of Lemma 1.3.3, hence (1.23) holds. By minimizing (1.23) we can control how closely the incomplete permutation bootstrap method approximates the conditional distribution (1.21). 21 Corollary 1.4.2 (Part a) is Proposition 1 in [17]) Assume that all requirements of Theorem 1.4.1 are satisfied for A = 2,,. Then the estimatorv-y has no conditional bias and the following hold: a) If 1 is the first column of X then the relative mean square discrepancy becomes E(v-A7re-—v.A(7I'e)i)2 d—1 M D: = . . R S E(v - Arne)2 n —1 (124) b) If 1 _L X then the relative mean square discrepancy becomes E(v - Arre — 110A(7re)i)2 d D = = . RMS E(v - Arne)2 n — 1 (1 25) Proof: Let {20,-};1 be an orthonormal basis Of the space generated by the . columns Of X. Both assertions (1.24) and (1.25) follow directly from (1.23) and Lemma 1.2.1 applied to E(v - Aw,-)2 for each i. To prove (1.25) first observe that if A = 2,, then E (Av) = t7 = 0 since u is a contrast vector. This and Lemma 1.2.1 yield that the numerator on the outmost right hand side in (1.23) becomes 2 d EHOWVXH = 253(02in d n _1 |le|2 1w.- — m1? (126) i=1 d 2 — "_lnvu. The last equality holds because of the fact that {111,-}:1 is an orthonormal basis and thatfi=0for everyi= 1,...,dsince 1 .LX. The proof of ( 1.24) is essentially identical if we assume that ml is equal to a multiple of 1 because then ”to—,- = 0 for every i = 2, . . . ,d, and the first summand of 22 2le E(v - Aim-)2 in (1.26) becomes E('v - Aw1)2 = 0. Note that such a basis exists since in this case 1 is assumed to be the first column of X. We will use the previous result to compare the performance of our incomplete permutation bootstrap to the performance of the full permutation bootstrap. The latter has been shown in [17] to perform in some cases better than the uncon- ditional approach, yielding narrower confidence regions. Therefore it is sensible to determine in which situations our incomplete permutation bootstrap brings an additional performance improvement. Corollary 1.4.3 Letv be a contrast vector and let BS A, ESE", TrueA, and Truegn be as above. Then the ratio of the incomplete bootstrap mean square discrepancy versus the mean square discrepancy under the full permutation bootstrap model can be expressed as E(BS,( — TereA)2 _ n —1 E(Bs,;n — 111mm)2 ‘ d— IEHW EA” W] 2 a (1.27) Proof: According to the proof of Theorem (1.4.1) and formula (1.26) we can write 2 E(BS,1 — True/1)2 _ E [[(Av — EAvl/xH E(BSgn — Truegn)2 f:—[ “v”2 which implies (1.27). 1.5 Independent Block Permutations In the previous sections we obtained results for the unconditional relative mean square bias (RMSB) of the bias adjusted estimator (v - y)“,- and the unconditional 23 relative mean square discrepancy (RMSD) between the true conditional distribu- tion of its errors and the incomplete permutation bootstrap approximation. In this section we will consider the relative Arr—conditional mean square dis- crepancy between the true and incomplete permutation bootstrap distributions in a special case Of independent block permutations of a vector 1) consisting of blocks Of contrasts. One of the advantages of this special case is that the estimator v - y becomes conditionally unbiased which implies that the true conditional distri- bution (1.21) of the estimation errors and the incomplete permutation bootstrap approximation (1.22) turn out to be Av . re Irr, and Av ° (mi Iv respectively. For a partition A = (14),),5;1 Of {1, . . .,n}, a vector u in IR", and every k E {1, . . . , K} we will define a vector [u]’° = (“the/1,. representing the k'th block of components of the vector u. The cardinalities of the sets A), will be denoted by nk=|Ak|,k-——1,...,K. We will say that a random permutation A consists of K independent block permutations if for some fixed partition A as above and any vector u E R" the random permutation A satisfies [Au]k = AkMk, k = 1,. . .,K, (1.28) where A1, A2, . . . , A K are independent random permutations with distributions uni- form over the full permutation groups 2,,,, . . . , 2,,K, respectively. Note that previously we have required that A be distributed uniformly over the 24 permutation subgroup A, therefore the condition (1.28) also enforces the form of A. Such a subgroup A will be called a K —block permutation group. Below we describe the Arr-conditional Mean Square Error Of the estimator v - y and the Arr—conditional Mean Square Discrepancy of the conditional distribution of its estimation errors from the proposed incomplete permutation bootstrap approx— imation under the assumption that EA?) = 0. Note that the requirement EAv = 0 is in the case of independent block permutations equivalent to the condition that EAk[v]" = [v]k = 0, i.e. that [21]" is a contrast vector for every k = 1, . . . , K. It is also easy to see that if E Av = 0 then the estimator v - y is conditionally un- biased as suggested above since according to (1.11) on page 14 the Arr—conditional bias of v - y equals (EAv) - 7re. Lemma 1.5.1 Assume that A consists of K independent block permutations and that v E R". If EAv = 0 then the Arr—conditional mean squared error of the estimator v - y equals K 2 — 2 MSEA" = EA"(v - 7re)2 = k: [[[v]k[[ [[[re]k — [rte]" [[ . (1.29) 1 : nk—I Proof: Using Lemma 1.2.1 with [11]" playing the role of the contrast vectors u and v we can write MSEM = EA"(v - y — v - X,6)2 = E”(Av - 7re)2 = E" (I: Akflv]’° - [re]k) 2 E" (Mk . Ak[r6]k)2 (1.30) ll M3: 1: 1 25 K _, III IIIrIk-I “I“ II In the equality (1.30) we used the independence Of the random permutations {Aklle and the fact that EA’rAkflv]k - [ne]k = [v]k - Emflrrefl" = 0 for every k=1,...,K. Lemma 1.5.2 Under the assumptions of Lemma 1.5.1 the condition EAv = 0 implies that the A7r~conditi0nal mean squared discrepancy between the true condi- tional distribution of the errors of the estimator‘v'y and its incomplete permutation bootstrap approximation is equal to [111"]? E" Inna/XI" - Irma/XII If, (1.31) K E” BS —T 2: ( A TUBA) Erik—1 where TrueA and BS A are as in Theorem 1.4.]. Proof: Similarly as in the proof of Lemma 1.5.1 we will use the independence of A1, . . .,AK, the fact that EA"Ak[v]k - [(7re)/X]k = W - EA"[[(7re)/X]k = 0 for every k E { 1, . . . , K}, and Lemma 1.2.1 with u and 2) replaced by the contrasts [[vfi’c to Obtain EA“(BSA — Tram)2 2 EM (Av - (7re)i — Av - re)2 = EA”(Av-(7re)/X)2 =E’”(2Ak[v1k1/XI’“ II K 1 2 —— 2 = kX-T’Hk—I [[[vlkll En I()‘7I€)/xllk_[(’\7r€)/X]Ik[[ ' Corollary 1.5.3 Assume that l is the first column of the matrix X and that A is as in Lemma 1.5.1. Let {wj}:=1 be an orthonormal basis of the column space of the matrix X such that wl = 1/,,/r—i. IfEij = 0 for allj = 2, . . . ,d then for any 1) E R" with EA!) = O the Arr—conditional mean squared discrepancy between the true conditional distribution of the errors of the estimator v - y and its incomplete permutation bootstrap approximation can be written as EWBSA — True/()2 = (1.32) = ;[(2, 1 IIIkaIIgIIIwI-Ikllz) __ nk—l (‘2 I IIIreIk-Wllzflwz-I’“ijI"))], where TrueA and BSA are as in Theorem 1.4.]. Proof: The assertion follows from Lemma 1.2.1 and 1.5.2. In (1.31) we can write (1 2 2(775 ' willwilk i=2 EA" Iva/xr—Iea/wrlf = E” d = Z [101],“-[w,~]kEA"(7re-w,)(7reowj), i,j=2 27 for every k E {1, . . . , K}, and then use Lemma 1.2.1 to observe that for every i andjin {2...,d} Ewe - wow - w» = E“ (‘2‘ War - iwil') (gum—11'- lel’) = g E" ([wa . may) ([[wjnl . Alflrell’) = 1 lwill‘lell’lllrel-Fre—l’llz- l=1n1—1 In the second equality we use the independence of /\1, . . .,AK and the fact that Enqflunfl’ - [refit = flwJI’ - EMIIrrefl’ = 0 for every l = 1, . . . , K. Theorem 1.5.4 Let /\ be a random permutation consisting of K independent block permutations. If v E R" satisfies EM: 2 0 then the ratio of the Arr—conditional mean squared discrepancy between the true conditional distribution of the errors of the estimator v - y and its incomplete permutation bootstrap approximation versus its Arr—conditional mean squared error, in other words the relative Arr—conditional mean square discrepancy of v - y can be expressed as was, — Mew = 2;; ”WW E“ llama/w — two/xi“ ”2 E""(/\v - rros)2 2:le 1 “[[vllku2 Hflrreflk — W ”2 nk-l , (1.33) where True,‘ and BSA are as in Theorem 1.4.]. Proof: The result is a direct consequence of Lemmas 1.5.1 and 1.5.2. 28 Corollary 1.5.5 Under the assumptions and using the notation of Corollary 1.5.3 we obtain that ifl is the first column ofX then EAW(BSA — True/02 _ (K -1 1‘3A7'(z\v-rns)2 — flmlk ll ) (1'34) 2K2: . -1 HM H 11 mi n) (21 k=l nk —1 llflwer - W H2 (iwr . [2216)] , where TrueA and BSA are as in Theorem 1.4.1. Proof: The assertion follows from Lemma 1.5.1 and Corollary 1.5.3. As a corollary of Theorem 1.5.4 we obtain a formula for the case of full permu- tation bootstrap in which the relative Arr—conditional mean squared discrepancy becomes equal to the unconditional RMSD described in Corollary 1.4.2 above and in Proposition 1 of [17]. Corollary 1.5.6 If /\ is a random permutation with distribution uniform over 23,, and v E R" is a contrast vector then the Arr-conditional mean squared discrepancy becomes EA"(BSA — MCAV _ d — 1 EA"(Av ~ we)? — n — 1 (1.35) where TrueA and BSA are as in Theorem 1.4.1. Proof: In the assertion of Corollary 1.5.5 take K = 1 and write, using the facts that [all = u for any vector u and that ”to—,- = 0 for i Z 2, EM(1w212—W)II2 = lliw212 — W IIZ E222. and K K 2 EA"(7re - 202)2 = Z E7r (2: 15m [Airefll - [10232) m=l (=1 K = 2. E213... (Irelm ~ 3.120212"? K 1 = 25—0 aIl222IleImII 15m where 102 = m/ Hm“. Clearly (1.36) and (1.38) together yield (1.39). Recall that the events E1, . . . , E, are disjoint. Equation (1.37) is a consequence of (1.36) and Lemma 1.3.3, according to which the mean square error of the estimator is equal to MSEadj = E()\v . me)2 = 1 2 _ 2 2 2 ’U €—€ —— ’U a Figure 1.5 shows the conditional distribution of v - we and its incomplete per- mutation bootstrap approximation for the case of four—point regression with single 32 outlier 6 = (1,0,0, . . . , 0)’. We consider a vector v = a: and assume that v consists of two blocks ( -10) ( -3) —10 -3 W = ' and M2 = z 3 10 K10} \3/ with dimensions m1 and mg, respectively. The unconditional distribution of v - 7T6 assigns probabilities $711 to the points —10 and 10 and probability 3% to the points ——3 and 3. The conditional distributions of v - ire given that in ms the error value 1 aligns with [v]1, that is v - 7T6 |7r(1) 6 A1, consists of the points —10 and 10, each with probability 1 / 2. Similarly, v - 7re |7r(1) 6 A2 is uniform over the points -3 and 3. The full permutation bootstrap of the residuals approximates the unconditional distribution of v - 7T6 while the independent block permutation bootstrap approxi- mates the conditional distributions. The approximations shown in Figure 1.5 were obtained via Monte-Carlo using 5000 random permutations. The dimensions of the blocks were chosen m1 = 100 and m2 = 200. Figure 1.6 shows the distributions of v ~ us for simple linear regression y = a + Ba: + ms with pulse error e = (1,0,0, . . . ,0)’ and (~10) —9.8 —9.6 . (1.40) 9.8 I 10} The contrast vector v was chosen equal to a: with the first block [[vlll consisting of 33 the 7 largest and 7 smallest components of (B. For i = 1, 2 the conditional distribution of v - 7T6, given the order statistics of errors in each block, is uniform over the components of [v]‘ and, as Figure 1.5 shows, is well approximated by the independent block permutation of bootstrap of the residuals. The approximations shown were obtained using Monte-Carlo with 5000 replicas of v - A(7re)i|Y. Figure 1.9 illustrate conditional distributions of v - e in the same regression model with errors 6 being i.i.d. Gaussian with one outlier. In Figure 1.10, e,- = 6,-Uf2 where 61, . . . , 6,, are i.i.d. symmetrical random signs and U1, . . . , Un are i.i.d. uniform random variables independent of the random signs. 34 Figure 1.5: Four point linear regression with a single outlier. The conditional and unconditional distributions of the errors v . we and their corresponding indepen- dent block and full permutation bootstrap approximations. The approximations were obtained via Monte—Carlo with 5000 replicas of the corresponding random permutations. 0 0 0 . 3 0 . 1 0 . 2 0 . 05 2 0 . l :10 -3 3 1’0 -10 :3 3 To Unconditional distribution of v - me Full permutation bootstrap approximation of v - 1n: 0 . 5 2 0 0 0 . 4 0 . 1 0 . 3 o . 2 . 0 . OS 2 0 . 1 -1o -‘3 3 10 -10 -‘3 3 1‘0 Distribution of v - we | 7r( 1) 6 A1 Incomplete permutation bootstrap approximation of v - ire | 1r(1) 6 A1 0 . S 2 0 0 o 4 0.15 0 . 3 0 . 1 0 . 2 0 . 05 2 0 . 1 -f0 -3 3 1‘0 -io :3 3 130 Distribution of v ' ire | 7r( 1) 6 A2 Incomplete permutation bootstrap approximation of v - 11'6 | 1r(1) 6 A2 35 Figure 1.6: The conditional and unconditional distribution of the errors v - NC in a simple linear regression with a single outlier. The corresponding independent block and full permutation bootstrap approximations were obtained using Monte— Carlo with 5000 replicas of the corresponding random permutations. (Same as Figure 1.3) 0.052 o.osI 0.04 0 . 03 2 0 03 _ 0.02 0 01) 0 01 -10 -7 o 7 10 3.0 -A7 0 '; 1A0 Unconditional distribution of v - 71’6 Full permutation bootstrap approximation of v - we 0 . 14* 0 II 0 . 122 0.1 0. 082 0.05 0. 06 2 0.04 0.02 -10 -7 6 7 10 -i0 3 6 '7 1.0 Distribution of v - 7T6 | 7r(1) 6 A1 Incomplete permutation bootstrap approximation of v - me I 7r(l) 6 A1 0. 072 0 . 072 0. 06 2 0.052 0 . 05 2 0 . 04 2 0 . 03 2 0 . 03 2 0 .02 2 0.01 2 0 . 01 2 30 -7 o 7 1‘0 -10 57 6 i 10 Distribution of v - ire I 7r(1) 6 A2 Incomplete permutation bootstrap approximation of v 2 ms |7r(l) 6 A2 36 Figure 1.7: Comparison of the exact and approximated conditional distributions of the errors v ~ ms in a four point linear regression with a single outlier. (Data from Figure 1.5.) 0.5 0.5 0.4 0 4 0.3 0 3 0.2 0 2 0.1 0 l T -10 -3 3 10 ' ~10 —3 3 10 Comparison ofv - 7T6 |7r(1) 6 A1 Comparison of v -7re |7r(1) 6 A2 and its approximation and its approximation Figure 1.8: Comparison of the exact and approximated conditional distributions of the errors v - we in a simple linear regression with a single outlier. (Same as Figure 1.4, see data in Figure 1.6.) ‘31 - l 1. ~10 -7 0 7 10 Comparison of v - 7r6 |7r(1) 6 A1 Comparison of v - 7T6 |1r(1) 6 A2 and its approximation and its approximation 37 Figure 1.9: The conditional and unconditional distribution of the errors v - we in a simple linear regression with i.i.d. Gaussian errors. The corresponding independent block and full permutation bootstrap approximations were obtained using Monte- Carlo with 2000 replicas of the corresponding random permutations. 0.0052 0.0052 -260 6 zoo -200 6 200 Unconditional distribution of v - are Full permutation bootstrap approximation of the distribution of v - 7T6 0.0042 0.0042 -200 6 200 -2‘oo 6 200 Conditional distribution [3; of v - 1n: given Incomplete permutation bootstrap that extreme values of 7T6 align with v1 approximation of the distribution £1 0.0052 0.0052 -2600 6 260 -2‘oo 6 260 Conditional distribution [.2 of v - in: given Incomplete permutation bootstrap that extreme values of ma align with v2 approximation of the distribution £2 38 Figure 1.10: The conditional and unconditional distribution of the errors v - 7T6 in a simple linear regression with i.i.d. errors from the domain of attraction of a symmetric stable law with a = 0.5 (in particular, 6, = 5,-U,‘2 are reciprocals of squared i.i.d. uniform variables with independent symmetrical random signs). The estimates of the distributions were obtained via Monte—Carlo using 2000 replicas of the corresponding random permutations. (Same as Figure 1.1) 1.x10‘6r -400000 400000 -400000 400000 Unconditional distribution of v - 71'6 Fill] permutation bootstrap approximation of the distribution of v - ire 2.x10"2 ~400000 400000 -4o6ooo 400000 Conditional distribution £1 of v - 1re given Incomplete permutation bootstrap that extreme values of ms align with v1 approximation of the distribution £1 .x10_6’ -4o6ooo 400000 -400000 400000 Conditional distribution £2 of v - ire given Incomplete permutation bootstrap that extreme values of 7T6 align with v2 approximation of the distribution £2 39 1.7 Asymptotics for Independent Block Permu- tations Our results in the previous sections Show that the conditional distribution v - 71’6 |A7r, i.e. the distribution Av - 71'6 |1r or v - Ac I6, can under some conditions on the design be well approximated by incomplete permutation bootstrap of the observed residuals (ms)i with proper compensation for the conditional bias of the estimator v 2 y. To support the use of conditional confidence intervals based on incomplete per- mutations bootstrap and to clarify the relationship of the conditional distribution to the unconditional one we establish a strong invariance principle (as n increases to infinity) for the conditional distributions vn - Anen len (1.41) for a sequence of particularly chosen block-wise contrast vectors vn E R" in the special case when for each n the errors 6,, = (em)?=1 are i.i.d. random variables with distribution attracted to a fixed a—stable law of index 0 < a < 2, and the random permutation An consists of K independent uniform block permutations. Theorem 1.7.1 (Limit Theorem) Let A = {Ak};1 be a partition of the in- terval [0,1] into K finite unions of intervals, and for every n let An be a random permutation consisting of K independent block permutations induced by A as de- scribed in (1.43}. For every distribution F in the domain of attraction of a—stable law having index a < 2 there exists a triangular array such that for every n the errors in the n‘th row of the array 6,, = (em-)2;1 are i.i.d. F and ifv is a continuous function 40 on [0,1] with finite variation, satisfying f A1. v = 0 for all k = 1,. . . ,K then there exist cddla’g versions L1, . . . , LK of K independent a—stable Le’vy processes on [0, 1] such that 1 K 1 E; vn-Anen la. a kzumira / v 00, where u is the Lebesgue measure, the vectors vn are defined using v as specified in (1.44), the scaling constants an are defined in (1.45), and for every k = 1, . . . , K the o—algebra J?" is generated by the jumps of the process L" while f), is an appropriate monotone, piece-wise linear scaling of [0,1] onto Ak. Those familiar with convergence to stable laws may note the absence of center- ing in (1.42), but this is taken care of by centering vn (see (1.46)). Proof: We will adopt a modification of the approach used by authors in [17] who obtained similar results for the complete permutation bootstrap case. This method is based on a series representation of a cadlag version of a Lévy process which is described in great detail in [18]. Assume that for every n the permutation An is generated by K independent block permutations, i.e. that for a fixed integer K there is a partition A = {Ak}f=l of the interval [0, 1] with A), being a finite union of intervals for every k = 1, . . . , K, and that for every n the random permutation An applied to a vector u 6 R" satisfies [[Anu]k = A:|]u]]k, k = 1, . . . , K, (1.43) where {A2}; are independent random permutations, each distributed uniformly over 2mg; . Here m: denotes the dimension of [14" = (u,),-e,,Ak which, similarly as before, represents a vector consisting of the 10““ block of components of the vector 41 u. Notice that, in contrast to our previous definition, the set of indices {1, . . . , n} is partitioned into K blocks by the sets in nA. We will generalize the notation of block vectors to functions on [0,1] as fol- lows. For a positive integer n and a function v : [0,1] —+ R define [vfin as the n—dimensional vector with components equal to values of v sampled at the succes- sive points i/ n, i = 1, . . . , n. For every 10 = 1, . . . , K let [[v]f, denote the k'th—block of [v]n. This notation will be used to specify the special form of the sequence of contrast vectors {vn}n21. For what follows we will assume that v: [0, 1] —> [R is a continuous function with finite variation satisfying /v(x)dx=0, k=1,...,K, AI: and for every n we will define a vector vn E R" by requiring [[vn]k = [v]: — [[vflfi, k = 1, . . .,K. (1.44) The specific choice of the vectors vn guarantees that all the vectors vn are block- wise contrasts, in other words,that all the corresponding blocks [vn]’° are con- trast vectors. As noted earlier, this condition is equivalent to the condition that EAnvn = 0 for all n 2 1. Let us now state the results for the full permutation case as developed in [18]. For a random variable X with distribution F define G+(x) = P(X 2 x IX 2 0), G_(x) = P(—X > x| — X > 0), and C(35) = P(|X| 2 0). The inverse of a function F will be defined as F‘1(x) = inf {y 2 0: F(y) S x}. For every 72. let an = G—1(1/n) (1.45) If F belongs to the domain of attraction of an a~stable law with a < 2 then 42 the following limits exist (see [18]) __ lim P(X 2 x) _ lim P(—X > x) p " x—voo C(22) 2 q _ x—m C(15) For a sequence u = (u,)f_:1 of numbers from [0,1] and 0 g t g 1 define (as in [15], reprinted in [16]) I?(u,t) = 1{s: U171 S [sn]}(t) and recursively I,"(u, t) 1{s: u,-(n + 1 — i) 3 [sn] — 23;] Iflu, 3)}(t)’ wherei=1,...,n. If 1" = (13);): 1 is a sequence of the arrival times of a standard Poisson process, U = (U03; is a sequence of i.i.d. random variables with uniform distribution over the interval [0,1], and if 6 = ((2)321 is a sequence of random, possibly non- symmetric signs such that l", U, and 6 are mutually independent then according to [18, Lemma 3 and (11)] the random vector en = (5n,i)?=1 = "it! (620321 (FP:1)) n i=1 consists of i.i.d. F random variables. Here 7g? represents a uniform random permu- tation of {1, . . . ,n} chosen by U via the positions (1,-'2)?=1 of jumps of the functions (I?(U, t))?=l' According to Theorem 2 in [18]: Let a 6 (0,2) and let v be a continuous real function with bounded variation with f0l v = 0. Then 41.12052“ z)6,- 0.5—,1 ( Pi ) = Z Ziggy"), (as), (1.46) n 1': i=1 i 1 Ffl+1 where Z-— I? 1/06i+1+ q ”061—1 2‘ P(XZO) 2 P(—X>0) 2 ' 43 Note that for every n the sum under the limit in (1.46) can be written as i [v]n ~ 112,? e. The right side of (1.46) can be according to [18] viewed as f vdL where L is a cadlag version of an a-stable Lévy motion on [0, 1]. Theorem 2 in [18] also states that if bn = E(z,-f1"/nG5,(x)dx)n/an then the scaled and centered sums 1/an 22;, em- — bn converge (a.s.) to a finite random variable as in (1.48) below with A = [0,1]. Note that [an S B n/an for some B > 0, therefore if {an} is a sequence of real numbers with ncn/an —2 0 then lim 2"— : 6",,- = 0 (a.s.). (1.47) 71—2000,", 1 1: If A is a finite union of intervals in [0,1] we can use the same construction to obtain an (a.s.) limit for the sums of [np(A)] random variables lim 1 Z v(f(J,-"))6,- 0&1 (FL) = 00 #2, (a.s.), n—oOO anp(A) iEnA .=1 1 (1.43) n+1 where f is a scaling of [0, 1] onto A. We can therefore construct K independent (a.s.) limits for 31,: [UK - Axienk as u(Ak)° f v(fk)dL". Here we use that if F is a—stable law then for every x > 0 it holds that limn_,oo aux/an = x“. The facts that v is continuous and f A1. vdu = 0 guarantee that [v]: —-> 0. Replacing cn in ( 1.47) by [v]: we obtain that a—lgflv]: - en), converges to 0 (a.s.), for every k as an —-> oo, hence we obtain that for every k = 1, . . . , K 1 lim —— (0,112 - A216,), = u(Ak)°' / v(f,,)dL’°, (a.s.). Similarly as in Proposition 1 of [18], the conditional convergence result is ob- tained via conditioning on (1226,") which is equivalent to conditioning on the jumps of the corresponding processes. 44 1.8 Exact Confidence Intervals Based on Incom- plete Permutation Tests Quade (1973) ([20]; see also [19] and [9]) has found a pivot which can be used to develop an exact confidence interval for B in the simple regression model y=a1+xfi+7re (1.49) where a: is a known vector in R" and a and S are two unknown parameters. For simplicity we will assume that a: is a contrast, i.e. :1: - 1 = 0. The model (1.49) is a special case of the model (1.1) y = X - ,6 + 7T6 with the matrix X having two columns 1 and a: and fl = (a,fi)’. The vector 4: 6 R" is, similarly as in (1.1), considered to be non-random but unknown, and the random permutation 7r is assumed to be distributed uniformly over the group 2,, of all permutations of components in B". Lastly, we assume that the random permutation A is distributed uniformly over a subgroup A of 2,,. Let v denote the second row of the matrix V = (X ’ X )“lX ’ , in other words let v = ”—xleg, and recall that v is a contrast. Denote by B = v - y the least squares estimator of B. Under the hypothesis H0: 5 = 50 we can observe that the vector of marginal errors 60 = y — g — $30 is equal to re — E. Therefore under the hypothesis Ho we are able to recover the centered errors without postulating the value of a. A permutation test for testing Ho can be based on the test statistic Trr(160) : ’0 ° 60 which under Ho has Arr-conditional distribution that is uniformly distributed over the indices of the values of v - A60. Note that if H0 holds then T,,(60) is equal to 45 v - rs = B — BO. Denote by T" the random variable v - A60 which is Arr-conditionally independent of T 7r(Bo), but under Ho: 5 = So has the distribution T,,(B0) IAN. Then the p- value for the observed value to of the test statistic T1450) can be expressed as 2min(P+, P‘, %) where P+ = P(T" > to) and P“ = P(T" < to). Lemma 1.8.1 (Extension of Quade’s result in [20]) Let T,(B0), T", and A be as before. Assume in addition that there are no ties among the values of T". Then an a-level confidence region for B can be formed as B = {fies % < P"(T* > Time» 31-2}, (150) where a is of the form 1% for some integer 0 < j < 2|Al. Proof: Denote by PBAO" the probability measure under the assumption that So is the true value of the parameter ,8, conditional on Arr. Let the u-conditional distribution function of T" be denoted by F 50. To examine the probability that So is not covered by the confidence region B we will write Pause r B) = Pit,"(P2(T* 3 ma.» 3 25-) + P3,"(P2(T* 3 72.03..» > 1- s) (1.51) and observe that P&”(P"(T* S T1,(Bo)) S 92!) = P&”(F00(Tn(fio)) S 523) = g since Fgo(T,,(fio)) is distributed uniformly over {fi: 10 = 0, . . . , IAI} and a is of the form 1%, for some j. Similarly, the second probability on the right hand side in (1.51) is equal to %. Hence Pgflflo g? B) = a. The confidence region in (1.50) has the desired confidence level, but is hard to construct since for each value of B0 the probability P"(T"‘ > T,(Bo)) has to be 46 computed individually. To simplify the construction of the confidence region B we will rewrite its definition in a more useful form using a pivotal random variable which does not depend on the value of ,80. A special case of this method, for the permutation group 2,,, was first introduced by Quade in 1973. Proposition 1.8.1 (Extends Quade [20]) Let a = (21%| for some integer 0 < j < 2|Al. Then the a-level confidence region (1.50) can be obtained as B={eo.-gfi—fio) B — 30) becomes P"(v 2 /\(7T€)i > (B — fio)(1-(v-Am))) and the a-level confidence region (1.50) can be obtained as B={fio:%