£33 1‘ . .4.)qu a.“

...«(Wuﬁﬁmtmﬂﬂ WP.
n5» 4......

. .udﬁkr. 9‘» n)! .

an. 1.» bruins...

n.8mmvvmﬁ ,, u.
.9. 54.....eywuwlmauuw... .
.wm.m4hm mgunxwm. . M K.
.. «fawn. «.... ab: 4.33..

:6]

 

. ... .. I...
“ﬂag?“
.

01:17.! .
U ‘1" .-

I:. A... A! to.
. ...-£19 I .v \n?
.5. ; . x: c‘ . ..3
x... w .p‘t: ..
‘5an “f."
26?qu;

. 1! §.'xv; .....IGIOD I II V. .‘levllls‘nv‘

.llgfl ... 1.3.3.1...1 . a 3
. . 21.11. .I' ...| I. IV
liﬁl‘ slur... ..Hi tr”...

.. Liiﬂtsirta‘: 2'05!

..l ... . , . ‘ . H . .. . . . . . .. . . u . . ‘ .n ‘ . . . y . . . . . [Emitttstt‘alQ I .|
.....Hl. . . .. ‘ . . . . . ‘ \ . , . . . . ‘ ..A.“lv.t:.l.vﬁut ~ ...
. . ‘ . . n . ‘ , f .. .. » . . ‘ ‘ ,. . . . ‘ : ...»! . 1.2....
. .. I . H .. ‘ . . . y . v . . . ‘ . . n.5 .l 'o...‘l...\.-v|i\:mw¥l.ylhv

H'
l

1
¢

. \.J‘7.iu~
131‘.

.'n
"-1
..

m

‘ lil|tt1||h ..§'l)\ ,A .v a
I'l .1-II {tilintil'obl 1:
Hal! bya‘Yuuuls.3‘.tPI-v5. ...-ti..-

 

 

 

 

‘ ' V ‘ u- U. ‘19!
Illlulna: cl .1 .v‘ 7110 lv 0:“:
.. . I. . . .V x .u ‘ .
. . .vl) .. I‘ll JC,I¢IF..» .05.;vlul'L
int: s.-..1ix..'l.l‘ .0 ill: 1“!!! I!
‘ I9, 5:315?! flaw

              

- I nunll. ‘n :2.I.I'lll.| av . .

‘ A: "- . I|h¢l ‘0 ‘v u v air-v1»..f’lnv9“ ‘

. ’i'l .01.. ,b A 31.1). 5!! “Enlrf‘hﬂuﬂtl‘s'. ‘ .13t' III-n. : t . l. n . A
o. . I... . . ‘n. . ...-o - . -. o v... 0. I-.. ~1'tv vl

 

1.! ,btlu I . -....nnn I..V- J ‘ V K , . t 7.1. <
I...1hh|0l.~.|cl_.ﬂlll..|lt‘ . Solo. ... ‘ ﬁ~v.hh>.n.qo.lt.l‘f ..I‘IIJ.‘ . ‘ .‘ .6». . v. 6.- n v |

 

 

VHFSi?

IlllllllllllllllIlil'llllllllllllllllllillll

3 1293 00885 2224

This is to certify that the

dissertation entitled
Resampling Methods For Linear Models
presented by
Krzysztof Podgorski

has been accepted towards fulﬁllment
of the requirements for

 

 

Ph.D. degree in StatiStiCS

  

ajor profey/ ‘

Date July 23, 1993

 

MSU is an Afﬁrmative Action/Equal Opportunity Institution 0—12771

 

LIBRARY _
M'Chlgan State
University

 

 

___

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.

DATE DUE DATE DUE DATE DUE

__ ll
___Jl =4
“—l _—l

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

__7 . .

MSU Is An Affirmative Action/Equal Opportunity Institution
czwmwnpma-pd

 

 

 

Resampling methods for linear
models

By
Krzysztof Podgorski

A DISSERTATION
Submitted to
Michigan State University

in partial fulﬁlment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

1993

Abstract
Resampling methods for linear models
By
Krzysztof Podgorski
In this dissertation we study resampling residuals by random permutation in
linear regression with exchangeable errors. For given observations we con-
sider the resampling distribution obtained by this method versus the distri-
bution of errors conditional on its order statistics. We observe that with high
probability both are approximately equal on contrast vectors after suitable
normalization. The results remain valid if we consider the reduced regres—
sion model. This model comes from the original one by removing those data
for which the corresponding errors take extreme values. We obtain general
conditions under which conﬁdence regions produced by such methods are ac-
curate. No moment / distribution assumptions on the underlying errors other
than exchangeability are required. We adopt the term seamless resampling
to indicate this robustness. In the absence of ﬁnite second moments random-
ness of the limiting bootstrap distributions has been considered as a failure
of the traditional bootstrap and no practical meaning has been given to the
phenomena. For our method this type of randomness is still present in the
limit but it has clear probabilistic interpretation as a conditional distribution
which can be used to e.g. obtain conﬁdence sets. We study this phenomena
in detail for the case of independent errors with distribution from the domain
of attraction of stable laws. We have developed computer software in which
we implement these methods and provide convenient tools for the analysis of

resampled data in general.

Acknowledgment

I wish to express my gratitude to my advisor Professor Raoul LePage for
the introduction to the problem, fruitful discussions and constant encourage-
ment. His exceptional support and excellent guidance during my doctoral
studies at Michigan State University went far beyond what I could ever ex-

pect.

iii

Contents

1 Theoretical Results 1
1.1 Introduction ............................ 2
1.2 Main Results ........................... 6
1.3 Examples ............................. 16
1.4 Proofs ............................... 27
2 Numerical Analysis 39
2.1 General Description of Software ................. 40
2.2 The Numerical Procedures .................... 41
2.3 The Graphical Interface ..................... 47
2.4 Numerical Experiments ...................... 50
2.5 Conclusion ............................. 54
Appendices 55
Appendix A ............................... 55
Appendix B ............................... 58

iv

List of Figures

Genome-com

10

11

12

The conditional distribution of errors and its recovery by permut-

ing residuals ............................ 19
The distributions obtained by other methods ........... 19
Conditional distributions in the unreduced model ......... 21
Conditional distributions in the reduced model .......... 21
Results of resampling, Gaussian case ................ 24
Results of resampling, Cauchy case ................. 25
Results of resampling, stable case with a = 0.8 .......... 28

The sample of 1000 elements obtained by permuting residuals
in a linear regression model versus an autoregression model . . 42
Resampling residuals in an autoregression model with Gaus-
sian errors versus Cauchy errors .................. 44
The main window of the LaPlot application for analysis of
multivariate data .......................... 48

The distribution of resampled errors and residuals, Gaussian

The distribution of resampled errors and residuals, Cauchy case. 53

Part 1

Theoretical Results

1.1 Introduction
This paper considers a multiple regression model
(1) Y = Xﬁ + e,

where X is an n x d ﬁnite matrix of real numbers, ﬂ is a d x 1 vector of real
numbers, and 6 is an n x 1 vector of exchangeable random variables.

For reasons that will be discussed later we will also consider reduced
regression models which come from (1) by deleting some data and the corre-
sponding rows of the design matrix. To describe this submodel, let R; be the
rank of 6.- among (e.-)?___l and M be a subset of m integers from {1, . . . , n} with
cardinality m. For any n x 1: matrix A by A M we will denote an (n — m) x k
matrix obtained from A after deleting those rows indexed by i whose R,- is
in M. With this notation for any M we deﬁne a reduced regression model

by the equality

(2) YM = XIII/6 + 6M-

Notice that for different vectors of errors 6 with a ﬁxed set 11/! we can have
different reduced models (2) and thus the design matrix XM becomes ran-
dom.

A k x 1 vector v of real numbers is called a contrast if the entries of 2) sum
to zero, equivalently if the Euclidean scalar product 12- 1 = 0 where 1 denotes
the vector of 1’s in IR" (we will not indicate in notation dependence of l on
a dimension 1:). Throughout the paper we will assume that X M has rank d

and also that the vector 1 is in the column space of X and thus of any X M

for a subset M of {1, . . . , n}. Without losing generality we can assume that
1 is the ﬁrst column of X. Then contrast vectors come to (2) in a natural
way. Namely, for i > 1 the least squares estimators d,- of ﬂ,- are of the form
5,- = V,- - YM for a contrast V.- depending also on M. Indeed, V,- is the i-th
row of the matrix M = (XEXMYIXL and MXM = I. This implies that
any row V,- for i > 1 is orthogonal to the ﬁrst column which is assumed to be
equal to 1.

Ordinarily one is interested in approximating the joint distribution of v - 6
for several 2) since it is equal to the estimation error, i.e. v-e = (v-Y)—(v-Xﬁ).
For i.i.d. errors with ﬁnite variance, the central limit theorem is popularly
used for this purpose. Bootstrap methods apply to this case as well (see:
Efron (1979), Freedman (1981), Wu (1986)) but how are we to know if the
underlying assumptions have practical application to the data at hand? See
LePage (1990).

Instead approximating or estimating the unconditional distribution of as,
we propose, without moment assumptions, to estimate the sampling distribu-
tion of v - 6 conditional on the sigma ﬁeld .7 generated by the order statistics
of the coordinates of 6. For this purpose we will use randomly permuted
residuals. To be more precise let us introduce the following notation. We
will use 2) - 6']: to denote the above conditional distribution. Denote by csi
the vector of residuals Y - I”, where I” = Y/ X is the projection of Y to the
column space of X. Let 7r denote a uniformly distributed random permu-
tation applied to the coordinates of n-space. Suppose that the distribution
of 1r, conditional on 6, is also uniform over all n! permutations, so that 1r

is independent of 6. We will observe that, provided d/ n is small, 1) - r6i|Y

3

provides with high probability a close approximation of v - MIT. Notice that

the latter distribution is equal to v-e|.7-' since 6 has exchangeable coordinates.

Proposal 1 For any ﬁnite set of contrast vectors {vb k S I}, estimate the
joint f—conditional sampling distributions of contrasts vk - e by the distribu-

i

tions of v), - 7T6 conditional on Y.

This proposal provides strong support for the approach taken by Freed-
man and Lane (1983) who develop descriptive tests of linear hypotheses. To
quote from them, “our reference sets are derived by permuting residuals, and
our signiﬁcance level is a descriptive statistic rather than a probability.” It
will be seen below that v - 7T6'LIY is an estimate of a conditional distribu-
tion insensitive to the moment assumptions. Thus it is possible to achieve
robust estimation of the conditional sampling distributions of least squares
estimators. Previously only M-estimators with bounded score functions were
known to have this property, see Lahiri (1992).

As we will see Proposal 1 can be applied to a quite general class of
exchangeable distributions. However sometimes the recovered distribution
v - 6|]: itself will posses certain unpleasant properties when some coordinates
of 6 will take relatively very large values. This may happen for example for
long tailed distributions of 6. By our result also the distribution v-7rosi IY will
inherit this. It can happen, e.g. the example in Section 1.3, that conﬁdence
regions will not be in the form of single intervals but sums of widely separated
intervals. In such a case to “improve” the distribution of interest it would be
reasonable to exclude observations with outsized disturbance by errors and

then apply our method of resampling by permutation to this reduced model.

Our results extends also to this situation.

Let us describe this in a more rigorous way. We shall ﬁx a set It! of those
values of ranks (R,-)?=1 for which we have decided to exclude corresponding
observations. For example by taking M = {1,n} we will exclude two ob-
servations: with the smallest and the largest error. Thus we will consider a
reduced model as deﬁned by (2) and by «5%,, we will denote its residuals i.e.
ct = YM — YM / X M. We will use the same letter 7r for a random permutation
in R""" (again disregarding dependence on a dimension). We will denote
RM = {R,- : i E M } In the problem of estimation of coefﬁcients ﬂ we will
use contrast vectors being rows of (XLXMY’XL and the matrix X M is now
dependent on values of R M- For that reason we will allow a contrast vector
V(RM) E R"‘"‘ to be dependent on 6 through its order statistics R 16(- Fur-
ther let fM denote the sigma ﬁeld generated by the order statistics of 6M and
by RM while GM is the sigma ﬁeld generated by YM and R [4- With this no-
tation and conventions we will ﬁnd that if the ratio d/ (n — m) is small then,
as before, the distribution V(RM) - “MGM approximates the distribution

V(RM) - ele'M. This can be stated as the generalization of our proposal.

Proposal 2 For any ﬁnite set of contrast vectors {V ;k _<_ l}, which may
dependent upon RM, estimate the joint fM —conditional sampling distribution

of contrasts Vk - 6M by the joint QM -conditional distribution of Vk - «61%,.

Notice that RM provides information concerning which data might be
excluded from the model and YM are observations remaining after this ex-
clusion. Thus QM represents information which should be given to us to

consider a reduced model based on extreme values of errors.

1 .2 Main Results

We assume the notation of the introduction. As before, 6 is assumed to has
exchangeable coordinates and to be independent of a random uniformly dis-

2 we will denote

tributed permutation 7r. If u is an k x 1 vector then by a, u
the arithmetic average and the vector of squares of coordinates, respectively.
By 3?, we will denote the (k — 1)—divisor sample variance of u i.e. 33 =
km/(k — 1). Our goal is to show that for any contrast vector V(RM)
dependent on ranks RM the distribution V(RM) - “MGM approximates the
distribution V(RM) - eleM. Here as before QM = 0(YM, RM) = 0(6M, RM).
In all that follows for a random variable Z or a sigma ﬁeld H by E Z , B” we
will denote conditional expectation with respect to Z and H, respectively.
Also using Z [71 we will mean both conditional distribution of a random vari-
able with respect to a o-ﬁeld 'H and an example of a random variable with
this distribution allowing context to distinguish between these two concepts.
This convention will simplify notation later when we will consider weak con-
vergence of conditional distributions.

First notice that for o E En_m we have (6M,RM) rd— (06M,RM). Here
2,, denotes the set of all permutations of coordinates of vectors in IR". To
see that let consider an event {6M 6 A, RM = RM}. Deﬁne a permutation

6 E 2,, such that it permutes e,- for 7',- ¢ M the same way as o and leaves 6,

unchanged for r,- E M. Then RM(5'€) 2 RM“) on a set {R,,‘,(e) = RM} thus

by exchangeability of e we have

P(€M(€) E Ali/(4(6) = RM) = P(€M(56) E A, 3112036) = RM)
= P(06M(€) 6 «4.3151(6) = RM)-

This implies that V(RM) . emf," é V(R,,—,) - ireMIQM. Indeed, for each
bounded and measurable function h and a random permutation of coordi-

nates in R””" independent of e we have

E’thmm-et) = {1,—5,55“ Z (1(V(RM)'0‘€M)
' UEEn—m

= Emmi/(Rn) - ml)

= EGMMl/(RM) ‘ 7r6M),
where in the last equality we use that fM is a sub-o—field of QM and the
fact that the last conditional expectation is fM-measurable. Moreover both

V(RA-,) - nehlgM and V(R,;,) ' (MIfM are centered at zero since, by indepen-

dence of 1r and e we have

vamm - mat) = EQWVW) ' 6i!)

— 1 g [r _ .1.
— m E M deg-"fax (RM) . 6M)
: EQM (E(7IV(RA_I))|RM=RM . 6i!)

:0,

where in the last equality we use the fact that E(7ru) = 171 = k‘1(u - 1)1 for

any k x 1 vector u and thus E(7ru) = 0 if u is a contrast vector. Similarly

we have

E’M(V(RM).7reM) = EfM(7rV(R,—,).eM)
‘—“ Eif-M (E(WV(RM))IRM=RM ° 6,”)
= 0.

The main result can be stated as follows.

Proposition 1 Let IV _C_ {1, . . . , n} have cardinality m. If 1 is in the column
space of XM which has rank (1 and 7r is a random permutation of coordinates
in Rn‘m then for each contrast vector V(RM) we have

E9M(V(RA-,) - reM - V(R,,-4) - inst)2 _ d—l
ECHO/(RM) - 71'6M)2 _ n — m -1 '

 

Ef

For the proofs of this and other results in this section see Section 1.4.

As a consequence of the above result we have that the distributions
V(R,g) - eMIfM and V(RM) - rehlgM are approximately the same when
scaled by EfM(V(RM) - 6M)2 provided that d/(n — m) is small. This follows
directly by Markov’s inequality applied to the second moment which stands
in the denominator of the left side in Proposition 1 (see also the proof of
Proposition 2; Section 1.4). One can use as well the Mallows metric d2 as
a measure of this closeness (see Bickel and Freedman (1981)). The result is
interesting by itself. For example any picture illustration of a distribution
is in general invariant on rescaling. But when no moment assumptions are
involved the scaling EfM(V(RM) - 6M)2 may diverge to infinity with n. In
such a case we need more arguments to apply our methods of resampling to

conﬁdence regions.

The next result explains the practical meaning of our result when we do
not use the scaling. For the sake of simplicity of notation we will state it for
the case M = 9 but it can be restated in the obvious way also in the general

case. For a given contrast vector v, let

7(6) = E‘(v . 7re — v - nei)2/E‘(v - 7T6)2,

C1=‘/Ef(t’° €)2,

where in the latter we do not show explicitly dependence on the order statis-

tics of e and let

 

C2(Y) = JE‘(v - rel)?

Denoting by F1, F2 the distribution functions of v'€|f, v- «6* |Y, respectively,

we then have the following relations.

Proposition 2 For any positive 6 we have

 

 

 

d -—1 d —1
< -l
(3) P(7(€)> "_16)_ n_ 6 .
(4) am 6 {Few — Clo — «06-2. F2(417 + 01m woo-21
for all a: 6 IR and
f 2 / _ ”_d 2
E 0,0 )_ n—1C"

We can use Proposition 2 to examine accuracy of using the distribution given
by F2 obtained by resampling residuals instead of the unknown conditional

distribution given by F]. For example (4) allows us to assess a value of the

conﬁdence regions based on F2 as approximations for corresponding conﬁ-
dence regions based on F1. More precisely, we should adjust the ﬁrst ones
by taking their halo of thickness C16 and then their signiﬁcance will differ at
most by 7(e)/62. By (3) the random variable 7(6) is small with high proba-
bility. Thus the accuracy of proposed approximation of F1 by F2 depends on
how large C16 is. Suppose for example that we would like to use quantiles of
F2 for obtaining conﬁdence intervals. Then we want C16 to be small relative
to these quantiles and thus to lengths of conﬁdence intervals. In the next
paragraph we will explore this problem in more detail.

For any p 6 (0,1) let KILKE denote p-quantiles of F1,F2, respectively.
By (3) for any 60 there exist (20, a subset of underlying probability space,
with P(Qo) > 1 — 60 and a constant M > 0 dependent neither on n nor on

d such that on {20 we have
7(6) 3 Add/n.
By (4) on (20 for each 6 > O we have

Md C1 Md Cl
F-z(.‘l‘- ——\/—n:) - 6,F2 (1' + —T)+ 6]

for all .1: E R. This implies that for any p,6 > 0 such that p - 6,12 + 6 still

F1(.T) E

 

are in (O, l) we have the following relations between corresponding quantiles

. M_d C1_ M__c_l C__1

Thus it would be desirable to have CIA/n, at least asymptotically, small

relatively to K;. We can say equivalently that we are interested in the cases

10

when quantiles of (ti W,, are convergent to inﬁnity, where

= (v-1r6)|e .
Jaw-m)?

We will examine this question for errors sampled independently from a dis-

(5) We

tribution belonging to the domain of attraction of a symmetric stable law.
Let G(a:) = 17‘°1[1,°°)(:r) and for any n E N let (UM)?=1 and (6i)?=1

be independent sequences of i.i.d. random variables such that Um.- has the

uniform distribution on (0, 1) and P(6,- = :lzl) = l / 2. Let us deﬁne a sequence

6,, = (6,”)?=1 of errors by the equality
(6) 6n,i = 6iG—liUnJ) = (Stud/O-

It is well known that the distribution of 6",,- belongs to the domain of attrac-
tion of a symmetric stable law with index of stability 0. In fact for 1' > 1
we have P(|6,,,,-| > 1') = 1"“ (see also LePage (1980)). Although we will
assume here this special form of errors by application of invariance principle
of the type obtained by Kinateder (1990) it is possible to extend our results
on an arbitrary law from the domain of attraction of a stable distribution.
We will also need a special form for contrast vectors. Let U : [0, 1] —* IR be a
continuous function such that [01 v(:r)d.r = 0. Define a sequence v,, = (my, )?:1

of contrast vectors by

i 1 " k
(7’ - " (a) ' a 2; v (5)-
To express the limit distribution of W,, = (1),, - ren)|e,,/(E‘(v,, - 7re)2)1/2

we will need a notion and some basic properties of the stochastic integral

11

of a function v with respect to a Levy motion. For details see for example
Schilder (1970) or Kuelbs (1973).

By a Levy motion we mean here a stochastic process (A,),E[0,1] continuous
in probability with independent and homogenous increments such that for
any t 6 [0,1] a distribution of A, is symmetric stable with characteristic
function ¢,(u) = exp(—t|u|°‘). Then the integral fol vdA is deﬁned in a usual
way by as a limit in topology of convergence in probability of simple functions.
A random variable fol vdA has a stable distribution with the characteristic
function ¢v(u) = exp(-— fol |v(x)|°'d;z: |u|°). There exists a version of Levy
motion such that its trajectories with probability one are right continuous
and left limits exist at any point t 6 [0,1] (cadlag) (cf. Doob (1953) p.422).
Thus we can deﬁne a o-ﬁeld ,7 generated by values of jumps of trajectories
of this process. We can do it, for example, by ordering values of jumps of a
given cadlag trajectory in a sequence since the number of jumps exceeding
given positive value is ﬁnite. Then ,7 is just a o—ﬁeld generated by such a
sequence. By I‘ = (I‘,),-€N we will denote arrival times of a Poisson process

with intensity one. In this situation we have the following limit behavior of

W...

Theorem 1 For each continuous function v : [0, 1] —-> IR with probability one
the conditional distributions
_ v,, - «6,,
- \/E‘~(v,, ~1re,,)2
converge weakly to the distribution
[,1 vdA
\/ Z [01 v2(:1:)d:r

We 6..

 

 

 

12

where Z is J measurable and Z 4 23:1 I‘TQ/a.

Remark Proposition 1 implies that the joint distribution (v,, - net,” -
1r6,,)|6,, scaled by ‘/E‘"(v,, - 7r6,,)2 converge to (f0l vdA, f0l vdA)|j scaled by
\/Z fol v2(a:)d:r.

As a corollary we obtain desired property for the quantiles of the distri-

butions of \/7_i W,,.

 

 

Corollary 1 For a non-zero continuous function u : [0,1] —-> IR the quantiles
of J17 W,,, different from median, converge to plus or minus inﬁnity with

probability one.

For the proofs we refer again to Section 1.4.

The special form of contrast vectors assumed above is in some sense es-
sential for obtaining a result of this type. We will see that in some extreme
cases i.e. for a very unusual choice of contrast vectors, the quantiles con-
verge to zero. In such a case usage of quantiles of F2 as approximations of

corresponding ones for F1 becomes worthless.

Example 1

Assume that our contrast vectors are of the form v,, = (v,, . . . , v,,, 0, . . . , 0) 6
IR", where k S 72 does not depend on n and (v1, . . . ,vk) is a ﬁxed contrast
vector. Further let 6,, = (61,...,6,,), where 6,, i E N are i.i.d. random

variables such that Shh/E diverges in probability to inﬁnity. In such a
case v,, - 7r6,,|6,, converges in distribution to the same limit as v,, - 6;]6n,
where 6; means random variable obtained by traditional bootstrap i.e. by

resampling with replacement form (61, . . . , 6,,). It becomes more clear if we

13

notice that because of the special form of v,, permuting 6,, in v,, - «6,, is
equivalent to sampling without replacement k-element sample form a pop-
ulation of n-elements (we have here k ﬁxed and n going to inﬁnity). Thus
the distribution of v,, - 1r6,,|6,, converges weakly to the distribution of v,, - 6],.
But assumption that SQ/ﬁ diverges in probability to inﬁnity implies that
ﬁ/ JE‘(v,, - 1r6,,)2 will be convergent to zero so do the quantiles of ﬁ W,,.

 

See Section 1.4 for more rigorous arguments.
As a speciﬁed example of a random variable satisfying the required con-
dition one can consider the errors given by (6) for a 6 (0,1). Indeed, for

such 6 we have

 

n ‘2 O n —l O
i _ i=1 Pi / —2 _ i=1 (Siri / —-3
n I‘n-i/l Fn+/l
(8) —2+‘2/o (Pu-H )2/0 Zn: l—v-‘Z/o
= n _ i
n i=1

1" 1 1/0 n
_n-3+1/0 (A) Z‘SJ‘FI/a-
i=1

n

We are using here the well known fact that

(Pl/Fn+la - - - a 1-‘n/I‘ri-+-l) g(U1,na - - - 9Un,n)-

By Law of Large Numbers I‘,,/n converges as. to 1. The series 2:, 6,1" ,— 1/ a
and 2:21P?” are convergent as. to the continuous distributions. Since
n‘2+2/°‘ for a E (0, 1) diverges thus in this case with probability one sfn / n is

convergent to inﬁnity.

However in some cases our approximations will be justiﬁed regardless of

the form of contrast vectors. Namely, this will happen if s," /,/77 is convergent

14

in probability to zero. Indeed, by simple application of Proposition 1 we can
obtain the following corollary. In its formulation we do not indicate explicitly
dependence of sf” as well as F1, F2 on n and also it should be remembered
that F1, F2 being conditional distributions are dependent on a random value

of 6.

Theorem 2 If sf / n converges in probability to zero then for any 6 > 0 and

for suﬂiciently large n on a set of probability not less than 1 — 6 we have
F1(.’L‘) 6 [F2(.’II — 6) — 6,F2(.’E + 6) + 6].
for all a: 6 IR.

Notice that the assumption required in this result is satisﬁed for errors in
domain of attractions of stable laws when a 6 (1, 2]. In Section 1.4 there are
given arguments for errors given (6) but by series representations of stable
laws given by LePage (1980) they can be easily extended to the general case.

Theorem 1 and Theorem 2 validate approximation of the unknown dis-
tribution of errors of contrasts by permuting residuals under special assump—
tions on distributions of errors. The failure of this approximation in Ex-
ample ] is due to the asymptotically pathological choice of contrast. At
this moment we do not know any example with an essentially different type
of contrasts for which applicability of Proposition 1 for this approximation
would not be valid.

We will end this section with the following result, uniform with respect

to numbers of contrast vectors.

15

Proposition 3 Under the assumptions of Proposition 1 and for contrast

vectors VAR“), k =1,...,r we have

E722=1E0M[(vk(RM) ' WGM) _ (Vk(RM) ° “(it )12 = d -1
22=1EGM(VI:(RM)°W6M)2 n - m -1'

 

1.3 Examples

Example 2

We now present a simple example which illustrates resampling by permu-
tations and the advantage of using a reduced model. The vector of errors
considered here has one value dominating the other ones and typiﬁes what
can happen when the distribution of errors has a long tail (in particular when
our errors do not satisfy the usual assumptions about existence of moments).

Consider an n x 2 matrix X deﬁned as follows

1 ll

1 1
X:

1—1

1.1—1‘

 

 

We assume that n is even and the columns X1 = l and X; of X are orthog-

16

onal. Suppose that our actual errors consist of an n-vector 6 given by

 

 

 

 

 

" 1
a
0
e =
In 0 d
One can notice at once that
1-6 Xz'f
6/X = - 1+ ——- - X2
lllll2 ||X2H2
P 2a/n .
2a/n
: — 1+ — - X2 = ,
n 0
.. O .l
and thus _ q
a -— 2a/n
-2a/n
fi = —2a/n
0
0

 

 

)- a:

Let us take for our contrast vector v the second column X2 of the matrix X.

One can easily observe that the distribution v - elf is concentrated on two

17

points a and --a with the equal weights 1 / 2. We will compare three differ-
ent methods of estimating this distribution: the resampling by permutation
of residuals, the traditional bootstrap with replacement and the normal ap-
proximation. On Figure 1 we present the distribution of interests itself versus
ﬁrst order approximations for n = 625 of distribution obtained by resampling
residuals while on Figure 2 we present ﬁrst order approximation of the dis-
tribution obtained by applying the traditional bootstrap with replacements
and the normal approximation. See Appendix 2.5 for details.

If we consider the reduced model (2) for M = {n} and if the residual
a — 2a/ n is large compared to residuals —2a/ n we will delete the ﬁrst row
from our model and then we will obtain 6 M = 6]} = 0 and thus resampling

of residuals gives us a perfect estimate of the distribution of errors.

18

0
MI.—
NIH
O

 

 

 

l

—a a —a
Distribution of v . 6]}.

 

 

0.5

«0.215;

 

0.5

 

l

f

a

The resampling by permutation

Figure l: The conditional distribution of errors and its recovery by permuting

residuals

0.42e"l / 0426.“

 

 

ll/ fl /\

The traditional bootstrapping

 

Figure 2: The distributions obtained by other methods

19

The normal approximation

Example 3

As a second example, assume that errors are made from the errors in the
ﬁrst example by adding a sample vector from Gaussian white noise with a
small deviation relative to the value of a. In our numerical simulations we
will take it equal to a/\/1_i. Let us take as before M = {n}. Figures 3
and 4 present the conditional distributions of error of the contrast in the
reduced and unreduced model. It illustrates the advantage of the reduced
model. Although approximations in both cases are accurate, in the case of
the unreduced model the conﬁdence region which should be used consists of
two separated intervals around —a and a while in the reduced model one can

consider the natural conﬁdence interval around the origin.

20

0.255 0.235

0.5 0.5 0.5 0.5

 

 

 

 

—a a —a a

Distribution of v - 6|]: with previous errors The resampling by permutation
modiﬁed by adding white noise for modiﬁed errors

Figure 3: Conditional distributions in the unreduced model

0.2:? 0.2:?
1 1

 

 

 

r 7 Y if

—-a a -—a a

 

 

Distribution of v M - EM If after deletion The resampling residuals by permutation

based on the maximal error after deletion

Figure 4: Conditional distributions in the reduced model

21

Example 4

As a third illustration we have made computer simulations which compare
resampling permutations of errors vs resampling permutations of residuals
in a linear regression model ﬁtting a cubic polynomial. Consider a 101-
dimensional symmetric a—stable error 6 with independent coordinates and

a 101 x 4 matrix X obtained by substitution of equally spaced x—values in

l 2 1

3 __ J'-
,:1: ,x ), so :rm- — LL,-

j = 1,2,3,4, 1:,- = (i — 1)/101, i = 1,...,101. We have used the Chambers

the interval [0,1] to the vector of monomials (11:0,:1:

et. a1. formula (1983) to generate stable errors by pseudo—random numbers.
Since our result does not require any assumption on moments of errors we
have observed three notable cases: the normal distribution (a = 2), Cauchy
distribution (a = 1) and a = 0.8. We ﬁrst compute three 101 x 1 row
vectors v,, i = 1,2,3 of the matrix (XTX)‘1XT as well as the matrix of
projection on the column space of X. Then, by generating pseudo—random
permutations 7r we get a sample of two hundred errors v,- - 7r6, residuals v,- - #6i
and projections of errors v, - r6, i = 1, 2, 3. Visual comparison of the values
of these samples is made on parallel plots. Since scale changes in 6 apply

equally to v,- - M, v,- - 71'6‘L

,v, ~ 7r6 it is the relative comparisons between these
plots which are important. Four parallel lines represent the coordinate axes of
four dimensional space and any point of this space corresponds to a polygonal
line which joins values of the coordinates on each axis. For convenience all
polygonal lines are started at the origin. For details concerning parallel plots
see Chernoff (1973).

In these parallel plots we see a very close agreement between the multivari-

ate sampling distribution of {'v,--7r6,i = 1, 2, 3} and that of {v,--7r6i,i = 1, 2, 3}

22

conditional on the given 6. As might be expected the agreement is even bet-
ter than suggested by the sampling distribution of {v,- - «6“, i = 1,2, 3}. For
this example(d — 1)/(n - 1) = .03.

In Part 2 we will give more detailed description of numerical methods

and procedures we have used here.

23

 

 

 

 

23.61

:4

i

i

2, v,- ~1re.

=2

—23.61

a:

 

 

 

 

23.61
-23.61

2,v,~ - 7I'6'L.

a:

 

 

 

 

 

 

 

23.61
-23.61

2,v,- - 11'6“.

a:

Figure 5: Results of resampling, Gaussian case

24

=4

i

 

 

 

 

232 6
—232.6

a=1,v,~-1r6.

i

-7r6‘L.

a: 1,1);

 

 

 

 

 

232.6
-232.6

 

..] 4

 

 

232.6

-232.6

1,115 -7r€.

Figure 6: Results of resampling, Cauchy case

25

 

229093

 

-229093

 

 

229093

 

 

-229093

 

a = 0.8,v, - 7re

 

229093

 

 

 

-229093 i=2 i=3 i=4

a = 08,11, -1r6

Figure 7: Results of resampling, stable case with a = 0.8

26

1.4 Proofs

Variants of the following lemma, in the special case when V is not ran-
dom, are apparently well known Chernoff (1973) Section 4.1; Freedman and
Lane (1983), pg 296, Lemma 1). Although formulated here in probabilistic
language it illustrates rather a geometrical property of the group of permu-

tations. We present the simple proof of this result.

Lemma 1 Let V be a random contrast vector. Assume that V, a random

n x 1—vector 6 and a o-ﬁeld 9 all are independent of a random permutation

it. Then

           

Eg<v - m? = Egu
PROOF. We have

Eg(V-{)2 = E? )3 E°<V~9>(v-ag)2/n!

062"

= EG(Z V252 {em + Z V1:’Ib€a(a)€a(b)) /”'

a¢b

)a¢b a¢b
= E9{uvue+—mvn2— (v 1N]? we}.

= EG((: (2211/02) {2+ nnl( ——-1),,(ZV Vb)( )(Z€a(a)£a(b)) ))

where 2,, denotes the set of all permutations of coordinates of vectors in IR".
Since (V - 1) = 0 we obtain

71
—1

 

E9(V - M)? = E, (llVll'zn ('52 — o?) = Eanvu'zezn

The above lemma and some elementary properties of conditional expec-

tation are the only tools in the proof of our main result.

PROOF OF PROPOSITION 1. By Lemma 1 we have that

EQM [II/(RM) - m.) — (VIRI—II - «emf = WWI/(Re) - wet/XIII?
= E9“(||V(R III-s“? s...,x..I
(9) = EQMIIV( (.RI‘I) )II2

SEM/XM

where the last equality follows from measurability of 33M /XM with respect to

GM. By similar arguments we have also
(10) EQMII’IRM ) WM)2 = SEMIEQMIIV R(II!) )II2

Using the fact that (6M, RM) ; (06M, RM) obtained in Section 1.2 and since
both 33M and E‘Ms2

M" m" are TM measurable we have

2
,EGMI(V(Rn)-W6M)-(V(RM)' new] = ErsEM/XM
EQM(V(RM) ° W6M)2 83M
EfEstgu/XM

2
36M

 

(11) =

 

_ .FMS
— E stun/X1“

= Ems?

SJMM/Xu

The proof can be completed using an equality

E’s3x=«d—nMn—nef

28

obtained by Box and Watson (1962); Section 4.1 in their study of F—distribut-
ion approximations for sf/x/sf with non—normal errors 6 and also mentioned
by Freedman and Lane (1983); see (21), p. 267. In our case it may be
obtained from Lemma 1 by expanding in an orthonormal basis {ua} of X M

with u1 = 1/(n — m) as follows
1

EMS? SnM/xM = —.—E‘" (IIWfM/XMII2 " (MM/XII '1)2/(n — m))
n — m —
1 d 2
= —— “21(110-71'6M)2— (6M -l) /(n — m)
n — 171 —IE “I:
= ———ZE‘“(u -27r6M)
n — m- al__2
= ———1——dZ||ua| 232
n — In - 1 0:2
d—l 2

= ———s
n—m—l ‘M'

where Lemma 1 was used in the third equality.
El

Likewise we can obtain the result on the relative second moments of
difference of 7I’€M — Mil. For simplicity of the notation let us formulate it in
the case M = 0. If we denote by X e l the orthogonal complement of 1 in

the column space of X we have then

E‘||(7r6 — 7r6l)/X e 1”2 _ d -1
E‘II(7r6)/X 9 1||2 — n -1'
Indeed, for any vector u we have by the deﬁnition of 3,2, that 33 = sE/xe, and

||u||2= (u 1)2/n+( (n—1)s2. Thus

E]:

 

 

Ef||( (7r6—7r6 )/X91||2= (n—1)Ef 3m”): (n—1)Ejrs2 9e/x

29

and

Ef|l(7r6)/X 6 III2 = (n —1)Ef32.

C

PROOF OF PROPOSITION 2. For any random variables Z1, 22 with marginal

distribution functions F 1,172 and for each 6 > 0, C > 0 we have that

 

 

F2(:c — ca) — P (Z2 E Z‘

— Z
26) s Flor) SF2($+CI5)+P(Z2C ‘26).
which applied to Z, = v . W6, 22 = v . rel, P = P‘ and C = C1 (notice that
Z1l€ = v - elf, Z2l€ = v - reilY and C1 = l/E‘Z?) together with the Markov

inequality
I22 — ZII E‘IZ2 - ZII2 _2
P‘ —— > 6 < 6
( C, — _ Cf

gives us the second part of Proposition 2.

From the Markov inequality applied to non-negative 7(6) we get
P(7(6) > <5) S E 7(6)/<5-

Thus the ﬁrst part of Proposition 2 follows from Proposition 1 since it implies
that E 7(6) = (d — 1) / (n — 1). The third part of Proposition 2 follows from
Proposition 1 if we notice that it remains valid when we replace FL by 6 — FL
and d — 1 by n — d. In fact it follows from (11) that

2
EfECI’U ' ”(1)2 = Ersc/xi
Cf S? ’

 

where Xi denotes orthogonal complement of X. So the expansion in or-
thonormal basis in Xl gives us, as before, that Efsf/x, = sf(n — d)/(n - 1)
(notice that basis will consist of n — d contrast vectors).

Cl

30

We will precede the proof of Theorem 1 by some considerations on con-
structions of series representations of stable laws and Lévy processes as in-
troduced by LePage (1980). Let u = (u,-)‘,?;l be any sequence of numbers
belonging to (0, 1). We will deﬁne

I?(uat) = 1{s€[0,l]:u15lns]/n}(t)
and for 1 < i g n, by recursion,

If (u,t) : 1{8€I0»1I=UI- i;IIIWEIMI/(n-I'IIU)’

It is not difficult to verify the following list of properties of a sequence
(I?(u,t) 15;, (see also LePage (1980) and Kinateder (1990)). First it is obvi-
ous that a sequence (1,-"(u, t))?=1 consists only of zeros and ones. Moreover for
ﬁxed t and n there are exactly k = lnt] ones in this sequence. Such a sequence
can be interpreted as a combination without replacement of k-elements form
n-elements when positions of ones indicate elements of {1, . . . ,n} which are
chosen to this combination. On the other hand if we ﬁx n, and u then the
number of ones in the sequence will increase from 0 to n as t will increase
form 0 to 1. A sequence of positions of subsequent appearance of ones or,
equivalently, jumps in the sequence (1,."(u, t))§‘=l is a permutation of elements
of {1,...,n}. It is worth notice here that these jumps can occur only at
t = k/n and one at the time. The combination and permutation, which
can be related in this way to (I?(u,t))?=1, depend in fact only on values of
u = (um-”:1. We will refer to this as a combination and a permutation chosen

by u.

31

Now if U = (Um-:1 is a sequence of i.i.d. random variables with uniform
distribution on (0,1) then for ﬁxed t E [k/n,(k + 1)/n), k E {0,...,n}, a
random combination chosen by U has a uniform distribution over all possi-
ble (2) combinations. Also the positions of jumps of (I,-"(U,-))?=1 i.e. the
permutation chosen by U will be uniformly distributed over all n! possible
permutations. Moreover I,-"(u,t) as a function of t, for ﬁxed u, n and i, will
change its value form zero to one at one and only one of the points t = k/n,

and it is right continuous. It follows from that
1?('u,t)= 1Io,zI(J.-"(U)),

where J," denotes the jump of I,”( u, -). So for any sequence of random vari-

ables 6,, = (6,”)?=1 independent of U a formula
(12) W) = Damn,
i=1

deﬁnes cadlag process on [0, ll constant on intervals [(k — 1) / n, k / n), where
k E {1, . . . ,n} and with n-jumps at points k/n,k = 1,... ,n equal to values

of 6,,,,~, i = 1, . . . , n randomly permuted by U. Thus

an _ n E _ n k_]' n
mew-(L t.) L ( . I).-.

is a random permutation of 6,, chosen by U. Consequently, for any vector

 

v 6 1R” we have

v - 7r6,, l6,, = Z v,,AL?l(6,I‘,-)§:1

1:1

From now on let I‘ = (I‘ 02:, will be arrival times of a Poisson process

with intensity equal to one i.e. l",- = 21 + - - - + Z,- where Z,- are i.i.d. random

32

variables with exponential distribution with a parameter equal to one. Then
(F,- / I‘,,+1)?=1 is a vector of i.i.d. random variables uniformly distributed over
(0,1). Let 6 = (6,),9;l be random signs deﬁned as in Section 1.2. Moreover
let P and 6 be mutually independent. Then

F—l/O 7!
en = 6nd :1: = 61+.—
( ) ‘ I all")

i=1
is a vector of i.i.d. random variables with a distribution form the domain of

attraction of a symmetric a-stable law.

For a continuous function v on [0,1] let Riemman type sums be deﬁned
by 2L1 v(k/n)AA2, where AA}; = A(k/n) - A((k — 1)/n). They converge
in probability to ID1 vdA (see Schilder (1970)). Kinateder (1990) proved that
a sequences of cadlag processes (L"(t)),€[0,1] is weakly convergent under Sko-
rohod metric to (A"(t)),€[0,1] (see also LePage (1980)). The main idea was
to apply the series representation of LePage (1980) of a Lévy motion and
integrals with respect to Lévy motion. This representation states that a

symmetric o—stable Lévy motion with cadlag trajectories can be written as
Mt) = Zéiri—UOIILI,,1](I) = :6iPFI/01[0.tl(Ui)I
i=1 i=1

where U is independent of (6,1‘). One can easily extend this formula for

stochastic integral with respect to Lévy motion to obtain

1 00
/ vdA = :5,r;‘/°5(U,).
0 ‘_

1—1
The next proof exploits this convenient representation once again and applies

it in a similar manner to Kinateder (1990) to obtain that

n 1
"lingo:v(k/n)AL2l(6,-I‘,-)f:1a‘zs'jo vdAlj.
k=l

33

PROOF OF THEOREM 1. By deﬁnition of v,, (see (7)) and Lemma 1 we have
”It ' “enlen = “6"” ( ?=1v(i/n)AL?I(6iFi)?:I
\ﬂ‘I‘I-(vn - 7r6,,)2 IIvnIISCa IIEnII

?=1v(i/n) ?=1AL?)
n llénll '

 

 

 

Notice that

Ia = (I-i)‘”(:£j(i)/~-(§év(i)/n))

—1/2

 

( 11:1 [72/0 1/2
a. 172/0 - (2;. 6IFF‘/°)2/n) '
and
AL? 22;. 6.17”“
II€nII = J2}; p.72”.
By almost sure convergence of 23:, 6,I‘,—1/° and 22:, [72/0 and the fact that

lim,,_. 2;, v (i) /n = f v(t)dt = 0 we have with probability one

 

(13) lim ”6"” = (/ Mum)“,

"Pm IlvnIIS‘n

2?:1v(i/n):?:1AL? : 0

 

(14) lim
”“°° ’1 “en“

(15) ,Iigg, Ilenll = ,l Err”.
i=1

34

Now we will show that 2?_1v(;:-)AL'-‘ converges almost surely to f vdA.
Kinateder (1990) has proven that a sequence Ln(t) is convergent with prob-

F—l/a

ability one to 23311‘ 6 I“) ,])(U ) (in fact convergence of the sequence of
entire stochastic processes holds 1n Skorohod metric). Mostly the same ar-
guments apply here when instead of 1(0.t) we will take a continuous function
v. Namely with probability one
"1320: r7 ”“6 v( (R"(U =2 r:”°6( v(U
i=1

since

lim R?(U) '—S' U.-

(see Kinateder (1990); Proof of Proposition 3.1). In view of the series repre-

sentation and deﬁnition of AL;1 this is equivalent to

"11.1ng v:( )AL"_ 8‘3 Eff/“6,- v(U.-) éfolmm.
This implies that
"1930”" -7re,, 5,, g ﬂli_.n0102n:v(z'/n)AL,’-’|(6,-I‘,-)§’_°=1
i=1
= iP:‘/°é.-v<U.-)I(6.r.-)?:l

i=1

[01 vdAlJ.

HQ-

(16)

Combining (13)—(16) we obtain Theorem 1.

35

PROOF OF COROLLARY 1. By Theorem 1 it is enough to show that the
limiting distribution of Wu does not have an atom at zero with probability

one. Assume to the contrary that the event

{P(§:P:1/06iv(U,-) = 0'(Pi6i)i6N) > 0}
i=1

has positive probability. Then

P(§I‘:”°6.vw.) =0) = EP (BF/“mm =0I(r.6.).-EN)
(17) > 0.

But 2?; F:1/065'U(Ug) = fol mm and since 12 is a non-zero function thus

fol vdA has a—stable distribution which is continuous. This contradicts (17).

[:1

Let us discuss now the statements included in Example 1 in Section 1.2. For a
ﬁxed w belonging to the underlying probability space by 6;;(w) = (6;'m-(u))),’.‘=l

we will denote a random variable obtained by traditional bootstrap i.e. by re-

sampling with replacement from (6,-(w))?=1. For w from the set of probability

one forl= 1,...,k we have
. 1 n
(18) “In Eellw-f;.l(u’) = lim _ Zeitvﬁﬂw) : Eeitv'q
n~oo "#00 n j_.1 .

36

We will assume that for our ﬁxed on (18) holds. Denote 6,,(w) = (5,-(w))?=1.

Then

lEeitvn-x£n(w) _ eitvrql S IEeitv.-1rc,.(w) __ eitvn-e;(w)|

+|Eeitv.-£;(w) _ eith-ékl

n! (n - k)! 1
S (n—k)!( n! —3)

[C
H Eeitvlc;‘k(w) _ eith-Cg.’ .

(=1

 

+

 

Thus by (18) we get that vn-7renlen converge weakly to vk.ek with probability

one.

PROOF OF THEOREM 2. By Proposition 1 and Lemma 1 we have

2
Ef(v - 7T€i — v - m)2 = (d —1)||v||2—s‘—.
n - 1
Thus by our assumption Ef(v ~ ml — v - 7r6)2 converges to zero in probability

and thus for a given 6 > O and for sufﬁcient large n we have
P(Ef(v . mi — 12-7“)2 > 6) < 6.

Now using Markov’s inequality in the same manner as in the proof of Propo-

sition 2 we immediately obtain the result.

37

The next proof is just a slight variation Of the proof of Proposition 1.

PROOF OF PROPOSITION 3. By (9) and (10) we have

22:1 EGM[(VI:(RM) - W614) - (WARM) 4611)]?
22:1 Eg“(Vk(Rn) - #6102
SfM/xM {71:11599""”(l/1091(4)”2
83., 22:1 E9“ ||(Vk(1't’11)||2

2
saw/X14
——-,2 .

ScM

 

 

Now conclusion follows immediately from the proof of Proposition 1.

38

Part 2

Numerical Analysis

39

2.1 General Description of Software

This section contains the description of our software which was created to ver-
ify results on resampling methods for linear models. The software is intended
to be developed to a package which, we believe, will become a very useful
tool not only for verifying theoretical results and comparing different resam-
pling methods but also for tracking new phenomena which can contribute
to the development new effective statistical methods. The linear regression
and time series are the two main areas in which these programs are assumed
to be useful but we believe that there can be more applications for them.
From the point Of view of functionality our software has two complementary
components. The ﬁrst one includes all procedures for numerical analysis of
data will be written in the C—language. To assure the maximal portability
we use the standard C—language as deﬁned in Kernighan and Ritchie (1978).
The second component has as its goal a visualization of Obtained sets of data.
We would like to assure simplicity and functionality together with high qual-
ity of graphics and we found that the NeXTSTEP environment available on
NeXT computers and IBM compatibles satisfy both requirements the best.
The whole software in the portable C—language serving the numerical part of
our project was written by the author while the front—end and graphical ap-
plications available in NeXTSTEP computer environment was programmed

by Isaac Tsaiyi in cooperation with author and Dr. R.LePage.

4O

2.2 The Numerical Procedures

As we have mentioned our software is designated to enable simulations and
analysis of the resampling methods in linear regression models and time
series. It will serve as an easy to use but powerful research tool for wide
range Of problems connected with resampling methods and linear models in
statistics.

In its current stage the data analysis is made in the two steps. First all of
our strictly numerical programs and procedures write their results into ﬁles
with names usually chosen by a user. Then the graphical interface can be used
for analysis of the data contained in these ﬁles. Thus the analysis and display
of the data are separated processes here. In the future however we would like
to control them under one joint application running on NeXTSTEP.

In its present form our software has subroutines which compute all linear
algebra Objects which are extensively used in studies of linear models such
as inverse matrices, projections on given subspaces and more. We have also
provided procedures for ﬁnding the least square estimates Of unknown pa-
rameters of a given model. Many of them can be applied to time series as well
as to linear regression without any change but, Of course, the properties of
these two classes of linear models are quite different. Figure 8 illustrates the
difference in the character of data Obtained when resampling by permutation
is applied to two different models (linear regression and autoregression). In
both cases the underlying distribution was normal.

We also have efﬁcient procedures for generating samples of data which

correspond to a given model when its parameters are speciﬁed. Once we

41

7.1m 14.475665 11.339018

 

#831691 44.421016 JIM
Dub 2 Data 3 Dan 4
03639] 0.130009 0.273100

 

W 41.139112 JJIO‘IM
Duh 2 Dan 3 Data 4

Figure 8: The sample of 1000 elements obtained by permuting residuals in a

linear regression model versus an autoregression model

42

generate an appropriate set of data we can apply to them a chosen resampling
method. We have incorporated many programs which compute different
characteristics of the resampled data. In our simulations we needed a very
efﬁcient generator of resampling schemes therefore we have written quick
procedures for generating random sampling based on the generator Of pseudo-
random numbers.

In our theoretical research we have concentrated on the method of re-
sampling by permutation but in the proposed package a user has also access
to other resampling methods like the traditional bootstrap or bootstrapping
by signs (see LePage (1990)). Generating sets Of data from as many classes
of distributions as possible will be also assured. Currently because of our
research interests we implemented methods of generating data from the do-
main Of attraction of stable laws. In Figure 9 we can Observe how the nature
of underlying distributions can inﬂuence the Obtained numerical results even
for the same autoregression model.

We have also made simulations for an autoregression time series. Let
us brieﬂy describe this model. Let (Xn)n€Z be an autoregressive process of
order (I or AR(d) for shortness i.e. a stochastic process such that for each

n 6 2 the following equation is satisﬁed
d
JYn = Z [Ban-k + 6n,
k=l
where (fnlnez is a sequence of i.i.d. random variables and (1301;, is a ﬁnite

sequence Of real numbers. Let n E Z and m 2 (1 will be ﬁxed integers.

Denote

43

03‘3” «.13» 0.273100

 

W 43.139”: 4314754
Doll 2 Dub 3 Data 4
9.5107] 4.060931 7.664361

 

Figure 9: Resampling residuals in an autoregression model with Gaussian

errors versus Cauchy errors.

44

5 = (51,...,gd,o,...,0) l

6 : (6n1---a€n—m+l)
1673'"

X = (Xna-°-1Xn—m+l)

 

Skdx = (XII-k, - ° - a Xn—lc—m-H)

J

A symmetric random operator M : ’R" —> ’R" will be deﬁned by its matrix

 

 

representation
- Xn-l Xn—2 Xn—m i
M = Xn-z Xn—s -°- Xn—m_1
L Xn—m Xn—m—l Xn—2m+1 .
Notice that the columns of this matrix constitute the vectors 5 X , . . . , S '"X .

Denote by M a vector space spanned by them and for any vectors 1:, y E 72'"
by 33-31 we denote their inner product while a: / M will stand for the projection

of a: on M. With this notation we have
X = M H + 6.

Moreover,

MﬁEM.

45

Indeed,

Mﬂ = (sx-ﬁ,...,smx.ﬂ)

d d
(2 [Ban—ka - - - a Z ﬁan-k—m-l-l)
k=l k=l

d
2: 3,.st

k=l
By ordinary residuals for this model we will mean the coefﬁcients of a random
vector

6']- =JY_X/M

Since M ,8 E M we have
E'L = e - e/M

The term adaptive residuals will be used when referring to a random vector

(X - X/Mh

(Sm—IJY — Sm—lX/M)1
X11 — (X/M)1

X,_,,,+, — (Sm-1.17M )l

where (:r);c stands the k—th coordinate of a vector 1: E 72'”. Of course, we

have

(5 — e/M)l

(Sm—1€ _ Sm_1€/Sm_lM)1

46

The estimate 6 of 6 which is deﬁned by the equation
X/Sm‘lM = sm-lMﬂ‘

was considered even in the case when the second moment Of 6,, does not exist
see Kanter and Steiger (1974), Harman and Kanter (1977).
Then we have

X=M6+el.

It seems thus to be reasonable to consider the adaptive estimate 8“ of 6 by
the equation

X = M3“+e°.

For such a model we have carried on analogous resampling methods. An
autoregression model of the order four was considered with the same coef-
ﬁcients in the both cases but in the ﬁrst one we deal with the Gaussian
distribution of underlying stationary random sequence while in the second
one a sample from Cauchy distribution was generated. Although we do not
have any speceﬁc theoretical results verifying applicability of our method of
resampling to this model, results Of simulations strongly sugest that results

Of this type can be true also in this case.

2.3 The Graphical Interface

In our studies of resampling by permutations we have used the £a’Plot ap-
plication developed by R. LePage, K. Podgérski and I. Tsaiyi. It exploits

the idea of parallel plots. The main window of this application is presented

47

l

-$.‘ll3724

 

Data 2 Dana 3 Date I

 

Figure 10: The main window of the Euplot appli *ation for analysis Of 111111—

tivariate data.

48

on Figure 10. On the upper right side of the main window there is the
“Create proﬁle” icon which makes it possible to set a proﬁle on the parallel
plot symmetric with respect to the horizontal axis. This proﬁle can be com-
pressed (rescaled) by using the “Proﬁle Compression” slide. Those points on
the parallel plot which lie outside of the proﬁle are automatically painted in
a different color than the others. A user can select the colors used. In Fig—
ure 10 a proﬁle was set up on the second and third vertical axes. The proﬁle
is indicated by two horizontal lines which show its actual position. Using
the “Modify proﬁle” icon one can easily change it. The current percentage
Of points which are not pruned by our proﬁle is shown in the “Conﬁdence
Percentage” display. Conﬁdence interval half—widths appear in the smaller
dark windows at the bottom. A change Of the horizontal gap between axes
can be also regulated by a horizontal slide on the right bottom side of the
display window.

The slide just below the graph enable to track the data in the cases when
the number of axes is such big that dimensions of our graph exceed horizontal
dimension of the graph window. This can be also regulated by a change Of
the horizontal gap between axes by the horizontal slide on the right bottom
side of the main window. The rest Of the slides corresponds to the axes
showed on the plot and can be used for changing the scale of each of them.
Numbers on the top and the bottom of each axis indicates the actual value
Of the corresponding coordinate of the ﬁrst point which is not pruned by a
given proﬁle.

The data for this graphical interface should be prepared in a ﬁle where

coordinates of each n dimensional point are written in a one separate row.

49

In other words a ﬁle should have the form of a matrix where rows represents
points from the n dimensional Euclidian space which we want to view on
the parallel plot. Many additional features which come usually with any
NeXTSTEP application are also added as for example possibility of saving

a picture as a Post Script ﬁle.

2.4 Numerical Experiments

In this section we will discuss some simulation data Obtained for few different
speciﬁcations of the model (2). In our simulations we check the accuracy of
taking Iy as conditional conﬁdence intervals based on resampling residuals by
comparing them with random intervals I. Obtained by resampling real errors.
We also compare our method with the methods based on the traditional
bootstrap. The conditional distributions of both Z and 2’ will be examined
too. Here we will include only a small variety possible numerical studies.
Some others requires additional procedures and supporting programs which
are not yet available. The descriptions of existing software, as used here, are
given in Appendix B.

The proposed method of resampling is based on rearrangements Of resid-
uals by random permutations which corresponds to 7r in our notation. Al-
though hypothetically we can calculate the exact values of quantiles of a
distribution obtained by resampling by considering all permutations of n—
dimensional vectors, in practice it would not be possible even for not very
large n (52! estimates the number of subatomic particles in the visible uni-

verse). We can omit this by sampling independently from the distribution

50

of 1r and then using the empirical distribution GN(1:) = £le 1(4be where
Z}, = 11,- - 7r" (Y — 17) and 7r", 1: = 1,... , N are independent permutations of
vectors in R“. Then by the law of large numbers or, better, by the Glivienko—
Cantelli Theorem the a/ 2 and 1 — a/ 2 sample quantiles a” and b” of G N
will be approximations of corresponding quantiles for actual errors a’ and b’.
In our computer programs we have used this method for ﬁnding conditional

conﬁdence intervals for a parameter 5 in a linear regression model.

Example 5

Here we will consider the model based on ﬁtting of the cubic polynomial
as described in Section 1.3. The vectors of errors will be an independent
sample from stable distributions with the parameter of stability equal to 2, l
or, in other words, with Gaussian and Cauchy distributions. More precisely,
we will consider the matrix X with the coefﬁcients given by x”- = r171
'=1,2,3,4, 1:,- = (i — 1)/101, i = 1,. . . ,101 and as the contrast vectors the

rows of the matrix of projection 011 the column space of X.

Generating independently permutations we have Obtained the empirical
distributions of random variables denoted earlier by Z and Z’. We can Ob-
serve that the computable distribution of Z’, independently of the underlying
distribution, is quite close to the unknown distribution of Z. On Figures 11
and 12 vertical lines 011 the third axis indicate approximately the same .85

conﬁdence interval for 33 whether using resampled errors or residuals.

51

0 'JIT" 1m" I139”

 

 

D «95ml 49.9)” 451245

0 9.9 M" 20.0507 ”le

 

 

0 41.170“ 49.07." 44.3”

Figure 11: The distribution of resampled errors and residuals, Gaussian case.

52

 

 

0 -I7.9ll3 4|an ~l2li M4

0 91!)" l7IJ|4 111501

 

 

0 .Ol 41! -|76A91 425.175

Figure 12: The distribution of resampled errors and residuals, Cauchy case.

53

2.5 Conclusion

We propose resampling from the permutations of residuals as a method of es-
timating the conditional sampling distributions of contrasts; without moment
assumptions and assuming only exchangeable errors. The idea of resampling
permutations of residuals occurs in Box and Watson (1962), Freedman and
Lane (1983) and possibly elsewhere. Both papers place considerable empha-
sis on establishing conditions under which F—distributions associated with
standard tests in regression are recovered by resampling permutations. In
fact Box and Watson (1962) do not otherwise recommend the method while
Freedman and Lane (1983) recommend it, but only as a descriptive method.
Neither paper makes the crucial observation that the relations Of Proposi-
tions 1, 2 imply that with probability tending to one the distribution v-7rei IY
approximates v - (If, when rescaled by W, as (d — 1)/(n — 1) —» 0.
In fact we have independently discovered Propositions 1,2 in the course of
extending the results Of LePage (1990) which exploit resampling the signs
6 of residuals in much the same way: 22 - 6ei|Y x v - 6| |e| for symmetric
errors. We believe that methods which approximately recover conditional
distributions by resampling are more widely applicable, e.g. to time series,
and we have adopted the term seamless resampling to indicate their robust

character.

54

Appendices

Appendix A

We present some facts about exact distributions and their approximations
which were described and illustrated in Section 1.3.

We propose v - mile as a good approximation of v - rel]: except in un—
usual circumstances. Notice ﬁrst that this random variable has the same

distribution as 7w - {J‘IE and thus takes the values

yf = :l:(a—2a/n—2ak/n+2a(l—k—1)/n)

= i2a(1— 2(k +1)/n)

with corresponding probabilities

mu-)
2(8)

where k = 0,... ,l —- 1 and l + l = n/2. This follows from the fact that

 

it corresponds to the distribution which samples one’s or minus one’s. The

55

above probability is assigned to the event when the choice consists of k one’s
(minus one’s) and l — 1 — k minus one’s (one’s). The weight 1/ 2 comes from
the two symmetric events in which the ﬁrst value of 1m is equal either to one
or to minus one.

Let '7 denote the hypergeometric distribution with parameters n - 1,1 —
1,1 — 1. Then around points to our conditional distribution corresponds to

the distribution of the random variable :té = i2a(1 — 2(7 + 1) / n). Since
Em = (l—1)2/(n -1)zI/2 = n/4.

and
(l — U212
(71 - 1)2(n - 2)

for large n we have approximately

Va'r('7) =

 

z 71/16,

E(:l:{) z :l:(a — 4a/n) z in,
and

Var“) z E(—4a/n('y—n/4)—4a/n))2=
= 16a2/n2E('7 — 11/4)2 +16a2/n2

z a2/n(1 + 16/11) x a2/n.

This conﬁrms the fact that our distribution is concentrated quite close to
points a and —a and thus is reasonable approximation of the distribution of
interest. One can also argue through the normal approximation of 7 for large
n. Namely, it follows from Theorem 3 Of Bickel and Freedman (1984) that
(47 - n)/ﬂ converges in law to the standard normal distribution N(0,1).

56

SO for large n distribution of 7 is approximately equal to N (n / 4, n/ 16) and
consequently the distribution of 5 is equal to N (a, az/n).

Now consider the traditional bootstrap where the distribution of interest
is approximated by the distribution of v-(e‘L )* Ie, where (61 )* indicates random
variable obtained by sampling n-times with replacement from values of the

vector €‘L. This random variable is distributed over points of the form
(’61 — k2)(a — 20/”) + 20,02 — (1)/Tl

with probabilities
T1,i.,%_.'l;(kla l1) T1,};‘%_l(k2, 12),

where k1,k2 = 0,...,n,11= 0,...,l -k1,12 = 0,...,1— k2. Here and

henceforth Thphm stands for the trinomial distribution given by

l l — :r x l_£_
Tammy) = (191) (192)”(193) y,
:r y

where :1: = 0,...,l, y = 0,...,l —:r and p3 = 1 —p1 -—p2. Similarly, Bum will
denote the binomial distribution. Now one can notice that the distribution
of v - (ei)*|e has asymptotically nonvanishing mass around zero.

Indeed, from the following equality

7).P1,P2(x1y) = Bl,P1(x) Bl—x,—£2—(y)a

1_P1

and taking k1 = k2 = 0 we Obtain that the points 2a(lg—ll)/n, 11,12 = 0,. . . ,l

are distributed with probabilities

(1 — 1/"ln316'52501l BI,L(12)-

271—2

57

and using the Poisson approximation for the binomial distribution and Cen-
tral Limit Theorem we will Obtain for large n the approximate distribution
equal to e'lN(0,a2/n), which gives asymptotically the mass equal to e".
In fact the mass will be even bigger since we should take into account not
only k1 = k2 = 0 but also all cases when k1 — k2 = 0. This implies that in

this case the traditional bootstrap fails to recover the distribution v - (If.

It is possible to explore the distribution of v - (ei)'|e even more carefully

a — 2a n
since it is equal to the distribution of '7’ - / , where a two dimen-
2a/n

sional vector 7 has a trinomial distribution with parameters 1, l /n, 1 / 2 — 1 / n
and X" denotes symmetrization of a random variable X. So we can ﬁnd that
the distribution will have the positive mass around all points of the form
:l:ka, k =0,1,2,....

As the last method considered here we will consider the normal approxi-
mation N (O, 62||v||2) of our original distribution, where

(a — 2a/n)2 +(l—1)(2a/'n.)2
n —1 '

 

We have then 62||v||2 z a2(1—2/n) and this method fails completely in recov-
ering the distribution of v - elf, and of course the unconditional distribution

may be far from normal.

Appendix B

N ow we will give short descriptions of the computer programs we have used

to obtain the numerical simulations presented in the previous section. In the

58

near future we intend to develop with Dr. LePage this software to a package
which will serve as a useful tool for exploring of the resampling phenomena
of the linear models in statistics.

From the point of view of functionality we have divide our tools into two

independent although complementary parts, numerical and graphical.

Numerical subroutines and programs. All strictly numerical pro-
grams were written in the standard C—language as described in Kernighan
and Ritchie (1978) so can be run on any computer with a C—language com-
piler. All numerical results of these programs are held in ﬁles with names
speciﬁed by a user. The main routines for resampling linear models are called
blr, plr, slr. They provide quick and convinient tool for resampling in these
models. Below we put more detailed description of their usage.

The programs plr, blr, slr create sets of bootstrap replicate betahat
vectors using three different methods of resampling: by permutations, by

traditional bootstrapping and by sign change, respectively. For example
plr 1000 observ design outputl output2 + <ENTER/RETURN>

will produce a ﬁle outputl with the least square ﬁt Of observ to the model
design and a ﬁle output2 containing 1000 bootstrap betahat vectors Ob-
tained by applying ordinary least squares to 1000 independent uniform ran-
dom permutations of the residuals (i.e. residuals from the original ﬁt).

The programs can also be used with only four parameters if a user wants
to have a design matrix and a vector of Observations in one ﬁle. Then one

should just type

59

 

plr 1000 input outputl output2 + <ENTER/RETURN>

The observation vector y is assumed to be the ﬁrst column in the input ﬁle.
The programs blr and slr work in an analogous way but instead Of per—
muting residuals blr uses with-replacement sampling of the residuals while

slr uses independent sign changes.

Example 6

Consider input ﬁles with the following data

observations: design:

-3 1 1
2 1 0
1 1 -1

Notice then that the vector of residuals has the form

-1

-1

Here are examples of output ﬁles Obtained by applications of our programs

to these data. Thus
plr 1000 observ design outputl output2 + <ENTER/RETURN>

will produce a ﬁle output2 with data

60

betahat1*

0.000000
.000000
.000000
.000000
.000000
.000000
.000000
.000000
.000000

OOOOOOOOO

.000000

betahat2*

-1.500000
-1.500000
-1.500000
0.000000
1.500000
1.500000
1.500000
-1.500000
1.500000
1.500000

Similarly for two others programs.

blr 1000 observ design outputl

betahat1*

-1.000000
1.000000
2.000000
0.000000

-1.000000
0.000000

-1.000000

betahat2*

0.000000
-1.500000
0.000000
1.500000
0.000000
-1.500000
0.000000

61

output2 + (ENTER/RETURN)

0.000000 0.000000
-1.000000 0.000000
1.000000 -1.500000

slr 1000 observ design outputl output2 + <ENTER/RETURN>

betahat1* betahat2*
1.333333 0.000000
-0.666667 1.000000
-1.333333 0.000000
1.333333 0.000000
0.666667 1.000000
0.666667 1.000000
0.666667 -1.000000
0.000000 0.000000
0.000000 0.000000
-0.666667 -1.000000

The ﬁle output 1 in all three cases will include the ﬁt:

0.000000
-2.000000

Here is the list of other subroutines with their short descriptions.

vanderm0.c This programs creates matrices which are used to deﬁne a lin-

ear regression models when ﬁtting coefﬁcients of polynomial is consid-

62

ered. In linear algebra matrices of this type are named Vandermonde

matrices and they are of the form [2:5,] = [If],i = l, ...,n,j = 1, ...,m.

stab_err.c We are using the generator of pseudorandom numbers and for-
mula given in Chambers et. a1. (1983) to generate a sample from
symmetric stable distribution. The values of a sample are written into

a ﬁle speciﬁed by a user.

pow_uni.c Here we are generating a sample from a distribution which is

power of the uniform distribution on a given interval.

lin_regr.c For a given regression model we compute all matrices and vectors
which are important for statistical analysis (including the projection
matrix, the least square ﬁt of unknown parameters and the vector of

residuals).

permJes.c This program generates a sample obtained by resampling a
given vector by a random permutation and then taking the inner prod-

uct with another given vector.

bootstrp.c For a deﬁned regression model and a given vector of errors this
program produces three samples connected with our technique of re-
sampling. Namely, we will get samples from exact and approximated
conditional distributions as described above as well as a sample which
is difference between them. These program was used to verify the ac-

curacy Of our approximation.

trad_bts.c The same function as the above program but when the tradi-

63

tional bootstrapping is considered instead of resampling by permuta-

tions.

res_err.c This program computes the distribution of the ends of random

intervals deﬁned in Section 1.2 for a given regression model.

conf_res.c A sample from the conditional distribution of the ends of conﬁ-

dence intervals proposed in Section 1.2 is computed.

clt_appr.c A sample of the ends Of conﬁdence interval obtained when the
normal approximation is used to our distributions. This program was

used to compare two methods.

Graphical support. To visualize Obtained numerical data we have used
two systems Turbo-C, Borland implementation of the C—language available
on IBM—compatibles and utilities provided by the NeXTSTEP. The last one
we have found particularly useful for creation of the high level graphical
utilities and therefore ﬁnally we would like to have our package available in

the compact form on this computer.

Turbo-C graphics We have created several programs using Turbo Graphic
which deliver some pictures illustrating properties of our numerical
data. The one which was used here to draw histograms Of obtained
samples enables to produce a histogram of any set of data (not nec-
essary bell shaped) and illustrates how heavy are the “tails” of our
sample which is important when distributions without ﬁnite moments

are considered.

64

 

La’Plot The new application in the NeXTSTEP computer environment was
developed to simplify analysis Of our data. It enables to track mul-
tidimensional data exploiting the idea of parallel plot (see Chernoff
(1973)). Conﬁdence regions can be Observed in great detail. High
quality graphics produce pictures. This application will be developed
to the full software serving resampling techniques in linear models. See

also Section 2.3.

65

Bibliography

[1] Bickel, P.J. and Freedman, DA. (1981). Some assymptotic theory for
the bootstrap. Ann. Statist. 9, 1196—1217.

[2] Bickel, P.J. and Freedman, DA. (1984). Asymptotic normality and
the bootstrap in stratiﬁed sampling. Ann. Statist. 12, 470-482.

[3] Box, C.E.P. and Watson, GS. (1962). Robustness to non—normality

of regression tests. Biometrica, 49, 93—106.

[4] Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, RA. (1983).
Graphical methods for data analysis. Belmont:Wadsworth.

[5] Chernoff, H. (1973), Using faces to represent points in k—dimensional
space. J. Am. Statist. Assoc., 68, 361—368.

[6] Doob, J.L. (1953), Stochastic Processes. Wiley: New York J. Am.
Statist. Assoc., 68, 361-368.

[7] Efron, B. (1979), Bootstrap methods: Another look at the jackknife.
Ann. Statist., 7, 1-26.

66

[8] Freedman, D.A. (1981). Bootstrapping regression models. Ann.
Statist., 9, 1218-1228.

[9] Freedman, D.A. and Lane, D. (1983). A nonstochastic interpretation

of reported signiﬁcance levels. J. Business Econ. Statist. 1, 292-298.

[10] Hannan, E.J. and Kanter, M. (1977) Autoregressive processes with
inﬁnite variance. J. Appl. Prob. 14 411-415.

[11] Kanter, M. and Steiger, W.L. (1974) Regression and autoregression
with inﬁnite variance. Adv. Appl. Prob. 6 768-783.

[12] Kernighan, B.W. and Ritchie, D.M. (1978) The C Programming Lan-
guage. Prentice—Hall Inc., Englewood Cliffs, New Jersey

[13] Kinateder, J. (1990), An invariance principle applicable to the boot-
strap. PhD Thesis, Michigan State University.

[14] Kuelbs, J. (1973), A representation theorem for symmetric stable pro-

cesses and stable measures. Z. Wahrsch. Verw. Gebiete 26, 259-271.

[15] Lahiri, SN. (1992). Bootstrapping M—estimators of a multiple linear
regression parameter. Ann. Statist., 20, 1548-1593.

[16] LePage, R. (1980), Multidimensional inﬁnitely divisible variables and
processes. Part I: Stable case. Technical Report # 2.92, Statistics De-

partment, Stanford.

67

[17] LePage, R. (1990), Bootstrapping signs. in Exploring the limits of
bootstrap, R. LePage and L. Billard Eds., John Wiley and Sons Publ.,
(1992)

[18] Schilder, M. (1970), Some structure theorems for the symmetric stable
laws. Ann. Math. Statist., 41, 522-536.

[19] Wu, C.F.J. (1986), Jackknife, bootstrap and other resampling meth-
ods in regression analysis. Ann. Statist., 14, 1261-1295.

68