This is to certify that the dissertation entitled

Minimum Distance Regression and Autoregressive Model Fitting

presented by Pingping Ni has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major professor: Hira L. Koul
Date: May 20, 2002

MSU is an Affirmative Action/Equal Opportunity Institution

MINIMUM DISTANCE REGRESSION AND AUTOREGRESSIVE MODEL FITTING

By

Pingping Ni

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2002

ABSTRACT

MINIMUM DISTANCE REGRESSION AND AUTOREGRESSIVE MODEL FITTING

By

Pingping Ni

This work proposes a class of tests for fitting a parametric regression model to a regression function when the underlying design variables are random and the model is possibly heteroscedastic. These tests are based on certain minimized L₂ distances between a nonparametric regression function estimator and the parametric model being fitted. The work obtains the asymptotic distribution of the proposed statistic under the null hypothesis. It also derives the asymptotic distribution of the corresponding minimum distance estimator. A class of tests based on a slightly different L₂ distance for fitting a parametric autoregressive model to an autoregressive function is also proposed in this thesis. The asymptotic properties of the underlying parameter estimator and the corresponding minimized distance are derived.

Copyright by

PINGPING NI

2002

ACKNOWLEDGMENTS

I would like to thank my advisor, Professor Hira L. Koul, for his guidance and many helpful discussions on the subject of this thesis.
He was always available when I had doubts or questions. His general way of thinking about statistical problems, and his ways of solving them, will help my future research and work. I would also like to thank all the other committee members, Professors Connie Page, Habib Salehi, and Lijian Yang, for serving on my guidance committee. Many thanks to Professor Connie Page for her advice when I was at the consulting service, to Professors Vincent Melfi, Habib Salehi, and James Stapleton, and to Cathy Sparks for their help with my simulation study. Finally, I would like to thank the Department of Statistics and Probability for offering me graduate assistantships so that I could come to the States to complete my graduate studies at Michigan State University. This research was partly supported by NSF Grant DMS 0071619, PI: Professor Hira Koul.

TABLE OF CONTENTS

LIST OF TABLES vii

LIST OF FIGURES ix

1 Introduction

2 Minimum Distance Regression Model Fitting 14
  2.1 Introduction 14
  2.2 Assumptions 15
  2.3 Consistency of θ*_n and θ̂_n 18
  2.4 Asymptotic distribution of θ̂_n 26
  2.5 Asymptotic distribution of the minimized distance 38
  2.6 Simulations 51

3 Minimum Distance Autoregressive Model Fitting 65
  3.1 Introduction 65
  3.2 Assumptions 67
  3.3 Consistency of θ_n 70
  3.4 Asymptotic distribution of √n(θ_n − θ₀) 76
  3.5 Asymptotic behavior of the minimum distance 86

4 Simulations 101

BIBLIOGRAPHY 122

LIST OF TABLES

2.1 Empirical sizes and powers for testing model 0 vs. models 1 to 4. 57
4.1 Tests for model 1 vs. model 2 with double exponential errors. 106
4.2 Tests for model 1 vs. model 2 with N(0, 0.1) errors. 107
4.3 Tests for model 1 vs.
model 3 with N(0, 0.1) errors. 107
4.4 Mean and s.d.(θ_n) under model 1 with double exponential errors. 108
4.5 Mean and s.d.(θ_n) under model 1 with normal errors. 108

LIST OF FIGURES

2.1 The density curve of √n(θ̂_n − 1). 53
2.2 The density curve of nh^{d/2}(M_{hw}(θ̂_n) − Ĉ_n). 54
2.3 The density of √n(θ_{1n} − 0.5). 58
2.4 The density of √n(θ_{2n} − 0.8). 59
2.5 The 2-dimensional density of √n(θ_n − θ₀) when n = 30. 60
2.6 The 2-dimensional density of √n(θ_n − θ₀) when n = 50. 61
2.7 The 2-dimensional density of √n(θ_n − θ₀) when n = 100. 62
2.8 The 2-dimensional density of √n(θ_n − θ₀) when n = 200. 63
2.9 The density of the test statistic under H₀. 64
4.1 The density of √n(θ_n − 0.8) when the errors are double exponential. 110
4.2 The density of √n(θ_n − 0.8) when the errors are N(0, 0.1). 111
4.3 The density of T_n(θ_n) under model 1 with double exponential errors. 112
4.4 The density of T_n(θ_n) under model 2 with double exponential errors. 113
4.5 The density of T_n(θ_n) under model 1 with N(0, 0.1) errors. 114
4.6 The density of T_n(θ_n) under model 2 with N(0, 0.1) errors. 115
4.7 The density of T_n(θ_n) under model 3 with N(0, 0.1) errors. 116
4.8 The density of the suitably scaled minimized distance under model 1 with double exponential errors. 117
4.9 The density of the suitably scaled minimized distance under model 2 with double exponential errors. 118
4.10 The density of the suitably scaled minimized distance under model 1 with N(0, 0.1) errors. 119
4.11 The density of the suitably scaled minimized distance under model 2 with N(0, 0.1) errors. 120
4.12 The density of the suitably scaled minimized distance under model 3 with N(0, 0.1) errors.
121

Chapter 1

Introduction

This thesis is concerned with the classical problem of using a set of variables, say a d-dimensional variable X, to explain a response Y, a one-dimensional real variable. In practice this is often done in terms of the conditional mean function of Y given X, known as the regression function and defined as

μ(x) = E(Y | X = x), x ∈ ℝ^d,

assuming, of course, E|Y| < ∞. In the context of time series, where X may be the vector of the previous d lagged variables, μ is called the autoregressive function. To be specific, let {(X_i, Y_i) : i = 1, …, n} be observable random variables, where (X_i, Y_i) has the same distribution as (X, Y) for all 1 ≤ i ≤ n. They are said to obey a regression model with regression function μ if, in addition, {(X_i, Y_i) : i = 1, …, n} are independent and identically distributed (i.i.d.). The data are said to have come from an autoregressive model of order d = 1 with autoregressive function μ if, in addition, X_{n+1} is also observable and Y_i = X_{i+1}, 1 ≤ i ≤ n. Let Θ ⊂ ℝ^q and let {m_θ(·) : θ ∈ Θ} be a given set of parametric models. The statistical problem addressed in this thesis is that of model checking, i.e., testing the goodness-of-fit hypothesis

(1.0.1) H₀ : μ(x) = m_{θ₀}(x), for some θ₀ ∈ Θ and for all x ∈ I, vs. H₁ : H₀ is not true,

based on the given data, where I is a compact subset of ℝ^d. Several researchers have used nonparametric techniques for model checking in the regression and autoregressive settings since the late 1980's. For instance, Eubank and Spiegelman (1990), Eubank and Hart (1992, 1993), Härdle and Mammen (1993), Stute (1996), and Stute, Thies, and Zhu (1998) address this problem in the regression setting, while An and Cheng (1991), Hjellvik, Yao, and Tjøstheim (1997), and Koul and Stute (1999) do so in the autoregressive setting. In the regression context, some of these works focus on a fixed design rather than a random one, and impose some restrictive assumptions on the error distribution.
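Throughout the thesis, the tests compare a kernel-smoothed estimate of the regression function μ with the fitted parametric family. As a point of reference, a minimal Nadaraya–Watson estimator of μ(x) can be sketched as follows; the uniform kernel, the bandwidth, and the toy linear model m_θ(x) = θx are illustrative assumptions, not the exact choices made later in the thesis.

```python
import numpy as np

def nw_estimate(x, X, Y, h):
    """Nadaraya-Watson kernel estimate of mu(x) = E(Y | X = x).

    Uses a uniform kernel on [-1, 1]; the thesis's results allow general
    kernel densities K with K_h(u) = K(u / h) / h^d.
    """
    w = (np.abs((x - X) / h) <= 1.0).astype(float)  # kernel weights
    if w.sum() == 0.0:
        return np.nan  # no observations in the local window
    return (w * Y).sum() / w.sum()

# Toy check: data generated under a null model mu(x) = theta0 * x.
rng = np.random.default_rng(0)
theta0 = 1.0
X = rng.uniform(-1.0, 1.0, size=2000)
Y = theta0 * X + rng.normal(0.0, 0.1, size=2000)
mu_hat = nw_estimate(0.5, X, Y, h=0.1)  # should be close to 0.5
```

Under H₀ the smoothed estimate and the parametric fit should agree up to sampling error, which is what the minimized L₂ distances below quantify.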
The proposed tests in these papers, except those of Stute (1996), Stute et al. (1998), and Koul and Stute (1999), are based on some nonparametric estimator of the regression function, while the tests in the latter papers are based on a certain partial-sum empirical process of the residuals. Here we shall briefly summarize the contents of some of these papers.

Eubank and Spiegelman (1990) consider the sequence of models where d = 1 and, at stage n, X_i = x_{in} with 0 ≤ x_{1n} < ⋯ < x_{nn} ≤ 1 known and nonrandom, and where

μ(x_{in}) = β₀ + β₁ x_{in} + f(x_{in}), 1 ≤ i ≤ n,

and f is a smooth unknown function. Moreover, here the errors Y_i − μ(x_{in}) are assumed to be i.i.d. N(0, τ²) with τ² unknown. It is also assumed that the x_{in} are generated by a continuous positive density w on [0, 1] through the relation

∫₀^{x_{in}} w(x) dx = (2i − 1)/2n.

The problem addressed in this paper is to test the hypothesis f = 0 versus the alternative that f ∈ L₂(w)/{1, x}, f is absolutely continuous, and its a.e. derivative f′ is absolutely continuous and square integrable. Here the space L₂(w)/{1, x} consists of all functions in L₂(w) orthogonal to 1 and the identity function. The paper proposes two tests. For one, they assume that f = T_{np} α, α ∈ ℝ^p, where T_{np} is a vector of known functions orthogonal to 1 and the identity function. Then the test is based on the least squares estimators of β₀, β₁, and α. The other test is based on the spline estimation of f and the least squares estimators of β₀, β₁. They prove the asymptotic normality of their proposed statistics under their null hypothesis. We note that the problem addressed in this paper may be thought of as equivalent to fitting a simple linear regression model, i.e., to testing H₀ of (1.0.1) with q = 2 and m_θ(x) = (1, x)θ, against a nonparametric class of alternatives.

Härdle and Mammen (1993) consider the problem of testing H₀ based on the model

(1.0.2) Y_i = μ(X_i) + ε_i,

where the ε_i's are allowed to be heteroscedastic with E(ε_i | X_i) = 0 and E(ε_i² | X_i = x) = σ²(x).
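The Eubank–Spiegelman design relation ∫₀^{x_{in}} w(x) dx = (2i − 1)/2n says the design points are quantiles of the density w. This can be sketched numerically as below; the grid-based inversion of the integral is an illustrative device of this sketch, not a construction from the paper.

```python
import numpy as np

def design_points(n, w, grid_size=100001):
    """Solve int_0^{x_in} w(x) dx = (2i - 1) / (2n) for i = 1..n.

    w is a positive density on [0, 1]; its integral is inverted
    numerically on a fine grid.
    """
    t = np.linspace(0.0, 1.0, grid_size)
    cdf = np.cumsum(w(t)) * (t[1] - t[0])  # crude cumulative integral
    cdf /= cdf[-1]                         # normalize so cdf(1) = 1
    targets = (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)
    return np.interp(targets, cdf, t)      # invert the cdf at the targets

# With the uniform density w = 1 the points are x_in = (2i - 1) / (2n).
pts = design_points(5, lambda t: np.ones_like(t))
```

For non-uniform w the points cluster where w is large, which is exactly the role the design density plays in their asymptotics.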
They propose a class of tests based on

(1.0.3) M_{hh}(θ) := ∫ [n⁻¹ Σ_{i=1}^n K_h(x − X_i)(Y_i − m_θ(X_i))]² {f̂_h(x)}⁻² dG(x),

f̂_h(x) := n⁻¹ Σ_{i=1}^n K_h(x − X_i), x ∈ ℝ^d, K_h(x) := h^{−d} K(x/h),

where K is a kernel density function on [−1, 1]^d and G is a σ-finite measure on ℝ^d. Their test is based on the statistic T_n := nh^{d/2} M_{hh}(θ̂), where the estimator θ̂ and the null model are assumed to satisfy the condition

(1.0.4) m_{θ̂}(x) − m_{θ₀}(x) = (1/n) Σ_{i=1}^n ⟨η(x), γ(X_i)⟩ ε_i + o_p((n log n)^{−1/2}), uniformly in x.

Here η and γ are bounded functions taking values in ℝ^k for some k. It is pointed out in the paper that this assumption holds for linear models, and for the weighted least squares estimators in nonlinear models if m_θ(·) is "smooth", with ṁ_θ(·) = (∂/∂θ)m_θ(·) at θ = θ₀. Apart from the usual assumptions, such as that the kernel K is symmetric and twice continuously differentiable with compact support, that X lies in a compact set with probability 1, and that the density f of X is bounded away from zero and infinity, they also assume that h_n = cn^{−1/(d+4)} for some known constant c > 0, that the regression function μ and the density f are twice continuously differentiable, and that E exp(tε_i) is uniformly bounded in i for |t| small enough. Under some additional assumptions, they conclude that the asymptotic null distribution of nh^{d/2}(M_{hh}(θ̂) − C_n) is N(0, V), where C_n is a centering constant depending on μ = m_{θ₀}, the kernel K, and h^{−d/2}, and where

V = 2 ∫ (σ⁴(x) g²(x) / f²(x)) dx ∫ (K^{(2)}(t))² dt,

K^{(2)} denotes the convolution K ∗ K, and g is the Lebesgue density of G. The choice of bandwidth h_n = cn^{−1/(d+4)} is asymptotically optimal for the class of twice continuously differentiable regression functions. It is also crucial in obtaining the rates of uniform consistency of the nonparametric estimators of μ and f, which in turn play a crucial part in the proofs of this paper. The paper gives details of the proof for the one-dimensional case only, i.e., for the case d = 1, and it is not clear how their proof can be extended to the case d > 1 without a concern for bandwidth selection.
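The statistic T_n = nh^{d/2} M_{hh}(θ̂) above is a weighted L₂ distance between smoothed residuals and zero. A rough d = 1 sketch follows, with a uniform kernel and a uniform grid standing in for the measure G; both are simplifying assumptions of this sketch, not choices made by Härdle and Mammen.

```python
import numpy as np

def hm_statistic(X, Y, m_theta, h, grid):
    """Sketch of M_hh(theta) and T_n = n * h^{1/2} * M_hh for d = 1.

    Uses a uniform kernel on [-1, 1] and approximates the dG-integral
    by a Riemann sum over an equally spaced grid.
    """
    n = len(X)
    dx = grid[1] - grid[0]
    M = 0.0
    for x in grid:
        k = (np.abs((x - X) / h) <= 1.0) / (2.0 * h)  # K_h(x - X_i)
        f_h = k.mean()                                # density estimate
        if f_h <= 0.0:
            continue  # skip grid points with an empty window
        num = np.mean(k * (Y - m_theta(X)))           # smoothed residuals
        M += (num / f_h) ** 2 * dx
    return n * np.sqrt(h) * M

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=500)
Y = 1.0 * X + rng.normal(0.0, 0.1, size=500)
grid = np.linspace(-0.8, 0.8, 81)
Tn_null = hm_statistic(X, Y, lambda x: 1.0 * x, h=0.2, grid=grid)  # true model
Tn_alt = hm_statistic(X, Y, lambda x: 0.0 * x, h=0.2, grid=grid)   # wrong model
```

Fitting the wrong model leaves a systematic component in the smoothed residuals, so the distance blows up, which is the source of the test's power.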
These authors also conducted Monte Carlo simulations of both the distribution of the test statistic and its asymptotic distribution. Their studies show that the simulated null distribution of the test statistic has a non-negligible departure from the limiting distribution in its mean, variance, and shape, and it is also proved in the paper that the naive bootstrap does not work for degenerate U-statistics. They therefore suggest using the wild bootstrap to calculate critical values.

Stute, Thies, and Zhu (1998) also considered the problem of testing H₀ of (1.0.1) for the model (1.0.2) with d = 1. They constructed a class of test statistics by first splitting the whole sample into two parts, 1 to n₁ and n₁ + 1 to n, with n₁ → ∞ and n − n₁ → ∞. The test statistic is based on the cusum process of the residuals of the second half. Let F_{n₁} be the empirical distribution function of X_{n₁+1}, …, X_n, let θ_{n₁} be a √n₁-consistent estimator of the true parameter under the null hypothesis based on (X_i, Y_i), 1 ≤ i ≤ n₁, and let

R_n(x) := (n − n₁)^{−1/2} Σ_{i=n₁+1}^n 1{X_i ≤ x} (Y_i − m_{θ_{n₁}}(X_i)).

The test uses an innovation-martingale transform of this process, of the form

T_n R_n(x) := R_n(x) − ∫_{−∞}^x ṁ_{θ_{n₁}}ᵀ(y) A_n^{−1}(y) [∫_y^{x₀} ṁ_{θ_{n₁}}(u) σ̂^{−2}(u) R_n(du)] F_{n₁}(dy).

Here A_n(y) := ∫_y^{x₀} ṁ_{θ_{n₁}}(u) ṁ_{θ_{n₁}}ᵀ(u) σ̂^{−2}(u) F_{n₁}(du), σ̂² is a consistent estimator of σ² based on the first half of the sample, and ṁ_θ := ∂m_θ/∂θ. Under the assumption that ∫_{−∞}^{x₀} ṁ_{θ₀}(u) ṁ_{θ₀}ᵀ(u) σ^{−2}(u) F(du) is positive definite for some x₀ < ∞, and under some additional smoothness assumptions on the null model, they proved that under H₀, T_n R_n → B ∘ F in distribution in D[−∞, x₀], where B is a standard Brownian motion and F is the distribution function of X. They then propose as a test statistic a suitably standardized version of ∫_{−∞}^{x₀} [T_n R_n(x)]² F_n(dx), and prove that it converges in distribution to ∫₀¹ B²(u) du under their null hypothesis.

An and Cheng (1991) considered the problem of testing the linearity of an autoregressive function.
They proposed a Kolmogorov–Smirnov type test statistic based on a process similar to R_n, with

ε̂_i = (X_i − X̄) − β̂(X_{i−1} − X̄), β̂ = Σ_{k=1}^n (X_k − X̄)(X_{k−1} − X̄) / Σ_{k=1}^n (X_k − X̄)², X̄ = n^{−1} Σ_{k=1}^n X_k.

The test statistic is defined to be a suitably standardized supremum, over x, of the cusum process of these residuals, computed with an integer m satisfying m → ∞ and m(ln ln n)/n → 0. It is proved in the paper that this test statistic converges in distribution, under the null hypothesis of linearity, to the supremum over [0, 1] of the absolute value of a standard Brownian motion.

(m3) For some nonnegative ℓ ∈ L₂(G),

|m_{θ₂}(x) − m_{θ₁}(x)| ≤ ‖θ₂ − θ₁‖ ℓ(x), ∀ θ₂, θ₁ ∈ Θ, x ∈ I.

(m4) The model m_θ is differentiable in θ in a neighborhood of θ₀, with the vector of derivatives ṁ_θ, such that for every ε > 0 and k < ∞,

lim_n sup P( sup_{1≤i≤n, (nh^d)^{1/2}‖θ−θ₀‖≤k} |m_θ(X_i) − m_{θ₀}(X_i) − (θ − θ₀)ᵀ ṁ_{θ₀}(X_i)| / ‖θ − θ₀‖ > ε ) = 0.

(m5) For every ε > 0 there is an N_ε < ∞ such that for every 0 < k < ∞,

P( max_{1≤i≤n, (nh^d)^{1/2}‖θ−θ₀‖≤k} h^{−d/2} ‖ṁ_θ(X_i) − ṁ_{θ₀}(X_i)‖ ≥ ε ) ≤ ε, ∀ n > N_ε.

About the bandwidth h_n, we shall make the following assumptions:

(h1) h_n → 0 as n → ∞.
(h2) nh_n^{2d} → ∞ as n → ∞.
(h3) h ~ n^{−a}, where a < min(1/(2d), 4/(d(d + 4))).

Conditions (h1) and (h2) suffice for the consistency of θ̂_n, while (h3) is needed for the asymptotic normality of θ̂_n and M_{hw}(θ̂_n). Of course, (h3) implies (h1) and (h2). It is well known that under (f), (k), (h1) and (h2), cf. Mack and Silverman (1982),

(2.2.1) sup_x |f̂_h(x) − f(x)| = o_p(1), sup_x |f̂_w(x) − f(x)| = o_p(1),

(2.2.2) sup_x |f(x)/f̂_w(x) − 1| = o_p(1).

These conclusions are often used in the proofs below. In the sequel, we write h for h_n and w for w_n; the true parameter θ₀ is assumed to be an inner point of Θ; and the integrals with respect to the G-measure are understood to be over the set I. The inequality (a + b)² ≤ 2(a² + b²), for any real numbers a, b, is often used without mention in the proofs below.

2.3 Consistency of θ*_n and θ̂_n

This section proves the consistency of θ*_n and θ̂_n. To state and prove these results we need some more notation. Let L₂(G) denote the class of square integrable real valued functions on ℝ^d with respect to G.
Define

ρ(ν₁, ν₂) := ∫_I (ν₁(x) − ν₂(x))² dG(x), ν₁, ν₂ ∈ L₂(G),

and the map T(ν) := argmin_{θ∈Θ} ρ(ν, m_θ), ν ∈ L₂(G). In the sequel we shall often use the following notation:

dφ_w := f̂_w^{−2} dG, dφ := f^{−2} dG.

Moreover, for any integral L := ∫ γ dφ_w, L̄ := ∫ γ dφ. Thus, e.g., T̄(ν) stands for T(ν) with φ_w replaced by φ, i.e., with f̂_w replaced by f. We also need to define

μ_n(x, θ) := n⁻¹ Σ_{i=1}^n K_h(x − X_i) m_θ(X_i),

μ̇_n(x, θ) := n⁻¹ Σ_{i=1}^n K_h(x − X_i) ṁ_θ(X_i),

U_n(x, θ) := n⁻¹ Σ_{i=1}^n K_h(x − X_i) Y_i − μ_n(x, θ) = n⁻¹ Σ_{i=1}^n K_h(x − X_i)(Y_i − m_θ(X_i)), U_n(x) := U_n(x, θ₀),

Z_n(x, θ) := μ_n(x, θ) − μ_n(x, θ₀) = n⁻¹ Σ_{i=1}^n K_h(x − X_i)[m_θ(X_i) − m_{θ₀}(X_i)], θ ∈ ℝ^q,

f̂_h(x) := n⁻¹ Σ_{i=1}^n K_h(x − X_i), f̂_w(x) := n⁻¹ Σ_{i=1}^n K_w(x − X_i), x ∈ ℝ^d,

Σ₀ := ∫ ṁ_{θ₀}(x) ṁ_{θ₀}ᵀ(x) dG(x).

To begin with we state

Lemma 2.3.1 Let m satisfy the conditions (m1), (m2), and (m3). Then the following hold.

(a) T(ν) always exists, ∀ ν ∈ L₂(G).

(b) If T(ν) is unique, then T is continuous at ν in the sense that for any sequence {ν_n} ⊂ L₂(G) converging to ν in L₂(G), T(ν_n) → T(ν); i.e., ρ(ν_n, ν) → 0 implies T(ν_n) → T(ν) as n → ∞.

(c) T(m_θ(·)) = θ, uniquely for every θ ∈ Θ.

Proof. The main ideas of the following proof are essentially as in Beran (1977).

Proof of part (a). Because Θ is compact, it suffices to show that for every ν ∈ L₂(G) the map θ ↦ ρ(ν, m_θ) is continuous. Accordingly, let θ_n be a sequence in Θ converging to a θ ∈ Θ. Then, by the Cauchy–Schwarz inequality, we obtain

|ρ(ν, m_{θ_n}) − ρ(ν, m_θ)| ≤ ρ(m_{θ_n}, m_θ) + 2ρ^{1/2}(ν, m_θ) ρ^{1/2}(m_{θ_n}, m_θ) → 0,

by (m3).

Proof of part (b). Let {ν_n}, ν in L₂(G) be such that

(2.3.1) ρ(ν_n, ν) → 0.

Set θ = T(ν), ϑ_n = T(ν_n). Then, by the definition of T,

ρ(ν_n, m_{ϑ_n}) ≤ ρ(ν_n, m_θ).

By subtracting and adding ν, expanding the quadratic, and using the Cauchy–Schwarz inequality on the cross-product term, the above bound is bounded above by

ρ(ν_n, ν) + ρ(ν, m_θ) + 2ρ^{1/2}(ν_n, ν) ρ^{1/2}(ν, m_θ).

In view of (2.3.1), we thus obtain

(2.3.2) lim sup_n ρ(ν_n, m_{ϑ_n}) ≤ ρ(ν, m_θ).
On the other hand, again by the definition of T, 6, and 19,, here, p(u, mg) _<_ p(V,m19,,) which, together with an argument like the above, implies P(me0n) -p(V»mv) Z P(Vn»mvn) —P(V,m19n) 2 pm. u) — 2p1/2 p(1/,m,;). Ptom this it follows that 6,, —) 6. For, suppose 19,, 4+ 19. Then, by the compact- - ness of 9, there is a subsequence {"6“} C {6"} such that 19,”, -—> 61 ¢ 60, and by 20 the continuity of the map 6 H p(V,6), and by (2.3.1), we Obtain p(1/,,k,m,9nk) ——> p(u, 7119,). Hence, by (2.3.3), p(l/, mm) = p(l/, 171.13), implying, in view of the unique- ness of T(V), a contradiction, unless 191 = 6. Proof of part (c) follows from the identifiability condition (m2), which implies that T(Tl’tg) = 6. D A consequence of this lemma is the following Corollary 2.3.1 Suppose H0, (e1), (62), (f), (m1), (m2), and (m3) hold. Then, 6;, —-—> 60, in probability under H0. Proof. We shall use part (b) of the Lemma 2.3.1 with 12,, = flaw, u = mgo. Note that Mgw(60) = p(;1hw,mgo), 6;, = T(un), and by the identifiability condition (m2), T(V) = 60 is unique. It thus suffices to prove (2.3.4) pm... me.) = opu). To show this, we note that by plugging in Y, = u(X,-) +5,- and note that u = mgo under H0, and expanding the quadratic integrand, p([ihw, u) is bounded above by the sum 2[C,,1 -+- Cn2(60)], where, C... := / Uitx)d¢w(x). cam) == / [i.(w)—f<:. 0, there exists an N,, such that (2.4.3) P (D,,(6n)/||6,, — 90))2 _>_ a + llgfiiflezob) > 1 — a, v n > N,,, where 20 is as in (2.3.1). The claim (2.4.1) then will follow from (2.4.3), (2.4.2), the positive definiteness of >30, and the fact nthn(6,,) = nhd||6,, — 9,,“2 [D,,(6,,)/l|6,, — 90”?) To that effect, let (2.4.4) u,, := (6,, — 60), d,,,- := mg (X,—) mgo(X,- ) — uzmgo(X,-), 1 S i _<_ n. We have M— S Dnl + Dnz, where ||9n - 90“2 D“ = f ”-127“— “(IIdT‘iIIlrdm 0.2 = f M] dad). llunll 27 By the assumption (m4) and the consistency of 6,,, one verifies by a routine argument that D,,1 = 0,,(1). 
For the second term we notice that (245) Dn2 > inf 2,,(6), where Z,,(b) := / [bT [1,,(x, 60)]2 d 0, and any two unit vectors b, b, 6 Rd, llb— b,” g 6, we have 2 |2n(b)- $7.01)! <5()5+2 )U ”“2194 33- Xilllmao(X zllldsotv ) i=1 But the expected value of the r.v.’s inside the square of the second factor tends to f Ilm(x )ll f(x )dcp(x), and hence this factor is 0,,(1). From these observations and the compactness of the set {b 6 W; ”b“ = 1}, we obtain that sup (2,,(b) — bTEObl = 0,,(1). l!b||=1 This fact together with (2.4.5) implies (2.4.3) in a routine fashion, and also concludes the proof of (2.4.1). We shall now prove the asymptotic normality of n1/2(6,, — 60). The proof is classical in nature. Recall the definitions (2.3.1) and (2.4.4), and let th(6) :2 —2/Un(x,6),ii,,(x,6)d<,b,,,(x). 28 Since 60 is an interior point of O, by the consistency, for sufficiently large n, 6,, will be in the interior of O and th(6,,) = 0, with arbitrarily large probability. But the equation th(6,,) = 0 is equivalent to (2.4.6) [Un(x)/1,,(x,6,,)d<,bw(x) = [Z,,(x,6,,)ii,,(x,6,,)d¢w(x). We shall show that n”2 x the left hand side of this equation converges in distribution to a normal r.v., while the right hand side of this equation equals R,,(6,, — 60), for all n 21,with R, = 20 —+- 0,,(1). To establish the first of these two claims, rewrite this r.v. as the sum 3,, -+- Sm + gnl + gn2 'i” 97,3 'i' 9,,4, where 5,, = /Un(x);ih(x)d 0, and 9190(1) is continuous in x E I. Then, under H0, nl/QSn —-)d 29 N(O, 2) , where 2 = zimhsof/EKux—Xmo—X)a?(X)nh z [02(xlmoo($)m£(x)92(x) f(x) dz. Moreover, iff is twice continuously difierentiable, and h satisfies (h3), then (2.4.7) n1/2|Sn1| = 0,,(1). Lemma 2.4.2 Under H0, (61), (62), (f), (k), (m1), (m2), (m4), (m5), (h1), (h2), (2-4-8) (9) Til/297:1 = 012(1)» (’9) ”mm = 0,,(1). 
(2-4-9) (6) n” 29713 = 012(1), (d) 711/ng = 019(1)- The proof of (2.4.7) is facilitated by the following lemma, which along with its proof appears as Theorem 2.2 part (2), in Bosq (1998). Lemma 2.4.3 Let fw be the kernel estimate associate with a kernel K‘ which sat- isfies a Lipschitz condition. If f is twice continuously differentiable with a compact 1 support, if wn is chosen to be on (log n/n)m where an ——> a0 > 0, then (10g;c fl)‘1(n/10gn)‘fi 81:11) lfwtr) - f (It)! —> 0, 61-S- for any positive integer k. Proof of Lemma 2.4.1. For convenience, we shall give the proof here only for the case d = 1, i.e., when uh(:r) is one dimensional. For multidimensional case, the ' 30 result can be proved by using linear combination of its components instead of uh(a:), and applying the same argument. Let sm- := th(:r - Xi)eiuh(:r)d 00, (2.4.10) E33,1 —) 2, (2.4.11) E {s§,,1(|snll > Til/2M} —> 0, VA > 0. But, E33,, E/Kh(:z: — X)8[4h($)d<,9(:v) >< /Kh(y - Xl€flh(y)d99(y) = f/EKm:—X)K,,(y—X)(I"(Xlflh(13)fih(y)dse($)d92(z) f(x) Hence (2.4.10) is proved. To prove (2.4.11), note that by the Holder inequality, the L.H.S. of (2.4.11) is 31 bounded above by A_6/2Tl_6/2E(Sn1)2+6 (fume — X)#h(x))2’iédso(r))2 [elm] . S A-6/2 ”_6/2E This upper bound is seen to be of the order 0((nhd)'5/2) = 0(1), by (h2), thereby proving (2.4.11). To prove (2.4.7), by the Cauchy-Schwarz inequality, the boundedness of uh(:1:), (2.3.6), and by Lemma 2.4.3, we obtain 983.. s Cn [(U.(x>9hzdso(z> sup (mo/fin) — 1 2 zEI = n Op((nlld)_1)0p((108k n>2 9‘?) = 0,, ((log,c n)2(log n)fi7 nad7h) = op(1),by (h3). This completes the proof of Lemma 2.4.1. [:1 Proof of Lemma 2.4.2. By the Cauchy-Schwarz inequality, Hal/29.412 s (1912/ / Uflxldwxl) (..1/2/ 11,249.90) —- flh(x)||2d99(r)) . By (2.3.5), and (112), (2.4.12) En1/2/U:($)dcp($) = 0(n'1/2h—d) = 0(1). To handle the second factor, first note that (1,,(93, (90) — ph(:c) is an average of centered i.i.d. r.v.’s. 
Using Fubini, and the fact that variance is bounded above by the second moment, we obtain that the expected value of the second factor of the above bound 32 is bounded above by (2.4.13) n’l/Q/EIIK),(I — X)r'ngo(:z:)[|2 deem) = 0(n'1/2h’d) = 0(1). This completes the proof of (2.4.8)(a). This together with (2.2.2) implies (2.4.8)(b). To prove (c), similarly, lin1/29n3ll2 S n/U§($)d— mom->1)? / (R.(x>)2999w(x> + / Rn|mh<49o>nds9w = can-+0.0). by (2.2.2), the assumption (m5), and by (2.4.1). This together with (m4) then implies that lanll = 0,,(1), and by the consistency of 6”, we also have ||Vnuffll = 0,,(1). Next, consider Ln. We have Ln = fflnbfloflflnwfln) -fln(9:.00)le¢w('-II) + ffln($»90)fl:($,90)d¢w($l = L711 + [1,0, say. But, by (2.2.1) and (m5), “Ln“ = 0,,(1), while an — fflhwigolfiflfigoldifidfl ! s / ((4144.90) — p.(x.9o)11299.,<4> +2 f ((44490) — 944,90)” (In. 0, andf is twice continuously difierentiable. Then, under H0, (2.4.15) Til/2(9), -— 90) = mini/23,, + 0,,(1). Consequently, n1/2(6n —— 60) => N(0. 2512231), where E is as in Lemma 2.4.1. Remark 2.4.1 Upon choosing g E f, one sees that E = f02(z)mgo($)mg;(a:)f(:1:)d$, 20 =/rngo(:r)mg;(a:)f($)dzr. It thus follows that in this case the asymptotic distribution of n1/2(6n — 60) is the same as that of the least square estimator. This analogy is in flavor similar to the one observed by Beran (1977) when pointing out that the minimum Hellinger distance estimator in the context of density fitting problem is asymptotically like the maximum likelihood estimator. Consider a: and 6 are one dimensional case. Let mg(:z:) = 61:, so 7349(2) = 2:. Let 6n be the minimum distance (MD) estimator, 6,, be the lease absolute distance (LAD) estimator. The variance of £49,, — 60) is denoted by V1, and = 03f12292($)f‘1(:r)dx (f1 9290(4))2 35 V1 The variance of \/r—i(6n — 60) is denoted by V2, and 1 V2 = W Let g(2:) = f2(:z:)l(a:). then a: f. $2f3(x)12($)dx V1 = 2 . 
(f1 x2f2(:r)l(x)) Now consider the example that X ~ N (0, 7'"), l (1:) = f‘1(a:), the error distribu- tion is N(0, 03), and I is a finite interval [—a, a], then V a: ffa 172f($ld$ of 1 (la x2f(z (1:13)2 ffa fiflxldxl 27m2 n02 V = e : ———£. 2 472 2T2 Take r = 1 and a large enough such that /a $2w($)da‘ > if“) x2w($)da:, 0. —oc where 7/1 stands for the standard normal density, then V1 < V2. Or take a = 1 and 7' small enough such that "HO 2 00 / y2w(y)dy > ;/ y2¢(y)dy, G r 00 then V1 < V2. Remark 2.4.2 Linear regression. Consider the linear regression model, where q = d+ 1, G) = Rd“, and m(r) = 61 + 6310, with 61 6 IR, 62 6 Rd. Because now the _ parameter space is not compact the above results are not directly applicable to this 36 model. But, now the estimator has a closed expression and this regression function satisfies the conditions (m1) - (m5) trivially. The same techniques as above yield the following result. With the notation in (2.3.1), in this case . , R1103) . EKh(.’L‘ - X) un(2:.9) E #1427) s ,uhm 2 1:” Kh(~73- Xilxi EKh(a:—X)X 230 = / 1 m g($)d$, En: [fln($)fln($)'d¢w($), :L‘ IBIL‘ _ 1 9’ Home» E ' f , f(x) dx’ IL‘ 231: MW) = [we — <9 — 9o)'4n(z>]299w(4~>. The positive definiteness of Z,, and direct calculations thus yield (9‘. — 90) = 2:3 / no) Un($)d90w(33l- From the fact that Z,, —-+ 20, in probability, parts (a) and (b) of Lemma 2.4.2, and from Lemma 2.4.1 applied to the linear case, we thus obtain that if (e2), (k) and (h3) hold, if the regression function is a linear parametric function, and if f ||x||2d0(r) < 00, f is twice continuously differentiable, then n1/2(6n — 60) = 251 / Un(:1:)[ih($)d N(0,2312251). Remark 2.4.3 Tightness. Consider when d = 1, from the definition, 0:, satisfies 37 the equation (2.4.16) A (mo; (:16) — m90(1:)) ma;($)dG(r) = An + Bn + C", where 1 n , dG'(:r) An = -— K x—Xie, ma-x . , film; "( ll "( )no) _ l n $_ -m _ _ r'n- dG(:c) Bn — /I‘(nt=ZIKh( X1) 00(Xz) Hal) 0,,(x) fh($), . 
dG($) Cu 2 EKhx—Xl m0X1 —mo:z: ma-x - , [I < M .() 9<>> MW) and #a = EKh(-’F — X1) (m90(X1l " m90(x))» and in stand for am/ae. The left hand side of (2.4.16) is approximately (a; — 9o) / miwcm. An and B, are op(1/\/r—il_i) by Cauchy-Schwarz inequality, consistency of 01;, and continuity of m on 6 E 9. But 0,, is approximately 40(4) f (a?) ' [fl/KW) (mach: — uh) — m60($)) TWOW) So if mao(') is not differentiable, and Vnh(m90(a: — h) — mgo(:1:)) is divergent, then Vnh(oz; — 60) is not tight. 38 2.5 Asymptotic distribution of the minimized dis- tance This section contains a proof of the asymptotic normality of the minimized distance th(6n). To state the result precisely, recall the definitions of C", CmC'n, I‘, f‘n from (1.0.7) and let P, := 2nd / / [EKh(a: — X)K,,(y — X)02(X)]2 d 7.0/2, is of the asymptotic size a, where .20 is the 100(1 — a)% percentile of the standard normal distribution. Our proof of this theorem is facilitated by the following five lemmas. Lemma 2.5.1 If (e1), (e2), (f), (g), (k) hold and if nhd —> 00, then nhd/2(th(60) —C,,) is asymptotically normally distributed with mean zero and vari- ance I‘. 39 Lemma 2.5.2 Suppose (e1), (62), (f), (1:), (m3), (m4), (m5), (h1), (h2) hold and Ea:4 < 00. Then nh‘l/2 th(60) — th(6n) = op(1). Lemma 2.5.3 Suppose, in addition to (61), (62), (k), (m3), (m4), (m5). and Be4 < 00, f is twice continuously difierentiable and h satisfies (h3). Then, ”ltd/2 thlgol — thlgol = 012(1)- Lemma 2.5.4 Under the same conditions as in Lemma 2.5.3, nhd/2(C'n — C3,) = 0,,(1). Lemma 2.5.5 Under the conditions of Theorem 2.5.1., f‘n — I‘ = 0,,(1). Conse- quently, the positive definiteness ofF implies, lf‘nF‘l — 1! = 0,,(1). Proof of Lemma 2.5.1. Note that th(60) can be written as the sum of Cu and Mug, where Mn2 = n—22/Kh($—Xi)Kh(IL‘—Xj) €15jd§0($). i¢j We shall prove that (2.5.1) nhd/2Mn2 is AN(O,1",,). To prove (2.5.1), we shall use Theorem 1 of Hall (1984) which is reproduced here for the sake of completeness. Theorem 2.5.2. 
Let TC, 1 S i S n, be i.i.d. random vectors, and let Un. I: Z Hn(Xi9X~j)i Gn($1y) = EHH(X1?$)HTI(X1ay)’ 131(an 40 where Hn is a sequence of measurable functions symmetric under permutation, with EHn()~(1,)~(2)|)~(1) = 0, a.s., and EH:(X1,X2) < 00, for each n _>_ 1. If - - - - - - 2 EGE,(X1,X2) +n-1EH3(X1,X2)] / [EH§(X1,X2)] —+ 0, then Un is asymptotically normally distributed with mean zero and variance n2 EH3 (U?) f(awrgwmi f(f KKdu>2dv by the continuity of 02 and f. This complete the proof of Lemma 2.5.1. [:1 43 Note that 0,, = n‘lEfKflx—Xl) 5f (190(27). Let 6,, :2 Ef Kflx—Xflef (190(1). Then, by routine calculations, , 2 E (Mid/2(6), — Cn)) n 2 = E (n-lhd/z 2 [/ Kflx — X06? (190(3) — enJ) i=1 3 72,-1th (/ Kin: — X06? (190(1))2 (f Km: — X1) d¢(x))2€i] = 0((nhdl’1) = 0(1). = n'lth Combining this with the Lemma 2.5.1, one obtains that nhd/2(th(60) — Cu) is AN (0, F”). Proof of Lemma 2.5.2. Recall the definitions of Un and Zn from (2.3.1). To prove part (b), add and subtract m90(X,-) to the 2"" summand inside the square integrand of th(én), to obtain that th(60) — th(én) = 2/Un($)Zn($,én) d¢w(:1:) —/Z§(3:,én) dgbw(2:) = 2Q1 - Q2~ 533’- We need to show that (2.5.7) (2) nhd/zQI 2 010(1), (22) nILd/2Q2 = 0,,(1). 44 By subtracting and adding (én — 90)Trhgo(X,-) to the 2"" summand of the second factor of integrand in Q1, we can rewrite Q1 as the sum of Q11 and Q12, where / Una) Q12 = (93. — 001T / Und¢w Q11 12-1 Z Kh(.’L‘ - Xi)dm'] d¢w(:z:), i=1 where dm- are as in (2.4.4). By (2.4.1), for every n > 0, there is a k < 00, N < 00, such that P(An) 2 1— n, for all n > N, where An := {(nhd)1/2||én — 00“ < k}. By the Cauchy-Schwarz inequality, (2.2.2), (2.3.6) and the fact that (2.5.8) / (Rn(:r))2 dam = 0,.(1), we obtain that on the event An, nhd/leul is bounded above by nl/zllén — 00||(nhd)1/2 sup in—I-L— Op((nhd)"1/2). 
i»(nhd)”’ll9-90Hpn(x,é.)d¢w(z), Q1222 = (én — 60)T/Un($) [lin($aén) — #413390) d‘iaw(x) Arguing as above, on the event An, (nhd/2IQ122I)2 is bounded above by nzhduén — eon? max 11mm» — moo(Xz-)||20p((nhd)‘1) = 0pc), lgign by (2.2.2), (2.3.6), (2.5.8), and assumptions (m5) and (h2). 45 Next, note that Qm is the same as the expression in the left hand side of (2.4.6). Thus, it is equal to (2.5.9) ((9,. — 601T / Zn(x.én>un(x,én)d¢w(z) = (6.. — 001T]z. +<én — 6017‘ f 2.42:. 62.) [w.én) — mm] mm 2 D1 + D2, say, But, by the Cauchy-Schwarz inequality, (2.2.2), (2.3.19), and (2.5.8), nhd/2IDll is bounded above by nhd/2llén ‘ 90l|20p(1l = 010(1), by Theorem 2.4.1 and the assumption (m5) and (h2). Similarly, one shows nhd/2|Dgl is bounded above by nhd/znén — 9o||20p(1) = 0pm. This completes the proof of (2.5.7)(i). The proof of (2.5.7)(ii) similar. Details are left out for the sake of brevity. D Proof of Lemma 2.5.3. Note that nhd/21thwo) — moon 5 mid/2 / U2 sup ”Ra/fax) — 1| 2:61 = mild/20A(nhd)‘1)0p((10gk”)(log n/nlfi) = 0,,(1), by (2.3.5) and Lemma 2.4.3. Hence the lemma. (3 Proof of Lemma 2.5.4. Let t:- = mg, (Xi) — mom), Ana) == mm) (rm) — Wm) . Then, 1 ” - 0,, = E: / K§(z—x,)(e. —t,-)2d-W) = 0pm. To prove the part (b) of (2.5.10), note that Aug can be written as the sum of Arm, Arm, and An233 where A... = 71-22 / X2 MUM) An22 = Z/Kh AM )dflx ) An23 = 71—2 til/Khwii) {)1'5 'it A nl‘( )d(,0(.’13). By taking the expected value and the usual calculation, one obtains that n-ZZ/Xm—X X>§d I“. 49 For the sake of convenience, write Kh(2: — X ,-) by K 1(2). Now, rewrite P" as the sum of the following terms: B1 = hd “1‘2: (fKr-(ale )(Ei-t1)(€r-tj)dso($))2. B2 = (fin-2 Z (/ X.(:c1X.-(x1(e.- — t.1(e.- -t1)An($)d¢($))2. B. = 422:”: Um. -(-t.-1(o- 4.1111(2)) x (fw X.(z1X.(e~1(e. — 111(5. — trlAn($)dde dz, i=1 (3”,, = [I (Z K,‘~:(r — X,)(Y, — )(énxy) (5: Kw(:c — X0) dz. The value of the test statstic is calculated by nhd/2(th(én) — C1,). 
In order to plot a density curve, we repeated the above sampling and calculation 1000 times. The density curves of the normalized θ̂_n and of the test statistic are plotted using the density-plot command with the Gaussian kernel option in S-PLUS 2000. We also ran the above simulation for n = 100 and n = 200. The first three graphs in Figure 2.1 are the Monte Carlo density curves of √n(θ̂_n − 1) from 1000 runs with sample sizes n = 50, n = 100, and n = 200, respectively. The fourth graph is the N(0, (0.173025)²) density, the density curve of the limiting distribution of √n(θ̂_n − 1) based on the theorem obtained in Section 2.4. The first three graphs in Figure 2.2 are the Monte Carlo density curves of nh^{d/2}(M_h(θ̂_n) − Ĉ_n) from 1000 runs with sample sizes n = 50, n = 100, and n = 200, respectively. The fourth graph is the density curve of the limiting distribution of nh^{d/2}(M_h(θ̂_n) − Ĉ_n) given in Theorem 2.5.1, which is N(0, (0.026344)²) in the present case.

Figure 2.1: The density curve of √n(θ̂_n − 1). (Panels: sample sizes 50, 100, 200, and the limiting distribution of the estimate of the parameter.)

Figure 2.2: The density curve of nh^{d/2}(M_h(θ̂_n) − Ĉ_n). (Panels: sample sizes 50, 100, 200, and the limiting distribution of the test statistic.)

The graphs show that the distribution of √n(θ̂_n − 1) resembles the asymptotic normal distribution quite well even for sample size 50. The distribution of nh^{d/2}(M_h(θ̂_n) − Ĉ_n) has a small negative bias compared with the asymptotic normal distribution for all three sample sizes, but the bias decreases as n increases.

A simulation for d = 2 and m = 2 was also conducted. The hypothesis to be tested is

H_0 : μ(x) = 0.5x_1 + 0.8x_2, vs. H_1 : H_0 is not true.
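The density-plot step used above in S-PLUS can be mimicked directly: a Gaussian-kernel density estimate of the 1000 normalized replicates, overlaid against the limiting normal curve. The sketch below uses Silverman's rule-of-thumb bandwidth, which is a common default and an assumption here (the S-PLUS default may differ); the sample of draws stands in for the 1000 simulated values of √n(θ̂_n − 1).

```python
import numpy as np

def gaussian_kde_curve(samples, grid):
    """Gaussian-kernel density estimate on `grid`, Silverman rule-of-thumb bandwidth."""
    n = len(samples)
    bw = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z * z).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
# stand-in for 1000 Monte Carlo values whose limit is N(0, 0.173025^2)
draws = 0.173025 * rng.standard_normal(1000)
grid = np.linspace(-1, 1, 401)
dens = gaussian_kde_curve(draws, grid)
area = float(np.trapz(dens, grid))   # a density curve should integrate to about 1
print(round(area, 3))
```

Plotting `dens` against `grid` alongside the N(0, 0.173025²) density reproduces the kind of comparison shown in Figures 2.1 and 2.2.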
The parametric model to be fitted is {m_θ(x_1, x_2) = θ_1 x_1 + θ_2 x_2, θ = (θ_1, θ_2)ᵀ ∈ ℝ², x = (x_1, x_2)ᵀ ∈ ℝ²}. We chose the following five models to generate simulated data from:

model 0. Y_i = 0.5X_{1i} + 0.8X_{2i} + ε_i,
model 1. Y_i = 0.5X_{1i} + 0.8X_{2i} + 0.3(X_{1i} − 0.5)(X_{2i} − 0.2) + ε_i,
model 2. Y_i = 0.5X_{1i} + 0.8X_{2i} + 0.3X_{1i}X_{2i} − 0.5 + ε_i,
model 3. Y_i = 0.5X_{1i} + 0.8X_{2i} + 1.4(exp{−0.2X_{1i}²} − exp{0.7X_{2i}²}) + ε_i,
model 4. Y_i = I{X_{2i} > 0.2} X_{1i} + ε_i.

The error distribution is N(0, 0.3). The X_{1i} are i.i.d. N(0, 0.7) and the X_{2i} are i.i.d. N(0, 1). The sample sizes chosen are 30, 50, 100, and 200. The nominal level used to implement the test is α = 0.05. There are 1000 replications for each combination of (model, sample size). Data from model 0 are used to study the empirical size, and data from models 1 to 4 are used to study the empirical power of the test. The empirical size (power) is computed as the

relative frequency of { value of the test statistic > F⁻¹(1 − α) },

where F is the asymptotic distribution of the test statistic under H_0. The bandwidth h is chosen to be n^{−1/4.5} and w is chosen to be (log n/n)^{1/(d+4)}; the measure G is taken to be the uniform distribution on [−1, 1]. The density curves of the normalized θ̂_n and of M_h(θ̂_n) are plotted using the density-plot command with the Gaussian kernel option in S-PLUS 2000 for one dimension, and Surface-Spline Fine Grid for two dimensions, where θ̂_n = (θ̂_{1n}, θ̂_{2n})ᵀ and θ_0 = (0.5, 0.8)ᵀ. The results of the power study are shown in Table 2.1, which gives the empirical sizes and powers for testing model 0 against models 1 to 4. The simulation results for the densities of √n(θ̂_n − θ_0) and of the minimum distance test statistic are shown in Figures 2.3 to 2.9. Figure 2.3 shows the Monte Carlo density curves of √n(θ̂_{1n} − 0.5) from 1000 runs with sample sizes n = 30, n = 50, n = 100, and n = 200, respectively. Figure 2.4 shows the Monte Carlo density curves of √n(θ̂_{2n} − 0.8). Figure 2.5 is the Monte Carlo density surface of √n(θ̂_n − θ_0) when n = 30.
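The empirical size and power computation described above is just a tail-frequency count against the null critical value. The sketch below assumes a standard normal null distribution, whose 0.95 quantile is 1.6449, and uses synthetic statistic values in place of the simulated test statistics.

```python
import numpy as np

def empirical_rejection_rate(stats, crit):
    """Relative frequency of {test statistic > F^{-1}(1 - alpha)}."""
    stats = np.asarray(stats)
    return float((stats > crit).mean())

rng = np.random.default_rng(5)
crit_095 = 1.6449                                  # N(0,1) quantile F^{-1}(0.95)
null_stats = rng.standard_normal(10000)            # statistics behaving as under H0
shifted_stats = 2.0 + rng.standard_normal(10000)   # statistics under an alternative
size = empirical_rejection_rate(null_stats, crit_095)
power = empirical_rejection_rate(shifted_stats, crit_095)
print(round(size, 3), round(power, 3))
```

With the null statistics the rejection rate estimates the size (close to 0.05 here); with the shifted statistics it estimates the power, which is how the entries of Table 2.1 are produced.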
Figure 2.6 is the Monte Carlo density surface of √n(θ̂_n − θ_0) when n = 50. Figure 2.7 is the Monte Carlo density surface of √n(θ̂_n − θ_0) when n = 100. Figure 2.8 is the Monte Carlo density surface of √n(θ̂_n − θ_0) when n = 200. Figure 2.9 is the Monte Carlo density of the test statistic under H_0 with sample sizes n = 30, n = 50, n = 100, and n = 200. In the following figures, a dotted line is for n = 30, a dashed line is for n = 50, a long-dashed line is for n = 100, and a heavy solid line is for n = 200.

Table 2.1: Empirical sizes and powers for testing model 0 vs. models 1 to 4.

           n = 30   n = 50   n = 100   n = 200
model 0     0.005    0.022    0.036     0.049
model 1     0.003    0.062    0.670     0.895
model 2     0.931    0.999    1.000     1.000
model 3     0.461    0.975    1.000     1.000
model 4     0.035    0.368    0.977     1.000

Figure 2.3: The density of √n(θ̂_{1n} − 0.5).
Figure 2.4: The density of √n(θ̂_{2n} − 0.8).
Figure 2.5: The two-dimensional density of √n(θ̂_n − θ_0) when n = 30.
Figure 2.6: The two-dimensional density of √n(θ̂_n − θ_0) when n = 50.
Figure 2.7: The two-dimensional density of √n(θ̂_n − θ_0) when n = 100.
Figure 2.8: The two-dimensional density of √n(θ̂_n − θ_0) when n = 200.
Figure 2.9: The density of the test statistic under H_0.
Chapter 3

Minimum Distance Autoregressive Model Fitting

3.1 Introduction

This chapter discusses the application of the minimum distance idea to fitting a parametric model to an autoregressive function. To be specific, let {X_n} be a real valued strictly stationary process having finite expectation. The autoregressive function is defined to be

μ(x) = E(X_n | X_{n−1} = x), n ∈ ℤ.

Let {m_θ(·) : θ ∈ Θ}, Θ ⊂ ℝ^m, Θ compact, be a given set of parametric functions. The statistical problem of interest here is to test the goodness-of-fit hypothesis

H_0 : μ(x) = m_{θ_0}(x), for some θ_0 ∈ Θ and for all x ∈ I, vs. H_1 : H_0 is not true,

based on a sample {X_i : i ∈ ℤ₊} from the stochastic process, where I is a compact subset of ℝ.

In the context of the regression fitting problem under the i.i.d. setup, the asymptotic properties of the minimum distance estimator θ_n of the parameter were studied, where θ_n is defined to be the argument that minimizes a transformation of the L_2(G) distance between a nonparametric estimate of the regression function μ and the parametric function m_θ. It was shown that the so defined minimum distance estimator is consistent and asymptotically normally distributed at the rate √n. The corresponding minimized distance is also asymptotically normally distributed. Thus a class of tests can be constructed from the suitably standardized minimum distance.

Encouraged by these results in the i.i.d. case, we apply the same idea to autoregressive model checking. When dealing with regression model fitting, to reduce the bias caused by f̂_h in M_h^w(θ) defined in Chapter 1, we used an optimal window width for the Nadaraya–Watson type estimator of f, i.e. f̂_w. But it still causes bias.
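The autoregressive function μ(x) = E(X_n | X_{n−1} = x) that the test targets can be estimated nonparametrically by a Nadaraya–Watson smoother of X_i on X_{i−1}. The sketch below does this for an AR(1) process with μ(x) = 0.8x; the Gaussian kernel, the bandwidth h = 0.3, and the evaluation points are choices made for the demo.

```python
import numpy as np

def nw_autoregression(x, h, points):
    """Nadaraya-Watson estimate of mu(x) = E(X_t | X_{t-1} = x)."""
    lag, resp = x[:-1], x[1:]
    z = (points[:, None] - lag[None, :]) / h
    w = np.exp(-0.5 * z * z)
    return (w @ resp) / w.sum(axis=1)

rng = np.random.default_rng(11)
n = 20000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):                       # AR(1): mu(x) = 0.8 x
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()
pts = np.array([-1.0, 0.0, 1.0])
mu_hat = nw_autoregression(x, h=0.3, points=pts)
print(np.round(mu_hat, 2))
```

The estimated curve should track the line 0.8x near the center of the stationary distribution; it is exactly this kind of kernel estimate whose distance from the parametric family m_θ the minimum distance test measures.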
Hence in this chapter we consider using a slightly different L_2 distance, the M_h(θ) of (1.0.8), which is the L_2(G) distance between m_θ f and the kernel estimator

(1/n) Σ_{i=1}^n K_h(x − X_{i−1}) X_i,

where G is a σ-finite measure with bounded Lebesgue density g. The estimator of the parameter is defined as in (1.0.9). The test statistic T_n is defined to be

T_n = ( n h^{1/2} / Γ̂_n^{1/2} ) ( M_h(θ_n) − n⁻² Σ_{i=1}^n ∫_I K_h²(x − X_{i−1}) dG(x) ε̂_i² ),

where Γ̂_n is a consistent estimator of

Γ² := 2 (σ²)² ∫_I f²(x) g²(x) dx ∫ ( ∫ K(u) K(u+v) du )² dv,

and ε̂_i = X_i − m_{θ_n}(X_{i−1}). Similar to the discussion in Chapter 2, Γ̂_n can be chosen to be

Γ̂_n = 2 ∫_I ( n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) ε̂_i² )² g²(x) dx ∫ ( ∫ K(u) K(u+v) du )² dv.

A stochastic process is said to be geometrically strong mixing (GSM) if there exist c_0 > 0 and ρ ∈ [0, 1) such that α(k) ≤ c_0 ρ^k, k ≥ 1, where

α(k) := sup_t α( σ{X_s, s ≤ t}, σ{X_s, s ≥ t + k} ),   α(𝒜, ℬ) := sup_{A∈𝒜, B∈ℬ} |P(A ∩ B) − P(A) P(B)|,

and σ{X_s, s ≤ t} stands for the σ-field generated by {X_s, s ≤ t}. It is also pointed out in Bosq (1998) that the usual linear processes are GSM.

Here we shall state the needed assumptions.

(M) The time series {X_i; X_i ∈ ℝ, i ∈ ℤ}, where ℤ stands for the set of all integers, is strictly stationary, satisfies the GSM mixing condition, and X_i = μ(X_{i−1}) + ε_i.

McKeague and Zhang (1994) pointed out that it is easier to check geometric ergodicity, which implies strong mixing with a geometric mixing rate. From Tweedie (1983), one obtains that a sufficient (but by no means necessary) condition for geometric ergodicity of a nonlinear autoregressive process is that μ and σ are bounded on compact sets, where σ² = E(ε_1² | X_0).

About the errors and the underlying design we assume the following:

(S1) The autoregressive function μ(·) satisfies ∫ μ²(x) dG(x) < ∞, where G is a σ-finite measure on ℝ.

(S2) The {ε_i} are i.i.d., ε_{i+1} is independent of X_j, j = 0, …, i, and σ² := Eε_1².

(S3) X_0 has a twice continuously differentiable Lebesgue density f that is bounded from below on I. Denote the first and second derivatives of f by f′ and f″, respectively.
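The remark that linear processes are GSM can be made concrete: while the mixing coefficients α(k) themselves are hard to compute, the geometric decay of dependence in a linear AR(1) process is visible in its autocorrelations, which equal 0.8^k for the model below. This is an illustration of geometric decay of dependence, not a computation of α(k) itself; the sample size and seed are demo choices.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 200000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):                      # stationary linear AR(1)
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

xc = x - x.mean()
var = float(np.mean(xc * xc))
# empirical autocorrelations at lags 1..5; theory: acf(k) = 0.8**k
acf = [float(np.mean(xc[:-k] * xc[k:]) / var) for k in range(1, 6)]
print([round(a, 2) for a in acf])
```

The printed sequence decays geometrically, mirroring the bound α(k) ≤ c_0 ρ^k that condition (M) imposes.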
We also suppose that sup_{t_1,…,t_6} ||f_{t_1,…,t_6}||_∞ < ∞, where f_{t_1,…,t_6} is the joint density of X_{t_1}, X_{t_2}, X_{t_3}, X_{t_4}, X_{t_5}, and X_{t_6}.

About the kernel function K we shall assume the following: conditions (K), (A1), and (A2) are the same as those in Chapter 2.

(A3) For each θ, m_θ(x) and ṁ_{θ_0}(x) are a.s. continuous in x w.r.t. the integrating measure G.

(A4) The map θ ↦ m_θ is continuous in L_2(G): for any sequence θ_n, θ ∈ Θ, ||θ_n − θ|| → 0 implies ρ(m_{θ_n}, m_θ) → 0.

(A5) For every ε > 0, there is an N_ε < ∞ such that for every 0 < k < ∞,

max_{1≤i≤n, (nh)^{1/2}||θ−θ_0||≤k} h^{1/2} ||ṁ_θ(X_i) − ṁ_{θ_0}(X_i)|| = o_p(1).

About the bandwidth h we shall make the following assumption:

(H) h ∼ n^{−a} for some a > 0, and there is a γ > 0 such that n h^{2+γ} → ∞.

In this chapter we will often use an inequality from Bosq (1998). We list it here as a lemma.

Lemma 3.2.1 Let X and Y be real valued random variables such that X ∈ L^q(P), Y ∈ L^r(P), where q, r > 1 and 1/q + 1/r = 1 − 1/p. Then

|Cov(X, Y)| ≤ 2p (2α)^{1/p} ||X||_q ||Y||_r;

in particular, |Cov(X, Y)| ≤ 4α ||X||_∞ ||Y||_∞, where α = α(σ(X), σ(Y)) and ||X||_∞ = inf{b : P(|X| > b) = 0}.

Analogous to the notation defined in Section 2.3, we introduce some notation that will be needed in this chapter:

U_n(x, θ) = (1/n) Σ_{i=1}^n K_h(x − X_{i−1})(X_i − m_θ(X_{i−1})),   U_n(x) = U_n(x, θ_0) = (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) ε_i,

and Z_n(x, θ), μ_n(x, θ), and the related quantities are as defined in Section 2.3, with X_i replaced by X_{i−1}. Note that

M_h(θ_0) = ∫_I U_n²(x) dG(x).

We also introduce the following notation:

J_h := {y ∈ ℝ : |x − y| ≤ h, x ∈ I}.

3.3 Consistency of θ_n

The main result of this section is the consistency of θ_n. Similar to the proof of consistency in the previous chapter, we first prove the consistency of θ*_n in Lemma 3.3.3, where θ*_n is defined to be θ*_n := argmin_{θ∈Θ} M*_h(θ), and

M*_h(θ) := ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) X_i − m_θ(x) f(x) )² dG(x).

This result is in turn used to prove the consistency of θ_n in Theorem 3.3.1.
Lemmas 3.3.1 and 3.3.2 list some results that will be needed in the proofs of Lemma 3.3.3 and the theorem.

Lemma 3.3.1 Let ψ_n be a sequence of m-dimensional vectors of real valued functions defined on ℝ, bounded on J_h uniformly in n. Then, under condition (M), the following hold for all x ∈ I and all 0 < a < 1:

(a) n⁻¹ Σ_{i=1}^n ( K_h(x − X_{i−1}) ψ_n(X_{i−1}) − E K_h(x − X_0) ψ_n(X_0) ) = O_p( 1/√(n h^{1+a}) ),

(b) ∫_I || n⁻¹ Σ_{i=1}^n ( K_h(x − X_{i−1}) ψ_n(X_{i−1}) − E K_h(x − X_0) ψ_n(X_0) ) || dG(x) = O_p( 1/√(n h^{1+a}) ),

(c) E ∫_I || n⁻¹ Σ_{i=1}^n ( K_h(x − X_{i−1}) ψ_n(X_{i−1}) − E K_h(x − X_0) ψ_n(X_0) ) ||² dG(x) = O( 1/(n h^{1+a}) ),

where || · || stands for the usual Euclidean norm on ℝ^m, i.e. ||(a_1, …, a_m)ᵀ||² = a_1² + ⋯ + a_m².

Proof. Note that the lemma holds for {ψ_n} if and only if it holds for each j-th component of {ψ_n}, 1 ≤ j ≤ m; hence we only need to prove the lemma for the case m = 1. Recall that m is the dimension of Θ.

Corollary 3.3.1 Under condition (M), for any continuous function ψ bounded on J_h,

n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) ψ(X_{i−1}) → ψ(x) f(x), in probability, for all x ∈ I.

Proof. Note that by the continuity of f and ψ,

E K_h(x − X_i) ψ(X_i) = ∫ K(u) ψ(x − uh) f(x − uh) du → ψ(x) f(x),

so the corollary follows by applying Lemma 3.3.1(a) to ψ_n = ψ. □

Lemma 3.3.2 Under conditions (S1), (S2), and (K),

E ∫_I Z_n²(x, θ_n) dG(x) → 0.

Proof. By adding and subtracting K_h(x − X_{i−1}) X_i to the i-th summand in Z_n(x, θ_n), and expanding the quadratic term, one obtains

∫_I Z_n²(x, θ_n) dG(x) ≤ 2 M_h(θ_n) + 2 M_h(θ_0) ≤ 4 M_h(θ_0).

The second inequality follows from the definition of θ_n. Therefore, to prove the lemma it suffices to show that E M_h(θ_0) → 0. Note that by Fubini,

E M_h(θ_0) = (σ²/n) ∫_I E K_h²(x − X_0) dG(x) + (2/n²) Σ_{i<j} E ∫_I K_h(x − X_{i−1}) K_h(x − X_{j−1}) dG(x) ε_i ε_j.

The first term is O((nh)⁻¹) by direct calculation. The second term is 0, as is seen by first taking the conditional expectation given σ{X_s : s ≤ j}. Hence

(3.3.3) E M_h(θ_0) = E ∫_I U_n²(x) dG(x) = O((nh)⁻¹).

So the lemma is proved. □

Lemma 3.3.3 Under conditions (S1), (S2), (S3), (K), (A1), and (A4), θ*_n → θ_0, in probability under H_0.

Proof.
The proof is similar to that of Corollary 3.1 in Chapter 2. According to Lemma 3.1 in Chapter 2, it suffices to show that

(3.3.4) ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) X_i − m_{θ_0}(x) f(x) )² dG(x) = o_p(1).

Note that by plugging in X_i = m_{θ_0}(X_{i−1}) + ε_i, and adding and subtracting E K_h(x − X_{i−1}) m_{θ_0}(X_{i−1}) in the i-th summand of the integrand, the left hand side of (3.3.4) is bounded above by the sum of the following three terms:

(a) ∫_I U_n² dG(x),
(b) ∫_I ( (1/n) Σ_{i=1}^n [ K_h(x − X_{i−1}) m_{θ_0}(X_{i−1}) − E K_h(x − X_{i−1}) m_{θ_0}(X_{i−1}) ] )² dG(x),
(c) ∫_I ( E K_h(x − X_0) m_{θ_0}(X_0) − m_{θ_0}(x) f(x) )² dG(x).

The term (a) is O_p(1/(nh)) by (3.3.3). The term (b) is o_p(1) by Lemma 3.3.1(c) with ψ_n = m_{θ_0}. The term (c) is o(1) because it equals

∫_I ( ∫ K(u)( m_{θ_0}(x − uh) f(x − uh) − m_{θ_0}(x) f(x) ) du )² dG(x) = o(1),

by the continuity of m_{θ_0} and f, and the compactness of I. Hence (3.3.4) holds, and so does the lemma. □

Now we are ready to present the main theorem of this section.

Theorem 3.3.1 Under conditions (M), (S1), (S2), (S3), (K), (A1), and (A4), θ_n → θ_0, in probability under H_0.

Proof. The proof of this theorem is similar to that of Theorem 3.1 in Chapter 2; here we only sketch it. Recall the definition of ρ from Section 2.2 and note that M*_h(θ) = ρ²(r_n, m_θ f), where r_n(x) := n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) X_i. By the same argument as in the proof of Theorem 2.3.1, with M^w and M_h^w replaced by M*_h and M_h, it suffices to prove the following result:

(3.3.5) sup_{θ∈Θ} |M_h(θ) − M*_h(θ)| = o_p(1).

To prove (3.3.5), add and subtract n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) m_θ(X_{i−1}) inside the parentheses of M*_h(θ), expand the quadratic, and use the Cauchy–Schwarz inequality on the cross product, to obtain that the left hand side of (3.3.5) is bounded above by

sup_{θ∈Θ} C_n(θ) + 2 ( sup_{θ∈Θ} C_n(θ) )^{1/2} ( sup_{θ∈Θ} M_h(θ) )^{1/2},

where

C_n(θ) := ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1})( m_θ(X_{i−1}) − m_θ(x) ) )² dG(x).

To prove (3.3.5), it thus suffices to prove that

(a) sup_{θ∈Θ} C_n(θ) = o_p(1), and (b) sup_{θ∈Θ} M_h(θ) = O_p(1).

First we prove (a).
Note that K_h(x − X_{i−1}) is nonzero only if X_{i−1} ∈ 𝒥_1 for large n such that h ≤ 1, where 𝒥_1 := {y ∈ ℝ : |x − y| ≤ 1, x ∈ I}; so C_n(θ) is bounded above by

sup_{|y−x|≤h, x∈I, y∈𝒥_1} |m_θ(y) − m_θ(x)|² ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) )² dG(x).

As a consequence of Lemma 3.3.1(c) with ψ_n ≡ 1,

(3.3.6) ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) )² dG(x) = O_p(1).

And

sup_{θ∈Θ} sup_{|y−x|≤h, x∈I, y∈𝒥_1} |m_θ(y) − m_θ(x)| = o(1),

because of the continuity of m and the compactness of Θ and 𝒥_1. Hence sup_{θ∈Θ} C_n(θ) = o_p(1).

Next we prove (b). By plugging in X_i = m_{θ_0}(X_{i−1}) + ε_i, one obtains that M_h(θ) is bounded above by

2 ∫_I U_n²(x) dG(x) + 2 ∫_I Z_n²(x, θ) dG(x).

The first term is o_p(1) by (3.3.3). For large n such that h ≤ 1, the second term is bounded above by

4 sup_{θ∈Θ, y∈𝒥_1} m_θ²(y) ∫_I ( (1/n) Σ_{i=1}^n K_h(x − X_{i−1}) )² dG(x) = O_p(1),

by the continuity of m, the compactness of Θ and 𝒥_1, and (3.3.6). Hence sup_{θ∈Θ} M_h(θ) = O_p(1). So (3.3.5) is proved, and so is the theorem. □

3.4 Asymptotic distribution of √n(θ_n − θ_0)

In this section we prove the asymptotic normality of θ_n. Before that, we introduce some notation that will be used in this section. Define

(3.4.1) ξ_n(x) := E K_h(x − X_0) ṁ_{θ_0}(X_0),   η_n(x) := m_{θ_n}(x) − m_{θ_0}(x) − ṁ_{θ_0}ᵀ(x)(θ_n − θ_0),

and let ξ(x) and η² be as defined in (1.0.10) of Chapter 1. Note that under condition (M), θ_n is a solution of the equation ∂M_h(θ)/∂θ = 0, i.e.

∫_I U_n(x, θ_n) μ_n(x, θ_n) dG(x) = 0.
So the theorem about the asymptotic normality of 9,, follows. Now we start with three lemmas. Lemma 3.4.1 Under the conditions (M), (51), (52), 5(3), (K), (A1), and (A4), there is a function f such that the following hold: lunwo) — doll = 0.0), (a) SUP (b) sup Hm, 9..) — as!) = 0.0). 261' Lemma 3.4.2 Let Z be a real valued continuous function on I. Under the conditions (M), (51), (52), (53), (K), (AI), (A3), and (A4), fi/ (£21041? — Xi—1)l(Xi—1)Ei€n($)) dG(:1:) 78 converges in distribution to a normal random vector with mean zero and covariance matrix given by 0.1/W >m9.< MW 19% WW I The result is also true when {n are replaced by 5. Lemma 3.4.3 Under the conditions (M), (51), (52), (S3), (K), (A1), (A2), (A3), and (A4), (3-4'3lll0n — QOlI‘I/(i‘ ZK’I( x — Xi 1)nn(Xi—1))€($)d0($)= 012(1)- We will state the main theorem of this section. Theorem 3.4.1 Under the conditions (M), (51), (52), (53), (K), (A1), (A2), (A3), and (A4), was, — 60) converges in distribution to a normal random vec- tor with mean zero and covariance matrix 20 172261, where 20, n2 are as defined in (1.0.10). Proof. Note that the right hand side of (3.4.2) can be written as (6,. — 00)R,., where R, is a sum of following terms: R... -—- [fln($a90)fi:($,9nldc($l Rn2 : _/I(%IZIK’1($—Xi-€T)H—6£§%ll)lfln(x,6n)dG($). By Lemma (3.41), Rnl- — fI($ (II)dG( ) + op(1). By Lemma (3.4.3) and Lemma (3.4.1), Rug = op(1). Hence R4. converges in probability to 20. 79 Note that by adding and subtracting Kh(.’L‘ — Xi_1)€($) to the ith summand, the left hand side of (3.4.2) can be written as a sum of the following two terms: L1,. = fl madame), Li. [I Una) 112446..) - 6(4)] 40(4). By Lemma 3.4.2 with l = 1 and En = {, URL), converges in distribution to a normal random vector. By Lemma 3.4.1 and (3.3.3), the term L312 = op((nh)'1/2) . So the left hand side of (3.4.2) is op((nh)‘1/2) and the right hand side of (3.4.2) is (0,, — 60)Rn where Rn converges to 2 in probability. Hence (3.4.4) (6,. — 60) = op((nh)-1/2). 
Next we shall show that (θ_n − θ_0) is actually O_p(1/√n). Note that by adding and subtracting K_h(x − X_{i−1}) ṁ_{θ_0}(X_{i−1}) − ξ_n(x) to the i-th summand, the left hand side of (3.4.2) can also be written as the sum of the following three terms:

L*_{n1} = ∫_I U_n(x) ξ_n(x) dG(x),
L*_{n2} = ∫_I U_n(x) ( μ_n(x, θ_0) − ξ_n(x) ) dG(x),
L*_{n3} = ∫_I U_n(x) ( μ_n(x, θ_n) − μ_n(x, θ_0) ) dG(x).

By Lemma 3.4.2 with l ≡ 1, √n L*_{n1} converges in distribution to a normal random vector. The term L*_{n2} = o_p(1/√n) by the Cauchy–Schwarz inequality, Fubini, Lemma 3.3.1(c), and (3.3.3). The term L*_{n3} = o_p(1/√n) by (3.4.4), (3.3.3), and assumption (A5). Combine the above discussion to conclude that

R_n (θ_n − θ_0) = L*_{n1} + o_p(1/√n).

Hence the theorem holds by Lemma 3.4.2. □

Next we prove the three lemmas.

Proof of Lemma 3.4.1. We will prove a slightly more general form of this lemma: for any continuous function l on I,

(3.4.5) sup_{x∈I} | n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) l(X_{i−1}) − l(x) f(x) | → 0, in probability,

where f is the density function of X_0. Because l(x) f(x) is continuous on the compact set I, it is bounded on I. So

sup_{x∈I} | E n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) l(X_{i−1}) − l(x) f(x) | = sup_{x∈I} | ∫ K(u)[ l(x − uh) f(x − uh) − l(x) f(x) ] du | ≤ sup_{x∈I, |y−x|≤h} | l(y) f(y) − l(x) f(x) | → 0.

In order to complete the proof of (3.4.5), we still need to show that

(3.4.6) sup_{x∈I} | n⁻¹ Σ_{i=1}^n [ K_h(x − X_{i−1}) l(X_{i−1}) − E K_h(x − X_0) l(X_0) ] | = o_p(1).

Let ζ_n(x) := n⁻¹ Σ_{i=1}^n K_h(x − X_{i−1}) l(X_{i−1}). Consider covering the compact set I ⊆ B = {x ∈ ℝ : |x| ≤ b}, for some b < ∞, by ν_n closed sets B_{jn} = {x : |x − x_{jn}| ≤ b/ν_n}, where 1 ≤ j ≤ ν_n, such that B_{jn}° ∩ B_{kn}° = ∅ for j ≠ k.
'n — nx'n I - ' " —° J J ‘ 6p 8c2 q q eh “2c: Choose 11,, = n, and q = (fr—i/ h, then P( sup ICn(xjn) - E 6) ISjSVn 2 4c 1 1/2 S 411,,eIp(—8£—C2 - nhz+7> + 221/nq (1+ :5) a[h"’/2] 3 c1 ne‘c’ "h2 + c3n3/2h‘1p0" "h2/2 = 0(1), 82 for some positive constants c1, C2, and c3 by conditions (M) and (H) . Hence (3.4.7) is 0,,(1), so is (3.4.6). this also completes the proof of (3.4.5). By taking l = mgo in (3.4.5), then part (a) of the lemma is proved. To prove part (b) of the lemma, it suffices to prove that (34.8) 3;; Mutual — Mrflohl = 010(1)- Because for large n such that h S 1, Kh($ — X44) is nonzero only if )(.--1 E .71, so by the continuity of mg, compactness of 9 and .71, and the consistency of 0”, sun Handy) - m00(y)ll = 012(1)- 116.71 Apply (3.4.5) with l(I) = 1, one obtains that sup n’1 2 Kh(:t‘ — Xi_1) = 0,,(1). IEI i=1 Hence (3.4.8) is bounded above by sup 11min) — 614(4)” W 2: K44 — X.-.) = 0.41). 1163: 2:61 i=1 That completes the proof of the part (b) of the lemma. [:1 We are going to apply the Martingale Central Limit theorem, i.e. Corollary 3.1 of Hall and Heyde (1989) to prove Lemma 3.4.2. For the sake of completeness, we state the corollary here as a lemma: Lemma 3.4.4 Suppose Snkn = 2;, X"). and (19,14,314) is a zero-mean, square integrable martingale array with differences X7125 and n2 is an as finite random 83 variable. If {Xm} satisfy the following conditions: (a) V5 > 0, Z E[X:,I{lxn,l>5}|7n.i_1] ——> 0, in probability. i=1 (b) =ZE(X:|,.7m-_1) -—>r)2, in probability. (C) 95...- C 35.4.1... for i S i S kmn 21. Then, Snkn converges in distribution to a normal random variable with mean zero and variance n2. Proof of Lemma 3.4.2. W.L.O.G, here only gives the proof for the case that 9 is one dimension. We will construct a martingale array and verify the three conditions of the Lemma 3.4.4. Define j Snj = Zn—l/Q/IKh($_Xi—1)l(Xi—1)€n($)dG($)€iv 7n.) 
= σ{X_0, X_1, …, X_j, ε_1, …, ε_j}.

Then {S_{nj}, F_{nj}} is a zero-mean, square integrable martingale array with F_{nj} ⊆ F_{n+1,j} and differences X_{ni} = n^{−1/2} ∫_I K_h(x − X_{i−1}) l(X_{i−1}) ξ_n(x) dG(x) ε_i. So condition (c) holds. For any λ > 0 and c > 0,

(3.4.9) Σ_{i=1}^n E[ X_{ni}² 1{|X_{ni}| > λ} | F_{n,i−1} ] ≤ λ^{−c} Σ_{i=1}^n E[ |X_{ni}|^{2+c} | F_{n,i−1} ].

Because ξ_n(x) = E K_h(x − X_0) ṁ_{θ_0}(X_0) = ∫ K(u) ṁ_{θ_0}(x − uh) f(x − uh) du, the kernel function K has bounded support, and ṁ_{θ_0} and f are continuous, ξ_n is bounded uniformly in n at x ∈ I; suppose the bound is B_ξ. Furthermore, note that

(3.4.10) ∫_I K_h(x − X_i) |ξ_n(x)| dG(x) ≤ B_ξ ∫ K(u) g(X_i + uh) du ≤ B_ξ sup g < ∞.

So, by the stationarity of the X_i and the definition of F_{n,i−1}, (3.4.9) is bounded above by

λ^{−c} n^{−c/2} ( B_ξ sup g )^{2+c} E|ε_1|^{2+c} = λ^{−c} n^{−c/2} C = o(1),

for some constant C. Hence condition (a) holds. For condition (b), note that

Σ_{i=1}^n (1/n) E ( ( ∫_I K_h(x − X_{i−1}) l(X_{i−1}) ξ_n(x) dG(x) )² σ² )
= E ( ∫_I K_h(x − X_0) l(X_0) ξ_n(x) dG(x) )² σ²
= ∫_I ∫_I E( K_h(x − X_0) K_h(y − X_0) l²(X_0) ) ξ_n(x) ξ_n(y) dG(x) dG(y) σ²
→ σ² ∫_I l²(x) ṁ_{θ_0}²(x) f³(x) g²(x) dx.

Let V_{ni} denote ( ∫_I K_h(x − X_{i−1}) l(X_{i−1}) ξ_n(x) dG(x) )² σ². Note that V_{ni} is bounded uniformly in n and i. Then

(3.4.11) Var( n⁻¹ Σ_{i=1}^n V_{ni} ) = n⁻² Σ_{i=1}^n Var(V_{ni}) + n⁻² Σ_{i≠j} Cov(V_{ni}, V_{nj}) → 0,

by the uniform boundedness of the V_{ni}, Lemma 3.2.1, and condition (M). Hence condition (b) holds, and Lemma 3.4.2 is proved. □

Proof of Lemma 3.4.3. By (A2) and the consistency of θ_n, one obtains

(3.4.12) max_{0≤i≤n−1} |η_n(X_i)| / ||θ_n − θ_0|| = o_p(1).

Similar to the proof of (3.4.10), the ∫_I K_h(x − X_i) |ξ(x)| dG(x) are bounded uniformly in i for h ≤ 1; hence

∫_I K_h(x − X_i) |ξ(x)| dG(x) |η_n(X_i)| / ||θ_n − θ_0|| = o_p(1), uniformly in i.

So (3.4.3) is also o_p(1). □

3.5 Asymptotic behavior of the minimum distance

In Chapter 2 it was proved that, under the i.i.d. setup, the standardized minimum distance is asymptotically normally distributed at the rate nh^{1/2}. In this section we will show that the same result is also true when the observations come from a stochastic process satisfying a GSM condition. This result can be seen from the following three propositions. Before presenting the propositions, define ε̃_i := X_i − m_{θ_n}(X_{i−1}), i = 1, …, n.

Proposition 3.5.1 Under the conditions (M) to (H),

M_h(θ_n) − M_h(θ_0) = o_p( 1/(n h^{1/2}) ).

Proposition 3.5.2 Under the conditions (M) to (H),

(1/n²) Σ_{i=1}^n ∫_I K_h²(x − X_{i−1}) dG(x) ( ε̃_i² − ε_i² ) = o_p( 1/(n h^{1/2}) ).

Proposition 3.5.3 Under the conditions (M) to (H), n h^{1/2} U_n converges in distribution to N(0, Γ²).

In the proof of Proposition 3.5.2, the left hand side is decomposed into the terms (a) and (b) of (3.5.2). The first term of (a) is o_p(1/(n h^{1/2})) by taking the expectation of the summation and Theorem 3.4.1. Similar to the argument for the first term in (3.5.1), the second term of (a) is o_p(1/(n h^{1/2})). Hence the first part of (3.5.2) holds. Using the same technique and a similar argument as above, one obtains that the term (b) can be written as the sum of the following two terms:

−2 (θ_n − θ_0)ᵀ n⁻² Σ_{i=1}^n ∫_I K_h²(x − X_{i−1}) dG(x) ṁ_{θ_0}(X_{i−1}) ε_i,
−2 n⁻² Σ_{i=1}^n ∫_I K_h²(x − X_{i−1}) dG(x) η_n(X_{i−1}) ε_i.

By taking the expectation of the absolute value of the integral and using Theorem 3.4.1, the first term is o_p(1/(n h^{1/2})); similarly, the second term is also o_p(1/(n h^{1/2})). Hence the second part of (3.5.2) holds, and so does the proposition. □

Now we prove Proposition 3.5.3.

Proof of Proposition 3.5.3. Define

φ_{ij} := ∫_I K_h(x − X_{i−1}) K_h(x − X_{j−1}) dG(x) ε_i ε_j,   V_{nj} := (2/n²) Σ_{i=1}^{j−1} φ_{ij},   U_n := Σ_{j=2}^n V_{nj}.

Then U_n is a sum of martingale differences. In order to apply the M.G.C.L.T. to prove Proposition 3.5.3, one needs to check the following three conditions:

(C1) Var(U_n) = Σ_{j=2}^n E V_{nj}² =: σ_n²,
(C2) σ_n⁻² Σ_{j=2}^n E( V_{nj}² | F_{n,j−1} ) → 1, in probability,
(C3) σ_n⁻² Σ_{j=2}^n E[ V_{nj}² 1{|V_{nj}| > ε σ_n} | F_{n,j−1} ] → 0, in probability, for every ε > 0.

The proof of Proposition 3.5.3 is broken down into four lemmas.

Lemma 3.5.3 Under the conditions (M) to (H),

(3.5.3) E ( (h/n³) Σ_{j=2}^n Σ_{i≠l<j} φ_{ij} φ_{lj} )² = o(1).

For indices i_1, …, i_k, write

(3.5.4) K_{i_1,…,i_k}(x, y, s, t) := Π_{l=1}^k K_h(w_l − X_{i_l−1}) ε_{i_l}^{p_l},

where each w_l is one of x, y, s, t and each p_l is either 1 or 2. Then (3.5.3) can be written as

(3.5.5) (h²/n⁶) Σ_{ν=3}^6 c_ν Σ_{F_ν} E φ_{ij} φ_{lj} φ_{rs} φ_{ts} = c_3 A_3 + c_4 A_4 + c_5 A_5 + c_6 A_6, say,

for some constants c_3, c_4, c_5, and c_6, where F_ν denotes the set of index tuples with exactly ν distinct indices. In order to prove the lemma, it suffices to show that A_ν = o(1), for ν = 3, 4, 5, 6.
When ν = 3 or 4, by (3.5.4) and (3.5.5),

A_3 = O( (h²/n⁶) · n³ · h⁻⁴ ) = O( 1/(n³ h²) ) = o(1),   A_4 = O( (h²/n⁶) · n⁴ · h⁻⁴ ) = O( 1/(n² h²) ) = o(1).

So we only need to show that A_5 = o(1) and A_6 = o(1). Define

τ := min{ j ≤ 5 : d_j = i_{l+1} − i_l and E K_{i_1,…,i_l}(·, ·, ·, ·) = 0 for some l }.

It is seen from this definition that on F_ν,

(3.5.6) τ ≤ 8 − ν.

Next we show that

(3.5.7) A_ν = O( 1/(n^{4−τ} h²) ), ν = 5, 6.

Suppose d_τ = i_{l+1} − i_l for some l. On F_ν, ν = 5, 6, the summand in (3.5.5) is equal to

∫_{I×I×I×I} Cov( K_{i_1,…,i_l}(x, y, s, t), K_{i_{l+1},…,i_ν}(x, y, s, t) ) dG(x) dG(y) dG(s) dG(t).

By Lemma 3.2.1, the above term is bounded above by

(3.5.8) ∫_{I×I×I×I} 2p [2α(d_τ)]^{1/p} ||K_{i_1,…,i_l}||_q ||K_{i_{l+1},…,i_ν}||_r dG(x) dG(y) dG(s) dG(t),

for any p, q, r > 1 with 1/p + 1/q + 1/r = 1, where ||K_{…}||_q := (E |K_{…}(x, y, s, t)|^q)^{1/q}. By the usual calculation, (3.5.8) is bounded above by const · p [α(d_τ)]^{1/p} h⁻⁴. By taking q = r = 5 and summing over the index tuples (at most n^{ν−1} of them after summing over d_τ, using the geometric decay of α), A_ν is bounded above by

const · (h²/n⁶) n^{ν−1} Σ_{d≥1} [α(d)]^{3/5} h⁻⁴ = O( 1/(n^{7−ν} h²) ), ν = 5, 6,

which together with (3.5.6) yields (3.5.7); in either case A_ν = o(1) by condition (H). The lemma is therefore proved. □

Proof of Lemma 3.5.4. Because E φ_{ij} = 0 for any i ≠ j, E V_{nj} V_{nl} = 0 for j ≠ l, and E U_n = 0. Hence

(3.5.9) Var(U_n) = E U_n² = E ( Σ_{j=2}^n V_{nj} )² = Σ_{j=2}^n E V_{nj}² = (4/n⁴) Σ_{i<j} E φ_{ij}²,

and, writing K_i(x, y) := K_h(x − X_{i−1}) K_h(y − X_{i−1}),

E φ_{ij}² = σ⁴ ∫_{I×I} E( K_i(x, y) K_j(x, y) ) dG(x) dG(y),

which splits into (a) the covariance term σ⁴ ∫∫ Cov( K_i(x, y), K_j(x, y) ) dG(x) dG(y) plus the main term σ⁴ ∫∫ ( E K_h(x − X_0) K_h(y − X_0) )² dG(x) dG(y). By Lemma 3.2.1, the term (a) is bounded above by

(3.5.10) const · 2p [2α(j − i)]^{1/p} ∫_{I×I} ||K_i(x, y)||_q ||K_j(x, y)||_r dG(x) dG(y),

for any p, q, r > 1 with 1/p + 1/q + 1/r = 1. Taking p = q = r = 3, (3.5.10) is bounded above by

(3.5.11) const · [α(j − i)]^{1/3} h^{−4/3}.

Hence the total contribution of the covariance terms is bounded above by

const · n⁻³ h^{−4/3} Σ_{k≥1} [α(k)]^{1/3} = o( (n² h)⁻¹ ).

On the other hand, by direct calculation,

h ∫_{I×I} ( E K_h(x − X_0) K_h(y − X_0) )² dG(x) dG(y) → ∫_I f²(x) g²(x) dx ∫ ( ∫ K(u) K(u+v) du )² dv.

This together with (3.5.9) as well as Lemma 3.5.3 implies that

n² h · Var(U_n) = n² h · σ_n² = n² h Σ_{j=2}^n E V_{nj}² → 2σ⁴ ∫_I f²(x) g²(x) dx ∫ ( ∫ K(u) K(u+v) du )² dv = Γ².

So Lemma 3.5.4 is proved. □

Proof of Lemma 3.5.5. Note that

(3.5.12) n⁴ h² Σ_{j=2}^n E V_{nj}⁴ = o(1),

from which the Lindeberg condition (C3) follows. □

The following exponential inequality (cf. Bosq (1998)) is needed next. Let {ξ_i} be zero-mean random variables with |ξ_i| ≤ b. Then for each ε > 0 and n ≥ 2,

P( | Σ_{i=1}^n ξ_i | > nε ) ≤ c_1 q exp{ −n ε²/(16 b² p) } + c_2 q² (1 + 4b/ε)^{1/2} α(p),

for some constants c_1, c_2 > 0 and any 1 ≤ q ≤ [n/2], where p = [n/(2q)].

Proof.
First consider blocking. Let p be an integer between 1 and n, and let q = [n/(2p)] + 1. Partition {1, …, n} into consecutive blocks of length at most p and define the block sums V_i^{(1)} (over the odd blocks) and V_i^{(2)} (over the even blocks), i = 1, …, q, so that Σ_j ξ_j = Σ_i V_i^{(1)} + Σ_i V_i^{(2)}. For every ε > 0,

(3.5.13) P( | Σ_{j=1}^n ξ_j | > nε ) ≤ Σ_{k=1,2} P( | Σ_{i=1}^q V_i^{(k)} | > nε/2 ).

But by recursively using Bradley's Lemma 1.2 in Bosq (1998), there are independent random variables W_1^{(k)}, …, W_q^{(k)}, such that W_i^{(k)} and V_i^{(k)} have the same distribution and

(3.5.15) P( | W_i^{(k)} − V_i^{(k)} | > λ ) ≤ 11 ( ( ||V_i^{(k)}||_∞ + c ) / λ )^{1/2} α(p),

for any 0 < λ ≤ ||V_i^{(k)}||_∞ + c. Hence

(3.5.16) P( | Σ_i V_i^{(k)} | > nε/2 ) ≤ P( | Σ_i W_i^{(k)} | > nε/2 − qλ ) + P( ∪_i { | W_i^{(k)} − V_i^{(k)} | > λ } ).

Choose c := δ b p with δ > 1 and λ := min( nε/(4q), (δ − 1) b p ). Then ||V_i^{(k)}||_∞ + c ≤ (δ + 1) b p and ||V_i^{(k)}||_∞ + c ≥ c − ||V_i^{(k)}||_∞ ≥ (δ − 1) b p > 0, so 0 < λ ≤ ||V_i^{(k)}||_∞ + c. Hence, in view of (3.5.15), (3.5.16) is bounded above by

(3.5.17) P( | Σ_{i=1}^q W_i^{(k)} | > nε/4 ) + 11 q ( (δ + 1) b p / λ )^{1/2} α(p).

Choose δ such that (δ − 1) b p = nε/(4q); then (3.5.17) is bounded above by

(3.5.18) P( | Σ_{i=1}^q W_i^{(k)} | > nε/4 ) + 11 q (1 + 4b/ε)^{1/2} α(p).

But by applying Hoeffding's inequality to the independent, bounded W_i^{(k)} (|W_i^{(k)}| ≤ bp), one obtains

P( | Σ_{i=1}^q W_i^{(k)} | > nε/4 ) ≤ 2 exp{ −2(nε/4)² / ( q (2bp)² ) } ≤ 2 exp{ −n ε²/(16 b² p) }.

Hence (3.5.18) is bounded above by 2 exp{−nε²/(16b²p)} + 11 q (1 + 4b/ε)^{1/2} α(p), so (3.5.13) is bounded above by

16 · q · exp{ −n ε²/(16 b² p) } + 88 · q² (1 + 4b/ε)^{1/2} α(p).

The theorem is thus proved. □

We will apply the above inequality to prove Lemma 3.5.6.

Proof of Lemma 3.5.6. We will show that for any λ > 0,

P( n² h | Σ_{j=2}^n ( E(V_{nj}² | F_{n,j−1}) − E V_{nj}² ) | > λ ) = o(1),

which yields (C2). The summands involved are bounded, so the exponential inequality above applies: choosing q = q_n so that q_n → ∞ and n/(2q_n) = O(n^β) → ∞ as n → ∞, for some 0 < β < 1, both terms in the resulting bound (3.5.20) tend to 0 by condition (M). Therefore the proof of Lemma 3.5.6 is complete. □

Chapter 4

Simulations

This chapter contains a simulation study comparing three tests. More precisely, let {X_t, t = 0, ±1, ±2, …} be a stationary stochastic process satisfying X_t = μ(X_{t−1}) + ε_t, where the {ε_t} are i.i.d. random variables with mean zero and ε_t is independent of X_{t−1}, for all t. The parametric family of functions to be fitted to μ is chosen to be m_θ(x) = θx, x ∈ ℝ, θ ∈ ℝ, with θ_0 = 0.8. That is, the hypothesis to be tested is

H_0 : μ(x) = 0.8x, vs. H_1 : H_0 is not true.

We chose the following three models to generate simulated data from:

model 1. X_{t+1} = 0.8X_t + ε_{t+1},
model 2. X_{t+1} = (0.8 − 1.2 exp(−X_t²)) X_t + ε_{t+1} + 0.1,
model 3. X_{t+1} = 0.8X_t + 0.5(X_t − 0.5)² − 0.3(X_t − 0.5)³ + ε_{t+1}.

The error distribution is either N(0, 0.1) or double exponential. The sample sizes chosen are 50, 100, 200, and 500. The three different tests are those of Koul and Stute (1999), denoted by KS; An and Cheng (1991), denoted by AC; and the minimum distance test of Chapter 3, denoted by MD. The nominal level used to implement the tests is α = 0.05. There are 1000 replications for each combination of (model, sample size, error distribution).
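The three data-generating models can be simulated directly. One caveat: the thesis writes the error law as N(0, 0.1) without saying whether 0.1 is the variance or the standard deviation; the sketch below treats it as the variance (an assumption), and it discards a burn-in period so that the retained sample is approximately stationary.

```python
import numpy as np

def simulate(model, n, rng, burn=500):
    """Generate n observations from simulation models 1-3 with N(0, 0.1) errors."""
    sd = np.sqrt(0.1)                    # treating 0.1 as the error variance
    x = 0.0
    out = np.empty(n)
    for t in range(-burn, n):
        e = sd * rng.standard_normal()
        if model == 1:
            x = 0.8 * x + e
        elif model == 2:
            x = (0.8 - 1.2 * np.exp(-x * x)) * x + e + 0.1
        else:
            x = 0.8 * x + 0.5 * (x - 0.5) ** 2 - 0.3 * (x - 0.5) ** 3 + e
        if t >= 0:
            out[t] = x
    return out

rng = np.random.default_rng(17)
x1 = simulate(1, 20000, rng)
x2 = simulate(2, 2000, rng)
# Model 1 is a stationary AR(1); Var(X) = 0.1 / (1 - 0.8^2) = 0.2778
print(round(float(x1.var()), 2), bool(np.isfinite(x2).all()))
```

Model 1 serves as the null model (μ(x) = 0.8x exactly), while models 2 and 3 deviate from linearity and drive the power comparisons.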
Data from model 1 are used to study the empirical size, and data from models 2 and 3 are used to study the empirical power of these tests. The empirical size (power) is computed as the

relative frequency of { value of the test statistic > F⁻¹(1 − α) },

where F is the asymptotic distribution of the test statistic under H_0. The steps to compute the test statistics are as follows. Let X_{(0)} ≤ ⋯ ≤ X_{(n)} denote the ordered X_0, X_1, …, X_n.

1. Koul and Stute test:

Step 1: Compute the least squares estimate of θ_0 under H_0: θ̂_lse = Σ_i X_{i−1} X_i / Σ_i X_{i−1}².

Step 2: Compute V_n(X_i), i = 1, 2, …, n, where

V_n(x) = n^{−1/2} Σ_{i=1}^n ( X_i − θ̂_lse X_{i−1} ) I(X_{i−1} ≤ x), x ∈ ℝ.

Step 3: Compute A_n(X_{(i)}) for X_{(i)} ≤ x_0, where x_0 is the 99th percentile of the sample,

A_n(x) = ∫ y² I(y > x) G_n(dy) = n⁻¹ Σ_{i=1}^n X_{i−1}² I(X_{i−1} > x),   G_n(x) = n⁻¹ Σ_{i=1}^n I(X_{i−1} ≤ x).
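Steps 1 and 2 above can be sketched directly. The code below implements the least squares estimate and the partial-sum process V_n as written in the text; the AR(1) data-generation details (seed, sample size, error scale) are demo choices.

```python
import numpy as np

def ks_ingredients(x):
    """Least squares estimate (Step 1) and partial-sum process V_n (Step 2)."""
    lag, resp = x[:-1], x[1:]
    theta_lse = float(np.sum(lag * resp) / np.sum(lag * lag))
    resid = resp - theta_lse * lag
    def V_n(t):
        return float(np.sum(resid * (lag <= t)) / np.sqrt(len(resid)))
    return theta_lse, V_n

rng = np.random.default_rng(19)
n = 5000
x = np.empty(n + 1)
x[0] = 0.0
for t in range(1, n + 1):                   # model 1 under H0: theta0 = 0.8
    x[t] = 0.8 * x[t - 1] + np.sqrt(0.1) * rng.standard_normal()
theta_lse, V = ks_ingredients(x)
print(round(theta_lse, 2), np.isfinite(V(0.0)))
```

Under H_0 the estimate lands close to θ_0 = 0.8, and the KS statistic is then a functional of the process V_n evaluated along the ordered lagged observations.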