......t r

ti

1.... by};

“3.91%“??05.

“agar“...

“Iii I

.mhhuvﬂ

zaﬂ mu.»
lib“???

. .1

...

x M

H. Al u

: ...”.uaduf...”

. .... Q. . , ,
. . . . . ..sgmﬁﬁwwﬁ.%%mﬁr ,.
. . A ‘11.. Jmm....‘ :. . l. . ‘ . Panninkl,‘ l . . , >
35$..51.”...nmmsnvainma ... .a. .....» .

ll

...-. “.3
‘ 19. I: . ,
J1

 

209;

This is to certify that the
dissertation entitled

Polynomial Spline Smoothing for Nonlinear Time Series

presented by

Li Wang

has been accepted towards fulfillment
of the requirements for the

PhD. degree in Statistics and ProbabilitL

 

 

M“

Major Professor’s Signature

4/50/42,?

Date

MSU is an Afﬁrmative Action/Equal Opportunity Institution

Do-I-I-I-a-o-n-v-n-o-oCO-o-I-l-n-I-I-.-c--.-..—

 

L i t}- ;"75- A i 3i Y

.3. 1.?“ ._
i‘..-::CE-.,~;gm Stet-a

stasQ

! ‘ .
l '1' 3.‘ ."i rt .4.‘
I.) A If .u \ l 1 \, ‘ H

__.-—-__. -
~-— .—
H-..

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/07 p:/CIRCIDateDue.indd-pt1

POLYNOMIAL SPLINE SMOOTHING FOR
NONLINEAR TIME SERIES

Li Wang

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Probability and Statistics

2007

ABSTRACT

POLYNOMIAL SPLINE SMOOTHIN G FOR
NONLINEAR TIME SERIES

By

Li Wang

Nonlinear time series analysis has gained much attention in recent years due primar—
ily to the fact that linear time series models have encountered various limitations in real
applications and the development in nonparametric regression has established a solid foun-
dation for nonlinear time series analysis. In this dissertation, polynomial spline smoothing
is studied for nonlinear time series.

For univariate nonlinear time series, uniform confidence bands of a nonparametric pre-
diction function are constructed using the polynomial spline method. As an application,
after removing the environmental Kuznets curve trend effects, the impact of the economic
intervention on environmental quality change is quantified for the United States and Japan,
with different conclusions.

Application of non- and semiparametric regression techniques to high dimensional time
series data have been hampered due to the lack of effective tools to address the “curse of
dimensionality”. There are essentially two approaches to circumvent this difﬁculty: function
approximation and dimension reduction.

For the function approximation approach, the nonlinear additive autoregression (NAAR)
model is examined. Under rather weak conditions, spline-backﬁtted kernel estimators of the
component functions are proposed for weakly dependent samples that are both computa-

tionally expedient (so it. is usable for analyzing very high dimensional time series), and

theoretically reliable (so inference can be made on the component functions with conﬁ—
dence).

For the dimension reduction approach, a single-index prediction (SIP) model based on
weakly dependent sample is studied. The single-index is identified by the best approximation
to the multivariate prediction function of the response variable, regardless of whether or not
the prediction function is a genuine single-index function. A polynomial spline estimator is
proposed for the single—index prediction coefﬁcients, and is shown to be root-n consistent
and asymptotically normal. An iterative optimization routine is used which is sufﬁciently
fast for the user to analyze large data sets of high dimension within seconds. Application of
the proposed procedure to the river flow data of Iceland has yielded superior out-of—sample

rolling forecasts.

Copyright by
Li Wang
2007

I dedicate this work to my husband Lei Gao and my parents.

ACKNOWLEDGMENTS

,I would like to thank many people who have helped me on the path towards this disser-
tation. First and foremost, I would like to express my gratitude to my advisor, Professor
Lijian Yang. I could never have reached the heights or explored the depths without his gen—
erous help, unbreakable support and patient guidance. Every single discussion I had with
him was valuable in terms of new ideas and renewed scientiﬁc excitement. His infectious
enthusiasm and unlimited zeal have been major driving forces through my graduate study
at MSU and will keep encouraging me in my future research.

I also wish to express my gratitude to my dissertation committee, Professor Dennis

Gilliland, Professor Yiming Xiao, Professor Ana Maria Herren, for sparing their precious
‘ time to serve on my committee and giving valuable comments and suggestions.

I must acknowledge Professor Dennis Gilliland and Professor Connie Page for accepting
me as one of the consultants at CSTAT, where I have obtained plenty of opportunities to
work with students and faculty from a variety of disciplines.

I am grateful to the entire faculty and staff in the Department of Statistics and Proba-
bility who have taught me and assisted me during my study at MSU. And special thanks
are given to Professor James Stapleton and Professor Vince Melﬁ for their numerous help,
constant support and encouragement.

Thanks to the graduate school and the Department of Statistics who provided me with
the Dissertation Completion Fellowship for working on this dissertation. This dissertation
is also supported in part by NSF award DMS 0405330.

Last but not least, I would like to thank my husband, Lei Gao, for his love and support
over all these years and two of my academic sisters: Dr. Jing Wang and Dr. Lan Xue for

their generous help.

vi

TABLE OF CONTENTS

LIST OF TABLES ................................. ix
LIST OF FIGURES ................................ x
1 Introduction ................................... 1
1.1 Nonlinear Time Series Prediction Model .................... 1

1.2 Spline Conﬁdence Bands ............................. 2

1.3 Nonlinear Additive Autoregression (NAAR) Model .............. 3
1.4 Single—Index Prediction (SIP) Model ...................... 4
1.5 Polynomial Spline Smoothing .......................... 5

2 Spline Confidence Bands for Time Series Prediction Function ..... 7
2.1 Introduction .................................... 7

2.2 Main results .................................... 8
2.3 Error decomposition ............................... 11
2.4 Implementation .................................. 13
2.5 Examples ..................................... 15
2.5.1 Simulation example ............................ 15

2.5.2 Environmental Kuznets curve (EKC) .................. 17

2.6 Proof of Theorem 2.2.1 .............................. 19
2.6.1 Preliminaries of Theorem 2.2.1 with k = 1 ............... 19

2.6.2 Proof of Proposition 2.3.1 with k = 1 .................. 21

2.6.3 Proof of Theorem 2.2.1 with k = 1 ................... 24

2.6.4 Preliminaries of Theorem 2.2.1 with k = 2 ............... 25

2.6.5 Variance calculation ........................... 26

2.6.6 Proof of Theorem 2.2.1 with k = 2 ................... 28

3 Spline-Backﬁtted Kernel Smoothing of NAAR Models ......... 33
3.1 Introduction .................................... 33

3.2 The SPBK estimator ............................... 36
3.3 Decomposition .................................. 41
3.4 Bias reduction .................................. 43
3.5 Variance reduction ................................ 45
3.6 Simulations .................................... 47
3.6.1 Example 1 ................................. 48

3.6.2 Example 2 ................................. 50

vii

3.7 Proof of the main results ............................. 50

3.7.1 Preliminaries ............................... 50
3.7.2 Empirical approximation of the theoretical inner product ....... 57
3.7.3 Proof of Lemma 3.5.2 ........................... 61

4 Spline Single-Index Prediction Model ....... . ............ 70
4.1 Introduction .................................... 70
4.2 The Method and Main Results ................ > .......... 72
4.2.1 Identifiability and definition of the index coefﬁcient .......... 72
4.2.2 Variable transformation ......................... 73
4.2.3 Estimation Method ............................ 75
4.2.4 Asymptotic results ............................ 75

4.3 Implementation .................................. 78
4.4 Simulations .................................... 80
4.4.1 Example 1 ................................. 80
4.4.2 Example 2 ................................. 81

4.5 Application .................................... 82
4.6 Proof of the main results ............................. 83
4.6.1 Preliminaries ............................... 83
4.6.2 Proof of Proposition 4.2.1 ........................ 87
4.6.3 Proof of Proposition 4.2.2 ........................ 95
4.6.4 Proof of Theorem 4.2.2 .......................... 99
BIBLIOGRAPHY ................................. 132

viii

4.1
4.2
4.3
4.4
4.5
4.6

LIST OF TABLES.

Example 2.5.1: Piecewise constant spline bands coverage probabilities . . . . 103
Example 2.5.1: Piecewise linear spline bands coverage probabilities ..... 103
Report of Example 3.6.1 ............................. 104
The computing time of Example 3.6.1 ...................... 104
Report of Example 4.4.1 ............................. 105
Report of Example 4.4.2 ............................. 106

ix

4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12

4.13
4.14

4.15
4.16
4.17
4.18
4.19
4.20

Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.1:
Example 2.5.2:
Example 2.5.2:
Example 2.5.2:
Example 3.6.1:
ponent . . . .

Example 3.6.1:
component . .

Example 3.6.1:
ponent . . . .

Example 3.6.1:
Example 3.6.2:
Example 3.6.2:
Example 4.4.1:
Example 4.4.1:
Example 4.4.2:

LIST OF FIGURES

95% constant spline confidence bands with opt = l ..... 107
99% constant spline conﬁdence bands with opt = 1 ..... 108
95% constant spline conﬁdence bands with opt = 2 ..... 109
99% constant spline conﬁdence bands with opt = 2 ..... 110
95% linear spline conﬁdence bands with opt = 1 ....... 111
99% linear spline conﬁdence bands with opt = 1 ....... 112
95% linear spline conﬁdence bands with opt = 2 ....... 113
99% linear spline conﬁdence bands with opt = 2 ....... 114
Plot of the EKC in terms of u(t) and v(t) ........... 115
'Itend and noise analysis of US ................. 116
"ﬁend and noise analysis of Japan ....... . ........ 117

SPBK estimator with conﬁdence intervals for the ﬁrst com-

................................... 118
SPBK estimator with conﬁdence intervals for the second.
................................... 119
SPBK estimator with conﬁdence intervals for the third com-
................................... 120
Plot of the relative efﬁciencies of components 2 and 3 . . . . 121
Plot of the relative efficiencies of components 1 and 2 . . . . 122
Plot of the relative efficiencies of components 15 and 30 . . . 123
The actual bivariate surface .................. 124
The univariate approximation to the bivariate surface . . . . 125
The univariate approximation (d = 10, 50) .......... 126

4.21 Example 4.4.2: The univariate approximation (d =2 100, 200) ......... 127

4.22 Example 4.4.2: Kernel density plots of the error norms ............ 128
4.23 Example 4.4.2: Kernel density plots of the error norms ............ 129
4.24 Time plots of the daily river ﬂow data ..................... 130
4.25 The ﬁtted, residual and forecast plots of the river flow data .......... 131

xi

CHAPTER 1

Introduction

1.1 Nonlinear Time Series Prediction Model

Classic regression and time series tools such as the generalized linear model and the lin-
ear autoregression are known to be inadequate for complex data that exhibit nonlinearity.
This recognition has motivated the development of non— and semiparametric regression
techniques, with far reaching applications, see, for example, Fan and Gijbels (1996), Bosq
(1998), Fan and Yao (2003).

A typical nonparametric problem in time series analysis is the classical decomposition of
a realization of a time series into a slowly changing function known as a “trend component”,
or simply trend, a periodic function referred to as a “seasonal component”, and ﬁnally a
“random noise component”, which in terms of the regression theory should be called the -
time series of residuals. In time series analysis smoothing problems occur of course in the
spectral domain when we want to estimate the spectral density, e.g. for model ﬁtting. In
the time domain nonparametric prediction is one of the ﬁelds where smoothing methods are
intensively used. A well-known example is the water ﬂow prediction from a time series of
river data, see Section 4.5 in Chapter 4. In the motorcycle crash test, the acceleration of the
dummy head after impact follows a complicated instead of a simple polynomial time trend.
Another example of the nonlinear time series is the quarterly unemployment rate of US.
women, which follows a nonlinear instead of a simple linear prediction formula. Effective
tools for extracting information from such complex regression data have to be non- and

semiparametric in nature.

In the following, let {X$,K}n 1 = {X,-,1,...,X,,d,Y,~}?_l be a (d+1)—dimensional
z: _

strictly stationary process following the stochastic regression model
Yz' = m (Xi) +0(Xi)€z‘,m(xtl = E(Yz'|xz'), (1-1-1)

in which E(e,- IX,) 2 0, E (5?|X,-) = 1, 1 _<_ i g n. The-d-variate functions m, a are
the unknown mean and standard deviation of the response Y, conditional on the predictor
vector X,, often estimated nonparametrically.

Two very popular forms of nonparametric regression are kernel/ local polynomial type
and spline type smoothing. In this work, the polynomial spline smoothing is extensively
studied for nonlinear time series. The greatest advantages of spline smoothing, as pointed
out in Huang and Yang (2004), Xue and Yang (2006 b) are its simplicity and fast compu-
tation.

For model in (1.1.1), when the dimension of the predictor vector X,- is l (d = 1), spline
confidence bands are obtained in Chapter 2 for time series prediction function m under weak
dependence. Application of smoothing techniques to high dimensional time series have been
hampered due to the lack of effective tools to address the “curse of dimensionality”, which
refers to the poor convergence rate of nonparametric estimation of general multivariate
function. Much effort has been devoted to methods of circumventing. this difﬁculty. In
the words of Xia, Tong, Li and Zhu (2002), there are essentially two approaches: function
approximation and dimension reduction. Additive model and single~index model, special
cases of model (1.1.1), are good examples to represent these two approaches. Chapter 3 and

Chapter 4 discuss these two models separately.

1.2 Spline Conﬁdence Bands

Consider the one dimensional case of model (1.1.1) for strictly stationary bivariate time
series {(X,,I/,-)}?:1
Y,- =m(X,)+o(X,-)5,-,i=1,...,n, (1.2.1)

where the errors {5,}?21 are white noise, i.e., E(E,‘ IX,) = 0,var(5,- IX,) 2 1 and 5,- is a

martingale difference for the a—ﬁeltl f,- = 0 {Xj,8j_1, 1 3’ j S i} for 2' = 1, ...,n.

To put the discussion in perspective, consider the question of how the adjustment of
GDP autonomously inﬂuence the change of the environmental quality in Japan, see Section
2.5.2 in Chapter 2. The logarithm of GDP per capita and the emissions per capita of Japan
are decomposed as u(t) + X: and v(t) + Yt, t = 1, ...,n respectively, where the quadratic
trends u(t) and v(t) are given in (2.5.3), {(Xt, IQ)}?___1 are zero mean stationary time series
of residuals. The aforementioned question can be formulated in terms of various hypotheses
about the prediction function m(:z:) = E(Yt|Xt = :r). In Figure 4.11 (b), a 99% conservative
simultaneous conﬁdence band of m(:c) is plotted together with the linear regression line,
clearly showing the nonlinear dependence of Y; on Xt. The corresponding Figure 4.10 (b)
for the United States, however, shows a linear and insigniﬁcant m(:r). Making such inference
about the global shape of the prediction function m(:i:) depends crucially on the construction
of simultaneous conﬁdence bands for m using the time series observations {(Xi, Y,)}?___1.

In Chapter 2, asymptotically conservative simultaneous conﬁdence bands are con—
structed for nonparametric prediction function m based on piecewise constant and piecewise
linear polynomial spline estimation, respectively. Simulation experiments have provided
strong evidence that corroborates with the asymptotic theory. As an application, after
removing the environmental Kuznets curve trend effects, the impact of the economic inter-
vention on environmental quality change is quantified for the United States and Japan, with

different conclusions.

1.3 Nonlinear Additive Autoregression (NAAR) Model

For multi-dimensional strictly stationar time series X - ,...,X' ,Y- 7.: , the followin
3’ 1,1 z,d 2 ,_1 g

additive structure is assumed for model ( 1.1.1)

(1
v, = c+ 2 ma (Xi’a) +0(X,-)5,- (1.3.1)
021

In nonlinear additive autoregression data-analytical context, each predictor Xaa, 1 S a g
d could be observed lagged values of Y,, such as X”, = Y,-_(,, or of a different times

series. Model (1.3.1) therefore, is the exact same nonlinear additive autoregression model of

Huang and Yang (2004), which allows for exogenous variables. For identifiability, additive
component functions must satisfy the conditions Ema (Xi,a) E 0, a = 1, ..., d.

Application of additive model to high dimensional time series data has been hampered
by the scarcity of smoothing tools. The straightforward kernel methods are too compu-
tationally intensive for high dimension, thus limiting their applicability to small number
of predictors. Spline methods on the other hand, provide only convergence rates but no
asymptotic distributions, so no measures of conﬁdence can be assigned to the estimators.

In Chapter 3, a spline-backﬁtted kernel estimator is proposed for estimating the un-
known component functions {ma (-)}g=1 based on a geometrically strong mixing sample
following model (1.3.1). under minimal smoothness assumptions. The idea is to employ one
step backﬁtting after the spline pilot estimators, and then follow up with kernel smoothing,
which combines the fast computing of polynomial spline smoothing and the good asymptotic
property of kernel smoothing. Thus, the spline-backﬁtted kernel estimator is both computa-
tionally expedient for analyzing very high dimensional time series, and theoretically reliable

to make inference on the component functions with confidence.

1.4 Single-Index Prediction (SIP) Model

Single-index model, a special case of projection pursuit regression, has proven to be an effi-
cient way of coping with the high dimensional problem in nonparametric regression. Single—
index model summarizes the effects of the explanatory variables within a single variable
called the index. The basic appeal of single-index model is its simplicity: the d-variate func—
tion m (x) = m (:51, ..., xd) is expressed as a univariate function g of xTBO = 25:1 90’pIp.

In Chapter 4, a robust singleindex prediction (SIP) model is introduced for stochastic
regression model 1.1.1 regardless if the underlying function is exactly a single-index function.
Applications of SIP models lie in a variety of ﬁelds, such as discrete choice analysis in
econometrics and dose—response models in biometrics, where high-dimensional regression
models are often employed, see Hardle, Hall and Ichimura (1993). The proposed spline

estimator of the index coefficient possesses not only the usual strong consistency and ﬁ-

rate asymptotically normal distribution, but also is as efﬁcient as if the true link function g
is known. By taking advantage of the spline smoothing method and the iterative method,
the proposed procedure is much faster than the MAVE method, see Xia, Tong, Li and Zhu
(2002). This procedure is especially powerful for large sample size n and high dimension d
and unlike the MAVE method, the performance of the SIP remains satisfying in the case

d>n.

1.5 Polynomial Spline Smoothing

Let {X,, 1”,}?21 be a strictly stationary process. Assume that X,, i = 1, ..., n, are supported
on a compact interval [a, b]. Polynomial splines begin by choosing a set of knots (typically,
much smaller than the number of data points 11), and a set of basis functions spanning a
set of piecewise polynomials satisfying continuity and smoothness constraints.

To be speciﬁc, divide [a,b] into (N+ 1) subintervals Jj = [tj,tj+1), j = 0,...,N —
1, J N = [tN,1], where T := {tj )9; is a sequence of equally-spaced points, called interior

knots, given as
t1_k=...=t_1=t0=a<t1<...<tN <b=tN+1 =...:tN+k,

in which t,- = jh, j = 0, 1, ..., N + 1,12 = 1/ (N + 1) is the distance between neighboring

knots. Denote by
00") [a, b] = {m|the kth order derivative of m is continuous on [a,b]} (1.5.1)

and Gag-2) = C(k‘Ql [a,b] the space of all CUE—2) [a,b] functions that are polynomials of
degree It — 1 on each interval. The j—th B-spline of order k for the knot sequence T denoted

by Bi), is recursively deﬁned by the de Boor (2001), i.e.

u—t- B: __ —t’ B- _
1k (u):( J) 1.1: if“) __ (“ J+kl J+1,k 1(a),1—kgjgN, (1.5.2)

tj+k—1—tj tj+lc " tj+1

 

for k > 1, with

I t' < u < t" 1
B- :1 = 3“ 3+
3’1 (u) { { 0 otherwise

'uEJj}

For model (1.2.1), assume that m(:1:) belongs to 00¢) [a,b], the space of functions that
have k—th order continuous derivatives for some integer k > 0, on the interval [(1,1)]. The

polynomial spline estimator is

it

ﬁt], () :2 argmin Z {Y,- - g (X,-)}2 ,k > 0. (1.5.3)
g(-)€G(k-2)[a,b] i=1

In the rest of this dissertation, spline smoothing is applied for the stochastic regression

model (1.1.1) under different conditions.

CHAPTER 2

Spline Conﬁdence Bands for Time Series

Prediction Function

2.1 Introduction

Theoretical properties of nonparametric smoothers are typically examined in terms of mean
square, pointwise, or uniform rate of convergence, while practical consideration favors meth-
ods that are easy to implement and interpret. In addition, fast computing is appealing for
users of smoothers. For kernel smoothing of independent data, satisfactory results on rates
of convergence have been obtained, see Fan and Gijbels (1996) for pointwise and mean
square convergence rates, Miiller, Stadtmiiller and Schmitt (1987) for conﬁdence intervals
of derivative estimates, Neumann (1995, 1997) for bandwidth choice and construction of
conﬁdence intervals, Hall and Titterington (1988), Hardle (1989), Xia (1998), Claeskens
and Van Keilegom (2003) for uniform conﬁdence bands. Spline smoothers of independent
data have been investigated in parallel, see for example, Stone (1985, 1994) for mean square
convergence, Huang (2003) for pointwise convergence, and Zhou, Shen and Wolfe (1998) for
uniform conﬁdence bands.

Nonparametric smoothing of weakly dependent data has been vigorously pursued in
many directions due to its superiority for the modeling and forecasting of nonlinear time
series, see, for instance, Fan and Yao (2003) for kernel type autoregression smoothing,
and Huang and Yang (2004) for spline type autoregressive smoothing. Conﬁdence bands,

however, remain unavailable for all nonparametric smoothers based on dependent observa-

tions, due to the lack of Hungarian embedding for dependent random variables, similar to
that established by Tusnady (1977) for independent random variables. Existing results on
nonparametric smooth conﬁdence bands rely on such strong approximation result of i.i.d.
sample, see, for instance, Bickel and Rosenblatt (1973), Rosenblatt (1976), Hardle ( 1989),
Xia (1998), Claeskens and Van Keilegom (2003).

In this chapter, asymptotic simultaneous conﬁdence bands are obtained for the unknown
regression function m (x) in (1.1.1) based on the polynomial spline estimator 772,, (2:) deﬁned
in (1.5.3), while the observations {(X,, Y,)},’-‘=1 are only assumed to have a—mixing coefﬁcient
a(n) decaying geometrically (see Assumption (A4) of Section 2.2). Instead of applying
the usual Hungarian embedding technique used in most existing works, we make use of
the Berry-Esseen bound in Sunklodas (1984) for sequences of mixing random variables to
establish that the constructed conﬁdence bands are conservative. The resulting conﬁdence
bands are comparable in terms of formula and narrowness to those constructed for i.i.d.
sample. Further research will show that these simultaneous conﬁdence bands are very
useful for multi-step ahead forecasting of time series data, such as studied in Chen, Yang
and Hafner (2004).

The rest of this chapter is organized as follows. The main ﬁndings of splines conﬁdence
bands are stated in Section 2.2. Section 2.3 provides further insights into the error structure
of spline estimators, from which one is able to obtain the asymptotic conﬁdence bands. This
is accomplished by establishing simultaneous Berry-Esseen bound for the estimation noise.
Section 2.4 describes the actual steps to implement the conﬁdence bands. Section 2.5 reports
the ﬁndings in an extensive simulation study and the application to the environmental

Kuznets curve (EKC) analysis. All technical proofs are contained in Section 2.6.

2.2 Main results
Before stating the main theorems, we formulate some assumptions.
(A1) The regression function 711 E C(k) [a, b], k = 1,2.

(A2) The marginal density function f (11:) of X is continuous and positive on its compact

support, the interval Ia,bI. The standard deviation function 0(17) is continuous and

positive on Ia, 1)].

(A3) The number of interior knots N satisﬁes: (n/ log n)1/(2k+1) << N << 111/3, hence for

k = 2, one can take N N n1/5, while for k = 1, one can take N N n1/3(logn)”1/6.

(A4) There exist positive constants K0 and A0 such that a(n) S Koe“’\0" holds for all n,

where the strong mixing coeﬂ'icient of order n is deﬁned as

a(n)= sup IP(BﬂC)—P(B)P(C)I,n21.
BEO’{X3,Ys,3£t},CEO’{X3,Y3,SZt‘f'Tl}

(A5) The joint distribution of random variables (X ,5) satisﬁes the following:
(a) The error is a white noise, E(e IX 2 :15) = 0, E (52 IX = 2:) =1.
(b) There exists A10 > 0 such that

sup E (IEI3 IX = 2:) < MO.
xEan

Assumptions (A1)-(A5) are typical in the nonparametric smoothing literature, see for
instance, Fan and Yao (2003), Huang and Yang (2004).

For any a: 6 [a,b], deﬁne its location index j (:13) and relative location index (5 (:22) as
._ a: — t -
J’ (2:) = in (2:) = min I IE—h—‘EI ,N}, 6(1) 2 —IJ'(_I)’ (2.2.1)

It is clear that tj(x) g :1: < tj(_,,,)+1, 0 S 6(12) < 1, Va: 6 [a,b], j(b) = N, 6(b) = 1. For any
1.2-integrable functions (15, (p on Ia, b], the theoretical and empirical inner products and the

corresponding L2 norms are deﬁned respectively by

b
at» = / immense = mangoes},

”as = E {i2 (X)} = /b¢2 (as) f($)d:c,

we). 2 n4 2 {a5 or.) 990(1)} , “at. = 72-1: ,2 (X0.

i=1 i:1

For notation simplicity, we denote by IIII00 the supremum norm of a function r on Ia,b],

:2 sup Ir (1:)I, and the moduli of continuity of a continuous function r on [a,b]
:cEIa,bI

is denoted as w (r, h.) 2

wide

max Ir (x) — r (23’) I. By the uniform continuity of r on
:r,r’€[a,bI,Ix-$’Igh
an interval [a, b], one has ’Iimow (r, h) = 0.

We denote the theoretical norms of Bj,k: k = 1,2 in (1.5.2) as follows
%n=lwnﬁ=/GWU@ML was
d”, = IIBJ'JIIE = [K {(11: -— tj+1) h-l} f(:1:) d:r.. (2.2.3)
For theoretical analysis, in the following of this chapter, we use the rescaled B-spline basis
(divided by its theoretical norm cj'n, dj,n) {By-,1 (:12) }§V=0 and {8,32 (:12) }j.v:_1 for constant
spline space C(‘ll and linear spline space (7(0) deﬁned in Section 1.5. The inner product

matrix of the B-spline basis {83,1 (:3) }j:0 is obviously the identity matrix I N+12 while the

corresponding matrix V of the B-spline basis {Bj,2(1‘)}§:_1 is denoted as

V N B B N 2 2 4
_ (Ujlj)jsj,:—l — (< jl’2, J’2>)jsj,=—I, ( . I )
whose inverse matrix S and its 2 x 2 diagonal submatrices are expressed as
N —1 Sj—Ij—I Sj—lj -
S: (34.) ,, 2V ,S, = , . ,9 =0,...,N. (2.2.5)
J J Jr.) :_1 Sj7j_l Sjij ‘
Next deﬁne matrices 2, A (:13) and 83- as
N
N 2
23 ‘-‘ (011),):4 = {f0 (’0) 33,2 (U) 131,2 (“IN”) 411} 1 , (225)
j: 2—1

= Cj(x)_1{1—5($)} ,2 J2 j=—1,N
ACE) (OJ-(306(1)) ,CJ 1 OSjSN—I ,

lj+2,j+1 lj+2,j+2
where terms {liklli-qu are the entries of the inverse of the (N +2) x (N + 2) matrix

1- - l- -
E,=(J+1’J+l 3+1”+2),j=0,1,...,N, (2.2.7)

MN+2 and can be computed by Lemma 2.6.10

 

 

1 \f2/4 0 \
ﬁ/zi 1 1/4
MN+2 = 1/4 _1 , (23-8)
., 1/4
1/4 1 ﬁ/4
\0 ﬁ/4 1 f

10

Deﬁne next

f 13(1) (v) 02 (v) f (v) dv

 

2
o,,,1(:1:) = 2 , (2.2.9)
716.
3(1),"
N
1
2
an,2(a:) = E Z 814,2($)B,,’2(:r)sjj;su/aﬂ, (2.2.10)
j,j’,l,l’=—l

with j(:r) deﬁned in (2.2.1), ij in (2.2.2) and s“; and 03-) in (2.2.5), (2.2.6). These
03, k (2:) are shown in Lemmas 2.6.5, 2.6.11 to be the pointwise variance functions of the
spline estimators 1a,, (2:), l: = 1,2. Lastly, deﬁne an inﬂation correction factor, for any

a 6 (0,1)
dn(a)=1—-—{2log(N +1)}—1 log (Ct/2) + % {log log (N +1) +log47r} . (2.2.11)

THEOREM 2.2.1. Under Assumptions (AU—(A5), for any a 6 (0,1), an asymptotic

100 (1 —— a) % conservative conﬁdence band for m (an) over interval [a, b] is
mm :1: an), (11:) {2k log (N +1)}1/2 an (oz/k) ,k = 1, 2. (2.2.12)
In other words, for k z 1, 2

limian Im (I) 6 ink (:17) j: 0",), (.17) {2k log (N +1)}1/2dn(a/k),Va:”E [a,bII _>_1— a,

Tl—'*OO

in which on,1(:r.) is given in (2.2.9), replaceable by o(:1:) {f(2:)nh}‘"1/2 accord-

ing to (2.6.6) in Lemma 2.6.5, on; (:r) is given in (2.2.10), replaceable by
l 2

o(:1:) {2f ($)nh/3}—1/2 {AT (:13) Ej(I)A(:r)} / according to Lemma 2.6.9 and (2.6.1.9) in

Lemma 2.6.11, and d" (a) is given in (2.2.11).

2.3 Error decomposition

In this section, we break the polynomial spline estimation error at, (:13) —— m (2:) into a bias
term and a noise term, with ﬁik(:17) given in (1.5.3). We ﬁrst establish the uniform rate
at which the empirical inner product approximates the theoretical inner product for all

B-splines.

11

LEMMA 2.3.1. Under Assumptions (A3) and (A5), we have
A,“ = sup IIIBj,1II§ n — 1| 2 Op {(nh)—1/2 log n} , (2.3.1)
OSISN ’

(91.92%; ‘ (91,92)
l|91l|2 l|92ll2

 

A112 : sup

 

 

= 0,, {(nh)_1/2 log n} . (2.3.2)

Note that the spline estimator in (1.5.3) is ya}, (x) that m, (x) E XIV—14c)? kB jk (x),
where

- . T "
{AI—k,k1“‘1/\N,k} = argmin Z Y,‘ - Z )‘jkBJ k(Xi)
{A1_k’k,...,x\N,k}eRN+k i=1 3'- 1— k

With a slight abuse of notation, introduce a function Y deﬁned only on data points:

Y (X,) E Y,,1§i§ n, and write

B T B B —1 Y B > N
T)—{ 2.k(I)}i—kstN (< 2%” Jik>n)1—k52,2’SN{< ’ 3"“ n}j=1—k‘

Deﬁne asimilar function E as E (X,) E a (X,) 5,, 1 S i S n, then on data points Y = m+E
with m = {m(X1), ...,m(Xn)}T. An empirical inner product yields m, (x) = r72), (x) +

Ek (x), where

T -l
= {BM (33)}1—kﬁjiN (<Bi'JC’Bj*k>n)1-kgj,j’giv{<m’ 81k)" lj=1—k (2'33)

x) = {BM (x)}'f_ijSN ((3,411 Bj’k>n)1—jkgj,j’gN {(13, B,,,)n}:l_k. (2.3.4)

Thus, the estimation error in), (2:) — m (x) consists of a bias term 171,, (x) — m (2:) and a noise

term E), (x), such that

721,, (x) —— m (x) : {171k (2:) — m (27)} + 5k (x). (2.3.5)

LEMMA 2.3.2. [de Boor (2001) page 149/ There exists an absolute constant Ck > 0, k 2 1,

such that for every 771 E (70‘) Ia, b], there exists a function g E C(k—2) [(1,1)], such that

IIg — mIIOO S Ck IIw (mac—‘1), h) II h“‘1 g Ck
OO

 

 

771(k) II 11“.

00

12

LEMMA 2.3.3. [Huang (2003) Theorem 5.1] Under Assumptions (A 1)-(A4), there exists an
absolute constant Ck > 0, k 2 1, such that for any m 6 CU”) Ia,b] and the function 771,, (x)
as in (2.3.3), with probability approaching 1

um), (I) — m mum g 0,, inf IIg — mud, = o, (hk) . (2.3.6)
966' ’2)

Lemmas 2.3.2 and 2.3.3 establish that the bias term is of order Op (hk) uniformly over
2: 6 [a,b]. Hence the main hurdle of proving Theorem 2.2.1 is the noise term 5,, (x) deﬁned

in (2.3.4). This is handled by the next proposition.

PROPOSITION 2.3.1. Under Assumptions (A2)—(A5), with 071,1 (x) given in (2.2.9) and

07,2 (x) given in (2.2.10), for any 0 < a < 1, k = 1,2, one has

 

limian I sup 0;), (2:) 5k (2:) g {2klog(N + 1)}1/2d/n (oz/k) Z 1 -— a. (2.3.7)

"HOG IEIde
2.4 Implementation

In this section, we describe in detail the procedures implemented to construct the conﬁdence
bands in Theorem 2.2.1. All of the codes have been written in R.

Given any sample {(X,, Y,)}?_:_1, use the minimum and maximum values of {X,-}?:1 as
the endpoints of interval [a, b]. The number of knots N is taken to be Ickn1/3(logn)—1/6I
for k = 1 and Icknl/SI for k = 2, where ck (k = 1,2) are positive integers. As with
previous works on conﬁdence bands (Hardle 1989, Xia 1998, Claeskens and Van Keilegom
2003), explicit formula of coverage probability for the bands does not exist, hence there
is no Optimal method to select C), (k = 1,2). So we have not attempted adaptive knot
selection, as Hardle, Marron and Yang (1997) had illustrated that it could lead to uniform
inconsistency. We have set c1 = 6, c2 = 3 for piecewise constant and piecewise linear bands

respectively, which works well in all simulations.

The least squares problem in (1.5.3) is solved by writing spline functions as linear com-

k—l

+ ,j=1,...,N. In

binations of the truncated power base, which are 1, x, ...,xk’l, (2: — tj)

13

other words, we take

)k— 1
772,, (2:): :ypxp +22")ij , (2.4.1)

1120

where the coefﬁcients {’70, ..., 7k—1:’71,ka ..., 3N,k}T minimize the following sum of squares

2
11

Z Y 27px? JFZIJ'HX Jlkl

i=1
When constructing the conﬁdence bands, one needs to estimate the unknown functions
f (2:) and 02 (x) for the evaluation of the functions 011,1 (2:) in (2.2.9) and 07,3 (x) in (2.2.10)
according to Lemma 2.6.5 and Lemma 2.6.11.
Let R (u) = 15 (1 -— u2)2 I {IuI _<_ 1} /16 be the quartic kernel, 3,, be the sample stan-
dard deviation of {X,- H _1 and

A , :1 hr—oltl f {U -15) ’ (2.4.2)

hrot ,f

 

where hm, f:(47f)1/10(140/3)1/571-1/5sn is the rule- of—thumb bandwidth of Silverman
(1986). Theorem 2.2 of Bosq (1998), page 47, implies the following uniform consistency
result

sup
xEIa,bI

 

f (:17) — f (I)I = 0, as. (2.4.3)

Deﬁne vectors Z), = {Z1,k, .., Z,,,k}T, k = 1, 2 with Zn), 2 {Y,- — m), (X,)}2, then the

spline estimation of o2 (x), 6% (x), k = 1, 2, can be obtained by using the N adaraya-Watson

estimation on data {X,, Z,,k}?:1. It is clear from standard theory of kernel smoothing that
max sup I0% (2:) - 02 (2:) = Op (h). (2.4.4)
k=1 2 xEIa, b]

With all the above preparation, one can compute the following conﬁdence bands
1a,,(a: )ionk(x, opt){2klog(N+1)}1/2d%(,a/2) k=1,2,0pt= 1,2, (2.4.5)
where m, (x) is given in (2.4.1), the additional parameter opt = 1, 2 indicating the estima—

tion being at each value 1: or at the nearest left knot with j (x) and f(r ) deﬁned in (2.2.1)

14

and (2.4.2)

s,,,1(.z:,1) =-_ 1310““) 134/2 (5,”) n—1/2h_1/2, (2.4.6)
as (x. 2) = 61(r)f‘1/2(x)n‘1/2h‘1/2. (2.47)

an; (x, 1) : {AT (x) amp: (2))”2 {nhf (t,(,,)}—l/2 ﬁ/‘isg (1%), (2.4.8)

. _ 1/2 . —1/2 -
07,2 (x, 2) 2 {AT (2:) =j(x)A (x)} {nhf (x)} \/3/202 (x). (2.4.9)
Since sup x — tJ-(x) S h —~> 0, as n ——> 00, and according to Lemma 2.6.9, the matrix B]-

 

 

xEIa,bI
approximates matrix 83- uniformly for 0 S j S N, (2.4.3) and (2.4.4) entail that all of the

four bands above are asymptotically conservative.

2.5 Examples

2.5.1 Simulation example

To illustrate the ﬁnite—sample behavior of the proposed conﬁdence bands, some simulation
results are presented. The number of interior knots N is chosen according to Section 2.4.
The data set in our simulation study is generated from heteroscedastic regression model
(1.2.1), with

100 -— exp (x)
100 + exp (2:) ’

 

m (x) = sin (27rx), 0 (x) = 00 5 ~ N (0, 1), 00 = 0.2, 0.5. (2.5.1)

We simulate {7}}?‘21 from a moving average sequence of order q, i.e,

 

 

1
T1 = 2 2 (62' + 9161—1 + 9251—2 + + qui—q),
\/1+61+...+6q
where in the simulation, q is taken to be 4, 61 = = 6g 2 0.2 and f,’s are i.i.d. r.v.’s

~ N(0,1). We then define X, = (I)(T,-), where (I) is the standard normal distribution
function, so X, is uniformly distributed on [0,1].

We choose sample size n to be 100, 200, 500 and 10000, conﬁdence level 1 — a =
0.99, 0.95 as usual. Tables 4.1 and 4.2 contain the coverage probabilities as the percentage
of coverage of the true curve at all data points by the confidence bands in (2.4.5) with 500

replications of sample size n = 100, 200 and 500. The coverage probabilities of the confidence

15

bands in (2.4.5) have also been computed by plugging in the true value of density function
f(x) = IIO,1I(x) and the variance function 0(x) in (2.5.1), called the oracle bands as they
use quantities that are unknown but for “oracles”.

Table 4.1 shows that the performance of all four bands becomes much closer with larger
sample size. When sample size reaches 500, all four bands have nearly the same coverage
at noise level 0.2. In Table 4.2, the coverage percentages show very positive conﬁrmation
of Theorem 2.2.1 when k = 2. At sample size 100, regardless of noise level, both of the two
piecewise linear bands in (2.4.5) achieve at least .980 and .948 for conﬁdence level 1 ——a = .99
and .95, respectively.

From Tables 4.1 and 4.2, it is obvious that larger sample size guarantees improved
coverage, while reasonable coverage has also been achieved at moderate sample sizes. While
under the same circumstances, the band by linear spline performs much better than the
band by constant spline. We have also observed that the noise level has more influence on
the constant bands coverage, and very little on the linear bands’.

Corresponding to opt = 1, 2, four ﬁgures of constant bands (Figures 4.1 - 4.4) and four
ﬁgures of linear bands (Figures 4.5 - 4.8) are created for graphical comparison: each with
four types of symbols: dots (data), center smooth solid line (true curve), center dotted line
(the spline estimated curve), upper and lower thick solid line (conﬁdence bands). Comparing
Figures 4.1 - 4.4, one sees that the band widths are very close as sample size reaches 500.
This is more evident from Figures 4.5 - 4.8.

In all ﬁgures, the conﬁdence bands of n = 500 are thinner and ﬁt better than those of
n = 100. Also the smaller the signiﬁcance level, the wider the conﬁdence band. Overall,
linear bands are superior to constant ones in terms of smoothness and narrowness.

Observing that the estimation of on; (x) by 6mg (x, 1) at knots as in (2.4.8) or by
6mg (x,2) at all observations as in (2.4.9) does not seem to have much noticeable impact
on the widths of the conﬁdence bands, while the estimation at knots seems to produce
closer coverage probabilities to the nominal conﬁdence level, we recommend always using
estimation by (3mg (2:, l) at knots for simpler and faster implementation.

For the linear bands, we have also carried out simulation at noise level 0.2, for sample

16

size n = 10000 and Opt = 1 (estimation on knots). The coverage is always 99.6% for a = 0.01
and 97.6% for a = 0.05, both higher than the nominal coverage of 99% and 95%, consistent
with their conservative deﬁnitions. Remarkably, it takes merely 365 seconds to run 500
replications with sample size as large as 10000 on a Pentium III PC. This is extremely fast
considering that nonparametric regression is done without WARPing, see Hardle, Hlavka

and Klinke (2000).

2.5.2 Environmental Kuznets curve (EKC)

The environmental Kuznets curve (EKC), an inverted—U relationship between pollution and
income, is an influential generalization about the way environmental quality changes as a
country makes the transition from poverty to relative affluence. The EKC predicts that
pollution will ﬁrst increase, but subsequently decline if income growth proceeds far enough.
The shape of the relationship between the rate of environmental degradation and GDP per
capita has been the subject of much empirical examination. Several studies have attempted
to test the EKC hypothesis empirically. The majority of these studies use panel data in
conjunction with a static ﬁxed and/or random effects panel estimator. In this section, we
examine whether or not countries (here we select US and Japan) actually behave like the
EKG, and we further look at the nonparametric time series nature of the data set after
elimination of the trend.

One key variable of this study, the environment index is the emissions of sulfur from
1850 to 1990, see Lefohn, Husar and Husar (1999). The other key variable is GDP per
capita from 1850 to 1990, which can be obtained in Maddison (2003).

To gain an insight into the model structure, we decompose the logarithm of GDP per
capita and Emission per capita into their trend parts and noise parts, respectively, i.e., for

t = 1, ...,n
{log(GDP per capita)}, = u(t) + Xt, {log(Emission per capita)}, = v(t) + Yg.

We are interested in two sets of hypotheses, given here separately in terms of the relationship

between the trends u(t) and v(t), and between the stationary noise {Xt}?:1 and {Yt}?:1.

17

EKC hypothesis: There exists an inverted-U relationship between u(t) and v(t). (see
Figure 4.9)

Residual/noise hypothesis: There exists a linear relationship between {Xt}?__.1 and
{Ytliéi-

The EKC hypothesis can be tested by performing a routine trend analysis. After de-
trending, {Xt}?=1 and {IQ}?=1 are obtained, then one can estimate the regression relation
between them and construct an piecewise linear spline conﬁdence band for the testing.
Case 1. United States Example

We get the trends u(t), v(t) of US data by ﬁtting a polynomial regression on time t.
u(t) :2 0.00511+ 3.3127, v(t) = -—O.0001t2 + 0.0261t — 2.1788, (2.5.2)

with the corresponding R2 = 0.9814,0.9256. So for US, the EKC hypothesis is retained by
the trend analysis. After elimination of the trend, {X t}?=1, {Y1}?=1 appear to be stationary.
For the residual hypothesis, Figure 4.10 shows that when conﬁdence level is as small
as 80%, the linear regression line is still covered by the conﬁdence band. This phenomena
implies that the residual hypothesis is retained.
Moreover, we can see that the conﬁdence bands also cover the horizontal line E (YtIXt) E
0. So one concludes that Y; is unpredictable from Xt, that is, the intervention of emission
is immune to the intervention of economy.
Case 2. Japan Example

The quadratic trends u(t), v(t) for Japan data are given as
u(t) = 0.000312 -— 0.00191+ 6.7308, v(t) = —0.000512 + 0.0952t — 9.0772 (2.5.3)

with R2 = 0.9829,0.9544. From the trend relationship curve, one sees that it is not a U
shaped curve as EKC predicted. However, we are not sure whether it would succeed to
decouple environmental pollution and resource use from economic growth, which will make
this a tuning point and U shape later. To test the residual hypothesis, Figure 4.11 shows
that neither the linear regression line nor the horizontal line E (i’tIXt) E O can be covered by
the conﬁdence bands even when the conﬁdence level reaches 99%. So the residual hypothesis

is rejected at significance level smaller than 0.01 given that the confidence band is already

18

conservative. This phenomena implies that the intervention of emission is not immune to
the intervention of economy, or say that the adjustment of GDP has autonomous influence

on the change of environmental quality, but not in a linear way.

2.6 Proof of Theorem 2.2.1

2.6.1 Preliminaries of Theorem 2.2.1 with k = 1

Throughout the following, denote by c, C, any positive constants, without distinction.
The properties of C2.” and djjn are given in the following lemma, whose proof consists

of direct algebraic veriﬁcations.

LEMMA 2.6.1. As 71 ——* 00, for C2," deﬁned in (2.2.2) and d”, in (2.2.3)

cm = f(tj)h(1+rj,n,1),<bj,1,bj;,1>EO,j7éj', (2.6.1)

2 1+r- 2 j=0...N—1
d. = — t- h 3*"1 ’ ’ ’
9'" 3H3“) I1/2+r,-,,,,2 j=—1,N,

1 1+f- , j’-—j =1,
<bj,2, 31,2) 2 gfftj+l)h{0 “12 III-ji>lv

where

02}sz lTj,n,lI + 412?:‘N ITj1n,2I + 451%4 Ifjanﬂl g Cw (f, h), (2.6.2)

if (tj+1) h {1 — Cw (f, h)} g (1,,” 3 if (9+1) h {1 + Cw (f, 11)} . (2.6.3)

To prove Lemma 2.3.1, we make use of the following Bernstein inequality for geometri—

cally a-mixing sequence.

LEMMA 2.6.2. (Bosq (1998), page 31, Theorem 1.4/ Let {§,,t E Z} be a zero mean real valued
a—mixing process, Sn = 221:1 g, Suppose that there exists c > 0 such that fori = 1, ...,n,
k = 3, 4,..., EI§,Ik S elf—2ME6,2 < +00, then for each n > 1, integer q E I1,n/2], each
5 > 0 and k 2 3

(152 n 2k/(2k+1)
P(ISnI 2 n5) ‘3 a1 exp (———-————) + (12 (k) 01 (I I) ,

25771122 + 568 q + 1

 

19

where a() is the a-mixing coeﬂicient deﬁned in (3. 2.10) and

2 577121c/(21c+1)

),a2(k)=lln 1+—‘~‘———— ,

61:27—’+2(1+
q E

25mg + 5a:
with mr = Ina-X1992 lléillr, 7‘ 2 2.

PROOF OF LEMMA 2.3.1. For brevity, we only give the proof of Lemma 2.3.1 for An,l-

—1

For any 0 S j S N, let 1),-J = B121(X,-)-1, then llBlelin — 1 = 11 1‘11)”, with

Ema- = 0 and for any r _>_' 2, C1- inequality implies that
E low-l" = E Is},1 (X,) — 1Ir g 2T—1EIB,2j‘,(X,-)+ 1I 3 CO {2h—1}r_l,
where cjjn is as (2.22) with properties given in (2.6.1) and (2.6.2). On the other hand
130112,): EBf-I1 ' — )1I2 2 EIB;,1(X,-)-1I = {2c,-,,,}‘1—12 Clh‘l.

. ~ it—
So there is a constant c, such that for all k > 2, E'Ii),JIA S (ch—l) 2klEnz2j. Thus
Cramer’s condition is satisﬁed with Cramer’s constant equal to ch‘l. Applying Lemma

2.6.2 to n’1 2?:1711‘03 for any 6 > 0, q E [1,n/2I, one has for k = 3
1 n —q62 n 6/7
P — >6 <a ex ——————” +a 30(I—I) ,
{n 2772,} n} _ 1 P (25mg +5(:(5n) 2() (1+1

i=1
(5k 71 n (52
(In: )g ,a1:2—-+2 1+———r,—l——- ,1‘1132E1ﬂ-Nh-1
nh (I 25111.} + 5125,,

 

 

where

 

6/7

5m 2 2
a2 (3) = Mn 1+ 6: , 17-13 —112;”t§2<lIl’lz,jll3 S {CO (h) }

Observe that 6n = 0(1), then by taking q such that Iﬁ—II Z colog n, q _>_ cln/logn for

 

some constants c0,c1, one has a1 = 0(n/q) : 0 (log 71), a2 (3) = 0 (n2). Assumption (A4)

6/7 6/7
q + 1 q + 1

Thus, for 71 large enough,

$1272..»

yields that

 

(Slogn

 

 

—} S clog 71 exp {4:263 log n} + C71,?"6A0C0/7.
nh

20

Taking c0, 6 large enough, one has for large n, P {111 121:1 712331 > (nh)—1/2 6 log n} S TF3.
Hence (2.3.1) holds because

00

0° 2 Log: {3 00 _2
E P sup “1133111211. — 1|> 372 N g 2 271 < 00. Cl
n=l ‘ N , n21

OSJS 121:

2.6.2 Proof of Proposition 2.3.1 with k = 1

To prove Proposition 2.3.1, the following important lemmas are needed. We denote by (I)

the standard normal distribution function.

LEMMA 2.6.3. [Sunklodas (1984), Theorem 1] Let {£3211 be an a-mirtrzg sequence with
Efn = 0. Denote d :2 maxlgign {ElfiIZ‘HS} ,0 < 6 S 1, Sn 2 227-121 52': 03, := ES}, 2 con
for some ('0 E (0, +00). [fa (n) g Koe‘AO", A0 > 0, K0 > 0, then there extstcl = c1(K,6),

62 = (:2 (K, (5), such that

A

 

d
S Cl
coon

6{log(on/((1)/2) /)\}1+6

                 

{01:15}, < z} — ‘1)(2)

 

for any A with /\1 g A 3 A2, where

/\1=(:2{log (On/CO/ 2V} /n. b>2(1+(5)/6; A2245—1(2+6)log(0n/C5/2.)

LEMMA 2. 6. 4. [Leadhette1, LGdgren and Rootzen (198?), Theorem 1.5 .3] As N —* 00, one
has

[(1) (r/(LN + bN)]N -—+ exp (—e—T),

where
aN = (210g N)”2 , bN = (210g N)1/2 — (2 log N)"1/2 (log logN + log47r) /2.
Note that E1 (:13) in (2.3.4) can be rewritten as
2:38 e 31.1“22 n, (2.6.4)

with 8; —— (13,8 j 1),, = £2121 Bj,1(Xi)0(Xi)5i- NOW deﬁne

—ZE;B,1,:; e [a 1)] (2.6.5)

The next lemma gives the pointwise variance of 51 (1:).

21

LEMMA 2.6.5. The pointwise variance of E1 (:r) is the function 031.1(1) deﬁned in (2.29)
which satisﬁes

02(1.

f (1:) 73h

 

Eel (11122031 (so): {1+ ma} zeia 1] (2.6.6)

with supIE[a,b] lrnJ (I)l -1 0.

PROOF. Note that E(ez-IX--) —— 0, E[Bj1(X,-)Bj1 (Xk)o(Xi)a(Xk)el-ek] = 0,Vi # k,
the rest of the proof follows from Lemma 2.6.1 and the continuity of functions a (:r) and

f (:13). Cl
The difference between E1 (1:) in (2.6.4) and 5:1 (as) in (2.6.5) is negligible uniformly over

as 6 [a,b].

LEMMA 2.6.6. Under Assumptions {A2) and (A5)

1131(1) — 121(1):: A,“ (1 - 11.1)"1 121 (2:11 e 11.11.

PROOF. For any :1: 6 [a,b]

51(x)- 51(zr)| _<_ I51 (3:)| sup I B31 . — 1| sup 31,1 .
1 03.131), M .7 “2,11 OSjSN H J “2,1:
Meanwhile (2.3.1) of Lemma 2.3.1 implies that

2 _1 _1
032111 lllBlelgn — 1| S An,1, (1 + An,1) S 0<SUEN “81111127213 (1 _ An,1) ,

hence the lemma follows. C]

Since the stochastic function 13:1 (:13) given in (2.6.5) takes constant value on each interval
I 3, one only has to bound each of the N + 1 rescaled noise terms simultaneously by the
Berry-Esseen bound for weakly dependent data. First we verify the conditions in Lemma

2.6.3 for {1"}- E Bj,1 (Xi)U(Xi)511 1 S i g n, j = 0, ..., N.

LEMMA 2.6.7. There exist constants c0 (f,o) ,CU (f,o) > 0, such that for each j = 0, ..., N

2
n
2 __
0310- E E (£15,0-) == nE{BJ-,1(Xz-)0(X,-)e,-} = ncjﬂll/ o2 (u) f (u) du = on,
i=1

’1'
(2.6.7)
where COJ: c]. n f1 (u )f (u) )du > (0 (f, a) > 0 with C1," deﬁned in (2.22) and
d1 _:_ 5151,31" = E {B3111 (21.10“ (X0 15.13} 5 Co (f,0)h“1/2- (2.6.8)

22

Proof. Using the definition of on 1 (:r) in (2.2.9)

n 2 2
03m- = E (251,3) "271— Cj,En"71;z{ ZBj,1(B$)j,1(Xi)0(:i)€i} =n20j,n031,1($)
i=1
2 ncjjfll/Iﬂx) (u)o2 (u)f(u) du = nco,j> __ neg (f, o)>

Next, by Lemma 2.6.1 and the continuity of functions 02 (:12) and f (1:), one has

—3 2 _
=,EIaJ-I3< — JJ/ [1 a3 (u)f(u1du s 001120)}: V2. C1
1'
PROOF OF PROPOSITION 2.3.1 WITH p = 1. Note that for any j = 0,...,N, :1: e I]-

.1101)— 3,}:119 ,)191(:c (XJ)a(XJ-)e.-=a;,}Ze-,J. (2.6.9)
i=1

in which 03”- = on 2 q;(f,o) > O as in (2.6.7) and dj S C'(f,or)h-1/2 as in (2.6.8).

Observing that {§,-,j}?___1 forms a stationary a-mixing sequence, with Eﬁm = 0. Deﬁne

 

An— _ 0ga<xN :ng P{on11(:r)51(:1:) 3 2,21: E 1]} -— <D(z)l. (2.6.10)
i.e.,
22:1 33101?) Bj,z‘01(X) 010051
A = ’ < - — z
” @2511“? P{ non1(:r) ”619 (N )’

 

 

equations (2.6.9), (2.6.10) and Lemmas 2.6.3, 2.6.7 imply

_ n C (to)
l_ i°< ’ E I' —<I> < 01 0
Plow :50 — z a: J} Ml _ h1/2C0(f’0)0"’j

 

An = max sup
0<j<N z

E 0%) =0(N’1)J

where the last step follows from Assumption (A3). Using the above, for a waN given in

 

 

Lemma 2.6.4 and eachj = 1, ..., N, one has

P1; J._‘;€J-J<
P{0;,;Zléi,j _ _l_(_)_g_(a_/2_) +bN,.’II E Ij}

a
i— 1 N

POn{ 11:21gi,'£1———0g (Ea/2)—bN,III€Ij}

<1>(— 10g<a/2)/aN + 1w) — <1 <10g<a/2)/a~ — 1N) + o (Iv—1).

 

—log (a/2)/aN+bN,:E€ I }

 

ll

23

Applying Lemma 2.6.4, one easily Obtains that as n ——) OO

(PCT/(IN +bN) = 1—(‘3uT1Vm1 +0(N_1) ,

@(T/aN+bN)—q>(—T/aN—bN) —_- I—Qe’TN"l+o(N”1).

Letting 2e-T = a or 7' = — log (oz/2) entails that uniformly in j,

 

 

n
_ 108(01/2) a -1
P 01- €'-S--—-—-—+bN 1,3:61- =1——————+0(N ).
{ 71,3; 1’] aN+1 + ‘7 1+N
Thus
11
_ log a/2 ,
P{0n$-E gm- >—TIEIWT)+bN+1’$EIJ-, for someOS] _<_N}
i=1

 

 

l
> _ Og(a/2)
aN+1

N n
—l
s :P{ 2a-
j=0 121
So as n —+ 00, one has

n
—1
P { 0m 2&3)"
i=1

 

 

+bN+1,1:E Ij}=oz+0(1).

_10g(a/2)
aN+1

S

 

 

+bN+11$€1j10SjS N} 21—a+0(1).
Hence

lim inf P sup

"—00 :1:€[a,b]
n
-1
an,j Z €21.7-
i=1

Therefore, using Lemma 2.6.6, one has proved (2.3.7) for k = 1. Cl

 

0;,11(x)é1(x)| 3 {210g (N MW2 at («0]
— _log (CY/2)
aN+1

: lim inf P

“H00

S

 

 

 

+bN+1,xEIJ-,0§jSN] 21—01.

.-

2.6.3 Proof of Theorem 2.2.1 with k = 1
PROOF OF THEOREM 1 WITH 1: = 1. By (2.3.6) and Assumption (A3), one has
(17mm — m (2:)"... = OJ (1») = op {fl/211‘”? (log (N + 1))1/2}.

so the uniform bias order is negligible compared tO (nh)_1/2 {log(N + 1)}1/2, which is the

uniform noise order of

aJ.,1(x){—log(a/2)/aw+1 + m1} = m (J) (210g (N +1)}1/2 dn (a).

24

Now (2.3.5) and Proposition 2.3.1 yield the conservativity of the band in (2.2.12) for k = 1

limian pm (:11) E 1i11(;r) :1: 0,1,1 (1:) {210g (N +1)}1/2dn(a),‘v’1: 6 [a,b]]

”-400 1.

= limian sup 01:11 (:11) Isl (1:) + 7711(r) — m (:r)| 3 {210g (N +1)}1/2dn(a)]
"—900 _x€[a,bl ’ .

ll

 

limian sup o_1(:r)17:1(1:) S {2log(N + 1)}1/2 dn (C1)] 2 1 — a.

,1
"“00 _z€[a,b] "

Therefore, Theorem 2.2.1 has been proved for the case Of k = 1. Cl

2.6.4 Preliminaries of Theorem 2.2.1 with k = 2

In this subsection we examine some matrices used in the construction Of conﬁdence band
in (2.2.12) for k = 2. In what follows, |T| is used to denote the maximal absolute value Of
all the elements in matrix T, V is the inner product matrix deﬁned in (2.2.4) and M N+2

is the tridiagonal matrix as deﬁned in (2.2.8).

AI
LEMMA 2.6.8. Given matrix Q = MN+2 +I‘, in which F = (73-3-1) . -/ 1 satisﬁes 7.7-1.1 .=_ 0
.73] :—

if lj —j'| > 1 and [Fl 3, 0. Then there exist constants c,C > 0 independent ofn and I‘,

such that with probability approaching one
c151 s IQEI s 0151,0-11asln'lsl s c-1 (a as 6 RN”.

Proof Of the above lemma is trivial. As an application Of Lemma 2.6.8, consider the

N
, then there exists a positive

. _ ..1 - ~ ——
matrlx S —— V deﬁned 1n (2-2-5l- LBt £3" — {sgn (Sj'j) }]=_1

Cs such that

 

N
Z lsj’jl S lSéj/l S 05 éjll = C3,Vj’ : —1,0,...,N. (2.611)
j=—1

The next lemma follows by applying Lemma 2.6.8 with Q = M N+2- It ensures that

one can approximate S with the inverse Of M N+2, with a simpler distribution-free form in

(2.2.8). This approximation is uniform for Sj in (2.2.5) and Ej in (2.2.7) as well.

LEMMA 2.6.9. As 72 ——> OO,|lVIj_V1+2 — SI —> 0 and max lEj — Sjl —» O.
OSjSN

The tridiagonal terms of the matrix MIT/1+2 can be computed through the following

lemma, which is a direct result Of Zhang (1999), Theorem 4.5, page 101.

LEMMA 2.6.10. Let
21: (2+x/3)/4, z2= (2— Jig/4, 9=z2/z1=7—4\/3,
one can compute the terms 12”,]: = l)”, Ii — k| S 1 deﬁned in (2.2.8) by the following formulae
8219' (1 — 6N+1)— 2:1 (1 — 0”)
8.212 (1 — 0N+1)— 221 (1 - (9N) + (1 — 6N—1)/8’
{8.1 (1_ JAM-k) _ (1.. J~+1—k)} {8... (1- 6H) _ (1- ale-2)}

(Z1 — 22){64z%(1_ 9N“) -16z1 (1— 9N) + (1 _ 9N-1)} ’
for 2 S k g N +1.

 

l11 =1N+2,N+2 =

lch =

(—2\/§) (21 (1 — 6N) — (1 — 6N‘1)/8)

8.2% (1 — 01"“) — 2.21 (1 — 6N) + (1 — 9N—1)/8’

{.., (1 —. M) - (1 -— M) {... (1 - 9H) — (1 #2))
421(21— 22) {64.2% (1 — 6N“) — 167.1 (1 - 0N) + (1 — 6N‘1)} ’

for 2 g k S N. In particular, there exists a constant c, > 0 such that | malix llikl _<_ C).
i—k £1

 

112 = lN+1.N+2 =

lk,k+1 = '-

2.6.5 Variance calculation

We examine the behavior Of 52 (z) deﬁned in (2.3.4), rewritten as

N
£2 (:12) = 2 (1,8,2 (3:),11: 6 [a,b], (2.6.12)
j=-1
where the spline coefficient vector 5 = ((1-1, - - - , (LN)T according to (2.3.4) is
n N
t -l 1 * N
(v + v ) 5 Z BJ,2 (XJ)a (X.-)eJ- , v = ((BJ2.13,-,.,J>J.)jj,___1 — v.
i=1 j:_l 9 _

where V*, the difference between empirical and theoretical inner product matrices, satisﬁes
lv*| g An; = 0,, {(nh)-1/21og1/2 (11)} by (2.3.2).
Now deﬁne a = (a._1, - -- ,aN)T by replacing (V + V"‘)_1 with V—1 = S in above for-
mula, i.e.
N N

n

N
- 1 Z" , , Z 1
a : S {; Bj12 (At) 0 (At) 51'} : Sjlj’; E Bil-.2 (Xi) 0' (X053 1
i=1

i=1 j=—1 j=-1 j’z—l

26

and deﬁne, with j(:r ) in (2. 2. 1) and

N N
32(37) = :0 aij,2(T 17:) ZS $23 j,2(Xi)0(Xi)5iBJ/,2(I)
32—1 jij ,=-1
1 n
:2 I Z Bj’,2($) Z sj’,j;§;BjJ2(Xi)0(Xi)Ei (2.6.13)
. _. . ]=—l z—l
J -J(I)-1J(I)

for :1: 6 [a,b]. Next deﬁne an (N + 2)-vector U

n N
and 2-vectors {Aj};:0

._ A11 :1 ___1_ 2:: 12:12-18 SJ- 1JI312(X) ( J5)J-
A] _ < A]? l _ SJU “W ( Z": 12N__18j,,~J-IB rJ2(XJ-)0(X )5J ’ (2615)

in which the (j — 1)—th and j-th rows Of the matrix S is denoted as an 2 x (N + 2) matrix

~ (I-.__ _ 8'__ ... S'_ .
s,-=(';Js_1'1l 38:)" 1;”),0333N. (2.6.16)
.71— J) 3)

Then, one can write 52(37) in the following matrix form

em) = DT(:C)A ), :1: 6 [a,b], (2.6.17)

J'(I
in which the function D (x) is a 2-vect0r such that

T
D (x) E {DJ(J)_1 (5-) J 1),-(J) (2:)} .DJ- (:5) a n‘l/zBJ-gcc). —1 s j : N. (26.18)
The next lemma provides the pointwise variance of 22 (x).

LEMMA 2.6.11. The pointwise variance ofég (x) is the function 0% 2 (x) deﬁned in (2.2.10),

which satisﬁes

02 :1:
E {5:3 (x)} E 031,2(x) = :f—(a:)—7%AT (x) Sj(x)A (:13) {1+ rng (x)}, (2.6.19)

with sque[a,b] I7'11,2 (13)] —+ 0, j (x) is as deﬁned in (2.2.1), A (x) as deﬁned in (2.2.7) and
matrix Sj in (2.25). Consequently, there exist positive constants c(I and C0 such that for
large enough 71

ca (nh)—1/2 g 0712(13) g Co (nh)_1/2 ,Vx 6 [(1,1)]. (2.6.20)

27

PROOF. From (2.6.15) and (2.6.17), one has

E {5:3 (x)} = DT (x) cov (A1010) D (x) = DT (x) Sj(x) cov (U) §ﬂ$)D (x).

Note that E(Ei|Xz') = 0, E [8.712 (Xi) 81,2 (Xk)0‘(Xz')(I (Xk) 52511:] = 0,Vi 75 k, the jl-th

entry of the covariance matrix of U deﬁned by (2.6.14) is

~11; : 2 E {BJJ(XJ)BJ,J(XJ)a(XJ-)a(XJ)5J5-J}
i=1k=1
1 Tl
= 52131 Bj,(i,i2X)Bl2(X)02(Xi)=} [02(7) ‘U,)Bzz(v)f(v)dv=0'jl

which is the jl-th entry of the matrix 2 deﬁned in (2.2.6), i.e., cov (U) = E. The rest of

the prOOf is simple algebra. Cl

2.6.6 Proof of Theorem 2.2.1 with k = 2

Prior to the proof Theorem, we introduce some notation. First we deﬁne 2—vectors {ZJ- ”:0

(j) (3') ,
z,- -=- (311,212) = Af{COV(Aj)}_l/2 = (2%” Aﬂ +ﬁ12 15M ) v (2.6.21)
312/) 1'1 +2221) 3
where denote ( )
J
{cov (Aj)}_1/2 E ( [301) #12:) . ' (2.6.22)
'612) '822)

2 Then it is clear that var (Z3) 2 I , var (2.7"!) = 1,7 21,2, for any 3' = 0, ..., N.

The covariance matrix Of Aj approximates 02(tj+1)Sj deﬁned in (2.2.5) uniformly.

LEMMA 2.6.12. For {Aj ”:0 deﬁned in (2.6.15) and matrix Sj deﬁned in (2.2.5), one has

_. 2 . . " . __
cov (Aj) — o (tJ+1)SJ +R j,() _<_ j < N, 111320021]?lele —— 0.

PROOF. Since Aj = SjU with S, deﬁned in (2.6.16) and cov (U) = 2 as in the proof Of

Lemma 2.6.11. Thus the covariance matrix Of AJ- is

N _ , N . ,
COV (Aj) : 3123? = Zﬁlz—l Sj—lJcsj—IJUkl 2k l=—1 5],k5]—1,lakl .
2k,l:—1Sj—1,ksj,lakl Zk,z=_.18j,k8j,10k1
By Assumption (A2), (2.6.16) and (2.2.6)
01-1 = /02 (U) Bk,2 (U) 31,2 ("0)“de = 02 (tk+1)'U/Jz + 010(f02Jl) -

28

Similarly, one also has 0k! = 02 (t1+1)vk1 + cw (fa2, h). Thus

N
31—1 ij—l 11’k102(t1+1) 5]" ij—l 1'01-102 (t1+1) ~ ..
COM/‘1'): ’ ’ 2 ' ’ 2 +Rj,
“2-1 Si-Lk'stvkl" (tk+1) Sj,k3j,1vk10 (tk+1)

where

N N
~ 2 Zk1=_15'—1,kS'—1,1 Z =_ 3',k5'—1,l
thcw(fa,h)( N J J [NI 1] J )

J 221:4 3j~1,k3j.l 2112—1 $chij

Note that 2152,24 8]"):ka =— 011.175 j and Efcv=_1 5]"ka = 1 ifl = j, thus

2 2
. = 314.10 (6') 31—110 (6+1) ~.,_ 2 . . ~.
(Sm/(A3) ( 5241.102 (9+1) 31.10% (6+1) +R’ _ a “”283 +11"

LEMMA 2.6.13. For the matrices 5;”? deﬁned in (22 7)

. ...—1/2
hm max :1 -
"—00 OSJ'SN

— u(tJ-H) {cov (Aj) }“1/2| = 0. (2.6.23)

— 2 — . . . .
PROOF. Note that E j 1/ , {cov (Aj)} 1/ 2are symmetric matrices and usmg the followmg

fact for symmetric matrices A and B

 

C

A—1/2 _ B—l/2l = C max (A—1/2 _ B—1/2) 61
1:12

(BAUQ + .A.Bl/2)(A”1/2 — 3‘1/2) 12,-] = |B - Al,

 

 

_<_ max
221,2

together with Lemma 2.6.12, one has

c|:~:;1/2 — v(t)-+1) {cov (A1) }“/2| .<.. lo“2(tj+1>cov (A1) — 5,]

S lSj - E)" + IU—2(tj+1)COV (Aj) - 8]" = ISJ' — Ejl + U”2(tj+1)ﬁ.j.

The desired result follows from Lemma 2.6.9. Cl

LEMMA 2.6.14. Under Assumptions (A 1)-(A5), for the variables Zj'w ’y = 1,2, 0 gj g N,
deﬁned in (2.6.21), one has

max lim sup P max {Z2 } > 2 {log (N + 1)} ((1,, (Cl/2)}2 S 01/2. (2.6.24)
7:112 n—100 USJSN J2

29

PROOF. Without loss of generality, we prove (2.6.24) only for 7 = 1.

{Z ,21} > 2 {log (N + 1)} {11. (042112]

P[O_<_j<N

= P[Or<na.XN{ZJI}>{210g(N+1))1/2dn(a/2)]1

where, according to (2.6.21) and (2.6.22)

211 — 59/91 +ﬂij2)Aj2= £2: 2 (2021131: 1,11 +ﬁi£)3j,k) Bk,2(X1)0(X1)€1-
ﬁi=lk=—l

Let (id' 2 25:4 (6933);”, + 6132 )Sch) Bk,2(X,-)U(X,-)e,-, j = 0,...,N,1§= 1, ...,n, then
17.
r 2 r
(7—1431 = Sn = 2C1,» ICU/5231) = 7135321 =
'=-_1

So one only needs to ﬁnd a bound for E lC11jl3 in order to apply Lemma 2.6.3 to Sn. By
the boundedness of max IE”, (2.6.3) and (2.6.23)
OSjSN

3
N .
E|<1,,I3:E Z (115% s,.. 11+13§Qs,1) 3,,1111) 03(X1)|si|
k=—1
3

N . ,
s MOE Z (11Ei’s,_1,1+ﬂ§’,)s,,1)B11011) 03(X1) 3011,0114”.

112—1

 

Lemma 2.6.3 entails that An = 0 (n—l/zh-lﬂ) = 0 (N‘2), in which An is

P{n_l/2ZC,J S 2} — (13(2) .
i=1

OgistuplP{ZJ-1 < z}—— <I>(z) )l— —— Ornjachsgp

 

 

By Lemma 2.6.4, one has uniformly in j

PZ[I ,11<{21og(=1N+1)}1/21,(a/2,] ——2—W“Tﬁ+o(N-l).

Therefore

T1

[0211;ng lell > {210g (N + 1)}1/2 dn (Cl/2)]

Mzm

P[|z ,11 >{2log( (N+ 121/211111121]

1»
H
O

[1— {1— 27h” +0(1) =a/2+o(1).

0

14.1
l I

II
I‘MZ

30

Hence

limsupP[0g}z1£x}v{ZJ-1} > 2 {log (N + 1)} {(111 (Or/2)} 2] = a/2. D

121-"’00

LEMMA 2.6.15. For a given 0 < a < 1, and 0112(1) as given in (2.2.10)

lim inf P

n—100

 

sup
$6 [a,b]

 

1.1111111 (11)] S 21log<N + 131/21,11,121] 2 1— a.

PROOF. Note that 62(13) = DT (11:) Aj(x)1 where D (3:) and Aj(x) are deﬁned in (2.6.18) and
(2.6.15). Thus, standardization leads to

{a;,§(z)é2(a:)}2 = {11;}2(11)D (,)}T A111) A321) {a;,12(:rr)D (11)} . (2.6.25)

{0;’12(x)D (11)}T cov (Am) {a;,12(:r)D (1)}

~ -_ __T .“1._.T__ r2
Deﬁne for any J —— 0,...,N, Q.7 -— AJ- {cov (A])} A] —— ZJZ]. — 27:13 A'ﬂ. The

 

maximization lemma of Johnson and Wichern (1992), page 166, ensures that for any :1: E

[01 bl

 

211-). T A AT_11(_11__
1111(2) (1) 1(1) 11,1211) T
T S A11

D :r) I (A . ) D 1:)

an’2(;r CO\ 3(1’) an,2(x

2
0;,12(z)52(:r)| g 03%); Qj. Thus (2.6.24)

1:) {C0V(A J'(I)) }—1 A111) 2 ijrl’

which together with (2.6.25) entails that sup
xE[a,b]

 

implies

limian sup la;12($)52(x)| _<_ 2 {log (N +1)}1/2dn(a/2) 2
"HOG x6[a,b] ’ 1
limian [max

71—400 0<j<N

 

Q1 < 4{10g(N+ 1)}{d11(a/2)}2 2

_ l 2 > — = — . [:1
1 721:2hfrlndsoiépP[0 EELSXN{Zj7} >2{og(N+1)}{dn(a/ )} 2] _1 a/2x2 1 a

The next lemma’s proof follows from Lemma 2.6.8, (2.6.12), (2.6.13), (2.3.2) and (2.6.20).

LEMMA 2.6.16. Under Assumptions (A3) and (.45), one has

 

sup 01112“ .5) .1112( )' — sup =Op{(nh)“l/Qlogn} =op(1).

r6[a,b] :rE[a,b]

 

 

31

PROOF OF PROPOSITION 2.3.1 WITH k = 2. It follows from Lemmas 2.6.15 and 2.6.16
automatically. [:1
PROOF OF THEOREM 2.2.1 WITH k = 2. Note that equation (2.3.6) implies that

”7712 (51:) — m (a:)||OO = Op (h2), hence

(1111)”? {log(N+ 1)}"1/2 111711 (:1) — 111(1)“...

= 0, {(111)1/2 {101m +1)}*1/212} = 01(1).

which implies that the bias order is negligible compared to the noise order. Applying (2.3.7)
with k = 2 in Proposition 2.3.1

lim inf P pm (3:) E 7112 (x) :1: 20,12 (:13) {log (N +1)}1/2 dn(a/2),V:1: E [a,b]]

111—’00

: limian sup 0;,12 (as) |52 (:13) + 7712 (:c) — 771(1E)| S 2 {log (N +1)}1/2 dn (Or/2)]

”“00

 

L:ic€[a,b]
= lilmiééfP sup 0;§($)E2 (1:) §2{log(N+1)}1/2dn ((1/2)] 2 1—a. D
H be[a,b] ’

32

CHAPTER 3

Spline-Backﬁtted Kernel Smoothing of
NAAR Models

3.1 Introduction

For the past two decades, various non— and semiparametric regression techniques have been
developed for the analysis of nonlinear time series; see, for example, Robinson (1983),
Tjostheim and Auestad (1994), Huang and Yang (2004), to name one article represen-
tative of each decade. Application to high dimensional time series data, however, has been
hampered due to the scarcity of smoothing tools that are not only computationally expe—
dient but also theoretically reliable. This has motivated the proposed procedures of this
Chapter.

For the NAAR model in (1.3.1), estimators of the unknown component func-
tions {n1.a(-)}g:1 are proposed based on a geometrically strong mixing sample
(14,-, X“, ..., X,,d}?:1. If the data were actually i.i.d. 'observations instead of a time series re—
alization, many methods would be available for estimating {ma (”321. For instance, there
are four types kernel-based estimators: the classic backﬁtting estimators (CBE) of Hastie
and Tibshirani (1990), Opsomer and Ruppert (1997); marginal integration estimators (MIE)
of Linton and Nielsen (1995), Linton and Hardle (1996), Fan, Hardle and Mammen ( 1998),
Sperlich, Tjostheim and Yang (2002), Yang, Sperlich and Hardle (2003) and a kernel based

method of estimating rate to optimality of Hengartner and Sperlich (2005); the smoothing

backﬁtting estimators (SBE) of Mammen, Linton and Nielsen (1999); and the two—stage

33

estimators, such as one step backﬁtting of the integration estimators of Linton (1997), one
step backﬁtting of the projection estimators of Horowitz, Klemmela and Mammen (2006),
and one Newton step from the nonlinear LSE estimators of Horowitz and Mammen (2004).
For the spline estimators, see Stone (1985), (1994), Huang (1998), and Xue and Yang (2006
b).

In time series context, however, there are fewer theoretically justiﬁed methods due
to the additional difﬁculty posed by dependence in data. Some of these are: the kernel
estimators via marginal integration of Tjestheim and Auestad (1994), Yang, Hardle and
Nielsen (1999); and the spline estimators of Huang and Yang (2004). In addition, Xue and
Yang (2006 a) have extended the marginal integration kernel estimator and spline estimator
to additive coefﬁcient models for weakly dependent data. All of these existing methods are
unsatisfactory in regard to either the computational or the theoretical issue. The existing
kernel methods are too computationally intensive for high dimension d, thus limiting their
applicability to small number of predictors. Spline methods, on the other hand, provide
only convergence rates but no asymptotic distributions, so no measures of conﬁdence can
be assigned to the estimators.

If the last (1 — 1 component functions were known by “oracle”, one could create
{33,11X1,1}?=1 With Yi,1 = Y1“ — C — 232277101 (X13111) = m1(X1',1) + 0 (Xi,11---1Xi,d)51‘1
from which one could compute an “oracle smoother” to estimate the only unknown func—
tion m1 (1:1), thus effectively bypassing the “curse of dimensionality”. The idea of Linton
(1997) was to obtain an approximation to the unobservable variables Y“ by replacing
ma (X230) ,2' = 1, ...,n, a = 2, ..., d with marginal integration kernel estimates and arguing
that the error incurred by this “cheating” is of smaller magnitude than the rate 0 (71-2/5)
for estimating function m1 (3:1) from the unobservable data. The procedure of Linton (1997)
is modiﬁed by substituting mo, (X1301) ,i = 1, ..., n, a = 2, ..., d with spline estimators, specif-
ically, a two-stage estimation procedure is proposed: first one pre—estimates {ma (30)}g:2
by its pilot estimator through an under smoothed centered standard spline procedure, next
one constructs the pseudo response Y“ and approximates m1 (2:1) by its Nadaraya—Watson

estimator as given in (3.212).

34

The above proposed spline-backﬁtted kernel (SPBK) estimation method has several ad-
vantages compared to most of the existing methods. Firstly, as Sperlich, Tjostheim and
Yang (2002) mentioned, Linton (1997) mixed up different projections, making it uninter-
pretable if the real data generating process deviates from additivity. While the projections
in both steps here are with respect to the same measure. Secondly, since our pilot spline
estimator is thousands of times faster than the pilot kernel estimators in Linton (1997), the
proposed method is computationally expedient, see Table 4.4. Thirdly, the SPBK estima-
tor can be shown as efficient as the “oracle smoother” uniformly over any compact range,
whereas Linton (1997) proved such “oracle efﬁciency” only at a single point. Moreover,
the regularity conditions considered here are natural and appealing and close to being the
minimal compared to the papers mentioned above. In contrast, higher order smoothness is
needed with growing dimensionality of the regressors in Linton and Nielsen (1995). Stronger
and more obscure conditions are assumed for the two-stage estimation proposed by Horowitz
and Mammen (2004).

The SPBK estimator achieves its seemingly surprising success by borrowing the
strengths of both spline and kernel: Spline does a quick initial estimation of all additive
components and removes them all except the one of interest; kernel smoothing is then ap—
plied to the cleaned univariate data to estimate with asymptotic distribution. Propositions
3.4.1 and 3.5.1 are the keys in understanding the proposed estimators’ uniform oracle ef-
ﬁciency. They accomplish the well-known “reducing bias by undersmoothing” in the ﬁrst
step using spline and “averaging out the variance” in the second step with kernel, both steps
taking advantage of the joint asymptotics of kernel and spline functions, which is the new
feature of the proofs here.

Fan and Jiang (2005) provides generalized likelihood ratio (GLR) tests for additive
models using the backﬁtting estimator. Similar GLR test based on the SPBK estimator is
feasible for future research.

The rest of the chapter is organized as follows. Section 3.2 introduces the SPBK esti-
mator, and states its asymptotic “oracle efficiency” under appropriate assumptions. Section

3.3 provides some insights into the ideas behind the proofs of the main theoretical results,

by decomposing the estimator’s “cheating” error into a bias and a variance part. Section
3.4 shows the uniform order of the bias term. Section 3.5 shows the uniform order of the
variance term. Section 3.6 presents Monte Carlo results to demonstrate that the SPBK
estimator does indeed possess the claimed asymptotic properties. All technical proofs are

contained in Section 3.7.

3.2 The SPBK estimator

In this section, a spline-backﬁtted kernel estimation procedure is proposed. For convenience,
denote vectors as x = (1:1,, ...,xd) and take I] - I] as the usual Euclidian norm on R“ such
that ”x“ : “Sid 1:2,, and H - “00 the sup norm, “X“oo = $11315an Ira]. In what
follows, let Y,- and X,- = (X111: ..., X,,d)T be the ith response and predictor vector. Denote
Y = (Y1, ..., Yn)T the response vector and (X1, ..., X”)T the design matrix.

Assume that the predictor X0 is distributed on a compact interval [am ba] ,a = 1, ..., (1.
Without loss of generality, all intervals [am b0] = [0, 1] , or = 1, ..., d. We pre-select an integer
N 2 Nn ~ n2/ 5 log n, see Assumption (B6) below. For any a = 1, ...,d, the constant B-
spline function in (1.5.2) can be rewritten as the indicator function 1,1,0, (2:0,) of the (N + 1)
equally-spaced subintervals of the ﬁnite interval [0,1] with length H '= Hn = (N +1)-1,

that is
1.1ng0, <(J+1)H,

=0 l N. 3.2.1
0 otherwise, ”I ’ ’ ’ ( )

IJ,a ($01) 2 {

Deﬁne the following centered spline basis

III.1+1,11||2
]l1J10l]2

with the standardized version given for any a = 1, ..., d,

(11,00,130) : IJ+1,Q(1:(1) — IJ,a (Ea) ,Va 2 1, ...,d, J = 1, ..., N, (3.2.2)

bJ,o: (Ia)
“bJ,a”2 ,

Deﬁne next the (1+ dN)-dimensional space G = G[0, 1] of additive spline functions

8,1,0, (11,) = VJ = 1, N. (3.2.3)

as the linear space spanned by {1,BJ,a (ma) ,a = 1,...,d,J = 1, ...,N}, while denote by
Gn C R” spanned by {1,{BJ’O (Xi’a)}?:1,a =1,...,d,J =1,...,N}. As 11 —1 00, the

dimension of 0,, becomes 1 + dN with probability approaching one. The spline estimator

36

of additive function m (x) is the unique element 111(x) = fnn (x) from the space C so that

the vector {7110(1) , ...,riz (X,,)}T best approximates the response vector Y. To be precise
._ ~10 + Z Z A’JQIJC, 1:0,) (3.2.4)
a=1J=l

~I 1.] AI
where the coefﬁcients (A0, )‘111’ ..., A Md) are solutions of the least squares problem

. T "
{A01 A1, 11 Alma] = argmianN+1 2 Y1 - A0 - Z Z )‘J,a[J,a (X1,a)
i=1 (1:1le

Simple linear algebra shows that

. d N
Th(x)=)\0+ ZJZ1, 118,111.1(111) (32.5)

where (Ag, 111,1, ---1:\N,d) are solutions of the following least squares problem

2
d N
T
{A03A1,11“'3 ANﬁf} =argmianN+IZ Y—— AO‘ZZAJHOBJO (Xz,a) )
i=1 a=1J= 1
(3.2.6)

while (3.2.4) is used for data analytic implementation, the mathematically equivalent ex-
pression (3.2.5) is convenient for asymptotic analysis.

The pilot estimators of each component function and the constant are

N n
7510(1130) = ZAJMBJQ Ia) — "_IZZAJMBJOX i,a)
_ 1': 1J= 1
. d n N ..
11. = 1111-122211131. (x111)- (32-7)
a=12=1 J=l

These pilot estimators are then used to deﬁne new pseuddresponses Y“, which are estimates

of the unobservable “oracle” responses Y“. Speciﬁcally,

(122

(1
{[2,1’“ - — c — 2 ma( Y,1- — ,-— c — Z n1.a(X,-,a), (3.2.8)
01:2

where 6 = 7,, = 11‘1 2:21 Y,, which is a Vii-consistent estimator of c by Central Limit

Theorem. Next, define the spline-backﬁtted kernel (SPBK) estimator of ml (1:1) as 711] (2:1)

37

n
i:

based on {Yi’hXi’l} 1, which attempts to mimic the would-be Nadaraya-Watson esti-

mator 171.;(111) of m1(:c1) based on {Yi’th-szl if the unobservable “oracle” responses

{1331}le were available

23121 K11 (X131 — $1) 13.1
221:1 Kh (X131 — 171)

where 1),-,1 and Y“ are deﬁned in (3.2.8).

1‘. K X- — Y-
,111'{(11,)=Z'-1 "( "1 I1) ”1 (3.2.9)

211:1 Kh (X131 - $1) ’

 

 

mi (331) =

Throughout this chapter, on any ﬁxed interval [0, 1], denote the class of Lipschitz con-

tinuous functions for any ﬁxed constant C > 0 as
Lip([0,1],C) ——— {ml ]m(:1:) — m(:z:')] S C la: — :r'] ,Vx,$’ 6 [0,1]} .

(Bl) The additive component function 1n1(a:1) 6 C(2) [0,1] deﬁned in (1.5.1), while there

is a constant 0 < Coo < 00 such that mﬁ E Lip ([0,1],000), V13 = 2, ...,d.

B2) There exist positive constants K and )1 such that a n S K e-AO" holds for all n,
( 0 0 0

T!
with the a-mixing coefﬁcients for {Zi = (X21150) deﬁned as

a(k) = sup |P(BﬂC)—P(B)P(C)|, k_>_ 1. (3.2.10)
BEG{Zs,sgt},C€o{Zs,32t+k}

(BB) The noise 5,- satisﬁes E(5,- ]X,-) = 0, E (5,2 ]X,-) = 1, E (I51|2+5]X1) < M5 for some
6 > 1/2 and a ﬁnite positive M5. The conditional standard deviation function a (x)

is continuous on [0,1]d and

0 <ca g inf o(x) S sup o(x) 3 Ca < 00.
x6101)“ 1610,1111

(B4) The density function f (x) of X is continuous and

0<cf§ inf f(x)§ sup f(x)SCf<oo.
x€[0,1]d x€[0,1]d
The marginal densities fa (230,) of X0 have continuous derivatives on [0, 1] as well as

the uniform upper bound C f and lower bound Cf.

38

(B5) The kernel function K ELip ([—1, 1],Ck) for some constant Ck > 0, and is bounded,
nonnegative, symmetric, and supported on [—1,1]. The bandwidth h of the kernel K

1/5

is assumed to be of order n" , i.e., chn-l/5 S h _<_ C'hn’l/5 for some positive

constants Ch, ch.

(86) The number of interior knots N ~ 112/5 log n, i.e., anQ/5 logn g N S CNnZ/5 logn

for some positive constants cN,CN, and the interval width H = (N + 1)’1 .

REMARK 3.2. 1. The smoothness assumption of the true component functions is greatly
relaxed and Assumption (BI) is closed to the minimal. By the result of Pham (1986), a
geometrically ergodic time series is a strongly mixing sequence. Therefore, Assumption (B2)
is suitable for (1.3.1) as a time series model under aforementioned assumptions. Assumption
(B3)-(B5) are typical in the nonparametric smoothing literature, see for instance, Fan, and
Gijbels (1996). For (B6), the proof of Theorem 3.2.1 in Section 3.7 will make it clear that
the number of knots can be of the more general form N ~ n2/5N', where the sequence N ’
satisﬁes N’ —+ 00, n’ON' —1 0 for any 0 > 0. There is no optimal way to choose N’ as in
the literature. Here N is selected to be of barely larger order than n2/5.

The asymptotic property of the kernel smoother 111'; (2:1) is well-developed. Under As-
sumptions (Bl)-(B5), it is straightforward to verify (as in Bosq 1998) ”that

sup lift; (1:1) — m1(:r1)| = 0,, (n_2/5 log n)
$1€]h,1—h]

\/fl—h{ 311071) - m1 (931) — b1(1:1)h2] 2 N {0’1}? (1:1)},

where

b1(1‘1) IUZKfuldufm'l'fxilfi($1)/2+m'1($1)f{($1)}f1_1($1)1 (3211)
vi (1,) = 1160011113 [02(X1....,X.1) 1X1 = 111111-1011). ' '

The following theorem states that the asymptotic uniform magnitude of difference between
111; (2:1) and 1111‘ (2:1) is of order 0,, (n‘2/5), which is dominated by the asymptotic uniform
size 01171; (3:1) -— m1 (171). As a result, 111; (2:1) will have the same asymptotic distribution

as 171’,“ ($1).

39

THEOREM 3.2.1. Under Assumptions (BI) to (B6), the SPBK estimator riff (11:1) given in
(3.2.9) satisﬁes

sup (111; (11,) — 111; (1:1)1 = 0,, (152/5).
x1€[0,l]

Hence with b1 (3:1) and v? (2:1) as deﬁned in (3.2.11), for any 2:1 E [h,1 — h]

1111(111’; (2:1) — m1 (11:1) — b1(:1:1)h2} 2, N {010% (3:1)} .

REMARK 3.2.2. The above theorem holds for 111:, (3:0,) similarly constructed as 1111‘ (11:1),

for any a 2 2, ...,d, i.e.,

TL K X- —1: 1?- .
111:“,(.1O,)=21-}l $5; a) “'0‘, 19,,,=1/,-—a— Z 111, (Xw), (3.2.12)
21:1 h 1,1—2211) 13113111311111

 

where 111.), (Xiyﬂ), [3 : 1, ...,d are the pilot estimators of each component function given
in (3.2.7). Similar constructions can be based on local polynomial instead of Nadaraya—
Watson estimator. For more on the properties of local polynomial estimators, in particular,
its minimax efficiency, see Fan and Gijbels (1996).

REMARK 3.2.3. Compared to the SBE in Mammen, Linton and Nielsen (1999), the
variance term v1 (3:1) is identical to that of SBE and the bias term b1 (2:1) is much more
explicit than that of SBE at least when Nadaraya—Watson smoother is used. Theorem 3.2.1
can be used to construct asymptotic conﬁdence intervals. Under Assumptions (B1)-(B6),

for any a E (0, 1), an asymptotic 100 (1 - a) % pointwise conﬁdence intervals for m (2:) is

111.1(11) —b,(111)11‘2110mm,){/K2(11)d11]1/2/{11hf,(1:1)}1/2, (3.213)

where 61(x1) and f1 (1:1) are any constant estimators of E [o2 (X) IX 1 = 2:1] and f1 (2:1).
The following corollary provides the asymptotic distribution of ﬁt" (x). The proof of

this corollary is straightforward and therefore omitted.

COROLLARY 3.2.1. Under Assumptions (BI) to (B6) and the additional assumption that
ma ($0,) E 0(2) [0,1], 0: == 2,...,d, for any x E [0,1]d, the SPBK estimator 111:, (x), a =

1, ...,d, are deﬁned as given in (3.2.12). Let

d
711* (x) = (3 + 2 7T1; (350)1bfx) : Z (10(1301) ,v2(x) : Z U121,($a)1
0:1

then

M{m* (x) -— m (x) —b(x)h2} 2, N {0,122 (x)}.

3.3 Decomposition

In this section, some additional notations are introduced in order to shed some light on
the ideas behind the proof of Theorem 3.2.1. Denote by ||¢||2 the theoretical L2 norm of

a function 45 on [O,1]d, ||¢||§ = E {(1)2 (X)} = fl0 11d d2 (x)f(x) dx, and the empirical L2

_1 n
i=1

norm as Mug,” 2 n (t2 (X,). The corresponding inner products for L2-integrable

functions (f), 90 on [O,1]d are

m» E{¢(X)w(X)} = [[0 lld¢(X)so(X)f(X)dx,

(cf), so>2,n = n“ E (I) (Xi) so (Xi)-
i=1

The evaluation of spline estimator m (x) at the n observations results in an n—dimensional
vector, in (X1, ..., Xn) = {in (X1) , ..., m (Xn)}T, which can be considered as the projection
of Y on the space 0,, with respect to the empirical inner product (-, )2,” . In general, for any
n-dimensional vector A 2 {A1, ..., An}T, deﬁne PnA (x) as the spline function constructed

from the projection of A on the inner product space (Gm (3)231)

d N
PnA (X) Z )‘O '1‘ Z Z AJ,aBJ,a ($01):
a=1J=l

with the coefﬁcients (20, 21,1, ...,;\N,d) given in (3.2.6). Next, the multivariate function
PnA (x) is decomposed into empirically centered additive components PmaA (2:0,), (1 =

1, ..., d and the constant component Pch

n
pr (2:0) = P3,.» (ma) — n“ 2 PtaA (X...) , (3.3.1)
i=1
A d n
PMA = A0 + n_1 2 Z P;,OA (Xm) . (3.3.2)
021 i=1

where PE‘QA (3:0,) 2 291:1 X1,“ B 1’0. ($01). With these new notations, one can rewrite the

spline estimators m (x) , ma (2:0) , me deﬁned in (3.2.5) and (3.2.7) as

771' (X) : PnY (X) , ﬁlo (Ia) Z Pn,aY (ma) ,ﬁlc = Pn,cY,

41

Based on the relation Y = m(X)+o (X) 5 = m (X)+E with noise vector E = {o (Xi) ei};l=1,

one deﬁnes similarly the noiseless spline smoothers
m (x) = Pn {771(X)}(x),1iia(a:a) = Pma {m (X)} (ma) ,mc = Pmc {m (X)} , (3.3.3)
and the variance spline components
E (x) = m: (x) .2.. (x3) = Pics (ma) ,5. = p.,.E. (3.3.4)

Due to the linearity of operators Pn, Pmc, Puma = 1, ...,d, one has the following crucial

decomposition for proving Theorem 3.2.1,
m (x) = m (x) + E (x), mc = The + EC, ma (2:0)) 2 ma (11:0,) + E00130) (3.3.5)

for a = 1, ..., (1. As closer examination is needed later for E (x) and Ea (11:0), one deﬁnes in

addition 5 = {(10, (11,1, ..., aN,d}T as the minimizer of the following
2
n

d N
Z 0 (X051 — a0 - Z Z aJ,oBJ,a (X130) - (3.3-5)

i=1 a=1J=1

—1
Then 2‘ (x) in (3.3.4) can be rewritten as 5TB (x), where 5 = (BTB) BTE is the solution

of (3.3.6), and vector B (x) and matrix B are deﬁned as
T
B (X) 2’— {1, 31,1 (11:1) , ..., BN,d ($d)} , B = {B (X1) , ..., B (Xn)}T . (3.3.7)

To be specific, the least square solution of the noise is

—1
T
a: 1 OdN ézy=10(xi)5i
OdN <BJ,aaBJ’,a/>2’n Isma’sd, a ?=1BJ,a(Xi,a)0(Xi)€i 1ngN,
ng,J’gN 1309’
(3.3.8)

where 0p is a p—vector with all elements 0. The main objective here is to study the difference
between the smoothed backﬁtted estimator m; (3:1) and the smoothed “oracle” estimator
m’i‘ (3:1) , both given in (3.2.9).

Horn now on, assume without loss of generality that d = 2 for notational brevity.
Making use of the deﬁnition of ('3 and the signal noise decomposition (3.3.5), the difference

m; (11:1) —- m; (2:1) — f: + c can be treated as the sum of two terms

% ELIKh(Xi,1-$1){ﬁl2(xi,2)“m2(Xz',2)}Z ‘I’t($1)+\1’v(rvi)
hill KI: (X231 — $1) $121119; (Xi,l - 2:1)

 

 

, (3.3.9)

42

where

‘1’!) (171) = 7-11-2101 (X131 - $1) {(7129132) - m2 (Xi,2)}, (3-3-10)
Wu (5151) = —;:21Kh(x 1 — $1)€ 2(Xi,2) . (3.3.11)

The term \Ilb (2:1) is induced by the bias term 1712 (X232) —m2 (Xi,2), while ‘11,, (2:1) is related
to the variance term E2 (X 132). Both of these two terms have order op(n“2/5) by Propositions
3.4.1 and 3.5.1 in the next two sections. Standard theory of kernel density estimation ensures
that the denominator term in (3. 3. 9), 1 2,; 1 K h (X131 — 3:1), has a positive lower bound
for 2:1 E [0, 1]. The additional nuisance term 6 —— c is of clearly order 0,, (n‘lﬂ) and thus
0,, (n’2/5), which needs no further arguments for the proofs. Theorem 3.2.1 then follows

from Propositions 3.4.1 and 3.5.1.

3.4 Bias reduction

In this section, we show that. the bias term \Ilb (2:1) of (3.3.10) is uniformly of order

0;, (n‘2/5) for 1171 E [0,1], which is given by Proposition 3.4.1 as below.

PROPOSITION 3.4.1. Under Assumptions (B1) to (B2), and (B4) to (B6)

..1‘13,..""b<$1>' = 0p (.412 + H) = ., (Tl-”5)-

One important result from page 149, de Boor (2001), is cited before the proof.

LEMMA 3.4.1. There exists a constant Coo > 0 such that for any component function ma 6

Lip([0,1] ,Coo) and function 90 E 0,01 =1,...,d, Hga — mall00 S COOH.

LEMMA 3.4.2. Under Assumption (8]), there exists function 91, 92 E G, such that

2
Th — g + :3 (1,901 (X0)>2,n

(.121

= 017(n'1/2 + H) ,
2,n

 

 

 

 

where g (x) = c + 23:1 9“ (1rd) and 171 is deﬁned in (3.3.3).

43

PROOF. By Lemma 3.4.1, there is a constant C'00 > 0 such that for function go, E G

”90 —malloo _<_ COOHv a = 132' Thus “g_mlloo S 231:1“90 “malloo S 2CooH 311d

||m — m||2,,, S “g — mllzn S 2CooH. The triangular inequality then implias that

”771 — 9|l2,n 5 “Th — mll2,n + ”9 - m”2,n S 4000(1)

|(9a(Xa),1)2,nl s |<1,ga (X3)... — (11m0(XC1))2,n

 

+ |<1.ma(Xa)>2,n

 

 

 

 

 

 

 

 

 

3 COOH + 0,,(n-1/2) . (3.4.1)
Therefore
2 2
7h _ g + Z (1290 (XO))2,n S “777' _ gll2,n + Z I<1290 (XO))2,n
0:1 2,n 0:1
3 600011 + 0,, (71-1/2) = 0,, (TH/2 + H). E]
PROOF OF PROPOSITION 3.4.1. Denote
Rr ___ sup Z?=1Kh(Xi,1 - 1‘1) {92 (X232) - m2 (Xi.2)}
3:1€[0,1] ZZZ—.110; (X231 ‘ 171)
R Z?=1Kh (X231 - I1) (7712 (Xi,2) - 92 (Xi,2) +(1:!12(X2))2,n}
2 = sup a
x16[0,1] Z?=1Kh(xi,1 — 31)
then supxlelml [\I/b (2:1)|< ((1,512 (X2))2 ,, + R1 + R2. For R1, using Lemma 3.41
To deal with R2, let 3120120,) 2: BJ’z (2:0)) — (1, 8J2 (Xa))2,n, for J = 1, ..,N, O: = 1,2,

then one can write

2 N
m(x) (x)+:(1, ga(Xa))2 "“251 +0223}, BJa( (.230)

(1:1
Thus, n'l 221:1 Kh (X131 — 31:1) {1712 (XL?) — 92 (X13) + (1,92 (X2))2,,,} can be rewritten
n—l 2?:1Kh (Xi,1 — x1) 2:"); 1aJ2BJ2 (X132): bounded by

N
Z “1,2

=1 1<J<Nn

N
3261'}

2 { sup
J21 nggN

 

 

":1 Kth 211— 331) 33,2 (X232)

n“ 2w (X331)

i=1

k,

 

 

 

 

 

 

Tl
+ An,l 'Wl 2: Kb (X131 — 231)
i=1

 

},

44

where A,“ = 0,, (log n/\/1—i) as in (3.7.15), LUJ (X1321) is as given in (3.5.5) with mean
you (2:1). By Lemma 3.7.2

sup sup
x16[0,l]1SJSN

+ sup sup In,” ($1)! 2 Op (logn/M) +Op (HI/2) = Op (HI/2).

IIEIO,” ISJSN

sup sup

n
-l
71, ij (Xia-Tl)“qu (IE1)
I16[0,l] ISJSN

i=1

 

n
”‘1 ZWJ (X13351) <
i=1

 

 

 

Therefore, one has

n‘1 2191(th — 2:1) {1712 (X132) - 92 (X232) + (1,92 (X2))2,n}

 

 

 

 

 

 

 

 

 

 

sup
$16l0,l] i=1
N .. 2 1/2 1/2 logn N d 2 1/2
S N Z (am) 0,, (H ) + 0,. W, = 0. Z (w)
J=l J=1
2 2
_—. 0,,( ... — g + 5; (1,9... we»... ) = 0.. .. —g + : <1.ga(Xa)>2,..
0:21 2 0:1 2,n
where the last step follows from Lemma 3.7.7. Thus, by lemma 3.4.2
R2 = 0,,(rr1/2 + H) . (3.4.3)
Combining (3.4.1), (3.4.2) and (3.4.3), one establislws Proposition 3.4.1. CI

3.5 Variance reduction

This section shows that the term ‘11., (1:1) given in (3.3.11) is uniformly of order 0,, (n72/5).
This is the most challenging part to be proved, mostly done in Section 3.7. Deﬁne an

auxiliary entity

m:

N
~20 0,12 B,J2( (3-5—1)

where (1J2 is given in (3.3.8). Deﬁnitions (3.3.1) and (3.3.2) imply that E2 (2:2) is simply

the empirical centering of (2‘; (11:2), i e

n
3:2 (:2) E g; (.172) — 114: a; (Xw). (3.5.2)
i:l

PROPOSITION 3.5.1. Under Assumptions (B2) to (B6),

sup N. (2:1)l = 0p (H) = 010 (7772/5) .
1216(0,”

According to (3.5.2), one can write ‘111, ($1) = ‘11?) (x1) -— 315,1) (11:1), where

1)(3131) 2 ”Plth( (,)X11-$1 711253, (X132), (3-5-3)
i=1

wt” (x1) = WZXI. (Xz,1-$1)5§(X1,2), (3.5.4)
[=1

in which E; (X2?) is given in (3.5.1). Further one denotes
an (X1421) = Kh(X(,1 - 1‘1)BJ,2 (X12) , no] ($1) = EWJ (X1551), (35-5)
by (3.3.8) and (3. 5 1), ‘11,,(2) (2:1) can be rewritten as

n N
2 _ ..
\I/S, ) (171) = n l E E aJ,2wJ (X),:1:1). (3.5.6)
l=l J=1

The uniform order of ‘11“) (11:1) and \I/(2 )(1121) are given in the following two lemmas.

LEMMA 3.5.1. Under Assumptions (B2) to (86), (11.8%,) in (3.5.3) satisﬁes

   

l,” (1:1)] = 0,, {N (logn)2/n}.

    

;E1€[0,l]

PROOF OF LEMMA 3.5.1. Based on (3.5.1)

71 Tl
—1 :55 (Xi,2) = 9.1.2 {71—12322 (Xi,2)}
n

l
5 2 31,2 (X232) -
i=1

M2

J

N
2am - sup
J21 ingN

l

l/\

 

 

Lemma 3.7.5 implies that
N 1/2

N
25.1.2 g N233, g{N.5T5}
J=l

J=l

”2 = Op(Nn—1/210gn) .

By (3.7.15),(3.7.18), sup In“l 21:1 8,1,2 (Xi,2)| S An,l = 0,, (n71/2 log n), so
IngN

”—1252“ =Op{N(logn)2-/n} (3.5.7)

46

By Assumption (B5) on the kernel function K, standard theory on kernel density estimation
entails that squle[0,1] ln‘l 211:1 Kh (Xhl — 1:1)l 2 Op (1). Thus with (3.5.7) the lemma

follows immediately. CI
LEMMA 3.5.2. Under Assumptions (B2) to (B6), ‘1’?) (2:1) in (3.5.4) satisﬁes

sup
11610.1]

 

«153’ (ml = 0M).

Lemma 3.5.2 follows from Lemmas 3.7.9 and 3.7.10. Proposition 3.5.1 follows from
Lemmas 3.5.1 and 3.5.2.

3.6 Simulations

In this section two simulation experiments are carried out to illustrate the ﬁnite-sample
behavior of the SPBK estimators m; (sea) for a = 1,...,d. The programming codes are
available both in R 2.2.1 and XploRe. For more information on XploRe, see Hardle, Hlavka
and Klinke (2000) or visit the following website, http://www.xplore—stat.de.

The number of knots N for the spline estimation as in (3.2.6) will be determined by the

sample size and a tuning constant c. To be precise
N = min ([cn2/510gn] +1, [(n/Q —1)d_1]),

in which [a] denotes the integer part of a. In this simulation study, c is chosen to be 0.5 and
1.0. As seen in Table 4.3, the choice of c makes little difference, so we always recommend
to use c z 0.5 to save computation for massive data set. The additional constraint that
N g (n/ 2 - 1) d—1 ensures that the number of terms in the linear least squares problem
(3.2.6), 1+dN, is no greater than n / 2, which is necessary when the sample size n is moderate
and dimension d is high.

We have obtained for comparison both the SPBK estimator my; (350,) and the “oracle”
estimator mg, (33a) by Nadaraya-Watson regression estimation using quartic kernel and the
rule—of-thumb bandwidth.

We consider first the accuracy of the estimation, measured in terms of mean average

squared error. To see that the SPBK estimator mg, (2:0) is as efﬁcient as the “oracle”

47

smoother ﬁt; (ma), define the following empirical relative efﬁciency of my, (.130) with respect
to in; (3:0,) as

11 ~ .. 2 1/2
Zizl {ma (X1170) - m0, (Xi,a)}

.. 2
ESE-.1 {m3 (X130) - ma (Xi,a)}
Theorem 3.2.1 indicates that the effa should be close to 1' for all a = 1, ...,d. Figure 4.15

 

(3.6.1)

effa =

and 4.16 provide the kernel density estimations of the above empirical efﬁciencies to observe
the convergence, where one sees that the center of the density plots is going toward the
standard line 1.0 and the shape of those plots becomes narrower as well when sample size

n is increasing.

3.6.1 Example 1

A time series {1’2}?:§1999 is generated according to the NAAR model with sine functions

given in Chen and Tsay (1993),
. 7T , 71‘
Y; =1.551n(;2—Yt_2) —1.081n(-2-Yt_3) + cost, 00 = 0.5,1.0,

where {502331996 are i.i.d. standard normal errors. Let X? = {Yt__1, Yt_2, Yt_3}. Theo-

n+3
rem 3, page 91 of Doukhan (1994) establishes that {Yb X?) is geometrically ergodic.

t=—1996
The ﬁrst 2000 observations are discarded to make the last n+3 observations { Yt}?=+ 13 behave
like a geometrically a-mixing and strictly stationary time series. The multivariate datum
{1Q,XtT}::: then satisﬁes Assumptions (Bl) to (86) except that instead of being [0,1],
the range of Yt—a, a = 1, 2, 3 needs to be recalibrated. Since there is no exact knowledge of
the distribution of the Yt, many realizations of size 50000 have been generated from which
one sees that more than 95% of the observations fall in [—2.58,2.58] ([—3.14,3.14]) with
00 = 0.5 (00 = 1) . We will estimate the functions {ma(a:a)}g=1 for ma 6 [—2.58,2.58]

([—3.14,3.14]) with 00 = 0.5 (00 = 1.0), where

m1(5131) E 0,
m2 (1:2) E 1.5 sin (£3172) — E [155111619] ,
m3 (1'3) E —1.0$in (€173) -— E[—1.031n(th)] .

48

Sample size n is chosen to be 100, 200, 500 and 1000. Table 4.3 lists the average squared
error (ASE) of the SPBK estimators and the constant spline pilot estimators from 100 Monte
Carlo replications. As expected, increases in sample size reduce ASE for both estimators
and across all combination of c values and noise levels. (Table 4.3 also shows that the
SPBK estimators improve upon the spline pilot estimators immensely regardless of noise
level and sample size, which implies that our second Nadaraya—Watson smoothing step is
not redundant.

To have some impression of the actual function estimates, at noise level 00 = 0.5 with
sample size n = 200, 500, the oracle estimators (thin dotted lines), SPBK estimators in;
(thin solid lines) and their 95% pointwise confidence intervals (upper and lower dashed
curves) for the true functions ma (thick solid lines) have been plotted in Figure 4.12, 4.13
and 4.14. The visual impression of the SPBK estimators are rather satisfactory and their
the performance improves with increasing n.

To see the convergence, Figure 4.15 plots the kernel density estimations of the 100
empirical efﬁciencies for sample sizes n = 100, 200, 500 and 1000 at the noise level 00 = 0.5.
The vertical line at efficiency = 1 is the standard line for the comparison of ﬁt; (11:0,) and
1h; (10.). One can clearly see from Figure 4.15 that as sample size 71. increases the efﬁciency
distribution converges to 1, conﬁrmative to the conclusions of Theorem 3.2.1.

Lastly, the computing time of Example 3.6.1 is provided based on 100 replications done
on an ordinary PC with Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. The average
time run by XploRe to generate one sample of size n and compute the SPBK estimator and
marginal integration estimator (MIE) has been reported in Table 4.4. The MIEs have been
obtained by directly recalling the “intest” in XploRe. As expected, the computing time for
M113 is extremely sensitive to sample size due to the fact that it requires n2 least squares
in two steps. In contrast, at least for large sample data, the proposed SPBK is thousands
of times faster than MIE. Thus our SPBK estimation is feasible and appealing to deal with

massive data set.

49

3.6.2 Example 2

Consider the following nonlinear additive heteroscedastic model
d W . o d
. 1.1..
Yt = 2:18111(EX¢_0)+ 0(X) abet .~ N(0, 1) ,
a:

in which X? = { Xt_1, ..., X t—d} is a sequence of i.i.d random variables with standard normal
distribution truncated in the interval [—2.5,2.5] and the conditional standard deviation

function is deﬁned as

o (X) 2 JOE - 5 — eXP(Zg=1lXt-a|/d) 00 : 0.1.

2 5 + exp (22:1 lXt—al/ d) ,

This choice of a (X) ensures that the design is heteroscedastic, and the variance is roughly

 

proportional to dimension d. This proportionality is intended to mimic the case when
independent copies of the same kind of univariate regression problems are simply added
together.

For d = 30, 100 replications have been done for sample sizes n = 500, 1000, 1500 and
2000. The kernel density estimator of the 100 empirical efﬁciencies is graphically represented
in Figures 4.16 and 4.17. Again one sees that with increasing sample sizes, the relative

efﬁciency are becoming closer to the vertical standard line, with narrower spread out.

3.7 Proof of Theorems

Throughout this section, an >> 1),; means lim bn/an = 0, and an ~ on means lim bn/an =
n—+oo n——+oo

c, where c is some constant.

3.7.1 Preliminaries

Deﬁne for a =1,2,J =1,...,N +1

on. = IIIJ,..||§ ——- [13,... (was (mantra. (3.7.1)

LEMMA 3.7.1. Under Assumptions {B4} and (B6), one has:

50

(i) there exist constants C0 (f) and Cl (f) depending on the marginal densities
1.. (as) ,a = 1,2, such. that Co (f) H s ||b1,a||§ s 01 (f) H-

 

(ii)
1 J’=J
_CJ 1 ((2.1, “(110,“;l J’=J—1
E{BJ,.. (x...) 3.1/,0. ma} = + “l ““2 J
«Wuhan; Ila. ll,“ _m
0 |J—— J’l >1
~ 1 |J’—-J| g1
11 |J’—-J|>1
and fork 21,
1.11.1; (Wham is) 1:1,
I: ,__ _
EIBJ,a (Xi,a)BJ’a( J+1’a”bJ’a“2 “by“ “2k ﬁg}; J J 1
CJ+2,allb-I,2_0”kHbJ’,a'2—k/CJ+110,J=J+1
0 |J— J’|>1
H1 k If ——J| <1
0 |J’— J] >1

where 0,1,0, (1 =1,2,J =1,...,N +1 are given in (3.7.1).
PROOF. Note that for any a = 1,2, J = 1,...,N, byﬂ ($0) in (3.2.2) can be rewritten as
bJ,a ($0) = IJ+1,a ($a) - CJ+1,aIJ,a(xa)/CJ,01 and
2
”bJ,a“2 = CJ+l,a(1+ CJ+l,a/CJ,a)a

In Assumption (B4), the two positive constants cf, C f are the upper and lower bounds of

fa (:50), then
CfH S CJ,a S CfH

CO (f) H 2 cf (1 + cf/Cf) H 3 "111,0“; 3 C, (1 + of/cf) H = Cl (f) H,

for all J = 1, ...,N + 1,01 21,2. The proof of (ii) is trivial. Cl
LEMMA 3.7.2. Under Assumptions (B4) to (86), for an” (171) given in (3.5.5)

= o (HI/'3).

 

sup sup l/in (:rl)
3316(0,” ISJSN

51

PROOF. By deﬁnition,

 

11w‘1(2:1)l = |E{Kh(X1J — 3:1) 8‘12 (X1,2)}| is bounded by

ijh (“1 -$1)IBJ,2(U2)|f (U1,U2)dU1dU2

//1{(U1)'————bJ2(u2)l f(hU1 +$1,112)dv1‘d112
“’9 “2J2

(112,.)‘21 {f/K 1,11.12<u2>1<1w. +$12u2)dv1du2

+ (%i£)l/2//K(vl)1J,2(u2)f(’wl + $1,U2)d’01d‘u2}-

The boundedness of the joint density f and the Lipschitz continuity of the kernel K will

(I

then imply that

sup sup //1{(U1)1J,2 (112)f(hu1 + $13,112)du1du2 S CKCIH,
$1€[0,l] ISJSN

the proof of the lemma is then completed, by (i) of Lemma 3.7.1. Cl

LEMMA 3.7.3. Under Assumptions (Bi) and (BU), there exist constants CO > CO > 0 such
that for any a = (a0,a1,1, ...,aN,1,a1)2, ...,n/(1,2),
‘2

2 , 2 2 r
C0 a0 + Zaia S a0 + Z (lJaaBJ’a S CO 0.0 + Zaj’a . (3.7.2)
J,a J,a 2 J,a

PROOF. Lemma 1 of Stone (1985) provides a constant (:0 > 0 such that

‘2 N 2 N 2
2
(10 + Z aJ,aBJ,(r 2 C0 (10 + Z aJ,lBJ,l + Z aJ,ZBJ,2 1
J,a 2 J21 2 J=1 2

then (3.7.2) follows if there exist constants 06 > 06 > 0 such that for a = 1, 2
‘2

N N
I 2
60 Z 01,2 3 Z “JnBJe
J=1 J=l

N
g 06 2 113,0. (3.7.3)
2 J=1

To prove (3.7.3), the original B—Spline basis is employed. Without loss of generality, let
a = 1, and use the constant basis {IJJ (2:1)}[JV:11. Represent the term 2.11:1 a JJB J,1 (2:1)

as follows
N N +1

20.11311071):203111110131) (3-7-4)

J: l

52

Theorem 5.4.2 in Devore & Lorentz (1993) says that there is an equivalent relationship
between the Lp (p > 0) norm of a B-spline function and the sequence of B-spline coefﬁcients.

To be speciﬁc

NH 2 NH N+l

2
Z dJ,11J,1 =/ Z dJ,IIJ,1($1) idl‘l = 2 (13,13
J=l J=1 J21

L2
The uniform boundedness of the joint density in Assumption (B4) implies that

NH 2 NH 2 NH 2

c, Edwin 3 20111111 SC; Zdu’m
J=l J=1

L2 2 J=l L2

Then Lemma 3.7.1 and (3.7.4) lead to

N+1 N 2 2
a
2 J,l C.I+1,1
§:dJ,l:§:_—{< )+1}‘
J=1

 

J=1 “bull: C“
Then
N N+1 N
ca 2 (23111—1 3 2 (13,15 Ca 2 (23,111“,
J=l J=1 J=l

for positive constants ca and Ca. Therefore,

N N 2 NH 2 N
2 2
“f0“ 2 “J,1 3 Z “1213121 = Z dJJIJJ 5 CfCa 2 0.1.11
le J31 2 J=1 2 J21
i.e. (3.7.3) holds given of, = efca, 06 = CfCa. C]

Lemmas 2.6.2 and 3.7.2 entail the next Lemma 3.7.4, which shows the uniform supre-
mum magnitude of n—1 }:le {WJ(X[,$1) — qu (2:1)} and n-1 Z?=1WJ(Xla-’Cl)- The

quantities on (X1,:r1) and qu (3:1) are deﬁned in (3.5.5).

LEMMA 3.7.4. Under Assumptions (82), (B4) to (B6)

sup sup
xle[0,1] ISJSN

= 0,, (log n/M) , (3.7.5)

 

 

71—1 :2 {wJ (X12131) — Ia.” (131)}

Tl

n‘l ZLUJ (X1,:1:1)

[:1

Sllp sup

2 0,,(H1/2) . (3.7.6)
x1€[0,l] nggN

 

 

PROOF. For simplicity, denote w} (X1, 51:1) = mg (X1, 1:1) —— 11“” (11). Then

2
E{w3(xz.z1)} = Ew3(x2.x1) - 12%,, (x1).

While E1222, (X1, 1:1) is equal to

 

_ -2 CJ+1,2
h l “bJ,2”2 K2 (“1) 1J+1,2 (“2) + IJ,2 (212) wal + I12U2)dvldU22
CJ,2

which implies that Ew3(X1,:1:1) ~ h’1 and Ew3(X1,231) >> [1,2,J(x1). Hence for n sufﬁ-
ciently large

* 2 * '-
E{WJ(X12171)} =Ew3 (X12$1)‘#3J ($1) 26 h 12

for some positive constant c*. When 1' Z 3, the r—th moment E IwJ (X1, x1)|r is

{HbJ,2“2)_r//K£(ul “171){IJ+1,2(U2) + (Cum) 1J,2(U2)}f(U1,U2)du1dU2-

CJ2

)

 

It is clear that EIwJ (X1,$1)|r ~ 11(1—T)H1”r/2 and IEWJ‘ ()'{l,:1:1)|r S CHr/2 by Lemma
T
3.7.2, thus E le (X(,$1)|r >> IHWJ (11:1)l .

T

E )w3(X12I1)|r = Ele (X12151) * #2” (351)

S 2"“1(EIwJ(X1,1‘1))T+lpwj(xl)|r) S {ch-lH—l/2}

 

(T2) rlE [21} (x,, 2:1)|2,

then there exists a constant c* : cit-1H 71/ 2 such that
—— 2
Ewa,(X,,x1)|r S C: 2T!Elw3 (X12130) 2

that means the sequence of random variables {wj(X1,x1)}?=l satisﬁes the Cramér’s con—

dition, hence by the Bernstein’s inequality one has for r = 3

2 6/7
qpn 72'
P > <0. ex — +a 3 a —-— ,
{ "(M)" l p( 25m§+5c2pn) 2” ([HID

n
"—1 ij (X12161)
i=1

 

 

 

 

 

where
102" 'th 2” +2 1+ ’02 '11 2 I“
p=p-—-—-,w1 az— ,w11m~1,
n Vnh l q 25mg + 5c2pn 2
(12(3) = Mn 1 + 5m‘2/7 with 7713 : max “w" (X :1: )H < C0 (211—1)2 ”3
pn 1 ISiSn J [1 l 3 _. .

54

Observe that 504),, = 0(1), then by taking q such that [ﬁr] 2 ('0 log n, q 2 cln/ logn for

some constants (0,01, one has a1 = 0(n/q) = 0 (log 11), a2 (3) = 0 (112). Assumption (82)

n 6/7 n 6/7 6A
_ __ - - C0/7
0([Q+1D S{K0exp( A0[Q+1])} 5011 0 .

Thus, for n large enough,

yields that

 

 

l n plogn . 2
P — w" X,:c > <cn—C2p lo n+Cn2_6’\0C0/7. 3.7.7
{n [23 J< , 1) H}. g < >

 

 

We divide the interval [0, 1] into Mn ~ n6 equally spaced intervals with disjoint endpoints
0 = 551,0 < 2:111 < < “TLMn = 1. Employing the discretization method, one has

n

n-1 :00} (X1, 121$)

[=1

sup sup (3.7.8)

x1€[0,1]1SJSN

= sup sup
OSkSMn ngSN

 

 

 

n
"—1 2w} (X1431)
1 1

 

n
n“1 Z {w} (Xm) - 01} (Xz,x1,k)}
[=1

By (3.7.7), there exists large enough p > 0 such that for any 1 S k g Mn,1 S J _<_ N

1
p {.
n
which implies that

(X)
Z P { sup sup

OSk§1Wn ISJSN
N
2%

Thus, Borel-Cantelli Lemma entails that

+ Sllp Sllp sup
ISkSIWn ISJSN $1El$l k—l’xl k]

 

 

n
Zed} (X1,$1,k)

> p(nh)-1/210gn} ;<_ 11‘“),
[=1

 

 

Tl.

n71 2w} (X1, 331$)

[=1

 

 

> logn
“pm

 

 

n
77,-1ng (Xlaxljc)

[=1

§

 

l (X)
2 p 08"} g ZNMnn‘IO < 00.

1121

 

1i712w3(xl,$1,k) = Op (log n/ﬁ) . (3.7.9)

[=1

sup sup
ogkgMn ingN

 

 

Employing Lipschitz continuity of kernel K, one has for 2:1 6 l$I,k—1,$l,kl

sup |Kh(X,,1— 1:1) -— Kh(X,,1— x1,k)| g CKMgllz—2.
lngMn

55

Hence one has

n.
n-1:{w3(x1.x1) — w; (xl:$1,k)}
[:1

Sup sup SUP
ISkSMn ISJSN$1€[31,k_lvxlikl

 

 

S CKMglh"2 sup sup IBJQ (2:2)l = 0(Mn—1h‘2H71/2).
$2E[0,l] ISJSN

Thus, one has

1

n 2 {w} (X11331) — w} (39,111)}

[=1

sup sup SUP
lskSMn ISJSNrielxl,k—1v$1,kl

 

 

= o (i) , (3710)

since Mn ~ 116. (3.7.5) follows instantly from (3.7.8), (3.7.9) and (3.7.10). As a result of
Lemma 3.7.2 and (3.7.5), (3.7.6) holds. D
The next lemma provides the size of 5T5.
LEMMA 3.7.5. Under Assumptions (32) to (86), the least square solution 5 deﬁned in

(3. 3. 6) satisﬁes

N 2
5% = 213+ Z 2 213,0 = 0,, (N (log n)2 /n) . (3.7.11)
J=la=l
—l
PROOF. According to (3.3.8) and (3.3.7), ii = (BTB) BTE, then
-1
5TBTB5 _—. (5TBTB) (BT13) BTE 2 5T (BTE) .

As the matrix B is given in (3.3.7), one has

1

“Bang," = 5T > 5 2 5T (rt—IBTE) . (3.7.12)
Zn

B B
< 3,... J’.a’

According to (3.7.21), ”Bing,” is bounded below in probability by (1 - An) ”Bang. By
(3.7.2), one has
2

N 2
- 2 c _ -2 -
”Ban2 = a0 + E : 2 :33), 3 c0 (10 +2 :aia . (3.7.13)

Meanwhile one can show that 5T (n‘lBTE) is bounded above by

1/2

2

- - 1 n 1 "

“(21+Zafig {gZMXdEi} +Z{;ZBJ,O(Xi,O)U(Xi)€i}
La i=1 [0

i=1

2 1/2

(3.7.14)

56

Combining (3.7.12), (3.7.13) and (3.7.14), the squared norm 5T5 is bounded by

W2(1—An {i- 20(Xi)51}2+2{% 231,0(Xi,a)0(xi)5i}

i=1

Using the same truncation version of 5 as in Lemma 3.7.10, Bernstein inequality entails that

 

 

 

 

n
”-1 —l
:0 (X )si+ nggNa= 12 n E BJ,O (X130) 0 (X,~)e2 = 0;, (logn/J77).
Therefore (3.7.11) holds since An is of order op(1). C]

3.7 .2 Empirical approximation of the theoretical inner product

 

 

 

 

 

Let
71
Am] = sup [(1, 81,0)2 n — <1,BJ,Q>2| = sup n.-1 2 BJ’a (X (3.7.15)
J,CX , J,O’ 1:1
An,2 = SUP <BJ,oszJ’a> - (31,71,311 a) , (3-7-15)
J,J’,a ’ 2‘" ’ 2
A ,3 = sup <BJ ,B I I — <BJ ,B I I) l. (3.7.17)
n lSJ,J’$N,a75a/ ’0 J ’0 >2,n ,a J ’a 2
LEMMA 3.7.6. Under Assumptions (32), (B4) and {B6}, one has
An,1 = 0,, (7171/2 log n) , 3 (3.7.18)
An’g = Op (n.1/2H_1/2 log n) , (3.7.19)
An,3 = 0,, (n—l/2 log n) . (3.7.20)

PROOF. The proof of (3.7.18) follows from Bernstein’s inequality immediately, thus is
omitted. Here we only prove (3.7.19) and (3.7.20). We will discuss case by case with

various 0, a’, J and J’, via Bernstein’s inequality. For brevity, set

62' : €i,J,J’,a,a’ : ”—1 [BJ,a (Xi,a) BJ’,a’ (Xm’) “ E {3.1.01 (X230) BJ,’a’ (Xi,a’) }] 1

then

ZéiH’JJ Mao:

i=1

ZEiH’JJ Haa’

i=1

An 2 = sup

, 1 24,113: sup
1<J<N,=a 1,2

l<J,J’<N,’a;éa

 

 

 

 

We will consider a = 07' = 1 in the CASE 1.1 to CASE 1.3.

57

CASE 1.1 when IJ— J'I > 1. The deﬁnition of BJ‘l in (3.2.3) will guarantee that
31,1 (X,-,1)BJ;,l (Xu) = 0 if |J — J’| >1.

CASE 1.2 when J = J'. By Lemma 3.7.1, the variable 5,- and its second moment can be
simpliﬁed as follows

5,- = n“1{83,1(X,,1)—1},E£?= 5,13%), (Xm) --1}2 = —1—2{E83,1(X,-,1)—1},

7!-

in which EBfu(X,,1) = lle’lll;4(CJ+1,l+ cf,“ 1/6},) .The selection of H will make
E811 (X231) the major term of {E811 (Xm) — I}, then there exist constants c532 and

C,

E 2 > 0 such that

c€,2n—2H-l _<_ E53 3 Cé,212._2H—l.

In terms of the Minkowski’s inequality, the k-th absolute moment has the following upper
bound
k _ ,__ .
E|g,|k =n7kElBil(X,-,1) —1| g ”42" 1(53351 (X,,1)+1}.

where EB‘21k1 (XI-,1) ~ 1 according to Lemma 3.7.1. Hence there exists a constant 05,2 > 0
such that

E|§,|k g Cf2n‘k2k’lHl’k.
Next step is to verify the Cramér’s condition

Eléz‘lk S ngn—ka—lHl—k = Cf,2,,—(k—2)2k—lH—(k—2)n—2H—1

 

202 20 (k4) k—2
6’2 £12 .“2 -1 It 2
cm (nH ) 652” H 5 {05,2} “59"

—l
5,2 "‘ 5,265.2

Cantelli lemma, when J = J’, a = a’ = 1, one has (3.7.19).

in which C“ - 2C 'n‘1H_l max 1,2C2 . Applying Lemma 2.6.2 and Borel -
6,2

CASE 1.3 when IJ —- J’l = 1. Without loss of generality we only prove the case that

J' = J +1. Now 5,; = Tl—lBJ’l (X131) BJ+L1 (X231) has the second moment

as? = [E33, (xii) 83.1,. (xm — {Earl (x...) 8m. mar] ,

where {EBJ’1(X1"1)BJ+1,1(Xi,1)}2 ~ 1, E33,! (Xi,l) 83“,, (X131) ~ H71, according

to Lemma 3.7.1. Hence, E612 ~ H‘l. The k-th moment is given by

Eléilk = n—kEIBJJ (X131) BJ+1,1 (X231) - EBJ,1(X2',1)BJ+1,1 (X,,1)Ik

11"”‘2k—1[EIBJ,1(X1,1)BJ+1,1(Xmllk + IEBJJ (X131) 31+” (Xillllkl’

l/\

where |EBJ)1(X,)1)BJ+1)1(X,)1)|k ~ 1 and EIBJ)1(X,)1)BJ+1)1(X,)1)Ik ~ Hl‘k, ac-

cording to Lemma 3.7.1. Hence there exists a constant 06,3 > 0 such that
k k -k k—l l—k

Similar as in Case 1.2, (3.7.19) follows by using Bernstein’s inequality.
CASE 2 when a = 01' = 2, all the above discussion applies without modiﬁcations.
CASE 3 when a # a'. Without loss of generality, suppose a = 1, (1’ = 2. First we still

need to calculate the order of second moment E63,
2 __2 2 2
Ea.- = n E {8h (X131) 3),, (Xx-,2} - (133,), (xx-,1) BJe ma} .
The boundedness of the density function f (2:1, 2:2) implies that
IEBJ)1(X,-1) 3,5, (X-)2)I < E|€,-|

IIbJ,l”2—1IIbJ’)2II2—//IbJ,(l 132,1 )bJI)2 (131,2 )If($l,$2)d’ﬁldﬂi2

Cf {lleJllé—I [Ill/,1 (1171',1)Id$1} {Ilbjlz II2 l/IbJIM 112,2 )Idxg}
0.} 1,1 —1 J’ 1,2 1
c,{1+ 6:1 }{||bJ)1||, H}{1+ C; IIIIb, ,))'I| HIch),H,

for some constant 013,1 > 0, where the last step is derived by Lemma 3.7.1. As a con-

l/\

|/\

 

 

|/\

1:

sequence, IEBJJ (X131) BJI2 (X,)2)I _<_ C); lHk' Meanwhile, by Assumption (B4) and
Lemma 3.7.1,

2
E{BJ)1(X),-1)BJI, (X,- 2)}
”bJ,l||22 I)IbJI22I 2//bj)(1 151,1)bJ/2 (1,7522)f(117772)‘1171d$2

_. —2

Cf {IIbJ,1ll22 [:J,l (13,1) (1351} {lle/12 2 fb.211,2(17,',2)(111‘2}

C/{1+ Cam/{121,1} (”521152 H}{1+ ‘J’+12/‘J'2 } {IIbl’

 

IV

 

 

 

—2
I; H} 2 68,2-

59

Hence there exist constants 66’ C2 > 0 such that

—2 CnI—2
cné _<_E{-<C'n€

For any I: > 2, the k-th moment of Iﬁil is given by

' k
E|§,|k = 71—kEIBJ,1 (Xm) BJ’)2 (X232) — EBJ,1 (Xi,1) 131/)2 (Xi,2)I

l/\

k k
"_k2k—l [EIBJ)1 (X131) 311,2 (X.,2)I + IEBJ.1 (X21) 3.152 (X.,2)I I
where there exists a constant C B] > 0 such that

13:IB),)1(X,)1).BJ,(X,-)2)IIc

11b5,.112‘"||b.r,;/ / lb (..,,2

b,
0,9,2),- /|b..,(. ,1; 22.}{||.,,,)-'°
cIc .
c,{1+ ’3‘} 1+ 1;” {(152,|1;’°II5J,,)‘

CJ’,2

l/\

(Ti ,2 I f (1131.332) (113161132

_/-’2(le’ $2 2)I kdilrg}

...

k k
c C I _
Cf{1+ ”’Zm} 1+ J “’2 {.,~f(1+c,/C,)} kH2“k <Cg,H2—k.

. .k
91,1 “J’ ,2

 

 

|/\

 

 

 

 

|/\

 

 

|/\

 

 

Thus there is a constant CE > 0 such that
Eléilk S n—ka‘l [Cg/H24" + C}; 1H’"I g (C{)kn‘k2k‘1H2—k

k—2
202 k—‘2 . 20 202
_£_ 1 . ——2 < 6 € . 2
C5 (2067; H 1) (,{n _ {11H max ( Cg ,1) } k.E{,.

l/\

—2

Employing the Bernstein’s inequality and the fact that Egg ~ 71. , one has

n

z},— [3,. (x...) (x...) — E {3... (x...) (x.,..)}]|
i=1

sup sup
13J,J’gN aséa’

 

is of order 0,, (n"1/2 log 71). So the proof of (3.7.19) and (3.7.20) is completed. C]

LEMMA 3.7.7. Under Assumptions (82), (B4) and (B6), the uniform supremum 0f the

rescaled difference between (91,92)., n and (511,92), zs

 

An = sup (3.7.21)

91.92€C(_1)

I(91.g2)2 ’(91.92>2I
, ’" .—.()( 10g" )=o,,(1).

||91|l2 llgzllz 1.1/2H1/2

60

PROOF. For every 91,92 6 G('1), one can write

91(X12X2) : a0 + Z.Z=123=10J,QBJ,Q(X0),
.92 (X11X2) : ‘16 + Zlel £3,121 afl’p’B-l'ﬂ’ (X01),

in which for any J, J’ = 1,...,N,a,a’ = 1,2, 0,1,0, and aJ/a/ are real constants. The

difference between the empirical and theoretical inner products of 91 and 92 is

[(gi,g2>2,n - (91.92%! S Z<a6.a1,aBJ,a>2,n + Z <a0,a’J,,a/BJ/,az>

1.0 J’,a’
+ Z laJ,al (81,0, BAG/>2,” — <31,” raj/WM = L1+ L2 + L3.
J,J’,a,a’

2,n

 

 

I
011'
J,a

The equivalence of norms given in equation (3.7.2) and deﬁnition (3.7.15) lead to
1/2 1/2

An,l ' lab] ' Z I‘Mal S 00A“ 062 + Za’JZ’a 2:03.01 NV2
Ja Ja J,a

: CA,1An,1ll91“2||92H2 H‘”2 = 0,, (...—I/m—I/z logn) "911:2 1:92:12-

L1

|/\

Similarly, one has

— ‘ — '2 -l2
L2£C§,1An,1ll91||2ll92II2H 1/2:0p(” 1/ H / logn)|l91||2||g2l|2-

For the last term L3, one has, by deﬁnitions (3.7.16) and (3.7.17)

 

L3 S 2 la.l,al ail/,allmaXMmaAns)
J,J’,a,a’

1/2 1/2

S CA,2maX(An,2iAn,3) 203.01 20324
J.0 J’,a’
< C A A a —0 “WWI/21
_ A,2max( n,2: 71,3)”91II2H92II2- p n 0%“ H91|l2||92|l2~
Therefore, statement (3.7.21) is established. Cl

3.7.3 Proof of Lemma 3.5.2

In the following, denote V as the theoretical inner product of the B-spline basis

{1,840 (ma),J =1,...,N,a =1,2},i.e.

T T
1 031V 1 0N 0N
02N <BJ,O’BJ,,QI>2 150,0,‘521 O V V
ngJ’sN N 21 22

61

where 0,, = {0, ..., 0}T. Let S be the inverse matrix of V, i.e.

1 0% 0%
S=V‘1= 0N S11 512 . (3.7.23)
0N 321 322

The next lemma on the positive deﬁniteness of matricesiV and S is a sufﬁcient step to

achieve Lemmas 3.7.9 and 3.7.10.

LEMMA 3.7.8. Under Assumptions {B4} and (B6), for the matrices V and S deﬁned in
(3.7.22) and (3.7.23) respectively, there exist constants CV > CV > O and Cs > 65 > 0

such that

cv12N+1 S V S Cv12N+n CSIZN+1 S S S 0512N+1- (3.7-24)

PROOF. Take a real vector ﬂ = ([30,[31,1,...,EN,1,51,2,...,ﬁN,2)T E R2N+l, One has

TB 2: T 1 03), 2 TV
”a won. a (mm) [3 . a

where denote B2 (x) = {1,B1,1 (X1) , ..., BN3 (X2)}T. According to (3.7.2), there exist
constants CV > CV > 0 such that

2

2 .
CV n3 + Zﬂia Z “ﬁTB2 Oi)“2 = (3(2) + 25.1,a3.1,a (Ia) ,
J,(1 J,a 2
2

||3T132(x)||:=a3+ gamma) 2w 33+Zﬁia .
J,a

2 J,(I

thus one concludes that

CvnTn = Cv 33 + 2133,... 2 NW} 2 av [33 + 2:33,... = cvnTn.
J,a J,a
which implies that cVI2N+1 _<_ V S CVI2N+1. The second half of (3.7.24) follows by
changing )6 by V"1/2ﬁ. Cl

As an application of the above Lemma, for any (2N + 1)-vectors x and y

XTSy S C's(2N + 1) ||X|| ° IIYII, (37.25)

62

where CS is the same as in (3.7.24). Note that a given in (3.3.8) can be rewritten as
- T ‘1 T 1 T _1 1 T ... —1 1 T
a=(B B) BE: EBB EBE =(V+V) 513E, (3.7.26)

where V* is the difference between empirical and theoretical inner product matrices, i.e.

T
V“ _ O 02N
021V <BJ’O” BJ',0'>2,n _ <BJ'O" BJ’a0’>2 Isaac/£2,

1gJ,J’gN

Now deﬁne a = {30,314,...,aN,1,a1,2,...,aN,2}T by replacing (V + V"')‘1 with V‘1 = S

in the above formula, that is

a = V“1 (n’lBTE) —_— S(n"‘1BTE) . (3.7.27)
and deﬁne
- (2) n N
‘1’2; (331) = "’1 Z Z (34,2011 (X2331)- (3-7-28)
i=1J=l

The next lemma shows that the difference between 4132) (3:1) in (3.5.6) and ﬁll?) (51:1) in
(3.7.28) is negligible uniformly over 2:1 6 [0,1].
LEMMA 3.7.9. Under Assumptions (B2) to (B6),
sup 915,2) (2:1) -— 91?) (2:1)] 2 Op ((log n)2 /nH) .
1216(0,”
PROOF. According to (3.7.26) and (3.7.27), one has V a = (V + V*) a, which implies that
V*" = V (a — 5). Using (3.7.19) and (3.7.20), one obtains that

W (a — a)“ = “wan 3 Op (ml/2H-1 logn) nan.
By Lemma 3.7.5, ”a” 2 Op (n—l/ZNI/2 log n), so one has
"v (a — a)“ 3 0p ((105511)? n-1N3/2}.
Thus according to Lemma 3.7.8, one has

"(a _ 5)“ 2 0p {(logn)2 n‘1N3/2}_

63

Using Lemma 3.7.5 again, one has

“an s ”(a — a)” + nan = 0,,(1ognm/n) . (3729)

Hence
(2) “(2) N 1 "
(w. (x1) - a (seal = 2 (an — an) ; Zea. (xtm .
J=l [=1
Cauchy-Schwartz inequality implies that

 

         

2
(2) (log n)2 1/2 (log 11)
sup 1)—\Ilv (x1) <\/—O O H =0 —— .
x€[0,l] l p ”H p ( ) p nH
Therefore the lemma follows. Cl

LEMMA 3.7.10. Under Assumptions (82) to {B6}, for \TISZ) (2:1) as deﬁned in (3.7.28)

 

     

N
A 2 A
\Ils, ) (2:1)l = sup 1 E1 Kh(X 231— .731) )E GJJBJJ (XI-,2) = 0,, (H).
x16[0,1] x1€[0.lln J: 1

PROOF. Note that

 

N
@£Q)($l)l S 2042qu (1‘1)+ +ZdJ ,2” 1}—_:{W.I(quf151)—1UoJ($1)}

= Q1 ($1) + Q2 (351)- . (3-7-30)

)2.

By Cauchy—Schwartz inequality, one has

N
a. .2.. .;{

J_. 1 316(0,”

71—12”: (142709.11) - In” (131)}

i=1

 

 

Observe that "an 2 Op (log m/N/n) as given in (3.7.29) and

sup
x1€[0,l]

 

n“ Z{WJ(X7L,$1) — u...) (x1)}| = 0p(logn/\/571) ,

i=1

given in Lemma 3.7.4, so by Assumptions (BS) and (B6)

sup Q2 (’51) = 0,, (logn/N771) Wop (If/8%) : 0,, {Ego—$23}

$1€[0,l]

2 0,, {(log n)3 NH}. (3.7.31)

64

Using the discretization idea again as in the proof of Lemma 3.7.4, one has

N
sup Q1(2:1) 3 max (1)211“, (1131,]c) + (3.7.32)
x1€[0,l] ISkSMn ng J
N N
K113i! sup 2 (“Us/11.” (15 1) _ 2 71mm” (Jim) = T1 + T2.
- - "$l€l31,k-11$1,kl J=1 J=1

where Mn ~ n. Deﬁne next

W = max n‘1 1 :1: s B X- 0 X- e-
1 ISkSMn 1g§n131§g1vle( l,k) J+N+1,J’+l J’,1( 2,1) ( 1.) i

 

W2

ll

1

—l
m B X: x- -
13kg” n ISEnISJ’ZJlsNqu (lec) SJ+N+LJJ+N+1 J],2( 2,2)0( I)El

 

 

then it is clear that T1 _<_ W1 + W2. To show that both of the two terms W1 and W2 have

order 0,, (H), we truncate the random variable 5,- at the level of

1 2
D = 90 —— — . 3..
n n (2+6<90<5) (733)

where 6 is the same as in Assumption (B3). Without loss of generality, we only give the

proof of W1 = 0,, (H). Let
55,0 = 8,105.15 Du), 5:0 = 8110521 > Dn), EZD = 5,7,0 —- E (527:0 |X,-),
.ng = Z 11W($1,k)sJ+N+1,J,HBJ/,1(X,-,1)a(X,)€;‘,D,

lgxng

and denote W10 as the truncated centered version of W , i.e.,

n
n-1 Z (1,,
i=1

Next we show that 'Wl — WID I = Op (H) Note that (W1 _ IVID

W10 :.— max . (3.7.34)

lngMn

 

 

 

S A1+ A2, where

1 " _
A1 = 1334,, E: Z a” (2:1).) SJ+N+1,J,+IBJI,1(Xi11)U(Xi)E(8i,D|xi) ,
”'1ng,ng

1 " 1.
A2 — 131/212%!" 71;: Z 1th (15m) 3J+N+1,JJ+1BJI,1(X¢',1)U (X05130 ~
”119,ng

65

T
Let ﬂu) ($1,113) : {“1421 (11,16) 1' ° ' #in ($1,k)} 1 then

N
n
T _ _
A1 = max Ha; (lec) S2l {n l ZBJ’ 1(Xi'1)0 (X01; (El-’0 lXi)}JJ
i=1 ’ =1

lgkgMn

1/2

N N i 2
“8122214 D‘Zi ZBJ“X“)°( )E(EZD'X“)} ’

i=1
according to (3.7.25). By Assumption (BB),

IE (egDux.)| = IE (5301):.) S

and slupl% 22:1 BJ,1(X1‘,1)0'(X1‘)
,0:
Lemma 2.6.2. Therefore

 

E 051le lxi)
—(1+6)
D}l+6 S MéDn 1

 

 

= Op(log n/ﬁ) by Bernstein inequality given in

2 1/2

/\

n
z:

N N
—(1+6) 2 l .
A1 .. M6011 131334" 211% ($1,k)Jz—:l{nZBJ1(le 0(2)}

1

Op {ND;(1+6) log2 71/11} = 0p (H),

where the last step follows from the choice of Du in (3.7.33). Meanwhile

2+6 . 00
EIEnl2+6 00 E (Elsnl |xn) Ma
2 Pugnl > D”) < Z_____ D2+6 — 2: D31” 5 2 2+6 < 00’

11:1 11:1 11:] n

 

 

since 6 > 1/2. By Borel-Cantelli Lemma, one has with probability 1

n
”_1 Z Z #wJ (331,1c) SJ+N+1,J'+IBJI,1(Xi’l)0 (x0531) : 0

i=1 ISJ,J’5N

for large 11. Therefore, one has |W1 — WID I 5 A1 + A2 = Op (H). Next we want to show

that wlD = 0,, (H), with wlD deﬁned in (3.7.34). Since

ch = ”w (131,1)T321 {31,1(Xi,1),-~ ,31,N (Xi,l)}T 0098*) 5,. D,

so the variance of UM is

“w ($1,k)TS21VHI ({Bl,1(Xi,1),--° ,BN,1( X1,T1)} 0095) i 0) 321111.) (131, k)-

66

According to Assumption (BB), 0 (x) is continuous on a compact set [0,1]d, so it is clear

that chu 5 var ({Bl,1(Xi,1), - - - , BN,1 (Xi,1) }TU(X1')) S 03V”. Thus

var (Ui,k) ~ pw (x1,k)T 821V11521uw (331,1) V£,D

: “w ($1,107. 521%; (zlJc) V€,Dv
* T 1/2
where VQD = var {El-,0 [Xi }. Let n (1131,11) = {uw (IlJc) [1w ($1,k)}

0363 {N (11311)}2 V5,D S var (Ui,k) S 0303 {K (1131,10)2 V5,D-

When 1‘ 2 3, the r-th moment EUsz is

E lUiJclr = E Z l‘wJ (171$) SJ+N+1,J,+IBJJ,1(Xi’l)O (X05210

 

 

 

 

 

 

igxng
1'
* T
g E 2 "“U ($1,k) SJ+N+1,J'+IBJ’,1(Xi’1)0(xi) 13(5750 Ixi)
1gLng
T
_<_ E Z #1.”(13m)3J+N+1’JI+IBJI,1(X1,1)0(Xi) Dir—2‘43,
IgLfSN
while
T
E Z Ile(1131,11)3J+N+1,JI+IBJIJ(X1,1)0(Xi)
1gLng
T T T
: E [1,“,(1'1’k) $21 {BI,1(X1°,1)," ' ,Bl,N (Xi,l)} ”(Xi)
r
g CgCgE lpw (x1,k)T {31,1(Xi,1)a ' " ,BI,N(X1‘,1)}T|
N r/Z
S CECE {n ($1,k)}rE 2 33,1091)
J=1
3 ago; in (x1,.)}”0 (HM/2) ~
Therefore

'5 le'Jclr S CECE {5 ($1,k)}r0 (”hr/2) Bil—2&0

—2
_<_ {can (x1,k)D,,H-l/2}r rush/(kl2 < +00,

67

which means the sequence of random variables {UMHLI satisﬁes the Cramér’s condition
with Cramér’s constant equal to c... 2: cor: (rue) DnH “1/2, hence by the Bernstein’s in-

equality we have for r = 3

-1" (10?. n 6/7
P ” Zuni: an SaleXp —25m§+5c,pn' +a2(3)0([q—+-1-D ,

 

 

 

 

 

(=1
where
2 5mG/7
,,n=pH,01=23+2 1+ 2p" ,a2(3)=lln 1+ 3 ,
q 25m2 + 5c1pn Pn

_ 1/3
m3 N {"3 (351,0)2 Ve,D, ms S {C{"(I1,k)}3 H 1/2DnVe,D} -

Observe that 5gp" = 0(1), then by taking q such that [q—i‘f] 2 c0 log n, q 2 cm/ logn for

some constants c0,cl, one has a1 = 0(1),/Q) = 0 (log n), (12 (3) = 0 (n2). Assumption (B2)

6/7 6/7
__7}_ __ _n_ *6/\0C‘0/7
at...) siKoew<Aot+1Dl ,

and as n —+ 00, one has

yields that

«11% N q_/>n_ : mfg/5
25mg + 5C*pn C... CO (10g ")5/2 D

 

 

-—> +00.

Thus, for 12. large enough,
1 n
P {; Zuni:

i=1
Taking q), p large enough, P {'37: 2&1 UiJCl > pH} S 11.3, for large 11. Hence

 

> pH} g clognexp {—C‘2p2 log n} + Cn2‘6Allco/7 3 11—3.

 

§P(Iwﬂzwi= i245:

n—lk- l

 

 

Thus, Borel-Cantelli Lemma entails that W10 = 0,, (H). Noting that lWl — WIDI =

0,; (H), one obtains that W1 = 0,, (H). Similarly one can show that 1V 2 Op (H). Hence

T1 g wl + w2 = 0,, (H). (3.7.35)

68

Employing Lipschitz continuity of kernel K, the term T22 is bounded by

N
2
- 2 .. 2
“an max sup §:{uw1($1)-uwj ($1.10} _<.lla|| x

N .
max sup E K X -—:1:)——K X -:1: 2 B X )2
1SKM"351€l3'¢1,Ic—1v1=1,kl1X5 [{ h( 11 1 h( 11 1'0} { J'2( 12 }]

Therefore, according to Assumption (BS), Lemma 3.7.1 (ii) and (3.7.29),

N 2 1/2
T <0 Nl/zlogn {ZJ=1EBJ,2(X12)} -O Nl/Zlogn _ _1/2
2“ p 1.1/2 112114., ‘" m ”Pi" )
(3.7.36)

Combining (3.7.32), (3.7.35) and (3.7.36) one has SUlee[0,1] Q1 (2:1) 2 Op (H). The desired
result follows from (3.7.30) and (3.7.31). Cl

69

CHAPTER 4

Spline Single-Index Prediction Model

4.1 Introduction

Consider the stochastic heteroscedastic regression model given in (1.1.1), an attractive di-
mension reduction method to deal with the “curse of dimensionality” is the single-index
model, similar to the ﬁrst step of projection pursuit regression, see Friedman and Stuetzle
(1981), Hall (1989), Huber (1985), Chen (1991). The basic appeal of single-index model
is its simplicity: the d-variate function m (x) = m (151, ...,xd) is expressed as a univariate
function of xTBO 2 23:1 351,00? Over the last two decades, many authors had devised
various intelligent estimators of the single-index coefficient vector 90 = (60,1, ..., 6’0,d)T, for
instance, Powell, Stock and Stoker (1989), Hardle and Stoker (1989),.Ichimura (1993), Klein
and Spady (1993), Hardle, Hall and Ichimura (1993), Horowitz and Hardle ( 1996), Carroll,
Fan, Gijbels and Wand (1997), Xia and Li (1999), ‘Hristache, Juditski and Spokoiny (2001).
More recently, Xia, Tong, Li and Zhu (2002) proposed the minimum average variance esti—
mation (MAVE) for several index vectors.

All the aforementioned methods assume that the d-variate regression function m (x) is
exactly a univariate function of some xTBO and obtain a root-n consistent estimator of 00.
If this model is misspeciﬁed (m is not a genuine single-index function), however, a goodness-
of—ﬁt test then becomes necessary and the estimation of 00 must be redefined, see Xia, Li,

Tong and Zhang (2004). Here instead of presuming that underlying true function m is a

single-index function, a univariate function g is estimated that optimally approximates the

70

multivariate function m in the sense of
g(1/) = E [m(X)|XT60 = V], (4.1.1)

where the unknown parameter 00 is called the SIP coefﬁcient, used for simple interpretation
once estimated; XTOO is the latent SIP variable; and g is _a smooth but unknown function
used for further data summary, called the link prediction function. Our method therefore
is clearly interpretable regardless of the goodness~of—ﬁt of the single-index model, making it
much more relevant in applications.

Estimators of 00 and g are proposed in this chapter based on weakly dependent sample,
which includes many existing nonparametric time series models, that are (i) computationally
expedient and (ii) theoretically reliable. Estimation of both 00 and g has been done via
the kernel smoothing techniques in existing literature, while polynomial spline smoothing
is used here. The greatest advantages of spline smoothing, as pointed out in Huang and
Yang (2004), Xue and Yang (2006 b) are its simplicity and fast computation. The proposed
procedure involves two stages: estimation of 00 by some JE—consistent B, minimizing an
empirical version of the mean squared error, R(0) = E {Y - E ( YI XT0)}2; spline smoothing
of Y on XTB to obtain a cubic spline estimator g of g. The best single-index approximation
to m(x) is then m(x) = g) (XTB).

Under geometrically strong mixing condition, strong consistency and (fa-rate asymp—
totic normality of the estimator B of the SIP coefﬁcient 00 in (4.1.1) are obtained. Proposi—
tion 4.2.2 is the key in understanding the efﬁciency of the proposed estimator. It shows that
the derivatives of the risk function up to order 2 are uniformly almost surely approximated
by their empirical versions.

Practical performance of the SIP estimators is examined via Monte Carlo examples.
The estimator of the SIP coefﬁcient performs very well for data of both moderate and high
dimension d, of sample size n from small to large, see Tables 4.5 and 4.6, Figures 4.19,
4.20 and 4.21. By taking advantages of the spline smoothing and the iterative optimization
routines, one reduces the computation burden immensely for massive data sets. Table 4.6
reports the computing time of one simulation example on an ordinary PC, which shows that

for massive data sets, the SIP method is much faster than the MAVE method. For instance,

71

the SIP estimation of a 200-dimensional 60 from a data of size 1000 takes on average mere
284 seconds, while the MAVE method needs to spend 2432.56 seconds on average to obtain
a comparable estimates. Hence on account of criteria (1) and (ii), our method is indeed
appealing. Applying the proposed SIP procedure to the rive ﬂow data of Iceland, we have
obtained superior forecasts, based on a 9-dimensional index selected by BIC, see Figure
4.25.

The rest of this chapter is organized as follows. Section 4.2 gives details of the model
speciﬁcation, proposed methods of estimation and main results. Section 4.3 describes the
actual procedure to implement the estimation method. Section 4.4 reports the main findings
in an extensive simulation study. The proposed SIP model and the estimation procedure
are applied in Section 4.5 to the river ﬂow data of Iceland. Most of the technical proofs are

contained in Section 4.6.

4.2 The Method and Main Results

4.2.1 Identiﬁability and deﬁnition of the index coefﬁcient

It is obvious that without constraints, the SIP coefficient vector 00 2 (00,1, ...,60,d)T is
identiﬁed only up to a constant factor. Typically, one requires that “90“ = 1 which entails
that at least one of the coordinates 00,1, ..., 60") is nonzero. One could assume without loss
of generality that 0041 > 0, and the candidate 60 would then belong to the upper unit
hemisphere Si“ 2 ((61,...,6d)|zg:1 0,2, = 1,0,, > 0}.

For a ﬁxed 9 = (61, ...,od)T, denote X9 = xTo, X9, = xfe, 1 g i g 71. Let

mo (X0) = E (YlXa) = E{m (X) |X0}- (4.2.1)
Define the risk function of 0 as
12(0) 2 E [{Y — me mm?) = E {m(X) —- 1r1.9(X)5))}2 + E02 (X), (4.2.2)
which is uniquely minimized at 00 6 51—1, i.e.

90=arg min [{(B).
0631—1

72

REMARK 4.2.1. Note that 51—1 is not a compact set, so a cap shape subset of 81-1 is

introduced

d
Sf!“ = (61,.--,6d)|26§=1,6d 2 x/1—-c2 .c e (0.1)

p=1
Clearly, for an appropriate choice of c, 60 6 511—1, which is assumed in the rest of the
chapter.
Denote 0_d = (61, ..., 6d__1)T, since for ﬁxed 0 6 31-1, the risk function R (6) depends

only on the ﬁrst d — 1 values in 6, so R (0) is a function of 9—d

R" (9-.» = R (61.62,...,ad-1,(/1 —- Ila—dig) ,

with well-deﬁned score and Hessian matrices

32

c9
3* 0_ 2 ”__R* 6__ , H,“ 0_ : ___-___—
( d) ( d) ( d) 394397;)

6a,, R“ (6—d)- (4.2.3)

ASSUMPTION (C1): The Hessian matrix H * (90,—d) is positive deﬁnite and the risk func-
tion R“ is locally convex at 60,—d: i.e., for any 6 > 0, there exists 6 > 0 such that

R* (6—d) — Rik (00,—d) < 6 implies ”B-d — 00,—dll2 < 8.

4.2.2 Variable transformation

Throughout this chapter, denote by B31 = {x 6 Rd |||x|| g a} the d-dimensional ball with

radius a and center 0 and
00°) (33) = {m lthe kth order partial derivatives of m are continuous on 83 }

the space of k-th order smooth functions.
ASSUMPTION (C2): The density function of X, f (x) 6 0(4) (831), and there are constants

0 < cf S Cf such that

ef/void (33) 3 f(x) 3 cf/void (83), x e Bf;l
f(X) —=— 0, x ¢ Bi '

For a ﬁxed 9, deﬁne the transformed variables of the SIP variable X9
U9 = Fd(X0),Uo,i = Fd (X94) ,1 S i _<_ 72, (42-4)

73

in which Fd is the a rescaled centered Beta {(d + 1) /2, (d + 1) /2} cumulative distribution

function, i.e.

_ V/a I‘(d+ 1) 2 (d-1)/2
E) (V) —— [I P{(d +1)/2}22d (1 —- t ) dt,1/ E [—a, a]. (4.2.5)

 

REMARK 4.2.2. For any ﬁxed 6, the transformed variable U9 in (4.2.4) has a quasi-uniform

[0, 1] distribution. Let f9 (21) be the probability density function of U9, then for any 11 E [0, 1]

f0 (u) = {112’i (vi) xx, (v), v = 171(1),

in which fxl9 (v) = limAVﬁo P(1/ S X9 S u + AV). Noting that 1139 is exactly the projection

ofx on 0, let 7),, = {xlu S 2:9 S u+Au}ﬂBg, then one has
P(VSX9SV+AV)=P(X€’DV)=/ f(x)dx.
Du

According to Assumption (C2)

chold(’D,,)
Vold (3g)

<P <X < +21 < .
— (V— 9*” "L veid(Bg)

On the other hand

Vold(’DV) = Vold_1(,7V)Au + 0 (Au) ,

where JV 2 {xlxg : v} n 83. Note that the volume of Bf,i is nd/2ad/I‘ (cl/2 + 1) and
l~1 2
was (7,) = «(d—1V2 (a2 — V2)“ V / l“{(d+1)/2},

thus

 

VOl_(j1/)_ 1 F(d+1) 1/ 2 (“W
W—aﬁgnwfwll—(E) l '

Therefore 0 < C] S f0 (u) S Cf < 00, for any ﬁxed 0 and u E [0, 1].
In terms of the transformed SIP variable U9 in (4.2.4), one can rewrite the regression

function mg in (4.2.1) for ﬁxed 6
79 (U9) = Eim (X) We} = E {m (X) IXB} = "100(0), (42-5)
then the risk function R (6) in (4.2.2) can be expressed as

W) = E [{Y — 79 (U0)}2] = E (m (X) — ‘19 (Ueii2 + E02 (X). (4.2.7)

74

4.2.3 Estimation Method

Estimation of both 60 and 9 requires a degree of statistical smoothing, and all estimation
here is carried out via cubic spline. In the following, deﬁne the estimator B of 60 and the
estimator f] of 9.

According to the deﬁnition of B-spline in Section 1.5 of Chapter 1, for ﬁxed 0, the cubic

spline estimator ”)9 of 79 and the related estimator rho of me are deﬁned as

n

79 (') = arg mi? 2: {Yi - ’7 (U0,i)}21 771M”) = ’19 {Ed (V)}- (423)
7(')€G(2)l0»1li=1

Deﬁne the empirical risk function of 0

R

=n—IZ{Y 70 (,UOi)} 2114207,- The (,Xoi)}2 , (42-9)

i=1
then the spline estimator of the SIP coefﬁcient 00 is deﬁned as

B=arg min [2(6),
06 C"

and the cubic spline estimator of g is me with 6 replaced by 9, i.e.

)€G(2)l011i=1

§(u) = {arg ( min Zn:{Y— 7(U0 .)}2} {Fd (u)}. (4.2.10)

4.2.4 Asymptotic results

The following are some other assumptions to achieve the main theorems.
ASSUMPTION (C3): The regression function m E 0(4) (33) for some a > 0.
ASSUMPTION (C4): The noise 8 satisﬁes E(5 IX) 2 O, E (82 IX) = 1 and there exists a

positive constant M such that sup E (IEI3 [X = x) < M. The standard deviation function
xEBd
o (x) is continuous on Bg,

0 < C0 S inf 0(x) S sup o(x) S Cg < oo.
x683 x6831

ASSUMPTION (C5): There exist positive constants K0 and A0 such that a (n) S K()e"’\0"

11
holds for all n, with the a-mixing coefﬁcient for {Zi = (Xia,) } 1 deﬁned as
1:

a(k)= sup |P(BﬂC)-P(B)P(C)|,k21.
B€o{Z3,sS t}, CEo{Zs,s>t+k}

75

ASSUMPTION (C6): The number of interior knots N satisﬁes: nl/6 << N <<
111/5 (log n)_2/5.

REMARK 4.2.3. Assumptions (C3) and (C4) are typical in the nonparametric smoothing
literature, see for instance, Hardle (1990), Fan and Gijbels (1996), Xia, Tong Li and Zhu
(2002). By the result of Pham (1986), a geometrically ergodic time series is a strongly
mixing sequence. Therefore, Assumption (C5) is suitable for (1.1.1) as a time series model

under aforementioned assumptions.

THEOREM 4.2.1. Under Assumptions (CU—(C6), one has

A

041—2 00,__d,a.s.. (4.2.11)
(X)
PROOF. Denote by (f2, 7:, 73) the probability space on which all {(X?, 13)} , 1 are deﬁned.
1

By Proposition 4.2.2, given at the end of this section

 

sup it“ (0_d) — R' (B-dll —-+ 0,a.s.. (4.2.12)
lie-allhS ' 1'62

So for any 6 > 0 and w 6 9, there exists an integer no (to), such that when n > 120(w),
it“ (00,_d,w) — R“ (90,—d) < 6/2. Note that 6-)) = ﬁ_d (w) is the minimizer of
Ti" (9_d,w), so f2" (ﬁ_d (w) ,w) —- R“ (90,—d) < 6/2. Using (4.2.12), there exists n1 (w),
such that when n > n1(w), R“ (6_d(w),w) - ft“ (de (w),w) < 6/2. Thus, when

n > max(n0(w),n1(w)),
12* (a, (w) ,w) — 12* ((907)) < 5/2 + {2* (a, (w) ,w) — 12* (00,—2) < 5/2 + 6/2 = 6.

According to ‘Assumption (C1), R“ is locally convex at 90,—d: so for any 5 > 0 and any a), if
R“ (ﬁ_d (w) ,w) — R“ (90,-d) < 6, then "9_d(w) —60,_.d“ < e for 11 large enough , which

implies the strong consistency. Cl

THEOREM 4.2.2. Under Assumptions (CU-(06), one has

“77(9-(1'904-(1) 41» N {0, 2 (00)},

76

where Z (60) = {11* (90,..d)}T1 ‘1’ (60) {Hill (90:11)}

60) : {‘l’pqlggil with
1M 2 —2E [ (1,), + 790)”) (U90)] + 260,q6(;(11E[{’1’p')’d(U90) + 7902,14} ((190))
+2653E [(700711) (1190)] {(93,111 + 96,112) I{p=q} + 60.1260141{p#q}}

+260 p60 (11E [{TY ”Yp'yq + 7607p, q} (U60)] — 26011360196661? [{7}2i + 700ﬁd’d} (U00)] ,

= ... [1(1) - W) (4 — 412114)} (..,) he. (..,) — v12],

2
in which 7p and '71,” are the values of 33579, 33359-679 taking at 0 = 60, for any p,q =

H“ (60,—d): {lpq}pql:1 and

1,2, ...,d —— 1 and 79 is given in (4.2.6).

REMARK 4.2.4. Consider the Generalized Linear Model (GLM): Y = g (XTHO) + o (X) e,
where g is a known link function. Let 6 be the nonlinear least squared estimator of 00 in
GLM. Theorem 4.2.2 shows that under the assumptions (C1)-(C6), the asymptotic distri-
bution of the 6_d is the same as that of 6. This implies that the proposed SIP estimator
6_d is as efﬁcient as if the true link function g is known.

The next two propositions play an important role in the proof of the main results.
Proposition 4.2.1 establishes the uniform convergence rate of the derivatives of ”)9 up to

order 2 to those of '19 in 6. Proposition 4.2.2 shows that the derivatives of the risk function

up to order 2 are uniformly almost surely approximated by their empirical versions.

PROPOSITION 4.2.1. Under Assumptions {C2)—(C6), with probability 1

 

 

 

sup sup Hg (u) — 79 (u)| = O {(nh)”1/210gn + h4} , (4.2.13)
96531-4 HEIOJ]
811p SUP max —{‘108 (U91) - 20 ((101)) = 0(10,——gn + '13) . (4-2-14)
l<p<d9€ Sd— llSiSn 86p ’ ’ nh3

2

6 10 n
67—p86q{&0(U91i)—70(1U0i)=}| O(\/§ﬁ— +h2) . (4.2.15)

PROPOSITION 4.2.2. Under Assumptions (C2)-(C6), one has for k : 0,1, 2
k

ak0_d

 

sup sup max
151) q<d9€ Sd— llSiSn

 

 

{12* (ad) — 12* (0-2)} = o(1),a.s..

 

 

sup
llmllsv 1-62

Proofs of Theorem 4.2.2, Propositions 4.2.1 and 4.2.2 are given in Section 4.6.

77

4.3 Implementation

This section describes the actual procedure to implement the estimation of 00 and g.

We ﬁrst introduce some new notation. For ﬁxed 0, denote the B-spline matrix as

N
BO 2 {BjA (U0,i)l;:1,j=_3 and

-1
P9 2 By (Bng) B; (4.3.1)
as the projection matrix onto the cubic spline space 05,22). For any p = 1, ..., d, denote
Bp : _—

as the ﬁrst order partial derivatives of B9 and P9 with respect to 6.

It is easy to see that the distribution function Fd in (4.2.5) satisﬁes

 

- d I‘(d +1) ( 1:2) 2
F'scz—Fz -— Ix<a. 4.3.2
d() d, .1 ar{(d+1)/2}22d ..2 (ll_) ( )
Let S*(0_d) be the score vector of 13* (6_d), i.e.
“at a “at
S (6_d) = ———-—R (0_d). (4.3.3)

ao_d

The next lemma provides the exact forms of S“ (6_d).

LEMMA 4.3.1. For the score vector of 13* (6_d) deﬁned in (4.3.3), one has

. . . d-l
3* (ad) —_- —n—1 {YTPPY — opeglvTPdY} 1 , (4.3.4)
p:
where for any p 21, 2, ...,d
r . . . -1
YI PpY : 2YT (I — P9) 3,, (3339) 3331, (4.3.5)

n,N

where Bp = {{Bj’3 (U0,i) — Bj+1,3 (U0,i)}Fd(X0,i)h—1Xi,p} 3 with Fd in (4.3.2)

i=l,j=—-
PROOF. For any p = 1, 2, ..., d, the derivatives of B—splines in de Boor (2001) implies

. a ( ) "(N d ( ) d "'N
Bp : {_ .,4 (19).} :{_B.’4 U9. —Ue,'}
39p ’ ‘ i=l,j=—3 d“ J ,1 d9? 1 i=1.j=—3

N
B- u. B- U- - ..,
3 1339“.) ‘ 231.2} .32) Fd(x9’i)Xi*”
t3+3 ‘3 5+4 t3“ izljz-3
n,N

H

 

 

ll

{{Bj,3 ((19.1) " Bj+1,3 (U20) Fd (x912) hﬁlxi’p i=1j=-3°

78

Next, note that

Pp : Bp (Bng)—1 B; + Ba [52— {(B3B0)"1Bg}]

 

 

p
= 13,, (33130) B}; + B9 {29%; (335139) — } 133" + Bo (13,9 B9) 133,"
Since
a{ BTB -1BTB } r ‘1 __ :r
0 E ( 9 2p 9 o z 6032:?) 13ng + (Bng) 16(3)::39),

and 53; (Bgeg) 2 1'3ng + 3313,, thus

a —l -l . . —l
—— (3339) = — (3339) (BZ‘BQ + 333p) (3339) .
36,,
Hence
. . T —l T T —l . T
P, = (I — P9) 3,, (B,9 B9) B0 + B), (3939) 8,, (I — P9).
Thus, (4.3.5) follows immediately. El

In practice, the estimation is implemented via the following procedure.

Step 1. Standardize the predictor vectors {X3331 and for each ﬁxed 6 E 33-1 obtain
the CDF transformed variables {U9,i}?=1 of the SIP variable {XeiiliLI through formula
(4.2. 5), where the radius a is taken to be the 95% percentile of {lle-II}?___1.

Step 2. Compute quadratic and cubic B-spline basis at each value U9,,~, where the number

of interior knots N is
N = min {c1 [ml/5"] ,cz}, (4.3.6)

Step 3. Find the estimator 6 of 60 by minimizing it“ through the port optimization
routine with (0,0, ..., 1)T as the initial value and the empirical score vector 6* in (4.3.4).
If at < n, one can take the simple LSE (without the intercept) for data {IQ,X,-}?___l with its
last coordinate set positive.

Step 4. Obtain the spline estimator g of g by plugging 6 obtained in Step 3 into (4.2.10).

REMARK 4.3.1. In (4.3.6), c1 and c2 are positive integers and [V] denotes the integer part
of u. The choice of the tuning parameter c1 makes little difference for a large sample and

according to the asymptotic theory there is no optimal way to set these constants. We

79

recommend using c1 = 1 to save computing for massive data sets. The ﬁrst term ensures
Assumption (C6). The addition constrain oz can be taken from 5 to 10 for smooth monotonic
or smooth unimodel regression and c2 > 10 if has many local rninima and maxima, which

is very unlikely in application.

4.4 Simulations

In this section, two simulations are carried out to illustrate the ﬁnite-sample behavior of the
SIP estimation method. The number of interior knots N is computed according to (4.3.6)

with c; = 1, c2 = 5. All of the codes have been written in R.

4.4.1 Example 1

Consider the model in Xia, Li, Tong and Zhang (2004)

Y = m (X) + 0‘08, 00 = 0.3, 0.5, 51-?in N(0, 1)

where X = (X1,X2)T ~N(0,12), truncated by [~2.5,2.5]2 and
1/2
m (x) 2 31+ 2:2 + 4exp {— (2:1 + 3:2)2} + (501:? + 33%) . (4.4.1)

If 6 = 0, then the underlying true function m is exactly a single-index function, i.e., m (X) =
\/2XT60 + 4exp {~2 (XT60)2}, where 63‘ = (1,1)/\/2. While (5 75 0, then m is not a
genuine single—index function. An impression of the bivariate function m for 6 = 0 and
6 = 1 can be gained in Figure 4.18.

For 6 = 0,1, one hundred random realizations of each sample size n 2 50,100,300
are drawn respectively. To demonstrate how close the SIP estimator is to the true index
parameter 60, Table 4.5 lists the sample mean (MEAN), bias (BIAS), standard deviation
(SD), the mean squared error (MSE) of the estimates of 60 and the average MSE of both
directions. Horn this table, one sees that the SIP estimators are very accurate for both
cases 6 = 0 and 6 = 1, which shows that the proposed method is robust against the
deviation from single-index model. As we expected, when the sample size increases, the SIP
coefﬁcient is more accurately estimated. Moreover, for n = 100,300, the total average is

inversely preportional to n.

80

4.4.2 Example 2

Consider the heteroscedastic regression model (1.1.1) with

05{ —exp( "XII/«50)
°s+exp(uxn/f)

 

m (X) = sin (ng00) , o (X): (4.4.2)

in which x,- = {X,-,1,...,X,-,d}T and 5,, i = 1,. ,,n are '~ N(0,1), 00 = 0.2. In this
simulation, the true parameter 60 = (1, 1,0, ...,0, l)/ \/3 for different sample size n and
dimension d. The superior performance of SIP estimators is borne out in comparison with
MAVE of Xia, Tong, Li and Zhu (2002). We also investigate the behavior of SIP estimators
in the previously unexplored cases that n is smaller than or equal to d, for instance, n =
100,d = 100, 200 and n = 200,d = 200,400. The average MSEs of the d dimensions are
listed in Table 4.6, from which one sees that the performance of the SIP estimators are quite
reasonable and in most of the scenarios n S d, the SIP estimators still work astonishingly
well where the MAVEs become unreliable. For n = 100, d = 10, 50, 100, 200, the estimates
of the link prediction function from model (4.4.2) are plotted in Figures 4.20 and 4.21, which
are rather satisfactory even when dimension exceeds the sample size.

Theorem 4.2.1 indicates that 6_d is strongly consistent of 60’_d. To see the convergence,
we run 100 replications and in each replication, the value of “6 —- 6OII/ \/d is computed.
Figures 4.22 and 4.23 plot the kernel density estimations of the 100 “6 — 60" in Example
2, in which dimension d = 10,50, 100,200. There are four types of line characteristics:
the dotted-dashed line (n = 100), dotted line (n = 200), dashed line (500) and solid line
(n = 1000). As sample sizes increasing, the squared errors are becoming closer to 0, with
narrower spread out, conﬁrmative to the conclusions of Theorem 4.2.1.

Lastly, Table 4.6 reports the average computing time of Example 2 to generate one
sample of size n and perform the SIP or MAVE procedure done on the same ordinary
Pentium IV PC. Horn Table 4.6, one sees that the proposed SIP estimator is much faster
than the MAVE. The computing time for MAVE is extremely sensitive to sample size as we
expected. For very large (1, MAVE becomes unstable to the point of the breaking down in

four cases.

81

4.5 Application

In this section the proposed SIP model is demonstrated through the river ﬂow data of
Jékulsa Eystri River of Iceland, from January 1, 1972 to December 31, 1974. There are
1096 observations, see Tong (1990). The response variables are the daily river ﬂow (Yt),
measured in meter cubed per second of Jékulsa Eystri River. The exogenous variables are
temperature (X t) in degrees Celsius and daily precipitation (Zt) in millimeters collected at
the meteorological station at Hveravellir.

This data set was analyzed earlier through threshold autoregressive (TAR) models by
Tong, Thanoon and Gudmundsson (1985), Tong (1990), and nonlinear additive autoregres-
sive (NAARX) models by Chen and Tsay (1993). Figure 4.24 shows the plots of the three
time series, from which some nonlinear and non-stationary features of the river ﬂow series
are evident. To make these series stationary, the trends are removed by a simple quadratic
spline regression and these trends (dashed lines) are shown in Figure 4.24. By an abuse of
notation, we shall continue to use Xt, Yt, Z; to denote the detrended series.

In the analysis, we pre-select all the lagged values in the last 7 days (1 week),
i.e., the predictor pool is {Yt_1,...,Yt_7,Xt,Xt_1,...,Xt_7,Zt,Zt-1,...,Zt_7,}. Using
BIC similar to Huang and Yang (2004) for the proposed spline SIP model with 3
interior knots, the following 9 explanatory variables are selected from the above set
{Yt_1,...,Yt_4,Xt,Xt_1,Xt_2,Zt,Zt_1}. Based on this selection, we fit the SIP model

again and obtain the estimate of the SIP coefﬁcient
0 = {—O.877, 0.382, —O.208, 0.125, —0.046, —0.034, 0.004, ~0.126, 0.079}T.

The ﬁrst two plots of Figure 4.25 display the ﬁtted river ﬂow series and the residuals against
time.

Next we examine the forecasting performance of the SIP method. We start with esti—
mating the SIP estimator using only observations of the ﬁrst two years, then we perform the
out-of-sample rolling forecast of the entire third year. The observed values of the exogenous
variables are used in the forecast. The last plot of Figure 4.25 shows the SIP out-of—sample

forecasts. For the purpose of comparison, the MAVE method is also used, in which the

82

same predictor vector is selected by using BIC. The mean squared prediction error is 60.52
for the SIP model, 61.25 for MAVE, 65.62 for NAARX, 66.67 for TAR and 81.99 for the
linear regression model, see Chen and Tsay (1993). Among the above ﬁve models, the SIP

model produces the best forecasts.

4.6 Proof of The Theorems

4.6.1 Preliminaries

In this section, some properties of the B-spline are introduced.

LEMMA 4.6.1. For each 0 < r g 00, there exist constants c > 0 such that for each spline

combination 2?]:_k+1 aj,kBJ-,k up to order k = 4, one has

4 N' _ 1r
cal/T uaII._||Zk_2Z,--_k.1 011.81.11.31.” (3' 1h) / Han... 1<r<oo

Chl/rllallrgll:i_2Zj——k+laj,jkk3,llr:(3h)1/rllallrv 0<r<1

where a: =(a 1 2,002, ...,aNQ, MONA)- In particular, under Assumption A2, for any

ﬁxed 0, one has
4 N
l 2
ch / ”an2 3 Z Z aj,,,B,-,,, g Gill/2 ”an2
PROOF. It follows from the B-spline Property on page 96 of de Boor (2001),
22:2 29”:_k+1 B“: E 3 on [0,1]. So the right inequality is immediate for r = 00. When

1 g r < oo, Holder’s inequality implies that

4 N 4 N V” 4 N 1’1”
Z Z “MBJ'JC E Z Z Ian-ITEM Z Z 3ch
k=2j=—k+1 k=2j=—k+1 k=2j=—k+1
4 N l/r
31.” Z Z laikerik

k=2j=~k+1
Since all the knots are equally spaced, f_oo 83- k (u) du S h, the right inequality follows

from
T

[01: Z ajkBJ (u) )dug3r—lhllall:.

k: 2j=—k+1

83

W hen r < 1, one has

4 N T 4 N
Z Z “JuichJc 3: Z layukerr

k=2j=—k+l k=2j=—k+l
SincefoO 00 Bjk(u )(lugtj+k-tj=khand

r

00
A12 2 “MB 13k (11) dug llall;/;ooB;’k (u)du§3h||a||;,

k: 2j=— —k+l
the right inequality follows in this case as well. For the left inequalities, Theorem 5.4.2,

DeVore and Lorentz (1993) implies that

r

la,,k| gel/1 / Z 09-18“. (u) du
j j=-k+1

for any 0 < r g 00, so

7'

la- |r<Crh'1/tj+l Z d
Jﬁ — l ajkB] ,k (u) u'

ti j=—k+l
Since each u E [0,1] appears in at most 10 intervals (tj,tj+k), adding up these inequalities,

one obtains that

T N T
j+k _
||a||;golh 1:]: Z a,,,B ()u 1111330}; 1 . Z 01MB”
j=—k+l j=~k+1 r
The left inequality follows. [:1

For any functions (b and (,0, deﬁne the empirical inner product and the empirical norm

1 n
<15. We = [0 <1 (u) r (u) f0 00 du. 1145113,... = n“ 2 <22 (00,.) .
i=1
In addition, if functions 43,1p are L2 [0, l]-integrable, deﬁne the theoretical inner product and
its corresponding theoretical L2 norm as

1111130: f< ¢ (.umuodu <45 (Plnozn ZN U01) «pa/o.)-

i=1
LEMMA 4.6.2. Under Assmnptions {02), (C5) and (CG), with probability 1,

 

sup max <Bj‘k, Bj/ kl> - <Bj,k,Bj, k’> I: O {(nN)"l/2 log n} .
d— lk,k’ =2,3,1 ' ’ 7%9 ’ 9
QESC
1<j,j' <N

84

PROOF. \Ne only prove the case I: = k' = 4, all other cases are similar. Let

(03,343 = BM (Urn) 314,4 (119,1) — E3334 (U0,i) 334,4 (”0,0 ,

with the second moment

I 2
EC; = E [312,4 (110,1) 332-54 (1103)] * {33334 (U93) 334,4 (U0,i)} 1

.1121
2
where {133,34 (09,-) 3,4,4 (U9,,-)} ~ N-2 and E [3%, ((19,)3324'4 (U9,,)] ~ N-1 by As-

sumption (C2). Hence, EC: . ., . ~ N‘l. The k-th moment is given by

’1!) )2
k k
E Ing'J-IJI = E lBjA (Ugﬁ') 3.7-[’4 (Ugﬂ') - EBj,4 (119.1“) le’4 (U9,i)l

k k
S 2k_1 {ElBj14 (U9,i)Bj’,4(U91i)| + EBJ'A (Ugﬂ') le’4 (U0,i)| } 1

 

I:
N N‘1 by As-
Ik

where

k
N N‘k, E lEBM (09,1) le,4 (UM)

 

 

531,4 (09,1) le,4 (UM)

sumption (C2). Thus, there exists a positive constant C such that E|C0,j,j’,i

 

<

Ck“1k!E(§j, .. So the Cramér’s condition is satisﬁed with Cramér’s constant c*. By

1

Lemma 2.6.2, one has for k = 3

1 n (1(52 n 6/7
p f .. . >6 <a ex — n +0 ka([-—]) 1
1 g (Odd/12 _— n — l p 257713 + 5C*6n 2( ) q + 1

where
1 62 .N ‘11 2
6n=60gn,a1=22+2(1+ (n ) 0g n),m%~N_l,
q

 

 

 

 

 

 

x/nN 25mg + 5min
‘ "lg/7 1/3
3 = 11 1 , - = . I .. . <cN .
a2( ) n + 6n m3 1???” (add/,1 3 '-

 

 

 

Observe that 506,, = 0(1) by Assumption (C6), then by taking q such that [5%] 2 c0 log n,
q 2 cm/ logn for some constants (20,01, one has a1 = 0(n/q) = 0 (log 11), a2 (3) = 0 (112)

via Assumption (C6) again. Assumption A5 yields that

6/7 6/7
. _L _ " “6*000/7
0(lq+1l) S {KOCXP( AOlq+1l)} SC” '

Thus, for ﬁxed 0 6 521—1, when 71 large enough

1 n
P {'5 240.1121
1,:

 

 

 

> 6"} g clog 11 exp {-(:262 logn} + Cn2-6AOCO/7. (4.6.1)

85

We divide the d — 1 intervals into nG/(d‘l) equally spaced intervals with disjoint endpoints
—1 = 61W < 6N < < 6AM” = 1, for p = 1,...,(1— 1. Projecting these small cylinders

onto 82*, the radius of each patch Ar, 1' = 1, ...,Mn is bounded by CM; 1. Denote the

 

projection of the Affn points as 9,. = (0,. _d, \/1 — “0,. ﬂing), r = O, 1, ..., Mn. Employing

the discretization method, sup axICQ [ii is bounded by
d—l 1<m I<N 1],],
065C _jij

sup ICG I + sup max sup ICO - .1 . -— C - 4 - . (4.6.2)
0<r<Mn1<jj’<N r’j‘j’ i O_<_r<Mnl<jj’ <N9€A ’3’] ’1 9”“ ’1

By (4.6.1) and Assumption (C6), there exists large enough value 6 > 0 such that

1 n —10
PI; ECWJJ’J >5"} 3” .
z:

n (X) 00
71—12(9r31j11i 3 5n} S 2 Z ”WW—10 5 C 2 "‘3
[:1 n=1 11:1

Thus, Borel—Cantelli Lemma entails that

71
lo n
-12 g
71 ' ' ’ : O ,a-So- 4.6-3
[:1 (91“,.713’31 ( GEN) ( )

Employing Lipschitz continuity of the cubic B-spline, one has with probability 1

 

 

which implies that

00
ZP{ max
11:] H

ISJJ’SN

 

 

sup max
O<r<Mn l<j,j I<N

 

 

 

 

 

n
-l —l —6
sup max sup {C - -/ - —( . -/ } = O (M h ) . (4.6.4)
0<r<A/Inl<j,j<N9€A1-n; 0’“ ’2 0r,“ ’1 n
Therefore Assumption (C2), (4.6.2), (4.6.3) and (4.6.4) lead to the desired result. [:1

Denote by G = C(O) U C“) U 0(2) the space of all linear, quadratic and cubic spline
functions on [0,1]. We establish the uniform rate at which the empirical inner product

approximates the theoretical inner product for all B-splines Bj,’c with k = 2, 3, 4.

LEMMA 4.6.3. Under Assumptions (C2), (C5) and (C6), one has

(71.72%,0" (7172M:
An: sup sup
065g—171,726C: “’71 ”2,9 ll72l|2,9

 

O{(nh) ”2 log n} . (4.6.5)

 

PROOF. Denote without loss of generality,

4 N 4 N
=2 _2 09-1811an 2 [311-831

—k+l k=2j=~k+1

86

for any two 3 (N + 3)-vectors
a I (C!_1,2,0’0,2, "HO/1V2) "'1aN,4)1ﬁ : (ﬂ—1,2)ﬁ0,23 "'1ﬁN,2) "'HBNA) -

Then for ﬁxed 6

4 N
(71,72)n,9 = 1 Z Z 013', Bj,k(U0,i) Z Z IBj,Bk] ,k (,U9z')

i=1 Nk=2j=— —k+l k: 2j=—-k+1

4 4 N
: Z: Z _Z a Jkﬁlecl <Bj,kile,k’>n’g’

N
“3: 4,.a,1,.,<s.,.,a,.,.>,.
N
“zzk+:ﬂj,jk16lkl<Bjkajlkl>9-

According to Lemma 4.6.1, one has for any 9 E 33!“,

2
ll71||2,9

2
ll72ll2,0 :

<1h||0l|2_ l|71||2,_9 < C2hllal|2.61hllﬁll2_ < ||72||2,9_ < C2hllﬁll2.

Cih llallz llﬁllz S ll711l2,9||72l|2,9 S Czhllall2 llﬁll2.

Hence

llalloo llﬁlloo
‘ 61h llallz llﬁllz

(’71 1 72)n,0 - (A/11’72)9
ll71ll2,9”72ll2,9

 

An = sup sup
aesg—l 7167,7261‘

 

 

 

 

1 n
x sup -Z{< Bjk’Bj’k [>12 "<Bj,kiB.lk,> },
9639 1k,km =2,,34 "2:1 n, 1* 9
l_<_j,j/<N
1 n
AnSCOh-l sup —2 <Bjk’Bj’k ,> _<Bj,k,B,Ik,> ,
add—M11234 "i=1 n,9 3' 9

 

 

l<j j I<N
which, together with Lemma 4.6.2, imply (4.6.5).

4.6.2 Proof of Proposition 4.2.1

For any ﬁxed 6, write the response YT: (Y1, ..., Y”) as the sum of a signal vector 79, a

parametric noise vector E9 and a systematic noise vector E, i.e.,

Y=79+E3+E,

87

in which the vectors 7?; 2 {79(U0,1),...,79(U9,n)}, ET = {o (X1)51, ...,o(Xn)5n} and

E2; = {m(xl) - ‘19 (Um) (X...) -— 7'9 (Ua,n)}.

REMARK 4.A.1: If m is a genuine single-index function, then E90 E 0, thus the proposed
SIP model is exactly the single-index model.
Let ng be the cubic spline space spanned by ( B 334 (U93) }?=1’ —3 S j S N for ﬁxed

6. Projecting Y onto 051mg yields that

A ,. .. T . . .

‘79 = {70(U0,1)1 --‘1’70(U0,n)} 2 FIG] (2) ’79 + PTO] (2)139 + FIG] (2) E,

C G G
n,0 n,0 11,0

where ”ya is given in (4.2.8). We break the spline wtimation error ’3’0 (no) — 79 (11.9) into a

bias term ’79 (no) — 79 (no) and two noise terms E9 (119) and £9 (1L0)

39 (U9) - 79 (U9) = {39 (U9) - 79 (U9)} + 5‘9 (U9) + 3‘9 (U9) 1 (45.6)
where
59 (U) I {33,4 ONES-9v V39 {(79,3j,4)n,9}:_31 (4-6-7)
17:9 (u) = {9,34 (u) [3 9.1: N v;}, {(Eg, 31.4)",9}:_3 , (4.6.8)
5-59 (11) = {Bj14(u)}1—13;<_jSN V7119 {(E, Bj,4>n,0}:___3‘ (4.6.9)

In the above, denote by Vmg the empirical inner product matrix of the cubic B-spline basis

and similarly, the theoretical inner product matrix as V9

N N
,v ={(B. ,B- > } . 4.6.10
3 0 3,,4 ],4 0 jJ’Z-3 ( )

The next lemma is a special case of Theorem 13.4.3 in DeVore and Lorentz (1993).

1 T
V =_ = < . B- >
n,0 nBaBH { BJIA’ 1,4 "’9}j,j’=*

LEMMA 4.6.4. If a bi-inﬁnite matrix with bandwidth r has a bounded inverse A"1 on [2 and
K. = m(A) := ||A||2 “A4“2 is the condition number of A, then ||A_1||oo 3 200(1— u)—l,

with CO = u-2r ”14-1”, 11 = (n2 _1)‘/4"(n2 +1)*‘/4’.

LEMMA 4.6.5. Under Assumptions (CB), (C5) and (C6), there exist constants 0 < CV < CV

such that cVN-l nwng g wTvgw g ovN-l “ng and
N‘1 2< Tv <C N’1 2 4611
cv llWll2 _ w n,9W ._ v ”W112 .a.s., ( - - )

88

with matrices V9 and Vn,0 deﬁned in (4.6.10). In addition, there exists a constant C > 0
such that

sup “V1?” _<_CN,a.s., sup ”VB—1” SC'N. (4.6.12)
BeSg'l 00 9652i—1 00

PROOF. First we compute the lower and upper bounds for the eigenvalues of Vnﬂ.
Let w be any (N+4)-vector and denote 7w (u) = 29,:_3 ijJ'A (u), then ng =
{7w (UgJ) , ...,7w (U9,n)}T and the deﬁnition of An in (4.6.5) from Lemma 4.6.3 entails

that
2 T 2 2
lhwll2,9 (1 — An) S W Vn,9W = ||7w||2,n,9 S ||7w||2,9 (1 + An)- (4-5-13)

Using Theorem 5.4.2 of DeVore and Lorentz (1993) and Assumption (C2), one obtains that
2

N
C 2 C 2
cm uwng s “mute = WTV9W = Z ijJ-A 3 01,91le6. (4.6.14)
i=‘3 29
which, together with (4.6.13), yield
cfoN-l “w“; (1 — An) g wTv'nﬂw _<_ ofozv-l nwug (1 + A"). (4.6.15)

Now the order of An in (4.6.5), together with (4.6.14) and (4.6.15) implies (4.6.11), in which
cv = ch, CV = CfC. Next, denote by Amax (Vmg) and Ami“ (Vmg) the maximum and

minimum eigenvalue of Vn’g, simple algebra and (4.6.11) entail that

CVIV—1 _>.. ”Vnﬂllz : )‘max (Vnﬂ) 1| V8119 2 : Aaliln (Vnﬂ) .<_ clea a.s.,

 

 

 

thus
K :2 “V71,9||2 “Va-folk = )‘max (Vnﬂ) All?" (Vnﬂ) S CVCT/l < 0010-3--

Meanwhile, let wj = the (N + 4)-vector with all zeros except the j—th element being 1, j =
—3, ..., N. Then clearly

Tl

1 .
WJTVnﬂWJ' 2 ; 2332,4(1103') = “324“:9 1 llellz : 1’ ‘3 S J -<— N
i=1
and in particular
wgVnﬂWO S )xmax (Vmg) “WOll2 = Amax (vnﬂ) 1
WZ‘3‘ln,(le--3 2 )‘min (Vnﬂ) ”w—3ll2 : )‘min (Vnﬂ) -

89

This, together with (4.6.5) yields that

TV W ”BO 4”2 ”Bo 4”2 l— A
F:)\ 2 V /\—.l V 2 WO 11,9 0 :—-—,——n’-0—> , 0 n1
“ "M "(9) .....< '1’”) wax/...“.-. Ila—3,413.1 ‘ Ila—3.4“: 1 + A,

 

 

which leads to a 2 C > 1,a.s. because the deﬁnition of B—spline and Assumption (C2)
ensure that ”80,4“; _>_ C0 "843,4“: for some constant 1C0 > 1. Next applying Lemma
4.6.4 with u = (n2 — I)”16 (n2 + 1).”16 and c0 = u‘8 ”V;},|l2, one gets “V’Zblloo g
21/”8N(1 — u)"1 2: CN,a.s.. Hence part one of (4.6.12) follows. Part two of (4.6.12) is
proved in the same fashion. [:1

In the following, denote by QT (m) the 4-th order quasi-interpolant of m corresponding
to the knots T, see equation (4.12), page 146 of DeVore and Lorentz (1993). According to

Theorem 7.7.4, DeVore and Lorentz (1993), the following lemma holds.
LEMMA 4.6.6. There exists a constant C > 0, such that for O S k S 2 and 7 6 C(4) [0,1]
[[6 -- QT 6W” 3 0 ”7(4)” h”,
00 oo

LEMMA 4.6.7. Under Assumptions (02), (C3), (C5) and (C6), there exists an absolute

constant C > 0, such that for function 370(11.) in (4.6. 7)

 

 

 

 

 

 

 

 

 

 

dk
sup —-—- (99 - 79) g C 772(4)” h4_k,a.s.,0 g k g 2, (4.6.16)
Sd‘l duk oo .
Re c 00
PROOF. According to Lemma 2.3.3, there exists an absolute constant C > 0, such that
sup Ina — you... s C sup inf u) — is"... _<_ C||m(4)|| h4.a.s.. (4.6.17)
96353—1 9633—1760 2) 00
which proves (4.6.16) for the case k = 0. Applying Lemma 4.6.6, one has for O _<_ k g 2
(1" 4 - _
sup ———k- {QT (79) - '79} S C sup "7(9 )” h4 k S C “mm” h4 k, (4.6.18)
(1.] du _1 00 00
065C 00 96 C

As a consequence of (4.6.17) and (4.6.18) for the case I: = 0, one has

 

 

sup "wen—69110030764)" h4.a.s.,
CD

9653—1
which, according to the differentiation of B-spline given in de Boor (2001), entails that
k

 

 

 

 

 

 

d
sup “7; {QT (79) — 59} g 0 771(4)“ h4~k,a.3., 0 _<_ k g 2. (4.6.19)
(1-1 du 00
9686 00
Combining (4.6.18) and (4.6.19) proves (4.6.16) for k = 1,2. C]

90

LEMMA 4.6.8. Under Assumptions (C1), (C2), (C4) and (C5), there exists an absolute

constant C > 0, such that

8

551—259(0),» - 79 (U0,i)}?:1 00 S C

sup SUP
< < .d—

 

 

 

 

 

 

"1(4)” 113, 0.8., (4.6.20)
00

92 1
m {’79 (U93) — 79 ((193));

 

 

 

sup sup l S C “m(4)“ h2,a.s.. (4.6.21)
ISPﬂSdgegg—l 00 0°
PROOF. According to the deﬁnition of 79 in (4.6.7), and the fact that QT (79) is a cubic

spline on the knots T

{(4% (79) - 79}(U9,i)}?=1 = P9 {{QT (79) — 79}(U9,i)},:1,

which entails that

5%; {{9T (79) — n} (091-) )2; = 552-30 {{QT (79) ‘ U}(U94)}?=1

= 6,, {an (79) — 79}(l]0,i)}:;1 + P953; {{QT (79) — m (”09”;-

Since

2:1

819,, “Q?“ (79) — 79} (U0,i)}?:1 Z {{QT (3?;79) _ 53;”) 0194))“
+ (Ed; {QT (79) - 79} (U92) Xip}:=li

applying (4.6.19) to the decomposition above produces (4.6.20). The proof of (4.6.21) is

similar. Cl

LEMMA 4.6.9. Under Assumptions (CS), (C5) and (C6), there exists a constant C > 0 such

that
-l T -l°T
sup ”n B9” $Ch,a.s., sup sup ”n Bp“ SC,a.s., (4.6.22)
963;"1 0° ‘SP‘Sdoesg‘l 0°
sup “Pg“oo gC,a.s., sup sup "Pp“OOSCh—l,a.s.. (4.6.23)
9632-1 1SPSdassg‘l

PROOF. To prove (4.6.22), observe that for any vector a 6 R", with probability 1

n
”*1 Z 8,7,4 ((10,1')
i=1

 

 

—l T
n B a“ < a max
9 — ll ”00

<Cl (
4,93,, _ luau...

 

 

1141.353”

00

 

 

91

< llalloo nh LZUB 3,3 Bj+l 3) ((199)) Fat (X92) Xi,p S Cilalloo'

To prove (4.6.23), one only needs to use (4.6.12), (4.6.22) and (4.3.1). Cl

—3<j <N

 

 

LEMMA 4.6.10. Under Assumptions (C2) and (C4)-(C6), one has with probability 1

 

 

 

 

 

 

 

 

BTE " 16
9 _1 gn

— = B- U - x. - =0 , 4.6.24
n .2. M W )2 (m) < >

C _
9 BTE BTE

sup sup -— —Q— = sup sup —p— :O(logn). (4.6.25)
13105496531“1 86” n 00 13099635?“1 00 Vnh

 

 

 

 

 

 

 

 

Similarly, under Assumptions (C2), ( C4 )-( C6), with probability 1

 

 

 

 

 

 

 

 

 

BTE9
sup 9 = sup 3rr<1a§N— —Z Bj 4 (U9 1') {m (X,- ) — 79 (U9 i)}l
965;!"1 00 965g 1 — J
logn)
0 , 4.6.26
(VnN ( )
BTE
sup sup 1 9 0 = O (logn) ,a.s.. (4.6.27)
1311311965214 59p U 00 \/nh

 

 

 

 

PROOF. We decompose the noise variable 8,- into a truncated part and a tail part 5,- =

8,01" + 5, .2" + mp", where Dn = n" (1/3 < 1} < 2/5), 5,01" = 5,1{Ieil > Dn},

ED2n— -— EiIHEiI < Dn} “‘ 7712' Dnam Dn— “EIE1I{|EiI < Dn} [Xi]

It is straightforward to verify that the mean of the truncated part is uniformly bounded by

D,’,‘2, so the boundedness of B-spline basis and of the function 02 entail that

= o (1);?) = o (71—2/3) .

sup 3:12:31); (U9i) 0(X i) m?"
965d 1

 

The tail part vanishes almost surely

CX) OO

Epilgnl > Dn} S 201:3 <00

n=l n21
Borel-Cantelli Lemma implies that

_ZBJ 4 (U9,-)o o(,-z.X)e,ln =O(n~k),foranyk>0.

 

92

For the truncated part, using Bernstein’s inequality and discretization as in Lemma 4.6.2
sup sup n 1sz 4 U9) ,- (X0510; = 0 (log n/x/nN) ,a s
965914 131' S N '
Therefore (4.6.24) is established as with probability 1

= o(n"2/3) +0(n_k) +0 (logn/W) = 0(logn/\/1_t—N—).

 

 

iBTE

sup
96 Sd— l

The proofs of (4.6.25), (4.6.26) are similar as E (m (Xi) — 79 (U9 1') |U9 i} E 0, but no trun-

 

 

 

IOO

cation is needed for (4.6.26) as sup 1139.?” Im (X- ) -— 79 (U9 ,)| < C < 00 Meanwhile,
d- l 7'
968C

to prove (4.6.27), we note that for any p = 1, ..., d

N
6

agP(BTE9):{:j—'la9pBJi4U91){m(Xi)_ ’79 (U9 ,i)}l}

According to (4.2.6), one has 79 (U9) E E {m (X) |U9}, hence

j=—3

E [9,, (U9) {m (X) — 79 (U9)}l s 0, —3 s 1' _<_ N.9 e st“.
Applying Assumptions (C2) and (C3), one can differentiate through the expectation, thus
(9 . __
13(59;[Bj,4(U9){TU(X)-79(U9)}]}EO,1_<_de,-3SJSNﬂESg 1.

which allows one to apply the Bernstein’s inequality to obtain that with probability 1
N

{n1 23—19%“ 11(91){m (Xi,)-79(U9.-)}]} =o{(nh)-1/2iegn},

j=—3 00
which is (4.6.27). [3

LEMMA 4.6.11. Under Assumptions (C2) and (C4)—(C6), for £9 (u) in (4.6.9), one has

sup sup IE9 (11)] = O {(nhrl/2 log n} ,a.s.. (4.6.28)
96 gd—l U€[0,1]
--c

-1
PROOF. Denote 6 E (6-3,...,6N)T=(B§BB) 133E = v;},(n -lBTE), then
3‘9 (u) = Zf;_3 6,3,), (u), so the order of E9 (u) is related to that of a. In fact, by

Theorem 5.4.2 in DeVore and Lorentz (1993)

sup sup Iég(u)l S 3‘11) llélloo:

 

 

sup "ng (n—lBg‘E) N 3 CN sup n‘lB'GI‘EH , a 3
96521—1 0° 965:.”1 0°

93

where the last inequality follows from (4.6.12) of Lemma 4.6.5. Applying (4.6.24) of Lemma
4.6.10, one has established (4.6.28). 1]

LEMMA 4.6.12. Under Assumptions (C2) and (C4)-(C6), for E9 (u) in (4.6.8), one has

sup sup IE9 (u)| = 0 {(nh)_l(2 log n} ,a.s.. (4.6.29)
OESg—‘l ILEIO,” .
The proof is similar to Lemma 4.6.11, thus omitted. C]

The next result evaluates the uniform size of the noise derivatives.

LEMMA 4.6.13. Under Assumptions (C2)-(C6), one has with probability 1

 

 

 

sup Esup max ——5 U = 0{ nh3 _1/210 11}, 4.6.30
8
sup sup max ——-—59 (U9 2') =0 {(nh3) ”2 log n} (4.6.31)
l<p<d0€ Sd-l l<i_<_n 09p
su ‘92 _ 5 —1/2
p sup max 59(U9 ,) =0 ()nh logn , (4.6.32)
1<p q<d9€ 8d- 1 l<i_<_n 69,,89459
sup 6_sup max __(9__2 =0 {(nh5)1/2 log n} (4.6.33)

 

 

PROOF. Note that

8e (U ) n =(I—P)B(BTB)~1BTE+B (BTB )_IBT(I—P )E
36 9 9,5 9 p 9 9 9 9 9 9 p 9 -
P i=1

Applying (4.6.24) and (4.6.25) of Lemma 4.6.10, (4.6.12) of Lemma 4.6.5, (4.6.22) and
(4.6.23) of Lemma 4.6.9, one derives (4.6.30). To prove (4.6.31), note that

3 n 3 . 6
U = —— P = P —— = T 4.6.34
{ET-9;“ 9 ,i)}i:1 69p{ 9E9} 19139 + P939;)139 T1 + 2, ( )
in which

T1

((1 —- P9) 13,, — B9 (3313,) BT39}( }(BTB9)_ B9 E9

ll

 

 

1

(I—P9)Bp—B9()

—1
BT 0B9 8

94

B339)“ 851:}(11 (11:39) lBg‘Eg
71
£913

En”)

 

 

By (4.6.24), (4.6.12), (4.6.22) and (4.6.23), one derives

sup ||T1||00 = 0 (11"1/2N3/2 log n) ,a.s., (4.6.35)
GeSg‘l

while (4.6.27) of Lemma 4.6.10, (4.6.12) of Lemma 4.6.5 .

sup “TZHOO = N x 0 (n-1/2h_1/2logn) = 0 (n'lﬂh-3/2 log n) ,a.s.. (4.6.36)
BESd—l
C

Now, putting together (4.6.34), (4.6.35) and (4.6.36), one can establish (4.6.31). The proof
for (4.6.32) and (4.6.33) are similar. Cl

PROOF OF PROPOSITION 4.2.1. According to the decomposition (4.6.6)

(39(10— ’79 (U)| = |{i9 (U) - 79 (u)} + 59 (U) + g39(11)| -

Then (4.2.13) follows directly from (4.6.16) of Lemma 4.6.7, (4.6.28) of Lemma 4.6.11 and
(4.6.29) of Lemma 4.6.12. Again by deﬁnitions (4.6.8) and (4.6.9), we write

8 8 6 8
-...__. * _ U . :_._ ~ _ . .~ . _“ . .
99p {(79 79) ( 9.2)} 86,, (79 79) (U92) + 86,,“ (U99) + 86,,89 (U9“)
It is clear from (4.6.20), (4.6.30) and (4.6.31) that with probability 1

sup sup max
1<p<d96 Sd— 11<i<na

 

82p (‘79 - '79) (U6,i)l = 0 (’13) ,

 

 

 

 

9 9 -—1/2
~ U U _ o h3 1 .
12229962? 11<?an{ 5995M 0i)+ 899% 8“ O’i) } {(n ) Ogn}
Putting together all the above yields (4.2.14). The proof of (4.2.15) is similar. El

4.6.3 Proof of Proposition 4.2.2

LEMMA 46.14.. Under Assumptions (02)-(C6), one has

sup 1[133(0) — RB( )l = o(1),a.s..
0656

PROOF. For the empirical risk function 13(0) in (4.2.9), one has

{a (9) __- 91299919,.) — (x9 — 0 (X0692

95

n
= n'1 2 {79 (119,1) - 79 (U6,i) + 79 (U99) —' m (Xi) - 0 (X,)e,~}2,
i=1
hence

Ii (0) = 7171::{79 (U0,i) - 79 (1199)}2 + "“1 :02 (X05?
i=1 ' i=1
+2971 2 {790199) - 79 ((19.2)) (79019.9) — m (Xi) * 0 (Xi) 5,}
i=1

+n_12(’y9 (U99) — m (X0)2 + 271—1 E {’76 (U0,i) " 7710(1)} ‘7 (Xil5i,
i=1

i=1

where ’y9 (51:) is deﬁned in (4.2.8). Using the expression of R (6) in (4.2.7), one has

sup |R(6) -— R(0)| 3 11+ 12 + 13 + I4,

 

 

 

Gesg’l
with
n 2
11 = SUP 7712(790/99) -79(U9,i)} '
9esg'1 £21
11
12 = SUP 2n‘12{79 (Hos) - 79 (U6,i)} {79 (U99) — m(Xi) - 0(Xi)€i} .
OESd_1 i=1
C
n 2
13 = SUP ””1 Z {79 (U0,i) — m(Xi)} - E {79 (U9) - m(X)}2 ,
Besg—l i=1

 

 

+

 

}.

13 + 14 = 0(1),a.s.. (4.6.37)

 

1 n
— Zn? (xae? - E02 (X)
2—1

n.

 

 

2 Tl
14 = SUP { :5 Z {79 (U93) - 7" (X0) 0 (Xi)€i
96531—1 i=1

Bernstein inequality and strong law of large number for a mixing sequence imply that

Now (4.2.13) of Proposition 4.2.1 provides that

sup sup H9 (u) — '79 (u)| = 0(n‘l/2h71/2logn + 714) ,a.s.,
C

which entail that

11 = 0 { (n_1/2h—1/2 log n)2 + (h4)2} ,a.s., (4.6.38)

11
12 S 0 {(1Lh)71/2 logn +114}>< sutp 1211‘lZI'7'9(U9,,-)— 1n(X,-) — a (X,) 52".
GESC‘ i=1

96

Hence

[2 g 0 (77—1/2h71/2logn + h4) ,a s

The lemma now follows from (4.6.37), (4.6.38) and (4.6.39) and Assumption (C6).

LEMMA 4.6.15. Under Assumptions (CB) - (06), one has

sup sup
Oesg‘llspsd

 

 

a ,. _ n
331-113(9) — R(9)} — 71 1;:15994;

in which

. (9 8
€6,239 2 2 {79 (U99) — Yi} 53—9970 (U6,i)" 8010120) E(€g,,-,p) = 0.

Furthermore for k = 1, 2

sup
9esg‘1

——kk- {9(9) — 9(9)} -_—

 

 

PROOF. Note that for any p = 1,2, ...,d

1 (‘9 . 8
5535;3(3Fn12{79( U9,4) - Y}a—-0p 79 ((19.4),

556;):(9) = E[{79(Ug)—m(x )}(9 7——9(U9)]

ll

Thus E (99,9) = 213 “790109) — 19} 33579 ((199)) — 335mm = o and

1 8 ~ ~—1 n
9’99; {12(9) - 12(0)} = (2") gain) + J1,0,p + me + J3ﬂm’

with

J1,9,p = fl Z {79 ((19.4) - 79 (1199)} 5% (79 - 79) (U9, 1')

J2,9,p = "71217901997 — 771(39)— 0 (Xi) 84‘} 8%; (79 — 79) (U99),

Tl
_ . 8
J3,0,p = n l E :{79 (U99) ’79 ((199)) —66p79(U9,i).

97

: 0(n_1/2) , a.s.,

0 (n—1/2h-1/2_k logn + h4"k) , a.s..

E[{44(U4>—m (x )—a(X>e}a %79(U9)l‘

(4.6.39)

[:1

(4.6.40)

(4.6.41)

(4.6.42)

(4.6.43)

Bernstein inequality implies that

sup sup

"1269,43,;

Meanwhile, applying (4.2.13) and (4.2.14) of Proposition 4.2.1, one obtains that

0{(nh)_1/210gn + h4} X 0 {(nh3)-1/210gn + hs}

 

 

=0(n_1/210gn) ,a.s.. (4.6.44)

sup sup IJ19,pl

= 0(n‘1h7210g2 n +117) ,a.s.. (4.6.45)
Note that
” 9
J2,9,p = 71—1 2 {79 (U0,i) - 7” (Xi) — 0 (X054) 59-; (79 - 79) (119,4)
_1 T 8
”TL (E+E0) 'a—é-{Pg (E+E6)}.
P

Applying (4.2.13), one gets

J2,,9p+n1(E+E9)T—{P9(E+E9)}=

0(h3) ,as

sup sup
96 541-1 l_<_p<d

 

 

06,,
while (4.6.24), (4.6.26) and (4.6.12) entail that with probability 1

sup sup
9eS§“11$PSd

 

n"1 (E +139)ng {P9 (12+ E9»

 

— 0{(nN)_1/210gn} X N X N X 0 {(nN)'—1/210gn} = 0{n"1Nlog2n} ,
thus

sup sup IJ2 9 pl— —- 0(h3 + n-IN log2 n) ,a.s.. (4.6.46)

Lastly

n
_ 8
J3,9,p “‘71 1 E :( (79 —79)(9 71:79 (U94)
i=1

—-1
BTB BT 9
= n_1(E + E9)T B9 (——0n_£) -—Q-—-'y9.

By applying (4.6.24), (4.6.26), and (4.6.12), it is clear that with probability 1

-1
T BTB BT a
sup sup (n-lBgE+n—1B3‘E0) 9 9 _Q._.70
oesg—I 192d n

 

0 {(nN)_1/2logn} X N X 0 {h + (nN)—1/2logn}

H

0 {n71 log2 n + (nN)—l/2 log n} ,

98

while by applying (4.6.16) of Lemma 4.6.7, one has

sup sup

= 0 (174) ,a.s.,
gegg-llspgd

n :(79-79)56—p 79(U9 ,4)

 

 

together, the above entail that

sup sup IJ3 9 pl— — {0h4 + n-1 log2 n + (nN).l/2 log n} ,a.s.. (4.6.47)
BESd- ll<p<d

Therefore, (4. 6. 43), (4. 6. 45), (4. 6. 46), (4. 6 47) and Assumption A6 lead to (4. 6. 40), which,
together with (4.6.44), establish (4.6.42) for k — 1.

Note that the second order derivative of R (0) and R (0) with respect to (9p, 6‘, are
n 82

. . " 9 9 .
Z{79(U9,4)-Y4}m79(U9,4) +§T79 9(U9,4) 89p79(U9’i) 4

i=1

2n"1

 

92 9
2 IE {79 (U9) - m (X)} W79 (U9) + E {867 79 (U9);9 7;“ (U9)}I
The proof of (4.6.42) for k = 2 follows from (4.2.13), (4.2.14) and (4.2.15). CI

PROOF OF PROPOSITION 4.2.2. The result follows from Lemma 4.6.14, Lemma 4.6.15,

equations (4.6.50) and (4.6.51). Cl

4.6.4 Proof of the Theorem 4.2.2

Let 3'; (6_d) be the p—th element of 3* (0_d) and for 79 in (4.2.6), denote

714,1) 2: 2 {7p _ 60,1)66’31’7’d} (U90,i) {790 (U00,i) — K} 1 (4"648)
h ‘° 1 13 tk' 49:94 =12...d—1.
w ere 71,15 vaueo 53579 a mga 0, or any p,q , , ,
LEMMA 4.6.16. Under Assumptions (02)—{C6), one has
n
9 (90:0!) " "71:77:39 =
i=1

PROOF. For any p = 1, ...,d — 1

sup

0(n'1/2) ,a.s.. (4.6.49)
ISde—l

 

 

3;: (6-4) -S;(6 -4) = (9%];— 9 46515—1){R(6)—R(6)}.

99

Therefore, according to (4.6.40), (4.6.41) and (4.6.48)
n 11
774,44 : n—l £590,444 "' 60,1266,(1171—1 250099: E (724,44) = 04
i=1 i=1
Tl
3'; (90,—4) — 517(00,-d) — 71—1 2 77,3, = 0(n—1/2) ,a.s..
i=1

Since 3* (6-9) attains its minimum at 90,-d4 for p = l, ..., d - 1

sup
ISPSd-l

 

 

50,

,, __ 9 _19
Sp(60.~d)=(99'1’66d 769)R(6)9=90

 

which yields (4.6.49). Cl

LEMMA 4.6.17. The (p, q)-th entry of the Hessian matrix H“ (60,—d) equals [p,q given in
Theorem 4.2.2.

PROOF. It is easy to show that for any p,q = 1, 2, ...,d,

£1120): 831,130" (X) 79 (U9))2 —QE [79(U9)86p 70 (U6)I

 

 

 

 

 

32 a a2
agpaqu(0)— —2E [ca—[p70 (U9)—’79an (U9) +79 (U9) 66p1————-9(d6qU9)],
Note that
3 8 8 8
-——R* 0 z — “ ‘1:— 4.6."0
86p ( “0 99pR(6) 9d96d3(9)4 ( 0 )
62 92 9 92 9 92
R* 9 = —— _ _(l _B 6
(99443641 ( _d) 861,636,, ( 69 661,869 ( ) 69 869 86,, R( )
9 9,, 9 9994 92
u“ r—‘_— —-1‘?((9 + ——-————R 6). 4.6.51)
06‘] 1__ Ila—dug 89d ) 93 869869 ( (
Thus

9%;R*(0 -4)— — —2E I49 (U9) 9 0 49 (9,9)I 4293-19,]; [70(99): 7,3479%)I

(901)3qu (6-41) —2E{5_9p 79(U9) 08 79 (U9)+79 (U9) 86 39‘] 79019)}

 

 

100

_ 9 9 9‘2
449,4,19 {54,4244 (U44) 55544614) + 44 (U4) 94,94,” (99)}

 

9 9,,
99
‘7 1 - ||0_4||§

 

+2 E {79 (U9) 810979 (U9))

_ 9 9 - 92
+2999913 {579;79 (U9) 55379019) + 79 (U9) W79 (110)}

 

9 2 92
4999—215: _— U 4 U U .
p q d [{36d79( 9)} +79( 9) 39939417“ 9)I
Therefore we obtained the desired result. Cl

PROOF OF THEOREM 4.2.2. For any 1) =1,2,...,d — 1, let

)4 (t) = S; (49.4 + (1 — 904-4) ,4 e (0.11,

then

A

4422—9441-4044) (64—44)-

Note that 8* (6_d) attains its minimum at 6-9, i.e., 5‘; (6_d) E 0. Thus, for any p =

1,2,...,d — 1, tp 6 [0,1], one has

‘3; (90,—41) = 3; (é—d) " 3; (90,—41) = fp (1) — fp (0)

 

... {5335239 (4,9,, 4 (1 _ 4,.) 94-4) lq=4,...,4_4 (49-4—04-4) .
then
. a? - . -
_s* (00,-41) = {agqagpw (tpéLd + (1 — tp) 00’_d)}p,q=1,...,d—1 (9—4 — 90,-4) .

Now (4.2.11) of Theorem 4.2.1 and Proposition 4.2.2 with k = 2 imply that uniformly in

p,q=1,2,...,d—1

92 - .
7 ‘ —+ . . . . 2
99,,99PR (tpa‘d+(1 ”7604-01) ((1.94934 (465 )

 

where [p,q is given in Theorem 4.2.2. Noting that \/1—1 (é—d“90,—d) is represented as

1
ﬁg” (60,—(1) 1

02 ,* .
_ llmR (tPO—d + (1 7 7’) 607(1) lp,q=1,...,d—1l

101

d—l

where 8* (90,—(1) = {8; (60’_d)} 1 and according to (4.6.48) and Lemma 4.6.16

11
s; (90,_d) = 44-1244,,- + 0 (4-1/2) ,a.s., E ((74.4) = 0.

i=1

_ ' A d—l
Let ‘11 (00) = (dm):,q_l__l be the covariance matrix of JR {5; (60,—d) }p=l with dim given

in Theorem 4.2.2. Cramér-Wold device and central limit theorem for a mixing sequences

entail that
. *4 d
J53 (90,—41) r“) N {04‘1’(90)}-

—1
Let )3 (60) = (H* (90,—d)}—1‘I’(90) I{H* (90,—dllTI , with H“ (90,—d) being the Hes-
sian matrix deﬁned in (4.2.3). The above limiting distribution of ﬁS“ (00,—d): (4.6.52)

and Slutsky’s theorem imply that

\ﬂ-l (largo—41) L N {042 (90)}- U

102

Table 4.1. Example 2.5.1: Piecewise constant spline bands coverage probabilities

 

 

 

 

 

 

 

 

noise level sample size conﬁdence level estimated bands oracle bands

1 — 0.01 0.588 0.588 0.590 0.582

100 1 - 0.05 0.320 0.288 0.278 0.276

1 — 0.01 0.660 0.716 0.772 0.766

0.2 200 1 — 0.05 0.410 0.428 0.522 0.512
1— 0.01 0.858 0.856 0.858 0.856

500 1 — 0.05 0.548 0.556 0.564 0.554

1— 0.01 0.7040 .792 0.870 0.864

100 1— 0.05 0.4820 .542 0.682 0.666

1 — 0.01 0.762 0.812) 0.880 0.876

0.5 200 1 — 0.05 0.568 0.570) 0.690 0.676
1 -— 0.01 0.9220 .924 0.930 0.926

500 1 — 0.05 0.7320 .744 0 782 0 776

 

 

 

 

 

Table 4.2. Example 2.5.1: Piecewise linear spline bands coverage probabilities

 

noise level

sample size

conﬁdence level 0.99 confidence level 0.95

 

 

 

 

 

100 0.980 (0.990) 0.948 (0.962)
0.2 200 0.994 (0.996) 0.956 (0.978)
500 0.994 (1.000) 0.950 (1.000)
100 0.984 (0.992) 0.956 (0.974)
0.5 200 0.994 (1.000) 0.972 (0.988)
500 0.996 (1.000) 0.978 (1.000)

 

 

 

103

 

Table 4.3. Report of Example 3.6.1

 

component #1

component #2

component #3

 

 

 

 

 

 

 

00 n
lst stage 2nd stage lst stage 2nd stage lst stage 2nd stage

100 0.5 0.1231 0.0461 0.1476 0.0645 0.1254 0.0681

1.0 0.1278 0.0520 0.1404 0.0690 0.1318 0.0726

0‘5 200 0.5 0.0539 0.0125 0.0616 0.0275 0.0577 0.0252
1.0 0.0841 0.0144 0.0839 0.0290 0.0848 0.0285

500 0.5 0.0263 0.0031 0.0306 0.0107 0.0278 0.0102

1.0 0.0595 0.0044 0.0578 0.0115 0.0605 0.0119

1000 0.5 0.0169 0.0015 0.0210 0.0053 0.0178 0.0054

1.0 0.0364 0.0018 0.0367 0.0054 0.0375 0.0059

100 0.5 0.3008 0.0587 0.3298 0.1427 0.3236 0.1393

1.0 0.3088 0.0586 0.3369 0.1364 0.3062 0.1316

1'0 200 0.5 0.1742 0.0256 0.1783 0.0802 0.1892 0.0701
1.0 0.2899 0.0328 0.2830 0.0824 0.3043 0.0721

500 0.5 0.0924 0.0065 0.1124 0.0421 0.1004 0.0345

1.0 0.2299 0.0078 0.2305 0.0458 0.2314 0.0362

1000 0.5 0.0616 0.0033 0.0637 0.0270 0.0646 0.0224

1.0 0.1460 0.0034 0.1433 0.0275 0.1429 0.0219

 

 

 

 

 

 

 

 

 

Table 4.4. The computing time of Example 3.6.1

 

 

 

Method n = 100 n = 200 n = 400 n = 1000

 

MIE 10 76 628 10728

 

SPBK 0.7 0.9 1.2 4.5

 

 

 

 

 

104

Table 4.5. Report of Example 4.4.1

 

 

 

 

 

 

90 n 90 BIAS SD MSE Average MSE
9 59 — 04 0.00825 76 — 05
100 041 (—0.00236) (0.02093) (0.00044) 7e — 05
9 —6e — 04 0.00826 76 — 05 (0.00043)
03 042 (0.00174) (0.02083) (0.00043)
9 —0.00124 0.00383 24 — 05
300 041 (—0.00129) (0.01172) (0.00014) 26 — 05
0 —0.00124 0.00383 2e - 05 (0.00014)
04? (0.00110) (0.01160) (0.00013)
9 0.00121 0.01346 0.00018
100 041 (—0.00137) (0.02257) (0.00051) 0.00018
9 -0.00147 0.01349 0.00018 (0.00051)
05 02 (0.00062) (0.02309) (0.00052)
9 -0.00204 0.00639 46 — 05
300 041 (-—0.00229) (0.01205) (0.00015) 46 — 05
9 0.00197 0.00637 4e - 05 (0.00015)
0:2 (0.00208) (0.01190) (0.00014)

 

 

 

 

 

105

 

Table 4.6. Report of Example 4.4.2

 

 

 

 

 

 

 

 

 

Sample Size n Dimension d Average MSE Time
MAVE SIP MAVE SIP
4 0.00020 0.00018 1.91 0.19
10 0.00031 0.00043 2.17 0.10
50 50 0.00031 0.00043 3.29 0.10
100 0.00681 0.00620 5.94 0.31
200 0.00529 0.00407 27.90 0.49
4 0.00008 0.00008 3.28 0.09
10 0.00012 0.00017 3.93 0.13
100 50 0.00032 0.00127 8.48 0.16
100 —— 0.00395 — 0.44
200 -— 0.00324 — 0.73
4 0.00004 0.00003 5.32 0.17
10 0.00005 0.00007 7.49 0.24
200 50 0.00007 0.00030 15.42 0.24
100 0.00015 0.00061 40.81 0.54
200 — 0.00197 — 1.44
4 0.00002 0.00001 14.44 0.76
10 0.00002 0.00003 24.54 0.79
50 , 0.00002 0.00010 52.93 0.89
500 100 0.00003 0.00012 143.07 0.99
200 0.00004 0.00020 386.80 1.96
400 — 0.00054 —— 4.98
4 0.00001 0.00001 33.57 1.95
10 0.00001 0.00001 62.54 3.64
50 0.00001 0.00003 155.38 2.72
1000 100 0.00001 0.00005 275.73 1.81
200 0.00008 0.00006 2432.56 2.84
400 —— 0.00010 — 9.35

 

 

 

 

106

 

 

 

 

 

 

 

 

 

 

 

 

 

confidence band, n=100, level=0.95
N —-1
o .-
P' -1
N _
I
F l T l l f
o o o 2 0.4 o 6 o 8 1 o
conﬁdence band, n=500, level=0.95
N
o -1
‘T —(
N ._
l
l l l l l I
o o o 2 0.4 o 6 o 8 1 o

 

 

 

Figure 4.1. Example 2.5.1: 95% constant spline conﬁdence bands with opt = 1
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 1, opt = 1, 941(4) (center dotted curve), m(x) = sin(27r:r) (center smooth solid curve).

107

 

 

 

 

 

 

 

 

 

 

 

 

confidence band, n=100, Ievel=0.99
N 0-4
o .4
"l‘ _
“I! _
I I l I I w
0.0 0.2 0.4 0.6 0.8 1.0
confidence band, 11:500. Ievel=o.99
N —
o —1
‘7 ..
N -1
I
I l l I l l
o o o 2 0.4 o 6 0 B 1 o

 

 

 

Figure 4.2. Example 2.5.1: 99% constant spline conﬁdence bands with opt = 1
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 1, opt = 1, 7771(3) (center dotted curve), m(x) = sin(27rz) (center smooth solid curve).

108

 

confidence band, n=100, Ievel=0.95

 

 

 

 

 

 

 

—2
I

 

 

 

 

 

 

Figure 4.3. Example 2.5.1: 95% constant spline conﬁdence bands with opt = 2
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 1, opt = 2, 7711(3) (center dotted curve), m(x) = sin(27rar) (center smooth solid curve).

109

 

confidence band, n=100, Ievel=0.99

 

-1

 

 

 

 

 

 

 

 

 

 

“'1 _
I I I I I I
0.0 0.2 0.4 0.6 0.8 1.0
conﬁdence band, n=500, Ievel=0.99
N ..-
., 1
'T -1
N ..
I
I I I I I 1
0 0 o 2 0.4 0 6 0 6 1 0

 

 

 

Figure 4.4. Example 2.5.1: 99% constant spline conﬁdence bands with opt = 2
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 1, opt = 2, 1721(3) (center dotted curve), m(x) = Sin(21r:1:) (center smooth solid curve).

110

 

confidence band, n=100, Ievel=0.95

 

 

-1

 

 

 

 

 

 

0'1 _

I I I I I I
0.0 0.2 0.4 0.6 0.8 1.0
confidence band, n=500, Ievei=0.95
N .1
o _-
‘T -(
N —i
I

 

 

 

0.0 0.2 0.4 0.6 0.8 1.0

 

 

 

Figure 4.5. Example 2.5.1: 95% linear spline conﬁdence bands with opt = 1
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 2, opt = 1, 942(4) (center dotted curve), m(x) = sin(27r:r) (center smooth solid curve).

111

 

 

 

 

 

 

 

 

 

 

 

 

confidence band, n=100, Ievel=0.99
N _.
o _
'T —4
N _
I
I I 1 I I l
0 0 o 2 0.4 0 6 o 8 1 o
confidence band, n=500, Ievel=0.99
N .—
o —-1
'T ..
N _
I
I I I I I I
0 o 0 2 0.4 o 6 o a 1 o

 

 

 

Figure 4.6. Example 2.5.1: 99% linear spline conﬁdence bands with opt = 1
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 2, opt = 1, 77120:) (center dotted curve), m(x) = sin(27ra;) (center smooth solid curve).

112

 

 

 

 

 

 

 

 

 

 

 

 

confidence band, n=100, Ievel=0.95
N _.
o —
‘T _.
N _
I
I I l' I I I
o o o 2 0.4 o 6 o 8 1 o
confidence band, n=500, Ievel=0.95
N _-
° ‘1
V" _
‘1‘ .4
T I I I I
o o o 2 0.4 o 6 o a 1 o

 

 

 

Figure 4.7. Example 2.5.1: 95% linear spline conﬁdence bands with opt = 2

Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k: = 2, opt = 2, Th2(:r) (center dotted curve), m(x) = sin(27r:r) (center smooth solid curve).

113

 

 

 

 

 

 

 

 

 

 

 

 

confidence band, n=1oo, Ievel=o.99
m —1
o ...
"l‘ _
N _
I
I I I I I I
o o o 2 0.4 o 6 o a 1 o
confidence band, n=500, level=0.99
N .—
o _
‘7 _
N _
I
F I I I I I
o o o 2 0.4 o 6 o a 1 o

 

 

 

Figure 4.8. Example 2.5.1: 99% linear spline confidence bands with opt = 2
Note: conﬁdence bands (upper and lower dashed curves) computed from (2.4.5) with

k = 2, opt = 2, Th2(:c) (center dotted curve), m(x) = sin(27r$) (center smooth solid curve).

114

 

 

u(t)

 

 

V0)

 

Figure 4.9. Example 2.5.2: Plot of the EKC in terms of u(t) and v(t)

 

 

Plot of the Trend Relationship of the EKC for US

 

-1.2 -1.0

-1.4

Trend at log (Emission per capita)
4.8 1 6

-2.0
i

 

 

 

I I I I I I

3.4 3.6 3.8 4.0 4.2 4.4
Trend at log (GDP per capita)

 

Conﬁdence Band of the Residual Relationship of the
EKC for US with confidence levei=0.80

 

 

 

Residual of log (Emission per capita)
0 0

 

 

 

 

I I I I I I I
-0.20 -O.15 -O.10 -0.05 0.0 0.05 0.10

Residual oi log (GDP per capita)

 

 

 

Figure 4.10. Example 2.5.2: Trend and noise analysis of US

Note: linear fit (solid), zero ﬁt (dotted dashed) and spline ﬁt (dashed) with 80% bands
(upper and lower solid).

116

 

Plot of the Trend Relationship of the EKC for Japan

 

 

 

 

 

 

 

 

 

 

«3 q
.5
i q, _
a
.5
.8
e
9}, rs .-
u 0
g
e
U
c
2
:- op 4
O": .-
I I I I
7 s 9 1o
Trend of log (GDP per capita)
Conﬁdence Band of the Residual Relationship of the EKC
for Japan with confidence levei=0.99
2
s. -
A o Q
g o. .‘ o . ....... f".
g N e 0..-]..f’. .
a o' ' ' ”ff
S )l’.’ o
'2 .1 ’
'g o _ .- - - -.-.-.-.-._ -..._._,_.-A_.,. .. L -.-.-.-._.-.-.-.-._._.--.-.- -.-.-- ...- --.- -.-
L1". 0
2a ; ' ‘ '
3 N _ 0 ‘/ ' 0
§ 9 ls! .
:9 e /
8 l../
I Q. I" ’l "' o
9 “ ° ,r .1
”‘1
‘0. O
(.3 _
r ‘ I I I 1 I
-o.2 -o.1 0.0 0.1 0.2 0.3
Residual of log (GDP per capita)

 

 

 

Figure 4.11. Example 2.5.2: Trend and noise analysis of Japan

Note: linear ﬁt (solid), zero ﬁt (dotted dashed) and spline ﬁt (dashed) with 99% bands
(upper and lower solid).

117

 

 

 

 

 

 

 

 

Estimation of component #1, n = 200
N- h
3“ _____________________ ,
c _ """""""" ....—
—"""" ..................
""""""""""" _
q.— i-
2 I 0 | 2

 

 

 

 

 

 

 

 

 

 

Estimation of component #1, n = 500
:L r
....1 -
3. _
t ...........................................................................
C ....................................... ‘5‘-
------------------------------------ J
3. 1.
qt. ..
-2 0 2

 

 

 

 

 

 

Figure 4.12. Example 3.6.1: SPBK estimator with conﬁdence intervals for the ﬁrst component

Note: oracle estimator (dotted), SPBK estimator (solid) and 95% pointwise conﬁdence
intervals constructed by (3.2.13) (thin dashed) of the ﬁrst component (smooth solid curve).

118

 

 

Estimation of component #2, n = 200

 

------------

 

 

 

 

 

 

 

 

 

 

 

 

 

-4
H
q

 

 

 

 

 

 

Figure 4.13. Example 3.6.1: SPBK estimator with conﬁdence intervals for the second component

Note: oracle estimator (dotted), SPBK estimator (solid) and 95% pointwise conﬁdence
intervals constructed by (3.2.13) (thin dashed) of the second component (smooth solid).

119

 

 

Estimation of component #3, n = 200

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.14. Example 3.6.1: SPBK estimator with conﬁdence intervals for the third component

Note: oracle estimator (dotted), SPBK estimator (solid) and 95% pointwise conﬁdence
intervals constructed by (3.2.13) (thin dashed) of the third component (smooth solid).

120

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Efficiency density of component #2
(V) -1
z-
.a N _t
C
0
‘O
o _.
r
0.0
Efficiency density of component #3
I ..... n-1m
: —— n=200
I """'" "8500
: -— n=1ooo
:0 ~ I
>‘ I
’25 N r '
I:
d!
‘0
'- '1
l
O _. I
I I I I
0.0 0.5 1.0 1.5

 

 

 

Figure 4.15. Example 3.6.1: Plot of the relative efﬁciencies of components 2 and 3
Note: the empirical efﬁciencies of ﬁt; (3:0) to in; (ma) computed by (3.6.1) based on 100

replications, a = 2, 3.

121

 

 

 

 

 

   

 

 

 

 

 

 

 

 

   
 

 

 

 

Efficiency density of component #1

..... n‘soo

"‘ é -— n=1000

: --- n=1500

I — n-2000
.. — E
0 - I
.>.~ .
's .
c I
o I
u I
N — i
o _. I

I I I I

0.0 0.5 1.0 1.5

Efficiency density of component #2

. ..... "35m

“‘ ’4 ; — n=1ooo

. -- n=1500

: -- n=2000
‘* - 2
o -4 l
.2: i
to I
C I
d) I
D I
u — I
O —. I

I I I I

0.0 0.5 1.0 1.5

 

 

 

Figure 4.16. Example 3.6.2: Plot of the relative efﬁciencies of components 1 and 2
Note: the empirical efﬁciencies of m; (1:0,) to in; (1:0,) computed by (3.6.1) based on 100

replications, a = 1, 2.

122

 

 

 

 

 

 

  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Efficiency density of component #15
..... n:

m .1 : —- n=1000

I _‘ "31500

; — na2000
" r I
m - 2
.é‘ .
In I
C I
o I
u I
N - l
O _ I

I I I I

0.0 0.5 1.0 1.5

Efficiency density of component #30
_ ----- n=500

m ; -—- n-IOOO

I — "BISOO

: — n=2000
v J :
to ~ I
2‘ .
“a .
C I
O I
u l
N — 1
o — 1

I I I I

0.0 0.5 1.0 15

 

 

 

Figure 4.17. Example 3.6.2: Plot of the relative efﬁciencies of components 15 and 30
Note: the empirical efﬁciencies of ﬁt; (3:0,) to in; (1:0) computed by (3.6.1) based on 100

replications, a = 15, 30.

123

 

   

5-..-- 3

    

D ..
‘ ' ' III
- d . I,
. . ' {dd/”ll
.' . :r’r'I,’
.
I. g I,
0* . .
_ _ -..
I .' '
... ."-.,_.-¢ ‘-.._
t I ‘I_ . ' O .I u
.

 

 

 

 

     
    
   

s . .
. . .' - . ‘ _ ‘ .
. . . P
. , .
. . ' . - . . . . , ‘.- .
. I . ' . _
. : _ ; _ -
. : x i -
. ~ f: .
.' . - j .
' l n "‘ ’ .
. ‘ I "'( " '
. - I 1 I I .
. . g ,l I I ,t, .
_' . .1 ‘ I ' I I’l' ,‘I 4:, {‘7' _
. . . - a ;.;a,';,;,,-’ ,
. _ . ""1","a' ‘

.'.GIODICII-

 

 

 

 

Note: the

Figure 4.18. Example 4.4.1: The actual bivariate surface

actual surface m in model (4.4.1) with respect to 6 = 0, l.

124

 

 

 

 

 

 

 

 

 

 

 

.
o o f
o 1"
ID -' O
_’.l"\. . l .
l '\
r '0 o x . . '
I C ‘, .‘
v - .‘ ' I ’
I. ‘.
i \0 i.
I . "\ .
1" .0 \.
'~ .-.
0') -' ‘ . O
’5
_I
.I
N '1 1’
I.
I
Q
0.
0 I
'- - o i
I.
I
1'.
...‘
C
_. I
o l‘
(C
or .
I I I I
-1 0 ‘l 2

 

 

 

Figure 4.19. Example 4.4.1: The univariate approximation to the bivariate surface

Note: function 9 (solid curve); estimate of g (dotted curve) by 90; estimate of 9 (dashed
curve) by (i = (0.69016, 0.72365)T for 6 = 0 and (0.72186, 0.69204)T for 6 = 1.

125

 

n: 100, d: 10

 

1.0

0.5

0.0

-1.0

 

 

 

 

 

“=100, (1:50

 

1.0

0.5

0.0

 

-1.0

 

 

 

 

 

 

Figure 4.20. Example 4.4.2: The univariate approximation ((1 = 10, 50)
Note: estimate of g with 9 (dotted curve), estimate of g with 00 (dashed curve), true

function m (x) in (4.4.2) (solid curve).

126

 

n= 100, d: 100

 

 

 

 

 

 

n= 100, d: 200

 

 

 

 

 

 

 

 

Figure 4.21. Example 4.4.2: The univariate approximation ((1 = 100, 200)
Note: estimate of g with 9 (dotted curve), estimator of g with 00 (dashed curve), the true

function m(x) in (4.4.2) (solid curve).

127

 

Density Estimation, d=10

 

 

 

 

 

 

 

@—
I

I

I

I

I

I

I

(VJ-1‘
I

I

I

I

I

> I
35 I
C N“
o I
o I
I

I

I

I

I

v--"
I

I

I

I

I

I

I

0-1
I

I I T I I I 1
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

 

Density Estimation, d=50

 

 

 

 

 

 

v a .
1"
I
I
I
I
I
l
m— I
1
I
l
I
1
>~ I
.0: ,
"c' N -'
Q I
Q I
I
l
I
I
I
v- '1'
I
1
I
I
l
1
0.1L . . . ,
l l l I I I I I

 

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

 

 

 

Figure 4.22. Example 4.4.2: Kernel density plots of the error norms

Note: the kernel density estimators of "0 — Boll/x/d are based on 100 replications.

128

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Density Estimation, (1:100
w 4
.. — a
3. :
‘3 N — .
q, I
0 1
I I I I I I I I
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
Density Estimation, (1:200
#- —I
t;-
m _ 2
E 2
E”: N — -
m I
0 i
o ..--i--..-...-- ----.- ,_
I I I I I I I I
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

 

 

 

Figure 4.23. Example 4.4.2: Kernel density plots of the error norms

Note: the kernel density estimators of "0 — Boll/ﬂ are based on 100 replications.

129

 

 

 

flow
20 40 60 80100120140
1

 

 

 

 

 

 

 

 

 

 

 

800 1000
days
0—1
1' [I I‘ll
. I‘
0" I
0. II I l i
g \I I ‘ ‘ l l
O 1
'7‘ ’I
0
01-1
I
I I I I I I
0 200 400 600 800 1000
days
8—1
0.
(D
E o
s v‘

 

 

O-
N

0 200 400 600 800 1000

 

 

 

.4
.4
_1

 

 

 

Figure 4.24. Time plots of the daily river ﬂow data

Note: the ﬁrst, second and third are ﬂow (solid) with trend (dashed), temperature (solid)

with trend (dashed line) and precipitation(solid) with trend (dashed) respectively.

130

 

 

150

100
1

50
1

 

 

 

0 200 400 600 800 1 000

 

 

60

residual
0 20
j J

-20
1

 

 

 

0 200 400 600 800 1 000

days

 

 

150
1

ﬂow
1 00
A

50
l

 

 

 

800 900 1000 1100

days

 

 

 

Figure 4.25. The ﬁtted, residual and forecast plots of the river ﬂow data

Note: the ﬁrst is the river flow data (“+”) with the SIP fitted values (line); the second is

the residual plot; the third is the out-of-sample rolling forecasts (line) for the third year.

131

BIBLIOGRAPHY

[1] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of
density function estimates. Ann. Statist. 1 1071—1095.

[2] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. New York: Springer.

[3] Carroll, R., Fan, J ., Gijblos, I. and Wand, M. P. (1997). Generalized partially linear
single-index models. J. Amer. Statist. Assoc. 92 477-489.

[4] Chen, H. (1991). Estimation of a projection -persuit type regression model. Ann. Statist.
19 142-157.

[5] Chen, R. and Tsay, R. S. (1993). Nonlinear additive ARX models. J. Amer. Statist.
Assoc. 88 956—967.

[6] Chen, R., Yang, L. and Hafner, C. (2004). Nonparametric multi-step ahead prediction
in time series analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 669-686.

[7] Claeskens, G. and VanKeilegom, I. (2003). Bootstrap conﬁdence bands for regression
curves and their derivatives. Ann. Statist. 31 1852-1884.

[8] de Boor, C. (2001). A Practical Guide to Splines. New York: Springer.

[9] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation: Polynomials
and Splines Approximation. Springer-Verlag, Berlin.

[10] Doukhan, P. (1994). Mixing: Properties and Examples. Springer-Verlag, New York.

[11] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications, London:
Chapman and Hall.

[12] Fan, J. and Jiang, J. (2005). Nonparametric inference for additive models. J. Amer.
Statist. Assoc. 100 890—907.

[13] Fan, J., Hardle, W. and Mammen, E. (1998). Direct estimation of low-dimensional
components in additive models. Ann. Statist. 26 943-971.

[14] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric
Methods. New York: Springer.

[15] Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist.
Assoc. 76 817-823.

132

[16] Hall, P. (1989). On projection pursuit regression. Ann. Statist. 17 573-588.

[17] Hall, P. and Titterington, D. M. (1988). On conﬁdence bands in nonparametric density
estimation and regression. J. Multivariate Anal. 27 228-254.

[18] Hardle, W. (1989). Asymptotic maximal deviation of M-smoothers. J. Multivariate
Anal. 29 163-179.

[19] Hardle, W. (1990). Applied Nonparametric Regression. Cambridge University Press,
Cambridge.

[20] Hardle, W. and Hall, P. and Ichimura, H. (1993). Optimal smoothing in single—index
models. Ann. Statist. 21 157-178.

[21] Hardle, W. , Hlavka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer-
Verlag, Berlin.

[22] Hardle, W., Marron, J. S. and Yang, L. (1997). Discussion of “Polynomial splines and
their tensor products in extended linear modeling” by Stone et. al. Ann. Statist. 25
1443-1450.

[23] Hardle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by the
method of average derivatives. J. Amer. Statist. Assoc. 84 986-995.

[24] Hastie, T. J. and Tibshirani, R. J. ( 1990). Generalized Additive Models. London: Chap—
man and Hall.

[25] Hengartner, N. W. and Sperlich, S. (2005). Rate optimal estimation with the integration
method in the presence of many covariates. J. Multivariate Anal. 95 246-272.

[26] Horowitz, J. L. and Hardle, W. (1996). Direct semiparametric estimation of single-
index models with discrete covariates. J. Amer. Statist. Assoc. 91 1632-1640.

[27] Horowitz, J. and Mammen, E. (2004). Nonparametric estimation of an additive model
with a link function. Ann. Statist. 32 2412-2443.

[28] Horowitz, J. Klemela, J. and Mammen, E. (2006). Optimal estimation in additive
regression. Bernoulli 12 271-298.

[29] Hristache, M., Juditski, A. and Spokoiny, V. (2001). Direct estimation of the index
coeffcients in a single-index model. Ann. Statist. 29 595-623.

[30] Huang, J. Z. (1998). Projection estimation in multiple regression with application to
functional ANOVA models. Ann. Statist. 26 242-272.

[31] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist.
31 1600-1635.

[32] Huang, J. and Yang, L. (2004). Identification of nonlinear additive autoregressive mod-
els. J. R. Stat. Soc. Ser. B Stat. Methodol. 66, 463-477.

133

[33] Huber, P. J. (1985). Projection pursuit (with discussion). Ann. Statist. 13 435-525.

[34] Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation
of single-index models Journal of Econometrics 58 71-120.

[35] Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis.
New Jersey: Prentice Hall.

[36] Klein, R. W. and Spady. R. H. (1993). An efﬁcient semiparametric estimator for binary
response models. Econometrica 61 387-421.

[37] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Prop-
erties of Random Sequences and Processes. New York: Springer.

[38] Lefohn, A. S., Husar, J. D. and Husar, R. B. (1999). Estimating historical anthropogenic
global sulfur emission patterns for the period 1850-1990. Atmospheric Environment. 33
3435-3444.

[39] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured
nonparametric regression based on marginal integration. Biometrika 82 93-101.

[40] Linton, O. B. and Hardle, W. (1996). Estimating additive regression models with known
links. Biometrika 83 529-540.

[41] Linton, O. B. (1997). Efﬁcient estimation of additive nonparametric regression models.
Biometrika 84 469—473.

[42] Maddison, A. (2003). The World Economy: Historical Statistics. Paris: OECD.

[43] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic prop-
erties of a backﬁtting projection algorithm under weak conditions. Ann. Statist. 27
1443-1490.

[44] Muller, H. G., Stadtmiiller, U. and Schmitt, T. (1987). Bandwidth choice and conﬁ-
dence intervals for derivatives of noisy data. Biometrika 74 743-749.

[45] Neumann, M. H. (1995). Automatic bandwidth choice and conﬁdence intervals in non-
parametric regression. Ann. Statist. 23 1937-1959.

[46] Neumann, M. H. (1997). Pointwise conﬁdence intervals in nonparametric regression
with heteroscedastic error structure. Statistics 29 1-36.

[47] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local
polynomial regression. Ann. Statist. 25 186-211.

[48] Pham, D. T. (1986). The mixing properties of bilinear and generalized random coefﬁ-
cient autoregressive models. Stochastic Anal. Appl. 23 291-300.

[49] Powell, J. L., Stock, J. H. and Stoker, T. M. (1989). Semiparametric estimation of
index coefﬁcients. Econometrica. 57 1403-1430.

134

[50] Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Ser. Anal.
4 I85—207.

[51] Rosenblatt, M. (1976). On the maximal deviation of k-dimensional density estimates.
Ann. Probab. 4 1009-1015.

[52] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London:
Chapman and Hall.

[53] Sperlich, S., Tjostheim, D. and Yang, L. (2002). Nonparametric estimation and testing
of interaction in additive models. Econometric Theory 18 197-251.

[54] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist.
13 689-705.

[55] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multi-
variate function estimation. Ann. Statist. 22 118-184.

[56] Sunklodas, J. (1984). On the rate of convergence in the central limit theorem for
strongly mixing random variables. Lithuanian Math. J. 24 182-190.

[57] Tjostheim, D. and Auestad, B. (1994). Nonparametric identiﬁcation of nonlinear time
series: projections. Amer. Statist. 89 1398-1409.

[58] Tong, H. (1990). Nonlinear Time Series: A Dynamical System Approach. Oxford, U.K.:
Oxford University Press.

[59] Tong, H., Thanoon, B. and Gudmundsson, G. (1985). Threshold time series modeling
of two icelandic riverﬂow systems. Time Series Analysis in Water Resources. ed. K. W.
Hipel, American Water Research Association.

[60] 'I‘usnady, G. (1977). A remark on the approximation of the sample (if in the multidi-
mensional case. Period. Math. Hungar. 8 53-55.

[61] Wang, L. and Yang, L. (2007). Spline-backﬁtted kernel smoothing of nonlinear additive
autoregression model. Ann. Statist. Forthcoming.

[62] Xia, Y. (1998). Bias-corrected conﬁdence bands in nonparametric regression. J. R. Stat.
Soc. Ser. B Stat. Methodol. 60 797-811.

[63] Xia, Y. and Li, W. K. (1999). On single-index coefﬁcient regression models. J. Amer.
Statist. Assoc. 94 1275-1285.

[64] Xia, Y., Li, W. K., Tang, H. and Zhang, D. (2004). A goodness—of—ﬁt test for single-
index models. Statist. Sinica. 14 1-39.

[65] Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension
reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410.

135

[66] Xue, L. and Yang, L. (2006 a). Estimation of semiparametric additive coefﬁcient model.
J. Statist. Plann. Inference 136 2506—2534.

[67] Xue, L. and Yang, L. (2006 b). Additive coefﬁcient modeling via polynomial spline.
Statistica Sinica 16 1423-1446.

[68] Yang, L., Hardle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with
multiplicative volatility and additive mean. J. Time Ser. Anal. 20 579-604.

[69] Yang, L., Sperlich, S. and Hardle, W. (2003). Derivative estimation and testing in
generalized additive models. J. Statist. Plann. Inference 115 521-542.

[70] Zhang, F. (1999). Matrix Theory: Basic Results and Techniques. New York: Springer.

[71] Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics of regression splines and
conﬁdence regions. Ann. Statist. 26 17601782.

136

   

I“[1311];[1211811113131