The Application of B-Spline Smoothing: Confidence Bands and Additive Modelling

By Jing Wang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2006

ABSTRACT

The Application of B-Spline Smoothing: Confidence Bands and Additive Modelling

By Jing Wang

Asymptotically exact and conservative confidence bands are obtained for the nonparametric regression function, based on constant and linear polynomial spline estimation, respectively. Compared to the pointwise nonparametric confidence interval of Huang (2003), the confidence bands are inflated only by a factor of $\{\log(n)\}^{1/2}$, similar to the Nadaraya-Watson confidence bands of Härdle (1989) and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments provide strong evidence that corroborates the asymptotic theory.

A great deal of effort has been devoted to the inference of additive models in the last decade. Among the many existing procedures, the kernel-type estimators are too costly to implement for a large number of variables or for large sample sizes, while the spline-type estimators provide no asymptotic distribution or any measure of uniform accuracy. We propose a synthetic estimator of the component function in an additive regression model, using one-step backfitting, with spline smoothing in the first stage and kernel smoothing in the second stage. Under very mild conditions, the proposed SBK estimator of the component function is asymptotically equivalent to an ordinary univariate Nadaraya-Watson estimator; hence the dimension is effectively reduced to one at any point. This dimension reduction holds uniformly over an interval under the stronger assumption of normal errors, and asymptotic simultaneous confidence bands are provided for the component functions. Monte Carlo evidence supports the asymptotic results for dimensions ranging from low to very high, and sample sizes ranging from moderate to large. The proposed simultaneous confidence bands are applied to the Boston housing data for linearity diagnosis.

Phenological information reflecting seasonal changes in vegetation is an important input variable in climate models such as the Regional Atmospheric Modeling System (RAMS). It varies not only among different vegetation types but also with geographic location (latitude and longitude). In the current version of RAMS, phenology is treated as a simple sine function that is solely related to the day of year and latitude, in spite of major seasonal variability in precipitation and temperature. In short, the sine curves of phenology are far different from the observed curves. Via linear spline smoothing we developed more realistic phenological functions of all land covers in East Africa, based on remote sensing observations, to improve the RAMS model. In addition, we quantify the differences between the RAMS default phenological curves and the linear spline estimates derived from remote sensing observations.

© 2006 Jing Wang

All Rights Reserved

To my grandma, my parents, and Yuming

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor Professor Lijian Yang.
He is always willing to answer all kinds of questions with great patience and to share his profound insights with me. I deeply appreciate his constant encouragement and support during my research and job search. With enduring enthusiasm and dedication to academia and thoughtful attention to his students, Professor Yang sets an example of an excellent faculty member.

I am truly grateful to Professors Jiaguo Qi, R. V. Ramamoorthi and Yijun Zuo for taking time to serve on my dissertation committee. Especially, I would like to thank Professor Qi and the CLIP group for providing me financial support and sharing their knowledge with me on the project. I really appreciate Professors Dennis Gilliland and Connie Page for their guidance during my two years at CSTAT, where I obtained valuable experience in consulting service. My special thanks go to Professor James Stapleton for continuous help and encouragement from the very beginning. I also want to thank Professor Vince Melfi, Cathy Sparks and Laurie Secord for their assistance, and I thank all the professors and friends who helped me at MSU over five years.

This dissertation research has been supported in part by NSF grants DMS 0405330 and BCS 0308420.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

1 Introduction
  1.1 Introduction
  1.2 Confidence Bands
  1.3 Additive Component Estimation
  1.4 Application to Seasonality Analysis

2 Spline Confidence Bands
  2.1 Introduction
  2.2 Main Results
  2.3 Error Decomposition
  2.4 Implementation
    2.4.1 Implementing Exact Bands
    2.4.2 Implementing Conservative Bands
  2.5 Simulation and Examples
    2.5.1 Simulation
    2.5.2 Fossil Example
  2.6 Conclusions
  2.7 Proof of Theorems
    2.7.1 Preliminaries
    2.7.2 Proof of Theorem 1
    2.7.3 Preliminaries for Theorem 2
    2.7.4 Proof of Theorem 2

3 Spline-Backfitted Kernel Regression
  3.1 Introduction
  3.2 SBK and SBLL Estimators
  3.3 Decomposition
  3.4 Simulation and Examples
    3.4.1 Simulation
    3.4.2 Boston Housing Example
  3.5 Conclusions
  3.6 Proof of Theorems
    3.6.1 Variance Reduction
    3.6.2 Bias Reduction
    3.6.3 Technical Lemmas

4 Application to Seasonality Analysis
  4.1 Introduction
  4.2 Method
    4.2.1 Study Area and Data Description
    4.2.2 Polynomial Spline Regression
    4.2.3 Spline Fitting for LAI by LULC Type
  4.3 Results
    4.3.1 Land Cover Phenologies
    4.3.2 Sensitivity and Uncertainty
    4.3.3 Phenological Functions of Land Cover
    4.3.4 Implications
  4.4 Conclusions

BIBLIOGRAPHY

LIST OF TABLES

4.1 Coverage probabilities of constant spline bands.
4.2 Coverage probabilities of linear spline bands.
4.3 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$ for $d = 4, 10$.
4.4 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$ for $d = 50$.
4.5 Coefficients table for Deciduous Shrubland with Sparse Trees.
4.6 Coefficients table for Deciduous Woodland.
4.7 Coefficients table for Open to Very Open Trees.
4.8 Coefficients table for Rainfed Herbaceous Crop.

LIST OF FIGURES

4.1 Constant spline confidence bands with opt = 1.
4.2 Constant spline confidence bands with opt = 2.
4.3 Linear spline confidence bands with opt = 1.
4.4 Linear spline confidence bands with opt = 2.
4.5 Testing $H_0: m(x) = \sum_{k=0}^{d} a_k x^k$, $d = 2, 3, 5, 6$, for the fossil data.
4.6 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 4$.
4.7 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 10$.
4.8 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 50$, $\alpha = 1, 10$.
4.9 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 50$, $\alpha = 19, 50$.
4.10 Linearity test for the Boston housing data.
4.11 LAI trend of rainfed herbaceous crops.
4.12 LAI trend of open to very open trees.
4.13 Spline confidence bands of LAI of deciduous woodland.
4.14 Spline confidence bands and RAMS curves of LAI of deciduous shrubland.
4.15 Spline confidence bands and RAMS curves of LAI of rainfed herbaceous crop.
4.16 Spline confidence bands and RAMS curves of LAI of open to very open trees.
4.17 Improved representation of land surface in RAMS.

CHAPTER 1

Introduction

1.1 Introduction

For the past three decades, nonparametric regression has been widely used in many statistical applications, from biostatistics to econometrics, from engineering to geography. This is due to its flexibility in modelling complex relationships among variables by "letting the data speak for themselves". To fix ideas, we begin with the univariate regression model. Assume that observations $\{(X_i, Y_i)\}_{i=1}^n$ and unobserved errors $\{\varepsilon_i\}_{i=1}^n$ are i.i.d. copies of $(X, Y, \varepsilon)$ satisfying the regression model

$$Y = m(X) + \sigma(X)\,\varepsilon. \qquad (1.1)$$

The unknown mean and standard deviation functions $m(x)$ and $\sigma(x)$, defined on a compact interval $[a,b]$, need not be of any specific form. Two popular nonparametric smoothing techniques are local polynomial/kernel smoothing and polynomial splines. The kernel type estimators are "local", treated comprehensively in Fan and Gijbels (1996) and Härdle (1990). The polynomial spline estimators, on the other hand, are global; see Stone (1985, 1994) and Huang (2003).

The fidelity of a nonparametric regressor is measured in terms of its rate of convergence to the unknown regression function. The convergence rate can be pointwise or uniform. For kernel type estimators, rates of convergence of these types have been established by Mack and Silverman (1982) and Fan and Gijbels (1996). For kernel smoothing of the univariate regression function, Hall and Titterington (1988), Härdle (1989), and Xia (1998) made significant contributions on confidence bands. All of these are based on strong approximation of certain empirical processes by the 2-dimensional Brownian bridge, as in Tusnady (1977), which is the same idea used in Bickel and Rosenblatt (1973) for the confidence band of a probability density function.
More recently, Claeskens and Van Keilegom (2003) improved upon Xia (1998) by using a smoothed bootstrap, and by extending the confidence band to derivatives of the regression function. Härdle, Huet, Mammen and Sperlich (2004) introduced bootstrap bands with corrected bias. For polynomial splines, least squares rates of convergence have been obtained by Stone (1985, 1994), while pointwise convergence rates and asymptotic distributions have recently been established in Huang (2003). A confidence band for polynomial spline regression, however, remains unavailable except under the strong restriction of homoscedastic normal errors; see Zhou, Shen and Wolfe (1998). Since confidence bands are one of the most important tools for model diagnosis, in other words for testing the validity of a parametric model, confidence bands for the heteroscedastic model are in great demand because of their generality.

1.2 Confidence Bands

An asymptotically exact (conservative) $100(1-\alpha)\%$ confidence band for the unknown $m(x)$ over the interval $[a,b]$ consists of an estimator $\hat m(x)$ of $m(x)$ and lower and upper confidence limits $\hat m(x) - l_n(x)$, $\hat m(x) + l_n(x)$ at every $x \in [a,b]$ such that

$$\lim_{n\to\infty} P\left\{ m(x) \in \hat m(x) \pm l_n(x),\ \forall x \in [a,b] \right\} = 1 - \alpha, \quad \text{exact},$$

$$\liminf_{n\to\infty} P\left\{ m(x) \in \hat m(x) \pm l_n(x),\ \forall x \in [a,b] \right\} \ge 1 - \alpha, \quad \text{conservative}.$$

Confidence bands of kernel type estimators are computationally intensive since a least squares estimation has to be done at every point. In contrast, it is enough to solve only one least squares problem to obtain the polynomial spline estimator. The greatest advantages of polynomial spline estimation are its simplicity of implementation and fast computation. But so far the asymptotic theory of spline smoothing is not as complete as that of the kernel type.

To introduce the spline functions, divide the finite interval $[a,b]$ into $(N+1)$ subintervals $J_j = [t_j, t_{j+1})$, $j = 0, \ldots, N-1$, $J_N = [t_N, b]$. A sequence of equally-spaced points $\{t_j\}_{j=1}^N$, called interior knots, is given as

$$t_0 = a < t_1 < \cdots < t_N < b = t_{N+1}, \quad t_j = a + jh, \; h = \frac{b-a}{N+1}.$$

For $p = 1, 2$, let $\{B_{j,p}(x)\}_{j=1-p}^N$ denote the rescaled B-spline basis of the space $G^{(p-2)}$ of splines of degree $p-1$ (piecewise constant for $p=1$, piecewise linear for $p=2$), as defined in (2.2.4), with $j(x)$ the index of the subinterval containing $x$ as in (2.2.2), $c_{j,n}$ and $d_{j,n}$ as in (2.2.3), and $\hat m_p(x)$ the least squares spline estimator in (2.2.1). The theoretical and empirical inner products of functions $\phi, \varphi$ are

$$\langle \phi, \varphi \rangle = \int_a^b \phi(x)\varphi(x) f(x)\, dx = E\{\phi(X)\varphi(X)\}, \quad \langle \phi, \varphi \rangle_n = \frac1n \sum_{i=1}^n \phi(X_i)\varphi(X_i), \qquad (2.2.5)$$

with corresponding norms $\|\phi\|_2^2 = \langle\phi,\phi\rangle$ and $\|\phi\|_{2,n}^2 = \langle\phi,\phi\rangle_n$. The inner product matrix $V$ of the B-spline basis $\{B_{j,2}(x)\}_{j=-1}^N$ is denoted as

$$V = (v_{j,j'})_{j,j'=-1}^{N} = \left( \langle B_{j,2}, B_{j',2} \rangle \right)_{j,j'=-1}^{N}, \qquad (2.2.6)$$

whose inverse $S$ and the $2\times2$ diagonal submatrices of $S$ are expressed as

$$S = (s_{j,j'})_{j,j'=-1}^{N} = V^{-1}, \quad S_j = \begin{pmatrix} s_{j-1,j-1} & s_{j-1,j} \\ s_{j,j-1} & s_{j,j} \end{pmatrix}, \quad j = 0, \ldots, N. \qquad (2.2.7)$$

Next define matrices $\Sigma$, $A(x)$ and $\Xi_j$ as

$$\Sigma = (\sigma_{j,j'})_{j,j'=-1}^{N} = \left\{ \int \sigma^2(v)\, B_{j,2}(v)\, B_{j',2}(v)\, f(v)\, dv \right\}_{j,j'=-1}^{N}, \qquad (2.2.8)$$

$$A(x) = \left\{ c_{j(x)-1} B_{j(x)-1,2}(x),\; c_{j(x)} B_{j(x),2}(x) \right\}^T, \quad c_j = \begin{cases} \sqrt2, & j = -1, N, \\ 1, & j = 0, \ldots, N-1, \end{cases}$$

$$\Xi_j = \begin{pmatrix} l_{j+1,j+1} & l_{j+1,j+2} \\ l_{j+2,j+1} & l_{j+2,j+2} \end{pmatrix}, \quad j = 0, 1, \ldots, N, \qquad (2.2.9)$$

with the terms $l_{i,k}$, $|i-k| \le 1$, defined through the following matrix inversion:

$$M_{N+2} = \begin{pmatrix} 1 & \sqrt2/4 & & & & 0 \\ \sqrt2/4 & 1 & 1/4 & & & \\ & 1/4 & 1 & \ddots & & \\ & & \ddots & \ddots & 1/4 & \\ & & & 1/4 & 1 & \sqrt2/4 \\ 0 & & & & \sqrt2/4 & 1 \end{pmatrix}_{(N+2)\times(N+2)}, \quad M_{N+2}^{-1} = (l_{i,k})_{(N+2)\times(N+2)}, \qquad (2.2.10)$$

and computed via (2.4.14), (2.4.17) and (2.4.18). We now define

$$\sigma_{n,1}^2(x) = \frac{\int_{J_{j(x)}} \sigma^2(v) f(v)\, dv}{n\, c_{j(x),n}^2}, \qquad \sigma_{n,2}^2(x) = \frac1n \sum_{j,j',l,l'=-1}^{N} B_{j,2}(x)\, B_{j',2}(x)\, s_{j,l}\, s_{j',l'}\, \sigma_{l,l'}, \qquad (2.2.11)$$

with $j(x)$ defined in (2.2.2), $c_{j,n}$ in (2.2.3), $B_{j,2}(x)$ in (2.2.4), and $s_{j,l}$ and $\sigma_{l,l'}$ in (2.2.7) and (2.2.8). These $\sigma_{n,p}^2(x)$ are shown in Section 2.7 to be the pointwise variance functions of the noise terms $\tilde\varepsilon_p(x)$, $p = 1, 2$. A small numerical sketch of the matrix ingredients $M_{N+2}$ and $\Xi_j$ is given below.
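The matrix $M_{N+2}$ in (2.2.10) is fully specified and distribution-free, so its inverse and the blocks $\Xi_j$ of (2.2.9) can be computed directly. The following minimal sketch (Python with numpy; the thesis code was written in XploRe, and all names here are ours) builds $M_{N+2}$ and extracts the $\Xi_j$ by brute-force inversion rather than via the closed forms (2.4.14)-(2.4.18):

```python
import numpy as np

def m_matrix(N):
    """Tridiagonal matrix M_{N+2} of (2.2.10): unit diagonal, off-diagonal
    entries 1/4, except sqrt(2)/4 in the first and last off-diagonal slots."""
    M = np.eye(N + 2)
    off = np.full(N + 1, 0.25)
    off[0] = off[-1] = np.sqrt(2) / 4
    return M + np.diag(off, 1) + np.diag(off, -1)

def xi_blocks(N):
    """2x2 submatrices Xi_j, j = 0..N, of (2.2.9), cut from M^{-1} = (l_ik);
    indices j+1, j+2 in the text are 1-based, hence the 0-based slice below."""
    L = np.linalg.inv(m_matrix(N))
    return [L[j:j + 2, j:j + 2] for j in range(N + 1)]

if __name__ == "__main__":
    print(xi_blocks(10)[0])  # Xi_0 for N = 10 interior knots
```

The closed-form entries of (2.4.14)-(2.4.18) below should agree with this direct inversion, which makes the sketch a convenient numerical check.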
We now state our main results in the next two theorems.

Theorem 1. Under Assumptions (AC1)-(AC4), if $p = 1$, then an asymptotic $100(1-\alpha)\%$ exact confidence band for $m(x)$ over the interval $[a,b]$ is

$$\hat m_1(x) \pm \sigma_{n,1}(x)\, \{2\log(N+1)\}^{1/2}\, d_n, \qquad (2.2.12)$$

in which $\sigma_{n,1}(x)$ is given in (2.2.11) and can be replaced by $\sigma(x)\{f(x)\,nh\}^{-1/2}$, according to (2.7.7) in Lemma 2.7.4, and

$$d_n = 1 - \{2\log(N+1)\}^{-1} \left[ \log\left\{ -\tfrac12 \log(1-\alpha) \right\} + \tfrac12 \{\log\log(N+1) + \log 4\pi\} \right]. \qquad (2.2.13)$$

Theorem 2. Under Assumptions (AC1)-(AC4), if $p = 2$, then an asymptotic $100(1-\alpha)\%$ conservative confidence band for $m(x)$ over the interval $[a,b]$ is

$$\hat m_2(x) \pm \sigma_{n,2}(x)\, \{2\log(N+1) - 2\log\alpha\}^{1/2}, \qquad (2.2.14)$$

in which $\sigma_{n,2}(x)$ is given in (2.2.11) and can be replaced by $\sigma(x)\{2f(x)nh/3\}^{-1/2} \{A^T(x)\, S_{j(x)}\, A(x)\}^{1/2}$ according to Lemma 2.7.4, and by $\sigma(x)\{2f(x)nh/3\}^{-1/2} \{A^T(x)\, \Xi_{j(x)}\, A(x)\}^{1/2}$ according to Lemma 2.7.3.

The construction in Theorem 1 is similar to the connected error bar of Hall and Titterington (1988). Ours is superior in two respects: first, we treat not only equally-spaced designs but also random designs; second, by applying the strong approximation theorem of Tusnady (1977), our confidence band is asymptotically exact rather than conservative. The error bars of Hall and Titterington (1988) are based on a kernel estimator, while ours are based on the regressogram. The upcrossing result (Theorem 2.3.4) used in the proof of Theorem 1 is also different from those used in Bickel and Rosenblatt (1973), Rosenblatt (1976) and Härdle (1989). Theorem 2 on the linear confidence band, however, bears no similarity to the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003), except that the width of the band is of the same order $n^{-2/5}(\log n)^{1/2}$. The asymptotic variance function $\sigma_{n,2}^2(x)$ of $\hat m_2(x)$ in (2.2.11) is a special unconditional version of equation (6.2) in Huang (2003), Remark 6.1, page 1624. Thus, the linear band localized at any given point $x$ is only a factor of $(\log n)^{1/2}$ wider than the pointwise confidence interval of Huang (2003).

2.3 Error Decomposition

In this section, we break the estimation error $\hat m_p(x) - m(x)$ into a bias term and a noise term. To understand this decomposition, we begin by discussing the spline space $G^{(p-2)}$ and the representation of the spline estimator $\hat m_p(x)$ in (2.2.1). The first fact to note is that the empirical inner products of the B-spline bases $\{B_{j,1}(x)\}_{j=0}^N$ and $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.4) approximate the theoretical inner products uniformly at the rate $\sqrt{n^{-1}h^{-1}\log(n)}$, according to the following lemma.

Lemma 2.3.1. As $n \to \infty$, the B-spline bases $\{B_{j,1}(x)\}_{j=0}^N$ and $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.4) satisfy

$$A_{n,1} = \sup_{0\le j\le N} \left| \|B_{j,1}\|_{2,n}^2 - 1 \right| = O_p\left(\sqrt{\log n/(nh)}\right), \qquad (2.3.1)$$

$$A_{n,2} = \sup_{g_1,g_2\in G^{(0)}} \left| \frac{\langle g_1,g_2\rangle_n - \langle g_1,g_2\rangle}{\|g_1\|_2\,\|g_2\|_2} \right| + \sup_{g\in G^{(0)}} \left| \frac{\|g\|_{2,n}^2}{\|g\|_2^2} - 1 \right| = O_p\left(\sqrt{\log n/(nh)}\right). \qquad (2.3.2)$$

To express the estimator $\hat m_p(x)$ in terms of $\{B_{j,p}(x)\}_{j=1-p}^N$, we introduce the following vectors in $R^n$ for $p = 1, 2$:

$$\mathbf Y = (Y_1, \ldots, Y_n)^T, \quad B_{j,p}(\mathbf X) = \{B_{j,p}(X_1), \ldots, B_{j,p}(X_n)\}^T, \quad j = 1-p, \ldots, N.$$

The definition of $\hat m_p(x)$ in (2.2.1) entails that $\hat m_p(x) \equiv \sum_{j=1-p}^{N} \hat\lambda_{j,p} B_{j,p}(x)$, where the coefficients $\{\hat\lambda_{1-p,p}, \ldots, \hat\lambda_{N,p}\}^T$ are solutions of the following least squares problem:

$$\{\hat\lambda_{1-p,p}, \ldots, \hat\lambda_{N,p}\}^T = \arg\min \sum_{i=1}^n \left\{ Y_i - \sum_{j=1-p}^{N} \lambda_{j,p}\, B_{j,p}(X_i) \right\}^2. \qquad (2.3.3)$$

We write $\mathbf Y$ as the sum of a signal vector $\mathbf m$ and a noise vector $\mathbf E$:

$$\mathbf Y = \mathbf m + \mathbf E, \quad \mathbf m = \{m(X_1), \ldots, m(X_n)\}^T, \quad \mathbf E = \{\sigma(X_1)\varepsilon_1, \ldots, \sigma(X_n)\varepsilon_n\}^T.$$
Projecting this relationship onto the linear space $G_n^{(p-2)}$ spanned by $\{B_{j,p}(\mathbf X)\}_{j=1-p}^N$, a subspace of $R^n$, one gets

$$\hat{\mathbf m}_p = \{\hat m_p(X_1), \ldots, \hat m_p(X_n)\}^T = \mathrm{Proj}_{G_n^{(p-2)}} \mathbf Y = \mathrm{Proj}_{G_n^{(p-2)}} \mathbf m + \mathrm{Proj}_{G_n^{(p-2)}} \mathbf E.$$

It entails that in the space $G^{(p-2)}$ of spline functions

$$\hat m_p(x) = \tilde m_p(x) + \tilde\varepsilon_p(x), \qquad (2.3.4)$$

where

$$\tilde m_p(x) = \sum_{j=1-p}^{N} \tilde\lambda_{j,p}\, B_{j,p}(x), \quad \tilde\varepsilon_p(x) = \sum_{j=1-p}^{N} \tilde a_{j,p}\, B_{j,p}(x). \qquad (2.3.5)$$

The vectors $\{\tilde\lambda_{1-p,p}, \ldots, \tilde\lambda_{N,p}\}^T$ and $\{\tilde a_{1-p,p}, \ldots, \tilde a_{N,p}\}^T$ are solutions to (2.3.3) with $Y_i$ replaced by $m(X_i)$ and $\sigma(X_i)\varepsilon_i$ respectively. We cite next two important results, the first from de Boor (2001), page 149, the second from Theorem 5.1 of Huang (2003).

Theorem 2.3.1. There is an absolute constant $C_p > 0$, $p \ge 1$, such that for every $m \in C^{(p)}[a,b]$ there exists a function $g \in G^{(p-2)}[a,b]$ such that $\|g - m\|_\infty \le C_p\, \|m^{(p)}\|_\infty\, h^p$.

Theorem 2.3.2. There is an absolute constant $C_p > 0$, $p \ge 1$, such that for any $m \in C^{(p)}[a,b]$ and the function $\tilde m_p(x)$ defined in (2.3.5),

$$\|\tilde m_p(x) - m(x)\|_\infty \le C_p \inf_{g\in G^{(p-2)}} \|g - m\|_\infty = O(h^p). \qquad (2.3.6)$$

According to (2.3.4), the estimation error is $\hat m_p(x) - m(x) = \{\tilde m_p(x) - m(x)\} + \tilde\varepsilon_p(x)$, where by Theorem 2.3.2 the bias term $\tilde m_p(x) - m(x)$ is of order $O_p(h^p)$. Hence the main hurdle in proving Theorems 1 and 2 is the noise term $\tilde\varepsilon_p(x)$. This is handled by the next two propositions.

Proposition 2.3.1. With $\sigma_{n,1}(x)$ given in (2.2.11), the process $\sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x)$, $x \in [a,b]$, is almost surely uniformly approximated by a Gaussian process $U(x)$, $x \in [a,b]$, with covariance structure

$$E\, U(x)\, U(y) = \delta_{j(x),j(y)}, \quad \forall x, y \in [a,b],$$

where $\delta_{j,l}$ is the Kronecker symbol, i.e., $\delta_{j,l} = 1$ if $j = l$ and $0$ otherwise.

Proposition 2.3.2. For a given $0 < \alpha < 1$, and $\sigma_{n,2}(x)$ as given in (2.2.11),

$$\liminf_{n\to\infty} P\left\{ \sup_{x\in[a,b]} \left| \sigma_{n,2}^{-1}(x)\, \tilde\varepsilon_2(x) \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right\} \ge 1 - \alpha. \qquad (2.3.7)$$

We state next the strong approximation theorem of Tusnady (1977), which will be used later in the proof of Lemma 2.7.6, a key step in proving Propositions 2.3.1 and 2.3.2.

Theorem 2.3.3. Let $U_1, \ldots, U_n$ be i.i.d. r.v.'s on the 2-dimensional unit square with $P(U_i \le \mathbf t) = \lambda(\mathbf t)$, $\mathbf t \in [0,1]^2$, the Lebesgue measure, and let $F_n$ denote their empirical distribution. Then there exists a version $B_n$ of the 2-dimensional Brownian bridge such that

$$P\left\{ \sup_{0\le \mathbf t\le 1} \left| n^{1/2}\{F_n(\mathbf t) - \lambda(\mathbf t)\} - B_n(\mathbf t) \right| > n^{-1/2}(C\log n + x)\log n \right\} < K e^{-\lambda x} \qquad (2.3.8)$$

holds for all $x$, where $C, K, \lambda$ are positive constants.

For the rest of the chapter, we denote the well-known Rosenblatt quantile transformation as

$$(X', \varepsilon') = M(X, \varepsilon) = \left\{ F_X(X),\, F_{\varepsilon|X}(\varepsilon|X) \right\}, \qquad (2.3.9)$$

which produces random variables $X'$ and $\varepsilon'$ with independent and identical uniform distributions on the interval $[0,1]$. This transformation was used in, for instance, Bickel and Rosenblatt (1973) and Härdle (1989). Substituting the vector $\mathbf t = (t_1, t_2)$ in Theorem 2.3.3 with $(X', \varepsilon')$, and the stochastic process $n^{1/2}\{F_n(\mathbf t) - \lambda(\mathbf t)\}$ with

$$Z_n\{M^{-1}(x,\varepsilon)\} = Z_n(x,\varepsilon) = \sqrt n\, \{F_n(x,\varepsilon) - F(x,\varepsilon)\}, \qquad (2.3.10)$$

where $F_n(x,\varepsilon)$ denotes the empirical distribution of $(X,\varepsilon)$, then (2.3.8) implies that there exists a version of the 2-dimensional Brownian bridge $B$ such that

$$\sup_{x,\varepsilon} \left| Z_n(x,\varepsilon) - B\{M(x,\varepsilon)\} \right| = O\left(n^{-1/2}\log^2 n\right), \ \text{w.p.}\ 1. \qquad (2.3.11)$$

The next result on upcrossing probability is from Leadbetter, Lindgren and Rootzén (1983), Theorem 1.5.3, page 14. In our proof of Theorem 1, it plays the role of Theorem A1 in Bickel and Rosenblatt (1973) or Theorem C in Rosenblatt (1976).

Theorem 2.3.4. If $\xi_1, \ldots, \xi_n$ are i.i.d. standard normal r.v.'s, then for $M_n = \max\{\xi_1, \ldots, \xi_n\}$ and $\tau \in R$, as $n \to \infty$,

$$P\{a_n(M_n - b_n) \le \tau\} \to \exp(-e^{-\tau}), \quad P\{|M_n| \le \tau/a_n + b_n\} \to \exp(-2e^{-\tau}),$$

where

$$a_n = (2\log n)^{1/2}, \quad b_n = (2\log n)^{1/2} - \tfrac12 (2\log n)^{-1/2} (\log\log n + \log 4\pi).$$

This Gumbel-type limit is easy to verify numerically, as sketched below.
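The following small Monte Carlo sketch (Python with numpy; our own illustration, not part of the thesis code) simulates the maximum $M_n$ of $n$ i.i.d. standard normals and compares the empirical probability $P\{a_n(M_n - b_n) \le \tau\}$ against the Gumbel limit $\exp(-e^{-\tau})$ from Theorem 2.3.4:

```python
import numpy as np

def gumbel_check(n=2000, reps=1000, tau=1.0, seed=1):
    """Empirical P{a_n (M_n - b_n) <= tau} versus the limit exp(-exp(-tau))."""
    rng = np.random.default_rng(seed)
    a_n = np.sqrt(2 * np.log(n))
    b_n = a_n - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * a_n)
    M = rng.standard_normal((reps, n)).max(axis=1)  # maxima over reps samples
    return (a_n * (M - b_n) <= tau).mean(), np.exp(-np.exp(-tau))

print(gumbel_check())  # the two numbers should be close for large n
```

The convergence in Theorem 2.3.4 is known to be slow (of logarithmic order), so moderate discrepancies at small $n$ are expected.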
2.4 Implementation

In this section, we describe the procedures used to implement the confidence bands of Theorems 1 and 2. We have written our codes in XploRe, due to the convenience of using certain kernel type estimators. Information on XploRe can be found in Härdle, Hlavka and Klinke (2000).

Given any sample $\{(X_i,Y_i)\}_{i=1}^n$ from model (2.1.1), we use $\min(X_1,\ldots,X_n)$ and $\max(X_1,\ldots,X_n)$ respectively as the endpoints of the interval $[a,b]$. Minor adjustments could be made for outliers. The number of interior knots is taken to be $N = [c_1 n^{1/(2p+1)}] + c_2$, where $c_1$ and $c_2$ are positive integers. Since an explicit formula for the coverage probability of the bands does not exist, there is no optimal method to select $(c_1, c_2)$. In our simulations, the simple choice of $5$ for $c_1$ and $1$ for $c_2$ seems to work well, so these are set as default values.

The least squares problem in (2.2.1) can be solved via the truncated power basis $1, x, \ldots, x^{p-1}, (x-t_j)_+^{p-1}$, $j = 1, \ldots, N$. In other words,

$$\hat m_p(x) = \sum_{k=0}^{p-1} \hat\gamma_k x^k + \sum_{j=1}^{N} \hat\gamma_{j,p} (x - t_j)_+^{p-1}, \qquad (2.4.1)$$

where the coefficients $\{\hat\gamma_0, \ldots, \hat\gamma_{p-1}, \hat\gamma_{1,p}, \ldots, \hat\gamma_{N,p}\}^T$ solve the least squares problem

$$\{\hat\gamma_0, \ldots, \hat\gamma_{p-1}, \hat\gamma_{1,p}, \ldots, \hat\gamma_{N,p}\}^T = \arg\min \sum_{i=1}^n \left\{ Y_i - \sum_{k=0}^{p-1}\gamma_k X_i^k - \sum_{j=1}^{N} \gamma_{j,p} (X_i - t_j)_+^{p-1} \right\}^2.$$

When constructing the confidence bands, one needs to evaluate the functions $\sigma_{n,p}^2(x)$ in (2.2.11). This is done differently for the exact and the conservative bands, and the description is separated into two subsections. For both constant and linear bands, according to the variance computations of Section 2.7, one needs the unknown functions $f(x)$ and $\sigma^2(x)$. Let $K(u) = 15(1-u^2)^2\, 1\{|u|\le1\}/16$ be the quartic kernel, $s_x$ the sample standard deviation of $(X_i)_{i=1}^n$, and

$$\hat f(x) = \frac{1}{n\, h_{\mathrm{rot},f}} \sum_{i=1}^n K\left( \frac{X_i - x}{h_{\mathrm{rot},f}} \right), \quad h_{\mathrm{rot},f} = (4\pi)^{1/10}(140/3)^{1/5}\, n^{-1/5}\, s_x, \qquad (2.4.2)$$

where $h_{\mathrm{rot},f}$ is the rule-of-thumb bandwidth of Silverman (1986). Define next the vectors $Z_p = \{Z_{1,p}, \ldots, Z_{n,p}\}^T$, $p = 1, 2$, with $Z_{i,p} = \{Y_i - \hat m_p(X_i)\}^2$, and

$$X = X(x) = \left\{ (1, X_i - x) \right\}_{i=1}^n, \quad W = W(x) = \mathrm{diag}\left\{ K\left( \frac{X_i - x}{h_{\mathrm{rot}}} \right) \right\}_{i=1}^n,$$

where $h_{\mathrm{rot}}$ is the rule-of-thumb bandwidth of Fan and Gijbels (1996) based on the data $\{(X_i, Z_{i,p})\}_{i=1}^n$. Then one defines the following local linear estimators of $\sigma^2(x)$, taking the intercept component of

$$\hat\sigma_p^2(x) = \left( X^T W X \right)^{-1} X^T W Z_p, \quad p = 1, 2. \qquad (2.4.3)$$

Bickel and Rosenblatt (1973) and Fan and Gijbels (1996) provide the following uniform consistency results:

$$\max_{p=1,2} \sup_{x\in[a,b]} \left| \hat\sigma_p(x) - \sigma(x) \right| + \sup_{x\in[a,b]} \left| \hat f(x) - f(x) \right| = o_p(1). \qquad (2.4.4)$$

2.4.1 Implementing Exact Bands

The function $\sigma_{n,1}(x)$ is approximated by either one of the following, with $\hat f(x)$ and $\hat\sigma_1(x)$ defined in (2.4.2) and (2.4.3), and $j(x)$ defined in (2.2.2):

$$\hat\sigma_{n,1}(x,1) = \hat\sigma_1(t_{j(x)})\, \hat f^{-1/2}(t_{j(x)})\, n^{-1/2} h^{-1/2}, \qquad (2.4.5)$$

$$\hat\sigma_{n,1}(x,2) = \hat\sigma_1(x)\, \hat f^{-1/2}(x)\, n^{-1/2} h^{-1/2}, \qquad (2.4.6)$$

where the additional parameter value $1$ or $2$ indicates estimation at the nearest left knot or at each value $x$, respectively. Since $\sup_{x\in[a,b]} |t_{j(x)} - x| \le h \to 0$ as $n \to \infty$, (2.4.4) entails that both of the bands below are asymptotically exact, with $\hat m_1(x)$ given in (2.4.1) and $d_n$ in (2.2.13):

$$\hat m_1(x) \pm \hat\sigma_{n,1}(x, \mathrm{opt})\, \{2\log(N+1)\}^{1/2}\, d_n, \quad \mathrm{opt} = 1, 2. \qquad (2.4.7)$$

A compact implementation of (2.4.7) is sketched below.
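The following is a minimal sketch of the exact constant-spline band (2.4.7) with opt = 2, in Python with numpy (the thesis used XploRe). As simple stand-ins for the kernel plug-ins (2.4.2)-(2.4.3), it uses bin-wise residual variances and a histogram density, which are our simplifications; it also assumes every subinterval contains observations and that the grid lies inside $[a,b]$:

```python
import numpy as np

def exact_band(x_grid, X, Y, alpha=0.05, c1=5, c2=1):
    """Constant-spline (p = 1, regressogram) fit with the exact band (2.4.7)."""
    a, b = X.min(), X.max()
    n = len(X)
    N = int(c1 * n ** (1.0 / 3)) + c2            # interior knots, p = 1 default
    h = (b - a) / (N + 1)
    bins = np.clip(((X - a) / h).astype(int), 0, N)          # j(X_i)
    gbins = np.clip(((x_grid - a) / h).astype(int), 0, N)    # j(x) on the grid
    m_hat = np.array([Y[bins == j].mean() for j in range(N + 1)])
    resid2 = (Y - m_hat[bins]) ** 2
    sigma2 = np.array([resid2[bins == j].mean() for j in range(N + 1)])
    f_hat = np.array([(bins == j).mean() / h for j in range(N + 1)])
    d_n = 1 - (np.log(-0.5 * np.log(1 - alpha)) +
               0.5 * (np.log(np.log(N + 1)) + np.log(4 * np.pi))) / (2 * np.log(N + 1))
    half = (np.sqrt(sigma2 / (f_hat * n * h))        # sigma-hat_{n,1}(x, 2)
            * np.sqrt(2 * np.log(N + 1)) * d_n)[gbins]
    center = m_hat[gbins]
    return center, center - half, center + half
```

The regressogram step is the $p = 1$ instance of (2.3.3), since the least squares fit over piecewise constant splines is just the bin-wise mean.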
2.4.2 Implementing Conservative Bands

According to Lemma 2.7.3, for $0 \le j \le N$, the matrix $\Xi_j$ approximates the matrix $S_j$ uniformly. Hence both of the bands below are asymptotically conservative, with $\hat m_2(x)$ given in (2.4.1):

$$\hat m_2(x) \pm \hat\sigma_{n,2}(x, \mathrm{opt})\, \{2\log(N+1) - 2\log\alpha\}^{1/2}, \quad \mathrm{opt} = 1, 2, \qquad (2.4.8)$$

where the function $\sigma_{n,2}(x)$ in (2.2.11) for the linear band is estimated consistently by either one of the next two formulae:

$$\hat\sigma_{n,2}(x,1) = \left\{ A^T(x)\, \Xi_{j(x)}\, A(x) \right\}^{1/2} \sqrt{3/2}\; \hat\sigma_2(t_{j(x)})\, \hat f^{-1/2}(t_{j(x)})\, n^{-1/2} h^{-1/2}, \qquad (2.4.9)$$

$$\hat\sigma_{n,2}(x,2) = \left\{ A^T(x)\, \Xi_{j(x)}\, A(x) \right\}^{1/2} \sqrt{3/2}\; \hat\sigma_2(x)\, \hat f^{-1/2}(x)\, n^{-1/2} h^{-1/2}, \qquad (2.4.10)$$

with $A(x)$ and $\Xi_j$ defined in (2.2.9), $j(x)$ defined in (2.2.2), and $\hat f(x)$ and $\hat\sigma_2(x)$ defined in (2.4.2) and (2.4.3). In order to calculate the matrix $M_{N+2}^{-1}$, which is needed for (2.2.9), we introduce two theorems from matrix theory.

Theorem 2.4.1 [Gantmacher and Krein (1960), page 95, equation (43)]. For a symmetric Jacobi matrix $J$ given as

$$J = \begin{pmatrix} a_1 & b_1 & & 0 \\ b_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{N+1} \\ 0 & & b_{N+1} & a_{N+2} \end{pmatrix}_{(N+2)\times(N+2)},$$

its inverse matrix $J^{-1} = (l_{i,k})_{(N+2)\times(N+2)}$ satisfies

$$l_{i,k} = \psi_i\, \chi_k, \; i \le k; \qquad l_{i,k} = \psi_k\, \chi_i, \; k \le i, \qquad (2.4.11)$$

where

$$\psi_i = \frac{(-1)^i \det\left(J_{(1,\ldots,i-1)}\right)\, b_i b_{i+1}\cdots b_{N+1}}{\det(J)}, \quad \chi_k = \frac{(-1)^k \det\left(J_{(k+1,\ldots,N+2)}\right)}{b_k b_{k+1}\cdots b_{N+1}}, \qquad (2.4.12)$$

and $J_{(1,\ldots,i-1)}$ is defined as the upper left $(i-1)\times(i-1)$ submatrix of $J$, $\det(J)$ is the determinant of the matrix $J$, and $J_{(k+1,\ldots,N+2)}$ is the corresponding lower right $(N+2-k)\times(N+2-k)$ submatrix.

Theorem 2.4.2 [Zhang (1999), page 101, Theorem 4.5]. For a tridiagonal matrix given as

$$T_N = \begin{pmatrix} a & b & & 0 \\ c & a & \ddots & \\ & \ddots & \ddots & b \\ 0 & & c & a \end{pmatrix}_{N\times N}, \quad N \ge 1, \qquad (2.4.13)$$

if $a^2 \ne 4bc$, then the determinant of $T_N$ is

$$\det T_N = \frac{\alpha^{N+1} - \beta^{N+1}}{\alpha - \beta}, \quad \alpha = \frac{a + \sqrt{a^2 - 4bc}}{2}, \; \beta = \frac{a - \sqrt{a^2 - 4bc}}{2}.$$

To apply Theorem 2.4.1 and Theorem 2.4.2, we let

$$z_1 = \frac{2+\sqrt3}{4}, \quad z_2 = \frac{2-\sqrt3}{4}, \quad \theta = \frac{z_2}{z_1} = \left(2-\sqrt3\right)^2 = 7 - 4\sqrt3. \qquad (2.4.14)$$

For any $N \ge 1$, Theorem 2.4.2 entails that $\det(T_N) = (z_1^{N+1} - z_2^{N+1})/(z_1 - z_2)$, if one takes $a = 1$, $b = c = 1/4$ in (2.4.13). Next, denote for any $N \ge 1$

$$M_{N+1} = \begin{pmatrix} T_N & \tau_N^T \\ \tau_N & 1 \end{pmatrix}_{(N+1)\times(N+1)}, \quad \tau_N = \left(0, \ldots, 0, \sqrt2/4\right)_{1\times N},$$

with the convention that $M_1 \equiv 1$. By expanding the determinant of the matrix $M_i$ along the last row and then the last column, for $i = 1, \ldots, N+1$,

$$\det(M_i) = \det(T_{i-1}) - 8^{-1}\det(T_{i-2}) = \frac{z_1^{i-1}\left\{ 8 z_1 (1-\theta^i) - (1-\theta^{i-1}) \right\}}{8(z_1 - z_2)}.$$

The determinant of the matrix $M_{N+2}$ can be expanded along the first row and then the first column:

$$\det(M_{N+2}) = \det(M_{N+1}) - 8^{-1}\det(M_N) = z_1^{N-1}\left\{ 64 z_1^2 (1-\theta^{N+1}) - 16 z_1 (1-\theta^N) + (1-\theta^{N-1}) \right\} / \{64(z_1 - z_2)\}.$$

Applying (2.4.12) to the matrix $M_{N+2}$ then yields explicit expressions for the entries $l_{i,k}$ needed in (2.2.9).

2.7 Proof of Theorems

2.7.1 Preliminaries

Lemma 2.7.1. There exists a sequence of positive numbers $\{D_n\}_{n\ge1}$ with $D_n \to \infty$ such that the following conditions are fulfilled:

$$\sum_{n=1}^{\infty} D_n^{-(2+\delta)} < \infty, \quad (nh)^{-1/2} (\log n)\, D_n \to 0, \quad (nh)^{1/2} D_n^{-(1+\delta)} \to 0, \quad D_n^2\, h\, n^{-1/2} \to 0. \qquad (2.7.1)$$

And for any sequence $\{D_n\}$ that satisfies the above four conditions, we have

$$P\left\{ \omega \,\middle|\, \exists N(\omega),\; |\varepsilon_i| \le D_n,\, i = 1, \ldots, n,\, n > N(\omega) \right\} = 1.$$

Lemma 2.7.2. As $n \to \infty$, for $c_{j,n}$ and $d_{j,n}$ defined in (2.2.3),

$$c_{j,n} = f(t_j)\, h\, (1 + r_{j,n,1}), \quad \langle b_{j,1}, b_{j',1} \rangle \equiv 0, \; j \ne j', \qquad (2.7.2)$$

$$d_{j,n} = \frac23 f(t_{j+1})\, h \begin{cases} 1 + r_{j,n,2}, & j = 0, \ldots, N-1, \\ 1/2 + r_{j,n,2}, & j = -1, N, \end{cases} \qquad (2.7.3)$$

$$\langle b_{j,2}, b_{j',2} \rangle = \frac16 f(t_{j+1})\, h \begin{cases} 1 + \tilde r_{j,n}, & |j'-j| = 1, \\ 0, & |j'-j| > 1, \end{cases} \qquad (2.7.4)$$

where

$$\max_{0\le j\le N} |r_{j,n,1}| + \max_{-1\le j\le N} |r_{j,n,2}| + \max_{-1\le j\le N} |\tilde r_{j,n}| \le C\,\omega(f,h). \qquad (2.7.5)$$

In particular,

$$\frac13 f(t_{j+1})\, h\, \{1 - C\omega(f,h)\} \le d_{j,n} \le \frac23 f(t_{j+1})\, h\, \{1 + C\omega(f,h)\}. \qquad (2.7.6)$$

PROOF OF LEMMA 2.3.1. For brevity, we give only the proof of (2.3.1) for $A_{n,1}$. Take any $j = 0, 1, \ldots, N$:

$$\left| \|B_{j,1}\|_{2,n}^2 - 1 \right| = \left| \sum_{i=1}^n \xi_i \right|, \quad \xi_i = \left\{ B_{j,1}^2(X_i) - 1 \right\} n^{-1},$$

with $E\xi_i = 0$; for any $k \ge 2$, Minkowski's inequality implies that $E|\xi_i|^k \le (2/n)^k\, 2^{-1}\, E\left[ B_{j,1}^{2k}(X_i) + 1 \right]$, while (2.7.2) entails that $E\xi_i^2 = n^{-2} E\left[ B_{j,1}^2(X_i) - 1 \right]^2 \ge c\, n^{-2} h^{-1}$. It is then clear that one can find a constant $c > 0$ such that for all $k > 2$,

$$E|\xi_i|^k \le \left( c\, n^{-1} h^{-1} \right)^{k-2} k!\, E|\xi_i|^2.$$

Applying Bernstein's inequality to $\sum_{i=1}^n \xi_i$, for any large enough $\delta > 0$,

$$P\left\{ \left| \sum_{i=1}^n \xi_i \right| \ge \delta\sqrt{(nh)^{-1}\log n} \right\} \le 2 n^{-3} \;\Rightarrow\; \sum_{n=1}^{\infty} P\left\{ \sup_{0\le j\le N} \left| \|B_{j,1}\|_{2,n}^2 - 1 \right| \ge \delta\sqrt{(nh)^{-1}\log n} \right\} < \infty$$

for such $\delta > 0$, and then (2.3.1) follows. $\Box$

2.7.2 Proof of Theorem 1

In this section, we investigate the asymptotic behavior of $\tilde\varepsilon_1(x)$ defined in (2.3.5).
Since $\langle B_{j',1}(\mathbf X), B_{j,1}(\mathbf X) \rangle_n = 0$ unless $j = j'$, $\tilde\varepsilon_1(x)$ can be written as

$$\tilde\varepsilon_1(x) = \sum_{j=0}^{N} \tilde\varepsilon_j^*\, B_{j,1}(x)\, \|B_{j,1}\|_{2,n}^{-2},$$

in which

$$\tilde\varepsilon_j^* = \langle \mathbf E, B_{j,1} \rangle_n = \frac1n \sum_{i=1}^n B_{j,1}(X_i)\, \sigma(X_i)\, \varepsilon_i.$$

Lemma 2.7.3. Let $\hat\varepsilon_1(x) = \sum_{j=0}^{N} \tilde\varepsilon_j^*\, B_{j,1}(x)$, $x \in [a,b]$; then

$$\left| \tilde\varepsilon_1(x) - \hat\varepsilon_1(x) \right| \le A_{n,1}\, (1 - A_{n,1})^{-1}\, \left| \hat\varepsilon_1(x) \right|, \quad x \in [a,b],$$

where $A_{n,1}$ is defined in (2.3.1). The asymptotic behavior of $\sup_{x\in[a,b]} |\tilde\varepsilon_1(x)|$ therefore is the same as that of $\sup_{x\in[a,b]} |\hat\varepsilon_1(x)|$.

Lemma 2.7.4. The pointwise variance of $\hat\varepsilon_1(x)$ is the function $\sigma_{n,1}^2(x)$ defined in (2.2.11), which satisfies

$$E\{\hat\varepsilon_1(x)\}^2 = \sigma_{n,1}^2(x) = \frac{\sigma^2(x)}{n f(x) h}\, \{1 + r_{n,1}(x)\}, \quad x \in [a,b], \qquad (2.7.7)$$

with $\sup_{x\in[a,b]} |r_{n,1}(x)| \to 0$.

PROOF. The term $E\{\hat\varepsilon_1(x)\}^2$ equals the expression for $\sigma_{n,1}^2(x)$ in (2.2.11). By (2.7.5) and the continuity of the functions $\sigma^2(x)$ and $f(x)$, $\sigma_{n,1}^2(x)$ can be expressed as

$$\frac{\sigma^2(x) f(x) h + \int_{J_{j(x)}} \left\{ \sigma^2(v) f(v) - \sigma^2(x) f(x) \right\} dv}{n \left\{ f(t_{j(x)})\, h + r_{j(x),n}\, n^{-1} \right\}^2} = \frac{\sigma^2(x)}{n f(x) h}\, \{1 + r_{n,1}(x)\},$$

with $\sup_{x\in[a,b]} |r_{n,1}(x)| \to 0$, establishing (2.7.7). $\Box$

Lemma 2.7.5. Let the sequence $\{D_n\}$ satisfy (2.7.1) and define for $x \in [a,b]$

$$\hat\varepsilon_{n,1}(x) = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, \tilde\varepsilon_j^* = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, (\tilde\varepsilon_j^* - E\tilde\varepsilon_j^*),$$

$$\hat\varepsilon_{n,1}^D(x) = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, (\tilde\varepsilon_j^{*,D} - E\tilde\varepsilon_j^{*,D}), \quad \tilde\varepsilon_j^{*,D} = \frac1n \sum_{i=1}^n B_{j,1}(X_i)\, \sigma(X_i)\, \varepsilon_i\, 1\{|\varepsilon_i| \le D_n\}; \qquad (2.7.8)$$

then with probability 1,

$$\left\| \hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x) \right\|_\infty = O\left( (nh)^{1/2} D_n^{-(1+\delta)} \right) = o(1).$$

PROOF. Notice that $E\tilde\varepsilon_j^* = E\{n^{-1}\sum_i B_{j,1}(X_i)\sigma(X_i)\varepsilon_i\} = 0$, since $E(\varepsilon_i|X_i) = 0$; then

$$\hat\varepsilon_{n,1}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \iint_{J_{j(x)}} \sigma(v)\, \varepsilon\, dZ_n(v,\varepsilon),$$

according to the definition of $Z_n(v,\varepsilon)$ in (2.3.10). The process $\hat\varepsilon_{n,1}(x)$ is separated into two parts, $\hat\varepsilon_{n,1}(x) = \hat\varepsilon_{n,1}^D(x) + \{\hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x)\}$. The truncated part $\hat\varepsilon_{n,1}^D(x)$ is defined in (2.7.8). The tail part $\hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x)$ is bounded uniformly over $[a,b]$ by the sum of an empirical term

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \sqrt n\, \left| \frac1n \sum_{i=1}^n I_{j(x)}(X_i)\, \sigma(X_i)\, \varepsilon_i\, 1\{|\varepsilon_i| \ge D_n\} \right| \qquad (2.7.9)$$

and a mean term

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \sqrt n \iint_{J_{j(x)}} \sigma(v)\, |\varepsilon|\, 1\{|\varepsilon| \ge D_n\}\, dF(v,\varepsilon). \qquad (2.7.10)$$

By Lemma 2.7.1, the term in (2.7.9) is $0$ almost surely for large $n$. The term in (2.7.10) is bounded by

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, c_{j(x),n} \right\}^{-1} \sqrt n \int_{J_{j(x)}} \sigma(v) f(v) \left[ \int |\varepsilon|\, 1\{|\varepsilon| \ge D_n\}\, dF(\varepsilon|v) \right] dv \le C\, \frac{\sqrt{nh}}{D_n^{1+\delta}}.$$

The lemma follows immediately from the third condition in (2.7.1). $\Box$

Lemma 2.7.6. Define for $x \in [a,b]$

$$\varepsilon_{n,1}^{(0)}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \iint_{J_{j(x)}} \sigma(v)\, \varepsilon\, 1\{|\varepsilon| < D_n\}\, dB\{M(v,\varepsilon)\},$$

with $B$ the Brownian bridge in (2.3.11); then, by the strong approximation of Theorem 2.3.3, $\|\hat\varepsilon_{n,1}^D(x) - \varepsilon_{n,1}^{(0)}(x)\|_\infty = o(1)$ almost surely.

Lemmas 2.7.7, 2.7.8 and 2.7.9 then replace $\varepsilon_{n,1}^{(0)}(x)$, at a uniform almost sure cost of $o(1)$, by successive Wiener-process versions, the last being

$$\varepsilon_{n,1}^{(3)}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \int_{J_{j(x)}} \sigma(v)\, f^{1/2}(v)\, dW(v),$$

where $W$ is a Wiener process; in particular, replacing the truncated conditional second moment $\tilde\sigma_{D_n}^2(v) = \int \varepsilon^2\, 1\{|\varepsilon| < D_n\}\, dF(\varepsilon|v)$ by $1$ along the way costs $O(D_n^2\, h\, n^{-1/2}) = o(1)$.

Lemma 2.7.10. The process $\varepsilon_{n,1}^{(3)}(x)$ is a Gaussian process with mean $0$, variance $1$, and covariance

$$\mathrm{cov}\left\{ \varepsilon_{n,1}^{(3)}(x),\, \varepsilon_{n,1}^{(3)}(y) \right\} = \delta_{j(x),j(y)}, \quad \forall x, y \in [a,b].$$

PROOF. The variance and covariance are given by Itô's isometry:

$$E\left\{ \varepsilon_{n,1}^{(3)}(x) \right\}^2 = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-2} \int_{J_{j(x)}} \sigma^2(v)\, f(v)\, dv = 1,$$

according to (2.7.7). Likewise, the covariance $\mathrm{cov}\{\varepsilon_{n,1}^{(3)}(x), \varepsilon_{n,1}^{(3)}(y)\}$ is

$$\left\{ \sigma_{n,1}(x)\, \sigma_{n,1}(y)\, n\, c_{j(x),n}\, c_{j(y),n} \right\}^{-1} E\left\{ \int_{J_{j(x)}} \sigma(v) f^{1/2}(v)\, dW(v) \int_{J_{j(y)}} \sigma(v) f^{1/2}(v)\, dW(v) \right\} = \delta_{j(x),j(y)},$$

since the Wiener integrals over disjoint intervals are independent, which completes the proof. $\Box$

PROOF OF PROPOSITION 2.3.1. The proof follows immediately from Lemmas 2.7.3, 2.7.5, 2.7.6, 2.7.7, 2.7.8, 2.7.9 and 2.7.10. $\Box$

PROOF OF THEOREM 1. It is clear from Proposition 2.3.1 that the Gaussian process $U(x)$ consists of $(N+1)$ i.i.d. standard normal variables $U(t_0), \ldots, U(t_N)$; hence Theorem 2.3.4 implies that as $n \to \infty$,

$$P\left\{ \sup_{x\in[a,b]} |U(x)| \le \tau/a_{N+1} + b_{N+1} \right\} \to \exp\left(-2e^{-\tau}\right).$$
By letting $\tau = -\log\{-\tfrac12\log(1-\alpha)\}$ and using the definitions of $a_{N+1}$ and $b_{N+1}$, we obtain

$$\lim_{n\to\infty} P\left[ \sup_{x\in[a,b]} |U(x)| \le -\log\left\{-\tfrac12\log(1-\alpha)\right\}\{2\log(N+1)\}^{-1/2} + \{2\log(N+1)\}^{1/2} - \tfrac12\{2\log(N+1)\}^{-1/2}\{\log\log(N+1)+\log4\pi\} \right] = 1-\alpha.$$

Replacing $U(x)$ with $\sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x)$ (Proposition 2.3.1), and using the definition of $d_n$ in (2.2.13), it follows that

$$\lim_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x) \right| \le \{2\log(N+1)\}^{1/2} d_n \right] = 1-\alpha.$$

According to (2.3.6),

$$(nh)^{1/2}\{\log(N+1)\}^{-1/2}\, \|\tilde m_1(x) - m(x)\|_\infty = o_p(1).$$

Thus, according to (2.3.4),

$$\lim_{n\to\infty} P\left[ m(x) \in \hat m_1(x) \pm \sigma_{n,1}(x)\{2\log(N+1)\}^{1/2} d_n,\ \forall x\in[a,b] \right]$$
$$= \lim_{n\to\infty} P\left[ \{2\log(N+1)\}^{-1/2} d_n^{-1} \sup_{x\in[a,b]} \sigma_{n,1}^{-1}(x)\left| \tilde\varepsilon_1(x) + \tilde m_1(x) - m(x) \right| \le 1 \right]$$
$$= \lim_{n\to\infty} P\left[ \{2\log(N+1)\}^{-1/2} d_n^{-1} \sup_{x\in[a,b]} \sigma_{n,1}^{-1}(x)\left| \tilde\varepsilon_1(x) \right| \le 1 \right] = 1-\alpha. \; \Box$$

2.7.3 Preliminaries for Theorem 2

In this subsection we examine some matrices used in the construction of the confidence band in (2.2.14) and in the proof of Theorem 2. The next lemma is the analogue, for the linear basis, of the inner product computations for the piecewise constant basis. In what follows, we use $|T|$ to denote the maximal absolute value of any matrix $T$, and $M_{N+2}$ is the tridiagonal matrix defined in (2.2.10).

Lemma 2.7.1. The inner product matrix $V$ of the B-spline basis $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.6) has the following decomposition:

$$V = M_{N+2} + \tilde V, \quad \tilde V = (\tilde v_{j,j'})_{j,j'=-1}^{N}, \qquad (2.7.1)$$

where $\tilde v_{j,j'} \equiv 0$ if $|j - j'| \ge 2$, and

$$|\tilde V| \le C\,\omega(f,h). \qquad (2.7.2)$$

PROOF. By (2.7.3), (2.7.4) and (2.7.5), the inner product $\langle b_{j',2}, b_{j,2} \rangle$ can be replaced by $\frac16 f(t_{j+1})h$ when $|j'-j| = 1$, and by $\frac23 f(t_{j+1})h$ or $\frac13 f(t_{j+1})h$ when $j' = j$, plus some uniformly infinitesimal differences dominated by $\omega(f,h)$. Then, based on the definition of $B_{j,2}(x)$, the lemma follows immediately. $\Box$

The next lemma shows that multiplication by $M_{N+2}$ behaves similarly to multiplication by a constant.

Lemma 2.7.2. Given a matrix $Q = M_{N+2} + \Gamma$, in which $\Gamma = (\gamma_{j,j'})_{j,j'=-1}^N$ satisfies $\gamma_{j,j'} \equiv 0$ if $|j-j'| \ge 2$ and $|\Gamma| \to 0$, there exist constants $c, C > 0$, independent of $n$ and $\Gamma$, such that, in probability,

$$c|\xi| \le |Q\xi| \le C|\xi|, \quad C^{-1}|\xi| \le |Q^{-1}\xi| \le c^{-1}|\xi|, \quad \forall \xi \in R^{N+2}. \qquad (2.7.3)$$

PROOF. Each row of $M_{N+2}$ has diagonal element equal to $1$, and one or two nonzero off-diagonal terms whose total absolute value does not exceed $2\sqrt2/4 = 1/\sqrt2$; hence

$$\left(1 - 1/\sqrt2 - 3|\Gamma|\right)|\xi| \le |Q\xi| \le 3(1+|\Gamma|)|\xi|,$$

which entails the left inequality of (2.7.3); the right one follows by switching the roles of $\xi$ and $Q\xi$. $\Box$

As an application of Lemma 2.7.2, consider the matrix $S = V^{-1}$ defined in (2.2.7). Let $\xi_{j'} = \{\mathrm{sgn}(s_{j,j'})\}_{j=-1}^N$; then there exists a positive constant $C_s$ such that

$$\sum_{j=-1}^{N} |s_{j,j'}| = |(S\xi_{j'})_{j'}| \le C_s\, |\xi_{j'}| = C_s, \quad \forall j' = -1, 0, \ldots, N. \qquad (2.7.4)$$

The matrix $S$ appears in the construction of the confidence band, but it cannot be computed exactly, as it involves the unknown density $f(x)$. We approximate $S$ with the inverse of $M_{N+2}$, which has the simpler, distribution-free form in (2.2.10). This approximation is uniform for the $S_j$ in (2.2.7) and the $\Xi_j$ in (2.2.9) as well.

Lemma 2.7.3. As $n \to \infty$, $|M_{N+2}^{-1} - S| \to 0$ and $\max_{0\le j\le N} |\Xi_j - S_j| \to 0$.
PROOF. By definition, $M_{N+2}M_{N+2}^{-1} = I = VS = (M_{N+2} + \tilde V)S$, so that $M_{N+2}(M_{N+2}^{-1} - S) = \tilde V S$. Denote by $e_i$ the unit vector with $i$-th element $1$; then, applying Lemma 2.7.2 with $Q = M_{N+2}$, together with (2.7.4) and (2.7.2), one derives

$$c\,\left|M_{N+2}^{-1} - S\right| \le \max_i \left| \tilde V S e_i \right| \le |\tilde V|\, \max_{j'} \sum_{j=-1}^N |s_{j,j'}| \le C_s\, C\,\omega(f,h) \to 0,$$

and the second claim follows since each $\Xi_j - S_j$ is a $2\times2$ submatrix of $M_{N+2}^{-1} - S$. $\Box$

The coefficients of the noise spline $\tilde\varepsilon_2(x) = \sum_{j=-1}^N \tilde a_{j,2}\, B_{j,2}(x)$ in (2.3.5) satisfy the normal equations; in other words,

$$\tilde{\mathbf a} = (\tilde a_{j,2})_{j=-1}^{N} = (V + \tilde B)^{-1} \left( \frac1n \sum_{i=1}^n B_{j,2}(X_i)\,\sigma(X_i)\,\varepsilon_i \right)_{j=-1}^{N}, \qquad (2.7.6)$$

where $|\tilde B| \le A_{n,2} = O_p\left(\sqrt{n^{-1}h^{-1}\log(n)}\right)$ by (2.3.2). Now define $\bar a_j$'s by replacing $(V + \tilde B)^{-1}$ with $V^{-1} = S$ in the above formula, i.e.,

$$\bar a_j = \sum_{j'=-1}^{N} s_{j,j'}\, \frac1n \sum_{i=1}^{n} B_{j',2}(X_i)\, \sigma(X_i)\, \varepsilon_i, \quad j = -1, \ldots, N, \qquad (2.7.7)$$

and define for $x \in [a,b]$

$$\bar\varepsilon_2(x) = \sum_{j=-1}^{N} \bar a_j\, B_{j,2}(x) = \sum_{j,j'=-1}^{N} s_{j,j'} \left\{ \frac1n \sum_{i=1}^{n} B_{j',2}(X_i)\, \sigma(X_i)\, \varepsilon_i \right\} B_{j,2}(x). \qquad (2.7.8)$$

In order to calculate the variance of $\bar\varepsilon_2(x)$, we express the matrix $\Sigma$ defined in (2.2.8) as

$$\Sigma = \Theta_n V \Theta_n + \tilde\Sigma, \quad \Theta_n = \mathrm{diag}\{\sigma(t_0), \ldots, \sigma(t_{N+1})\}, \quad |\tilde\Sigma| \to 0.$$

Lemma 2.7.4. The pointwise variance of $\bar\varepsilon_2(x)$ is the function $\sigma_{n,2}^2(x)$ defined in (2.2.11), which satisfies, uniformly over $x \in [a,b]$,

$$\sigma_{n,2}^2(x) = \frac{3\,\sigma^2(x)}{2 f(x)\, nh}\, A^T(x)\, S_{j(x)}\, A(x)\, \{1 + o(1)\},$$

where $j(x)$ is as defined in (2.2.2), $A(x)$ as defined in (2.2.9), and the matrix $S_j$ in (2.2.7). Consequently, there exist positive constants $c_\sigma$ and $C_\sigma$ such that for large enough $n$,

$$c_\sigma\, (nh)^{-1/2} \le \sigma_{n,2}(x) \le C_\sigma\, (nh)^{-1/2}, \quad \forall x \in [a,b]. \qquad (2.7.12)$$

PROOF. See Wang and Yang (2005). $\Box$

2.7.4 Proof of Theorem 2

Several lemmas will be given below for the proof of Proposition 2.3.2.

Lemma 2.7.5. Let the sequence $\{D_n\}$ satisfy (2.7.1) and define for $x \in [a,b]$

$$\bar\varepsilon_{n,2}(x) = \sigma_{n,2}^{-1}(x)\, \bar\varepsilon_2(x) = \sigma_{n,2}^{-1}(x) \sum_{j'=-1}^{N} \bar a_{j'}\, B_{j',2}(x),$$

together with its truncated version $\bar\varepsilon_{n,2}^D(x)$, obtained by replacing $\varepsilon_i$ with $\varepsilon_i\, 1\{|\varepsilon_i| < D_n\}$ in (2.7.7); then $\|\bar\varepsilon_{n,2}(x) - \bar\varepsilon_{n,2}^D(x)\|_\infty = o(1)$ almost surely.

Lemmas 2.7.6 through 2.7.8 approximate $\bar\varepsilon_{n,2}^D(x)$ uniformly, via the strong approximation (2.3.11) and Itô's isometry, by a Gaussian process $\varepsilon_{n,2}^{(3)}(x)$, which at each $x$ takes the form $\varepsilon_{n,2}^{(3)}(x) = \xi(x)^T A_{j(x)}\, \{\xi(x)^T \mathrm{cov}(A_{j(x)})\, \xi(x)\}^{-1/2}$ for a suitable vector $\xi(x) \in R^2$ and mean-zero bivariate Gaussian vectors $A_j$ attached to the knot intervals $J_j$, $0 \le j \le N$.

Lemma 2.7.9. For a given $0 < \alpha < 1$,

$$\liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right] \ge 1-\alpha. \qquad (2.7.15)$$

PROOF. Let $Q_j = A_j^T\{\mathrm{cov}(A_j)\}^{-1}A_j$, $0 \le j \le N$. Since $Q_j \sim \chi^2(2)$, $P[Q_j > 2\{\log(N+1) - \log\alpha\}] = \alpha/(N+1)$, so that

$$P\left[ \max_{0\le j\le N} Q_j \le 2\{\log(N+1) - \log\alpha\} \right] \ge 1-\alpha. \qquad (2.7.14)$$

Then (2.7.14) and the Maximization Lemma of Johnson and Wichern (1992), page 66, ensure that for any $x \in [a,b]$,

$$\left\{ \varepsilon_{n,2}^{(3)}(x) \right\}^2 = \frac{\left\{ \xi(x)^T A_{j(x)} \right\}^2}{\xi(x)^T\, \mathrm{cov}(A_{j(x)})\, \xi(x)} \le A_{j(x)}^T \{\mathrm{cov}(A_{j(x)})\}^{-1} A_{j(x)} = Q_{j(x)}.$$

One therefore has $\sup_{x\in[a,b]} \{\varepsilon_{n,2}^{(3)}(x)\}^2 \le \max_{0\le j\le N}\{Q_j\}$ and

$$P\left[ \sup_{x\in[a,b]} \left| \varepsilon_{n,2}^{(3)}(x) \right|^2 \le 2\{\log(N+1) - \log\alpha\} \right] \ge P\left[ \max_{0\le j\le N}\{Q_j\} \le 2\{\log(N+1) - \log\alpha\} \right] \ge 1-\alpha.$$

Now (2.7.15) follows from Lemmas 2.7.5, 2.7.6, 2.7.7 and 2.7.8. $\Box$

Lemma 2.7.11.

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = O_p(A_{n,2}) \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = o_p(1).$$

PROOF. Recall the definitions of $\tilde{\mathbf a} = (\tilde a_{-1,2}, \ldots, \tilde a_{N,2})^T$ and $\bar{\mathbf a} = (\bar a_{-1}, \ldots, \bar a_N)^T$ in (2.7.6) and (2.7.7); one has $(V + \tilde B)\tilde{\mathbf a} = V\bar{\mathbf a}$. Based on Lemma 2.7.2 and (2.3.2), there exists a constant $c$ such that

$$c\,|\tilde{\mathbf a} - \bar{\mathbf a}| \le |V(\tilde{\mathbf a} - \bar{\mathbf a})| = |\tilde B\tilde{\mathbf a}| \le A_{n,2}\,(|\tilde{\mathbf a} - \bar{\mathbf a}| + |\bar{\mathbf a}|) \;\Rightarrow\; |\tilde{\mathbf a} - \bar{\mathbf a}| \le \frac{A_{n,2}}{c - A_{n,2}}\, |\bar{\mathbf a}|. \qquad (2.7.16)$$

From the definitions of $\tilde\varepsilon_2(x)$ and $\bar\varepsilon_2(x)$ in (2.3.5) and (2.7.8), plus (2.7.12) and (2.7.16), as $n \to \infty$,

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \sup_{x\in[a,b]} \sigma_{n,2}^{-1}(x)\, |\tilde{\mathbf a} - \bar{\mathbf a}|\, \sup_x \sum_{j=-1}^N B_{j,2}(x) \le C\, \frac{\sqrt{nh}}{c_\sigma}\, \frac{A_{n,2}}{c - A_{n,2}}\, |\bar{\mathbf a}|. \qquad (2.7.17)$$

On the other hand,

$$\sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \ge \frac{\sqrt{nh}}{C_\sigma} \sup_{x\in[a,b]} \left| \sum_{j=-1}^{N} \bar a_j\, B_{j,2}(x) \right| \ge \bar c\, \frac{\sqrt{nh}}{C_\sigma}\, |\bar{\mathbf a}|. \qquad (2.7.18)$$

The desired result then follows from (2.7.17) and (2.7.18), since $A_{n,2} = o_p(1)$, i.e.,

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = O_p(A_{n,2}) \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = o_p(1). \; \Box$$

PROOF OF PROPOSITION 2.3.2. It follows from Lemma 2.7.9 and Lemma 2.7.11 automatically. $\Box$

PROOF OF THEOREM 2. Now (2.3.6) implies that $\|\tilde m_2(x) - m(x)\|_\infty = O_p(h^2)$, and hence

$$(nh)^{1/2}\{\log(N+1)\}^{-1/2}\, \|\tilde m_2(x) - m(x)\|_\infty = O_p\left\{ (nh)^{1/2}\{\log(N+1)\}^{-1/2}\, h^2 \right\} = o_p(1).$$
Applying (2.3.7) in Proposition 2.3.2,

$$\liminf_{n\to\infty} P\left[ m(x) \in \hat m_2(x) \pm \sigma_{n,2}(x)\{2\log(N+1) - 2\log\alpha\}^{1/2},\ \forall x\in[a,b] \right]$$
$$= \liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \sigma_{n,2}^{-1}(x)\left| \tilde\varepsilon_2(x) + \tilde m_2(x) - m(x) \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right]$$
$$= \liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right] \ge 1-\alpha. \; \Box$$

CHAPTER 3

Spline-Backfitted Kernel Regression

3.1 Introduction

One popular approach to the issue of the "curse of dimensionality" is the additive model popularized by the book of Hastie and Tibshirani (1990),

$$Y = m(\mathbf X) + \sigma(\mathbf X)\,\varepsilon, \quad \mathbf X = (X_1, \ldots, X_d), \quad m(\mathbf x) = c + \sum_{\alpha=1}^{d} m_\alpha(x_\alpha), \qquad (3.1.1)$$

where the noise satisfies $E(\varepsilon|\mathbf X) = 0$, $\mathrm{var}(\varepsilon|\mathbf X) = 1$, and the component functions satisfy the identification conditions $E\,m_\alpha(X_\alpha) \equiv 0$, $\alpha = 1, \ldots, d$. In addition, we assume that the predictor $X_\alpha$ is distributed on a compact interval $[a_\alpha, b_\alpha]$, $\alpha = 1, \ldots, d$. The goal is the efficient and fast estimation of the $d$ unknown component functions $\{m_\alpha(x_\alpha)\}_{\alpha=1}^d$ based on an i.i.d. sample $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ following model (3.1.1).

If the last $d-1$ of the component functions were known by "oracle", then one could define a new variable $Y_1 = Y - c - \sum_{\alpha=2}^d m_\alpha(X_\alpha) = m_1(X_1) + \sigma(\mathbf X)\varepsilon$, which one could regress on the univariate variable $X_1$ to estimate the only unknown function $m_1(x_1)$, without the "curse of dimensionality". The basic idea of Linton (1997) was to obtain an approximation to the variable $Y_1$ by substituting $m_\alpha(X_\alpha)$, $\alpha = 2, \ldots, d$, with marginal integration pilot estimates (kernel-based), and to establish that the error caused by this "cheating" is negligible for estimating the function $m_1(x_1)$.

In this chapter we propose to pre-estimate the functions $\{m_\alpha(x_\alpha)\}_{\alpha=1}^d$ by an undersmoothed constant spline procedure. These function estimates are then used as if they were the true functions for constructing the "oracle" estimator. The greatest advantage of our approach over that of Linton (1997) is that ours is much faster and can be applied to cases of extremely high dimensional data (e.g., the number of predictors, $d$, can be as large as 50 or 100). We believe that our approach is the first example of marrying the traditionally parallel spline smoothing and kernel smoothing techniques, leading to an estimator with an asymptotically normal distribution like a typical kernel estimator, without the formidable computational burden of high dimensional kernel smoothing.

Figuratively speaking, spline smoothing can be compared to a sledgehammer, capable of breaking any huge chunk of material (i.e., a regression problem with very high dimension and very large sample size) in one slam (i.e., solving only one linear least squares problem), but it does not guarantee the fine shapes of the broken pieces (i.e., the estimates are not guaranteed to converge at any point or uniformly over an interval, only in the $L_2$ sense). In contrast, kernel smoothing works like a sharp knife that cuts anything into pieces of precise shape (i.e., confidence intervals are available at any point based on the asymptotic normal distribution, and confidence bands are available over compact intervals), but it is too tedious to use on a large chunk of material (i.e., the computation cost is intolerable when the dimension is high and/or the sample size is large).
Our proposed new tool can be described as a hammer-knife, capable of first slamming any huge clump into many much smaller pieces (i.e., univariate regression problems) in one hit (the spline backfitting step), and then cutting all the smaller pieces into exactly the desired shapes (one-dimensional kernel smoothing of the backfitted pseudo data). In this sense, the method we propose combines the best features of both spline and kernel methods. Smoothing experts may wonder how one could have all these good features in one method. The success of our method is due to the well-known "reducing bias by undersmoothing" and "averaging out the variance" principles; see Propositions 3.3.1, 3.3.2 and 3.3.3. Both goals are accomplished with the joint asymptotics of kernel and spline functions, which is the new feature of our proofs. For more details, see Lemmas 3.6.3, 3.6.4 and 3.6.5.

In addition to the above features, uniform confidence bands are provided for all function estimates under mild conditions. The literature on nonparametric confidence bands has been scarce and, as far as we know, is lacking in the multivariate regression setting. For the additive regression model, the present work seems to be one of the few to offer a measure of uniform accuracy with theoretical justification. The good news is that the confidence band we provide for $m_\alpha(x_\alpha)$, for any $\alpha = 1, \ldots, d$, is asymptotically the same confidence band that Härdle (1989) established for univariate regression with a kernel smoother, regardless of how many regressors there are and what the other functions $m_\beta(x_\beta)$, $\beta \ne \alpha$, are. Hence neither the dimension $d$ nor the other function components play any role in forming the band for $m_\alpha(x_\alpha)$, at least according to the asymptotic theory. In this sense, our estimator of $m_\alpha(x_\alpha)$ possesses what we would like to call "uniform oracle efficiency", which is much stronger than the "pointwise oracle efficiency" of Linton (1997). Furthermore, components in directions not of interest are only required to be Lipschitz continuous (see Remark 3 at the end of Section 3.2). Compared to all existing methods, this feature makes admissible the broadest class of additive models.

The rest of the chapter is organized as follows. In Section 3.2 we introduce the spline-backfitted kernel estimator and state its asymptotic "oracle efficiency" under appropriate assumptions, both pointwise and uniform. In Section 3.3 we provide some insight into the ideas behind our proofs of the main theoretical results, by decomposing the estimator's "cheating" error into a bias part and a noise part, which are shown separately to be of negligible order. In Section 3.4 we present extensive Monte Carlo results to demonstrate that the proposed estimator does indeed possess the claimed asymptotic properties. The simulated examples cover a wide range of sample sizes with correlated structure and some very high dimensions, which would have been either infeasible to handle with kernel smoothing methods, or lacking any measure of confidence, pointwise or global, by spline methods. The proposed estimator is applied to the Boston housing data in Section 3.4.2. Section 3.5 concludes, and all technical proofs are contained in Section 3.6.

3.2 SBK and SBLL Estimators

In this section, we describe the spline-backfitted kernel estimation procedure. Let $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ be an i.i.d. sample following model (3.1.1). In what follows, we write all responses as $\mathbf Y = (Y_1, \ldots, Y_n)^T$ and denote by $\mathbf X$ the design matrix $(\mathbf X_1, \ldots, \mathbf X_n)^T$.
Without loss of generality, we take all intervals $[a_\alpha, b_\alpha] = [0,1]$, $\alpha = 1, \ldots, d$. We preselect an integer $N_n \sim n^{2/5}\log(n)$; see Assumption (AS6) below. Next, we define for any $\alpha = 1, \ldots, d$ the indicator functions $I_{J,\alpha}(x_\alpha)$ of the $(N+1)$ equally-spaced subintervals of the finite interval $[0,1]$, that is,

$$I_{J,\alpha}(x_\alpha) = \begin{cases} 1, & JH \le x_\alpha < (J+1)H, \\ 0, & \text{otherwise}, \end{cases} \quad H = H_n = (N_n + 1)^{-1}, \; J = 0, 1, \ldots, N. \qquad (3.2.1)$$

Define next the $(1 + dN)$-dimensional space $G$ of additive spline functions as the linear space spanned by $\{1, I_{J,\alpha}(x_\alpha),\, \alpha = 1, \ldots, d,\, J = 1, \ldots, N\}$, and denote by $G_n$ the subspace of $R^n$ spanned by $\{\{1\}_{i=1}^n, \{I_{J,\alpha}(X_{i\alpha})\}_{i=1}^n,\, \alpha = 1, \ldots, d,\, J = 1, \ldots, N\}$. As $n \to \infty$, the dimension of $G_n$ becomes $1 + dN$ with probability approaching one. The spline estimator of the additive function $m(\mathbf x)$ is the unique element $\hat m(\mathbf x) = \hat m_n(\mathbf x)$ of the space $G$ such that the vector $\{\hat m(\mathbf X_1), \ldots, \hat m(\mathbf X_n)\}^T$ best approximates the response vector $\mathbf Y$. To be precise, we define

$$\hat m(\mathbf x) = \hat\lambda_0 + \sum_{\alpha=1}^{d} \sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(x_\alpha), \qquad (3.2.2)$$

where the coefficients $\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d}$ are the solution of the following least squares problem:

$$\{\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d}\}^T = \arg\min_{R^{dN+1}} \sum_{i=1}^n \left\{ Y_i - \lambda_0 - \sum_{\alpha=1}^{d}\sum_{J=1}^{N} \lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}) \right\}^2. \qquad (3.2.3)$$

The pilot estimators of each component function and of the constant are defined as

$$\hat m_\alpha(x_\alpha) = \sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(x_\alpha) - n^{-1}\sum_{i=1}^{n}\sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}),$$

$$\hat m_c = \hat\lambda_0 + n^{-1}\sum_{i=1}^{n}\sum_{\alpha=1}^{d}\sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}). \qquad (3.2.4)$$

These pilot estimators are then used to define a set of new pseudo-responses $\hat Y_{i1}$, which are estimated versions of the unobservable "oracle" responses $Y_{i1}$; to be specific,

$$\hat Y_{i1} = Y_i - \hat c - \sum_{\alpha=2}^{d} \hat m_\alpha(X_{i\alpha}), \quad Y_{i1} = Y_i - c - \sum_{\alpha=2}^{d} m_\alpha(X_{i\alpha}), \quad i = 1, \ldots, n, \qquad (3.2.5)$$

where, by the Central Limit Theorem, $\hat c$ is a $\sqrt n$-consistent estimator of $c$. Next, we define the spline-backfitted kernel (SBK) estimator of $m_1(x_1)$ as $\hat m_{S,1}(x_1)$, based on $\{\hat Y_{i1}, X_{i1}\}_{i=1}^n$, which is an attempt to mimic the would-be Nadaraya-Watson estimator $\tilde m_{S,1}(x_1)$ of $m_1(x_1)$ based on $\{Y_{i1}, X_{i1}\}_{i=1}^n$, had the unobservable "oracle" responses $\{Y_{i1}\}_{i=1}^n$ been available:

$$\hat m_{S,1}(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1)\, \hat Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)}, \quad \tilde m_{S,1}(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1)\, Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)}, \qquad (3.2.6)$$

where $\hat Y_{i1}$ and $Y_{i1}$ are defined in (3.2.5). Throughout this chapter, on any fixed interval $[a,b]$, we denote the space of second order smooth functions as $C^{(2)}[a,b] = \{m \mid m'' \in C[a,b]\}$, and the class of Lipschitz continuous functions, for any fixed constant $C > 0$, as $\mathrm{Lip}([a,b], C) = \{m \mid |m(x) - m(x')| \le C|x - x'|,\ \forall x, x' \in [a,b]\}$. A compact implementation of this two-stage procedure is sketched below.
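The following sketch (Python with numpy; the thesis code was written in XploRe, and all names here are ours) carries out the two stages of (3.2.1)-(3.2.6): a constant-spline additive fit via least squares on the indicator basis, pseudo-responses with $\hat c = \bar Y$ as the $\sqrt n$-consistent centering (our choice), and a final univariate Nadaraya-Watson smoothing with the quartic kernel. It assumes the design has been rescaled to $[0,1]^d$ and that every subinterval contains observations:

```python
import numpy as np

def sbk(X, Y, x_grid, h=None, N=None):
    """Two-stage SBK estimator of m_1 (Section 3.2); X is (n, d) in [0,1]^d."""
    n, d = X.shape
    if N is None:                              # N_n ~ n^{2/5} log n, capped as in 3.4.1
        N = min(int(n ** 0.4 * np.log(n)), int((n / 4 - 1) / d))
    h = h or n ** (-0.2)                       # kernel bandwidth ~ n^{-1/5}, (AS5)
    H = 1.0 / (N + 1)
    J = np.minimum((X / H).astype(int), N)     # subinterval index per coordinate
    # Stage 1: least squares on the indicator basis, eqs. (3.2.2)-(3.2.3)
    B = np.ones((n, 1 + d * N))
    for a in range(d):
        for j in range(1, N + 1):
            B[:, a * N + j] = (J[:, a] == j)   # I_{J,a}; the J = 0 cell is absorbed
    lam = np.linalg.lstsq(B, Y, rcond=None)[0]
    def m_pilot(a, x):                         # centered pilot estimate, eq. (3.2.4)
        lam_a = np.concatenate([[0.0], lam[1 + a * N: 1 + (a + 1) * N]])
        j = np.minimum((x / H).astype(int), N)
        return lam_a[j] - lam_a[J[:, a]].mean()
    # Stage 2: pseudo-responses (3.2.5), then Nadaraya-Watson in direction 1, (3.2.6)
    Y1 = Y - Y.mean() - sum(m_pilot(a, X[:, a]) for a in range(1, d))
    K = lambda u: np.where(np.abs(u) <= 1, 15.0 / 16 * (1 - u ** 2) ** 2, 0.0)
    W = K((X[:, 0][None, :] - x_grid[:, None]) / h)      # quartic kernel weights
    return (W @ Y1) / np.maximum(W.sum(axis=1), 1e-12)
```

Replacing the final Nadaraya-Watson step by a weighted local linear fit yields the SBLL variant used in the simulations.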
The number of interior knots Nn ~ 112/5 log (n), i.e., an2/5 log (n) 3 Na S (7an5 log (n) for some positive constants CMCN, and the interval width H = (Nn + 1)“. 58 The asymptotic property of the kernel smoother in“ (2:1) is well-developed. Un- der Assumptions (ASH—(ASS), according to Theorem 4.2.1 of Hardle ( 1990), one has «I»? {m (an) — m1(x1>— b 0. There is no optimal way to choose N,’,, however, at least to us at this time. The fact that N;1 = o (n'2/5) ensures that the bias in the spline pilot estimators is negligible compared tO the bias of h2 in the kernel/local linear smoothing stage. On the other hand, one does not allow Nn to be too large for practical reasons: the number Of terms in (3.2.3), 1 + dNn has to be small relative to n. Hence we select Nn to be of barely larger order than n2/5. Remark 3. Assumption A1 requires only the Lipschitz continuity for the com- ponents except for the component Of interest. Obviously all ma are required to be second order smooth if one needs to estimate all components. 3.3 Decomposition In this section, we introduce some additional notations in order to shed some light on the ideas behind the proofs of Theorems 3.2.1 and 3.2.2. Denote by ||¢||2 the theoretical L2 norm of a function o5 on [0,1]“, “(tug = E{¢2 (X)} = f[0,l]d (#2 (x) f (x) dx, and the empirical L2 norm as ”ding,“ = n‘1 2&1 ¢2 (Xi). For any Lg-integrable functions 45, (p on [0,1]d , the corresponding inner products are de- fined by (¢,¢)2 =/ d¢(X)<.0(X)f(X)dx=E{¢(X)2 11 1300/31, "-1 2?:1 BJ,a (Xia) 0 (xi) 5i ISJSN, , 1_<_J,J’SN . ISOSd (3.3.11) Our main objective is to study the difference between smoothed backfitted esti- mator 111.84 (2:1) and the smoothed “oracle” estimator {133,1 (1:1), both given in (3.2.6). From now on, we assume without loss of generality that d = 2 for notational brevity. . . . _ ON +1 Denote the projection matrix P0 N I — , we define another aux- +l’ N IN iliary entity _1 T N .3; (x2) = P321132) = {(BTB) BTE} PoN,,,1N (B (x))T = 2 6112812 (23). J=l 64 which, in particular, entails that -1 T T N 52 (X12) = {(BTB) BTE} P0N+1,IN (61TH) = Z a.I,2BJ,2 (X12), J=l (3.3.12) in which e,- is the n—dimensional unit vector with i-th element 1 and else 0 and hence the i-th row of matrix B, QTB = B (X,) , is the basis functions corresponding to the i-th Observation Xi. Definitions (3.3.5) and (3.3.6) imply that 52 (3:2) is simply the empirical centering of g?! (11:2), i.e. n N n N 52 ($2) 5 52 ($2)-"—1 :52 (X12) = Z 5J,2BJ,2 (Km-n.1 2: 51,2312 (X12) - ,-=1 1:1 i=1 J=l (3.3.13) Making use Of the signal noise decomposition (3.3.8), the difference my, (3:1) - 613,1 (11:1) + 6 — c can be treated as the sum of two terms "_IZ?=1Kh(Xi1- $1) {m2 (X12) - m2 (X12)} = I($1) + 11 ($1) "-1 Z?=1Kh(X11 -$1) "'12?=1Kh(X11 —$1)’ (3.3.14) where 1 ($1) = "—1 Z Kh (X11 - $1) '52 (X12), (3-3-15) i=1 11 ($1) = "-1 2K); (X11 '- $1) ' {7712 (X12) - m2 (X12)}- (3-3-15) i=1 The term I ($1) is related to the noise terms 52 (X12), while II (2:1) is induced by the bias terms in; (X52)-—m2 (X12) . Propositions 3.3. 1 and 3.3.2 below show respectively that the term I ($1) is of order 0,, (n'2/5), either at a given point or over an interval. This is the most challenging part to be proved, mostly done in Subsection 3.6.1. On the other hand, Proposition 3.3.3 below shows that the bias term II (3:1) is uniformly 65 of order 0,, (n'2/5) for 2:1 6 [0,1], to be proved in Subsection 3.6.2. 
Standard theory of kernel density estimation ensures that the denominator term in (3.3.14), 11‘1 Z?=1Kh(X,~1 -- x1), has a positive lower bound for 2:1 6 [0, l]. The additional nuisance term é—c is of clearly order 0 (n‘l/z) and thus 0,, (n‘2/5) , which needs no further arguments for the proofs. Hence both Theorems 3.2.1 and 3.2.2 follow from Propositions 3.3.1, 3.3.2 and 3.3.3. Section 3.6, therefore, is devoted'exclusively to the proofs Of these three propositions, rather than of the main theoretical results, Theorems 3.2.1 and 3.2.2 themselves. The next three propositions follow respectively from Lemmas 3.6.10 and 3.6. 11, Lemmas 3.6.11 and 3.6.12, Lemmas 3.6.1 and 3.6.2. Proposition 3.3.1. Under Assumptions (A51) to (A56), for any 93] 6 [0,1] (1 ($1)) = 0,, (71-1/2) = a, (71-2/5) . Proposition 3.3.2. Under Assumptions (A51) to (A56) and (A52’) sup Ia: =0 n"1/2lon1/2 =0 n'2/5. xlelmlul): p( {g} ) ,( ) Proposition 3.3.3. Under Assumptions (A51), and (A53) to (A56) ..i‘ié’i'“'=01(‘m=0p)(""2”)- 66 3.4 Simulation and Examples 3.4.1 Simulation In this section, we present simulated results to illustrate the finite-sample behavior of the spline backfitted kernel estimators in”, (2:0,) for any a = 1, ...d. The data set is generated from the regression model Y = 23:1 ma (X a)+a (X)-e. The additive elements are assumed to be ma (ma) = sin (21er) ,Va = 1, ...,d. Similar to Nielsen and Sperlich (2005), the predictors Xa are obtained through the transformation X0 = 2.5 * { (Za) — 0.5}, where (I) is the standard normal distri- bution function and the variable Za ~ N (0,1) ,0: = 1, ...,d with thecorrelation coefficients pug = p, a 74 ,6 for any pair of Z ’3. Now the correlation between X’s is not p any more, it will depend on p. In order to validate the assumption that the density is bounded below from 0, we will focus on the estimation inside [—1, 1]d. Meanwhile, the error term 5 follows standard normal distribution and is indepen- dent of X. The conditional standard deviation function is defined by _ a 100-exp{2§=1 IxaI/d} “ 7' 100 + exp {221:1 Ixal /d}° By this choice of a (x), we ensure that our design is heteroscedastic, and the variance 0 (X) is roughly proportional to dimension d. This proportionality is intended to mimic the case when independent copies of the same kind of univariate regression problems are simply added together. 67 We now describe how the SBLL estimator are implemented. The first step is to obtain the spline estimator of Egg ma (X0), using the truncated power B-spline basis as in (3.2.3). The selection of knots will uniquely define the basis. The knots number N" will be determined by the sample size and two tuning constants, to be specific Nn : min ([Cln2/510gn] + 02, [(n/4 -1)d—1]): in which [c] denotes the integer part of c. In our simulation study, we have used c1 = 1 = c2. The choice of these constants c1 and c2 makes little difference for a large sample. But for small sample size, it does affect the performance to a degree. The additional constraint that N S (n / 4 — 1) d‘1 ensures that the number of terms in the linear least squares problem (3.2.3), 1 + dNn, is no greater than n/4, which is necessary when the sample size n is moderate and dimension d is high. 
The oracle smoother m 3,1 (:01) for comparison is obtained by local linear regression of the unobservable m1 (X 1 )+0 (X) e on X 1 directly, while the oracle SBLL estimators ms) (2:1) are obtained by local linear regression of {121, Xil }:=l° To save space, we only implement the local linear version of mm (2:1), i.e., the SBLL estimator, using the XploRe quantlet “lpregxest”. For information on XploRe, see Hardle, 'Hlavka and Klinke (2000) or visit http://www.xplore-stat.de. We have run 5 = 500 replications for sample sizes n = 100, 200, 500 and 1000 with correlation coefficient p = 0, 0.3 respectively. The dimensions are taken at d = 4, 10. The major objective of this section is to compare the relative efficiency of in”, with 68 respect to mm %2?=1 {film (XiaJ) ‘ "‘0 (Xia'l) }2 [{lxia l '31} a a = l n . 2 , 1,...,d,l = 1,...S 3 25:1 {ms,a (X1111) _ ma (Xia.l)} [{IXiaIISI} effaJ = S 1 effa "—" §§8fi0’[,a=l,u.,d, in which {X,-1,,,...,X,-d,,};‘=1 is the l-th sample, 1 = 1,...,5. Theorems 3.2.1 and 3.2.2 indicate that the efliciency should be close to 1. In particular, when we have an efficiency value bigger than 1, fits“, (2:0) is a better estimator in the sense of mean square distance. The corresponding mean and the standard error (in the parenthesis) of the rel- ative efficiencies for first and third dimension ((1 = 1, 3) is given in Table 4.3. For the case of p = 0, almost of all the mean values are around 1 without noticeable influence from the sample size and the correlation. The trend of standard errors confirm the comparability of SBLL Thad to the oracle estimator fits“), with faster convergence for a larger sample. At p = 0 and all the random selected directions, the SBLL performs better than the oracle local linear estimator in most cases because the independent components can be well—estimated at the first stage, then univariate local linear smoothing at the second stage will treat less noise than the case of direct oracle estimator, the local linear estimator. In the cases of p = 0.3, the trend to relative efficiency 1 is very clear regardless of the dimension d. All the means are becoming larger accordingly and approaching to 1 steadily when the sample size becomes bigger. Typically, the relative efficiencies are greater than 0.97 for d = 4 with sample size 200, and for d = 10 with sample size 69 500 respectively. We believe that in high dimensional cases the convergence rate is slower than in lower dimensional cases when the predictors are strongly correlated. The standard errors in the parenthesis follow the same trend that less variation is with larger sample size, though it shows slower convergence compared to the case of p = 0, which is not unexpected. In addition, several figures display the features of the relative efficiencies in details. In Figuras 4.6 and 4.7 four types of line characteristics which correspond to the four sample sizes, the solid line (100), the dotted line (200), the thin line (500) and the thick line (1000). The vertical line at efficiency 1 is the standard line for the comparison of mm (:51) and {7'sz (x1) . More efficiency values distributed around the vertical line would be confirmative to the conclusions of Theorems 3.2.1 and 3.2.2. All the curves in Figures 4.6 and 4.7 are the density estimates of relative efficiency distributions for specific sample size n, correlation coefficient p and dimension d. With increasing sample sizes, we found that the relative efficiency are becoming closer to the vertical standard line, with narrower spread out. 
In addition, the curves with $\rho=0$ show faster convergence to the vertical line than those with $\rho=0.3$ in all cases. An interesting point is that almost all of the peak points of the thick line (the largest sample size) fall very close to the vertical lines. All of the above confirms the theoretical result that the SBLL estimator behaves similarly to the oracle local linear estimator. We have carried out some further simulations with $d=50$ and $S=100$ replications, for $\rho=0,0.3$ and $n=500,1000,1500,2000$; the results are graphically represented in Figures 4.8 and 4.9. The basic graphical pattern is similar to that for the lower dimensions $d=4,10$, though with a slower convergence rate and relatively lower efficiency. The corresponding statistics are listed in Table 4.4.

3.4.2 Boston Housing Example

In this section we apply our method to the Boston housing data. The data file bostonh.dat is available with the software XploRe. The data set contains 506 different houses from a variety of locations in the Boston Standard Metropolitan Statistical Area in 1970. The median value and 13 sociodemographic statistics of the Boston houses were first studied by Harrison and Rubinfeld (1978) to estimate a housing price index model. Breiman and Friedman (1985) carried out further analysis to deal with the multicollinearity among the independent variables. Using a stepwise method, they proposed the alternating conditional expectation method to select a subset of the variables that maximizes the correlation between the fitted value and the selected covariates; four variables were selected after penalizing for overfitting. Opsomer and Ruppert (1998) illustrated their automated bandwidth selection for fitting additive models based on the selected four variables. We use the same four covariates for our model fitting and the current analysis. The response and explanatory variables of interest are:

MEDV: median value of owner-occupied homes in $1000's
RM: average number of rooms per dwelling
TAX: full-value property-tax rate per $10,000
PTRATIO: pupil-teacher ratio by town school district
LSTAT: proportion of the population that is of "lower status", in %

One major concern is the big gaps in the domains of the variables TAX and LSTAT, which would cause severe trouble at the first stage of spline estimation, so a logarithmic transformation is applied to these two variables before fitting the model. We fit the following additive model:
$$\mathrm{MEDV}=\mu+m_1\left(\mathrm{RM}\right)+m_2\left(\log\left(\mathrm{TAX}\right)\right)+m_3\left(\mathrm{PTRATIO}\right)+m_4\left(\log\left(\mathrm{LSTAT}\right)\right)+\varepsilon.$$
Although the transformation has shrunk the gaps in the domains, some compromise is still necessary in estimating the components, since we select the same number of knots for each direction; in this case we choose a large number of knots, $N=5$. In the smoothing step, we use the SBLL estimator to obtain the final function estimate for each input variable. In Figure 4.10, the univariate function estimates and corresponding confidence bands are displayed together with the "pseudo data points", whose pseudo-responses are the backfitted responses after subtracting the sum of the functions of the remaining three covariates, as in (3.2.5). All the function estimates are represented by dotted lines, the "data points" by circles, and the confidence bands by upper and lower thin lines. The kernel used in the SBLL estimator is the quartic kernel, $K\left(u\right)=\frac{15}{16}\left(1-u^2\right)^2$ for $-1<u<1$; a sketch of this second-stage smoother is given below. Besides the estimation of the component functions, we also use the proposed confidence bands to test the linearity of the components.
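As an illustration of the second-stage smoother, the following minimal Python sketch implements a quartic-kernel local linear estimator. It is a generic stand-in under the kernel defined above, not the XploRe quantlet lpregxest used for the reported results; the helper names quartic and local_linear are ours.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel K(u) = 15/16 * (1 - u^2)^2 on |u| < 1."""
    return np.where(np.abs(u) < 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

def local_linear(x0, x, y, h):
    """Local linear estimate of m(x0) from pseudo-responses y."""
    w = quartic((x - x0) / h)
    # weighted least squares of y on (1, x - x0); the intercept estimates m(x0)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]

# toy usage on a known curve
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 400)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(400)
grid = np.linspace(-0.9, 0.9, 7)
print([round(local_linear(g, x, y, h=0.15), 2) for g in grid])
```

In the actual analysis, y would be the pseudo-responses from the first-stage spline fit rather than raw observations.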
In Figure 4.10 the straight solid lines are the regression lines with the least squares coefficients. The first panel shows that the linearity null hypothesis $H_0:m_1\left(\mathrm{RM}\right)=a_1+b_1\cdot\mathrm{RM}$ is rejected, since the confidence bands with confidence level 0.99 cannot totally cover the straight regression line, i.e., the p-value is less than 0.01. Similarly, linearity of the component functions of $\log\left(\mathrm{TAX}\right)$ and $\log\left(\mathrm{LSTAT}\right)$ is not accepted at the significance level 0.01. In contrast, the least squares line of the variable PTRATIO in the upper right panel falls entirely between the upper and lower 95% confidence bands, so the linearity null hypothesis $H_0:m_3\left(\mathrm{PTRATIO}\right)=a_3+b_3\cdot\mathrm{PTRATIO}$ is accepted at the significance level 0.05. In addition, we add up all the SBLL estimates of the component functions and the mean response as an estimate of the response (MEDV). The correlation between this estimate and the raw values of MEDV is as high as 0.80112, implying a rather satisfactory fit.

3.5 Conclusions

In this paper we have proposed the SBK and SBLL estimators for the component functions in an additive regression model. These estimators behave asymptotically like the standard Nadaraya-Watson and local linear estimators in one dimension, thus breaking the problem of $d$-dimensional additive regression into $d$ univariate regression problems. This is achieved by approximating the unobservable sample $\left\{Y_{i1},X_{i1}\right\}_{i=1}^{n}$ with the spline-estimated sample $\left\{\hat{Y}_{i1},X_{i1}\right\}_{i=1}^{n}$. Although much mathematics is devoted to proving that this approximation works, the implementation is very easy; a minimal sketch of the first-stage construction appears below. To give some idea of how fast the procedure is, running 100 replications for sample sizes $n=500,1000,1500,2000$ and dimension as high as $d=50$ takes about 40 minutes on a Dell notebook; in other words, within this time span a total of $100\times4=400$ SBLL estimators $\hat{m}_{S,1}\left(x_1\right)$ and the same number of oracle smoothers $\tilde{m}_{S,1}\left(x_1\right)$ are computed. In addition, the SBK and SBLL estimators inherit the asymptotic confidence bands (3.2.10) of the univariate Nadaraya-Watson and local linear estimators. The combination of speed and global accuracy for very high dimensional regression is very appealing.
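To make the two-stage construction concrete, here is a minimal Python sketch of the first stage under the simulation design above. It is an illustration under stated assumptions, not the XploRe implementation used in this chapter: the helpers centered_constant_basis and pseudo_responses are hypothetical names, the basis is the empirically centered constant spline basis, and the minimum-norm least squares solution stands in for the constrained fit of (3.2.3).

```python
import numpy as np

def centered_constant_basis(x, n_knots):
    """Empirically centered constant spline basis B*_J = I_J - mean(I_J)
    on n_knots + 1 equal subintervals of the observed range."""
    edges = np.linspace(x.min(), x.max() + 1e-12, n_knots + 2)
    idx = np.searchsorted(edges, x, side="right") - 1
    B = np.zeros((x.size, n_knots + 1))
    B[np.arange(x.size), idx] = 1.0
    return B - B.mean(axis=0)

def pseudo_responses(X, Y, n_knots, alpha=0):
    """Stage 1: additive least squares on centered constant splines; returns
    Y_i - c_hat - sum_{b != alpha} m_hat_b(X_ib), the stage-2 input."""
    n, d = X.shape
    blocks = [centered_constant_basis(X[:, b], n_knots) for b in range(d)]
    design = np.column_stack([np.ones(n)] + blocks)
    # min-norm solution handles the mild rank deficiency of centered blocks
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    fit_other, pos = np.zeros(n), 1
    for b in range(d):
        k = blocks[b].shape[1]
        if b != alpha:
            fit_other += blocks[b] @ coef[pos:pos + k]
        pos += k
    return Y - coef[0] - fit_other
```

The returned pseudo-responses would then be smoothed against $X_{i\alpha}$ by the quartic-kernel local linear smoother sketched earlier to give the SBLL estimator of $m_\alpha$.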
3.6 Proof of Theorems

3.6.1 Variance Reduction

In this subsection we prove Propositions 3.3.1 and 3.3.2. The magnitude of the variance term $I\left(x_1\right)$ in (3.3.15) can be measured by its conditional second moment given $X_1,\dots,X_n$. Based on (3.3.13) and (3.3.15), the conditional second moment $E\left\{I^2\left(x_1\right)|\mathbf{X}\right\}$ of $I\left(x_1\right)$ given $\mathbf{X}=\left\{\mathbf{X}_1,\dots,\mathbf{X}_n\right\}$ is
$$E\left[\left\{n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\hat{\varepsilon}_2\left(X_{l2}\right)-n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\cdot n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_2\left(X_{i2}\right)\right\}^{2}\Bigg|\,\mathbf{X}\right].$$
It is clear that $E\left\{I^2\left(x_1\right)|\mathbf{X}\right\}=E\left\{I_1^2\left(x_1\right)|\mathbf{X}\right\}-E\left\{I_2^2\left(x_1\right)|\mathbf{X}\right\}$, where for brevity we write
$$I_1\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\hat{\varepsilon}_2\left(X_{l2}\right),\qquad(3.6.1)$$
$$I_2\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\cdot n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_2\left(X_{i2}\right).\qquad(3.6.2)$$
If one further denotes
$$\xi_J\left(\mathbf{X}_l,x_1\right)=K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right),\qquad(3.6.3)$$
then
$$I_1\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\sum_{J=1}^{N}\hat{a}_{J,2}B_{J,2}\left(X_{l2}\right)=n^{-1}\sum_{l=1}^{n}\sum_{J=1}^{N}\hat{a}_{J,2}\xi_J\left(\mathbf{X}_l,x_1\right).\qquad(3.6.4)$$
In order to obtain the order of the conditional second moment of $I_1\left(x_1\right)$, we first find the supremum magnitudes of $E\xi_J\left(\mathbf{X}_l,x_1\right)$ and $\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)$, and the size of $\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|$, in Lemmas 3.6.3, 3.6.4 and 3.6.7. Consequently, Lemma 3.6.10 shows that $\sup_{x_1\in[0,1]}E\left\{I_1^2\left(x_1\right)|\mathbf{X}\right\}=O_p\left(n^{-1}\right)$, and in Lemma 3.6.11 we have $\sup_{x_1\in[0,1]}\left|I_2\left(x_1\right)\right|=O_p\left(Nn^{-1}\sqrt{\log n}\right)$. Based on the selection $N\sim n^{2/5}\log n$, Proposition 3.3.1 is thus proved. Lemma 3.6.12 requires one more assumption, (AS2'), in addition to Assumptions (AS1) to (AS6): under the new restrictions, the order of $I_1\left(x_1\right)$ is obtained uniformly over $[0,1]$, inflated only by a factor of $\left\{\log\left(n\right)\right\}^{1/2}$ compared with the pointwise case, so that $\sup_{x_1\in[0,1]}\left|I_1\left(x_1\right)\right|=O_p\left(n^{-1/2}\log^{1/2}n\right)$. Again, owing to the selection of the interval width $H\sim\left(n^{2/5}\log n\right)^{-1}$, the order $O_p\left(Nn^{-1}\sqrt{\log n}\right)$ of $\sup_{x_1\in[0,1]}\left|I_2\left(x_1\right)\right|$ in Lemma 3.6.11 is negligible compared with the order of $\sup_{x_1\in[0,1]}\left|I_1\left(x_1\right)\right|$. So under Assumptions (AS1) to (AS6) and (AS2'), we have established the uniform bound over $[0,1]$ of Proposition 3.3.2.

3.6.2 Bias Reduction

We now prove Proposition 3.3.3 by bounding the bias term $II\left(x_1\right)$ in (3.3.16). We first cite one important result from page 149 of de Boor (2001).

Theorem 3.6.1. Under Assumption (AS1), $m_\alpha\in\mathrm{Lip}\left([0,1],C_\infty\right)$, there exists a function $g_\alpha\in G[0,1]$ such that, for all $\alpha=1,\dots,d$,
$$\left\|g_\alpha-m_\alpha\right\|_\infty\leq C_\infty H.\qquad(3.6.5)$$

Lemma 3.6.1. Under Assumptions (AS1), (AS3) and (AS6), for the spline function $g_2$ satisfying (3.6.5), one has
$$\sup_{x_1\in[0,1]}\left|\frac{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{g_2\left(X_{i2}\right)-m_2\left(X_{i2}\right)\right\}}{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)}\right|\leq C_\infty H,\qquad(3.6.6)$$
and for $\alpha=1,2$,
$$E_ng_\alpha\left(X_\alpha\right)=n^{-1}\sum_{i=1}^{n}g_\alpha\left(X_{i\alpha}\right)=O_p\left(n^{-1/2}+H\right).\qquad(3.6.7)$$

PROOF. The first inequality (3.6.6) follows trivially from (3.6.5). To prove the second, define a function $g\left(\mathbf{x}\right)=c+\sum_{\alpha=1}^{2}g_\alpha\left(x_\alpha\right)$; then $\left\|g-m\right\|_\infty\leq2C_\infty H$ and hence $\left\|g-m\right\|_{2,n}\leq2C_\infty H$. The definition of projection in Hilbert space then implies that $\left\|\hat{m}-m\right\|_{2,n}\leq\left\|g-m\right\|_{2,n}\leq2C_\infty H$, where $\hat{m}$ is the projection of $m$ onto the space $G$ with respect to $\left\langle\cdot,\cdot\right\rangle_{2,n}$, and the triangle inequality implies that
$$\left\|\hat{m}-g\right\|_{2,n}\leq4C_\infty H.\qquad(3.6.8)$$
Now (3.6.5) leads to $\left|E_ng_\alpha\left(X_\alpha\right)-E_nm_\alpha\left(X_\alpha\right)\right|\leq C_\infty H$, while $Em_\alpha\left(X_\alpha\right)=0$ leads to $E_nm_\alpha\left(X_\alpha\right)=O_p\left(n^{-1/2}\right)$. Putting these together, one has
$$\left|E_ng_\alpha\left(X_\alpha\right)\right|\leq\left|E_ng_\alpha\left(X_\alpha\right)-E_nm_\alpha\left(X_\alpha\right)\right|+\left|E_nm_\alpha\left(X_\alpha\right)\right|\leq C_\infty H+O_p\left(n^{-1/2}\right),\qquad(3.6.9)$$
which establishes (3.6.7). □

In order to show that the bias term $II\left(x_1\right)$ defined in (3.3.16) is uniformly $o_p\left(n^{-2/5}\right)$, the following lemma suffices.

Lemma 3.6.2. Under Assumptions (AS1) to (AS6), as $n\to\infty$,
$$\sup_{x_1\in[0,1]}\left|\frac{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}}{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)}\right|=O_p\left(n^{-1/2}+H\right).\qquad(3.6.10)$$

PROOF. Using the same notation as in the proof of Lemma 3.6.1, (3.6.8) and (3.6.9) now give
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2,n}\leq8C_\infty H+O_p\left(n^{-1/2}\right),$$
and Lemma 3.6.8 then entails that
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}=O_p\left(n^{-1/2}+H\right).\qquad(3.6.11)$$
To complete the proof of the lemma, we write
$$\left(\hat{m}-g\right)\left(\mathbf{x}\right)+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)=\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}B_{J,\alpha}^{*}\left(x_\alpha\right),$$
where the empirically centered spline basis functions are
$$B_{J,\alpha}^{*}\left(x_\alpha\right)=B_{J,\alpha}\left(x_\alpha\right)-E_nB_{J,\alpha}\left(X_\alpha\right)=B_{J,\alpha}\left(x_\alpha\right)-n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right),$$
for any $1\leq J\leq N$, $1\leq\alpha\leq2$. Then for $\alpha=1,2$,
$$\hat{m}_\alpha\left(x_\alpha\right)-g_\alpha\left(x_\alpha\right)+E_ng_\alpha\left(X_\alpha\right)=\sum_{J=1}^{N}\hat{a}_{J,\alpha}B_{J,\alpha}^{*}\left(x_\alpha\right),$$
and according to (3.6.19) one has
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}^{2}\geq c_0\left[\left\{\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}E_nB_{J,\alpha}\left(X_\alpha\right)\right\}^{2}+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}^{2}\right].\qquad(3.6.12)$$
Now
$$n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}=n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\sum_{J=1}^{N}\hat{a}_{J,2}B_{J,2}^{*}\left(X_{i2}\right),$$
which is bounded by
$$\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{\sup_{1\leq J\leq N}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)B_{J,2}\left(X_{i2}\right)\right|+\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\right|\sup_{1\leq J\leq N}\left|E_nB_{J,2}\left(X_2\right)\right|\right\},$$
which can be rewritten, according to the definitions of $\xi_J\left(\mathbf{X}_i,x_1\right)$ in (3.6.3) and of $A_{n,1}^{*}$ in (3.6.28), as
$$\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{\sup_{1\leq J\leq N}\left|n^{-1}\sum_{i=1}^{n}\xi_J\left(\mathbf{X}_i,x_1\right)\right|+A_{n,1}^{*}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\right|\right\}.$$
The Minkowski inequality, Lemma 3.6.5, (3.6.29) and standard properties of the kernel density estimator now imply that
$$\sup_{x_1\in[0,1]}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}\right|\leq\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{O_p\left(H^{1/2}\right)+O_p\left(\sqrt{\log n/n}\right)\right\}$$
$$=O_p\left(H^{1/2}\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\right)=O_p\left\{\left(\sum_{J=1}^{N}\hat{a}_{J,2}^{2}\right)^{1/2}\right\}=O_p\left[\left\{\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}E_nB_{J,\alpha}\left(X_\alpha\right)\right\}^{2}+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}^{2}\right]^{1/2},$$
which according to (3.6.11) and (3.6.12) is
$$=O_p\left(\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}\right)=O_p\left(n^{-1/2}+H\right),$$
thus proving (3.6.10). □

Now combining Lemmas 3.6.1 and 3.6.2, one immediately gets
$$\sup_{x_1\in[0,1]}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-m_2\left(X_{i2}\right)\right\}\right|=O_p\left(n^{-1/2}+H\right)=o_p\left(n^{-2/5}\right),$$
which establishes Proposition 3.3.3.

3.6.3 Technical Lemmas

In this subsection we collect the auxiliary results used in Subsections 3.6.1 and 3.6.2.

Lemma 3.6.3. Under Assumptions (AS3) to (AS6), one has
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|=O\left(H^{1/2}\right).$$

PROOF. Define, for $\alpha=1,2$ and $J=1,\dots,N+1$,
$$c_{J,\alpha}=\left\|I_{J,\alpha}\right\|_{2}^{2}=\int I_{J,\alpha}^{2}\left(x_\alpha\right)f_\alpha\left(x_\alpha\right)dx_\alpha;$$
then $b_{J,\alpha}\left(x_\alpha\right)$ in (3.3.2) can be written as $b_{J,\alpha}\left(x_\alpha\right)=I_{J+1,\alpha}\left(x_\alpha\right)-c_{J+1,\alpha}I_{J,\alpha}\left(x_\alpha\right)/c_{J,\alpha}$, and $\left\|b_{J,\alpha}\right\|_{2}^{2}=c_{J+1,\alpha}\left(1+c_{J+1,\alpha}/c_{J,\alpha}\right)$ for all $\alpha=1,2$, $J=1,\dots,N$. In Assumption (AS3) the two positive constants $c_f,C_f$ are the lower and upper bounds of all the marginal densities $f_\alpha\left(x_\alpha\right)$; then for all $J=1,\dots,N+1$, $\alpha=1,2$,
$$c_fH\leq c_{J,\alpha}\leq C_fH.\qquad(3.6.13)$$
Then for all $\alpha=1,2$, $J=1,\dots,N$, $\left\|b_{J,\alpha}\right\|_{2}^{2}\sim H$, or more precisely
$$c_f\left(1+c_f/C_f\right)H\leq\left\|b_{J,\alpha}\right\|_{2}^{2}\leq C_f\left(1+C_f/c_f\right)H.\qquad(3.6.14)$$
The absolute expected value of $\xi_J\left(\mathbf{X}_l,x_1\right)$ is
$$\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|=\left|E\left\{K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}\right|\leq\iint K_h\left(u_1-x_1\right)\left|B_{J,2}\left(u_2\right)\right|f\left(u_1,u_2\right)du_1du_2$$
$$=\iint K\left(v_1\right)\frac{\left|b_{J,2}\left(u_2\right)\right|}{\left\|b_{J,2}\right\|_{2}}f\left(hv_1+x_1,u_2\right)dv_1du_2$$
$$\leq\left\|b_{J,2}\right\|_{2}^{-1}\left\{\iint K\left(v_1\right)I_{J+1,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2+\frac{c_{J+1,2}}{c_{J,2}}\iint K\left(v_1\right)I_{J,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2\right\}.$$
The boundedness of the joint density $f$ and the Lipschitz continuity of the kernel $K$ then imply that
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\iint K\left(v_1\right)I_{J,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2\leq C_KC_fH,$$
and the proof of the lemma is completed. □

Lemma 3.6.4. Denote by $D_n$ a set of points in $[0,1]$ with cardinality $M_n=\left|D_n\right|$ of order $n^\delta$, i.e., there exist constants $0<c_D<C_D$ such that $c_Dn^\delta\leq M_n\leq C_Dn^\delta$. Then under Assumptions (AS3) to (AS6),
$$\sup_{x_1\in D_n}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\left\{\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}\right|=O_p\left(\sqrt{\frac{\log n}{nh}}\right).\qquad(3.6.15)$$

PROOF. For simplicity, denote $\xi_J^{*}\left(\mathbf{X}_l,x_1\right)=\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)$. First we compute the moments of the centered random variable $\xi_J^{*}\left(\mathbf{X}_l,x_1\right)$ for later use in Bernstein's inequality:
$$E\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}^{2}=E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)-\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2},$$
in which the first term is
$$E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)=E\left\{K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}^{2}=\iint\frac{K^{2}\left(v_1\right)}{h}\left\{I_{J+1,2}\left(u_2\right)+\frac{c_{J+1,2}^{2}}{c_{J,2}^{2}}I_{J,2}\left(u_2\right)\right\}\frac{f\left(hv_1+x_1,u_2\right)}{\left\|b_{J,2}\right\|_{2}^{2}}dv_1du_2,$$
so there exist constants $c',C'>0$ such that $c'h^{-1}\leq E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)\leq C'h^{-1}$. Then $E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)\gg\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2}$, where $a_n\gg b_n$ means $\lim_{n\to\infty}b_n/a_n=0$. Hence
$$E\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}^{2}=E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)-\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2}\geq c''h^{-1}$$
for a positive constant $c''<c'$.
When $k\geq3$, the $k$-th moment $E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}$ is
$$\left\{\left\|b_{J,2}\right\|_{2}\right\}^{-k}\iint K_h^{k}\left(u_1-x_1\right)\left\{I_{J+1,2}\left(u_2\right)+\left(\frac{c_{J+1,2}}{c_{J,2}}\right)^{k}I_{J,2}\left(u_2\right)\right\}f\left(u_1,u_2\right)du_1du_2,$$
and it can be bounded as follows:
$$c_k'h^{1-k}H^{1-k/2}\left\{1+\left(\frac{c_f}{C_f}\right)^{k}\right\}\leq E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq C_k'h^{1-k}H^{1-k/2}\left\{1+\left(\frac{C_f}{c_f}\right)^{k}\right\}.$$
Lemma 3.6.3 implies $\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq CH^{k/2}$, so $E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\gg\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}$. The $k$-th moment of the centered variable can then be bounded as
$$E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{k}=E\left|\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq2^{k}\left(E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}+\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\right)$$
$$\leq C_12^{k-1}h^{1-k}H^{1-k/2}\left(\frac{C_f}{c_f}\right)^{k}k!\leq\left\{C_2h^{-1}H^{-1/2}\right\}^{k-2}k!\,E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{2},$$
so with the constant $c^{*}=C_2h^{-1}H^{-1/2}$ one has $E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq\left(c^{*}\right)^{k-2}k!\,E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{2}$; that is, the sequence of random variables $\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}_{l=1}^{n}$ satisfies the Cramér condition. Hence by Bernstein's inequality we have
$$P\left\{\left|n^{-1}\sum_{l=1}^{n}\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|\geq\delta\sqrt{\frac{\log n}{nh}}\right\}\leq2\exp\left\{\frac{-\delta^{2}\log n}{c^{*}+2C_2\delta H^{-1/2}\sqrt{\log n/\left(nh\right)}}\right\}.$$
There exists a large enough value $\delta>0$ such that $\delta^{2}/\left\{c^{*}+2C_2\delta H^{-1/2}\sqrt{\log n/\left(nh\right)}\right\}\geq10$; then
$$\sum_{n=1}^{\infty}P\left\{\sup_{x_1\in D_n}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|\geq\delta\sqrt{\frac{\log n}{nh}}\right\}\leq2\sum_{n=1}^{\infty}NM_nn^{-10}\leq2C\sum_{n=1}^{\infty}n^{-3}<\infty.$$
The Borel-Cantelli lemma then implies (3.6.15). □

Lemma 3.6.5. Under Assumptions (AS3) to (AS6),
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right|=O_p\left(H^{1/2}\right).$$

PROOF. Denote, for $x_1\in[0,1]$, $A\left(x_1\right)=\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right|$. Choose the set $D_n$ of Lemma 3.6.4 to consist of equally spaced points in $[0,1]$, specifically $D_n=\left\{x_{1,k},0\leq k\leq M_n;0=x_{1,0}<x_{1,1}<\cdots<x_{1,M_n}=1\right\}$, so that consecutive points define a total of $M_n$ subintervals of length $M_n^{-1}$. Employing the discretization method, we have
$$\sup_{x_1\in[0,1]}\left|A\left(x_1\right)\right|\leq\sup_{0\leq k\leq M_n}\left|A\left(x_{1,k}\right)\right|+\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|A\left(x_1\right)-A\left(x_{1,k}\right)\right|.\qquad(3.6.16)$$
We only need to bound the second term, since Lemmas 3.6.3 and 3.6.4 and the fact that $H^{1/2}\gg\sqrt{\log n/\left(nh\right)}$ yield
$$\sup_{0\leq k\leq M_n}\left|A\left(x_{1,k}\right)\right|=O_p\left(H^{1/2}\right).\qquad(3.6.17)$$
Employing the Lipschitz continuity of the kernel $K$, one has
$$\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|K_h\left(X_{l1}-x_1\right)-K_h\left(X_{l1}-x_{1,k}\right)\right|\leq\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}C_K\frac{\left|x_1-x_{1,k}\right|}{h^{2}}\leq C_KM_n^{-1}h^{-2}.\qquad(3.6.18)$$
Hence we have
$$\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|A\left(x_1\right)-A\left(x_{1,k}\right)\right|\leq\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)-n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_{1,k}\right)\right|$$
$$\leq C_KM_n^{-1}h^{-2}\sup_{x_2\in[0,1]}\sup_{1\leq J\leq N}\left|B_{J,2}\left(x_2\right)\right|=O\left(M_n^{-1}h^{-2}H^{-1/2}\right)=O\left(n^{-1}\right),$$
since $c_Dn^\delta\leq M_n\leq C_Dn^\delta$ in Lemma 3.6.4. The lemma follows immediately from (3.6.16), (3.6.17) and the above result. □

Lemma 3.6.6. Under Assumptions (AS3) and (AS6), there exist constants $C_0>c_0>0$ such that
$$c_0\left(a_0^{2}+\sum_{J,\alpha}a_{J,\alpha}^{2}\right)\leq\left\|a_0+\sum_{J,\alpha}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\leq C_0\left(a_0^{2}+\sum_{J,\alpha}a_{J,\alpha}^{2}\right),\qquad(3.6.19)$$
for any $\mathbf{a}=\left(a_0,a_{1,1},\dots,a_{N,1},a_{1,2},\dots,a_{N,2}\right)^{T}\in\mathbb{R}^{2N+1}$.

PROOF. According to Lemma 1 in Stone (1985), there exists a constant $c_0>0$ such that
$$\left\|a_0+\sum_{J,\alpha}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\geq c_0\left(a_0^{2}+\left\|\sum_{J=1}^{N}a_{J,1}B_{J,1}\right\|_{2}^{2}+\left\|\sum_{J=1}^{N}a_{J,2}B_{J,2}\right\|_{2}^{2}\right).$$
If it can be proved that there exist constants $C_0'>c_0'>0$ such that for $\alpha=1,2$,
$$c_0'\sum_{J=1}^{N}a_{J,\alpha}^{2}\leq\left\|\sum_{J=1}^{N}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\leq C_0'\sum_{J=1}^{N}a_{J,\alpha}^{2},\qquad(3.6.20)$$
then (3.6.19) follows. To prove (3.6.20), the original B-spline basis is employed; without loss of generality we only provide the proof for $\alpha=1$.
We pick the constant basis $\left\{I_{J,1}\left(x_1\right)\right\}_{J=1}^{N+1}$ and represent the term $\sum_{J=1}^{N}a_{J,1}B_{J,1}\left(x_1\right)$ as follows:
$$\sum_{J=1}^{N}a_{J,1}B_{J,1}\left(x_1\right)=\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\left(x_1\right).\qquad(3.6.21)$$
Theorem 5.4.2 in DeVore and Lorentz (1993) establishes an equivalence between the $L^{p}$ ($p>0$) norm of a B-spline function and the norm of its sequence of B-spline coefficients. To be specific, in our case
$$\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}=\int\left\{\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\left(x_1\right)\right\}^{2}dx_1=\sum_{J=1}^{N+1}d_{J,1}^{2}H.$$
Since by Assumption (AS3) the joint density is bounded between $c_f$ and $C_f$, we have
$$c_f\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}\leq\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{2}^{2}\leq C_f\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}.$$
The equality (3.6.21) and (3.6.14) lead to
$$\sum_{J=1}^{N+1}d_{J,1}^{2}=\sum_{J=1}^{N}\frac{a_{J,1}^{2}}{\left\|b_{J,1}\right\|_{2}^{2}}\left\{\left(\frac{c_{J+1,1}}{c_{J,1}}\right)^{2}+1\right\}\ \Rightarrow\ c_d\sum_{J=1}^{N}a_{J,1}^{2}H^{-1}\leq\sum_{J=1}^{N+1}d_{J,1}^{2}\leq C_d\sum_{J=1}^{N}a_{J,1}^{2}H^{-1},$$
for positive constants $c_d$ and $C_d$. Therefore
$$c_fc_d\sum_{J=1}^{N}a_{J,1}^{2}\leq\left\|\sum_{J=1}^{N}a_{J,1}B_{J,1}\right\|_{2}^{2}=\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{2}^{2}\leq C_fC_d\sum_{J=1}^{N}a_{J,1}^{2},$$
i.e., (3.6.20) holds with $c_0'=c_fc_d$, $C_0'=C_fC_d$. □

Lemma 3.6.7. Under Assumptions (AS1) to (AS6), the least squares solution $\hat{\mathbf{a}}$ defined in (3.3.9) satisfies
$$\hat{\mathbf{a}}^{T}\hat{\mathbf{a}}=\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}=O_p\left(\frac{N}{n}\right).\qquad(3.6.22)$$

PROOF. According to (3.3.9), $\hat{\mathbf{a}}=\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}$, so
$$\hat{\mathbf{a}}^{T}\mathbf{B}^{T}\mathbf{B}\hat{\mathbf{a}}=\left(\hat{\mathbf{a}}^{T}\mathbf{B}^{T}\mathbf{B}\right)\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}=\hat{\mathbf{a}}^{T}\left(\mathbf{B}^{T}\mathbf{E}\right).$$
Replacing $\mathbf{B}^{T}\mathbf{B}$ with the matrix of inner products $\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}$, as the matrix $\mathbf{B}$ is given in (3.3.10), one has
$$\left\|\hat{\mathbf{a}}^{T}\mathbf{B}\right\|_{2,n}^{2}=\hat{\mathbf{a}}^{T}\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}\hat{\mathbf{a}}=\hat{\mathbf{a}}^{T}\left(n^{-1}\mathbf{B}^{T}\mathbf{E}\right).\qquad(3.6.23)$$
Based on (3.6.19), the left-hand side of (3.6.23) is bounded below by
$$\left(1-A_n\right)\left\|\mathbf{B}\hat{\mathbf{a}}\right\|_{2}^{2}=\left(1-A_n\right)\left\|\hat{a}_0+\sum_{J,\alpha}\hat{a}_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\geq c_0\left(1-A_n\right)\left(\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}\right),\qquad(3.6.24)$$
where $A_n$ is of order $o_p\left(1\right)$ by Lemma 3.6.8 and the last step in (3.6.24) is obtained from (3.6.19). Meanwhile, by the Cauchy-Schwarz inequality and the expression of $\hat{\mathbf{a}}$ in (3.3.11), the right-hand side of (3.6.23) is bounded from above by
$$\left(\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}\right)^{1/2}\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right]^{1/2}.\qquad(3.6.25)$$
Now (3.6.23), (3.6.24) and (3.6.25) imply that $\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}$ is no greater than
$$c_0^{-2}\left(1-A_n\right)^{-2}\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right].$$
It is trivial to verify that
$$E\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right]=O\left(n^{-1}N\right).$$
Therefore (3.6.22) holds. □

Lemma 3.6.8. Under Assumptions (AS3) and (AS4), the uniform supremum of the rescaled difference between $\left\langle g_1,g_2\right\rangle_{2,n}$ and $\left\langle g_1,g_2\right\rangle_{2}$ satisfies
$$A_n=\sup_{g_1,g_2\in G}\frac{\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|}{\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}}=O_p\left(\sqrt{\frac{\log n}{nH}}\right)=o_p\left(1\right).\qquad(3.6.26)$$

PROOF. Let
$$g_1\left(x_1,x_2\right)=a_0+\sum_{J=1}^{N}\sum_{\alpha=1}^{2}a_{J,\alpha}B_{J,\alpha}\left(x_\alpha\right),\qquad g_2\left(x_1,x_2\right)=a_0'+\sum_{J'=1}^{N}\sum_{\alpha'=1}^{2}a'_{J',\alpha'}B_{J',\alpha'}\left(x_{\alpha'}\right),$$
in which, for any $J,J'=1,\dots,N$ and $\alpha,\alpha'=1,2$, the $a_{J,\alpha}$ and $a'_{J',\alpha'}$ are real constants. The difference between the empirical and theoretical inner products of $g_1$ and $g_2$ is bounded by
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq\sum_{J,\alpha}\left|a_0'a_{J,\alpha}\right|\left|E_nB_{J,\alpha}\left(X_\alpha\right)\right|+\sum_{J',\alpha'}\left|a_0a'_{J',\alpha'}\right|\left|E_nB_{J',\alpha'}\left(X_{\alpha'}\right)\right|$$
$$+\sum_{J,J',\alpha,\alpha'}\left|a_{J,\alpha}\right|\left|a'_{J',\alpha'}\right|\left|\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}-\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2}\right|,\qquad(3.6.27)$$
where
$$A_{n,1}^{*}=\sup_{1\leq J\leq N,\ \alpha=1,2}\left|E_nB_{J,\alpha}\left(X_\alpha\right)\right|,\qquad(3.6.28)$$
$$A_{n,2}^{*}=\sup_{1\leq J,J'\leq N,\ \alpha,\alpha'=1,2}\left|\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}-\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2}\right|.$$
An application of Bernstein's inequality, as in Lemma 3.6.4, gives
$$A_{n,1}^{*}=O_p\left(\sqrt{\frac{\log n}{n}}\right).\qquad(3.6.29)$$
The equivalence of norms given in (3.6.19) bounds each of the three sums in (3.6.27) by a constant multiple of $A_{n,1}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}$ or $A_{n,2}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}$, so that
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq\left\{\left(C_{A,1}+C_{A,1}'\right)A_{n,1}^{*}+C_{A,2}A_{n,2}^{*}\right\}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}.$$
If we can show that
$$A_{n,2}^{*}=O_p\left(\sqrt{\log n/\left(nH\right)}\right),\qquad(3.6.30)$$
then, given the fact that $\sqrt{\log n/\left(nH\right)}\gg\sqrt{\log n/n}$ based on the selection $H^{-1}\sim n^{2/5}\log n$, there exists a constant $C_A>0$ such that
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq C_AA_{n,2}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2},$$
and the order $O_p\left(\sqrt{\log n/\left(nH\right)}\right)$ of $A_n$ will be established as in the statement (3.6.26).
The proof of (3.6.30) proceeds case by case over the various $\alpha,\alpha',J$ and $J'$, via Bernstein's inequality. For brevity, we set
$$\eta_i=n^{-1}\left[B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)-E\left\{B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)\right\}\right],$$
so that $A_{n,2}^{*}=\sup_{1\leq J,J'\leq N,\ \alpha,\alpha'=1,2}\left|\sum_{i=1}^{n}\eta_i\right|$. We consider $\alpha=\alpha'=1$ in Cases 1.1 to 1.3.

CASE 1.1: $\left|J-J'\right|>1$. The definition of $B_{J,1}$ in (3.3.3) guarantees that, almost surely, $B_{J,1}\left(X_{i1}\right)B_{J',1}\left(X_{i1}\right)=0$ if $\left|J-J'\right|>1$.

CASE 1.2: $J=J'$. The variable $\eta_i$ and its second moment simplify to
$$\eta_i=n^{-1}\left\{B_{J,1}^{2}\left(X_{i1}\right)-1\right\},\qquad E\eta_i^{2}=n^{-2}E\left\{B_{J,1}^{2}\left(X_{i1}\right)-1\right\}^{2}=n^{-2}\left\{EB_{J,1}^{4}\left(X_{i1}\right)-1\right\},$$
in which $EB_{J,1}^{4}\left(X_{i1}\right)=\left\|b_{J,1}\right\|_{2}^{-4}\left(c_{J+1,1}+c_{J+1,1}^{4}/c_{J,1}^{3}\right)$. The selection of $H$ makes $EB_{J,1}^{4}\left(X_{i1}\right)$ the dominant term of $\left\{EB_{J,1}^{4}\left(X_{i1}\right)-1\right\}$, so there exist constants $c_{\eta,2},C_{\eta,2}>0$ such that
$$c_{\eta,2}n^{-2}H^{-1}\leq E\eta_i^{2}\leq C_{\eta,2}n^{-2}H^{-1}.$$
In terms of the Minkowski inequality, the $k$-th absolute moment has the upper bound
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}^{2}\left(X_{i1}\right)-1\right|^{k}\leq n^{-k}2^{k-1}\left\{EB_{J,1}^{2k}\left(X_{i1}\right)+1\right\},$$
where $EB_{J,1}^{2k}\left(X_{i1}\right)=\left\|b_{J,1}\right\|_{2}^{-2k}\left(c_{J+1,1}+c_{J+1,1}^{2k}/c_{J,1}^{2k-1}\right)$. Hence there exist constants $c_{B,2}'$ and $C_{B,2}'$ such that
$$c_{B,2}'H^{1-k}\leq EB_{J,1}^{2k}\left(X_{i1}\right)\leq C_{B,2}'H^{1-k},$$
so the term $EB_{J,1}^{2k}\left(X_{i1}\right)$ dominates 1, and there exists a constant $C_{\eta,2}'>0$ such that $E\left|\eta_i\right|^{k}\leq C_{\eta,2}'n^{-k}2^{k-1}H^{1-k}$. The next step is to verify the Cramér condition:
$$E\left|\eta_i\right|^{k}\leq C_{\eta,2}'n^{-k}2^{k-1}H^{1-k}=C_{\eta,2}'n^{-\left(k-2\right)}2^{k-1}H^{-\left(k-2\right)}\cdot n^{-2}H^{-1}\leq\left\{c_{\eta,2}^{*}\right\}^{k-2}k!\,E\eta_i^{2},$$
in which $c_{\eta,2}^{*}=\left(2C_{\eta,2}'n^{-1}H^{-1}\right)\max\left(1,2C_{\eta,2}'/c_{\eta,2}\right)$. For a large value $\delta>0$, we have
$$P\left\{\left|\sum_{i=1}^{n}\eta_i\right|\geq\delta\sqrt{\frac{\log n}{nH}}\right\}\leq2\exp\left[\frac{-\delta^{2}\log n/\left(nH\right)}{4\sum_{i=1}^{n}E\eta_i^{2}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}}\right]\leq2\exp\left[\frac{-\delta^{2}\log n/\left(nH\right)}{4n\left\{C_{\eta,2}n^{-2}H^{-1}\right\}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}}\right].$$
If $\delta$ is taken large enough that $\delta^{2}/\left\{4C_{\eta,2}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}\right\}\geq3$, then
$$\sum_{n=1}^{\infty}P\left\{\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|\geq\delta\sqrt{\frac{\log n}{nH}}\right\}\leq\sum_{n=1}^{\infty}Nn^{-3}<\infty.$$
Applying the Borel-Cantelli lemma, when $J=J'$, $\alpha=\alpha'=1$ we have
$$\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|=O_p\left(\sqrt{\frac{\log n}{nH}}\right).$$

CASE 1.3: $\left|J-J'\right|=1$. Without loss of generality we only prove the case $J'=J+1$, in which
$$EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)=-\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J+1,1}\right\|_{2}^{-1}\frac{c_{J+2,1}}{c_{J+1,1}}\int I_{J+1,1}^{2}\left(x_1\right)f_1\left(x_1\right)dx_1=-c_{J+2,1}\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J+1,1}\right\|_{2}^{-1}.$$
According to (3.6.13), $c_fH\leq c_{J+1,1}\leq C_fH$, so $E\eta_i^{2}$ has the same order as its dominant term $n^{-2}EB_{J,1}^{2}\left(X_{i1}\right)B_{J+1,1}^{2}\left(X_{i1}\right)$, i.e., there exist constants $c_{\eta,3},C_{\eta,3}>0$ such that
$$c_{\eta,3}n^{-2}H^{-1}\leq E\eta_i^{2}\leq C_{\eta,3}n^{-2}H^{-1}.$$
The $k$-th moment is given by
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)-EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}$$
$$\leq n^{-k}2^{k-1}\left[E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}+\left|EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}\right],$$
where
$$\left|EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}=c_{J+2,1}^{k}\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J+1,1}\right\|_{2}^{-k}\sim1,$$
$$E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}=\left(\frac{c_{J+2,1}}{c_{J+1,1}}\right)^{k}\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J+1,1}\right\|_{2}^{-k}c_{J+1,1}\sim H^{1-k}.$$
Hence there exists a constant $C_{\eta,3}'>0$ such that $E\left|\eta_i\right|^{k}\leq C_{\eta,3}'n^{-k}2^{k-1}H^{1-k}$. As in Case 1.2, the conclusion follows by Bernstein's inequality:
$$A_{n,2}^{*}=\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|=O_p\left(\sqrt{\frac{\log n}{nH}}\right).$$

CASE 2: $\alpha=\alpha'=2$; all of the above discussion applies without extra modification.

CASE 3: $\alpha\neq\alpha'$. Without loss of generality, suppose $\alpha=1$, $\alpha'=2$. First we calculate the order of the second moment $E\eta_i^{2}$:
$$E\eta_i^{2}=n^{-2}\left[E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}-\left\{EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}\right].$$
The boundedness of the density function $f\left(x_1,x_2\right)$ implies the order $O\left(H\right)$ of the absolute mean:
$$\left|EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|\leq E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|\leq\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J',2}\right\|_{2}^{-1}\iint\left|b_{J,1}\left(x_1\right)b_{J',2}\left(x_2\right)\right|f\left(x_1,x_2\right)dx_1dx_2$$
$$\leq C_f\left\{\left\|b_{J,1}\right\|_{2}^{-1}\int\left|b_{J,1}\left(x_1\right)\right|dx_1\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-1}\int\left|b_{J',2}\left(x_2\right)\right|dx_2\right\}$$
$$\leq C_f\left\{1+\frac{c_{J+1,1}}{c_{J,1}}\right\}\left\{1+\frac{c_{J'+1,2}}{c_{J',2}}\right\}\left\{\left\|b_{J,1}\right\|_{2}^{-1}H\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-1}H\right\}\leq C_{B,1}H,$$
for some constant $C_{B,1}>0$, where the last step is derived from equations (3.6.13) and (3.6.14). As a consequence, $\left|E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}\right|^{k}\leq C_{B,1}^{k}H^{k}$. Meanwhile, the uniform order $O\left(1\right)$ of the mean square is obtained from Assumption (AS3), (3.6.13) and (3.6.14):
$$E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}=\left\|b_{J,1}\right\|_{2}^{-2}\left\|b_{J',2}\right\|_{2}^{-2}\iint b_{J,1}^{2}\left(x_1\right)b_{J',2}^{2}\left(x_2\right)f\left(x_1,x_2\right)dx_1dx_2$$
$$\geq c_f\left\{1+c_{J+1,1}^{2}/c_{J,1}^{2}\right\}\left\{\left\|b_{J,1}\right\|_{2}^{-2}H\right\}\left\{1+c_{J'+1,2}^{2}/c_{J',2}^{2}\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-2}H\right\}\geq c_{B,2}.$$
Hence there exist constants $c_\eta,C_\eta>0$ such that $c_\eta n^{-2}\leq E\eta_i^{2}\leq C_\eta n^{-2}$. For any $k>2$, the $k$-th moment of $\left|\eta_i\right|$ is given by
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)-EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}$$
$$\leq n^{-k}2^{k-1}\left[E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}+\left|EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}\right],$$
where there exists a constant $C_B'>0$ such that
$$E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}=\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J',2}\right\|_{2}^{-k}\iint\left|b_{J,1}\left(x_1\right)b_{J',2}\left(x_2\right)\right|^{k}f\left(x_1,x_2\right)dx_1dx_2$$
$$\leq C_f\left\{1+\frac{c_{J+1,1}^{k}}{c_{J,1}^{k}}\right\}\left\{1+\frac{c_{J'+1,2}^{k}}{c_{J',2}^{k}}\right\}\left\{c_f\left(1+c_f/C_f\right)\right\}^{-k}H^{2-k}\leq\left(C_B'\right)^{k}H^{2-k}.$$
Thus there is a constant $C_\eta'>0$ such that
$$E\left|\eta_i\right|^{k}\leq n^{-k}2^{k-1}\left[\left(C_B'\right)^{k}H^{2-k}+C_{B,1}^{k}H^{k}\right]\leq\left(C_\eta'\right)^{k}n^{-k}2^{k-1}H^{2-k},$$
so the Cramér condition holds with $c_\eta^{*}=\left(2C_\eta'/c_\eta\right)\max\left(2C_\eta'/c_\eta,1\right)\left(2C_\eta'n^{-1}H^{-1}\right)$, i.e., $E\left|\eta_i\right|^{k}\leq\left(c_\eta^{*}\right)^{k-2}k!\,E\eta_i^{2}$. Employing Bernstein's inequality and the fact that $E\eta_i^{2}\sim n^{-2}$, for any $1\leq J,J'\leq N$ and $\alpha\neq\alpha'$,
$$\sup_{1\leq J,J'\leq N}\left|\sum_{i=1}^{n}\left[B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)-E\left\{B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)\right\}\right]\right|=O_p\left(\sqrt{\frac{\log n}{n}}\right).$$
Hence for any $1\leq J,J'\leq N$, $\alpha,\alpha'=1,2$, the proof of (3.6.30) is completed. □

The next lemma, on the positive definiteness of the matrix $\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)^{-1}$, is a sufficient step to achieve Lemma 3.6.10.

Lemma 3.6.9. Under Assumptions (AS3) and (AS4), for the matrix $\mathbf{S}=\left(s_{J,J'}\right)_{J,J'=1}^{2N+1}=\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)^{-1}$, there exist constants $C_S>c_S>0$ such that, with probability approaching 1,
$$c_SI_{2N+1}\leq\mathbf{S}^{-1}\leq C_SI_{2N+1}.\qquad(3.6.31)$$

PROOF. Take a real vector $\boldsymbol{\zeta}=\left(u_0,u_{1,1},\dots,u_{N,1},u_{1,2},\dots,u_{N,2}\right)^{T}\in\mathbb{R}^{2N+1}$; one has
$$\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}=\boldsymbol{\zeta}^{T}\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)\boldsymbol{\zeta}=\boldsymbol{\zeta}^{T}\mathbf{S}^{-1}\boldsymbol{\zeta},\qquad(3.6.32)$$
where we denote $\mathbf{B}_{*}=\left\{1,B_{1,1}\left(X_1\right),\dots,B_{N,2}\left(X_2\right)\right\}^{T}$.
Meanwhile, the definition of $A_n$ in (3.6.26) entails in particular that
$$\left(1-A_n\right)\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2}\leq\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}\leq\left(1+A_n\right)\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2},$$
while (3.6.19) means that there exist constants $C_S>c_S>0$ such that
$$C_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\geq\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2}=\left\|u_0+\sum_{J,\alpha}u_{J,\alpha}B_{J,\alpha}\left(x_\alpha\right)\right\|_{2}^{2}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right);$$
hence
$$\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\left(1-A_n\right).\qquad(3.6.33)$$
Putting together (3.6.32) and (3.6.33), one concludes that with probability approaching 1,
$$C_S\boldsymbol{\zeta}^{T}\boldsymbol{\zeta}=C_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\geq\boldsymbol{\zeta}^{T}\mathbf{S}^{-1}\boldsymbol{\zeta}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)=c_S\boldsymbol{\zeta}^{T}\boldsymbol{\zeta},$$
which gives (3.6.31). □

Lemma 3.6.10. Under Assumptions (AS1) to (AS6), for any $x_1\in[0,1]$ and $I_1\left(x_1\right)$ defined in (3.6.1), one has
$$\sup_{x_1\in[0,1]}E\left\{I_1^{2}\left(x_1\right)|\mathbf{X}\right\}=O_p\left(n^{-1}\right).\qquad(3.6.34)$$

PROOF. It is known that $\hat{\mathbf{a}}=\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}$, so the conditional mean square of $\hat{\varepsilon}_2\left(X_{l2}\right)$ given $\mathbf{X}$ is
$$E\left[\left\{\hat{\varepsilon}_2\left(X_{l2}\right)\right\}^{2}\Big|\mathbf{X}\right]=\Phi_l^{T}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}E\left(\mathbf{E}\mathbf{E}^{T}|\mathbf{X}\right)\mathbf{B}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l,$$
where $\Phi_l=\left\{\mathbf{0}_{N+1},B_{1,2}\left(X_{l2}\right),\dots,B_{N,2}\left(X_{l2}\right)\right\}^{T}$. Based on Assumption (AS2), we have $E\left(\mathbf{E}\mathbf{E}^{T}|X_1,\dots,X_n\right)\leq C_\sigma^{2}I_n$ in the matrix sense; applying this to the quadratic form with the vector $\mathbf{B}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l$, one has
$$E\left[\left\{\hat{\varepsilon}_2\left(X_{l2}\right)\right\}^{2}\Big|\mathbf{X}\right]\leq C_\sigma^{2}\Phi_l^{T}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l=n^{-1}C_\sigma^{2}\sum_{1\leq J,J'\leq N}B_{J,2}\left(X_{l2}\right)s_{J+N+1,J'+N+1}B_{J',2}\left(X_{l2}\right),$$
where the $s_{J+N+1,J'+N+1}$ are elements of $\mathbf{S}$ in Lemma 3.6.9. Plugging in the above term and employing (3.6.4), the term $E\left\{I_1^{2}\left(x_1\right)|\mathbf{X}\right\}$ is bounded by
$$\frac{C_\sigma^{2}}{n^{3}}\sum_{l,l'=1}^{n}K_h\left(X_{l1}-x_1\right)K_h\left(X_{l'1}-x_1\right)\sum_{1\leq J,J'\leq N}B_{J,2}\left(X_{l2}\right)s_{J+N+1,J'+N+1}B_{J',2}\left(X_{l'2}\right)$$
$$\leq\frac{C_\sigma^{2}C_S}{n}\sum_{J=1}^{N}\left\{n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}^{2}=\frac{C_\sigma^{2}C_S}{n}\sum_{J=1}^{N}\left\{n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2},$$
which by Lemma 3.6.5 is of order $O_p\left(n^{-1}NH\right)=O_p\left(n^{-1}\right)$ uniformly over $x_1\in[0,1]$, proving (3.6.34). □

One also uses the fact that $1-\Phi\left(x\right)\leq\phi\left(x\right)/x$ for $x\geq0$, hence there exists some $c>0$ such that $1-\Phi\left(x\right)\leq c\phi\left(x\right)$ for large $x$, where $\Phi\left(x\right)$ and $\phi\left(x\right)$ are the cumulative distribution function and the density function of the standard normal. Taking $t_n=\sqrt{16\log n}$, there exists a constant $c$ such that, for large enough $n$, the conditional probability $P\left\{\sup_{0\leq k\leq M_n}\left|R\left(\mathbf{X},x_{1,k}\right)\right|>t_n\,\big|\,\mathbf{X}\right\}$ is bounded by a multiple of $M_n\phi\left(t_n\right)$ and is therefore summable in $n$.

Figure 4.13. Spline confidence bands of LAI of deciduous woodland. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.14. Spline confidence bands and RAMS curves of LAI of deciduous shrubland. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.15. Spline confidence bands and RAMS curves of LAI of rainfed herbaceous crop. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.16. Spline confidence bands and RAMS curves of LAI of open to very open trees. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.17. Improved representation of land surface in RAMS.

BIBLIOGRAPHY

[1] Africover (2002). Africover - Eastern Africa Module.
Land cover mapping based on satellite remote sensing. Food and Agriculture Organization of the United Nations.

[2] Andrews, D. and Whang, Y. (1990). Additive interactive regression models: circumvention of the curse of dimensionality. Econometric Theory 6, 466-479.

[3] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. The Annals of Statistics 1, 1071-1095.

[4] Bralower, T. J., Fullagar, P. D., et al. (1997). Mid-Cretaceous strontium-isotope stratigraphy of deep-sea sections. Geological Society of America Bulletin 109, 1421-1442.

[5] Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580-619.

[6] Chaudhuri, P. and Marron, J. S. (1999). SiZer for exploration of structures in curves. Journal of the American Statistical Association 94, 807-823.

[7] Claeskens, G. and Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. The Annals of Statistics 31, 1852-1884.

[8] Cotton, W. R., et al. (2003). RAMS 2001: current status and future directions. Meteorology and Atmospheric Physics 82, 5-29.

[9] de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag, New York.

[10] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Springer-Verlag, Berlin.

[11] Fan, J. and Chen, J. (1999). One-step local quasi-likelihood estimation. Journal of the Royal Statistical Society Series B 61, 927-934.

[12] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

[13] Fan, J., Härdle, W. and Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. The Annals of Statistics 26, 943-971.

[14] Gantmacher, F. R. and Krein, M. G. (1960). Oszillationsmatrizen, Oszillationskerne und kleine Schwingungen mechanischer Systeme. Akademie-Verlag, Berlin.

[15] Hall, P. and Titterington, D. M. (1988). On confidence bands in nonparametric density estimation and regression. Journal of Multivariate Analysis 27, 228-254.

[16] Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis 29, 163-179.

[17] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.

[18] Härdle, W., Hlavka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer-Verlag, Berlin.

[19] Härdle, W., Huet, S., Mammen, E. and Sperlich, S. (2004). Bootstrap inference in semiparametric generalized additive models. Econometric Theory 20, 265-300.

[20] Härdle, W., Marron, J. S. and Yang, L. (1997). Discussion of "Polynomial splines and their tensor products in extended linear modeling" by Stone et al. The Annals of Statistics 25, 1443-1450.

[21] Härdle, W., Sperlich, S. and Spokoiny, V. (2001). Structural tests in additive regression. Journal of the American Statistical Association 96, 1333-1347.

[22] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management 5, 81-102.

[23] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.

[24] Huang, J. Z. (1998). Projection estimation in multiple regression with application to functional ANOVA models. The Annals of Statistics 26, 242-272.

[25] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. The Annals of Statistics 31, 1600-1635.

[26] Huang, J. Z. and Yang, L. (2004).
Identification of non-linear additive autoregressive models. Journal of the Royal Statistical Society Series B 66, 463-477.

[27] Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis. Prentice-Hall, New Jersey.

[28] Kim, W., Linton, O. B. and Hengartner, N. (1999). A computationally efficient oracle estimator for additive nonparametric regression with bootstrap confidence intervals. Journal of Computational and Graphical Statistics 8, 278-297.

[29] Knyazikhin, Y., Glassy, J., Privette, J. L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y., Morisette, J. T., Votava, P., Myneni, R. B., Nemani, R. R. and Running, S. W. (1999). MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation Absorbed by Vegetation (FPAR) Product (MODIS) Algorithm Theoretical Basis Document.

[30] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York.

[31] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93-100.

[32] Linton, O. B. and Härdle, W. (1996). Estimating additive regression models with known links. Biometrika 83, 529-540.

[33] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika 84, 469-473.

[34] Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie und verwandte Gebiete 61, 405-415.

[35] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics 27, 1443-1490.

[36] Mayaux, P., Bartholomé, E., Fritz, S. and Belward, A. (2004). A new land-cover map of Africa for the year 2000. Journal of Biogeography 31, 861-877.

[37] Nielsen, J. P. and Sperlich, S. (2005). Smooth backfitting in practice. Journal of the Royal Statistical Society Series B 67, 43-61.

[38] Olson, J. M., Alagarswamy, G., Andresen, J., Campbell, D. J., Ge, J., Huebner, M., Lofgren, B., Lusch, D. P., Moore, N., Pijanowski, B. C., Qi, J., Torbick, N., Wang, J. and Yang, L. (2006). Integrating diverse methods to understand climate-land interactions at multiple spatial and temporal scales. GeoForum.

[39] Opsomer, J. D. (2000). Asymptotic properties of backfitting estimators. Journal of Multivariate Analysis 73, 166-179.

[40] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. The Annals of Statistics 25, 186-211.

[41] Opsomer, J. D. and Ruppert, D. (1998). A fully automated bandwidth selection method for fitting additive models. Journal of the American Statistical Association 93, 605-619.

[42] Pitman, A. (2003). The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology 23, 479-510.

[43] Rosenblatt, M. (1976). On the maximal deviation of k-dimensional density estimates. The Annals of Probability 4, 1009-1015.

[44] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press, Cambridge.

[45] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

[46] Sperlich, S., Tjøstheim, D. and Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory 18, 197-251.

[47] Stone, C. J. (1985). Additive regression and other nonparametric models.
The Annals of Statistics 13, 689-705.

[48] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics 22, 118-184.

[49] Tjøstheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: projections. Journal of the American Statistical Association 89, 1398-1409.

[50] Torbick, N., Lusch, D., Olson, J., Qi, J. and Ge, J. (2005a). An assessment of Africover and GLC2000 using general agreement and airborne videography. International Journal of Remote Sensing (submitted).

[51] Torbick, N., Qi, J., Lusch, D., Olson, J., Moore, N. and Ge, J. (2005b). Developing land use land cover parameterization for climate-land modelling in East Africa (in progress).

[52] Tusnády, G. (1977). A remark on the approximation of the sample df in the multidimensional case. Periodica Mathematica Hungarica 8, 53-55.

[53] Walko, R. L., Band, L. E., Baron, J., Kittel, T. G. F., Lammers, R., Lee, T. J., Ojima, D., Pielke Sr., R. A., Taylor, C., Tague, C., Tremback, C. J. and Vidale, P. L. (2000). Coupled atmosphere-biophysics-hydrology models for environment modeling. Journal of Applied Meteorology 39, 931-944.

[54] Wang, J. and Yang, L. (2006). Polynomial spline confidence bands for regression curves. The Annals of Statistics (tentatively accepted).

[55] Xia, Y. (1998). Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society Series B 60, 797-811.

[56] Xue, L. and Yang, L. (2006). Estimation of semiparametric additive coefficient model. Journal of Statistical Planning and Inference 136, 2506-2534.

[57] Yang, L., Härdle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 579-604.

[58] Yang, L., Sperlich, S. and Härdle, W. (2003). Derivative estimation and testing in generalized additive models. Journal of Statistical Planning and Inference 115, 521-542.

[59] Zhang, F. (1999). Matrix Theory: Basic Results and Techniques. Springer-Verlag, New York.

[60] Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics of regression splines and confidence regions. The Annals of Statistics 26, 1760-1782.