The Application of B-Spline Smoothing: Confidence Bands and Additive Modelling

By Jing Wang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2006

ABSTRACT

The Application of B-Spline Smoothing: Confidence Bands and Additive Modelling

By Jing Wang

Asymptotically exact and conservative confidence bands are obtained for the nonparametric regression function, based on constant and linear polynomial spline estimation, respectively. Compared to the pointwise nonparametric confidence interval of Huang (2003), the confidence bands are inflated only by a factor of $\{\log(n)\}^{1/2}$, similar to the Nadaraya-Watson confidence bands of Härdle (1989) and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments provide strong evidence that corroborates the asymptotic theory.

A great deal of effort has been devoted to the inference of additive models in the last decade. Among the many existing procedures, the kernel-type estimators are too costly to implement for a large number of variables or for large sample sizes, while the spline-type estimators provide no asymptotic distribution or any measure of uniform accuracy. We propose a synthetic estimator of the component function in an additive regression model, using one-step backfitting, with spline smoothing in the first stage and kernel smoothing in the second stage. Under very mild conditions, the proposed SBK estimator of the component function is asymptotically equivalent to an ordinary univariate Nadaraya-Watson estimator; hence the dimension is effectively reduced to one at any point. This dimension reduction holds uniformly over an interval under the stronger assumption of normal errors, and asymptotic simultaneous confidence bands are provided for the component functions. Monte Carlo evidence supports the asymptotic results for dimensions ranging from low to very high, and sample sizes ranging from moderate to large. The proposed simultaneous confidence bands are applied to the Boston housing data for linearity diagnosis.

Phenological information reflecting seasonal changes in vegetation is an important input variable in climate models such as the Regional Atmospheric Modeling System (RAMS). It varies not only among different vegetation types but also with geographic location (latitude and longitude). In the current version of RAMS, phenology is treated as a simple sine function that is solely related to the day of year and latitude, in spite of major seasonal variability in precipitation and temperature. In short, the sine curves of phenology are far different from the observed curves. Via linear spline smoothing we developed more realistic phenological functions of all land covers in East Africa, based on remote sensing observations, to improve the RAMS model. In addition, we quantify the differences between the RAMS default phenological curves and the linear spline estimates derived from remote sensing observations.

© 2006 Jing Wang

All Rights Reserved

To my grandma, my parents, and Yuming

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor Professor Lijian Yang.
He is always willing to answer all kinds of questions with great patience and to share his profound insights with me. I deeply appreciate his constant encouragement and support during my research and job search. With enduring enthusiasm and dedication to academia and thoughtful attention to his students, Professor Yang sets an example of an excellent faculty member.

I am truly grateful to Professors Jiaguo Qi, R. V. Ramamoorthi and Yijun Zuo for taking time to serve on my dissertation committee. Especially, I would like to thank Professor Qi and the CLIP group for providing me financial support and sharing their knowledge with me on the project. I really appreciate Professors Dennis Gilliland and Connie Page for their guidance during my two years at CSTAT, where I obtained valuable experience in consulting service. My special thanks go to Professor James Stapleton for continuous help and encouragement from the very beginning. I also want to thank Professor Vince Melfi, Cathy Sparks and Laurie Secord for their assistance, and I thank all the professors and friends who helped me at MSU over five years.

This dissertation research has been supported in part by NSF grants DMS 0405330 and BCS 0308420.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

1 Introduction
  1.1 Introduction
  1.2 Confidence Bands
  1.3 Additive Component Estimation
  1.4 Application to Seasonality Analysis

2 Spline Confidence Bands
  2.1 Introduction
  2.2 Main Results
  2.3 Error Decomposition
  2.4 Implementation
    2.4.1 Implementing Exact Bands
    2.4.2 Implementing Conservative Bands
  2.5 Simulation and Examples
    2.5.1 Simulation
    2.5.2 Fossil Example
  2.6 Conclusions
  2.7 Proof of Theorems
    2.7.1 Preliminaries
    2.7.2 Proof of Theorem 1
    2.7.3 Preliminaries for Theorem 2
    2.7.4 Proof of Theorem 2

3 Spline-Backfitted Kernel Regression
  3.1 Introduction
  3.2 SBK and SBLL Estimators
  3.3 Decomposition
  3.4 Simulation and Examples
    3.4.1 Simulation
    3.4.2 Boston Housing Example
  3.5 Conclusions
  3.6 Proof of Theorems
    3.6.1 Variance Reduction
    3.6.2 Bias Reduction
    3.6.3 Technical Lemmas

4 Application to Seasonality Analysis
  4.1 Introduction
  4.2 Method
    4.2.1 Study Area and Data Description
    4.2.2 Polynomial Spline Regression
    4.2.3 Spline Fitting for LAI by LULC Type
  4.3 Results
    4.3.1 Land Cover Phenologies
    4.3.2 Sensitivity and Uncertainty
    4.3.3 Phenological Functions of Land Cover
    4.3.4 Implications
  4.4 Conclusions

BIBLIOGRAPHY

LIST OF TABLES

4.1 Coverage probabilities of constant spline bands.
4.2 Coverage probabilities of linear spline bands.
4.3 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$ for $d = 4, 10$.
4.4 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$ for $d = 50$.
4.5 Coefficients table for Deciduous Shrubland with Sparse Trees.
4.6 Coefficients table for Deciduous Woodland.
4.7 Coefficients table for Open to Very Open Trees.
4.8 Coefficients table for Rainfed Herbaceous Crop.

LIST OF FIGURES

4.1 Constant spline confidence bands with opt = 1.
4.2 Constant spline confidence bands with opt = 2.
4.3 Linear spline confidence bands with opt = 1.
4.4 Linear spline confidence bands with opt = 2.
4.5 Testing $H_0: m(x) = \sum_{k=0}^{d} a_k x^k$, $d = 2, 3, 5, 6$, for the fossil data.
4.6 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 4$.
4.7 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 10$.
4.8 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 50$, $\alpha = 1, 10$.
4.9 Relative efficiency of $\hat m_{S,\alpha}$ against $\tilde m_{S,\alpha}$, $d = 50$, $\alpha = 19, 50$.
4.10 Linearity test for the Boston housing data.
4.11 LAI trend of rainfed herbaceous crops.
4.12 LAI trend of open to very open trees.
4.13 Spline confidence bands of LAI of deciduous woodland.
4.14 Spline confidence bands and RAMS curves of LAI of deciduous shrubland.
4.15 Spline confidence bands and RAMS curves of LAI of rainfed herbaceous crop.
4.16 Spline confidence bands and RAMS curves of LAI of open to very open trees.
4.17 Improved representation of land surface in RAMS.

CHAPTER 1

Introduction

1.1 Introduction

For the past three decades, nonparametric regression has been widely used in many statistical applications, from biostatistics to econometrics, from engineering to geography. This is due to its flexibility in modelling complex relationships among variables by "letting the data speak for themselves". To fix ideas, we begin with the univariate regression model. Assume that observations $\{(X_i, Y_i)\}_{i=1}^n$ and unobserved errors $\{\varepsilon_i\}_{i=1}^n$ are i.i.d. copies of $(X, Y, \varepsilon)$ satisfying the regression model

$$Y = m(X) + \sigma(X)\,\varepsilon. \qquad (1.1)$$

The unknown mean and standard deviation functions $m(x)$ and $\sigma(x)$, defined on a compact interval $[a,b]$, need not be of any specific form. Two popular nonparametric smoothing techniques are local polynomial/kernel smoothing and polynomial splines. The kernel type estimators are "local", treated comprehensively in Fan and Gijbels (1996) and Härdle (1990). The polynomial spline estimators, on the other hand, are global; see Stone (1985, 1994) and Huang (2003).

The fidelity of a nonparametric regressor is measured in terms of its rate of convergence to the unknown regression function. The convergence rate can be pointwise or uniform. For kernel type estimators, rates of convergence of these types have been established by Mack and Silverman (1982) and Fan and Gijbels (1996). For kernel smoothing of the univariate regression function, Hall and Titterington (1988), Härdle (1989), and Xia (1998) made significant contributions on confidence bands. All of these are based on strong approximation of certain empirical processes by the 2-dimensional Brownian bridge, as in Tusnady (1977), which is the same idea used in Bickel and Rosenblatt (1973) for the confidence band of a probability density function.
More recently, Claeskens and Van Keilegom (2003) improved upon Xia (1998) by using a smoothed bootstrap, and by extending the confidence band to derivatives of the regression function. Härdle, Huet, Mammen and Sperlich (2004) introduced bootstrap bands with corrected bias. For polynomial splines, least squares rates of convergence have been obtained by Stone (1985, 1994), while pointwise convergence rates and asymptotic distributions have recently been established in Huang (2003). A confidence band for polynomial spline regression, however, remains unavailable except under the strong restriction of homoscedastic normal errors; see Zhou, Shen and Wolfe (1998). Since confidence bands are one of the most important tools for model diagnosis, in other words for testing the validity of a parametric model, confidence bands for the heteroscedastic model are in great demand because of their generality.

1.2 Confidence Bands

An asymptotically exact (conservative) $100(1-\alpha)\%$ confidence band for the unknown $m(x)$ over the interval $[a,b]$ consists of an estimator $\hat m(x)$ of $m(x)$ and lower and upper confidence limits $\hat m(x) - l_n(x)$, $\hat m(x) + l_n(x)$ at every $x \in [a,b]$ such that

$$\lim_{n\to\infty} P\left\{ m(x) \in \hat m(x) \pm l_n(x),\ \forall x \in [a,b] \right\} = 1 - \alpha, \quad \text{exact},$$

$$\liminf_{n\to\infty} P\left\{ m(x) \in \hat m(x) \pm l_n(x),\ \forall x \in [a,b] \right\} \ge 1 - \alpha, \quad \text{conservative}.$$

Confidence bands of kernel type estimators are computationally intensive since a least squares estimation has to be done at every point. In contrast, it is enough to solve only one least squares problem to obtain the polynomial spline estimator. The greatest advantages of polynomial spline estimation are its simplicity of implementation and fast computation. But so far the asymptotic theory of spline smoothing is not as complete as that of the kernel type.

To introduce the spline functions, divide the finite interval $[a,b]$ into $(N+1)$ subintervals $J_j = [t_j, t_{j+1})$, $j = 0, \ldots, N-1$, $J_N = [t_N, b]$. A sequence of equally-spaced points $\{t_j\}_{j=1}^N$, called interior knots, is given as

$$t_0 = a < t_1 < \cdots < t_N < b = t_{N+1}, \quad t_j = a + jh, \; h = \frac{b-a}{N+1}.$$

For $p = 1, 2$, let $\{B_{j,p}(x)\}_{j=1-p}^N$ denote the rescaled B-spline basis of the space $G^{(p-2)}$ of splines of degree $p-1$ (piecewise constant for $p=1$, piecewise linear for $p=2$), as defined in (2.2.4), with $j(x)$ the index of the subinterval containing $x$ as in (2.2.2), $c_{j,n}$ and $d_{j,n}$ as in (2.2.3), and $\hat m_p(x)$ the least squares spline estimator in (2.2.1). The theoretical and empirical inner products of functions $\phi, \varphi$ are

$$\langle \phi, \varphi \rangle = \int_a^b \phi(x)\varphi(x) f(x)\, dx = E\{\phi(X)\varphi(X)\}, \quad \langle \phi, \varphi \rangle_n = \frac1n \sum_{i=1}^n \phi(X_i)\varphi(X_i), \qquad (2.2.5)$$

with corresponding norms $\|\phi\|_2^2 = \langle\phi,\phi\rangle$ and $\|\phi\|_{2,n}^2 = \langle\phi,\phi\rangle_n$. The inner product matrix $V$ of the B-spline basis $\{B_{j,2}(x)\}_{j=-1}^N$ is denoted as

$$V = (v_{j,j'})_{j,j'=-1}^{N} = \left( \langle B_{j,2}, B_{j',2} \rangle \right)_{j,j'=-1}^{N}, \qquad (2.2.6)$$

whose inverse $S$ and the $2\times2$ diagonal submatrices of $S$ are expressed as

$$S = (s_{j,j'})_{j,j'=-1}^{N} = V^{-1}, \quad S_j = \begin{pmatrix} s_{j-1,j-1} & s_{j-1,j} \\ s_{j,j-1} & s_{j,j} \end{pmatrix}, \quad j = 0, \ldots, N. \qquad (2.2.7)$$

Next define matrices $\Sigma$, $A(x)$ and $\Xi_j$ as

$$\Sigma = (\sigma_{j,j'})_{j,j'=-1}^{N} = \left\{ \int \sigma^2(v)\, B_{j,2}(v)\, B_{j',2}(v)\, f(v)\, dv \right\}_{j,j'=-1}^{N}, \qquad (2.2.8)$$

$$A(x) = \left\{ c_{j(x)-1} B_{j(x)-1,2}(x),\; c_{j(x)} B_{j(x),2}(x) \right\}^T, \quad c_j = \begin{cases} \sqrt2, & j = -1, N, \\ 1, & j = 0, \ldots, N-1, \end{cases}$$

$$\Xi_j = \begin{pmatrix} l_{j+1,j+1} & l_{j+1,j+2} \\ l_{j+2,j+1} & l_{j+2,j+2} \end{pmatrix}, \quad j = 0, 1, \ldots, N, \qquad (2.2.9)$$

with the terms $l_{i,k}$, $|i-k| \le 1$, defined through the following matrix inversion:

$$M_{N+2} = \begin{pmatrix} 1 & \sqrt2/4 & & & & 0 \\ \sqrt2/4 & 1 & 1/4 & & & \\ & 1/4 & 1 & \ddots & & \\ & & \ddots & \ddots & 1/4 & \\ & & & 1/4 & 1 & \sqrt2/4 \\ 0 & & & & \sqrt2/4 & 1 \end{pmatrix}_{(N+2)\times(N+2)}, \quad M_{N+2}^{-1} = (l_{i,k})_{(N+2)\times(N+2)}, \qquad (2.2.10)$$

and computed via (2.4.14), (2.4.17) and (2.4.18). We now define

$$\sigma_{n,1}^2(x) = \frac{\int_{J_{j(x)}} \sigma^2(v) f(v)\, dv}{n\, c_{j(x),n}^2}, \qquad \sigma_{n,2}^2(x) = \frac1n \sum_{j,j',l,l'=-1}^{N} B_{j,2}(x)\, B_{j',2}(x)\, s_{j,l}\, s_{j',l'}\, \sigma_{l,l'}, \qquad (2.2.11)$$

with $j(x)$ defined in (2.2.2), $c_{j,n}$ in (2.2.3), $B_{j,2}(x)$ in (2.2.4), and $s_{j,l}$ and $\sigma_{l,l'}$ in (2.2.7) and (2.2.8). These $\sigma_{n,p}^2(x)$ are shown in Section 2.7 to be the pointwise variance functions of the noise terms $\tilde\varepsilon_p(x)$, $p = 1, 2$. A small numerical sketch of the matrix ingredients $M_{N+2}$ and $\Xi_j$ is given below.
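The matrix $M_{N+2}$ in (2.2.10) is fully specified and distribution-free, so its inverse and the blocks $\Xi_j$ of (2.2.9) can be computed directly. The following minimal sketch (Python with numpy; the thesis code was written in XploRe, and all names here are ours) builds $M_{N+2}$ and extracts the $\Xi_j$ by brute-force inversion rather than via the closed forms (2.4.14)-(2.4.18):

```python
import numpy as np

def m_matrix(N):
    """Tridiagonal matrix M_{N+2} of (2.2.10): unit diagonal, off-diagonal
    entries 1/4, except sqrt(2)/4 in the first and last off-diagonal slots."""
    M = np.eye(N + 2)
    off = np.full(N + 1, 0.25)
    off[0] = off[-1] = np.sqrt(2) / 4
    return M + np.diag(off, 1) + np.diag(off, -1)

def xi_blocks(N):
    """2x2 submatrices Xi_j, j = 0..N, of (2.2.9), cut from M^{-1} = (l_ik);
    indices j+1, j+2 in the text are 1-based, hence the 0-based slice below."""
    L = np.linalg.inv(m_matrix(N))
    return [L[j:j + 2, j:j + 2] for j in range(N + 1)]

if __name__ == "__main__":
    print(xi_blocks(10)[0])  # Xi_0 for N = 10 interior knots
```

The closed-form entries of (2.4.14)-(2.4.18) below should agree with this direct inversion, which makes the sketch a convenient numerical check.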
We now state our main results in the next two theorems.

Theorem 1. Under Assumptions (AC1)-(AC4), if $p = 1$, then an asymptotic $100(1-\alpha)\%$ exact confidence band for $m(x)$ over the interval $[a,b]$ is

$$\hat m_1(x) \pm \sigma_{n,1}(x)\, \{2\log(N+1)\}^{1/2}\, d_n, \qquad (2.2.12)$$

in which $\sigma_{n,1}(x)$ is given in (2.2.11) and can be replaced by $\sigma(x)\{f(x)\,nh\}^{-1/2}$, according to (2.7.7) in Lemma 2.7.4, and

$$d_n = 1 - \{2\log(N+1)\}^{-1} \left[ \log\left\{ -\tfrac12 \log(1-\alpha) \right\} + \tfrac12 \{\log\log(N+1) + \log 4\pi\} \right]. \qquad (2.2.13)$$

Theorem 2. Under Assumptions (AC1)-(AC4), if $p = 2$, then an asymptotic $100(1-\alpha)\%$ conservative confidence band for $m(x)$ over the interval $[a,b]$ is

$$\hat m_2(x) \pm \sigma_{n,2}(x)\, \{2\log(N+1) - 2\log\alpha\}^{1/2}, \qquad (2.2.14)$$

in which $\sigma_{n,2}(x)$ is given in (2.2.11) and can be replaced by $\sigma(x)\{2f(x)nh/3\}^{-1/2} \{A^T(x)\, S_{j(x)}\, A(x)\}^{1/2}$ according to Lemma 2.7.4, and by $\sigma(x)\{2f(x)nh/3\}^{-1/2} \{A^T(x)\, \Xi_{j(x)}\, A(x)\}^{1/2}$ according to Lemma 2.7.3.

The construction in Theorem 1 is similar to the connected error bar of Hall and Titterington (1988). Ours is superior in two respects: first, we treat not only equally-spaced designs but also random designs; second, by applying the strong approximation theorem of Tusnady (1977), our confidence band is asymptotically exact rather than conservative. The error bars of Hall and Titterington (1988) are based on a kernel estimator, while ours are based on the regressogram. The upcrossing result (Theorem 2.3.4) used in the proof of Theorem 1 is also different from those used in Bickel and Rosenblatt (1973), Rosenblatt (1976) and Härdle (1989). Theorem 2 on the linear confidence band, however, bears no similarity to the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003), except that the width of the band is of the same order $n^{-2/5}(\log n)^{1/2}$. The asymptotic variance function $\sigma_{n,2}^2(x)$ of $\hat m_2(x)$ in (2.2.11) is a special unconditional version of equation (6.2) in Huang (2003), Remark 6.1, page 1624. Thus, the linear band localized at any given point $x$ is only a factor of $(\log n)^{1/2}$ wider than the pointwise confidence interval of Huang (2003).

2.3 Error Decomposition

In this section, we break the estimation error $\hat m_p(x) - m(x)$ into a bias term and a noise term. To understand this decomposition, we begin by discussing the spline space $G^{(p-2)}$ and the representation of the spline estimator $\hat m_p(x)$ in (2.2.1). The first fact to note is that the empirical inner products of the B-spline bases $\{B_{j,1}(x)\}_{j=0}^N$ and $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.4) approximate the theoretical inner products uniformly at the rate $\sqrt{n^{-1}h^{-1}\log(n)}$, according to the following lemma.

Lemma 2.3.1. As $n \to \infty$, the B-spline bases $\{B_{j,1}(x)\}_{j=0}^N$ and $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.4) satisfy

$$A_{n,1} = \sup_{0\le j\le N} \left| \|B_{j,1}\|_{2,n}^2 - 1 \right| = O_p\left(\sqrt{\log n/(nh)}\right), \qquad (2.3.1)$$

$$A_{n,2} = \sup_{g_1,g_2\in G^{(0)}} \left| \frac{\langle g_1,g_2\rangle_n - \langle g_1,g_2\rangle}{\|g_1\|_2\,\|g_2\|_2} \right| + \sup_{g\in G^{(0)}} \left| \frac{\|g\|_{2,n}^2}{\|g\|_2^2} - 1 \right| = O_p\left(\sqrt{\log n/(nh)}\right). \qquad (2.3.2)$$

To express the estimator $\hat m_p(x)$ in terms of $\{B_{j,p}(x)\}_{j=1-p}^N$, we introduce the following vectors in $R^n$ for $p = 1, 2$:

$$\mathbf Y = (Y_1, \ldots, Y_n)^T, \quad B_{j,p}(\mathbf X) = \{B_{j,p}(X_1), \ldots, B_{j,p}(X_n)\}^T, \quad j = 1-p, \ldots, N.$$

The definition of $\hat m_p(x)$ in (2.2.1) entails that $\hat m_p(x) \equiv \sum_{j=1-p}^{N} \hat\lambda_{j,p} B_{j,p}(x)$, where the coefficients $\{\hat\lambda_{1-p,p}, \ldots, \hat\lambda_{N,p}\}^T$ are solutions of the following least squares problem:

$$\{\hat\lambda_{1-p,p}, \ldots, \hat\lambda_{N,p}\}^T = \arg\min \sum_{i=1}^n \left\{ Y_i - \sum_{j=1-p}^{N} \lambda_{j,p}\, B_{j,p}(X_i) \right\}^2. \qquad (2.3.3)$$

We write $\mathbf Y$ as the sum of a signal vector $\mathbf m$ and a noise vector $\mathbf E$:

$$\mathbf Y = \mathbf m + \mathbf E, \quad \mathbf m = \{m(X_1), \ldots, m(X_n)\}^T, \quad \mathbf E = \{\sigma(X_1)\varepsilon_1, \ldots, \sigma(X_n)\varepsilon_n\}^T.$$
Projecting this relationship onto the linear space $G_n^{(p-2)}$ spanned by $\{B_{j,p}(\mathbf X)\}_{j=1-p}^N$, a subspace of $R^n$, one gets

$$\hat{\mathbf m}_p = \{\hat m_p(X_1), \ldots, \hat m_p(X_n)\}^T = \mathrm{Proj}_{G_n^{(p-2)}} \mathbf Y = \mathrm{Proj}_{G_n^{(p-2)}} \mathbf m + \mathrm{Proj}_{G_n^{(p-2)}} \mathbf E.$$

It entails that in the space $G^{(p-2)}$ of spline functions

$$\hat m_p(x) = \tilde m_p(x) + \tilde\varepsilon_p(x), \qquad (2.3.4)$$

where

$$\tilde m_p(x) = \sum_{j=1-p}^{N} \tilde\lambda_{j,p}\, B_{j,p}(x), \quad \tilde\varepsilon_p(x) = \sum_{j=1-p}^{N} \tilde a_{j,p}\, B_{j,p}(x). \qquad (2.3.5)$$

The vectors $\{\tilde\lambda_{1-p,p}, \ldots, \tilde\lambda_{N,p}\}^T$ and $\{\tilde a_{1-p,p}, \ldots, \tilde a_{N,p}\}^T$ are solutions to (2.3.3) with $Y_i$ replaced by $m(X_i)$ and $\sigma(X_i)\varepsilon_i$ respectively. We cite next two important results, the first from de Boor (2001), page 149, the second from Theorem 5.1 of Huang (2003).

Theorem 2.3.1. There is an absolute constant $C_p > 0$, $p \ge 1$, such that for every $m \in C^{(p)}[a,b]$ there exists a function $g \in G^{(p-2)}[a,b]$ such that $\|g - m\|_\infty \le C_p\, \|m^{(p)}\|_\infty\, h^p$.

Theorem 2.3.2. There is an absolute constant $C_p > 0$, $p \ge 1$, such that for any $m \in C^{(p)}[a,b]$ and the function $\tilde m_p(x)$ defined in (2.3.5),

$$\|\tilde m_p(x) - m(x)\|_\infty \le C_p \inf_{g\in G^{(p-2)}} \|g - m\|_\infty = O(h^p). \qquad (2.3.6)$$

According to (2.3.4), the estimation error is $\hat m_p(x) - m(x) = \{\tilde m_p(x) - m(x)\} + \tilde\varepsilon_p(x)$, where by Theorem 2.3.2 the bias term $\tilde m_p(x) - m(x)$ is of order $O_p(h^p)$. Hence the main hurdle in proving Theorems 1 and 2 is the noise term $\tilde\varepsilon_p(x)$. This is handled by the next two propositions.

Proposition 2.3.1. With $\sigma_{n,1}(x)$ given in (2.2.11), the process $\sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x)$, $x \in [a,b]$, is almost surely uniformly approximated by a Gaussian process $U(x)$, $x \in [a,b]$, with covariance structure

$$E\, U(x)\, U(y) = \delta_{j(x),j(y)}, \quad \forall x, y \in [a,b],$$

where $\delta_{j,l}$ is the Kronecker symbol, i.e., $\delta_{j,l} = 1$ if $j = l$ and $0$ otherwise.

Proposition 2.3.2. For a given $0 < \alpha < 1$, and $\sigma_{n,2}(x)$ as given in (2.2.11),

$$\liminf_{n\to\infty} P\left\{ \sup_{x\in[a,b]} \left| \sigma_{n,2}^{-1}(x)\, \tilde\varepsilon_2(x) \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right\} \ge 1 - \alpha. \qquad (2.3.7)$$

We state next the strong approximation theorem of Tusnady (1977), which will be used later in the proof of Lemma 2.7.6, a key step in proving Propositions 2.3.1 and 2.3.2.

Theorem 2.3.3. Let $U_1, \ldots, U_n$ be i.i.d. r.v.'s on the 2-dimensional unit square with $P(U_i \le \mathbf t) = \lambda(\mathbf t)$, $\mathbf t \in [0,1]^2$, the Lebesgue measure, and let $F_n$ denote their empirical distribution. Then there exists a version $B_n$ of the 2-dimensional Brownian bridge such that

$$P\left\{ \sup_{0\le \mathbf t\le 1} \left| n^{1/2}\{F_n(\mathbf t) - \lambda(\mathbf t)\} - B_n(\mathbf t) \right| > n^{-1/2}(C\log n + x)\log n \right\} < K e^{-\lambda x} \qquad (2.3.8)$$

holds for all $x$, where $C, K, \lambda$ are positive constants.

For the rest of the chapter, we denote the well-known Rosenblatt quantile transformation as

$$(X', \varepsilon') = M(X, \varepsilon) = \left\{ F_X(X),\, F_{\varepsilon|X}(\varepsilon|X) \right\}, \qquad (2.3.9)$$

which produces random variables $X'$ and $\varepsilon'$ with independent and identical uniform distributions on the interval $[0,1]$. This transformation was used in, for instance, Bickel and Rosenblatt (1973) and Härdle (1989). Substituting the vector $\mathbf t = (t_1, t_2)$ in Theorem 2.3.3 with $(X', \varepsilon')$, and the stochastic process $n^{1/2}\{F_n(\mathbf t) - \lambda(\mathbf t)\}$ with

$$Z_n\{M^{-1}(x,\varepsilon)\} = Z_n(x,\varepsilon) = \sqrt n\, \{F_n(x,\varepsilon) - F(x,\varepsilon)\}, \qquad (2.3.10)$$

where $F_n(x,\varepsilon)$ denotes the empirical distribution of $(X,\varepsilon)$, then (2.3.8) implies that there exists a version of the 2-dimensional Brownian bridge $B$ such that

$$\sup_{x,\varepsilon} \left| Z_n(x,\varepsilon) - B\{M(x,\varepsilon)\} \right| = O\left(n^{-1/2}\log^2 n\right), \ \text{w.p.}\ 1. \qquad (2.3.11)$$

The next result on upcrossing probability is from Leadbetter, Lindgren and Rootzén (1983), Theorem 1.5.3, page 14. In our proof of Theorem 1, it plays the role of Theorem A1 in Bickel and Rosenblatt (1973) or Theorem C in Rosenblatt (1976).

Theorem 2.3.4. If $\xi_1, \ldots, \xi_n$ are i.i.d. standard normal r.v.'s, then for $M_n = \max\{\xi_1, \ldots, \xi_n\}$ and $\tau \in R$, as $n \to \infty$,

$$P\{a_n(M_n - b_n) \le \tau\} \to \exp(-e^{-\tau}), \quad P\{|M_n| \le \tau/a_n + b_n\} \to \exp(-2e^{-\tau}),$$

where

$$a_n = (2\log n)^{1/2}, \quad b_n = (2\log n)^{1/2} - \tfrac12 (2\log n)^{-1/2} (\log\log n + \log 4\pi).$$

This Gumbel-type limit is easy to verify numerically, as sketched below.
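The following small Monte Carlo sketch (Python with numpy; our own illustration, not part of the thesis code) simulates the maximum $M_n$ of $n$ i.i.d. standard normals and compares the empirical probability $P\{a_n(M_n - b_n) \le \tau\}$ against the Gumbel limit $\exp(-e^{-\tau})$ from Theorem 2.3.4:

```python
import numpy as np

def gumbel_check(n=2000, reps=1000, tau=1.0, seed=1):
    """Empirical P{a_n (M_n - b_n) <= tau} versus the limit exp(-exp(-tau))."""
    rng = np.random.default_rng(seed)
    a_n = np.sqrt(2 * np.log(n))
    b_n = a_n - (np.log(np.log(n)) + np.log(4 * np.pi)) / (2 * a_n)
    M = rng.standard_normal((reps, n)).max(axis=1)  # maxima over reps samples
    return (a_n * (M - b_n) <= tau).mean(), np.exp(-np.exp(-tau))

print(gumbel_check())  # the two numbers should be close for large n
```

The convergence in Theorem 2.3.4 is known to be slow (of logarithmic order), so moderate discrepancies at small $n$ are expected.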
2.4 Implementation

In this section, we describe the procedures used to implement the confidence bands of Theorems 1 and 2. We have written our codes in XploRe, due to the convenience of using certain kernel type estimators. Information on XploRe can be found in Härdle, Hlavka and Klinke (2000).

Given any sample $\{(X_i,Y_i)\}_{i=1}^n$ from model (2.1.1), we use $\min(X_1,\ldots,X_n)$ and $\max(X_1,\ldots,X_n)$ respectively as the endpoints of the interval $[a,b]$. Minor adjustments could be made for outliers. The number of interior knots is taken to be $N = [c_1 n^{1/(2p+1)}] + c_2$, where $c_1$ and $c_2$ are positive integers. Since an explicit formula for the coverage probability of the bands does not exist, there is no optimal method to select $(c_1, c_2)$. In our simulations, the simple choice of $5$ for $c_1$ and $1$ for $c_2$ seems to work well, so these are set as default values.

The least squares problem in (2.2.1) can be solved via the truncated power basis $1, x, \ldots, x^{p-1}, (x-t_j)_+^{p-1}$, $j = 1, \ldots, N$. In other words,

$$\hat m_p(x) = \sum_{k=0}^{p-1} \hat\gamma_k x^k + \sum_{j=1}^{N} \hat\gamma_{j,p} (x - t_j)_+^{p-1}, \qquad (2.4.1)$$

where the coefficients $\{\hat\gamma_0, \ldots, \hat\gamma_{p-1}, \hat\gamma_{1,p}, \ldots, \hat\gamma_{N,p}\}^T$ solve the least squares problem

$$\{\hat\gamma_0, \ldots, \hat\gamma_{p-1}, \hat\gamma_{1,p}, \ldots, \hat\gamma_{N,p}\}^T = \arg\min \sum_{i=1}^n \left\{ Y_i - \sum_{k=0}^{p-1}\gamma_k X_i^k - \sum_{j=1}^{N} \gamma_{j,p} (X_i - t_j)_+^{p-1} \right\}^2.$$

When constructing the confidence bands, one needs to evaluate the functions $\sigma_{n,p}^2(x)$ in (2.2.11). This is done differently for the exact and the conservative bands, and the description is separated into two subsections. For both constant and linear bands, according to the variance computations of Section 2.7, one needs the unknown functions $f(x)$ and $\sigma^2(x)$. Let $K(u) = 15(1-u^2)^2\, 1\{|u|\le1\}/16$ be the quartic kernel, $s_x$ the sample standard deviation of $(X_i)_{i=1}^n$, and

$$\hat f(x) = \frac{1}{n\, h_{\mathrm{rot},f}} \sum_{i=1}^n K\left( \frac{X_i - x}{h_{\mathrm{rot},f}} \right), \quad h_{\mathrm{rot},f} = (4\pi)^{1/10}(140/3)^{1/5}\, n^{-1/5}\, s_x, \qquad (2.4.2)$$

where $h_{\mathrm{rot},f}$ is the rule-of-thumb bandwidth of Silverman (1986). Define next the vectors $Z_p = \{Z_{1,p}, \ldots, Z_{n,p}\}^T$, $p = 1, 2$, with $Z_{i,p} = \{Y_i - \hat m_p(X_i)\}^2$, and

$$X = X(x) = \left\{ (1, X_i - x) \right\}_{i=1}^n, \quad W = W(x) = \mathrm{diag}\left\{ K\left( \frac{X_i - x}{h_{\mathrm{rot}}} \right) \right\}_{i=1}^n,$$

where $h_{\mathrm{rot}}$ is the rule-of-thumb bandwidth of Fan and Gijbels (1996) based on the data $\{(X_i, Z_{i,p})\}_{i=1}^n$. Then one defines the following local linear estimators of $\sigma^2(x)$, taking the intercept component of

$$\hat\sigma_p^2(x) = \left( X^T W X \right)^{-1} X^T W Z_p, \quad p = 1, 2. \qquad (2.4.3)$$

Bickel and Rosenblatt (1973) and Fan and Gijbels (1996) provide the following uniform consistency results:

$$\max_{p=1,2} \sup_{x\in[a,b]} \left| \hat\sigma_p(x) - \sigma(x) \right| + \sup_{x\in[a,b]} \left| \hat f(x) - f(x) \right| = o_p(1). \qquad (2.4.4)$$

2.4.1 Implementing Exact Bands

The function $\sigma_{n,1}(x)$ is approximated by either one of the following, with $\hat f(x)$ and $\hat\sigma_1(x)$ defined in (2.4.2) and (2.4.3), and $j(x)$ defined in (2.2.2):

$$\hat\sigma_{n,1}(x,1) = \hat\sigma_1(t_{j(x)})\, \hat f^{-1/2}(t_{j(x)})\, n^{-1/2} h^{-1/2}, \qquad (2.4.5)$$

$$\hat\sigma_{n,1}(x,2) = \hat\sigma_1(x)\, \hat f^{-1/2}(x)\, n^{-1/2} h^{-1/2}, \qquad (2.4.6)$$

where the additional parameter value $1$ or $2$ indicates estimation at the nearest left knot or at each value $x$, respectively. Since $\sup_{x\in[a,b]} |t_{j(x)} - x| \le h \to 0$ as $n \to \infty$, (2.4.4) entails that both of the bands below are asymptotically exact, with $\hat m_1(x)$ given in (2.4.1) and $d_n$ in (2.2.13):

$$\hat m_1(x) \pm \hat\sigma_{n,1}(x, \mathrm{opt})\, \{2\log(N+1)\}^{1/2}\, d_n, \quad \mathrm{opt} = 1, 2. \qquad (2.4.7)$$

A compact implementation of (2.4.7) is sketched below.
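The following is a minimal sketch of the exact constant-spline band (2.4.7) with opt = 2, in Python with numpy (the thesis used XploRe). As simple stand-ins for the kernel plug-ins (2.4.2)-(2.4.3), it uses bin-wise residual variances and a histogram density, which are our simplifications; it also assumes every subinterval contains observations and that the grid lies inside $[a,b]$:

```python
import numpy as np

def exact_band(x_grid, X, Y, alpha=0.05, c1=5, c2=1):
    """Constant-spline (p = 1, regressogram) fit with the exact band (2.4.7)."""
    a, b = X.min(), X.max()
    n = len(X)
    N = int(c1 * n ** (1.0 / 3)) + c2            # interior knots, p = 1 default
    h = (b - a) / (N + 1)
    bins = np.clip(((X - a) / h).astype(int), 0, N)          # j(X_i)
    gbins = np.clip(((x_grid - a) / h).astype(int), 0, N)    # j(x) on the grid
    m_hat = np.array([Y[bins == j].mean() for j in range(N + 1)])
    resid2 = (Y - m_hat[bins]) ** 2
    sigma2 = np.array([resid2[bins == j].mean() for j in range(N + 1)])
    f_hat = np.array([(bins == j).mean() / h for j in range(N + 1)])
    d_n = 1 - (np.log(-0.5 * np.log(1 - alpha)) +
               0.5 * (np.log(np.log(N + 1)) + np.log(4 * np.pi))) / (2 * np.log(N + 1))
    half = (np.sqrt(sigma2 / (f_hat * n * h))        # sigma-hat_{n,1}(x, 2)
            * np.sqrt(2 * np.log(N + 1)) * d_n)[gbins]
    center = m_hat[gbins]
    return center, center - half, center + half
```

The regressogram step is the $p = 1$ instance of (2.3.3), since the least squares fit over piecewise constant splines is just the bin-wise mean.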
2.4.2 Implementing Conservative Bands

According to Lemma 2.7.3, for $0 \le j \le N$, the matrix $\Xi_j$ approximates the matrix $S_j$ uniformly. Hence both of the bands below are asymptotically conservative, with $\hat m_2(x)$ given in (2.4.1):

$$\hat m_2(x) \pm \hat\sigma_{n,2}(x, \mathrm{opt})\, \{2\log(N+1) - 2\log\alpha\}^{1/2}, \quad \mathrm{opt} = 1, 2, \qquad (2.4.8)$$

where the function $\sigma_{n,2}(x)$ in (2.2.11) for the linear band is estimated consistently by either one of the next two formulae:

$$\hat\sigma_{n,2}(x,1) = \left\{ A^T(x)\, \Xi_{j(x)}\, A(x) \right\}^{1/2} \sqrt{3/2}\; \hat\sigma_2(t_{j(x)})\, \hat f^{-1/2}(t_{j(x)})\, n^{-1/2} h^{-1/2}, \qquad (2.4.9)$$

$$\hat\sigma_{n,2}(x,2) = \left\{ A^T(x)\, \Xi_{j(x)}\, A(x) \right\}^{1/2} \sqrt{3/2}\; \hat\sigma_2(x)\, \hat f^{-1/2}(x)\, n^{-1/2} h^{-1/2}, \qquad (2.4.10)$$

with $A(x)$ and $\Xi_j$ defined in (2.2.9), $j(x)$ defined in (2.2.2), and $\hat f(x)$ and $\hat\sigma_2(x)$ defined in (2.4.2) and (2.4.3). In order to calculate the matrix $M_{N+2}^{-1}$, which is needed for (2.2.9), we introduce two theorems from matrix theory.

Theorem 2.4.1 [Gantmacher and Krein (1960), page 95, equation (43)]. For a symmetric Jacobi matrix $J$ given as

$$J = \begin{pmatrix} a_1 & b_1 & & 0 \\ b_1 & a_2 & \ddots & \\ & \ddots & \ddots & b_{N+1} \\ 0 & & b_{N+1} & a_{N+2} \end{pmatrix}_{(N+2)\times(N+2)},$$

its inverse matrix $J^{-1} = (l_{i,k})_{(N+2)\times(N+2)}$ satisfies

$$l_{i,k} = \psi_i\, \chi_k, \; i \le k; \qquad l_{i,k} = \psi_k\, \chi_i, \; k \le i, \qquad (2.4.11)$$

where

$$\psi_i = \frac{(-1)^i \det\left(J_{(1,\ldots,i-1)}\right)\, b_i b_{i+1}\cdots b_{N+1}}{\det(J)}, \quad \chi_k = \frac{(-1)^k \det\left(J_{(k+1,\ldots,N+2)}\right)}{b_k b_{k+1}\cdots b_{N+1}}, \qquad (2.4.12)$$

and $J_{(1,\ldots,i-1)}$ is defined as the upper left $(i-1)\times(i-1)$ submatrix of $J$, $\det(J)$ is the determinant of the matrix $J$, and $J_{(k+1,\ldots,N+2)}$ is the corresponding lower right $(N+2-k)\times(N+2-k)$ submatrix.

Theorem 2.4.2 [Zhang (1999), page 101, Theorem 4.5]. For a tridiagonal matrix given as

$$T_N = \begin{pmatrix} a & b & & 0 \\ c & a & \ddots & \\ & \ddots & \ddots & b \\ 0 & & c & a \end{pmatrix}_{N\times N}, \quad N \ge 1, \qquad (2.4.13)$$

if $a^2 \ne 4bc$, then the determinant of $T_N$ is

$$\det T_N = \frac{\alpha^{N+1} - \beta^{N+1}}{\alpha - \beta}, \quad \alpha = \frac{a + \sqrt{a^2 - 4bc}}{2}, \; \beta = \frac{a - \sqrt{a^2 - 4bc}}{2}.$$

To apply Theorem 2.4.1 and Theorem 2.4.2, we let

$$z_1 = \frac{2+\sqrt3}{4}, \quad z_2 = \frac{2-\sqrt3}{4}, \quad \theta = \frac{z_2}{z_1} = \left(2-\sqrt3\right)^2 = 7 - 4\sqrt3. \qquad (2.4.14)$$

For any $N \ge 1$, Theorem 2.4.2 entails that $\det(T_N) = (z_1^{N+1} - z_2^{N+1})/(z_1 - z_2)$, if one takes $a = 1$, $b = c = 1/4$ in (2.4.13). Next, denote for any $N \ge 1$

$$M_{N+1} = \begin{pmatrix} T_N & \tau_N^T \\ \tau_N & 1 \end{pmatrix}_{(N+1)\times(N+1)}, \quad \tau_N = \left(0, \ldots, 0, \sqrt2/4\right)_{1\times N},$$

with the convention that $M_1 \equiv 1$. By expanding the determinant of the matrix $M_i$ along the last row and then the last column, for $i = 1, \ldots, N+1$,

$$\det(M_i) = \det(T_{i-1}) - 8^{-1}\det(T_{i-2}) = \frac{z_1^{i-1}\left\{ 8 z_1 (1-\theta^i) - (1-\theta^{i-1}) \right\}}{8(z_1 - z_2)}.$$

The determinant of the matrix $M_{N+2}$ can be expanded along the first row and then the first column:

$$\det(M_{N+2}) = \det(M_{N+1}) - 8^{-1}\det(M_N) = z_1^{N-1}\left\{ 64 z_1^2 (1-\theta^{N+1}) - 16 z_1 (1-\theta^N) + (1-\theta^{N-1}) \right\} / \{64(z_1 - z_2)\}.$$

Applying (2.4.12) to the matrix $M_{N+2}$ then yields explicit expressions for the entries $l_{i,k}$ needed in (2.2.9).

2.7 Proof of Theorems

2.7.1 Preliminaries

Lemma 2.7.1. There exists a sequence of positive numbers $\{D_n\}_{n\ge1}$ with $D_n \to \infty$ such that the following conditions are fulfilled:

$$\sum_{n=1}^{\infty} D_n^{-(2+\delta)} < \infty, \quad (nh)^{-1/2} (\log n)\, D_n \to 0, \quad (nh)^{1/2} D_n^{-(1+\delta)} \to 0, \quad D_n^2\, h\, n^{-1/2} \to 0. \qquad (2.7.1)$$

And for any sequence $\{D_n\}$ that satisfies the above four conditions, we have

$$P\left\{ \omega \,\middle|\, \exists N(\omega),\; |\varepsilon_i| \le D_n,\, i = 1, \ldots, n,\, n > N(\omega) \right\} = 1.$$

Lemma 2.7.2. As $n \to \infty$, for $c_{j,n}$ and $d_{j,n}$ defined in (2.2.3),

$$c_{j,n} = f(t_j)\, h\, (1 + r_{j,n,1}), \quad \langle b_{j,1}, b_{j',1} \rangle \equiv 0, \; j \ne j', \qquad (2.7.2)$$

$$d_{j,n} = \frac23 f(t_{j+1})\, h \begin{cases} 1 + r_{j,n,2}, & j = 0, \ldots, N-1, \\ 1/2 + r_{j,n,2}, & j = -1, N, \end{cases} \qquad (2.7.3)$$

$$\langle b_{j,2}, b_{j',2} \rangle = \frac16 f(t_{j+1})\, h \begin{cases} 1 + \tilde r_{j,n}, & |j'-j| = 1, \\ 0, & |j'-j| > 1, \end{cases} \qquad (2.7.4)$$

where

$$\max_{0\le j\le N} |r_{j,n,1}| + \max_{-1\le j\le N} |r_{j,n,2}| + \max_{-1\le j\le N} |\tilde r_{j,n}| \le C\,\omega(f,h). \qquad (2.7.5)$$

In particular,

$$\frac13 f(t_{j+1})\, h\, \{1 - C\omega(f,h)\} \le d_{j,n} \le \frac23 f(t_{j+1})\, h\, \{1 + C\omega(f,h)\}. \qquad (2.7.6)$$

PROOF OF LEMMA 2.3.1. For brevity, we give only the proof of (2.3.1) for $A_{n,1}$. Take any $j = 0, 1, \ldots, N$:

$$\left| \|B_{j,1}\|_{2,n}^2 - 1 \right| = \left| \sum_{i=1}^n \xi_i \right|, \quad \xi_i = \left\{ B_{j,1}^2(X_i) - 1 \right\} n^{-1},$$

with $E\xi_i = 0$; for any $k \ge 2$, Minkowski's inequality implies that $E|\xi_i|^k \le (2/n)^k\, 2^{-1}\, E\left[ B_{j,1}^{2k}(X_i) + 1 \right]$, while (2.7.2) entails that $E\xi_i^2 = n^{-2} E\left[ B_{j,1}^2(X_i) - 1 \right]^2 \ge c\, n^{-2} h^{-1}$. It is then clear that one can find a constant $c > 0$ such that for all $k > 2$,

$$E|\xi_i|^k \le \left( c\, n^{-1} h^{-1} \right)^{k-2} k!\, E|\xi_i|^2.$$

Applying Bernstein's inequality to $\sum_{i=1}^n \xi_i$, for any large enough $\delta > 0$,

$$P\left\{ \left| \sum_{i=1}^n \xi_i \right| \ge \delta\sqrt{(nh)^{-1}\log n} \right\} \le 2 n^{-3} \;\Rightarrow\; \sum_{n=1}^{\infty} P\left\{ \sup_{0\le j\le N} \left| \|B_{j,1}\|_{2,n}^2 - 1 \right| \ge \delta\sqrt{(nh)^{-1}\log n} \right\} < \infty$$

for such $\delta > 0$, and then (2.3.1) follows. $\Box$

2.7.2 Proof of Theorem 1

In this section, we investigate the asymptotic behavior of $\tilde\varepsilon_1(x)$ defined in (2.3.5).
Since $\langle B_{j',1}(\mathbf X), B_{j,1}(\mathbf X) \rangle_n = 0$ unless $j = j'$, $\tilde\varepsilon_1(x)$ can be written as

$$\tilde\varepsilon_1(x) = \sum_{j=0}^{N} \tilde\varepsilon_j^*\, B_{j,1}(x)\, \|B_{j,1}\|_{2,n}^{-2},$$

in which

$$\tilde\varepsilon_j^* = \langle \mathbf E, B_{j,1} \rangle_n = \frac1n \sum_{i=1}^n B_{j,1}(X_i)\, \sigma(X_i)\, \varepsilon_i.$$

Lemma 2.7.3. Let $\hat\varepsilon_1(x) = \sum_{j=0}^{N} \tilde\varepsilon_j^*\, B_{j,1}(x)$, $x \in [a,b]$; then

$$\left| \tilde\varepsilon_1(x) - \hat\varepsilon_1(x) \right| \le A_{n,1}\, (1 - A_{n,1})^{-1}\, \left| \hat\varepsilon_1(x) \right|, \quad x \in [a,b],$$

where $A_{n,1}$ is defined in (2.3.1). The asymptotic behavior of $\sup_{x\in[a,b]} |\tilde\varepsilon_1(x)|$ therefore is the same as that of $\sup_{x\in[a,b]} |\hat\varepsilon_1(x)|$.

Lemma 2.7.4. The pointwise variance of $\hat\varepsilon_1(x)$ is the function $\sigma_{n,1}^2(x)$ defined in (2.2.11), which satisfies

$$E\{\hat\varepsilon_1(x)\}^2 = \sigma_{n,1}^2(x) = \frac{\sigma^2(x)}{n f(x) h}\, \{1 + r_{n,1}(x)\}, \quad x \in [a,b], \qquad (2.7.7)$$

with $\sup_{x\in[a,b]} |r_{n,1}(x)| \to 0$.

PROOF. The term $E\{\hat\varepsilon_1(x)\}^2$ equals the expression for $\sigma_{n,1}^2(x)$ in (2.2.11). By (2.7.5) and the continuity of the functions $\sigma^2(x)$ and $f(x)$, $\sigma_{n,1}^2(x)$ can be expressed as

$$\frac{\sigma^2(x) f(x) h + \int_{J_{j(x)}} \left\{ \sigma^2(v) f(v) - \sigma^2(x) f(x) \right\} dv}{n \left\{ f(t_{j(x)})\, h + r_{j(x),n}\, n^{-1} \right\}^2} = \frac{\sigma^2(x)}{n f(x) h}\, \{1 + r_{n,1}(x)\},$$

with $\sup_{x\in[a,b]} |r_{n,1}(x)| \to 0$, establishing (2.7.7). $\Box$

Lemma 2.7.5. Let the sequence $\{D_n\}$ satisfy (2.7.1) and define for $x \in [a,b]$

$$\hat\varepsilon_{n,1}(x) = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, \tilde\varepsilon_j^* = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, (\tilde\varepsilon_j^* - E\tilde\varepsilon_j^*),$$

$$\hat\varepsilon_{n,1}^D(x) = \sigma_{n,1}(x)^{-1} \sum_{j=0}^{N} B_{j,1}(x)\, (\tilde\varepsilon_j^{*,D} - E\tilde\varepsilon_j^{*,D}), \quad \tilde\varepsilon_j^{*,D} = \frac1n \sum_{i=1}^n B_{j,1}(X_i)\, \sigma(X_i)\, \varepsilon_i\, 1\{|\varepsilon_i| \le D_n\}; \qquad (2.7.8)$$

then with probability 1,

$$\left\| \hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x) \right\|_\infty = O\left( (nh)^{1/2} D_n^{-(1+\delta)} \right) = o(1).$$

PROOF. Notice that $E\tilde\varepsilon_j^* = E\{n^{-1}\sum_i B_{j,1}(X_i)\sigma(X_i)\varepsilon_i\} = 0$, since $E(\varepsilon_i|X_i) = 0$; then

$$\hat\varepsilon_{n,1}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \iint_{J_{j(x)}} \sigma(v)\, \varepsilon\, dZ_n(v,\varepsilon),$$

according to the definition of $Z_n(v,\varepsilon)$ in (2.3.10). The process $\hat\varepsilon_{n,1}(x)$ is separated into two parts, $\hat\varepsilon_{n,1}(x) = \hat\varepsilon_{n,1}^D(x) + \{\hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x)\}$. The truncated part $\hat\varepsilon_{n,1}^D(x)$ is defined in (2.7.8). The tail part $\hat\varepsilon_{n,1}(x) - \hat\varepsilon_{n,1}^D(x)$ is bounded uniformly over $[a,b]$ by the sum of an empirical term

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \sqrt n\, \left| \frac1n \sum_{i=1}^n I_{j(x)}(X_i)\, \sigma(X_i)\, \varepsilon_i\, 1\{|\varepsilon_i| \ge D_n\} \right| \qquad (2.7.9)$$

and a mean term

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \sqrt n \iint_{J_{j(x)}} \sigma(v)\, |\varepsilon|\, 1\{|\varepsilon| \ge D_n\}\, dF(v,\varepsilon). \qquad (2.7.10)$$

By Lemma 2.7.1, the term in (2.7.9) is $0$ almost surely for large $n$. The term in (2.7.10) is bounded by

$$\sup_{x\in[a,b]} \left\{ \sigma_{n,1}(x)\, c_{j(x),n} \right\}^{-1} \sqrt n \int_{J_{j(x)}} \sigma(v) f(v) \left[ \int |\varepsilon|\, 1\{|\varepsilon| \ge D_n\}\, dF(\varepsilon|v) \right] dv \le C\, \frac{\sqrt{nh}}{D_n^{1+\delta}}.$$

The lemma follows immediately from the third condition in (2.7.1). $\Box$

Lemma 2.7.6. Define for $x \in [a,b]$

$$\varepsilon_{n,1}^{(0)}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \iint_{J_{j(x)}} \sigma(v)\, \varepsilon\, 1\{|\varepsilon| < D_n\}\, dB\{M(v,\varepsilon)\},$$

with $B$ the Brownian bridge in (2.3.11); then, by the strong approximation of Theorem 2.3.3, $\|\hat\varepsilon_{n,1}^D(x) - \varepsilon_{n,1}^{(0)}(x)\|_\infty = o(1)$ almost surely.

Lemmas 2.7.7, 2.7.8 and 2.7.9 then replace $\varepsilon_{n,1}^{(0)}(x)$, at a uniform almost sure cost of $o(1)$, by successive Wiener-process versions, the last being

$$\varepsilon_{n,1}^{(3)}(x) = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-1} \int_{J_{j(x)}} \sigma(v)\, f^{1/2}(v)\, dW(v),$$

where $W$ is a Wiener process; in particular, replacing the truncated conditional second moment $\tilde\sigma_{D_n}^2(v) = \int \varepsilon^2\, 1\{|\varepsilon| < D_n\}\, dF(\varepsilon|v)$ by $1$ along the way costs $O(D_n^2\, h\, n^{-1/2}) = o(1)$.

Lemma 2.7.10. The process $\varepsilon_{n,1}^{(3)}(x)$ is a Gaussian process with mean $0$, variance $1$, and covariance

$$\mathrm{cov}\left\{ \varepsilon_{n,1}^{(3)}(x),\, \varepsilon_{n,1}^{(3)}(y) \right\} = \delta_{j(x),j(y)}, \quad \forall x, y \in [a,b].$$

PROOF. The variance and covariance are given by Itô's isometry:

$$E\left\{ \varepsilon_{n,1}^{(3)}(x) \right\}^2 = \left\{ \sigma_{n,1}(x)\, \sqrt n\, c_{j(x),n} \right\}^{-2} \int_{J_{j(x)}} \sigma^2(v)\, f(v)\, dv = 1,$$

according to (2.7.7). Likewise, the covariance $\mathrm{cov}\{\varepsilon_{n,1}^{(3)}(x), \varepsilon_{n,1}^{(3)}(y)\}$ is

$$\left\{ \sigma_{n,1}(x)\, \sigma_{n,1}(y)\, n\, c_{j(x),n}\, c_{j(y),n} \right\}^{-1} E\left\{ \int_{J_{j(x)}} \sigma(v) f^{1/2}(v)\, dW(v) \int_{J_{j(y)}} \sigma(v) f^{1/2}(v)\, dW(v) \right\} = \delta_{j(x),j(y)},$$

since the Wiener integrals over disjoint intervals are independent, which completes the proof. $\Box$

PROOF OF PROPOSITION 2.3.1. The proof follows immediately from Lemmas 2.7.3, 2.7.5, 2.7.6, 2.7.7, 2.7.8, 2.7.9 and 2.7.10. $\Box$

PROOF OF THEOREM 1. It is clear from Proposition 2.3.1 that the Gaussian process $U(x)$ consists of $(N+1)$ i.i.d. standard normal variables $U(t_0), \ldots, U(t_N)$; hence Theorem 2.3.4 implies that as $n \to \infty$,

$$P\left\{ \sup_{x\in[a,b]} |U(x)| \le \tau/a_{N+1} + b_{N+1} \right\} \to \exp\left(-2e^{-\tau}\right).$$
By letting $\tau = -\log\{-\tfrac12\log(1-\alpha)\}$ and using the definitions of $a_{N+1}$ and $b_{N+1}$, we obtain

$$\lim_{n\to\infty} P\left[ \sup_{x\in[a,b]} |U(x)| \le -\log\left\{-\tfrac12\log(1-\alpha)\right\}\{2\log(N+1)\}^{-1/2} + \{2\log(N+1)\}^{1/2} - \tfrac12\{2\log(N+1)\}^{-1/2}\{\log\log(N+1)+\log4\pi\} \right] = 1-\alpha.$$

Replacing $U(x)$ with $\sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x)$ (Proposition 2.3.1), and using the definition of $d_n$ in (2.2.13), it follows that

$$\lim_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \sigma_{n,1}(x)^{-1}\tilde\varepsilon_1(x) \right| \le \{2\log(N+1)\}^{1/2} d_n \right] = 1-\alpha.$$

According to (2.3.6),

$$(nh)^{1/2}\{\log(N+1)\}^{-1/2}\, \|\tilde m_1(x) - m(x)\|_\infty = o_p(1).$$

Thus, according to (2.3.4),

$$\lim_{n\to\infty} P\left[ m(x) \in \hat m_1(x) \pm \sigma_{n,1}(x)\{2\log(N+1)\}^{1/2} d_n,\ \forall x\in[a,b] \right]$$
$$= \lim_{n\to\infty} P\left[ \{2\log(N+1)\}^{-1/2} d_n^{-1} \sup_{x\in[a,b]} \sigma_{n,1}^{-1}(x)\left| \tilde\varepsilon_1(x) + \tilde m_1(x) - m(x) \right| \le 1 \right]$$
$$= \lim_{n\to\infty} P\left[ \{2\log(N+1)\}^{-1/2} d_n^{-1} \sup_{x\in[a,b]} \sigma_{n,1}^{-1}(x)\left| \tilde\varepsilon_1(x) \right| \le 1 \right] = 1-\alpha. \; \Box$$

2.7.3 Preliminaries for Theorem 2

In this subsection we examine some matrices used in the construction of the confidence band in (2.2.14) and in the proof of Theorem 2. The next lemma is the analogue, for the linear basis, of the inner product computations for the piecewise constant basis. In what follows, we use $|T|$ to denote the maximal absolute value of any matrix $T$, and $M_{N+2}$ is the tridiagonal matrix defined in (2.2.10).

Lemma 2.7.1. The inner product matrix $V$ of the B-spline basis $\{B_{j,2}(x)\}_{j=-1}^N$ defined in (2.2.6) has the following decomposition:

$$V = M_{N+2} + \tilde V, \quad \tilde V = (\tilde v_{j,j'})_{j,j'=-1}^{N}, \qquad (2.7.1)$$

where $\tilde v_{j,j'} \equiv 0$ if $|j - j'| \ge 2$, and

$$|\tilde V| \le C\,\omega(f,h). \qquad (2.7.2)$$

PROOF. By (2.7.3), (2.7.4) and (2.7.5), the inner product $\langle b_{j',2}, b_{j,2} \rangle$ can be replaced by $\frac16 f(t_{j+1})h$ when $|j'-j| = 1$, and by $\frac23 f(t_{j+1})h$ or $\frac13 f(t_{j+1})h$ when $j' = j$, plus some uniformly infinitesimal differences dominated by $\omega(f,h)$. Then, based on the definition of $B_{j,2}(x)$, the lemma follows immediately. $\Box$

The next lemma shows that multiplication by $M_{N+2}$ behaves similarly to multiplication by a constant.

Lemma 2.7.2. Given a matrix $Q = M_{N+2} + \Gamma$, in which $\Gamma = (\gamma_{j,j'})_{j,j'=-1}^N$ satisfies $\gamma_{j,j'} \equiv 0$ if $|j-j'| \ge 2$ and $|\Gamma| \to 0$, there exist constants $c, C > 0$, independent of $n$ and $\Gamma$, such that, in probability,

$$c|\xi| \le |Q\xi| \le C|\xi|, \quad C^{-1}|\xi| \le |Q^{-1}\xi| \le c^{-1}|\xi|, \quad \forall \xi \in R^{N+2}. \qquad (2.7.3)$$

PROOF. Each row of $M_{N+2}$ has diagonal element equal to $1$, and one or two nonzero off-diagonal terms whose total absolute value does not exceed $2\sqrt2/4 = 1/\sqrt2$; hence

$$\left(1 - 1/\sqrt2 - 3|\Gamma|\right)|\xi| \le |Q\xi| \le 3(1+|\Gamma|)|\xi|,$$

which entails the left inequality of (2.7.3); the right one follows by switching the roles of $\xi$ and $Q\xi$. $\Box$

As an application of Lemma 2.7.2, consider the matrix $S = V^{-1}$ defined in (2.2.7). Let $\xi_{j'} = \{\mathrm{sgn}(s_{j,j'})\}_{j=-1}^N$; then there exists a positive constant $C_s$ such that

$$\sum_{j=-1}^{N} |s_{j,j'}| = |(S\xi_{j'})_{j'}| \le C_s\, |\xi_{j'}| = C_s, \quad \forall j' = -1, 0, \ldots, N. \qquad (2.7.4)$$

The matrix $S$ appears in the construction of the confidence band, but it cannot be computed exactly, as it involves the unknown density $f(x)$. We approximate $S$ with the inverse of $M_{N+2}$, which has the simpler, distribution-free form in (2.2.10). This approximation is uniform for the $S_j$ in (2.2.7) and the $\Xi_j$ in (2.2.9) as well.

Lemma 2.7.3. As $n \to \infty$, $|M_{N+2}^{-1} - S| \to 0$ and $\max_{0\le j\le N} |\Xi_j - S_j| \to 0$.
PROOF. By definition, $M_{N+2}M_{N+2}^{-1} = I = VS = (M_{N+2} + \tilde V)S$, so that $M_{N+2}(M_{N+2}^{-1} - S) = \tilde V S$. Denote by $e_i$ the unit vector with $i$-th element $1$; then, applying Lemma 2.7.2 with $Q = M_{N+2}$, together with (2.7.4) and (2.7.2), one derives

$$c\,\left|M_{N+2}^{-1} - S\right| \le \max_i \left| \tilde V S e_i \right| \le |\tilde V|\, \max_{j'} \sum_{j=-1}^N |s_{j,j'}| \le C_s\, C\,\omega(f,h) \to 0,$$

and the second claim follows since each $\Xi_j - S_j$ is a $2\times2$ submatrix of $M_{N+2}^{-1} - S$. $\Box$

The coefficients of the noise spline $\tilde\varepsilon_2(x) = \sum_{j=-1}^N \tilde a_{j,2}\, B_{j,2}(x)$ in (2.3.5) satisfy the normal equations; in other words,

$$\tilde{\mathbf a} = (\tilde a_{j,2})_{j=-1}^{N} = (V + \tilde B)^{-1} \left( \frac1n \sum_{i=1}^n B_{j,2}(X_i)\,\sigma(X_i)\,\varepsilon_i \right)_{j=-1}^{N}, \qquad (2.7.6)$$

where $|\tilde B| \le A_{n,2} = O_p\left(\sqrt{n^{-1}h^{-1}\log(n)}\right)$ by (2.3.2). Now define $\bar a_j$'s by replacing $(V + \tilde B)^{-1}$ with $V^{-1} = S$ in the above formula, i.e.,

$$\bar a_j = \sum_{j'=-1}^{N} s_{j,j'}\, \frac1n \sum_{i=1}^{n} B_{j',2}(X_i)\, \sigma(X_i)\, \varepsilon_i, \quad j = -1, \ldots, N, \qquad (2.7.7)$$

and define for $x \in [a,b]$

$$\bar\varepsilon_2(x) = \sum_{j=-1}^{N} \bar a_j\, B_{j,2}(x) = \sum_{j,j'=-1}^{N} s_{j,j'} \left\{ \frac1n \sum_{i=1}^{n} B_{j',2}(X_i)\, \sigma(X_i)\, \varepsilon_i \right\} B_{j,2}(x). \qquad (2.7.8)$$

In order to calculate the variance of $\bar\varepsilon_2(x)$, we express the matrix $\Sigma$ defined in (2.2.8) as

$$\Sigma = \Theta_n V \Theta_n + \tilde\Sigma, \quad \Theta_n = \mathrm{diag}\{\sigma(t_0), \ldots, \sigma(t_{N+1})\}, \quad |\tilde\Sigma| \to 0.$$

Lemma 2.7.4. The pointwise variance of $\bar\varepsilon_2(x)$ is the function $\sigma_{n,2}^2(x)$ defined in (2.2.11), which satisfies, uniformly over $x \in [a,b]$,

$$\sigma_{n,2}^2(x) = \frac{3\,\sigma^2(x)}{2 f(x)\, nh}\, A^T(x)\, S_{j(x)}\, A(x)\, \{1 + o(1)\},$$

where $j(x)$ is as defined in (2.2.2), $A(x)$ as defined in (2.2.9), and the matrix $S_j$ in (2.2.7). Consequently, there exist positive constants $c_\sigma$ and $C_\sigma$ such that for large enough $n$,

$$c_\sigma\, (nh)^{-1/2} \le \sigma_{n,2}(x) \le C_\sigma\, (nh)^{-1/2}, \quad \forall x \in [a,b]. \qquad (2.7.12)$$

PROOF. See Wang and Yang (2005). $\Box$

2.7.4 Proof of Theorem 2

Several lemmas will be given below for the proof of Proposition 2.3.2.

Lemma 2.7.5. Let the sequence $\{D_n\}$ satisfy (2.7.1) and define for $x \in [a,b]$

$$\bar\varepsilon_{n,2}(x) = \sigma_{n,2}^{-1}(x)\, \bar\varepsilon_2(x) = \sigma_{n,2}^{-1}(x) \sum_{j'=-1}^{N} \bar a_{j'}\, B_{j',2}(x),$$

together with its truncated version $\bar\varepsilon_{n,2}^D(x)$, obtained by replacing $\varepsilon_i$ with $\varepsilon_i\, 1\{|\varepsilon_i| < D_n\}$ in (2.7.7); then $\|\bar\varepsilon_{n,2}(x) - \bar\varepsilon_{n,2}^D(x)\|_\infty = o(1)$ almost surely.

Lemmas 2.7.6 through 2.7.8 approximate $\bar\varepsilon_{n,2}^D(x)$ uniformly, via the strong approximation (2.3.11) and Itô's isometry, by a Gaussian process $\varepsilon_{n,2}^{(3)}(x)$, which at each $x$ takes the form $\varepsilon_{n,2}^{(3)}(x) = \xi(x)^T A_{j(x)}\, \{\xi(x)^T \mathrm{cov}(A_{j(x)})\, \xi(x)\}^{-1/2}$ for a suitable vector $\xi(x) \in R^2$ and mean-zero bivariate Gaussian vectors $A_j$ attached to the knot intervals $J_j$, $0 \le j \le N$.

Lemma 2.7.9. For a given $0 < \alpha < 1$,

$$\liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right] \ge 1-\alpha. \qquad (2.7.15)$$

PROOF. Let $Q_j = A_j^T\{\mathrm{cov}(A_j)\}^{-1}A_j$, $0 \le j \le N$. Since $Q_j \sim \chi^2(2)$, $P[Q_j > 2\{\log(N+1) - \log\alpha\}] = \alpha/(N+1)$, so that

$$P\left[ \max_{0\le j\le N} Q_j \le 2\{\log(N+1) - \log\alpha\} \right] \ge 1-\alpha. \qquad (2.7.14)$$

Then (2.7.14) and the Maximization Lemma of Johnson and Wichern (1992), page 66, ensure that for any $x \in [a,b]$,

$$\left\{ \varepsilon_{n,2}^{(3)}(x) \right\}^2 = \frac{\left\{ \xi(x)^T A_{j(x)} \right\}^2}{\xi(x)^T\, \mathrm{cov}(A_{j(x)})\, \xi(x)} \le A_{j(x)}^T \{\mathrm{cov}(A_{j(x)})\}^{-1} A_{j(x)} = Q_{j(x)}.$$

One therefore has $\sup_{x\in[a,b]} \{\varepsilon_{n,2}^{(3)}(x)\}^2 \le \max_{0\le j\le N}\{Q_j\}$ and

$$P\left[ \sup_{x\in[a,b]} \left| \varepsilon_{n,2}^{(3)}(x) \right|^2 \le 2\{\log(N+1) - \log\alpha\} \right] \ge P\left[ \max_{0\le j\le N}\{Q_j\} \le 2\{\log(N+1) - \log\alpha\} \right] \ge 1-\alpha.$$

Now (2.7.15) follows from Lemmas 2.7.5, 2.7.6, 2.7.7 and 2.7.8. $\Box$

Lemma 2.7.11.

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = O_p(A_{n,2}) \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = o_p(1).$$

PROOF. Recall the definitions of $\tilde{\mathbf a} = (\tilde a_{-1,2}, \ldots, \tilde a_{N,2})^T$ and $\bar{\mathbf a} = (\bar a_{-1}, \ldots, \bar a_N)^T$ in (2.7.6) and (2.7.7); one has $(V + \tilde B)\tilde{\mathbf a} = V\bar{\mathbf a}$. Based on Lemma 2.7.2 and (2.3.2), there exists a constant $c$ such that

$$c\,|\tilde{\mathbf a} - \bar{\mathbf a}| \le |V(\tilde{\mathbf a} - \bar{\mathbf a})| = |\tilde B\tilde{\mathbf a}| \le A_{n,2}\,(|\tilde{\mathbf a} - \bar{\mathbf a}| + |\bar{\mathbf a}|) \;\Rightarrow\; |\tilde{\mathbf a} - \bar{\mathbf a}| \le \frac{A_{n,2}}{c - A_{n,2}}\, |\bar{\mathbf a}|. \qquad (2.7.16)$$

From the definitions of $\tilde\varepsilon_2(x)$ and $\bar\varepsilon_2(x)$ in (2.3.5) and (2.7.8), plus (2.7.12) and (2.7.16), as $n \to \infty$,

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \sup_{x\in[a,b]} \sigma_{n,2}^{-1}(x)\, |\tilde{\mathbf a} - \bar{\mathbf a}|\, \sup_x \sum_{j=-1}^N B_{j,2}(x) \le C\, \frac{\sqrt{nh}}{c_\sigma}\, \frac{A_{n,2}}{c - A_{n,2}}\, |\bar{\mathbf a}|. \qquad (2.7.17)$$

On the other hand,

$$\sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \ge \frac{\sqrt{nh}}{C_\sigma} \sup_{x\in[a,b]} \left| \sum_{j=-1}^{N} \bar a_j\, B_{j,2}(x) \right| \ge \bar c\, \frac{\sqrt{nh}}{C_\sigma}\, |\bar{\mathbf a}|. \qquad (2.7.18)$$

The desired result then follows from (2.7.17) and (2.7.18), since $A_{n,2} = o_p(1)$, i.e.,

$$\sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} - \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = O_p(A_{n,2}) \sup_{x\in[a,b]} \left| \frac{\bar\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| = o_p(1). \; \Box$$

PROOF OF PROPOSITION 2.3.2. It follows from Lemma 2.7.9 and Lemma 2.7.11 automatically. $\Box$

PROOF OF THEOREM 2. Now (2.3.6) implies that $\|\tilde m_2(x) - m(x)\|_\infty = O_p(h^2)$, and hence

$$(nh)^{1/2}\{\log(N+1)\}^{-1/2}\, \|\tilde m_2(x) - m(x)\|_\infty = O_p\left\{ (nh)^{1/2}\{\log(N+1)\}^{-1/2}\, h^2 \right\} = o_p(1).$$
Applying (2.3.7) in Proposition 2.3.2,

$$\liminf_{n\to\infty} P\left[ m(x) \in \hat m_2(x) \pm \sigma_{n,2}(x)\{2\log(N+1) - 2\log\alpha\}^{1/2},\ \forall x\in[a,b] \right]$$
$$= \liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \sigma_{n,2}^{-1}(x)\left| \tilde\varepsilon_2(x) + \tilde m_2(x) - m(x) \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right]$$
$$= \liminf_{n\to\infty} P\left[ \sup_{x\in[a,b]} \left| \frac{\tilde\varepsilon_2(x)}{\sigma_{n,2}(x)} \right| \le \{2\log(N+1) - 2\log\alpha\}^{1/2} \right] \ge 1-\alpha. \; \Box$$

CHAPTER 3

Spline-Backfitted Kernel Regression

3.1 Introduction

One popular approach to the issue of the "curse of dimensionality" is the additive model popularized by the book of Hastie and Tibshirani (1990),

$$Y = m(\mathbf X) + \sigma(\mathbf X)\,\varepsilon, \quad \mathbf X = (X_1, \ldots, X_d), \quad m(\mathbf x) = c + \sum_{\alpha=1}^{d} m_\alpha(x_\alpha), \qquad (3.1.1)$$

where the noise satisfies $E(\varepsilon|\mathbf X) = 0$, $\mathrm{var}(\varepsilon|\mathbf X) = 1$, and the component functions satisfy the identification conditions $E\,m_\alpha(X_\alpha) \equiv 0$, $\alpha = 1, \ldots, d$. In addition, we assume that the predictor $X_\alpha$ is distributed on a compact interval $[a_\alpha, b_\alpha]$, $\alpha = 1, \ldots, d$. The goal is the efficient and fast estimation of the $d$ unknown component functions $\{m_\alpha(x_\alpha)\}_{\alpha=1}^d$ based on an i.i.d. sample $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ following model (3.1.1).

If the last $d-1$ of the component functions were known by "oracle", then one could define a new variable $Y_1 = Y - c - \sum_{\alpha=2}^d m_\alpha(X_\alpha) = m_1(X_1) + \sigma(\mathbf X)\varepsilon$, which one could regress on the univariate variable $X_1$ to estimate the only unknown function $m_1(x_1)$, without the "curse of dimensionality". The basic idea of Linton (1997) was to obtain an approximation to the variable $Y_1$ by substituting $m_\alpha(X_\alpha)$, $\alpha = 2, \ldots, d$, with marginal integration pilot estimates (kernel-based), and to establish that the error caused by this "cheating" is negligible for estimating the function $m_1(x_1)$.

In this chapter we propose to pre-estimate the functions $\{m_\alpha(x_\alpha)\}_{\alpha=1}^d$ by an undersmoothed constant spline procedure. These function estimates are then used as if they were the true functions for constructing the "oracle" estimator. The greatest advantage of our approach over that of Linton (1997) is that ours is much faster and can be applied to cases of extremely high dimensional data (e.g., the number of predictors, $d$, can be as large as 50 or 100). We believe that our approach is the first example of marrying the traditionally parallel spline smoothing and kernel smoothing techniques, leading to an estimator with an asymptotically normal distribution like a typical kernel estimator, without the formidable computational burden of high dimensional kernel smoothing.

Figuratively speaking, spline smoothing can be compared to a sledgehammer, capable of breaking any huge chunk of material (i.e., a regression problem with very high dimension and very large sample size) in one slam (i.e., solving only one linear least squares problem), but it does not guarantee the fine shapes of the broken pieces (i.e., the estimates are not guaranteed to converge at any point or uniformly over an interval, only in the $L_2$ sense). In contrast, kernel smoothing works like a sharp knife that cuts anything into pieces of precise shape (i.e., confidence intervals are available at any point based on the asymptotic normal distribution, and confidence bands are available over compact intervals), but it is too tedious to use on a large chunk of material (i.e., the computation cost is intolerable when the dimension is high and/or the sample size is large).
Our proposed new tool can be described as a hammer-knife, capable of first slamming any huge clump into many much smaller pieces (i.e., univariate regression problems) in one hit (the spline backfitting step), and then cutting all the smaller pieces into exactly the desired shapes (one-dimensional kernel smoothing of the backfitted pseudo data). In this sense, the method we propose combines the best features of both spline and kernel methods. Smoothing experts may wonder how one could have all these good features in one method. The success of our method is due to the well-known "reducing bias by undersmoothing" and "averaging out the variance" principles; see Propositions 3.3.1, 3.3.2 and 3.3.3. Both goals are accomplished with the joint asymptotics of kernel and spline functions, which is the new feature of our proofs. For more details, see Lemmas 3.6.3, 3.6.4 and 3.6.5.

In addition to the above features, uniform confidence bands are provided for all function estimates under mild conditions. The literature on nonparametric confidence bands has been scarce and, as far as we know, is lacking in the multivariate regression setting. For the additive regression model, the present work seems to be one of the few to offer a measure of uniform accuracy with theoretical justification. The good news is that the confidence band we provide for $m_\alpha(x_\alpha)$, for any $\alpha = 1, \ldots, d$, is asymptotically the same confidence band that Härdle (1989) established for univariate regression with a kernel smoother, regardless of how many regressors there are and what the other functions $m_\beta(x_\beta)$, $\beta \ne \alpha$, are. Hence neither the dimension $d$ nor the other function components play any role in forming the band for $m_\alpha(x_\alpha)$, at least according to the asymptotic theory. In this sense, our estimator of $m_\alpha(x_\alpha)$ possesses what we would like to call "uniform oracle efficiency", which is much stronger than the "pointwise oracle efficiency" of Linton (1997). Furthermore, components in directions not of interest are only required to be Lipschitz continuous (see Remark 3 at the end of Section 3.2). Compared to all existing methods, this feature makes admissible the broadest class of additive models.

The rest of the chapter is organized as follows. In Section 3.2 we introduce the spline-backfitted kernel estimator and state its asymptotic "oracle efficiency" under appropriate assumptions, both pointwise and uniform. In Section 3.3 we provide some insight into the ideas behind our proofs of the main theoretical results, by decomposing the estimator's "cheating" error into a bias part and a noise part, which are shown separately to be of negligible order. In Section 3.4 we present extensive Monte Carlo results to demonstrate that the proposed estimator does indeed possess the claimed asymptotic properties. The simulated examples cover a wide range of sample sizes with correlated structure and some very high dimensions, which would have been either infeasible to handle with kernel smoothing methods, or lacking any measure of confidence, pointwise or global, by spline methods. The proposed estimator is applied to the Boston housing data in Section 3.4.2. Section 3.5 concludes, and all technical proofs are contained in Section 3.6.

3.2 SBK and SBLL Estimators

In this section, we describe the spline-backfitted kernel estimation procedure. Let $\{Y_i, \mathbf X_i^T\}_{i=1}^n = \{Y_i, X_{i1}, \ldots, X_{id}\}_{i=1}^n$ be an i.i.d. sample following model (3.1.1). In what follows, we write all responses as $\mathbf Y = (Y_1, \ldots, Y_n)^T$ and denote by $\mathbf X$ the design matrix $(\mathbf X_1, \ldots, \mathbf X_n)^T$.
Without loss of generality, we take all intervals $[a_\alpha, b_\alpha] = [0,1]$, $\alpha = 1, \ldots, d$. We preselect an integer $N_n \sim n^{2/5}\log(n)$; see Assumption (AS6) below. Next, we define for any $\alpha = 1, \ldots, d$ the indicator functions $I_{J,\alpha}(x_\alpha)$ of the $(N+1)$ equally-spaced subintervals of the finite interval $[0,1]$, that is,

$$I_{J,\alpha}(x_\alpha) = \begin{cases} 1, & JH \le x_\alpha < (J+1)H, \\ 0, & \text{otherwise}, \end{cases} \quad H = H_n = (N_n + 1)^{-1}, \; J = 0, 1, \ldots, N. \qquad (3.2.1)$$

Define next the $(1 + dN)$-dimensional space $G$ of additive spline functions as the linear space spanned by $\{1, I_{J,\alpha}(x_\alpha),\, \alpha = 1, \ldots, d,\, J = 1, \ldots, N\}$, and denote by $G_n$ the subspace of $R^n$ spanned by $\{\{1\}_{i=1}^n, \{I_{J,\alpha}(X_{i\alpha})\}_{i=1}^n,\, \alpha = 1, \ldots, d,\, J = 1, \ldots, N\}$. As $n \to \infty$, the dimension of $G_n$ becomes $1 + dN$ with probability approaching one. The spline estimator of the additive function $m(\mathbf x)$ is the unique element $\hat m(\mathbf x) = \hat m_n(\mathbf x)$ of the space $G$ such that the vector $\{\hat m(\mathbf X_1), \ldots, \hat m(\mathbf X_n)\}^T$ best approximates the response vector $\mathbf Y$. To be precise, we define

$$\hat m(\mathbf x) = \hat\lambda_0 + \sum_{\alpha=1}^{d} \sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(x_\alpha), \qquad (3.2.2)$$

where the coefficients $\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d}$ are the solution of the following least squares problem:

$$\{\hat\lambda_0, \hat\lambda_{1,1}, \ldots, \hat\lambda_{N,d}\}^T = \arg\min_{R^{dN+1}} \sum_{i=1}^n \left\{ Y_i - \lambda_0 - \sum_{\alpha=1}^{d}\sum_{J=1}^{N} \lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}) \right\}^2. \qquad (3.2.3)$$

The pilot estimators of each component function and of the constant are defined as

$$\hat m_\alpha(x_\alpha) = \sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(x_\alpha) - n^{-1}\sum_{i=1}^{n}\sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}),$$

$$\hat m_c = \hat\lambda_0 + n^{-1}\sum_{i=1}^{n}\sum_{\alpha=1}^{d}\sum_{J=1}^{N} \hat\lambda_{J,\alpha}\, I_{J,\alpha}(X_{i\alpha}). \qquad (3.2.4)$$

These pilot estimators are then used to define a set of new pseudo-responses $\hat Y_{i1}$, which are estimated versions of the unobservable "oracle" responses $Y_{i1}$; to be specific,

$$\hat Y_{i1} = Y_i - \hat c - \sum_{\alpha=2}^{d} \hat m_\alpha(X_{i\alpha}), \quad Y_{i1} = Y_i - c - \sum_{\alpha=2}^{d} m_\alpha(X_{i\alpha}), \quad i = 1, \ldots, n, \qquad (3.2.5)$$

where, by the Central Limit Theorem, $\hat c$ is a $\sqrt n$-consistent estimator of $c$. Next, we define the spline-backfitted kernel (SBK) estimator of $m_1(x_1)$ as $\hat m_{S,1}(x_1)$, based on $\{\hat Y_{i1}, X_{i1}\}_{i=1}^n$, which is an attempt to mimic the would-be Nadaraya-Watson estimator $\tilde m_{S,1}(x_1)$ of $m_1(x_1)$ based on $\{Y_{i1}, X_{i1}\}_{i=1}^n$, had the unobservable "oracle" responses $\{Y_{i1}\}_{i=1}^n$ been available:

$$\hat m_{S,1}(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1)\, \hat Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)}, \quad \tilde m_{S,1}(x_1) = \frac{\sum_{i=1}^n K_h(X_{i1}-x_1)\, Y_{i1}}{\sum_{i=1}^n K_h(X_{i1}-x_1)}, \qquad (3.2.6)$$

where $\hat Y_{i1}$ and $Y_{i1}$ are defined in (3.2.5). Throughout this chapter, on any fixed interval $[a,b]$, we denote the space of second order smooth functions as $C^{(2)}[a,b] = \{m \mid m'' \in C[a,b]\}$, and the class of Lipschitz continuous functions, for any fixed constant $C > 0$, as $\mathrm{Lip}([a,b], C) = \{m \mid |m(x) - m(x')| \le C|x - x'|,\ \forall x, x' \in [a,b]\}$. A compact implementation of this two-stage procedure is sketched below.
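The following sketch (Python with numpy; the thesis code was written in XploRe, and all names here are ours) carries out the two stages of (3.2.1)-(3.2.6): a constant-spline additive fit via least squares on the indicator basis, pseudo-responses with $\hat c = \bar Y$ as the $\sqrt n$-consistent centering (our choice), and a final univariate Nadaraya-Watson smoothing with the quartic kernel. It assumes the design has been rescaled to $[0,1]^d$ and that every subinterval contains observations:

```python
import numpy as np

def sbk(X, Y, x_grid, h=None, N=None):
    """Two-stage SBK estimator of m_1 (Section 3.2); X is (n, d) in [0,1]^d."""
    n, d = X.shape
    if N is None:                              # N_n ~ n^{2/5} log n, capped as in 3.4.1
        N = min(int(n ** 0.4 * np.log(n)), int((n / 4 - 1) / d))
    h = h or n ** (-0.2)                       # kernel bandwidth ~ n^{-1/5}, (AS5)
    H = 1.0 / (N + 1)
    J = np.minimum((X / H).astype(int), N)     # subinterval index per coordinate
    # Stage 1: least squares on the indicator basis, eqs. (3.2.2)-(3.2.3)
    B = np.ones((n, 1 + d * N))
    for a in range(d):
        for j in range(1, N + 1):
            B[:, a * N + j] = (J[:, a] == j)   # I_{J,a}; the J = 0 cell is absorbed
    lam = np.linalg.lstsq(B, Y, rcond=None)[0]
    def m_pilot(a, x):                         # centered pilot estimate, eq. (3.2.4)
        lam_a = np.concatenate([[0.0], lam[1 + a * N: 1 + (a + 1) * N]])
        j = np.minimum((x / H).astype(int), N)
        return lam_a[j] - lam_a[J[:, a]].mean()
    # Stage 2: pseudo-responses (3.2.5), then Nadaraya-Watson in direction 1, (3.2.6)
    Y1 = Y - Y.mean() - sum(m_pilot(a, X[:, a]) for a in range(1, d))
    K = lambda u: np.where(np.abs(u) <= 1, 15.0 / 16 * (1 - u ** 2) ** 2, 0.0)
    W = K((X[:, 0][None, :] - x_grid[:, None]) / h)      # quartic kernel weights
    return (W @ Y1) / np.maximum(W.sum(axis=1), 1e-12)
```

Replacing the final Nadaraya-Watson step by a weighted local linear fit yields the SBLL variant used in the simulations.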
The number of interior knots Nn ~ 112/5 log (n), i.e., an2/5 log (n) 3 Na S (7an5 log (n) for some positive constants CMCN, and the interval width H = (Nn + 1)“. 58 The asymptotic property of the kernel smoother in“ (2:1) is well-developed. Un- der Assumptions (ASH—(ASS), according to Theorem 4.2.1 of Hardle ( 1990), one has «I»? {m (an) — m1(x1>— b 0. There is no optimal way to choose N,’,, however, at least to us at this time. The fact that N;1 = o (n'2/5) ensures that the bias in the spline pilot estimators is negligible compared tO the bias of h2 in the kernel/local linear smoothing stage. On the other hand, one does not allow Nn to be too large for practical reasons: the number Of terms in (3.2.3), 1 + dNn has to be small relative to n. Hence we select Nn to be of barely larger order than n2/5. Remark 3. Assumption A1 requires only the Lipschitz continuity for the com- ponents except for the component Of interest. Obviously all ma are required to be second order smooth if one needs to estimate all components. 3.3 Decomposition In this section, we introduce some additional notations in order to shed some light on the ideas behind the proofs of Theorems 3.2.1 and 3.2.2. Denote by ||¢||2 the theoretical L2 norm of a function o5 on [0,1]“, “(tug = E{¢2 (X)} = f[0,l]d (#2 (x) f (x) dx, and the empirical L2 norm as ”ding,“ = n‘1 2&1 ¢2 (Xi). For any Lg-integrable functions 45, (p on [0,1]d , the corresponding inner products are de- fined by (¢,¢)2 =/ d¢(X)<.0(X)f(X)dx=E{¢(X)2 11 1300/31, "-1 2?:1 BJ,a (Xia) 0 (xi) 5i ISJSN, , 1_<_J,J’SN . ISOSd (3.3.11) Our main objective is to study the difference between smoothed backfitted esti- mator 111.84 (2:1) and the smoothed “oracle” estimator {133,1 (1:1), both given in (3.2.6). From now on, we assume without loss of generality that d = 2 for notational brevity. . . . _ ON +1 Denote the projection matrix P0 N I — , we define another aux- +l’ N IN iliary entity _1 T N .3; (x2) = P321132) = {(BTB) BTE} PoN,,,1N (B (x))T = 2 6112812 (23). J=l 64 which, in particular, entails that -1 T T N 52 (X12) = {(BTB) BTE} P0N+1,IN (61TH) = Z a.I,2BJ,2 (X12), J=l (3.3.12) in which e,- is the n—dimensional unit vector with i-th element 1 and else 0 and hence the i-th row of matrix B, QTB = B (X,) , is the basis functions corresponding to the i-th Observation Xi. Definitions (3.3.5) and (3.3.6) imply that 52 (3:2) is simply the empirical centering of g?! (11:2), i.e. n N n N 52 ($2) 5 52 ($2)-"—1 :52 (X12) = Z 5J,2BJ,2 (Km-n.1 2: 51,2312 (X12) - ,-=1 1:1 i=1 J=l (3.3.13) Making use Of the signal noise decomposition (3.3.8), the difference my, (3:1) - 613,1 (11:1) + 6 — c can be treated as the sum of two terms "_IZ?=1Kh(Xi1- $1) {m2 (X12) - m2 (X12)} = I($1) + 11 ($1) "-1 Z?=1Kh(X11 -$1) "'12?=1Kh(X11 —$1)’ (3.3.14) where 1 ($1) = "—1 Z Kh (X11 - $1) '52 (X12), (3-3-15) i=1 11 ($1) = "-1 2K); (X11 '- $1) ' {7712 (X12) - m2 (X12)}- (3-3-15) i=1 The term I ($1) is related to the noise terms 52 (X12), while II (2:1) is induced by the bias terms in; (X52)-—m2 (X12) . Propositions 3.3. 1 and 3.3.2 below show respectively that the term I ($1) is of order 0,, (n'2/5), either at a given point or over an interval. This is the most challenging part to be proved, mostly done in Subsection 3.6.1. On the other hand, Proposition 3.3.3 below shows that the bias term II (3:1) is uniformly 65 of order 0,, (n'2/5) for 2:1 6 [0,1], to be proved in Subsection 3.6.2. 
Standard theory of kernel density estimation ensures that the denominator term in (3.3.14), 11‘1 Z?=1Kh(X,~1 -- x1), has a positive lower bound for 2:1 6 [0, l]. The additional nuisance term é—c is of clearly order 0 (n‘l/z) and thus 0,, (n‘2/5) , which needs no further arguments for the proofs. Hence both Theorems 3.2.1 and 3.2.2 follow from Propositions 3.3.1, 3.3.2 and 3.3.3. Section 3.6, therefore, is devoted'exclusively to the proofs Of these three propositions, rather than of the main theoretical results, Theorems 3.2.1 and 3.2.2 themselves. The next three propositions follow respectively from Lemmas 3.6.10 and 3.6. 11, Lemmas 3.6.11 and 3.6.12, Lemmas 3.6.1 and 3.6.2. Proposition 3.3.1. Under Assumptions (A51) to (A56), for any 93] 6 [0,1] (1 ($1)) = 0,, (71-1/2) = a, (71-2/5) . Proposition 3.3.2. Under Assumptions (A51) to (A56) and (A52’) sup Ia: =0 n"1/2lon1/2 =0 n'2/5. xlelmlul): p( {g} ) ,( ) Proposition 3.3.3. Under Assumptions (A51), and (A53) to (A56) ..i‘ié’i'“'=01(‘m=0p)(""2”)- 66 3.4 Simulation and Examples 3.4.1 Simulation In this section, we present simulated results to illustrate the finite-sample behavior of the spline backfitted kernel estimators in”, (2:0,) for any a = 1, ...d. The data set is generated from the regression model Y = 23:1 ma (X a)+a (X)-e. The additive elements are assumed to be ma (ma) = sin (21er) ,Va = 1, ...,d. Similar to Nielsen and Sperlich (2005), the predictors Xa are obtained through the transformation X0 = 2.5 * { (Za) — 0.5}, where (I) is the standard normal distri- bution function and the variable Za ~ N (0,1) ,0: = 1, ...,d with thecorrelation coefficients pug = p, a 74 ,6 for any pair of Z ’3. Now the correlation between X’s is not p any more, it will depend on p. In order to validate the assumption that the density is bounded below from 0, we will focus on the estimation inside [—1, 1]d. Meanwhile, the error term 5 follows standard normal distribution and is indepen- dent of X. The conditional standard deviation function is defined by _ a 100-exp{2§=1 IxaI/d} “ 7' 100 + exp {221:1 Ixal /d}° By this choice of a (x), we ensure that our design is heteroscedastic, and the variance 0 (X) is roughly proportional to dimension d. This proportionality is intended to mimic the case when independent copies of the same kind of univariate regression problems are simply added together. 67 We now describe how the SBLL estimator are implemented. The first step is to obtain the spline estimator of Egg ma (X0), using the truncated power B-spline basis as in (3.2.3). The selection of knots will uniquely define the basis. The knots number N" will be determined by the sample size and two tuning constants, to be specific Nn : min ([Cln2/510gn] + 02, [(n/4 -1)d—1]): in which [c] denotes the integer part of c. In our simulation study, we have used c1 = 1 = c2. The choice of these constants c1 and c2 makes little difference for a large sample. But for small sample size, it does affect the performance to a degree. The additional constraint that N S (n / 4 — 1) d‘1 ensures that the number of terms in the linear least squares problem (3.2.3), 1 + dNn, is no greater than n/4, which is necessary when the sample size n is moderate and dimension d is high. 
The oracle smoother m 3,1 (:01) for comparison is obtained by local linear regression of the unobservable m1 (X 1 )+0 (X) e on X 1 directly, while the oracle SBLL estimators ms) (2:1) are obtained by local linear regression of {121, Xil }:=l° To save space, we only implement the local linear version of mm (2:1), i.e., the SBLL estimator, using the XploRe quantlet “lpregxest”. For information on XploRe, see Hardle, 'Hlavka and Klinke (2000) or visit http://www.xplore-stat.de. We have run 5 = 500 replications for sample sizes n = 100, 200, 500 and 1000 with correlation coefficient p = 0, 0.3 respectively. The dimensions are taken at d = 4, 10. The major objective of this section is to compare the relative efficiency of in”, with 68 respect to mm %2?=1 {film (XiaJ) ‘ "‘0 (Xia'l) }2 [{lxia l '31} a a = l n . 2 , 1,...,d,l = 1,...S 3 25:1 {ms,a (X1111) _ ma (Xia.l)} [{IXiaIISI} effaJ = S 1 effa "—" §§8fi0’[,a=l,u.,d, in which {X,-1,,,...,X,-d,,};‘=1 is the l-th sample, 1 = 1,...,5. Theorems 3.2.1 and 3.2.2 indicate that the efliciency should be close to 1. In particular, when we have an efficiency value bigger than 1, fits“, (2:0) is a better estimator in the sense of mean square distance. The corresponding mean and the standard error (in the parenthesis) of the rel- ative efficiencies for first and third dimension ((1 = 1, 3) is given in Table 4.3. For the case of p = 0, almost of all the mean values are around 1 without noticeable influence from the sample size and the correlation. The trend of standard errors confirm the comparability of SBLL Thad to the oracle estimator fits“), with faster convergence for a larger sample. At p = 0 and all the random selected directions, the SBLL performs better than the oracle local linear estimator in most cases because the independent components can be well—estimated at the first stage, then univariate local linear smoothing at the second stage will treat less noise than the case of direct oracle estimator, the local linear estimator. In the cases of p = 0.3, the trend to relative efficiency 1 is very clear regardless of the dimension d. All the means are becoming larger accordingly and approaching to 1 steadily when the sample size becomes bigger. Typically, the relative efficiencies are greater than 0.97 for d = 4 with sample size 200, and for d = 10 with sample size 69 500 respectively. We believe that in high dimensional cases the convergence rate is slower than in lower dimensional cases when the predictors are strongly correlated. The standard errors in the parenthesis follow the same trend that less variation is with larger sample size, though it shows slower convergence compared to the case of p = 0, which is not unexpected. In addition, several figures display the features of the relative efficiencies in details. In Figuras 4.6 and 4.7 four types of line characteristics which correspond to the four sample sizes, the solid line (100), the dotted line (200), the thin line (500) and the thick line (1000). The vertical line at efficiency 1 is the standard line for the comparison of mm (:51) and {7'sz (x1) . More efficiency values distributed around the vertical line would be confirmative to the conclusions of Theorems 3.2.1 and 3.2.2. All the curves in Figures 4.6 and 4.7 are the density estimates of relative efficiency distributions for specific sample size n, correlation coefficient p and dimension d. With increasing sample sizes, we found that the relative efficiency are becoming closer to the vertical standard line, with narrower spread out. 
In addition, the curves with $\rho=0$ show faster convergence to the vertical line than those with $\rho=0.3$ in all cases. An interesting point is that almost all of the peak points of the thick line (the largest sample size) fall very close to the vertical lines. All of the above confirms the theoretical result that the SBLL estimator behaves similarly to the oracle local linear estimator. We have carried out some further simulations with $d=50$ and $S=100$ replications, for $\rho=0,0.3$ and $n=500,1000,1500,2000$; the results are graphically represented in Figures 4.8 and 4.9. The basic graphical pattern is similar to that for the lower dimensions $d=4,10$, though with a slower convergence rate and relatively lower efficiency. The corresponding statistics are listed in Table 4.4.

3.4.2 Boston Housing Example

In this section we apply our method to the Boston housing data. The data file bostonh.dat is available with the software XploRe. The data set contains 506 different houses from a variety of locations in the Boston Standard Metropolitan Statistical Area in 1970. The median value and 13 sociodemographic statistics of the Boston houses were first studied by Harrison and Rubinfeld (1978) to estimate a housing price index model. Breiman and Friedman (1985) carried out further analysis to deal with the multicollinearity among the independent variables. Using a stepwise method, they proposed the alternating conditional expectation method to select a subset of the variables that maximizes the correlation between the fitted value and the selected covariates; four variables were selected after penalizing for overfitting. Opsomer and Ruppert (1998) illustrated their automated bandwidth selection for fitting additive models based on the selected four variables. We use the same four covariates for our model fitting and the current analysis. The response and explanatory variables of interest are:

MEDV: median value of owner-occupied homes in $1000's
RM: average number of rooms per dwelling
TAX: full-value property-tax rate per $10,000
PTRATIO: pupil-teacher ratio by town school district
LSTAT: proportion of the population that is of "lower status", in %

One major concern is the big gaps in the domains of the variables TAX and LSTAT, which would cause severe trouble at the first stage of spline estimation, so a logarithmic transformation is applied to these two variables before fitting the model. We fit the following additive model:
$$\mathrm{MEDV}=\mu+m_1\left(\mathrm{RM}\right)+m_2\left(\log\left(\mathrm{TAX}\right)\right)+m_3\left(\mathrm{PTRATIO}\right)+m_4\left(\log\left(\mathrm{LSTAT}\right)\right)+\varepsilon.$$
Although the transformation has shrunk the gaps in the domains, some compromise is still necessary in estimating the components, since we select the same number of knots for each direction; in this case we choose a large number of knots, $N=5$. In the smoothing step, we use the SBLL estimator to obtain the final function estimate for each input variable. In Figure 4.10, the univariate function estimates and corresponding confidence bands are displayed together with the "pseudo data points", whose pseudo-responses are the backfitted responses after subtracting the sum of the functions of the remaining three covariates, as in (3.2.5). All the function estimates are represented by dotted lines, the "data points" by circles, and the confidence bands by upper and lower thin lines. The kernel used in the SBLL estimator is the quartic kernel, $K\left(u\right)=\frac{15}{16}\left(1-u^2\right)^2$ for $-1<u<1$; a sketch of this second-stage smoother is given below. Besides the estimation of the component functions, we also use the proposed confidence bands to test the linearity of the components.
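As an illustration of the second-stage smoother, the following minimal Python sketch implements a quartic-kernel local linear estimator. It is a generic stand-in under the kernel defined above, not the XploRe quantlet lpregxest used for the reported results; the helper names quartic and local_linear are ours.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel K(u) = 15/16 * (1 - u^2)^2 on |u| < 1."""
    return np.where(np.abs(u) < 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

def local_linear(x0, x, y, h):
    """Local linear estimate of m(x0) from pseudo-responses y."""
    w = quartic((x - x0) / h)
    # weighted least squares of y on (1, x - x0); the intercept estimates m(x0)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]

# toy usage on a known curve
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 400)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(400)
grid = np.linspace(-0.9, 0.9, 7)
print([round(local_linear(g, x, y, h=0.15), 2) for g in grid])
```

In the actual analysis, y would be the pseudo-responses from the first-stage spline fit rather than raw observations.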
In Figure 4.10 the straight solid lines are the regression lines with the least squares coefficients. The first panel shows that the linearity null hypothesis $H_0:m_1\left(\mathrm{RM}\right)=a_1+b_1\cdot\mathrm{RM}$ is rejected, since the confidence bands with confidence level 0.99 cannot totally cover the straight regression line, i.e., the p-value is less than 0.01. Similarly, linearity of the component functions of $\log\left(\mathrm{TAX}\right)$ and $\log\left(\mathrm{LSTAT}\right)$ is not accepted at the significance level 0.01. In contrast, the least squares line of the variable PTRATIO in the upper right panel falls entirely between the upper and lower 95% confidence bands, so the linearity null hypothesis $H_0:m_3\left(\mathrm{PTRATIO}\right)=a_3+b_3\cdot\mathrm{PTRATIO}$ is accepted at the significance level 0.05. In addition, we add up all the SBLL estimates of the component functions and the mean response as an estimate of the response (MEDV). The correlation between this estimate and the raw values of MEDV is as high as 0.80112, implying a rather satisfactory fit.

3.5 Conclusions

In this paper we have proposed the SBK and SBLL estimators for the component functions in an additive regression model. These estimators behave asymptotically like the standard Nadaraya-Watson and local linear estimators in one dimension, thus breaking the problem of $d$-dimensional additive regression into $d$ univariate regression problems. This is achieved by approximating the unobservable sample $\left\{Y_{i1},X_{i1}\right\}_{i=1}^{n}$ with the spline-estimated sample $\left\{\hat{Y}_{i1},X_{i1}\right\}_{i=1}^{n}$. Although much mathematics is devoted to proving that this approximation works, the implementation is very easy; a minimal sketch of the first-stage construction appears below. To give some idea of how fast the procedure is, running 100 replications for sample sizes $n=500,1000,1500,2000$ and dimension as high as $d=50$ takes about 40 minutes on a Dell notebook; in other words, within this time span a total of $100\times4=400$ SBLL estimators $\hat{m}_{S,1}\left(x_1\right)$ and the same number of oracle smoothers $\tilde{m}_{S,1}\left(x_1\right)$ are computed. In addition, the SBK and SBLL estimators inherit the asymptotic confidence bands (3.2.10) of the univariate Nadaraya-Watson and local linear estimators. The combination of speed and global accuracy for very high dimensional regression is very appealing.
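To make the two-stage construction concrete, here is a minimal Python sketch of the first stage under the simulation design above. It is an illustration under stated assumptions, not the XploRe implementation used in this chapter: the helpers centered_constant_basis and pseudo_responses are hypothetical names, the basis is the empirically centered constant spline basis, and the minimum-norm least squares solution stands in for the constrained fit of (3.2.3).

```python
import numpy as np

def centered_constant_basis(x, n_knots):
    """Empirically centered constant spline basis B*_J = I_J - mean(I_J)
    on n_knots + 1 equal subintervals of the observed range."""
    edges = np.linspace(x.min(), x.max() + 1e-12, n_knots + 2)
    idx = np.searchsorted(edges, x, side="right") - 1
    B = np.zeros((x.size, n_knots + 1))
    B[np.arange(x.size), idx] = 1.0
    return B - B.mean(axis=0)

def pseudo_responses(X, Y, n_knots, alpha=0):
    """Stage 1: additive least squares on centered constant splines; returns
    Y_i - c_hat - sum_{b != alpha} m_hat_b(X_ib), the stage-2 input."""
    n, d = X.shape
    blocks = [centered_constant_basis(X[:, b], n_knots) for b in range(d)]
    design = np.column_stack([np.ones(n)] + blocks)
    # min-norm solution handles the mild rank deficiency of centered blocks
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    fit_other, pos = np.zeros(n), 1
    for b in range(d):
        k = blocks[b].shape[1]
        if b != alpha:
            fit_other += blocks[b] @ coef[pos:pos + k]
        pos += k
    return Y - coef[0] - fit_other
```

The returned pseudo-responses would then be smoothed against $X_{i\alpha}$ by the quartic-kernel local linear smoother sketched earlier to give the SBLL estimator of $m_\alpha$.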
3.6 Proof of Theorems

3.6.1 Variance Reduction

In this subsection we prove Propositions 3.3.1 and 3.3.2. The magnitude of the variance term $I\left(x_1\right)$ in (3.3.15) can be measured by its conditional second moment given $X_1,\dots,X_n$. Based on (3.3.13) and (3.3.15), the conditional second moment $E\left\{I^2\left(x_1\right)|\mathbf{X}\right\}$ of $I\left(x_1\right)$ given $\mathbf{X}=\left\{\mathbf{X}_1,\dots,\mathbf{X}_n\right\}$ is
$$E\left[\left\{n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\hat{\varepsilon}_2\left(X_{l2}\right)-n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\cdot n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_2\left(X_{i2}\right)\right\}^{2}\Bigg|\,\mathbf{X}\right].$$
It is clear that $E\left\{I^2\left(x_1\right)|\mathbf{X}\right\}=E\left\{I_1^2\left(x_1\right)|\mathbf{X}\right\}-E\left\{I_2^2\left(x_1\right)|\mathbf{X}\right\}$, where for brevity we write
$$I_1\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\hat{\varepsilon}_2\left(X_{l2}\right),\qquad(3.6.1)$$
$$I_2\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\cdot n^{-1}\sum_{i=1}^{n}\hat{\varepsilon}_2\left(X_{i2}\right).\qquad(3.6.2)$$
If one further denotes
$$\xi_J\left(\mathbf{X}_l,x_1\right)=K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right),\qquad(3.6.3)$$
then
$$I_1\left(x_1\right)=n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)\sum_{J=1}^{N}\hat{a}_{J,2}B_{J,2}\left(X_{l2}\right)=n^{-1}\sum_{l=1}^{n}\sum_{J=1}^{N}\hat{a}_{J,2}\xi_J\left(\mathbf{X}_l,x_1\right).\qquad(3.6.4)$$
In order to obtain the order of the conditional second moment of $I_1\left(x_1\right)$, we first find the supremum magnitudes of $E\xi_J\left(\mathbf{X}_l,x_1\right)$ and $\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)$, and the size of $\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|$, in Lemmas 3.6.3, 3.6.4 and 3.6.7. Consequently, Lemma 3.6.10 shows that $\sup_{x_1\in[0,1]}E\left\{I_1^2\left(x_1\right)|\mathbf{X}\right\}=O_p\left(n^{-1}\right)$, and in Lemma 3.6.11 we have $\sup_{x_1\in[0,1]}\left|I_2\left(x_1\right)\right|=O_p\left(Nn^{-1}\sqrt{\log n}\right)$. Based on the selection $N\sim n^{2/5}\log n$, Proposition 3.3.1 is thus proved. Lemma 3.6.12 requires one more assumption, (AS2'), in addition to Assumptions (AS1) to (AS6): under the new restrictions, the order of $I_1\left(x_1\right)$ is obtained uniformly over $[0,1]$, inflated only by a factor of $\left\{\log\left(n\right)\right\}^{1/2}$ compared with the pointwise case, so that $\sup_{x_1\in[0,1]}\left|I_1\left(x_1\right)\right|=O_p\left(n^{-1/2}\log^{1/2}n\right)$. Again, owing to the selection of the interval width $H\sim\left(n^{2/5}\log n\right)^{-1}$, the order $O_p\left(Nn^{-1}\sqrt{\log n}\right)$ of $\sup_{x_1\in[0,1]}\left|I_2\left(x_1\right)\right|$ in Lemma 3.6.11 is negligible compared with the order of $\sup_{x_1\in[0,1]}\left|I_1\left(x_1\right)\right|$. So under Assumptions (AS1) to (AS6) and (AS2'), we have established the uniform bound over $[0,1]$ of Proposition 3.3.2.

3.6.2 Bias Reduction

We now prove Proposition 3.3.3 by bounding the bias term $II\left(x_1\right)$ in (3.3.16). We first cite one important result from page 149 of de Boor (2001).

Theorem 3.6.1. Under Assumption (AS1), $m_\alpha\in\mathrm{Lip}\left([0,1],C_\infty\right)$, there exists a function $g_\alpha\in G[0,1]$ such that, for all $\alpha=1,\dots,d$,
$$\left\|g_\alpha-m_\alpha\right\|_\infty\leq C_\infty H.\qquad(3.6.5)$$

Lemma 3.6.1. Under Assumptions (AS1), (AS3) and (AS6), for the spline function $g_2$ satisfying (3.6.5), one has
$$\sup_{x_1\in[0,1]}\left|\frac{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{g_2\left(X_{i2}\right)-m_2\left(X_{i2}\right)\right\}}{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)}\right|\leq C_\infty H,\qquad(3.6.6)$$
and for $\alpha=1,2$,
$$E_ng_\alpha\left(X_\alpha\right)=n^{-1}\sum_{i=1}^{n}g_\alpha\left(X_{i\alpha}\right)=O_p\left(n^{-1/2}+H\right).\qquad(3.6.7)$$

PROOF. The first inequality (3.6.6) follows trivially from (3.6.5). To prove the second, define a function $g\left(\mathbf{x}\right)=c+\sum_{\alpha=1}^{2}g_\alpha\left(x_\alpha\right)$; then $\left\|g-m\right\|_\infty\leq2C_\infty H$ and hence $\left\|g-m\right\|_{2,n}\leq2C_\infty H$. The definition of projection in Hilbert space then implies that $\left\|\hat{m}-m\right\|_{2,n}\leq\left\|g-m\right\|_{2,n}\leq2C_\infty H$, where $\hat{m}$ is the projection of $m$ onto the space $G$ with respect to $\left\langle\cdot,\cdot\right\rangle_{2,n}$, and the triangle inequality implies that
$$\left\|\hat{m}-g\right\|_{2,n}\leq4C_\infty H.\qquad(3.6.8)$$
Now (3.6.5) leads to $\left|E_ng_\alpha\left(X_\alpha\right)-E_nm_\alpha\left(X_\alpha\right)\right|\leq C_\infty H$, while $Em_\alpha\left(X_\alpha\right)=0$ leads to $E_nm_\alpha\left(X_\alpha\right)=O_p\left(n^{-1/2}\right)$. Putting these together, one has
$$\left|E_ng_\alpha\left(X_\alpha\right)\right|\leq\left|E_ng_\alpha\left(X_\alpha\right)-E_nm_\alpha\left(X_\alpha\right)\right|+\left|E_nm_\alpha\left(X_\alpha\right)\right|\leq C_\infty H+O_p\left(n^{-1/2}\right),\qquad(3.6.9)$$
which establishes (3.6.7). □

In order to show that the bias term $II\left(x_1\right)$ defined in (3.3.16) is uniformly $o_p\left(n^{-2/5}\right)$, the following lemma suffices.

Lemma 3.6.2. Under Assumptions (AS1) to (AS6), as $n\to\infty$,
$$\sup_{x_1\in[0,1]}\left|\frac{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}}{\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)}\right|=O_p\left(n^{-1/2}+H\right).\qquad(3.6.10)$$

PROOF. Using the same notation as in the proof of Lemma 3.6.1, (3.6.8) and (3.6.9) now give
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2,n}\leq8C_\infty H+O_p\left(n^{-1/2}\right),$$
and Lemma 3.6.8 then entails that
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}=O_p\left(n^{-1/2}+H\right).\qquad(3.6.11)$$
To complete the proof of the lemma, we write
$$\left(\hat{m}-g\right)\left(\mathbf{x}\right)+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)=\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}B_{J,\alpha}^{*}\left(x_\alpha\right),$$
where the empirically centered spline basis functions are
$$B_{J,\alpha}^{*}\left(x_\alpha\right)=B_{J,\alpha}\left(x_\alpha\right)-E_nB_{J,\alpha}\left(X_\alpha\right)=B_{J,\alpha}\left(x_\alpha\right)-n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right),$$
for any $1\leq J\leq N$, $1\leq\alpha\leq2$. Then for $\alpha=1,2$,
$$\hat{m}_\alpha\left(x_\alpha\right)-g_\alpha\left(x_\alpha\right)+E_ng_\alpha\left(X_\alpha\right)=\sum_{J=1}^{N}\hat{a}_{J,\alpha}B_{J,\alpha}^{*}\left(x_\alpha\right),$$
and according to (3.6.19) one has
$$\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}^{2}\geq c_0\left[\left\{\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}E_nB_{J,\alpha}\left(X_\alpha\right)\right\}^{2}+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}^{2}\right].\qquad(3.6.12)$$
Now
$$n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}=n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\sum_{J=1}^{N}\hat{a}_{J,2}B_{J,2}^{*}\left(X_{i2}\right),$$
which is bounded by
$$\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{\sup_{1\leq J\leq N}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)B_{J,2}\left(X_{i2}\right)\right|+\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\right|\sup_{1\leq J\leq N}\left|E_nB_{J,2}\left(X_2\right)\right|\right\},$$
which can be rewritten, according to the definitions of $\xi_J\left(\mathbf{X}_i,x_1\right)$ in (3.6.3) and of $A_{n,1}^{*}$ in (3.6.28), as
$$\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{\sup_{1\leq J\leq N}\left|n^{-1}\sum_{i=1}^{n}\xi_J\left(\mathbf{X}_i,x_1\right)\right|+A_{n,1}^{*}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\right|\right\}.$$
The Minkowski inequality, Lemma 3.6.5, (3.6.29) and standard properties of the kernel density estimator now imply that
$$\sup_{x_1\in[0,1]}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-g_2\left(X_{i2}\right)+E_ng_2\left(X_2\right)\right\}\right|\leq\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\left\{O_p\left(H^{1/2}\right)+O_p\left(\sqrt{\log n/n}\right)\right\}$$
$$=O_p\left(H^{1/2}\sum_{J=1}^{N}\left|\hat{a}_{J,2}\right|\right)=O_p\left\{\left(\sum_{J=1}^{N}\hat{a}_{J,2}^{2}\right)^{1/2}\right\}=O_p\left[\left\{\hat{a}_0+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}E_nB_{J,\alpha}\left(X_\alpha\right)\right\}^{2}+\sum_{\alpha=1}^{2}\sum_{J=1}^{N}\hat{a}_{J,\alpha}^{2}\right]^{1/2},$$
which according to (3.6.11) and (3.6.12) is
$$=O_p\left(\left\|\hat{m}-g+E_ng_1\left(X_1\right)+E_ng_2\left(X_2\right)\right\|_{2}\right)=O_p\left(n^{-1/2}+H\right),$$
thus proving (3.6.10). □

Now combining Lemmas 3.6.1 and 3.6.2, one immediately gets
$$\sup_{x_1\in[0,1]}\left|n^{-1}\sum_{i=1}^{n}K_h\left(X_{i1}-x_1\right)\left\{\hat{m}_2\left(X_{i2}\right)-m_2\left(X_{i2}\right)\right\}\right|=O_p\left(n^{-1/2}+H\right)=o_p\left(n^{-2/5}\right),$$
which establishes Proposition 3.3.3.

3.6.3 Technical Lemmas

In this subsection we collect the auxiliary results used in Subsections 3.6.1 and 3.6.2.

Lemma 3.6.3. Under Assumptions (AS3) to (AS6), one has
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|=O\left(H^{1/2}\right).$$

PROOF. Define, for $\alpha=1,2$ and $J=1,\dots,N+1$,
$$c_{J,\alpha}=\left\|I_{J,\alpha}\right\|_{2}^{2}=\int I_{J,\alpha}^{2}\left(x_\alpha\right)f_\alpha\left(x_\alpha\right)dx_\alpha;$$
then $b_{J,\alpha}\left(x_\alpha\right)$ in (3.3.2) can be written as $b_{J,\alpha}\left(x_\alpha\right)=I_{J+1,\alpha}\left(x_\alpha\right)-c_{J+1,\alpha}I_{J,\alpha}\left(x_\alpha\right)/c_{J,\alpha}$, and $\left\|b_{J,\alpha}\right\|_{2}^{2}=c_{J+1,\alpha}\left(1+c_{J+1,\alpha}/c_{J,\alpha}\right)$ for all $\alpha=1,2$, $J=1,\dots,N$. In Assumption (AS3) the two positive constants $c_f,C_f$ are the lower and upper bounds of all the marginal densities $f_\alpha\left(x_\alpha\right)$; then for all $J=1,\dots,N+1$, $\alpha=1,2$,
$$c_fH\leq c_{J,\alpha}\leq C_fH.\qquad(3.6.13)$$
Then for all $\alpha=1,2$, $J=1,\dots,N$, $\left\|b_{J,\alpha}\right\|_{2}^{2}\sim H$, or more precisely
$$c_f\left(1+c_f/C_f\right)H\leq\left\|b_{J,\alpha}\right\|_{2}^{2}\leq C_f\left(1+C_f/c_f\right)H.\qquad(3.6.14)$$
The absolute expected value of $\xi_J\left(\mathbf{X}_l,x_1\right)$ is
$$\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|=\left|E\left\{K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}\right|\leq\iint K_h\left(u_1-x_1\right)\left|B_{J,2}\left(u_2\right)\right|f\left(u_1,u_2\right)du_1du_2$$
$$=\iint K\left(v_1\right)\frac{\left|b_{J,2}\left(u_2\right)\right|}{\left\|b_{J,2}\right\|_{2}}f\left(hv_1+x_1,u_2\right)dv_1du_2$$
$$\leq\left\|b_{J,2}\right\|_{2}^{-1}\left\{\iint K\left(v_1\right)I_{J+1,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2+\frac{c_{J+1,2}}{c_{J,2}}\iint K\left(v_1\right)I_{J,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2\right\}.$$
The boundedness of the joint density $f$ and the Lipschitz continuity of the kernel $K$ then imply that
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\iint K\left(v_1\right)I_{J,2}\left(u_2\right)f\left(hv_1+x_1,u_2\right)dv_1du_2\leq C_KC_fH,$$
and the proof of the lemma is completed. □

Lemma 3.6.4. Denote by $D_n$ a set of points in $[0,1]$ with cardinality $M_n=\left|D_n\right|$ of order $n^\delta$, i.e., there exist constants $0<c_D<C_D$ such that $c_Dn^\delta\leq M_n\leq C_Dn^\delta$. Then under Assumptions (AS3) to (AS6),
$$\sup_{x_1\in D_n}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\left\{\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}\right|=O_p\left(\sqrt{\frac{\log n}{nh}}\right).\qquad(3.6.15)$$

PROOF. For simplicity, denote $\xi_J^{*}\left(\mathbf{X}_l,x_1\right)=\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)$. First we compute the moments of the centered random variable $\xi_J^{*}\left(\mathbf{X}_l,x_1\right)$ for later use in Bernstein's inequality:
$$E\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}^{2}=E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)-\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2},$$
in which the first term is
$$E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)=E\left\{K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}^{2}=\iint\frac{K^{2}\left(v_1\right)}{h}\left\{I_{J+1,2}\left(u_2\right)+\frac{c_{J+1,2}^{2}}{c_{J,2}^{2}}I_{J,2}\left(u_2\right)\right\}\frac{f\left(hv_1+x_1,u_2\right)}{\left\|b_{J,2}\right\|_{2}^{2}}dv_1du_2,$$
so there exist constants $c',C'>0$ such that $c'h^{-1}\leq E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)\leq C'h^{-1}$. Then $E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)\gg\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2}$, where $a_n\gg b_n$ means $\lim_{n\to\infty}b_n/a_n=0$. Hence
$$E\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}^{2}=E\xi_J^{2}\left(\mathbf{X}_l,x_1\right)-\left\{E\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2}\geq c''h^{-1}$$
for a positive constant $c''<c'$.
When $k\geq3$, the $k$-th moment $E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}$ is
$$\left\{\left\|b_{J,2}\right\|_{2}\right\}^{-k}\iint K_h^{k}\left(u_1-x_1\right)\left\{I_{J+1,2}\left(u_2\right)+\left(\frac{c_{J+1,2}}{c_{J,2}}\right)^{k}I_{J,2}\left(u_2\right)\right\}f\left(u_1,u_2\right)du_1du_2,$$
and it can be bounded as follows:
$$c_k'h^{1-k}H^{1-k/2}\left\{1+\left(\frac{c_f}{C_f}\right)^{k}\right\}\leq E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq C_k'h^{1-k}H^{1-k/2}\left\{1+\left(\frac{C_f}{c_f}\right)^{k}\right\}.$$
Lemma 3.6.3 implies $\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq CH^{k/2}$, so $E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\gg\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}$. The $k$-th moment of the centered variable can then be bounded as
$$E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{k}=E\left|\xi_J\left(\mathbf{X}_l,x_1\right)-E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq2^{k}\left(E\left|\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}+\left|E\xi_J\left(\mathbf{X}_l,x_1\right)\right|^{k}\right)$$
$$\leq C_12^{k-1}h^{1-k}H^{1-k/2}\left(\frac{C_f}{c_f}\right)^{k}k!\leq\left\{C_2h^{-1}H^{-1/2}\right\}^{k-2}k!\,E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{2},$$
so with the constant $c^{*}=C_2h^{-1}H^{-1/2}$ one has $E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{k}\leq\left(c^{*}\right)^{k-2}k!\,E\left|\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|^{2}$; that is, the sequence of random variables $\left\{\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right\}_{l=1}^{n}$ satisfies the Cramér condition. Hence by Bernstein's inequality we have
$$P\left\{\left|n^{-1}\sum_{l=1}^{n}\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|\geq\delta\sqrt{\frac{\log n}{nh}}\right\}\leq2\exp\left\{\frac{-\delta^{2}\log n}{c^{*}+2C_2\delta H^{-1/2}\sqrt{\log n/\left(nh\right)}}\right\}.$$
There exists a large enough value $\delta>0$ such that $\delta^{2}/\left\{c^{*}+2C_2\delta H^{-1/2}\sqrt{\log n/\left(nh\right)}\right\}\geq10$; then
$$\sum_{n=1}^{\infty}P\left\{\sup_{x_1\in D_n}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J^{*}\left(\mathbf{X}_l,x_1\right)\right|\geq\delta\sqrt{\frac{\log n}{nh}}\right\}\leq2\sum_{n=1}^{\infty}NM_nn^{-10}\leq2C\sum_{n=1}^{\infty}n^{-3}<\infty.$$
The Borel-Cantelli lemma then implies (3.6.15). □

Lemma 3.6.5. Under Assumptions (AS3) to (AS6),
$$\sup_{x_1\in[0,1]}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right|=O_p\left(H^{1/2}\right).$$

PROOF. Denote, for $x_1\in[0,1]$, $A\left(x_1\right)=\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right|$. Choose the set $D_n$ of Lemma 3.6.4 to consist of equally spaced points in $[0,1]$, specifically $D_n=\left\{x_{1,k},0\leq k\leq M_n;0=x_{1,0}<x_{1,1}<\cdots<x_{1,M_n}=1\right\}$, so that consecutive points define a total of $M_n$ subintervals of length $M_n^{-1}$. Employing the discretization method, we have
$$\sup_{x_1\in[0,1]}\left|A\left(x_1\right)\right|\leq\sup_{0\leq k\leq M_n}\left|A\left(x_{1,k}\right)\right|+\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|A\left(x_1\right)-A\left(x_{1,k}\right)\right|.\qquad(3.6.16)$$
We only need to bound the second term, since Lemmas 3.6.3 and 3.6.4 and the fact that $H^{1/2}\gg\sqrt{\log n/\left(nh\right)}$ yield
$$\sup_{0\leq k\leq M_n}\left|A\left(x_{1,k}\right)\right|=O_p\left(H^{1/2}\right).\qquad(3.6.17)$$
Employing the Lipschitz continuity of the kernel $K$, one has
$$\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|K_h\left(X_{l1}-x_1\right)-K_h\left(X_{l1}-x_{1,k}\right)\right|\leq\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}C_K\frac{\left|x_1-x_{1,k}\right|}{h^{2}}\leq C_KM_n^{-1}h^{-2}.\qquad(3.6.18)$$
Hence we have
$$\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\left|A\left(x_1\right)-A\left(x_{1,k}\right)\right|\leq\sup_{1\leq k\leq M_n}\sup_{x_1\in\left[x_{1,k-1},x_{1,k}\right]}\sup_{1\leq J\leq N}\left|n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)-n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_{1,k}\right)\right|$$
$$\leq C_KM_n^{-1}h^{-2}\sup_{x_2\in[0,1]}\sup_{1\leq J\leq N}\left|B_{J,2}\left(x_2\right)\right|=O\left(M_n^{-1}h^{-2}H^{-1/2}\right)=O\left(n^{-1}\right),$$
since $c_Dn^\delta\leq M_n\leq C_Dn^\delta$ in Lemma 3.6.4. The lemma follows immediately from (3.6.16), (3.6.17) and the above result. □

Lemma 3.6.6. Under Assumptions (AS3) and (AS6), there exist constants $C_0>c_0>0$ such that
$$c_0\left(a_0^{2}+\sum_{J,\alpha}a_{J,\alpha}^{2}\right)\leq\left\|a_0+\sum_{J,\alpha}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\leq C_0\left(a_0^{2}+\sum_{J,\alpha}a_{J,\alpha}^{2}\right),\qquad(3.6.19)$$
for any $\mathbf{a}=\left(a_0,a_{1,1},\dots,a_{N,1},a_{1,2},\dots,a_{N,2}\right)^{T}\in\mathbb{R}^{2N+1}$.

PROOF. According to Lemma 1 in Stone (1985), there exists a constant $c_0>0$ such that
$$\left\|a_0+\sum_{J,\alpha}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\geq c_0\left(a_0^{2}+\left\|\sum_{J=1}^{N}a_{J,1}B_{J,1}\right\|_{2}^{2}+\left\|\sum_{J=1}^{N}a_{J,2}B_{J,2}\right\|_{2}^{2}\right).$$
If it can be proved that there exist constants $C_0'>c_0'>0$ such that for $\alpha=1,2$,
$$c_0'\sum_{J=1}^{N}a_{J,\alpha}^{2}\leq\left\|\sum_{J=1}^{N}a_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\leq C_0'\sum_{J=1}^{N}a_{J,\alpha}^{2},\qquad(3.6.20)$$
then (3.6.19) follows. To prove (3.6.20), the original B-spline basis is employed; without loss of generality we only provide the proof for $\alpha=1$.
We pick the constant basis $\left\{I_{J,1}\left(x_1\right)\right\}_{J=1}^{N+1}$ and represent the term $\sum_{J=1}^{N}a_{J,1}B_{J,1}\left(x_1\right)$ as follows:
$$\sum_{J=1}^{N}a_{J,1}B_{J,1}\left(x_1\right)=\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\left(x_1\right).\qquad(3.6.21)$$
Theorem 5.4.2 in DeVore and Lorentz (1993) establishes an equivalence between the $L^{p}$ ($p>0$) norm of a B-spline function and the norm of its sequence of B-spline coefficients. To be specific, in our case
$$\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}=\int\left\{\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\left(x_1\right)\right\}^{2}dx_1=\sum_{J=1}^{N+1}d_{J,1}^{2}H.$$
Since by Assumption (AS3) the joint density is bounded between $c_f$ and $C_f$, we have
$$c_f\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}\leq\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{2}^{2}\leq C_f\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{L^{2}}^{2}.$$
The equality (3.6.21) and (3.6.14) lead to
$$\sum_{J=1}^{N+1}d_{J,1}^{2}=\sum_{J=1}^{N}\frac{a_{J,1}^{2}}{\left\|b_{J,1}\right\|_{2}^{2}}\left\{\left(\frac{c_{J+1,1}}{c_{J,1}}\right)^{2}+1\right\}\ \Rightarrow\ c_d\sum_{J=1}^{N}a_{J,1}^{2}H^{-1}\leq\sum_{J=1}^{N+1}d_{J,1}^{2}\leq C_d\sum_{J=1}^{N}a_{J,1}^{2}H^{-1},$$
for positive constants $c_d$ and $C_d$. Therefore
$$c_fc_d\sum_{J=1}^{N}a_{J,1}^{2}\leq\left\|\sum_{J=1}^{N}a_{J,1}B_{J,1}\right\|_{2}^{2}=\left\|\sum_{J=1}^{N+1}d_{J,1}I_{J,1}\right\|_{2}^{2}\leq C_fC_d\sum_{J=1}^{N}a_{J,1}^{2},$$
i.e., (3.6.20) holds with $c_0'=c_fc_d$, $C_0'=C_fC_d$. □

Lemma 3.6.7. Under Assumptions (AS1) to (AS6), the least squares solution $\hat{\mathbf{a}}$ defined in (3.3.9) satisfies
$$\hat{\mathbf{a}}^{T}\hat{\mathbf{a}}=\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}=O_p\left(\frac{N}{n}\right).\qquad(3.6.22)$$

PROOF. According to (3.3.9), $\hat{\mathbf{a}}=\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}$, so
$$\hat{\mathbf{a}}^{T}\mathbf{B}^{T}\mathbf{B}\hat{\mathbf{a}}=\left(\hat{\mathbf{a}}^{T}\mathbf{B}^{T}\mathbf{B}\right)\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}=\hat{\mathbf{a}}^{T}\left(\mathbf{B}^{T}\mathbf{E}\right).$$
Replacing $\mathbf{B}^{T}\mathbf{B}$ with the matrix of inner products $\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}$, as the matrix $\mathbf{B}$ is given in (3.3.10), one has
$$\left\|\hat{\mathbf{a}}^{T}\mathbf{B}\right\|_{2,n}^{2}=\hat{\mathbf{a}}^{T}\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}\hat{\mathbf{a}}=\hat{\mathbf{a}}^{T}\left(n^{-1}\mathbf{B}^{T}\mathbf{E}\right).\qquad(3.6.23)$$
Based on (3.6.19), the left-hand side of (3.6.23) is bounded below by
$$\left(1-A_n\right)\left\|\mathbf{B}\hat{\mathbf{a}}\right\|_{2}^{2}=\left(1-A_n\right)\left\|\hat{a}_0+\sum_{J,\alpha}\hat{a}_{J,\alpha}B_{J,\alpha}\right\|_{2}^{2}\geq c_0\left(1-A_n\right)\left(\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}\right),\qquad(3.6.24)$$
where $A_n$ is of order $o_p\left(1\right)$ by Lemma 3.6.8 and the last step in (3.6.24) is obtained from (3.6.19). Meanwhile, by the Cauchy-Schwarz inequality and the expression of $\hat{\mathbf{a}}$ in (3.3.11), the right-hand side of (3.6.23) is bounded from above by
$$\left(\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}\right)^{1/2}\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right]^{1/2}.\qquad(3.6.25)$$
Now (3.6.23), (3.6.24) and (3.6.25) imply that $\hat{a}_0^{2}+\sum_{J,\alpha}\hat{a}_{J,\alpha}^{2}$ is no greater than
$$c_0^{-2}\left(1-A_n\right)^{-2}\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right].$$
It is trivial to verify that
$$E\left[\left\{n^{-1}\sum_{i=1}^{n}\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}+\sum_{J,\alpha}\left\{n^{-1}\sum_{i=1}^{n}B_{J,\alpha}\left(X_{i\alpha}\right)\sigma\left(\mathbf{X}_i\right)\varepsilon_i\right\}^{2}\right]=O\left(n^{-1}N\right).$$
Therefore (3.6.22) holds. □

Lemma 3.6.8. Under Assumptions (AS3) and (AS4), the uniform supremum of the rescaled difference between $\left\langle g_1,g_2\right\rangle_{2,n}$ and $\left\langle g_1,g_2\right\rangle_{2}$ satisfies
$$A_n=\sup_{g_1,g_2\in G}\frac{\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|}{\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}}=O_p\left(\sqrt{\frac{\log n}{nH}}\right)=o_p\left(1\right).\qquad(3.6.26)$$

PROOF. Let
$$g_1\left(x_1,x_2\right)=a_0+\sum_{J=1}^{N}\sum_{\alpha=1}^{2}a_{J,\alpha}B_{J,\alpha}\left(x_\alpha\right),\qquad g_2\left(x_1,x_2\right)=a_0'+\sum_{J'=1}^{N}\sum_{\alpha'=1}^{2}a'_{J',\alpha'}B_{J',\alpha'}\left(x_{\alpha'}\right),$$
in which, for any $J,J'=1,\dots,N$ and $\alpha,\alpha'=1,2$, the $a_{J,\alpha}$ and $a'_{J',\alpha'}$ are real constants. The difference between the empirical and theoretical inner products of $g_1$ and $g_2$ is bounded by
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq\sum_{J,\alpha}\left|a_0'a_{J,\alpha}\right|\left|E_nB_{J,\alpha}\left(X_\alpha\right)\right|+\sum_{J',\alpha'}\left|a_0a'_{J',\alpha'}\right|\left|E_nB_{J',\alpha'}\left(X_{\alpha'}\right)\right|$$
$$+\sum_{J,J',\alpha,\alpha'}\left|a_{J,\alpha}\right|\left|a'_{J',\alpha'}\right|\left|\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}-\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2}\right|,\qquad(3.6.27)$$
where
$$A_{n,1}^{*}=\sup_{1\leq J\leq N,\ \alpha=1,2}\left|E_nB_{J,\alpha}\left(X_\alpha\right)\right|,\qquad(3.6.28)$$
$$A_{n,2}^{*}=\sup_{1\leq J,J'\leq N,\ \alpha,\alpha'=1,2}\left|\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2,n}-\left\langle B_{J,\alpha},B_{J',\alpha'}\right\rangle_{2}\right|.$$
An application of Bernstein's inequality, as in Lemma 3.6.4, gives
$$A_{n,1}^{*}=O_p\left(\sqrt{\frac{\log n}{n}}\right).\qquad(3.6.29)$$
The equivalence of norms given in (3.6.19) bounds each of the three sums in (3.6.27) by a constant multiple of $A_{n,1}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}$ or $A_{n,2}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}$, so that
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq\left\{\left(C_{A,1}+C_{A,1}'\right)A_{n,1}^{*}+C_{A,2}A_{n,2}^{*}\right\}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2}.$$
If we can show that
$$A_{n,2}^{*}=O_p\left(\sqrt{\log n/\left(nH\right)}\right),\qquad(3.6.30)$$
then, given the fact that $\sqrt{\log n/\left(nH\right)}\gg\sqrt{\log n/n}$ based on the selection $H^{-1}\sim n^{2/5}\log n$, there exists a constant $C_A>0$ such that
$$\left|\left\langle g_1,g_2\right\rangle_{2,n}-\left\langle g_1,g_2\right\rangle_{2}\right|\leq C_AA_{n,2}^{*}\left\|g_1\right\|_{2}\left\|g_2\right\|_{2},$$
and the order $O_p\left(\sqrt{\log n/\left(nH\right)}\right)$ of $A_n$ will be established as in the statement (3.6.26).
The proof of (3.6.30) proceeds case by case over the various $\alpha,\alpha',J$ and $J'$, via Bernstein's inequality. For brevity, we set
$$\eta_i=n^{-1}\left[B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)-E\left\{B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)\right\}\right],$$
so that $A_{n,2}^{*}=\sup_{1\leq J,J'\leq N,\ \alpha,\alpha'=1,2}\left|\sum_{i=1}^{n}\eta_i\right|$. We consider $\alpha=\alpha'=1$ in Cases 1.1 to 1.3.

CASE 1.1: $\left|J-J'\right|>1$. The definition of $B_{J,1}$ in (3.3.3) guarantees that, almost surely, $B_{J,1}\left(X_{i1}\right)B_{J',1}\left(X_{i1}\right)=0$ if $\left|J-J'\right|>1$.

CASE 1.2: $J=J'$. The variable $\eta_i$ and its second moment simplify to
$$\eta_i=n^{-1}\left\{B_{J,1}^{2}\left(X_{i1}\right)-1\right\},\qquad E\eta_i^{2}=n^{-2}E\left\{B_{J,1}^{2}\left(X_{i1}\right)-1\right\}^{2}=n^{-2}\left\{EB_{J,1}^{4}\left(X_{i1}\right)-1\right\},$$
in which $EB_{J,1}^{4}\left(X_{i1}\right)=\left\|b_{J,1}\right\|_{2}^{-4}\left(c_{J+1,1}+c_{J+1,1}^{4}/c_{J,1}^{3}\right)$. The selection of $H$ makes $EB_{J,1}^{4}\left(X_{i1}\right)$ the dominant term of $\left\{EB_{J,1}^{4}\left(X_{i1}\right)-1\right\}$, so there exist constants $c_{\eta,2},C_{\eta,2}>0$ such that
$$c_{\eta,2}n^{-2}H^{-1}\leq E\eta_i^{2}\leq C_{\eta,2}n^{-2}H^{-1}.$$
In terms of the Minkowski inequality, the $k$-th absolute moment has the upper bound
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}^{2}\left(X_{i1}\right)-1\right|^{k}\leq n^{-k}2^{k-1}\left\{EB_{J,1}^{2k}\left(X_{i1}\right)+1\right\},$$
where $EB_{J,1}^{2k}\left(X_{i1}\right)=\left\|b_{J,1}\right\|_{2}^{-2k}\left(c_{J+1,1}+c_{J+1,1}^{2k}/c_{J,1}^{2k-1}\right)$. Hence there exist constants $c_{B,2}'$ and $C_{B,2}'$ such that
$$c_{B,2}'H^{1-k}\leq EB_{J,1}^{2k}\left(X_{i1}\right)\leq C_{B,2}'H^{1-k},$$
so the term $EB_{J,1}^{2k}\left(X_{i1}\right)$ dominates 1, and there exists a constant $C_{\eta,2}'>0$ such that $E\left|\eta_i\right|^{k}\leq C_{\eta,2}'n^{-k}2^{k-1}H^{1-k}$. The next step is to verify the Cramér condition:
$$E\left|\eta_i\right|^{k}\leq C_{\eta,2}'n^{-k}2^{k-1}H^{1-k}=C_{\eta,2}'n^{-\left(k-2\right)}2^{k-1}H^{-\left(k-2\right)}\cdot n^{-2}H^{-1}\leq\left\{c_{\eta,2}^{*}\right\}^{k-2}k!\,E\eta_i^{2},$$
in which $c_{\eta,2}^{*}=\left(2C_{\eta,2}'n^{-1}H^{-1}\right)\max\left(1,2C_{\eta,2}'/c_{\eta,2}\right)$. For a large value $\delta>0$, we have
$$P\left\{\left|\sum_{i=1}^{n}\eta_i\right|\geq\delta\sqrt{\frac{\log n}{nH}}\right\}\leq2\exp\left[\frac{-\delta^{2}\log n/\left(nH\right)}{4\sum_{i=1}^{n}E\eta_i^{2}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}}\right]\leq2\exp\left[\frac{-\delta^{2}\log n/\left(nH\right)}{4n\left\{C_{\eta,2}n^{-2}H^{-1}\right\}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}}\right].$$
If $\delta$ is taken large enough that $\delta^{2}/\left\{4C_{\eta,2}+2c_{\eta,2}^{*}\delta\sqrt{\log n/\left(nH\right)}\right\}\geq3$, then
$$\sum_{n=1}^{\infty}P\left\{\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|\geq\delta\sqrt{\frac{\log n}{nH}}\right\}\leq\sum_{n=1}^{\infty}Nn^{-3}<\infty.$$
Applying the Borel-Cantelli lemma, when $J=J'$, $\alpha=\alpha'=1$ we have
$$\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|=O_p\left(\sqrt{\frac{\log n}{nH}}\right).$$

CASE 1.3: $\left|J-J'\right|=1$. Without loss of generality we only prove the case $J'=J+1$, in which
$$EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)=-\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J+1,1}\right\|_{2}^{-1}\frac{c_{J+2,1}}{c_{J+1,1}}\int I_{J+1,1}^{2}\left(x_1\right)f_1\left(x_1\right)dx_1=-c_{J+2,1}\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J+1,1}\right\|_{2}^{-1}.$$
According to (3.6.13), $c_fH\leq c_{J+1,1}\leq C_fH$, so $E\eta_i^{2}$ has the same order as its dominant term $n^{-2}EB_{J,1}^{2}\left(X_{i1}\right)B_{J+1,1}^{2}\left(X_{i1}\right)$, i.e., there exist constants $c_{\eta,3},C_{\eta,3}>0$ such that
$$c_{\eta,3}n^{-2}H^{-1}\leq E\eta_i^{2}\leq C_{\eta,3}n^{-2}H^{-1}.$$
The $k$-th moment is given by
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)-EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}$$
$$\leq n^{-k}2^{k-1}\left[E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}+\left|EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}\right],$$
where
$$\left|EB_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}=c_{J+2,1}^{k}\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J+1,1}\right\|_{2}^{-k}\sim1,$$
$$E\left|B_{J,1}\left(X_{i1}\right)B_{J+1,1}\left(X_{i1}\right)\right|^{k}=\left(\frac{c_{J+2,1}}{c_{J+1,1}}\right)^{k}\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J+1,1}\right\|_{2}^{-k}c_{J+1,1}\sim H^{1-k}.$$
Hence there exists a constant $C_{\eta,3}'>0$ such that $E\left|\eta_i\right|^{k}\leq C_{\eta,3}'n^{-k}2^{k-1}H^{1-k}$. As in Case 1.2, the conclusion follows by Bernstein's inequality:
$$A_{n,2}^{*}=\sup_{1\leq J\leq N}\left|\sum_{i=1}^{n}\eta_i\right|=O_p\left(\sqrt{\frac{\log n}{nH}}\right).$$

CASE 2: $\alpha=\alpha'=2$; all of the above discussion applies without extra modification.

CASE 3: $\alpha\neq\alpha'$. Without loss of generality, suppose $\alpha=1$, $\alpha'=2$. First we calculate the order of the second moment $E\eta_i^{2}$:
$$E\eta_i^{2}=n^{-2}\left[E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}-\left\{EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}\right].$$
The boundedness of the density function $f\left(x_1,x_2\right)$ implies the order $O\left(H\right)$ of the absolute mean:
$$\left|EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|\leq E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|\leq\left\|b_{J,1}\right\|_{2}^{-1}\left\|b_{J',2}\right\|_{2}^{-1}\iint\left|b_{J,1}\left(x_1\right)b_{J',2}\left(x_2\right)\right|f\left(x_1,x_2\right)dx_1dx_2$$
$$\leq C_f\left\{\left\|b_{J,1}\right\|_{2}^{-1}\int\left|b_{J,1}\left(x_1\right)\right|dx_1\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-1}\int\left|b_{J',2}\left(x_2\right)\right|dx_2\right\}$$
$$\leq C_f\left\{1+\frac{c_{J+1,1}}{c_{J,1}}\right\}\left\{1+\frac{c_{J'+1,2}}{c_{J',2}}\right\}\left\{\left\|b_{J,1}\right\|_{2}^{-1}H\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-1}H\right\}\leq C_{B,1}H,$$
for some constant $C_{B,1}>0$, where the last step is derived from equations (3.6.13) and (3.6.14). As a consequence, $\left|E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}\right|^{k}\leq C_{B,1}^{k}H^{k}$. Meanwhile, the uniform order $O\left(1\right)$ of the mean square is obtained from Assumption (AS3), (3.6.13) and (3.6.14):
$$E\left\{B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right\}^{2}=\left\|b_{J,1}\right\|_{2}^{-2}\left\|b_{J',2}\right\|_{2}^{-2}\iint b_{J,1}^{2}\left(x_1\right)b_{J',2}^{2}\left(x_2\right)f\left(x_1,x_2\right)dx_1dx_2$$
$$\geq c_f\left\{1+c_{J+1,1}^{2}/c_{J,1}^{2}\right\}\left\{\left\|b_{J,1}\right\|_{2}^{-2}H\right\}\left\{1+c_{J'+1,2}^{2}/c_{J',2}^{2}\right\}\left\{\left\|b_{J',2}\right\|_{2}^{-2}H\right\}\geq c_{B,2}.$$
Hence there exist constants $c_\eta,C_\eta>0$ such that $c_\eta n^{-2}\leq E\eta_i^{2}\leq C_\eta n^{-2}$. For any $k>2$, the $k$-th moment of $\left|\eta_i\right|$ is given by
$$E\left|\eta_i\right|^{k}=n^{-k}E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)-EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}$$
$$\leq n^{-k}2^{k-1}\left[E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}+\left|EB_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}\right],$$
where there exists a constant $C_B'>0$ such that
$$E\left|B_{J,1}\left(X_{i1}\right)B_{J',2}\left(X_{i2}\right)\right|^{k}=\left\|b_{J,1}\right\|_{2}^{-k}\left\|b_{J',2}\right\|_{2}^{-k}\iint\left|b_{J,1}\left(x_1\right)b_{J',2}\left(x_2\right)\right|^{k}f\left(x_1,x_2\right)dx_1dx_2$$
$$\leq C_f\left\{1+\frac{c_{J+1,1}^{k}}{c_{J,1}^{k}}\right\}\left\{1+\frac{c_{J'+1,2}^{k}}{c_{J',2}^{k}}\right\}\left\{c_f\left(1+c_f/C_f\right)\right\}^{-k}H^{2-k}\leq\left(C_B'\right)^{k}H^{2-k}.$$
Thus there is a constant $C_\eta'>0$ such that
$$E\left|\eta_i\right|^{k}\leq n^{-k}2^{k-1}\left[\left(C_B'\right)^{k}H^{2-k}+C_{B,1}^{k}H^{k}\right]\leq\left(C_\eta'\right)^{k}n^{-k}2^{k-1}H^{2-k},$$
so the Cramér condition holds with $c_\eta^{*}=\left(2C_\eta'/c_\eta\right)\max\left(2C_\eta'/c_\eta,1\right)\left(2C_\eta'n^{-1}H^{-1}\right)$, i.e., $E\left|\eta_i\right|^{k}\leq\left(c_\eta^{*}\right)^{k-2}k!\,E\eta_i^{2}$. Employing Bernstein's inequality and the fact that $E\eta_i^{2}\sim n^{-2}$, for any $1\leq J,J'\leq N$ and $\alpha\neq\alpha'$,
$$\sup_{1\leq J,J'\leq N}\left|\sum_{i=1}^{n}\left[B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)-E\left\{B_{J,\alpha}\left(X_{i\alpha}\right)B_{J',\alpha'}\left(X_{i\alpha'}\right)\right\}\right]\right|=O_p\left(\sqrt{\frac{\log n}{n}}\right).$$
Hence for any $1\leq J,J'\leq N$, $\alpha,\alpha'=1,2$, the proof of (3.6.30) is completed. □

The next lemma, on the positive definiteness of the matrix $\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)^{-1}$, is a sufficient step to achieve Lemma 3.6.10.

Lemma 3.6.9. Under Assumptions (AS3) and (AS4), for the matrix $\mathbf{S}=\left(s_{J,J'}\right)_{J,J'=1}^{2N+1}=\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)^{-1}$, there exist constants $C_S>c_S>0$ such that, with probability approaching 1,
$$c_SI_{2N+1}\leq\mathbf{S}^{-1}\leq C_SI_{2N+1}.\qquad(3.6.31)$$

PROOF. Take a real vector $\boldsymbol{\zeta}=\left(u_0,u_{1,1},\dots,u_{N,1},u_{1,2},\dots,u_{N,2}\right)^{T}\in\mathbb{R}^{2N+1}$; one has
$$\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}=\boldsymbol{\zeta}^{T}\left(n^{-1}\mathbf{B}^{T}\mathbf{B}\right)\boldsymbol{\zeta}=\boldsymbol{\zeta}^{T}\mathbf{S}^{-1}\boldsymbol{\zeta},\qquad(3.6.32)$$
where we denote $\mathbf{B}_{*}=\left\{1,B_{1,1}\left(X_1\right),\dots,B_{N,2}\left(X_2\right)\right\}^{T}$.
Meanwhile, the definition of $A_n$ in (3.6.26) entails in particular that
$$\left(1-A_n\right)\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2}\leq\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}\leq\left(1+A_n\right)\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2},$$
while (3.6.19) means that there exist constants $C_S>c_S>0$ such that
$$C_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\geq\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2}^{2}=\left\|u_0+\sum_{J,\alpha}u_{J,\alpha}B_{J,\alpha}\left(x_\alpha\right)\right\|_{2}^{2}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right);$$
hence
$$\left\|\boldsymbol{\zeta}^{T}\mathbf{B}_{*}\right\|_{2,n}^{2}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\left(1-A_n\right).\qquad(3.6.33)$$
Putting together (3.6.32) and (3.6.33), one concludes that with probability approaching 1,
$$C_S\boldsymbol{\zeta}^{T}\boldsymbol{\zeta}=C_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)\geq\boldsymbol{\zeta}^{T}\mathbf{S}^{-1}\boldsymbol{\zeta}\geq c_S\left(u_0^{2}+\sum_{J,\alpha}u_{J,\alpha}^{2}\right)=c_S\boldsymbol{\zeta}^{T}\boldsymbol{\zeta},$$
which gives (3.6.31). □

Lemma 3.6.10. Under Assumptions (AS1) to (AS6), for any $x_1\in[0,1]$ and $I_1\left(x_1\right)$ defined in (3.6.1), one has
$$\sup_{x_1\in[0,1]}E\left\{I_1^{2}\left(x_1\right)|\mathbf{X}\right\}=O_p\left(n^{-1}\right).\qquad(3.6.34)$$

PROOF. It is known that $\hat{\mathbf{a}}=\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{E}$, so the conditional mean square of $\hat{\varepsilon}_2\left(X_{l2}\right)$ given $\mathbf{X}$ is
$$E\left[\left\{\hat{\varepsilon}_2\left(X_{l2}\right)\right\}^{2}\Big|\mathbf{X}\right]=\Phi_l^{T}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}E\left(\mathbf{E}\mathbf{E}^{T}|\mathbf{X}\right)\mathbf{B}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l,$$
where $\Phi_l=\left\{\mathbf{0}_{N+1},B_{1,2}\left(X_{l2}\right),\dots,B_{N,2}\left(X_{l2}\right)\right\}^{T}$. Based on Assumption (AS2), we have $E\left(\mathbf{E}\mathbf{E}^{T}|X_1,\dots,X_n\right)\leq C_\sigma^{2}I_n$ in the matrix sense; applying this to the quadratic form with the vector $\mathbf{B}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l$, one has
$$E\left[\left\{\hat{\varepsilon}_2\left(X_{l2}\right)\right\}^{2}\Big|\mathbf{X}\right]\leq C_\sigma^{2}\Phi_l^{T}\left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\Phi_l=n^{-1}C_\sigma^{2}\sum_{1\leq J,J'\leq N}B_{J,2}\left(X_{l2}\right)s_{J+N+1,J'+N+1}B_{J',2}\left(X_{l2}\right),$$
where the $s_{J+N+1,J'+N+1}$ are elements of $\mathbf{S}$ in Lemma 3.6.9. Plugging in the above term and employing (3.6.4), the term $E\left\{I_1^{2}\left(x_1\right)|\mathbf{X}\right\}$ is bounded by
$$\frac{C_\sigma^{2}}{n^{3}}\sum_{l,l'=1}^{n}K_h\left(X_{l1}-x_1\right)K_h\left(X_{l'1}-x_1\right)\sum_{1\leq J,J'\leq N}B_{J,2}\left(X_{l2}\right)s_{J+N+1,J'+N+1}B_{J',2}\left(X_{l'2}\right)$$
$$\leq\frac{C_\sigma^{2}C_S}{n}\sum_{J=1}^{N}\left\{n^{-1}\sum_{l=1}^{n}K_h\left(X_{l1}-x_1\right)B_{J,2}\left(X_{l2}\right)\right\}^{2}=\frac{C_\sigma^{2}C_S}{n}\sum_{J=1}^{N}\left\{n^{-1}\sum_{l=1}^{n}\xi_J\left(\mathbf{X}_l,x_1\right)\right\}^{2},$$
which by Lemma 3.6.5 is of order $O_p\left(n^{-1}NH\right)=O_p\left(n^{-1}\right)$ uniformly over $x_1\in[0,1]$, proving (3.6.34). □

One also uses the fact that $1-\Phi\left(x\right)\leq\phi\left(x\right)/x$ for $x\geq0$, hence there exists some $c>0$ such that $1-\Phi\left(x\right)\leq c\phi\left(x\right)$ for large $x$, where $\Phi\left(x\right)$ and $\phi\left(x\right)$ are the cumulative distribution function and the density function of the standard normal. Taking $t_n=\sqrt{16\log n}$, there exists a constant $c$ such that, for large enough $n$, the conditional probability $P\left\{\sup_{0\leq k\leq M_n}\left|R\left(\mathbf{X},x_{1,k}\right)\right|>t_n\,\big|\,\mathbf{X}\right\}$ is bounded by a multiple of $M_n\phi\left(t_n\right)$ and is therefore summable in $n$.

Figure 4.13. Spline confidence bands of LAI of deciduous woodland. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.14. Spline confidence bands and RAMS curves of LAI of deciduous shrubland. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.15. Spline confidence bands and RAMS curves of LAI of rainfed herbaceous crop. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.16. Spline confidence bands and RAMS curves of LAI of open to very open trees. (Panels by latitude: 0, -5, 5; alpha = 0.0001.)

Figure 4.17. Improved representation of land surface in RAMS.

BIBLIOGRAPHY

[1] Africover (2002). Africover - Eastern Africa Module.
Land cover mapping based on satellite remote sensing. Food and Agriculture Organization of the United Nations.

[2] Andrews, D. and Whang, Y. (1990). Additive interactive regression models: circumvention of the curse of dimensionality. Econometric Theory 6, 466-479.

[3] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. The Annals of Statistics 1, 1071-1095.

[4] Bralower, T. J., Fullagar, P. D., et al. (1997). Mid-Cretaceous strontium-isotope stratigraphy of deep-sea sections. Geological Society of America Bulletin 109, 1421-1442.

[5] Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association 80, 580-619.

[6] Chaudhuri, P. and Marron, J. S. (1999). SiZer for exploration of structures in curves. Journal of the American Statistical Association 94, 807-823.

[7] Claeskens, G. and Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. The Annals of Statistics 31, 1852-1884.

[8] Cotton, W. R., et al. (2003). RAMS 2001: current status and future directions. Meteorology and Atmospheric Physics 82, 5-29.

[9] de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag, New York.

[10] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Springer-Verlag, Berlin.

[11] Fan, J. and Chen, J. (1999). One-step local quasi-likelihood estimation. Journal of the Royal Statistical Society Series B 61, 927-934.

[12] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.

[13] Fan, J., Härdle, W. and Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. The Annals of Statistics 26, 943-971.

[14] Gantmacher, F. R. and Krein, M. G. (1960). Oszillationsmatrizen, Oszillationskerne und kleine Schwingungen mechanischer Systeme. Akademie-Verlag, Berlin.

[15] Hall, P. and Titterington, D. M. (1988). On confidence bands in nonparametric density estimation and regression. Journal of Multivariate Analysis 27, 228-254.

[16] Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis 29, 163-179.

[17] Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.

[18] Härdle, W., Hlavka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer-Verlag, Berlin.

[19] Härdle, W., Huet, S., Mammen, E. and Sperlich, S. (2004). Bootstrap inference in semiparametric generalized additive models. Econometric Theory 20, 265-300.

[20] Härdle, W., Marron, J. S. and Yang, L. (1997). Discussion of "Polynomial splines and their tensor products in extended linear modeling" by Stone et al. The Annals of Statistics 25, 1443-1450.

[21] Härdle, W., Sperlich, S. and Spokoiny, V. (2001). Structural tests in additive regression. Journal of the American Statistical Association 96, 1333-1347.

[22] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management 5, 81-102.

[23] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.

[24] Huang, J. Z. (1998). Projection estimation in multiple regression with application to functional ANOVA models. The Annals of Statistics 26, 242-272.

[25] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. The Annals of Statistics 31, 1600-1635.

[26] Huang, J. Z. and Yang, L. (2004).
Identification of non-linear additive autoregressive models. Journal of the Royal Statistical Society Series B 66, 463-477.

[27] Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis. Prentice-Hall, New Jersey.

[28] Kim, W., Linton, O. B. and Hengartner, N. (1999). A computationally efficient oracle estimator for additive nonparametric regression with bootstrap confidence intervals. Journal of Computational and Graphical Statistics 8, 278-297.

[29] Knyazikhin, Y., Glassy, J., Privette, J. L., Tian, Y., Lotsch, A., Zhang, Y., Wang, Y., Morisette, J. T., Votava, P., Myneni, R. B., Nemani, R. R. and Running, S. W. (1999). MODIS Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation Absorbed by Vegetation (FPAR) Product (MODIS) Algorithm Theoretical Basis Document.

[30] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York.

[31] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93-100.

[32] Linton, O. B. and Härdle, W. (1996). Estimating additive regression models with known links. Biometrika 83, 529-540.

[33] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika 84, 469-473.

[34] Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrscheinlichkeitstheorie und verwandte Gebiete 61, 405-415.

[35] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. The Annals of Statistics 27, 1443-1490.

[36] Mayaux, P., Bartholomé, E., Fritz, S. and Belward, A. (2004). A new land-cover map of Africa for the year 2000. Journal of Biogeography 31, 861-877.

[37] Nielsen, J. P. and Sperlich, S. (2005). Smooth backfitting in practice. Journal of the Royal Statistical Society Series B 67, 43-61.

[38] Olson, J. M., Alagarswamy, G., Andresen, J., Campbell, D. J., Ge, J., Huebner, M., Lofgren, B., Lusch, D. P., Moore, N., Pijanowski, B. C., Qi, J., Torbick, N., Wang, J. and Yang, L. (2006). Integrating diverse methods to understand climate-land interactions at multiple spatial and temporal scales. GeoForum.

[39] Opsomer, J. D. (2000). Asymptotic properties of backfitting estimators. Journal of Multivariate Analysis 73, 166-179.

[40] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. The Annals of Statistics 25, 186-211.

[41] Opsomer, J. D. and Ruppert, D. (1998). A fully automated bandwidth selection method for fitting additive models. Journal of the American Statistical Association 93, 605-619.

[42] Pitman, A. (2003). The evolution of, and revolution in, land surface schemes designed for climate models. International Journal of Climatology 23, 479-510.

[43] Rosenblatt, M. (1976). On the maximal deviation of k-dimensional density estimates. The Annals of Probability 4, 1009-1015.

[44] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge University Press, Cambridge.

[45] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.

[46] Sperlich, S., Tjøstheim, D. and Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory 18, 197-251.

[47] Stone, C. J. (1985). Additive regression and other nonparametric models.
The Annals of Statistics 13, 689-705.

[48] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics 22, 118-184.

[49] Tjøstheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: projections. Journal of the American Statistical Association 89, 1398-1409.

[50] Torbick, N., Lusch, D., Olson, J., Qi, J. and Ge, J. (2005a). An assessment of Africover and GLC2000 using general agreement and airborne videography. International Journal of Remote Sensing (submitted).

[51] Torbick, N., Qi, J., Lusch, D., Olson, J., Moore, N. and Ge, J. (2005b). Developing land use land cover parameterization for climate-land modelling in East Africa (in progress).

[52] Tusnády, G. (1977). A remark on the approximation of the sample df in the multidimensional case. Periodica Mathematica Hungarica 8, 53-55.

[53] Walko, R. L., Band, L. E., Baron, J., Kittel, T. G. F., Lammers, R., Lee, T. J., Ojima, D., Pielke Sr., R. A., Taylor, C., Tague, C., Tremback, C. J. and Vidale, P. L. (2000). Coupled atmosphere-biophysics-hydrology models for environment modeling. Journal of Applied Meteorology 39, 931-944.

[54] Wang, J. and Yang, L. (2006). Polynomial spline confidence bands for regression curves. The Annals of Statistics (tentatively accepted).

[55] Xia, Y. (1998). Bias-corrected confidence bands in nonparametric regression. Journal of the Royal Statistical Society Series B 60, 797-811.

[56] Xue, L. and Yang, L. (2006). Estimation of semiparametric additive coefficient model. Journal of Statistical Planning and Inference 136, 2506-2534.

[57] Yang, L., Härdle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 579-604.

[58] Yang, L., Sperlich, S. and Härdle, W. (2003). Derivative estimation and testing in generalized additive models. Journal of Statistical Planning and Inference 115, 521-542.

[59] Zhang, F. (1999). Matrix Theory: Basic Results and Techniques. Springer-Verlag, New York.

[60] Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics of regression splines and confidence regions. The Annals of Statistics 26, 1760-1782.