......t r ti 1.... by}; “3.91%“??05. “agar“... “Iii I .mhhuvfl zafl mu.» lib“??? . .1 ... x M H. Al u : ...”.uaduf...” . .... Q. . , , . . . . . ..sgmfifiwwfi.%%mfir ,. . . A ‘11.. Jmm....‘ :. . l. . ‘ . Panninkl,‘ l . . , > 35$..51.”...nmmsnvainma ... .a. .....» . ll ...-. “.3 ‘ 19. I: . , J1 209; This is to certify that the dissertation entitled Polynomial Spline Smoothing for Nonlinear Time Series presented by Li Wang has been accepted towards fulfillment of the requirements for the PhD. degree in Statistics and ProbabilitL M“ Major Professor’s Signature 4/50/42,? Date MSU is an Affirmative Action/Equal Opportunity Institution Do-I-I-I-a-o-n-v-n-o-oCO-o-I-l-n-I-I-.-c--.-..— L i t}- ;"75- A i 3i Y .3. 1.?“ ._ i‘..-::CE-.,~;gm Stet-a stasQ ! ‘ . l '1' 3.‘ ."i rt .4.‘ I.) A If .u \ l 1 \, ‘ H __.-—-__. - ~-— .— H-.. PLACE IN RETURN BOX to remove this checkout from your record. TO AVOID FINES return on or before date due. MAY BE RECALLED with earlier due date if requested. DATE DUE DATE DUE DATE DUE 6/07 p:/CIRCIDateDue.indd-pt1 POLYNOMIAL SPLINE SMOOTHING FOR NONLINEAR TIME SERIES Li Wang A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Department of Probability and Statistics 2007 ABSTRACT POLYNOMIAL SPLINE SMOOTHIN G FOR NONLINEAR TIME SERIES By Li Wang Nonlinear time series analysis has gained much attention in recent years due primar— ily to the fact that linear time series models have encountered various limitations in real applications and the development in nonparametric regression has established a solid foun- dation for nonlinear time series analysis. In this dissertation, polynomial spline smoothing is studied for nonlinear time series. For univariate nonlinear time series, uniform confidence bands of a nonparametric pre- diction function are constructed using the polynomial spline method. As an application, after removing the environmental Kuznets curve trend effects, the impact of the economic intervention on environmental quality change is quantified for the United States and Japan, with different conclusions. Application of non- and semiparametric regression techniques to high dimensional time series data have been hampered due to the lack of effective tools to address the “curse of dimensionality”. There are essentially two approaches to circumvent this difficulty: function approximation and dimension reduction. For the function approximation approach, the nonlinear additive autoregression (NAAR) model is examined. Under rather weak conditions, spline-backfitted kernel estimators of the component functions are proposed for weakly dependent samples that are both computa- tionally expedient (so it. is usable for analyzing very high dimensional time series), and theoretically reliable (so inference can be made on the component functions with confi— dence). For the dimension reduction approach, a single-index prediction (SIP) model based on weakly dependent sample is studied. The single-index is identified by the best approximation to the multivariate prediction function of the response variable, regardless of whether or not the prediction function is a genuine single-index function. A polynomial spline estimator is proposed for the single—index prediction coefficients, and is shown to be root-n consistent and asymptotically normal. An iterative optimization routine is used which is sufficiently fast for the user to analyze large data sets of high dimension within seconds. 
Application of the proposed procedure to the river flow data of Iceland has yielded superior out-of—sample rolling forecasts. Copyright by Li Wang 2007 I dedicate this work to my husband Lei Gao and my parents. ACKNOWLEDGMENTS ,I would like to thank many people who have helped me on the path towards this disser- tation. First and foremost, I would like to express my gratitude to my advisor, Professor Lijian Yang. I could never have reached the heights or explored the depths without his gen— erous help, unbreakable support and patient guidance. Every single discussion I had with him was valuable in terms of new ideas and renewed scientific excitement. His infectious enthusiasm and unlimited zeal have been major driving forces through my graduate study at MSU and will keep encouraging me in my future research. I also wish to express my gratitude to my dissertation committee, Professor Dennis Gilliland, Professor Yiming Xiao, Professor Ana Maria Herren, for sparing their precious ‘ time to serve on my committee and giving valuable comments and suggestions. I must acknowledge Professor Dennis Gilliland and Professor Connie Page for accepting me as one of the consultants at CSTAT, where I have obtained plenty of opportunities to work with students and faculty from a variety of disciplines. I am grateful to the entire faculty and staff in the Department of Statistics and Proba- bility who have taught me and assisted me during my study at MSU. And special thanks are given to Professor James Stapleton and Professor Vince Melfi for their numerous help, constant support and encouragement. Thanks to the graduate school and the Department of Statistics who provided me with the Dissertation Completion Fellowship for working on this dissertation. This dissertation is also supported in part by NSF award DMS 0405330. Last but not least, I would like to thank my husband, Lei Gao, for his love and support over all these years and two of my academic sisters: Dr. Jing Wang and Dr. Lan Xue for their generous help. vi TABLE OF CONTENTS LIST OF TABLES ................................. ix LIST OF FIGURES ................................ x 1 Introduction ................................... 1 1.1 Nonlinear Time Series Prediction Model .................... 1 1.2 Spline Confidence Bands ............................. 2 1.3 Nonlinear Additive Autoregression (NAAR) Model .............. 3 1.4 Single—Index Prediction (SIP) Model ...................... 4 1.5 Polynomial Spline Smoothing .......................... 5 2 Spline Confidence Bands for Time Series Prediction Function ..... 7 2.1 Introduction .................................... 7 2.2 Main results .................................... 8 2.3 Error decomposition ............................... 11 2.4 Implementation .................................. 13 2.5 Examples ..................................... 15 2.5.1 Simulation example ............................ 15 2.5.2 Environmental Kuznets curve (EKC) .................. 17 2.6 Proof of Theorem 2.2.1 .............................. 19 2.6.1 Preliminaries of Theorem 2.2.1 with k = 1 ............... 19 2.6.2 Proof of Proposition 2.3.1 with k = 1 .................. 21 2.6.3 Proof of Theorem 2.2.1 with k = 1 ................... 24 2.6.4 Preliminaries of Theorem 2.2.1 with k = 2 ............... 25 2.6.5 Variance calculation ........................... 26 2.6.6 Proof of Theorem 2.2.1 with k = 2 ................... 28 3 Spline-Backfitted Kernel Smoothing of NAAR Models ......... 
33 3.1 Introduction .................................... 33 3.2 The SPBK estimator ............................... 36 3.3 Decomposition .................................. 41 3.4 Bias reduction .................................. 43 3.5 Variance reduction ................................ 45 3.6 Simulations .................................... 47 3.6.1 Example 1 ................................. 48 3.6.2 Example 2 ................................. 50 vii 3.7 Proof of the main results ............................. 50 3.7.1 Preliminaries ............................... 50 3.7.2 Empirical approximation of the theoretical inner product ....... 57 3.7.3 Proof of Lemma 3.5.2 ........................... 61 4 Spline Single-Index Prediction Model ....... . ............ 70 4.1 Introduction .................................... 70 4.2 The Method and Main Results ................ > .......... 72 4.2.1 Identifiability and definition of the index coefficient .......... 72 4.2.2 Variable transformation ......................... 73 4.2.3 Estimation Method ............................ 75 4.2.4 Asymptotic results ............................ 75 4.3 Implementation .................................. 78 4.4 Simulations .................................... 80 4.4.1 Example 1 ................................. 80 4.4.2 Example 2 ................................. 81 4.5 Application .................................... 82 4.6 Proof of the main results ............................. 83 4.6.1 Preliminaries ............................... 83 4.6.2 Proof of Proposition 4.2.1 ........................ 87 4.6.3 Proof of Proposition 4.2.2 ........................ 95 4.6.4 Proof of Theorem 4.2.2 .......................... 99 BIBLIOGRAPHY ................................. 132 viii 4.1 4.2 4.3 4.4 4.5 4.6 LIST OF TABLES. Example 2.5.1: Piecewise constant spline bands coverage probabilities . . . . 103 Example 2.5.1: Piecewise linear spline bands coverage probabilities ..... 103 Report of Example 3.6.1 ............................. 104 The computing time of Example 3.6.1 ...................... 104 Report of Example 4.4.1 ............................. 105 Report of Example 4.4.2 ............................. 106 ix 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 4.12 4.13 4.14 4.15 4.16 4.17 4.18 4.19 4.20 Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.1: Example 2.5.2: Example 2.5.2: Example 2.5.2: Example 3.6.1: ponent . . . . Example 3.6.1: component . . Example 3.6.1: ponent . . . . Example 3.6.1: Example 3.6.2: Example 3.6.2: Example 4.4.1: Example 4.4.1: Example 4.4.2: LIST OF FIGURES 95% constant spline confidence bands with opt = l ..... 107 99% constant spline confidence bands with opt = 1 ..... 108 95% constant spline confidence bands with opt = 2 ..... 109 99% constant spline confidence bands with opt = 2 ..... 110 95% linear spline confidence bands with opt = 1 ....... 111 99% linear spline confidence bands with opt = 1 ....... 112 95% linear spline confidence bands with opt = 2 ....... 113 99% linear spline confidence bands with opt = 2 ....... 114 Plot of the EKC in terms of u(t) and v(t) ........... 115 'Itend and noise analysis of US ................. 116 "fiend and noise analysis of Japan ....... . ........ 117 SPBK estimator with confidence intervals for the first com- ................................... 118 SPBK estimator with confidence intervals for the second. ................................... 
119 SPBK estimator with confidence intervals for the third com- ................................... 120 Plot of the relative efficiencies of components 2 and 3 . . . . 121 Plot of the relative efficiencies of components 1 and 2 . . . . 122 Plot of the relative efficiencies of components 15 and 30 . . . 123 The actual bivariate surface .................. 124 The univariate approximation to the bivariate surface . . . . 125 The univariate approximation (d = 10, 50) .......... 126 4.21 Example 4.4.2: The univariate approximation (d =2 100, 200) ......... 127 4.22 Example 4.4.2: Kernel density plots of the error norms ............ 128 4.23 Example 4.4.2: Kernel density plots of the error norms ............ 129 4.24 Time plots of the daily river flow data ..................... 130 4.25 The fitted, residual and forecast plots of the river flow data .......... 131 xi CHAPTER 1 Introduction 1.1 Nonlinear Time Series Prediction Model Classic regression and time series tools such as the generalized linear model and the lin- ear autoregression are known to be inadequate for complex data that exhibit nonlinearity. This recognition has motivated the development of non— and semiparametric regression techniques, with far reaching applications, see, for example, Fan and Gijbels (1996), Bosq (1998), Fan and Yao (2003). A typical nonparametric problem in time series analysis is the classical decomposition of a realization of a time series into a slowly changing function known as a “trend component”, or simply trend, a periodic function referred to as a “seasonal component”, and finally a “random noise component”, which in terms of the regression theory should be called the - time series of residuals. In time series analysis smoothing problems occur of course in the spectral domain when we want to estimate the spectral density, e.g. for model fitting. In the time domain nonparametric prediction is one of the fields where smoothing methods are intensively used. A well-known example is the water flow prediction from a time series of river data, see Section 4.5 in Chapter 4. In the motorcycle crash test, the acceleration of the dummy head after impact follows a complicated instead of a simple polynomial time trend. Another example of the nonlinear time series is the quarterly unemployment rate of US. women, which follows a nonlinear instead of a simple linear prediction formula. Effective tools for extracting information from such complex regression data have to be non- and semiparametric in nature. In the following, let {X$,K}n 1 = {X,-,1,...,X,,d,Y,~}?_l be a (d+1)—dimensional z: _ strictly stationary process following the stochastic regression model Yz' = m (Xi) +0(Xi)€z‘,m(xtl = E(Yz'|xz'), (1-1-1) in which E(e,- IX,) 2 0, E (5?|X,-) = 1, 1 _<_ i g n. The-d-variate functions m, a are the unknown mean and standard deviation of the response Y, conditional on the predictor vector X,, often estimated nonparametrically. Two very popular forms of nonparametric regression are kernel/ local polynomial type and spline type smoothing. In this work, the polynomial spline smoothing is extensively studied for nonlinear time series. The greatest advantages of spline smoothing, as pointed out in Huang and Yang (2004), Xue and Yang (2006 b) are its simplicity and fast compu- tation. For model in (1.1.1), when the dimension of the predictor vector X,- is l (d = 1), spline confidence bands are obtained in Chapter 2 for time series prediction function m under weak dependence. 
Application of smoothing techniques to high dimensional time series have been hampered due to the lack of effective tools to address the “curse of dimensionality”, which refers to the poor convergence rate of nonparametric estimation of general multivariate function. Much effort has been devoted to methods of circumventing. this difficulty. In the words of Xia, Tong, Li and Zhu (2002), there are essentially two approaches: function approximation and dimension reduction. Additive model and single~index model, special cases of model (1.1.1), are good examples to represent these two approaches. Chapter 3 and Chapter 4 discuss these two models separately. 1.2 Spline Confidence Bands Consider the one dimensional case of model (1.1.1) for strictly stationary bivariate time series {(X,,I/,-)}?:1 Y,- =m(X,)+o(X,-)5,-,i=1,...,n, (1.2.1) where the errors {5,}?21 are white noise, i.e., E(E,‘ IX,) = 0,var(5,- IX,) 2 1 and 5,- is a martingale difference for the a—fieltl f,- = 0 {Xj,8j_1, 1 3’ j S i} for 2' = 1, ...,n. To put the discussion in perspective, consider the question of how the adjustment of GDP autonomously influence the change of the environmental quality in Japan, see Section 2.5.2 in Chapter 2. The logarithm of GDP per capita and the emissions per capita of Japan are decomposed as u(t) + X: and v(t) + Yt, t = 1, ...,n respectively, where the quadratic trends u(t) and v(t) are given in (2.5.3), {(Xt, IQ)}?___1 are zero mean stationary time series of residuals. The aforementioned question can be formulated in terms of various hypotheses about the prediction function m(:z:) = E(Yt|Xt = :r). In Figure 4.11 (b), a 99% conservative simultaneous confidence band of m(:c) is plotted together with the linear regression line, clearly showing the nonlinear dependence of Y; on Xt. The corresponding Figure 4.10 (b) for the United States, however, shows a linear and insignificant m(:r). Making such inference about the global shape of the prediction function m(:i:) depends crucially on the construction of simultaneous confidence bands for m using the time series observations {(Xi, Y,)}?___1. In Chapter 2, asymptotically conservative simultaneous confidence bands are con— structed for nonparametric prediction function m based on piecewise constant and piecewise linear polynomial spline estimation, respectively. Simulation experiments have provided strong evidence that corroborates with the asymptotic theory. As an application, after removing the environmental Kuznets curve trend effects, the impact of the economic inter- vention on environmental quality change is quantified for the United States and Japan, with different conclusions. 1.3 Nonlinear Additive Autoregression (NAAR) Model For multi-dimensional strictly stationar time series X - ,...,X' ,Y- 7.: , the followin 3’ 1,1 z,d 2 ,_1 g additive structure is assumed for model ( 1.1.1) (1 v, = c+ 2 ma (Xi’a) +0(X,-)5,- (1.3.1) 021 In nonlinear additive autoregression data-analytical context, each predictor Xaa, 1 S a g d could be observed lagged values of Y,, such as X”, = Y,-_(,, or of a different times series. Model (1.3.1) therefore, is the exact same nonlinear additive autoregression model of Huang and Yang (2004), which allows for exogenous variables. For identifiability, additive component functions must satisfy the conditions Ema (Xi,a) E 0, a = 1, ..., d. Application of additive model to high dimensional time series data has been hampered by the scarcity of smoothing tools. 
The straightforward kernel methods are too compu- tationally intensive for high dimension, thus limiting their applicability to small number of predictors. Spline methods on the other hand, provide only convergence rates but no asymptotic distributions, so no measures of confidence can be assigned to the estimators. In Chapter 3, a spline-backfitted kernel estimator is proposed for estimating the un- known component functions {ma (-)}g=1 based on a geometrically strong mixing sample following model (1.3.1). under minimal smoothness assumptions. The idea is to employ one step backfitting after the spline pilot estimators, and then follow up with kernel smoothing, which combines the fast computing of polynomial spline smoothing and the good asymptotic property of kernel smoothing. Thus, the spline-backfitted kernel estimator is both computa- tionally expedient for analyzing very high dimensional time series, and theoretically reliable to make inference on the component functions with confidence. 1.4 Single-Index Prediction (SIP) Model Single-index model, a special case of projection pursuit regression, has proven to be an effi- cient way of coping with the high dimensional problem in nonparametric regression. Single— index model summarizes the effects of the explanatory variables within a single variable called the index. The basic appeal of single-index model is its simplicity: the d-variate func— tion m (x) = m (:51, ..., xd) is expressed as a univariate function g of xTBO = 25:1 90’pIp. In Chapter 4, a robust singleindex prediction (SIP) model is introduced for stochastic regression model 1.1.1 regardless if the underlying function is exactly a single-index function. Applications of SIP models lie in a variety of fields, such as discrete choice analysis in econometrics and dose—response models in biometrics, where high-dimensional regression models are often employed, see Hardle, Hall and Ichimura (1993). The proposed spline estimator of the index coefficient possesses not only the usual strong consistency and fi- rate asymptotically normal distribution, but also is as efficient as if the true link function g is known. By taking advantage of the spline smoothing method and the iterative method, the proposed procedure is much faster than the MAVE method, see Xia, Tong, Li and Zhu (2002). This procedure is especially powerful for large sample size n and high dimension d and unlike the MAVE method, the performance of the SIP remains satisfying in the case d>n. 1.5 Polynomial Spline Smoothing Let {X,, 1”,}?21 be a strictly stationary process. Assume that X,, i = 1, ..., n, are supported on a compact interval [a, b]. Polynomial splines begin by choosing a set of knots (typically, much smaller than the number of data points 11), and a set of basis functions spanning a set of piecewise polynomials satisfying continuity and smoothness constraints. To be specific, divide [a,b] into (N+ 1) subintervals Jj = [tj,tj+1), j = 0,...,N — 1, J N = [tN,1], where T := {tj )9; is a sequence of equally-spaced points, called interior knots, given as t1_k=...=t_1=t0=a 1, with I t' < u < t" 1 B- :1 = 3“ 3+ 3’1 (u) { { 0 otherwise 'uEJj} For model (1.2.1), assume that m(:1:) belongs to 00¢) [a,b], the space of functions that have k—th order continuous derivatives for some integer k > 0, on the interval [(1,1)]. The polynomial spline estimator is it fit], () :2 argmin Z {Y,- - g (X,-)}2 ,k > 0. 
(1.5.3) g(-)€G(k-2)[a,b] i=1 In the rest of this dissertation, spline smoothing is applied for the stochastic regression model (1.1.1) under different conditions. CHAPTER 2 Spline Confidence Bands for Time Series Prediction Function 2.1 Introduction Theoretical properties of nonparametric smoothers are typically examined in terms of mean square, pointwise, or uniform rate of convergence, while practical consideration favors meth- ods that are easy to implement and interpret. In addition, fast computing is appealing for users of smoothers. For kernel smoothing of independent data, satisfactory results on rates of convergence have been obtained, see Fan and Gijbels (1996) for pointwise and mean square convergence rates, Miiller, Stadtmiiller and Schmitt (1987) for confidence intervals of derivative estimates, Neumann (1995, 1997) for bandwidth choice and construction of confidence intervals, Hall and Titterington (1988), Hardle (1989), Xia (1998), Claeskens and Van Keilegom (2003) for uniform confidence bands. Spline smoothers of independent data have been investigated in parallel, see for example, Stone (1985, 1994) for mean square convergence, Huang (2003) for pointwise convergence, and Zhou, Shen and Wolfe (1998) for uniform confidence bands. Nonparametric smoothing of weakly dependent data has been vigorously pursued in many directions due to its superiority for the modeling and forecasting of nonlinear time series, see, for instance, Fan and Yao (2003) for kernel type autoregression smoothing, and Huang and Yang (2004) for spline type autoregressive smoothing. Confidence bands, however, remain unavailable for all nonparametric smoothers based on dependent observa- tions, due to the lack of Hungarian embedding for dependent random variables, similar to that established by Tusnady (1977) for independent random variables. Existing results on nonparametric smooth confidence bands rely on such strong approximation result of i.i.d. sample, see, for instance, Bickel and Rosenblatt (1973), Rosenblatt (1976), Hardle ( 1989), Xia (1998), Claeskens and Van Keilegom (2003). In this chapter, asymptotic simultaneous confidence bands are obtained for the unknown regression function m (x) in (1.1.1) based on the polynomial spline estimator 772,, (2:) defined in (1.5.3), while the observations {(X,, Y,)},’-‘=1 are only assumed to have a—mixing coefficient a(n) decaying geometrically (see Assumption (A4) of Section 2.2). Instead of applying the usual Hungarian embedding technique used in most existing works, we make use of the Berry-Esseen bound in Sunklodas (1984) for sequences of mixing random variables to establish that the constructed confidence bands are conservative. The resulting confidence bands are comparable in terms of formula and narrowness to those constructed for i.i.d. sample. Further research will show that these simultaneous confidence bands are very useful for multi-step ahead forecasting of time series data, such as studied in Chen, Yang and Hafner (2004). The rest of this chapter is organized as follows. The main findings of splines confidence bands are stated in Section 2.2. Section 2.3 provides further insights into the error structure of spline estimators, from which one is able to obtain the asymptotic confidence bands. This is accomplished by establishing simultaneous Berry-Esseen bound for the estimation noise. Section 2.4 describes the actual steps to implement the confidence bands. 
Section 2.5 reports the findings in an extensive simulation study and the application to the environmental Kuznets curve (EKC) analysis. All technical proofs are contained in Section 2.6. 2.2 Main results Before stating the main theorems, we formulate some assumptions. (A1) The regression function 711 E C(k) [a, b], k = 1,2. (A2) The marginal density function f (11:) of X is continuous and positive on its compact support, the interval Ia,bI. The standard deviation function 0(17) is continuous and positive on Ia, 1)]. (A3) The number of interior knots N satisfies: (n/ log n)1/(2k+1) << N << 111/3, hence for k = 2, one can take N N n1/5, while for k = 1, one can take N N n1/3(logn)”1/6. (A4) There exist positive constants K0 and A0 such that a(n) S Koe“’\0" holds for all n, where the strong mixing coefl'icient of order n is defined as a(n)= sup IP(BflC)—P(B)P(C)I,n21. BEO’{X3,Ys,3£t},CEO’{X3,Y3,SZt‘f'Tl} (A5) The joint distribution of random variables (X ,5) satisfies the following: (a) The error is a white noise, E(e IX 2 :15) = 0, E (52 IX = 2:) =1. (b) There exists A10 > 0 such that sup E (IEI3 IX = 2:) < MO. xEan Assumptions (A1)-(A5) are typical in the nonparametric smoothing literature, see for instance, Fan and Yao (2003), Huang and Yang (2004). For any a: 6 [a,b], define its location index j (:13) and relative location index (5 (:22) as ._ a: — t - J’ (2:) = in (2:) = min I IE—h—‘EI ,N}, 6(1) 2 —IJ'(_I)’ (2.2.1) It is clear that tj(x) g :1: < tj(_,,,)+1, 0 S 6(12) < 1, Va: 6 [a,b], j(b) = N, 6(b) = 1. For any 1.2-integrable functions (15, (p on Ia, b], the theoretical and empirical inner products and the corresponding L2 norms are defined respectively by b at» = / immense = mangoes}, ”as = E {i2 (X)} = /b¢2 (as) f($)d:c, we). 2 n4 2 {a5 or.) 990(1)} , “at. = 72-1: ,2 (X0. i=1 i:1 For notation simplicity, we denote by IIII00 the supremum norm of a function r on Ia,b], :2 sup Ir (1:)I, and the moduli of continuity of a continuous function r on [a,b] :cEIa,bI is denoted as w (r, h.) 2 wide max Ir (x) — r (23’) I. By the uniform continuity of r on :r,r’€[a,bI,Ix-$’Igh an interval [a, b], one has ’Iimow (r, h) = 0. We denote the theoretical norms of Bj,k: k = 1,2 in (1.5.2) as follows %n=lwnfi=/GWU@ML was d”, = IIBJ'JIIE = [K {(11: -— tj+1) h-l} f(:1:) d:r.. (2.2.3) For theoretical analysis, in the following of this chapter, we use the rescaled B-spline basis (divided by its theoretical norm cj'n, dj,n) {By-,1 (:12) }§V=0 and {8,32 (:12) }j.v:_1 for constant spline space C(‘ll and linear spline space (7(0) defined in Section 1.5. The inner product matrix of the B-spline basis {83,1 (:3) }j:0 is obviously the identity matrix I N+12 while the corresponding matrix V of the B-spline basis {Bj,2(1‘)}§:_1 is denoted as V N B B N 2 2 4 _ (Ujlj)jsj,:—l — (< jl’2, J’2>)jsj,=—I, ( . I ) whose inverse matrix S and its 2 x 2 diagonal submatrices are expressed as N —1 Sj—Ij—I Sj—lj - S: (34.) ,, 2V ,S, = , . ,9 =0,...,N. (2.2.5) J J Jr.) :_1 Sj7j_l Sjij ‘ Next define matrices 2, A (:13) and 83- as N N 2 23 ‘-‘ (011),):4 = {f0 (’0) 33,2 (U) 131,2 (“IN”) 411} 1 , (225) j: 2—1 = Cj(x)_1{1—5($)} ,2 J2 j=—1,N ACE) (OJ-(306(1)) ,CJ 1 OSjSN—I , lj+2,j+1 lj+2,j+2 where terms {liklli-qu are the entries of the inverse of the (N +2) x (N + 2) matrix 1- - l- - E,=(J+1’J+l 3+1”+2),j=0,1,...,N, (2.2.7) MN+2 and can be computed by Lemma 2.6.10 1 \f2/4 0 \ fi/zi 1 1/4 MN+2 = 1/4 _1 , (23-8) ., 1/4 1/4 1 fi/4 \0 fi/4 1 f 10 Define next f 13(1) (v) 02 (v) f (v) dv 2 o,,,1(:1:) = 2 , (2.2.9) 716. 
3(1)," N 1 2 an,2(a:) = E Z 814,2($)B,,’2(:r)sjj;su/afl, (2.2.10) j,j’,l,l’=—l with j(:r) defined in (2.2.1), ij in (2.2.2) and s“; and 03-) in (2.2.5), (2.2.6). These 03, k (2:) are shown in Lemmas 2.6.5, 2.6.11 to be the pointwise variance functions of the spline estimators 1a,, (2:), l: = 1,2. Lastly, define an inflation correction factor, for any a 6 (0,1) dn(a)=1—-—{2log(N +1)}—1 log (Ct/2) + % {log log (N +1) +log47r} . (2.2.11) THEOREM 2.2.1. Under Assumptions (AU—(A5), for any a 6 (0,1), an asymptotic 100 (1 —— a) % conservative confidence band for m (an) over interval [a, b] is mm :1: an), (11:) {2k log (N +1)}1/2 an (oz/k) ,k = 1, 2. (2.2.12) In other words, for k z 1, 2 limian Im (I) 6 ink (:17) j: 0",), (.17) {2k log (N +1)}1/2dn(a/k),Va:”E [a,bII _>_1— a, Tl—'*OO in which on,1(:r.) is given in (2.2.9), replaceable by o(:1:) {f(2:)nh}‘"1/2 accord- ing to (2.6.6) in Lemma 2.6.5, on; (:r) is given in (2.2.10), replaceable by l 2 o(:1:) {2f ($)nh/3}—1/2 {AT (:13) Ej(I)A(:r)} / according to Lemma 2.6.9 and (2.6.1.9) in Lemma 2.6.11, and d" (a) is given in (2.2.11). 2.3 Error decomposition In this section, we break the polynomial spline estimation error at, (:13) —— m (2:) into a bias term and a noise term, with fiik(:17) given in (1.5.3). We first establish the uniform rate at which the empirical inner product approximates the theoretical inner product for all B-splines. 11 LEMMA 2.3.1. Under Assumptions (A3) and (A5), we have A,“ = sup IIIBj,1II§ n — 1| 2 Op {(nh)—1/2 log n} , (2.3.1) OSISN ’ (91.92%; ‘ (91,92) l|91l|2 l|92ll2 A112 : sup = 0,, {(nh)_1/2 log n} . (2.3.2) Note that the spline estimator in (1.5.3) is ya}, (x) that m, (x) E XIV—14c)? kB jk (x), where - . T " {AI—k,k1“‘1/\N,k} = argmin Z Y,‘ - Z )‘jkBJ k(Xi) {A1_k’k,...,x\N,k}eRN+k i=1 3'- 1— k With a slight abuse of notation, introduce a function Y defined only on data points: Y (X,) E Y,,1§i§ n, and write B T B B —1 Y B > N T)—{ 2.k(I)}i—kstN (< 2%” Jik>n)1—k52,2’SN{< ’ 3"“ n}j=1—k‘ Define asimilar function E as E (X,) E a (X,) 5,, 1 S i S n, then on data points Y = m+E with m = {m(X1), ...,m(Xn)}T. An empirical inner product yields m, (x) = r72), (x) + Ek (x), where T -l = {BM (33)}1—kfijiN (n)1-kgj,j’giv{n)1—jkgj,j’gN {(13, B,,,)n}:l_k. (2.3.4) Thus, the estimation error in), (2:) — m (x) consists of a bias term 171,, (x) — m (2:) and a noise term E), (x), such that 721,, (x) —— m (x) : {171k (2:) — m (27)} + 5k (x). (2.3.5) LEMMA 2.3.2. [de Boor (2001) page 149/ There exists an absolute constant Ck > 0, k 2 1, such that for every 771 E (70‘) Ia, b], there exists a function g E C(k—2) [(1,1)], such that IIg — mIIOO S Ck IIw (mac—‘1), h) II h“‘1 g Ck OO 771(k) II 11“. 00 12 LEMMA 2.3.3. [Huang (2003) Theorem 5.1] Under Assumptions (A 1)-(A4), there exists an absolute constant Ck > 0, k 2 1, such that for any m 6 CU”) Ia,b] and the function 771,, (x) as in (2.3.3), with probability approaching 1 um), (I) — m mum g 0,, inf IIg — mud, = o, (hk) . (2.3.6) 966' ’2) Lemmas 2.3.2 and 2.3.3 establish that the bias term is of order Op (hk) uniformly over 2: 6 [a,b]. Hence the main hurdle of proving Theorem 2.2.1 is the noise term 5,, (x) defined in (2.3.4). This is handled by the next proposition. PROPOSITION 2.3.1. Under Assumptions (A2)—(A5), with 071,1 (x) given in (2.2.9) and 07,2 (x) given in (2.2.10), for any 0 < a < 1, k = 1,2, one has limian I sup 0;), (2:) 5k (2:) g {2klog(N + 1)}1/2d/n (oz/k) Z 1 -— a. 
(2.3.7) "HOG IEIde 2.4 Implementation In this section, we describe in detail the procedures implemented to construct the confidence bands in Theorem 2.2.1. All of the codes have been written in R. Given any sample {(X,, Y,)}?_:_1, use the minimum and maximum values of {X,-}?:1 as the endpoints of interval [a, b]. The number of knots N is taken to be Ickn1/3(logn)—1/6I for k = 1 and Icknl/SI for k = 2, where ck (k = 1,2) are positive integers. As with previous works on confidence bands (Hardle 1989, Xia 1998, Claeskens and Van Keilegom 2003), explicit formula of coverage probability for the bands does not exist, hence there is no Optimal method to select C), (k = 1,2). So we have not attempted adaptive knot selection, as Hardle, Marron and Yang (1997) had illustrated that it could lead to uniform inconsistency. We have set c1 = 6, c2 = 3 for piecewise constant and piecewise linear bands respectively, which works well in all simulations. The least squares problem in (1.5.3) is solved by writing spline functions as linear com- k—l + ,j=1,...,N. In binations of the truncated power base, which are 1, x, ...,xk’l, (2: — tj) 13 other words, we take )k— 1 772,, (2:): :ypxp +22")ij , (2.4.1) 1120 where the coefficients {’70, ..., 7k—1:’71,ka ..., 3N,k}T minimize the following sum of squares 2 11 Z Y 27px? JFZIJ'HX Jlkl i=1 When constructing the confidence bands, one needs to estimate the unknown functions f (2:) and 02 (x) for the evaluation of the functions 011,1 (2:) in (2.2.9) and 07,3 (x) in (2.2.10) according to Lemma 2.6.5 and Lemma 2.6.11. Let R (u) = 15 (1 -— u2)2 I {IuI _<_ 1} /16 be the quartic kernel, 3,, be the sample stan- dard deviation of {X,- H _1 and A , :1 hr—oltl f {U -15) ’ (2.4.2) hrot ,f where hm, f:(47f)1/10(140/3)1/571-1/5sn is the rule- of—thumb bandwidth of Silverman (1986). Theorem 2.2 of Bosq (1998), page 47, implies the following uniform consistency result sup xEIa,bI f (:17) — f (I)I = 0, as. (2.4.3) Define vectors Z), = {Z1,k, .., Z,,,k}T, k = 1, 2 with Zn), 2 {Y,- — m), (X,)}2, then the spline estimation of o2 (x), 6% (x), k = 1, 2, can be obtained by using the N adaraya-Watson estimation on data {X,, Z,,k}?:1. It is clear from standard theory of kernel smoothing that max sup I0% (2:) - 02 (2:) = Op (h). (2.4.4) k=1 2 xEIa, b] With all the above preparation, one can compute the following confidence bands 1a,,(a: )ionk(x, opt){2klog(N+1)}1/2d%(,a/2) k=1,2,0pt= 1,2, (2.4.5) where m, (x) is given in (2.4.1), the additional parameter opt = 1, 2 indicating the estima— tion being at each value 1: or at the nearest left knot with j (x) and f(r ) defined in (2.2.1) 14 and (2.4.2) s,,,1(.z:,1) =-_ 1310““) 134/2 (5,”) n—1/2h_1/2, (2.4.6) as (x. 2) = 61(r)f‘1/2(x)n‘1/2h‘1/2. (2.47) an; (x, 1) : {AT (x) amp: (2))”2 {nhf (t,(,,)}—l/2 fi/‘isg (1%), (2.4.8) . _ 1/2 . —1/2 - 07,2 (x, 2) 2 {AT (2:) =j(x)A (x)} {nhf (x)} \/3/202 (x). (2.4.9) Since sup x — tJ-(x) S h —~> 0, as n ——> 00, and according to Lemma 2.6.9, the matrix B]- xEIa,bI approximates matrix 83- uniformly for 0 S j S N, (2.4.3) and (2.4.4) entail that all of the four bands above are asymptotically conservative. 2.5 Examples 2.5.1 Simulation example To illustrate the finite—sample behavior of the proposed confidence bands, some simulation results are presented. The number of interior knots N is chosen according to Section 2.4. The data set in our simulation study is generated from heteroscedastic regression model (1.2.1), with 100 -— exp (x) 100 + exp (2:) ’ m (x) = sin (27rx), 0 (x) = 00 5 ~ N (0, 1), 00 = 0.2, 0.5. 
(2.5.1) We simulate {7}}?‘21 from a moving average sequence of order q, i.e, 1 T1 = 2 2 (62' + 9161—1 + 9251—2 + + qui—q), \/1+61+...+6q where in the simulation, q is taken to be 4, 61 = = 6g 2 0.2 and f,’s are i.i.d. r.v.’s ~ N(0,1). We then define X, = (I)(T,-), where (I) is the standard normal distribution function, so X, is uniformly distributed on [0,1]. We choose sample size n to be 100, 200, 500 and 10000, confidence level 1 — a = 0.99, 0.95 as usual. Tables 4.1 and 4.2 contain the coverage probabilities as the percentage of coverage of the true curve at all data points by the confidence bands in (2.4.5) with 500 replications of sample size n = 100, 200 and 500. The coverage probabilities of the confidence 15 bands in (2.4.5) have also been computed by plugging in the true value of density function f(x) = IIO,1I(x) and the variance function 0(x) in (2.5.1), called the oracle bands as they use quantities that are unknown but for “oracles”. Table 4.1 shows that the performance of all four bands becomes much closer with larger sample size. When sample size reaches 500, all four bands have nearly the same coverage at noise level 0.2. In Table 4.2, the coverage percentages show very positive confirmation of Theorem 2.2.1 when k = 2. At sample size 100, regardless of noise level, both of the two piecewise linear bands in (2.4.5) achieve at least .980 and .948 for confidence level 1 ——a = .99 and .95, respectively. From Tables 4.1 and 4.2, it is obvious that larger sample size guarantees improved coverage, while reasonable coverage has also been achieved at moderate sample sizes. While under the same circumstances, the band by linear spline performs much better than the band by constant spline. We have also observed that the noise level has more influence on the constant bands coverage, and very little on the linear bands’. Corresponding to opt = 1, 2, four figures of constant bands (Figures 4.1 - 4.4) and four figures of linear bands (Figures 4.5 - 4.8) are created for graphical comparison: each with four types of symbols: dots (data), center smooth solid line (true curve), center dotted line (the spline estimated curve), upper and lower thick solid line (confidence bands). Comparing Figures 4.1 - 4.4, one sees that the band widths are very close as sample size reaches 500. This is more evident from Figures 4.5 - 4.8. In all figures, the confidence bands of n = 500 are thinner and fit better than those of n = 100. Also the smaller the significance level, the wider the confidence band. Overall, linear bands are superior to constant ones in terms of smoothness and narrowness. Observing that the estimation of on; (x) by 6mg (x, 1) at knots as in (2.4.8) or by 6mg (x,2) at all observations as in (2.4.9) does not seem to have much noticeable impact on the widths of the confidence bands, while the estimation at knots seems to produce closer coverage probabilities to the nominal confidence level, we recommend always using estimation by (3mg (2:, l) at knots for simpler and faster implementation. For the linear bands, we have also carried out simulation at noise level 0.2, for sample 16 size n = 10000 and Opt = 1 (estimation on knots). The coverage is always 99.6% for a = 0.01 and 97.6% for a = 0.05, both higher than the nominal coverage of 99% and 95%, consistent with their conservative definitions. Remarkably, it takes merely 365 seconds to run 500 replications with sample size as large as 10000 on a Pentium III PC. 
This is extremely fast considering that nonparametric regression is done without WARPing, see Hardle, Hlavka and Klinke (2000). 2.5.2 Environmental Kuznets curve (EKC) The environmental Kuznets curve (EKC), an inverted—U relationship between pollution and income, is an influential generalization about the way environmental quality changes as a country makes the transition from poverty to relative affluence. The EKC predicts that pollution will first increase, but subsequently decline if income growth proceeds far enough. The shape of the relationship between the rate of environmental degradation and GDP per capita has been the subject of much empirical examination. Several studies have attempted to test the EKC hypothesis empirically. The majority of these studies use panel data in conjunction with a static fixed and/or random effects panel estimator. In this section, we examine whether or not countries (here we select US and Japan) actually behave like the EKG, and we further look at the nonparametric time series nature of the data set after elimination of the trend. One key variable of this study, the environment index is the emissions of sulfur from 1850 to 1990, see Lefohn, Husar and Husar (1999). The other key variable is GDP per capita from 1850 to 1990, which can be obtained in Maddison (2003). To gain an insight into the model structure, we decompose the logarithm of GDP per capita and Emission per capita into their trend parts and noise parts, respectively, i.e., for t = 1, ...,n {log(GDP per capita)}, = u(t) + Xt, {log(Emission per capita)}, = v(t) + Yg. We are interested in two sets of hypotheses, given here separately in terms of the relationship between the trends u(t) and v(t), and between the stationary noise {Xt}?:1 and {Yt}?:1. 17 EKC hypothesis: There exists an inverted-U relationship between u(t) and v(t). (see Figure 4.9) Residual/noise hypothesis: There exists a linear relationship between {Xt}?__.1 and {Ytliéi- The EKC hypothesis can be tested by performing a routine trend analysis. After de- trending, {Xt}?=1 and {IQ}?=1 are obtained, then one can estimate the regression relation between them and construct an piecewise linear spline confidence band for the testing. Case 1. United States Example We get the trends u(t), v(t) of US data by fitting a polynomial regression on time t. u(t) :2 0.00511+ 3.3127, v(t) = -—O.0001t2 + 0.0261t — 2.1788, (2.5.2) with the corresponding R2 = 0.9814,0.9256. So for US, the EKC hypothesis is retained by the trend analysis. After elimination of the trend, {X t}?=1, {Y1}?=1 appear to be stationary. For the residual hypothesis, Figure 4.10 shows that when confidence level is as small as 80%, the linear regression line is still covered by the confidence band. This phenomena implies that the residual hypothesis is retained. Moreover, we can see that the confidence bands also cover the horizontal line E (YtIXt) E 0. So one concludes that Y; is unpredictable from Xt, that is, the intervention of emission is immune to the intervention of economy. Case 2. Japan Example The quadratic trends u(t), v(t) for Japan data are given as u(t) = 0.000312 -— 0.00191+ 6.7308, v(t) = —0.000512 + 0.0952t — 9.0772 (2.5.3) with R2 = 0.9829,0.9544. From the trend relationship curve, one sees that it is not a U shaped curve as EKC predicted. However, we are not sure whether it would succeed to decouple environmental pollution and resource use from economic growth, which will make this a tuning point and U shape later. 
To test the residual hypothesis, Figure 4.11 shows that neither the linear regression line nor the horizontal line E (i’tIXt) E O can be covered by the confidence bands even when the confidence level reaches 99%. So the residual hypothesis is rejected at significance level smaller than 0.01 given that the confidence band is already 18 conservative. This phenomena implies that the intervention of emission is not immune to the intervention of economy, or say that the adjustment of GDP has autonomous influence on the change of environmental quality, but not in a linear way. 2.6 Proof of Theorem 2.2.1 2.6.1 Preliminaries of Theorem 2.2.1 with k = 1 Throughout the following, denote by c, C, any positive constants, without distinction. The properties of C2.” and djjn are given in the following lemma, whose proof consists of direct algebraic verifications. LEMMA 2.6.1. As 71 ——* 00, for C2," defined in (2.2.2) and d”, in (2.2.3) cm = f(tj)h(1+rj,n,1),EO,j7éj', (2.6.1) 2 1+r- 2 j=0...N—1 d. = — t- h 3*"1 ’ ’ ’ 9'" 3H3“) I1/2+r,-,,,,2 j=—1,N, 1 1+f- , j’-—j =1, lv where 02}sz lTj,n,lI + 412?:‘N ITj1n,2I + 451%4 Ifjanfll g Cw (f, h), (2.6.2) if (tj+1) h {1 — Cw (f, h)} g (1,,” 3 if (9+1) h {1 + Cw (f, 11)} . (2.6.3) To prove Lemma 2.3.1, we make use of the following Bernstein inequality for geometri— cally a-mixing sequence. LEMMA 2.6.2. (Bosq (1998), page 31, Theorem 1.4/ Let {§,,t E Z} be a zero mean real valued a—mixing process, Sn = 221:1 g, Suppose that there exists c > 0 such that fori = 1, ...,n, k = 3, 4,..., EI§,Ik S elf—2ME6,2 < +00, then for each n > 1, integer q E I1,n/2], each 5 > 0 and k 2 3 (152 n 2k/(2k+1) P(ISnI 2 n5) ‘3 a1 exp (———-————) + (12 (k) 01 (I I) , 25771122 + 568 q + 1 19 where a() is the a-mixing coeflicient defined in (3. 2.10) and 2 577121c/(21c+1) ),a2(k)=lln 1+—‘~‘———— , 61:27—’+2(1+ q E 25mg + 5a: with mr = Ina-X1992 lléillr, 7‘ 2 2. PROOF OF LEMMA 2.3.1. For brevity, we only give the proof of Lemma 2.3.1 for An,l- —1 For any 0 S j S N, let 1),-J = B121(X,-)-1, then llBlelin — 1 = 11 1‘11)”, with Ema- = 0 and for any r _>_' 2, C1- inequality implies that E low-l" = E Is},1 (X,) — 1Ir g 2T—1EIB,2j‘,(X,-)+ 1I 3 CO {2h—1}r_l, where cjjn is as (2.22) with properties given in (2.6.1) and (2.6.2). On the other hand 130112,): EBf-I1 ' — )1I2 2 EIB;,1(X,-)-1I = {2c,-,,,}‘1—12 Clh‘l. . ~ it— So there is a constant c, such that for all k > 2, E'Ii),JIA S (ch—l) 2klEnz2j. Thus Cramer’s condition is satisfied with Cramer’s constant equal to ch‘l. Applying Lemma 2.6.2 to n’1 2?:1711‘03 for any 6 > 0, q E [1,n/2I, one has for k = 3 1 n —q62 n 6/7 P — >6 _ cln/logn for some constants c0,c1, one has a1 = 0(n/q) : 0 (log 71), a2 (3) = 0 (n2). Assumption (A4) 6/7 6/7 q + 1 q + 1 Thus, for 71 large enough, $1272..» yields that (Slogn —} S clog 71 exp {4:263 log n} + C71,?"6A0C0/7. nh 20 Taking c0, 6 large enough, one has for large n, P {111 121:1 712331 > (nh)—1/2 6 log n} S TF3. Hence (2.3.1) holds because 00 0° 2 Log: {3 00 _2 E P sup “1133111211. — 1|> 372 N g 2 271 < 00. Cl n=l ‘ N , n21 OSJS 121: 2.6.2 Proof of Proposition 2.3.1 with k = 1 To prove Proposition 2.3.1, the following important lemmas are needed. We denote by (I) the standard normal distribution function. LEMMA 2.6.3. [Sunklodas (1984), Theorem 1] Let {£3211 be an a-mirtrzg sequence with Efn = 0. Denote d :2 maxlgign {ElfiIZ‘HS} ,0 < 6 S 1, Sn 2 227-121 52': 03, := ES}, 2 con for some ('0 E (0, +00). 
[fa (n) g Koe‘AO", A0 > 0, K0 > 0, then there extstcl = c1(K,6), 62 = (:2 (K, (5), such that A d S Cl coon 6{log(on/((1)/2) /)\}1+6 {01:15}, < z} — ‘1)(2) for any A with /\1 g A 3 A2, where /\1=(:2{log (On/CO/ 2V} /n. b>2(1+(5)/6; A2245—1(2+6)log(0n/C5/2.) LEMMA 2. 6. 4. [Leadhette1, LGdgren and Rootzen (198?), Theorem 1.5 .3] As N —* 00, one has [(1) (r/(LN + bN)]N -—+ exp (—e—T), where aN = (210g N)”2 , bN = (210g N)1/2 — (2 log N)"1/2 (log logN + log47r) /2. Note that E1 (:13) in (2.3.4) can be rewritten as 2:38 e 31.1“22 n, (2.6.4) with 8; —— (13,8 j 1),, = £2121 Bj,1(Xi)0(Xi)5i- NOW define —ZE;B,1,:; e [a 1)] (2.6.5) The next lemma gives the pointwise variance of 51 (1:). 21 LEMMA 2.6.5. The pointwise variance of E1 (:r) is the function 031.1(1) defined in (2.29) which satisfies 02(1. f (1:) 73h Eel (11122031 (so): {1+ ma} zeia 1] (2.6.6) with supIE[a,b] lrnJ (I)l -1 0. PROOF. Note that E(ez-IX--) —— 0, E[Bj1(X,-)Bj1 (Xk)o(Xi)a(Xk)el-ek] = 0,Vi # k, the rest of the proof follows from Lemma 2.6.1 and the continuity of functions a (:r) and f (:13). Cl The difference between E1 (1:) in (2.6.4) and 5:1 (as) in (2.6.5) is negligible uniformly over as 6 [a,b]. LEMMA 2.6.6. Under Assumptions {A2) and (A5) 1131(1) — 121(1):: A,“ (1 - 11.1)"1 121 (2:11 e 11.11. PROOF. For any :1: 6 [a,b] 51(x)- 51(zr)| _<_ I51 (3:)| sup I B31 . — 1| sup 31,1 . 1 03.131), M .7 “2,11 OSjSN H J “2,1: Meanwhile (2.3.1) of Lemma 2.3.1 implies that 2 _1 _1 032111 lllBlelgn — 1| S An,1, (1 + An,1) S 0 0, such that for each j = 0, ..., N 2 n 2 __ 0310- E E (£15,0-) == nE{BJ-,1(Xz-)0(X,-)e,-} = ncjflll/ o2 (u) f (u) du = on, i=1 ’1' (2.6.7) where COJ: c]. n f1 (u )f (u) )du > (0 (f, a) > 0 with C1," defined in (2.22) and d1 _:_ 5151,31" = E {B3111 (21.10“ (X0 15.13} 5 Co (f,0)h“1/2- (2.6.8) 22 Proof. Using the definition of on 1 (:r) in (2.2.9) n 2 2 03m- = E (251,3) "271— Cj,En"71;z{ ZBj,1(B$)j,1(Xi)0(:i)€i} =n20j,n031,1($) i=1 2 ncjjfll/Iflx) (u)o2 (u)f(u) du = nco,j> __ neg (f, o)> Next, by Lemma 2.6.1 and the continuity of functions 02 (:12) and f (1:), one has —3 2 _ =,EIaJ-I3< — JJ/ [1 a3 (u)f(u1du s 001120)}: V2. C1 1' PROOF OF PROPOSITION 2.3.1 WITH p = 1. Note that for any j = 0,...,N, :1: e I]- .1101)— 3,}:119 ,)191(:c (XJ)a(XJ-)e.-=a;,}Ze-,J. (2.6.9) i=1 in which 03”- = on 2 q;(f,o) > O as in (2.6.7) and dj S C'(f,or)h-1/2 as in (2.6.8). Observing that {§,-,j}?___1 forms a stationary a-mixing sequence, with Efim = 0. Define An— _ 0ga < 01 0 Plow :50 — z a: J} Ml _ h1/2C0(f’0)0"’j An = max sup 0(— 10g(—T/aN—bN) —_- I—Qe’TN"l+o(N”1). Letting 2e-T = a or 7' = — log (oz/2) entails that uniformly in j, n _ 108(01/2) a -1 P 01- €'-S--—-—-—+bN 1,3:61- =1——————+0(N ). { 71,3; 1’] aN+1 + ‘7 1+N Thus 11 _ log a/2 , P{0n$-E gm- >—TIEIWT)+bN+1’$EIJ-, for someOS] _<_N} i=1 l > _ Og(a/2) aN+1 N n —l s :P{ 2a- j=0 121 So as n —+ 00, one has n —1 P { 0m 2&3)" i=1 +bN+1,1:E Ij}=oz+0(1). _10g(a/2) aN+1 S +bN+11$€1j10SjS N} 21—a+0(1). Hence lim inf P sup "—00 :1:€[a,b] n -1 an,j Z €21.7- i=1 Therefore, using Lemma 2.6.6, one has proved (2.3.7) for k = 1. Cl 0;,11(x)é1(x)| 3 {210g (N MW2 at («0] — _log (CY/2) aN+1 : lim inf P “H00 S +bN+1,xEIJ-,0§jSN] 21—01. .- 2.6.3 Proof of Theorem 2.2.1 with k = 1 PROOF OF THEOREM 1 WITH 1: = 1. By (2.3.6) and Assumption (A3), one has (17mm — m (2:)"... = OJ (1») = op {fl/211‘”? (log (N + 1))1/2}. so the uniform bias order is negligible compared tO (nh)_1/2 {log(N + 1)}1/2, which is the uniform noise order of aJ.,1(x){—log(a/2)/aw+1 + m1} = m (J) (210g (N +1)}1/2 dn (a). 
24 Now (2.3.5) and Proposition 2.3.1 yield the conservativity of the band in (2.2.12) for k = 1 limian pm (:11) E 1i11(;r) :1: 0,1,1 (1:) {210g (N +1)}1/2dn(a),‘v’1: 6 [a,b]] ”-400 1. = limian sup 01:11 (:11) Isl (1:) + 7711(r) — m (:r)| 3 {210g (N +1)}1/2dn(a)] "—900 _x€[a,bl ’ . ll limian sup o_1(:r)17:1(1:) S {2log(N + 1)}1/2 dn (C1)] 2 1 — a. ,1 "“00 _z€[a,b] " Therefore, Theorem 2.2.1 has been proved for the case Of k = 1. Cl 2.6.4 Preliminaries of Theorem 2.2.1 with k = 2 In this subsection we examine some matrices used in the construction Of confidence band in (2.2.12) for k = 2. In what follows, |T| is used to denote the maximal absolute value Of all the elements in matrix T, V is the inner product matrix defined in (2.2.4) and M N+2 is the tridiagonal matrix as defined in (2.2.8). AI LEMMA 2.6.8. Given matrix Q = MN+2 +I‘, in which F = (73-3-1) . -/ 1 satisfies 7.7-1.1 .=_ 0 .73] :— if lj —j'| > 1 and [Fl 3, 0. Then there exist constants c,C > 0 independent ofn and I‘, such that with probability approaching one c151 s IQEI s 0151,0-11asln'lsl s c-1 (a as 6 RN”. Proof Of the above lemma is trivial. As an application Of Lemma 2.6.8, consider the N , then there exists a positive . _ ..1 - ~ —— matrlx S —— V defined 1n (2-2-5l- LBt £3" — {sgn (Sj'j) }]=_1 Cs such that N Z lsj’jl S lSéj/l S 05 éjll = C3,Vj’ : —1,0,...,N. (2.611) j=—1 The next lemma follows by applying Lemma 2.6.8 with Q = M N+2- It ensures that one can approximate S with the inverse Of M N+2, with a simpler distribution-free form in (2.2.8). This approximation is uniform for Sj in (2.2.5) and Ej in (2.2.7) as well. LEMMA 2.6.9. As 72 ——> OO,|lVIj_V1+2 — SI —> 0 and max lEj — Sjl —» O. OSjSN The tridiagonal terms of the matrix MIT/1+2 can be computed through the following lemma, which is a direct result Of Zhang (1999), Theorem 4.5, page 101. LEMMA 2.6.10. Let 21: (2+x/3)/4, z2= (2— Jig/4, 9=z2/z1=7—4\/3, one can compute the terms 12”,]: = l)”, Ii — k| S 1 defined in (2.2.8) by the following formulae 8219' (1 — 6N+1)— 2:1 (1 — 0”) 8.212 (1 — 0N+1)— 221 (1 - (9N) + (1 — 6N—1)/8’ {8.1 (1_ JAM-k) _ (1.. J~+1—k)} {8... (1- 6H) _ (1- ale-2)} (Z1 — 22){64z%(1_ 9N“) -16z1 (1— 9N) + (1 _ 9N-1)} ’ for 2 S k g N +1. l11 =1N+2,N+2 = lch = (—2\/§) (21 (1 — 6N) — (1 — 6N‘1)/8) 8.2% (1 — 01"“) — 2.21 (1 — 6N) + (1 — 9N—1)/8’ {.., (1 —. M) - (1 -— M) {... (1 - 9H) — (1 #2)) 421(21— 22) {64.2% (1 — 6N“) — 167.1 (1 - 0N) + (1 — 6N‘1)} ’ for 2 g k S N. In particular, there exists a constant c, > 0 such that | malix llikl _<_ C). i—k £1 112 = lN+1.N+2 = lk,k+1 = '- 2.6.5 Variance calculation We examine the behavior Of 52 (z) defined in (2.3.4), rewritten as N £2 (:12) = 2 (1,8,2 (3:),11: 6 [a,b], (2.6.12) j=-1 where the spline coefficient vector 5 = ((1-1, - - - , (LN)T according to (2.3.4) is n N t -l 1 * N (v + v ) 5 Z BJ,2 (XJ)a (X.-)eJ- , v = ((BJ2.13,-,.,J>J.)jj,___1 — v. i=1 j:_l 9 _ where V*, the difference between empirical and theoretical inner product matrices, satisfies lv*| g An; = 0,, {(nh)-1/21og1/2 (11)} by (2.3.2). Now define a = (a._1, - -- ,aN)T by replacing (V + V"‘)_1 with V—1 = S in above for- mula, i.e. N N n N - 1 Z" , , Z 1 a : S {; Bj12 (At) 0 (At) 51'} : Sjlj’; E Bil-.2 (Xi) 0' (X053 1 i=1 i=1 j=—1 j=-1 j’z—l 26 and define, with j(:r ) in (2. 2. 1) and N N 32(37) = :0 aij,2(T 17:) ZS $23 j,2(Xi)0(Xi)5iBJ/,2(I) 32—1 jij ,=-1 1 n :2 I Z Bj’,2($) Z sj’,j;§;BjJ2(Xi)0(Xi)Ei (2.6.13) . _. . ]=—l z—l J -J(I)-1J(I) for :1: 6 [a,b]. 
Next define an (N + 2)-vector U n N and 2-vectors {Aj};:0 ._ A11 :1 ___1_ 2:: 12:12-18 SJ- 1JI312(X) ( J5)J- A] _ < A]? l _ SJU “W ( Z": 12N__18j,,~J-IB rJ2(XJ-)0(X )5J ’ (2615) in which the (j — 1)—th and j-th rows Of the matrix S is denoted as an 2 x (N + 2) matrix ~ (I-.__ _ 8'__ ... S'_ . s,-=(';Js_1'1l 38:)" 1;”),0333N. (2.6.16) .71— J) 3) Then, one can write 52(37) in the following matrix form em) = DT(:C)A ), :1: 6 [a,b], (2.6.17) J'(I in which the function D (x) is a 2-vect0r such that T D (x) E {DJ(J)_1 (5-) J 1),-(J) (2:)} .DJ- (:5) a n‘l/zBJ-gcc). —1 s j : N. (26.18) The next lemma provides the pointwise variance of 22 (x). LEMMA 2.6.11. The pointwise variance ofég (x) is the function 0% 2 (x) defined in (2.2.10), which satisfies 02 :1: E {5:3 (x)} E 031,2(x) = :f—(a:)—7%AT (x) Sj(x)A (:13) {1+ rng (x)}, (2.6.19) with sque[a,b] I7'11,2 (13)] —+ 0, j (x) is as defined in (2.2.1), A (x) as defined in (2.2.7) and matrix Sj in (2.25). Consequently, there exist positive constants c(I and C0 such that for large enough 71 ca (nh)—1/2 g 0712(13) g Co (nh)_1/2 ,Vx 6 [(1,1)]. (2.6.20) 27 PROOF. From (2.6.15) and (2.6.17), one has E {5:3 (x)} = DT (x) cov (A1010) D (x) = DT (x) Sj(x) cov (U) §fl$)D (x). Note that E(Ei|Xz') = 0, E [8.712 (Xi) 81,2 (Xk)0‘(Xz')(I (Xk) 52511:] = 0,Vi 75 k, the jl-th entry of the covariance matrix of U defined by (2.6.14) is ~11; : 2 E {BJJ(XJ)BJ,J(XJ)a(XJ-)a(XJ)5J5-J} i=1k=1 1 Tl = 52131 Bj,(i,i2X)Bl2(X)02(Xi)=} [02(7) ‘U,)Bzz(v)f(v)dv=0'jl which is the jl-th entry of the matrix 2 defined in (2.2.6), i.e., cov (U) = E. The rest of the prOOf is simple algebra. Cl 2.6.6 Proof of Theorem 2.2.1 with k = 2 Prior to the proof Theorem, we introduce some notation. First we define 2—vectors {ZJ- ”:0 (j) (3') , z,- -=- (311,212) = Af{COV(Aj)}_l/2 = (2%” Afl +fi12 15M ) v (2.6.21) 312/) 1'1 +2221) 3 where denote ( ) J {cov (Aj)}_1/2 E ( [301) #12:) . ' (2.6.22) '612) '822) 2 Then it is clear that var (Z3) 2 I , var (2.7"!) = 1,7 21,2, for any 3' = 0, ..., N. The covariance matrix Of Aj approximates 02(tj+1)Sj defined in (2.2.5) uniformly. LEMMA 2.6.12. For {Aj ”:0 defined in (2.6.15) and matrix Sj defined in (2.2.5), one has _. 2 . . " . __ cov (Aj) — o (tJ+1)SJ +R j,() _<_ j < N, 111320021]?lele —— 0. PROOF. Since Aj = SjU with S, defined in (2.6.16) and cov (U) = 2 as in the proof Of Lemma 2.6.11. Thus the covariance matrix Of AJ- is N _ , N . , COV (Aj) : 3123? = Zfilz—l Sj—lJcsj—IJUkl 2k l=—1 5],k5]—1,lakl . 2k,l:—1Sj—1,ksj,lakl Zk,z=_.18j,k8j,10k1 By Assumption (A2), (2.6.16) and (2.2.6) 01-1 = /02 (U) Bk,2 (U) 31,2 ("0)“de = 02 (tk+1)'U/Jz + 010(f02Jl) - 28 Similarly, one also has 0k! = 02 (t1+1)vk1 + cw (fa2, h). Thus N 31—1 ij—l 11’k102(t1+1) 5]" ij—l 1'01-102 (t1+1) ~ .. COM/‘1'): ’ ’ 2 ' ’ 2 +Rj, “2-1 Si-Lk'stvkl" (tk+1) Sj,k3j,1vk10 (tk+1) where N N ~ 2 Zk1=_15'—1,kS'—1,1 Z =_ 3',k5'—1,l thcw(fa,h)( N J J [NI 1] J ) J 221:4 3j~1,k3j.l 2112—1 $chij Note that 2152,24 8]"):ka =— 011.175 j and Efcv=_1 5]"ka = 1 ifl = j, thus 2 2 . = 314.10 (6') 31—110 (6+1) ~.,_ 2 . . ~. (Sm/(A3) ( 5241.102 (9+1) 31.10% (6+1) +R’ _ a “”283 +11" LEMMA 2.6.13. For the matrices 5;”? defined in (22 7) . ...—1/2 hm max :1 - "—00 OSJ'SN — u(tJ-H) {cov (Aj) }“1/2| = 0. (2.6.23) — 2 — . . . . PROOF. 
Note that E j 1/ , {cov (Aj)} 1/ 2are symmetric matrices and usmg the followmg fact for symmetric matrices A and B C A—1/2 _ B—l/2l = C max (A—1/2 _ B—1/2) 61 1:12 (BAUQ + .A.Bl/2)(A”1/2 — 3‘1/2) 12,-] = |B - Al, _<_ max 221,2 together with Lemma 2.6.12, one has c|:~:;1/2 — v(t)-+1) {cov (A1) }“/2| .<.. lo“2(tj+1>cov (A1) — 5,] S lSj - E)" + IU—2(tj+1)COV (Aj) - 8]" = ISJ' — Ejl + U”2(tj+1)fi.j. The desired result follows from Lemma 2.6.9. Cl LEMMA 2.6.14. Under Assumptions (A 1)-(A5), for the variables Zj'w ’y = 1,2, 0 gj g N, defined in (2.6.21), one has max lim sup P max {Z2 } > 2 {log (N + 1)} ((1,, (Cl/2)}2 S 01/2. (2.6.24) 7:112 n—100 USJSN J2 29 PROOF. Without loss of generality, we prove (2.6.24) only for 7 = 1. {Z ,21} > 2 {log (N + 1)} {11. (042112] P[O_<_j{210g(N+1))1/2dn(a/2)]1 where, according to (2.6.21) and (2.6.22) 211 — 59/91 +flij2)Aj2= £2: 2 (2021131: 1,11 +fii£)3j,k) Bk,2(X1)0(X1)€1- fii=lk=—l Let (id' 2 25:4 (6933);”, + 6132 )Sch) Bk,2(X,-)U(X,-)e,-, j = 0,...,N,1§= 1, ...,n, then 17. r 2 r (7—1431 = Sn = 2C1,» ICU/5231) = 7135321 = '=-_1 So one only needs to find a bound for E lC11jl3 in order to apply Lemma 2.6.3 to Sn. By the boundedness of max IE”, (2.6.3) and (2.6.23) OSjSN 3 N . E|<1,,I3:E Z (115% s,.. 11+13§Qs,1) 3,,1111) 03(X1)|si| k=—1 3 N . , s MOE Z (11Ei’s,_1,1+fl§’,)s,,1)B11011) 03(X1) 3011,0114”. 112—1 Lemma 2.6.3 entails that An = 0 (n—l/zh-lfl) = 0 (N‘2), in which An is P{n_l/2ZC,J S 2} — (13(2) . i=1 OgistuplP{ZJ-1 < z}—— (z) )l— —— Ornjachsgp By Lemma 2.6.4, one has uniformly in j PZ[I ,11<{21og(=1N+1)}1/21,(a/2,] ——2—W“Tfi+o(N-l). Therefore T1 [0211;ng lell > {210g (N + 1)}1/2 dn (Cl/2)] Mzm P[|z ,11 >{2log( (N+ 121/211111121] 1» H O [1— {1— 27h” +0(1) =a/2+o(1). 0 14.1 l I II I‘MZ 30 Hence limsupP[0g}z1£x}v{ZJ-1} > 2 {log (N + 1)} {(111 (Or/2)} 2] = a/2. D 121-"’00 LEMMA 2.6.15. For a given 0 < a < 1, and 0112(1) as given in (2.2.10) lim inf P n—100 sup $6 [a,b] 1.1111111 (11)] S 21log — = — . [:1 1 721:2hfrlndsoiépP[0 EELSXN{Zj7} >2{og(N+1)}{dn(a/ )} 2] _1 a/2x2 1 a The next lemma’s proof follows from Lemma 2.6.8, (2.6.12), (2.6.13), (2.3.2) and (2.6.20). LEMMA 2.6.16. Under Assumptions (A3) and (.45), one has sup 01112“ .5) .1112( )' — sup =Op{(nh)“l/Qlogn} =op(1). r6[a,b] :rE[a,b] 31 PROOF OF PROPOSITION 2.3.1 WITH k = 2. It follows from Lemmas 2.6.15 and 2.6.16 automatically. [:1 PROOF OF THEOREM 2.2.1 WITH k = 2. Note that equation (2.3.6) implies that ”7712 (51:) — m (a:)||OO = Op (h2), hence (1111)”? {log(N+ 1)}"1/2 111711 (:1) — 111(1)“... = 0, {(111)1/2 {101m +1)}*1/212} = 01(1). which implies that the bias order is negligible compared to the noise order. Applying (2.3.7) with k = 2 in Proposition 2.3.1 lim inf P pm (3:) E 7112 (x) :1: 20,12 (:13) {log (N +1)}1/2 dn(a/2),V:1: E [a,b]] 111—’00 : limian sup 0;,12 (as) |52 (:13) + 7712 (:c) — 771(1E)| S 2 {log (N +1)}1/2 dn (Or/2)] ”“00 L:ic€[a,b] = lilmiééfP sup 0;§($)E2 (1:) §2{log(N+1)}1/2dn ((1/2)] 2 1—a. D H be[a,b] ’ 32 CHAPTER 3 Spline-Backfitted Kernel Smoothing of NAAR Models 3.1 Introduction For the past two decades, various non— and semiparametric regression techniques have been developed for the analysis of nonlinear time series; see, for example, Robinson (1983), Tjostheim and Auestad (1994), Huang and Yang (2004), to name one article represen- tative of each decade. Application to high dimensional time series data, however, has been hampered due to the scarcity of smoothing tools that are not only computationally expe— dient but also theoretically reliable. 
This has motivated the proposed procedures of this Chapter. For the NAAR model in (1.3.1), estimators of the unknown component func- tions {n1.a(-)}g:1 are proposed based on a geometrically strong mixing sample (14,-, X“, ..., X,,d}?:1. If the data were actually i.i.d. 'observations instead of a time series re— alization, many methods would be available for estimating {ma (”321. For instance, there are four types kernel-based estimators: the classic backfitting estimators (CBE) of Hastie and Tibshirani (1990), Opsomer and Ruppert (1997); marginal integration estimators (MIE) of Linton and Nielsen (1995), Linton and Hardle (1996), Fan, Hardle and Mammen ( 1998), Sperlich, Tjostheim and Yang (2002), Yang, Sperlich and Hardle (2003) and a kernel based method of estimating rate to optimality of Hengartner and Sperlich (2005); the smoothing backfitting estimators (SBE) of Mammen, Linton and Nielsen (1999); and the two—stage 33 estimators, such as one step backfitting of the integration estimators of Linton (1997), one step backfitting of the projection estimators of Horowitz, Klemmela and Mammen (2006), and one Newton step from the nonlinear LSE estimators of Horowitz and Mammen (2004). For the spline estimators, see Stone (1985), (1994), Huang (1998), and Xue and Yang (2006 b). In time series context, however, there are fewer theoretically justified methods due to the additional difficulty posed by dependence in data. Some of these are: the kernel estimators via marginal integration of Tjestheim and Auestad (1994), Yang, Hardle and Nielsen (1999); and the spline estimators of Huang and Yang (2004). In addition, Xue and Yang (2006 a) have extended the marginal integration kernel estimator and spline estimator to additive coefficient models for weakly dependent data. All of these existing methods are unsatisfactory in regard to either the computational or the theoretical issue. The existing kernel methods are too computationally intensive for high dimension d, thus limiting their applicability to small number of predictors. Spline methods, on the other hand, provide only convergence rates but no asymptotic distributions, so no measures of confidence can be assigned to the estimators. If the last (1 — 1 component functions were known by “oracle”, one could create {33,11X1,1}?=1 With Yi,1 = Y1“ — C — 232277101 (X13111) = m1(X1',1) + 0 (Xi,11---1Xi,d)51‘1 from which one could compute an “oracle smoother” to estimate the only unknown func— tion m1 (1:1), thus effectively bypassing the “curse of dimensionality”. The idea of Linton (1997) was to obtain an approximation to the unobservable variables Y“ by replacing ma (X230) ,2' = 1, ...,n, a = 2, ..., d with marginal integration kernel estimates and arguing that the error incurred by this “cheating” is of smaller magnitude than the rate 0 (71-2/5) for estimating function m1 (3:1) from the unobservable data. The procedure of Linton (1997) is modified by substituting mo, (X1301) ,i = 1, ..., n, a = 2, ..., d with spline estimators, specif- ically, a two-stage estimation procedure is proposed: first one pre—estimates {ma (30)}g:2 by its pilot estimator through an under smoothed centered standard spline procedure, next one constructs the pseudo response Y“ and approximates m1 (2:1) by its Nadaraya—Watson estimator as given in (3.212). 34 The above proposed spline-backfitted kernel (SPBK) estimation method has several ad- vantages compared to most of the existing methods. 
Firstly, as Sperlich, Tjostheim and Yang (2002) mentioned, Linton (1997) mixed up different projections, making it uninter- pretable if the real data generating process deviates from additivity. While the projections in both steps here are with respect to the same measure. Secondly, since our pilot spline estimator is thousands of times faster than the pilot kernel estimators in Linton (1997), the proposed method is computationally expedient, see Table 4.4. Thirdly, the SPBK estima- tor can be shown as efficient as the “oracle smoother” uniformly over any compact range, whereas Linton (1997) proved such “oracle efficiency” only at a single point. Moreover, the regularity conditions considered here are natural and appealing and close to being the minimal compared to the papers mentioned above. In contrast, higher order smoothness is needed with growing dimensionality of the regressors in Linton and Nielsen (1995). Stronger and more obscure conditions are assumed for the two-stage estimation proposed by Horowitz and Mammen (2004). The SPBK estimator achieves its seemingly surprising success by borrowing the strengths of both spline and kernel: Spline does a quick initial estimation of all additive components and removes them all except the one of interest; kernel smoothing is then ap— plied to the cleaned univariate data to estimate with asymptotic distribution. Propositions 3.4.1 and 3.5.1 are the keys in understanding the proposed estimators’ uniform oracle ef- ficiency. They accomplish the well-known “reducing bias by undersmoothing” in the first step using spline and “averaging out the variance” in the second step with kernel, both steps taking advantage of the joint asymptotics of kernel and spline functions, which is the new feature of the proofs here. Fan and Jiang (2005) provides generalized likelihood ratio (GLR) tests for additive models using the backfitting estimator. Similar GLR test based on the SPBK estimator is feasible for future research. The rest of the chapter is organized as follows. Section 3.2 introduces the SPBK esti- mator, and states its asymptotic “oracle efficiency” under appropriate assumptions. Section 3.3 provides some insights into the ideas behind the proofs of the main theoretical results, by decomposing the estimator’s “cheating” error into a bias and a variance part. Section 3.4 shows the uniform order of the bias term. Section 3.5 shows the uniform order of the variance term. Section 3.6 presents Monte Carlo results to demonstrate that the SPBK estimator does indeed possess the claimed asymptotic properties. All technical proofs are contained in Section 3.7. 3.2 The SPBK estimator In this section, a spline-backfitted kernel estimation procedure is proposed. For convenience, denote vectors as x = (1:1,, ...,xd) and take I] - I] as the usual Euclidian norm on R“ such that ”x“ : “Sid 1:2,, and H - “00 the sup norm, “X“oo = $11315an Ira]. In what follows, let Y,- and X,- = (X111: ..., X,,d)T be the ith response and predictor vector. Denote Y = (Y1, ..., Yn)T the response vector and (X1, ..., X”)T the design matrix. Assume that the predictor X0 is distributed on a compact interval [am ba] ,a = 1, ..., (1. Without loss of generality, all intervals [am b0] = [0, 1] , or = 1, ..., d. We pre-select an integer N 2 Nn ~ n2/ 5 log n, see Assumption (B6) below. 
For any a = 1, ...,d, the constant B- spline function in (1.5.2) can be rewritten as the indicator function 1,1,0, (2:0,) of the (N + 1) equally-spaced subintervals of the finite interval [0,1] with length H '= Hn = (N +1)-1, that is 1.1ng0, <(J+1)H, =0 l N. 3.2.1 0 otherwise, ”I ’ ’ ’ ( ) IJ,a ($01) 2 { Define the following centered spline basis III.1+1,11||2 ]l1J10l]2 with the standardized version given for any a = 1, ..., d, (11,00,130) : IJ+1,Q(1:(1) — IJ,a (Ea) ,Va 2 1, ...,d, J = 1, ..., N, (3.2.2) bJ,o: (Ia) “bJ,a”2 , Define next the (1+ dN)-dimensional space G = G[0, 1] of additive spline functions 8,1,0, (11,) = VJ = 1, N. (3.2.3) as the linear space spanned by {1,BJ,a (ma) ,a = 1,...,d,J = 1, ...,N}, while denote by Gn C R” spanned by {1,{BJ’O (Xi’a)}?:1,a =1,...,d,J =1,...,N}. As 11 —1 00, the dimension of 0,, becomes 1 + dN with probability approaching one. The spline estimator 36 of additive function m (x) is the unique element 111(x) = fnn (x) from the space C so that the vector {7110(1) , ...,riz (X,,)}T best approximates the response vector Y. To be precise ._ ~10 + Z Z A’JQIJC, 1:0,) (3.2.4) a=1J=l ~I 1.] AI where the coefficients (A0, )‘111’ ..., A Md) are solutions of the least squares problem . T " {A01 A1, 11 Alma] = argmianN+1 2 Y1 - A0 - Z Z )‘J,a[J,a (X1,a) i=1 (1:1le Simple linear algebra shows that . d N Th(x)=)\0+ ZJZ1, 118,111.1(111) (32.5) where (Ag, 111,1, ---1:\N,d) are solutions of the following least squares problem 2 d N T {A03A1,11“'3 ANfif} =argmianN+IZ Y—— AO‘ZZAJHOBJO (Xz,a) ) i=1 a=1J= 1 (3.2.6) while (3.2.4) is used for data analytic implementation, the mathematically equivalent ex- pression (3.2.5) is convenient for asymptotic analysis. The pilot estimators of each component function and the constant are N n 7510(1130) = ZAJMBJQ Ia) — "_IZZAJMBJOX i,a) _ 1': 1J= 1 . d n N .. 11. = 1111-122211131. (x111)- (32-7) a=12=1 J=l These pilot estimators are then used to define new pseuddresponses Y“, which are estimates of the unobservable “oracle” responses Y“. Specifically, (122 (1 {[2,1’“ - — c — 2 ma( Y,1- — ,-— c — Z n1.a(X,-,a), (3.2.8) 01:2 where 6 = 7,, = 11‘1 2:21 Y,, which is a Vii-consistent estimator of c by Central Limit Theorem. Next, define the spline-backfitted kernel (SPBK) estimator of ml (1:1) as 711] (2:1) 37 n i: based on {Yi’hXi’l} 1, which attempts to mimic the would-be Nadaraya-Watson esti- mator 171.;(111) of m1(:c1) based on {Yi’th-szl if the unobservable “oracle” responses {1331}le were available 23121 K11 (X131 — $1) 13.1 221:1 Kh (X131 — 171) where 1),-,1 and Y“ are defined in (3.2.8). 1‘. K X- — Y- ,111'{(11,)=Z'-1 "( "1 I1) ”1 (3.2.9) 211:1 Kh (X131 - $1) ’ mi (331) = Throughout this chapter, on any fixed interval [0, 1], denote the class of Lipschitz con- tinuous functions for any fixed constant C > 0 as Lip([0,1],C) ——— {ml ]m(:1:) — m(:z:')] S C la: — :r'] ,Vx,$’ 6 [0,1]} . (Bl) The additive component function 1n1(a:1) 6 C(2) [0,1] defined in (1.5.1), while there is a constant 0 < Coo < 00 such that mfi E Lip ([0,1],000), V13 = 2, ...,d. B2) There exist positive constants K and )1 such that a n S K e-AO" holds for all n, ( 0 0 0 T! with the a-mixing coefficients for {Zi = (X21150) defined as a(k) = sup |P(BflC)—P(B)P(C)|, k_>_ 1. (3.2.10) BEG{Zs,sgt},C€o{Zs,32t+k} (BB) The noise 5,- satisfies E(5,- ]X,-) = 0, E (5,2 ]X,-) = 1, E (I51|2+5]X1) < M5 for some 6 > 1/2 and a finite positive M5. 
The conditional standard deviation function a (x) is continuous on [0,1]d and 0 0, and is bounded, nonnegative, symmetric, and supported on [—1,1]. The bandwidth h of the kernel K 1/5 is assumed to be of order n" , i.e., chn-l/5 S h _<_ C'hn’l/5 for some positive constants Ch, ch. (86) The number of interior knots N ~ 112/5 log n, i.e., anQ/5 logn g N S CNnZ/5 logn for some positive constants cN,CN, and the interval width H = (N + 1)’1 . REMARK 3.2. 1. The smoothness assumption of the true component functions is greatly relaxed and Assumption (BI) is closed to the minimal. By the result of Pham (1986), a geometrically ergodic time series is a strongly mixing sequence. Therefore, Assumption (B2) is suitable for (1.3.1) as a time series model under aforementioned assumptions. Assumption (B3)-(B5) are typical in the nonparametric smoothing literature, see for instance, Fan, and Gijbels (1996). For (B6), the proof of Theorem 3.2.1 in Section 3.7 will make it clear that the number of knots can be of the more general form N ~ n2/5N', where the sequence N ’ satisfies N’ —+ 00, n’ON' —1 0 for any 0 > 0. There is no optimal way to choose N’ as in the literature. Here N is selected to be of barely larger order than n2/5. The asymptotic property of the kernel smoother 111'; (2:1) is well-developed. Under As- sumptions (Bl)-(B5), it is straightforward to verify (as in Bosq 1998) ”that sup lift; (1:1) — m1(:r1)| = 0,, (n_2/5 log n) $1€]h,1—h] \/fl—h{ 311071) - m1 (931) — b1(1:1)h2] 2 N {0’1}? (1:1)}, where b1(1‘1) IUZKfuldufm'l'fxilfi($1)/2+m'1($1)f{($1)}f1_1($1)1 (3211) vi (1,) = 1160011113 [02(X1....,X.1) 1X1 = 111111-1011). ' ' The following theorem states that the asymptotic uniform magnitude of difference between 111; (2:1) and 1111‘ (2:1) is of order 0,, (n‘2/5), which is dominated by the asymptotic uniform size 01171; (3:1) -— m1 (171). As a result, 111; (2:1) will have the same asymptotic distribution as 171’,“ ($1). 39 THEOREM 3.2.1. Under Assumptions (BI) to (B6), the SPBK estimator riff (11:1) given in (3.2.9) satisfies sup (111; (11,) — 111; (1:1)1 = 0,, (152/5). x1€[0,l] Hence with b1 (3:1) and v? (2:1) as defined in (3.2.11), for any 2:1 E [h,1 — h] 1111(111’; (2:1) — m1 (11:1) — b1(:1:1)h2} 2, N {010% (3:1)} . REMARK 3.2.2. The above theorem holds for 111:, (3:0,) similarly constructed as 1111‘ (11:1), for any a 2 2, ...,d, i.e., TL K X- —1: 1?- . 111:“,(.1O,)=21-}l $5; a) “'0‘, 19,,,=1/,-—a— Z 111, (Xw), (3.2.12) 21:1 h 1,1—2211) 13113111311111 where 111.), (Xiyfl), [3 : 1, ...,d are the pilot estimators of each component function given in (3.2.7). Similar constructions can be based on local polynomial instead of Nadaraya— Watson estimator. For more on the properties of local polynomial estimators, in particular, its minimax efficiency, see Fan and Gijbels (1996). REMARK 3.2.3. Compared to the SBE in Mammen, Linton and Nielsen (1999), the variance term v1 (3:1) is identical to that of SBE and the bias term b1 (2:1) is much more explicit than that of SBE at least when Nadaraya—Watson smoother is used. Theorem 3.2.1 can be used to construct asymptotic confidence intervals. Under Assumptions (B1)-(B6), for any a E (0, 1), an asymptotic 100 (1 - a) % pointwise confidence intervals for m (2:) is 111.1(11) —b,(111)11‘2110mm,){/K2(11)d11]1/2/{11hf,(1:1)}1/2, (3.213) where 61(x1) and f1 (1:1) are any constant estimators of E [o2 (X) IX 1 = 2:1] and f1 (2:1). The following corollary provides the asymptotic distribution of fit" (x). 
The proof of this corollary is straightforward and therefore omitted. COROLLARY 3.2.1. Under Assumptions (BI) to (B6) and the additional assumption that ma ($0,) E 0(2) [0,1], 0: == 2,...,d, for any x E [0,1]d, the SPBK estimator 111:, (x), a = 1, ...,d, are defined as given in (3.2.12). Let d 711* (x) = (3 + 2 7T1; (350)1bfx) : Z (10(1301) ,v2(x) : Z U121,($a)1 0:1 then M{m* (x) -— m (x) —b(x)h2} 2, N {0,122 (x)}. 3.3 Decomposition In this section, some additional notations are introduced in order to shed some light on the ideas behind the proof of Theorem 3.2.1. Denote by ||¢||2 the theoretical L2 norm of a function 45 on [O,1]d, ||¢||§ = E {(1)2 (X)} = fl0 11d d2 (x)f(x) dx, and the empirical L2 _1 n i=1 norm as Mug,” 2 n (t2 (X,). The corresponding inner products for L2-integrable functions (f), 90 on [O,1]d are m» E{¢(X)w(X)} = [[0 lld¢(X)so(X)f(X)dx, (cf), so>2,n = n“ E (I) (Xi) so (Xi)- i=1 The evaluation of spline estimator m (x) at the n observations results in an n—dimensional vector, in (X1, ..., Xn) = {in (X1) , ..., m (Xn)}T, which can be considered as the projection of Y on the space 0,, with respect to the empirical inner product (-, )2,” . In general, for any n-dimensional vector A 2 {A1, ..., An}T, define PnA (x) as the spline function constructed from the projection of A on the inner product space (Gm (3)231) d N PnA (X) Z )‘O '1‘ Z Z AJ,aBJ,a ($01): a=1J=l with the coefficients (20, 21,1, ...,;\N,d) given in (3.2.6). Next, the multivariate function PnA (x) is decomposed into empirically centered additive components PmaA (2:0,), (1 = 1, ..., d and the constant component Pch n pr (2:0) = P3,.» (ma) — n“ 2 PtaA (X...) , (3.3.1) i=1 A d n PMA = A0 + n_1 2 Z P;,OA (Xm) . (3.3.2) 021 i=1 where PE‘QA (3:0,) 2 291:1 X1,“ B 1’0. ($01). With these new notations, one can rewrite the spline estimators m (x) , ma (2:0) , me defined in (3.2.5) and (3.2.7) as 771' (X) : PnY (X) , filo (Ia) Z Pn,aY (ma) ,filc = Pn,cY, 41 Based on the relation Y = m(X)+o (X) 5 = m (X)+E with noise vector E = {o (Xi) ei};l=1, one defines similarly the noiseless spline smoothers m (x) = Pn {771(X)}(x),1iia(a:a) = Pma {m (X)} (ma) ,mc = Pmc {m (X)} , (3.3.3) and the variance spline components E (x) = m: (x) .2.. (x3) = Pics (ma) ,5. = p.,.E. (3.3.4) Due to the linearity of operators Pn, Pmc, Puma = 1, ...,d, one has the following crucial decomposition for proving Theorem 3.2.1, m (x) = m (x) + E (x), mc = The + EC, ma (2:0)) 2 ma (11:0,) + E00130) (3.3.5) for a = 1, ..., (1. As closer examination is needed later for E (x) and Ea (11:0), one defines in addition 5 = {(10, (11,1, ..., aN,d}T as the minimizer of the following 2 n d N Z 0 (X051 — a0 - Z Z aJ,oBJ,a (X130) - (3.3-5) i=1 a=1J=1 —1 Then 2‘ (x) in (3.3.4) can be rewritten as 5TB (x), where 5 = (BTB) BTE is the solution of (3.3.6), and vector B (x) and matrix B are defined as T B (X) 2’— {1, 31,1 (11:1) , ..., BN,d ($d)} , B = {B (X1) , ..., B (Xn)}T . (3.3.7) To be specific, the least square solution of the noise is —1 T a: 1 OdN ézy=10(xi)5i OdN 2’n Isma’sd, a ?=1BJ,a(Xi,a)0(Xi)€i 1ngN, ng,J’gN 1309’ (3.3.8) where 0p is a p—vector with all elements 0. The main objective here is to study the difference between the smoothed backfitted estimator m; (3:1) and the smoothed “oracle” estimator m’i‘ (3:1) , both given in (3.2.9). Horn now on, assume without loss of generality that d = 2 for notational brevity. 
Making use of the definition of ('3 and the signal noise decomposition (3.3.5), the difference m; (11:1) —- m; (2:1) — f: + c can be treated as the sum of two terms % ELIKh(Xi,1-$1){fil2(xi,2)“m2(Xz',2)}Z ‘I’t($1)+\1’v(rvi) hill KI: (X231 — $1) $121119; (Xi,l - 2:1) , (3.3.9) 42 where ‘1’!) (171) = 7-11-2101 (X131 - $1) {(7129132) - m2 (Xi,2)}, (3-3-10) Wu (5151) = —;:21Kh(x 1 — $1)€ 2(Xi,2) . (3.3.11) The term \Ilb (2:1) is induced by the bias term 1712 (X232) —m2 (Xi,2), while ‘11,, (2:1) is related to the variance term E2 (X 132). Both of these two terms have order op(n“2/5) by Propositions 3.4.1 and 3.5.1 in the next two sections. Standard theory of kernel density estimation ensures that the denominator term in (3. 3. 9), 1 2,; 1 K h (X131 — 3:1), has a positive lower bound for 2:1 E [0, 1]. The additional nuisance term 6 —— c is of clearly order 0,, (n‘lfl) and thus 0,, (n’2/5), which needs no further arguments for the proofs. Theorem 3.2.1 then follows from Propositions 3.4.1 and 3.5.1. 3.4 Bias reduction In this section, we show that. the bias term \Ilb (2:1) of (3.3.10) is uniformly of order 0;, (n‘2/5) for 1171 E [0,1], which is given by Proposition 3.4.1 as below. PROPOSITION 3.4.1. Under Assumptions (B1) to (B2), and (B4) to (B6) ..1‘13,..""b<$1>' = 0p (.412 + H) = ., (Tl-”5)- One important result from page 149, de Boor (2001), is cited before the proof. LEMMA 3.4.1. There exists a constant Coo > 0 such that for any component function ma 6 Lip([0,1] ,Coo) and function 90 E 0,01 =1,...,d, Hga — mall00 S COOH. LEMMA 3.4.2. Under Assumption (8]), there exists function 91, 92 E G, such that 2 Th — g + :3 (1,901 (X0)>2,n (.121 = 017(n'1/2 + H) , 2,n where g (x) = c + 23:1 9“ (1rd) and 171 is defined in (3.3.3). 43 PROOF. By Lemma 3.4.1, there is a constant C'00 > 0 such that for function go, E G ”90 —malloo _<_ COOHv a = 132' Thus “g_mlloo S 231:1“90 “malloo S 2CooH 311d ||m — m||2,,, S “g — mllzn S 2CooH. The triangular inequality then implias that ”771 — 9|l2,n 5 “Th — mll2,n + ”9 - m”2,n S 4000(1) |(9a(Xa),1)2,nl s |<1,ga (X3)... — (11m0(XC1))2,n + |<1.ma(Xa)>2,n 3 COOH + 0,,(n-1/2) . (3.4.1) Therefore 2 2 7h _ g + Z (1290 (XO))2,n S “777' _ gll2,n + Z I<1290 (XO))2,n 0:1 2,n 0:1 3 600011 + 0,, (71-1/2) = 0,, (TH/2 + H). E] PROOF OF PROPOSITION 3.4.1. Denote Rr ___ sup Z?=1Kh(Xi,1 - 1‘1) {92 (X232) - m2 (Xi.2)} 3:1€[0,1] ZZZ—.110; (X231 ‘ 171) R Z?=1Kh (X231 - I1) (7712 (Xi,2) - 92 (Xi,2) +(1:!12(X2))2,n} 2 = sup a x16[0,1] Z?=1Kh(xi,1 — 31) then supxlelml [\I/b (2:1)|< ((1,512 (X2))2 ,, + R1 + R2. For R1, using Lemma 3.41 To deal with R2, let 3120120,) 2: BJ’z (2:0)) — (1, 8J2 (Xa))2,n, for J = 1, ..,N, O: = 1,2, then one can write 2 N m(x) (x)+:(1, ga(Xa))2 "“251 +0223}, BJa( (.230) (1:1 Thus, n'l 221:1 Kh (X131 — 31:1) {1712 (XL?) — 92 (X13) + (1,92 (X2))2,,,} can be rewritten n—l 2?:1Kh (Xi,1 — x1) 2:"); 1aJ2BJ2 (X132): bounded by N Z “1,2 =1 12,.. 0:21 2 0:1 2,n where the last step follows from Lemma 3.7.7. Thus, by lemma 3.4.2 R2 = 0,,(rr1/2 + H) . (3.4.3) Combining (3.4.1), (3.4.2) and (3.4.3), one establislws Proposition 3.4.1. CI 3.5 Variance reduction This section shows that the term ‘11., (1:1) given in (3.3.11) is uniformly of order 0,, (n72/5). This is the most challenging part to be proved, mostly done in Section 3.7. Define an auxiliary entity m: N ~20 0,12 B,J2( (3-5—1) where (1J2 is given in (3.3.8). Definitions (3.3.1) and (3.3.2) imply that E2 (2:2) is simply the empirical centering of (2‘; (11:2), i e n 3:2 (:2) E g; (.172) — 114: a; (Xw). 
(3.5.2) i:l PROPOSITION 3.5.1. Under Assumptions (B2) to (B6), sup N. (2:1)l = 0p (H) = 010 (7772/5) . 1216(0,” According to (3.5.2), one can write ‘111, ($1) = ‘11?) (x1) -— 315,1) (11:1), where 1)(3131) 2 ”Plth( (,)X11-$1 711253, (X132), (3-5-3) i=1 wt” (x1) = WZXI. (Xz,1-$1)5§(X1,2), (3.5.4) [=1 in which E; (X2?) is given in (3.5.1). Further one denotes an (X1421) = Kh(X(,1 - 1‘1)BJ,2 (X12) , no] ($1) = EWJ (X1551), (35-5) by (3.3.8) and (3. 5 1), ‘11,,(2) (2:1) can be rewritten as n N 2 _ .. \I/S, ) (171) = n l E E aJ,2wJ (X),:1:1). (3.5.6) l=l J=1 The uniform order of ‘11“) (11:1) and \I/(2 )(1121) are given in the following two lemmas. LEMMA 3.5.1. Under Assumptions (B2) to (86), (11.8%,) in (3.5.3) satisfies l,” (1:1)] = 0,, {N (logn)2/n}. ;E1€[0,l] PROOF OF LEMMA 3.5.1. Based on (3.5.1) 71 Tl —1 :55 (Xi,2) = 9.1.2 {71—12322 (Xi,2)} n l 5 2 31,2 (X232) - i=1 M2 J N 2am - sup J21 ingN l l/\ Lemma 3.7.5 implies that N 1/2 N 25.1.2 g N233, g{N.5T5} J=l J=l ”2 = Op(Nn—1/210gn) . By (3.7.15),(3.7.18), sup In“l 21:1 8,1,2 (Xi,2)| S An,l = 0,, (n71/2 log n), so IngN ”—1252“ =Op{N(logn)2-/n} (3.5.7) 46 By Assumption (B5) on the kernel function K, standard theory on kernel density estimation entails that squle[0,1] ln‘l 211:1 Kh (Xhl — 1:1)l 2 Op (1). Thus with (3.5.7) the lemma follows immediately. CI LEMMA 3.5.2. Under Assumptions (B2) to (B6), ‘1’?) (2:1) in (3.5.4) satisfies sup 11610.1] «153’ (ml = 0M). Lemma 3.5.2 follows from Lemmas 3.7.9 and 3.7.10. Proposition 3.5.1 follows from Lemmas 3.5.1 and 3.5.2. 3.6 Simulations In this section two simulation experiments are carried out to illustrate the finite-sample behavior of the SPBK estimators m; (sea) for a = 1,...,d. The programming codes are available both in R 2.2.1 and XploRe. For more information on XploRe, see Hardle, Hlavka and Klinke (2000) or visit the following website, http://www.xplore—stat.de. The number of knots N for the spline estimation as in (3.2.6) will be determined by the sample size and a tuning constant c. To be precise N = min ([cn2/510gn] +1, [(n/Q —1)d_1]), in which [a] denotes the integer part of a. In this simulation study, c is chosen to be 0.5 and 1.0. As seen in Table 4.3, the choice of c makes little difference, so we always recommend to use c z 0.5 to save computation for massive data set. The additional constraint that N g (n/ 2 - 1) d—1 ensures that the number of terms in the linear least squares problem (3.2.6), 1+dN, is no greater than n / 2, which is necessary when the sample size n is moderate and dimension d is high. We have obtained for comparison both the SPBK estimator my; (350,) and the “oracle” estimator mg, (33a) by Nadaraya-Watson regression estimation using quartic kernel and the rule—of-thumb bandwidth. We consider first the accuracy of the estimation, measured in terms of mean average squared error. To see that the SPBK estimator mg, (2:0) is as efficient as the “oracle” 47 smoother fit; (ma), define the following empirical relative efficiency of my, (.130) with respect to in; (3:0,) as 11 ~ .. 2 1/2 Zizl {ma (X1170) - m0, (Xi,a)} .. 2 ESE-.1 {m3 (X130) - ma (Xi,a)} Theorem 3.2.1 indicates that the effa should be close to 1' for all a = 1, ...,d. Figure 4.15 (3.6.1) effa = and 4.16 provide the kernel density estimations of the above empirical efficiencies to observe the convergence, where one sees that the center of the density plots is going toward the standard line 1.0 and the shape of those plots becomes narrower as well when sample size n is increasing. 
3.6.1 Example 1 A time series {1’2}?:§1999 is generated according to the NAAR model with sine functions given in Chen and Tsay (1993), . 7T , 71‘ Y; =1.551n(;2—Yt_2) —1.081n(-2-Yt_3) + cost, 00 = 0.5,1.0, where {502331996 are i.i.d. standard normal errors. Let X? = {Yt__1, Yt_2, Yt_3}. Theo- n+3 rem 3, page 91 of Doukhan (1994) establishes that {Yb X?) is geometrically ergodic. t=—1996 The first 2000 observations are discarded to make the last n+3 observations { Yt}?=+ 13 behave like a geometrically a-mixing and strictly stationary time series. The multivariate datum {1Q,XtT}::: then satisfies Assumptions (Bl) to (86) except that instead of being [0,1], the range of Yt—a, a = 1, 2, 3 needs to be recalibrated. Since there is no exact knowledge of the distribution of the Yt, many realizations of size 50000 have been generated from which one sees that more than 95% of the observations fall in [—2.58,2.58] ([—3.14,3.14]) with 00 = 0.5 (00 = 1) . We will estimate the functions {ma(a:a)}g=1 for ma 6 [—2.58,2.58] ([—3.14,3.14]) with 00 = 0.5 (00 = 1.0), where m1(5131) E 0, m2 (1:2) E 1.5 sin (£3172) — E [155111619] , m3 (1'3) E —1.0$in (€173) -— E[—1.031n(th)] . 48 Sample size n is chosen to be 100, 200, 500 and 1000. Table 4.3 lists the average squared error (ASE) of the SPBK estimators and the constant spline pilot estimators from 100 Monte Carlo replications. As expected, increases in sample size reduce ASE for both estimators and across all combination of c values and noise levels. (Table 4.3 also shows that the SPBK estimators improve upon the spline pilot estimators immensely regardless of noise level and sample size, which implies that our second Nadaraya—Watson smoothing step is not redundant. To have some impression of the actual function estimates, at noise level 00 = 0.5 with sample size n = 200, 500, the oracle estimators (thin dotted lines), SPBK estimators in; (thin solid lines) and their 95% pointwise confidence intervals (upper and lower dashed curves) for the true functions ma (thick solid lines) have been plotted in Figure 4.12, 4.13 and 4.14. The visual impression of the SPBK estimators are rather satisfactory and their the performance improves with increasing n. To see the convergence, Figure 4.15 plots the kernel density estimations of the 100 empirical efficiencies for sample sizes n = 100, 200, 500 and 1000 at the noise level 00 = 0.5. The vertical line at efficiency = 1 is the standard line for the comparison of fit; (11:0,) and 1h; (10.). One can clearly see from Figure 4.15 that as sample size 71. increases the efficiency distribution converges to 1, confirmative to the conclusions of Theorem 3.2.1. Lastly, the computing time of Example 3.6.1 is provided based on 100 replications done on an ordinary PC with Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. The average time run by XploRe to generate one sample of size n and compute the SPBK estimator and marginal integration estimator (MIE) has been reported in Table 4.4. The MIEs have been obtained by directly recalling the “intest” in XploRe. As expected, the computing time for M113 is extremely sensitive to sample size due to the fact that it requires n2 least squares in two steps. In contrast, at least for large sample data, the proposed SPBK is thousands of times faster than MIE. Thus our SPBK estimation is feasible and appealing to deal with massive data set. 49 3.6.2 Example 2 Consider the following nonlinear additive heteroscedastic model d W . o d . 1.1.. 
Yt = 2:18111(EX¢_0)+ 0(X) abet .~ N(0, 1) , a: in which X? = { Xt_1, ..., X t—d} is a sequence of i.i.d random variables with standard normal distribution truncated in the interval [—2.5,2.5] and the conditional standard deviation function is defined as o (X) 2 JOE - 5 — eXP(Zg=1lXt-a|/d) 00 : 0.1. 2 5 + exp (22:1 lXt—al/ d) , This choice of a (X) ensures that the design is heteroscedastic, and the variance is roughly proportional to dimension d. This proportionality is intended to mimic the case when independent copies of the same kind of univariate regression problems are simply added together. For d = 30, 100 replications have been done for sample sizes n = 500, 1000, 1500 and 2000. The kernel density estimator of the 100 empirical efficiencies is graphically represented in Figures 4.16 and 4.17. Again one sees that with increasing sample sizes, the relative efficiency are becoming closer to the vertical standard line, with narrower spread out. 3.7 Proof of Theorems Throughout this section, an >> 1),; means lim bn/an = 0, and an ~ on means lim bn/an = n—+oo n——+oo c, where c is some constant. 3.7.1 Preliminaries Define for a =1,2,J =1,...,N +1 on. = IIIJ,..||§ ——- [13,... (was (mantra. (3.7.1) LEMMA 3.7.1. Under Assumptions {B4} and (B6), one has: 50 (i) there exist constants C0 (f) and Cl (f) depending on the marginal densities 1.. (as) ,a = 1,2, such. that Co (f) H s ||b1,a||§ s 01 (f) H- (ii) 1 J’=J _CJ 1 ((2.1, “(110,“;l J’=J—1 E{BJ,.. (x...) 3.1/,0. ma} = + “l ““2 J «Wuhan; Ila. ll,“ _m 0 |J—— J’l >1 ~ 1 |J’—-J| g1 11 |J’—-J|>1 and fork 21, 1.11.1; (Wham is) 1:1, I: ,__ _ EIBJ,a (Xi,a)BJ’a( J+1’a”bJ’a“2 “by“ “2k fig}; J J 1 CJ+2,allb-I,2_0”kHbJ’,a'2—k/CJ+110,J=J+1 0 |J— J’|>1 H1 k If ——J| <1 0 |J’— J] >1 where 0,1,0, (1 =1,2,J =1,...,N +1 are given in (3.7.1). PROOF. Note that for any a = 1,2, J = 1,...,N, byfl ($0) in (3.2.2) can be rewritten as bJ,a ($0) = IJ+1,a ($a) - CJ+1,aIJ,a(xa)/CJ,01 and 2 ”bJ,a“2 = CJ+l,a(1+ CJ+l,a/CJ,a)a In Assumption (B4), the two positive constants cf, C f are the upper and lower bounds of fa (:50), then CfH S CJ,a S CfH CO (f) H 2 cf (1 + cf/Cf) H 3 "111,0“; 3 C, (1 + of/cf) H = Cl (f) H, for all J = 1, ...,N + 1,01 21,2. The proof of (ii) is trivial. Cl LEMMA 3.7.2. Under Assumptions (B4) to (86), for an” (171) given in (3.5.5) = o (HI/'3). sup sup l/in (:rl) 3316(0,” ISJSN 51 PROOF. By definition, 11w‘1(2:1)l = |E{Kh(X1J — 3:1) 8‘12 (X1,2)}| is bounded by ijh (“1 -$1)IBJ,2(U2)|f (U1,U2)dU1dU2 //1{(U1)'————bJ2(u2)l f(hU1 +$1,112)dv1‘d112 “’9 “2J2 (112,.)‘21 {f/K 1,11.121<1w. +$12u2)dv1du2 + (%i£)l/2//K(vl)1J,2(u2)f(’wl + $1,U2)d’01d‘u2}- The boundedness of the joint density f and the Lipschitz continuity of the kernel K will (I then imply that sup sup //1{(U1)1J,2 (112)f(hu1 + $13,112)du1du2 S CKCIH, $1€[0,l] ISJSN the proof of the lemma is then completed, by (i) of Lemma 3.7.1. Cl LEMMA 3.7.3. Under Assumptions (Bi) and (BU), there exist constants CO > CO > 0 such that for any a = (a0,a1,1, ...,aN,1,a1)2, ...,n/(1,2), ‘2 2 , 2 2 r C0 a0 + Zaia S a0 + Z (lJaaBJ’a S CO 0.0 + Zaj’a . (3.7.2) J,a J,a 2 J,a PROOF. Lemma 1 of Stone (1985) provides a constant (:0 > 0 such that ‘2 N 2 N 2 2 (10 + Z aJ,aBJ,(r 2 C0 (10 + Z aJ,lBJ,l + Z aJ,ZBJ,2 1 J,a 2 J21 2 J=1 2 then (3.7.2) follows if there exist constants 06 > 06 > 0 such that for a = 1, 2 ‘2 N N I 2 60 Z 01,2 3 Z “JnBJe J=1 J=l N g 06 2 113,0. (3.7.3) 2 J=1 To prove (3.7.3), the original B—Spline basis is employed. Without loss of generality, let a = 1, and use the constant basis {IJJ (2:1)}[JV:11. 
Represent the term 2.11:1 a JJB J,1 (2:1) as follows N N +1 20.11311071):203111110131) (3-7-4) J: l 52 Theorem 5.4.2 in Devore & Lorentz (1993) says that there is an equivalent relationship between the Lp (p > 0) norm of a B-spline function and the sequence of B-spline coefficients. To be specific NH 2 NH N+l 2 Z dJ,11J,1 =/ Z dJ,IIJ,1($1) idl‘l = 2 (13,13 J=l J=1 J21 L2 The uniform boundedness of the joint density in Assumption (B4) implies that NH 2 NH 2 NH 2 c, Edwin 3 20111111 SC; Zdu’m J=l J=1 L2 2 J=l L2 Then Lemma 3.7.1 and (3.7.4) lead to N+1 N 2 2 a 2 J,l C.I+1,1 §:dJ,l:§:_—{< )+1}‘ J=1 J=1 “bull: C“ Then N N+1 N ca 2 (23111—1 3 2 (13,15 Ca 2 (23,111“, J=l J=1 J=l for positive constants ca and Ca. Therefore, N N 2 NH 2 N 2 2 “f0“ 2 “J,1 3 Z “1213121 = Z dJJIJJ 5 CfCa 2 0.1.11 le J31 2 J=1 2 J21 i.e. (3.7.3) holds given of, = efca, 06 = CfCa. C] Lemmas 2.6.2 and 3.7.2 entail the next Lemma 3.7.4, which shows the uniform supre- mum magnitude of n—1 }:le {WJ(X[,$1) — qu (2:1)} and n-1 Z?=1WJ(Xla-’Cl)- The quantities on (X1,:r1) and qu (3:1) are defined in (3.5.5). LEMMA 3.7.4. Under Assumptions (82), (B4) to (B6) sup sup xle[0,1] ISJSN = 0,, (log n/M) , (3.7.5) 71—1 :2 {wJ (X12131) — Ia.” (131)} Tl n‘l ZLUJ (X1,:1:1) [:1 Sllp sup 2 0,,(H1/2) . (3.7.6) x1€[0,l] nggN PROOF. For simplicity, denote w} (X1, 51:1) = mg (X1, 1:1) —— 11“” (11). Then 2 E{w3(xz.z1)} = Ew3(x2.x1) - 12%,, (x1). While E1222, (X1, 1:1) is equal to _ -2 CJ+1,2 h l “bJ,2”2 K2 (“1) 1J+1,2 (“2) + IJ,2 (212) wal + I12U2)dvldU22 CJ,2 which implies that Ew3(X1,:1:1) ~ h’1 and Ew3(X1,231) >> [1,2,J(x1). Hence for n suffi- ciently large * 2 * '- E{WJ(X12171)} =Ew3 (X12$1)‘#3J ($1) 26 h 12 for some positive constant c*. When 1' Z 3, the r—th moment E IwJ (X1, x1)|r is {HbJ,2“2)_r//K£(ul “171){IJ+1,2(U2) + (Cum) 1J,2(U2)}f(U1,U2)du1dU2- CJ2 ) It is clear that EIwJ (X1,$1)|r ~ 11(1—T)H1”r/2 and IEWJ‘ ()'{l,:1:1)|r S CHr/2 by Lemma T 3.7.2, thus E le (X(,$1)|r >> IHWJ (11:1)l . T E )w3(X12I1)|r = Ele (X12151) * #2” (351) S 2"“1(EIwJ(X1,1‘1))T+lpwj(xl)|r) S {ch-lH—l/2} (T2) rlE [21} (x,, 2:1)|2, then there exists a constant c* : cit-1H 71/ 2 such that —— 2 Ewa,(X,,x1)|r S C: 2T!Elw3 (X12130) 2 that means the sequence of random variables {wj(X1,x1)}?=l satisfies the Cramér’s con— dition, hence by the Bernstein’s inequality one has for r = 3 2 6/7 qpn 72' P > <0. ex — +a 3 a —-— , { "(M)" l p( 25m§+5c2pn) 2” ([HID n "—1 ij (X12161) i=1 where 102" 'th 2” +2 1+ ’02 '11 2 I“ p=p-—-—-,w1 az— ,w11m~1, n Vnh l q 25mg + 5c2pn 2 (12(3) = Mn 1 + 5m‘2/7 with 7713 : max “w" (X :1: )H < C0 (211—1)2 ”3 pn 1 ISiSn J [1 l 3 _. . 54 Observe that 504),, = 0(1), then by taking q such that [fir] 2 ('0 log n, q 2 cln/ logn for some constants (0,01, one has a1 = 0(n/q) = 0 (log 11), a2 (3) = 0 (112). Assumption (82) n 6/7 n 6/7 6A _ __ - - C0/7 0([Q+1D S{K0exp( A0[Q+1])} 5011 0 . Thus, for n large enough, yields that l n plogn . 2 P — w" X,:c > We divide the interval [0, 1] into Mn ~ n6 equally spaced intervals with disjoint endpoints 0 = 551,0 < 2:111 < < “TLMn = 1. Employing the discretization method, one has n n-1 :00} (X1, 121$) [=1 sup sup (3.7.8) x1€[0,1]1SJSN = sup sup OSkSMn ngSN n "—1 2w} (X1431) 1 1 n n“1 Z {w} (Xm) - 01} (Xz,x1,k)} [=1 By (3.7.7), there exists large enough p > 0 such that for any 1 S k g Mn,1 S J _<_ N 1 p {. n which implies that (X) Z P { sup sup OSk§1Wn ISJSN N 2% Thus, Borel-Cantelli Lemma entails that + Sllp Sllp sup ISkSIWn ISJSN $1El$l k—l’xl k] n Zed} (X1,$1,k) > p(nh)-1/210gn} ;<_ 11‘“), [=1 Tl. 
n71 2w} (X1, 331$) [=1 > logn “pm n 77,-1ng (Xlaxljc) [=1 § l (X) 2 p 08"} g ZNMnn‘IO < 00. 1121 1i712w3(xl,$1,k) = Op (log n/fi) . (3.7.9) [=1 sup sup ogkgMn ingN Employing Lipschitz continuity of kernel K, one has for 2:1 6 l$I,k—1,$l,kl sup |Kh(X,,1— 1:1) -— Kh(X,,1— x1,k)| g CKMgllz—2. lngMn 55 Hence one has n. n-1:{w3(x1.x1) — w; (xl:$1,k)} [:1 Sup sup SUP ISkSMn ISJSN$1€[31,k_lvxlikl S CKMglh"2 sup sup IBJQ (2:2)l = 0(Mn—1h‘2H71/2). $2E[0,l] ISJSN Thus, one has 1 n 2 {w} (X11331) — w} (39,111)} [=1 sup sup SUP lskSMn ISJSNrielxl,k—1v$1,kl = o (i) , (3710) since Mn ~ 116. (3.7.5) follows instantly from (3.7.8), (3.7.9) and (3.7.10). As a result of Lemma 3.7.2 and (3.7.5), (3.7.6) holds. D The next lemma provides the size of 5T5. LEMMA 3.7.5. Under Assumptions (32) to (86), the least square solution 5 defined in (3. 3. 6) satisfies N 2 5% = 213+ Z 2 213,0 = 0,, (N (log n)2 /n) . (3.7.11) J=la=l —l PROOF. According to (3.3.8) and (3.3.7), ii = (BTB) BTE, then -1 5TBTB5 _—. (5TBTB) (BT13) BTE 2 5T (BTE) . As the matrix B is given in (3.3.7), one has 1 “Bang," = 5T > 5 2 5T (rt—IBTE) . (3.7.12) Zn B B < 3,... J’.a’ According to (3.7.21), ”Bing,” is bounded below in probability by (1 - An) ”Bang. By (3.7.2), one has 2 N 2 - 2 c _ -2 - ”Ban2 = a0 + E : 2 :33), 3 c0 (10 +2 :aia . (3.7.13) Meanwhile one can show that 5T (n‘lBTE) is bounded above by 1/2 2 - - 1 n 1 " “(21+Zafig {gZMXdEi} +Z{;ZBJ,O(Xi,O)U(Xi)€i} La i=1 [0 i=1 2 1/2 (3.7.14) 56 Combining (3.7.12), (3.7.13) and (3.7.14), the squared norm 5T5 is bounded by W2(1—An {i- 20(Xi)51}2+2{% 231,0(Xi,a)0(xi)5i} i=1 Using the same truncation version of 5 as in Lemma 3.7.10, Bernstein inequality entails that n ”-1 —l :0 (X )si+ nggNa= 12 n E BJ,O (X130) 0 (X,~)e2 = 0;, (logn/J77). Therefore (3.7.11) holds since An is of order op(1). C] 3.7 .2 Empirical approximation of the theoretical inner product Let 71 Am] = sup [(1, 81,0)2 n — <1,BJ,Q>2| = sup n.-1 2 BJ’a (X (3.7.15) J,CX , J,O’ 1:1 An,2 = SUP - (31,71,311 a) , (3-7-15) J,J’,a ’ 2‘" ’ 2 A ,3 = sup 2,n ,a J ’a 2 LEMMA 3.7.6. Under Assumptions (32), (B4) and {B6}, one has An,1 = 0,, (7171/2 log n) , 3 (3.7.18) An’g = Op (n.1/2H_1/2 log n) , (3.7.19) An,3 = 0,, (n—l/2 log n) . (3.7.20) PROOF. The proof of (3.7.18) follows from Bernstein’s inequality immediately, thus is omitted. Here we only prove (3.7.19) and (3.7.20). We will discuss case by case with various 0, a’, J and J’, via Bernstein’s inequality. For brevity, set 62' : €i,J,J’,a,a’ : ”—1 [BJ,a (Xi,a) BJ’,a’ (Xm’) “ E {3.1.01 (X230) BJ,’a’ (Xi,a’) }] 1 then ZéiH’JJ Mao: i=1 ZEiH’JJ Haa’ i=1 An 2 = sup , 1 24,113: sup 1 1. The definition of BJ‘l in (3.2.3) will guarantee that 31,1 (X,-,1)BJ;,l (Xu) = 0 if |J — J’| >1. CASE 1.2 when J = J'. By Lemma 3.7.1, the variable 5,- and its second moment can be simplified as follows 5,- = n“1{83,1(X,,1)—1},E£?= 5,13%), (Xm) --1}2 = —1—2{E83,1(X,-,1)—1}, 7!- in which EBfu(X,,1) = lle’lll;4(CJ+1,l+ cf,“ 1/6},) .The selection of H will make E811 (X231) the major term of {E811 (Xm) — I}, then there exist constants c532 and C, E 2 > 0 such that c€,2n—2H-l _<_ E53 3 Cé,212._2H—l. In terms of the Minkowski’s inequality, the k-th absolute moment has the following upper bound k _ ,__ . E|g,|k =n7kElBil(X,-,1) —1| g ”42" 1(53351 (X,,1)+1}. where EB‘21k1 (XI-,1) ~ 1 according to Lemma 3.7.1. Hence there exists a constant 05,2 > 0 such that E|§,|k g Cf2n‘k2k’lHl’k. 
Next step is to verify the Cramér’s condition Eléz‘lk S ngn—ka—lHl—k = Cf,2,,—(k—2)2k—lH—(k—2)n—2H—1 202 20 (k4) k—2 6’2 £12 .“2 -1 It 2 cm (nH ) 652” H 5 {05,2} “59" —l 5,2 "‘ 5,265.2 Cantelli lemma, when J = J’, a = a’ = 1, one has (3.7.19). in which C“ - 2C 'n‘1H_l max 1,2C2 . Applying Lemma 2.6.2 and Borel - 6,2 CASE 1.3 when IJ —- J’l = 1. Without loss of generality we only prove the case that J' = J +1. Now 5,; = Tl—lBJ’l (X131) BJ+L1 (X231) has the second moment as? = [E33, (xii) 83.1,. (xm — {Earl (x...) 8m. mar] , where {EBJ’1(X1"1)BJ+1,1(Xi,1)}2 ~ 1, E33,! (Xi,l) 83“,, (X131) ~ H71, according to Lemma 3.7.1. Hence, E612 ~ H‘l. The k-th moment is given by Eléilk = n—kEIBJJ (X131) BJ+1,1 (X231) - EBJ,1(X2',1)BJ+1,1 (X,,1)Ik 11"”‘2k—1[EIBJ,1(X1,1)BJ+1,1(Xmllk + IEBJJ (X131) 31+” (Xillllkl’ l/\ where |EBJ)1(X,)1)BJ+1)1(X,)1)|k ~ 1 and EIBJ)1(X,)1)BJ+1)1(X,)1)Ik ~ Hl‘k, ac- cording to Lemma 3.7.1. Hence there exists a constant 06,3 > 0 such that k k -k k—l l—k Similar as in Case 1.2, (3.7.19) follows by using Bernstein’s inequality. CASE 2 when a = 01' = 2, all the above discussion applies without modifications. CASE 3 when a # a'. Without loss of generality, suppose a = 1, (1’ = 2. First we still need to calculate the order of second moment E63, 2 __2 2 2 Ea.- = n E {8h (X131) 3),, (Xx-,2} - (133,), (xx-,1) BJe ma} . The boundedness of the density function f (2:1, 2:2) implies that IEBJ)1(X,-1) 3,5, (X-)2)I < E|€,-| IIbJ,l”2—1IIbJ’)2II2—//IbJ,(l 132,1 )bJI)2 (131,2 )If($l,$2)d’fildfli2 Cf {lleJllé—I [Ill/,1 (1171',1)Id$1} {Ilbjlz II2 l/IbJIM 112,2 )Idxg} 0.} 1,1 —1 J’ 1,2 1 c,{1+ 6:1 }{||bJ)1||, H}{1+ C; IIIIb, ,))'I| HIch),H, for some constant 013,1 > 0, where the last step is derived by Lemma 3.7.1. As a con- l/\ |/\ |/\ 1: sequence, IEBJJ (X131) BJI2 (X,)2)I _<_ C); lHk' Meanwhile, by Assumption (B4) and Lemma 3.7.1, 2 E{BJ)1(X),-1)BJI, (X,- 2)} ”bJ,l||22 I)IbJI22I 2//bj)(1 151,1)bJ/2 (1,7522)f(117772)‘1171d$2 _. —2 Cf {IIbJ,1ll22 [:J,l (13,1) (1351} {lle/12 2 fb.211,2(17,',2)(111‘2} C/{1+ Cam/{121,1} (”521152 H}{1+ ‘J’+12/‘J'2 } {IIbl’ IV —2 I; H} 2 68,2- 59 Hence there exist constants 66’ C2 > 0 such that —2 CnI—2 cné _<_E{- 2, the k-th moment of Ifiil is given by ' k E|§,|k = 71—kEIBJ,1 (Xm) BJ’)2 (X232) — EBJ,1 (Xi,1) 131/)2 (Xi,2)I l/\ k k "_k2k—l [EIBJ)1 (X131) 311,2 (X.,2)I + IEBJ.1 (X21) 3.152 (X.,2)I I where there exists a constant C B] > 0 such that 13:IB),)1(X,)1).BJ,(X,-)2)IIc 11b5,.112‘"||b.r,;/ / lb (..,,2 b, 0,9,2),- /|b..,(. ,1; 22.}{||.,,,)-'° cIc . c,{1+ ’3‘} 1+ 1;” {(152,|1;’°II5J,,)‘ CJ’,2 l/\ (Ti ,2 I f (1131.332) (113161132 _/-’2(le’ $2 2)I kdilrg} ... k k c C I _ Cf{1+ ”’Zm} 1+ J “’2 {.,~f(1+c,/C,)} kH2“k 0 such that Eléilk S n—ka‘l [Cg/H24" + C}; 1H’"I g (C{)kn‘k2k‘1H2—k k—2 202 k—‘2 . 20 202 _£_ 1 . ——2 < 6 € . 2 C5 (2067; H 1) (,{n _ {11H max ( Cg ,1) } k.E{,. l/\ —2 Employing the Bernstein’s inequality and the fact that Egg ~ 71. , one has n z},— [3,. (x...) (x...) — E {3... (x...) (x.,..)}]| i=1 sup sup 13J,J’gN aséa’ is of order 0,, (n"1/2 log 71). So the proof of (3.7.19) and (3.7.20) is completed. C] LEMMA 3.7.7. Under Assumptions (82), (B4) and (B6), the uniform supremum 0f the rescaled difference between (91,92)., n and (511,92), zs An = sup (3.7.21) 91.92€C(_1) I(91.g2)2 ’(91.92>2I , ’" .—.()( 10g" )=o,,(1). ||91|l2 llgzllz 1.1/2H1/2 60 PROOF. 
For every 91,92 6 G('1), one can write 91(X12X2) : a0 + Z.Z=123=10J,QBJ,Q(X0), .92 (X11X2) : ‘16 + Zlel £3,121 afl’p’B-l'fl’ (X01), in which for any J, J’ = 1,...,N,a,a’ = 1,2, 0,1,0, and aJ/a/ are real constants. The difference between the empirical and theoretical inner products of 91 and 92 is [(gi,g2>2,n - (91.92%! S Z2,n + Z 1.0 J’,a’ + Z laJ,al (81,0, BAG/>2,” — <31,” raj/WM = L1+ L2 + L3. J,J’,a,a’ 2,n I 011' J,a The equivalence of norms given in equation (3.7.2) and definition (3.7.15) lead to 1/2 1/2 An,l ' lab] ' Z I‘Mal S 00A“ 062 + Za’JZ’a 2:03.01 NV2 Ja Ja J,a : CA,1An,1ll91“2||92H2 H‘”2 = 0,, (...—I/m—I/z logn) "911:2 1:92:12- L1 |/\ Similarly, one has — ‘ — '2 -l2 L2£C§,1An,1ll91||2ll92II2H 1/2:0p(” 1/ H / logn)|l91||2||g2l|2- For the last term L3, one has, by definitions (3.7.16) and (3.7.17) L3 S 2 la.l,al ail/,allmaXMmaAns) J,J’,a,a’ 1/2 1/2 S CA,2maX(An,2iAn,3) 203.01 20324 J.0 J’,a’ < C A A a —0 “WWI/21 _ A,2max( n,2: 71,3)”91II2H92II2- p n 0%“ H91|l2||92|l2~ Therefore, statement (3.7.21) is established. Cl 3.7.3 Proof of Lemma 3.5.2 In the following, denote V as the theoretical inner product of the B-spline basis {1,840 (ma),J =1,...,N,a =1,2},i.e. T T 1 031V 1 0N 0N 02N 2 150,0,‘521 O V V ngJ’sN N 21 22 61 where 0,, = {0, ..., 0}T. Let S be the inverse matrix of V, i.e. 1 0% 0% S=V‘1= 0N S11 512 . (3.7.23) 0N 321 322 The next lemma on the positive definiteness of matricesiV and S is a sufficient step to achieve Lemmas 3.7.9 and 3.7.10. LEMMA 3.7.8. Under Assumptions {B4} and (B6), for the matrices V and S defined in (3.7.22) and (3.7.23) respectively, there exist constants CV > CV > O and Cs > 65 > 0 such that cv12N+1 S V S Cv12N+n CSIZN+1 S S S 0512N+1- (3.7-24) PROOF. Take a real vector fl = ([30,[31,1,...,EN,1,51,2,...,fiN,2)T E R2N+l, One has TB 2: T 1 03), 2 TV ”a won. a (mm) [3 . a where denote B2 (x) = {1,B1,1 (X1) , ..., BN3 (X2)}T. According to (3.7.2), there exist constants CV > CV > 0 such that 2 2 . CV n3 + Zflia Z “fiTB2 Oi)“2 = (3(2) + 25.1,a3.1,a (Ia) , J,(1 J,a 2 2 ||3T132(x)||:=a3+ gamma) 2w 33+Zfiia . J,a 2 J,(I thus one concludes that CvnTn = Cv 33 + 2133,... 2 NW} 2 av [33 + 2:33,... = cvnTn. J,a J,a which implies that cVI2N+1 _<_ V S CVI2N+1. The second half of (3.7.24) follows by changing )6 by V"1/2fi. Cl As an application of the above Lemma, for any (2N + 1)-vectors x and y XTSy S C's(2N + 1) ||X|| ° IIYII, (37.25) 62 where CS is the same as in (3.7.24). Note that a given in (3.3.8) can be rewritten as - T ‘1 T 1 T _1 1 T ... —1 1 T a=(B B) BE: EBB EBE =(V+V) 513E, (3.7.26) where V* is the difference between empirical and theoretical inner product matrices, i.e. T V“ _ O 02N 021V 2,n _ 2 Isaac/£2, 1gJ,J’gN Now define a = {30,314,...,aN,1,a1,2,...,aN,2}T by replacing (V + V"')‘1 with V‘1 = S in the above formula, that is a = V“1 (n’lBTE) —_— S(n"‘1BTE) . (3.7.27) and define - (2) n N ‘1’2; (331) = "’1 Z Z (34,2011 (X2331)- (3-7-28) i=1J=l The next lemma shows that the difference between 4132) (3:1) in (3.5.6) and fill?) (51:1) in (3.7.28) is negligible uniformly over 2:1 6 [0,1]. LEMMA 3.7.9. Under Assumptions (B2) to (B6), sup 915,2) (2:1) -— 91?) (2:1)] 2 Op ((log n)2 /nH) . 1216(0,” PROOF. According to (3.7.26) and (3.7.27), one has V a = (V + V*) a, which implies that V*" = V (a — 5). Using (3.7.19) and (3.7.20), one obtains that W (a — a)“ = “wan 3 Op (ml/2H-1 logn) nan. By Lemma 3.7.5, ”a” 2 Op (n—l/ZNI/2 log n), so one has "v (a — a)“ 3 0p ((105511)? n-1N3/2}. 
Thus according to Lemma 3.7.8, one has "(a _ 5)“ 2 0p {(logn)2 n‘1N3/2}_ 63 Using Lemma 3.7.5 again, one has “an s ”(a — a)” + nan = 0,,(1ognm/n) . (3729) Hence (2) “(2) N 1 " (w. (x1) - a (seal = 2 (an — an) ; Zea. (xtm . J=l [=1 Cauchy-Schwartz inequality implies that 2 (2) (log n)2 1/2 (log 11) sup 1)—\Ilv (x1) <\/—O O H =0 —— . x€[0,l] l p ”H p ( ) p nH Therefore the lemma follows. Cl LEMMA 3.7.10. Under Assumptions (82) to {B6}, for \TISZ) (2:1) as defined in (3.7.28) N A 2 A \Ils, ) (2:1)l = sup 1 E1 Kh(X 231— .731) )E GJJBJJ (XI-,2) = 0,, (H). x16[0,1] x1€[0.lln J: 1 PROOF. Note that N @£Q)($l)l S 2042qu (1‘1)+ +ZdJ ,2” 1}—_:{W.I(quf151)—1UoJ($1)} = Q1 ($1) + Q2 (351)- . (3-7-30) )2. By Cauchy—Schwartz inequality, one has N a. .2.. .;{ J_. 1 316(0,” 71—12”: (142709.11) - In” (131)} i=1 Observe that "an 2 Op (log m/N/n) as given in (3.7.29) and sup x1€[0,l] n“ Z{WJ(X7L,$1) — u...) (x1)}| = 0p(logn/\/571) , i=1 given in Lemma 3.7.4, so by Assumptions (BS) and (B6) sup Q2 (’51) = 0,, (logn/N771) Wop (If/8%) : 0,, {Ego—$23} $1€[0,l] 2 0,, {(log n)3 NH}. (3.7.31) 64 Using the discretization idea again as in the proof of Lemma 3.7.4, one has N sup Q1(2:1) 3 max (1)211“, (1131,]c) + (3.7.32) x1€[0,l] ISkSMn ng J N N K113i! sup 2 (“Us/11.” (15 1) _ 2 71mm” (Jim) = T1 + T2. - - "$l€l31,k-11$1,kl J=1 J=1 where Mn ~ n. Define next W = max n‘1 1 :1: s B X- 0 X- e- 1 ISkSMn 1g§n131§g1vle( l,k) J+N+1,J’+l J’,1( 2,1) ( 1.) i W2 ll 1 —l m B X: x- - 13kg” n ISEnISJ’ZJlsNqu (lec) SJ+N+LJJ+N+1 J],2( 2,2)0( I)El then it is clear that T1 _<_ W1 + W2. To show that both of the two terms W1 and W2 have order 0,, (H), we truncate the random variable 5,- at the level of 1 2 D = 90 —— — . 3.. n n (2+6<90<5) (733) where 6 is the same as in Assumption (B3). Without loss of generality, we only give the proof of W1 = 0,, (H). Let 55,0 = 8,105.15 Du), 5:0 = 8110521 > Dn), EZD = 5,7,0 —- E (527:0 |X,-), .ng = Z 11W($1,k)sJ+N+1,J,HBJ/,1(X,-,1)a(X,)€;‘,D, lgxng and denote W10 as the truncated centered version of W , i.e., n n-1 Z (1,, i=1 Next we show that 'Wl — WID I = Op (H) Note that (W1 _ IVID W10 :.— max . (3.7.34) lngMn S A1+ A2, where 1 " _ A1 = 1334,, E: Z a” (2:1).) SJ+N+1,J,+IBJI,1(Xi11)U(Xi)E(8i,D|xi) , ”'1ng,ng 1 " 1. A2 — 131/212%!" 71;: Z 1th (15m) 3J+N+1,JJ+1BJI,1(X¢',1)U (X05130 ~ ”119,ng 65 T Let flu) ($1,113) : {“1421 (11,16) 1' ° ' #in ($1,k)} 1 then N n T _ _ A1 = max Ha; (lec) S2l {n l ZBJ’ 1(Xi'1)0 (X01; (El-’0 lXi)}JJ i=1 ’ =1 lgkgMn 1/2 N N i 2 “8122214 D‘Zi ZBJ“X“)°( )E(EZD'X“)} ’ i=1 according to (3.7.25). By Assumption (BB), IE (egDux.)| = IE (5301):.) S and slupl% 22:1 BJ,1(X1‘,1)0'(X1‘) ,0: Lemma 2.6.2. Therefore E 051le lxi) —(1+6) D}l+6 S MéDn 1 = Op(log n/fi) by Bernstein inequality given in 2 1/2 /\ n z: N N —(1+6) 2 l . A1 .. M6011 131334" 211% ($1,k)Jz—:l{nZBJ1(le 0(2)} 1 Op {ND;(1+6) log2 71/11} = 0p (H), where the last step follows from the choice of Du in (3.7.33). Meanwhile 2+6 . 00 EIEnl2+6 00 E (Elsnl |xn) Ma 2 Pugnl > D”) < Z_____ D2+6 — 2: D31” 5 2 2+6 < 00’ 11:1 11:1 11:] n since 6 > 1/2. By Borel-Cantelli Lemma, one has with probability 1 n ”_1 Z Z #wJ (331,1c) SJ+N+1,J'+IBJI,1(Xi’l)0 (x0531) : 0 i=1 ISJ,J’5N for large 11. Therefore, one has |W1 — WID I 5 A1 + A2 = Op (H). Next we want to show that wlD = 0,, (H), with wlD defined in (3.7.34). Since ch = ”w (131,1)T321 {31,1(Xi,1),-~ ,31,N (Xi,l)}T 0098*) 5,. D, so the variance of UM is “w ($1,k)TS21VHI ({Bl,1(Xi,1),--° ,BN,1( X1,T1)} 0095) i 0) 321111.) 
(131, k)- 66 According to Assumption (BB), 0 (x) is continuous on a compact set [0,1]d, so it is clear that chu 5 var ({Bl,1(Xi,1), - - - , BN,1 (Xi,1) }TU(X1')) S 03V”. Thus var (Ui,k) ~ pw (x1,k)T 821V11521uw (331,1) V£,D : “w ($1,107. 521%; (zlJc) V€,Dv * T 1/2 where VQD = var {El-,0 [Xi }. Let n (1131,11) = {uw (IlJc) [1w ($1,k)} 0363 {N (11311)}2 V5,D S var (Ui,k) S 0303 {K (1131,10)2 V5,D- When 1‘ 2 3, the r-th moment EUsz is E lUiJclr = E Z l‘wJ (171$) SJ+N+1,J,+IBJJ,1(Xi’l)O (X05210 igxng 1' * T g E 2 "“U ($1,k) SJ+N+1,J'+IBJ’,1(Xi’1)0(xi) 13(5750 Ixi) 1gLng T _<_ E Z #1.”(13m)3J+N+1’JI+IBJI,1(X1,1)0(Xi) Dir—2‘43, IgLfSN while T E Z Ile(1131,11)3J+N+1,JI+IBJIJ(X1,1)0(Xi) 1gLng T T T : E [1,“,(1'1’k) $21 {BI,1(X1°,1)," ' ,Bl,N (Xi,l)} ”(Xi) r g CgCgE lpw (x1,k)T {31,1(Xi,1)a ' " ,BI,N(X1‘,1)}T| N r/Z S CECE {n ($1,k)}rE 2 33,1091) J=1 3 ago; in (x1,.)}”0 (HM/2) ~ Therefore '5 le'Jclr S CECE {5 ($1,k)}r0 (”hr/2) Bil—2&0 —2 _<_ {can (x1,k)D,,H-l/2}r rush/(kl2 < +00, 67 which means the sequence of random variables {UMHLI satisfies the Cramér’s condition with Cramér’s constant equal to c... 2: cor: (rue) DnH “1/2, hence by the Bernstein’s in- equality we have for r = 3 -1" (10?. n 6/7 P ” Zuni: an SaleXp —25m§+5c,pn' +a2(3)0([q—+-1-D , (=1 where 2 5mG/7 ,,n=pH,01=23+2 1+ 2p" ,a2(3)=lln 1+ 3 , q 25m2 + 5c1pn Pn _ 1/3 m3 N {"3 (351,0)2 Ve,D, ms S {C{"(I1,k)}3 H 1/2DnVe,D} - Observe that 5gp" = 0(1), then by taking q such that [q—i‘f] 2 c0 log n, q 2 cm/ logn for some constants c0,cl, one has a1 = 0(1),/Q) = 0 (log n), (12 (3) = 0 (n2). Assumption (B2) 6/7 6/7 __7}_ __ _n_ *6/\0C‘0/7 at...) siKoewn_ : mfg/5 25mg + 5C*pn C... CO (10g ")5/2 D -—> +00. Thus, for 12. large enough, 1 n P {; Zuni: i=1 Taking q), p large enough, P {'37: 2&1 UiJCl > pH} S 11.3, for large 11. Hence > pH} g clognexp {—C‘2p2 log n} + Cn2‘6Allco/7 3 11—3. §P(Iwflzwi= i245: n—lk- l Thus, Borel-Cantelli Lemma entails that W10 = 0,, (H). Noting that lWl — WIDI = 0,; (H), one obtains that W1 = 0,, (H). Similarly one can show that 1V 2 Op (H). Hence T1 g wl + w2 = 0,, (H). (3.7.35) 68 Employing Lipschitz continuity of kernel K, the term T22 is bounded by N 2 - 2 .. 2 “an max sup §:{uw1($1)-uwj ($1.10} _<.lla|| x N . max sup E K X -—:1:)——K X -:1: 2 B X )2 1SKM"351€l3'¢1,Ic—1v1=1,kl1X5 [{ h( 11 1 h( 11 1'0} { J'2( 12 }] Therefore, according to Assumption (BS), Lemma 3.7.1 (ii) and (3.7.29), N 2 1/2 T <0 Nl/zlogn {ZJ=1EBJ,2(X12)} -O Nl/Zlogn _ _1/2 2“ p 1.1/2 112114., ‘" m ”Pi" ) (3.7.36) Combining (3.7.32), (3.7.35) and (3.7.36) one has SUlee[0,1] Q1 (2:1) 2 Op (H). The desired result follows from (3.7.30) and (3.7.31). Cl 69 CHAPTER 4 Spline Single-Index Prediction Model 4.1 Introduction Consider the stochastic heteroscedastic regression model given in (1.1.1), an attractive di- mension reduction method to deal with the “curse of dimensionality” is the single-index model, similar to the first step of projection pursuit regression, see Friedman and Stuetzle (1981), Hall (1989), Huber (1985), Chen (1991). The basic appeal of single-index model is its simplicity: the d-variate function m (x) = m (151, ...,xd) is expressed as a univariate function of xTBO 2 23:1 351,00? 
Over the last two decades, many authors had devised various intelligent estimators of the single-index coefficient vector 90 = (60,1, ..., 6’0,d)T, for instance, Powell, Stock and Stoker (1989), Hardle and Stoker (1989),.Ichimura (1993), Klein and Spady (1993), Hardle, Hall and Ichimura (1993), Horowitz and Hardle ( 1996), Carroll, Fan, Gijbels and Wand (1997), Xia and Li (1999), ‘Hristache, Juditski and Spokoiny (2001). More recently, Xia, Tong, Li and Zhu (2002) proposed the minimum average variance esti— mation (MAVE) for several index vectors. All the aforementioned methods assume that the d-variate regression function m (x) is exactly a univariate function of some xTBO and obtain a root-n consistent estimator of 00. If this model is misspecified (m is not a genuine single-index function), however, a goodness- of—fit test then becomes necessary and the estimation of 00 must be redefined, see Xia, Li, Tong and Zhang (2004). Here instead of presuming that underlying true function m is a single-index function, a univariate function g is estimated that optimally approximates the 70 multivariate function m in the sense of g(1/) = E [m(X)|XT60 = V], (4.1.1) where the unknown parameter 00 is called the SIP coefficient, used for simple interpretation once estimated; XTOO is the latent SIP variable; and g is _a smooth but unknown function used for further data summary, called the link prediction function. Our method therefore is clearly interpretable regardless of the goodness~of—fit of the single-index model, making it much more relevant in applications. Estimators of 00 and g are proposed in this chapter based on weakly dependent sample, which includes many existing nonparametric time series models, that are (i) computationally expedient and (ii) theoretically reliable. Estimation of both 00 and g has been done via the kernel smoothing techniques in existing literature, while polynomial spline smoothing is used here. The greatest advantages of spline smoothing, as pointed out in Huang and Yang (2004), Xue and Yang (2006 b) are its simplicity and fast computation. The proposed procedure involves two stages: estimation of 00 by some JE—consistent B, minimizing an empirical version of the mean squared error, R(0) = E {Y - E ( YI XT0)}2; spline smoothing of Y on XTB to obtain a cubic spline estimator g of g. The best single-index approximation to m(x) is then m(x) = g) (XTB). Under geometrically strong mixing condition, strong consistency and (fa-rate asymp— totic normality of the estimator B of the SIP coefficient 00 in (4.1.1) are obtained. Proposi— tion 4.2.2 is the key in understanding the efficiency of the proposed estimator. It shows that the derivatives of the risk function up to order 2 are uniformly almost surely approximated by their empirical versions. Practical performance of the SIP estimators is examined via Monte Carlo examples. The estimator of the SIP coefficient performs very well for data of both moderate and high dimension d, of sample size n from small to large, see Tables 4.5 and 4.6, Figures 4.19, 4.20 and 4.21. By taking advantages of the spline smoothing and the iterative optimization routines, one reduces the computation burden immensely for massive data sets. Table 4.6 reports the computing time of one simulation example on an ordinary PC, which shows that for massive data sets, the SIP method is much faster than the MAVE method. 
For instance, 71 the SIP estimation of a 200-dimensional 60 from a data of size 1000 takes on average mere 284 seconds, while the MAVE method needs to spend 2432.56 seconds on average to obtain a comparable estimates. Hence on account of criteria (1) and (ii), our method is indeed appealing. Applying the proposed SIP procedure to the rive flow data of Iceland, we have obtained superior forecasts, based on a 9-dimensional index selected by BIC, see Figure 4.25. The rest of this chapter is organized as follows. Section 4.2 gives details of the model specification, proposed methods of estimation and main results. Section 4.3 describes the actual procedure to implement the estimation method. Section 4.4 reports the main findings in an extensive simulation study. The proposed SIP model and the estimation procedure are applied in Section 4.5 to the river flow data of Iceland. Most of the technical proofs are contained in Section 4.6. 4.2 The Method and Main Results 4.2.1 Identifiability and definition of the index coefficient It is obvious that without constraints, the SIP coefficient vector 00 2 (00,1, ...,60,d)T is identified only up to a constant factor. Typically, one requires that “90“ = 1 which entails that at least one of the coordinates 00,1, ..., 60") is nonzero. One could assume without loss of generality that 0041 > 0, and the candidate 60 would then belong to the upper unit hemisphere Si“ 2 ((61,...,6d)|zg:1 0,2, = 1,0,, > 0}. For a fixed 9 = (61, ...,od)T, denote X9 = xTo, X9, = xfe, 1 g i g 71. Let mo (X0) = E (YlXa) = E{m (X) |X0}- (4.2.1) Define the risk function of 0 as 12(0) 2 E [{Y — me mm?) = E {m(X) —- 1r1.9(X)5))}2 + E02 (X), (4.2.2) which is uniquely minimized at 00 6 51—1, i.e. 90=arg min [{(B). 0631—1 72 REMARK 4.2.1. Note that 51—1 is not a compact set, so a cap shape subset of 81-1 is introduced d Sf!“ = (61,.--,6d)|26§=1,6d 2 x/1—-c2 .c e (0.1) p=1 Clearly, for an appropriate choice of c, 60 6 511—1, which is assumed in the rest of the chapter. Denote 0_d = (61, ..., 6d__1)T, since for fixed 0 6 31-1, the risk function R (6) depends only on the first d — 1 values in 6, so R (0) is a function of 9—d R" (9-.» = R (61.62,...,ad-1,(/1 —- Ila—dig) , with well-defined score and Hessian matrices 32 c9 3* 0_ 2 ”__R* 6__ , H,“ 0_ : ___-___— ( d) ( d) ( d) 394397;) 6a,, R“ (6—d)- (4.2.3) ASSUMPTION (C1): The Hessian matrix H * (90,—d) is positive definite and the risk func- tion R“ is locally convex at 60,—d: i.e., for any 6 > 0, there exists 6 > 0 such that R* (6—d) — Rik (00,—d) < 6 implies ”B-d — 00,—dll2 < 8. 4.2.2 Variable transformation Throughout this chapter, denote by B31 = {x 6 Rd |||x|| g a} the d-dimensional ball with radius a and center 0 and 00°) (33) = {m lthe kth order partial derivatives of m are continuous on 83 } the space of k-th order smooth functions. ASSUMPTION (C2): The density function of X, f (x) 6 0(4) (831), and there are constants 0 < cf S Cf such that ef/void (33) 3 f(x) 3 cf/void (83), x e Bf;l f(X) —=— 0, x ¢ Bi ' For a fixed 9, define the transformed variables of the SIP variable X9 U9 = Fd(X0),Uo,i = Fd (X94) ,1 S i _<_ 72, (42-4) 73 in which Fd is the a rescaled centered Beta {(d + 1) /2, (d + 1) /2} cumulative distribution function, i.e. _ V/a I‘(d+ 1) 2 (d-1)/2 E) (V) —— [I P{(d +1)/2}22d (1 —- t ) dt,1/ E [—a, a]. (4.2.5) REMARK 4.2.2. For any fixed 6, the transformed variable U9 in (4.2.4) has a quasi-uniform [0, 1] distribution. 
Let $f_\theta(u)$ be the probability density function of $U_\theta$; then for any $u \in [0,1]$,
$$f_\theta(u) = \{F_d'(\nu)\}^{-1} f_{X_\theta}(\nu), \qquad \nu = F_d^{-1}(u),$$
in which $f_{X_\theta}(\nu) = \lim_{\Delta\nu \to 0} (\Delta\nu)^{-1} P(\nu \le X_\theta \le \nu + \Delta\nu)$. Noting that $X_\theta$ is exactly the projection of $\mathbf{x}$ on $\theta$, let $D_\nu = \{\mathbf{x} \mid \nu \le \mathbf{x}^T\theta \le \nu + \Delta\nu\} \cap B_a^d$; then one has
$$P(\nu \le X_\theta \le \nu + \Delta\nu) = P(\mathbf{X} \in D_\nu) = \int_{D_\nu} f(\mathbf{x})\, d\mathbf{x}.$$
According to Assumption (C2), this probability is bounded between $c_f\, \mathrm{vol}_d(D_\nu)/\mathrm{vol}_d(B_a^d)$ and $C_f\, \mathrm{vol}_d(D_\nu)/\mathrm{vol}_d(B_a^d)$.
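A quick Monte Carlo illustration of Remark 4.2.2 (our sketch, reusing the `Fd()` helper above): when $\mathbf{X}$ is exactly uniform on the unit ball, the projection $\mathbf{X}^T\theta$ has precisely the rescaled centered Beta density, so $U_\theta$ is exactly Uniform$[0,1]$; under (C2) it is only quasi-uniform.

```r
## Check that U_theta = F_d(X^T theta) is (quasi-)uniform on [0,1].
set.seed(1)
d <- 3; n <- 1e5
X <- matrix(rnorm(n * d), n, d)
X <- X * (runif(n)^(1 / d) / sqrt(rowSums(X^2)))  # uniform on the unit ball
theta <- rep(1, d) / sqrt(d)                      # any unit index vector
U <- Fd(as.vector(X %*% theta), d = d, a = 1)
ks.test(U, "punif")                               # uniformity is not rejected
```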

ASSUMPTION (C4): The noise $\varepsilon$ satisfies $E(\varepsilon \mid \mathbf{X}) = 0$, $E(\varepsilon^2 \mid \mathbf{X}) = 1$, and there exists a positive constant $M$ such that $\sup_{\mathbf{x} \in B_a^d} E(|\varepsilon|^3 \mid \mathbf{X} = \mathbf{x}) < M$. The standard deviation function $\sigma(\mathbf{x})$ is continuous on $B_a^d$, with
$$0 < c_\sigma \le \inf_{\mathbf{x} \in B_a^d} \sigma(\mathbf{x}) \le \sup_{\mathbf{x} \in B_a^d} \sigma(\mathbf{x}) \le C_\sigma < \infty.$$

ASSUMPTION (C5): There exist positive constants $K_0$ and $\lambda_0$ such that $\alpha(n) \le K_0 e^{-\lambda_0 n}$ holds for all $n$, with the $\alpha$-mixing coefficient for $\{Z_i = (\mathbf{X}_i, \varepsilon_i)\}_{i=1}^\infty$ defined as
$$\alpha(k) = \sup_{B \in \sigma\{Z_s,\, s \le t\},\ C \in \sigma\{Z_s,\, s > t+k\}} |P(B \cap C) - P(B)P(C)|, \quad k \ge 1.$$

ASSUMPTION (C6): The number of interior knots $N$ satisfies $n^{1/6} \ll N \ll n^{1/5}(\log n)^{-2/5}$.

REMARK 4.2.3. Assumptions (C3) and (C4) are typical in the nonparametric smoothing literature, see for instance Hardle (1990), Fan and Gijbels (1996), Xia, Tong, Li and Zhu (2002). By the result of Pham (1986), a geometrically ergodic time series is a strongly mixing sequence. Therefore, Assumption (C5) is suitable for (1.1.1) as a time series model under the aforementioned assumptions.

THEOREM 4.2.1. Under Assumptions (C1)-(C6), one has
$$\hat\theta_{-d} \to \theta_{0,-d},\ a.s.,\ \text{as } n \to \infty. \qquad (4.2.11)$$

PROOF. Denote by $(\Omega, \mathcal{F}, P)$ the probability space on which all $\{(\mathbf{X}_i, Y_i)\}_{i=1}^\infty$ are defined. By Proposition 4.2.2, given at the end of this section,
$$\sup_{\|\theta_{-d}\|_2 \le c} \big|\hat R^*(\theta_{-d}) - R^*(\theta_{-d})\big| \to 0,\ a.s.. \qquad (4.2.12)$$
So for any $\delta > 0$ and $\omega \in \Omega$, there exists an integer $n_0(\omega)$ such that when $n > n_0(\omega)$,
$$\hat R^*(\theta_{0,-d}, \omega) - R^*(\theta_{0,-d}) < \delta/2.$$
Note that $\hat\theta_{-d} = \hat\theta_{-d}(\omega)$ is the minimizer of $\hat R^*(\theta_{-d}, \omega)$, so
$$\hat R^*(\hat\theta_{-d}(\omega), \omega) - R^*(\theta_{0,-d}) < \delta/2.$$
Using (4.2.12), there exists $n_1(\omega)$ such that when $n > n_1(\omega)$,
$$R^*(\hat\theta_{-d}(\omega)) - \hat R^*(\hat\theta_{-d}(\omega), \omega) < \delta/2.$$
Thus, when $n > \max(n_0(\omega), n_1(\omega))$,
$$R^*(\hat\theta_{-d}(\omega)) - R^*(\theta_{0,-d}) < \delta/2 + \hat R^*(\hat\theta_{-d}(\omega), \omega) - R^*(\theta_{0,-d}) < \delta/2 + \delta/2 = \delta.$$
According to Assumption (C1), $R^*$ is locally convex at $\theta_{0,-d}$: for any $\varepsilon > 0$ and any $\omega$, if $R^*(\hat\theta_{-d}(\omega)) - R^*(\theta_{0,-d}) < \delta$, then $\|\hat\theta_{-d}(\omega) - \theta_{0,-d}\| < \varepsilon$ for $n$ large enough, which implies the strong consistency. □

THEOREM 4.2.2. Under Assumptions (C1)-(C6), one has
$$\sqrt{n}\,(\hat\theta_{-d} - \theta_{0,-d}) \xrightarrow{d} N\{0, \Sigma(\theta_0)\},$$
where $\Sigma(\theta_0) = \{H^*(\theta_{0,-d})\}^{-1}\, \Psi(\theta_0)\, \{H^*(\theta_{0,-d})\}^{-1}$, with $\Psi(\theta_0) = \{\psi_{pq}\}_{p,q=1}^{d-1}$ and $H^*(\theta_{0,-d}) = \{l_{pq}\}_{p,q=1}^{d-1}$. The entries $\psi_{pq}$ and $l_{pq}$ are explicit expectations involving $\theta_0$, the function $\gamma_{\theta_0}$, and the derivatives $\gamma_p$ and $\gamma_{p,q}$ evaluated at $U_{\theta_0}$, in which $\gamma_p$ and $\gamma_{p,q}$ denote the values of $\partial\gamma_\theta/\partial\theta_p$ and $\partial^2\gamma_\theta/\partial\theta_p\partial\theta_q$ taken at $\theta = \theta_0$, for any $p, q = 1, 2, \ldots, d-1$, and $\gamma_\theta$ is given in (4.2.6).

REMARK 4.2.4. Consider the generalized linear model (GLM) $Y = g(\mathbf{X}^T\theta_0) + \sigma(\mathbf{X})\varepsilon$, where $g$ is a known link function. Let $\tilde\theta$ be the nonlinear least squares estimator of $\theta_0$ in the GLM. Theorem 4.2.2 shows that under Assumptions (C1)-(C6), the asymptotic distribution of $\hat\theta_{-d}$ is the same as that of $\tilde\theta_{-d}$. This implies that the proposed SIP estimator $\hat\theta_{-d}$ is as efficient as if the true link function $g$ were known.

The next two propositions play an important role in the proof of the main results. Proposition 4.2.1 establishes the uniform convergence rate of the derivatives of $\hat\gamma_\theta$ up to order 2 to those of $\gamma_\theta$ in $\theta$. Proposition 4.2.2 shows that the derivatives of the risk function up to order 2 are uniformly almost surely approximated by their empirical versions.

PROPOSITION 4.2.1. Under Assumptions (C2)-(C6), with probability 1,
$$\sup_{\theta \in S_c^{d-1}} \sup_{u \in [0,1]} |\hat\gamma_\theta(u) - \gamma_\theta(u)| = O\{(nh)^{-1/2}\log n + h^4\}, \qquad (4.2.13)$$
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \max_{1 \le i \le n} \Big|\frac{\partial}{\partial\theta_p}\{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\}\Big| = O\Big\{\frac{\log n}{\sqrt{nh^3}} + h^3\Big\}. \qquad (4.2.14)$$
... if it has many local minima and maxima, which is very unlikely in applications.

4.4 Simulations

In this section, two simulations are carried out to illustrate the finite-sample behavior of the SIP estimation method. The number of interior knots $N$ is computed according to (4.3.6) with $c_1 = 1$, $c_2 = 5$. All of the code has been written in R.

4.4.1 Example 1

Consider the model in Xia, Li, Tong and Zhang (2004),
$$Y = m(\mathbf{X}) + \sigma_0\varepsilon, \quad \sigma_0 = 0.3, 0.5, \quad \varepsilon_i \overset{iid}{\sim} N(0,1),$$
where $\mathbf{X} = (X_1, X_2)^T \sim N(0, I_2)$, truncated by $[-2.5, 2.5]^2$, and
$$m(\mathbf{x}) = x_1 + x_2 + 4\exp\{-(x_1 + x_2)^2\} + \delta\,(x_1^2 + x_2^2)^{1/2}. \qquad (4.4.1)$$
If $\delta = 0$, then the underlying true function $m$ is exactly a single-index function, i.e., $m(\mathbf{X}) = \sqrt{2}\,\mathbf{X}^T\theta_0 + 4\exp\{-2(\mathbf{X}^T\theta_0)^2\}$, where $\theta_0^T = (1,1)/\sqrt{2}$. If $\delta \ne 0$, then $m$ is not a genuine single-index function. An impression of the bivariate function $m$ for $\delta = 0$ and $\delta = 1$ can be gained from Figure 4.18.

For $\delta = 0, 1$, one hundred random realizations of each sample size $n = 50, 100, 300$ are drawn respectively. To demonstrate how close the SIP estimator is to the true index parameter $\theta_0$, Table 4.5 lists the sample mean (MEAN), bias (BIAS), standard deviation (SD) and mean squared error (MSE) of the estimates of $\theta_0$, together with the average MSE of both directions. From this table, one sees that the SIP estimators are very accurate for both cases $\delta = 0$ and $\delta = 1$, which shows that the proposed method is robust against deviation from the single-index model. As expected, when the sample size increases, the SIP coefficient is estimated more accurately. Moreover, for $n = 100, 300$, the total average MSE is inversely proportional to $n$.

4.4.2 Example 2

Consider the heteroscedastic regression model (1.1.1) with
$$m(\mathbf{x}) = \sin(\mathbf{x}^T\theta_0), \qquad \sigma(\mathbf{x}) = \sigma_0\, \frac{5 - \exp(\|\mathbf{x}\|/\sqrt{d})}{5 + \exp(\|\mathbf{x}\|/\sqrt{d})}, \qquad (4.4.2)$$
in which $\mathbf{X}_i = (X_{i,1}, \ldots, X_{i,d})^T$ and $\varepsilon_i$, $i = 1, \ldots, n$, are iid $N(0,1)$, and $\sigma_0 = 0.2$. In this simulation, the true parameter is $\theta_0 = (1, 1, 0, \ldots, 0, 1)^T/\sqrt{3}$ for different sample sizes $n$ and dimensions $d$. The superior performance of the SIP estimators is borne out in comparison with the MAVE of Xia, Tong, Li and Zhu (2002). We also investigate the behavior of the SIP estimators in the previously unexplored cases where $n$ is smaller than or equal to $d$, for instance, $n = 100$, $d = 100, 200$ and $n = 200$, $d = 200, 400$. The average MSEs of the $d$ dimensions are listed in Table 4.6, from which one sees that the performance of the SIP estimators is quite reasonable, and in most of the scenarios with $n \le d$ the SIP estimators still work astonishingly well where the MAVEs become unreliable. For $n = 100$, $d = 10, 50, 100, 200$, the estimates of the link prediction function from model (4.4.2) are plotted in Figures 4.20 and 4.21; they are rather satisfactory even when the dimension exceeds the sample size.

Theorem 4.2.1 indicates that $\hat\theta_{-d}$ is strongly consistent for $\theta_{0,-d}$. To see the convergence, we run 100 replications and in each replication compute the value of $\|\hat\theta - \theta_0\|/\sqrt{d}$. Figures 4.22 and 4.23 plot the kernel density estimates of the 100 values of $\|\hat\theta - \theta_0\|/\sqrt{d}$ in Example 2, for dimensions $d = 10, 50, 100, 200$. There are four types of line characteristics: the dotted-dashed line ($n = 100$), dotted line ($n = 200$), dashed line ($n = 500$) and solid line ($n = 1000$). As the sample size increases, the errors become concentrated closer to 0 with narrower spread, confirming the conclusions of Theorem 4.2.1.

Lastly, Table 4.6 reports the average computing time in Example 2 to generate one sample of size $n$ and perform the SIP or MAVE procedure on the same ordinary Pentium IV PC. From Table 4.6, one sees that the proposed SIP estimator is much faster than MAVE. The computing time for MAVE is extremely sensitive to sample size, as expected. For very large $d$, MAVE becomes unstable to the point of breaking down in four cases.
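For concreteness, here is a sketch of the Example 2 design under the model (4.4.2) as reconstructed above, reusing the illustrative `sip.fit()` sketched in Section 4.1 (neither is the dissertation's own code); the error norms $\|\hat\theta - \theta_0\|/\sqrt{d}$ computed here correspond to the densities in Figures 4.22 and 4.23.

```r
## Example 2 design (model 4.4.2), a minimal sketch.
set.seed(2)
n <- 200; d <- 10; sigma0 <- 0.2
theta0 <- c(1, 1, rep(0, d - 3), 1) / sqrt(3)
err <- replicate(100, {
  X <- matrix(rnorm(n * d), n, d)
  r <- exp(sqrt(rowSums(X^2) / d))                # exp(||x|| / sqrt(d))
  sigma.x <- sigma0 * (5 - r) / (5 + r)           # heteroscedastic scale
  y <- sin(as.vector(X %*% theta0)) + sigma.x * rnorm(n)
  sqrt(sum((sip.fit(y, X) - theta0)^2) / d)       # ||theta.hat - theta0||/sqrt(d)
})
plot(density(err))                                # cf. Figures 4.22 and 4.23
```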
4.5 Application

In this section the proposed SIP model is demonstrated on the river flow data of the Jökulsá Eystri River of Iceland, from January 1, 1972 to December 31, 1974. There are 1096 observations, see Tong (1990). The response variable is the daily river flow ($Y_t$) of the Jökulsá Eystri River, measured in cubic meters per second. The exogenous variables are temperature ($X_t$) in degrees Celsius and daily precipitation ($Z_t$) in millimeters, collected at the meteorological station at Hveravellir.

This data set was analyzed earlier through threshold autoregressive (TAR) models by Tong, Thanoon and Gudmundsson (1985) and Tong (1990), and through nonlinear additive autoregressive (NAARX) models by Chen and Tsay (1993). Figure 4.24 shows the plots of the three time series, from which some nonlinear and non-stationary features of the river flow series are evident. To make these series stationary, the trends are removed by a simple quadratic spline regression; these trends (dashed lines) are shown in Figure 4.24. By an abuse of notation, we shall continue to use $X_t$, $Y_t$, $Z_t$ to denote the detrended series.

In the analysis, we pre-select all the lagged values in the last 7 days (1 week), i.e., the predictor pool is $\{Y_{t-1}, \ldots, Y_{t-7}, X_t, X_{t-1}, \ldots, X_{t-7}, Z_t, Z_{t-1}, \ldots, Z_{t-7}\}$. Using a BIC similar to that of Huang and Yang (2004) for the proposed spline SIP model with 3 interior knots, the following 9 explanatory variables are selected from the above set: $\{Y_{t-1}, \ldots, Y_{t-4}, X_t, X_{t-1}, X_{t-2}, Z_t, Z_{t-1}\}$. Based on this selection, we fit the SIP model again and obtain the estimate of the SIP coefficient
$$\hat\theta = (-0.877, 0.382, -0.208, 0.125, -0.046, -0.034, 0.004, -0.126, 0.079)^T.$$
The first two plots of Figure 4.25 display the fitted river flow series and the residuals against time.

Next we examine the forecasting performance of the SIP method. We start by estimating the SIP coefficient using only the observations of the first two years; then we perform the out-of-sample rolling forecast of the entire third year. The observed values of the exogenous variables are used in the forecast. The last plot of Figure 4.25 shows the SIP out-of-sample forecasts. For the purpose of comparison, the MAVE method is also used, with the same predictor vector selected by BIC. The mean squared prediction error is 60.52 for the SIP model, 61.25 for MAVE, 65.62 for NAARX, 66.67 for TAR and 81.99 for the linear regression model, see Chen and Tsay (1993). Among the above five models, the SIP model produces the best forecasts.
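A sketch of the rolling forecast scheme just described, assuming detrended series `y`, `x`, `z` of length 1096 and the illustrative `sip.fit()` from Section 4.1; `smooth.spline()` stands in for the dissertation's fixed-knot cubic spline link estimator.

```r
## One-step-ahead rolling forecast of year three (days 732-1096), with the
## index and link fitted on the first two years only; y, x, z are the
## detrended flow, temperature and precipitation series.
lagged <- function(t) c(y[t - (1:4)], x[t - (0:2)], z[t - (0:1)])

train <- 8:731                                    # lags require t >= 8
D.tr <- t(sapply(train, lagged))
theta.hat <- sip.fit(y[train], D.tr)              # 9-dimensional SIP index
link <- smooth.spline(as.vector(D.tr %*% theta.hat), y[train])

test <- 732:1096
D.te <- t(sapply(test, lagged))
forecast <- predict(link, as.vector(D.te %*% theta.hat))$y
mean((y[test] - forecast)^2)                      # mean squared prediction error
```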
fling), r = O, 1, ..., Mn. Employing the discretization method, sup axICQ [ii is bounded by d—l 1 0 such that 1 n —10 PI; ECWJJ’J >5"} 3” . z: n (X) 00 71—12(9r31j11i 3 5n} S 2 Z ”WW—10 5 C 2 "‘3 [:1 n=1 11:1 Thus, Borel—Cantelli Lemma entails that 71 lo n -12 g 71 ' ' ’ : O ,a-So- 4.6-3 [:1 (91“,.713’31 ( GEN) ( ) Employing Lipschitz continuity of the cubic B-spline, one has with probability 1 which implies that 00 ZP{ max 11:] H ISJJ’SN sup max On’g’ N “3: 4,.a,1,.,,. N “zzk+:flj,jk16lkl9- According to Lemma 4.6.1, one has for any 9 E 33!“, 2 ll71||2,9 2 ll72ll2,0 : <1h||0l|2_ l|71||2,_9 < C2hllal|2.61hllfill2_ < ||72||2,9_ < C2hllfill2. Cih llallz llfillz S ll711l2,9||72l|2,9 S Czhllall2 llfill2. Hence llalloo llfilloo ‘ 61h llallz llfillz (’71 1 72)n,0 - (A/11’72)9 ll71ll2,9”72ll2,9 An = sup sup aesg—l 7167,7261‘ 1 n x sup -Z{< Bjk’Bj’k [>12 " }, 9639 1k,km =2,,34 "2:1 n, 1* 9 l_<_j,j/ _ , add—M11234 "i=1 n,9 3' 9 ln,0}:___3‘ (4.6.9) In the above, denote by Vmg the empirical inner product matrix of the cubic B-spline basis and similarly, the theoretical inner product matrix as V9 N N ,v ={(B. ,B- > } . 4.6.10 3 0 3,,4 ],4 0 jJ’Z-3 ( ) The next lemma is a special case of Theorem 13.4.3 in DeVore and Lorentz (1993). 1 T V =_ = < . B- > n,0 nBaBH { BJIA’ 1,4 "’9}j,j’=* LEMMA 4.6.4. If a bi-infinite matrix with bandwidth r has a bounded inverse A"1 on [2 and K. = m(A) := ||A||2 “A4“2 is the condition number of A, then ||A_1||oo 3 200(1— u)—l, with CO = u-2r ”14-1”, 11 = (n2 _1)‘/4"(n2 +1)*‘/4’. LEMMA 4.6.5. Under Assumptions (CB), (C5) and (C6), there exist constants 0 < CV < CV such that cVN-l nwng g wTvgw g ovN-l “ng and N‘1 2< Tv 0 such that sup “V1?” _<_CN,a.s., sup ”VB—1” SC'N. (4.6.12) BeSg'l 00 9652i—1 00 PROOF. First we compute the lower and upper bounds for the eigenvalues of Vnfl. Let w be any (N+4)-vector and denote 7w (u) = 29,:_3 ijJ'A (u), then ng = {7w (UgJ) , ...,7w (U9,n)}T and the definition of An in (4.6.5) from Lemma 4.6.3 entails that 2 T 2 2 lhwll2,9 (1 — An) S W Vn,9W = ||7w||2,n,9 S ||7w||2,9 (1 + An)- (4-5-13) Using Theorem 5.4.2 of DeVore and Lorentz (1993) and Assumption (C2), one obtains that 2 N C 2 C 2 cm uwng s “mute = WTV9W = Z ijJ-A 3 01,91le6. (4.6.14) i=‘3 29 which, together with (4.6.13), yield cfoN-l “w“; (1 — An) g wTv'nflw _<_ ofozv-l nwug (1 + A"). (4.6.15) Now the order of An in (4.6.5), together with (4.6.14) and (4.6.15) implies (4.6.11), in which cv = ch, CV = CfC. Next, denote by Amax (Vmg) and Ami“ (Vmg) the maximum and minimum eigenvalue of Vn’g, simple algebra and (4.6.11) entail that CVIV—1 _>.. ”Vnflllz : )‘max (Vnfl) 1| V8119 2 : Aaliln (Vnfl) .<_ clea a.s., thus K :2 “V71,9||2 “Va-folk = )‘max (Vnfl) All?" (Vnfl) S CVCT/l < 0010-3-- Meanwhile, let wj = the (N + 4)-vector with all zeros except the j—th element being 1, j = —3, ..., N. Then clearly Tl 1 . WJTVnflWJ' 2 ; 2332,4(1103') = “324“:9 1 llellz : 1’ ‘3 S J -<— N i=1 and in particular wgVnflWO S )xmax (Vmg) “WOll2 = Amax (vnfl) 1 WZ‘3‘ln,(le--3 2 )‘min (Vnfl) ”w—3ll2 : )‘min (Vnfl) - 89 This, together with (4.6.5) yields that TV W ”BO 4”2 ”Bo 4”2 l— A F:)\ 2 V /\—.l V 2 WO 11,9 0 :—-—,——n’-0—> , 0 n1 “ "M "(9) .....< '1’”) wax/...“.-. Ila—3,413.1 ‘ Ila—3.4“: 1 + A, which leads to a 2 C > 1,a.s. because the definition of B—spline and Assumption (C2) ensure that ”80,4“; _>_ C0 "843,4“: for some constant 1C0 > 1. Next applying Lemma 4.6.4 with u = (n2 — I)”16 (n2 + 1).”16 and c0 = u‘8 ”V;},|l2, one gets “V’Zblloo g 21/”8N(1 — u)"1 2: CN,a.s.. Hence part one of (4.6.12) follows. 
Part two of (4.6.12) is proved in the same fashion. □

In the following, denote by $Q_T(m)$ the 4-th order quasi-interpolant of $m$ corresponding to the knots $T$, see equation (4.12), page 146 of DeVore and Lorentz (1993). According to Theorem 7.7.4 of DeVore and Lorentz (1993), the following lemma holds.

LEMMA 4.6.6. There exists a constant $C > 0$ such that for $0 \le k \le 2$ and $\gamma \in C^{(4)}[0,1]$,
$$\big\|\{\gamma - Q_T(\gamma)\}^{(k)}\big\|_\infty \le C\|\gamma^{(4)}\|_\infty h^{4-k}.$$

LEMMA 4.6.7. Under Assumptions (C2), (C3), (C5) and (C6), there exists an absolute constant $C > 0$ such that for the function $\tilde\gamma_\theta(u)$ in (4.6.7),
$$\sup_{\theta \in S_c^{d-1}} \Big\|\frac{d^k}{du^k}(\tilde\gamma_\theta - \gamma_\theta)\Big\|_\infty \le C\|m^{(4)}\|_\infty h^{4-k},\ a.s.,\ 0 \le k \le 2. \qquad (4.6.16)$$

PROOF. According to Lemma 2.3.3, there exists an absolute constant $C > 0$ such that
$$\sup_{\theta \in S_c^{d-1}} \|\tilde\gamma_\theta - \gamma_\theta\|_\infty \le C \sup_{\theta \in S_c^{d-1}} \inf_{\gamma} \|\gamma - \gamma_\theta\|_\infty \le C\|m^{(4)}\|_\infty h^4,\ a.s., \qquad (4.6.17)$$
which proves (4.6.16) for the case $k = 0$. Applying Lemma 4.6.6, one has for $0 \le k \le 2$
$$\sup_{\theta \in S_c^{d-1}} \Big\|\frac{d^k}{du^k}\{Q_T(\gamma_\theta) - \gamma_\theta\}\Big\|_\infty \le C \sup_{\theta \in S_c^{d-1}} \|\gamma_\theta^{(4)}\|_\infty h^{4-k} \le C\|m^{(4)}\|_\infty h^{4-k}. \qquad (4.6.18)$$
As a consequence of (4.6.17) and (4.6.18) for the case $k = 0$, one has
$$\sup_{\theta \in S_c^{d-1}} \|Q_T(\gamma_\theta) - \tilde\gamma_\theta\|_\infty \le C\|m^{(4)}\|_\infty h^4,\ a.s.,$$
which, according to the differentiation of B-splines given in de Boor (2001), entails that
$$\sup_{\theta \in S_c^{d-1}} \Big\|\frac{d^k}{du^k}\{Q_T(\gamma_\theta) - \tilde\gamma_\theta\}\Big\|_\infty \le C\|m^{(4)}\|_\infty h^{4-k},\ a.s.,\ 0 \le k \le 2. \qquad (4.6.19)$$
Combining (4.6.18) and (4.6.19) proves (4.6.16) for $k = 1, 2$. □

LEMMA 4.6.8. Under Assumptions (C1), (C2), (C4) and (C5), there exists an absolute constant $C > 0$ such that
$$\sup_{1 \le p \le d} \sup_{\theta \in S_c^{d-1}} \Big\|\frac{\partial}{\partial\theta_p}\big\{(\tilde\gamma_\theta - \gamma_\theta)(U_{\theta,i})\big\}_{i=1}^n\Big\|_\infty \le C\|m^{(4)}\|_\infty h^3,\ a.s., \qquad (4.6.20)$$
$$\sup_{1 \le p,q \le d} \sup_{\theta \in S_c^{d-1}} \Big\|\frac{\partial^2}{\partial\theta_p\partial\theta_q}\big\{(\tilde\gamma_\theta - \gamma_\theta)(U_{\theta,i})\big\}_{i=1}^n\Big\|_\infty \le C\|m^{(4)}\|_\infty h^2,\ a.s.. \qquad (4.6.21)$$

PROOF. According to the definition of $\tilde\gamma_\theta$ in (4.6.7), and the fact that $Q_T(\gamma_\theta)$ is a cubic spline on the knots $T$,
$$\big\{\{Q_T(\gamma_\theta) - \tilde\gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n = P_\theta\big\{\{Q_T(\gamma_\theta) - \gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n,$$
which entails that
$$\frac{\partial}{\partial\theta_p}\big\{\{Q_T(\gamma_\theta) - \tilde\gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n = \Big(\frac{\partial}{\partial\theta_p}P_\theta\Big)\big\{\{Q_T(\gamma_\theta) - \gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n + P_\theta\,\frac{\partial}{\partial\theta_p}\big\{\{Q_T(\gamma_\theta) - \gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n.$$
Since
$$\frac{\partial}{\partial\theta_p}\big\{\{Q_T(\gamma_\theta) - \gamma_\theta\}(U_{\theta,i})\big\}_{i=1}^n = \Big\{\Big\{Q_T\Big(\frac{\partial}{\partial\theta_p}\gamma_\theta\Big) - \frac{\partial}{\partial\theta_p}\gamma_\theta\Big\}(U_{\theta,i}) + \Big[\frac{d}{du}\{Q_T(\gamma_\theta) - \gamma_\theta\}\Big](U_{\theta,i})\, F_d'(X_{\theta,i})\, X_{i,p}\Big\}_{i=1}^n,$$
applying (4.6.19) to the decomposition above produces (4.6.20). The proof of (4.6.21) is similar. □

LEMMA 4.6.9. Under Assumptions (C3), (C5) and (C6), there exists a constant $C > 0$ such that
$$\sup_{\theta \in S_c^{d-1}} \|n^{-1}B_\theta^T\|_\infty \le Ch,\ a.s., \qquad \sup_{1 \le p \le d} \sup_{\theta \in S_c^{d-1}} \Big\|n^{-1}\frac{\partial}{\partial\theta_p}B_\theta^T\Big\|_\infty \le C,\ a.s., \qquad (4.6.22)$$
$$\sup_{\theta \in S_c^{d-1}} \|P_\theta\|_\infty \le C,\ a.s., \qquad \sup_{1 \le p \le d} \sup_{\theta \in S_c^{d-1}} \Big\|\frac{\partial}{\partial\theta_p}P_\theta\Big\|_\infty \le Ch^{-1},\ a.s.. \qquad (4.6.23)$$

PROOF. To prove (4.6.22), observe that for any vector $a \in R^n$, with probability 1,
$$\|n^{-1}B_\theta^T a\|_\infty \le \|a\|_\infty \max_{-3 \le j \le N} n^{-1}\sum_{i=1}^n B_{j,4}(U_{\theta,i}) \le Ch\|a\|_\infty.$$

LEMMA 4.6.10. Under Assumptions (C2) and (C4)-(C6), with probability 1,
$$\sup_{\theta \in S_c^{d-1}} \|n^{-1}B_\theta^T E\|_\infty = O\Big(\frac{\log n}{\sqrt{nN}}\Big), \qquad (4.6.24)$$
$$\sup_{1 \le p \le d} \sup_{\theta \in S_c^{d-1}} \Big\|\frac{\partial}{\partial\theta_p}\big(n^{-1}B_\theta^T E\big)\Big\|_\infty = O\Big(\frac{\log n}{\sqrt{nh}}\Big). \qquad (4.6.25)$$
Similarly, under Assumptions (C2), (C4)-(C6), with probability 1,
$$\sup_{\theta \in S_c^{d-1}} \max_{-3 \le j \le N} \Big|n^{-1}\sum_{i=1}^n B_{j,4}(U_{\theta,i})\{m(\mathbf{X}_i) - \gamma_\theta(U_{\theta,i})\}\Big| = O\Big(\frac{\log n}{\sqrt{nN}}\Big), \qquad (4.6.26)$$
$$\sup_{1 \le p \le d} \sup_{\theta \in S_c^{d-1}} \Big\|\frac{\partial}{\partial\theta_p}\big(n^{-1}B_\theta^T E_\theta\big)\Big\|_\infty = O(\log n/\sqrt{nh}),\ a.s.. \qquad (4.6.27)$$

PROOF. We decompose the noise variable $\varepsilon_i$ into a truncated part and a tail part, $\varepsilon_i = \varepsilon_{i,1}^{D_n} + \varepsilon_{i,2}^{D_n} + m_i^{D_n}$, where $D_n = n^\eta$ ($1/3 < \eta < 2/5$),
$$\varepsilon_{i,1}^{D_n} = \varepsilon_i I\{|\varepsilon_i| > D_n\}, \qquad \varepsilon_{i,2}^{D_n} = \varepsilon_i I\{|\varepsilon_i| \le D_n\} - m_i^{D_n}, \qquad m_i^{D_n} = E\big[\varepsilon_i I\{|\varepsilon_i| \le D_n\} \mid \mathbf{X}_i\big].$$
It is straightforward to verify that the mean of the truncated part is uniformly bounded by $D_n^{-2}$, so the boundedness of the B-spline basis and of the function $\sigma^2$ entails that
$$\sup_{\theta \in S_c^{d-1}} \Big|n^{-1}\sum_{i=1}^n B_{j,4}(U_{\theta,i})\,\sigma(\mathbf{X}_i)\, m_i^{D_n}\Big| = O(D_n^{-2}) = o(n^{-2/3}).$$
The tail part vanishes almost surely, since
$$\sum_{n=1}^{\infty} P\{|\varepsilon_n| > D_n\} \le \sum_{n=1}^{\infty} M D_n^{-3} < \infty,$$
so the Borel-Cantelli Lemma implies that, for any $k > 0$,
$$\sup_{\theta \in S_c^{d-1}} \Big|n^{-1}\sum_{i=1}^n B_{j,4}(U_{\theta,i})\,\sigma(\mathbf{X}_i)\,\varepsilon_{i,1}^{D_n}\Big| = O(n^{-k}).$$
For the truncated part, using Bernstein's inequality and discretization as in Lemma 4.6.2,
$$\sup_{\theta \in S_c^{d-1}} \max_{-3 \le j \le N} \Big|n^{-1}\sum_{i=1}^n B_{j,4}(U_{\theta,i})\,\sigma(\mathbf{X}_i)\,\varepsilon_{i,2}^{D_n}\Big| = O(\log n/\sqrt{nN}),\ a.s..$$
Therefore (4.6.24) is established, as with probability 1
$$\sup_{\theta \in S_c^{d-1}} \|n^{-1}B_\theta^T E\|_\infty = o(n^{-2/3}) + O(n^{-k}) + O(\log n/\sqrt{nN}) = O(\log n/\sqrt{nN}).$$
The proofs of (4.6.25) and (4.6.26) are similar, as $E\{m(\mathbf{X}_i) - \gamma_\theta(U_{\theta,i}) \mid U_{\theta,i}\} \equiv 0$, but no truncation is needed for (4.6.26) since $\sup_{\theta \in S_c^{d-1}} \max_{1 \le i \le n} |m(\mathbf{X}_i) - \gamma_\theta(U_{\theta,i})| \le C < \infty$. Meanwhile, to prove (4.6.27), we note that for any $p = 1, \ldots, d$,
$$\frac{\partial}{\partial\theta_p}\big(B_\theta^T E_\theta\big) = \Big\{\sum_{i=1}^n \frac{\partial}{\partial\theta_p}\big[B_{j,4}(U_{\theta,i})\{m(\mathbf{X}_i) - \gamma_\theta(U_{\theta,i})\}\big]\Big\}_{j=-3}^{N}.$$
According to (4.2.6), one has $\gamma_\theta(U_\theta) \equiv E\{m(\mathbf{X}) \mid U_\theta\}$, hence
$$E\big[B_{j,4}(U_\theta)\{m(\mathbf{X}) - \gamma_\theta(U_\theta)\}\big] \equiv 0,\quad -3 \le j \le N,\ \theta \in S_c^{d-1}.$$
Applying Assumptions (C2) and (C3), one can differentiate through the expectation, thus
$$E\Big(\frac{\partial}{\partial\theta_p}\big[B_{j,4}(U_\theta)\{m(\mathbf{X}) - \gamma_\theta(U_\theta)\}\big]\Big) \equiv 0,\quad 1 \le p \le d,\ -3 \le j \le N,\ \theta \in S_c^{d-1},$$
which allows one to apply Bernstein's inequality to obtain, with probability 1,
$$\Big\|\Big\{n^{-1}\sum_{i=1}^n \frac{\partial}{\partial\theta_p}\big[B_{j,4}(U_{\theta,i})\{m(\mathbf{X}_i) - \gamma_\theta(U_{\theta,i})\}\big]\Big\}_{j=-3}^{N}\Big\|_\infty = O\{(nh)^{-1/2}\log n\},$$
which is (4.6.27). □

LEMMA 4.6.11. Under Assumptions (C2) and (C4)-(C6), for $\hat E_\theta(u)$ in (4.6.9), one has
$$\sup_{\theta \in S_c^{d-1}} \sup_{u \in [0,1]} |\hat E_\theta(u)| = O\{(nh)^{-1/2}\log n\},\ a.s.. \qquad (4.6.28)$$

PROOF. Denote $\hat a \equiv (\hat a_{-3}, \ldots, \hat a_N)^T = (B_\theta^T B_\theta)^{-1}B_\theta^T E = V_{n,\theta}^{-1}(n^{-1}B_\theta^T E)$; then $\hat E_\theta(u) = \sum_{j=-3}^{N} \hat a_j B_{j,4}(u)$, so the order of $\hat E_\theta(u)$ is related to that of $\hat a$. In fact, by Theorem 5.4.2 in DeVore and Lorentz (1993),
$$\sup_{\theta \in S_c^{d-1}} \sup_{u \in [0,1]} |\hat E_\theta(u)| \le \sup_{\theta \in S_c^{d-1}} \|\hat a\|_\infty = \sup_{\theta \in S_c^{d-1}} \big\|V_{n,\theta}^{-1}(n^{-1}B_\theta^T E)\big\|_\infty \le CN \sup_{\theta \in S_c^{d-1}} \|n^{-1}B_\theta^T E\|_\infty,\ a.s.,$$
where the last inequality follows from (4.6.12) of Lemma 4.6.5. Applying (4.6.24) of Lemma 4.6.10 establishes (4.6.28). □

LEMMA 4.6.12. Under Assumptions (C2) and (C4)-(C6), for $\tilde E_\theta(u)$ in (4.6.8), one has
$$\sup_{\theta \in S_c^{d-1}} \sup_{u \in [0,1]} |\tilde E_\theta(u)| = O\{(nh)^{-1/2}\log n\},\ a.s.. \qquad (4.6.29)$$
The proof is similar to that of Lemma 4.6.11, thus omitted. □

The next result evaluates the uniform size of the noise derivatives.

LEMMA 4.6.13. Under Assumptions (C2)-(C6), one has with probability 1
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \max_{1 \le i \le n} \Big|\frac{\partial}{\partial\theta_p}\hat E_\theta(U_{\theta,i})\Big| = O\{(nh^3)^{-1/2}\log n\}, \qquad (4.6.30)$$
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \max_{1 \le i \le n} \Big|\frac{\partial}{\partial\theta_p}\tilde E_\theta(U_{\theta,i})\Big| = O\{(nh^3)^{-1/2}\log n\}. \qquad (4.6.31)$$

$$I_2 \le \sup_{\theta \in S_c^{d-1}} n^{-1}\sum_{i=1}^n \big\{\hat\gamma_\theta(U_{\theta,i}) - m(\mathbf{X}_i) - \sigma(\mathbf{X}_i)\varepsilon_i\big\}^2.$$
Hence $I_2 = O(n^{-1/2}h^{-1/2}\log n + h^4)$, a.s.. The lemma now follows from (4.6.37), (4.6.38) and (4.6.39) and Assumption (C6). □

LEMMA 4.6.15. Under Assumptions (C2)-(C6), one has
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \Big|\frac{\partial}{\partial\theta_p}\{\hat R(\theta) - R(\theta)\} - n^{-1}\sum_{i=1}^n \xi_{\theta,i,p}\Big| = O(n^{-1/2}),\ a.s., \qquad (4.6.40)$$
in which
$$\xi_{\theta,i,p} = 2\{\gamma_\theta(U_{\theta,i}) - Y_i\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}) - \frac{\partial}{\partial\theta_p}R(\theta), \qquad E(\xi_{\theta,i,p}) = 0. \qquad (4.6.41)$$
Furthermore, for $k = 1, 2$,
$$\sup_{\theta \in S_c^{d-1}} \Big|\frac{\partial^k}{\partial\theta_p^k}\{\hat R(\theta) - R(\theta)\}\Big| = O(n^{-1/2}h^{-1/2-k}\log n + h^{4-k}),\ a.s.. \qquad (4.6.42)$$

PROOF. Note that for any $p = 1, 2, \ldots, d$,
$$\frac{1}{2}\frac{\partial}{\partial\theta_p}\hat R(\theta) = n^{-1}\sum_{i=1}^n \{\hat\gamma_\theta(U_{\theta,i}) - Y_i\}\frac{\partial}{\partial\theta_p}\hat\gamma_\theta(U_{\theta,i}),$$
$$\frac{1}{2}\frac{\partial}{\partial\theta_p}R(\theta) = E\Big[\{\gamma_\theta(U_\theta) - m(\mathbf{X})\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\Big] = E\Big[\{\gamma_\theta(U_\theta) - m(\mathbf{X}) - \sigma(\mathbf{X})\varepsilon\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)\Big].$$
Thus $E(\xi_{\theta,i,p}) = 2E[\{\gamma_\theta(U_\theta) - Y\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_\theta)] - \frac{\partial}{\partial\theta_p}R(\theta) = 0$ and
$$\frac{1}{2}\frac{\partial}{\partial\theta_p}\{\hat R(\theta) - R(\theta)\} = (2n)^{-1}\sum_{i=1}^n \xi_{\theta,i,p} + J_{1,\theta,p} + J_{2,\theta,p} + J_{3,\theta,p}, \qquad (4.6.43)$$
with
$$J_{1,\theta,p} = n^{-1}\sum_{i=1}^n \{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\}\frac{\partial}{\partial\theta_p}(\hat\gamma_\theta - \gamma_\theta)(U_{\theta,i}),$$
$$J_{2,\theta,p} = n^{-1}\sum_{i=1}^n \{\gamma_\theta(U_{\theta,i}) - m(\mathbf{X}_i) - \sigma(\mathbf{X}_i)\varepsilon_i\}\frac{\partial}{\partial\theta_p}(\hat\gamma_\theta - \gamma_\theta)(U_{\theta,i}),$$
$$J_{3,\theta,p} = n^{-1}\sum_{i=1}^n \{\hat\gamma_\theta(U_{\theta,i}) - \gamma_\theta(U_{\theta,i})\}\frac{\partial}{\partial\theta_p}\gamma_\theta(U_{\theta,i}).$$
Bernstein's inequality implies that
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \Big|n^{-1}\sum_{i=1}^n \xi_{\theta,i,p}\Big| = O(n^{-1/2}\log n),\ a.s.. \qquad (4.6.44)$$
Meanwhile, applying (4.2.13) and (4.2.14) of Proposition 4.2.1, one obtains
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} |J_{1,\theta,p}| \le O\{(nh)^{-1/2}\log n + h^4\} \times O\{(nh^3)^{-1/2}\log n + h^3\} = O(n^{-1}h^{-2}\log^2 n + h^7),\ a.s.. \qquad (4.6.45)$$
Note that
$$J_{2,\theta,p} = -n^{-1}(E + E_\theta)^T \frac{\partial}{\partial\theta_p}\big\{(\hat\gamma_\theta - \gamma_\theta)(U_{\theta,i})\big\}_{i=1}^n.$$
Applying (4.2.13), one gets
$$\sup_{\theta \in S_c^{d-1}} \sup_{1 \le p \le d} \Big|J_{2,\theta,p} + n^{-1}(E + E_\theta)^T \frac{\partial}{\partial\theta_p}\{P_\theta(E + E_\theta)\}\Big| = O(h^3),\ a.s..$$

Figure 4.15. Example 3.6.1: Plot of the relative efficiencies of components 2 and 3. Note: the empirical efficiencies of $\hat m_\alpha(x_\alpha)$ relative to $\tilde m_\alpha(x_\alpha)$ computed by (3.6.1) based on 100 replications, $\alpha = 2, 3$.

Figure 4.16. Example 3.6.2: Plot of the relative efficiencies of components 1 and 2. Note: the empirical efficiencies of $\hat m_\alpha(x_\alpha)$ relative to $\tilde m_\alpha(x_\alpha)$ computed by (3.6.1) based on 100 replications, $\alpha = 1, 2$.

Figure 4.17. Example 3.6.2: Plot of the relative efficiencies of components 15 and 30. Note: the empirical efficiencies of $\hat m_\alpha(x_\alpha)$ relative to $\tilde m_\alpha(x_\alpha)$ computed by (3.6.1) based on 100 replications, $\alpha = 15, 30$.

Figure 4.18. Example 4.4.1: The actual bivariate surface. Note: the actual surface $m$ in model (4.4.1) with respect to $\delta = 0, 1$.

Figure 4.19. Example 4.4.1: The univariate approximation to the bivariate surface. Note: function $g$ (solid curve); estimate of $g$ (dotted curve) using $\theta_0$; estimate of $g$ (dashed curve) using $\hat\theta = (0.69016, 0.72365)^T$ for $\delta = 0$ and $(0.72186, 0.69204)^T$ for $\delta = 1$.

Figure 4.20. Example 4.4.2: The univariate approximation ($d = 10, 50$). Note: estimate of $g$ with $\hat\theta$ (dotted curve), estimate of $g$ with $\theta_0$ (dashed curve), true function $m(\mathbf{x})$ in (4.4.2) (solid curve).

Figure 4.21. Example 4.4.2: The univariate approximation ($d = 100, 200$). Note: estimate of $g$ with $\hat\theta$ (dotted curve), estimate of $g$ with $\theta_0$ (dashed curve), the true function $m(\mathbf{x})$ in (4.4.2) (solid curve).

Figure 4.22.
Example 4.4.2: Kernel density plots of the error norms. Note: the kernel density estimates of $\|\hat\theta - \theta_0\|/\sqrt{d}$ are based on 100 replications.

Figure 4.23. Example 4.4.2: Kernel density plots of the error norms. Note: the kernel density estimates of $\|\hat\theta - \theta_0\|/\sqrt{d}$ are based on 100 replications.

Figure 4.24. Time plots of the daily river flow data. Note: the first, second and third panels are flow (solid) with trend (dashed), temperature (solid) with trend (dashed) and precipitation (solid) with trend (dashed), respectively.

Figure 4.25. The fitted, residual and forecast plots of the river flow data. Note: the first is the river flow data ("+") with the SIP fitted values (line); the second is the residual plot; the third is the out-of-sample rolling forecasts (line) for the third year.

BIBLIOGRAPHY

[1] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Ann. Statist. 1 1071-1095.
[2] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. New York: Springer.
[3] Carroll, R., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92 477-489.
[4] Chen, H. (1991). Estimation of a projection-pursuit type regression model. Ann. Statist. 19 142-157.
[5] Chen, R. and Tsay, R. S. (1993). Nonlinear additive ARX models. J. Amer. Statist. Assoc. 88 956-967.
[6] Chen, R., Yang, L. and Hafner, C. (2004). Nonparametric multi-step ahead prediction in time series analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 669-686.
[7] Claeskens, G. and Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. Ann. Statist. 31 1852-1884.
[8] de Boor, C. (2001). A Practical Guide to Splines. New York: Springer.
[9] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation: Polynomials and Splines Approximation. Springer-Verlag, Berlin.
[10] Doukhan, P. (1994). Mixing: Properties and Examples. Springer-Verlag, New York.
[11] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. London: Chapman and Hall.
[12] Fan, J. and Jiang, J. (2005). Nonparametric inference for additive models. J. Amer. Statist. Assoc. 100 890-907.
[13] Fan, J., Hardle, W. and Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. Ann. Statist. 26 943-971.
[14] Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. New York: Springer.
[15] Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist. Assoc. 76 817-823.
[16] Hall, P. (1989). On projection pursuit regression. Ann. Statist. 17 573-588.
[17] Hall, P. and Titterington, D. M. (1988). On confidence bands in nonparametric density estimation and regression. J. Multivariate Anal. 27 228-254.
[18] Hardle, W. (1989). Asymptotic maximal deviation of M-smoothers. J.
Multivariate Anal. 29 163-179.
[19] Hardle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cambridge.
[20] Hardle, W., Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models. Ann. Statist. 21 157-178.
[21] Hardle, W., Hlavka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer-Verlag, Berlin.
[22] Hardle, W., Marron, J. S. and Yang, L. (1997). Discussion of "Polynomial splines and their tensor products in extended linear modeling" by Stone et al. Ann. Statist. 25 1443-1450.
[23] Hardle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by the method of average derivatives. J. Amer. Statist. Assoc. 84 986-995.
[24] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. London: Chapman and Hall.
[25] Hengartner, N. W. and Sperlich, S. (2005). Rate optimal estimation with the integration method in the presence of many covariates. J. Multivariate Anal. 95 246-272.
[26] Horowitz, J. L. and Hardle, W. (1996). Direct semiparametric estimation of single-index models with discrete covariates. J. Amer. Statist. Assoc. 91 1632-1640.
[27] Horowitz, J. and Mammen, E. (2004). Nonparametric estimation of an additive model with a link function. Ann. Statist. 32 2412-2443.
[28] Horowitz, J., Klemela, J. and Mammen, E. (2006). Optimal estimation in additive regression. Bernoulli 12 271-298.
[29] Hristache, M., Juditski, A. and Spokoiny, V. (2001). Direct estimation of the index coefficients in a single-index model. Ann. Statist. 29 595-623.
[30] Huang, J. Z. (1998). Projection estimation in multiple regression with application to functional ANOVA models. Ann. Statist. 26 242-272.
[31] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31 1600-1635.
[32] Huang, J. and Yang, L. (2004). Identification of nonlinear additive autoregressive models. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 463-477.
[33] Huber, P. J. (1985). Projection pursuit (with discussion). Ann. Statist. 13 435-525.
[34] Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics 58 71-120.
[35] Johnson, R. A. and Wichern, D. W. (1992). Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall.
[36] Klein, R. W. and Spady, R. H. (1993). An efficient semiparametric estimator for binary response models. Econometrica 61 387-421.
[37] Leadbetter, M. R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. New York: Springer.
[38] Lefohn, A. S., Husar, J. D. and Husar, R. B. (1999). Estimating historical anthropogenic global sulfur emission patterns for the period 1850-1990. Atmospheric Environment 33 3435-3444.
[39] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82 93-101.
[40] Linton, O. B. and Hardle, W. (1996). Estimating additive regression models with known links. Biometrika 83 529-540.
[41] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika 84 469-473.
[42] Maddison, A. (2003). The World Economy: Historical Statistics. Paris: OECD.
[43] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann. Statist. 27 1443-1490.
[44] Muller, H. G., Stadtmüller, U. and Schmitt, T. (1987).
Bandwidth choice and confidence intervals for derivatives of noisy data. Biometrika 74 743-749.
[45] Neumann, M. H. (1995). Automatic bandwidth choice and confidence intervals in nonparametric regression. Ann. Statist. 23 1937-1959.
[46] Neumann, M. H. (1997). Pointwise confidence intervals in nonparametric regression with heteroscedastic error structure. Statistics 29 1-36.
[47] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. Ann. Statist. 25 186-211.
[48] Pham, D. T. (1986). The mixing properties of bilinear and generalized random coefficient autoregressive models. Stochastic Anal. Appl. 23 291-300.
[49] Powell, J. L., Stock, J. H. and Stoker, T. M. (1989). Semiparametric estimation of index coefficients. Econometrica 57 1403-1430.
[50] Robinson, P. M. (1983). Nonparametric estimators for time series. J. Time Ser. Anal. 4 185-207.
[51] Rosenblatt, M. (1976). On the maximal deviation of k-dimensional density estimates. Ann. Probab. 4 1009-1015.
[52] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
[53] Sperlich, S., Tjostheim, D. and Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory 18 197-251.
[54] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689-705.
[55] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Statist. 22 118-184.
[56] Sunklodas, J. (1984). On the rate of convergence in the central limit theorem for strongly mixing random variables. Lithuanian Math. J. 24 182-190.
[57] Tjostheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: projections. J. Amer. Statist. Assoc. 89 1398-1409.
[58] Tong, H. (1990). Nonlinear Time Series: A Dynamical System Approach. Oxford, U.K.: Oxford University Press.
[59] Tong, H., Thanoon, B. and Gudmundsson, G. (1985). Threshold time series modeling of two Icelandic riverflow systems. Time Series Analysis in Water Resources, ed. K. W. Hipel, American Water Research Association.
[60] Tusnady, G. (1977). A remark on the approximation of the sample df in the multidimensional case. Period. Math. Hungar. 8 53-55.
[61] Wang, L. and Yang, L. (2007). Spline-backfitted kernel smoothing of nonlinear additive autoregression model. Ann. Statist. Forthcoming.
[62] Xia, Y. (1998). Bias-corrected confidence bands in nonparametric regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 797-811.
[63] Xia, Y. and Li, W. K. (1999). On single-index coefficient regression models. J. Amer. Statist. Assoc. 94 1275-1285.
[64] Xia, Y., Li, W. K., Tong, H. and Zhang, D. (2004). A goodness-of-fit test for single-index models. Statist. Sinica 14 1-39.
[65] Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410.
[66] Xue, L. and Yang, L. (2006 a). Estimation of semiparametric additive coefficient model. J. Statist. Plann. Inference 136 2506-2534.
[67] Xue, L. and Yang, L. (2006 b). Additive coefficient modeling via polynomial spline. Statistica Sinica 16 1423-1446.
[68] Yang, L., Hardle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with multiplicative volatility and additive mean. J. Time Ser. Anal. 20 579-604.
[69] Yang, L., Sperlich, S. and Hardle, W. (2003). Derivative estimation and testing in generalized additive models. J. Statist. Plann.
Inference 115 521-542.
[70] Zhang, F. (1999). Matrix Theory: Basic Results and Techniques. New York: Springer.
[71] Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics of regression splines and confidence regions. Ann. Statist. 26 1760-1782.