Michigan State University

This is to certify that the dissertation entitled

Non- and semiparametric modeling of financial and macro-economic time series

presented by Rong Liu has been accepted towards fulfillment of the requirements for the Ph.D. degree in Statistics.

Major Professor's Signature
3/11/09
Date

MSU is an Affirmative Action/Equal Opportunity Institution

NON- AND SEMIPARAMETRIC MODELING OF FINANCIAL AND MACRO-ECONOMIC TIME SERIES

By

Rong Liu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Statistics

2009

ABSTRACT

Nonlinear time series analysis has gained much attention in recent years, due primarily to the fact that linear time series models have encountered various limitations in real applications and that developments in nonparametric regression have established a solid foundation for nonlinear time series analysis. For example, the effect of technology on economic growth and the volatility of exchange returns follow nonlinear rather than simple linear prediction formulas. Effective tools for extracting information from such complex regression data have to be nonparametric in nature.

A smooth kernel estimator is proposed for the multivariate cumulative distribution function (cdf) in Chapter 2, extending the work of Yamato (1973) on univariate distribution function estimation.
Under assumptions of strict stationarity and geometrically strong mixing, we establish that the proposed estimator follows the same pointwise asymptotically normal distribution as the empirical cdf, while the new estimator is a smooth function rather than a step function like the empirical cdf. We also show that under stronger assumptions the smooth kernel estimator has asymptotically smaller mean integrated squared error than the empirical cdf, and converges to the true cdf uniformly almost surely at a rate of $n^{-1/2} \log n$. Simulated examples are provided to illustrate the theoretical properties. Using the smooth estimator, survival curves are given for real data applications.

The "curse of dimensionality" is a significant obstacle in high dimensional time series analysis, see Fan and Yao (2003). Several high dimensional data analysis techniques have been proposed to deal with this problem, and Xia, Tong, Li and Zhu (2002) pointed out that there are essentially two approaches: function approximation and dimension reduction. The GARCH model, the Additive Coefficient Model (ACM) and the Generalized Additive Model (GAM) are good examples to represent these two approaches.

In Chapter 3, a cubic spline regression procedure is proposed to estimate the unknowns in the semiparametric GARCH model. The procedure is intuitively appealing due to its simplicity and, as such, can be used by non-experts. The theoretical properties of the procedure are the same as those of the kernel procedure in Yang (2006), and simulated and real data examples show that the numerical performance is also comparable to the kernel method. The new method is computationally much more efficient and very useful for analyzing financial time series data.

In Chapter 4, a spline-backfitted kernel estimator is proposed for estimating the unknown component functions $m_{\alpha l}$, based on a geometrically strong mixing sample following model (1.3.1) under minimal smoothness assumptions.
The idea is to employ one step backfitting after the spline pilot estimators, and then follow up with kernel smoothing, which combines the fast computing of polynomial spline smoothing and the good asymptotic property of kernel smoothing. Thus, the spline-backfitted kernel estimator is both computationally expedient for analyzing very high dimensional time series, and theoretically reliable to make inference on the component functions with confidence.

In Chapter 5, a spline-backfitted kernel (SBK) estimator with oracle efficiency is proposed for Generalized Additive Model time series data. It is both computationally expedient and theoretically reliable, and simulation evidence strongly corroborates the asymptotic theory.

ACKNOWLEDGMENTS

I would like to thank the many people who have helped me on the path towards this dissertation. First and foremost, I would like to express my gratitude to my advisor, Professor Lijian Yang. I could never have reached the heights or explored the depths without his generous help, unbreakable support and patient guidance.

I also wish to express my gratitude to my dissertation committee, Professor Dennis Gilliland, Professor Lyudmila Sakhanenko, Professor Emma Iglesias, Professor Yiming Xiao and Professor Richard Baillie, for sparing their precious time to serve on my committee and giving valuable comments and suggestions.

I am grateful to the entire faculty and staff in the Department of Statistics and Probability who have taught me and assisted me during my study at MSU. Special thanks are given to Professor James Stapleton, Professor Connie Page and Professor Raoul LePage for their help, constant support and encouragement.

Thanks to the graduate school and the Department of Statistics, who provided me with the Dissertation Completion Fellowship (2009), Summer Support Fellowship (2008) and Stapleton Fellowship for working on this dissertation.
This dissertation is also supported in part by NSF awards DMS 0405330 and 0706518.

Last but not least, I would like to thank four of my academic sisters: Dr. Jing Wang, Dr. Li Wang, Qiongxia Song and Shujie Ma for their generous help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
  1.1 Nonlinear Time Series Prediction Model
  1.2 Semiparametric GARCH Model
  1.3 Additive Coefficient Model (ACM)
  1.4 Generalized Additive Model (GAM)
  1.5 Polynomial Spline Smoothing

2 Kernel estimation of multivariate cumulative distribution
  2.1 Introduction
  2.2 Asymptotic Results
  2.3 Bandwidth Selection
  2.4 Examples
    2.4.1 A simulated example
    2.4.2 GDP growth and unemployment
  2.5 Appendix
    2.5.1 Preliminaries
    2.5.2 Proofs of Theorems 2.2.1 and 2.2.2
    2.5.3 Proof of Theorem 2.2.3

3 Spline estimation of a semiparametric GARCH model
  3.1 Introduction
  3.2 Estimation Method
  3.3 Implementation
  3.4 Simulation
  3.5 Applications
  3.6 Appendix
    3.6.1 Preliminaries
    3.6.2 Proof of Proposition 3.6.1
    3.6.3 Proof of Proposition 3.2.1

4 Spline-backfitted kernel smoothing of additive coefficient model
  4.1 Introduction
  4.2 Assumptions
  4.3 Oracle Smoothers
  4.4 Spline-backfitted Kernel Estimators
    4.4.1 Decomposition
  4.5 Implementation
  4.6 Examples
    4.6.1 Simulated example
    4.6.2 Real data example
  4.7 Appendix
    4.7.1 Preliminaries
    4.7.2 Oracle smoothers
    4.7.3 Estimation of constants
    4.7.4 Estimation of function components

5 Spline-backfitted kernel smoothing of generalized additive model
  5.1 Introduction
  5.2 Oracle Smoothers
  5.3 Spline-backfitted Kernel Estimators
  5.4 Implementation
  5.5 Examples
    5.5.1 Simulation 1
    5.5.2 Simulation 2
  5.6 Appendix
    5.6.1 Preliminaries
    5.6.2 Oracle smoothers
    5.6.3 Spline backfitted kernel estimators

BIBLIOGRAPHY

LIST OF TABLES

1 Simulated example 2.4.1
2 Simulated example 3.4.1
3 Simulated example 3.4.1
4 Fitting DEM/GBP returns
5 Fitting DEM/USD returns
6 Residual check for fitting DEM/GBP returns
7 Residual check for fitting DEM/USD returns
8 Simulated example 4.6.1
9 Simulated example 5.5.1
10 Simulated example 5.5.1
11 Simulated example 5.5.2
12 Simulated example 5.5.2

LIST OF FIGURES

1 ACF plot of GDP quarterly growth rate.
2 Timeplot of GDP quarterly growth rate.
3 ACF plot of unemployment quarterly growth rate.
4 Timeplot of unemployment quarterly growth rate.
5 Survival curves of GDP growth rate conditional on unemployment growth rate.
6 Plot of densities of d.
7 Residuals of DEM/USD daily returns.
8 Estimated function m for the semiparametric GARCH model.
9 Errors of GDP forecasts.
10 Estimation of function $c_1 + m_{\mathrm{SBLL},41}(\cdot)$.
11 A typical estimator of mu based on n = 500 observations.
12 GDP growth rate: dotted line; estimated TFP growth rate: solid line.
13 Plot of empirical distribution of relative efficiency: r = 0, a = 0.
14 Plot of empirical distribution of relative efficiency: r = 0, a = 0.5.
15 Plot of empirical distribution of relative efficiency: r = 0.5, a = 0.
16 Plot of empirical distribution of relative efficiency: r = 0.5, a = 0.5.
17 Plot of function estimation for r = 0, a = 0: n = 500.
18 Plot of function estimation for r = 0, a = 0: n = 1000.
19 Plot of function estimation for r = 0, a = 0: n = 1500.
20 Plot of function estimation for r = 0, a = 0: n = 2000.
21 Plot of function estimation for r = 0.5, a = 0.5: n = 500.
22 Plot of function estimation for r = 0.5, a = 0.5: n = 1000.
23 Plot of function estimation for r = 0.5, a = 0.5: n = 1500.
24 Plot of function estimation for r = 0.5, a = 0.5: n = 2000.

CHAPTER 1

Introduction

1.1 Nonlinear Time Series Prediction Model

Nonlinear time series analysis has gained much attention in recent years, due primarily to the fact that linear time series models have encountered various limitations in real applications and that developments in nonparametric regression have established a solid foundation for nonlinear time series analysis. For example, the effect of technology on economic growth and the volatility of exchange returns follow nonlinear rather than simple linear prediction formulas. Effective tools for extracting information from such complex regression data have to be nonparametric in nature. I view this line of research as developing theory that is motivated and influenced by applications.

A typical nonparametric problem in time series analysis is the classical decomposition of a realization of a time series into a slowly changing function known as a "trend component", or simply trend, a periodic function referred to as a "seasonal component", and finally a "random noise component", which in terms of regression theory should be called the time series of residuals. In time series analysis, smoothing problems occur of course in the spectral domain when we want to estimate the spectral density, e.g. for model fitting. In the time domain, nonparametric prediction is one of the fields where smoothing methods are intensively used.

Two very popular forms of nonparametric regression are kernel/local polynomial type and spline type smoothing. In this work, polynomial spline smoothing is extensively studied for nonlinear time series.
The greatest advantages of spline smoothing, as pointed out in Huang and Yang (2004) and Xue and Yang (2006b), are its simplicity and fast computation. But spline smoothing also has disadvantages, such as the lack of a limiting distribution. So the combination of kernel/local polynomial and spline smoothing is studied in Chapters 4 and 5.

The "curse of dimensionality" is a significant obstacle in high dimensional time series analysis, see Fan and Yao (2003). Several high dimensional data analysis techniques have been proposed to deal with this problem, and Xia, Tong, Li and Zhu (2002) pointed out that there are essentially two approaches: function approximation and dimension reduction. The GARCH model and the Generalized Additive Model (GAM) are good examples to represent these two approaches.

1.2 Semiparametric GARCH Model

In the study of many financial time series such as foreign exchange returns, it is a known fact that the return itself cannot be predicted. It is the forecasting of the returns' volatility that is of special interest. Empirical evidence has led to the understanding that for such series, the volatility often depends on infinitely many past returns with diminishing weights. The GARCH(p, q) model of Bollerslev (1986), for example, allows the volatility function to depend on all past observations, with geometrically decaying rate. As a special case, the GARCH(1, 1) model describes a process $\{Y_t\}_{t=-\infty}^{\infty}$ of the form

$$Y_t = \sigma_t \xi_t, \quad t \in Z = \{\ldots, -2, -1, 0, 1, 2, \ldots\},$$

where the innovations $\{\xi_t\}_{t \in Z}$ are i.i.d. random variables satisfying $E(\xi_t) = 0$, $E(\xi_t^2) = 1$, and $\{\sigma_t^2\}_{t=-\infty}^{\infty}$ denotes the conditional volatility series $\sigma_t^2 = \mathrm{var}(Y_t \mid Y_{t-1}, Y_{t-2}, \ldots)$, i.e., for some $\omega, \beta_0, \alpha_0 > 0$, $\alpha_0 + \beta_0 < 1$,

$$\sigma_t^2 = \omega + \beta_0 \sigma_{t-1}^2 + \alpha_0 Y_{t-1}^2, \quad t \in Z. \qquad (1.2.1)$$

Engle and Ng (1993), Glosten, Jagannathan and Runkle (1993), Hentschel (1995), Duan (1997), Hafner and Herwartz (2006) and Hafner (2008) examined various useful extensions of model (1.2.1), mostly providing empirical evidence without establishing asymptotic results. For related theoretical works on the GARCH model, see Peng and Yao (2003), Sun and Stengos (2006) and Chan, Deng, Peng and Xia (2007).

In recent years, there has been a surge of interest in applying nonparametric smoothing theory to volatility estimation, as in Yang, Härdle and Nielsen (1999), Dahl and Levine (2006), Levine (2006), Brown and Levine (2007). In particular, Hafner (1998) proposed an iterative algorithm for a nonparametric GARCH model of the form $\sigma_t^2 = \sum_{j=1}^{\infty} \ldots\, m(Y_{t-j})$, $t \in Z$, [...] with $B_{j,1}(u) = 1$ for $u \in J_j$ and $0$ otherwise.

In Chapters 3, 4 and 5, spline smoothing is applied under different conditions.

CHAPTER 2

Kernel estimation of multivariate cumulative distribution

2.1 Introduction

This chapter is based on Liu and Yang (2008). The estimation of probability density functions (pdf's) and cumulative distribution functions (cdf's) occupies a central place in applied data analysis in the social sciences. While many statisticians and econometricians are familiar with various smooth nonparametric estimators of pdf's, the smooth estimation of cdf's has not been investigated as much, see Li and Racine (2007), Sections 1.4 and 1.5.

To properly define the problem, let $\{X_i = (X_{i1}, \ldots, X_{id})^T\}_{i=1}^{n}$ be a geometrically $\alpha$-mixing and strictly stationary sequence of $d$-dimensional variables, with a common probability density function $f \in C^{(p+1)}(R^d)$ and cumulative distribution function $F \in C^{(p+d+1)}(R^d)$, in which $p$ is an odd integer. Traditionally, $F$ is estimated by the empirical cumulative distribution function $F_n(x) = n^{-1} \sum_{i=1}^{n} I\{X_i \le x\}$, whose theoretical properties are well known. One obvious drawback of $F_n$ is that it is a step function even when the true cdf $F$ is smooth.
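The step-function behaviour of the empirical cdf is easy to see numerically. The following is a minimal sketch, not taken from the dissertation: the function name `empirical_cdf` and the four toy data points are illustrative assumptions, and the inequality is applied componentwise as in the definition above.

```python
import numpy as np

def empirical_cdf(sample, x):
    """Multivariate empirical cdf: fraction of rows X_i with X_i <= x componentwise."""
    sample = np.asarray(sample, dtype=float)   # shape (n, d)
    x = np.asarray(x, dtype=float)             # shape (d,)
    return float(np.mean(np.all(sample <= x, axis=1)))

# Toy bivariate sample (illustrative values only, not data from the text).
X = np.array([[0.1, 0.5], [0.4, 0.2], [0.7, 0.9], [0.3, 0.3]])

print(empirical_cdf(X, [0.35, 0.6]))  # prints 0.5: two of the four points lie below (0.35, 0.6)
print(empirical_cdf(X, [0.39, 0.6]))  # prints 0.5 again: the estimate is flat between data points
```

Moving the evaluation point without crossing a data coordinate leaves the value unchanged, which is exactly the lack of smoothness that motivates the kernel estimator.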
Yamato (1973) proposed a smooth estimator of $F$ by integrating a kernel density estimator of the density $f$. To be precise, define the following kernel estimator of $F$:

$$\hat F(x) = \hat F_h(x) = \int_{-\infty}^{x} \hat f(u)\,du = n^{-1} \sum_{i=1}^{n} \int_{-\infty}^{x} K_h(X_i - u)\,du, \quad \forall x \in R^d, \qquad (2.1.1)$$

where $\hat f(u)$ is the standard $d$-dimensional kernel density estimator (kde) of $f(u)$ (see Bickel and Rosenblatt, 1973)

$$\hat f(u) = n^{-1} \sum_{i=1}^{n} K_h(X_i - u), \quad K_h(u) = \prod_{\alpha=1}^{d} \frac{1}{h_\alpha} K\!\left(\frac{u_\alpha}{h_\alpha}\right), \quad u = (u_1, \ldots, u_d)^T,$$

in which $h = (h_1, \ldots, h_d)^T$ are positive numbers depending on the sample size $n$, called bandwidths.

Theoretical properties of $\hat F(x)$ as an estimator of the unknown true distribution function $F(x)$ have been investigated by several authors for the case of $d = 1$ and under i.i.d. assumptions, see for example Yamato (1973), Reiss (1981), Falk (1983) and more recently Cheng and Peng (2002). For feasible econometric applications of univariate kernel estimation of the cumulative distribution function, such as the testing of stochastic dominance, see Li and Racine (2007), page 23, and the references therein.

In this chapter, we examine, under a strong mixing assumption and for arbitrary dimension $d$, the local property of $\hat F(x)$ in terms of pointwise asymptotic distribution and its global property in terms of mean integrated squared error (MISE) and maximal deviation. We prove that the smooth estimator $\hat F(x)$ behaves asymptotically the same as the empirical cdf $F_n(x)$ at any point $x$, obtain its asymptotic mean integrated squared error (AMISE) and establish its uniform almost sure convergence rate.

The rest of the chapter is organized as follows. In Section 2.2, we give Theorems 2.2.1, 2.2.2 and 2.2.3, the main results on pointwise, mean integrated squared and uniform asymptotics.
In Section 2.3, we describe a data-driven rule to select the asymptotically optimal bandwidths $h$, which makes the MISE of the smooth estimator $\hat F$ asymptotically smaller than that of the empirical cdf $F_n$ according to Theorem 2.3.2, another compelling reason, besides smoothness, to prefer $\hat F$ over $F_n$. In Section 2.4, we present Monte Carlo evidence that corroborates the theory and illustrate the use of $\hat F$ with two real data examples. The first real data example illustrates the stochastic dependence of GDP growth rate on unemployment growth rate in the US economy. The second example shows that gold and silver are substitute goods and their prices are strongly associated. All technical proofs are in the Appendix.

2.2 Asymptotic Results

Throughout this chapter, we denote $h_{\max} = \max(h_1, \ldots, h_d)$, $h_{\mathrm{prod}} = h_1 \times \cdots \times h_d$, and for any $x \in R$, $\tilde K(x) = \int_{-\infty}^{x} K(u)\,du$, where $K$ is a kernel function in Assumption (A4). For any vector $x = (x_1, \ldots, x_d)^T$, $\tilde K(x) = \prod_{\alpha=1}^{d} \tilde K(x_\alpha)$. Then $\tilde K(x) \equiv 0$ unless $x \ge -1$ and $\tilde K(x) \equiv 1$ if $x \ge 1$, where for any two vectors $x = (x_1, \ldots, x_d)^T$, $y = (y_1, \ldots, y_d)^T$, $x \ge y$ if and only if $x_\alpha \ge y_\alpha$, $\forall \alpha = 1, \ldots, d$. It is easily verified that $\int_{-1}^{1} \tilde K(w)\,dw = 1$. We also denote

$$\mu_{p+1}(K) = \int_{-1}^{1} K(u)\,u^{p+1}\,du, \quad D(K) = 1 - \int_{-1}^{1} \tilde K^2(w)\,dw.$$

For any vector $x = (x_1, \ldots, x_d)^T$ and $\forall \alpha = 1, \ldots, d$, we denote $x_{-\alpha} = (x_1, \ldots, x_{\alpha-1}, x_{\alpha+1}, \ldots, x_d)^T$ and, with slight abuse of notation, write $x = (x_\alpha, x_{-\alpha})^T$.

We list below some basic assumptions.

(A1) The cumulative distribution function $F \in C^{(p+d+1)}(R^d)$, in which $p$ is an odd integer, while all $(p+d+1)$-th partial derivatives of $F$ belong to $L_1(R^d)$ and $\sup_{x \in R^d} |f(x)| < \infty$.

(A2) There exist positive constants $K_0$ and $\lambda_0$ such that $\alpha(k) \le K_0 \exp(-\lambda_0 k)$ holds for all $k$, where the $k$-th order strong mixing coefficient of the strictly stationary process $\{X_s\}_{s=-\infty}^{\infty}$ is defined as

$$\alpha(k) = \sup_{B \in \sigma\{X_s, s \le t\},\ C \in \sigma\{X_s, s \ge t+k\}} |P(B \cap C) - P(B)P(C)|, \quad k \ge 1.$$

(A3) As $n \to \infty$: $n h_{\mathrm{prod}} \to \infty$, $n^{1/2} h_{\mathrm{prod}} / (\log n)^{1/2} + n^{1/2} h_{\max}^{p+1} \to 0$.

(A4) The univariate kernel function $K(\cdot)$ is of $(p+1)$-th order, supported on $[-1, 1]$, and Lipschitz continuous.

Assumptions (A1) to (A4) are all typical conditions in the time series smoothing literature, see Bosq (1998), Chapter 2, for similar or even stronger assumptions. Elementary arguments show that $D(K) > 0$ under Assumption (A4).

The following theorem concerns the asymptotic distribution of $\hat F$ given in (2.1.1) at any $x \in R^d$.

THEOREM 2.2.1. Under Assumptions (A1)-(A4), $\forall x \in R^d$, as $n \to \infty$,

$$\sqrt{n V^{-1}(x)}\,\{\hat F(x) - F(x)\} \to_d N(0, 1),$$

where

$$V(x) = \sum_{l=-\infty}^{\infty} \gamma(l), \quad \gamma(l) = E\,I\{X_i \le x\}\,I\{X_{i+l} \le x\} - F^2(x).$$

Theorem 2.2.1 shows that the smooth estimator $\hat F(x)$ has asymptotically the same distribution as the empirical cdf $F_n(x)$. In particular, for an i.i.d. process $\{X_s\}$, $s = -\infty, \ldots, \infty$, the asymptotic variance function $V(x)$ reduces to the more familiar form of $\gamma(0) = F(x)\{1 - F(x)\}$.

The global performance of $\hat F(x)$ as an estimator of $F(x)$ can be measured in terms of Mean Integrated Squared Error (MISE) and maximal deviation

$$\mathrm{MISE}(\hat F) = \mathrm{MISE}(\hat F; h) = E \int \{\hat F(x) - F(x)\}^2\,dF(x), \qquad (2.2.1)$$

$$D_n(\hat F) = D_n(\hat F; h) = \sup_{x \in R^d} |\hat F(x) - F(x)|. \qquad (2.2.2)$$

The next two theorems give the asymptotic formula of $\mathrm{MISE}(\hat F)$ and the almost sure rate of $D_n(\hat F)$.

THEOREM 2.2.2. Under Assumptions (A1)-(A4), as $n \to \infty$,

$$\mathrm{MISE}(\hat F; h) = \mathrm{AMISE}(\hat F; h) + o\left(h_{\max}^{2p+2} + n^{-1} h_{\max}\right),$$

in which the Asymptotic Mean Integrated Squared Error (AMISE) is

$$\mathrm{AMISE}(\hat F; h) = \frac{\int V(x)\,dF(x)}{n} + \frac{\mu_{p+1}^2(K)}{(p+1)!^2} \sum_{\alpha,\beta=1}^{d} h_\alpha^{p+1} h_\beta^{p+1} B_{\alpha\beta,p+1}(F) - \frac{D(K)}{n} \sum_{\alpha=1}^{d} h_\alpha C_\alpha(F)$$

with

$$B_{\alpha\beta,p+1}(F) = \int \frac{\partial^{p+1} F(x)}{\partial x_\alpha^{p+1}} \frac{\partial^{p+1} F(x)}{\partial x_\beta^{p+1}}\,dF(x), \quad C_\alpha(F) = \int \frac{\partial F(x)}{\partial x_\alpha}\,dF(x), \quad \forall \alpha, \beta = 1, \ldots, d.$$

THEOREM 2.2.3. Under Assumptions (A1)-(A4), as $n \to \infty$,

$$D_n(\hat F) = O_{a.s.}\left(n^{-1/2} \log n\right),$$

while for i.i.d. $X_1, \ldots, X_n$, $D_n(\hat F) = O_{a.s.}\left(n^{-1/2} (\log n)^{1/2}\right)$.

The first term $n^{-1} \int V(x)\,dF(x)$ in the formula of $\mathrm{AMISE}(\hat F; h)$ is the exact MISE of the empirical cdf $F_n$.
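As a quick check of this leading term, consider the simplest setting, which is an assumption made here only for illustration: $d = 1$, i.i.d. observations and a continuous $F$. Then $V(x) = F(x)\{1 - F(x)\}$, as noted after Theorem 2.2.1, and the substitution $u = F(x)$ gives

$$n^{-1} \int V(x)\,dF(x) = n^{-1} \int_0^1 u(1 - u)\,du = \frac{1}{6n},$$

so with $n = 100$ the empirical cdf has MISE $1/600 \approx 0.0017$ in this special case, whatever the continuous $F$ is.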
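We are unaware of any published results on the MISE or the strong uniform rate of convergence for smooth estimation of a multivariate distribution function based on strongly mixing data, as in Theorems 2.2.2 and 2.2.3. Since Assumptions (A1) to (A4) are mild, we believe that these strong theoretical results hold for most multiple time series data with continuous distributions. In the next section we describe how Theorem 2.2.2 is used to compute a data-driven bandwidth for implementing the smoothed estimator $\hat F$.

2.3 Bandwidth Selection

To have insight into the minimization of $\mathrm{AMISE}(\hat F; h)$ given in Theorem 2.2.2, define a function $Q$ on $R_+^d \times M_+(d) \times R_+^d$ for elementwise positive vectors $v = (v_1, \ldots, v_d)^T$, $a = (a_1, \ldots, a_d)^T \in R_+^d = (0, +\infty)^d$ and $M = (M_{\alpha\beta})_{\alpha,\beta=1}^{d} \in M_+(d)$, the set of all positive definite $d \times d$ matrices:

$$Q(v, M, a) = \sum_{\alpha,\beta=1}^{d} v_\alpha v_\beta M_{\alpha\beta} - \sum_{\alpha=1}^{d} a_\alpha v_\alpha^{1/(p+1)} = v^T M v - a^T v^{1/(p+1)},$$

in which $v^{1/(p+1)} = \left(v_1^{1/(p+1)}, \ldots, v_d^{1/(p+1)}\right)^T$. In the following, we denote for any $d$-dimensional vector $a = (a_1, \ldots, a_d)^T$ the $d \times d$ diagonal matrix whose $(\alpha, \alpha)$-th element is $a_\alpha$, $\alpha = 1, \ldots, d$, as $\mathrm{diag}(a)$. The following theorem is easily proved in a manner similar to Yang and Tschernig (1999).

THEOREM 2.3.1. (i) The gradient and Hessian matrices of $Q(v, M, a)$ with respect to $v$ are

$$\frac{\partial Q}{\partial v} = 2Mv - \frac{1}{p+1}\,\mathrm{diag}(a)\,v^{-p/(p+1)}, \quad \frac{\partial^2 Q}{\partial v\,\partial v^T} = 2M + \frac{p}{(p+1)^2}\,\mathrm{diag}(a)\,\mathrm{diag}\left(v^{-(2p+1)/(p+1)}\right).$$

(ii) For any $M \in M_+(d)$ and $a \in R_+^d$, $Q(v, M, a)$ has a unique minimizer $v(M, a) \in R_+^d$, and $Q(v(M, a), M, a) < 0$.

(iii) For any constants $c_M, c_a > 0$,

$$Q\left(v(c_M M, c_a a), c_M M, c_a a\right) = c_a^{(2p+2)/(2p+1)}\,c_M^{-1/(2p+1)}\,Q\left(v(M, a), M, a\right),$$

$$v(c_M M, c_a a) = c_a^{(p+1)/(2p+1)}\,c_M^{-(p+1)/(2p+1)}\,v(M, a).$$

To make use of Theorem 2.3.1, we make an additional assumption on $F$.

(A5) The matrix $B_{p+1}(F) = \{B_{\alpha\beta,p+1}(F)\}_{\alpha,\beta=1}^{d} \in M_+(d)$ and the vector $C(F) = (C_\alpha(F))_{\alpha=1}^{d} \in R_+^d$.

Theorem 2.2.2, Theorem 2.3.1 (ii) and Assumption (A5) ensure the existence of a unique optimal bandwidth $h_{\mathrm{opt}}$ that minimizes

$$\mathrm{AMISE}(\hat F; h) = n^{-1} \int V(x)\,dF(x) + Q\left(h^{p+1}, \frac{\mu_{p+1}^2(K)}{(p+1)!^2} B_{p+1}(F), n^{-1} D(K)\,C(F)\right).$$

Theorem 2.3.1 (iii) then implies that

$$h_{\mathrm{opt}} = h_{\mathrm{opt}}(n, K, F) = v^{1/(p+1)}\left(\frac{\mu_{p+1}^2(K)}{(p+1)!^2} B_{p+1}(F), n^{-1} D(K)\,C(F)\right) = n^{-1/(2p+1)} \left\{\frac{\mu_{p+1}^2(K)}{D(K)\,(p+1)!^2}\right\}^{-1/(2p+1)} v^{1/(p+1)}\left(B_{p+1}(F), C(F)\right).$$

Thus to obtain the optimal bandwidth $h_{\mathrm{opt}}$, one computes exactly the factors involving $n$ and $K$ in the above expression, and estimates the following factor

$$\theta = \theta(F) = (\theta_1, \ldots, \theta_d)^T = (\theta_1(F), \ldots, \theta_d(F))^T = v^{1/(p+1)}\left(B_{p+1}(F), C(F)\right).$$

The next theorem follows from the negativity result in Theorem 2.3.1 (ii):

THEOREM 2.3.2. Under Assumptions (A1)-(A5), $\hat F$ has asymptotically smaller MISE than the empirical cdf $F_n$. Specifically, $\mathrm{MISE}(F_n) = n^{-1} \int V(x)\,dF(x)$ and as $n \to \infty$

$$\mathrm{MISE}(\hat F; h_{\mathrm{opt}}) = \mathrm{MISE}(F_n) + n^{-(2p+2)/(2p+1)}\,C(K, F) + o\left(n^{-(2p+2)/(2p+1)}\right),$$

$$C(K, F) = \left\{\frac{D(K)^{-(2p+2)}\,\mu_{p+1}^2(K)}{(p+1)!^2}\right\}^{-1/(2p+1)} Q\left(v(B_{p+1}(F), C(F)), B_{p+1}(F), C(F)\right) < 0.$$

Following Yang and Tschernig (1999), we define a plug-in asymptotic optimal bandwidth

$$\hat h_{\mathrm{opt}} = \left\{\frac{n\,\mu_{p+1}^2(K)}{D(K)\,(p+1)!^2}\right\}^{-1/(2p+1)} v^{1/(p+1)}\left(\hat B_{p+1}(F), \hat C(F)\right),$$

in which the plug-in estimator of the unknown parameter $\theta$, $\hat\theta = v^{1/(p+1)}\left(\hat B_{p+1}(F), \hat C(F)\right)$, is computed by the Newton-Raphson method using the gradient and Hessian formulae of Theorem 2.3.1, and where the plug-in estimators of the unknown matrices $B_{p+1}(F) = \{B_{\alpha\beta,p+1}(F)\}_{\alpha,\beta=1}^{d}$, $C(F)$ are

$$\hat B_{p+1}(F) = \{\hat B_{\alpha\beta,p+1}(F)\}_{\alpha,\beta=1}^{d}, \quad \hat C(F) = \{\hat C_\alpha(F)\}_{\alpha=1}^{d},$$

$$\hat B_{\alpha\beta,p+1}(F) = n^{-1} \sum_{j=1}^{n} \left\{n^{-1} \sum_{i=1}^{n} K_{g_\alpha}^{(p)}(X_{j\alpha} - X_{i\alpha}) \prod_{\gamma=1, \gamma \ne \alpha}^{d} \int_{-\infty}^{X_{j\gamma}} K_{g_\gamma}(x_\gamma - X_{i\gamma})\,dx_\gamma\right\} \times \left\{n^{-1} \sum_{i=1}^{n} K_{g_\beta}^{(p)}(X_{j\beta} - X_{i\beta}) \prod_{\gamma=1, \gamma \ne \beta}^{d} \int_{-\infty}^{X_{j\gamma}} K_{g_\gamma}(x_\gamma - X_{i\gamma})\,dx_\gamma\right\},$$

$$\hat C_\alpha(F) = n^{-1} \sum_{j=1}^{n} \left\{n^{-1} \sum_{i=1}^{n} K_{g_\alpha}(X_{j\alpha} - X_{i\alpha}) \prod_{\gamma=1, \gamma \ne \alpha}^{d} \int_{-\infty}^{X_{j\gamma}} K_{g_\gamma}(x_\gamma - X_{i\gamma})\,dx_\gamma\right\}.$$

The pilot bandwidth vector $g = (g_1, \ldots, g_d)^T$ is the simple rule-of-thumb bandwidth for multivariate density estimation in Scott (1992).

In the next section, we present Monte Carlo evidence for Theorems 2.2.2 and 2.2.3, and illustrate the use of the smooth estimator $\hat F(x)$ with real data examples. In all computing, we use the quartic kernel $K(u) = (15/16)(1 - u^2)^2 I(|u| \le 1)$ with $p = 1$ and the plug-in bandwidth $\hat h_{\mathrm{opt}}$ described above. We have not experimented with other choices of $K$ and $p$ due to limits of space and because these choices are in general not as crucial as that of the bandwidth, see Fan and Yao (2003).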
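For readers who want to experiment, the smooth estimator (2.1.1) with the quartic kernel can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the dissertation's implementation: the function names `quartic_cdf` and `kernel_cdf` are hypothetical, the data are simulated standard normals, and the bandwidth is hand-picked rather than the plug-in bandwidth of this section. Because the quartic kernel is symmetric, the integral of the product kernel up to a point factorizes into a product of integrated univariate kernels, and the integrated quartic kernel has the closed form $1/2 + (15/16)(t - 2t^3/3 + t^5/5)$ on $[-1, 1]$.

```python
import numpy as np

def quartic_cdf(t):
    """Integral of the quartic kernel 15/16 (1 - u^2)^2 from -1 to t.

    Arguments below -1 give 0 and arguments above 1 give 1 (handled by clipping).
    """
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    return 0.5 + (15.0 / 16.0) * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)

def kernel_cdf(sample, x, h):
    """Smooth cdf estimate at x: mean over i of prod_a quartic_cdf((x_a - X_ia) / h_a)."""
    sample = np.asarray(sample, dtype=float)   # shape (n, d)
    x = np.asarray(x, dtype=float)             # shape (d,)
    h = np.asarray(h, dtype=float)             # shape (d,)
    return float(np.mean(np.prod(quartic_cdf((x - sample) / h), axis=1)))

# Simulated data and a hand-picked bandwidth (both are assumptions for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
h = np.array([0.3, 0.3])

print(kernel_cdf(X, [0.0, 0.0], h))    # smooth estimate of F(0, 0)
print(kernel_cdf(X, [10.0, 10.0], h))  # far in the upper tail the estimate reaches 1
```

Replacing the hand-picked `h` by a data-driven choice would require the plug-in quantities defined above, which this sketch does not attempt.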
2.4 Examples

2.4.1 A simulated example

In this section, we examine the asymptotic results of Theorems 2.2.2 and 2.2.3 via simulation. The data are generated from the following vector autoregression (VAR) equation

$$X_t = a X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Sigma), \quad 2 \le t \le n, \quad \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \quad 0 \le a, \rho < 1,$$

with stationary distribution $X_t = (X_{t1}, X_{t2})^T \sim N\left(0, (1 - a^2)^{-1} \Sigma\right)$. Clearly, higher values of $a$ correspond to stronger dependence among the observations, and in particular, if $a = 0$, the data are i.i.d. The parameter $\rho$ controls the orientation of the bivariate cdf $F$, and in particular, if $a = \rho = 0$, then $F$ is a bivariate standard normal distribution. In this study, we have experimented with three cases: $\rho = 0, a = 0$; $\rho = 0.5, a = 0.2$; $\rho = 0.9, a = 0.2$ to cover various scenarios.

A total of 100 samples $\{X_t\}_{t=1}^{n}$ of sizes $n = 50, 100, 200, 500$ are generated, and the smooth estimator $\hat F$ is computed using the optimal bandwidths $\hat h_{\mathrm{opt}}$ described in Section 2.3. Of interest are the mean over the 100 replications of the global maximal deviation $D_n(\hat F)$ defined in (2.2.2), denoted as $\bar D_n(\hat F)$, and the mean integrated squared error $\mathrm{MISE}(\hat F; \hat h_{\mathrm{opt}})$ defined in (2.2.1). Both measures are listed in Table 1.

As one examines Table 1, both $\bar D_n(\hat F)$ and $\mathrm{MISE}(\hat F; \hat h_{\mathrm{opt}})$ values decrease as sample size increases in all cases, corroborating Theorems 2.2.2 and 2.2.3. Also listed in Table 1 are the differences of the same measures for the empirical cdf $F_n$ against those of $\hat F$, which are always positive regardless of the data generating process (i.e., for different combinations of $a$, $\rho$) and measure of deviation (i.e., $\bar D_n$ or MISE). This corroborates Theorem 2.3.2 that $\hat F$ has asymptotically smaller MISE than $F_n$.

Based on the above observations, we believe our kernel estimator of the multivariate cdf is a convenient and reliable tool, which is also superior to the empirical cdf in terms of accuracy.

2.4.2 GDP growth and unemployment

In this section, we discuss in detail the dependence of US GDP quarterly growth rate on unemployment rate.
There are three types of unemployment: frictional, structural, and cyclical. Economists regard frictional and structural unemployment as essentially unavoidable in a dynamic economy, so full employment is something less than 100% employment. The full-employment rate of unemployment is also referred to as the natural rate of unemployment. This does not mean the economy will always operate at the natural rate. The economy sometimes operates at an unemployment rate higher than the natural rate due to cyclical unemployment. In contrast, the economy may on some occasions achieve an unemployment rate below the natural rate. For example, during World War II the natural rate was about 4%, while the actual rate was below 2% during 1943-1945. This was caused by the pressure of wartime production, which resulted in an almost unlimited demand for labor. The natural rate is not forever fixed. It was about 4% in the 1960s, and economists generally agreed that the natural rate was about 6%. Today, the consensus is that the rate is about 5.5%.

GDP gap denotes the amount by which actual GDP falls short of the theoretical GDP under the natural rate. Okun's law, based on recent estimates, indicates that for every 1% by which the actual unemployment rate exceeds the natural rate, a GDP gap of about 2% occurs. See Samuelson (1995), p.559 or McConnell and Brue (1999), p.214 for more details. In other words, if the unemployment rate falls, then the GDP growth rate increases. But the unemployment rate cannot keep falling because it moves around the natural rate. So it is useful to find the relationship between the GDP growth rate and the unemployment growth rate.

Let $X_{t1}$ = the seasonally adjusted quarterly unemployment growth rate in quarter $t$, $X_{t2}$ = the quarterly GDP growth rate in quarter $t$, with all data taken from the 1st quarter of 1948 ($t = 1$) to the 2nd quarter of 2006 ($t = 234$).
Since both series have been seasonally adjusted, it is reasonable to treat $X_t = (X_{t1}, X_{t2})^T$, $t = 1, \ldots, 234$, as a strictly stationary time series, as the time plots show. ACF plots also indicate that the assumption of $\alpha$-mixing is satisfied. The plots are shown in Figures 1-4.

Given any interval $I = [a, b]$, the survival function of $X_{t2}$ conditional on $X_{t1} \in I$ is defined as

$$S_I(x_2) = P(X_{t2} > x_2 \mid X_{t1} \in I) = 1 - \frac{F(b, x_2) - F(a, x_2)}{F(b, +\infty) - F(a, +\infty)}, \qquad (2.4.1)$$

in which $F$ is the joint distribution function of $X_{t1}$ and $X_{t2}$. The function $S_I(x_2)$ can be approximated by the following plug-in estimator

$$\hat S_I(x_2) = 1 - \frac{\hat F(b, x_2) - \hat F(a, x_2)}{\hat F(b, +\infty) - \hat F(a, +\infty)}, \qquad (2.4.2)$$

in which $\hat F$ is the kernel estimator of $F$ defined in (2.1.1). According to Theorems 2.2.1 and 2.2.3, for any fixed $x_2$,

$$|\hat S_I(x_2) - S_I(x_2)| = O_p\left(n^{-1/2}\right), \quad \text{while} \quad \sup_{x_2 \in R} |\hat S_I(x_2) - S_I(x_2)| = O_{a.s.}\left(n^{-1/2} \log n\right),$$

so the estimator $\hat S_I(x_2)$ is theoretically very reliable. We therefore draw probabilistic conclusions based on the smooth estimate $\hat S_I(x_2)$ instead of the true $S_I(x_2)$.

In Figure 5, the estimated conditional survival curve $\hat S_I(x_2)$ is plotted for intervals $I = [-0.08, -0.04]$, $I = [-0.02, 0.02]$, $I = [0.04, 0.08]$. Clearly, when the unemployment growth rate is between $-0.08$ and $-0.04$, the chance to have the GDP growth rate higher than 1.5% is the greatest, which is about 0.2. This is in accordance with Okun's law that the growth in GDP is associated with the unemployment rate. So if policymakers want to achieve a high GDP growth rate, they may find better ways to lower the unemployment rate. One can even estimate the probabilities of GDP growth rates given the policy on unemployment, which is the interval $I$. If the current unemployment rate is close to the natural rate, then $I$ is an interval close to 0, such as $[-0.02, 0.02]$; if the current unemployment rate is much higher than the natural rate, then $I$ is a negative interval, i.e., trying to lower the unemployment rate.
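The plug-in estimator (2.4.2) is a short computation once a smooth cdf estimate is available. The sketch below is illustrative only: it uses synthetic bivariate data rather than the GDP and unemployment series, a hand-picked bandwidth rather than the plug-in bandwidth, and hypothetical helper names (`quartic_cdf`, `kernel_cdf`, `conditional_survival`). The value $+\infty$ in (2.4.2) is passed as `np.inf`, which the clipped integrated kernel maps to 1.

```python
import numpy as np

def quartic_cdf(t):
    """Integrated quartic kernel; clipping sends arguments beyond [-1, 1] to 0 or 1."""
    t = np.clip(np.asarray(t, dtype=float), -1.0, 1.0)
    return 0.5 + (15.0 / 16.0) * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)

def kernel_cdf(sample, x, h):
    """Smooth cdf estimate at x using the product of integrated quartic kernels."""
    u = (np.asarray(x, dtype=float) - np.asarray(sample, dtype=float)) / np.asarray(h, dtype=float)
    return float(np.mean(np.prod(quartic_cdf(u), axis=1)))

def conditional_survival(sample, a, b, x2, h):
    """Plug-in estimate of P(X2 > x2 | X1 in [a, b]) from a bivariate sample."""
    num = kernel_cdf(sample, [b, x2], h) - kernel_cdf(sample, [a, x2], h)
    den = kernel_cdf(sample, [b, np.inf], h) - kernel_cdf(sample, [a, np.inf], h)
    return 1.0 - num / den

# Synthetic, negatively associated data (an assumption, not the data of Section 2.4.2).
rng = np.random.default_rng(1)
x1 = rng.standard_normal(300)
x2 = -0.5 * x1 + rng.standard_normal(300)
sample = np.column_stack([x1, x2])
h = np.array([0.3, 0.3])   # hand-picked bandwidth

for level in (-1.0, 0.0, 1.0):
    print(level, conditional_survival(sample, -0.5, 0.5, level, h))
```

The printed values decrease in the threshold, as any survival curve must; plotting them over a grid of thresholds for several intervals would give curves of the kind described for Figure 5.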
On the other hand, the survival function of th conditional on th can be computed similarly. If certain level of GDP growth rate is planned to be achieved, one can estimate the conditional probabilities of different unemployment growth rates. 16 2.5 Appendix 2.5.1 Preliminaries In this appendix, we denote by C (or c) any positive constants, by U (or u) sequences of random variables that are uniformly O (or 0) of certain order and by 00.5. almost surely 0, etc. LEMMA 2.5.1. [Berry-Esseen inequality, Sunklodas (1984), Theorem 1] Let {5,};1 be an a-miring sequence with Eén = 0. Denote d6 := Ina-X1992 {E|§,-|2+5} ,0 < 6 S 1, Sn = 23:15., 0?. == E82. 2 can for some Co e (o,+oo). Ifa(n) s KoeXP(-)~0n), A0 > 0, K0 > 0, then there exist c1 = c1(K0,6), c2 = c2 (K0,6), such that An = sup 2 P {05:15}, < z} — (I) (z)! S 01::5{10g(0n/c$/2)/A}1+6 (2.5.1) _n for any A with A1 g A 3 A2, where A1: c2{log(on/c(1)/2)}b/n,b > 2 (1 + 6) /6; A2 = 4 (2 + 6) 6‘1 log (o,,/c(1)/2) . LEMMA 2.5.2. (Bernstein’s inequality, Bosq (1998), Theorem 1.4). Let {{t} be a zero mean real valued process, Sn = 221:1 15,-. Suppose that there exists c > 0 such that fori = 1, ' - - ,n, k 2 3,E|§,-|’c g #21:!ng < +oo,m,- = maxlsisN ||€,-||,.,r 2 2. Then for each n > 1, integer q E [1,n/2], each s > 0 and k 2 3 2 l 1 fl . — qgn . 'n 27+ P{lZi=1€'| >"5”} Salexp( 25m3+5cgn) +a2(k)a([q+1l) where 251713 + Scen 5n 2 5m2k/(2k+1) a1=2§+2(1+—i———) ,a2(lc)=11n (l+—-‘L—— . 2.5.2 Proofs of Theorems 2.2.1 and 2.2.2 LEMMA 2.5.3. Under Assumptions (A1),(A3) and (A4), as n —+ 00 , ~ _ “pH (K) d p+13p+1F(X) p+1 E{F(x)} — F+ (p +1)! a=1 " 51.3“ + H (mm) ' 17 Proof. Using the integral form of Taylor expansion and denoting hv = (h1u1,...,hdud)T, we write f(u + was f(u)+ 211;. 1:___,(Z hava— —%a) no + Hal, 1 p+1 t? d 8 RIM-1 = R’P+1(uihv) = [0 {3" (20:1 havaazg) f(u+thv)} dt Hence Assumption (A4), (A1) and (A3) sequentially imply that E{F(x)} = E]; Kh(X,- -u)du=/_:odu/[;1,1]d f(u+hv)K(v)dv ._. 
f1; f(uldu+ flea/PW [22%; 2:: 111011087670)" f(U) MW] K(vmv (x)+]:o mdufl 111d [[01 {Pl (Z:=1hava£:)p+l f(u+thv)}dt] K(v) dv +1; (K) 61"“ (I_;)—_+1+1)I /-:o ZZ=1h hp+1 aup-l-l 1f(u)du+u(hfi$}() It 1(103p+1F(X)+ F(x)+ ):;—+——+ 1)! 2:2}1 h———3+16$p——;——1 +.u(hfi+a,l() Cl LEMMA 2.5.4. Under Assumptions (AU-(A4), as n —+ 00 E{/_:0Kh(X,,-—u)du/_:0Kh(xj—u)du} ={ F(x)—D(K)Zd=1hagg§+u(hmax) i=j, EIIXiSX}I{XJ-Sx}+u(hmax) iyéj. Proof. We begin with the case of i = j, E{/:o Kh(X, ')—udu}2 =/_°° 00f(v)K(xh ")2 dv=/o: f(x— hw)K2(w ) hpmddw z: hprOd/j: {I (w 2 —1)— I (w 2 1)} f(x - hw)R2(w)dw +/100f(x — hw)hproddw _—= hpmd/o: {1(w Z —l) —— I (w 2 1)} f(x -— hw)II’2(w)dw+F(x — h) =2: _1hpr0d/loo dwa/l1 du)af(x— hW)K 2(u/a) +F(x—)—Z::1 BFI(X:)ha+u(/imax) . d 8F x = F (x) — 2.21% [f )D (K) + u (hm). 18 Similarly, for the case of i 75 j, one obtains E{[;Kh(x,-—u)duf:oKh(xj—u)du} ‘ oo 00 ~ X—‘V' "' x—V' z] / dVidefi,j(viivj)K( h 1)K( h J) _(x) —00 (X) 00 .. ... = [.1 /_1 fi,j(x - hwz', X - 11‘”le (W) K (W) hgroddwidwj = hired {[310 “(We -1) - [(WiZ 1)}R(Wi)dwi+ Adei} X {/_0:{1(ij ‘1)— 1 (W32 1)} If (Wj) de+/loodwj} fi,j(x -— hw,,x — hwj) 00 ~ 00 = hjzarod/ 1 {I (we —1)— I (we 1)} K (WNW/1 defz',j(X — hwi’x ‘ hwi) _ .. ~ 00 Mama/4 {1(sz —1> — 1(sz 1)} K (we dw, [1 dwifid(x - hm - w +EI{x,- gx—h}I{Xj _<_x—h}+u(hmax) d 00 oo 1 ~ = 2 ha/ de/ deCI/ K (wia) dwiafiJ ($0: "' hwiaa xxx _ VLCU X - Vj) a-l h hi-O —l d 00 oo 1 __ + 2: ha/ dvz-f de_a/ K (wja) dwjafi,j(x "’ Viv ma — hwjayxa ‘- Vj.a) 0:1 h hj.a -l +EI{X,- S x}I{Xj S x} — 2::1ha6EI{X4 £52.1{Xj S x} +u(hmax) d _ . = EI{X.- 3x}1{x, Sx} _ Zhaamxi $632109 Sx} a=1 +ihaaEI{X.- _<. x}1{x,- g x} 0:1 61a 2 El (X,- S x}I{X]~ S x} +u(hmax) .D + u (hmax) Denote 5,, = 5,, (x) = n {13’ (x) —E17"(x)} = 22;] {in in which an =5... (x) =/_:o m. (x.- —u>du—E{/_;Kh(x. —u)du}. then clearly E5”, = 0. Denote by '7 (l) = cov (§,,n,§i+l’n) the autocovariance function, then 19 COROLLARY 2.5.1. 
Under Assumptions (AU-(A4), as n -—» oo F(x)— F2(x)- D(K)Zd_ ha%%§+u(hm) i=j, C0V(€i,n:€j,n)=§(i_j)={ EI{X;‘ Zd=1ha%§%l+u(hm) i=j _ EI{X1-SX}I{X,-Sx}+u(hm) iaj 2 [F(x)+flp+—1(1K——-—)!Z:=1hp+li$£:(1£2+ u(hg,+a,1{)] , the rest of the proof is trivial. El II Proofs of Theorems 2.2.1 and 2.2.2. According to Corollary 2.5.1 ..(1): 7(0)— D(K)Zd—1haga€§§l+u(hmaxl 1:“ (252) (l)+U(hmax) [#0 . i in which '7 (l)— —- 7 (l, x): EI {X1 S x} I{X1+l S x}— F2(x ). Lemma 2.5.3 and Assump- tion (A3) further imply that K +1 x 31, = n {F(x)—-()—(Fx 6:2)?” 2:2hhg+16p T811 )+ +u (hm). (2.5.3) Meanwhile, 0,2, = E52, = var (Sn) = nAn + an where An = lelsclogn (1 - |l| /n)§(l) and Ba = chogn<|l| a“ (m s 2mm“ (1 — Ill /n>41<01 21°24” 7 (l) 2 c0, therefore lelsfl (l) > 0. Then by (2.5.1) in Lemma 2.5.1, :6 {log(on/c(1)/2) //\}1+6. P{a;13n < z} —(z)l S 6160 For c 2 2/A0, anl S S Cln‘z. For 72 large enough, og/n = Let 6 = 1, A = 4(2~+6)6"110g (on/cé/Z) = 12log(on/c(1)/2), d = 1, then An S $05-12“ ‘2: —C—0 2 O(n")1/2), i.e., Sn/on "’d N(0, 1). Theorem 2.2.1 then follows be- cause WWF (F(x —F (x)) "’d N (0, 1) by Slutsky’s theorem. Equations (2.5.2) and (2.5.3) together with E5”, = 0 imply that . 2 K 1 x 2 {EF(x)-—F(x)}2 = %{Z:=lhg+lflfi%—)} +u(h§g§2): n—_1V(x) D(K)n—1c;lhaaF—T_(:)+ M( —1hmax)y .. .. 2 E {F (x) —EF (x)} .. . 2 hence Theorem 2.2.2 follows by computing f E {F (x) -EF (x)} + .. 2 {EF (x) —F(x)} dF (x). 2.5.3 Proof of Theorem 2.2.3 LEMMA 2.5.5. Denote gm1,...,md = (a1,m1v”' ,adflnd) 6 Rd, 1 S ma S Ma and A ___ |F( )—E{F( )}|, n IsfaagMa gml, ,md gml, ,md W .)—Fdu—E{/g1 th(xi—u)du}) :1, —OO and for k 2 2, E (Ichk) = E (lg-n1“ (3,), which is I / gm1"" ’md K1, (x,- — u)du—-E{/gm1fu,md K1, (x,- — u) du} .<_ 1k‘2E( nan} S a1 exp (——-——-—E—%——) + a2 (3) or ([n/ (q + 1)])9. 25mg + Seen Take such that [n/( + 1)] > 10 n > i_c_1_n_ ____‘15_121_____ > c a2 lo n and a1=29+2 1+-—€%-—-—- =0(logn). q 25mg + Scan . 1 3 since m3 = mamas. 
Hang 3 {Bags} / s 1, then a2(3)=11n 1+-i Slln 1+——§-—— Slln 1+ 5 =O(n), 5n 1 alogn an—2 log n a ([n/ (q + 1W7 s (K0 exp (-)~o [n/ (q +1)l))6/7 s cn—sxoco/v, So for c0, c2 large enough P “2:1 Ci" n 1 P max n_1l - I > an"? 10 n < {ISmaSMa 2,51 ("limb ,md g _ A“!1,...,Md n —1 Z P{” 2 Cin,m1,m,md 1 i=1 m1=1,...,md= Hence Borel—Cantelli lemma implies that An = Oa,s_ (n‘1/2 log n). Meanwhile 8,, is > nan} S O(log n) exp (~02a2 log n) + Cn1’6A060/7 S Cn—(d+2), d l > an? logn} S Cll—(d+2) H Ma S Cn—z. a=1 bounded by Kim|F(9m1:~»md)-E{F(gm1v-»md)}l 22 WWH(gm-ml}-F(9mwmd>| = An + U(n“1/2) = Oa,s_ (714/2 log n) . If X1, . - - ,Xn are i.i.d., then An+Bn = 00.5, (nil/2 (log n)1/2) by using same steps above with Bernstein’s inequality of i.i.d. case. El LEMMA 2.5.6. VA C Rd, IA |K11 (v — u)| du SfRd lKh (v — u)| du S HKH‘Ifi. Proof. Applying elementary arguments, f A lKh (v — u)| du S fRd ”(11 (v — u)| du is bounded by d _1 ”a " “0: [Rd Ha=1ha K( ha ) LEMMA 2.5.7. Let —oo = and < < aa,Na = 00 be such that d 1 du = 110:, [_1IKrwandwasuKIIii - U max(N1,---,Nd) 3 On and P(a0,kSXaSaa,k+1) g 1/n,Vl g k g Na,V1 g a g d. Then E (gin-and lKh (X—u)|du = u (n-1/2(1ogn)1/2) in which gn1,...,,, = d (a -- . a ) 6 Rd 1,111, 7 d’nd ' Proof. 0° /9n1+1,---,nd+1 12/91:,"- ”(11 (X — u)| du S [00 HQ, (v — u)| dudF (v) ,nd 9n1,--- ,nd 971. +1,---,n +l+(hlv"'1hd) 9n +1,m,n +1 =/ 1 d dF(v) 1 d IKh(v—u)|du gnli'" ,nd—(h1,'” ’hd) 9111 2'" and ya +1,---,n +1+(hlv"’ihd) S C/ 1 d dF (v) gub... ’nd—(hl’m vhd) 9n1+1,--- ,nd+1+(h1 ’hd) accordin to Lemma 2.5.6. 
g gnli'" ,nd—(h1,"' ’hd) dF (v) equals 911 +l,~-,n +1+(hl""ihd) 9n +1,~--,n +1 9n +l,---,n +1 / 1 d dF(v)—/ 1 ‘1 dF(v)+/ 1 d dF(v) 9111 9'" ,nd—(h1,"' ’hd) 9711 1'” and gn11"'1nd 9n +1,---,n +1+(’11r"ihd) 9n +1,---,n +1 =/ 1 0‘ dF(v)—/ 1 d dF(v) gnla'" ,nd—(h13"' ’hd) gn1,---,nd +P (gri1,-~,nd S X S 9121+1,---,nd+1) 23 01,111 al,n1+1 a1,1i.1+1+hi S + + a h1 a 01,n1+1 1,111— 1,121 ad,n ad,n +1 ad,n +1+h1 - f d + j d + f d dF(v) 0 hi 6‘ ad,nd+1 d7nd~ d,nd _ /9n1+1,-",nd+1dF(v) + l/n 9 "1"" and a Within the above sum, the 3d - 2‘1 terms with aa’na+l are 0 n‘1 , while each of the aina 2d terms without ffzfigfl is bounded by hprod mag; | f (x)|. Applying Assumptions (A1) x6 and (A3), X E/ 9n1,--- ,n ”(11 (X — u)| du S Chprod max If (x)|+C (3d — 2d) /n = u (n"1/2 (logn)1/2) . d XERd Cl LEMMA 2.5.8. Under the same conditions of Lemma 2.5. 7, for Vx == (231, - -- ,Td) 6 Rd, n"1 2:121 [Cm] = Ua.s. (7171/2 log n) in which (in = (in (gn1,‘" ,nd) =/ {lKh (X, — u)| dU—E lKh (X — u)|} du. 9111,... ,nd while for i.i.d. X1,...,Xn,n"1 Z?=1l€z’nl = Ua_s_ (n"1/2 (logn)1/2). Proof. One can show by applying Lemma 2.5.2 as in the proof of Lemma 2.5.5. Cl Proof of Theorem 2.2.3. Under the same conditions of Lemma 2.5.7. one has A max IF (gn1,...,,,d) — F (gnl,...,nd)l = 00.3, (n—l/zlogn) lsnaSNa by Lemma 2.5.5. For Vx = ($1, - - - ,Ed) 6 Rd, there exist integers n1, - - - ,nd such that F(gn1,...,nd) S F(x) S F (gn1+1,...,nd+1). Hence lF(x) — 1:" (9"12"'i"d)l is bounded by 1 " x K x- d <1 n x K - d 52%] M .-u) u —;Z.-=1/ l hog-u)| u gn1,'”,nd 9711,..." 1 d 1 X =_Z’f / {lKh(X,--—u)|du—E|Kh(X—u)|}du n 1:1 9 .. n1, "d X +/ E [Kb (x -— u)| du = 00.3, (71—1/210gn) 9 "12'” M 24 according to Lemmas 2.5.7 and 2.5.8. Then according to Lemma 2.5.5, F(x) — F(x)I S I13”(x) — 1:" (gn1,...,nd)I + IF (9711,... ,nd) — F (gn1,...,nd)I + IF (9711,...md) —- F(x)I = Uaug. (71—1/2 log n) + Uws, (n"1/2 logn) + U(1/n) and if X1, - - . , Xn are i.i.d, one can replace logn in the above inequality by (log n)1/2. 
□

CHAPTER 3

Spline estimation of a semiparametric GARCH model

3.1 Introduction

It is widely recognized that global smoothing methods such as those based on splines or wavelets are computationally much more efficient than local kernel smoothing; see, for example, the comparison of computing time in Xue and Yang (2006b) and Wang and Yang (2007). Recent developments in regression spline smoothing, in terms of local asymptotics (Huang (2003)) and of high dimensional and weakly dependent data (Huang and Yang (2004), Xue and Yang (2006b) and Wang and Yang (2007)), have presented convincing incentives for applying spline smoothing to solve challenging problems in time series analysis. We have applied cubic spline smoothing to the semiparametric GARCH model (1.2.2), which resulted in a procedure that is much faster but shares the same theoretical and numerical properties as the kernel smoothing procedure in Yang (2006). Table 3 compares the computing time of the proposed cubic spline method with that of the local linear method in estimating the parameter $\alpha_0$. Clearly, the cubic spline method is superior for large samples, as its computing time is proportional to $n^{-1}$ times the corresponding time of the local linear method. The advantage of the spline method had already been recognized by Engle and Ng (1993), who proposed spline estimation of the news impact curve for extensions of model (1.2.1), without developing justifications by asymptotic theory.

The chapter is organized as follows. In Section 3.2 we discuss the assumptions of the model (1.2.2), the spline estimation of the unknown parameter $\alpha_0$, and its asymptotic properties, including its oracle efficiency. In Section 3.3 we describe the implementation of the estimator. In Sections 3.4 and 3.5 we apply the method to simulated and empirical examples. All technical proofs are given in the Appendix.
3.2 Estimation Method

The statistical inference of the semiparametric GARCH model (1.2.2) consists of estimating both the parameter $\alpha_0$ and the link function $m$. In this chapter we focus on estimating the parameter $\alpha_0$, since once $\alpha_0$ is estimated with $\sqrt{n}$-consistency, the estimation of the function $m$ is a routine application of univariate smoothing. The following assumptions on the data generating process are used.

A1: The process $\{Y_t\}_{t=-\infty}^{\infty}$ is strictly stationary, and the innovations $\{\xi_t\}_{t \in Z}$ have finite $r$-th absolute moments $E|\xi_t|^r = m_r < \infty$, $0 < r \le 6$.

A2: The link function $m(\cdot)$ is positive everywhere on $R_+$ and has a Lipschitz continuous 4-th derivative.

For convenience, define $X_t = \sum_{j=1}^{\infty} \alpha_0^{j-1} Y_{t-j}^2$, $t \in Z$, which simplifies model (1.2.2) to

\[ Y_t = m^{1/2}(X_t)\,\xi_t, \quad \sigma_t^2 = m(X_t), \quad t \in Z, \]

while the process $\{X_t\}_{t=-\infty}^{\infty}$ satisfies the Markovian equation $X_t = \alpha_0 X_{t-1} + m(X_{t-1})\,\xi_{t-1}^2$, $t \in Z$. Since $\alpha_0$ is an unknown parameter in $(0, 1)$, to make numerical optimization feasible we assume that $\alpha_0$ lies in the interior of $A = [\alpha_1, \alpha_2]$, where $0 < \alpha_1 < \alpha_2 < 1$ are boundary values known a priori. In practice, one takes a sufficiently small $\alpha_1$ and a sufficiently large $\alpha_2$ based on prior knowledge of the data. Define next $X_{\alpha,t}$ as a series analogous to $X_t$ but with any candidate value $\alpha \in A$:

\[ X_{\alpha,t} = \sum_{j=1}^{\infty} \alpha^{j-1} Y_{t-j}^2 = \sum_{j=1}^{\infty} \alpha^{j-1} m(X_{t-j})\,\xi_{t-j}^2, \quad t \in Z. \qquad (3.2.1) \]

We need the following assumption on the processes $\{X_{\alpha,t}\}_{t=-\infty}^{\infty}$, $\alpha \in A$.

A3: The processes $\{X_{\alpha,t}\}_{t=-\infty}^{\infty}$, $\alpha \in A$, are jointly strictly stationary and geometrically $\alpha$-mixing, i.e., the $\alpha$-mixing coefficient satisfies $\alpha(k) \le c\rho^k$ for constants $c > 0$, $0 < \rho < 1$, where

\[ \alpha(k) = \sup_{A \in \sigma(X_{\alpha,t},\, t \le 0,\, \alpha \in A),\; B \in \sigma(X_{\alpha,t},\, t \ge k,\, \alpha \in A)} |P(A)P(B) - P(A \cap B)|. \]

From Assumption (A3) and the fact that the innovations $\{\xi_t\}_{t=-\infty}^{\infty}$ are iid, the joint distribution of $(Y_t, \xi_t, X_{\alpha,t}, \alpha \in A)$ is strictly stationary. For each $\alpha \in A$, define the transformed variables for $X_{\alpha,t}$ as

\[ U_{\alpha,t} = F(X_{\alpha,t}) = \frac{F_{\alpha_1}(X_{\alpha,t}) + F_{\alpha_2}(X_{\alpha,t})}{2}, \quad 1 \le t \le n, \qquad (3.2.2) \]

in which $F_{\alpha_1}$ and $F_{\alpha_2}$ are the cdfs of $X_{\alpha_1,t}$ and $X_{\alpha_2,t}$ respectively.
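The candidate series in (3.2.1) can be computed with a one-step recursion rather than by summing the whole past at every time point. The sketch below is illustrative only: it truncates the infinite sum at the start of the sample, and the link function `m(x) = 0.1 + 0.3 * x`, the Gaussian innovations, the burn-in length and all function names are hypothetical choices that are not taken from the text.

```python
import math
import random


def candidate_series(y, alpha):
    """X_{alpha,t} = sum_{j=1}^{t} alpha^(j-1) * y[t-j]^2, computed recursively.

    The infinite sum in (3.2.1) is truncated at the start of the sample
    (an assumption of this sketch), which gives the recursion
    X_{alpha,t} = alpha * X_{alpha,t-1} + y[t-1]^2 with X_{alpha,0} = 0.
    """
    x = [0.0] * len(y)
    for t in range(1, len(y)):
        x[t] = alpha * x[t - 1] + y[t - 1] ** 2
    return x


def simulate(n, alpha0, m, seed=0, burn_in=200):
    """Simulate Y_t = m(X_t)^(1/2) * xi_t with X_t = alpha0 * X_{t-1} + Y_{t-1}^2."""
    rng = random.Random(seed)
    x_prev, y_prev = 0.0, 0.0
    ys = []
    for _ in range(n + burn_in):
        x_t = alpha0 * x_prev + y_prev ** 2
        y_t = math.sqrt(m(x_t)) * rng.gauss(0.0, 1.0)  # standard normal innovations (assumed)
        ys.append(y_t)
        x_prev, y_prev = x_t, y_t
    return ys[burn_in:]  # drop the burn-in so the start-up values matter less


# Hypothetical link function and parameter values, for illustration only.
y = simulate(500, alpha0=0.5, m=lambda x: 0.1 + 0.3 * x)
x_candidate = candidate_series(y, alpha=0.7)
print(len(x_candidate), x_candidate[1] == y[0] ** 2)
```

The recursion costs one multiplication and one addition per time point for each candidate value of alpha, which keeps a search over the interval A cheap.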
In particular, we denote Ut = U00; = F (XW) = F (Xt). A4: The pdf associated with F is f (x) > 0, Va: 6 (0, +00) and Ua,t has a pdf (pa(.) which is Lipschitz continuous and there exist constants C90, C90 such that infaEA,0SuS1 (pa (u) 2 Cw and “Pas/1,0931 9% (u) 5 cv- For any aEA define the predictor of Yt2 based on Ua,t as ga(u) = E(Yt2|Ua,t = u),0 < u < 1. In particular, denote g(Ut) = ga0(UaO,t) = E(Yt2IUao,t) = m(Xt). Define the risk function of a as R(a) = E {Yt2 —- ga(Ua,t)}2. Apparently{1@},?i_oo have finite 4—th moment due to assumption (A1) and (A2). So R(a) allows the usual bias-variance decomposition R(a) = E {9(Ut) - 90(Ua,t)}2 + (m4 -1)E92(Ut) which, together with 9(Ut) "=— 9a0(Ua0,t), imply that R(a) = E {9(a) — was}2 + 3(00) 2 R 0 and R(a) is locally convex at 0:0, i.e., for any 5 > 0, there exists 6 > 0 such that R(oz) — R(ao) < 6 implies Ia — 00] < 5. Thus by minimizing the prediction error of Yt2 on Ua,t, one should be able to locate the true parameter a consistently via polynomial spline smoothing. To introduce the space of splines, we divide [0,1] into (N+1) subintervals Jj = [tj,tj+1), j = 0,...,N — 1, JN = 28 ltN,1], where T ;= {tj}N 3:1 is a sequence of equally-spaced points, called interior knots, givenas t1_k=...=t_1=t0=0 1, with I t' < u < t- B- = I = J — 3+1 3’10“) {uer} { 0 otherwise Define the spaces of linear, quadratic and cubic spline functions on [0, 1] as N+1 P(k“2)=r(’“‘2)[o,11= 717(105 AZ AJBJ,k,ue[o,11 .k=2,3.4. J=1—k Given a realization {Yt}?:l, define for VaEA the cubic spline estimator of ga(-) 1 n 2 2 at) = argmin —,; Z {12 — 7(Ua,t)} 2) n 761‘ t=n’+1 with n’ and n’ ’ = n — n’. We do not use the first 12’ data points for implementation reasons in Section 3. Define next the empirical risk function flak-7%,; Z {KB—man}? t=n’+1 and let a be the minimizer of R(a), i.e. 51 == argmin 12(0). 
(3.2.3) aEA We assume the following on the number of interior knots A6: The number of interior knots N satisfies: n”6 < N = Nn << n”5 (log n)—2/5. The next theorem establishes the strong consistency of d. 29 THEOREM 3.2.1. Under assumptions (AU-(A6), as n —> oo, 0 -—> 00, as. Proof. According to Proposition 3.2.1, one has sup 12(0) -- 12(0) —+ 0,a.s. Thus there 06A exists an integer no (to), such that R(00,w) —- R(00,w) < 6/2 when n > no (w). Notice that 0 is the minimizer of R(00,w), so 12(0,w) - R(0o,w) < 6/2. There also exists an integer n1(w), such that R(0,w) — R(0,w) < 6/2 when n > n1(w). Thus, when n > maX(n0(W),n1 (717)), ma, w) — R(00, w) = ma, w) — Rm, w) + Rm, w) —— R(ao, w) < 6. According to Assumption (A5), R is locally convex at 00, so for any 5 > 0 and any 13, if R(0,w) — R(00,w) < 6, then I0 -) 00] < s for n large enough, which has proved the theorem. Cl Denote the asymptotic variance of 0 by the following “sandwich” formula 2 (00) = 13"(010V1‘1’(010)R”(00)"1 (3-2-4) with 2 " d 2 ‘1’ (00) = £7,“ var (a7, 2 500$) = 4E I{90(U00,t) “ Yt2} EI0=0090(U0,t)I (3'2-5) t=n’ +1 2 and R”(00) = iriffla) 0:00 2 d2 d 2 =25: {MUM —Yt }E‘—29.. 00 a; (a — 00) —’d N (0, 2 (00)). (3.2.7) _d_ d0R(a) and Proof. Denote S (0) = {0,15 = {90(U0,t) “ Ytz} 21%90(U0,t) " E [{90(U0,t) — YtZ} £0101de : (3-2-8) 30 then because %R(00) = 0, one has . 2 n _ 5(00) — £7" 2 €00’t = 0 (n 1/2) ,a.s.. (3.29) t=n’+1 according to (3.6.37). Mean Value Theorem then implies that for some t E [0, 1] A A 2 A 50) -— s = £24m + (1 -— t) a0) (a - ac) and 3(0) = 0 because 138(0) attains its minimum at 0. Thus, one has A 2 A —S(0to) = £33051 + (1 - t) 00) (51 - 00) i.e., . d2 . é — C10 2 —S(0o)/——2—R(t0 + (1 — t) 00). d0 One has d2 - . (12 II Emu” + (1 — t) 0:0) -> $713010) = R (aloha-S- by Theorem 3.2.1 and Proposition 3.2.1, where R”(00) is given in (3.2.6). 
According to (3.2.9), one has @5100) "’d N {0, \II (00)} by the Central Limit Theorem for strongly mixing processes (Theorem 1.7, [4]), where \11 (00) is given in (3.2.5). Then Theorem 3.2.2 is proved by formula (3.2.4) and Slutsky’s Theorem. El The proofs of Theorems 3.2.1 and 3.2.2 given above have made use of complicated arguments involving spline smoothing, summarized in the following proposition, whose proof is given in the Appendix. PROPOSITION 3.2.1. Under Assumptions (A 1)-(A6), as n —-> oo dk ‘ - sup W{R(0)—R(0)} —>0,a.s.,k=0,1,2. 06A 00 According to Theorem 3.2.2, the true parameter vector 00 can be estimated by 0 at \fn-rate. One can then use the estimate 0 in place of the unknown 00 for the estimation of function m. We define next the “would-be oracle” estimator of 00 if the link function g had been “oracally” known 5: = argminae A 11(0), where the oracle empirical risk is 12(0) = (n”)"1 222:”, +1 {Yt2 — g(Ua,t)}2, so '0' serves as a benchmark of oracle optimality. The next theorem states the asymptotic oracle efficiency of estimator 0. 31 THEOREM 3.2.3. Under assumptions (A1 )-(A6), as n —+ co, the estimator 0 is asymp- totically omcally efficient, i.e., it is asymptotically as eflicient as 0. Specifically, fit: —00) "*d N (0,2(00)) where the variance 2(00) is the same as in (3.2.4) and (3.2.7). The proof of Theorem 3.2.3 consists of routine arguments in parametric inference, thus it is omitted. 3.3 Implementation For a given realization {Y2}?=1, denote in the following two integers n’ i [2 logn/ log (02—1)] + Ln” = n —— n’. It is easily verified that 2 I I sup 0" =03 0 such that for any 9 6 0(4) [0,1] and 0 S k S 2, "(QT (g) - 9)('°)II .<_ 000 "9(4)" H44“- CX) oo LEMMA 3.6.2. (B-spline Property). (i) Partition of Unity. (de boor 2001, page .96) The sequence {B-,k}fi:_k+1 provides a positive and local partition of unity, i.e., each BM is positive on (tj,tj+k), is zero off [tj,tj+k], Zjiwkfl 8J3), = 1. (ii) Differentiation. 
(de boor 2001, page 116) B- _ u 8- _ 1 iBjk(“)=(k-1) "k1” — ”1”“ 1“) .l—ksJ'sN. d" ’ tj+k—1 -' tj tj+k “ tj+1 (iii) Good Condition. (De Vore and Lorentz 1993, Theorem 5.4.2, page 145) There is a constant Dk > 0 such that for each spline S = XXL”); +1 chj,k of order k and each 0 < r S 00, BI. IIc'II < Isu. < llc’ll 1: . < .. Dk IIc'II S IISII, S Isl/r IIc',rII ,0 < r < 1. For any functions g1, 92 6 L2 [0,1], define for V0 6 [01, 02] the theoretical inner product and norm as 1 (91,92).. = [0 II (“)92 (u) «I. (u)du. IIgIIIE. = (91,91).. LEMMA 3.6.3. There exist constants c > 0 such that for any A := (A_1,2, A03, ..., AN,2’ ..., ANA) E R3N+9. _ 1 chl/rllA||. 00, with probability 1 —1/2 sup max I=O{(nN) logn}. 06A k,k’=2,3,4 I J 7 7'“, "I0 3’ J ’k a l—ijsNJ—k'Sj'SN Proof. We only prove the case k = k’ = 4, all other cases are similar. Let C0,j,j’,t = 314 WM) Bj’,4 (Umt) " E3334 (Umt) Bj’,4 (Umt) with the second moment 2 E8 E {31.0.03}... v.0] — {EBII (U...) 3,, (U.,.>} a.j.j’,t = 2 where E In},4 (U...) 312.54 (Ua,I)I ~ er, [133,4 (Haj) 8,4,4 (Ua,t)] ~ N-2 uniformly for all —3 S j, j' S N by Assumption (A4). Hence, EC2 . ~ N ‘1 uniformly for all 0.1.1" It "3 S if S N. The k—th moment is k k I = E IBj,4 (Ua,t) le’4 (Ua,t) — EBj’4 (Ua1t) 3.1-[’4 (Ua,t)I k k E ICGJJ’J |/\ k k where EIB,-,4 (Haj) 3,5,, (Ua,I)I ~ N-l, IEBM ((1a))BjI,4 (U..,I)I ~ N-k uniformly k < for all —3 S j, j’ S N. Thus, there exists a constant C > 0 such that E (misfit 37 02k 1klEC2. jj’ tfor all -3 S j, j' S N. So Cramér’s condition in Lemma 2.5.2 is satisfied, one has for 6,, = t6logn/x/nN and fixed 0 I n P {W [Zian—I—l Caijajlat We divide interval [01,02] into n6 equally spaced intervals with disjoint endpoints 01 = > 6,,} _<_ n_10. (3.6.1) is bounded by a1 < -~ < a-MII = C*2 and SUPaeA max-ssI‘II’SN IC0,j,j’,t sup max IC - -I I+ max sup max IC - -I — C - —I . (3.6.2) 1n,a, 4 N 4 N II7III2,0 : Z Z Z Z vl’j’kfyl‘jlk,0j k=2j=-k+1 k’=2 j’=—k+l N at. = z 5; 2 2 ( ,I.) . 
OI k=2j=—k+1k'—=2j’=—k+1 38 Let '71: (71,—1,2> 71,0,2? "'271,N,2I "'I71,N,4)7 72: (72,—1,2a’72,012, "'772,N,2) "'2 72,N,4)‘ According to Lemma 3.6.3, one has for any a 6 [(11,012], 2 2 2 2 2 2 0h ||71||2 S ||71||2,a S Ch ||71||2 ,Ch ||72||2 .<_ ll72ll2,a S Ch ll’72ll2, Chll71l|2 ll72ll2 S l171||2,a ll72ll2,a S Ch ll71l|2 ll72|l2~ Hence ’7 ’7 “ 7 ’7 An ___ 811p Sllp < 1’ 2>naa < 1’ 2>a — ll7lllooll72lloo aeA 7167,726F l|71||2,a l|72||2,a 01h ||71l|2 ||72||2 1 n x sup max — -— aEA k,k’=2,3,4 n g{ J J ’k’ n,a J J ’k’ a 1—kgj:N.1—k’sj’5N < ooh"1 sup max 1 Zn (B B > _ - . 3k) .’ — 1k) .’ 7 (16A k,k’=2,3,4 n 1:1 3 J ’k, n,a J J ’k’ a 1—ks:‘gN.1—k’sJ"sN which, together with Lemma 3.6.4, imply (3.6.5). C] For any fixed a, one has Y2 = ga+g-ga+E=ga+Ea+E, where ET ___ {9 (Ut) (5? “ 1) }:=n’+l ,Ea = {9 (Ut) — ga (Ua,t)}?=nr+1. Then one can break the cu- bic spline estimation error as ga(u) — ga(U) = 20.6) — yam) + Mu) + Wu)» (3“) where §a(u) = {32,4 (1‘)}Z‘3gjgN V5}! {(30, Bja4>n.a}:-v=—3’ 230(2) = {BM (u)}€3sjsN VJ}. {(Ea, Bj,4>n,a }:-:_3’ - T _ N N N vma = {(3)-,4, le’4>n,a}j,jI=_3 ,Va = {(BM, Bj,,4)a}j,j,:_3. (3.6.7) The next proposition is used in proving Proposition 3.2.1. PROPOSITION 3.6.1. Under Assumptions (A 1)—(A4), (A6), as n —> oo sup sup mo (21.) — ga (u)| = O {(nh)-l/210gn + h4} ,a.s., (3.6.8) aEAuE[0,1] 39 d . _ —1/2 —3/2 3 22333212.. a {go (a...) _ ga (60,0) .. o {n h logn + h },a.s., (3.6.9) d2 21612 bd—a—i {9a (Ua,t) -— ga (Ua,t)} = O{n_1/2h-5/210gn + hz} ,a.s.. (3.6.10) In order to prove the above proposition, we need several technical lemmas. The following is a special case of Theorem 13.4.3 in [15]. LEMMA 3.6.6. If a bi-infinite matrix with bandwidth 1' has a bounded inverse A'1 on 12 and K. = K(A) = ||A||2||A'1||2 is the condition number of A, then ”A'IHOO 3 2CD (1 — v)_1, with co = 21—2' IIAIIZ, v = (It2 — 1)1/4r (It2 +1)-1/4r. LEMMA 3.6.7. 
Under Assumptions (A3), (A4) and (A6), there exist constants O < CV < CV such that ch—1 Ilwna _<_ wTvaw sow-1 Ilwuié (3.6.11) WW IIng s WTVn,aW saw-1 :1ng (3612) with matrices Va and Vma defined in (3.6.7). In addition, there exists a constant C > 0 such that sup ”VELH _<_ CN,a.s., sup “V31” _<_ C’N. (3.6.13) aEA ’ 0° aEA 00 Proof. Let w be any i (N + 4)-vector and 7w (u) = Zfl;_3 w JBJ'A (u), then Baw = {7“, (Ua’nI) ,...,7w (Ua,n_1)} and An in (3.6.5) entails that "va13. (1 — An) s WTVn,aW s (mug. (1 + An). (3614) By Theorem 5.4.2 of [15] and Assumption (A4), one has C C cw— nwng g WTVaw $0,9-— nwug (3.6.15) N N which, together with (3.6.14), yield C C ce—N— uwng (1 — An) 5 WTVn,aW soapy, :le130 + An). Then one has (3.6.11) and (3.6.12) by (3.6.15), (3.6.14) and (3.6.5). Next, denote by Amax (Vma) and Ami“ (Vma) the maximum and minimum eigenvalue of Vma, then CVN-1 S “Vn,a”2 “V7231 2 :- Amax (Vn,a) , Ag)?“ (V1141) = “Valallzz SCVN_1)a'8'7 40 thus K. = “Vnfl,”2 = Amax (Vma) /)‘min (Vma) = CV/cV < oo,a.s. One can also Show that It 2 C > 1,a.s.. Combining the above and Lemma 3.6.6 with u = (14,2 —- I)”16 (K2 + 1)—1/16, one gets “VT—thloo S 2v‘8N (1 ——v)"1 = CN,a.s., which is part one of (3.613). Part two of (3.6.13) can be proved similarly. Cl 3-6.2 Proof of Proposition 3.6.1 LEMMA 3.6.8. Under Assumptions (A2)-(A4) and (A6), as n —+ 00 sup “(50, - 900(k)" S C “mm“ h4‘k,a.s.,0 S k S 2. (3.6.16) aEA 0° 0° Proof. According to Theorem A.1 of [36], there exists an absolute constant C > 0, such that SUPllga- galleoSCSup inf ||7— yell00 oo sup “(nI')-1B£“ S Ch, sup CX) (n")'—1B£”00 S C,a.s. 06A (16A 0 d T —1 sup ”Pall00 S C, sup ”Pa” S Ch, sup —— (BaBa) = O (N) ,a.s. .1621 (EA 00 06A d0 66 II Proof. For any vector a ER" , one has Il(n”)"BZ.‘a||oosuau.. max [(n”)“Z" B..4(Ua3)|30hnau... 
-3_<.jSN t=n’+l and using equation (3.6.20), H (n”)—1 35a" is bounded with probability 1 by CX) Tl 00 “an -32312‘31 (em—1 2: {(B,,3—B,-+1,3)(U..,.)}f(X6,3)Ir12jeJ-1Y,2_, t=n’+1 i=1 42 (3.6.22) (3.6.23) (3.6.24) (3.6.25) 0.3. S Cllalloo. Then one has (3.6.25) by (3.6.19), (3.6.13), (3.6.24), (3.6.23) and (3.6.22). Equations (3.6.20) and (3.6.21) are needed for proving the rest of the inequalities. Cl LEMMA 3.6.11. Under Assumptions (A2)-(A4) and (A6), d’c - sup “—1; {ya (Ua,t) - 901 (Uaat)} A da 3 C ”63(4)” h4_k,a.s., k = 1, 2. (3.6.26) 06 00 Proof. According to the definition of go, in (3.2.3), one has (1 Edg,‘ [{QT (go) _ go} (Ua,t)] = aPa [{QT (96) - ya} (Umtil = P6 [{QT (ye) - go} (Ua,t)l + Pail; [{QT (90) " 90} WW” * it; [{QT(901) "‘ go} (Ua,t)] = [{QT (21:15.90) — £90} ([10130] d °° - + [3,7 {QT (ye) -- go} (U..,.)] f (X...) h’1 2369-1133,, .7: which yield (3.6.26) for k = 1 by (3.6.17) and (3.6.25). The proof for k = 2 is similar. Cl LEMMA 3.6.12. Under Assumptions (A2)-(A4) and (A6), as n -—> 00, one has with proba- bilityl :29. 33E m=0(l}i—Zl:3 337'?“ 500%), (3'6”) :25: 3‘2 (53’) m=0(l£5i)23 53(335“) m=0(’$—g—;El (36-28) :29. 337E m=0(i‘}i%)’:3 333“ 05003-2) (“-2") Proof. we prove only the first equation in (3.6.27) and the second equation of (3.6.28), other equations can be proved similarly. One has 33:36—12” 3.6,66><6—21 N n tzn’ '— J-w3‘ Denote Z): = g (Ut) (é? —- 1) = Zfln+ Z52" + Ztg", where Dn = n77 (1/3 < n < 2/5), 2.3" =g(U.) (632-1)1(l9(Ut) (a? — 1)| > 12...}, 252" = g (U63) (5% — 1) I {|g(U.) (a? -— 1)| s 0.} — 2:3; 43 2.2." = E [9(Ut) (é? — 1)1{|g(u.) (é? — 1)| _<_ D..}]. Note that the B-spline basis is bounded, so it is straightforward to verify that the mean of the truncated part is uniformly bounded by D;2 333166—129...(U...)25.1... 0(1):?) = . (2‘2”)- One has 22:11,“ P{|g (Un_1) (5?, —— 1)[ > Du} S :00 ,+1D;3 < 00 according to the 71:7! assumption that E (g?) 
= mg < +00, and Borel—Cantelli Lemma implies that the tail part l(n”)m1 Z:=n,+1 3334 (Ua,t) Zt,Dlnl = 0 (n—k) , for any h > 0. For the truncated part, using Lemma 2.5.2 and discretization, one has ((n”)—1 Zn 3,7,4 (Ua,t) Zgnl = 0 (log n/M) . t=nL+1 Therefore the first equation in (3.6.27) is established with probability 1. To prove the second equation of (3.6.28), notice that BEENMWZ" BMW{,.U,._..(U..)}]N ’ n t=ni+1 j=——3 d BEE... ,, -1 n d N a“: nil = (n ) thnl+1 36 [8.7.34 (Uait) {9 (Ut) — ga (Ua’t) }] j=-3 While E [BM (Ua,t) {g (Ut) — ga (U04) }] = 0, —3 S j S N implies that d . E {.3 (B... (U...) {3 w.) - 3.. (U.,.)}]} = o. -3 s 3 s N, eeA, which allows one to apply Lemma 2.5.2 to obtain that with probability one TE sup 1 Ba" 0 = O (logn/Vnh). 06A da n 00 Cl LEMMA 3.6.13. Under Assumptions (A2)-(A4) and (A6), as n —+ 00 sup sup léa(u)| = 0 (log n/Vnh) ,a.s., (3.6.30) aEAuEmJ] sup sup [Ea(u)| = 0 (log n/x/nh) ,a.s.. (3.6.31) aEAuEWJ] 44 Proof. We only prove (3.6.30), the proof of (3.6.31) is similar. Denote a = (&_3, ..., 6N)T = -l (BEE...) BEE _—_ V12)! {(n”)-1 BEE}, then éa(u) = Zfiz—3 éijA (u). 00 sup sup [éa(u)| S SUP llélloo = SUP “V722! (n—IBEE)” aEAu€[0,l] aEA Ore/1 S CN sup ”(n")"1 BEE” a 3 (16A 0° where the last inequality follows from Lemmas 3.6.7 and 3.6.12. Cl LEMMA 3.6.14. Under Assumptions (A2)-(A4) and (A6), as n —r 00 d sup max -—5‘a(Ua,t) =O(n—1/2N3/2logn),a.s., (3.6.32) aEAn’+1StSn da d sup max —Ea(Ua,t) =O(n_1/2N3/210gn) ,a.s., (3.6.33) aeAn’+1StSn (10 d2 sup max ———2-ea(U... t) — —-O (n—1/2N5/210gn) ,a.s., (3.6.34) a6An’+1—E(a>) =37 2 6.,.+Ja.1+Ja,2+Ja.3 t=n’ +1 where 50¢ is defined in (3.2.8) and Ega. = 0 , and where 1 " - d .. Jal = '77; Z {90(Ua,t)_ga(Ua,t)}'Ja(ga“9a)(Ua,t)y 1 ==n’+1 Ja,2 = ’17,: {ga(Ua,)t _Yt 2}: (9a‘ 9a) (Ua, t) ”t=n’+1 1 Ja,l = _ ,2 {90:(Ua, t)‘ 90:( (U0 ,dt)} da —ga (U0 ,-t) nflt=n’+1 By Lemma 2.5.2, sup [(n”)—1 Z?—n’+1€a1tl = 0 (n’1/2logn) a.s.. 
Meanwhile, (3.6.8) aEA '- and (3.6.9) imply that sup lJaJl = 0(n"1h"210g2n + h7) a.s.. Note that aEA 1 Ja,2 = ‘77 Z{ga(Ua, t) — Yt 2}— (9(1“ 90:) (U0, t) 1t——-n’+1 7,; (E. +E)T;,5:; {P.(E. +E)}- One has sup 0611 according to (3.6.16). Next Jaw—2+1”(Ea+E)T—-—-{PO.(EQ+E)}|=0(h3) as 1 d |-— (E. +E>TE{P.(E. +E>}| 48 1 d BTB BT {( ,.a-p) W} BgBa)-w BT II 1 . s W}| sup aEA (Ea + E)T Ba -1 BECBa Tl.” l S 0(N x sup —— ) a€A{ n” 00 —1 1 BTB clBT +0 (N) x sup 72,7 (Ea + E)T Ba (€173 d—dn -—°‘ ,(Ea + E) 06.4 00 oo -1 1 d BTB BT +0 (N) X sup a; (Ea + E)T Ba E; ( :11” a)“: —Ia [(Ea + E) 06A 00 00 oo = 0(N) x 0 (logn/M) x 0(N) x 0 (logn/W) '2 0(n—1N210g2n) a.s. according to (3.6.27), (3.6.28), (3.6.29), (3.6.13) and (3.6.25). So sup|Ja,2| 06A 0 (Tb—1N2 log2 n + 113) ,a.s.. Similarly, one can write Ja,3=—1-,;Z {gun/W) yaw...» d 1,9411...) t=n ’+1 1 1~ BTBa ’ BT11 +W(E0+E) Ba( an )1 n —daga and has 1 n 1 d SUP 377 Z {9a(Ua,t)-ga(Ua,t)} EggMUmt) =0(h4) a-S-, 06A t=n’+1 49 sup aEA logn } {logn} == 0 x N x h =0 ... {VnN x/nN as Thus (3.6.36) is proved for k = 1. One can prove that for the term {out defined in (3.2.8), 72 1 T 33,13a “133; d ”77(Ea‘l'E) Ba *1). -——-ga with probability 1 sup aeA $55 {R(a) — R(a)} — 5;, En: goat = o (71—1/2) . (3.6.37) t=n’+1 The proof of (3.6.36) for k = 2 follows from (3.6.8), (3.6.9) and (3.6.10), since 11. 1 d2 ‘ 1 ,. 2 d2 . d .. d ,. EWRQI) = Ft 2,21 [{ga(Ua,t) _ Yt } ga—Q‘ga(Ua.t) + E90(Ua,t)jdgga(Uait)] ’ :17, 1 d2 d2 d d 5mm) = E [{gawat) — YE} 50—29411...) + 3390(Ua,t>3;ga] . [3 Proof of Proposition 3.2.1. It follows from Lemma 3.6.15 and Lemma 3.6.16. [3 CHAPTER 4 Spline-backfitted kernel smoothing of additive coefficient model 4.1 Introduction This chapter is based on Liu and Yang (2009). Model (1.3.1)’s versatility for econometric applications is illustrated by the following example. 
Consider the forecasting of the US GDP annual growth rate, which is modelled as the Total Factor Productivity (TFP) growth rate plus a linear function of the capital growth rate and the labor growth rate, according to the classic Cobb-Douglas model (Cobb and Douglas, 1928). As pointed out in Li and Racine (2007), p. 302, it is unrealistic to ignore the non-neutral effect of R&D spending on the TFP growth rate and on the complementary slopes of the capital and labor growth rates. Thus a smooth coefficient model should fit the production function better than the parametric Cobb-Douglas model. Indeed, Figure 9 shows that a smooth coefficient model has much smaller rolling forecast errors than the parametric Cobb-Douglas model, based on data from 1959 to 2002. In addition, Figure 10 shows that the TFP growth rate is a function of R&D spending, not a constant.

Many methods exist for the estimation of functional/varying coefficient models; see Cai, Fan and Yao (2000) and Yang, Park, Xue and Härdle (2006) for kernel type estimators, and Huang, Wu and Zhou (2002) and Huang and Shen (2004) for spline estimators. These published works have had partial success in addressing the inaccuracy of estimating multivariate nonparametric functions, commonly known as the "curse of dimensionality". Typically, optimal convergence rates of the coefficient function estimators are established, locally for kernel estimators, or globally for spline estimators.

Our view is that a satisfactory procedure for estimating the functions $\{m_{\alpha l}(x_\alpha)\}_{l=1,\alpha=1}^{d_1,d_2}$ and constants $\{m_{0l}\}_{l=1}^{d_1}$ in model (1.3.1) should meet three broad criteria. Specifically, the procedure should be (i) computationally expedient; (ii) theoretically reliable; and (iii) intuitively appealing. As model (1.3.1) is a natural extension of the additive model, we extend the "spline-backfitted kernel smoothing" of Wang and Yang (2007) to the additive coefficient model, combining the best features of both kernel and spline methods.
Kernel procedures for the additive model, such as Yang, Hardle and Nielsen (1999), Sperlich, Tjostheim and Yang (2002), Yang, Sperlich and Hardle (2003), Rodriguez-Poo, Sperlich and Vieu (2003), Hengartner and Sperlich (2005), satisfy criterion (iii) and partly (ii) as they are asymptotically normal at any given point, but not (i) since they are extremely computationally intensive when either the dimension is high or the sample size is large, as illustrated in the Monte Carlo results of Wang and Yang (2007). Spline approaches of Stone (1985), Huang (1998a,b), Huang and Yang (2004) to the additive model, on the other hand, do not satisfy criterion (ii) as they lack a limiting distribution, but are fast to compute, thus satisfying (i). In addition, none of the published works had established a “uniform convergence rate”, thus lacking in regard to (ii). The spline-backfitted kernel (SBK) and spline-backfitted local linear (SBLL) estimators we propose are essentially as fast and accurate as a univariate kernel and local linear smoothing, thus completely satisfying all three criteria (i)-(iii).

Other alternatives for estimating model (1.3.1) that may satisfy criteria (i)-(iii) are possible extensions of the smoothed backfitting of Mammen, Linton & Nielsen (1999) and Nielsen & Sperlich (2005), and the two-stage estimator of Horowitz and Mammen (2004). It is important to note that although Horowitz and Mammen (2004) had used B spline in simulation, their theoretical proof was for what should be called the “orthogonal series-backfitted local linear” estimator in our parlance.

We now describe the oracle smoothing idea of Linton (1997) in the context of model (1.3.1). If all the nonparametric functions of the last $d_2 - 1$ variables, $\{m_{\alpha l}(x_\alpha)\}_{l=1,\alpha=2}^{d_1,d_2}$
and all the constants $\{m_{0l}\}_{l=1}^{d_1}$ were known by “oracle”, one could define a new variable

$$Y_{,1} = \sum_{l=1}^{d_1} m_{1l}(X_1)\, T_l + \sigma(\mathbf{X},\mathbf{T})\,\varepsilon = Y - \sum_{l=1}^{d_1}\Big\{m_{0l} + \sum_{\alpha=2}^{d_2} m_{\alpha l}(X_\alpha)\Big\} T_l$$

and estimate all functions $\{m_{1l}(x_1)\}_{l=1}^{d_1}$ by linear regression of $Y_{,1}$ on $T_1,\ldots,T_{d_1}$ with kernel weights computed from variable $X_1$. These would-be estimators do not suffer from the “curse of dimensionality” and are called “oracle smoothers”. We propose to pre-estimate the functions $\{m_{\alpha l}(x_\alpha)\}_{l=1,\alpha=2}^{d_1,d_2}$ and constants $\{m_{0l}\}_{l=1}^{d_1}$ by linear spline and then use these estimates as substitutes to obtain an approximation $\hat{Y}_{,1}$ to the variable $Y_{,1}$, and construct “oracle” estimators based on $\hat{Y}_{,1}$. The theoretical contribution of this chapter is proving that the error caused by this “cheating” is negligible. Consequently, the SBK/SBLL estimators are uniformly (over the data range) equivalent to univariate kernel/local linear “oracle smoothers”, automatically inheriting all their oracle efficiency properties. Our proof relies on the general principles of “reducing bias by undersmoothing” and “averaging out the variance”, accomplished with the joint asymptotics of kernel and spline functions.

Another innovation in this chapter is the $\sqrt{n}$-consistent oracle estimation of the constants $\{m_{0l}\}_{l=1}^{d_1}$ under conditions no more than second order smoothness of $\{m_{\alpha l}(x_\alpha)\}_{l=1,\alpha=1}^{d_1,d_2}$. Xue & Yang (2006a) had provided $\sqrt{n}$-consistent estimation of the constants $\{m_{0l}\}_{l=1}^{d_1}$ only under higher order smoothness assumptions, while Xue and Yang (2006b) had failed to obtain $\sqrt{n}$-consistency for estimating $\{m_{0l}\}_{l=1}^{d_1}$.

This chapter is organized as follows. In Section 4.2 we discuss the assumptions of the model (1.3.1). In Section 4.3, we introduce the oracle smoothers and discuss their asymptotic properties. In Section 4.4 we introduce the SBK and SBLL estimators, their L2 consistency and asymptotic normal distribution.
The ideas behind our proofs of the main theoretical results are given by decomposing the estimator’s “cheating” error into a bias and a variance part. In Section 4.5 we discuss the implementation of the estimators. In Section 4.6 we apply the methods to simulated and empirical examples. All technical proofs are given in the Appendix. 53 4.2 Assumptions Let {(Y,-, Xi, Ti)}?___1 be a sequence of strictly stationary observations, with identical distri— bution as (Y, X, T) in model (1.3.1). Denote the unknown conditional mean and variance functions as m (X, T) = E (YIX, T) ,02 (X, T) 2 var (YIX, T), then one has Yi = m (Xi, Ti) + 0(X1,Tz') 5i (4.2-1) for some conditional white noises {51);} that satisfy E(£,-|X1-, T2) = 0, E (ezZIXi, T,) = 1. The variables (Xi, T5) can consist of either exogenous variables or lagged values of 1’}. For the additive coefficient model, the regression function m takae the form in ( 1.3.1), and satisfies the identification conditions that E {mod (Xa)} = O, 1 SIS (11,1 S 0: S (12 (4.2.2) d2 ensuring the unique additive representations of ml (x) = mg; + 2 ma; (Ia). As in most a=1 works on nonparametric smoothing, estimation of the functions {mat (ma)};i__1_’l(f£=1 is con- ducted on compact sets. Without lose of generality, let the compact set be X = [0,1]“2. Following Stone (1985), p. 693, the space of a—centered square integrable functions on [0,1] is Hg = {g : E{g(Xa)} = 0,E{g2(Xa)} < +00} ,1 s a 3 d2. Next define the model space M, a collection of functions on X x Rdl as (11 d2 M = g(x, t) = 29; (x) t); g; (x) = 901+ 29010501) 3.9a! 6 H2 2 1:1 a=1 in which {90!};g1 are finite constants. The constraints that E {gal (Xa)} = 0, 1 S a S d2 ensure unique additive representation of m; as expressed in (4.2.2), but are not neces— sary for the definition of space M. In what follows, denote by En the empirical expec- tation, Engo = 221:1 0 as Lip([a,b] ,C) = {g] ]g (x) -—g(:z:')] S Cla: -:r:’] , Vx,:c' 6 [a,b]}. 
We mean by “~” both sides having the same order as $n \to \infty$. We denote by $I_{d_1\times d_1}$ the $d_1 \times d_1$ identity matrix, and $0_{d_1\times d_1}$ the $d_1 \times d_1$ zero matrix. For any vector $\mathbf{x} = (x_1, x_2, \ldots, x_{d_2})$, we denote the supremum and Euclidean norms as $|\mathbf{x}| = \max_{1\le\alpha\le d_2}|x_\alpha|$ and $\|\mathbf{x}\| = \big(\sum_{\alpha=1}^{d_2} x_\alpha^2\big)^{1/2}$.

We need the following Assumptions on the data generating process.

(A1) The tuning variable $\mathbf{X} = (X_1,\ldots,X_{d_2})$ has a continuous probability density function $f(\mathbf{x})$ that satisfies $0 < c_f \le \min_{\mathbf{x}\in\chi} f(\mathbf{x}) \le \max_{\mathbf{x}\in\chi} f(\mathbf{x}) \le C_f < \infty$ for some constants $c_f$ and $C_f$, and $f(\mathbf{x}) = 0$, $\mathbf{x} \notin \chi = [0,1]^{d_2}$.

(A2) There exist constants $0 < c_Q \le C_Q < +\infty$ and $0 < c_\delta \le C_\delta < +\infty$ and some $\delta > 1/2$, such that $c_Q I_{d_1\times d_1} \le Q(\mathbf{x}) = \{q_{ll'}(\mathbf{x})\}_{l,l'=1}^{d_1} = E(\mathbf{T}\mathbf{T}^T \mid \mathbf{X} = \mathbf{x}) \le C_Q I_{d_1\times d_1}$ and $c_\delta \le E\{(T_l T_{l'})^{2+\delta} \mid \mathbf{X} = \mathbf{x}\} \le C_\delta$ for all $\mathbf{x} \in \chi$ and $l, l' = 1,\ldots,d_1$.

(A3) The vector process $\{\varsigma_t\}_{t=-\infty}^{\infty} = \{(Y_t, \mathbf{X}_t, \mathbf{T}_t)\}_{t=-\infty}^{\infty}$ is strictly stationary and geometrically strongly mixing, that is, its $\alpha$-mixing coefficient $\alpha(k) \le c\rho^k$, for constants $c > 0$, $0 < \rho < 1$, where $\alpha(k) = \sup_{A\in\sigma(\varsigma_t,\, t\le 0),\, B\in\sigma(\varsigma_t,\, t\ge k)} |P(A)P(B) - P(A\cap B)|$.

(A4) The coefficient components $m_{\alpha l} \in C^1[0,1]$, $m'_{\alpha l} \in \mathrm{Lip}([0,1], C_\infty)$, $\forall 1 \le \alpha \le d_2$, $1 \le l \le d_1$, with $m_{1l} \in C^2[0,1]$, $\forall 1 \le l \le d_1$.

(A5) The conditional variance function $\sigma^2(\mathbf{x},\mathbf{t})$ is measurable and bounded. The errors $\{\varepsilon_i\}_{i=1}^{n}$ satisfy $E(\varepsilon_i \mid \mathcal{F}_i) = 0$, $E(\varepsilon_i^2 \mid \mathcal{F}_i) = 1$, $E(|\varepsilon_i|^{2+\eta} \mid \mathcal{F}_i) \le C_\eta$, for some $\eta \in (1/2, 1]$ and the sequence of $\sigma$-fields $\mathcal{F}_i = \sigma\{(\mathbf{X}_j, \mathbf{T}_j), j \le i;\ \varepsilon_j, j \le i-1\}$ for $i = 1,\ldots,n$.

(A6) The marginal density $f_1(x_1)$ of $X_1$ and the conditional second moment matrix function $Q_1(x_1)$ defined in (4.2.3) both have continuous derivatives on $[0,1]$.

Assumptions (A1)-(A5) are common in the literature, see for instance, Huang & Yang (2004), Huang & Shen (2004) and especially Xue & Yang (2006b). Assumption (A6) is needed only for the asymptotic theory of the oracle “kernel smoother”, but not for the oracle “local linear smoother”.
Assumption (A2) implies also that for all 173 6 [0,1] , 1 S oz S d2 andl,l'=1,...,d1 |/\ Qa ($0): {(10 (170)}11 l’_ _=1 ET(TT lXa — 1'01) < CQIdlxdl (4 2 3) E{(T1TII)2+ IXa = ma} _<_ 05. Furthermore, Assumptions (A2) and (A5) imply that for some constant C > 0 CQIdl Xdl l/\ 06 max E|Tl]2+" < C1 max E |T1T1|2+6= C lmax E|T1|4+26S 0C5 < +00. (4.2.4) lSlSdl lSSdl lSSdl At one referee’s request, we provide here insight into the relationship allowed between the vectors T and X under Assumption (A2). It is instructive to first understand what T and X can not be in the context of identifiability for functions {mal(:1:a)}I=’__d1 a_1. Suppose that the vector X is centered so that EX = 0. Then model (1.3.1) is unidentifiable when (T1,T2)= (X1,X2) since —3X2T1 +3X1T2 — 0, E(— -3X2)= E (3X1) = 0 and the function m (x,t) in (1.3.1) is expressed as d1 d2 d2 2 mm + 2 mal (Ira) Q + mm + m21 ($2) + 2 mal ($0) t1 (=3 a=1 a=1,a;é2 0‘2 + m02 + m12(3131) + Z "101(330) t2 a=2 d1 d2 d2 5 2 mm + 2 mal (95a) 11+ m01 + m21(1172) - 3152 + 2 mal (30:) t1 (=3 01:1 a=1,a7£2 d2 + 77102 + m12(1131) + 35131 + Z ma1($a) t2, 0:2 so one can use "”51 (x2) = mm (:62) —-3:1:2 and mh (x1) = 17112 (11:1)+3:z:1 to replace mgl (11:2) and mu (11:1) without changing the data generating process (1.3.1). In other words, the functions mm (:52) and m12(:1:1) are unidentifiable. Xue and Yang (2006a), p.2523 gave a 56 similar counterexample, and discussed why an unidentifiable model may perform better for prediction. More generally, it is revealing to note that Assumption (A2) not only rules out the above anomaly, but it also does not allow the possibility that there exist two CD’S (1 S l S d1) almost surely equal to two Borel functions of X. To see this, suppose that (T1,T2) = {1,01 (X) , (p2 (X)} ,a.s for some Borel functions (p1 and (p2. Assumption (A2) impliae that T2 T T CQ12x2 S E{ ( T111112 711222 ) X = x} S qu2x2,\7’x E X leading to cQI2x2 S ( (P1 2:351:72“) (p1 $353)“) ) S. 
CQI2x2,a.s.,Vx E X which can not be true as for any x E x, the 2 x 2 matrix in the above is singular, thus can not be 2 eqlgxg. That Assumption (A2) guarantees the identifiability of model (1.3.1) has been established in Lemma 1 of Xue and Yang (2006b). It is important to observe, however, that Assumption (A2) does not exclude the case of one Th1 S l S d1 almost surely equal to a Borel function of X. 4.3 Oracle Smoothers We now introduCe what is known as the oracle smoother in Wang & Yang (2007) as a bench- mark for evaluating the estimators. Denote for any vector x = (x1 , 2:2, - - - , xd2) the deleted vector X_1 = (x2,~- 133112) and for the random vector X,- = (Xi1,X,-2,--- ’Xidz) the deleted vector X,,_1 = (Xig, . -- aXidz): 1 S i S n. For any 1 S l S d1, write m4) (x_1) = mg; + 222:2 ma] (ma). Denote the vector of pseudo-responses Y1 = (Y1,1,- -- ,Yn,1)T in which d1 ' d1 Yi,1 = Yi — 297101 + "1-1,: (Xi,-1)}Til = Zm11(X11)"-’}z+ U (Xi: Ti) 62'. 1:1 [=1 These would have been the “responses” had the unknown functions (111-1,; (x—1)}1 0, while the bandwidth h = hlm > O,h ~ n’1/5. Likewise, one can define the local linear oracle smoother of mL. (1121) as .. 1 ‘1 1 mLL,1,- ($1) = (1.11 xd1,0d1xd1) (HCE‘LJWICLLJ) ECELJWIYL (4-3-2) in which T C _ T1 , ... , Tn LL’I T1 (X11 -$1) , , Tn (Xn1-$1) ' In this chapter denote ug (K) = fuzK (u) du, “Kllg = [K(u)2 du, Q1 (:51) as in (4.2.3) and define the following bias and variance coefficients 1 bLL,l,l’,1 ($1) = 5% (K) "1,1,1 ($1) f1 ($1) flu/,1 ($1), bx,l,l',1($1): $112 (K) [2m'11(331)5% {f1($1)qw,1($1)} +mll’1($1)f1 ($1)f1u',1(331)] 1 231 ($1) = “Kllg f1 ($1) E {TTT02 (X, T) |X1 = $1}, {vl,,/,1 00, the oracle local linear smoother mLLJI (x1) given in (4.3.2) satisfies d d1 1 ,— .. all "h mLL,1,- ($1) _ m1,- (331) - E :bLL,(,z’,1 (371) (‘2 "" N (01{v[,1',1($1)}”,=1) - l=1 ’ l’ =1 With Assumption (A 6) in addition, the oracle kernel smoother mm, (11:1) in (4.3.1) satisfies d1 d1 r— .. d1 nh mK’1,. (:81) — 772.1,. 
((121) - Z bK,l,l',1 (1111) h2 -+ N (0, {UM/,1 ($1)} ) . l=1 l,l’=1 l’=1 THEOREM 4.3.2. Under Assumptions (A1) to (A5) and (A 7), as n ——> 00, the oracle local linear smoother mud, (3:1) given in (4.3.2) satisfies sup ImLL,1,. (:51) — m1,.(x1)] = 0,, (log n/Vnh). x1€[h,l—h] With Assumption (A 6) in addition, the oracle kernel smoother mm, (:01) in (4.3.1) satisfies sup lmK,1,. (2:1) — m1,.(:1:1)] = 0;, (log n/V 71h) . $1€[h,1—h] Remark 1. The above theorems hold for mum (Ta) and mm, (330,) similarly constructed as 7721,qu (2:1)and mK,1,.(x1), for any a = 2, ...,d2, i.e., _11 .. 1 mLL,a,- ($0) = (Idl X611 1 Odl Xdl) (ECEL,QWOCLL,O) ECELpWaYa: 1 1 ‘1 1 mK’a,. (Ilia) 3: (if—ICEWOCK) ECfiWaYa, except that in Assumption (A4 ) one has to replace “mu 6 02 [0, 1] ,V1 S l S d1” with “ma, 6 02 [0, 1] ,‘v’l S l S d1” and in Assumption (A6), f1(:1:1) and Q1(:1:1) have to be replaced with fa (Ta) and Q0, (Ta). The same oracle idea applies to the constants as well. Define the would-be “estimators” of constants (mopflKdl as the following least squares solution 11 d1 2 ~ ~ T . m0 = ("100131ng = arg $111112 Yic - ZmOITil , (4-3-4) i=1 1:1 59 in which the oracle responses are d1 d2 d1 Yic = Y2" - Z 2 mod (Xia) Ta = ZmozTa + 0 (X133) Ei- (4-3-5) 1:10:21 (=1 The following result provides optimal convergence rate of who to ma, which are needed for removing the effects of mg for estimating the functions {mu ($1)}7i1 PROPOSITION 4.3.1. Under Assumptions (A1)-(A5) and (A8), as n —-+ 00, 5“Pl_<_l_<_d1 Imoz - mozl = 0p ("71(2) - Although the oracle smoothers Thug, (Ta), mm, (:50) possess the desirable theoretical properties in Theorems 4.3.1 and 4.3.2, they are not useful statistics as they are computed based on the knowledge of unavailable functions {mat (2:0)};2’16222 and constants {mot}?il. They do, however, motivate the spline-backfitted estimators that we introduce in the next section. 
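The oracle kernel smoother of this section is a kernel-weighted least squares regression of the pseudo-responses on $\mathbf{T}$, with weights computed from $X_1$ alone. The following Python sketch illustrates that computation under stated assumptions: the Epanechnikov kernel, the function names `epanechnikov` and `oracle_kernel_smoother`, the bandwidth value, and the noise-free toy data are all illustrative choices and are not taken from the text.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; an assumed choice of K for illustration."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def oracle_kernel_smoother(x1, X1, T, Y1, h, kernel=epanechnikov):
    """Kernel-weighted least squares of Y1 on the columns of T at the point x1.

    X1 : (n,) values of the first tuning variable
    T  : (n, d1) matrix of linear regressors
    Y1 : (n,) pseudo-responses
    h  : bandwidth
    Returns the d1-vector of fitted coefficient values at x1.
    """
    n = len(Y1)
    w = kernel((X1 - x1) / h) / h      # weights K_h(X_i1 - x1)
    TW = T * w[:, None]                # each row of T scaled by its weight
    A = TW.T @ T / n                   # n^{-1} C_K^T W_1 C_K
    b = TW.T @ Y1 / n                  # n^{-1} C_K^T W_1 Y_1
    return np.linalg.solve(A, b)

# Hypothetical toy data: constant coefficients (2, 3) and no noise,
# so the weighted regression should recover them at any interior point.
rng = np.random.default_rng(0)
n = 200
X1 = rng.uniform(0.0, 1.0, n)
T = np.column_stack([np.ones(n), rng.normal(size=n)])
Y1 = 2.0 * T[:, 0] + 3.0 * T[:, 1]
est = oracle_kernel_smoother(0.5, X1, T, Y1, h=0.2)
```

Solving the normal equations with `np.linalg.solve` rather than forming an explicit inverse is a numerical convenience; the local linear version would augment `T` with the columns `T * (X1 - x1)[:, None]` and keep the first `d1` entries of the solution.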
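4.4 Spline-backfitted Kernel Estimators

In this section we describe how the unknown functions $\{m_{\alpha l}(x_\alpha)\}_{l=1,\alpha=2}^{d_1,d_2}$ and constants $\{m_{0l}\}_{l=1}^{d_1}$ can be pre-estimated by linear spline and how the estimates are used to construct the “oracle estimators”. To this end, we first introduce the space of linear splines. Let $0 = \xi_0 < \xi_1 < \cdots < \xi_N < \xi_{N+1} = 1$ denote a sequence of equally spaced points, called interior knots, on interval $[0,1]$. Denote by $H = (N+1)^{-1}$ the width of each subinterval $[\xi_J, \xi_{J+1}]$, $0 \le J \le N$, and denote the degenerate knots $\xi_{-1} = 0$, $\xi_{N+2} = 1$. We assume that

(A8) The number of interior knots $N = N_n \sim n^{1/4}\log n$ and hence $H \sim n^{-1/4}(\log n)^{-1}$.

For $J = 0,\ldots,N+1$, define the linear B spline basis as

$$b_J(x) = (1 - |x - \xi_J|/H)_+ = \begin{cases} (N+1)x - J + 1, & \xi_{J-1} \le x \le \xi_J, \\ J + 1 - (N+1)x, & \xi_J \le x \le \xi_{J+1}, \\ 0, & \text{otherwise}, \end{cases}$$

the space of $\alpha$-empirically centered linear spline functions on $[0,1]$ as

$$G^0_{n,\alpha} = \Big\{ g_\alpha : g_\alpha(x_\alpha) \equiv \sum_{J=0}^{N+1} \lambda_J b_J(x_\alpha),\ E_n\{g_\alpha(X_\alpha)\} = 0 \Big\}, \quad 1 \le \alpha \le d_2,$$

and the space of additive spline coefficient functions on $\chi \times R^{d_1}$ as

$$G^0_n = \Big\{ g(\mathbf{x},\mathbf{t}) = \sum_{l=1}^{d_1} g_l(\mathbf{x})\, t_l;\ g_l(\mathbf{x}) = g_{0l} + \sum_{\alpha=1}^{d_2} g_{\alpha l}(x_\alpha);\ g_{0l} \in R,\ g_{\alpha l} \in G^0_{n,\alpha} \Big\},$$

which is equipped with the empirical inner product $\langle\cdot,\cdot\rangle_{2,n}$.

The multivariate function $m(\mathbf{x},\mathbf{t})$ is estimated by an additive spline coefficient function

$$\hat{m}(\mathbf{x},\mathbf{t}) = \sum_{l=1}^{d_1} \hat{m}_l(\mathbf{x})\, t_l = \arg\min_{g \in G^0_n} \sum_{i=1}^{n} \{Y_i - g(\mathbf{X}_i,\mathbf{T}_i)\}^2. \qquad (4.4.1)$$

Since $\hat{m}(\mathbf{x},\mathbf{t}) \in G^0_n$, one can write $\hat{m}_l(\mathbf{x}) = \hat{m}_{0l} + \sum_{\alpha=1}^{d_2} \hat{m}_{\alpha l}(x_\alpha)$ for $\hat{m}_{0l} \in R$ and $\hat{m}_{\alpha l}(x_\alpha) \in G^0_{n,\alpha}$.

Simple algebra shows that the following oracle estimators of the constants $m_{0l}$ are exactly equal to $\hat{m}_{0l}$, in which the oracle pseudo-responses $\hat{Y}_{ic} = Y_i - \sum_{l=1}^{d_1}\sum_{\alpha=1}^{d_2} \hat{m}_{\alpha l}(X_{i\alpha})\, T_{il}$ mimic the oracle responses $Y_{ic}$ in (4.3.5):

$$\hat{\mathbf{m}}_0 = (\hat{m}_{0l})^T_{1\le l\le d_1} = \arg\min_{\lambda_{01},\ldots,\lambda_{0d_1}} \sum_{i=1}^{n} \Big\{ \hat{Y}_{ic} - \sum_{l=1}^{d_1} \lambda_{0l} T_{il} \Big\}^2. \qquad (4.4.2)$$

PROPOSITION 4.4.1. Under Assumptions (A1) to (A5) and (A8), as $n \to \infty$, $\sup_{1\le l\le d_1} |\hat{m}_{0l} - \tilde{m}_{0l}| = O_p(n^{-1/2})$, hence $\sup_{1\le l\le d_1} |\hat{m}_{0l} - m_{0l}| = O_p(n^{-1/2})$ following Proposition 4.3.1.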
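The linear B spline basis $b_J(x) = (1 - |x - \xi_J|/H)_+$ on equally spaced knots can be evaluated directly from its definition. The short Python sketch below is offered only as an illustration; the function name `linear_bspline_basis` and the example values of $N$ and the evaluation grid are assumptions, not part of the text.

```python
import numpy as np

def linear_bspline_basis(x, N):
    """Evaluate b_J(x) = (1 - |x - xi_J| / H)_+ for J = 0, ..., N + 1.

    N is the number of interior knots, H = 1 / (N + 1) and xi_J = J * H.
    Returns an array of shape (len(x), N + 2), one column per basis function.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    H = 1.0 / (N + 1)
    knots = np.arange(N + 2) * H                       # xi_0, ..., xi_{N+1}
    hats = 1.0 - np.abs(x[:, None] - knots[None, :]) / H
    return np.clip(hats, 0.0, None)                    # positive part (.)_+

# Hypothetical example: N = 4 interior knots, 11 evaluation points on [0, 1]
B = linear_bspline_basis(np.linspace(0.0, 1.0, 11), N=4)
```

On $[0,1]$ these hat functions sum to one at every point, which is one reason the spline spaces above are centered before being used for the additive components.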
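Define next the oracle pseudo-responses $\hat{Y}_{i1} = Y_i - \sum_{l=1}^{d_1}\big(\hat{m}_{0l} + \sum_{\alpha=2}^{d_2} \hat{m}_{\alpha l}(X_{i\alpha})\big) T_{il}$ and $\hat{\mathbf{Y}}_1 = (\hat{Y}_{11},\ldots,\hat{Y}_{n1})^T$, with $\hat{m}_{0l}$ and $\hat{m}_{\alpha l}$ defined in (4.4.2) and (4.4.1) respectively. The spline-backfitted kernel (SBK) and spline-backfitted local linear (SBLL) estimators are

$$\hat{m}_{\mathrm{SBK},1,\cdot}(x_1) = \big(C_K^T W_1 C_K\big)^{-1} C_K^T W_1 \hat{\mathbf{Y}}_1 = \Big(\frac{1}{n} C_K^T W_1 C_K\Big)^{-1} \frac{1}{n} C_K^T W_1 \hat{\mathbf{Y}}_1, \qquad (4.4.3)$$

$$\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1) = \big(I_{d_1\times d_1}, 0_{d_1\times d_1}\big) \Big(\frac{1}{n} C_{LL,1}^T W_1 C_{LL,1}\Big)^{-1} \frac{1}{n} C_{LL,1}^T W_1 \hat{\mathbf{Y}}_1. \qquad (4.4.4)$$

The following theorem states that the asymptotic uniform magnitude of the difference between $\hat{m}_{\mathrm{SBK},1,\cdot}(x_1)$ and $\tilde{m}_{K,1,\cdot}(x_1)$ is of order $o_p(n^{-2/5})$, which is dominated by the asymptotic size of $\tilde{m}_{K,1,\cdot}(x_1) - m_{1,\cdot}(x_1)$. As a result, $\hat{m}_{\mathrm{SBK},1,\cdot}(x_1)$ will have the same asymptotic distribution as $\tilde{m}_{K,1,\cdot}(x_1)$. The same is true for $\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1)$ and $\tilde{m}_{LL,1,\cdot}(x_1)$.

THEOREM 4.4.1. Under Assumptions (A1) to (A5), (A7) and (A8), as $n \to \infty$, the SBK estimator $\hat{m}_{\mathrm{SBK},1,\cdot}(x_1)$ in (4.4.3) and the SBLL estimator $\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1)$ in (4.4.4) satisfy

$$\sup_{x_1\in[0,1]} \big|\hat{m}_{\mathrm{SBK},1,\cdot}(x_1) - \tilde{m}_{K,1,\cdot}(x_1)\big| + \sup_{x_1\in[0,1]} \big|\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1) - \tilde{m}_{LL,1,\cdot}(x_1)\big| = o_p(n^{-2/5}).$$

Theorem 4.4.1 follows from (4.4.13) and Propositions 4.4.1, 4.4.2 and 4.4.3, and remains true if the number of knots is of the more general form $N \sim n^{1/4} N'$ where $N' \to \infty$, $N'/n^{r} \to 0$, $\forall r > 0$ as $n \to \infty$.

The following corollary provides the asymptotic distributions of $\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1)$ and $\hat{m}_{\mathrm{SBK},1,\cdot}(x_1)$. The proof of this corollary is straightforward from Theorems 4.3.1 and 4.4.1.

COROLLARY 4.4.1. Under Assumptions (A1) to (A5), (A7) and (A8), for any $x_1 \in [h, 1-h]$, as $n \to \infty$, the SBLL estimator $\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1)$ in (4.4.4) satisfies

$$\sqrt{nh}\,\Big\{\hat{m}_{\mathrm{SBLL},1,\cdot}(x_1) - m_{1,\cdot}(x_1) - \Big(\sum_{l=1}^{d_1} b_{LL,l,l',1}(x_1)\Big)_{l'=1}^{d_1} h^2\Big\} \to N\big(0, \{v_{l,l',1}(x_1)\}_{l,l'=1}^{d_1}\big),$$

and with the additional Assumption (A6), the SBK estimator $\hat{m}_{\mathrm{SBK},1,\cdot}(x_1)$ in (4.4.3) satisfies

$$\sqrt{nh}\,\Big\{\hat{m}_{\mathrm{SBK},1,\cdot}(x_1) - m_{1,\cdot}(x_1) - \Big(\sum_{l=1}^{d_1} b_{K,l,l',1}(x_1)\Big)_{l'=1}^{d_1} h^2\Big\} \to N\big(0, \{v_{l,l',1}(x_1)\}_{l,l'=1}^{d_1}\big),$$

where $b_{LL,l,l',1}(x_1)$, $b_{K,l,l',1}(x_1)$ and $v_{l,l',1}(x_1)$ are defined as in (4.3.3).

Remark 2.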
The above theorem and corollary hold for mSBKfi, (Ta) and mSBLL,a,. (Ta) similarly constructed for any a = 2, ..., d, i. e., ...1 1 Tl A 1 .. mSBK,a,- (Ta) = (I—lcfiwacK) CEWaYa, (4.4.5) A d A A where Yia = Yi “ 21:1 {"101 + E1ga’g(12,a’¢a mat (Xia)}- 62 4.4. 1 Decomposition In this section, we introduce the ideas of the proof of Theorem 4.4.1. Our main objective is to study the difference between the smoothed backfitted estimator ThSBK,1z'($1) and the smoothed “oracle” estimator filKJl’ (51:1). First, define the theoretical inner product of b J and 1 with respect to the a-th marginal density fa (1:0,) as ch, = (b J (Xa) , 1) = f b J (Ta) fa (Ta) data and define the centered B spline basis b J,0 (Ta) and theistandardized B spline basis B J0, (Ta) as CJ, 5.1, ($ ) a bJ—l (Ia) :BJ,a ($0) = _Li ,1SJSN+L 446 CJ—1,a lle.a”2 ( ) bJ,a (Ia) = (U (330:) - so that EB”,K (Xa) E 0, E830 (X0) E 1. For any n-dimensional vector I‘ ={F1, ...,I‘n}T, we define the additive spline co- efficient function constructed from the projection of I‘ on the inner product space d .. d N 1 .. . . (09;, (-,.)2,n) as (PnI‘) (x,t) 2 21:1 {701+ 21:12:12}; 7J,a’[BJ’a (Ia)}tl,, 1n whlch . . T .. . n d1 d2 N+1 2 2 Pi - 2 70,1 + Z Z 'l'J,a,lBJ,a (Xia) Ti , (4-4-7) i=1 1:1 0:1 J=1 so one can rewrite the linear spline estimator in (4.4.1) as m (x, t) = (PnY) (x, t), where we denote by Y = (IQ-figs" the response vector. The coeficients of the linear regressors t1, 1 S l S d1 are denoted as the multivariate additive spline functions d2 N+1 (Pair) (x) = $111+ 2 2 $14,181.. (ma) .1 = 1.011. a=1 J=1 - (1 Note that (Pn,lI‘)(:ra) = 70,, + 202:1 ( h,a,lr)($a) where ( 30,11“) ($0.) = 29:11 'er’aJBJp, (220,), we define the empirically centered additive components (Pma’ll‘) (ma), 0 = 1, ..., (lg n (Pmlr) (x0) = (P;,a,,r) (1:0) _ 71-1: (p;,a,,r) (X,,,). 
(4.4.8) i=1 Using these notations, spline estimators of ml(x) and matte“) are mm = (Ple) (x) ,ma, ($0) = (Pn,a,lY) ((120), while noiseless spline smoothers and variance 63 spline components are 17119C) = (Pn,lm) (X) 1771011 ($a) = (Pn,a,lm) ($a). gl( :(Pn, IE) (x) 501(HO) (P11,a,lE) (330) (4-4-9) where in = {m (Xi, Ti)}¥1:,-Sn is the true function vector and E = {0 (X5, Ti) eilfsisn the error vector. Due to the linearity of operators Pup and Pn,a,l: 1 S l S d1,1 S a S d2 and Y = m+E due to (4.2.1), one has the following crucial decomposition for proving Theorem 4.4.1, m, (x) = m, (x)+El (x) , ma, (Ta) = ma, (Ta)+éal (ma) ,1 SIS d1,1 S a S d2. (4.4.10) We define additionally an auxiliary entity e3, ($0,) = (P:,,C,,1E)($a).1 g l 3 (11,1 g a 3 d2. (4.4.11) Definition (4.4.8) implies that 50,1 (Ta) is simply the empirical centering of {7:31 (2:0,), i.e. n 801 (23(1):...— 5;! (Ia)— M: (4.412) .21 According to (4.3.1) and (4.4.3), 1 1 . 1 1 WSBK,1,-($1)-mk,1,.($1)=(Ecfiwcx) -CKW1 (Y1 Y1). .. - . T Y1 -Y1= (Y1,1.°" .Yn,1) "- (Y1,1.°~ .Yn,1)T d1 = Z{m01 — mm + "1,1,1 (X44) - 711-1,: 09,1» Ta [:1 lSiSn d1 = CK (m0! ‘ 77100131911 + 2: ("1-1.2 (X231) - 771-1,: (X4,_1)} Ta ’21 lSiSn where making use of the definition of mo) and the signal noise decomposition (4.4.10), the difference mm, ($1) — mSBm, (T1) — mg, + m0, can be treated as the sum of two terms d Tl 1 ’1 1 1 . (Ecfiwch) ECEW 2 {mm (X111) - m-1,z (39-1)} 7“,, (=1 i 1 ll 64 —1 = (iCEV‘HCK) {‘I’b (171) + ‘I’v ($1)}:l121 (4413) where d1 " d - 1 ‘1’b(I1)— " —CKW1 [Z {m-1,(1X4,_1) - m-1,z (Xi,_1)}Tu] = {‘I’by ($1)},,=1. z— 1 -=1 2 (4.4.14) (11 ' n d1 d2 ‘I'v ($1) — -CKW1 26-1,2(X1,-1)T41 = {‘I’UJI ($1)} ,,_1 £11,: (XL-1) = Zéaz (Xia) (4.4.15) and 1 " d1 ‘I’by ($1) = #1; Z Kh (X11 - 131) Ta! 
2 {m_1,z (X,,_1) - 7714,: (X,,_1) } Tu i=1 1: 1 ‘I’v,t'(-'”1) = —;Kh(Xil $1)Tz’d215-1,(I(Xi,-1)Til ' 1:1 The term \Ilb (2:1) is induced by the bias term Tin” (X,,_1) -— m_1,l (X134), while ‘11,,(221) relates to the noise terms g_l,l (Xi,-1)- Both of these have order op(n'2/5) by Propositions 4.4.2 and 4.4.3 below. PROPOSITION 4.4.2. Under Assumptions (A1)-(A4), (A 7) and (A8), as n —-+ 00, sup sup |\Ilb l’ (1:1)l = 0,, (n-1/2 + H2) = 019 (n’2/5) . 131'ng x1€[0,1] ’ PROPOSITION 4.4.3. Under Assumptions (A1) to (A5), (A 7) to (A8), as n —-> 00, WW; ($1)| == Op (N (log n)2 /n + H2) = 0,, (n‘2/5) . sup1 51' 541 S“13:1:le[0,1] According to (4.4.12) and (4.4.15), we can write ‘11,} [I (.131) — Wat ) ,1(:l: ) — \II(1 1'10””) where (1) 11 d1 \Ilv,” ($1) = 4:219: (X21 -$1)Tz'1TzI n 14:216.,“ X431) (4-4-15) {211: 1 (2) " d1 11%,, (x1) = 71—122mm“—x1)1,mT,,5,,(x ), (4.4.17) i=11=1 65 in which E:1J(X ,- _-1) —§*1(Xia) and E’" l( Xia) is given in (4.4.11). If further one 0220 denotes Wm“! (X13331) = Tz'zTuIKh (X41 - $1) BJ,a (Xia) , ($1) = Baum)! (X, $1) (4.4.18) #wJ,a,l,l’ then by (4.4.17), (4.7.9) and (4.4.11), 110,), (2:1) can be rewritten as n 0’1 N+1 d2 1142,41) =n 1:: Z ZaJMma”, (X,,:1:1) (4.4.19) 1': ll: 1J= 10:2 LEMMA 4.4.1. Under Assumptions (A1) to (A5), (A7) to (A8), as n —) oo, \IISI),(:1:1) defined in (4.4.16) satisfies SUPISI’Sdl squle[0,l] lily}, (u)| = 0p (N (log n)2 /n) . LEMMA 4.4.2. Under Assumptions (A1) to (A5), (A7) to (A8), , as n —-> oo, Q1182) (11:1) 11:22,), (1:1)l = 0,, (H2) . defined in (4.4.17) satisfies suplgl’Sdl squle[0,1] Proof of Proposition 4.4.2 is given in the Appendix, while Pr0position 4.4.3 follows from Lemmas 4.4.1 and 4.4.2. Lemma 4.4.2 follows from Lemmas 4.7.13 and 4.7.14, both proved in the Appendix, while the proof of Lemma 4.4.1 is given in the Appendix. 
Similar result can be proved for mSBLL ll’ (:31) by extending :41) 1’ (2:1) and ‘11,, 1’ (1:1) to terms such as ‘ 71,219: (Xi 1 " “31) (193% 2:1) Tm: {m_1,(1Xi,-1) 77741,: (XL-1)} Ta, 1 n EZKh(Xi1"2('L—$l)(—:) Til’ée-lfi i,-1) Til» i=1 4.5 Implementation We implement our procedures with the following rule-of-thumb number of interior knots N = Nn = min ([n1/4logn] + 1, [n/4d1d2 —1/d2] — 1) which satisfies Assumption (A8), i.e.N = Nn ~ n1/ 4 log n, and ensures that the num- ber of parameters in the linear least squares problem (4.4.7) is no more than n/4, i.e., d1 {1 + (12(Nn +1)} fl 71/4. 66 By Corollary 4.4.1, the asymptotic distributions of the estimators ThSBLL,a,- (x0) de- pend not only on the functions bLL l l’ a (ma) and U)!!! a (ma) but also crucially on the choice of bandwidths ha. So we define the optimal bandwidth of ha, denoted by happta as the minimizer of the total asymptotic mean integrated squared errors (AMISE) of {fiza1(xa),l= 1, . . . ,dl}, which is defined as d 2 1 AMISE{ma.}- / Z ZbLW, (mama +4450, (ma)/(nha) fa(xa)d:ra. (=1 By letting dAMISE {771m} /dha = 0, one gets the optimal bandwidth happt as d 1/5 "-1 f Zzil=12’1’1’,a(33a)far (ma)d—’Ea ha,opt = (11 d1 2 , 4f Z:z’.—.1 {21:1 bLL,l,l’,a ($11)} fa (500:)de where 4f 221’ :1 {21:1 bLL,l,l’,a ($04)} fa (37a) dSEa lS approx1mated by 2 d1 ”-1 Z/‘Z (K) 2 :lemflzl Xia) fa( (Xia) (In/,0 (Xia) ll___1 .— To implement this, we propose the following simple estimation methods for terms mgl (1:1), qll’,a (ma), “um/,0 (ma) and fa (sea). The resulting bandwidth is denoted as illppt- o The derivative function mg, (Xia) is estimated as 22:2 k (k — 1) amhkxfa-Z + 6 25:13 51011,]: (Xil — :Z’k-3) where {ambkfitfi minimize the following least squares n N+3 2 2 Y1- (1:1: Zaamlk a+zaan(1k (Xia_ta,k— (3)3 Til i=1 [210:1 k=0 where miniXfl = to < 0.02. On average, the higher R&D investment - spending causes faster GDP growing. However, overspending on R&D often leads to high losses (Culpepper, 2004 and Tokic, 2003). 
We have also computed the average contribution of R&D to GDP growth for 1964-2001, which is about 40%. The GDP and estimated TFP growth rates is shown in Figure 12, it is obvious that TFP growth is highly correlated to the GDP growth. For more details, see Arnold (2005). 4.7 Appendix 4.7 .1 Preliminaries In the proofs that follow, we use U and u to denote sequences of random variables that are uniformly O and o of certain order. LEMMA 4.7.1. (Xue and Yang, 2006b, Lemma 14.2, Lemma A.5) There exists a constant co > 0 such that for any sets of coefficients {aoz,aJ,a,z,15J5N+1,1313d1,15a5d2}, (=1 a=l J=1 01:1 J=1 d1 d2 N+1 2 d2 N+1 Z 0'01 + Z Z aJ,a,lBJ,a)2 ti 2 C0: 001+ 2 Z (Mal) 70 and that as n —> 00, with probability approaching 1, d1 d2 N+1 2 d2 N+1 Z (“01 + Z Z aJ,a,lBJ,a) t! _>_ CO :21: 1(001 + Z :2 aJa I) 2,n LEMMA 4.7.2. Under Assumptions (A1) and (A8), one has: (i) there exist constants ef,C’f,co (f) and CO( f) depending on the marginal densities fa (Ira) ,1 S a S d2, such that cfH S CJ,a S CfH andc0(f)H S ”1),/,0“: S Co(f)H. (ii) uniformlyfor J, J’ =1,...,N+1 1 J’ = J E {BM (Xm) BM, (2%)} ~ —1/3 |J’ — J| = 1 1/6 |J’ — J| = 2 k Hl-k JI_ J < 2 EIBJ,a (Xia) BJ’,a (Xiall ~{ 0 if __ Jl ; 2 1k 2 1. LEMMA 4.7.3. Under Assumption (A2), for VT defined in (4.7.15) and ST = Vi} CQCVId1{d2(N+1)+1} S VTgCQCVId1{d2(N+1)+1}’ CQCSId1{d2(N+1)+1} S STSCQCsId1{d2(N+1)+1}- Proof. By definition, VT = E [E (TTTI X) ® { B (X) B (X)T}]. According to Assump— tion (A2) and Theorem 20, p. 192 of Zhang (1999), VT S CQId1®E {B (X) B (X)T} S CQCVId1{d2(N+1)+1}' One can prove similarly the result for ST. E] Lemma 3.6.1 and Assumption (A3) ensure the existence of functions god 6 0(0) [0, 1] such that l “90,, - mallloo< _ Co0 “miduo0 H2, a— _1,. ..,d2,l = 1, ...,dl. (4.7.1) 4.7.2 Oracle smoothers In this section, we prove Theorems 4.3.1 and 4.3.2 for mm, (.131). Corresponding proof for mum. 
(:51) would require replacing Kh(X,-1 - 1:1) by K], (X21 — (1:1) (£15,111) in the proof, which does not add a great deal of difficulty. According to (4.3.1), _ 1 ‘1 1 mK,1,. (:51) —- 177.1,. (£131) = (ECEWICK) gngl {Y1 " CKml,- ($1)} 1 71 Tl. Y1- CKm1,- ($1): [:22 {m11(X mu ($1)}Tz'z +0(Xi,Tz‘)€i] , i=1 then i—Cfiwl (Y1 — CKm1,- ($1)) iS d1 d1 [hi ZKh (Xil— $1)T {1’ [Zlmll(xi1) — mll ($1)} Til + 0(xi1TilgiJ] l— 1 III—11 = {Bil ($1) + Vz’ ($10211; where 1 " all By ($1) = a Z Kh (X21 - Tut 297111091) — m11($1)}Tz'z=Z B, 1' ($1) i=1 1: 1 B; 1' (1'1) — £2 KMX 2'1 — $1){m1z (X11) m11($1)}7}sz-1u (4-7-2) V1! (x1)= £210. (X 1 —— $1)sz (X.- T.) a. (4.7.3) ”i=1 Denoting D1 1’ (1:1)=-,1,-Z,-__1 K h (X21 — x1) TilT -,l; the dispersion matrix is I ECIQWICK : (— ZKh(Xi1 - mllTilTi I’d) (D1 1' ($1))IJ’: 1 d1 z: l,l’=1 LEMMA 4.7.4. Under Assumptions {A1} to (A4), (A6) to (A7), as n —* oo, sup sup IBM (:51) — b.,.1KH’ (3:1)h 2[=0 p(h1/2logn/\/1_t) 1311'ng 3:16[h,1—h] where for any 1:1 6 [h,1 — h], , 3f ' ( l ” b11151 (x1)=§uz 1,a(1 + 6) > 2/5, which requires 6 > 1/2 provided by Assumption (A2). We make use of the following truncation and tail decomposition TilDI’J where T?" 211’ ,1 spondingly the truncated and tail parts of (was TIT“; {lTuT (II > Du}, T.D “,2 = Tisz'z' {lT'isz'l’l _<_ Dn}. Define corre- D Cz’,n,:1 Ci ,1). (:51an T01”, 1) a €11,112 : Ci,n($11Xi,T,u’/12)- 73 According to Assumption (A2), 6 Emmi/1m") _ 2: 1,, a" :2 ( . . ) /\ 2+6 2+5 1121 n=1 D51 ) n=l D" oo 06 oo __ k -—a 2+6 Ems—06:12 < > 2. According to Lemma 2. 5. 2 (Bernstein’s inequality), 2k 2 gen 71 2E+1 P >n€ ne logna n 25m2 + 5023 25m2 + 5c 5 _ 108 n 2 n 2 1 " 25m2 + 50 a 2 l Vnh > C3a2 log n C3a2 log n ~ a2 log n _ 1_08" :25n12 + 50 nan‘2/5 10 n ’ 257712 + 5 Da 2 C0 g 2 C0 nah—71h. 52-, a1=2§+2(1+ )=O(logn), 25mg + 5c1€n 6/7 a (3)=11n 1+ m3 ,withm — max “5 H (C60 2 an 3 1<'< alog n/Jfi} _<_ 0(log 71) exp (—c5a2 log n) + Cn2“6)‘0¢2/7 2 = 71-65“ O(log n) + Cn2-6A002/7, for c0, c2, a large enough. 
For all 1:1 6 [h, 1 — h], we discretize by equally spaced h = 131,0 < 131,1 <---010gn/x/5} i=1 P max 11'" OSjSMn Mn 71 .<_ ZP{n'l Z Ci,n,2 ($1,j) 0:1 i=1 for a and c2 large enough. Borel-Cantelli lemma implies that 11 71—1 21 Cm (9314') 1: whole interval [h, 1 — h], one has > alogn/fi} S Cn—BMn 3 C1172 maxlgngn = 0p (alogn/fi) a.s.. Taking supremum over the n _1-1 311p n Ci,n,2 S :1: Ci ,,n 2 I1€[h, l—h] ; 0 oo, SUP lEéz-NIQNII - h—lfl (371) E (T1IT1II02(X1T)|X1 = $1) “Kllgl = 0 (h), $16[h,1—h] sup [1213," — h‘1f1(T1)E( T102 (x T) 1X1 — x1) ||K||2|0 = $1E[h,l-h] Proof. According to (4.7.7), Eé.,n,1e.,n,2 = ET1T202 (X. T) K12. (X1 —— x1) — “,2 th "171512 tdtd _‘ [011612 1161112001,)? h f(uv) Ll 2 1 u—x 2 1 1 — tta u,t —K 11 ,u ,t du du dt /d1/[0,lld212 ( )h2 ( h )f(1-1)1_1 77 = h—1 _ t1t202 ($1 + hu1,u_1, t) K(U1)2 f (11:1 + hv1, u_1,t)dv1du_1dt R41 [0 1]d2 1 [—1,1] 1 1 1 11 Rd1 [0,11d2-1 [4,1112 2 2 2 2 <90 (171111-110 30 (931111-110 2 2 {a (x1,u_1,t)+ 6:61 1111+ 23111 (1111) +1101) 2 3f (931111-110 32f ($1111-11t) 2 2 K (111) {f (11:1, u_1,t) + 31:1 hul + 263% (hul) + u (h )} dv1 du_1 dt __ -l 2 __ h ./[—1 1] K (1)1)2 dv1 Rdl [0’11d2_1 t1t20 (11:1, u_1, t) f (331, u_1,t) du_1dt + U (h) = TM (11) E (T1Tzo2 (x, T) IX1 = 2:1) IIKII§ + U01). Similarly, one has for any l’, l” E1,W,,,,g,,,,1 — h 11(11)E( 1,11,”. (x T) |X1- — $1) IIKIIE + Um) E13,, — h-ln (1,141,102,111 T) |X1— — x1) ”1112+ I101) .13 LEMMA 4.7.7. Under Assumptions (A I) to (A3), (A5) and (A 7), as n -—> 00, there exists a constant C such that 1 S Ch—Ygafi -— 1')??? fort 75 j sup sup Icov (ginlhéjnl’I) 15,13d111e1h1—11] ’ ’ ’ ’ + Proof. According to Davydov’s Inequality [Bosq 1998, p. 21. 
equation (1.10)], for %+ % ;1-' = 1, cov (62,11,111, 3311,11) is bounded by C2 {201 (.7 — 1)}1/1)TIITil’0(XiaTi)EiKh(Xil- 771),)“ IIT leIU (XjaTj) 5:th (Xj l — $1)IIT Let q = r = 2 + 17, p = 1 + 2/17, where 17 takes value in the Assumption (A5), then one has _1+fl g cov(£inl’1£jnl”) S Ch +770 (j — 2') +77 for some constant C. C] T Proof of Theorem 4.3.1. For any A = (A1, ..., A111) 6 Rdl, d1 d1 11 71. d1 T d1 _ __ 1 A {V1I("1‘1)}1I=1 "- )2 Az'VzI (931) - 2: )‘z'; 251,11,1'=%Zl 1}; )‘1I51' 11,1, 1:1 1:1 1: 1-1.1 78 d (1 Define £1,” = 213:1 A141,”; and Sn = Sn (2:1) = 221:1 51,11 = nAT {V11 ($1)}l’1=1’ then one has ES" = 0. Let d1 d1 '7 (k) = ’7 (k, 131) = COV (€1,nv€i+k,n) :2 COV (Z Allé’ifllJ” Z Alléii-kfllll) z’:1 1’:1 11;...) 211:1) 1:1 {:1 1713' I :3]. =nvar(€,-,n)+n Z ( lkl)7(k)=nvar(£,1’n)+nAn. ISIkISn-l In the above, var (gm) : var(Z;1,1=1 Al'giflll’) : 1141\sz where (11 d1 E = h {cov (€1,n,l” {mu/I) }l' l”=1 = hE {€i,n,l’€i,n,lfl}l, (”=1 = 1 ($1) IIKHE E {TTTa2 (x. T) 1X1 = 111} by Lemma 4.7.6. While according to Lemma 4.7.7, one has 2 $3 13’.- I7 (k)| S d1 13,3521 ICOV (€2,n,l”€1+k,n,l”)I S Ch 0 (k) ’7 . Hence lAnl = Z 106) ISIIISn—l s 2 (1i?) h—%$%{Koexp(—Aok)}7%7 ISIIISn-l -2210 S Koh +” Z eXp{—I\0k77/(2+77)}, ISIIISn—l _ 1+1 so there exists a constant Cl such that An < Clh 2317. So An/ var (gm) —1 0 as n -> 00. Then 03, ~ 11 var (51,”) 2 can when n is large, so according to (2.5.1) in Lemma 2.5.1, there exist constants c1 and c2 such that for some 0 < 17 g 1 d" {log(on/c(1)/2) /)‘}1+17 . 
TI C0011 P{0;ISn < z} —(z)I g cl An = sup 2 for any A with )11 _<_ A S )2, where A1 = 02 {10g(011/c111/2)}b/n1b > 2 (1 + 77) /I1; 12 = 4 (2 + I2)I7'110g(0n/c111/2) - 79 For the 17 in Assumption (A5), set A = 4 (2 + 77) 11"1 log (an/céfl) , then by (4.2.4) one has lgi_<_n lgign z’=1 z'=1 d1 d1 d4] = max {El 2 Al’éz’mJ’liz-HI} = max {El 2 )‘I’Tz'l’a (Xi, Ti) EiKh (Xil — 1:1) |2+77} d1 S (30501; {El 2 Kh (X1 ‘ 171) PM} ..: 0 {h_(1+n)}’ l’=1 i.e., An = 0 {h-(1+n)/gg} = 0 {n(1+n/2)/5-n/2} = 0(n1/5—277/5) —+ 0 when 1/2 < 17 _<_ 1. So Sn/an —+ N(0,1), then «72,th {V1, ($137,; ——) N (0,AT2A). By Cramér—Wold device, one has \fli {VII ($1)}:i,1=1 —-» N (0, 23). Then according to Slutsky’s theorem, one has d1 d1 VnhE(TTT|X1 = 931) {771K,1,- ($1) — m1,z' ($1) — 251,1! ($1) ’12} —’ N(0,2) (=1 ”:1 . .. d d1 i.e., Vnh {mK,1,. (x1) — mu, (3:1) _ 21:1 bu, ($1) h2}l’=1 —> N (0, Q1 (:z:1)_l 20,1 ($1)—1), where Q1 (2:1) is defined in (4.2.3). Cl Proof of Theorem 4.3.2. Let Dn = n“ with a < g, o(2 + n) > 1,a(1+ 1)) > 2/5, which requires 17 > 1/2. Rewrite Z,- = Til/5,- = Z51" + Z52" + Z3" where 23" = z, {|Z,| > Dn} ,ngn = Z,- {12,1 g on} — zflngn = E2, {|Z,-| g Du}. Define €i,n,l’,j = Kh(Xi1’ $1) 0 (X231?) Z-D" j =1,2,3- 2,1' ’ According to Assumption (A5) and (4.2.4), one has oo CUEIZi|2+n = C i E {ITIIIZME (|5|2+fl |X,T)} 21002.: 2 D") Z l/\ 2+1) 0 2+1] 1121 n=1 D71 n=1 D" oo oo 1 .. s Cachlrz},|2+’7§ : 2+" = CachIMME :17. 042+") < 00. 1121 n 1221 By Borel—Cantelli Lemma, one has with probability 1, n'12?=1€ 1 = O for large n. i,n,l’ = U (“n-k) for any I: > 0. Using As— Therefore, one has squle[0,1] ln‘l 23:152. n (I 1 sumption (A5) and (4.2.4) EIZi|2+n D 2,,3"| = I-EZz- {I24 > 01.}! s 01+" Tl 8O _ E {E |T,/|2+’7 E (|5|2+'7 1x,'r)} /D,1,+" = o (n-Z/S). Hence "—1 2g,” =n-1 Z Kh(X -1 — $1)0(X1,T,-) 21g" 1:” = 12,-1 Zn Kh(X,-1 — 1:1) 0 (72—2/5) = 0,, (72—2/5) . 
i=n Meanwhile D 2 D 2 D 2 (2:32") = EZE{|Z,-| S Dn} ‘ (2:33") = E212 - EZ,-2{|Z,-| > 0"} — (Z1351) 2 2+ D 2 — — s E {731E (e,- IXe-Je) }—EZ,. '7 {lZil > Du} /Dr‘z/_(Zi,3n) = EfiirtUp (Dun + n 4/5)’ E D" 2 6,2 m112=E{Kh(Xi1 - $1) 0 (Xi/Ti) 2:32 } -—- h’1f(rv1)E (T302 (X,T) 1X1 = x1) IIKH§ {1 + u (1)} k k—2 2 = E ( €i,n,t’,2l lée,n,z’,2l ) 2 E5 '2 <002’“ 205; 2/hk 2E|§,,,,,,,2 I E léi,n,l’,2 lk_2 S sup léi,n,l’,2 Ei ,n ,,21’ $16[O,l] according to Assumption (A3) and the truncation of Zipz‘", then there exist a constant k ) C1 = CaDn/h 81101] that E ( €i,n,l’,2l ) S Cllc-zklE(€?n (I 2), k 2 2. Similar to the proof of Lemma 4.7.4, by using Lemma 2.5.2 (Bernstein’s inequality), we 6/7 m logn letk=3,a 3=11n 1+ 3 ,m2=E2 =0 h‘1,5 =a—-— 2” ( En ) 2 (5mm) ( l n W7}; 6 n 2 gen 17. 7 >115 log na nh 2 2 -— 1 25m2 + 5c5n 25m2 + Sclen 25m2 + 501a 0g: Vn C3a2 log n __ C302 log n 2 1 “ 2 _1/2 12 ~ azlogn, 25mg}; + 5CoDn/ha 0g "h 25m2h + 5ac0nan h" / log n v nh 81 0 =2: +2 =Olo n, 16/(71+25m2+5015n) ( g ) ("33 C6Dn 2 3 <11 1 = , a2() n{ +an-1/2h 1/210g77.} 0(n) 6 7 n 6" (IQ—1]) / S (K06 *0[q+1]) Son—6A002/7, a2 (3) - '6 €i,71,l’,2||3 S CfiDn, ,with "7.3— — max 1_<_i alog n/Jfi} S 0(log n) exp (—c5a2 log n) + Cn2"6)‘OC2/7 2 = 72765“ O(log n) + Cn2-6A002/7 sup = 0,; {(nh)—1/2logn} . xle[0,1] n -—1 n 2 €2’,n,l’,2 i=1 1?. -1 n .21 the z: = 0,, {('nh)-1/2 log n} i.e., Then sulee[0,1] sup IV}: (21:1) I— — Op {(nh) 1plogn} (4.7.8) $16[0,1] for the term V}; ($1) in (4.7.3). According to Lemma 4.7.5, d d1 1 772K,1,. (171) —' 7711,. (1171) = (1 CTWIC)-1 Z BIJI ($1) + WI (151) [=1 "=1 d1 d1 —1 _—= {{ETlTl/Kh (X1 — $1)}Zi’zl + Op (1171/2 log 11)} {231,11 ($1) + VII ($1)} 1:1 [’21 d1 d1 = [{ETlleKh (X1 -— $1 ”321] {2: BL 1,( ($1) + V1; (2:1)} + 0p (n—1/2 log n), =1 [I21 [{ETITz’Kh (XI—$1)}z}121]_1 = [f($l)Q($12X_1)+u(h2)]—1 = f‘1(:v1)Q‘1(x1,X_1) + u 022)"1 . 82 Meanwhile, according to Lemma 4.7.4 and (4.7.8), d1 EB”; (:01) + V); (11:1) = Up(h1/210gn/\/fl + hz) + Up {(nhrl/zlogn}. 
(=1 According to Assumptions (A1) and (A2), f‘1 (3:1) 3 cfl, 0611,11 _<_ Q'1(:1:1,X-1) S, calIdl, so SUleeUzJ—h] |mK,1,.(x1) - m1) (x1)| = Op{(nh)"1/2logn}. El 4.7.3 Estimation of constants To closely examine terms If) (x) and 50,1 (2:0,), we denote the following vector of coefficients T a = {(1011 01,1,1, "-1 aN+l,d2,lv 0'02: 01,1,2, ”'7 aN+1,d2,21 "-i a0d1) a1,l,d11 "-1aN+l,d2,d1} (4.7.9) such that the noise term 5:) (x) in (4.4.9) is expressed as 0‘2 N+1 (Pn,zE) (x) -—= a (x) = 401+ 2 Z 4,0,3), (ma). (4.7.10) a=1 J=1 -1 Equation (4.7.10) implies that a = (DTD) DTE , where . D = {D (X1,T1) , ...,D (Xn,Tn)}T = {T1®B (X1) , ...,Tn®B (Xn)}T, (4.7.11) B (x) = {1,31,1 (4:1) ,...,BN+1,,,2 (xd2)}T,t = {t1,...,td1}T. (4.7.12) Note that 5 given in (4.7.9) can be rewritten as a = GDTD) -1 GDTE) = (VT+V.})_1 (i—DTE) , (4.7.13) where by (4.7.11) DTD = 2 [(BT?) ® {B (xi) B T}] ,DTE = Z [{T.-®B } a (my) 54], i=1 i=1 (4.7.14) and Vi} is the difference between empirical and theoretical inner product matrices, i.e. vT = E [(TTT) <3) {B (X) B (X)T}] = E [Q (X) B {B (X) B (X)T}] , (4.7.15) Vi} = ii: [(T,T§’") 49 {B (X,) B (X,)T}] -—E [Q (X) <8) {B (X) B (X)T}]. i=1 83 T NOW define a = {0:01, a1’1,1)-") aN,d2,11 0'02) 01,1,27 "'7 aN,d2,2) ""l O’Odl? a1,1,d17 "" aN,d2,d1} by replacing (VT+V'}~)_1 with V2111 = ST in the above formula, that is s = v.1} (n—IDTE) = sT (n-lDTE) . (4.7.16) LEMMA 4.7.8. Under Assumptions (A1) to (A3), (A5) and (A8), as n —2 00 “on = 0,, (n‘1/2N1/2 log n) , (4.7.17) “5 -— an = 0,, (n‘1N3/210g2 n) , (4.7.18) ”an = 0,, (n’l/ 2N1/ 2 log n) . (4.7.19) —1 Proof. By definition, 5TDTD5 = éTDTD (DTD) DTE -_— éDTE. Using (4.7.13), one has "135qu = n—laTDTDa =n*15TDTE g “a” “n-IDTB“. (4.7.20) According to Lemma 4.7.1, 2 ~ 2 = “Dall2,n ' 2,n 60 ”15“2 = Co: (031+ Z 030)) S Z (001+ Z aJ,a,lBJ,a t1 , z J,a,l l J,a,l (4.7.21) So "an is bounded by cal “n’lDTE”. 
Bernstein’s inequality and truncation entail that 2 ”n'lDTE“ = O,,{(logn)2 N/n}, so (4.7.17) follows from (4.7.20) and (4.7.21). Ac- cording to (4.7.13) and (4.7.16), one has VT 3 = (VT+V.}) 5, which implies that v3.5 -_- VT (5 — 5). OP (71-1/211"1 log n) ”all. By (4.7.17) one has ”VT (5 - 5)” _<_ 0p{(10gn)2n‘1N3/2} One obtains from (4.7.29) and (4.7.30) “VT (5 — a)” = ““115“ 3 Thus according to Lemma 4.7.3, one has us - on = Op (n’1N3/2 log2 n), which is (4.7.18). Then (4.7.19) follows (4.7.17) and (4.7.18). C] LEMMA 4.7.9. Under Assumptions (A1) to (A3), (A5) and (A8), , as n -—+ 00 n d1 d2 sup —1— 2:61;"; 2 50,7), = 0,, (71-1/2) . (4.7.22) 71 131,34 i=1 (=1 0:1 84 Proof. According to (4.4.9) and (4.7.10), one has d1d1 d2 N+1 n i2n112:6017}1= iznll;z Z aJwalBJa(Xia)T il i=1 l=1a=1=la==1J1 1:20 a—J,oz,l ”ET {,I’BJa(Xia) Til=I(’+II(’ J,a,l where III=ZaJafEI ZTu/BJp(Xia)T il: J,,al "i=1 - . 1 " 111’ = Z (“J,aa _ aJ,a,l) 1“,; ZTil’BJpz (X101) Til- J,a,l i=1 Let 11’ = 111,1 + I",2 where 11”] x Z a'J,oz,l {gt-1:217" {,I’BJa (Xz'a)T ~ET‘IIBJ,O (X0) 'Tl} J,a,l III” I < ”all V (N +1)d1d2 sup = Op (n-lNlog2 n) , III2=ZaJWalETleJa(-Xza)nl= (En/31.54%)?» ,vr;1(n‘1DTE). J,,al T _ T . Let v), = (ETIIBJfi (Xa) T1)J,a,lVT1 = (”$01) . According to (4.7.14) "$271 {,I’BJa (Xia)T —ET'I’BJ,O (X01) Tl n Il’,2 = "—1 Z Z vJ,a,lBJ,a (Xia) Til” (X11 Ti) 52'- i=1 J,a,l Since 5,- is martingale difference according to Assumption (A5) var (1&2) "" n 22"?“ Z ”J,,alBJ,a(Xia)T 2'10 (Xi Tilgz' =1 J,,a l 2 n < ”-2 Z CUE Z UJ,a,lBJ,a (Xia) Til '“' J,a,l where 2 Z ”JnJBJn (Xia) Ta = Z Z '“JMUJME {Btu (Xe) TIBJ'oI (X0!) Tz’} Jial J.a.l J’ ,o/ ,l’ = VINTVIT," = {Emma (Xa) mica vglvTvgl {1271.3 J,,, (Xa) 78.1,...) = {En/8.1,. (Xe) mi.) V111 {EB/8.1,.(Xe)71}),a) 3 CV “{ETyBJe (Xe)71}J,a,.ll: = 0 (1) because clearly IETyBJ’a (Xa)T)| = U(H1/2). So var (1,52) = 0(n-1), and therefore 111,2 = 0,, (n’l/Z). So '94 S III',1|+|II',2 = 0,, (71—1/2) . 
(4.7.23) Next, by applying Bernstein’s inequality with truncation technique, 51113312 Twl’BJa (Xia) Til— ETI’BJXI (X01) Tl: 0p”( _1/2 log 12.) ~ Ja,ln Thus sup LOJ ".1 11B JO (Xia,) Till is bounded by 3111),, ET zl'BJa(Xia)T —ETIIBJ,a(X)Tl + |E%B.,=.(X.)11|0(H1/2). 0: Then .. - 1 " [”14 S “a“ all \/(N +1)d1d2sup gZEt’BJoMioflh i=1 = 017(n'1N3/210g2 n) \/(N + 1)d1d20p(H1/2)=0p(n1N3/210g2.n) (4.7.24) Now (4.7.22) follows from (4.7.23) and (4.7.24). The lemma is proved. Cl LEMMA 4.7.10. Under Assumptions (A1) to (A5), and (A8), , as n —> 00 n ‘11 d2 2 n"1 2 Z Z {111... (X...) — m... (Xenia-1] = 0.014) (4725) i=1 l=1a=1 Proof. According to (4.7.1), there exists 9a! 6 0(0) [0, 1] such that “gal — mallloo = 0(H2) = 0(n'1/2) .According to Theorem 1.7 of Bosq (1998) p. 36, n‘1/223=1(T5— ET?) => N (0,02) where 02 Z23°___OOCOV(T02,1.)) < 00 by ap- plying Davydov’s Inequality [Bosq 1998, p. 21, equation (1.10)], then 71."1 3:173 = ET,2 + 0,,(n-1/2) = 0,, (1). so 2 n d1 d2 2 n 1 ”*1 Z Z 2 {Thai (Xia) _ "101(Xia)}Til:| Sn _1 Z 1[dzd 2 ”mal_ ma'ool“ i=1 1 1 l- 1 a: = (1:1 1 86 71 d1 d2 2 n < ”—1 Z [Z Z ”9011‘ mallloo Til] = O ("—1) (”—1 2T5) = 0P ("-1) '0 i=1 i=1 (=1 (:21 Proof of Propositions 4.3.1 and 4.4.1. According to (4.3.4), 7710 —- m0 = (CECE-ICE (Y. —- moT) _ d1 = (%C§CK) 1%CE0 (X..,T,-)e,. We know %C§CK= (.1: :3,_ 172173.) 1H according to Theorem 1.7 .of Bosq (1998) p. 36, one has 71'”2 23:1 {Tde-l/ — ETlTlr} => - Then N (0,02) where 02 = 2:92—00 Cov (T01T0l;,T,-)Tu/) < 00. Therefore %C§CK=(ET)TI}I)Z},=1 + 0,, (71-1/2). Similarly, %c§o(X,-,T,)e,=op (1.4/2), implying 311131513111 |m01— mozl = 0,; (71—1/2), which has completed the proof of Proposition 4.3.1. Next, According to (4.3.4) and (4.4.2), 7710 - 7710 = (01010401 (Ye - Ye) d1 d2 " =K(C TCK)—ICT [Z 2 {mal( Xia) '_ mal (Xian Til] a-l a=1 i=1 d1 d2 " = (CECK)-1CE [Z 2 {mod (Xia) — mal (Xia) + mod (Xia) — mal (Xia)} Til] =n<10£c K)-110TK[(§:§Zeazn): l— 10.]. 
=1 0‘1 d2 " + [{Z Z {Thad (Xia) _ mod (Xiallnl}] ] - One has d1 d2 £2175” Z Z (mad (Xia) _mal (Xia)) Til ”1': 1 [=1 0:1 1 n 1/2 l 11 d1 d2 2 1/2 3 (71- ZTZ) ; Z Z 2 (ma! (Xia)_ mal( Xia» Til i=1 il’ i=1 1:1 0:1 5 0p (1) Op (Ml/2) = 0,, (71—1/2) . (4.7.26) by Lemma 4.7.10. Then the Proposition 4.4.1 follows (4.7.22) and (4.7.26). Cl 87 4.7.4 Estimation of function components Define n ”—1 Z BJ,a (Xia) i=1 1 An,1 = SUP |<11 BJ,a>2 n “ (lvBJ,a>2| = Sllp J,a ’ J,a An,2 = sup 2 — (3.1371), BJI,QIT;1I>2' . (4.7.27) J,J’,a7éo/,l,l’ i" LEMMA 4.7.11. Under Assumptions (A1) to (A3), and (A8), , as n —-> 00 An; = Op (n"1/2 log n) , (4.7.28) An; = 01, (n_1/2H"1/2 log n) , (4.7.29) An,3 = Op (n"1/2 log n) . (4.7.30) Proof. The proof of (4.7.28) follows from Bernstein’s inequality immediately, thus is omit- ted. Here we only prove (4.7.29) and (4.7.30). We will discuss case by case with various 1, l’, a, 07’, J and J', via Bernstein’s inequality. For brevity, we set 52' — €7,n,J,J',o,o/,z,z' = 51,1 + 5232 = €4,1,n,J,J’,o,o',z,z' + 5.,2,n,J,J',o,ol,z,z' = BJ.a (Xia) BJ',o/ (Xia’) TilTu' ‘ EBJ,a (Xia) BJ’,a’ (X. )Tz'lTu’ D where 5.,- = B... (X...) B)... (X...) 1);}, — EB... (X...) 3.7,... (X...) #71,, j = 1,2 by the same truncation (4.7.5) in Lemma 4.7.4. Then 1471.2 = SUPJ,J',o,z,z'"‘1IZ?=1§.,n,J,J',o,o,t,z'l and An,3 = sup J, J’.a#a’.l.l’ n—l 23:1 6i,n’J’JI,a’al’l,lll. One has with probability 1, sup [Egg 1| = U(n-1) , (4.7.31) J,J’,a,a’,l,l’ n sup n43: £31 2 U (n—k) ,k > 0. (4.7.32) Jleiaiailil’ i=1 We will consider 0: = 07’ = 1 in the Case 1.1 to Case 1.4. Case 1.1 when (J —- J’ I > 2. The definition of B J’l in (4.4.6) will guarantee that BJ,1(Xi1)BJI,1(Xi1) = 0 if IJ - J'l > 2. 88 Case 1.2 when J = J'. According to Lemma 4.7.2, EBJa (Xia)T ilT'1’_ _ E {BJa (Xia) E (Tle 11’IXia) }z 0 (1) E{B?...(X.-..)T..T .21} =E{Bi..(X.-..)E(T.%T5,IX...)} ~H-1. 2 So E52 = IE {330 (X,a)7},7;,.}2 — {EBia (Xian-171.} I ~ H-l. According to (4. 7. 31), one has E53 2 = Efz— E631 ~ H’l. 
Lemma 4.7.2 provides a constant Cg > 0 such that 2 Dn k E I57, 2Ik — "E IBJ,a (Xi-QT?" 1,111 2 EBJ.a (X50) Till’,2I -2 < sup IBJa (X10) '1'? E622 Jmall’ S 0? “21124034575132 _<_ (CgDn/H)k_ZE€?,2~ —EBJa (Xia)TD" ill’ ,2 i,2!l’ I Using the same technique in Lemma 4.7.4 by applying Lemma 2.5.2 and Borel -Cantelli Lemma, when J = J’,a = a’ = 1, one has n —12€i,=2 i=1 sup 0(n ‘1/2H’1/2logn) Jmall’ and then we can get (4.7.29) combining with (4.7.32). Case 1.3 when IJ —— J' I = 1. Without loss of generality we assume that J’ = J + 1. EBJ,a(Xia)BJ+1,a (Xia)TilT1_1’ — E {BJ,a (Xia) BJ+1, a (Xia) E (TilT 11’IXia)} = 0 (1) E {BJ.. (X...) BJ+1.. (X...)T.-. Tn} ——E {IBJ.. (Xia)BJ+1,a(X(Xia)I2 E (712.1",§IIX...)} N H ’1, i.e., E53 ~ H"1. Similar to Case 1.2, (4.7.29) follows by using Bernstein’s inequal- ity. Case 1.4 when IJ —— J’ I = 2, all the above discussion in case 1.3 applies with replacing J’=J+1with J’=J+2. Case 2 when a = 07' > 1, all the above discussion applies without modifications. Case 3 when a 7S 0’. Without loss of generality, suppose a = 1, a’ = 2. First, we still need to calculate the order of second moment E522, which is E53 — E13,. The boundedness of the density function f (:51, x2) implies that s “will;1 HbJJHQ’ f / leJ (scab... (...) f(xinnwxldz. EBJ,1 (X41) 3,152 (X42) 89 S ”bJJllz—l “"12”;1 >< <7sz 3 CB,1H, for some constant 03,1 > 0, where the last step is derived by Lemma 4.7.2. According to Assumption (A1) and Lemma 4.7.2, 2 _ E{BJ,1(X11)BJI,2(X12)} =|le,1“22|le,2“2 23,/[b 1(1‘1'1) bJ/2($12)f($1,$2)d$1d332 .>. of {1111152 [11,. mom} {11,2152 [1131.2 (m) m} 1 _. = 613 {2 + 2631/0341 — CJ,1/CJ—1.1}{||"J.1||2 2 H} x 1 _ 3 {2 + 2c], 2/cJ,_1 2 c-—J/,2/cJ/_1,2}{Ile,2||22H} 2 03,2. 2 2 2 E5.- = E {81,1 (X11) BM (11,2) T117211} -- {E811 (X10191,2 (Xian-121,} = E {3,211 (X11) 331,2 (X12) E (EiE3/IX11,X.-2)} 2 —[E{BJ,1(X11)BJ’ 2 (Xi2) E(T1'1T11’1X1'11X1'2)}] - According to Assumption (A2), there exist constants 06’ such that Cg _<_ E512. 
Similarly, we can get 0; > 0 such that E6? _<_ C’, i.e., E522 ~ 1, then E522 ~ 1 by (4.7.31). Second, the k-th moment of [512' is given by k D k E [€12] = E IBJ’I (X11)BJ1,2(X12)T517,2 — E {BJ,1 (X11) 3.1/,2 (X12) “[11:22” and there is a constant Cg such that E l5£,2lk is bounded by D J,J ,u ’ S Cf‘sz‘kD£”2E€?,2 S (CgDn/ H )k-zEfig- Similar to the proof of (4.7.29), the proof of (4.7.30) is completed. Cl LEMMA 4.7.12. Under Assumptions (A1) to (A3), (A5), and (A7) to (A8), , as n —> oo = 0 (111/2) , (4.7.33) sup sup sup sup 2:16[0,1]1_<_JSN+1250_<.d21s1,1I£d1 ,0, , sup sup sup sup (4.7.34) zle{0,1]1:£J$N+123chd2 131,1’gd1n :1 {wJHa 1 1' (X23351) #wJa 11’ (331)} 7011 1:1 90 = Op (log n/M) , (4.7.35) where wJflJJr (X1,:z:1) and “WJ (231) defined in (4.4.18), hence ,a,1,1’ n-1 :wJflJJ/ (X1, 2:1) = Op(H1/2) . (4.7.36) i=1 sup sup sup sup x16[0,1]1_<_JSN+1 2_<.an2 151,1’Sd1 Proof. According to the definitions of w r X33231 and 1:1 in 4.4.18 , J,a,1,1 #wJ ,a,1,l’ 11w ’0’”, (1'1) 3 Ele,a,1,1’(X11$1)| J - 1 — = lle,a||2 1 / 1.1/14‘“ ,, $1) 1.1.1..) f (111, 110, 151,111) dU1d’uadtldtll g ”bJfl“;1 {/ Itlt1/K(U1)bJ (”u)| f (1131 + hvhumtbty) d’U1d’uadtldtll CJ,a cJ-—1,a + /|t1t1’K (U1) bJ_.1 (ua)| f (331 + hvbua, t11t1’) dv1duadt1dtlr} . The boundedness of the joint density f and the Lipschitz continuity of the kernel K will imply that there exist constant c2 such that / Itzter (1)1le (“all f ($1 + hv1,ua.tz,ty) dvlduadtldtfl S CQCKCzH, [ltltl’K (U1) bJ_1 (“011 f (£131 + hv1,ua,t1, tll) dvld’uadtldtll _<_ CQCKQH, and therefore E lefiJJ/ (X13221)! = O (HI/2). Meanwhile ,. E le,a,1,1’(xivxl)| = E 171171-115: (X11 — $1) 31.0: (X,a)|’" _ 1 -- = lle,a||2"/l(t,tl,)" firm (ul h 3:1) 3,0010!) 7' (ml/VIC (v1) {2 ( Z; ) CJLOCJja 3(ua)b3—_‘i‘ (110)} -—— "bang-"h” [/ a=0 f ($1 + h'U1, ua, t1, t1!) dv1duadt1dtll] . 
f (U1, Ila, t1, tl’) dUldUadtldtll The boundedness of the joint density f and the Lipschitz continuity of the kernel K will imply that there exist constant 02 such that f I (m)? K. («1) b3 (11.11233; (1.) f (11:1 + hv1,ua,t1,tl/) dvlduadtldtl! S CKC2H, 91 which implies that E wJa”, (X15131) ~11l ”THI 7/2, hence E013 an”, (X13231) N h 1. Define wJ,a,1,1’ (X13231) = CUJJQJJIJ (X1, 11:1) + wJ,a,l,l’,2 (X4, 231) where 1) . wJ,a,1,1’,j (X1131) = Kh(X1'1 _ 371) BJ,a (Xia) Tull/2J1] = 1, 2 by the same truncation (4.7.5) in Lemma 4.7.4. One has with probability 1, 2 E{wJ,a,1,l”1(x1', $1)} = U ("_1) and squ1E[0,ll [72-1 211:1 wJ,a,1,1”1()(1,$1)| = U (n‘k) for k > 0_ Define 811131161011 ”3,011,1' (X11171) = wJ,a,1,1’ (X1131) " EwJ,a,1,1’(X1'1$1) . (”2,0 11’,j (X21 $1) = wJHa l 11.71, ' (X11231) — EwJ,a,l,ll,j (X13 $1) ' ThenE{w;, ,,,,(x.- 21)} — ——w:;E{ H. (x..z1)} E(w},a,,,,,1(X.-.T1))2 ~ ml. and k 0,1,1',2 (Xz,$1)| is bounded by ’0 ,, ,2 (X12112 k... sup If I (X1321) — Ew r (X,- $1)l E (12* 1 00 71 d1 N+1 d2 7% 12}: Z ZaJaszmamxz-m) sup sup lily}, (2:1)l— — sup sup i=1 (=1 J=1 a=2 lgl’gdl xle[0,1] 1l< z: 2 2a ,,,,(z 1) 1: 1J2 la=2 d1 N+1 d2 11 Z Z 2 514,047“1 2 {wJfiM (xi: 331) '- #wJa 11' ($1)} = R1 (1171) + R2 (1:1)- l=1 J=1 a=2 i=1 1 H (4.7.38) . . . N+1 1/2 By Cauchy—Schwartz 1nequa11ty, R2 (2:1) 18 bounded by (2:1: 21.]: 1 Z: (1-2 aJ a I) x 1)”- : 0p(logn/\/1Th) , n IZ{meaul (X44171)"#wm,”,($1)} 1:1 (fig 1:1 J==l a=2 $16M] Observe that “an = 0,, (log n\/N /n) as given in (4.7.19) and sup $16[0,1] n IZ{wJall’(X11$1)/‘wm ,,,,(171)} 1:1 which is given in (4.7.34), so the order of sulee[0,l] R2 (151) by Assumptions (A7), (A8) is O n O 2 0,, (logm/N/n) \/(N + 1) 41 (d2 — no, (if???) = 0,, (W) (47.39) _ op ((logn)3 NH) . (4.7.40) 93 Using again the discretization idea, we divide the interval [0, 1] into Mn ~ 11 equally spaced intervals with endpoints 0 = 221,0 < 2:1,1 < < $1,Mn = 1. 
Then 3“px16[0,1]R1 (11:1) 41 N+1 42 < "151.134" 2.2% (”0‘me “HA 1310+ lgglnxiélzisgmrkl d1 N+1 d2 011 N+1 dz ZZZaJHav‘IH-‘w 0,11, M1)“ZZZGHJJOWW “ll/(1k) l==1Jla=2 l==1Jla=2 =T1+T2. Noting that dJflJ is d1 N+1 dz 1 Z Z Z SJ+(a—1)(N+1)+(l-1)d2(N+1),J,+(a’_1)(N+1)+(l”_1)d2(N+1)n— X z”=1 J’=1 a’=1 17. Z BJ’,a' (X1701)Til"a(Xi1Ti)5iw 121 according to (47-16), Where SJ+(a- -—-1)(N+1)+(l 1)d2(N+1),J’+(o/—1)(N+1)+(l”—1)d2(N+1) is the corresponding element in ST: VT'1.We define W n equals a,,,lcr’l n N+1N+1 -1 13314,, n X Z Z 5J+(a—1)(N+1)+(z—1)42(N+1),J'+(d-I)(N+1)+(I”—1)d2(N+1> 3,150,! (Xm) Til/I0 (Xi: Ti) 541%)“ I ,,, (17m)! then it is clear that T1 _<_ Edi 120:2 EW—l 222-1Wala’l’h To show that each term W l 01’,” has order Op (71—2/5) we truncate the T1115, by the same way in the proof of 0” Theorem 4.3.2, 1 2 D = 90 —<() <—. 4.7.41 11. n (2 + 7) 0 5) ( ) where 17 is the same as in Assumption (A5). Let Z,-== T [”52 2in + Zz-Dz" +Zi" 3 , where 2,3" = z,- {|z,-| > Dn} zf’gn- _. z, {W s D,,}— zflmzfln— _ EZ {)2 l < Dn} For fixed J,a,l,l', 1),, = Z 44,0”,(2:1).)B,r,a,(x.a)a(x.,r.)2,%" 15.1,ng SJ+(a—1)(N+l)+(l—1)d2(N+1),J’+(a’—1)(N+1)+(l"-1)d2(N+1)’ 94 and denote WD , as the truncated centered version of W0 1 a: l”? i.e., a,,a,l’l’ D — 4.7.42 Wa,l,a’,l”—1SkSMnn1: Ui ,k ( ) In the following, we will prove that IW al a; lu— W51, 0”,, = Op (H). It 18 clear that IW a l a; l” — Wat 0,1,,l 3 A1 + A, where n N+1N+1 D A1: ”’1 Z Z 2:1qu , (171,112) BJ’ or (Xia)0(xini) Z,- 1” 1_ 3, the r—th absolute moment E lUiJclr is E 2 “Wm,” (INC) BJ’,a’ (Xia) ‘7 (Xi’ Ti) 1_<_J,J’ SN+1 Z52"l'lmi)} SJ+(a-1)(N+1)+(l-1)d2(N+1),J’+(a’-1)(N+1)+(l”-1)d2(N+1)l E( r-—2 S 0303 (*6 ($1,k)}r 0(H1"’/2)D£‘2Vz,o S {con (31,1) DnH *1/2} rlE |U,-,,,|2, which means the sequence {Ugh }Ll satisfies the Cramér’s condition with Cramér’s constant equal to c... = c0n($1,k) DnH '1/ 2, applying Bernstein’s inequality for r = 3 1 n (1.03: n 6/7 P n" U- > <0. 
ex - +a 3 a , Z 2”“ —p" — 1 p 25m§+5c...pn 2” ([q+1D [=1 where 2 5m6/7 =pn'3/5H'1/210gn, a1=23+2 1+ p" ,a2(3)=11n 1+ 3 , p" r 2 r q 207712 + carp" p" 96 2 3 _ 1/ 3 m§~{'€(-T1,k)} v21), m3 .<.{c{~:(x1,k>} H l/ZDnVZfl} . Then by taking q such that [q—Z—I] 2 c0 log n, q 2 cm/ logn for some constants co, c1, one has a1 = 0(n/q) = 0 (log n), a2 (3) = 0 (n2). Assumption (A2) yields that 6/7 6/7 a<1a1>{<[.:.1>} andasn-eoo,onehas qu, > 01,0212'1/5H"1 logzn ,0277."1/5H"1 logn / ~ ~ 10 n. 25m§+5mpn 25mg+5DnH—1/2m—3/5H—1/210gn Dun—2/5n'1/5H—1 P 8 Thus, for 11. large enough, 1 n P {5 Z UiJc i=1 Taking c0, p large enough, one has for large n, P { l% 221:1 UN: 00 OO Mn 1 n 2 P (lwgw 2 ,,H) = 2: >: P ( in”. n=l i=1 n=1k=l Thus, Borel-Cantelli Lemma entails that W001 C1,1,, 2 Op (114/5). Therefore, one has W ,,,a/Jn = 01, (124/5) since lWa,l,0/,l” — WD ,,,” = 0,, (114/5). Hence a aLa > pH} S clognexp {—c2plogn} + Chg—(”060” S n—3. > pn'2/5} S 1173. Hence 00 2 pH) 5 Z Mun—3 < oo. n=1 011 d2 d1 6’2 T1 _<_ Z Z Z Z Wa,l,al,zfl = 019 (TL—W5) . (4.7.43) 1:1 (2:2 (”:1 01:1 Employing Lipschitz continuity of kernel K, the term T2 equals 1313-3471 SUp$1€l$1,k—1:$1,k] d1 N+1 d2 d1 N+1 d2 2 Z Z dJ,a,l#wJ,aJ,l, ($1) - Z Z Z C“Ummug“, (331$) (=1 J21 a=2 (=1 J-.—_-1 0:2 is bounded by “a“ x N+1 Ina-X SUP 2 E [{Kh (X1 - $1) - Kh (X1 - $1,k)}2{7117}zIBJ,a(Xa)}2] - 195M" mlelxl,k—1’$1,kl J=1 Therefore, according to Assumption (A8), Lemma 4.7.2 (ii), and (4.7.19), 1 N+1 2 2 “Mg .121 E31,, (Xa)T 117’,” (4.7.44) = 0 (log nv Nn‘1h74Mg2) = 010(71‘1/2) . T2 3 CQop (71—1/2N1/210gn) 97 Combining (4.7.43) and (4.7.44), one has sup R1 (2151) = 0p (71-2/5). The desired result $16[0,1] follows from (4.7.38) and (4.7.40). Cl Proof of Proposition 4.4.2. (4.7.1) implies that IEn9_1,l (xi,-1)| S IEn9_1,z (X4,_1) - Enm-1,z (Xi,-1)| + IEnm-1,z (Xi,_1) I(4-7-45) S Coo (d2 -— 1) sup ”mhllloo H2 + Op (114/2). 
ZSanQ By definition (4.4.14), SUPx1e[O,1] I‘llb 1’ (271)] 3 R1 + R2 + R3 where Ri = 811p —ZKh(X11-$1)Z{mu i,_1) 9-1,1(X1,-1)}7117}zr x1€[0,1]n 1 R2 = SUP - Kh(X'1-I1) x16[0,1] n; 1 1 Z{g_1,z (X1,_1)—Eng_1,z (X1,_1)—rh_1,1( i,-1)}Tle u, 1 n '1 sup — Z Kh (X11 — 331) Z En9_1,l (xi,_1) Till-1111' R3 313011] n i=1 (:1 For R1, using (4.7.1), one has dll 11 R1 3 000(d2—1) supd 2||m;,||oo H2Z—IZIT11T1’I 2222(a2a1) {0p(H/>+Ov(fi)} I: la: 2 J: 1 d1 d2 d2 = 0p 2 m1 (X) "101-29a: (X)+ ZEflgal (X) ) 2 (=1 (1:1 = 0,, (n'1/2+H2). (4.7.47) The last step follows from d2 ml (X) *7 "101-Z 9az( (X) + Z Engaz (X) a: l a=1 2 d2 S “Th1 (X) — m1 (X)||2 + W (X) ”mm 2 9az(X + Z Engaz (X) 2 “=1 2 d2 5 300,, Z ”mglum H2 + 0,, (71-1/2). (:21 99 Thus 122 = 0,, (77-1/2 + H2). Similarly, it 1 = sup n —2: Kh (X21 - 111):le Eng 1,(1 1)TilTil’ $1610, 1] _1 i: Kh(X .1— 51:1)T-1Tfl/ 6’1 S R{:|En9_1,z(xi,) -1 )I} SUP (___1 $1€[0,1]n fith (X41 — $1) TilTW—zl i=1 —Op (71—1/2 + H2). (4.7.48) $1€[0,1]n d1 3 {Z lEn9-1,l (x.,_1)|} SUP [=1 by (4.7.45). Combining (4.7.46), (4.7.47) and (4.7.48), one establishes Proposition 4.4.2. Cl Proof of Lemma 4.4.1. Based on formula (4.4.11), 71.“1 221131,) (Xi,-1) is n ‘12 N+1 d2 N+1 n 5; z 2 41.4... (x...) = z z a... {n-l :33... ms}. i=1 0:2 J=1 a=2 J=1 i=1 Lemma 4.7.8 implias that /\ d2 N+1 d2 N+1 ”2 Z Z 5J,a,l _. {(N+1)(d2 - 1) ' Z 2 53,01} 0:? J=l 01:2 J=1 _<_ {(N + 1) (d2 — 1) .5T5}1/2 = 0p(Nn-1/21ogn) . Now it is clear from (4.7.27) and (4.7.28) that sup1_<_ JSNH |n'1 221:1 B J.a (Xia)| S An,1 = 0;, (71-1/2 log n), hence d2 N+1 1 n N(log ”)2 5:3.“ x.,-1)< z z a... sup 23.424.) = 0,, (——). i (1:2 J=1 (11, £21 (4.7.49) While standard kernel theory implies that supxlem,” ln‘l 2&1 2:21 K h (Xfl — $1)'1})T;)/' = 0,, (1) .Thus the lemma follows immediately from (4.7.49) and (4.4.16). CI 100 CHAPTER 5 Spline-backfitted kernel smoothing of generalized additive model 5.1 Introduction Following Stone (1985), p. 
693, the space of $\alpha$-centered square integrable functions on $[0,1]$ is
$$M = \left\{g : E\{g(X_\alpha)\} = 0,\ E\{g^2(X_\alpha)\} < +\infty\right\}, \quad 1 \le \alpha \le d,$$
in which $g$ are finite constants. The constraints that $E\{g_\alpha(X_\alpha)\} = 0$, $1 \le \alpha \le d$, ensure unique additive representation of $m_\alpha$ as expressed in (1.4.3), but are not necessary for the definition of space $M$. In what follows, denote by $E_n$ the empirical expectation, $E_n\varphi = \sum_{i=1}^n \varphi(\mathbf{X}_i)/n$. We introduce two inner products on $M$. For functions $g_1, g_2 \in M$, the theoretical and empirical inner products are defined respectively as
$$\langle g_1, g_2\rangle_2 = E\{g_1(\mathbf{X})g_2(\mathbf{X})\}, \qquad \langle g_1, g_2\rangle_{2,n} = E_n\{g_1(\mathbf{X})g_2(\mathbf{X})\}.$$
The corresponding induced norms are $\|g_1\|_2^2 = Eg_1^2(\mathbf{X})$, $\|g_1\|_{2,n}^2 = E_n g_1^2(\mathbf{X})$. The model space $M$ is called theoretically (empirically) identifiable if, for any $g \in M$, $\|g\|_2 = 0$ ($\|g\|_{2,n} = 0$) implies that $g = 0$ a.s.

In this chapter, for any compact interval $[a,b]$, we denote the space of $p$-th order smooth functions as $C^{(p)}[a,b] = \{g \mid g^{(p)} \in C[a,b]\}$, and the class of Lipschitz continuous functions for constant $C > 0$ as $\mathrm{Lip}([a,b], C) = \{g \mid |g(x) - g(x')| \le C|x - x'|,\ \forall x, x' \in [a,b]\}$. We mean by "$\sim$" both sides having the same order as $n \to \infty$. We denote by $I_{d\times d}$ the $d \times d$ identity matrix, and by $0_{d\times d}$ the $d \times d$ zero matrix. For any vector $\mathbf{x} = (x_1, x_2, \ldots, x_d)$, we denote the supremum and Euclidean norms as $|\mathbf{x}| = \max_{1\le\alpha\le d}|x_\alpha|$ and $\|\mathbf{x}\| = \left(\sum_{\alpha=1}^d x_\alpha^2\right)^{1/2}$.

We need the following Assumptions on the data generating process.

(A1) The additive component functions $m_\alpha(x_\alpha) \in C^{(2)}[0,1]$, $\alpha = 1, \ldots, d$.

(A2) The inverse link function $b'$ satisfies the following: $b' \in C^2(\Theta)$ where $\Theta$ is a compact interval such that $m([0,1]^d)$ is in the interior of $\Theta$ and $C_b > \max_{\theta\in\Theta} b''(\theta) \ge \min_{\theta\in\Theta} b''(\theta) > c_b$ for some constants $C_b > c_b > 0$. There exists a compact interval $A$ such that $m_1([0,1]) \subset A$ and that $A + m_{-1}([0,1]^{d-1}) \subset \Theta$, where $m_{-1}(\mathbf{x}_{-1}) = c + \sum_{\alpha=2}^d m_\alpha(x_\alpha)$ with $\mathbf{x}_{-1} = (x_2, \ldots, x_d)$.

(A3) The conditional variance function $\sigma^2(\mathbf{x})$ is measurable and bounded.
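The errors $\{\varepsilon_i\}_{i=1}^n$ satisfy $E(\varepsilon_i \mid \mathcal{F}_i) = 0$, $E(\varepsilon_i^2 \mid \mathcal{F}_i) = 1$, $E(|\varepsilon_i|^{2+\eta} \mid \mathcal{F}_i) \le C_\eta$ for some $\eta \in (1/2, 1]$ and the sequence of $\sigma$-fields $\mathcal{F}_i = \sigma\{(\mathbf{X}_j), j \le i;\ \varepsilon_j, j \le i-1\}$ for $i = 1, \ldots, n$.

(A4) The density function $f(\mathbf{x})$ of $(X_1, \ldots, X_d)$ is continuous and $0$ …

… $> 0$, and is bounded, nonnegative, symmetric, and supported on $[-1,1]$. The bandwidth $h$ of the kernel $K$ is assumed to be of order $n^{-1/5}$, i.e., $c_h n^{-1/5} \le h \le C_h n^{-1/5}$ for some positive constants $c_h, C_h$.

If the last $d-1$ components $\{m_\alpha(x_\alpha)\}_{\alpha=2}^d$ were known by "oracle", then the only unknown component $m_1(x_1)$ could be estimated by the following procedure. Define for each $x_1 \in [h, 1-h]$ a local quasi-likelihood function $\tilde l(a) = \tilde l(a, x_1)$ as
$$n^{-1}\sum_{i=1}^n \left[Y_i\{a + m_{-1}(\mathbf{X}_{i,-1})\} - b\{a + m_{-1}(\mathbf{X}_{i,-1})\}\right]K_h(X_{i1} - x_1) \qquad (5.2.1)$$
and define the oracle smoother of $m_1(x_1)$ as
$$\tilde m_{K,1}(x_1) = \arg\max_{a\in A}\tilde l(a). \qquad (5.2.2)$$

THEOREM 5.2.1. Under Assumptions (A1)-(A6), as $n \to \infty$,
$$\sup_{x_1\in[h,1-h]}|\tilde m_{K,1}(x_1) - m_1(x_1)| = O_{a.s.}\left(\log n/\sqrt{nh}\right) = O_{a.s.}\left(n^{-2/5}\log n\right).$$

THEOREM 5.2.2. Under Assumptions (A1)-(A6), for any $x_1 \in [h, 1-h]$, as $n \to \infty$, the oracle kernel smoother $\tilde m_{K,1}(x_1)$ given in (5.2.2) satisfies
$$\sqrt{nh}\left\{\tilde m_{K,1}(x_1) - m_1(x_1) - \mathrm{bias}_1(x_1)h^2/D_1(x_1)\right\} \to N\left(0,\ D_1(x_1)^{-1}v_1^2(x_1)D_1(x_1)^{-1}\right),$$
where
$$D_1(x_1) = f_1(x_1)E\left[b''\{m(\mathbf{X})\} \mid X_1 = x_1\right] \qquad (5.2.3)$$
and
$$v_1^2(x_1) = f_1(x_1)E\{\sigma^2(\mathbf{X}) \mid X_1 = x_1\}\|K\|_2^2,$$
$$\mathrm{bias}_1(x_1) = \mu_2(K)\Big\{m_1''(x_1)f_1(x_1)E\left[b''\{m(\mathbf{X})\} \mid X_1 = x_1\right] + m_1'(x_1)f_1(x_1)\frac{\partial}{\partial x_1}E\left[b''\{m(\mathbf{X})\} \mid X_1 = x_1\right] - \{m_1'(x_1)\}^2 f_1(x_1)E\left[b'''\{m(\mathbf{X})\} \mid X_1 = x_1\right]\Big\}. \qquad (5.2.4)$$

The same oracle idea applies to the constant as well. Define the quasi-likelihood function
$$\tilde l_c(a) = n^{-1}\sum_{i=1}^n\left[Y_i\{a + m_{-c}(\mathbf{X}_i)\} - b\{a + m_{-c}(\mathbf{X}_i)\}\right],$$
where $m_{-c}(\mathbf{X}) = \sum_{\alpha=1}^d m_\alpha(X_\alpha)$, and then the infeasible estimator is $\tilde c = \arg\max_{a\in A}\tilde l_c(a)$. Clearly, $\tilde l_c'(\tilde c) = 0$.

THEOREM 5.2.3. Under Assumptions (A1)-(A5), as $n \to \infty$, $\tilde c \to_{a.s.} c$ and $|\tilde c - c| = O_p(n^{-1/2})$.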
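To make the oracle smoother in (5.2.1)-(5.2.2) concrete, the following Python sketch maximizes the local quasi-likelihood by bisection on its derivative in $a$, which is decreasing because $b'' > 0$. It is an illustration only, under assumptions that are not part of the text: a logistic model with $b(\theta) = \log(1 + e^\theta)$, an Epanechnikov kernel, the search interval $[-5, 5]$ standing in for the compact interval $A$, and hypothetical function names.

```python
import math


def epanechnikov(u):
    """Assumed kernel: bounded, nonnegative, symmetric, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def sigmoid(theta):
    """b'(theta) for the assumed logistic model b(theta) = log(1 + exp(theta))."""
    if theta >= 0.0:
        return 1.0 / (1.0 + math.exp(-theta))
    e = math.exp(theta)
    return e / (1.0 + e)


def oracle_smoother(x1, xs1, offsets, ys, h, lo=-5.0, hi=5.0, tol=1e-10):
    """Maximize the local quasi-likelihood (5.2.1) over a in [lo, hi].

    xs1     : observations X_{i1}
    offsets : the known values m_{-1}(X_{i,-1}) supplied by the "oracle"
    ys      : responses Y_i
    h       : kernel bandwidth
    """
    weights = [epanechnikov((xi - x1) / h) / h for xi in xs1]
    if sum(weights) == 0.0:
        raise ValueError("no observations fall inside the kernel window")

    def score(a):
        # Derivative of (5.2.1) in a, up to the positive factor 1/n.
        return sum(w * (y - sigmoid(a + m))
                   for w, m, y in zip(weights, offsets, ys))

    # The score is decreasing in a, so the maximizer is its root,
    # or an endpoint of the interval when no root lies inside.
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used here instead of Newton's method only because it needs no starting value and respects the compact search interval automatically.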
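Although the oracle smoother $\tilde m_{K,1}(x_1)$ possesses the desirable theoretical properties in Theorems 5.2.1 and 5.2.2, it is not a useful statistic, as it is computed based on the knowledge of the unavailable functions $\{m_\alpha(x_\alpha)\}_{\alpha=2}^d$ and constant $c$. These oracle estimators do, however, motivate the spline-backfitted estimators that we introduce in the next section.

5.3 Spline-backfitted Kernel Estimators

We need the following Assumption on the number of spline knots.

(A7) The number of interior knots $N \sim n^{1/4}\log n$, i.e., $c_N n^{1/4}\log n \le N \le C_N n^{1/4}\log n$ for some positive constants $c_N, C_N$, and the interval width $H = (N+1)^{-1}$.

In what follows, we denote $\|K\|_2^2 = \int K(u)^2\,du$, $\mu_2(K) = \int K(u)u^2\,du$. For $J = 0, \ldots, N+1$, define the linear B spline basis as
$$b_J(x) = (1 - |x - \xi_J|/H)_+ = \begin{cases}(N+1)x - J + 1, & \xi_{J-1} \le x \le \xi_J,\\ J + 1 - (N+1)x, & \xi_J \le x \le \xi_{J+1},\\ 0, & \text{otherwise},\end{cases}$$
and the space of $\alpha$-empirically centered linear spline functions on $[0,1]$ as
$$G_{n,\alpha}^0 = \left\{g_\alpha : g_\alpha(x_\alpha) \equiv \sum_{J=0}^{N+1}\lambda_J b_J(x_\alpha),\ E_n\{g_\alpha(X_\alpha)\} = 0\right\}, \quad 1 \le \alpha \le d,$$
which is equipped with the empirical inner product $\langle\cdot,\cdot\rangle_{2,n}$. Define
$$\hat L(g) = n^{-1}\sum_{i=1}^n\left[Y_i g(\mathbf{X}_i) - b\{g(\mathbf{X}_i)\}\right], \quad g \in G_n^0.$$
The multivariate function $m(\mathbf{x})$ is estimated by an additive spline function
$$\hat m(\mathbf{x}) = \arg\max_{g\in G_n^0}\hat L(g). \qquad (5.3.1)$$
Next define the quasi-likelihood function
$$\hat l(a) = n^{-1}\sum_{i=1}^n\left[Y_i\{a + \hat m_{-1}(\mathbf{X}_{i,-1})\} - b\{a + \hat m_{-1}(\mathbf{X}_{i,-1})\}\right]K_h(X_{i1} - x_1) \qquad (5.3.2)$$
and define
$$\hat m_{SBK,1}(x_1) = \arg\max_{a\in A}\hat l(a). \qquad (5.3.3)$$

THEOREM 5.3.1. Under Assumptions (A1)-(A7),
$$\sup_{x_1\in[0,1]}|\hat m_{SBK,1}(x_1) - \tilde m_{K,1}(x_1)| = o_{a.s.}\left(n^{-2/5}\right).$$

Theorem 5.3.1 follows from (5.6.11) and Lemmas 5.6.11 and 5.6.12. The following theorems follow directly from Theorems 5.2.1, 5.2.2 and 5.3.1.

THEOREM 5.3.2. Under Assumptions (A1)-(A7), as $n \to \infty$,
$$\sup_{x_1\in[h,1-h]}|\hat m_{SBK,1}(x_1) - m_1(x_1)| = O_{a.s.}\left(\log n/\sqrt{nh}\right) = O_{a.s.}\left(n^{-2/5}\log n\right).$$

THEOREM 5.3.3. Under Assumptions (A1)-(A7), for any $x_1 \in [h, 1-h]$, as $n \to \infty$, $\hat m_{SBK,1}(x_1)$ given in (5.3.3) satisfies
$$\sqrt{nh}\left\{\hat m_{SBK,1}(x_1) - m_1(x_1) - \mathrm{bias}_1(x_1)h^2/D_1(x_1)\right\} \to N\left(0,\ D_1(x_1)^{-1}v_1^2(x_1)D_1(x_1)^{-1}\right),$$
where $\mathrm{bias}_1(x_1)$ and $D_1(x_1)$ are defined as in (5.2.4) and (5.2.3).

Then define
$$\hat l_c(a) = n^{-1}\sum_{i=1}^n\left[Y_i\{a + \hat m_{-c}(\mathbf{X}_i)\} - b\{a + \hat m_{-c}(\mathbf{X}_i)\}\right],$$
where $\hat m_{-c}(\mathbf{X}) = \sum_{\alpha=1}^d \hat m_\alpha(X_\alpha)$. Define next the spline-backfitted estimator $\hat c = \arg\max_{a\in A}\hat l_c(a)$.

THEOREM 5.3.4. Under Assumptions (A1)-(A5) and (A7), as $n \to \infty$, $|\hat c - \tilde c| = O_p(n^{-1/2})$, hence $|\hat c - c| = O_p(n^{-1/2})$.

5.4 Implementation

We implement our procedures with the following rule-of-thumb number of interior knots
$$N = N_n = \min\left([n^{1/4}\log n] + 1,\ \ldots\right),$$
which satisfies Assumption (A7), i.e., $N = N_n \sim n^{1/4}\log n$, and bounds the number of parameters in the linear least squares problem.

According to Theorem 5.3.3, the asymptotic distributions of the estimators $\hat m_{SBK,\alpha}(x_\alpha)$ depend not only on the functions $\mathrm{bias}_\alpha(x_\alpha)/D_\alpha(x_\alpha)$ and $D_\alpha(x_\alpha)^{-1}v_\alpha^2(x_\alpha)D_\alpha(x_\alpha)^{-1}$, but also crucially on the choice of bandwidths $h_\alpha$. So we define the optimal bandwidth of $h_\alpha$, denoted by $h_{\alpha,\mathrm{opt}}$, as the minimizer of the asymptotic mean integrated squared error (AMISE) of $\{\hat m_\alpha(x_\alpha), \alpha = 1, \ldots, d\}$, which is defined as
$$\mathrm{AMISE}\{\hat m_\alpha\} = \int\left[\left\{\mathrm{bias}_\alpha(x_\alpha)h_\alpha^2/D_\alpha(x_\alpha)\right\}^2 + D_\alpha(x_\alpha)^{-1}v_\alpha^2(x_\alpha)D_\alpha(x_\alpha)^{-1}/(nh_\alpha)\right]f_\alpha(x_\alpha)\,dx_\alpha.$$
By letting $d\,\mathrm{AMISE}\{\hat m_\alpha\}/dh_\alpha = 0$, one gets the optimal bandwidth $h_{\alpha,\mathrm{opt}}$ as
$$h_{\alpha,\mathrm{opt}} = \left\{\frac{\int D_\alpha(x_\alpha)^{-1}v_\alpha^2(x_\alpha)D_\alpha(x_\alpha)^{-1}f_\alpha(x_\alpha)\,dx_\alpha}{4n\int\left\{\mathrm{bias}_\alpha(x_\alpha)/D_\alpha(x_\alpha)\right\}^2 f_\alpha(x_\alpha)\,dx_\alpha}\right\}^{1/5},$$
which is approximated by
$$\hat h_{\alpha,\mathrm{opt}} = \left\{\frac{n^{-1}\sum_{i=1}^n D_\alpha(X_{i\alpha})^{-1}v_\alpha^2(X_{i\alpha})D_\alpha(X_{i\alpha})^{-1}}{4\sum_{i=1}^n\left\{\mathrm{bias}_\alpha(X_{i\alpha})/D_\alpha(X_{i\alpha})\right\}^2}\right\}^{1/5},$$
where $D_\alpha(x_\alpha) = f_\alpha(x_\alpha)E\left[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha\right]$ and
$$v_\alpha^2(x_\alpha) = f_\alpha(x_\alpha)E\{\sigma^2(\mathbf{X}) \mid X_\alpha = x_\alpha\}\|K\|_2^2,$$
$$\mathrm{bias}_\alpha(x_\alpha) = \mu_2(K)\Big\{m_\alpha''(x_\alpha)f_\alpha(x_\alpha)E\left[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha\right] + m_\alpha'(x_\alpha)f_\alpha(x_\alpha)\frac{\partial}{\partial x_\alpha}E\left[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha\right] - \{m_\alpha'(x_\alpha)\}^2 f_\alpha(x_\alpha)E\left[b'''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha\right]\Big\}.$$
To implement this, we propose the following estimation methods for the terms $m_\alpha'(x_\alpha)$, $m_\alpha''(x_\alpha)$, $f_\alpha(x_\alpha)$, $E\{\sigma^2(\mathbf{X}) \mid X_\alpha = x_\alpha\}$, $E[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$, $E[b'''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$ and $\frac{\partial}{\partial x_\alpha}E[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$. The resulting bandwidth is denoted as $\hat h_{\alpha,\mathrm{opt}}$.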
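• The derivative functions $m_\alpha'(X_{i\alpha})$ and $m_\alpha''(X_{i\alpha})$ are estimated as
$$\sum_{k=1}^{3}k\,\hat a_{\alpha,l,k}X_{i\alpha}^{k-1} + 3\sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^2$$
and
$$\sum_{k=2}^{3}k(k-1)\,\hat a_{\alpha,l,k}X_{i\alpha}^{k-2} + 6\sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+,$$
where $\{\hat a_{\alpha,l,k}\}_{k=0}^{N+3}$ maximize the following
$$\sum_{i=1}^n\left\{Y_i\left(\sum_{k=0}^{3}a_{\alpha,l,k}X_{i\alpha}^k + \sum_{k=4}^{N+3}a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^3\right) - b\left(\sum_{k=0}^{3}a_{\alpha,l,k}X_{i\alpha}^k + \sum_{k=4}^{N+3}a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^3\right)\right\},$$
where $\min_i X_{i\alpha} = t_{\alpha,0} < \cdots < t_{\alpha,N+1} = \max_i X_{i\alpha}$.

• $E[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$ is estimated as $\sum_{k=0}^{3}\hat a_{\alpha,l,k}x_\alpha^k + \sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(x_\alpha - t_{\alpha,k-3})_+^3$ by minimizing
$$\sum_{i=1}^n\left[b''\{\hat m(\mathbf{X}_i)\} - \left\{\sum_{k=0}^{3}a_{\alpha,l,k}X_{i\alpha}^k + \sum_{k=4}^{N+3}a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^3\right\}\right]^2.$$

• $\frac{\partial}{\partial x_\alpha}E[b''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$ and $E[b'''\{m(\mathbf{X})\} \mid X_\alpha = x_\alpha]$ are estimated by $\sum_{k=1}^{3}k\,\hat a_{\alpha,l,k}x_\alpha^{k-1} + 3\sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(x_\alpha - t_{\alpha,k-3})_+^2$ and $\sum_{k=0}^{3}\hat a_{\alpha,l,k}x_\alpha^k + \sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(x_\alpha - t_{\alpha,k-3})_+^3$ by minimizing
$$\sum_{i=1}^n\left[b'''\{\hat m(\mathbf{X}_i)\} - \left\{\sum_{k=0}^{3}a_{\alpha,l,k}X_{i\alpha}^k + \sum_{k=4}^{N+3}a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^3\right\}\right]^2.$$

• $E\{\sigma^2(\mathbf{X}) \mid X_\alpha = x_\alpha\}$ is estimated by $\sum_{k=0}^{3}\hat a_{\alpha,l,k}x_\alpha^k + \sum_{k=4}^{N+3}\hat a_{\alpha,l,k}(x_\alpha - t_{\alpha,k-3})_+^3$ by minimizing
$$\sum_{i=1}^n\left[\left[Y_i - b'\{\hat m(\mathbf{X}_i)\}\right]^2 - \sum_{k=0}^{3}a_{\alpha,l,k}X_{i\alpha}^k - \sum_{k=4}^{N+3}a_{\alpha,l,k}(X_{i\alpha} - t_{\alpha,k-3})_+^3\right]^2.$$

• The density function $f_\alpha(x_\alpha)$ is estimated by $n^{-1}\sum_{i=1}^n K_{h_\alpha}(X_{i\alpha} - x_\alpha)$ with the rule-of-thumb bandwidth $h_\alpha$.

5.5 Examples

5.5.1 Simulation 1

The data are generated from the model with $m_1(x) = \sin(\pi x)$, $m_2(x) = \Phi(3x)$ and $m_3(x) = m_4(x) = m_5(x) = x$, where $\Phi$ is the standard normal distribution function. The predictors are generated from the following vector autoregression (VAR) equation for $0 \le a, r < 1$,
$$\mathbf{X}_t = a\mathbf{X}_{t-1} + \boldsymbol{\varepsilon}_t, \quad \boldsymbol{\varepsilon}_t \sim N(0, \Sigma), \quad 2 \le t \le n, \qquad \Sigma = \begin{pmatrix}1 & r & \cdots & r\\ r & 1 & \cdots & r\\ \vdots & \vdots & \ddots & \vdots\\ r & r & \cdots & 1\end{pmatrix},$$
with stationary distribution $\mathbf{X}_t = (X_{t1}, \ldots, X_{td})^T \sim N\left(0, (1-a^2)^{-1}\Sigma\right)$. Clearly, higher values of $a$ correspond to stronger dependence among the observations, and in particular, if $a = 0$, the data are i.i.d. The parameter $r$ controls the correlation between any two components $X_{t\alpha}$ and $X_{t\alpha'}$, $\alpha \ne \alpha'$. In this study, we have experimented with two cases, $r = 0, a = 0$ and $r = 0.5, a = 0.5$, to cover various scenarios.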
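The VAR design above can be simulated with a few lines of Python. This is a minimal sketch rather than the code used for the reported results: the function name and seed handling are hypothetical, and the equicorrelated innovations are drawn through the shared-factor representation $\varepsilon_{t\alpha} = \sqrt{1-r}\,z_{t\alpha} + \sqrt{r}\,z_{t0}$ with independent standard normal $z$'s, which has exactly the covariance matrix $\Sigma$ with unit diagonal and off-diagonal entries $r$.

```python
import math
import random


def simulate_var(n, d, a, r, seed=0):
    """Draw X_1, ..., X_n from X_t = a * X_{t-1} + eps_t, eps_t ~ N(0, Sigma).

    Sigma has ones on the diagonal and r off the diagonal. X_1 is drawn from
    the stationary law N(0, Sigma / (1 - a^2)). Returns a list of n rows.
    """
    if not (0.0 <= a < 1.0 and 0.0 <= r < 1.0):
        raise ValueError("the design requires 0 <= a, r < 1")
    rng = random.Random(seed)

    def innovation(scale=1.0):
        # One shared factor plus independent noise gives equicorrelation r.
        shared = rng.gauss(0.0, 1.0)
        return [scale * (math.sqrt(1.0 - r) * rng.gauss(0.0, 1.0)
                         + math.sqrt(r) * shared)
                for _ in range(d)]

    rows = [innovation(scale=1.0 / math.sqrt(1.0 - a * a))]
    for _ in range(1, n):
        eps = innovation()
        rows.append([a * prev + e for prev, e in zip(rows[-1], eps)])
    return rows
```

For example, `simulate_var(500, 5, 0.5, 0.5)` corresponds to the dependent case $r = 0.5$, $a = 0.5$, while `simulate_var(500, 5, 0.0, 0.0)` gives i.i.d. standard normal rows.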
For $\alpha = 1, \ldots, d$, let $X_{\alpha,i,\min}$, $X_{\alpha,i,\max}$ denote the smallest and largest observations of the variable $X_\alpha$ in the $i$-th replication. The functions $\{m_\alpha\}_{\alpha=1}^d$ are estimated on sample values. Denoting the estimator of $m_\alpha$ in the $k$-th replication as $\hat m_{SBK,\alpha,k}$, and by $X_{t\alpha,k}$ the points where the functions are evaluated, we define the integrated squared error (ISE) and mean integrated squared error (MISE) as
$$\mathrm{ISE}(\hat m_{SBK,\alpha,k}) = \frac{1}{n}\sum_{t=1}^n\left\{\hat m_{SBK,\alpha,k}(X_{t\alpha,k}) - m_\alpha(X_{t\alpha,k})\right\}^2,$$
$$\mathrm{MISE}(\hat m_{SBK,\alpha}) = \frac{1}{100}\sum_{k=1}^{100}\mathrm{ISE}(\hat m_{SBK,\alpha,k}).$$
Then, to see that the SBK estimator is as efficient as the "oracle smoother" $\tilde m_{K,\alpha}(x_\alpha)$, we define the empirical relative efficiency of $\hat m_{SBK,\alpha}(x_\alpha)$ with respect to $\tilde m_{K,\alpha}(x_\alpha)$ as
$$\mathrm{EFF}_\alpha = \left[\frac{\sum_{t=1}^n\left\{\tilde m_{K,\alpha}(X_{t\alpha}) - m_\alpha(X_{t\alpha})\right\}^2}{\sum_{t=1}^n\left\{\hat m_{SBK,\alpha}(X_{t\alpha}) - m_\alpha(X_{t\alpha})\right\}^2}\right]^{1/2}.$$
Tables 9 and 10 show the MISEs and EFFs of $\hat m_{SBK,\alpha}$ and $\tilde m_{K,\alpha}$ for $\alpha = 1, 2$. The SBK estimator clearly performs as well as the oracle estimator, which corroborates Theorem 5.3.1.

5.5.2 Simulation 2

We use the same model as in Simulation 1 but with high dimension $d = 10$, where $m_\alpha(x_\alpha) = \sin(\pi x_\alpha)$, $\alpha = 1, \ldots, 10$, and the data are generated in the same way. We have run 100 replications for sample sizes $n = 500, 1000, 1500, 2000$. The MISEs and EFFs of $\hat m_{SBK,1}$ and $\tilde m_{K,1}$ are shown in Table 11. As expected, increases in sample size reduce MISE for both estimators and across all combinations of $r$ and $a$ values.

To see the convergence, Figure 13 plots the kernel density estimates of the 100 empirical efficiencies for $\alpha = 1$ and sample sizes $n = 500, 1000, 1500, 2000$ for $r = 0$, $a = 0$. The vertical line at efficiency $= 1$ is the standard line for the comparison of $\hat m_{SBK,1}$ and $\tilde m_{K,1}$. One can clearly see that the center of the density plots moves toward the standard line 1.0, with narrower spread, as the sample size $n$ increases, which confirms the result of Theorem 5.3.1.
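The error measures used in these simulations (ISE, MISE and the empirical relative efficiency) are simple to compute once the estimates and true function values at the sample points are available. The helper names below are hypothetical and the sketch assumes the inputs are plain Python sequences of equal length.

```python
import math


def ise(estimates, truths):
    """Integrated squared error: mean of squared deviations at the sample points."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def mise(ise_values):
    """Mean integrated squared error: average ISE over the replications."""
    if not ise_values:
        raise ValueError("at least one replication is required")
    return sum(ise_values) / len(ise_values)


def relative_efficiency(oracle_estimates, sbk_estimates, truths):
    """Empirical relative efficiency of the SBK estimator with respect to the
    oracle smoother: square root of (oracle squared error / SBK squared error)."""
    numerator = sum((o - t) ** 2 for o, t in zip(oracle_estimates, truths))
    denominator = sum((s - t) ** 2 for s, t in zip(sbk_estimates, truths))
    return math.sqrt(numerator / denominator)
```

A value of `relative_efficiency` near 1 means the two estimators have comparable squared error in that replication, which is the pattern the density plots of the 100 empirical efficiencies are meant to reveal.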
The basic graphic pattern of Figure 16 with r = 0.5, a = 0.5 is similar to that for the i.i.d case, though with slower convergence rate and relatively poorer efficiency. To have some impression of the actual function estimates, for r = 0, a = 0 and r = 0.5, a = 0.5 with sample size n = 500, 1000, 1500, 2000, we have plotted the SBK estimators and their 95% pointwise confidence intervals (three dotted lines), oracle estimators (dashed lines) for the true functions m1 (solid lines) in Figures 17—24. The visual impression of the SBK estimators is rather satisfactory and their performance improves with increasing sample size. Lastly, we provide the computing time of Example 2 from 100 replications on an ordinary PC with Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM. The average time run by XploRe to generate one sample of size n and compute the SBK estimator is reported in Table 12. 5.6 Appendix 5.6. 1 Preliminaries In the proofs that follow, we use U and u to denote sequences of random variables that are uniformly 0 and o of certain order. 109 LEMMA 5.6.1. ([70], Lemma A2) Tthere exist constants co > 0 such that for ”any __ T 1 d N 1 A — (AO’AJ10)1$JSN+1,1_<_agd E R + ( + )1 2 2 C0 A8 + 2 A3“! S A0 + Z AJ’QBJ,Q (5.6.1) J,a Jia 2 LEMMA 5.6.2. ([70], Lemma A.4)Under Assumptions (A2), (A4) and (A6), the uniform supremum of the rescaled difference between (g1, 92);", and (g, g2)2 is . —- M— -... ..I (91,92)2 n — (91:92» 10 n A" = ”‘50) I 11911121192112 I ’ (___n1/2g11172) ' (5'62) 91192€Gn [0.1] 5.6.2 Oracle smoothers LEMMA 5.6.3. Under Assumptions (A1)-(A6), sup 1" (m1 (2:1)) -- bias. (on) 112] = 0.... (logn/v.1.) $1€[h,1—h] where biasl (x1) is defined as (5.2.4). Proof. According to (5.2.1), l’ (m1 (x1)) equals 11. 1/n 2,11 [Y1 — b’ {m1 ($1) + 771.1 (X1-1)}] K1. 
(X11 — 131) (5-5-3) = 1/n 2;, lb, {m(X1)} — b'{m1($1)+ "1-1 (39-1)} + 0 (Xi) 51] K11 (X11 — 931) Let {1,12 = {in ($1) = €1,n,1 + €1,n,2 is lb, {771 (Xi)} - b'{m1($1)+ "1-1 09.1)) + 0 (Xi) 81'] Kb (X11 - $1) (5-6-4) ’53 [W {77109)} — b'{m1($1)+ m-1(X1-1)} + 0 (X1) 61] Kh (X11 - 1131)] where €1,n,1 = 6171,1081) = 0 (X1) 51K}: (X11 - $1)- €1,n,2 = €2,712 ($1) = lb, {7710(1)} - b'{rn1(x1)+ m_1(Xz'_1)}] Kh (Xil — 1‘1) —E [[b’ {"1 0(1)} " b'{7711($1)+ ma (X1_1)}l K1. (X11 - 171)] . 110 Then according to (5.6.3), one can rewrite 1‘" (m1 (1:1)) as 1/"Z:1€rn+E[b' {7710(1)} b{m1($1) +7” 1 (Xi-1)}] 102(le —-’L"1)- While E [b’ {m (39)} — b’ {m1 ($1) + m_1(Xr-1)}] Kh (X11 '- $1) = [[0 11d [b’ {m (u)} -— b’ {m1 (1:1) + m-1(u-1)}] %K (U1h 2:1)f (u)du : [[0 11d 1b"{m($1r“-1)}{m1(“1)’ m1($1)} +';‘b”’ {m ($1, 11-1)} {m1 (“1) ‘ m1($1)}2 + 1102)] 1 ul —:L‘1 2 EK( h )f(u1,u_1)du1du-1+u (h ) .- ” , (full)2 I! — [[0,1]d—1/[-1,1][b {m(r1,u_1)} {hv1m1(~’c1)+ 2 m1 ($1)+“(h2)} +%b”’ {m (x1, u_1)} {hvlm'r ($1) + (’WI)2 m” (”51) + “ (h2) }2l K (111) { f (1:1, u_1) + hv fligf—l) + U (112)} dvldu_1+u (112) = h2/ UiK ('01)dv1 m’l’($1)f1(xl) 5” {m ($1,11-1)}f(u|$1)dU_1 {..],1] 2 1 [mud- +m’1=/[01(x1)]d_1 b”{m(:1:1,u_1)}—6—f—(%li)du_1}+u(h2). Mo (K) {m” (x1) )1 (an) E [b" {m (X)} |X1- — 1‘1] mi (an) 53—1— [f (E1) E [b” {m (X)} 1X1 = M] — {ma ($1)}2 f (an) E [b’” {m (X)} m = 2:11} +u (h?) - Let Dn = n0 with a < g, a(2+n) > 1 ,a(1+17) > 2/5, which requires 17 > 1/2. Rewrite e- = 63" +512" +521??? where 52D?“ —— ez{|e,-| > Du}, ED 2" — -e,{|ez| < Dn} —€2D§‘, :5in = EEi {lgil S Dn}- Define for .7: 112731 €i,n,1,j —‘ €i,n,1,j ($1)i3 [b' {m (191)} - b'{m1($1) + m-1(Xz‘-1)}+ 0‘ (X05371 Kn (X11 - $1)- 111 According to Assumption (A5), one has 00 00 GEE ’7 Z..- P(ler-I>Dn <2“: —-———'——"——— of,” < 0005:: =00 0:52 n“°‘(2+") < 00. By Borel-Cantelli Lemma, one has with probability 1, 11’1 23:1 final = 0 for large n. Therefore, one has squle[0,1] [yr—1 23;, £211,141 = U (n—k) for any It > 0. 
Using As- sumption (A5), E e- 2+" _ Tl. Hence 71—1 2;, Er,n,1,3 = ”—1 2;, Kh (X21 - $1) 0 (Xi) 553” = 71-1 2;, Kh (X11 — $1) 0 (71—2/5) = 0a.s. (114/5) - Meanwhile E53,“; = E [0 (Xi) 51K}; (X11 -' ac1)]2 = h’lfr (11:1)E {02 (X) m = x1} "Eng {1 + u (1)}. E lfi,n,l,2|k = E (léi,n,1,2lk—2 lgi,n,1,2|2) k—Z 2 .... _ .. 2 S SUP I€r,n,1,2| E|€1,n,1,2l S 002k 2171’: Z/hk 2E léi,n,1,2| , $16[0,1] then there exist a constant c1 = COD" / h such that E (|€i,n,1,2|k) < ck 2h!E(§,2 1,.n 1 2), k > 2. By using Lemma 2. 5. 2, we let k- = 3, a2 (3) = 6/7 logn lln 1+ ,m2 =E Oli“1,e =a , n 2 (512 ,=n,1,2) 0( ) n \/n_h n q52 n 9 P - > n5 < ex — n + 3 a ([—]) { Efrmdg n} .. (11 p ( 25mg +5615”) a2( ) (1+, c 71 take q such that [fl] 2 c2 log n, q 2 l 3 for some constants c2, C3. 0g n (a log 71)2 C3" a2 (log ”)2 qe,2, = q nh > log n nh 2 2 . ‘ ‘ 257712 + 5057, 25m2 + Sale" 25mg + 5Clalog n V nh 112 (:3a2 log n __ C3a2 log n > _ ~ a2logn _ logn 25m2h + So no“ *1/2h'1/21 ’ 257122 h+5 D ha —-—h 2 C0 11 ogn 2 C0 n/ m 2 a1=22+2 1+ 28” =O(logn), q 257712 + 5C1€n mes/7 = 11 1 3 th = - < D “2 (3) n + 5n W1 ,m3 121351,, ”€1,11,1,2“3 —. 66 n: CGDn 2 a 331112 1+ =o(n), 2( ) { an‘l/zh‘l/2 logn} 6 7 n 6/7 ——,\0[—-—— n J / a ([—]) 3 K06 q + 1 g Cn”6)‘OC2/7, q + 1 therefore for large n P {71-1 '22:, {,,,ng > alog nfi/hh} S 0(108 71) eXp (—C5a2 log n) + Cn2‘6*002/7 2 = n-c5a 0(logn) + Cn2'6AOC2/7 for c2, C5, a large enough. For all $1 6 [h, 1 — h], we discrete by equally spaced h = 331,0 < 371,1<°"<$1,Mn=1-h,Mn=n4, P{ max n’1 '22:, £1371,” (mm-)l > alog rib/Eh} OSjSMn 32:34:11” Pn{ “1'2; fii,n12(z1j)l>alogn/\/——}}]2 K (———,,———)2 f (u) du+U (m) _ -1 ’ m 2: v — ' m x m 2 _ h [[0.1]d‘1/[-1,11[b{ (1+h1,u-1>} b{ 1(1)+ _1(u-1)}] K(v1)2 f (:51 + hv1,u_1) dvldu_1+U (h4) = f/[mm—1»{widower K (m)2 {f (2:1, u_1) + U(h)}dv1du_1+U (h4) = U (h). Note that supxl Ib’ {m (X,)} — b’ {m1 (3:1) + m4 (Xi_1)}| S Cbh when Kh(X,-1 — :01) 31$ 0. Similar to the proof for 613712, one has It _. 
E IEz’,n,2| S (20b)k 2 1353,12; and then sulee[h,1- h] n "71 Z §i,n,2 = 00.5. {(nhlfll/2 log n} i=1 Putting {,,n,1,€i’n,2 together, the lemma is proved. Cl LEMMA 5.6.4. Under Assumptions (A2), (A4)-(A6), sup l” (m1 (2:1)) + D1 (1:1)l = 0&3, (log n/Vnh) , $16[h,1—h] where D1 (2:1) is defined as (5.2.3). Proof. According to (5.6.3), one has 1*” (m1 ($1)) is —1/an:, [b” {m (an) +m_1(x.-_1)}] Kh (X21 — a) (5.6.5) 114 Let Cm = lb" {m1 ($1) + 771.1 (Xi-1)}] Kh(Xi1 ~ 271), then 15'3sz = E [[b” {m1 ($1) + 711-1091)” Kh (X11 - $1)] “1‘11 II 1 = [[0,1le b {m1(:z:1)+ m~1 (u_1)} hK (T) f (u) d“ =/ d / b”{m1(131)+m_1(ll_1)}K(’U1)f($l +hv1,U-1)dvidu_1 [0,112 l-Lll z [[0,11d2/l-mlb {m1($1)+m_1(11_1)}K('Ul) {f (1:1, 1L1) + hvlw + U (’12) } d’Uldu_1 = [[0,1]‘12 [~1,1]b” {m1 ($1) + 711.1 (11-1)}K(v1)f($1,u_1)dv1du_1 + U (’12) = fl (a) E [b" {m (X)} m = x1] + U (h?) . 139%,, = E [[b"{m1($1) + m_1 (Xi-1)}] Kh (X21 - 331)]:2 II 1 _- x = [[0,1]d2 [b {m1 (”1) + m_1(u_1)}]2 EEKZ (ELI—fl) f (u) d“ = h‘1 [[0 1142/[41] [1?” {m1 ($1) +m-1 (11.1)}]2K2 (v1)f($1 + h“01,11_1)d’01du.1 — _1 ” m 1’ m u 2 2 'U _ h [[0,1]d2 /[—1,1][b { 1( 1)+ -l( '1)” K ( l) {f (x1, u_1) + hwy-W + U (112)} dvldu_1 - 2 = h 1f1(rr:1)llK||§157[[b”{m(X)}] m = x1] + U (h?) . Similar to the proof of Lemma 5.6.3, the result follows the Lemma 2.5.2. E] LEMMA 5.6.5. Under Assumptions (A1) to (A3), (A5) and (A 7), as n ——> 00, there exists a constant C such that 1+ SUP ICOV (€z‘,n,€j,n)l S Ch—figaO’ — 02% fOTi 753' $16[h,1—-h] 115 Proof. According to Davydov’s Inequality, for %+ % + %- = 1, cov (Em, (in) is bounded by . . 1 02 {20 (J - 7J} /p ”Ema + 5i,n,2“q “€j,n,1 + €j,n,2“r s c2 {2a 0- —z->}1/P(|Ia.n,1n.+ “€212,2Hq) (llgj,n,1|lr+lléjfiflllr) Let q = r = 2 + 1),}? = 1 + 2/77, where 17 takes value in the Assumption (A5), then _ 1 —§fl one has ”gimflllq ___ U(h 2+77) and ”gimnq = U (h +17). cov (leufjmJn) S —%fl Ch +71a (j — 2') +’7 for some constant C. C] PROOF OF THEOREM 5.2.1. 
Existing a 1721 (1:1) between 1711“ (1:1) and ml (:31) such that 17(17sz ($1)) - i, ("ll ($1)) = I” (7711 (1131)) {771191 ($1) — m1 ($1)} Note that l7 (771191 (2:1)) = 0, then _ 17 (m1 ($1)) - , 5.6.6 1” (1721 ($1)) ( ) film ($1) - 7711 ($1) = Lemma 5.6.4 implies that c S squ16[h,1—h] |_[” (7711 ($1))l g C as for some constants 0 < c < G. Then the theorem follows Lemma 5.6.3 and (5.6.6). PROOF OF THEOREM 5.2.2. Let Sn = Sn (2:1) = L1 5,3", where 51-," is defined as (56.4), then one has ES” = O and 1" (m1 (2:1)) = Sn/n + b(:1:1)h2 + u (h2). ’7 (k) = '7 (19,171) = 00V (6231:, €i+k,n) 0?. = ESE. = var (Sn) = var (2:16.311) = 2le var (5...) + 2;]. cov (mm) = nvar (ft-,,,) + n 2 (1— I?) 7 (k) = nvar (5m) + nAn, ISIkISn-l where var (an) = h-1f1($1)E{02(X)IX1 = 21} IIKIIE + U (h4) . While according to Lemma 5.6.5, one has _1+ ”(Ml = ICOV (€z‘,n,€i+k,n)l S Ch $201 (1.0727), 116 A... Hence _1+ lAnl = lzlslllsn—17(k)l S ZISlllSn—l (1- lnfl) h 2:3 {K0 exp (—/\0k)}2% 1+ Ker-ail lelllSn_IeXP{-Aokn/(2+n)}, l/\ __ 1+ so there exists a constant C1 such that An _<_ 01h 237%. So An/ var (Em) —-> O as n —-» oo. . Then 0?, ~ nvar (gm) 2 can when n is large, so according to (2.5.1) in Lemma 2.5.1, there exist constants c1 and c2 such that for some 0 < 17 S 1 P {07:15}, < z} — (z)| 3 c1 6:2,, {log (an/6(1)”) /)‘}1+fl An = sup 2 for any A with A1 3 A S A2, where 1 2 b _ 1 2 A1 = 02 {log (an/co/ )} /n,b > 2 (1 + n)/n;)\2 = 4(2 + 77)?) 110g (an/co/ )- For the 17 in Assumption (A5), set A = 4(2 +17)17—1 log (an/c5”), then by Assumption (A6) (in is max {El [b’ {m (Km — b’ {m1 ($1) + m-1(x.-_1>}+ 0 mm] Kh (Xn - 31> I'm} lgign = 1:33;" {E ICbh + a (Ma-12+" lKh (Xu — $1) 12+") S 005012 {ElKh (X1 - $1) |2+"} = 0 {h—(1+")}: i.e., An = o {h-(1+")/a2} = 0 {n(1+’7/2)/5"7/2} = 0(n1/5‘2’7/5) —> 0 when 1/2 < n s 1. So Sn/an —+ N(O,1), then n{z*' oo, sup.,e[h,1_h] If" (m ($1)) — 2'” (m1 (semi —) 0 because SUprc1E[h,1—h] lml (2:1) — m1 ($1)] —+ 0. 
Then according to Slutsky’s theorem, one has Vnh {{filKJ (£131) — m1($1)}Dl(:r1) — biasl (1:1) 112} —+ N (0,2)? (231)) . Where D1 (1:1) is defined in (5.6.7). Then the theorem is proved. 117 PROOF OF THEOREM 5.2.3. According 'to Mean Value Theorem, there exists a 6 between c and a such that (E— c) f” (E) = t7 (E) — I’ (c) = —i' (c), where —f” (E) = n"1 "_ _lb” {c + m _C (X,- )} > Ch > 0 according to Assumption (A2) and where m c (X): 221,21 ma (X0) and then the infeasible estimator is 5 = argmaxae A [C (a) . Clearly, l2. (5) = Using Bernstein’s Inequality, one has —’a..s 0 i’(c)|=|n ‘12:, -’b{c+mc(X.-)}] }]=| |-1\;10(x,)5, which implies IE-cl = Ua_s, [1/(n_1/2). So 11‘1 221—1 b” {c+ m _c (X,)} and it convergents to Eb” {m (X)} almost sure. Then ac- i” (if) —I” (c)| ——>a,3, 0, in which i” (c) = cording to central limit theorem, we — c) ..,, N(0 [E Eb”{m(X>}] E02091) . 5.6.3 Spline backfitted kernel estimators In this section, we give the proof of Theorem 5.3.1. First, define the theoretical inner product of b J and 1 with respect to the a—th marginal density fa (2:0,) as c J,,, (b J (X0),1) = f b J (1:0,) fa (226,) data and define the centered B spline basis b J,,, (2:0,) and the standardized B spline basis B J,0 ($0,) as CJ bJ,a($a) = bJ (301)“ c : bJ—l (513a): _ ,a b a: BJ’O (ma) = Jill—32,1 S J S N+1, (5.6.7) ll J,a”2 so that EBJ’a (X0) 5 0, E830 (Xa) .=—_' 1. For V9 6 6'2, one can write g = ATB (X,) for a T vector A = (A0, AJ,Q),_<_JSN+1’,SO_<_d e Rl+d(N+1) and B (X) = {1.3m (931) , ---:BN+l,d ($d)}T, (53-6-8) Then with a slight abuse of notation, we denote 1‘;(g)=,11(>.)=n -1 "_ _1[1/,A(B(){ ,-)—b{>.TB(x,) }] and then L: 17712:[YBMXM—bI{ATB(X)}B(Xi)]- (5.6.9) 118 The multivariate function m (x) is estimated by an additive spline function fi1(x) = mg + 2:; ma (xa)- =ATb (x), (5.6.10) A = (A0, AJQ)T lsasd — -argmaxL (A). 
1£JSN+1 According to (5.3.2), existing a mm (11:1) between filSBK,1 ($1) and fiIKJ (1:1) such that l" (ThSBKJ (3:1)) — 1’ (771K; (2:1)) = 5” (firm ($1)) {mSBK,1($1) '- 771m (331)}: Then according to l" (ThSBKJ (5131)) = 0, one has (7 (film (931)) 1” (771K; (31)) Let fit be an additive spline function such that “m —- mllc>0 _<_ COOH2 in the Lemma 3.6.1 ThSBKJ ($1) - 771m ($1) = - (5-5-11) and A such that m (x) = ATB (x). (5.6.12) In what follows, we denote the dimension of vector A as Nd = (N + 1) d + 1. PROOF OF THEOREM 5.3.4. Existing 6' between 6 and 5 such that (3 — E = 41(5) fig (5’) ,where —[”(E’ ) = n“1 23,-21b” {E’ +rh_c (X,)} > Cb > 0 according to As- sumption (A6), then 11(6) = 21(6) — 11(5) = 72-12;, [1" {6+ m... (X)} — b’ {6+ mac-)1] = 1/n 2;, b” {a + m.. (X.)} {m (X.) — m- (X.)} +0 [1/n 2;, {m.. (X) — m... (Ea->12] = I + 00,3, (NdH4 + Ndn-1 logn) , by Lemma 5.6.9, where I = 11 + 12, I. = 1/n 2;, b” {a + m. (X61 {..., (X1) - m-.- (X61, 6 = 1/n 2;, b” {a + m-. (X)} {m (X1) — 6-. (X61. According to (3.6.1), 11 = 0,13, (H 2), while 2 ,,_1 :2; b” {a + m_c (x,)} x {ZI.,J,.EJ,.. (26.)) , __ n .. 12,?) = n 1 22:1 b” {C + m-C (Xi)} ZISJSN+1,1SOSd @U,J,GBJ,O (X20) 1 r-A-H ... n .. 12”" = n 1 21:1 b” {c + m'c (X¢)} {ZISJSN+1,1San (D’J'O‘BJ’Q 0%)} ‘ __ n |12,bl S 0671 1 22;, {ZngSN+1,1sagdl(pb'J'a| IBJ,a (Xia)l} 2 1/2 S Cb {Z15J_<_N+1,1gagd {plaid} x —1 n 2 1/2 [1 + Z15J§N+1,2$agd{n 21:1 |BJ10(X“')I} ] = c, x 0,, (Nj/ 2115/?) x [0,, (1) + (N + 1) x (d — 1) x 0,, (11)] = 06... (N3/ 2115/2) according to (5.6.19) and (5.6.21), similarly |Iz,.l = O... (N..H7/2 + NdH-1/2n_l log n) . One has [gm = f2,” + 00,3, (”—1/2) x 00,, (Ni/Qn-l/2 log n) x 0 (N), where ~ _ 11 12,1) = n 1 Zizl b" {m (Xin’ {leJSN+1,ISa_<_d q)v,J,aBJ,a (Xia)} = 213.196ngan man-1 2;, b" {m (X.)} 3,, (X...) =-- 21 93“,,303, ¢.,J,..Eb” {m (X)} 81,... (X...) +003. 
(Ni/2n.”2 log n) x Ns/Z x 0&3, (n’2/5 log n) = [‘24, + 0&3, (Ndn-g/ 10 log2 n) where i2,” = ZI, = "8’51; 2er [a (x,)s,-] B (x,), (5.6.17) and r = 5. — S. — «r, — ,,. (5.6.18) 123 LEMMA 5.6.8. Under Assumptions (A1)-(A5) and (A7), 'A - AI = 0,1,3, (H2 + 71—1/2 log n), (5.6.19) ”A — A“ = Oa,3_ (Ni/2H2 + Nyzn-l/2 log n) . “in.” = 0.... (H2N3/2n-1/2log2 n) .14»): = 0.... (NJ/2n-1/Zlogn) , “erl = 0&5, (1)/(iffy2 + NdH-l/zn—l log n) . Proof. Mean Value Theorem implies that there exist an Nd x Nd diagonal matrix t whose diagonal elements are in [0, 1], such that for A* = tA+ (I Nd — t) A at (A) 311(A) 6211(A) ()1 A) ___—__. —. —— 2 T — , 6" A=A 8A A=A (”6" A=A* A -—1 A . _ 62L (A) a A) A—A=— . ——— According to (5.6.9), 621: (A) 1 11 II T T — = — b A B - B - B x- BWT n21 { (X.)} (X.) (.) So curl 2;, B (X.) B (X.)T : n-12;, [b" {ATB (X.)} B (X.) B (X.)"] ’ s Cbn-l 2;, B (X.) B (X.)T because B (X4) B (X,-)T Z 0 and Assumption (A7). Lemma 5.6.7 imply that 2 A n o < cbchN, s gig—($2 = ,,1—2,___, [b” {XTB(X.)}B (X.)B (X.)T] S CbCVINd < oo a.s.. Then (5.6.19) follows Lemma 5.6.6. Next, aim) 8A _ a?t(A) X aAaAT (.-.) A=A 2 = -51; :b’” {A*TB (X,)} {(A — A)TB (19)} 130(1)- A: 124 X_X:_($£Q) '1 3120.) BAaAT .\=x 8A A=A .. -1 (2:22? .-.) firsbmuflmxoi{TB}2w while 021:0.) ’1 aim — (3‘3)? .\=:\) 71‘— A=X = —Sb;1;Z:—.1 [n3 (xi) — b’ {STE 0%)} B (3%)] = ‘I’b + 1’1) and ,. —1 r = (2:23;? H) 51; 2:215" {x*TB(x.-)} {(x — :x)TB (xi)}2B(x1-). Note that 2 ”21n— 2:11;" {YTBom} {(x— x)TB(x.-)} Bowl _<. 0.21,; z; {(x — X)TB(Xz-)}2 "Bacon s $2; {(x - X)TB(x.-)}2 S 55172 (X - :‘lT {$Z:=IB<’9)B<’9)T} C“ X) ___‘L 2H1/2 = 0.1.3. (NdH7/2 + NdH—l/ 271—1 log2 n) . S Ill-7‘“2 =0” (1V0t11"‘+1Vc1"‘1 1°? ”) X m So ||,.|| = 0.1.3. (NdH7/2 + NdH-l/Qn-l log n) . Next, 2 “an? = Sb;- 2; [b' {m (xm — b’ {m dv1dua s CKC2H- and therefore sup |Ew ($1)] = o (HI/2) (5.6.22) $1€[O,I] by Lemma 4.7.2. Similarly, E'w 1,0, (2:1 )r ~ hl—rH1_r/2, hence Ewyfl (3:1)2 ~ h‘l. 
Accord- ing to Lemma 2.5.2 and similar proof of Lemma A5 in [68], one has sup sup sup lea(:1:1)— EwJa (1:1)I- — Oans (logn/Vn h.) 116(0, 1] 1b’J,a, q’b,J,a are the corresponding elements in the vectors (Pb, ‘1)” and (Pr defined as (5.6.16), (5.6.17) and (5.6.18). _ 71: |12,b| 5 CW 1 21:1 {[9pr + 219$»eran I‘I’b,J,a| IBJ,a (Xia)l} Kh (X11 - :61) 1/2 2 2 2 —1 n . _ 5 0” [{‘pbfl + ZISJSN+1.2-<_arSd(bvawa}] x [{n 21:1 K" (X‘1 331)} _1 n 2 1/2 + ZISJSN+LZSan {71 21.21 lBJ,a (Xia)| Kh (X11 - $1)} ] :1 Cb x 0,, (Ni/2H5”) x [0,... (1) + (N +1) x (d — 1) x 0a.s. (H)] = 0.1.3. (Ni/211W?) . according to (5.6.19) and (5.6.21), similarly |12,,| = 0,3, (NdH7/2 + NdH“1/2n—1 log n) . 128 12,11 = 7,2,1, + 0&3, (n—2/5 log n) x 00.3, (Ni/:Zn—U2 log n) . where ...1 n 12,), = n Zizl b” {m (Kin {(1)150 + ZISJSN+LZSC¥SC1 (Dv,J,ozBJ,a (Xia)} Kn (X11 - $1) = {10311—1 2:21 1)" {7710(1)} Kh (X271 - $1) + __ n ZlSJSN+LZSa$d (pvJon l Zizl b” {m (xi)} BJ,a (Xia) Kh (Xil ‘ $1) = <1..on {m (X)} K}. (X1 — x1) + ZEMMSOQ .,J,aEb" {m (X)} 81,. (X.) K). (X1 — 21) +00"; (Ndl/Zn—l/2 log n) x Ni” x 0,1,3. (n-2/5 log n) = f2,v,1 + f2,v,2 + Oa.s. (Ndn-9/101082 n) Where T211 = (1,031" {m (X)} K}. (X1 — x1) 1 n = Eb” {m (X)} Kh (X1 — $1) a 21:1 0 (X051 12,1),2 = ZISJSN+L2SOS¢1 (Dv,J,aEb” {m (X)} BJ,a (X0!) Kh (X1 '“ 31) T II = {Eb {m(X)} BJ.a (X0) Kh (X1 " $1)}ISJSN+1,2_<_an x ((bv.J.a)1ngN+1,2gagd _ II — ZISJSN+1,2_gangb {m(X)} BJ,a (X0) Kh (X1 — $1) 1 n .7; Zizl 0' (x1) 5’: {SJaa + ZISJ,SN+1,ISOI_<_d SJ,(I,J’,O,BJ’,(I, (XiOJ)} 1 n . = a 21:1 0 (Xi) 82‘ leJsNHzgasd Hb,k,J,a ($1) {8J3 + Zng’SN+l,lga’_<_d SJaOJ'Ia'BJ'aO/ (Xi )} 129 where $03,510,, S Ja J, a; are the corresponding element in the matrix Sb defined in (5.6.13) has the form shown in the proof of Theorem 5.3.4 and 115*, J,a ($1) = Eb” {m (X)} By", (X0) Kh (X1 — 11:1) , which has the order 00,3, (HI/2). Denote 0,; = 1 2 n9°(—-—n<90<-), 6.431" --- Blue-(>175, f3 = Eez-Iilez-ISDn}, £2" = 2 + 5 eiI{|e,-| 3 Dn} — 655‘. Then flu? 
= A1 + A2 + A3 where l n D M: = ZISJ5N+1,2gagd#b’k’J’a ($1) ($1) a :0 (X05231? i=1 {51.0 + ZISJ’SN+1,1So/Sd SJ.a.J’.a’BJ’.a’ (Xia’)} 1k = 1' 2’ 3' Then one has with probability 1, A1 = 0 for large 11. Next, .2+n = I—Ee.I{Ie.I > 0n}: 3 PM— = 0091”“) , 5p” 1 ,3 D’ll'l'fl 2 A3 S 05b [ZnggNHggagdflb’k’Jva ($1) _1 n D" 2 1/2 Z1_<_J’SN+1,ISo/Sd{n 21:1 BJ’B’ (Xia’ ”(Km-233 } ] - 1+71) 2 < ( Z — CD" [ 1_<_J_<_N+1,2Sa$dub1kv'lva ($1) _1 n 2 1/2 Z15J’gN+1,1_<_a’5d{n 2143756“ 0900009)} ] __ 1 2 _ 1 2 Dn(1+”)o,,,., {(NHNloan/n) / } = Dn(1+")oa,s, {(Nlog2n/n) / } = 0a.3_ (”_2/5) . Lastly, A2 = 00.3, (n‘3/51‘i-1/2 log n) = 013, (71-2/5) according to Bernstein’s Inequality. Then fgmg = 0&3, (n—2/5) according to the orders of A1,A2 and A3. With similar proof, we can show ’13,,“ = 00“,, (n'2/5). Lastly, denote A2 = 71‘1 2?:1 {1, where D 52' = ZISJSN+172gasdflb,k,J,a($1)0(Xi)5i,2n {SJ:Q + ZISJISN+1a1SQISd SJ,O,J’,CY’BJ’,O, (Xtai)} ' 130 Then Efii = O, and D N+1.d var (51') = MachJ var { 0 (X2) 8i; Dn } Salub’c BJljd (Xia,) a (X1)Ei1k J’:laa,=1 s 000§Cvu£k_1utjk_c = 0 (1). Then A2 = Oa,3_ (71-1/2 log n) = oa.s_ (n‘2/5) according to Bernstein’s Inequality. Then T2,”; = 00,3, (n‘Q/S) according to the orders of A1, A2 and A3. With similar proof, we can show T2,,“ = 00.3, (n’Z/f’). Then the lemma is proved. Cl LEMMA 5.6.12. Under Assumptions (A1)v(A7), Va, 0 S supx1€[h,1_h] |_[Il(a)[ S C a.s. for some constants 0 < c < C. Proof. According to (5.3.2), one has A n A 1"(0) = *1/71 2):, lb" {0 + 771-1 (X1_1)}] Kh (X11 *- $1)- Cb S b"{a+fil_1(xi-1)} S Cb and Spr1e[h,1—Iz]|1/nZiL-1 Kh (X11 - $1) - f(xlll = Oa,3_ {(nh)"l/ 2 log n} imply the lemma. Cl 131 f Table 1. Simulated example 2.4.1 n [2,, (F) D“ (F) — 12,, (F) MISE (F) MISE (F) — MISE (F) 50 0.101 0.055 0.157 0.021 p = 0, 100 0.073 0.035 0.072 0.010 a = 0. 200 0.051 0.022 0.033 0.004 500 0.034 0.012 0.014 0.001 50 0.107 0.051 0.201 0.032 p = 0.5, 100 0.075 0.034 0.088 0.015 a = 0.2. 
                     200    0.052    0.022    0.041    0.004
                     500    0.037    0.011    0.019    0.002
ρ = 0.9, a = 0.2      50    0.106    0.035    0.202    0.035
                     100    0.073    0.024    0.086    0.014
                     200    0.050    0.015    0.040    0.006
                     500    0.036    0.008    0.020    0.002
Note: $D_n$ and MISE of the smooth kernel estimator and the empirical cdf of $F$.

Table 2. Simulated example 3.4.1
Estimation       n = 400     n = 800     n = 1600    n = 3200
$\hat\alpha$     0.036325    0.023289    0.013743    0.008098
Note: The mean of squared errors for 100 replications.

Table 3. Simulated example 3.4.1
n                          400       800       1600      3200
Spline estimation          4         11        31        92
Local linear estimation    102       630       3200      18000
Time ratio                 1 : 25    1 : 57    1 : 103   1 : 196
Note: Computing time (in seconds) of cubic spline estimation and local linear estimation of parameter $\alpha_0$ for one replication with n = 400, 800, 1600, 3200, on a PC with Intel Pentium IV 1.86 GHz processor and 1.0 GB RAM.

Table 4. Fitting DEM/GBP returns
Fitted Model             Log-Likelihood    Volatility Prediction Error
GARCH(1,1)               0.5231            0.1045
GJR                      0.5233            0.1039
Semi. GARCH (Kernel)     0.5306            0.0994
Semi. GARCH (Spline)     0.5786            0.0987

Table 5. Fitting DEM/USD returns
Fitted Model             Log-Likelihood    Volatility Prediction Error
GARCH(1,1)               -0.1567           0.6667
GJR                      -0.1566           0.6661
Semi. GARCH (Kernel)     -0.1508           0.6529
Semi. GARCH (Spline)     -0.1485           0.6476

Table 6. Residual check for fitting DEM/GBP returns
ACF up to lag    létl ,Zt        (€42,th        Iétld,Z§        létI4,Z§
100              0.07, 0.09      0.02, 0.06     0.02, 0.05      0.01, 0.05
200              0.045, 0.065    0.01, 0.04     0.01, 0.035     0.005, 0.035
300              0.04, 0.06      0.007, 0.037   0.007, 0.033    0.003, 0.047

Table 7. Residual check for fitting DEM/USD returns
ACF up to lag    létll, 232 lgtlJ , Z? léth , Z?
100              0.04, 0.09      0.04, 0.06     0.06, 0.05      0.05, 0.05
200              0.025, 0.065    0.025, 0.04    0.04, 0.035     0.04, 0.035
300              0.0167, 0.06    0.0167, 0.037  0.03, 0.033     0.038, 0.047

Table 8. Simulated example 4.6.1
                 SBLL fit                            Spline fit, p = 1
                 n = 200           n = 500           n = 200           n = 500
$m_{01} = 2$     1.9813(0.1636)    1.9964(0.0980)    1.9813(0.1636)    1.9964(0.0980)
$m_{02} = 1$     0.9909(0.0539)    0.9989(0.0343)    0.9909(0.0539)    0.9989(0.0343)
$m_{11}$         0.0255            0.0096            0.0561            0.0185
$m_{21}$         0.0276            0.0089            0.0125            0.0063
$m_{12}$         0.0113            0.0041            0.0089            0.0063
$m_{22}$         0.0097            0.0030            0.0085            0.0065
Note: The means and standard errors (in parentheses) of $\hat m_{01}$, $\hat m_{02}$ and the AISEs of $\hat m_{SBLL,11}$, $\hat m_{SBLL,12}$, $\hat m_{SBLL,21}$, $\hat m_{SBLL,22}$ by two methods: SBLL and polynomial spline.

Table 9. Simulated example 5.5.1, d = 5
                    n      MISE($\hat m_{SBK,1}$)   MISE($\tilde m_{K,1}$)   EFF($\hat m_{SBK,1}$)   std{EFF($\hat m_{SBK,1}$)}
r = 0, a = 0        500    0.054                    0.060                    1.112                   0.274
r = 0.5, a = 0.5    500    0.101                    0.094                    1.023                   0.279
Note: The MISEs and EFFs of $\hat m_{SBK,1}$, $\tilde m_{K,1}$.

Table 10. Simulated example 5.5.1, d = 5
                    n      MISE($\hat m_{SBK,2}$)   MISE($\tilde m_{K,2}$)   EFF($\hat m_{SBK,2}$)   std{EFF($\hat m_{SBK,2}$)}
r = 0, a = 0        500    0.017                    0.027                    1.503                   0.896
r = 0.5, a = 0.5    500    0.036                    0.417                    0.997                   0.400
Note: The MISEs and EFFs of $\hat m_{SBK,2}$, $\tilde m_{K,2}$.

Table 11. Simulated example 5.5.2, d = 10
                    n       MISE($\hat m_{SBK,1}$)   MISE($\tilde m_{K,1}$)   EFF($\hat m_{SBK,1}$)   std{EFF($\hat m_{SBK,1}$)}
r = 0, a = 0        500     0.0965                   0.0701                   0.9868                  0.3813
                    1000    0.0491                   0.0453                   1.0228                  0.2324
                    1500    0.0298                   0.0331                   1.1021                  0.3123
                    2000    0.0246                   0.0280                   1.1014                  0.2161
r = 0, a = 0.5      500     0.0992                   0.0735                   0.9515                  0.3154
                    1000    0.0453                   0.0440                   1.0489                  0.2741
                    1500    0.0285                   0.0327                   1.0957                  0.2306
                    2000    0.0259                   0.0282                   1.0801                  0.1823
r = 0.5, a = 0      500     0.2318                   0.1373                   0.8732                  0.3122
                    1000    0.1343                   0.0885                   0.9186                  0.4027
                    1500    0.0756                   0.0605                   0.9294                  0.2493
                    2000    0.0567                   0.0474                   0.9811                  0.2877
r = 0.5, a = 0.5    500     0.2757                   0.1386                   0.8509                  0.3356
                    1000    0.1389                   0.0899                   0.8950                  0.2731
                    1500    0.0776                   0.0601                   0.9686                  0.2715
                    2000    0.0593                   0.0485                   0.9885                  0.3050
Note: The MISEs and EFFs of $\hat m_{SBK,1}$, $\tilde m_{K,1}$.

Table 12. Simulated example 5.5.2
n                   500    1000   1500   2000
r = 0, a = 0        5.6    22     49     86
r = 0.5, a = 0.5    7.2    27     57     102
Note: Computing time of $\hat m_{SBK,1}$.

Figure 1. ACF plot of GDP quarterly growth rate.
Figure 2. Timeplot of GDP quarterly growth rate.

Figure 3. ACF plot of unemployment quarterly growth rate.

Figure 4. Timeplot of unemployment quarterly growth rate.

Figure 5. Survival curves of GDP growth rate conditional on unemployment growth rate. Axes: GDP quarterly growth rate (horizontal), survival probability (vertical). Note: unemployment growth rate in [-0.08, -0.04], thin solid; in [-0.02, 0.02], thick solid; in [0.04, 0.08], dotted.

Figure 6. Plot of densities of $\hat\alpha$. Note: n = 400 - dashed line, n = 800 - dotted line, n = 1600 - thin solid line, n = 3200 - thick solid line.

Figure 7. Residuals of DEM/USD daily returns.

Figure 8. Estimated function m for the semiparametric GARCH model.

Figure 9. Errors of GDP forecasts. Axes: year, 1994 to 2002 (horizontal), error in billions (vertical). Note: model (4.6.2) - solid line; model (4.6.1) - dotted line.

Figure 10. Estimation of function $c_1 + m_{SBLL,41}(x_{t-3})$.

Figure 11. A typical estimator of $m_{11}$ based on n = 500 observations. Note: true function $m_{11}$ - solid line; $\hat m_{SBLL,11}$ - dotted line.

Figure 12. GDP growth rate - dotted line; estimated TFP growth rate - solid line.

Figure 13. Plot of empirical distribution of relative efficiency: r = 0, a = 0. Note: n = 500 - dashed line, n = 1000 - dotted line, n = 1500 - thin solid line, n = 2000 - thick solid line.

Figure 14. Plot of empirical distribution of relative efficiency: r = 0, a = 0.5. Note: n = 500 - dashed line, n = 1000 - dotted line, n = 1500 - thin solid line, n = 2000 - thick solid line.

Figure 15. Plot of empirical distribution of relative efficiency: r = 0.5, a = 0. Note: n = 500 - dashed line, n = 1000 - dotted line, n = 1500 - thin solid line, n = 2000 - thick solid line.

Figure 16. Plot of empirical distribution of relative efficiency: r = 0.5, a = 0.5. Note: n = 500 - dashed line, n = 1000 - dotted line, n = 1500 - thin solid line, n = 2000 - thick solid line.

Figure 17. Plot of function estimation for r = 0, a = 0: n = 500. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 18. Plot of function estimation for r = 0, a = 0: n = 1000. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 19. Plot of function estimation for r = 0, a = 0: n = 1500. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 20. Plot of function estimation for r = 0, a = 0: n = 2000. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 21. Plot of function estimation for r = 0.5, a = 0.5: n = 500. Confidence level = 0.95.
Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 22. Plot of function estimation for r = 0.5, a = 0.5: n = 1000. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 23. Plot of function estimation for r = 0.5, a = 0.5: n = 1500. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

Figure 24. Plot of function estimation for r = 0.5, a = 0.5: n = 2000. Confidence level = 0.95. Note: $m_1(x_1)$ - solid line, $\tilde m_{K,1}(x_1)$ - dashed line, confidence bands and $\hat m_{SBK,1}(x_1)$ - three dotted lines.

BIBLIOGRAPHY

[1] Arnold, R. (2005). R&D and Productivity Growth: A Background Paper. Congressional Budget Office.
[2] Bickel, P. J. and Rosenblatt, M. (1973). On some global measures of the deviations of density function estimates. Annals of Statistics, 1, 1071–1095.
[3] Bollerslev, T. P. (1986). Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics, 31, 307–327.
[4] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New York.
[5] Brown, L. D. and Levine, M. (2007). Variance estimation in nonparametric regression via the difference sequence method. Annals of Statistics, 35, 2219–2232.
[6] Cai, Z., Fan, J. and Yao, Q. (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941–956.
[7] Chan, N., Deng, S., Peng, L. and Xia, Z. (2007). Interval estimation of value-at-risk based on GARCH models with heavy-tailed innovations. Journal of Econometrics, 137, 556–576.
[8] Chen, R. and Tsay, R. S. (1993a). Nonlinear additive ARX models. Journal of the American Statistical Association, 88, 956–967.
[9] Chen, R. and Tsay, R. S. (1993b).
Functional-coefficient autoregressive models. Journal of the American Statistical Association, 88, 298–308.
[10] Cheng, M. Y. and Peng, L. (2002). Regression modeling for nonparametric estimation of distribution and quantile functions. Statistica Sinica, 12, 1043–1060.
[11] Cobb, C. W. and Douglas, P. H. (1928). A theory of production. American Economic Review, 18, 139–165.
[12] Culpepper, W. L. (2004). High R&D Spending Fuels Revenue Growth not Profits. Available for downloading at http://www.culpepper.com/eBulletin/2004/AugustRatiosArticle.asp.
[13] Dahl, C. M. and Levine, M. (2006). Nonparametric estimation of volatility models with serially dependent innovations. Statistics and Probability Letters, 76, 2007–2016.
[14] de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag, New York.
[15] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation: Polynomials and Splines Approximation. Springer-Verlag, Berlin.
[16] Doukhan, P. (1994). Mixing: Properties and Examples. Springer-Verlag, New York.
[17] Duan, J. C. (1997). Augmented GARCH(p, q) process and its diffusion limit. Journal of Econometrics, 79, 97–127.
[18] Engle, R. F. and Ng, V. (1993). Measuring and testing the impact of news on volatility. Journal of Finance, 48, 1749–1778.
[19] Falk, M. (1985). Asymptotic normality of kernel type estimators of quantiles. Annals of Statistics, 13, 428–433.
[20] Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall, London.
[21] Fan, J., Härdle, W. and Mammen, E. (1998). Direct estimation of low-dimensional components in additive models. Annals of Statistics, 26, 943–971.
[22] Fan, J. and Jiang, J. (2005). Nonparametric inference for additive models. Journal of the American Statistical Association, 100, 890–907.
[23] Glosten, L. R., Jagannathan, R. and Runkle, D. E. (1993). On the relation between the expected value and the volatility of the nominal excess return on stocks.
Journal of Finance, 48, 1779–1801.
[24] Hafner, C. M. (1998). Nonlinear Time Series Analysis with Applications to Foreign Exchange Rate Volatility. Physica-Verlag, Heidelberg.
[25] Hafner, C. M. (2008). Temporal aggregation of multivariate GARCH processes. Journal of Econometrics, 142, 467–483.
[26] Hafner, C. M. and Herwartz, H. (2006). Volatility impulse responses for multivariate GARCH models: An exchange rate illustration. Journal of International Money and Finance, 25, 719–740.
[27] Härdle, W., Hlávka, Z. and Klinke, S. (2000). XploRe Application Guide. Springer-Verlag, Berlin.
[28] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall, London.
[29] Hastie, T. J. and Tibshirani, R. J. (1993). Varying-coefficient models. Journal of the Royal Statistical Society Series B, 55, 757–796.
[30] Hentschel, L. (1995). All in the family: nesting symmetric and asymmetric GARCH models. Journal of Financial Economics, 39, 71–104.
[31] Hengartner, N. W. and Sperlich, S. (2005). Rate optimal estimation with the integration method in the presence of many covariates. Journal of Multivariate Analysis, 95, 246–272.
[32] Horowitz, J. and Mammen, E. (2004). Nonparametric estimation of an additive model with a link function. Annals of Statistics, 32, 2412–2443.
[33] Horowitz, J., Klemelä, J. and Mammen, E. (2006). Optimal estimation in additive regression. Bernoulli, 12, 271–298.
[34] Huang, J. Z. (1998a). Projection estimation in multiple regression with application to functional ANOVA models. Annals of Statistics, 26, 242–272.
[35] Huang, J. Z. (1998b). Functional ANOVA models for generalized regression. Journal of Multivariate Analysis, 67, 49–71.
[36] Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Annals of Statistics, 31, 1600–1635.
[37] Huang, J. Z. and Shen, H. (2004). Functional coefficient regression models for non-linear time series: a polynomial spline approach.
Scandinavian Journal of Statistics, 31, 515—534. [38] Huang, J. Z., Wu, C. O. and Zhou, L. (2002). Varying—coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika, 89, 111—128. [39] Huang, J. Z. and Yang, L. (2004). Identification of nonlinear additive autoregression models. Journal of the Royal Statistical Society Series B, 66, 463—477. [40] Levine, M. (2006). Bandwidth selection for a class of difference-based variance esti- mators in the nonparametric regression: a possible approach. Computational Statistics and Data Analysis, 50, 3405—3431. [41] Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Annals of Statistics, 36, 261—286. [42] Li, Q. and Racine, J. S. (2007). Nonparametric Econometrics: Theory and Practice. Princeton University Press, Princeton. [43] Linton, O. B. (1997). Efficient estimation of additive nonparametric regression models. Biometrika, 84, 469—473. [44] Linton, O. B. and Hardle, W. (1996). Estimation of additive regression models with known links. Biometrika, 83, 529—540. [45] Linton, O. B. and Mammen, E. (2005). Estimating Semiparametric Arch (00) Models by Kernel Smoothing Methods. Econometrica, 73, 771—836. 162 [46] Linton, O. B. and Nielsen, J. P. (1995). A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika, 82, 93—101. [47] Liu, R and Yang, L. (2008). Kernel estimation of multivariate cumulative distribution function. Journal of Nonparametric Statistics, 20, 661—677. [48] Liu, R and Yang, L. (2009). Spline-backfitted kernel smoothing of additive coefficient model. In Press. Econometric Theory, [49] Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic proper- ties of a backfitting projection algorithm under weak conditions. Annals of Statistics, 27, 1443—1490. [50] McConnell, C. and Brue, S. (1999). Economics: Principles, Problems, and Policies. Irwin/McGraw—Hill, Boston. 
[51] Nielsen, J. P. and Sperlich, S. (2005). Smooth backfitting in practice. Journal of the Royal Statistical Society Series B, 67, 43—61. [52] Opsomer, J. D. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression. Annals of Statistics, 25, 186—211. [53] Peng, L. and Yao, Q. (2003). Least absolute deviations estimation for ARCH and GARCH models. Biometrika, 90, 967—975. [54] Pham, D. T. (1986). The mixing properties of bilinear and generalized random coeffi- cient autoregressive models. Stochastic Analysis and Applications, 23, 291—300. [55] Reiss, R. D. (1981). Nonparametric estimation of smooth distribution functions. Scan- dinavian Journal of Statistics, 8, 116—119. [56] Robinson, P. M. (1983). Nonparametric estimators for time series. Journal of Time Series Analysis, 4, 185—207. [57] Rodriguez-Poo, J. M., Sperlich, S. and Vieu, P. (2003). Semiparametric estimation of separable models with possibly limited dependent variables. Econometric Theory, 19, 1008—1039. [58] Samuelson, P. (1995). Economics. Irwin/McGraw-Hill, New York. [59] Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visual- ization. John Wiley & Sons, New York. [60] Sperlich, S., Tjestheim, D. and Yang, L. (2002). Nonparametric estimation and testing of interaction in additive models. Econometric Theory, 18, 197—251. [61] Solow, R. M. (1957). Technical change and the aggregate production function. The Review of Economics and Statistics, 39, 312_320_ 163 [62] Stone, C. J. (1985). Additive regression and other nonparametric models. Annals of Statistics, 13, 689-705. [63] Stone, C. J. (1986). The dimensionality reduction principle for generalized additive models. Annals of Statistics, 14, 590—606. [64] Stone, C. J. (1994). The use of polynomial splines and their tensor products in multi- variate function estimation. Annals of Statistics, 22, 118—184. [65] Sun, Y. and Stengos, T. (2006). 
Semiparametric efficient adaptive estimation of asym- metric GARCH models. Journal of Econometrics, 133, 373—386. [66] Sunklodas, J. (1984). On the rate of convergence in the central limit theorem for strongly mixing random variables. Lithuanian Mathematical Journal, 24, 182—190. [67] Tjestheim, D. and Auestad, B. (1994). Nonparametric identification of nonlinear time series: projections. Journal of the American Statistical Association, 89, 1398—1409. [68] Wang, L. and Yang, L. (2007). Spline—backfitted kernel smoothing of nonlinear additive autoregression model. Annals of Statistics, 35, 2474—2503. [69] Xue, L. and Yang, L. (2006a). Estimation of semiparametric additive coefficient model. Journal of Statistical Planning and Inference, 136, 2506—2534. [70] Xue, L. and Yang, L. (2006b). Additive coefficient modeling via polynomial spline. Statistica Sinica, 16, 1423—1446. [71] Yamato, H. (1973). Uniform convergence of an estimator of a distribution function. Bulletin of Mathematical Statistics, 15, 69—78. [72] Yang, L. (2000). Finite nonparametric GARCH model for foreign exchange volatility. Communications in Statistics-Theory and Methods, 5 8L 6, 1347—1365. [73] Yang, L. (2002). Direct estimation in an additive model when the components are proportional. Statistica Sinica, 12, 801—821. [74] Yang, L. (2006). A semiparametric GARCH model for foreign exchange volatility. Jour- nal of Econometrics, 130, 365—384. [75] Yang, L., Hardle, W. and Nielsen, J. P. (1999). Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis, 20, 579— 604. [76] Yang, L., Sperlich, S. and Hardle, W. (2003). Derivative estimation and testing in generalized additive models. Journal of Statistical Planning and Inference, 115, 521— 542. [77] Yang, L. and Tschernig, R. (1999). Multivariate bandwidth selection for local linear regression. Journal of the Royal Statistical Society Series B, 61, 793-815. [78] Zhang, F. (1999). Matrix Theory. 
Springer-Verlag, New York.