Q(c_a^{(p+1)/(2p+1)} c_M^{−(p+1)/(2p+1)} v, c_M M, c_a a) = c_a^{(2p+2)/(2p+1)} c_M^{−1/(2p+1)} Q(v, M, a),
v(c_M M, c_a a) = c_a^{(p+1)/(2p+1)} c_M^{−(p+1)/(2p+1)} v(M, a).
To make use of Theorem 2.3.1, we make an additional assumption on F,
(A5) The matrices B_{p+1}(F) = {B_{αβ,p+1}(F)}_{α,β=1}^d ∈ M_+(d) and
C(F) = (C_α(F))_{α=1}^d ∈ R_+^d.
Theorem 2.2.2, Theorem 2.3.1 (ii) and Assumption (A5) ensure the existence of a unique
optimal bandwidth h_opt that minimizes
AMISE(F̂; h) = n^{−1} ∫ V(x) dF(x) + Q(h^{p+1}, μ_{p+1}^2(K)(p+1)!^{−2} B_{p+1}(F), n^{−1} D(K) C(F)),
where h^{p+1} denotes the vector (h_1^{p+1}, …, h_d^{p+1})^T.
Theorem 2.3.1 (iii) then implies that
h_opt = h_opt(n, K, F) = n^{−1/(2p+1)} { μ_{p+1}^2(K) / (D(K)(p+1)!^2) }^{−1/(2p+1)} v^{1/(p+1)}(B_{p+1}(F), C(F)).
Thus to obtain the optimal bandwidth h_opt, one computes exactly the factors involving n
and K in the above expression, and estimates the remaining factor
θ = θ(F) = (θ_1, …, θ_d)^T = (θ_1(F), …, θ_d(F))^T = v^{1/(p+1)}(B_{p+1}(F), C(F)).
The next theorem follows from the negativity result in Theorem 2.3.1 (ii):
THEOREM 2.3.2. Under Assumptions (A1)-(A5), F̂ has asymptotically smaller MISE than
the empirical cdf F̃. Specifically, MISE(F̃) = n^{−1} ∫ V(x) dF(x) and, as n → ∞,
MISE(F̂; h_opt) = MISE(F̃) + n^{−(2p+2)/(2p+1)} C(K, F) + o(n^{−(2p+2)/(2p+1)}),
C(K, F) = { μ_{p+1}^2(K) / (D(K)^{2p+2}(p+1)!^2) }^{−1/(2p+1)} Q(v(B_{p+1}(F), C(F)), B_{p+1}(F), C(F)) < 0.
Following Yang and Tschernig (1999), we define a plug-in asymptotically optimal bandwidth
ĥ_opt = n^{−1/(2p+1)} { μ_{p+1}^2(K) / (D(K)(p+1)!^2) }^{−1/(2p+1)} v^{1/(p+1)}(B̂_{p+1}(F), Ĉ(F)),
in which the plug-in estimator θ̂ = v^{1/(p+1)}(B̂_{p+1}(F), Ĉ(F)) of the unknown parameter θ
is computed by the Newton-Raphson method, using the gradient and Hessian formulae of
Theorem 2.3.1, and where the plug-in estimators of the unknown matrices
B_{p+1}(F) = {B_{αβ,p+1}(F)}_{α,β=1}^d and C(F) are
B̂_{p+1}(F) = {B̂_{αβ,p+1}(F)}_{α,β=1}^d, Ĉ(F) = {Ĉ_α(F)}_{α=1}^d.
B̂_{αβ,p+1}(F) = n^{−1} Σ_{j=1}^n { n^{−1} Σ_{i=1}^n K_{g_α}^{(p)}(X_{jα} − X_{iα}) Π_{γ=1,γ≠α}^d ∫_{−∞}^{X_{jγ}} K_{g_γ}(x_γ − X_{iγ}) dx_γ }
× { n^{−1} Σ_{i=1}^n K_{g_β}^{(p)}(X_{jβ} − X_{iβ}) Π_{γ=1,γ≠β}^d ∫_{−∞}^{X_{jγ}} K_{g_γ}(x_γ − X_{iγ}) dx_γ },
Ĉ_α(F) = n^{−1} Σ_{j=1}^n { n^{−1} Σ_{i=1}^n K_{g_α}(X_{jα} − X_{iα}) Π_{γ=1,γ≠α}^d ∫_{−∞}^{X_{jγ}} K_{g_γ}(x_γ − X_{iγ}) dx_γ }.
The pilot bandwidth vector g = (g_1, …, g_d)^T is the simple rule-of-thumb bandwidth for
multivariate density estimation in Scott (1992).
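Under the stated assumption that the pilot rule is Scott's (1992) multivariate rule-of-thumb, a minimal sketch in Python (the function name and the omission of any kernel-specific constant are ours):

```python
import numpy as np

def scott_pilot_bandwidths(X):
    """Rule-of-thumb pilot bandwidths g = (g_1, ..., g_d) for a sample X of
    shape (n, d): g_a = sigma_a * n**(-1/(d+4)), with sigma_a the sample
    standard deviation of coordinate a (Scott's multivariate rule)."""
    n, d = X.shape
    return X.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
```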
In the next section, we present Monte Carlo evidence for Theorems 2.2.2 and 2.2.3, and
illustrate the use of the smooth estimator F̂(x) with real data examples. In all computing,
we use the quartic kernel K(u) = (15/16)(1 − u²)² 1{|u| ≤ 1} with p = 1 and the plug-in
bandwidth ĥ_opt described above. We have not experimented with other choices of K and
p, both because of space limitations and because these choices are in general not as crucial
as the choice of bandwidth; see Fan and Yao (2003).
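Since p = 1, evaluating F̂ only requires the integral of the quartic kernel; the sketch below assumes the product-kernel form of the estimator in (2.1.1) (function names are ours, not from the thesis):

```python
import numpy as np

def quartic_cdf(t):
    """Integral of the quartic kernel K(u) = 15/16 (1 - u^2)^2 1(|u| <= 1)
    from -infinity to t."""
    t = np.clip(t, -1.0, 1.0)
    return 0.5 + (15.0 / 16.0) * (t - 2.0 * t**3 / 3.0 + t**5 / 5.0)

def smooth_cdf(x, X, h):
    """Smooth cdf estimate F_hat(x) = n^{-1} sum_i prod_a G((x_a - X_ia)/h_a),
    where G is the integrated kernel; X has shape (n, d), x and h length d."""
    t = (np.asarray(x)[None, :] - X) / np.asarray(h)[None, :]
    return float(np.prod(quartic_cdf(t), axis=1).mean())
```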
2.4 Examples
2.4.1 A simulated example
In this section, we examine the asymptotic results of Theorems 2.2.2 and 2.2.3 via simulation.
The data are generated from the following vector autoregression (VAR) equation
X_t = aX_{t−1} + ε_t, ε_t ~ N(0, Σ), 2 ≤ t ≤ n, Σ = [1 ρ; ρ 1], 0 ≤ a, ρ < 1,
with stationary distribution X_t = (X_{t1}, X_{t2})^T ~ N(0, (1 − a²)^{−1} Σ). Clearly, higher
values of a correspond to stronger dependence among the observations, and in particular,
if a = 0, the data are i.i.d. The parameter ρ controls the orientation of the bivariate cdf F,
and in particular, if a = ρ = 0, then F is a bivariate standard normal distribution. In this
study, we have experimented with three cases, ρ = 0, a = 0; ρ = 0.5, a = 0.2; ρ = 0.9,
a = 0.2, to cover various scenarios.
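The data generating process of this experiment can be sketched as follows (variable names and the use of NumPy are ours):

```python
import numpy as np

def simulate_var(n, a, rho, seed=0):
    """Draw X_t = a X_{t-1} + e_t, e_t ~ N(0, Sigma) for t = 2, ..., n with
    Sigma = [[1, rho], [rho, 1]], starting from the stationary law
    N(0, (1 - a^2)^{-1} Sigma)."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    X = np.empty((n, 2))
    X[0] = rng.multivariate_normal(np.zeros(2), Sigma / (1.0 - a**2))
    errors = rng.multivariate_normal(np.zeros(2), Sigma, size=n - 1)
    for t in range(1, n):
        X[t] = a * X[t - 1] + errors[t - 1]
    return X
```

With a = 0 this reduces to i.i.d. N(0, Σ) draws, so the sample correlation should be close to ρ.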
A total of 100 samples {X_t}_{t=1}^n of sizes n = 50, 100, 200, 500 are generated, and F̂ is
computed using the optimal bandwidth ĥ_opt described in Section 2.3. Of interest are the
mean over the 100 replications of the global maximal deviation D_n(F̂) defined in (2.2.2),
denoted D̄_n(F̂), and the mean integrated squared error MISE(F̂; ĥ_opt) defined in
(2.2.1). Both measures are listed in Table 1. As Table 1 shows, both the D̄_n(F̂) and
MISE(F̂; ĥ_opt) values decrease as the sample size increases in all cases, corroborating
Theorems 2.2.2 and 2.2.3. Also listed in Table 1 are the differences between the same measures
for the empirical cdf F̃ and those of F̂, which are always positive regardless of the data
generating process (i.e., for different combinations of a, ρ) and the measure of deviation (i.e.,
D̄_n or MISE). This corroborates Theorem 2.3.2, which states that F̂ has asymptotically smaller
MISE than F̃.
Based on the above observations, we believe our kernel estimator of the multivariate cdf is a
convenient and reliable tool, which is also superior to the empirical cdf in terms of accuracy.
2.4.2 GDP growth and unemployment
In this section, we discuss in detail the dependence of the US quarterly GDP growth rate on
the unemployment rate. There are three types of unemployment: frictional, structural, and
cyclical. Economists regard frictional and structural unemployment as essentially unavoidable
in a dynamic economy, so full employment means something less than 100% employment. The
full-employment rate of unemployment is also referred to as the natural rate of unemploy-
ment. It does not mean the economy will always operate at the natural rate. The economy
sometimes operates at an unemployment rate higher than the natural rate due to cyclical
unemployment. In contrast, the economy may on some occasions achieve an unemployment
rate below the natural rate. For example, during World War II, when the natural rate
was about 4%, the actual rate fell below 2% during 1943-1945: the pressure of wartime
production created an almost unlimited demand for labor. The natural rate is not forever
fixed. It was about 4% in the 1960s; economists later generally agreed that the natural
rate was about 6%, and today the consensus is that it is about 5.5%.
GDP gap denotes the amount by which actual GDP falls short of the theoretical GDP
under the natural rate. Okun's law, based on recent estimates, indicates that for every
1% by which the actual unemployment rate exceeds the natural rate, a GDP gap of about
2% occurs. See Samuelson (1995), p. 559 or McConnell and Brue (1999), p. 214 for more
details. In other words, if the unemployment rate falls, then the GDP growth rate increases. But
the unemployment rate cannot keep falling, because it moves around the natural rate. So it
is useful to ﬁnd the relationship between the GDP growth rate and unemployment growth
rate.
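The 2-to-1 relationship just described amounts to a one-line computation; the helper below is a hypothetical illustration, not part of the thesis:

```python
def okun_gdp_gap(actual_rate, natural_rate, okun_coefficient=2.0):
    """GDP gap in percent implied by Okun's law: about `okun_coefficient`
    percent of GDP is lost for each percentage point by which the actual
    unemployment rate exceeds the natural rate."""
    return okun_coefficient * (actual_rate - natural_rate)
```

For instance, with the natural rate at 5.5% and actual unemployment at 7.5%, the implied GDP gap is about 4%.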
Let X_{t1} = the seasonally adjusted quarterly unemployment growth rate in quarter t and
X_{t2} = the quarterly GDP growth rate in quarter t, with all data taken from the 1st quarter of
1948 (t = 1) to the 2nd quarter of 2006 (t = 234). Since both series have been seasonally
adjusted, it is reasonable to treat X_t = (X_{t1}, X_{t2})^T, t = 1, …, 234, as a strictly stationary
time series, as shown in the time plots. The ACF plots also indicate that the assumption
of α-mixing is satisfied. The plots are shown in Figures 1-4.
Given any interval I = [a, b], the survival function of X_{t2} conditional on X_{t1} ∈ I is
defined as
S_I(x_2) = P(X_{t2} > x_2 | X_{t1} ∈ I) = 1 − [F(b, x_2) − F(a, x_2)] / [F(b, +∞) − F(a, +∞)] (2.4.1)
in which F is the joint distribution function of X_{t1} and X_{t2}.
The function S_I(x_2) can be approximated by the following plug-in estimator
Ŝ_I(x_2) = 1 − [F̂(b, x_2) − F̂(a, x_2)] / [F̂(b, +∞) − F̂(a, +∞)] (2.4.2)
in which F̂ is the kernel estimator of F defined in (2.1.1). According to Theorems 2.2.1 and
2.2.3, for any fixed x_2, |Ŝ_I(x_2) − S_I(x_2)| = O_p(n^{−1/2}), while
sup_{x_2∈R} |Ŝ_I(x_2) − S_I(x_2)| = O_{a.s.}(n^{−1/2} log n),
so the estimator Ŝ_I(x_2) is theoretically very reliable. We therefore draw probabilistic
conclusions based on the smooth estimate Ŝ_I(x_2) instead of the unknown true S_I(x_2).
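The plug-in estimator (2.4.2) works with any estimate of the joint cdf; the sketch below substitutes the bivariate empirical cdf for the kernel estimator F̂ of (2.1.1) purely for brevity, and replaces +∞ by a large finite argument:

```python
import numpy as np

def empirical_cdf(X):
    """Bivariate empirical cdf F(x1, x2) built from a sample X of shape (n, 2)."""
    def F(x1, x2):
        return float(np.mean((X[:, 0] <= x1) & (X[:, 1] <= x2)))
    return F

def conditional_survival(x2, a, b, F, inf=1e9):
    """Plug-in estimator of S_I(x2) = P(X_t2 > x2 | X_t1 in [a, b]) as in
    (2.4.2): 1 - [F(b, x2) - F(a, x2)] / [F(b, +inf) - F(a, +inf)]."""
    return 1.0 - (F(b, x2) - F(a, x2)) / (F(b, inf) - F(a, inf))
```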
In Figure 5, the estimated conditional survival curve Ŝ_I(x_2) is plotted for the intervals
I = [−0.08, −0.04], I = [−0.02, 0.02], and I = [0.04, 0.08]. Clearly, when the unemployment
growth rate is between −0.08 and −0.04, the chance of a GDP growth rate higher
than 1.5% is the greatest, at about 0.2. This is in accordance with Okun's law, which
associates growth in GDP with changes in the unemployment rate. So if policymakers want
to achieve a high GDP growth rate, they may seek better ways to lower the unemployment
rate. One can even estimate the probabilities of various GDP growth rates given an
unemployment policy, represented by the interval I. If the current unemployment rate is close
to the natural rate, then I is an interval close to 0, such as [−0.02, 0.02]; if the current
unemployment rate is much higher than the natural rate, then I is a negative interval, i.e.,
the policy aims to lower the unemployment rate.
On the other hand, the survival function of X_{t1} conditional on X_{t2} can be computed
similarly. If a certain level of GDP growth rate is planned to be achieved, one can estimate
the conditional probabilities of different unemployment growth rates.
2.5 Appendix
2.5.1 Preliminaries
In this appendix, we denote by C (or c) any positive constant, by U (or u) sequences of
random variables that are uniformly O (or o) of a certain order, and by O_{a.s.} (or o_{a.s.})
almost sure orders, etc.
LEMMA 2.5.1. [Berry-Esseen inequality, Sunklodas (1984), Theorem 1] Let {ξ_i}_{i=1}^n be an
α-mixing sequence with Eξ_i = 0. Denote d_δ := max_{1≤i≤n} {E|ξ_i|^{2+δ}}, 0 < δ ≤ 1, S_n =
Σ_{i=1}^n ξ_i, σ_n² := ES_n² ≥ c_0 n for some c_0 ∈ (0, +∞). If α(n) ≤ K_0 exp(−λ_0 n), λ_0 > 0,
K_0 > 0, then there exist c_1 = c_1(K_0, δ), c_2 = c_2(K_0, δ), such that
Δ_n = sup_z |P{σ_n^{−1} S_n < z} − Φ(z)| ≤ c_1 (d_δ/σ_n^δ) {log(σ_n/c_0^{1/2})/λ}^{1+δ} (2.5.1)
for any λ with λ_1 ≤ λ ≤ λ_2, where
λ_1 = c_2 {log(σ_n/c_0^{1/2})}^b/n, b > 2(1 + δ)/δ; λ_2 = 4(2 + δ)δ^{−1} log(σ_n/c_0^{1/2}).
LEMMA 2.5.2. [Bernstein's inequality, Bosq (1998), Theorem 1.4] Let {ξ_t} be a zero-mean
real-valued process, S_n = Σ_{i=1}^n ξ_i. Suppose that there exists c > 0 such that for i = 1, ⋯, n,
k ≥ 3, E|ξ_i|^k ≤ c^{k−2} k! E(ξ_i²) < +∞, and let m_r = max_{1≤i≤N} ‖ξ_i‖_r, r ≥ 2. Then for each n > 1,
each integer q ∈ [1, n/2], each ε > 0 and k ≥ 3,
P{|Σ_{i=1}^n ξ_i| > nε} ≤ a_1 exp(−qε²/(25m_2² + 5cε)) + a_2(k) α([n/(q + 1)])^{2k/(2k+1)},
where
a_1 = 2(n/q) + 2(1 + ε²/(25m_2² + 5cε)), a_2(k) = 11n(1 + 5m_k^{2k/(2k+1)}/ε).
2.5.2 Proofs of Theorems 2.2.1 and 2.2.2
LEMMA 2.5.3. Under Assumptions (A1), (A3) and (A4), as n → ∞,
E{F̂(x)} − F(x) = (μ_{p+1}(K)/(p+1)!) Σ_{α=1}^d h_α^{p+1} ∂^{p+1}F(x)/∂x_α^{p+1} + u(h_max^{p+1}).

Proof of Theorem 2.2.1. Apply Lemma 2.5.1 with δ = 1 and λ = 4(2 + δ)δ^{−1} log(σ_n/c_0^{1/2}) =
12 log(σ_n/c_0^{1/2}). Since α(n) ≤ K_0 exp(−λ_0 n), one has α(n) ≤ Cn^{−c} for any c ≥ 2/λ_0, and
for n large enough Δ_n = O(n^{−1/2}), i.e., S_n/σ_n →_d N(0, 1). Theorem 2.2.1 then follows
because √(n/V(x)) {F̂(x) − EF̂(x)} →_d N(0, 1) by Slutsky's theorem. □

Proof of Theorem 2.2.2. Equations (2.5.2) and (2.5.3) together with Eξ_{in} = 0 imply that
{EF̂(x) − F(x)}² = (μ_{p+1}²(K)/(p+1)!²) {Σ_{α=1}^d h_α^{p+1} ∂^{p+1}F(x)/∂x_α^{p+1}}² + u(h_max^{2p+2}),
E{F̂(x) − EF̂(x)}² = n^{−1}V(x) − D(K) n^{−1} Σ_{α=1}^d h_α ∂F(x)/∂x_α + u(n^{−1} h_max),
hence Theorem 2.2.2 follows by computing ∫ [E{F̂(x) − EF̂(x)}² + {EF̂(x) − F(x)}²] dF(x). □
2.5.3 Proof of Theorem 2.2.3
LEMMA 2.5.5. Denote g_{m_1,…,m_d} = (a_{1,m_1}, …, a_{d,m_d}) ∈ R^d, 1 ≤ m_α ≤ M_α, and
A_n = max_{1≤m_α≤M_α} |F̂(g_{m_1,…,m_d}) − E{F̂(g_{m_1,…,m_d})}|,
ζ_{i,m_1,…,m_d} = ∫_{−∞}^{g_{m_1,…,m_d}} K_h(X_i − u) du − E{∫_{−∞}^{g_{m_1,…,m_d}} K_h(X_i − u) du},
so that |ζ_{i,m_1,…,m_d}| ≤ 1 and, for k ≥ 2, E(|ζ_{in}|^k) ≤ 1^{k−2} E(ζ_{in}²). Lemma 2.5.2 with k = 3 then gives
P{|Σ_{i=1}^n ζ_{in}| > nε} ≤ a_1 exp(−qε²/(25m_3² + 5cε)) + a_2(3) α([n/(q + 1)])^{6/7}.
Take ε = a n^{−1/2} log n and q such that [n/(q + 1)] ≥ c_0 log n and qε²/(25m_3² + 5cε) ≥ c_2 a² log n.
Since m_3² = max_{1≤i≤n} ‖ζ_{in}‖_3² ≤ {E(ζ_{in}²)}^{2/3} ≤ 1, one has
a_1 = 2(n/q) + 2{1 + ε²/(25m_3² + 5cε)} = O(log n),
a_2(3) = 11n(1 + 5/ε) ≤ 11n{1 + 5n^{1/2}/(a log n)} = O(n^{3/2}),
α([n/(q + 1)])^{6/7} ≤ {K_0 exp(−λ_0 [n/(q + 1)])}^{6/7} ≤ Cn^{−6λ_0c_0/7}.
So for c_0, c_2 large enough,
P{|n^{−1} Σ_{i=1}^n ζ_{i,m_1,…,m_d}| > a n^{−1/2} log n} ≤ O(log n) exp(−c_2 a² log n) + Cn^{3/2−6λ_0c_0/7} ≤ Cn^{−(d+2)},
P{max_{1≤m_α≤M_α} |n^{−1} Σ_{i=1}^n ζ_{i,m_1,…,m_d}| > a n^{−1/2} log n}
≤ Σ_{m_1=1}^{M_1} ⋯ Σ_{m_d=1}^{M_d} P{|n^{−1} Σ_{i=1}^n ζ_{i,m_1,…,m_d}| > a n^{−1/2} log n} ≤ Cn^{−(d+2)} Π_{α=1}^d M_α ≤ Cn^{−2}.
Hence the Borel-Cantelli lemma implies that A_n = O_{a.s.}(n^{−1/2} log n). Meanwhile
B_n = max_{1≤m_α≤M_α} |E{F̂(g_{m_1,…,m_d})} − F(g_{m_1,…,m_d})| = U(n^{−1/2}), so that
max_{1≤m_α≤M_α} |F̂(g_{m_1,…,m_d}) − F(g_{m_1,…,m_d})| ≤ A_n + U(n^{−1/2}) = O_{a.s.}(n^{−1/2} log n).
If X_1, ⋯, X_n are i.i.d., then A_n + B_n = O_{a.s.}(n^{−1/2}(log n)^{1/2}) by using the same steps above
with Bernstein's inequality for the i.i.d. case. □
LEMMA 2.5.6. ∀A ⊂ R^d, ∫_A |K_h(v − u)| du ≤ ∫_{R^d} |K_h(v − u)| du ≤ ‖K‖_1^d.
Proof. Applying elementary arguments,
∫_A |K_h(v − u)| du ≤ ∫_{R^d} |K_h(v − u)| du = ∫_{R^d} Π_{α=1}^d h_α^{−1} |K((v_α − u_α)/h_α)| du
= Π_{α=1}^d ∫_R |K(w_α)| dw_α ≤ ‖K‖_1^d. □
LEMMA 2.5.7. Let −∞ = a_{α,0} < a_{α,1} < ⋯ < a_{α,N_α} = ∞ be such that
max(N_1, ⋯, N_d) ≤ Cn and P(a_{α,k} ≤ X_α ≤ a_{α,k+1}) ≤ 1/n, ∀1 ≤ k ≤ N_α, ∀1 ≤
α ≤ d. Then E ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} |K_h(X − u)| du = u(n^{−1/2}(log n)^{1/2}), in which
g_{n_1,…,n_d} = (a_{1,n_1}, ⋯, a_{d,n_d}) ∈ R^d.
Proof. Note that
E ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} |K_h(X − u)| du ≤ ∫_{R^d} ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} |K_h(v − u)| du dF(v)
= ∫_{g_{n_1,…,n_d}−(h_1,⋯,h_d)}^{g_{n_1+1,…,n_d+1}+(h_1,⋯,h_d)} dF(v) ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} |K_h(v − u)| du
≤ C ∫_{g_{n_1,…,n_d}−(h_1,⋯,h_d)}^{g_{n_1+1,…,n_d+1}+(h_1,⋯,h_d)} dF(v)
according to Lemma 2.5.6. The last integral equals
∫_{g_{n_1,…,n_d}−(h_1,⋯,h_d)}^{g_{n_1+1,…,n_d+1}+(h_1,⋯,h_d)} dF(v) − ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} dF(v) + P(g_{n_1,…,n_d} ≤ X ≤ g_{n_1+1,…,n_d+1})
≤ Π_{α=1}^d ( ∫_{a_{α,n_α}−h_α}^{a_{α,n_α}} + ∫_{a_{α,n_α}}^{a_{α,n_α+1}} + ∫_{a_{α,n_α+1}}^{a_{α,n_α+1}+h_α} ) dF(v) − ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} dF(v) + 1/n.
Within the above sum, the 3^d − 2^d terms containing a factor ∫_{a_{α,n_α}}^{a_{α,n_α+1}} are O(n^{−1}), while
each of the 2^d terms without such a factor is bounded by (Π_{α=1}^d h_α) max_{x∈R^d} f(x). Applying
Assumptions (A1) and (A3),
E ∫_{g_{n_1,…,n_d}}^{g_{n_1+1,…,n_d+1}} |K_h(X − u)| du ≤ C (Π_{α=1}^d h_α) max_{x∈R^d} f(x) + C(3^d − 2^d)/n = u(n^{−1/2}(log n)^{1/2}). □
LEMMA 2.5.8. Under the same conditions as Lemma 2.5.7, for ∀x = (x_1, ⋯, x_d) ∈ R^d,
n^{−1} Σ_{i=1}^n |ζ_{in}| = U_{a.s.}(n^{−1/2} log n), in which
ζ_{in} = ζ_{in}(g_{n_1,…,n_d}) = ∫_{g_{n_1,…,n_d}}^{x} {|K_h(X_i − u)| − E|K_h(X − u)|} du,
while for i.i.d. X_1, …, X_n, n^{−1} Σ_{i=1}^n |ζ_{in}| = U_{a.s.}(n^{−1/2}(log n)^{1/2}).
Proof. This can be shown by applying Lemma 2.5.2 as in the proof of Lemma 2.5.5. □
Proof of Theorem 2.2.3. Under the conditions of Lemma 2.5.7, one has
max_{1≤n_α≤N_α} |F̂(g_{n_1,…,n_d}) − F(g_{n_1,…,n_d})| = O_{a.s.}(n^{−1/2} log n)
by Lemma 2.5.5. For ∀x = (x_1, ⋯, x_d) ∈ R^d, there exist integers n_1, ⋯, n_d such that
F(g_{n_1,…,n_d}) ≤ F(x) ≤ F(g_{n_1+1,…,n_d+1}). Hence |F̂(x) − F̂(g_{n_1,…,n_d})| is bounded by
n^{−1} |Σ_{i=1}^n ∫_{g_{n_1,…,n_d}}^{x} K_h(X_i − u) du| ≤ n^{−1} Σ_{i=1}^n ∫_{g_{n_1,…,n_d}}^{x} |K_h(X_i − u)| du
= n^{−1} Σ_{i=1}^n ∫_{g_{n_1,…,n_d}}^{x} {|K_h(X_i − u)| − E|K_h(X − u)|} du + ∫_{g_{n_1,…,n_d}}^{x} E|K_h(X − u)| du
= O_{a.s.}(n^{−1/2} log n)
according to Lemmas 2.5.7 and 2.5.8. Then, according to Lemma 2.5.5,
|F̂(x) − F(x)| ≤ |F̂(x) − F̂(g_{n_1,…,n_d})| + |F̂(g_{n_1,…,n_d}) − F(g_{n_1,…,n_d})| + |F(g_{n_1,…,n_d}) − F(x)|
= U_{a.s.}(n^{−1/2} log n) + U_{a.s.}(n^{−1/2} log n) + U(1/n),
and if X_1, ⋯, X_n are i.i.d., one can replace log n in the above inequality by (log n)^{1/2}. □
CHAPTER 3
Spline estimation of a semiparametric
GARCH model
3.1 Introduction
It is widely recognized that global smoothing methods, such as spline or wavelet smoothing, are
computationally much more efficient than local kernel smoothing; see for example the
comparison of computing times in Xue and Yang (2006b) and Wang and Yang (2007). Recent
development of regression spline smoothing in terms of local asymptotics (Huang (2003)),
of high dimensional and weakly dependent data (Huang and Yang (2004), Xue and Yang
(2006b) and Wang and Yang (2007)) has presented convincing incentives for applying spline
smoothing to solve challenging problems in time series analysis. We have applied cubic
spline smoothing to the semiparametric GARCH model (1.2.2), which results in a procedure
that is much faster but shares the same theoretical and numerical properties as the
kernel smoothing procedure in Yang (2006). Table 3 shows the computing time comparison
between the proposed cubic spline method and the local linear method in estimating the
parameter a_0. Clearly, the cubic spline method is superior for large samples, as its
computing time is proportional to n^{−1} times that of the local linear method. The
advantage of the spline method had already been recognized by Engle and Ng (1993), who
proposed spline estimation of the news impact curve for extensions of model (1.2.1), without
developing justification by asymptotic theory.
The chapter is organized as follows. In Section 3.2 we discuss the assumptions of the
model (1.2.2), the spline estimation of the unknown parameter a_0, and its asymptotic
properties, including its oracle efficiency. In Section 3.4 we describe the implementation of
the estimator. In Sections 4 and 5 we apply the method to simulated and empirical examples.
All technical proofs are given in the Appendix.
3.2 Estimation Method
The statistical inference of the semiparametric GARCH model (1.2.2) consists of estimating
both the parameter a_0 and the link function m. In this chapter we focus on estimating the
parameter a_0: once a_0 is estimated with √n-consistency, the estimation of the function m
is a routine application of univariate smoothing.
The following assumptions on the data generating process are used:
A1: The process {Y_t}_{t=−∞}^∞ is strictly stationary, and the innovations {ξ_t}_{t∈Z} have finite
r-th absolute moments E|ξ_t|^r = m_r < ∞, 0 < r ≤ 6.
A2: The link function m(·) is positive everywhere on R^+ and has Lipschitz-continuous
4-th derivative.
For convenience, define X_t = Σ_{j=1}^∞ a_0^{j−1} Y_{t−j}², t ∈ Z, which simplifies model (1.2.2) to
Y_t = m^{1/2}(X_t) ξ_t, σ_t² = m(X_t), t ∈ Z, while the process {X_t}_{t=−∞}^∞ satisfies the Markovian
equation X_t = a_0 X_{t−1} + m(X_{t−1}) ξ_{t−1}², t ∈ Z. Since a_0 is an unknown parameter in (0, 1), to
make numerical optimization feasible, we assume that a_0 lies in the interior of A = [a_1, a_2],
where 0 < a_1 < a_2 < 1 are boundary values known a priori. In practice, one takes
a sufficiently small a_1 and a sufficiently large a_2 based on prior knowledge of the data. Define
next X_{a,t} as a series analogous to X_t but with any candidate value a ∈ A:
X_{a,t} = Σ_{j=1}^∞ a^{j−1} Y_{t−j}² = Σ_{j=1}^∞ a^{j−1} m(X_{t−j}) ξ_{t−j}², t ∈ Z. (3.2.1)
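Given a finite sample Y_1, …, Y_n, a series like (3.2.1) is usually computed by a truncated recursion, since X_{a,t} = Y_{t−1}² + a X_{a,t−1}; the sketch below truncates the infinite past at the start of the sample (a simplification, not the thesis code):

```python
import numpy as np

def transformed_series(Y, a):
    """Truncated version of X_{a,t} = sum_{j=1}^{t-1} a^{j-1} Y_{t-j}^2,
    computed via the recursion X_{a,t} = Y_{t-1}^2 + a X_{a,t-1}; returns
    values for t = 2, ..., n (t = 1 has no observed past)."""
    Y2 = np.asarray(Y, dtype=float) ** 2
    X = np.empty(len(Y2) - 1)
    acc = 0.0
    for t in range(1, len(Y2)):
        acc = Y2[t - 1] + a * acc   # add the newest squared value, discount the rest
        X[t - 1] = acc
    return X
```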
We need the following assumptions on the processes {X_{a,t}}_{t=−∞}^∞, a ∈ A.
A3: The processes {X_{a,t}}_{t=−∞}^∞, a ∈ A, are jointly strictly stationary and geometrically
α-mixing, i.e., the α-mixing coefficient satisfies α(k) ≤ cρ^k for constants c > 0, 0 < ρ < 1,
where
α(k) = sup_{A∈σ(X_{a,t}, t≤0, a∈A), B∈σ(X_{a,t}, t≥k, a∈A)} |P(A)P(B) − P(A ∩ B)|.
From Assumption (A3) and the fact that the innovations {ξ_t}_{t=−∞}^∞ are i.i.d., the joint
distribution of (Y_t, ξ_t, X_{a,t}, a ∈ A) is strictly stationary. For each a ∈ A, define the
transformed variables for the X_{a,t} as
U_{a,t} = F_a(X_{a,t}) = {F_{a1}(X_{a,t}) + F_{a2}(X_{a,t})}/2, 1 ≤ t ≤ n, (3.2.2)
in which F_{a1} and F_{a2} are the cdfs of X_{a1,t} and X_{a2,t}, respectively. In particular, we denote
U_t = U_{a_0,t} = F(X_{a_0,t}) = F(X_t).
A4: The pdf associated with F is f(x) > 0, ∀x ∈ (0, +∞), and U_{a,t} has a pdf φ_a(·) which is
Lipschitz continuous; moreover, there exist constants c_φ and C_φ such that inf_{a∈A, 0≤u≤1} φ_a(u) ≥
c_φ and sup_{a∈A, 0≤u≤1} φ_a(u) ≤ C_φ.
For any a ∈ A, define the predictor of Y_t² based on U_{a,t} as g_a(u) = E(Y_t²|U_{a,t} = u), 0 <
u < 1. In particular, denote g(U_t) = g_{a_0}(U_{a_0,t}) = E(Y_t²|U_{a_0,t}) = m(X_t). Define the risk
function of a as R(a) = E{Y_t² − g_a(U_{a,t})}². Apparently {Y_t}_{t=−∞}^∞ has finite 4-th moment
due to Assumptions (A1) and (A2), so R(a) allows the usual bias-variance decomposition
R(a) = E{g(U_t) − g_a(U_{a,t})}² + (m_4 − 1)Eg²(U_t) which, together with g(U_t) ≡ g_{a_0}(U_{a_0,t}),
implies that
R(a) = E{g(U_t) − g_a(U_{a,t})}² + R(a_0) ≥ R*.
Denote the space of Lipschitz-continuous functions on [a, b] as Lip([a, b], C) = {g : |g(x) − g(x')| ≤ C|x − x'|, ∀x, x' ∈ [a, b]}.
We mean by "∼" both sides having the same order as n → ∞. We denote by I_{d_1×d_1}
the d_1 × d_1 identity matrix, and 0_{d_1×d_1} the d_1 × d_1 zero matrix. For any vector x =
(x_1, x_2, ⋯, x_{d_2}), we denote the supremum and Euclidean norms as |x| = max_{1≤α≤d_2} |x_α|
and ‖x‖ = (Σ_{α=1}^{d_2} x_α²)^{1/2}.
We need the following assumptions on the data generating process.
(A1) The tuning variable X = (X_1, …, X_{d_2}) has a continuous probability density function
f(x) that satisfies 0 < c_f ≤ min_{x∈χ} f(x) ≤ max_{x∈χ} f(x) ≤ C_f < ∞ for some
constants c_f and C_f, and f(x) = 0, x ∉ χ = [0, 1]^{d_2}.
(A2) There exist constants 0 < c_Q ≤ C_Q < +∞ and 0 < c_δ ≤ C_δ < +∞ and some δ >
1/2, such that c_Q I_{d_1×d_1} ≤ Q(x) = {q_{l,l'}(x)}_{l,l'=1}^{d_1} = E(TT^T | X = x) ≤ C_Q I_{d_1×d_1}
and c_δ ≤ E{(T_l T_{l'})^{2+δ} | X = x} ≤ C_δ for all x ∈ χ and l, l' = 1, …, d_1.
(A3) The vector process {ς_t}_{t=−∞}^∞ = {(Y_t, X_t, T_t)}_{t=−∞}^∞ is strictly stationary and
geometrically strongly mixing, that is, its α-mixing coefficient satisfies α(k) ≤ cρ^k for constants
c > 0, 0 < ρ < 1, where α(k) = sup_{A∈σ(ς_t, t≤0), B∈σ(ς_t, t≥k)} |P(A)P(B) − P(A ∩ B)|.
(A4) The coefficient components satisfy m_{0l} ∈ C^1[0, 1], m_{αl} ∈ Lip([0, 1], C_∞), ∀1 ≤ α ≤ d_2, 1 ≤
l ≤ d_1, with m_{1l} ∈ C^2[0, 1], ∀1 ≤ l ≤ d_1.
(A5) The conditional variance function σ²(x, t) is measurable and bounded. The errors
{ε_i}_{i=1}^n satisfy E(ε_i|F_i) = 0, E(ε_i²|F_i) = 1, E(|ε_i|^{2+η}|F_i) ≤ C_η for some η ∈
(1/2, 1], where the sequence of σ-fields is F_i = σ{(X_j, T_j), j ≤ i; ε_j, j ≤ i − 1} for i =
1, …, n.
(A6) The marginal density f_1(x_1) of X_1 and the conditional second moment matrix
function Q_1(x_1) defined in (4.2.3) both have continuous derivatives on [0, 1].
Assumptions (A1)-(A5) are common in the literature; see for instance Huang & Yang
(2004), Huang & Shen (2004), and especially Xue & Yang (2006b). Assumption (A6) is
needed only for the asymptotic theory of the oracle "kernel smoother", but not for the oracle
"local linear smoother". Assumption (A2) implies also that for all x_α ∈ [0, 1], 1 ≤ α ≤ d_2
and l, l' = 1, …, d_1,
c_Q I_{d_1×d_1} ≤ Q_α(x_α) = {q_{l,l'}(x_α)}_{l,l'=1}^{d_1} = E(TT^T | X_α = x_α) ≤ C_Q I_{d_1×d_1}, (4.2.3)
E{(T_l T_{l'})^{2+δ} | X_α = x_α} ≤ C_δ.
Furthermore, Assumptions (A2) and (A5) imply that for some constant C_1 > 0,
max_{1≤l≤d_1} E|T_l|^{2+η} ≤ C_1 max_{1≤l≤d_1} E|T_l T_l|^{2+δ} = C_1 max_{1≤l≤d_1} E|T_l|^{4+2δ} ≤ C_1 C_δ < +∞. (4.2.4)
At one referee's request, we provide here insight into the relationship allowed between the
vectors T and X under Assumption (A2). It is instructive to first understand what T and
X can not be in the context of identifiability of the functions {m_{αl}(x_α)}_{l=1,α=1}^{d_1,d_2}. Suppose
that the vector X is centered so that EX = 0. Then model (1.3.1) is unidentifiable when
(T_1, T_2) = (X_1, X_2), since −3X_2T_1 + 3X_1T_2 ≡ 0, E(−3X_2) = E(3X_1) = 0, and the function
m(x, t) in (1.3.1) can be expressed as
Σ_{l=3}^{d_1} {m_{0l} + Σ_{α=1}^{d_2} m_{αl}(x_α)} t_l + {m_{01} + m_{21}(x_2) + Σ_{α=1,α≠2}^{d_2} m_{α1}(x_α)} t_1
+ {m_{02} + m_{12}(x_1) + Σ_{α=2}^{d_2} m_{α2}(x_α)} t_2
≡ Σ_{l=3}^{d_1} {m_{0l} + Σ_{α=1}^{d_2} m_{αl}(x_α)} t_l + {m_{01} + m_{21}(x_2) − 3x_2 + Σ_{α=1,α≠2}^{d_2} m_{α1}(x_α)} t_1
+ {m_{02} + m_{12}(x_1) + 3x_1 + Σ_{α=2}^{d_2} m_{α2}(x_α)} t_2,
so one can use m*_{21}(x_2) = m_{21}(x_2) − 3x_2 and m*_{12}(x_1) = m_{12}(x_1) + 3x_1 to replace m_{21}(x_2)
and m_{12}(x_1) without changing the data generating process (1.3.1). In other words, the
functions m_{21}(x_2) and m_{12}(x_1) are unidentifiable. Xue and Yang (2006a), p.2523 gave a
similar counterexample, and discussed why an unidentifiable model may perform better for
prediction.
More generally, it is revealing to note that Assumption (A2) not only rules out the above
anomaly, but also does not allow the possibility that two of the T_l's (1 ≤ l ≤ d_1) are
almost surely equal to two Borel functions of X. To see this, suppose that (T_1, T_2) =
{φ_1(X), φ_2(X)} a.s. for some Borel functions φ_1 and φ_2. Assumption (A2) implies that
c_Q I_{2×2} ≤ E{ [T_1², T_1T_2; T_1T_2, T_2²] | X = x } ≤ C_Q I_{2×2}, ∀x ∈ χ,
leading to
c_Q I_{2×2} ≤ [φ_1²(x), φ_1(x)φ_2(x); φ_1(x)φ_2(x), φ_2²(x)] ≤ C_Q I_{2×2}, a.s., ∀x ∈ χ,
which can not be true, as for any x ∈ χ the 2 × 2 matrix in the above is singular and thus can
not be ≥ c_Q I_{2×2}. That Assumption (A2) guarantees the identifiability of model (1.3.1) has
been established in Lemma 1 of Xue and Yang (2006b). It is important to observe, however,
that Assumption (A2) does not exclude the case of one T_l, 1 ≤ l ≤ d_1, almost surely equal
to a Borel function of X.
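The singularity argument is easy to verify numerically: with T = (φ_1(X), φ_2(X)), the conditional second-moment matrix is the rank-one outer product below, so its determinant vanishes at every x (φ_1, φ_2 are arbitrary illustrative functions, not from the text):

```python
import numpy as np

def conditional_moment_matrix(x, phi1, phi2):
    """E(TT^T | X = x) when T = (phi1(X), phi2(X)): the rank-one matrix
    [[phi1^2, phi1*phi2], [phi1*phi2, phi2^2]] evaluated at x."""
    v = np.array([phi1(x), phi2(x)])
    return np.outer(v, v)

M = conditional_moment_matrix(0.3, lambda x: 1.0 + x, lambda x: x**2)
# Rank one, hence singular: the smallest eigenvalue and the determinant are 0,
# so no lower bound c_Q * I_2 with c_Q > 0 can hold.
```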
4.3 Oracle Smoothers
We now introduCe what is known as the oracle smoother in Wang & Yang (2007) as a bench-
mark for evaluating the estimators. Denote for any vector x = (x1 , 2:2, - - - , xd2) the deleted
vector X_1 = (x2,~- 133112) and for the random vector X,- = (Xi1,X,-2,--- ’Xidz) the
deleted vector X,,_1 = (Xig, . -- aXidz): 1 S i S n. For any 1 S l S d1, write m4) (x_1) =
mg; + 222:2 ma] (ma). Denote the vector of pseudo-responses Y1 = (Y1,1,- -- ,Yn,1)T in
which
d1 ' d1
Yi,1 = Yi — 297101 + "1-1,: (Xi,-1)}Til = Zm11(X11)"-’}z+ U (Xi: Ti) 62'.
1:1 [=1
These would have been the "responses" had the unknown functions {m_{−1,l}(x_{−1})}_{l=1}^{d_1} been given.

Δ_n = sup_z |P{σ_n^{−1} S_n < z} − Φ(z)| ≤ c_1 (d_η/σ_n^η) {log(σ_n/c_0^{1/2})/λ}^{1+η}
for any λ with λ_1 ≤ λ ≤ λ_2, where
λ_1 = c_2 {log(σ_n/c_0^{1/2})}^b/n, b > 2(1 + η)/η; λ_2 = 4(2 + η)η^{−1} log(σ_n/c_0^{1/2}).
For the η in Assumption (A5), set λ = 4(2 + η)η^{−1} log(σ_n/c_0^{1/2}); then by (4.2.4) one has
d_η = max_{1≤i≤n} {E|Σ_{l'=1}^{d_1} λ_{l'} ξ_{i,l'}|^{2+η}} = max_{1≤i≤n} {E|Σ_{l'=1}^{d_1} λ_{l'} T_{il'} σ(X_i, T_i) ε_i K_h(X_{i1} − x_1)|^{2+η}}
≤ C Σ_{l'=1}^{d_1} E{|K_h(X_{i1} − x_1)|^{2+η}} = O{h^{−(1+η)}},
i.e., Δ_n = O{h^{−(1+η)}/σ_n^η} = O{n^{(1+η/2)/5−η/2}} = O(n^{1/5−2η/5}) → 0 when 1/2 < η ≤
1. So S_n/σ_n → N(0, 1), and then √(nh) Σ_{l'=1}^{d_1} λ_{l'} V_{l'}(x_1) → N(0, λ^T Σ λ). By the Cramér-Wold
device, one has √(nh) {V_{l'}(x_1)}_{l'=1}^{d_1} → N(0, Σ). Then according to Slutsky's theorem, one
has
√(nh) E(TT^T|X_1 = x_1) {m̂_{K,1,l'}(x_1) − m_{1,l'}(x_1) − Σ_{l=1}^{d_1} b_{l,l'}(x_1) h²}_{l'=1}^{d_1} → N(0, Σ),
i.e., √(nh) {m̂_{K,1,l'}(x_1) − m_{1,l'}(x_1) − Σ_{l=1}^{d_1} b_{l,l'}(x_1) h²}_{l'=1}^{d_1} →
N(0, Q_1(x_1)^{−1} Σ Q_1(x_1)^{−1}), where Q_1(x_1) is defined in (4.2.3). □
Proof of Theorem 4.3.2. Let D_n = n^a with a < 2/5, a(2 + η) > 1, a(1 + η) > 2/5,
which requires η > 1/2. Rewrite Z_i = T_{il'} ε_i = Z_{i,1}^{D_n} + Z_{i,2}^{D_n} + Z_{i,3}^{D_n}, where Z_{i,1}^{D_n} =
Z_i 1{|Z_i| > D_n}, Z_{i,2}^{D_n} = Z_i 1{|Z_i| ≤ D_n} − EZ_i 1{|Z_i| ≤ D_n}, Z_{i,3}^{D_n} = EZ_i 1{|Z_i| ≤ D_n}. Define
ξ_{i,n,l',j} = K_h(X_{i1} − x_1) σ(X_i, T_i) Z_{i,j}^{D_n}, j = 1, 2, 3.
According to Assumption (A5) and (4.2.4), one has
Σ_{n≥1} P(|Z_n| > D_n) ≤ Σ_{n≥1} E|Z_i|^{2+η}/D_n^{2+η} = Σ_{n≥1} E{|T_{l'}|^{2+η} E(|ε|^{2+η}|X, T)}/D_n^{2+η}
≤ C Σ_{n≥1} n^{−a(2+η)} < ∞.
By the Borel-Cantelli lemma, one has with probability 1, n^{−1} Σ_{i=1}^n ξ_{i,n,l',1} = 0 for large n.
Therefore, one has sup_{x_1∈[0,1]} |n^{−1} Σ_{i=1}^n ξ_{i,n,l',1}| = U(n^{−k}) for any k > 0. Using
Assumption (A5) and (4.2.4),
|Z_{i,3}^{D_n}| = |EZ_i 1{|Z_i| > D_n}| ≤ E|Z_i|^{2+η}/D_n^{1+η}
= E{|T_{l'}|^{2+η} E(|ε|^{2+η}|X, T)}/D_n^{1+η} = O(n^{−2/5}).
Hence
n^{−1} Σ_{i=1}^n ξ_{i,n,l',3} = n^{−1} Σ_{i=1}^n K_h(X_{i1} − x_1) σ(X_i, T_i) Z_{i,3}^{D_n}
= n^{−1} Σ_{i=1}^n K_h(X_{i1} − x_1) O(n^{−2/5}) = O_p(n^{−2/5}).
Meanwhile,
E(Z_{i,2}^{D_n})² = EZ_i² 1{|Z_i| ≤ D_n} − (Z_{i,3}^{D_n})² = EZ_i² − EZ_i² 1{|Z_i| > D_n} − (Z_{i,3}^{D_n})²
= E{T_{il'}² E(ε_i²|X_i, T_i)} + U_p(D_n^{−η} + n^{−4/5}) = ET_{il'}² + U_p(D_n^{−η} + n^{−4/5}),
E ξ_{i,n,l',2}² = E{K_h(X_{i1} − x_1) σ(X_i, T_i) Z_{i,2}^{D_n}}²
= h^{−1} f(x_1) E(T_{l'}² σ²(X, T)|X_1 = x_1) ‖K‖_2² {1 + u(1)},
while for k ≥ 2,
E(|ξ_{i,n,l',2}|^k) = E(|ξ_{i,n,l',2}|^{k−2} |ξ_{i,n,l',2}|²) ≤ sup_{x_1∈[0,1]} |ξ_{i,n,l',2}|^{k−2} E(ξ_{i,n,l',2}²)
according to Assumption (A3) and the truncation of Z_{i,2}^{D_n}; then there exists a constant
C_1 = C_0 D_n/h such that E(|ξ_{i,n,l',2}|^k) ≤ C_1^{k−2} k! E(ξ_{i,n,l',2}²), k ≥ 2.
Similar to the proof of Lemma 4.7.4, we use Lemma 2.5.2 (Bernstein's inequality) with
k = 3, m_2² = E(ξ_{i,n,l',2}²) = O(h^{−1}), m_3 = max_{1≤i≤n} ‖ξ_{i,n,l',2}‖_3 ≤ C_6 D_n, and
ε_n = a log n/√(nh), so that
qε_n²/(25m_2² + 5c_1ε_n) ≥ c_3 a² log n,
a_1 = 2(n/q) + 2{1 + ε_n²/(25m_2² + 5c_1ε_n)} = O(log n),
a_2(3) = 11n{1 + C_6 D_n/(a n^{−1/2} h^{−1/2} log n)} = O(n²),
α([n/(q + 1)])^{6/7} ≤ {K_0 exp(−λ_0 [n/(q + 1)])}^{6/7} ≤ Cn^{−6λ_0c_2/7}.
Hence for c_2 and a large enough,
P{|n^{−1} Σ_{i=1}^n ξ_{i,n,l',2}| > a log n/√(nh)} ≤ O(log n) exp(−c_5 a² log n) + Cn^{2−6λ_0c_2/7}
= n^{−c_5a²} O(log n) + Cn^{2−6λ_0c_2/7},
so that
sup_{x_1∈[0,1]} |n^{−1} Σ_{i=1}^n ξ_{i,n,l',2}| = O_p{(nh)^{−1/2} log n}.
Combining the three pieces, sup_{x_1∈[0,1]} |n^{−1} Σ_{i=1}^n ξ_{i,n,l'}| = O_p{(nh)^{−1/2} log n}, i.e.,
sup_{x_1∈[0,1]} |V_{l'}(x_1)| = O_p{(nh)^{−1/2} log n} (4.7.8)
for the term V_{l'}(x_1) in (4.7.3). According to Lemma 4.7.5,
m̂_{K,1,l'}(x_1) − m_{1,l'}(x_1)
= [{ET_l T_{l'} K_h(X_1 − x_1)}_{l,l'=1}^{d_1} + O_p(n^{−1/2} log n)]^{−1} {Σ_{l=1}^{d_1} B_{l,l'}(x_1) + V_{l'}(x_1)}
= [{ET_l T_{l'} K_h(X_1 − x_1)}_{l,l'=1}^{d_1}]^{−1} {Σ_{l=1}^{d_1} B_{l,l'}(x_1) + V_{l'}(x_1)} + O_p(n^{−1/2} log n),
[{ET_l T_{l'} K_h(X_1 − x_1)}_{l,l'=1}^{d_1}]^{−1} = [f_1(x_1) Q_1(x_1) + u(h²)]^{−1}
= f_1^{−1}(x_1) Q_1^{−1}(x_1) + u(h²).
Meanwhile, according to Lemma 4.7.4 and (4.7.8),
Σ_{l=1}^{d_1} B_{l,l'}(x_1) + V_{l'}(x_1) = U_p(h^{1/2} log n/√n + h²) + U_p{(nh)^{−1/2} log n}.
According to Assumptions (A1) and (A2), f_1^{−1}(x_1) ≤ c_f^{−1} and C_Q^{−1} I_{d_1} ≤ Q_1^{−1}(x_1) ≤
c_Q^{−1} I_{d_1}, so sup_{x_1∈[h,1−h]} |m̂_{K,1,l'}(x_1) − m_{1,l'}(x_1)| = O_p{(nh)^{−1/2} log n}. □
4.7.3 Estimation of constants
To closely examine the noise terms, we denote the following vector of coefficients
â = {â_{0,1}, â_{1,1,1}, ⋯, â_{N+1,d_2,1}, â_{0,2}, â_{1,1,2}, ⋯, â_{N+1,d_2,2}, ⋯, â_{0,d_1}, â_{1,1,d_1}, ⋯, â_{N+1,d_2,d_1}}^T
(4.7.9)
such that the noise term ε̃(x) in (4.4.9) is expressed as
ε̃(x) = â_{0,l} + Σ_{α=1}^{d_2} Σ_{J=1}^{N+1} â_{J,α,l} B_{J,α}(x_α). (4.7.10)
Equation (4.7.10) implies that â = (D^T D)^{−1} D^T E, where
D = {D(X_1, T_1), ⋯, D(X_n, T_n)}^T = {T_1 ⊗ B(X_1), ⋯, T_n ⊗ B(X_n)}^T, (4.7.11)
B(x) = {1, B_{1,1}(x_1), ⋯, B_{N+1,d_2}(x_{d_2})}^T, t = {t_1, ⋯, t_{d_1}}^T. (4.7.12)
Note that â given in (4.7.9) can be rewritten as
â = (n^{−1} D^T D)^{−1} (n^{−1} D^T E), (4.7.13)
where by (4.7.11)
D^T D = Σ_{i=1}^n [(T_i T_i^T) ⊗ {B(X_i) B(X_i)^T}]
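The Kronecker structure in (4.7.11) can be checked numerically: each row of D is T_i ⊗ B(X_i), so D^T D is a sum of Kronecker products. Random stand-ins replace the actual T_i and spline bases B(X_i):

```python
import numpy as np

# Each row of D is T_i ⊗ B(X_i), so D^T D = sum_i (T_i T_i^T) ⊗ {B(X_i) B(X_i)^T}.
rng = np.random.default_rng(1)
n, d1, J = 20, 2, 4                      # sample size, dim(T_i), dim(B(X_i))
T = rng.normal(size=(n, d1))             # stand-in for the T_i vectors
B = rng.normal(size=(n, J))              # stand-in for the basis vectors B(X_i)
D = np.stack([np.kron(T[i], B[i]) for i in range(n)])
lhs = D.T @ D
rhs = sum(np.kron(np.outer(T[i], T[i]), np.outer(B[i], B[i])) for i in range(n))
assert np.allclose(lhs, rhs)             # the two assemblies agree exactly
```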