GOODNESS-OF-FIT TESTING OF ERROR DISTRIBUTION IN NONPARAMETRIC ARCH(1) MODELS AND LINEAR MEASUREMENT ERROR MODELS

By

Xiaoqing Zhu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics—Doctor of Philosophy

2015

ABSTRACT

GOODNESS-OF-FIT TESTING OF ERROR DISTRIBUTION IN NONPARAMETRIC ARCH(1) MODELS AND LINEAR MEASUREMENT ERROR MODELS

By

Xiaoqing Zhu

This thesis discusses the goodness-of-fit testing of an error distribution in a nonparametric autoregressive conditionally heteroscedastic model of order one and in the linear measurement error model.

For the nonparametric autoregressive conditionally heteroscedastic model of order one, the test is based on a weighted empirical distribution function of the residuals, where the residuals are obtained from a local linear fit of the autoregressive and heteroscedasticity functions, and the weights are chosen to adjust for the undesirable behavior of these nonparametric estimators in the tails of their domains. An asymptotically distribution free test is obtained via the Khmaladze martingale transformation. A simulation study is included to assess the finite sample level and power behavior of this test. It exhibits some superiority of this test over the classical Kolmogorov-Smirnov and Cramér-von Mises tests in terms of the finite sample level and power.

For the linear measurement error model, a class of test statistics is proposed, based on the integrated square difference between a deconvolution kernel density estimator of the regression model error density and a smoothed version of the null error density, an analog of the so-called Bickel-Rosenblatt test statistic. The asymptotic null distributions of the proposed test statistics are derived for both the ordinary smooth and super smooth cases. The asymptotic powers of the proposed tests against a fixed alternative and a class of local nonparametric alternatives are also described for both cases.
A finite sample simulation study shows some superiority of the proposed test compared to some other tests.

To my beloved parents, Shijun Zhu and Guangxia Lv, my brother, Yingming Zhu, and my boyfriend, Silong Zhang.

ACKNOWLEDGMENTS

Foremost, I would like to express my sincere gratitude and appreciation to my advisor, Dr. Hira L. Koul, for his continuous support of my Ph.D. study and research, and for his patient guidance, encouragement, valuable criticism, and immense knowledge. His enlightening ideas, constructive comments, numerous pieces of advice, and the countless hours he spent made it possible for me to finish this dissertation. His enthusiasm for research and his optimistic and energetic attitude will help me in my future life.

Besides my advisor, I would also like to thank Dr. Lyudmila Sakhanenko, Dr. David Todem, and Dr. Yimin Xiao for serving as members of my doctoral committee and for their invaluable suggestions.

My sincere thanks also go to Dr. Shiyuan Zhong from the Department of Geography at MSU, for offering me collaboration opportunities in her group and providing me chances to work on diverse, exciting projects. I would also like to thank Dr. Weixing Song from the Department of Statistics at Kansas State University for his interesting course on measurement error models. I would like to express my deepest appreciation to Dr. Zhiying Wen from Tsinghua University for his continuous support and invaluable encouragement all these years.

Thanks also go to the entire faculty and staff of the Department of Statistics and Probability who have taught me and helped me during my study at MSU. My special thanks go to Dr. Yimin Xiao for his interesting courses, valuable advice, and encouragement. Thanks to the Graduate School, the College of Natural Science, and the Department of Statistics and Probability, which provided me with the Dissertation Continuation Fellowship and traveling fellowships for working on my thesis and attending conferences.
This dissertation was also supported in part by the grant NSF DMS 1205271, P.I. Hira L. Koul.

I would also like to thank my academic family members: Liqian Cai, Bin Gao, Tao He, Abhishek Kaul, Lisi Pei, Dr. Xin Qi, Honglang Wang, Yuzhen Zhou, and all other students from the Department of Statistics and Probability, for the numerous discussions and all the fun we had in the last four years.

Last but not least, I would like to express my profound gratitude to my beloved parents, Shijun Zhu and Guangxia Lv, my older brother, Yingming Zhu, and my boyfriend, Silong Zhang, for their love, endless support, and encouragement all these years.

TABLE OF CONTENTS

LIST OF TABLES
KEY TO ABBREVIATIONS
Chapter 1  Introduction
Chapter 2  Nonparametric ARCH(1) Models
  2.1 Introduction
  2.2 Autoregressive and Variance Functions Estimation
  2.3 Goodness-of-fit Tests
    2.3.1 Asymptotic expansion for the weighted empirical distribution function
    2.3.2 Khmaladze martingale transformation
    2.3.3 Examples
    2.3.4 Limiting process
  2.4 Simulations
  2.5 Proofs
Chapter 3  Linear Measurement Error Models
  3.1 Introduction
  3.2 Asymptotic Null Distribution
    3.2.1 Ordinary smooth case
    3.2.2 Super smooth case
  3.3 Consistency and Asymptotic Power
  3.4 Simulations
    3.4.1 Ordinary smooth case
    3.4.2 Super smooth case
  3.5 Proofs
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1  Monte Carlo critical values of the KS and CvM tests.
Table 2.2  Empirical levels of the $U_n$ test.
Table 2.3  Empirical powers of tests based on $\hat\sigma_2^2$.
Table 3.1  Monte Carlo critical values of all the tests, ordinary smooth case.
Table 3.2  Empirical powers against chosen alternatives, ordinary smooth case.
Table 3.3  Empirical powers against mixture normal (left panel) and logistic alternatives, ordinary smooth case.
Table 3.4  Monte Carlo critical values of the $T_{KS}$, $T_{CvM}$, and $W_n$ tests, super smooth case.
Table 3.5  Empirical powers against alternative distributions, super smooth case.
Table 3.6  Empirical powers against mixture normal (left panel) and logistic distributions, super smooth case.

KEY TO ABBREVIATIONS

• ARCH(1): autoregressive conditionally heteroscedastic model of order 1.
• Gof: goodness-of-fit.
• KS: Kolmogorov-Smirnov.
• CvM: Cramér-von Mises.
• KK: Khmaladze and Koul (2009).
• MSW: Müller, Schick, and Wefelmeyer (2012).
• d.f.: distribution function.
• i.i.d.: independent and identically distributed.

Chapter 1

Introduction

One of the classical problems of statistical inference is to test whether a given random sample comes from a given continuous distribution. This is the so-called goodness-of-fit testing problem. A well known test for this problem is the Kolmogorov test based on the empirical distribution function, which is asymptotically distribution free. This is desirable because it makes the test implementable for large or moderate sample sizes. This property is lost as soon as a nuisance parameter is present in the testing problem, as happens to be the case when, for example, one is fitting a given distribution up to an unknown location parameter, or up to unknown location and scale parameters. Similarly, analogous tests based on the residual empirical process in regression or in autoregressive conditionally heteroscedastic time series models are not asymptotically distribution free for fitting a known distribution function (d.f.) to the error d.f.

One way to obtain asymptotically distribution free tests from the residual empirical process in these models is to base tests on its Khmaladze (1981) martingale transform. This has been done successfully in parametric and nonparametric regression models in Khmaladze and Koul (2004, 2009). Müller, Schick and Wefelmeyer (2012) developed an analogous transformation test, based on a certain weighted residual empirical process, for fitting a known error d.f. in nonparametric autoregressive time series models of order 1. Chapter 2 of this thesis pertains to developing and analyzing an analogously transformed process for fitting a known d.f. to the error d.f. in nonparametric autoregressive conditionally heteroscedastic time series models of order 1. The supremum test based on this transform is asymptotically distribution free.
A finite sample study shows the accuracy of the asymptotic null distribution of this test, and that its empirical power dominates that of the Kolmogorov test based on the weighted residual empirical process at all chosen alternatives, levels, and sample sizes.

Another way to obtain asymptotically distribution free tests in these problems is to assume that densities exist and use nonparametric density estimators to fit a given density. For fitting a known density in the one sample set up, Bickel and Rosenblatt (1973) were the first to investigate the asymptotic null distribution of a test based on an $L_2$-distance between a kernel type density estimator and its null expected value. The asymptotic null distribution of a suitably standardized version of this statistic was shown to be standard Gaussian. Since then, numerous papers have appeared proposing tests based on analogs of this statistic in various models having some nuisance parameters. A desirable property of this statistic is that its asymptotic null distribution is not affected by not knowing the nuisance parameters in the one sample location-scale model. Lee and Na (2002), Bachmann and Dette (2005), Horváth and Zitikis (2006) and Koul and Mimoto (2012) observed that this fact continues to hold for the analog of this statistic when fitting an error density based on residuals in parametric autoregressive and generalized autoregressive conditionally heteroscedastic time series models. A similar fact has been observed to hold by Ducharme and Lafaye de Micheaux (2004) in parametric autoregressive moving average models, by Cheng and Sun (2008) in parametric nonlinear autoregressive time series models, by Bercu and Portier (2008) for multivariate ARMAX models in adaptive tracking, and by Na (2009) for infinite-order autoregressive models.
Regression models in which covariates are not directly observable abound in real world applications, as is evidenced by the three monographs of Fuller (1987), Carroll, Ruppert and Stefanski (1995), and Cheng and Van Ness (1999). In these models one observes a surrogate for the covariates, contaminated with some error. These are known as measurement error regression models, or errors-in-variables regression models. Statistical inference in these models is highly sensitive to the knowledge of the error distributions. Knowing the regression model error distribution can help to develop efficient inference for the underlying parameters in these models. It is thus of interest to develop goodness-of-fit tests for fitting a known error density to the regression model error density in the presence of measurement error in the covariates.

Chapter 3 of this thesis pertains to developing goodness-of-fit tests for this testing problem in linear measurement error regression models. The test statistics are of the above $L_2$-distance type, based on a class of deconvolution error density estimators and the smoothed version of the null error density. Two types of tail behavior of the measurement error distribution are considered: the ordinary smooth case and the super smooth case. For each case, a comprehensive theoretical analysis of the asymptotic distributions of these statistics under the null hypothesis, under a fixed alternative, and under a sequence of local nonparametric alternatives is presented. A member of this class of tests is compared via a finite sample simulation with some other tests. It dominates several of these tests in terms of power at the chosen alternatives when the measurement error is large.

Chapter 2

Nonparametric ARCH(1) Models

2.1 Introduction

In recent years, there has been considerable focus on providing asymptotically distribution free tests for fitting a known error distribution in regression and in autoregressive and moving average models.
Boldin (1982, 1990), Koul (1991, 2002), Khmaladze and Koul (2004), Koul and Ling (2006), among others, focus on tests based on the residual empirical distribution function (d.f.) in parametric cases. Khmaladze and Koul (2009) provide martingale transform tests based on the residual empirical d.f. for nonparametric regression models, and Müller, Schick, and Wefelmeyer (2012) provide similar tests for fitting an error distribution in semiparametric partially linear regression models. The focus of the present chapter is to analyze an analog of the above tests for fitting an error distribution in nonparametric autoregressive conditionally heteroscedastic models of order 1 (ARCH(1)).

One of the main problems faced here is the construction of the nonparametric residuals so that the corresponding residual empirical d.f. obeys a uniform asymptotic linearity expansion up to the first order. Müller et al. (2009) obtained this type of result for nonparametric homoscedastic autoregressive time series models of order 1. In this chapter we extend this result to a class of ARCH(1) models.

The chapter is organized as follows. In section 2, we introduce the local linear estimators of the autoregressive and variance functions and state their uniform strong consistency. The asymptotic uniform linear expansion of a suitably standardized weighted residual empirical process based on the corresponding residuals, and the asymptotic distributions of the test based on the martingale transform of these weighted residual empirical processes, are established in section 3. Several examples of error d.f.'s where the results of this chapter are applicable are also discussed in section 3. A simulation study in section 4 shows that the finite sample power of the martingale transform test is uniformly higher than that of the Kolmogorov-Smirnov test based on a weighted residual empirical process at all chosen alternatives.
This finding is consistent with that reported in Khmaladze and Koul (2009) (KK) when dealing with nonparametric regression models. The same simulation study also shows some superiority of the proposed test over the Cramér-von Mises test based on a weighted residual empirical process in terms of the finite sample level and power at the chosen alternatives. The proofs of some technical results pertaining to the nonparametric estimators of the autoregressive and heteroscedasticity functions, and those of the asymptotic uniform linearity of the weighted residual empirical process, are deferred to the last section of this chapter, section 2.5.

One of the novelties of this chapter is the implementation of the Khmaladze martingale transform test in ARCH(1) models even when the incomplete Fisher information matrix is singular. In the location set up alone this matrix is known to be singular for the double exponential error distribution. In this chapter we note that this matrix is singular also for a class of t-distributions in the present location-scale context, unlike in the location set up, where it is nonsingular, as was noted in KK.

2.2 Autoregressive and Variance Functions Estimation

Consider the nonparametric ARCH model of order 1,
$$X_i = m(X_{i-1}) + \sigma(X_{i-1})\varepsilon_i, \quad i \in \mathbb{Z} := \{0, \pm 1, \cdots\}, \qquad (2.2.1)$$
where $\varepsilon_i$, $i \in \mathbb{Z}$, are independent copies of a standardized random variable (r.v.) $\varepsilon$, and $\varepsilon_i$ is independent of $X_{i-1}$, for all $i \in \mathbb{Z}$. Note that then
$$m(x) = E(X_i \mid X_{i-1} = x), \qquad \sigma^2(x) = E\{(X_i - m(X_{i-1}))^2 \mid X_{i-1} = x\}, \quad x \in \mathbb{R},\ i \in \mathbb{Z}.$$

Let $F$ be a known d.f. We are interested in testing the hypothesis that the d.f. of $\varepsilon$ is $F$. Any test of such a hypothesis has to be based on the estimated residuals, which in turn need suitable estimators of the nonparametric functions $m$ and $\sigma$. Several researchers have investigated numerous nonparametric estimators of $m$ and $\sigma$ in regression and autoregressive models.
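As a quick illustration (not part of the thesis's simulation study of section 2.4), a path from model (2.2.1) can be generated as follows; the particular choices of $m$ and $\sigma$ below are hypothetical and serve only to show the recursion.

```python
import math
import random

def simulate_arch1(n, m, sigma, burn_in=200, seed=12345):
    """Draw a path from X_i = m(X_{i-1}) + sigma(X_{i-1}) * eps_i (model (2.2.1)),
    with i.i.d. standard normal innovations eps_i."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for i in range(n + burn_in):
        x = m(x) + sigma(x) * rng.gauss(0.0, 1.0)
        if i >= burn_in:            # discard burn-in to approximate stationarity
            path.append(x)
    return path

# Hypothetical autoregressive and heteroscedasticity functions (illustrative only):
m_fn = lambda x: 0.3 * x
sigma_fn = lambda x: math.sqrt(0.5 + 0.2 * x * x)
X = simulate_arch1(500, m_fn, sigma_fn)
```

Under these illustrative choices the drift and volatility coefficients are small enough for the simulated path to remain stable; any test of the error d.f. would then be computed from residuals estimated from such a path.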
In order to use these estimators in the above testing problem, one needs their uniform consistency. For homoscedastic regression models with a bounded dependent variable, Ojeda (2008) established the Hölder continuity properties of the local polynomial estimators of the regression function in the one-dimensional covariate case. For heteroscedastic regression models, Neumeyer and Van Keilegom (2010) established the uniform consistency of the local polynomial estimators of the regression and variance functions in the case of multidimensional covariates. To estimate the variance function, they use estimators of the type $\hat a - \hat m^2$ (see also Yao and Tong (1994)), where $\hat a(x)$ and $\hat m(x)$ are estimators of $E(Y^2 \mid X = x)$ and $m(x)$, respectively.

For homoscedastic autoregressive models, Masry (1996) proved the uniform consistency over compact sets of multivariate local polynomial estimators of the autoregressive function, provided the time series is $\alpha$-mixing. For stationary and ergodic autoregressive time series of order 1, MSW proved the uniform consistency, over a sequence of compact intervals increasing to $\mathbb{R}$, of the local linear estimators of the autoregressive function. For the one-dimensional $\alpha$-mixing time series model, Neumeyer and Selk (2013) proved the uniform consistency, over a sequence of compact intervals increasing to $\mathbb{R}$, of the Nadaraya-Watson estimators of the autoregressive and variance functions. Fan and Yao (1998) provided the asymptotic properties of an efficient fully adaptive estimator of the variance function, i.e., the local linear estimator of $E\{(Y - \hat m(X))^2 \mid X = x\}$, in the one-dimensional $\beta$-mixing case. As an alternative to mixing conditions, Wu, Huang and Huang (2010) gave a moment contracting condition describing the dependence properties of a general autoregressive model, and established the uniform consistency, over a bounded compact set, of Nadaraya-Watson type estimators of the autoregressive function.
Based on the moment contracting condition for a one-dimensional stationary autoregressive model, Borkowski and Mielniczuk (2012) established the asymptotic distributional properties of the efficient fully adaptive local linear estimator of the variance function $E\{(Y - \hat m(X))^2 \mid X = x\}$.

To proceed further, we now define the estimators of interest here. Let $K$ and $W$ be kernel density functions and let $h_1$ and $h_2$ be bandwidths. Define
$$(\hat a_0(x), \hat b_0(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[X_i - \alpha - \beta(X_{i-1} - x)\big]^2 K\Big(\frac{X_{i-1} - x}{h_1}\Big), \quad x \in \mathbb{R}. \qquad (2.2.2)$$
Note that $\hat a_0(x)$ and $\hat b_0(x)$ are the local linear estimators of $m(x)$ and of the first derivative $\dot m(x)$ of $m(x)$, respectively. Henceforth, $\hat m(x) = \hat a_0(x)$.

To estimate $\sigma^2(x)$, we shall consider the following two methods. The first one is based on Yao and Tong (1994), where $\hat\sigma^2(x) \equiv \hat\sigma_1^2(x) = \hat a_1(x) - \hat m^2(x)$, and
$$(\hat a_1(x), \hat b_1(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[X_i^2 - \alpha - \beta(X_{i-1} - x)\big]^2 W\Big(\frac{X_{i-1} - x}{h_2}\Big). \qquad (2.2.3)$$
The second estimator is based on the work of Fan and Yao (1998), who suggested an efficient fully adaptive procedure, $\hat\sigma^2(x) \equiv \hat\sigma_2^2(x) = \hat a_2(x)$, where
$$(\hat a_2(x), \hat b_2(x)) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \big[\hat r_i - \alpha - \beta(X_{i-1} - x)\big]^2 W\Big(\frac{X_{i-1} - x}{h_2}\Big). \qquad (2.2.4)$$
Here $\hat r_i = [X_i - \hat m(X_{i-1})]^2$. We shall show that both of these estimators of $\sigma^2(x)$ yield the same asymptotic result for the proposed goodness-of-fit tests under similar conditions.

Here we shall present some consistency results for these estimators. In order to do so, we need the following assumptions. In the sequel, for any twice differentiable function $g$, $\dot g$ and $\ddot g$ denote the first and second derivatives of $g$, respectively. All limits are taken as $n \to \infty$, unless specified otherwise.

Assumptions:

(E) There exists some $b > 1 + \sqrt3$ such that $E[|X_0|^{2b}] < \infty$ and $E[|\varepsilon_1|^{2b}] < \infty$.

(F) The innovations $\varepsilon_j$, $j \in \mathbb{Z}$, are i.i.d. $F$. The density $f$ of $F$ is continuously differentiable, and $\sup_{x\in\mathbb{R}} |x f(x)| < \infty$ as well as $\sup_{x\in\mathbb{R}} |x^2 \dot f(x)| < \infty$.
(H) The sequences of bandwidths satisfy $h_i = \alpha_i c_n$, $i = 1, 2$, with $\alpha_i > 0$, $c_n \to 0$, and
$$(\log n)^\eta/(n c_n^{2+\sqrt3}) \to 0, \qquad n c_n^4 (\log n)^\eta \to 0, \quad \forall\ \eta > 0. \qquad (2.2.5)$$
If $\hat\sigma_2^2(x)$ is used, $c_n$ also satisfies
$$(\log n)^\eta/(n c_n^{3.8}) \to 0, \quad \forall\ \eta > 0. \qquad (2.2.6)$$

(I) The two sequences of real numbers $a_n$, $b_n$ satisfy the following conditions: $a_n < 0 < b_n$; $-a_n$ and $b_n$ tend to infinity such that, for some $0 \le r_1 < \infty$, $(b_n - a_n) = O((\log n)^{r_1})$, and $P(X_0 \le a_n + \lambda) + P(X_0 \ge b_n - \lambda) = o((\log n)^{-1})$, for any $\lambda > 0$.

(KZ) For the $\alpha$-mixing process, the kernel density $K$ is supported on $[-1, 1]$, symmetric around 0, and three times differentiable, with all three derivatives bounded. Moreover, $K(1) = \dot K(1) = 0$. The kernel $W$ satisfies the same conditions.

(KZ$'$) For the geometric moment contracting process, the kernel density $K$ is supported on $[-1, 1]$, symmetric around 0, and three times continuously differentiable. The kernel $W$ satisfies the same conditions.

(M) The functions $m$ and $\sigma$ are four times differentiable, and there exist constants $0 < d_1 < d_2 < \infty$, $0 \le r_q, r_s < \infty$, and sequences $q_n$, $q_{n,\sigma}$ such that, for all sufficiently large $n$, $d_1 < q_n < d_2 (\log n)^{r_q}$, $d_1 < q_{n,\sigma} < d_2 (\log n)^{r_s}$, $\sup_{x \in [a_n - h_1,\, b_n + h_1]} |m^{(k)}(x)| = O(q_n)$ and $\sup_{x \in [a_n - h_2,\, b_n + h_2]} |\sigma^{(k)}(x)| = O(q_n)$, $k = 0, 1, 2, 3, 4$, and $(\inf_{x \in I_n} |\sigma(x)|)^{-1} = O(q_{n,\sigma})$, where $h_1$, $h_2$ are as in (H) above.

(X) The observations $X_j$, $j \in \mathbb{Z}$, have a common marginal density $g$, which is bounded and four times differentiable with bounded derivatives. The density is also bounded away from zero on compact intervals. There exists some $0 \le r_g < \infty$ such that $q_{n,g} = (\inf_{x \in I_n} g(x))^{-1} = O((\log n)^{r_g})$, where $I_n := [a_n, b_n]$, with $a_n$, $b_n$ as in Assumption (I).

(Z) The process $(X_j)_{j\in\mathbb{Z}}$ is $\alpha$-mixing with mixing coefficient $\alpha(n) = O(n^{-\kappa})$, for some
$$\kappa > \max\Big\{\frac{(3+\sqrt3)b + 2 + \sqrt3}{2\big[(1+\sqrt3)b - 2(2+\sqrt3)\big]},\ 7\Big\}.$$
Moreover, $\sup_{x\in\mathbb{R}} (|m(x)| + |\sigma(x)|)^{2k} g(x) < \infty$, and there exists a $j^* \ge 1$ such that
$$\sup_{x, x' \in \mathbb{R}} (|m(x)| + |\sigma(x)|)^k (|m(x')| + |\sigma(x')|)^k g_{X_0, X_{j-1}}(x, x') < \infty, \quad \forall\ j > j^* + 1,$$
for $k = 1, 2$, where $g_{U,V}$ denotes the joint density of any two r.v.'s $(U, V)$.

(Z$'$) $X_n = J(\cdots, \varepsilon_{n-1}, \varepsilon_n)$ is measurable with respect to the $\sigma$-field generated by $\cdots, \varepsilon_{n-1}, \varepsilon_n$. Also, $(X_t)_{t\in\mathbb{Z}}$ is geometric moment contracting, i.e., with $\|Y\|_p = (E|Y|^p)^{1/p}$, for $n > 0$, some $q > 1$, and $0 < r < 1$,
$$\|X_n - X_n^*\|_q = O(r^n),$$
where $X_n^* = J(\cdots, \varepsilon_{-1}, \varepsilon_0^*, \cdots, \varepsilon_{n-1}, \varepsilon_n)$ and $\varepsilon_0^*$ is an independent copy of $\varepsilon_0$.

The above assumptions (E), (F), (H), (I), (KZ), (M), (X) and (Z) are similar to the conditions in Neumeyer and Selk (2013) for mixing processes. Assumption (Z$'$) is similar to that in Borkowski and Mielniczuk (2012) for processes satisfying the moment contracting condition, and the kernel conditions (KZ$'$) are similar to those in Müller, Schick, and Wefelmeyer (2009) (MSW). The relation (2.2.6) in assumption (H) is needed only for the analysis of $\hat\sigma_2^2(x)$.

We are now ready to state a uniform consistency result for the above estimators of $m$ and $\sigma^2$. Its proof is deferred to the last section. Throughout the chapter, $I_n := [a_n, b_n]$, with $a_n$, $b_n$ as in Assumption (I).

Lemma 2.2.1 Suppose (2.2.1), (F), (H), (I), (KZ) or (KZ$'$), (X), (Z) or (Z$'$), and (M) hold. Then
$$\sup_{x\in I_n}\Big|\frac{\hat m(x) - m(x)}{\sigma(x)}\Big| = O_p\big(c_n^{-1/2} n^{-1/2} (\log n)^{1/2} Q_n\big), \qquad (2.2.7)$$
$$\sup_{x\in I_n}\Big|\frac{\hat\sigma_i(x) - \sigma(x)}{\sigma(x)}\Big| = O_p\big(c_n^{-1/2} n^{-1/2} (\log n)^{1/2} Q_n^2\big), \quad i = 1, 2, \qquad (2.2.8)$$
where $Q_n = q_n q_{n,g} q_{n,\sigma}$.

The next section describes the proposed weighted empirical d.f. $\hat F$, the Khmaladze martingale transform test based on $\hat F$, its asymptotic distribution under the null hypothesis, and the computation of the test statistic for several distributions.

2.3 Goodness-of-fit Tests

2.3.1 Asymptotic expansion for the weighted empirical distribution function

To begin with, we need to introduce the weighted residual empirical d.f.
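The residuals entering this empirical d.f. come from the local linear fits (2.2.2)-(2.2.4). As a minimal numerical sketch (not the thesis's implementation), the fit at a point and the Fan-Yao style standardized residuals can be computed as below; the quartic kernel (which satisfies $K(1) = \dot K(1) = 0$, cf. (KZ)) and the bandwidth are hypothetical choices, and the tail weighting introduced next is omitted here.

```python
def local_linear(x, xs, ys, h):
    """Local linear fit at x: minimize sum_i K((x_i - x)/h)*(y_i - a - b(x_i - x))^2,
    as in (2.2.2); returns (a_hat, b_hat)."""
    K = lambda u: (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0   # quartic kernel on [-1,1]
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        w, d = K((xi - x) / h), xi - x
        s0 += w; s1 += w * d; s2 += w * d * d
        t0 += w * yi; t1 += w * yi * d
    det = s0 * s2 - s1 * s1                  # 2x2 weighted normal equations
    if det == 0.0:
        return float("nan"), float("nan")
    return (s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det

def residuals(X, h):
    """Standardized residuals (X_j - m_hat(X_{j-1})) / sigma_hat_2(X_{j-1}), with
    sigma_hat_2^2 the fit (2.2.4) to the squared residuals r_hat_i (Fan-Yao)."""
    lagged, resp = X[:-1], X[1:]
    m_hat = [local_linear(x, lagged, resp, h)[0] for x in lagged]
    r_hat = [(y - m) ** 2 for y, m in zip(resp, m_hat)]
    # clip the variance fit away from zero; the clipping level is an ad hoc safeguard
    s2_hat = [max(local_linear(x, lagged, r_hat, h)[0], 1e-8) for x in lagged]
    return [(y - m) / s ** 0.5 for y, m, s in zip(resp, m_hat, s2_hat)]
```

On data that are exactly linear in a neighborhood of $x$, the weighted least squares fit recovers the intercept and slope exactly, which is a convenient correctness check for the solver.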
Unlike in the regression case, MSW noted that the dependence and unboundedness of the observations create some technical difficulties in autoregressive time series models, because of the poor performance of the estimator $\hat m(x)$ for large values of $x$. They used only those residuals $\hat\varepsilon_j = X_j - \hat m(X_{j-1})$ for which $X_{j-1}$ falls in the interval $I_n = [a_n, b_n]$. Analogously, we use the following weighted residual empirical process.

Fix a $\lambda > 0$. Let $\omega_n(x) \in (0, 1)$ be a sequence of functions, arbitrarily defined for $x$ in the intervals $[a_n, a_n + \lambda)$ and $(b_n - \lambda, b_n]$. In addition, assume that $\omega_n(x)$ is three times differentiable in $x$, with all three derivatives uniformly bounded, i.e., $\sup_{n\in\mathbb{N}}\sup_{x\in\mathbb{R}} |\omega_n^{(j)}(x)| < \infty$, $j = 1, 2, 3$, and satisfies
$$\omega_n(x) = \begin{cases} 1, & x \in [a_n + \lambda,\ b_n - \lambda], \\ 0, & x \notin [a_n, b_n]. \end{cases} \qquad (2.3.1)$$

Let $\omega_{nj} = \omega_n(X_{j-1})$ and $\bar\omega_j = \omega_{nj}/\sum_{i=1}^n \omega_{ni}$, $j = 1, \cdots, n$. Let $\hat\varepsilon_j := (X_j - \hat m(X_{j-1}))/\hat\sigma(X_{j-1})$, where $\hat m$, $\hat\sigma$ are as in the previous section. Then the weighted residual empirical d.f. of interest is
$$\hat F(x) = \sum_{j=1}^n \bar\omega_j\, I(\hat\varepsilon_j \le x), \quad x \in \mathbb{R}. \qquad (2.3.2)$$
We also need the empirical d.f. based on the true errors,
$$F_n(x) = \frac1n \sum_{j=1}^n I[\varepsilon_j \le x], \quad x \in \mathbb{R}.$$

For the one-dimensional homoscedastic autoregressive model, where $\hat\varepsilon_j = X_j - \hat m(X_{j-1})$, MSW established, under the null hypothesis and some conditions, that
$$\sup_{x\in\mathbb{R}}\Big|\hat F(x) - F_n(x) - f(x)\,\frac1n\sum_{j=1}^n \varepsilon_j\Big| = o_p(n^{-1/2}).$$
Neumeyer and Selk (2013) obtained an analogous result for the ARCH(1) model (2.2.1), using nonparametric residuals based on Nadaraya-Watson type estimators of the autoregressive and variance functions. Under some conditions, they proved that
$$\sup_{x\in\mathbb{R}}\Big|\hat F(x) - F_n(x) - f(x)\,\frac1n\sum_{j=1}^n \Big[\varepsilon_j + \frac{x}{2}(\varepsilon_j^2 - 1)\Big]\Big| = o_p(n^{-1/2}). \qquad (2.3.3)$$

Theorem 2.3.1 below shows that this result continues to hold when residuals are based on the local linear fitting of $m(x)$ and $\sigma^2(x)$ as defined in (2.2.2)-(2.2.4).
Theorem 2.3.1 Under the assumptions (2.2.1), (E), (F), (H), (I), (KZ) or (KZ$'$), (M), (X) and (Z) or (Z$'$), (2.3.3) continues to hold.

2.3.2 Khmaladze martingale transformation

The classical tests for the goodness-of-fit testing of an error distribution are the Kolmogorov-Smirnov (KS) and Cramér-von Mises (CvM) tests. Using the asymptotic expansion (2.3.3), we readily obtain the following

Corollary 2.3.1 Under the conditions of Theorem 2.3.1,
$$KS = n^{1/2}\sup_{x\in\mathbb{R}}|\hat F(x) - F(x)| \to_d \sup_{x\in\mathbb{R}}|R(x)|, \qquad CvM = n\int (\hat F(x) - F(x))^2\, d\hat F(x) \to_d \int R^2(x)\,dF(x),$$
where $R(x)$ is a zero-mean Gaussian process with covariance function
$$\mathrm{Cov}(R(x_1), R(x_2)) = E\Big[\Big(I(\varepsilon \le x_1) - F(x_1) + f(x_1)\big(\varepsilon + \tfrac{x_1}{2}(\varepsilon^2 - 1)\big)\Big)\Big(I(\varepsilon \le x_2) - F(x_2) + f(x_2)\big(\varepsilon + \tfrac{x_2}{2}(\varepsilon^2 - 1)\big)\Big)\Big].$$

Clearly, these limiting null distributions depend on $F$ in a complicated fashion, and to date no theoretical results about their quantiles are available, which makes these tests impractical to implement, even for large samples. Instead, we propose to use the Khmaladze martingale transformation of $\hat F$ to obtain asymptotically distribution free tests.

To proceed further, as in KK, assume $F$ has an absolutely continuous density $f$ with a.e. derivative $\dot f$. Let $\psi_f(x) = -\dot f(x)/f(x)$. We assume
$$I(f) = \int \psi_f^2(x)\,dF(x) = \int \frac{\dot f^2(x)}{f(x)}\,dx < \infty. \qquad (2.3.4)$$
Note that $E\varepsilon^2 < \infty$ and (2.3.4) imply
$$\int [x\psi_f(x) - 1]^2\,dF(x) < \infty. \qquad (2.3.5)$$
Thus (2.3.4) and (2.3.5) guarantee the finiteness of the Fisher information for the location and scale parameters.

Consider the extended score function vector $h(x) = (1, \psi_f(x), x\psi_f(x) - 1)^T$ for the location-scale family $F((y - \theta)/\sigma)$, with respect to both $\theta$ and $\sigma$, at $\theta = 0$ and $\sigma = 1$. Define the incomplete information matrix
$$\Gamma_F(x) = \int_x^\infty h(y)h^T(y)\,dF(y) = \begin{pmatrix} 1 - F(x) & f(x) & x f(x) \\ f(x) & \int_x^\infty \dot f^2(y)/f(y)\,dy & \int_x^\infty (f(y) + y\dot f(y))\dot f(y)/f(y)\,dy \\ x f(x) & \int_x^\infty (f(y) + y\dot f(y))\dot f(y)/f(y)\,dy & \int_x^\infty (f(y) + y\dot f(y))^2/f(y)\,dy \end{pmatrix}.$$

Suppose $\Gamma_F(x)$ is nonsingular for all $x \in \mathbb{R}$, and define, as in KK, for a signed measure $v$,
$$K(x, v) = \int_{-\infty}^x h^T(y)\,\Gamma_F^{-1}(y) \int_y^\infty h(z)\,dv(z)\,dF(y), \quad x \in \mathbb{R}. \qquad (2.3.6)$$
If we define the vector function
$$H(x) = \int_{-\infty}^x h\,dF = (F(x), -f(x), -x f(x))^T,$$
then, analogous to (2.4) of KK, we obtain
$$H^T(x) - K(x, H^T) = 0, \quad \forall\ x \in \mathbb{R}. \qquad (2.3.7)$$

Let $\hat v_n(x) = \sqrt n\,[\hat F(x) - F(x)]$ and $v_n(x) = \sqrt n\,[F_n(x) - F(x)]$, $x \in \mathbb{R}$. The Khmaladze martingale transformed processes $\hat U_n$ and $U_n$ are defined as
$$\hat U_n(x) = \sqrt n\,[\hat F(x) - K(x, \hat F)] = \hat v_n(x) - K(x, \hat v_n), \qquad U_n(x) = \sqrt n\,[F_n(x) - K(x, F_n)] = v_n(x) - K(x, v_n). \qquad (2.3.8)$$
Based on the asymptotic expansion (2.3.3), we can rewrite
$$\hat U_n(x) = U_n(x) + \eta_n(x), \qquad \eta_n(x) = \xi_n(x) - K(x, \xi_n),$$
$$\xi_n(x) = \hat v_n(x) - v_n(x) - f(x)\,\frac{1}{\sqrt n}\sum_{j=1}^n\Big[\varepsilon_j + \frac{x}{2}(\varepsilon_j^2 - 1)\Big], \qquad \sup_x |\xi_n(x)| = o_p(1).$$

If the matrix $\Gamma_F(x)$ is singular, then $\Gamma_F^{-1}(x)$ cannot be uniquely defined. But the above transformation is still well defined, as is evidenced in the following lemma. This lemma is an extension of Lemma 2.1 of KK, suitable for the location-scale set up. As mentioned in KK, it is an adaptation and simplification of a more general argument presented in Nikabadze (1987) and Tsigroshvili (1998).

Lemma 2.3.1 Suppose, for some $x_0$ such that $0 < F(x_0) < 1$, the matrix $\Gamma_F(x)$, for $x > x_0$, degenerates to the form
$$\Gamma_F(x) = (1 - F(x))\begin{pmatrix} 1 & 1 & x \\ 1 & 1 & x \\ x & x & x^2 + 1 \end{pmatrix}, \quad \forall\ x > x_0, \qquad (2.3.9)$$
or
$$\Gamma_F(x) = (1 - F(x))\begin{pmatrix} 1 & \frac{k}{x} & k \\[2pt] \frac{k}{x} & \frac{k(k+1)^2}{(k+2)x^2} & \frac{k^2}{x} \\[2pt] k & \frac{k^2}{x} & k^2 \end{pmatrix}, \quad \forall\ x > x_0, \text{ some } k > 0. \qquad (2.3.10)$$
Then, in both cases, the equalities (2.3.7) and, hence, (2.3.8) are still valid. Besides, for (2.3.9),
$$h^T(x)\Gamma_F^{-1}(x)\int_x^\infty h(y)\,dv_n(y) = -\,\frac{2v_n(x) - \int_x^\infty v_n(y)\,dy}{1 - F(x)}, \quad x \in \mathbb{R}; \qquad (2.3.11)$$
for (2.3.10),
$$h^T(x)\Gamma_F^{-1}(x)\int_x^\infty h(y)\,dv_n(y) = -\,\frac{(k+1)\Big[2v_n(x) + (k+2)x\displaystyle\int_x^\infty \frac{v_n(y)}{y^2}\,dy\Big]}{k\,(1 - F(x))}, \quad x \in \mathbb{R}. \qquad (2.3.12)$$
The conclusions (2.3.11) and (2.3.12) continue to hold with $v_n$ replaced by $\hat v_n$.

Proof. The proof of this lemma is similar to that of Lemma 2.1 of KK, which was proved for the location model only, where the analog of $\Gamma$ is $2\times2$. In the present set up $\Gamma$ is a $3\times3$ matrix, which creates some complexity. For the sake of self-containment and completeness, we give the details needed to deal with this situation.

When $\Gamma_F(x)$ is degenerate of the form (2.3.9), $h(x) = (1, 1, x - 1)^T$. The image of the linear operator $\Gamma_F(x)$ on $\mathbb{R}^3$ is
$$\mathcal I(\Gamma_F(x)) = \{b : b = \Gamma_F(x)\,a, \text{ for some } a \in \mathbb{R}^3\} = \{b : b = (1 - F(x))(\beta, \beta, \beta x + \gamma)^T,\ \beta, \gamma \in \mathbb{R}\},$$
and the kernel of this operator is
$$\mathcal K(\Gamma_F(x)) = \{a : \Gamma_F(x)\,a = 0\} = \{a : a = \alpha(1, -1, 0)^T,\ \alpha \in \mathbb{R}\}.$$
To prove the equalities (2.3.7), it suffices to show that for any $b \in \mathcal I(\Gamma_F(x))$ and $a \in \mathcal K(\Gamma_F(x))$,
$$h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)(b + a) = h(x)^T(b + a).$$
Note that for any such $b$ and $a$,
$$\Gamma_F(x)(b + a) = \Gamma_F(x)\,b = \big(2\beta + \beta x^2 + \gamma x,\ 2\beta + \beta x^2 + \gamma x,\ 3\beta x + \beta x^3 + \gamma x^2 + \gamma\big)^T.$$
For any $g = (\lambda, \lambda, \lambda x + \eta)^T \in \mathcal I(\Gamma_F(x))$, if $\Gamma_F(x)\,g = \Gamma_F(x)\,b$, then
$$2\lambda + \lambda x^2 + \eta x = 2\beta + \beta x^2 + \gamma x, \qquad 3\lambda x + \lambda x^3 + \eta x^2 + \eta = 3\beta x + \beta x^3 + \gamma x^2 + \gamma.$$
From these two equations we obtain $\lambda = \beta$ and $\eta = \gamma$. Hence $\Gamma_F^{-1}(x)$ is any linear operator on $\mathcal I(\Gamma_F(x))$ such that
$$\Gamma_F^{-1}(x)\Gamma_F(x)\,b = b + a_1, \quad a_1 \in \mathcal K(\Gamma_F(x)).$$
From this fact, and since $h^T a = 0$ for any $a \in \mathcal K(\Gamma_F(x))$, we obtain
$$h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)(b + a) = h(x)^T \Gamma_F^{-1}(x)\Gamma_F(x)\,b = h(x)^T(b + a_1) = h(x)^T(b + a).$$
Similarly, one proves (2.3.7) in the case where $\Gamma_F(x)$ is degenerate of the form (2.3.10). This completes the proof of (2.3.7), which in turn yields the claims (2.3.11) and (2.3.12) for $v_n$ and $\hat v_n$ in an obvious way.

Sometimes it is convenient to use the time transformation $t = F(x)$, $u_n(t) = v_n(F^{-1}(t))$, $\hat u_n(t) = \hat v_n(F^{-1}(t))$, $\gamma(t) = h(F^{-1}(t))$, and $\Gamma_t = \int_t^1 \gamma(s)\gamma(s)^T\,ds$, $0 \le t \le 1$.
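As a computational aside, the weighted residual empirical d.f. of (2.3.2) and the raw Kolmogorov-Smirnov distance appearing in Corollary 2.3.1 are straightforward to evaluate from residuals and weights; the sketch below uses a standard normal null and equal weights purely as an example.

```python
import math

def phi_cdf(x):
    """Standard normal d.f. (an example null F)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def weighted_edf_ks(resid, weights, F):
    """sup_x |F_hat(x) - F(x)| for the weighted residual empirical d.f.
    F_hat(x) = sum_j w_bar_j I(eps_hat_j <= x), cf. (2.3.2).
    The supremum is attained just before or at a jump point of F_hat."""
    total = sum(weights)
    sup, cum = 0.0, 0.0
    for e, w in sorted(zip(resid, weights)):
        Fe = F(e)
        sup = max(sup, abs(cum - Fe))   # just before the jump at e
        cum += w / total                # w_bar_j = w_j / sum_i w_i
        sup = max(sup, abs(cum - Fe))   # at the jump
    return sup

D = weighted_edf_ks([-1.0, 0.0, 1.0], [1.0, 1.0, 1.0], phi_cdf)
# the KS statistic of Corollary 2.3.1 would be sqrt(n) * D
```

With equal weights this reduces to the classical Kolmogorov distance; the tail weights $\bar\omega_j$ of (2.3.1)-(2.3.2) simply enter through the `weights` argument.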
Now consider a function-parametric version of the u- and un-processes and their transforms: for ϕ ∈ L₂[0, 1],

u(ϕ) = ∫_0^1 ϕ(s) du(s),   un(ϕ) = ∫_0^1 ϕ(s) dun(s),

K(ϕ) = K(ϕ, u) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) du(s) dt,
Kn(ϕ) = K(ϕ, un) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) dun(s) dt,

b(ϕ) = u(ϕ) − K(ϕ),   bn(ϕ) = un(ϕ) − Kn(ϕ).

Write b(t) and bn(t) for b(ϕ) and bn(ϕ), respectively, when ϕ(·) = I(· ≤ t). Then

b(t) = u(t) − ∫_0^t γᵀ(z) Γz^{−1} ∫_z^1 γ(s) du(s) dz,   t ∈ [0, 1],   (2.3.13)
bn(t) = un(t) − ∫_0^t γᵀ(z) Γz^{−1} ∫_z^1 γ(s) dun(s) dz,   t ∈ [0, 1].

If Φ ⊂ L₂[0, 1] is a subset of square integrable functions such that the sequence un(ϕ), n ≥ 1, is equicontinuous on Φ, uniformly in n, then un →d u in ℓ∞(Φ), where u is the standard Brownian bridge and ℓ∞(Φ) is the set of all uniformly bounded real valued functions on Φ (see van der Vaart and Wellner (1996)).

The following theorem describes the weak convergence of the process Kn(ϕ), ϕ ∈ Φ. It is an extension of Theorem 2.1 of KK, which is valid for the location model only, to the location-scale model.

Theorem 2.3.2 (i) Let L₂,ε ⊂ L₂[0, 1] be the subspace of all square integrable functions which are equal to 0 on the interval (1 − ε, 1]. Then Kn →d K on L₂,ε, for any 0 < ε < 1.

(ii) Let, for any arbitrarily small but fixed ε > 0, C < ∞, and α < 1/2, Φ_ε ⊂ L₂[0, 1] be the class of all square integrable functions satisfying the following right tail condition:

|ϕ(s)| ≤ C [γᵀ(s) Γs^{−1} γ(s)]^{−1/2} (1 − s)^{−1/2−α},   ∀ s > 1 − ε.   (2.3.14)

Then Kn →d K on Φ_ε.

The following theorem describes the weak limit of the bn process and is an extension of Theorem 2.2 of KK to the location-scale set up. Recall that, as in van der Vaart and Wellner (1996), a family of Gaussian random variables b(ϕ), ϕ ∈ Φ, Φ ⊂ L₂[0, 1], that is continuous on Φ with covariance function E b(ϕ)b(ϕ′) = ∫_0^1 ϕ(t)ϕ′(t) dt, is called Brownian motion on Φ.

Theorem 2.3.3 (i) Let Φ be a Donsker class, that is, let un →d u in ℓ∞(Φ).
Then, for every ε > 0, bn →d b in ℓ∞(Φ ∩ Φ_ε), where {b(ϕ), ϕ ∈ Φ} is standard Brownian motion.

(ii) If the envelope function Ψ(t) of (2.3.14) tends to a positive (finite or infinite) limit at t = 1, then for the process (2.3.13) we have bn →d b on [0, 1].

2.3.3 Examples

Here we shall assess the behavior of γᵀ(s)Γs^{−1}γ(s), as s → 1, for some well known distributions. This is needed to understand the behavior of the bound in (2.3.14), which in turn sheds some light on the class of functions ϕ one can use in this testing problem. Many technical details are similar to those appearing in KK, who deal with the location model only, where Γs is a 2 × 2 matrix. In the current set up we are dealing with a 3 × 3 matrix, which makes the derivations somewhat more involved.

First, let F be the standard normal d.f. Then h(x) = (1, x, x² − 1)ᵀ. With ζ ≡ ζ(x) = f(x)/(1 − F(x)), we obtain

Γ_{F(x)} = (1 − F(x)) ×
    [ 1    ζ           xζ             ]
    [ ζ    1 + xζ      (1 + x²)ζ      ]
    [ xζ   (1 + x²)ζ   2 + (x + x³)ζ  ],

Γ_{F(x)}^{−1} = {(1 − F(x)) D(x)}^{−1} ×
    [ 2 − ζ² + 3xζ − x²ζ² + x³ζ    −2ζ                      ζ² − xζ         ]
    [ −2ζ                          2 + xζ − x²ζ² + x³ζ      −ζ + xζ² − x²ζ  ]
    [ ζ² − xζ                      −ζ + xζ² − x²ζ           1 − ζ² + xζ     ],

where D(x) = 2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ, and

hᵀ(x) Γ_{F(x)}^{−1} h(x) = (1 − F(x))^{−1} (3 − 4ζ² + 4xζ + x²ζ² + x⁴ − 2x³ζ) / (2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ).

Using the asymptotic expansion for the tail of the normal d.f., we obtain, as in KK,

ζ(x) = x / (1 − S(x)),   where S(x) = Σ_{i≥1} (−1)^{i−1} (2i − 1)!! / x^{2i} = 1/x² − 3/x⁴ + 15/x⁶ − ··· .

From this one can derive that

(3 − 4ζ² + 4xζ + x²ζ² + x⁴ − 2x³ζ) / (2 − 3ζ² + 3xζ + xζ³ − 2x²ζ² + x³ζ) → 9/5,   x → ∞,

and hence hᵀ(x) Γ_{F(x)}^{−1} h(x) ∼ 9(1 − F(x))^{−1}/5, x → ∞, or equivalently,

γᵀ(s) Γs^{−1} γ(s) ∼ 9(1 − s)^{−1}/5,   s → 1.

This result is similar to the one obtained in KK for the location model only, where 9/5 is replaced by 2.

Next, consider the logistic d.f. F(x) with scale parameter 1, or equivalently ψf(x) = 2F(x) − 1.
Then h(x) = (1, 2F(x) − 1, x(2F(x) − 1) − 1)ᵀ, or, in terms of s = F(x), γ(s) = h(F^{−1}(s)) = (1, 2s − 1, F^{−1}(s)(2s − 1) − 1)ᵀ. When s is close to 1, Γs ∼ (1 − s) M(s), where, with x = F^{−1}(s), the matrix M(s) has first row and column (1, s, xs), entry M₂₂(s) = (1 − 2s + 4s²)/3, and the remaining entries are similar, if lengthier, exact expressions. From this expansion one can verify that γᵀ(s)Γs^{−1}γ(s) ∼ (1 − s)^{−1}, as s → 1. This result differs from the one reported in KK, where the analogous γ and Γs satisfy γᵀ(s)Γs^{−1}γ(s) = 4(1 − s)^{−1}, for all 0 ≤ s < 1.

Next, consider the double exponential d.f. with density f(x) = e^{−|x|}/2. For x > 0 we get h(x) = (1, 1, x − 1)ᵀ, and Γ_{F(x)} is degenerate and equal to (2.3.9). An argument similar to the proof of Lemma 2.3.1 yields hᵀ(x) Γ_{F(x)}^{−1} h(x) = 2(1 − F(x))^{−1}, for all x > 0 with F(x) < 1.

Finally, consider the Student t_k distribution with k degrees of freedom. In this case,

f(x) = (1/√(πk)) (Γ((k+1)/2)/Γ(k/2)) (1 + x²/k)^{−(k+1)/2}.

As shown in KK, using the results of Soms (1976), for every k ≥ 1,

1 − F(x) ∼ d_k / x^k,   f(x) ∼ k d_k / x^{k+1},   d_k = (1/√π) (Γ((k+1)/2)/Γ(k/2)) k^{k/2},

ψf(x) = ((k+1)/k) x / (1 + x²/k) ∼ (k+1)/x,   x → ∞.

Hence h(x) = (1, ψf(x), x ψf(x) − 1)ᵀ ∼ (1, (k+1)/x, k)ᵀ, and Γ_{F(x)} degenerates and has the form (2.3.10). This is unlike the location model case, where KK observed that the analog of Γ_{F(x)} is non-degenerate. Nevertheless, one still has the same right tail behavior for the quadratic form γ(s)ᵀΓs^{−1}γ(s) as in the location model case, viz.,

γ(s)ᵀ Γs^{−1} γ(s) ∼ {2(k + 1)/k} (1 − s)^{−1},   s → 1.

2.3.4 Limiting process

In this section we discuss the weak convergence of the Ûn process. Towards this goal we assume the same tail condition for v̂n as in KK, which is that for some 0 < β < 1/2,

sup_{y > x} |v̂n(y)| / (1 − F(y))^β = op(1),   as x → ∞,   (2.3.15)

uniformly in n.
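Returning briefly to the normal example above: the tail expansion ζ(x) = x/(1 − S(x)) is easy to check numerically. The sketch below (my illustration, hypothetical helper names) compares the exact normal hazard ζ(x) = f(x)/(1 − F(x)) with the three-term truncation S(x) = 1/x² − 3/x⁴ + 15/x⁶.

```python
from scipy.stats import norm

def zeta(x):
    # exact standard normal hazard f(x) / (1 - F(x)); norm.sf is the survival function
    return norm.pdf(x) / norm.sf(x)

def zeta_approx(x):
    # truncated tail expansion S(x) = 1/x^2 - 3/x^4 + 15/x^6
    S = 1.0 / x**2 - 3.0 / x**4 + 15.0 / x**6
    return x / (1.0 - S)
```

Already at x = 5 the two values agree to within about 10⁻³, and the agreement improves rapidly as x grows, which is why the expansion suffices for the limit computations above.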
To simplify the notation, we let ψ₁(x) = −ḟ(x)/f(x), ψ₂(x) = −x ḟ(x)/f(x) − 1, and denote the right tail conditional moments of ψ₁ and ψ₂ by

E_x ψᵢ = E[ψᵢ(e₁) | e₁ > x],   ψᵢ₀ = ψᵢ − E_x ψᵢ,   Var_x(ψᵢ) = Var[ψᵢ(e₁) | e₁ > x],   i = 1, 2,
Cov_x(ψ₁, ψ₂) = Cov[ψ₁(e₁), ψ₂(e₁) | e₁ > x].

Now we formulate three more conditions on F:

(a) For any ε > 0, the functions ψᵢ(F^{−1}), i = 1, 2, are monotone on [1 − ε, 1].

(b) For some δ > 0, ε > 0 and some C < ∞, and for all x such that F(x) > 1 − ε,

(1 − F(x)) |hᵀ(x) Γ_{F(x)}^{−1} (0, ψ₁₀(x), 0)ᵀ| = |ψ₁₀² Var_x(ψ₂) − ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)| / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] ≤ C (1 − F(x))^{−2δ},

(1 − F(x)) |hᵀ(x) Γ_{F(x)}^{−1} (0, 0, ψ₂₀(x))ᵀ| = |ψ₂₀² Var_x(ψ₁) − ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)| / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] ≤ C (1 − F(x))^{−2δ}.

Note that in terms of the above notation, with t = F(x),

γᵀ(t) Γt^{−1} γ(t) = (1 − F(x))^{−1} × { 1 + [ψ₁₀² Var_x(ψ₂) + ψ₂₀² Var_x(ψ₁) − 2ψ₁₀ψ₂₀ Cov_x(ψ₁, ψ₂)] / [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)] }.

Hence condition (b) implies

γᵀ(t) Γt^{−1} γ(t) ≤ C (1 − t)^{−1−2δ},   ∀ t > 1 − ε.   (2.3.16)

(c) For some 0 < C < ∞ and β > 0 as in (2.3.15),

|∫_x^∞ [1 − F(y)]^β dψᵢ(y)| ≤ C |ψᵢ₀(x)|,   i = 1, 2.

Remark 2.3.1 As mentioned in KK, (2.3.15) also holds for vn for any 0 < β < 1/2. Conditions (a), (b) and (c) are easy to check for all the examples in Section 2.3.3, even with δ = 0 in condition (b), by following similar procedures, so we omit the details here.

Now we consider the asymptotic behavior of K(ϕ, ξn), which is

K(ϕ, ξn) = ∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(F^{−1}(ds)) dt,

for a given indexing class Φ of functions from L₂[0, 1]. Let Φ ∘ F = {ϕ(F(·)), ϕ ∈ Φ}. We can prove a result on the limiting process for Ûn similar to Theorem 4.1 in KK.

Theorem 2.3.4 (i) Suppose conditions (2.3.15) and (a)-(c) are satisfied with β > δ. Then, on the class Φ_ε as in Theorem 2.3.2, with α < β − δ, we have

sup_{ϕ ∈ Φ_ε} |K(ϕ, ξn)| = op(1),   n → ∞.
Therefore, if Φ is a Donsker class, then, for every ε > 0, Ûn →d b in ℓ∞(Φ ∩ Φ_ε ∘ F), where {b(ϕ), ϕ ∈ Φ} is standard Brownian motion.

(ii) If, in addition, δ < α, then for the time transformed process Ûn(F^{−1}(·)) of (2.3.8), Ûn(F^{−1}(·)) →d b(·) in D[0, 1].

Proof. The proof below is similar to that of Theorem 4.1 in KK. Note that

γᵀ(t) Γt^{−1} (0, a₁, 0)ᵀ = [ψ₁₀ Var_x(ψ₂) − ψ₂₀ Cov_x(ψ₁, ψ₂)] a₁ / {(1 − F(x)) [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)]},
γᵀ(t) Γt^{−1} (0, 0, a₂)ᵀ = [ψ₂₀ Var_x(ψ₁) − ψ₁₀ Cov_x(ψ₁, ψ₂)] a₂ / {(1 − F(x)) [Var_x(ψ₁)Var_x(ψ₂) − Cov_x²(ψ₁, ψ₂)]}.

These equalities, used with aᵢ = ∫_t^1 (1 − s)^β dψᵢ(F^{−1}(s)), i = 1, 2, combined with conditions (b) and (c), yield

|γᵀ(t) Γt^{−1} (0, a₁, a₂)ᵀ| ≤ C (1 − t)^{−1−2δ},   ∀ 1 − ε < t < 1.   (2.3.17)

Now we prove the first claim.

(i) Denote ξn(t) = ξn(x) with t = F(x). Because of the singularities at t = 0 and t = 1 in both integrals in K(ϕ, ξn), we will isolate the neighborhood of t = 1; the neighborhood of t = 0 can be treated more easily. First assume Γt > 0 for all t < 1. Then

∫_0^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) dt
    = ∫_0^{1−ε} ϕ(t) γᵀ(t) Γt^{−1} ∫_t^{1−ε} γ(s) ξn(ds) dt
    + ∫_0^{1−ε} ϕ(t) γᵀ(t) Γt^{−1} ∫_{1−ε}^1 γ(s) ξn(ds) dt
    + ∫_{1−ε}^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) dt.

We shall show that each of these three terms is op(1). First consider the third summand on the right-hand side. By definition,

ξn(t) = ûn(t) − un(t) − f(F^{−1}(t)) n^{−1/2} Σ_{i=1}^n [εᵢ + (F^{−1}(t)/2)(εᵢ² − 1)].

The third summand is then the sum of two terms, one corresponding to the difference ûn − un and the other corresponding to the remaining term. Now, since df(F^{−1}(s)) = ψf(x) f(x) dx and d[F^{−1}(s) f(F^{−1}(s))] = [1 + x ψf(x)] f(x) dx, F(x) = s, the quantity

∫_{1−ε}^1 ϕ(t) γᵀ(t) Γt^{−1} ∫_t^1 γ(s) [df(F^{−1}(s)) + d(F^{−1}(s) f(F^{−1}(s)))] dt

is the sum of the second and third coordinates of ∫_{1−ε}^1 ϕ(t) γ(t) dt, and is small for small ε anyway.
Assumption (a) guarantees the monotonicity of ψf (F −1 ) and dF −1 (s)f (F −1 (s)), so the integration by parts is justified, and we obtain 1 1−ε ϕ(t)γ T (t)Γ−1 t 1 = 1−ε 1 γ(t)ˆ un (ds)dt t ϕ(t)γ T (t)Γ−1 − γ(t)ˆ un (t) − t 1 uˆn (s)dγ(s) dt. t Using assumption (2.3.14) on ϕ and (2.3.16), we obtain 1 1−ε ϕ(t)γ T (t)Γ−1 un (t)dt t γ(t)ˆ 1 ≤ C 1−ε 1 ≤ C 1−ε which is small for small 1/2 [γ T (t)Γ−1 t γ(t)] 1 (1 − t)1+α+δ−β 1 (1 − t)1/2+α−β |ˆ un (t)| β t>1−ε (1 − t) dt sup |ˆ un (t)| , β t>1−ε (1 − t) dt sup as soon as α < β − δ. Note that t1 uˆn (s)dΓ(s) = T 0, t1 uˆn (s)dψf (F −1 (s)), t1 uˆn (s)d(F −1 (s)ψf (F −1 (s)) . Using monotonicity of ψf (F −1 (s)) and F −1 (s)ψf (F −1 (s)) for small enough ε, we obtain, 27 for all t > 1 − ε, 1 t 1 t uˆn (s)dψf (F −1 (s)) < C uˆn (s)d(F −1 (s)ψf (F −1 (s))) < C 1 t |ˆ un (t)| ; β t>1−ε (1 − t) (1 − s)β dψf (F −1 (s)) sup 1 t (2.3.18) |ˆ un (t)| . β t>1−ε (1 − t) (1 − s)β d(F −1 (s)ψf (F −1 (s))) sup Therefore, using (2.3.17), for the double integral 1 1−ε ϕ(t)γ T (t)Γ−1 t 1 1 uˆn (s)dγ(s)dt ≤ C t |ˆ un (t)| , β t>1−ε (1 − t) (1 − t)−1−2δ dt sup 1−ε which is small as soon as α < β − δ. The same conclusion is true for uˆn replaced by un . Since (2.3.18) implies the smallness of 1 and uˆn (s)d(F −1 (s)ψf (F −1 (s))) and 1−ε 1 1−ε 1 uˆn (s)dψf (F −1 (s)) 1−ε 1 1−ε un (s)dψf (F −1 (s)); un (s)d(F −1 (s)ψf (F −1 (s))), to prove that the middle summand on the right-hand side is small one needs only finiteness of ψ1 (x), ψ2 (x) in each x with 0 < F (x) < 1, which follows from (a). This and uniform in x smallness of ξn proves smallness of the first summand as well. The smallness of integrals ε 0 ϕ(t)γ T (t)Γ−1 t γ(t) 1 t γ(s)ξn (ds)dt, −1 follows from Γ−1 t ∼ Γ0 for small t, and square integrability of ϕ and Γ. 28 If Γt is degenerate of the form (2.3.9) for any t > t0 , we get γ T (t)Γ−1 t 2ξn (t) − t1 ξn (t)dt . 
γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) = − [2 ξn(t) − ∫_t^1 ξn(s) ds] / (1 − t).

If Γt is degenerate of the type (2.3.10) for any t > t₀, we get

γᵀ(t) Γt^{−1} ∫_t^1 γ(s) ξn(ds) = − ((k+1)/k) [2 ξn(t) + (k+2) F^{−1}(t) ∫_t^1 ξn(s)/F^{−1}(s)² ds] / (1 − t).

The smallness of all tail integrals then easily follows from the tail condition (2.3.15) for our choice of the indexing functions ϕ.

(ii) Since for δ < α the envelope function Ψ(t) of (2.3.14) satisfies the inequality Ψ(t) ≥ (1 − t)^{δ−α}, it has a positive, finite or infinite, lower limit at t = 1. We can choose an indexing class of indicator functions ϕ(t) = I[τ ≤ t], and the claim follows.

2.4 Simulations

In this section we report the findings of a simulation study. To examine the performance of the proposed test, we consider the following autoregressive and conditional variance functions:

m(x) = (1/2 + x²/2)^{1/2} − 1/2,   σ²(x) = 3/4 + x²/4,   x ∈ R.

Under the null hypothesis, F is the d.f. of a standardized normal r.v.; then, as in Section 2.3.3, h(x) = (1, x, x² − 1)ᵀ, and Γ_{F(y)}^{−1} is as in (2.3.15). The interval In := [−log(n), log(n)]. For the purpose of computation, we use the following representation:

Un(x) = n^{−1/2} Σ_{i=1}^n ω̄ᵢ [I(êᵢ ≤ x) − h(êᵢ)ᵀ G(x ∧ êᵢ)],   x ∈ R,

where

G(x) = ∫_{y ≤ x} Γ_{F(y)}^{−1} h(y) dF(y),   ε̂ᵢ := (Xᵢ − m̂(Xᵢ₋₁))/σ̂(Xᵢ₋₁),   êᵢ := ε̂ᵢ I(−log n ≤ Xᵢ₋₁ ≤ log n).

Let ê₍ⱼ₎, 1 ≤ j ≤ n, denote the ordered residuals êᵢ, 1 ≤ i ≤ n. Then

Un := sup_{x∈R} |Un(x)| = max{ max_{1≤j≤n} |Un(ê₍ⱼ₎)|, sup_{x < ê₍₁₎} |Un(x)| }.

The asymptotic critical values of the Un-test are the critical values of the distribution of sup_{0≤t≤1} |b(t)|. From Khmaladze and Koul (2004), these critical values at the levels 5%, 2.5% and 1% are 2.24241, 2.49771 and 2.80705, respectively.
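These critical values are quantiles of sup_{0≤t≤1} |b(t)| for standard Brownian motion, and can be reproduced, up to discretization and Monte Carlo error, by simulating scaled random walks. This is my sketch only; the exact constants come from Khmaladze and Koul (2004), not from this code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 5000, 1000
dt = 1.0 / n_steps

# simulate Brownian paths on [0, 1] and record sup_t |B(t)| for each path
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
sup_abs = np.abs(np.cumsum(increments, axis=1)).max(axis=1)

# Monte Carlo critical values; compare with 2.24241, 2.49771, 2.80705
crit = {lvl: np.quantile(sup_abs, 1.0 - lvl) for lvl in (0.05, 0.025, 0.01)}
```

Since the discrete-time maximum misses excursions between grid points, the simulated quantiles sit slightly below the exact constants; refining the grid shrinks the gap.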
To compare the effect of the two estimators σ̂₁²(x) and σ̂₂²(x) of σ²(x), given at (2.2.2) and (2.2.4), on the finite sample behavior of the test, we first compared the type I errors for different sample sizes, obtained by computing the number of times Un exceeded the given asymptotic critical value, divided by the number of repetitions, based on the sample sizes n = 300, 500, each repeated 1000 times. The results are displayed in Table 2.2. One sees that σ̂₂² is more effective than σ̂₁² in preserving the nominal level of this test.

We then used the adaptive estimator σ̂₂² to examine the finite sample power of the proposed Khmaladze martingale transform test Un. The alternatives chosen are mixtures of the standard normal distribution and the standardized t-distribution with 4 degrees of freedom, i.e., (1 − p)N(0, 1) + p t₄/√2, for p ∈ [0, 1].

We compared the Un test with the two classical tests, the KS and CvM tests. The critical values for these two tests were obtained by the Monte Carlo method, with n = 500 and 1000 repetitions for each test. The critical values thus obtained are given in Table 2.1.

Table 2.1: Monte Carlo critical values of the KS and CvM tests.

    Level    KS        CvM
    0.01     1.03159   0.21080
    0.025    0.93812   0.17630
    0.05     0.86067   0.15036

The empirical powers, i.e., the relative rejection frequencies under the chosen alternatives, for all three tests based on the sample sizes n = 300 and n = 500, with 1000 repetitions and levels 5%, 2.5% and 1%, are displayed in Table 2.3. As in KK, the martingale transform test Un again has larger empirical power than the KS test, uniformly at all chosen levels and for all values of p. Its empirical powers are also higher than those of the CvM test, at all chosen levels and for all values of p, except for p = .8 and p = 1. In this simulation study the time series Xᵢ was generated as follows.
For each simulation, 900 + n observations of Xᵢ were generated, and only the last n observations were used in the test, to ensure stationarity. The local linear estimators m̂ and σ̂² were calculated using the biweight kernel function K(x) ≡ W(x) ≡ 15(1 − x²)² I(|x| ≤ 1)/16. Both bandwidths were chosen by a rule of thumb as

h₁ = h₂ = 1.06 · min(sd(ê), IQR(ê)/1.34) · n^{−2/(6+1.9)},

where ê is the vector of all residuals with Xᵢ₋₁ ∈ In = [−log n, log n], i = 1, ..., n, and IQR denotes the interquartile range.

Let s = (log n − |x|)/0.1, x ∈ R. The weight function used was

wn(x) = 0,                          x ∉ [−log n, log n];
      = 1,                          x ∈ [−log n + 0.1, log n − 0.1];
      = −20s⁷ + 70s⁶ − 84s⁵ + 35s⁴, otherwise.

Table 2.2: Empirical levels of the Un test.

                    σ̂₁²                          σ̂₂²
    Level    n=300   n=500   n=600      n=300   n=500   n=600
    0.05     0.014   0.021   0.031      0.031   0.047   0.051
    0.025    0.005   0.010   0.014      0.009   0.017   0.027
    0.01     0.005   0.006   0.006      0.004   0.008   0.010

Table 2.3: Empirical powers of tests based on σ̂₂².

                         n = 300                     n = 500
    p      Level    Un      KS      CvM        Un      KS      CvM
    0      0.050    0.030   0.053   0.049      0.049   0.049   0.052
           0.025    0.018   0.022   0.018      0.020   0.021   0.029
           0.010    0.007   0.006   0.007      0.007   0.014   0.013
    0.2    0.050    0.073   0.038   0.041      0.134   0.046   0.052
           0.025    0.057   0.015   0.024      0.118   0.025   0.025
           0.010    0.045   0.006   0.012      0.106   0.013   0.014
    0.4    0.050    0.148   0.071   0.099      0.303   0.089   0.169
           0.025    0.129   0.049   0.066      0.263   0.052   0.117
           0.010    0.110   0.024   0.037      0.229   0.030   0.075
    0.6    0.050    0.261   0.109   0.182      0.494   0.241   0.411
           0.025    0.223   0.066   0.131      0.445   0.172   0.336
           0.010    0.188   0.032   0.076      0.398   0.101   0.253
    0.8    0.050    0.404   0.216   0.408      0.673   0.422   0.716
           0.025    0.342   0.141   0.311      0.612   0.326   0.627
           0.010    0.300   0.087   0.217      0.563   0.209   0.516
    1      0.050    0.556   0.331   0.575      0.812   0.587   0.873
           0.025    0.499   0.235   0.478      0.760   0.447   0.816
           0.010    0.437   0.153   0.368      0.710   0.356   0.738

2.5 Proofs

In this section we give the proof of Theorem 2.3.1. To this end, we first list some useful lemmas.
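Before turning to the proofs, a small computational aside on the weight function of Section 2.4 (hypothetical helper names, not from the dissertation). The boundary polynomial, written as 35s⁴ − 84s⁵ + 70s⁶ − 20s⁷, is the degree-7 "smoothstep" interpolant: it rises from 0 at s = 0 to 1 at s = 1 with three derivatives vanishing at both endpoints, so wn is smooth across the two cut-offs.

```python
import numpy as np

def smooth_ramp(s):
    # 35 s^4 - 84 s^5 + 70 s^6 - 20 s^7, evaluated in Horner form;
    # equals 0 at s = 0 and 1 at s = 1, with flat (C^3) junctions
    return (((-20.0 * s + 70.0) * s - 84.0) * s + 35.0) * s**4

def w_n(x, n):
    # weight: 0 outside [-log n, log n], 1 on [-log n + 0.1, log n - 0.1],
    # and the smooth ramp with s = (log n - |x|)/0.1 in between
    L = np.log(n)
    ax = np.abs(np.asarray(x, dtype=float))
    s = (L - ax) / 0.1
    return np.where(ax > L, 0.0, np.where(ax <= L - 0.1, 1.0, smooth_ramp(s)))
```

By symmetry of the ramp, wn equals exactly 1/2 halfway through each 0.1-wide boundary band.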
For α-mixing processes we can follow the same proofs as in Selk and Neumeyer (2013), and for moment contracting stationary processes the proofs are similar to those of Wu et al. (2010). Accordingly, many details in the proofs that follow will be brief.

Let t₁, t₂, ... be measurable functions which are bounded by the same constant B. Let

Tn(x) = (1/(n h₁)) Σ_{j=1}^n tn(Xⱼ) K((Xⱼ − x)/h₁),   x ∈ R.   (2.5.1)

We have

Lemma 2.5.1 Under the conditions of Theorem 2.3.1,

sup_{x ∈ In} |Tn(x) − E(Tn(x))| = Op((log n/(n cn))^{1/2}).

Proof. (i) Under condition (Z) for α-mixing processes, the proof is similar to that of Lemma B.1 in Selk and Neumeyer (2013), with k = 0 in their proof. (ii) For moment contracting processes, since t₁, t₂, ... are bounded on In, the claim follows from Proposition 2 and Lemma 4 of Wu et al. (2010).

Next, consider

Un,l(x) = (1/(n h₁)) Σ_{j=1}^n εⱼ σ(Xⱼ₋₁) K^{(l)}((Xⱼ₋₁ − x)/h₁),   x ∈ In, l = 0, 1, 2,   (2.5.2)

where K^{(l)} is the l-th derivative of K. We have

Lemma 2.5.2 Under the conditions of Theorem 2.3.1,

sup_{x ∈ In, l = 0, 1, 2} |Un,l(x)| = Op(n^{−1/2} (log n)^{1/2} cn^{−1/2−l} + cn² qn).

Proof. (i) Under condition (Z) for α-mixing processes, this follows from Lemmas B.1 and B.2 of Selk and Neumeyer (2013), applied with k = 1. (ii) Under condition (Z′) for moment contracting processes, it follows, because of stationarity, from Lemma 4 of Müller et al. (2009).

Proof of Lemma 2.2.1. The general idea of the proof of this lemma and of Theorem 2.3.1 is similar to that of Theorem 1 in Müller et al. (2009), so we use similar notation as in their paper and shall be brief whenever possible. Let Kᵢ(u) = uⁱ K(u), i ≥ 0, and

p̂ᵢ(x) = (1/(n h₁)) Σ_{j=1}^n Kᵢ((Xⱼ₋₁ − x)/h₁),   q̂ᵢ(x) = (1/(n h₁)) Σ_{j=1}^n Xⱼ Kᵢ((Xⱼ₋₁ − x)/h₁),   x ∈ R.

On the event p̂₂(x)p̂₀(x) − p̂₁²(x) > 0,

m̂(x) = (p̂₂(x) q̂₀(x) − p̂₁(x) q̂₁(x)) / (p̂₂(x) p̂₀(x) − p̂₁²(x)).

Assumptions (F), (H), (K) and Lemma 2.5.1 imply

sup_{x ∈ In} |p̂ᵢ(x) − E[p̂ᵢ(x)]| = Op(h₁),   i = 0, 1, 2, ... .
(2.5.3) x∈In Let p¯i (x) = E[ˆ pi (x)] and λi = Ki (u)du = ui K(u)du. Note that p¯i (x) = g(x − h1 u)ui K(u)du, and λ0 = 1, λ1 = 0, λ2 > 0. By (2.5.3), pˆi /g − λi In + p¯i /g − λi In = Op (h1 ), i = 0, 1, 2, · · · . Hence pˆ2 (x)ˆ p0 (x) − pˆ21 (x) − λ2 g 2 In = Op (h1 ). 34 (2.5.4) With (inf x∈In g(x))−1 = qn,g in assumption (X), there exists an η > 0 such that 2 inf |ˆ p2 (x)ˆ p0 (x) − pˆ21 (x)| > η → 1. P qn,g x∈In (2.5.5) Write qˆi = Ai + Bi , for i = 0, 1, where 1 Ai (x) = nh1 Bi (x) = 1 nh1 n σ(Xj−1 )εj Ki j=1 n m(Xj−1 )Ki j=1 Xj−1 − x , h1 Xj−1 − x , h1 x ∈ R. Since the second derivative m ¨ of m is bounded, a Taylor expansion shows that 1 (Bi − mˆ pi − mh ˙ 1 pˆi+1 − mh ¨ 21 pˆi+2 )/g In = Op (h31 ), 2 where (2.5.6) · In denotes the super norm over In . Note that the proof of the properties of σ ˆ12 is similar to one for m, ˆ so we give the details for m ˆ and σ ˆ22 only. By gn = up (hn ), we mean that there exists constant C > 0, such that P ( gn In ≤ C hn In ) → 1. Based on the analysis above, we obtain the following expansions, which are similar to those appearing in Yao and Tong (1994). With rˆj ≡ (Xj − m(X ˆ j−1 ))2 , m(x) ˆ − m(x) = 1 nh1 g(x) n σ(Xj−1 )εj K j=1 Xj−1 − x h2 λ2 + 1 m(x) ¨ + up (Rn,1(x) ),(2.5.7) h1 2 35 σ ˆ22 (x) − σ22 (x) 1 = nh2 g(x) (2.5.8) n Xj−1 − x {ˆ rj − σ 2 (x) − σ˙ 2 (x)(Xj−1 − x)} + up {Rn,2 (x)}, h2 W j=1 where 1 Rn,1 (x) = ng(x) n + j=1 1 Rn,2 (x) = ng(x) n + n σ(Xj−1 )εj K j=1 Xj−1 − x h1 Xj−1 − x Xj−1 − x σ(Xj− )εj K h1 h1 n W j=1 2 q h3 ); + O(qn,g n 1 Xj−1 − x {ˆ rj − σ 2 − σ˙ 2 (x)(Xj−1 − x)} h2 Xj−1 − x Xj−1 − x {ˆ rj − σ 2 − σ˙ 2 (x)(Xj−1 − x)} W h2 h2 j=1 2 q 2 h3 ). +O(qn,g n 2 From Lemma 2.5.2, we have 1 sup x∈In nh1 g(x) n σ(Xj−1 )εj Ki j=1 Xj−1 − x h1 = Op qn qn,g log n 1/2 . nh1 From (2.5.7) and the above bounds we readily obtain sup |m(x) ˆ − m(x)| = Op −1/2 −1/2 n (log n)1/2 cn qn qn,g . x∈In Combining this fact with condition (M) completes the proof of (2.2.7). 
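The ratio m̂(x) = (p̂₂q̂₀ − p̂₁q̂₁)/(p̂₂p̂₀ − p̂₁²) is just the intercept of a kernel-weighted least squares line. The sketch below (my illustration, hypothetical names) implements it with the biweight kernel used in Section 2.4; the common factor (n h₁)^{−1} cancels in the ratio, so plain sample means are used. A local linear fit reproduces a linear regression function exactly in the noiseless case, which gives a convenient check.

```python
import numpy as np

def biweight(u):
    # K(u) = 15 (1 - u^2)^2 / 16 on [-1, 1], the kernel used in the simulations
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def local_linear(x, X, Y, h):
    # p_i(x) ~ mean of u^i K(u), q_i(x) ~ mean of Y u^i K(u), u = (X - x)/h
    u = (X - x) / h
    k = biweight(u)
    p0, p1, p2 = k.mean(), (u * k).mean(), (u**2 * k).mean()
    q0, q1 = (Y * k).mean(), (Y * u * k).mean()
    return (p2 * q0 - p1 * q1) / (p2 * p0 - p1**2)
```

Because the numerator and denominator share the same weights, m̂(x) = a + bx whenever Y = a + bX with no noise, regardless of the bandwidth, provided the local design matrix is nonsingular.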
36 (2.5.9) To deal with σ ˆ22 , a similar analysis as in Fan and Yao (2002) can be followed, where rˆj = {Xj − m(X ˆ j−1 )}2 = {σ(Xj−1 )εj + m(Xj−1 ) − m(X ˆ j−1 )}2 ˆ j−1 )} = σ 2 (Xj−1 )ε2j + 2σ(Xj−1 )εj {m(Xj−1 ) − m(X +{m(Xj−1 ) − m(X ˆ j−1 )}2 . Then σ ˆ22 (x) − σ 2 (x) = J1 + J2 − J3 + J4 + Op (h2 )(|J1 + J2 − J3 + J4 | + |J1∗ + J2∗ − J3∗ + J4∗ |), where 1 J1 = nh2 g(x) J2 = J3 = J4 = 1 nh2 g(x) 2 nh2 g(x) 1 nh2 g(x) n W Xj−1 − x {σ 2 (Xj−1 ) − σ 2 (x) − σ˙ 2 (x)(Xj−1 − x)}, h2 W Xj−1 − x 2 σ (Xj−1 )(ε2j − 1), h2 W Xj−1 − x σ(Xj−1 )εj {m(X ˆ j−1 ) − m(Xj−1 )}, h2 W Xj−1 − x {m(X ˆ j−1 ) − m(Xj−1 )}2 , h2 j=1 n j=1 n j=1 n j=1 and Ji∗ is defined in the same way as Ji with one more factor h−1 2 (Xj−1 − x) in the jth summand, for j = 1, · · · , n and i = 1, · · · , 4. Condition (M) implies J1 In = Op (qn qn,g h22 ), 37 and from Lemma 2.5.2, we obtain log n 1/2 . J2 In = Op qn2 qn,g nh2 Based on (2.5.9), 3 q 2 log n J4 In = Op qn,g n nh h . 1 2 To deal with J3 , rewrite J3 = J31 + J32 + J33 , where 1 J31 = 2 n h1 h2 g(x) n K i,j=1 g −1 (Xi−1 )W = h21 λ2 J32 = nh2 g(x) |J33 | ≤ Op (1) n2 h2 Xi−1 − Xj−1 σ(Xi−1 )σ(Xj−1 )εi εj h1 Xj−1 − x Xi−1 − x + g −1 (Xj−1 )W h2 h2 1 φij , n2 h1 h2 g(x) 1≤i,j≤n n W Xi−1 − x σ(Xi−1 )εi m(X ¨ i−1 ), h2 W Xi−1 − Xj−1 Xi−1 − x K σ(Xi−1 )σ(Xj−1 )|εi |εj /g(Xi−1 ) , h2 h1 i=1 n i,j=1 where Xi−1 − Xj−1 Xi−1 − x σ(Xi−1 )σ(Xj−1 )εi εj g −1 (Xi−1 )W h1 h2 Xj−1 − x +g −1 (Xj−1 )W . h2 φij = K 38 Argue as in Borkowski and Mielniczuk (2012), to obtain E φij 2 2 ). = Op (n2 c2n qn4 qn,g 1≤i,j≤n To obtain the uniform bound, we consider the equal-length cover Ink and with center xnk , k = 1, · · · , L(n), for In , where L(n) = O((log n)r1 /(c3n (ncn )1/2 qn qn,g )). Then sup |J31 (x)| ≤ x∈In max sup 1≤k≤L(n) x∈In ∩I nk |J31 (x) − J31 (xnk )| + max 1≤k≤L(n) |J31 (xnk )| = R1 + R2 . Note that R1 ≤ For any 2 C(log n)r1 qn3 qn,g L(n)h2 (nh1 )1/2 = Op qn2 qn,g c2n . 
> 0, by the relation (2.2.6) in assumption (H), for a constant C < ∞, −1 c−2 R > qn−2 qn,g 2 n P ≤ L(n)P −1 c−2 qn−2 qn,g n 1 φij n2 h1 h2 g(x) 1≤i≤j≤n C(log n)r1 >ε 2 1 E φ ij 2 c3n (ncn )1/2 qn qn,g 2 n4 c4n h21 h22 qn4 qn,g in1/2 ,1≤i≤n |t|>n1/2 √ √ ≤ 2(1 − F ( n/2)) + 2F (1 − n/2). Since F has a finite second moment, we have F (t) = o(t−2 ), as t → −∞ and 1 − F (t) = o(t−2 ), as t → ∞. This implies that sup ˆ Tˆ − H(t, 0, 0)| = op (n−1/2 ). |H(t, S, |t|>n1/2 So we are left to show sup ˆ Tˆ − H(t, 0, 0)| = op (n−1/2 ). |H(t, S, |t|≤n1/2 42 (2.5.11) Now let δ = 1/(1 + √ 3). For any interval I, let C11+δ (I) be the set of differentiable functions h on R that satisfy h I,δ ≤ 1, where h I,δ = h I + h˙ I + ˙ ˙ |h(x) − h(y)| . |x − y|δ x,y∈I,x=y sup Now let Dn = {u + ν : u ∈ Un , ν ∈ Vn }, where Un = {h ∈ C(R) : h In ≤ n−1/2 }, −1/2 Vn = {h ∈ C11+δ (R) : h In ≤ n−1/2 cn log nQ2n }, with Qn = qn qn,g qn,σ . Let uˆ(x) := m(x) ˆ − m(x) − vˆ(x), and uˆσ (x) := σ ˆ 2 (x) − σ 2 (x) − vˆσ (x), where 1 vˆ(x) := nh1 g(x) vˆσ (x) := 1 nh2 g(x) n σ(Xj−1 )εj K j=1 n W j=1 Xj−1 − x + Op (qn c2n ), h1 Xj−1 − x 2 σ (Xj−1 )(ε2j − 1). h2 It follows from Lemma 2.5.2 and similar argument as in Selk and Neumeyer (2013), Sˆ and Tˆ belong to Dn with probability tending to one. So (2.5.11) will be followed if we prove sup |H(t, S, T − H(t, 0, 0)| = op (n−1/2 ). |t|≤n1/2 ,S,T ∈Dn To this end, set ηn = n−1/2 . Let t1 , · · · , tMn be ηn -net of [−n1/2 , n1/2 ], and set 43 ν1 , · · · , νNn for Vn . We can choose the former net such that Mn ≤ 2 + n, (2.5.12) Nn ≤ exp(K∗ (2 + bn − an )n1/(2+2δ) ), (2.5.13) the second net is where K∗ is some positive constant, see also (Van der Vaart and Wellner (1996)). Note that ν1 , · · · , νNn is an 2ηn -net for Dn . 
We have sup |H(t, S, T ) − H(t, 0, 0)| |t|≤n1/t ,S,T ∈Dn ≤ max |Hn (ti , νl , νm ) − Hn (ti , 0, 0)| + max Di,l,m , i,l,m i,l,m where Di,l,m = sup |t−ti |≤ηn , S−νl I ≤2ηn , T −νm I ≤2ηn |H(ti , S, T ) − H(t, νl , νm )| +|H(ti , 0, T ) − H(t, 0, νm )| + |H(ti , S, 0) − H(t, νl , 0)| + |Hn (ti , 0, 0) − Hn (t, 0, 0)|. For |t − ti | ≤ ηn , S − νl I ≤ 2ηn , T − νm I ≤ 2ηn , we have I y ≤ ti + νl (x) + νm (x)ti − ηn (A + 3) ≤ I y ≤ t + S(x) + T (x)t ≤ I y ≤ ti + νl (x) + νm (x)ti − ηn (A + 3) , 44 and F ti + νl (x) + νm (x)ti − ηn (A + 3) ≤ F t + S(x) + T (x)t ≤ F ti + νl (x) + νm (x)ti + ηn (A + 3) , for all y ∈ R and x ∈ In , where A = |T | + 2|ti | + 2ηn . Hence |H(ti , S, T ) − H(t, νl , νm )| ≤ |H ti + ηn (A + 3), νl (x), νm (x) − H ti − ηn (A + 3), νl (x), νm (x) | + 2Ri,l,m , with Ri,l,m n = j=1 ωnj {F ti + νl (x) + νm (x)ti + ηn (A + 3) − F ti + νl (x) + νm (x)ti − ηn (A + 3) } n ≤ 2ηn (sup |Af (ξ)| + 3 f ∞ ), say. t for some ξ is between ti + νl (x) + νm (x)ti − ηn (A + 3) and ti + νl (x) + νm (x)ti + ηn (A + 3). By assumption (F), there exists some L, such that |Af (ξ)| < L < ∞. Similarly, we derive the bound for the following terms, |H(ti , 0, T ) − H(t, 0, νm )| ≤ |H ti + ηn (A + 1), 0, νm (x) − H ti − ηn (A + 1), 0, νm (x) | ≤ 4ηn L + 4 f ∞ , |H(ti , S, 0) − H(t, νl , 0)|) ≤ |H ti + 3ηn , νl (x), 0 − H ti − 3ηn , νl (x), 0 | ≤ ηn 12 f ∞ , |Hn (ti , 0, 0) − Hn (t, 0, 0)| ≤ |H(ti + ηn , 0, 0) − H(ti − ηn , 0, 0)| ≤ ηn 4 f ∞ . 45 So ˆ Tˆ − H(t, 0, 0)| = T1 + T2 + T3 + T4 + T5 + ηn (8L + 32 f ∞ ), |H(t, S, sup |t|≤n1/2 ,S,T ∈Dn where T1 = max |H(ti , νl , νm ) − H(ti , 0, 0)|, i,l,m T2 = max |H ti + ηn (A + 3), νl (x), νm (x) − H(ti − ηn (A + 3), νl (x), νm (x))|, i,l,m T3 = max |H ti + ηn (A + 1), 0, νm (x) − H ti − ηn (A + 1), 0, νm (x) |, i,l,m T4 = max |H ti + 3ηn , νl (x), 0 − H ti − 3ηn , νl (x), 0 |, i,l,m T5 = max |H(ti + ηn , 0, 0) − H(ti − ηn , 0, 0)|. 
i,l,m To continue, for any υi and τi , i = 1, 2, let Yj = ωnj I εj ≤ s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − I εj ≤ t + υ2 (Xj−1 ) + τ2 (Xj−1 )t − F s + υ1 (Xj−1 ) + τ1 (Xj−1 )s + F t + υ2 (Xj−1 ) + τ2 (Xj−1 )t . We have |Yj | ≤ 2, E(Yj |X0 , · · · , Xj−1 ) = 0, and n E Yj2 |X0 , · · · , Xj−1 Vn = j=1 n ≤ F s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − bF t + υ2 (Xj−1 ) + τ2 (Xj−1 )t j=1 ≤ n f (ξ) s + υ1 (Xj−1 ) + τ1 (Xj−1 )s − t + υ2 (Xj−1 ) + τ2 (Xj−1 )t where ξ is between s + υ1 (Xj−1 ) + τ1 (Xj−1 )s and 46 , t + υ2 (Xj−1 ) + τ2 (Xj−1 )t. Since supt |tf (t)| < ∞, there exists some constant L, such that Vn ≤ n{ f ∞ (|s − t|(1 + σ In ) + υ1 − υ2 In ) + L τ1 − τ2 In } = n f ∞ B. Then by martingale inequality in Freedman (1975), n P (|H(s, υ1 , τ1 )| − H(t, υ2 , τ2 )| > βn1/2 ) Yj > βn1/2 , Vn ≤ n f ∞ B , = P j=1 ≤ 2 exp(− −1/2 Also νl In ≤ n−1/2 cn β 2n 4βn1/2 + 2n f ∞ B ). log nQ2n + ηn . Thus we obtain that P (T1 > βn−1/2 ) P (|H(ti , νl , νm )| − H(ti , 0, 0)| > βn1/2 ) ≤ i,l,m ≤ 2Mn Nn2 exp − β 2n 4βn1/2 −1/2 + 4n(n−1/2 cn log nQ2n (L + 1) + ηn ) . f ∞ Similarly, there exists some constant L2 and L3 , such that P (T2 > βn−1/2 ) ≤ 2Mn Nn2 exp − P (T3 > βn−1/2 ) ≤ 2Mn Nn2 exp − P (T4 P (T5 β 2n 4βn1/2 + nηn (L2 + 12 f ∞ ) β 2n 4βn1/2 + nηn (L3 + 4 f ∞ ) β 2n > βn−1/2 ) ≤ 2Mn Nn2 exp − , 4βn1/2 + 12nηn f ∞ β 2n > βn−1/2 ) ≤ 2Mn Nn2 exp − . 4βn1/2 + 4nηn f ∞ As δ = 1/(1 + √ , , 3) and relation (2.2.5) in condition (H), together with relations (2.5.12) 47 and (2.5.13), we obtain that P (Ti > βn−1/2 ) → 0, i = 1, 2, · · · , 5, β > 0. This completes the proof of (2.5.10) and hence the proof of Theorem 2.3.1 Lemma 2.5.3 Under the conditions of Theorem 2.3.1, 1 n n j=1 n m(X ˆ j−1 ) − m(Xj−1 ) 1 ωn (Xj−1 ) = εj + op (n−1/2 ), σ( Xj−1 ) n j=1 and for i = 1, 2, 1 n n ωn (Xj−1 ) j=1 n σ ˆi (Xj−1 ) − σ(Xj−1 ) 1 = (ε2j − 1) + op (n−1/2 ). σ( Xj−1 ) 2n j=1 Proof. To prove the first equation, from the proof of Lemma 2.2.1, we have 1 m(x) ˆ − m(x) = nh1 g(x) n σ(Xj )εj K j=1 Xj − x + op (n−1/2 ). 
Then we only need to prove

(1/n) Σ_{i=1}^n [ωn(Xᵢ)/(n h₁ g(Xᵢ) σ(Xᵢ))] Σ_{j=1}^n σ(Xⱼ) εⱼ K((Xⱼ − Xᵢ)/h₁) = (1/n) Σ_{j=1}^n εⱼ + op(n^{−1/2}).

Denote

d̂(x) = Σ_{i=1}^n [ωn(Xᵢ) σ(x)/(n h₁ g(Xᵢ) σ(Xᵢ))] K((x − Xᵢ)/h₁).

Let d̄(x) = E(d̂(x)); we have

d̄(x) = ∫ [ωn(u) σ(x)/(h₁ σ(u))] K((x − u)/h₁) du.

Then E[(d̄(X) − 1)²] → 0. Therefore

(1/n) Σ_{j=1}^n εⱼ (d̄(Xⱼ) − 1) = op(n^{−1/2}).

Thus we only need to prove that

(1/n) Σ_{j=1}^n εⱼ d̃(Xⱼ) = op(n^{−1/2}),   (2.5.14)

where d̃(x) = d̂(x) − d̄(x). But the proof of (2.5.14) is similar to that of Lemma B.3 of Selk and Neumeyer (2013) under the mixing condition (Z), and to the one appearing in Section 5 of Müller et al. (2009) under the moment contracting condition (Z′). The second equation follows by a similar proof.

Chapter 3

Linear Measurement Error Models

3.1 Introduction

The problem of fitting an error distribution in regression models has been well studied when covariates are fully observed; see, e.g., Loynes (1980), Koul (2002), Khmaladze and Koul (2004, 2009) and the references therein. However, in practice there are numerous real world applications where covariates are not observable. Instead, one observes some surrogates for the covariates. The monographs of Cheng and Van Ness (1999), Fuller (2009) and Carroll, Ruppert, Stefanski, and Crainiceanu (2012) are full of such important applications. These models are often called errors-in-variables models or measurement error models. Relatively little is known about fitting an error distribution to the regression model in this setting. In this chapter we investigate a class of tests for this testing problem based on deconvoluted density estimators of the error density.

Let p ≥ 1 be the given dimension of the covariate vector X. In a multiple linear regression model with measurement error in X, one observes the response variable Y and a surrogate p-vector Z obeying the model

Y = α + β′X + ε,   Z = X + u,   (3.1.1)

for some α ∈ R, β ∈ Rᵖ, where the p-vector u is the measurement error in X.
Here b′ denotes the transpose of any vector b ∈ Rᵖ. The variables ε, u and X are assumed to be mutually independent, with Eε = 0 and Eu = 0. For model identifiability reasons, we assume the density g of the measurement error u to be known. Let f denote the density of ε, and let f₀ be a known density with zero mean. Consider the problem of testing the hypothesis

H₀: f = f₀   vs.   H₁: f ≠ f₀,   (3.1.2)

based on a random sample (Yᵢ, Zᵢ), 1 ≤ i ≤ n, from the joint distribution of (Y, Z) obeying the model (3.1.1).

Note that if β = 0 in (3.1.1), then Y bears no relation to X, and hence whether X is observable or not is irrelevant for making inference about f. In particular, any goodness-of-fit test based on Yᵢ, 1 ≤ i ≤ n, useful for fitting a density up to an unknown location parameter, may be used to test the above hypotheses. Thus, from now onwards we shall assume β ≠ 0 in this chapter.

Since we observe Z instead of X, we shall rewrite the model (3.1.1) as

Y = α + β′Z + e,   e = ε − β′u.

Because u and ε are independent, the density of e is

h(v) = ∫ f(v + β′u) g(u) du,   v ∈ R.

Let

h₀(v) = ∫ f₀(v + β′u) g(u) du,   v ∈ R.

As argued in Koul and Song (2012), there is a one-to-one map between the densities of ε and e. Hence, testing for H₀ is equivalent to testing for

H₀: h = h₀   vs.   H₁: h ≠ h₀.   (3.1.3)

In the one sample i.i.d. set up, the Bickel and Rosenblatt (1973) goodness-of-fit test for fitting a known density is based on an L₂ distance between a kernel density estimator and its null expected value. This test can be adapted to fitting an error density up to an unknown location parameter, with the density estimator based on the estimated residuals. The resulting statistic has the property that its asymptotic null distribution is not affected by not knowing the location parameter. In other words, not knowing the nuisance location parameter has no effect on the asymptotic level of the test based on the analog of this statistic.
What is remarkable is that this property continues to hold in several more complicated additive models. Lee and Na (2002), Bachmann and Dette (2005), and Koul and Mimoto (2012) observed that this fact continues to hold for the analog of this statistic when fitting an error density based on residuals in autoregressive and generalized autoregressive conditionally heteroscedastic time series models. This property makes these $L_2$-distance type tests more desirable than tests based on residual empirical processes, because the asymptotic null distribution of the standardized residual empirical process depends on the estimators of the underlying nuisance parameters in these models in a complicated fashion. In all of these works the data are completely observable.

In the above measurement error model, Koul and Song (2012) proposed an analogous class of tests for the testing problem (3.1.3) based on kernel density estimators of $h$ obtained directly from the residuals $Y_i - \hat\alpha - \hat\beta' Z_i$, $1 \le i \le n$, where $\hat\alpha, \hat\beta$ are some $n^{1/2}$-consistent estimators of $\alpha, \beta$ under $H_0$. Alternatively, because $f$ is involved in the convolution $h$, it is natural to construct tests of $H_0$ based on deconvolution density estimators. In this chapter we develop analogs of the above tests for testing $H_0$ based on deconvolution density estimators.

There is a vast literature on deconvolution estimators of the density of $X$ in the measurement error model (3.1.1), as is evidenced in the papers of Carroll and Hall (1988), Stefanski and Carroll (1990), Fan (1991), van Es and Uh (2004), and Delaigle and Hall (2006), among others. The goodness-of-fit testing problem pertaining to the density function of $X$ has been studied by several authors, including Butucea (2004), Holzmann and Boysen (2006), Holzmann, Bissantz and Munk (2007), and Loubes and Marteau (2014).
All of these authors use analogs of the above $L_2$-distance type tests, based either on the deconvolution estimator of the density of $X$ or on a density estimator of the density of $Z$. None of them address the above problem of testing (3.1.2) or (3.1.3) pertaining to the error density in the measurement error model (3.1.1).

Consider the model (3.1.1) and assume for the time being that $\alpha, \beta$ are known. Since we observe $Y$ and $Z$, we can construct a kernel density estimator of the density $h$ of $e := Y - \alpha - \beta' Z = \varepsilon - \beta' u$, which is also an estimator of the convolution of the density $f$ of $\varepsilon$ with the known density of $\beta' u$. From this we obtain a deconvolution density estimator of $f$, which we shall use to construct tests of $H_0$.

Let $\Phi_\gamma$ denote the characteristic function of a density $\gamma$. Proceeding a bit more precisely, by the independence of $\varepsilon$ and $u$, $\Phi_h(t) = \Phi_f(t)\Phi_g(-\beta t)$. Assuming $\Phi_g(t) \ne 0$ for all $t \in \mathbb{R}^p$, the characteristic function of $\varepsilon$ is $\Phi_f(t) = \Phi_h(t)/\Phi_g(-\beta t)$. Using the data $Y_i, Z_i$, $1 \le i \le n$, an estimate of $\Phi_h$ is provided by the empirical characteristic function $\Psi_n(t) := n^{-1}\sum_{j=1}^n e^{i t e_j}$ of $e_j := Y_j - \alpha - \beta' Z_j$, $1 \le j \le n$. A kernel density estimator of $h$ is
\[
h_n(x, \alpha, \beta) = \frac{1}{nb}\sum_{j=1}^n K\Big(\frac{x - e_j}{b}\Big),
\]
where $K$ is a kernel function whose characteristic function $\Phi_K$ is compactly supported and $b > 0$ is a bandwidth sequence. The characteristic function of $h_n$ is $\Phi_K(bt)\Psi_n(t)$. Since $\Phi_g$ is known, a kernel estimate of $\Phi_f(t)$ is $\Phi_K(bt)\Psi_n(t)/\Phi_g(-\beta t)$. By the inversion formula,
\[
f_n(x, \alpha, \beta) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{-itx}\,\Phi_K(bt)\,\frac{\Psi_n(t)}{\Phi_g(-\beta t)}\,dt
\]
is a deconvolution estimate of $f$ when $\alpha$ and $\beta$ are known. But in practice $\alpha, \beta$ are seldom known. Let $\hat\alpha, \hat\beta$ be estimators of $\alpha, \beta$, respectively. Then the corresponding deconvolution estimator of $f$ is $\hat f_n(x) := f_n(x, \hat\alpha, \hat\beta)$, obtained from $f_n$ after replacing $\alpha, \beta$ by $\hat\alpha, \hat\beta$, respectively.
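The deconvolution estimator $f_n(x, \alpha, \beta)$ can be evaluated by direct numerical computation of the inversion integral. The following sketch is illustrative only (not the thesis code): the sample size, the bandwidth $b = 0.2$, the Laplace measurement error and the sinc kernel (for which $\Phi_K \equiv 1$ on $[-1, 1]$) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, s, b = 2000, 1.0, 0.3, 0.2
eps = rng.normal(0.0, 0.5, n)                 # epsilon ~ f = N(0, 0.25)
u = rng.laplace(0.0, s, n)                    # measurement error with known Phi_g
e = eps - beta * u                            # e_j = Y_j - alpha - beta*Z_j

t = np.linspace(-1/b, 1/b, 2001)              # Phi_K(bt) = 1 on this range (sinc kernel)
dt = t[1] - t[0]
psi_n = np.exp(1j * t[None, :] * e[:, None]).mean(axis=0)   # empirical cf Psi_n(t)
phi_g = 1.0 / (1 + (s * beta * t)**2)         # Phi_g(-beta t) for Laplace(0, s)

# inversion formula: f_n(x) = (1/2pi) int e^{-itx} Phi_K(bt) Psi_n(t)/Phi_g(-beta t) dt
x = np.linspace(-2.0, 2.0, 201)
fn = (np.exp(-1j * np.outer(x, t)) * psi_n / phi_g).sum(axis=1).real * dt / (2 * np.pi)
```

For this kind of draw, `fn` near $x = 0$ should be close to the true value $f(0) = 1/\sqrt{0.5\pi} \approx 0.798$, up to smoothing bias and deconvolution noise.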
The proposed class of tests of $H_0$, one for each $K$ and $b$, is to be based on
\[
\hat T_n = \int_{\mathbb{R}} \big(\hat f_n(x) - K_b * f_0(x)\big)^2\,dx,
\]
where for any function $\gamma$, $K_b * \gamma(x) := b^{-1}\int K((x - y)/b)\gamma(y)\,dy$.

It is well known that the convergence rate of deconvolution density estimators depends sensitively on the tail behaviour of the characteristic function of the underlying measurement error, which in the present setup is $\Phi_g$. There are two general cases: one is the ordinary smooth case, where $|\Phi_g(t)|$ is of polynomial order $|t|^{-\kappa}$, for some $\kappa > 0$, as $|t| \to \infty$; the other is the super smooth case, where $|\Phi_g(t)|$ is of the order $|t|^{\lambda_0} e^{-|t|^\lambda/\nu}$, for some $\lambda_0 \in \mathbb{R}$, $\lambda > 0$ and $\nu > 0$, as $|t| \to \infty$.

In this chapter, we obtain the asymptotic distributions of $\hat T_n$ under $H_0$ in both the ordinary smooth and super smooth cases in Section 2. The consistency against a fixed alternative and the asymptotic power against a class of local nonparametric alternatives and against a fixed alternative, for both cases, are described in Section 3. The findings of a finite sample simulation that compares the empirical power of a member of the proposed class of tests with that of the Kolmogorov–Smirnov and Cramér–von Mises tests based on the empirical d.f. of $\{\hat e_j := Y_j - \hat\alpha - \hat\beta' Z_j,\ 1 \le j \le n\}$, and with a Koul and Song (2012) test based on $h_n(\cdot, \hat\alpha, \hat\beta)$, are presented in Section 4. The comparison is made for three choices of the measurement error variance $\sigma_u^2$. In the ordinary smooth case, the proposed test dominates the Koul–Song test at almost all chosen alternatives for all three choices of $\sigma_u^2$. It also dominates the other two tests for the larger values of $\sigma_u^2$ at most of the chosen alternatives and for the larger sample size. The findings in the super smooth case are similar. In general, the proposed test has better empirical power at the chosen alternatives compared to some of these other tests for larger values of $\sigma_u^2$, while the Cramér–von Mises test dominates in terms of empirical power for smaller values of $\sigma_u^2$.
See Section 3.4 for more on this finite sample comparison.

Throughout this chapter, $N(\mu, \sigma^2)$ denotes the normal distribution with mean $\mu$ and variance $\sigma^2$, all limits are taken as $n \to \infty$, $\to_d$ and $\to_p$ denote convergence in distribution and in probability, respectively, and the range of integration in all integrals is $\mathbb{R}$, unless specified otherwise.

3.2 Asymptotic Null Distribution

This section discusses the asymptotic null distribution of $\hat T_n$ for the ordinary smooth and super smooth cases.

3.2.1 Ordinary smooth case

Here we shall first derive the limiting null distribution of $\hat T_n$ for the ordinary smooth case. To begin with, we state the needed assumptions.

(A): The characteristic function $\Phi_g$ of the error vector $u$ satisfies $\Phi_g(t) \ne 0$ for all $t \in \mathbb{R}^p$, and $|\Phi_g(t)| \approx \|t\|^{-\kappa}$ for a $\kappa > 0$, i.e., there are $c, C > 0$ such that $c\|t\|^{-\kappa} \le |\Phi_g(t)| \le C\|t\|^{-\kappa}$ for all sufficiently large $\|t\|$.

(B): The characteristic function $\Phi_f$ of the density $f$ of $\varepsilon$ satisfies $|\Phi_f(t)| = O(|t|^{-r})$, for some $r > 1$, as $|t| \to \infty$.

(C): The characteristic function $\Phi_K$ of the kernel function $K$ is symmetric around $0$ and compactly supported on $[-1, 1]$.

(D): $E\{\|X\|^4 + |\varepsilon|^4 + \|u\|^4\} < \infty$.

Next, define $\psi(\beta, s, t) := \Phi_g(\beta t + \beta s)\Phi_f(t + s)$, and let
\[
T_n(\alpha, \beta) := \int \big(f_n(x, \alpha, \beta) - K_b * f_0(x)\big)^2\,dx,
\]
\[
C_{M,b} := \int \frac{|\Phi_K(tb)|^2}{|\Phi_g(\beta t)|^2}\,dt, \qquad
C_{V,b} := \int\!\!\int \frac{|\Phi_K(tb)|^2\,|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2\,|\Phi_g(\beta s)|^2}\,|\psi(\beta, s, t)|^2\,ds\,dt.
\]
Using Theorem 1 of Holzmann et al. (2007), one can derive the following result. Suppose $H_0$ and the assumptions (A)–(C) hold and $b \to 0$, $nb \to \infty$. Then
\[
C_{M,b} \approx b^{-(2\kappa+1)}, \qquad C_{V,b} \approx b^{-(4\kappa+1)}, \qquad (3.2.1)
\]
\[
n\,C_{V,b}^{-1/2}\Big(T_n(\alpha, \beta) - \frac{C_{M,b}}{2\pi n}\Big) \to_d N\big(0,\, 1/2\pi^2\big). \qquad (3.2.2)
\]
Note that $\hat T_n = T_n(\hat\alpha, \hat\beta)$. Thus we need the above results to hold with $\alpha, \beta$ replaced by $\hat\alpha$ and $\hat\beta$, respectively. Accordingly, write $\hat C_{M,b}$, $\hat C_{V,b}$ and $\hat\Psi_n(t)$ for $C_{M,b}$, $C_{V,b}$ and $\Psi_n(t)$, respectively, when $\alpha, \beta$ are replaced by $\hat\alpha$ and $\hat\beta$.
We are now ready to state the following theorem, which provides yet another example where the asymptotic null distributions of these $L_2$-distance statistics are not affected by not knowing the nuisance parameters $\alpha, \beta$.

Theorem 3.2.1 Suppose $H_0$ holds, assumptions (A), (B) with $r > 3/2$, (C) and (D) hold, and
\[
n^{1/2}\big\{|\hat\alpha - \alpha| + \|\hat\beta - \beta\|\big\} = O_p(1). \qquad (3.2.3)
\]
In addition, suppose $b \to 0$ and $n b^{\max\{2\kappa+3,\,3.5\}} \to \infty$, with $\kappa$ as in (A). Then $\hat C_{M,b} \approx b^{-(2\kappa+1)}$, $\hat C_{V,b} \approx b^{-(4\kappa+1)}$, and
\[
n\,\hat C_{V,b}^{-1/2}\Big(\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big) \to_d N\Big(0,\, \frac{1}{2\pi^2}\Big). \qquad (3.2.4)
\]
The proof of this theorem is given in the last section.

Let $z_a$ be the $(1-a)100$th percentile of the $N(0, 1)$ distribution. An immediate consequence of (3.2.4) is that for any $0 < a < 1$, the test that rejects $H_0$ whenever
\[
\mathcal{T}_n := \sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\Big|\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big| > z_{a/2}
\]
has asymptotic size $a$.

Examples of $g$ that satisfy assumption (A) include the uniform distribution, with $\kappa = 1$; gamma distributions with shape parameter $\gamma$, where $\kappa = \gamma$; the exponential, where $\kappa = 1$; and the Laplace distribution with location $0$ and scale $1$, where $\kappa = 2$. The class of regression error densities $f$ that satisfy assumption (B) includes the Laplace, where $r = 2$, and the normal and Cauchy, for any $r > 0$.

3.2.2 Super smooth case

Now we consider the problem of obtaining the limiting distribution of $\hat T_n$ in the super smooth case. Here we need the following assumptions.

(A$'$): The characteristic function $\Phi_g$ of the error variable $u$ satisfies $\Phi_g(t) \ne 0$ for any $t \in \mathbb{R}^p$. For any $\beta \in \mathbb{R}^p$ with $\beta_k \ne 0$ for $k = 1, \ldots, p$, $|\Phi_g(\beta t)| \sim C(\beta)\,|t|^{\lambda_0}\, e^{-|t|^\lambda/\nu(\beta)}$ as $|t| \to \infty$, for a $\lambda > 1$, $C(\beta) > 0$, $\nu(\beta) > 0$, and $\lambda_0 \in \mathbb{R}$. Moreover, $C(\beta)$ and $\nu(\beta)$ have bounded first derivatives.

(B$'$): The density $f$ is square-integrable, and $E\varepsilon^2 < \infty$.

(C$'$): The characteristic function $\Phi_K$ of the kernel function $K$ is symmetric around $0$ and compactly supported on $[-1, 1]$. Moreover, $\Phi_K(0) = 1$, and there exist $A > 0$, $\omega \ge 0$ such that $\Phi_K(1 - t) = A t^\omega + o(t^\omega)$, as $t \to 0$.
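Assumption (C$'$) is easy to verify numerically for a concrete kernel. For instance, for the kernel with $\Phi_K(t) = (1 - t^2)^3$ one has $\Phi_K(1 - t) = (2t - t^2)^3 = 8t^3(1 - t/2)^3$, so $A = 8$ and $\omega = 3$; the following small check (an illustrative sketch, not from the thesis) confirms this behavior.

```python
import numpy as np

phi_K = lambda t: (1.0 - t**2)**3          # kernel cf with support [-1, 1]
h = np.array([1e-1, 1e-2, 1e-3, 1e-4])
ratio = phi_K(1.0 - h) / (8.0 * h**3)      # Phi_K(1-h) / (A h^omega) with A=8, omega=3
# ratio equals (1 - h/2)^3, which tends to 1 as h -> 0
```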
From Holzmann and Boysen (2006) we can deduce that under the conditions (A$'$)–(C$'$), as $n \to \infty$ and $b \to 0$,
\[
\frac{(2\lambda)^{1+2\omega}\,\pi\, C^2(\beta)\, n}{A^2\,\nu^{1+2\omega}(\beta)\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; T_n(\alpha, \beta) \to_d \chi_2^2/2, \qquad (3.2.5)
\]
where $\chi_2^2$ is a r.v. having the chi-square distribution with 2 degrees of freedom, and $\Gamma(\cdot)$ is the gamma function.

In order to derive a similar result for $\hat T_n$, we need the following additional condition. Let $\dot q$ denote the first derivative of any function $q$.

(D$'$): There exists some $\lambda_1 > 1$ such that $|\Phi_f(t)| = O(|t|^{-\lambda_1})$ as $|t| \to \infty$.

Theorem 3.2.2 Suppose $H_0$ and the assumptions (A$'$), (B$'$), (C$'$), (D$'$), (D) hold, $b \to 0$, and
\[
n b^{-\eta}\,\exp\big\{-2/(\nu(\beta)b^{\lambda})\big\} \to \infty, \quad \text{for any } \eta > 0. \qquad (3.2.6)
\]
Then
\[
T_{n,s} := \frac{(2\lambda)^{1+2\omega}\,\pi\, C(\hat\beta)^2\, n}{A^2\,\nu(\hat\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; \hat T_n \to_d \chi_2^2/2. \qquad (3.2.7)
\]
Note that the factor multiplying $\hat T_n$ here is completely known. Again, the proof of this theorem appears in the last section. The corresponding test rejects $H_0$, with asymptotic size $a$ for $0 < a < 1$, whenever $T_{n,s} > \mathcal{X}_a/2$, where $\mathcal{X}_a$ is the $(1-a)100$th percentile of the $\chi_2^2$ distribution.

Examples satisfying assumption (A$'$) include normal densities. If $g$ is the standard normal density, then $C_g = 1$, $\lambda_0 = 0$, $\lambda = 2$ and $\nu = 2$. For kernel functions satisfying assumption (C$'$), Holzmann and Boysen (2006) used the sinc kernel $K(x) = \sin(x)/(\pi x)$, with $A = 1$ and $\omega = 0$, and Fan (1992) used $\Phi_K(t) = (1 - t^2)^3$, with $A = 8$ and $\omega = 3$. Other suitable kernel functions can be found in Delaigle and Hall (2006).

3.3 Consistency and Asymptotic Power

In this section we shall discuss the consistency and the asymptotic power against fixed and local nonparametric alternatives of the above tests, for both the ordinary and super smooth cases.

Consistency. Let $f_1$ be another fixed density of $\varepsilon$ such that
\[
\|f_1 - f_0\| := \Big(\int \big(f_1(x) - f_0(x)\big)^2\,dx\Big)^{1/2} > 0. \qquad (3.3.1)
\]
Consider the fixed alternative $H_1: f(x) = f_1(x)$, for all $x \in \mathbb{R}$.
The following two theorems yield the consistency of the above $\mathcal{T}_n$ and $T_{n,s}$ tests against $H_1$ for the ordinary and super smooth cases, respectively.

Theorem 3.3.1 Suppose assumptions (A) and (C) hold, $f_0$ and $f_1$ satisfy (B) with $r > 3/2$ and have finite fourth moments, and (3.2.3) holds under $H_1$. Furthermore, suppose (D) holds, $b \to 0$, and $n b^{\max\{2\kappa+3,\,3.5\}} \to \infty$. Then
\[
\sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\Big(\hat T_n - \frac{\hat C_{M,b}}{2\pi n}\Big) \to_p \infty. \qquad (3.3.2)
\]

Theorem 3.3.2 Assume (3.2.3) holds under $H_1$, and that the assumptions of Theorem 3.2.2 hold. Then
\[
\frac{n}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}}\; \hat T_n \to_p \infty.
\]

Asymptotic local power. First we consider the ordinary smooth case. We shall describe the asymptotic distribution of $\hat T_n$ under the sequence of local nonparametric alternatives
\[
f_{1n}(x) = f_0(x) + \delta_{1n}\,\ell(x), \quad x \in \mathbb{R}, \qquad \delta_{1n} = \frac{(C_{V,b}/2)^{1/4}}{(n\pi)^{1/2}},
\]
with $f_{1n}$ a nonnegative function, $\ell \in L_2(\mathbb{R})$, and $\int \ell(x)\,dx = 0$. We obtain

Theorem 3.3.3 Suppose the assumptions of Theorem 3.2.1 hold and that (3.2.3) holds under $H_{1n}: f = f_{1n}$. Then, under $H_{1n}$,
\[
\sqrt{2}\,\pi n\,\hat C_{V,b}^{-1/2}\big(\hat T_n - \hat C_{M,b}/(2\pi n)\big) \to_d N\big(\|\ell\|^2,\, 1\big).
\]

Similarly, for the super smooth case, consider the sequence of local nonparametric alternatives
\[
f_{2n}(x) = f_0(x) + \delta_{2n}\,\ell(x), \quad x \in \mathbb{R}, \qquad
\delta_{2n} = \Big(\frac{(2\lambda)^{1+2\omega}\,\pi\, C(\beta)^2\, n}{A^2\,\nu(\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\Big)^{-1/2},
\]
with $f_{2n}$ a nonnegative function, $\ell \in L_2(\mathbb{R})$, and $\int \ell(x)\,dx = 0$. We obtain

Theorem 3.3.4 Suppose the assumptions of Theorem 3.2.2 hold and (3.2.3) holds under $H_{2n}: f = f_{2n}$. Then, under $H_{2n}$,
\[
\frac{(2\lambda)^{1+2\omega}\,\pi\, C(\hat\beta)^2\, n}{A^2\,\nu(\hat\beta)^{1+2\omega}\, b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\exp\big\{2/(\nu(\hat\beta)b^{\lambda})\big\}\,\Gamma(2\omega+1)}\; \hat T_n - \|\ell\|^2 \to_d \chi_2^2/2.
\]

The above two theorems show that the proposed tests can detect alternatives which converge to $f_0$ at a rate slower than $n^{-1/2}$.

Asymptotic power against a fixed alternative. Now we describe the asymptotic power for the ordinary smooth case against a fixed alternative $f_1$ with $\|f_1 - f_0\| > 0$. To proceed further we state the following result, which follows from Theorem 2 of Holzmann et al. (2007).
Assume $f_1 \ne f_0$ satisfies (3.3.1), assumptions (A) and (C) hold, $f_1$ and $f_0$ satisfy assumption (B) for some $r > \kappa + 1$ and have bounded second derivatives, $b \to 0$, and (3.2.6) holds. Then, under $H_1$,
\[
n^{1/2}\big(T_n(\alpha, \beta) - \|K_b * (f_1 - f_0)\|^2\big) \to_d N(0, \tau_0^2), \qquad (3.3.3)
\]
where
\[
\tau_0^2 = \frac{1}{2\pi^3}\,\mathrm{Var}\Big(\int e^{-it\varepsilon}\,\frac{\Phi_{f_1}(t) - \Phi_{f_0}(t)}{\Phi_g(\beta t)}\,dt\Big).
\]
We shall use this result to analyze the asymptotic distribution of $\hat T_n$ under the fixed alternative $H_1$. To proceed further, let $\mu_Z := EZ$, and suppose the first derivatives $\dot f_1$ and $\dot f_0$ exist. Define
\[
A_f = 2\int (f_1 - f_0)(x)\,\dot f_0(x)\,dx, \qquad B_f = 2\mu_Z \int (f_1 - f_0)(x)\,\dot f_1(x)\,dx.
\]

Theorem 3.3.5 Assume that (A), (C) and (D) hold, $f_1$ and $f_0$ satisfy assumption (B) with $r > \kappa + 1$, $r > 3/2$, with $\kappa$ as in (A), and have bounded second derivatives. Also, assume (3.3.1) and (3.2.3) hold under $H_1$. Furthermore, if $b \to 0$ and $n b^{\max\{4\kappa+2,\,2\kappa+3\}} \to \infty$, then
\[
n^{1/2}\big(\hat T_n - \|K_b * (f_1 - f_0)\|^2 - (\hat\alpha - \alpha)A_f - (\hat\beta - \beta)' B_f\big) \to_d N(0, \tau_0^2). \qquad (3.3.4)
\]

Note that the effect of estimating $\alpha$ and $\beta$ introduces another bias term, $n^{1/2}\big((\hat\alpha - \alpha)A_f + (\hat\beta - \beta)' B_f\big)$, into the asymptotic distribution of the statistic $\hat T_n$. This bias vanishes if to begin with there is no intercept parameter in the model and $\mu_Z = 0$. It can also be absorbed into the limit distribution under the following linearity condition on the estimators. Suppose that, under $H_1$, the estimators $\hat\alpha$ and $\hat\beta$ satisfy the expansions
\[
\hat\alpha - \alpha = \frac{1}{n}\sum_{j=1}^n \eta_j + o_p(n^{-1/2}), \qquad (3.3.5)
\]
\[
\hat\beta_k - \beta_k = \frac{1}{n}\sum_{j=1}^n \zeta_{jk} + o_p(n^{-1/2}), \quad k = 1, \ldots, p, \qquad (3.3.6)
\]
where the $\eta_j$ are i.i.d. with $E\eta = 0$, $\mathrm{Var}(\eta) > 0$ and $E|\eta|^{2+\vartheta} < \infty$, for some $\vartheta > 0$. Moreover, the same conditions are satisfied by the $\zeta_{jk}$'s, and for $i \ne j \ne k$, $\eta_i$, $\zeta_j$ and $e_k$ are mutually independent. Examples of estimators $\hat\alpha, \hat\beta$ satisfying these two conditions include the naive least squares estimators, the maximum likelihood estimators (see Hušková and Meintanis (2007)), and the bias-corrected estimators (see Fuller (1987)).
Using the above expansions, we obtain

Theorem 3.3.6 Assume the conditions of Theorem 3.3.5 hold and that $\hat\alpha$ and $\hat\beta$ satisfy (3.3.5)–(3.3.6). Then, for some $\tau > 0$,
\[
n^{1/2}\big(\hat T_n - \|K_b * (f_1 - f_0)\|^2\big) \to_d N(0, \tau^2). \qquad (3.3.7)
\]
The form of $\tau$ is described in the proof of this theorem in the last section; see (3.5.26). Although $\tau$ is complicated to compute in practice, bootstrap methods can be used to estimate it.

For the super smooth case, in order to obtain a similar result, we need the following stronger assumption on $f_1$ and $f_0$:

(B$^*$): The characteristic function $\Phi_f$ of the density $f$ of $\varepsilon$ satisfies $|\Phi_f(t)| = O\big(|t|^{\xi_0}\, e^{-|t|^\xi/\zeta}\big)$, for some $\xi_0 \in \mathbb{R}$, $\zeta > 0$ and $\xi > \lambda$.

Assumption (B$^*$) implies (D$'$), and ensures that $\int |\Phi_f(t)/\Phi_g(\beta t)|\,dt < \infty$. An example of $f$ and $g$ satisfying this condition is where $f$ is a normal density with variance smaller than 1 and $g$ is the standard normal density.

A result analogous to (3.3.3) can be obtained in the super smooth case as well, by following the proof of Theorem 2 in Holzmann et al. (2007) with known $\alpha$ and $\beta$. To be precise, assume $f_1, f_0$ satisfy (3.3.1), assumptions (A$'$) and (C$'$) hold, $f_1$ and $f_0$ satisfy assumption (B$^*$), $b \to 0$, and
\[
n b^{-\eta}\,\exp\big\{-4/(\nu(\beta)b^{\lambda})\big\} \to \infty, \quad \text{for any } \eta > 0. \qquad (3.3.8)
\]
Then (3.3.3) holds. In the case of unknown $\alpha$ and $\beta$, we obtain the following theorem.

Theorem 3.3.7 Suppose assumptions (A$'$), (C$'$) and (B$^*$) hold, and $f_1, f_0$ satisfy (3.3.1) and have bounded second derivatives. If, in addition, $b \to 0$ and (3.2.6) holds, then (3.3.4) holds. Furthermore, if $\hat\alpha$ and $\hat\beta$ satisfy (3.3.5)–(3.3.6), then (3.3.7) holds for some $\tau > 0$.

3.4 Simulations

In this section we report the findings of some extensive simulations, which assess the finite sample level and power behavior of a member of the above class of tests. The results are presented in two subsections, for the ordinary and super smooth cases.
3.4.1 Ordinary smooth case

Consider the measurement error model
\[
Y = 1 + X + \varepsilon, \qquad Z = X + u, \qquad (3.4.1)
\]
where $X \sim N(0, 1)$ and $\Phi_g(t) = 16/(4 + \sigma_u^2 t^2)^2$. This $\Phi_g$ satisfies assumption (A) of the ordinary smooth case with $\kappa = 4$. We wish to test the hypothesis that $\varepsilon \sim N(0, 0.25)$, i.e., $f_0$ in $H_0$ is the density of the normal distribution with mean zero and variance $0.25$. As in Koul and Song (2012), we use the bias-corrected estimators $\hat\alpha = \bar Y - \hat\beta\bar Z$ and $\hat\beta = S_{ZY}/(S_{ZZ} - \sigma_u^2)$, where $\bar Y$ and $\bar Z$ denote the sample means of $Y$ and $Z$, and $S_{ZY}$ and $S_{ZZ}$ denote the sample covariance of $Z$ and $Y$ and the sample variance of $Z$, respectively. In the deconvolution estimator of $f$ we used the sinc kernel $K(x) = \sin x/(\pi x)$. The proposed test rejects $H_0$ for large values of $\mathcal{T}_n := n\hat C_{V,b}^{-1/2}\big|\hat T_n - \hat C_{M,b}/(2\pi n)\big|$.

We shall compare this test with the Kolmogorov–Smirnov ($T_{KS}$) and Cramér–von Mises ($T_{CvM}$) tests and the $W_n$ test proposed by Koul and Song (2012), all based directly on the residuals $\hat e_i := Y_i - \hat\alpha - \hat\beta Z_i$, $1 \le i \le n$. The first two statistics are defined as
\[
T_{KS} := \sup_{x \in \mathbb{R}} n^{1/2}\,|\hat F_n(x) - F_0(x)|, \qquad
T_{CvM} := n\int \big(\hat F_n(x) - F_0(x)\big)^2\,dF_0(x),
\]
where $\hat F_n(x) := n^{-1}\sum_{i=1}^n I(\hat e_i \le x)$. To define $W_n$, let $\varphi$ be a density kernel on $\mathbb{R}$, $\varphi_2(u) := \int \varphi(v)\varphi(u + v)\,dv$, let $c \equiv c_n$ be another window width, let $w$ be a compactly supported density on $\mathbb{R}$, and let
\[
\tilde h_n(x) := \frac{1}{nc}\sum_{j=1}^n \varphi\Big(\frac{x - \hat e_j}{c}\Big), \qquad
\hat C_n := \frac{1}{n^2 c^2}\sum_{i=1}^n \int \varphi^2\Big(\frac{v - \hat e_i}{c}\Big)\, w(v)\,dv,
\]
\[
\hat\Gamma_n := 2\int \tilde h_n^2(x)\, w^2(x)\,dx \int \varphi_2^2(u)\,du.
\]
Then, with $h_0(x, \hat\beta) := \int f_0(x + \hat\beta u)\, g(u)\,du$,
\[
W_n := n c^{1/2}\,\hat\Gamma_n^{-1}\Big(\int \big(\tilde h_n(x) - h_0(x, \hat\beta)\big)^2\, w(x)\,dx - \hat C_n\Big).
\]
In this simulation study, we chose the kernel function $\varphi$ to be the standard normal density and the bandwidth $c = n^{-0.27}$, and $w(\cdot)$ was chosen to be the uniform density on the closed interval $[-6, 6]$. All three tests reject $H_0$ for large values of their corresponding statistics.
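The bias-corrected estimators used above can be sketched as follows. This is an illustrative simulation, not the thesis code; here $u$ is generated as the sum of two independent Laplace$(0, \sigma_u/2)$ variables, whose characteristic function is exactly $\Phi_g(t) = 16/(4 + \sigma_u^2 t^2)^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta, su2 = 50000, 1.0, 1.0, 0.5
su = np.sqrt(su2)
X = rng.normal(0.0, 1.0, n)
eps = rng.normal(0.0, 0.5, n)                               # epsilon ~ N(0, 0.25) under H0
u = rng.laplace(0.0, su/2, n) + rng.laplace(0.0, su/2, n)   # Phi_g(t) = 16/(4 + su2*t^2)^2
Y, Z = alpha + beta * X + eps, X + u

S_ZZ = Z.var()                                      # sample variance of Z
S_ZY = ((Z - Z.mean()) * (Y - Y.mean())).mean()     # sample covariance of Z and Y
beta_naive = S_ZY / S_ZZ                            # attenuated toward beta/(1 + su2)
beta_hat = S_ZY / (S_ZZ - su2)                      # bias-corrected slope
alpha_hat = Y.mean() - beta_hat * Z.mean()          # bias-corrected intercept
```

The naive slope is attenuated by the factor $1/(1 + \sigma_u^2)$; subtracting $\sigma_u^2$ from $S_{ZZ}$ removes this bias, which is why the corrected estimators are used throughout the simulations.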
To assess the effect of the measurement error on the finite sample level and power of these tests, we conducted simulations for the three values $\sigma_u^2 = 0.25$, $\sigma_u^2 = 0.5$ and $\sigma_u^2 = 1$, with bandwidths $b = 0.5n^{-1/12}$, $b = 0.65n^{-1/12}$ and $b = 0.8n^{-1/12}$, respectively. It is well known that the approximation of the distributions of test statistics based on density estimators by their asymptotic distributions is generally slow. For that reason, in this simulation study we used the Monte Carlo method to obtain the critical values of all the tests considered. At level $0.05$, the critical values of all four tests were simulated by the Monte Carlo method, based on sample sizes 300 and 500, repeated 1000 times. The 95% quantiles were calculated over the 1000 repetitions, and the mean values of these quantiles were taken as the critical values, given in Table 3.1 for the different values of $\sigma_u^2$. The Monte Carlo level of the three tests $\hat T_n$, $T_{KS}$ and $T_{CvM}$ is relatively more robust against variation in the measurement error than that of the $W_n$ test.

  n    sigma_u^2   T-hat_n   T_KS      T_CvM     W_n
 300   0.25        1.18118   0.86714   0.16988   1.34799
 300   0.5         1.14531   0.87355   0.17395   1.40213
 300   1           1.15600   0.88576   0.18056   1.49112
 500   0.25        1.19852   0.86223   0.16929   1.40258
 500   0.5         1.20847   0.86674   0.17357   1.44800
 500   1           1.20484   0.87829   0.18073   1.52883

Table 3.1: Monte Carlo critical values of all the tests, ordinary smooth case.

The alternatives considered here are $t$-distributions with $k$ degrees of freedom, denoted $t_k$, for $k = 4, 6, 8, 10, 15, 20$, and the double exponential (DE) and logistic (L) distributions, all having zero mean and standard deviation $0.5$. The sample sizes chosen are 300 and 500 and the level is $0.05$.
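The Monte Carlo critical-value procedure can be illustrated in a stripped-down form. The sketch below is idealized and not the thesis code: it simulates the Kolmogorov–Smirnov statistic directly under the null distribution of the residuals, ignoring the estimation of $\alpha, \beta$ and the measurement error structure, and uses fewer repetitions than the 1000 used in the study.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(2)
n, reps = 300, 500
# N(0, 0.25) cdf: F0(x) = 0.5*(1 + erf(x / (sigma*sqrt(2)))) with sigma = 0.5
F0 = lambda x: 0.5 * (1.0 + np.array([erf(v) for v in x / (0.5 * np.sqrt(2))]))

stats = np.empty(reps)
for r in range(reps):
    e = np.sort(rng.normal(0.0, 0.5, n))     # idealized residuals drawn under H0
    i = np.arange(1, n + 1)
    Fe = F0(e)
    d = np.maximum(i / n - Fe, Fe - (i - 1) / n).max()
    stats[r] = np.sqrt(n) * d                # T_KS for this repetition

crit = np.quantile(stats, 0.95)              # Monte Carlo 5% critical value
```

Because no parameters are estimated in this idealized version, `crit` lands near the asymptotic Kolmogorov 95% point of about 1.358; with estimated parameters and the measurement error structure, the study obtains the smaller values reported in Table 3.1.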
From Table 3.2 we see that, in terms of empirical power, the $\hat T_n$ test dominates the $W_n$ test uniformly across the chosen alternatives, sample sizes and values of $\sigma_u^2$; for $n = 500$ it also dominates the $T_{KS}$ test at almost all chosen alternatives when $\sigma_u^2 = .5, 1$, while the $T_{CvM}$ test dominates all other tests for the smallest value of $\sigma_u^2$.

We also considered the following normal and logistic mixture alternatives:
\[
f_1 = 0.5\,N(-\mu, 0.25) + 0.5\,N(\mu, 0.25), \quad \mu > 0, \qquad
f_2 = 0.5\,\ell(-\lambda, 1.5/\pi) + 0.5\,\ell(\lambda, 1.5/\pi), \quad \lambda \ge 0,
\]
where $\ell(a, b)$ is the density of the logistic d.f. $1/(1 + e^{-(x-a)/b})$.

The empirical powers for the normal and logistic mixture alternatives are given in Table 3.3. In both cases the sample sizes are 300 and 500 and the level is $0.05$. From Table 3.3 one observes the following. First, as $\sigma_u^2$ increases, the empirical powers generally decrease. Secondly, for the alternatives $f_1$ and $\sigma_u^2 = 1$, the proposed test $\hat T_n$ based on the deconvolution density estimator has larger empirical powers than the $T_{KS}$ and $W_n$ tests in most of the cases, while the $T_{CvM}$ test dominates all other three tests in terms of empirical power. For $\sigma_u^2 = 0.25$, the $W_n$ test dominates the three tests $\hat T_n$, $T_{KS}$ and $T_{CvM}$ at all normal mixture alternatives. For $\sigma_u^2 = 0.5$, the $\hat T_n$ test has larger empirical powers than $T_{KS}$, but smaller empirical powers than $T_{CvM}$ and $W_n$. Similar phenomena can be seen in Table 3.3 for the alternatives $f_2$.
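Sampling from the mixture alternatives above is straightforward. The following sketch is illustrative (the choice $\mu = 0.4$ and the sample size are assumptions for the example); it draws from $f_1$ and checks its first two moments.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_f1(n, mu):
    """Draw from the normal mixture 0.5 N(-mu, 0.25) + 0.5 N(mu, 0.25)."""
    signs = rng.choice([-1.0, 1.0], size=n)      # pick a mixture component
    return rng.normal(signs * mu, 0.5, size=n)   # component sd is 0.5

x = sample_f1(200000, 0.4)
# by construction the mixture has mean 0 and variance 0.25 + mu^2 = 0.41
```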
  n    sigma_u^2  Test     t4     t6     t8     t10    t15    t20    DE     L
 300   0.25       T-hat_n  0.240  0.093  0.072  0.052  0.047  0.060  0.183  0.059
                  T_KS     0.191  0.084  0.062  0.058  0.041  0.053  0.150  0.064
                  T_CvM    0.268  0.103  0.065  0.053  0.053  0.062  0.226  0.068
                  W_n      0.035  0.023  0.023  0.027  0.027  0.039  0.033  0.036
       0.5        T-hat_n  0.093  0.072  0.038  0.051  0.052  0.047  0.050  0.050
                  T_KS     0.064  0.057  0.055  0.054  0.060  0.053  0.051  0.049
                  T_CvM    0.100  0.077  0.065  0.052  0.056  0.051  0.071  0.049
                  W_n      0.018  0.037  0.038  0.043  0.040  0.046  0.032  0.034
       1          T-hat_n  0.046  0.050  0.043  0.040  0.049  0.049  0.046  0.048
                  T_KS     0.052  0.046  0.046  0.050  0.055  0.047  0.047  0.040
                  T_CvM    0.042  0.052  0.047  0.050  0.057  0.060  0.054  0.047
                  W_n      0.037  0.041  0.050  0.039  0.046  0.038  0.043  0.038
 500   0.25       T-hat_n  0.397  0.158  0.092  0.078  0.068  0.060  0.344  0.097
                  T_KS     0.244  0.122  0.082  0.077  0.050  0.044  0.223  0.076
                  T_CvM    0.398  0.159  0.101  0.083  0.054  0.053  0.315  0.090
                  W_n      0.051  0.037  0.027  0.032  0.034  0.035  0.059  0.025
       0.5        T-hat_n  0.131  0.052  0.049  0.054  0.043  0.049  0.107  0.044
                  T_KS     0.113  0.070  0.055  0.062  0.052  0.064  0.106  0.053
                  T_CvM    0.162  0.059  0.057  0.062  0.045  0.069  0.129  0.052
                  W_n      0.059  0.043  0.034  0.041  0.048  0.049  0.043  0.037
       1          T-hat_n  0.069  0.050  0.045  0.052  0.061  0.047  0.060  0.063
                  T_KS     0.059  0.049  0.043  0.044  0.053  0.050  0.049  0.050
                  T_CvM    0.058  0.046  0.042  0.056  0.057  0.049  0.068  0.059
                  W_n      0.042  0.034  0.054  0.047  0.048  0.041  0.034  0.048

Table 3.2: Empirical powers against chosen alternatives, ordinary smooth case.

  n    sigma_u^2  Test     mu=0.2  mu=0.4  mu=0.6  mu=0.8   la=1.2  la=1.4  la=1.6  la=1.8
 300   0.25       T-hat_n  0.084   0.672   0.999   1.000    0.252   0.523   0.767   0.929
                  T_KS     0.078   0.577   0.998   1.000    0.236   0.460   0.703   0.889
                  T_CvM    0.101   0.787   1.000   1.000    0.367   0.651   0.862   0.966
                  W_n      0.137   0.818   1.000   1.000    0.419   0.687   0.874   0.968
       0.5        T-hat_n  0.071   0.457   0.973   1.000    0.162   0.363   0.551   0.742
                  T_KS     0.078   0.350   0.920   0.998    0.163   0.278   0.431   0.645
                  T_CvM    0.082   0.514   0.985   1.000    0.226   0.430   0.631   0.824
                  W_n      0.095   0.473   0.974   1.000    0.206   0.403   0.584   0.772
       1          T-hat_n  0.079   0.262   0.729   0.964    0.147   0.209   0.291   0.450
                  T_KS     0.059   0.199   0.593   0.924    0.117   0.164   0.228   0.347
                  T_CvM    0.079   0.265   0.760   0.974    0.153   0.233   0.328   0.496
                  W_n      0.075   0.212   0.655   0.949    0.132   0.181   0.255   0.404
 500   0.25       T-hat_n  0.113   0.893   1.000   1.000    0.390   0.747   0.947   0.992
                  T_KS     0.095   0.835   1.000   1.000    0.393   0.697   0.913   0.987
                  T_CvM    0.143   0.955   1.000   1.000    0.586   0.879   0.985   1.000
                  W_n      0.172   0.955   1.000   1.000    0.585   0.872   0.982   1.000
       0.5        T-hat_n  0.063   0.646   0.998   1.000    0.300   0.520   0.776   0.922
                  T_KS     0.091   0.529   0.990   1.000    0.247   0.444   0.668   0.858
                  T_CvM    0.084   0.709   0.999   1.000    0.373   0.619   0.845   0.961
                  W_n      0.094   0.639   0.999   1.000    0.322   0.531   0.790   0.934
       1          T-hat_n  0.089   0.394   0.898   0.996    0.182   0.304   0.446   0.627
                  T_KS     0.065   0.305   0.801   0.988    0.156   0.218   0.379   0.528
                  T_CvM    0.084   0.415   0.930   1.000    0.197   0.342   0.484   0.681
                  W_n      0.073   0.320   0.855   0.995    0.153   0.265   0.373   0.550

Table 3.3: Empirical powers against mixture normal (left panel) and logistic (right panel) alternatives, ordinary smooth case.

3.4.2 Super smooth case

Now consider the measurement error model (3.4.1), where again $X \sim N(0, 1)$ and $\varepsilon \sim N(0, 0.25)$, but $u \sim N(0, \sigma_u^2)$. The bias-corrected estimators are again used to estimate $\alpha$ and $\beta$.
The sinc kernel $K(x) = \sin x/(\pi x)$ is used for the deconvolution kernel estimator, with the bandwidths $b = 0.55(\log n)^{-0.5}$, $b = (\sqrt{0.5} + 0.05)(\log n)^{-0.5}$ and $b = 1.15(\log n)^{-0.4}$ when $\sigma_u^2 = 0.25$, $\sigma_u^2 = 0.5$ and $\sigma_u^2 = 1$, respectively. Thus $C_g = 1$, $\nu = 2/\sigma_u^2$, $\lambda_0 = 0$, $\lambda = 2$, $A = 1$ and $\omega = 0$ in (3.2.7). The left side of (3.2.7) can then be written as
\[
\hat T_{n,s} := \frac{2\pi n\,\sigma_u^2\,\hat\beta^2}{b\,\exp\big(|\hat\beta\sigma_u|^2/b^2\big)}\;\hat T_n.
\]
The Monte Carlo distribution of $\hat T_{n,s}$, for the sample size 1000 based on 1000 repetitions, is very close to $\chi_2^2/2$. Hence the critical values of this test were obtained from the $\chi_2^2/2$ distribution. To examine the power, we compared our test with the same three direct tests as in the previous section. We generated the critical values for $T_{KS}$, $T_{CvM}$ and $W_n$, defined as above, by the Monte Carlo method, based on sample sizes 500 and 1000, repeated 1000 times. The 95% quantiles were calculated over the 1000 repetitions and the mean values of these quantiles were taken as the critical values, listed in Table 3.4.

  n     sigma_u^2   T_KS      T_CvM     W_n
 500    0.25        0.85655   0.16540   1.39447
 500    0.5         0.85670   0.16500   1.45467
 500    1           0.85545   0.16446   1.53780
 1000   0.25        0.85038   0.16482   1.46119
 1000   0.5         0.85183   0.16535   1.51706
 1000   1           0.85210   0.16492   1.59195

Table 3.4: Monte Carlo critical values of the $T_{KS}$, $T_{CvM}$ and $W_n$ tests, super smooth case.

We consider the same alternatives as in the ordinary smooth case of Subsection 3.4.1. The empirical powers against the $t$, double exponential and logistic distributions are given in Table 3.5. From this table one sees that the proposed deconvolution test provides the largest empirical powers in all cases compared to the other three testing methods when $\sigma_u^2 = 1$, while it dominates the $W_n$ test for the smaller values of $\sigma_u^2$. The empirical powers against normal and logistic mixture alternatives are given in Table 3.6, for sample sizes 500 and 1000.
From this table we see that, for both the normal and logistic mixture alternatives, the $W_n$ test dominates the $\hat T_{n,s}$ and $T_{KS}$ tests for all chosen sample sizes and for all values of $\sigma_u^2$, while the $T_{CvM}$ test dominates all other tests uniformly.

  n     sigma_u^2  Test      t3     t4     t5     t6     t8     t10    DE     L
 500    0.25       T-hat_ns  0.207  0.118  0.111  0.087  0.072  0.061  0.121  0.075
                   T_KS      0.451  0.184  0.119  0.090  0.071  0.062  0.130  0.071
                   T_CvM     0.694  0.300  0.160  0.120  0.082  0.066  0.206  0.082
                   W_n       0.146  0.037  0.037  0.035  0.029  0.040  0.034  0.035
        0.5        T-hat_ns  0.143  0.105  0.087  0.080  0.065  0.069  0.105  0.072
                   T_KS      0.152  0.086  0.073  0.066  0.053  0.045  0.073  0.051
                   T_CvM     0.250  0.104  0.082  0.059  0.055  0.058  0.088  0.045
                   W_n       0.043  0.047  0.051  0.039  0.046  0.051  0.045  0.034
        1          T-hat_ns  0.155  0.093  0.088  0.071  0.074  0.058  0.070  0.075
                   T_KS      0.084  0.050  0.046  0.044  0.054  0.047  0.054  0.051
                   T_CvM     0.087  0.059  0.052  0.047  0.053  0.049  0.057  0.050
                   W_n       0.043  0.047  0.044  0.045  0.049  0.050  0.044  0.054
 1000   0.25       T-hat_ns  0.262  0.117  0.096  0.067  0.066  0.062  0.165  0.062
                   T_KS      0.714  0.364  0.193  0.123  0.090  0.067  0.274  0.080
                   T_CvM     0.960  0.551  0.304  0.171  0.114  0.088  0.403  0.101
                   W_n       0.499  0.098  0.051  0.036  0.038  0.037  0.095  0.044
        0.5        T-hat_ns  0.184  0.085  0.079  0.066  0.046  0.060  0.081  0.075
                   T_KS      0.273  0.102  0.090  0.072  0.056  0.048  0.100  0.046
                   T_CvM     0.468  0.164  0.120  0.090  0.060  0.053  0.095  0.048
                   W_n       0.092  0.041  0.041  0.044  0.041  0.049  0.046  0.050
        1          T-hat_ns  0.199  0.116  0.088  0.078  0.074  0.063  0.081  0.073
                   T_KS      0.094  0.062  0.051  0.054  0.048  0.044  0.049  0.047
                   T_CvM     0.135  0.072  0.057  0.053  0.063  0.047  0.050  0.039
                   W_n       0.066  0.035  0.047  0.058  0.043  0.055  0.050  0.042

Table 3.5: Empirical powers against alternative distributions, super smooth case.

  n     sigma_u^2  Test      mu=0.2  mu=0.4  mu=0.6  mu=0.8   la=1.2  la=1.4  la=1.6  la=1.8
 500    0.25       T-hat_ns  0.048   0.140   0.547   0.890    0.092   0.134   0.189   0.305
                   T_KS      0.079   0.800   1.000   1.000    0.389   0.645   0.853   0.974
                   T_CvM     0.136   0.930   1.000   1.000    0.565   0.844   0.961   0.997
                   W_n       0.143   0.908   1.000   1.000    0.502   0.810   0.942   0.997
        0.5        T-hat_ns  0.048   0.065   0.207   0.599    0.079   0.105   0.165   0.240
                   T_KS      0.042   0.171   0.803   0.997    0.199   0.326   0.549   0.765
                   T_CvM     0.049   0.263   0.938   1.000    0.297   0.512   0.732   0.897
                   W_n       0.045   0.191   0.835   0.999    0.195   0.362   0.558   0.783
        1          T-hat_ns  0.040   0.042   0.352   0.925    0.042   0.071   0.159   0.291
                   T_KS      0.049   0.099   0.452   0.865    0.115   0.164   0.286   0.342
                   T_CvM     0.059   0.128   0.590   0.950    0.145   0.237   0.412   0.523
                   W_n       0.054   0.085   0.359   0.828    0.081   0.127   0.227   0.279
 1000   0.25       T-hat_ns  0.063   0.150   0.684   0.979    0.076   0.110   0.208   0.356
                   T_KS      0.154   0.983   1.000   1.000    0.632   0.929   0.996   0.998
                   T_CvM     0.225   1.000   1.000   1.000    0.824   0.984   0.997   0.998
                   W_n       0.187   0.994   1.000   1.000    0.755   0.966   0.997   0.998
        0.5        T-hat_ns  0.070   0.153   0.553   0.892    0.070   0.108   0.169   0.271
                   T_KS      0.096   0.742   1.000   1.000    0.361   0.634   0.879   0.984
                   T_CvM     0.147   0.898   1.000   1.000    0.557   0.803   0.964   0.996
                   W_n       0.044   0.631   1.000   1.000    0.342   0.610   0.862   0.973
        1          T-hat_ns  0.034   0.275   0.968   1.000    0.094   0.231   0.365   0.625
                   T_KS      0.067   0.368   0.933   1.000    0.174   0.295   0.484   0.660
                   T_CvM     0.083   0.511   0.977   1.000    0.252   0.420   0.649   0.837
                   W_n       0.052   0.238   0.879   1.000    0.126   0.199   0.358   0.508

Table 3.6: Empirical powers against mixture normal (left panel) and logistic (right panel) distributions, super smooth case.

3.5 Proofs

Here we present the proofs of Theorems 3.2.1–3.3.7. For brevity, we write $T_n := T_n(\alpha, \beta)$ and $f_n(x) := f_n(x, \alpha, \beta)$ with known $\alpha, \beta$, and $\hat f_n(x) := f_n(x, \hat\alpha, \hat\beta)$.

Since $C_{V,b} \approx b^{-(4\kappa+1)}$, we first show that
\[
n b^{2\kappa} \int (\hat f_n - f_n)^2(x)\,dx = o_p(1). \qquad (3.5.1)
\]
Using Parseval's identity, we have
\[
\int (\hat f_n - f_n)^2(x)\,dx
= \frac{1}{4\pi^2}\int \Big|\int e^{-itx}\,\Phi_K(bt)\Big(\frac{\hat\Psi_n(t)}{\Phi_g(-\hat\beta t)} - \frac{\Psi_n(t)}{\Phi_g(-\beta t)}\Big)dt\Big|^2 dx
= \frac{1}{2\pi}\int |\Phi_K(bt)|^2\,\Big|\frac{\hat\Psi_n(t)}{\Phi_g(-\hat\beta t)} - \frac{\Psi_n(t)}{\Phi_g(-\beta t)}\Big|^2 dt
\]
\[
\le \frac{1}{\pi}\int |\Phi_K(bt)|^2\,\frac{|\hat\Psi_n(t) - \Psi_n(t)|^2}{|\Phi_g(-\hat\beta t)|^2}\,dt
+ \frac{1}{\pi}\int |\Phi_K(bt)\Psi_n(t)|^2\,\frac{|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)|^2}{|\Phi_g(-\beta t)\Phi_g(-\hat\beta t)|^2}\,dt
=: \frac{1}{\pi} S_1 + \frac{1}{\pi} S_2, \ \text{say}. \qquad (3.5.2)
\]
Since $\Phi_K$ is supported on $[-1, 1]$, $\Phi_K(bt) = 0$ for $|t| > 1/b$. Thus in the above two integrals $t \in [-1/b, 1/b]$. Since $\mu_g := \int \|x\|\, g(x)\,dx < \infty$, $\dot\Phi_g$ exists and is uniformly bounded above by $\mu_g$. This fact, together with (3.2.3) and assumption (A), implies
\[
|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)| \le \mu_g\,|t|\,\|\hat\beta - \beta\|, \qquad (3.5.3)
\]
\[
\max_{|t| \le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)}{\Phi_g(-\beta t)} - 1\Big|
= \max_{|t| \le 1/b}\frac{|\Phi_g(-\hat\beta t) - \Phi_g(-\beta t)|}{|\Phi_g(-\beta t)|}
= O_p\big(n^{-1/2} b^{-\kappa-1}\big). \qquad (3.5.4)
\]
Let $A_n := \{|\Phi_g(-\hat\beta t)| \ge |\Phi_g(-\beta t)|/2,\ \forall\, t \in [-1/b, 1/b]\}$. Since $nb^{2\kappa+3} \to \infty$, (3.5.4) implies $P(A_n) \to 1$. Thus we need only restrict our attention to $A_n$.

Consider $S_2$. Conditions (A) and (B) imply that there exist $M$, $c_\beta$, $C_\beta$ and $C_f$ such that, for all $|t| > M$, $c_\beta |t|^{-\kappa} \le |\Phi_g(\beta t)| \le C_\beta |t|^{-\kappa}$ and $|\Phi_f(t)| \le C_f |t|^{-r}$. Take $n$ large enough so that $M < 1/b$. Split the integral in $S_2$ into the two ranges $|t| \le M$ and $|t| > M$. Then, by (3.2.3) and (3.5.3), we obtain that on the event $A_n$, $S_2$ is bounded above by
\[
4\mu_g^2\,\|\hat\beta - \beta\|^2 \int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)\Psi_n(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt + O_p(n^{-1})
\]
\[
\le 8\mu_g^2\,\|\hat\beta - \beta\|^2 \Big\{\int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)|^2\,|\Psi_n(t) - \Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt
+ \int_{M < |t| \le 1/b} \frac{|t\,\Phi_K(bt)\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt\Big\} + O_p(n^{-1}).
\]
By Parseval's identity,
\[
T_n(\alpha, \beta) = \frac{1}{2\pi}\int |\Phi_K(bt)|^2\,\frac{|\Psi_n(t) - \Phi_h(t)|^2}{|\Phi_g(-\beta t)|^2}\,dt = O_p\big(n^{-1} b^{-2\kappa-1}\big), \qquad (3.5.5)
\]
because of (3.2.1) and (3.2.2). Because $|\Phi_g(\beta t)|^{-2} \le c_\beta^{-2}|t|^{2\kappa}$ for $|t| > M$, the first term within the curly brackets in the above bound is bounded above by a constant multiple of $b^{-2\kappa-2}\,T_n(\alpha, \beta) = O_p(n^{-1} b^{-4\kappa-3})$. Similarly, assumptions (A) and (B) imply
\[
\int_{|t| > M} \frac{|t\,\Phi_K(bt)\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt
= \int_{|t| > M} \frac{|t\,\Phi_K(bt)\Phi_f(t)|^2}{|\Phi_g(-\beta t)|^2}\,dt
= O\big(b^{\min(2r-2\kappa-3,\,0)}\big).
\]
4 2 |Φ (−βt)| |Φ (−βt)| g g |t|>M |t|>M Hence, in view of (3.2.3), S2 = Op (n−2 b−4κ−3 ) + Op (n−1 bmin(2r−2κ−3,0) ) = op (n−1 b−2κ ). Next, to analyze S1 . Let 1 S11 := n2 |ΦK (bt)|2 | 1 S12 := n2 |ΦK (bt)|2 | 1 S13 := n2 |ΦK (bt)|2 | n ˆ j=1 t(β − β) it(Yj −α−β Zj ) 2 | Zj e |Φg (−βt)|2 it(Yj −α−β Zj ) 2 n | j=1 te 2 |Φg (−βt)| n ˆ j=1 t((β − β) 76 dt, it(Yj −α−β Zj ) 2 | Zj )2 e |Φg (−βt)|2 dt, dt. (3.5.6) Using the fact Yj − α − β Zj = εj − β uj , we obtain on the event An , S1 ≤ 4 (3.5.7) |ΦK (bt)|2 ˆ n (t) − Ψn (t)|2 |Ψ dt |Φg (−βt)|2 |ΦK (bt)|2 | 16 ≤ n2 n ˆ j=1 t(β − β) Zj e it(εj −β uj ) 2 | |Φg (−βt)|2 16(ˆ α − α)2 + n2 it(εj −β uj ) 2 n | j=1 te |Φg (−βt)|2 |ΦK (bt)|2 | |ΦK (bt)|2 | 16 + 2 2 n b n ˆ j=1 t((β − β) Zj )2 e dt dt it(εj −β uj ) 2 | |Φg (−βt)|2 16(ˆ α − α)4 + n 2 b2 it(εj −β uj ) 2 n | j=1 te |Φg (−βt)|2 |ΦK (bt)|2 | dt dt + Op n−3 b−2κ−7 = 16[S11 + (ˆ α − α)2 S12 + b−2 S13 + (ˆ α − α)4 b−2 S12 ] + Op n−3 b−2κ−7 , by (3.2.3), assumption (A), and the fact that n 3 j=1 |Zj | = Op (n). Now, consider S11 . p S11 ≤ p k=1 (βˆk − βk )2 n2 |ΦK (bt)|2 | it(εj −β uj ) 2 n | j=1 tZkj e |Φg (−βt)|2 Since X, u and ε are mutually independent, for any k = 1, · · · , p, EZk e it(εj −β uj ) = EXk Φh (t) + Φf (t)Euk e−itβ u . 77 dt We use this to obtain |ΦK (bt)|2 | 1 n2 it(εj −β uj ) 2 n | j=1 tZkj e 2 |Φg (−βt)| dt it(εj −β uj ) it(ε −β uj ) 2 n − EZkj e j ] j=1 [Zkj e b2 |Φg (−βt)|2 2 n |ΦK (bt)|2 j=1 EXk tΦh (t)] dt |Φg (−βt)|2 n −itβ u | |ΦK (bt)|2 j=1 tΦf (t)Euk e dt. |Φg (−βt)|2 |ΦK (bt)|2 3 ≤ n2 3 + 2 n 3 + 2 n (3.5.8) dt An argument similar to the one used in the proof of Theorem 1 in Holzmann et al. (2007) implies that the first summand in the upper bound of (3.5.8) is Op (n−1 b−2κ−3 ). The second summand is Op (1), by assumption (B), and Φh (t)/Φg (−βt) = Φf (t). 
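The object driving all of these bounds is the deconvolution kernel density estimator, obtained by Fourier inversion of a damped, deconvolved empirical characteristic function. The following minimal numerical sketch, which is not part of the formal development of this chapter, computes such an estimator under illustrative assumptions: standard normal model errors, Laplace measurement error (an ordinary smooth law with $\kappa = 2$), a kernel with $\Phi_K(t) = (1-t^2)^3$ supported on $[-1,1]$, and arbitrary choices of $n$ and $b$.

```python
import numpy as np

def deconv_kde(v, x_grid, b, sigma_u):
    """Deconvolution kernel density estimate of the error density from
    contaminated observations v_j = eps_j - u_j, with u_j ~ Laplace(0, sigma_u)
    (ordinary smooth: its characteristic function decays like |t|^-2).

    Phi_K(t) = (1 - t^2)^3 on [-1, 1], so Phi_K(bt) = 0 for |t| > 1/b,
    matching the compact support of Phi_K assumed in the proofs.
    """
    t = np.linspace(-1.0 / b, 1.0 / b, 1501)               # frequency grid
    dt = t[1] - t[0]
    phi_K = np.clip(1.0 - (b * t) ** 2, 0.0, None) ** 3    # kernel cf
    psi_n = np.exp(1j * np.outer(t, v)).mean(axis=1)       # empirical cf of v_j
    phi_g = 1.0 / (1.0 + (sigma_u * t) ** 2)               # Laplace cf (symmetric)
    integrand = phi_K * psi_n / phi_g                      # damped, deconvolved cf
    # Fourier inversion: f_hat(x) = (1/2pi) int e^{-itx} Phi_K(bt) Psi_n(t)/Phi_g(-t) dt
    inv = np.exp(-1j * np.outer(x_grid, t))
    return (inv * integrand).sum(axis=1).real * dt / (2.0 * np.pi)

rng = np.random.default_rng(0)
n, sigma_u, b = 500, 0.5, 0.3            # illustrative sample size and bandwidth
eps = rng.standard_normal(n)             # true error density: N(0, 1)
v = eps - rng.laplace(0.0, sigma_u, n)   # contaminated residuals
x = np.linspace(-6.0, 6.0, 241)
f_hat = deconv_kde(v, x, b, sigma_u)
mass = f_hat.sum() * (x[1] - x[0])       # total mass, close to 1
```

Because $\Phi_K(0)\Psi_n(0)/\Phi_g(0) = 1$, the estimate integrates to approximately one even though, like all deconvolution estimates, it may dip below zero in the tails.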
To analyze the third summand in the upper bound of (3.5.8), decompose the integral into the two ranges $|t|>M$ and $|t|\le M$, and use conditions (A) and (B) to show that the term with integration over $|t|\le M$ is $O_p(1)$, while the term with $|t|>M$ is of the order $O_p(b^{\min(2r-2\kappa-3,0)})$, thereby showing that the third summand in (3.5.8) is of the order $O_p(1)+O_p(b^{2r-2\kappa-3})$. Thus
\[
S_{11}=O_p(n^{-1}b^{-2\kappa-3})+O_p(1)+O_p(b^{2r-2\kappa-3}).
\tag{3.5.9}
\]
Similarly one obtains that $S_{12}$ and $S_{13}$ are of the same order as $S_{11}$. Then (3.5.7), (3.5.9), $nb^{2\kappa+3}\to\infty$ and $nb^{7/2}\to\infty$ imply
\[
nb^{2\kappa}S_1=O_p(n^{-1}b^{-3})+O_p(b^{2\kappa})+O_p(b^{2r-3})+O_p(n^{-2}b^{-7})=o_p(1).
\]
This together with (3.5.6) completes the proof of (3.5.1).

From (3.5.1) and (3.5.5) we obtain
\[
\hat T_n-T_n=\int(f_n-\hat f_n)^2(x)\,dx+2\int(f_n-\hat f_n)(f_n-K_b*f_0)(x)\,dx=o_p(n^{-1}b^{-2\kappa-1/2}),
\]
by (3.2.1) and (3.2.2). Hence, in view of (3.2.2),
\[
n\,C_{V,b}^{-1/2}\big(\hat T_n-C_{M,b}/(2\pi n)\big)\to_d N\big(0,\,1/2\pi^2\big).
\tag{3.5.10}
\]
To complete the proof of (3.2.4), it suffices to show that
\[
\text{(a)}\quad \Big|1-\frac{\hat C_{V,b}^{1/2}}{C_{V,b}^{1/2}}\Big|=o_p(b^{1/2}),\qquad
\text{(b)}\quad \Big|\frac{\hat C_{M,b}}{\hat C_{V,b}^{1/2}}-\frac{C_{M,b}}{C_{V,b}^{1/2}}\Big|=o_p(1).
\tag{3.5.11}
\]
To show (3.5.11)(a), recall $\psi(\beta,s,t):=\Phi_g(\beta t+\beta s)\,\Phi_f(t+s)$. Then
\[
\begin{aligned}
|C_{V,b}-\hat C_{V,b}|
&=\Big|\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt
-\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\hat\beta s)|^2}\,|\psi(\hat\beta,s,t)|^2\,ds\,dt\Big|\\
&\le\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2\,\big||\Phi_g(\beta t)|^2-|\Phi_g(\hat\beta t)|^2\big|}{|\Phi_g(\beta t)|^2|\Phi_g(\hat\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt\\
&\quad+\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2\,\big||\Phi_g(\beta s)|^2-|\Phi_g(\hat\beta s)|^2\big|}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\beta s)|^2|\Phi_g(\hat\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt\\
&\quad+\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\hat\beta t)|^2|\Phi_g(\hat\beta s)|^2}\,\big||\psi(\beta,s,t)|^2-|\psi(\hat\beta,s,t)|^2\big|\,ds\,dt.
\end{aligned}
\]
In view of (3.5.4), the first term in the above bound is bounded from above by
\[
\max_{|t|\le 1/b}\Big|1-\frac{|\Phi_g(\beta t)|^2}{|\Phi_g(\hat\beta t)|^2}\Big|
\int\!\!\int\frac{|\Phi_K(tb)|^2|\Phi_K(sb)|^2}{|\Phi_g(\beta t)|^2|\Phi_g(\beta s)|^2}\,|\psi(\beta,s,t)|^2\,ds\,dt
=O_p\big(n^{-1/2}b^{-1-\kappa}\,C_{V,b}\big).
\]
The other two terms in the above bound are bounded similarly.
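Parseval's identity is what licenses the repeated passage between the $x$-domain squared distances and their $t$-domain forms, as in (3.5.2) and (3.5.5). The toy computation below checks the identity numerically for a statistic of the form $T_n(\alpha,\beta)$; the error laws, kernel and bandwidth are illustrative assumptions only, not the choices fixed by the theorems.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_u, b = 400, 0.5, 0.3
v = rng.standard_normal(n) - rng.laplace(0.0, sigma_u, n)  # eps - u, H0: eps ~ N(0,1)

t = np.linspace(-1.0 / b, 1.0 / b, 2001)
dt = t[1] - t[0]
phi_K = np.clip(1.0 - (b * t) ** 2, 0.0, None) ** 3   # kernel cf, support [-1, 1]
psi_n = np.exp(1j * np.outer(t, v)).mean(axis=1)      # empirical cf of the residuals
phi_g = 1.0 / (1.0 + (sigma_u * t) ** 2)              # Laplace cf (symmetric)
phi_h = np.exp(-t ** 2 / 2.0) * phi_g                 # cf of eps - u under H0

# t-domain form: T_n = (1/2pi) int |Phi_K(bt)|^2 |Psi_n - Phi_h|^2 / |Phi_g|^2 dt
T_t = (np.abs(phi_K * (psi_n - phi_h)) ** 2 / phi_g ** 2).sum() * dt / (2.0 * np.pi)

# x-domain form: T_n = int (f_n - K_b * f_0)^2 dx, with f_n - K_b * f_0 recovered
# by Fourier inversion of its transform Phi_K(bt) (Psi_n - Phi_h) / Phi_g
x = np.linspace(-10.0, 10.0, 801)
dx = x[1] - x[0]
diff_cf = phi_K * (psi_n - phi_h) / phi_g
diff = (np.exp(-1j * np.outer(x, t)) * diff_cf).sum(axis=1).real * dt / (2.0 * np.pi)
T_x = (diff ** 2).sum() * dx

rel_err = abs(T_x - T_t) / T_t   # the two forms agree up to discretization error
```

The agreement is close because the integrand is band-limited to $[-1/b, 1/b]$, so a fine $x$-grid on a wide interval captures essentially all of the $L_2$ mass.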
Together with (3.2.1) and $nb^{2\kappa+1}\to\infty$, we obtain
\[
\big|1-\hat C_{V,b}/C_{V,b}\big|=O_p(n^{-1/2}b^{-1-\kappa})=o_p(b^{1/2}),
\]
which implies (3.5.11)(a). Next, consider (3.5.11)(b). Applying (3.2.1), (3.5.11)(a) and $nb^{2\kappa+1}\to\infty$,
\[
\begin{aligned}
\big|\hat C_{M,b}/\hat C_{V,b}^{1/2}-C_{M,b}/C_{V,b}^{1/2}\big|
&\le \big|\hat C_{M,b}-C_{M,b}\big|\,\hat C_{V,b}^{-1/2}
+\frac{C_{M,b}}{C_{V,b}^{1/2}}\Big|1-\frac{C_{V,b}^{1/2}}{\hat C_{V,b}^{1/2}}\Big|\\
&\le \max_{|t|\le 1/b}\Big|1-\frac{|\Phi_g(\beta t)|^2}{|\Phi_g(\hat\beta t)|^2}\Big|\,C_{M,b}\,\hat C_{V,b}^{-1/2}+o_p(1)
=O_p(n^{-1/2}b^{-3/2-\kappa})=o_p(1).
\end{aligned}
\]
This completes the proof of (3.5.11), which combined with (3.5.10) also proves (3.2.4), thereby completing the proof of Theorem 3.2.1.

Proof of Theorem 3.2.2. Let $\zeta_\beta(b):=\exp(2\nu(\beta)/b^\lambda)$, $\beta\in\mathbb{R}$. We shall first show that
\[
\frac{n}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)}\int(\hat f_n-f_n)^2(x)\,dx=o_p(1).
\tag{3.5.12}
\]
The proof is similar to that in the ordinary smooth case; we list only the main differences. First, arguing as for (3.5.4), in the super smooth case assumption (A$'$) implies
\[
\max_{|t|\le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)}{\Phi_g(-\beta t)}-1\Big|
=\max_{|t|\le 1/b}\Big|\frac{\Phi_g(-\hat\beta t)-\Phi_g(-\beta t)}{\Phi_g(-\beta t)}\Big|
=O_p\big(n^{-1/2}b^{-1+2\lambda_0}\,\zeta_\beta^{1/2}(b)\big).
\tag{3.5.13}
\]
Hence, with $A_n:=\{|\Phi_g(-\hat\beta t)|\ge|\Phi_g(-\beta t)|/2,\ t\in[-1/b,1/b]\}$, (3.2.6) implies $P(A_n)\to 1$.

Assumptions (B$'$) and (D$'$) imply that there exist constants $M,c_\beta,C_\beta<\infty$ such that, for $|t|>M$,
\[
c_\beta|t|^{\lambda_0}e^{-\nu(\beta)|t|^\lambda}\le|\Phi_g(t)|\le C_\beta|t|^{\lambda_0}e^{-\nu(\beta)|t|^\lambda},
\qquad |\Phi_f(t)|\le C_{g1}|t|^{-\lambda_1}.
\]
Also, on the event $A_n$, there exists some $\tilde\beta$ between $\hat\beta$ and $\beta$ such that $S_2$ is bounded from above by
\[
\frac{2\mu_g\|\hat\beta-\beta\|^2}{b^2}\int_{|t|\ge M}|\Phi_K(bt)|^2\,\frac{|\Psi_n(t)-\Phi_h(t)|^2+|\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt+O_p(n^{-1}).
\]
Based on (3.2.5),
\[
\int_{|t|\ge M}\frac{|\Phi_K(bt)|^2\,|\Psi_n(t)-\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt=O_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+4\lambda_0}\,\zeta_\beta^2(b)\big).
\]
From Lemma 5 in van Es and Uh (2005), it follows that
\[
\int_{|t|\ge M}\frac{|\Phi_K(bt)\,\Phi_h(t)|^2}{|\Phi_g(-\beta t)|^4}\,dt=O_p\big(b^{2\lambda_0+2\lambda_1+\lambda(1+2\omega)}\,\zeta_\beta(b)\big).
\]
So when $\lambda_1>1$ and $n,b$ satisfy (3.2.6), we have $S_2=o_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)\big)$.

Now consider $S_1$. Following the same argument as in (3.5.7), and using assumptions (A$'$) and (B$'$), we obtain
\[
S_1\le 8\,\|\hat\beta-\beta\|^2 b^{-2} S_{11}+8(\hat\alpha-\alpha)^2 b^{-2} S_{12}+O_p\Big(\frac{b^{2\lambda_0-1}\,\zeta_\beta(b)}{n^2b^4}\Big).
\tag{3.5.14}
\]
Consider $S_{11}$ first.
Similarly to (3.5.8), together with assumptions (A$'$)–(D$'$), we obtain
\[
S_{11}=O_p\big(n^{-1}b^{-1+2\lambda_0}\,\zeta_\beta(b)\big)+O_p(1)+O_p\big(b^{2\lambda_0-1+2\lambda_1+\lambda(1+2\omega)}\,\zeta_\beta(b)\big).
\tag{3.5.15}
\]
$S_{12}$ can be handled in the same way. Thus the above arguments, (3.2.6), (3.5.14) and (3.5.15) imply
\[
\frac{nS_1}{b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)}
=O_p(n^{-1}b^{-\lambda-2\lambda\omega})+O_p(n^{-1}b^{-\lambda-2\lambda\omega-1-2\lambda_0})
+O(b^{2\lambda_1-2})+O_p(n^{-1}b^{-\lambda-4-2\lambda\omega})=o_p(1).
\]
This completes the proof of (3.5.12). Combining this with (3.2.5), we obtain
\[
\hat T_n-T_n=\int(f_n-\hat f_n)^2(x)\,dx+2\int(f_n-\hat f_n)(f_n-K_b*f_0)(x)\,dx
=o_p\big(n^{-1}b^{\lambda-1+2\lambda\omega+2\lambda_0}\,\zeta_\beta(b)\big).
\tag{3.5.16}
\]
Also, since $\hat\beta-\beta=O_p(n^{-1/2})$ and the first derivatives of $\nu(\beta)$ and $C(\beta)$ exist,
\[
\big|1-\exp\big(-2(\nu(\hat\beta)-\nu(\beta))/b^\lambda\big)\big|=o_p(1).
\tag{3.5.17}
\]
Then (3.5.16) and (3.5.17) yield (3.2.7), which completes the proof of Theorem 3.2.2.

Proof of Theorem 3.3.1. Define
\[
\tilde T_n:=\int\big(\hat f_n(x)-K_b*f_1(x)\big)^2\,dx.
\]
Arguing as in the proof of Theorem 3.2.1, we obtain
\[
n\,\tilde C_{V,b}^{-1/2}\big(\tilde T_n-\hat C_{M,b}/(2\pi n)\big)\to_d N(0,\,1/2\pi^2),
\tag{3.5.18}
\]
where $\tilde C_{V,b}$ is the same as $\hat C_{V,b}$ with $f$ replaced by $f_1$. Hence $\tilde C_{V,b}\asymp b^{-(4\kappa+1)}$. Next, consider
\[
nb^{2\kappa+1/2}\big(\hat T_n-\tilde T_n\big)
=nb^{2\kappa+1/2}\int\big(K_b*f_0(x)-K_b*f_1(x)\big)^2\,dx
+2nb^{2\kappa+1/2}\int\big(\hat f_n(x)-K_b*f_1(x)\big)\big(K_b*f_1(x)-K_b*f_0(x)\big)\,dx.
\]
Because $\int\big(K_b*f_0(x)-K_b*f_1(x)\big)^2\,dx\to\|f_1-f_0\|^2>0$ and $nb^{2\kappa+3}\to\infty$, the first term on the right hand side above is of the order $O(nb^{2\kappa+1/2})\to\infty$, while, by (3.5.18) and the Cauchy–Schwarz inequality, the second term is of the order $o_p(nb^{2\kappa+1/2})$. This completes the proof of Theorem 3.3.1.

The proofs of Theorems 3.3.2, 3.3.3 and 3.3.4 are similar to those of Theorems 3.3.1 and 3.2.1, and hence the details are omitted.

Proof of Theorem 3.3.5. For the sake of completeness, we first give a brief proof of (3.3.3). For $j=1,\cdots,n$, let
\[
D_j=\frac{1}{\pi}\int|\Phi_K(tb)|^2\Big(\frac{e^{it(\varepsilon_j-\beta'u_j)}}{\Phi_g(-\beta t)}-\Phi_{f_1}(t)\Big)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,dt.
\]
Note that, since $K$ is symmetric, $D_j$ is real. Rewrite
\[
T_n-\|K_b*(f_1-f_0)\|^2=\int(f_n-K_b*f_1)^2\,dx+2\int(f_n-K_b*f_1)\big(K_b*(f_1-f_0)\big)\,dx.
\]
Recall (3.2.2) and that $nb^{4\kappa+2}\to\infty$.
Hence, the first term on the right hand side above is $O_p(n^{-1}b^{-2\kappa-1})=o_p(n^{-1/2})$. Using Parseval's equation, the second term can be written as $n^{-1}\sum_{j=1}^n D_j$. Note that the $D_j$'s are identically distributed, mutually independent r.v.'s (forming an array in $b$), with $ED_1=0$ and $\mathrm{Var}(D_1)$ converging to
\[
\tau_0^2:=\frac{1}{\pi^2}\int\!\!\int\frac{\Phi_h(t-s)}{\Phi_g(\beta s)\Phi_g(-\beta t)}\big(\Phi_{f_1}(s)-\Phi_{f_0}(s)\big)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,ds\,dt
-\frac{1}{\pi^2}\Big(\int\Phi_{f_1}(-t)\big(\Phi_{f_1}(t)-\Phi_{f_0}(t)\big)\,dt\Big)^2
=\mathrm{Var}\Big(\frac{1}{\pi}\int e^{-it\varepsilon}\,\frac{\Phi_{f_1}(t)-\Phi_{f_0}(t)}{\Phi_g(\beta t)}\,dt\Big).
\]
Moreover,
\[
E|D_1|^4\le\frac{1}{\pi^4}\Big(\int\frac{\big(1+|\Phi_{f_1}(t)|\big)\big(|\Phi_{f_1}(t)|+|\Phi_{f_0}(t)|\big)}{|\Phi_g(-\beta t)|}\,dt\Big)^4=O(1),
\]
by assumption (B) with $r>\kappa+1$. Hence one obtains (3.3.3) by the Lindeberg–Feller CLT.

To complete the proof of Theorem 3.3.5, first consider the case where $\alpha$ is known, so that $\hat f_n$ is based on the residuals $Y_i-\alpha-\hat\beta'Z_i$ only. Without loss of generality, assume $\alpha=0$. Under the alternative $H_1$,
\[
n^{1/2}(\hat T_n-T_n)=n^{1/2}\int(\hat f_n-f_n)^2(x)\,dx
+2n^{1/2}\int(\hat f_n-f_n)(f_n-K_b*f_1)(x)\,dx
+2n^{1/2}\int(\hat f_n-f_n)\big(K_b*f_1-K_b*f_0\big)(x)\,dx.
\]
The same proof as that of (3.5.1) and $nb^{4\kappa+2}\to\infty$ imply
\[
n^{1/2}\int(\hat f_n-f_n)^2(x)\,dx=o_p(n^{-1/2}b^{-2\kappa})=o_p(1).
\tag{3.5.19}
\]
This fact, together with (3.2.4) and the Cauchy–Schwarz inequality, implies
\[
2n^{1/2}\int(\hat f_n-f_n)(f_n-K_b*f_1)(x)\,dx=o_p(n^{-1}b^{-3\kappa-1})=o_p(1).
\tag{3.5.20}
\]
To deal with the remaining term, let $\Delta_f(x):=(K_b*f_1-K_b*f_0)(x)$, and rewrite $\int(\hat f_n-f_n)(x)\,\Delta_f(x)\,dx$ as the sum of the following two terms:
\[
D_1:=\int\!\!\int e^{-itx}\,\Phi_K(bt)\,\frac{\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j+(\beta-\hat\beta)'Z_j)}-e^{it(\varepsilon_j-\beta'u_j)}\big)}{2\pi n\,\Phi_g(-\hat\beta t)}\,dt\,\Delta_f(x)\,dx,
\]
\[
D_2:=\int\!\!\int e^{-itx}\,\Phi_K(bt)\,\frac{\sum_{j=1}^n e^{it(\varepsilon_j-\beta'u_j)}}{2\pi n}\Big(\frac{1}{\Phi_g(-\hat\beta t)}-\frac{1}{\Phi_g(-\beta t)}\Big)\,dt\,\Delta_f(x)\,dx.
\]
Consider $D_1$ first. Since $nb^{4\kappa+2}\to\infty$ and $\kappa>1$, uniformly in $|t|\le 1/b$,
\[
\frac{1}{n}\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j+(\beta-\hat\beta)'Z_j)}-e^{it(\varepsilon_j-\beta'u_j)}\big)
=it(\beta-\hat\beta)'\,\frac{1}{n}\sum_{j=1}^n Z_j\,e^{it(\varepsilon_j-\beta'u_j)}+o_p(n^{-1/2}).
\]
Let
\[
C_0:=\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big[Z_{jk}e^{it(\varepsilon_j-\beta'u_j)}-EZ_{jk}e^{it(\varepsilon_j-\beta'u_j)}\big]}{2\pi n\,\Phi_g(-\beta t)}\,dt\,\Delta_f(x)\,dx.
\]
Then $EC_0=0$ and
\[
EC_0^2\le\frac{E|Z_k|^2}{n}\Big(\int\frac{|\Phi_K(bt)|}{|\Phi_g(-\beta t)|}\,dt\Big)^2\Big(\int|\dot\Delta_f(x)|\,dx\Big)^2
=O(n^{-1}b^{-2\kappa-2})=o(1).
\tag{3.5.21}
\]
Hence $C_0=o_p(1)$. Since $EZ_k e^{it(\varepsilon-\beta'u)}=\mu_{Z_k}\Phi_h(t)+\Phi_{f_1}(t)\,Eu_k e^{-i\beta'ut}$, assumption (B) with $r>\kappa+1$ and the relation $\Phi_h(t)=\Phi_{f_1}(t)\,\Phi_g(-\beta t)$ imply
\[
\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)}{\Phi_g(-\beta t)}\,EZ_k e^{it(\varepsilon-\beta'u)}\,dt\,\Delta_f(x)\,dx=O(1).
\]
Together with (3.5.4) and $nb^{4\kappa+2}\to\infty$, the above analysis yields
\[
D_1+\frac{i(\hat\beta-\beta)'}{2\pi}\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)}{\Phi_g(-\beta t)}\,EZ e^{it(\varepsilon-\beta'u)}\,dt\,\Delta_f(x)\,dx
=O_p(n^{-1}b^{-\kappa-1})=o_p(n^{-1/2}).
\tag{3.5.22}
\]
Next consider $D_2$. Uniformly in $|t|\le 1/b$,
\[
\frac{1}{\Phi_g(-\hat\beta t)}-\frac{1}{\Phi_g(-\beta t)}
=\sum_{k=1}^p\Big\{\frac{it(\beta_k-\hat\beta_k)\,Eu_k e^{-i\beta'ut}}{\Phi_g^2(-\beta t)}
-\frac{(\beta_k-\hat\beta_k)^2t^2\,Eu_k^2e^{-i\beta'ut}}{\Phi_g^2(-\beta t)}\Big\}
+O_p(n^{-3/2}b^{-3-2\kappa}).
\]
Let
\[
C_1:=\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j)}-\Phi_h(t)\big)\,Eu_ke^{-i\beta'ut}}{2\pi n\,\Phi_g^2(-\beta t)}\,dt\,\Delta_f(x)\,dx,
\]
\[
C_2:=\int\!\!\int\frac{t^2\,e^{-itx}\,\Phi_K(bt)\,\sum_{j=1}^n\big(e^{it(\varepsilon_j-\beta'u_j)}-\Phi_h(t)\big)\,Eu_k^2e^{-i\beta'ut}}{2\pi n\,\Phi_g^2(-\beta t)}\,dt\,\Delta_f(x)\,dx.
\]
Note that $EC_i=0$, $i=1,2$, and the same arguments as for (3.5.21) yield
\[
EC_1^2=O(n^{-1}b^{-4\kappa-2})=o(1),\qquad EC_2^2=O(n^{-1}b^{-4\kappa-4})=o(b^{-2}).
\]
Hence $C_1=o_p(1)$ and $C_2=o_p(b^{-1})$. Since $\Phi_h(t)=\Phi_{f_1}(t)\Phi_g(-\beta t)$ and $nb^{4\kappa+2}\to\infty$, by (3.5.4) and assumption (B) with $r>\kappa+1$ we obtain
\[
D_2-\frac{i(\hat\beta-\beta)'}{2\pi}\int\!\!\int\frac{t\,e^{-itx}\,\Phi_K(bt)\,\Phi_{f_1}(t)\,Eue^{-i\beta'ut}}{\Phi_g(-\beta t)}\,dt\,\Delta_f(x)\,dx=o_p(n^{-1/2}).
\tag{3.5.23}
\]
Also,
\[
-\frac{1}{2\pi}\int it\,e^{-itx}\,\Phi_K(bt)\,\Phi_{f_1}(t)\,dt=K_b*\dot f_1(x).
\]
Combining this with (3.5.22) and (3.5.23), we obtain
\[
2\int(\hat f_n-f_n)(x)\,\Delta_f(x)\,dx=(\hat\beta-\beta)'B_f+o_p(n^{-1/2}).
\]
Recalling (3.5.19) and (3.5.20), it follows immediately that
\[
n^{1/2}\big(\hat T_n-T_n-(\hat\beta-\beta)'B_f\big)=o_p(1).
\tag{3.5.24}
\]

Next, consider the case when the intercept parameter $\alpha$ is unknown. Let $a=\alpha-\hat\alpha$. Then
\[
\begin{aligned}
\hat T_n&=\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x+a)\big)^2\,dx\\
&=\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x)\big)^2\,dx
+\int\big(K_b*f_0(x+a)-K_b*f_0(x)\big)^2\,dx\\
&\quad-2\int\big(f_n(x,\alpha,\hat\beta)-K_b*f_0(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx.
\end{aligned}
\]
The first term on the right hand side above is $T_n(\alpha,\hat\beta)$, and from (3.5.24) we have
\[
n^{1/2}\big(T_n(\alpha,\hat\beta)-T_n-(\hat\beta-\beta)'B_f\big)=o_p(1).
\]
Because $\dot f_0$ exists and is finite, and $a=O_p(n^{-1/2})$, the second term is $O_p(n^{-1})$. To deal with the third term, rewrite the factor multiplying $-2$ as the sum of the following three terms:
\[
\int\big(f_n(x,\alpha,\hat\beta)-f_n(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx,
\]
\[
\int\big(f_n(x)-K_b*f_1(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx,
\]
\[
\int\big(K_b*f_1(x)-K_b*f_0(x)\big)\big(K_b*f_0(x+a)-K_b*f_0(x)\big)\,dx.
\]
By the Cauchy–Schwarz inequality, together with $a=O_p(n^{-1/2})$, (3.5.18) and (3.5.24), each of the first two terms above is $o_p(n^{-1/2})$. The finiteness of $\ddot f_0$ implies that the third term equals
\[
a\int\big(K_b*f_1(x)-K_b*f_0(x)\big)\,K_b*\dot f_0(x)\,dx+o_p(n^{-1/2}).
\]
The above analysis and (3.5.24) imply
\[
n^{1/2}\big(\hat T_n-T_n-(\hat\beta-\beta)'B_f-(\hat\alpha-\alpha)A_f\big)=o_p(1).
\tag{3.5.25}
\]
This fact together with (3.3.3) completes the proof of Theorem 3.3.5.

Proof of Theorem 3.3.6. For $\hat T_n$, recall (3.3.5), (3.3.6) and (3.5.25). Using the details in the proof of Theorem 3.3.5, we obtain
\[
\hat T_n-\|K_b*(f_1-f_0)\|^2=\frac{1}{n}\sum_{j=1}^n\big(D_j+\eta_jA_f+\zeta_j'B_f\big)+o_p(n^{-1/2}).
\]
Write
\[
\tau^2:=\mathrm{Var}\big(D_1+\eta_1A_f+\zeta_1'B_f\big).
\tag{3.5.26}
\]
Since $D_j+\eta_jA_f+\zeta_j'B_f$, $j=1,\cdots,n$, are i.i.d. zero mean r.v.'s (forming an array in $b$), with $E|D_1|^4=O(1)$, $E|\eta_1|^{2+\vartheta}<\infty$ and $E\|\zeta_1\|^{2+\vartheta}<\infty$ for some $\vartheta>0$, the claim (3.3.7) follows by the Lindeberg–Feller CLT, thereby completing the proof.

The proof of Theorem 3.3.7 is similar to the arguments in the proofs of Theorems 3.3.5 and 3.3.6, and hence the details are omitted.

BIBLIOGRAPHY

[1] Bachmann, D. and Dette, H. (2005). A note on the Bickel–Rosenblatt test in autoregressive time series. Statistics and Probability Letters, 74(3), 221–234.
[2] Bercu, B. and Portier, B. (2008). Kernel density estimation and goodness-of-fit test in adaptive tracking. SIAM Journal on Control and Optimization, 47(5), 2440–2457.
[3] Bickel, P. and Rosenblatt, M. (1973).
On some global measures of the deviations of density function estimates. The Annals of Statistics, 1, 1071–1095.
[4] Boldin, M. V. (1982). An estimate of the distribution of the noise in an autoregressive scheme. Theory of Probability and Its Applications, 27(4), 866–871.
[5] Boldin, M. V. (1990). On testing hypotheses in the sliding average scheme by the Kolmogorov–Smirnov and ω² tests. Theory of Probability and Its Applications, 34(4), 699–704.
[6] Borkowski, P. and Mielniczuk, J. (2012). Performance of variance function estimators for autoregressive time series of order one: asymptotic normality and numerical study. Control and Cybernetics, 41, 415–441.
[7] Butucea, C. (2004). Asymptotic normality of the integrated square error of a density estimator in the convolution model. Sort, 28(1), 9–26.
[8] Cheng, C.-L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. John Wiley & Sons.
[9] Cheng, F. and Sun, S. (2008). A goodness-of-fit test of the errors in nonlinear autoregressive time series models. Statistics & Probability Letters, 78(1), 50–59.
[10] Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404), 1184–1186.
[11] Carroll, R. J., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2012). Measurement Error in Nonlinear Models: A Modern Perspective. CRC Press.
[12] Delaigle, A. and Hall, P. (2006). On optimal kernel choice for deconvolution. Statistics and Probability Letters, 76(15), 1594–1602.
[13] Ducharme, G. R. and Lafaye de Micheaux, P. (2004). Goodness-of-fit tests of normality for the innovations in ARMA models. Journal of Time Series Analysis, 25(3), 373–395.
[14] Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. CBMS Regional Conference Series in Applied Mathematics 9. SIAM, Philadelphia.
[15] Fan, J. (1991). Asymptotic normality for deconvolution kernel density estimators.
Sankhyā: The Indian Journal of Statistics, Series A, 53(1), 97–110.
[16] Fan, J. and Yao, Q. (1998). Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85(3), 645–660.
[17] Fan, J. and Yao, Q. (2002). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer Series in Statistics, Springer-Verlag New York, Inc.
[18] Freedman, D. A. (1975). On tail probabilities for martingales. The Annals of Probability, 3(1), 100–118.
[19] Fuller, W. A. (2009). Measurement Error Models. John Wiley & Sons.
[20] Horváth, L. and Zitikis, R. (2006). Testing goodness of fit based on densities of GARCH innovations. Econometric Theory, 22(3), 457–482.
[21] Jiang, J. (2001). Goodness-of-fit tests for mixed model diagnostics. The Annals of Statistics, 1137–1164.
[22] Khmaladze, E. V. (1982). Martingale approach in the theory of goodness-of-fit tests. Theory of Probability & Its Applications, 26(2), 240–257.
[23] Khmaladze, E. V. (1993). Goodness of fit problems and scanning innovation martingales. The Annals of Statistics, 21(2), 798–829.
[24] Khmaladze, E. V. and Koul, H. L. (2004). Martingale transforms goodness-of-fit tests in regression models. The Annals of Statistics, 32, 995–1034.
[25] Khmaladze, E. V. and Koul, H. L. (2009). Goodness-of-fit problem for errors in nonparametric regression. The Annals of Statistics, 37, 3165–3185.
[26] Koul, H. L. (1969). Asymptotic behavior of Wilcoxon type confidence regions in multiple linear regression. Annals of Mathematical Statistics, 40, 1950–1979.
[27] Koul, H. L. (1970). Some convergence theorems for ranks and weighted empirical cumulatives. Annals of Mathematical Statistics, 41, 1768–1773.
[28] Koul, H. L. (1991). A weak convergence result useful in robust autoregression. Journal of Statistical Planning and Inference, 29(3), 291–308.
[29] Koul, H. L. (2002). Weighted Empirical Processes in Dynamic Nonlinear Models. Second Edition. Lecture Notes in Statistics, 166. Springer-Verlag New York, Inc.
[30] Koul, H. L. and Ling, S. (2006). Fitting an error distribution in some heteroscedastic time series models. The Annals of Statistics, 34(2), 994–1012.
[31] Koul, H. L. and Mimoto, N. (2012). A goodness-of-fit test for GARCH innovation density. Metrika, 75(1), 127–149.
[32] Koul, H. L. and Song, W. (2012). A class of goodness-of-fit tests in linear errors-in-variables model. Journal de la Société Française de Statistique, 153(1), 52–70.
[33] Lee, S. and Na, S. (2002). On the Bickel–Rosenblatt test for first-order autoregressive models. Statistics and Probability Letters, 56, 23–35.
[34] Loynes, R. M. (1980). The empirical d.f. of residuals from generalized regression. The Annals of Statistics, 8, 285–298.
[35] Loubes, J.-M. and Marteau, C. (2004). Goodness-of-fit testing strategies from indirect observations. Journal of Nonparametric Statistics, 26(1), 85–99.
[36] Masry, E. (1996). Multivariate local polynomial regression for time series: uniform strong consistency and rates. Journal of Time Series Analysis, 17(6), 571–599.
[37] Müller, U. U., Schick, A., and Wefelmeyer, W. (2007). Estimating the error distribution function in semiparametric regression. Statistics & Decisions, 25(1), 1–18.
[38] Müller, U. U., Schick, A., and Wefelmeyer, W. (2009a). Estimating the innovation distribution in nonparametric autoregression. Probability Theory and Related Fields, 144(1-2), 53–77.
[39] Müller, U. U., Schick, A., and Wefelmeyer, W. (2009b). Estimating the error distribution function in nonparametric regression with multivariate covariates. Statistics & Probability Letters, 79(7), 957–964.
[40] Müller, U. U., Schick, A., and Wefelmeyer, W. (2012). Estimating the error distribution function in semiparametric additive regression models. Journal of Statistical Planning and Inference, 142(2), 552–566.
[41] Na, S. (2009). Goodness-of-fit test using residuals in infinite-order autoregressive models.
Journal of the Korean Statistical Society, 38(3), 287–295.
[42] Neumeyer, N. and Van Keilegom, I. (2010). Estimating the error distribution in nonparametric multiple regression with applications to model testing. Journal of Multivariate Analysis, 101(5), 1067–1078.
[43] Neumeyer, N. and Selk, L. (2013). A note on non-parametric testing for Gaussian innovations in AR-ARCH models. Journal of Time Series Analysis, 34(3), 362–367.
[44] Nikabadze, A. M. (1988). On a method for constructing goodness-of-fit tests for parametric hypotheses in R^m. Theory of Probability and Its Applications, 32(3), 539–544.
[45] Ojeda, J. L. (2008). Hölder continuity properties of the local polynomial estimator. Prepublicaciones del Seminario Matemático, 4, 1–21.
[46] Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3), 832–837.
[47] Selk, L. and Neumeyer, N. (2013). Testing for a change of the innovation distribution in nonparametric autoregression: the sequential empirical process approach. Scandinavian Journal of Statistics, 40(4), 770–788.
[48] Soms, A. P. (1976). An asymptotic expansion for the tail area of the t-distribution. Journal of the American Statistical Association, 71, 728–730.
[49] Stefanski, L. A. and Carroll, R. J. (1990). Deconvolving kernel density estimators. Statistics, 21(2), 169–184.
[50] Tsigroshvili, Z. (1998). Some notes on goodness-of-fit tests and innovation martingales. Proceedings of A. Razmadze Mathematical Institute, 117, 89–102.
[51] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
[52] Van Es, A. and Uh, H.-W. (2004). Asymptotic normality of nonparametric kernel type deconvolution density estimators: crossing the Cauchy boundary. Nonparametric Statistics, 16(1-2), 261–277.
[53] Wu, W. B., Huang, Y. and Huang, Y. (2010). Kernel estimation for time series: an asymptotic theory.
Stochastic Processes and their Applications, 120(12), 2412–2431.
[54] Yao, Q. and Tong, H. (1994). Quantifying the influence of initial values on non-linear prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 56(4), 701–725.