MODEL CHECKING PROBLEMS IN MEASUREMENT ERROR MODELS WITH VALIDATION DATA

By

Pei Geng

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Statistics – Doctor of Philosophy

2017

ABSTRACT

MODEL CHECKING PROBLEMS IN MEASUREMENT ERROR MODELS WITH VALIDATION DATA

By Pei Geng

This thesis addresses some aspects of regression model checking problems when the covariates are observed with measurement errors. Both classical error-in-variables models and Berkson models are investigated when validation data are available.

In Tobit error-in-variables regression models, the response is truncated at a given level while the covariate is collected with errors. In this thesis we assume the density of the measurement error to be unknown. Using the calibration idea, a new regression function is derived under the null hypothesis and estimated by the kernel smoothing method using validation data. Then a class of test statistics is constructed from the nonparametric residuals based on kernel regression estimators. The proposed class of tests is shown to be robust to the choice of parameter estimators and consistent against a large class of fixed alternatives. The asymptotic normality of these test statistics is established under the null hypothesis and under a sequence of local alternatives. A practical bandwidth selection strategy is developed. A finite sample simulation study shows the superiority of a member of the proposed class of tests over two existing tests in terms of empirical power. A real data application is presented to validate the current understanding of the data set.

In Berkson models, without specifying the measurement error density, the calibrated regression function is estimated using both the primary data containing the responses and the validation data. A kernel smoothed integrated square distance between the responses and the regression estimator is defined, and the parameter estimators are obtained by minimizing this distance. The test statistics are then constructed from the minimized distances. The consistency and asymptotic normality of these estimators are proved. The asymptotic null distribution of the proposed class of test statistics and the test consistency against certain alternatives are also established. A simulation study shows desirable behavior of a member of these minimum distance estimators and tests.

Copyright by PEI GENG 2017

To my dear and beloved parents Jinchen Geng and Jingmiao Liu, aunt Jingzhi Liu and sister Ying Geng.

ACKNOWLEDGMENTS

First and foremost, I wish to convey my greatest and most sincere appreciation to my advisor Professor Hira L. Koul. I am indebted to him for his constant guidance, encouragement and enthusiasm in research, and for his endless time and prompt replies to my questions and confusion over the years. Without his valuable suggestions and comments, vast and sharp knowledge, and passionate and caring attitude, I would not have grown so fast in academia. From Professor Koul, I have learned not only professional knowledge and critical thinking in statistics but, more importantly, the rigorous and responsible spirit towards science, which has built a profound basis for my future work.

Secondly, I would like to thank Dr. Lyudmila Sakhanenko for all her advice and helpful discussions in the past. Her broad knowledge and innovative ideas have greatly enlightened me and helped me overcome difficulties.
I would like to thank Dr. Qing Lu for shedding light on the applications of statistics for me. I also would like to thank Dr. Ping-shou Zhong for serving as a member of my doctoral committee.

Thirdly, I wish to express my sincere gratitude to Professor Yimin Xiao, my MSc advisor Professor Wensheng Wang and my undergraduate professor Dr. Zifeng Yang. Without their encouragement and support, I would not have come to MSU to pursue my doctoral degree.

Last but not least, with my deepest affection, I would like to thank my parents Jinchen Geng and Jingmiao Liu, my sister Ying Geng and my aunt Jingzhi Liu for their endless love and support over the years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
Chapter 1  Introduction
Chapter 2  Model checking in Tobit EIV regression using validation data
  2.1 Introduction
  2.2 A class of tests
  2.3 Asymptotic distributions
    2.3.1 Asymptotic null distribution
    2.3.2 Asymptotic power
  2.4 Estimation of θ0
  2.5 Data analysis
    2.5.1 Simulations
    2.5.2 Real data application
  2.6 Proofs
Chapter 3  Minimum distance model checking in Berkson models
  3.1 Introduction
  3.2 A class of tests
  3.3 Estimation of θ0
    3.3.1 Consistency of θ̂n
    3.3.2 Asymptotic normality of θ̂n
  3.4 Testing
  3.5 Simulation
    3.5.1 Finite sample performance of θ̂n
    3.5.2 Test performance
  3.6 Proofs
BIBLIOGRAPHY

LIST OF TABLES

Table 2.1: Comparison of estimators: absolute bias and RMSE in parentheses
Table 2.2: Empirical levels for p = 1 at nominal level 0.05
Table 2.3: Empirical power comparison for p = 1 at nominal level 0.05
Table 2.4: Empirical level and power of the V_TME test for p = 2 at nominal level 0.05
Table 2.5: Estimation and testing results of the enzyme reaction dataset
Table 3.1: Performance of θ̂n, θ̃n in the linear case (3.37), p = 1
Table 3.2: Performance of θ̂n, θ̃n in the nonlinear case (3.38), p = 1
Table 3.3: Performance of θ̂n, θ̃n in the linear case with p = 2
Table 3.4: Empirical level and power under the linear null model (left panel) and the nonlinear null model (right panel) for p = 1
Table 3.5: Empirical level and power under the linear null model for p = 2

LIST OF FIGURES

Figure 2.1: Estimated regression functions using three estimation methods

KEY TO ABBREVIATIONS

x^T — the transpose of a Euclidean vector x
I_A — the indicator of event A
→_d — convergence in distribution
→_p — convergence in probability
n ∧ N — minimum of n and N
N_q(ν, Σ) — q-variate normal distribution with mean vector ν and covariance matrix Σ
KN — Koul and Ni (2004)
KS — Koul and Song (2009)

Chapter 1

Introduction

In the area of statistical inference, regression model checking is a critical topic for studying the significant relationships between responses and covariates. Extensive research has focused on this topic since the early 1990s, as is evidenced by the recent review article of González-Manteiga and Crujeiras (2013). Among the main classes of tests, kernel-based test procedures are important tools for investigating whether the regression function belongs to a particular parametric family.

Most of the literature on regression model checking assumes that the covariates are fully observed. However, in reality, many covariates are observed with errors. Instead of observing the true covariate of interest, one observes a surrogate variable. The regression models where covariates are observed with errors are known as measurement error regression models. In general, there are two types of measurement error models: error-in-variables (EIV) models and Berkson models. The monographs of Fuller (1987), Cheng and van Ness (1999) and Carroll, Ruppert, Stefanski and Crainiceanu (2006) provide ample examples of such models and contain systematic analysis of the underlying issues involved in these models. In general, a naive analysis treating the error-prone covariates as the true ones causes heavy bias in the estimators of the underlying parameters and loss of power in tests. Hence regression calibration is needed for parameter estimation and hypothesis testing.

In economics and other social sciences, Tobit regression, first introduced by Tobin (1958), is a useful model for analyzing truncated responses. In Tobit models, early studies mainly focused on parameter estimation. For example, Amemiya (1984) summarizes comprehensive estimation procedures when the regression error follows a Gaussian distribution, while Abarin and Wang (2009) propose second-order least squares estimation when the error has a general parametric distribution. In Tobit EIV models, Wang (1998) proposed two-step moment estimators in the linear case when both the covariate and the measurement error are normally distributed. When the measurement error distribution is unknown, but with the help of validation data, the least squares estimation procedure proposed by Song (2009) is applicable after the regression model is calibrated to the one based on the surrogate variable.
Regarding the model checking problem in Tobit EIV models, existing methods include a score-type test proposed by Song (2009) and a transformation-based distribution-free test proposed by Song (2011). The former test applies to the least squares estimators, and the latter test achieves superior performance for a one-dimensional covariate when the measurement error distribution is known. However, the measurement error distribution is hardly ever known in reality. An alternative approach is to conduct statistical inference with an available validation sample.

Chapter 2 of this thesis aims to develop a robust model checking procedure in Tobit EIV models with the help of validation data. In that chapter, a kernel-based nonparametric test is proposed for fitting a parametric function to the regression function in Tobit EIV models. Given a consistent parameter estimator, the calibrated regression function is first estimated by the Nadaraya-Watson estimator based on validation data; then a class of test statistics is constructed based on the nonparametric residuals. This class of tests is obtained by adapting the test of Zheng (1996) to the current set up. The proposed tests are shown to be robust to the choice of parameter estimator, and the consistency and asymptotic distributions of the test statistics are established under the null hypothesis and under certain alternatives. With two bandwidth parameters involved in these tests, a practical bandwidth selection strategy is developed and applied in a simulation study. The simulation study shows attractive empirical power performance of a member of the proposed class of tests compared to the two existing tests.

When a predicting variable is observed with errors, in many cases it is more appropriate to assume that the true variable equals the observed one plus an error. These are the so-called Berkson measurement error models. For example, the levels of a certain pollutant, such as lead, in a place are usually measured at fixed spots, while the actual exposure of an individual depends on location, time and air condition. Therefore, it is natural to treat the actual exposure as the measured pollutant level plus a small random error term.

To ensure identifiability in Berkson models, it is often assumed that the density of the measurement error is known. For parameter estimation, Wang (2004) proposed a minimum distance procedure based on the first two moments of the responses in nonlinear regressions. As for model checking, the main contribution is made by Koul and Song (2009), where a minimum distance test is constructed based on a kernel smoothing technique. In all the studies for Berkson models mentioned above, the authors assume that the measurement error density is known, or known up to an unknown Euclidean parameter. This is a restrictive assumption, as it limits the applicability of the inference procedures. However, the availability of validation data helps to circumvent this assumption.

Regarding parameter estimation in Berkson models with validation data, a collection of attractive methodologies have been studied. In linear and nonlinear EIV models with validation data, Lee and Sepanski (1995) constructed an estimation procedure based on least squares methods with the regression functions replaced by their corresponding wide-sense conditional expectation functions.
In linear EIV models, Wang and Rao (2002) developed an estimated empirical log-likelihood based on the validation data and then constructed an estimated empirical likelihood confidence region for the parameters in the regression functions. For general Berkson-type models, Du, Zou and Wang (2011) proposed a nonparametric regression function estimator based on kernel smoothing techniques. But the model checking literature for Berkson models with validation data appears to be sparse.

To fill this gap, in Chapter 3 below, we adapt the minimum distance methodology of Koul and Song (2009) to propose analogous procedures for obtaining the parameter estimators and further performing lack-of-fit hypothesis testing. The regression function given the surrogate variable is nonparametrically estimated based on validation data. Then an integrated square distance between the responses and the regression estimator is defined by means of kernel smoothing and is minimized for parameter estimation. Eventually the minimized distance is used as the test statistic. Both consistency and asymptotic normality of the proposed estimator are established. The asymptotic distributions of the proposed test statistics under the null hypothesis and their consistency against certain fixed alternative hypotheses are also derived. It is shown that the asymptotic distributions of these test statistics are the same as in the case of known measurement error density, while those of the corresponding estimators of the parameters in the null model are affected by the estimation of the regression function using validation data. A finite sample study shows negligible bias in the proposed minimum distance estimators. Empirical levels and powers are obtained for different choices of the sample size ratio between the primary data and the validation data under various alternative hypotheses. The empirical level is well controlled in most of the chosen cases, and the empirical power increases significantly as the sample size increases for all the chosen alternatives.

Chapter 2

Model checking in Tobit EIV regression using validation data

2.1 Introduction

In economics and other social sciences, many response variables are observed with lower or upper thresholds. For instance, household expenditure on certain durable goods is zero for some families, depending on other factors, and positive for other families; hours worked are zero for women who choose not to work and positive for others; and, as a third example, the demand for tickets for a game or conference is limited by the capacity of the event. Regression models with truncated response data were first studied by Tobin (1958). Since then these models have been called Tobit regression models. Bhattacharya, Chernoff and Yang (1983) developed a nonparametric Mann-Whitney type estimator of the parameter in linear Tobit models. The survey paper of Amemiya (1984) provides a comprehensive introduction to these regression models with Gaussian errors.

To proceed a bit more precisely, in the Tobit regression model of interest here, the scalar response variable Y* is observed only when it is positive and is related to the p-dimensional predicting variable vector X by the relation
\[
Y^* = \mu(X) + \varepsilon, \qquad Y = Y^* I(Y^* > 0). \tag{2.1}
\]
Here Y is the observed response, and the scalar random error ε is assumed to have zero mean and to be independent of X, so that µ(x) = E(Y*|X = x), x ∈ R^p.
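To fix ideas, here is a minimal Python sketch (ours, not part of the thesis) that draws data from the Tobit model (2.1) with a linear µ(x) = α + βx; the parameter values mirror those of Simulation 1 in Section 2.5.1, where the truncation rate is about 26%.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_tobit(n, alpha=1.0, beta=0.6, sigma_x=2.0, sigma_eps=1.0):
    """Draw (X, Y) from model (2.1) with mu(x) = alpha + beta*x (illustrative)."""
    X = rng.normal(0.0, sigma_x, n)
    eps = rng.normal(0.0, sigma_eps, n)
    y_star = alpha + beta * X + eps          # latent response Y*
    Y = np.where(y_star > 0, y_star, 0.0)    # observed response Y = Y* I(Y* > 0)
    return X, Y

X, Y = simulate_tobit(500)
print("truncation rate:", np.mean(Y == 0))   # fraction of censored responses
```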
The last two and a half decades have seen intense research activity on the topic of testing for lack-of-fit of a regression model, as is evidenced in the recent review paper of González-Manteiga and Crujeiras (2013). Let Θ ⊂ R^q and M := {m(x, θ), x ∈ R^p, θ ∈ Θ} be a family of known parametric functions. In the lack-of-fit testing problem of interest here, we wish to test the hypothesis

H0: µ(x) = m(x, θ), for some θ ∈ Θ and for all x ∈ C, versus H1: H0 is not true,

where C is a compact subset of R^p. In the case of no measurement error, i.e., when X is fully observed, there are a few tests for fitting a parametric model to µ(x) in the above Tobit regression model. Wang (2007) developed a nonparametric test to diagnose nonlinearity in the median Tobit regression model, where the covariate is non-random and one-dimensional, and the function µ(x) represents the median of the distribution of Y* at the design variable x. Song (2011) proposed an asymptotically distribution-free test for fitting a parametric model to µ(x) of (2.1). This test is based on the supremum of the Stute, Thies and Zhu (1998) type transformation of a partial sum process of calibrated residuals and is applicable only when the dimension p of X equals 1. Koul, Song and Liu (2014) adapted Zheng's (1996) statistic to test H0 for a large class of given families M and for p ≥ 1.

The goal here is to develop tests for H0 in the model (2.1) when there is measurement error in the covariate vector X, i.e., when one does not observe X. Instead one observes a surrogate variable Z related to X by the relation
\[
Z = X + U, \tag{2.2}
\]
where the random error U is distributed with mean 0 and unknown covariance matrix Σ_u. The r.v.'s X, U, ε are assumed to be mutually independent. Throughout the chapter, the primary data set consists of a random sample of n observations {(Y_i, Z_i), i = 1, 2, ..., n} obtained from the models (2.1) and (2.2). We further assume that there is a validation data set consisting of N i.i.d. observations {(X_j, Z_j), j = 1, ..., N} on (X, Z) of (2.2), independent of the primary data set.

To proceed further, we recall the testing methodology developed in Koul et al. (2014) (KSL) when there is no measurement error in X. For any r.v. V, let f_V denote its density. Under H0, the regression function of Y, given X, is
\[
q(x, \theta) = E(Y|X = x) = m(x, \theta)\, Q_{\varepsilon,0}(-m(x, \theta)) + Q_{\varepsilon,1}(-m(x, \theta)), \qquad Q_{\varepsilon,j}(z) = \int_z^{\infty} u^j f_\varepsilon(u)\, du, \quad j = 0, 1.
\]
Thus one has the regression model
\[
Y = q(X, \theta) + \xi, \qquad E(\xi|X) = 0. \tag{2.3}
\]
The function q is monotone as a function of m(x, θ). Hence the original testing problem is equivalent to testing for E(Y|X = x) = q(x, θ). As in KSL, in order to ensure the model identifiability, and for simplicity, we assume f_ε to be known. See Remark 2.3.1 for the case when f_ε belongs to a parametric family.

In the error-in-variables model considered here, the regression model (2.3) is of little help, because X is not observable. Instead we now derive a new regression model, given Z. Let g(z, θ) = E(q(X, θ)|Z = z). Direct calculations show that under H0, E(Y|Z = z) = g(z, θ), so that we have the regression model
\[
Y = g(Z, \theta) + \eta, \qquad E(\eta|Z) = 0. \tag{2.4}
\]
The original testing problem is thus transformed to the problem of testing
\[
\tilde H_0: E(Y|Z = z) = g(z, \theta), \ \text{for some } \theta \in \Theta, \text{ for all } z \in C, \quad \text{versus} \quad \tilde H_1: \tilde H_0 \text{ is not true}. \tag{2.5}
\]
Clearly H0 implies H̃0. The converse need not be true in general. Song (2008) gives a sufficient condition for the equivalence of H0 and H̃0: it suffices to require the family of densities f_U(z − ·), z ∈ R^p, to be a complete family. From now on, we focus on testing (2.5) under (2.4), given both the primary data and the validation data.
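As a concrete check of the calibration step above, the following sketch (ours) evaluates q(x, θ) for a standard normal f_ε by numerically integrating Q_{ε,j} and compares it with the closed form (α + βx)Φ(α + βx) + φ(α + βx) given in Remark 2.3.1 below (with σ = 1); all function names are assumptions of this illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def Q_eps(z, j):
    """Q_{eps,j}(z) = int_z^inf u^j f_eps(u) du for standard normal f_eps."""
    val, _ = quad(lambda u: u**j * norm.pdf(u), z, np.inf)
    return val

def q(m):
    """Calibrated regression q = m*Q_{eps,0}(-m) + Q_{eps,1}(-m)."""
    return m * Q_eps(-m, 0) + Q_eps(-m, 1)

m = 1.0 + 0.6 * 0.5  # m(x, theta) = alpha + beta*x at x = 0.5
print(q(m))                            # numerical integration
print(m * norm.cdf(m) + norm.pdf(m))   # closed form (Remark 2.3.1, sigma = 1)
```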
The existing literature on parametric measurement error Tobit regression models mainly focuses on parameter estimation. Song (2009) obtained consistent estimators by a modified least squares procedure, while Wang (1998) proposed method of moments estimators when X, U and ε all follow normal distributions. Both estimators are shown to be √n-consistent for the true parameter θ0 and asymptotically normal. As far as lack-of-fit testing in these models is concerned, under the assumption that the measurement error density f_U is known, Song and Yao (2011) generalized the test procedure in Song (2011) to measurement error Tobit models with a one-dimensional covariate, while Song (2009) provided a score-type test based on the least squares residuals, which were constructed using the validation data. In the current chapter we assume the availability of a validation data set, which is used to estimate g and f_U, thereby avoiding the assumption of a known f_U.

In the next section we describe the proposed tests for this problem. Section 2.3 establishes the asymptotic normality of the proposed test statistics under H0 and under some alternatives, along with the needed assumptions. Some parameter estimators under H0 are described in Section 2.4. Section 2.5.1 reports the findings of a finite sample simulation study, which shows some superiority of a member of the proposed class of tests compared to the tests in Song and Yao (2011) and Song (2009) in terms of empirical power. The proposed test is also applied to a real data example in Section 2.5.2, and the results validate the current understanding of the dataset. All proofs are deferred to the final Section 2.6.

2.2 A class of tests

To describe the proposed class of tests, we need to construct residuals in the model (2.4). Since the function g is unknown, we need to estimate it nonparametrically, and the validation data is critical for doing this. Let K be a kernel density function, w be a window width associated with the sample sizes n and N, and set K_w(x) = K(x/w)/w^p, x ∈ R^p. Let
\[
W_N(z, \theta) := N^{-1} \sum_{k=1}^{N} K_w(z - Z_k)\, q(X_k, \theta), \qquad \tilde f(z) := N^{-1} \sum_{k=1}^{N} K_w(z - Z_k), \qquad z \in \mathbb{R}^p, \ \theta \in \Theta.
\]
Then, for a given θ, a kernel estimator of g(z, θ) using the validation data set is
\[
\hat g(z, \theta) := \frac{W_N(z, \theta)}{\tilde f(z)}, \qquad z \in \mathbb{R}^p. \tag{2.6}
\]
Because the validation data is independent of the primary data, ĝ(z, θ) is independent of the primary data for each θ, and so is the kernel density estimator f̃ of the density f_Z of Z.
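For illustration, here is a minimal sketch of the estimator (2.6) for p = 1, using the Epanechnikov kernel employed later in Section 2.5.1; the helper names are ours, and the handling of a vanishing denominator is a simplification.

```python
import numpy as np

def K(u):
    """Epanechnikov kernel, the density used in Section 2.5.1."""
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def g_hat(z, q_vals, Z_val, w):
    """Nadaraya-Watson estimator (2.6) from validation data (p = 1):
    q_vals[k] = q(X_k, theta), Z_val[k] = Z_k, bandwidth w."""
    weights = K((z - Z_val) / w) / w        # K_w(z - Z_k)
    f_tilde = np.mean(weights)              # density estimate f_tilde(z)
    W_N = np.mean(weights * q_vals)         # numerator W_N(z, theta)
    return W_N / f_tilde if f_tilde > 0 else np.nan
```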
Let θ0 be the true value of the parameter θ for which H0 holds. Let θ̂n be a √n-consistent estimator of θ0, and define the residuals η̂i = Yi − ĝ(Zi, θ̂n), i = 1, ..., n. Let h = hn be another sequence of window widths. Following the idea proposed by Zheng (1996), under H0, since E(ηi|Zi = z) = 0 for all z ∈ C, we have
\[
E[\eta_i E(\eta_i|Z_i) f_Z(Z_i)] = 0, \quad \forall\, i \ge 1, \tag{2.7}
\]
while the left-hand side is strictly positive under H1. In order to use the empirical version of (2.7) in the primary dataset to form the test statistic, the conditional expectation in the above equation can be estimated by
\[
\hat E[\eta_i | Z_i] = \frac{1}{(n-1)h^p} \sum_{j=1, j \ne i}^{n} K\Big(\frac{Z_j - Z_i}{h}\Big)\, \hat\eta_j\, I_C(Z_j) \Big/ \tilde f(Z_i). \tag{2.8}
\]
Upon multiplying this by η̂i f̃(Zi) I_C(Zi) and then summing over i, we arrive at the class of test statistics, one for each K,
\[
V_n = \frac{1}{n(n-1)h^p} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} I_C(Z_i) I_C(Z_j)\, K\Big(\frac{Z_i - Z_j}{h}\Big)\, \hat\eta_i \hat\eta_j, \tag{2.9}
\]
useful in the current set up. The reason for restricting the covariate Z to a compact set C is to avoid the usual difficulty associated with a vanishing f̃(z).

We first decompose the residuals as
\[
\hat\eta_i = Y_i - \hat g(Z_i, \hat\theta_n) = [Y_i - g(Z_i, \theta_0)] + [g(Z_i, \theta_0) - \hat g(Z_i, \theta_0)] + [\hat g(Z_i, \theta_0) - \hat g(Z_i, \hat\theta_n)] := \eta_i - e_i - \delta_i, \ \text{say}. \tag{2.10}
\]
Compared to the test statistic in Zheng (1996), it is important to observe that there is an extra nonparametric estimation residual term ei involved here, which is shown later to contribute to the asymptotic distribution of Vn.
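The statistic (2.9) is straightforward to compute once the residuals are in hand. Below is an illustrative vectorized computation for p = 1; the names are ours, and the compact set C is passed as an interval.

```python
import numpy as np

def V_n(Z, eta_hat, h, C):
    """Test statistic (2.9) for p = 1; eta_hat are residuals Y_i - g_hat(Z_i)."""
    a, b = C
    inC = (Z >= a) & (Z <= b)
    D = (Z[:, None] - Z[None, :]) / h                    # (Z_i - Z_j)/h
    Kmat = 0.75 * (1.0 - D**2) * (np.abs(D) <= 1.0)      # Epanechnikov kernel
    W = np.outer(inC * eta_hat, inC * eta_hat) * Kmat    # I_C I_C K eta_i eta_j
    np.fill_diagonal(W, 0.0)                             # exclude i = j terms
    n = len(Z)
    return W.sum() / (n * (n - 1) * h)                   # h^p with p = 1
```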
2.3 Asymptotic distributions

In this section we shall derive the asymptotic distributions of Vn under H0 and under some alternatives.

2.3.1 Asymptotic null distribution

We shall first state the assumptions needed for obtaining the asymptotic null distribution of Vn. Let
\[
\sigma^2(z) := E(\eta^2|Z = z), \qquad \gamma^2(z) := E\big\{[q(X, \theta_0) - g(Z, \theta_0)]^2 \mid Z = z\big\}, \qquad z \in \mathbb{R}^p,
\]
let N0 denote an open neighborhood of θ0, and let ||·|| denote the Euclidean norm of a vector or a matrix. For a given positive integer k, a density kernel K is said to be of order k if
\[
\int u^j K(u)\, du = 0 \ \text{ for all } 1 \le j \le k - 1, \qquad \int u^k K(u)\, du \ne 0.
\]
We are now ready to state our assumptions. All limits are taken as N ∧ n → ∞, unless mentioned otherwise.

(C1) The density function fZ is continuously differentiable and inf_{z∈C} fZ(z) > 0.
(C2) The regression function m(x, θ) is differentiable with respect to θ, for each x ∈ R^p, with its vector of derivatives ṁ(x, θ) satisfying E[sup_{θ∈Θ} |m(X, θ)|² + sup_{θ∈Θ} ||ṁ(X, θ)||²] < ∞.
(C3) There exists an estimator θ̂n of θ0 such that √n ||θ̂n − θ0|| = Op(1) under H0.
(C4) For some ∆ > 0, sup_{z∈R^p} σ^{2+∆}(z) fZ(z) < ∞. The functions σ²(z)fZ(z) and g(z, θ0)fZ(z) are continuous and uniformly bounded. The functions fZ(z) and g(z, θ0)fZ(z) and their first and second derivatives are continuous and uniformly bounded.
(C5) For each z ∈ R^p, g(z, θ) is twice continuously differentiable in θ at θ0. The second derivative g̈ satisfies E{sup_{θ∈N0} ||g̈(Z, θ)||²} < ∞.
(C6) E(σ²(Z))² + E(γ²(Z))² < ∞.
(C7) K is a continuous and symmetric density function on R^p with bounded partial derivatives, and of order k > p/4.
(C8) N/n → λ, h → 0, w → 0, and w/h → c, 0 ≤ c < ∞.
(C9) (n ∧ N)h^p → ∞, (n ∧ N)w^p → ∞.
(C10) With k as in (C7), nh^{p/2} w^{2k} → 0.

Remark 2.3.1. Parametric fε. Suppose fε belongs to a parametric family of densities with an unknown parameter vector ν. Then Qε,1, Qε,0 and g(z, θ) will also depend on ν. Let γ := (θ^T, ν^T)^T, and let γ̂ be a √n-consistent estimator of γ under H0. Then one can apply the above tests with g(z, θ̂) replaced by g(z, γ̂) throughout. The asymptotic distributions of the thus modified test statistics are not affected by this modification. For example, if ε ∼ N1(0, σ²) and m(x, θ) = α + βx, then γ = (θ^T, σ)^T = (α, β, σ)^T and
\[
q(x, \gamma) = m(x, \theta)\, Q_{\varepsilon,0}(-m(x, \theta)) + Q_{\varepsilon,1}(-m(x, \theta)) = (\alpha + \beta x)\, \Phi\Big(\frac{\alpha + \beta x}{\sigma}\Big) + \sigma\, \phi\Big(\frac{\alpha + \beta x}{\sigma}\Big),
\]
where Φ and φ are the cumulative distribution function and the density function of the standard normal distribution. For more estimation details, see Wang (1998) and Amemiya (1984).

Remark 2.3.2. The conditions (C3)–(C5) are essentially used to ensure the √n-consistency of the least squares parameter estimators in Section 2.4, while the assumptions (C7)–(C10) about K and the bandwidths are needed to derive the asymptotic distributions of the proposed test statistics.

Remark 2.3.3. Note that the order k of the kernel function K needs to be larger than p/4 in order to obtain a valid bandwidth h, since both nh^p → ∞ and nh^{p/2+2k} → 0 should be satisfied. For example, if p < 8, (C10) will automatically hold for any symmetric kernel density. However, the test will suffer from the curse of dimensionality, since the asymptotic bias of kernel regression estimators is of the order O(h^k). As p increases, the basic assumption nh^p → ∞ requires a wider bandwidth. As a consequence, the order k of the kernel function has to be increased in order to make the bias negligible compared to the asymptotic rate.

To state the main theorem, we need to introduce
\[
K_1 := \int K^2(u)\, du, \qquad K_2 := \int \Big[\int\!\!\int K(u) K(v)\, \tfrac{1}{2}\big\{K(s + c(u - v)) + K(-s + c(u - v))\big\}\, du\, dv\Big]^2 ds, \tag{2.11}
\]
\[
\tau_1 := \int I_C(z)\, [\sigma^2(z)]^2 f_Z^2(z)\, dz \cdot K_1, \qquad \tau_2 := \int I_C(z)\, [\gamma^2(z)]^2 f_Z^2(z)\, dz \cdot K_2,
\]
where c is as in assumption (C8). We are now ready to state the following theorem, describing the asymptotic null distribution of Vn. Throughout, →p and →D denote convergence in probability and in distribution, respectively.

Theorem 2.3.1. Under (2.1) and (2.2), the assumptions (C1)–(C10) and H0, the following result holds. With λ as in assumption (C8), if 0 < λ < ∞, then
\[
nh^{p/2} V_n \to_d N_1\big(0,\ 2\tau_1 + 2\tau_2/\lambda^2\big). \tag{2.12}
\]
Moreover, τ1 and τ2 can be consistently estimated by
\[
\hat\tau_1 := \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, K_h(Z_i - Z_j)\, \hat\eta_i^2 \hat\eta_j^2\, K_1, \qquad \hat\tau_2 := \frac{1}{N(N-1)} \sum_{k \ne l} I_C(Z_k) I_C(Z_l)\, K_w(Z_k - Z_l)\, \tilde\eta_k^2 \tilde\eta_l^2\, K_2,
\]
where η̃k = q(Xk, θ̂n) − ĝ(Zk, θ̂n), k = 1, ..., N. Consequently, the test that rejects the null hypothesis whenever
\[
\big|nh^{p/2} V_n\big| \big/ \sqrt{2\hat\tau_1 + 2\hat\tau_2/\lambda^2} > z_{\alpha/2}
\]
will have asymptotic size α, where zα is the upper α quantile of the N1(0, 1) distribution.

The proof of the above theorem is given in Section 2.6. Here we briefly sketch the idea of the proof, which is also helpful in discussing the case λ = ∞. For the sake of brevity, write Σ_{i≠j} for Σ_{i=1}^n Σ_{j=1, j≠i}^n, Σ_{k≠l} for Σ_{k=1}^N Σ_{l=1, l≠k}^N, and let
\[
K_{h,ij} := K_h(Z_i - Z_j) = h^{-p} K\big((Z_i - Z_j)/h\big), \qquad 1 \le i, j \le n.
\]
Then, using the decomposition (2.10), the statistic Vn can be decomposed as
\[
V_n = \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j) K_{h,ij}\, \hat\eta_i \hat\eta_j \tag{2.13}
\]
\[
= \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j) K_{h,ij} \big(\eta_i \eta_j + e_i e_j + \delta_i \delta_j - 2\eta_i e_j - 2\eta_i \delta_j - 2 e_i \delta_j\big) := V_{n1} + V_{n2} + V_{n3} - 2U_{n1} - 2U_{n2} - 2U_{n3}, \ \text{say}.
\]
We will first show that only Vn1 and Vn2 contribute to the asymptotic variance of nh^{p/2}Vn, and that the asymptotic mean of nh^{p/2}Vn is 0 under the assumed conditions. Then both Vn1 and Vn2 are approximated by degenerate U statistics, constructed from the projection of Vn1 based only on the primary data and that of Vn2 based only on the validation data. Eventually we obtain that Vn is asymptotically normally distributed with the convergence rate nh^{p/2}.

In fact, τ1 = E_C{[σ²(Z)]² fZ(Z)} K1 = E_C{η² [fZ(Z) E(η²|Z)]} K1, where E_C denotes the expectation over the compact subset C. The unconditional expectation can be consistently estimated by the sample average
\[
\frac{1}{n} \sum_{i=1}^{n} I_C(Z_i)\, \hat\eta_i^2\, \hat f(Z_i)\, \hat E(\eta_i^2|Z_i),
\]
where f̂ is a kernel density estimator of fZ based on {Zi, i = 1, ..., n} in the primary data, and the conditional expectation is estimated by the kernel estimator
\[
\hat E(\eta_i^2|Z_i) = \frac{1}{n-1} \sum_{j=1, j \ne i}^{n} I_C(Z_j)\, K_{h,ij}\, \hat\eta_j^2 \big/ \hat f(Z_i).
\]
Plugging the estimated conditional residuals into the sample average, we obtain
\[
\hat\tau_1 = \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, K_{h,ij}\, \hat\eta_i^2 \hat\eta_j^2\, K_1.
\]
The parameter τ2 can be estimated similarly. Actually, both {Zi, i = 1, ..., n} and {Zk, k = 1, ..., N} can be used to form the kernel density estimator in τ̂1, to make the estimation more efficient, as long as they are i.i.d. copies of Z.
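For illustration, the estimator τ̂1 of Theorem 2.3.1 and the resulting test decision might be coded as follows for p = 1; the two-sided rejection rule in the comment reflects the standardization above, and all names are ours.

```python
import numpy as np

K1 = 0.6  # int K^2(u) du for the Epanechnikov kernel

def tau1_hat(Z, eta_hat, h, C):
    """tau_1 estimator of Theorem 2.3.1 (p = 1):
    (1/(n(n-1))) sum_{i != j} I_C I_C K_h(Z_i - Z_j) eta_i^2 eta_j^2 * K1."""
    a, b = C
    inC = (Z >= a) & (Z <= b)
    D = (Z[:, None] - Z[None, :]) / h
    Kh = 0.75 * (1.0 - D**2) * (np.abs(D) <= 1.0) / h     # K_h(Z_i - Z_j)
    W = np.outer(inC * eta_hat**2, inC * eta_hat**2) * Kh
    np.fill_diagonal(W, 0.0)
    n = len(Z)
    return K1 * W.sum() / (n * (n - 1))

# tau2_hat is computed the same way from the validation residuals
# eta_k = q(X_k, theta_hat) - g_hat(Z_k, theta_hat), bandwidth w and constant K2.
# Reject H0 at level alpha when
#   abs(n * np.sqrt(h) * Vn) / np.sqrt(2*t1 + 2*t2/lam**2) > z_{alpha/2}.
```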
Remark 2.3.4. Alternative consistent estimators of τ1 and τ2 are given by
\[
\tilde\tau_1 := \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, h^p K_{h,ij}^2\, \hat\eta_i^2 \hat\eta_j^2, \qquad \tilde\tau_2 := \frac{1}{N(N-1)} \sum_{k \ne l} I_C(Z_k) I_C(Z_l)\, w^p K_{w,kl}^2\, \tilde\eta_k^2 \tilde\eta_l^2. \tag{2.14}
\]
A modification of the proofs in Zheng (1996) yields that τ̃j, j = 1, 2, are indeed consistent for τj, j = 1, 2, respectively. Details are skipped. A simulation study also shows little difference between the two proposed estimation methods when the sample size is large.

Remark 2.3.5. The cases λ = ∞ and λ = 0. When the validation data size N is much larger than the primary data size n, i.e., when λ is sufficiently large, the regression function g(Z, θ0) can be efficiently recovered by means of the validation data; hence the primary data dominates the asymptotic behavior of the tests, while the validation data does not play a role in their asymptotic variances. This is justified by letting N/n → ∞, i.e., λ = ∞. From the proof of the above theorem, we see that nh^{p/2} Vn1 →D N(0, 2τ1) and N h^{p/2} Vn2 →D N(0, 2τ2). Moreover, Vn1 and Vn2 are asymptotically independent. All other terms in Vn are asymptotically negligible compared to these two. Hence
\[
nh^{p/2} V_n \sim nh^{p/2} V_{n1} + nh^{p/2} V_{n2} = nh^{p/2} V_{n1} + \frac{n}{N}\, N h^{p/2} V_{n2} \to_D N(0, 2\tau_1),
\]
since now n/N → 0. On the other hand, when the primary data set is much larger than the validation data set, the asymptotic convergence rate is limited by the validation sample size. By going through the proof of Theorem 2.3.1 again with N/n → 0, we obtain

Theorem 2.3.2. Under (2.1), (2.2), assumptions (C1)–(C10), and H0, as N/n → λ = 0,
\[
N h^{p/2} V_n \to_D N(0, 2\tau_2).
\]

2.3.2 Asymptotic power

In this section we shall investigate the asymptotic power of the proposed tests under the fixed alternative H′: µ(x) = ℓ(x), where ℓ ∉ M and Eℓ²(X) < ∞. Let h(Z) = E(ℓ(X)|Z). Then the relation between Y and Z takes the form Y = h(Z) + η. Note that Eℓ²(X) < ∞ implies that Eh²(Z) < ∞. Additionally, we assume the following.

(C11) E[(h(Z) − g(Z, θ))² fZ²(Z)] has a unique minimizer θa.

Under H′, the decomposition of the residuals (2.10) becomes
\[
\hat\eta_i = [Y_i - g(Z_i, \theta_a)] + [g(Z_i, \theta_a) - \hat g(Z_i, \theta_a)] + [\hat g(Z_i, \theta_a) - \hat g(Z_i, \hat\theta_n)] = \bar\eta_i - \bar e_i - \bar\delta_i,
\]
where η̄i = Yi − g(Zi, θa). Because E(η̄i|Zi) = h(Zi) − g(Zi, θa) is no longer 0 under H′, Vn1 is a non-degenerate U statistic. As shown in Lemma 2.6.1, under H′ the asymptotic behavior of Vn1 is still dominated by
\[
T_{n1} = \frac{1}{n(n-1)} \sum_{i \ne j} \tilde\varphi_2(Z_i, Z_j), \qquad \tilde\varphi_2(Z_i, Z_j) = I_C(Z_i) I_C(Z_j)\, K_{h,ij}\, [h(Z_i) - g(Z_i, \theta_a)][h(Z_j) - g(Z_j, \theta_a)].
\]
Tn1 is a non-degenerate U statistic as well. By Lemma 3.1 in Zheng (1996),
\[
T_{n1} = \frac{2}{n} \sum_{i=1}^{n} E[\tilde\varphi_2(Z_i, Z_j)|Z_i] - E[\tilde\varphi_2(Z_1, Z_2)] + o_p(1/\sqrt n).
\]
By the weak law of large numbers, the first term above converges in probability to 2E[φ̃2(Z1, Z2)]. Algebra shows that
\[
E[\tilde\varphi_2(Z_1, Z_2)] = E\{I_C(Z)[h(Z) - g(Z, \theta_a)]^2 f_Z(Z)\} + o(1).
\]
Hence
\[
T_{n1} \to_p E\big\{I_C(Z)[h(Z) - g(Z, \theta_a)]^2 f_Z(Z)\big\}. \tag{2.15}
\]
Since Tn1 is the only leading term in Vn1, we have
\[
V_{n1} \to_p E\big\{I_C(Z)[h(Z) - g(Z, \theta_a)]^2 f_Z(Z)\big\}. \tag{2.16}
\]
Under H′, both ēi and δ̄i share the properties of ei and δi with θ0 replaced by θa.
Thus, arguing as under H0, one can verify that all the terms except Vn1 in (2.13) are Op(1/(nh^{p/2})). Moreover,
\[
\tilde\tau_1 = \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, h^p K_{h,ij}^2\, \hat\eta_i^2 \hat\eta_j^2 = \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, h^p K_{h,ij}^2\, \bar\eta_i^2 \bar\eta_j^2 + o_p(1)
\]
\[
\to_p \int K^2(u)\, du\; E\big\{I_C(Z)\big(\sigma^2(Z) + [h(Z) - g(Z, \theta_a)]^2\big)^2 f_Z(Z)\big\} := \bar\tau_1.
\]
Unlike τ̃1, τ̃2 does not involve any information about Yi. Instead, since η̃k = q(Xk, θ̂n) − ĝ(Zk, θ̂n) and the kernel regression estimator ĝ(Zk, θ̂n) always consistently estimates E(q(Xk, θ̂n)|Zk), we have τ̃2 →p τ2 under H′. Now we state the asymptotic behavior of the proposed test under H′.

Theorem 2.3.3. Under the conditions (C1)–(C11) and under the alternative hypothesis H′, for finite 0 < λ < ∞,
\[
\frac{V_n}{\sqrt{2\hat\tau_1 + 2\hat\tau_2/\lambda^2}} \to_p \frac{E\{I_C(Z)[h(Z) - g(Z, \theta_a)]^2 f_Z(Z)\}}{\sqrt{2\bar\tau_1 + 2\tau_2/\lambda^2}} > 0.
\]
The standardized test statistic nh^{p/2} Vn / √(2τ̂1 + 2τ̂2/λ²) →p ∞ under H′. Hence the proposed test is consistent against the alternatives H′.

Now we consider a sequence of local alternatives:
\[
H_a: \ E(Y_i^*|X_i) = m(X_i, \theta_0) + b_n a(X_i),
\]
where a(·) ∉ M, a(·) is continuously differentiable, E[a(X)]² < ∞, and bn → 0.

Theorem 2.3.4. Under (C1)–(C11) and Ha with bn = (nh^{p/2})^{−1/2} and 0 < λ < ∞, we have
\[
\frac{nh^{p/2} V_n}{\sqrt{2\hat\tau_1 + 2\hat\tau_2/\lambda^2}} \to_D N(\gamma, 1), \qquad \gamma = E\{I_C(Z)\, E[a(X)|Z]^2 f_Z(Z)\} \big/ \sqrt{2\tau_1 + 2\tau_2/\lambda^2}.
\]

2.4 Estimation of θ0

To perform the proposed testing procedure, we first need to obtain a √n-consistent estimator of θ0. Song (2009) uses the least squares estimator
\[
\hat\theta_{OLS} = \arg\min_{\theta} \sum_{i=1}^{n} I_C(Z_i)\big(Y_i - \hat g_{w_s}(Z_i, \theta)\big)^2, \tag{2.17}
\]
where the Nadaraya-Watson regression estimator ĝ_{ws} employs a bandwidth ws. In order to assess the performance of the proposed test under different estimation procedures, we also introduce weighted least squares estimators. In particular, we choose the weight f̃(Zi) for each observation Zi to avoid the instability of kernel regression estimation when f̃(Zi) is close to 0. In other words, we consider the weighted least squares estimator
\[
\hat\theta_{WLS} = \arg\min_{\theta} \sum_{i=1}^{n} \big(Y_i - \hat g_{w_s}(Z_i, \theta)\big)^2\, [\tilde f(Z_i)]^2. \tag{2.18}
\]
An argument similar to the one used in Song (2009) shows that θ̂WLS is also asymptotically normal with the convergence rate √n.
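As an illustration of (2.17), here is a sketch of the least squares estimation with a numerical optimizer; g_hat re-implements (2.6) with the Gaussian-error q of Remark 2.3.1 (σ = 1), and the starting value and all names are assumptions of this sketch. The weighted version (2.18) is obtained by replacing the indicator weights with [f̃(Zi)]².

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def g_hat(z, theta, Z_val, X_val, w):
    """Estimator (2.6) with q(x, theta) = m*Phi(m) + phi(m), m = a + b*x."""
    a, b = theta
    m = a + b * X_val
    q = m * norm.cdf(m) + norm.pdf(m)
    wts = 0.75 * (1 - ((z - Z_val) / w)**2) * (np.abs(z - Z_val) <= w)
    return np.sum(wts * q) / np.sum(wts)     # simplification: assumes positive mass

def theta_ols(Y, Z, Z_val, X_val, ws, C, theta0=(1.0, 0.5)):
    """Least squares estimator (2.17): minimize sum I_C (Y_i - g_hat(Z_i, theta))^2."""
    a, b = C
    inC = (Z >= a) & (Z <= b)
    def loss(theta):
        g = np.array([g_hat(z, theta, Z_val, X_val, ws) for z in Z])
        return np.sum(inC * (Y - g)**2)
    return minimize(loss, theta0, method="Nelder-Mead").x
```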
In addition, when all the r.v.'s X, u and ε are Gaussian in the above linear errors-in-variables Tobit model, Wang (1998) obtained two-step moment estimators θ̂TME. In order to conveniently apply this estimator in the next section, we briefly describe it here. Assume
\[
Y^* = \alpha + \beta^T X + \varepsilon, \qquad Y = \max\{Y^*, 0\}, \qquad Z = X + u,
\]
\[
X \sim N(\mu_X, \Sigma_X), \qquad u \sim N(0, \Sigma_u), \qquad \varepsilon \sim N_1(0, \sigma_\varepsilon^2),
\]
where X, u and ε are mutually independent and ∆ := Σ_X^{−1}Σ_u is known. Under the normality assumption, the first and second moments can be calculated:
\[
\mu_{Y^*} = \alpha + \beta^T \mu_X, \qquad \sigma_{Y^*}^2 = \beta^T \Sigma_X \beta + \sigma_\varepsilon^2, \tag{2.19}
\]
\[
\mu_X = \mu_Z, \qquad \sigma_{ZY^*} = \Sigma_X \beta, \qquad \Sigma_Z = \Sigma_X + \Sigma_u. \tag{2.20}
\]
One can construct estimating equations by substituting sample moments in the above equations in order to obtain the parameter estimators. However, the moments of Y* are also needed. Algebra shows that
\[
E(Y) = \Phi(\delta)\, E(Y|Y > 0), \qquad \delta := \mu_{Y^*}/\sigma_{Y^*}, \tag{2.21}
\]
\[
E(Y|Y > 0) = \mu_{Y^*} + \sigma_{Y^*}\, \phi(\delta)/\Phi(\delta), \tag{2.22}
\]
\[
E(ZY|Y > 0) = \sigma_{ZY^*} + \mu_Z\, E(Y|Y > 0). \tag{2.23}
\]
Let µ̂Y denote the overall sample mean of the responses Yi and µ̂Y+ the mean of the positive Yi's only. Using sample moments in equation (2.21), we obtain δ̂ = Φ^{−1}(µ̂Y/µ̂Y+). Then, combining this with (2.22), we obtain the estimates of µY* and σY* as follows:
\[
\hat\mu_{Y^*} = \hat\delta\, \hat\mu_{Y^+} \big/ \big[\hat\delta + \phi(\hat\delta)/\Phi(\hat\delta)\big], \qquad \hat\sigma_{Y^*} = \hat\mu_{Y^*}/\hat\delta.
\]
By equation (2.23), one can further estimate σZY* by σ̂ZY* = µ̂ZY+ − µ̂Z µ̂Y+, where µ̂ZY+ is the sample mean of ZiYi over the positive Yi's. Then, plugging µ̂Y*, σ̂Y*, σ̂ZY* into (2.19) and (2.20), and using the known ∆ = Σ_X^{−1}Σ_u, we obtain the following estimators of α and β:
\[
\hat\beta_{TME} = \hat\Sigma_X^{-1}\, \hat\sigma_{ZY^*}, \qquad \hat\Sigma_X = \hat\Sigma_Z (I + \Delta)^{-1}, \qquad \hat\alpha_{TME} = \hat\mu_{Y^*} - \hat\mu_X^T \hat\beta, \qquad \hat\mu_X = \hat\mu_Z.
\]
Computationally, θ̂TME = (α̂TME, β̂TME^T)^T is more efficient due to its closed form, while the other two estimators require numerical optimization. We use all three estimators in the simulation study of the next section.
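The closed-form TME steps above translate directly into code. A sketch for p = 1, under the stated normality assumptions and known ∆; the names are ours.

```python
import numpy as np
from scipy.stats import norm

def theta_tme(Y, Z, Delta):
    """Two-step moment estimator (alpha_TME, beta_TME) for p = 1;
    Delta = Sigma_X^{-1} Sigma_u is assumed known."""
    pos = Y > 0
    mu_Y, mu_Yp = Y.mean(), Y[pos].mean()
    delta = norm.ppf(mu_Y / mu_Yp)                                       # from (2.21)
    mu_Ys = delta * mu_Yp / (delta + norm.pdf(delta) / norm.cdf(delta))  # from (2.22)
    sigma_ZYs = np.mean(Z[pos] * Y[pos]) - Z.mean() * mu_Yp              # from (2.23)
    sigma_X2 = np.var(Z) / (1.0 + Delta)          # Sigma_Z = Sigma_X + Sigma_u
    beta = sigma_ZYs / sigma_X2                   # sigma_ZY* = Sigma_X * beta
    alpha = mu_Ys - beta * Z.mean()               # mu_Y* = alpha + beta*mu_X, mu_X = mu_Z
    return alpha, beta
```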
We used the kernel density K(u) = 0.75(1 − u2 )I(|u| ≤ 1) of order k = 2 for all the kernel density related tests. In the case p = 2, the only other available test is the Song’s score-type test. However, 2p in Song’s test, the bandwidth ws should satisfy both conditions N ws /log(N ) → ∞ and N ws2k → 0, where k is a positive even integer specifying the order of the symmetric kernel density K. When p = 2, the above bandwidth conditions require k to be larger than 2, which means that symmetric densities like the above kernel K(u) can not be used here. If we take k = 4, K will take negative values at certain points. In a simulation, using the fourth order density kernel, K(4) (u, v) = (0.086 − 0.2u2 )(0.086 − 0.2v 2 )K(u)K(v)/0.0462 as introduced in Jones and Signorini (1997), the least square estimator of θ0 showed large bias and mean square error, which in turn makes the score-type test difficult to implement. For these reasons we only report the finite sample behavior of the Vθˆ tests. We find both the empirical levels and powers under the chosen alternatives are satisfying. Bandwidth selection: As is evident, the implementation of the proposed tests requires the selection of the two bandwidths. One is the kernel regression bandwidth w and the other is the bandwidth h used in forming the test statistics Vn . It is thus important to provide a practical strategy for the selection of these bandwidths. ˆ we propose to obtain the optimal w, denoted by wb , by Given a consistent estimator θ, minimizing the mean square error of the kernel regression estimator gˆw as follows. M SE1 (w) := 1 n n ˆ 2, IC (Zi )(Yi − gˆw (Zi , θ)) i=1 25 wb := arg min M SE1 (w). w Note that no cross validation is needed since the Nadaraya-Watson estimator gˆw is constructed based on the independent validation data {(q(Xk ), Zk ), k = 1, ..., N } instead of {(Yj , Zj ), j = 1, ..., n}. Regarding the bandwidth h, recall that h was originally used to estimate E(ηi |Zi ) by the estimator given in (2.8). Since, under H0 , E(η|Z = z) ≡ 0, we propose to obtain an optimal h by minimizing the mean square error M SE2 (h) = 1 n n ˆ i |Zi ])2 = IC (Zi )(E[η i=1 1 n n n i=1 j=i,j=1 2 IC (Zi )IC (Zj ) Zj − Zi ˜(Zi ) . f η ˆ K j (n − 1)hp h To satisfy that h → 0 and w/h → c < ∞ in (C8), we enforce the constraint 0.1wb ≤ h ≤ 10wb in the above minimization. In our simulation, we applied the grid search of bandwidths starting from 0.1wb with step 0.02 to obtain the optimal bandwidth hopt . In some cases, a grid search study showed that M SE2 (h) decays slowly to 0, for all sufficient large values of h. This will cause the chosen bandwidth to be much larger. Hence, we set a threshold 0.05 for M SE2 to avoid choosing too large a bandwidth. To summarize, hopt := min h : h = argmin0.1w ≤h≤10w max{M SE2 (h), 0.05} . b b Simulation 1: p = 1. In this simulation, the data were generated from the model (2.1) and under H0 , where m(x, θ) = α + βx, θ = (α, β)T , with α = 1, β = 0.6, and Z = X + u, where X ∼ N1 (0, 22 ), ε ∼ N1 (0, 1) and u ∼ N (0, 0.52 ) so that the ratio σu2 /σx2 = 1/16 in the TME procedure of Wang (1998). The truncation rate is approximately 26%. Moreover, here q(x, θ) = (α + βx)Φ((α + βx)) + φ((α + βx)), where Φ and φ are distribution function and density function of the N1 (0, 1) r.v., respectively. 26 Following the estimation procedure in Song (2009), ws was set as N −1/3 to obtain estimators θˆOLS and θˆW LS . 
Simulation 1: p = 1. In this simulation, the data were generated from the model (2.1) under H0, where m(x, θ) = α + βx, θ = (α, β)^T, with α = 1, β = 0.6, and Z = X + u, where X ∼ N1(0, 2²), ε ∼ N1(0, 1) and u ∼ N1(0, 0.5²), so that the ratio σu²/σx² = 1/16 in the TME procedure of Wang (1998). The truncation rate is approximately 26%. Moreover, here q(x, θ) = (α + βx)Φ(α + βx) + φ(α + βx), where Φ and φ are the distribution function and the density function of the N1(0, 1) r.v., respectively. Following the estimation procedure in Song (2009), ws was set to N^{−1/3} to obtain the estimators θ̂OLS and θ̂WLS.

Table 2.1 reports the absolute bias and the square root of the mean square error (RMSE, in parentheses) of the three estimators of θ0 described in Section 2.4. As expected, both the bias and the RMSE decrease as the sample size n increases for all three estimators, which indicates their consistency. Moreover, under the Gaussian scenario, θ̂TME is seen to be superior among the three and θ̂WLS is the least favorable. In the power analysis below, we will see that the empirical power of the Vθ̂ test shows a similar pattern.

Table 2.1: Comparison of estimators: absolute bias and RMSE in parentheses

(n, N)      α̂OLS           α̂WLS           α̂TME
(50, 100)   0.019 (0.196)   0.018 (0.237)   0.008 (0.166)
(100, 200)  0.005 (0.129)   0.010 (0.152)   0.004 (0.116)
(200, 400)  0.004 (0.090)   0.005 (0.104)   0.004 (0.082)
(300, 600)  0.002 (0.076)   0.0001 (0.087)  0.001 (0.069)

(n, N)      β̂OLS           β̂WLS           β̂TME
(50, 100)   0.021 (0.142)   0.023 (0.183)   0.005 (0.103)
(100, 200)  0.003 (0.083)   0.011 (0.113)   0.004 (0.069)
(200, 400)  0.004 (0.057)   0.004 (0.081)   0.0006 (0.049)
(300, 600)  0.001 (0.049)   0.002 (0.068)   0.0003 (0.041)

In the discussion below, we write VOLS, VWLS and VTME for Vθ̂ when θ̂ equals the θ̂OLS, θ̂WLS and θ̂TME of the previous section, respectively. To implement the proposed test, the set C was taken to be the overlap interval of {Zi, i = 1, ..., n} and {Zk, k = 1, ..., N}; in other words, C is the interval [a, b], where a = max{min{Zi}, min{Zk}} and b = min{max{Zi}, max{Zk}}. As mentioned in the main theorem, there are two options for estimating the asymptotic variances. To simplify the computation, we used the estimators given at (2.14). Applying the above bandwidth selection scheme, Table 2.2 shows that the empirical levels of all the V tests are well controlled for the large sample sizes.

Table 2.2: Empirical levels for p = 1 at nominal level 0.05

(n, N)       VOLS   VWLS   VTME
(50, 100)    0.008  0.022  0.011
(100, 200)   0.011  0.019  0.014
(200, 400)   0.030  0.036  0.033
(300, 600)   0.029  0.031  0.034
(400, 800)   0.043  0.041  0.049
(500, 1000)  0.046  0.051  0.048

To investigate the power performance, we compared the proposed tests with the Wn and Sn tests mentioned above. We performed a finite sample power comparison by generating data under the model (2.1) and the alternatives H1: µ(x) = m(x, θ, b), for all x ∈ C and some b ∈ R, where m(x, θ, b) = 1 + 0.6x + b sin(x), b ∈ R. Table 2.3 displays the empirical power of the tests for increasing sample sizes. One can see that the empirical power of all tests increases as the sample sizes n, N and the nonlinear effect b increase. For small and moderate sample sizes, VTME performs the best, and both Wn and VTME achieve the highest power for the large sample size among the five tests. All three V tests outperform the score-type test for the larger sample sizes. Among the three V tests, VTME performs the best, followed by VOLS and VWLS. This finding also matches the behavior of the three estimators of θ0 presented in Table 2.1.

Table 2.3: Empirical power comparison for p = 1 at nominal level 0.05

(n, N)       b    Wn     Sn     VOLS   VWLS   VTME
(100, 200)   0    0.059  0.039  0.011  0.019  0.014
             0.5  0.213  0.119  0.115  0.08   0.218
             1    0.382  0.306  0.448  0.300  0.743
(300, 600)   0    0.067  0.049  0.029  0.031  0.034
             0.5  0.563  0.317  0.592  0.400  0.643
             1    0.927  0.504  0.983  0.966  0.984
(500, 1000)  0    0.077  0.058  0.046  0.051  0.048
             0.5  0.822  0.394  0.805  0.640  0.834
             1    0.991  0.583  0.984  0.979  0.983
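A skeleton of this power study (ours) under the alternative µ(x) = 1 + 0.6x + b sin(x); run_test is a placeholder for the full pipeline sketched earlier (estimate θ0, form residuals, compute Vn and standardize), returning the 5% rejection decision.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_replication(n, b, run_test):
    """Generate one primary + validation sample under the alternative and test."""
    N = 2 * n                                    # validation size, as in the study
    X = rng.normal(0, 2, n)
    Z = X + rng.normal(0, 0.5, n)
    y_star = 1 + 0.6 * X + b * np.sin(X) + rng.normal(0, 1, n)
    Y = np.maximum(y_star, 0.0)
    Xv = rng.normal(0, 2, N)
    Zv = Xv + rng.normal(0, 0.5, N)
    return run_test(Y, Z, Xv, Zv)                # True if H0 rejected at 5%

def empirical_power(n, b, run_test, reps=1000):
    return np.mean([one_replication(n, b, run_test) for _ in range(reps)])
```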
Simulation 2: p = 2. We conducted a brief simulation study for the case of bivariate predicting variables. Here, both the primary sample {(Yi, Zi), i = 1, ..., n} and the validation sample {(Xk, Zk), k = 1, ..., N} are generated from the model
\[
Y^* = \alpha + \beta_1 X_1 + \beta_2 X_2 + b\,(X_1^2 + X_2^2) + \varepsilon, \qquad Y = Y^* I(Y^* > 0), \qquad Z = X + u,
\]
where α = β1 = β2 = 1, ε ∼ N1(0, 0.5²), X = (X1, X2)^T ∼ N2(0, Σx), Σx = (σij)2×2 with σ11 = σ22 = 1, σ12 = 0.5, and u ∼ N2(0, Σu), Σu = 0.5² I2. Here I2 is the 2 × 2 identity matrix, and b = 0 corresponds to the null model. The Gaussian distribution assumption and the known covariance ratio Σx^{−1}Σu suggest the use of the VTME test. The compact set C is now a rectangle with sides chosen in a similar way as in the case p = 1. We used the kernel function K(u, v) = K(u)K(v) of order k = 2, where K is the same as in the case p = 1, and the bandwidths were selected by the above MSE criteria. In Table 2.4, the empirical level is seen to be slightly liberal for the larger sample sizes, and the empirical power increases as b, n and N increase.

Table 2.4: Empirical level and power of the VTME test for p = 2 at nominal level 0.05

  b    (100,200)  (200,400)  (300,600)  (400,800)  (500,1000)
  0    0.032      0.047      0.051      0.055      0.068
  0.1  0.051      0.060      0.096      0.154      0.164
  0.3  0.156      0.269      0.502      0.683      0.782
  0.5  0.385      0.758      0.896      0.975      0.989

2.5.2 Real data application

The enzyme reaction speed data was originally collected in 1974 to study the relationship between the initial rate of an enzyme reaction and the concentration of UDP-galactose. The data set has been analyzed by both Stute, Xue and Zhu (2007) and Du et al. (2011). The primary sample contains n = 30 observations of (Yi, Zi), where Yi is the initial reaction speed and Zi denotes the basal density of UDP-galactose for the ith individual, 1 ≤ i ≤ 30. The basal density can be measured in the two ways described in Du et al. (2011): by a simple chemical treatment and by an expensive precision machine. The former treatment produces the surrogate observation Z, while the latter provides the accurate observation X. A validation data set consisting of N = 10 pairs of basal densities was obtained. We manually truncated the responses at 125, with truncation rate 27%, to apply the proposed test. It is commonly believed that the Michaelis-Menten model, given by m(x, θ) = θ1 x/(θ2 + x), is an appropriate model for this data set.

In the estimation step, we adopted the ad hoc bandwidth ws = σ̂_Z N^{−1/3}, as recommended in Sepanski and Carroll (1993), and obtained both θ̂OLS and θ̂WLS. Besides these two estimators, the empirical likelihood estimator θ̂EL obtained in Stute, Xue and Zhu (2007) is also presented in Table 2.5. In the testing step, we applied the bandwidth selection method introduced above. The parameter estimators, optimal bandwidths and p-values of the V tests are presented in Table 2.5, and the curves estimated by both least squares methods and the empirical likelihood method of Stute, Xue and Zhu (2007) are displayed in Figure 2.1. None of the V tests using the three estimators is significant, which validates the current understanding that the above Michaelis-Menten model is proper for the data set.
[Figure 2.1: Estimated regression functions using the three estimation methods (reaction speed versus basal density; fitted curves for θ̂OLS, θ̂WLS and θ̂EL).]

Table 2.5: Estimation and testing results of the enzyme reaction dataset

        (θ̂1, θ̂2)         wb    hopt  p-value
VOLS    (217.37, 0.071)   0.12  0.49  0.464
VWLS    (218.41, 0.072)   0.12  0.45  0.592
VEL     (212.70, 0.065)   0.12  0.29  0.462

2.6 Proofs

Recall the notation in (2.10). Throughout this section, f stands for the density fZ of Z. We begin by listing some important facts about the first and second moments of the three parts of the residuals, where θ̄ denotes a vector such that ||θ̄ − θ0|| ≤ ||θ̂n − θ0||:
\[
E(\eta_i|Z_i) = 0, \qquad E(\eta_i^2|Z_i) = \sigma^2(Z_i), \quad \text{for all } 1 \le i \le n. \tag{2.24}
\]
\[
E(e_i|Z_i) = O(w^k), \qquad E(e_i^2|Z_i) = O\big(1/(N w^p) + w^{2k}\big), \quad \text{uniformly in } i \text{ for } Z_i \in C. \tag{2.25}
\]
\[
\delta_i = \hat g(Z_i, \theta_0) - \hat g(Z_i, \hat\theta_n) = \dot{\hat g}(Z_i, \theta_0)(\theta_0 - \hat\theta_n) + \frac{1}{2}(\theta_0 - \hat\theta_n)^T \ddot{\hat g}(Z_i, \bar\theta)(\theta_0 - \hat\theta_n). \tag{2.26}
\]
Fact (2.24) follows from the model (2.4). Fact (2.25) follows from Theorem 2.2.1 of Bierens (1987) pertaining to Nadaraya-Watson regression estimators, while the claim (2.26) follows from a Taylor expansion of ĝ at θ0. Intuitively, both e and δ are asymptotically negligible compared to η; however, {ei, i = 1, ..., n} are not independent, since they are all based on the validation data set (Xk, Zk), k = 1, ..., N. Hence we need to study the terms that involve {ei, i = 1, ..., n}. In the sequel, Di = (Zi, ηi), 1 ≤ i ≤ n; Dk = (Zk, Xk), 1 ≤ k ≤ N. We have

Lemma 2.6.1. Under H0 and (C1)–(C10),
\[
nh^{p/2} V_{n1} \to_d N_1(0, 2\tau_1), \tag{2.27}
\]
where τ1 is defined at (2.11).

Proof. Recall (2.13) and rewrite
\[
V_{n1} = \frac{1}{n(n-1)} \sum_{i \ne j} I_C(Z_i) I_C(Z_j)\, K_h(Z_i - Z_j)\, \eta_i \eta_j.
\]
Define Hn(Di, Dj) = IC(Zi)IC(Zj)Kh(Zi − Zj)ηiηj. It can be seen that Vn1 is a degenerate U statistic, since E[Hn|Di] = 0. The result follows by a slight modification of the proof in Zheng (1996).

The proofs of Lemmas 2.6.2 and 2.6.3 use Lemmas 2.6.4 and 2.6.5, which are given at the end of the section.

Lemma 2.6.2. Assume that (C1)–(C10) and H0 hold and that 0 < λ < ∞. Then, with τ2 as in (2.11),
\[
N h^{p/2} V_{n2} \to_d N_1(0, 2\tau_2). \tag{2.28}
\]

Proof. Let Iij = IC(Zi)IC(Zj). In this proof the indices i, j vary from 1 to n. Let
\[
V'_{n2} = \frac{1}{n(n-1)N^2} \sum_{i \ne j} \sum_{k=1}^{N} \sum_{l=1}^{N} \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)}\, K_{w,ik} K_{w,jl}\, [q(X_k) - g(Z_i)][q(X_l) - g(Z_j)].
\]
Direct calculations show that, under (C1) and (C7)–(C9),
\[
\sup_{z \in C} \Big|\frac{\tilde f(z)}{f(z)} - 1\Big| = o_p(1). \tag{2.29}
\]
Now rewrite
\[
V_{n2} = \frac{1}{n(n-1)} \sum_{i \ne j} I_{ij} K_{h,ij}\, e_i e_j = \frac{1}{n(n-1)} \sum_{i \ne j} I_{ij} K_{h,ij} \Big(g(Z_i) - \frac{W_N(Z_i)}{\tilde f(Z_i)}\Big)\Big(g(Z_j) - \frac{W_N(Z_j)}{\tilde f(Z_j)}\Big)
\]
\[
= \frac{1}{n(n-1)N^2} \sum_{i \ne j} \frac{I_{ij} K_{h,ij}}{\tilde f(Z_i)\tilde f(Z_j)} \sum_{k=1}^{N} \sum_{l=1}^{N} K_{w,ik} K_{w,jl} [q(X_k) - g(Z_i)][q(X_l) - g(Z_j)] = V'_{n2} + o_p(V'_{n2}),
\]
where the last step follows from (2.29). We shall now analyze V'n2. Rewrite
\[
V'_{n2} = \frac{1}{n(n-1)N^2} \sum_{i \ne j} \sum_{k=1}^{N} \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)} K_{w,ik} K_{w,jk} [q(X_k) - g(Z_i)][q(X_k) - g(Z_j)]
\]
\[
\quad + \frac{1}{n(n-1)N^2} \sum_{i \ne j} \sum_{k \ne l} \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)} K_{w,ik} K_{w,jl} [q(X_k) - g(Z_i)][q(X_l) - g(Z_j)] = V_{n21} + V_{n22}.
\]
For Vn21, define the symmetric function
\[
\psi_1(D_i, D_j, D_k) = \frac{1}{N}\, \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)}\, K_{w,ik} K_{w,jk}\, [q(X_k) - g(Z_i)][q(X_k) - g(Z_j)]
\]
and L(Zi, Zj, Zk) = E[(q(Xk) − g(Zi))²(q(Xk) − g(Zj))² | Zi, Zj, Zk]. Note that this kernel depends on both n and N, but this dependence is not exhibited for the sake of brevity. In order to apply Lemma 2.6.4, we need to calculate the variances of all projections of ψ1.
Rigorous calculation shows that
\[
\mathrm{Var}(\psi_1) \le E\psi_1^2 = \frac{1}{N^2}\, E\Big[\frac{I_{ij} K_{h,ij}^2}{f^2(Z_i) f^2(Z_j)}\, K_{w,ik}^2 K_{w,jk}^2\, [q(X_k) - g(Z_i)]^2 [q(X_k) - g(Z_j)]^2\Big] \tag{2.30}
\]
\[
= \frac{1}{N^2}\, E\Big[\frac{I_{ij} K_{h,ij}^2}{f^2(Z_i) f^2(Z_j)}\, E\{K_{w,ik}^2 K_{w,jk}^2 L(Z_i, Z_j, Z_k) \mid Z_i, Z_j\}\Big]
= \frac{1}{N^2 w^{3p}}\, E\Big[\frac{K_{h,ij}^2}{f^2(Z_i) f^2(Z_j)} \int K^2(u)\, K^2\Big(\frac{Z_j - Z_i}{w} + u\Big) L(Z_i, Z_j, Z_i - wu)\, f(Z_i - wu)\, du\Big]
\]
\[
= O\Big(\frac{1}{N^2 w^{3p} h^{2p}}\Big)\, E_C\Big[K^2\Big(\frac{Z_j - Z_i}{h}\Big)\, L(Z_i, Z_j, Z_i)\, f(Z_i)\Big] = O\Big(\frac{1}{N^2 w^{3p} h^{p}}\Big),
\]
where the successive equalities follow by conditioning on (Zi, Zj) and changing variables. In the above derivation we used assumption (C1), which guarantees that the density f is bounded from below on C. Next, consider
\[
E(\psi_1|D_i, D_j) = \frac{1}{N}\, \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)}\, E\{K_{w,ik} K_{w,jk}\, [q(X_k) - g(Z_i)][q(X_k) - g(Z_j)] \mid D_i, D_j\}
\]
\[
= \frac{1}{N}\, \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)}\, E\{K_{w,ik} K_{w,jk}\, [\mu_2(Z_k) - (g(Z_i) + g(Z_j)) g(Z_k) + g(Z_i) g(Z_j)] \mid D_i, D_j\}
\]
\[
= \frac{1}{N}\, \frac{I_{ij} K_{h,ij}}{f(Z_i) f(Z_j)}\, \frac{1}{w^{2p}} \int K\Big(\frac{Z_i - x}{w}\Big) K\Big(\frac{Z_j - x}{w}\Big) \big(\mu_2(x) - (g(Z_i) + g(Z_j)) g(x) + g(Z_i) g(Z_j)\big) f(x)\, dx
\]
\[
= O_p\Big(\frac{1}{N w^{p}}\Big)\, \frac{I_{ij} K_{h,ij}\, \sigma^2(Z_i)}{f(Z_i) f(Z_j)} \int K(u)\, K\Big(\frac{Z_j - Z_i}{w} + u\Big)\, du.
\]
Furthermore, uniformly in i, j,
\[
\mathrm{Var}\big(E(\psi_1|D_i, D_j)\big) \le E\big[E(\psi_1|D_i, D_j)\big]^2 = O\Big(\frac{1}{N^2 w^{2p} h^p}\Big).
\]
Similar arguments imply that, uniformly in 1 ≤ i ≤ n and 1 ≤ k ≤ N,
\[
\mathrm{Var}(E(\psi_1|D_i)) = O\Big(\frac{1}{N^2 w^{2p}}\Big), \qquad \mathrm{Var}(E(\psi_1|D_k)) = O\Big(\frac{1}{N^2 w^{2p}}\Big), \qquad \mathrm{Var}(E(\psi_1|D_i, D_k)) = O\Big(\frac{1}{N^2 w^{p} h^{2p}}\Big).
\]
Assumptions (C9) and (C10) and 0 < λ < ∞, together with Lemma 2.6.4, yield
\[
\mathrm{Var}(N h^{p/2} V_{n21}) = O\Big(\frac{2}{(N w^p)^3} + \frac{2}{(N w^p)^2} + \frac{4}{(N w^p)(N h^p)} + \frac{4 n h^p}{(N w^p)^2} + \frac{n h^p}{N^2 w^{2p}}\Big) = o(1).
\]
Moreover, direct calculations show that E[Vn21] = Eψ1 = o(1/N). Hence
\[
E[N h^{p/2} V_{n21}]^2 = \mathrm{Var}(N h^{p/2} V_{n21}) + E^2[N h^{p/2} V_{n21}] = o(1), \tag{2.31}
\]
i.e., Vn21 = op(1/(N h^{p/2})).

In order to analyze the variance of Vn22, for i ≠ j, 1 ≤ i, j ≤ n, and k ≠ l, 1 ≤ k, l ≤ N, we define the following symmetric function:
\[
\psi_2(D_i, D_j, D_k, D_l) = \frac{I_{ij} K_{h,ij}}{2 f(Z_i) f(Z_j)} \big\{K_{w,ik} K_{w,jl}\, [q(X_k) - g(Z_i)][q(X_l) - g(Z_j)] + K_{w,il} K_{w,jk}\, [q(X_l) - g(Z_i)][q(X_k) - g(Z_j)]\big\},
\]
i.e., ψ2 is symmetric within the blocks (Di, Dj) and (Dk, Dl). Then we have
\[
V_{n22} = \frac{1}{n(n-1)N^2} \sum_{i \ne j}^{n} \sum_{k \ne l}^{N} \psi_2(D_i, D_j, D_k, D_l).
\]
In order to apply Lemma 2.6.5, we need to calculate the variances of all projections of ψ2. Computations similar to those used for analyzing Var(Vn21) yield the following facts:
\[
\mathrm{Var}(\psi_2) = O\Big(\frac{1}{h^p w^{2p}}\Big), \qquad \mathrm{Var}(E(\psi_2|D_k)) = O(w^{2k}), \qquad \mathrm{Var}(E(\psi_2|D_i)) = O(w^{4k}), \tag{2.32}
\]
\[
\mathrm{Var}(E(\psi_2|D_i, D_j)) = O\Big(\frac{w^{4k}}{h^p}\Big), \qquad \mathrm{Var}(E(\psi_2|D_k, D_l)) = O\Big(\frac{1}{h^p}\Big), \qquad \mathrm{Var}(E(\psi_2|D_i, D_k)) = O\Big(\frac{w^{2k}}{w^p}\Big),
\]
\[
\mathrm{Var}(E(\psi_2|D_i, D_j, D_k)) = O\Big(\frac{w^{2k}}{h^p w^p}\Big), \qquad \mathrm{Var}(E(\psi_2|D_i, D_k, D_l)) = O\Big(\frac{1}{h^{2p}}\Big).
\]
Given the above projection variances, (C8)–(C10), 0 < λ < ∞ and Lemma 2.6.5 imply that
\[
\mathrm{Var}(V_{n22}) = O\Big(\frac{4 w^{4k}}{n} + \frac{4 w^{2k}}{N} + \frac{2 w^{4k}}{n^2 h^p} + \frac{2}{N^2 h^p} + \frac{16 w^{2k}}{n N w^p} + \frac{8}{n N^2 h^{2p}} + \frac{8 w^{2k}}{n^2 N h^p w^p} + \frac{4}{n^2 N^2 h^p w^{2p}}\Big)
\]
\[
= \frac{2}{N^2}\, \mathrm{Var}\big(E(\psi_2|D_k, D_l)\big) + o\Big(\frac{1}{N^2 h^p}\Big) = O\Big(\frac{1}{N^2 h^p}\Big). \tag{2.33}
\]
In fact, the variance term of E(ψ2|Dk, Dl) dominates the variance of Vn22. The facts (2.31) and (2.33) in turn imply that
\[
V_{n2} = V_{n22} + o_p(1/(N h^{p/2})). \tag{2.34}
\]
To further investigate Vn22, we study the projection of ψ2 on the validation data set, E(ψ2|Dk, Dl). From the form of ψ2, we only need to study this projection when Zk ∈ C and Zl ∈ C. Indeed, for fixed Zk ∉ C, there is a small enough r such that Nr(Zk) ∩ C = ∅, where Nr(Zk) is the neighborhood of Zk of radius Mr and M bounds the support of the density K in each coordinate; this leads to Kw,ik = 0 and hence ψ2 = 0.
For kernel function with density support on Rp such as normal density, one can argue with large enough MK such that the kernel density is arbitrarily small outside of [−MK , MK ]p . Details are skipped. Hence asymptotically, change of variables and Taylor expansion yield E(ψ2 |Dk , Dl ) = IC (Zk )IC (Zl ) [q(Xk ) − g(Zk )][q(Xl ) − g(Zl )] +wk C1 K(u)K(v)Kh,kl (u, v)dudv K(u)K(v)Kh,kl [q(Xk ) − g(Zk )]g (k) (Zl )v k +[q(Xl ) − g(Zl )]g (k) (Zk )uk dudv + Op (w2k ) := ψ2 (Dk , Dl ) + R2 (Dk , Dl ) + Op (w2k ), where Kh,kl (u, v) = 1/2{K((Zk − Zl )/h+w(u−v)/h)/hp +K((Zl − Zk )/h+w(u−v)/h)/hp }, C1 = 1/k!. Notice that ψ2 (Dk , Dl ) is the leading term, the other two terms are negligible as w → 0. Hence (2.33) can be rewritten as Var(Vn22 ) = 1 2 V ar[ ψ ( D , D )] + o 2 k l N2 N 2 hp 37 =O 1 . N 2 hp (2.35) Next, we will show that Vn22 is asymptotically equivalent to Vn22 defined below: Vn22 = 1 N2 {ψ2 (Dk , Dl ) + R2 (Dk , Dl )} + Op (w2k ) (2.36) k=l := Tn2 + Tn2 + Op (w2k ). It can be seen that Vn22 is the projection of Vn22 on the validation data. Hence E{(Vn22 − Vn22 )Vn22 } = 0, and Var(Vn22 − Vn22 ) = Var(Vn22 ) − Var(Vn22 ). (2.37) Now we will prove that Tn2 dominates Vn22 by showing the asymptotic properties of each term. The last term in (2.36) is negligible with asymptotic rate N hp/2 since N hp/2 w2k → 0 by assumption (C10). First, note that Tn2 is a degenerate U statistic. After verifying the conditions in Theorem 1 of Hall (1984), we apply the theorem and obtain that N Tn2 {2E ψ22 (D1 , D2 )}1/2 →d N1 (0, 1). (2.38) Since E(ψ2 (D1 , D2 )) = 0, (2.32) further implies that E ψ22 (D1 , D2 ) = Var[ψ2 (D1 , D2 )] = O(1/hp ). Moreover, (2.38) implies that V ar(Tn2 ) = 1 2 Var(ψ2 (D1 , D2 )) + o 2 2 N N hp =O 1 N 2 hp . (2.39) Second, note that Tn2 is a non-degenerate U statistic with mean 0. By applying the 38 central limit theorem for non-degenerate U statistics presented in Serfling (1981), we obtain √ N Tn2 {4Var(E(R2 |D1 ))}1/2 →d N1 (0, 1). Straightforward calculation indicates that Var(E(R2 |D1 )) = O(w2k ). Then, as n ∧ N → 0, Var(N hp/2 Tn2 ) = O(N 2 hp w2k /N ) = O(N hp w2k ) = o(1), under the condition that nhp/2 w2k → 0 and 0 < λ < ∞. Therefore Tn2 = op (1/(N hp/2 )). This fact combined with (2.39) and (2.36) yield Var(Vn22 ) = 2 1 Var[ ψ ( D , D )] + o . 2 1 2 N2 N 2 hp (2.40) The results (2.35), (2.37) and (2.40) together imply that Var(Vn22 − Vn22 ) = o 1 . N 2 hp Hence Vn2 = Vn22 + op (1/(N hp/2 )) = Vn22 + op (1/(N hp/2 )) = Tn2 + op (1/(N hp/2 )). 39 (2.41) Given the asymptotic results in (2.38), we have hp E ψ2 (D1 , D2 ) → K(u)K(v)[K(s + c(u − v)) + K(−s + c(u − v))]/2dudv 2 ds IC (x)(γ 2 (x))2 f 2 (x)dx = τ2 . × By connecting the above limiting variance with (2.38) and (2.41), eventually we obtain that N hp/2 Tn2 →1 N1 (0, 2τ2 ), hence N hp/2 Vn2 →d N1 (0, 2τ2 ). (2.42) This in turn completes the proof of Lemma 2.6.2. For the next lemma recall the decomposition (2.13). Lemma 2.6.3. Under assumptions (C1)–(C10) and 0 < λ < ∞, the following holds when H0 is true. Vn3 = op (1/(nhp/2 )); Unj = op (1/(nhp/2 )), j = 1, 2, 3. Proof. The proof of the claim about Vn3 is similar to that of Lemma 6.2. 
Rewrite Vn3 = = = 1 n(n − 1) 1 n(n − 1) Iij Kh,ij δi δj i=j i=j 1 n(n − 1)N 2 Iij Kh,ij ˆ ˆ [WN (Zi , θ0 ) − WN (Zi , θ)][W N (Zj , θ0 ) − WN (Zj , θ)] ˜ ˜ f (Zi )f (Zj ) i=j k,l Iij Kh,ij ˆ − q(Xk , θ0 )] Kw,ik Kw,jl [q(Xk , θ) ˜ ˜ f (Zi )f (Zj ) ˆ − q(Xl , θ0 )] ×[q(Xl , θ) = Vn3 + op (Vn3 ), 40 where Vn3 = 1 n(n − 1) i=j k,l IC (Zi )IC (Zj )Kh,ij ˆ − q(Xk , θ0 )][q(Xl , θ) ˆ − q(Xl , θ0 )]. [q(Xk , θ) f (Zi )f (Zj ) Furthermore, Vn3 is decomposed as the sum of the following two terms. 1 Vn3 = n(n − 1)N 2 + N i=j k=1 1 n(n − 1)N 2 Iij Kh,ij K K [q(Xk , θˆn ) − q(Xk , θ0 )]2 f (Zi )f (Zj ) w,ik w,jk i=j k=l Iij Kh,ij K K f (Zi )f (Zj ) w,ik w,jl ×[q(Xk , θˆn ) − q(Xk , θ0 )][q(Xl , θˆn ) − q(Xl , θ0 )] = Vn31 + Vn32 , say. Similar to the analysis of Vn21 , define the symmetric function φ1 (Di , Dj , Dk ) = Because √ Iij Kh,ij K K [q(Xk , θˆn ) − q(Xk , θ0 )]2 . f (Zi )f (Zj ) w,ik w,jk n(θˆn − θ0 ) = Op (1) and by the Taylor expansion, we have √ q(Xk , θˆn ) − q(Xk , θ0 ) = Op (1/ n). Then we can easily check that Vn31 = op (1). Following the routine argument showed in the proof of Lemma 3.3d in Zheng (1996), we obtain that Vn32 = op (1) under H0 and (C2), (C3), (C5), (C7)–(C9). 41 Similarly, Un1 can be written as Un1 = = 1 n(n − 1)N 2 1 n(n − 1)N 2 Iij Kh,ij Kw,ik Kw,jl ηi (q(Xl ) − g(Zj )) i=j,k,l Iij Kh,ij Kw,ik Kw,jk ηi (q(Xk ) − g(Zj )) i=j,k 1 + n(n − 1)N 2 = Un11 + Un12 , Iij Kh,ij Kw,ik Kw,jl ηi (q(Xl ) − g(Zj )) i=j,k=l say. Analogous to the analysis of Vn1 and Vn2 , similar results can be derived for Un1 as follows: Un11 = op (1/(nhp/2 )), and Un12 can be formulated as a non-degenerate U statistic with the kernel function φ2 (Di , Dj , Dk , Dl ) = Iij Kh,ij Kw,ik Kw,jl [ηi (q(Xl ) − g(Zj )) + ηj (q(Xk ) − g(Zi ))]/4 +Iij Kh,ij Kw,il Kw,jk [ηi (q(Xk ) − g(Zj )) + ηj (q(Xl ) − g(Zi ))]/4. By the central limit theorem of non-degenerate U statistics, we can see that √ nUn12 = Op (wk ). Thus nhp/2 Un1 = Op √ √ nhp · nUn12 = Op ( nhp w2k ) = op (1) under the assumption (C10). The proofs of the claims pertaining to Un2 and Un3 are similar. Details are omitted for the sake of brevity. 42 Proof of Theorem 2.3.4. Similar to the proof of Vn under H0 , we can show that under Ha Vn = (2.43) 1 n(n − 1) 1 = n(n − 1) 1 + 2 N IC (Zi )IC (Zj )Kh,ij ηˆi ηˆj i=j IC (Zi )IC (Zj )Kh,ij η¯i η¯j i=j IC (Zk )IC (Zl ) [q(Xk ) − g(Zk )][q(Xl ) − g(Zl )] K(u)K(v)Kh,kl (u, v)dudv k=l +op (1/(nhp/2 )) = T¯n1 + T¯n2 + op (1/(nhp/2 )), say. One can verify that T¯n1 and T¯n2 are the leading terms of Vn1 and Vn2 , respectively, as derived in Lemma 2.6.2. Rewrite η¯i = ηi + bn E(a(Xi )|Zi ), where E(ηi |Zi ) = 0. T¯n1 = 1 n(n − 1) = 1 n(n − 1) = 1 n(n − 1) +b2n := Iij Kh,ij η¯i η¯j i=j Iij Kh,ij [ηi + bn E(a(Xi )|Zi )][ηj + bn E(a(Xj )|Zj )] i=j Iij Kh,ij ηi ηj + bn i=j 1 n(n − 1) 2 n(n − 1) Iij Kh,ij ηi E[a(Xj )|Zj ] i=j Iij Kh,ij E[a(Xi )|Zi ]E[a(Xj )|Zj ] i=j W1 + bn W2 + b2n W3 . W1 is a degenerate two sample U statistic, hence nhp/2 W1 →d N1 (0, 2τ1 ). 43 After symmetrization, W2 can be written as a non-degenerate U statistic, hence √ nW2 = Op (1), furthermore, √ nhp/2 bn W2 = hp/4 ( nW2 ) →p 0. A similar argument as (2.15) indicates that W3 →p E{IC (Z)E[a(X)|Z]2 f (Z)}. Hence nhp/2 T¯n1 →d N1 (E{IC (Z)E[a(X)|Z]2 f (Z)}, 2τ1 ). (2.44) As for T¯n2 , the result of Tn2 in (2.42) still holds since T¯n2 only involves the validation data and it is irrelevant to the hypothesis of the regression model, i.e., nhp/2 T¯n2 →d N1 (0, 2τ2 ). 
(2.45) Note that T¯n1 and T¯n2 are independent since they are constructed based on independent samples. Combining (2.43), (2.44) and (2.45), we obtain that nhp/2 Vn →d N1 (E{IC (Z)E[a(X)|Z]2 f (Z)}, 2τ1 + (2τ2 )/λ2 ). This completes the proof of Theorem 2.3.4. Lemma 2.6.4. Let {Di , i = 1, · · · , n} be a set of i.i.d. r.v.’s and {Dk , k = 1, · · · , N } be 44 another set of i.i.d. r.v.’s, which is independent of {Di }. Define the two sample U statistic 1 T = n(n − 1)N n N ϕn (Di , Dj , Dk ), i=j=1 k=1 where ϕn is a symmetric function with regard to permutation of (Di , Dj ) and square integrable for each n. Then 1 4 2 Var(E(ϕ |D )) + Var(E(ϕn |D1 )) Var(ϕ ) + n n 1 n N n2 N 2 4 + 2 Var(E(ϕn |D1 , D2 )) + Var(E(ϕn |D1 , D1 )) . nN n Var(T ) = O (2.46) Proof. Algebra shows that Var(N n(n − 1)T ) [ϕn (Di , Dj , Dk ) − Eϕn (Di , Dj , Dk )] = E 2 i=j,k E [ϕn (Di , Dj , Dk ) − Eϕn (Di , Dj , Dk )][ϕn (Ds , Dt , Dl ) − Eϕn (Ds , Dt , Dl )] = i=j,k s=t,l = + {s,t}={i,j},l=k +4 {s,t}={i,j},k=l s=i,t=j,k=l s=i,t=j,k=l s=i,t=j,k=l E{[ϕn (Di , Dj , Dk ) − Eϕn ][ϕn (Ds , Dt , Dl ) − Eϕn ]} + + +4 s=i,t=j,k=l = 2n(n − 1)N Var(ϕn ) + 2n(n − 1)N (N − 1)Var(E(ϕn |D1 , D2 )) +4n(n − 1)(n − 2)N Var(E(ϕn |D1 , D1 )) + 4n(n − 1)(n − 2)N (N − 1)Var(E(ϕn |D1 )) +n(n − 1)(n − 2)(n − 3)N Var(E(ϕn |D1 )). The claim (2.46) follows from this identity upon dividing both sides by N n(n − 1) using the fact that (n − k)/n → 1, and (N − k)/N → 1, for k = 1, 2, 3. 45 2 and Furthermore, define 1 S= n(n − 1) n i=j=1 1 N (N − 1) N ψn (Di , Dj , Dk , Dl ), k=l=1 where ψn is square integrable and symmetric with regard to permutation of (Di , Dj ) as well as (Dk , Dl ), i.e., ψn (Di , Dj , ·, ·) = ψn (Dj , Di , ·, ·) and ψn (·, ·, Dk , Dl ) = ψn (·, ·, Dl , Dk ), for each n. An argument similar to the one used in Lemma 2.6.4 yields the following lemma. Lemma 2.6.5. Suppose {Di , 1 ≤ i ≤ n} and {Dk , 1 ≤ k ≤ N } are the two independent random samples and S is the two sample statistic defined above. Then 4 4 4 Var(ψ ) + Var(E(ψ |D )) + Var(E(ψn |Dk )) n n i n N n2 N 2 2 2 16 + 2 Var(E(ψn |Di , Dj )) + 2 Var(E(ψn |Dk , Dl )) + Var(E(ψn |Di , Dk )) nN n N 8 8 Var(E(ψn |Di , Dk , Dl )) + 2 Var(E(ψn |Di , Dj , Dk )) . + 2 nN n N Var(S) = O 46 Chapter 3 Minimum distance model checking in Berkson models 3.1 Introduction In statistical data analysis, the data is often collected subject to measurement error. One typical way to treat the measurement error is the errors-in-variables model which assumes that the real observation Z is a surrogate of the true unobserved variable X, i.e., Z = X + η, where η is the measurement error variable. Regression models with measurement error in covariates has received broad attention in the literature over the last century. In the last three decades it has been the focus of numerous researchers, as is evidenced in the three monographs by Fuller (1987), Cheng and Van Ness (1999) and Carroll, Ruppert, Stefanski and Crainiceanu (2006), and the references therein. However, as Berskon (1950) argued that in many situations it is more appropriate to treat the true unobserved variable X as the observed variable Z plus an error, i.e., X = Z + η. For instance, in economics, the household income is usually not precisely collected due to the survey design or data sensitivity. 
It was described in Kim, Chao and Härdle (2016) that when income data were collected by asking individuals which salary range category they belong to, such as between 5,000 USD and 9,999 USD, the midpoint of the range interval was used in the analysis. In this case, it is sensible to assume that the true income fluctuates around the midpoint observation subject to errors. Another example is an individual's exposure to some contaminant in epidemiological studies, for instance, atmospheric particulate matter with a diameter of less than 2.5 micrometers (PM2.5). Usually the concentration of PM2.5 in an area is reported hourly or daily as an average measurement, whereas the true exposure of an individual depends on the specific location and the time of day. This type of data also favors the Berkson error model. More examples can be found in Du et al. (2011) and Carroll et al. (2006).

Proceeding a bit more precisely, in the Berkson measurement error regression model of interest here one has the triple X, Y, Z obeying the relations
\[
Y = \mu(X) + \varepsilon, \qquad X = Z + \eta. \tag{3.1}
\]
Here Y is a scalar response variable and ε is an error variable with Eε = 0. The random vectors X, Z, η are p-dimensional, with X being the true unobservable covariate vector, Z representing an observation on X, and η denoting the measurement error having Eη = 0. For identifiability reasons, the three r.v.'s ε, Z, η are assumed to be mutually independent. Thus µ(x) = E(Y | X = x), for all x ∈ R^p.

Let Θ ⊂ R^q be a compact set, {mθ(x); θ ∈ Θ, x ∈ R^p} be a family of given functions, and C be a compact subset of R^p. The problem of interest here is to test

H0 : µ(x) = mθ0(x), for some θ0 ∈ Θ and all x ∈ C, versus H1 : H0 is not true,

based on the primary sample {(Zi, Yi), i = 1, ..., n} and an independent validation sample {(Zk, Xk), k = 1, ..., N}, all satisfying (3.1). Empirical versions of η are then naturally obtained as ηk := Xk − Zk, 1 ≤ k ≤ N.

The literature contains several references that address the estimation of the underlying parameters in the model (3.1). In the case µ(x) is linear, Berkson (1950) showed that the ordinary least squares estimators continue to be unbiased and consistent for the underlying parameters. For polynomial regression, Huwang and Huang (2000) used the method of moments, based on the first two conditional moments of Y given Z when Z, ε, η are Gaussian, to produce consistent estimators of the underlying parameters. Relaxing the normality assumption on the error distribution to a parametric density family, Wang (2004) developed a minimum distance approach based on the first two conditional moments of the response variable to consistently estimate more general parametric regression functions. In the case the measurement error density fη is known, Delaigle, Hall and Qiu (2006) constructed nonparametric estimators of µ(x) by means of trigonometric series and deconvolution techniques. In the case fη is unknown but validation data is available, Du et al. (2011) used integrated local linear smoothing and a Fourier transformation to formulate a nonparametric estimator of µ(x). Schennach (2013) obtained a sieve-based nonparametric regression estimator with the help of an instrumental variable, without assuming fη to be known.

By comparison, the literature on the above testing problem is scant. Koul and Song (2009) are the first authors to address this problem.
Assuming fη is known, they proposed parameter estimation by minimizing an integrated square distance between a nonparametric regression function estimator and the model being fitted and then utilized the minimized distance to implement the hypothesis test. In the current chapter, we extend this methodology to the case when fη is unknown, but when validation data is available. A surprising finding is that the asymptotic distributions of the minimum distance (m.d.) 49 test statistics in the case of unknown fη is the same as in the case of known fη . The asymptotic distributions of the corresponding m.d. estimators of the null model parameters are affected by not knowing fη in general. Exceptions are provided by the linear models when the set C and the integrating measure used in the definition of the above mentioned distances are symmetric around zero. This chapter is organized as follows. Section 3.2 describes the proposed m.d. estimators and test statistics and the needed assumptions for the derivation of their consistency and asymptotic normality. Section 3.3 establishes the consistency and asymptotic normality of the m.d. estimators while in Section 3.4 we state the main results of the proposed tests under the null and certain fixed alternative hypotheses and provide sketches of the proofs. It is worth mentioning that the variation in validation data contributes to the asymptotic distributions of the proposed m.d. estimators of the null model parameters but not to the asymptotic distributions of the m.d. test statistics. Section 3.5 reports findings of a Monte Carlo study that assesses some finite sample properties of an estimator and a test in the proposed classes of these inference procedures. Some of the proofs are relegated to the last Section 3.6 of the chapter. 3.2 A class of tests This section describes a class of the proposed tests and estimators of the null model parameters along with the needed assumptions. To overcome the difficulty created by not observing X we use the calibration idea as used in Koul and Song (2009). Accordingly, 50 assume E|µ(X)| < ∞, E|mθ (X)| < ∞, for all θ ∈ Θ and z ∈ Rp , define H(z) := E µ(X) Z = z = Hθ (z) := E[mθ (X)|Z = z] = µ(x)fη (x − z)dx = µ(y + z)fη (y)dy, mθ (x)fη (x − z)dx = mθ (y + z)fη (y)dy. Then the original model can be transformed to Y = H(Z) + ξ, E(ξ|Z) = 0, (3.2) and the hypothesis testing becomes H0 : H(z) = Hθ0 (z), for some θ0 ∈ Θ and all z ∈ C, vs. H1 : H0 is not true. To proceed further, let w ≡ wn = c(log n/n)1/(p+4) , c > 0, and h ≡ hn be two bandwidth sequences associated with sample sizes n and N , K be a density kernel and G be a nondecreasing right continuous real valued function on R and define z − Zi 1 1 , fˆw (z) = Khi (z) = p K h h n Mn (θ) = Mn (θ) = Wn (θ) = 1 nfˆw (z) 1 ˆ nfw (z) 1 nfˆw (z) n N Kwi (z), Hθ (z) = N −1 i=1 n k=1 2 Khi (z)[Yi − Hθ (Zi )] dG(z), i=1 n 2 Khi (z)[Yi − Hθ (Zi )] dG(z), i=1 n 2 θ˜n = argminθ Mn (θ), θˆn = argminθ Mn (θ), Khi (z)[Hθ (Zi ) − Hθ (Zi )] dG(z). i=1 51 mθ (z + ηk ),(3.3) Note that the density estimator fˆw is based on a bandwidth w that is different from the bandwidth h employed in the numerator of the regression function estimator. This plausible scheme was proposed in Koul and Ni (2004) (KN) in order to have an nhp/2 -consistent estimator of the asymptotic bias in Mn (θ˜n ). In the case fη is known then Hθ is a known parametric function and Koul and Song (2009) (KS) proposed the minimum distance testing procedure based on Mn (θ˜n ). 
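For concreteness, the following minimal sketch evaluates the known-fη criterion Mn(θ) of (3.3) for p = 1 and the linear null mθ(x) = θx, for which Hθ(z) = θz is known, and minimizes it by a grid search. The data-generating law, the bandwidth constants, the integration grid and the grid-search minimizer are illustrative assumptions of ours, not the settings used later in the thesis.

```python
import numpy as np

# Sketch of M_n(theta) from (3.3), p = 1, linear null: H_theta(z) = theta * z.
rng = np.random.default_rng(0)
n = 200
Z = rng.uniform(-1, 1, n)
X = Z + rng.normal(0, 0.1, n)            # Berkson error: X = Z + eta
Y = 1.0 * X + rng.normal(0, 0.2, n)      # model (3.1) with theta_0 = 1

h = np.std(Z) * n ** (-1 / 3)            # bandwidth in the numerator
w = (np.log(n) / n) ** (1 / 5)           # bandwidth for f_hat_w, p = 1

def epan(u):
    return 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

zgrid = np.linspace(-1, 1, 201)          # G = Lebesgue measure on C = [-1, 1]
dz = zgrid[1] - zgrid[0]
Kh = epan((zgrid[:, None] - Z[None, :]) / h) / h                  # K_hi(z)
fw = epan((zgrid[:, None] - Z[None, :]) / w).mean(axis=1) / w     # f_hat_w(z)

def M_n(theta):
    # [ (1/(n f_hat_w(z))) * sum_i K_hi(z) (Y_i - H_theta(Z_i)) ]^2, integrated dG
    num = (Kh * (Y - theta * Z)[None, :]).mean(axis=1)
    return np.sum((num / fw) ** 2 * dz)

thetas = np.linspace(0.5, 1.5, 101)
theta_tilde = thetas[np.argmin([M_n(t) for t in thetas])]
print(theta_tilde)                        # should land near theta_0 = 1
```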
However, this method is not feasible without the knowledge of fη , which renders the regression function Hθ to be unknown also. But, with the availability of validation sample {Xk , Zk }, where ηk := Xk − Zk , 1 ≤ k ≤ N is a random sample from fη , we are able to estimate Hθ (z) by Hθ (z) defined above. This then leads to the class, one for each G and K, of m.d. test statistics Mn (θˆn ). We shall now present the needed assumptions for establishing the consistency and asymptotic normality of θˆn and Mn (θˆn ). Many of these assumptions are the same as in KS. Define, for x, y ∈ Rp and θ ∈ Θ, σθ (x, y) := Cov mθ (x + η), mθ (y + η) , σθ2 (x) := σθ (x, x) = Var(mθ (x + η)). (A1) {(Yi , Zi ), Zi ∈ Rp , i = 1, ..., n} is an i.i.d. sample with regression function H(z) = E(Y |Z = z) satisfying H 2 dG < ∞, where G is a σ-finite measure with continuous Legesgue density g on C while {(Zk , Xk ), Zk ∈ Rp , Xk ∈ Rp , k = 1, ..., N } is an i.i.d. sample from Berkson measurement error model X = Z + η. (A2) 0 < σε2 := Var(ε) < ∞, τ 2 (z) = E[(mθ0 (X) − Hθ0 (Z))2 |Z = z] is a.e. (G) continuous on C. (A3) Both E|ε|2+δ and E|(mθ0 (X) − Hθ0 (Z)|2+δ are finite for some δ > 0. 52 (A4) Both E|ε|4 and E|(mθ0 (X) − Hθ0 (Z)|4 are finite. (A5) σθ2 (z)dG(z) < ∞, for all θ ∈ Θ. (F1) The density fZ is uniformly continuous and bounded away from 0 in C. (F2) The density fZ is twice continuously differentiable in C. (H1) mθ (x) is a.e. continuous in x, for every θ ∈ Θ. (H2) The parametric function family Hθ (z) is identifiable with respect to θ, i.e, Hθ1 (z) = Hθ2 (z) a.e. in z implies θ1 = θ2 . (H3) For some positive continuous function r on C, and for some 0 < β ≤ 1, |Hθ1 (z) − Hθ2 (z)| ≤ θ1 − θ2 β r(z), for all θ1 , θ2 ∈ Θ and z ∈ C. (H4) For each x, mθ (x) is differentiable with respect to θ in a neighborhood of θ0 with the derivative vector m ˙ θ (x) such that for every sequence 0 < δn → 0, sup N −1 N k=1 [mθ (Zi + ηk ) − mθ0 (Zi + ηk ) − (θ − θ0 )T m ˙ θ0 (Zi + ηk )] θ − θ0 i,θ = op (1), where the supremum is taken over 1 ≤ i ≤ n, θ − θ0 ≤ δn . (H5) The vector function m ˙ θ0 (x) is continuous in x ∈ C and for every > 0, there are n and N such that for every 0 < a < ∞, and for all n > n , N > N , P max 1≤i≤n,1≤k≤N,(nhp )1/2 θ−θ0 ≤a (H6) H˙ θ0 2 dG < ∞ and Σ0 = h−p/2 m ˙ θ (Zi + ηk ) − m ˙ θ0 (Zi + ηk ) ≥ H˙ θ0 H˙ θT dG is positive definite. 0 (K) The density kernel K is positive symmetric and square integrable on [−1, 1]p . (W1) nh2p → ∞ and N/n → λ, λ > 0. 53 ≤ . (W2) h ∼ n−a , where 0 < a < min(1/2p, 4/(p(p + 4))). We state some facts that will be often used in the proofs below. Note that (H4) implies that for every 0 < a < ∞, N −1 sup N k=1 mθ (Zi + ηk ) − mθ0 (Zi + ηk ) − (θ − θ0 )T m ˙ θ0 (Zi + ηk ) θ − θ0 i,θ where the supremum is taken over 1 ≤ i ≤ n, (nhp )1/2 θ − θ0 = op (1), (3.4) ≤ a. From Mack and Silverman (1982) we obtain that under (F1), (K1), (W1) and (W2), f 2 (z) − 1 = op (1). (3.5) sup |fˆh (z) − fZ (z)| = op (1), sup |fˆw (z) − fZ (z)| = op (1), sup Z 2 (z) z∈C z∈C z∈C fˆw Theorem 2.2 part (2) in Bosq (1998) yields that under assumptions (F2) and (K1), (logk n)−1 (n/ log n)2/(p+4) sup |fˆw (z) − fZ (z)| → 0, a.s., ∀ integer k > 0. (3.6) z∈C We also recall the following facts from KN and KS. Let dϕ = fZ−2 dG, dϕˆ = fˆw−2 dG. For any continuous function α(z) with α(z)dϕ(z) ˆ − α(z)dϕ(z) |α(z)|dϕ(z) < ∞, (3.5) implies fZ2 (z) −1 2 (z) z∈C fˆw |α(z)|dϕ(z) = op α(z)dϕ(z) + op |α(z)|dϕ(z) . ≤ sup |α(z)|dϕ(z) . 
Hence α(z)dϕ(z) ˆ = 54 (3.7) From (3.9) in KN, for any function α(z) as above, (F1), (K1) and (W1) imply 1 E n n Kh (z − Zi )α(Zi ) 2 dϕ(z) = α2 dG + o(1) = O(1). (3.8) i=1 In the sequel, we shall not exhibit the set C in the integrals. All the integrals with respect to the measure G are supposed to be over this set, unless specified otherwise. 3.3 Estimation of θ0 In this section, we establish the consistency and asymptotic normality of θˆn under H0 . To begin with, consider the following decomposition that shows a connection between Mn (θ) and Mn (θ), where Wn (θ) is as in (3.3). Mn (θ) = 1 nfˆw (z) n 2 Khi (z)[Yi − Hθ (Zi ) + Hθ (Zi ) − Hθ (Zi )] dG(z) (3.9) i=1 = Mn (θ) + Wn (θ) + 2Rn (θ), where Rn (θ) is the cross product term. We can see that the validation data is involved through the extra terms Wn and Rn . The following lemma about Wn is found to be useful in deriving various results in the sequel. Its proof is given in the last Section 3.6 of the chapter. Let K1 be as in (2.11) and let γ(θ) := σθ2 (x, y)dG(x)dG(y), AN (θ) = 1 N σθ2 (z)dG(z). Lemma 3.3.1. Suppose (A1), (A2), (A5), (F1), (H1), (K), and (W1) hold. Then for every 55 θ ∈ Θ for which µ(x) = mθ (x), x ∈ C, we have N (Wn (θ) − AN (θ)) → N1 (0, γ(θ)). 3.3.1 (3.10) Consistency of θˆn We first establish the consistency of the proposed m.d. parameter estimators θˆn . Many details below are similar to those in KN and KS. Recall µ(x) = E(Y |X = x). Let H(z) = E(µ(X)|Z = z), and define ρ(ν, Hθ ) = (ν − Hθ )2 dG, T (ν) = argminθ (ν − Hθ )2 dG = argminθ ρ(ν, Hθ ), ν ∈ L2 (G). Lemma 3.3.2. Suppose (A1), (A2), (A5), (F1), (H1), (H3), (K) and (W1) hold. If, in addition T (H) is unique, then θˆn = T (H) + op (1). The proof is deferred to the last Section 3.6 of the chapter. Assumption (H2), Lemmas 3.3.1 and 3.3.2 immediately imply the consistency of the proposed estimators θˆn as stated in the following theorem. Theorem 3.3.1. Suppose (A1), (A2), (A5), (F1), (H1)–(H3), (K), (W1) and H0 hold. Then θˆn →p θ0 . 56 3.3.2 Asymptotic normality of θˆn Here we present the asymptotic normality result about θˆn under H0 . Theorem 3.3.2. Suppose (A1)–(A3), (A5), (F1), (F2), (H1)–(H6), (K), (W1), (W2) and H0 hold, then under H0 , √ −1 , −1 n(θˆn − θ0 ) →d Nq 0, Σ−1 0 (Σ1 + λ Σ2 )Σ0 where Σ0 is given in (H6) and Σ1 = Σ2 = (σε2 + τ 2 (u))H˙ θ0 (u)H˙ θT (u)g 2 (u) 0 fZ (u) du, (3.11) σθ0 (x, y)H˙ θ0 (x)H˙ θT (y)dG(x)dG(y). 0 √ This theorem shows that θˆn is n-consistent for θ0 and the asymptotic covariance matrix is mainly determined by the two terms Σ1 and Σ2 . The matrix Σ1 represents the variation in Berkson measurement error model when fη is known as in KS while Σ2 represents the contribution due to the estimation of Hθ by Hθ using the validation data. Moreover, the covariance tends to decay as N/n increases. When N/n → ∞, in other words, when the validation sample size N is sufficiently large, compared to the primary sample size n, not surprisingly the above asymptotic covariance degenerates to the case as if fη is known. Remark 3.3.1. Here we verify that the quantities Σ1 and Σ2 in the asymptotic variancecovariance matrix are well defined under the given assumptions. Given (A2) and the compactness of C, τ 2 (u) is bounded on C. Assumption (H6) further implies that Σ1 is finite and positive definite. 57 Next, consider Σ2 . 
The Cauchy-Schwarz inequality implies that σθ (x, y) ≤ σθ (x)σθ (y) for all x, y ∈ R, θ ∈ Θ, and that for any a ∈ Rq , |aT Σ2 a| ≤ = = σθ0 (x, y) aT H˙ θ0 (x)H˙ θT (y)a dG(x)dG(y) 0 σθ0 (x)σθ0 (y) aT H˙ θ0 (x) aT H˙ θ0 (y) dG(x)dG(y) σθ0 (x) aT H˙ θ0 (x) dG(x) 2 ≤ a 2 σθ2 (x)dG(x) 0 Hθ0 (x) 2 dG(x). Hence assumptions (A5) and (H6) ensure that the entries of Σ2 exist and are finite. Moreover, as seen in the proofs below, Σ2 is a positive definite covariance matrix. Now we describe some parametric function families along with the corresponding Σ2 that satisfy the assumptions (H3)–(H5). Example 3.3.1. The linear and polynomial cases. Suppose q = p, mθ (x) = θT x, θ, x ∈ Rp . Then Hθ (z) = θT z is a known function. In this case there is no need to estimate this function and one can also use θ˜n as a m.d. estimator of θ. See Remark 3.3.2 for an asymptotic equivalence between θˆn and θ˜n . In the polynomial regression of order p, q = p + 1 and mθ (x) = θT (x), x ∈ R, where θ = (θ1 , ..., θp+1 )T and (x) := (1, x, ..., xp )T such that E (X) < ∞, where · denotes the Euclidean norm. Then L(z) := E( (X)|Z = z) = (1, z, E(z + η)2 , ..., E(z + η)p )T , Hθ (z) = θT L(z). This model is a simple deviation from the linear model and one already sees the need to 58 estimate Hθ (z). Given the validation data, an estimate of Hθ (z) in this case is given by 1 Hθ (z) = N N k=1 1 mθ (z + ηk ) = N N θ1 + θ2 (z + ηk ) + θ3 (z + ηk )2 + ... + θp+1 (z + ηk )p . k=1 Here (H3) is satisfied with r = L. Furthermore, m ˙ θ (x) = (x) and H˙ θ (z) = L(z) for all θ ∈ Θ. Therefore, similar to the linear case, (H4) and (H5) hold. Moreover, σθ (x, y) = θT E (x + η) T (y + η) − L(x)LT (y) θ, Σ2 = θ0T [E (x + η) T (y + η) − L(x)LT (y)]θ0 L(x)LT (y)dG(x)dG(y). Example 3.3.2. The nonlinear case. In biochemistry, one of the well known models for enzyme kinetics relates enzyme reaction rate to the concentration of a substrate x by the formula α0 x/(θ + x), α0 > 0, θ > 0, x > 0. This is the so called Michaelis–Menten model. The ratio γ0 = α0 /θ is defined as the catalytic efficiency that measures how efficiently an enzyme converts a substrate into product. With γ0 known, the function can be written as mθ (x) := γ0 θx , θ+x θ > 0, x > 0. (3.12) We will verify that this nonlinear function satisfies (H3)–(H5). Regarding (H3), as shown in KS, one sufficient condition is that the regression function mθ (x) satisfies the same condition in (H3). In this case, direct calculation shows that |mθ1 (x) − mθ0 (x)| = γ0 x2 |θ1 − θ0 | ≤ γ0 |θ1 − θ0 |. (θ0 + x)(θ1 + x) Hence (H3) holds for (3.12). 59 Furthermore, suppose for each x ∈ Rp , the q × q matrix m ¨ θ (x) := ∂ 2 mθ (x)/∂θ2 exists for all θ in a neighborhood U0 of θ0 and m ¨ θ (x) ≤ C, for all θ ∈ U0 and x ∈ Rp , where the constant C may depend on θ0 . Then, (H4) holds because by the Mean Value Theorem, with probability 1, for all 1 ≤ i ≤ n, N ≥ 1, θ − θ0 ≤ δn , N −1 N k=1 [mθ (Zi + ηk ) − mθ0 (Zi + ηk ) − (θ − θ0 )T m ˙ θ0 (Zi + ηk )] θ − θ0 ≤ C θ − θ0 ≤ Cδn . In particular, for the function (3.12), p = 1 = q, the second derivative of the function m ¨ θ (x) = −2γ0 x2 /(θ + x)3 is bounded for θ > 0 and x > 0, so (H4) holds in this case. √ As for (H5), with nhp |θ − θ0 | ≤ a and θ1∗ falling between θ and θ0 , we have sup h−p/2 |m ˙ θ (Zi + ηk ) − m ˙ θ0 (Zi + ηk )| = sup h−p/2 |m ¨ θ∗ (Zi + ηk )(θ − θ0 )| i,k,θ∗ i,k,θ √ √ ≤ sup Ch−p/2 |θ − θ0 | = Op (h−p/2 / nhp ) = Op 1/( nhp ) = op (1), θ where C is the upper bound for the second derivative m ¨ θ (x). Therefore (H5) is satisfied. 
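Before turning to a second nonlinear example, here is a minimal sketch of how the calibrated function Ĥθ of Section 3.2 is formed for the Michaelis–Menten model (3.12) from the empirical measurement errors of a validation sample; the values of γ0 and θ, the Gaussian law of η, and the sample size are illustrative assumptions only.

```python
import numpy as np

# Validation-sample estimate of H_theta(z) = E[m_theta(z + eta)] for the
# Michaelis-Menten function (3.12), with gamma0 treated as known.
rng = np.random.default_rng(1)
gamma0, theta = 2.0, 0.5                 # assumed constants for illustration

def m_theta(x, theta):
    # m_theta(x) = gamma0 * theta * x / (theta + x),  theta > 0, x > 0
    return gamma0 * theta * x / (theta + x)

eta_hat = rng.normal(0, 0.1, 500)        # eta_k = X_k - Z_k from validation data

def H_hat(z, theta):
    # H_hat_theta(z) = N^{-1} sum_k m_theta(z + eta_k)
    return m_theta(z + eta_hat, theta).mean()

print(H_hat(1.0, theta), m_theta(1.0, theta))   # calibrated vs. naive value
```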
Another nonlinear example is the exponential function mθ (x) = eθx , θ, x ∈ R. In practice, it is reasonable to assume that both Θ and the domain of X are bounded subsets in R, i.e., |θ| ≤ C1 and |x| ≤ C2 . Again it suffices to verify that the condition in (H3) holds with Hθ (z) replaced by mθ (x). With θ∗ falling between θ1 and θ2 , we obtain ∗ |mθ2 (x) − mθ1 (x)| = |m ˙ θ∗ (x)(θ2 − θ1 )| = |xeθ x (θ2 − θ1 )| ≤ (|x|eC1 |x| )|θ2 − θ1 | := r(x)|θ2 − θ1 |. Therefore (H3) holds for the exponential regression function. Moreover, the second derivative 60 m ¨ θ (x) = x2 eθx is bounded by the constant C12 eC1 C2 . Hence the argument similar to that for (3.12) yields that the exponential function also satisfies (H4) and (H5). Next, we provide a sketch of the proof of Theorem 3.3.2. The most of the details of the proof are the same as in KN and KS. So we shall be briefly indicating only the major differences. Proof of Theorem 3.3.2. We first show that nhp θˆn − θ0 2 = Op (1). (3.13) Define Dn (θ) = Dn (θ) = 1 n 1 n n 2 Khi (z)(Hθ (Zi ) − Hθ0 (Zi )) dϕ(z), ˆ i=1 n 2 Khi (z)(Hθ (Zi ) − Hθ0 (Zi )) dϕ(z). ˆ (3.14) (3.15) i=1 We shall shortly prove the following two facts. nhp Dn (θˆn ) = Op (1). (3.16) For any 0 < a < ∞, there exist na and Na such that P Dn (θˆn )/ θˆn − θ0 2 ≥ a + inf bT Σ0 b > 1 − a b =1 ∀ n > n a , N > Na , (3.17) where Σ0 is defined in (H6). Then, as in KS, (3.13) follows from (3.16), (3.17) and the 61 relation nhp Dn (θˆn ) = [nhp θˆn − θ0 2 ][Dn (θˆn )/ θˆn − θ0 2 ]. Proof of (3.16). Subtracting and adding Yi to the ith summand in (3.14) and the triangular inequality yield Dn (θˆn ) ≤ 2(Mn (θˆn ) + Mn (θ0 )) ≤ 2(Mn (θ0 ) + Mn (θ0 )), because θˆn is the minimizer of Mn . From (3.4) of KS, we obtain nhp Mn (θ0 ) = Op (1). Lemma 3.3.1 and the decomposition (3.9) of Mn imply that nhp Mn (θ0 ) = Op (1). Therefore nhp Dn (θˆn ) = Op (1). (3.18) Next, subtracting and adding Hθ0 (Zi ) to the ith summand in (3.15) and the triangular inequality yield Dn (θˆn ) ≤ 2(Wn (θ0 ) + Dn (θˆn )). Lemma 3.3.1 implies that N Wn (θ0 ) = Op (1). This fact, (3.18) and (W1) yield (3.16). To prove (3.17), define un = θˆn − θ0 , µ ˆ˙ n (z, θ) = Dn1 = 1 n 1 nN ˙ θ0 (Zi + ηk ), dnik = mθˆ (Zi + ηk ) − mθ0 (Zi + ηk ) − uTn m n n ˙ Khi (z)H θ (Zi )dϕ(z) ˆ = i=1 n N Khi (z) i=1 k=1 dnik un 2 1 nN dϕ(z), ˆ 62 n (3.19) N Khi (z)m ˙ θ (Zi + ηk )dϕ(z), ˆ i=1 k=1 Dn2 = uTn µ ˆ˙ n (z, θ) 2 dϕ(z). ˆ un Then by the Cauchy-Schwarz inequality, Dn (θˆn ) = un 2 1 nN n N i=1 k=1 uTn m ˙ θ0 (Zi + ηk ) dnik + Khi (z) un un 2 dϕ(z) ˆ ≥ Dn1 + Dn2 − 2 Dn1 Dn2 . By (3.7) and (3.8), 1 n n 2 Khi (z) dϕ(z) ˆ = Op (1). (3.20) i=1 The consistency of θˆn , (H4) and (3.20) in turn imply N Dn1 ≤ max i N −1 k=1 dnik 2 un 1 n n Khi (z) 2 dϕ(z) ˆ = op (1). (3.21) i=1 An argument similar to the one used in KN in the analysis of the analog of Dn2 yields (3.17) for Dn2 , thereby completing the proof of (3.13). Now we provide a sketch to derive the asymptotic variance of √ ˆ n(θn − θ0 ). Proceeding as in KN and KS, θˆn is the root of the score equation ˙ M n (θ) = −2 1 n n Khi (z)(Yi − Hθ (Zi )) i=1 1 n n ˙ Khi (z)Hθ (Zi )) dϕ(z) ˆ = 0. (3.22) i=1 Arguing as in Lemma 4.2 of KN pertaining to gn1 , the above equation becomes ˙ M n (θ) = −2 1 n n Khi (z)(Yi − Hθ (Zi )) i=1 63 1 n n Khi (z)H˙ θ (Zi )) dϕ(z) ˆ = 0. 
i=1 (3.23) Define 1 µ˙ n (z, θ) = n 1 Un1 (z) = n Sn = n Khi (z)H˙ θ (Zi ), i=1 n Khi (z)ξi , i=1 1 Un2 (z) = n Un1 (z)µ˙ h (z, θ0 )dϕ(z), 1 Vn (z, θ) = n µ˙ h (z, θ) = E[Kh (z − Z)H˙ θ (Z)], ξi := Yi − Hθ0 (Zi ), n Khi (z)(Hθ0 (Zi ) − Hθ0 (Zi )), i=1 Tn = Un2 (z)µ˙ h (z, θ0 )dϕ(z), n Khi (z)(Hθ (Zi ) − Hθ0 (Zi )), Σ0 = H˙ θ0 (x)H˙ θT (x)dG(x). 0 i=1 Then the equation (3.23) is equivalent to [Un1 (z) − Un2 (z)]µ˙ n (z, θˆn )dϕ(z) ˆ = Vn (z, θˆn )µ˙ n (z, θˆn )dϕ(z). ˆ (3.24) A major difference between the proofs in KN, KS and here is the presence of the additional term Un2 (z)µ˙ n (z, θˆn )dϕ(z) ˆ in (3.24) due to the estimation of Hθ0 (z) by Hθ0 (z). A slight modification of the arguments in the proofs of Lemmas 4.1–4.3 of KN yield √ n Un1 (z)µ˙ n (z, θˆn )dϕ(z) ˆ = √ n √ nSn + op (1), Un2 (z)µ˙ n (z, θˆn )dϕ(z) ˆ = √ √ nSn →d Nq (0, Σ1 ), nTn + op (1). It thus remains to investigate the asymptotic property of Tn . For that purpose, define φT (Zi , ηk ) := (x) := Khi (z)[mθ0 (Zi + ηk ) − Hθ0 (Zi )]µ˙ h (z, θ0 )dϕ(z), [mθ0 (z + x) − Hθ0 (z)]µ˙ h (z, θ0 )fZ (z)dϕ(z). 64 1 ≤ i ≤ n, 1 ≤ k ≤ N, Then n √ 1 nTn = √ nN N φT (Zi , ηk ). i=1 k=1 The statistic Tn is a two sample U statistic with kernel function φT . We shall be using Theorem B.1 in Sepanski and Lee (1995) to derive asymptotic distribution of Tn and some other statistics. For the sake of completeness we include statement of this theorem as Lemma 3.6.1 in the last Section 3.6. In order to apply Lemma 3.6.1 to Tn , we need to identify the limits of projections of φT , i.e., limn→∞ E(φT |Z1 ) and limn→∞ E(φT |η1 ) as well as their corresponding variances. Algebra shows that E(φT |Z1 ) ≡ 0, E(φT |η1 ) →p (η1 ), Var( (η1 )) = Σ2 , where Σ2 is as in (3.11). Applying Lemma 3.6.1 in Sepanski and Lee (1995) yields that √ nTn →d Nq (0, Σ2 /λ), where λ > 0 is as in assumption (W1). Note that the asymptotic property of Tn is dominated by the behavior of E(φT |η1 ), the projection of φT on the validation sample space and Sn is constructed only based on the primary sample (Yi , Zi ). Hence Sn and Tn are asymptotically independent. Therefore the left hand side of (3.24) is asymptotically normally distributed with convergence rate √ n and variance-covariance matrix Σ1 + (Σ2 /λ). Now we will show that the right hand side of (3.24) equals Ωn (θˆn − θ0 ), where Ωn = Σ0 + op (1). 65 (3.25) Let en := un / un , Vn := 1 nN n N i=1 k=1 d ˆ Khi (z) nik µ˙ n (z, θˆn )dϕ(z), un Ln := ˆ µ˙ n (z, θˆn )µ˙ Tn (z, θ0 )dϕ(z). Then the right hand side of (3.24) can be rewritten as Vn (z, θˆn )µ˙ n (z, θˆn )dϕ(z) ˆ = [Vn eTn + Ln ]un . Argue as in KN to show that (H4) implies Vn = op (1) and Ln = Σ0 + op (1). Moreover, en being a unit vector, we obtain Vn eTn = op (1). This completes the sketch of the proof of (3.25), thereby that of Theorem 3.3.2. Remark 3.3.2. Connection between θˆn and θ˜n in linear regression. Here we shall investigate a relation between the estimators θˆn and θ˜n in the linear model. Assume µ(x) = mθ (x) = θT x, x ∈ C ⊂ Rp , for some θ ∈ Θ ⊂ Rp . (3.26) Then Hθ (z) = θT z and a closed form of θˆn can be derived by taking derivative of Mn (θ) and solving the equation ∂ Mn (θ)/∂θ = 0, i.e., Bn θˆn = An , where An = Bn = 1 n 1 n n Khi (z)Yi i=1 n 1 n n Khi (z)(Zi + η¯) dϕ(z), ˆ i=1 Khi (z)(Zi + η¯) i=1 66 1 n n Khi (z)(Zi + η¯)T dϕ(z), ˆ i=1 N k=1 ηk . with η¯ = N −1 Similarly, Bn θ˜n = An , where 1 n An = 1 n Bn = n 1 n Khi (z)Yi i=1 n 1 n Khi (z)Zi i=1 n Khi (z)Zi dϕ(z), ˆ i=1 n ˆ Khi (z)ZiT dϕ(z). 
i=1 Roughly speaking, because η¯ →p 0, An − An = op (1), Bn − Bn = op (1) and hence θˆn − θ˜n →p 0. Furthermore, under some specific conditions, both θˆn and θ˜n can achieve the same asymptotic efficiency. We present two such assumptions here. (A6) Eη 2 < ∞. τ1 (z) := E |ε| Z = z is a.e. (G) continuous. (A7) νG := zdG(z) = 0, zz T dG(z) is positive definite. Proposition 3.3.1. Suppose (3.1) and (3.26) hold with θ = θ0 . In addition suppose (A1), (F1), (K), (W1), (A6) and (A7) hold, then √ ˆ n(θn − θ˜n ) →p 0. Proof. For the transparency of the exposition, we give details for the case p = 1 only. Then Bn = n−1 2 n ˆ i=1 Khi (z)Zi dϕ(z). Bn = κG + op (1), where κG = By (3.5), (3.7), (3.8) and direct calculations, z 2 dG(z). By (A7), κG > 0. Then θ˜n = Bn−1 An is well defined for all sufficiently large n and the consistency of θ˜n yields that An = Op (1). We shall shortly show that (a) √ (b) Bn = Bn + op (n−1/2 ). n(An − An ) = op (1), Then for all sufficiently large n, θˆn = Bn−1 An and √ √ n(θˆn − θ˜n ) = n(An Bn − An Bn ) Bn Bn 67 √ = n[An Bn − An (Bn + op (n−1/2 ))] Bn (Bn + op (n−1/2 )) (3.27) √ = n(An − An )Bn − op (An ) = op (1). κ2G + op (1) To prove (3.27)(a), rewrite √ n(An − An ) = By (A6) and CLT, √ √ 1 n n¯ η n 1 n Khi (z)Yi i=1 n Khi (z) dϕ(z) ˆ := √ n¯ η An . i=1 n¯ η = Op (1). It thus suffices to show that An = op (1). Let A∗n denote the An with ϕˆ replaced by ϕ. Then the facts (3.7), E(|Y | Z = z) ≤ |θ0 z|+τ1 (z), assumption (A6) and rigorous calculation yield that 1 n2 |An − A∗n | = op n |Khi (z)Khj (z)Yi |dϕ(z) = op (1). i,j=1 Now we rewrite 1 A∗n = 2 n n i=1 2 (z)Y dϕ(z) + 1 Khi i n2 n n Khi (z)Khj (z)Yi dϕ(z) := An1 + An2 . i=1 j=i=1 Calculation of moments shows that EAn1 = O((nh)−1 ), EAn2 = θ0 νG + o(1), Var(An1 ) = O(n−3 h−2 ) and Var(An2 ) = O(n−1 ). Hence A∗n = θ0 νG + op (1), and (A7) implies (3.27)(a). Now we prove (3.27)(b). Let Bn := 1 n n 1 n Khi (z)Zi i=1 n Khi (z) dϕ(z). ˆ i=1 Then, by (3.8), Bn − Bn = 2¯ η Bn + η¯2 1 n n 2 Khi (z) dϕ(z) ˆ = 2¯ η Bn + Op (n−1 ). i=1 68 √ Argue as in the analysis of An to obtain that Bn = νG + op (1). This fact and n¯ η = Op (1) √ √ η )νG +Op (n−1/2 ), which together with (A7) imply (3.27)(b). imply that n(Bn −Bn ) = 2( n¯ This also completes the proof of the lemma. 3.4 Testing In this section we establish the asymptotic behavior of the proposed tests associated with Mn (θˆn ) under the null and certain fixed alternative hypotheses. Let ξi = Yi − Hθ0 (Zi ), 1 Cn = 2 n 1 Cn = 2 n n ξˆi = Yi − Hθˆ (Zi ), n 2 (z)ξ 2 dϕ(z), Khi i Γn = i=1 n 2 (z)ξˆ2 dϕ(z), Khi i ˆ Γn = i=1 2hp n2 2hp n2 Khi (z)Khj (z)ξi ξj dϕ(z) 2 , i=j ˆ Khi (z)Khj (z)ξˆi ξˆj dϕ(z) 2 . i=j Because ξ = Y − Hθ0 (Z) = ε + mθ0 (X) − Hθ0 (Z) and because Z, η and ε are mutually independent, E(ξ 2 |Z = z) = σε2 + τ 2 (z), where τ 2 is as in (A2). Since C is compact, and by (A2), τ 2 is continuous, we obtain E ξ 2 |Z = z dG(z) < ∞. The following theorem gives the main result of this section. Theorem 3.4.1. Suppose (A1), (A2), (A4), (A5), (F1)–(F2), (K), (H1)–(H6), (W1) and −1/2 (W2) hold. Then, under H0 , nhp/2 Γn Mn (θˆn ) − Cn →d N1 (0, 1). −1/2 Consequently, the null hypothesis is rejected by the test if Tn := nhp/2 Γn |Mn (θˆn ) − Cn | > zα/2 with the asymptotic size α > 0, where zα is the upper 100αth percentile of the standard normal distribution. The theorem shows that the ratio parameter N/n does not play a role in the limiting 69 null distribution. 
This finding is also reflected in the finite sample simulation study through the empirical level and power with different choices of N/n. Here we provide a sketch of the proof of the above theorem. Rewrite Mn (θˆn ) = 1 n n i=1 Khi (z) Yi − Hθ0 (Zi ) + Hθ0 (Zi ) − Hθ0 (Zi ) + Hθ0 (Zi ) − Hθˆ (Zi ) n = [Un1 (z) − Un2 (z) − Vn (z, θˆn )]2 dϕ(z) ˆ = [Un1 (z) − Un2 (z)]2 dϕ(z) ˆ + dϕ(z) ˆ [Vn (z, θˆn )]2 dϕ(z) ˆ [Un1 (z) − Un2 (z)]Vn (z, θˆn )dϕ(z) ˆ −2 =: Jn + Dn (θˆn ) − 2Kn (θˆn ), 2 say. The following three lemmas are needed for the proof of Theorem 3.4.1. Lemma 3.4.1. Suppose assumptions (A1), (A2), (A4), (A5), (F1)–(F2), (K), (H1)–(H6), (W1), (W2) and H0 hold. Then −1/2 nhp/2 Γn (Jn − Cn ) →d N1 (0, 1). (3.28) Lemma 3.4.2. Under the assumptions of Lemma 3.4.1, the following holds. (a) nhp/2 Dn (θˆn ) = op (1), (b) nhp/2 Kn (θˆn ) = op (1). (3.29) Lemma 3.4.3. Suppose assumptions (A1), (A2), (F1), (K), (H1)–(H6), (W1) with λ < ∞, 70 (W2) and H0 hold. Then (a) nhp/2 (Cn − Cn ) = op (1). (b) Γn − Γn = op (1). (3.30) The above three lemmas yield the asymptotic normality of Mn (θˆn ) in Theorem 3.4.1 in a routine fashion. Here we provide the proofs of these lemmas. Proof of Lemma 3.4.1. Let Jn∗ denote the Jn with ϕˆ replaced by ϕ. Algebra shows that EJn∗ = E 2 (z)dϕ(z) + E Un1 2 (z)dϕ(z) = O((nhp )−1 ) + O(N −1 ) = O (nhp )−1 . Un2 Then, by (3.6) and (W2), nhp/2 Jn − Jn∗ f 2 (z) −1 2 (z) z∈C fˆw ≤ nhp/2 Jn∗ sup = nhp/2 Op ((nhp )−1 )Op logk (n) log(n) p/(p+4) = op (1), n Therefore, Jn = Jn∗ + op ((nhp/2 )−1 ) = Op ((nhp )−1 ). (3.31) It thus suffices to prove (3.28) with Jn replaced by Jn∗ . To proceed further, define for 1 ≤ i, j ≤ n, 1 ≤ k, l ≤ N , i = j, k = l, ∆ik = mθ0 (Zi + ηk ) − Hθ0 (Zi ), Di = (Zi , ξi ), (3.32) 1 ψ1 (Di , Dj , ηk , ηl ) = Khi (z)Khj (z)[(ξi − ∆ik )(ξj − ∆jl ) + (ξj − ∆jk )(ξi − ∆il )]dϕ(z), 2 71 ψ2 (Di , ηk , ηl ) = ψ3 (Di , Dj , ηk ) = ψ4 (Di , ηk ) = 2 (z)(ξ − ∆ )(ξ − ∆ )dϕ(z), Khi i i ik il Khi (z)Khj (z)(ξi − ∆ik )(ξj − ∆jk )dϕ(z), 2 (z)(ξ − ∆ )2 dϕ(z). Khi i ik Rewrite Jn∗ 1 nN = = = 1 n2 N 2 n N Khi (z)(ξi − ∆ik ) dϕ(z) i=1 k=1 n N Khi (z)Khj (z)(ξi − ∆ik )(ξj − ∆jl )dϕ(z) i,j=1 k,l=1 1 n2 N 2 2 i=j,k=l i=j,k=l Khi (z)Khj (z)(ξi − ∆ik )(ξj − ∆jl )dϕ(z) + + + i=j,k=l i=j,k=l =: Jn1 + Jn2 + Jn3 + Jn4 . All these four quantities are similar to the two sample U statistics. We will show that only Jn2 contributes to the asymptotic expectation and only Jn1 contributes to the asymptotic variance in the limiting distribution. Note that E(∆ik |Zi ) ≡ 0 and E(ξi |Zi ) ≡ 0, a.s. Hence EJn1 = 0, E(Jn2 − Cn ) = E (3.33) 1 n2 N 2 n E(ψ2 (Di , ηk , ηl )|Di ) − Cn i=1 k=l 1 2 (z)ξ 2 dϕ(z) = O((N nhp )−1 ), E Kh1 1 Nn n−1 EJn3 = E Kh1 (z)Kh2 (z)∆11 ∆21 dϕ(z) = O(N −1 ), nN = EJn4 = O((N nhp )−1 ). 72 Now we investigate the variances of Jnj , j = 1, 2, 3, 4, using Lemmas 2.6.4 and 2.6.5. We verify that Jn1 is the only leading term. Note that 1 Jn1 = 2 2 n N ψ1 (Di , Dj , ηk , ηl ). i=j,k=l In order to apply Lemma 2.6.5, we first calculate the projections of ψ1 : E(ψ1 |Di , Dj ) = Khi (z)Khj (z)ξi ξj dϕ(z), (mθ0 (z + ηk ) − Hθ0 (z))(mθ0 (z + ηl ) − Hθ0 (z))f 2 (z)dϕ(z) + op (1), E(ψ1 |ηk , ηl ) = 1 2 1 E(ψ|Di , ηk , ηj ) = 2 E(ψ1 |Di , Dj , ηk ) = Khi (z)Khj (z)[(ξi − ∆ik )ξj + (ξj − ∆jk )ξi ]dϕ(z), Khi (z)[(ξi − ∆ik )(mθ0 (z + ηk ) − Hθ0 (z)) +(ξi − ∆il )(mθ0 (z + ηl ) − Hθ0 (z))]f (z)dϕ(z) + op (1). All other projections vanish. 
We also verify the variances of the above projections Var(ψ1 ) = O(h−p ), Var E(ψ1 |Di , Dj ) = O(h−p ), Var E(ψ1 |Di , Dj , ηk ) = O(h−p ), Var E(ψ1 |ηk , ηl ) = O(1), Var E(ψ1 |Di , ηk , ηl ) = O(1). Therefore, Lemma 2.6.5 implies that Var(Jn1 ) = O 1 n2 N 2 hp 1 1 1 1 + 2 p+ 2+ 2 p+ n h N n Nh nN 2 =O 1 n2 hp . Furthermore, it is seen that only the variance term associated with E(ψ1 |Di , Dj ) dominates 73 the variance of Jn1 and all other projection variances are o(1/(n2 hp )). Thus, if we let 1 Jn1 = n(n − 1) n n E(ψ1 |Di , Dj ), i=1 j=i=1 then nhp/2 (Jn1 − Jn1 ) = op (1). −1/2 From Lemma 5.1 in KN, we obtain nhp/2 Γn −1/2 nhp/2 Γn Jn1 →d N1 (0, 1). Hence Jn1 →d N1 (0, 1). (3.34) By using arguments similar to those used in the proof of Lemma 3.3.1, one can verify Var(nhp/2 Jn2 ) = o(1), Var(nhp/2 Jn3 ) = o(1), Var(nhp/2 Jn4 ) = o(1). Combining these facts with the expectation results in (3.33), we have Jn2 = Cn + op (1/(nhp/2 )), Jn3 = op (1/(nhp/2 )), Jn4 = op (1/(nhp/2 )). Therefore, (3.31) and these facts above imply −1/2 nhp/2 Γn −1/2 (Jn − Cn ) = nhp/2 Γn This fact together with (3.34) yield the conclusion (3.28). 74 Jn1 + op (1). Proof of Lemma 3.4.2. Recall the notation from (3.19). We have 1 nN Dn (θˆn ) = ≤ 2 = 2 un n ˙ θ0 (Zi + ηk ) Khi (z) dnik + uTn m i=1 k=1 n N 1 nN 2 N 2 2 dϕˆ (3.35) 2 uTn µ ˆ˙ n (z, θ0 ) dϕ(z) ˆ Khi (z)dnik dϕ(z) ˆ +2 i=1 k=1 Dn1 + Dn2 . The Cauchy-Schwarz inequality and (3.5) imply that Dn2 ≤ µ ˆ˙ n (z, θ0 ) 2 dϕ(z) ˆ = [µ ˆ˙ n (z, θ0 )]T µ ˆ˙ n (z, θ0 )dϕ(z) + op (1). Calculation shows that E [µ ˆ˙ n (z, θ0 )]T µ ˆ˙ n (z, θ0 )dϕ(z) = O(1) under (H6). Hence Dn2 = Op (1). This fact, (3.21) and the fact n un 2 = Op (1), implied by Theorem 3.3.2, together with (3.35) imply Dn (θˆn ) = op (nhp/2 )−1 , thereby proving (3.29)(a). In order to prove (3.29)(b), let Un := Un1 − Un2 and rewrite Kn (θˆn ) = = = 1 Un (z) nN Un (z) Un (z) 1 nN 1 nN n N Khi (z)[mθˆ (Zi + ηk ) − mθ0 (Zi + ηk )] dϕ(z) ˆ n i=1 k=1 n N Khi (z)[dnik + uTn m ˙ θ0 (Zi + ηk )] dϕ(z) ˆ i=1 k=1 n N Khi (z)dnik dϕ(z) ˆ + i=1 k=1 =: R1 + R2 . 75 Un (z)uTn µ ˆ˙ n (z, θ0 )dϕ(z) ˆ The facts (3.4), (3.31), and the Cauchy-Schwarz inequality imply that nhp/2 R1 ≤ 1/2 Jn = nhp/2 o 1 nN × p( un )Op n N Khi (z)dnik i=1 k=1 ((nhp )−1/2 ) 2 dϕ(z) ˆ 1/2 = op (1). Next, rewrite n R2 = uTn Un (z) n−1 i=1 n = uTn Un (z) n−1 ˙ Khi (z)H θ0 (Zi ) dϕ(z) ˆ ˙ Khi (z)H θˆ (Zi ) dϕ(z) ˆ n i=1 n −uTn Un (z) ˙ ˙ Khi (z)(H θˆ (Zi ) − H θ0 (Zi )) dϕ(z) ˆ n−1 n i=1 =: R21 − R22 . The score equation (3.22) implies that n R21 = uTn Vn (z, θˆn ) n−1 i=1 n = uTn Vn (z, θˆn ) n−1 ˙ Khi (z)H θˆ (Zi ) dϕ(z) ˆ n ˙ Khi (z)H θ0 (Zi ) dϕ(z) ˆ i=1 n +uTn Vn (z, θˆn ) n−1 ˙ ˙ ˆ Khi (z)(H θˆ (Zi ) − H θ0 (Zi )) dϕ(z) n i=1 =: R211 + R212 . Direct calculations together with (3.29)(a) and (3.8) yield n nhp/2 R211 ≤ nhp/2 n−1 un [Dn (θˆn )]1/2 i=1 76 −1/2 ˙ Khi (z)H θ0 (Zi ) 2 dϕ(z) ˆ = nhp/2 Op (n−1/2 )op ((nhp/2 )−1/2 )Op (1) = op (1). Similarly, assumption (H5), n1/2 un = Op (1) and (3.29)(a) imply that nhp/2 R212 = op (1) thereby nhp/2 R21 = op (1). Regarding R22 , the Cauchy-Schwarz inequality implies that nhp/2 R22 ≤ 1/2 un Jn × 1/2 1 Khi (z)(m ˙ θˆ (Zi + ηk ) − m ˙ θ0 (Zi + ηk )) 2 dϕ(z) ˆ n nN = nhp/2 Op (n−1/2 )Op ((nhp )−1/2 )op (hp/2 ) = op (1). The last equality holds because of assumption (H5) and (3.8). This completes the proof of the lemma. Proof of Lemma 3.4.3. Recall that ξˆi = Yi − Hθˆ(Zi ) = [Yi − Hθ0 (Zi )] + [Hθ0 (Zi ) − Hθˆ (Zi )] = ξi + δ˜i . 
n Note that δ˜i are not independent due to the common use of validation sample, we further decompose the residual as δ˜i = Hθ0 (Zi ) − Hθ0 (Zi ) + Hθ0 (Zi ) − Hθˆ (Zi ) = si + ti , n say. Proof of (3.30)(a). Let 1 C¯n = 2 n n 1 Bn = 2 n 2 (z)δ˜2 dϕ(z), Khi i i=1 1 φ5 (Zi , ηk ) = nN n 2 (z)ξ δ˜ dϕ(z), Khi i i i=1 2 (z)[m (Z + η ) − H (Z )]2 dϕ(z), Khi θ0 i k θ0 i 77 φ6 (Zi , ηk , ηl ) = 1 n 2 (z)[m (Z + η ) − H (Z )][m (Z + η ) − H (Z )]dϕ(z). Khi θ0 i k θ0 i θ0 i l θ0 i Let Cn∗ denote the Cn with dϕˆ replaced by dϕ. Arguing as for (3.31), it suffices to show that (3.30)(a) holds with Cn replaced by Cn∗ . Decompose Cn∗ 1 = 2 n n 2 (z)(ξ + δ˜ )2 dϕ(z) = C + C ¯n + 2Bn . Khi n i i i=1 We claim (a) nhp/2 C¯n = op (1), (b) nhp/2 Bn = op (1). (3.36) To prove (3.36)(a), by the triangular inequality, we obtain C¯n 1 n2 = n 2 (z)[H (Z ) − H (Z ) + H (Z ) − H (Z )]2 dϕ(z) Khi i θ0 i θ0 i θ0 i θˆ n i=1 1 ≤ 2 2 n n 2 (z)[H (Z ) − H (Z )]2 dϕ(z) Khi θ0 i θ0 i i=1 1 + 2 n =: 2 C¯n1 + 2 C¯n2 , n 2 (z)[H (Z ) − H (Z )]2 dϕ(z) Khi θ0 i θˆn i i=1 say. First, consider C¯n1 = 1 2 n N2 1 = nN n N 2 (z)[m (Z + η ) − H (Z )][m (Z + η ) − H (Z )]dϕ(z) Khi θ0 i k θ0 i θ0 i l θ0 i i=1 k,l=1 n N i=1 k=1 1 φ5 (Zi , ηk ) + nN 2 n N φ6 (Zi , ηk , ηl ) =: C¯n11 + C¯n12 . i=1 k=l=1 78 The summand C¯n11 is a two sample U statistic with the kernel function φ5 . Direct calculations yield the following facts. Eφ5 = O(1/(nN hp )), E(φ5 |ηk ) = K2 nN hp Var(E(φ5 |Zi )) = O E(φ5 |Zi ) = 1 nN 2 (z)σ 2 (Z )dϕ(z), Khi i [mθ0 (z + ηk ) − Hθ0 (z)]2 f (z)dϕ(z) + op (1/(nN hp )), 1 n2 N 2 h3p Var(E(φ5 |ηk )) = O , 1 n2 N 2 h2p . Because λ = lim N/n < ∞, by Lemma 3.6.1, we obtain √ N C¯n11 = Op Var(E(φ5 |Zi )) + λVar(E(φ5 |ηk )) = Op 1/(nN h3p/2 ) . Therefore, (W1) implies nhp/2 C¯n11 = Op 1 √ N hp N = op (1). Next, consider C¯n12 . It is a two sample degenerated U statistic with the kernel function φ6 . Similar to the analysis of Q3 in Lemma 3.3.1, we have Var(Cn12 ) = O N −2 (nhp )−2 . Hence under (W1), nhp/2 Cn12 = Op nhp/2 N nhp 1 = Op √ √ N N hp = op (1). Therefore nhp/2 C¯n1 = op (1). Next, consider C¯n2 = 1 n2 n 2 (z) Khi i=1 1 N N [mθˆ (Zi + ηk ) − mθ0 (Zi + ηk )] n k=1 79 2 dϕ(z) = ≤ 1 n2 2 n2 n 2 (z) Khi i=1 n 2 (z) Khi i=1 1 N 1 N 2 + 2 n := C¯n21 + C¯n22 , N dnik + uTn m ˙ θ0 (Zi + ηk ) k=1 N 2 dnik 2 dϕ(z) dϕ(z) k=1 n 2 (z) Khi i=1 1 N N uTn m ˙ θ0 (Zi + ηk ) 2 dϕ(z) k=1 say. By the facts (3.4) and n un 2 = Op (1), we obtain C¯n21 = op ( un 2 )Op The facts N −1 N ˙ θ0 (z k=1 m C¯n22 = Op 2 n2 2 n2 n 2 (z)dϕ(z) = o (n−2 h−p ). Khi p i=1 + ηk ) = H˙ θ0 (z) + op (1), n un 2 = Op (1) and (H6) yield n 2 −2 −p 2 (z) uT H ˙ Khi n θ0 (Zi ) dϕ(z) = Op (n h ). i=1 Hence, by assumption (W1), we obtain nhp/2 C¯n2 = nhp/2 Op (n2 hp )−1 = op (1), thereby completing the proof of (3.36)(a). Next, consider 1 Bn = 2 n n 2 (z)ξ s dϕ(z) + Khi i i i=1 1 n2 n 2 (z)ξ t dϕ(z) =: B + B . Khi n1 n2 i i i=1 Recall the notation in (3.32). Rewrite 1 Bn1 = 2 n N n N 2 (z)ξ ∆ dϕ(z). Khi i ik i=1 k=1 80 Algebra shows that E(Bn1 ) = 0 and 1 Var(Bn1 ) = 4 n N2 = O n N E 2 (y)K 2 (z)ξ ξ ∆ ∆ dϕ(y)dϕ(z) Khi i j ik jl hj i,j=1 k,l=1 1 n3 N h2p . Therefore, nhp/2 Bn1 = op (1). An argument similar to the one used in the analysis of C¯n2 yields that nhp/2 Bn2 = op (1), thereby completing the proof of (3.36)(b), and also of (3.30)(a). Proof of (3.30)(b). 
Rewrite Γn = 2hp n2 Khi (z)Khj (z)(ξi + δ˜i )(ξj + δ˜j )dϕ(z) ˆ 2 i=j = Γn + 2hp n2 ˆ Khi (z)Khj (z)(ξi δ˜j + ξj δ˜i + δ˜i δ˜j )dϕ(z) i=j 4hp + 2 n 2 ˆ Khi (z)Khj (z)(ξi δ˜j + ξj δ˜i + δ˜i δ˜j )dϕ(z) Khi (z)Khj (z)ξi ξj dϕ(z) ˆ i=j =: Γn + Γn1 + Γn2 , say. It suffices to show that Γn1 = op (1) and Γn2 = op (1). The triangular inequality implies that Γn1 ≤ 6hp n2 Khi (z)Khj (z)ξi δ˜j dϕ(z) ˆ i=j p 6h + 2 n 2 Khi (z)Khj (z)δ˜i δ˜j dϕ(z) ˆ i=j =: I1 + I2 + I3 . 81 6hp + 2 n 2 Khi (z)Khj (z)δ˜i ξj dϕ(z) ˆ i=j 2 Substituting sj + tj for δ˜j in I1 , it can be seen that 12hp n2 ≤ I1 Khi (z)Khj (z)ξi ti dϕ(z) ˆ 2 + i=j 12hp n2 Khi (z)Khj (z)ξi si dϕ(z) ˆ 2 i=j =: I11 + I12 . Rewrite 12hp I11 = 2 n n i=j=1 1 N N Khi (z)Khj (z)ξi [mθˆ (Zj + ηk ) − mθ0 (Zj + ηk )]dϕ(z) ˆ 2 n k=1 . Analogous to the analysis of C¯n2 , by (3.7) and the Cauchy-Schwarz inequality, for 1 ≤ i = j ≤ n, we obtain 1 Khi (z)Khj (z)ξi N = Op ≤ Op = N k=1 dϕ(z) ˆ N 1 Khi (z)Khj (z)ξi N k=1 2 (z)ξ 2 dϕ(z) × Khi i Op (h−p/2 )Op ((nhp )−1/2 ) mθˆ (Zj + ηk ) − mθ0 (Zj + ηk ) n mθˆ (Zj + ηk ) − mθ0 (Zj + ηk ) dϕ(z) n 2 (z) Khj = Op 1 N N dnik + uTn m ˙ θ0 (Zj + ηk ) 2 dϕ(z) 1/2 k=1 −1/2 −p (n h ). Hence I11 = hp Op (n−1/2 h−p ) = op (1). ∗ denote the I with ϕ Regarding I12 , let I12 ˆ replaced by ϕ, then it suffices to prove that 12 ∗ = o (1) by (3.5). Rewrite I12 p ∗ I12 hp = 2 2 n N n N Khi (y)Khj (y)Khi (z)Khj (z)ξi2 ∆jk ∆jl dϕ(y)dϕ(z). i=j=1 k,l=1 82 Define 1 2N 1 φ8 (Di , Dj , ηk , ηl ) = 2 Khi (y)Khj (y)Khi (z)Khj (z)[ξi2 ∆2jk + ξj2 ∆2ik ]dϕ(y)dϕ(z), φ7 (Di , Dj , ηk ) = Khi (y)Khj (y)Khi (z)Khj (z)[ξi2 ∆jk ∆jl + ξj2 ∆ik ∆il ]dϕ(y)dϕ(z). ∗ can be rewritten as Then I12 ∗ I12 = hp n2 N n N i=j=1 k=1 =: L1 + L2 , hp φ7 (Di , Dj , ηk ) + 2 2 n N n N φ8 (Di , Dj , ηk , ηl ) i=j=1 k=l=1 say. Both L1 and L2 are two sample U statistics. Verify that by (A2), (A5) and (W1), ∗ ) = E(L ) = E(I12 1 (n − 1)hp Eφ6 = O((N hp )−1 ) = o(1). nN Furthermore, by calculating the second moments of the conditional expectations in Lemma 2.6.4, it can be shown that, under (A2), (A4), (A5) and (W1), Eφ27 = O N −2 h−3p , Var(E(φ7 |Di )) = O (N hp )−2 , Var(E(φ7 |Di , Dj )) = O N −2 h−3p , Var(E(φ7 |ηk )) = O (N hp )−1 . Lemma 2.6.4 implies that Var(L1 ) = o(1). Thereby L1 = op (1). Similarly, Lemma 2.6.5 yields that L2 = op (1). The results I2 = op (1) and I3 = op (1) are obtained in a similar manner. Details are skipped for the sake of brevity of the chapter. The fact Γn2 = op (1) is 83 derived by using the fact that hp n−2 Khi (z)Khj (z)|ξi ξj |dϕ(z) ˆ 2 = Op (1) i=j proved in KN, the application of Cauchy-Scharwz inequality and the fact that Γn1 = op (1). This completes the proof of (3.30)(b) and also of Lemma 3.4.3. We further briefly discuss the consistency of these tests. We establish that under some regularity conditions, |Tn | →p ∞, under certain fixed alternatives, which implies the consistency of the sequences of tests based on Tn . Recall the definitions of H(z) and T (H) in the beginning of Section 3.3.1. Let θn be an consistent estimator of T (H) and define ξi = Yi − H(Zi ), 1 Cn = 2 n ξni = Yi − Hθn (Zi ), n 2 (z)ξ 2 dϕ(z), Khi ni ˆ i=1 −1/2 Let Tn := nhp/2 Γn 2hp Γn = 2 n n Khi (z)Khj (z)ξni ξnj dϕ(z) ˆ 2 . i=j=1 (Mn (θn ) − Cn ). Then the theorem below presents the asymptotic behavior of the proposed test under certain alternative hypotheses. Theorem 3.4.2. Suppose (A1), (A2), (A4), (A5),(F1), (F2), (H3), (K), (W1) and (W2) hold and the alternative hypothesis H1 : µ(x) = m(x), x ∈ C satisfies that inf θ ρ(H, Hθ ) > 0 and T (H) is unique. 
Then |Tn| →p ∞ for any consistent estimator θn of T(H).

By Lemma 3.3.2, θ̂n is consistent for T(H); therefore the above theorem implies that |Tn| → ∞ in probability under the same regularity conditions, and the test based on Tn is consistent against any alternative m for which inf_θ ρ(H, Hθ) > 0. The proof of Theorem 3.4.2 is similar to that of Theorem 5.1 in KS with slight modifications. The techniques used for analyzing Wn(θ) in Lemma 3.3.1 and Dn(θ) in the proof of Theorem 3.3.2 are enough to produce the conclusions. Details are skipped for the sake of brevity.

3.5 Simulation

In this section, we present the results of a Monte Carlo study of the proposed estimation and testing procedures for p = 1, 2. For p = 1, both linear and nonlinear functions are chosen as the underlying true regression to generate the primary and validation data. For p = 2, a linear regression is assumed. Various values of the ratio N/n are selected to demonstrate its role in the performance of these inference procedures. Throughout the simulation, the kernel function K is chosen as K(u) = 0.75(1 − u²)I(|u| ≤ 1) for p = 1, and as K(u) = 0.75²(1 − u1²)(1 − u2²)I(|u1| ≤ 1, |u2| ≤ 1) for p = 2. All of the results obtained are based on 1000 replications.

We need to determine the two bandwidths for the implementation of the above estimation and testing procedures. As mentioned in the beginning of Section 3.4, the bandwidth used for estimating fZ is w = c(log n/n)^{1/(p+4)}, c > 0. We propose to obtain c by minimizing, w.r.t. c, the unbiased cross-validation criterion UCV(w) developed in Härdle, Marron and Wand (1990), where
\[
UCV(w) = \frac{(R(K))^p}{n w^p} + \frac{1}{n(n-1) w^p} \sum_{i \ne j = 1}^{n} (K*K - 2K)\Big(\frac{Z_i - Z_j}{w}\Big),
\]
with R(K) = ∫K²(x)dx and K*K(x) = ∫K(y)K(x − y)dy. We apply a grid search to choose the optimal coefficient c, starting from 0.1 with step 0.02, i.e.,
\[
c_n^* := \mathop{\mathrm{argmin}}_{0.1 \le c \le 10}\, UCV\big(c(\log n/n)^{1/(p+4)}\big), \qquad w_{opt} = c_n^*\,(\log n/n)^{1/(p+4)}.
\]
For the bandwidth h, in order to satisfy (W2), we use h = σ̂_Z n^{−1/3} for p = 1, as recommended by Sepanski and Carroll (1993), and h = n^{−1/4.5} for p = 2, as used in KS.

In order to interpret the performance of the proposed estimator θ̂n, we also present the performance of the KS estimator θ̃_KS. Recall that in KS the measurement error density fη is assumed to be known. Both the means and the root mean squared errors (RMSE) of the two estimators are reported. For the proposed estimator θ̂n, N/n is chosen as 1, 1/2 and 1/10 to illustrate how N/n affects the estimator's performance. In both the linear and nonlinear cases, the bias and RMSE decrease as the sample sizes increase. In the linear case, as shown in Example 3.3.1, the asymptotic variance of θ̂n is the same as that of θ̃_KS. This is also reflected in this finite sample study: the RMSEs of θ̂n and θ̃_KS in Table 3.1 are very similar for all three choices of N/n. In the nonlinear case, Table 3.2 shows that the RMSE of θ̂n is larger than that of θ̃_KS and decreases as N/n increases from 1/10 to 1.

In the testing procedure, with nominal level 0.05, the empirical level and power are obtained by computing #{|Tn| ≥ 1.96}/1000. The sample size ratios are chosen as N/n = 4, 1 and 1/4. Both the linear and the nonlinear regressions are used as the null for p = 1, while the linear regression is chosen as the null for p = 2. The empirical power is obtained under various choices of alternative models.
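As a concrete illustration of the UCV bandwidth search described above, here is a minimal sketch for p = 1 with the Epanechnikov kernel; the sample, the seed, the numerical convolution grid and all sizes are illustrative assumptions of ours, not the settings used in the tables below.

```python
import numpy as np

# Sketch of the UCV grid search for w = c (log n/n)^{1/(p+4)}, p = 1.
rng = np.random.default_rng(3)
n = 100
Z = rng.uniform(-1, 1, n)

def epan(u):
    return 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

y = np.linspace(-1, 1, 201)
dy = y[1] - y[0]

def epan_conv(x):
    # (K*K)(x) = int K(u) K(x - u) du, by a Riemann sum to keep the sketch short
    return (epan(y) * epan(x[..., None] - y)).sum(axis=-1) * dy

RK = 0.6   # R(K) = int K^2(u) du = 3/5 for the Epanechnikov kernel

def ucv(w):
    D = (Z[:, None] - Z[None, :]) / w
    off = ~np.eye(n, dtype=bool)            # keep only the i != j terms
    return RK / (n * w) + ((epan_conv(D) - 2 * epan(D))[off]).sum() / (n * (n - 1) * w)

cs = np.arange(0.1, 10.0, 0.02)             # grid search over c in [0.1, 10]
ws = cs * (np.log(n) / n) ** (1 / 5)        # candidate bandwidths, p = 1
w_opt = ws[np.argmin([ucv(w) for w in ws])]
print(w_opt)
```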
3.5.1 Finite sample performance of θ̂n

In this subsection we report the findings of a finite sample performance study of the estimator θ̂n in linear and nonlinear cases.

The linear case with q = 1 = p. In this case, we generated the data from (3.1) with
\[
m_\theta(x) = \theta x, \qquad \theta_0 = 1, \tag{3.37}
\]
where ε ∼ N1(0, 0.2²), η ∼ N1(0, 0.1²), Z ∼ U[−1, 1]. Then
\[
H_\theta(z) = \theta z, \qquad \widehat H_\theta(z) = \frac{1}{N}\sum_{k=1}^{N} \theta(z + \eta_k) = \theta(z + \bar\eta).
\]
The two bandwidths are chosen as described above. Throughout the simulation, C = [−1, 1], and G is the uniform measure on [−1, 1]. Hence, as noted in Example 3.3.1, here Σ2 = 0 and the asymptotic variance of θ̂n is the same as that of θ̃_KS. This fact is also reflected in this finite sample study: the RMSE of θ̂n remains essentially the same across the different choices of N/n, as seen in Table 3.1.

N/n = 1       (n, N)        (100,100)  (200,200)  (400,400)  (600,600)
              mean(θ̂n)      1.0016     0.9994     1.0005     0.9996
              RMSE(θ̂n)      0.0389     0.0298     0.0194     0.0163
N/n = 1/2     (n, N)        (100,50)   (200,100)  (400,200)  (600,300)
              mean(θ̂n)      0.9979     0.9985     1.0005     1.0005
              RMSE(θ̂n)      0.0381     0.0295     0.0194     0.0165
N/n = 1/10    (n, N)        (100,10)   (200,20)   (400,40)   (600,60)
              mean(θ̂n)      0.9974     0.9984     1.0001     0.9999
              RMSE(θ̂n)      0.0399     0.0299     0.0195     0.0170
KS            n             100        200        400        600
              mean(θ̃_KS)    0.9999     0.9996     1.0006     0.9995
              RMSE(θ̃_KS)    0.0393     0.0298     0.0194     0.0170

Table 3.1: Performance of θ̂n, θ̃n in the linear case (3.37), p = 1.

The nonlinear case with q = 1 = p. In this case, the regression function is
\[
m_\theta(x) = e^{\theta x}, \qquad \theta_0 = -1, \tag{3.38}
\]
and all other settings are the same as in the above simulation for the linear case. Then the calibrated regression function, given z, is
\[
H_\theta(z) = e^{\theta^2 \sigma_\eta^2/2}\, e^{\theta z}, \qquad \widehat H_\theta(z) = \frac{1}{N}\sum_{k=1}^{N} e^{\theta(z + \eta_k)} = e^{\theta z}\, \frac{1}{N}\sum_{k=1}^{N} e^{\theta \eta_k}.
\]
In this case, the second term Σ2 in the asymptotic variance is calculated as
\[
\sigma_{\theta_0}(x, y) = e^{\sigma_\eta^2}\big(e^{\sigma_\eta^2} - 1\big)\, e^{\theta_0(x+y)}, \qquad
\Sigma_2 = e^{2\sigma_\eta^2}\big(e^{\sigma_\eta^2} - 1\big)\Big(\int (x + \sigma_\eta^2 \theta_0)\, e^{2\theta_0 x}\, dG(x)\Big)^2 > 0.
\]
Table 3.2 shows the consistency of θ̂n: the bias is small and the RMSE decreases as the sample sizes increase. The RMSEs of θ̂n are larger than those of θ̃_KS for all chosen values of N/n. Furthermore, the RMSE of θ̂n decreases as N/n increases.

N/n = 1       (n, N)        (100,100)  (200,200)  (400,400)  (600,600)
              mean(θ̂n)      -0.9999    -1.0004    -0.9999    -1.0002
              RMSE(θ̂n)      0.0360     0.0249     0.0172     0.0141
N/n = 1/2     (n, N)        (100,50)   (200,100)  (400,200)  (600,300)
              mean(θ̂n)      -1.0032    -1.0004    -1.0000    -1.0001
              RMSE(θ̂n)      0.0392     0.0267     0.0181     0.0149
N/n = 1/10    (n, N)        (100,10)   (200,20)   (400,40)   (600,60)
              mean(θ̂n)      -1.0023    -1.0009    -1.0004    -1.0003
              RMSE(θ̂n)      0.0498     0.0358     0.0245     0.0200
KS            n             100        200        400        600
              mean(θ̃_KS)    -1.0005    -0.9998    -0.9998    -1.0002
              RMSE(θ̃_KS)    0.0321     0.0233     0.0162     0.0132

Table 3.2: Performance of θ̂n, θ̃n in the nonlinear case (3.38), p = 1.
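As a quick numerical check on the calibration in this nonlinear case, the following sketch compares Hθ(z) = e^{θ²ση²/2} e^{θz} with its validation-sample estimate Ĥθ(z); the seed, the evaluation point and the validation sample size are illustrative assumptions.

```python
import numpy as np

# Check: for eta ~ N(0, sig_eta^2), E exp(theta * eta) = exp(theta^2 sig_eta^2 / 2),
# so H_hat_theta(z) = exp(theta z) * mean_k exp(theta eta_k) should approach H_theta(z).
rng = np.random.default_rng(2)
theta, sig_eta = -1.0, 0.1
eta = rng.normal(0, sig_eta, 10_000)     # a large validation sample of errors
z = 0.3
H_true = np.exp(theta ** 2 * sig_eta ** 2 / 2) * np.exp(theta * z)
H_hat = np.exp(theta * z) * np.mean(np.exp(theta * eta))
print(H_true, H_hat)                      # the two values should nearly agree
```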
Both the means and the RMSEs of the estimators $\hat\theta_n = (\hat\theta_{n,1}, \hat\theta_{n,2})^T$ and $\tilde\theta_{KS} = (\tilde\theta_{KS,1}, \tilde\theta_{KS,2})^T$ are presented in Table 3.3. It shows small estimation bias and a decreasing RMSE as the sample sizes increase.

  N/n = 1     (n, N)        (100,100)  (200,200)  (300,300)  (400,400)
              mean(θ̂n,1)    0.9953     1.0004     0.9999     1.0013
              RMSE(θ̂n,1)    0.0728     0.0397     0.0332     0.0275
              mean(θ̂n,2)    0.9989     1.0032     1.0013     0.9999
              RMSE(θ̂n,2)    0.0634     0.0398     0.0307     0.0271
  N/n = 1/2   (n, N)        (100,50)   (200,100)  (300,150)  (400,200)
              mean(θ̂n,1)    0.9943     1.0000     0.9998     1.0007
              RMSE(θ̂n,1)    0.0779     0.0399     0.0332     0.0269
              mean(θ̂n,2)    0.9975     1.0011     1.0009     0.9983
              RMSE(θ̂n,2)    0.0644     0.0395     0.0308     0.0261
  N/n = 1/10  (n, N)        (100,10)   (200,20)   (300,30)   (400,40)
              mean(θ̂n,1)    0.9928     0.9990     0.9990     1.0006
              RMSE(θ̂n,1)    0.0813     0.0400     0.0333     0.0275
              mean(θ̂n,2)    0.9892     0.9965     0.9980     0.9971
              RMSE(θ̂n,2)    0.0679     0.0399     0.0311     0.0273
  KS          n             100        200        300        400
              mean(θ̃KS,1)   0.9957     1.0006     0.9999     1.0013
              RMSE(θ̃KS,1)   0.0732     0.0397     0.0334     0.0275
              mean(θ̃KS,2)   0.9999     1.0037     1.0017     1.0002
              RMSE(θ̃KS,2)   0.0633     0.0399     0.0306     0.0271

  Table 3.3: Performance of θ̂n and θ̃KS in the linear case with q = 2 = p.

3.5.2 Test performance

Here we present the performance of the proposed test associated with $M_n(\hat\theta_n)$, in terms of empirical level and power, for different alternative hypotheses and various choices of the sample size ratio.

The case $q = 1 = p$. The finite sample performance of the $T_n$ test is assessed with both the linear model (3.37) and the nonlinear model (3.38) above as the null. For each case, three different alternatives are chosen to obtain the empirical power of a member of the class of the proposed tests.

              Linear null                                 Nonlinear null
  Model 0:    Y = X + ε                                   Y = e^{-X} + ε
  Model 1:    Y = X + 0.2X² + ε                           Y = e^{-X} + 0.2X² + ε
  Model 2:    Y = X + 0.5 sin(2X) + ε                     Y = e^{-X} + 0.5 sin(2X) + ε
  Model 3:    Y = X I(X ≤ 0.5) + 0.5 I(X > 0.5) + ε       Y = e^{-X} I(X ≤ 0.5) + e^{-0.5} I(X > 0.5) + ε

The entities $G$, $K$, $f_Z$, $U$ and $\varepsilon$ are as in the $q = 1 = p$ cases in Section 3.5.1. The empirical levels under Model 0 and the empirical power under Models 1, 2 and 3 are shown in Table 3.4 for increasing sample sizes; the two panels of Table 3.4 correspond to the left and right columns of the models above, respectively.

  Linear null model (3.37)
  N/n = 4     (n, N)     (100,400)  (200,800)  (500,2000)
              Model 0    0.031      0.041      0.032
              Model 1    0.686      0.982      1.000
              Model 2    0.392      0.781      0.996
              Model 3    0.913      1.000      1.000
  N/n = 1     (n, N)     (100,100)  (200,200)  (500,500)
              Model 0    0.029      0.037      0.041
              Model 1    0.671      0.984      1.000
              Model 2    0.409      0.790      0.996
              Model 3    0.921      1.000      1.000
  N/n = 1/4   (n, N)     (100,25)   (200,50)   (500,125)
              Model 0    0.056      0.048      0.052
              Model 1    0.679      0.967      1.000
              Model 2    0.483      0.814      0.995
              Model 3    0.902      1.000      1.000

  Nonlinear null model (3.38)
  N/n = 4     (n, N)     (100,400)  (200,800)  (500,2000)
              Model 0    0.046      0.022      0.039
              Model 1    0.326      0.748      1.000
              Model 2    1.000      1.000      1.000
              Model 3    0.321      0.740      1.000
  N/n = 1     (n, N)     (100,100)  (200,200)  (500,500)
              Model 0    0.029      0.032      0.040
              Model 1    0.345      0.746      1.000
              Model 2    1.000      1.000      1.000
              Model 3    0.327      0.730      1.000
  N/n = 1/4   (n, N)     (100,25)   (200,50)   (500,125)
              Model 0    0.037      0.033      0.038
              Model 1    0.344      0.744      1.000
              Model 2    0.996      1.000      1.000
              Model 3    0.339      0.733      0.996

  Table 3.4: Empirical level and power under the linear null model (top panel) and the nonlinear null model (bottom panel), p = 1.

With nominal level 0.05, the empirical level is well controlled in the linear case, while it is slightly conservative under the exponential null model for larger sample sizes. The proposed test rejects the null hypothesis with high power for moderate and large sample sizes under all three chosen alternatives. Moreover, for the same primary sample size $n$, the empirical power changes little as the validation sample size $N$ increases. This finding is also consistent with the theoretical result that the sample size ratio $N/n$ does not play a critical role in the asymptotic behavior of the proposed test statistic.
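As an illustration of how the data are generated under the null and the alternatives in this power study, here is a minimal sketch for the $p = 1$ case. It is a sketch under the stated simulation setup ($Z \sim U[-1,1]$, $\eta \sim N_1(0, 0.1^2)$, $\varepsilon \sim N_1(0, 0.2^2)$), and the function name is our own.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_p1(model, n, N, null="linear"):
    # One replication of primary and validation data for the p = 1 power study.
    # `model` in {0, 1, 2, 3} selects the null (0) or one of the alternatives;
    # `null` selects the linear family (3.37) or the exponential family (3.38).
    Z = rng.uniform(-1.0, 1.0, n)
    eta = rng.normal(0.0, 0.1, n)
    X = Z + eta                              # unobserved Berkson covariate
    eps = rng.normal(0.0, 0.2, n)
    base = X if null == "linear" else np.exp(-X)
    if model == 0:
        Y = base + eps
    elif model == 1:
        Y = base + 0.2 * X**2 + eps
    elif model == 2:
        Y = base + 0.5 * np.sin(2.0 * X) + eps
    else:                                    # Model 3: the change-point alternative
        hi = 0.5 if null == "linear" else np.exp(-0.5)
        Y = np.where(X <= 0.5, base, hi) + eps
    eta_val = rng.normal(0.0, 0.1, N)        # validation sample of measurement errors
    return Y, Z, eta_val
```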
The case $q = 2 = p$. In this case the setup is the same as in the estimation Subsection 3.5.1 for $p = 2$. We investigate the empirical level of the proposed test under Model ∅ and its power under the alternative Models I, II and III below, where $X = (X_1, X_2)^T$.

  Model ∅:    Y = θ₀ᵀX + ε,  θ₀ = (1, 1)ᵀ
  Model I:    Y = θ₀ᵀX + 0.2X₁X₂ + ε
  Model II:   Y = θ₀ᵀX + 0.5 sin(2X₁X₂) + ε
  Model III:  Y = θ₀ᵀX I(θ₀ᵀX ≤ 0.5) + 0.5 I(θ₀ᵀX > 0.5) + ε

  N/n = 4     (n, N)      (40,160)  (100,400)  (200,800)  (400,1600)
              Model ∅     0.041     0.034      0.048      0.047
              Model I     0.052     0.103      0.278      0.596
              Model II    0.581     0.900      0.953      0.998
              Model III   0.654     0.910      0.961      0.999
  N/n = 1     (n, N)      (40,40)   (100,100)  (200,200)  (400,400)
              Model ∅     0.040     0.035      0.049      0.046
              Model I     0.056     0.121      0.295      0.608
              Model II    0.585     0.900      0.958      0.998
              Model III   0.636     0.908      0.963      0.997
  N/n = 1/4   (n, N)      (40,10)   (100,25)   (200,50)   (400,100)
              Model ∅     0.092     0.081      0.077      0.069
              Model I     0.105     0.188      0.340      0.658
              Model II    0.604     0.901      0.961      0.999
              Model III   0.638     0.912      0.964      0.999

  Table 3.5: Empirical level and power under the linear null model, p = 2.

The numerical findings are summarized in Table 3.5. The empirical levels preserve the nominal size 0.05 for larger sample sizes when $N/n = 1$ and 4. The empirical levels for $N/n = 1/4$ are slightly inflated, due to the limited validation sample size, and decrease towards the nominal size 0.05 as the sample sizes increase. The empirical power under all chosen alternatives increases as the sample size increases.
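Both the empirical level and the empirical power reported above are Monte Carlo rejection frequencies. The following schematic driver records the computation $\#\{|T_n| \ge 1.96\}/1000$; it is a sketch only, and `tn_statistic` is a hypothetical placeholder for an implementation of the studentized statistic $T_n$ of Section 3.4, which is not reproduced here.

```python
def empirical_rejection_rate(simulate, tn_statistic, reps=1000, crit=1.96):
    # Monte Carlo estimate of the rejection probability #{|Tn| >= crit}/reps.
    # `simulate()` returns one replication (Y, Z, eta_val); `tn_statistic` is a
    # user-supplied function computing the studentized test statistic Tn.
    rejections = 0
    for _ in range(reps):
        Y, Z, eta_val = simulate()
        if abs(tn_statistic(Y, Z, eta_val)) >= crit:
            rejections += 1
    return rejections / reps

# Example: level under Model 0 and power under Model 2 of the p = 1 study,
# reusing simulate_p1 from the sketch above (my_tn is hypothetical):
# level = empirical_rejection_rate(lambda: simulate_p1(0, 200, 200), my_tn)
# power = empirical_rejection_rate(lambda: simulate_p1(2, 200, 200), my_tn)
```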
3.6 Proofs

In this section we provide detailed proofs of Lemmas 3.3.1 and 3.3.2. To proceed, we state Theorem B.1 of Sepanski and Lee (1995) and recall Lemmas 2.6.4 and 2.6.5 pertaining to some two-sample U-statistics.

Lemma 3.6.1. Let $\{x_i\}$, $i = 1, \dots, n$, be an i.i.d. sample and $\{v_j\}$, $j = 1, \dots, m$, be another i.i.d. sample, independent of $\{x_i\}$. Let $\psi_n(v, x, h)$ be a sequence of random functions with a bandwidth $h$. In addition, suppose the following hold.
(1) There exist square integrable functions $q_1(v)$ and $q_2(x)$ such that $|E\{\psi_n(v, x, h)|v\}| \le q_1(v)$ and $|E\{\psi_n(v, x, h)|x\}| \le q_2(x)$;
(2) $\lim_{n\to\infty} E\{\psi_n(v, x, h)|v\} = p_1(v)$, a.e., and $\lim_{n\to\infty} E\{\psi_n(v, x, h)|x\} = p_2(x)$, a.e., for some measurable functions $p_1(v)$ and $p_2(x)$; and
(3) $\lim_{n\to\infty} \sqrt{n}\, E\{\psi_n(v, x, h)\} = 0$.
Then
$$\frac{1}{m\sqrt{n}} \sum_{i=1}^{n} \sum_{j=1}^{m} \psi_n(v_j, x_i, h) \to_d N_1\big(0, \lambda \mathrm{Var}\{p_1(v)\} + \mathrm{Var}\{p_2(x)\}\big),$$
where $\lambda = \lim_{n \wedge m \to \infty} (n/m)$ is assumed to be finite.

Proof of Lemma 3.3.1. Let $W_n^*(\theta)$ be $W_n(\theta)$ with $d\hat\varphi$ replaced by $d\varphi$. To proceed, for $1 \le i \ne j \le n$ and $1 \le k \ne l \le N$, define
$$\phi_1(Z_i, \eta_k) = \int K_{hi}^2(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)]^2 \, d\varphi(z),$$
$$\phi_2(Z_i, Z_j, \eta_k, \eta_l) = \int K_{hi}(z) K_{hj}(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)][m_\theta(Z_j + \eta_l) - H_\theta(Z_j)] \, d\varphi(z),$$
$$\phi_3(Z_i, Z_j, \eta_k) = \int K_{hi}(z) K_{hj}(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)][m_\theta(Z_j + \eta_k) - H_\theta(Z_j)] \, d\varphi(z),$$
$$\phi_4(Z_i, \eta_k, \eta_l) = \int K_{hi}^2(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)][m_\theta(Z_i + \eta_l) - H_\theta(Z_i)] \, d\varphi(z).$$
Rewrite
$$W_n^*(\theta) = \frac{1}{n^2} \sum_{i,j=1}^{n} \int K_{hi}(z) K_{hj}(z) [\hat H_\theta(Z_i) - H_\theta(Z_i)][\hat H_\theta(Z_j) - H_\theta(Z_j)] \, d\varphi(z)$$
$$= \frac{1}{n^2 N^2} \sum_{i,j=1}^{n} \sum_{k,l=1}^{N} \int K_{hi}(z) K_{hj}(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)][m_\theta(Z_j + \eta_l) - H_\theta(Z_j)] \, d\varphi(z)$$
$$= \frac{1}{n^2 N^2} \Big[ \sum_{i=1}^{n} \sum_{k=1}^{N} \phi_1(Z_i, \eta_k) + \sum_{i \ne j} \sum_{k \ne l} \phi_2(Z_i, Z_j, \eta_k, \eta_l) + \sum_{i \ne j} \sum_{k} \phi_3(Z_i, Z_j, \eta_k) + \sum_{i} \sum_{k \ne l} \phi_4(Z_i, \eta_k, \eta_l) \Big]$$
$$=: Q_1 + Q_2 + Q_3 + Q_4, \quad \text{say}.$$
Because $E[m_\theta(Z + \eta)|Z = z] = H_\theta(z)$, we have $EQ_2 = EQ_4 = 0$. Therefore
$$E(W_n^*(\theta)) = EQ_1 + EQ_3 = \frac{1}{nN} E\phi_1(Z, \eta) + \frac{1}{N} E\phi_3(Z_1, Z_2, \eta)$$
$$= \frac{1}{nN} E\int K_h^2(z - Z_1) \sigma_\theta^2(Z_1) \, d\varphi(z) + \frac{1}{N} E\int K_h(z - Z_1) K_h(z - Z_2) \sigma_\theta(Z_1, Z_2) \, d\varphi(z)$$
$$= O\Big( \frac{1}{N} \int \sigma_\theta^2(z) \, dG(z) \Big) = O\big(A_N(\theta)\big).$$
This fact and (3.5) imply that $W_n(\theta) = W_n^*(\theta) + o_p(W_n^*(\theta)) = W_n^*(\theta) + o_p(1/N)$. Therefore it suffices to prove that (3.10) holds with $W_n(\theta)$ replaced by $W_n^*(\theta)$. We investigate each term in the above decomposition of $W_n^*(\theta)$.

First, $Q_1$ is a two-sample U-statistic with kernel function $\phi_1$. In order to apply Lemma 3.6.1, it is necessary to calculate the projections of $\phi_1$, i.e.,
$$E(\phi_1|Z_i) = h^p \int K_{hi}^2(z) \sigma_\theta^2(Z_i) \, d\varphi(z), \qquad E(\phi_1|\eta_k) = O_p\Big( K_1 \int [m_\theta(z + \eta_k) - H_\theta(z)]^2 \, dG(z) \Big).$$
It can be verified that $E\phi_1$, $\mathrm{Var}(E(\phi_1|Z_i))$ and $\mathrm{Var}(E(\phi_1|\eta_k))$ are all finite. Therefore Lemma 3.6.1 implies that, for finite $0 < \lambda < \infty$,
$$Z_n := \frac{1}{n\sqrt{N}} \sum_{i=1}^{n} \sum_{k=1}^{N} [\phi_1(Z_i, \eta_k) - E\phi_1]$$
is asymptotically normally distributed. Hence
$$NQ_1 = \frac{1}{n\sqrt{N}} Z_n + \frac{1}{n} E(\phi_1) = o_p(1). \qquad (3.39)$$
Similarly, $Q_2$ is a two-sample U-statistic with kernel function $\phi_2$. Note that
$$E[\phi_2|Z_i] = E[\phi_2|Z_i, Z_j] = E[\phi_2|Z_i, Z_j, \eta_k] = E[\phi_2|\eta_k] = E[\phi_2|Z_i, \eta_k] = 0$$
for $1 \le i \ne j \le n$ and $1 \le k \le N$. To proceed further, define
$$\tilde\phi_2(\eta_k, \eta_l) = \int [m_\theta(z + \eta_k) - H_\theta(z)][m_\theta(z + \eta_l) - H_\theta(z)] f_Z^2(z) \, d\varphi(z), \qquad \bar Q_2 = \frac{1}{N(N-1)} \sum_{k \ne l = 1}^{N} \tilde\phi_2(\eta_k, \eta_l).$$
Calculation shows that
$$\mathrm{Var}(\phi_2) = O\Big(\frac{1}{h^{2p}}\Big), \qquad E(\phi_2|\eta_k, \eta_l) = O_p\big(\tilde\phi_2(\eta_k, \eta_l)\big), \qquad \mathrm{Var}\big(\tilde\phi_2(\eta_k, \eta_l)\big) = \Sigma_\theta.$$
Then Lemma 2.6.5 implies that $\mathrm{Var}\big(N(Q_2 - \bar Q_2)\big) = O\big((nh^p)^{-2}\big) = o(1)$, whereby $N(Q_2 - \bar Q_2) = o_p(1)$. Furthermore, $\bar Q_2$ being a degenerate U-statistic, applying Theorem 1 of Hall (1984) to $\bar Q_2$ yields $N\bar Q_2 \to_d N_1(0, 2\gamma(\theta))$, which in turn implies
$$NQ_2 \to_d N_1(0, 2\gamma(\theta)). \qquad (3.40)$$
Next, consider $Q_3$, which is defined with kernel function $\phi_3$. Algebra shows
$$E(\phi_3|Z_i, Z_j) = O_p\Big(\frac{1}{N} \int K_{hi}(z) K_{hj}(z) \sigma_\theta(Z_i, Z_j) \, d\varphi(z)\Big), \qquad E(\phi_3|Z_i) = O_p\Big(\frac{1}{N} \int K_{hi}(z) \sigma_\theta(Z_i, z) f(z) \, d\varphi(z)\Big),$$
$$E(\phi_3) = O_p\big(A_N(\theta)\big), \qquad E(\phi_3|\eta_k) = O_p\Big(\frac{1}{N} \int [m_\theta(z + \eta_k) - H_\theta(z)]^2 f_Z^2(z) \, d\varphi(z)\Big),$$
$$E(\phi_3|Z_i, \eta_k) = O_p\Big(\frac{1}{N} \int K_{hi}(z) [m_\theta(Z_i + \eta_k) - H_\theta(Z_i)][m_\theta(z + \eta_k) - H_\theta(z)] f_Z(z) \, d\varphi(z)\Big).$$
Furthermore, the second moments of the above projections can be derived:
$$E\phi_3^2 = O(N^{-2} h^{-2p}), \qquad E[E(\phi_3|Z_i, Z_j)]^2 = O(N^{-2} h^{-2p}), \qquad E[E(\phi_3|\eta_k)]^2 = O(N^{-2}),$$
$$E[E(\phi_3|Z_i)]^2 = O(N^{-2} h^{-p}), \qquad E[E(\phi_3|Z_i, \eta_k)]^2 = O(N^{-2} h^{-p}).$$
Then Lemma 2.6.4 yields that
$$\mathrm{Var}(Q_3) = O\Big( \frac{4}{n} \mathrm{Var}(E(\phi_3|Z_1)) + \frac{4}{N} \mathrm{Var}(E(\phi_3|\eta_1)) \Big) = O\Big( \frac{1}{n h^p N^2} + \frac{1}{N^3} \Big).$$
Hence $N(Q_3 - E\phi_3) = o_p(1)$ for sufficiently large $N$, and $E\phi_3 = O(1/N)$. Therefore
$$Q_3 = Q_3 - E\phi_3 + E\phi_3 = E\phi_3 + o_p(1/N) = A_N(\theta) + o_p(1/N). \qquad (3.41)$$
The same routine argument and Lemma 2.6.4, applied to $Q_4$, lead to
$$\mathrm{Var}(Q_4) = O\Big( \frac{1}{(n h^p)^2 N^2} \Big) \quad \text{and} \quad EQ_4 = 0. \qquad (3.42)$$
Combining the results (3.39)–(3.42) for the components of $W_n^*$, one sees that $Q_2$ dominates the convergence rate of $W_n$ and that only $Q_3$ contributes to the mean of $W_n$ asymptotically, which in turn yields (3.10).

Proof of Lemma 3.3.2. KS have shown that $\tilde\theta_n = T(H) + o_p(1)$ by proving that
$$\sup_{\theta \in \Theta} |\tilde M_n(\theta) - \rho(H, H_\theta)| = o_p(1), \qquad (3.43)$$
where $\tilde M_n$ denotes their version of the minimum distance criterion. In the current setup, if we show that
$$\sup_{\theta \in \Theta} |M_n(\theta) - \tilde M_n(\theta)| = o_p(1), \qquad (3.44)$$
then, by (3.43),
$$\sup_{\theta \in \Theta} |M_n(\theta) - \rho(H, H_\theta)| = o_p(1).$$
Then arguing as in KS yields the lemma.

Proof of (3.44). By the Cauchy–Schwarz inequality,
$$|M_n(\theta) - \tilde M_n(\theta)| \le W_n(\theta) + 2[W_n(\theta) \tilde M_n(\theta)]^{1/2}.$$
It therefore suffices to show that $\sup_\theta |W_n(\theta)| = o_p(1)$ and $\sup_\theta |\tilde M_n(\theta)| = O_p(1)$. The compactness of $\Theta$ and $H_\theta \in L_2(G)$ imply that $\sup_\theta |\rho(H, H_\theta)|$ is finite. Furthermore, (3.43) shows that $\sup_\theta |\tilde M_n(\theta)| = O_p(1)$.

Now we study $W_n(\theta)$. By Lemma 3.3.1, $W_n(\theta) = o_p(1)$ for every $\theta \in \Theta$. Moreover, for any $\theta_1, \theta_2 \in \Theta$,
$$|W_n(\theta_1) - W_n(\theta_2)| \le \Big( \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{\hat H_{\theta_1}(Z_i) - H_{\theta_1}(Z_i) + \hat H_{\theta_2}(Z_i) - H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) \Big)^{1/2}$$
$$\times \Big( \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{\hat H_{\theta_1}(Z_i) - H_{\theta_1}(Z_i) - \hat H_{\theta_2}(Z_i) + H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) \Big)^{1/2}.$$
The first factor on the right hand side above is $O_p(1)$, due to the boundedness of $H_\theta$ and the compactness of $\Theta$. Similar to the proof on p. 143 of KS, the square of the second factor is bounded above by
$$2 \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{\hat H_{\theta_1}(Z_i) - \hat H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) + 2 \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{H_{\theta_1}(Z_i) - H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z). \qquad (3.45)$$
The first term in (3.45) can be rewritten as 2 times the factor
$$\int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{\hat H_{\theta_1}(Z_i) - H_{\theta_1}(Z_i) + H_{\theta_1}(Z_i) - H_{\theta_2}(Z_i) + H_{\theta_2}(Z_i) - \hat H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z)$$
$$\le 3\Big( W_n(\theta_1) + \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{H_{\theta_1}(Z_i) - H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) + W_n(\theta_2) \Big)$$
$$= 3 \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{H_{\theta_1}(Z_i) - H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) + o_p(1).$$
The last claim holds because $N A_N(\theta) = O(1)$ and hence, by Lemma 3.3.1, $W_n(\theta) = o_p(1)$ for all $\theta \in \Theta$. Then, by (H3), the bound (3.45) is further bounded from above by
$$8 \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) \{H_{\theta_1}(Z_i) - H_{\theta_2}(Z_i)\} \Big]^2 d\hat\varphi(z) + o_p(1) \le 8 \|\theta_1 - \theta_2\|^{2\beta} \int \Big[ \frac{1}{n} \sum_{i=1}^{n} K_{hi}(z) r(Z_i) \Big]^2 d\hat\varphi(z) + o_p(1) = \|\theta_1 - \theta_2\|^{2\beta} O_p(1),$$
by (3.8) applied with $\alpha = r$. This result and the compactness of $\Theta$ imply, by a routine argument, that $\sup_{\theta \in \Theta} |W_n(\theta)| = o_p(1)$.

BIBLIOGRAPHY

Abarin, T. and Wang, L. (2009). Second-order least squares estimation of censored regression models. Journal of Stat. Plann. and Infer., 139, 125–135.

Amemiya, T. (1984). Tobit models: A survey. Journal of Econometrics, 24, 3–61.

Berkson, J. (1950). Are there two regressions? J. Amer. Statist. Assoc., 45, 164–180.

Bhattacharya, P. K., Chernoff, H. and Yang, S. S. (1983). Nonparametric estimation of the slope of a truncated regression. Ann. Statist., 11, 505–514.

Bierens, H. (1987). Kernel estimators of regression functions. Advances in Econometrics, Fifth World Congress, Vol. 1, Bewley (ed.), 99–144.

Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, 2nd Edition. Springer, Berlin.

Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. (2006). Measurement Error in Nonlinear Models: A Modern Perspective, Second Edition. Chapman and Hall, London.

Cheng, C. L. and Van Ness, J. W. (1999). Statistical Regression with Measurement Error. John Wiley & Sons.

Delaigle, A., Hall, P. and Qiu, P. (2006). Nonparametric methods for solving the Berkson errors-in-variables problem. J. R. Statist. Soc. B, 68(2), 201–220.

Du, L., Zou, C. and Wang, Z. (2011). Nonparametric regression function estimation for errors-in-variables models with validation data. Statistica Sinica, 21, 1093–1113.

Fuller, W. A. (1987). Measurement Error Models. John Wiley & Sons.

González-Manteiga, W. and Crujeiras, R. M. (2013). An updated review of goodness-of-fit tests for regression models. Test, 22, 361–411.

Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., 14, 1–16.

Härdle, W., Marron, J. S. and Wand, M. P. (1990). Bandwidth choice for density derivatives. J. R. Statist. Soc. B, 52(1), 223–232.

Huwang, L. and Huang, Y. H. (2000). On errors-in-variables in polynomial regression – Berkson case. Statistica Sinica, 10, 923–936.

Jones, M. C. and Signorini, D. F. (1997). A comparison of higher-order bias kernel density estimators. J. Amer. Statist. Assoc., 92(439), 1063–1073.
Kim, K. H., Chao, S. and Härdle, W. K. (2016). Simultaneous inference for the partially linear model with a multivariate unknown function when the covariates are measured with errors. SFB 649 Discussion Paper 2016-024.

Koul, H. L. and Ni, P. (2004). Minimum distance regression model checking. Journal of Stat. Plann. and Infer., 119, 109–141.

Koul, H. L. and Song, W. (2009). Minimum distance regression model checking with Berkson measurement errors. Annals of Statistics, 37(1), 132–156.

Koul, H. L., Song, W. and Liu, S. (2014). Model checking in Tobit regression via nonparametric smoothing. J. Multivariate Anal., 125, 36–49.

Lee, L. F. and Sepanski, J. H. (1995). Estimation of linear and nonlinear errors-in-variables models using validation data. Journal of the American Statistical Association, 90(429), 130–140.

Mack, Y. P. and Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete, 61, 405–415.

Schennach, S. M. (2013). Regressions with Berkson errors in covariates – a nonparametric approach. The Annals of Statistics, 41(3), 1642–1668.

Sepanski, J. H. and Carroll, R. J. (1993). Semiparametric quasilikelihood and variance function estimation in measurement error models. J. Econometrics, 58, 223–256.

Sepanski, J. H. and Lee, L. (1995). Semiparametric estimation of nonlinear errors-in-variables models with validation study. Journal of Nonparametric Statistics, 4(4), 365–394.

Serfling, R. J. (1981). Approximation Theorems of Mathematical Statistics. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ.

Song, W. (2008). Model checking in errors-in-variables regression. J. Multivariate Anal., 99, 2406–2443.

Song, W. (2009). Lack-of-fit testing in errors-in-variables regression model with validation data. Statist. Probab. Lett., 79, 765–773.

Song, W. (2011). Distribution-free test in Tobit mean regression model. Journal of Stat. Plann. and Infer., 141, 2891–2901.

Song, W. and Yao, W. (2011). A lack-of-fit test in Tobit errors-in-variables regression models. Statist. Probab. Lett., 81, 1792–1801.

Stute, W., Thies, S. and Zhu, L. (1998). Model checks for regression: an innovation process approach. Ann. Statist., 26, 1916–1934.

Stute, W., Xue, L. and Zhu, L. (2007). Empirical likelihood inference in nonlinear errors-in-covariables models with validation data. J. Amer. Statist. Assoc., 102, 332–346.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26(1), 24–36.

Wang, L. (1998). Estimation of censored linear errors-in-variables models. J. Econometrics, 84, 383–400.

Wang, L. (2004). Estimation of nonlinear models with Berkson measurement errors. Annals of Statistics, 32, 2559–2579.

Wang, L. (2007). A simple nonparametric test for diagnosing nonlinearity in Tobit median regression model. Statist. Probab. Lett., 77, 1034–1042.

Wang, Q. and Rao, J. N. K. (2002). Empirical likelihood-based inference in linear errors-in-covariables models with validation data. Biometrika, 89(2), 345–358.

Zheng, J. X. (1996). A consistent test of functional form via nonparametric estimation techniques. J. Econometrics, 75(2), 263–289.