TESTING OF REGRESSION FUNCTIONS WHEN RESPONSES ARE MISSING AT RANDOM

By

Xiaoyu Li

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Statistics

2012

ABSTRACT

TESTING OF REGRESSION FUNCTIONS WHEN RESPONSES ARE MISSING AT RANDOM

By Xiaoyu Li

This thesis consists of two chapters. The first chapter proposes a class of minimum distance tests for fitting a parametric regression model to a regression function when some responses are missing at random. These tests are based on a class of minimum integrated square distances between a kernel type estimator of the regression function and the parametric regression function being fitted. The estimators of the regression function are based on two completed data sets constructed by the imputation and inverse probability weighting methods. The corresponding test statistics are shown to have asymptotic normal distributions under the null hypothesis. Some simulation results are also presented.

The second chapter considers the problem of testing the equality of two nonparametric regression curves against a one-sided alternative, based on two samples with possibly distinct design and error densities, when responses are missing at random. This chapter proposes a class of tests using imputation and covariate matching. The asymptotic distributions of these test statistics are shown to be Gaussian under the null hypothesis and under a class of local nonparametric alternatives. The consistency of these tests against a large class of fixed alternatives is also established. This chapter also includes a simulation study, which assesses the finite sample behavior of a member of this class of tests.

Copyright by
XIAOYU LI
2012

ACKNOWLEDGMENTS

I would like to sincerely and gratefully thank my advisor, Professor Hira L. Koul, for his excellent guidance and great patience during the past five years. He sets an example for my career with his great enthusiasm for science, his serious attitude, his hard work, and his extraordinary kindness to students. His love of statistics and mathematics encourages me to keep working in my research.

I also wish to thank Professors Vidyadhar S. Mandrekar, Yimin Xiao, and David Todem for serving on my dissertation committee. Special thanks to Professors Lijian Yang and Yimin Xiao for their help in my graduate study and life. Many thanks to Professor Vidyadhar S. Mandrekar for his teaching and encouragement, and to Professor James Stapleton for his help since the first day I came to Michigan State University. I am grateful to the Department of Statistics and Probability for offering an assistantship that supported me through my graduate studies.

Finally, I would like to thank my family for their love, which enabled me to complete this work and pursue my career goal.

This research was supported by NSF Grant DMS 0704130, P.I. Professor Hira L. Koul.

TABLE OF CONTENTS

List of Tables

Chapter 1  Minimum Distance Regression Model Checking when Responses are Missing At Random
  1.1  Introduction
  1.2  Assumptions
  1.3  Consistency of the minimum distance estimators
  1.4  Asymptotic distribution of the minimum distance estimators under H0
  1.5  Asymptotic distribution of the test statistics under H0
  1.6  Simulations

Chapter 2  Testing for Superiority of Two Regression Curves when Responses are Missing At Random
  2.1  Introduction
  2.2  Assumptions
  2.3  Asymptotic distribution of the test statistic under H0, H1N, and H1
  2.4  Some suggested estimators
  2.5  Simulations

Bibliography

LIST OF TABLES

Table 1.1  Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V1) and ε ∼ N(0, (.3)²)
Table 1.2  Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V2) and ε ∼ N(0, (.3)²)
Table 1.3  Mean and s.d. of θ̂n1 under model 0 with X ∼ N(0, V1), ε ∼ N(0, (.3)²), and E(δ|X = x) = ∆1(x)
Table 2.1  Empirical sizes of V̂, with coefficients ρ1, ρ2, ρ3, and ∆l = Dl, l = 1, 2
Table 2.2  Empirical sizes of V̂, with coefficients ρ1, ρ2, ρ3, and ∆l = 1, l = 1, 2
Table 2.3  Empirical powers of V̂ with ρ1, ρ2, ρ3 as in Table 2.1, and ∆l = Dl, l = 1, 2
Table 2.4  Empirical powers of V̂ with ρ1, ρ2, ρ3 as in Table 2.2, and ∆l = 1, l = 1, 2
Table 2.5  Empirical sizes and powers of V̂ with ρ1 = ρ2 = ρ3 = 1 and ∆l = Dl, l = 1, 2
Table 2.6  Empirical sizes and powers of V̂ with ρ1 = ρ2 = ρ3 = 1 and ∆l = 1, l = 1, 2

Chapter 1

Minimum Distance Regression Model Checking when Responses are Missing At Random

1.1  Introduction

In this chapter, we discuss a class of minimum distance tests for fitting a parametric model to the regression function, based on the imputation and inverse probability weighting methods, when responses are missing at random. To be specific, let X be an explanatory variable of dimension d, d ≥ 1, let Y be a one-dimensional response variable with E|Y| < ∞, and let δ be the indicator of whether the response is observed, i.e., δ = 1 if Y is observed and δ = 0 if Y is missing. The missing mechanism is missing at random (MAR): δ and Y are conditionally independent given X, i.e.,

    P(δ = 1 | Y, X) = P(δ = 1 | X),  a.s.;

see Little and Rubin (1987). Let µ(x) = E(Y | X = x), x ∈ R^d, denote the regression function, and consider the regression model

    Y = µ(X) + ε,                                                     (1.1)

with responses missing at random. Let {m_θ(·) : θ ∈ Θ}, Θ ⊂ R^q, be a given parametric model and let I be a compact subset of R^d. The problem of interest is to test the hypothesis

    H0 : µ(x) = m_{θ0}(x), for some θ0 ∈ Θ and for all x ∈ I,   vs.   H1 : H0 is not true,

based on the random sample {(Xi, δi Yi) : i = 1, 2, ..., n} from the distribution of (X, δY) in model (1.1). One is also interested in finding the parameter θ ∈ Θ that best fits the data under the null hypothesis.
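To fix ideas, the following minimal sketch (not part of the thesis) generates data from model (1.1) under this missing at random mechanism. The linear null model with θ0 = (0.5, 0.8)', the design covariance V1, the N(0, (.3)²) errors, and the logistic selection probability ∆1 are the choices made later in the simulation study of Section 1.6; all function and variable names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def m_theta(x, theta):
    # Parametric null model: m_theta(x) = theta' x (the linear model of Section 1.6).
    return x @ theta

def Delta1(x):
    # P(delta = 1 | X = x): the logistic selection probability Delta_1 of Section 1.6.
    return 1.0 / (1.0 + np.exp(-0.8 - 0.5 * x[:, 0] - 0.5 * x[:, 1]))

n, d = 200, 2
theta0 = np.array([0.5, 0.8])
V1 = np.diag([0.36, 1.0])
X = rng.multivariate_normal(np.zeros(d), V1, size=n)
eps = rng.normal(scale=0.3, size=n)
Y = m_theta(X, theta0) + eps                  # model (1.1) under H0
delta = rng.binomial(1, Delta1(X))            # MAR: delta depends on X only, not on Y
Y_obs = np.where(delta == 1, Y, np.nan)       # only (X_i, delta_i * Y_i) is observed
```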
Regression model checking when data are completely observed is a classical problem in statistics. Many interesting results are available; see, e.g., Eubank and Spiegelman (1990), Eubank and Hart (1992, 1993), Härdle and Mammen (1993), Zheng (1996), Hart (1997), Stute et al. (1998), Koul and Ni (2004), Koul and Song (2009), and Koul (2011), among others. Hart (1997) summarized numerous testing procedures. Koul and Ni (2004) (K-N) proposed a class of tests based on certain minimized L2 distances between a nonparametric estimator of the regression function and the parametric model being fitted. They proved asymptotic normality of the minimum distance estimators and the proposed test statistics under the fitted model, and consistency of the proposed tests against a class of fixed alternatives. Koul and Song (2009) extended this minimum distance methodology to the regression model with Berkson measurement errors. They also obtained the asymptotic power of the proposed tests against a class of local alternatives. Koul (2011) implemented the minimum distance methodology in the classical regression model with non-random design, uniform on [0, 1]. Sun and Wang (2009) considered the model checking problem when data are missing at random. They constructed complete data sets by the imputation and inverse probability weighting methods, and proposed two score-type and two empirical process based test statistics. The asymptotic behavior of these test statistics was investigated under the null hypothesis and local alternatives.

In this chapter we focus on adapting the minimum distance testing method of K-N to the missing at random setup, when the data are completed by the imputation and inverse probability weighting methods. To describe the testing procedure, we need to estimate µ(x). Since, under H0, µ is parametric, we only need to estimate θ0 at a √n-consistent rate. Let α̂n be such an estimator of θ0 based on the random sample. A suggested choice of α̂n is given in Theorem 1.4.1, Section 1.4 below. Let K̃ be a symmetric kernel function on [−1, 1]^d, let b = bn be a bandwidth sequence of positive numbers, and let K̃_b(y) := b_n^{−d} K̃(y/b_n), y ∈ R^d. For x ∈ R^d, let ∆(x) = P(δ = 1 | X = x), and

    ∆̂n(x) = [ Σ_{i=1}^n δi K̃_b(x − Xi) ] / [ Σ_{i=1}^n K̃_b(x − Xi) ].

Note that ∆̂n(x) is the Nadaraya-Watson kernel estimator of ∆(x). We construct two complete data sets {(Xi, Ŷij), i = 1, ..., n}, j = 1, 2, by the imputation and inverse probability weighting methods, respectively, where

    Ŷi1 = δi Yi + (1 − δi) m_{α̂n}(Xi),  i = 1, ..., n;                                      (1.2)

    Ŷi2 = [δi / ∆̂n(Xi)] Yi + [1 − δi / ∆̂n(Xi)] m_{α̂n}(Xi),  i = 1, ..., n.                   (1.3)

To proceed further, let K and K* be kernel functions on [−1, 1]^d, with K_h(y) := h^{−d} K(y/h) and K*_w(y) := w^{−d} K*(y/w); let h = hn and w = wn be window width sequences of positive numbers, and let G be a σ-finite measure on R^d with Lebesgue density g. Assume the design variable X has a uniformly continuous Lebesgue density f that is bounded from below on I. Define

    f̂_h(x) = n^{−1} Σ_{i=1}^n K_h(x − Xi),    f̂_w(x) = n^{−1} Σ_{i=1}^n K*_w(x − Xi),   x ∈ R^d,

where hn ∼ n^{−a} with 0 < a < min(1/(2d), 4/(d(d + 4))), and wn ∼ (log n/n)^{1/(d+4)}. Adaptive versions of the L2 distances proposed in K-N in the current setup are

    T̂nj(θ) = ∫_I [ n^{−1} Σ_{i=1}^n K_h(x − Xi)(Ŷij − m_θ(Xi)) ]² {f̂_w(x)}^{−2} dG(x),   θ ∈ R^q,

and the corresponding minimum distance estimators are

    θ̂nj := arg min_{θ∈Θ} T̂nj(θ),   j = 1, 2.

The proposed tests of H0 are to be based on T̂nj(θ̂nj), j = 1, 2.
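Schematically, the procedure consists of completing the data via (1.2)-(1.3) and then minimizing the integrated squared distance T̂nj. The sketch below (ours, not the thesis code) builds on the data-generating sketch above; it uses the product Epanechnikov kernel and takes g ≡ 1 on I, as in Section 1.6, and it approximates the G-integral on a grid over I. The helper names, the grid approximation, and the numerical safeguards are illustrative choices, not part of the thesis.

```python
import numpy as np
from scipy.optimize import minimize

def kern(u):
    # Product Epanechnikov kernel on [-1, 1]^d.
    u = np.atleast_2d(u)
    inside = np.all(np.abs(u) <= 1.0, axis=-1)
    return np.where(inside, np.prod(0.75 * (1.0 - u ** 2), axis=-1), 0.0)

def delta_hat(x, X, delta, b):
    # Nadaraya-Watson estimator of Delta(x) = P(delta = 1 | X = x).
    w = kern((x - X) / b)
    return np.sum(w * delta) / max(np.sum(w), 1e-12)

def completed_responses(X, Y, delta, alpha_hat, b, m):
    # Completed data sets (1.2) (imputation) and (1.3) (inverse probability weighting);
    # missing responses may be stored as NaN in Y.
    m_fit = m(X, alpha_hat)
    Y1 = np.where(delta == 1, Y, m_fit)
    Dhat = np.array([delta_hat(x, X, delta, b) for x in X])
    w = delta / Dhat
    Y2 = w * np.nan_to_num(Y) + (1.0 - w) * m_fit
    return Y1, Y2

def T_hat(theta, X, Ycomp, grid, h, w_bw, m, vol_I):
    # T^_nj(theta): the G-integral is approximated on a grid of points in I, with g == 1.
    d = X.shape[1]
    cell = vol_I / len(grid)
    total = 0.0
    for x in grid:
        Kh = kern((x - X) / h) / h ** d
        Kw = kern((x - X) / w_bw) / w_bw ** d
        f_w = np.mean(Kw)                         # f^_w(x)
        s = np.mean(Kh * (Ycomp - m(X, theta)))   # n^{-1} sum_i K_h(x - X_i)(Y^_ij - m_theta(X_i))
        total += (s ** 2) / max(f_w, 1e-12) ** 2 * cell
    return total

# The minimum distance estimator theta^_nj minimizes T^_nj over Theta, e.g.
# theta_hat = minimize(T_hat, x0=alpha_hat,
#                      args=(X, Y1, grid, h, w_bw, m_theta, vol_I)).x
```

To proceed further, we need more notation.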
Let ˆ εij := Yij − mθ (Xi ), ˆ ˆ nj 4 j = 1, 2, (1.4) n ˆ Cnj := n−2 i=1 I 2 Kh (x − Xi )ˆ2 {fw (x)}−2 dG(x), εij ˆ ˆ Γnj := 2hd n−2 i=k I ˆ Kh (x − Xi )Kh (x − Xk )ˆij εkj {fh (x)}−2 dG(x) ε ˆ ˆ ˆ ˆ ˆ ˆ 1/2 Dnj := nhd/2 (Tnj (θnj ) − Cnj )/Γnj , 2 , j = 1, 2. ˆ For each j = 1, 2, the proposed test rejects H0 whenever |Dnj | is large. Asymptotic normality ˆ ˆ of n1/2 (θnj − θ0 ) and Dnj , j = 1, 2, under H0 are established in Section 1.4 and Section 1.5, ˆ respectively. Consistency of θnj , j = 1, 2, under H0 is given in Section 1.3. Assumptions and preliminary lemmas needed to prove all these results are stated in Section 1.2, while Section 1.6 is devoted to simulation studies. In the sequel, we write h for hn , w for wn , and b for bn ; the integrals with respect to the G-measure are understood to be over the set I; all limits are taken as n → ∞, unless specified otherwise; for any two sequences of real numbers an and bn , notation an ∼ bn means that an /bn → 1; the convergence in probability is denoted by →p , in distribution, by →d , and almost surely, by →a.s. ; the r-dimension normal distribution with mean vector a and covariance matrix B is denoted by Nr (a, B), and N (a, B) = N1 (a, B). Denoted by Φ the standard normal cumulative distribution function, and zα the (1 − α)-quantile. 1.2 Assumptions Here we shall state the needed assumptions. (e1) (Xi , δi Yi ); Xi ∈ Rd , Yi ∈ R, δi = 0 or 1, i = 1, 2, · · · , n, are i.i.d. random vectors with δ = 1, if Y is observed, and δ = 0, if Y is missing; δ and ε are conditionally independent, given X. 5 (e2) E(ε|X = x) = 0, Eε2 < ∞. The function σ 2 (x) := E(ε2 |X = x) is a.e. in (G) continuous on I, and ∆(x) := E(δ|X) = P (δ = 1|X = x) is positive and Lipschitzcontinuous of order 1 on an open interval containing I. (e3) E|ε|2+δ0 < ∞, for some δ0 > 0. (e4) Eε4 < ∞. (f1) The design variable X has a uniformly continuous Lebesgue density f that is bounded from below on an open interval containing I. (f2) The density f is twice continuously differentiable with a compact support. (g) G is a σ-finite measure on Rd and has a continuous Lebesgue density g. (k1) The kernels K and K ∗ are positive symmetric square integrable densities on [−1, 1]d . In addition, K ∗ satisfies Lipschitz-continuity of order 1. ˜ (k2) The kernel K is positive symmetric square integrable density on [−1, 1]d , satisfying ˜ Lipschitz-continuity of order γ, γ > 0. K(u) attains its maximum at u = 0. (m1) For each θ, mθ (x) is a.s. continuous in x w.r.t. integrating measure G. (m2) The parametric family of models mθ (x) is identifiable w.r.t. θ, i.e., if mθ1 (x) = mθ2 (x), for almost all x(G), then θ1 = θ2 . (m3) For some positive continuous function on I and for some β > 0, |mθ2 (x) − mθ1 (x)| ≤ θ2 − θ1 β (x), ∀θ2 , θ1 ∈ Θ, x ∈ I. (m4) The true parameter θ0 is an inner point of Θ. For every x, mθ (x) is differentiable in θ in a neighborhood of θ0 with the vector of derivatives mθ (x), such that for every ˙ 6 ε > 0, k < ∞, lim sup P mθ (Xi ) − mθ0 (Xi ) − (θ − θ0 )T mθ0 (Xi ) ˙ √ sup n θ − θ0 1≤i≤n, nhd θ−θ0 ≤k >ε = 0. (m5) The vector function x → mθ0 (x) is continuous in x ∈ I and for every ε > 0, ˙ there is an Nε < ∞ such that for every 0 < k < ∞, max P 1≤i≤n,(nhd )1/2 θ−θ0 ≤k (m6) n−1 h−d/2 mθ (Xi ) − mθ0 (Xi ) ≥ ε ≤ ε, ˙ ˙ n ˙T ˙ i=1 δi mθ0 (Xi )mθ0 (Xi ), ∀n > Nε . ˙θ n ≥ q, and E[δ mθ0 (X)mT (X)] are positive defi˙ 0 nite. (a) The estimator αn is ˆ √ n-consistent for θ0 under H0 . (b1) nbd → ∞, nbd+1 → 0. n n (b2) bn ∼ n−r , where 1/(d + 1) < r < 1/d. (h1) hn → 0. 
(h2) nh2d → ∞. n (h3) hn ∼ n−a , where 0 < a < min(1/(2d), 4/(d(d + 4))). (h4) hn ∼ n−a , where 0 < a < 1/d − r, with r in (b2). (w) wn = an (log n/n)1/(d+4) , an → a0 > 0. Note that (h3) implies (h1) and (h2), (h4) implies (h3), and (b2) implies (b1). Among these assumptions, (e3), (e4), (f1), (f2), (g), (k1), (m1)-(m5), (h1)-(h3), (w), and part of (e1) and (e2), are similar as in K-N when no data are missing; conditions on δ and ∆ in (e1) and (e2) 7 are for the missing data at random setup; (m6) and (a) are used for the imputation method, while (k2), (a), (b1), (b2), and (h4) are for the inverse probability weighting method. An example of r in (b2) and a in (h4) is r = (2d + 1)/(2d(d + 1)), a = 1/(2d(d + 1)). We need the following notation in the proofs later. For i = 1, · · · , n, j = 1, 2, x ∈ Rd , define ε∗ := i2 ε∗ := δi εi , i1 δi ε, ∆(Xi ) i ˜ Yi1 := mθ0 (Xi ) + ε∗ , i1 (1.5) ˜ Yi2 := mθ0 (Xi ) + ε∗ , i2 ∗ ∗ Kwi := Kw (x − Xi ), Khi (x) := Kh (x − Xi ), n n ˆ fh (x) := n−1 dψ(x) := ˜ ˜ Kbi (x) := Kb (x − Xi ), Khi (x), i=1 {f (x)}−2 dG(x), ˆ fw (x) := n−1 n ∗ Kwi (x), ˆ fb (x) := n−1 i=1 i=1 ˆ ˆ dψh (x) := {fh (x)}−2 dG(x), n µn (x, θ) := ˆ ˆ dψw (x) := {fw (x)}−2 dG(x), n n−1 Khi (x)mθ (Xi ), µn (x, θ) := ˙ n−1 i=1 n Zn (x, θ) := n−1 Khi (x)mθ (Xi ), ˙ i=1 Khi (x)(mθ (Xi ) − mθ0 (Xi )), i=1 µh (x) := E µn (x, θ0 ) = EKh (x − X)mθ0 (X), ˙ ˙ ˙ n µnδ (x, θ) := n−1 ˙ Khi (x)(1 − δi )mθ (Xi ), ˙ i=1 µhδ (x) := E µnδ (x, θ0 ) = EKh (x − X)(1 − δ)mθ0 (X), ˙ ˙ ˙ n Unj (x, θ) := n−1 ˜ Khi (x)(Yij − mθ (Xi )), i=1 n ˆ Unj (x, θ) := n−1 ˆ Khi (x)(Yij − mθ (Xi )), i=1 n Unj (x) := Unj (x, θ0 ) = n−1 Khi (x)ε∗ , ij i=1 n Tnj (θ) := ˜ Kbi (x), n−1 2 ˜ Khi (x)(Yij − mθ (Xi )) dψ(x), i=1 8 θ ∈ Rq , n n−1 ˜ Tnj (θ) := 2 ˆ ˜ Khi (x)(Yij − mθ (Xi )) dψw (x), θ ∈ Rq , i=1 ˜ ˜ θnj := arg min Tnj (θ), θ∈Θ ˜ εij := Yij − mθ (Xi ), ˜ ˜ nj n 2 n−1 ˜ Anj := ˆ ˜ Khi (x)(Yij − Yij ) dψ(x), i=1 n Cnj := n−2 2 ˜ Khi (x)(Yij − mθ0 (Xi ))2 dψ(x), i=1 rn (x) := ˆ 1 1 − , ˆ n (x) ∆(x) ∆ un := αn − θ0 , ˆ rni := rn (Xi ), ˆ ˆ dni := mαn (Xi ) − mθ0 (Xi ) − uT mθ0 (Xi ). ˆ n ˙ The following lemmas are found useful in proofs later. Lemma 1.2.1 is facilitated by Mack and Silverman (1982), and Lemma 1.2.3 is derived by Theorem 3 of Collomb and H¨rdle a (1986). Lemma 1.2.1. Under the conditions (f1), (k1), (h1), and (h2), the following hold. ˆ sup |fh (x) − f (x)| = op (1), (1.6) ˆ sup |fw (x) − f (x)| = op (1), (1.7) x∈I x∈I f (x) − 1 = op (1). ˆ x∈I fw (x) (1.8) sup Lemma 1.2.2. (Theorem 2.2 part (2), Bosq (1998)) Under the assumptions (f2), (k1), and (w), we have for ∀k > 0, and k ∈ N, ˆ (logk n)−1 (n/ log n)2/(d+4) sup fw (x) − f (x) → 0, x∈I 9 a.s. (1.9) Lemma 1.2.3. Suppose (e2), (f2), (k2), and (b1) hold, then ˆ sup |fb (x) − f (x)| = op (1), (1.10) ˆ sup |∆n (x) − ∆(x)| = op (1), (1.11) x∈I x∈I 1 1 − = op (1), ˆ n (x) ∆(x) x∈I ∆ 1 1 n1/2 bd/2 (log n)−1/2 sup − = Op (1). ˆ ∆(x) x∈I ∆n (x) sup 1.3 (1.12) (1.13) Consistency of the minimum distance estimators ˆ In this section we prove the consistency of the minimum distance estimators θnj , j = 1, 2, under H0 . To state the results, we need Lemma 3.1 in K-N as a preliminary reproduced here for the sake of completeness. Let L2 (G) denote a class of square integrable real valued functions on Rd with respect to G. Define ρ(ν1 , ν2 ) := (ν1 (x) − ν2 (x))2 dG(x), ν1 , ν2 ∈ L2 (G), and the map M(ν) := arg min ρ(ν, mθ ), ν ∈ L2 (G). θ∈Θ Lemma 1.3.1. (Koul and Ni (2004)) Let m satisfy conditions (m1)-(m3). Then the following hold. (a) M(ν) always exists, ∀ν ∈ L2 (G). 
10 (b) If M(ν) is unique, then M is continuous at ν in the sense that for any sequence of {νn } ∈ L2 (G) converging to ν in L2 (G), M(νn ) → M(ν), i.e., ρ(νn , ν) → 0 implies M(νn ) → M(ν) as n → ∞. (c) M(mθ (·)) = θ, uniquely for ∀θ ∈ Θ. ˆ We now proceed to state and prove the consistency of θnj , j = 1, 2. Theorem 1.3.1. Under H0 , (e1), (e2), (f1), (k1), (m1)-(m4), (a), (h1), and (h2), ˆ θnj →p θ0 , j = 1, 2. Proof. The basic idea of the proof is the same as in K-N, Theorem 3.1; Only details ˆ ˜ with respect to Yij − Yij , i = 1, · · · , n, are different. By part (c) in Lemma 1.3.1, one has ˆ θnj = M(mθ ), j = 1, 2, and θ0 = M(mθ0 ). Then it suffices to prove ρ(mθ , mθ0 ) = op (1), ˆ ˆ nj nj j = 1, 2, by part (b1) in Lemma 1.3.1. Define n mnj (x) := ˜ n−1 n ˜ ˆ Khi (x)Yij /fw (x), mnj (x) := ˆ i=1 n−1 ˆ ˆ Khi (x)Yij /fw (x), i=1 ˆ Rnj (θ) = [mnj (x) − mθ (x)]2 dG(x), ˆ θ ∈ Rq , Cn (θ) := ˆ ˆ [µn (x, θ) − fw (x)mθ ]2 dψw (x). ˆ ˆ βnj := arg min Rnj (θ), θ∈Θ By the fact that ˆ ˆ ˆ ρ(mθ , mθ0 ) ≤ 2[ρ(mθ , mnj ) + ρ(mnj , mθ0 )] = 2[Rnj (θnj ) + Rnj (θ0 )], ˆ ˆ ˆ ˆ nj nj 11 it suffices to show ˆ Rnj (θ0 ) = op (1), j = 1, 2, ˆ ˆ Rnj (θnj ) = op (1), (1.14) j = 1, 2. (1.15) If we can prove (1.14) and the following result ˆ ˆ sup |Tnj (θ) − Rnj (θ)| = op (1), j = 1, 2, (1.16) θ∈Θ ˆ we can obtain (1.15). This is because the definition of βnj and (1.14) lead to the result ˆ ˆ ˆ ˆ Rnj (βnj ) = op (1), which together with (1.16) leads to Tnj (βnj ) = op (1); by the definition of ˆ ˆ ˆ θnj , one has Tnj (θnj ) = op (1); this result and (1.16) bring the claim (1.15). Therefore, we only need to prove (1.14) and (1.16). ˜ Recall Anj from (1.5). To prove (1.14), note that ˆ [fw (x)(mnj (x) − mnj (x)) + Unj (x) ˆ ˜ ˆ Rnj (θ0 ) = ˆ ˆ +µn (x, θ0 ) − fw (x)mθ0 (x)]2 dψw (x) ≤ 3 +3 [mnj (x) − mnj (x)]2 dG(x) + 3 ˆ ˜ 2 ˆ Unj (x)dψw (x) ˆ ˆ [µn (x, θ0 ) − fw (x)mθ0 (x)]2 dψw (x) ˆ2 ˜ ˜ ≤ 3(1 + sup |f 2 (x)/fw (x) − 1|)Anj + 3Tnj (θ0 ) + 3Cn (θ0 ), j = 1, 2, x∈I By Fubini, the continuity of f , σ 2 , and ∆, assured by (e2) and (f1), and by (k1) and (h2), we have 12 E 2 Un1 (x)dψ(x) = n−1 2 EKh (x − X)∆(X)σ 2 (X)dψ(x) = O((nhd )−1 ), E 2 Un2 (x)dψ(x) = n−1 2 EKh (x − X){∆(X)}−1 σ 2 (X)dψ(x) = O((nhd )−1 ), so that Tnj (θ0 ) = 2 Unj (x)dψ(x) = Op ((nhd )−1 ), j = 1, 2. Together by (1.8), we have ˆ ˜ Tnj (θ0 ) ≤ sup |f (x)/fw (x)|2 Tnj (θ0 ) = Op ((nhd )−1 ), j = 1, 2. x∈I The claim Cn (θ0 ) = op (1) can be derived by the same argument as that of proving (3.5) in K-N. Note that for i = 1, · · · , n, ˆ ˜ Yi1 − Yi1 = (1 − δi )(mαn (Xi ) − mθ0 (Xi )), ˆ ˆ ˜ Yi2 − Yi2 = rni δi εi + 1 − ˆ (1.17) δi (mαn (Xi ) − mθ0 (Xi )) ˆ ∆(Xi ) −ˆni δi (mαn (Xi ) − mθ0 (Xi )). r ˆ Recall un and dni from (1.5). By calculation in (3.9) in K-N, (m4), and (a), we have |d |2 ˜ An1 ≤ 2 un 2 max ni 2 1≤i≤n un ˆ2 fh (x)dψ(x) n +2 un n−1 2 Khi (x)(1 − δi ) mθ0 (Xi ) ˙ 2 dψ(x) i=1 = op (1), (1.18) n ˜ An2 ≤ 3 +3 n−1 2 Khi (x)ˆni δi εi dψ(x) r i=1 n n−1 i=1 Khi (x) 1 − 2 δi (mαn (Xi ) − mθ0 (Xi )) dψ(x) ˆ ∆(Xi ) 13 n n−1 +3 2 Khi (x)ˆni δi (mαn (Xi ) − mθ0 (Xi )) dψ(x) r ˆ i=1 n ≤ 3 n−1 2 Khi (x)ˆni δi εi dψ(x) r i=1 +6 un 2 d2 ni max 1≤i≤n un 2 n n−1 Khi (x) 1 + i=1 n +6 un 2 +6 +6 n−1 Khi (x) 1 − δi ∆(Xi ) 2 δi ∆(Xi ) mθ0 (Xi ) ˙ 2 dψ(x) dψ(x) i=1 n 2 d2 ni Khi (x)δi dψ(x) n−1 sup rni ˆ2 un 2 max 1≤i≤n un 2 1≤i≤n i=1 n 2 n−1 Khi (x)δi mθ0 (Xi ) dψ(x) ˙ un 2 sup rni ˆ2 1≤i≤n i=1 = op (1), (1.19) Therefore, together with (1.8) and (1.12), we obtain (1.14). 
To prove (1.16), write ˆ ˆ Tnj (θ) − Rnj (θ) mnj (x) − ˆ = = − −2 µn (x, θ) 2 dG(x) − ˆ fw (x) [mnj (x) − mθ (x)]2 dG(x) ˆ 2 µn (x, θ) − mθ (x) dG(x) ˆ fw (x) mnj (x) − ˆ µn (x, θ) ˆ fw (x) µn (x, θ) − mθ (x) dG(x), ˆ fw (x) j = 1, 2. By Cauchy-Schwarz (C-S) inequality, we have 1/2 ˆ ˆ ˆ1/2 sup |Tnj (θ) − Rnj (θ)| ≤ sup Cn (θ) + 2 sup Tnj (θ)Cn (θ), θ∈Θ θ∈Θ θ∈Θ Hence it suffices to prove 14 j = 1, 2. sup Cn (θ) = op (1), θ∈Θ ˆ sup Tnj (θ) = Op (1), j = 1, 2. (1.20) θ∈Θ One can prove the first claim in (1.20) by the same argument as in proving (3.14) in K-N. ˜ To prove the second part of (1.20), note that by adding and subtracting Yij to the i-th ˆ summand in Tnj (θ), we obtain ˆ2 ˜ ˆ Tnj (θ) ≤ 2(1 + sup |f 2 (x)/fw (x) − 1|) Anj + x∈I [Unj (x) − Zn (x, θ)]2 dψ(x) ˆ2 ≤ 2(1 + sup |f 2 (x)/fw (x) − 1|) x∈I ˜ × Anj + 2 2 Unj (x)dψ(x) + 2 From (3.16) in K-N, one obtains supθ∈Θ 2 Zn (x, θ)dψ(x) . 2 Zn (x, θ)dψ(x) = Op (1). By (1.8) and Anj = op (1), 2 ˆ Unj (x)dψ(x) = op (1) in the argument above, we have supθ∈Θ Tnj (θ) = Op (1), j = 1, 2. Thus the proof of the theorem is complete. 15 1.4 Asymptotic distribution of the minimum distance estimators under H0 ˆ This section states and proves the asymptotic normality of θnj , j = 1, 2. To proceed further, we need the following notation. Let Σ0 := mθ0 (x)mT (x)g(x)dx, ˙ ˙θ Σ∗ := 0 (1 − ∆(x))mθ0 (x)mT (x)g(x)dx, ˙ ˙θ Σ1 := σ 2 (x)∆(x)mθ0 (x)mT (x)g 2 (x)(f (x))−1 dx ˙ ˙θ (1.21) 0 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)g(x)dx Σ−1 Σ∗ ˙ 0 0 Σ∗ + 2 1 0 ˜ +Σ∗ Σ−1 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)f (x)dx Σ−1 Σ∗ , ˙ 0 0 0 Σ2 := σ 2 (x)mθ0 (x)mT (x)g 2 (x)(∆(x)f (x))−1 dx, ˙ ˙θ ˜ Σ0 := ∆(x)mθ0 (x)mT (x)f (x)dx, ˙ ˙θ 0 0 n ˜ Σn := n−1 i=1 δi mθ0 (Xi )mT (Xi ), ˙ ˙θ 0 n ˜ Sn := n−1 δi εi mθ0 (Xi ), ˙ Snj := Unj (x)µh (x)dψ(x), ˙ j = 1, 2. i=1 Theorem 1.4.1. Suppose H0 , (e1), (e2), (e3), (f1), (f2), (g), (k1), (m1)-(m5), (a), and (h3) hold. Then, ˆ n1/2 (θn1 − θ0 ) = Σ−1 n1/2 {Sn1 + Σ∗ (ˆ n − θ0 )} + op (1), 0 α 0 (1.22) where αn is in (1.2), and ˆ ˆ n1/2 (θn1 − θ0 ) = Op (1). 16 (1.23) T If under H0 , mθ0 (x) is a linear function of θ0 , i.e. mθ0 (x) = θ0 l(x), for all x ∈ I, where ˜n l(x) satisfies (m1)-(m3) and (m6), we can take αn = Σ−1 {n−1 ˆ n ˙ i=1 δi Yi mθ0 (Xi )}, which is the least square estimator and satisfies condition (a), and we obtain ˆ ˜ ˜ n1/2 (θn1 − θ0 ) = Σ−1 n1/2 {Sn1 + Σ∗ Σ−1 Sn } + op (1). 0 n 0 (1.24) If (k2), (b2), and (h4) hold, one has ˆ n1/2 (θn2 − θ0 ) = Σ−1 n1/2 Sn2 + op (1). 0 (1.25) Consequently, (1.24) and (1.25) lead to ˆ n1/2 (θnj − θ0 ) →d Nq (0, Σ−1 Σj Σ−1 ), 0 0 j = 1, 2. (1.26) ˜ ˜ ˜ Here Σ0 , Σ∗ , Σ0 , Σn , Sn , Snj , and Σj , j = 1, 2, are in (1.21). 0 Proof. We prove the theorem in two steps, following the routine to prove Theorem 4.1 in K-N. Step 1. The first step is to show that ˆ nhd θnj − θ0 2 = Op (1), Let Dn (θ) := j = 1, 2. (1.27) 2 Zn (x, θ)dψ(x). Note that ˆ nhd Dn (θnj ) = nhd ˆ ˆnj − θ0 2 Dn (θnj ) , θ ˆ θnj − θ0 2 17 j = 1, 2. It suffices to prove ˆ nhd Dn (θnj ) = Op (1), j = 1, 2, (1.28) because the rest follows the a similar argument used in proving (4.4) in K-N, if the correˆ ˆ sponding θn is changed to θnj , j = 1, 2. Observe that ˆ nhd Dn (θnj ) = nhd ˆ ˆ ˆ [Unj (x, θnj ) − Unj (x, θ0 )]2 dψ(x) ˆ2 ≤ 2nhd (1 + sup |fw (x)/f 2 (x) − 1|) x∈I × ˆ ˆ ˆ2 Unj (x, θnj )dψw (x) + ˆ ˆ2 Unj (x, θ0 )dψw (x) ˆ2 ˆ ≤ 4nhd (1 + sup |fw (x)/f 2 (x) − 1|)Tnj (θ0 ) x∈I ≤ 8nhd (1 + ˆ2 ˆ2 ˜ sup |fw (x)/f 2 (x) − 1|)(1 + sup |f 2 (x)/fw (x) − 1|){Tnj (θ0 ) + Anj }. 
x∈I x∈I ˜ By (1.7), (1.8), and Tnj (θ0 ) = Op ((nhd )−1 ), j = 1, 2, it suffices to prove nhd Anj = Op (1), j = 1, 2. This result hold for j = 1 because of (1.18). When j = 2, by (a), (1.12), and calculation in (1.19), it suffices to show the following results: n nhd n−1 2 Khi (x)ˆni δi εi dψ(x) = Op (1). r (1.29) i=1 To prove (1.29), we have n nhd E n−1 n 2 Khi (x)ˆni δi εi dψ(x) = n−1 hd r i=1 E rni δi ε2 ˆ 2 i i=1 18 2 Khi (x)dψ(x) = hd E δˆn (X)ε2 r2 = h−d E × 2 Kh (x − X)dψ(x) K 2 ((x − z)/h){f (x)}−2 g(x)f (z){∆(z)}−1 σ 2 (z) ˜ ˜ (∆(z) − 1)K(0) + n (∆(z) − δi )Kbi (z) 2 i=2 dzdx ˜ ˜ K(0) + n δi Kbi (z) i=2 K 2 (u){f (z + uh)}−2 g(z + uh)f (z){∆(z)}−1 σ 2 (z) = ×E ˜ ˜ (∆(z) − 1)K(0) + n (∆(z) − δi )Kbi (z) 2 i=2 dzdu, ˜ ˜ K(0) + n δi Kbi (z)) i=2 where the last equality is derived by Fubini’s theorem. Let Bn (z) := E ˜2 ˜ (∆(z) − 1)2 K 2 (0) + n (∆(z) − δi )2 Kbi (z) i=1 , ˜ ˜ [K(0) + n δi Kbi (z))]2 i=1 z ∈ Rd . Let Ib be the bn -neighborhood of compact set I. By (e2), (f1), and (k1), it is sufficient ˜ to show supz∈I Bn (z) = O(1). Let In (z) := K(0) + b n ˜ i=1 δi Kbi (z), z ∈ Ib , n ≥ 1, and ˜ I0 (z) ≡ K(0). For any z ∈ Ib , write Bn (z) = Bn1 (z) + B2n (z) + 2Bn3 (z) − 2Bn4 (z), where ˜ Bn1 (z) := E [In (z)]−2 (∆(z) − 1)2 K 2 (0) , n Bn2 (z) := E [In (z)]−2 Bn3 (z) := E [In (z)]−2 ˜2 (∆(z) − δi )2 Kbi (z) , i=1 ˜ ˜ (∆(z) − δi )(∆(z) − δj )Kbi (z)Kbj (z) , 1≤i 0, 21 [−1,1]d (a + c0 )−2 du − [−1,1]d ˜ (a + K(u))−2 du ˜ ˜ K 2 (u) + 2aK(u) − c2 − 2ac0 0 du 2 (a + K(u))2 ˜ d (a + c0 ) [−1,1] ˜ ˜ K 2 (u) − c2 K(u) − c0 0 du + 2a = (a + c0 )−2 du ˜ ˜ [−1,1]d (a + K(u))2 [−1,1]d (a + K(u))2 ˜ c2 K 2 (u) −2 −2 0 du du − a ≥ (a + c0 ) 2 2 [−1,1]d a [−1,1]d (2a) ˜ c0 K(u) +2a du − du 2 2 [−1,1]d a [−1,1]d (2a) = ≥ (a + c0 )−2 (2a)−2 ˜ K 2 (u)du − 2d+2 c2 + (2a)−1 1 − 2d+2 c0 0 ≥ 0, thus, E [−1,1]d ˜ [In−1 + K(u)]−2 du = E En−1 ≤ E [−1,1]d [−1,1]d ˜ [In−1 + K(u)]−2 du [In−1 + c0 ]−2 du = 2d E[In−1 + c0 ]−2 . By a similar argument used in proving (1.30), we have k, j = 0, 1, · · · , n, E[In−k (z) + jc0 ]−2 ≤ {1 − p(2b)d ρ1∗ (z, b)}E[In−k−1 (z) + jc0 ]−2 +p(2b)d ρ∗ (z, b)E[In−k−1 (z) + (j + 1)c0 ]−2 . 1 Therefore, by (1.30) and (1.31), the following hold: 22 (1.31) E[In (z)]−2 ≤ {1 − p(2b)d ρ1∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)}E[In−2 (z)]−2 +p(2b)d ρ∗ (z, b)E[In−2 (z) + c0 ]−2 1 +{p(2b)d ρ∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)}E[In−2 (z) + c0 ]−2 1 +p(2b)d ρ∗ (z, b)E[In−2 (z) + 2c0 ]−2 1 2 = k=0 2 {1 − p(2b)d ρ1∗ (z, b)}2−k {p(2b)d ρ∗ (z, b)}k E[In−2 (z) + kc0 ]−2 1 k ≤ {1 − p(2b)d ρ1∗ (z, b)}2 {1 − p(2b)d ρ1∗ (z, b)}E[In−3 (z)]−2 +p(2b)d ρ∗ (z, b)E[In−3 (z) + c0 ]−2 1 +2{1 − p(2b)d ρ1∗ (z, b)}{p(2b)d ρ∗ (z, b)} {1 − p(2b)d ρ1∗ (z, b)} 1 ×E[In−3 (z) + c0 ]−2 + p(2b)d ρ∗ (z, b)E[In−3 (z) + 2c0 ]−2 1 +{p(2b)d ρ∗ (z, b)}2 {1 − p(2b)d ρ1∗ (z, b)}E[In−3 (z) + 2c0 ]−2 1 +p(2b)d ρ∗ (z, b)E[In−3 (z) + 3c0 ]−2 1 3 = k=0 3 {1 − p(2b)d ρ1∗ (z, b)}3−k {p(2b)d ρ∗ (z, b)}k E[In−3 (z) + kc0 ]−2 1 k ≤ ··· n ≤ k=0 n {1 − p(2b)d ρ1∗ (z, b)}n−k {p(2b)d ρ∗ (z, b)}k E[I0 (z) + kc0 ]−2 1 k ˜ ≤ {1 − p(2b)d ρ1∗ (z, b)}n [K(0)]−2 n +c−2 0 k=1 n −2 k {1 − p(2b)d ρ1∗ (z, b)}n−k {p(2b)d ρ∗ (z, b)}k . 1 k (1.32) By (e2) and (f1), for large enough n, f (x) and ∆(x) are bounded and bounded below from zero, and Lipschitz-continuous on Ib . Let f and ∆ denote the Lipschitz constants of f and 23 ∆, respectively. Define c1 := min ρ1∗ (z, b) > 0, c2 := ( f sup ∆(z) + ∆ sup f (z)), z∈Ib p(z, b) := ˜ z∈I2b z∈I2b p(2b)d ρ∗ (z, b) 1 . 
1 + p(2b)d (ρ∗ (z, b) − ρ1∗ (z, b)) 1 By (1.30) and the fact that supz∈I (ρ∗ (z, b) − ρ1∗ (z, b)) ≤ 2bd1/2 c2 , we have b 1 ˜ E[In (z)]−2 ≤ {1 − p(2b)d c1 }n [K(0)]−2 + c−2 {1 + p(2b)d+1 d1/2 c2 }n 0 n × k=1 n −2 k {1 − p(z, b)}n−k {˜(z, b)}k . ˜ p k (1.33) Hence, nbd E[In (z)]−2 ≤ ˜ nbd [K(0)]−2 +c−2 0 {1 − p(2b)d c {1 + p(2b)d+1 d1/2 c n ×nbd k=1 1 2 d −(p(2b)d c1 )−1 −n(2b) pc1 } d+1 pd1/2 c 2 (p(2b)d+1 d1/2 c2 )−1 n(2b) } n −2 k {1 − p(z, b)}n−k {˜(z, b)}k . ˜ p k Note that n!(k!)−1 ((n − k)!)−1 {1 − p(z, b)}n−k {˜(z, b)}k is the probability mass function of ˜ p the Binomial(n, p(z, b)) distribution. Recall the Chernoff’s bound for a r.v. ζ ∼ B(n, p0 ), ˜ and a constant η ∈ (0, 1), P (ζ < (1 − η)np0 ) < exp(−np0 η 2 /2). Using this bound, with η = 1/2, we obtain that for any z ∈ Ib , 24 n nbd k=1 n −2 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k n˜(z,b)/2 p = nbd n n −2 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k + k=1 n˜(z,b)/2 p k= n˜(z,b)/2 +1 p n {1 − p(z, b)}n−k {˜(z, b)}k + nbd {n˜(z, b)/2}−2 ˜ p p k ≤ nbd ≤ k=1 d exp(−n˜(z, b)/8) + nbd {n˜(z, b)/2}−2 nb p p = nbd exp(−nbd 2d−3 pc1 (1 + p(2b)d+1 d1/2 c2 )−1 ) +(nbd )−1 41−d p−2 c−2 (1 + p(2b)d+1 d1/2 c2 )2 = O((nbd )−1 ), 1 by condition (b1). Together with the fact that d −1 {1 − p(2b)d c1 }−(p(2b) c1 ) → exp(1), d+1 d1/2 c )−1 2 {1 + p(2b)d+1 d1/2 c2 }(p(2b) → exp(1), we have nbd E[In (z)]−2 = O((nbd )−1 ), z ∈ Ib , sup nbd E[In (z)]−2 = O((nbd )−1 ). z∈Ib Hence sup Bn1 (z) = O((nbd )−2 ), z∈Ib sup Bn2 (z) = O((nbd )−1 ). z∈Ib 25 Observe that Bn3 (z) ˜ ˜ (∆(z) − δn−1 )(∆(z) − δn )Kb(n−1) (z)Kbn (z) = n(n − 1)E ˜ ˜ [In−2 (z) + δn−1 K (z) + δn Kbn (z)]2 b(n−1) = n(n − 1)b2d E ˜ ˜ K(u)K(v) 2 ∆ (z)(1 − ∆(z − bu))(1 − ∆(z − bv)) [In−2 (z)]2 ˜ ˜ K(u)K(v) −2 ∆(z)(1 − ∆(z)) ˜ [In−2 (z) + K(u)]2 ×∆(z − bu)(1 − ∆(z − bv)) + ˜ ˜ K(u)K(v) (∆(z) − 1)2 2 ˜ ˜ [In−2 (z) + K(u) + K(v)] ×∆(z − bu)∆(z − bv) ×f (z − bu)f (z − bv)dudv, thus we have |Bn3 (z)| ≤ n(n − 1)b2d E ˜ ˜ K(u)K(v) 2 [∆ (z)(1 − ∆(z − bu))(1 − ∆(z − bv)) [In−2 (z)]2 −2∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) +(∆(z) − 1)2 ∆(z − bu)∆(z − bv)] +2 ˜ ˜ ˜ ˜ K(u)K(v) K(u)K(v) − ˜ [In−2 (z)]2 [In−2 (z) + K(u)]2 ×∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) ×f (z − bu)f (z − bv)dudv ≤ n(n − 1)b2d E ˜ ˜ K(u)K(v) [ |∆(z) − ∆(z − bu)||∆(z) − ∆(z − bv)| [In−2 (z)]2 +∆(z)(1 − ∆(z))|∆(z − bv) − ∆(z − bu)| ] 26 +4 ˜ ˜ K 2 (u)K(v) ∆(z)(1 − ∆(z))∆(z − bu)(1 − ∆(z − bv)) [In−2 (z)]3 ×f (z − bu)f (z − bv)dudv ≤ nbd+2 (nbd )E[In−2 (z)]−2 2 ∆ ˜ ˜ K(u)K(v) u × v f (z − bu)f (z − bv)dudv +nbd+1 (nbd )E[In−2 (z)]−2 ∆(z)(1 − ∆(z)) ∆ ˜ ˜ K(u)K(v) u − v f (z − bu)f (z − bv)dudv × +4(nbd )2 E[In−2 (z)]−3 ∆(z)(1 − ∆(z)) ˜ ˜ K 2 (u)K(v)∆(z − bu)(1 − ∆(z − bv))f (z − bu)f (z − bv)dudv. × By a similar argument used in proving (1.33), for z ∈ Ib and j = 3, 4, · · · , one has ˜ E[In (z)]−j ≤ {1 − p(2b)d c1 }n [K(0)]−j −j +c0 {1 + p(2b)d+1 d1/2 c2 }n n k=1 n −j k {1 − p(z, b)}n−k {˜(z, b)}k , ˜ p k hence by (b1) and Chernoff’s bound, we obtain that n2 b2d E[In (z)]−3 ˜ ≤ n2 b2d [K(0)]−3 {1 − p(2b)d c1 }n n +c−3 {1 + p(2b)d+1 d1/2 c2 }n × n2 b2d 0 k=1 ≤ ˜ n2 b2d [K(0)]−3 {1 − p(2b)d c 1 n −3 k {1 − p(z, b)}n−k {˜(z, b)}k ˜ p k d −(p(2b)d c1 )−1 −n(2b) pc1 } +c−3 {1 + p(2b)d+1 d1/2 c2 }(p(2b) 0 d+1 d1/2 c )−1 n(2b)d+1 pd1/2 c2 2 × n2 b2d {n˜(z, b)/2}−3 + n2 b2d exp(−n˜(z, b)/8) p p 27 ˜ ∼ n2 b2d [K(0)]−3 exp(−n(2b)d pc1 ) +c−3 exp(n(2b)d+1 pd1/2 c2 ) (nbd )−1 81−d p−3 c−3 (1 + p(2b)d+1 d1/2 c2 )3 0 1 +n2 b2d exp(−nbd 2d−3 pc1 (1 + p(2b)d+1 d1/2 c2 )−1 ) = O((nbd )−1 ), for any z ∈ Ib , and supz∈I |Bn3 (z)| = O((nbd )−1 ). 
With the fact that b ˜ Bn4 (z) = 2n(1 − ∆(z))K(0)E ˜ (∆(z) − δn )Kbn (z) [In (z)]2 ˜ = 2nbd (1 − ∆(z))K(0) ×E ∆(z)(1 − ∆(z − bu)) ˜ K(u)f (z − bu) [In−1 (z)]2 (1 − ∆(z))∆(z − bu) − du, ˜ [In−1 (z) + K(u)]2 we have |Bn4 (z)| ˜ ≤ 2nbd (1 − ∆(z))K(0) ×E ˜ K(u) |∆(z) − ∆(z − bu)| [In−1 (z)]2 ˜ ˜ K(u) K(u) + − (1 − ∆(z))∆(z − bu) f (z − bu)du ˜ [In−1 (z)]2 [In−1 (z) + K(u)]2 ˜ = 2nbd+1 (1 − ∆(z))K(0)E[In−1 (z)]−2 ∆ ˜ +4nbd (1 − ∆(z))2 K(0)E[In−1 (z)]−3 ˜ K(u) u f (z − bu)du ˜ K 2 (u)∆(z − bu)f (z − bu)du = O(b(nbd )−1 ) + O((nbd )−2 ) = O((nbd )−2 ), and supz∈I |Bn4 (z)| = O((nbd )−2 ). Thus we have supz∈I |Bn (z)| = O((nbd )−1 ), and b b 28 n n−1 nhd 2 Khi (x)ˆni δi εi dψ(x) = Op ((nbd )−1 ). r i=1 Moreover, one obtains n n−1 2 Khi (x)ˆni δi εi dψ(x) = Op ((nhd )−1 (nbd )−1 )) = op (n−1 ). r (1.34) i=1 This completes the proof of (1.29), and hence we obtain (1.27). Step 2. In this part, we shall prove (1.22)-(1.26) in two steps, (2.a) and (2.b). (2.a) We will prove (1.22), (1.24) and (1.25) by similar arguments used in proving the asymptotic normality of the minimum distance estimator when data is complete in K-N. Let ˙ ˆ Tnj (θ) := −2 ˆ ˆ Unj (x, θ)µn (x, θ)dψw (x), ˙ j = 1, 2, ˆ be the derivative of Tnj (θ) with respect to θ. Since θ0 is an interior point of Θ by condition ˆ ˆ (m4), and θnj is consistent for θnj by Theorem 1.3.1, θnj will be in the interior of Θ and ˙ ˆ ˆ Tnj (θnj ) = 0 with arbitrarily large probability for all sufficient large n. The equation ˙ ˆ Tnj (θ) = 0 is equivalent to ˆ ˆ ˆ ˆ ˆ (Unj (x, θnj ) − Unj (x, θnj ))µn (x, θnj )dψw (x) + ˙ = ˆ ˙ ˆ ˆ Zn (x, θnj )µn (x, θnj )dψw (x), ˆ ˆ Unj (x)µn (x, θnj )dψw (x) ˙ j = 1, 2. (1.35) ˆ By similar proof as that of (4.16) in K-N, the right-hand side of (1.35) equals Rn (θnj − θ0 ) for all n ≥ 1, with Rn = Σ0 + op (1); while for the second term on the left-hand side, one has 29 ˆ ˆ Unj (x)µn (x, θnj )dψw (x) = Snj + op (n−1/2 ) by similar proofs as those of Lemma 4.1 and ˙ Lemma 4.2 in K-N, with Un and εi replaced by Unj and ε∗ in (1.5), respectively. Recall un ij and dni from (1.5). For the first term on the left-hand side with j = 1, note that ˆ ˆ ˆ ˆ ˆ (Un1 (x, θn1 ) − Un1 (x, θn1 ))µn (x, θn1 )dψw (x) ˙ n = un n−1 i=1 n +uT n n−1 d ˆ ˆ ˙ Khi (x)(1 − δi ) ni µn (x, θn1 )dψw (x) un ˆ ˆ Khi (x)(1 − δi )mθ0 (Xi ) µn (x, θn1 )dψw (x) := Jn1 + Jn2 . ˙ ˙ i=1 By (m4), (m5), (a), and result (1.8), we obtain n1/2 Jn1 |dni | 1≤i≤n un |d | ≤ n1/2 un max ni 1≤i≤n un ≤ n1/2 un max + max 1≤i≤n ˆ ˆ ˆ fh (x) µn (x, θn1 ) dψw (x) ˙ ˆ ˆ fh (x) µn (x, θ0 ) dψw (x) ˙ mθ (Xi ) − mθ0 (Xi ) ˙ˆ ˙ n1 ˆ ˆ fh (x)dψw (x) = op (1). Moreover, observe that T n1/2 Jn2 = n1/2 uT n ˆ µhδ (x)µT (x)dψw (x) ˙ ˙h +n1/2 uT n ˆ µhδ (x){µT (x, θ0 ) − µT (x)}dψw (x) ˙ ˙n ˙h +n1/2 uT n ˆ ˆ µhδ (x){µT (x, θn1 ) − µT (x, θ0 )}dψw (x) ˙ ˙n ˙n +n1/2 uT n ˆ {µnδ (x, θ0 ) − µhδ (x)}µT (x)dψw (x) ˙ ˙ ˙h +n1/2 uT n ˆ {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θ0 ) − µT (x)}dψw (x) ˙ ˙ ˙n ˙h +n1/2 uT n ˆ ˆ {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θn1 ) − µT (x, θ0 )}dψw (x). 
˙ ˙ ˙n ˙n 30 On the right-hand side of last equality, the last five terms are op (1), because of (m5), (a), (1.8), C-S inequality and the fact that {µn (x, θ0 ) − µh (x)}{µT (x, θ0 ) − µT (x)}dψ(x) ˙ ˙ ˙n ˙h E = V ar(µn (x, θ0 ))dψ(x) = Op ((nhd )−1 ), ˙ ˆ ˆ {µn (x, θn1 ) − µn (x, θ0 )}{µT (x, θn1 ) − µT (x, θ0 )}dψ(x) ˙ ˙ ˙n ˙n = ˆ2 fh (x)dψ(x) max (mθ (Xi ) − mθ0 (Xi ))(mT (Xi ) − mT (Xi )) = op (hd ), ˙ˆ ˙ ˙ˆ ˙θ θn1 0 n1 1≤i≤n {µnδ (x, θ0 ) − µhδ (x)}{µT (x, θ0 ) − µδ hT (x)}dψ(x) ˙ ˙ ˙ nδ ˙ E = V ar(µnδ (x, θ0 ))dψ(x) = Op ((nhd )−1 ). ˙ For the first term, by (m4), (m5), (a), (1.8), and C-S inequality, one has ˆ µh (x)µT (x)dψw (x) = Σ∗ + op (1). ˙ ˙ hδ 0 Hence (1.22) holds. If under H0 , mθ0 (x) is a linear function of θ0 , and αn is the least square ˆ ˜n ˜ estimator, we have un = Σ−1 Sn and result (1.24). To prove (1.25), it suffices to show that when j = 2, the first term in the left-hand side of (1.35) multiplied by n1/2 is op (1). Note that by C-S inequality, ˆ ˆ ˆ ˆ ˆ (Un2 (x, θn2 ) − Un2 (x, θn2 ))µn (x, θn2 )dψw (x) ˙ ≤ ˆ2 1 + sup |f 2 (x)/fw (x) − 1| x∈I By the fact that 2 ˜ An2 2 ˆ µn (x, θn2 ) 2 dψ(x). ˙ ˆ ˆ2 µn (x, θn2 ) 2 dψ(x) = Op (1), and supx∈I |f 2 (x)/fw (x) − 1| = op (1) ˙ ˜ derived by (1.8), and it suffices to prove An2 = op (n−1 ), which in turn follows (a), (1.12), 31 (1.19), and (1.34). (2.b) We shall prove (1.26) in this step. Based on (1.24) and (1.25), it suffices to prove that ˜ ˜ n1/2 {Sn1 + Σ∗ Σ−1 Sn } →d Nq (0, Σ1 ), 0 n (1.36) n1/2 Sn2 →d Nq (0, Σ2 ). (1.37) The proof of (1.37) is similar as that of Lemma 4.1 (a) in K-N, if εi , σ 2 and Σ there are replaced by δi εi /∆(Xi ), σ 2 /∆, and Σ2 in (1.5), respectively. To prove (1.36), note that ˜ ˜n ˜ n1/2 Sn = Op (1) by the Central Limit Theorem, and Σ−1 = Σ−1 + op (1) by Law of Large 0 Numbers and routine calculations. Thus we have ˜ ˜ ˜ ˜ ˜ ˜ ˜ n1/2 {Sn1 + Σ∗ Σ−1 Sn } = n1/2 {Sn1 + Σ∗ Σ−1 Sn } + Σ∗ (Σ−1 − Σ−1 )(n1/2 Sn ) 0 n 0 n 0 0 0 ˜ ˜ = n1/2 {Sn1 + Σ∗ Σ−1 Sn } + op (1), 0 0 ˜ ˜ and it suffices to show n1/2 {Sn1 + Σ∗ Σ−1 Sn } →d Nq (0, Σ1 ). Write 0 0 ˜ n1/2 {Sn1 + Σ∗ Σ−1 sn1 } 0 0 n = ˜ Khi (x)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (Xi ) δi εi ˙ 0 0 ˙ n−1/2 i=1 n = n−1/2 sni , say. i=1 Note that by (e1) and (e2), {sni , i = 1, · · · , n} are i.i.d. centered r.v.’s for each n. By the Lindeberg-Feller C.L.T., it suffices to prove that as n → ∞, 32 Es2 → Σ1 , n1 (1.38) E{s2 I(|sn1 | > n1/2 η)} → 0 ∀η > 0. n1 (1.39) By the continuity of σ 2 , ∆, f , and g, we obtain ˜ Kh (x − X)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (X) ˙ 0 0 ˙ Es2 = E n1 2 ∆(X)σ 2 (X) Kh (x − X)Kh (y − X)σ 2 (X)∆(X)µh (x)µT (y)dψ(x)dψ(y) ˙ ˙h = E +E ˜ Kh (x − X)σ 2 (X)∆(X)µh (x)mT (X)dψ(x)Σ−1 Σ∗ ˙ ˙θ 0 0 0 ˜ +Σ∗ Σ−1 E 0 0 ˙h Kh (x − X)σ 2 (X)∆(X)mθ0 (X)µT (x)dψ(x) ˙ ˜ ˜ +Σ∗ Σ−1 E[mθ0 (X)mT (X)σ 2 (X)∆(X)]Σ−1 Σ∗ ˙ ˙θ 0 0 0 0 0 σ 2 (x)∆(x)mθ0 (x)mT (x)(f (x))−1 g 2 (x)dx ˙ ˙θ → 0 +2 ˜ σ 2 (x)∆(x)mθ0 (x)mT (x)g(x)dx Σ−1 Σ∗ ˙ ˙θ 0 0 0 ˜ +Σ∗ Σ−1 0 0 ˜ ˙θ σ 2 (x)∆(x)mθ0 (x)mT (x)f (x)dx Σ−1 Σ∗ = Σ1 , ˙ 0 0 0 Hence (1.38) is proved. Note that by the H¨lder’s inequality, the L.H.S. of (1.39) with η = δ0 o in (e3) is bounded by 2+δ Cn−δ0 /2 Esn1 0 ˜ Kh (x − X)µh (x)dψ(x) + Σ∗ Σ−1 mθ0 (X) ˙ 0 0 ˙ = Cn−δ0 /2 E ≤ Cn−δ0 /2 E 2 Kh (x − X)µh (x)dψ(x) ˙ 2+δ0 ˜ +Cn−δ0 /2 E[{2Σ∗ Σ−1 mθ0 (X)}2+δ0 |δε|2+δ0 ] 0 0 ˙ 33 |δε|2+δ0 2+δ0 |δε|2+δ0 2+δ0 ≤ Cn−δ0 /2 22+δ0 E (Kh (x − X)µh (x)) 2 dψ(x) ˙ 2 dψ(x) δ0 |δε|2+δ0 ˜ +Cn−δ0 /2 E[{2Σ∗ Σ−1 mθ0 (X)}2+δ0 |δε|2+δ0 ] 0 0 ˙ = Op ((nhd )−δ0 /2 ). Therefore the proof is complete. Remark 1.4.1. (Choice of G). Assuming f = 0 implies g = 0. 
When q = 1 and σ 2 (x) ≡ σ 2 , ˆ a constant, the asymptotic variance of θn1 satisfies ˜ v1 : = σ 2 Σ−1 + σ 2 Σ−2 0 0 ∆(x)m2 (x)(f (x))−1 g 2 (x)dx ˙θ ∆(x)m2 (x)f (x)dx ˙θ − 0 −1 0 ∆(x)m2 (x)g(x)dx ˙θ 2 0 ˜ ≥ σ 2 Σ−1 , 0 because, by C-S inequality, ∆(x)m2 (x)g(x)dx ˙θ 2 0 ˙ ∆1/2 (x)mθ0 (x)f 1/2 (x)∆1/2 (x)mθ0 (x)f −1/2 (x)g(x)dx ˙ = ≤ ∆(x)m2 (x)f (x)dx ˙θ 0 2 ∆(x)m2 (x)(f (x))−1 g 2 (x)dx, ˙θ 0 ˆ with equality if and only if g ∝ f ; and the asymptotic variance of θn2 satisfies v2 : = σ 2 ≥ σ2 (∆(x))−1 m2 (x)g 2 (x)(f (x))−1 dx ˙θ 0 ∆(x)m2 (x)f (x)dx ˙θ 0 ˜ = σ 2 Σ−1 , 0 34 −1 m2 (x)g(x)dx ˙θ 0 −2 because m2 (x)g(x)dx ˙θ 2 0 (∆(x))−1/2 mθ0 (x)g(x)(f (x))−1/2 (∆(x))1/2 mθ0 (x)(f (x))1/2 dx ˙ ˙ = ≤ (∆(x))−1 m2 (x)g 2 (x)(f (x))−1 dx ˙θ 0 2 ∆(x)m2 (x)f (x)dx, ˙θ 0 with equality if and only if g ∝ f ∆. This implies that both lower bounds on the asymptotic ˆ variances of θnj , j = 1, 2, are at that of the least square estimator’s when the regression function is linear. 1.5 Asymptotic distribution of the test statistics under H0 ˆ In this section we shall discuss the asymptotic null distribution of Dnj in Theorem 1.5.1. Theorem 1.5.1. Assume that H0 , (e1), (e2), (e3), (e4), (f1), (f2), (g), (k1), (m1)-(m5), (a), and (h3) hold. Then, ˆ Dn1 →d N (0, 1). If, in addition, (k2), (b2), and (h4) hold, then, ˆ Dn2 →d N (0, 1). ˆ Consequently, for each j = 1, 2, the test that rejects H0 whenever |Dnj | > zα/2 , is of the 35 asymptotic size α. The proof of Theorem 1.5.1 is facilitated by Lemma 1.5.2-1.5.7. The idea of the proof is similar to that of Theorem 5.1 in K-N. Lemma 1.5.1 is applied to prove Lemma 1.5.2. ˜ Lemma 1.5.1. (Theorem 1 of Hall (1984)) Let Xi , 1 ≤ i ≤ n, be i.i.d. random vectors, and let ˜ ˜ Hn (Xi , Xj ), Un := ˜ ˜ Gn (x, y) = EHn (X1 , x)Hn (X1 , y), 1≤i 0 implies |Γnj Γ−1 − 1| = op (1), j = 1, 2. j Proof. The proof of Lemma 1.5.7 is similar to that of Lemma 5.5 in K-N. Recall vw , ti1 , ti2 , si , ai , ci , qi from (1.43), and vn , tn1 , tn2 , sn , an , cn , qn , wn1 , wn2 from (1.44). Let for k = 1, 2, ˜ Γnk := 2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) ik jk 2 . i=j From result (1.41), it suffices to show ˜ Γnk − Γnk = op (1), ˆ ˜ Γnk − Γnk = op (1), k = 1, 2. (1.45) The first claim in (1.45) is proved similarly as (5.13) in K-N. For the second claim, note that 43 ˆ ˜ Γn1 − Γn1 = 2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )(1 + vw (x))dψ(x) j1 i1 i=j −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i1 j1 2 i=j = 2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )dψ(x) j1 i1 2 i=j +2hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )vw (x)dψ(x) i1 j1 2 i=j +4hd n−2 Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )dψ(x) i1 j1 i=j × Khi (x)Khj (x)(ε∗ + wi1 )(ε∗ + wj1 )vw (x)dψ(x) i1 j1 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i1 j1 −2hd n−2 2 , i=j ˆ ˜ Γn2 − Γn2 = 2hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }(1 + vw (x))dψ(x) j2 −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i2 j2 2 i=j = 2hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }dψ(x) j2 +2hd n−2 2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }vw (x)dψ(x) j2 44 2 2 2 +4hd n−2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 i=j ×{ε∗ (1 + aj ) + wj2 }dψ(x) j2 Khi (x)Khj (x){ε∗ (1 + ai ) + wi2 } i2 × ×{ε∗ (1 + aj ) + wj2 }vw (x)dψ(x) j2 −2hd n−2 Khi (x)Khj (x)ε∗ ε∗ dψ(x) i2 j2 2 . 
i=j By Fubini’s theorem and taking the expected value, one obtains Wnk,2,2 := 2hd n−2 (ε∗ )2 (ε∗ )2 ik jk 2 Khi (x)Khj (x)dψ(x) = Op (1), i=j Wnk,2,1 := 2hd n−2 (ε∗ )2 |ε∗ | ik jk Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,2,0 := 2hd n−2 (ε∗ )2 ik 2 Khi (x)Khj (x)dψ(x) = Op (1), i=j Wnk,1,1 := 2hd n−2 |ε∗ | |ε∗ | ik jk Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,1,0 := 2hd n−2 |ε∗ | ik Khi (x)Khj (x)dψ(x) 2 = Op (1), i=j Wnk,0,0 := 2hd n−2 Khi (x)Khj (x)dψ(x) 2 = Op (1), k = 1, 2. i=j Hence, we have 2 4 ˆ ˜ |Γn1 − Γn1 | ≤ (1 + vn )2 {2wn1 Wn1,2,0 + wn1 Wn1,0,0 + 4wn1 Wn1,2,1 2 3 2 +4wn1 Wn1,1,1 + 4wn1 Wn1,1,0 } + (2vn + vn )Wn1,2,2 = op (1), 45 2 ˆ ˜ |Γn2 − Γn2 | ≤ (1 + vn )2 {(2an + a2 )Wn1,2,2 + 2wn2 (1 + a2 )2 Wn2,2,0 n 4 +wn2 Wn2,0,0 + 4wn2 (1 + an )3 Wn2,2,1 2 3 +4wn2 (1 + an )2 Wn2,1,1 + 4wn2 (1 + an )Wn2,1,0 } 2 +(2vn + vn )Wn1,2,2 = op (1). Therefore the second claim of (1.45) is proved, and so is Lemma 1.5.7. 1.6 Simulations In this section two simulation studies are reported. The first investigates behavior of the ˆ empirical size and power of the test I(|Dn1 | > 1.96) with g(x) ≡ 1 on [−1, 1]2 at 4 alternatives under different designs and data missing probabilities. The second lists the mean ˆ and standard deviation of the minimum distance parameter estimator θn1 . In both studies, d = 2, and the completed data set are constructed using imputation method. All simulations are based on 1000 replications. In the first study, we compare the empirical size and power of the test at 4 alternatives, on 2 designs X, and 3 data missing probabilities ∆(X). More precisely, the design variables Xi = (X1i , X2i )T , i = 1, · · · , n, are i.i.d bivariate normal N (0, Vk ), k = 1, 2, with    0.36 0  V1 =  , 0 1   1  V2 =  46 0.64  . 0.64 1 (1.46) The three choices of ∆(x), x = (x1 , x2 )T , are as follows: ∆1 (x) = (1 + e−0.8−0.5x1 −0.5x2 )−1 , (1.47) ∆2 (x) = (1 + e−0.2−0.3x1 −0.3x2 )−1 , ∆3 ≡ 1, the complete data. These choices are similar to those in Sun and Wang (2009). They use the data missing probabilities {1+exp(−0.3−0.3x)}−1 , {1+exp(−1.0−0.8x)}−1 , and 1−0.4 exp(−5(x−0.4)2 ) when d = 1. The error distribution is N (0, (0.3)2 ). The regression function under the null T hypothesis is µ(x) = θ0 l(x), where θ0 = (0.5, 0.8)T , l(x) = x = (x1 , x2 )T . The regression models are as follows: M odel 0. δi Yi = δi µ(Xi ) + δi εi , M odel 1. δi Yi = δi µ(Xi ) + 0.5δi (X1i − 0.2)(X2i − 0.4) + δi εi , M odel 2. δi Yi = δi µ(Xi ) + 0.5δi (X1i X2i − 1) + δi εi , 2 2 M odel 3. δi Yi = δi µ(Xi ) + 2δi {exp(−0.4X1i ) − exp(0.6X2i )} + δi εi , M odel 4. δi Yi = δi X1i I(X2i > 0.2) + δi εi , The nominal level is α = 0.05. The sample sizes considered are n = 50, 100, 200. The first 2 tables describe empirical sizes and powers in models 0-4. Model 0 is the null model while model 1-4 are the alternatives. These empirical levels and powers are computed by ˆ the relative frequency of the event {|Dn1 | > 1.96} in corresponding models. Bandwidths h = n−1/4.5 and w = (log n/n)1/6 are chosen because of (h3) and (1.9). The kernels are 47 K(u, v) ≡ K 1 (u)K 1 (v) and K ∗ ≡ K, with K 1 (u) := 3 (1 − u2 )I(|u| ≤ 1). 4 Table 1.1: Empirical ε ∼ N (0, (.3)2 ) n ∆ Model 0 Model 1 Model 2 Model 3 Model 4 sizes and powers for model 0 vs. 
models 1-4 with X ∼ N(0, V1) and ε ∼ N(0, (.3)²).

                 n = 50                 n = 100                n = 200
            ∆1     ∆2     ∆3       ∆1     ∆2     ∆3       ∆1     ∆2     ∆3
Model 0    .020   .027   .031     .029   .029   .036     .033   .034   .042
Model 1    .103   .079   .224     .278   .176   .586     .633   .513   .935
Model 2    .993   .941   1        1      .999   1        1      1      1
Model 3    .315   .203   .999     .351   .270   1        .375   .338   1
Model 4    .241   .159   .484     .671   .497   .905     .980   .920   1

Table 1.1 gives the empirical sizes and powers for testing model 0 against models 1-4 with design X ∼ N(0, V1), when the data are randomly missing with either of the two missing data probabilities or with no missing data. In the simulation, the empirical size of the test for model 0 stays below 0.05. As the sample size increases, it gradually approaches the asymptotic level and is quite close to it at sample size 200. On the other hand, the empirical power of the test exceeds 0.05 against each of the alternatives 1-4 at all the sample sizes considered, and approaches 1 as the sample size increases; against alternative 2, in particular, the power is above 0.94 even at sample size 50. Comparing the three data missing probabilities, we observe that the level behavior is affected by the data missing probability, while the power is affected much more.

Table 1.2: Empirical sizes and powers for model 0 vs. models 1-4 with X ∼ N(0, V2) and ε ∼ N(0, (.3)²).

                 n = 50                 n = 100                n = 200
            ∆1     ∆2     ∆3       ∆1     ∆2     ∆3       ∆1     ∆2     ∆3
Model 0    .025   .027   .030     .029   .031   .036     .035   .037   .043
Model 1    .115   .103   .371     .199   .164   .677     .479   .373   .952
Model 2    .965   .831   1        .999   .991   1        1      1      1
Model 3    .237   .187   1        .272   .209   1        .274   .227   1
Model 4    .203   .144   .529     .596   .471   .927     .957   .892   1

Table 1.2 lists the empirical sizes and powers with design X ∼ N(0, V2). In addition to conclusions similar to those from the first table, we also find that the power and level behavior are affected by the dependence between the design variable coordinates, although not by much. The results for model 4 in both tables show that the discontinuity of the regression function has an effect on the power of the test, since the power changes dramatically as the sample size increases.

Table 1.3: Mean and s.d. of θ̂n1 under model 0 with X ∼ N(0, V1), ε ∼ N(0, (.3)²), and E(δ|X = x) = ∆1(x).

              n = 50            n = 100           n = 200
Mean        (.494, .804)      (.503, .800)      (.499, .800)
Std dev     (.110, .084)      (.078, .061)      (.052, .043)

The second study gives the mean and standard deviation of each component of θ̂n1 under the null hypothesis model 0 with normal error ε ∼ N(0, (0.3)²) when d = q = 2. The design covariance and the data missing probability are chosen to be V1 in (1.46) and ∆1 in (1.47), respectively. The regression function and parameter are the same as in the first study. The results listed in Table 1.3 show that the minimum distance estimator of the parameter is very close to the true parameter and that its standard deviation is quite small.

Chapter 2

Testing for Superiority of Two Regression Curves when Responses are Missing At Random

2.1  Introduction

This chapter considers a class of tests using covariate matching for comparing two nonparametric regression curves against a one-sided alternative, when responses are missing at random. More precisely, let (Xk, δk Yk), k = 1, 2, be the two groups of random variables, where Xk is a one-dimensional explanatory variable, Yk is a one-dimensional response variable, and δk is the indicator of whether the response is observed, i.e., δk = 1 if Yk is observed and δk = 0 if Yk is missing, k = 1, 2. We say Yk is missing at random if δk and Yk are conditionally independent given Xk, i.e.,
P (δk = 1|Yk , Xk ) = P (δk = 1|Xk ), a.s., k = 1, 2; see Little and Rubin (1987). 50 Now, let µk (x) := E(Yk |Xk = x), x ∈ R, k = 1, 2, be the two regression functions so that Yk = µk (Xk ) + εk , E(εk |Xk = x) = 0, ∀ x ∈ R, k = 1, 2. Let I be a compact interval in R. The problem of interest is to test the hypothesis H0 : µ1 (x) = µ2 (x), for all x ∈ I, H1 : µ1 (x) ≥ µ2 (x), for all x ∈ I with strict inequality for at least one x ∈ I, based on independent samples {(Xk,i , δk,i Yk,i ) : i = 1, · · · , nk } from the distributions of (Xk , δk Yk ), k = 1, 2, respectively. Moreover, let φ be a non-negative continuous function on R. One is interested in the asymptotic power of a given test against the local alternatives H1N : µ1 (x) = µ2 (x) + N −1/2 φ(x), N := n1 n2 , n1 + n2 for all x ∈ I. (2.1) When we observe complete data, this testing problem has been addressed by many researchers. In particular, Hall et. al (1997) proposed a class of tests based on the covariatematching, and the local averaging interpolation rule. They proved the asymptotic normality of the proposed statistics under general alternatives, allowing design and error densities to be different. They also proposed an adaptive version of their test that achieves the optimal power against a sequence of local alternatives. Koul and Schick (1997) proposed four 51 classes of tests under the assumption of possibly distinct design but common error densities. They gave a general asymptotic optimality theory against a sequence of local alternatives. One of these classes of covariate-matched tests is shown to have desirable asymptotic power properties against several alternatives. Koul and Schick (2003) (K-S) developed this class of test further and derive their asymptotic power for the local alternatives, under the heteroscedastic setting with possibly distinct error and design densities in the two regression models. They obtained an upper bound on the asymptotic power of all tests against a given sequence of local alternatives using a semiparametric approach, and showed that a member of this class of tests achieves this upper bound. This chapter discusses the above one-sided testing problem when responses are missing at random. We construct a complete data set by imputing kernel-type estimates for the regression functions, and investigate the asymptotic properties of the modified version for missing at random setup of the covariate-matched test statistic proposed in K-S under null hypothesis and local alternatives. The consistency of the tests based on these statistics is also discussed. To set up the analysis, let U be the set of all non-negative functions that are continuous on I and vanish off I. Assume that Xk has a density gk that is bounded away from zero on I, k = 1, 2. Let K be a symmetric Lipschitz continuous kernel density with compact support [−1, 1], a = aN , bk = bk,n , ck = ck,n , and dk = dk,n , be bandwidth k k k sequences. Let Kh (y) := K(y/h)/h, y ∈ R, h = a, bk , ck , dk . The estimators of regression functions and the constructed responses are, respectively, µk (x) := ˆ nk i=1 δk,i Yk,i Kbk (x − Xk,i ) , nk i=1 δk,i Kbk (x − Xk,i ) ˆ Yk,i := δk,i Yk,i + (1 − δk,i )ˆk (Xk,i ), µ 52 1 ≤ i ≤ nk , k = 1, 2. For each k = 1, 2, let vk be a non-negative estimate of vk := ˆ √ u/gk which vanishes off I. 
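The imputation step here parallels (1.2) in Chapter 1, with the parametric fit replaced by the kernel estimator µ̂k computed from the observed responses of the k-th sample. A minimal sketch (ours; the names, the kernel choice, and the numerical safeguard are illustrative) is:

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def mu_hat(x, X, Y, delta, b):
    # Kernel regression estimate of mu_k(x) using only the observed (delta = 1) pairs.
    w = delta * epanechnikov((x - X) / b)
    return np.sum(w * np.nan_to_num(Y)) / max(np.sum(w), 1e-12)

def impute_sample(X, Y, delta, b):
    # Completed responses Y^_{k,i} = delta_{k,i} Y_{k,i} + (1 - delta_{k,i}) mu^_k(X_{k,i}).
    fitted = np.array([mu_hat(x, X, Y, delta, b) for x in X])
    return np.where(delta == 1, Y, fitted)

# Applied separately to the two samples (X1, Y1, delta1) and (X2, Y2, delta2), with their
# own bandwidths b1 and b2, this yields the completed responses entering T and T^ below.
```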
The covariate-matched statistic and the adaptive version with responses missing at random, respectively, are 1 T := n1 n2 n1 n2 v1 (X1,i )v2 (X2,j )(Y1,i − Y2,j )Ka (X1,i − X2,j ), i=1 j=1 and ˆ T := 1 n1 n2 n1 n2 ˆ ˆ v1 (X1,i )ˆ2 (X2,j )(Y1,i − Y2,j )Ka (X1,i − X2,j ). ˆ v i=1 j=1 The needed assumptions and conditions to state the main results are given in Section 2.2. ˆ Section 2.3 states the asymptotic normality of T under H0 and H1N , and the consistency of ˆ the test based on T . The optimal u to maximize the asymptotic power against H1N is also ˆ discussed. Section 2.4 gives the estimates needed to construct T and the corresponding test. Simulation studies are set in Section 2.5. 2.2 Assumptions In this section we shall state the needed assumptions. The following assumptions are similar to those in K-S. For each k = 1, 2, (e1) (Xk,i , δk,i Yk,i ) : Xk,i ∈ R, Yk,i ∈ R, δk,i = 0 or 1, i = 1, 2, · · · , nk , are i.i.d. random vectors with δk,i = 1, if Yk,i is observed, and δk,i = 0, if Yk,i is missing; µk (x) = E(Yk,1 |Xk,1 = x), x ∈ R, εk,i = Yk,i − µk (Xk,i ), δk,i and εk,i are conn n 1 2 ditionally independent, given Xk,i . {(X1,i , δ1,i Y1,i )}i=1 and {(X2,j , δ2,j Y2,j )}j=1 are 53 independent. 2 (e2) Eε2 < ∞, σk (x) := E(ε2 |Xk,1 = x) and ∆k (x) := E(δk,1 = 1|Xk,1 = x) are k,1 k,1 continuous and positive on I. 4 (e3) νk (x) := E(ε4 |Xk,1 ), x ∈ R, is bounded on an open interval containing I. k,1 2 (e4) σk and ∆k are twice continuously differentiable on I. (g1) The design variable Xk,1 has a bounded Lebesgue density gk which is continuous and positive on I. (g2) The density g is twice continuously differentiable on I. (k) The kernel w is symmetric square integrable continuous density with compact support [−1, 1]. In addition, w satisfies Lipschitz-continuity of order 1. (m) µ1 is continuous. µ2 is Lipschitz-continuous of order 1 with Lipschitz constant µ2 . (p) φ is a non-negative continuous function. (q) ξ is a non-negative continuous function with ξ(x) > 0 for at least one x ∈ I. (u) U is the set of all non-negative functions that vanish off I and whose restrictions to I are continuous. (w1) a2 N → 0, aN η1 → ∞, for some η1 ∈ (1/2, 1). η (w2) b2 nk → 0, bk nk2 → ∞, for some η2 ∈ (1/2, 1). k η (w3) ck → 0, dk → 0, (ck +dk )nk3 → ∞ for some η3 ∈ (0, 1/2), (c5 +d5 )nk (log nk )−1 ≤ k k C for some C < ∞. (z) {Ik,1 , · · · , Ik,B } partitions I into disjoint intervals of equal length πk , with πk → 0 k 1/2 and nk πk → ∞. 54 2 Note that (e2) and (g1) imply that for each k = 1, 2, the functions gk , σk , and ∆k , are bounded and uniformly continuous on the compact interval I, and bounded away from zero on I. Rewrite H1 into the form: H1 : µ1 = µ2 + ξ, where ξ satisfies (q) and u(x)ξ(x)dx > 0, u ∈ U. (2.2) To state the main results, we need the following set of additional conditions on estimators. They are motivated by Schick (1987), and proposed in K-S as Definition 2.1, Assumption 2.3, and Lemma 2.4, for the case of complete responses. These conditions are reproduced as follows, only with changes from the case of complete responses to data missing at random setup. We need these conditions not only under H0 and H1N in (2.1), but also under H1 in (2.2). Let X := (X1,1 , · · · , X1,n1 , X2,1 , · · · , X2,n2 ), (2.3) δ := (δ1,1 , · · · , δ1,n1 , δ2,1 , · · · , δ2,n2 ), Y := (Y1,1 , · · · , Y1,n1 , Y2,1 , · · · , Y2,n2 ), rk (x) = u(x)/gk (x), x ∈ I. and Yk,j be the vector obtained from Y by removing Yk,j , j = 1, · · · , nk , k = 1, 2. Definition 2.2.1. 
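Before turning to the asymptotic theory, the following sketch (ours, not from the thesis) indicates how the covariate-matched statistic T̂ of Section 2.1 could be computed from the completed samples, once estimates v̂1, v̂2 of the weight functions are available; Section 2.4 discusses such estimates. The helper names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def covariate_matched_stat(X1, Y1c, X2, Y2c, v1, v2, a):
    # T^ = (n1 n2)^{-1} sum_i sum_j v^_1(X1i) v^_2(X2j) (Y^_1i - Y^_2j) K_a(X1i - X2j).
    n1, n2 = len(X1), len(X2)
    diff_x = X1[:, None] - X2[None, :]              # n1 x n2 matrix of X1i - X2j
    Ka = epanechnikov(diff_x / a) / a               # K_a(X1i - X2j)
    diff_y = Y1c[:, None] - Y2c[None, :]            # Y^_1i - Y^_2j
    weights = v1(X1)[:, None] * v2(X2)[None, :]     # v^_1(X1i) v^_2(X2j)
    return np.sum(weights * diff_y * Ka) / (n1 * n2)

# With Y1c, Y2c the completed responses from the sketch above and v1, v2 either the known
# v_k or their estimates v^_k, the one-sided test rejects H0 for large values of
# sqrt(N) * T^ standardized by an estimate of tau in (2.17), N = n1 n2 / (n1 + n2).
```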
The estimator rk of rk is said to be consistent and cross-validated on I ˆ for the function rk (short CCV on I for rk ) if the following two conditions hold: 55 n N k 1I (Xk,i )E[{ˆk (Xk,i ) − rk (Xk,i )}2 |X, δ] = op (1), r n2 i=1 k N max sup E[{ˆk (x) − E[ˆk (x)|X, δ, Yk,j ]}2 |X, δ] = op (1). r r 1≤j≤nk x∈I (2.4) (2.5) We say rk is a modification of rk if P (supx∈I |˜k (x) − rk (x)| > 0) → 0. We say rk is ˜ ˆ r ˆ ˆ essentially CCV on I for rk if there exists a modification of rk which is CCV on I for rk . ˆ Assumption 2.2.1. The estimate rk is essentially CCV on I for rk for k = 1, 2. ˆ Lemma 2.2.1. Suppose there are modifications vk of vk such that, for k = 1, 2 and l = 1, 2, ˜ ˆ 0 ≤ vk (x) ≤ M, ˜ x ∈ I, (2.6) for some finite constant M , and such that 1 nl nl E[{˜k (Xl,i ) − vk (Xl,i )}2 |X, δ] = op (1), v (2.7) i=1 N max sup E[{˜k (x) − E[˜k (x)|X, δ, Yl,i ]}2 |X, δ] = op (1). v v 1≤i≤nl x∈I (2.8) Then Assumption 2.2.1 holds. The proof of Lemma 2.2.1 follows that of Lemma 2.4 in K-S, only with changes from X to (X, δ). Since this proof does not involve the responses Y but only the designs (X, δ), the above lemma holds under H0 , H1N , and H1 . Remark 2.2.1. Suppose modifications vk of vk exist and satisfy (2.6)-(2.8), k = 1, 2. K-S ˜ ˆ 56 show in their proof of Lemma 2.4 that the estimators 1 r1 (x) := v1 (x) ˆ ˆ n2 r2 (x) := v2 (x) ˆ ˆ 1 n1 n2 v2 (X2,j )Ka (x − X2,j ), ˆ and v1 (X1,i )Ka (x − X1,i ), ˆ x ∈ R, (2.9) j=1 n1 i=1 are essentially CCV on I for r1 , and r2 , respectively, and their respective modifications can be chosen as 1 r1 (x) = v1 (x) ˜ ˜ n2 r2 (x) = v2 (x) ˜ ˜ 1 n1 n2 v2 (X2,j )Ka (x − X2,j ), ˜ (2.10) j=1 n1 v1 (X1,i )Ka (x − X1,i ). ˜ i=1 We also need the following notation and results in the proofs later. Let hk (x) := ∆k (x)gk (x), 1 ˆ hk (x) := nk λk := inf hk (x), x∈I nk δk,l Kb (x − Xk,l ), k l=1 k = 1, 2. 1 gk (x) := ˆ nk (2.11) nk Kb (x − Xk,l ). k l=1 Lemma 2.2.2. Let tk = tnk , k = 1, 2, be bandwidths satisfying tk → 0 and nk t5 (log nk )−1 ≤ k C for some C < ∞. Assume (e2), (e4), (g1), and (g2) hold. Then the following hold. 1 sup x∈I nk nk Ktk (x − Xk,i ) − gk (x) = op (1). (2.12) i=1 nk 1 δk,i Ktk (x − Xk,i ) − hk (x) = op (1). x∈I nk i=1 sup 57 (2.13) 1 sup x∈I nk nk εk,i δk,i Ktk (x − Xk,i ) = op (1). (2.14) i=1 nk 1 ε2 δk,i Ktk (x − Xk,i ) − Eε2 δk,1 Ktk (x − Xk,1 ) = op (1). k,i k,1 nk x∈I i=1 nk 1 nk i=1 δk,i Ktk (x − Xk,i ) sup − ∆k (x) = op (1). nk 1 Ktk (x − Xk,i ) x∈I n i=1 sup (2.15) (2.16) k This lemma is obtained from Theorem 3 of Collomb and H¨rdle (1986). a 2.3 Asymptotic distribution of the test statistic under H0, H1N, and H1 ˆ In this section we discuss the asymptotic distribution of T against H1N in Theorem 2.3.1. The asymptotic null distribution is included because the choice φ = 0 corresponds to the ˆ null hypothesis. The asymptotic behavior of T against H1 is given in Theorem 2.3.2, while consistency of the corresponding test against H1 is stated in Remark 2.3.1. K-S propose an optimal u to test H0 against H1N when data is complete. When responses are missing at random, a similar optimal u can be derived. This result is given in Remark 2.3.1. The following definitions are used in the theorems and remarks below. n2 N = , n1 n1 + n2 2 σ1 (x) ψ1 (x) := , ∆1 (x)g1 (x) q1 := τ 2 := N n1 = , n2 n1 + n2 2 σ2 (x) ψ2 (x) := , ∆2 (x)g2 (x) q2 := u2 (x)[q1 ψ1 (x) + q2 ψ2 (x)]dx, 58 D := (2.17) x ∈ R, u(x)(µ1 (x) − µ2 (x))dx. Theorem 2.3.1. Assume that (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. 
Theorem 2.3.1. Assume that (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. Then, under $H_{1N}$ of (2.1),
$$ N^{1/2}\Big(\hat T - D - \Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{u(X_{1,i})}{\Delta_1(X_{1,i})\, g_1(X_{1,i})}\,\delta_{1,i}\varepsilon_{1,i} - \frac{1}{n_2}\sum_{j=1}^{n_2}\frac{u(X_{2,j})}{\Delta_2(X_{2,j})\, g_2(X_{2,j})}\,\delta_{2,j}\varepsilon_{2,j}\Big]\Big) = o_p(1), $$
as both sample sizes $n_1$ and $n_2$ tend to infinity. Consequently, under $H_{1N}$,
$$ N^{1/2}(\hat T - D) \to_d N(0, \tau^2), \qquad \text{as } n_1\wedge n_2 \to\infty. $$

Proof. Recall $r_k$ from (2.3), $\hat r_k$ from (2.9), and $\hat g_k$, $\hat h_k$ from (2.11), $k = 1, 2$. For $x\in\mathbb{R}$ and $k, m = 1, 2$, let
$$ \bar\mu_{k,m}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\mu_m(X_{k,l})\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x), \quad \bar\varepsilon_k(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x), \quad \bar\phi_k(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\phi(X_{k,l})\,\delta_{k,l} K_{b_k}(x - X_{k,l})\Big/\hat h_k(x). $$
Suppose $H_{1N}$ holds. With the above definitions, write $\hat T = A_1 + B_1 - B_2 + C_1 - C_2 + R_1 + R_2$, where
$$ A_1 := \frac{1}{n_1 n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\hat v_1(X_{1,i})\hat v_2(X_{2,j})\,\big[\mu_2(X_{1,i}) - \mu_2(X_{2,j})\big]\, K_a(X_{1,i} - X_{2,j}), $$
$$ B_1 := \frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})(1 - \delta_{1,i})\,\big[\bar\mu_{1,2}(X_{1,i}) - \mu_2(X_{1,i})\big], \qquad B_2 := \frac{1}{n_2}\sum_{j=1}^{n_2}\hat r_2(X_{2,j})(1 - \delta_{2,j})\,\big[\bar\mu_{2,2}(X_{2,j}) - \mu_2(X_{2,j})\big], $$
$$ C_1 := \frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})\,\big[\delta_{1,i}\varepsilon_{1,i} + (1 - \delta_{1,i})\bar\varepsilon_1(X_{1,i})\big], \qquad C_2 := \frac{1}{n_2}\sum_{j=1}^{n_2}\hat r_2(X_{2,j})\,\big[\delta_{2,j}\varepsilon_{2,j} + (1 - \delta_{2,j})\bar\varepsilon_2(X_{2,j})\big], $$
$$ R_1 := \frac{N^{-1/2}}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})\,\phi(X_{1,i}), \qquad R_2 := \frac{N^{-1/2}}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i})(1 - \delta_{1,i})\,\big[\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big]. $$
In the following, we shall show that
$$ N^{1/2} A_1 = o_p(1); \qquad N^{1/2} B_k = o_p(1), \quad k = 1, 2; \tag{2.18} $$
$$ N^{1/2} C_k = \frac{N^{1/2}}{n_k}\sum_{i=1}^{n_k}\frac{r_k(X_{k,i})}{\Delta_k(X_{k,i})}\,\delta_{k,i}\varepsilon_{k,i} + o_p(1), \quad k = 1, 2; \tag{2.19} $$
$$ N^{1/2} R_1 = N^{1/2} D + o_p(1); \qquad N^{1/2} R_2 = o_p(1). \tag{2.20} $$
Claim (2.18) is proved by an argument similar to the proof of Theorem 2.6 in K-S, and some details of the proof of (2.19) are also inspired by that proof.

Recall the Lipschitz constant $\dot\mu_2$ of $\mu_2$ from condition (m). By (g1), (m), (u), (w1), Assumption 2.2.1, and routine calculations, one has
$$ N^{1/2}|A_1| \le N^{1/2}\,\dot\mu_2\, a\,\frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1(X_{1,i}) = o_p(1). $$
From (g1), (m), (u), (w2), Assumption 2.2.1, and the fact that
$$ \big|\bar\mu_{k,2}(X_{k,i}) - \mu_2(X_{k,i})\big| \le \frac{1}{n_k}\sum_{l=1}^{n_k}\big|\mu_2(X_{k,l}) - \mu_2(X_{k,i})\big|\,\delta_{k,l} K_{b_k}(X_{k,i} - X_{k,l})\Big/\hat h_k(X_{k,i}) \le \dot\mu_2\, b_k, \quad k = 1, 2, $$
one obtains
$$ N^{1/2}|B_k| \le N^{1/2}\,\dot\mu_2\, b_k\,\frac{1}{n_k}\sum_{i=1}^{n_k}\hat r_k(X_{k,i}) = o_p(1), \quad k = 1, 2. $$
For each $k = 1, 2$, note that
$$ C_k = \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{\hat r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{\hat h_k(X_{k,l})} + \delta_{k,i}\,\hat r_k(X_{k,i})\Big]. $$
Write $C_k = C_{k,1} + C_{k,2} + C_{k,3} + C_{k,4}$, where
$$ C_{k,1} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{h_k(X_{k,l})} - \frac{\delta_{k,i}(1 - \Delta_k(X_{k,i}))\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}\Big], $$
$$ C_{k,2} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\,\frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\hat r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), $$
$$ C_{k,3} := \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\delta_{k,i}\,\big[\hat r_k(X_{k,i}) - r_k(X_{k,i})\big], \qquad C_{k,4} := \frac{1}{n_k}\sum_{i=1}^{n_k}\frac{\varepsilon_{k,i}\delta_{k,i}\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}. $$
For $i, l = 1, \ldots, n_k$, $k = 1, 2$, let
$$ I_{k,i,l} := \frac{r_k(X_{k,l})(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})}{h_k(X_{k,l})}, \qquad J_{k,i} := \frac{\delta_{k,i}(1 - \Delta_k(X_{k,i}))\, r_k(X_{k,i})}{\Delta_k(X_{k,i})}. $$
By (e1), (e2), (g1), (u), (w2), and routine calculations, one has $E(N^{1/2} C_{k,1}) = 0$ and
$$ \mathrm{Var}(N^{1/2} C_{k,1}) = \frac{N}{n_k}\, E\Big[\varepsilon_{k,1}^2\Big(\frac{1}{n_k}\sum_{l=1}^{n_k} I_{k,1,l} - J_{k,1}\Big)^2\Big] = \frac{N}{n_k}\, E\Big[\varepsilon_{k,1}^2\Big(\frac{n_k - 1}{n_k^2}\, I_{k,1,2}^2 + \frac{(n_k - 1)(n_k - 2)}{n_k^2}\, I_{k,1,2} I_{k,1,3} - \frac{2(n_k - 1)}{n_k}\, J_{k,1} I_{k,1,2} + J_{k,1}^2\Big)\Big] $$
$$ = \frac{N}{n_k}\Big[\frac{n_k - 1}{n_k^2\, b_k}\int\!\!\int\sigma_k^2(x)\Delta_k(x) g_k(x)\, r_k^2(x + b_k u)\,(1 - \Delta_k(x + b_k u))\,(\Delta_k(x + b_k u))^{-2}\,(g_k(x + b_k u))^{-1}\, K^2(u)\,du\,dx $$
$$ \qquad + \frac{(n_k - 1)(n_k - 2)}{n_k^2}\int\!\!\int\!\!\int\sigma_k^2(x)\Delta_k(x) g_k(x)\, r_k(x + b_k u)(1 - \Delta_k(x + b_k u))(\Delta_k(x + b_k u))^{-1}\, r_k(x + b_k v)(1 - \Delta_k(x + b_k v))(\Delta_k(x + b_k v))^{-1}\, K(u) K(v)\,du\,dv\,dx $$
$$ \qquad - \frac{2(n_k - 1)}{n_k}\int\!\!\int\sigma_k^2(x)\, r_k(x)(1 - \Delta_k(x))\, g_k(x)\, r_k(x + b_k u)(1 - \Delta_k(x + b_k u))(\Delta_k(x + b_k u))^{-1}\, K(u)\,du\,dx + \int\sigma_k^2(x)\, r_k^2(x)\,(1 - \Delta_k(x))^2(\Delta_k(x))^{-1}\, g_k(x)\,dx\Big] \to 0, \quad k = 1, 2. $$
Hence $N^{1/2} C_{k,1} = o_p(1)$, $k = 1, 2$.
Recall the modification $\tilde r_k$ of $\hat r_k$ defined in (2.10), which is CCV on $I$ for $r_k$. For $i, j, m = 1, \ldots, n_k$, $k = 1, 2$, let
$$ \tilde r_{k,i}(x) := E[\tilde r_k(x)\mid X, \delta, Y_{k,i}], \qquad \tilde r_{k,i,j}(x) := E[\tilde r_{k,i}(x)\mid X, \delta, Y_{k,j}], $$
$$ \hat M_{k,i} := \frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\hat r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), \qquad \tilde M_{k,i} := \frac{1}{n_k}\sum_{l=1}^{n_k}\Big[\frac{\tilde r_k(X_{k,l})}{\hat h_k(X_{k,l})} - \frac{r_k(X_{k,l})}{h_k(X_{k,l})}\Big](1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}), $$
$$ \tilde M_{k,i;j} := E[\tilde M_{k,i}\mid X, \delta, Y_{k,j}], \qquad \tilde M_{k,i;j,m} := E[\tilde M_{k,i;j}\mid X, \delta, Y_{k,m}]. $$
Then we have
$$ C_{k,2} = \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}\tilde M_{k,i;i} + \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}(\tilde M_{k,i} - \tilde M_{k,i;i}) + \frac{1}{n_k}\sum_{i=1}^{n_k}\varepsilon_{k,i}(\hat M_{k,i} - \tilde M_{k,i}) = C_{k,2,1} + C_{k,2,2} + C_{k,2,3}, \ \text{say}. $$
For each $k = 1, 2$, let $Q_{k,l;i,j} := E[(\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l}))^2\mid X, \delta]$, $l, i, j = 1, \ldots, n_k$. By the C-S inequality, one has
$$ S_{k,1} := \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(\tilde M_{k,i;i} - \tilde M_{k,i;i,j})^2\mid X, \delta] = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E\Big[\Big(\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l})}{\hat h_k(X_{k,l})}\,(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l})\Big)^2\,\Big|\, X, \delta\Big] $$
$$ \le \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E\Big[\frac{1}{n_k}\sum_{l=1}^{n_k}\frac{(\tilde r_{k,i}(X_{k,l}) - \tilde r_{k,i,j}(X_{k,l}))^2}{\hat h_k^2(X_{k,l})}\, K_{b_k}(X_{k,i} - X_{k,l})\,\Big|\, X, \delta\Big]\times\frac{1}{n_k}\sum_{l=1}^{n_k}(1 - \delta_{k,l})\,\delta_{k,i} K_{b_k}(X_{k,i} - X_{k,l}) \le \sup_{x\in I}\hat g_k(x)\,\frac{N}{n_k^3}\sum_{l=1}^{n_k}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k}\frac{Q_{k,l;i,j}\, K_{b_k}(X_{k,i} - X_{k,l})}{\hat h_k^2(X_{k,l})} = S_{k,1,1} + S_{k,1,2}, \ \text{say}, $$
where $S_{k,1,1}$ and $S_{k,1,2}$ denote the last bound multiplied by the indicators $I[\cap_{m=1}^{n_k}\{\hat h_k(X_{k,m}) \ge \lambda_k/2\}]$ and $I[\cup_{m=1}^{n_k}\{\hat h_k(X_{k,m}) < \lambda_k/2\}]$, respectively.

By Assumption 2.2.1, (2.5), (e1), and the C-S inequality, one obtains
$$ \sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j} \le \sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} E\big[E\big(\{\tilde r_k(X_{k,l}) - \tilde r_{k,j}(X_{k,l})\}^2\mid X, \delta, Y_{k,i}\big)\mid X, \delta\big] \le N\max_{1\le j\le n_k}\sup_{x\in I} E[\{\tilde r_k(x) - \tilde r_{k,j}(x)\}^2\mid X, \delta] = o_p(1). \tag{2.21} $$
Equation (2.12) in Lemma 2.2.2 shows $\sup_{x\in I}\hat g_k(x) = O_p(1)$. Together with (2.11) and (2.21), we have
$$ S_{k,1,1} \le \Big[\frac{1}{n_k^2}\sum_{l=1}^{n_k}\sum_{i=1}^{n_k} K_{b_k}(X_{k,i} - X_{k,l})\Big]\Big[\sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j}\Big]\,\frac{\sup_{x\in I}\hat g_k(x)\, I[\cap_{m=1}^{n_k}\{\hat h_k(X_{k,m}) \ge \lambda_k/2\}]}{(\lambda_k/2)^2} \le \frac{\big[\sup_{x\in I}\hat g_k(x)\big]^2}{(\lambda_k/2)^2}\,\sup_{1\le i,l\le n_k}\frac{N}{n_k}\sum_{j=1}^{n_k} Q_{k,l;i,j} = o_p(1). $$
Equation (2.13) in Lemma 2.2.2 yields
$$ P\Big(\bigcup_{i=1}^{n_k}\{\hat h_k(X_{k,i}) < \lambda_k/2\}\Big) \le P\Big(\max_{1\le i\le n_k}|\hat h_k(X_{k,i}) - h_k(X_{k,i})| > \lambda_k/2\Big) \le P\Big(\sup_{x\in I}|\hat h_k(x) - h_k(x)| > \lambda_k/2\Big) \to 0. $$
Together with the fact that $\cup_{i=1}^{n_k}\{\hat h_k(X_{k,i}) < \lambda_k/2\}\in\sigma(X, \delta)$, this gives $S_{k,1,2} = o_p(1)$. Thus $S_{k,1} = o_p(1)$, $k = 1, 2$.

Let $D_i := \varepsilon_{k,i}\tilde M_{k,i;i}$ and $D_{i,j} := E[D_i\mid X, \delta, Y_{k,j}]$, $i, j = 1, \ldots, n_k$, $k = 1, 2$. Note that, by (e1), $D_{i,i} = 0$ and $E[D_i D_j\mid X, \delta] = E[(D_i - D_{i,j})(D_j - D_{j,i})\mid X, \delta]$. From (e2), one has
$$ E[(N^{1/2} C_{k,2,1})^2\mid X, \delta] = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(D_i - D_{i,j})(D_j - D_{j,i})\mid X, \delta] \le \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[(D_i - D_{i,j})^2\mid X, \delta] $$
$$ = \frac{N}{n_k^2}\sum_{i=1}^{n_k}\sum_{j=1}^{n_k} E[\varepsilon_{k,i}^2(\tilde M_{k,i;i} - \tilde M_{k,i;i,j})^2\mid X, \delta] \le S_{k,1}\,\sup_{x\in I}\sigma_k^2(x) = o_p(1), \quad k = 1, 2. $$
Thus $N^{1/2} C_{k,2,1} = o_p(1)$, $k = 1, 2$. By an argument similar to the one showing $S_{k,1} = o_p(1)$, one has
$$ S_{k,2} := \frac{N}{n_k}\sum_{i=1}^{n_k} E[(\tilde M_{k,i} - \tilde M_{k,i;i})^2\mid X, \delta] = o_p(1), \quad k = 1, 2. $$
This, together with (e2) and the C-S inequality, leads to
$$ (N^{1/2} C_{k,2,2})^2 \le \Big[\frac{N}{n_k}\sum_{i=1}^{n_k}(\tilde M_{k,i} - \tilde M_{k,i;i})^2\Big]\Big[\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\Big] = o_p(1), \quad k = 1, 2. $$
Since $P(|N^{1/2} C_{k,2,3}| > 0) \le P(\sup_{x\in I}|\tilde r_k(x) - \hat r_k(x)| > 0) \to 0$, we have $N^{1/2} C_{k,2,3} = o_p(1)$, $k = 1, 2$.
Therefore one obtains $N^{1/2} C_{k,2} = o_p(1)$, $k = 1, 2$. By an argument similar to the proof of Theorem 2.6 in K-S, $N^{1/2} C_{k,3} = o_p(1)$ can be derived. Then one has
$$ N^{1/2} C_k = \frac{N^{1/2}}{n_k}\sum_{i=1}^{n_k}\frac{r_k(X_{k,i})}{\Delta_k(X_{k,i})}\,\delta_{k,i}\varepsilon_{k,i} + o_p(1), \quad k = 1, 2. $$
Furthermore, by Assumption 2.2.1, (2.4), (p), the C-S inequality, and the Law of Large Numbers, one obtains
$$ N^{1/2} R_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} r_1(X_{1,i})\,\phi(X_{1,i}) + \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\tilde r_1(X_{1,i}) - r_1(X_{1,i})\big)\,\phi(X_{1,i}) + \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\hat r_1(X_{1,i}) - \tilde r_1(X_{1,i})\big)\,\phi(X_{1,i}) = N^{1/2} D + o_p(1), $$
and
$$ N^{1/2}|R_2| \le \Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\hat r_1^2(X_{1,i})\Big]^{1/2}\Big[\frac{1}{n_1}\sum_{i=1}^{n_1}\big(\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big)^2\Big]^{1/2}. $$
Combined with the following result,
$$ \frac{1}{n_1}\sum_{i=1}^{n_1}\big(\bar\phi_1(X_{1,i}) - \phi(X_{1,i})\big)^2 = \frac{1}{n_1}\sum_{i=1}^{n_1}\Big[\frac{1}{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i})\Big]^2 \le \frac{1}{n_1}\sum_{i=1}^{n_1}\frac{1}{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)^2\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i}) $$
$$ = \frac{1}{n_1^2}\sum_{i=1}^{n_1}\sum_{l=1}^{n_1}\big(\phi(X_{1,l}) - \phi(X_{1,i})\big)^2\,\delta_{1,l} K_{b_1}(X_{1,i} - X_{1,l})\Big/\hat h_1(X_{1,i})\,\Big\{I\Big[\bigcap_{m=1}^{n_1}\{\hat h_1(X_{1,m}) \ge \lambda_1/2\}\Big] + I\Big[\bigcup_{m=1}^{n_1}\{\hat h_1(X_{1,m}) < \lambda_1/2\}\Big]\Big\} = o_p(1) + o_p(1) = o_p(1), $$
we have $N^{1/2} R_2 = o_p(1)$. Therefore, one obtains
$$ N^{1/2}\hat T = N^{1/2}\Big[ D + \frac{1}{n_1}\sum_{i=1}^{n_1}\frac{r_1(X_{1,i})}{\Delta_1(X_{1,i})}\,\delta_{1,i}\varepsilon_{1,i} - \frac{1}{n_2}\sum_{j=1}^{n_2}\frac{r_2(X_{2,j})}{\Delta_2(X_{2,j})}\,\delta_{2,j}\varepsilon_{2,j}\Big] + o_p(1). $$
This completes the proof.

Theorem 2.3.2. Suppose (e1), (e2), (e4), (g1), (g2), (k), (m), (p), (u), (w1), (w2), and Assumption 2.2.1 hold. Then, under $H_1$ of (2.2), one has $N^{1/2}\hat T \to_p \infty$.

The proof of Theorem 2.3.2 is similar to that of Theorem 2.3.1, the only difference being that $N^{1/2}(R_1 + R_2) \to_p \infty$ under $H_1$.

Remark 2.3.1. Let $\gamma := \int u(x)\phi(x)\,dx\,/\,\tau$. Assume that, under $H_0$, $H_1$, and $H_{1N}$, the assumptions of Theorem 2.3.1 hold and that there exists an estimator $\hat\tau^2$ of $\tau^2$ satisfying $\hat\tau^2 = \tau^2 + o_p(1)$. Then one has
$$ N^{1/2}\hat T/\hat\tau \to_d N(0, 1) \ \text{under } H_0, \qquad N^{1/2}\hat T/\hat\tau \to_d N(\gamma, 1) \ \text{under } H_{1N}, \qquad N^{1/2}\hat T/\hat\tau \to_p \infty \ \text{under } H_1. $$
Consequently, the asymptotic level of the test
$$ \hat V = I\{N^{1/2}\hat T/\hat\tau \ge z_\alpha\} \tag{2.22} $$
is $\alpha$. The asymptotic power of this test under $H_{1N}$ is $1 - \Phi(z_\alpha - \gamma)$. An application of the Cauchy-Schwarz (C-S) inequality shows that $\gamma$, and hence the asymptotic power, is maximized by the choice
$$ u = u_\phi := \frac{\phi\, 1_I}{q_1\psi_1 + q_2\psi_2}. $$
The maximal asymptotic power is $1 - \Phi(z_\alpha - \gamma_\phi)$, where
$$ \gamma_\phi := \Big(\int\frac{\phi^2(x)\, 1_I(x)}{q_1\psi_1(x) + q_2\psi_2(x)}\,dx\Big)^{1/2} $$
is the maximal value of $\gamma$. This result is similar to that for complete responses discussed in Remarks 2.8 and 2.9 of K-S. The only difference in the missing at random setup is that $\Delta_k(x)$ appears in the denominator of $\psi_k$, $k = 1, 2$; the result coincides with the complete-response case when $\Delta_k \equiv 1$, $k = 1, 2$.

2.4 Some suggested estimators

In this section we consider estimators of $v_k$ and $\tau^2$. K-S give such estimators for a given $u$ and for the (unknown) optimal $u$ when responses are complete, and discuss their properties. When responses are missing at random, similar estimators and properties remain valid. They are listed below for the sake of completeness.

The following discussion gives an estimator of $v_k$, $k = 1, 2$. Recall $\hat g_k$ and $\hat h_k$ from (2.11). When $u$ is known, consider
$$ \hat v_k := \sqrt{u}\,/\,\hat g_k, \qquad \tilde v_k := \sqrt{u}\,/\,(\hat g_k \vee \eta), \tag{2.23} $$
where $\eta$ is a positive number satisfying $g_k(x) > 4\eta$ for all $x\in I$. Then $\tilde v_k$ is a modification of $\hat v_k$ which satisfies the assumptions of Lemma 2.2.1, so that Assumption 2.2.1 holds.
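The following is a small sketch, under the stated assumptions only, of how the plug-in weight $\hat v_k$ and its truncated modification $\tilde v_k$ of (2.23) might be computed; the kernel and the truncation constant $\eta$ are user choices, and the function names are illustrative.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def g_hat(x, design, b):
    """Kernel density estimate g_hat_k(x) = n_k^{-1} sum_l K_b(x - X_{k,l})."""
    return epanechnikov((x[:, None] - design[None, :]) / b).mean(axis=1) / b

def v_hat(x, design, b, u, eta=None):
    """Plug-in weight sqrt(u)/g_hat_k of (2.23); if eta is supplied, the
    truncated modification sqrt(u)/(g_hat_k v eta) is returned instead."""
    g = g_hat(x, design, b)
    if eta is not None:
        g = np.maximum(g, eta)
    return np.sqrt(u(x)) / g
```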
When $u = u_\phi$ with a known non-negative continuous function $\phi$, let $c_k$ and $d_k$ be bandwidths satisfying (w3), and consider
$$ \hat\mu_{k,c}(x) := \frac{\sum_{j=1}^{n_k} Y_{k,j}\,\delta_{k,j} K_{c_k}(x - X_{k,j})}{\sum_{j=1}^{n_k}\delta_{k,j} K_{c_k}(x - X_{k,j})}, \qquad \hat\sigma_k^2(x) := \frac{\sum_{j=1}^{n_k}\big(Y_{k,j} - \hat\mu_{k,c}(X_{k,j})\big)^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})}{\sum_{j=1}^{n_k}\delta_{k,j} K_{d_k}(x - X_{k,j})}, \tag{2.24} $$
$$ \hat\psi_k := \frac{\hat\sigma_k^2}{\hat h_k}, \qquad \hat u_\phi := \frac{\phi\, 1_I}{q_1\hat\psi_1 + q_2\hat\psi_2}, \qquad \hat v_k := \frac{\sqrt{\hat u_\phi}}{\hat g_k}, \qquad x\in\mathbb{R}, \ k = 1, 2. $$
Arguing as in the estimation of $v_k$ for $u = u_\phi$ in Section 3 of K-S, one can find a modification $\tilde v_k$ of $\hat v_k$ which satisfies the assumptions of Lemma 2.2.1, so that Assumption 2.2.1 holds. The following lemma gives the needed properties of $\hat\sigma_k^2$.

Lemma 2.4.1. Suppose (e1), (e2), (e3), (e4), (g1), (g2), (k), (m), (p), (q), (u), and (w3) hold. Then, for each $k = 1, 2$,
$$ \sup_{x\in I}|\hat\sigma_k^2(x) - \sigma_k^2(x)| = o_p(1), \qquad \text{under } H_0, \ H_1, \text{ and } H_{1N}, \tag{2.25} $$
and $\hat\sigma_k^2$ is essentially CCV on $I$ for $\sigma_k^2$, under $H_0$, $H_1$, and $H_{1N}$.

Proof. First we prove (2.25) under $H_{1N}$; the case $\phi = 0$ yields the result under $H_0$. For $k = 1, 2$ and $x\in I$, define
$$ \hat h_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\delta_{k,l} K_{c_k}(x - X_{k,l}), \qquad \bar\mu_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\mu_k(X_{k,l})\,\delta_{k,l} K_{c_k}(x - X_{k,l})\Big/\hat h_{k,c}(x), \qquad \bar\varepsilon_{k,c}(x) := \frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\,\delta_{k,l} K_{c_k}(x - X_{k,l})\Big/\hat h_{k,c}(x), $$
and define $\hat h_{k,d}(x)$, $\bar\mu_{k,d}(x)$, and $\bar\varepsilon_{k,d}(x)$ analogously with the bandwidth $d_k$ in place of $c_k$. One can write $\hat\sigma_k^2(x) - \sigma_k^2(x)$ as the sum of the following terms:
$$ Z_{k,1}(x) = \frac{1}{n_k}\sum_{j=1}^{n_k}\big(\mu_k(X_{k,j}) - \bar\mu_{k,c}(X_{k,j})\big)^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x), $$
$$ Z_{k,2}(x) = \frac{1}{n_k}\sum_{j=1}^{n_k}\bar\varepsilon_{k,c}^2(X_{k,j})\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) - \frac{2}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}\,\bar\varepsilon_{k,c}(X_{k,j})\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) + \Big[\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x) - \sigma_k^2(x)\Big] = Z_{k,2,1} - Z_{k,2,2} + Z_{k,2,3}, \ \text{say}, $$
$$ Z_{k,3}(x) = \frac{2}{n_k}\sum_{j=1}^{n_k}\big(\mu_k(X_{k,j}) - \bar\mu_{k,c}(X_{k,j})\big)\big(\varepsilon_{k,j} - \bar\varepsilon_{k,c}(X_{k,j})\big)\,\delta_{k,j} K_{d_k}(x - X_{k,j})\Big/\hat h_{k,d}(x). $$
By (m), (p), and (u), we have $\sup_{x\in I} Z_{k,1}(x) \le 2\dot\mu_2^2 c_k^2 + o_p(n_k^{-1}) = o_p(1)$. From (2.13) and (2.14) in Lemma 2.2.2,
$$ \sup_{x\in I} Z_{k,2,1}(x) \le \max_{1\le j\le n_k}\bar\varepsilon_{k,c}^2(X_{k,j}) \le \sup_{x\in I}\Big[\frac{h_k(x)}{\hat h_{k,c}(x)}\Big]^2\,\sup_{x\in I}\frac{\big[\frac{1}{n_k}\sum_{l=1}^{n_k}\varepsilon_{k,l}\delta_{k,l} K_{c_k}(x - X_{k,l})\big]^2}{h_k^2(x)} = o_p(1). $$
From (2.13) and (2.15) in Lemma 2.2.2, one obtains
$$ \sup_{x\in I}|Z_{k,2,3}(x)| \le \sup_{x\in I}\frac{\big|\frac{1}{n_k}\sum_{j=1}^{n_k}\varepsilon_{k,j}^2\delta_{k,j} K_{d_k}(x - X_{k,j}) - \sigma_k^2(x)\, h_k(x)\big|}{h_k(x)}\times\sup_{x\in I}\frac{h_k(x)}{\hat h_{k,d}(x)} + \sup_{x\in I}\sigma_k^2(x)\,\sup_{x\in I}\Big|\frac{h_k(x)}{\hat h_{k,d}(x)} - 1\Big| = o_p(1)\, O_p(1) + O(1)\, o_p(1) = o_p(1). $$
By the C-S inequality, $\sup_{x\in I}|Z_{k,2,2}(x)| = o_p(1)$ and $\sup_{x\in I}|Z_{k,3}(x)| = o_p(1)$. Therefore $\sup_{x\in I}|\hat\sigma_k^2(x) - \sigma_k^2(x)| = o_p(1)$ holds under $H_{1N}$.

Under $H_1$ of (2.2), the above proof remains the same except for the treatment of $Z_{1,1}(x)$. By (m), (q), (u), and the compactness of $I$,
$$ \sup_{x\in I} Z_{1,1}(x) \le \sup_{x\in I,\,0\le t\le c_1}\big(\mu_1(x) - \mu_1(x + t)\big)^2 \le \sup_{x\in I,\,0\le t_1\le c_1} 2\big(\mu_2(x) - \mu_2(x + t_1)\big)^2 + \sup_{x\in I,\,0\le t_2\le c_1} 2\big(\xi(x) - \xi(x + t_2)\big)^2 \le 2\dot\mu_2^2 c_1^2 + \sup_{x\in I,\,0\le t_2\le c_1} 2\big(\xi(x) - \xi(x + t_2)\big)^2 = o_p(1). $$
Therefore (2.25) holds under $H_1$. The remaining assertions of the lemma can be proved in a routine fashion. This completes the proof.

To estimate $\tau^2$, let $\{I_{k,1}, \ldots, I_{k,B_k}\}$ and $\pi_k$ be as in assumption (z). Define
$$ \hat\Delta_k(x) := \frac{\sum_{j=1}^{n_k}\delta_{k,j} K_{c_k}(x - X_{k,j})}{\sum_{j=1}^{n_k} K_{c_k}(x - X_{k,j})}, \quad x\in\mathbb{R}, \qquad \tilde g_k(x) := \frac{1}{n_k\pi_k}\sum_{j=1}^{n_k} 1\{X_{k,j}\in I_{k,i}\}, \quad x\in I_{k,i}, \quad k = 1, 2. $$
By Remark 3.2 in K-S, $\tilde g_k(x)$ is a simple bin estimate which, under condition (z), is uniformly consistent for $g_k(x)$, $x\in I$. Recall $\hat r_k$ from (2.9).
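A minimal sketch of how the complete-case kernel estimators entering (2.24) and the missingness-probability estimate $\hat\Delta_k$ might be computed is given below. It assumes the missing responses are stored as zeros in the vector $\delta_k Y_k$ (they are excluded through the indicators), and all function names are illustrative only.

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def nw_complete_case(x, design, resp, miss, band):
    """delta-weighted Nadaraya-Watson smoother: only observed responses
    (miss == 1) enter the local average, as in mu_hat_{k,c} of (2.24).
    `resp` holds delta*Y, i.e. missing entries are set to 0."""
    k = epanechnikov((x[:, None] - design[None, :]) / band) / band
    return (k * miss * resp).sum(axis=1) / (k * miss).sum(axis=1)

def sigma2_hat(x, design, resp, miss, c, d):
    """Two-stage variance estimate sigma_hat_k^2 of (2.24): squared residuals
    from the bandwidth-c fit are smoothed with bandwidth d over observed cases."""
    resid2 = (resp - nw_complete_case(design, design, resp, miss, c))**2
    return nw_complete_case(x, design, resid2, miss, d)

def delta_hat(x, design, miss, c):
    """Kernel estimate of Delta_k(x) = P(delta = 1 | X = x)."""
    k = epanechnikov((x[:, None] - design[None, :]) / c) / c
    return (k * miss).sum(axis=1) / k.sum(axis=1)
```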
Because $\tau^2$ from (2.17) can be expressed as
$$ \tau^2 = q_1\int\frac{r_1^2(x)\,\sigma_1^2(x)\, g_1(x)}{\Delta_1(x)}\,dx + q_2\int\frac{r_2^2(x)\,\sigma_2^2(x)\, g_2(x)}{\Delta_2(x)}\,dx = q_1\int\frac{v_1^4(x)\,\sigma_1^2(x)\, g_1^3(x)}{\Delta_1(x)}\,dx + q_2\int\frac{v_2^4(x)\,\sigma_2^2(x)\, g_2^3(x)}{\Delta_2(x)}\,dx, $$
we consider two estimators of $\tau^2$:
$$ \hat\tau^2 := q_1\,\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{\hat r_1^2(X_{1,i})\,\hat\sigma_1^2(X_{1,i})}{\hat\Delta_1(X_{1,i})} + q_2\,\frac{1}{n_2}\sum_{j=1}^{n_2}\frac{\hat r_2^2(X_{2,j})\,\hat\sigma_2^2(X_{2,j})}{\hat\Delta_2(X_{2,j})}, $$
$$ \hat\tau_*^2 := q_1\,\frac{1}{n_1}\sum_{i=1}^{n_1}\frac{\hat v_1^4(X_{1,i})\,\hat\sigma_1^2(X_{1,i})\,\tilde g_1^2(X_{1,i})}{\hat\Delta_1(X_{1,i})} + q_2\,\frac{1}{n_2}\sum_{j=1}^{n_2}\frac{\hat v_2^4(X_{2,j})\,\hat\sigma_2^2(X_{2,j})\,\tilde g_2^2(X_{2,j})}{\hat\Delta_2(X_{2,j})}. $$
These estimators have the following properties, which can be proved in a routine fashion.

Lemma 2.4.2. Suppose the assumptions of Lemma 2.4.1, (w1), and (z) hold. Then
$$ \hat\tau^2 = \tau^2 + o_p(1) \qquad \text{and} \qquad \hat\tau_*^2 = \tau^2 + o_p(1) $$
hold under $H_0$, $H_1$, and $H_{1N}$.

2.5 Simulations

In this section we investigate the empirical size and power of the test $\hat V$ defined in (2.22) against local and fixed alternatives. To be specific, let $I = [0, 1]$, and let $Z_1$ and $Z_2$ be independent standard normal random variables, independent of $\{X_1, X_2, \delta_1, \delta_2\}$. Recall $\hat u_\phi$ defined in (2.24). The design and error distributions, and the functions $\phi$, $\xi$, $u$, $\mu_2$, $\Delta_l$, $l = 1, 2$, are chosen as follows:
$$ X_1 \sim N(0, (0.7)^2), \quad X_2 \sim N(0, 1), \quad X_1 \text{ and } X_2 \text{ independent}; \qquad \varepsilon_1 = \frac{Z_1 X_1}{1 + X_1^2}, \quad \varepsilon_2 = Z_2(1 + X_2^2); $$
$$ \Delta_l(x) = D_l(x), \ l = 1, 2, \ \text{where } D_1(x) = \{1 + \exp(-0.5 - 0.5x)\}^{-1}, \ D_2(x) = \{1 + \exp(-2 - 2x)\}^{-1}, \quad \text{or } \Delta_l(x) \equiv 1, \ l = 1, 2, \ \text{for complete responses}; $$
$$ \phi(x) = \phi_j(x), \ j = 0, 1, 2, 3, \ \text{where } \phi_0(x) = 0, \ \phi_1(x) = (x + 1)^2, \ \phi_2(x) = 2e^x, \ \phi_3(x) = 4\cos(x); \qquad \xi(x) = \xi_j(x) := \phi_j(x), \ j = 1, 2, 3; $$
$$ u(x) = u_j(x) := 1_{[0,1]}(x)\,\phi_j(x), \ j = 1, 2, 3, \quad \text{or} \quad u(x) = u_j^*(x) := \hat u_{\phi_j}(x), \ j = 1, 2, 3; \qquad \mu_2(x) = \log(x^2 + 0.5). $$
The kernel is chosen to be $K(u) := \frac{3}{4}(1 - u^2)\, I\{|u|\le 1\}$, with bandwidths $a = \rho_1 N^{-2/3}$, $b_k = \rho_2 n_k^{-2/3}$, and $c_k = d_k = \rho_3 n_k^{-1/4}$, $k = 1, 2$, where $\rho_i$, $i = 1, 2, 3$, are constants. The sample sizes are $n_1 = n_2 = 50, 100, 200$. All simulations are based on 2000 replications. The nominal level is $\alpha = 0.05$. The empirical sizes and powers are computed as the relative frequency of the event $\{N^{1/2}\hat T/\hat\tau \ge 1.645\}$.

Table 2.1: Empirical sizes of $\hat V$, with coefficients $\rho_1, \rho_2, \rho_3$, and $\Delta_l = D_l$, $l = 1, 2$.

                           u1          u1*         u2          u2*         u3          u3*
  (ρ1, ρ2, ρ3)         (.5,.2,.8)  (.5,.2,.8)  (.8,.5,.8)  (.2,.5,.8)  (.2,.2,.5)  (.2,.5,.8)
  n1 = n2 = 50            .066        .071        .077        .066        .072        .068
  n1 = n2 = 100           .057        .059        .053        .062        .058        .055
  n1 = n2 = 200           .050        .052        .051        .052        .050        .049

Table 2.2: Empirical sizes of $\hat V$, with coefficients $\rho_1, \rho_2, \rho_3$, and $\Delta_l \equiv 1$, $l = 1, 2$.

                           u1          u1*         u2          u2*         u3          u3*
  (ρ1, ρ2, ρ3)         (.8,.5,.8)  (.8,.8,.8)  (.2,.2,.8)  (.5,.8,.8)  (.2,.2,.8)  (.2,.2,.8)
  n1 = n2 = 50            .073        .085        .072        .084        .071        .063
  n1 = n2 = 100           .066        .079        .065        .074        .061        .073
  n1 = n2 = 200           .052        .052        .049        .051        .052        .050

Before computing the empirical powers, we choose suitable coefficients $(\rho_1, \rho_2, \rho_3)$ for the bandwidths, for each $u$ and the corresponding test, so as to make the empirical size close to 0.05 at $n_1 = n_2 = 200$. To find such coefficients, we compare the empirical sizes over all choices of $\rho_i\in\{0.2, 0.5, 0.8\}$, $i = 1, 2, 3$, and pick the combination whose empirical size at $n_1 = n_2 = 200$ is closest to 0.05. For each $u$, the empirical sizes with the chosen $(\rho_1, \rho_2, \rho_3)$ at $n_1 = n_2 = 50$ and $100$ are also listed. The results for data with responses missing at random, i.e. $\Delta_l = D_l$, $l = 1, 2$, are given in Table 2.1, while the results for complete data, $\Delta_l \equiv 1$, $l = 1, 2$, are reported in Table 2.2.
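For concreteness, a brief sketch of how data could be generated under the missing at random design just described is given below. It is illustrative only; the case $\mu_1 = \mu_2$ shown corresponds to the null hypothesis, and a local alternative would add $N^{-1/2}\phi_j$ to $\mu_2$ for sample 1. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_sample(n, design_sd, err_fn, mu_fn, miss_fn):
    """One MAR sample (X, delta, delta*Y): the response is kept only when
    delta = 1, which occurs with probability miss_fn(x) = Delta_l(x)."""
    x = rng.normal(0.0, design_sd, size=n)
    z = rng.standard_normal(n)
    y = mu_fn(x) + err_fn(x, z)
    delta = rng.binomial(1, miss_fn(x))
    return x, delta, delta * y

mu2 = lambda x: np.log(x**2 + 0.5)
D1 = lambda x: 1.0 / (1.0 + np.exp(-0.5 - 0.5 * x))
D2 = lambda x: 1.0 / (1.0 + np.exp(-2.0 - 2.0 * x))

# sample 1: X1 ~ N(0, 0.7^2), eps1 = Z1 X1 / (1 + X1^2); mu1 = mu2 (null case)
x1, d1, dy1 = simulate_sample(200, 0.7, lambda x, z: z * x / (1 + x**2), mu2, D1)
# sample 2: X2 ~ N(0, 1),     eps2 = Z2 (1 + X2^2)
x2, d2, dy2 = simulate_sample(200, 1.0, lambda x, z: z * (1 + x**2), mu2, D2)
```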
Note that these choices of the $\rho_i$ are merely reasonably good ones among many others; there is no single best choice. For large sample sizes $n_1$ and $n_2$, the behavior of $\hat V$ under the null hypothesis is not affected by the choice of these coefficients.

Table 2.3: Empirical powers of $\hat V$ with $\rho_1, \rho_2, \rho_3$ as in Table 2.1, and $\Delta_l = D_l$, $l = 1, 2$.

  φ                  φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200
  u = u1       .268   .238  .238    .420   .384  .379    .431   .402  .410
  u = u1*      .183   .230  .308    .268   .281  .357    .292   .303  .379
  u = u2       .281   .265  .269    .436   .388  .389    .472   .403  .388
  u = u2*      .230   .255  .341    .318   .339  .399    .344   .368  .421
  u = u3       .242   .233  .215    .356   .370  .351    .436   .427  .444
  u = u3*      .210   .228  .282    .308   .324  .411    .371   .395  .464

Table 2.4: Empirical powers of $\hat V$ with $\rho_1, \rho_2, \rho_3$ as in Table 2.2, and $\Delta_l \equiv 1$, $l = 1, 2$.

  φ                  φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200
  u = u1       .339   .295  .280    .519   .472  .429    .495   .434  .404
  u = u1*      .345   .302  .303    .509   .503  .468    .541   .490  .483
  u = u2       .238   .247  .236    .382   .353  .373    .401   .380  .376
  u = u2*      .325   .308  .282    .503   .477  .494    .542   .507  .503
  u = u3       .234   .215  .237    .360   .376  .373    .448   .445  .455
  u = u3*      .207   .185  .201    .314   .310  .329    .405   .420  .404

Table 2.5: Empirical sizes and powers of $\hat V$ with $\rho_1 = \rho_2 = \rho_3 = 1$ and $\Delta_l = D_l$, $l = 1, 2$.

  φ                  φ0                   φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200      50    100   200
  u = u1       .070   .055  .056    .280   .262  .243    .439   .405  .394    .410   .362  .332
  u = u1*      .070   .060  .056    .293   .288  .275    .459   .448  .440    .463   .439  .454
  u = u2       .068   .060  .054    .292   .264  .244    .450   .419  .374    .417   .391  .361
  u = u2*      .080   .052  .054    .299   .273  .280    .457   .434  .458    .474   .489  .490
  u = u3       .065   .058  .060    .281   .258  .242    .439   .418  .396    .503   .488  .476
  u = u3*      .065   .058  .059    .252   .241  .230    .412   .394  .411    .521   .508  .516

Table 2.6: Empirical sizes and powers of $\hat V$ with $\rho_1 = \rho_2 = \rho_3 = 1$ and $\Delta_l \equiv 1$, $l = 1, 2$.

  φ                  φ0                   φ1                   φ2                   φ3
  n1 = n2        50    100   200      50    100   200      50    100   200      50    100   200
  u = u1       .068   .061  .060    .305   .294  .263    .477   .452  .439    .449   .394  .364
  u = u1*      .073   .074  .059    .331   .307  .317    .525   .524  .472    .537   .537  .533
  u = u2       .076   .056  .045    .315   .290  .281    .494   .456  .463    .474   .412  .392
  u = u2*      .088   .061  .066    .340   .315  .320    .525   .514  .509    .571   .557  .541
  u = u3       .063   .052  .041    .301   .266  .265    .500   .470  .474    .553   .541  .535
  u = u3*      .078   .054  .062    .274   .288  .253    .488   .467  .471    .611   .601  .610

Tables 2.3 and 2.4 give the empirical powers of $\hat V$ against $H_{1N}$ of (2.1), for missing data and for complete data, respectively. These empirical powers, for each test and corresponding $u$, are computed with the bandwidth coefficients $(\rho_1, \rho_2, \rho_3)$ given in Tables 2.1 and 2.2.

Tables 2.5 and 2.6 compare the empirical powers of $\hat V$ for the different choices of $u$ against $H_{1N}$, using the common coefficients $\rho_1 = \rho_2 = \rho_3 = 1$, for missing data and for complete data, respectively. In each table, the empirical sizes get closer to 0.05 as the sample sizes increase. For each $\phi = \phi_j$, $j = 1, 2, 3$, the test $\hat V$ with $u = u_j^*$ has the largest, or one of the largest, empirical powers among all choices of $u$, which is consistent with the result in Remark 2.3.1. Moreover, for each $j = 1, 2, 3$, when $\phi = \phi_j$ the choice $u = u_j^*$ yields larger empirical powers than $u = u_j$. Comparing the two tables, one sees that the empirical powers of all these tests, at all three sample sizes, are larger for complete data than for missing data, while the empirical sizes do not differ much. This indicates that the probability of missing responses affects the power of the test.
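The empirical sizes and powers above are rejection frequencies over 2000 replications. A minimal sketch of that bookkeeping, assuming a user-supplied routine that simulates both samples and returns the standardized statistic $N^{1/2}\hat T/\hat\tau$ for one replication, is as follows; the names are illustrative.

```python
import numpy as np

def empirical_rejection_rate(stat_fn, n_rep=2000, z_alpha=1.645, seed=1):
    """Relative frequency of {N^{1/2} T_hat / tau_hat >= z_alpha}, i.e. the
    empirical size (under the null) or power (under an alternative) of V_hat."""
    rng = np.random.default_rng(seed)
    rejections = sum(stat_fn(rng) >= z_alpha for _ in range(n_rep))
    return rejections / n_rep
```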
All of the empirical powers of $\hat V$, with the above choices of $u$, against $H_1$ of (2.2) with $\xi = \xi_j$, $j = 1, 2, 3$, are equal to 1, for both the missing data and the complete data, and for all three sample sizes. This in turn reflects the consistency of $\hat V$.

BIBLIOGRAPHY

[1] Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes, 2nd Edition. Springer, Berlin.

[2] Collomb, G. and Härdle, W. (1986). Strong uniform convergence rates in robust nonparametric time series analysis and prediction: kernel regression estimation from dependent observations. Stochastic Process. Appl., 23, no. 1, 77-89.

[3] Eubank, R.L. and Hart, J.D. (1992). Testing goodness-of-fit in regression via order selection criteria. Ann. Statist., 20, 1412-1425.

[4] Eubank, R.L. and Hart, J.D. (1993). Commonality of CUSUM, von Neumann and smoothing based goodness-of-fit tests. Biometrika, 80, 89-98.

[5] Eubank, R.L. and Spiegelman, C.H. (1990). Testing the goodness of fit of a linear model via nonparametric regression techniques. J. Amer. Statist. Assoc., 85, 387-392.

[6] Hall, P., Huber, C., and Speckman, P.L. (1997). Covariate-matched one-sided tests for the difference between functional means. J. Amer. Statist. Assoc., 92, 1074-1083.

[7] Hall, P. (1984). Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal., 14, 1-16.

[8] Härdle, W. and Mammen, E. (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist., 21, 1926-1947.

[9] Hart, J.D. (1997). Nonparametric Smoothing and Lack-of-fit Tests. Springer, New York.

[10] Koul, H.L. (2011). Minimum distance lack-of-fit tests for fixed design. J. Statist. Plann. Inference, 141, 65-79.

[11] Koul, H.L. and Ni, P.P. (2004). Minimum distance regression model checking. J. Statist. Plann. Inference, 119, 109-141.

[12] Koul, H.L. and Schick, A. (1997). Testing the equality of two regression curves. J. Statist. Plann. Inference, 65, 293-314.

[13] Koul, H.L. and Schick, A. (2003). Testing for superiority among two regression curves. J. Statist. Plann. Inference, 117, 15-33.

[14] Koul, H.L. and Song, W.X. (2009). Minimum distance regression model checking with Berkson measurement errors. Ann. Statist., 37, 132-156.

[15] Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley, New York.

[16] Mack, Y.P. and Silverman, B.W. (1982). Weak and strong uniform consistency of kernel regression estimates. Z. Wahrsch. Verw. Gebiete, 61, 405-415.

[17] Schick, A. (1987). A note on the construction of asymptotically linear estimators. J. Statist. Plann. Inference, 16, 89-105. Correction (1989), 22, 269-270.

[18] Stute, W., Thies, S., and Zhu, L.X. (1998). Model checks for regression: an innovation process approach. Ann. Statist., 26, 1916-1934.

[19] Sun, Z.H. and Wang, Q.H. (2009). Checking the adequacy of a general linear model with responses missing at random. J. Statist. Plann. Inference, 139, 3588-3604.

[20] Zheng, J.X. (1996). A consistent test of functional form via nonparametric estimation techniques. J. Econometrics, 75, 263-289.